LlamaIndex have a most excellent open source project called LiteParse, which provides a Node.js CLI tool for extracting text from PDFs. I got a version of LiteParse working entirely in the browser, using most of the same libraries that Lit
Simon Willison's Weblog, April 23, 2026
Extract PDF text in your browser with LiteParse for the web
by Simon Willison's Weblog