Elixir wrapper for LiteParse, a fast and lightweight PDF parser written in Rust. Parsing runs locally with no cloud dependencies.

Note: this Elixir binding exposes only a subset of the upstream LiteParse features. Check the upstream project for the complete capability set.

Installation

Add liteparse to the dependencies in your mix.exs:

def deps do
  [
    {:liteparse, "~> 0.1.0"}
  ]
end

Usage

Parse a PDF from disk:

{:ok, %{text: text, page_count: n}} = LiteParse.parse("document.pdf")

Parse a PDF from binary data:

{:ok, %{text: text, page_count: n}} = LiteParse.parse_input(pdf_binary)

Options can be passed as a keyword list:

LiteParse.parse("doc.pdf", max_pages: 100, ocr_enabled: false)

Or as a reusable struct:

config = LiteParse.Config.new(ocr_language: "spa", max_pages: 50)
LiteParse.parse("doc.pdf", config)

See LiteParse.Config for the full list of available options.
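
The calls above match only on the success tuple, so a failed parse would raise a MatchError. Below is a minimal sketch of handling both outcomes. It assumes failures are returned as {:error, reason}, which this README does not confirm. ParseSummary and its summarize/2 function are hypothetical names used only for illustration; the parse function is passed in as an argument so the sketch can be tried without a real PDF.

```elixir
defmodule ParseSummary do
  # Hypothetical helper, not part of LiteParse.
  # `parse_fun` defaults to LiteParse.parse/1 but can be swapped for a stub.
  # ASSUMPTION: failures come back as {:error, reason}.
  def summarize(path, parse_fun \\ &LiteParse.parse/1) do
    case parse_fun.(path) do
      {:ok, %{text: text, page_count: n}} ->
        {:ok, "#{path}: #{n} page(s), #{String.length(text)} characters"}

      {:error, reason} ->
        {:error, "could not parse #{path}: #{inspect(reason)}"}
    end
  end
end
```

Calling ParseSummary.summarize("document.pdf") uses the real parser; passing a stub function as the second argument lets the branching be checked in isolation.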

Supported Formats

  • PDF (.pdf)
  • Microsoft Office (.docx, .xlsx, .pptx, etc.) — requires LibreOffice
  • OpenDocument (.odt, .ods, .odp) — requires LibreOffice
  • Images (.png, .jpg, .tiff, etc.) — requires ImageMagick
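
Because the non-PDF formats depend on external tools, it can help to check for them before parsing. The sketch below uses System.find_executable/1 from the Elixir standard library. The executable names (soffice, libreoffice, magick, convert) are assumptions about typical installs, not names documented by LiteParse, and FormatDeps is a hypothetical module.

```elixir
defmodule FormatDeps do
  # Hypothetical helper, not part of LiteParse.
  # ASSUMPTION: candidate executable names; adjust them for your platform.
  @tools [
    libreoffice: ["soffice", "libreoffice"],
    imagemagick: ["magick", "convert"]
  ]

  # Returns the tools for which none of the candidate executables was found.
  # `find` is injectable so the check can be exercised without touching PATH.
  def missing(find \\ &System.find_executable/1) do
    for {tool, candidates} <- @tools,
        not Enum.any?(candidates, find),
        do: tool
  end
end
```

FormatDeps.missing() returns an empty list when both tools are found, or a list such as [:libreoffice] naming whatever is absent.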

License

MIT. See LICENSE.