Elixir wrapper for LiteParse, a fast and lightweight PDF parser written in Rust. Parsing runs locally with no cloud dependencies.
Note: this Elixir binding exposes only a subset of the upstream LiteParse features. Check the upstream project for the complete capability set.
Installation
Add to your mix.exs:
def deps do
  [
    {:liteparse, "~> 0.1.0"}
  ]
end
Usage
Parse a PDF from disk:
{:ok, %{text: text, page_count: n}} = LiteParse.parse("document.pdf")
Parse a PDF from binary data:
{:ok, %{text: text, page_count: n}} = LiteParse.parse_input(pdf_binary)
Options can be passed as a keyword list:
LiteParse.parse("doc.pdf", max_pages: 100, ocr_enabled: false)
Or as a reusable struct:
config = LiteParse.Config.new(ocr_language: "spa", max_pages: 50)
LiteParse.parse("doc.pdf", config)
See LiteParse.Config for the full list of available options.
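The calls above match directly on `{:ok, ...}`, so any other return value raises a `MatchError`. Below is a minimal sketch of handling failure explicitly. It assumes failures are returned as `{:error, reason}`, which this README does not confirm, so check the function docs for your version. `ParseReport` and `summarize/2` are hypothetical names, not part of LiteParse; the parser is passed in as a function so the sketch can be tried without a real document.

```elixir
defmodule ParseReport do
  @moduledoc """
  Hypothetical helper, not part of LiteParse: turns a parse result into a
  one-line summary instead of crashing on a failed match.
  """

  # `parser` is any 1-arity function returning
  # {:ok, %{text: _, page_count: _}} or (assumed) {:error, reason}.
  # It defaults to LiteParse.parse/1.
  def summarize(path, parser \\ &LiteParse.parse/1) do
    case parser.(path) do
      {:ok, %{text: text, page_count: n}} ->
        {:ok, "#{path}: #{n} page(s), #{String.length(text)} characters"}

      {:error, reason} ->
        # Assumed error shape; adjust to what your version actually returns.
        {:error, "#{path}: parse failed (#{inspect(reason)})"}
    end
  end
end
```

Calling `ParseReport.summarize("document.pdf")` uses the real parser; passing a stub function as the second argument is convenient in tests.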
Supported Formats
- PDF (.pdf)
- Microsoft Office (.docx, .xlsx, .pptx, etc.), requires LibreOffice
- OpenDocument (.odt, .ods, .odp), requires LibreOffice
- Images (.png, .jpg, .tiff, etc.), requires ImageMagick
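Because some formats depend on external tools, it can help to check for the tool before parsing. The following is a rough sketch under stated assumptions: `FormatCheck` is a hypothetical module, its table covers only the extensions named in the list above, and the executable names (`soffice` or `libreoffice`, `magick` or `convert`) are the usual ones on common installs. How LiteParse itself locates these tools is not described here.

```elixir
defmodule FormatCheck do
  @moduledoc "Hypothetical pre-flight check; not part of LiteParse."

  # Extension -> external tool, copied from the Supported Formats list.
  # Extensions covered only by "etc." are deliberately left out.
  @requirements %{
    ".pdf" => :none,
    ".docx" => :libreoffice,
    ".xlsx" => :libreoffice,
    ".pptx" => :libreoffice,
    ".odt" => :libreoffice,
    ".ods" => :libreoffice,
    ".odp" => :libreoffice,
    ".png" => :imagemagick,
    ".jpg" => :imagemagick,
    ".tiff" => :imagemagick
  }

  # Assumed executable names; adjust for your platform.
  @executables %{
    libreoffice: ["soffice", "libreoffice"],
    imagemagick: ["magick", "convert"]
  }

  # Returns {:ok, tool} for a known extension (case-insensitive),
  # or {:error, :unknown_extension} when it is not in the table above.
  def requirement(path) do
    ext = path |> Path.extname() |> String.downcase()

    case Map.fetch(@requirements, ext) do
      {:ok, tool} -> {:ok, tool}
      :error -> {:error, :unknown_extension}
    end
  end

  # True when at least one candidate executable for `tool` is on PATH.
  def tool_available?(:none), do: true

  def tool_available?(tool) do
    @executables
    |> Map.fetch!(tool)
    |> Enum.any?(&System.find_executable/1)
  end
end
```

A caller could run `FormatCheck.requirement(path)` first and only call `LiteParse.parse/1` when `FormatCheck.tool_available?/1` returns true for the reported tool. An `:unknown_extension` result only means the extension is missing from this sketch's table, not that LiteParse cannot handle it.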
License
MIT. See LICENSE.