Pure-Elixir PDF parsing and lossless surgery engine.
No NIFs, no C bindings, no external binaries — one runtime dependency
(:telemetry). This module is the read-oriented facade (open/1,
page_count/1, pages/1, extract_text/1,2); editing, projection, and
serialization live in dedicated modules:
PdfEx.Editor: structural page ops (insert / delete / reorder)
PdfEx.ContentEdit: run-level text replacement and glyph deletion
PdfEx.Convert: visual & semantic HTML projection + reverse mutation
PdfEx.Serializer: incremental (lossless) and full re-serialization
PdfEx.Session: supervised collaborative editing sessions
PdfEx.Font.Surgery: TrueType glyph-subset surgery
Every read/edit API is a pure function over an immutable PdfEx.Document.
Malformed input never raises from the tuple-returning functions; it returns
{:error, PdfEx.Error.t()}. The one exception is open!/1, which raises on failure.
Usage
iex> {:ok, doc} = PdfEx.open(File.read!("document.pdf"))
iex> {:ok, pages} = PdfEx.pages(doc)
iex> {:ok, text} = PdfEx.extract_text(doc)
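Because every call returns either {:ok, value} or {:error, PdfEx.Error.t()}, the calls chain naturally in a `with` expression. The following is a minimal sketch, not part of PdfEx: the module name PdfExUsageSketch and the function summarize/1 are hypothetical, and only open/1, page_count/1, and extract_text/1 come from the documentation above.

```elixir
defmodule PdfExUsageSketch do
  # Hypothetical helper module for illustration; not part of PdfEx.

  # Opens the binary, then collects the page count and the full text.
  # `with` has no `else` clause, so the first value that fails to match
  # {:ok, _} (for example an {:error, _} tuple) is returned unchanged.
  def summarize(pdf_binary) when is_binary(pdf_binary) do
    with {:ok, doc} <- PdfEx.open(pdf_binary),
         {:ok, count} <- PdfEx.page_count(doc),
         {:ok, text} <- PdfEx.extract_text(doc) do
      {:ok, %{pages: count, text: text}}
    end
  end
end
```

A caller then has one place to handle failure: match on {:ok, summary} or {:error, error} from summarize/1.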
Functions
@spec extract_text(PdfEx.Document.t()) :: {:ok, String.t()} | {:error, PdfEx.Error.t()}
Extracts all text, in reading order, joined across pages by a page-break marker.
@spec extract_text(PdfEx.Document.t(), pos_integer()) :: {:ok, String.t()} | {:error, PdfEx.Error.t()}
Extracts text from a single 1-based page_number.
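To get the text of each page separately rather than one joined string, extract_text/2 can be called for every page number from 1 up to the page count. The sketch below assumes nothing beyond page_count/1 and extract_text/2 as documented here; the module PdfExPageTextSketch and its function text_by_page/1 are hypothetical names.

```elixir
defmodule PdfExPageTextSketch do
  # Hypothetical helper module for illustration; not part of PdfEx.

  # Returns {:ok, [text_of_page_1, text_of_page_2, ...]} or the first
  # {:error, _} produced by PdfEx.
  def text_by_page(doc) do
    with {:ok, count} <- PdfEx.page_count(doc) do
      collect(doc, 1, count, [])
    end
  end

  # Past the last page: restore document order and finish.
  defp collect(_doc, page, count, acc) when page > count do
    {:ok, Enum.reverse(acc)}
  end

  # Page numbers are 1-based; stop at the first error.
  defp collect(doc, page, count, acc) do
    case PdfEx.extract_text(doc, page) do
      {:ok, text} -> collect(doc, page + 1, count, [text | acc])
      {:error, _} = error -> error
    end
  end
end
```

The explicit recursion is a deliberate choice: page_count/1 may return 0, and the range 1..0 in Elixir counts downward instead of being empty, so iterating over 1..count would need an extra guard.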
@spec open(binary()) :: {:ok, PdfEx.Document.t()} | {:error, PdfEx.Error.t()}
Opens PDF data into a PdfEx.Document. Malformed input returns {:error, ...}; encrypted PDFs are refused.
@spec open!(binary()) :: PdfEx.Document.t()
Like open/1 but returns the document directly and raises on failure.
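One way to choose between the two variants, sketched under assumptions: use open/1 for untrusted input, where failure is an expected outcome, and open!/1 for files that are known to be valid, such as test fixtures. The module PdfExOpenSketch, its function names, and the :invalid_pdf tag are hypothetical. The exception type raised by open!/1 is not stated above, so the sketch does not rescue it.

```elixir
defmodule PdfExOpenSketch do
  # Hypothetical helper module for illustration; not part of PdfEx.

  # Untrusted input: keep the error as data and tag it for the caller.
  def open_upload(binary) when is_binary(binary) do
    case PdfEx.open(binary) do
      {:ok, doc} -> {:ok, doc}
      {:error, error} -> {:error, {:invalid_pdf, error}}
    end
  end

  # Trusted input: a failure here is a bug, so let it raise.
  def open_fixture!(path) do
    path
    |> File.read!()
    |> PdfEx.open!()
  end
end
```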
@spec page_count(PdfEx.Document.t()) :: {:ok, non_neg_integer()} | {:error, PdfEx.Error.t()}
Returns the document's page count from the catalog's /Pages /Count.
@spec pages(PdfEx.Document.t()) :: {:ok, [PdfEx.PageTree.Page.t()]} | {:error, PdfEx.Error.t()}
Walks the page tree, returning the leaf pages in document order with inherited attributes resolved.
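Since page_count/1 reads the declared /Count while pages/1 walks the tree, comparing the two is a cheap sanity check on a document. This is a sketch only: the module PdfExPageCheckSketch and the function count_matches_tree?/1 are hypothetical, and whether the two values can actually differ for a document that PdfEx accepts is an assumption, not something the documentation states.

```elixir
defmodule PdfExPageCheckSketch do
  # Hypothetical helper module for illustration; not part of PdfEx.

  # Returns {:ok, true} when the declared /Count equals the number of leaf
  # pages found by walking the tree, {:ok, false} when they differ, and the
  # first {:error, _} if either call fails.
  def count_matches_tree?(doc) do
    with {:ok, declared} <- PdfEx.page_count(doc),
         {:ok, pages} <- PdfEx.pages(doc) do
      {:ok, declared == length(pages)}
    end
  end
end
```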