PdfEx (pdf_ex v0.1.0)

Pure-Elixir PDF parsing and lossless surgery engine.

No NIFs, no C bindings, no external binaries; the only runtime dependency is :telemetry. This module is the read-oriented facade (open/1, open!/1, page_count/1, pages/1, extract_text/1,2); editing, projection, and serialization live in dedicated modules.

Every read/edit API is a pure function over an immutable PdfEx.Document. Malformed input never raises; it returns {:error, PdfEx.Error.t()} instead. The one exception is open!/1, which raises on failure by design.

Usage

iex> {:ok, doc} = PdfEx.open(File.read!("document.pdf"))
iex> {:ok, pages} = PdfEx.pages(doc)
iex> {:ok, text} = PdfEx.extract_text(doc)
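
The session above pattern-matches on {:ok, ...} directly, which raises a MatchError if any step fails. Since each call returns a tagged tuple, the steps can also be chained so that the first error is passed through unchanged. The following is a minimal sketch; the PdfSummary module and its summarize/1 function are hypothetical names introduced for illustration and are not part of PdfEx.

```elixir
defmodule PdfSummary do
  # Hypothetical helper, not part of PdfEx.
  # Reads a file, opens it, and returns the page count and extracted text.
  # Any {:error, reason} from File.read/1 or from a PdfEx call falls
  # through the `with` and is returned to the caller as-is.
  def summarize(path) do
    with {:ok, data} <- File.read(path),
         {:ok, doc} <- PdfEx.open(data),
         {:ok, count} <- PdfEx.page_count(doc),
         {:ok, text} <- PdfEx.extract_text(doc) do
      {:ok, %{pages: count, text: text}}
    end
  end
end
```

Note that the error shapes differ by source: File.read/1 returns a POSIX atom such as :enoent, while the PdfEx calls return a PdfEx.Error struct.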

Summary

Functions

extract_text(doc)

Extracts all text, in reading order, joined across pages by a page-break marker.

extract_text(doc, page_number)

Extracts text from a single 1-based page_number.

open(data)

Opens PDF data into a PdfEx.Document. Malformed input returns {:error, ...}; encrypted PDFs are refused.

open!(data)

Like open/1 but returns the document directly and raises on failure.

page_count(doc)

Returns the document's page count from the catalog's /Pages /Count.

pages(doc)

Walks the page tree, returning the leaf pages in document order with inherited attributes resolved.

Functions

extract_text(doc)

@spec extract_text(PdfEx.Document.t()) ::
  {:ok, String.t()} | {:error, PdfEx.Error.t()}

Extracts all text, in reading order, joined across pages by a page-break marker.

extract_text(doc, page_number)

@spec extract_text(PdfEx.Document.t(), pos_integer()) ::
  {:ok, String.t()} | {:error, PdfEx.Error.t()}

Extracts text from a single 1-based page_number.
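
When the text of each page is needed separately, extract_text/2 can be combined with page_count/1. The sketch below assumes that page numbers run from 1 to the reported count; the PerPageText module and extract_each/1 are hypothetical names, not part of PdfEx.

```elixir
defmodule PerPageText do
  # Hypothetical helper, not part of PdfEx.
  # Returns {:ok, [{page_number, text}, ...]} in page order,
  # or the first {:error, _} encountered.
  def extract_each(doc) do
    with {:ok, count} <- PdfEx.page_count(doc) do
      collect(doc, 1, count, [])
    end
  end

  # Past the last page: return the accumulated results in page order.
  defp collect(_doc, n, count, acc) when n > count, do: {:ok, Enum.reverse(acc)}

  defp collect(doc, n, count, acc) do
    case PdfEx.extract_text(doc, n) do
      {:ok, text} -> collect(doc, n + 1, count, [{n, text} | acc])
      {:error, _} = error -> error
    end
  end
end
```

Stopping at the first error is a design choice for this sketch; a caller that prefers partial results could collect the per-page errors instead.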

open(data)

@spec open(binary()) :: {:ok, PdfEx.Document.t()} | {:error, PdfEx.Error.t()}

Opens PDF data into a PdfEx.Document. Malformed input returns {:error, ...}; encrypted PDFs are refused.
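
Because open/1 returns a tagged tuple rather than raising, a batch of inputs can be processed without wrapping each call in try/rescue. The following sketch partitions a list of named binaries into those that opened and those that did not; BatchOpen and open_all/1 are hypothetical names introduced for illustration.

```elixir
defmodule BatchOpen do
  # Hypothetical helper, not part of PdfEx.
  # Takes a list of {name, binary} pairs and returns {opened, failed},
  # where each element keeps its name next to the raw open/1 result.
  def open_all(named_binaries) do
    named_binaries
    |> Enum.map(fn {name, data} -> {name, PdfEx.open(data)} end)
    |> Enum.split_with(fn {_name, result} -> match?({:ok, _}, result) end)
  end
end
```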

open!(data)

@spec open!(binary()) :: PdfEx.Document.t()

Like open/1 but returns the document directly and raises on failure.

page_count(doc)

@spec page_count(PdfEx.Document.t()) ::
  {:ok, non_neg_integer()} | {:error, PdfEx.Error.t()}

Returns the document's page count from the catalog's /Pages /Count.
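
Since page_count/1 reads the declared /Count while pages/1 walks the tree, the two can in principle disagree for a malformed file. Whether PdfEx reconciles them internally is not stated here, so the sketch below simply compares the two results; PageCountCheck and consistent?/1 are hypothetical names.

```elixir
defmodule PageCountCheck do
  # Hypothetical helper, not part of PdfEx.
  # Returns {:ok, true} when the declared /Count equals the number of
  # leaf pages found by walking the tree, {:ok, false} when they differ,
  # and passes through any {:error, _} from either call.
  def consistent?(doc) do
    with {:ok, declared} <- PdfEx.page_count(doc),
         {:ok, pages} <- PdfEx.pages(doc) do
      {:ok, declared == length(pages)}
    end
  end
end
```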

pages(doc)

@spec pages(PdfEx.Document.t()) ::
  {:ok, [PdfEx.PageTree.Page.t()]} | {:error, PdfEx.Error.t()}

Walks the page tree, returning the leaf pages in document order with inherited attributes resolved.
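
To illustrate what "inherited attributes resolved" means, here is a self-contained sketch of a depth-first page-tree walk over plain maps. It is not PdfEx's implementation: the node shape (:type, :kids) and the PageTreeSketch module are assumptions made for this example. The inheritable attributes listed are the ones the PDF specification defines for page-tree nodes (Resources, MediaBox, CropBox, Rotate).

```elixir
defmodule PageTreeSketch do
  # Attributes that a page may inherit from an ancestor /Pages node.
  @inheritable [:resources, :media_box, :crop_box, :rotate]

  # Returns the leaf pages in document order. Each leaf is merged with the
  # attributes inherited from its ancestors; values set on a nearer node
  # override those from a more distant one.
  def leaves(node, inherited \\ %{})

  # Intermediate node: pick up its inheritable attributes, then recurse
  # into the kids from left to right.
  def leaves(%{type: :pages, kids: kids} = node, inherited) do
    inherited = Map.merge(inherited, Map.take(node, @inheritable))
    Enum.flat_map(kids, &leaves(&1, inherited))
  end

  # Leaf node: the page's own attributes win over inherited ones.
  def leaves(%{type: :page} = page, inherited) do
    [Map.merge(inherited, page)]
  end
end
```

For example, a page with no :media_box of its own under a root that sets one comes back with the root's value, while a page that sets its own keeps it.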