Native Elixir PDF reader — text extraction with positions, layout reconstruction, links, images, metadata, encryption (RC4/AES-128/AES-256), AcroForm fields, outlines, and annotations. Uses only the Erlang/OTP standard library.
Part of the ExPDF umbrella.
Installation
def deps do
  [
    {:ex_pdf_read, "~> 1.0"}
  ]
end

Most users should depend on ex_pdf instead, which bundles core, components, and the reader.
Usage
{:ok, doc} = Pdf.Reader.open("invoice.pdf")
{:ok, %Pdf.Reader.Result{} = result, _doc} = Pdf.Reader.read(doc)
result.meta.title # "Invoice 042"
result.meta.page_count # 3
for page <- result.pages do
  for line <- page.lines, token <- line.tokens do
    IO.puts(token.text)
  end
end

Convenience shapes
{:ok, pages, _} = Pdf.Reader.read(doc, shape: :text)    # [String.t()] per page
{:ok, shapes, _} = Pdf.Reader.read(doc, shape: :shapes) # [%Shape{}] flat

Encrypted PDFs
{:ok, doc} = Pdf.Reader.open("encrypted.pdf", password: "secret")

Supports Standard Security Handler V1–V5 (RC4-40, RC4-128, AES-128, AES-256).
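When the password is not known up front, one option is to try a short list of candidates in order. The sketch below is only an illustration: the module name `PasswordAttempts` and the `:no_password_matched` atom are hypothetical, the opener is passed in as a function so the snippet runs without the library, and it assumes that a wrong password produces something other than `{:ok, doc}` (the exact error shape is not documented here).

```elixir
defmodule PasswordAttempts do
  # `open_fun` is a 2-arity function standing in for &Pdf.Reader.open/2.
  # Assumption: a wrong password yields anything other than {:ok, doc}.
  def open(path, passwords, open_fun) do
    # Return the first successful open, or a hypothetical error tuple
    # when every candidate has been rejected.
    Enum.find_value(passwords, {:error, :no_password_matched}, fn password ->
      case open_fun.(path, password: password) do
        {:ok, doc} -> {:ok, doc}
        _other -> nil
      end
    end)
  end
end
```

With the real reader this would be called as `PasswordAttempts.open("encrypted.pdf", ["first", "second"], &Pdf.Reader.open/2)`.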
Error recovery
{:ok, doc} = Pdf.Reader.open(broken_bin, recover: true)

Recovers from corrupted xref tables, missing %%EOF, broken page-tree links, dangling font refs, and truncated streams.
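A common pattern is to parse strictly first and fall back to recovery only when that fails, so callers can tell which documents were damaged. This is a minimal sketch under stated assumptions: `OpenWithFallback` and the `:strict` / `:recovered` tags are hypothetical names, the opener is injected so the example runs standalone, and strict opening is assumed to return `{:error, reason}` on a damaged file.

```elixir
defmodule OpenWithFallback do
  # `open_fun` is a 2-arity function standing in for &Pdf.Reader.open/2.
  # Assumption: it returns {:error, reason} when strict parsing fails.
  def open(source, open_fun) do
    case open_fun.(source, []) do
      {:ok, doc} ->
        {:ok, doc, :strict}

      {:error, _reason} ->
        # Second attempt with recovery enabled; if this also fails,
        # `with` passes the error tuple through unchanged.
        with {:ok, doc} <- open_fun.(source, recover: true) do
          {:ok, doc, :recovered}
        end
    end
  end
end
```

With the real reader this would be called as `OpenWithFallback.open(broken_bin, &Pdf.Reader.open/2)`.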
Dictionary-based word split
{:ok, result, _} = Pdf.Reader.read(doc, dictionary: :es)

Uses the bundled Spanish frequency list (~50k words). Custom wordlists can be supplied as a MapSet.
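To illustrate the general idea of dictionary-based splitting, here is a small standalone sketch that segments a run of glued-together text into known words using a `MapSet`. It is not the reader's actual algorithm: the module name `WordSplitSketch` is hypothetical, and the sketch simply prefers the split with the fewest words, whereas a frequency list would allow ranking candidates by likelihood.

```elixir
defmodule WordSplitSketch do
  # Splits `text` into words from `words`, preferring the fewest words.
  # Returns {:ok, list} or :error when no complete split exists.
  def split("", _words), do: {:ok, []}

  def split(text, %MapSet{} = words) do
    n = String.length(text)

    # `best` maps an end offset to the shortest (reversed) word list
    # that covers the text from offset 0 up to that offset.
    best =
      Enum.reduce(1..n, %{0 => []}, fn stop, best ->
        candidates =
          for start <- 0..(stop - 1),
              Map.has_key?(best, start),
              word = String.slice(text, start, stop - start),
              MapSet.member?(words, word) do
            [word | best[start]]
          end

        case candidates do
          [] -> best
          _ -> Map.put(best, stop, Enum.min_by(candidates, &length/1))
        end
      end)

    case Map.fetch(best, n) do
      {:ok, reversed} -> {:ok, Enum.reverse(reversed)}
      :error -> :error
    end
  end
end

words = MapSet.new(["hola", "mundo", "ho", "la"])
IO.inspect(WordSplitSketch.split("holamundo", words))
```

The dynamic-programming table keeps the sketch short but costs quadratic time in the length of the input, which is acceptable for individual text runs.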
License
MIT. See LICENSE.md.