Elixir bindings for pdfium, Google's
Chromium PDF engine, via the Rust pdfium-render
crate. The native library ships precompiled (rustler_precompiled), so
there is no Rust toolchain or separately-installed pdfium to set up.
Read-only toolkit
ExPdfium is a read & extract toolkit: open documents, page counts, rendering, text extraction/search, metadata, page geometry, permissions, structure (bookmarks/links/attachments), and forms/annotations (read). It does not create, edit, or save PDFs.
Example
{:ok, doc} = ExPdfium.open("file.pdf")
{:ok, 3} = ExPdfium.page_count(doc)
{:ok, %ExPdfium.Bitmap{data: data, width: w, height: h}} =
  ExPdfium.render_page(doc, 0, dpi: 300)
{:ok, image} = Vix.Vips.Image.new_from_binary(data, w, h, 4, :VIPS_FORMAT_UCHAR)
:ok = ExPdfium.close(doc)
# Encrypted documents:
{:ok, doc} = ExPdfium.open("secret.pdf", password: "hunter2")
Summary
Documents
close/1
Explicitly close a document, releasing pdfium memory early. Optional and idempotent.
open/2
Open a PDF from a file path or an in-memory binary.
page_count/1
Number of pages in the document.
Rendering
render_page/3
Render a 0-indexed page to an ExPdfium.Bitmap (an uncompressed 4-channel
pixel buffer).
Text & search
extract_text/1
Extract the plain text of the whole document. Pages are joined by a form-feed
("\f") character. Returns {:error, :document_closed} if the document has
been closed.
extract_text/2
Extract the plain text of a 0-indexed page.
search_text/4
Search a page for query, returning the matches.
text_segments/2
Return the page's text as runs (segments), each with its bounding box.
Forms & annotations
annotations/2
Return the annotations on a 0-indexed page, in page order.
form_fields/1
Read the document's AcroForm fields, one entry per widget, across all pages.
form_type/1
Return which interactive-form technology the document uses.
Diagnostics
pdfium_version/0
Return a marker string confirming the native pdfium library loaded and initialized. Useful as a smoke test that the precompiled NIF is healthy.
Documents
@spec close(ExPdfium.Document.t()) :: :ok
Explicitly close a document, releasing pdfium memory early. Optional and idempotent.
Documents are also closed when garbage-collected, but that close is processed asynchronously (on a background thread, so it can't stall a scheduler while a long render holds the pdfium lock). Call this for deterministic, immediate release.
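For deterministic release, one option is to scope a document to a function call. The following is a minimal sketch, not part of ExPdfium: DocScope and with_document/3 are hypothetical names, and the third argument (a module implementing open/1 and close/1) exists only so the helper can be exercised without the native library.

```elixir
defmodule DocScope do
  # Hypothetical helper: open `source`, run `fun` on the document, and
  # always close it afterwards. `pdf` defaults to ExPdfium; passing
  # another module is an illustrative seam for tests.
  def with_document(source, fun, pdf \\ ExPdfium) when is_function(fun, 1) do
    case pdf.open(source) do
      {:ok, doc} ->
        try do
          fun.(doc)
        after
          # close/1 is documented as idempotent, so this stays safe even
          # if `fun` already closed the document itself.
          pdf.close(doc)
        end

      {:error, _reason} = error ->
        # Open failures are passed through unchanged.
        error
    end
  end
end
```

Usage would look like DocScope.with_document("file.pdf", &ExPdfium.page_count/1), which returns whatever the function returns and closes the document even if it raises.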
@spec open(Path.t() | binary(), keyword()) :: {:ok, ExPdfium.Document.t()} | {:error, atom()}
Open a PDF from a file path or an in-memory binary.
A binary beginning with "%PDF" is treated as document bytes; any other binary
is treated as a file path. (A few PDFs carry junk bytes before the header; pass
those as an explicit path, or strip the leading bytes.)
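Stripping the leading bytes could be done along these lines. This is a sketch under assumptions: PdfBytes, strip_leading_junk/1, and the :no_pdf_header reason are hypothetical names, and the 1024-byte search window is an assumed limit rather than anything ExPdfium specifies.

```elixir
defmodule PdfBytes do
  # Assumed search window; the header is expected near the start of the file.
  @window 1024

  # Drop any bytes that precede the first "%PDF" marker so the binary is
  # recognised as document bytes rather than as a file path.
  def strip_leading_junk(bytes) when is_binary(bytes) do
    scope = {0, min(byte_size(bytes), @window)}

    case :binary.match(bytes, "%PDF", scope: scope) do
      {0, _len} ->
        {:ok, bytes}

      {offset, _len} ->
        {:ok, binary_part(bytes, offset, byte_size(bytes) - offset)}

      :nomatch ->
        # Hypothetical error atom, not one returned by ExPdfium.
        {:error, :no_pdf_header}
    end
  end
end
```

The cleaned binary can then be passed to ExPdfium.open/2 as in-memory document bytes.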
Options
- :password - password for an encrypted PDF (default nil)
Errors
Returns {:error, reason} where reason is one of:
- :enoent - the path does not exist
- :invalid_pdf - the bytes are not a parseable PDF
- :password_error - the document is encrypted and the password was missing or incorrect
- :unsupported_security - unsupported encryption/security handler
- :file_error / :io_error / :open_failed - other read/open failures
- :bad_source - internal: malformed source argument (e.g. a non-UTF-8 path)
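Callers will often want to turn these reasons into user-facing messages. A possible sketch follows; OpenErrors and describe/1 are hypothetical names, and the message wording is illustrative only.

```elixir
defmodule OpenErrors do
  # Map the documented open/2 error reasons to short messages.
  def describe(:enoent), do: "file not found"
  def describe(:invalid_pdf), do: "not a parseable PDF"
  def describe(:password_error), do: "missing or incorrect password"
  def describe(:unsupported_security), do: "unsupported encryption"

  def describe(reason) when reason in [:file_error, :io_error, :open_failed],
    do: "could not read or open the file"

  def describe(:bad_source), do: "malformed source argument"

  # The spec allows any atom, so keep a fallback for unlisted reasons.
  def describe(other) when is_atom(other), do: "unexpected error: #{other}"
end
```

A typical call site would match {:error, reason} from ExPdfium.open/2 and pass reason to OpenErrors.describe/1.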
@spec page_count(ExPdfium.Document.t()) :: {:ok, non_neg_integer()} | {:error, :document_closed | :lock_poisoned}
Number of pages in the document.
Returns {:error, :document_closed} if the document has been closed with
close/1.
Rendering
@spec render_page(ExPdfium.Document.t(), non_neg_integer(), keyword()) :: {:ok, ExPdfium.Bitmap.t()} | {:error, atom()}
Render a 0-indexed page to an ExPdfium.Bitmap (an uncompressed 4-channel
pixel buffer).
Options
Sizing (highest precedence first; the default is dpi: 72):
- :width and/or :height - output size in pixels (aspect-preserving if only one is given)
- :scale - multiple of the natural size (1.0 == 72 DPI)
- :dpi - dots per inch (e.g. 150, 300)
Other:
- :format - :rgba (default) or :bgra (pdfium's native order, no conversion)
- :background - :white (default) or :transparent
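The relationship between :dpi, :scale, and output size can be worked through as plain arithmetic. The sketch below assumes the page size is known in PDF points and that pixel dimensions are rounded to the nearest integer; RenderSize is a hypothetical module, and ExPdfium's own rounding may differ.

```elixir
defmodule RenderSize do
  # A scale of 1.0 corresponds to 72 DPI (one pixel per PDF point).
  @points_per_inch 72

  # Equivalent :scale value for a given :dpi.
  def scale_for_dpi(dpi) when is_number(dpi) and dpi > 0, do: dpi / @points_per_inch

  # Estimated pixel dimensions for a page of the given size in points.
  # Rounding to the nearest pixel is an assumption of this sketch.
  def pixels({width_pt, height_pt}, dpi) do
    scale = scale_for_dpi(dpi)
    {round(width_pt * scale), round(height_pt * scale)}
  end
end
```

For a US Letter page (612 x 792 points), 300 DPI works out to roughly 2550 x 3300 pixels, or about 33.7 MB of 4-channel pixel data.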
Bitmap layout
data is width * height * 4 bytes, row-major, stride (== width * 4) bytes
per row, 8 bits per channel. Hand it straight to Vix/Image:
{:ok, %ExPdfium.Bitmap{data: data, width: w, height: h}} =
  ExPdfium.render_page(doc, 0, dpi: 300)
{:ok, image} = Vix.Vips.Image.new_from_binary(data, w, h, 4, :VIPS_FORMAT_UCHAR)
Errors
- :page_out_of_bounds - no such page index
- :document_closed - the document was closed
- :unsupported_format / :unsupported_background - bad option value
- :render_failed - pdfium failed to render the page
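The bitmap layout described above (row-major, 4 bytes per pixel, stride of width * 4) makes single-pixel access a matter of computing an offset. A minimal sketch, with BitmapPixels as a hypothetical module that takes the raw data, width, and height rather than the struct so it stays self-contained:

```elixir
defmodule BitmapPixels do
  @channels 4

  # Return the four channel bytes of the pixel at column x, row y.
  # With format: :rgba the tuple is {r, g, b, a}; with :bgra it is {b, g, r, a}.
  def pixel(data, width, height, x, y)
      when is_binary(data) and x >= 0 and x < width and y >= 0 and y < height do
    stride = width * @channels
    offset = y * stride + x * @channels
    <<c0, c1, c2, c3>> = binary_part(data, offset, @channels)
    {c0, c1, c2, c3}
  end
end
```

Coordinates outside the bitmap raise a FunctionClauseError in this sketch instead of returning an error tuple.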
Text & search
@spec extract_text(ExPdfium.Document.t()) :: {:ok, String.t()} | {:error, atom()}
Extract the plain text of the whole document. Pages are joined by a form-feed
("\f") character. Returns {:error, :document_closed} if the document has
been closed.
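Because pages are joined by a form-feed, the whole-document text can be split back into per-page strings. This is a small sketch with a hypothetical PageText module; it assumes no page's own text contains a form-feed character, which would otherwise be indistinguishable from a page break.

```elixir
defmodule PageText do
  # Split whole-document text back into one string per page.
  def split_pages(text) when is_binary(text), do: String.split(text, "\f")
end
```

An empty string in the result corresponds to a page with no text. When exact per-page text matters, calling extract_text/2 for each page index avoids the ambiguity.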
@spec extract_text(ExPdfium.Document.t(), non_neg_integer()) :: {:ok, String.t()} | {:error, atom()}
Extract the plain text of a 0-indexed page.
Returns {:error, :document_closed} or {:error, :page_out_of_bounds} as
appropriate. A page with no text returns {:ok, ""}.
@spec search_text(ExPdfium.Document.t(), non_neg_integer(), String.t(), keyword()) :: {:ok, [%{text: String.t(), rects: [bounds()]}]} | {:error, atom()}
Search a page for query, returning the matches.
Each match is %{text: String.t(), rects: [t:bounds/0]} — a match can span more
than one rect when it wraps across lines.
Options
- :match_case - case-sensitive (default false)
- :whole_word - match whole words only (default false)
An empty query returns {:error, :empty_query}.
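Since a match can carry several rects, code that highlights results usually needs either the flat list of rects or the subset of matches that wrap. A sketch over the documented match shape follows; SearchHits is a hypothetical module, and rects are treated as opaque values here.

```elixir
defmodule SearchHits do
  # Matches that span more than one rect, i.e. that wrap across lines.
  def wrapped(matches), do: Enum.filter(matches, &(length(&1.rects) > 1))

  # Every rect from every match, in match order, e.g. for drawing highlights.
  def all_rects(matches), do: Enum.flat_map(matches, & &1.rects)
end
```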
@spec text_segments(ExPdfium.Document.t(), non_neg_integer()) :: {:ok, [%{text: String.t(), bounds: bounds()}]} | {:error, atom()}
Return the page's text as runs (segments), each with its bounding box.
Each element is %{text: String.t(), bounds: t:bounds/0}. Bounds are in PDF
points (see bounds/0).
Metadata & geometry
Forms & annotations
@spec annotations(ExPdfium.Document.t(), non_neg_integer()) :: {:ok, [map()]} | {:error, atom()}
Return the annotations on a 0-indexed page, in page order.
Each annotation is:
%{
  type: atom(),                # the PDF /Subtype, e.g. :text, :highlight,
                               # :link, :widget, :ink, :stamp, :free_text…
  bounds: t:bounds/0 | nil,    # the annotation rectangle, in PDF points
  contents: String.t() | nil,  # the /Contents text
  name: String.t() | nil,      # the annotation's /NM name (not a field name)
  hidden: boolean(),
  printed: boolean()
}
Widget annotations (form-field controls) are listed alongside markup
annotations; use form_fields/1 to read their field values. A page with no
annotations returns {:ok, []}.
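Because widgets and markup annotations arrive in one list, a common first step is to separate them. A minimal sketch over the documented annotation shape, with AnnotationKinds as a hypothetical module:

```elixir
defmodule AnnotationKinds do
  # Split into {widgets, everything_else}, preserving page order in each.
  def split_widgets(annotations) do
    Enum.split_with(annotations, &(&1.type == :widget))
  end

  # Count annotations per /Subtype, e.g. %{highlight: 2, widget: 1}.
  def counts(annotations), do: Enum.frequencies_by(annotations, & &1.type)
end
```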
@spec form_fields(ExPdfium.Document.t()) :: {:ok, [map()]} | {:error, atom()}
Read the document's AcroForm fields, one entry per widget, across all pages.
Each field is:
%{
  name: String.t() | nil,     # the field's /T name
  type: :text | :checkbox | :radio_button | :combo_box | :list_box |
        :push_button | :signature | :unknown,
  value: String.t() | nil,    # text/combo/list value, or the selected on-state of a button group
  checked: boolean() | nil,   # checkbox/radio only; nil for other types
  read_only: boolean(),
  required: boolean(),
  page: non_neg_integer(),    # 0-indexed page the widget sits on
  bounds: t:bounds/0 | nil
}
A checkbox or radio group shares one name across its option widgets, so it
surfaces as one entry per option widget. For these, value is the group's
currently-selected on-state (the same string on every widget in the group),
and checked flags which widget is the selected one — so to find a radio
group's answer, take the value of the entry whose checked is true. A
document with no form returns {:ok, []}.
value and checked are read straight from pdfium without coercion: a
checked checkbox is %{value: "Yes", checked: true}, never flattened to a
string.
Limitations
- This reads a group's selected value, not its available options: pdfium
does not expose per-option export names for checkbox/radio groups. A naive
Map.new(fields, &{&1.name, &1.value}) collapses a group to one entry; to
find a group's answer, take the value of the entry whose checked is true.
- A multi-select list box reports only pdfium's single value string, so
additional selections beyond the first are not surfaced.
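The rule for reading a group's answer can be sketched as a name-to-answer map that groups widgets first instead of letting later entries overwrite earlier ones. FormAnswers is a hypothetical module, and returning nil for a group with no checked widget is an assumption of this sketch.

```elixir
defmodule FormAnswers do
  # Build %{field_name => answer} from the list returned by form_fields/1.
  def answers(fields) do
    fields
    |> Enum.group_by(& &1.name)
    |> Map.new(fn {name, entries} -> {name, answer(entries)} end)
  end

  defp answer(entries) do
    if Enum.any?(entries, &is_boolean(&1.checked)) do
      # Checkbox/radio group: take the value of the widget flagged as checked.
      case Enum.find(entries, &(&1.checked == true)) do
        nil -> nil
        %{value: value} -> value
      end
    else
      # Other field types report checked: nil, so use the value directly.
      hd(entries).value
    end
  end
end
```

Fields whose name is nil all land under the nil key here; callers that care about unnamed widgets would need to handle them separately.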
@spec form_type(ExPdfium.Document.t()) :: {:ok, :none | :acrobat | :xfa_full | :xfa_foreground} | {:error, atom()}
Return which interactive-form technology the document uses.
One of :none, :acrobat (a classic AcroForm), :xfa_full, or
:xfa_foreground (XFA forms). A document with no form returns {:ok, :none}.
XFA caveat
Reading XFA form data requires a pdfium build with the V8 JavaScript engine,
which ExPdfium does not ship. form_fields/1 reads AcroForm fields; for an
:xfa_full document the AcroForm view may be empty or partial.
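One way to act on this caveat is to classify the form type before trusting the output of form_fields/1. The sketch below uses a hypothetical FormSupport module with illustrative return atoms; treating :xfa_foreground the same as :xfa_full is a conservative assumption, since the text above only describes the :xfa_full case.

```elixir
defmodule FormSupport do
  # Classify how far form_fields/1 output can be relied on for a form type.
  def field_coverage(:none), do: :no_form
  def field_coverage(:acrobat), do: :acroform

  # XFA data is not readable without V8, so the AcroForm view may be
  # empty or partial. Covering :xfa_foreground here is an assumption.
  def field_coverage(type) when type in [:xfa_full, :xfa_foreground],
    do: :possibly_partial
end
```

A caller would pass the atom from {:ok, type} = ExPdfium.form_type(doc) and, for :possibly_partial, warn the user rather than report an empty form.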
Diagnostics
@spec pdfium_version() :: String.t()
Return a marker string confirming the native pdfium library loaded and initialized. Useful as a smoke test that the precompiled NIF is healthy.
pdfium exposes no build-version string through its public C API, so this is a fixed confirmation marker rather than a version number.