All notable changes to this project are documented here. The format is based on Keep a Changelog, and the project aims to follow Semantic Versioning.
[0.1.0] - 2026-06-15
First release: a pure-Elixir PDF parsing and lossless surgery engine.
Parsing & extraction
- Lazy dual-AST parser: classic xref tables, PDF 1.5+ xref streams, object streams, Flate + PNG-predictor decoding.
- `PdfEx.open/1`, `page_count/1`, `pages/1`, `extract_text/1,2`.
- Text extraction with positions, fonts, real `/Widths` metrics, and ToUnicode/encoding decoding.
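The parser entry mentions Flate + PNG-predictor decoding. The sketch below shows how PNG predictors (PDF `/Predictor` 10 to 15) can be undone on already-inflated bytes. It is a standalone illustration and not `PdfEx`'s code. The module name `PngPredictorSketch` and its `row_len` / `bpp` arguments are hypothetical. Here `row_len` is the number of data bytes per row, excluding the leading filter-type byte, and `bpp` is bytes per pixel.

```elixir
defmodule PngPredictorSketch do
  # Hypothetical sketch: reverses PNG row filters on inflated data.
  # Each row is one filter-type byte followed by `row_len` data bytes;
  # `bpp` is bytes per pixel (derived from /Colors and /BitsPerComponent).
  import Bitwise

  def decode(data, row_len, bpp) do
    {rows, _last} =
      data
      |> chunk_rows(row_len + 1)
      |> Enum.map_reduce(:binary.copy(<<0>>, row_len), fn <<type, raw::binary>>, prev ->
        row = unfilter(type, raw, prev, bpp)
        {row, row}
      end)

    IO.iodata_to_binary(rows)
  end

  # Split into fixed-size rows; a trailing partial row is ignored.
  defp chunk_rows(data, n), do: for(<<row::binary-size(n) <- data>>, do: row)

  defp unfilter(type, raw, prev, bpp) do
    up = prev |> :binary.bin_to_list() |> List.to_tuple()

    {acc, _i} =
      raw
      |> :binary.bin_to_list()
      |> Enum.reduce({[], 0}, fn x, {acc, i} ->
        # acc holds decoded bytes in reverse, so the byte `bpp` positions back is at index bpp - 1.
        a = if i >= bpp, do: Enum.at(acc, bpp - 1), else: 0
        b = elem(up, i)
        c = if i >= bpp, do: elem(up, i - bpp), else: 0
        {[band(x + predict(type, a, b, c), 0xFF) | acc], i + 1}
      end)

    acc |> Enum.reverse() |> :binary.list_to_bin()
  end

  # PNG filter types: 0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth.
  defp predict(0, _a, _b, _c), do: 0
  defp predict(1, a, _b, _c), do: a
  defp predict(2, _a, b, _c), do: b
  defp predict(3, a, b, _c), do: div(a + b, 2)
  defp predict(4, a, b, c), do: paeth(a, b, c)

  defp paeth(a, b, c) do
    p = a + b - c
    {pa, pb, pc} = {abs(p - a), abs(p - b), abs(p - c)}

    cond do
      pa <= pb and pa <= pc -> a
      pb <= pc -> b
      true -> c
    end
  end
end
```

In a typical xref stream, `/Colors` is 1 and `/BitsPerComponent` is 8, both spec defaults. In that case `row_len` equals `/Columns` and `bpp` is 1.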
Editing
- Structural page ops (`PdfEx.Editor`): insert / delete (lossless free) / reorder, with inherited-attribute materialization on reorder-flatten.
- Run-level text editing (`PdfEx.ContentEdit`): `replace_text/3`, `delete_glyph/2`, `run_text/2`, implemented as token-span patches with width compensation; supports single-byte fonts and Type0 / Identity-H composite fonts.
- Stable per-glyph UIDs and visual position mutation (`PdfEx.Convert.apply_visual_mutation/3`).
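Width compensation can be implemented in several ways. One plausible approach: when a run's text is replaced, a `TJ` displacement absorbs the difference in advance width, so the glyphs that follow stay where they were. This is not necessarily how `PdfEx.ContentEdit` does it. The sketch below is hypothetical (`WidthCompSketch`) and makes these assumptions:
- a single-byte font;
- widths given in the 1/1000 text-space units used by `/Widths`;
- character spacing (`Tc`), word spacing (`Tw`) and horizontal scaling (`Tz`) are ignored.

```elixir
defmodule WidthCompSketch do
  # Hypothetical sketch. `widths` maps a single-byte char code to its /Widths
  # entry (1/1000 of a text-space unit). Tc, Tw and Tz are ignored.

  # A TJ number n moves the next glyph left by n/1000 of the font size, so
  # choosing n = new_advance - old_advance leaves later glyphs where they were.
  def tj_adjustment(old_text, new_text, widths) do
    advance(new_text, widths) - advance(old_text, widths)
  end

  # Assumes new_text contains no "(", ")" or "\" that would need escaping.
  def tj_array(old_text, new_text, widths) do
    n = tj_adjustment(old_text, new_text, widths)
    "[(" <> new_text <> ") " <> Integer.to_string(n) <> "] TJ"
  end

  defp advance(text, widths) do
    for <<code <- text>>, reduce: 0 do
      acc -> acc + Map.fetch!(widths, code)
    end
  end
end
```

As an example, take Helvetica's standard widths (H 722, e 556, l 222, o 556, i 222). Replacing `Hello` with `Hi` yields `[(Hi) -1334] TJ`. The negative number pushes the following text right by 1.334 times the font size, covering the shorter string.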
Projection
- Visual and semantic HTML (`PdfEx.Convert.to_html/2`) with `data-uid` back-references; reverse mapping of edited semantic blocks into per-run text ops (`semantic_ops/3`, `apply_semantic_mutation/3`).
Collaboration
- Supervised per-document editing sessions (`PdfEx.Session`) with a crash-surviving snapshot cache, plain-struct operations (`PdfEx.Op`), and operational transformation (`PdfEx.OT`) for intention-preserving concurrent edits.
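The sketch below is a generic illustration of operational transformation, limited to concurrent plain-text inserts. It does not reproduce `PdfEx.OT`, whose operations are the `PdfEx.Op` structs. The `{:insert, pos, text}` tuples, the `:left`/`:right` tie-break, and the module name `OtSketch` are all assumptions, and deletes are omitted.

```elixir
defmodule OtSketch do
  # Hypothetical sketch: ops are {:insert, pos, text} on a UTF-8 string.
  # transform(a, b, side) rewrites `a` so it can be applied after concurrent `b`.
  # `side` breaks ties at equal positions: :left keeps `a` before `b`'s text.
  def transform({:insert, p, t}, {:insert, q, u}, side) do
    if p < q or (p == q and side == :left) do
      {:insert, p, t}
    else
      {:insert, p + String.length(u), t}
    end
  end

  # Positions count graphemes, matching String.split_at/2 and String.length/1.
  def apply_op(doc, {:insert, p, t}) do
    {left, right} = String.split_at(doc, p)
    left <> t <> right
  end
end
```

Each site applies its own op first, then the other site's op transformed against it. The tie-break must be assigned consistently across sites (for example by site id) so both sides converge. For `"abc"` with concurrent `{:insert, 1, "X"}` and `{:insert, 1, "Y"}`, both sites end with `"aXYbc"`.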
Serialization
- Incremental-first serializer (`PdfEx.Serializer`): byte-exact round-trip on unmodified documents, with the xref style matched to the source; opt-in full re-serialization (`mode: :full`) that writes a single clean revision and is not byte-lossless.
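When the source uses classic xref tables, an incremental update appends the new objects followed by a new xref section. A classic xref entry has a fixed 20-byte layout. The sketch below formats one subsection. It is a hypothetical sketch (`XrefSketch`), not the serializer's code, and it omits the `trailer` (with `/Prev`) and `startxref` that must follow.

```elixir
defmodule XrefSketch do
  # Hypothetical sketch of a classic cross-reference subsection.
  # Every entry is exactly 20 bytes: a 10-digit field, a space, a 5-digit
  # generation number, a space, "n" (in use) or "f" (free), and a 2-byte EOL.
  # For in-use entries the first field is a byte offset; for free entries it
  # is the next free object number.
  def entry(offset, gen, type) when type in [:n, :f] do
    off = offset |> Integer.to_string() |> String.pad_leading(10, "0")
    g = gen |> Integer.to_string() |> String.pad_leading(5, "0")
    "#{off} #{g} #{type}\r\n"
  end

  # Subsection header "first count", then one entry per consecutive object number.
  def subsection(first_obj, entries) do
    header = "#{first_obj} #{length(entries)}\n"
    header <> Enum.map_join(entries, fn {off, gen, type} -> entry(off, gen, type) end)
  end
end
```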
Fonts
- TrueType glyph-retaining subset surgery (`PdfEx.Font.Surgery`) with composite-glyph closure and recomputed checksums.
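The "recomputed checksums" in TrueType/OpenType fonts follow two rules:
- A table checksum is the sum of the table's big-endian 32-bit words, zero-padded to a 4-byte boundary, taken modulo 2^32.
- `head.checkSumAdjustment` is `0xB1B0AFBA` minus the checksum of the whole font, computed with that field set to zero.

The sketch below computes both values. It is hypothetical (`TtfChecksumSketch`) and omits locating and patching the `head` table.

```elixir
defmodule TtfChecksumSketch do
  # Hypothetical sketch of the TrueType/OpenType checksum rules.
  import Bitwise

  @mask 0xFFFFFFFF

  # Sum of big-endian uint32 words, zero-padding the tail to 4 bytes, mod 2^32.
  def table_checksum(data) do
    pad = rem(4 - rem(byte_size(data), 4), 4)
    padded = data <> :binary.copy(<<0>>, pad)

    for <<word::big-unsigned-integer-size(32) <- padded>>, reduce: 0 do
      acc -> band(acc + word, @mask)
    end
  end

  # Value for head.checkSumAdjustment; `font` must have that field set to 0.
  def checksum_adjustment(font) do
    band(0xB1B0AFBA - table_checksum(font), @mask)
  end
end
```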
Robustness
- Hardening against hostile and edge-case input, covering atom-table exhaustion, nesting-depth bombs, circular xref/`/Length` chains, unbounded xref-stream ranges, malformed positioning operands, CR/LF escaping in re-serialized strings, spec-legal real number forms, huge-float serialization, and refc binary pinning.
- Real-PDF corpus harness and a deterministic fuzz suite.
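CR/LF escaping matters because of how readers handle literal strings. Inside a PDF literal string, an unescaped end-of-line (CR, LF, or CR LF) is read back as a single LF, so raw CR bytes do not survive a round trip. The sketch below escapes strings on re-serialization. It is hypothetical (`PdfStringSketch`) and not the serializer's code. It always escapes `(`, `)` and `\`, which is legal even when parentheses are balanced.

```elixir
defmodule PdfStringSketch do
  # Hypothetical sketch: encode raw bytes as a PDF literal string.
  # Unescaped CR or CR LF inside a literal string is read back as a single LF,
  # so CR and LF are written as \r and \n escapes to survive a round trip.
  def encode_literal(bytes) do
    body =
      for <<byte <- bytes>>, into: "" do
        case byte do
          0x5C -> "\\\\"  # backslash
          0x28 -> "\\("   # left parenthesis
          0x29 -> "\\)"   # right parenthesis
          0x0D -> "\\r"   # carriage return
          0x0A -> "\\n"   # line feed
          other -> <<other>>
        end
      end

    "(" <> body <> ")"
  end
end
```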