All notable changes to this project are documented here. The format is based on Keep a Changelog, and the project aims to follow Semantic Versioning.

[0.1.0] - 2026-06-15

First release: a pure-Elixir PDF parsing and lossless surgery engine.

Parsing & extraction

  • Lazy dual-AST parser: classic xref tables, PDF 1.5+ xref streams, object streams, Flate + PNG-predictor decoding.
  • PdfEx.open/1, page_count/1, pages/1, extract_text/1,2.
  • Text extraction with positions, fonts, real /Widths metrics, and ToUnicode/encoding decoding.
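
For readers unfamiliar with the PNG-predictor step named above, the following is a minimal sketch of undoing PNG filter types 0 (None) and 2 (Up) on data that has already been inflated (for example with :zlib.uncompress/1). Up is the filter most commonly seen in xref streams. PredictorSketch and decode/2 are hypothetical names for illustration, not PdfEx's API, and the other filter types (Sub, Average, Paeth) are deliberately left out.

```elixir
defmodule PredictorSketch do
  # Hypothetical helper for illustration; not part of PdfEx.
  # `columns` is the decoded row width in bytes. Every encoded row
  # carries one leading filter-type byte followed by `columns` bytes.
  def decode(data, columns) when is_binary(data) and columns > 0 do
    # The row "above" the first row is defined to be all zeros.
    zero_row = :binary.copy(<<0>>, columns)
    do_decode(data, columns, zero_row, [])
  end

  defp do_decode(<<>>, _columns, _prior, acc) do
    acc |> Enum.reverse() |> IO.iodata_to_binary()
  end

  defp do_decode(data, columns, prior, acc) do
    case data do
      <<filter, row::binary-size(columns), rest::binary>> ->
        decoded = unfilter(filter, row, prior)
        do_decode(rest, columns, decoded, [decoded | acc])

      _ ->
        raise(ArgumentError, "truncated predictor row")
    end
  end

  # Filter 0 (None): the row is stored as-is.
  defp unfilter(0, row, _prior), do: row

  # Filter 2 (Up): each byte is stored as the difference from the byte
  # directly above it, so decoding adds the prior row back, modulo 256.
  defp unfilter(2, row, prior) do
    Enum.zip(:binary.bin_to_list(row), :binary.bin_to_list(prior))
    |> Enum.map(fn {byte, above} -> rem(byte + above, 256) end)
    |> :binary.list_to_bin()
  end

  defp unfilter(other, _row, _prior) do
    raise(ArgumentError, "unsupported PNG filter type #{other}")
  end
end
```

With a row width of 3, PredictorSketch.decode(<<2, 1, 2, 3, 2, 1, 1, 1>>, 3) returns <<1, 2, 3, 2, 3, 4>>: the second row is rebuilt by adding the first decoded row to it.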

Editing

  • Structural page ops (PdfEx.Editor): insert, delete (lossless: the object is marked free), and reorder, with inherited-attribute materialization on reorder-flatten.
  • Run-level text editing (PdfEx.ContentEdit): replace_text/3, delete_glyph/2, run_text/2 — token-span patches with width compensation; single-byte fonts and Type0 / Identity-H composite fonts.
  • Stable per-glyph UIDs and visual position mutation (PdfEx.Convert.apply_visual_mutation/3).

Projection

  • Visual and semantic HTML (PdfEx.Convert.to_html/2) with data-uid back-references; reverse mapping of edited semantic blocks into per-run text ops (semantic_ops/3, apply_semantic_mutation/3).

Collaboration

  • Supervised per-document editing sessions (PdfEx.Session) with a crash-surviving snapshot cache, plain-struct operations (PdfEx.Op), and operational transformation (PdfEx.OT) for intention-preserving concurrent edits.

Serialization

  • Incremental-first serializer (PdfEx.Serializer): byte-exact round-trip on unmodified documents, xref style matched to the source; opt-in full re-serialization (mode: :full, a single clean revision, not byte-lossless).

Fonts

  • TrueType glyph-retaining subset surgery (PdfEx.Font.Surgery) with composite-glyph closure and recomputed checksums.
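
As background for the "recomputed checksums" above, this is a minimal sketch of the standard TrueType table checksum: the table is zero-padded to a multiple of four bytes and summed as big-endian 32-bit words, modulo 2^32. ChecksumSketch and table_checksum/1 are hypothetical names for illustration and do not describe how PdfEx.Font.Surgery is implemented.

```elixir
defmodule ChecksumSketch do
  # Hypothetical helper for illustration; not part of PdfEx.
  # Returns the unsigned 32-bit sum of the table's big-endian words.
  def table_checksum(table) when is_binary(table) do
    # Pad with zero bytes up to the next 4-byte boundary (0..3 bytes).
    pad = rem(4 - rem(byte_size(table), 4), 4)
    padded = <<table::binary, :binary.copy(<<0>>, pad)::binary>>
    sum_words(padded, 0)
  end

  # Consume one 32-bit big-endian word at a time, wrapping at 2^32.
  defp sum_words(<<word::unsigned-big-32, rest::binary>>, acc) do
    sum_words(rest, rem(acc + word, 0x1_0000_0000))
  end

  defp sum_words(<<>>, acc), do: acc
end
```

For example, ChecksumSketch.table_checksum(<<0, 0, 0, 1, 0, 0, 0, 2>>) returns 3. A real subsetter would also have to handle the head table specially, since its checksum is computed with the checkSumAdjustment field treated as zero.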

Robustness

  • Hardened against hostile input: atom-table exhaustion, nesting-depth bombs, circular xref and /Length chains, unbounded xref-stream ranges, malformed positioning operands, CR/LF escaping in re-serialized strings, spec-legal real number forms, huge-float serialization, and refc binary pinning.
  • Real-PDF corpus harness and a deterministic fuzz suite.