HTML projections of a document, and the reverse (HTML edit → PDF text ops).
Two modes, both built off the glyph index:
:visual (default): byte-faithful absolute layout with one <span data-uid=...> per glyph, placed at y_html = mediabox_height - y_pdf - font_size.
:semantic: y-band row clustering, with each row classified into an <h1>, <h2>, <li>, or <p> block carrying a data-uids range (heuristic; best-effort for rotated text).
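As a rough illustration of the :visual placement, the sketch below converts a glyph's PDF baseline y into a CSS top offset using the formula above, then wraps the glyph in an absolutely positioned span. The module name VisualSpan, the inline style, and treating one PDF unit as one CSS px are assumptions for illustration, not PdfEx's actual output format.

```elixir
defmodule VisualSpan do
  @moduledoc """
  Hypothetical sketch of :visual placement; the markup shape and the
  1 PDF unit = 1 CSS px scale are assumptions, not PdfEx's output.
  """

  # PDF puts the origin at the bottom-left with y growing upward, while CSS `top`
  # grows downward from the page's top edge. Subtracting font_size moves from the
  # glyph's baseline to (roughly) the top of its em box.
  def html_y(mediabox_height, y_pdf, font_size), do: mediabox_height - y_pdf - font_size

  # One absolutely positioned span per glyph, tagged with the glyph's UID.
  def span(uid, glyph, x_pdf, y_pdf, font_size, mediabox_height) do
    top = html_y(mediabox_height, y_pdf, font_size)
    uid = escape(uid)
    glyph = escape(glyph)

    ~s(<span data-uid="#{uid}" style="position:absolute;left:#{x_pdf}px;top:#{top}px;font-size:#{font_size}px">#{glyph}</span>)
  end

  # Minimal HTML escaping so glyphs like "<" or "&" cannot break the markup.
  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end
```

On a 792-unit-tall page, a 12-unit glyph whose baseline sits at y_pdf = 700 gets top = 792 - 700 - 12 = 80.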
Reverse mapping turns an edited semantic block back into per-run
PdfEx.Op.UpdateText ops (semantic_ops/3), or plans and applies them in one call
(apply_semantic_mutation/3). apply_visual_mutation/3 repositions the run
containing one glyph; when that run is positioned by a Tm matrix, the delta is
added to the matrix's translation components only (a documented limitation for
scaled or rotated matrices).
Functions
@spec apply_semantic_mutation(PdfEx.Document.t(), [binary()] | binary(), String.t()) :: {:ok, PdfEx.Document.t()} | {:error, PdfEx.Error.t()}
Applies the semantic_ops/3 plan. Ops are applied in descending run order so
each op's uid stays valid: editing a run renumbers only the glyph UIDs that
follow it, so earlier runs (applied later) are unaffected (spec D4).
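A toy model of why descending order keeps UIDs valid. It assumes, hypothetically, that a glyph UID is simply the glyph's global index across all runs, so rewriting a run shifts every later UID. Ops are planned against the original runs and resolved to a run only at apply time. DescendingApply and its {uid, new_run_text} op shape are illustrative, not PdfEx's internals.

```elixir
defmodule DescendingApply do
  @moduledoc """
  Hypothetical model: a page is a list of text runs and a glyph UID is the
  glyph's global index across runs, so editing a run renumbers later glyphs.
  """

  # Resolve a glyph UID to the index of the run that currently contains it.
  def run_of(runs, uid) do
    runs
    |> Enum.with_index()
    |> Enum.reduce_while(0, fn {text, i}, start ->
      stop = start + String.length(text)
      if uid < stop, do: {:halt, {:ok, i}}, else: {:cont, stop}
    end)
    |> case do
      {:ok, i} -> {:ok, i}
      _ -> :error
    end
  end

  # Replace the whole run that contains `uid` with `new_text`.
  def apply_op(runs, {uid, new_text}) do
    {:ok, i} = run_of(runs, uid)
    List.replace_at(runs, i, new_text)
  end

  # Highest UID first: each edit only renumbers glyphs after it, and those
  # belong to ops that have already been applied.
  def apply_all(runs, ops) do
    ops
    |> Enum.sort_by(fn {uid, _} -> uid end, :desc)
    |> Enum.reduce(runs, &apply_op(&2, &1))
  end
end
```

With runs ["Hi", "there", "!"] and ops [{0, "Hello"}, {2, "world"}], apply_all/2 yields ["Hello", "world", "!"]. Applying the same ops in ascending order would instead resolve UID 2 inside the already-lengthened first run and overwrite it, giving ["world", "there", "!"].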
@spec apply_visual_mutation(PdfEx.Document.t(), binary(), %{x: number(), y: number()}) :: {:ok, PdfEx.Document.t()} | {:error, PdfEx.Error.t()}
Moves the span containing uid so that glyph lands at the given x/y.
Implemented as a token-span patch: only the operands of the nearest preceding Td, TD, or Tm operator in the content stream are rewritten (the stream is not regenerated), and only that /Contents object is marked dirty. A mutation to the glyph's current position is a no-op and leaves dirty_objects untouched.
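A minimal sketch of the operand rewrite, assuming operators are modelled as {name, operands} tuples and that the glyph's current and target positions are already in the same coordinate space. PositionPatch and its return values are hypothetical; the sketch only mirrors the behaviour described above (translation operands shifted by the delta, equal positions treated as a no-op).

```elixir
defmodule PositionPatch do
  @moduledoc """
  Hypothetical sketch of the operand rewrite behind apply_visual_mutation/3;
  operators are modelled as {name, operands} tuples, not PdfEx structs.
  """

  # `op` is the nearest preceding positioning operator, e.g. {"Td", [tx, ty]} or
  # {"Tm", [a, b, c, d, e, f]}; `current` and `target` are %{x: _, y: _} maps.
  def shift(op, current, target) do
    dx = target.x - current.x
    dy = target.y - current.y

    if dx == 0 and dy == 0 do
      # Same position: nothing is rewritten, so nothing would be marked dirty.
      :noop
    else
      {:patched, translate(op, dx, dy)}
    end
  end

  # Only the translation operands change. The Tm linear part (a, b, c, d) is
  # copied through untouched, so scaling or rotation is not accounted for.
  # Note that in PDF, TD's ty operand also sets the text leading (TL = -ty).
  defp translate({name, [tx, ty]}, dx, dy) when name in ["Td", "TD"],
    do: {name, [tx + dx, ty + dy]}

  defp translate({"Tm", [a, b, c, d, e, f]}, dx, dy),
    do: {"Tm", [a, b, c, d, e + dx, f + dy]}
end
```

Returning :noop for an unchanged position gives the caller a direct signal to skip both the token-span rewrite and the dirty-object bookkeeping.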
@spec semantic_ops(PdfEx.Document.t(), [binary()] | binary(), String.t()) :: {:ok, [PdfEx.Op.UpdateText.t()]} | {:error, PdfEx.Error.t()}
Plans the per-run PdfEx.Op.UpdateText ops that turn the block named by uids into
new_text, by Myers-diffing new_text against the block's current text (its runs
joined). The ops are returned, not applied; apply_semantic_mutation/3 applies them.
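One way to map a diff of the joined text back onto runs, sketched with Elixir's standard String.myers_difference/2, which returns a list of {:eq | :ins | :del, chunk} pairs. The attribution policy here is an assumption: kept and deleted graphemes belong to the run they came from, and an insertion goes to the run owning the current old-text position (or the last run at the end). RunDiff and its {run_index, new_run_text} tuples are hypothetical stand-ins for PdfEx.Op.UpdateText.

```elixir
defmodule RunDiff do
  @moduledoc """
  Hypothetical sketch: distribute a Myers diff of the run-joined text back onto
  the individual runs. {run_index, new_run_text} tuples stand in for
  PdfEx.Op.UpdateText structs.
  """

  def plan(runs, new_text) when runs != [] do
    # Owner run index for every grapheme of the joined current text.
    owners =
      runs
      |> Enum.with_index()
      |> Enum.flat_map(fn {text, i} -> List.duplicate(i, String.length(text)) end)
      |> List.to_tuple()

    last = length(runs) - 1
    # Positions at or past the end of the old text belong to the last run.
    owner = fn pos -> if pos < tuple_size(owners), do: elem(owners, pos), else: last end

    {_pos, outs} =
      runs
      |> Enum.join()
      |> String.myers_difference(new_text)
      |> Enum.reduce({0, Map.new(0..last, &{&1, ""})}, fn
        {:eq, chunk}, acc -> consume(chunk, acc, owner, true)
        {:del, chunk}, acc -> consume(chunk, acc, owner, false)
        # Insertions do not consume old text; attach them to the current owner.
        {:ins, chunk}, {pos, outs} -> {pos, Map.update!(outs, owner.(pos), &(&1 <> chunk))}
      end)

    # Emit an op only for runs whose text actually changed.
    for {text, i} <- Enum.with_index(runs), outs[i] != text, do: {i, outs[i]}
  end

  # Walk the old graphemes of an :eq or :del chunk, keeping (or dropping) each
  # one in the run that owns it, and advance the old-text position.
  defp consume(chunk, acc, owner, keep?) do
    chunk
    |> String.graphemes()
    |> Enum.reduce(acc, fn g, {pos, outs} ->
      outs = if keep?, do: Map.update!(outs, owner.(pos), &(&1 <> g)), else: outs
      {pos + 1, outs}
    end)
  end
end
```

For example, RunDiff.plan(["Hello ", "wrld"], "Hello world") touches only the second run and returns [{1, "world"}].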
@spec to_html(PdfEx.Document.t(), [{:mode, :visual | :semantic}]) :: {:ok, binary()} | {:error, PdfEx.Error.t()}
Renders the document to HTML. With mode: :visual (the default), the output is a
byte-faithful absolute layout; with mode: :semantic, it is a sequence of classified blocks carrying data-uids ranges.
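The :semantic mode's y-band clustering and classification can be approximated as below. This is a hypothetical sketch: the glyph map shape, the 2.0-unit band tolerance, the 1.6x and 1.25x heading thresholds, and the bullet prefixes are all assumptions chosen for illustration, not PdfEx's heuristic.

```elixir
defmodule SemanticRows do
  @moduledoc """
  Hypothetical sketch of :semantic mode: y-band row clustering plus a
  size/prefix heuristic. Glyph map keys and thresholds are assumptions.
  """

  # Glyphs are %{uid: _, x: _, y: _, size: _, text: _} in PDF coordinates.
  # A glyph joins the current row if its baseline is within `tol` of the
  # row's first baseline; otherwise it starts a new row.
  def rows(glyphs, tol \\ 2.0) do
    glyphs
    |> Enum.sort_by(fn g -> {-g.y, g.x} end)
    |> Enum.chunk_while(
      nil,
      fn g, state ->
        case state do
          nil ->
            {:cont, {g.y, [g]}}

          {y0, acc} ->
            if abs(g.y - y0) <= tol,
              do: {:cont, {y0, [g | acc]}},
              else: {:cont, Enum.reverse(acc), {g.y, [g]}}
        end
      end,
      fn
        nil -> {:cont, nil}
        {_y0, acc} -> {:cont, Enum.reverse(acc), nil}
      end
    )
    # Within a row, read glyphs left to right.
    |> Enum.map(fn row -> Enum.sort_by(row, & &1.x) end)
  end

  # The largest font size in the row relative to the body size picks h1/h2;
  # a leading bullet marks a list item; everything else is a paragraph.
  def classify(row, body_size) do
    size = row |> Enum.map(& &1.size) |> Enum.max()
    text = row |> Enum.map_join(& &1.text) |> String.trim()

    tag =
      cond do
        size >= body_size * 1.6 -> "h1"
        size >= body_size * 1.25 -> "h2"
        String.starts_with?(text, ["•", "-", "*"]) -> "li"
        true -> "p"
      end

    # The block carries the first and last glyph UID of the row as its range.
    %{tag: tag, text: text, uids: {hd(row).uid, List.last(row).uid}}
  end
end
```

Banding by y before classifying is what makes the result a heuristic: rotated text has no horizontal baselines, so its glyphs scatter across bands, which is consistent with rotated text being best-effort.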