PdfEx.Convert (pdf_ex v0.1.0)


HTML projections of a document, and the reverse (HTML edit → PDF text ops).

Two modes, both built off the glyph index:

  • :visual is a byte-faithful absolute layout with one <span data-uid=...> per glyph, positioned at y_html = mediabox_height - y_pdf - font_size.
  • :semantic clusters glyphs into rows by y-band and classifies the rows into <h1>/<h2>/<li>/<p> blocks that carry data-uids ranges (heuristic; best-effort for rotated text).
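The :visual coordinate flip can be sketched as a small pure function. PDF default user space measures y upward from the bottom of the page, while HTML absolute positioning measures downward from the top, which is why the formula subtracts from the MediaBox height. The module and function names below are hypothetical and not part of PdfEx; only the formula itself comes from the documentation above.

```elixir
defmodule VisualCoordSketch do
  @moduledoc false

  # Hypothetical helper, not part of PdfEx: mirrors the documented formula
  # y_html = mediabox_height - y_pdf - font_size.
  def y_html(mediabox_height, y_pdf, font_size) do
    mediabox_height - y_pdf - font_size
  end
end

# A 12 pt glyph at y = 700 on a 792 pt tall (US Letter) page.
IO.inspect(VisualCoordSketch.y_html(792, 700, 12))
```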

Reverse mapping turns an edited semantic block back into per-run PdfEx.Op.UpdateText ops (semantic_ops/3), or plans and applies them in one step (apply_semantic_mutation/3). apply_visual_mutation/3 repositions the run containing one glyph; when the positioning operator is a Tm, its delta is applied to the matrix's translation components only (a documented limitation for scaled or rotated matrices).
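To illustrate the Tm limitation, here is a minimal sketch of a translation-only update. A Tm operator takes six operands, a b c d e f, where e and f are the translation and a through d carry scale, rotation, and skew. The helper below is hypothetical and not PdfEx's implementation; it only shows that the linear part is left untouched.

```elixir
defmodule TmDeltaSketch do
  @moduledoc false

  # Hypothetical helper, not part of PdfEx. A Tm operand list is
  # [a, b, c, d, e, f]; only the translation components e and f are shifted.
  def translate([a, b, c, d, e, f], {dx, dy}) do
    [a, b, c, d, e + dx, f + dy]
  end
end

# Unscaled matrix: only e and f change.
IO.inspect(TmDeltaSketch.translate([1, 0, 0, 1, 72, 700], {10, -5}))
# 2x-scaled matrix: a..d are still left as they were.
IO.inspect(TmDeltaSketch.translate([2, 0, 0, 2, 72, 700], {10, -5}))
```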

Functions

apply_semantic_mutation(doc, uids, new_text)

@spec apply_semantic_mutation(PdfEx.Document.t(), [binary()] | binary(), String.t()) ::
  {:ok, PdfEx.Document.t()} | {:error, PdfEx.Error.t()}

Applies the semantic_ops/3 plan. Ops are applied in descending run order so each op's uid stays valid: editing a run renumbers only the glyph UIDs that follow it, so earlier runs (applied later) are unaffected (spec D4).
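The ordering argument can be seen on a plain list. In this sketch, which assumes a simplified model rather than PdfEx's actual data structures, each edit is a hypothetical {start_index, delete_count, replacement} tuple against a list of glyphs. Applying edits from the highest start index down means a length change never shifts the indices that remaining edits refer to.

```elixir
defmodule DescendingApplySketch do
  @moduledoc false

  # Hypothetical model, not PdfEx's data structures: each edit is
  # {start_index, delete_count, replacement_graphemes} against a glyph list.
  def apply_edits(glyphs, edits) do
    edits
    # Highest start index first, so earlier indices are still valid later.
    |> Enum.sort_by(fn {start, _count, _new} -> start end, :desc)
    |> Enum.reduce(glyphs, fn {start, count, new}, acc ->
      {head, rest} = Enum.split(acc, start)
      head ++ new ++ Enum.drop(rest, count)
    end)
  end
end

glyphs = String.graphemes("Hello big world")

edits = [
  {0, 5, String.graphemes("Goodbye")},
  {10, 5, String.graphemes("moon")}
]

IO.puts(Enum.join(DescendingApplySketch.apply_edits(glyphs, edits)))
```

Applying the same two edits in ascending order would require re-offsetting the second edit by the length change of the first.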

apply_visual_mutation(doc, uid, map)

@spec apply_visual_mutation(PdfEx.Document.t(), binary(), %{x: number(), y: number()}) ::
  {:ok, PdfEx.Document.t()} | {:error, PdfEx.Error.t()}

Moves the span containing uid so that the glyph with that uid lands at the given x/y.

Token-span patch: rewrites only the nearest preceding Td/TD/Tm operands in the content stream (no regeneration); marks only that /Contents object dirty. Equal-position mutations are no-ops (dirty_objects untouched).
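A token-span patch of this kind might look like the sketch below. It assumes a hypothetical token model (a flat list with numeric operands and string operators) and handles only Td for brevity; PdfEx's tokenizer, its TD/Tm handling, and its dirty-object tracking are not shown. The :unchanged result stands in for the documented no-op on equal positions.

```elixir
defmodule TokenPatchSketch do
  @moduledoc false

  # Hypothetical token model, not PdfEx's: operands are numbers and
  # operators are strings. Only Td is handled here for brevity.
  def patch_td(tokens, index, {x, y}) do
    case find_td(tokens, index) do
      nil ->
        {:error, :no_td_before_index}

      td_at ->
        current = {Enum.at(tokens, td_at - 2), Enum.at(tokens, td_at - 1)}

        if current == {x, y} do
          # Equal position: report a no-op so the caller marks nothing dirty.
          {:unchanged, tokens}
        else
          patched =
            tokens
            |> List.replace_at(td_at - 2, x)
            |> List.replace_at(td_at - 1, y)

          {:patched, patched}
        end
    end
  end

  # Index of the last "Td" at or before `index` that has two operand slots.
  defp find_td(tokens, index) do
    tokens
    |> Enum.take(index + 1)
    |> Enum.with_index()
    |> Enum.reverse()
    |> Enum.find_value(fn {tok, i} -> if tok == "Td" and i >= 2, do: i end)
  end
end

tokens = ["BT", 72, 700, "Td", "(Hi)", "Tj", "ET"]
IO.inspect(TokenPatchSketch.patch_td(tokens, 5, {100, 650}))
IO.inspect(TokenPatchSketch.patch_td(tokens, 5, {72, 700}))
```

Only the two operand slots are replaced; every other token in the stream is passed through untouched.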

semantic_ops(doc, uids, new_text)

@spec semantic_ops(PdfEx.Document.t(), [binary()] | binary(), String.t()) ::
  {:ok, [PdfEx.Op.UpdateText.t()]} | {:error, PdfEx.Error.t()}

Plans the per-run Op.UpdateText ops that turn the block named by uids into new_text, by Myers-diffing new_text against the current text of the block's runs joined together. Does not apply them (see apply_semantic_mutation/3).
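The diffing step can be approximated with the standard library's String.myers_difference/2, which returns an edit script of :eq, :del, and :ins segments. The module below is a hypothetical sketch, not PdfEx's planner: it joins the run texts, diffs them against the edited text, and shows that either side can be rebuilt from the script. Mapping the script's segments back onto individual run boundaries is not shown.

```elixir
defmodule MyersPlanSketch do
  @moduledoc false

  # Hypothetical helper, not part of PdfEx: joins the runs' text and diffs it
  # against the edited text with the standard library's Myers implementation.
  def diff(run_texts, new_text) do
    run_texts
    |> Enum.join()
    |> String.myers_difference(new_text)
  end

  # Rebuilds one side of the diff: pass :del for the old text, :ins for the new.
  def rebuild(script, side) do
    script
    |> Enum.filter(fn {tag, _text} -> tag == :eq or tag == side end)
    |> Enum.map_join(fn {_tag, text} -> text end)
  end
end

runs = ["Total: ", "42 ", "items"]
script = MyersPlanSketch.diff(runs, "Total: 43 items")
IO.inspect(script)
```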

to_html(doc, opts \\ [])

@spec to_html(PdfEx.Document.t(), [{:mode, :visual | :semantic}]) ::
  {:ok, binary()} | {:error, PdfEx.Error.t()}

Renders the document to HTML. mode: :visual (default) is a byte-faithful absolute layout; mode: :semantic emits classified data-uid blocks.