View Source Pdf.Reader.CID.Decoder (ExPDF v1.0.1)
CID font decoder for Type0/Identity-H and Identity-V composite fonts.
Returns a decoder_fn() closure with the same contract as the simple-font
decoder: (binary() -> {String.t(), [{non_neg_integer(), binary()}]}).
Resolution cascade (per CID)
- ToUnicode CMap — if the font has a
/ToUnicodestream, itsbf_char/bf_rangeentries are checked first (most specific). - Adobe registry table —
/CIDSystemInfo /Orderingmaps to one of the four bundled collection modules (AdobeJapan1,AdobeCNS1,AdobeKorea1,AdobeGB1). O(1) pattern-match dispatch. - U+FFFD fallback — unresolved CIDs yield
U+FFFDplus a sentinel tuple{idx, "cid:0xHHHH"}appended to the unresolved list.
__test_cmap__ shortcut
For unit tests, a pre-parsed %Pdf.Reader.CMap{} can be injected by storing
it in the font dict under the key "__test_cmap__". This bypasses stream
resolution. (Mirrors the same shortcut in Pdf.Reader.Font.)
Width / advance computation
This module handles character decoding only (bytes → Unicode text). Glyph
advance widths (/W and /DW entries on the DescendantFonts[0] dict) are read
separately by Pdf.Reader.Font.Widths (§ 9.7.4.3). The two concerns are
intentionally kept in separate modules: decoding and advance computation are
independent of each other.
Spec references
- PDF 1.7 (ISO 32000-1) § 9.7 — Composite Fonts: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
- PDF 1.7 § 9.7.4 — CIDFonts
- PDF 1.7 § 9.7.4.3 — /W and /DW arrays (handled by
Pdf.Reader.Font.Widths) - PDF 1.7 § 9.7.5 — CMaps (Identity-H, Identity-V predefined)
Summary
Functions
Build a CID decoder closure from a Type0 font dict.
Build a predefined CMap decoder closure from a Type0 font dict whose
/Encoding names a bundled predefined CMap (e.g. UniJIS-UTF16-H).
Types
@type decoder_fn() :: (binary() -> {String.t(), [{non_neg_integer(), binary()}]})
Functions
@spec build(map(), Pdf.Reader.Document.t()) :: {:ok, decoder_fn(), Pdf.Reader.Document.t()} | {:error, term()}
Build a CID decoder closure from a Type0 font dict.
font_dict is the top-level Type0 font dictionary (already resolved).
Reads DescendantFonts, CIDSystemInfo, CIDToGIDMap, and ToUnicode.
Returns {:ok, decoder_fn, updated_doc}.
@spec build_predefined(map(), Pdf.Reader.Document.t()) :: {:ok, decoder_fn(), Pdf.Reader.Document.t()} | {:error, term()}
Build a predefined CMap decoder closure from a Type0 font dict whose
/Encoding names a bundled predefined CMap (e.g. UniJIS-UTF16-H).
Resolution cascade per code token (PDF 1.7 § 9.7.5, D9):
- ToUnicode CMap — if present, checked first (most specific).
- Predefined CMap —
cidchar→cidrange→notdeflookup. - Adobe registry table — CID → Unicode via AdobeJapan1/CNS1/Korea1/GB1.
- U+FFFD fallback — unresolved codes yield
U+FFFD+ sentinel.
Returns {:ok, decoder_fn, updated_doc}.