Pdf.Reader.CMap (ExPDF v1.0.1)
Parser for the ToUnicode CMap subset used in PDF fonts.
Spec reference: PDF 1.7 § 9.10.3 and Adobe Tech Note 5099 (CMap and CIDFont Files Specification).
Supported subset
Only beginbfchar/endbfchar and beginbfrange/endbfrange sections
are parsed. Everything else (codespacerange, cidchar, cidrange, notdefchar,
notdefrange, and PostScript prologue/epilogue) is silently skipped.
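To illustrate what "silently skipped" means in practice, here is a minimal sketch that isolates only the two supported section kinds from a small sample CMap. The module `SectionSketch`, its functions, and the regex approach are assumptions made for illustration; the docs do not say how the real parser scans its input.

```elixir
defmodule SectionSketch do
  # Hypothetical helper module, not part of Pdf.Reader.CMap.

  # A small hand-written ToUnicode CMap fragment for demonstration.
  def sample do
    """
    begincmap
    1 begincodespacerange
    <0000> <FFFF>
    endcodespacerange
    2 beginbfchar
    <0003> <0020>
    <0004> <00660069>
    endbfchar
    1 beginbfrange
    <0020> <007E> <0020>
    endbfrange
    endcmap
    """
  end

  # Returns [{kind, body}] for every bfchar/bfrange section, in file order.
  # Anything outside those sections (codespacerange, prologue, ...) never
  # matches the regex and is therefore ignored.
  def sections(cmap_text) do
    regex = ~r/begin(bfchar|bfrange)(.*?)end(?:bfchar|bfrange)/s

    for [_full, kind, body] <- Regex.scan(regex, cmap_text) do
      {kind, String.trim(body)}
    end
  end
end
```

Calling `SectionSketch.sections(SectionSketch.sample())` yields one `"bfchar"` entry and one `"bfrange"` entry; the `codespacerange` block does not appear in the result.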
Data shape
%Pdf.Reader.CMap{
  bf_char: %{integer => String.t()},  # O(log n) map lookup
  bf_range: [{lo, hi, dst}]           # linear scan, dst is String.t() or [String.t()]
}
Lookup order
1. bf_char (O(log n) map): checked first.
2. bf_range (linear, typically < 10 entries): checked on miss.
Returns nil if not mapped by either table.
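The two-step lookup can be sketched as follows. This is not the library's implementation: `LookupSketch` is a hypothetical module that works on a plain map with the documented `bf_char` and `bf_range` keys. How a single-string `dst` is advanced across a range is also an assumption here (the offset is added to the last codepoint, following the usual bfrange convention); the docs above do not spell this out.

```elixir
defmodule LookupSketch do
  # Hypothetical module mirroring the documented data shape with a plain map.

  def lookup(%{bf_char: bf_char, bf_range: bf_range}, code) do
    case Map.fetch(bf_char, code) do
      # Step 1: direct hit in the bf_char map.
      {:ok, str} -> str
      # Step 2: fall back to a linear scan of the ranges.
      :error -> lookup_range(bf_range, code)
    end
  end

  # Returns the first matching range's resolved string, or nil if none match.
  defp lookup_range(ranges, code) do
    Enum.find_value(ranges, fn
      {lo, hi, dst} when code >= lo and code <= hi -> resolve(dst, code - lo)
      _other -> nil
    end)
  end

  # List destination: one string per code in the range.
  defp resolve(dst, offset) when is_list(dst), do: Enum.at(dst, offset)

  # Empty destination: nothing to map to.
  defp resolve("", _offset), do: nil

  # String destination (assumption): add the offset to the last codepoint.
  defp resolve(dst, offset) when is_binary(dst) do
    {init, [last]} = dst |> String.to_charlist() |> Enum.split(-1)
    List.to_string(init ++ [last + offset])
  end
end
```

With `cmap = %{bf_char: %{3 => "fi"}, bf_range: [{0x20, 0x7E, " "}]}`, `LookupSketch.lookup(cmap, 3)` returns `"fi"` from the map, `LookupSketch.lookup(cmap, 0x41)` returns `"A"` from the range, and `LookupSketch.lookup(cmap, 0x100)` returns `nil`.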
UTF-16BE decoding
Destination hex strings in the CMap (<HHHH...>) are UTF-16BE encoded codepoint sequences; source codes are read as integers.
Erlang's :unicode.characters_to_binary/3 converts them to UTF-8 (Elixir String.t()).
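A minimal sketch of that conversion, assuming the angle brackets have already been stripped from the hex string. `HexDecodeSketch.decode_hex/1` is a hypothetical helper, and returning `nil` on malformed input is a choice made for this example rather than documented library behaviour.

```elixir
defmodule HexDecodeSketch do
  # Hypothetical helper, not part of Pdf.Reader.CMap's documented API.
  @spec decode_hex(String.t()) :: String.t() | nil
  def decode_hex(hex) do
    with {:ok, bytes} <- Base.decode16(hex, case: :mixed),
         # characters_to_binary/3 returns a binary on success, or an
         # {:error, ...} / {:incomplete, ...} tuple on bad UTF-16 input.
         utf8 when is_binary(utf8) <-
           :unicode.characters_to_binary(bytes, {:utf16, :big}, :utf8) do
      utf8
    else
      _ -> nil
    end
  end
end
```

For example, `HexDecodeSketch.decode_hex("00660069")` returns `"fi"`, and the surrogate pair `"D83DDE00"` decodes to the single codepoint U+1F600.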
Summary
Functions
Looks up a character code in the CMap.
Parses a ToUnicode CMap binary into a %Pdf.Reader.CMap{} struct.
Types
@type t() :: %Pdf.Reader.CMap{
  bf_char: %{required(non_neg_integer()) => String.t()},
  bf_range: [{non_neg_integer(), non_neg_integer(), String.t() | [String.t()]}]
}
Functions
@spec lookup(t(), non_neg_integer()) :: String.t() | nil
Looks up a character code in the CMap.
Returns the corresponding UTF-8 String.t() or nil if not mapped.
Lookup order: bf_char first (O(log n)), then bf_range (linear scan).
Parses a ToUnicode CMap binary into a %Pdf.Reader.CMap{} struct.
Only bfchar and bfrange sections are extracted.
All other PostScript CMap constructs are skipped silently.
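As a rough illustration of what extracting a bfchar section involves, the sketch below turns the body of one section into a map of the documented `bf_char` shape. `EntrySketch` and its functions are hypothetical names; the real parser's internals and error handling are not shown in these docs, and this version raises on malformed hex.

```elixir
defmodule EntrySketch do
  # Hypothetical helper module, not part of Pdf.Reader.CMap.

  # Takes the text between beginbfchar and endbfchar and returns
  # %{source_code => utf8_string}.
  def bfchar_entries(body) do
    ~r/<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>/
    |> Regex.scan(body, capture: :all_but_first)
    |> Map.new(fn [src, dst] -> {String.to_integer(src, 16), utf16be_hex(dst)} end)
  end

  # Decodes a UTF-16BE hex string to UTF-8.
  defp utf16be_hex(hex) do
    hex
    |> Base.decode16!(case: :mixed)
    |> :unicode.characters_to_binary({:utf16, :big}, :utf8)
  end
end
```

Given the body `"<0003> <0020>\n<0004> <00660069>"`, `EntrySketch.bfchar_entries/1` returns `%{3 => " ", 4 => "fi"}`.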