Pdf.Reader.CID.Codespace (ExPDF v1.0.1)
Variable-length codespace-aware tokenizer for predefined CMap byte sequences.
Per PDF 1.7 § 9.7.6, byte sequences are matched against codespace ranges grouped by code length (1 to 4 bytes). The shortest matching length wins. When no length matches, the leading byte is silently dropped and matching resumes at the next byte.
Spec references
- PDF 1.7 (ISO 32000-1) § 9.7.6 — Codespace ranges: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
- Adobe Tech Note #5099 — CMap and CIDFont Files Specification
Types
@type codespaces() :: %{required(1..4) => [{non_neg_integer(), non_neg_integer()}]}
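As a rough illustration of the shape this type describes, the map below keys each list of `{low, high}` pairs by code length in bytes. The concrete ranges are illustrative values chosen for this example (a mixed 1-byte and 2-byte layout), not data shipped with ExPDF.

```elixir
# Hypothetical codespace table; the ranges are illustrative, not taken from ExPDF.
codespaces = %{
  # 1-byte codes: 0x00..0x80 and 0xA0..0xDF
  1 => [{0x00, 0x80}, {0xA0, 0xDF}],
  # 2-byte codes, written as big-endian integers
  2 => [{0x8140, 0x9FFC}, {0xE040, 0xFCFC}]
}

# Sanity check: every key is a length in 1..4 and every pair has low <= high.
true =
  Enum.all?(codespaces, fn {len, ranges} ->
    len in 1..4 and Enum.all?(ranges, fn {lo, hi} -> lo <= hi end)
  end)
```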
Functions
@spec tokenize(binary(), codespaces()) :: [non_neg_integer()]
Tokenize a binary into a list of integer codes per codespace ranges.
Tries prefixes of increasing length, 1 byte first and 4 bytes last, and accepts
the first one that falls inside a codespace range registered for that length.
On a hit, appends the big-endian decoded integer to the result and recurses on
the remainder. If no length from 1 to 4 matches, drops the first byte and recurses.
Returns the codes in input order as a list of non-negative integers, each decoded big-endian.
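The behaviour described above can be sketched as follows. This is a minimal re-implementation for illustration, not ExPDF's source: the module name `CodespaceSketch` and its helpers are hypothetical, and it assumes a prefix is "in range" when its big-endian integer value lies between `low` and `high` inclusive. The real module may compare ranges differently, for example byte by byte.

```elixir
defmodule CodespaceSketch do
  # Hypothetical module for illustration only; not part of ExPDF.

  # Tokenize `bin` into integer codes, trying code lengths 1, 2, 3, 4 in order.
  def tokenize(bin, codespaces) when is_binary(bin) and is_map(codespaces) do
    do_tokenize(bin, codespaces, [])
  end

  defp do_tokenize(<<>>, _codespaces, acc), do: Enum.reverse(acc)

  defp do_tokenize(bin, codespaces, acc) do
    case match_shortest(bin, codespaces) do
      {:ok, code, rest} ->
        do_tokenize(rest, codespaces, [code | acc])

      :error ->
        # No length matched: drop one byte and retry from the next one.
        <<_dropped, rest::binary>> = bin
        do_tokenize(rest, codespaces, acc)
    end
  end

  # Returns {:ok, code, rest} for the shortest length whose prefix is in range,
  # or :error when none of the lengths 1..4 produces a match.
  defp match_shortest(bin, codespaces) do
    Enum.find_value(1..4, :error, fn len ->
      with <<code::unsigned-big-integer-size(len)-unit(8), rest::binary>> <- bin,
           true <- in_any_range?(code, Map.get(codespaces, len, [])) do
        {:ok, code, rest}
      else
        _ -> nil
      end
    end)
  end

  # Assumption: a code is in range when low <= code <= high as whole integers.
  defp in_any_range?(code, ranges) do
    Enum.any?(ranges, fn {lo, hi} -> code >= lo and code <= hi end)
  end
end

# 0x41 matches as a 1-byte code, <<0x81, 0x40>> as a 2-byte code,
# 0xFF matches nothing and is dropped, 0x42 matches as a 1-byte code.
codespaces = %{1 => [{0x00, 0x80}], 2 => [{0x8140, 0x9FFC}]}
[0x41, 0x8140, 0x42] = CodespaceSketch.tokenize(<<0x41, 0x81, 0x40, 0xFF, 0x42>>, codespaces)
```

The accumulator is built in reverse and flipped once at the end, which keeps the recursion tail-recursive. Because shorter lengths are tried first, overlapping 1-byte and 2-byte ranges would always resolve to the 1-byte code in this sketch.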