Parser for git packfiles (v2 / v3).
All decoder paths return {:ok, _} / {:error, _} tuples — no call
on untrusted input ever raises. Memory is bounded via the
:max_pack_bytes, :max_objects, :max_object_bytes, and
:max_resolved_bytes options so a hostile server cannot exhaust
the heap.
Tracked inflate (zlib_inflate_tracked/2)
Git packs store zlib streams concatenated with no length prefix,
so the parser must determine exactly how many input bytes each
stream consumed. Erlang's :zlib module does not expose the
consumed input count directly, so we detect stream completion by
calling :zlib.inflateEnd/1 on a fresh stream after feeding it a
candidate prefix. inflateEnd raises data_error if and only if the
input did not include a proper end-of-stream marker and Adler-32
checksum, so catching that raise gives a clean succeed-or-raise
completeness predicate.
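A minimal sketch of such a completeness probe, assuming the :zlib behaviour described above. The module name PrefixProbeSketch and the argument shape (whole buffer plus a prefix length) are illustrative assumptions, not necessarily how the library's own prefix_complete?/2 is written.

```elixir
defmodule PrefixProbeSketch do
  # Hypothetical helper: true when the first `len` bytes of `data`
  # contain a complete zlib stream, false otherwise. Never raises
  # on malformed input.
  def prefix_complete?(data, len)
      when is_binary(data) and is_integer(len) and len >= 0 and len <= byte_size(data) do
    prefix = binary_part(data, 0, len)
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z)
      # Inflate in bounded chunks and throw the output away.
      drain(z, :zlib.safeInflate(z, prefix))
      # Raises data_error unless the end-of-stream marker was seen.
      :ok = :zlib.inflateEnd(z)
      true
    catch
      # Malformed or truncated input surfaces as an Erlang error.
      :error, _reason -> false
    after
      :zlib.close(z)
    end
  end

  defp drain(z, {:continue, _out}), do: drain(z, :zlib.safeInflate(z, []))
  defp drain(_z, {:finished, _out}), do: :ok
end
```

The `after` clause closes the zlib port on every path, so a failed probe does not leak a stream.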
Implementation structure:
- Phase 1: verified full inflate. Open a zlib stream, feed the
  upper-bounded slice through :zlib.safeInflate/2, and drain all
  output. Verify that the total output bytes equal what the pack
  header declared (expected_size). safeInflate bounds output per
  call, so a zip-bomb input cannot exhaust the heap.
- Phase 2: bisect for the exact boundary. Binary-search the
  smallest prefix length for which prefix_complete?/2 returns
  true. prefix_complete?/2 opens a fresh stream, feeds the prefix
  via safeInflate (never :zlib.uncompress, which raises on
  malformed input), then tries inflateEnd and catches the raise.
  Output from the phase-2 probes is discarded; they only test
  for the end-of-stream marker.
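The phase-1 drain loop could look roughly like the sketch below. The module name VerifiedInflateSketch, the function name inflate_exact/2, and the error tuples are assumptions made for illustration. The sketch also stops as soon as the output exceeds expected_size, which is one way to keep total memory bounded, not necessarily what the library does.

```elixir
defmodule VerifiedInflateSketch do
  # Hypothetical helper: inflate `slice` with safeInflate and accept
  # the result only if it is exactly `expected_size` bytes long.
  def inflate_exact(slice, expected_size)
      when is_binary(slice) and is_integer(expected_size) and expected_size >= 0 do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z)
      collect(z, :zlib.safeInflate(z, slice), [], 0, expected_size)
    catch
      # Malformed input is reported as a tuple instead of a raise.
      :error, reason -> {:error, {:zlib, reason}}
    after
      :zlib.close(z)
    end
  end

  defp collect(z, {status, out}, acc, size, expected) do
    size = size + IO.iodata_length(out)

    cond do
      # Bail out early instead of buffering an oversized result.
      size > expected ->
        {:error, :size_mismatch}

      # More output is pending; pass [] to keep draining.
      status == :continue ->
        collect(z, :zlib.safeInflate(z, []), [acc, out], size, expected)

      size == expected ->
        {:ok, IO.iodata_to_binary([acc, out])}

      true ->
        {:error, :size_mismatch}
    end
  end
end
```

This sketch checks only the output size. Locating the end of the stream is left to phase 2.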
prefix_complete?/2 is monotone in the prefix length: it is false
for every prefix shorter than the real boundary and true for every
prefix at or beyond it (once the stream is complete, zlib consumes
up to the end marker and ignores trailing input, so all longer
prefixes are also "complete" from its perspective). The binary
search is therefore correct regardless of input shape, and no
hostile construction can desync it.
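The boundary search itself can be sketched independently of zlib, as a binary search over any monotone predicate. The module name BoundarySearchSketch, the function name smallest_complete/3, and the :incomplete_stream error atom are hypothetical. In the parser, the probe function would wrap prefix_complete?/2.

```elixir
defmodule BoundarySearchSketch do
  # Hypothetical helper: smallest `len` in lo..hi for which
  # `probe.(len)` is true, assuming the probe is false below some
  # boundary and true at or above it.
  def smallest_complete(lo, hi, probe)
      when is_integer(lo) and is_integer(hi) and lo <= hi and is_function(probe, 1) do
    # If even the longest candidate is incomplete, there is no boundary.
    if probe.(hi) do
      {:ok, search(lo, hi, probe)}
    else
      {:error, :incomplete_stream}
    end
  end

  # Invariant: probe.(hi) is true, and every length below lo is false.
  defp search(lo, hi, _probe) when lo >= hi, do: hi

  defp search(lo, hi, probe) do
    mid = div(lo + hi, 2)

    if probe.(mid) do
      search(lo, mid, probe)
    else
      search(mid + 1, hi, probe)
    end
  end
end
```

Each step halves the candidate range, so locating the boundary in an n-byte slice takes about log2(n) probes plus the initial check of the upper bound.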
Previous implementations used :zlib.uncompress/1 as the probe,
which raises on malformed input and (worse) allocates the full
decompressed result on every probe. The safeInflate + inflateEnd
probe never raises across the API boundary and bounds per-probe
output.
Types
@type parsed_object() :: {Exgit.Object.object_type(), binary(), binary()}
Functions
@spec parse(binary(), keyword()) :: {:ok, [parsed_object()]} | {:error, term()}
@spec parse_at(binary(), non_neg_integer(), keyword()) :: {:ok, parsed_object()} | {:error, term()}
Parse a single object at offset in the pack. Does NOT iterate the
whole pack — used by ObjectStore.Disk for fast single-object lookup
via the pack .idx offset. OFS_DELTA and REF_DELTA bases are resolved
recursively (REF_DELTA requires an :object_store option).