Exgit.Pack.Reader (exgit v0.1.0)

Parser for git packfiles (v2 / v3).

All decoder paths return {:ok, _} / {:error, _} tuples — no call on untrusted input ever raises. Memory is bounded via the :max_pack_bytes, :max_objects, :max_object_bytes, and :max_resolved_bytes options so a hostile server cannot exhaust the heap.

Tracked inflate (zlib_inflate_tracked/2)

Git packs store zlib streams concatenated with no length prefix, so the parser must determine exactly how many input bytes each stream consumed. Erlang's :zlib module does not expose the consumed-input count directly, so we detect stream completion by feeding a candidate prefix to a fresh stream and then calling :zlib.inflateEnd/1 on it. inflateEnd raises data_error iff the input did not include a proper end-of-stream marker and Adler-32 checksum, so catching that raise gives a clean succeed-or-raise completeness predicate.
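The completeness check described above can be sketched roughly as follows. This is a minimal illustration, not Exgit's actual code: the module name PrefixProbe and the function complete?/1 are hypothetical, and only the :zlib calls (open/0, inflateInit/1, safeInflate/2, inflateEnd/1, close/1) come from Erlang/OTP.

```elixir
defmodule PrefixProbe do
  # Hypothetical helper (assumption): returns true when `prefix` contains a
  # complete zlib stream, false otherwise. It never raises to the caller.
  def complete?(prefix) when is_binary(prefix) do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z)
      # Feed the prefix and drain all output; the output itself is discarded.
      drain(z, :zlib.safeInflate(z, prefix))
      # Raises :data_error unless the end-of-stream marker was seen.
      :zlib.inflateEnd(z)
      true
    catch
      :error, _reason -> false
    after
      :zlib.close(z)
    end
  end

  # safeInflate/2 returns bounded chunks; an empty input asks for the next one.
  defp drain(z, {:continue, _out}), do: drain(z, :zlib.safeInflate(z, []))
  defp drain(_z, {:finished, _out}), do: :ok
end
```

Under these assumptions, a full stream (with or without trailing bytes) yields true, while a stream missing even its final checksum byte yields false.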

Implementation structure:

  1. Phase 1 — verified full inflate. Open a zlib stream, feed the upper-bounded slice through :zlib.safeInflate/2, drain all output. Verify the total output bytes equal what the pack header declared (expected_size). safeInflate bounds output per call, so a zip-bomb input cannot exhaust the heap.
  2. Phase 2 — bisect for the exact boundary. Binary-search the smallest prefix length whose prefix_complete?/2 returns true. prefix_complete?/2 opens a fresh stream, feeds the prefix via safeInflate (never :zlib.uncompress, which raises on malformed input), then tries inflateEnd and catches the raise. Output from the phase-2 probes is discarded — they're only testing the end-of-stream marker.
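As a rough sketch of what a phase-1 style "verified full inflate" might look like, the snippet below drains :zlib.safeInflate/2 chunk by chunk and stops as soon as the output exceeds the declared size. The module name BoundedInflate, the function inflate/2, and the :size_mismatch error atom are assumptions made for illustration; the library's real internals may differ.

```elixir
defmodule BoundedInflate do
  # Hypothetical sketch (assumption): inflate `slice` and verify that exactly
  # `expected_size` bytes come out. Returns {:ok, binary} or {:error, reason}.
  def inflate(slice, expected_size) when is_binary(slice) do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z)

      with {:ok, chunks} <- loop(z, :zlib.safeInflate(z, slice), [], 0, expected_size) do
        # Raises :data_error if the stream never reached its end marker.
        :zlib.inflateEnd(z)
        {:ok, IO.iodata_to_binary(chunks)}
      end
    catch
      :error, reason -> {:error, reason}
    after
      :zlib.close(z)
    end
  end

  # Accumulate bounded chunks; bail out early once output exceeds the
  # declared size, so an oversized stream is never fully materialized.
  defp loop(z, {flag, out}, acc, total, expected) do
    total = total + IO.iodata_length(out)

    cond do
      total > expected -> {:error, :size_mismatch}
      flag == :continue -> loop(z, :zlib.safeInflate(z, []), [acc, out], total, expected)
      total == expected -> {:ok, [acc, out]}
      true -> {:error, :size_mismatch}
    end
  end
end
```

The early exit on total > expected is the design point here: memory use stays close to the declared size plus one chunk, whatever the input claims.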

prefix_complete?/2 is monotone in the prefix length: once the stream is complete, every longer prefix is also "complete" from zlib's perspective, because zlib consumes input up to the end marker and ignores anything after it. The binary search is therefore correct regardless of input shape, and no hostile construction can desync it.
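The boundary search can be illustrated with a generic bisection over a monotone predicate. This is a sketch under stated assumptions: BoundarySearch and smallest_complete/3 are hypothetical names, the probe is passed in as a function, and the caller is assumed to know that the probe is true at the upper bound (which is what a successful phase 1 establishes).

```elixir
defmodule BoundarySearch do
  # Hypothetical sketch (assumption): find the smallest n in lo..hi for which
  # probe.(n) is true, given that probe is monotone and probe.(hi) is true.
  def smallest_complete(lo, hi, _probe) when lo >= hi, do: hi

  def smallest_complete(lo, hi, probe) do
    mid = div(lo + hi, 2)

    if probe.(mid) do
      # mid is long enough; the boundary is at mid or earlier.
      smallest_complete(lo, mid, probe)
    else
      # mid is too short; the boundary is strictly after mid.
      smallest_complete(mid + 1, hi, probe)
    end
  end
end

# Example with a stand-in predicate whose boundary is 37.
BoundarySearch.smallest_complete(1, 100, fn n -> n >= 37 end)
```

In the setting described here, the probe would wrap a completeness check on the first n bytes of the slice, so the search needs only a logarithmic number of probes relative to the slice length.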

Previous implementations used :zlib.uncompress/1 as the probe, which raises on malformed input and (worse) allocates the full decompressed result on every probe. The safeInflate + inflateEnd probe never raises across the API boundary and bounds per-probe output.

Types

parsed_object()

@type parsed_object() :: {Exgit.Object.object_type(), binary(), binary()}

Functions

parse(pack_data, opts \\ [])

@spec parse(
  binary(),
  keyword()
) :: {:ok, [parsed_object()]} | {:error, term()}
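A hedged usage sketch for parse/2 follows. It assumes the memory-limit options named in the module overview are passed as integer values in the keyword list; the specific limits, the PackSummary module, and count_objects/2 are made up for illustration. The reader module is injectable so the snippet can be exercised without the library loaded.

```elixir
defmodule PackSummary do
  # Hypothetical wrapper (assumption): count the objects in a pack while
  # capping resource use. The limit values below are illustrative only.
  def count_objects(pack_data, reader \\ Exgit.Pack.Reader) when is_binary(pack_data) do
    opts = [max_pack_bytes: 64 * 1024 * 1024, max_objects: 100_000]

    # parse/2 returns tagged tuples, so no rescue clause is needed here.
    case reader.parse(pack_data, opts) do
      {:ok, objects} -> {:ok, length(objects)}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Because every decoder path returns {:ok, _} or {:error, _}, a plain case expression is enough to handle untrusted input.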

parse_at(pack_data, offset, opts \\ [])

@spec parse_at(binary(), non_neg_integer(), keyword()) ::
  {:ok, parsed_object()} | {:error, term()}

Parse a single object at offset in the pack. Does NOT iterate the whole pack — used by ObjectStore.Disk for fast single-object lookup via the pack .idx offset. OFS_DELTA and REF_DELTA bases are resolved recursively (REF_DELTA requires an :object_store option).