Forward-only, bounded-memory streaming pack parser.
Accepts raw pack bytes incrementally via ingest/2 and writes each
resolved object directly to an Exgit.ObjectStore as it is decoded.
Memory model (Phase 3+)
| Component | Bound |
|---|---|
| Parse buffer | O(zlib_window) per ingest/2 chunk |
| In-flight inflate | O(one zlib output chunk, ~4 KB) |
| In-flight write handle | O(compressed output); raw content never sits alongside the compressed form in the heap |
| offset_to_sha map | ~35 bytes × N objects |
| sha_to_depth map | ~30 bytes × N objects |
For non-delta objects (types blob/tree/commit/tag), each decompressed
chunk is piped immediately to the object store via
ObjectStore.open_write / write_chunk / close_write. The raw content is
never materialised in full — it flows inflate-port → write-handle → store
one HTTP-chunk-sized piece at a time. The adler32 (for zlib boundary
detection) and the git SHA are both computed incrementally.
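The incremental SHA computation can be sketched with Erlang's `:crypto` hash contexts. The module name `GitShaSketch` is hypothetical and not part of Exgit; the sketch relies only on the fact that a git object ID is the SHA-1 of the header `"<type> <size>\0"` followed by the content, and assumes the uncompressed size is known before the first chunk arrives (the pack entry header declares it).

```elixir
defmodule GitShaSketch do
  # Hypothetical helper for illustration; not Exgit's implementation.

  # Seed a SHA-1 context with the git object header "<type> <size>\0".
  def init(type, size) when type in [:blob, :tree, :commit, :tag] and size >= 0 do
    :sha
    |> :crypto.hash_init()
    |> :crypto.hash_update("#{type} #{size}\0")
  end

  # Fold one decompressed chunk into the running hash.
  def update(ctx, chunk) when is_binary(chunk), do: :crypto.hash_update(ctx, chunk)

  # Finish the hash and return the object ID as lowercase hex.
  def final(ctx), do: ctx |> :crypto.hash_final() |> Base.encode16(case: :lower)
end

content = "hello\n"

streamed =
  GitShaSketch.init(:blob, byte_size(content))
  |> GitShaSketch.update("hel")
  |> GitShaSketch.update("lo\n")
  |> GitShaSketch.final()

# Hashing the whole object in one call gives the same ID.
one_shot = :crypto.hash(:sha, "blob 6\0" <> content) |> Base.encode16(case: :lower)
true = streamed == one_shot
```

Because the context is updated chunk by chunk, only the hash state has to stay in memory, never the full object.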
For delta objects (OFS_DELTA / REF_DELTA), the decompressed delta
instructions must be held in full to call Pack.Delta.apply/2. These
objects are still accumulated in inflate_out; the resulting resolved
content then goes through ObjectStore.put/2 as before.
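Pack.Delta.apply/2 itself is not shown on this page. As an illustration of what such a function has to do, here is a minimal sketch of applying a git delta as the pack format defines it (two size varints, then copy and insert instructions). The module `DeltaApplySketch` and its error atoms are hypothetical, and the sketch raises on malformed input instead of returning an error tuple.

```elixir
defmodule DeltaApplySketch do
  # Hypothetical sketch of git delta application; not Exgit's Pack.Delta.
  import Bitwise

  def apply_delta(base, delta) when is_binary(base) and is_binary(delta) do
    {base_size, rest} = varint(delta, 0, 0)
    {target_size, rest} = varint(rest, 0, 0)

    if base_size != byte_size(base) do
      {:error, :base_size_mismatch}
    else
      out = rest |> run(base, []) |> IO.iodata_to_binary()
      if byte_size(out) == target_size, do: {:ok, out}, else: {:error, :target_size_mismatch}
    end
  end

  # Little-endian base-128 varint: high bit set means another byte follows.
  defp varint(<<1::1, low::7, rest::binary>>, shift, acc),
    do: varint(rest, shift + 7, acc ||| (low <<< shift))

  defp varint(<<0::1, low::7, rest::binary>>, shift, acc),
    do: {acc ||| (low <<< shift), rest}

  defp run(<<>>, _base, acc), do: Enum.reverse(acc)

  # Copy instruction: the low 4 flag bits select offset bytes, the next 3 select size bytes.
  defp run(<<1::1, flags::7, rest::binary>>, base, acc) do
    {offset, rest} = packed(rest, flags &&& 0x0F, 4)
    {size, rest} = packed(rest, flags >>> 4, 3)
    size = if size == 0, do: 0x10000, else: size
    run(rest, base, [binary_part(base, offset, size) | acc])
  end

  # Insert instruction: n literal bytes follow the command byte.
  defp run(<<0::1, n::7, rest::binary>>, base, acc) when n > 0 do
    <<literal::binary-size(n), rest::binary>> = rest
    run(rest, base, [literal | acc])
  end

  # Read the little-endian bytes whose presence bits are set in `flags`.
  defp packed(bin, flags, nbits) do
    Enum.reduce(0..(nbits - 1), {0, bin}, fn i, {val, bin} ->
      if (flags &&& (1 <<< i)) != 0 do
        <<byte, bin::binary>> = bin
        {val ||| (byte <<< (8 * i)), bin}
      else
        {val, bin}
      end
    end)
  end
end

# Copy "hello" (offset 0, size 5), insert " brave", copy " world" (offset 5, size 6).
delta = <<11, 17, 0x90, 5, 6, " brave", 0x91, 5, 6>>
{:ok, "hello brave world"} = DeltaApplySketch.apply_delta("hello world", delta)
```

Copy instructions may reference any offset in the base, which is why the base content has to be available in full when the delta is applied.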
The compressed-buffer spike of the naive approach (inflate_upper_bound
bytes must be present before inflate can start) is eliminated: the zlib
port is opened as soon as @zlib_min bytes are available and fed
incrementally on every subsequent ingest/2.
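Feeding a zlib stream incrementally can be sketched with Erlang's `:zlib` module. `IncrementalInflateSketch` and its functions are hypothetical names; the sketch does not reproduce the parser's `@zlib_min` threshold or its boundary detection, it only shows that `:zlib.inflate/2` accepts partial input and emits output as it goes.

```elixir
defmodule IncrementalInflateSketch do
  # Hypothetical sketch; not Exgit's implementation.

  # Inflate a zlib stream that arrives as a list of compressed chunks.
  # Each non-empty piece of output is passed to `fun.(output, acc)` immediately.
  def inflate_chunks(chunks, acc, fun) when is_function(fun, 2) do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z)

      acc =
        Enum.reduce(chunks, acc, fn chunk, acc ->
          case z |> :zlib.inflate(chunk) |> IO.iodata_to_binary() do
            "" -> acc
            out -> fun.(out, acc)
          end
        end)

      # Raises if the stream did not reach its end marker.
      :ok = :zlib.inflateEnd(z)
      acc
    after
      :zlib.close(z)
    end
  end

  # Split a binary into pieces of at most `n` bytes to simulate network chunks.
  def split(bin, n) when byte_size(bin) <= n, do: [bin]

  def split(bin, n) do
    <<head::binary-size(n), rest::binary>> = bin
    [head | split(rest, n)]
  end
end

data = :binary.copy("streaming pack parser ", 500)
chunks = data |> :zlib.compress() |> IncrementalInflateSketch.split(4)

# Here the callback only counts bytes, so no decompressed output is retained.
total = IncrementalInflateSketch.inflate_chunks(chunks, 0, fn out, n -> n + byte_size(out) end)
true = total == byte_size(data)
```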
Adversarial hardening (Phase 4)
Every limit is enforced per-object during the streaming parse:
* max_object_bytes - rejects any object whose declared uncompressed size exceeds the limit before allocating.
* max_inflate_ratio - zip-bomb defence; if uncompressed / compressed > ratio, the object is rejected.
* max_delta_depth - cap on delta chain length; stops an attacker from constructing a chain that forces O(depth) store fetches per object.
* max_objects - rejects packs with an absurd object count header before any objects are parsed.
* deadline - monotonic deadline (:erlang.monotonic_time(:millisecond)); ingest/2 returns {:error, :deadline_exceeded} when the clock passes it.
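The checks above can be sketched as small pure functions. Everything here is an assumption made for illustration: the module `LimitsSketch`, the function names, and every error atom except `:deadline_exceeded`, which is the only one the text names.

```elixir
defmodule LimitsSketch do
  # Hypothetical sketch of the per-object checks; not Exgit's implementation.

  # Pack header check: runs before any object is parsed.
  def check_object_count(n, max) when n > max, do: {:error, :too_many_objects}
  def check_object_count(_n, _max), do: :ok

  # Declared-size check: runs before any buffer is allocated.
  def check_declared_size(size, max) when size > max, do: {:error, :object_too_large}
  def check_declared_size(_size, _max), do: :ok

  # Ratio check, written as a multiplication so that a zero-byte
  # compressed count cannot cause a division by zero.
  def check_ratio(uncompressed, compressed, max_ratio)
      when uncompressed > compressed * max_ratio,
      do: {:error, :inflate_ratio_exceeded}

  def check_ratio(_uncompressed, _compressed, _max_ratio), do: :ok

  # Delta chain check: `depth` is the depth the new object would have.
  def check_delta_depth(depth, max) when depth > max, do: {:error, :delta_too_deep}
  def check_delta_depth(_depth, _max), do: :ok

  # Deadline check: `nil` means no deadline; `now` is passed in for testability.
  def check_deadline(nil, _now), do: :ok
  def check_deadline(deadline, now) when now > deadline, do: {:error, :deadline_exceeded}
  def check_deadline(_deadline, _now), do: :ok
end

# A 30-second budget, expressed on the monotonic clock.
deadline = :erlang.monotonic_time(:millisecond) + 30_000
:ok = LimitsSketch.check_deadline(deadline, :erlang.monotonic_time(:millisecond))
```

Using the monotonic clock rather than wall-clock time keeps the deadline unaffected by system clock adjustments.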
OFS_DELTA / REF_DELTA resolution
Git packs guarantee that a delta's base always appears earlier in the
pack. Each resolved object is written to the store immediately; OFS_DELTA
looks up pack_offset → {type, sha, depth} in offset_to_sha and
fetches from the store. REF_DELTA uses sha_to_depth to look up the
base depth for chain-length tracking (defaults to 0 for objects already
in the store from a prior fetch).
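The two lookups can be sketched with plain maps. The module `DeltaBaseSketch`, its function names, and its error atoms are hypothetical; the sketch assumes only what the text states about the map shapes, plus the pack-format fact that an OFS_DELTA encodes its base as a backward distance from the delta's own offset.

```elixir
defmodule DeltaBaseSketch do
  # Hypothetical sketch; not Exgit's implementation.

  # offset_to_sha has the shape %{pack_offset => {type, sha, depth}}.
  # Returns the base's type and SHA plus the depth of the new delta object.
  def resolve_ofs(offset_to_sha, delta_offset, distance, max_depth) do
    case Map.fetch(offset_to_sha, delta_offset - distance) do
      {:ok, {type, sha, depth}} when depth + 1 <= max_depth ->
        {:ok, type, sha, depth + 1}

      {:ok, _entry} ->
        {:error, :delta_too_deep}

      :error ->
        {:error, :missing_base}
    end
  end

  # sha_to_depth has the shape %{sha => depth}. A SHA that is absent is
  # assumed to be in the store from a prior fetch and counts as depth 0.
  def resolve_ref(sha_to_depth, base_sha, max_depth) do
    depth = Map.get(sha_to_depth, base_sha, 0) + 1
    if depth <= max_depth, do: {:ok, depth}, else: {:error, :delta_too_deep}
  end
end

offsets = %{12 => {:blob, "aaaa", 0}, 340 => {:blob, "bbbb", 1}}

# A delta at offset 900 whose base lies 560 bytes earlier, at offset 340.
{:ok, :blob, "bbbb", 2} = DeltaBaseSketch.resolve_ofs(offsets, 900, 560, 50)
```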
SHA-1 checksum
A rolling 20-byte delay ensures that sha_tail at finalize/1 contains
exactly the pack's trailing checksum. Verification only happens in
finalize/1 — not in the streaming loop — because sha_tail doesn't
reach the correct final value until all bytes have been fed.
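One way to implement such a 20-byte delay is to hold back the most recent 20 bytes from the running hash. The sketch below assumes that design; the module `TrailerSketch` and its error atoms are hypothetical, and it relies only on the pack-format fact that the trailer is the SHA-1 of every byte that precedes it.

```elixir
defmodule TrailerSketch do
  # Hypothetical sketch; not Exgit's implementation.
  @trailer 20

  # State is {running SHA-1 context, bytes held back so far}.
  def new, do: {:crypto.hash_init(:sha), <<>>}

  # Hash everything except the last 20 bytes seen so far; keep those as the tail.
  def feed({ctx, tail}, chunk) when is_binary(chunk) do
    pending = tail <> chunk
    keep = min(byte_size(pending), @trailer)
    hash_len = byte_size(pending) - keep
    <<to_hash::binary-size(hash_len), new_tail::binary>> = pending
    {:crypto.hash_update(ctx, to_hash), new_tail}
  end

  # Only once all bytes are in does the tail equal the pack's trailer.
  def verify({ctx, tail}) when byte_size(tail) == @trailer do
    if :crypto.hash_final(ctx) == tail, do: :ok, else: {:error, :checksum_mismatch}
  end

  def verify(_state), do: {:error, :truncated}
end

body = "PACK" <> :crypto.strong_rand_bytes(100)
pack = body <> :crypto.hash(:sha, body)

# Feed the pack one byte at a time; the result does not depend on chunking.
state = for <<byte <- pack>>, reduce: TrailerSketch.new(), do: (st -> TrailerSketch.feed(st, <<byte>>))
:ok = TrailerSketch.verify(state)
```

At any point before the final chunk, the held-back bytes are ordinary pack data rather than the checksum, which is why a check inside the streaming loop would be premature.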
Types
@type t() :: %Exgit.Pack.StreamParser{
  buffer: term(),
  buffer_start: term(),
  current: term(),
  limits: term(),
  num_objects: term(),
  objects_done: term(),
  offset_to_sha: term(),
  phase: term(),
  raw_cache: term(),
  raw_cache_bytes: term(),
  sha_ctx: term(),
  sha_tail: term(),
  sha_to_depth: term(),
  store: term()
}
Functions
@spec finalize(t()) :: {:ok, non_neg_integer(), Exgit.ObjectStore.t()} | {:error, term()}
Assert the parse is complete: all N objects were decoded and the pack's
SHA-1 trailer matches. Returns {:ok, n_objects, final_store} or
{:error, reason}.
final_store is the object store after all objects have been written.
For value-typed stores (e.g. Memory) this is the updated struct; for
side-effect stores (e.g. Disk) it equals the original store reference.
ingest/2
Feed a chunk of raw pack bytes into the parser.
Objects are written to the store as they complete. Returns {:ok, state}
when the chunk was processed successfully (the parser may still need
more bytes), or {:error, reason} on a fatal parse error.
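The return values of ingest/2 and finalize/1 suggest a driver loop that stops at the first error. The sketch below takes the two functions as arguments so that it stays self-contained; `StreamDriverSketch` is a hypothetical name, and the argument order of ingest/2 (state first, chunk second) is an assumption, since no spec for it appears here.

```elixir
defmodule StreamDriverSketch do
  # Hypothetical driver; `ingest` and `finalize` are supplied by the caller.

  # Feed every chunk in order, halting on the first {:error, reason},
  # then finalize only if all chunks were accepted.
  def run(state, chunks, ingest, finalize)
      when is_function(ingest, 2) and is_function(finalize, 1) do
    result =
      Enum.reduce_while(chunks, {:ok, state}, fn chunk, {:ok, st} ->
        case ingest.(st, chunk) do
          {:ok, next} -> {:cont, {:ok, next}}
          {:error, _reason} = error -> {:halt, error}
        end
      end)

    with {:ok, final_state} <- result, do: finalize.(final_state)
  end
end
```

With Exgit loaded, a call might look like `StreamDriverSketch.run(Exgit.Pack.StreamParser.new(store, max_objects: 1_000_000), chunks, &Exgit.Pack.StreamParser.ingest/2, &Exgit.Pack.StreamParser.finalize/1)`, again assuming that argument order. Because `chunks` can be any enumerable, a lazy stream of network reads works without buffering the whole pack.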
@spec new(Exgit.ObjectStore.t(), keyword()) :: t()
Create a new StreamParser state that will write objects to store.
Options:
* :max_object_bytes - max inflated size of any single object (default 100 MB).
* :max_objects - max number of objects in the pack (default 10 M).
* :max_delta_depth - max delta chain depth (default 50, same as git).
* :max_inflate_ratio - max uncompressed/compressed ratio; detects zip bombs (default 1000×).
* :deadline - :erlang.monotonic_time(:millisecond) value after which ingest/2 returns {:error, :deadline_exceeded}. nil (default) means no deadline.
* :raw_cache_bytes - budget in bytes for the raw-content cache used to speed up delta base resolution (default 64 MB). Set to 0 to disable and always go through the store.