How fast is exgit at what it's designed for — agent workflows that lazy-clone a repo, prefetch the trees, and do many reads (grep, read_path, walk)?
Short answer: on anomalyco/opencode (4,645 files, ~30 MB
fetch pack), a cold clone + prefetch completes in ~7 s and
steady-state grep runs in 130–160 ms for literal patterns
(Boyer-Moore) or ~335 ms for case-insensitive regex. On the
436-file adafruit/Adafruit_CircuitPython_Bundle, steady-state
grep runs in ~2 ms.
TL;DR
| Fixture | Files | Fetch pack | Clone + prefetch | Grep (literal) | Grep (regex/ci) |
|---|---|---|---|---|---|
ivarvong/pyex | 275 | 1.2 MB | ~700 ms | ~11 ms | ~11 ms |
cloudflare/agents | 1,418 | 4 MB | ~8 s | ~58 ms | ~58 ms |
anomalyco/opencode | 4,645 | ~30 MB | ~7 s | ~140 ms | ~335 ms |
adafruit/Adafruit_CircuitPython_Bundle | 436 | ~1 MB | ~2 s | ~2 ms | ~15 ms |
"Clone + prefetch" = Exgit.clone(url, lazy: true) +
Exgit.FS.prefetch(repo, "HEAD", blobs: true).
"Grep (literal)" = case-sensitive, no regex metacharacters;
routes through :binary.matches (Boyer-Moore). "Grep (regex/ci)"
= case-insensitive %Regex{} scan.
Both grep numbers are steady-state (warm CPU, all objects in the Memory store). The first grep after prefetch is the same cost — unlike an older lazy-fetch path, prefetch now pre-populates everything.
What we measure
Four fixtures, each a real public GitHub repo. Picked to cover a small-to-large size range with repos that have submodules, many binary assets, and diverse layouts:
ivarvong/pyex— owned by the exgit maintainer. Guaranteed against surprise force-push. Small (275 files), good for validating algorithmic baselines.cloudflare/agents— ~1.4k files. Medium real-world project.anomalyco/opencode— ~4.6k files, 26 MB blob in the pack. Large.adafruit/Adafruit_CircuitPython_Bundle— 436 files, uses git submodules (.gitmodulespresent). Previously crashed on prefetch due to an over-eager reserved-name check. Included as a regression fixture.
The benchmark harness (bench/review_bench.exs) does:
Exgit.clone(url, lazy: true)— refs only, no objects.Exgit.FS.prefetch(repo, "HEAD", blobs: true)— stream the full blob pack directly into the Memory store viaPack.StreamParser.- One "cold" grep — steady-state (prefetch already populated everything).
- Five "warm" greps — report the median.
Benchmarks
All numbers are medians measured on a MacBook over a home internet
connection. transport.fetch varies 30–50% run-to-run due to
network; pack.stream_parse, fs.grep, and fs.walk are stable.
anomalyco/opencode (4,645 files)
Phase measured
--------------------------------------------------------------
clone(url, lazy: true) 0.26 s
prefetch(blobs: true) 6.4 s ← streaming parser
grep "scd" literal 158 ms
grep "TODO" literal 130 ms
grep "useState" literal 138 ms
grep "export default" literal 132 ms
grep "anthropic" regex/ci 333 msGrep phase breakdown (4,645 blobs, 82 MB raw text):
| Phase | Time | Share |
|---|---|---|
| Tree walk | 13 ms | 4% |
zlib.uncompress × 4,645 blobs | 140 ms | 43% |
| Boyer-Moore scan (literal) | ~3 ms | 1% |
| PCRE scan (regex/ci) | ~150 ms | 46% |
| Line lookup + result alloc | ~15–65 ms | ~10% |
Literal patterns spend almost no time in the scan phase; the
bottleneck is zlib.uncompress in the Memory store. Case-insensitive
regex pays both the decompress and a slower PCRE scan.
adafruit/Adafruit_CircuitPython_Bundle (436 files)
Phase measured
--------------------------------------------------------------
clone(url, lazy: true) 1.16 s
prefetch(blobs: true) 0.65 s
grep "scd" literal 2.2 ms (14 hits)
grep "scd" literal 2.5 ms
grep "scd" literal 2.3 ms436 files fit entirely in L3 cache after one prefetch pass; Boyer-Moore through the full repo takes 2 ms and barely registers.
Scaling
| Files | Grep / literal (ms) | Per-file (µs) |
|---|---|---|
| 275 | 11 | 40 |
| 436 | 2 | 5 |
| 1,418 | 58 | 41 |
| 4,645 | 140 | 30 |
Per-file cost on opencode is lower than smaller repos because
its blobs are larger (more bytes compressed per file → fewer
zlib.uncompress calls relative to scan throughput); on adafruit
it is 5 µs because the compressed blobs stay warm in CPU cache
after the first grep.
Architecture: end-to-end streaming pipeline
The biggest structural change since the original benchmarks is the replacement of the buffered pack pipeline with a fully streaming one. The old shape:
HTTP response → full binary in heap
→ Pack.Reader.parse (binary + resolved objects in heap simultaneously)
→ import_objects (another copy into the store)Peak memory for opencode's 135 MB pack: ~400 MB (pack binary + decoded object list + compressed store).
The new shape:
HTTP chunks → PktLine.Decoder → sideband demux
→ Pack.StreamParser.ingest/2 (one chunk at a time)
├── type/size header decode
├── zlib inflate port (open across ingest calls)
├── streaming deflate → ObjectStore directly
└── OFS/REF delta resolved through store
→ StreamParser.finalize/1 (checksum verify)Peak memory: one HTTP chunk (~4 KB) + one object's compressed bytes in the write handle + the compressed store. The pack binary never exists as a whole.
opencode prefetch: 57 s → 6.4 s — most of the 57 s was the old Pack.Reader holding 135 MB of binary and the object list simultaneously, triggering multiple major GC cycles. The streaming parser never triggers that pressure.
Adversarial hardening in the parser
Pack.StreamParser.new/2 accepts limits enforced per-object
during the streaming parse:
max_object_bytes: 100 MB — rejects before allocating
max_inflate_ratio: 1000× — zip-bomb defence (compressed/raw ratio)
max_delta_depth: 50 — OFS/REF delta chain cap (same as git)
max_objects: 10 M — rejects absurd pack headers
deadline: nil — monotonic cutoff; returns :deadline_exceededThese fire during streaming, not as a post-parse check, so a hostile pack stops consuming CPU/memory immediately.
Grep: literal pattern fast path
FS.grep/4 and FS.multi_grep/4 detect case-sensitive literal
patterns (no PCRE metacharacters) at compile time and route them
through :binary.matches (Boyer-Moore-Horspool in the BEAM
runtime) instead of Regex.scan:
# case-sensitive, no metacharacters → :binary.matches (9.5× faster)
FS.grep(repo, "HEAD", "useState")
# case-insensitive or metacharacters → Regex.scan
FS.grep(repo, "HEAD", "useState", case_insensitive: true)
FS.grep(repo, "HEAD", "use.*State")Measured on 7.4 MB of synthetic text:
| Engine | Time | Speedup |
|---|---|---|
:binary.matches (literal) | 8.6 ms | 1× (baseline) |
Regex.scan (literal regex) | 82 ms | 9.5× slower |
Regex.scan (ci regex) | >10 s | >>100× slower at high hit density |
For typical code-search patterns (function names, import paths, identifiers), the literal path is the default. Most agent queries hit it without any caller changes.
Parallelism: still a net loss
An earlier attempt parallelized FS.grep across blobs via
Task.async_stream. Result on opencode:
sequential (default): 340 ms
parallel (16 workers): 1550 ms ← 4.5× SLOWERThe cause: zlib.uncompress is a regular (non-dirty) NIF. Running
16 concurrent calls each allocating large binaries simultaneously
causes severe GC pressure — 74 MB of heap allocation per grep in
16 processes simultaneously fragments memory and triggers
stop-the-world GC. The sequential path avoids this: each blob's
bytes are allocated, used, and collected before the next blob is
touched.
max_concurrency: :schedulers remains available for callers with
workloads where per-file work is substantial (large blobs, I/O-bound
stores). For typical code search on a Memory-backed repo, leave it
at the default of 1.
Bug fixes in this cycle
.gitmodules blocked legitimate repos
Tree.decode/1 was rejecting .gitmodules as a reserved entry
name, treating it the same as .git (CVE-2014-9390 class). The
comment even noted it was pre-emptive: "URL-injection vector for
submodule handling if/when we add submodules."
The consequence: any repo that uses git submodules — including
adafruit/Adafruit_CircuitPython_Bundle — crashed on prefetch
with {:tree_entry_name_reserved, ".gitmodules"}.
Fix: .gitmodules is now accepted. The URL-injection concern
only applies if we process submodule URLs, which exgit does not.
.git remains rejected (CVE-2014-9390 is real on case-insensitive
filesystems even for read-only clients).
Earlier bugs (still in history)
Three compounding bugs in the original hot path documented here
for historical context (fixes landed in commit 550100d):
FS.walkdiscarded the updated repo afterresolve_tree, re-fetching the same commit from GitHub on everywalkcall. 7.7s → 2 ms on cloudflare/agents.Promisor cache accounting counted decompressed bytes while the store held compressed bytes; eviction fired 3–10× too early and dropped commits that were immediately needed. Fixed by tracking compressed sizes.
:max_resolved_bytesdefault of 500 MiB rejected opencode's ~524 MiB resolved set. Raised to 2 GiB.
Optimizations that matter (shipped)
In order of impact:
Streaming pack parser (
Pack.StreamParser) — replaces the bufferedPack.Readerin all fetch/prefetch paths. Eliminates the O(pack_size) binary + object list from the heap; bounded to one chunk + one object at a time. opencode prefetch: 57 s → 6 s.Streaming object-store writes —
open_write/write_chunk/close_writeprotocol onObjectStore; Memory and Disk stores stream compressed output as inflate output arrives. Raw content never coexists with compressed form in the heap.Walk state threading — updated repo threaded through the walk
Stream.resourcestate, eliminating per-walk network fetches on lazy repos. 3,800× faster on cloudflare/agents.Literal grep fast path —
:binary.matches(Boyer-Moore) for case-sensitive literal patterns. 9.5× faster scan per blob; visible at adafruit scale (2 ms grep) and meaningful at opencode scale (dominant cost shifts tozlib.uncompress, not scan).Adler32 probe for pack zlib tracking — finds the end of each zlib stream in O(1) instead of O(log N) binary-search probes. 2.6× faster
Pack.Reader.parse(still used for Disk store random-access lookups).Sequential grep as default — avoids Task.async_stream GC pressure on typical workloads.
What we're not doing
Decompressed-blob cache. The 140 ms
zlib.uncompresstax is paid on every grep call. Arepo.blob_cache: %{sha => binary}field on the Repository struct, populated by aFS.warm/2call, would reduce repeated greps to near-zero. The design is correct (state on the struct, caller opts in, GC'd with the repo) but deferred until a measured workload asks for it. We explicitly ruled out ETS, Process dictionary, and persistent_term — any cache must be caller-visible and scoped to the repo value.NIF-based zlib / libdeflate. Would reduce
zlib.uncompresscost 3–5×, making the 140 ms → ~30 ms. Undercuts the "pure Elixir, no NIFs" positioning; not doing this without a concrete workload and a clear tradeoff decision.Parallel pack parsing. OFS_DELTA chains impose a sequential dependency (base must precede delta in the forward walk). A two-pass design could unlock parallelism for the inflate phase; left for when a workload demonstrates the need.
Chunked parallel grep. Per-task
Task.async_streamat file granularity is net-negative (4.5× slower). A chunked variant batching 200–500 files per task would amortize spawn overhead and likely win on 10k+ file repos. Needs a measured workload.
Running the benchmark yourself
# Clone + prefetch + grep workflow (all fixtures, 30 runs each)
mix run bench/review_bench.exs
# Filter to one fixture
mix run bench/review_bench.exs 10 opencode
# Local pack parse: StreamParser vs Pack.Reader head-to-head
# (requires local opencode .git pack files)
mix run bench/local_pack_eval.exs
# Pack parse scaling (synthetic, no network)
mix run bench/pack_parse_bench.exs
# Agent-session simulation: multi_grep + grep+context + blame + read_lines
mix run bench/agent_session_bench.exs
Memory model summary
| Component | Bound |
|---|---|
| HTTP transport | One pkt-line per ingest chunk |
| Pack buffer | One object's compressed bytes |
| In-flight inflate | O(zlib_window) per chunk |
| Streaming write handle | O(compressed output chunks) |
| offset_to_sha map | ~35 bytes × N objects |
| sha_to_depth map | ~30 bytes × N objects |
| raw_cache (delta resolution) | 64 MB budget (plain map in StreamParser state) |
| Object store (Memory) | All objects compressed — inherent minimum |
The object store is the floor: if you fetch a 135 MB pack and store it in a Memory backend, you'll hold however many bytes the compressed objects take. Exgit does not add overhead on top of that minimum.
Correctness oracle
FS.grep output is validated against git grep via
test/exgit/fs_grep_git_parity_test.exs. The test builds a small
real-git repo, runs both git grep -n and Exgit.FS.grep against
a set of representative patterns, and asserts the two agree on the
(path, line_number) match set. Tagged :real_git and :slow.
History
See CHANGELOG.md for the feature-level history.
Key perf commits:
- Streaming pack parser, streaming writes, literal grep,
.gitmodulesfix — current PR 550100d— walk state threading; cache accounting fix; Adler32 probe9bb1256— partial clone haves bug fix8678b0d— initial Adler32 probe; code-quality gates