All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.1.0] — 2026-07-01
Initial release: pure-Elixir git client for clone, fetch, push over smart HTTP v2, with lazy partial-clone support and a path-oriented FS API for agents.
See README and BENCHMARKS on the smoketest repo.
Security — credential redaction + ref bounds
- Telemetry no longer leaks URL-embedded credentials. A token embedded in a remote URL (https://token@host/…, as git clients commonly accept) is now redacted to *** before the URL enters any :telemetry span metadata (ls_refs, fetch, push) or the [:exgit, :security, :ref_rejected] event. Previously such a token could reach telemetry exporters / log aggregators. Prefer the :auth field regardless; it was already redacted.
- ls-refs responses are capped at 1,000,000 refs. A hostile or broken server can no longer stream unbounded refs into client memory; the transport stream halts once the cap trips and the caller gets {:error, {:too_many_refs, cap}}. Tunable via the :max_refs option on Exgit.Transport.HTTP.new/2. Real repos (linux, esp-idf) sit far below the default.
- Dependencies bumped to clear HTTP-stack advisories. req 0.5.17 → 0.6.2 and mint 1.7.1 → 1.9.0 resolve the decompression-bomb DoS (CVE-2026-49755), multipart header injection (CVE-2026-49756), and HTTP/2 CONTINUATION flood (CVE-2026-49754). The cross-origin credential-leak test suite was re-run against the new req line. The only remaining mix hex.audit advisories are in cowlib, reachable solely through the only: :test bypass dependency, never part of the published package or a consumer's runtime.
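For readers who want to apply the same redaction in their own telemetry handlers, a minimal sketch follows. It is not Exgit's implementation; the module and function names are hypothetical, and it only assumes the standard library's URI module.

```elixir
defmodule RedactSketch do
  # Hypothetical helper (not part of Exgit): replaces any userinfo
  # component of a URL with "***" before the URL is logged.
  def redact_url(url) when is_binary(url) do
    case URI.parse(url) do
      # No credentials embedded: return the URL untouched.
      %URI{userinfo: nil} -> url
      # Credentials present: overwrite them and re-serialize.
      %URI{} = uri -> URI.to_string(%{uri | userinfo: "***"})
    end
  end
end
```

Redacting at the point where the URL enters metadata, rather than in each exporter, keeps the token out of every downstream consumer at once.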
Notes
- The optional :vfs integration depends on the pre-1.0 vfs package (~> 0.1.0), which itself depends on exgit. The cycle is handled with runtime: false + a compile-time Code.ensure_loaded? guard, but treat the integration as pre-release and pin vfs if you rely on it.
Added — size-aware reads
- Exgit.FS.size/3: byte size of a blob at a path without materializing its content. The size-aware companion to read_path/4: gate on it before pulling a blob into memory. O(1) for the in-memory store; on-disk loose objects inflate only the header. Resolving the path may fetch trees (small) on a lazy clone, but never the blob; an un-fetched blob returns {:error, :not_local} instead of triggering a possibly-multi-GB fetch. Directories return {:error, :not_a_blob}; gitlink (submodule) entries return {:error, :submodule}, as do read_path/4 lookups on them, while stat/3 reports %{type: :submodule} without fetching.
- Exgit.ObjectStore.object_size/2: new protocol callback backing the above. Memory keeps a parallel sha => size index (no extra decompression); Promisor answers from cache or returns {:error, :not_local} without fetching.
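The "inflate only the header" idea can be sketched with Erlang's :zlib streaming API. This is an illustration under assumptions, not the Disk backend's code: the module name is hypothetical, and a real reader would presumably also feed only a prefix of the file rather than the whole compressed binary. A loose object inflates to "<type> <size>\0<content>", so the first bounded output chunk is enough to read the declared size.

```elixir
defmodule LooseHeaderSketch do
  # Hypothetical helper: reads "<type> <size>" from a zlib-compressed
  # loose object without inflating more than one bounded output chunk.
  def header(compressed) when is_binary(compressed) do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z)
      # safeInflate/2 returns after a bounded amount of output, so a
      # large blob body is not fully decompressed here.
      {_status, out} = :zlib.safeInflate(z, compressed)
      parse(IO.iodata_to_binary(out))
    rescue
      # Corrupt input surfaces as an Erlang error such as :data_error.
      ErlangError -> {:error, :zlib_error}
    after
      # Always release the zlib stream, even on failure.
      :zlib.close(z)
    end
  end

  defp parse(chunk) do
    with [header, _body] <- :binary.split(chunk, <<0>>),
         [type, size] <- :binary.split(header, " "),
         {bytes, ""} <- Integer.parse(size) do
      {:ok, type, bytes}
    else
      _ -> {:error, :bad_header}
    end
  end
end
```

The declared size comes from the object header itself, so the sketch trusts it only as far as the caller later validates it against the real content length.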
Added — observability + workload bench
- Exgit.Profiler: structured trace of :telemetry span events emitted during a function call. One-shot profile/1 returns {result, %{total_us, totals, peak_cache_bytes, events}}; manual attach/0 + read/1 + detach/1 for long-running processes. Process-scoped, so concurrent profilers don't interfere.
- Exgit.Repository.memory_report/1: structured memory report (object counts by type, cache_bytes, max_cache_bytes, mode, backend) with consistent shape across all object-store backends; counts report :unknown for backends that can't be introspected (Disk). Suitable for emission into observability stacks.
- bench/agent_workload.exs: realistic agent-session benchmark covering clone + prefetch + ls + grep + reads, with :cold and :hot variants. Reports per-op breakdown + peak cache bytes.
- test/exgit/fs_grep_git_parity_test.exs: correctness oracle comparing Exgit.FS.grep output vs git grep -n for 7 representative patterns. Tagged :real_git :slow; gates every push via the extended CI tier.
- docs/NOTES.md: design notes for deferred work (LRU eviction, decompressed-blob cache, literal-string grep fast path). Captures enough detail that future implementation doesn't re-reason from scratch.
Performance
A perf-focused round triggered by a real-world bug report (partial
clone returning empty packs). Adding real-world fixtures
(cloudflare/agents, anomalyco/opencode) to the benchmark
immediately surfaced three cascading bugs in the core
"clone + prefetch + read" hot path that pyex was too small to
expose:
- FS.walk threaded the updated repo through stream state. Previously discarded the grown promisor from resolve_tree, so every walk on a lazy repo triggered a fresh commit fetch. On cloudflare/agents (1,418 files): 7,700 ms → 2 ms per walk. ~3,800× faster.
- Promisor cache bytes now tracked as compressed, not decompressed. Previous accounting over-counted by 3-10×, tripping eviction during normal prefetch. Combined with the evictor only dropping commits (not blobs/trees), this could drop the single commit we'd just fetched for a streaming walk.
- :max_cache_bytes default changed from 64 MiB to :infinity. Unbounded is the right default for partial-clone / prefetch workflows. Callers with real memory envelopes (long-running daemons, low-memory deployments) set an explicit cap based on their budget.
- :max_resolved_bytes default raised from 500 MiB to 2 GiB (matches :max_pack_bytes). The old cap blocked real-world monorepos; anomalyco/opencode resolves to 524 MB.
Plus two real optimizations:
- Adler32 trailer probe for pack zlib stream tracking. Replaces an O(log N) binary-search with one linear scan + one verify probe. pack.parse went from 127 ms → 49 ms on pyex (2.6× faster). Saves several seconds on large packs.
- Single-pass grep (matches_in rewrite). Previously split every blob into lines via regex before matching; now Regex.scan on whole blob + compute line numbers only for matched files. 13× faster on the common case (repo with few matches).
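The single-pass approach can be sketched as follows. This is a simplified illustration, not the actual matches_in code: the module name is hypothetical and it returns only 1-based line numbers. The point is that the newline index is built only after Regex.scan has found at least one hit, so blobs without matches never pay for line splitting.

```elixir
defmodule GrepSketch do
  # Hypothetical helper: returns the 1-based line numbers on which
  # `regex` matches inside `blob`, scanning the blob once.
  def matching_lines(blob, %Regex{} = regex) when is_binary(blob) do
    case Regex.scan(regex, blob, return: :index) do
      # Common case: no match, so no line bookkeeping at all.
      [] ->
        []

      hits ->
        # Byte offsets of every newline, computed only for matched blobs.
        newline_offsets = for {pos, _len} <- :binary.matches(blob, "\n"), do: pos

        hits
        |> Enum.map(fn [{start, _len} | _groups] ->
          # Line number = newlines strictly before the match start, plus one.
          1 + Enum.count(newline_offsets, &(&1 < start))
        end)
        |> Enum.uniq()
    end
  end
end
```

Counting newlines per hit is quadratic in the worst case; it keeps the sketch short and is acceptable when matches are few, which is the case the changelog entry describes.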
Explicitly reverted: initial parallel-grep implementation
using Task.async_stream. Measured 22× SLOWER on cloudflare/agents
— per-file spawn overhead (~50-100 µs) dominates the microsecond
regex work. Default stays sequential. Callers with substantial
per-file work opt in via max_concurrency: :schedulers.
Full benchmark methodology + per-fixture numbers in
docs/PERFORMANCE.md. Benchmark harness
in bench/review_bench.exs.
Production-readiness round
A follow-up audit after the staff-engineering review closed the reviewer's "what I didn't look at" list:
- Config RCE audit: Exgit.Config is read-only data; no code path executes values from it (no core.sshCommand, core.fsmonitor, core.hookspath, insteadOf, includeIf expansion). A new structural test (test/exgit/security/no_shell_exec_test.exs) asserts lib/ contains zero System.cmd / :os.cmd / Port.open / Path.expand / Path.absname calls; failure means someone introduced a new execution path that needs review against the threat model.
- Pack.Writer concurrent-build stress: 3 tests assert 100 parallel builds of identical input produce byte-identical output, 100 parallel builds of distinct input round-trip cleanly, and 1000 sequential builds don't leak zlib ports.
- Decoder fuzz corpus: 10 property tests, 500 cases each, exercise Blob.decode/1, Tree.decode/1, Commit.decode/1, Tag.decode/1, and Pack.Reader.parse/2 on random bytes. Every decoder's "never raises on untrusted input" promise is now explicitly tested.
- Config fuzz corpus: 3 property tests, 500 cases each, cover Config.parse/1 on random bytes, section-header-like noise, and roundtrip fixpoint. Includes RCE-shape regression tests that parse core.fsmonitor / core.sshCommand / includeIf values and assert they are stored verbatim (not executed or expanded).
- Walk cross-check vs real git: test/exgit/walk_real_git_test.exs constructs 5 DAG shapes (fork, criss-cross, linear, deep-fork, octopus) with real git, then compares Exgit.Walk.merge_base/2 and merge_base_all/2 against git merge-base and git merge-base --all. Found and fixed a nondeterministic LCA-pick bug (criss-cross merges).
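A structural check of the kind described for the Config RCE audit might look roughly like the sketch below. It is an assumption about the shape of such a test, not the contents of no_shell_exec_test.exs: the module name is hypothetical, and it works on an in-memory map of path to source text so it stays self-contained.

```elixir
defmodule NoShellExecSketch do
  # Call prefixes taken from the changelog entry above.
  @forbidden ["System.cmd", ":os.cmd", "Port.open", "Path.expand", "Path.absname"]

  # Hypothetical helper: `sources` maps a file path to its source text.
  # Returns every {path, forbidden_call} pair found by plain substring search.
  def violations(sources) when is_map(sources) do
    for {path, code} <- sources,
        needle <- @forbidden,
        String.contains?(code, needle),
        do: {path, needle}
  end
end
```

In a real suite the map would presumably be built from the files under lib/, and the test would assert that the result is an empty list.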
Fixed
- Walk.merge_base/2 picked from the candidate MapSet with hd(MapSet.to_list(...)), whose order depends on insertion hashing. Multiple-LCA cases (criss-cross merges) returned different SHAs on different runs. Now sorts candidates by {-timestamp, sha} (newest first, SHA-ascending tiebreak) for a deterministic pick. Documented divergence from git's exact tiebreak (traversal-order-dependent) in the docstring.
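The deterministic pick can be illustrated in a few lines. The candidate shape used here (maps with :sha and :timestamp keys) and the module name are assumptions made for the example; only the {-timestamp, sha} ordering comes from the entry above.

```elixir
defmodule LcaPickSketch do
  # Hypothetical helper: picks one candidate deterministically from any
  # enumerable (including a MapSet), newest first, smallest SHA on ties.
  def pick(candidates) do
    case Enum.to_list(candidates) do
      [] ->
        nil

      list ->
        # Tuples compare element by element, so {-ts, sha} orders by
        # descending timestamp and then ascending SHA.
        Enum.min_by(list, fn %{timestamp: ts, sha: sha} -> {-ts, sha} end)
    end
  end
end
```

Because the ordering key is total over distinct SHAs, the result no longer depends on the iteration order of the underlying set.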
Added
- Walk.merge_base_all/2: returns every valid LCA, matching git merge-base --all. Cross-checked against real git on 5 DAG shapes.
- Diff.trees/4 bounds: :max_depth (default 256), :max_changes (default nil), and tree-cycle detection via the descent-path seen set. Hostile trees can no longer overflow the stack or loop forever during a diff.
- Index.parse/2 bounds: :max_entries (default 1M), :max_bytes (default 512 MiB), and SHA-1 checksum verification (:verify_checksum, default true). Catches hostile indexes claiming 4-billion entries, oversized inputs, and bit-rot.
Changed — breaking (pre-release API redesign)
These changes were driven by an API audit after the staff-engineering review round. Exgit has not yet cut an official release, so we're taking the opportunity to land the right shapes before v0.1.
- Exgit.lazy_clone/2 removed. Folded into Exgit.clone/2 via new options:
  - clone(url): full clone (eager; default behavior).
  - clone(url, lazy: true): refs only; objects fetched on demand. Returns %Repository{mode: :lazy}.
  - clone(url, filter: {:blob, :none}): partial clone; commits and trees eager, blobs on demand.
  - clone(url, filter: ..., lazy: true): refs only; everything on demand.
  - clone(url, path: "...", lazy: true): returns {:error, :disk_partial_clone_unsupported} (explicit; no silent :path-ignored footgun).

  Matches git clone's single-command mental model.
- %Exgit.Repository{} gained :mode field (:eager | :lazy). Defaults to :eager in Repository.new/3. clone(url, lazy: true) and clone(url, filter: ...) produce :lazy. Repository.materialize/2 flips :lazy → :eager.
- Streaming FS ops (FS.walk/2, FS.grep/4) now pattern-match on :eager and raise on :lazy with a pointer at materialize/2 or prefetch/3. Callers of FS.walk/2 / FS.grep/4 on lazy repos get a clear error message; the previous ArgumentError checked struct-internal cache emptiness.
- FS.prefetch/3 with blobs: true flips :mode to :eager on a previously-lazy repo. After a full prefetch every reachable object is resident, so streaming ops proceed without a second conversion step. blobs: false (trees-only) leaves :mode unchanged.
- Exgit.Transport.ls_refs/2 return shape changed from {:ok, refs} to {:ok, refs, meta}. refs is always a list of {ref_name, sha} 2-tuples (the protocol spec never described any other shape); meta is a map carrying protocol-v2 side-channel data:
  - meta.head: HEAD's symref target (e.g. "refs/heads/main"), present when the server advertises it via the protocol-v2 symrefs argument.
  - meta.peeled: %{tag_ref => peeled_target_sha}, populated when the server emits peeled:<sha> attributes on annotated tags.

  Exgit.Transport.File.ls_refs/2 surfaces meta.head by reading the on-disk HEAD symref. Every user-defined Transport implementation must update to the new 3-tuple return shape.
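For implementers updating a custom transport or its callers, the sketch below shows one way to consume the {:ok, refs, meta} shape. The module, function, and error atoms are hypothetical; only the 3-tuple layout and the meta.head key are taken from the entry above.

```elixir
defmodule LsRefsSketch do
  # Hypothetical consumer of the new return shape: resolves the default
  # branch from meta.head instead of guessing from the ref list.
  def default_branch({:ok, refs, meta}) when is_list(refs) and is_map(meta) do
    case Map.get(meta, :head) do
      # Server did not advertise a HEAD symref target.
      nil ->
        {:error, :no_head_advertised}

      target ->
        # refs is a list of {ref_name, sha} 2-tuples, so keyfind works.
        case List.keyfind(refs, target, 0) do
          {^target, sha} -> {:ok, target, sha}
          nil -> {:error, :dangling_head}
        end
    end
  end

  def default_branch({:error, _} = err), do: err
end
```

Matching on the full 3-tuple makes an un-migrated transport that still returns {:ok, refs} fail loudly with a FunctionClauseError rather than silently losing the metadata.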
Added
- Exgit.RefName: validation of git ref names at the transport boundary. Ports git check-ref-format rules; emits [:exgit, :security, :ref_rejected] telemetry on hostile names.
- Exgit.Filter: structured partial-clone filter specs ({:blob, :none}, {:blob, {:limit, n}}, {:tree, depth}, {:raw, "spec"}).
- Exgit.Repository.materialize/2: convert a Promisor-backed repo into a plain ObjectStore.Memory-backed one in a single call.
- Exgit.Transport.HTTP.request_opts/5 and .auth_headers_for/2: exposed for test introspection; host-bound credential check is now the single enforcement point.
- Exgit.Transport.HTTP.capabilities_cached/1: memoizing capabilities accessor. Reduces HTTP round-trips in agent workflows that issue many fetches against one transport (review #13).
- Exgit.Error: canonical error struct (%Exgit.Error{code, context, message}). New error paths SHOULD use it; existing ad-hoc shapes ({:error, atom}, {:error, {atom, details}}) are preserved for SemVer. v1.0 may coalesce (review #18).
- Exgit.Credentials.bind_to/2: pipeline-friendly host-binding: Credentials.bearer(token) |> Credentials.bind_to("github.com") (review #44).
- Exgit.ObjectStore.Promisor.empty?/1: stable abstraction replacing struct-peeking on %Promisor{cache: %Memory{objects: _}} (review #17).
- Exgit.ObjectStore.Promisor.resolve_with_fetch/2: variant of resolve/2 that threads the grown promisor back on the fetch-but-not-found path so the cache side-effect isn't wasted (review #33).
- :max_pack_bytes (default 2 GiB), :max_object_bytes (default 100 MiB), and :max_resolved_bytes (default 500 MiB) options on Exgit.Pack.Reader.parse/2 bound memory on untrusted input (review #11/#35).
- :max_cache_bytes option on Exgit.ObjectStore.Promisor.new/2: enables FIFO-by-commit eviction so long-running agent loops don't OOM (review #34).
- :redirect option on Exgit.Transport.HTTP.new/2: false (default), :same_origin, or :follow. Host-bound credentials enforce the cross-origin leak check regardless (review #14).
- Protocol v2 symrefs argument on ls-refs: Exgit.clone/2 now picks the server's actual HEAD target instead of guessing main/master/first-advertised (review #9).
- [:exgit, :security, :ref_rejected], [:exgit, :ref_store, :write_failed], [:exgit, :object_store, :haves_sent], and [:exgit, :object_store, :cache_overfull] telemetry events.
- Peeled-tag parsing in packed-refs (review #37). Peeled targets are threaded through for a future fetch-negotiator; not yet surfaced in list_refs/2.
- Dialyzer and Credo in CI (currently report-only; will gate in a future release).
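To make the ref-name validation concrete, here is a sketch covering a subset of the documented git check-ref-format rules. It is not Exgit.RefName: the module name is hypothetical, it omits rules such as the one-level-name restriction, and it emits no telemetry.

```elixir
defmodule RefNameSketch do
  # Hypothetical validator for a subset of git check-ref-format rules.
  def valid?(name) when is_binary(name) do
    components = String.split(name, "/")

    name != "" and name != "@" and
      # No control bytes, space, DEL, or any of ~ ^ : ? * [ \
      not Regex.match?(~r/[\x00-\x20\x7f~^:?*\[\\]/, name) and
      # No ".." anywhere and no "@{" sequence.
      not String.contains?(name, ["..", "@{"]) and
      # The whole name must not end with a dot.
      not String.ends_with?(name, ".") and
      # Empty components cover leading, trailing, and doubled slashes.
      Enum.all?(components, fn c ->
        c != "" and not String.starts_with?(c, ".") and not String.ends_with?(c, ".lock")
      end)
  end
end
```

Rejecting ".." and empty components is what prevents a server-supplied name from climbing out of the refs directory when it is later joined onto a filesystem path.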
Changed — breaking
- Exgit.FS.read_path/3, ls/3, stat/3, write_path/4 now return {:ok, result, repo} to support Promisor cache growth across calls. Callers must thread the returned repo forward to benefit from the populated cache.
- Exgit.Transport.HTTP.new/2 automatically wraps bare auth tuples ({:basic, u, p}, {:bearer, t}, etc.) in a host-bound %Exgit.Credentials{}. Legacy callers are transparently protected against cross-origin credential leaks. To opt out, wrap the tuple with Exgit.Credentials.unbound/1.
- {:callback, fun} auth now receives the request URL as its sole argument (was previously mis-called with zero arguments, a crash on first use).
- ObjectStore.Disk.import_objects/2 returns {:error, {:partial_import, [{sha, reason}]}} on any per-object failure instead of crashing or silently succeeding.
- Exgit.FS.walk/2 and .grep/4 now raise ArgumentError if called on a Promisor-backed repo whose cache is empty, pointing the caller at FS.prefetch/3 or Repository.materialize/2. Prefixes no longer silently return empty results.
- HTTP requests explicitly set redirect: false on Req; no longer depends on Req's default cross-origin auth-stripping behavior.
- Exgit.Transport.HTTP.ls_refs/2 now returns a mix of 2-tuples {ref, sha} and 3-tuples {ref, sha, meta}; the 3-tuple shape carries protocol-v2 attributes like symref-target and peeled. Consumers that care only about the {ref, sha} pair can use elem/2 or run through a tuple-shape-agnostic iteration.
- Tree.new/1 accepts :strict option; when true, unknown modes raise ArgumentError instead of being silently coerced (review #10). Default behavior unchanged.
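Threading the returned repo forward, as the {:ok, result, repo} entry above requires, can be sketched generically. The module name is hypothetical and the reader is passed in as a function, so nothing here assumes the exact argument order of the Exgit.FS calls.

```elixir
defmodule ThreadRepoSketch do
  # Hypothetical helper: `read` is any 2-arity function returning
  # {:ok, result, repo} or {:error, reason}. Each call receives the
  # repo returned by the previous one, so cache growth is kept.
  def read_all(repo, paths, read) when is_function(read, 2) do
    paths
    |> Enum.reduce_while({:ok, [], repo}, fn path, {:ok, acc, repo} ->
      case read.(repo, path) do
        {:ok, result, repo} -> {:cont, {:ok, [result | acc], repo}}
        {:error, _} = err -> {:halt, err}
      end
    end)
    |> case do
      {:ok, acc, repo} -> {:ok, Enum.reverse(acc), repo}
      err -> err
    end
  end
end
```

Reusing the original repo binding for every call would still work, but each read on a lazy repo would start from the same unpopulated cache.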
Fixed
- Pack parser no longer raises ArgumentError / MatchError on malformed input. Every decoder returns {:error, _}.
- Pack.Delta.apply/2 validates copy offsets, insert lengths, and the result-size cap; hostile deltas produce tagged errors.
- Pack.Common.decode_type_size_varint/1 and decode_ofs_varint/1 return {:error, :truncated} on empty input instead of crashing with FunctionClauseError.
- Loose-object parser validates the declared size against the content length and rejects unknown object types with a structured error.
- Pack.Index no longer generates descending 0..-1 ranges on empty packs (removes Elixir 1.19 deprecation warning).
- Commit.decode/1 and Tag.decode/1 validate hex-header values: a structurally-valid commit with non-hex tree/parent bytes is rejected with {:error, {:invalid_hex_header, name, value}} instead of crashing downstream accessors (review #23).
- Tree.decode/1 validates every entry name against path-traversal rules: rejects empty, ., .., any /, any NUL, and case-insensitive .git / .gitmodules (review #2).
- RefStore.Disk validates ref names at every public entry (read_ref/2, resolve_ref/2, write_ref/4, delete_ref/2) and revalidates symbolic targets read from disk. Hostile targets return {:error, :invalid_ref_name} with telemetry (review #1).
- ObjectStore.Disk.get_object/2 wraps :zlib.uncompress/1 in try/rescue, returning {:error, :zlib_error} on corrupt/hostile loose objects instead of raising (review #3).
- Pack.Reader zlib tracking uses :zlib.safeInflate/2 + :zlib.inflateEnd/1 probes; no :zlib.uncompress/1 calls on hostile input; per-probe output is bounded by safeInflate's implementation-defined threshold (review #4).
- Pack.Writer.deflate/1 wraps zlib calls in try/after so the zlib port is freed even when deflate/3 raises. Previously a long-running server would slowly leak ports under memory pressure (review #30).
- Credentials.host_matches?/2 normalizes both pattern and URL host: ASCII-case-folded, trailing-dot-stripped. GITHUB.COM, github.com., GitHub.com. all match a "github.com" binding. Host-confusion attacks like evil.comgithub.com still correctly fail to match (review #5).
- Custom Inspect impl for %Exgit.Credentials{}: default Inspect would dump the raw token into crash logs (review #15).
- Walk.merge_base/2 maintains stale_in_queue incrementally; the early-termination check is now O(1) instead of O(Q) per iteration. Merge-base on histories with hundreds of shared ancestors is no longer O(Q²) (review #25).
- Walk.parse_timestamp/1 uses a module-attribute regex compiled once at load time instead of per-call (review #27).
- Config pre-compiles section-header regexes at module load (review #29).
- Config.parse/1 uses case instead of an unconditional match on parse_key_value/1's result; future branches that return {:error, _} cannot crash the parser, matching the moduledoc's "never raises on untrusted input" contract (review #28).
- Pack.Reader bounds by_sha + resolved memory via :max_resolved_bytes so a pack of many small OFS_DELTA chains can't balloon heap beyond the per-pack cap (review #11/#35).
- ObjectStore.Disk pread_tail/3 size-probes the pack file and reads the full object body instead of capping at 128 KiB. Objects larger than 128 KiB in packs now decode correctly; previously they silently returned truncated bodies (review #12).
- Promisor.collect_commit_haves/1 uses a :gb_trees priority queue keyed on recency instead of sorting the full commit map. O(N log K) where N is the 256-cap, not O(K log K) per miss (review #32).
- Exgit.clone/2 picks the default branch from the server's HEAD symref (via protocol-v2 symrefs on ls-refs) instead of guessing from advertised refs (review #9).
- Exgit.lazy_clone/2 emits [:exgit, :ref_store, :write_failed] telemetry if a ref-store write fails during initial seed, instead of silently dropping the ref (review #8).
- Exgit.push/3 emits an empty-but-valid PACK header when pushing a fast-forward that needs no new objects, matching git's send-pack wire shape; pure-delete pushes still send no pack (review #6).
- RefStore.Disk.list_loose_refs/3 caps recursion depth at 16 and refuses to follow symlinks, defending against symlink loops in ref directories (review #36).
- RefStore.Disk parses peeled-tag lines in packed-refs instead of silently dropping them (review #37).
- FS.resolve_tree/2 accepts a ref that points directly at a tree in both the string-ref and raw-SHA branches (review #40).
- FS.resolve_tree/2 disambiguates 20-byte binary inputs: a binary of all printable ASCII with non-hex characters is treated as a ref name, not a SHA (review #41).
- FS.compile_glob/1 returns a harmless always-false regex on compilation failure instead of raising (review #20).
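The host normalization described for Credentials.host_matches?/2 can be approximated as below. This is a sketch under assumptions, not the library's code: the module name is hypothetical, and it treats a binding as an exact match after normalization, which may be stricter or looser than the real matching rules.

```elixir
defmodule HostMatchSketch do
  # Hypothetical normalizer: ASCII case folding plus removal of the
  # trailing root-label dot(s), as described in the entry above.
  def normalize(host) when is_binary(host) do
    host
    |> String.downcase(:ascii)
    |> String.trim_trailing(".")
  end

  # Exact comparison after normalization (an assumption for this sketch).
  def host_matches?(pattern, host), do: normalize(pattern) == normalize(host)
end
```

Comparing whole normalized hosts, instead of checking a suffix or substring, is what keeps look-alike hosts that merely contain the bound name from matching.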
Security
- CVE-worthy: remote-controlled ref names can no longer escape the repo root via Path.join. Exgit.RefName validates every ref at the wire perimeter; RefStore.Disk re-validates as defense in depth.
- CVE-worthy: hostile trees containing path-traversal entry names (.., /foo, .git) are rejected at Tree.decode/1; they never reach FS operations or a future checkout.
- CVE-worthy: a malformed commit (structurally valid but with non-hex tree/parent headers) previously DoS'd every operation that called a Commit accessor (walk, diff, push, FS). Validation moved into decode/1.
- CVE-worthy: credentials set via bare auth tuples are now host-bound automatically. Cross-origin redirects cannot leak the token regardless of Req's redirect behavior. Host matching is ASCII-case-folded and trailing-dot-stripped.
- Pack parser bounded at 2 GiB pack / 100 MiB per-object / 500 MiB resolved-total by default; no hostile server response can force unbounded allocation on the BEAM heap.
- %Exgit.Credentials{} has a custom Inspect impl that redacts auth values; crash logs, SASL reports, and IEx sessions do not leak tokens.
- Loose-object zlib decompression is wrapped in try/rescue; corrupt or tampered objects return tagged errors instead of crashing.
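The tree entry-name rules listed above (reject empty, ., .., any /, any NUL, and case-insensitive .git / .gitmodules) translate into a short predicate. The module name is hypothetical and the sketch covers only those listed rules; the real Tree.decode/1 may check more.

```elixir
defmodule TreeEntryNameSketch do
  # Hypothetical predicate mirroring the entry-name rules in this changelog.
  def valid?(name) when is_binary(name) do
    # ASCII-only folding is enough for the .git / .gitmodules comparison.
    lowered = String.downcase(name, :ascii)

    name not in ["", ".", ".."] and
      # A tree entry is a single path component: no slash, no NUL byte.
      not String.contains?(name, ["/", <<0>>]) and
      lowered not in [".git", ".gitmodules"]
  end
end
```

Validating at decode time means every later consumer of a tree sees only names that are safe to join onto a directory path.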