# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] — 2026-07-01

Initial release: pure-Elixir git client for clone, fetch, push over
smart HTTP v2, with lazy partial-clone support and a path-oriented FS
API for agents.

See [README](./README.md) and [BENCHMARKS on the smoketest
repo](https://github.com/ivarvong/exgit_smoketest/blob/main/BENCHMARKS.md).

### Security — credential redaction + ref bounds

- **Telemetry no longer leaks URL-embedded credentials.** A token
  embedded in a remote URL (`https://token@host/…`, as git clients
  commonly accept) is now redacted to `***` before the URL enters any
  `:telemetry` span metadata (`ls_refs`, `fetch`, `push`) or the
  `[:exgit, :security, :ref_rejected]` event. Previously such a token
  could reach telemetry exporters / log aggregators. Prefer the
  `:auth` field regardless — it was already redacted.
- **`ls-refs` responses are capped at 1,000,000 refs.** A hostile or
  broken server can no longer stream unbounded refs into client
  memory; the transport stream halts once the cap trips and the
  caller gets `{:error, {:too_many_refs, cap}}`. Tunable via the
  `:max_refs` option on `Exgit.Transport.HTTP.new/2`. Real repos
  (linux, esp-idf) sit far below the default.
- **Dependencies bumped to clear HTTP-stack advisories.** `req`
  `0.5.17 → 0.6.2` and `mint` `1.7.1 → 1.9.0` resolve the decompression-
  bomb DoS (CVE-2026-49755), multipart header injection
  (CVE-2026-49756), and HTTP/2 CONTINUATION flood (CVE-2026-49754).
  The cross-origin credential-leak test suite was re-run against the
  new `req` line. The only remaining `mix hex.audit` advisories are in
  `cowlib`, reachable solely through the `only: :test` `bypass`
  dependency — never part of the published package or a consumer's
  runtime.
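
The URL redaction described above can be sketched with the standard `URI` module. This is a minimal illustration, not Exgit's implementation: the module and function names (`RedactSketch`, `redact_url/1`) are hypothetical, and it assumes the credential sits in the URL's userinfo component.

```elixir
defmodule RedactSketch do
  @moduledoc false

  # Hypothetical helper for illustration; not part of Exgit's API.
  # Replaces any userinfo ("token" or "user:pass") with "***" so the
  # URL is safe to place in telemetry metadata.
  def redact_url(url) when is_binary(url) do
    case URI.parse(url) do
      # No embedded credentials: return the URL untouched.
      %URI{userinfo: nil} -> url
      # Credentials present: swap them out and re-serialize.
      %URI{} = uri -> URI.to_string(%URI{uri | userinfo: "***"})
    end
  end
end
```

Redacting at the point where the URL enters span metadata, rather than in each exporter, keeps a single place to audit.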

### Notes

- The optional `:vfs` integration depends on the pre-1.0 `vfs`
  package (`~> 0.1.0`), which itself depends on exgit. The cycle is
  handled with `runtime: false` + a compile-time `Code.ensure_loaded?`
  guard, but treat the integration as pre-release and pin `vfs` if you
  rely on it.

### Added — size-aware reads

- **`Exgit.FS.size/3`** — byte size of a blob at a path *without*
  materializing its content. The size-aware companion to
  `read_path/4`: gate on it before pulling a blob into memory.
  O(1) for the in-memory store; on-disk loose objects inflate only
  the header. Resolving the path may fetch trees (small) on a lazy
  clone, but never the blob — an un-fetched blob returns
  `{:error, :not_local}` instead of triggering a possibly-multi-GB
  fetch. Directories return `{:error, :not_a_blob}`; gitlink
  (submodule) entries return `{:error, :submodule}` — as do
  `read_path/4` lookups on them, while `stat/3` reports
  `%{type: :submodule}` without fetching.
- **`Exgit.ObjectStore.object_size/2`** — new protocol callback
  backing the above. Memory keeps a parallel `sha => size` index
  (no extra decompression); `Promisor` answers from cache or
  returns `{:error, :not_local}` without fetching.

### Added — observability + workload bench

- **`Exgit.Profiler`** — structured trace of `:telemetry` span
  events emitted during a function call. One-shot `profile/1`
  returns `{result, %{total_us, totals, peak_cache_bytes,
  events}}`; manual `attach/0` + `read/1` + `detach/1` for
  long-running processes. Process-scoped — concurrent
  profilers don't interfere.
- **`Exgit.Repository.memory_report/1`** — structured memory
  report (object counts by type, cache_bytes, max_cache_bytes,
  mode, backend) with consistent shape across all object-store
  backends; counts report `:unknown` for backends that can't be
  introspected (Disk). Suitable for emission into observability
  stacks.
- **`bench/agent_workload.exs`** — realistic agent-session
  benchmark: clone + prefetch + ls + grep + reads, with `:cold`
  and `:hot` variants. Reports per-op breakdown + peak cache
  bytes.
- **`test/exgit/fs_grep_git_parity_test.exs`** — correctness
  oracle: `Exgit.FS.grep` output vs `git grep -n` for 7
  representative patterns. Tagged `:real_git :slow`; gates
  every push via the extended CI tier.
- **`docs/NOTES.md`** — design notes for deferred work (LRU
  eviction, decompressed-blob cache, literal-string grep fast
  path). Captures enough detail that future implementation
  doesn't re-reason from scratch.

### Performance

A perf-focused round triggered by a real-world bug report (partial
clone returning empty packs). Adding real-world fixtures
(`cloudflare/agents`, `anomalyco/opencode`) to the benchmark
immediately surfaced three cascading bugs in the core
"clone + prefetch + read" hot path that pyex was too small to
expose:

- **`FS.walk` now threads the updated repo through stream state**.
  It previously discarded the grown promisor from `resolve_tree`,
  so every walk on a lazy repo triggered a fresh commit fetch.
  On `cloudflare/agents` (1,418 files): 7,700 ms → 2 ms per walk,
  roughly 3,800× faster.
- **Promisor cache bytes now tracked as compressed**, not
  decompressed. Previous accounting over-counted by 3-10×,
  tripping eviction during normal prefetch. Combined with the
  evictor only dropping commits (not blobs/trees), this could
  drop the single commit we'd just fetched for a streaming walk.
- **`:max_cache_bytes` default changed from 64 MiB to `:infinity`**.
  Unbounded is the right default for partial-clone / prefetch
  workflows. Callers with real memory envelopes (long-running
  daemons, low-memory deployments) set an explicit cap based on
  their budget.
- **`:max_resolved_bytes` default raised from 500 MiB to 2 GiB**
  (matches `:max_pack_bytes`). The old cap blocked real-world
  monorepos; `anomalyco/opencode` resolves to 524 MB.

Plus two real optimizations:

- **Adler32 trailer probe** for pack zlib stream tracking.
  Replaces an O(log N) binary-search with one linear scan +
  one verify probe. `pack.parse` went from 127 ms → 49 ms on
  pyex (2.6× faster). Saves several seconds on large packs.
- **Single-pass grep (`matches_in` rewrite)**. Previously
  split every blob into lines via regex before matching; now
  runs `Regex.scan` over the whole blob and computes line numbers
  only for files that match. 13× faster on the common case (repo
  with few matches).
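
The single-pass grep idea can be sketched as follows. This is an illustration under stated assumptions, not the actual `matches_in` code: `GrepSketch` and `matching_lines/2` are hypothetical names, and the sketch returns only 1-based line numbers. The point is that a blob with no matches never pays for line splitting.

```elixir
defmodule GrepSketch do
  @moduledoc false

  # Hypothetical illustration; not Exgit's actual `matches_in`.
  # Scans the whole blob once, then maps byte offsets to line numbers
  # only when at least one match was found.
  def matching_lines(blob, %Regex{} = regex) when is_binary(blob) do
    case Regex.scan(regex, blob, return: :index) do
      [] ->
        # Common case: no matches, so no newline bookkeeping at all.
        []

      hits ->
        # Byte offset of each whole match, in ascending order.
        offsets = for [{start, _len} | _groups] <- hits, do: start
        # Byte offset of every newline, in ascending order.
        newlines = for {pos, _len} <- :binary.matches(blob, "\n"), do: pos
        offsets |> line_numbers(newlines, 1, []) |> Enum.dedup()
    end
  end

  # Merge-style walk over two ascending lists: advance the line counter
  # past every newline that precedes the current match offset.
  defp line_numbers([], _newlines, _line, acc), do: Enum.reverse(acc)

  defp line_numbers([offset | _] = offsets, [nl | rest], line, acc) when nl < offset,
    do: line_numbers(offsets, rest, line + 1, acc)

  defp line_numbers([_offset | offsets], newlines, line, acc),
    do: line_numbers(offsets, newlines, line, [line | acc])
end
```

For example, `GrepSketch.matching_lines("alpha\nbeta\ngamma beta\n", ~r/beta/)` returns `[2, 3]`.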

**Explicitly reverted:** initial parallel-grep implementation
using `Task.async_stream`. Measured 22× SLOWER on `cloudflare/agents`
— per-file spawn overhead (~50-100 µs) dominates the microsecond
regex work. Default stays sequential. Callers with substantial
per-file work opt in via `max_concurrency: :schedulers`.

Full benchmark methodology + per-fixture numbers in
[`docs/PERFORMANCE.md`](docs/PERFORMANCE.md). Benchmark harness
in `bench/review_bench.exs`.

### Production-readiness round

A follow-up audit after the staff-engineering review closed the
reviewer's "what I didn't look at" list:

- **Config RCE audit** — `Exgit.Config` is read-only data; no code
  path executes values from it (no `core.sshCommand`,
  `core.fsmonitor`, `core.hookspath`, `insteadOf`, `includeIf`
  expansion). A new structural test
  (`test/exgit/security/no_shell_exec_test.exs`) asserts `lib/`
  contains zero `System.cmd` / `:os.cmd` / `Port.open` /
  `Path.expand` / `Path.absname` calls; failure means someone
  introduced a new execution path that needs review against the
  threat model.
- **Pack.Writer concurrent-build stress** — 3 tests assert 100
  parallel builds of identical input produce byte-identical
  output, 100 parallel builds of distinct input round-trip
  cleanly, and 1000 sequential builds don't leak zlib ports.
- **Decoder fuzz corpus** — 10 property tests, 500 cases each,
  exercise `Blob.decode/1`, `Tree.decode/1`, `Commit.decode/1`,
  `Tag.decode/1`, and `Pack.Reader.parse/2` on random bytes.
  Every decoder's "never raises on untrusted input" promise is
  now explicitly tested.
- **Config fuzz corpus** — 3 property tests, 500 cases each,
  cover `Config.parse/1` on random bytes, section-header-like
  noise, and roundtrip fixpoint. Includes RCE-shape regression
  tests that parse `core.fsmonitor` / `core.sshCommand` /
  `includeIf` values and assert they are stored verbatim (not
  executed or expanded).
- **Walk cross-check vs real git** — `test/exgit/walk_real_git_test.exs`
  constructs 5 DAG shapes (fork, criss-cross, linear, deep-fork,
  octopus) with real git, then compares `Exgit.Walk.merge_base/2`
  and `merge_base_all/2` against `git merge-base` and
  `git merge-base --all`. Found and fixed a nondeterministic
  LCA-pick bug (criss-cross merges).

### Fixed

- **`Walk.merge_base/2`** picked from the candidate `MapSet` with
  `hd(MapSet.to_list(...))`, whose order depends on insertion
  hashing. Multiple-LCA cases (criss-cross merges) returned
  different SHAs on different runs. Now sorts candidates by
  `{-timestamp, sha}` (newest first, SHA-ascending tiebreak) for
  a deterministic pick. Documented divergence from git's exact
  tiebreak (traversal-order-dependent) in the docstring.
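
The deterministic pick can be sketched as a sort key over the candidate set. This is a minimal sketch, assuming each candidate is a `{sha, timestamp}` pair; the module name `LcaPickSketch` and that pair shape are illustrative assumptions, not Exgit's internal representation.

```elixir
defmodule LcaPickSketch do
  @moduledoc false

  # Hypothetical illustration; candidates are assumed to be
  # {sha, timestamp} pairs in any enumerable (e.g. a MapSet).
  # Picks the newest commit; ties break on ascending SHA, so the
  # result does not depend on MapSet iteration order.
  def pick(candidates) do
    case Enum.to_list(candidates) do
      [] ->
        nil

      list ->
        {sha, _ts} = Enum.min_by(list, fn {sha, ts} -> {-ts, sha} end)
        sha
    end
  end
end
```

Because the key `{-timestamp, sha}` is a total order over distinct SHAs, two runs over the same candidate set always agree.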

### Added

- **`Walk.merge_base_all/2`** — returns every valid LCA, matching
  `git merge-base --all`. Cross-checked against real git on 5 DAG
  shapes.
- **`Diff.trees/4` bounds** — `:max_depth` (default 256),
  `:max_changes` (default `nil`), and tree-cycle detection via
  the descent-path `seen` set. Hostile trees can no longer
  overflow the stack or loop forever during a diff.
- **`Index.parse/2` bounds** — `:max_entries` (default 1M),
  `:max_bytes` (default 512 MiB), and SHA-1 checksum verification
  (`:verify_checksum`, default `true`). Catches hostile indexes
  claiming 4-billion entries, oversized inputs, and bit-rot.
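
A depth cap combined with a descent-path `seen` set can be sketched on a toy in-memory tree. Everything here is an assumption for illustration: the `BoundedDescentSketch` module, the `%{id => entries}` tree shape, and the error atoms are hypothetical and do not mirror `Diff.trees/4` internals.

```elixir
defmodule BoundedDescentSketch do
  @moduledoc false

  @default_max_depth 256

  # Hypothetical illustration. `trees` maps a tree id to a list of
  # {name, :blob} or {name, {:tree, child_id}} entries. Counts blobs
  # reachable from `root`, refusing cycles and excessive depth.
  def count_blobs(trees, root, opts \\ []) do
    max_depth = Keyword.get(opts, :max_depth, @default_max_depth)
    descend(trees, root, 0, max_depth, MapSet.new())
  end

  defp descend(_trees, _id, depth, max_depth, _seen) when depth > max_depth,
    do: {:error, :max_depth_exceeded}

  defp descend(trees, id, depth, max_depth, seen) do
    cond do
      # `seen` holds only the ancestors on the current descent path, so
      # a subtree shared by two siblings is fine; a true cycle is not.
      MapSet.member?(seen, id) ->
        {:error, {:tree_cycle, id}}

      not is_map_key(trees, id) ->
        {:error, {:missing_tree, id}}

      true ->
        seen = MapSet.put(seen, id)

        Enum.reduce_while(Map.fetch!(trees, id), {:ok, 0}, fn
          {_name, :blob}, {:ok, n} ->
            {:cont, {:ok, n + 1}}

          {_name, {:tree, child}}, {:ok, n} ->
            case descend(trees, child, depth + 1, max_depth, seen) do
              {:ok, m} -> {:cont, {:ok, n + m}}
              {:error, _} = err -> {:halt, err}
            end
        end)
    end
  end
end
```

Keeping `seen` per descent path rather than global matters for git data, where identical subtrees legitimately appear under several parents.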

### Changed — **breaking (pre-release API redesign)**

These changes were driven by an API audit after the staff-engineering
review round. Exgit has not yet cut an official release, so we're
taking the opportunity to land the right shapes before v0.1.

- **`Exgit.lazy_clone/2` removed.** Folded into `Exgit.clone/2` via
  new options:
    - `clone(url)` — full clone (eager; default behavior).
    - `clone(url, lazy: true)` — refs only; objects fetched on demand.
      Returns `%Repository{mode: :lazy}`.
    - `clone(url, filter: {:blob, :none})` — partial clone; commits
      and trees eager, blobs on demand.
    - `clone(url, filter: ..., lazy: true)` — refs only; everything
      on demand.
    - `clone(url, path: "...", lazy: true)` — returns
      `{:error, :disk_partial_clone_unsupported}` (explicit; no
      silent `:path`-ignored footgun).

  Matches `git clone`'s single-command mental model.

- **`%Exgit.Repository{}` gained `:mode` field** (`:eager | :lazy`).
  Defaults to `:eager` in `Repository.new/3`. `clone(url, lazy: true)`
  and `clone(url, filter: ...)` produce `:lazy`. `Repository.materialize/2`
  flips `:lazy → :eager`. Streaming FS ops (`FS.walk/2`, `FS.grep/4`)
  now pattern-match on `:eager` and raise on `:lazy` with a pointer
  at `materialize/2` or `prefetch/3`. Callers of `FS.walk/2`/`FS.grep/4`
  on lazy repos get a clear error message; the previous
  `ArgumentError` checked struct-internal cache emptiness.

- **`FS.prefetch/3` with `blobs: true` flips `:mode` to `:eager`** on
  a previously-lazy repo. After a full prefetch every reachable
  object is resident, so streaming ops proceed without a second
  conversion step. `blobs: false` (trees-only) leaves `:mode`
  unchanged.

- **`Exgit.Transport.ls_refs/2` return shape changed** from
  `{:ok, refs}` to `{:ok, refs, meta}`. `refs` is always a list of
  `{ref_name, sha}` 2-tuples (the protocol spec never described any
  other shape); `meta` is a map carrying protocol-v2 side-channel
  data:
    - `meta.head` — HEAD's symref target (e.g. `"refs/heads/main"`),
      present when the server advertises it via the protocol-v2
      `symrefs` argument.
    - `meta.peeled` — `%{tag_ref => peeled_target_sha}`, populated
      when the server emits `peeled:<sha>` attributes on annotated
      tags.
  `Exgit.Transport.File.ls_refs/2` surfaces `meta.head` by reading
  the on-disk HEAD symref. Every user-defined Transport
  implementation must update to the new 3-tuple return shape.
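
Consuming the new `{:ok, refs, meta}` shape might look like the sketch below. It operates on plain data only; the `LsRefsSketch` module and `default_branch/1` helper are hypothetical, and it assumes `meta.head` is simply absent when the server does not advertise a symref.

```elixir
defmodule LsRefsSketch do
  @moduledoc false

  # Hypothetical consumer of an `ls_refs` result; not part of Exgit.
  # Resolves the default branch from `meta.head` when present.
  def default_branch({:ok, refs, meta}) when is_list(refs) and is_map(meta) do
    case Map.get(meta, :head) do
      nil ->
        # Server did not advertise HEAD's symref target.
        :unknown

      target ->
        # `refs` is a list of {ref_name, sha} 2-tuples.
        case List.keyfind(refs, target, 0) do
          {^target, sha} -> {:ok, target, sha}
          nil -> :unknown
        end
    end
  end

  def default_branch({:error, _reason} = err), do: err
end
```

Returning `:unknown` instead of guessing `main` or `master` leaves the fallback policy to the caller.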

### Added

- **`Exgit.RefName`** — validation of git ref names at the transport
  boundary. Ports `git check-ref-format` rules; emits
  `[:exgit, :security, :ref_rejected]` telemetry on hostile names.
- **`Exgit.Filter`** — structured partial-clone filter specs
  (`{:blob, :none}`, `{:blob, {:limit, n}}`, `{:tree, depth}`,
  `{:raw, "spec"}`).
- **`Exgit.Repository.materialize/2`** — convert a Promisor-backed repo
  into a plain `ObjectStore.Memory`-backed one in a single call.
- **`Exgit.Transport.HTTP.request_opts/5`** and **`.auth_headers_for/2`** —
  exposed for test introspection; host-bound credential check is now
  the single enforcement point.
- **`Exgit.Transport.HTTP.capabilities_cached/1`** — memoizing
  capabilities accessor. Reduces HTTP round-trips in agent workflows
  that issue many fetches against one transport (review #13).
- **`Exgit.Error`** — canonical error struct (`%Exgit.Error{code,
  context, message}`). New error paths SHOULD use it; existing ad-hoc
  shapes (`{:error, atom}`, `{:error, {atom, details}}`) are preserved
  for SemVer. v1.0 may coalesce (review #18).
- **`Exgit.Credentials.bind_to/2`** — pipeline-friendly host-binding:
  `Credentials.bearer(token) |> Credentials.bind_to("github.com")`
  (review #44).
- **`Exgit.ObjectStore.Promisor.empty?/1`** — stable abstraction
  replacing struct-peeking on `%Promisor{cache: %Memory{objects: _}}`
  (review #17).
- **`Exgit.ObjectStore.Promisor.resolve_with_fetch/2`** — variant of
  `resolve/2` that threads the grown promisor back on the
  fetch-but-not-found path so the cache side-effect isn't wasted
  (review #33).
- `:max_pack_bytes` (default 2 GiB), `:max_object_bytes` (default
  100 MiB), and `:max_resolved_bytes` (default 500 MiB at the time;
  later raised to 2 GiB, see Performance) options on
  `Exgit.Pack.Reader.parse/2` bound memory on untrusted input
  (review #11/#35).
- `:max_cache_bytes` option on `Exgit.ObjectStore.Promisor.new/2` —
  enables FIFO-by-commit eviction so long-running agent loops don't
  OOM (review #34).
- `:redirect` option on `Exgit.Transport.HTTP.new/2` — `false`
  (default), `:same_origin`, or `:follow`. Host-bound credentials
  enforce the cross-origin leak check regardless (review #14).
- Protocol v2 `symrefs` argument on `ls-refs` — `Exgit.clone/2` now
  picks the server's actual HEAD target instead of guessing
  `main`/`master`/first-advertised (review #9).
- `[:exgit, :security, :ref_rejected]`, `[:exgit, :ref_store,
  :write_failed]`, `[:exgit, :object_store, :haves_sent]`, and
  `[:exgit, :object_store, :cache_overfull]` telemetry events.
- Peeled-tag parsing in `packed-refs` (review #37). Peeled targets
  are threaded through for a future fetch-negotiator; not yet
  surfaced in `list_refs/2`.
- Dialyzer and Credo in CI (currently report-only; will gate in a
  future release).

### Changed — **breaking**

- **`Exgit.FS.read_path/3`**, **`ls/3`**, **`stat/3`**, **`write_path/4`**
  now return `{:ok, result, repo}` to support Promisor cache growth
  across calls. Callers must thread the returned `repo` forward to
  benefit from the populated cache.
- **`Exgit.Transport.HTTP.new/2`** automatically wraps bare auth tuples
  (`{:basic, u, p}`, `{:bearer, t}`, etc.) in a host-bound
  `%Exgit.Credentials{}`. Legacy callers are transparently protected
  against cross-origin credential leaks. To opt out, wrap the tuple
  with `Exgit.Credentials.unbound/1`.
- **`{:callback, fun}` auth** now receives the request URL as its sole
  argument (was previously mis-called with zero arguments — crash on
  first use).
- **`ObjectStore.Disk.import_objects/2`** returns
  `{:error, {:partial_import, [{sha, reason}]}}` on any per-object
  failure instead of crashing or silently succeeding.
- **`Exgit.FS.walk/2`** and **`.grep/4`** now raise `ArgumentError` if
  called on a Promisor-backed repo whose cache is empty, pointing the
  caller at `FS.prefetch/3` or `Repository.materialize/2`. Prefixes
  no longer silently return empty results.
- HTTP requests explicitly set `redirect: false` on Req — no longer
  depends on Req's default cross-origin auth-stripping behavior.
- **`Exgit.Transport.HTTP.ls_refs/2`** now returns a mix of 2-tuples
  `{ref, sha}` and 3-tuples `{ref, sha, meta}`; the 3-tuple shape
  carries protocol-v2 attributes like `symref-target` and `peeled`.
  Consumers that care only about the `{ref, sha}` pair can use
  `elem/2` or any tuple-shape-agnostic iteration. (Superseded before
  release by the `{:ok, refs, meta}` shape described in the
  pre-release API redesign above.)
- **`Tree.new/1`** accepts `:strict` option; when `true`, unknown
  modes raise `ArgumentError` instead of being silently coerced
  (review #10). Default behavior unchanged.

### Fixed

- Pack parser no longer raises `ArgumentError` / `MatchError` on
  malformed input. Every decoder returns `{:error, _}`.
- `Pack.Delta.apply/2` validates copy offsets, insert lengths, and
  the result-size cap — hostile deltas produce tagged errors.
- `Pack.Common.decode_type_size_varint/1` and
  `decode_ofs_varint/1` return `{:error, :truncated}` on empty input
  instead of crashing on `FunctionClauseError`.
- Loose-object parser validates the declared size against the
  content length and rejects unknown object types with a structured
  error.
- `Pack.Index` no longer generates descending `0..-1` ranges on empty
  packs (removes Elixir 1.19 deprecation warning).
- **Commit.decode/1** and **Tag.decode/1** validate hex-header
  values — a structurally-valid commit with non-hex `tree`/`parent`
  bytes is rejected with `{:error, {:invalid_hex_header, name,
  value}}` instead of crashing downstream accessors (review #23).
- **Tree.decode/1** validates every entry name against
  path-traversal rules — rejects empty, `.`, `..`, any `/`, any NUL,
  and case-insensitive `.git`/`.gitmodules` (review #2).
- **RefStore.Disk** validates ref names at every public entry
  (`read_ref/2`, `resolve_ref/2`, `write_ref/4`, `delete_ref/2`) and
  revalidates symbolic targets read from disk. Hostile targets
  return `{:error, :invalid_ref_name}` with telemetry (review #1).
- **ObjectStore.Disk.get_object/2** wraps `:zlib.uncompress/1` in
  `try/rescue`, returning `{:error, :zlib_error}` on corrupt/hostile
  loose objects instead of raising (review #3).
- **Pack.Reader** zlib tracking uses `:zlib.safeInflate/2` +
  `:zlib.inflateEnd/1` probes — no `:zlib.uncompress/1` calls on
  hostile input; per-probe output is bounded by `safeInflate`'s
  implementation-defined threshold (review #4).
- **Pack.Writer.deflate/1** wraps zlib calls in `try/after` so the
  zlib port is freed even when `deflate/3` raises. Previously a
  long-running server would slowly leak ports under memory pressure
  (review #30).
- **Credentials.host_matches?/2** normalizes both pattern and URL
  host: ASCII-case-folded, trailing-dot-stripped. `GITHUB.COM`,
  `github.com.`, `GitHub.com.` all match a `"github.com"` binding.
  Host-confusion attacks like `evil.comgithub.com` still correctly
  fail to match (review #5).
- **Custom `Inspect` impl** for `%Exgit.Credentials{}` — default
  Inspect would dump the raw token into crash logs (review #15).
- **Walk.merge_base/2** maintains `stale_in_queue` incrementally;
  the early-termination check is now O(1) instead of O(Q) per
  iteration. Merge-base on histories with hundreds of shared
  ancestors is no longer O(Q²) (review #25).
- **Walk.parse_timestamp/1** uses a module-attribute regex compiled
  once at load time instead of per-call (review #27).
- **Config** pre-compiles section-header regexes at module load
  (review #29).
- **Config.parse/1** uses `case` instead of an unconditional match
  on `parse_key_value/1`'s result — future branches that return
  `{:error, _}` cannot crash the parser, matching the moduledoc's
  "never raises on untrusted input" contract (review #28).
- **Pack.Reader** bounds `by_sha` + `resolved` memory via
  `:max_resolved_bytes` so a pack of many small OFS_DELTA chains
  can't balloon heap beyond the per-pack cap (review #11/#35).
- **ObjectStore.Disk** `pread_tail/3` size-probes the pack file and
  reads the full object body instead of capping at 128 KiB. Objects
  larger than 128 KiB in packs now decode correctly; previously
  they silently returned truncated bodies (review #12).
- **Promisor.collect_commit_haves/1** uses a `:gb_trees` priority
  queue keyed on recency instead of sorting the full commit map.
  O(N log K) where N is the 256-cap, not O(K log K) per miss
  (review #32).
- **`Exgit.clone/2`** picks the default branch from the server's
  HEAD symref (via protocol-v2 `symrefs` on `ls-refs`) instead of
  guessing from advertised refs (review #9).
- **`Exgit.lazy_clone/2`** emits `[:exgit, :ref_store,
  :write_failed]` telemetry if a ref-store write fails during
  initial seed, instead of silently dropping the ref (review #8).
- **`Exgit.push/3`** emits an empty-but-valid PACK header when
  pushing a fast-forward that needs no new objects, matching git's
  `send-pack` wire shape; pure-delete pushes still send no pack
  (review #6).
- **RefStore.Disk.list_loose_refs/3** caps recursion depth at 16 and
  refuses to follow symlinks, defending against symlink loops in
  ref directories (review #36).
- **RefStore.Disk** parses peeled-tag lines in `packed-refs` instead
  of silently dropping them (review #37).
- **FS.resolve_tree/2** accepts a ref that points directly at a
  tree in both the string-ref and raw-SHA branches (review #40).
- **FS.resolve_tree/2** disambiguates 20-byte binary inputs: a
  binary of all printable ASCII with non-hex characters is treated
  as a ref name, not a SHA (review #41).
- **FS.compile_glob/1** returns a harmless always-false regex on
  compilation failure instead of raising (review #20).
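
The host normalization described for `Credentials.host_matches?/2` can be sketched as below. This is an assumption-laden illustration: `HostMatchSketch` is a hypothetical module, it takes a full URL as the second argument, and it strips every trailing dot rather than exactly one.

```elixir
defmodule HostMatchSketch do
  @moduledoc false

  # Hypothetical illustration; not Exgit's implementation.
  # Compares a bound host pattern against the host of a request URL
  # after ASCII case-folding and trailing-dot stripping.
  def host_matches?(pattern, url) when is_binary(pattern) and is_binary(url) do
    case URI.parse(url) do
      %URI{host: host} when is_binary(host) and host != "" ->
        # Exact comparison after normalization: no suffix or substring
        # matching, so "evil.comgithub.com" cannot match "github.com".
        normalize(host) == normalize(pattern)

      _ ->
        false
    end
  end

  defp normalize(host) do
    host
    |> String.downcase(:ascii)
    |> String.trim_trailing(".")
  end
end
```

Exact equality after normalization is the conservative choice; any wildcard or suffix rule would need its own label-boundary check.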

### Security

- **CVE-worthy**: remote-controlled ref names can no longer escape
  the repo root via `Path.join`. `Exgit.RefName` validates every ref
  at the wire perimeter; `RefStore.Disk` re-validates defense-in-depth.
- **CVE-worthy**: hostile trees containing path-traversal entry
  names (`..`, `/foo`, `.git`) are rejected at `Tree.decode/1` —
  they never reach FS operations or a future checkout.
- **CVE-worthy**: a malformed commit (structurally valid but with
  non-hex `tree`/`parent` headers) previously DoS'd every operation
  that called a Commit accessor (walk, diff, push, FS). Validation
  moved into `decode/1`.
- **CVE-worthy**: credentials set via bare auth tuples are now
  host-bound automatically. Cross-origin redirects cannot leak the
  token regardless of Req's redirect behavior. Host matching is
  ASCII-case-folded and trailing-dot-stripped.
- Pack parser bounded at 2 GiB pack / 100 MiB per-object /
  500 MiB resolved-total by default (the resolved-total default was
  later raised to 2 GiB; see Performance), so a hostile server
  response cannot force unbounded allocation on the BEAM heap.
- `%Exgit.Credentials{}` has a custom `Inspect` impl that redacts
  auth values; crash logs, SASL reports, and IEx sessions do not
  leak tokens.
- Loose-object zlib decompression is wrapped in `try/rescue`;
  corrupt or tampered objects return tagged errors instead of
  crashing.
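
The tree entry-name rules listed for `Tree.decode/1` (reject empty, `.`, `..`, any `/`, any NUL, and case-insensitive `.git`/`.gitmodules`) can be sketched as a single predicate. The module and function names are hypothetical, and the real decoder may apply further checks.

```elixir
defmodule EntryNameSketch do
  @moduledoc false

  # Names rejected regardless of ASCII case.
  @forbidden_ci [".git", ".gitmodules"]

  # Hypothetical illustration; not Exgit's implementation.
  # Returns true only for names that are safe to use as a single
  # path component inside a repository tree.
  def valid_entry_name?(name) when is_binary(name) do
    name != "" and
      name not in [".", ".."] and
      not String.contains?(name, ["/", <<0>>]) and
      String.downcase(name, :ascii) not in @forbidden_ci
  end
end
```

Rejecting at decode time means later consumers (FS operations, a future checkout) never see an unsafe name.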

