An object store that fetches missing objects on demand from a transport, caching them locally.
The Promisor is a pure value — no processes, no pids, no shared
state. Growing the cache requires the caller to thread the updated
struct forward via resolve/2.
{:ok, obj, promisor2} = Promisor.resolve(promisor, sha)
{:ok, obj2, promisor3} = Promisor.resolve(promisor2, other_sha)

Two callers holding the same %Promisor{} see the same cache.
Comparing promisors by value (==) reflects their logical state.
Sharing via message passing, snapshotting, or serialization just
works.
Concurrency
Because the struct is pure, two concurrent resolve(p, sha_a) and
resolve(p, sha_b) calls from the same p each fetch
independently, and only the return value the caller threads
forward "wins" — the other fetch's cache growth is discarded.
This is a CACHE RACE, not a correctness race: both results are
valid, but the cache that survives holds only one call's fetched
objects, so it is smaller than if the calls had been serialized.
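
The race can be reproduced with a toy model. The sketch below is a minimal illustration, assuming a hypothetical ToyPromisor struct (a map cache plus a fetch function standing in for the transport); it is not Exgit's implementation and shares only the return shape of resolve/2.

```elixir
defmodule ToyPromisor do
  # Hypothetical stand-in for %Promisor{}: a pure struct holding a map
  # cache and a `fetch` function that plays the role of the transport.
  defstruct cache: %{}, fetch: nil

  def resolve(%__MODULE__{cache: cache} = p, sha) do
    case cache do
      %{^sha => obj} ->
        # Cache hit: the struct is returned unchanged.
        {:ok, obj, p}

      _ ->
        # Cache miss: fetch, then return a NEW struct with the grown cache.
        obj = p.fetch.(sha)
        {:ok, obj, %{p | cache: Map.put(cache, sha, obj)}}
    end
  end
end

p = %ToyPromisor{fetch: fn sha -> "object:" <> sha end}

# Two concurrent resolves that both start from the same `p`.
[{:ok, _, pa}, {:ok, _, pb}] =
  Task.await_many([
    Task.async(fn -> ToyPromisor.resolve(p, "a") end),
    Task.async(fn -> ToyPromisor.resolve(p, "b") end)
  ])

# Each returned struct knows only about its own fetch.
concurrent_sizes = {map_size(pa.cache), map_size(pb.cache)}

# Serialized: thread the struct forward and the cache keeps both objects.
{:ok, _, p1} = ToyPromisor.resolve(p, "a")
{:ok, _, p2} = ToyPromisor.resolve(p1, "b")
serialized_size = map_size(p2.cache)
```

Whichever of pa or pb the caller keeps, the other task's fetched object is gone and would have to be fetched again on the next miss.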
For workloads that do concurrent bulk reads against the same
repo (e.g. a grep agent spawning N tasks), use
Exgit.ObjectStore.SharedPromisor — a GenServer wrapper that
serializes cache access across processes and eliminates the
cache race entirely.
Integration
Exgit.FS threads the updated repo through its strict operations
(read_path, ls, stat, write_path) so callers get the grown
cache back:
{:ok, {mode, blob}, repo} = Exgit.FS.read_path(repo, "HEAD", path)

Streaming operations (FS.walk, FS.grep) use the pure
ObjectStore.get/2 and do NOT grow the cache. For a warm cache,
call Exgit.FS.prefetch/2 up front.
Memory
The cache is unbounded by default. Pass :max_cache_bytes
explicitly to enable FIFO-by-commit eviction bounded at a byte
count of your choosing.
The unbounded default is a deliberate choice: partial-clone and
full-clone workflows typically prefetch 100-500 MB of tree and
blob data up front, then do many reads against that working set.
A small cap (e.g. 64 MiB) trips during prefetch on any
real-world repo, and because eviction only evicts COMMITS (not
blobs or trees — git's access patterns don't cleanly map to LRU
at the blob level), triggering the evictor mid-stream can drop
state the caller is actively using. For long-running daemons or
memory-constrained deployments, size the cap to your actual
envelope — e.g. max_cache_bytes: 2 * 1024 * 1024 * 1024 for a
2 GiB budget.
When a cap IS set and the cache approaches it, the eviction loop drops the oldest commits (along with their entries in the commit queue) in FIFO order. Tree and blob objects are NOT evicted individually; they remain until either (a) the process dies or (b) a higher-level operation discards the whole repo.
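
The eviction behavior described above can be modeled in a few lines. This is a rough sketch under stated assumptions: ToyEvictor, its put/4 signature, and the {:overfull, state} return are hypothetical, and only the field names (cache_bytes, commit_counter, commit_queue, max_cache_bytes) are borrowed from the t() type. It assumes each SHA is inserted once and shows FIFO eviction of commits through a :gb_trees queue keyed by an insertion counter.

```elixir
defmodule ToyEvictor do
  # Hypothetical model of FIFO-by-commit eviction; not Exgit's code.
  defstruct objects: %{},
            cache_bytes: 0,
            commit_counter: 0,
            commit_queue: :gb_trees.empty(),
            max_cache_bytes: :infinity

  # Insert an object. Only commits are enqueued for later eviction.
  def put(%__MODULE__{} = s, sha, type, bytes) when is_binary(bytes) do
    s = %{
      s
      | objects: Map.put(s.objects, sha, {type, bytes}),
        cache_bytes: s.cache_bytes + byte_size(bytes)
    }

    s =
      if type == :commit do
        %{
          s
          | commit_queue: :gb_trees.insert(s.commit_counter, sha, s.commit_queue),
            commit_counter: s.commit_counter + 1
        }
      else
        s
      end

    evict(s)
  end

  # No cap configured: never evict.
  defp evict(%__MODULE__{max_cache_bytes: :infinity} = s), do: {:ok, s}

  # Under the cap: nothing to do.
  defp evict(%__MODULE__{cache_bytes: used, max_cache_bytes: cap} = s)
       when used <= cap,
       do: {:ok, s}

  # Over the cap: drop the oldest commit, or report over-full if none remain.
  defp evict(%__MODULE__{} = s) do
    if :gb_trees.is_empty(s.commit_queue) do
      # Only trees and blobs are left, and those are never evicted.
      {:overfull, s}
    else
      {_counter, sha, queue} = :gb_trees.take_smallest(s.commit_queue)
      {{:commit, bytes}, objects} = Map.pop(s.objects, sha)

      evict(%{
        s
        | objects: objects,
          commit_queue: queue,
          cache_bytes: s.cache_bytes - byte_size(bytes)
      })
    end
  end
end

s0 = %ToyEvictor{max_cache_bytes: 10}
{:ok, s1} = ToyEvictor.put(s0, "c1", :commit, "12345")
{:ok, s2} = ToyEvictor.put(s1, "c2", :commit, "12345")
# Adding a 4-byte blob exceeds the cap, so the oldest commit (c1) is dropped.
{:ok, s3} = ToyEvictor.put(s2, "b1", :blob, "1234")
# A further 8-byte blob evicts c2 as well, yet blobs alone still exceed the cap.
overfull_result = ToyEvictor.put(s3, "b2", :blob, "12345678")
```

In this model the blobs b1 and b2 stay cached after every commit has been evicted, which is the over-full state the next section's policy option deals with.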
Server negotiation (haves)
On-demand fetches (resolve/2 → fetch_and_cache/2) deliberately
send no haves to the server. This is counter-intuitive —
every bulk git fetch DOES send haves to avoid redundant transfer —
but on-demand fetches have different semantics:
- Bulk fetch (Exgit.fetch/3): "I'm at commit X, catch me up to ref Y." Haves save bandwidth by excluding objects reachable from X.
- On-demand fetch (Promisor): "Ship me exactly this blob, please." Haves actively break this. A smart server (GitHub, anything running modern git-upload-pack) treats haves as a reachability closure ("the client has commit X, therefore they have everything reachable from X") and returns an empty pack. The blob is "reachable" from any cached commit that points at its containing tree, so every partial-clone read after the first would fail.
See test/exgit/security/haves_empty_pack_test.exs for an
offline regression against this.
Overfull behavior
When the evictor runs out of commits to drop but cache_bytes
is still above the cap, the cache is technically over-full.
The :on_overfull option selects the policy:
- :log (default) - emit [:exgit, :object_store, :cache_overfull] telemetry and keep going. Matches the previous behavior.
- :error - the next put/resolve returns {:error, :cache_overfull, promisor}. Use this to fail fast and surface misconfigured caps quickly.
- {:callback, fun} - fun.(promisor) is invoked; its return value is discarded. Use for custom metrics, alerting, or graceful shutdown.
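
The three policies amount to a small dispatch on the option value. The following is only an illustrative sketch: ToyOverfull and its handle/3 function are hypothetical, and the notify argument is an assumed stand-in for the telemetry event so that the example needs no dependencies.

```elixir
defmodule ToyOverfull do
  # Hypothetical dispatcher mirroring the documented policies.
  # `notify` is an assumed 1-arity function standing in for telemetry.

  # :log reports the condition and keeps accepting objects.
  def handle(:log, state, notify) do
    notify.(:cache_overfull)
    {:ok, state}
  end

  # :error fails fast, threading the state back to the caller.
  def handle(:error, state, _notify), do: {:error, :cache_overfull, state}

  # {:callback, fun} runs the callback and discards its return value.
  def handle({:callback, fun}, state, _notify) when is_function(fun, 1) do
    _ = fun.(state)
    {:ok, state}
  end
end

notify = fn event -> send(self(), {:event, event}) end

log_result = ToyOverfull.handle(:log, :some_state, notify)
error_result = ToyOverfull.handle(:error, :some_state, notify)

callback_result =
  ToyOverfull.handle({:callback, fn s -> send(self(), {:seen, s}) end}, :some_state, notify)
```

Because the callback's return value is discarded, a callback that wants to stop the caller has to raise.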
Types
@type t() :: %Exgit.ObjectStore.Promisor{
  cache: Exgit.ObjectStore.Memory.t(),
  cache_bytes: non_neg_integer(),
  commit_counter: non_neg_integer(),
  commit_queue: :gb_trees.tree() | nil,
  default_fetch_opts: keyword(),
  haves_cap: pos_integer(),
  max_cache_bytes: non_neg_integer() | :infinity,
  on_overfull: overfull_policy(),
  transport: term()
}
Functions
True if the cache is empty (no objects). Provides a
stable abstraction for callers that used to reach into
%Promisor{cache: %Memory{objects: objs}} — e.g.
FS.require_non_promisor!/2.
Fetch wants from the transport with explicit fetch options (e.g.
a partial-clone filter), and merge the returned objects into the
cache. Returns {:ok, new_promisor}.
Used by Exgit.clone/2 (with filter:) to perform the eager
commits+trees fetch under a blob:none filter at clone time. End
users should normally rely on resolve/2, which handles misses
transparently.
True if sha is in the local cache. Does NOT trigger a fetch.
Merge raw_objects into the cache.
Build a fresh Promisor wrapping transport.
Options:
- :initial_objects - list of pre-decoded objects to seed the cache.
- :default_fetch_opts - keyword list merged into every Transport.fetch/3 call the Promisor makes. Used by lazy_clone to propagate things like the partial-clone filter spec onto subsequent on-demand fetches.
- :max_cache_bytes - cap on total cached object bytes. Default :infinity (no cap). Set to an integer byte count for long-running daemons / memory-constrained deployments that need a bound. See "Memory" in the moduledoc for sizing guidance.
- :on_overfull - policy when the eviction loop can't reduce cache_bytes below max_cache_bytes (commit queue empty; only raw blobs/trees left in the cache). One of:
  - :log (default) - emit [:exgit, :object_store, :cache_overfull] telemetry and keep accepting new objects.
  - :error - fail subsequent put/resolve with {:error, :cache_overfull, promisor}.
  - {:callback, fun} - invoke fun.(promisor). Return value is ignored; raise for hard-fail.
@spec object_size(t(), binary()) :: {:ok, non_neg_integer()} | {:error, :not_local}
Uncompressed byte size of sha IF it is already cached locally —
without triggering a fetch. Returns {:error, :not_local} when the
object has not been fetched yet, so a size check can never silently
pull a multi-GB blob over the network.
@spec put(t(), Exgit.Object.t()) :: {:ok, binary(), t()} | {:error, :cache_overfull, t()}
Return a new Promisor with object inserted into its cache.
When :on_overfull is :error and the post-insert cache exceeds
:max_cache_bytes with no commits left to evict, returns
{:error, :cache_overfull, promisor} instead — the promisor is
still threaded back so the caller can inspect cache_bytes /
decide what to do.
@spec resolve(t(), binary()) :: {:ok, Exgit.Object.t(), t()} | {:error, term()} | {:error, term(), t()}
Look up sha. On a cache hit, returns {:ok, obj, promisor} where
the promisor is unchanged. On a miss, fetches from the transport,
caches every object the pack returned, and returns
{:ok, obj, new_promisor} — the returned struct carries the grown
cache.
Error shape
Errors come in two flavors:
- {:error, reason} - transport-level failure, no cache change. Returned when the fetch itself failed (connection error, HTTP non-2xx, malformed pack).
- {:error, reason, promisor} - the fetch succeeded and the cache grew, but the specific SHA requested wasn't in the returned pack (rare; happens when a partial-clone server defers the requested object itself). Callers should thread the returned promisor forward to avoid refetching the sibling objects that WERE returned.
Pattern-match on both shapes:
case Promisor.resolve(p, sha) do
{:ok, obj, p2} -> ...
{:error, _, p2} -> ... # grown cache, but sha missing
{:error, _} -> ... # fetch failed entirely
end
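
When resolving many SHAs in sequence, the same three-way match decides which promisor to carry into the next call. The helper below is a hedged sketch: ResolveAll.run/3 is a hypothetical name, and it takes the resolver as a function argument (for example &Exgit.ObjectStore.Promisor.resolve/2) so that it can be exercised here with a fake resolver whose "promisor" is a plain map.

```elixir
defmodule ResolveAll do
  # Hypothetical helper. `resolve` is any 2-arity function that returns
  # the three shapes documented for resolve/2.
  def run(promisor, shas, resolve) when is_function(resolve, 2) do
    Enum.reduce(shas, {%{}, %{}, promisor}, fn sha, {found, failed, p} ->
      case resolve.(p, sha) do
        {:ok, obj, p2} ->
          {Map.put(found, sha, obj), failed, p2}

        # Cache grew but this SHA was missing: keep the grown promisor.
        {:error, reason, p2} ->
          {found, Map.put(failed, sha, reason), p2}

        # Fetch failed entirely: nothing changed, keep the old promisor.
        {:error, reason} ->
          {found, Map.put(failed, sha, reason), p}
      end
    end)
  end
end

# Fake resolver for illustration only; a plain map plays the promisor.
fake_resolve = fn
  _cache, "down" -> {:error, :econnrefused}
  cache, "deferred" -> {:error, :not_in_pack, Map.put(cache, "sibling", "obj:sibling")}
  cache, sha -> {:ok, "obj:" <> sha, Map.put(cache, sha, "obj:" <> sha)}
end

{found, failed, final} = ResolveAll.run(%{}, ["a", "down", "deferred"], fake_resolve)
```

Here final still contains the sibling object returned alongside the failed "deferred" lookup, so a later request for it would be a cache hit.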