Content-addressed result cache for evals.
Evals are expensive — each is a real agent run plus an LLM judge call — so the harness can skip a case that already passed when nothing in its scope has changed. Scope is captured as a fingerprint over:
- the case (name, prompt, rubric, system prompt),
- the agent and judge models,
- the source of every skill/tool provider under test (file contents on disk, or the module name for module providers),
- a harness-version token (
@harness_version) bumped whenever scoring semantics change, so upgrades invalidate stale entries.
Only passes are recorded; a fingerprint present in the cache is treated as a prior pass and skipped. Failures and unknown fingerprints always run.
Because LLMs are non-deterministic, a cache hit means "this exact scope already passed, trust it" — not a guarantee the run would pass again. That is the intended contract for an expensive suite, analogous to a build cache.
The store is a term file (default under _build/<env>/); enable caching with
SkillKit.Eval.Runner.run(eval, cache: true) or cache: "path/to/file".
Summary
Functions
Default cache path, under the current Mix build directory.
Computes the scope fingerprint for eval under opts (the same options
passed to SkillKit.Eval.Runner.run/2).
:pass if fingerprint is recorded in the cache at path, else :miss.
Records fingerprint as a pass for case name in the cache at path.
Functions
@spec default_path() :: String.t()
Default cache path, under the current Mix build directory.
@spec fingerprint( SkillKit.Eval.t(), keyword() ) :: String.t()
Computes the scope fingerprint for eval under opts (the same options
passed to SkillKit.Eval.Runner.run/2).
:pass if fingerprint is recorded in the cache at path, else :miss.
Records fingerprint as a pass for case name in the cache at path.