SkillKit.Eval.Cache (SkillKit v0.3.0)


Content-addressed result cache for evals.

Evals are expensive: each one is a real agent run plus an LLM judge call. The harness can therefore skip a case that already passed when nothing in its scope has changed. Scope is captured as a fingerprint over:

  • the case (name, prompt, rubric, system prompt),
  • the agent and judge models,
  • the source of every skill/tool provider under test (file contents on disk, or the module name for module providers),
  • a harness-version token (@harness_version) bumped whenever scoring semantics change, so upgrades invalidate stale entries.
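To make the fingerprint concrete, here is a minimal sketch of hashing the scope listed above into a hex string. It is not SkillKit's implementation. The module name FingerprintSketch, the provider shapes {:file, path} and {:module, mod}, and the harness-version value are all hypothetical. It assumes the scope is built only from lists, tuples, atoms, and binaries, so equal scopes encode to equal bytes before being hashed with SHA-256.

```elixir
defmodule FingerprintSketch do
  # Hypothetical harness-version token; bumping it changes every fingerprint.
  @harness_version 1

  # case_fields: {name, prompt, rubric, system_prompt}
  # models: {agent_model, judge_model}
  # providers: list of {:file, path} or {:module, module} (hypothetical shapes)
  def fingerprint(case_fields, models, providers) do
    scope = [
      {:harness_version, @harness_version},
      {:case, case_fields},
      {:models, models},
      # Sorted so provider order in the config does not affect the hash.
      {:providers, providers |> Enum.map(&provider_source/1) |> Enum.sort()}
    ]

    # Maps are avoided because their encoded key order is not guaranteed;
    # lists, tuples, atoms, and binaries encode the same way each time on a
    # given runtime, so equal scopes produce equal digests.
    :crypto.hash(:sha256, :erlang.term_to_binary(scope))
    |> Base.encode16(case: :lower)
  end

  # File providers contribute their current contents; module providers their name.
  defp provider_source({:file, path}), do: {:file, path, File.read!(path)}
  defp provider_source({:module, mod}), do: {:module, Atom.to_string(mod)}
end
```

Changing any single input (a model name, one byte of a provider file, or the version token) yields a different fingerprint. That is what lets an edited skill or a harness upgrade invalidate stale entries without deleting anything from the store.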

Only passes are recorded: a case whose fingerprint is already in the cache is treated as a prior pass and skipped. Failures are never written, so failing cases and cases with an unknown fingerprint always run.
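The skip rule reduces to a set-membership check. The sketch below uses a hypothetical PassCacheSketch module and an in-memory MapSet in place of the real store, and assumes a case's run function returns :pass or :fail. Only the :pass branch adds the fingerprint, which is why a failure never suppresses a later run.

```elixir
defmodule PassCacheSketch do
  # passes: MapSet of fingerprints previously recorded as passes.
  # run_fun: zero-arity function returning :pass or :fail (assumed shape).
  def run_case(passes, fingerprint, run_fun) do
    if MapSet.member?(passes, fingerprint) do
      # Cache hit: trust the earlier pass and do not run the case at all.
      {:skipped, passes}
    else
      case run_fun.() do
        # Record only passes.
        :pass -> {:pass, MapSet.put(passes, fingerprint)}
        # Leave the set unchanged so the case runs again next time.
        :fail -> {:fail, passes}
      end
    end
  end
end
```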

Because LLM runs are non-deterministic, a cache hit means "this exact scope already passed, trust it", not a guarantee that the run would pass again. That is the intended contract for an expensive suite, analogous to a build cache.

The store is a term file (by default under _build/<env>/). Enable caching with SkillKit.Eval.Runner.run(eval, cache: true), or pass cache: "path/to/file" to use a specific file.
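A term-file store with the same get/put shapes as the functions documented below can be sketched as follows. This is an assumption about the file layout, not SkillKit's code: TermFileCacheSketch is hypothetical, and it keeps a single map of fingerprint to case name, serialized with :erlang.term_to_binary/1.

```elixir
defmodule TermFileCacheSketch do
  # Reads the whole store: a map of fingerprint => case name.
  # A missing file is treated as an empty cache. Only load files this
  # harness wrote; binary_to_term/1 should not be fed untrusted input.
  defp load(path) do
    case File.read(path) do
      {:ok, bin} -> :erlang.binary_to_term(bin)
      {:error, :enoent} -> %{}
    end
  end

  # Same return shape as the documented get/2: :pass | :miss.
  def get(path, fingerprint) do
    if Map.has_key?(load(path), fingerprint), do: :pass, else: :miss
  end

  # Same shape as the documented put/3: records a pass and returns :ok.
  def put(path, fingerprint, name) do
    store = Map.put(load(path), fingerprint, name)
    File.mkdir_p!(Path.dirname(path))
    File.write!(path, :erlang.term_to_binary(store))
    :ok
  end
end
```

Each put/3 here reads, updates, and rewrites the whole file. That keeps the format trivial, but it is not safe if several processes write the same cache file concurrently.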

Functions

default_path()

@spec default_path() :: String.t()

Default cache path, under the current Mix build directory.

fingerprint(eval, opts \\ [])

@spec fingerprint(SkillKit.Eval.t(), keyword()) :: String.t()

Computes the scope fingerprint for eval under opts (the same options passed to SkillKit.Eval.Runner.run/2).

get(path, fingerprint)

@spec get(String.t(), String.t()) :: :pass | :miss

:pass if fingerprint is recorded in the cache at path, else :miss.

put(path, fingerprint, name)

@spec put(String.t(), String.t(), String.t()) :: :ok

Records fingerprint as a pass for case name in the cache at path.