SkillKit.Eval.Result (SkillKit v0.4.0)


The outcome of running one eval: the eval itself, the captured transcript, and the list of %SkillKit.Eval.Check{} structs scored against it.

An eval passes only when every check passes. failure_message/1 renders the failing checks and transcript into the message ExUnit shows when a generated eval test fails.
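The pass rule can be illustrated with a small stand-in. This is only a sketch: `DemoCheck`, `DemoResult`, and the `name` and `passed` fields are hypothetical names chosen for illustration, since the fields of `SkillKit.Eval.Check` are not documented on this page.

```elixir
defmodule DemoCheck do
  # Hypothetical stand-in for a scored check; the field names are assumptions.
  defstruct [:name, passed: false]
end

defmodule DemoResult do
  # Hypothetical stand-in that mirrors only the `checks` field of the result.
  defstruct checks: []

  # The checks that failed.
  def failures(%__MODULE__{checks: checks}), do: Enum.reject(checks, & &1.passed)

  # True when every check passed, i.e. when there are no failures.
  def passed?(%__MODULE__{} = result), do: failures(result) == []
end

result =
  struct!(DemoResult,
    checks: [
      struct!(DemoCheck, name: "calls_tool", passed: true),
      struct!(DemoCheck, name: "judge", passed: false)
    ]
  )

IO.inspect(DemoResult.passed?(result))
# false

IO.inspect(Enum.map(DemoResult.failures(result), & &1.name))
# ["judge"]
```

In a test, the documented functions would presumably be combined along the lines of `assert Result.passed?(result), Result.failure_message(result)`, using the message argument of ExUnit's `assert/2`. How the generated eval tests actually do this is not shown here.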

Summary

Functions

failure_message(result)

A human-readable explanation of why the eval failed, for ExUnit output.

failures(result)

The checks that failed.

passed?(result)

True when every check passed.

warnings(result)

Non-fatal warning notes from checks that passed (e.g. the judge's).

Types

t()

@type t() :: %SkillKit.Eval.Result{
  cached: boolean(),
  checks: [SkillKit.Eval.Check.t()],
  eval: SkillKit.Eval.t(),
  transcript: SkillKit.Eval.Transcript.t()
}

Functions

failure_message(result)

@spec failure_message(t()) :: String.t()

A human-readable explanation of why the eval failed, for ExUnit output.

Renders the failing checks (including the judge's verdict and reasoning) plus the captured transcript — the prompt sent, the tools the agent called, and its response — so a failure shows both what was judged and why.

failures(result)

@spec failures(t()) :: [SkillKit.Eval.Check.t()]

The checks that failed.

passed?(result)

@spec passed?(t()) :: boolean()

True when every check passed.

warnings(result)

@spec warnings(t()) :: [String.t()]

Non-fatal warning notes from checks that passed (e.g. the judge's).
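As a rough sketch of the idea that warnings come only from passing checks, the example below collects notes from a list of plain maps. The map shape (`:name`, `:passed`, `:notes`) and the `DemoWarnings` module are assumptions made for illustration, not the library's actual data structure.

```elixir
defmodule DemoWarnings do
  # Hypothetical check shape: a map with :passed (boolean) and :notes (list of strings).
  # Only notes attached to passing checks are treated as warnings.
  def warnings(checks) do
    checks
    |> Enum.filter(& &1.passed)
    |> Enum.flat_map(& &1.notes)
  end
end

checks = [
  %{name: "judge", passed: true, notes: ["answer is correct but verbose"]},
  %{name: "calls_tool", passed: true, notes: []},
  %{name: "format", passed: false, notes: ["belongs to a failing check"]}
]

IO.inspect(DemoWarnings.warnings(checks))
# ["answer is correct but verbose"]
```

Notes on the failing check are left out here because, per the description above, a failed check is reported through failures/1 and failure_message/1 instead.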