AgentSea.Evaluate.Metric behaviour (agentsea_evaluate v0.1.0)

Behaviour for a scoring metric. Given an example (the required :output under test, plus optional :id, :input, and :expected), a metric returns a result map with a :score in [0, 1] and a boolean :passed flag. Built-in metrics: ExactMatch, Contains, and LLMJudge (provider-backed).

Summary

Types

example()

@type example() :: %{
  optional(:id) => term(),
  optional(:input) => String.t(),
  optional(:expected) => term(),
  output: String.t()
}

result()

@type result() :: %{score: float(), passed: boolean()}
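
To make the two types concrete, here is a small sketch of values that would satisfy them. The ids and strings are invented for illustration and do not come from the library.

```elixir
# Hypothetical data: a fully populated example(). Only :output is required.
example = %{
  id: "greeting-1",
  input: "Say hello to Ada.",
  expected: "Hello, Ada!",
  output: "Hello, Ada!"
}

# The smallest valid example() carries just the output under test.
minimal = %{output: "Hello, Ada!"}

# A result() pairs a float score in [0, 1] with a boolean verdict.
result = %{score: 1.0, passed: true}
```

Because :expected is typed as term(), it need not be a string; how it is interpreted is presumably up to each metric.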

Callbacks

evaluate(example, opts)

@callback evaluate(example(), opts :: keyword()) :: result()

name()

@callback name() :: String.t()
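
As an illustration of implementing both callbacks, the following is a minimal sketch of a custom metric. The module name `MyApp.Metrics.MaxLength`, the `:max_chars` option, its default of 280, and the scoring rule are all hypothetical and not part of agentsea_evaluate. The sketch assumes the agentsea_evaluate dependency is available so that the `@behaviour` reference resolves.

```elixir
defmodule MyApp.Metrics.MaxLength do
  @moduledoc "Hypothetical metric: passes when :output fits within :max_chars."

  # Assumes agentsea_evaluate is installed; without it the compiler only warns.
  @behaviour AgentSea.Evaluate.Metric

  # Hypothetical default, used when the caller does not pass :max_chars.
  @default_max_chars 280

  @impl true
  def name, do: "max_length"

  @impl true
  def evaluate(%{output: output}, opts) when is_binary(output) do
    max = Keyword.get(opts, :max_chars, @default_max_chars)
    len = String.length(output)

    # Full score within the limit; otherwise the score shrinks
    # towards 0 as the output overshoots the limit.
    score = if len <= max, do: 1.0, else: max / len

    %{score: score, passed: len <= max}
  end
end
```

Calling the callback directly might look like this:

```elixir
MyApp.Metrics.MaxLength.evaluate(%{output: "hello"}, max_chars: 10)
# => %{score: 1.0, passed: true}
```

The function head pattern-matches only on :output, so the optional :id, :input, and :expected keys can be present or absent without affecting this metric.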