API Reference agentsea_evaluate v0.1.0
Modules
Run scoring metrics over a dataset, concurrently, and aggregate the results.
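As a rough illustration of "concurrently, and aggregate", the sketch below fans each (example, metric) pair out with `Task.async_stream/3` from the Elixir standard library and averages the scores per metric. The module name `EvalSketch`, the function `run/2`, and the idea of a metric as a plain function returning a score are assumptions made for this example, not the package's actual API.

```elixir
defmodule EvalSketch do
  # Hypothetical helper, not the library's API: runs every metric over every
  # example concurrently and returns the mean score per metric.
  def run(dataset, metrics) do
    jobs = for example <- dataset, {name, fun} <- metrics, do: {name, fun, example}

    jobs
    |> Task.async_stream(fn {name, fun, example} -> {name, fun.(example)} end,
      max_concurrency: System.schedulers_online(),
      ordered: false
    )
    |> Enum.map(fn {:ok, result} -> result end)
    |> Enum.group_by(fn {name, _score} -> name end, fn {_name, score} -> score end)
    |> Map.new(fn {name, scores} -> {name, Enum.sum(scores) / length(scores)} end)
  end
end

dataset = [
  %{output: "Paris", expected: "paris"},
  %{output: "Lyon", expected: "paris"}
]

# Assumption: a metric is a function from an example map to a score in [0, 1].
metrics = %{
  exact: fn ex ->
    if String.downcase(ex.output) == String.downcase(ex.expected), do: 1.0, else: 0.0
  end
}

summary = EvalSketch.run(dataset, metrics)
```

Averaging is only one possible aggregate; a pass rate or per-example breakdown would follow the same grouping step.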
A scoring metric. Given an example (the :output under test, plus optional
:input/:expected), it returns a score in [0, 1] and a pass/fail result. Built-in
metrics: ExactMatch, Contains, and LLMJudge (provider-backed).
Scores 1.0 when the output contains the expected value (case-insensitive substring).
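The rule above can be sketched in a few lines. Only the scoring rule (case-insensitive substring) comes from the description; the module name `ContainsSketch`, the `score/1` function, and the `%{score: ..., pass: ...}` result shape are hypothetical.

```elixir
defmodule ContainsSketch do
  # Hypothetical module and result shape; the library's real return value may differ.
  def score(%{output: output, expected: expected}) do
    # Lowercase both sides so the substring check ignores case.
    hit? = String.contains?(String.downcase(output), String.downcase(expected))
    value = if hit?, do: 1.0, else: 0.0
    %{score: value, pass: hit?}
  end
end

hit = ContainsSketch.score(%{output: "The capital is PARIS.", expected: "paris"})
miss = ContainsSketch.score(%{output: "The capital is Lyon.", expected: "paris"})
```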
Scores 1.0 when the output equals the expected value (trimmed, case-insensitive).
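A minimal sketch of this comparison, assuming "trimmed" means stripping leading and trailing whitespace from both values. The module name `ExactMatchSketch` and the result map are illustrative only, not the package's API.

```elixir
defmodule ExactMatchSketch do
  # Hypothetical module and result shape, used only to illustrate the rule.
  def score(%{output: output, expected: expected}) do
    same? = normalize(output) == normalize(expected)
    %{score: if(same?, do: 1.0, else: 0.0), pass: same?}
  end

  # Trim surrounding whitespace, then lowercase, before comparing.
  defp normalize(text), do: text |> String.trim() |> String.downcase()
end

equal = ExactMatchSketch.score(%{output: "  Paris\n", expected: "paris"})
partial = ExactMatchSketch.score(%{output: "Paris, France", expected: "paris"})
```

Unlike the substring check, a longer output that merely includes the expected value does not match here.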
Uses an LLM to score an output against a rubric (the "LLM-as-judge" approach). Runs over any
AgentSea.Provider, so it can be routed through the gateway.
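The general shape of an LLM-as-judge metric might look like the sketch below. The AgentSea.Provider interface is not shown in this reference, so the provider is replaced by a hypothetical `complete` function (prompt string in, reply string out). The prompt wording, the 0.5 pass threshold, and scoring an unparseable reply as 0.0 are all assumptions.

```elixir
defmodule JudgeSketch do
  # `complete` is a hypothetical stand-in for a provider call; it is NOT the
  # AgentSea.Provider API. The threshold default is also an assumption.
  def score(example, rubric, complete, threshold \\ 0.5) do
    prompt = """
    Rubric: #{rubric}
    Output: #{example.output}
    Reply with a single number between 0 and 1.
    """

    value =
      case Float.parse(String.trim(complete.(prompt))) do
        # Clamp the judge's number into [0, 1].
        {number, _rest} -> number |> max(0.0) |> min(1.0)
        # Assumption: a reply that is not a number scores 0.0.
        :error -> 0.0
      end

    %{score: value, pass: value >= threshold}
  end
end

# A fake provider keeps the example runnable without network access.
fake_provider = fn _prompt -> "0.8" end

result =
  JudgeSketch.score(
    %{output: "Paris is the capital of France."},
    "Is the answer factually correct?",
    fake_provider
  )
```

Passing the provider in as a function keeps the metric testable offline; a real judge would swap in a call that goes through the configured provider.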