API Reference agentsea_evaluate v0.1.0

Modules

Run scoring metrics over a dataset, concurrently, and aggregate the results.
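As a rough illustration of the concurrent run-and-aggregate pattern described above, the sketch below uses only the Elixir standard library. The module name `EvalSketch`, the function `run/2`, and the shape of the result map are hypothetical and not taken from agentsea_evaluate; metrics are modeled here as plain one-argument functions that return a score in [0, 1].

```elixir
defmodule EvalSketch do
  # Hypothetical helper, not the library's API: scores every example with
  # every metric concurrently, then averages the scores per metric.
  def run(dataset, metrics) when is_list(dataset) and is_map(metrics) do
    dataset
    |> Task.async_stream(fn example ->
      # One task per example; each metric is a 1-arity function.
      Map.new(metrics, fn {name, metric} -> {name, metric.(example)} end)
    end)
    |> Enum.map(fn {:ok, scores} -> scores end)
    |> aggregate(Map.keys(metrics))
  end

  # Mean score per metric name; an empty dataset yields 0.0.
  defp aggregate(rows, names) do
    Map.new(names, fn name ->
      scores = Enum.map(rows, &Map.fetch!(&1, name))
      mean = if scores == [], do: 0.0, else: Enum.sum(scores) / length(scores)
      {name, mean}
    end)
  end
end

dataset = [
  %{output: "Paris", expected: "paris"},
  %{output: "Lyon", expected: "paris"}
]

# An assumed exact-match style metric written inline for the example.
metrics = %{
  exact: fn ex -> if String.downcase(ex.output) == ex.expected, do: 1.0, else: 0.0 end
}

result = EvalSketch.run(dataset, metrics)
# result is %{exact: 0.5}
```

`Task.async_stream/2` caps concurrency at the number of schedulers by default and keeps results in input order, which keeps the aggregation step simple.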

A scoring metric. Given an example (the :output under test, plus optional :input and :expected), it returns a score in [0, 1] and a pass/fail result. Built-in metrics: ExactMatch, Contains, and LLMJudge (provider-backed).

Contains scores 1.0 when the output contains the expected value (case-insensitive substring match).

ExactMatch scores 1.0 when the output equals the expected value (after trimming, case-insensitive).
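The comparison rules for the Contains and ExactMatch metrics can be sketched in a few lines of plain Elixir. `StringMetricsSketch` and its functions are hypothetical stand-ins, not the library's implementation. The sketch assumes that both the output and the expected value are normalized before comparison, and that a non-matching example scores 0.0.

```elixir
defmodule StringMetricsSketch do
  # Hypothetical stand-ins for the Contains and ExactMatch semantics
  # described above; not the library's implementation.

  # 1.0 when `expected` occurs anywhere in `output`, ignoring case.
  def contains(%{output: output, expected: expected}) do
    if String.contains?(String.downcase(output), String.downcase(expected)),
      do: 1.0,
      else: 0.0
  end

  # 1.0 when both strings are equal after trimming and lowercasing.
  def exact_match(%{output: output, expected: expected}) do
    if normalize(output) == normalize(expected), do: 1.0, else: 0.0
  end

  defp normalize(text), do: text |> String.trim() |> String.downcase()
end

sentence = %{output: "The capital is Paris.", expected: "paris"}
padded = %{output: "  Paris\n", expected: "paris"}

contains_score = StringMetricsSketch.contains(sentence)      # 1.0
exact_padded = StringMetricsSketch.exact_match(padded)       # 1.0
exact_sentence = StringMetricsSketch.exact_match(sentence)   # 0.0
```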

LLMJudge uses an LLM to score an output against a rubric ("LLM-as-judge"). It runs over any AgentSea.Provider, so it can go through the gateway.
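The general shape of an LLM-as-judge metric is: build a prompt from the rubric and the output, ask a model for a numeric grade, then parse and clamp the reply. The sketch below shows that flow under stated assumptions. `JudgeSketch`, `score/3`, the prompt wording, and the choice to score an unparseable reply as 0.0 are all hypothetical. The `complete` argument is a plain function standing in for a provider call; it is not the AgentSea.Provider API.

```elixir
defmodule JudgeSketch do
  # Hypothetical sketch of the LLM-as-judge flow. `complete` is any 1-arity
  # function that takes a prompt string and returns the model's reply text.
  def score(%{output: output}, rubric, complete) when is_function(complete, 1) do
    prompt = """
    Rubric: #{rubric}
    Output: #{output}
    Reply with a single number between 0 and 1.
    """

    case prompt |> complete.() |> String.trim() |> Float.parse() do
      # Clamp the parsed value into [0, 1].
      {value, _rest} -> value |> max(0.0) |> min(1.0)
      # Assumption: a reply that does not start with a number scores 0.0.
      :error -> 0.0
    end
  end
end

# A canned judge in place of a real model, so the example runs offline.
fake_judge = fn _prompt -> "0.8" end

judged = JudgeSketch.score(%{output: "Paris"}, "Names the capital of France", fake_judge)
# judged is 0.8
```

Passing the model call in as a function keeps the sketch testable without network access; a real metric would route that call through a provider instead.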