The Eval context for managing datasets, evaluation specs, and runs.
Summary
Functions
Adds an item to a dataset, checking its state.
Finalizes a successful shard and updates the parent campaign summary row.
Marks an eval run complete and persists final aggregate facts.
Creates a campaign, child target rows, child eval runs, and batch-enqueues worker jobs.
Creates a dataset with state :open, assigning version "1" if none is provided.
Creates a campaign parent row and its runtime-only target rows atomically.
Creates an eval run with resolved spec and dataset snapshot facts.
Creates an eval spec.
Dismisses one active review candidate without removing durable score evidence.
Executes one campaign target through the shared orchestrator-backed judge path.
Finalizes a failed shard and narrows campaign-wide fatal state to explicit failure classes.
Gets a single dataset.
Gets a single eval spec.
Returns one projected review candidate or nil.
Lists campaign targets in insertion order.
Returns the list of dataset items for a given dataset id.
Returns the list of datasets.
Returns the list of current eval specs.
Lists projected review-queue candidates for the operator UI.
Resolves the authoritative campaign/target/run lineage for a worker envelope. Persisted lineage remains the durable tenant truth even when envelope tenant data differs.
Moves a pending shard to running and refreshes aggregate campaign counters.
Builds a frozen preview for workflow-source promotion into a dataset item.
Promotes one review candidate into an open dataset and records durable queue lineage.
Promotes a Trace and its Spans into a new Dataset snapshot.
Promotes one original or replay workflow source into an existing dataset.
Persists per-item eval score evidence and updates aggregate run counters.
Replaces prior score truth for an eval run, keeping worker retries idempotent.
Requests sealed-baseline approval from one review candidate and keeps approval lineage visible.
Schedules online scoring sampling for a persisted trace on an async boundary.
Seals a dataset, making it immutable.
Returns summary strip counts for the projected review queue.
Updates an eval spec immutably.
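
The dataset lifecycle described in the summaries above (created in state :open with a default version of "1", items accepted only while open, then sealed to become immutable) can be sketched as follows. This is a minimal in-memory illustration, not the context's actual implementation: the module name `DatasetSketch`, its function names, the `:sealed` state atom, and the `:dataset_sealed` error are all hypothetical, and the real context presumably persists rows rather than returning structs.

```elixir
defmodule DatasetSketch do
  @moduledoc false
  # Hypothetical stand-in for a persisted dataset row.
  defstruct name: nil, version: "1", state: :open, items: []

  # Creates a dataset in state :open; version falls back to "1" when absent.
  def create(attrs) do
    %__MODULE__{
      name: Map.fetch!(attrs, :name),
      version: Map.get(attrs, :version) || "1"
    }
  end

  # Appends an item only while the dataset is still open.
  def add_item(%__MODULE__{state: :open} = dataset, item) do
    {:ok, %{dataset | items: dataset.items ++ [item]}}
  end

  # A sealed dataset rejects further items (assumed error shape).
  def add_item(%__MODULE__{state: :sealed}, _item), do: {:error, :dataset_sealed}

  # Sealing flips the state; later add_item/2 calls are refused.
  def seal(%__MODULE__{} = dataset), do: %{dataset | state: :sealed}
end
```

Checking the state in the function head keeps the "is it still open?" decision in one place, so callers cannot forget it.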
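
One way to read "replaces prior score truth for an eval run, keeping worker retries idempotent" is replace-all semantics: a retried worker writes the full score set again, and the stored result is the same as after the first attempt. The sketch below illustrates that idea with a plain map. The module `ScoreSketch`, its functions, and the `:passed` field are hypothetical names chosen for illustration; the actual context presumably does this inside a database transaction.

```elixir
defmodule ScoreSketch do
  @moduledoc false
  # Hypothetical store: a map of run_id => list of score maps.

  # Replace-all: whatever was stored for run_id is discarded, so calling
  # this twice with the same scores leaves the store unchanged.
  def replace_scores(store, run_id, scores) when is_list(scores) do
    Map.put(store, run_id, scores)
  end

  # Aggregates are recomputed from the stored rows instead of incremented,
  # so a retry cannot double-count.
  def summary(store, run_id) do
    scores = Map.get(store, run_id, [])
    passed = Enum.count(scores, fn score -> score.passed end)
    %{total: length(scores), passed: passed, failed: length(scores) - passed}
  end
end
```

By contrast, an append-and-increment write path would need its own deduplication to stay safe under retries.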