Dsxir.EvaluationResult
(dsxir v0.4.0)
Result of a single Dsxir.Evaluate.run/2 invocation.
  * :score - avg(metric_value) * 100, rounded to 1 decimal place.
  * :results - one row per devset entry, in input order. Successful rows carry metric: float(); errored rows carry metric: nil, error: %Exception{}.
  * :errors - aggregate failure summary:
    * :count - total errored rows.
    * :by_class - %{atom() => non_neg_integer()} keyed by the splode error class atom (:adapter, :lm, :invalid, :halted, :runtime, :framework, :unknown). Coarse bucket.
    * :by_module - %{module() => non_neg_integer()} keyed by the concrete exception struct, so distinct failures sharing a class are still separable.
    * :samples - up to three distinct error samples (deduped by struct and reason shape), each %{module, class, message} with the message truncated to 500 chars. Lets the summary be debugged on its own without re-running.
Subscribers branch on nil vs. populated; the :errors map is always
present, even when zero errors occurred (then count: 0 and empty
by_class, by_module, samples).
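As a rough illustration of how a caller might consume the always-present :errors map, the sketch below renders it as a one-line report. ErrorSummarySketch and the hand-written errors map are hypothetical stand-ins, not part of Dsxir; the counts, classes, and modules are made up for the example.

```elixir
defmodule ErrorSummarySketch do
  # Hypothetical helper, not part of Dsxir: renders a map shaped like the
  # documented :errors field as a short one-line report.
  def describe(%{count: 0}), do: "no errors"

  def describe(%{count: count, by_class: by_class}) do
    classes =
      by_class
      |> Enum.sort_by(fn {_class, n} -> n end, :desc)
      |> Enum.map_join(", ", fn {class, n} -> "#{class}=#{n}" end)

    "#{count} errored rows (#{classes})"
  end
end

# Hand-written stand-in for the :errors field of a result (assumed values).
errors = %{
  count: 3,
  by_class: %{lm: 2, adapter: 1},
  by_module: %{RuntimeError => 2, ArgumentError => 1},
  samples: [%{module: RuntimeError, class: :lm, message: "boom"}]
}

IO.puts(ErrorSummarySketch.describe(errors))
```

Because the map is present even when nothing failed, the zero-error case can be handled with a plain pattern match on count: 0 rather than a nil check.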
Types
@type errors() :: %{
  count: non_neg_integer(),
  by_class: %{required(atom()) => non_neg_integer()},
  by_module: %{required(module()) => non_neg_integer()},
  samples: [error_sample()]
}
@type row() :: %{
  example: Dsxir.Example.t(),
  prediction: nil | Dsxir.Prediction.t(),
  metric: nil | float(),
  error: nil | Exception.t()
}
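To make the row() shape concrete, here is a minimal sketch that splits rows into successful and errored ones by checking whether :error is nil. RowSketch is a hypothetical helper, and the atoms used for :example and :prediction are placeholders standing in for real Dsxir.Example and Dsxir.Prediction values.

```elixir
defmodule RowSketch do
  # Hypothetical helper, not part of Dsxir: splits maps shaped like row()
  # into {successful, errored} based on whether :error is nil.
  def partition(rows) do
    Enum.split_with(rows, fn row -> is_nil(row.error) end)
  end
end

# Placeholder rows; :ex1, :ex2 and :pred1 stand in for real structs.
rows = [
  %{example: :ex1, prediction: :pred1, metric: 1.0, error: nil},
  %{example: :ex2, prediction: nil, metric: nil, error: %RuntimeError{message: "boom"}}
]

{ok, errored} = RowSketch.partition(rows)
IO.inspect(length(ok), label: "ok rows")
IO.inspect(Enum.map(errored, &Exception.message(&1.error)), label: "error messages")
```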
Functions
@spec error_row(Dsxir.Example.t(), Exception.t()) :: row()
Build an errored row, attaching the caught exception in place of a prediction.
@spec ok_row(Dsxir.Example.t(), Dsxir.Prediction.t(), float()) :: row()
Build a successful row with its example, prediction, and numeric metric.
Aggregate values into a 0..100 score by averaging and scaling, rounded to
one decimal place. An empty list returns 0.0.
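The aggregation described above can be sketched as follows. ScoreSketch is a hypothetical re-implementation of the documented behaviour, not the library's code, and it assumes the input is a list of numeric metric values only; how the real function treats nil metrics from errored rows is not stated here.

```elixir
defmodule ScoreSketch do
  # Hypothetical re-implementation of the documented behaviour:
  # mean of the values, scaled by 100, rounded to one decimal place.
  # An empty list yields 0.0.
  def score([]), do: 0.0

  def score(values) when is_list(values) do
    # `/` always returns a float, so Float.round/2 receives a float.
    mean = Enum.sum(values) / length(values)
    Float.round(mean * 100, 1)
  end
end

IO.inspect(ScoreSketch.score([1.0, 0.0]))
IO.inspect(ScoreSketch.score([]))
```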