Extracts per-turn interaction quality metrics from SubAgent execution results.
All functions are pure — they take a %Step{} and return computed values.
Summary
Functions
Aggregates metrics across multiple runs.
Computes all metrics in one call.
Whether the run exhausted its turn budget.
The most frequent error reason across failed turns, or nil if no errors.
Frequency map of error reasons across failed turns with structured reasons.
Fraction of turns with the given error reason.
Whether turn 1 produced parseable code.
Whether the run had any failed turn (parse, analysis, runtime, or tool errors).
Fraction of turns with a :multiple_code_blocks result reason.
Fraction of turns with a :no_code_found result reason.
Fraction of turns with a :parse_error result reason.
Number of turns used.
Turn number of the first successful tool call, or nil if none.
Functions
Aggregates metrics across multiple runs.
Input: list of maps returned by analyze/2.
Returns a summary map of rates and means, or a summary with zero values when given an empty list.
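A minimal sketch of this kind of aggregation follows. It assumes each run map carries the :passed?, :first_turn_valid?, :turn_count and :total_tokens keys listed for analyze/2. The module name AggregateSketch and the summary keys it produces (:runs, :pass_rate, :first_turn_valid_rate, :mean_turns, :mean_total_tokens) are hypothetical, not the real output of this module.

```elixir
defmodule AggregateSketch do
  # Hypothetical sketch: the summary keys below are illustrative only.

  # Empty input yields a zero-valued summary instead of dividing by zero.
  def aggregate([]) do
    %{runs: 0, pass_rate: 0.0, first_turn_valid_rate: 0.0, mean_turns: 0.0, mean_total_tokens: 0.0}
  end

  def aggregate(runs) when is_list(runs) do
    n = length(runs)

    %{
      runs: n,
      pass_rate: rate(runs, :passed?, n),
      first_turn_valid_rate: rate(runs, :first_turn_valid?, n),
      mean_turns: mean(runs, :turn_count, n),
      mean_total_tokens: mean(runs, :total_tokens, n)
    }
  end

  # Fraction of runs where the boolean field is exactly true.
  defp rate(runs, key, n), do: Enum.count(runs, &(Map.get(&1, key) == true)) / n

  # Arithmetic mean of a numeric field; missing values count as 0.
  defp mean(runs, key, n), do: Enum.sum(Enum.map(runs, &(Map.get(&1, key) || 0))) / n
end
```

Rates are computed over all runs, so a run without a :passed? value counts as not passed in this sketch.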
@spec analyze(PtcRunner.Step.t(), keyword()) :: map()
Computes all metrics in one call.
The passed? field is provided externally via opts[:passed?] (from test validation), not derived from the Step.
Returns a map with keys: :first_turn_valid?, :parse_failure_rate, :no_code_rate, :multi_code_block_rate, :turns_to_first_tool_call, :budget_exhausted?, :has_failed_turn?, :turn_count, :input_tokens, :output_tokens, :total_tokens, :passed?.
@spec budget_exhausted?(PtcRunner.Step.t()) :: boolean()
Whether the run exhausted its turn budget.
Returns true when the step failed with one of these reasons: :max_turns_exceeded, :turn_budget_exhausted, or :budget_exhausted.
Examples
iex> step = %PtcRunner.Step{fail: %{reason: :max_turns_exceeded, message: "exceeded"}}
iex> PtcRunner.Metrics.TurnAnalysis.budget_exhausted?(step)
true
iex> step = %PtcRunner.Step{return: 42}
iex> PtcRunner.Metrics.TurnAnalysis.budget_exhausted?(step)
false
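The check amounts to a membership test on the failure reason. The sketch below assumes, for illustration, that a step is a plain map whose :fail field is nil on success or a map with a :reason key (mirroring the examples above, which use %PtcRunner.Step{}). BudgetSketch is a hypothetical module name.

```elixir
defmodule BudgetSketch do
  # The three budget-related reasons listed in the docs.
  @budget_reasons [:max_turns_exceeded, :turn_budget_exhausted, :budget_exhausted]

  # Assumed step shape: %{fail: nil | %{reason: atom()}}.
  def budget_exhausted?(%{fail: %{reason: reason}}), do: reason in @budget_reasons
  # A successful step (or one with no structured failure) is not budget-exhausted.
  def budget_exhausted?(_step), do: false
end
```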
@spec dominant_error(PtcRunner.Step.t()) :: atom() | nil
The most frequent error reason across failed turns, or nil if no errors.
Examples
iex> t1 = PtcRunner.Turn.failure(1, "raw", nil, %{reason: :parse_error, message: "bad"})
iex> t2 = PtcRunner.Turn.failure(2, "raw", nil, %{reason: :parse_error, message: "bad"})
iex> t3 = PtcRunner.Turn.failure(3, "raw", "(/ 1 0)", %{reason: :eval_error, message: "div/0"})
iex> step = %PtcRunner.Step{turns: [t1, t2, t3]}
iex> PtcRunner.Metrics.TurnAnalysis.dominant_error(step)
:parse_error
@spec error_breakdown(PtcRunner.Step.t()) :: %{required(atom()) => non_neg_integer()}
Frequency map of error reasons across failed turns with structured reasons.
Only counts turns where result.reason is a structured atom.
Turns without structured reasons are skipped (not counted as :unknown).
Returns an empty map when there are no qualifying failed turns.
Examples
iex> t1 = PtcRunner.Turn.failure(1, "raw", nil, %{reason: :parse_error, message: "bad"})
iex> t2 = PtcRunner.Turn.success(2, "raw", "(+ 1 2)", 3)
iex> t3 = PtcRunner.Turn.failure(3, "raw", "(/ 1 0)", %{reason: :eval_error, message: "div/0"})
iex> step = %PtcRunner.Step{turns: [t1, t2, t3]}
iex> PtcRunner.Metrics.TurnAnalysis.error_breakdown(step)
%{parse_error: 1, eval_error: 1}
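A frequency map like this can be built with Enum.frequencies_by/2, and the dominant error then falls out as its maximum entry. The sketch below assumes, hypothetically, that turns are plain maps of the form %{success?: boolean(), reason: atom() | nil} rather than PtcRunner.Turn structs; ErrorBreakdownSketch is an illustrative name. Tie-breaking here is simply whatever Enum.max_by/2 meets first, which may differ from the real dominant_error/1.

```elixir
defmodule ErrorBreakdownSketch do
  # Count failed turns per structured (non-nil atom) reason; other failures are skipped.
  def error_breakdown(turns) do
    turns
    |> Enum.filter(fn t -> not t.success? and is_atom(t.reason) and t.reason != nil end)
    |> Enum.frequencies_by(& &1.reason)
  end

  # Most frequent reason, or nil when there are no qualifying failed turns.
  def dominant_error(turns) do
    case error_breakdown(turns) do
      counts when map_size(counts) == 0 -> nil
      counts -> counts |> Enum.max_by(fn {_reason, n} -> n end) |> elem(0)
    end
  end
end
```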
@spec error_rate(PtcRunner.Step.t(), atom()) :: float()
Fraction of turns with the given error reason.
Examples
iex> t1 = PtcRunner.Turn.failure(1, "raw", nil, %{reason: :parse_error, message: "bad"})
iex> t2 = PtcRunner.Turn.success(2, "raw", "(+ 1 2)", 3)
iex> step = %PtcRunner.Step{turns: [t1, t2]}
iex> PtcRunner.Metrics.TurnAnalysis.error_rate(step, :parse_error)
0.5
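As the example shows, the denominator is the total number of turns, so one failing turn out of two gives 0.5. A minimal sketch, assuming hypothetical turn maps of the form %{reason: atom() | nil}; returning 0.0 for an empty turn list is an assumption of this sketch. In the sketch, the fixed-reason rates (parse_failure_rate, no_code_rate, multi_code_block_rate) are written as specialisations of error_rate; the real module may compute them differently.

```elixir
defmodule ErrorRateSketch do
  # Assumption: no turns means a rate of 0.0 rather than a division by zero.
  def error_rate([], _reason), do: 0.0

  # Fraction of all turns whose result reason equals `reason`.
  def error_rate(turns, reason) do
    Enum.count(turns, &(&1.reason == reason)) / length(turns)
  end

  # Fixed-reason rates expressed in terms of error_rate/2.
  def parse_failure_rate(turns), do: error_rate(turns, :parse_error)
  def no_code_rate(turns), do: error_rate(turns, :no_code_found)
  def multi_code_block_rate(turns), do: error_rate(turns, :multiple_code_blocks)
end
```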
@spec first_turn_valid?(PtcRunner.Step.t()) :: boolean()
Whether turn 1 produced parseable code.
Measures interaction quality (did the model write structurally valid code?), not task success. A turn that parses but fails at runtime is still "valid".
Examples
iex> turn = PtcRunner.Turn.success(1, "raw", "(+ 1 2)", 3)
iex> step = %PtcRunner.Step{turns: [turn]}
iex> PtcRunner.Metrics.TurnAnalysis.first_turn_valid?(step)
true
@spec has_failed_turn?(PtcRunner.Step.t()) :: boolean()
Whether the run had any failed turn (parse, analysis, runtime, or tool errors).
Used to compute the salvage rate: the fraction of runs with at least one failed turn that still passed.
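The salvage rate can be sketched over a list of per-run maps shaped like analyze/2 output, using only the :has_failed_turn? and :passed? keys. SalvageSketch is a hypothetical name, and returning nil when no run hit an error (an undefined rate) is a choice made for this sketch.

```elixir
defmodule SalvageSketch do
  # Of the runs that hit at least one failed turn, the fraction that still passed.
  def salvage_rate(runs) do
    errored = Enum.filter(runs, & &1.has_failed_turn?)

    case errored do
      # No errored runs: the rate is undefined, so return nil.
      [] -> nil
      _ -> Enum.count(errored, & &1.passed?) / length(errored)
    end
  end
end
```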
@spec multi_code_block_rate(PtcRunner.Step.t()) :: float()
Fraction of turns with a :multiple_code_blocks result reason.
@spec no_code_rate(PtcRunner.Step.t()) :: float()
Fraction of turns with a :no_code_found result reason.
@spec parse_failure_rate(PtcRunner.Step.t()) :: float()
Fraction of turns with a :parse_error result reason.
Examples
iex> t1 = PtcRunner.Turn.success(1, "raw", "(+ 1 2)", 3)
iex> step = %PtcRunner.Step{turns: [t1]}
iex> PtcRunner.Metrics.TurnAnalysis.parse_failure_rate(step)
0.0
@spec turn_count(PtcRunner.Step.t()) :: non_neg_integer()
Number of turns used. Returns 0 when the step has no turns (turns: nil).
Examples
iex> step = %PtcRunner.Step{turns: nil}
iex> PtcRunner.Metrics.TurnAnalysis.turn_count(step)
0
@spec turns_to_first_tool_call(PtcRunner.Step.t()) :: pos_integer() | nil
Turn number of the first successful tool call, or nil if none.
Examples
iex> t1 = PtcRunner.Turn.success(1, "raw", "(+ 1 2)", 3)
iex> step = %PtcRunner.Step{turns: [t1]}
iex> PtcRunner.Metrics.TurnAnalysis.turns_to_first_tool_call(step)
nil
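A sketch of this lookup uses Enum.find_value/2 to return the number of the first matching turn, or nil. It assumes a hypothetical turn shape %{number: pos_integer(), success?: boolean(), tool_calls: list()} and approximates "successful tool call" as a successful turn with a non-empty tool_calls list; the real criterion used by this module may differ. FirstToolCallSketch is an illustrative name.

```elixir
defmodule FirstToolCallSketch do
  # Returns the turn number of the first successful turn that made a tool call, or nil.
  def turns_to_first_tool_call(turns) do
    Enum.find_value(turns, fn
      # [_ | _] matches only a non-empty list of tool calls.
      %{success?: true, tool_calls: [_ | _], number: n} -> n
      _ -> nil
    end)
  end
end
```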