PtcRunner.Metrics.TurnAnalysis (PtcRunner v0.10.1)


Extracts per-turn interaction quality metrics from SubAgent execution results.

All functions are pure: each takes a %Step{} (aggregate/1 takes a list of metric maps instead) and returns computed values.

Functions

aggregate(metrics_list)

@spec aggregate([map()]) :: map()

Aggregates metrics across multiple runs.

Input: list of maps returned by analyze/2.

Returns a summary map with rates and means. Returns an empty summary with zero values when given an empty list.
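
To illustrate what "rates and means" over a list of per-run maps can look like, here is a minimal sketch. It is not PtcRunner's implementation: the module name AggregateSketch and the output keys :runs, :pass_rate and :mean_turn_count are hypothetical, and only the input keys :passed? and :turn_count come from the analyze/2 documentation.

```elixir
defmodule AggregateSketch do
  # Hypothetical stand-in for illustration only; the real aggregate/1
  # may use different output keys and compute more fields.

  # Empty input yields a summary with zero values instead of dividing by zero.
  def summarize([]), do: %{runs: 0, pass_rate: 0.0, mean_turn_count: 0.0}

  def summarize(metrics_list) when is_list(metrics_list) do
    n = length(metrics_list)

    # A run counts as passed only when :passed? is exactly true.
    passed = Enum.count(metrics_list, &(Map.get(&1, :passed?) == true))

    total_turns =
      metrics_list
      |> Enum.map(&Map.get(&1, :turn_count, 0))
      |> Enum.sum()

    %{runs: n, pass_rate: passed / n, mean_turn_count: total_turns / n}
  end
end

runs = [
  %{passed?: true, turn_count: 2},
  %{passed?: false, turn_count: 4}
]

AggregateSketch.summarize(runs)
# => %{runs: 2, pass_rate: 0.5, mean_turn_count: 3.0}
```

Handling the empty list in its own clause keeps the division safe and mirrors the documented "zero values" behavior.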

analyze(step, opts \\ [])

@spec analyze(
  PtcRunner.Step.t(),
  keyword()
) :: map()

Computes all metrics in one call.

The passed? field is provided externally via opts[:passed?] (from test validation), not derived from the Step.

Returns a map with keys: :first_turn_valid?, :parse_failure_rate, :no_code_rate, :multi_code_block_rate, :turns_to_first_tool_call, :budget_exhausted?, :has_failed_turn?, :turn_count, :input_tokens, :output_tokens, :total_tokens, :passed?.

budget_exhausted?(step)

@spec budget_exhausted?(PtcRunner.Step.t()) :: boolean()

Whether the run exhausted its turn budget.

Checks for reasons: :max_turns_exceeded, :turn_budget_exhausted, :budget_exhausted.

Examples

iex> step = %PtcRunner.Step{fail: %{reason: :max_turns_exceeded, message: "exceeded"}}
iex> PtcRunner.Metrics.TurnAnalysis.budget_exhausted?(step)
true

iex> step = %PtcRunner.Step{return: 42}
iex> PtcRunner.Metrics.TurnAnalysis.budget_exhausted?(step)
false

dominant_error(step)

@spec dominant_error(PtcRunner.Step.t()) :: atom() | nil

The most frequent error reason across failed turns, or nil if no errors.

Examples

iex> t1 = PtcRunner.Turn.failure(1, "raw", nil, %{reason: :parse_error, message: "bad"})
iex> t2 = PtcRunner.Turn.failure(2, "raw", nil, %{reason: :parse_error, message: "bad"})
iex> t3 = PtcRunner.Turn.failure(3, "raw", "(/ 1 0)", %{reason: :eval_error, message: "div/0"})
iex> step = %PtcRunner.Step{turns: [t1, t2, t3]}
iex> PtcRunner.Metrics.TurnAnalysis.dominant_error(step)
:parse_error

error_breakdown(step)

@spec error_breakdown(PtcRunner.Step.t()) :: %{required(atom()) => non_neg_integer()}

Frequency map of error reasons across failed turns with structured reasons.

Only counts turns where result.reason is a structured atom. Turns without structured reasons are skipped (not counted as :unknown). Returns an empty map when there are no qualifying failed turns.

Examples

iex> t1 = PtcRunner.Turn.failure(1, "raw", nil, %{reason: :parse_error, message: "bad"})
iex> t2 = PtcRunner.Turn.success(2, "raw", "(+ 1 2)", 3)
iex> t3 = PtcRunner.Turn.failure(3, "raw", "(/ 1 0)", %{reason: :eval_error, message: "div/0"})
iex> step = %PtcRunner.Step{turns: [t1, t2, t3]}
iex> PtcRunner.Metrics.TurnAnalysis.error_breakdown(step)
%{parse_error: 1, eval_error: 1}
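
The counting rule described above (only structured atom reasons, everything else skipped) can be sketched with plain maps. This is an assumption-laden illustration, not the library's code: the turns here are plain maps rather than PtcRunner.Turn structs, the :success? key is a hypothetical way to mark failed turns, and the module name ErrorBreakdownSketch is invented for the example.

```elixir
defmodule ErrorBreakdownSketch do
  # Illustrative only: plain maps stand in for PtcRunner.Turn structs,
  # and :success? is an assumed field name.

  # Counts atom reasons on failed turns; turns whose result has no
  # structured atom reason are skipped rather than bucketed as :unknown.
  def breakdown(turns) do
    reasons =
      for %{success?: false, result: %{reason: reason}} <- turns,
          is_atom(reason) and not is_nil(reason),
          do: reason

    Enum.frequencies(reasons)
  end

  # Most frequent reason, or nil when nothing qualified.
  # Ties are resolved arbitrarily in this sketch.
  def dominant(turns) do
    case breakdown(turns) do
      counts when map_size(counts) == 0 ->
        nil

      counts ->
        counts |> Enum.max_by(fn {_reason, count} -> count end) |> elem(0)
    end
  end
end

turns = [
  %{success?: false, result: %{reason: :parse_error}},
  %{success?: false, result: %{reason: :parse_error}},
  %{success?: true, result: 3},
  %{success?: false, result: %{reason: :eval_error}},
  # Unstructured failure: skipped by the pattern in the comprehension.
  %{success?: false, result: "timeout"}
]

ErrorBreakdownSketch.breakdown(turns)
# => %{eval_error: 1, parse_error: 2}

ErrorBreakdownSketch.dominant(turns)
# => :parse_error
```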

error_rate(step, reason)

@spec error_rate(PtcRunner.Step.t(), atom()) :: float()

Fraction of turns with the given error reason.

Examples

iex> t1 = PtcRunner.Turn.failure(1, "raw", nil, %{reason: :parse_error, message: "bad"})
iex> t2 = PtcRunner.Turn.success(2, "raw", "(+ 1 2)", 3)
iex> step = %PtcRunner.Step{turns: [t1, t2]}
iex> PtcRunner.Metrics.TurnAnalysis.error_rate(step, :parse_error)
0.5
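
The example above divides by all turns, not only the failed ones. A small sketch of that arithmetic follows, using plain maps instead of PtcRunner.Turn structs. The module name ErrorRateSketch is hypothetical, and returning 0.0 for a run with no turns is an assumption of this sketch, not documented behavior.

```elixir
defmodule ErrorRateSketch do
  # Illustrative only: plain maps stand in for PtcRunner.Turn structs.
  def rate(turns, reason) when is_list(turns) and is_atom(reason) do
    case length(turns) do
      # Assumption: an empty run reports 0.0 rather than raising.
      0 ->
        0.0

      total ->
        # Count turns whose result carries exactly the requested reason.
        matching = Enum.count(turns, &match?(%{result: %{reason: ^reason}}, &1))
        matching / total
    end
  end
end

turns = [
  %{result: %{reason: :parse_error}},
  %{result: 3}
]

ErrorRateSketch.rate(turns, :parse_error)
# => 0.5

ErrorRateSketch.rate(turns, :no_code_found)
# => 0.0
```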

first_turn_valid?(step)

@spec first_turn_valid?(PtcRunner.Step.t()) :: boolean()

Whether turn 1 produced parseable code.

Measures interaction quality (did the model write structurally valid code?), not task success. A turn that parses but fails at runtime is still "valid".

Examples

iex> turn = PtcRunner.Turn.success(1, "raw", "(+ 1 2)", 3)
iex> step = %PtcRunner.Step{turns: [turn]}
iex> PtcRunner.Metrics.TurnAnalysis.first_turn_valid?(step)
true

has_failed_turn?(step)

@spec has_failed_turn?(PtcRunner.Step.t()) :: boolean()

Whether the run had any failed turn (parse, analysis, runtime, or tool errors).

Used for computing salvage rate: of runs that hit any error, how many still passed?
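
One way to compute a salvage rate from a list of analyze/2 result maps is sketched below. The module and function names are hypothetical, and returning nil when no run had a failed turn is a choice made for this sketch; only the keys :has_failed_turn? and :passed? come from the documentation.

```elixir
defmodule SalvageSketch do
  # Illustrative only: operates on maps shaped like analyze/2 results.

  # Of the runs that hit any error, the fraction that still passed.
  # Returns nil when no run had a failed turn, since the rate is undefined.
  def salvage_rate(metrics_list) when is_list(metrics_list) do
    errored = Enum.filter(metrics_list, &(Map.get(&1, :has_failed_turn?) == true))

    case errored do
      [] ->
        nil

      _ ->
        salvaged = Enum.count(errored, &(Map.get(&1, :passed?) == true))
        salvaged / length(errored)
    end
  end
end

runs = [
  %{has_failed_turn?: true, passed?: true},
  %{has_failed_turn?: true, passed?: false},
  # Clean run: excluded from the denominator.
  %{has_failed_turn?: false, passed?: true}
]

SalvageSketch.salvage_rate(runs)
# => 0.5
```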

multi_code_block_rate(step)

@spec multi_code_block_rate(PtcRunner.Step.t()) :: float()

Fraction of turns with a :multiple_code_blocks result reason.

no_code_rate(step)

@spec no_code_rate(PtcRunner.Step.t()) :: float()

Fraction of turns with a :no_code_found result reason.

parse_failure_rate(step)

@spec parse_failure_rate(PtcRunner.Step.t()) :: float()

Fraction of turns with a :parse_error result reason.

Examples

iex> t1 = PtcRunner.Turn.success(1, "raw", "(+ 1 2)", 3)
iex> step = %PtcRunner.Step{turns: [t1]}
iex> PtcRunner.Metrics.TurnAnalysis.parse_failure_rate(step)
0.0

turn_count(step)

@spec turn_count(PtcRunner.Step.t()) :: non_neg_integer()

Number of turns used.

Examples

iex> step = %PtcRunner.Step{turns: nil}
iex> PtcRunner.Metrics.TurnAnalysis.turn_count(step)
0

turns_to_first_tool_call(step)

@spec turns_to_first_tool_call(PtcRunner.Step.t()) :: pos_integer() | nil

Turn number of the first successful tool call, or nil if none.

Examples

iex> t1 = PtcRunner.Turn.success(1, "raw", "(+ 1 2)", 3)
iex> step = %PtcRunner.Step{turns: [t1]}
iex> PtcRunner.Metrics.TurnAnalysis.turns_to_first_tool_call(step)
nil