Dsxir.Optimizer.SIMBA.Bucket (dsxir v0.5.0)

Per-example trajectory bucket with variance statistics.

A bucket groups all trajectory records for one trainset example and computes gap statistics used to prioritise which examples are processed first during candidate generation.

Summary

Functions

from_records(records)

Builds a bucket from a list of records for one example.

percentiles(scores)

Computes {p10, p90} over a flat list of numeric scores using linear interpolation (numpy-style).

sort(buckets)

Sorts a list of buckets descending by {max_to_min_gap, max_score, max_to_avg_gap}.

Types

t()

@type t() :: %{
  records: [trajectory_record()],
  max_to_min_gap: float(),
  max_score: float(),
  max_to_avg_gap: float()
}

trajectory_record()

@type trajectory_record() :: %{
  score: float(),
  trace: list(),
  prediction: term(),
  example: term(),
  metadata: term()
}

Functions

from_records(records)

@spec from_records([trajectory_record()]) :: t()

Builds a bucket from a list of records for one example.

Records are sorted by score descending. Three statistics are computed from the scores: max_to_min_gap, max_score, and max_to_avg_gap.
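The following is a minimal sketch of how such a bucket could be built, not the library's implementation. It assumes that max_to_min_gap is the highest score minus the lowest and that max_to_avg_gap is the highest score minus the mean; the source names these fields but does not define them. The module name BucketSketch is hypothetical, and the sketch only accepts a non-empty list because the documented behaviour for an empty list is not stated.

```elixir
defmodule BucketSketch do
  # Hypothetical stand-in for from_records/1; gap definitions are assumptions.
  def from_records([_ | _] = records) do
    # Highest-scoring record first, as the documentation describes.
    sorted = Enum.sort_by(records, & &1.score, :desc)
    scores = Enum.map(sorted, & &1.score)

    max = hd(scores)
    min = List.last(scores)
    avg = Enum.sum(scores) / length(scores)

    %{
      records: sorted,
      max_to_min_gap: max - min,   # assumed: spread between best and worst
      max_score: max,
      max_to_avg_gap: max - avg    # assumed: distance of best from the mean
    }
  end
end

# Usage with three records for one example (only :score matters here).
record = fn score ->
  %{score: score, trace: [], prediction: nil, example: nil, metadata: nil}
end

bucket = BucketSketch.from_records([record.(0.5), record.(1.0), record.(0.0)])
IO.inspect(Map.delete(bucket, :records))
```

Under these assumptions, a bucket whose records all score the same has both gaps equal to 0.0, so it carries no signal about which trajectory was better.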

percentiles(scores)

@spec percentiles([number()]) :: {float(), float()}

Computes {p10, p90} over a flat list of numeric scores using linear interpolation (numpy-style).

For percentile p over the sorted list xs of length n: rank = p/100 * (n-1); the result is linearly interpolated between the values at the floor(rank) and ceil(rank) indices.
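The rank formula above can be sketched as follows. This is an illustrative re-implementation under the stated formula, not the library's source; the module name PercentileSketch is hypothetical, and the sketch requires a non-empty list because the behaviour for an empty list is not documented.

```elixir
defmodule PercentileSketch do
  # Hypothetical stand-in for percentiles/1.
  def percentiles([_ | _] = scores) do
    sorted = Enum.sort(scores)
    {percentile(sorted, 10), percentile(sorted, 90)}
  end

  # Linear interpolation between the two neighbouring order statistics.
  defp percentile(sorted, p) do
    n = length(sorted)
    rank = p / 100 * (n - 1)        # fractional zero-based index
    lo = floor(rank)
    hi = ceil(rank)
    lo_val = Enum.at(sorted, lo)
    hi_val = Enum.at(sorted, hi)
    # The fractional part of rank weights the step from lo_val to hi_val.
    lo_val + (hi_val - lo_val) * (rank - lo)
  end
end

# n = 5, so rank(10) = 0.4 and rank(90) = 3.6:
# p10 is about 0 + 10 * 0.4 = 4.0, p90 is about 30 + 10 * 0.6 = 36.0
IO.inspect(PercentileSketch.percentiles([0, 10, 20, 30, 40]))
```

With a single-element list, rank is 0.0 for every p, so both percentiles equal that one score.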

sort(buckets)

@spec sort([t()]) :: [t()]

Sorts a list of buckets descending by {max_to_min_gap, max_score, max_to_avg_gap}.

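Examples with more score variance and a higher score ceiling are processed first: buckets are ordered by max_to_min_gap, with ties broken by max_score and then by max_to_avg_gap.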
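One way to get this ordering is to sort on a tuple key, since Elixir compares same-size tuples element by element. The sketch below assumes that is how the three fields are combined; the module name SortSketch is hypothetical and the bucket values are made up for illustration.

```elixir
defmodule SortSketch do
  # Hypothetical stand-in for sort/1: descending on a three-element tuple key.
  def sort(buckets) do
    Enum.sort_by(
      buckets,
      fn b -> {b.max_to_min_gap, b.max_score, b.max_to_avg_gap} end,
      :desc
    )
  end
end

# Illustrative buckets; records are left empty because only the stats matter.
buckets = [
  %{records: [], max_to_min_gap: 0.5, max_score: 0.8, max_to_avg_gap: 0.4},
  %{records: [], max_to_min_gap: 0.9, max_score: 0.9, max_to_avg_gap: 0.5},
  %{records: [], max_to_min_gap: 0.5, max_score: 1.0, max_to_avg_gap: 0.3}
]

sorted = SortSketch.sort(buckets)
# The 0.9-gap bucket comes first; the two 0.5-gap buckets are then
# ordered by max_score (1.0 before 0.8).
IO.inspect(Enum.map(sorted, & &1.max_score))
```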