Dsxir.Optimizer.COPRO (dsxir v0.3.0)

Coordinate-ascent instruction optimizer. Port of DSPy's COPRO.

COPRO optimizes per-predictor instructions only (no demos). In each round it asks a proposer LM for :breadth candidate instructions per predictor and evaluates every candidate as a whole program, with the other predictors held at their current committed overrides. At the end of the round it commits each predictor's strict round winner. Each subsequent round's proposals are grounded in the predictor's scored attempt history. After :depth rounds, the committed overrides are applied to the student, which is then returned.
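
The round structure described above can be sketched as a plain coordinate-ascent loop. This is a minimal, self-contained illustration, not dsxir's implementation: CoproSketch, run/6, and the propose/score callbacks are hypothetical stand-ins for the proposer LM and the whole-program evaluator, the proposer here ignores attempt history, and "strict round winner" is read as "the best candidate, adopted only if it strictly beats the round baseline", which is an assumption.

```elixir
defmodule CoproSketch do
  @moduledoc """
  Toy coordinate ascent over per-predictor instructions (not dsxir's code).
  `propose` and `score` are hypothetical callbacks standing in for the
  proposer LM and the whole-program evaluator.
  """

  # overrides: %{predictor => instruction}; returns the committed overrides.
  def run(predictors, overrides, propose, score, breadth, depth) do
    # `1..depth//1` is empty when depth is 0, so no rounds run.
    Enum.reduce(1..depth//1, overrides, fn round, committed ->
      baseline = score.(committed)

      winners =
        for pred <- predictors, into: %{} do
          # Each candidate changes only `pred`; every other predictor keeps
          # its committed override from the previous round.
          {best_instr, best_score} =
            propose.(pred, round, breadth)
            |> Enum.map(fn instr -> {instr, score.(Map.put(committed, pred, instr))} end)
            |> Enum.max_by(fn {_instr, s} -> s end)

          # Assumed reading of "strict winner": adopt only a strict improvement.
          {pred, if(best_score > baseline, do: best_instr, else: committed[pred])}
        end

      # Commit every predictor's winner at once, at the end of the round.
      Map.merge(committed, winners)
    end)
  end
end
```

Committing all winners together at the end of the round, rather than updating each predictor as soon as it is scored, keeps every candidate in a round evaluated against the same baseline program.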

The pure coordinate-ascent state and its transitions live in Dsxir.Optimizer.COPRO.Sampler. This module owns the IO (proposer LM calls via Dsxir.Optimizer.COPRO.Proposer and whole-program scoring via Dsxir.Optimizer.COPRO.Evaluator) and threads the sampler through each step.

Quick start

{:ok, compiled, stats} =
  Dsxir.compile(
    Dsxir.Optimizer.COPRO,
    program,
    trainset,
    metric,
    auto: :light
  )

Options

See Dsxir.Optimizer.COPRO.Auto for the :light | :medium | :heavy presets controlling :breadth, :depth, and :init_temperature.

  • :auto (default :medium) - budget preset.
  • :proposer_lm - {module, config} tuple for instruction proposals. Defaults to the resolved task LM.

Returned stats

stats is a Dsxir.Optimizer.COPRO.Stats.t/0. Notable fields:

  • :best_score - whole-program score under the committed overrides.
  • :best_instructions - the committed per-predictor overrides.
  • :rounds - committed rounds (equals cfg.depth on a full run).
  • :trials - per-candidate Dsxir.Optimizer.COPRO.Stats.Record list.
  • :proposer_calls - number of proposer LM calls issued.
  • :degraded - true when any proposer call failed and was replaced by the predictor's current best instruction.

Under compile/4, the reported best_score comes from a single confirmatory whole-program evaluation under the committed best_overrides, run after the final round commits. It therefore reflects the program actually returned rather than a per-predictor candidate maximum.

In session mode (Dsxir.OptimizerSession), best_program and best_score are selected by the session from the highest-scoring individual trial, and each COPRO trial's candidate program changes only one predictor's instruction. So when more than one predictor improves, a session run can return a different program and score than compile/4, which applies every committed override at once. This mirrors the session behaviour of Dsxir.Optimizer.MIPROv2.
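
A toy model shows why the two modes can disagree. Assume, hypothetically, an additive metric in which each of two predictors gains one point from its improved instruction: every session trial changes a single predictor, so the best trial scores 1, while compile/4 applies both committed overrides and scores 2. SessionVsCompile and its functions are illustrative names, not dsxir API.

```elixir
defmodule SessionVsCompile do
  @moduledoc "Toy comparison of session-mode and compile/4 result selection."

  # Hypothetical additive metric: +1 for every predictor on its improved instruction.
  def score(overrides), do: Enum.count(overrides, fn {_pred, instr} -> instr == :improved end)

  # Session mode: each trial changes one predictor's instruction, and the
  # session keeps the single highest-scoring trial program.
  def session_best(baseline) do
    baseline
    |> Map.keys()
    |> Enum.map(fn pred -> Map.put(baseline, pred, :improved) end)
    |> Enum.map(fn program -> {program, score(program)} end)
    |> Enum.max_by(fn {_program, s} -> s end)
  end

  # compile/4: all committed overrides are applied at once, then scored once
  # more (the confirmatory evaluation).
  def compile_result(baseline, committed) do
    program = Map.merge(baseline, committed)
    {program, score(program)}
  end
end
```

With a baseline of %{a: :seed, b: :seed} and committed overrides %{a: :improved, b: :improved}, session_best/1 returns a one-predictor program scoring 1, and compile_result/2 returns the fully overridden program scoring 2.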

Wrapper-only bookkeeping (proposer call count, degraded flag, per-candidate Stats.Record list) lives in first-class Sampler fields so session-mode step/6 callers can checkpoint it; it serializes and survives Sampler transitions transparently.
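
The checkpointing point can be illustrated with plain Erlang term serialization. In this sketch the state map and its field names are hypothetical (the real Sampler struct is not described here); the point is only that bookkeeping stored as ordinary fields is carried through every transition and round-trips through :erlang.term_to_binary/1 and :erlang.binary_to_term/2 together with the rest of the state.

```elixir
defmodule CheckpointSketch do
  # Hypothetical sampler-like state: bookkeeping fields sit next to the
  # coordinate-ascent fields as plain data, so the whole term serializes.
  def new, do: %{queue: [], round: 0, proposer_calls: 0, degraded: false, trials: []}

  # A transition that updates bookkeeping without dropping any other field.
  def record_proposal(state, ok?) when is_boolean(ok?) do
    %{state | proposer_calls: state.proposer_calls + 1, degraded: state.degraded or not ok?}
  end

  def checkpoint(state), do: :erlang.term_to_binary(state)

  # :safe refuses to create new atoms when decoding, a sensible default for stored data.
  def restore(binary), do: :erlang.binary_to_term(binary, [:safe])
end
```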

Summary

Functions

  • compile/4 - Compile student against trainset under metric.
  • compile!/4 - Like compile/4 but raises the validation exception on {:error, _}.
  • init_session/4 - Prepare a resumable COPRO session.
  • step/6 - Run a single COPRO trial against the session sampler.

Functions

compile(student, trainset, metric, opts)

Compile student against trainset under metric.

Returns {:ok, program, stats} on success or {:error, exception} on validation failure. See the module doc for opts.

compile!(student, trainset, metric, opts)

Like compile/4 but raises the validation exception on {:error, _}.

init_session(student, trainset, metric, opts)

@spec init_session(
  Dsxir.Program.t(),
  [Dsxir.Example.t()],
  nil | Dsxir.Metric.t(),
  keyword()
) ::
  {:ok, Dsxir.Optimizer.COPRO.Sampler.t(), pos_integer()}
  | {:error, Exception.t()}

Prepare a resumable COPRO session.

Expands the budget preset, validates the trainset and predictor set, scores the seed (unmodified) program once to establish the round baseline, and builds the round-zero candidate queue via the basic proposer. The returned planned-trial count is cfg.depth * length(predictors) * cfg.breadth.
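
As a worked example of the planned-trial formula, a hypothetical configuration with depth 3, breadth 5, and two predictors plans 3 * 2 * 5 = 30 trials. The helper below is illustrative only; the map shape used for cfg is an assumption, not the real config type.

```elixir
defmodule PlannedTrials do
  # planned = depth * number_of_predictors * breadth
  def count(%{depth: depth, breadth: breadth}, predictors)
      when is_integer(depth) and is_integer(breadth) and is_list(predictors) do
    depth * length(predictors) * breadth
  end
end

PlannedTrials.count(%{depth: 3, breadth: 5}, [:retrieve, :answer])
# → 30
```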

step(sampler, trial_idx, program, trainset, metric, opts)

Run a single COPRO trial against the session sampler.

Halts once the planned trial budget is met. Otherwise it pops the next candidate, applies it on top of the current best overrides, scores the whole program, records the result, and returns a trial_result map. When the round queue is exhausted, it commits the round; if the depth budget is not yet spent, it builds the next round's queue (grounded in the attempt history) and re-enters itself so that the call still returns exactly one evaluated trial.
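
The control flow of step/6 can be modeled as a small state machine over a candidate queue. This is a hedged sketch under several assumptions: StepSketch, its state fields, and the refill callback (standing in for the history-grounded proposer) are hypothetical; scoring is a caller-supplied function; and the budget check is placed after the round-commit clause so the final round is committed before halting, which may differ from dsxir's actual ordering.

```elixir
defmodule StepSketch do
  @moduledoc "Toy model of step/6: one evaluated trial per call, rounds committed when the queue runs dry."

  # Candidate available and budget not met: evaluate exactly one trial.
  def step(%{queue: [{pred, instr} | rest], done: done, planned: planned} = st, score)
      when done < planned do
    # Apply the candidate on top of the current best overrides.
    s = score.(Map.put(st.best, pred, instr))
    baseline = st.baseline

    winners =
      case st.winners do
        # An earlier candidate for this predictor is at least as good: keep it.
        %{^pred => {_, w}} when w >= s -> st.winners
        # Otherwise record it, but only as a strict improvement over the baseline.
        _ when s > baseline -> Map.put(st.winners, pred, {instr, s})
        _ -> st.winners
      end

    trial = %{predictor: pred, instruction: instr, score: s}
    {:ok, trial, %{st | queue: rest, done: done + 1, winners: winners}}
  end

  # Queue exhausted: commit the round's winners, then refill and re-enter.
  def step(%{queue: []} = st, score) do
    committed = Map.new(st.winners, fn {pred, {instr, _}} -> {pred, instr} end)
    best = Map.merge(st.best, committed)
    st = %{st | best: best, baseline: score.(best), round: st.round + 1, winners: %{}}

    if st.round < st.depth do
      step(%{st | queue: st.refill.(st.round)}, score)
    else
      {:halt, st}
    end
  end

  # Planned budget met with candidates still queued: stop.
  def step(st, _score), do: {:halt, st}
end
```

A driver calls step/2 repeatedly, collecting trial maps until it receives {:halt, state}; the committed overrides are then in state.best.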