CouncilEx.Councils.JuryWithRetry (CouncilEx v0.1.0)


K-judge majority topology with confidence-triggered retry.

Pattern: K judges run :independent_analysis in parallel, and their aggregate confidence is checked against a threshold. If it falls below, the whole round re-runs (up to :max_iterations). The chair then synthesizes a verdict from the final iteration.
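The pattern above can be sketched in plain Elixir. This is a hypothetical illustration, not CouncilEx's implementation. Judges and the chair are ordinary functions here, while real members are modules with provider opts. The aggregate is assumed to be the mean self-reported confidence, and the module name JuryRetrySketch is invented for the example.

```elixir
defmodule JuryRetrySketch do
  @moduledoc """
  Hypothetical sketch of confidence-triggered retry; not CouncilEx's code.
  Judges are 1-arity functions returning %{answer: term, confidence: float}.
  """

  # Runs every judge concurrently on the same input, preserving judge order.
  # timeout: :infinity because model calls can exceed the 5s default.
  def run_round(judges, input) do
    judges
    |> Task.async_stream(fn judge -> judge.(input) end, timeout: :infinity)
    |> Enum.map(fn {:ok, result} -> result end)
  end

  # Assumes the aggregate is the plain mean of self-reported confidence.
  def mean_confidence(results) do
    Enum.sum(Enum.map(results, & &1.confidence)) / length(results)
  end

  # Re-runs the whole round until the mean reaches the threshold or
  # max_iterations rounds have run; the chair sees only the final round.
  def run(judges, chair, input, opts \\ []) do
    threshold = Keyword.get(opts, :confidence_threshold, 0.7)
    max_iterations = Keyword.get(opts, :max_iterations, 2)
    loop(judges, chair, input, threshold, max_iterations, 1)
  end

  defp loop(judges, chair, input, threshold, max_iterations, iteration) do
    results = run_round(judges, input)

    if mean_confidence(results) >= threshold or iteration >= max_iterations do
      %{verdict: chair.(results), iterations: iteration}
    else
      # Earlier results are dropped: the next round gets only the original input.
      loop(judges, chair, input, threshold, max_iterations, iteration + 1)
    end
  end
end
```

Because loop/6 forwards only the original input, no judge in a retry round sees an earlier verdict. The explicit iteration counter keeps the :max_iterations bound visible.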

This design recurs across several production systems and papers: Chaos-MoA-Pipeline (Generate → Critique → Rebuttal → multi-judge majority with confidence), Adjudicator (weighted-vote council with KG override), and the broader literature on jury-style aggregation. See docs/RELATED_WORK.md for citations.

Why retry on low confidence

Self-reported confidence is noisy but cheap. When the average across K judges is high, the easy-case path costs K judge calls plus one chair call. When it is low, the hard-case path spends 2K to 3K judge calls (depending on :max_iterations) re-sampling in an attempt to clear the threshold. Net: extra compute goes to hard cases and none is wasted on easy ones, the same shape as adaptive sampling in human juries.
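The call counts in the paragraph above are simple arithmetic. The hypothetical JuryCostSketch below makes them explicit under two stated assumptions: every iteration calls all K judges once, and the chair is called exactly once on the final iteration. expected_calls/3 further assumes that each round independently falls below the threshold with probability p, which repeated rounds on the same question need not do.

```elixir
defmodule JuryCostSketch do
  @moduledoc """
  Back-of-envelope cost model (hypothetical). Assumes each iteration calls
  all K judges once and the chair is called once at the end.
  """

  # Judge calls plus the single chair call for a run of `iterations` rounds.
  def calls(k, iterations) when k > 0 and iterations > 0, do: k * iterations + 1

  # Expected calls if each round independently lands below the threshold with
  # probability p (an assumption). Round i + 1 runs only when the first i
  # rounds all fell short, which happens with probability p^i.
  def expected_calls(k, p, max_iterations)
      when k > 0 and p >= 0 and p <= 1 and max_iterations > 0 do
    expected_rounds =
      0..(max_iterations - 1)
      |> Enum.map(fn i -> :math.pow(p, i) end)
      |> Enum.sum()

    k * expected_rounds + 1
  end
end
```

For instance, with K = 3 and the default max_iterations of 2, an easy case costs 4 calls and a hard case 7. If 30% of rounds fell short, the expected cost under this model would be 3 * 1.3 + 1 = 4.9 calls.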

Wu et al., "Can LLM Agents Really Debate?" (arXiv:2511.07784), show that visible majority pressure causes correct agents to capitulate to a wrong consensus. JuryWithRetry mitigates this by never sharing judges' answers across iterations: each retry is an independent re-sample, not a debate, so judges never see prior-iteration verdicts.
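The difference between a debate and an independent re-sample comes down to what the next round receives. The hypothetical ResampleSketch below contrasts the two options. It illustrates the idea only and is not how CouncilEx builds member inputs; the :prior_answers key is invented for the example.

```elixir
defmodule ResampleSketch do
  # Hypothetical illustration of two ways a retry round could build its input.
  # JuryWithRetry, as described above, follows the :resample path.

  # Debate-style: the next round sees the previous round's answers, which is
  # the visible majority pressure that causes correct agents to capitulate.
  def next_input(input, prior_results, :debate) do
    Map.put(input, :prior_answers, Enum.map(prior_results, & &1.answer))
  end

  # Independent re-sample: the next round sees only the original input.
  def next_input(input, _prior_results, :resample), do: input
end
```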

Defaults (sensible for most cases)

  • confidence_threshold: 0.7 — calibrated against :self_report on frontier models. Above this, judges meaningfully agree they know the answer; below, the question is hard enough to warrant a re-sample. Tune up for higher-stakes calls, down for fuzzier tasks.
  • max_iterations: 2 — initial run + at most one retry. Three or more iterations rarely change the verdict; the marginal calls are usually wasted. Bump to 3 for adversarial / safety-critical flows.
  • Members without a :confidence opt are auto-injected with confidence: :self_report, since the retry mechanism is meaningless without a confidence score.
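The defaults and the auto-injection rule can be expressed with standard Keyword functions. JuryDefaultsSketch is a hypothetical sketch that hard-codes the documented defaults. It assumes judge specs use the {name, module, opts} shape from the Usage example below and does not reflect how CouncilEx resolves options internally.

```elixir
defmodule JuryDefaultsSketch do
  # Hypothetical: hard-codes the documented defaults; CouncilEx may resolve
  # them differently internally.
  @defaults [confidence_threshold: 0.7, max_iterations: 2]

  # Caller-supplied opts win over the defaults.
  def resolve_opts(opts), do: Keyword.merge(@defaults, opts)

  # Adds confidence: :self_report to a {name, module, opts} judge spec unless
  # the spec already sets :confidence.
  def inject_confidence({name, module, member_opts}) do
    {name, module, Keyword.put_new(member_opts, :confidence, :self_report)}
  end
end
```

Keyword.put_new/3 leaves an explicit :confidence untouched, which matches the "if no :confidence opt is set" rule.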

Usage

council =
  CouncilEx.Councils.JuryWithRetry.new(
    as: MyApp.Jury,
    judges: [
      {:j1, MyApp.Members.Judge, [provider: :openai, model: "gpt-4o"]},
      {:j2, MyApp.Members.Judge, [provider: :anthropic, model: "claude-sonnet-4-6"]},
      {:j3, MyApp.Members.Judge, [provider: :gemini, model: "gemini-2.5-pro"]}
    ],
    chair: {MyApp.Members.Synth, [provider: :openai, model: "gpt-4o"]},
    confidence_threshold: 0.75,
    max_iterations: 2
  )

CouncilEx.run(council, %{question: "..."})

After the run, result.rounds |> hd() |> Map.get(:metadata) |> Map.get(:iterations) reports how many iterations actually ran, including the initial one; a value of 1 means no retry fired.
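The metadata path described after the Usage example can be wrapped in small helpers. JuryResultSketch is hypothetical: it assumes only the result shape implied by that path (a :rounds list whose first entry carries metadata.iterations), and it counts every iteration after the first as a retry, following the max_iterations description (initial run plus retries).

```elixir
defmodule JuryResultSketch do
  # Hypothetical helpers; assumes a result shaped like
  # %{rounds: [%{metadata: %{iterations: n}} | _]}. A struct with these
  # fields would match the same pattern.
  def iterations(%{rounds: [%{metadata: %{iterations: n}} | _]}), do: n

  # Retries that fired: every iteration after the initial run.
  def retries(result), do: iterations(result) - 1
end
```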

When NOT to use this

  • Tasks where self-reported confidence correlates poorly with correctness (creative writing, open-ended generation). There the retry signal is meaningless.
  • Single-shot extraction where retry cost dominates. Use Councils.ParallelPanel instead.
  • Adversarial debate where you want judges to see each other's answers. Use Councils.PeerReview or Councils.Consensus.

See also

  • Councils.WeightedConsensus — weight-aware aggregation, no retry
  • Rounds.Iterate — primitive this topology composes
  • CouncilEx.Confidence — strategies for populating MemberResult.:confidence

Functions

new(opts)

@spec new(keyword()) :: module()

Build the JuryWithRetry topology as a generated council module.

See the module documentation above for options and defaults.