K-judge majority topology with confidence-triggered retry.
Pattern: K judges run :independent_analysis in parallel; aggregate
member confidence is checked against a threshold; if below, the
whole round re-runs (up to :max_iterations). Chair synthesizes the
final iteration.
Convergent design across multiple production systems and papers:
Chaos-MoA-Pipeline (Generate → Critique → Rebuttal → multi-judge
majority with confidence), Adjudicator (weighted-vote council with
KG override), and the broader literature on jury-style aggregation.
See docs/RELATED_WORK.md for citations.
Why retry on low confidence
Self-reported confidence is noisy but cheap. When the average across K judges clears the threshold, the easy-case path costs K judge calls. When it falls short, the round re-runs, and the hard-case path spends up to max_iterations × K calls (2K at the default, 3K with max_iterations: 3) to push the answer past threshold. Net: more compute on hard cases, none wasted on easy ones, the same shape as adaptive sampling in human juries.
Wu et al. Can LLM Agents Really Debate? (arXiv:2511.07784) shows that visible majority pressure causes correct agents to capitulate to wrong consensus. JuryWithRetry mitigates this by not sharing judges' answers across iterations: each retry is an independent re-sample, not a debate. Judges never see prior-iteration verdicts.
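The loop described above can be sketched in plain Elixir. This is a minimal illustration under stated assumptions, not the CouncilEx implementation: the module name JuryRetrySketch, the judge shape (a one-argument function returning a map with :verdict and :confidence), and the chair function are all hypothetical. It averages confidence across judges, as the text describes, and passes only the question to each judge so that retries are independent re-samples.

```elixir
defmodule JuryRetrySketch do
  # Hypothetical stand-in for the topology; names and shapes are assumptions.
  # Each judge is a function: question -> %{verdict: term, confidence: float}.
  # The chair is a function: list of judge results -> final answer.

  def run(question, [_ | _] = judges, chair, opts \\ []) do
    threshold = Keyword.get(opts, :confidence_threshold, 0.7)
    max_iterations = Keyword.get(opts, :max_iterations, 2)
    iterate(question, judges, chair, threshold, max_iterations, 1)
  end

  defp iterate(question, judges, chair, threshold, max_iterations, iteration) do
    # Judges run in parallel and receive only the question,
    # never the verdicts from an earlier iteration.
    results =
      judges
      |> Task.async_stream(fn judge -> judge.(question) end)
      |> Enum.map(fn {:ok, result} -> result end)

    mean = mean_confidence(results)

    if mean < threshold and iteration < max_iterations do
      # Below threshold with budget left: discard this round and re-sample.
      iterate(question, judges, chair, threshold, max_iterations, iteration + 1)
    else
      # The chair synthesizes from the final iteration only.
      %{
        answer: chair.(results),
        iterations: iteration,
        judge_calls: iteration * length(judges),
        mean_confidence: mean
      }
    end
  end

  defp mean_confidence(results) do
    Enum.sum(Enum.map(results, & &1.confidence)) / length(results)
  end
end
```

With three judges that always report 0.9, the sketch stops after one iteration and three judge calls. With a mean below 0.7 it re-runs once and stops at the default max_iterations of 2, for six judge calls in total.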
Defaults (sensible for most cases)
- confidence_threshold: 0.7. Calibrated against :self_report on frontier models. Above this, judges meaningfully agree they know the answer; below, the question is hard enough to warrant a re-sample. Tune up for higher-stakes calls, down for fuzzier tasks.
- max_iterations: 2. Initial run + at most one retry. Three or more iterations rarely change the verdict; the marginal calls are usually wasted. Bump to 3 for adversarial / safety-critical flows.
- Members auto-injected with confidence: :self_report if no :confidence opt is set. The retry mechanism is meaningless without confidence.
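The auto-injection rule in the last default can be illustrated with Keyword.put_new/3, which sets a key only when it is absent. This is a sketch of the behaviour as described, not the library's code: the module name ConfidenceDefaultSketch is hypothetical, and the {id, module, opts} tuple shape is assumed from the judges list in the Usage section.

```elixir
defmodule ConfidenceDefaultSketch do
  # Hypothetical helper; assumes members are {id, module, opts} tuples
  # where opts is a keyword list.
  def inject_default({id, module, opts}) when is_list(opts) do
    # put_new leaves an explicit :confidence opt untouched and
    # fills in :self_report only when none was given.
    {id, module, Keyword.put_new(opts, :confidence, :self_report)}
  end
end
```

A member declared with only provider and model opts comes back with confidence: :self_report added, while a member that already sets :confidence is returned unchanged.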
Usage
    council =
      CouncilEx.Councils.JuryWithRetry.new(
        as: MyApp.Jury,
        judges: [
          {:j1, MyApp.Members.Judge, [provider: :openai, model: "gpt-4o"]},
          {:j2, MyApp.Members.Judge, [provider: :anthropic, model: "claude-sonnet-4-6"]},
          {:j3, MyApp.Members.Judge, [provider: :gemini, model: "gemini-2.5-pro"]}
        ],
        chair: {MyApp.Members.Synth, [provider: :openai, model: "gpt-4o"]},
        confidence_threshold: 0.75,
        max_iterations: 2
      )

    CouncilEx.run(council, %{question: "..."})

After the run, result.rounds |> hd |> Map.get(:metadata) |> Map.get(:iterations) reports how many retries actually fired.
When NOT to use this
- Tasks where confidence-of-form is poorly correlated with correctness (creative writing, open-ended generation). The retry signal is meaningless.
- Single-shot extraction where retry cost dominates. Use Councils.ParallelPanel instead.
- Adversarial debate where you want judges to see each other. Use Councils.PeerReview or Councils.Consensus.
See also
- Councils.WeightedConsensus: weight-aware aggregation, no retry
- Rounds.Iterate: primitive this topology composes
- CouncilEx.Confidence: strategies for populating MemberResult.:confidence