CouncilEx.Rounds.AnonymizedPeerReview (CouncilEx v0.1.0)

Like CouncilEx.Rounds.PeerReview but each judge sees the prior round's outputs under anonymous labels ("Response A", "Response B", …) instead of original member ids.

Pattern from karpathy/llm-council: anonymizing peer outputs prevents models from playing favorites when ranking each other. The :label_to_id map is preserved through the round and handed to the aggregator so winners and rankings are reported in original-id space.

Decision guide for PeerReview vs AnonymizedPeerReview: docs/PEER_REVIEW_PATTERNS.md.

Why anonymize?

When LLMs judge each other's work with author identities visible, the rankings stop carrying useful signal. Anonymization prevents three failure modes:

  1. Self-recognition bias. LLMs recognize their own writing style (idioms, formatting, hedging patterns). Given a mixed pile labeled by id, a model spots its own output and ranks itself first. Every model does it. Result: every judge picks self → no winner, no signal.

  2. Brand bias. If labels expose model names (gpt-4o-mini, claude-sonnet-4-6), models defer to known-strong brands or attack rivals based on training-data sentiment, not actual answer quality. Judgments based on reputation, not text.

  3. Stable-position leakage. Repeated runs with the same id order let a judge learn "slot N = competitor, downrank." Stable id ordering across runs leaks the very signal that anonymization is meant to remove.

Anonymous labels (Response A/B/C) plus own-slot removal close all three. The judge sees only text and is forced to evaluate substance.

Why this lives in the library, not user code

User-side anonymization is doable but error-prone:

  • Easy to leak ids in prompts (forget to strip from one field).
  • Easy to assign per-judge labels inconsistently: if label A means a different model to different judges, aggregation breaks.
  • Easy to drop the de-anon map and lose UI traceability.

This round solves all three:

  • Global stable map — every judge sees gpt → Response A. Aggregation across judges is meaningful.
  • Own-slot removal — judge never sees its own answer at all. Self-recognition impossible.
  • Map preserved through to the aggregator: winner, scores, avg_position, and judge_ballots are all reported in original-id space.

When NOT to use it

  • Single-judge setups (no peer pool to anonymize over).
  • Tasks where author identity is the signal — e.g. "which model is most aligned with house style?". Use Rounds.PeerReview or a custom round so labels stay visible.
  • Heterogeneous member roles where ids carry semantic meaning the next round depends on (e.g. Researcher → Critic → Researcher revision). Anonymization removes role context the workflow needs. Use Rounds.PeerReview for cross-visibility, not blind judging.
  • When peer answers contain identifying content the round can't strip (model signs its name, includes vendor markers). Anonymization is label-level only; it does not sanitize content.

Member input shape

Each judge sees:

%{
  # ...original input fields merged in...
  peers: %{"Response A" => content, "Response B" => content, ...},
  peer_labels: ["Response A", "Response B", ...]
}

The judge's own slot is omitted (a judge does not rank itself), but every judge sees the same global label for each of the remaining ids. This means rankings can be aggregated across judges and translated back to original ids in one shot.
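The input shape above can be illustrated with a minimal sketch. It is not the CouncilEx implementation: `JudgeInputSketch` and its `build/4` function are hypothetical names, string member ids are an assumption, and the sketch takes the shared label map and the prior round's outputs as plain maps.

```elixir
defmodule JudgeInputSketch do
  # Hypothetical helper for illustration only; not part of CouncilEx.
  #
  # input:       the original input map for the round
  # judge_id:    id of the member acting as judge (assumed to be a string)
  # label_to_id: %{"Response A" => member_id, ...}, shared by all judges
  # outputs:     %{member_id => content} from the prior round
  def build(input, judge_id, label_to_id, outputs) do
    peers =
      label_to_id
      # Drop the judge's own slot; the other labels keep their global names.
      |> Enum.reject(fn {_label, id} -> id == judge_id end)
      |> Map.new(fn {label, id} -> {label, Map.fetch!(outputs, id)} end)

    peer_labels = peers |> Map.keys() |> Enum.sort()

    Map.merge(input, %{peers: peers, peer_labels: peer_labels})
  end
end

label_to_id = %{"Response A" => "claude", "Response B" => "gemini", "Response C" => "gpt"}
outputs = %{"claude" => "answer 1", "gemini" => "answer 2", "gpt" => "answer 3"}

JudgeInputSketch.build(%{question: "Q"}, "gemini", label_to_id, outputs)
# returns a map equal to:
# %{
#   question: "Q",
#   peers: %{"Response A" => "answer 1", "Response C" => "answer 3"},
#   peer_labels: ["Response A", "Response C"]
# }
```

Note that the judge with id "gemini" sees labels A and C, not a renumbered A and B. Keeping the gap is what lets a label mean the same member on every ballot.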

The judge's system prompt should instruct it to evaluate the entries in peers and emit :ordering (typically via CouncilEx.Schemas.Ranking) using the labels.

Aggregation

Default aggregator: CouncilEx.Aggregators.PeerRanking. The round threads the global :label_to_id map into aggregator opts, so winner, scores, avg_position, and judge_ballots are all reported in original-id space.
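As a rough illustration of why the shared map matters, the sketch below translates label-space ballots back to original ids and computes a mean rank position per id. It is not `CouncilEx.Aggregators.PeerRanking`: the `BallotSketch` module, the ballot shape (`%{judge_id => [label, ...]}`, best first), and the reading of avg_position as a plain mean of 1-based positions are all assumptions made for this example.

```elixir
defmodule BallotSketch do
  # Hypothetical helper for illustration only; not part of CouncilEx.

  # ballots:     %{judge_id => [label, ...]}, best first (assumed shape)
  # label_to_id: %{"Response A" => member_id, ...}, shared by all judges
  def to_ids(ballots, label_to_id) do
    Map.new(ballots, fn {judge, ordering} ->
      {judge, Enum.map(ordering, &Map.fetch!(label_to_id, &1))}
    end)
  end

  # Mean 1-based position of each id over the ballots it appears in.
  def avg_position(id_ballots) do
    id_ballots
    |> Enum.flat_map(fn {_judge, ids} -> Enum.with_index(ids, 1) end)
    |> Enum.group_by(fn {id, _pos} -> id end, fn {_id, pos} -> pos end)
    |> Map.new(fn {id, positions} -> {id, Enum.sum(positions) / length(positions)} end)
  end
end

label_to_id = %{"Response A" => "claude", "Response B" => "gemini", "Response C" => "gpt"}

# Each judge ranks only the labels it was shown (its own slot is absent).
ballots = %{
  "claude" => ["Response C", "Response B"],
  "gemini" => ["Response C", "Response A"],
  "gpt" => ["Response A", "Response B"]
}

id_ballots = BallotSketch.to_ids(ballots, label_to_id)
# "claude" => ["gpt", "gemini"], "gemini" => ["gpt", "claude"], "gpt" => ["claude", "gemini"]

BallotSketch.avg_position(id_ballots)
# returns a map equal to %{"claude" => 1.5, "gemini" => 2.0, "gpt" => 1.0}
```

Because every ballot uses the same label for the same member, one lookup per entry is enough to move all ballots into original-id space.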

Determinism

Labels are assigned by sorting prior-round member ids alphabetically. This keeps the same id → label mapping stable across all judges and across prepare_input/3 / parse_output/3 / aggregate/2.
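A minimal sketch of alphabetical label assignment, under stated assumptions: `LabelSketch.label_to_id/1` is a hypothetical helper rather than a CouncilEx function, ids are converted to strings before sorting, and at most 26 members are supported. How the library handles atom ids or larger pools is not covered by the text above.

```elixir
defmodule LabelSketch do
  # Hypothetical helper for illustration only; not part of CouncilEx.
  # Sorts member ids alphabetically and assigns "Response A", "Response B", ...
  # in that order. Assumes at most 26 ids.
  def label_to_id(ids) do
    ids
    |> Enum.map(&to_string/1)
    |> Enum.sort()
    |> Enum.with_index()
    |> Map.new(fn {id, idx} -> {"Response " <> <<?A + idx>>, id} end)
  end
end

LabelSketch.label_to_id(["gpt", "claude", "gemini"])
# returns a map equal to:
# %{"Response A" => "claude", "Response B" => "gemini", "Response C" => "gpt"}
```

Since the mapping depends only on the sorted ids, recomputing it at any stage with the same member set gives the same result regardless of the order in which ids arrive.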