Per-member historical reliability tracking for adaptive weighting.
Inspired by Wu et al. Council Mode (arXiv:2604.02923): "Agents demonstrating higher accuracy on similar historical queries receive elevated weights during aggregation."
Pieces
- CouncilEx.Reliability.Store: behaviour for record/3 and score/2.
- CouncilEx.Reliability.ETS: default in-memory backend. Process-independent; survives across runs in the same BEAM node. Wiped on restart.
- CouncilEx.Reliability.Null: no-op default that always returns nil from score/2. Used when no store is configured.
Persistent backends (Postgres / Redis / etc.) implement the same behaviour. Out of scope for core — host apps wire their own.
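As a rough illustration of what a custom backend might look like, the sketch below defines a stand-in behaviour and an Agent-backed implementation. Everything here is an assumption: StoreSketch and AgentStoreSketch are hypothetical names, the callback types are guessed from the record/3 and score/2 arities, and the scoring rule (fraction of correct outcomes per member, ignoring query_features) is a simplification rather than what the shipped backends do.

```elixir
defmodule StoreSketch do
  # Hypothetical stand-in for CouncilEx.Reliability.Store.
  # Callback shapes are assumed from the documented arities only.
  @callback record(atom(), map(), boolean()) :: :ok
  @callback score(atom(), map()) :: float() | nil
end

defmodule AgentStoreSketch do
  @behaviour StoreSketch

  # Starts a named Agent holding %{member_id => {hits, total}}.
  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  @impl true
  def record(member_id, _query_features, match?) when is_boolean(match?) do
    # Simplification: query_features is ignored, so history is per member only.
    Agent.update(__MODULE__, fn state ->
      {hits, total} = Map.get(state, member_id, {0, 0})
      hits = if match?, do: hits + 1, else: hits
      Map.put(state, member_id, {hits, total + 1})
    end)
  end

  @impl true
  def score(member_id, _query_features) do
    case Agent.get(__MODULE__, &Map.get(&1, member_id)) do
      # No history yet: cold start.
      nil -> nil
      # Fraction of recorded outcomes that were correct.
      {hits, total} -> hits / total
    end
  end
end
```

A persistent backend would replace the Agent calls with database reads and writes while keeping the same two callbacks.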
Lifecycle
- Run a council. Members produce outputs.
- Out-of-band signal — eval harness, downstream metric, user feedback — decides if a member's output was correct.
- Caller invokes Reliability.record(:member_id, query_features, match?) once per (member, query).
- Future runs: WeightedConsensus (or any weighted aggregator) calls Reliability.score(:member_id, query_features) to fetch a reliability prior. Cold start → nil → equal-weight fallback.
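The last step of the lifecycle, turning scores into weights with an equal-weight fallback, could be sketched roughly as follows. WeightSketch is a hypothetical helper, not part of CouncilEx, and the policy shown (fall back to equal weights for everyone if any member has no history, or if all scores are zero) is one possible reading of the cold-start rule; WeightedConsensus may handle it differently.

```elixir
defmodule WeightSketch do
  # Hypothetical helper: maps %{member_id => score | nil} to normalised weights.
  def weights(scores) when is_map(scores) and map_size(scores) > 0 do
    values = Map.values(scores)
    equal = 1.0 / map_size(scores)

    # `or` short-circuits, so Enum.sum/1 never sees a nil value.
    if Enum.any?(values, &is_nil/1) or Enum.sum(values) == 0 do
      # Assumed policy: any cold-start member means equal weights for all.
      Map.new(scores, fn {member, _score} -> {member, equal} end)
    else
      total = Enum.sum(values)
      # Normalise so the weights sum to 1.0.
      Map.new(scores, fn {member, score} -> {member, score / total} end)
    end
  end
end
```

In this sketch the input map would be built by calling Reliability.score/3 once per member before aggregation.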
This module is the chicken-and-egg layer: you need ground-truth
signals to populate it, and bench/ is the obvious source.
Configuration
config :council_ex, :reliability_store, CouncilEx.Reliability.ETS
Or pass a store opt to score/3 / record/4:
Reliability.score(:m1, %{features: ...}, store: MyApp.PgStore)
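One plausible way the two configuration paths could combine is shown below: a per-call store option takes precedence, then the application config, then the no-op Null backend. This precedence order and the StoreResolutionSketch module are assumptions for illustration; the library's actual lookup may differ.

```elixir
defmodule StoreResolutionSketch do
  # Hypothetical resolution order: opts[:store], then app config, then Null.
  def resolve(opts \\ []) do
    Keyword.get_lazy(opts, :store, fn ->
      Application.get_env(
        :council_ex,
        :reliability_store,
        CouncilEx.Reliability.Null
      )
    end)
  end
end
```

Using Keyword.get_lazy/3 means the application environment is only read when the caller did not pass a store explicitly.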
Functions
record/4
Records one outcome for a member on a query. match? is true when the member's output was correct.
score/3
Looks up a reliability score for a member on a query.
Returns a float in [0.0, 1.0], or nil when no history exists
(cold start). Callers should treat nil as "use equal weight".