Per-member confidence extraction.
Inspired by Wu et al. Council Mode (arXiv:2604.02923), which weights
agent contributions by reliability metrics. Confidence is one input to
CouncilEx.Councils.WeightedConsensus.
Strategies
:self_report appends a confidence-rating instruction to the member's system prompt and parses a JSON tail of the form {"confidence": 0.0..1.0}. Cheap and supported by every adapter, but noisy: LLMs are poorly calibrated self-raters.
:logprob sums normalized top-token logprobs across the response. Pluggable per adapter; only OpenAI exposes logprobs reliably today. Returns nil when the provider response carries no logprob data.
{:semantic_entropy, samples: n} computes n-sample variance via repeated calls. Expensive (n× cost) but the most calibrated. Not yet implemented; reserved.
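As a rough illustration of the :logprob idea, the sketch below length-normalizes the summed token logprobs and maps the result back to a probability (the geometric mean of the token probabilities). This is one plausible reading of "normalized", not necessarily what CouncilEx does; the module name LogprobSketch and the bare list-of-floats input are assumptions made for the example.

```elixir
defmodule LogprobSketch do
  # Hypothetical helper, not part of CouncilEx.
  # Input: natural-log probabilities of the sampled tokens, or nil.
  @spec confidence([number()] | nil) :: float() | nil
  def confidence(nil), do: nil
  def confidence([]), do: nil

  def confidence(logprobs) when is_list(logprobs) do
    # Length-normalized sum, i.e. the mean token logprob.
    mean = Enum.sum(logprobs) / length(logprobs)
    # Logprobs are <= 0, so exp(mean) falls in (0.0, 1.0]; min/2 guards
    # against slightly positive inputs.
    min(:math.exp(mean), 1.0)
  end
end
```

Returning nil for missing data mirrors the behaviour described above for provider responses that carry no logprobs.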
Usage
member :critic, MyApp.Members.Critic,
  provider: :openai,
  model: "gpt-4o-mini",
  confidence: :self_report

When :confidence is set, the member's %MemberResult{} will carry
a :confidence float (0.0–1.0) or nil if extraction failed.
Strategies are opt-in. The default is nil, preserving prior behaviour.
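Because :confidence may be nil, any consumer has to decide how to treat members without a score. The sketch below shows one way a confidence-weighted vote could handle that. It is illustrative only: plain maps stand in for %MemberResult{}, the :answer key and the 0.5 fallback weight are assumptions, and this is not the CouncilEx.Councils.WeightedConsensus implementation.

```elixir
defmodule ConfidenceWeightSketch do
  # Hypothetical; plain maps with :answer and :confidence keys stand in
  # for %MemberResult{}.
  @default_weight 0.5

  # Use the extracted confidence when present, else a neutral fallback.
  def weight(%{confidence: c}) when is_float(c), do: c
  def weight(_result), do: @default_weight

  # Sums weights per answer and returns the answer with the largest total.
  # Raises on an empty list, since there is nothing to pick.
  def pick(results) do
    results
    |> Enum.group_by(& &1.answer, &weight/1)
    |> Enum.map(fn {answer, weights} -> {answer, Enum.sum(weights)} end)
    |> Enum.max_by(fn {_answer, total} -> total end)
    |> elem(0)
  end
end
```

With this fallback, a member whose extraction failed still counts, but less than a member reporting high confidence.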
Types
@type strategy() :: :self_report | :logprob | {:semantic_entropy, keyword()} | nil
Functions
@spec apply_to_member_result(CouncilEx.MemberResult.t(), strategy()) :: CouncilEx.MemberResult.t()
Apply strategy to a %MemberResult{}, populating :confidence.
Augment a member's system prompt for the given strategy.
Pure function: the caller decides whether and when to apply it. Returns the prompt unchanged for strategies that do not require prompt mutation.
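A minimal sketch of what such a pure augmentation step might look like follows. The module name PromptSketch and the instruction wording are hypothetical; the actual text CouncilEx appends is not shown in this documentation.

```elixir
defmodule PromptSketch do
  # Hypothetical wording, chosen only for illustration.
  @instruction ~s|End your reply with a JSON object on its own line: {"confidence": <number between 0.0 and 1.0>}|

  # :self_report needs the model to emit a parseable confidence tail.
  def augment(prompt, :self_report) do
    prompt <> "\n\n" <> @instruction
  end

  # Every other strategy, including nil, leaves the prompt untouched.
  def augment(prompt, _strategy), do: prompt
end
```

Since the function has no side effects, calling it twice with the same arguments yields the same prompt, which keeps it easy to test in isolation.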
@spec extract(CouncilEx.Response.t(), strategy()) :: {CouncilEx.Response.t(), float() | nil}
Extract a confidence float from a %Response{} for the given strategy.
Returns {updated_response, confidence | nil}. For :self_report,
strips the parsed JSON tail from :content so downstream consumers
don't see it.
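The :self_report behaviour described above (parse the JSON tail, then strip it from the content) can be sketched with a regular expression. This is an assumption-laden illustration: it works on a plain content string rather than a %Response{}, the module name SelfReportSketch is hypothetical, and the real parser may accept a wider range of JSON.

```elixir
defmodule SelfReportSketch do
  # Hypothetical; not the CouncilEx implementation.
  # Matches a trailing {"confidence": <number>} object plus the
  # whitespace around it, anchored at the end of the string.
  defp tail_regex do
    ~r/\s*\{\s*"confidence"\s*:\s*(\d+(?:\.\d+)?)\s*\}\s*\z/
  end

  @spec extract(String.t()) :: {String.t(), float() | nil}
  def extract(content) when is_binary(content) do
    with [tail, raw] <- Regex.run(tail_regex(), content),
         {value, ""} <- Float.parse(raw),
         true <- value >= 0.0 and value <= 1.0 do
      # Drop the matched tail so downstream consumers see only the answer.
      stripped = binary_part(content, 0, byte_size(content) - byte_size(tail))
      {stripped, value}
    else
      # No tail, unparseable number, or out-of-range value.
      _ -> {content, nil}
    end
  end
end
```

In this sketch an out-of-range value leaves the content untouched and yields nil; whether the library strips the tail in that case is not stated here.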