CouncilEx.BiasDetector (CouncilEx v0.1.0)


Diagnostic detector for demographic-laden disagreement across member outputs.

Inspired by the Bias Detector component in Wu et al. Council Mode (arXiv:2604.02923). The paper observes that when heterogeneous models disagree, the disagreement sometimes correlates with demographic axes (gender, ethnicity, religion, age, ability). This module surfaces that correlation as a structured report.

Diagnostic only — does not mitigate. Reading the report is on you.

Backends

  • :lexicon (default) — substring/regex match against a built-in term list per axis. Cheap, but prone to false positives: any neutral mention of "women" or "Christians" trips it. A default lexicon ships with the module; users can extend or replace it.
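
To make the false-positive behaviour concrete, here is a minimal sketch of lexicon matching. It is not the module's actual implementation: `LexiconSketch`, its two-axis term list, and the whole-word regex are all assumptions made for illustration.

```elixir
defmodule LexiconSketch do
  # Hypothetical two-axis lexicon; the real default term list is not shown here.
  @lexicon %{
    gender: ["women", "men"],
    religion: ["christians", "muslims"]
  }

  # Returns %{axis => [matched_terms]} for one piece of text,
  # keeping only axes with at least one match.
  def match(text, lexicon \\ @lexicon) do
    downcased = String.downcase(text)

    for {axis, terms} <- lexicon,
        hits = Enum.filter(terms, &word_match?(downcased, &1)),
        hits != [],
        into: %{} do
      {axis, hits}
    end
  end

  # Whole-word match, so "men" does not fire inside "comment" or "women".
  defp word_match?(text, term) do
    Regex.match?(~r/\b#{Regex.escape(term)}\b/, text)
  end
end
```

A sentence such as "Many women attended." matches the `:gender` axis even though nothing biased is being said, which is why lexicon scores are best read as a prompt for inspection rather than a verdict.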

Future backends planned: :llm_judge (separate LLM rates the disagreement), :embedding_cluster (cluster responses, correlate cluster membership with demographic phrasing).

Usage

member_results = %{
  a: %CouncilEx.MemberResult{
    status: :ok,
    response: %CouncilEx.Response{content: "..."}
  },
  b: %CouncilEx.MemberResult{...}
}

report = CouncilEx.BiasDetector.analyze(member_results)
# %{
#   flagged: true,
#   axes: [%{axis: :gender, score: 0.5, evidence: [...]}, ...],
#   baseline_disagreement: 0.4
# }

Report shape

  • :flagged — boolean. True if any axis crosses the threshold (default 0.3).
  • :axes — list of %{axis, score, evidence} per axis where members differ in coverage of demographic terms. :score is coverage_variance ∈ [0, 1]. :evidence is a list of {member_id, [matched_terms]}.
  • :baseline_disagreement — content-similarity proxy across all members (Jaccard over token sets, 0 = identical, 1 = no overlap). High baseline + low axis scores = members disagree on substance, not demographics.
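
The baseline figure can be illustrated with a short sketch of Jaccard distance over token sets. The tokenizer and the averaging over member pairs are assumptions; the module may tokenize or aggregate differently, and `JaccardSketch` is a hypothetical name.

```elixir
defmodule JaccardSketch do
  # Lowercased alphanumeric tokens as a set (assumed tokenizer).
  def tokens(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> MapSet.new()
  end

  # Jaccard distance: 0.0 for identical token sets, 1.0 for disjoint ones.
  def distance(a, b) do
    sa = tokens(a)
    sb = tokens(b)
    union = MapSet.size(MapSet.union(sa, sb))

    if union == 0 do
      0.0
    else
      1.0 - MapSet.size(MapSet.intersection(sa, sb)) / union
    end
  end

  # Mean distance over all unordered pairs of member texts
  # (averaging is an assumption, not taken from the module).
  def baseline(texts) do
    indexed = Enum.with_index(texts)
    pairs = for {a, i} <- indexed, {b, j} <- indexed, i < j, do: {a, b}

    case pairs do
      [] -> 0.0
      _ -> Enum.sum(Enum.map(pairs, fn {a, b} -> distance(a, b) end)) / length(pairs)
    end
  end
end
```

Under these assumptions, two members answering "a b c" and "b c d" share two of four distinct tokens, giving a distance of 0.5.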

Types

axis()

@type axis() :: :gender | :ethnicity | :religion | :age | :ability | atom()

report()

@type report() :: %{
  flagged: boolean(),
  axes: [%{axis: axis(), score: float(), evidence: [{atom(), [String.t()]}]}],
  baseline_disagreement: float()
}

Functions

analyze(member_results, opts \\ [])

@spec analyze(
  %{required(atom()) => CouncilEx.MemberResult.t()},
  keyword()
) :: report()

Analyze member results and return a bias report.

Options

  • :lexicon — %{axis => [terms]} to override the default.
  • :threshold — float 0.0..1.0. Axes scoring above this set :flagged => true for the whole report. Default 0.3.
  • :backend — :lexicon (default). Reserved for future backends.
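
The effect of :threshold on :flagged can be sketched as follows. `ThresholdSketch` is a hypothetical helper, and the strict `>` comparison is an assumption based on the wording "scoring above"; the module may treat a score equal to the threshold differently.

```elixir
defmodule ThresholdSketch do
  # True when at least one axis score is strictly above the threshold.
  def flagged?(axes, threshold \\ 0.3) do
    Enum.any?(axes, fn %{score: score} -> score > threshold end)
  end
end

axes = [
  %{axis: :gender, score: 0.5, evidence: []},
  %{axis: :age, score: 0.1, evidence: []}
]

ThresholdSketch.flagged?(axes)       # true: 0.5 is above the default 0.3
ThresholdSketch.flagged?(axes, 0.6)  # false: no axis is above 0.6
```

Raising the threshold trades sensitivity for fewer spurious flags, which may be worthwhile given how readily the lexicon backend matches neutral mentions.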

default_lexicon()

@spec default_lexicon() :: %{required(axis()) => [String.t()]}

Returns the default lexicon. Pass your own via :lexicon opt to analyze/2.
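
Extending rather than replacing the default lexicon can be sketched with `Map.merge/3`. The `base` map below is a made-up stand-in for the value returned by default_lexicon/0, and `LexiconMergeSketch` is a hypothetical helper, not part of CouncilEx.

```elixir
defmodule LexiconMergeSketch do
  # Merge extra terms into a base lexicon, concatenating and de-duplicating
  # term lists for axes present in both maps.
  def extend(base, extra) do
    Map.merge(base, extra, fn _axis, old, new -> Enum.uniq(old ++ new) end)
  end
end

# Stand-in for CouncilEx.BiasDetector.default_lexicon(); the terms are invented.
base = %{gender: ["women", "men"], age: ["elderly"]}

custom =
  LexiconMergeSketch.extend(base, %{age: ["teenagers"], nationality: ["immigrants"]})

# custom.age is ["elderly", "teenagers"]; :nationality is a new, custom axis.

# With the library loaded, the merged map would be passed as:
# CouncilEx.BiasDetector.analyze(member_results, lexicon: custom)
```

Because axis() admits any atom, a custom axis such as `:nationality` fits the documented type.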