Multi-model LLM council workflows for Elixir.

Define a council of specialized members, run structured rounds of analysis, and synthesize a final answer. Works against popular providers (OpenAI, Anthropic, Gemini, Ollama, OpenRouter). Built to get richer answers from multiple models while keeping control over the process.

Inspired by Andrej Karpathy's karpathy/llm-council: the multi-stage peer-review pattern that motivated this framework.

CouncilEx

Features

Ordered roughly from core primitives → execution → per-member capabilities → reliability → observability → dev tooling.

  • 🏛️ Static & dynamic councils: declare councils with the use CouncilEx DSL or build them as data via %CouncilEx.DynamicCouncil{} (pipeable builder, JSON ser/de, registry-by-string-name).
  • 🔌 Multi-provider adapters: OpenAI, Anthropic, Gemini, and OpenRouter implement the CouncilEx.Provider.Adapter behaviour; Ollama ships as a config preset over the OpenAI adapter. All five are built in.
  • 🥊 Round library: :independent_analysis, :peer_review, :vote, :pairwise_elimination, plus prebuilt Councils.{Specialist,Consensus,Tournament,WeightedConsensus,JuryWithRetry} and a custom-round behaviour.
  • ⚖️ Confidence-triggered retry: Councils.JuryWithRetry runs K judges in parallel and re-samples on low average confidence (default threshold 0.7, max 2 iterations). Judges do not see each other across retries: each retry is an independent re-sample, not a debate. The same pattern appears in Chaos-MoA, Adjudicator, and production systems, and it follows the conformity finding of Wu et al., Can LLM Agents Really Debate? (arXiv:2511.07784).
  • ⚖️ Reliability-weighted consensus: Councils.WeightedConsensus weights member contributions by static :weight opts, per-member :confidence scores, or historical Reliability lookups. Inspired by Wu et al. Council Mode (arXiv:2604.02923); full mapping in docs/COUNCIL_MODE_PAPER.md.
  • 🎯 Per-member confidence: opt-in :confidence strategies (:self_report, :logprob) populate %MemberResult{}.confidence for downstream weighting.
  • 🔍 BiasDetector: diagnostic-only CouncilEx.BiasDetector.analyze/2 flags when member disagreement correlates with demographic axes (gender, ethnicity, religion, age, ability). Lexicon backend in core. LLM-judge and embedding-cluster backends planned.
  • 📚 Reliability store: CouncilEx.Reliability (ETS default, pluggable) tracks per-member historical accuracy by query features. Feeds WeightedConsensus for adaptive weighting.
  • Sync + async runs: blocking run/3 for short workflows, start/3 (GenServer.start/3 semantics: unsupervised, unlinked) and start_link/3 (linked to caller) for async. Both return {:ok, pid}. Communicate with the runner via message passing, like any GenServer.
  • 🛂 Pre-run validation: CouncilEx.validate/1 returns structured [%{path, code, message}] errors for module-form or %DynamicCouncil{} councils. start/3 gates on it so config errors return {:error, {:invalid_council, errs}} before any process spawns or token is spent.
  • 🌳 Optional run grouping: CouncilEx.Supervisor is a thin DynamicSupervisor wrapper for callers who want tenant isolation, bulk-terminate, or in-flight visibility. Library has no bundled supervisor: runs are unsupervised by default (caller's responsibility, like GenServer.start/3).
  • 🪆 Sub-councils: nest a council as a member; works in static and dynamic forms (registered name, module atom, or nested %DynamicCouncil{}) with optional input mappers.
  • 🚦 Routers: dynamic next-step selection between members or rounds, declared inline or registered by name.
  • 🤖 AutoCouncil: opt-in routing layer. A council that picks itself. Pluggable strategies (:rules, :cascade, plus stub :embedding / :llm_classify / :llm_build) select an existing council per prompt, or synthesize a fresh %DynamicCouncil{} on the fly. Same CouncilEx.run/3 entry. Routing decision surfaced in result.metadata.auto.
  • 🛠️ Tool calling: parallel tool execution with concurrency + timeout knobs, multi-iteration tool-loops in both complete/2 and stream/3, and :tool_choice (:auto | :required | :none | "name").

  • 📚 RAG via tools: council-level add_council_tool/2 exposes a shared toolset to every member. Per-member :tools keeps specialist corpora private. CouncilEx.Tools.InMemoryDocs is a zero-dep BM25 retrieval tool baked from a compile-time corpus, useful for examples and tests. Production retrieval should wrap your real index. See docs/RAG.md.
  • 📐 Structured output: Ecto-schema or inline JSON Schema per member, with native responseSchema (Gemini) and tool-shaped fallback (OpenAI/Anthropic).
  • 🌊 Streaming: token-level streaming with sink callbacks, integrated with the tool-loop so tool-spanning turns look like one continuous response.
  • 🎛️ Profiles: reusable per-member capability bundles (provider, model, temperature, tools, retry); 9 prebaked profiles plus user-defined use CouncilEx.Profile modules.
  • 🔀 Polymorphic dispatch: CouncilEx.run/3 and start/3 take either a module-form council or a %DynamicCouncil{}; one execution path, identical semantics.
  • 🛡️ Failure handling: per-round failure_mode: :continue | :fail_fast, retry policies, member timeouts, run-level cancel/1, and structured %CouncilEx.Error{}.

  • 📒 Registry: config + runtime registration of profiles, tools, schemas, routers, rounds, sub-councils, and input mappers, all resolvable by string name.
  • 📡 PubSub events: 10 frozen events on "council_ex:run:#{run_id}" (CouncilEx.Events); idempotent subscribe across :pg and Phoenix.PubSub adapters.
  • 📊 Telemetry: [:council_ex, :run | :round | :member | :tool, :*] events with full parity on the modern async path; ~3µs/event overhead.

  • 🔍 Verbose tracer: verbose: true | :debug opt prints a human-readable per-run timeline (member start/stop, durations, tokens, tool calls). Pure event consumer, zero production cost when off.

  • 🗺️ Diagram tooling: CouncilEx.Diagram.{to_ir,topology,sequence} for both council shapes; IR is React-Flow-friendly JSON.
  • 🧪 Mock provider: scriptable in-memory provider for tests and example fixtures (CouncilEx.Providers.Mock.script/2); not for production use.

Installation

def deps do
  [
    {:council_ex, "~> 0.1"}
  ]
end

Real LLM providers need a configured adapter + API key (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY). See Providers.

Optional dependencies

The core (parallel rounds, aggregation, streaming, tools, telemetry/PubSub observability) needs nothing beyond the dep above. Each opt-in backend pulls its own library — add it only when you use that feature:

| Feature | Add to deps | Docs |
| --- | --- | --- |
| Ecto persistence (Recorder/Registry/Reliability.Ecto, migrations) | {:ecto_sql, "~> 3.13"} + a driver, e.g. {:postgrex, "~> 0.20"} | PERSISTENCE.md |
| Durable background runs | {:oban, "~> 2.19"} | RUNNING_WITH_OBAN.md |
| Redis backends (Registry/Reliability.Redis) | {:redix, "~> 1.5"} | |
| Route events through your own PubSub | {:phoenix_pubsub, "~> 2.1"} | RUNNING_IN_PHOENIX.md |

These are declared optional: true, so they are not installed transitively — including under Mix.install (e.g. in a Livebook). council_ex compiles fine without them; the relevant modules are simply omitted until the dep is present.
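As an illustration, a project that wants Ecto persistence on Postgres would list the core dep plus the two optional ones from the table above. This is only a sketch of the deps/0 function in mix.exs: the version requirements are copied from the table, and the choice of Postgres as the driver is an example, not a requirement.

```elixir
# mix.exs (fragment): core dep plus the optional Ecto persistence backend.
def deps do
  [
    {:council_ex, "~> 0.1"},
    # Optional: only needed when you use the *.Ecto backends.
    {:ecto_sql, "~> 3.13"},
    {:postgrex, "~> 0.20"}
  ]
end
```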

Quickstart

This walkthrough uses OpenRouter to answer the meta-question: when should you use an LLM council instead of a single model call? OpenRouter is the easiest way to start. One API key reaches every major frontier model (openai/gpt-4o, anthropic/claude-sonnet-4-6, google/gemini-2.5-flash, meta-llama/llama-3.3-70b-instruct, etc.), so a multi-model council needs no extra wiring. The same council code runs against OpenAI, Anthropic, Gemini, or Ollama directly. See Providers.

# 1. Configure OpenRouter. Set OPENROUTER_API_KEY in your shell.
Application.put_env(:council_ex, :providers,
  openrouter: [
    adapter: CouncilEx.Provider.Adapters.OpenRouter,
    api_key: {:system, "OPENROUTER_API_KEY"}
  ]
)

# 2. Define members (identity: role + system prompt)
defmodule MyApp.Members.Advocate do
  use CouncilEx.Member
  role "Advocate"

  system_prompt """
  You argue FOR using a multi-model LLM council. Given the user's task,
  list 3-5 concrete situations where multiple model voices outperform a
  single call (e.g. high-stakes decisions, contested judgement, weak
  ground truth, creative divergence). Be specific. No hedging.
  """
end

defmodule MyApp.Members.Skeptic do
  use CouncilEx.Member
  role "Skeptic"

  system_prompt """
  You argue AGAINST using a multi-model LLM council. Given the user's
  task, list 3-5 concrete situations where a council is overkill or
  actively harmful (latency, cost, false consensus, deterministic
  problems with a known answer). Be specific. No hedging.
  """
end

defmodule MyApp.Members.Synthesizer do
  use CouncilEx.Member
  role "Synthesizer"

  system_prompt """
  Read the Advocate's and Skeptic's lists. Produce a short decision rule
  the reader can apply to their own task: "use a council when …, skip it
  when …". Two short paragraphs max.
  """
end

# 3. Define a council (capability: provider + model)
#    Each member can run on a different frontier model. That's the
#    point. OpenRouter exposes them all under one provider.
defmodule MyApp.WhenToCouncil do
  use CouncilEx

  member :advocate, MyApp.Members.Advocate,
    provider: :openrouter, model: "openai/gpt-4o-mini"

  member :skeptic, MyApp.Members.Skeptic,
    provider: :openrouter, model: "anthropic/claude-sonnet-4-6"

  round :independent_analysis

  chair MyApp.Members.Synthesizer, id: :chair,
    provider: :openrouter, model: "openai/gpt-4o"
end

# 4. Run
{:ok, result} =
  CouncilEx.run(
    MyApp.WhenToCouncil,
    %{question: "When should I use an LLM council instead of a single LLM call?"}
  )

IO.puts(result.final.content)

Example run output (VERBOSE=1 mix run examples/quickstart_example.exs):

```
VERBOSE=1 mix run examples/quickstart_example.exs 17s 17:48:01
▶ run …Sd72NF started council=QuickstartCouncil.WhenToCouncil
▶ round independent_analysis (#0)
▶ advocate
▶ skeptic
✓ advocate 5547ms in=90 out=361
✓ skeptic 5839ms in=91 out=398
✓ round independent_analysis
▶ round synthesis (#1)
▶ chair
✓ chair 3825ms in=918 out=156
✓ round synthesis
✓ run …Sd72NF ok

=== Panel members (independent_analysis round) ===

[advocate]
1. **High-Stakes Decisions**: multiple model voices minimize catastrophic-error risk …
2. **Contested Judgement**: subjective calls benefit from differing viewpoints …
… (5 situations total)

[skeptic]
1. **Latency in Time-Sensitive Applications**: each model query adds delay …
2. **Cost-Efficiency in High-Volume Use Cases**: per-call costs multiply …
… (5 situations total)

=== Final synthesis (chair) ===
Use a council for complex, high-stakes, or contested decisions where multiple perspectives or weak/disputed data justify the extra cost. Skip it when the task demands speed, cost-efficiency, or has a clear single answer.

Total duration: 15226ms
Total tokens: 2014
```

The Advocate and Skeptic run in parallel during the :independent_analysis round; the Synthesizer chair sees both outputs and produces the final answer. Inspect result.rounds for each member's verdict and result.metadata for token + timing totals.

A runnable version of this exact council lives in examples/quickstart_example.exs. Run it with OPENROUTER_API_KEY=sk-or-v1-... mix run examples/quickstart_example.exs.

Single-vendor variant: If you only have one vendor's API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY), swap the provider config in step 1 for that vendor's adapter and use that vendor's model ids (see Providers). The council code is unchanged.

Karpathy-style 3-stage council: for the opinions → anonymized peer review → chairman pattern from karpathy/llm-council, see docs/TUTORIAL_KARPATHY_COUNCIL.md and the runnable examples/karpathy_council_example.exs. Decision guide for picking between PeerReview and AnonymizedPeerReview lives at docs/PEER_REVIEW_PATTERNS.md.

Mock provider: CouncilEx.Providers.Mock exists for tests and deterministic example fixtures only. Do not use it as a stand-in for a real LLM in application code. See Test helpers.

Examples

Index of examples/*.exs. Every example runs against a real provider (OpenAI or OpenRouter by default; see the Run: comment at the top of each file for the required API key). The Mock provider exists for tests only; do not run it as a stand-in for an LLM in examples.

Most examples support the COUNCIL_FORM=static|dynamic env switch (see Dual-form pattern). Examples that don't support the switch: dynamic_council_example.exs (already dynamic), the prebaked Councils.{Specialist,Consensus,Tournament}.new/1 wrappers (specialist, consensus, tournament), and the council-bypass demos (parallel_tools, tool_call_events). sub_council_example.exs also supports the switch.

Topologies & composition

Profiles & dynamic councils

Custom rounds & voting

Streaming & tools

Operational concerns

Per-provider quickstarts

Concepts

Vocabulary used throughout the rest of the README.

  • Council: the workflow itself. A named ordering of members + rounds + an optional chair. Two interchangeable forms: module-form (use CouncilEx) or data-form (%CouncilEx.DynamicCouncil{}).
  • Member: one LLM seat at the table. Defines identity (role, system_prompt, optional output_schema). Identity is reusable; pair it with different capability stacks via Profiles.
  • Profile: capability stack (provider, model, temperature, max_tokens, tools, retry). Same Member + different Profile = same brain, different model. Resolution: inline opts > member :profile > council default_profile > app config.
  • Round: one phase of the run. Built-in types: :independent_analysis (members run in parallel), :peer_review (members see each other's prior turn), :vote (each member emits a ballot, aggregator picks a winner), :pairwise_elimination (tournament bracket), plus :anonymized_peer_review, :critique, :ranking, :synthesis, :iterate, and user-defined CouncilEx.Round modules. A council can have any number of rounds.
  • Chair: final synthesis member. Runs once after all rounds, sees every prior member output, and produces %Result{}.final. Optional. Councils without a chair return per-round results only.
  • Router: dynamic next-step picker. Inspects state mid-run and chooses the next member or round. Inline closure or registered-by-name.
  • Sub-council: a council used as a member of another council. Composes vertically: the outer council sees the sub-council's final as that member's response. Works in static + dynamic forms.
  • Run: one execution of a council against an input. Identified by run_id. Sync via run/3, async via start/3 + await/2 / cancel/1.
  • Result: %CouncilEx.Result{} returned from run/3 and await/2. Carries input, per-round %RoundResult{} (with per-member %MemberResult{}), final chair response, status, errors, and metadata (timings + token totals).
  • Tool: Elixir module implementing CouncilEx.Tool that the model can call mid-turn. Parallel execution + multi-iteration tool-loops are built in.
  • Aggregator: function that reduces a :vote round's ballots into a winner. Plurality, Borda, Condorcet, WeightedMean, and Median ship in core; user-defined ones plug into the same interface.
  • Registry: runtime/config table of named profiles, tools, schemas, routers, rounds, sub-councils, and input mappers. Lets data-form councils reference behaviour by string name ("my_tool") instead of module atoms, required for JSON ser/de.
  • Provider adapter: module behind a configured provider: key (:openai, :anthropic, …) that translates a normalized request into an HTTP call and parses the response. Implements CouncilEx.Provider.Adapter. OpenAI / Anthropic / Gemini / Ollama / OpenRouter ship in core.
  • Council vs ensemble: a classical ensemble = N models in parallel + flat aggregator (one round, no roles). A council adds roles, multi-round flow, cross-member visibility, iteration, chair synthesis, sub-councils, and dynamic routing. Only the Voting topology reduces to ensemble shape; the other six add structure ensembles cannot express. See docs/COUNCILS.md for the full comparison.
  • AutoCouncil: %CouncilEx.AutoCouncil{} data struct that resolves to a council at run time. Holds a :strategy (:rules, :cascade, …), a :catalog of routable councils (inline list or registry-backed), and an :on_no_match policy. From the runner's perspective it is a council. Pass it to CouncilEx.run/3 like any other. The picked council's identity surfaces in result.metadata.auto. See Auto-routing.
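To make the Aggregator entry above concrete, here is a minimal plurality count in plain Elixir. It is a sketch of the general technique, not CouncilEx's Plurality module: the module name PluralitySketch, the ballot shape (a flat list of answers), and the tie-break rule are all assumptions made for illustration.

```elixir
defmodule PluralitySketch do
  # Hypothetical helper, not part of CouncilEx.
  # Returns {winner, votes}. Ties go to the smallest term so the result
  # is deterministic.
  def winner([_ | _] = ballots) do
    ballots
    |> Enum.frequencies()
    |> Enum.sort_by(fn {option, count} -> {-count, option} end)
    |> hd()
  end
end

PluralitySketch.winner(["go", "wait", "go"])
# => {"go", 2}
```

A real aggregator would presumably read ballots out of each member's result rather than from a bare list; the reduction step is the part this sketch shows.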

Council forms

CouncilEx exposes two interchangeable ways to declare a council. Both lower to the same %CouncilEx.Spec{} and execute through the same runtime — behaviour, telemetry, and %Result{} shape are identical.

| Pick | When |
| --- | --- |
| Static (use CouncilEx) | Workflow is checked into code. Members, rounds, chair, router known at compile time. |
| Dynamic (%DynamicCouncil{}) | Workflow built at runtime, persisted to a DB as JSON, edited in a UI. |

CouncilEx.run/3 and start/3 accept either form (polymorphic dispatch), so you can switch a council from static to dynamic without touching call sites.

Static module-form

defmodule MyApp.MyCouncil do
  use CouncilEx

  default_profile CouncilEx.Profiles.OpenAIMini

  member :researcher, MyApp.Members.Researcher
  member :critic,     MyApp.Members.Critic
  round :peer_review
  chair MyApp.Members.Synthesizer, profile: CouncilEx.Profiles.OpenAIBalanced
end

Full DSL macro reference (member forms, round, chair, router, default_profile, output_schema) and prebuilt Councils.* templates (ParallelPanel, PeerReview, Voting, Specialist, Consensus, Tournament, WeightedConsensus, JuryWithRetry): docs/COUNCILS.md.

Dynamic form, registry, sub-councils, hybrid

docs/DYNAMIC_COUNCILS.md covers everything runtime-configurable:

  • Dynamic data-form: pipeable builder (add_member/2, set_chair/2, …), JSON round-trip (to_json/2 / from_json/1), inline JSON Schema output, profile_overrides, React-Flow export (to_flow_graph/1).
  • Registry: string-keyed lookup with config + runtime tiers; eight kinds (:profile, :tool, :schema, :router, :round, :sub_council, :input_mapper, :council).
  • Sub-councils: nest any council (module, %DynamicCouncil{}, or registered name) as a member; :input_mapper projects input between layers.
  • Hybrid form: static outer with dynamic sub-council, or dynamic outer referencing static modules; per-tenant flows and incremental migration.
  • Prebuilt dynamic variants: Councils.{Specialist,Consensus,Tournament,WeightedConsensus}.new_dynamic/1 return a %DynamicCouncil{}.
  • Dual-form pattern: run the same topology as static or dynamic via a COUNCIL_FORM=static|dynamic switch.

Providers

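CouncilEx supports five providers out of the box: four adapters (OpenAI, Anthropic, Gemini, OpenRouter) plus Ollama as a config preset over the OpenAI adapter. Configure once in app config; route members via the provider: opt or a Profile. The council DSL is provider-agnostic.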

| Provider atom | Env var | Notes |
| --- | --- | --- |
| :openai | OPENAI_API_KEY | Tool-calling, streaming, structured output. |
| :anthropic | ANTHROPIC_API_KEY | response_schema: and tools: are mutually exclusive per member. |
| :gemini | GEMINI_API_KEY | Native responseSchema; same mutual-exclusion as Anthropic. |
| :ollama | (none) | Config preset over the OpenAI adapter, not a separate adapter impl. |
| :openrouter | OPENROUTER_API_KEY | Thin wrapper over the OpenAI adapter; reaches any model OpenRouter routes. |

See docs/PROVIDERS.md for full config snippets, adapter quirks, multi-provider council patterns, and the CouncilEx.Provider.Adapter behaviour (7 required + 6 optional callbacks) for adding your own provider.

Profiles

A Profile bundles the capability stack (provider, model, temperature, max_tokens, tools, retry) separately from the Member's identity (role, system prompt, output schema). Nine prebaked profiles ship in CouncilEx.Profiles.*: OpenAIBalanced, OpenAIMini, OpenAICreative, OpenAIDeterministic, AnthropicBalanced, GeminiBalanced, OllamaLocal, OpenRouterAuto, OpenRouterClaudeSonnet.

Resolution order (later wins): app config default → council default_profile → member :profile opt → inline opts.
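The "later wins" rule can be pictured as a left-to-right merge of option layers. The sketch below uses plain keyword lists and Keyword.merge/2. It assumes options are merged key by key, which the text above does not spell out, and ProfileResolutionSketch is a hypothetical name rather than a CouncilEx module.

```elixir
defmodule ProfileResolutionSketch do
  # Hypothetical helper, not part of CouncilEx.
  # `layers` is ordered from lowest to highest precedence; for each key
  # the value from the latest layer that sets it wins.
  def resolve(layers) do
    Enum.reduce(layers, [], fn layer, acc -> Keyword.merge(acc, layer) end)
  end
end

app_config      = [provider: :openai, model: "gpt-4o-mini", temperature: 0.7]
council_default = [model: "gpt-4o"]
member_profile  = [temperature: 0.2]
inline_opts     = [provider: :openrouter, model: "openai/gpt-4o"]

[app_config, council_default, member_profile, inline_opts]
|> ProfileResolutionSketch.resolve()
|> Enum.sort()
|> IO.inspect()
# => [model: "openai/gpt-4o", provider: :openrouter, temperature: 0.2]
```

Here the inline opts override provider and model, the member profile overrides temperature, and nothing from the app config survives because every key was set again by a later layer.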

See docs/PROFILES.md for defining custom profiles, dynamic-form registration, profile_overrides, and the prebaked-profile capability table.

Running councils

Start a run and block:

{:ok, result} = CouncilEx.run(MyCouncil, %{question: "go or wait?"})

Start async, stream progress events, then await:

{:ok, pid} = CouncilEx.start(MyCouncil, input, subscribe: true)
run_id = CouncilEx.RunServer.run_id(pid)

receive do
  {:round_completed, ^run_id, name, _rr} -> IO.puts("round done: #{name}")
end

{:ok, result} = CouncilEx.await(pid)
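The receive above handles a single event. To follow a whole run you would typically loop until a terminal event arrives. The sketch below shows that loop with simulated messages so it runs without a provider. Only the {:round_completed, run_id, name, round_result} shape comes from the example above; the :run_completed and :run_failed tuple shapes, and the EventDrainSketch module itself, are assumptions for illustration, so check CouncilEx.Events for the real payloads.

```elixir
defmodule EventDrainSketch do
  # Hypothetical helper, not part of CouncilEx.
  # Collects round names until a terminal event arrives, or until
  # `timeout_ms` passes without a matching message.
  def collect_rounds(run_id, timeout_ms \\ 5_000, acc \\ []) do
    receive do
      {:round_completed, ^run_id, name, _round_result} ->
        collect_rounds(run_id, timeout_ms, [name | acc])

      # Assumed shape of the terminal success event.
      {:run_completed, ^run_id, _result} ->
        {:ok, Enum.reverse(acc)}

      # Assumed shape of the terminal failure event.
      {:run_failed, ^run_id, reason} ->
        {:error, reason, Enum.reverse(acc)}
    after
      timeout_ms -> {:timeout, Enum.reverse(acc)}
    end
  end
end

# Simulated events standing in for a real subscription.
run_id = "demo-run"
send(self(), {:round_completed, run_id, :independent_analysis, %{}})
send(self(), {:round_completed, run_id, :synthesis, %{}})
send(self(), {:run_completed, run_id, %{}})

EventDrainSketch.collect_rounds(run_id)
# => {:ok, [:independent_analysis, :synthesis]}
```

Pinning run_id in every pattern keeps events from other runs in the mailbox instead of mixing them into this run's timeline.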

Council topologies

Nine pre-built templates (ParallelPanel, PeerReview, Voting, Specialist, Consensus, Tournament, Chairman, WeightedConsensus, JuryWithRetry), five aggregators (Plurality, Borda, Condorcet, WeightedMean, Median), and the Iterate round wrapper for convergence loops.

WeightedConsensus ports Wu et al. Council Mode (arXiv:2604.02923): heterogeneous members aggregated by :weight / :confidence / Reliability lookup rather than equal-weight chair synthesis. Mapping in docs/COUNCIL_MODE_PAPER.md.
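A simplified way to picture weighted aggregation is a vote where each ballot carries a weight and the largest total wins. This is only a discrete-vote sketch of the idea: the actual WeightedConsensus council may instead feed weights into a synthesis step, and WeightedVoteSketch with its {answer, weight} ballot shape is hypothetical.

```elixir
defmodule WeightedVoteSketch do
  # Hypothetical helper, not part of CouncilEx.
  # Each ballot is {answer, weight}; the answer with the largest total
  # weight wins. Ties fall back to map iteration order in this sketch.
  def winner([_ | _] = ballots) do
    ballots
    |> Enum.reduce(%{}, fn {answer, weight}, totals ->
      Map.update(totals, answer, weight, &(&1 + weight))
    end)
    |> Enum.max_by(fn {_answer, total} -> total end)
  end
end

# "wait" has more ballots, but "ship" carries more total weight.
WeightedVoteSketch.winner([{"ship", 5}, {"wait", 1}, {"wait", 3}])
# => {"ship", 5}
```

The weight could come from a static :weight opt, a per-member confidence score, or a historical reliability lookup; the reduction is the same in each case.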

JuryWithRetry runs K judges and re-samples on low average confidence (default threshold 0.7, max 2 iterations). Judges don't see each other across iterations, which mitigates the conformity effect reported by Wu et al., Can LLM Agents Really Debate? (arXiv:2511.07784). The pattern is shared with Chaos-MoA-Pipeline and Adjudicator. Full multi-paper context in docs/RELATED_WORK.md.
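The confidence-triggered re-sample loop can be sketched in plain Elixir as follows. This is not the JuryWithRetry implementation: JuryRetrySketch, the judge function, and the ballot shape (a map with :verdict and :confidence) are hypothetical, and "max 2 iterations" is read here as two sampling passes in total.

```elixir
defmodule JuryRetrySketch do
  # Hypothetical helper, not part of CouncilEx.
  # `judge_fun` is a 1-arity function: input -> %{verdict: term, confidence: float}.
  def run(judge_fun, input, opts \\ []) do
    k = Keyword.get(opts, :judges, 3)
    threshold = Keyword.get(opts, :threshold, 0.7)
    max_iterations = Keyword.get(opts, :max_iterations, 2)
    sample(judge_fun, input, k, threshold, max_iterations, 1)
  end

  defp sample(judge_fun, input, k, threshold, max_iterations, iteration) do
    # Every judge receives only the original input, never another judge's
    # ballot or the ballots of an earlier iteration.
    ballots =
      1..k
      |> Task.async_stream(fn _ -> judge_fun.(input) end, ordered: false)
      |> Enum.map(fn {:ok, ballot} -> ballot end)

    avg = Enum.sum(Enum.map(ballots, & &1.confidence)) / k
    result = %{ballots: ballots, avg_confidence: avg, iterations: iteration}

    cond do
      avg >= threshold -> {:ok, result}
      iteration < max_iterations -> sample(judge_fun, input, k, threshold, max_iterations, iteration + 1)
      true -> {:low_confidence, result}
    end
  end
end

confident = fn _input -> %{verdict: :approve, confidence: 0.9} end
doubtful = fn _input -> %{verdict: :approve, confidence: 0.5} end

# A confident jury stops after the first pass.
{:ok, %{iterations: 1}} = JuryRetrySketch.run(confident, "claim to judge")

# A doubtful jury is re-sampled once, then reported as low confidence.
{:low_confidence, %{iterations: 2}} = JuryRetrySketch.run(doubtful, "claim to judge")
```

Because each pass discards the previous ballots instead of showing them to the judges, a retry is a fresh sample rather than a round of debate.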

council =
  CouncilEx.Councils.Specialist.new(
    as: MyApp.MyCouncil,
    members: [
      {:seo, MyApp.Members.Seo, [provider: :openai, model: "gpt-4o-mini"]},
      {:tech, MyApp.Members.Tech, [provider: :openai, model: "gpt-4o-mini"]}
    ],
    chair: {MyApp.Members.Synth, [provider: :openai, model: "gpt-4o"]}
  )

{:ok, result} = CouncilEx.run(council, %{topic: "..."})

See docs/COUNCILS.md for the full topology table, aggregator catalog, iteration semantics, and RoundResult.metadata.history shape.

Per-member capabilities

CouncilEx members support structured outputs, streaming, and tool calling independently of one another. Full details — every default, Anthropic-specific behaviour, and PubSub event payloads — are in docs/PER_MEMBER_CAPABILITIES.md.

Structured outputs — set output_schema on a member to an Ecto embedded schema. CouncilEx.Providers.Instructor casts the LLM's JSON into that schema and runs the schema's optional validate_changeset/2; the member module's validate/1 then runs for business rules. On Anthropic, CouncilEx forces a synthetic _respond tool whose input_schema mirrors your Ecto schema; structured-output and user tools: are mutually exclusive on the same member.

Streaming — add stream true to a member. During streaming the adapter reassembles Anthropic partial_json SSE fragments; subscribers receive :member_token PubSub events carrying %CouncilEx.StreamChunk{content, index, finish_reason}. The [:council_ex, :member, :stream_chunk] telemetry event fires per chunk.

Tools — a tool implements CouncilEx.Tool (four callbacks: name/0, description/0, parameters_schema/0, execute/1). The dispatcher runs a bounded tool-call loop (default max_tool_iterations: 5); exceptions are caught by safe_execute/2 and surfaced as {:tool_raised, exception}. Multiple tool calls in one turn run in parallel by default (parallel_tools: true, strategy :collect, tool_concurrency_factor: 1.0, tool_timeout_ms: 30_000). CouncilEx.Providers.Instructor.stream/3 drives the same loop across streaming round-trips; subscribe for :tool_call_request / :tool_call_result events (the synthetic _respond tool is excluded).
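For orientation, a tool with the four callbacks named above might look like the sketch below. The callback names come from the text; everything else is an assumption made for illustration: the WordCountToolSketch module, the string-keyed argument map, and the {:ok, result} | {:error, reason} return shape of execute/1. The @behaviour line is commented out so the module compiles without the library.

```elixir
defmodule WordCountToolSketch do
  # In a project that depends on council_ex you would add:
  #   @behaviour CouncilEx.Tool
  # It is left out here so the sketch compiles standalone.

  def name, do: "word_count"

  def description, do: "Counts the words in a piece of text."

  # JSON Schema describing the arguments the model must supply.
  def parameters_schema do
    %{
      "type" => "object",
      "properties" => %{"text" => %{"type" => "string"}},
      "required" => ["text"]
    }
  end

  # Assumed contract: string-keyed args in, tagged tuple out.
  def execute(%{"text" => text}) when is_binary(text) do
    {:ok, %{"words" => text |> String.split() |> length()}}
  end

  def execute(_args), do: {:error, :invalid_arguments}
end

WordCountToolSketch.execute(%{"text" => "go or wait?"})
# => {:ok, %{"words" => 3}}
```

Returning an error tuple for malformed arguments, rather than raising, keeps the failure in the tool's own result instead of relying on the dispatcher's exception handling.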

Composition

Two ways to scale a council beyond a flat member list: nest a council inside another (sub-councils, including dynamic %DynamicCouncil{} forms with input_mapper), and gate which members participate per round (adaptive routers — council-level or per-round override). Excluded members land in RoundResult with status: :skipped.

See docs/COMPOSITION.md for the full sub-council and router surface (sub-run event topics, :sub_run_id / :sub_result metadata, dynamic-form router registration, :skipped semantics, and a runnable two-level example with mixed providers).

Auto-routing with AutoCouncil

CouncilEx.AutoCouncil is an opt-in routing layer for callers that don't know up-front which council fits a given prompt. Pass it to CouncilEx.run/3 like any other council — internally a strategy picks from a catalog, executes the winning council, and records the decision in result.metadata.auto:

auto = CouncilEx.AutoCouncil.new(
  strategy: :rules,
  catalog:  [
    %{id: "seo",  council: MyApp.Councils.SEO,        match: ~r/seo|sitemap/i},
    %{id: "code", council: MyApp.Councils.CodeReview, match: ~r/code|PR/i}
  ]
)

{:ok, result} = CouncilEx.run(auto, %{question: "audit my SEO"})
result.metadata.auto
# => %{strategy: :rules, kind: :static, catalog_id: "seo",
#      reason: "matched ~r/seo|sitemap/i", score: nil, latency_ms: 1}

  • Strategies: :rules (regex/fun, zero cost), :cascade (chain cheap→expensive), :embedding / :llm_classify / :llm_build (stubs, return {:error, :not_implemented}), or {MyModule, opts} for custom.
  • Catalog: inline list or {:registry, :council} for hot-reloadable shared routing. provider_check: true drops entries whose providers aren't configured.
  • Fallback: on_no_match: :error (default), {:fallback, MyCouncil}, or {:fallback, "registered_id"}.
  • Shortcut: CouncilEx.auto/1,2 uses :council_ex, :auto app config as default; per-call opts override it and :verbose/:await_timeout forward to run/3.

Full reference — Strategy behaviour, custom-strategy recipe, decision-shape contract, telemetry events (:decision, :cascade_step, :catalog_filtered), composability — in docs/AUTO_COUNCILS.md.
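The :rules strategy described above amounts to "first matching catalog entry wins, otherwise apply the no-match policy". The sketch below shows that selection step in plain Elixir. RulesRoutingSketch, the returned map, and the {:error, :no_match} value are hypothetical; the catalog entry keys (:id, :council, :match) mirror the example above, with placeholder atoms standing in for council modules.

```elixir
defmodule RulesRoutingSketch do
  # Hypothetical helper, not part of CouncilEx.
  # Picks the first catalog entry whose :match (a regex or a 1-arity
  # function) accepts the prompt; otherwise applies `on_no_match`.
  def pick(catalog, prompt, on_no_match \\ :error) do
    case Enum.find(catalog, fn entry -> matches?(entry.match, prompt) end) do
      %{id: id, council: council} -> {:ok, %{catalog_id: id, council: council}}
      nil -> no_match(on_no_match)
    end
  end

  defp matches?(%Regex{} = regex, prompt), do: Regex.match?(regex, prompt)
  defp matches?(fun, prompt) when is_function(fun, 1), do: fun.(prompt)

  defp no_match(:error), do: {:error, :no_match}
  defp no_match({:fallback, council}), do: {:ok, %{catalog_id: nil, council: council}}
end

catalog = [
  %{id: "seo", council: :seo_council, match: ~r/seo|sitemap/i},
  %{id: "code", council: :code_review_council, match: ~r/code|PR/i}
]

RulesRoutingSketch.pick(catalog, "audit my SEO")
# => {:ok, %{catalog_id: "seo", council: :seo_council}}

RulesRoutingSketch.pick(catalog, "hello there", {:fallback, :default_council})
# => {:ok, %{catalog_id: nil, council: :default_council}}
```

Catalog order matters in this sketch: a prompt that matches several entries is routed to the first one listed.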

Observability

Ten events fire on topic "council_ex:run:#{run_id}": :run_started, :round_started, :member_started, :member_token, :tool_call_request, :tool_call_result, :member_completed, :round_completed, :run_completed, :run_failed (documented in CouncilEx.Events).

  • Phoenix.PubSub adapter: route events through your own server with config :council_ex, pubsub: {CouncilEx.PubSub.Phoenix, name: MyApp.PubSub}. CouncilEx never starts a PubSub server itself.
  • Telemetry logger: CouncilEx.Telemetry.attach_default_logger/0,1 attaches Logger handlers (:events subset opt; :exception always logs at :warning; re-attach is idempotent); detach_default_logger/0 removes them.
  • Verbose mode: verbose: true | :debug prints a per-run timeline to stdout (zero cost when off; verbose_io: to redirect).

Full reference: docs/OBSERVABILITY.md. Topology diagrams: docs/DIAGRAMS.md.

Introspection: inspect a council's structure as data at runtime (Mod.__council__/0 → %Spec{}, __providers__/0), export it as a node/edge graph for a UI (CouncilEx.Diagram.to_ir/1, both forms), or query a live run (CouncilEx.RunServer.state/1, list_active_runs/0). See docs/INTROSPECTION.md.

Testing

import CouncilEx.Test for three helpers: script_council/2 (script Mock responses for every member of a council — or nested sub-council — in one call), capture_events/2 (drain a run's PubSub topic until the terminal event or timeout), and assert_round_completed/3 (block on :round_completed and return the %RoundResult{}). The Mock provider is CouncilEx.Providers.Mock (tests and fixtures only; never production code).

See docs/UNIT_TESTING.md for the full helper reference, streaming scripts, and state inspection; docs/TESTING.md for live-provider and manual testing.

Deployment Considerations

A single :mode config knob picks the deployment shape: :single_node (default, no config needed) uses an ETS-backed Registry, a Null reliability store, and no Recorder; :multi_node flips all three to their *.Ecto defaults and autowires Recorder.Ecto into every CouncilEx.start/3 call. Per-key overrides (:reliability_store, :registry_backend, :recorder) always win over the mode default, so mixing backends (e.g. Reliability.Redis + Registry.Ecto + Recorder.Ecto) is one line each.

See docs/PERSISTENCE.md for the module map, migration setup, Redis backends, Oban durable retries, and the deployment topology matrix.

Roadmap & changelog

Capabilities

Topic-tagged highlights of what ships in the 0.1.0 release. See CHANGELOG.md for the full release notes.

  • Paper-replication slate: Councils.WeightedConsensus + Rounds.WeightedSynthesis (Wu et al. Council Mode port, arXiv:2604.02923); per-member :confidence field on %MemberResult{} with :self_report and :logprob strategies; CouncilEx.BiasDetector diagnostic round (lexicon backend); CouncilEx.Reliability store (Null + ETS + Ecto/Postgres + Redis backends); Councils.JuryWithRetry with confidence-triggered re-sample (Chaos-MoA / Adjudicator pattern, Wu et al. Can LLM Agents Really Debate? (arXiv:2511.07784) conformity mitigation); bench/eval/ skeleton harness for TruthfulQA / HaluEval / BBQ; :expose_confidence opt on WeightedSynthesis. Mapping: docs/COUNCIL_MODE_PAPER.md + docs/RELATED_WORK.md.
  • Dynamic councils: build / edit / validate / serialise data-form councils with sub-council composition (registered name / module / nested struct), polymorphic run/3 + start/3 dispatch, full run/round telemetry parity on the async path. Profile DSL + 9 prebaked profiles, per-run verbose: opt, OpenRouter adapter, diagram tooling (CouncilEx.Diagram), real-key-only examples, Gemini schema sanitization. :tool_choice member opt, atom-exhaustion DoS fix on JSON ser/de, idempotent :pg PubSub subscribe, cold-load tool-call adapter probe.
  • Providers: stock OpenAI / Anthropic / Gemini / Ollama / OpenRouter adapters; pluggable Provider.Adapter behaviour; frozen CouncilEx.Events PubSub surface; :member_completed carries full %MemberResult{}.
  • Tool calling: stream tool-loop in CouncilEx.Providers.Instructor.stream/3; parallel tool execution; tool-call PubSub events; Tournament Bracket round; Anthropic structured output via the tool-use API.
  • GenServer-aligned run lifecycle: caller-owned pids via run/3, start/3, start_link/3; opt-in CouncilEx.Supervisor for tenant isolation; no auto-started supervisor (you own the pids).
  • Persistence: optional *.Ecto backends for Reliability, Registry, Recorder, plus *.Redis for Reliability and Registry (Recorder is Ecto-only); CouncilEx.Config :mode knob (:single_node / :multi_node) flips all backends in one place.

Planned

  • Nice-to-have, unscheduled: chained multi-step tool loops where one tool call feeds the next within a single member turn (gap #11 in docs/FUTURE_EXAMPLES.md); ranking-parser regex fallback for cheap models (karpathy pattern); Fairness.parity/2 metric helper (cultural_debate); persona-counterweight presets; LLM-judge / embedding-cluster backends for BiasDetector; logical-validity-aware aggregator (Wu 2025); deterministic pre-injection RAG (docs/future/RAG_PRE_INJECTION.md). Tracked in docs/RELATED_WORK.md.
  • Out of scope for this repo: durable run history, durable execution, and a LiveView dashboard. Build them in your host app against the frozen CouncilEx.Events PubSub surface and Diagram.to_ir/1.

License

Apache-2.0. See LICENSE.

Built by Humberto Aquino · Brewing Elixir.

Acknowledgements

Special thanks to Andrej Karpathy, whose karpathy/llm-council sparked the initial idea behind this project. His "models review each other before a final synthesis" experiment is what we set out to bring to Elixir as a reusable framework. See docs/TUTORIAL_KARPATHY_COUNCIL.md for the Elixir port.

References

  • Wu, S., Li, X., Feng, Y., Li, Y., Wang, Z., & Wang, R. (2026). Council Mode: A Heterogeneous Multi-Agent Consensus Framework for Reducing LLM Hallucination and Bias. arXiv:2604.02923. Implemented as Councils.WeightedConsensus, per-member confidence (%MemberResult{}.confidence), BiasDetector (diagnostic), and Reliability store. Full mapping in docs/COUNCIL_MODE_PAPER.md.

For broader context on multi-agent LLM papers and projects (MAD, Adjudicator, karpathy/llm-council, Chaos-MoA-Pipeline, culturaldebate, etc.) and how each maps onto CouncilEx, see docs/RELATED_WORK.md. The Wu et al. Can LLM Agents Really Debate? (arXiv:2511.07784) finding on conformity-under-visible-majority motivated Councils.JuryWithRetry's "judges don't see each other across iterations" design.