End-to-end guide for adding durable run history, multi-replica backends, and durable-execution job wrapping to a CouncilEx app.

By default :council_ex ships in-memory only — runs live in a GenServer, Reliability/Registry use ETS, no DB tables. That's enough for single-node prototyping, REPL work, and tests. Production with multiple replicas, durable run history, or cross-node tool sharing needs the persistence layer.

The persistence layer is opt-in via deps. Add the deps you need and the matching modules light up. Skip them and the library stays lean.

What's in this doc

  • When to add persistence
  • mode: :single_node vs mode: :multi_node (the two shapes)
  • Single dep flip — Ecto repo (Postgres or SQLite)
  • Module map (which behaviour each backend implements)
  • End-to-end setup: schemas + migrations + config
  • Optional: Redis backends
  • Optional: Oban worker for durable retries

Mode-driven configuration

CouncilEx exposes a single :mode config key that flips every persistence-related default at once:

# Single node (default; no config needed):
# - Registry backend = ETS
# - Reliability store = Null (no-op)
# - Recorder = none

# Multi-node (one knob flips everything):
config :council_ex,
  mode: :multi_node,
  repo: MyApp.Repo,
  pubsub: {CouncilEx.PubSub.Phoenix, name: MyApp.PubSub}
# Under :multi_node:
# - Registry backend = CouncilEx.Registry.Ecto
# - Reliability store = CouncilEx.Reliability.Ecto
# - Recorder = CouncilEx.Recorder.Ecto (autowired into every start/3)

Mode just sets defaults — per-key overrides win, so you can mix and match:

config :council_ex,
  mode: :multi_node,
  repo: MyApp.Repo,
  reliability_store: CouncilEx.Reliability.Redis,   # override one backend
  recorder: nil                                      # opt out of autowired recorder

:repo is resolved per *.Ecto module: shared config :council_ex, repo: MyApp.Repo covers everything, and a per-module override (config :council_ex, CouncilEx.Reliability.Ecto, repo: OtherRepo) wins where set. Boot emits a Logger.warning/1 if :multi_node is selected but no repo is resolvable for an *.Ecto backend that's still active.

When to add persistence

NeedReach for
LiveView rejoin after node hop with full run historyRecorder.Ecto + Persistence.Schema.{Run, RunEvent} + Persistence.Recovery
Tool / input-mapper registrations visible across replicasRegistry.Ecto (or .Redis)
Per-member reliability scores accumulated across replicasReliability.Ecto (or .Redis)
Durable retries surviving node deathWorkers.Oban
Single-node, in-memory onlyNone — defaults are fine

SQL portability

Backends use plain Ecto.Repo.query/2 with portable SQL — INSERT ... ON CONFLICT (col) DO UPDATE SET v = EXCLUDED.v works on Postgres and SQLite (3.24+). The module names are *.Ecto, not *.Postgres, because the underlying engine is whatever your Ecto.Repo adapter is.

Tested with :postgrex. SQLite via :ecto_sqlite3 should work for Reliability + Registry; the Recorder.Ecto schemas use :bigint, :text, and :jsonb types — JSONB is Postgres-only. SQLite users needing Recorder.Ecto would have to fork the schemas to use :text plus app-side encoding. Reliability + Registry have no such constraint.

Single dep flip

# mix.exs
def deps do
  [
    {:council_ex, "~> 0.12"},
    {:postgrex, "~> 0.20"},     # or {:ecto_sqlite3, "~> 0.18"} for SQLite
    {:ecto_sql, "~> 3.13"}
  ]
end

That's it for the dep side. Everything below is config + migrations.

Module map

Behaviour:single_node default:multi_node defaultOther backends
CouncilEx.Reliability (store)CouncilEx.Reliability.Null (no-op)CouncilEx.Reliability.EctoCouncilEx.Reliability.ETS, .Redis
CouncilEx.Registry.BackendCouncilEx.Registry.ETSCouncilEx.Registry.EctoCouncilEx.Registry.Redis
CouncilEx.Recordernone (events ephemeral)CouncilEx.Recorder.Ecto

Pick per-feature. You can mix freely — e.g. Reliability.Redis (low latency hot path) + Registry.Ecto (audit-friendly history) + Recorder.Ecto (run history) on one repo.

End-to-end setup

1. Add deps

{:postgrex, "~> 0.20"},
{:ecto_sql, "~> 3.13"}

2. Configure your repo

If you already have an Ecto.Repo for your app, reuse it. CouncilEx doesn't need its own.

# config/config.exs
config :my_app, MyApp.Repo,
  database: "my_app_dev",
  username: "postgres",
  hostname: "localhost"

config :my_app, ecto_repos: [MyApp.Repo]

3. Wire CouncilEx to use the Ecto-backed modules

One mode flip + a shared repo:

# config/runtime.exs
config :council_ex,
  mode: :multi_node,
  repo: MyApp.Repo

That's it. :multi_node switches :reliability_store, :registry_backend, and :recorder defaults to their *.Ecto modules. The shared :repo flows to all three via CouncilEx.Config.repo!/1.

Per-module overrides are still honoured:

# Optional: a different repo for one backend.
config :council_ex, CouncilEx.Reliability.Ecto,
  repo: MyApp.AnalyticsRepo

4. Generate + run the migration

defmodule MyApp.Repo.Migrations.AddCouncilEx do
  use Ecto.Migration

  def up do
    CouncilEx.Persistence.Migration.up()
  end

  def down do
    CouncilEx.Persistence.Migration.down()
  end
end

CouncilEx.Persistence.Migration.up/0 creates all required tables:

  • council_ex_runs — Recorder run rows
  • council_ex_run_events — Recorder event log
  • council_ex_reliability — Reliability counters
  • council_ex_registry — Registry blobs

Then:

mix ecto.migrate

5. Recorder is autowired under :multi_node

With mode: :multi_node + shared :repo, every CouncilEx.start/3 call automatically attaches CouncilEx.Recorder.Ecto — no per-call opt needed. The recorder spawns alongside the RunServer, subscribes to the run's PubSub topic, and persists :run_started, every :round_started, :member_completed, and :run_completed (or :run_failed).

Opt out for a specific run by passing recorder: nil:

{:ok, pid} = CouncilEx.start(MyCouncil, %{question: q}, recorder: nil)

Or pass an explicit recorder spec to override args:

{:ok, pid} = CouncilEx.start(MyCouncil, %{question: q},
  recorder: {CouncilEx.Recorder.Ecto, %{repo: MyApp.AnalyticsRepo}}
)

Under :single_node (default) no recorder is attached unless you pass :recorder per call, matching the in-memory model.

6. Optional: boot-time recovery

If a node crashed mid-run, the council_ex_runs row is left in :running state. Reconcile on app boot:

defmodule MyApp.Application do
  def start(_, _) do
    children = [
      MyApp.Repo,
      # ... your app's children ...
      {Task, fn ->
        CouncilEx.Persistence.Recovery.reconcile_orphans(repo: MyApp.Repo)
      end}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

Marks orphans as :failed with a recovery error so they're not mistaken for active runs.

7. Optional: Redis backends instead

Per-key overrides win over the :multi_node defaults — set just the backend you want to swap:

# Add :redix
def deps do
  [{:redix, "~> 1.5"}]
end

# Start Redix in your supervision tree
children = [
  {Redix, name: :council_ex_redix, host: "localhost"}
]

# Mode keeps the other two on Ecto; reliability moves to Redis.
config :council_ex,
  mode: :multi_node,
  repo: MyApp.Repo,
  reliability_store: CouncilEx.Reliability.Redis

config :council_ex, CouncilEx.Reliability.Redis,
  conn: :council_ex_redix,
  prefix: "council_ex:reliability"

You can run mixed: Reliability.Redis + Registry.Ecto + Recorder.Ecto.

8. Optional: Oban worker for durable retries

CouncilEx.Workers.Oban wraps a council run in an Oban job — durable across deploys, retry-safe, with the runner linked to the worker so job timeout kills the run.

def deps do
  [{:oban, "~> 2.19"}]
end

# Enqueue:
%{council: "MyApp.Councils.Advisor", input: %{q: "..."}}
|> CouncilEx.Workers.Oban.new()
|> Oban.insert()

Multi-replica deployments

Replica scenarioSolution
Tool registrations on node A invisible on node BSwitch to Registry.Ecto or .Redis
Reliability scores accumulated on node A invisible on node BSwitch to Reliability.Ecto or .Redis
LiveView reconnects to different node mid-runRecorder.Ecto + hydrate from Persistence.Schema.Run row
Run survives node deathWorkers.Oban retries job on a live node

Telemetry-driven persistence (app-owned schema)

Recorder.Ecto is one option. The other — and often the right one if your app already has analytics tables — is telemetry handlers that write directly to your own schema. No Recorder involved, no second source of truth.

Use this when:

  • You already maintain a council_runs / council_member_results schema (or similar) and don't want a parallel run_events table.
  • Your downstream analytics ETL already reads from that schema.
  • You want per-row cost / token accounting and your pricing model lives in your app, not in CouncilEx.

Event contract

Subscribe to [:council_ex, :member, :stop] and you have everything needed to insert one complete row per member call:

Measurements

KeyTypeNotes
durationinteger (native units)Wall clock
input_tokensnon_neg_integerFrom Response.usage.input_tokens
output_tokensnon_neg_integerFrom Response.usage.output_tokens

Metadata

KeyTypeNotes
run_idString.t
member_idString.t (or atom for static members)
member_modulemodule
round_nameatomE.g. :peer_review
round_idxnon_neg_integerPosition in the round timeline
modelString.t | nilProvider-reported model id when available
provideratom | nilProvider id from member opts
status:ok | :error | :timeout | :skipped | :invalid_output | :eliminated
attemptspos_integerRetry attempts used
confidencefloat | nilWhen a confidence strategy is configured

cost_usd is intentionally NOT emitted. Compute it in your handler from input_tokens, output_tokens, and your pricing table — pricing changes often, and CouncilEx doesn't want to ship a stale model.

Skeleton handler

defmodule MyApp.Council.TelemetryRecorder do
  @events [
    [:council_ex, :run, :start],
    [:council_ex, :run, :stop],
    [:council_ex, :member, :stop],
    [:council_ex, :member, :exception]
  ]

  def attach do
    :telemetry.attach_many(
      "myapp-council-recorder",
      @events,
      &__MODULE__.handle/4,
      %{}
    )
  end

  def handle([:council_ex, :member, :stop], meas, meta, _) do
    MyApp.Repo.insert!(%MyApp.Council.MemberResult{
      run_id:        meta.run_id,
      member_id:     to_string(meta.member_id),
      round_name:    Atom.to_string(meta.round_name),
      round_idx:     meta.round_idx,
      model:         meta.model,
      provider:      meta.provider && Atom.to_string(meta.provider),
      status:        Atom.to_string(meta.status),
      attempts:      meta.attempts,
      confidence:    meta.confidence,
      input_tokens:  meas.input_tokens,
      output_tokens: meas.output_tokens,
      duration_ms:   System.convert_time_unit(meas.duration, :native, :millisecond),
      cost_usd:      MyApp.Pricing.cost(meta.model, meas.input_tokens, meas.output_tokens)
    })
  end

  # … run start/stop handlers populate the parent run row …
end

Attach from your application supervisor:

MyApp.Council.TelemetryRecorder.attach()

When NOT to use this approach

  • You don't already have an analytics schema and want a turnkey history table → use Recorder.Ecto.
  • You need run history available to other systems that already query run_events → use Recorder.Ecto.

You can also run both, but expect to reconcile the two tables.

Topology matrix

Which mode and extras to reach for at each deployment scale:

TopologySync run/3Async start/3Recommended :modeCaveats / host-app responsibilities
Single-nodeyesyes:single_node (default)Run history is in-memory; flip to :multi_node + Recorder.Ecto for durability.
Distributed Erlang clusteryesyes (node-pinned):single_node or :multi_nodePin awaiters to the run's node; for failover, layer your own job runner (Oban / Broadway / queue) over start/3 + await/2.
Replicated BEAM (no cluster, shared DB)yesno cross-replica:multi_nodeRecorder.Ecto + Workers.Oban for cross-replica resumption.
Ephemeral BEAM (autoscaled / scale-to-zero)yes (short runs)same-machine only:multi_nodeLong runs need an external job runner that survives instance termination.

What does NOT need persistence

  • The RunServer itself stays in-memory. A run executing on node A can't be migrated to node B mid-flight (PubSub events only reach the origin node's subscribers). Use Recorder.Ecto + Oban for cross-node durability.
  • PubSub topic distribution across nodes uses :pg (default) or Phoenix.PubSub — not Postgres. Configure separately.

Troubleshooting

  • UndefinedFunctionError: Postgrex.Query.parameters/0 — missing :postgrex dep. Add it.
  • UndefinedFunctionError: Redix.command/2 — missing :redix dep. Add it.
  • Recorder.Ecto writes nothing — confirm :recorder opt was passed to CouncilEx.start/3 for that specific run.
  • Recovery never marks orphans — check the row's :status in DB; if it's already :completed/:failed, it's not an orphan.