End-to-end guide for adding durable run history, multi-replica backends, and durable-execution job wrapping to a CouncilEx app.
By default :council_ex ships in-memory only — runs live in a
GenServer, Reliability/Registry use ETS, no DB tables. That's enough
for single-node prototyping, REPL work, and tests. Production with
multiple replicas, durable run history, or cross-node tool sharing
needs the persistence layer.
The persistence layer is opt-in via deps. Add the deps you need and the matching modules light up. Skip them and the library stays lean.
What's in this doc
- When to add persistence
mode: :single_nodevsmode: :multi_node(the two shapes)- Single dep flip — Ecto repo (Postgres or SQLite)
- Module map (which behaviour each backend implements)
- End-to-end setup: schemas + migrations + config
- Optional: Redis backends
- Optional: Oban worker for durable retries
Mode-driven configuration
CouncilEx exposes a single :mode config key that flips every
persistence-related default at once:
# Single node (default; no config needed):
# - Registry backend = ETS
# - Reliability store = Null (no-op)
# - Recorder = none
# Multi-node (one knob flips everything):
config :council_ex,
mode: :multi_node,
repo: MyApp.Repo,
pubsub: {CouncilEx.PubSub.Phoenix, name: MyApp.PubSub}
# Under :multi_node:
# - Registry backend = CouncilEx.Registry.Ecto
# - Reliability store = CouncilEx.Reliability.Ecto
# - Recorder = CouncilEx.Recorder.Ecto (autowired into every start/3)Mode just sets defaults — per-key overrides win, so you can mix and match:
config :council_ex,
mode: :multi_node,
repo: MyApp.Repo,
reliability_store: CouncilEx.Reliability.Redis, # override one backend
recorder: nil # opt out of autowired recorder:repo is resolved per *.Ecto module: shared
config :council_ex, repo: MyApp.Repo covers everything, and a
per-module override (config :council_ex, CouncilEx.Reliability.Ecto, repo: OtherRepo) wins where set. Boot emits a Logger.warning/1 if
:multi_node is selected but no repo is resolvable for an *.Ecto
backend that's still active.
When to add persistence
| Need | Reach for |
|---|---|
| LiveView rejoin after node hop with full run history | Recorder.Ecto + Persistence.Schema.{Run, RunEvent} + Persistence.Recovery |
| Tool / input-mapper registrations visible across replicas | Registry.Ecto (or .Redis) |
| Per-member reliability scores accumulated across replicas | Reliability.Ecto (or .Redis) |
| Durable retries surviving node death | Workers.Oban |
| Single-node, in-memory only | None — defaults are fine |
SQL portability
Backends use plain Ecto.Repo.query/2 with portable SQL — INSERT ... ON CONFLICT (col) DO UPDATE SET v = EXCLUDED.v works on Postgres
and SQLite (3.24+). The module names are *.Ecto, not *.Postgres,
because the underlying engine is whatever your Ecto.Repo adapter is.
Tested with :postgrex. SQLite via :ecto_sqlite3 should work for
Reliability + Registry; the Recorder.Ecto schemas use :bigint,
:text, and :jsonb types — JSONB is Postgres-only. SQLite users
needing Recorder.Ecto would have to fork the schemas to use :text
plus app-side encoding. Reliability + Registry have no such constraint.
Single dep flip
# mix.exs
def deps do
[
{:council_ex, "~> 0.12"},
{:postgrex, "~> 0.20"}, # or {:ecto_sqlite3, "~> 0.18"} for SQLite
{:ecto_sql, "~> 3.13"}
]
endThat's it for the dep side. Everything below is config + migrations.
Module map
| Behaviour | :single_node default | :multi_node default | Other backends |
|---|---|---|---|
CouncilEx.Reliability (store) | CouncilEx.Reliability.Null (no-op) | CouncilEx.Reliability.Ecto | CouncilEx.Reliability.ETS, .Redis |
CouncilEx.Registry.Backend | CouncilEx.Registry.ETS | CouncilEx.Registry.Ecto | CouncilEx.Registry.Redis |
CouncilEx.Recorder | none (events ephemeral) | CouncilEx.Recorder.Ecto | — |
Pick per-feature. You can mix freely — e.g. Reliability.Redis (low
latency hot path) + Registry.Ecto (audit-friendly history) +
Recorder.Ecto (run history) on one repo.
End-to-end setup
1. Add deps
{:postgrex, "~> 0.20"},
{:ecto_sql, "~> 3.13"}2. Configure your repo
If you already have an Ecto.Repo for your app, reuse it. CouncilEx
doesn't need its own.
# config/config.exs
config :my_app, MyApp.Repo,
database: "my_app_dev",
username: "postgres",
hostname: "localhost"
config :my_app, ecto_repos: [MyApp.Repo]3. Wire CouncilEx to use the Ecto-backed modules
One mode flip + a shared repo:
# config/runtime.exs
config :council_ex,
mode: :multi_node,
repo: MyApp.RepoThat's it. :multi_node switches :reliability_store,
:registry_backend, and :recorder defaults to their *.Ecto
modules. The shared :repo flows to all three via
CouncilEx.Config.repo!/1.
Per-module overrides are still honoured:
# Optional: a different repo for one backend.
config :council_ex, CouncilEx.Reliability.Ecto,
repo: MyApp.AnalyticsRepo4. Generate + run the migration
defmodule MyApp.Repo.Migrations.AddCouncilEx do
use Ecto.Migration
def up do
CouncilEx.Persistence.Migration.up()
end
def down do
CouncilEx.Persistence.Migration.down()
end
endCouncilEx.Persistence.Migration.up/0 creates all required tables:
council_ex_runs— Recorder run rowscouncil_ex_run_events— Recorder event logcouncil_ex_reliability— Reliability counterscouncil_ex_registry— Registry blobs
Then:
mix ecto.migrate
5. Recorder is autowired under :multi_node
With mode: :multi_node + shared :repo, every CouncilEx.start/3
call automatically attaches CouncilEx.Recorder.Ecto — no per-call
opt needed. The recorder spawns alongside the RunServer, subscribes
to the run's PubSub topic, and persists :run_started, every
:round_started, :member_completed, and :run_completed (or
:run_failed).
Opt out for a specific run by passing recorder: nil:
{:ok, pid} = CouncilEx.start(MyCouncil, %{question: q}, recorder: nil)Or pass an explicit recorder spec to override args:
{:ok, pid} = CouncilEx.start(MyCouncil, %{question: q},
recorder: {CouncilEx.Recorder.Ecto, %{repo: MyApp.AnalyticsRepo}}
)Under :single_node (default) no recorder is attached unless you
pass :recorder per call, matching the in-memory model.
6. Optional: boot-time recovery
If a node crashed mid-run, the council_ex_runs row is left in
:running state. Reconcile on app boot:
defmodule MyApp.Application do
def start(_, _) do
children = [
MyApp.Repo,
# ... your app's children ...
{Task, fn ->
CouncilEx.Persistence.Recovery.reconcile_orphans(repo: MyApp.Repo)
end}
]
Supervisor.start_link(children, strategy: :one_for_one)
end
endMarks orphans as :failed with a recovery error so they're not
mistaken for active runs.
7. Optional: Redis backends instead
Per-key overrides win over the :multi_node defaults — set just the
backend you want to swap:
# Add :redix
def deps do
[{:redix, "~> 1.5"}]
end
# Start Redix in your supervision tree
children = [
{Redix, name: :council_ex_redix, host: "localhost"}
]
# Mode keeps the other two on Ecto; reliability moves to Redis.
config :council_ex,
mode: :multi_node,
repo: MyApp.Repo,
reliability_store: CouncilEx.Reliability.Redis
config :council_ex, CouncilEx.Reliability.Redis,
conn: :council_ex_redix,
prefix: "council_ex:reliability"You can run mixed: Reliability.Redis + Registry.Ecto + Recorder.Ecto.
8. Optional: Oban worker for durable retries
CouncilEx.Workers.Oban wraps a council run in an Oban job — durable
across deploys, retry-safe, with the runner linked to the worker so
job timeout kills the run.
def deps do
[{:oban, "~> 2.19"}]
end
# Enqueue:
%{council: "MyApp.Councils.Advisor", input: %{q: "..."}}
|> CouncilEx.Workers.Oban.new()
|> Oban.insert()Multi-replica deployments
| Replica scenario | Solution |
|---|---|
| Tool registrations on node A invisible on node B | Switch to Registry.Ecto or .Redis |
| Reliability scores accumulated on node A invisible on node B | Switch to Reliability.Ecto or .Redis |
| LiveView reconnects to different node mid-run | Recorder.Ecto + hydrate from Persistence.Schema.Run row |
| Run survives node death | Workers.Oban retries job on a live node |
Telemetry-driven persistence (app-owned schema)
Recorder.Ecto is one option. The other — and often the right one if
your app already has analytics tables — is telemetry handlers that
write directly to your own schema. No Recorder involved, no second
source of truth.
Use this when:
- You already maintain a
council_runs/council_member_resultsschema (or similar) and don't want a parallelrun_eventstable. - Your downstream analytics ETL already reads from that schema.
- You want per-row cost / token accounting and your pricing model lives in your app, not in CouncilEx.
Event contract
Subscribe to [:council_ex, :member, :stop] and you have everything
needed to insert one complete row per member call:
Measurements
| Key | Type | Notes |
|---|---|---|
duration | integer (native units) | Wall clock |
input_tokens | non_neg_integer | From Response.usage.input_tokens |
output_tokens | non_neg_integer | From Response.usage.output_tokens |
Metadata
| Key | Type | Notes |
|---|---|---|
run_id | String.t | |
member_id | String.t (or atom for static members) | |
member_module | module | |
round_name | atom | E.g. :peer_review |
round_idx | non_neg_integer | Position in the round timeline |
model | String.t | nil | Provider-reported model id when available |
provider | atom | nil | Provider id from member opts |
status | :ok | :error | :timeout | :skipped | :invalid_output | :eliminated | |
attempts | pos_integer | Retry attempts used |
confidence | float | nil | When a confidence strategy is configured |
cost_usd is intentionally NOT emitted. Compute it in your handler from
input_tokens, output_tokens, and your pricing table — pricing
changes often, and CouncilEx doesn't want to ship a stale model.
Skeleton handler
defmodule MyApp.Council.TelemetryRecorder do
@events [
[:council_ex, :run, :start],
[:council_ex, :run, :stop],
[:council_ex, :member, :stop],
[:council_ex, :member, :exception]
]
def attach do
:telemetry.attach_many(
"myapp-council-recorder",
@events,
&__MODULE__.handle/4,
%{}
)
end
def handle([:council_ex, :member, :stop], meas, meta, _) do
MyApp.Repo.insert!(%MyApp.Council.MemberResult{
run_id: meta.run_id,
member_id: to_string(meta.member_id),
round_name: Atom.to_string(meta.round_name),
round_idx: meta.round_idx,
model: meta.model,
provider: meta.provider && Atom.to_string(meta.provider),
status: Atom.to_string(meta.status),
attempts: meta.attempts,
confidence: meta.confidence,
input_tokens: meas.input_tokens,
output_tokens: meas.output_tokens,
duration_ms: System.convert_time_unit(meas.duration, :native, :millisecond),
cost_usd: MyApp.Pricing.cost(meta.model, meas.input_tokens, meas.output_tokens)
})
end
# … run start/stop handlers populate the parent run row …
endAttach from your application supervisor:
MyApp.Council.TelemetryRecorder.attach()When NOT to use this approach
- You don't already have an analytics schema and want a turnkey
history table → use
Recorder.Ecto. - You need run history available to other systems that already query
run_events→ useRecorder.Ecto.
You can also run both, but expect to reconcile the two tables.
Topology matrix
Which mode and extras to reach for at each deployment scale:
| Topology | Sync run/3 | Async start/3 | Recommended :mode | Caveats / host-app responsibilities |
|---|---|---|---|---|
| Single-node | yes | yes | :single_node (default) | Run history is in-memory; flip to :multi_node + Recorder.Ecto for durability. |
| Distributed Erlang cluster | yes | yes (node-pinned) | :single_node or :multi_node | Pin awaiters to the run's node; for failover, layer your own job runner (Oban / Broadway / queue) over start/3 + await/2. |
| Replicated BEAM (no cluster, shared DB) | yes | no cross-replica | :multi_node | Recorder.Ecto + Workers.Oban for cross-replica resumption. |
| Ephemeral BEAM (autoscaled / scale-to-zero) | yes (short runs) | same-machine only | :multi_node | Long runs need an external job runner that survives instance termination. |
What does NOT need persistence
- The RunServer itself stays in-memory. A run executing on node A
can't be migrated to node B mid-flight (PubSub events only reach the
origin node's subscribers). Use
Recorder.Ecto+ Oban for cross-node durability. - PubSub topic distribution across nodes uses
:pg(default) orPhoenix.PubSub— not Postgres. Configure separately.
Troubleshooting
UndefinedFunctionError: Postgrex.Query.parameters/0— missing:postgrexdep. Add it.UndefinedFunctionError: Redix.command/2— missing:redixdep. Add it.Recorder.Ectowrites nothing — confirm:recorderopt was passed toCouncilEx.start/3for that specific run.- Recovery never marks orphans — check the row's
:statusin DB; if it's already:completed/:failed, it's not an orphan.