Status: draft v1. Companion to gen_durable_spec.md (normative). This document is the how;
the spec is the what. Where this plan and the spec disagree, the spec wins — fix this plan.
The engine is an Elixir library: a Postgres-backed durable FSM runtime on top of GenServer. The one guarantee we are building toward: on step completion, the new state is committed to the database before execution proceeds (spec §1). Everything below serves that sentence.
0. Tech stack and key decisions
These are the choices that shape every later phase. Each has a recommendation and the reasoning; items marked CONFIRM are worth a yes/no before we lean on them.
| Decision | Recommendation | Why |
|---|---|---|
| Language / build | Elixir library, Mix, lib/gen_durable/… | Spec is GenServer-based. |
| DB driver | ecto_sql + postgrex (DECIDED) | Use Ecto.Migration for the DDL and an Ecto.Repo for connection pooling, but run hot-path statements as raw SQL (Repo.query!/Postgrex). The spec is SQL-first (FOR UPDATE SKIP LOCKED, generated columns, advisory locks); we do not hide those behind the query DSL. |
| New-work latency | Polling only. LISTEN/NOTIFY is banned, permanently. (DECIDED) | Spec §6 mandates a "dumb picker". Polling is trivially correct; NOTIFY is not an option in this codebase. Do not add it, do not propose it. |
state encoding | Typed struct per FSM, jsonb ↔ struct via Ecto embedded schema (DECIDED) | Each FSM declares its own state struct; the engine loads jsonb → struct before step/2 and dumps struct → jsonb on the outcome. ctx.state is that struct, not a raw map. Living with a plain map is allowed but unsupported — you're on your own. |
| FSM definition | use GenDurable.FSM behaviour, step/2 + handle/2 callbacks, state: struct module | The step text column maps to a function-clause head; state: names the embedded schema. |
| FSM versioning | Explicit registry {fsm_name, fsm_version} => module in config/start (DECIDED) | Spec §8: old instances finish on their fsm_version. Old versions (Checkout.V1, Checkout.V2) coexist as explicitly-registered modules; we resolve module per row, never assume "latest", no compile-time magic. |
| Defaults (timings) | lease 60s · heartbeat 20s · poll 1s · reaper 30s ("Balanced", DECIDED) | All config-overridable. Margin: heartbeat × 3 = lease. |
| Signal consume | Name-scoped (spec §5) — on a progressing outcome the engine deletes the inbox signals whose name = awaits, then clears awaits (DECIDED → option (b)) | Non-matching signals survive until their own await. Delivery keeps awaits; :await also guards against a pre-arrived signal (no lost wake-up). Supersedes the earlier snapshot-delete idea. |
| Step ↔ connection | one DB connection is held for the whole step when partition_key is set | Session-level pg_try_advisory_lock must live on a connection that survives the step, which runs outside any transaction. This ties a connection to each in-flight partitioned step — call it out in capacity planning. |
The advisory-lock subtlety (read this before Phase 9)
partition_key serialization (spec §6, Pick) uses a session-level advisory lock, not
pg_advisory_xact_lock, because the step body runs outside a transaction. Consequences:
- The lock must be taken on, and held by, the same connection for the lock's lifetime, then
released after the outcome commits (
pg_advisory_unlock, or drop the connection). - The picking
UPDATEruns in a short transaction on a pooled connection; the lock attempt and the step run on a (possibly different) checked-out connection. Be deliberate about which connection owns the lock. - Worker death drops the connection → Postgres releases the advisory lock automatically. This is the free recovery the spec relies on; do not "optimize" it by reusing connections across steps.
1. Architecture overview
GenDurable.Application (supervisor)
├── GenDurable.Repo (Ecto repo / Postgrex pool)
├── GenDurable.Registry ({fsm, version} -> module)
├── GenDurable.Reaper (periodic: expired leases -> runnable, attempt+1)
└── GenDurable.Queue.Supervisor (one child per configured queue/pool)
└── GenDurable.Queue "default"
├── Scheduler (GenServer: poll, run Pick for free slots, hand rows to workers)
├── Heartbeat (extends lease_expires_at for in-flight rows)
└── Task.Supervisor (one Task per executing step; local concurrency = pool size)Data flow of one step:
- Scheduler picks
runnablerows (short txn,FOR UPDATE SKIP LOCKED) →executing+ lease. - If
partition_keyset, Executor takes the advisory lock; on conflict, return row torunnable. - Executor loads pending signals (if relevant), resolves the module, calls
step/2undertry. - Outcome (or
handle/2outcome on exception) is written in one transaction, deleting consumed signals in that same transaction (spec §5). - Advisory lock released, heartbeat stopped.
Crash with no outcome → lease expires → Reaper resets to runnable, attempt + 1, step re-runs
from scratch (spec §4.3). handle/2 is not called on a reap — that is the at-least-once floor.
2. Module layout
lib/gen_durable.ex # public API: insert/2, insert_all/2, signal/4
lib/gen_durable/application.ex
lib/gen_durable/repo.ex
lib/gen_durable/fsm.ex # `use GenDurable.FSM`, behaviour + __using__
lib/gen_durable/state.ex # `use GenDurable.State` — embedded schema + load/dump codec
lib/gen_durable/outcome.ex # outcome constructors / validation (the §3 table)
lib/gen_durable/context.ex # ctx struct passed to step/2 and handle/2
lib/gen_durable/registry.ex # {name, version} -> module
lib/gen_durable/queries.ex # ALL raw SQL from spec §10, one function each
lib/gen_durable/executor.ex # run one picked row end-to-end
lib/gen_durable/scheduler.ex # per-queue picker loop + concurrency accounting
lib/gen_durable/heartbeat.ex
lib/gen_durable/reaper.ex
lib/gen_durable/signals.ex # sender-side delivery
lib/gen_durable/migration.ex # GenDurable.Migration.up/1 down/1 — host app calls it (Oban-style)
test/support/test_repo/migrations/..._setup.exs # test repo only; just calls GenDurable.Migration.up()queries.ex is the single home for the spec's SQL so §10 maps 1:1 to functions and stays auditable.
The library ships no Ecto migration files of its own — the DDL lives in GenDurable.Migration
and the host app writes a one-line migration that delegates to it (see §4).
3. Public API (target shape)
# Defining a machine — state is a nested embedded schema, adopted by convention
defmodule Checkout do
use GenDurable.FSM, version: 1, queue: "checkout"
defmodule State do
use GenDurable.State # thin wrapper over `embedded_schema`
embedded_schema do
field :order, :integer
field :n, :integer, default: 0
field :shipped, :boolean, default: false
end
end
@impl true
def step("start", %{state: s} = _ctx), do: {:next, "await_pay", %{s | n: s.n + 1}}
def step("await_pay", ctx), do: {:await, "payment_confirmed", ctx.state}
def step("ship", %{state: s} = _ctx), do: {:done, %{"shipped" => true}}
@impl true
def handle(reason, ctx) do
if ctx.attempt < 5, do: {:replay, ctx.state, backoff(ctx.attempt)}, else: {:stop, reason}
end
end
# Enqueue — `state:` is cast into Checkout.State
{:ok, id} = GenDurable.insert(Checkout,
state: %{order: 42},
partition_key: "order:42",
priority: 0,
unique_key: <<...>>, unique_scope: ["runnable", "executing", "awaiting_signal"])
# Batch (single statement, dedup via the partial unique index)
GenDurable.insert_all(Checkout, [ %{...}, %{...} ])
# Signal into an await (durable, at-least-once)
GenDurable.signal(id, "payment_confirmed", %{"amount" => 100}, dedup_key: "evt-7")ctx.state is a %Checkout.State{} struct (loaded from jsonb before the call); step/2/handle/2
return a struct of the same type in :next/:replay/:await, which the engine dumps back to jsonb.
result in :done stays a plain string-keyed map (terminal payload, never re-loaded into step code).
ctx for step/2: %Context{id, fsm, fsm_version, step, attempt, state, signals}.
ctx for handle/2: same minus signals (spec §4.2: %{id, fsm, fsm_version, step, attempt, state}).
Outcomes returned by step/2 and handle/2 are exactly the spec §3 table:
{:next, step, state} · {:replay, state, delay_ms} · {:await, name, state} ·
{:done, result} · {:stop, reason}. Outcome validates the shape before it touches SQL.
4. Data model
Migration: library-owned, Oban-style
The DDL lives in GenDurable.Migration, not in copy-pasted Ecto migration files. The host app writes
a thin migration that delegates:
defmodule MyApp.Repo.Migrations.SetupGenDurable do
use Ecto.Migration
def up, do: GenDurable.Migration.up() # opts: [prefix: "public", version: :latest]
def down, do: GenDurable.Migration.down()
endup(opts \\ [])/down(opts \\ []), both runexecute/1of the spec §9 DDL verbatim (the generated column, partial indexes, and enum are clearer as raw SQL than via the migration DSL).:prefix— target Postgres schema (defaultpublic);:version— schema version to migrate to (default latest). v1 ships version 1 only.- The installed schema version is recorded in
COMMENT ON TABLE gen_durable(Oban's trick), read onupso only missing increments apply. This keeps the user-facing call stable when we add a v2.
Tables created: gen_durable, signals, the durable_status enum, the three indexes
(gen_durable_pick, gen_durable_lease, gen_durable_unique) plus signals_target.
Spec deviation D1 — unique_scope must be durable_status[], not text[]
Spec §7/§9 define unique_scope text[] and the generated column as
status::text = any(unique_scope). Postgres rejects this: a generated always as (...) stored
expression must be IMMUTABLE, but the enum→text cast (enum_out) is only STABLE (enum labels can be
renamed). Result: ERROR 42P17 generation expression is not immutable.
Fix applied in GenDurable.Migration (v1): store unique_scope as durable_status[] and compare
status = any(unique_scope) (enum = enum, enum_eq is IMMUTABLE — the cast disappears). The §7
semantics are unchanged. Downstream impact: inserts pass scope as $scope::durable_status[], and the
engine reads scope as enum-label strings. The normative spec §7/§9 should be patched to match.
Notes to verify during implementation:
unique_guardisgenerated always as (...) stored; the partial unique index is on it.- Batch insert uses
ON CONFLICT (unique_guard) WHERE unique_guard is not null DO NOTHING— the inference predicate must match the partial index predicate exactly, or PG won't pick the index. signalshasunique (target_id, dedup_key);dedup_key NULL⇒ no dedup (NULLs don't conflict).
5. Component design
5.1 Queries (queries.ex)
One function per spec §10 block, all parameterized, all raw SQL:
pick/3, heartbeat/2, reap/0, complete_next/…, complete_replay/…, complete_await/…,
complete_done/…, complete_stop/…, deliver_signal/…, insert/…, insert_all/….
Progressing complete_* run a name-scoped signal DELETE (signals whose name = awaits) and the
outcome UPDATE (which clears awaits) in one txn; complete_await guards a pre-arrived signal.
5.2 Executor (executor.ex)
Given a picked row map, run the step to a committed outcome:
- If
partition_key,pg_try_advisory_lock(hashtext(partition_key))on the step connection; onfalse, reset row torunnable(drop lease) and stop — another worker owns the key. - Load pending signals for
idintoctx.signals(read-only view; deletion is name-scoped in SQL on the outcome, not by these ids). - Resolve module via
Registry.fetch!({fsm, fsm_version}); load jsonb → state struct (the FSM'sstate:embedded schema, viaEcto.embedded_load); buildContextwith the struct. try→module.step(step, ctx); on raised exception →module.handle(reason, ctx); ifhandle/2itself raises → write:stop/failed(spec §4.2).- Dump the returned state struct → jsonb (
Ecto.embedded_dump), then apply the outcome via the matchingcomplete_*query, which name-scope-deletes the awaited signals in the same txn. pg_advisory_unlock, stop heartbeat.
A worker process crash before step 5 means no outcome row → reaper path. We do not trap and
convert crashes into handle/2; crash ≠ caught exception (spec §4 distinguishes the three sources).
5.3 Scheduler (scheduler.ex) — feeder + executor with backpressure
Per queue. A finished Task refills immediately, so throughput is bounded by
concurrency / step_time, not by poll_interval (the poll timer only governs idle discovery
latency). Backpressure-driven with four aggressiveness knobs (see F5 below and the module doc):
concurrency— executor width (max Tasks at once).prefetch— extra rows claimed into an in-memory buffer beyond the running slots (0⇒ today's pick-exactly-free-slots). Buffered rows areexecuting+leased and are heartbeated (the heartbeat set isbuffer ++ in_flight), so buffer depth is decoupled fromlease_ttl— they never go stale. Cost of depth: cross-node fairness (claimed rows are invisible to other nodes) and priority freshness; crash blip is bounded by the (short) TTL, not by depth.min_demand— batch gate: don't pick unless ≥ this many slots are free (fat picks), bypassed when fully idle to avoid starvation.poll_interval/max_poll_interval— idle backoff: an empty pick on a fully idle queue doubles the interval up to the ceiling; any fetched or in-flight work snaps back to the base. The lever that cuts idle DB load (NOTIFY is banned → poll adaptively, not constantly).
Loop: refill (pick up to demand = concurrency + prefetch − claimed into the buffer when the gate
allows) → drain (spawn buffered jobs, highest-priority first, into free slots). Demand-driven, so
we never claim more than concurrency + prefetch.
5.4 Heartbeat (heartbeat.ex)
Extends lease_expires_at for in-flight rows on an interval < lease_ttl (e.g. lease_ttl/3).
Implementable as one process per queue ticking over the in-flight set, or per-step timer. Lease TTL
and heartbeat interval are config; document the safety margin (heartbeat × k < lease_ttl).
5.5 Reaper (reaper.ex)
Periodic GenServer running the spec §10 Reaper UPDATE. Global (not per-queue). Interval config.
5.6 Signals (signals.ex)
signal/4: the spec §10 sender transaction — insert the signal row (with ON CONFLICT … DO NOTHING
for dedup) then flip awaiting_signal+matching awaits to runnable. Insert-before-flip so the
re-executed step already sees the signal in its inbox (spec §5).
6. Milestones
Status: M0–M10 all ✅ DONE. 33 tests green (Elixir 1.18 / OTP 27 + Postgres 17 in the
devcontainer). The whole engine — typed/map state, :next loop, await/signal, handle→
:replay→:stop, reaper crash-recovery, uniqueness, and partition_key serialization — runs
end-to-end. See module map in §2; tests in test/.
Each milestone ends in something testable. Dependencies are roughly linear; signals (M7) and uniqueness (M8) can proceed in parallel once persistence (M3) lands.
- M0 — Scaffolding. ✅ DONE. Mix project (Elixir 1.18 / OTP 27 via
.devcontainer+ Postgres), deps (ecto_sql3.14,postgrex0.22,jason),GenDurable.Migration.up/down(spec §9 DDL, Oban-style facade with version-in-table-comment),GenDurable.Test.Repo, test migration, smoke tests.down → upround-trip verified (enum + comment torn down cleanly). All green. Deviation found & applied: see "Spec deviation D1" below. - M1 — Core types (pure). ✅
FSMbehaviour +__using__,State(embedded schema + jsonb load/dump codec),Outcome,Context,Registry, step dispatch. No DB. Tests:outcome_test,state_test,registry_test. - M2 — Persistence ops. ✅ All of
queries.exagainst a real DB, called directly. Tests:queries_testdrives pick → each outcome, reaper, signals, uniqueness, batch insert. - M3 — Executor. ✅
executor.exruns one picked row end-to-end incl.try/handle/2/handle-raises. Crash ≠ caught exception (nohandle/2on a process crash). - M4 — Scheduler + pool + heartbeat. ✅
scheduler.ex— per-queue picker loop, concurrency cap, batched lease heartbeat. Test: Counter/MapCounter drive to:done. - M5 — Reaper. ✅
reaper.ex. Test:Rebornkills the worker mid-step → reaper resets →attempt+1→ succeeds;handle/2not invoked. - M6 —
:replay& backoff plumbing. ✅attemptreset on:next,+1on:replay,eligible_athonored (seequeries_test+Crasher). - M7 — await + signals. ✅
:awaitparks;signal/4delivers durably (insert-before-flip); re-executed step sees the signal and the engine deletes it in the outcome txn. Test:Awaiter. - M8 — Uniqueness + batch insert. ✅
unique_guarddedup, per-job scope, single-statementinsert_all. Test: dedup vs existing rows and within the batch; scope-exit frees the key. - M9 — partition_key serialization. ✅ Session advisory lock on a checked-out connection held for
the step; contended keys returned to
runnable; auto-release on worker death. Test:PartitionInc— N instances sharing a key, no lost updates. - M10 — Hardening. ✅ Telemetry (
[:gen_durable, :step, :stop],[:gen_durable, :reaper, :reaped]), config surface (GenDurable.Supervisoropts), README,--warnings-as-errorsclean.
M11 — schedule_childs (spec §11) ✅ DONE
Fan-out + fan-in barrier as a first-class primitive. Since the schema was never deployed, the v2
columns were folded into the v1 migration (no incremental migration — per "никто не использовал"):
enum value awaiting_children, columns parent_id / children_pending, index gen_durable_parent.
- Outcome
{:schedule_childs, next_step, children, state}(Outcome.validate); child spec is{FsmModule, insert_opts}or a bare module. Queries.complete_schedule_childs— batch-insert children (parent_idstamped) + park in one txn,children_pending= rows actually inserted, zero ⇒ straight tonext_steprunnable.- Child→parent join decrement (
notify_parent) appended tocomplete_done/complete_stop(no-op whenparent_idis null); the decrement that hits zero releases the barrier. ctx.childsloaded on every run (one indexed SELECT; optimizable like the always-load-signals).- Tests: all-children join, a
failedchild still releasing its slot, zero-children → immediatenext_step. 36 tests green.
Fixed (F1–F3)
- F1 / D2 — resolved. Spec §10
:doneno longer listsstate = $state; it matches the engine (writesresultonly, leavesstate). - F1 / signals (option b) — resolved. Name-scoped delete +
awaitskept on delivery +:awaitpre-arrival guard (no lost wake-up). Non-matching signals survive (spec §5). Tests inqueries_test. - F2 / test coverage — added. Priority ordering, per-queue routing,
schedule_ineligibility, heartbeat keeps a long lease (no spurious reap), cross-key partition parallelism. (engine_test) - F3 / scheduling sugar — added.
insert/2accepts:schedule_in(ms from now) /:schedule_at(DateTime); precedence:eligible_at>:schedule_at>:schedule_in.
F5 — Feeder / backpressure + tunable aggressiveness ✅ DONE
The per-queue Scheduler is now a feeder: it claims work into a small in-memory buffer and drains it
into concurrency Tasks, with four knobs (prefetch, min_demand, poll_interval /
max_poll_interval) exposed via GenDurable.Supervisor opts. Defaults (prefetch: 0, min_demand: 1, max_poll_interval: 5_000) reproduce the prior behaviour plus idle backoff. Buffered rows are
heartbeated (heartbeat set = buffer ++ in_flight), so over-fetch never risks a spurious reap —
buffer depth is decoupled from lease_ttl. Test: prefetch: 5 + concurrency: 1 holds the tail
buffered past a 300ms lease and still completes with attempt == 0. 43 tests green.
- Knob trade-offs (cross-node fairness, priority freshness, crash blip) are documented in the module doc; defaults are conservative (fair on a cluster), aggression is opt-in per deployment.
- Follow-up (deferred): per-queue (vs engine-wide) knobs; graceful drain that resets the buffer on shutdown instead of waiting out the lease.
F6 — Picker-side partition_key dedup ✅ DONE
The picker (Queries.pick) no longer claims work it can't run, killing the claim→try_lock→
reset_to_runnable churn that prefetch amplifies on hot keys. Two guards in one statement:
NOT EXISTS— exclude a runnable row whosepartition_keyis alreadyexecuting(so a sibling isn't claimed only to bounce off the advisory lock).partition_key IS NULLshort-circuits the guard, so non-partitioned work pays nothing. Backed by a new partial indexgen_durable_partition_active (partition_key) WHERE status='executing'(folded into v1).- intra-batch dedup — at most one row per key per batch (the most urgent); NULL keys fall back to
idso each is its own group and is never collapsed. (Final form in F8 — a window function over the locked set; the originalDISTINCT ONneeded a second re-lock pass.) The advisory lock stays the correctness guard (cross-node / unlock-gap races still possible); dedup is the optimization that makes contention rare. New telemetry[:gen_durable, :partition, :contended]fires on the residual bounce. Deterministic picker tests inqueries_test; 46 tests green.
F7 — Picker performance: queue equality + bounded dedup window ✅ DONE
Two fixes, measured in PERFORMANCE.md (real EXPLAIN on 1M rows, Postgres 17):
queue = $1(equality), neverANY. Each scheduler owns one queue, so the picker filters by a single value. This lets thegen_durable_pick (queue, priority, eligible_at)index supply rows already ordered, so theLIMITstops after ~batchrows. WithANY, the planner cannot trust the index order and scans + top-N sorts the entire runnable set: measured 613 ms → 0.7 ms (~850×) at 295k runnable.Queries.pick/5now takes a single queue string;Schedulerpassesopts.queue.- Bounded dedup window. The dedup
scanisLIMIT $2(the batch) on the index-ordered inner scan, so DISTINCT ON sorts ≤batchrows, never the backlog. The earlier unboundedcandidatesscan is gone. A same-key cluster filling the window underfills the batch; completion-driven refill closes it next pick. Residual limit: a hot key with a huge runnable backlog still makes the scan skip past its excluded siblings (one index probe each) — picker sharding by key hash is the deferred fix (noted in PERFORMANCE.md §6). All hot-path statements are PK/partial-index driven and sub-ms; seePERFORMANCE.mdfor plans, the round-trip throughput model, and the F4 round-trip-reduction backlog. 46 tests green.
F8 — Picker: one nested loop (window dedup over the locked set) ✅ DONE
The picker is now the canonical Postgres claim — one SELECT … FOR UPDATE SKIP LOCKED LIMIT, then one
UPDATE — with the partition dedup folded in as row_number() OVER (PARTITION BY coalesce(key, id))
over the locked set, so there is exactly one nested loop (the UPDATE join). The previous
DISTINCT ON form dedup'd before locking and so needed a second re-lock pass (a second nested loop
that scaled per-row with the batch). Per-key losers (rn > 1) are locked-but-not-updated → stay
runnable, lock released at commit, no advisory bounce. Measured (median of 5, VACUUM'd, 1M rows):
batch 5000 70 ms → 57 ms (−19%), batch 1000 17.3 → 16.1; no new index. Also folded in: the
gen_durable_partition_active index scoped to partition_key IS NOT NULL (the guard never probes
null keys, so non-partitioned claims skip that index write).
- Rejected by measurement (in PERFORMANCE.md §2.5–2.6): forcing the nested loop off → full-table
Seq Scan, ~10× slower (the join is optimal); a
ctid-join cut buffer hits ~28% but not wall-clock; a correlated-subquery single-scan dedup needed a new index that taxed writes. The cost floor is the per-row claim write (~11–16 µs/row: heap + partial-index moves + WAL), not the query shape. - 46 tests green (partition serialization, dedup units, cross-key parallelism all hold under M).
F9 — Outcome collapsed to one round-trip ✅ DONE
Each complete_* was a repo.transaction(BEGIN + consume DELETE + outcome UPDATE [+ notify_parent] + COMMIT) = 4–5 round-trips. Folded into a single data-modifying-CTE statement: consumed AS (DELETE …) rides as a leading CTE, atomic with the outcome UPDATE because one statement is its own
implicit transaction; :done/:stop carry the parent-join decrement as the main UPDATE after a
terminal AS (UPDATE …) CTE (child id = $1 and parent are different rows, read under the shared
snapshot). Removed the tx / consume_awaited / notify_parent helpers. ~4 round-trips → 1,
asserted single-statement via Ecto query telemetry in test/perf_test.exs (a :bench test, excluded
by default, prints the old-vs-new wall-clock — consistently faster). 53 tests green (1 bench excluded).
F10 — Graceful drain + telemetry breadth ✅ DONE
- Graceful drain: the
Schedulertraps exits and aterminate/2(a) releases the buffered (un-started) claims straight back torunnableviaQueries.release/3— so deep-prefetch work is picked up immediately on deploy instead of waiting a fulllease_ttl— and (b) waits up todrain_timeout(config, default 5_000 ms) for in-flight steps to commit their outcomes. The sibling shutdown order (schedulers before theTask.Supervisor) means in-flight tasks are still alive to finish; the scheduler child'sshutdownis set todrain_timeout + 1_000so the supervisor doesn't brutal-kill it mid-drain. Stragglers past the deadline fall to the reaper (the lease floor). Test:concurrency 1+prefetch 5+ 60 s lease ⇒ only the drain (not the reaper) can free the buffered rows; afterstop_supervised, 3 released torunnable, 1 drained todone. - Telemetry: added
[:gen_durable, :pick, :stop](count/demand per pick),[:gen_durable, :scheduler, :saturation](per-poll gauge: in_flight/buffer/concurrency/prefetch — the feeder-tuning signal), and[:gen_durable, :scheduler, :drain]. Documented all events (with the existingstep.stop/partition.contended/reaper.reaped) in theGenDurablemoduledoc. Test asserts pick + saturation fire. 55 tests green (1 bench excluded).
F11 — :fsms is now optional (dynamic FSM resolution) ✅ DONE
Listing every FSM module to start the engine was needless boilerplate for the common case: the fsm
column already defaults to inspect(module), so the module is recoverable from the row. Registry.fetch!
now falls back, on an ETS miss, to resolving name as a module — accepting it only if it is a
GenDurable.FSM whose own __gd_name__ and __gd_version__ match the row (so we never run an
arbitrary or wrong-version module). :fsms is now needed only for a custom :name (the fsm column
isn't a module name) or to keep an old :version running (spec §8). README/Supervisor/Registry docs
updated; GenDurable.Test.Auto (no custom name, unregistered) proves end-to-end + unit resolution. 57
tests green.
F12 — Nested State schema adopted by convention ✅ DONE
Declaring the state as a separate top-level module (Checkout.State) and wiring it via state: was
busywork. The state schema can now live as a nested defmodule State inside the FSM; GenDurable.FSM
resolves it at @before_compile (the nested module is already compiled by then, so __gd_state__ is a
zero-cost compile-time constant). An explicit :state opt still wins as an override; omit both for
plain-map state. README / FSM / State docs and the Counter test FSM moved to the nested form. 57 tests
green. Bumped to 0.1.1.
F13 — Job form: perform/1|2 folded into GenDurable.FSM ✅ DONE
The trivial "run once and finish" case carried the whole FSM vocabulary (step names, {:done, map},
plain-map state flagged "unsupported"). Rather than a separate GenDurable.Job behaviour, the job form
is folded into GenDurable.FSM via the same @before_compile trick: define perform/1 or perform/2
(instead of step/2) and the macro generates the bridging one-step step/2, a retry handle/2, and a
default backoff/1. perform returns :ok / {:ok, map} (done), {:error, reason} (retry w/ backoff
until :max_attempts, default 20), {:cancel, reason} (fail, no retry); a raise is treated as
{:error, _}. A module defines exactly one of step/2 / perform/1|2 — both or neither is a
compile error. :args is an alias for :state at insert. README leads with the job form. New tests:
five engine tests (ok / result+ctx / retry-then-succeed / exhaust-max_attempts / cancel) plus
compile-guard + default-backoff unit tests. 66 tests green.
Open follow-ups (post-v1, not blocking)
- F4 (remaining) — gate
signals+childsloads: every step still does twotarget/parentSELECTs; gate them onuse GenDurable.FSM, awaits: true, childs: true→ a plain:nextstep goes from ~3 round-trips to ~1. (The outcome-collapse half of F4 is done — see F9.) - partition_key busy-spin on a hot key (picker-sharding, §6) — v2.
- per-queue (vs engine-wide) feeder knobs; property/multi-node tests — v2.
7. Testing strategy
The guarantee is concurrency- and crash-shaped, so tests must be too. Ecto.Adapters.SQL.Sandbox
fights advisory locks and multi-connection flows — run engine tests against a real DB,
async: false, with explicit cleanup.
- Outcome correctness: for each §3 outcome, assert the resulting row matches the §10
UPDATE(status,attempt,eligible_at, lease cleared, signals deleted). - At-least-once on crash: kill the step Task mid-flight; assert reap → re-run, and that an effect guarded by idempotency runs at-least-once (and an unguarded one may run twice — documented).
- await/signal: loss-free delivery (signal always wakes), duplicate-tolerant (dedup_key), and the "insert before flip" ordering so the woken step sees its signal.
- Uniqueness: concurrent
insert_allfrom multiple connections dedups; leaving an occupied status frees the key for re-insertion;unique_key IS NULLnever conflicts. - partition_key: spawn many machines sharing a key; assert steps never overlap and preserve order; assert cross-key parallelism; assert lock auto-release on simulated connection drop.
- Property test (optional): random interleavings of pick/heartbeat/reap/outcome preserve "state committed before proceed" and never lose a signal.
8. Decisions (all settled)
- DB:
ecto_sql+ raw hot-path SQL. - Latency: polling only;
LISTEN/NOTIFYbanned permanently — do not add it. - State: typed struct per FSM (Ecto embedded schema), jsonb ↔ struct; plain-map state is
unsupported.
resultstays a plain string-keyed map. - Timings (Balanced): lease 60s · heartbeat 20s · poll 1s · reaper 30s, all config-overridable.
- Registry: explicit
{name, version} => moduleconfig; old versions coexist as registered modules. - Signals: name-scoped consumption (spec §5, option b) — a progressing outcome deletes inbox
signals whose
name = awaitsand clearsawaits; non-matching signals survive; delivery keepsawaits;:awaitguards a pre-arrived signal against lost wake-up.
9. Risks / watch-items
- Connection pressure from partitioned steps (§0): each in-flight partitioned step holds a connection for its whole duration. Pool sizing must account for it.
- ON CONFLICT inference on the generated column must match the partial index predicate exactly.
- Heartbeat vs lease race: if heartbeat starves (slow DB), a live step can be reaped and run twice. The TTL margin and the at-least-once contract make this safe, but tune the margin.
handle/2re-entrancy: a caught exception callshandle/2, which may:replay; ensure the re-run path is identical to a fresh run (no leftover lease/lock state).