A Postgres-backed durable-execution engine for Elixir. You declare a finite-state machine; the engine commits its state to Postgres before each step proceeds, so an instance survives process and node death and resumes where it left off.

The design is inspired by durable-execution systems (Temporal, DBOS) and Postgres-backed job runners (Oban), but the unit of durability is an explicit FSM step, and the state lives in the database, not in a process. There is no GenServer per instance: an FSM is a row, and each step runs as an ephemeral task. The runtime backbone (scheduler, reaper, GC) is a small set of GenServers that pick runnable rows and dispatch them.
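To make the "an FSM is a row, each step is an ephemeral task" idea concrete, here is a minimal in-memory sketch. It is not the engine's implementation and uses none of its API: the module name, the row shape, and the `{:next, ...}` outcome are hypothetical, and a plain map stands in for the Postgres table.

```elixir
defmodule RowDispatchSketch do
  # Hypothetical stand-in for one table row: everything the instance needs
  # lives in this map, not in a long-lived process.
  def new_row(id, state), do: %{id: id, step: "start", state: state, status: :runnable}

  # Hypothetical step logic; the outcome shapes are invented for this sketch.
  def run_step("start", state), do: {:next, "finish", Map.put(state, :started, true)}
  def run_step("finish", state), do: {:done, state}

  # One scheduler pass: pick runnable rows, run each step in a short-lived
  # Task, and write the outcome back to the in-memory "table".
  def tick(table) do
    table
    |> Map.values()
    |> Enum.filter(&(&1.status == :runnable))
    |> Enum.map(fn row -> {row, Task.async(fn -> run_step(row.step, row.state) end)} end)
    |> Enum.reduce(table, fn {row, task}, acc ->
      updated =
        case Task.await(task) do
          {:next, step, state} -> %{row | step: step, state: state}
          {:done, state} -> %{row | state: state, status: :done}
        end

      Map.put(acc, row.id, updated)
    end)
  end
end

table = %{1 => RowDispatchSketch.new_row(1, %{order: 42})}
after_first = RowDispatchSketch.tick(table)
after_second = RowDispatchSketch.tick(after_first)
IO.inspect(after_second[1])
```

Because each Task exits as soon as its step returns, nothing about the instance survives in memory between passes; the row is the only thing a restarted scheduler would need to pick the work up again.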

The one guarantee: on step completion, the new state is committed to the database before execution proceeds. On a crash before commit, the step re-executes from scratch (at-least-once). Idempotency of step effects is the user's responsibility.
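The sketch below illustrates what at-least-once means for step effects. It is a simulation under stated assumptions, not the engine's code: `AtLeastOnceSketch`, the `crashes` counter, and the `charge` function are hypothetical, and an Agent stands in for an external system that honors idempotency keys.

```elixir
defmodule AtLeastOnceSketch do
  # Hypothetical engine loop: run the step, then try to commit its result.
  # `crashes` is how many times we pretend to die after the step's side
  # effects ran but before the commit landed.
  def run(step_fun, committed_state, crashes) do
    new_state = step_fun.(committed_state)

    if crashes > 0 do
      # The commit never happened, so the step re-executes from the last
      # committed state.
      run(step_fun, committed_state, crashes - 1)
    else
      # The commit succeeded; only now does execution proceed.
      new_state
    end
  end
end

# Hypothetical external system: counts every attempt, but records a charge
# only the first time it sees a given idempotency key.
{:ok, ledger} = Agent.start_link(fn -> %{attempts: 0, charges: %{}} end)

charge = fn key, amount ->
  Agent.update(ledger, fn %{attempts: n, charges: c} ->
    %{attempts: n + 1, charges: Map.put_new(c, key, amount)}
  end)
end

# The step derives its idempotency key from durable state, so every
# re-execution presents the same key.
step = fn state ->
  charge.("order:#{state.order}:charge", 100)
  %{state | charged: true}
end

final = AtLeastOnceSketch.run(step, %{order: 42, charged: false}, 2)
seen = Agent.get(ledger, & &1)
IO.inspect({final, seen})
```

With two simulated crashes the step body runs three times, yet the ledger holds a single charge. Deriving the key from committed state rather than from anything generated inside the step is what keeps the retries harmless.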

Install

def deps, do: [{:gen_durable, "~> 0.2.0"}]

Add the migration (the DDL lives in the library) and run it:

defmodule MyApp.Repo.Migrations.SetupGenDurable do
  use Ecto.Migration

  def up,   do: GenDurable.Migration.up()
  def down, do: GenDurable.Migration.down()
end

Start the engine in your supervision tree, after your repo:

children = [
  MyApp.Repo,
  {GenDurable, repo: MyApp.Repo, queues: [default: 10, checkout: 5]}
]

A first machine

defmodule Checkout do
  use GenDurable.FSM, queue: "checkout"

  defmodule State do
    use GenDurable.State
    embedded_schema do
      field :order, :integer
    end
  end

  # "start": park until the payment webhook fires; the engine then runs "ship"
  # with the received signal available in ctx.awaited.
  @impl true
  def step("start", ctx), do: {:await, "payment_confirmed", "ship", ctx.state}

  # "ship": finish with a result built from the order and the signal payload.
  def step("ship", ctx) do
    {:done, %{"order" => ctx.state.order, "paid" => hd(ctx.awaited).payload}}
  end
end

{:ok, _id} = GenDurable.insert(Checkout, state: %{order: 42}, correlation_key: "order:42")

# later, from a webhook that only knows the business key:
GenDurable.signal("order:42", "payment_confirmed", %{amount: 100})

For the trivial "run once and finish" case, define perform/1 instead of step/2 and you get a durable job with retries for free.

Features

| Guide | What |
| --- | --- |
| Jobs | one-shot durable jobs (perform/1 or perform/2) with retries and backoff |
| State machines | step/2, typed State, the outcome contract, error handling |
| Signals & await | park on external events; durable, at-least-once, sets and packs |
| Child fan-out | schedule_childs: fan work out, join on all of it |
| Rate limiting | per-step token-bucket limits, partitioned, weighted |
| Concurrency keys | serialize per key, parallel across keys |
| Instance identity | correlation_key: address a signal by business key + dedup |
| Scheduling & queues | delays, priority, queues, recurring work |
| Operations | migration, crash recovery, GC, the config reference, telemetry |

Documentation

Development

The toolchain (Elixir 1.18 / OTP 27 + Postgres) is pinned in .devcontainer/.

make up     # build the devcontainer
make test   # run the suite