Continuum v0.5 uses BEAM distribution for low-latency routing between nodes. It does not start or configure a clustering transport for your application. Use the tooling you already operate, such as DNS clustering, libcluster, releases with known node names, or your platform's service discovery, and make sure Node.list/0 contains the other application nodes.
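
A quick way to confirm that your own clustering setup has connected the nodes is to compare the peers you expect against Node.list/0. The module and function names below are hypothetical and not part of Continuum; this is only a minimal sketch of such a check.

```elixir
defmodule ClusterCheckSketch do
  # Hypothetical helper, not part of Continuum.
  # Returns the expected node names that are neither the local node
  # nor present in the list of connected nodes.
  def missing_peers(expected, connected \\ Node.list()) do
    expected -- [node() | connected]
  end
end
```

Called with one argument, for example ClusterCheckSketch.missing_peers([:"app@10.0.0.1", :"app@10.0.0.2"]), it reads the live node list; an empty result means every expected peer is visible from this node.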

Routing Model

Continuum starts a :pg scope named :continuum. Every live workflow engine joins the group {instance_name, run_id} for as long as it owns a run. When a signal, timer, or other wake reaches a node that does not host the engine for that run, Continuum.Runtime.Engine.wake/2 finds no entry in the local Registry and forwards the wake to a :pg member of the run's group, if one is present.
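
The lookup order described above (local Registry first, then :pg) can be sketched roughly as follows. This is not Continuum's implementation: the module name, the :continuum_sketch scope, the registry name, and the return values are assumptions made for illustration. Only the Registry and :pg calls are standard Elixir/OTP APIs.

```elixir
defmodule WakeRoutingSketch do
  # Hypothetical names; Continuum's real scope is :continuum.
  @scope :continuum_sketch
  @registry WakeRoutingSketch.Registry

  # Starts a local unique Registry and a :pg scope.
  def start do
    {:ok, _} = Registry.start_link(keys: :unique, name: @registry)
    {:ok, _} = :pg.start_link(@scope)
    :ok
  end

  # Called by the process that owns the run: register locally and
  # join the cluster-wide group for {instance, run_id}.
  def claim(instance, run_id) do
    key = {instance, run_id}
    {:ok, _} = Registry.register(@registry, key, nil)
    :pg.join(@scope, key, self())
  end

  # Deliver a wake: prefer the local engine, otherwise forward to
  # any :pg member, otherwise report that no owner is known.
  def wake(instance, run_id, message) do
    key = {instance, run_id}

    case Registry.lookup(@registry, key) do
      [{pid, _value}] ->
        send(pid, {:wake, message})
        {:local, pid}

      [] ->
        case :pg.get_members(@scope, key) do
          [pid | _] ->
            send(pid, {:wake, message})
            {:forwarded, node(pid)}

          [] ->
            :no_owner
        end
    end
  end
end
```

In this sketch a :no_owner result would leave the wake to be picked up through the database path; how Continuum itself handles that case is not specified here.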

:pg is advisory. The Postgres lease and fencing token remain the authority for journal writes. If a stale node is still listed in :pg, its next write fails the lease check and the heartbeater stops that engine.
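
To illustrate why a stale :pg entry is harmless, here is a minimal in-memory model of a fencing-token check. The lease shape (a map with :owner and :fencing_token), the function names, and the error atom are assumptions for this sketch; Continuum's actual check runs against Postgres and may differ.

```elixir
defmodule FencingSketch do
  # Hypothetical model: each successful claim bumps the token, so a
  # previous owner's token can never match the current one again.
  def claim(lease, owner) do
    %{lease | owner: owner, fencing_token: lease.fencing_token + 1}
  end

  # A journal write is allowed only with the current token.
  def authorize_write(%{fencing_token: current}, token) when token == current, do: :ok
  def authorize_write(_lease, _token), do: {:error, :lease_lost}
end
```

If node A claims the lease and node B later claims it after expiry, a write from A carrying its old token is rejected even though A may still appear in :pg.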

Failure Recovery

Node failure recovery is lease-expiry based in v0.5.0. A run abandoned by a dead node becomes claimable once its lease TTL expires; any node's dispatcher can then resume it from the journal. The default TTL is 30 seconds, so worst-case resume latency after node death is roughly 30 seconds plus one dispatcher poll interval.
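
The latency bound can be written out as simple arithmetic. The 30-second default comes from the text above; the poll interval value used in the usage note, the function names, and the choice to treat a lease as claimable at exactly its expiry time are assumptions for this sketch.

```elixir
defmodule ResumeLatencySketch do
  # Default lease TTL stated in the documentation.
  @default_ttl_ms 30_000

  # Assumption: a run is claimable once the current time has reached
  # the lease expiry time.
  def claimable?(lease_expires_at_ms, now_ms), do: now_ms >= lease_expires_at_ms

  # Upper bound on resume latency: the node dies right after renewing
  # (a full TTL remains) and the lease expires just after a poll
  # (a full poll interval passes before the next one).
  def worst_case_resume_ms(poll_interval_ms, ttl_ms \\ @default_ttl_ms) do
    ttl_ms + poll_interval_ms
  end
end
```

For example, with a hypothetical 5-second poll interval, ResumeLatencySketch.worst_case_resume_ms(5_000) gives 35_000 ms.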

Activity tasks use the same rule. Boot recovery requeues a leased activity task only after its task lease has expired, so a newly booted node does not steal work from a live worker on another node.
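
As a rough illustration of that rule, boot recovery can be thought of as a filter over leased tasks. The task shape (a map with :id and :lease_expires_at_ms) and the module name are hypothetical; the real selection presumably happens in a database query.

```elixir
defmodule BootRecoverySketch do
  # Keep only tasks whose lease has already expired; tasks with a
  # live lease belong to a worker that may still be running.
  def requeueable(tasks, now_ms) do
    Enum.filter(tasks, fn task -> task.lease_expires_at_ms <= now_ms end)
  end
end
```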

Observability

Continuum emits [:continuum, :run, :forwarded] when a wake is forwarded through :pg, with :from_node and :to_node metadata. It emits [:continuum, :lease, :lost] when a heartbeater discovers that another owner has stolen a run lease.
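
A handler for these two events might look like the sketch below. It assumes the :telemetry library is available in your application (which is the case whenever Continuum emits these events) and uses its attach_many/4 function. The module name, handler id, and log wording are illustrative, and no measurements or metadata keys beyond :from_node and :to_node are assumed.

```elixir
defmodule ContinuumTelemetrySketch do
  require Logger

  @events [
    [:continuum, :run, :forwarded],
    [:continuum, :lease, :lost]
  ]

  # Attach one handler to both events. Requires the :telemetry library.
  def attach do
    :telemetry.attach_many(
      "continuum-cluster-logger",
      @events,
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Telemetry handlers receive the event name, measurements,
  # metadata, and the config given at attach time.
  def handle_event(event, _measurements, metadata, _config) do
    Logger.info(describe(event, metadata))
  end

  # Build a log line for a forwarded wake.
  def describe([:continuum, :run, :forwarded], %{from_node: from, to_node: to}) do
    "wake forwarded from #{from} to #{to}"
  end

  # The metadata keys of the lease event are not documented here,
  # so the whole map is inspected as-is.
  def describe([:continuum, :lease, :lost], metadata) do
    "run lease lost: #{inspect(metadata)}"
  end
end
```

Calling ContinuumTelemetrySketch.attach() once during application start is enough; a sustained rate of forwarded wakes or lost leases is a signal worth alerting on.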

Test Harness

The repository includes mix test.cluster, which runs test/cluster with real :peer nodes against the test Postgres database. These tests are excluded from ordinary mix test because Ecto SQL Sandbox transactions do not span BEAM nodes.