DurableServer.Supervisor (durable_server v0.1.1)
Supervisor for DurableServer processes with lifecycle management and graceful shutdown.
DurableServer.Supervisor provides a scoped environment for managing DurableServer processes similar to how Task.Supervisor manages Task processes. Each supervisor instance maintains its own lifecycle manager, heartbeat system, and object storage namespace, preventing conflicts between different applications or components.
Usage
Start a DurableServer.Supervisor in your application supervision tree:
children = [
{DurableServer.Supervisor, name: MyApp.DurableSup, prefix: "myapp/"}
]
Then start DurableServer processes through the supervisor:
DurableServer.Supervisor.start_child(
MyApp.DurableSup,
{MyServer, key: "user_123", initial_state: %{}}
)
Architecture
Each DurableServer.Supervisor creates the following supervision tree:
MyApp.DurableSup
├── TaskSupervisor # The task supervisor for async internal operations
├── DynamicSupervisor # The supervisor for all `DurableServer` processes on this node
├── SingleflightGuard # Guard table sweeper for ensure_started waiters
├── LifecycleManager # Monitors and restarts crashed servers
└── Terminator # Coordinates graceful shutdown
Components
LifecycleManager: Automatically detects and restarts crashed or orphaned DurableServer processes within this supervisor's scope. Uses object storage queries and node heartbeats to identify servers that need restart.
SingleflightGuard: Maintains and sweeps the per-key/module waiter guard
table used by ensure_started_child/3 overload protection.
Terminator: Handles graceful shutdown by instructing all DurableServer processes to sync their state before termination. Waits for confirmation (up to a timeout) before allowing the supervisor to shut down.
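To illustrate the kind of per-{key, module} waiter cap that the SingleflightGuard table backs, here is a minimal sketch built on a plain ETS counter table. The module WaiterGuardSketch, its table layout, and its functions are hypothetical and are not the library's internals; only the cap semantics (fail fast with {:error, :singleflight_overloaded} beyond the cap, nil disables the cap) come from the option documentation.

```elixir
defmodule WaiterGuardSketch do
  # Hypothetical guard table: one {{key, module}, count} row per pair.
  def new_table, do: :ets.new(:waiter_guard_sketch, [:set, :public])

  # A nil cap disables the guard; nothing is counted, so do not call release/3.
  def acquire(_table, _key, _module, nil), do: :ok

  def acquire(table, key, module, cap) when is_integer(cap) do
    # Atomically increment, inserting {{key, module}, 0} first if the row is missing.
    count = :ets.update_counter(table, {key, module}, {2, 1}, {{key, module}, 0})

    if count > cap do
      # Undo our own increment so rejected callers do not hold a slot.
      release(table, key, module)
      {:error, :singleflight_overloaded}
    else
      :ok
    end
  end

  # Decrement when a waiter finishes, never going below zero. A periodic
  # sweeper (not shown) could delete rows that have dropped back to zero.
  def release(table, key, module) do
    :ets.update_counter(table, {key, module}, {2, -1, 0, 0}, {{key, module}, 0})
    :ok
  end
end
```

Using an atomic counter keeps the check-and-increment race-free across many concurrent callers without routing every waiter through a single process.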
Object Storage Scoping
Each supervisor uses a unique prefix for object storage to prevent naming conflicts:
prefix: "myapp/"
# Results in keys like: myapp/user_123, myapp/session_456
Node Heartbeats
The LifecycleManager maintains node-level heartbeats in object storage at
{prefix}nodes/{node_name} and caches them locally for efficient health
checking during restart decisions.
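The key layout and heartbeat health check described above can be sketched as follows. StorageLayoutSketch and its functions are hypothetical helpers, not library API. The classification assumes a node becomes stale once its last successful heartbeat is older than :heartbeat_staleness_threshold_ms and dead once it is older than :dead_node_threshold_ms; both thresholds are passed in explicitly.

```elixir
defmodule StorageLayoutSketch do
  # Server objects live directly under the supervisor prefix.
  def server_key(prefix, key), do: prefix <> key

  # Node heartbeats live at {prefix}nodes/{node_name}.
  def heartbeat_key(prefix, node_name), do: prefix <> "nodes/" <> to_string(node_name)

  # Classify a node by the age (in ms) of its last successful heartbeat.
  def classify(age_ms, stale_after_ms, dead_after_ms) do
    cond do
      age_ms > dead_after_ms -> :dead
      age_ms > stale_after_ms -> :stale
      true -> :alive
    end
  end
end
```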
Configuration Options
- :name - Required. Registered name for this supervisor instance.
- :prefix - Required. Object storage prefix for scoping (should end with "/").
- :max_children - Maximum concurrent DurableServer processes (default: :infinity).
- :discovery_interval_ms - How often to scan for orphaned servers (default: 60_000).
- :initial_discovery_delay_ms - Initial delay before the first discovery sweep. Accepts either a fixed integer delay or a {min_ms, max_ms} jitter tuple (default: {1_000, 6_000}).
- :discovery_burst_count - Number of initial discovery sweeps to run back-to-back without waiting for the discovery interval (default: 3).
- :discovery_shuffle_batch_size - Number of candidate keys to accumulate before shuffling restart order (default: 20_000).
- :parallel_restart_batch_size - Number of restart attempts to run concurrently per node during a discovery sweep (default: 50).
- :restart_start_timeout_ms - Maximum time to wait for a claimed LM restart to finish bootstrapping before treating the outcome as unknown (default: 30_000).
- :restart_claim_preferred_fanout - Number of eligible nodes allowed to contend for a freshly restartable key before widening (default: 2).
- :restart_claim_expanded_fanout - Number of eligible nodes allowed to contend after the first restart-gate age threshold (default: 4).
- :restart_claim_gate_expand_after_ms - Age after which restart contention widens from preferred to expanded fanout (default: 30_000).
- :restart_claim_gate_disable_after_ms - Age after which the restart contention gate is disabled and all eligible nodes may contend (default: 120_000).
- :heartbeat_interval_ms - How often to write node heartbeats (default: 10_000).
- :heartbeat_staleness_threshold_ms - How long a node heartbeat may go without success before the node is considered stale/orphan-claimable (default: 30_000).
- :heartbeat_tracking_mode - Heartbeat cache strategy: :poll or :subscribe. Defaults from backend capabilities.
- :heartbeat_reconcile_interval_ms - Full heartbeat cache reconcile interval used in :subscribe mode (default from backend capabilities).
- :dead_node_threshold_ms - How long before a node is considered permanently dead and cleaned up (default: 86_400_000 = 24 hours).
- :crash_threshold_count - Number of crashes before marking an object as permanently crashed (default: 5).
- :crash_threshold_window_ms - Time window for crash threshold counting (default: 3_600_000 = 1 hour).
- :module_circuit_breaker_count - Module-wide crash limit before the circuit breaker opens (default: 50).
- :module_circuit_breaker_window_ms - Time window for the module circuit breaker (default: 300_000 = 5 minutes).
- :module_circuit_breaker_cooldown_ms - Cooldown period when the module circuit breaker opens (default: 600_000 = 10 minutes).
- :global_lock_failure_count - Supervisor-wide lock race threshold before the global lock circuit breaker opens (default: 100).
- :global_lock_failure_window_ms - Time window for the global lock circuit breaker threshold (default: 30_000 = 30 seconds).
- :global_lock_failure_cooldown_ms - Cooldown period when the global lock circuit breaker opens (default: 60_000 = 1 minute).
- :backend - Optional storage backend spec: {BackendModule, opts} or a pre-initialized %DurableServer.StorageBackend{}.
- :object_store - Legacy object storage config (used when :backend is not set).
- :max_cpu - Maximum CPU usage percentage before rejecting new children on this node. Values above 100 are valid since CPU load can exceed 100% when the run queue is larger than the core count. When CPU usage reaches this threshold, new placements will be routed to other nodes.
- :max_memory - Maximum memory usage percentage (1-100) before rejecting new children on this node. When memory usage reaches this threshold, new placements will be routed to other nodes.
- :max_disk - Maximum disk usage as a {percent, mount_point} tuple (e.g., {90, "/data"}). When disk usage on the specified mount point reaches the threshold, new placements will be routed to other nodes. Unlike CPU and memory limits, disk limits are bypassed for sticky restarts (children returning to their previous node) since part of the disk usage is the child's own data.
- :heartbeat_meta - Optional node metadata as a map or zero-arity function returning a map. Metadata is included in heartbeats and can be queried via get_cluster_nodes/1 for admin dashboards or other informational purposes. Keys are converted to strings during JSON serialization. Example: heartbeat_meta: %{"app" => "myapp"} or heartbeat_meta: fn -> %{"deployment" => "bluegreen"} end
- :placement_region - Optional region label used for placement timeout tuning. This value is written to heartbeat metadata as "placement_region" and used to detect same-region vs cross-region placement calls.
- :placement_erpc_timeout_same_region_ms - Timeout for remote placement ERPC calls when the target node is in the same placement_region. Default: 3_000.
- :placement_erpc_timeout_cross_region_ms - Timeout for remote placement ERPC calls when the target node is in a different or unknown placement_region. Default: 8_000.
- :max_singleflight_waiters_per_key_module - Per-{key, module} cap for concurrent ensure_started_child/3 waiters. Calls beyond the cap fail fast with {:error, :singleflight_overloaded}. Default: 50_000. Set to nil to disable.
- :sticky_placement_history_limit - Maximum number of placement history entries to keep per server (default: 5). History tracks unique placement changes over time, useful for identifying displaced servers and re-homing decisions. Oldest entries are pruned first.
- :init_info - A map of user-defined data passed to each DurableServer's init/2 callback. Use this to provide shared configuration, API clients, or other dependencies to all servers managed by this supervisor. The map is merged with built-in keys (:supervisor, :task_supervisor, :dynamic_supervisor). Example: init_info: %{api_client: MyApp.API}
- :group - Options to pass to Group:
  - :shards - The number of group shards. Defaults to 8.
  - :log - The log level. One of false, :info, or :verbose. Defaults to :info.
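Two of the timing options above have non-obvious semantics: :initial_discovery_delay_ms accepts either an integer or a {min_ms, max_ms} jitter tuple, and the restart claim gate widens with a key's age. The sketch below models both under stated assumptions: DiscoveryTimingSketch is hypothetical, the jitter is assumed to be uniform over the inclusive range, and :all_eligible is an invented marker for "gate disabled".

```elixir
defmodule DiscoveryTimingSketch do
  # A fixed integer delay is used as-is.
  def initial_delay(ms) when is_integer(ms) and ms >= 0, do: ms

  # A {min_ms, max_ms} tuple picks a uniform delay in min_ms..max_ms.
  def initial_delay({min_ms, max_ms})
      when is_integer(min_ms) and is_integer(max_ms) and max_ms >= min_ms do
    # :rand.uniform(n) returns an integer in 1..n, so shift into range.
    min_ms + :rand.uniform(max_ms - min_ms + 1) - 1
  end

  # How many eligible nodes may contend for a restartable key of a given age.
  def claim_fanout(age_ms, opts \\ []) do
    preferred = Keyword.get(opts, :restart_claim_preferred_fanout, 2)
    expanded = Keyword.get(opts, :restart_claim_expanded_fanout, 4)
    expand_after = Keyword.get(opts, :restart_claim_gate_expand_after_ms, 30_000)
    disable_after = Keyword.get(opts, :restart_claim_gate_disable_after_ms, 120_000)

    cond do
      age_ms >= disable_after -> :all_eligible
      age_ms >= expand_after -> expanded
      true -> preferred
    end
  end
end
```

Jittering the first sweep keeps nodes that boot together (for example during a deploy) from all scanning storage at the same instant, while the widening gate keeps contention low for fresh keys but still guarantees that an old orphan eventually becomes claimable by any node.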
Examples
# Basic usage
{DurableServer.Supervisor, name: MyApp.DurableSup, prefix: "myapp/"}
# With custom intervals
{DurableServer.Supervisor,
name: MyApp.DurableSup,
prefix: "myapp/",
discovery_interval_ms: 30_000,
heartbeat_interval_ms: 15_000}
# With an explicit backend module
{DurableServer.Supervisor,
name: MyApp.DurableSup,
prefix: "myapp/",
backend:
{DurableServer.Backends.ObjectStore,
[
bucket: "my-bucket",
region: "iad"
]}}
# With resource limits
{DurableServer.Supervisor,
name: MyApp.DurableSup,
prefix: "myapp/",
max_cpu: 80,
max_memory: 85,
max_disk: {90, "/data"}}
# With init_info for passing dependencies to servers
{DurableServer.Supervisor,
name: MyApp.DurableSup,
prefix: "myapp/",
init_info: %{api_client: MyApp.APIClient, pubsub: MyApp.PubSub}}
# Start a server
{:ok, {pid, _meta}} = DurableServer.Supervisor.start_child(
MyApp.DurableSup,
{MyUserServer, key: "user_123", initial_state: %{name: "Alice"}}
)
# Terminate a specific server
DurableServer.Supervisor.terminate_child(pid)
Summary
Functions
Returns the count of currently running DurableServer processes.
Returns the current capacity map for this supervisor.
Ensures a DurableServer child process is started under this supervisor.
Gets all cluster nodes from the heartbeat cache with their heartbeat metadata.
Gets detailed information about a server from storage.
Gets all global members matching this supervisor name on the cluster along with their metadata.
Looks up a global durable server by key.
Gets the unique node reference for this supervisor instance.
Checks if the DurableServer.Supervisor is ready to handle requests.
Rehomes a DurableServer child to a different node, bypassing sticky placement.
Starts a DurableServer child process under this supervisor.
Starts a DurableServer.Supervisor with the given options.
Streams server info for all servers in storage.
Terminates a DurableServer child process AND deletes its object storage.
Terminates a specific DurableServer child process gracefully.
Terminates a specific DurableServer child process gracefully and unmarks it for permanent restart.
Blocks until the supervisor is ready or timeout expires.
Lists all currently running DurableServer child processes on this node's supervisor.
Functions
Returns the count of currently running DurableServer processes.
Returns the current capacity map for this supervisor.
Returns a map with :total (total children across all modules) and per-module capacity information,
or nil if no limits are configured.
Examples
iex> current_capacity(MySupervisor)
%{
:total => %{current: 50, limit: 100},
MyModule => %{current: 10, limit: 20}
}
iex> current_capacity(UnlimitedSupervisor)
nil
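A caller can turn the capacity map into remaining headroom per entry, for example to decide whether to try another supervisor first. This is a hypothetical helper that assumes the %{current: _, limit: _} entry shape shown in the example above; treating a :infinity limit as unbounded is an assumption, not documented behavior.

```elixir
defmodule CapacitySketch do
  # nil means no limits are configured for this supervisor.
  def headroom(nil), do: :unlimited

  def headroom(capacity) when is_map(capacity) do
    Map.new(capacity, fn
      # Assumed: an :infinity limit means the entry is unbounded.
      {name, %{limit: :infinity}} -> {name, :infinity}
      # Clamp at zero in case current briefly exceeds the limit.
      {name, %{current: current, limit: limit}} -> {name, max(limit - current, 0)}
    end)
  end
end
```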
Ensures a DurableServer child process is started under this supervisor.
Unlike start_child/2, this function first checks the registry for an existing
process before attempting to start a new one. This is useful when you want to
ensure a process exists but don't know if it's already running.
The child spec is {Module, key: key, initial_state: initial_state}.
:initial_state is required and must be a map. If a new process is started
and no persisted state exists yet, DurableServer passes :initial_state
through the module's dump_state/1, the configured backend's encode/decode
path, and then load_state/2 before init/1 or init/2. This means the
dumped initial state must be encodable by the configured backend, and
load_state/2 receives the backend-decoded shape.
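The practical consequence is that load_state/2 must accept whatever the backend hands back, which may differ from the map originally passed as :initial_state. The sketch below assumes a JSON-style backend that returns top-level keys as strings; InitialStateSketch, its stand-in codec, and the second argument of load_state are hypothetical and only mirror the documented order of steps.

```elixir
defmodule InitialStateSketch do
  # Hypothetical stand-in for a JSON-style backend codec: after a round trip,
  # top-level atom keys come back as strings.
  def encode_decode(state) when is_map(state) do
    Map.new(state, fn {k, v} -> {to_string(k), v} end)
  end

  # Example callbacks for a server whose state is %{value: integer}.
  def dump_state(%{value: value}), do: %{value: value}

  # load_state/2 must accept the backend-decoded (string-keyed) shape.
  def load_state(%{"value" => value}, _extra), do: %{value: value}

  # Mirrors the documented order: dump_state/1, encode/decode, load_state/2.
  def bootstrap(initial_state, extra) do
    initial_state
    |> dump_state()
    |> encode_decode()
    |> load_state(extra)
  end
end
```

A load_state/2 clause that only matched atom keys would fail here on the very first start, before any state has been persisted.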
Options
- :local_only - When true, the child will only be started on the local node. Skips sticky placement preferences and never attempts remote placement. If the local node is at capacity, returns {:error, {:capacity_limit, reason}}. Default: false.
- :max_placement_retries - Maximum number of remote nodes to try when local placement fails due to capacity limits. Default: 3. Ignored when local_only: true.
- :placement_timeout - Maximum time in milliseconds to keep retrying remote placement. When set, if all placement attempts fail, retries with fresh eligible nodes every 500 ms until the deadline. Default: nil (no retry).
- :timeout - Maximum total time in milliseconds to wait for the process to be found or bootstrapped. Returns {:error, :timeout} on expiration. Set to :infinity to disable. Default: 5000 ms.
Returns
- {:ok, {pid, meta}} - Process is running (either found or newly started)
- {:error, reason} - Failed to start the process
Examples
# Will start if not running, or return existing process
{:ok, {pid, meta}} = DurableServer.Supervisor.ensure_started_child(
MyApp.DurableSup,
{MyServer, key: "server_1", initial_state: %{initial_value: 42}}
)
# Ensure locally only — never attempt remote placement
{:ok, {pid, meta}} = DurableServer.Supervisor.ensure_started_child(
MyApp.DurableSup,
{MyServer, key: "server_1", initial_state: %{}},
local_only: true
)
# Calling again returns the same process
{:ok, {^pid, ^meta}} = DurableServer.Supervisor.ensure_started_child(
MyApp.DurableSup,
{MyServer, key: "server_1", initial_state: %{initial_value: 42}}
)
Gets all cluster nodes from the heartbeat cache with their heartbeat metadata.
Returns a map of node names to node info maps containing heartbeat_meta.
Examples
iex> get_cluster_nodes(MyApp.DurableSupervisor)
%{
"node1@host" => %{heartbeat_meta: %{"region" => "ord"}},
"node2@host" => %{heartbeat_meta: nil}
}
Gets detailed information about a server from storage.
Returns a rich map with server information regardless of whether the server is currently running. This is useful for admin dashboards, debugging, and re-homing decisions.
Return Value
Returns {:ok, info_map} on success or {:error, :not_found} if the server
doesn't exist in storage.
The info map contains:
- :key - The server's unique key
- :module - The DurableServer module
- :vsn - The state version
- :status - Server status (:running, :stopped_graceful, :crashed, etc.)
- :permanent - Whether the server is marked as permanent
- :last_heartbeat_at - Timestamp of the last heartbeat (milliseconds)
- :node - The node where the server last ran (from storage)
- :sticky_placement - Current placement values (where it last ran)
- :sticky_placement_history - History of placement changes (most recent first)
- :crash_history - List of crash entries (most recent first), each with :timestamp and :reason
- :user_state - The raw user state (JSON decoded from storage)
- :pid - PID if currently running, nil otherwise
- :running - Boolean indicating whether the server is currently running
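Because :crash_history is ordered most recent first, a dashboard can check whether a server is close to the :crash_threshold_count / :crash_threshold_window_ms limit by walking the list until entries fall outside the window. This is a hypothetical helper, not the LifecycleManager's actual rule; it assumes :timestamp is in milliseconds on the same clock as now_ms and that reaching the count within the window is what trips the threshold.

```elixir
defmodule CrashThresholdSketch do
  # crash_history is most-recent-first; each entry carries :timestamp (ms).
  def over_threshold?(crash_history, now_ms, opts \\ []) do
    count = Keyword.get(opts, :crash_threshold_count, 5)
    window = Keyword.get(opts, :crash_threshold_window_ms, 3_600_000)
    cutoff = now_ms - window

    # Stop at the first entry older than the window; the rest are older still.
    recent =
      crash_history
      |> Enum.take_while(fn %{timestamp: ts} -> ts >= cutoff end)
      |> length()

    recent >= count
  end
end
```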
Placement History
The sticky_placement_history tracks placement changes over time. Each entry
contains an :at timestamp and :placement values. Only unique placements are
recorded (no duplicates when placement doesn't change). The history is capped
at a configurable limit (default 5), with oldest entries pruned first.
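The recording rule described above (skip unchanged placements, prepend changes, prune the oldest beyond the limit) can be sketched like this. PlacementHistorySketch is hypothetical; it only assumes the %{at: _, placement: _} entry shape shown in the examples.

```elixir
defmodule PlacementHistorySketch do
  # Prepend a new entry only when the placement actually changed, then cap
  # the list so the oldest entries (at the tail) are pruned first.
  def record(history, placement, at, limit \\ 5)

  # Same placement as the most recent entry: nothing to record.
  def record([%{placement: placement} | _] = history, placement, _at, _limit), do: history

  def record(history, placement, at, limit) do
    Enum.take([%{at: at, placement: placement} | history], limit)
  end
end
```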
The first entry is the most recent placement, and the last entry is the oldest known placement (which may be the original if history hasn't been pruned):
{:ok, info} = DurableServer.Supervisor.get_server_info(MySup, "user_123")

case info.sticky_placement_history do
  [current | _rest] ->
    # current.placement is where it last ran (or is running now)
    # current.at is when it moved there
    {:moved_at, current.at, current.placement}

  [] ->
    # No placement history (no sticky config or new server)
    :no_history
end
Examples
iex> DurableServer.Supervisor.get_server_info(MyDurableSup, "user_123")
{:ok, %{
key: "user_123",
module: MyServer,
vsn: 1,
status: :running,
permanent: true,
last_heartbeat_at: 1704067200000,
node: "node1@host",
sticky_placement: [%{env_var: "FLY_REGION", value: "sjc"}],
sticky_placement_history: [
%{at: 1704067200000, placement: [%{env_var: "FLY_REGION", value: "sjc"}]},
%{at: 1704000000000, placement: [%{env_var: "FLY_REGION", value: "ord"}]}
],
user_state: %{"count" => 42},
pid: #PID<0.123.0>,
running: true
}}
iex> DurableServer.Supervisor.get_server_info(MyDurableSup, "nonexistent")
{:error, :not_found}
Gets all global members matching this supervisor name on the cluster along with their metadata.
Returns a map of all members in the form %{key => {pid, meta}}.
Examples
# Get all members for a supervisor
DurableServer.Supervisor.global_members(MySup)
#=> %{"user_1" => {#PID<0.123.0>, %{...}}, "user_2" => {#PID<0.124.0>, %{...}}}
# Get only members for a specific module
DurableServer.Supervisor.global_members(MySup, MyServer)
#=> %{"user_1" => {#PID<0.123.0>, %{...}}}
Looks up a global durable server by key.
Note: the provided key is not prefixed – the configured supervisor prefix will automatically be applied when looking up the key from underlying storage.
Examples
{DurableServer.Supervisor, name: MyDurableSup, prefix: "myapp/"}
{:ok, {pid, _meta}} = DurableServer.Supervisor.start_child(
MyDurableSup,
{Counter, key: "counter123", initial_state: %{value: 0}}
)
iex> {pid, _meta} = DurableServer.Supervisor.lookup(MyDurableSup, "counter123")
Gets the unique node reference for this supervisor instance.
Note: other nodes may call this function via RPC while this supervisor is still creating its ETS table or inserting its config; both races are handled explicitly.
The node_ref is used to detect when a node has been restarted, so that PID reuse cannot make stale locks appear valid. Each supervisor maintains its own node_ref in ETS, which is cleaned up when the supervisor exits.
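The idea behind node_ref can be sketched with a fresh make_ref/0 stored in ETS at startup and a lock check that compares references rather than PIDs. NodeRefSketch and its functions are hypothetical; in particular, how the real supervisor serializes the ref into storage is not shown here. The rescue clause illustrates handling an RPC caller that arrives before the table exists.

```elixir
defmodule NodeRefSketch do
  # Each supervisor start generates a fresh reference and stores it in ETS.
  def init_table do
    table = :ets.new(:node_ref_sketch, [:set, :public, read_concurrency: true])
    true = :ets.insert(table, {:node_ref, make_ref()})
    table
  end

  # Remote callers may arrive before the table exists (ArgumentError) or
  # before the insert (empty lookup); treat both as "not ready".
  def node_ref(table) do
    case :ets.lookup(table, :node_ref) do
      [{:node_ref, ref}] -> {:ok, ref}
      [] -> {:error, :not_ready}
    end
  rescue
    ArgumentError -> {:error, :not_ready}
  end

  # A lock is trusted only if it was taken under the current node_ref, so a
  # reused PID on a restarted node cannot revive a stale lock.
  def lock_valid?(%{node_ref: lock_ref}, current_ref), do: lock_ref == current_ref
end
```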
Checks if the DurableServer.Supervisor is ready to handle requests.
Returns true once the supervisor and its lifecycle manager child are
registered, false otherwise.
This is safe to call at any time, even if the supervisor hasn't started yet.
Rehomes a DurableServer child to a different node, bypassing sticky placement.
This is useful for manual rebalancing or administrative operations. The operation:
- Terminates the process gracefully on its current node (if running)
- Starts the process on the target node (or any eligible node if no target specified)
Parameters
- supervisor - The DurableServer.Supervisor name
- child_spec - The child spec tuple {module, key: "...", initial_state: %{...}}
- opts - Options:
  - :target_node - Specific node atom to place on (optional, defaults to the best available node)
  - :force - If true, ignore sticky placement entirely (default: true)
Returns
- {:ok, {pid, meta}} - Successfully rehomed the process
- {:error, reason} - Failed to rehome
Examples
# Rehome to a specific node
{:ok, {pid, meta}} = DurableServer.Supervisor.rehome_child(
MySup,
{MyServer, key: "server_1", initial_state: %{}},
target_node: :"node2@host"
)
# Rehome to any available node (ignoring sticky placement)
{:ok, {pid, meta}} = DurableServer.Supervisor.rehome_child(
MySup,
{MyServer, key: "server_1", initial_state: %{}}
)
Starts a DurableServer child process under this supervisor.
The child spec is {Module, key: key, initial_state: initial_state}.
:initial_state is required and must be a map. Before the first init/1 or
init/2 call, DurableServer passes it through the module's dump_state/1,
the configured backend's encode/decode path, and then load_state/2. This
means the dumped initial state must be encodable by the configured backend,
and load_state/2 receives the backend-decoded shape.
Options
- :local_only - When true, the child will only be started on the local node. If the local node is at capacity, returns {:error, {:capacity_limit, reason}} instead of attempting remote placement. Default: false.
- :max_placement_retries - Maximum number of remote nodes to try when local placement fails due to capacity limits. Default: 3. Ignored when local_only: true.
- :placement_timeout - Maximum time in milliseconds to keep retrying remote placement. If all placement attempts fail, the caller retries with fresh eligible nodes every 500 ms until the deadline. Useful during rolling deploys when nodes are temporarily unavailable. Set to nil to disable. Default: 15000 ms.
- :timeout - Maximum total time in milliseconds to wait for the child bootstrap to complete, including internal retries. Returns {:error, :timeout} on expiration. Set to :infinity to disable. Default: 5000 ms.
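The :placement_timeout behavior (retry every 500 ms with fresh eligible nodes until a deadline, or a single attempt when nil) follows a common deadline-retry pattern. The sketch below is a hypothetical model of that loop, not the library's code: attempt_fun stands in for one round of placement that re-selects eligible nodes each time it is called, and the loop gives up rather than sleep past the deadline.

```elixir
defmodule PlacementRetrySketch do
  # nil disables retrying: exactly one attempt.
  def retry_placement(attempt_fun, nil, _interval_ms), do: attempt_fun.()

  def retry_placement(attempt_fun, timeout_ms, interval_ms) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    loop(attempt_fun, deadline, interval_ms)
  end

  defp loop(attempt_fun, deadline, interval_ms) do
    case attempt_fun.() do
      {:ok, _} = ok ->
        ok

      {:error, _} = error ->
        # Give up if the next attempt would start after the deadline.
        if System.monotonic_time(:millisecond) + interval_ms > deadline do
          error
        else
          Process.sleep(interval_ms)
          loop(attempt_fun, deadline, interval_ms)
        end
    end
  end
end
```

With the documented values this would be called as retry_placement(attempt_fun, 15_000, 500). Using a monotonic clock keeps the deadline correct even if wall-clock time jumps during a deploy.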
Examples
# Start with init args
{:ok, {pid, meta}} = DurableServer.Supervisor.start_child(
MyApp.DurableSup,
{MyServer, key: "server_1", initial_state: %{initial_value: 42}}
)
# Start locally only — never attempt remote placement
{:ok, {pid, meta}} = DurableServer.Supervisor.start_child(
MyApp.DurableSup,
{MyServer, key: "server_1", initial_state: %{}},
local_only: true
)
# Retry placement for up to 15 seconds during rolling deploys
{:ok, {pid, meta}} = DurableServer.Supervisor.start_child(
MyApp.DurableSup,
{MyServer, key: "server_1", initial_state: %{}},
placement_timeout: 15_000
)
# The server module must use DurableServer
defmodule MyServer do
use DurableServer, vsn: 1
def init(%{initial_value: value}, info) do
{:ok, %{value: value, key: info.key}, meta: %{my: "meta"}}
end
end
Starts a DurableServer.Supervisor with the given options.
Options
- :name - Required. The registered name for this supervisor
- :prefix - Required. Object storage prefix (should end with "/")
- :max_children - Maximum concurrent children (default: :infinity)
- :discovery_interval_ms - Lifecycle discovery interval (default: 60_000)
- :initial_discovery_delay_ms - Initial discovery delay as a fixed integer or {min_ms, max_ms} jitter tuple (default: {1_000, 6_000})
- :discovery_shuffle_batch_size - Discovery shuffle batch size (default: 20_000)
- :parallel_restart_batch_size - Concurrent restart attempts per node (default: 50)
- :restart_start_timeout_ms - Timeout for LM-owned claimed restarts (default: 30_000)
- :restart_claim_preferred_fanout - Initial restart claim contention fanout (default: 2)
- :restart_claim_expanded_fanout - Expanded restart claim contention fanout (default: 4)
- :restart_claim_gate_expand_after_ms - Age before widening claim fanout (default: 30_000)
- :restart_claim_gate_disable_after_ms - Age before disabling the claim gate (default: 120_000)
- :heartbeat_interval_ms - Node heartbeat interval (default: 10_000)
- :heartbeat_staleness_threshold_ms - Node heartbeat stale/orphan threshold (default: 30_000)
- :heartbeat_tracking_mode - Heartbeat cache strategy: :poll or :subscribe
- :heartbeat_reconcile_interval_ms - Full heartbeat cache reconcile interval
- :dead_node_threshold_ms - Dead node cleanup threshold (default: 300_000)
- :crash_threshold_count - Crashes before permanent crash (default: 5)
- :crash_threshold_window_ms - Crash threshold window (default: 3_600_000)
- :module_circuit_breaker_count - Module crash limit (default: 50)
- :module_circuit_breaker_window_ms - Module circuit breaker window (default: 300_000)
- :module_circuit_breaker_cooldown_ms - Module circuit breaker cooldown (default: 600_000)
- :backend - Optional storage backend spec: {BackendModule, opts} or a pre-initialized %DurableServer.StorageBackend{}
- :object_store - Legacy object storage config (used when :backend is not set)
- :init_info - Map of user-defined data passed to each server's init/2 callback (default: %{})
- :placement_region - Optional region label used for placement timeout tuning.
- :placement_erpc_timeout_same_region_ms - Same-region remote placement ERPC timeout in ms. Default: 3000
- :placement_erpc_timeout_cross_region_ms - Cross-region remote placement ERPC timeout in ms. Default: 8000
- :max_singleflight_waiters_per_key_module - Per-{key, module} cap for concurrent ensure_started_child/3 waiters. Calls beyond the cap fail fast with {:error, :singleflight_overloaded}. Default: 50_000. Set to nil to disable.
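The region-based timeout selection can be sketched from the documented rules: the local :placement_region is compared with the target's "placement_region" heartbeat metadata, and a different or unknown region gets the cross-region timeout. PlacementTimeoutSketch is a hypothetical helper, not library API.

```elixir
defmodule PlacementTimeoutSketch do
  # Pick the ERPC timeout for a remote placement call from the target node's
  # heartbeat metadata. An unknown region on either side counts as cross-region.
  def erpc_timeout(local_region, target_heartbeat_meta, opts \\ []) do
    same = Keyword.get(opts, :placement_erpc_timeout_same_region_ms, 3_000)
    cross = Keyword.get(opts, :placement_erpc_timeout_cross_region_ms, 8_000)
    # Heartbeat metadata keys are strings after JSON serialization.
    target_region = Map.get(target_heartbeat_meta || %{}, "placement_region")

    if is_binary(local_region) and local_region == target_region, do: same, else: cross
  end
end
```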
Streams server info for all servers in storage.
Returns a Stream that yields info maps for each server found in storage. Failed fetches are filtered out. Excludes internal node metadata objects.
This is useful for admin dashboards that need to iterate over all servers without loading everything into memory at once.
Examples
# Stream all servers
DurableServer.Supervisor.stream_all_server_info(MySup)
|> Enum.to_list()
# Stream only permanently crashed servers
DurableServer.Supervisor.stream_all_server_info(MySup)
|> Stream.filter(fn info -> info.status == :permanently_crashed end)
|> Enum.to_list()
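Since the function returns a stream, a dashboard can aggregate it without first collecting every info map. The helpers below are hypothetical; they only rely on the :key and :status fields documented for the info map, and accept any enumerable (the stream, or a plain list in tests).

```elixir
defmodule ServerInfoSummarySketch do
  # Count servers per status, consuming the stream one element at a time.
  def status_counts(info_stream), do: Enum.frequencies_by(info_stream, & &1.status)

  # Keys of servers that need attention, e.g. for an admin dashboard.
  def permanently_crashed_keys(info_stream) do
    info_stream
    |> Stream.filter(&(&1.status == :permanently_crashed))
    |> Enum.map(& &1.key)
  end
end
```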
Terminates a DurableServer child process AND deletes its object storage.
This permanently removes the server and all its persisted state. The operation:
- Finds the running process (if any) by PID or key
- Terminates the process gracefully (allowing final state sync)
- Deletes the object storage data
Parameters
- supervisor - The DurableServer.Supervisor name
- pid_or_key - Either the PID of the running process or the key string
Returns
- :ok - Successfully terminated the process and deleted its storage
- {:error, reason} - Failed to delete (the process may still have been terminated)
Examples
# Delete by PID
:ok = DurableServer.Supervisor.terminate_and_delete_child(MySup, pid)
# Delete by key
:ok = DurableServer.Supervisor.terminate_and_delete_child(MySup, "user_123")
Terminates a specific DurableServer child process gracefully.
The child will be given time to sync its state before termination.
Terminates a specific DurableServer child process gracefully and unmarks it for permanent restart.
Use this to stop a durable server that was previously started as permanent, so that it is no longer considered a candidate for permanent restart.
The child will be given time to sync its state before termination.
Blocks until the supervisor is ready or timeout expires.
Returns :ok if ready, {:error, :timeout} if timeout expires.
This is intended to be called via RPC on a node that may still be booting. The remote node will block until its supervisor is ready, preventing ETS errors from concurrent access during startup.
Options
- :timeout - Maximum time to wait in milliseconds (default: 5000)
- :poll_interval - How often to check readiness in milliseconds (default: 100)
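The documented :timeout / :poll_interval behavior corresponds to a simple poll-until-deadline loop. The sketch below is a hypothetical model: ready_fun stands in for the readiness check described earlier in this module, and WaitUntilReadySketch is not part of the library.

```elixir
defmodule WaitUntilReadySketch do
  # Poll `ready_fun` until it returns true or the timeout expires.
  def wait(ready_fun, opts \\ []) do
    timeout = Keyword.get(opts, :timeout, 5_000)
    poll_interval = Keyword.get(opts, :poll_interval, 100)
    deadline = System.monotonic_time(:millisecond) + timeout
    poll(ready_fun, deadline, poll_interval)
  end

  defp poll(ready_fun, deadline, poll_interval) do
    cond do
      ready_fun.() ->
        :ok

      System.monotonic_time(:millisecond) >= deadline ->
        {:error, :timeout}

      true ->
        Process.sleep(poll_interval)
        poll(ready_fun, deadline, poll_interval)
    end
  end
end
```

Running this loop on the booting node itself (rather than polling from the caller over repeated RPCs) means the caller makes one remote call and blocks, which matches the intended use described above.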
Lists all currently running DurableServer child processes on this node's supervisor.