DurableServer.Supervisor (durable_server v0.1.1)

Supervisor for DurableServer processes with lifecycle management and graceful shutdown.

DurableServer.Supervisor provides a scoped environment for managing DurableServer processes similar to how Task.Supervisor manages Task processes. Each supervisor instance maintains its own lifecycle manager, heartbeat system, and object storage namespace, preventing conflicts between different applications or components.

Usage

Start a DurableServer.Supervisor in your application supervision tree:

children = [
  {DurableServer.Supervisor, name: MyApp.DurableSup, prefix: "myapp/"}
]

Then start DurableServer processes through the supervisor:

DurableServer.Supervisor.start_child(
  MyApp.DurableSup,
  {MyServer, key: "user_123", initial_state: %{}}
)

Architecture

Each DurableServer.Supervisor creates the following supervision tree:

MyApp.DurableSup
 TaskSupervisor          # The task supervisor for async internal operations
 DynamicSupervisor       # The supervisor for all `DurableServer` processes on this node
 SingleflightGuard       # Guard table sweeper for ensure_started waiters
 LifecycleManager        # Monitors and restarts crashed servers
 Terminator              # Coordinates graceful shutdown

Components

LifecycleManager: Automatically detects and restarts crashed or orphaned DurableServer processes within this supervisor's scope. Uses object storage queries and node heartbeats to identify servers that need restart.

SingleflightGuard: Maintains and sweeps the per-key/module waiter guard table used by ensure_started_child/3 overload protection.

Terminator: Handles graceful shutdown by instructing all DurableServer processes to sync their state before termination. Waits for confirmation (up to a timeout) before allowing the supervisor to shut down.

Object Storage Scoping

Each supervisor uses a unique prefix for object storage to prevent naming conflicts:

prefix: "myapp/"
# Results in keys like: myapp/user_123, myapp/session_456

Node Heartbeats

The LifecycleManager maintains node-level heartbeats in object storage at {prefix}nodes/{node_name} and caches them locally for efficient health checking during restart decisions.
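
As a rough illustration of the scoping and heartbeat layout described above, the sketch below builds prefixed keys and applies a staleness check. The module `ScopeSketch` and its functions are hypothetical and not part of DurableServer. The 30_000 ms default mirrors the documented :heartbeat_staleness_threshold_ms, and the strict "older than" comparison is an assumption.

```elixir
defmodule ScopeSketch do
  # Hypothetical helpers; not part of the DurableServer API.

  # Object key for a server: the supervisor prefix followed by the user key.
  def object_key(prefix, key), do: prefix <> key

  # Heartbeat key for a node, following the {prefix}nodes/{node_name} layout.
  def heartbeat_key(prefix, node_name), do: prefix <> "nodes/" <> to_string(node_name)

  # Assumption: a node is stale once its last successful heartbeat is
  # strictly older than the threshold.
  def stale?(last_success_ms, now_ms, threshold_ms \\ 30_000) do
    now_ms - last_success_ms > threshold_ms
  end
end
```

For example, `ScopeSketch.object_key("myapp/", "user_123")` yields `"myapp/user_123"`, matching the key layout shown in Object Storage Scoping.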

Configuration Options

  • :name - Required. Registered name for this supervisor instance
  • :prefix - Required. Object storage prefix for scoping (should end with "/")
  • :max_children - Maximum concurrent DurableServer processes (default: :infinity)
  • :discovery_interval_ms - How often to scan for orphaned servers (default: 60_000)
  • :initial_discovery_delay_ms - Initial delay before the first discovery sweep. Accepts either a fixed integer delay or a {min_ms, max_ms} jitter tuple (default: {1_000, 6_000})
  • :discovery_burst_count - Number of initial discovery sweeps to run back-to-back without waiting for the discovery interval (default: 3)
  • :discovery_shuffle_batch_size - Number of candidate keys to accumulate before shuffling restart order (default: 20_000)
  • :parallel_restart_batch_size - Number of restart attempts to run concurrently per node during a discovery sweep (default: 50)
  • :restart_start_timeout_ms - Maximum time to wait for a claimed LM restart to finish bootstrapping before treating the outcome as unknown (default: 30_000)
  • :restart_claim_preferred_fanout - Number of eligible nodes allowed to contend for a freshly restartable key before widening (default: 2)
  • :restart_claim_expanded_fanout - Number of eligible nodes allowed to contend after the first restart-gate age threshold (default: 4)
  • :restart_claim_gate_expand_after_ms - Age after which restart contention widens from preferred to expanded fanout (default: 30_000)
  • :restart_claim_gate_disable_after_ms - Age after which the restart contention gate is disabled and all eligible nodes may contend (default: 120_000)
  • :heartbeat_interval_ms - How often to write node heartbeats (default: 10_000)
  • :heartbeat_staleness_threshold_ms - How long a node heartbeat may go without success before the node is considered stale/orphan-claimable (default: 30_000)
  • :heartbeat_tracking_mode - Heartbeat cache strategy: :poll or :subscribe. Defaults from backend capabilities.
  • :heartbeat_reconcile_interval_ms - Full heartbeat cache reconcile interval used in :subscribe mode (default from backend capabilities).
  • :dead_node_threshold_ms - How long before a node is considered permanently dead and cleaned up (default: 86_400_000 = 24 hours)
  • :crash_threshold_count - Number of crashes before marking object as permanently crashed (default: 5)
  • :crash_threshold_window_ms - Time window for crash threshold counting (default: 3_600_000 = 1 hour)
  • :module_circuit_breaker_count - Module-wide crash limit before circuit breaker opens (default: 50)
  • :module_circuit_breaker_window_ms - Time window for module circuit breaker (default: 300_000 = 5 minutes)
  • :module_circuit_breaker_cooldown_ms - Cooldown period when module circuit breaker opens (default: 600_000 = 10 minutes)
  • :global_lock_failure_count - Supervisor-wide lock race threshold before the global lock circuit breaker opens (default: 100)
  • :global_lock_failure_window_ms - Time window for the global lock circuit breaker threshold (default: 30_000 = 30 seconds)
  • :global_lock_failure_cooldown_ms - Cooldown period when the global lock circuit breaker opens (default: 60_000 = 1 minute)
  • :backend - Optional storage backend spec: {BackendModule, opts} or a pre-initialized %DurableServer.StorageBackend{}
  • :object_store - Legacy object storage config (used when :backend is not set)
  • :max_cpu - Maximum CPU usage percentage before rejecting new children on this node. Values above 100 are valid since CPU load can exceed 100% when the run queue is larger than the core count. When CPU usage reaches this threshold, new placements will be routed to other nodes.
  • :max_memory - Maximum memory usage percentage (1-100) before rejecting new children on this node. When memory usage reaches this threshold, new placements will be routed to other nodes.
  • :max_disk - Maximum disk usage as {percent, mount_point} tuple (e.g., {90, "/data"}). When disk usage on the specified mount point reaches the threshold, new placements will be routed to other nodes. Unlike CPU and memory limits, disk limits are bypassed for sticky restarts (children returning to their previous node) since part of the disk usage is the child's own data.
  • :heartbeat_meta - Optional node metadata as a map or zero-arity function returning a map. Metadata is included in heartbeats and can be queried via get_cluster_nodes/1 for admin dashboards or other informational purposes. Keys are converted to strings during JSON serialization. Example: heartbeat_meta: %{"app" => "myapp"} or heartbeat_meta: fn -> %{"deployment" => "bluegreen"} end
  • :placement_region - Optional region label used for placement timeout tuning. This value is written to heartbeat metadata as "placement_region" and used to detect same-region vs cross-region placement calls.
  • :placement_erpc_timeout_same_region_ms - Timeout for remote placement ERPC calls when target node is in the same placement_region. Default: 3_000
  • :placement_erpc_timeout_cross_region_ms - Timeout for remote placement ERPC calls when target node is in a different/unknown placement_region. Default: 8_000
  • :max_singleflight_waiters_per_key_module - Per {key, module} cap for concurrent ensure_started_child/3 waiters. Calls beyond the cap fail fast with {:error, :singleflight_overloaded}. Default: 50_000. Set to nil to disable.
  • :sticky_placement_history_limit - Maximum number of placement history entries to keep per server (default: 5). History tracks unique placement changes over time, useful for identifying displaced servers and re-homing decisions. Oldest entries are pruned first.
  • :init_info - A map of user-defined data passed to each DurableServer's init/2 callback. Use this to provide shared configuration, API clients, or other dependencies to all servers managed by this supervisor. The map is merged with built-in keys (:supervisor, :task_supervisor, :dynamic_supervisor). Example: init_info: %{api_client: MyApp.API}
  • :group - Options to pass to Group
    • :shards - The number of group shards. Defaults to 8
    • :log - The log level. One of false, :info, or :verbose. Defaults to :info.
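
The four :restart_claim_* options interact, so a small model may help. The sketch below is one possible reading of the restart claim gate, not the library's implementation: `FanoutSketch` is a hypothetical module, its shortened option names are illustrative, and the defaults mirror the documented values.

```elixir
defmodule FanoutSketch do
  # Hypothetical model of the restart claim gate; defaults mirror the documented options.
  @defaults [preferred: 2, expanded: 4, expand_after_ms: 30_000, disable_after_ms: 120_000]

  # Returns how many eligible nodes may contend for a key that has been
  # restartable for `age_ms`, or :all once the gate is disabled.
  def fanout(age_ms, opts \\ []) do
    opts = Keyword.merge(@defaults, opts)

    cond do
      age_ms >= opts[:disable_after_ms] -> :all
      age_ms >= opts[:expand_after_ms] -> opts[:expanded]
      true -> opts[:preferred]
    end
  end
end
```

Under these assumptions, a freshly restartable key is contended by 2 nodes, by 4 nodes after 30 seconds, and by every eligible node after 2 minutes.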

Examples

# Basic usage
{DurableServer.Supervisor, name: MyApp.DurableSup, prefix: "myapp/"}

# With custom intervals
{DurableServer.Supervisor,
 name: MyApp.DurableSup,
 prefix: "myapp/",
 discovery_interval_ms: 30_000,
 heartbeat_interval_ms: 15_000}

# With an explicit backend module
{DurableServer.Supervisor,
 name: MyApp.DurableSup,
 prefix: "myapp/",
 backend:
   {DurableServer.Backends.ObjectStore,
    [
      bucket: "my-bucket",
      region: "iad"
    ]}}

# With resource limits
{DurableServer.Supervisor,
 name: MyApp.DurableSup,
 prefix: "myapp/",
 max_cpu: 80,
 max_memory: 85,
 max_disk: {90, "/data"}}
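
To make the resource limit semantics concrete, here is a minimal sketch of an admission check, including the disk bypass for sticky restarts. `LimitSketch`, the `usage` map, and the `:cpu`/`:memory`/`:disk` reason atoms are assumptions for illustration; only the {:capacity_limit, reason} error shape comes from the documentation.

```elixir
defmodule LimitSketch do
  # Hypothetical admission check; `usage` holds current percentages.
  # Returns :ok or {:error, {:capacity_limit, resource}}.
  def admit(usage, limits, sticky_restart? \\ false) do
    cond do
      over?(usage[:cpu], limits[:max_cpu]) -> {:error, {:capacity_limit, :cpu}}
      over?(usage[:memory], limits[:max_memory]) -> {:error, {:capacity_limit, :memory}}
      # Disk limits are skipped for sticky restarts.
      not sticky_restart? and over?(usage[:disk], limits[:max_disk]) ->
        {:error, {:capacity_limit, :disk}}
      true -> :ok
    end
  end

  # :max_disk is a {percent, mount_point} tuple; only the percent is compared here.
  defp over?(used, {percent, _mount_point}), do: over?(used, percent)
  defp over?(used, limit) when is_number(used) and is_number(limit), do: used >= limit
  # Missing usage or an unset limit never rejects.
  defp over?(_used, _limit), do: false
end
```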

# With init_info for passing dependencies to servers
{DurableServer.Supervisor,
 name: MyApp.DurableSup,
 prefix: "myapp/",
 init_info: %{api_client: MyApp.APIClient, pubsub: MyApp.PubSub}}

# Start a server
{:ok, {pid, _meta}} = DurableServer.Supervisor.start_child(
  MyApp.DurableSup,
  {MyUserServer, key: "user_123", initial_state: %{name: "Alice"}}
)

# Terminate a specific server
DurableServer.Supervisor.terminate_child(pid)

Summary

Functions

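count_children(supervisor)
Returns the count of currently running DurableServer processes.

current_capacity(supervisor_name)
Returns the current capacity map for this supervisor.

ensure_started_child(supervisor, child_spec, opts \\ [])
Ensures a DurableServer child process is started under this supervisor.

get_cluster_nodes(supervisor_name)
Gets all cluster nodes from the heartbeat cache with their heartbeat metadata.

get_server_info(sup_name, key)
Gets detailed information about a server from storage.

global_members(sup_name)
Gets all global members matching this supervisor name on the cluster along with their metadata.

lookup(sup_name, key)
Looks up a global durable server by key.

node_ref(supervisor_name)
Gets the unique node reference for this supervisor instance.

ready?(supervisor_name)
Checks if the DurableServer.Supervisor is ready to handle requests.

rehome_child(supervisor, arg, opts \\ [])
Rehomes a DurableServer child to a different node, bypassing sticky placement.

start_child(supervisor, child_spec, opts \\ [])
Starts a DurableServer child process under this supervisor.

start_link(opts)
Starts a DurableServer.Supervisor with the given options.

stream_all_server_info(sup_name)
Streams server info for all servers in storage.

terminate_and_delete_child(supervisor, pid_or_key, timeout \\ 5000)
Terminates a DurableServer child process AND deletes its object storage.

terminate_child(supervisor_name, pid)
Terminates a specific DurableServer child process gracefully.

terminate_child_permanent(supervisor_name, pid)
Terminates a specific DurableServer child process gracefully, and unmarks it for permanent restart.

wait_until_ready(supervisor_name, opts \\ [])
Blocks until the supervisor is ready or timeout expires.

which_children(supervisor)
Lists all currently running DurableServer child processes on this node's supervisor.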

Functions

count_children(supervisor)

Returns the count of currently running DurableServer processes.

current_capacity(supervisor_name)

Returns the current capacity map for this supervisor.

Returns a map with :total (total children across all modules) and per-module capacity information, or nil if no limits are configured.

Examples

iex> current_capacity(MySupervisor)
%{
  :total => %{current: 50, limit: 100},
  MyModule => %{current: 10, limit: 20}
}

iex> current_capacity(UnlimitedSupervisor)
nil
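
A capacity map of this shape can be turned into remaining headroom with a small helper. The sketch below is hypothetical (`CapacitySketch` is not part of DurableServer) and assumes every entry carries integer :current and :limit values, as in the example above.

```elixir
defmodule CapacitySketch do
  # Hypothetical helper operating on the documented capacity map shape.
  # Returns remaining slots per entry, or :unlimited when no limits are configured.
  def remaining(nil), do: :unlimited

  def remaining(capacity) when is_map(capacity) do
    Map.new(capacity, fn {scope, %{current: current, limit: limit}} ->
      # Clamp at zero in case the count briefly exceeds the limit.
      {scope, max(limit - current, 0)}
    end)
  end
end
```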

ensure_started_child(supervisor, child_spec, opts \\ [])

Ensures a DurableServer child process is started under this supervisor.

Unlike start_child/2, this function first checks the registry for an existing process before attempting to start a new one. This is useful when you want to ensure a process exists but don't know if it's already running.

The child spec is {Module, key: key, initial_state: initial_state}. :initial_state is required and must be a map. If a new process is started and no persisted state exists yet, DurableServer passes :initial_state through the module's dump_state/1, the configured backend's encode/decode path, and then load_state/2 before init/1 or init/2. This means the dumped initial state must be encodable by the configured backend, and load_state/2 receives the backend-decoded shape.

Options

  • :local_only - When true, the child will only be started on the local node. Skips sticky placement preferences and never attempts remote placement. If the local node is at capacity, returns {:error, {:capacity_limit, reason}}. Default: false.
  • :max_placement_retries - Maximum number of remote nodes to try when local placement fails due to capacity limits. Default: 3. Ignored when local_only: true.
  • :placement_timeout - Maximum time in milliseconds to keep retrying remote placement. When set, if all placement attempts fail, retries with fresh eligible nodes every 500ms until the deadline. Default: nil (no retry).
  • :timeout - Maximum total time in milliseconds to wait for the process to be found or bootstrapped. Returns {:error, :timeout} on expiration. Set to :infinity to disable. Default: 5000ms.

Returns

  • {:ok, {pid, meta}} - Process is running (either found or newly started)
  • {:error, reason} - Failed to start the process
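
Callers usually need to distinguish the documented error shapes. The following sketch classifies a result tuple; `ResultSketch` and the atoms it returns (`:retry_later`, `:shed_load`, and so on) are hypothetical names chosen for illustration, while the matched tuples are the ones described in this module's documentation.

```elixir
defmodule ResultSketch do
  # Hypothetical classifier for ensure_started_child/3 results.
  def describe({:ok, {pid, _meta}}) when is_pid(pid), do: {:running, pid}
  # The overall :timeout elapsed before the process was found or bootstrapped.
  def describe({:error, :timeout}), do: :retry_later
  # Too many concurrent waiters for the same {key, module}.
  def describe({:error, :singleflight_overloaded}), do: :shed_load
  # Returned with local_only: true when the local node is at capacity.
  def describe({:error, {:capacity_limit, reason}}), do: {:no_capacity, reason}
  # Any other failure is passed through.
  def describe({:error, reason}), do: {:failed, reason}
end
```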

Examples

# Will start if not running, or return existing process
{:ok, {pid, meta}} = DurableServer.Supervisor.ensure_started_child(
  MyApp.DurableSup,
  {MyServer, key: "server_1", initial_state: %{initial_value: 42}}
)

# Ensure locally only — never attempt remote placement
{:ok, {pid, meta}} = DurableServer.Supervisor.ensure_started_child(
  MyApp.DurableSup,
  {MyServer, key: "server_1", initial_state: %{}},
  local_only: true
)

# Calling again returns the same process
{:ok, {^pid, ^meta}} = DurableServer.Supervisor.ensure_started_child(
  MyApp.DurableSup,
  {MyServer, key: "server_1", initial_state: %{initial_value: 42}}
)

get_cluster_nodes(supervisor_name)

Gets all cluster nodes from the heartbeat cache with their heartbeat metadata.

Returns a map of node names to node info maps containing heartbeat_meta.

Examples

iex> get_cluster_nodes(MyApp.DurableSupervisor)
%{
  "node1@host" => %{heartbeat_meta: %{"region" => "ord"}},
  "node2@host" => %{heartbeat_meta: nil}
}
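
Because heartbeat_meta may be nil, code that filters this map should guard against it. A minimal sketch, assuming the map shape shown above (`ClusterSketch` is a hypothetical helper, not part of DurableServer):

```elixir
defmodule ClusterSketch do
  # Hypothetical helper over the map shape returned by get_cluster_nodes/1.
  # Returns sorted node names whose heartbeat metadata has `key` set to `value`.
  def nodes_with(nodes, key, value) do
    names =
      # Entries whose heartbeat_meta is nil do not match the pattern and are skipped.
      for {name, %{heartbeat_meta: %{} = meta}} <- nodes,
          Map.get(meta, key) == value do
        name
      end

    Enum.sort(names)
  end
end
```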

get_dynamic_supervisor(supervisor)

get_server_info(sup_name, key)

Gets detailed information about a server from storage.

Returns a rich map with server information regardless of whether the server is currently running. This is useful for admin dashboards, debugging, and re-homing decisions.

Return Value

Returns {:ok, info_map} on success or {:error, :not_found} if the server doesn't exist in storage.

The info map contains:

  • :key - The server's unique key
  • :module - The DurableServer module
  • :vsn - The state version
  • :status - Server status (:running, :stopped_graceful, :crashed, etc.)
  • :permanent - Whether the server is marked as permanent
  • :last_heartbeat_at - Timestamp of last heartbeat (milliseconds)
  • :node - The node where the server last ran (from storage)
  • :sticky_placement - Current placement values (where it last ran)
  • :sticky_placement_history - History of placement changes (most recent first)
  • :crash_history - List of crash entries (most recent first), each with :timestamp and :reason
  • :user_state - The raw user state (JSON decoded from storage)
  • :pid - PID if currently running, nil otherwise
  • :running - Boolean indicating if server is currently running
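
The :crash_history entries can be checked against a crash threshold. The sketch below is one possible reading of :crash_threshold_count and :crash_threshold_window_ms, not the library's own logic; `CrashWindowSketch` is hypothetical and it assumes :timestamp is in milliseconds.

```elixir
defmodule CrashWindowSketch do
  # Hypothetical check; defaults mirror the documented threshold options.
  # Assumes each entry's :timestamp is in milliseconds.
  def exceeded?(crash_history, now_ms, count \\ 5, window_ms \\ 3_600_000) do
    # Count only crashes that fall inside the window ending at `now_ms`.
    recent = Enum.count(crash_history, fn %{timestamp: ts} -> now_ms - ts <= window_ms end)
    recent >= count
  end
end
```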

Placement History

The sticky_placement_history tracks placement changes over time. Each entry contains an :at timestamp and :placement values. Only unique placements are recorded (no duplicates when placement doesn't change). The history is capped at a configurable limit (default 5), with oldest entries pruned first.

The first entry is the most recent placement, and the last entry is the oldest known placement (which may be the original if history hasn't been pruned):

{:ok, info} = DurableServer.Supervisor.get_server_info(MySup, "user_123")
case info.sticky_placement_history do
  [current | _rest] ->
    # current.placement is where it's running now
    # current.at is when it moved there
    {current.placement, current.at}
  [] ->
    # No placement history (no sticky config or new server)
    nil
end
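
The recording rules described above (only changes are recorded, most recent first, capped at a limit) can be modeled in a few lines. This is a hypothetical sketch, not DurableServer's implementation; in particular it assumes "unique" means "different from the most recent entry".

```elixir
defmodule PlacementHistorySketch do
  # Hypothetical model of the history rules.
  # `history` is most recent first; entries are %{at: ms, placement: term}.
  def record(history, placement, at_ms, limit \\ 5)

  # Unchanged placement: nothing new is recorded.
  def record([%{placement: placement} | _] = history, placement, _at_ms, _limit), do: history

  # Changed placement: prepend, then drop the oldest entries beyond the limit.
  def record(history, placement, at_ms, limit) do
    Enum.take([%{at: at_ms, placement: placement} | history], limit)
  end
end
```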

Examples

iex> DurableServer.Supervisor.get_server_info(MyDurableSup, "user_123")
{:ok, %{
  key: "user_123",
  module: MyServer,
  vsn: 1,
  status: :running,
  permanent: true,
  last_heartbeat_at: 1704067200000,
  node: "node1@host",
  sticky_placement: [%{env_var: "FLY_REGION", value: "sjc"}],
  sticky_placement_history: [
    %{at: 1704067200000, placement: [%{env_var: "FLY_REGION", value: "sjc"}]},
    %{at: 1704000000000, placement: [%{env_var: "FLY_REGION", value: "ord"}]}
  ],
  user_state: %{"count" => 42},
  pid: #PID<0.123.0>,
  running: true
}}

iex> DurableServer.Supervisor.get_server_info(MyDurableSup, "nonexistent")
{:error, :not_found}

get_task_supervisor(supervisor)

global_members(sup_name)

Gets all global members matching this supervisor name on the cluster along with their metadata.

Returns a map of all members in the form %{key => {pid, meta}}.

Examples

# Get all members for a supervisor
DurableServer.Supervisor.global_members(MySup)
#=> %{"user_1" => {#PID<0.123.0>, %{...}}, "user_2" => {#PID<0.124.0>, %{...}}}

# Get only members for a specific module
DurableServer.Supervisor.global_members(MySup, MyServer)
#=> %{"user_1" => {#PID<0.123.0>, %{...}}}

global_members(sup_name, module)

lookup(sup_name, key)

Looks up a global durable server by key.

Note: the provided key is not prefixed – the configured supervisor prefix will automatically be applied when looking up the key from underlying storage.

Examples

{DurableServer.Supervisor, name: MyDurableSup, prefix: "myapp/"}
{:ok, {pid, _meta}} = DurableServer.Supervisor.start_child(
  MyDurableSup,
  {Counter, key: "counter123", initial_state: %{value: 0}}
)

iex> {pid, _meta} = DurableServer.Supervisor.lookup(MyDurableSup, "counter123")

node_ref(supervisor_name)

Gets the unique node reference for this supervisor instance.

Note: other nodes call this function over RPC, which can race with this supervisor's table creation and config insert, so those cases are handled explicitly.

The node_ref is used to detect node restarts, so that PID reuse cannot make stale locks appear valid. Each supervisor keeps its own node_ref in ETS, which is cleaned up when the supervisor dies.
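
To illustrate why a per-incarnation reference helps, the sketch below pairs a lock holder's PID with the node_ref observed when the lock was taken. It is a conceptual model only: `NodeRefSketch` is hypothetical, and `make_ref/0` merely stands in for whatever value node_ref/1 actually returns.

```elixir
defmodule NodeRefSketch do
  # Hypothetical lock record: the holder's pid plus the node_ref seen at lock time.
  def new_lock(pid, node_ref), do: %{pid: pid, node_ref: node_ref}

  # A lock is trusted only if it was taken during the current incarnation,
  # so a reused PID from an earlier incarnation cannot validate it.
  def valid?(%{node_ref: ref}, current_node_ref), do: ref == current_node_ref
end
```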

ready?(supervisor_name)

Checks if the DurableServer.Supervisor is ready to handle requests.

Returns true once the supervisor and its lifecycle manager child are registered, false otherwise.

This is safe to call at any time, even if the supervisor hasn't started yet.

rehome_child(supervisor, arg, opts \\ [])

Rehomes a DurableServer child to a different node, bypassing sticky placement.

This is useful for manual rebalancing or administrative operations. The operation:

  1. Terminates the process gracefully on its current node (if running)
  2. Starts the process on the target node (or any eligible node if no target specified)

Parameters

  • supervisor - The DurableServer.Supervisor name
  • child_spec - The child spec tuple {module, key: "...", initial_state: %{...}}
  • opts - Options:
    • :target_node - Specific node atom to place on (optional, defaults to best available)
    • :force - If true, ignore sticky placement entirely (default: true)

Returns

  • {:ok, {pid, meta}} - Successfully rehomed the process
  • {:error, reason} - Failed to rehome

Examples

# Rehome to a specific node
{:ok, {pid, meta}} = DurableServer.Supervisor.rehome_child(
  MySup,
  {MyServer, key: "server_1", initial_state: %{}},
  target_node: :"node2@host"
)

# Rehome to any available node (ignoring sticky placement)
{:ok, {pid, meta}} = DurableServer.Supervisor.rehome_child(
  MySup,
  {MyServer, key: "server_1", initial_state: %{}}
)

start_child(supervisor, child_spec, opts \\ [])

Starts a DurableServer child process under this supervisor.

The child spec is {Module, key: key, initial_state: initial_state}. :initial_state is required and must be a map. Before the first init/1 or init/2 call, DurableServer passes it through the module's dump_state/1, the configured backend's encode/decode path, and then load_state/2. This means the dumped initial state must be encodable by the configured backend, and load_state/2 receives the backend-decoded shape.

Options

  • :local_only - When true, the child will only be started on the local node. If the local node is at capacity, returns {:error, {:capacity_limit, reason}} instead of attempting remote placement. Default: false.
  • :max_placement_retries - Maximum number of remote nodes to try when local placement fails due to capacity limits. Default: 3. Ignored when local_only: true.
  • :placement_timeout - Maximum time in milliseconds to keep retrying remote placement. If all placement attempts fail, the caller retries with fresh eligible nodes every 500ms until the deadline. Useful during rolling deploys when nodes are temporarily unavailable. Set to nil to disable. Default: 15000ms.
  • :timeout - Maximum total time in milliseconds to wait for the child bootstrap to complete, including internal retries. Returns {:error, :timeout} on expiration. Set to :infinity to disable. Default: 5000ms.

Examples

# Start with init args
{:ok, {pid, meta}} = DurableServer.Supervisor.start_child(
  MyApp.DurableSup,
  {MyServer, key: "server_1", initial_state: %{initial_value: 42}}
)

# Start locally only — never attempt remote placement
{:ok, {pid, meta}} = DurableServer.Supervisor.start_child(
  MyApp.DurableSup,
  {MyServer, key: "server_1", initial_state: %{}},
  local_only: true
)

# Retry placement for up to 15 seconds during rolling deploys
{:ok, {pid, meta}} = DurableServer.Supervisor.start_child(
  MyApp.DurableSup,
  {MyServer, key: "server_1", initial_state: %{}},
  placement_timeout: 15_000
)

# The server module must use DurableServer
defmodule MyServer do
  use DurableServer, vsn: 1

  def init(%{initial_value: value}, info) do
    {:ok, %{value: value, key: info.key}, meta: %{my: "meta"}}
  end
end

start_link(opts)

Starts a DurableServer.Supervisor with the given options.

Options

  • :name - Required. The registered name for this supervisor
  • :prefix - Required. Object storage prefix (should end with "/")
  • :max_children - Maximum concurrent children (default: :infinity)
  • :discovery_interval_ms - Lifecycle discovery interval (default: 60_000)
  • :initial_discovery_delay_ms - Initial discovery delay as a fixed integer or {min_ms, max_ms} jitter tuple (default: {1_000, 6_000})
  • :discovery_shuffle_batch_size - Discovery shuffle batch size (default: 20_000)
  • :parallel_restart_batch_size - Concurrent restart attempts per node (default: 50)
  • :restart_start_timeout_ms - Timeout for LM-owned claimed restarts (default: 30_000)
  • :restart_claim_preferred_fanout - Initial restart claim contention fanout (default: 2)
  • :restart_claim_expanded_fanout - Expanded restart claim contention fanout (default: 4)
  • :restart_claim_gate_expand_after_ms - Age before widening claim fanout (default: 30_000)
  • :restart_claim_gate_disable_after_ms - Age before disabling the claim gate (default: 120_000)
  • :heartbeat_interval_ms - Node heartbeat interval (default: 10_000)
  • :heartbeat_staleness_threshold_ms - Node heartbeat stale/orphan threshold (default: 30_000)
  • :heartbeat_tracking_mode - Heartbeat cache strategy: :poll or :subscribe
  • :heartbeat_reconcile_interval_ms - Full heartbeat cache reconcile interval
  • :dead_node_threshold_ms - Dead node cleanup threshold (default: 300_000)
  • :crash_threshold_count - Crashes before permanent crash (default: 5)
  • :crash_threshold_window_ms - Crash threshold window (default: 3_600_000)
  • :module_circuit_breaker_count - Module crash limit (default: 50)
  • :module_circuit_breaker_window_ms - Module circuit breaker window (default: 300_000)
  • :module_circuit_breaker_cooldown_ms - Module circuit breaker cooldown (default: 600_000)
  • :backend - Optional storage backend spec: {BackendModule, opts} or a pre-initialized %DurableServer.StorageBackend{}
  • :object_store - Legacy object storage config (used when :backend is not set)
  • :init_info - Map of user-defined data passed to each server's init/2 callback (default: %{})
  • :placement_region - Optional region label used for placement timeout tuning.
  • :placement_erpc_timeout_same_region_ms - Same-region remote placement ERPC timeout in ms. Default: 3000
  • :placement_erpc_timeout_cross_region_ms - Cross-region remote placement ERPC timeout in ms. Default: 8000
  • :max_singleflight_waiters_per_key_module - Per {key, module} cap for concurrent ensure_started_child/3 waiters. Calls beyond the cap fail fast with {:error, :singleflight_overloaded}. Default: 50_000. Set to nil to disable.

stream_all_server_info(sup_name)

Streams server info for all servers in storage.

Returns a Stream that yields info maps for each server found in storage. Failed fetches are filtered out. Excludes internal node metadata objects.

This is useful for admin dashboards that need to iterate over all servers without loading everything into memory at once.

Examples

# Stream all servers
DurableServer.Supervisor.stream_all_server_info(MySup)
|> Enum.to_list()

# Stream only permanently crashed servers
DurableServer.Supervisor.stream_all_server_info(MySup)
|> Stream.filter(fn info -> info.status == :permanently_crashed end)
|> Enum.to_list()
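
For dashboards, a status breakdown can be computed in a single pass without materializing the list first. A minimal sketch, assuming each info map has a :status key as described for get_server_info/2 (`StatusSketch` is a hypothetical helper):

```elixir
defmodule StatusSketch do
  # Works on any enumerable of info maps, including a lazy Stream,
  # and returns a map of status => number of servers.
  def count_by_status(infos), do: Enum.frequencies_by(infos, & &1.status)
end
```

The result of stream_all_server_info/1 could be piped into `StatusSketch.count_by_status/1` directly.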

terminate_and_delete_child(supervisor, pid_or_key, timeout \\ 5000)

Terminates a DurableServer child process AND deletes its object storage.

This permanently removes the server and all its persisted state. The operation:

  1. Finds the running process (if any) by PID or key
  2. Terminates the process gracefully (allowing final state sync)
  3. Deletes the object storage data

Parameters

  • supervisor - The DurableServer.Supervisor name
  • pid_or_key - Either a PID of the running process or the key string

Returns

  • :ok - Successfully terminated process and deleted storage
  • {:error, reason} - Failed to delete (process may still be terminated)

Examples

# Delete by PID
:ok = DurableServer.Supervisor.terminate_and_delete_child(MySup, pid)

# Delete by key
:ok = DurableServer.Supervisor.terminate_and_delete_child(MySup, "user_123")

terminate_child(supervisor_name, pid)

Terminates a specific DurableServer child process gracefully.

The child will be given time to sync its state before termination.

terminate_child_permanent(supervisor_name, pid)

Terminates a specific DurableServer child process gracefully, and unmarks it for permanent restart.

Useful for stopping a server that was previously started as permanent, so that it is no longer considered a candidate for permanent restart.

The child will be given time to sync its state before termination.

wait_until_ready(supervisor_name, opts \\ [])

Blocks until the supervisor is ready or timeout expires.

Returns :ok if ready, {:error, :timeout} if timeout expires.

This is intended to be called via RPC on a node that may still be booting. The remote node will block until its supervisor is ready, preventing ETS errors from concurrent access during startup.

Options

  • :timeout - Maximum time to wait in milliseconds (default: 5000)
  • :poll_interval - How often to check readiness in milliseconds (default: 100)
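
The blocking behavior can be pictured as a polling loop over a readiness check. The sketch below is an assumption about the general technique, not the library's code: `ReadyPollSketch` is hypothetical and `ready_fun` stands in for a check such as ready?/1. The option names and defaults mirror the ones listed above.

```elixir
defmodule ReadyPollSketch do
  # Hypothetical polling loop; `ready_fun` is any zero-arity function returning a boolean.
  def wait(ready_fun, opts \\ []) do
    timeout = Keyword.get(opts, :timeout, 5_000)
    interval = Keyword.get(opts, :poll_interval, 100)
    deadline = System.monotonic_time(:millisecond) + timeout
    poll(ready_fun, deadline, interval)
  end

  defp poll(ready_fun, deadline, interval) do
    cond do
      ready_fun.() -> :ok
      System.monotonic_time(:millisecond) >= deadline -> {:error, :timeout}
      true ->
        # Not ready yet and still before the deadline: sleep, then check again.
        Process.sleep(interval)
        poll(ready_fun, deadline, interval)
    end
  end
end
```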

which_children(supervisor)

Lists all currently running DurableServer child processes on this node's supervisor.