distribute/cluster
Types
pub type ClusterHealth {
ClusterHealth(
self_node: String,
is_distributed: Bool,
connected_nodes: List(String),
connected_count: Int,
reachable_nodes: List(String),
unreachable_nodes: List(String),
)
}
Constructors
- ClusterHealth(self_node: String, is_distributed: Bool, connected_nodes: List(String), connected_count: Int, reachable_nodes: List(String), unreachable_nodes: List(String))
Errors from connect/1.
net_kernel:connect_node/1 returns only true | false | ignored.
It cannot distinguish “node does not exist” from “unreachable”. A
NodeNotFound variant would be a lie at this layer, so it is not
exposed; both cases collapse to ConnectFailed.
pub type ConnectError {
ConnectFailed
ConnectIgnored
InvalidNodeFormat(String)
ConnectAtomBudgetExceeded
}
Constructors
- ConnectFailed
  The peer was reachable in principle (distribution is up) but refused or did not answer. Returned by net_kernel:connect_node/1 = false.
- ConnectIgnored
  The local node is not running distribution (net_kernel not started), so connect_node declined to even try. Returned by connect_node = ignored.
- InvalidNodeFormat(String)
  The supplied name failed format validation (missing @, disallowed charset, length). Carries a human-readable reason.
- ConnectAtomBudgetExceeded
  Connecting would create a fresh node atom but the configured max_distribution_atoms budget is exhausted. Refused before touching binary_to_atom. The VM atom table stays safe.
pub type StartError {
InvalidNodeName(String)
InvalidCookieFormat(String)
AlreadyStarted
NetworkError(String)
StartFailed(String)
StartAtomBudgetExceeded
}
Constructors
- InvalidNodeName(String)
  Node name failed format validation: must be <name>@<host> with charset [a-zA-Z0-9_-]+@[a-zA-Z0-9._-]+ and 1..255 bytes.
- InvalidCookieFormat(String)
  Cookie failed format validation: charset [a-zA-Z0-9_-]+, 1..255 bytes.
- AlreadyStarted
  net_kernel:start/1 reported the node was already running.
- NetworkError(String)
  net_kernel:start/1 failed with a network-related reason (network, eaddrinuse, econnrefused).
- StartFailed(String)
  net_kernel:start/1 failed with another reason.
- StartAtomBudgetExceeded
  The configured max_distribution_atoms budget has been exhausted. Creating the node-name or cookie atom would exceed the cap. Either raise max_distribution_atoms or stop accepting fresh node names from the upstream caller.
Values
pub fn connect(node: String) -> Result(Nil, ConnectError)
Connect to a remote node. Returns Ok(Nil) on success.
Atom-table guardrail
Each call with a previously-unseen node name interns one atom in
the BEAM atom table (atoms are never garbage collected, and the
table is capped at 1 048 576 entries by default). To prevent a
caller, malicious or buggy, from exhausting the table by
looping over millions of valid-looking names, every fresh atom
creation is counted against config.max_distribution_atoms.
The check is atomic (atomics:add_get/3) and lock-free. Once the
budget is reached, this function returns
Error(ConnectAtomBudgetExceeded) before binary_to_atom/2 is
called: the VM atom table cannot be exhausted through this path.
Default budget: 10 000 fresh atoms over the process lifetime.
That is 10x a generous cluster size and roughly two orders of
magnitude below the VM cap. Tune via config.configure(... max_distribution_atoms:).
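A caller usually wants to treat the four ConnectError variants differently: a failed connect may be worth retrying, while an invalid name or an exhausted atom budget will not improve on retry. The following is a minimal sketch of that classification, not part of the library. The NextStep type and the next_step function are hypothetical names introduced for illustration, and the retry policy is an assumption.

```gleam
import distribute/cluster

/// Hypothetical caller-side decision type (not part of the library).
pub type NextStep {
  Connected
  RetryLater
  GiveUp(reason: String)
}

/// Try to connect and decide what the caller should do next.
pub fn next_step(node: String) -> NextStep {
  case cluster.connect(node) {
    Ok(Nil) -> Connected
    // Missing and unreachable peers are indistinguishable at this
    // layer, so both are treated as possibly transient.
    Error(cluster.ConnectFailed) -> RetryLater
    // The remaining variants will not change on retry, so report them
    // using the library's own error formatter.
    Error(err) -> GiveUp(cluster.connect_error_to_string(err))
  }
}
```

Matching ConnectFailed explicitly and delegating the rest to connect_error_to_string keeps the retry rule in one place; a stricter caller could list every variant so the compiler flags any constructor added later.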
pub fn connect_error_to_string(err: ConnectError) -> String
pub fn has_peers() -> Bool
Whether this node has at least one connected peer.
This is a topology check, not a health check: a single-node deployment
is operationally fine and will return False here.
pub fn health() -> ClusterHealth
Perform a cluster health check, pinging each known node in parallel.
See also: has_peers/0 (boolean topology shortcut), is_healthy/0
(compatibility alias), is_distributed/0,
ping/1 (single node).
net_adm:ping/1 is a synchronous network call with an implicit BEAM
distribution timeout of several seconds. Pinging N nodes sequentially
would block the caller for up to N * timeout_per_ping (e.g. 50 nodes
during a partition = ~6 minutes). We fan out with bounded parallelism
and collect results with a single 8 s deadline. Worst-case wall clock
is still bounded by the deadline, not by cluster size.
Output ordering is deterministic: reachable_nodes and
unreachable_nodes are projected in the same order as
connected_nodes, regardless of worker reply timing.
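As an illustration of reading the returned record, the sketch below prints a one-line summary from the fields of ClusterHealth. The report function is a hypothetical name; only cluster.health and the documented record fields come from this module.

```gleam
import distribute/cluster
import gleam/int
import gleam/io
import gleam/string

/// Print a one-line summary of the current cluster health.
pub fn report() -> Nil {
  let h = cluster.health()
  case h.is_distributed, h.unreachable_nodes {
    // Distribution was never started on this node.
    False, _ -> io.println(h.self_node <> ": distribution not started")
    // Every connected peer answered before the deadline.
    True, [] ->
      io.println(
        h.self_node
        <> ": "
        <> int.to_string(h.connected_count)
        <> " peer(s), all reachable",
      )
    // Some peers did not answer; they are listed in connected_nodes order.
    True, down ->
      io.println(h.self_node <> ": unreachable: " <> string.join(down, ", "))
  }
}
```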
pub fn is_distributed() -> Bool
Whether this node is running BEAM distribution.
Backed by erlang:is_alive/0, which is the authoritative signal.
It returns true iff net_kernel has been started. Previous versions
compared the string form of the node name against "nonode@nohost",
which would lie if the runtime ever changed that placeholder.
pub fn is_healthy() -> Bool
Deprecated alias for has_peers/0, kept for compatibility with direct
distribute/cluster imports from pre-facade code.
pub fn ping(node: String) -> Bool
Ping a remote node. Returns True if it responds.
Subject to the same config.max_distribution_atoms guardrail as
connect: once the fresh-atom budget is exhausted,
ping returns False (cannot reach) without touching the VM
atom table.
pub fn start_error_to_string(err: StartError) -> String
pub fn start_monitor() -> Result(
process.Subject(cluster_monitor.Message),
actor.StartError,
)
Start the cluster monitor actor. It listens for Erlang node events and broadcasts them to all Gleam subscribers.
pub fn start_node(
name: String,
cookie: String,
) -> Result(Nil, StartError)
Start a distributed BEAM node.
name must contain @ (e.g. "myapp@127.0.0.1").
Cookie length and charset are enforced byte-wise by the FFI: any
failure surfaces as InvalidCookieFormat.
Atom-budget exhaustion: the FFI emits
AtomBudgetExhausted(<offending input>, AtomBudgetOnStartNode)
before returning, with the actual offending input (name or
cookie). We do not re-emit here because the public unit
constructor StartAtomBudgetExceeded cannot carry that
attribution.
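A minimal sketch of calling start_node and handling its errors follows. The boot function is a hypothetical wrapper, the cookie value is a placeholder, and treating AlreadyStarted as success is a policy assumption that suits idempotent boot code but may not suit every deployment.

```gleam
import distribute/cluster

/// Start distribution, treating "already started" as success.
pub fn boot() -> Result(Nil, String) {
  // Both arguments satisfy the documented charset and length rules.
  case cluster.start_node("myapp@127.0.0.1", "my_cookie") {
    Ok(Nil) -> Ok(Nil)
    // Assumption: a node that is already running is acceptable here.
    Error(cluster.AlreadyStarted) -> Ok(Nil)
    // Everything else is surfaced as a human-readable string.
    Error(err) -> Error(cluster.start_error_to_string(err))
  }
}
```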
Blocking and OS-level dependencies
This call can block. It delegates to net_kernel:start/1,
which talks to epmd (Erlang Port Mapper Daemon) and resolves
the host portion of name against the OS resolver. If epmd
is not running, if DNS is misconfigured, or if the network goes
down a moment before the call, the BEAM may hang on a libc
resolver timeout for tens of seconds and there is no Gleam-side
timeout the library can interpose.
Mitigations callers can apply:
- Run epmd -daemon before the process boots, and treat its absence as a fatal startup condition rather than something start_node should recover from.
- Use IP literals (myapp@127.0.0.1) when the deployment allows, bypassing DNS entirely.
- In container deployments, ensure /etc/hosts resolves the chosen host before start_node is called.
If you cannot accept a potentially long boot wait, supervise the
boot itself: spawn a process that calls start_node, monitor
it, and treat a deadline miss as a startup failure. The library
does not bake a timeout in because the right value is
deployment-specific (a 2 s timeout is generous for IP literals
but a hair-trigger for DNS-backed names).
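The deadline approach described above could look roughly like the sketch below. It is a simplification: instead of a full monitor it uses a reply subject and a receive timeout, so a crash in the spawned process is also reported as a timeout. BootOutcome and start_node_within are hypothetical names, and the code assumes gleam_erlang 1.x, where process.spawn_unlinked is available (older releases expose a different spawn function).

```gleam
import distribute/cluster
import gleam/erlang/process

/// Hypothetical outcome type for a deadline-bounded boot.
pub type BootOutcome {
  Booted
  BootFailed(cluster.StartError)
  BootTimedOut
}

/// Run start_node in a separate process and wait at most timeout_ms.
pub fn start_node_within(
  name: String,
  cookie: String,
  timeout_ms: Int,
) -> BootOutcome {
  // The caller owns this subject; the worker only sends to it.
  let reply = process.new_subject()
  let _pid =
    process.spawn_unlinked(fn() {
      process.send(reply, cluster.start_node(name, cookie))
    })
  case process.receive(reply, timeout_ms) {
    Ok(Ok(Nil)) -> Booted
    Ok(Error(err)) -> BootFailed(err)
    // No reply in time: the worker may still be blocked in start_node.
    Error(Nil) -> BootTimedOut
  }
}
```

On BootTimedOut the spawned process is not stopped and may still complete start_node later, so treating the timeout as fatal and exiting, as suggested above, is the simplest consistent policy.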
pub fn subscribe(
monitor: process.Subject(cluster_monitor.Message),
listener: process.Subject(cluster_monitor.ClusterEvent),
) -> Nil
Subscribe a subject to cluster events (NodeUp/NodeDown).
pub fn unsubscribe(
monitor: process.Subject(cluster_monitor.Message),
listener: process.Subject(cluster_monitor.ClusterEvent),
) -> Nil
Unsubscribe from cluster events.
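A minimal sketch of the monitor lifecycle: start the monitor, subscribe a listener, wait for one event, then unsubscribe. The watch_once function is a hypothetical name. The event is deliberately not destructured, because the payloads of the ClusterEvent constructors are not described in this section.

```gleam
import distribute/cluster
import gleam/erlang/process
import gleam/io

/// Wait up to 30 s for a single cluster event, then unsubscribe.
pub fn watch_once() -> Nil {
  case cluster.start_monitor() {
    Error(_) -> io.println("cluster monitor failed to start")
    Ok(monitor) -> {
      // The listener's message type is inferred from subscribe.
      let listener = process.new_subject()
      cluster.subscribe(monitor, listener)
      case process.receive(listener, 30_000) {
        Ok(_event) -> io.println("cluster membership changed")
        Error(Nil) -> io.println("no node events within 30 s")
      }
      cluster.unsubscribe(monitor, listener)
    }
  }
}
```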