distribute/registry
Types
Errors from polling lookups.
pub type LookupError {
LookupNotFound
LookupInvalidTimeout(Int)
LookupInvalidPollInterval(Int)
}
Constructors
-
LookupNotFoundName not registered within the timeout window.
-
LookupInvalidTimeout(Int)timeout_mswas zero or negative. -
LookupInvalidPollInterval(Int)poll_interval_mswas zero or negative.
Errors from registry operations.
pub type RegisterError {
AlreadyExists
InvalidProcess
InvalidArgument(String)
NetworkError(String)
RegistrationFailed(String)
}
Constructors
-
AlreadyExistsName is already registered by another process.
-
InvalidProcessProcess is not alive or invalid.
-
InvalidArgument(String)Name is empty, too long, or contains invalid characters.
-
NetworkError(String)Network partition or connectivity issue.
-
RegistrationFailed(String)Generic registration failure.
A name bound to an encoder/decoder pair. The msg type links registration and lookup at compile time.
pub opaque type TypedName(msg)
Outcome of a failed unregister/1. The success case
(Ok(Nil)) means we removed the entry; an error tells the
caller why we did not, so debugging surfaces the real cause
instead of a silent no-op.
pub type UnregisterError {
NotFound
NotOwned
}
Constructors
-
NotFoundName was not registered at the moment of the call. Idempotent callers can ignore this; observability paths surface it.
-
NotOwnedName resolved to a PID on another node. The local-ownership ACL refused to forward the unregister, the
:globaltable is untouched.
Values
pub fn is_registered(name: String) -> Bool
Check whether a name is currently registered.
pub fn lookup(
tn: TypedName(msg),
) -> Result(global.GlobalSubject(msg), Nil)
Look up a GlobalSubject by TypedName.
Reconstructs the subject using the name-based tag, so messages sent via the returned subject reach the actor mailbox.
Snapshot semantics
lookup returns the Subject that resolves to the PID found at
the exact moment the lookup ran. The PID may crash, deregister,
or leave the cluster a microsecond later. Subsequent
global.send calls on a stale Subject silently drop messages
because that is the BEAM’s contract for erlang:send/2 to a
dead PID. For long-lived references, either re-lookup before
each critical send, or use global.call (which monitors the
target and surfaces Error(TargetDown) immediately when the PID
has died).
Propagation is not atomic across the cluster
A lookup from node B issued microseconds after a successful
register_global on node A may still return Error(Nil)
because :global propagates registrations via gossip rather
than synchronous broadcast. This is the standard :global
behaviour, not a defect in distribute. For the
register-then-immediately-lookup pattern across nodes, prefer
lookup_with_timeout with a 200-500 ms budget over a
single-shot lookup: the polling loop bridges the propagation
window without busy-waiting.
pub fn lookup_async(
tn: TypedName(msg),
reply_to: process.Subject(
Result(global.GlobalSubject(msg), LookupError),
),
timeout_ms: Int,
poll_interval_ms: Int,
) -> Nil
Non-blocking poll: spawns a polling worker that probes the registry until the name resolves, the timeout elapses, or the caller dies.
The worker monitors the caller. If the caller process dies for
any reason (crash or normal exit(normal)), the worker stops
polling and terminates immediately. A bare link would only catch
crashes: a short-lived web handler that completes with reason
normal would leave the worker busy-polling for the full timeout
while the result has nowhere to go.
Validation errors (invalid timeout or poll interval) are forwarded to
reply_to synchronously. The worker is not spawned for invalid params.
let reply = process.new_subject()
registry.lookup_async(tn, reply, 5000, 100)
let assert Ok(Ok(gs)) = process.receive(reply, 5100)
pub fn lookup_error_to_string(err: LookupError) -> String
pub fn lookup_with_timeout(
tn: TypedName(msg),
timeout_ms: Int,
poll_interval_ms: Int,
) -> Result(global.GlobalSubject(msg), LookupError)
Poll until a TypedName is registered or timeout_ms elapses.
Retries every poll_interval_ms milliseconds. Both timeout_ms and
poll_interval_ms must be strictly positive. Passing zero or a
negative value returns a typed Error instead of crashing the
scheduler (a non-positive poll_interval_ms would tight-loop or
raise badarg inside process.sleep).
Warning: blocks the calling process. Do not call inside an OTP actor
handler. Use lookup_async instead.
pub fn named(name: String, c: codec.Codec(msg)) -> TypedName(msg)
Create a typed name from a bundled Codec.
let counter = registry.named("counter", codec.int())
Caution: split-brain conflict resolution is brutal
Names registered via register_global are stored in Erlang’s
:global table. After a network partition heals, :global may
discover the same name registered to different PIDs on each
side. Its default conflict resolver (random_notify_name/3)
picks a winner and sends an uncatchable
exit(loser_pid, kill) to the other PID. The library cannot
trap this and there is no user hook to plug in: it is a
property of :global, not of distribute.
Operational consequences:
- critical singletons can vanish without warning during a partition-heal event;
- any work in flight on the loser PID is lost;
- Subjects obtained from a pre-split
lookuppoint at the loser and silently drop sends after the kill.
Mitigation: subscribe to cluster_monitor and treat each
NodeUp after a known partition as a “re-validate critical
state” signal; re-lookup registered actors before each
critical operation rather than caching Subjects across long
timescales. For workloads where this is unacceptable, use pg
(process groups, eventually-consistent, no kill-on-merge) or a
consensus layer instead. See docs/safety_and_limits.md under
“BEAM and OS-level risks” for the full discussion.
pub fn pool_member(
base: TypedName(msg),
index: Int,
) -> TypedName(msg)
Derive a TypedName for a pool worker.
pool_member(base, 2) returns a TypedName with name "<base>_2" and
the same codecs.
pub fn register(
name: String,
pid: process.Pid,
) -> Result(Nil, RegisterError)
Register a PID under a global name.
Low-level escape hatch. Prefer register_global/2 when you have a
GlobalSubject, or register_typed/2 when you already hold the exact
distributed Subject(BitArray) that remote lookups will reconstruct.
A raw PID carries no protocol/tag invariant by itself: if you register the
owner of a subject whose mailbox tag does not equal name, remote lookups
will still reconstruct unsafe_from_name(name, pid) and can silently send into a
mailbox slot the actor never receives on.
Caution: split-brain conflict resolution is brutal
Backed by :global. After a network partition heals, conflict
resolution sends an uncatchable exit(loser_pid, kill) to one
side of any duplicate registration. See named/2 and
docs/safety_and_limits.md for the full operational picture.
pub fn register_error_to_string(err: RegisterError) -> String
pub fn register_global(
tn: TypedName(msg),
gs: global.GlobalSubject(msg),
) -> Result(Nil, RegisterError)
Register a GlobalSubject under a typed name.
The msg type parameter on TypedName(msg) and GlobalSubject(msg)
must match. The compiler enforces this.
Runtime invariant: the supplied gs must have been built with
global.unsafe_from_name(name, ...) (or via actor.start_*, which does the
same internally) so that its tag equals the registry name. A remote
lookup reconstructs the Subject as unsafe_from_name(name, pid) and any
other tag would silently accumulate messages in the actor’s mailbox
without ever matching its selector.
We enforce the invariant up front. Registering a Subject built with
global.new() (random Ref tag) or global.from_pid() (Nil tag)
returns Error(InvalidArgument(...)) instead of silently breaking.
Propagation is not atomic across the cluster
Ok(Nil) means :global accepted the registration on this
node. Cross-cluster visibility follows the standard :global
gossip path and is not instantaneous: a lookup from a remote
node a microsecond after this call returns can still see the
previous state (or no state) until propagation completes. This
is a property of :global itself, not a defect in
distribute. Callers who need to register-then-immediately-
lookup on another node have three options:
- Use
lookup_with_timeoutwith a small budget (200-500 ms is usually plenty); the polling loop bridges the propagation window without busy-waiting. - Send a synchronous
callfrom the registering node back to the actor before notifying remote callers; the round trip forces propagation to complete. - For test setups, call
:global.sync()directly via FFI. Not recommended for production hot paths because it blocks on the global_name_server queue.
pub fn register_global_with_resolver(
tn: TypedName(msg),
gs: global.GlobalSubject(msg),
resolver: fn(String, process.Pid, process.Pid) -> conflict.ConflictOutcome,
) -> Result(Nil, RegisterError)
Like register_global/2, but installs a custom split-brain
conflict resolver alongside the registration.
:global invokes the resolver when the same name is claimed by
two different PIDs (typically after a network partition heals).
The default resolver behind register_global/2
(:global.random_notify_name/3) picks a winner at random and
kills the loser. For workloads where one side should always win,
a leader pinned to a specific node, a router whose state
must survive on a primary, a singleton whose oldest instance is
authoritative. Pass an explicit resolver from
distribute/conflict.
import distribute/conflict
import distribute/registry
let assert Ok(Nil) =
registry.register_global_with_resolver(
counter_name,
gs,
conflict.node_priority(["primary@host", "secondary@host"]),
)
Operational guard rails
The resolver runs inside :global’s singleton worker
process; while it executes, every :global operation cluster-
wide is serialised behind it. To bound the worst-case stall,
the FFI shim spawns the resolver in a short-lived worker with
a hard deadline read from
config.get().conflict_resolver_timeout_ms (default
1 000 ms). On timeout or crash, the fallback
(lowest term-ordered PID wins) fires and
telemetry.ConflictResolverFailed is emitted so operators can
see that the user fn misbehaved. Every resolution emits
telemetry.ConflictResolved with the surviving PID (or None
for KillBoth).
Data-loss honesty box
The fallback is deterministic but not state-aware: if
your resolver was supposed to keep the side with the most
recent state and it fails (timeout, panic, malformed return),
the fallback picks lowest-PID and the cluster may lose
whatever the loser was holding. For stateful actors where this
would corrupt the system, use conflict.kill_both() as the
resolver and watch telemetry.ConflictResolved(_, None) to
trigger an application-level recovery path (re-elect, re-
bootstrap from durable storage). See distribute/conflict for
the full discussion.
Tag invariant
Same as register_global/2: the supplied GlobalSubject must
carry the TypedName as its tag. A mismatch returns
Error(InvalidArgument(...)) before any :global call. Build
the subject via actor.start_* or via lookup.
pub fn register_typed(
name: String,
subject: process.Subject(BitArray),
) -> Result(Nil, RegisterError)
Register a name-tagged distributed Subject(BitArray) under a global name.
Low-level escape hatch: this is the raw-subject sibling of
register_global/2. The supplied subject must already be the exact
name-tagged distributed subject that remote lookups will reconstruct
(global.unsafe_from_name(name, pid, ...) or the subject returned by
actor.start_*).
Subjects built with process.new_subject() (random Ref tag) or
global.from_pid() (Nil tag) are rejected: a remote lookup would
reconstruct unsafe_from_name(name, pid) and silently send into a mailbox slot
that never matches the original selector.
pub fn typed_name(
name: String,
encoder: fn(msg) -> Result(BitArray, codec.EncodeError),
decoder: fn(BitArray) -> Result(msg, codec.DecodeError),
) -> TypedName(msg)
Create a typed name from explicit encoder/decoder.
pub fn typed_name_decoder(
tn: TypedName(msg),
) -> fn(BitArray) -> Result(msg, codec.DecodeError)
Get the decoder.
pub fn typed_name_encoder(
tn: TypedName(msg),
) -> fn(msg) -> Result(BitArray, codec.EncodeError)
Get the encoder.
pub fn typed_name_to_string(tn: TypedName(msg)) -> String
Get the name string.
pub fn unregister(name: String) -> Result(Nil, UnregisterError)
Unregister a global name.
See also: register/2, register_global/2, lookup/1.
Returns Ok(Nil) when this VM owned the name and the entry has
been removed from :global. Returns Error(NotOwned) when the
owning PID runs on another node (the local-ownership ACL refuses
the unregister), and Error(NotFound) when the name was not
registered at all. Idempotent cleanup paths can let _ = unregister(name) to discard the outcome; observability paths can
pattern-match for diagnostics.
Local-ownership ACL: stateless node(Pid) =:= node() check via
:global.whereis_name/1. No local mirror can go stale, no
auto-cleanup is needed when processes die. This blocks the “registry
wipe” vector where unvalidated input flows into unregister and
tears down arbitrary cluster routing. Standard auto-cleanup
(:global removes a name when the owning process dies) is
unaffected.
pub fn unregister_error_to_string(err: UnregisterError) -> String
pub fn unregister_typed(
tn: TypedName(msg),
) -> Result(Nil, UnregisterError)
Unregister using a TypedName directly. Type-safe sibling of
unregister/1 that reuses the protocol the caller already holds,
avoiding a stringly-typed extraction at the call site.
pub fn whereis(name: String) -> Result(process.Pid, Nil)
Look up a globally registered PID by name.