distribute/actor

Named actor lifecycle: start, register, supervise, pool.

Resource cleanup and the {terminate, Reason} gap

gleam/otp/actor 1.x does not implement OTP’s {terminate, Reason} system message (see gleam_otp_external.erl; there is an explicit TODO). That has two consequences for any actor that owns external resources (file handles, ETS tables, ports, locks, sockets):

  1. Handler-controlled exits run user code first. If your handler returns receiver.Stop or receiver.StopAbnormal, you have full control: release every resource you own before returning the stop variant. The actor will then exit and the BEAM frees its mailbox.
  2. External terminations skip the actor. When the actor is process.kill-ed (uncatchable), or when the supervisor sends exit(Pid, shutdown), gleam/otp/actor does not currently run any user callback. The BEAM reclaims the actor’s heap memory, but an external handle with no other owner (a database connection, a distributed lock, an OS pipe) leaks.

The OTP-pure mitigation is the “linked resource owner” pattern: spawn a tiny dedicated process to own the external resource, monitor the actor, and run a close callback when the actor dies for any reason. The library ships actor.start_resource_owner/3 as a ready-made helper for this pattern. See docs/recipes.md for the underlying recipe.
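The pattern can be sketched with gleam/erlang/process alone. This is a minimal illustration, not the library's implementation: own_resource is a hypothetical name, and the calls assume the gleam_erlang 1.x API (process.spawn links to the caller, process.monitor returns a Monitor, process.select_specific_monitor adds it to a selector).

```gleam
import gleam/erlang/process

// Hypothetical stand-in for the pattern; assumes gleam_erlang 1.x.
pub fn own_resource(
  open: fn() -> resource,
  close: fn(resource) -> Nil,
  lifetime: process.Pid,
) -> process.Pid {
  // process.spawn links the new process to the caller.
  process.spawn(fn() {
    // Receive exit signals from linked processes as messages, so the
    // owner is not killed before it gets a chance to run `close`.
    process.trap_exits(True)
    // Open inside the owner so process-tied resources belong to it.
    let resource = open()
    // A monitor on an already-dead PID still delivers a DOWN message.
    let monitor = process.monitor(lifetime)
    let selector =
      process.new_selector()
      |> process.select_specific_monitor(monitor, fn(_down) { Nil })
    // Block until `lifetime` exits for any reason, then clean up.
    process.selector_receive_forever(selector)
    close(resource)
  })
}
```

If open panics, the owner exits abnormally and the link takes the caller down with it, which matches the fail-fast behaviour described for the shipped helper.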

When upstream gleam/otp ships termination callbacks (tracking issue: gleam-lang/otp#126), the helper becomes redundant for the actor case. Its API is shaped so that migrating is a one-line change in user code and a deprecation here, with no change in semantics.

Types

Error from start_registered: distinguishes an actor init failure from a :global registration failure after a successful start.

pub type StartRegisteredError {
  ActorStartFailed(actor.StartError)
  GlobalRegisterFailed(registry.RegisterError)
}

Constructors

  • ActorStartFailed(actor.StartError)

    The OTP actor failed to start (init crashed, timeout, etc.).

  • GlobalRegisterFailed(registry.RegisterError)

    The actor started but :global registration failed; the orphaned actor process has been killed.
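A caller can branch on the two constructors to decide how to react. The sketch below is illustrative only: report_start_failure is a hypothetical helper, and it uses nothing from this module beyond the constructors above and start_registered_error_to_string.

```gleam
import distribute/actor as dist_actor
import gleam/io

// Hypothetical helper: log why start_registered failed.
pub fn report_start_failure(err: dist_actor.StartRegisteredError) -> Nil {
  let detail = dist_actor.start_registered_error_to_string(err)
  case err {
    // The actor never came up (init crashed, timed out, etc.).
    dist_actor.ActorStartFailed(_) ->
      io.println("actor failed to start: " <> detail)
    // The actor started, registration failed, and the orphan was killed.
    dist_actor.GlobalRegisterFailed(_) ->
      io.println("global registration failed: " <> detail)
  }
}
```

Because the orphaned process has already been killed in the GlobalRegisterFailed case, the caller has nothing to clean up and can retry or pick another name.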

Values

pub fn child_spec(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
  init_timeout_ms: Int,
) -> supervision.ChildSpecification(global.GlobalSubject(msg))

OTP child spec for a named actor that auto-registers on (re)start.

pub fn child_spec_default(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
) -> supervision.ChildSpecification(global.GlobalSubject(msg))

Like child_spec, with config.get().default_init_timeout_ms.

pub fn pool(
  typed_name: registry.TypedName(msg),
  size: Int,
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
  init_timeout_ms: Int,
) -> Result(process.Pid, actor.StartError)

Start N supervised actors, registered as name_1 .. name_N.

Returns Error(actor.InitFailed(...)) if size < 1. Without this guard, list.range would count downward from 1 and produce degenerate worker names (e.g. name_0, name_-1), silently starting a useless supervisor.
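The guard can be illustrated in isolation. The function below is a hypothetical sketch of the naming scheme (name_1 .. name_N), not the library's code; it assumes a gleam_stdlib version that provides list.range, which is inclusive and counts downward when the end is smaller than the start.

```gleam
import gleam/int
import gleam/list

// Hypothetical helper: build pool worker names, rejecting size < 1.
pub fn worker_names(base: String, size: Int) -> Result(List(String), Nil) {
  case size < 1 {
    // list.range(1, 0) would yield [1, 0], i.e. a bogus "name_0" worker.
    True -> Error(Nil)
    False ->
      list.range(1, size)
      |> list.map(fn(i) { base <> "_" <> int.to_string(i) })
      |> Ok
  }
}
```

With this sketch, worker_names("worker", 3) gives Ok(["worker_1", "worker_2", "worker_3"]) and worker_names("worker", 0) gives Error(Nil).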

Cascading Failure Risk: :global and fixed pools do not mix perfectly. If a single worker in the pool fails its global registration (e.g. because name_4 is legitimately held by a node on the other side of a network split), the worker crashes. The pool supervisor will try to restart it. If the conflict persists, the supervisor exhausts its MaxR restart intensity and crashes the entire pool. This means a collision on 1 worker brings down all N workers, causing the parent supervisor to restart the whole pool. This architectural limit will be solved natively when the syn registry backend is introduced in v4.2.0 (via syn process groups).

pub fn pool_default(
  typed_name: registry.TypedName(msg),
  size: Int,
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
) -> Result(process.Pid, actor.StartError)

Like pool, with config.get().default_init_timeout_ms.

pub fn start(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
  init_timeout_ms: Int,
) -> Result(global.GlobalSubject(msg), actor.StartError)

Start a named actor.

pub fn start_default(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
) -> Result(global.GlobalSubject(msg), actor.StartError)

Like start, but uses config.get().default_init_timeout_ms as the init timeout.

pub fn start_observed(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
  init_timeout_ms: Int,
  on_decode_error: fn(codec.DecodeError) -> Nil,
) -> Result(global.GlobalSubject(msg), actor.StartError)

Start a named actor, with a callback for decode errors.

Useful for logging or metering malformed messages across nodes (e.g. during rolling deploys with mismatched codec versions).
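A logging hook might look like the following sketch. start_with_decode_log is a hypothetical wrapper, the 5000 ms timeout is an arbitrary example value, and the arguments are passed through untyped so that only the start_observed signature shown above is relied upon.

```gleam
import distribute/actor as dist_actor
import gleam/io

// Hypothetical wrapper: start a named actor and log decode failures.
pub fn start_with_decode_log(typed_name, initial_state, handler) {
  dist_actor.start_observed(
    typed_name,
    initial_state,
    handler,
    // Init timeout in milliseconds (example value).
    5000,
    // Invoked with the codec.DecodeError when a message fails to decode.
    fn(_decode_error) {
      io.println("dropped a message that failed to decode")
    },
  )
}
```

In production the callback would more likely increment a counter or emit a structured log entry than print to stdout.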

pub fn start_registered(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
  init_timeout_ms: Int,
) -> Result(global.GlobalSubject(msg), StartRegisteredError)

Start an actor and register it globally. Kills the actor if registration fails.

See also: start_registered_default/3 (configured init timeout), start_registered_observed/5 (decode-error hook), start_supervised/4 (auto-restart on crash).

pub fn start_registered_default(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
) -> Result(global.GlobalSubject(msg), StartRegisteredError)

Like start_registered, but uses config.get().default_init_timeout_ms as the init timeout.

pub fn start_registered_error_to_string(
  err: StartRegisteredError,
) -> String

Render a StartRegisteredError as a human-readable string. Mirrors the *_error_to_string formatters published by the other error modules so observability paths can io.println an error directly without pattern-matching at every call site.

pub fn start_registered_observed(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
  init_timeout_ms: Int,
  on_decode_error: fn(codec.DecodeError) -> Nil,
) -> Result(global.GlobalSubject(msg), StartRegisteredError)

Start a named actor and register it globally, with a callback for decode errors.

pub fn start_resource_owner(
  open: fn() -> resource,
  close: fn(resource) -> Nil,
  lifetime: process.Pid,
) -> process.Pid

Spawn a dedicated process that owns an external resource and runs close(resource) when lifetime dies for any reason. Workaround for the missing {terminate, Reason} callback in gleam/otp/actor 1.x: external resources (database connections, distributed locks, NIF handles, OS pipes) cannot rely on the actor’s exit to free them, but a dedicated observer process can.

Returns the owner’s PID, in case you need to stop the owner independently of the lifetime PID.

Why this shape

The resource is opened inside the owner, so it is BEAM-process-owned by the owner. That matters for resources whose lifetime is tied to a specific process at the runtime level (ETS tables, ports, gen_tcp controlling process).

The owner is linked to the caller (usually the process running the actor’s init) and traps exits (process.trap_exits(True)). The link gives fail-fast behaviour: if open() crashes before it returns, the owner dies and the link pulls the caller down with it immediately. This prevents the “partial failure” scenario where the actor survives but its critical resource failed to initialize. Trapping exits means the caller’s own exit reaches the owner as a message instead of killing it, so close can still run.

The owner uses a process.monitor on lifetime. Monitor delivery is asynchronous but reliable: the BEAM guarantees a DOWN for every monitored PID, and a monitor on an already-dead PID fires immediately. That makes the helper safe to call right after the actor has started: even if there is a microsecond gap before the monitor goes up, the BEAM resolves it correctly.
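The already-dead case can be checked with a small experiment. This is a sketch assuming the gleam_erlang 1.x process API (process.spawn_unlinked, process.monitor, process.select_specific_monitor, process.selector_receive); monitor_fires_for_dead_pid is a hypothetical name.

```gleam
import gleam/erlang/process

// Returns True if a monitor placed on an already-dead PID delivers DOWN.
pub fn monitor_fires_for_dead_pid() -> Bool {
  let pid = process.spawn_unlinked(fn() { Nil })

  // Wait for a first DOWN so we know the process has exited.
  let first = process.monitor(pid)
  process.new_selector()
  |> process.select_specific_monitor(first, fn(_down) { Nil })
  |> process.selector_receive_forever

  // Monitor again: the PID is dead, yet a DOWN message still arrives.
  let late = process.monitor(pid)
  let result =
    process.new_selector()
    |> process.select_specific_monitor(late, fn(_down) { Nil })
    |> process.selector_receive(1000)

  case result {
    Ok(Nil) -> True
    Error(Nil) -> False
  }
}
```

The second monitor is the situation the helper faces when lifetime has died before the owner gets around to monitoring it.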

Usage

import distribute
import distribute/actor as dist_actor
import distribute/global

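// Start and globally register the actor, then look up its PID.
let assert Ok(gs) =
  distribute.start_registered(name, init, handler)
let assert Ok(actor_pid) = global.owner(gs)
// The owner opens the pool and closes it once actor_pid exits.
let _owner = dist_actor.start_resource_owner(
  fn() { open_postgres_pool() },
  fn(pool) { close_postgres_pool(pool) },
  actor_pid,
)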

If the actor needs to use the resource, have open build a process.Subject the owner serves and pass it through your actor’s init.

Failure modes

  • open raises: the owner dies before the resource is opened. Because the owner is linked to the caller (usually the actor), the caller receives an exit signal and crashes immediately. This fail-fast behaviour avoids partial failures where the actor runs without its resource.
  • close raises: the owner dies at the panic and no further cleanup runs. The resource may be left in a partially-closed state.
  • lifetime is already dead at call time: the monitor delivers DOWN immediately, and close runs as soon as the owner processes it.
  • The owner is process.kill-ed: the kill cannot be trapped, so close never runs. The BEAM still reclaims resources tied to the owner process (ETS tables, ports), but external handles leak. OTP itself has the same uncatchable-kill caveat.

Future

When gleam/otp/actor ships native termination callbacks (gleam-lang/otp#126), this helper becomes redundant for the actor case. The contract will not shift: callers can replace the actor.start_resource_owner(open, close, pid) call with the upstream API. The function will stay (deprecated) for the more general “tie cleanup to any PID” pattern.

pub fn start_supervised(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
  init_timeout_ms: Int,
) -> Result(process.Pid, actor.StartError)

Start a supervised actor that auto-registers on (re)start.

If registration fails, the worker crashes and the supervisor retries, which is the correct OTP pattern for transient registration failures.

pub fn start_supervised_default(
  typed_name: registry.TypedName(msg),
  initial_state: state,
  handler: fn(msg, state) -> receiver.HandlerStep(state),
) -> Result(process.Pid, actor.StartError)

Like start_supervised, with config.get().default_init_timeout_ms.
