dgen_registry_elector (DGen v0.3.0)


dgen_server callback module that tracks registry membership and elects a leader.

The elector's state (member map + current leader) is stored durably in the backend and is shared across every node that runs the same named registry. Messages arrive via a durable FIFO queue, so membership changes are processed one at a time, serialised through the backend.

Leader election

The incumbent leader is kept as long as it remains a member — leadership only changes when the incumbent leaves or no leader has been elected yet. This prevents thrashing when a non-leader node happens to win a backend transaction race.

When a new leader must be chosen (no valid incumbent), the node that wins the backend transaction race is preferred: if {node(), MemberName} is a current member, that node becomes leader. If it is not yet a member (a transient window during startup), the smallest live member ID, as returned by lists:min/1, is used as a deterministic fallback.
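
The selection rule above can be sketched as a pure function. This is a minimal illustration, not the module's elect_leader/4: the module name elect_sketch, the function elect/3, its argument order, and the choice to return undefined for an empty member map are all assumptions made for the example.

```erlang
-module(elect_sketch).
-export([elect/3]).

-type member_id() :: {node(), atom()}.

%% Incumbent: the current leader, or undefined if none has been elected.
%% Winner:    {node(), MemberName} of the node that won the transaction race.
%% Members:   the live member map, keyed by member_id().
-spec elect(member_id() | undefined, member_id(), #{member_id() => term()}) ->
          member_id() | undefined.
elect(Incumbent, Winner, Members) ->
    case Incumbent =/= undefined andalso maps:is_key(Incumbent, Members) of
        true ->
            %% Sticky leadership: a valid incumbent is always kept.
            Incumbent;
        false ->
            case maps:is_key(Winner, Members) of
                true ->
                    %% The race winner is a member, so it is preferred.
                    Winner;
                false when map_size(Members) =:= 0 ->
                    %% Assumption: with no members there is no leader.
                    undefined;
                false ->
                    %% Deterministic fallback over live member IDs.
                    lists:min(maps:keys(Members))
            end
    end.
```

Because the fallback relies only on Erlang term ordering, every node that evaluates it over the same member map picks the same leader.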

Replication on leadership change

When elect_leader/4 returns a leader different from the previous one, handle_cast_tx returns {lock, NewState}, triggering the following sequence before the lock clears:

  1. Commit + lock: the backend transaction commits the new member set and leader key atomically, then sets a distributed lock key that pauses all other elector consumers on all nodes. The lock is held for at most ?SnapshotTimeout × 2 ms, the worst-case duration of two synchronous cross-node calls in handle_locked/4. An Erlang after block guarantees the lock is cleared even if handle_locked raises; only a hard process kill (SIGKILL / VM abort) can leave it permanently set.

  2. Snapshot acquisition: handle_locked/4 decides which names snapshot the new leader starts with:

    • If the new leader is a brand-new member with no prior follower state (only possible when there is no valid incumbent, i.e. first join into an existing cluster), the elector calls the old leader via {transfer_snapshot}. The old leader flushes any pending registrations from its mailbox, returns its authoritative names list, and sets its own leader field to undefined, relinquishing leadership atomically in a single gen_server call.
    • Otherwise (an existing member takes over, or the leader died and a follower is promoted), the new leader uses self_snapshot: its own in-memory names map, which is already a follower replica.

  3. Leader assumption: the elector calls the new leader via {elector_assume_and_distribute, Snapshot, MemberId, AllIds, Tokens, Epoch}. The new leader stores the epoch, applies the snapshot, sets up erlang:monitor/2 for every registered pid, and casts {apply_names_snapshot, ..., Epoch} to every follower from its own process.

  4. Follower sync: each follower receives {apply_names_snapshot}, overwrites its names map, and updates its leader field. Because these casts originate from the same process as subsequent {name_registered} broadcasts, Erlang's per-pair FIFO guarantee ensures every follower sees the snapshot before any registration that post-dates the transition.

  5. Lock clears: handle_locked returns, the lock key is cleared, and all waiting consumers resume.
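
The guarantee that the lock is cleared even when handle_locked raises comes from Erlang's try ... after construct. The sketch below shows only that pattern. The module lock_sketch and the three funs passed in are hypothetical stand-ins for the backend's lock operations and for handle_locked/4; how the lock key is really set and cleared is not shown in this document.

```erlang
-module(lock_sketch).
-export([with_lock/3]).

%% SetLock, ClearLock and Body are hypothetical zero-arity funs standing in
%% for "set the lock key", "clear the lock key" and "run handle_locked/4".
-spec with_lock(fun(() -> ok), fun(() -> term()), fun(() -> Result)) -> Result.
with_lock(SetLock, ClearLock, Body) ->
    ok = SetLock(),
    try
        Body()
    after
        %% Runs on normal return and on throw/error/exit alike; the
        %% exception, if any, is re-raised once this section completes.
        ClearLock()
    end.
```

The after section only runs while the process is alive to run it, which is why a hard kill of the process or the VM can still leave the lock set.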

Leader key in the backend

Key path: {Tuid, <<"leader">>}
Value:    term_to_binary(MemberId | undefined)
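
A reader of the leader key gets back a binary and has to decode it. The following is a minimal sketch of the value round trip only, assuming the value is stored exactly as described above. The module leader_value_sketch and its function names are illustrative, and key packing is left out because it depends on the backend.

```erlang
-module(leader_value_sketch).
-export([encode/1, decode/1]).

-type member_id() :: {node(), atom()}.

%% Encode the leader (or undefined) the way the value is described above.
-spec encode(member_id() | undefined) -> binary().
encode(Leader) ->
    term_to_binary(Leader).

%% Decode a stored value and check that it has one of the two expected shapes.
-spec decode(binary()) -> member_id() | undefined.
decode(Bin) when is_binary(Bin) ->
    case binary_to_term(Bin) of
        undefined ->
            undefined;
        {Node, Name} = MemberId when is_atom(Node), is_atom(Name) ->
            MemberId
    end.
```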

Summary

Functions

handle_call/3
    Handles read-only priority calls: get_leader and get_members.

handle_cast_tx/3
    Processes membership change messages within a backend transaction.

handle_locked/4
    Executes the replication sequence after a leadership change, before the lock clears.

init/1
    Initialises the elector state with an empty member map and undefined leader.

leader_db_key/2
    Returns the packed backend key for the leader value.

Types

member_id()

-type member_id() :: {node(), atom()}.

member_info()

-type member_info() :: #{joined_at := integer(), join_token := reference()}.

registry_state()

-type registry_state() ::
          #{name := atom(),
            members := #{member_id() => member_info()},
            leader := member_id() | undefined,
            epoch := non_neg_integer()}.

Functions

handle_call/3

-spec handle_call(term(), dgen_server:from(), registry_state()) -> dgen_server:reply_ret().

Handles read-only priority calls: get_leader and get_members.

handle_cast_tx/3

Processes membership change messages within a backend transaction.

Handles {join, MemberId, Token} and {member_down, MemberId, Token}. Returns {lock, NewState} when leadership changes, {noreply, NewState} otherwise.

Each {join} carries a unique token (a reference() generated by the member process before enqueuing). The elector stores this token in member_info.

A {member_down, MemberId, Token} is silently discarded when its token does not match the stored token for that member — this means the member has rejoined with a new token since the DOWN was detected, so the message is stale. This prevents a partition-recovery race where a {member_down} enqueued during the disconnect is processed after the subsequent {join} that heals the cluster.
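
The token check can be sketched against a plain member map. This is an illustration under assumptions, not the module's code: the module member_token_sketch, the function names, and passing joined_at in as an argument are choices made for the example. Only the member_info shape comes from the type definitions above.

```erlang
-module(member_token_sketch).
-export([join/4, member_down/3]).

-type member_id() :: {node(), atom()}.
-type member_info() :: #{joined_at := integer(), join_token := reference()}.
-type members() :: #{member_id() => member_info()}.

%% A join always overwrites the stored token, so a rejoin invalidates any
%% member_down that still carries the previous token.
-spec join(member_id(), reference(), integer(), members()) -> members().
join(MemberId, Token, JoinedAt, Members)
  when is_reference(Token), is_integer(JoinedAt) ->
    Members#{MemberId => #{joined_at => JoinedAt, join_token => Token}}.

%% Remove the member only if the token matches the stored one.
-spec member_down(member_id(), reference(), members()) ->
          {removed, members()} | {stale, members()}.
member_down(MemberId, Token, Members) ->
    case Members of
        #{MemberId := #{join_token := Token}} ->
            {removed, maps:remove(MemberId, Members)};
        #{} ->
            %% Unknown member or mismatched token: discard silently.
            {stale, Members}
    end.
```

In the partition-recovery race, the member rejoins with a fresh token before the old member_down is dequeued, so the late message falls into the stale branch and the member map is left untouched.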

handle_locked/4

Executes the replication sequence after a leadership change, before the lock clears.

See the module doc for the full step-by-step sequence.

transfer_snapshot is called only when the new leader is a brand-new member with no prior follower state — i.e., Leader =:= MemberId (the joiner itself won) and at least one other member existed before it joined. With sticky leadership this only occurs when there is no valid incumbent (first join into an existing cluster). In all other cases self_snapshot is used.
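
The condition above reduces to a small predicate. The sketch below is a hedged reading of it, with hypothetical names (snapshot_choice_sketch, snapshot_source/3) and the assumption that the member map as it stood before the join is available to the caller.

```erlang
-module(snapshot_choice_sketch).
-export([snapshot_source/3]).

-type member_id() :: {node(), atom()}.

%% Leader:      the newly elected leader.
%% MemberId:    the member named in the message being processed.
%% PrevMembers: the member map before this message was applied (assumed input).
-spec snapshot_source(member_id(), member_id(), #{member_id() => term()}) ->
          transfer_snapshot | self_snapshot.
snapshot_source(Leader, MemberId, PrevMembers) ->
    JoinerWon = Leader =:= MemberId,
    %% "At least one other member existed before it joined."
    HadOthers = map_size(maps:remove(Leader, PrevMembers)) > 0,
    case JoinerWon andalso HadOthers of
        true -> transfer_snapshot;
        false -> self_snapshot
    end.
```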

All calls to member processes are wrapped in try/catch. If a target is unreachable:

  • transfer_snapshot failure: falls back to self_snapshot. The new leader starts with a potentially stale names map for that transition window.
  • elector_assume_and_distribute failure: the lock clears normally. The membership change is already committed to the backend; affected members self-correct on the next membership event.
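
The transfer_snapshot fallback can be sketched with two funs in place of the real cross-node calls. Everything named here (snapshot_fallback_sketch, acquire/2, the {ok, Names} reply shape, the result tags) is an assumption for illustration. The source states only that the call is wrapped in try/catch and falls back to self_snapshot.

```erlang
-module(snapshot_fallback_sketch).
-export([acquire/2]).

%% TransferFun stands in for the call to the old leader; SelfFun stands in
%% for reading the new leader's own names map. Both are hypothetical.
-spec acquire(fun(() -> term()), fun(() -> Names)) ->
          {transferred, term()} | {fallback, Names}.
acquire(TransferFun, SelfFun) ->
    try TransferFun() of
        {ok, Names} ->
            {transferred, Names};
        _Unexpected ->
            %% An unrecognised reply is treated like a failed call.
            {fallback, SelfFun()}
    catch
        _Class:_Reason ->
            %% Covers timeouts, noproc and nodedown exits from a remote call.
            {fallback, SelfFun()}
    end.
```

In real use TransferFun would wrap something like a gen_server:call with a timeout, which exits the caller when the target is unreachable. Catching every exception class keeps the replication sequence moving, at the cost of a possibly stale names map for that transition window.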

init/1

-spec init(#{name := atom()}) -> {ok, dgen_server:tuid(), registry_state()}.

Initialises the elector state with an empty member map and undefined leader.

leader_db_key(Dir, Tuid)

-spec leader_db_key(dgen_backend:dir(), dgen_server:tuid()) -> dgen_backend:key().

Returns the packed backend key for the leader value.

Exported so callers can set up a backend watch without going through the elector process.