reckon_db_cluster (reckon_db v2.3.7)


Cluster-level health and consistency facade.

A stable surface for the cluster-wide checks consumed by reckon_db_gateway_worker (and therefore by the gateway's HealthService RPCs). It delegates to the underlying primitives in reckon_db_consistency_checker and in ra/khepri.

This module deliberately does NOT depend on the reckon_db_consistency_checker gen_server being supervised: the per-call entry points (verify_membership_consensus/1, etc.) are pure functions that gather state from ra/khepri on demand. That makes the facade safe to call in both single and cluster modes, and means it does not rely on the periodic checker actually running.
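To make the on-demand pattern concrete, here is a minimal sketch of how a facade entry point might wrap a state-gathering call so that failures come back as {error, Reason} instead of crashing the caller. The module facade_sketch, the function on_demand/1, and the mapping of a noproc exit to not_started are all hypothetical; a real entry point would pass a fun that queries ra/khepri rather than one returning a literal map.

```erlang
%% Hypothetical sketch: not the reckon_db_cluster source.
-module(facade_sketch).
-export([on_demand/1]).

%% Run a state-gathering fun on demand and normalise the outcome to
%% {ok, Map} | {error, Reason}. No long-lived checker process is consulted,
%% so the call behaves the same whether or not a periodic checker runs.
-spec on_demand(fun(() -> map())) -> {ok, map()} | {error, term()}.
on_demand(Gather) when is_function(Gather, 0) ->
    try Gather() of
        Result when is_map(Result) -> {ok, Result};
        Other -> {error, {unexpected_result, Other}}
    catch
        %% Assumed mapping: a missing server process means "not started".
        exit:{noproc, _} -> {error, not_started};
        Class:Reason -> {error, {Class, Reason}}
    end.
```

Keeping the gatherer as a plain fun means the same wrapper works in single mode (one member) and cluster mode, since nothing about it depends on supervision.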

Historical note: prior to reckon_db 2.2.1, reckon_db_gateway_worker referenced this module's four functions, but the module itself was never extracted from the legacy esdb_cluster rename in 2.0.0. The dangling references caused HealthService.Check and the three VerifyXxx RPCs to hang in the retry loop until the gRPC client timed out. 2.2.1 ships this facade and gets those handlers working.

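Summary

Functions

check_log_consistency(StoreId): Verify Raft log consistency across followers.

health_check(StoreId): Quick health check: does the store have quorum and a leader?

verify_consistency(StoreId): Full cluster consistency check.

verify_membership(StoreId): Verify membership consensus.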

Functions

check_log_consistency(StoreId)

-spec check_log_consistency(atom()) -> {ok, map()} | {error, term()}.

Verify Raft log consistency across followers.

Collects per-follower term/index stats and checks for replication divergence. Used by the gateway's HealthService.CheckRaftLogConsistency RPC.
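The divergence check can be pictured with a small sketch. It assumes, purely for illustration, that the per-follower stats arrive as a map of follower name to #{term, index}, and it flags divergence when followers report more than one term or when the index spread exceeds a caller-supplied MaxLag. The module log_consistency_sketch, the input shape, and both rules are assumptions, not the checker's actual logic.

```erlang
%% Hypothetical sketch of a replication-divergence check; input shape assumed.
-module(log_consistency_sketch).
-export([check/2]).

%% Stats maps each follower to its last log entry, e.g.
%% #{n1 => #{term => 3, index => 120}, n2 => #{term => 3, index => 118}}.
%% MaxLag is the largest index gap tolerated before reporting divergence.
-spec check(#{atom() => #{term := non_neg_integer(), index := non_neg_integer()}},
            non_neg_integer()) -> map().
check(Stats, MaxLag) when map_size(Stats) > 0 ->
    Entries = maps:values(Stats),
    Terms = lists:usort([T || #{term := T} <- Entries]),
    Indexes = [I || #{index := I} <- Entries],
    Lag = lists:max(Indexes) - lists:min(Indexes),
    Status = case {Terms, Lag =< MaxLag} of
                 {[_], true} -> consistent;  %% single term, spread within bound
                 _ -> diverged                %% mixed terms or a follower too far behind
             end,
    #{status => Status, terms => Terms, max_lag => Lag}.
```

Treating mixed terms as divergence is a deliberately strict heuristic; a real check might tolerate a brief term mismatch during an election.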

health_check(StoreId)

-spec health_check(atom()) -> {ok, map()} | {error, term()}.

Quick health check — does the store have quorum and a leader?

Cheap. No RPCs to other nodes. Suitable for liveness probes and the gateway's HealthService.Check RPC.

Returns {ok, #{status => healthy | degraded | no_quorum, ...}} on success, or {error, Reason} when the store isn't reachable at all (e.g. the coordinator has not started, or the store is not known to ra).
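Because the return shape is documented, a liveness probe only needs to pattern-match on it. The sketch below maps the result to an HTTP status code; the module probe_sketch, the function http_status/1, and the choice to keep degraded nodes in rotation (200) are hypothetical policy, not part of reckon_db.

```erlang
%% Hypothetical probe mapping; the status codes are a policy choice.
-module(probe_sketch).
-export([http_status/1]).

%% Maps a health_check/1-style result to an HTTP status code.
-spec http_status({ok, map()} | {error, term()}) -> 200 | 503.
http_status({ok, #{status := healthy}}) -> 200;
http_status({ok, #{status := degraded}}) -> 200;  %% not no_quorum, so keep serving
http_status({ok, #{status := no_quorum}}) -> 503;
http_status({ok, _Other}) -> 503;                 %% unknown shape: fail closed
http_status({error, _Reason}) -> 503.             %% store not reachable at all
```

In a probe handler it would be applied as probe_sketch:http_status(reckon_db_cluster:health_check(my_store)), where my_store is a placeholder store id.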

verify_consistency(StoreId)

-spec verify_consistency(atom()) -> {ok, map()} | {error, term()}.

Full cluster consistency check.

Combines membership consensus and leader consensus into a single verdict. Quorum is checked first as a cheap precondition. Used by the gateway's HealthService.VerifyClusterConsistency RPC.

Returns {ok, #{status => healthy | degraded | split_brain | no_quorum, membership => ..., leader => ...}} or {error, Reason}.
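One way to picture how the sub-results could fold into a single verdict is the sketch below. The inputs (a quorum boolean, a membership-agreement boolean, and a map of each node's reported leader) and the precedence (missing quorum first, then more than one distinct leader as split_brain, then anything short of full agreement as degraded) are assumptions for illustration; the module verdict_sketch is hypothetical.

```erlang
%% Hypothetical verdict combiner; the precedence rules are assumptions.
-module(verdict_sketch).
-export([verdict/3]).

%% HasQuorum: result of the cheap quorum precondition.
%% MembershipAgreed: whether every node reported the same member set.
%% LeaderViews: each node's idea of the current leader (undefined if none).
-spec verdict(boolean(), boolean(), #{node() => node() | undefined}) ->
          healthy | degraded | split_brain | no_quorum.
verdict(false, _MembershipAgreed, _LeaderViews) ->
    no_quorum;                                  %% precondition failed: stop early
verdict(true, MembershipAgreed, LeaderViews) ->
    Views = maps:values(LeaderViews),
    Leaders = lists:usort([L || L <- Views, L =/= undefined]),
    case {Leaders, MembershipAgreed, lists:member(undefined, Views)} of
        {[_, _ | _], _, _} -> split_brain;      %% two or more distinct leaders
        {[_], true, false} -> healthy;          %% one leader, everyone agrees
        _ -> degraded                           %% no leader, or partial agreement
    end.
```

Checking quorum in its own clause mirrors the cheap-precondition ordering described above: nothing else is inspected when it fails.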

verify_membership(StoreId)

-spec verify_membership(atom()) -> {ok, map()} | {error, term()}.

Verify membership consensus.

Collects each node's view of cluster membership via RPC and confirms they agree. Used by the gateway's HealthService.VerifyMembershipConsensus RPC.
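Once each node's membership view has been collected, the agreement step reduces to comparing normalised lists. This sketch assumes the views arrive as a map of node to member list and omits the RPC fan-out entirely; the module membership_sketch, the function agree/1, and its return shape are hypothetical.

```erlang
%% Hypothetical agreement check over already-collected views; the RPC
%% fan-out that produces Views is not shown.
-module(membership_sketch).
-export([agree/1]).

%% Views maps each responding node to the member list it reported.
%% Lists are sorted first so ordering differences are not flagged.
-spec agree(#{node() => [node()]}) ->
          {ok, [node()]} | {error, {disagreement, #{node() => [node()]}}}.
agree(Views) ->
    Normalised = maps:map(fun(_Node, Members) -> lists:usort(Members) end, Views),
    case lists:usort(maps:values(Normalised)) of
        [Members] -> {ok, Members};               %% every node saw the same set
        _ -> {error, {disagreement, Normalised}}  %% zero or several distinct views
    end.
```

Returning the normalised views on disagreement lets a caller report which nodes hold which membership, rather than a bare failure.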