Ra state machine for a single FerricStore shard.
Each shard is an independent Raft group. The state machine receives write
commands via apply/3, which deterministically applies them to both the
Bitcask persistent store (via synchronous NIF) and the ETS hot cache.
Callbacks
init/1-- receives the shard config, returns initial machine state.apply/3-- deterministic command application (called on every node). Supports:put,:delete, and:batchcommands.state_enter/2-- lifecycle hook for leader/follower transitions.tick/2-- periodic callback (unused currently, placeholder for metrics).init_aux/1-- initializes non-replicated auxiliary state.handle_aux/5-- handles non-replicated auxiliary commands (new API).overview/1-- returns a summary map for debugging/monitoring.
Design notes
Per the spec (section 2C.4):
apply/3is deterministic and runs on every node in the Raft group.- Only synchronous NIF calls are allowed inside
apply/3. - Effects (
send_msg,release_cursor) are returned as the third element of the apply return tuple. - In single-node mode, the shard's Raft group has one member (self quorum), so every write commits immediately after local log append + fsync.
HLC piggybacking (spec 2G.6)
HLC timestamps are piggybacked on Raft commands. The Batcher stamps each
command with the leader's current HLC timestamp before submitting it to ra.
When apply/3 processes a command carrying an hlc_ts metadata map, it
calls HLC.update/1 to merge the leader's clock into the local node's HLC.
In single-node mode this merge is a no-op (the node merges its own timestamp). In multi-node clusters, followers use this to stay causally synchronized with the leader's clock, bounding inter-node TTL precision to Raft heartbeat RTT (~10 ms).
Commands may arrive in two forms:
- Wrapped:
{inner_command, %{hlc_ts: {physical_ms, logical}}}-- the metadata map carries the leader's HLC timestamp for merging. - Unwrapped:
inner_command(legacy / test) -- processed as before without HLC merging.
Log compaction (spec 2E.5)
The Raft log grows unbounded unless compacted. Every
:release_cursor_interval applied commands (default: 1000), apply/3
emits a {:release_cursor, ra_index, state} effect. This tells ra that
all log entries up to ra_index are fully reflected in the given state
snapshot and can be safely truncated.
The interval is stored in the machine state at init time (from the config
map or application env) so that apply/3 remains deterministic -- it never
reads runtime configuration.
Summary
Functions
Applies a replicated command to the shard state.
Handles non-replicated auxiliary commands (5-arity new API).
Initializes the state machine for a shard.
Initializes non-replicated auxiliary state.
Returns a summary map for debugging and monitoring.
Lifecycle hook called when the Raft node transitions roles.
Periodic tick callback. Returns a list of effects (currently empty).
Types
@type shard_state() :: %{ shard_index: non_neg_integer(), shard_data_path: binary(), active_file_id: non_neg_integer(), active_file_path: binary(), ets: atom(), applied_count: non_neg_integer(), release_cursor_interval: pos_integer() }
Functions
Applies a replicated command to the shard state.
Supported commands:
{:put, key, value, expire_at_ms}-- Write a key-value pair with optional expiry. Writes to Bitcask (sync NIF) and updates ETS.{:delete, key}-- Delete a key. Writes a tombstone to Bitcask, removes from ETS.{:batch, commands}-- Apply a list of commands atomically. Each command in the batch is a tuple matching one of the above forms. Returns{:ok, results}where results is a list of individual command results.{:list_op, key, operation}-- Execute a list operation (LPUSH, RPUSH, LPOP, RPOP, etc.) as an atomic read-modify-write. Reads the current value from ETS/Bitcask, delegates toListOps.execute/4, and persists the result.{:compound_put, compound_key, value, expire_at_ms}-- Write a hash/set/zset field. Inserts{compound_key, value, expire_at_ms}into ETS and Bitcask.{:compound_delete, compound_key}-- Delete a hash/set/zset field. Removes the compound key from ETS and Bitcask.{:compound_delete_prefix, prefix}-- Delete all compound keys matching the given prefix from ETS and Bitcask. Used by DEL on data structures (hashes, sets, sorted sets) to clean up all fields.{:incr_float, key, delta}-- Atomic read-modify-write float increment. Reads the current value, parses as float, addsdelta, formats the result, and writes back. Returns{:ok, new_float_string}or{:error, "ERR value is not a valid float"}.{:append, key, suffix}-- Atomic read-modify-write append. Reads the current value (or""), concatenatessuffix, writes back. Returns{:ok, byte_size(new_value)}.{:getset, key, new_value}-- Atomic get-and-set. Reads the old value, writes the new value with no expiry, returns the old value (ornil).{:getdel, key}-- Atomic get-and-delete. Reads the value, deletes the key, returns the value (ornil).{:getex, key, expire_at_ms}-- Atomic get-and-update-expiry. Reads the value, re-writes with the newexpire_at_ms, returns the value (ornil).{:setrange, key, offset, value}-- Atomic set-range. Reads the current value, pads with zero bytes if needed, replaces bytes atoffset, writes back. Returns{:ok, byte_size(new_value)}.{:cas, key, expected, new_value, ttl_ms}-- Compare-and-swap. Reads the current value; if it matchesexpected, writesnew_valuewith optional TTL. Returns1(swapped),0(mismatch), ornil(key missing/expired).{:lock, key, owner, ttl_ms}-- Distributed lock acquire. If the key does not exist, is expired, or is already held by the same owner, sets{owner, ttl}. Returns:okor{:error, reason}.{:unlock, key, owner}-- Distributed lock release. If the key exists and the owner matches, deletes the key. Returns1on success,{:error, reason}on owner mismatch.{:extend, key, owner, ttl_ms}-- Distributed lock TTL extension. If the key exists and the owner matches, updates the TTL. Returns1on success,{:error, reason}on owner mismatch or missing key.{:ratelimit_add, key, window_ms, max, count}-- Sliding window rate limiter. Reads counters, rotates windows, computes effective count, and updates. Returns[status, count, remaining, ttl_ms].
Returns {new_state, result} or {new_state, result, effects}.
Handles non-replicated auxiliary commands (5-arity new API).
The int_state parameter is ra's internal state and must be passed back
unchanged in the return tuple.
Currently supports:
{:cast, {:key_written, key}}-- Increments a local hot-key counter.
@spec init(map()) :: shard_state()
Initializes the state machine for a shard.
The config map must include (v2 -- path-based, no NIF store reference):
:shard_index-- zero-based shard index:shard_data_path-- absolute path to the shard's Bitcask data directory:active_file_id-- numeric ID of the active log file:active_file_path-- absolute path to the active log file:ets-- ETS table name (already created)
Optional:
:release_cursor_interval-- number of applies between release_cursor effects (default: 20000). Can also be set viaApplication.get_env(:ferricstore, :release_cursor_interval).
Returns the initial machine state.
Initializes non-replicated auxiliary state.
Aux state is local to each node and not replicated via Raft. Used for tracking hot-key statistics and other node-local metadata.
Returns a summary map for debugging and monitoring.
Includes the shard index, ETS keydir size, total applied command count, and the release_cursor interval.
Lifecycle hook called when the Raft node transitions roles.
When becoming leader, generates a fresh HLC timestamp via HLC.now/0 to
ensure the leader's clock is up to date before it starts stamping commands.
This is a side-effect only -- it does not affect the deterministic state
machine output.
In single-node mode, the node is always the leader. In multi-node clusters, this can be used to start/stop leader-only processes (e.g., merge scheduler, active expiry sweeper).
Returns a list of effects (currently empty).
Periodic tick callback. Returns a list of effects (currently empty).