Shard-by-shard data directory copy for new node sync.
Provides WAL gap detection to avoid unnecessary full copies, per-shard sync status tracking, leader-aware copy source resolution, and automatic retry with partial cleanup on failure.
Summary
Functions
read_last_applied_from_disk/2: Reads the last applied Raft index from a ra meta.dets file on disk.
retry_sync_shard/4: Retries sync_shard/3 up to max_retries times, cleaning up partial data on the target node between attempts.
sync_all_shards/2: Copies all shards sequentially, tracking per-shard sync status.
sync_shard/3: Syncs a single shard's data to a target node.
wal_bridgeable?/2: Pure WAL gap check: given the target's last applied index and the leader's first available WAL index, determines if WAL replay can bridge the gap.
Functions
@spec needs_resync?(non_neg_integer(), node(), node()) :: :wal_bridgeable | :needs_resync
@spec read_last_applied_from_disk(binary(), non_neg_integer()) :: non_neg_integer()
Reads the last applied Raft index from a ra meta.dets file on disk.
Used when a node boots from a disk clone (for example, an EBS snapshot). The Raft index that the cloned data is consistent up to must be known before ra is started, because the cloned WAL carries the wrong node IDs and must be deleted.
Returns the last_applied index, or 0 if the file doesn't exist or can't be read.
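The read-or-zero behaviour described above can be sketched with Erlang's `:dets` module. This is a minimal illustration, not the module's implementation: the `MetaReadSketch` module, its `read_last_applied/2` function, and the assumed record layout `{uid, term, voted_for, last_applied}` are hypothetical, and the real function takes a binary and a non-negative integer rather than a file path and a UID.

```elixir
defmodule MetaReadSketch do
  @moduledoc false

  # Assumption: each record is a tuple keyed by the server UID whose fourth
  # element is the last applied index: {uid, term, voted_for, last_applied}.
  @spec read_last_applied(Path.t(), term()) :: non_neg_integer()
  def read_last_applied(meta_path, uid) do
    # A unique table name avoids clashing with a table already open in this VM.
    table = {:meta_read_sketch, make_ref()}
    opts = [file: String.to_charlist(meta_path), access: :read]

    case :dets.open_file(table, opts) do
      {:ok, ^table} ->
        try do
          case :dets.lookup(table, uid) do
            [{^uid, _term, _voted_for, last_applied}]
            when is_integer(last_applied) and last_applied >= 0 ->
              last_applied

            # Missing key, unexpected record shape, or a lookup error.
            _other ->
              0
          end
        after
          :dets.close(table)
        end

      # Missing file, unreadable file, or not a dets file.
      {:error, _reason} ->
        0
    end
  end
end
```

Opening the table with `access: :read` keeps the sketch from modifying the cloned file, and every failure path collapses to `0`, matching the documented fallback.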
@spec retry_sync_shard(non_neg_integer(), node(), FerricStore.Instance.t(), non_neg_integer()) :: {:ok, :wal_bridgeable | non_neg_integer()} | {:error, term()}
Retries sync_shard/3 up to max_retries times, cleaning up partial data
on the target node between attempts.
Parameters
- shard_index -- zero-based shard index
- target_node -- the node to sync data to
- ctx -- the FerricStore instance context
- max_retries -- maximum number of attempts (default: 3)
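The retry behaviour can be pictured as a small recursive loop. The sketch below is an assumption-laden illustration: `RetrySketch`, `sync_fun`, and `cleanup_fun` are hypothetical names, with the two callbacks standing in for sync_shard/3 and the cleanup of partial data on the target node. Cleanup here runs only between attempts, as the description states; whether the real function also cleans up after the final failure is not specified.

```elixir
defmodule RetrySketch do
  @moduledoc false

  # `sync_fun` and `cleanup_fun` are hypothetical zero-arity callbacks.
  # `sync_fun` returns {:ok, detail} or {:error, reason}.
  def retry(sync_fun, cleanup_fun, max_retries \\ 3)
      when is_function(sync_fun, 0) and is_function(cleanup_fun, 0) and max_retries >= 1 do
    attempt(sync_fun, cleanup_fun, max_retries, 1)
  end

  defp attempt(sync_fun, cleanup_fun, max_retries, n) do
    case sync_fun.() do
      {:ok, _detail} = ok ->
        ok

      # Last attempt failed: surface the error without another cleanup pass.
      {:error, _reason} = error when n >= max_retries ->
        error

      # Earlier attempt failed: remove partial data, then try again.
      {:error, _reason} ->
        cleanup_fun.()
        attempt(sync_fun, cleanup_fun, max_retries, n + 1)
    end
  end
end
```

Passing the sync and cleanup steps as functions keeps the loop testable without a cluster; a caller would wrap the real per-shard calls in closures.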
@spec sync_all_shards(node(), FerricStore.Instance.t()) :: {:ok, map()} | {:error, term()}
Copies all shards sequentially, tracking per-shard sync status.
Returns {:ok, results} when every shard succeeds, where results is a
map of shard_index => {:synced, detail}. On partial failure returns
{:error, {:partial_sync, results}} with per-shard success/failure info.
Parameters
- target_node -- the node to sync data to
- ctx -- the FerricStore instance context
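The result shapes described above can be illustrated with a short aggregation sketch. `SyncAllSketch`, `shard_count`, and `sync_fun` are hypothetical; `sync_fun` stands in for the per-shard sync call. The `{:failed, reason}` tuple used for failed shards is an assumption, since the documentation only promises "per-shard success/failure info".

```elixir
defmodule SyncAllSketch do
  @moduledoc false

  # `sync_fun` is a hypothetical one-arity callback that takes a shard index
  # and returns {:ok, detail} or {:error, reason}.
  def sync_all(shard_count, sync_fun)
      when is_integer(shard_count) and shard_count >= 1 and is_function(sync_fun, 1) do
    # Map.new/2 walks the range in order, so shards are synced one at a time.
    results =
      Map.new(0..(shard_count - 1), fn shard_index ->
        status =
          case sync_fun.(shard_index) do
            {:ok, detail} -> {:synced, detail}
            # Assumed shape for failures; not taken from the documentation.
            {:error, reason} -> {:failed, reason}
          end

        {shard_index, status}
      end)

    all_synced? =
      Enum.all?(results, fn {_index, status} -> match?({:synced, _}, status) end)

    if all_synced?, do: {:ok, results}, else: {:error, {:partial_sync, results}}
  end
end
```

A caller can pattern match on `{:error, {:partial_sync, results}}` and retry only the shards that did not reach `{:synced, _}`.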
@spec sync_shard(non_neg_integer(), node(), FerricStore.Instance.t()) :: {:ok, :wal_bridgeable | non_neg_integer()} | {:error, term()}
Syncs a single shard's data to a target node.
Resolves the current leader for the shard and copies data FROM the leader (not from the local node). Before copying, checks whether the target can catch up via WAL replay alone -- if so, the expensive data copy is skipped.
- Find leader for the shard
- Check WAL bridgeability
- If resync needed: pause writes, copy data + ra dir, resume writes
- Return {:ok, detail} with :wal_bridgeable or the Raft index at copy time
Parameters
- shard_index -- zero-based shard index
- target_node -- the node to sync data to
- ctx -- the FerricStore instance context
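The four steps listed for sync_shard/3 can be sketched as a single control flow. Everything in the `ops` map below is a hypothetical callback introduced for illustration (leader lookup, gap check, pause, copy, resume); none of these names comes from the documented API. The use of `try/after` to guarantee that writes resume is a design assumption of this sketch.

```elixir
defmodule SyncShardSketch do
  @moduledoc false

  # `ops` is a map of hypothetical callbacks:
  #   find_leader.(shard_index)                 -> {:ok, leader} | {:error, reason}
  #   gap_check.(shard_index, leader, target)   -> :wal_bridgeable | :needs_resync
  #   pause_writes.(shard_index)                -> :ok
  #   copy.(shard_index, leader, target)        -> {:ok, raft_index} | {:error, reason}
  #   resume_writes.(shard_index)               -> any
  def sync(shard_index, target_node, ops) do
    with {:ok, leader} <- ops.find_leader.(shard_index) do
      case ops.gap_check.(shard_index, leader, target_node) do
        # WAL replay is enough, so the expensive copy is skipped.
        :wal_bridgeable ->
          {:ok, :wal_bridgeable}

        :needs_resync ->
          :ok = ops.pause_writes.(shard_index)

          try do
            # Data is copied from the leader, not from the local node.
            ops.copy.(shard_index, leader, target_node)
          after
            # Writes resume even if the copy raises.
            ops.resume_writes.(shard_index)
          end
      end
    end
  end
end
```

If the leader lookup fails, `with` returns the `{:error, reason}` tuple unchanged, so callers see one uniform error shape.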
@spec wal_bridgeable?(non_neg_integer(), non_neg_integer()) :: :wal_bridgeable | :needs_resync
Pure WAL gap check: given the target's last applied index and the leader's first available WAL index, determines if WAL replay can bridge the gap.
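A pure check of this kind might look like the sketch below. The exact boundary is an assumption: here the gap counts as bridgeable when the leader still holds the entry immediately after the target's last applied index, that is, when `leader_first_index <= target_last_applied + 1`. The module name `WalGapSketch` is hypothetical, and the real function may draw the line differently.

```elixir
defmodule WalGapSketch do
  @moduledoc false

  # Assumption: the target needs every entry after `target_last_applied`,
  # and the leader can replay entries from `leader_first_index` onward.
  @spec wal_bridgeable?(non_neg_integer(), non_neg_integer()) ::
          :wal_bridgeable | :needs_resync
  def wal_bridgeable?(target_last_applied, leader_first_index)
      when is_integer(target_last_applied) and is_integer(leader_first_index) do
    if leader_first_index <= target_last_applied + 1 do
      :wal_bridgeable
    else
      # The leader has already truncated entries the target still needs.
      :needs_resync
    end
  end
end
```

Under this assumption, a target at index 100 is bridgeable when the leader's WAL starts at 90 or 101, and needs a full resync when it starts at 150. Because the function takes only two integers and touches no node state, it can be unit tested without a running cluster.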