Ferricstore.Cluster.DataSync (ferricstore v0.4.0)

Shard-by-shard data directory copy for new node sync.

Provides WARaft segment-log gap detection to avoid unnecessary full copies, per-shard sync status tracking, leader-aware copy source resolution, and automatic retry with partial cleanup on failure.

Summary

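Functions

needs_resync?(shard_index, target_node, leader_node)

read_last_applied_from_disk(data_dir, shard_index)

Reads the persisted replay-safe index for a copied shard.

retry_sync_shard(shard_index, target_node, ctx, max_retries \\ 3)

Retries sync_shard/3 up to max_retries times, cleaning up partial data on the target node between attempts.

sync_all_shards(target_node, ctx)

Copies all shards sequentially, tracking per-shard sync status.

sync_shard(shard_index, target_node, ctx)

Syncs a single shard's data to a target node.

wal_bridgeable?(target_index, leader_first_index)

Pure segment-log gap check: given the target's replay-safe index and the leader's first available segment index, determines if log replay can bridge the gap.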

Functions

needs_resync?(shard_index, target_node, leader_node)

@spec needs_resync?(non_neg_integer(), node(), node()) ::
  :wal_bridgeable | :needs_resync

read_last_applied_from_disk(data_dir, shard_index)

@spec read_last_applied_from_disk(binary(), non_neg_integer()) :: non_neg_integer()

Reads the persisted replay-safe index for a copied shard.

Returns 0 when the marker is absent or unreadable.
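
The fallback-to-zero behaviour can be illustrated with a minimal sketch. The marker's file name and on-disk encoding are not documented here, so this example assumes a hypothetical marker file that holds the index as a decimal string; `MarkerSketch` and `read_index/1` are illustrative names, not part of the library.

```elixir
defmodule MarkerSketch do
  # Hypothetical helper: the real marker path and encoding are not documented.
  # Assumes the marker stores a non-negative integer as decimal text.
  @spec read_index(Path.t()) :: non_neg_integer()
  def read_index(path) do
    with {:ok, contents} <- File.read(path),
         {index, ""} when index >= 0 <- Integer.parse(String.trim(contents)) do
      index
    else
      # Missing file, unreadable file, or malformed contents all map to 0.
      _ -> 0
    end
  end
end
```

Returning 0 rather than an error tuple means a caller can treat a missing marker the same way as a shard that has applied nothing yet.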

retry_sync_shard(shard_index, target_node, ctx, max_retries \\ 3)

@spec retry_sync_shard(
  non_neg_integer(),
  node(),
  FerricStore.Instance.t(),
  non_neg_integer()
) ::
  {:ok, :wal_bridgeable | non_neg_integer()} | {:error, term()}

Retries sync_shard/3 up to max_retries times, cleaning up partial data on the target node between attempts.

Parameters

  • shard_index -- zero-based shard index
  • target_node -- the node to sync data to
  • ctx -- the FerricStore instance context
  • max_retries -- maximum number of attempts (default: 3)
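
A retry loop of this shape can be sketched as follows. This is not the library's implementation: `RetrySketch`, `sync_fun`, and `cleanup_fun` are hypothetical, with the two callbacks standing in for sync_shard/3 and for the partial-data cleanup on the target node. The sketch treats max_retries as the total number of attempts, following the parameter description above.

```elixir
defmodule RetrySketch do
  # sync_fun and cleanup_fun are hypothetical zero-arity callbacks.
  # sync_fun must return {:ok, detail} or {:error, reason}.
  def retry(sync_fun, cleanup_fun, max_retries \\ 3)
      when is_function(sync_fun, 0) and is_function(cleanup_fun, 0) and
             is_integer(max_retries) and max_retries >= 1 do
    attempt(sync_fun, cleanup_fun, max_retries, nil)
  end

  # No attempts left: surface the most recent failure reason.
  defp attempt(_sync_fun, _cleanup_fun, 0, last_reason), do: {:error, last_reason}

  defp attempt(sync_fun, cleanup_fun, attempts_left, _last_reason) do
    case sync_fun.() do
      {:ok, detail} ->
        {:ok, detail}

      {:error, reason} ->
        # Remove partial data before trying again. This sketch also cleans up
        # after the final failed attempt, which is an assumption.
        cleanup_fun.()
        attempt(sync_fun, cleanup_fun, attempts_left - 1, reason)
    end
  end
end
```

Passing the sync and cleanup steps as functions keeps the loop itself free of cluster calls, so it can be exercised without a running node.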

sync_all_shards(target_node, ctx)

@spec sync_all_shards(node(), FerricStore.Instance.t()) ::
  {:ok, map()} | {:error, term()}

Copies all shards sequentially, tracking per-shard sync status.

Returns {:ok, results} when every shard succeeds, where results is a map of shard_index => {:synced, detail}. On partial failure, returns {:error, {:partial_sync, results}}, where results carries per-shard success/failure info.

Parameters

  • target_node -- the node to sync data to
  • ctx -- the FerricStore instance context
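
A caller usually needs to tell the full-success and partial-failure shapes apart. The sketch below pattern-matches on the two documented return shapes; `SyncReportSketch` is a hypothetical helper. Because the shape of a failed shard's entry is not documented, it treats any entry that is not {:synced, detail} as a failure.

```elixir
defmodule SyncReportSketch do
  # Hypothetical helper that condenses a sync_all_shards/2-style result.
  def summarize({:ok, results}) when is_map(results) do
    {:all_synced, map_size(results)}
  end

  def summarize({:error, {:partial_sync, results}}) when is_map(results) do
    # Assumption: anything other than {:synced, _} marks a failed shard.
    failed =
      for {shard_index, status} <- results,
          not match?({:synced, _}, status),
          do: shard_index

    {:partial, Enum.sort(failed)}
  end

  # Any other error (for example, one raised before the copy starts).
  def summarize({:error, reason}), do: {:failed, reason}
end
```

The sorted list of failed shard indexes could then be fed one at a time to retry_sync_shard/4.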

sync_shard(shard_index, target_node, ctx)

@spec sync_shard(non_neg_integer(), node(), FerricStore.Instance.t()) ::
  {:ok, :wal_bridgeable | non_neg_integer()} | {:error, term()}

Syncs a single shard's data to a target node.

Resolves the current leader for the shard and copies data FROM the leader (not from the local node). Before copying, checks whether the target can catch up via WARaft segment replay alone -- if so, the expensive data copy is skipped.

  1. Find leader for the shard
  2. Check segment-log bridgeability
  3. If resync needed: pause writes, copy data, resume writes
  4. Return {:ok, detail} with :wal_bridgeable or the Raft index at copy time

Parameters

  • shard_index -- zero-based shard index
  • target_node -- the node to sync data to
  • ctx -- the FerricStore instance context
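
The numbered steps in the description can be sketched as a small control-flow skeleton. Everything here is illustrative: `SyncFlowSketch` and the callbacks in `deps` (find_leader, gap_check, pause_writes, copy, resume_writes) are hypothetical stand-ins for internals this page does not expose. Resuming writes in an `after` block is a design choice of the sketch, not a documented guarantee.

```elixir
defmodule SyncFlowSketch do
  # deps is a map of hypothetical callbacks standing in for library internals.
  def sync_shard(shard_index, target_node, deps) do
    # Step 1: resolve the current leader; a non-matching result is returned as-is.
    with {:ok, leader} <- deps.find_leader.(shard_index) do
      # Step 2: check whether log replay alone can bring the target up to date.
      case deps.gap_check.(shard_index, target_node, leader) do
        :wal_bridgeable ->
          # Step 4 (cheap path): no data copy needed.
          {:ok, :wal_bridgeable}

        :needs_resync ->
          # Step 3: pause writes, copy from the leader, always resume writes.
          :ok = deps.pause_writes.(shard_index)

          try do
            # Expected to return {:ok, raft_index} or {:error, reason}.
            deps.copy.(shard_index, leader, target_node)
          after
            deps.resume_writes.(shard_index)
          end
      end
    end
  end
end
```

Wrapping the copy in try/after keeps the shard from staying paused if the copy raises partway through.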

wal_bridgeable?(target_index, leader_first_index)

@spec wal_bridgeable?(non_neg_integer(), non_neg_integer()) ::
  :wal_bridgeable | :needs_resync

Pure segment-log gap check: given the target's replay-safe index and the leader's first available segment index, determines if log replay can bridge the gap.
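
A pure check of this kind might look like the sketch below. The exact comparison is not stated on this page, so the rule used here is an assumption: replay is possible when the leader still holds the entry immediately after the target's replay-safe index, that is, when leader_first_index <= target_index + 1. `GapCheckSketch` is a hypothetical module name.

```elixir
defmodule GapCheckSketch do
  # Hypothetical stand-in for the gap check; the comparison rule is an assumption.
  @spec wal_bridgeable?(non_neg_integer(), non_neg_integer()) ::
          :wal_bridgeable | :needs_resync
  def wal_bridgeable?(target_index, leader_first_index)
      when is_integer(target_index) and target_index >= 0 and
             is_integer(leader_first_index) and leader_first_index >= 0 do
    if leader_first_index <= target_index + 1 do
      # The leader's log still covers every entry the target is missing.
      :wal_bridgeable
    else
      # Entries between the two indexes were already truncated on the leader.
      :needs_resync
    end
  end
end
```

Because the function depends only on its two integer arguments, it can be unit-tested without any cluster or disk state.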