Ferricstore.Store.BitcaskCheckpointer (ferricstore v0.3.2)


Per-shard background fsync for Bitcask data files.

Replaces the per-apply v2_fsync in StateMachine.flush_pending_writes and the old shard-level fsync_needed deferred fsync timer. One shared mechanism, one shared flag (atomics on the Instance), covering all write paths (Raft state machine + async BitcaskWriter).

Correctness

Ra WAL is the source of truth for client-visible durability. Writes hit Bitcask data files via v2_append_batch_nosync (page cache only). On a crash, the Ra log replays any post-checkpoint entries and rebuilds the Bitcask state exactly — no acknowledged data is lost.

The checkpointer's job is to move data from page cache to disk on a predictable cadence, bounding replay time after kernel panic.

Algorithm

every checkpoint_interval_ms:
  if :atomics.get(checkpoint_flags, idx+1) == 1:
    :atomics.put(checkpoint_flags, idx+1, 0)   # clear BEFORE fsync
    {_fid, active_path, _sp} = ActiveFile.get(idx)
    NIF.v2_fsync_async(self(), corr_id, active_path)
  else: skip (idle shard → no syscalls)

Clearing the flag before firing async-fsync is intentional: a writer that arrives during the fsync re-sets the flag, so the next tick picks it up. The current fsync may miss bytes from that concurrent write, which is fine because Ra WAL is authoritative.
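The flag is an Erlang `:atomics` slot, which is 1-indexed — hence the `idx+1` in the pseudocode. A minimal sketch of the handshake (the shard count of 4 is an arbitrary example; `:atomics.exchange/3` performs the get-then-put of the pseudocode as a single atomic step):

```elixir
# Dirty-flag handshake with :atomics (1-indexed, hence idx + 1).
flags = :atomics.new(4, signed: false)
shard_idx = 0

# Writer side: mark the shard dirty after an unsynced append.
:atomics.put(flags, shard_idx + 1, 1)

# Checkpointer tick: read-and-clear atomically, BEFORE firing the fsync,
# so a concurrent writer can re-set the flag for the next tick.
case :atomics.exchange(flags, shard_idx + 1, 0) do
  1 -> :fsync_active_file   # shard was dirty
  0 -> :skip                # idle shard, no syscalls
end
# => :fsync_active_file
```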

On fsync error (disk full, I/O error), we re-set the flag so the next tick retries, and raise DiskPressure to shed writes.
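The tick-and-retry loop can be sketched as a GenServer. This is a self-contained illustration, not the module's implementation: `fsync_fun` stands in for the `ActiveFile.get/1` + `NIF.v2_fsync_async/3` pair, the `{:fsync_result, result}` message shape is an assumption, and the DiskPressure signaling is omitted.

```elixir
defmodule CheckpointerSketch do
  use GenServer

  # Expects: :flags (an :atomics ref), :idx (0-based shard index),
  # :interval_ms, and :fsync_fun (a stand-in for the async-fsync NIF call).
  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  @impl true
  def init(state) do
    Process.send_after(self(), :tick, state.interval_ms)
    {:ok, state}
  end

  @impl true
  def handle_info(:tick, %{flags: flags, idx: idx} = state) do
    # Clear BEFORE fsync: a writer arriving mid-fsync re-sets the flag,
    # so the next tick picks this shard up again.
    if :atomics.exchange(flags, idx + 1, 0) == 1 do
      state.fsync_fun.(self())
    end

    Process.send_after(self(), :tick, state.interval_ms)
    {:noreply, state}
  end

  def handle_info({:fsync_result, :ok}, state), do: {:noreply, state}

  def handle_info({:fsync_result, {:error, _reason}}, %{flags: flags, idx: idx} = state) do
    # Re-set the dirty flag so the next tick retries the fsync.
    :atomics.put(flags, idx + 1, 1)
    {:noreply, state}
  end
end
```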

Configuration

  • :checkpoint_interval_ms (default 10_000 = 10s) — how often to check the flag. Ra WAL is fdatasync'd per batch and is the source of truth for acknowledged writes, so a large interval is safe: on kernel panic we replay up to one interval's worth of Ra log entries and rebuild Bitcask exactly. Short intervals mean more fsync syscalls per shard for no durability gain.
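As a sketch, the option would be passed in `start_link/1` opts, e.g. via a supervisor child spec — one checkpointer per shard. The `shard_index` key here is an assumption for illustration; only `:checkpoint_interval_ms` is documented.

```elixir
# Hypothetical child spec widening the interval to 30s:
{Ferricstore.Store.BitcaskCheckpointer,
 shard_index: 0,
 checkpoint_interval_ms: 30_000}
```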

Summary

Functions

Returns a specification to start this module under a supervisor.

Canonical process name for the checkpointer of a given shard.

Forces a synchronous fsync of the shard's active file right now. Used by graceful shutdown (see design doc §shutdown ordering) and by tests. Bypasses the async path and clears the dirty flag on success.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

process_name(index, arg2)

@spec process_name(non_neg_integer(), map() | nil) :: atom()

Canonical process name for the checkpointer of a given shard.

start_link(opts)

@spec start_link(keyword()) :: GenServer.on_start()

sync_now(server)

@spec sync_now(pid() | atom()) :: :ok | {:error, term()}

Forces a synchronous fsync of the shard's active file right now. Used by graceful shutdown (see design doc §shutdown ordering) and by tests. Bypasses the async path and clears the dirty flag on success.
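A hedged usage sketch for the shutdown path: iterate the shards and flush each one synchronously. `shard_count` and passing `nil` as `process_name/2`'s second argument are assumptions for illustration (the spec permits `nil`).

```elixir
alias Ferricstore.Store.BitcaskCheckpointer

# Graceful shutdown: force a synchronous fsync on each shard's active file.
for idx <- 0..(shard_count - 1) do
  :ok = BitcaskCheckpointer.sync_now(BitcaskCheckpointer.process_name(idx, nil))
end
```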