Ferricstore.HLC (ferricstore v0.3.2)

Copy Markdown View Source

Hybrid Logical Clock (HLC) for FerricStore.

An HLC combines a physical wall-clock component (milliseconds since epoch) with a logical counter to produce timestamps that are:

  1. Monotonically increasing -- even when the wall clock is not (NTP corrections, VM migration, etc.).
  2. Causally ordered -- merging a remote timestamp via update/1 ensures happens-before relationships are preserved across nodes.
  3. Close to real time -- the physical component tracks System.os_time(:millisecond) and only diverges when the wall clock jumps backward or a remote node is ahead.

Spec reference (2G.6)

  • HLC is piggybacked on Raft heartbeats.
  • Inter-node TTL precision is bounded to Raft heartbeat RTT (~10 ms).
  • A telemetry warning is emitted at 500 ms drift between HLC physical and wall clock.
  • TTL-sensitive reads are rejected at 1 000 ms drift.
  • The HLC millisecond component is used for Stream ID generation (XADD *).
  • HLC timestamps are stamped on commands before they enter Raft -- they are not computed inside apply().

Architecture

The hot-path functions now/0 and now_ms/0 are lock-free: they use a single :atomics slot (stored in :persistent_term) instead of a GenServer.call. Physical ms and logical counter are packed into one 64-bit integer, eliminating the two-slot race that broke monotonicity under contention.

The GenServer is retained only for:

  • update/1 -- merging remote timestamps from Raft heartbeats (~6 calls/sec). This must be serialized to avoid lost-update races between concurrent merges.
  • Process supervision -- the init/1 callback creates the atomics ref and stores it in :persistent_term.

Packed layout in a single unsigned 64-bit atomic:

|-- physical_ms (48 bits) --|-- logical (16 bits) --|

48 bits covers ~8,920 years of milliseconds. 16 bits allows 65,535 increments per millisecond. If logical overflows, it spills into the physical bits — effectively advancing the clock by 1 ms, which is safe.

Usage

The application supervision tree starts a single named HLC process (Ferricstore.HLC). Other modules obtain timestamps via:

{physical_ms, logical} = Ferricstore.HLC.now()
ms = Ferricstore.HLC.now_ms()

When receiving a Raft heartbeat with a piggybacked timestamp:

:ok = Ferricstore.HLC.update(remote_timestamp)

Timestamp representation

A timestamp is a 2-tuple {physical_ms, logical} where:

  • physical_ms -- milliseconds since Unix epoch (same scale as System.os_time(:millisecond))
  • logical -- a non-negative integer counter that disambiguates events within the same physical millisecond

Timestamps are ordered lexicographically: physical first, then logical.

Cross-node synchronization

HLC sync across nodes currently happens via Raft heartbeats (~6/sec), giving ~150ms max drift. This is sufficient for all current use cases:

  • Read-your-writes on the writing node: guaranteed by ra:process_command(reply_from: :local).
  • Cross-node reads after quorum write: guaranteed by Raft consensus — all quorum nodes applied the command before the write returns.
  • TTL expiry: second-granularity TTLs are unaffected by 150ms drift.
  • Cross-shard WATCH: uses value hashing, not timestamps.

Per-command HLC stamping (not wired up)

The state machine can handle commands wrapped as {inner_command, %{hlc_ts: {physical_ms, logical}}}. When apply/3 processes a wrapped command, it calls update/1 to merge the leader's clock into the follower's HLC. This gives per-command causal sync (~0ms drift) instead of heartbeat-level sync (~150ms drift).

Nothing sends wrapped commands today. Enable if any of these arise:

  1. Sub-second TTLs with cross-node reads — a key with PX 100 (100ms TTL) could appear alive on one node and expired on another within the 150ms drift window.
  2. Cross-node causal ordering — if a feature needs "write A happened-before write B" guarantees across nodes beyond Raft log ordering (e.g., conflict resolution in multi-leader setups).
  3. Observed drift exceeding 500ms — if heartbeat intervals increase due to network issues or overloaded nodes.

To enable, stamp commands in Router.raft_write/4 before submitting:

stamped = {command, %{hlc_ts: HLC.now()}}
:ra.process_command(shard_id, stamped, opts)

Cost: ~50ns per write (now/0 on leader) + one update/1 GenServer call per follower during apply/3. CockroachDB and YugabyteDB stamp every RPC and consider the cost negligible. Redis and Kvrocks do not have cross-node transactions or HLC.

Graceful fallback

When the HLC GenServer has not been started (e.g. in unit tests that exercise command modules without the full application), now/0 and now_ms/0 fall back to System.os_time(:millisecond) with a logical counter of 0.

Summary

Types

An HLC timestamp: {physical_ms, logical_counter}.

Functions

Returns a specification to start this module under a supervisor.

Compares two HLC timestamps.

Returns true when drift exceeds the reject threshold (1 000 ms), meaning TTL-sensitive reads should not be served.

Returns true when drift exceeds the reject threshold (1 000 ms).

Returns the absolute drift in milliseconds between the HLC physical component and the current wall clock.

Returns the absolute drift in milliseconds.

Extracts the millisecond (physical) component from an HLC timestamp.

Returns the current HLC timestamp as {physical_ms, logical}.

Returns the current HLC timestamp as {physical_ms, logical}.

Convenience: returns only the physical (millisecond) component of now/0.

Convenience: returns only the physical (millisecond) component of now/0.

Starts the HLC GenServer.

Merges a remote HLC timestamp received from another node.

Merges a remote HLC timestamp, sending the call to server.

Types

timestamp()

@type timestamp() :: {non_neg_integer(), non_neg_integer()}

An HLC timestamp: {physical_ms, logical_counter}.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

compare(arg1, arg2)

@spec compare(timestamp(), timestamp()) :: :lt | :eq | :gt

Compares two HLC timestamps.

Returns :lt, :eq, or :gt following the same convention as DateTime.compare/2.

Examples

iex> Ferricstore.HLC.compare({100, 0}, {200, 0})
:lt
iex> Ferricstore.HLC.compare({100, 1}, {100, 0})
:gt
iex> Ferricstore.HLC.compare({100, 0}, {100, 0})
:eq

drift_exceeded?()

@spec drift_exceeded?() :: boolean()

Returns true when drift exceeds the reject threshold (1 000 ms), meaning TTL-sensitive reads should not be served.

Lock-free; does not call the GenServer.

drift_exceeded?(server)

@spec drift_exceeded?(GenServer.server()) :: boolean()

Returns true when drift exceeds the reject threshold (1 000 ms).

This overload accepts a server argument for backward compatibility.

Parameters

  • server -- ignored (kept for API compatibility).

drift_ms()

@spec drift_ms() :: non_neg_integer()

Returns the absolute drift in milliseconds between the HLC physical component and the current wall clock.

This function is lock-free -- it reads the atomics ref directly.

Under normal single-node operation this is 0 or near-0. A non-zero drift indicates that update/1 received a future timestamp from a remote node or the wall clock jumped backward.

drift_ms(server)

@spec drift_ms(GenServer.server()) :: non_neg_integer()

Returns the absolute drift in milliseconds.

This overload accepts a server argument for backward compatibility.

Parameters

  • server -- ignored (kept for API compatibility).

encode_ms(arg)

@spec encode_ms(timestamp()) :: non_neg_integer()

Extracts the millisecond (physical) component from an HLC timestamp.

This is used when a caller needs a plain integer millisecond value, for example as the ms part of a Redis Stream ID.

Examples

iex> Ferricstore.HLC.encode_ms({1_234_567_890, 42})
1_234_567_890

now()

@spec now() :: timestamp()

Returns the current HLC timestamp as {physical_ms, logical}.

This function is lock-free -- it reads and updates a single packed :atomics slot directly, bypassing the GenServer. Physical ms and logical counter are packed into one 64-bit integer, so updates are truly atomic with no torn-read races.

On the common path (same millisecond), uses add_get(ref, 1, 1) which atomically increments the logical counter with zero contention. CAS is only used at millisecond boundaries.

Falls back to {System.os_time(:millisecond), 0} when the HLC GenServer has not been started.

Examples

iex> {phys, logical} = Ferricstore.HLC.now()
iex> phys > 0
true
iex> logical >= 0
true

now(server)

@spec now(GenServer.server()) :: timestamp()

Returns the current HLC timestamp as {physical_ms, logical}.

This overload accepts a server argument for backward compatibility with existing tests. In production, prefer the zero-arity now/0 which is lock-free.

Parameters

  • server -- ignored (kept for API compatibility). The atomics ref is always read from :persistent_term.

now_ms()

@spec now_ms() :: non_neg_integer()

Convenience: returns only the physical (millisecond) component of now/0.

This is the value used as the millisecond part of Redis Stream IDs and as the base for TTL computations. Lock-free; does not call the GenServer.

Falls back to System.os_time(:millisecond) when the HLC GenServer is not running.

now_ms(server)

@spec now_ms(GenServer.server()) :: non_neg_integer()

Convenience: returns only the physical (millisecond) component of now/0.

This overload accepts a server argument for backward compatibility.

Parameters

  • server -- ignored (kept for API compatibility).

start_link(opts \\ [])

@spec start_link(keyword()) :: GenServer.on_start()

Starts the HLC GenServer.

The GenServer creates an :atomics ref on init and stores it in :persistent_term so that now/0 and now_ms/0 can read/write it without a GenServer call.

Options

update(remote_ts)

@spec update(timestamp()) :: :ok

Merges a remote HLC timestamp received from another node.

Called in two contexts:

  • Raft heartbeats (~6 calls/sec) -- the leader's HLC timestamp is piggybacked on heartbeat RPCs. This is the current sync mechanism, giving ~150ms max drift between nodes.
  • Per-command stamping (not wired up) -- if commands are wrapped as {inner_command, %{hlc_ts: ts}}, the state machine calls update/1 during apply/3 on each follower, giving per-command causal sync. See the module doc for when to enable this.

This is a GenServer.call because remote timestamp merges must be serialized to avoid lost-update races. This is acceptable because merges are rare (~6 calls/sec from Raft heartbeats).

The merge rule follows the standard HLC algorithm:

  1. new_physical = max(wall_clock, local_physical, remote_physical)
  2. If all three physical values tie, logical = max(local_logical, remote_logical) + 1.
  3. If two tie at the max, the logical from the winner is incremented.
  4. If wall clock alone wins, logical resets to 0.

Parameters

  • remote_ts -- the remote HLC timestamp {physical_ms, logical}

update(server, remote_ts)

@spec update(GenServer.server(), timestamp()) :: :ok

Merges a remote HLC timestamp, sending the call to server.

Parameters

  • server -- the HLC process
  • remote_ts -- the remote HLC timestamp {physical_ms, logical}