Ferricstore.HyperLogLog (ferricstore v0.3.3)

Copy Markdown View Source

Pure-Elixir HyperLogLog probabilistic cardinality estimator.

HyperLogLog (HLL) is a space-efficient probabilistic data structure that estimates the number of distinct elements (cardinality) in a multiset. It uses a fixed-size sketch of ~16 KB to provide estimates with a standard error of approximately 0.81%.

Design

This implementation uses:

  • 14-bit precision -- 2^14 = 16,384 registers
  • 1 byte per register -- only the lower 6 bits are used (values 0..63)
  • 128-bit hash -- MD5 via :crypto.hash/2, truncated to 64 bits for register index and rank computation
  • Bias correction -- the standard HyperLogLog formula with small-range (linear counting) and large-range corrections

Sketches are stored as plain 16,384-byte binaries compatible with the store layer's put/3 and get/1 interface. No special encoding or headers are needed.

Algorithm

For each element added:

  1. Compute a 64-bit hash from the element's MD5 digest.
  2. Use the lower 14 bits as a register index (0..16,383).
  3. Count the number of leading zeros in the remaining upper 50 bits, add 1 to get the "rank".
  4. Store the rank in the register if it exceeds the current value (max wins).

Cardinality estimation applies the harmonic mean across all registers with the alpha correction constant and small/large range bias corrections.

Examples

iex> sketch = Ferricstore.HyperLogLog.new()
iex> {sketch, true} = Ferricstore.HyperLogLog.add(sketch, "hello")
iex> Ferricstore.HyperLogLog.count(sketch) > 0
true

iex> s1 = Ferricstore.HyperLogLog.new()
iex> s2 = Ferricstore.HyperLogLog.new()
iex> {s1, _} = Ferricstore.HyperLogLog.add(s1, "a")
iex> {s2, _} = Ferricstore.HyperLogLog.add(s2, "b")
iex> merged = Ferricstore.HyperLogLog.merge(s1, s2)
iex> Ferricstore.HyperLogLog.count(merged) >= 1
true

Summary

Types

A 16,384-byte binary representing a HyperLogLog sketch.

Functions

Adds an element to the sketch.

Estimates the cardinality (number of distinct elements) of the sketch.

Merges two sketches by taking the maximum register value at each position.

Creates a new empty HyperLogLog sketch.

Returns the number of registers in the sketch.

Validates that a binary is a well-formed HLL sketch.

Types

sketch()

@type sketch() :: <<_::131_072>>

A 16,384-byte binary representing a HyperLogLog sketch.

Functions

add(sketch, element)

@spec add(sketch(), binary()) :: {sketch(), boolean()}

Adds an element to the sketch.

The element is hashed with MD5 and the resulting 64-bit value determines the register index and rank. The register is updated only if the new rank exceeds the stored value.

Parameters

  • sketch -- existing 16,384-byte HLL sketch
  • element -- any binary element to add

Returns

{updated_sketch, modified?} where modified? is true if the sketch was changed (i.e. the register was updated), false otherwise.

Examples

iex> sketch = Ferricstore.HyperLogLog.new()
iex> {sketch, true} = Ferricstore.HyperLogLog.add(sketch, "hello")
iex> {_sketch, false} = Ferricstore.HyperLogLog.add(sketch, "hello")

count(sketch)

@spec count(sketch()) :: non_neg_integer()

Estimates the cardinality (number of distinct elements) of the sketch.

Applies the standard HyperLogLog estimation formula with small-range (linear counting) and large-range (2^64 correction) bias adjustments.

Parameters

  • sketch -- a 16,384-byte HLL sketch

Returns

A non-negative integer representing the estimated cardinality.

Examples

iex> sketch = Ferricstore.HyperLogLog.new()
iex> Ferricstore.HyperLogLog.count(sketch)
0

merge(sketch1, sketch2)

@spec merge(sketch(), sketch()) :: sketch()

Merges two sketches by taking the maximum register value at each position.

The merged sketch represents the union of the two input multisets.

Parameters

  • sketch1 -- first 16,384-byte HLL sketch
  • sketch2 -- second 16,384-byte HLL sketch

Returns

A new 16,384-byte HLL sketch containing the element-wise maximum.

Examples

iex> s1 = Ferricstore.HyperLogLog.new()
iex> s2 = Ferricstore.HyperLogLog.new()
iex> merged = Ferricstore.HyperLogLog.merge(s1, s2)
iex> byte_size(merged)
16384

new()

@spec new() :: sketch()

Creates a new empty HyperLogLog sketch.

Returns a 16,384-byte binary of zeroes.

Examples

iex> sketch = Ferricstore.HyperLogLog.new()
iex> byte_size(sketch)
16384

num_registers()

@spec num_registers() :: pos_integer()

Returns the number of registers in the sketch.

This is a compile-time constant (16,384 for 14-bit precision).

Examples

iex> Ferricstore.HyperLogLog.num_registers()
16384

valid_sketch?(binary)

@spec valid_sketch?(binary()) :: boolean()

Validates that a binary is a well-formed HLL sketch.

A valid sketch is exactly 16,384 bytes long.

Parameters

  • binary -- the binary to validate

Returns

true if the binary is exactly num_registers/0 bytes, false otherwise.

Examples

iex> Ferricstore.HyperLogLog.valid_sketch?(Ferricstore.HyperLogLog.new())
true

iex> Ferricstore.HyperLogLog.valid_sketch?(<<1, 2, 3>>)
false