Pure-Elixir HyperLogLog probabilistic cardinality estimator.
HyperLogLog (HLL) is a space-efficient probabilistic data structure that estimates the number of distinct elements (cardinality) in a multiset. It uses a fixed-size sketch of ~16 KB to provide estimates with a standard error of approximately 0.81%.
Design
This implementation uses:
- 14-bit precision -- 2^14 = 16,384 registers
- 1 byte per register -- only the lower 6 bits are used (values 0..63)
- 128-bit hash -- MD5 via
:crypto.hash/2, truncated to 64 bits for register index and rank computation - Bias correction -- the standard HyperLogLog formula with small-range (linear counting) and large-range corrections
Sketches are stored as plain 16,384-byte binaries compatible with the store
layer's put/3 and get/1 interface. No special encoding or headers are
needed.
Algorithm
For each element added:
- Compute a 64-bit hash from the element's MD5 digest.
- Use the lower 14 bits as a register index (0..16,383).
- Count the number of leading zeros in the remaining upper 50 bits, add 1 to get the "rank".
- Store the rank in the register if it exceeds the current value (max wins).
Cardinality estimation applies the harmonic mean across all registers with the alpha correction constant and small/large range bias corrections.
Examples
iex> sketch = Ferricstore.HyperLogLog.new()
iex> {sketch, true} = Ferricstore.HyperLogLog.add(sketch, "hello")
iex> Ferricstore.HyperLogLog.count(sketch) > 0
true
iex> s1 = Ferricstore.HyperLogLog.new()
iex> s2 = Ferricstore.HyperLogLog.new()
iex> {s1, _} = Ferricstore.HyperLogLog.add(s1, "a")
iex> {s2, _} = Ferricstore.HyperLogLog.add(s2, "b")
iex> merged = Ferricstore.HyperLogLog.merge(s1, s2)
iex> Ferricstore.HyperLogLog.count(merged) >= 1
true
Summary
Functions
Adds an element to the sketch.
Estimates the cardinality (number of distinct elements) of the sketch.
Merges two sketches by taking the maximum register value at each position.
Creates a new empty HyperLogLog sketch.
Returns the number of registers in the sketch.
Validates that a binary is a well-formed HLL sketch.
Types
Functions
Adds an element to the sketch.
The element is hashed with MD5 and the resulting 64-bit value determines the register index and rank. The register is updated only if the new rank exceeds the stored value.
Parameters
sketch-- existing 16,384-byte HLL sketchelement-- any binary element to add
Returns
{updated_sketch, modified?} where modified? is true if the sketch
was changed (i.e. the register was updated), false otherwise.
Examples
iex> sketch = Ferricstore.HyperLogLog.new()
iex> {sketch, true} = Ferricstore.HyperLogLog.add(sketch, "hello")
iex> {_sketch, false} = Ferricstore.HyperLogLog.add(sketch, "hello")
@spec count(sketch()) :: non_neg_integer()
Estimates the cardinality (number of distinct elements) of the sketch.
Applies the standard HyperLogLog estimation formula with small-range (linear counting) and large-range (2^64 correction) bias adjustments.
Parameters
sketch-- a 16,384-byte HLL sketch
Returns
A non-negative integer representing the estimated cardinality.
Examples
iex> sketch = Ferricstore.HyperLogLog.new()
iex> Ferricstore.HyperLogLog.count(sketch)
0
Merges two sketches by taking the maximum register value at each position.
The merged sketch represents the union of the two input multisets.
Parameters
sketch1-- first 16,384-byte HLL sketchsketch2-- second 16,384-byte HLL sketch
Returns
A new 16,384-byte HLL sketch containing the element-wise maximum.
Examples
iex> s1 = Ferricstore.HyperLogLog.new()
iex> s2 = Ferricstore.HyperLogLog.new()
iex> merged = Ferricstore.HyperLogLog.merge(s1, s2)
iex> byte_size(merged)
16384
@spec new() :: sketch()
Creates a new empty HyperLogLog sketch.
Returns a 16,384-byte binary of zeroes.
Examples
iex> sketch = Ferricstore.HyperLogLog.new()
iex> byte_size(sketch)
16384
@spec num_registers() :: pos_integer()
Returns the number of registers in the sketch.
This is a compile-time constant (16,384 for 14-bit precision).
Examples
iex> Ferricstore.HyperLogLog.num_registers()
16384
Validates that a binary is a well-formed HLL sketch.
A valid sketch is exactly 16,384 bytes long.
Parameters
binary-- the binary to validate
Returns
true if the binary is exactly num_registers/0 bytes, false otherwise.
Examples
iex> Ferricstore.HyperLogLog.valid_sketch?(Ferricstore.HyperLogLog.new())
true
iex> Ferricstore.HyperLogLog.valid_sketch?(<<1, 2, 3>>)
false