UltraLogLog (ULL) sketch for cardinality estimation.
ULL (Ertl, 2023) provides approximately 20% better accuracy than HLL at the
same memory footprint. It uses the same 2^p register array but stores a
different value per register that encodes both the geometric rank and an extra
sub-bucket bit, then applies the FGRA estimator (sigma/tau convergence from
Ertl 2017) instead of HLL's harmonic mean.
Memory and Accuracy
- Register count:
m = 2^p - Memory:
8 + mbytes (8-byte header + one byte per register) - Relative standard error: approximately
0.835 / sqrt(m)(vs1.04 / sqrt(m)for HLL)
| p | Registers | Memory | ~Error (ULL) | ~Error (HLL) |
|---|---|---|---|---|
| 10 | 1,024 | ~1 KiB | 2.61% | 3.25% |
| 12 | 4,096 | ~4 KiB | 1.30% | 1.63% |
| 14 | 16,384 | ~16 KiB | 0.65% | 0.81% |
| 16 | 65,536 | ~64 KiB | 0.33% | 0.41% |
Estimation Strategy
The ULL estimator selects between two estimators based on whether any register is still empty:
Linear counting (
zeros > 0): when at least one register is empty, the formulam * ln(m / zeros)is used. Linear counting is the maximum-likelihood estimator in this regime and is the dominant estimator for realistic workloads (any cardinality where the load factor leaves empty registers, roughlyn < m * ln(m)). It also significantly outperforms the FGRA estimator at low precision (p < 12), where FGRA carries non-trivial bias for moderaten / m. The FGRA Horner loop is skipped on this branch.FGRA estimator (
zeros == 0): when every register is occupied, the Flajolet-style geometric rank aggregation with sigma/tau convergence from Algorithm 4 of Ertl 2017 is used. This is the large-cardinality regime, and the published relative standard error~0.835 / sqrt(m)applies here.Large range correction: applied on top of FGRA when the raw estimate exceeds
2^56 / 30. The hash-space bias correction-2^64 * ln(1 - raw_estimate / 2^64)is used. This branch is effectively unreachable with 64-bit hashes.
Note that for typical workloads (where cardinality is comparable to or
smaller than m * ln(m)) the linear counting branch is taken; the FGRA
branch is exercised primarily in the very-large-n regime.
Recommended Precision
p >= 12is recommended for production use. Belowp = 12, the transition between linear counting and FGRA (atzeros = 0) can exhibit increased relative error because FGRA's small-pbias is significant and linear counting at very few empty registers (zeros < ~m * e^(-5)) has high variance.- When
p >= 12, linear counting provides reliable estimates across the entire small-to-moderate cardinality range, and FGRA's RSE bound is tight when it engages.
Binary State Layout (ULL1)
All multi-byte fields are little-endian.
Offset Size Field
------ ------ -----
0 4 Magic bytes: "ULL1"
4 1 Version (u8, currently 1)
5 1 Precision p (u8, 4..26)
6 2 Reserved flags (u16 little-endian, must be 0)
8 m Registers (m = 2^p bytes, one u8 per register)Total: 8 + 2^p bytes.
Options
:p- precision parameter, integer 4..26 (default: 14):backend- backend module (default:ExDataSketch.Backend.Pure):update_many_chunk_size- chunk size forupdate_many/2internal batching (default: 10000). Must be set at creation time; cannot be overridden on a per-call basis.
Merge Properties
ULL merge is associative and commutative (register-wise max). This means sketches can be merged in any order or grouping and produce the same result, making ULL safe for parallel and distributed aggregation.
Summary
Functions
Alias for estimate/1.
Deserializes an EXSK binary into a ULL sketch.
Estimates the number of distinct items in the sketch.
Creates a new ULL sketch from an enumerable of items.
Merges two ULL sketches.
Merges a non-empty enumerable of ULL sketches into one.
Returns a 2-arity merge function suitable for combining sketches.
Creates a new ULL sketch.
Returns a 2-arity reducer function suitable for Enum.reduce/3 and similar.
Serializes the sketch to the ExDataSketch-native EXSK binary format.
Returns the size of the sketch state in bytes.
Updates the sketch with a single item.
Updates the sketch with multiple items in a single pass.
Types
Functions
Alias for estimate/1.
Examples
iex> ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.count()
0.0
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
Deserializes an EXSK binary into a ULL sketch.
Returns {:ok, sketch} on success or {:error, reason} on failure.
Examples
iex> ExDataSketch.ULL.deserialize(<<"invalid">>)
{:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}
Estimates the number of distinct items in the sketch.
Returns a floating-point estimate. The accuracy depends on the precision
parameter p. ULL typically achieves ~20% lower relative error than HLL
at the same precision.
Examples
iex> ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.estimate()
0.0
@spec from_enumerable( Enumerable.t(), keyword() ) :: t()
Creates a new ULL sketch from an enumerable of items.
Equivalent to new(opts) |> update_many(enumerable).
Options
Same as new/1.
Examples
iex> sketch = ExDataSketch.ULL.from_enumerable(["a", "b", "c"], p: 10)
iex> ExDataSketch.ULL.estimate(sketch) > 0.0
true
Merges two ULL sketches.
Both sketches must have the same precision p. The result contains the
register-wise maximum, which corresponds to the union of the two input
multisets.
Returns the merged sketch. Raises ExDataSketch.Errors.IncompatibleSketchesError
if the sketches have different parameters.
Examples
iex> a = ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.update("x")
iex> b = ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.update("y")
iex> merged = ExDataSketch.ULL.merge(a, b)
iex> ExDataSketch.ULL.estimate(merged) >= ExDataSketch.ULL.estimate(a)
true
@spec merge_many(Enumerable.t()) :: t()
Merges a non-empty enumerable of ULL sketches into one.
Raises Enum.EmptyError if the enumerable is empty.
Examples
iex> a = ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.update("x")
iex> b = ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.update("y")
iex> merged = ExDataSketch.ULL.merge_many([a, b])
iex> ExDataSketch.ULL.estimate(merged) > 0.0
true
Returns a 2-arity merge function suitable for combining sketches.
The returned function calls merge/2 on two sketches.
Examples
iex> is_function(ExDataSketch.ULL.merger(), 2)
true
Creates a new ULL sketch.
Options
:p- precision parameter, integer 4..26 (default: 14). Higher values use more memory but give better accuracy.:backend- backend module (default:ExDataSketch.Backend.Pure).:hash_fn- custom hash function(term -> non_neg_integer).:seed- hash seed (default: 0).
Examples
iex> sketch = ExDataSketch.ULL.new(p: 10)
iex> sketch.opts[:p]
10
iex> ExDataSketch.ULL.size_bytes(sketch)
1032
Returns a 2-arity reducer function suitable for Enum.reduce/3 and similar.
The returned function calls update/2 on each item.
Examples
iex> is_function(ExDataSketch.ULL.reducer(), 2)
true
Serializes the sketch to the ExDataSketch-native EXSK binary format.
The serialized binary includes magic bytes, version, sketch type,
parameters, and state. See ExDataSketch.Codec for format details.
Examples
iex> sketch = ExDataSketch.ULL.new(p: 10)
iex> binary = ExDataSketch.ULL.serialize(sketch)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true
@spec size_bytes(t()) :: non_neg_integer()
Returns the size of the sketch state in bytes.
Examples
iex> ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.size_bytes()
1032
Updates the sketch with a single item.
The item is hashed using ExDataSketch.Hash.hash64/1 before being
inserted into the sketch.
Examples
iex> sketch = ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.update("hello")
iex> ExDataSketch.ULL.estimate(sketch) > 0.0
true
@spec update_many(t(), Enumerable.t()) :: t()
Updates the sketch with multiple items in a single pass.
More efficient than calling update/2 repeatedly because it minimizes
intermediate binary allocations.
The internal batch size is controlled by :update_many_chunk_size,
which must be set at new/1 time and cannot be changed per call.
Examples
iex> sketch = ExDataSketch.ULL.new(p: 10) |> ExDataSketch.ULL.update_many(["a", "b", "c"])
iex> ExDataSketch.ULL.estimate(sketch) > 0.0
true