Default UltraLogLog cardinality estimator — Further Generalized Remaining Area (FGRA), a faithful translation of Algorithm 6 from Ertl 2024 (PVLDB 2024, p. 1664).
The estimator combines:
- a single-pass register sweep that sums the intermediate-range
contribution
g(rᵢ)(eq. 15 / §3.3) and counts the eight special-case bytes that trigger range corrections, - a small-range correction (§3.5, eq. 23 + the corrected contributions in eq. 18) when any byte falls into the four smallest reachable states,
- a large-range correction (§3.6, eq. 24 + the trailing branch of eq. 18) when any byte saturates,
- the closing
λₚ · s^(-1/τ)factor (eq. 12 / Algorithm 6 caption).
Asymptotic relative standard error is √v/m ≈ 0.782/√m; the
memory-variance product (MVP) is 8v ≈ 4.895145 (paper line 624) —
this is the constant the README and UltraLogLog moduledoc cite as
"storage factor 4.895". It is not the same as τ; the FGRA
estimator's exponent constant is τ ≈ 0.819491, defined per
Algorithm 6 caption.
Why paper-faithful and not table-based
Hash4j's reference implementation (OptimalFGRAEstimator in
UltraLogLog.java v0.17.0, lines 481–923) precomputes two arrays:
a 252-entry REGISTER_CONTRIBUTIONS table caching g(byte) for the
intermediate range, and a 24-entry ESTIMATION_FACTORS table caching
λₚ. The numerical math is identical to Algorithm 6 — the tables
only trade memory for a few :math.pow calls. We compute both inline
so that this module reads as a one-to-one translation of the paper
alongside paper/p1655-ertl.pdf. A table-backed variant is v0.2 work;
cross-checked against OPTIMAL_FGRA_ESTIMATOR in test/fgra_test.exs
to within 0.1% relative.
Shifted register convention
This module operates on the shifted byte encoding byte = paper_r + 4p − 8 for nonzero registers (see UltraLogLog.Encoding). That
shift collapses the eight special-case paper-r values
{0, 4, 8, 10, 4w, 4w+1, 4w+2, 4w+3} (with w := 65 − p) into byte
values {0, 4p − 4, 4p, 4p + 2, 252, 253, 254, 255} — the
large-range constants are precision-independent because the +4p−8
shift is what saturates the byte at 255.
Summary
Functions
Estimate cardinality from the current sketch state, per Ertl 2024 Algorithm 6.
Functions
@spec estimate(UltraLogLog.t()) :: float()
Estimate cardinality from the current sketch state, per Ertl 2024 Algorithm 6.