UltraLogLog.Estimator.FGRA (UltraLogLog v0.1.0)

Copy Markdown View Source

Default UltraLogLog cardinality estimator — Further Generalized Remaining Area (FGRA), a faithful translation of Algorithm 6 from Ertl 2024 (PVLDB 2024, p. 1664).

The estimator combines:

  • a single-pass register sweep that sums the intermediate-range contribution g(rᵢ) (eq. 15 / §3.3) and counts the eight special-case bytes that trigger range corrections,
  • a small-range correction (§3.5, eq. 23 + the corrected contributions in eq. 18) when any byte falls into the four smallest reachable states,
  • a large-range correction (§3.6, eq. 24 + the trailing branch of eq. 18) when any byte saturates,
  • the closing λₚ · s^(-1/τ) factor (eq. 12 / Algorithm 6 caption).

Asymptotic relative standard error is √v/m0.782/√m; the memory-variance product (MVP) is 8v ≈ 4.895145 (paper line 624) — this is the constant the README and UltraLogLog moduledoc cite as "storage factor 4.895". It is not the same as τ; the FGRA estimator's exponent constant is τ ≈ 0.819491, defined per Algorithm 6 caption.

Why paper-faithful and not table-based

Hash4j's reference implementation (OptimalFGRAEstimator in UltraLogLog.java v0.17.0, lines 481–923) precomputes two arrays: a 252-entry REGISTER_CONTRIBUTIONS table caching g(byte) for the intermediate range, and a 24-entry ESTIMATION_FACTORS table caching λₚ. The numerical math is identical to Algorithm 6 — the tables only trade memory for a few :math.pow calls. We compute both inline so that this module reads as a one-to-one translation of the paper alongside paper/p1655-ertl.pdf. A table-backed variant is v0.2 work; cross-checked against OPTIMAL_FGRA_ESTIMATOR in test/fgra_test.exs to within 0.1% relative.

Shifted register convention

This module operates on the shifted byte encoding byte = paper_r + 4p − 8 for nonzero registers (see UltraLogLog.Encoding). That shift collapses the eight special-case paper-r values {0, 4, 8, 10, 4w, 4w+1, 4w+2, 4w+3} (with w := 65 − p) into byte values {0, 4p − 4, 4p, 4p + 2, 252, 253, 254, 255} — the large-range constants are precision-independent because the +4p−8 shift is what saturates the byte at 255.

Summary

Functions

Estimate cardinality from the current sketch state, per Ertl 2024 Algorithm 6.

Functions

estimate(ultra_log_log)

@spec estimate(UltraLogLog.t()) :: float()

Estimate cardinality from the current sketch state, per Ertl 2024 Algorithm 6.