Maximum-likelihood cardinality estimator for UltraLogLog.
Asymptotic storage factor 8 · ln(2)/ζ(2, 5/4) ≈ 4.631, relative
standard error √(ln(2)/ζ(2, 5/4)) ≈ 0.761/√m. Gives the full ~28%
space reduction vs HLL (compared to FGRA's ~24%) at the cost of one
iterative root-find per call.
What is being maximized
The log-likelihood under the Poisson model (Ertl 2024 §3.1, line 474):
ln L = -(n/m)·α + Σ_{u=1}^{w-1} β_u · ln(1 − e^(−n/(m·2^u)))where α and β_u are closed-form weighted sums of register-value
counts c_j (paper lines 480–492). Setting d/dn ln L = 0 gives the
ML equation, which has the same shape as the corresponding HLL one —
so the paper (line 503) reuses the secant solver from Ertl 2017
(arXiv:1702.01284, Algorithm 8).
After reparameterization that solver maximizes
e^(−x·a) · ∏_{k=0}^{n} (1 − e^(−x/2^k))^{b[k]}i.e. finds the root of
g(x) := −Σb[k] + a·x + Σ_{k} b[k] · h(x/2^k)with h(x) := 1 − x/(e^x − 1). h is concave and monotonically
increasing on (0, ∞), so g is monotonically increasing and has a
unique root for any non-degenerate sketch.
Algorithm shape (paper-faithful, mirroring Hash4j v0.17.0)
Map registers → (a, b[]): each register contributes to
aand to up to threeb[k]slots. The mapping (contribute/3below) portsUltraLogLog.MaximumLikelihoodEstimator.contributeand follows the paper'sα,β_udefinitions.Initial guess: closed-form Jensen lower bound on the root (derivation in
DistinctCountUtil.javalines 83–104). Provably≤ root, so the secant method converges monotonically from below. We deliberately avoid usingFGRA.estimate/1as the initial guess even though the brief originally proposed it — Jensen's bound is coherent with the solver's state (no inter-estimator coupling) and monotonicity is a stronger guarantee.Secant iteration: at each step, evaluate
g(x)and use the secant ratio(g − Σb) / (g_prev − g)to update the step size. Convergence threshold scales with precision: stop when|Δx|/x ≤ 0.001 · √v_ML / √m. Hash4j-tight tolerance — output matchesMAXIMUM_LIKELIHOOD_ESTIMATORto ~1e-12 relative.Inner-loop
h(x)evaluation: avoid:math.expand:math.logby computingh(2·x')for smallx' ∈ [0, 0.25]via a degree-6 Taylor polynomial (x' − x'²/3 + x'⁴/45 − x'⁶/472.5, the Bernoulli expansion of1 − 2x'/(e^(2x') − 1)) and walking up by the doubling recurrenceh(2z) = (z/2 + h(z)(1−h(z))) / (z/2 + (1−h(z)))to the targetx/2^k. No transcendental evaluations inside the loop.Bias correction: divide the secant output by
1 + 0.481.../m(paper eq. 11; the second-order delta-method correction).
Verification
Spot-checked against UltraLogLog.MAXIMUM_LIKELIHOOD_ESTIMATOR
v0.17.0 on the same 16 register snapshots used for FGRA — see
test/mle_test.exs for the tolerance and statistical-correctness
battery, and test/mle_test.exs again for the convergence-speed
test (mean iterations < 10, max < 50 across 100 random sketches).
Summary
Functions
Estimate cardinality from the current sketch state, per Ertl 2024 §3.1 (conference numbering; equivalent to arXiv §4.2).
Same as estimate/1, but also returns the secant iteration count.
Used by test/mle_test.exs to assert convergence speed; not intended
for production use.
Functions
@spec estimate(UltraLogLog.t()) :: float()
Estimate cardinality from the current sketch state, per Ertl 2024 §3.1 (conference numbering; equivalent to arXiv §4.2).
@spec estimate_with_iterations(UltraLogLog.t()) :: {float(), non_neg_integer()}
Same as estimate/1, but also returns the secant iteration count.
Used by test/mle_test.exs to assert convergence speed; not intended
for production use.