viva_math/tdigest
t-digest — streaming approximate quantiles.
Dunning’s t-digest (2014, 2019) provides accurate approximate quantiles
over a stream with bounded memory and fast updates. Particularly
accurate at the tail (q < 0.01 or q > 0.99), which is where most
applications need precision.
Memory: O(δ) where δ is the compression parameter (default 100). Update: O(log δ) amortised. Quantile query: O(δ).
When to use
- You need percentiles over a stream that doesn’t fit in memory.
- Tail quantiles (p99, p99.9) matter — t-digest is exact-ish there.
- You want to merge digests from parallel workers.
Reference
Dunning & Ertl (2019) “Computing Extremely Accurate Quantiles Using t-Digests” — https://github.com/tdunning/t-digest
Algorithm summary
A t-digest is a sorted set of centroids, each (mean, weight).
New samples merge into the nearest centroid up to a size limit dictated
by the scale function k(q) = δ · q · (1 - q) / 2π — small at the
tails, large at the median, so tail centroids stay small and accurate.
Types
A centroid: a weighted point on the real line.
A weighted centroid (mean, weight) representing a cluster of nearby
samples in the digest. Opaque — invariants (sorted by mean, positive
weight) are maintained by insert / merge / compress.
pub opaque type Centroid
t-digest state.
compression controls memory/accuracy tradeoff (typical 100). Larger →
more accurate, more memory.
Opaque — direct construction could violate the sorted-centroids
invariant or the consistency between centroids and total_weight.
Use new / with_compression / insert to build.
pub opaque type TDigest
Values
pub fn compression(td: TDigest) -> Float
Compression parameter δ (typical 100; larger → more accurate / more memory).
pub fn insert_all(td: TDigest, xs: List(Float)) -> TDigest
Insert many samples from a list. Equivalent to folding insert.
pub fn insert_weighted(
td: TDigest,
value: Float,
weight: Float,
) -> TDigest
Insert a weighted sample.
pub fn merge(a: TDigest, b: TDigest) -> TDigest
Combine two digests into one. Useful for distributed reductions.
pub fn min(td: TDigest) -> Result(Float, Nil)
Minimum sample so far (first centroid mean, since centroids stay sorted).
pub fn p99(td: TDigest) -> Result(Float, Nil)
Convenience: p99 (extreme-tail quantile, where t-digest excels).
pub fn quantile(td: TDigest, q: Float) -> Result(Float, Nil)
Approximate quantile q ∈ [0, 1]. Returns Error for an empty digest.
pub fn with_compression(delta: Float) -> TDigest
Empty t-digest with explicit compression parameter.