viva_math/tdigest
t-digest — streaming approximate quantiles.
Dunning’s t-digest (2014, 2019) provides accurate approximate quantiles
over a stream with bounded memory and fast updates. Particularly
accurate at the tail (q < 0.01 or q > 0.99), which is where most
applications need precision.
Memory: O(δ) where δ is the compression parameter (default 100). Update: O(log δ) amortised. Quantile query: O(δ).
When to use
- You need percentiles over a stream that doesn’t fit in memory.
- Tail quantiles (p99, p99.9) matter — t-digest is exact-ish there.
- You want to merge digests from parallel workers.
Reference
Dunning & Ertl (2019) “Computing Extremely Accurate Quantiles Using t-Digests” — https://github.com/tdunning/t-digest
Algorithm summary
A t-digest is a sorted set of centroids, each (mean, weight).
New samples merge into the nearest centroid up to a size limit dictated
by the scale function k(q) = δ · q · (1 - q) / 2π — small at the
tails, large at the median, so tail centroids stay small and accurate.
Types
A centroid: a weighted point on the real line.
pub type Centroid {
Centroid(mean: Float, weight: Float)
}
Constructors
-
Centroid(mean: Float, weight: Float)
t-digest state.
compression controls memory/accuracy tradeoff (typical 100). Larger →
more accurate, more memory.
pub type TDigest {
TDigest(
compression: Float,
centroids: List(Centroid),
total_weight: Float,
)
}
Constructors
-
TDigest( compression: Float, centroids: List(Centroid), total_weight: Float, )
Values
pub fn insert_all(td: TDigest, xs: List(Float)) -> TDigest
Insert many samples from a list. Equivalent to folding insert.
pub fn insert_weighted(
td: TDigest,
value: Float,
weight: Float,
) -> TDigest
Insert a weighted sample.
pub fn merge(a: TDigest, b: TDigest) -> TDigest
Combine two digests into one. Useful for distributed reductions.
pub fn min(td: TDigest) -> Result(Float, Nil)
Minimum sample so far (first centroid mean, since centroids stay sorted).
pub fn p99(td: TDigest) -> Result(Float, Nil)
Convenience: p99 (extreme-tail quantile, where t-digest excels).
pub fn quantile(td: TDigest, q: Float) -> Result(Float, Nil)
Approximate quantile q ∈ [0, 1]. Returns Error for an empty digest.
pub fn with_compression(delta: Float) -> TDigest
Empty t-digest with explicit compression parameter.