viva_math/tdigest

t-digest — streaming approximate quantiles.

Dunning’s t-digest (2014, 2019) provides accurate approximate quantiles over a stream with bounded memory and fast updates. Particularly accurate at the tail (q < 0.01 or q > 0.99), which is where most applications need precision.

Memory: O(δ) where δ is the compression parameter (default 100). Update: O(log δ) amortised. Quantile query: O(δ).

When to use

Reference

Dunning & Ertl (2019) “Computing Extremely Accurate Quantiles Using t-Digests” — https://github.com/tdunning/t-digest

Algorithm summary

A t-digest is a sorted set of centroids, each (mean, weight). New samples merge into the nearest centroid up to a size limit dictated by the scale function k(q) = δ · q · (1 - q) / 2π — small at the tails, large at the median, so tail centroids stay small and accurate.

Types

A centroid: a weighted point on the real line. A weighted centroid (mean, weight) representing a cluster of nearby samples in the digest. Opaque — invariants (sorted by mean, positive weight) are maintained by insert / merge / compress.

pub opaque type Centroid

t-digest state.

compression controls memory/accuracy tradeoff (typical 100). Larger → more accurate, more memory.

Opaque — direct construction could violate the sorted-centroids invariant or the consistency between centroids and total_weight. Use new / with_compression / insert to build.

pub opaque type TDigest

Values

pub fn centroid_mean(c: Centroid) -> Float

Mean of a centroid.

pub fn centroid_weight(c: Centroid) -> Float

Weight of a centroid.

pub fn compression(td: TDigest) -> Float

Compression parameter δ (typical 100; larger → more accurate / more memory).

pub fn count(td: TDigest) -> Float

Total number of samples represented by the digest.

pub fn insert(td: TDigest, value: Float) -> TDigest

Insert a single sample into the digest.

pub fn insert_all(td: TDigest, xs: List(Float)) -> TDigest

Insert many samples from a list. Equivalent to folding insert.

pub fn insert_weighted(
  td: TDigest,
  value: Float,
  weight: Float,
) -> TDigest

Insert a weighted sample.

pub fn max(td: TDigest) -> Result(Float, Nil)

Maximum sample so far.

pub fn median(td: TDigest) -> Result(Float, Nil)

Convenience: median.

pub fn merge(a: TDigest, b: TDigest) -> TDigest

Combine two digests into one. Useful for distributed reductions.

pub fn min(td: TDigest) -> Result(Float, Nil)

Minimum sample so far (first centroid mean, since centroids stay sorted).

pub fn new() -> TDigest

Empty t-digest with default compression (δ = 100).

pub fn p99(td: TDigest) -> Result(Float, Nil)

Convenience: p99 (extreme-tail quantile, where t-digest excels).

pub fn quantile(td: TDigest, q: Float) -> Result(Float, Nil)

Approximate quantile q ∈ [0, 1]. Returns Error for an empty digest.

pub fn with_compression(delta: Float) -> TDigest

Empty t-digest with explicit compression parameter.

Search Document