Pure functional t-digest data structure for accurate on-line accumulation of rank-based statistics such as quantiles, trimmed means, and cumulative distribution values.
A t-digest compresses a potentially unbounded stream of floating-point observations into a compact set of centroids -- (mean, weight) pairs -- that can answer percentile queries with bounded memory.
Key properties:
- Accuracy is best at the tails. P99, P99.9, and P0.1 estimates are more accurate than P50 estimates.
- Constant memory. A t-digest with compression=100 holds roughly 100-300 centroids regardless of how many samples are added.
- Mergeable. Two t-digests can be merged in O(c log c) time.
- Streaming. Samples can be added one at a time with amortized O(1) cost per sample.
Scale function
Uses the k1 scale function from Dunning's paper:
k(q) = (delta / (2 * pi)) * arcsin(2q - 1)Centroids near the tails (q near 0 or 1) are forced to be small (few observations), while centroids near the median can be large.
Usage
digest = Core.new(100)
digest = Core.add(digest, 42.0)
digest = Core.add_many(digest, [1.0, 2.0, 3.0])
Core.quantile(digest, 0.5)
Core.cdf(digest, 2.5)
Summary
Functions
Adds a single floating-point value to the digest.
Adds multiple floating-point values to the digest.
Returns the estimated value at a given rank position (0-based).
Returns the estimated value at a given reverse-rank position (0-based).
Estimates the cumulative distribution function value for a given observation.
Forces a compression pass, flushing the buffer into the centroids.
Returns metadata about the digest as a map.
Merges another t-digest into this one.
Merges multiple t-digests into a single digest with the given compression.
Creates a new empty t-digest with the given compression parameter.
Estimates the value at a given quantile position.
Estimates the number of observations less than the given value.
Resets the digest to empty, preserving its compression setting.
Estimates the number of observations greater than the given value.
Computes the trimmed mean between two quantile boundaries.
Types
@type t() :: %Ferricstore.TDigest.Core{ buffer: [float()], buffer_size: non_neg_integer(), centroids: [centroid()], compression: pos_integer(), count: non_neg_integer(), max: float() | nil, min: float() | nil, total_compressions: non_neg_integer() }
Functions
Adds a single floating-point value to the digest.
The value is buffered internally. When the buffer fills (capacity = ceil(compression * 3)), a compression pass runs automatically.
Parameters
digest- the t-digest structvalue- a float or integer value to add
Returns
The updated t-digest struct.
Adds multiple floating-point values to the digest.
More efficient than calling add/2 repeatedly as it batches the values
into the buffer before triggering compression.
Parameters
digest- the t-digest structvalues- list of float or integer values
Returns
The updated t-digest struct.
Returns the estimated value at a given rank position (0-based).
Parameters
digest- the t-digest structr- the rank (0-based non-negative integer)
Returns
- The estimated value at rank
r :"-inf"for rank < 0:inffor rank >= count:nanfor empty digest
Returns the estimated value at a given reverse-rank position (0-based).
Rank 0 is the largest observation.
Parameters
digest- the t-digest structr- the reverse rank (0-based)
Returns
- The estimated value at reverse rank
r :inffor r < 0:"-inf"for r >= count:nanfor empty digest
Estimates the cumulative distribution function value for a given observation.
Returns the estimated fraction of observations less than or equal to value.
Parameters
digest- the t-digest structvalue- the query value
Returns
- A float in [0.0, 1.0]
:nanif the digest is empty
Forces a compression pass, flushing the buffer into the centroids.
This is called automatically when the buffer fills or when a query is issued. You can call it manually to force the merge.
Returns
The compressed t-digest struct with an empty buffer.
Returns metadata about the digest as a map.
Returns
A map with keys: :compression, :capacity, :merged_nodes,
:unmerged_nodes, :merged_weight, :unmerged_weight,
:total_compressions, :memory_usage.
Merges another t-digest into this one.
The resulting digest contains all observations from both digests.
Parameters
a- the destination t-digestb- the source t-digest to merge in
Returns
A new t-digest containing all observations from both.
@spec merge_many([t()], pos_integer()) :: t()
Merges multiple t-digests into a single digest with the given compression.
Parameters
digests- list of t-digest structs to mergecompression- compression parameter for the result
Returns
A new merged t-digest.
@spec new(pos_integer()) :: t()
Creates a new empty t-digest with the given compression parameter.
The compression parameter (delta) controls the accuracy/memory tradeoff. Higher values retain more centroids. Typical values: 50-500.
Parameters
compression- positive integer, default 100
Examples
iex> Ferricstore.TDigest.Core.new()
%Ferricstore.TDigest.Core{compression: 100}
iex> Ferricstore.TDigest.Core.new(200)
%Ferricstore.TDigest.Core{compression: 200}
Estimates the value at a given quantile position.
Parameters
digest- the t-digest structq- quantile in [0.0, 1.0]
Returns
- The estimated value at quantile
q :nanif the digest is empty
Estimates the number of observations less than the given value.
Parameters
digest- the t-digest structvalue- the query value
Returns
An integer estimate of the rank (number of values below value).
Returns -1 if value is below the minimum observation.
Returns -2 for empty digest.
Resets the digest to empty, preserving its compression setting.
Returns
A fresh empty t-digest with the same compression parameter.
Estimates the number of observations greater than the given value.
Parameters
digest- the t-digest structvalue- the query value
Returns
An integer estimate of the reverse rank.
Computes the trimmed mean between two quantile boundaries.
Returns the mean of observations between lo and hi quantiles,
weighted by the fraction of each centroid that falls within the range.
Parameters
digest- the t-digest structlo- lower quantile boundary in [0.0, 1.0)hi- upper quantile boundary in (0.0, 1.0]
Returns
- The trimmed mean as a float
:nanif the digest is empty