Production-grade streaming data sketching algorithms for Elixir.
ExDataSketch provides probabilistic data structures for approximate counting and frequency estimation on streaming data. All sketch state is stored as Elixir-owned binaries, enabling straightforward serialization, distribution, and persistence.
Sketch Families
- ExDataSketch.HLL -- HyperLogLog for cardinality (distinct count) estimation.
- ExDataSketch.CMS -- Count-Min Sketch for frequency estimation.
- ExDataSketch.Theta -- Theta Sketch for set operations on cardinalities.
- ExDataSketch.KLL -- KLL Sketch for rank and quantile estimation.
- ExDataSketch.DDSketch -- DDSketch for value-relative-accuracy quantile estimation.
- ExDataSketch.FrequentItems -- SpaceSaving for approximate heavy-hitter detection.
- ExDataSketch.Bloom -- Bloom filter for probabilistic membership testing.
- ExDataSketch.Cuckoo -- Cuckoo filter for membership testing with deletion support.
- ExDataSketch.Quotient -- Quotient filter for membership testing with deletion and merge.
- ExDataSketch.CQF -- Counting Quotient Filter for multiset membership with approximate counting.
- ExDataSketch.XorFilter -- Xor filter for static, immutable membership testing.
- ExDataSketch.IBLT -- Invertible Bloom Lookup Table for set reconciliation.
- ExDataSketch.FilterChain -- Capability-aware composition framework for membership filters.
- ExDataSketch.REQ -- REQ Sketch for relative error quantiles with tail accuracy.
- ExDataSketch.MisraGries -- Misra-Gries for deterministic heavy hitter detection.
- ExDataSketch.Quantiles -- Facade for quantile sketch algorithms.
Architecture
- Binary state: All sketch state is canonical Elixir binaries. No opaque NIF resources.
- Backend system: Computation is dispatched through backend modules. ExDataSketch.Backend.Pure (pure Elixir) is always available. ExDataSketch.Backend.Rust (optional, precompiled binaries provided) provides NIF acceleration.
- Serialization: ExDataSketch-native format (EXSK) for all sketches, plus Apache DataSketches interop for Theta CompactSketch.
- Deterministic hashing: ExDataSketch.Hash provides a stable 64-bit hash interface for reproducible results.
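To illustrate why keeping state in a plain binary makes merging and serialization straightforward, here is a minimal toy sketch. `ToyRegisters`, its 16-byte layout, and the use of `:erlang.phash2/2` are assumptions made for illustration only; they do not reflect ExDataSketch's actual binary format or its hash function.

```elixir
defmodule ToyRegisters do
  # Hypothetical toy, NOT ExDataSketch's real layout or hash function.
  # State is a plain 16-byte binary: one max-rank register per byte.

  @registers 16

  # A fresh sketch is just 16 zero bytes.
  def new, do: :binary.copy(<<0>>, @registers)

  # Hash the item to 32 bits, use the top 4 bits as the register index and
  # the position of the first 1-bit in the remaining 28 bits as the rank.
  def update(regs, item) when byte_size(regs) == @registers do
    <<idx::4, rest::bitstring>> = <<:erlang.phash2(item, 4_294_967_296)::32>>
    rank = rank(rest, 1)
    <<head::binary-size(idx), old, tail::binary>> = regs
    <<head::binary, max(old, rank), tail::binary>>
  end

  # Merging two sketches is a bytewise max over their registers.
  def merge(a, b) when byte_size(a) == byte_size(b) do
    Enum.zip(:binary.bin_to_list(a), :binary.bin_to_list(b))
    |> Enum.map(fn {x, y} -> max(x, y) end)
    |> :binary.list_to_bin()
  end

  # Count leading zero bits; stops at the first 1-bit or when bits run out.
  defp rank(<<0::1, rest::bitstring>>, acc), do: rank(rest, acc + 1)
  defp rank(_, acc), do: acc
end

left = Enum.reduce(["alice", "bob"], ToyRegisters.new(), &ToyRegisters.update(&2, &1))
right = Enum.reduce(["bob", "carol"], ToyRegisters.new(), &ToyRegisters.update(&2, &1))

# Two partial states combine into the state a single pass would have produced.
merged = ToyRegisters.merge(left, right)

# Because the state is an ordinary binary, it survives a serialization round trip unchanged.
restored = merged |> :erlang.term_to_binary() |> :erlang.binary_to_term()
```

Since the state is an ordinary term, it can be sent between processes or nodes, or written to disk, without any NIF resource handle to keep alive.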
Quick Example
# Cardinality estimation with HLL
sketch = ExDataSketch.HLL.new(p: 14)
sketch = ExDataSketch.update_many(sketch, ["alice", "bob", "alice"])
ExDataSketch.HLL.estimate(sketch)
# Frequency estimation with CMS
sketch = ExDataSketch.CMS.new(width: 2048, depth: 5)
sketch = ExDataSketch.update_many(sketch, ["page_a", "page_a", "page_b"])
ExDataSketch.CMS.estimate(sketch, "page_a")
Integration Patterns
Each sketch module provides convenience functions for ecosystem integration:
- from_enumerable/2 -- build a sketch from any Enumerable in one call.
- merge_many/1 -- merge a collection of sketches (e.g. from parallel workers).
- reducer/1 -- returns a 2-arity function for use with Enum.reduce/3, Flow, etc.
- merger/1 -- returns a 2-arity function for merging sketches in reduce operations.
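The shape of these four helpers can be sketched with an exact set standing in for a probabilistic sketch. `ExactSet` is a hypothetical module written for this illustration; only the function names and arities mirror the list above, and the way options are passed is an assumption rather than the library's documented behavior.

```elixir
defmodule ExactSet do
  # Hypothetical stand-in for a sketch module: an exact MapSet instead of a
  # probabilistic structure. Option handling here is an assumption.

  def new(_opts \\ []), do: MapSet.new()
  def update(set, item), do: MapSet.put(set, item)
  def merge(a, b), do: MapSet.union(a, b)

  # Build from any Enumerable in one call.
  def from_enumerable(enumerable, opts \\ []) do
    Enum.reduce(enumerable, new(opts), reducer(opts))
  end

  # Fold a non-empty list of partial results into one.
  def merge_many([first | rest]), do: Enum.reduce(rest, first, merger([]))

  # 2-arity functions in the (element, accumulator) order Enum.reduce/3 expects.
  def reducer(_opts), do: fn item, set -> update(set, item) end
  def merger(_opts), do: fn other, set -> merge(set, other) end
end

# Parallel workers each build a partial result; the caller merges them.
partials =
  1..1_000
  |> Enum.chunk_every(250)
  |> Task.async_stream(&ExactSet.from_enumerable/1)
  |> Enum.map(fn {:ok, set} -> set end)

total = ExactSet.merge_many(partials)
MapSet.size(total)
# => 1000
```

With a real sketch the per-worker state stays small regardless of input size, which is what makes the build-then-merge pattern attractive for parallel pipelines.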
Stream Integration
ExDataSketch.Stream provides terminal stream consumers that build sketches
from lazy enumerables without buffering the entire input:
1..100_000
|> Stream.map(&to_string/1)
|> ExDataSketch.Stream.hll(p: 14)
|> ExDataSketch.HLL.estimate()
For partition-local reduction:
1..1_000_000
|> ExDataSketch.Stream.reduce_partitioned(ExDataSketch.HLL, partitions: 8, p: 14)
Collectable
All mergeable sketches implement the Collectable protocol, enabling
Enum.into/2 usage:
sketch = Enum.into(1..1000, ExDataSketch.HLL.new(p: 14))
See the Integration Guide for examples with Flow, Broadway, Explorer, Nx, and other ecosystem libraries.
See the Quick Start guide for more examples.
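For readers unfamiliar with how a Collectable implementation makes Enum.into/2 work, the following is a minimal sketch using a hypothetical `DistinctBag` struct. It is a stand-in written for this illustration and is not how ExDataSketch implements the protocol for its sketches.

```elixir
defmodule DistinctBag do
  # Hypothetical stand-in for a sketch struct; it tracks exact distinct items.
  defstruct seen: %{}

  def new, do: %__MODULE__{}
  def size(%__MODULE__{seen: seen}), do: map_size(seen)
end

defimpl Collectable, for: DistinctBag do
  # into/1 returns the initial accumulator and a collector function that
  # handles {:cont, item}, :done and :halt.
  def into(bag) do
    collector = fn
      acc, {:cont, item} -> %{acc | seen: Map.put(acc.seen, item, true)}
      acc, :done -> acc
      _acc, :halt -> :ok
    end

    {bag, collector}
  end
end

bag = Enum.into(["alice", "bob", "alice"], DistinctBag.new())
DistinctBag.size(bag)
# => 2
```

The collector receives one `{:cont, item}` call per element, so any structure with an incremental update operation can be the target of Enum.into/2.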
Functions
@spec update_many(
        ExDataSketch.HLL.t()
        | ExDataSketch.CMS.t()
        | ExDataSketch.Theta.t()
        | ExDataSketch.KLL.t()
        | ExDataSketch.DDSketch.t()
        | ExDataSketch.FrequentItems.t()
        | ExDataSketch.Bloom.t()
        | ExDataSketch.Cuckoo.t()
        | ExDataSketch.Quotient.t()
        | ExDataSketch.CQF.t()
        | ExDataSketch.IBLT.t()
        | ExDataSketch.REQ.t()
        | ExDataSketch.MisraGries.t(),
        Enumerable.t()
      ) ::
        ExDataSketch.HLL.t()
        | ExDataSketch.CMS.t()
        | ExDataSketch.Theta.t()
        | ExDataSketch.KLL.t()
        | ExDataSketch.DDSketch.t()
        | ExDataSketch.FrequentItems.t()
        | ExDataSketch.Bloom.t()
        | ExDataSketch.Cuckoo.t()
        | ExDataSketch.Quotient.t()
        | ExDataSketch.CQF.t()
        | ExDataSketch.IBLT.t()
        | ExDataSketch.REQ.t()
        | ExDataSketch.MisraGries.t()
Updates a sketch with multiple items in a single pass.
Delegates to the appropriate sketch module's update_many/2 based on
the struct type.
Examples
iex> sketch = ExDataSketch.HLL.new(p: 10)
iex> sketch = ExDataSketch.update_many(sketch, ["a", "b"])
iex> ExDataSketch.HLL.estimate(sketch) > 0.0
true
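Delegation based on the struct type can be sketched as follows. The modules `ToyCounter`, `ToySet`, and `ToyFacade` are hypothetical and exist only for this illustration; the library may well dispatch with one function clause per struct instead of the single generic clause shown here.

```elixir
defmodule ToyCounter do
  # Hypothetical sketch-like struct that counts every item it sees.
  defstruct count: 0

  def new, do: %__MODULE__{}

  def update_many(%__MODULE__{count: c} = s, items) do
    %{s | count: c + Enum.count(items)}
  end
end

defmodule ToySet do
  # Hypothetical sketch-like struct that tracks distinct items.
  defstruct seen: %{}

  def new, do: %__MODULE__{}

  def update_many(%__MODULE__{seen: seen} = s, items) do
    %{s | seen: Enum.reduce(items, seen, &Map.put(&2, &1, true))}
  end
end

defmodule ToyFacade do
  # Matching on %module{} binds the struct's module, so one clause can
  # forward the call to whichever module owns the struct.
  def update_many(%module{} = sketch, items), do: module.update_many(sketch, items)
end

ToyFacade.update_many(ToyCounter.new(), ["a", "b", "a"]).count
# => 3
map_size(ToyFacade.update_many(ToySet.new(), ["a", "b", "a"]).seen)
# => 2
```

A facade of this kind lets calling code stay agnostic about which sketch family it holds, at the cost of a wide union type in the function's spec.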