All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
[0.9.0] - 2026-05-19
Release theme: Streaming Integrations. Transforms ex_data_sketch from a collection of probabilistic algorithms into a BEAM-native streaming approximate analytics infrastructure layer. Stream/Collectable integration, Broadway/GenStage/Flow pipelines, five persistence backends, production-grade telemetry + OpenTelemetry, ULL accuracy fixes, and comprehensive educational materials.
Added
Stream and Collectable integration (Phase 1).
ExDataSketch.Stream-- terminal stream consumers (hll/2,cms/2,theta/2,ull/2,kll/2,ddsketch/2,req/2,bloom/2,quotient/2,cqf/2,iblt/2,frequent_items/2,misra_gries//2).ExDataSketch.Stream.reduce_into/3-- reduce an enumerable into a module or existing sketch.ExDataSketch.Stream.reduce_partitioned/3-- partitioned parallel reduction with merge.Collectableprotocol for all mergeable sketches --Enum.into/2andforcomprehensions.from_enumerable/2on all 13 mergeable sketch modules.reducer/1andmerger/1on all mergeable sketch modules forEnum.reduce/3andFlow.reduce/3ergonomics.
Broadway, GenStage, and Flow integration (Phase 2).
ExDataSketch.Broadway.accumulate/3andaccumulate_into/4-- build sketches from Broadway message batches.ExDataSketch.Broadway.PeriodicAggregator-- GenServer that accumulates sketches and flushes on a timer with optional callback.ExDataSketch.GenStage.SketchConsumer-- GenStage consumer that accumulates events into a sketch, supports periodic flush.ExDataSketch.GenStage.SketchProducer-- GenStage producer that emits accumulated sketches on demand.ExDataSketch.GenStage.SketchStage-- combined producer-consumer that accumulates and emits.ExDataSketch.Flow.reduce/3andmerge/2-- parallel partition-local reduction with merge for Flow pipelines.- All integration modules are optional and gated behind dependency availability checks (
ExDataSketch.Integration).
Persistence surfaces (Phase 3).
ExDataSketch.Storage.ETS-- in-memory persistence withsave/3,load/3,merge/3,delete/2.ExDataSketch.Storage.DETS-- disk-backed persistence with same API.ExDataSketch.Storage.CubDB-- CubDB persistence for atomic key-value storage.ExDataSketch.Storage.Mnesia-- distributed persistence for multi-node scenarios.ExDataSketch.Storage.Ecto-- SQL database persistence with schema and migration helpers.ExDataSketch.Storage.Ecto.SchemaandExDataSketch.Storage.Ecto.Migration-- Ecto schema and migration for sketch storage.ExDataSketch.Storage-- shared behaviour documentation and types for all backends.- All backends serialize via EXSK v2 binary format with CRC32C checksum; no raw state is ever stored.
- Configuration-driven backend availability via
config :ex_data_sketch, :persistence_backends.
Telemetry and observability (Phase 4).
ExDataSketch.Telemetry-- structured telemetry event emission at batch/compound operation boundaries (not per-update).- Four event categories:
:sketch(create, ingest, merge, serialize, deserialize),:persistence(save, load, merge, delete),:stream(reduce, partition_merge),:pipeline(accumulate, periodic_flush). Telemetry.execute/4,Telemetry.span/5,Telemetry.span_with_result/6-- timing wrappers with category-based enable/disable.Telemetry.event_name/2andall_event_names/0for programmatic handler attachment.ExDataSketch.Telemetry.OpenTelemetry-- OTEL span bridge (requires:opentelemetry_api ~> 1.0).- Configuration:
config :ex_data_sketch, telemetry_enabled: falseor per-categoryconfig :ex_data_sketch, :telemetry, sketch: true, persistence: false. - Telemetry events instrumented in all 13 sketch modules, all 5 storage backends,
Stream, andBroadway/GenStage/Flow.
ULL accuracy correction (Phase 5).
- ULL linear counting correction:
zeros > 0threshold (not HLL-styleraw_estimate <= 2.5*m && zeros > 0). Empirical validation shows linear counting always more accurate for ULL when empty registers exist. - ULL large range correction: bias correction for very high cardinality estimates, matching Ertl 2023.
- Both Pure Elixir and Rust NIF backends updated; property tests updated with tiered accuracy bounds (35%/25%/15% at p=8).
- ULL linear counting correction:
Configurable
update_manychunk size (Phase 5).update_many_chunk_sizeoption on HLL, ULL, CMS, and Theta (vianew/1opts). Default 10,000 (backward compatible).
EXSK v1 serialization escape hatch (Phase 5).
HLL.serialize(sketch, format: :v1)produces a backward-compatible v0.7.x binary (requires:phash2hash strategy, raisesArgumentErrorfor other strategies).Binary.encode_v1/4utility for custom v1 encoding.- v0.7.x binaries remain decodable via
Binary.decode/1(version sniffing).
Generalized corruption propagation properties (Phase 5).
- HLL, ULL, and CMS bit-flip properties in
property_guarantees_test.exsasserting that corrupted frames either fail CRC or produce estimates within 10x of the truthful estimate (never silently catastrophic). - Quotient filter delete property: count reduction (not
member?becomes false).
- HLL, ULL, and CMS bit-flip properties in
Benchmarks and property tests (Phase 6).
bench/persistence_bench.exs-- ETS save/load/merge overhead.bench/serialization_bench.exs-- serialize/deserialize throughput.bench/merge_throughput_bench.exs-- HLL/ULL/CMSmerge_manybenchmarks.bench/update_many_chunk_bench.exs-- configurable chunk size impact on throughput.bench/stream_ingestion_bench.exs-- stream ingestion latency and throughput.test/ex_data_sketch_serialization_stability_test.exs-- 7 round-trip properties (HLL v2/v1, ULL, CMS, Theta, Bloom, v1-v2 cross-version).- Expanded stream properties: ULL stream equivalence, ULL partition merge, ULL merge associativity, Theta stream equivalence, CMS merge associativity.
- Expanded storage properties: DETS save/load, DETS merge.
Educational materials.
guides/aggregation_wall.md(188 lines) -- why exact aggregation breaks at scale, BEAM's natural fit, common patterns.guides/distributed_merge_semantics.md(328 lines) -- associativity/commutativity proofs, fan-in/tree/partition patterns, anti-patterns.guides/livebooks.md-- Livebook catalogue with recommended order and learning objectives.- Updated
guides/telemetry.md-- pipeline/stream event tables,all_event_names/0reference. - Updated
guides/streaming_sketches.md-- Stream API, Collectable, partitioned reduction. - Updated
guides/broadway_integration.md--accumulate/3,accumulate_into/4,PeriodicAggregator. - Updated
guides/genstage_integration.md--SketchConsumer,SketchProducer,SketchStage. - Updated
guides/persistence.md-- all 5 backends, configuration, EXSK v2 storage contract. - Updated
guides/observability.md-- telemetry categories, event names, OTEL bridge.
Livebooks.
livebooks/streaming_cardinality.livemd-- Stream API, precision tradeoffs, ULL vs HLL.livebooks/broadway_integration.livemd-- accumulate, PeriodicAggregator, partition handling.livebooks/genstage_aggregation.livemd-- SketchConsumer, SketchProducer, flush patterns.livebooks/rolling_telemetry.livemd-- time-windowed sketches, ETS persistence.livebooks/distributed_merges.livemd-- associativity, tree aggregation, ETS sharding.livebooks/persistence_snapshots.livemd-- ETS/DETS, serialization, multi-backend strategy.livebooks/livedashboard_integration.livemd-- telemetry wiring, custom dashboard pages.livebooks/ai_token_analytics.livemd-- LLM workload multi-dimensional sketch dashboard.livebooks/phoenix_observability.livemd-- DAU, latency, rate limiting, ETS persistence.
Changed
ExDataSketch.HLL.new/1now acceptsupdate_many_chunk_sizeoption (default 10,000).ExDataSketch.ULL.new/1now acceptsupdate_many_chunk_sizeoption (default 10,000).ExDataSketch.CMS.new/1now acceptsupdate_many_chunk_sizeoption (default 10,000).ExDataSketch.Theta.new/1now acceptsupdate_many_chunk_sizeoption (default 10,000).- ULL accuracy at low cardinalities significantly improved via linear counting + large range correction. Users may see different estimates for sketches with very few items; the new estimates are more accurate.
ExDataSketch.ULLmoduledoc updated with estimation strategy description and p>=12 recommendation.
Fixed
- ULL low-precision accuracy: p=8 with n=1000 improved from ~62.5% relative error to ~0.8% via linear counting correction.
- ETS merge test tolerance: 60% tolerance for cardinality < 5 (HLL at p=10 has high relative error at tiny cardinalities).
- Quotient filter delete property: corrected from asserting
member?becomes false (not guaranteed) to asserting count reduction. - DETS API: corrected
close_fileto:dets.close/1in property tests. PeriodicAggregatortelemetry metadata: usessketch_typeonly (removed non-existentstate.id).- OTEL handler IDs: tuples
{"ex_data_sketch_opentelemetry", event_name}, not strings.
Migration
See guides/v0.8.0_migration_notes.md for the v0.7.x to v0.8.0 migration guide. For v0.8.0 to v0.9.0:
- No code changes required for most users. All new modules are additive; existing APIs are backward compatible.
- ULL estimates may change at very low cardinalities (p < 12, n < 500). The new estimates are more accurate. If you depend on exact numeric values in tests, add tolerance for small cardinalities.
update_many_chunk_sizedefaults to 10,000 (matching v0.8.0 behavior). No change needed unless you want to tune batch throughput.- v1 serialization is an opt-in escape hatch via
format: :v1. Default serialization remains EXSK v2. - Telemetry is enabled by default. Disable with
config :ex_data_sketch, telemetry_enabled: false. - Persistence backends are enabled by default when their runtime dependencies are available. Disable individuals via
config :ex_data_sketch, :persistence_backends, ets: [enabled: false].
Stats
- +21 new modules:
Stream,Broadway,Broadway.PeriodicAggregator,Flow,GenStage,GenStage.SketchConsumer,GenStage.SketchProducer,GenStage.SketchStage,Storage,Storage.ETS,Storage.DETS,Storage.CubDB,Storage.Mnesia,Storage.Ecto,Storage.Ecto.Schema,Storage.Ecto.Migration,Telemetry,Telemetry.OpenTelemetry,Integration,Binary,Binary.encode_v1/4utility. - 1558 tests, 204 doctests, 199 properties, 0 failures (NIF on).
- 9 Livebooks, 20 guides (3 new educational guides + 6 updated +
livebooks.mdindex). - 5 new benchmark suites, 7 new property test groups.
:telemetry ~> 1.0required dependency;:opentelemetry_api ~> 1.0,:broadway,:flow,:cubdb,:ecto_sql,:mnesiaoptional dependencies.
[0.8.0] - 2026-05-12
Release theme: Deterministic Foundations. Transforms ex_data_sketch from a collection of probabilistic algorithms into a production-grade probabilistic runtime for the BEAM. Focus: deterministic hashing, binary stability, corruption detection, hot-path performance, and installation reliability.
Added
Deterministic hashing infrastructure (Phase 1).
ExDataSketch.Hash.XXH3— focused XXHash3-64 wrapper. RaisesArgumentErrorwhen the Rust NIF is unavailable so hash drift cannot occur silently.ExDataSketch.Hash.Murmur3— fullMurmurHash3_x64_128returning the high 64 bits (Apache DataSketches convention). Pure Elixir and Rust NIF implementations are byte-identical, verified by property-based parity (200 random inputs per CI run) and against canonical Pythonmmh3regression vectors.ExDataSketch.Hash.Metadata— 16-byte versioned binary block recording(algorithm, seed, sketch_family, sketch_family_version, backend)with a forward-compatible extension trailer. The building block for the EXSK v2 binary header.ExDataSketch.Hash.Validation— centralized merge-compatibility checks (validate_options!/3,validate_metadata!/3,compatible_options?/2).- Public registry API on
ExDataSketch.Hash:default_algorithm/0,supported_algorithms/0,algorithm_info/1. ExDataSketch.Hash.resolve_strategy/1— single source of truth for sketch constructors. Honors a user-supplied:hash_strategyor falls back todefault_algorithm/0.HLL, ULL, Theta, CMS
new/1now respect user-supplied:hash_strategy(:xxhash3 | :murmur3 | :phash2). The option was silently overridden in v0.7.x.
Binary stability and corruption detection (Phase 2).
ExDataSketch.Binary— public facade for the EXSK v2 frame (encode/3,decode/1,peek_version/1,build_payload/2,metadata_from_opts/3).ExDataSketch.Binary.Header— EXSK v2 frame encoder/decoder. Layout: magic + version + sketch_family + family_version + flags + header_size +Hash.Metadatablock + payload_size + payload + CRC32C trailer.ExDataSketch.Binary.Validator— discrete defensive check helpers (check_minimum_v2_size,check_magic,check_version,check_crc).ExDataSketch.Binary.CRC— CRC32C (Castagnoli polynomial, reflected, init0xFFFFFFFF, final XOR0xFFFFFFFF). Pure Elixir and Rust NIF implementations are byte-identical, verified against the standard"123456789" -> 0xE3069283check vector and Pythoncrc32cregression vectors.- Rust
crc32c_nif— table-driven CRC32C, ~1 GB/s on commodity hardware.
HLL hot-path generalization (Phase 3).
- 8 new Rust NIFs:
{hll, ull, theta, cms}_update_many_raw_h_nif/_dirty_nif. Each accepts analgorithm: u8parameter (1 = xxhash3,2 = murmur3) and shares a singleMurmur3_x64_128implementation viapub(crate)export fromhash.rs. - End-to-end Murmur3 acceleration:
:murmur3callers now hit the in-Rust hashing fast path instead of falling off to the Elixir-side hash. bench/hll_hot_path_bench.exs— comprehensive benchmark across Pure phash2 / Pure xxhash3 / Rust raw XXH3 / Rust raw_h Murmur3 at 10k / 100k / 1M items.
- 8 new Rust NIFs:
Precompiled NIF platform matrix (Phase 4).
- Two Windows targets added:
x86_64-pc-windows-msvcandaarch64-pc-windows-msvc. Release matrix is now 8 targets × 2 NIF versions = 16 artifacts per tagged release. mix test.nif_onandmix test.nif_offaliases automatically reset the per-envrustler_precompiled :force_buildstate between local NIF-mode flips.test/ex_data_sketch/nif_availability_test.exs— 18 contract tests assertingHash.nif_available?/0stability, default-algorithm reflection, registry availability flags,XXH3.hash/2failure mode, Murmur3 NIF-less fallback, checksum file shape, andnif.ex↔release.ymltarget-list alignment.
- Two Windows targets added:
Property-based validation (Phase 5).
test/property_guarantees_test.exs— 14 new properties locking:- HLL / ULL cardinality monotonicity and error bounds within published RSE.
- KLL / REQ rank monotonicity and quantile/rank inversion within published epsilon.
- CMS overestimation-only (
estimate(item) >= true_count(item)). - Bloom / XorFilter / Cuckoo no-false-negative guarantees.
- Binary v2 bit-flip corruption never silently propagates to a sketch.
User-facing release guides (shipped to HexDocs):
guides/v0.8.0_migration_notes.md— v0.7.x to v0.8.0 upgrade guide.guides/v0.8.0_architecture.md— layered architecture overview.guides/serialization_compatibility.md— the v0.x EXSK stability contract.guides/hash_strategies.md— choosing between phash2, XXH3, Murmur3, and custom.guides/hll_performance.md— HLL hot-path architecture, benchmark numbers, and external-library context.guides/precompiled_nifs.md— platform matrix, release pipeline, and source-build fallback.guides/roadmap.md— v0.9.0 preview.
Internal plans and reviewer checklists (repo-only, not packaged):
plans/hash_binary_contract.mdplans/binary_contract.md,plans/corruption_detection.mdplans/hll_scheduler_safety.mdplans/property_testing.mdplans/0.8.0_implementation_plan.md, Phase 1-5 reviewer checklistsplans/0.8.0-risks.md(31-risk consolidated register)plans/0.8.0-review.md(pre-release code review)
Changed
- EXSK serialization format bumped to v2. Every sketch's
serialize/1now produces an EXSK v2 frame (magic + version 2 + sketch family + family version + flags + header_size + 16-byte hash metadata block + payload size + payload + CRC32C trailer). v0.7.x EXSK v1 frames remain decodable viaBinary.decode/1's version sniffing;ExDataSketch.Codecis preserved as the legacy v1 path. - Golden vectors regenerated as v2 under
test/vectors/. The previous v1 vectors are preserved undertest/vectors_v1/and exercised bytest/ex_data_sketch_v1_compat_test.exsas a permanent regression guard. README.mdroadmap rewritten to match the strategic roadmap inplans/next_steps.md: v0.8.0 = Deterministic Foundations; v0.9.0 = Streaming Integrations; v0.10.0 = Apache Interoperability; v0.11.0 = New Sketch Families (CPC, Tuple); v0.12.0 = Similarity & Sampling (MinHash, VarOpt); v1.0.0 = Stable Binary Contract.ExDataSketch.Hash.validate_merge_hash_compat!/3is preserved as a backward-compatible shim that delegates toExDataSketch.Hash.Validation.validate_options!/3.
Fixed
ExDataSketch.Hash.XXH3doctests are now NIF-safe: they exercise the success path when the NIF is available and explicitly verify the documentedArgumentErrorcontract when the NIF is unavailable, removing a CI failure on theEX_DATA_SKETCH_SKIP_NIF=truelane.test/ex_data_sketch/nif_availability_test.exschecksum-file assertions softened to "if present, parses as a map" so fresh checkouts (wherechecksum-Elixir.ExDataSketch.Nif.exshas not been populated by the release pipeline) pass CI.
Migration
See guides/v0.8.0_migration_notes.md (shipped in HexDocs) for the full v0.7.x -> v0.8.0 migration guide. Key points:
- No code changes required for most users. EXSK v1 frames are still decoded; the
serialize/1output format changes but downstream code that uses round-trip serialization sees no API difference. - One-way upgrade for persisted sketches. v0.7.x cannot read v0.8.0-produced binaries. Stage your rollout: deploy v0.8.0 readers first, then producers.
- Opt-in Murmur3. New
:murmur3strategy is opt-in viahash_strategy: :murmur3at sketch construction. Default remains:xxhash3.
Documentation
User-facing guides (shipped to HexDocs and the Hex package):
guides/v0.8.0_architecture.md— consolidated Phase 1-5 design overview.guides/serialization_compatibility.md— the v0.x stability contract.guides/roadmap.md— preview of the next release's streaming-integration scope.
Internal documentation (repo-only; linked from the user guides):
plans/0.8.0-risks.md— open risk register at release time.plans/0.8.0-review.md— pre-release code review.
Stats
- +10 new modules:
Hash.XXH3,Hash.Murmur3,Hash.Metadata,Hash.Validation,Binary,Binary.Header,Binary.Validator,Binary.CRC(Elixir); 2 Rust NIF entry points (hash,crc). - +11 Rust NIFs:
murmur3_x64_128_nif,murmur3_x64_128_full_nif,crc32c_nif, 4 ×*_update_many_raw_h_nif+ 4 ×*_dirty_nif. - +92 tests, +33 doctests, +19 properties since v0.7.1.
- Full suite (NIF on): 1,317 tests, 202 doctests, 171 properties, 0 failures.
- Full suite (NIF off): 1,088 tests, 202 doctests, 128 properties, 0 failures.
- Coverage: 92.7% line coverage (target was 70%).
- HLL throughput: 25-34 M items/sec at p=14 (XXH3, Rust raw); ~15x faster than the Pure path.
[0.7.1] - 2026-03-22
Added
- Move hashing into NIF batch calls:
update_manyfor HLL, ULL, Theta, and CMS now sends raw items to Rust and hashes inside the NIF, eliminating per-item Elixir heap allocations (94.6% memory reduction at 10M items). (#202) - Wire
:hash_fnand:seedoptions through HLL, ULL, Theta, and CMS, enabling custom hash functions and reproducible seeded hashing. (#198) - Merge hash-compatibility validation:
merge/2on HLL, ULL, Theta, and CMS now raisesIncompatibleSketchesErrorwhen hash strategy or seed differs between sketches. (#205) - Pure backend
hll_update_manyoptimization: pre-aggregate map with sorted binary splice replaces tuple-based per-hash full-tuple copies, reducing transient allocation from O(n * m) to O(n + m). - ListIterator-based NIF item decoding for zero-copy Erlang list iteration in Rust.
- Test infrastructure: configurable coverage baselines, Rust CI coverage reporting, 39 new tests covering deserialization edge cases, custom hash_fn paths, helper functions, and merge validation.
Fixed
- Quotient filter wrap-around bug:
extract_allin both Pure and Rust backends failed when a cluster wrapped from slot N-1 to slot 0, producing nil quotients (Pure crash) or silent corruption (Rust). (#203, #204) - CMS merge validation: replaced flawed
Keyword.delete(opts, :hash_strategy)comparison with explicit width/depth/counter_width checks. - Pure backend
update_manyregression: restored chunk + batch*_update_manypath for HLL, ULL, and Theta (was incorrectly using per-item*_update). - Seed clamping: all raw NIF functions now clamp seed values to u64 range before passing to Rust.
- Hash-dependent vector tests tagged with
@tag :rust_nifto prevent failures in pure-only CI (vectors were generated with xxhash3).
[0.7.0] - 2026-03-11
Added
- UltraLogLog sketch (
ExDataSketch.ULL) for improved cardinality estimation with ~20% lower relative error than HLL at the same memory footprint. ULL1 binary state format with 8-byte header + 2^p registers. EXSK serialization (sketch ID 15). - ULL register encoding from Ertl 2023:
register_value = 2 * geometric_rank - sub_bitdoubles the information per register compared to HLL. - FGRA estimator (Ertl 2017 sigma/tau convergence) for ULL cardinality estimation.
- Rust NIF acceleration for ULL:
update_many,merge, andestimateoperations with dirty scheduler thresholds. - Precision parameter
psupports range 4..26 (vs 4..16 for HLL), allowing higher accuracy at larger memory budgets. - Full API:
new,update,update_many,merge,estimate,count,serialize,deserialize,from_enumerable,merge_many,reducer,merger,size_bytes. - ULL test vectors, parity tests, merge law property tests, and benchmark suite.
[0.6.0] - 2026-03-11
Added
- REQ sketch (
ExDataSketch.REQ) for relative-error quantile estimation with configurable high-rank accuracy. REQ1 binary state format. EXSK serialization (sketch ID 13). - Misra-Gries sketch (
ExDataSketch.MisraGries) for deterministic heavy-hitter detection with configurable key encoding (:binary,:int,{:term, :external}). MG01 binary state format. EXSK serialization (sketch ID 14). - XXHash3 NIF integration (
ExDataSketch.Hash.xxhash3_64/1,2) for fast, cross-platform stable hashing via Rust NIF with phash2-based fallback. - KLL
cdf/2andpmf/2for cumulative distribution and probability mass functions. - DDSketch
rank/2for normalized rank queries. - Rust NIF acceleration for all membership filters: Bloom, Cuckoo, Quotient, CQF, XorFilter, and IBLT. Batch operations (
put_many,merge,build) automatically use compiled Rust NIFs when available, with dirty scheduler thresholds for large inputs. - Parity tests verifying byte-identical serialization between Pure Elixir and Rust NIF backends for all sketch algorithms.
- Benchmark suites for REQ sketch, Misra-Gries, and XXHash3 NIF throughput.
Quantilesfacade for unified quantile sketch API across KLL and DDSketch.
[0.5.0] - 2026-03-10
Added
- Cuckoo filter (
ExDataSketch.Cuckoo) with Pure Elixir backend. CKO1 binary state format. Partial-key cuckoo hashing with configurable fingerprint size, bucket size, and max kicks. Supports insertion, safe deletion, and membership testing. EXSK serialization (sketch ID 8). - Quotient filter (
ExDataSketch.Quotient) with Pure Elixir backend. QOT1 binary state format. Quotient/remainder fingerprint splitting with linear probing and metadata bits (is_occupied, is_continuation, is_shifted). Supports insertion, safe deletion, merge, and membership testing. EXSK serialization (sketch ID 9). - Counting Quotient Filter (
ExDataSketch.CQF) with Pure Elixir backend. CQF1 binary state format. Extends quotient filter with variable-length counter encoding for multiset membership and approximate counting viaestimate_count/2. Supports insertion, deletion, merge. EXSK serialization (sketch ID 10). - XorFilter (
ExDataSketch.XorFilter) with Pure Elixir backend. XOR1 binary state format. Static build-once immutable filter constructed viabuild/2with 8-bit or 16-bit fingerprints. Supports membership testing only. EXSK serialization (sketch ID 11). - IBLT (
ExDataSketch.IBLT) with Pure Elixir backend. IBL1 binary state format. Invertible Bloom Lookup Table for set reconciliation viasubtract/2andlist_entries/1. Supports set mode and key-value mode, insertion, deletion, merge. EXSK serialization (sketch ID 12). - FilterChain (
ExDataSketch.FilterChain) for capability-aware membership filter composition. FCN1 binary state format. Lifecycle-tier patterns (hot/warm/cold) with stage position enforcement. Supportsadd_stage/2,put/2,member?/2,delete/2. Serializes all stages in order. - Benchmark suites for Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain (
bench/*.exs). UnsupportedOperationErrorfor operations not supported by a structure (used by FilterChain).InvalidChainCompositionErrorfor invalid FilterChain stage composition.capabilities/0function on Bloom, Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain modules.- Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain backend callbacks on
ExDataSketch.Backend.
[0.4.0] - 2026-03-06
Added
- Bloom filter (
ExDataSketch.Bloom) with Pure Elixir backend. - BLM1 binary state format (40-byte header + LSB-first packed bitset).
- Double hashing (Kirsch-Mitzenmacher) deriving k bit positions from a single 64-bit hash.
- Bloom backend callbacks:
bloom_new/1,bloom_put/3,bloom_put_many/3,bloom_member?/3,bloom_merge/3,bloom_count/2. - Automatic parameter derivation from capacity and false_positive_rate options.
- Bloom merge via bitwise OR with validation of matching bit_count, hash_count, and seed.
- Bloom popcount-based cardinality estimation.
- Bloom serialization via EXSK envelope (sketch ID 7).
- Bloom property tests (no false negatives, merge commutativity/associativity/identity, serialization round-trip).
- Bloom statistical validation tests (observed FPR within 2x of target).
- Bloom merge law properties in merge_laws_test.exs.
- Bloom parity test stubs for future Rust NIF backend.
- Bloom benchmark suite (
bench/bloom_bench.exs). - Bloom options section in usage guide.
[0.3.0] - 2026-03-06
Added
- FrequentItems sketch (
ExDataSketch.FrequentItems) using the SpaceSaving algorithm with Pure Elixir and Rust NIF backends. - FI1 binary state format (32-byte header + variable-length sorted entries).
- FrequentItems backend callbacks:
fi_new/1,fi_update/3,fi_update_many/3,fi_merge/3,fi_estimate/3,fi_top_k/3,fi_count/2,fi_entry_count/2. - Batch optimization via pre-aggregation (
Enum.frequencies/1) with weighted updates. - Deterministic tie-breaking on eviction (lexicographically smallest item_bytes).
- Key encoding policies:
:binary,:int(signed 64-bit LE),{:term, :external}. - Commutative merge via additive count combination and canonical replay.
- Rust NIF acceleration for
fi_update_manyandfi_mergewith dirty scheduler support. - FrequentItems support in
ExDataSketch.update_many/2facade. - FrequentItems merge law properties (commutativity, identity, count conservation).
- FrequentItems golden vector test fixtures.
- FrequentItems parity tests ensuring byte-identical output between Pure and Rust backends.
- FrequentItems benchmark suite (
bench/frequent_items_bench.exs). - EXSK codec sketch ID 6 for FrequentItems.
- FrequentItems usage documentation in usage guide with SpaceSaving algorithm overview.
- Mox test dependency for backend contract testing.
- Theta
compact/1function for explicit compaction.
[0.2.1] - 2026-03-05
Added
- DDSketch quantiles sketch (
ExDataSketch.DDSketch) with Pure Elixir and Rust NIF backends. - DDSketch backend callbacks:
ddsketch_new/1,ddsketch_update/3,ddsketch_update_many/3,ddsketch_merge/3,ddsketch_quantile/3,ddsketch_count/2,ddsketch_min/2,ddsketch_max/2. - Rust NIF acceleration for
ddsketch_update_manyandddsketch_mergewith dirty scheduler support. - DDSketch support in
ExDataSketch.Quantilesfacade (type: :ddsketch). - DDSketch merge law properties (commutativity, identity, count additivity, min/max preservation).
- DDSketch golden vector test fixtures (empty, single, small_set, merge, zeros).
- DDSketch parity tests ensuring byte-identical output between Pure and Rust backends.
- DDSketch benchmark suite (
bench/ddsketch_bench.exs). - EXSK codec sketch ID 5 for DDSketch.
- DDSketch usage documentation in usage guide with KLL vs DDSketch comparison table.
[0.2.0] - 2026-03-04
Added
- KLL quantiles sketch (
ExDataSketch.KLL) with Pure Elixir and Rust NIF backends. ExDataSketch.Quantilesfacade module for type-dispatched quantile sketch access.- KLL backend callbacks:
kll_new/1,kll_update/3,kll_update_many/3,kll_merge/3,kll_quantile/3,kll_rank/3,kll_count/2,kll_min/2,kll_max/2. - Rust NIF acceleration for
kll_update_manyandkll_mergewith dirty scheduler support. - KLL merge law properties (associativity, commutativity, identity, count additivity, min/max preservation).
- KLL golden vector test fixtures (empty, single, small_set, merge).
- KLL parity tests ensuring byte-identical output between Pure and Rust backends.
- KLL benchmark suite (
bench/kll_bench.exs). - EXSK codec sketch ID 4 for KLL.
[0.1.1] - 2026-03-04
Added
- Deterministic golden vector test fixtures for HLL, CMS, and Theta (JSON format with versioned schema).
- Pure vs Rust parity test suite ensuring byte-identical serialization and estimates.
- Merge-law property tests (associativity, commutativity, identity, chunking equivalence) for all sketch types.
- Compatibility and Stability section in README documenting serialization and parity guarantees.
- CI regression tracking for coverage and benchmark baselines.
Changed
- Stabilized benchmark scripts with deterministic datasets and JSON output.
- Clarified HLL and CMS DataSketches interop stubs as intentionally unimplemented (not "future").
- Removed stale "Phase 2" language from module documentation.
[0.1.0] - 2026-03-02
Added
- Precompiled Rust NIF binaries for macOS (ARM64, x86_64) and Linux (x86_64 gnu/musl, aarch64 gnu/musl).
- Optional Rust NIF acceleration backend (
ExDataSketch.Backend.Rust). - Rust NIFs for HLL (update_many, merge, estimate), CMS (update_many, merge), and Theta (update_many, merge).
- Normal and dirty CPU scheduler NIF variants with configurable thresholds.
- Automatic fallback to Pure Elixir backend when Rust NIF is unavailable.
- Cross-backend parity tests ensuring byte-identical output between Pure and Rust.
- Side-by-side Pure vs Rust benchmark scenarios.
- CI jobs for Rust NIF compilation and testing (
test-rust,bench-rust). - Pure Elixir Theta sketch implementation (new, update, update_many, compact, merge, estimate).
- Apache DataSketches CompactSketch codec (serialize/deserialize) for Theta interop.
- MurmurHash3 seed hash computation for DataSketches compatibility.
- Deterministic test vectors for Theta sketch.
- Cross-language vector harness specification for Java interop testing.
- Theta Benchee benchmarks.
- Pure Elixir HLL implementation (new, update, update_many, merge, estimate).
- Pure Elixir CMS implementation (new, update, update_many, merge, estimate).
- Deterministic test vectors for HLL and CMS.
- Real Benchee benchmarks for HLL and CMS.
- Project skeleton with directory structure and dependencies.
- Public API stubs for HLL, CMS, and Theta sketch modules.
- ExDataSketch-native binary codec (EXSK format).
- Hash module with stable 64-bit hash interface.
- Backend behaviour with Pure Elixir stub implementation.
- Quick Start and Usage Guide documentation.
- GitHub Actions CI workflow.
- Integration convenience functions (
from_enumerable/2,merge_many/1,reducer/1,merger/1) on all sketch modules. - Integration guide with ecosystem examples (Flow, Broadway, Explorer, Nx, ex_arrow/ExZarr).
- Documented merge properties (associativity, commutativity) for HLL, CMS, and Theta.