All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.9.0] - 2026-05-19

Release theme: Streaming Integrations. Transforms ex_data_sketch from a collection of probabilistic algorithms into a BEAM-native streaming approximate analytics infrastructure layer. Stream/Collectable integration, Broadway/GenStage/Flow pipelines, five persistence backends, production-grade telemetry + OpenTelemetry, ULL accuracy fixes, and comprehensive educational materials.

Added

  • Stream and Collectable integration (Phase 1).

  • Broadway, GenStage, and Flow integration (Phase 2).

  • Persistence surfaces (Phase 3).

  • Telemetry and observability (Phase 4).

    • ExDataSketch.Telemetry -- structured telemetry event emission at batch/compound operation boundaries (not per-update).
    • Four event categories: :sketch (create, ingest, merge, serialize, deserialize), :persistence (save, load, merge, delete), :stream (reduce, partition_merge), :pipeline (accumulate, periodic_flush).
    • Telemetry.execute/4, Telemetry.span/5, Telemetry.span_with_result/6 -- timing wrappers with category-based enable/disable.
    • Telemetry.event_name/2 and all_event_names/0 for programmatic handler attachment.
    • ExDataSketch.Telemetry.OpenTelemetry -- OTEL span bridge (requires :opentelemetry_api ~> 1.0).
    • Configuration: config :ex_data_sketch, telemetry_enabled: false or per-category config :ex_data_sketch, :telemetry, sketch: true, persistence: false.
    • Telemetry events instrumented in all 13 sketch modules, all 5 storage backends, Stream, and Broadway/GenStage/Flow.
  • ULL accuracy correction (Phase 5).

    • ULL linear counting correction: zeros > 0 threshold (not HLL-style raw_estimate <= 2.5*m && zeros > 0). Empirical validation shows linear counting always more accurate for ULL when empty registers exist.
    • ULL large range correction: bias correction for very high cardinality estimates, matching Ertl 2023.
    • Both Pure Elixir and Rust NIF backends updated; property tests updated with tiered accuracy bounds (35%/25%/15% at p=8).
  • Configurable update_many chunk size (Phase 5).

    • update_many_chunk_size option on HLL, ULL, CMS, and Theta (via new/1 opts). Default 10,000 (backward compatible).
  • EXSK v1 serialization escape hatch (Phase 5).

    • HLL.serialize(sketch, format: :v1) produces a backward-compatible v0.7.x binary (requires :phash2 hash strategy, raises ArgumentError for other strategies).
    • Binary.encode_v1/4 utility for custom v1 encoding.
    • v0.7.x binaries remain decodable via Binary.decode/1 (version sniffing).
  • Generalized corruption propagation properties (Phase 5).

    • HLL, ULL, and CMS bit-flip properties in property_guarantees_test.exs asserting that corrupted frames either fail CRC or produce estimates within 10x of the truthful estimate (never silently catastrophic).
    • Quotient filter delete property: count reduction (not member? becomes false).
  • Benchmarks and property tests (Phase 6).

    • bench/persistence_bench.exs -- ETS save/load/merge overhead.
    • bench/serialization_bench.exs -- serialize/deserialize throughput.
    • bench/merge_throughput_bench.exs -- HLL/ULL/CMS merge_many benchmarks.
    • bench/update_many_chunk_bench.exs -- configurable chunk size impact on throughput.
    • bench/stream_ingestion_bench.exs -- stream ingestion latency and throughput.
    • test/ex_data_sketch_serialization_stability_test.exs -- 7 round-trip properties (HLL v2/v1, ULL, CMS, Theta, Bloom, v1-v2 cross-version).
    • Expanded stream properties: ULL stream equivalence, ULL partition merge, ULL merge associativity, Theta stream equivalence, CMS merge associativity.
    • Expanded storage properties: DETS save/load, DETS merge.
  • Educational materials.

    • guides/aggregation_wall.md (188 lines) -- why exact aggregation breaks at scale, BEAM's natural fit, common patterns.
    • guides/distributed_merge_semantics.md (328 lines) -- associativity/commutativity proofs, fan-in/tree/partition patterns, anti-patterns.
    • guides/livebooks.md -- Livebook catalogue with recommended order and learning objectives.
    • Updated guides/telemetry.md -- pipeline/stream event tables, all_event_names/0 reference.
    • Updated guides/streaming_sketches.md -- Stream API, Collectable, partitioned reduction.
    • Updated guides/broadway_integration.md -- accumulate/3, accumulate_into/4, PeriodicAggregator.
    • Updated guides/genstage_integration.md -- SketchConsumer, SketchProducer, SketchStage.
    • Updated guides/persistence.md -- all 5 backends, configuration, EXSK v2 storage contract.
    • Updated guides/observability.md -- telemetry categories, event names, OTEL bridge.
  • Livebooks.

    • livebooks/streaming_cardinality.livemd -- Stream API, precision tradeoffs, ULL vs HLL.
    • livebooks/broadway_integration.livemd -- accumulate, PeriodicAggregator, partition handling.
    • livebooks/genstage_aggregation.livemd -- SketchConsumer, SketchProducer, flush patterns.
    • livebooks/rolling_telemetry.livemd -- time-windowed sketches, ETS persistence.
    • livebooks/distributed_merges.livemd -- associativity, tree aggregation, ETS sharding.
    • livebooks/persistence_snapshots.livemd -- ETS/DETS, serialization, multi-backend strategy.
    • livebooks/livedashboard_integration.livemd -- telemetry wiring, custom dashboard pages.
    • livebooks/ai_token_analytics.livemd -- LLM workload multi-dimensional sketch dashboard.
    • livebooks/phoenix_observability.livemd -- DAU, latency, rate limiting, ETS persistence.

Changed

  • ExDataSketch.HLL.new/1 now accepts update_many_chunk_size option (default 10,000).
  • ExDataSketch.ULL.new/1 now accepts update_many_chunk_size option (default 10,000).
  • ExDataSketch.CMS.new/1 now accepts update_many_chunk_size option (default 10,000).
  • ExDataSketch.Theta.new/1 now accepts update_many_chunk_size option (default 10,000).
  • ULL accuracy at low cardinalities significantly improved via linear counting + large range correction. Users may see different estimates for sketches with very few items; the new estimates are more accurate.
  • ExDataSketch.ULL moduledoc updated with estimation strategy description and p>=12 recommendation.

Fixed

  • ULL low-precision accuracy: p=8 with n=1000 improved from ~62.5% relative error to ~0.8% via linear counting correction.
  • ETS merge test tolerance: 60% tolerance for cardinality < 5 (HLL at p=10 has high relative error at tiny cardinalities).
  • Quotient filter delete property: corrected from asserting member? becomes false (not guaranteed) to asserting count reduction.
  • DETS API: corrected close_file to :dets.close/1 in property tests.
  • PeriodicAggregator telemetry metadata: uses sketch_type only (removed non-existent state.id).
  • OTEL handler IDs: tuples {"ex_data_sketch_opentelemetry", event_name}, not strings.

Migration

See guides/v0.8.0_migration_notes.md for the v0.7.x to v0.8.0 migration guide. For v0.8.0 to v0.9.0:

  • No code changes required for most users. All new modules are additive; existing APIs are backward compatible.
  • ULL estimates may change at very low cardinalities (p < 12, n < 500). The new estimates are more accurate. If you depend on exact numeric values in tests, add tolerance for small cardinalities.
  • update_many_chunk_size defaults to 10,000 (matching v0.8.0 behavior). No change needed unless you want to tune batch throughput.
  • v1 serialization is an opt-in escape hatch via format: :v1. Default serialization remains EXSK v2.
  • Telemetry is enabled by default. Disable with config :ex_data_sketch, telemetry_enabled: false.
  • Persistence backends are enabled by default when their runtime dependencies are available. Disable individuals via config :ex_data_sketch, :persistence_backends, ets: [enabled: false].

Stats

  • +21 new modules: Stream, Broadway, Broadway.PeriodicAggregator, Flow, GenStage, GenStage.SketchConsumer, GenStage.SketchProducer, GenStage.SketchStage, Storage, Storage.ETS, Storage.DETS, Storage.CubDB, Storage.Mnesia, Storage.Ecto, Storage.Ecto.Schema, Storage.Ecto.Migration, Telemetry, Telemetry.OpenTelemetry, Integration, Binary, Binary.encode_v1/4 utility.
  • 1558 tests, 204 doctests, 199 properties, 0 failures (NIF on).
  • 9 Livebooks, 20 guides (3 new educational guides + 6 updated + livebooks.md index).
  • 5 new benchmark suites, 7 new property test groups.
  • :telemetry ~> 1.0 required dependency; :opentelemetry_api ~> 1.0, :broadway, :flow, :cubdb, :ecto_sql, :mnesia optional dependencies.

[0.8.0] - 2026-05-12

Release theme: Deterministic Foundations. Transforms ex_data_sketch from a collection of probabilistic algorithms into a production-grade probabilistic runtime for the BEAM. Focus: deterministic hashing, binary stability, corruption detection, hot-path performance, and installation reliability.

Added

  • Deterministic hashing infrastructure (Phase 1).

    • ExDataSketch.Hash.XXH3 — focused XXHash3-64 wrapper. Raises ArgumentError when the Rust NIF is unavailable so hash drift cannot occur silently.
    • ExDataSketch.Hash.Murmur3 — full MurmurHash3_x64_128 returning the high 64 bits (Apache DataSketches convention). Pure Elixir and Rust NIF implementations are byte-identical, verified by property-based parity (200 random inputs per CI run) and against canonical Python mmh3 regression vectors.
    • ExDataSketch.Hash.Metadata — 16-byte versioned binary block recording (algorithm, seed, sketch_family, sketch_family_version, backend) with a forward-compatible extension trailer. The building block for the EXSK v2 binary header.
    • ExDataSketch.Hash.Validation — centralized merge-compatibility checks (validate_options!/3, validate_metadata!/3, compatible_options?/2).
    • Public registry API on ExDataSketch.Hash: default_algorithm/0, supported_algorithms/0, algorithm_info/1.
    • ExDataSketch.Hash.resolve_strategy/1 — single source of truth for sketch constructors. Honors a user-supplied :hash_strategy or falls back to default_algorithm/0.
    • HLL, ULL, Theta, CMS new/1 now respect user-supplied :hash_strategy (:xxhash3 | :murmur3 | :phash2). The option was silently overridden in v0.7.x.

  • Binary stability and corruption detection (Phase 2).

    • ExDataSketch.Binary — public facade for the EXSK v2 frame (encode/3, decode/1, peek_version/1, build_payload/2, metadata_from_opts/3).
    • ExDataSketch.Binary.Header — EXSK v2 frame encoder/decoder. Layout: magic + version + sketch_family + family_version + flags + header_size + Hash.Metadata block + payload_size + payload + CRC32C trailer.
    • ExDataSketch.Binary.Validator — discrete defensive check helpers (check_minimum_v2_size, check_magic, check_version, check_crc).
    • ExDataSketch.Binary.CRC — CRC32C (Castagnoli polynomial, reflected, init 0xFFFFFFFF, final XOR 0xFFFFFFFF). Pure Elixir and Rust NIF implementations are byte-identical, verified against the standard "123456789" -> 0xE3069283 check vector and Python crc32c regression vectors.
    • Rust crc32c_nif — table-driven CRC32C, ~1 GB/s on commodity hardware.
  • HLL hot-path generalization (Phase 3).

    • 8 new Rust NIFs: {hll, ull, theta, cms}_update_many_raw_h_nif/_dirty_nif. Each accepts an algorithm: u8 parameter (1 = xxhash3, 2 = murmur3) and shares a single Murmur3_x64_128 implementation via pub(crate) export from hash.rs.
    • End-to-end Murmur3 acceleration: :murmur3 callers now hit the in-Rust hashing fast path instead of falling off to the Elixir-side hash.
    • bench/hll_hot_path_bench.exs — comprehensive benchmark across Pure phash2 / Pure xxhash3 / Rust raw XXH3 / Rust raw_h Murmur3 at 10k / 100k / 1M items.
  • Precompiled NIF platform matrix (Phase 4).

    • Two Windows targets added: x86_64-pc-windows-msvc and aarch64-pc-windows-msvc. Release matrix is now 8 targets × 2 NIF versions = 16 artifacts per tagged release.
    • mix test.nif_on and mix test.nif_off aliases automatically reset the per-env rustler_precompiled :force_build state between local NIF-mode flips.
    • test/ex_data_sketch/nif_availability_test.exs — 18 contract tests asserting Hash.nif_available?/0 stability, default-algorithm reflection, registry availability flags, XXH3.hash/2 failure mode, Murmur3 NIF-less fallback, checksum file shape, and nif.exrelease.yml target-list alignment.
  • Property-based validation (Phase 5).

    • test/property_guarantees_test.exs — 14 new properties locking:
      • HLL / ULL cardinality monotonicity and error bounds within published RSE.
      • KLL / REQ rank monotonicity and quantile/rank inversion within published epsilon.
      • CMS overestimation-only (estimate(item) >= true_count(item)).
      • Bloom / XorFilter / Cuckoo no-false-negative guarantees.
      • Binary v2 bit-flip corruption never silently propagates to a sketch.
  • User-facing release guides (shipped to HexDocs):

    • guides/v0.8.0_migration_notes.md — v0.7.x to v0.8.0 upgrade guide.
    • guides/v0.8.0_architecture.md — layered architecture overview.
    • guides/serialization_compatibility.md — the v0.x EXSK stability contract.
    • guides/hash_strategies.md — choosing between phash2, XXH3, Murmur3, and custom.
    • guides/hll_performance.md — HLL hot-path architecture, benchmark numbers, and external-library context.
    • guides/precompiled_nifs.md — platform matrix, release pipeline, and source-build fallback.
    • guides/roadmap.md — v0.9.0 preview.
  • Internal plans and reviewer checklists (repo-only, not packaged):

Changed

  • EXSK serialization format bumped to v2. Every sketch's serialize/1 now produces an EXSK v2 frame (magic + version 2 + sketch family + family version + flags + header_size + 16-byte hash metadata block + payload size + payload + CRC32C trailer). v0.7.x EXSK v1 frames remain decodable via Binary.decode/1's version sniffing; ExDataSketch.Codec is preserved as the legacy v1 path.
  • Golden vectors regenerated as v2 under test/vectors/. The previous v1 vectors are preserved under test/vectors_v1/ and exercised by test/ex_data_sketch_v1_compat_test.exs as a permanent regression guard.
  • README.md roadmap rewritten to match the strategic roadmap in plans/next_steps.md: v0.8.0 = Deterministic Foundations; v0.9.0 = Streaming Integrations; v0.10.0 = Apache Interoperability; v0.11.0 = New Sketch Families (CPC, Tuple); v0.12.0 = Similarity & Sampling (MinHash, VarOpt); v1.0.0 = Stable Binary Contract.
  • ExDataSketch.Hash.validate_merge_hash_compat!/3 is preserved as a backward-compatible shim that delegates to ExDataSketch.Hash.Validation.validate_options!/3.

Fixed

  • ExDataSketch.Hash.XXH3 doctests are now NIF-safe: they exercise the success path when the NIF is available and explicitly verify the documented ArgumentError contract when the NIF is unavailable, removing a CI failure on the EX_DATA_SKETCH_SKIP_NIF=true lane.
  • test/ex_data_sketch/nif_availability_test.exs checksum-file assertions softened to "if present, parses as a map" so fresh checkouts (where checksum-Elixir.ExDataSketch.Nif.exs has not been populated by the release pipeline) pass CI.

Migration

See guides/v0.8.0_migration_notes.md (shipped in HexDocs) for the full v0.7.x -> v0.8.0 migration guide. Key points:

  • No code changes required for most users. EXSK v1 frames are still decoded; the serialize/1 output format changes but downstream code that uses round-trip serialization sees no API difference.
  • One-way upgrade for persisted sketches. v0.7.x cannot read v0.8.0-produced binaries. Stage your rollout: deploy v0.8.0 readers first, then producers.
  • Opt-in Murmur3. New :murmur3 strategy is opt-in via hash_strategy: :murmur3 at sketch construction. Default remains :xxhash3.

Documentation

User-facing guides (shipped to HexDocs and the Hex package):

  • guides/v0.8.0_architecture.md — consolidated Phase 1-5 design overview.
  • guides/serialization_compatibility.md — the v0.x stability contract.
  • guides/roadmap.md — preview of the next release's streaming-integration scope.

Internal documentation (repo-only; linked from the user guides):

Stats

  • +10 new modules: Hash.XXH3, Hash.Murmur3, Hash.Metadata, Hash.Validation, Binary, Binary.Header, Binary.Validator, Binary.CRC (Elixir); 2 Rust NIF entry points (hash, crc).
  • +11 Rust NIFs: murmur3_x64_128_nif, murmur3_x64_128_full_nif, crc32c_nif, 4 × *_update_many_raw_h_nif + 4 × *_dirty_nif.
  • +92 tests, +33 doctests, +19 properties since v0.7.1.
  • Full suite (NIF on): 1,317 tests, 202 doctests, 171 properties, 0 failures.
  • Full suite (NIF off): 1,088 tests, 202 doctests, 128 properties, 0 failures.
  • Coverage: 92.7% line coverage (target was 70%).
  • HLL throughput: 25-34 M items/sec at p=14 (XXH3, Rust raw); ~15x faster than the Pure path.

[0.7.1] - 2026-03-22

Added

  • Move hashing into NIF batch calls: update_many for HLL, ULL, Theta, and CMS now sends raw items to Rust and hashes inside the NIF, eliminating per-item Elixir heap allocations (94.6% memory reduction at 10M items). (#202)
  • Wire :hash_fn and :seed options through HLL, ULL, Theta, and CMS, enabling custom hash functions and reproducible seeded hashing. (#198)
  • Merge hash-compatibility validation: merge/2 on HLL, ULL, Theta, and CMS now raises IncompatibleSketchesError when hash strategy or seed differs between sketches. (#205)
  • Pure backend hll_update_many optimization: pre-aggregate map with sorted binary splice replaces tuple-based per-hash full-tuple copies, reducing transient allocation from O(n * m) to O(n + m).
  • ListIterator-based NIF item decoding for zero-copy Erlang list iteration in Rust.
  • Test infrastructure: configurable coverage baselines, Rust CI coverage reporting, 39 new tests covering deserialization edge cases, custom hash_fn paths, helper functions, and merge validation.

Fixed

  • Quotient filter wrap-around bug: extract_all in both Pure and Rust backends failed when a cluster wrapped from slot N-1 to slot 0, producing nil quotients (Pure crash) or silent corruption (Rust). (#203, #204)
  • CMS merge validation: replaced flawed Keyword.delete(opts, :hash_strategy) comparison with explicit width/depth/counter_width checks.
  • Pure backend update_many regression: restored chunk + batch *_update_many path for HLL, ULL, and Theta (was incorrectly using per-item *_update).
  • Seed clamping: all raw NIF functions now clamp seed values to u64 range before passing to Rust.
  • Hash-dependent vector tests tagged with @tag :rust_nif to prevent failures in pure-only CI (vectors were generated with xxhash3).

[0.7.0] - 2026-03-11

Added

  • UltraLogLog sketch (ExDataSketch.ULL) for improved cardinality estimation with ~20% lower relative error than HLL at the same memory footprint. ULL1 binary state format with 8-byte header + 2^p registers. EXSK serialization (sketch ID 15).
  • ULL register encoding from Ertl 2023: register_value = 2 * geometric_rank - sub_bit doubles the information per register compared to HLL.
  • FGRA estimator (Ertl 2017 sigma/tau convergence) for ULL cardinality estimation.
  • Rust NIF acceleration for ULL: update_many, merge, and estimate operations with dirty scheduler thresholds.
  • Precision parameter p supports range 4..26 (vs 4..16 for HLL), allowing higher accuracy at larger memory budgets.
  • Full API: new, update, update_many, merge, estimate, count, serialize, deserialize, from_enumerable, merge_many, reducer, merger, size_bytes.
  • ULL test vectors, parity tests, merge law property tests, and benchmark suite.

[0.6.0] - 2026-03-11

Added

  • REQ sketch (ExDataSketch.REQ) for relative-error quantile estimation with configurable high-rank accuracy. REQ1 binary state format. EXSK serialization (sketch ID 13).
  • Misra-Gries sketch (ExDataSketch.MisraGries) for deterministic heavy-hitter detection with configurable key encoding (:binary, :int, {:term, :external}). MG01 binary state format. EXSK serialization (sketch ID 14).
  • XXHash3 NIF integration (ExDataSketch.Hash.xxhash3_64/1,2) for fast, cross-platform stable hashing via Rust NIF with phash2-based fallback.
  • KLL cdf/2 and pmf/2 for cumulative distribution and probability mass functions.
  • DDSketch rank/2 for normalized rank queries.
  • Rust NIF acceleration for all membership filters: Bloom, Cuckoo, Quotient, CQF, XorFilter, and IBLT. Batch operations (put_many, merge, build) automatically use compiled Rust NIFs when available, with dirty scheduler thresholds for large inputs.
  • Parity tests verifying byte-identical serialization between Pure Elixir and Rust NIF backends for all sketch algorithms.
  • Benchmark suites for REQ sketch, Misra-Gries, and XXHash3 NIF throughput.
  • Quantiles facade for unified quantile sketch API across KLL and DDSketch.

[0.5.0] - 2026-03-10

Added

  • Cuckoo filter (ExDataSketch.Cuckoo) with Pure Elixir backend. CKO1 binary state format. Partial-key cuckoo hashing with configurable fingerprint size, bucket size, and max kicks. Supports insertion, safe deletion, and membership testing. EXSK serialization (sketch ID 8).
  • Quotient filter (ExDataSketch.Quotient) with Pure Elixir backend. QOT1 binary state format. Quotient/remainder fingerprint splitting with linear probing and metadata bits (is_occupied, is_continuation, is_shifted). Supports insertion, safe deletion, merge, and membership testing. EXSK serialization (sketch ID 9).
  • Counting Quotient Filter (ExDataSketch.CQF) with Pure Elixir backend. CQF1 binary state format. Extends quotient filter with variable-length counter encoding for multiset membership and approximate counting via estimate_count/2. Supports insertion, deletion, merge. EXSK serialization (sketch ID 10).
  • XorFilter (ExDataSketch.XorFilter) with Pure Elixir backend. XOR1 binary state format. Static build-once immutable filter constructed via build/2 with 8-bit or 16-bit fingerprints. Supports membership testing only. EXSK serialization (sketch ID 11).
  • IBLT (ExDataSketch.IBLT) with Pure Elixir backend. IBL1 binary state format. Invertible Bloom Lookup Table for set reconciliation via subtract/2 and list_entries/1. Supports set mode and key-value mode, insertion, deletion, merge. EXSK serialization (sketch ID 12).
  • FilterChain (ExDataSketch.FilterChain) for capability-aware membership filter composition. FCN1 binary state format. Lifecycle-tier patterns (hot/warm/cold) with stage position enforcement. Supports add_stage/2, put/2, member?/2, delete/2. Serializes all stages in order.
  • Benchmark suites for Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain (bench/*.exs).
  • UnsupportedOperationError for operations not supported by a structure (used by FilterChain).
  • InvalidChainCompositionError for invalid FilterChain stage composition.
  • capabilities/0 function on Bloom, Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain modules.
  • Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain backend callbacks on ExDataSketch.Backend.

[0.4.0] - 2026-03-06

Added

  • Bloom filter (ExDataSketch.Bloom) with Pure Elixir backend.
  • BLM1 binary state format (40-byte header + LSB-first packed bitset).
  • Double hashing (Kirsch-Mitzenmacher) deriving k bit positions from a single 64-bit hash.
  • Bloom backend callbacks: bloom_new/1, bloom_put/3, bloom_put_many/3, bloom_member?/3, bloom_merge/3, bloom_count/2.
  • Automatic parameter derivation from capacity and false_positive_rate options.
  • Bloom merge via bitwise OR with validation of matching bit_count, hash_count, and seed.
  • Bloom popcount-based cardinality estimation.
  • Bloom serialization via EXSK envelope (sketch ID 7).
  • Bloom property tests (no false negatives, merge commutativity/associativity/identity, serialization round-trip).
  • Bloom statistical validation tests (observed FPR within 2x of target).
  • Bloom merge law properties in merge_laws_test.exs.
  • Bloom parity test stubs for future Rust NIF backend.
  • Bloom benchmark suite (bench/bloom_bench.exs).
  • Bloom options section in usage guide.

[0.3.0] - 2026-03-06

Added

  • FrequentItems sketch (ExDataSketch.FrequentItems) using the SpaceSaving algorithm with Pure Elixir and Rust NIF backends.
  • FI1 binary state format (32-byte header + variable-length sorted entries).
  • FrequentItems backend callbacks: fi_new/1, fi_update/3, fi_update_many/3, fi_merge/3, fi_estimate/3, fi_top_k/3, fi_count/2, fi_entry_count/2.
  • Batch optimization via pre-aggregation (Enum.frequencies/1) with weighted updates.
  • Deterministic tie-breaking on eviction (lexicographically smallest item_bytes).
  • Key encoding policies: :binary, :int (signed 64-bit LE), {:term, :external}.
  • Commutative merge via additive count combination and canonical replay.
  • Rust NIF acceleration for fi_update_many and fi_merge with dirty scheduler support.
  • FrequentItems support in ExDataSketch.update_many/2 facade.
  • FrequentItems merge law properties (commutativity, identity, count conservation).
  • FrequentItems golden vector test fixtures.
  • FrequentItems parity tests ensuring byte-identical output between Pure and Rust backends.
  • FrequentItems benchmark suite (bench/frequent_items_bench.exs).
  • EXSK codec sketch ID 6 for FrequentItems.
  • FrequentItems usage documentation in usage guide with SpaceSaving algorithm overview.
  • Mox test dependency for backend contract testing.
  • Theta compact/1 function for explicit compaction.

[0.2.1] - 2026-03-05

Added

  • DDSketch quantiles sketch (ExDataSketch.DDSketch) with Pure Elixir and Rust NIF backends.
  • DDSketch backend callbacks: ddsketch_new/1, ddsketch_update/3, ddsketch_update_many/3, ddsketch_merge/3, ddsketch_quantile/3, ddsketch_count/2, ddsketch_min/2, ddsketch_max/2.
  • Rust NIF acceleration for ddsketch_update_many and ddsketch_merge with dirty scheduler support.
  • DDSketch support in ExDataSketch.Quantiles facade (type: :ddsketch).
  • DDSketch merge law properties (commutativity, identity, count additivity, min/max preservation).
  • DDSketch golden vector test fixtures (empty, single, small_set, merge, zeros).
  • DDSketch parity tests ensuring byte-identical output between Pure and Rust backends.
  • DDSketch benchmark suite (bench/ddsketch_bench.exs).
  • EXSK codec sketch ID 5 for DDSketch.
  • DDSketch usage documentation in usage guide with KLL vs DDSketch comparison table.

[0.2.0] - 2026-03-04

Added

  • KLL quantiles sketch (ExDataSketch.KLL) with Pure Elixir and Rust NIF backends.
  • ExDataSketch.Quantiles facade module for type-dispatched quantile sketch access.
  • KLL backend callbacks: kll_new/1, kll_update/3, kll_update_many/3, kll_merge/3, kll_quantile/3, kll_rank/3, kll_count/2, kll_min/2, kll_max/2.
  • Rust NIF acceleration for kll_update_many and kll_merge with dirty scheduler support.
  • KLL merge law properties (associativity, commutativity, identity, count additivity, min/max preservation).
  • KLL golden vector test fixtures (empty, single, small_set, merge).
  • KLL parity tests ensuring byte-identical output between Pure and Rust backends.
  • KLL benchmark suite (bench/kll_bench.exs).
  • EXSK codec sketch ID 4 for KLL.

[0.1.1] - 2026-03-04

Added

  • Deterministic golden vector test fixtures for HLL, CMS, and Theta (JSON format with versioned schema).
  • Pure vs Rust parity test suite ensuring byte-identical serialization and estimates.
  • Merge-law property tests (associativity, commutativity, identity, chunking equivalence) for all sketch types.
  • Compatibility and Stability section in README documenting serialization and parity guarantees.
  • CI regression tracking for coverage and benchmark baselines.

Changed

  • Stabilized benchmark scripts with deterministic datasets and JSON output.
  • Clarified HLL and CMS DataSketches interop stubs as intentionally unimplemented (not "future").
  • Removed stale "Phase 2" language from module documentation.

[0.1.0] - 2026-03-02

Added

  • Precompiled Rust NIF binaries for macOS (ARM64, x86_64) and Linux (x86_64 gnu/musl, aarch64 gnu/musl).
  • Optional Rust NIF acceleration backend (ExDataSketch.Backend.Rust).
  • Rust NIFs for HLL (update_many, merge, estimate), CMS (update_many, merge), and Theta (update_many, merge).
  • Normal and dirty CPU scheduler NIF variants with configurable thresholds.
  • Automatic fallback to Pure Elixir backend when Rust NIF is unavailable.
  • Cross-backend parity tests ensuring byte-identical output between Pure and Rust.
  • Side-by-side Pure vs Rust benchmark scenarios.
  • CI jobs for Rust NIF compilation and testing (test-rust, bench-rust).
  • Pure Elixir Theta sketch implementation (new, update, update_many, compact, merge, estimate).
  • Apache DataSketches CompactSketch codec (serialize/deserialize) for Theta interop.
  • MurmurHash3 seed hash computation for DataSketches compatibility.
  • Deterministic test vectors for Theta sketch.
  • Cross-language vector harness specification for Java interop testing.
  • Theta Benchee benchmarks.
  • Pure Elixir HLL implementation (new, update, update_many, merge, estimate).
  • Pure Elixir CMS implementation (new, update, update_many, merge, estimate).
  • Deterministic test vectors for HLL and CMS.
  • Real Benchee benchmarks for HLL and CMS.
  • Project skeleton with directory structure and dependencies.
  • Public API stubs for HLL, CMS, and Theta sketch modules.
  • ExDataSketch-native binary codec (EXSK format).
  • Hash module with stable 64-bit hash interface.
  • Backend behaviour with Pure Elixir stub implementation.
  • Quick Start and Usage Guide documentation.
  • GitHub Actions CI workflow.
  • Integration convenience functions (from_enumerable/2, merge_many/1, reducer/1, merger/1) on all sketch modules.
  • Integration guide with ecosystem examples (Flow, Broadway, Explorer, Nx, ex_arrow/ExZarr).
  • Documented merge properties (associativity, commutativity) for HLL, CMS, and Theta.