v0.9.0 Roadmap Preview

This document is the planning stub for the v0.9.0 release. It is intentionally not a commitment — actual scope will be locked when a v0.9.0 prompt is authored. The purpose here is to:

  1. Inform users and contributors what the v0.9.0 work surface will likely look like.
  2. Park v0.8.0 follow-up items so they are not lost.
  3. Give the next prompt-author a starting point.

For the strategic context see plans/next_steps.md. For the just-shipped release see CHANGELOG.md [0.8.0].

Release theme (proposed)

v0.9.0 — Streaming Integrations.

Where v0.8.0 hardened the substrate (deterministic hashing, binary stability, hot paths, precompiled NIFs, property validation), v0.9.0 opens it to the streaming and observability ecosystems that the BEAM already excels at. The goal is to make ExDataSketch.HLL.from_enumerable/2 the boring, obvious choice for anyone reaching for "approximate distinct count" inside a Broadway pipeline or a Phoenix LiveView.

Tracks (proposed)

Track A — Stream Integration

| Deliverable | Notes |
| --- | --- |
| ExDataSketch.Stream | Wrap each sketch family as a stream sink: stream \|> ExDataSketch.Stream.hll(p: 14) \|> Enum.to_list. Returns a sketch. |
| Reducer / collectable | Implement the Collectable protocol (Collectable.into/1) for each sketch so that Enum.into/2 works with any enumerable. |
| Partition-aware merge | A stream over an iolist of partitioned inputs should merge per-partition results correctly without manual merge_many/1. |
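
The collectable row can be illustrated with a minimal sketch. It assumes a hypothetical `ToySketch` struct that tracks distinct items exactly with a `MapSet`. It is not the library's HLL, and none of these function names are taken from ExDataSketch. The point is only the shape of a `Collectable` implementation, which is the protocol `Enum.into/2` dispatches on.

```elixir
defmodule ToySketch do
  # Hypothetical stand-in for a real sketch: an exact set of seen items.
  defstruct items: MapSet.new()

  def new, do: %__MODULE__{}

  # Add one item to the sketch.
  def update(%__MODULE__{items: items} = sketch, item) do
    %{sketch | items: MapSet.put(items, item)}
  end

  # Distinct count (exact here; a real sketch would return an estimate).
  def estimate(%__MODULE__{items: items}), do: MapSet.size(items)
end

defimpl Collectable, for: ToySketch do
  # Enum.into/2 calls into/1 once, then feeds each element to the collector.
  def into(sketch) do
    collector = fn
      acc, {:cont, item} -> ToySketch.update(acc, item)
      acc, :done -> acc
      _acc, :halt -> :ok
    end

    {sketch, collector}
  end
end

# 1..1_000 reduced modulo 100 contains exactly 100 distinct values.
sketch = 1..1_000 |> Stream.map(&rem(&1, 100)) |> Enum.into(ToySketch.new())
IO.inspect(ToySketch.estimate(sketch))
```

With an implementation of this shape, any lazy or eager enumerable can be drained into a sketch without a dedicated entry point per source type.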

Track B — Broadway / GenStage

| Deliverable | Notes |
| --- | --- |
| ExDataSketch.Broadway | Producer / processor wrappers. Windowed sketches with periodic flush. |
| ExDataSketch.GenStage | Same surface, GenStage-level. |
| Partition-aware aggregation | Per-partition sketches that merge in the consumer stage. |
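
Partition-aware aggregation can be sketched without Broadway or GenStage. The example below is an assumption-laden illustration: `PartitionDemo` is a hypothetical module, `Task.async_stream/3` stands in for partitioned stages, and a `MapSet` stands in for a sketch. `MapSet.union/2` shares the properties that make sketch merges safe here (associative, commutative, idempotent), so the order in which partials arrive does not matter.

```elixir
defmodule PartitionDemo do
  # Build one partial "sketch" per partition (exact MapSet as a stand-in).
  def sketch_partition(items) do
    Enum.reduce(items, MapSet.new(), fn item, acc -> MapSet.put(acc, item) end)
  end

  # Sketch every partition concurrently, then merge partials as they arrive.
  def merged_count(partitions) do
    partitions
    |> Task.async_stream(&sketch_partition/1, ordered: false)
    |> Enum.reduce(MapSet.new(), fn {:ok, partial}, acc ->
      MapSet.union(acc, partial)
    end)
    |> MapSet.size()
  end
end

# Overlapping partitions: the union covers 1..1_000 exactly once.
partitions = [1..600, 400..1_000, 1..50]
IO.inspect(PartitionDemo.merged_count(partitions))
```

Because the merge is order-insensitive, `ordered: false` is safe and lets the consumer fold in whichever partition finishes first.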

Track C — Persistence

| Deliverable | Notes |
| --- | --- |
| ExDataSketch.Storage.ETS | Concurrent-safe sketch table; periodic snapshot. |
| ExDataSketch.Storage.DETS | Disk-backed equivalent. |
| ExDataSketch.Storage.CubDB | High-throughput KV-store backed by CubDB. |
| Snapshot semantics | Document atomic-merge guarantees per store. |
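
One possible way to get an atomic merge on ETS is to route every read-modify-write through a single owner process while leaving reads lock-free. The sketch below assumes a hypothetical `SketchTable` module and again uses a `MapSet` in place of a real sketch. It is not the proposed ExDataSketch.Storage.ETS design, only an illustration of the guarantee the table asks each store to document.

```elixir
defmodule SketchTable do
  use GenServer

  # Hypothetical module: the owner process serializes merges, readers hit ETS directly.
  def start_link(table), do: GenServer.start_link(__MODULE__, table)

  # Merge a partial sketch into the stored one; atomic because only the owner writes.
  def merge(pid, key, %MapSet{} = partial), do: GenServer.call(pid, {:merge, key, partial})

  # Concurrent read straight from the table, no message passing.
  def estimate(table, key) do
    case :ets.lookup(table, key) do
      [{^key, sketch}] -> MapSet.size(sketch)
      [] -> 0
    end
  end

  # Point-in-time snapshot of all entries as a binary.
  def snapshot(table), do: table |> :ets.tab2list() |> :erlang.term_to_binary()

  @impl true
  def init(table) do
    ^table = :ets.new(table, [:named_table, :protected, :set, read_concurrency: true])
    {:ok, table}
  end

  @impl true
  def handle_call({:merge, key, partial}, _from, table) do
    current =
      case :ets.lookup(table, key) do
        [{^key, sketch}] -> sketch
        [] -> MapSet.new()
      end

    true = :ets.insert(table, {key, MapSet.union(current, partial)})
    {:reply, :ok, table}
  end
end

{:ok, pid} = SketchTable.start_link(:demo_sketches)

# Four concurrent writers merge overlapping ranges; together they cover 100..549.
1..4
|> Task.async_stream(fn i ->
  SketchTable.merge(pid, :visitors, MapSet.new((i * 100)..(i * 100 + 149)))
end)
|> Stream.run()

IO.inspect(SketchTable.estimate(:demo_sketches, :visitors))
```

The trade-off is that merge throughput is bounded by one process. A store that needs more write concurrency would have to document a different guarantee.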

Track D — Observability

| Deliverable | Notes |
| --- | --- |
| :telemetry events | Per-operation latency, batch size, scheduler dispatch (normal vs dirty). |
| OpenTelemetry instrumentation | Auto-link sketch operations to a parent span. |
| Suggested dashboards | A reference Grafana / LiveDashboard panel set for sketches in production. |

Carry-forward from v0.8.0 (follow-up issues)

The following v0.8.0 risks are candidate v0.9.0 work. None of them is guaranteed scope; the v0.9.0 prompt should choose deliberately.

High-priority carry-forward

| ID | Title | Why v0.9.0? |
| --- | --- | --- |
| 5-R1 / X-R1 | ULL low-p accuracy + HLL memory profile at 10M items | Both surface the "BEAM-side chunk lifecycle interacts with sketch internals at scale" theme. v0.9.0's streaming work has to grapple with batch size and memory budget anyway. Natural place to fix. |
| 2-R1 | EXSK v2 one-way upgrade | An opt-in serialize(sketch, format: :v1) escape hatch would smooth multi-version rollouts. Trivial code change once the format is stable. |
| 3-R4 | Membership filter raw-NIF hot path | The 6 membership filters (Bloom / Cuckoo / Quotient / CQF / Xor / IBLT) still hash in Elixir. Closing this gap completes the "every cardinality / membership operation hashes inside Rust" promise. |
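
The opt-in format escape hatch in 2-R1 can be pictured with a toy codec. Everything below is hypothetical: the `ToyCodec` module, the "TOYS" magic, and both byte layouts are invented for illustration and are not the real EXSK v1 or v2 formats. The sketch only shows how a `format:` option lets a newer writer emit the older layout while older readers are still deployed.

```elixir
defmodule ToyCodec do
  # Hypothetical layouts, NOT the real EXSK formats:
  #   v1: "TOYS", version byte 1, payload
  #   v2: "TOYS", version byte 2, 32-bit CRC of the payload, payload

  # Writes v2 by default; format: :v1 is the opt-in escape hatch.
  def serialize(payload, opts \\ []) when is_binary(payload) do
    case Keyword.get(opts, :format, :v2) do
      :v1 -> <<"TOYS", 1, payload::binary>>
      :v2 -> <<"TOYS", 2, :erlang.crc32(payload)::32, payload::binary>>
    end
  end

  # A current reader accepts both versions.
  def deserialize(<<"TOYS", 1, payload::binary>>), do: {:ok, payload}

  def deserialize(<<"TOYS", 2, crc::32, payload::binary>>) do
    if :erlang.crc32(payload) == crc do
      {:ok, payload}
    else
      {:error, :checksum_mismatch}
    end
  end

  def deserialize(_other), do: {:error, :unsupported_format}
end

payload = <<1, 2, 3>>
v1 = ToyCodec.serialize(payload, format: :v1)
v2 = ToyCodec.serialize(payload)
IO.inspect({ToyCodec.deserialize(v1), ToyCodec.deserialize(v2)})
```

During a mixed-version rollout, writers would pass `format: :v1` until every reader understands v2, then drop the option.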

Medium-priority carry-forward

| ID | Title | Notes |
| --- | --- | --- |
| 4-R5 | Backend.default/0 returns Pure regardless of NIF state | Reconsider only with data showing adoption friction. |
| 3-R7 | Benchmarks run on M1 only | CI step to run on x86_64 / Linux ARM64 release matrix. Mechanical. |
| X-R2 | README roadmap not protected by integration test | Tiny ci/check_roadmap.exs script. |
| 4-R4 | Cross-compile reliability | Add retry annotation to the four cross-compiled matrix entries. |
| 5-R4 | Corruption-propagation property targets HLL only | Generalize across all sketches. Mechanical. |

Low-priority / opportunistic

| ID | Title | Notes |
| --- | --- | --- |
| 5-R2 | REQ rank/quantile slack too loose | Empirically tighten. |
| 5-R3 | Cuckoo saturation not exercised by property | Add saturation-specific property. |
| 5-R5 | Property max_runs bounded; nightly deep run | Optional CI workflow. |
| X-R3 | prompts/benchmark_comparisons.md is empty | Decide: populate or delete. |

Explicitly deferred (NOT v0.9.0)

| ID | Title | Target release |
| --- | --- | --- |
| 3-R5 | 6-bit register packing | v1.0 |
| 3-R6 | SIMD intrinsics for HLL | v1.0 |
| 3-R3 | Remove legacy _raw_nif family | v1.0 (binary-stability break) |
| 1-R4 | Deprecate :phash2 | v0.10+ (data-driven) |

Out-of-scope guardrails

The v0.9.0 release should NOT:

  • Add new sketch families (CPC, Tuple, MinHash, VarOpt are v0.11+).
  • Break the v0.x serialization compatibility contract documented in serialization_compatibility.md.
  • Change any existing default (e.g., silently flipping Backend.default/0). Such changes are v1.0 work.
  • Add Rust dependencies that pull in a C compiler at NIF-build time (slows down EX_DATA_SKETCH_BUILD=1 users).

Suggested v0.9.0 prompt outline

The structure of the v0.8.0 prompt worked well and should be reused when authoring the v0.9.0 prompt:

  1. Release theme banner.
  2. IMPORTANT EXECUTION RULES (architectural, Elixir design philosophy).
  3. PROJECT GOALS.
  4. RELEASE SCOPE — explicit INCLUDES and DOES NOT INCLUDE lists.
  5. Phase-by-phase breakdown with STOP conditions.
  6. FINAL RELEASE REQUIREMENTS.
  7. FINAL OUTPUT REQUIREMENTS.
  8. Next-release preview (v0.10.0).

Phase count for v0.9.0 will likely be similar (4-6 phases). Likely phase breakdown:

| Phase | Theme | Primary modules |
| --- | --- | --- |
| 1 | Stream + Collectable | ExDataSketch.Stream, Enumerable adapters |
| 2 | Broadway / GenStage | ExDataSketch.Broadway, ExDataSketch.GenStage |
| 3 | Persistence | ExDataSketch.Storage.{ETS, DETS, CubDB} |
| 4 | Telemetry / OpenTelemetry | event names + reference dashboards |
| 5 | Carry-forward from v0.8.0 (high-priority risks) | per the table above |
| 6 | Property + bench expansion for new surfaces | new properties for streaming semantics |

See also