The Aggregation Wall

Copy Markdown View Source

What Is the Aggregation Wall?

When processing high-volume event streams, traditional exact aggregation hits a performance barrier: memory grows linearly with cardinality, CPU cost grows with distinct counting, and latency increases as data volume scales. This barrier is the aggregation wall -- the point where exact computation becomes economically or technically infeasible.

Probabilistic sketches break through this wall by trading a small, controlled amount of error for dramatically lower resource consumption.

Exact vs. Approximate: The Scaling Problem

Consider counting distinct users across a stream of 100 million events:

MethodMemoryTime (per event)Latency (p99)
MapSet (exact)~2 GBO(1) amortized50-200 ms
HLL p=14~16 KBO(1)< 0.01 ms
HLL p=10~1 KBO(1)< 0.01 ms

The exact MapSet approach uses 125,000x more memory than HLL at p=14. At 1 billion events, the MapSet approach requires ~20 GB of memory and GC pauses that can stall the BEAM for seconds.

Why the BEAM Makes Sketches Natural

The BEAM's actor model and message-passing semantics make sketch-based aggregation particularly natural:

  1. Per-process sketch state: Each BEAM process can hold its own sketch instance. No shared mutable state, no locks, no contention.

  2. Merge as message: Sketch merging is associative and commutative. A process can accumulate locally and periodically send its sketch to an aggregator process -- the merge is a single message.

  3. Partition-local aggregation: Each partition of a Broadway or GenStage pipeline accumulates into its own sketch. Partitions never share state. Merging happens at the consumer stage, exactly where you want it.

  4. Hot code upgrades: Because sketches are BEAM-owned binaries, they survive hot code upgrades. You can deploy new aggregator logic without losing in-flight sketch state.

The Aggregation Wall in Practice

Scenario 1: Real-Time Analytics Dashboard

Your Phoenix LiveView shows "active users in the last 5 minutes." Exact counting requires maintaining a time-windowed set of user IDs. At 100K concurrent users:

  • Exact: 100K entries in an ETS set = ~8 MB, O(n) per query
  • HLL p=14: 16 KB, O(1) per query, <0.5% error

The dashboard refreshes every second. The exact approach spends most of its CPU on set maintenance. The HLL approach is effectively free.

Scenario 2: Distributed Cardinality Across Nodes

You need to count distinct events across a 5-node cluster. Each node processes 1M events/second:

  • Exact: Each node must broadcast its full set of IDs to all other nodes. Network traffic grows as O(n^2 * cardinality).
  • Sketch merge: Each node maintains a local HLL (16 KB). Periodic broadcast of 16 KB sketches to an aggregator. Network traffic: O(n) sketches, each 16 KB.

At 1M events/second with 10M distinct IDs, the exact approach requires transferring hundreds of MB per second. The sketch approach transfers 80 KB per merge round.

Scenario 3: Ad Impression Counting

An ad platform counts impressions per campaign. A single campaign receives 50M impressions per day, with 20M unique viewers. Using CMS (width=1024, depth=5):

  • Exact count per viewer: 20M entries = ~160 MB per campaign
  • CMS: ~5 KB per campaign, O(1) update, O(1) query, <1% error

With 1000 active campaigns, exact counting needs 160 GB. CMS needs 5 MB.

Breaking Through the Wall

Pattern 1: Stream Accumulation

# Instead of collecting all items into a Set:
sketch = ExDataSketch.HLL.new(p: 14)
sketch = Enum.reduce(events, sketch, fn event, acc ->
  ExDataSketch.HLL.update(acc, event.user_id)
end)

# Or more ergonomically:
sketch = ExDataSketch.Stream.hll(events, p: 14)

Pattern 2: Collectable

sketch = Enum.into(events, ExDataSketch.HLL.new(p: 14))

Pattern 3: Broadway Pipeline

defmodule MyPipeline do
  use Broadway

  def handle_message(_, message, state) do
    %{sketch: sketch} = state
    updated = ExDataSketch.HLL.update(sketch, message.data.user_id)
    {:ok, message, %{state | sketch: updated}}
  end
end

Pattern 4: Periodic Aggregation

# Each of N worker processes holds a local sketch.
# Every 5 seconds, each sends its sketch to the aggregator.
defmodule Aggregator do
  def handle_info(:flush, state) do
    merged = ExDataSketch.HLL.merge_many(state.pending)
    # Store or publish the merged estimate
    {:noreply, %{state | pending: []}}
  end
end

Operational Considerations

Choosing Precision

Higher p means more memory but lower error. The sweet spot depends on your application:

pMemoryErrorBest For
101 KB~3.25%High-volume, low-precision dashboards
124 KB~1.63%General analytics
1416 KB~0.81%Production monitoring (recommended)
1664 KB~0.41%Financial compliance

Memory Budgets

When choosing p, consider your total memory budget across all sketch instances:

  • 1000 concurrent sketches at p=14 = 16 MB
  • 1000 concurrent sketches at p=10 = 1 MB
  • 1000 concurrent sketches at p=16 = 64 MB

When Sketches Are NOT Appropriate

Sketches are inappropriate when:

  1. Exact answers are required: Financial reconciliation, audit logging, compliance reporting.
  2. Cardinality is very small: If you expect < 100 distinct values, a MapSet is faster and uses less memory than any sketch.
  3. You need to enumerate the distinct values: Sketches estimate cardinality; they cannot list the values. Use a MapSet if you need the actual items.

Sketch Type Selection Guide

QuestionUse
How many unique items?HLL or ULL
How many times did item X appear?CMS
Is item X a member of the set?Bloom or Cuckoo
What's the median/value at percentile P?KLL or DDSketch
How many unique items, 20% better accuracy?ULL (vs HLL)
Approximate set membership with deletions?Quotient or CQF

Further Reading

  • guides/streaming_sketches.md -- Stream and Collectable integration
  • guides/broadway_integration.md -- Broadway pipeline patterns
  • guides/distributed_merge_semantics.md -- Distributed aggregation
  • guides/telemetry.md -- Monitoring sketch performance in production