ExDataSketch provides five persistence backends for storing and recovering sketch state. All backends serialize sketches using the EXSK v2 binary format with CRC32C checksum integrity.

Supported Backends

BackendModuleDistributionDurabilityTransactional
ETSExDataSketch.Storage.ETSPer-nodeProcess lifetimeNo (RMW)
DETSExDataSketch.Storage.DETSPer-nodeDiskNo (file lock)
CubDBExDataSketch.Storage.CubDBPer-nodeDiskYes (MVCC)
MnesiaExDataSketch.Storage.MnesiaMulti-nodeDisk+RAMYes (ACID)
EctoExDataSketch.Storage.EctoMulti-nodeDatabaseYes (DB)

Note: ETS merge/3 uses a read-modify-write cycle without table-level locking, so concurrent writers may overwrite each other. For atomic merge guarantees, use Mnesia or Ecto. DETS provides file-lock serialization for single-node atomicity only.

Unified API

Every backend implements the same operations:

# Save a sketch under a key
:ok = Backend.save(sketch, storage, key)

# Load a sketch by key and module
{:ok, sketch} = Backend.load(SketchModule, storage, key)

# Merge into persisted sketch
:ok = Backend.merge(sketch, storage, key)

# Delete a sketch by key
:ok = Backend.delete(storage, key)

The storage argument varies by backend:

  • ETS/DETS: table name (atom)
  • CubDB: CubDB pid or name
  • Mnesia: table name (atom)
  • Ecto: Ecto repo module

ETS

ETS provides fast in-memory storage. It is always available (no extra dependencies required).

# Create the table (application concern)
:ets.new(:sketches, [:set, :public, :named_table])

# Save
:ok = ExDataSketch.Storage.ETS.save(sketch, :sketches, "cardinality:2024-01")

# Load
{:ok, loaded} = ExDataSketch.Storage.ETS.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")

# Merge (read-modify-write, not truly atomic under concurrency)
:ok = ExDataSketch.Storage.ETS.merge(partial, :sketches, "cardinality:2024-01")

# Delete
:ok = ExDataSketch.Storage.ETS.delete(:sketches, "cardinality:2024-01")

ETS tables must be :set or :ordered_set type.

DETS

DETS provides disk-backed storage that survives process and node restarts.

# Open the table (application concern)
{:ok, _} = :dets.open_file(:sketches, [type: :set])

# Save, load, merge, delete -- same API as ETS
:ok = ExDataSketch.Storage.DETS.save(sketch, :sketches, "cardinality:2024-01")
{:ok, loaded} = ExDataSketch.Storage.DETS.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")
:ok = ExDataSketch.Storage.DETS.merge(partial, :sketches, "cardinality:2024-01")
:ok = ExDataSketch.Storage.DETS.delete(:sketches, "cardinality:2024-01")

# Close when done
:ok = :dets.close(:sketches)

DETS tables must be :set type. :ordered_set and :bag are not supported. DETS has a practical 2GB file size limit.

CubDB

CubDB provides disk-backed key-value storage with MVCC transactions. It requires the :cubdb dependency.

Dependencies:

{:cubdb, "~> 2.0"}
# Start CubDB (application concern)
{:ok, db} = CubDB.start_link(data_dir: "/path/to/data")

# Save
:ok = ExDataSketch.Storage.CubDB.save(sketch, db, "cardinality:2024-01")

# Load
{:ok, loaded} = ExDataSketch.Storage.CubDB.load(ExDataSketch.HLL, db, "cardinality:2024-01")

# Atomic merge (uses CubDB transaction)
:ok = ExDataSketch.Storage.CubDB.merge(partial, db, "cardinality:2024-01")

# Delete
:ok = ExDataSketch.Storage.CubDB.delete(db, "cardinality:2024-01")

Mnesia

Mnesia provides distributed, transactional storage across BEAM cluster nodes. It is always available (no extra dependencies required).

# Setup the table (once per node)
:ok = ExDataSketch.Storage.Mnesia.setup(:sketches)
# Or with disc copies:
:ok = ExDataSketch.Storage.Mnesia.setup(:sketches, disc_copies: [node()])

# Save
:ok = ExDataSketch.Storage.Mnesia.save(sketch, :sketches, "cardinality:2024-01")

# Load
{:ok, loaded} = ExDataSketch.Storage.Mnesia.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")

# Atomic merge (uses Mnesia transaction)
:ok = ExDataSketch.Storage.Mnesia.merge(partial, :sketches, "cardinality:2024-01")

# Delete
:ok = ExDataSketch.Storage.Mnesia.delete(:sketches, "cardinality:2024-01")

Distributed Mnesia

For multi-node setups, create the table on all nodes before use:

:ok = ExDataSketch.Storage.Mnesia.setup(:sketches, disc_copies: [node(), :other@host])

Mnesia transactions ensure atomic merge across all replicas. For operational concerns including network partition recovery, refer to the Mnesia documentation.

Ecto

The Ecto backend stores sketches in a SQL database. It requires :ecto_sql.

Dependencies:

{:ecto_sql, "~> 3.0"}

Setup

Generate and run the migration:

mix ex_data_sketch.gen.migration --repo MyApp.Repo
mix ecto.migrate

Or add the migration manually:

defmodule MyApp.Repo.Migrations.AddExDataSketchSketches do
  use Ecto.Migration

  def up do
    ExDataSketch.Storage.Ecto.Migration.up()
  end

  def down do
    ExDataSketch.Storage.Ecto.Migration.down()
  end
end

Usage

# Save
:ok = ExDataSketch.Storage.Ecto.save(sketch, MyApp.Repo, "cardinality:2024-01")

# Load
{:ok, loaded} = ExDataSketch.Storage.Ecto.load(ExDataSketch.HLL, MyApp.Repo, "cardinality:2024-01")

# Atomic merge (uses Ecto transaction with SELECT FOR UPDATE)
:ok = ExDataSketch.Storage.Ecto.merge(partial, MyApp.Repo, "cardinality:2024-01")

# Delete
:ok = ExDataSketch.Storage.Ecto.delete(MyApp.Repo, "cardinality:2024-01")

The Ecto backend uses SELECT ... FOR UPDATE to ensure atomic merge in concurrent environments.

Choosing a Backend

Use CaseRecommended Backend
Fast in-memory cacheETS
Survive process restartsDETS or CubDB
Simple disk persistenceCubDB
Distributed clusterMnesia
Existing Ecto appEcto
SQL database requiredEcto
Need ACID across nodesMnesia or Ecto

Configuration

Backends can be enabled or disabled via application config:

config :ex_data_sketch,
  persistence_backends: [
    ets:   [enabled: true],
    dets:  [enabled: true],
    cubdb: [enabled: true],
    mnesia: [enabled: true],
    ecto:  [enabled: true]
  ]

When not explicitly configured, a backend defaults to enabled if its runtime dependency is available. Set enabled: false to disable a backend regardless of dependency availability.

See Also