ExDataSketch provides five persistence backends for storing and recovering sketch state. All backends serialize sketches using the EXSK v2 binary format with CRC32C checksum integrity.
Supported Backends
| Backend | Module | Distribution | Durability | Transactional |
|---|---|---|---|---|
| ETS | ExDataSketch.Storage.ETS | Per-node | Process lifetime | No (RMW) |
| DETS | ExDataSketch.Storage.DETS | Per-node | Disk | No (file lock) |
| CubDB | ExDataSketch.Storage.CubDB | Per-node | Disk | Yes (MVCC) |
| Mnesia | ExDataSketch.Storage.Mnesia | Multi-node | Disk+RAM | Yes (ACID) |
| Ecto | ExDataSketch.Storage.Ecto | Multi-node | Database | Yes (DB) |
Note: ETS
merge/3uses a read-modify-write cycle without table-level locking, so concurrent writers may overwrite each other. For atomic merge guarantees, use Mnesia or Ecto. DETS provides file-lock serialization for single-node atomicity only.
Unified API
Every backend implements the same operations:
# Save a sketch under a key
:ok = Backend.save(sketch, storage, key)
# Load a sketch by key and module
{:ok, sketch} = Backend.load(SketchModule, storage, key)
# Merge into persisted sketch
:ok = Backend.merge(sketch, storage, key)
# Delete a sketch by key
:ok = Backend.delete(storage, key)The storage argument varies by backend:
- ETS/DETS: table name (atom)
- CubDB: CubDB pid or name
- Mnesia: table name (atom)
- Ecto: Ecto repo module
ETS
ETS provides fast in-memory storage. It is always available (no extra dependencies required).
# Create the table (application concern)
:ets.new(:sketches, [:set, :public, :named_table])
# Save
:ok = ExDataSketch.Storage.ETS.save(sketch, :sketches, "cardinality:2024-01")
# Load
{:ok, loaded} = ExDataSketch.Storage.ETS.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")
# Merge (read-modify-write, not truly atomic under concurrency)
:ok = ExDataSketch.Storage.ETS.merge(partial, :sketches, "cardinality:2024-01")
# Delete
:ok = ExDataSketch.Storage.ETS.delete(:sketches, "cardinality:2024-01")ETS tables must be :set or :ordered_set type.
DETS
DETS provides disk-backed storage that survives process and node restarts.
# Open the table (application concern)
{:ok, _} = :dets.open_file(:sketches, [type: :set])
# Save, load, merge, delete -- same API as ETS
:ok = ExDataSketch.Storage.DETS.save(sketch, :sketches, "cardinality:2024-01")
{:ok, loaded} = ExDataSketch.Storage.DETS.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")
:ok = ExDataSketch.Storage.DETS.merge(partial, :sketches, "cardinality:2024-01")
:ok = ExDataSketch.Storage.DETS.delete(:sketches, "cardinality:2024-01")
# Close when done
:ok = :dets.close(:sketches)DETS tables must be :set type. :ordered_set and :bag are not supported.
DETS has a practical 2GB file size limit.
CubDB
CubDB provides disk-backed key-value storage with MVCC transactions. It
requires the :cubdb dependency.
Dependencies:
{:cubdb, "~> 2.0"}# Start CubDB (application concern)
{:ok, db} = CubDB.start_link(data_dir: "/path/to/data")
# Save
:ok = ExDataSketch.Storage.CubDB.save(sketch, db, "cardinality:2024-01")
# Load
{:ok, loaded} = ExDataSketch.Storage.CubDB.load(ExDataSketch.HLL, db, "cardinality:2024-01")
# Atomic merge (uses CubDB transaction)
:ok = ExDataSketch.Storage.CubDB.merge(partial, db, "cardinality:2024-01")
# Delete
:ok = ExDataSketch.Storage.CubDB.delete(db, "cardinality:2024-01")Mnesia
Mnesia provides distributed, transactional storage across BEAM cluster nodes. It is always available (no extra dependencies required).
# Setup the table (once per node)
:ok = ExDataSketch.Storage.Mnesia.setup(:sketches)
# Or with disc copies:
:ok = ExDataSketch.Storage.Mnesia.setup(:sketches, disc_copies: [node()])
# Save
:ok = ExDataSketch.Storage.Mnesia.save(sketch, :sketches, "cardinality:2024-01")
# Load
{:ok, loaded} = ExDataSketch.Storage.Mnesia.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")
# Atomic merge (uses Mnesia transaction)
:ok = ExDataSketch.Storage.Mnesia.merge(partial, :sketches, "cardinality:2024-01")
# Delete
:ok = ExDataSketch.Storage.Mnesia.delete(:sketches, "cardinality:2024-01")Distributed Mnesia
For multi-node setups, create the table on all nodes before use:
:ok = ExDataSketch.Storage.Mnesia.setup(:sketches, disc_copies: [node(), :other@host])Mnesia transactions ensure atomic merge across all replicas. For operational concerns including network partition recovery, refer to the Mnesia documentation.
Ecto
The Ecto backend stores sketches in a SQL database. It requires :ecto_sql.
Dependencies:
{:ecto_sql, "~> 3.0"}Setup
Generate and run the migration:
mix ex_data_sketch.gen.migration --repo MyApp.Repo
mix ecto.migrate
Or add the migration manually:
defmodule MyApp.Repo.Migrations.AddExDataSketchSketches do
use Ecto.Migration
def up do
ExDataSketch.Storage.Ecto.Migration.up()
end
def down do
ExDataSketch.Storage.Ecto.Migration.down()
end
endUsage
# Save
:ok = ExDataSketch.Storage.Ecto.save(sketch, MyApp.Repo, "cardinality:2024-01")
# Load
{:ok, loaded} = ExDataSketch.Storage.Ecto.load(ExDataSketch.HLL, MyApp.Repo, "cardinality:2024-01")
# Atomic merge (uses Ecto transaction with SELECT FOR UPDATE)
:ok = ExDataSketch.Storage.Ecto.merge(partial, MyApp.Repo, "cardinality:2024-01")
# Delete
:ok = ExDataSketch.Storage.Ecto.delete(MyApp.Repo, "cardinality:2024-01")The Ecto backend uses SELECT ... FOR UPDATE to ensure atomic merge in
concurrent environments.
Choosing a Backend
| Use Case | Recommended Backend |
|---|---|
| Fast in-memory cache | ETS |
| Survive process restarts | DETS or CubDB |
| Simple disk persistence | CubDB |
| Distributed cluster | Mnesia |
| Existing Ecto app | Ecto |
| SQL database required | Ecto |
| Need ACID across nodes | Mnesia or Ecto |
Configuration
Backends can be enabled or disabled via application config:
config :ex_data_sketch,
persistence_backends: [
ets: [enabled: true],
dets: [enabled: true],
cubdb: [enabled: true],
mnesia: [enabled: true],
ecto: [enabled: true]
]When not explicitly configured, a backend defaults to enabled if its runtime
dependency is available. Set enabled: false to disable a backend regardless
of dependency availability.