AshScylla.DataLayer.Batch (AshScylla v0.10.0)


Batch operation support for AshScylla, built on ScyllaDB's BATCH statements.

ScyllaDB and Cassandra support batch operations, which execute multiple CQL statements in a single request.
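To show what such a request looks like at the CQL level, here is a minimal sketch that renders a list of `{cql, params}` tuples (the statement shape this module accepts) as one `BEGIN BATCH ... APPLY BATCH` block. The module name `CqlBatchSketch` is hypothetical and this is not presented as AshScylla's implementation; drivers for the Cassandra native protocol can also send a batch as a dedicated BATCH message instead of concatenated text.

```elixir
defmodule CqlBatchSketch do
  @moduledoc "Illustrative only: renders {cql, params} tuples as CQL BATCH text."

  # Wraps every statement in a single BEGIN ... APPLY BATCH block.
  # Bound parameters are returned separately, in statement order,
  # so they line up with the `?` markers in the rendered text.
  def render(statements, type \\ :logged) when type in [:logged, :unlogged] do
    header = if type == :unlogged, do: "BEGIN UNLOGGED BATCH", else: "BEGIN BATCH"

    body =
      Enum.map_join(statements, "\n", fn {cql, _params} ->
        "  " <> String.trim_trailing(cql, ";") <> ";"
      end)

    params = Enum.flat_map(statements, fn {_cql, params} -> params end)

    {header <> "\n" <> body <> "\nAPPLY BATCH;", params}
  end
end
```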

Synchronous Batches

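For small batches, or when all statements should go out in one BATCH request rather than as parallel sub-batches, use batch_insert/3, batch_update/3, or batch_delete/3. Note that CQL does not execute the statements in a batch one after another: by default, every statement in a batch is applied with the same timestamp.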

Async Partition-Aware Batches

For large bulk operations, use batch_insert_async/3. This function:

  • Groups records by partition key, so that each sub-batch targets a single partition
  • Executes sub-batches in parallel using Task.async_stream
  • Respects ScyllaDB's recommendation to avoid cross-partition BATCH statements
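The three steps above can be sketched in plain Elixir as follows. This is an illustration under stated assumptions, not AshScylla's code: `key_fun` (how a partition key is derived from a `{cql, params}` tuple) and `exec_fun` (how one single-partition batch is executed) are hypothetical callbacks supplied by the caller, and `PartitionBatchSketch` is a made-up name.

```elixir
defmodule PartitionBatchSketch do
  @moduledoc "Hypothetical sketch of partition-aware async batching; not AshScylla's code."

  # key_fun:  {cql, params} -> partition key              (assumed, caller-supplied)
  # exec_fun: [statement] -> {:ok, term} | {:error, term} (assumed, caller-supplied)
  def run(statements, key_fun, exec_fun, opts \\ []) do
    max_concurrency = Keyword.get(opts, :max_concurrency, System.schedulers_online())

    statements
    # 1. One group per partition key, so no batch spans two partitions.
    |> Enum.group_by(key_fun)
    |> Map.values()
    # 2. Run each single-partition group as its own batch, in parallel.
    |> Task.async_stream(exec_fun, max_concurrency: max_concurrency, ordered: false)
    # 3. Unwrap task results; an exited task becomes an error tuple.
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, {:exit, reason}}
    end)
  end
end
```

Passing `ordered: false` lets a fast group finish without waiting for a slower one ahead of it; the trade-off is that results arrive in completion order rather than input order.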

Examples

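```elixir
alias AshScylla.DataLayer

# repo is the repo module; id1 and id2 are the rows' primary key values
statements = [
  {"INSERT INTO users (id, name) VALUES (?, ?)", [id1, "Alice"]},
  {"INSERT INTO users (id, name) VALUES (?, ?)", [id2, "Bob"]}
]

# Synchronous batch: all statements go out in one BATCH request
DataLayer.Batch.batch_insert(repo, statements)

# Async partition-aware batch: at most 8 sub-batches run at a time
DataLayer.Batch.batch_insert_async(repo, statements, max_concurrency: 8)
```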

Summary

Functions

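batch_delete(repo, statements, opts \\ [])
Executes a batch of DELETE statements.

batch_insert(repo, statements, opts \\ [])
Executes a batch of INSERT statements.

batch_insert_async(repo, statements, opts \\ [])
Executes batch inserts asynchronously, grouped by partition key.

batch_update(repo, statements, opts \\ [])
Executes a batch of UPDATE statements.

partition_key(record, resource)
Extracts the partition key values from a record for a given resource.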

Functions

batch_delete(repo, statements, opts \\ [])

@spec batch_delete(module(), [{String.t(), list()}], keyword()) ::
  {:ok, term()} | {:error, term()}

Executes a batch of DELETE statements.

batch_insert(repo, statements, opts \\ [])

@spec batch_insert(module(), [{String.t(), list()}], keyword()) ::
  {:ok, term()} | {:error, term()}

Executes a batch of INSERT statements.

batch_insert_async(repo, statements, opts \\ [])

@spec batch_insert_async(module(), [{String.t(), list()}], keyword()) ::
  {:ok, [term()]} | {:error, term()}

Executes batch inserts asynchronously, grouped by partition key.

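This is the recommended approach for large bulk inserts in ScyllaDB. Records are grouped by their partition key values, and each group is executed as a separate single-partition batch, with the groups running in parallel. This avoids the cross-partition BATCH anti-pattern, in which a single batch spans many partitions and its coordinator node has to forward the writes to the replicas of every partition involved.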
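A batch is single-partition when every statement in it writes to the same partition key. The check below is a minimal sketch of that rule, assuming a hypothetical `key_fun` that maps each `{cql, params}` statement to its partition key; `CrossPartitionCheck` is not part of AshScylla's documented API.

```elixir
defmodule CrossPartitionCheck do
  @moduledoc "Hypothetical helper for spotting cross-partition batches."

  # Number of distinct partitions a single BATCH of these statements would touch.
  def partition_count(statements, key_fun) do
    statements
    |> Enum.map(key_fun)
    |> Enum.uniq()
    |> length()
  end

  # True when the batch stays within one partition (or is empty).
  def single_partition?(statements, key_fun),
    do: partition_count(statements, key_fun) <= 1
end
```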

Options

  • :max_concurrency - Maximum number of concurrent batch executions (defaults to System.schedulers_online())
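The documented default and the `{:ok, [term()]} | {:error, term()}` return type suggest how per-group results end up combined. The sketch below is one hedged way to do it: it resolves `:max_concurrency` with the same default and folds per-group results, stopping at the first error. Whether AshScylla itself stops at the first failing group is not stated here, so treat the fail-fast behavior as an assumption; `BatchResultSketch` is a hypothetical name.

```elixir
defmodule BatchResultSketch do
  @moduledoc "Hypothetical option handling and result folding; not AshScylla's code."

  # Same default as documented for :max_concurrency.
  def max_concurrency(opts),
    do: Keyword.get(opts, :max_concurrency, System.schedulers_online())

  # Folds per-group results into {:ok, [term()]} | {:error, term()},
  # halting at the first error (an assumed fail-fast policy).
  def collect(results) do
    results
    |> Enum.reduce_while({:ok, []}, fn
      {:ok, value}, {:ok, acc} -> {:cont, {:ok, [value | acc]}}
      {:error, _reason} = error, _acc -> {:halt, error}
    end)
    |> case do
      {:ok, values} -> {:ok, Enum.reverse(values)}
      error -> error
    end
  end
end
```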

Examples

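```elixir
alias AshScylla.DataLayer

statements = [
  {"INSERT INTO users (id, name) VALUES (?, ?)", [id1, "Alice"]},
  {"INSERT INTO users (id, name) VALUES (?, ?)", [id2, "Bob"]},
  # ... hundreds more
]

# :resource names the target resource (presumably so its partition key can be looked up)
DataLayer.Batch.batch_insert_async(repo, statements, resource: MyApp.User)
```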

batch_update(repo, statements, opts \\ [])

@spec batch_update(module(), [{String.t(), list()}], keyword()) ::
  {:ok, term()} | {:error, term()}

Executes a batch of UPDATE statements.

partition_key(record, resource)

@spec partition_key(map(), module()) :: map()

Extracts the partition key values from a record for a given resource.

Returns a map of partition key column names to their values.
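A minimal sketch of the same idea, assuming the partition key columns are already known as a list of atoms (the real `partition_key/2` takes a resource instead and presumably reads the columns from it). `PartitionKeySketch` and the `pk_columns` argument are hypothetical. `Map.fetch!/2` raises if a key column is missing, since such a record could not be routed to a partition.

```elixir
defmodule PartitionKeySketch do
  @moduledoc "Hypothetical stand-in for partition_key/2 that takes column names directly."

  # Builds %{column => value} for each partition key column;
  # raises KeyError if the record lacks one of them.
  def partition_key(record, pk_columns) when is_map(record) do
    Map.new(pk_columns, fn column -> {column, Map.fetch!(record, column)} end)
  end
end
```

The returned map works as a grouping key, for example `Enum.group_by(records, &PartitionKeySketch.partition_key(&1, [:tenant_id]))`.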