ExArrow.Broadway.BatchBuilder (ex_arrow v0.7.1)

View Source

Assemble ExArrow.RecordBatch values from Broadway messages.

Each Broadway message is expected to carry one of:

  • an ExArrow.RecordBatch.t() handle in message.data (the common case for Arrow-aware producers), or
  • a {names, binaries, dtypes, length} tuple describing raw Arrow columns, which is converted to a batch via ExArrow.RecordBatch.from_columns/4.

from_messages/1 returns {:ok, schema, batches} where schema is the schema of the first batch and batches is the list of assembled batches. from_messages/2 accepts options:

  • :rows_per_batch — split the assembled batches into smaller batches of at most this many rows by re-chunking the column buffers. Defaults to :infinity (no splitting).

Example

{:ok, schema, batches} =
  ExArrow.Broadway.BatchBuilder.from_messages(messages)

Summary

Functions

Extract the ExArrow.RecordBatch handles from a list of Broadway messages, without resolving the schema.

Build a list of ExArrow.RecordBatch values from a list of Broadway messages, returning the shared schema and the batch list.

Functions

extract_batches(messages)

@spec extract_batches([term()]) ::
  {:ok, [ExArrow.RecordBatch.t()]} | {:error, String.t()}

Extract the ExArrow.RecordBatch handles from a list of Broadway messages, without resolving the schema.

Returns {:ok, [batch, ...]} or {:error, message}.

from_messages(messages)

@spec from_messages([term()]) ::
  {:ok, ExArrow.Schema.t(), [ExArrow.RecordBatch.t()]} | {:error, String.t()}

Build a list of ExArrow.RecordBatch values from a list of Broadway messages, returning the shared schema and the batch list.

Returns {:ok, schema, [batch, ...]} or {:error, message}.

from_messages(messages, opts)

@spec from_messages(
  [term()],
  keyword()
) :: {:ok, ExArrow.Schema.t(), [ExArrow.RecordBatch.t()]} | {:error, String.t()}