ExArrow.Broadway.ParquetSink (ex_arrow v0.7.0)

Write assembled Arrow batches to a Parquet file from a Broadway batch handler.

Call it from a Broadway handle_batch/4 callback. All batches are written in a single ExArrow.Parquet.Writer.to_file/3 call, so the output is one Parquet file with one row group per batch (subject to the writer's chunking).

Emits a [:ex_arrow, :parquet, :write] telemetry event with :rows, :batch_count, and %{destination: path, source: :broadway} metadata.
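To observe this event, a handler can be attached with :telemetry.attach/4. The following is a minimal sketch, not part of ex_arrow: the module name ParquetWriteLogger, the handler id, and the log format are hypothetical, and it assumes the :telemetry dependency is available. It logs the measurements and metadata maps as received rather than assuming which map carries :rows and :batch_count.

```elixir
defmodule ParquetWriteLogger do
  # Hypothetical helper module for illustration; not part of ex_arrow.
  require Logger

  @event [:ex_arrow, :parquet, :write]

  # Attach once, e.g. at application start. Needs the :telemetry dependency.
  def attach do
    :telemetry.attach("parquet-write-logger", @event, &__MODULE__.handle_event/4, nil)
  end

  # Standard telemetry handler shape: event name, measurements, metadata, config.
  def handle_event(@event, measurements, metadata, _config) do
    Logger.info(format(measurements, metadata))
  end

  # Builds the log line; kept separate so it can be checked without telemetry.
  def format(measurements, metadata) do
    "parquet write measurements=#{inspect(measurements)} metadata=#{inspect(metadata)}"
  end
end
```

Calling ParquetWriteLogger.attach() before the pipeline starts would log one line per write. Telemetry handlers run in the process that emits the event, so the handler should stay cheap.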

Example

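def handle_batch(:parquet, messages, _info, _ctx) do
  {:ok, schema, batches} = ExArrow.Broadway.BatchBuilder.from_messages(messages)

  # Broadway expects handle_batch/4 to return the list of messages.
  case ExArrow.Broadway.ParquetSink.write("/data/out.parquet", schema, batches) do
    :ok -> messages
    {:error, reason} -> Enum.map(messages, &Broadway.Message.failed(&1, reason))
  end
end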
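The example writes every batch to the same fixed path. This page does not say whether an existing file is overwritten or rejected, so a pipeline that handles more than one batch may want a distinct path per call. The following is a sketch under that assumption: the ParquetPaths module and batch_path/2 function are hypothetical and not part of ex_arrow.

```elixir
defmodule ParquetPaths do
  # Hypothetical helper for illustration; not part of ex_arrow.

  # Returns a path under `dir` that differs on every call within one VM:
  # a UTC timestamp for readability plus a unique integer to avoid collisions
  # between batches written in the same second.
  def batch_path(dir, now \\ DateTime.utc_now()) do
    stamp = Calendar.strftime(now, "%Y%m%dT%H%M%S")
    unique = System.unique_integer([:positive])
    Path.join(dir, "batch-#{stamp}-#{unique}.parquet")
  end
end
```

Inside handle_batch/4, the literal "/data/out.parquet" could then be replaced with ParquetPaths.batch_path("/data"). The unique integer is only unique per running VM, so a multi-node deployment would need an additional node-specific component in the name.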

Functions

write(path, schema, batches)

@spec write(Path.t(), ExArrow.Schema.t(), [ExArrow.RecordBatch.t()]) ::
  :ok | {:error, String.t()}

Write schema and batches to a Parquet file at path.

Returns :ok or {:error, message}.