ExArrow.RecordBatch (ex_arrow v0.6.3)

Arrow record batch handle (opaque reference to native record batch).

A batch is a collection of column arrays with a shared schema and row count. It sits between ExArrow.Array (one column) and ExArrow.Table or ExArrow.Stream (multiple batches). Data stays in native memory; accessors return handles or small metadata.

Position in the hierarchy

Schema          ->  Field (metadata)
RecordBatch     ->  Array (one per column)
Table / Stream  ->  RecordBatch (one or more)

Supported dtype strings (from_columns/4)

The from_columns/4 constructor accepts a per-column dtype string. The full set of supported strings, the corresponding Arrow logical type, and the wire format expected for each column binary are listed below.

Fixed-width primitives

Each column binary is exactly length × element_size bytes, in little-endian byte order for multi-byte types.

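| dtype | Arrow type | Element size |
|-------|------------|--------------|
| "s8"  | Int8       | 1 byte       |
| "s16" | Int16      | 2 bytes      |
| "s32" | Int32      | 4 bytes      |
| "s64" | Int64      | 8 bytes      |
| "u8"  | UInt8      | 1 byte       |
| "u16" | UInt16     | 2 bytes      |
| "u32" | UInt32     | 4 bytes      |
| "u64" | UInt64     | 8 bytes      |
| "f32" | Float32    | 4 bytes      |
| "f64" | Float64    | 8 bytes      |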
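
As an illustration of the fixed-width layout, the sketch below packs a list of Elixir numbers into a column binary. `ColumnPack` is a hypothetical helper written for this example and is not part of ExArrow; it does no range checking, so out-of-range integers are silently truncated by the bitstring constructor.

```elixir
defmodule ColumnPack do
  # Hypothetical helper (not part of ExArrow): maps each fixed-width dtype
  # string from the table above to its kind and width in bits.
  @layouts %{
    "s8" => {:int, 8}, "s16" => {:int, 16}, "s32" => {:int, 32}, "s64" => {:int, 64},
    "u8" => {:int, 8}, "u16" => {:int, 16}, "u32" => {:int, 32}, "u64" => {:int, 64},
    "f32" => {:float, 32}, "f64" => {:float, 64}
  }

  # Packs `values` into one binary of length(values) * element_size bytes,
  # little-endian for multi-byte types.
  def pack(values, dtype) do
    case Map.fetch(@layouts, dtype) do
      {:ok, {:int, bits}} ->
        for v <- values, into: <<>>, do: <<v::little-integer-size(bits)>>

      {:ok, {:float, bits}} ->
        for v <- values, into: <<>>, do: <<v::little-float-size(bits)>>

      :error ->
        raise ArgumentError, "not a fixed-width primitive dtype: #{inspect(dtype)}"
    end
  end
end
```

A three-row "s64" column built this way is 24 bytes long, which matches the length × element_size rule.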

Boolean

"bool": exactly length bytes, one byte per element (0 = false, non-zero = true).
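
A minimal sketch of the "bool" layout, assuming the caller starts from a list of Elixir booleans. `BoolColumn` is a hypothetical name used only for this example.

```elixir
defmodule BoolColumn do
  # Hypothetical helper (not part of ExArrow): emits one byte per element,
  # 1 for true and 0 for false, as the "bool" wire format expects.
  def pack(flags) do
    for flag <- flags, into: <<>> do
      if flag, do: <<1>>, else: <<0>>
    end
  end
end
```

The resulting binary has exactly as many bytes as there are rows.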

Date and time

"date32" counts days and "date64" counts milliseconds since 1970-01-01. Timestamps are tick counts since the Unix epoch in UTC, in the unit named by the dtype. Durations are plain tick counts in their unit. All temporal values are little-endian signed integers.

| dtype               | Arrow type                   | Rust scalar | Element size |
|---------------------|------------------------------|-------------|--------------|
| "date32"            | Date32                       | i32 days    | 4 bytes      |
| "date64"            | Date64                       | i64 millis  | 8 bytes      |
| "timestamp_seconds" | Timestamp(Second, None)      | i64 sec     | 8 bytes      |
| "timestamp_millis"  | Timestamp(Millisecond, None) | i64 ms      | 8 bytes      |
| "timestamp_micros"  | Timestamp(Microsecond, None) | i64 µs      | 8 bytes      |
| "timestamp_nanos"   | Timestamp(Nanosecond, None)  | i64 ns      | 8 bytes      |
| "duration_seconds"  | Duration(Second)             | i64 sec     | 8 bytes      |
| "duration_millis"   | Duration(Millisecond)        | i64 ms      | 8 bytes      |
| "duration_micros"   | Duration(Microsecond)        | i64 µs      | 8 bytes      |
| "duration_nanos"    | Duration(Nanosecond)         | i64 ns      | 8 bytes      |

Timestamps are emitted with no timezone (None). The caller is responsible for ensuring the i64 ticks are in UTC if the consuming server treats the column as zoned.
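
The sketch below shows one way to derive "date32" and "timestamp_micros" column binaries from Elixir's standard `Date` and `DateTime` structs. `TemporalColumn` is a hypothetical helper for this example, and it assumes every `DateTime` passed in is already in UTC.

```elixir
defmodule TemporalColumn do
  # Hypothetical helper (not part of ExArrow).
  @epoch ~D[1970-01-01]

  # "date32": signed 32-bit day count since 1970-01-01, little-endian.
  def date32(dates) do
    for d <- dates, into: <<>> do
      <<Date.diff(d, @epoch)::little-signed-32>>
    end
  end

  # "timestamp_micros": signed 64-bit microsecond count since the Unix epoch.
  # Assumes each DateTime is in UTC; no timezone is attached to the column.
  def timestamp_micros(datetimes) do
    for dt <- datetimes, into: <<>> do
      <<DateTime.to_unix(dt, :microsecond)::little-signed-64>>
    end
  end
end
```

The other timestamp units follow the same pattern with `:second`, `:millisecond`, or `:nanosecond` passed to `DateTime.to_unix/2`.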

Variable-length string and binary

Variable-length columns use a length-prefixed wire format. The column binary is the concatenation of length records, each of the form:

<<elem_len::unsigned-little-32, elem_bytes::binary-size(elem_len)>>

| dtype          | Arrow type  |
|----------------|-------------|
| "utf8"         | Utf8        |
| "large_utf8"   | LargeUtf8   |
| "binary"       | Binary      |
| "large_binary" | LargeBinary |

"utf8" and "large_utf8" validate UTF-8 on the entire payload and return {:error, msg} if any element is invalid. "binary" and "large_binary" accept arbitrary bytes.
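
The length-prefixed format can be produced from a list of Elixir binaries as sketched below. `VarLenColumn` is a hypothetical helper for this example, not an ExArrow module; the `decode/1` function is included only to show that the encoding round-trips.

```elixir
defmodule VarLenColumn do
  # Hypothetical helper (not part of ExArrow).

  # Concatenates one <<len::unsigned-little-32, bytes>> record per element.
  def encode(elements) do
    for elem <- elements, into: <<>> do
      <<byte_size(elem)::unsigned-little-32, elem::binary>>
    end
  end

  # Splits a length-prefixed column binary back into its elements.
  def decode(<<>>), do: []

  def decode(<<len::unsigned-little-32, elem::binary-size(len), rest::binary>>) do
    [elem | decode(rest)]
  end
end
```

The prefix is the byte length, not the character count, so a multi-byte UTF-8 string such as "héllo" is recorded with length 6. For "utf8" columns, checking each element with `String.valid?/1` before encoding may surface bad input earlier than the constructor's own validation.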

Nullability

from_columns/4 produces non-nullable columns (Field.nullable = false), so a column built this way cannot carry null values. To work with nulls, bind a separate column, or use a parameter schema that accepts only non-null values.

Summary

Functions

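column_names(batch)

Returns the column names of this batch.

from_columns(names, binaries, dtypes, length)

Create a RecordBatch from column-oriented binary data.

num_columns(batch)

Returns the number of columns in this batch.

num_rows(record_batch)

Returns the number of rows in this batch.

schema(record_batch)

Returns the schema of this record batch.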

Types

t()

@opaque t()

Functions

column_names(batch)

@spec column_names(t()) :: [String.t()]

Returns the column names of this batch.

Derived from the batch's schema. Equivalent to ExArrow.Schema.field_names(ExArrow.RecordBatch.schema(batch)).

Examples

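{:ok, ipc_bin} = ExArrow.Native.ipc_test_fixture_binary()
{:ok, stream}  = ExArrow.IPC.Reader.from_binary(ipc_bin)
batch = ExArrow.Stream.next(stream)
# List of column name strings taken from the batch's schema
ExArrow.RecordBatch.column_names(batch)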

from_columns(names, binaries, dtypes, length)

@spec from_columns([String.t()], [binary()], [String.t()], non_neg_integer()) ::
  {:ok, t()} | {:error, String.t()}

Create a RecordBatch from column-oriented binary data.

Each column is provided as a raw binary paired with an Arrow dtype string and a shared row count. This constructor builds parameter batches for Flight SQL prepared statement binding.

Parameters

  • names: list of column name strings
  • binaries: list of column data binaries (one per column). See the supported dtypes table in the moduledoc for the wire format of each dtype.
  • dtypes: list of Arrow dtype strings, one per column
  • length: number of rows (must be the same for every column)

The three lists (names, binaries, dtypes) must have the same length and at least one entry.

Returns

  • {:ok, %ExArrow.RecordBatch{}} on success
  • {:error, message} if the inputs are inconsistent (mismatched list lengths, malformed binary, unknown dtype, invalid UTF-8 in a "utf8"/"large_utf8" column, etc.)
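
Some of the consistency conditions above can be checked on the Elixir side before the call. The sketch below is a hypothetical pre-check, not part of ExArrow: the module name, the parameter name `row_count`, and the error messages are all invented for illustration and are not the messages the library returns. It covers only list lengths and the byte size of fixed-width columns; variable-length and unknown dtypes are left for from_columns/4 to validate.

```elixir
defmodule BatchPrecheck do
  # Hypothetical helper (not part of ExArrow). Element sizes in bytes,
  # copied from the dtype tables in the moduledoc.
  @element_size %{
    "s8" => 1, "s16" => 2, "s32" => 4, "s64" => 8,
    "u8" => 1, "u16" => 2, "u32" => 4, "u64" => 8,
    "f32" => 4, "f64" => 8, "bool" => 1,
    "date32" => 4, "date64" => 8,
    "timestamp_seconds" => 8, "timestamp_millis" => 8,
    "timestamp_micros" => 8, "timestamp_nanos" => 8,
    "duration_seconds" => 8, "duration_millis" => 8,
    "duration_micros" => 8, "duration_nanos" => 8
  }

  # Returns :ok, or {:error, message} with an illustrative message.
  def check(names, binaries, dtypes, row_count) do
    cond do
      names == [] ->
        {:error, "at least one column is required"}

      length(binaries) != length(names) or length(dtypes) != length(names) ->
        {:error, "names, binaries and dtypes must have the same length"}

      true ->
        [names, binaries, dtypes]
        |> Enum.zip()
        |> Enum.find_value(:ok, fn {name, bin, dtype} ->
          case Map.fetch(@element_size, dtype) do
            # Fixed-width column whose byte size does not match the row count
            {:ok, size} when byte_size(bin) != row_count * size ->
              {:error, "column #{name}: expected #{row_count * size} bytes, got #{byte_size(bin)}"}

            # Correct size, or a dtype this sketch does not cover
            _ ->
              nil
          end
        end)
    end
  end
end
```

A pre-check like this does not replace matching on the `{:ok, batch}` and `{:error, message}` results of from_columns/4; it only moves the cheapest failures earlier.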

Examples

# Single int64 column with one row
{:ok, batch} = ExArrow.RecordBatch.from_columns(
  ["id"],
  [<<42::little-signed-64>>],
  ["s64"],
  1
)

# Mixed primitives
{:ok, batch} = ExArrow.RecordBatch.from_columns(
  ["id", "score"],
  [<<1::little-signed-64>>, <<3.14::little-float-64>>],
  ["s64", "f64"],
  1
)

# utf8 column with two rows ("hello", "world") using length-prefixed
# records: <<len::little-32, bytes::binary-size(len)>>
utf8 = <<5::little-32, "hello", 5::little-32, "world">>
{:ok, batch} = ExArrow.RecordBatch.from_columns(["s"], [utf8], ["utf8"], 2)

# timestamp_micros column
ts = <<1_700_000_000_000_000::little-signed-64>>
{:ok, batch} =
  ExArrow.RecordBatch.from_columns(["t"], [ts], ["timestamp_micros"], 1)

num_columns(batch)

@spec num_columns(t()) :: non_neg_integer()

Returns the number of columns in this batch.

Derived from the batch's schema; no separate NIF call is needed.

Examples

{:ok, ipc_bin} = ExArrow.Native.ipc_test_fixture_binary()
{:ok, stream}  = ExArrow.IPC.Reader.from_binary(ipc_bin)
batch = ExArrow.Stream.next(stream)
ExArrow.RecordBatch.num_columns(batch)  #=> 2

num_rows(record_batch)

@spec num_rows(t()) :: non_neg_integer()

Returns the number of rows in this batch.

schema(record_batch)

@spec schema(t()) :: ExArrow.Schema.t()

Returns the schema of this record batch.