ExArrow.Nx
(ex_arrow v0.6.0)
Bridge between ExArrow and Nx tensors.
Converts numeric and boolean Arrow columns to Nx.Tensor values and back.
On the Arrow → Nx path, the column's raw byte buffer is copied once from
native Arrow memory into an Elixir binary and handed directly to
Nx.from_binary/2. No intermediate list is materialised.
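The last step of that path can be sketched with Nx alone. In this minimal example a hand-built binary stands in for the bytes copied out of an Arrow Float64 column (the `buffer` value is an assumption for illustration; in practice ExArrow produces it internally):

```elixir
Mix.install([{:nx, "~> 0.9"}])

# Stand-in for the bytes copied out of an Arrow Float64 column:
# three 64-bit floats in the machine's native byte order.
buffer = <<1.5::float-64-native, 2.5::float-64-native, 3.5::float-64-native>>

# Nx reinterprets the binary in place; no Elixir list of floats is built.
tensor = Nx.from_binary(buffer, {:f, 64})

Nx.type(tensor)    # {:f, 64}
Nx.shape(tensor)   # {3}
Nx.to_list(tensor) # [1.5, 2.5, 3.5]
```

The binary holds 8 bytes per element, so its size grows linearly with the column length and nothing else is allocated per element.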
Requires {:nx, "~> 0.9"} in your mix.exs dependencies. When Nx is
absent, every function returns {:error, "Nx is not available..."}.
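A dependency list along these lines should satisfy that requirement. The `:nx` requirement is the one quoted above; the `:ex_arrow` requirement is an assumption derived from the v0.6.0 version shown for this page:

```elixir
# mix.exs (fragment)
defp deps do
  [
    # Version requirement assumed from the v0.6.0 shown for this page.
    {:ex_arrow, "~> 0.6"},
    # Requirement quoted in the text above.
    {:nx, "~> 0.9"}
  ]
end
```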
Supported column types
| Arrow type | Nx dtype |
|---|---|
| Int8 | {:s, 8} |
| Int16 | {:s, 16} |
| Int32 | {:s, 32} |
| Int64 | {:s, 64} |
| UInt8 | {:u, 8} |
| UInt16 | {:u, 16} |
| UInt32 | {:u, 32} |
| UInt64 | {:u, 64} |
| Float32 | {:f, 32} |
| Float64 | {:f, 64} |
| Boolean | {:u, 8} |
Arrow Boolean columns are materialised as one byte per element (0 or 1) and
converted to an {:u, 8} Nx tensor. The reverse path accepts {:u, 8}
tensors and builds an Arrow Boolean column when the as: :boolean option is
passed to from_tensor/3.
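For context, the Arrow format stores Boolean values bit-packed, eight per byte, least-significant bit first, which is why they need expanding before Nx can read them as {:u, 8}. The following is only an illustrative sketch of that expansion in plain Elixir; `BoolUnpackSketch` is a hypothetical module, and ExArrow performs the real conversion in native code:

```elixir
defmodule BoolUnpackSketch do
  import Bitwise

  # Hypothetical helper: expands `count` LSB-first packed bits into a
  # binary holding one byte (0 or 1) per element.
  def unpack(packed, count) when is_binary(packed) and count >= 0 do
    for i <- 0..(count - 1)//1, into: <<>> do
      byte = :binary.at(packed, div(i, 8))
      bit = (byte >>> rem(i, 8)) &&& 1
      <<bit>>
    end
  end
end

# 0b00000101 packs the values true, false, true (read from the lowest bit).
BoolUnpackSketch.unpack(<<0b00000101>>, 3) # <<1, 0, 1>>
```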
Columns of other types (Utf8, Timestamp, etc.) are not supported for direct
buffer extraction. column_to_tensor/2 returns
{:error, "unsupported column type..."} for them, while to_tensors/1
silently skips them.
Null handling
Arrow validity (null) bitmaps are not exposed through this API.
For numeric columns, null slots have unspecified backing bytes in Arrow memory, so the extracted buffer is not meaningful at null positions. For Boolean columns, ExArrow explicitly checks the null bitmap and emits 0 for null slots.
To distinguish nulls from genuine zero values, inspect the original batch. Full null support may be added in a future release.
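If you track validity yourself, the mask can be applied on the Nx side. The sketch below assumes a `valid` tensor (1 = valid, 0 = null) obtained separately from the original batch; this module does not provide one, and both tensors here are made-up illustration data:

```elixir
Mix.install([{:nx, "~> 0.9"}])

# Values as they might come out of a numeric column; the slot at index 1
# stands for a null whose backing bytes happen to read as 0.0, while the
# 0.0 at index 3 is a genuine value.
values = Nx.tensor([10.0, 0.0, 30.0, 0.0], type: {:f, 64})

# Hypothetical validity mask (1 = valid, 0 = null), obtained elsewhere.
valid = Nx.tensor([1, 0, 1, 1], type: {:u, 8})

# Overwrite null slots explicitly, then average over valid slots only.
cleaned = Nx.select(valid, values, 0.0)
mean = Nx.divide(Nx.sum(cleaned), Nx.sum(valid)) |> Nx.to_number()
```

Using Nx.select/3 rather than multiplying by the mask matters because the bytes behind a null slot are unspecified and could decode to NaN.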
Public API
| Function | Direction | Description |
|---|---|---|
| column_to_tensor/2 | Arrow → Nx | Extract one named numeric/boolean column as an Nx.Tensor |
| to_tensors/1 | Arrow → Nx | Extract all numeric/boolean columns as %{name => Nx.Tensor} |
| from_tensor/3 | Nx → Arrow | Single tensor → single-column RecordBatch |
| from_tensors/1 | Nx → Arrow | Map of tensors → multi-column RecordBatch (single NIF call) |
Quick example
# Read a batch, extract one column as a tensor
{:ok, stream} = ExArrow.Parquet.Reader.from_file("/data/trades.parquet")
batch = ExArrow.Stream.next(stream)
{:ok, tensor} = ExArrow.Nx.column_to_tensor(batch, "price")
mean_price = tensor |> Nx.mean() |> Nx.to_number()
# Build a multi-column batch from tensors (v0.4+)
tensors = %{
"price" => Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64}),
"volume" => Nx.tensor([10, 20, 30], type: {:s, 64})
}
{:ok, batch} = ExArrow.Nx.from_tensors(tensors)
Summary
Functions
column_to_tensor/2: Convert a named numeric or boolean column from batch to an Nx.Tensor.
from_tensor/3: Convert an Nx.Tensor to a single-column ExArrow.RecordBatch.
from_tensors/1: Convert a map of {column_name => Nx.Tensor} to a multi-column
ExArrow.RecordBatch in a single call.
to_tensors/1: Convert all numeric and boolean columns from batch to a map of
Nx.Tensor values.
Functions
@spec column_to_tensor(ExArrow.RecordBatch.t(), String.t()) :: {:ok, Nx.Tensor.t()} | {:error, String.t()}
Convert a named numeric or boolean column from batch to an Nx.Tensor.
The column's raw byte buffer is copied once from native Arrow memory into an
Elixir binary, then passed to Nx.from_binary/2. No list materialisation
occurs.
Boolean columns are extracted as one byte per element (0 or 1) and returned
as an {:u, 8} tensor.
Returns {:ok, tensor} or {:error, message}.
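Because both outcomes are tagged tuples, the call composes naturally with `with`. A minimal sketch follows, assuming the `ex_arrow` package can be installed under the requirement shown; `ColumnStats` is a hypothetical helper module, not part of ExArrow:

```elixir
# The ex_arrow version requirement is an assumption based on v0.6.0.
Mix.install([{:ex_arrow, "~> 0.6"}, {:nx, "~> 0.9"}])

defmodule ColumnStats do
  # Hypothetical helper: mean of one column, passing any
  # {:error, message} from ExArrow through unchanged.
  def mean(batch, column) do
    with {:ok, tensor} <- ExArrow.Nx.column_to_tensor(batch, column) do
      {:ok, tensor |> Nx.mean() |> Nx.to_number()}
    end
  end
end

# Build a small batch to exercise the helper.
prices = Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64})
{:ok, batch} = ExArrow.Nx.from_tensor(prices, "price")

{:ok, mean} = ColumnStats.mean(batch, "price")
{:error, reason} = ColumnStats.mean(batch, "no_such_col")
```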
Examples
# Extract an int64 column
{:ok, ids} = ExArrow.Nx.column_to_tensor(batch, "id")
Nx.type(ids) #=> {:s, 64}
Nx.shape(ids) #=> {1000}
# Extract a float64 column and compute the mean
{:ok, prices} = ExArrow.Nx.column_to_tensor(batch, "price")
Nx.mean(prices) |> Nx.to_number()
# Extract a boolean column
{:ok, flags} = ExArrow.Nx.column_to_tensor(batch, "active")
Nx.type(flags) #=> {:u, 8}
# Unsupported column type (here Utf8) returns an error
{:error, msg} = ExArrow.Nx.column_to_tensor(batch, "name")
msg #=> "unsupported column type for Nx: Utf8"
# Unknown column returns an error
{:error, msg} = ExArrow.Nx.column_to_tensor(batch, "no_such_col")
@spec from_tensor(Nx.Tensor.t(), String.t(), keyword()) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Convert an Nx.Tensor to a single-column ExArrow.RecordBatch.
The tensor's raw bytes are extracted via Nx.to_binary/1 and written into
a native Arrow array. For rank-2 or higher-rank tensors, all elements are
flattened into a single 1-D column (Nx.size(tensor) elements).
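Since the bytes come from Nx.to_binary/1, the flattened column should follow Nx's row-major element order. The sketch below uses Nx only to show what that order looks like and how the original shape can be restored after a round trip; the shape itself is not carried by the column, so keeping it around is the caller's job:

```elixir
Mix.install([{:nx, "~> 0.9"}])

matrix = Nx.tensor([[1, 2, 3], [4, 5, 6]], type: {:s, 64})

Nx.size(matrix) # 6, the number of rows the column would have

# Row-major flattening: the same byte layout Nx.to_binary/1 exposes.
flat = Nx.flatten(matrix)
Nx.to_list(flat) # [1, 2, 3, 4, 5, 6]

# Restore the original shape from a 1-D tensor of the same size.
restored = Nx.reshape(flat, Nx.shape(matrix))
```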
Supported Nx dtypes: {:s, 8|16|32|64}, {:u, 8|16|32|64},
{:f, 32|64}. Other dtypes (e.g. {:bf, 16}, {:c, 64}) return
{:error, "unsupported Nx dtype..."}.
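One way to work around an unsupported dtype is to cast to a supported one first. This is a sketch using Nx.as_type/2; whether widening is acceptable depends on your data, and the choice of {:f, 32} as the target is only an example:

```elixir
Mix.install([{:nx, "~> 0.9"}])

# A bfloat16 tensor, which the text above lists as unsupported.
half = Nx.tensor([1.0, 2.0, 3.0], type: {:bf, 16})

# Widen to a supported dtype before handing the tensor to from_tensor/3.
widened = Nx.as_type(half, {:f, 32})

Nx.type(widened)    # {:f, 32}
Nx.to_list(widened) # [1.0, 2.0, 3.0]
```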
Options
:as - when set to :boolean, the column is created as an Arrow Boolean array instead of UInt8. Only valid when the tensor dtype is {:u, 8}.
Returns {:ok, batch} or {:error, message}.
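Nx comparison functions already return {:u, 8} tensors of 0 and 1, so their results fit the as: :boolean option directly. A minimal sketch, assuming the `ex_arrow` package can be installed under the requirement shown; the column name "expensive" is made up for illustration:

```elixir
# The ex_arrow version requirement is an assumption based on v0.6.0.
Mix.install([{:ex_arrow, "~> 0.6"}, {:nx, "~> 0.9"}])

prices = Nx.tensor([1.0, 2.5, 4.0], type: {:f, 64})

# Element-wise comparison yields a {:u, 8} mask of 0s and 1s.
mask = Nx.greater(prices, 2.0)
Nx.type(mask)    # {:u, 8}
Nx.to_list(mask) # [0, 1, 1]

# Store the mask as an Arrow Boolean column rather than UInt8.
{:ok, batch} = ExArrow.Nx.from_tensor(mask, "expensive", as: :boolean)
```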
Examples
# Float64 tensor → RecordBatch
tensor = Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64})
{:ok, batch} = ExArrow.Nx.from_tensor(tensor, "weights")
ExArrow.RecordBatch.num_rows(batch) #=> 3
# Round-trip: tensor → batch → tensor
original = Nx.tensor([10, 20, 30], type: {:s, 64})
{:ok, batch} = ExArrow.Nx.from_tensor(original, "vals")
{:ok, recovered} = ExArrow.Nx.column_to_tensor(batch, "vals")
Nx.to_list(recovered) #=> [10, 20, 30]
# Boolean tensor → Arrow Boolean column
flags = Nx.tensor([1, 0, 1], type: {:u, 8})
{:ok, batch} = ExArrow.Nx.from_tensor(flags, "active", as: :boolean)
# Unsupported dtype
{:error, msg} = ExArrow.Nx.from_tensor(Nx.tensor([1, 2], type: {:bf, 16}), "x")
@spec from_tensors(%{required(String.t()) => Nx.Tensor.t()}) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Convert a map of {column_name => Nx.Tensor} to a multi-column
ExArrow.RecordBatch in a single call.
All tensors must have the same number of elements (Nx.size/1). For
rank-2 or higher-rank tensors the elements are flattened into a 1-D column.
Column order in the resulting batch is deterministic: columns are sorted
lexicographically by name. Supported dtypes are the same as from_tensor/3.
Returns {:ok, batch} or {:error, message}.
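The size rule and the column ordering can be checked on the Elixir side before calling from_tensors/1. The sketch below uses a hypothetical `BatchPrecheck` module and assumes that "lexicographically" matches Elixir's default string ordering (byte-wise comparison), which the text does not spell out:

```elixir
Mix.install([{:nx, "~> 0.9"}])

defmodule BatchPrecheck do
  # Hypothetical helper, not part of ExArrow. Returns the column order
  # to expect, or an error when element counts differ.
  def check(tensors) when is_map(tensors) do
    sizes = tensors |> Map.values() |> Enum.map(&Nx.size/1) |> Enum.uniq()

    if length(sizes) <= 1 do
      {:ok, tensors |> Map.keys() |> Enum.sort()}
    else
      {:error, "tensors differ in element count: #{inspect(sizes)}"}
    end
  end
end

good = %{
  "qty" => Nx.tensor([10, 20, 30], type: {:s, 32}),
  # Rank-2 tensor with 3 elements in total; it counts as 3 rows.
  "price" => Nx.tensor([[1.5], [2.5], [3.5]], type: {:f, 64})
}

BatchPrecheck.check(good) # {:ok, ["price", "qty"]}

bad = %{"a" => Nx.tensor([1, 2]), "b" => Nx.tensor([1, 2, 3])}
{:error, reason} = BatchPrecheck.check(bad)
```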
Examples
tensors = %{
"price" => Nx.tensor([1.5, 2.5, 3.5], type: {:f, 64}),
"qty" => Nx.tensor([10, 20, 30], type: {:s, 32})
}
{:ok, batch} = ExArrow.Nx.from_tensors(tensors)
ExArrow.RecordBatch.num_rows(batch) #=> 3
# Round-trip: all columns
{:ok, recovered} = ExArrow.Nx.to_tensors(batch)
Nx.to_list(recovered["price"]) #=> [1.5, 2.5, 3.5]
# Mismatched sizes return an error
bad = %{"a" => Nx.tensor([1, 2]), "b" => Nx.tensor([1, 2, 3])}
{:error, _} = ExArrow.Nx.from_tensors(bad)
@spec to_tensors(ExArrow.RecordBatch.t()) :: {:ok, %{required(String.t()) => Nx.Tensor.t()}} | {:error, String.t()}
Convert all numeric and boolean columns from batch to a map of
Nx.Tensor values.
Columns of unsupported types (Utf8, Timestamp, etc.) are silently skipped.
Returns {:ok, %{column_name => tensor}} or {:error, message}.
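Because skipped columns produce no error, it can be worth comparing the result against the columns you expected. A small sketch in plain Elixir follows; `SkippedColumns` and the list of expected names are hypothetical, and the placeholder atoms stand in for real tensors:

```elixir
defmodule SkippedColumns do
  # Hypothetical helper: names that were expected but are absent from
  # the map returned by to_tensors/1.
  def missing(tensors, expected_names) when is_map(tensors) do
    Enum.reject(expected_names, &Map.has_key?(tensors, &1))
  end
end

# Placeholder values stand in for the tensors of a real result map.
tensors = %{"price" => :tensor_placeholder, "qty" => :tensor_placeholder}

SkippedColumns.missing(tensors, ["price", "qty", "name"]) # ["name"]
```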
Example
{:ok, tensors} = ExArrow.Nx.to_tensors(batch)
# tensors is a map: %{"price" => #Nx.Tensor<...>, "qty" => #Nx.Tensor<...>}
tensors["price"] |> Nx.sort()
Map.keys(tensors) # only numeric/boolean columns are present