ExArrow.Nx (ex_arrow v0.6.0)

Bridge between ExArrow and Nx tensors.

Converts numeric and boolean Arrow columns to Nx.Tensor values (and back) by copying the raw byte buffer once from native Arrow memory into an Elixir binary, then handing it directly to Nx.from_binary/2. No intermediate list materialisation occurs.
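
The binary-to-tensor step can be tried without an Arrow batch. The following is a minimal sketch, not ExArrow's internal code: the hand-built `bytes` binary is an assumption standing in for the buffer copied out of an Int64 column, and only Nx.from_binary/2 is exercised.

```elixir
Mix.install([{:nx, "~> 0.9"}])

# Stand-in for the bytes copied out of an Arrow Int64 column:
# three 64-bit signed integers in native byte order.
bytes = <<10::64-signed-native, 20::64-signed-native, 30::64-signed-native>>

# Nx reinterprets the binary as a rank-1 tensor; no Elixir list is built.
tensor = Nx.from_binary(bytes, {:s, 64})

IO.inspect(Nx.shape(tensor))    # {3}
IO.inspect(Nx.to_list(tensor))  # [10, 20, 30]
```

The dtype passed to Nx.from_binary/2 decides how the bytes are sliced into elements, which is why each supported Arrow type maps to exactly one Nx dtype.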

Requires {:nx, "~> 0.9"} in your mix.exs dependencies. When Nx is absent, every function returns {:error, "Nx is not available..."}.

Supported column types

Arrow type    Nx dtype
Int8          {:s, 8}
Int16         {:s, 16}
Int32         {:s, 32}
Int64         {:s, 64}
UInt8         {:u, 8}
UInt16        {:u, 16}
UInt32        {:u, 32}
UInt64        {:u, 64}
Float32       {:f, 32}
Float64       {:f, 64}
Boolean       {:u, 8}

Arrow Boolean columns are materialised as one byte per element (0 or 1) and converted to an {:u, 8} Nx tensor. The reverse path accepts {:u, 8} tensors and builds an Arrow Boolean column when the as: :boolean option is passed to from_tensor/3.
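
The one-byte-per-element layout matches what Nx comparison functions produce. As a sketch using only Nx (the `prices` tensor and the threshold are made up for illustration), a mask built this way has dtype {:u, 8} and holds only 0 and 1, so it is the kind of tensor the as: :boolean option described above is meant for.

```elixir
Mix.install([{:nx, "~> 0.9"}])

# Illustrative data; in practice this could come from column_to_tensor/2.
prices = Nx.tensor([1.0, 2.5, 0.5, 3.0], type: {:f, 64})

# Comparison ops in Nx return {:u, 8} tensors holding 0 or 1.
mask = Nx.greater(prices, 1.0)

IO.inspect(Nx.type(mask))                       # {:u, 8}
IO.inspect(Nx.to_list(mask))                    # [0, 1, 0, 1]

# Counting the true values is a plain sum over the mask.
IO.inspect(mask |> Nx.sum() |> Nx.to_number())  # 2
```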

Columns of other types (Utf8, Timestamp, etc.) are not supported for direct buffer extraction and return {:error, "unsupported column type..."}. to_tensors/1 silently skips unsupported columns.

Null handling

Arrow validity (null) bitmaps are not exposed through this API.

For numeric columns, null slots have unspecified backing bytes in Arrow memory, so the extracted buffer is not meaningful at null positions. For Boolean columns, ExArrow explicitly checks the null bitmap and emits 0 for null slots.

If you need to distinguish nulls from real zero values, inspect the original batch (full null support may be added in a future release).
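
If a validity mask can be obtained from the original batch by other means, it can be applied on the Nx side. The sketch below assumes such a mask already exists: `valid` is a hypothetical {:u, 8} tensor (1 = present, 0 = null) that this module does not provide, and `values` stands in for an extracted column.

```elixir
Mix.install([{:nx, "~> 0.9"}])

# `values` stands in for a tensor returned by column_to_tensor/2.
# `valid` is a HYPOTHETICAL mask (1 = present, 0 = null) that you would
# have to derive from the original batch yourself.
values = Nx.tensor([1.5, 0.0, 3.5, 0.0], type: {:f, 64})
valid = Nx.tensor([1, 1, 1, 0], type: {:u, 8})

# Replace null slots with NaN so they cannot be mistaken for real zeros.
nan = Nx.Constants.nan({:f, 64})
marked = Nx.select(valid, values, nan)
IO.inspect(Nx.to_list(Nx.is_nan(marked)))  # [0, 0, 0, 1]

# Mean over the valid slots only: zero out nulls, then divide by the
# number of valid elements rather than the tensor size.
count = valid |> Nx.sum() |> Nx.to_number()
total = Nx.select(valid, values, 0.0) |> Nx.sum() |> Nx.to_number()
IO.inspect(total / count)
```

Here the second element is a genuine 0.0 and stays in the mean, while the fourth is treated as null and excluded.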

Public API

Function            Direction    Description
column_to_tensor/2  Arrow → Nx   Extract one named numeric/boolean column as an Nx.Tensor
to_tensors/1        Arrow → Nx   Extract all numeric/boolean columns as %{name => Nx.Tensor}
from_tensor/3       Nx → Arrow   Single tensor → single-column RecordBatch
from_tensors/1      Nx → Arrow   Map of tensors → multi-column RecordBatch (single NIF call)

Quick example

# Read a batch, extract one column as a tensor
{:ok, stream}  = ExArrow.Parquet.Reader.from_file("/data/trades.parquet")
batch          = ExArrow.Stream.next(stream)
{:ok, tensor}  = ExArrow.Nx.column_to_tensor(batch, "price")
mean_price     = tensor |> Nx.mean() |> Nx.to_number()

# Build a multi-column batch from tensors (v0.4+)
tensors = %{
  "price"  => Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64}),
  "volume" => Nx.tensor([10, 20, 30],     type: {:s, 64})
}
{:ok, batch} = ExArrow.Nx.from_tensors(tensors)

Summary

Functions

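column_to_tensor(batch, col_name)

Convert a named numeric or boolean column from batch to an Nx.Tensor.

from_tensor(tensor, col_name, opts \\ [])

Convert an Nx.Tensor to a single-column ExArrow.RecordBatch.

from_tensors(tensors)

Convert a map of {column_name => Nx.Tensor} to a multi-column ExArrow.RecordBatch in a single call.

to_tensors(batch)

Convert all numeric and boolean columns from batch to a map of Nx.Tensor values.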

Functions

column_to_tensor(batch, col_name)

@spec column_to_tensor(ExArrow.RecordBatch.t(), String.t()) ::
  {:ok, Nx.Tensor.t()} | {:error, String.t()}

Convert a named numeric or boolean column from batch to an Nx.Tensor.

The column's raw byte buffer is copied once from native Arrow memory into an Elixir binary, then passed to Nx.from_binary/2. No list materialisation occurs.

Boolean columns are extracted as one byte per element (0 or 1) and returned as an {:u, 8} tensor.

Returns {:ok, tensor} or {:error, message}.

Examples

# Extract an int64 column
{:ok, ids} = ExArrow.Nx.column_to_tensor(batch, "id")
Nx.type(ids)   #=> {:s, 64}
Nx.shape(ids)  #=> {1000}

# Extract a float64 column and compute the mean
{:ok, prices} = ExArrow.Nx.column_to_tensor(batch, "price")
Nx.mean(prices) |> Nx.to_number()

# Extract a boolean column
{:ok, flags} = ExArrow.Nx.column_to_tensor(batch, "active")
Nx.type(flags)  #=> {:u, 8}

# Non-numeric column returns an error
{:error, msg} = ExArrow.Nx.column_to_tensor(batch, "name")
msg #=> "unsupported column type for Nx: Utf8"

# Unknown column returns an error
{:error, msg} = ExArrow.Nx.column_to_tensor(batch, "no_such_col")

from_tensor(tensor, col_name, opts \\ [])

@spec from_tensor(Nx.Tensor.t(), String.t(), keyword()) ::
  {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}

Convert an Nx.Tensor to a single-column ExArrow.RecordBatch.

The tensor's raw bytes are extracted via Nx.to_binary/1 and written into a native Arrow array. For rank-2 or higher-rank tensors, all elements are flattened into a single 1-D column (Nx.size(tensor) elements).
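
Since the bytes come from Nx.to_binary/1, the flattened element order should follow Nx's row-major layout. The sketch below checks that layout with Nx alone; that the resulting Arrow column has the same order is an inference from the description above, not something verified against ExArrow here.

```elixir
Mix.install([{:nx, "~> 0.9"}])

matrix = Nx.tensor([[1, 2, 3], [4, 5, 6]], type: {:s, 32})

# Number of elements the flattened column would hold.
IO.inspect(Nx.size(matrix))  # 6

# Nx.to_binary/1 emits elements in row-major order, so a 2x3 tensor
# yields the same bytes as its flattened 6-element form.
same_bytes? = Nx.to_binary(matrix) == Nx.to_binary(Nx.flatten(matrix))
IO.inspect(same_bytes?)  # true

IO.inspect(matrix |> Nx.flatten() |> Nx.to_list())  # [1, 2, 3, 4, 5, 6]
```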

Supported Nx dtypes: {:s, 8|16|32|64}, {:u, 8|16|32|64}, {:f, 32|64}. Other dtypes (e.g. {:bf, 16}, {:c, 64}) return {:error, "unsupported Nx dtype..."}.
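
One way to avoid the unsupported-dtype error is to check or cast the tensor before conversion. The `DtypeCheck` module below is a hypothetical helper, not part of ExArrow; its dtype list is copied from the sentence above, and casting other float dtypes to {:f, 32} is an illustrative policy choice.

```elixir
Mix.install([{:nx, "~> 0.9"}])

defmodule DtypeCheck do
  # Dtypes listed as supported in the documentation above.
  @supported [
    {:s, 8}, {:s, 16}, {:s, 32}, {:s, 64},
    {:u, 8}, {:u, 16}, {:u, 32}, {:u, 64},
    {:f, 32}, {:f, 64}
  ]

  # Hypothetical helper: pass supported tensors through, cast other
  # float-like dtypes (e.g. {:bf, 16}) to {:f, 32}, reject the rest.
  def prepare(tensor) do
    type = Nx.type(tensor)

    cond do
      type in @supported -> {:ok, tensor}
      elem(type, 0) in [:bf, :f] -> {:ok, Nx.as_type(tensor, {:f, 32})}
      true -> {:error, "unsupported Nx dtype: #{inspect(type)}"}
    end
  end
end

{:ok, casted} = DtypeCheck.prepare(Nx.tensor([1, 2], type: {:bf, 16}))
IO.inspect(Nx.type(casted))  # {:f, 32}

# Complex tensors have no supported target dtype here, so they are rejected.
IO.inspect(DtypeCheck.prepare(Nx.complex(1.0, 2.0)))
```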

Options

  • :as — when set to :boolean, the column is created as an Arrow Boolean array instead of UInt8. Only valid when the tensor dtype is {:u, 8}.

Returns {:ok, batch} or {:error, message}.

Examples

# Float64 tensor → RecordBatch
tensor = Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64})
{:ok, batch} = ExArrow.Nx.from_tensor(tensor, "weights")
ExArrow.RecordBatch.num_rows(batch)  #=> 3

# Round-trip: tensor → batch → tensor
original = Nx.tensor([10, 20, 30], type: {:s, 64})
{:ok, batch}     = ExArrow.Nx.from_tensor(original, "vals")
{:ok, recovered} = ExArrow.Nx.column_to_tensor(batch, "vals")
Nx.to_list(recovered)  #=> [10, 20, 30]

# Boolean tensor → Arrow Boolean column
flags = Nx.tensor([1, 0, 1], type: {:u, 8})
{:ok, batch} = ExArrow.Nx.from_tensor(flags, "active", as: :boolean)

# Unsupported dtype
{:error, msg} = ExArrow.Nx.from_tensor(Nx.tensor([1, 2], type: {:bf, 16}), "x")

from_tensors(tensors)

@spec from_tensors(%{required(String.t()) => Nx.Tensor.t()}) ::
  {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}

Convert a map of {column_name => Nx.Tensor} to a multi-column ExArrow.RecordBatch in a single call.

All tensors must have the same number of elements (Nx.size/1). For rank-2 or higher-rank tensors the elements are flattened into a 1-D column.

Column order in the resulting batch is deterministic: columns are sorted lexicographically by name. Supported dtypes are the same as from_tensor/3.
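
Both constraints can be checked on the Elixir side before the call. This is a sketch with made-up data: it assumes the library's lexicographic ordering agrees with Enum.sort/1 on the column names, and it compares element counts with Nx.size/1 as described above.

```elixir
Mix.install([{:nx, "~> 0.9"}])

tensors = %{
  "qty" => Nx.tensor([10, 20, 30], type: {:s, 32}),
  "price" => Nx.tensor([1.5, 2.5, 3.5], type: {:f, 64}),
  "grid" => Nx.tensor([[1, 2], [3, 4]], type: {:s, 32})
}

# Expected column order: names sorted lexicographically.
IO.inspect(tensors |> Map.keys() |> Enum.sort())  # ["grid", "price", "qty"]

# Element counts per column; a rank-2 tensor counts all of its elements.
sizes = Map.new(tensors, fn {name, t} -> {name, Nx.size(t)} end)
IO.inspect(sizes)  # %{"grid" => 4, "price" => 3, "qty" => 3}

# More than one distinct size means the map would be rejected.
consistent? = sizes |> Map.values() |> Enum.uniq() |> length() == 1
IO.inspect(consistent?)  # false
```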

Returns {:ok, batch} or {:error, message}.

Examples

tensors = %{
  "price" => Nx.tensor([1.5, 2.5, 3.5], type: {:f, 64}),
  "qty"   => Nx.tensor([10, 20, 30],     type: {:s, 32})
}
{:ok, batch} = ExArrow.Nx.from_tensors(tensors)
ExArrow.RecordBatch.num_rows(batch)  #=> 3

# Round-trip: all columns
{:ok, recovered} = ExArrow.Nx.to_tensors(batch)
Nx.to_list(recovered["price"])  #=> [1.5, 2.5, 3.5]

# Mismatched sizes return an error
bad = %{"a" => Nx.tensor([1, 2]), "b" => Nx.tensor([1, 2, 3])}
{:error, _} = ExArrow.Nx.from_tensors(bad)

to_tensors(batch)

@spec to_tensors(ExArrow.RecordBatch.t()) ::
  {:ok, %{required(String.t()) => Nx.Tensor.t()}} | {:error, String.t()}

Convert all numeric and boolean columns from batch to a map of Nx.Tensor values.

Columns of unsupported types (Utf8, Timestamp, etc.) are silently skipped.

Returns {:ok, %{column_name => tensor}} or {:error, message}.

Example

{:ok, tensors} = ExArrow.Nx.to_tensors(batch)
# tensors is a map: %{"price" => #Nx.Tensor<...>, "qty" => #Nx.Tensor<...>}
tensors["price"] |> Nx.sort()
Map.keys(tensors)  # only numeric/boolean columns are present
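
A common next step is to combine several extracted columns into one feature matrix. The sketch below uses only Nx; the `tensors` map is a hand-built stand-in for the result of to_tensors/1, and the column list and target dtype are illustrative choices.

```elixir
Mix.install([{:nx, "~> 0.9"}])

# Stand-in for the map returned by to_tensors/1.
tensors = %{
  "price" => Nx.tensor([1.5, 2.5, 3.5], type: {:f, 64}),
  "qty" => Nx.tensor([10, 20, 30], type: {:s, 32})
}

# Fix the column order explicitly; map iteration order is not a contract.
columns = ["price", "qty"]

# Cast each column to a common dtype, then stack along axis 1 to get
# one row per record and one column per feature.
features =
  columns
  |> Enum.map(&Nx.as_type(Map.fetch!(tensors, &1), {:f, 64}))
  |> Nx.stack(axis: 1)

IO.inspect(Nx.shape(features))    # {3, 2}
IO.inspect(Nx.to_list(features))  # [[1.5, 10.0], [2.5, 20.0], [3.5, 30.0]]
```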