ExArrow (ex_arrow v0.6.0)

Apache Arrow support for the BEAM: IPC, Flight, ADBC, and data interchange.

ExArrow keeps Arrow data in native (Rust) memory and exposes opaque handles on the Elixir side. Copying to the BEAM heap happens only when explicitly requested.

Arrow hierarchy

Arrow organises columnar data in a strict hierarchy:

  • Array — a single column of typed values (the leaf node).
  • RecordBatch — a collection of Arrays sharing a row count and schema.
  • Table — a logical table backed by one or more RecordBatches.
  • Stream — a lazy sequence of RecordBatches (used by IPC, Flight, ADBC, and Parquet).
  • Schema — metadata describing field names, types, and nullability.
  • Field — one column's metadata within a Schema.

Data flows through these levels: a Stream yields RecordBatches, a RecordBatch exposes its Schema and row count, and the Schema lists its Fields.
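
As a rough illustration of that flow, here is a toy model in plain Elixir. The HierarchySketch module and its map shapes are hypothetical stand-ins invented for this sketch; ExArrow's real handles are opaque references into native memory, not maps.

```elixir
# Toy model only: plain maps stand in for ExArrow's opaque native handles.
defmodule HierarchySketch do
  # A Field is one column's metadata.
  def field(name, type, nullable \\ true), do: %{name: name, type: type, nullable: nullable}

  # A Schema is an ordered list of Fields.
  def schema(fields), do: %{fields: fields}

  # A RecordBatch pairs a Schema with columns (the Arrays) of equal length.
  # The match raises if the columns disagree on row count.
  def batch(schema, columns) do
    [num_rows] = columns |> Enum.map(&length/1) |> Enum.uniq()
    %{schema: schema, columns: columns, num_rows: num_rows}
  end
end

schema =
  HierarchySketch.schema([
    HierarchySketch.field("x", :s64),
    HierarchySketch.field("y", :utf8)
  ])

# A lazy stream of two batches; nothing is built until it is consumed.
stream =
  Stream.map([{[1, 2], ["a", "b"]}, {[3], ["c"]}], fn {xs, ys} ->
    HierarchySketch.batch(schema, [xs, ys])
  end)

# Walk the levels: Stream -> RecordBatch -> Schema -> Fields.
for batch <- stream do
  names = Enum.map(batch.schema.fields, & &1.name)
  IO.puts("rows=#{batch.num_rows} fields=#{Enum.join(names, ",")}")
end
```

A Table would correspond to collecting all batches of such a stream under one shared schema.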

Data interchange (v0.6+)

ExArrow serves as a data interchange layer between Arrow and the wider Elixir ecosystem. The top-level functions from_dataframe/1, to_dataframe/1, from_nx/2, and to_nx/1 delegate to focused bridge modules: ExArrow.DataFrame, ExArrow.Explorer, and ExArrow.Nx.

Public API outline

Data interchange

Core handles (opaque references)

IPC (ExArrow.IPC)

Flight (ExArrow.Flight)

ADBC (ExArrow.ADBC)

Schema mapping

Errors

  • ExArrow.Error – structured exception with code, message, details

Summary

Functions

from_dataframe(df)

@spec from_dataframe(Explorer.DataFrame.t()) ::
  {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}

Convert an Explorer.DataFrame to a single ExArrow.RecordBatch.

The dataframe is serialised to Arrow IPC and read back as native batches. When Explorer splits a large dataframe into multiple IPC batches, they are concatenated into one batch, so the full row count and all values are preserved.

Schema field names and value types are preserved. Nullability is not guaranteed to survive the Explorer IPC round-trip: Explorer does not distinguish nullable from non-nullable columns, so columns may be reported as nullable regardless of the source data.

Returns {:ok, batch} or {:error, message}.

Examples

df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
{:ok, batch} = ExArrow.from_dataframe(df)
ExArrow.RecordBatch.num_rows(batch)  #=> 3
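
The example above pattern-matches on {:ok, batch} and would raise a MatchError on failure. A minimal sketch of handling both return shapes from the @spec, assuming :ex_arrow and :explorer are available as dependencies:

```elixir
# Assumes :ex_arrow and :explorer are available as dependencies.
df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])

case ExArrow.from_dataframe(df) do
  {:ok, batch} ->
    # batch is an opaque handle; ask the native side for its row count.
    IO.puts("converted #{ExArrow.RecordBatch.num_rows(batch)} rows")

  {:error, message} ->
    # message is a String.t() according to the @spec.
    IO.puts("conversion failed: #{message}")
end
```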

from_nx(tensor, opts \\ [])

@spec from_nx(
  Nx.Tensor.t(),
  keyword()
) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}

Convert an Nx.Tensor to an ExArrow.RecordBatch.

Supported dtypes: {:u, 8}, {:s, 64}, {:f, 32}, {:f, 64}, and all other integer/float dtypes supported by ExArrow.Schema.Mapper.

Rank-1 tensors produce a single-column batch named "value". Rank-2 tensors produce an N-column batch with columns named "c0", "c1", ... "c{N-1}", where N is the size of the second axis. Tensors of rank > 2 are not supported.

Options

  • :as — when set to :boolean, the column is created as an Arrow Boolean array. Only valid for {:u, 8} tensors. Default: :numeric.
  • :name — column name for rank-1 tensors. Default: "value".

Returns {:ok, batch} or {:error, message}.

Examples

# Rank-1 s64 tensor
{:ok, batch} = ExArrow.from_nx(Nx.tensor([1, 2, 3], type: {:s, 64}))
ExArrow.RecordBatch.num_rows(batch)  #=> 3

# Rank-2 f64 tensor → 3 columns (c0, c1, c2), 2 rows
{:ok, batch} = ExArrow.from_nx(Nx.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], type: {:f, 64}))

# Boolean tensor
{:ok, batch} = ExArrow.from_nx(Nx.tensor([1, 0, 1], type: {:u, 8}), as: :boolean)
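
The :name option and the rank limit are not covered by the examples above. The following sketch exercises both, assuming :ex_arrow and :nx are available as dependencies. That an unsupported rank surfaces as {:error, message} rather than an exception is an assumption drawn from the @spec, not something the text states outright.

```elixir
# Assumes :ex_arrow and :nx are available as dependencies.

# :name overrides the default "value" column name for a rank-1 tensor.
scores = Nx.tensor([0.5, 1.5, 2.5], type: {:f, 32})
{:ok, batch} = ExArrow.from_nx(scores, name: "score")
IO.puts("rows: #{ExArrow.RecordBatch.num_rows(batch)}")

# Rank-3 input is documented as unsupported. Matching on both tuple shapes
# avoids a MatchError; the {:error, message} result is assumed from the @spec.
cube = Nx.iota({2, 2, 2}, type: {:s, 64})

case ExArrow.from_nx(cube) do
  {:ok, _batch} -> IO.puts("unexpectedly converted a rank-3 tensor")
  {:error, message} -> IO.puts("rejected: #{message}")
end
```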

native_version()

@spec native_version() :: String.t()

Returns the version of the native NIF crate. Calling it is a quick way to verify that the NIF has loaded.

to_dataframe(batch_or_stream)

@spec to_dataframe(ExArrow.RecordBatch.t() | ExArrow.Stream.t()) ::
  {:ok, Explorer.DataFrame.t()} | {:error, String.t()}

Convert an ExArrow.RecordBatch or ExArrow.Stream to an Explorer.DataFrame.

Column names, row count, and values are preserved. See from_dataframe/1 for a note on nullability through the Explorer round-trip.

Returns {:ok, dataframe} or {:error, message}.

Examples

df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
{:ok, batch} = ExArrow.from_dataframe(df)
{:ok, df2} = ExArrow.to_dataframe(batch)
Explorer.DataFrame.n_rows(df2)  #=> 3
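
Because both bridges share the RecordBatch handle, a tensor can be routed into a dataframe through Arrow. A minimal sketch of that path, assuming :ex_arrow, :nx, and :explorer are available as dependencies; it uses only the functions documented on this page plus Explorer.DataFrame.names/1 and n_rows/1:

```elixir
# Assumes :ex_arrow, :nx and :explorer are available as dependencies.
tensor = Nx.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], type: {:f, 64})

with {:ok, batch} <- ExArrow.from_nx(tensor),
     {:ok, df} <- ExArrow.to_dataframe(batch) do
  # Per the from_nx/2 naming rule, the columns should be c0, c1, c2.
  IO.inspect(Explorer.DataFrame.names(df), label: "columns")
  IO.inspect(Explorer.DataFrame.n_rows(df), label: "rows")
else
  {:error, message} -> IO.puts("conversion failed: #{message}")
end
```

Using `with` keeps the happy path linear and funnels an error from either conversion into one clause.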

to_nx(batch)

@spec to_nx(ExArrow.RecordBatch.t()) :: {:ok, Nx.Tensor.t()} | {:error, String.t()}

Convert an ExArrow.RecordBatch to an Nx.Tensor.

For a single-column numeric/boolean batch, returns a rank-1 tensor. For a multi-column batch with uniform numeric dtype, returns a rank-2 tensor where columns become the second axis.

Returns {:ok, tensor} or {:error, message}.

Examples

# Single column → rank-1
{:ok, batch} = ExArrow.from_nx(Nx.tensor([1, 2, 3], type: {:s, 64}))
{:ok, tensor} = ExArrow.to_nx(batch)
Nx.shape(tensor)  #=> {3}

# Multi-column uniform → rank-2
{:ok, batch} = ExArrow.from_nx(Nx.tensor([[1, 2], [3, 4]], type: {:s, 64}))
{:ok, tensor} = ExArrow.to_nx(batch)
Nx.shape(tensor)  #=> {2, 2}
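
A batch that mixes numeric and string columns does not meet the uniform-dtype requirement described above. The sketch below handles that case defensively, assuming :ex_arrow, :nx, and :explorer are available as dependencies. It also assumes, based on the @spec, that such a batch yields {:error, message} rather than raising.

```elixir
# Assumes :ex_arrow, :nx and :explorer are available as dependencies.

# x is an integer column and y is a string column, so the dtypes differ.
df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
{:ok, batch} = ExArrow.from_dataframe(df)

case ExArrow.to_nx(batch) do
  {:ok, tensor} ->
    IO.inspect(Nx.shape(tensor), label: "shape")

  {:error, message} ->
    # Expected branch under the assumption stated above.
    IO.puts("cannot convert to a tensor: #{message}")
end
```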