ExArrow (ex_arrow v0.6.0)
Apache Arrow support for the BEAM: IPC, Flight, ADBC, and data interchange.
ExArrow keeps Arrow data in native (Rust) memory and exposes opaque handles on the Elixir side. Copying to the BEAM heap happens only when explicitly requested.
Arrow hierarchy
Arrow organises columnar data in a strict hierarchy:
- Array — a single column of typed values (the leaf node).
- RecordBatch — a collection of Arrays sharing a row count and schema.
- Table — a logical table backed by one or more RecordBatches.
- Stream — a lazy sequence of RecordBatches (used by IPC, Flight, ADBC, and Parquet).
- Schema — metadata describing field names, types, and nullability.
- Field — one column's metadata within a Schema.
Data flows through these levels: a Stream yields RecordBatches, each RecordBatch exposes its Schema and row count, and the Schema lists its Fields.
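The hierarchy can be illustrated with a toy model in plain Elixir. HierarchySketch and its functions are hypothetical and are not ExArrow's API: ExArrow's real Schema, RecordBatch, and Stream values are opaque handles to native memory. The model only shows how the levels nest: a lazy Stream yields RecordBatches, each batch carries its Schema and row count, and the Schema lists its Fields.

```elixir
defmodule HierarchySketch do
  # Hypothetical toy model of the Arrow hierarchy built from plain maps.
  # ExArrow itself keeps this data in native memory behind opaque handles.

  # Field: one column's name, type, and nullability.
  def field(name, type, nullable?), do: %{name: name, type: type, nullable: nullable?}

  # Schema: the list of Fields describing every column.
  def schema(fields), do: %{fields: fields}

  # RecordBatch: Arrays (modelled here as plain lists) sharing one schema and row count.
  def batch(schema, columns) do
    num_rows =
      case columns do
        [] -> 0
        [first | _] -> length(first)
      end

    # Every column in a batch must have the same length.
    if Enum.any?(columns, &(length(&1) != num_rows)) do
      raise ArgumentError, "all columns in a batch must have the same length"
    end

    %{schema: schema, columns: columns, num_rows: num_rows}
  end

  # Stream: a lazy sequence of batches, built only when enumerated.
  def stream(schema, chunks), do: Stream.map(chunks, &batch(schema, &1))
end
```

Enumerating HierarchySketch.stream/2 over two chunks of two rows and one row yields two batches totalling three rows, and each batch still points at the same Schema and its Fields.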
Data interchange (v0.6+)
ExArrow serves as a universal data interchange layer between Arrow and the wider Elixir ecosystem:
- from_dataframe/1, to_dataframe/1 – Explorer DataFrame <-> Arrow
- from_nx/1, to_nx/1 – Nx Tensor <-> Arrow
These top-level functions delegate to focused bridge modules: ExArrow.DataFrame, ExArrow.Explorer, and ExArrow.Nx.
Public API outline
Data interchange
- ExArrow.from_dataframe/1, to_dataframe/1 – Explorer <-> Arrow
- ExArrow.from_nx/1, to_nx/1 – Nx <-> Arrow
- ExArrow.DataFrame.from_arrow/1, to_arrow/1 – DataFrame-oriented API
Core handles (opaque references)
- ExArrow.Schema – schema metadata (fields)
- ExArrow.Field – field name, type, and nullability
- ExArrow.Array – column array handle
- ExArrow.RecordBatch – batch of columns with a shared row count
- ExArrow.Table – table with schema and batches
- ExArrow.Stream – stream of record batches (IPC/Flight)
IPC (ExArrow.IPC)
- ExArrow.IPC.Reader.from_binary/1, from_file/1 – read a stream from a binary or file
- ExArrow.IPC.Writer.to_binary/2, to_file/3 – write batches to a binary or file
Flight (ExArrow.Flight)
- ExArrow.Flight.Client – connect, do_get, do_put
- ExArrow.Flight.Server – minimal server (e.g. echo)
ADBC (ExArrow.ADBC)
- ExArrow.ADBC.Database.open/1 – open a database (driver path or options)
- ExArrow.ADBC.Connection.open/1 – open a connection from a database
- ExArrow.ADBC.Statement – new(conn, sql) or new(conn, sql, bind: batch); execute runs the statement and returns a stream; set_sql and bind allow a statement to be reused or rebound
Schema mapping
- ExArrow.Schema.Mapper – bidirectional Arrow <-> Explorer/Nx type mapping
Errors
- ExArrow.Error – structured exception with code, message, and details
Summary
Functions
- from_dataframe – Convert an Explorer.DataFrame to a single ExArrow.RecordBatch.
- from_nx – Convert an Nx.Tensor to an ExArrow.RecordBatch.
- native_version – Returns the native NIF crate version. Useful for verifying that the NIF has loaded.
- to_dataframe – Convert an ExArrow.RecordBatch or ExArrow.Stream to an Explorer.DataFrame.
- to_nx – Convert an ExArrow.RecordBatch to an Nx.Tensor.
Functions
@spec from_dataframe(Explorer.DataFrame.t()) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Convert an Explorer.DataFrame to a single ExArrow.RecordBatch.
The dataframe is serialised to Arrow IPC and read back as native batches. When Explorer splits a large dataframe into multiple IPC batches, they are concatenated into one batch, so the full row count and all values are preserved.
Schema field names and value types are preserved. Nullability is not guaranteed to survive the Explorer IPC round-trip: Explorer does not distinguish nullable from non-nullable columns, so columns may be reported as nullable regardless of the source data.
Returns {:ok, batch} or {:error, message}.
Examples
df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
{:ok, batch} = ExArrow.from_dataframe(df)
ExArrow.RecordBatch.num_rows(batch) #=> 3
@spec from_nx(Nx.Tensor.t(), keyword()) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Convert an Nx.Tensor to an ExArrow.RecordBatch.
Supported dtypes: {:u, 8}, {:s, 64}, {:f, 32}, {:f, 64}, and all other integer/float dtypes supported by ExArrow.Schema.Mapper.
Rank-1 tensors produce a single-column batch named "value" (overridable with the :name option). Rank-2 tensors produce an N-column batch with columns named "c0", "c1", ..., "c{N-1}", where N is the size of the second axis. Tensors of rank > 2 are not supported.
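The naming rule above can be restated as a pure function of the tensor's shape. NxColumnNames.for_shape/2 below is a hypothetical helper, not part of ExArrow; it only mirrors the documented rule so the expected column names for an Nx shape tuple can be checked without loading the NIF. Treating rank-0 tensors as unsupported is an assumption, since only rank > 2 is documented as unsupported.

```elixir
defmodule NxColumnNames do
  # Hypothetical helper, not part of ExArrow: it mirrors the documented
  # column-naming rule of ExArrow.from_nx for an Nx-style shape tuple.
  def for_shape(shape, name \\ "value")

  # Rank-1 {rows}: one column, named "value" unless :name overrides it.
  def for_shape({_rows}, name), do: {:ok, [name]}

  # Rank-2 {rows, n}: n columns named "c0" .. "c{n-1}".
  # The explicit //1 step keeps the range empty when n is 0.
  def for_shape({_rows, n}, _name) when is_integer(n) do
    {:ok, Enum.map(0..(n - 1)//1, fn i -> "c#{i}" end)}
  end

  # Rank > 2 is documented as unsupported; rejecting rank 0 too is an assumption.
  def for_shape(shape, _name) when is_tuple(shape) do
    {:error, "unsupported tensor rank #{tuple_size(shape)}"}
  end
end
```

For the {2, 3} tensor in the examples further down, this yields {:ok, ["c0", "c1", "c2"]}.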
Options
- :as – when set to :boolean, the column is created as an Arrow Boolean array. Only valid for {:u, 8} tensors. Default: :numeric.
- :name – column name for rank-1 tensors. Default: "value".
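To make the two options concrete, here is a hypothetical validator, FromNxOptsSketch.validate/2, that is not part of ExArrow and only restates the documented rules: :as defaults to :numeric, :boolean is accepted only for {:u, 8} tensors, and :name defaults to "value". Rejecting any other :as value is an assumption.

```elixir
defmodule FromNxOptsSketch do
  # Hypothetical validator, not part of ExArrow: restates the documented
  # :as and :name options of ExArrow.from_nx for a dtype such as {:u, 8}.
  def validate(dtype, opts \\ []) do
    name = Keyword.get(opts, :name, "value")

    case Keyword.get(opts, :as, :numeric) do
      :numeric ->
        {:ok, %{as: :numeric, name: name}}

      # :boolean is documented as valid only for {:u, 8} tensors.
      :boolean when dtype == {:u, 8} ->
        {:ok, %{as: :boolean, name: name}}

      :boolean ->
        {:error, "as: :boolean requires a {:u, 8} tensor, got #{inspect(dtype)}"}

      # Assumption: any other :as value is rejected.
      other ->
        {:error, "unsupported :as value #{inspect(other)}"}
    end
  end
end
```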
Returns {:ok, batch} or {:error, message}.
Examples
# Rank-1 s64 tensor
{:ok, batch} = ExArrow.from_nx(Nx.tensor([1, 2, 3], type: {:s, 64}))
ExArrow.RecordBatch.num_rows(batch) #=> 3
# Rank-2 f64 tensor → 3 columns (c0, c1, c2), 2 rows
{:ok, batch} = ExArrow.from_nx(Nx.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], type: {:f, 64}))
# Boolean tensor
{:ok, batch} = ExArrow.from_nx(Nx.tensor([1, 0, 1], type: {:u, 8}), as: :boolean)
@spec native_version() :: String.t()
Returns the native NIF crate version. Useful for verifying that the NIF has loaded.
@spec to_dataframe(ExArrow.RecordBatch.t() | ExArrow.Stream.t()) :: {:ok, Explorer.DataFrame.t()} | {:error, String.t()}
Convert an ExArrow.RecordBatch or ExArrow.Stream to an Explorer.DataFrame.
Column names, row count, and values are preserved. See from_dataframe/1 for a note on nullability through the Explorer round-trip.
Returns {:ok, dataframe} or {:error, message}.
Examples
df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
{:ok, batch} = ExArrow.from_dataframe(df)
{:ok, df2} = ExArrow.to_dataframe(batch)
Explorer.DataFrame.n_rows(df2) #=> 3
@spec to_nx(ExArrow.RecordBatch.t()) :: {:ok, Nx.Tensor.t()} | {:error, String.t()}
Convert an ExArrow.RecordBatch to an Nx.Tensor.
For a single-column numeric/boolean batch, returns a rank-1 tensor. For a multi-column batch with uniform numeric dtype, returns a rank-2 tensor where columns become the second axis.
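The shape rule above can also be restated as a small pure function. ToNxShape.expected/2 below is hypothetical and not part of ExArrow: it takes a batch's row count and a list of per-column Nx dtypes and returns the shape the documentation describes. It checks dtype uniformity only, not whether the dtypes are numeric, and returning an error for mixed dtypes is an assumption based on the documented {:error, message} return.

```elixir
defmodule ToNxShape do
  # Hypothetical helper, not part of ExArrow: predicts the tensor shape that
  # ExArrow.to_nx/1 is documented to return, from a row count and the
  # per-column dtypes (Nx-style tuples such as {:s, 64}).

  # Single column: rank-1 tensor of shape {num_rows}.
  def expected(num_rows, [_single_dtype]), do: {:ok, {num_rows}}

  # Several columns: rank-2 {num_rows, num_columns}, if all dtypes match.
  def expected(num_rows, [first | _] = dtypes) do
    if Enum.all?(dtypes, &(&1 == first)) do
      {:ok, {num_rows, length(dtypes)}}
    else
      # Assumption: a batch with mixed dtypes is reported as an error.
      {:error, "columns do not share a single dtype"}
    end
  end

  def expected(_num_rows, []), do: {:error, "batch has no columns"}
end
```

These predictions match the examples below: one {:s, 64} column of three rows gives {3}, and two {:s, 64} columns of two rows give {2, 2}.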
Returns {:ok, tensor} or {:error, message}.
Examples
# Single column → rank-1
{:ok, batch} = ExArrow.from_nx(Nx.tensor([1, 2, 3], type: {:s, 64}))
{:ok, tensor} = ExArrow.to_nx(batch)
Nx.shape(tensor) #=> {3}
# Multi-column uniform → rank-2
{:ok, batch} = ExArrow.from_nx(Nx.tensor([[1, 2], [3, 4]], type: {:s, 64}))
{:ok, tensor} = ExArrow.to_nx(batch)
Nx.shape(tensor) #=> {2, 2}