ExArrow.Schema.Mapper (ex_arrow v0.6.0)


Bidirectional mapping between Arrow type representations and external type systems.

ExArrow interacts with several Elixir libraries that have their own type systems: Explorer, Nx, and, in the future, ExZarr and Dataset. This module is the single authority for converting between Arrow dtype strings (used by the NIF layer) and each external representation, which removes duplicated mapping logic from the bridge modules.

Arrow dtype strings

The NIF layer identifies column types with short string codes:

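| Code   | Arrow type |
|--------|------------|
| "s8"   | Int8       |
| "s16"  | Int16      |
| "s32"  | Int32      |
| "s64"  | Int64      |
| "u8"   | UInt8      |
| "u16"  | UInt16     |
| "u32"  | UInt32     |
| "u64"  | UInt64     |
| "f32"  | Float32    |
| "f64"  | Float64    |
| "bool" | Boolean    |
| "utf8" | Utf8       |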

These strings are the canonical internal representation. Every public conversion function either accepts one of these strings as input or returns one as output.

Extensibility

New external targets (e.g. ExZarr, Dataset) can be added by introducing new target_dtype_to_arrow/1 and arrow_dtype_to_target/1 clause groups. The existing targets are grouped by module section below.
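As a rough illustration of what such a clause group could look like, the sketch below maps a hypothetical target that uses NumPy-style dtype strings ("<i4", "<i8", "<f8"). The module name, the target codes, and the error messages are all assumptions made for this example; the representation that ExZarr or Dataset will actually use is not specified here.

```elixir
defmodule TargetMapperSketch do
  @moduledoc false

  # Hypothetical target codes -> Arrow dtype strings.
  # The codes below are illustrative only, not a real ExZarr or Dataset API.
  def target_dtype_to_arrow("<i4"), do: {:ok, "s32"}
  def target_dtype_to_arrow("<i8"), do: {:ok, "s64"}
  def target_dtype_to_arrow("<f8"), do: {:ok, "f64"}

  def target_dtype_to_arrow(other),
    do: {:error, "unsupported target dtype: #{inspect(other)}"}

  # Arrow dtype strings -> hypothetical target codes.
  def arrow_dtype_to_target("s32"), do: {:ok, "<i4"}
  def arrow_dtype_to_target("s64"), do: {:ok, "<i8"}
  def arrow_dtype_to_target("f64"), do: {:ok, "<f8"}

  def arrow_dtype_to_target(other),
    do: {:error, "unsupported Arrow dtype: #{inspect(other)}"}
end
```

Keeping both directions next to each other, with a catch-all clause that returns an error tuple, mirrors the {:ok, value} | {:error, message} contract used by the existing functions.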

Summary

Functions

arrow_dtype_to_explorer/1: Convert an Arrow dtype string to an Explorer dtype atom.

arrow_dtype_to_nx/1: Convert an Arrow dtype string to an Nx dtype tuple.

arrow_dtype_to_type_atom/1: Convert an Arrow dtype string to an Arrow type atom.

arrow_type_atom_to_dtype/1: Convert an Arrow type atom (as returned by ExArrow.Schema.fields/1) to an Arrow dtype string.

explorer_dtype_to_arrow/1: Convert an Explorer dtype atom to an Arrow dtype string.

numeric?/1: Returns true if the given Arrow dtype string maps to a numeric Nx dtype, false otherwise.

nx_dtype_to_arrow/1: Convert an Nx dtype tuple to an Arrow dtype string.

Types

arrow_dtype()

@type arrow_dtype() :: String.t()

nx_dtype()

@type nx_dtype() :: {:s, 8 | 16 | 32 | 64} | {:u, 8 | 16 | 32 | 64} | {:f, 32 | 64}

Functions

arrow_dtype_to_explorer(dtype)

@spec arrow_dtype_to_explorer(arrow_dtype()) :: {:ok, atom()} | {:error, String.t()}

Convert an Arrow dtype string to an Explorer dtype atom.

Returns {:ok, explorer_dtype} or {:error, message}.

Integer dtypes ("s8" through "s64", "u8" through "u64") all map to :integer because Explorer does not distinguish integer widths in its dtype system. Float dtypes ("f32", "f64") map to :float.
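One consequence of this many-to-one mapping is that a round trip through Explorer does not preserve width. The following is a minimal, self-contained sketch that re-creates the documented mappings locally rather than calling the library. The module name and error messages are hypothetical, and the "bool" and "utf8" clauses of to_explorer/1 are assumed to be the inverse of the explorer_dtype_to_arrow/1 table.

```elixir
defmodule ExplorerRoundTripSketch do
  @integer_dtypes ~w(s8 s16 s32 s64 u8 u16 u32 u64)
  @float_dtypes ~w(f32 f64)

  # Arrow dtype string -> Explorer dtype atom (widths are collapsed).
  def to_explorer(dtype) when dtype in @integer_dtypes, do: {:ok, :integer}
  def to_explorer(dtype) when dtype in @float_dtypes, do: {:ok, :float}
  # Assumed inverse of the Explorer -> Arrow table.
  def to_explorer("bool"), do: {:ok, :boolean}
  def to_explorer("utf8"), do: {:ok, :string}
  def to_explorer(other), do: {:error, "unsupported Arrow dtype: #{inspect(other)}"}

  # Explorer dtype atom -> Arrow dtype string (always the 64-bit variant).
  def to_arrow(:integer), do: {:ok, "s64"}
  def to_arrow(:float), do: {:ok, "f64"}
  def to_arrow(:boolean), do: {:ok, "bool"}
  def to_arrow(:string), do: {:ok, "utf8"}
  def to_arrow(other), do: {:error, "unsupported Explorer dtype: #{inspect(other)}"}

  # Arrow -> Explorer -> Arrow; an error from the first step is passed through.
  def round_trip(dtype) do
    with {:ok, explorer_dtype} <- to_explorer(dtype) do
      to_arrow(explorer_dtype)
    end
  end
end
```

Under these assumptions, round_trip("u8") yields {:ok, "s64"} and round_trip("f32") yields {:ok, "f64"}, so callers that need the original width should keep the Arrow dtype string alongside the Explorer dtype.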

arrow_dtype_to_nx(dtype)

@spec arrow_dtype_to_nx(arrow_dtype()) :: {:ok, nx_dtype()} | {:error, String.t()}

Convert an Arrow dtype string to an Nx dtype tuple.

Returns {:ok, nx_dtype} or {:error, message}.

Boolean columns ("bool") map to {:u, 8} because Nx represents booleans as unsigned 8-bit integers with values 0 and 1.

arrow_dtype_to_type_atom(dtype)

@spec arrow_dtype_to_type_atom(arrow_dtype()) :: {:ok, atom()} | {:error, String.t()}

Convert an Arrow dtype string to an Arrow type atom.

Returns {:ok, type_atom} or {:error, message}.

Examples

iex> ExArrow.Schema.Mapper.arrow_dtype_to_type_atom("s64")
{:ok, :int64}

iex> ExArrow.Schema.Mapper.arrow_dtype_to_type_atom("bool")
{:ok, :boolean}

arrow_type_atom_to_dtype(atom)

@spec arrow_type_atom_to_dtype(atom()) :: {:ok, arrow_dtype()} | {:error, String.t()}

Convert an Arrow type atom (as returned by ExArrow.Schema.fields/1) to an Arrow dtype string.

Returns {:ok, dtype_string} or {:error, message}.

This bridges the NIF schema representation (atoms like :int64) to the dtype strings used by column buffer NIFs ("s64").

Examples

iex> ExArrow.Schema.Mapper.arrow_type_atom_to_dtype(:int64)
{:ok, "s64"}

iex> ExArrow.Schema.Mapper.arrow_type_atom_to_dtype(:boolean)
{:ok, "bool"}

iex> ExArrow.Schema.Mapper.arrow_type_atom_to_dtype(:timestamp)
{:error, "unsupported Arrow type atom for dtype mapping: timestamp"}
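Because the function returns tagged tuples, a caller converting a whole schema can stop at the first unsupported field. The sketch below shows one way to do that. It uses a local stand-in limited to the atoms shown in the examples above, and it assumes fields arrive as {name, type_atom} pairs; the actual shape returned by ExArrow.Schema.fields/1 may differ.

```elixir
defmodule FieldDtypeSketch do
  # Local stand-in for arrow_type_atom_to_dtype/1, covering only the atoms
  # that appear in the documented examples.
  def type_atom_to_dtype(:int64), do: {:ok, "s64"}
  def type_atom_to_dtype(:boolean), do: {:ok, "bool"}

  def type_atom_to_dtype(other),
    do: {:error, "unsupported Arrow type atom for dtype mapping: #{other}"}

  # Convert {name, type_atom} pairs (an assumed shape) to {name, dtype} pairs,
  # halting at the first field whose type cannot be mapped.
  def dtypes_for(fields) do
    fields
    |> Enum.reduce_while({:ok, []}, fn {name, type_atom}, {:ok, acc} ->
      case type_atom_to_dtype(type_atom) do
        {:ok, dtype} -> {:cont, {:ok, [{name, dtype} | acc]}}
        {:error, message} -> {:halt, {:error, "#{name}: #{message}"}}
      end
    end)
    |> case do
      {:ok, pairs} -> {:ok, Enum.reverse(pairs)}
      error -> error
    end
  end
end
```

Prefixing the error with the field name is a design choice of this sketch; it makes the failing column easy to locate when a schema has many fields.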

explorer_dtype_to_arrow(dtype)

@spec explorer_dtype_to_arrow(atom()) :: {:ok, arrow_dtype()} | {:error, String.t()}

Convert an Explorer dtype atom to an Arrow dtype string.

Returns {:ok, dtype_string} or {:error, message}.

Supported Explorer dtypes

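| Explorer dtype | Arrow dtype | Notes                           |
|----------------|-------------|---------------------------------|
| :integer       | "s64"       | Explorer stores as 64-bit int   |
| :float         | "f64"       | Explorer stores as 64-bit float |
| :boolean       | "bool"      | Arrow Boolean column            |
| :string        | "utf8"      | Arrow Utf8 column               |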

Explorer dtypes :date, :datetime, :duration, and :nil are not yet mapped and return an error. These will be added as the NIF layer gains support for the corresponding Arrow types.

numeric?(arg1)

@spec numeric?(arrow_dtype()) :: boolean()

Returns true if the given Arrow dtype string maps to a numeric Nx dtype, false otherwise.

Examples

iex> ExArrow.Schema.Mapper.numeric?("s64")
true

iex> ExArrow.Schema.Mapper.numeric?("bool")
true

iex> ExArrow.Schema.Mapper.numeric?("utf8")
false
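A typical use for this predicate is deciding which columns can be handed to Nx. The sketch below partitions {column_name, dtype} pairs with a local stand-in for numeric?/1. The module name and the pair shape are hypothetical, and the stand-in assumes that every integer and float code, plus "bool", counts as numeric, which matches the three documented examples.

```elixir
defmodule NumericFilterSketch do
  # Assumed set of dtype strings that map to an Nx dtype ("bool" maps to {:u, 8}).
  @numeric ~w(s8 s16 s32 s64 u8 u16 u32 u64 f32 f64 bool)

  # Local stand-in for ExArrow.Schema.Mapper.numeric?/1.
  def numeric?(dtype), do: dtype in @numeric

  # Split columns into {tensor_ready, other}, preserving the input order.
  def split_columns(columns) do
    Enum.split_with(columns, fn {_name, dtype} -> numeric?(dtype) end)
  end
end
```

With the input [{"id", "s64"}, {"name", "utf8"}, {"flag", "bool"}], the first element of the result holds the "id" and "flag" columns and the second holds "name".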

nx_dtype_to_arrow(dtype)

@spec nx_dtype_to_arrow(nx_dtype()) :: {:ok, arrow_dtype()} | {:error, String.t()}

Convert an Nx dtype tuple to an Arrow dtype string.

Returns {:ok, dtype_string} or {:error, message}.

Supported Nx dtypes

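| Nx dtype | Arrow dtype |
|----------|-------------|
| {:s, 8}  | "s8"        |
| {:s, 16} | "s16"       |
| {:s, 32} | "s32"       |
| {:s, 64} | "s64"       |
| {:u, 8}  | "u8"        |
| {:u, 16} | "u16"       |
| {:u, 32} | "u32"       |
| {:u, 64} | "u64"       |
| {:f, 32} | "f32"       |
| {:f, 64} | "f64"       |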

Nx does not have a dedicated boolean dtype; booleans are represented as {:u, 8} with values 0 and 1. Arrow Boolean columns map to {:u, 8} via arrow_dtype_to_nx/1.
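This makes the Boolean mapping asymmetric: converting "bool" to Nx and back produces "u8", not "bool". The sketch below demonstrates that with local lookup tables instead of the library itself. The module name and error message are hypothetical, and the Arrow-to-Nx direction for the ten numeric codes is assumed to be the inverse of the table above.

```elixir
defmodule NxRoundTripSketch do
  # The ten pairs from the supported Nx dtypes table.
  @pairs [
    {"s8", {:s, 8}}, {"s16", {:s, 16}}, {"s32", {:s, 32}}, {"s64", {:s, 64}},
    {"u8", {:u, 8}}, {"u16", {:u, 16}}, {"u32", {:u, 32}}, {"u64", {:u, 64}},
    {"f32", {:f, 32}}, {"f64", {:f, 64}}
  ]

  # Arrow -> Nx additionally accepts "bool"; Nx -> Arrow has no Boolean entry.
  @to_nx Map.put(Map.new(@pairs), "bool", {:u, 8})
  @to_arrow Map.new(@pairs, fn {arrow, nx} -> {nx, arrow} end)

  def arrow_to_nx(dtype), do: fetch(@to_nx, dtype)
  def nx_to_arrow(dtype), do: fetch(@to_arrow, dtype)

  # Arrow -> Nx -> Arrow; an error from the first step is passed through.
  def round_trip(dtype) do
    with {:ok, nx_dtype} <- arrow_to_nx(dtype), do: nx_to_arrow(nx_dtype)
  end

  defp fetch(table, key) do
    case Map.fetch(table, key) do
      {:ok, value} -> {:ok, value}
      :error -> {:error, "unsupported dtype: #{inspect(key)}"}
    end
  end
end
```

Code that must write a tensor back as an Arrow Boolean column therefore needs to remember the original Arrow dtype, since the Nx dtype alone cannot distinguish it from UInt8.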