ExZarr.Nx (ExZarr v1.1.0)

Optimized Nx integration for ExZarr arrays.

This module provides efficient conversion between ExZarr arrays and Nx tensors using direct binary transfer, avoiding the overhead of nested tuple conversion.
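To make the difference between the two paths concrete, here is a minimal sketch in plain Elixir. The module `BinaryPathSketch` and its functions are hypothetical and not part of ExZarr; the sketch assumes little-endian signed 32-bit elements in a 2-D row-major layout. In Nx, the fast path would roughly correspond to `Nx.from_binary/3` followed by `Nx.reshape/2`, though whether this module uses exactly those calls is an assumption.

```elixir
defmodule BinaryPathSketch do
  # Hypothetical helper, not part of ExZarr.
  # Slow path: decode every element and rebuild nested tuples row by row.
  def to_nested_tuples(binary, {rows, cols}) do
    values = for <<v::little-signed-32 <- binary>>, do: v

    nested =
      values
      |> Enum.chunk_every(cols)
      |> Enum.map(&List.to_tuple/1)
      |> List.to_tuple()

    # Sanity check that the binary really held `rows` rows.
    ^rows = tuple_size(nested)
    nested
  end

  # Fast path: keep the raw bytes and only describe them with a type and shape.
  def describe(binary, shape), do: %{data: binary, type: {:s, 32}, shape: shape}
end

# Six 32-bit integers packed into a 24-byte binary.
raw = for v <- 1..6, into: <<>>, do: <<v::little-signed-32>>
nested = BinaryPathSketch.to_nested_tuples(raw, {2, 3})
described = BinaryPathSketch.describe(raw, {2, 3})
```

The slow path touches every element and allocates a tuple per row, while the fast path leaves the bytes untouched, which is where the savings come from.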

Performance

Direct binary conversion is 5-10x faster than the nested tuple approach:

  • Optimized (this module): 10-20ms for 8MB (400-800 MB/s)
  • Nested tuples: 80-150ms for 8MB (50-100 MB/s)
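The throughput figures follow directly from the timings. As a quick check, the sketch below recomputes them; `ThroughputSketch` is a hypothetical helper introduced only for this illustration.

```elixir
defmodule ThroughputSketch do
  # Hypothetical helper: throughput in MB/s for `megabytes` moved in `millis` ms.
  def mb_per_s(megabytes, millis), do: megabytes * 1000 / millis
end

# {slowest, fastest} throughput for each approach on an 8 MB array.
optimized = {ThroughputSketch.mb_per_s(8, 20), ThroughputSketch.mb_per_s(8, 10)}
nested = {ThroughputSketch.mb_per_s(8, 150), ThroughputSketch.mb_per_s(8, 80)}
```

The optimized range comes out at 400 to 800 MB/s. The nested tuple range comes out at roughly 53 to 100 MB/s, which the list above rounds to 50-100 MB/s.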

Usage

# Convert ExZarr array to Nx tensor
{:ok, array} = ExZarr.open(path: "/data/my_array")
{:ok, tensor} = ExZarr.Nx.to_tensor(array)

# Convert Nx tensor to ExZarr array
tensor = Nx.iota({1000, 1000})
{:ok, array} = ExZarr.Nx.from_tensor(tensor,
  storage: :filesystem,
  path: "/data/output",
  chunks: {100, 100}
)

Compatibility

All 10 standard numeric types are supported:

  • Signed integers: int8, int16, int32, int64
  • Unsigned integers: uint8, uint16, uint32, uint64
  • Floating point: float32, float64

Unsupported types (BF16, FP16, complex) return an error tuple with a descriptive message.
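The ten supported types pair up one to one. The following is a minimal sketch of such a lookup table, built only from the type lists in this document. `DtypeMapSketch` is a hypothetical module, and its error strings are illustrative rather than the exact messages ExZarr produces.

```elixir
defmodule DtypeMapSketch do
  # Hypothetical lookup table; each Zarr dtype atom maps to one Nx type tuple.
  @pairs [
    int8: {:s, 8}, int16: {:s, 16}, int32: {:s, 32}, int64: {:s, 64},
    uint8: {:u, 8}, uint16: {:u, 16}, uint32: {:u, 32}, uint64: {:u, 64},
    float32: {:f, 32}, float64: {:f, 64}
  ]

  def pairs, do: @pairs

  # Zarr dtype atom -> Nx type tuple.
  def zarr_to_nx(dtype) when is_atom(dtype) do
    case Keyword.fetch(@pairs, dtype) do
      {:ok, nx_type} -> {:ok, nx_type}
      :error -> {:error, "Unsupported dtype: #{inspect(dtype)}"}
    end
  end

  # Nx type tuple -> Zarr dtype atom (search on the second tuple element).
  def nx_to_zarr(nx_type) do
    case List.keyfind(@pairs, nx_type, 1) do
      {dtype, _nx_type} -> {:ok, dtype}
      nil -> {:error, "Unsupported Nx type: #{inspect(nx_type)}"}
    end
  end
end
```

Because the table is a bijection, converting a dtype to an Nx type and back returns the original dtype for every supported entry.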

Memory Efficiency

For large arrays that don't fit in memory, use to_tensor_chunked/3:

stream = ExZarr.Nx.to_tensor_chunked(array, {100, 100})
# Returns a stream that yields {:ok, tensor} or {:error, reason} for each chunk

See ExZarr.Nx.DataLoader for ML training workflows.

Summary

Types

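nx_type()
Nx tensor type specification.

zarr_dtype()
ExZarr dtype atom.

Functions

from_tensor(tensor, opts)
Converts Nx tensor to ExZarr array using direct binary transfer.

nx_to_zarr_type(nx_type)
Converts Nx type to ExZarr dtype.

supported_dtypes()
Returns list of supported dtype conversions.

supported_nx_types()
Returns list of supported Nx type conversions.

to_tensor(array, opts \\ [])
Converts ExZarr array to Nx tensor using direct binary transfer.

to_tensor_chunked(array, chunk_size, opts \\ [])
Converts ExZarr array to stream of Nx tensors by processing in chunks.

zarr_to_nx_type(dtype)
Converts ExZarr dtype to Nx type.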

Types

nx_type()

@type nx_type() ::
  {:s, 8}
  | {:s, 16}
  | {:s, 32}
  | {:s, 64}
  | {:u, 8}
  | {:u, 16}
  | {:u, 32}
  | {:u, 64}
  | {:f, 32}
  | {:f, 64}

Nx tensor type specification.

zarr_dtype()

@type zarr_dtype() ::
  :int8
  | :int16
  | :int32
  | :int64
  | :uint8
  | :uint16
  | :uint32
  | :uint64
  | :float32
  | :float64

ExZarr dtype atom.

Functions

from_tensor(tensor, opts)

@spec from_tensor(
  Nx.Tensor.t(),
  keyword()
) :: {:ok, ExZarr.Array.t()} | {:error, term()}

Converts Nx tensor to ExZarr array using direct binary transfer.

Creates a new ExZarr array and writes the tensor data efficiently without converting to nested tuples.

Required Options

  • :chunks - Chunk shape as tuple (required)

Optional Options

  • :storage - Storage backend atom (default: :memory)
  • :path - Path for filesystem storage
  • :compressor - Compression codec (default: :zlib)
  • :fill_value - Fill value for uninitialized chunks (default: 0)
  • :zarr_version - Zarr format version, 2 or 3 (default: 2)
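As a rough illustration of how these options combine, the sketch below checks the required `:chunks` option and fills in the documented defaults. `FromTensorOptsSketch` is a hypothetical module, not ExZarr's actual option handling, and the rule that `:chunks` needs one entry per tensor dimension is an assumption.

```elixir
defmodule FromTensorOptsSketch do
  # Hypothetical option normalizer; defaults copied from the lists above.
  @defaults [storage: :memory, compressor: :zlib, fill_value: 0, zarr_version: 2]

  def normalize(opts, tensor_shape) do
    with {:ok, chunks} <- fetch_chunks(opts),
         :ok <- check_rank(chunks, tensor_shape) do
      # Caller-supplied options override the defaults.
      {:ok, Keyword.merge(@defaults, opts)}
    end
  end

  defp fetch_chunks(opts) do
    case Keyword.fetch(opts, :chunks) do
      {:ok, chunks} when is_tuple(chunks) -> {:ok, chunks}
      {:ok, other} -> {:error, {:invalid_chunks, other}}
      :error -> {:error, :missing_chunks}
    end
  end

  # Assumption: the chunk tuple has one entry per tensor dimension.
  defp check_rank(chunks, shape) when tuple_size(chunks) == tuple_size(shape), do: :ok
  defp check_rank(chunks, shape), do: {:error, {:rank_mismatch, chunks, shape}}
end
```

For instance, `FromTensorOptsSketch.normalize([chunks: {100, 100}], {1000, 1000})` returns `{:ok, opts}` with `:storage` set to `:memory`, while an empty option list yields `{:error, :missing_chunks}`.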

Returns

  • {:ok, array} - Successfully created ExZarr.Array
  • {:error, reason} - Conversion failed

Examples

# Basic conversion to memory
tensor = Nx.iota({1000, 1000})
{:ok, array} = ExZarr.Nx.from_tensor(tensor, chunks: {100, 100})

# Save to filesystem
{:ok, array} = ExZarr.Nx.from_tensor(tensor,
  storage: :filesystem,
  path: "/data/output",
  chunks: {100, 100},
  compressor: %{id: "zstd", level: 3}
)

# Create v3 array
{:ok, array} = ExZarr.Nx.from_tensor(tensor,
  chunks: {100, 100},
  zarr_version: 3
)

nx_to_zarr_type(nx_type)

@spec nx_to_zarr_type(nx_type()) :: {:ok, zarr_dtype()} | {:error, String.t()}

Converts Nx type to ExZarr dtype.

Examples

iex> ExZarr.Nx.nx_to_zarr_type({:f, 64})
{:ok, :float64}

iex> ExZarr.Nx.nx_to_zarr_type({:s, 32})
{:ok, :int32}

iex> ExZarr.Nx.nx_to_zarr_type({:bf, 16})
{:error, "Unsupported Nx type: {:bf, 16}. BF16 is not part of Zarr specification."}

supported_dtypes()

@spec supported_dtypes() :: [zarr_dtype()]

Returns list of supported dtype conversions.

Examples

iex> dtypes = ExZarr.Nx.supported_dtypes()
iex> :float64 in dtypes
true
iex> length(dtypes)
10

supported_nx_types()

@spec supported_nx_types() :: [nx_type()]

Returns list of supported Nx type conversions.

Examples

iex> types = ExZarr.Nx.supported_nx_types()
iex> {:f, 64} in types
true
iex> length(types)
10

to_tensor(array, opts \\ [])

@spec to_tensor(
  ExZarr.Array.t(),
  keyword()
) :: {:ok, Nx.Tensor.t()} | {:error, term()}

Converts ExZarr array to Nx tensor using direct binary transfer.

This is the recommended way to convert ExZarr arrays to Nx tensors. It uses efficient binary conversion without an intermediate nested tuple representation.

Options

  • :backend - Nx backend to use (default: current default backend)
  • :names - Axis names for the tensor (list of atoms)

Returns

  • {:ok, tensor} - Successfully converted Nx.Tensor
  • {:error, reason} - Conversion failed

Performance

Conversion time scales approximately linearly with array size:

  • 1 MB: ~2ms
  • 10 MB: ~15ms
  • 100 MB: ~150ms

For arrays larger than available RAM, use to_tensor_chunked/3 instead.
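One way to decide between the two functions is to estimate the uncompressed size from the shape and element type before converting. The sketch below is a hypothetical helper (`SizeEstimateSketch` is not part of ExZarr) and assumes the caller supplies a memory budget in bytes.

```elixir
defmodule SizeEstimateSketch do
  # Hypothetical helper: uncompressed size in bytes for a shape and an Nx type.
  def bytes(shape, {_kind, bits}) do
    elements = shape |> Tuple.to_list() |> Enum.product()
    elements * div(bits, 8)
  end

  # Pick a conversion strategy for a caller-supplied memory budget in bytes.
  def strategy(shape, type, budget_bytes) do
    if bytes(shape, type) <= budget_bytes, do: :to_tensor, else: :to_tensor_chunked
  end
end

# A 1000x1000 float64 array holds 8,000,000 bytes of element data.
small = SizeEstimateSketch.bytes({1000, 1000}, {:f, 64})
choice = SizeEstimateSketch.strategy({100_000, 100_000}, {:f, 64}, 16_000_000_000)
```

The estimate ignores compression and any temporary copies made during conversion, so a real budget should leave some headroom.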

Examples

# Basic conversion
{:ok, array} = ExZarr.open(path: "/data/array")
{:ok, tensor} = ExZarr.Nx.to_tensor(array)

# Transfer to specific backend
{:ok, tensor} = ExZarr.Nx.to_tensor(array, backend: EXLA.Backend)

# With axis names
{:ok, tensor} = ExZarr.Nx.to_tensor(array, names: [:batch, :features])

to_tensor_chunked(array, chunk_size, opts \\ [])

@spec to_tensor_chunked(ExZarr.Array.t(), tuple(), keyword()) ::
  Enumerable.t({:ok, Nx.Tensor.t()} | {:error, term()})

Converts ExZarr array to stream of Nx tensors by processing in chunks.

For large arrays that don't fit in memory, this function loads the array in chunks and yields a tensor for each chunk. Useful for processing large datasets incrementally.

Parameters

  • array - ExZarr array to convert
  • chunk_size - Size of chunks to load (tuple matching array dimensionality)
  • opts - Options (same as to_tensor/2)
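To get a feel for how `chunk_size` relates to the number of tensors produced, the sketch below tiles a shape with ceiling division. `ChunkGridSketch` is hypothetical, and it assumes that the array is tiled axis by axis and that chunks on the trailing edge may be smaller than `chunk_size`; the actual behavior of `to_tensor_chunked/3` at the edges is not stated in this document.

```elixir
defmodule ChunkGridSketch do
  # Hypothetical helper: number of chunks along each axis (ceiling division).
  def grid(shape, chunk_size) do
    shape
    |> Tuple.to_list()
    |> Enum.zip(Tuple.to_list(chunk_size))
    |> Enum.map(fn {dim, chunk} -> div(dim + chunk - 1, chunk) end)
    |> List.to_tuple()
  end

  # Total number of tensors a stream would yield under this assumption.
  def count(shape, chunk_size) do
    shape |> grid(chunk_size) |> Tuple.to_list() |> Enum.product()
  end

  # Shape of the chunk at a zero-based grid index; edge chunks may be smaller.
  def chunk_shape(shape, chunk_size, index) do
    [shape, chunk_size, index]
    |> Enum.map(&Tuple.to_list/1)
    |> Enum.zip()
    |> Enum.map(fn {dim, chunk, i} -> min(chunk, dim - i * chunk) end)
    |> List.to_tuple()
  end
end
```

Under these assumptions, a `{1000, 1000}` array read with `{100, 100}` chunks gives a 10 by 10 grid, so 100 tensors in total.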

Returns

Stream that yields {:ok, tensor} or {:error, reason} for each chunk.
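Since each element may be an error tuple, consumers should match on both shapes. The sketch below shows one possible pattern that stops at the first error instead of raising. `ChunkResultsSketch` is a hypothetical helper, and plain numbers stand in for per-chunk results so that the example runs without ExZarr or Nx.

```elixir
defmodule ChunkResultsSketch do
  # Hypothetical helper: fold over {:ok, value} | {:error, reason} items,
  # halting at the first error rather than crashing on a failed match.
  def sum_ok(results) do
    Enum.reduce_while(results, {:ok, 0}, fn
      {:ok, value}, {:ok, acc} -> {:cont, {:ok, acc + value}}
      {:error, reason}, _acc -> {:halt, {:error, reason}}
    end)
  end
end

# Stand-in streams; real code would start from to_tensor_chunked/3 and
# reduce each tensor to a number first.
good = Stream.map([1, 2, 3], &{:ok, &1})
bad = Stream.concat(good, [{:error, :chunk_read_failed}, {:ok, 4}])

good_total = ChunkResultsSketch.sum_ok(good)
bad_total = ChunkResultsSketch.sum_ok(bad)
```

Because `Enum.reduce_while/3` halts on the first error, elements after the failing chunk are never requested from the stream.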

Examples

# Process 100×100 chunks from large array
{:ok, array} = ExZarr.open(path: "/data/large_array")

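array
|> ExZarr.Nx.to_tensor_chunked({100, 100})
|> Stream.each(fn
  {:ok, tensor} ->
    # Process each tensor chunk
    result = Nx.mean(tensor) |> Nx.to_number()
    IO.puts("Chunk mean: #{result}")

  {:error, reason} ->
    # The stream may also yield errors; report them instead of crashing
    IO.puts("Chunk failed: #{inspect(reason)}")
end)
|> Stream.run()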

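# Map over chunks in parallel
results =
  array
  |> ExZarr.Nx.to_tensor_chunked({100, 100})
  |> Task.async_stream(
    fn
      {:ok, tensor} -> {:ok, Nx.sum(tensor) |> Nx.to_number()}
      {:error, reason} -> {:error, reason}
    end,
    max_concurrency: 4
  )
  # Task.async_stream wraps each task result in {:ok, value}; unwrap it
  |> Enum.map(fn {:ok, result} -> result end)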

zarr_to_nx_type(dtype)

@spec zarr_to_nx_type(zarr_dtype()) :: {:ok, nx_type()} | {:error, String.t()}

Converts ExZarr dtype to Nx type.

Examples

iex> ExZarr.Nx.zarr_to_nx_type(:float64)
{:ok, {:f, 64}}

iex> ExZarr.Nx.zarr_to_nx_type(:int32)
{:ok, {:s, 32}}

iex> ExZarr.Nx.zarr_to_nx_type(:invalid)
{:error, "Unsupported dtype: :invalid"}