ExZarr.Array (ExZarr v1.1.0)

N-dimensional array implementation with chunking and compression support.

Arrays are the core data structure in ExZarr. They provide:

  • Arbitrary N-dimensional shapes (1D to N-D)
  • Chunked storage for efficient I/O and memory usage
  • Compression using various codecs (zlib, zstd, lz4, or none)
  • Support for 10 data types (integers, unsigned integers, and floats)
  • Persistent storage on filesystem or temporary in-memory storage
  • Lazy loading of chunks (only reads what is needed)

Array Structure

An array consists of:

  • Shape: The dimensions of the array (e.g., {1000, 1000} for a 2D array)
  • Chunks: The size of each chunk for storage (e.g., {100, 100})
  • Dtype: The data type of elements (e.g., :float64, :int32)
  • Compressor: The compression codec used for chunks
  • Fill value: The default value for uninitialized elements

Memory Efficiency

Arrays use chunked storage to avoid loading entire arrays into memory. Only the chunks needed for a specific operation are loaded and decompressed. This allows working with arrays larger than available RAM.
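
To illustrate why only a few chunks are touched, here is a minimal sketch that lists the chunk indices overlapping a requested region on a regular chunk grid. ChunkOverlapSketch and its functions are hypothetical names used for illustration; they are not part of ExZarr, and the library's internal logic may differ. The region is assumed to be non-empty and half-open (start inclusive, stop exclusive).

```elixir
defmodule ChunkOverlapSketch do
  # Hypothetical helper, not an ExZarr function: returns the chunk indices
  # that overlap the half-open region [start, stop) on a regular grid.
  def overlapping_chunks(start, stop, chunks) do
    [Tuple.to_list(start), Tuple.to_list(stop), Tuple.to_list(chunks)]
    |> Enum.zip()
    |> Enum.map(fn {lo, hi, size} ->
      # First and last chunk touched along this axis.
      Enum.to_list(div(lo, size)..div(hi - 1, size))
    end)
    |> cartesian()
    |> Enum.map(&List.to_tuple/1)
  end

  # Cartesian product of per-axis index lists, last axis varying fastest.
  defp cartesian([]), do: [[]]

  defp cartesian([axis | rest]) do
    for i <- axis, tail <- cartesian(rest), do: [i | tail]
  end
end

# A {1000, 1000} array with {100, 100} chunks has 100 chunks in total;
# this region touches only six of them.
ChunkOverlapSketch.overlapping_chunks({50, 50}, {150, 250}, {100, 100})
# => [{0, 0}, {0, 1}, {0, 2}, {1, 0}, {1, 1}, {1, 2}]
```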

Examples

# Create a 2D array
{:ok, array} = ExZarr.Array.create(
  shape: {1000, 1000},
  chunks: {100, 100},
  dtype: :float64,
  compressor: :zlib,
  storage: :memory
)

# Query array properties
ExZarr.Array.ndim(array)     # => 2
ExZarr.Array.size(array)     # => 1000000
ExZarr.Array.itemsize(array) # => 8 (bytes per float64)

# Convert to binary
{:ok, data} = ExZarr.Array.to_binary(array)

Summary

Functions

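append(array, data, opts \\ [])
Appends data along an axis.

child_spec(init_arg)
Returns a specification to start this module under a supervisor.

chunk_stream(array, opts \\ [])
Streams chunks lazily as {chunk_index, data} tuples.

create(opts)
Creates a new array with the specified configuration.

get_blocks(array, opts \\ [])
Block indexing for chunk-aligned access.

get_boolean(array, mask)
Boolean indexing with a mask.

get_chunk_bounds(array, chunk_index)
Gets the chunk bounds for a given chunk index, considering chunk grids.

get_fancy(array, index_arrays)
Fancy indexing with integer arrays (vindex equivalent).

get_orthogonal(array, index_arrays)
Orthogonal fancy indexing (oindex equivalent).

get_slice(array, opts)
Gets a slice of data from the array.

itemsize(array)
Returns the size of each element in bytes.

ndim(array)
Returns the number of dimensions in the array.

open(opts)
Opens an existing array from storage.

parallel_chunk_map(array, mapper_fn, opts \\ [])
Processes chunks in parallel using a mapper function.

resize(array, new_shape)
Resizes the array to a new shape.

save(array, opts)
Saves the array metadata to storage.

set_slice(array, data, opts)
Sets a slice of data in the array.

size(array)
Returns the total number of elements in the array.

stream_chunks(array, opts \\ [])
Streams chunks lazily as {chunk_index, data} tuples.

stream_slices(array, along, opts \\ [])
Streams array slices along a dimension.

to_binary(array)
Converts the entire array to a binary.

write_stream(array, stream, opts \\ [])
Writes chunks from a stream into an array.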

Types

t()

@type t() :: %ExZarr.Array{
  cache_enabled: boolean(),
  chunk_grid_module: module() | nil,
  chunk_grid_state: struct() | nil,
  chunks: tuple(),
  compressor: ExZarr.compressor(),
  dtype: ExZarr.dtype(),
  fill_value: number() | nil,
  metadata: ExZarr.Metadata.t() | ExZarr.MetadataV3.t(),
  server_pid: pid() | nil,
  shape: tuple(),
  storage: ExZarr.Storage.t(),
  version: 2 | 3
}

Functions

append(array, data, opts \\ [])

@spec append(t(), binary(), keyword()) :: {:ok, t()} | {:error, term()}

Appends data along an axis.

Efficient for adding data incrementally. The array is resized along the specified axis, and the new data is written at the end.

Parameters

  • array - The array to append to
  • data - Binary data or tuple to append
  • opts - Options including :axis

Options

  • :axis - Axis along which to append (default: 0)

Examples

# Append rows to a 2D array
data = <<...>>  # New row data
{:ok, array} = ExZarr.Array.append(array, data, axis: 0)

# Append columns to a 2D array
data = <<...>>  # New column data
{:ok, array} = ExZarr.Array.append(array, data, axis: 1)

Returns

  • {:ok, updated_array} on success with the array struct updated to reflect the new shape
  • {:error, reason} on failure

Notes

  • The array is automatically resized to accommodate the new data
  • The data must match the array shape in every dimension except the append axis
  • Returns a new array struct with updated shape (Elixir structs are immutable)
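
The shape and size rules in the notes above can be sketched as follows. AppendSketch is a hypothetical helper, not part of ExZarr; it only shows the arithmetic for the resulting shape and for the number of bytes the appended data would need, assuming the data is a packed binary of fixed-size elements.

```elixir
defmodule AppendSketch do
  # Hypothetical helper: shape after appending `count` entries along `axis`.
  def grown_shape(shape, axis, count) do
    put_elem(shape, axis, elem(shape, axis) + count)
  end

  # Bytes needed for `count` entries along `axis`: every other dimension
  # keeps its current extent, the append axis contributes `count`.
  def expected_bytes(shape, axis, count, itemsize) do
    elements =
      shape
      |> put_elem(axis, count)
      |> Tuple.to_list()
      |> Enum.product()

    elements * itemsize
  end
end

# Appending 10 rows of :float64 (8 bytes each) to a {100, 50} array.
AppendSketch.grown_shape({100, 50}, 0, 10)
# => {110, 50}
AppendSketch.expected_bytes({100, 50}, 0, 10, 8)
# => 4000
```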

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

chunk_stream(array, opts \\ [])

@spec chunk_stream(
  t(),
  keyword()
) :: Enumerable.t()

Streams chunks lazily as {chunk_index, data} tuples.

Alias for stream_chunks/2 retained for backward compatibility.

create(opts)

@spec create(keyword()) :: {:ok, t()} | {:error, term()}

Creates a new array with the specified configuration.

Initializes a new Zarr array with the given shape, chunk size, data type, and compression settings. The array can be stored in memory or on the filesystem.

Options

  • :shape - Tuple specifying array dimensions (required)
  • :chunks - Tuple specifying chunk dimensions (required)
  • :dtype - Data type (default: :float64)
  • :compressor - Compression codec (default: :zstd)
  • :storage - Storage backend (default: :memory)
  • :path - Path for filesystem storage
  • :fill_value - Fill value for uninitialized chunks (default: 0)
  • :enable_server - Start ArrayServer for coordinated access (default: false)
  • :enable_cache - Enable chunk caching (default: false)

Examples

# Simple 1D array
{:ok, array} = ExZarr.Array.create(
  shape: {1000},
  chunks: {100}
)

# 2D array with specific dtype
{:ok, array} = ExZarr.Array.create(
  shape: {500, 500},
  chunks: {50, 50},
  dtype: :int32,
  compressor: :zlib
)

# Array on filesystem
{:ok, array} = ExZarr.Array.create(
  shape: {1000, 1000},
  chunks: {100, 100},
  storage: :filesystem,
  path: "/tmp/my_array"
)

Returns

  • {:ok, array} on success
  • {:error, reason} on failure

get_blocks(array, opts \\ [])

@spec get_blocks(
  t(),
  keyword()
) :: {:ok, [{tuple(), tuple(), binary()}]} | {:error, term()}

Block indexing for chunk-aligned access.

Returns data in complete chunks for efficient access. The blocks are aligned to chunk boundaries, which can be more efficient than arbitrary slicing as it avoids partial chunk reads.

Parameters

  • array - The array to read from
  • opts - Options including :start and :stop tuples

Options

  • :start - Starting indices (default: all zeros)
  • :stop - Stopping indices (default: array shape)

Examples

# Read blocks covering a region
{:ok, blocks} = ExZarr.Array.get_blocks(array, start: {0, 0}, stop: {100, 100})

# Read all blocks
{:ok, blocks} = ExZarr.Array.get_blocks(array, [])

Returns

  • {:ok, list} where each element is {start_indices, stop_indices, binary_data}
  • {:error, reason} on failure

get_boolean(array, mask)

@spec get_boolean(t(), tuple()) :: {:ok, tuple()} | {:error, term()}

Boolean indexing with a mask.

Selects elements where the mask is true. The mask must match the array shape. For multidimensional arrays, the mask is flattened and elements are selected in row-major order.

Parameters

  • array - The array to index
  • mask - Tuple of booleans matching the array shape

Examples

# 1D array
mask = {true, false, true, false, true}
{:ok, selected} = ExZarr.Array.get_boolean(array, mask)
# Returns elements at indices 0, 2, 4

# 2D array
mask = {{true, false, true}, {false, true, false}}
{:ok, selected} = ExZarr.Array.get_boolean(array, mask)

Returns

  • {:ok, tuple} containing the selected elements
  • {:error, reason} on failure
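
The row-major selection described above can be sketched in plain Elixir. BooleanMaskSketch is a hypothetical module that operates on nested tuples of already-decoded values rather than on an ExZarr array; it is meant only to show the order in which elements are picked.

```elixir
defmodule BooleanMaskSketch do
  # Flattens a (possibly nested) tuple into a list in row-major order.
  def flatten(value) when is_tuple(value) do
    value |> Tuple.to_list() |> Enum.flat_map(&flatten/1)
  end

  def flatten(value), do: [value]

  # Keeps the elements whose mask entry is true.
  def select(elements, mask) do
    elements
    |> flatten()
    |> Enum.zip(flatten(mask))
    |> Enum.filter(fn {_value, keep?} -> keep? end)
    |> Enum.map(fn {value, _keep?} -> value end)
    |> List.to_tuple()
  end
end

BooleanMaskSketch.select(
  {{1, 2, 3}, {4, 5, 6}},
  {{true, false, true}, {false, true, false}}
)
# => {1, 3, 5}
```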

get_chunk_bounds(array, chunk_index)

@spec get_chunk_bounds(t(), tuple()) :: {tuple(), tuple()}

Gets the chunk bounds for a given chunk index, considering chunk grids.

For regular arrays, behaves identically to ExZarr.Chunk.chunk_bounds/3. For arrays with irregular chunk grids, uses the grid to determine the actual chunk shape and calculates proper bounds by accumulating sizes.

Parameters

  • array - Array struct
  • chunk_index - Tuple identifying the chunk

Returns

  • {start_indices, end_indices} - Tuple of start and end coordinates

Examples

# Regular array
{:ok, array} = ExZarr.create(shape: {1000, 1000}, chunks: {100, 100})
ExZarr.Array.get_chunk_bounds(array, {0, 0})
# => {{0, 0}, {100, 100}}

# Array with irregular grid
{:ok, array} = ExZarr.create(
  shape: {100, 200},
  chunk_grid: %{
    "name" => "irregular",
    "configuration" => %{
      "chunk_sizes" => [[50, 50], [100, 100]]
    }
  }
)
ExZarr.Array.get_chunk_bounds(array, {0, 0})
# => {{0, 0}, {50, 100}}
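
The two cases can be reproduced with a small standalone sketch. ChunkBoundsSketch is hypothetical and not the ExZarr implementation; in particular, clipping the last regular chunk to the array shape is an assumption. For the irregular case, the start of a chunk is the sum of the sizes of the chunks before it on each axis.

```elixir
defmodule ChunkBoundsSketch do
  # Regular grid: chunk i on an axis spans [i * size, (i + 1) * size),
  # clipped to the array extent (the clipping is an assumption).
  def regular(chunk_index, chunks, shape) do
    [Tuple.to_list(chunk_index), Tuple.to_list(chunks), Tuple.to_list(shape)]
    |> Enum.zip()
    |> Enum.map(fn {i, size, dim} -> {i * size, min((i + 1) * size, dim)} end)
    |> to_bounds()
  end

  # Irregular grid: chunk_sizes holds one list of chunk sizes per axis.
  def irregular(chunk_index, chunk_sizes) do
    chunk_index
    |> Tuple.to_list()
    |> Enum.zip(chunk_sizes)
    |> Enum.map(fn {i, sizes} ->
      # Accumulate the sizes of all preceding chunks on this axis.
      start = sizes |> Enum.take(i) |> Enum.sum()
      {start, start + Enum.at(sizes, i)}
    end)
    |> to_bounds()
  end

  defp to_bounds(axes) do
    {starts, stops} = Enum.unzip(axes)
    {List.to_tuple(starts), List.to_tuple(stops)}
  end
end

ChunkBoundsSketch.regular({0, 0}, {100, 100}, {1000, 1000})
# => {{0, 0}, {100, 100}}
ChunkBoundsSketch.irregular({1, 1}, [[50, 50], [100, 100]])
# => {{50, 100}, {100, 200}}
```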

get_fancy(array, index_arrays)

@spec get_fancy(t(), [[integer()]]) :: {:ok, binary()} | {:error, term()}

Fancy indexing with integer arrays (vindex equivalent).

Select elements using arrays of indices for each dimension. Each index array must have the same length, and the result will have that length in the first dimension.

Parameters

  • array - The array to index
  • index_arrays - List of index arrays, one per dimension

Examples

# 2D array indexing - select 3 specific elements
{:ok, data} = ExZarr.Array.get_fancy(array, [[0, 1, 2], [5, 10, 15]])
# Returns 3 elements at positions (0,5), (1,10), (2,15)

# 1D array indexing
{:ok, data} = ExZarr.Array.get_fancy(array, [[0, 5, 10, 15]])

Returns

  • {:ok, binary} containing the selected elements
  • {:error, reason} on failure
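
As a rough illustration of pointwise selection, the sketch below pairs the index arrays element by element and computes each coordinate's row-major offset. FancyIndexSketch is a hypothetical module, not part of ExZarr, and the {3, 20} shape is an example value chosen here.

```elixir
defmodule FancyIndexSketch do
  # Pairs index arrays element-wise: the k-th coordinate takes the k-th
  # entry of every index array.
  def coordinates(index_arrays) do
    Enum.zip(index_arrays)
  end

  # Row-major (C-order) element offset of a coordinate within `shape`.
  def flat_offset(coord, shape) do
    coord
    |> Tuple.to_list()
    |> Enum.zip(Tuple.to_list(shape))
    |> Enum.reduce(0, fn {i, dim}, acc -> acc * dim + i end)
  end
end

coords = FancyIndexSketch.coordinates([[0, 1, 2], [5, 10, 15]])
# => [{0, 5}, {1, 10}, {2, 15}]

Enum.map(coords, &FancyIndexSketch.flat_offset(&1, {3, 20}))
# => [5, 30, 55]
```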

get_orthogonal(array, index_arrays)

@spec get_orthogonal(t(), [[integer()]]) :: {:ok, binary()} | {:error, term()}

Orthogonal fancy indexing (oindex equivalent).

Like get_fancy/2, but treats each index array as selecting along its own axis. The result has one axis per index array, with a shape given by the lengths of the index arrays; the total number of elements is the product of those lengths.

Parameters

  • array - The array to index
  • index_arrays - List of index arrays, one per dimension

Examples

# 2D array orthogonal indexing
{:ok, data} = ExZarr.Array.get_orthogonal(array, [[0, 1], [5, 10, 15]])
# Returns 2x3 array with elements at all combinations:
# (0,5), (0,10), (0,15), (1,5), (1,10), (1,15)

Returns

  • {:ok, binary} containing the selected elements in row-major order
  • {:error, reason} on failure
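
A minimal sketch of the outer-product behaviour, for comparison with pointwise indexing. OrthogonalIndexSketch is a hypothetical module and not the library's code; it only enumerates the coordinates in row-major order and derives the result shape.

```elixir
defmodule OrthogonalIndexSketch do
  # All combinations of the per-axis indices, last axis varying fastest.
  def coordinates(index_arrays) do
    index_arrays |> product() |> Enum.map(&List.to_tuple/1)
  end

  # One result axis per index array.
  def result_shape(index_arrays) do
    index_arrays |> Enum.map(&length/1) |> List.to_tuple()
  end

  defp product([]), do: [[]]

  defp product([axis | rest]) do
    for i <- axis, tail <- product(rest), do: [i | tail]
  end
end

OrthogonalIndexSketch.coordinates([[0, 1], [5, 10, 15]])
# => [{0, 5}, {0, 10}, {0, 15}, {1, 5}, {1, 10}, {1, 15}]
OrthogonalIndexSketch.result_shape([[0, 1], [5, 10, 15]])
# => {2, 3}
```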

get_slice(array, opts)

@spec get_slice(
  t(),
  keyword()
) :: {:ok, binary() | tuple()} | {:error, term()}

Gets a slice of data from the array.

Reads a rectangular region from the array. Only the chunks that overlap with the requested region are loaded and decompressed. This allows efficient access to subsets of large arrays.

Options

  • :start - Starting index for each dimension (default: all zeros)
  • :stop - Stopping index for each dimension (default: array shape)

Named dimensions (v3 arrays only):

  • Use dimension names as options with Range or tuple values
  • Example: time: 0..30, latitude: 0..179, longitude: 0..359
  • Range values are inclusive (0..30 means elements 0 through 30)

Examples

# Read a 100x100 region from a larger array (numeric)
{:ok, data} = ExZarr.Array.get_slice(array,
  start: {0, 0},
  stop: {100, 100}
)

# Read using named dimensions (v3 arrays)
{:ok, data} = ExZarr.Array.get_slice(array,
  time: 0..30,
  latitude: 0..179,
  longitude: 0..359
)

# Read entire first row of a 2D array
{:ok, data} = ExZarr.Array.get_slice(array,
  start: {0, 0},
  stop: {1, 1000}
)

Returns

  • {:ok, binary} containing the requested data in row-major order
  • {:error, reason} on failure
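
Two details above are easy to get wrong: inclusive ranges versus the exclusive :stop index, and turning the returned row-major binary back into numbers. The sketch below covers both. SliceSketch is a hypothetical helper, and the little-endian :float64 layout is an assumption; check the array's dtype and byte order before decoding real data.

```elixir
defmodule SliceSketch do
  # An inclusive range first..last covers the same elements as the
  # half-open pair {start, stop} = {first, last + 1}.
  def range_to_bounds(%Range{first: first, last: last}), do: {first, last + 1}

  # Decodes a row-major binary of float64 values into a list of rows.
  # Assumes little-endian encoding.
  def decode_float64_rows(binary, columns) do
    values = for <<value::little-float-size(64) <- binary>>, do: value
    Enum.chunk_every(values, columns)
  end
end

SliceSketch.range_to_bounds(0..30)
# => {0, 31}

binary = for v <- [1.0, 2.0, 3.0, 4.0], into: <<>>, do: <<v::little-float-size(64)>>
SliceSketch.decode_float64_rows(binary, 2)
# => [[1.0, 2.0], [3.0, 4.0]]
```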

itemsize(array)

@spec itemsize(t()) :: non_neg_integer()

Returns the size of each element in bytes.

Different data types have different sizes:

  • :int8, :uint8 - 1 byte
  • :int16, :uint16 - 2 bytes
  • :int32, :uint32, :float32 - 4 bytes
  • :int64, :uint64, :float64 - 8 bytes

Examples

{:ok, array} = ExZarr.Array.create(
  shape: {100},
  chunks: {10},
  dtype: :float64
)
ExZarr.Array.itemsize(array)
# => 8

{:ok, array} = ExZarr.Array.create(
  shape: {100},
  chunks: {10},
  dtype: :uint8
)
ExZarr.Array.itemsize(array)
# => 1

Returns

Integer representing bytes per element (1, 2, 4, or 8).
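
For sizing buffers without an array at hand, the table above can be written as a lookup. ItemsizeSketch is a hypothetical standalone module that mirrors the documented sizes; it is not how ExZarr necessarily stores them.

```elixir
defmodule ItemsizeSketch do
  # Bytes per element, taken from the dtype list above.
  @sizes %{
    int8: 1, uint8: 1,
    int16: 2, uint16: 2,
    int32: 4, uint32: 4, float32: 4,
    int64: 8, uint64: 8, float64: 8
  }

  def itemsize(dtype), do: Map.fetch!(@sizes, dtype)

  # Uncompressed size in bytes of an array with the given shape and dtype.
  def nbytes(shape, dtype) do
    elements = shape |> Tuple.to_list() |> Enum.product()
    elements * itemsize(dtype)
  end
end

ItemsizeSketch.itemsize(:uint16)
# => 2
ItemsizeSketch.nbytes({1000, 1000}, :float64)
# => 8000000
```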

ndim(array)

@spec ndim(t()) :: non_neg_integer()

Returns the number of dimensions in the array.

Examples

{:ok, array} = ExZarr.Array.create(shape: {100, 200, 300}, chunks: {10, 20, 30})
ExZarr.Array.ndim(array)
# => 3

Returns

Integer indicating the number of dimensions (1 for 1D, 2 for 2D, etc.)

open(opts)

@spec open(keyword()) :: {:ok, t()} | {:error, term()}

Opens an existing array from storage.

Reads the array metadata from storage and initializes the array structure. The array must have been previously saved using ExZarr or another Zarr v2 compatible implementation.

Options

  • :path - Path to the array directory (required)
  • :storage - Storage backend (default: :filesystem)

Examples

# Open array from filesystem
{:ok, array} = ExZarr.Array.open(path: "/tmp/my_array")

Returns

  • {:ok, array} on success
  • {:error, :path_not_found} if path does not exist
  • {:error, :metadata_not_found} if .zarray file is missing
  • {:error, reason} for other failures

parallel_chunk_map(array, mapper_fn, opts \\ [])

@spec parallel_chunk_map(t(), ({tuple(), binary()} -> term()), keyword()) ::
  Enumerable.t()

Processes chunks in parallel using a mapper function.

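This is a convenience wrapper around chunk_stream/2 that applies a transformation to each chunk in parallel. The results are returned as an enumerable; pipe it into an Enum function such as Enum.to_list/1 to run the work and collect the results.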

Options

  • :max_concurrency - Maximum parallel tasks (default: System.schedulers_online())
  • :timeout - Timeout per chunk in ms (default: 30_000)
  • :on_timeout - :kill_task or :exit (default: :kill_task)
  • :ordered - Maintain order (default: true)

Examples

# Transform all chunks (transform/1 is a user-defined function)
ExZarr.Array.parallel_chunk_map(array, fn {_index, data} ->
  # Process chunk data
  transform(data)
end)
|> Enum.to_list()

# With custom concurrency (process_chunk/1 is a user-defined function)
ExZarr.Array.parallel_chunk_map(array, &process_chunk/1,
  max_concurrency: 8,
  timeout: 60_000
)

resize(array, new_shape)

@spec resize(t(), tuple()) :: {:ok, t()} | {:error, term()}

Resizes the array to a new shape.

Can grow or shrink dimensions. When growing, new elements are filled with the array's fill_value. When shrinking, data outside the new bounds is discarded by deleting affected chunks.

Parameters

  • array - The array to resize
  • new_shape - New shape as a tuple of positive integers

Examples

# Grow array from {100, 100} to {200, 200}
{:ok, array} = ExZarr.Array.resize(array, {200, 200})

# Shrink array from {200, 200} to {100, 100}
{:ok, array} = ExZarr.Array.resize(array, {100, 100})

Returns

  • {:ok, updated_array} on success with the array struct updated to reflect the new shape
  • {:error, reason} on failure

Notes

  • New elements are lazily created (filled with fill_value on first read)
  • Shrinking may delete chunks to free storage space
  • The array metadata is updated with the new shape
  • Returns a new array struct with updated shape (Elixir structs are immutable)
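
To make the shrinking behaviour concrete, the sketch below lists the chunks that lie entirely outside a new, smaller shape on a regular grid. ResizeSketch is a hypothetical module; how ExZarr treats chunks that are only partly outside the new bounds is not covered here and should not be inferred from it.

```elixir
defmodule ResizeSketch do
  # Number of chunks along each axis (ceiling division).
  def grid(shape, chunks) do
    Tuple.to_list(shape)
    |> Enum.zip(Tuple.to_list(chunks))
    |> Enum.map(fn {dim, size} -> div(dim + size - 1, size) end)
  end

  # Chunk indices of the old grid that start at or beyond the new extent
  # on at least one axis, and therefore hold no data after shrinking.
  def dropped_chunks(old_shape, new_shape, chunks) do
    limits = Enum.zip(Tuple.to_list(chunks), Tuple.to_list(new_shape))

    old_shape
    |> grid(chunks)
    |> Enum.map(fn count -> Enum.to_list(0..(count - 1)) end)
    |> cartesian()
    |> Enum.filter(fn index ->
      index
      |> Enum.zip(limits)
      |> Enum.any?(fn {i, {size, new_dim}} -> i * size >= new_dim end)
    end)
    |> Enum.map(&List.to_tuple/1)
  end

  defp cartesian([]), do: [[]]

  defp cartesian([axis | rest]) do
    for i <- axis, tail <- cartesian(rest), do: [i | tail]
  end
end

ResizeSketch.dropped_chunks({200, 200}, {100, 100}, {100, 100})
# => [{0, 1}, {1, 0}, {1, 1}]
```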

save(array, opts)

@spec save(
  t(),
  keyword()
) :: :ok | {:error, term()}

Saves the array metadata to storage.

Writes the array configuration to a .zarray file in the storage location. This persists the array structure, allowing it to be reopened later. Note that chunk data is written separately when chunks are modified.

Options

  • :path - Path where metadata should be written (for new filesystem storage)

Examples

{:ok, array} = ExZarr.Array.create(shape: {1000}, chunks: {100})
:ok = ExZarr.Array.save(array, path: "/tmp/my_array")

Returns

  • :ok on success
  • {:ok, storage} for in-memory storage (returns updated storage)
  • {:error, reason} on failure

set_slice(array, data, opts)

@spec set_slice(t(), binary(), keyword()) :: :ok | {:error, term()}

Sets a slice of data in the array.

Writes data to a rectangular region in the array. The data is automatically split into chunks, compressed, and written to storage. Only the affected chunks are modified.

Parameters

  • array - The array to write to
  • data - Binary data to write (must match region size and dtype)
  • opts - Options including :start and :stop indices

Options

  • :start - Starting index for the write (default: all zeros)
  • :stop - Stopping index for the write (required for correct multi-dimensional writes)

Examples

# Write 10x10 block of data
data = <<...>>  # 100 int32 values = 400 bytes
:ok = ExZarr.Array.set_slice(array, data,
  start: {0, 0},
  stop: {10, 10}
)

# Write to beginning of 1D array
data = <<...>>  # 100 int32 values
:ok = ExZarr.Array.set_slice(array, data, start: {0}, stop: {100})

Returns

  • :ok on success
  • {:error, reason} on failure
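
The examples above elide how the binary is built. One possible way to pack integers for an :int32 array is sketched below. SetSliceSketch is a hypothetical helper, and little-endian byte order is an assumption; the byte count check reflects the rule that the data must match the region size and dtype.

```elixir
defmodule SetSliceSketch do
  # Packs integers as 32-bit signed values. Little-endian is assumed.
  def encode_int32(values) do
    for v <- values, into: <<>>, do: <<v::little-signed-integer-size(32)>>
  end

  # Bytes required for the half-open region [start, stop).
  def expected_bytes(start, stop, itemsize) do
    elements =
      Tuple.to_list(start)
      |> Enum.zip(Tuple.to_list(stop))
      |> Enum.map(fn {lo, hi} -> hi - lo end)
      |> Enum.product()

    elements * itemsize
  end
end

# 100 int32 values for a 10x10 block, in row-major order.
data = SetSliceSketch.encode_int32(1..100)
byte_size(data)
# => 400
SetSliceSketch.expected_bytes({0, 0}, {10, 10}, 4)
# => 400
```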

size(array)

@spec size(t()) :: non_neg_integer()

Returns the total number of elements in the array.

Calculates the product of all dimensions in the shape.

Examples

{:ok, array} = ExZarr.Array.create(shape: {100, 200}, chunks: {10, 20})
ExZarr.Array.size(array)
# => 20000 (100 * 200)

Returns

Non-negative integer representing total element count.

stream_chunks(array, opts \\ [])

@spec stream_chunks(
  t(),
  keyword()
) :: Enumerable.t()

Streams chunks lazily as {chunk_index, data} tuples.

This is the canonical v1.1 streaming read API. Each chunk is read, decompressed, and yielded on demand so memory stays bounded regardless of array size.

Options

  • :concurrency - Number of concurrent chunk reads (default: 1)
  • :parallel - Alias for :concurrency (deprecated, use :concurrency)
  • :ordered - Maintain chunk order in output (default: true)
  • :timeout - Per-chunk timeout in milliseconds (default: 60_000)
  • :progress_callback - Function called with (done, total) progress updates
  • :filter - Function to filter which chunk indices to include
  • :include_missing - Stream all logical chunk indices, not only stored chunks
  • :metadata - When true, yield %{index:, data:, metadata:} maps
  • :on_error - :skip, :halt, or fn index, reason -> ... end

Examples

array
|> ExZarr.Array.stream_chunks()
|> Stream.map(fn {index, data} -> process_chunk(index, data) end)
|> Stream.run()

array
|> ExZarr.Array.stream_chunks(concurrency: 4, ordered: false)
|> Enum.to_list()

array
|> ExZarr.Array.stream_chunks(metadata: true)
|> Enum.map(fn %{index: index, metadata: meta} ->
  {index, meta.bounds}
end)

Performance

Chunk data memory is bounded by :concurrency (one chunk buffer per in-flight read). The chunk index list is materialized up front and is O(number of chunks). Increase :concurrency for cloud storage backends where network latency dominates.
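
The memory bound can be pictured with a standalone sketch built on Task.async_stream/3 from the Elixir standard library, where at most :concurrency reads are in flight at once. BoundedReadSketch and read_fun are hypothetical stand-ins; this is not a claim about how stream_chunks/2 is implemented.

```elixir
defmodule BoundedReadSketch do
  # read_fun stands in for whatever reads and decompresses one chunk.
  # At most `concurrency` calls to read_fun run at the same time.
  def stream(chunk_indices, read_fun, opts \\ []) do
    chunk_indices
    |> Task.async_stream(
      fn index -> {index, read_fun.(index)} end,
      max_concurrency: Keyword.get(opts, :concurrency, 1),
      ordered: Keyword.get(opts, :ordered, true),
      timeout: Keyword.get(opts, :timeout, 60_000)
    )
    |> Stream.map(fn {:ok, result} -> result end)
  end
end

# A fake reader that returns a small binary per chunk index.
fake_read = fn {i} -> <<i>> end

BoundedReadSketch.stream([{0}, {1}, {2}], fake_read, concurrency: 2)
|> Enum.to_list()
# => [{{0}, <<0>>}, {{1}, <<1>>}, {{2}, <<2>>}]
```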

stream_slices(array, along, opts \\ [])

@spec stream_slices(t(), non_neg_integer(), keyword()) :: Enumerable.t()

Streams array slices along a dimension.

Yields {slice_start, data} tuples, one for each unit slice along the dimension given by along. This is useful for row-wise or time-step processing without loading the full array.

Arguments

  • array - The source array
  • along - Dimension index to slice along (0-based)
  • opts - Slice bounds and streaming options

Slice Options

  • :start - Slice region start (default: all zeros)
  • :stop - Slice region stop (default: array shape)
  • :step - Step between slices (default: 1)

Streaming options (:concurrency, :ordered, :timeout, :metadata, :on_error) match stream_chunks/2. Options :filter and :include_missing apply only to chunk streaming.

Each unit slice calls get_slice/2 independently. Slices that overlap the same underlying chunks re-read and decompress those chunks. For row-wise iteration on large arrays, prefer stream_chunks/2 or enable chunk caching.
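
To see how many independent reads a call produces, the sketch below enumerates the start and stop bounds of each unit slice. UnitSliceSketch is a hypothetical helper, not the library's code, and it relies on stepped ranges (Elixir 1.12 or later).

```elixir
defmodule UnitSliceSketch do
  # One {start, stop} pair per unit slice along dimension `along`.
  def bounds(start, stop, along, step \\ 1) do
    first = elem(start, along)
    last = elem(stop, along) - 1

    for i <- first..last//step do
      {put_elem(start, along, i), put_elem(stop, along, i + 1)}
    end
  end
end

UnitSliceSketch.bounds({100, 0, 0}, {103, 180, 360}, 0)
# => [
#      {{100, 0, 0}, {101, 180, 360}},
#      {{101, 0, 0}, {102, 180, 360}},
#      {{102, 0, 0}, {103, 180, 360}}
#    ]
```

With the bounds from the second example above (start: {100, 0, 0}, stop: {200, 180, 360}), this yields 100 slices, each of which would be read separately.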

Examples

# Stream each row of a 2D array
array
|> ExZarr.Array.stream_slices(0)
|> Enum.each(fn {_start, row_data} -> process_row(row_data) end)

# Stream time steps from a region
array
|> ExZarr.Array.stream_slices(0,
  start: {100, 0, 0},
  stop: {200, 180, 360},
  concurrency: 4
)
|> Enum.to_list()

to_binary(array)

@spec to_binary(t()) :: {:ok, binary()} | {:error, term()}

Converts the entire array to a binary.

Reads all chunks from the array and assembles them into a single binary in row-major (C-order) format. This is useful for loading complete arrays but may use significant memory for large arrays.

Examples

{:ok, array} = ExZarr.Array.create(shape: {10, 10}, chunks: {5, 5})
{:ok, data} = ExZarr.Array.to_binary(array)
# data is a binary with 10 * 10 * itemsize bytes

Returns

  • {:ok, binary} containing all array data
  • {:error, reason} on failure

Memory Warning

This loads the entire array into memory. For a {1000, 1000} array of :float64, this requires about 8 MB (1,000,000 elements × 8 bytes).

write_stream(array, stream, opts \\ [])

@spec write_stream(t(), Enumerable.t(), keyword()) :: {:ok, map()} | {:error, term()}

Writes chunks from a stream into an array.

Accepts a stream of {chunk_index, binary} tuples or %{index:, data:} maps. Each chunk is encoded and written atomically.

Options

  • :batch_size - Chunks to process per batch before optional flush (default: 1)
  • :validate - Validate chunk byte size before writing (default: true)
  • :checkpoint - fn stats -> ... end called after each successful batch
  • :on_error - :halt, :skip, or fn index, reason -> ... end
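
The kind of check implied by :validate can be sketched as a byte-size comparison. ChunkValidateSketch and the :chunk_size_mismatch reason are hypothetical names, and the assumption that every chunk (including edge chunks) carries the full chunk shape is not confirmed by this page.

```elixir
defmodule ChunkValidateSketch do
  # Compares a chunk binary against the size implied by the chunk shape
  # and the element size. Assumes every chunk is stored at full size.
  def validate(data, chunks, itemsize) when is_binary(data) do
    expected = (chunks |> Tuple.to_list() |> Enum.product()) * itemsize

    case byte_size(data) do
      ^expected ->
        :ok

      actual ->
        {:error, {:chunk_size_mismatch, %{expected: expected, actual: actual}}}
    end
  end
end

# A {50, 100} chunk of 8-byte elements is 40,000 bytes.
ChunkValidateSketch.validate(:binary.copy(<<0>>, 40_000), {50, 100}, 8)
# => :ok
ChunkValidateSketch.validate(<<0>>, {50, 100}, 8)
# => {:error, {:chunk_size_mismatch, %{expected: 40000, actual: 1}}}
```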

Durability

Object storage backends (S3, GCS, Azure) provide atomic single-object writes. The filesystem backend serializes concurrent writers with file locks but does not use temp-file rename; a crash mid-write can leave a truncated chunk file. A failed stream leaves previously written chunks intact. Use :checkpoint to record progress for resumable ingestion.

Examples

ExZarr.Array.write_stream(array, chunks)
|> case do
  {:ok, %{written: n}} -> IO.puts("Wrote #{n} chunks")
  {:error, reason} -> IO.inspect(reason)
end

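# Read fixed-size 40,000-byte chunks from a file and write them in batches.
# chunk_index/1 is a user-defined function that maps a sequence number to a
# chunk index tuple.
# On Elixir versions before 1.16, use File.stream!("chunks.bin", [], 40_000).
File.stream!("chunks.bin", 40_000)
|> Stream.with_index()
|> Stream.map(fn {data, i} -> {chunk_index(i), data} end)
|> then(&ExZarr.Array.write_stream(array, &1, batch_size: 8))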