ExZarr.Array (ExZarr v1.1.0)
N-dimensional array implementation with chunking and compression support.
Arrays are the core data structure in ExZarr. They provide:
- Arbitrary N-dimensional shapes (1D to N-D)
- Chunked storage for efficient I/O and memory usage
- Compression using various codecs (zlib, zstd, lz4, or none)
- Support for 10 data types (integers, unsigned integers, and floats)
- Persistent storage on filesystem or temporary in-memory storage
- Lazy loading of chunks (only reads what is needed)
Array Structure
An array consists of:
- Shape: The dimensions of the array (e.g., {1000, 1000} for a 2D array)
- Chunks: The size of each chunk for storage (e.g., {100, 100})
- Dtype: The data type of elements (e.g., :float64, :int32)
- Compressor: The compression codec used for chunks
- Fill value: The default value for uninitialized elements
Memory Efficiency
Arrays use chunked storage to avoid loading entire arrays into memory. Only the chunks needed for a specific operation are loaded and decompressed. This allows working with arrays larger than available RAM.
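As a rough illustration of why chunking bounds memory, the sketch below compares the decoded size of one chunk with the decoded size of the whole array. MemorySketch is a hypothetical helper, not part of ExZarr, and it ignores compression and caching.

```elixir
defmodule MemorySketch do
  @moduledoc false

  # Uncompressed bytes needed to hold every element of `shape`,
  # given `itemsize` bytes per element.
  def decoded_bytes(shape, itemsize) when is_tuple(shape) do
    count = shape |> Tuple.to_list() |> Enum.reduce(1, &(&1 * &2))
    count * itemsize
  end
end

# A {1000, 1000} :float64 array stored in {100, 100} chunks (8 bytes per element).
whole = MemorySketch.decoded_bytes({1000, 1000}, 8)
one_chunk = MemorySketch.decoded_bytes({100, 100}, 8)
IO.puts("whole array: #{whole} bytes, one chunk: #{one_chunk} bytes")
```

A chunk-aligned 100x100 read therefore decodes 80,000 bytes instead of the full 8,000,000.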
Examples
# Create a 2D array
{:ok, array} = ExZarr.Array.create(
shape: {1000, 1000},
chunks: {100, 100},
dtype: :float64,
compressor: :zlib,
storage: :memory
)
# Query array properties
ExZarr.Array.ndim(array) # => 2
ExZarr.Array.size(array) # => 1000000
ExZarr.Array.itemsize(array) # => 8 (bytes per float64)
# Convert to binary
{:ok, data} = ExZarr.Array.to_binary(array)
Summary
Functions
append/3: Appends data along an axis.
child_spec/1: Returns a specification to start this module under a supervisor.
chunk_stream/2: Streams chunks lazily as {chunk_index, data} tuples (alias for stream_chunks/2).
create/1: Creates a new array with the specified configuration.
get_blocks/2: Block indexing for chunk-aligned access.
get_boolean/2: Boolean indexing with a mask.
get_chunk_bounds/2: Gets the chunk bounds for a given chunk index, considering chunk grids.
get_fancy/2: Fancy indexing with integer arrays (vindex equivalent).
get_orthogonal/2: Orthogonal fancy indexing (oindex equivalent).
get_slice/2: Gets a slice of data from the array.
itemsize/1: Returns the size of each element in bytes.
ndim/1: Returns the number of dimensions in the array.
open/1: Opens an existing array from storage.
parallel_chunk_map/3: Processes chunks in parallel using a mapper function.
resize/2: Resizes the array to a new shape.
save/2: Saves the array metadata to storage.
set_slice/3: Sets a slice of data in the array.
size/1: Returns the total number of elements in the array.
stream_chunks/2: Streams chunks lazily as {chunk_index, data} tuples.
stream_slices/3: Streams array slices along a dimension.
to_binary/1: Converts the entire array to a binary.
write_stream/3: Writes chunks from a stream into an array.
Types
@type t() :: %ExZarr.Array{ cache_enabled: boolean(), chunk_grid_module: module() | nil, chunk_grid_state: struct() | nil, chunks: tuple(), compressor: ExZarr.compressor(), dtype: ExZarr.dtype(), fill_value: number() | nil, metadata: ExZarr.Metadata.t() | ExZarr.MetadataV3.t(), server_pid: pid() | nil, shape: tuple(), storage: ExZarr.Storage.t(), version: 2 | 3 }
Functions
Appends data along an axis.
Efficient for adding data incrementally. The array is resized along the specified axis, and the new data is written at the end.
Parameters
- array - The array to append to
- data - Binary data or tuple to append
- opts - Options including :axis
Options
- :axis - Axis along which to append (default: 0)
Examples
# Append rows to a 2D array
data = <<...>> # New row data
{:ok, array} = ExZarr.Array.append(array, data, axis: 0)
# Append columns to a 2D array
data = <<...>> # New column data
{:ok, array} = ExZarr.Array.append(array, data, axis: 1)
Returns
- {:ok, updated_array} on success, with the array struct updated to reflect the new shape
- {:error, reason} on failure
Notes
- The array is automatically resized to accommodate the new data
- Data size must be compatible with the array shape (all dimensions except axis)
- Returns a new array struct with updated shape (Elixir structs are immutable)
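The shape bookkeeping behind an append can be sketched in plain Elixir. This assumes data is a raw binary in the array's dtype and uses a hypothetical AppendShapeSketch module; it illustrates the compatibility rule in the notes above and is not ExZarr's implementation.

```elixir
defmodule AppendShapeSketch do
  @moduledoc false

  # Returns the shape after appending `data_bytes` bytes along `axis`.
  # All dimensions except `axis` stay fixed, so the data must contain a
  # whole number of unit-thick slabs perpendicular to `axis`.
  def new_shape(shape, data_bytes, itemsize, axis) do
    # Elements in one slab: product of every dimension except `axis`
    slab =
      shape
      |> Tuple.to_list()
      |> List.delete_at(axis)
      |> Enum.reduce(1, &(&1 * &2))

    elements = div(data_bytes, itemsize)

    cond do
      rem(data_bytes, itemsize) != 0 -> {:error, :partial_element}
      rem(elements, slab) != 0 -> {:error, :incompatible_shape}
      true -> {:ok, put_elem(shape, axis, elem(shape, axis) + div(elements, slab))}
    end
  end
end
```

For a {1000, 1000} :float64 array, one appended row is 8,000 bytes and grows the shape to {1001, 1000}.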
Returns a specification to start this module under a supervisor.
See Supervisor.
@spec chunk_stream(t(), keyword()) :: Enumerable.t()
Streams chunks lazily as {chunk_index, data} tuples.
Alias for stream_chunks/2 retained for backward compatibility.
Creates a new array with the specified configuration.
Initializes a new Zarr array with the given shape, chunk size, data type, and compression settings. The array can be stored in memory or on the filesystem.
Options
- :shape - Tuple specifying array dimensions (required)
- :chunks - Tuple specifying chunk dimensions (required)
- :dtype - Data type (default: :float64)
- :compressor - Compression codec (default: :zstd)
- :storage - Storage backend (default: :memory)
- :path - Path for filesystem storage
- :fill_value - Fill value for uninitialized chunks (default: 0)
- :enable_server - Start ArrayServer for coordinated access (default: false)
- :enable_cache - Enable chunk caching (default: false)
Examples
# Simple 1D array
{:ok, array} = ExZarr.Array.create(
shape: {1000},
chunks: {100}
)
# 2D array with specific dtype
{:ok, array} = ExZarr.Array.create(
shape: {500, 500},
chunks: {50, 50},
dtype: :int32,
compressor: :zlib
)
# Array on filesystem
{:ok, array} = ExZarr.Array.create(
shape: {1000, 1000},
chunks: {100, 100},
storage: :filesystem,
path: "/tmp/my_array"
)
Returns
- {:ok, array} on success
- {:error, reason} on failure
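The :shape and :chunks options together fix how many chunks the array is split into. A minimal sketch of that arithmetic, assuming a regular chunk grid; ChunkGridSketch is hypothetical and not part of ExZarr.

```elixir
defmodule ChunkGridSketch do
  @moduledoc false

  # Chunks per dimension: ceil(shape / chunks). The last chunk in a
  # dimension is partial when the shape is not a multiple of the chunk size.
  def grid_shape(shape, chunks) do
    Enum.zip(Tuple.to_list(shape), Tuple.to_list(chunks))
    |> Enum.map(fn {dim, chunk} -> div(dim + chunk - 1, chunk) end)
    |> List.to_tuple()
  end

  # Total number of chunk keys the array can have.
  def chunk_count(shape, chunks) do
    shape
    |> grid_shape(chunks)
    |> Tuple.to_list()
    |> Enum.reduce(1, &(&1 * &2))
  end
end
```

With the filesystem example above, {1000, 1000} in {100, 100} chunks gives a {10, 10} grid of 100 chunks.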
Block indexing for chunk-aligned access.
Returns data in complete chunks for efficient access. The blocks are aligned to chunk boundaries, which can be more efficient than arbitrary slicing as it avoids partial chunk reads.
Parameters
- array - The array to read from
- opts - Options including :start and :stop tuples
Options
- :start - Starting indices (default: all zeros)
- :stop - Stopping indices (default: array shape)
Examples
# Read blocks covering a region
{:ok, blocks} = ExZarr.Array.get_blocks(array, start: {0, 0}, stop: {100, 100})
# Read all blocks
{:ok, blocks} = ExZarr.Array.get_blocks(array, [])
Returns
- {:ok, list} where each element is {start_indices, stop_indices, binary_data}
- {:error, reason} on failure
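One way to picture chunk alignment is to round the requested :start down and :stop up to chunk boundaries, clamping to the array shape. This sketch assumes that behaviour for illustration only; how get_blocks treats unaligned bounds is not specified above, and BlockAlignSketch is hypothetical.

```elixir
defmodule BlockAlignSketch do
  @moduledoc false

  # Expands a [start, stop) region outward to chunk boundaries. Start
  # rounds down to a multiple of the chunk size; stop rounds up and is
  # clamped to the array shape so a trailing partial chunk is not overrun.
  def align(start, stop, chunks, shape) do
    pairs =
      [start, stop, chunks, shape]
      |> Enum.map(&Tuple.to_list/1)
      |> Enum.zip()
      |> Enum.map(fn {s, e, c, dim} ->
        {div(s, c) * c, min(div(e + c - 1, c) * c, dim)}
      end)

    {starts, stops} = Enum.unzip(pairs)
    {List.to_tuple(starts), List.to_tuple(stops)}
  end
end
```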
Boolean indexing with a mask.
Selects elements where the mask is true. The mask must match the array shape. For multidimensional arrays, the mask is flattened and elements are selected in row-major order.
Parameters
- array - The array to index
- mask - Tuple of booleans matching the array shape
Examples
# 1D array
mask = {true, false, true, false, true}
{:ok, selected} = ExZarr.Array.get_boolean(array, mask)
# Returns elements at indices 0, 2, 4
# 2D array
mask = {{true, false, true}, {false, true, false}}
{:ok, selected} = ExZarr.Array.get_boolean(array, mask)
Returns
- {:ok, tuple} containing the selected elements
- {:error, reason} on failure
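The row-major flattening described above can be sketched on a plain list that stands in for the decoded array. BooleanMaskSketch is hypothetical: it mirrors the documented {:ok, tuple} return shape but only checks that the element counts agree, and it is not ExZarr's code.

```elixir
defmodule BooleanMaskSketch do
  @moduledoc false

  # Flattens a (possibly nested) tuple mask into a row-major list.
  def flatten(mask) when is_tuple(mask) do
    mask |> Tuple.to_list() |> Enum.flat_map(&flatten/1)
  end

  def flatten(value) when is_boolean(value), do: [value]

  # Keeps the elements of a row-major list whose mask entry is true.
  def select(elements, mask) do
    flat = flatten(mask)

    if length(flat) != length(elements) do
      {:error, :shape_mismatch}
    else
      selected =
        elements
        |> Enum.zip(flat)
        |> Enum.filter(fn {_value, keep?} -> keep? end)
        |> Enum.map(fn {value, _keep?} -> value end)

      {:ok, List.to_tuple(selected)}
    end
  end
end
```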
Gets the chunk bounds for a given chunk index, considering chunk grids.
For regular arrays, behaves identically to ExZarr.Chunk.chunk_bounds/3. For arrays with irregular chunk grids, uses the grid to determine the actual chunk shape and calculates proper bounds by accumulating sizes.
Parameters
- array - Array struct
- chunk_index - Tuple identifying the chunk
Returns
{start_indices, end_indices} - Tuple of start and end coordinates
Examples
# Regular array
{:ok, array} = ExZarr.create(shape: {1000, 1000}, chunks: {100, 100})
ExZarr.Array.get_chunk_bounds(array, {0, 0})
# => {{0, 0}, {100, 100}}
# Array with irregular grid
{:ok, array} = ExZarr.create(
shape: {100, 200},
chunk_grid: %{
"name" => "irregular",
"configuration" => %{
"chunk_sizes" => [[50, 50], [100, 100]]
}
}
)
ExZarr.Array.get_chunk_bounds(array, {0, 0})
# => {{0, 0}, {50, 100}}
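The clipping rule for regular grids and the accumulation rule for irregular ones can be sketched as follows. ChunkBoundsSketch is hypothetical: it takes the shape and chunk sizes directly instead of an array struct and assumes exclusive end coordinates, which matches the examples above.

```elixir
defmodule ChunkBoundsSketch do
  @moduledoc false

  # Regular grid: a chunk starts at index * chunk_size, and the last chunk
  # in a dimension is clipped to the array shape.
  def regular(shape, chunks, index) do
    [shape, chunks, index]
    |> Enum.map(&Tuple.to_list/1)
    |> Enum.zip()
    |> Enum.map(fn {dim, size, i} -> {i * size, min(i * size + size, dim)} end)
    |> to_bounds()
  end

  # Irregular grid: the start is the sum of all earlier chunk sizes along
  # that dimension, and the end adds this chunk's own size.
  def irregular(chunk_sizes, index) do
    chunk_sizes
    |> Enum.zip(Tuple.to_list(index))
    |> Enum.map(fn {sizes, i} ->
      start = sizes |> Enum.take(i) |> Enum.sum()
      {start, start + Enum.at(sizes, i)}
    end)
    |> to_bounds()
  end

  defp to_bounds(pairs) do
    {starts, stops} = Enum.unzip(pairs)
    {List.to_tuple(starts), List.to_tuple(stops)}
  end
end
```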
Fancy indexing with integer arrays (vindex equivalent).
Select elements using arrays of indices for each dimension. Each index array must have the same length, and the result will have that length in the first dimension.
Parameters
- array - The array to index
- index_arrays - List of index arrays, one per dimension
Examples
# 2D array indexing - select 3 specific elements
{:ok, data} = ExZarr.Array.get_fancy(array, [[0, 1, 2], [5, 10, 15]])
# Returns 3 elements at positions (0,5), (1,10), (2,15)
# 1D array indexing
{:ok, data} = ExZarr.Array.get_fancy(array, [[0, 5, 10, 15]])
Returns
- {:ok, binary} containing the selected elements
- {:error, reason} on failure
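The position-by-position pairing can be illustrated on a plain list holding a decoded array in row-major order. VindexSketch is hypothetical and returns a list rather than a binary. Note that Enum.zip/1 silently truncates to the shortest index array, so this sketch relies on the caller passing equal lengths, as the function above requires.

```elixir
defmodule VindexSketch do
  @moduledoc false

  # Row-major (C-order) flat offset of a coordinate tuple.
  def flat_offset(coord, shape) do
    Enum.zip(Tuple.to_list(coord), Tuple.to_list(shape))
    |> Enum.reduce(0, fn {i, dim}, acc -> acc * dim + i end)
  end

  # The k-th selected element sits at {a0[k], a1[k], ...}.
  def select(elements, shape, index_arrays) do
    index_arrays
    |> Enum.zip()
    |> Enum.map(fn coord -> Enum.at(elements, flat_offset(coord, shape)) end)
  end
end
```

On a {3, 20} array whose element at (r, c) equals r * 20 + c, the indices [[0, 1, 2], [5, 10, 15]] select 5, 30 and 55.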
Orthogonal fancy indexing (oindex equivalent).
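Like get_fancy but treats each index array as defining a separate axis. The result has one axis per index array: its shape is the tuple of the index array lengths, and its element count is the product of those lengths (index arrays of lengths 2 and 3 give a 2x3 result).
Parameters
- array - The array to index
- index_arrays - List of index arrays, one per dimension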
Examples
# 2D array orthogonal indexing
{:ok, data} = ExZarr.Array.get_orthogonal(array, [[0, 1], [5, 10, 15]])
# Returns 2x3 array with elements at all combinations:
# (0,5), (0,10), (0,15), (1,5), (1,10), (1,15)
Returns
- {:ok, binary} containing the selected elements in row-major order
- {:error, reason} on failure
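Orthogonal selection is a Cartesian product of the index arrays. This hypothetical OindexSketch works on a plain row-major list. For the index arrays [[0, 1], [5, 10, 15]] it yields the six combinations listed above, in the same order.

```elixir
defmodule OindexSketch do
  @moduledoc false

  defp flat_offset(coord, shape) do
    Enum.zip(coord, Tuple.to_list(shape))
    |> Enum.reduce(0, fn {i, dim}, acc -> acc * dim + i end)
  end

  # Every combination of the index arrays, with the last axis varying
  # fastest so the coordinates come out in row-major order.
  def coords(index_arrays) do
    index_arrays
    |> Enum.reverse()
    |> Enum.reduce([[]], fn axis, acc ->
      for i <- axis, rest <- acc, do: [i | rest]
    end)
  end

  def select(elements, shape, index_arrays) do
    index_arrays
    |> coords()
    |> Enum.map(fn coord -> Enum.at(elements, flat_offset(coord, shape)) end)
  end
end
```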
Gets a slice of data from the array.
Reads a rectangular region from the array. Only the chunks that overlap with the requested region are loaded and decompressed. This allows efficient access to subsets of large arrays.
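On a regular grid, integer division determines which chunks a slice touches. OverlapSketch is hypothetical and assumes an exclusive :stop, as in the numeric examples of this function. Its stepped ranges require Elixir 1.12 or later.

```elixir
defmodule OverlapSketch do
  @moduledoc false

  # Chunk index range per dimension that intersects [start, stop).
  # The //1 step makes an empty region yield an empty range.
  def chunk_ranges(start, stop, chunks) do
    [start, stop, chunks]
    |> Enum.map(&Tuple.to_list/1)
    |> Enum.zip()
    |> Enum.map(fn {s, e, c} -> div(s, c)..div(e - 1, c)//1 end)
  end

  # All chunk indices that must be loaded, in row-major order.
  def chunk_indices(start, stop, chunks) do
    start
    |> chunk_ranges(stop, chunks)
    |> Enum.reverse()
    |> Enum.reduce([[]], fn range, acc ->
      for i <- range, rest <- acc, do: [i | rest]
    end)
    |> Enum.map(&List.to_tuple/1)
  end
end
```

For a {1000, 1000} array in {100, 100} chunks, reading the first row (start {0, 0}, stop {1, 1000}) touches 10 chunks, while a chunk-aligned 100x100 region touches one.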
Options
- :start - Starting index for each dimension (default: all zeros)
- :stop - Stopping index for each dimension, exclusive (default: array shape)
Named dimensions (v3 arrays only):
- Use dimension names as options with Range or tuple values
- Example: time: 0..30, latitude: 0..179, longitude: 0..359
- Range values are inclusive (0..30 means elements 0 through 30)
Examples
# Read a 100x100 region from a larger array (numeric)
{:ok, data} = ExZarr.Array.get_slice(array,
start: {0, 0},
stop: {100, 100}
)
# Read using named dimensions (v3 arrays)
{:ok, data} = ExZarr.Array.get_slice(array,
time: 0..30,
latitude: 0..179,
longitude: 0..359
)
# Read the entire first row of a 2D array with 1000 columns
{:ok, data} = ExZarr.Array.get_slice(array,
start: {0, 0},
stop: {1, 1000}
)
Returns
- {:ok, binary} containing the requested data in row-major order
- {:error, reason} on failure
@spec itemsize(t()) :: non_neg_integer()
Returns the size of each element in bytes.
Different data types have different sizes:
- :int8, :uint8 - 1 byte
- :int16, :uint16 - 2 bytes
- :int32, :uint32, :float32 - 4 bytes
- :int64, :uint64, :float64 - 8 bytes
Examples
{:ok, array} = ExZarr.Array.create(
shape: {100},
chunks: {10},
dtype: :float64
)
ExZarr.Array.itemsize(array)
# => 8
{:ok, array} = ExZarr.Array.create(
shape: {100},
chunks: {10},
dtype: :uint8
)
ExZarr.Array.itemsize(array)
# => 1
Returns
Integer representing bytes per element (1, 2, 4, or 8).
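The size table above can be expressed as a lookup map. Multiplying by an element count then gives an uncompressed byte size: 100 :int32 values occupy 400 bytes, for example. ItemsizeSketch is a hypothetical illustration, not ExZarr's implementation.

```elixir
defmodule ItemsizeSketch do
  @moduledoc false

  # Bytes per element for each of the 10 documented dtypes.
  @sizes %{
    int8: 1, uint8: 1,
    int16: 2, uint16: 2,
    int32: 4, uint32: 4, float32: 4,
    int64: 8, uint64: 8, float64: 8
  }

  # Raises KeyError for an unknown dtype.
  def itemsize(dtype), do: Map.fetch!(@sizes, dtype)

  # Uncompressed byte size of `count` elements of `dtype`.
  def nbytes(dtype, count), do: itemsize(dtype) * count
end
```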
@spec ndim(t()) :: non_neg_integer()
Returns the number of dimensions in the array.
Examples
{:ok, array} = ExZarr.Array.create(shape: {100, 200, 300}, chunks: {10, 20, 30})
ExZarr.Array.ndim(array)
# => 3
Returns
Integer indicating the number of dimensions (1 for 1D, 2 for 2D, etc.)
Opens an existing array from storage.
Reads the array metadata from storage and initializes the array structure. The array must have been previously saved using ExZarr or another Zarr v2 compatible implementation.
Options
- :path - Path to the array directory (required)
- :storage - Storage backend (default: :filesystem)
Examples
# Open array from filesystem
{:ok, array} = ExZarr.Array.open(path: "/tmp/my_array")
Returns
- {:ok, array} on success
- {:error, :path_not_found} if the path does not exist
- {:error, :metadata_not_found} if the .zarray file is missing
- {:error, reason} for other failures
Processes chunks in parallel using a mapper function.
This is a convenience wrapper around chunk_stream/2 that applies a transformation
to each chunk in parallel and collects the results.
Options
- :max_concurrency - Maximum parallel tasks (default: System.schedulers_online())
- :timeout - Timeout per chunk in ms (default: 30_000)
- :on_timeout - :kill_task or :exit (default: :kill_task)
- :ordered - Maintain order (default: true)
Examples
# Transform all chunks
# transform/1 is a user-supplied function
ExZarr.Array.parallel_chunk_map(array, fn {_index, data} ->
# Process chunk data
transform(data)
end)
|> Enum.to_list()
# With custom concurrency; process_chunk/1 receives the {index, data} tuple
ExZarr.Array.parallel_chunk_map(array, &process_chunk/1,
max_concurrency: 8,
timeout: 60_000
)
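These options have the same names and meanings as those of Elixir's Task.async_stream/3, so a comparable pipeline can be built with the standard library alone. This is an assumption about the design, not ExZarr's implementation. ParallelMapSketch is hypothetical and works on an in-memory list of {index, data} tuples.

```elixir
defmodule ParallelMapSketch do
  @moduledoc false

  # Applies `fun` to each {index, data} tuple with bounded concurrency.
  # Task.async_stream/3 yields {:ok, result}, or {:exit, :timeout} for a
  # chunk that times out when on_timeout: :kill_task is set.
  def map_chunks(chunks, fun, opts \\ []) do
    chunks
    |> Task.async_stream(fun,
      max_concurrency: Keyword.get(opts, :max_concurrency, System.schedulers_online()),
      timeout: Keyword.get(opts, :timeout, 30_000),
      on_timeout: Keyword.get(opts, :on_timeout, :kill_task),
      ordered: Keyword.get(opts, :ordered, true)
    )
    |> Stream.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
    end)
  end
end

# Hypothetical in-memory chunks as {chunk_index, binary} tuples.
chunks = [{{0}, <<1, 2, 3>>}, {{1}, <<4, 5>>}]

chunks
|> ParallelMapSketch.map_chunks(fn {index, data} -> {index, byte_size(data)} end)
|> Enum.to_list()
```

Because the result is a lazy stream, nothing runs until it is consumed, which is consistent with the Enum.to_list() call in the first example above.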
Resizes the array to a new shape.
Can grow or shrink dimensions. When growing, new elements are filled with the array's fill_value. When shrinking, data outside the new bounds is discarded by deleting affected chunks.
Parameters
- array - The array to resize
- new_shape - New shape as a tuple of positive integers
Examples
# Grow array from {100, 100} to {200, 200}
{:ok, array} = ExZarr.Array.resize(array, {200, 200})
# Shrink array from {200, 200} to {100, 100}
{:ok, array} = ExZarr.Array.resize(array, {100, 100})
Returns
- {:ok, updated_array} on success, with the array struct updated to reflect the new shape
- {:error, reason} on failure
Notes
- New elements are lazily created (filled with fill_value on first read)
- Shrinking may delete chunks to free storage space
- The array metadata is updated with the new shape
- Returns a new array struct with updated shape (Elixir structs are immutable)
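When an array shrinks, chunks whose first element lies past the new bound hold no surviving data. The hypothetical ResizeSketch below lists those chunk keys for a regular grid. Whether ExZarr also rewrites chunks that straddle the new boundary is not stated above, so the sketch leaves them alone. Its stepped ranges require Elixir 1.12 or later.

```elixir
defmodule ResizeSketch do
  @moduledoc false

  # Chunk indices of the old regular grid that lie entirely outside
  # `new_shape`: in some dimension, the chunk starts at or past the bound.
  def chunks_outside(old_shape, new_shape, chunks) do
    Enum.zip(Tuple.to_list(old_shape), Tuple.to_list(chunks))
    # Old grid: ceil(dim / chunk) chunks per dimension
    |> Enum.map(fn {dim, c} -> 0..(div(dim + c - 1, c) - 1)//1 end)
    # Row-major Cartesian product of the per-dimension ranges
    |> Enum.reverse()
    |> Enum.reduce([[]], fn range, acc -> for i <- range, rest <- acc, do: [i | rest] end)
    |> Enum.map(&List.to_tuple/1)
    |> Enum.filter(fn index ->
      [Tuple.to_list(index), Tuple.to_list(chunks), Tuple.to_list(new_shape)]
      |> Enum.zip()
      |> Enum.any?(fn {i, c, dim} -> i * c >= dim end)
    end)
  end
end
```

Shrinking {200, 200} to {100, 100} with {100, 100} chunks leaves only chunk {0, 0}. The other three chunks fall entirely outside the new bounds.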
Saves the array metadata to storage.
Writes the array configuration to a .zarray file in the storage location.
This persists the array structure, allowing it to be reopened later.
Note that chunk data is written separately when chunks are modified.
Options
- :path - Path where metadata should be written (for new filesystem storage)
Examples
{:ok, array} = ExZarr.Array.create(shape: {1000}, chunks: {100}, storage: :filesystem, path: "/tmp/my_array")
:ok = ExZarr.Array.save(array, path: "/tmp/my_array")
Returns
- :ok on success
- {:ok, storage} for in-memory storage (returns the updated storage)
- {:error, reason} on failure
Sets a slice of data in the array.
Writes data to a rectangular region in the array. The data is automatically split into chunks, compressed, and written to storage. Only the affected chunks are modified.
Parameters
- array - The array to write to
- data - Binary data to write (must match region size and dtype)
- opts - Options including :start and :stop indices
Options
- :start - Starting index for the write (default: all zeros)
- :stop - Stopping index for the write (required for correct multi-dimensional writes)
Examples
# Write 10x10 block of data
data = <<...>> # 100 int32 values = 400 bytes
:ok = ExZarr.Array.set_slice(array, data,
start: {0, 0},
stop: {10, 10}
)
# Write to beginning of 1D array
data = <<...>> # 100 int32 values
:ok = ExZarr.Array.set_slice(array, data, start: {0}, stop: {100})
Returns
- :ok on success
- {:error, reason} on failure
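For the 10x10 example, the data binary must contain 100 :int32 values in row-major order, which is 400 bytes. A minimal sketch of building and inspecting such a binary follows. It assumes little-endian encoding, which may not match the byte order your array's dtype expects, and Int32BlockSketch is hypothetical.

```elixir
defmodule Int32BlockSketch do
  @moduledoc false

  # Packs rows of integers into a row-major binary of signed 32-bit values.
  # Little-endian byte order is an assumption for this sketch.
  def pack(rows) do
    for row <- rows, value <- row, into: <<>>, do: <<value::signed-little-32>>
  end

  # Reads element {r, c} back from a packed block with `cols` columns.
  def at(binary, {r, c}, cols) do
    offset = (r * cols + c) * 4
    <<_::binary-size(offset), value::signed-little-32, _::binary>> = binary
    value
  end
end

# A 10x10 block whose element at {r, c} is r * 10 + c.
rows = for r <- 0..9, do: for(c <- 0..9, do: r * 10 + c)
data = Int32BlockSketch.pack(rows)
```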
@spec size(t()) :: non_neg_integer()
Returns the total number of elements in the array.
Calculates the product of all dimensions in the shape.
Examples
{:ok, array} = ExZarr.Array.create(shape: {100, 200}, chunks: {10, 20})
ExZarr.Array.size(array)
# => 20000 (100 * 200)
Returns
Non-negative integer representing total element count.
@spec stream_chunks(t(), keyword()) :: Enumerable.t()
Streams chunks lazily as {chunk_index, data} tuples.
This is the canonical v1.1 streaming read API. Each chunk is read, decompressed, and yielded on demand so memory stays bounded regardless of array size.
Options
- :concurrency - Number of concurrent chunk reads (default: 1)
- :parallel - Alias for :concurrency (deprecated; use :concurrency)
- :ordered - Maintain chunk order in output (default: true)
- :timeout - Per-chunk timeout in milliseconds (default: 60_000)
- :progress_callback - Function called with (done, total) progress updates
- :filter - Function to filter which chunk indices to include
- :include_missing - Stream all logical chunk indices, not only stored chunks
- :metadata - When true, yield %{index:, data:, metadata:} maps
- :on_error - :skip, :halt, or fn index, reason -> ... end
Examples
array
|> ExZarr.Array.stream_chunks()
|> Stream.map(fn {index, data} -> process_chunk(index, data) end)
|> Stream.run()
array
|> ExZarr.Array.stream_chunks(concurrency: 4, ordered: false)
|> Enum.to_list()
array
|> ExZarr.Array.stream_chunks(metadata: true)
|> Enum.map(fn %{index: index, metadata: meta} ->
{index, meta.bounds}
end)
Performance
Chunk data memory is bounded by :concurrency (one chunk buffer per in-flight
read). The chunk index list is materialized up front and is O(number of chunks).
Increase :concurrency for cloud storage backends where network latency dominates.
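The up-front chunk index list can be pictured as follows. On a regular grid, the logical chunk indices are a Cartesian product of per-dimension ranges, so the list length equals the number of chunks. ChunkIndexSketch is hypothetical. Without :include_missing, ExZarr streams only chunks that are actually stored, which this sketch does not model.

```elixir
defmodule ChunkIndexSketch do
  @moduledoc false

  # Every logical chunk index of a regular grid, in row-major order.
  # The result is an eager list, so it grows with the number of chunks.
  def all_indices(shape, chunks) do
    Enum.zip(Tuple.to_list(shape), Tuple.to_list(chunks))
    # ceil(dim / chunk) chunks per dimension, indexed from 0
    |> Enum.map(fn {dim, c} -> 0..(div(dim + c - 1, c) - 1)//1 end)
    # Cartesian product with the last dimension varying fastest
    |> Enum.reverse()
    |> Enum.reduce([[]], fn range, acc -> for i <- range, rest <- acc, do: [i | rest] end)
    |> Enum.map(&List.to_tuple/1)
  end
end
```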
@spec stream_slices(t(), non_neg_integer(), keyword()) :: Enumerable.t()
Streams array slices along a dimension.
Yields {slice_start, data} tuples for each unit slice along the dimension given by along.
This is useful for row-wise or time-step processing without loading the
full array.
Arguments
- array - The source array
- along - Dimension index to slice along (0-based)
- opts - Slice bounds and streaming options
Slice Options
- :start - Slice region start (default: all zeros)
- :stop - Slice region stop (default: array shape)
- :step - Step between slices (default: 1)
Streaming options (:concurrency, :ordered, :timeout, :metadata, :on_error)
match stream_chunks/2. Options :filter and :include_missing apply only to
chunk streaming.
Each unit slice calls get_slice/2 independently. Slices that overlap the same
underlying chunks re-read and decompress those chunks. For row-wise iteration on
large arrays, prefer stream_chunks/2 or enable chunk caching.
Examples
# Stream each row of a 2D array
array
|> ExZarr.Array.stream_slices(0)
|> Enum.each(fn {_start, row_data} -> process_row(row_data) end)
# Stream time steps from a region
array
|> ExZarr.Array.stream_slices(0,
start: {100, 0, 0},
stop: {200, 180, 360},
concurrency: 4
)
|> Enum.to_list()
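Each unit slice corresponds to one get_slice/2 call whose :start and :stop bounds are one element thick in the along dimension. The following is a hypothetical sketch of generating those bounds. SliceWindowSketch is not part of ExZarr, and the exact slice_start value ExZarr yields may differ. Stepped ranges require Elixir 1.12 or later.

```elixir
defmodule SliceWindowSketch do
  @moduledoc false

  # Lazily yields {start, stop} bounds for each unit-thick slice along
  # `along` inside the region [start, stop), advancing by `step`.
  def windows(start, stop, along, step \\ 1) do
    first = elem(start, along)
    last = elem(stop, along) - 1

    Stream.map(first..last//step, fn i ->
      {put_elem(start, along, i), put_elem(stop, along, i + 1)}
    end)
  end
end
```

For the time-step example above (start {100, 0, 0}, stop {200, 180, 360}), this produces 100 slices, the first bounded by {100, 0, 0} and {101, 180, 360}.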
Converts the entire array to a binary.
Reads all chunks from the array and assembles them into a single binary in row-major (C-order) format. This is useful for loading complete arrays but may use significant memory for large arrays.
Examples
{:ok, array} = ExZarr.Array.create(shape: {10, 10}, chunks: {5, 5})
{:ok, data} = ExZarr.Array.to_binary(array)
# data is a binary with 10 * 10 * itemsize bytes
Returns
- {:ok, binary} containing all array data
- {:error, reason} on failure
Memory Warning
This loads the entire array into memory. For a {1000, 1000} array of
:float64, this requires 8 MB (1000 * 1000 * 8 = 8,000,000 bytes) of memory.
@spec write_stream(t(), Enumerable.t(), keyword()) :: {:ok, map()} | {:error, term()}
Writes chunks from a stream into an array.
Accepts a stream of {chunk_index, binary} tuples or
%{index:, data:} maps. Each chunk is encoded and written as a single storage object; whether that write is atomic depends on the backend (see Durability).
Options
- :batch_size - Chunks to process per batch before optional flush (default: 1)
- :validate - Validate chunk byte size before writing (default: true)
- :checkpoint - fn stats -> ... end called after each successful batch
- :on_error - :halt, :skip, or fn index, reason -> ... end
Durability
Object storage backends (S3, GCS, Azure) provide atomic single-object writes.
The filesystem backend serializes concurrent writers with file locks but does
not use temp-file rename; a crash mid-write can leave a truncated chunk file.
A failed stream leaves previously written chunks intact. Use :checkpoint to
record progress for resumable ingestion.
Examples
case ExZarr.Array.write_stream(array, chunks) do
{:ok, %{written: n}} -> IO.puts("Wrote #{n} chunks")
{:error, reason} -> IO.inspect(reason)
end
# Read fixed-size 40_000-byte records; chunk_index/1 is a hypothetical
# user-defined helper mapping a record position to its chunk index tuple
File.stream!("chunks.bin", [], 40_000)
|> Stream.with_index()
|> Stream.map(fn {data, i} -> {chunk_index(i), data} end)
|> then(&ExZarr.Array.write_stream(array, &1, batch_size: 8))
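The batch and checkpoint behaviour described above can be sketched without real storage. The hypothetical WriteStreamSketch validates chunk sizes, writes through a caller-supplied function, and reports running stats after each batch. It models only the %{written: n} stat and omits :on_error. It does not reflect ExZarr's internal code.

```elixir
defmodule WriteStreamSketch do
  @moduledoc false

  # Writes {index, binary} tuples in batches. Each chunk's byte size is
  # checked first; `write_fun` stands in for a storage write and must
  # return :ok. `checkpoint` receives running stats after each batch.
  def run(chunks, expected_bytes, write_fun, opts \\ []) do
    batch_size = Keyword.get(opts, :batch_size, 1)
    checkpoint = Keyword.get(opts, :checkpoint, fn _stats -> :ok end)

    chunks
    |> Stream.chunk_every(batch_size)
    |> Enum.reduce_while({:ok, %{written: 0}}, fn batch, {:ok, stats} ->
      case write_batch(batch, expected_bytes, write_fun) do
        {:ok, n} ->
          stats = %{stats | written: stats.written + n}
          checkpoint.(stats)
          {:cont, {:ok, stats}}

        {:error, _reason} = error ->
          # Earlier batches stay written; the stream stops here.
          {:halt, error}
      end
    end)
  end

  defp write_batch(batch, expected_bytes, write_fun) do
    Enum.reduce_while(batch, {:ok, 0}, fn {index, data}, {:ok, n} ->
      if byte_size(data) == expected_bytes do
        :ok = write_fun.(index, data)
        {:cont, {:ok, n + 1}}
      else
        {:halt, {:error, {:invalid_size, index}}}
      end
    end)
  end
end
```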