ExZarr.Codecs.ShardingIndexed (ExZarr v1.1.0)

Implements the Zarr v3 sharding-indexed codec.

Sharding combines multiple logical chunks into a single physical shard file. This reduces the number of objects a store holds, along with the per-object metadata and request overhead that dominates on cloud storage systems.

Benefits

  • Fewer cloud API calls: storing 100 chunks in 1 shard means writing or listing them takes 1 object request instead of 100 (99% fewer S3 requests)
  • Lower metadata overhead: fewer files or objects for the filesystem or object store to track
  • Better caching: larger units that suit block-based caches
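To make the request arithmetic concrete, the sketch below counts the objects a store holds with and without sharding. ShardCountSketch and object_counts/2 are hypothetical names, not part of ExZarr. It assumes both the chunk grid and the shard shape are given in chunks per dimension, and that a partially filled edge shard still counts as one object.

```elixir
defmodule ShardCountSketch do
  # Hypothetical helper (not part of ExZarr).
  # chunk_grid:  number of chunks along each dimension, e.g. {100, 100}
  # shard_shape: number of chunks per shard along each dimension, e.g. {10, 10}
  # Returns {objects_without_sharding, objects_with_sharding}.
  def object_counts(chunk_grid, shard_shape) do
    grid = Tuple.to_list(chunk_grid)
    shard = Tuple.to_list(shard_shape)

    # Without sharding, every chunk is its own object.
    unsharded = Enum.reduce(grid, 1, &(&1 * &2))

    sharded =
      grid
      |> Enum.zip(shard)
      # Ceiling division: a partially filled edge shard is still one object.
      |> Enum.map(fn {g, s} -> div(g + s - 1, s) end)
      |> Enum.reduce(1, &(&1 * &2))

    {unsharded, sharded}
  end
end
```

For example, ShardCountSketch.object_counts({100, 100}, {10, 10}) returns {10000, 100}: 10,000 chunk objects become 100 shard objects, the same 99% reduction quoted above.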

Shard Structure

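Layout of a shard written by this codec, with the index at the end:

[Chunk 0 data][Chunk 1 data]...[Chunk N data][Index][Index Size (8 bytes)]

The index is a list of (offset, size) pairs, one per chunk, with each value encoded as a little-endian uint64. The last 8 bytes store the size of the index. The Zarr v3 sharding specification itself defines no such size trailer: there the index is a fixed-size array holding one (offset, nbytes) pair per chunk slot (both set to 2^64 - 1 for an empty slot), so its length follows from the number of chunks per shard and the index codecs.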
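The (offset, size) encoding can be sketched with Elixir's binary syntax. ShardIndexSketch is a hypothetical module, not ExZarr's implementation: it packs each pair as two little-endian uint64 values (16 bytes per chunk) and reads them back. Index codecs such as crc32c, and any trailer, are deliberately left out.

```elixir
defmodule ShardIndexSketch do
  # Hypothetical helpers (not part of ExZarr).

  # Pack [{offset, size}, ...] into 16 bytes per entry:
  # offset as little-endian uint64, then size as little-endian uint64.
  def pack(entries) do
    for {offset, size} <- entries, into: <<>> do
      <<offset::little-unsigned-64, size::little-unsigned-64>>
    end
  end

  # Unpack a packed index back into a list of {offset, size} tuples.
  # The guard rejects binaries that are not a whole number of 16-byte entries.
  def unpack(binary) when rem(byte_size(binary), 16) == 0 do
    for <<offset::little-unsigned-64, size::little-unsigned-64 <- binary>>,
      do: {offset, size}
  end
end
```

Packing the entries for two 4-byte chunks, [{0, 4}, {4, 4}], produces a 32-byte index, and unpack/1 recovers the same list.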

Configuration

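  • :chunk_shape - Tuple defining the shard dimensions in chunks (e.g., {10, 10} for 100 chunks per shard)
  • :codecs - Codec pipeline applied to each chunk within the shard
  • :index_codecs - Codec pipeline applied to the shard index (default: bytes followed by crc32c)
  • :index_location - Where the index is stored in the shard, :start or :end (default: :end)

The example below passes the same options as a JSON-style map with string keys and values (e.g. "index_location" => "end").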
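As a sketch of how these options could map onto the struct fields listed under t() (a tuple chunk_shape and an atom index_location), the hypothetical ShardConfigSketch.normalize/1 below accepts a JSON-style map with string keys and fills in the documented defaults. It is not ExZarr's init/1. The empty default for "codecs" and the error terms are assumptions made for illustration.

```elixir
defmodule ShardConfigSketch do
  # Hypothetical helper (not ExZarr's init/1).
  # Documented default: bytes followed by crc32c.
  @default_index_codecs [%{name: "bytes"}, %{name: "crc32c"}]

  def normalize(config) when is_map(config) do
    with {:ok, shape} <- fetch_shape(config),
         {:ok, location} <- parse_location(Map.get(config, "index_location", "end")) do
      {:ok,
       %{
         chunk_shape: shape,
         # Assumption: an empty chunk pipeline when "codecs" is omitted.
         codecs: Map.get(config, "codecs", []),
         index_codecs: Map.get(config, "index_codecs", @default_index_codecs),
         index_location: location
       }}
    end
  end

  # JSON lists become tuples; every dimension must be a positive integer.
  defp fetch_shape(%{"chunk_shape" => shape}) when is_list(shape) and shape != [] do
    if Enum.all?(shape, &(is_integer(&1) and &1 > 0)),
      do: {:ok, List.to_tuple(shape)},
      else: {:error, :invalid_chunk_shape}
  end

  defp fetch_shape(_), do: {:error, :invalid_chunk_shape}

  # String values become the atoms used by index_location().
  defp parse_location("start"), do: {:ok, :start}
  defp parse_location("end"), do: {:ok, :end}
  defp parse_location(other), do: {:error, {:invalid_index_location, other}}
end
```

Converting at the boundary keeps string handling in one place, so code working with the struct only ever sees the tuple and atom types.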

Example

alias ExZarr.Codecs.ShardingIndexed

config = %{
  "chunk_shape" => [10, 10],
  "codecs" => [
    %{name: "bytes"},
    %{name: "gzip", configuration: %{level: 5}}
  ],
  "index_codecs" => [
    %{name: "bytes"},
    %{name: "crc32c"}
  ],
  "index_location" => "end"
}

{:ok, shard_codec} = ShardingIndexed.init(config)

# Encode multiple chunks into shard
chunks = %{
  {0, 0} => <<1, 2, 3, 4>>,
  {0, 1} => <<5, 6, 7, 8>>,
  {1, 0} => <<9, 10, 11, 12>>
}
{:ok, shard_binary} = ShardingIndexed.encode(chunks, shard_codec)

# Extract specific chunk from shard
{:ok, chunk_data} = ShardingIndexed.decode_chunk(shard_binary, {0, 1}, shard_codec)

# Decode all chunks from shard
{:ok, all_chunks} = ShardingIndexed.decode(shard_binary, shard_codec)

Specification

Zarr v3 Sharding Extension: https://zarr-specs.readthedocs.io/en/latest/v3/codecs/sharding-indexed/v1.0.html

Summary

Functions

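decode(shard_binary, codec) - Decodes all chunks from a shard into a map.

decode_chunk(shard_binary, chunk_index, codec) - Decodes a single chunk from a shard.

encode(chunks, codec) - Encodes multiple chunks into a shard with an embedded index.

init(config) - Initializes the sharding codec from a configuration map.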

Types

chunk_data()

@type chunk_data() :: binary()

chunk_index()

@type chunk_index() :: tuple()

chunks_map()

@type chunks_map() :: %{required(chunk_index()) => chunk_data()}

index_location()

@type index_location() :: :start | :end

shard_binary()

@type shard_binary() :: binary()

shard_index()

@type shard_index() :: %{
  chunk_offsets: %{
    required(chunk_index()) =>
      {offset :: non_neg_integer(), size :: non_neg_integer()}
  },
  chunk_indices: [chunk_index()]
}

t()

@type t() :: %ExZarr.Codecs.ShardingIndexed{
  chunk_shape: tuple(),
  codecs: [map()],
  index_codecs: [map()],
  index_location: index_location()
}

Functions

decode(shard_binary, codec)

@spec decode(shard_binary(), t()) :: {:ok, chunks_map()} | {:error, term()}

Decodes all chunks from a shard into a map.

Parameters

  • shard_binary - The shard data
  • codec - Initialized sharding codec

Returns

  • {:ok, chunks_map} - Map of chunk_index => decoded_data
  • {:error, reason} - Decoding failure

decode_chunk(shard_binary, chunk_index, codec)

@spec decode_chunk(shard_binary(), chunk_index(), t()) ::
  {:ok, chunk_data()} | {:error, term()}

Decodes a single chunk from a shard.

Parameters

  • shard_binary - The shard data
  • chunk_index - Index of the chunk to extract within the shard, e.g. {0, 1}
  • codec - Initialized sharding codec

Returns

  • {:ok, chunk_data} - Decoded chunk data
  • {:error, reason} - Decoding failure or chunk not found
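Before any codec runs, extracting one chunk amounts to looking up its (offset, size) entry and slicing those bytes out of the shard. The hypothetical ChunkLookupSketch.fetch_raw/3 below shows that step using a map shaped like the shard_index() type. It is not ExZarr's decode_chunk/3: it returns the bytes still encoded by the chunk codecs, and its error atoms are illustrative assumptions.

```elixir
defmodule ChunkLookupSketch do
  # Hypothetical helper (not ExZarr's decode_chunk/3).
  # index is a map shaped like shard_index(): %{chunk_offsets: %{chunk_index => {offset, size}}, ...}
  def fetch_raw(shard, %{chunk_offsets: offsets}, chunk_index) when is_binary(shard) do
    case Map.fetch(offsets, chunk_index) do
      # Entry found and fully inside the shard: slice out the raw chunk bytes.
      {:ok, {offset, size}} when offset + size <= byte_size(shard) ->
        {:ok, binary_part(shard, offset, size)}

      # Entry found but points past the end of the shard (corrupt index).
      {:ok, _entry} ->
        {:error, :out_of_bounds}

      # No entry: the chunk was never written to this shard.
      :error ->
        {:error, :chunk_not_found}
    end
  end
end
```

Checking the bounds before calling binary_part/3 turns a corrupt index into an error tuple instead of an ArgumentError.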

encode(chunks, codec)

@spec encode(chunks_map(), t()) :: {:ok, shard_binary()} | {:error, term()}

Encodes multiple chunks into a shard with embedded index.

Parameters

  • chunks - Map of chunk_index => binary_data
  • codec - Initialized sharding codec

Returns

  • {:ok, shard_binary} - Shard with chunks and index
  • {:error, reason} - Encoding failure
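The data section of a shard can be built by concatenating the encoded chunks and recording where each one starts. ShardLayoutSketch.layout/1 below is a hypothetical sketch of that step, not ExZarr's encode/2. It assumes the chunk binaries are already encoded, orders them by chunk index (Erlang term order, which is row-major for tuples of equal size), and returns the data together with an offsets map like the chunk_offsets field of shard_index().

```elixir
defmodule ShardLayoutSketch do
  # Hypothetical helper (not ExZarr's encode/2). Chunk codecs are not applied.
  # Returns {data_binary, %{chunk_index => {offset, size}}}.
  def layout(chunks) when is_map(chunks) do
    {data, offsets, _end_pos} =
      chunks
      # Sort by chunk index so the layout is deterministic (row-major).
      |> Enum.sort_by(fn {index, _bin} -> index end)
      |> Enum.reduce({[], %{}, 0}, fn {index, bin}, {acc, offsets, pos} ->
        size = byte_size(bin)
        # Build iodata instead of repeated binary concatenation.
        {[acc, bin], Map.put(offsets, index, {pos, size}), pos + size}
      end)

    {IO.iodata_to_binary(data), offsets}
  end
end
```

With the three chunks from the example above, the data is the 12 bytes <<1, ..., 12>> and the offsets are {0, 4}, {4, 4} and {8, 4}; packing those entries and appending them gives an index-at-end shard.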

init(config)

@spec init(map()) :: {:ok, t()} | {:error, term()}

Initializes the sharding codec with configuration.

Parameters

  • config - Configuration map with chunk_shape, codecs, index_codecs, index_location

Returns

  • {:ok, codec} - Initialized codec struct
  • {:error, reason} - Invalid configuration