ExZarr.Codecs.ShardingIndexed (ExZarr v1.1.0)
Implements the Zarr v3 sharding-indexed codec.
Sharding combines multiple logical chunks into a single physical shard file or object. This reduces the number of stored objects, and with it the metadata and request overhead on cloud storage systems.
Benefits
- Reduced cloud API calls: 100 chunks stored in 1 shard are 1 object instead of 100, so reading or writing them together takes 99% fewer S3 requests
- Lower metadata overhead: fewer files or objects means less filesystem or object-storage metadata
- Better caching: larger units suit block-based caching
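As a rough check of the 99% figure, the sketch below counts storage objects for one full write of an array: one object per chunk without sharding versus one per shard with it. It is illustrative arithmetic only. `ShardMath` and `object_counts/3` are hypothetical names, not part of ExZarr, and the count ignores extra requests such as index reads during partial access.

```elixir
defmodule ShardMath do
  # Ceiling division for positive integers.
  defp ceil_div(n, d), do: div(n + d - 1, d)

  # Cells per dimension when `shape` is tiled by `cell_shape`.
  defp grid(shape, cell_shape) do
    shape
    |> Enum.zip(cell_shape)
    |> Enum.map(fn {n, c} -> ceil_div(n, c) end)
  end

  defp product(dims), do: Enum.reduce(dims, 1, &(&1 * &2))

  # Hypothetical helper: {objects_without_sharding, objects_with_sharding}
  # for one full write, i.e. one object per chunk vs. one object per shard.
  def object_counts(array_shape, chunk_shape, chunks_per_shard) do
    chunk_grid = grid(array_shape, chunk_shape)
    shard_grid = grid(chunk_grid, chunks_per_shard)
    {product(chunk_grid), product(shard_grid)}
  end
end

# A 1000x1000 array in 100x100 chunks, grouped 10x10 chunks per shard.
{chunks, shards} = ShardMath.object_counts([1000, 1000], [100, 100], [10, 10])
IO.puts("#{chunks} objects unsharded, #{shards} sharded, #{chunks - shards} fewer requests")
```

Here 100 chunk objects collapse into a single shard object, which is where the 99% reduction comes from.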
Shard Structure
Per the Zarr v3 specification:
[Chunk 0 data][Chunk 1 data]...[Chunk N data][Index][Index Size (8 bytes)]

The index is a list of (offset, size) pairs, one per chunk, each encoded as two little-endian uint64 values. The last 8 bytes store the index size.
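A minimal sketch of reading the layout above, assuming the index is at the end, the index carries no checksum (the default crc32c index codec would add bytes), and the 8-byte trailer holds the index length in bytes as a little-endian uint64. `ShardLayoutSketch.split/1` is hypothetical, not an ExZarr function. Note that the published Zarr v3 sharding spec derives the index length from the number of chunks per shard and the index codecs; the trailer handling here follows the description above.

```elixir
defmodule ShardLayoutSketch do
  # Hypothetical helper: splits [chunk data][index][index size (8 bytes)].
  # Assumptions: the trailer is a little-endian uint64 giving the index length
  # in bytes, and the index holds raw (offset, size) uint64 pairs (no crc32c).
  def split(shard) when is_binary(shard) and byte_size(shard) >= 8 do
    trailer_at = byte_size(shard) - 8
    <<_::binary-size(trailer_at), index_size::little-unsigned-size(64)>> = shard
    data_size = trailer_at - index_size

    if data_size < 0 or rem(index_size, 16) != 0 do
      {:error, :malformed_index}
    else
      <<data::binary-size(data_size), index::binary-size(index_size), _::binary>> = shard
      {:ok, data, parse_pairs(index)}
    end
  end

  def split(_shard), do: {:error, :too_short}

  # Each index entry is 16 bytes: offset, then size, both little-endian uint64.
  defp parse_pairs(<<>>), do: []

  defp parse_pairs(<<offset::little-unsigned-size(64), size::little-unsigned-size(64), rest::binary>>),
    do: [{offset, size} | parse_pairs(rest)]
end

# Usage: two chunks (4 bytes and 2 bytes), a 32-byte index, then the trailer.
u64 = fn n -> <<n::little-unsigned-size(64)>> end
shard = <<1, 2, 3, 4, 5, 6>> <> u64.(0) <> u64.(4) <> u64.(4) <> u64.(2) <> u64.(32)
{:ok, data, pairs} = ShardLayoutSketch.split(shard)
IO.inspect({data, pairs})
```

Once split, chunk i is simply `binary_part(data, offset, size)` for its pair.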
Configuration
- :chunk_shape - Tuple defining the shard dimensions in chunks (e.g., {10, 10})
- :codecs - Codec pipeline for the individual chunks within the shard
- :index_codecs - Codec pipeline for the shard index (default: bytes + crc32c)
- :index_location - :start or :end (default: :end)
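The defaults listed above can be pictured as a normalization step. This is a sketch only: `ShardConfigSketch.normalize/1` is hypothetical, it returns a plain map with the documented struct field names rather than ExZarr's struct, it accepts the string-keyed form used in the Example, and treating a missing `"codecs"` entry as an empty pipeline is an assumption of the sketch.

```elixir
defmodule ShardConfigSketch do
  # Defaults documented above: index codecs bytes + crc32c, index at the end.
  @default_index_codecs [%{name: "bytes"}, %{name: "crc32c"}]

  # Hypothetical: validates a string-keyed config like the one in the Example
  # and returns a plain map using the documented struct field names.
  def normalize(%{"chunk_shape" => shape} = config) do
    with :ok <- check_shape(shape),
         {:ok, location} <- parse_location(Map.get(config, "index_location", "end")) do
      {:ok,
       %{
         chunk_shape: List.to_tuple(shape),
         # Assumption of this sketch: an omitted "codecs" means an empty pipeline.
         codecs: Map.get(config, "codecs", []),
         index_codecs: Map.get(config, "index_codecs", @default_index_codecs),
         index_location: location
       }}
    end
  end

  def normalize(_config), do: {:error, :missing_chunk_shape}

  # Every dimension must be a positive integer.
  defp check_shape(shape) when is_list(shape) and shape != [] do
    if Enum.all?(shape, &(is_integer(&1) and &1 > 0)), do: :ok, else: {:error, :invalid_chunk_shape}
  end

  defp check_shape(_shape), do: {:error, :invalid_chunk_shape}

  defp parse_location("start"), do: {:ok, :start}
  defp parse_location("end"), do: {:ok, :end}
  defp parse_location(other), do: {:error, {:invalid_index_location, other}}
end

{:ok, codec} = ShardConfigSketch.normalize(%{"chunk_shape" => [10, 10]})
IO.inspect(codec)
```

Converting `"index_location"` to an atom at initialization keeps the rest of the codec working with the `:start | :end` type.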
Example
```elixir
alias ExZarr.Codecs.ShardingIndexed

config = %{
  "chunk_shape" => [10, 10],
  "codecs" => [
    %{name: "bytes"},
    %{name: "gzip", configuration: %{level: 5}}
  ],
  "index_codecs" => [
    %{name: "bytes"},
    %{name: "crc32c"}
  ],
  "index_location" => "end"
}

{:ok, shard_codec} = ShardingIndexed.init(config)

# Encode multiple chunks into a shard
chunks = %{
  {0, 0} => <<1, 2, 3, 4>>,
  {0, 1} => <<5, 6, 7, 8>>,
  {1, 0} => <<9, 10, 11, 12>>
}

{:ok, shard_binary} = ShardingIndexed.encode(chunks, shard_codec)

# Extract a specific chunk from the shard
{:ok, chunk_data} = ShardingIndexed.decode_chunk(shard_binary, {0, 1}, shard_codec)

# Decode all chunks from the shard
{:ok, all_chunks} = ShardingIndexed.decode(shard_binary, shard_codec)
```

Specification
Zarr v3 Sharding Extension: https://zarr-specs.readthedocs.io/en/latest/v3/codecs/sharding-indexed/v1.0.html
Summary
Functions
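decode(shard_binary, codec) - Decodes all chunks from a shard into a map.
decode_chunk(shard_binary, chunk_index, codec) - Decodes a specific chunk from a shard.
encode(chunks, codec) - Encodes multiple chunks into a shard with an embedded index.
init(config) - Initializes the sharding codec from a configuration map.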
Types
@type chunk_data() :: binary()
@type chunk_index() :: tuple()
@type chunks_map() :: %{required(chunk_index()) => chunk_data()}
@type index_location() :: :start | :end
@type shard_binary() :: binary()
@type shard_index() :: %{ chunk_offsets: %{ required(chunk_index()) => {offset :: non_neg_integer(), size :: non_neg_integer()} }, chunk_indices: [chunk_index()] }
@type t() :: %ExZarr.Codecs.ShardingIndexed{ chunk_shape: tuple(), codecs: [map()], index_codecs: [map()], index_location: index_location() }
Functions
@spec decode(shard_binary(), t()) :: {:ok, chunks_map()} | {:error, term()}
Decodes all chunks from a shard into a map.
Parameters
- shard_binary - The shard data
- codec - Initialized sharding codec
Returns
- {:ok, chunks_map} - Map of chunk_index => decoded_data
- {:error, reason} - Decoding failure
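Decoding a whole shard amounts to slicing each chunk out of the data region with its (offset, size) pair and running the slice through the inner codec pipeline. The sketch below assumes the index has already been parsed into the documented shard_index() shape. `ShardDecodeSketch.decode_all/3` is hypothetical, and its `decode_fun` argument stands in for the configured codecs; the usage passes Erlang's `:zlib.gunzip/1` to mimic the gzip codec from the Example.

```elixir
defmodule ShardDecodeSketch do
  # Hypothetical: slices every chunk listed in `chunk_indices` out of `data`
  # (the chunk-data region of a shard) and applies `decode_fun` to each slice.
  def decode_all(data, %{chunk_offsets: offsets, chunk_indices: indices}, decode_fun \\ fn c -> c end) do
    Enum.reduce_while(indices, {:ok, %{}}, fn idx, {:ok, acc} ->
      case Map.fetch(offsets, idx) do
        {:ok, {offset, size}} when offset + size <= byte_size(data) ->
          chunk = binary_part(data, offset, size)
          {:cont, {:ok, Map.put(acc, idx, decode_fun.(chunk))}}

        {:ok, _entry} ->
          {:halt, {:error, {:chunk_out_of_bounds, idx}}}

        :error ->
          {:halt, {:error, {:missing_offset, idx}}}
      end
    end)
  end
end

# Usage: two gzip-compressed chunks stored back to back.
a = :zlib.gzip("abc")
b = :zlib.gzip("de")

index = %{
  chunk_offsets: %{{0, 0} => {0, byte_size(a)}, {0, 1} => {byte_size(a), byte_size(b)}},
  chunk_indices: [{0, 0}, {0, 1}]
}

{:ok, chunks} = ShardDecodeSketch.decode_all(a <> b, index, &:zlib.gunzip/1)
IO.inspect(chunks)
```

Stopping at the first bad entry with `Enum.reduce_while/3` mirrors the single `{:error, reason}` return documented above.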
@spec decode_chunk(shard_binary(), chunk_index(), t()) :: {:ok, chunk_data()} | {:error, term()}
Decodes a specific chunk from a shard.
Parameters
- shard_binary - The shard data
- chunk_index - Index of the chunk to extract
- codec - Initialized sharding codec
Returns
- {:ok, chunk_data} - Decoded chunk data
- {:error, reason} - Decoding failure, or the chunk was not found
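A single-chunk lookup only needs that chunk's index entry. The sketch below (hypothetical `ShardLookupSketch.fetch_chunk/3`, returning raw bytes before any inner codec runs) reports a missing chunk when there is no entry, or when the entry is the Zarr v3 empty-chunk marker, where offset and size are both 2^64 - 1. Whether ExZarr writes that marker, and the specific error atoms, are assumptions.

```elixir
defmodule ShardLookupSketch do
  # Zarr v3 empty-chunk marker: offset and size both 2^64 - 1.
  @missing 0xFFFFFFFFFFFFFFFF

  # Hypothetical: returns the raw (still codec-encoded) bytes of one chunk.
  def fetch_chunk(data, chunk_offsets, chunk_index) do
    case Map.fetch(chunk_offsets, chunk_index) do
      :error ->
        {:error, :chunk_not_found}

      {:ok, {@missing, @missing}} ->
        {:error, :chunk_not_found}

      {:ok, {offset, size}} when offset + size <= byte_size(data) ->
        {:ok, binary_part(data, offset, size)}

      {:ok, _entry} ->
        {:error, :chunk_out_of_bounds}
    end
  end
end

data = <<1, 2, 3, 4, 5, 6, 7, 8>>
offsets = %{{0, 0} => {0, 4}, {0, 1} => {4, 4}}
IO.inspect(ShardLookupSketch.fetch_chunk(data, offsets, {0, 1}))
IO.inspect(ShardLookupSketch.fetch_chunk(data, offsets, {1, 1}))
```

Checking the bounds before calling `binary_part/3` turns a corrupt index into an error tuple instead of an `ArgumentError`.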
@spec encode(chunks_map(), t()) :: {:ok, shard_binary()} | {:error, term()}
Encodes multiple chunks into a shard with an embedded index.
Parameters
- chunks - Map of chunk_index => binary_data
- codec - Initialized sharding codec
Returns
- {:ok, shard_binary} - Shard with chunks and index
- {:error, reason} - Encoding failure
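Encoding runs the other way: concatenate the already-encoded chunks and append an index with one entry per chunk position. This sketch assumes the index goes at the end, the chunk positions of a shard are enumerated in row-major (C) order as in the Zarr v3 spec, absent chunks get the 2^64 - 1 marker, the index has no crc32c checksum, and the 8-byte trailer holds the index length in bytes. `ShardEncodeSketch.encode/2` is hypothetical and skips the inner codec pipeline.

```elixir
defmodule ShardEncodeSketch do
  # Zarr v3 empty-chunk marker: offset and size both 2^64 - 1.
  @missing 0xFFFFFFFFFFFFFFFF

  # Hypothetical: `chunks` maps chunk_index tuples to already-encoded binaries;
  # `chunks_per_shard` is a tuple of positive integers such as {2, 2}.
  def encode(chunks, chunks_per_shard) do
    all = positions(chunks_per_shard)

    if Enum.all?(Map.keys(chunks), &(&1 in all)) do
      {data, entries, _next_offset} =
        Enum.reduce(all, {[], [], 0}, fn pos, {data, entries, offset} ->
          case Map.fetch(chunks, pos) do
            {:ok, chunk} ->
              size = byte_size(chunk)
              {[data, chunk], [entries, u64(offset), u64(size)], offset + size}

            :error ->
              {data, [entries, u64(@missing), u64(@missing)], offset}
          end
        end)

      index = IO.iodata_to_binary(entries)
      {:ok, IO.iodata_to_binary([data, index, u64(byte_size(index))])}
    else
      {:error, :chunk_outside_shard}
    end
  end

  defp u64(n), do: <<n::little-unsigned-size(64)>>

  # All chunk positions of the shard in row-major order, e.g. {2, 2} gives
  # [{0, 0}, {0, 1}, {1, 0}, {1, 1}].
  defp positions(shape) do
    shape
    |> Tuple.to_list()
    |> Enum.reduce([[]], fn dim, acc ->
      for prefix <- acc, i <- 0..(dim - 1), do: prefix ++ [i]
    end)
    |> Enum.map(&List.to_tuple/1)
  end
end

chunks = %{{0, 0} => <<1, 2, 3, 4>>, {0, 1} => <<5, 6, 7, 8>>, {1, 0} => <<9, 10, 11, 12>>}
{:ok, shard} = ShardEncodeSketch.encode(chunks, {2, 2})
IO.puts(byte_size(shard))
```

With the Example's three chunks in a 2x2 shard, the result is 12 data bytes + 4 index entries of 16 bytes + an 8-byte trailer = 84 bytes, and position {1, 1} carries the empty-chunk marker. A dense index like this lets a reader find any chunk from its position alone.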
Initializes the sharding codec from a configuration map.
Parameters
- config - Configuration map with the chunk_shape, codecs, index_codecs, and index_location settings
Returns
- {:ok, codec} - Initialized codec struct
- {:error, reason} - Invalid configuration