Migrating from Zarr v2 to v3 in ExZarr
This guide helps you migrate existing Zarr v2 code to use the v3 specification in ExZarr.
Overview
Zarr v3 introduces several improvements over v2:
- Unified codec pipeline: A single codecs array replaces the separate filters and compressor options
- Improved metadata format: A consolidated zarr.json replaces multiple files
- Better extensibility: Built-in support for custom extensions
- Simplified data types: Human-readable type names like "float64" instead of "<f8"
- Hierarchical chunk storage: Chunks stored in a c/ directory with slash-separated paths
ExZarr supports both v2 and v3 simultaneously, allowing gradual migration.
Quick Migration Checklist
- [ ] Update array creation to specify zarr_version: 3
- [ ] Convert filters and compressor to a unified codecs array
- [ ] Verify chunk key format in storage (if directly manipulating files)
- [ ] Update tests to handle both v2 and v3 metadata files
- [ ] Consider whether existing v2 arrays need conversion
Breaking Changes
1. Metadata File Names
v2:
my_array/
  .zarray    # Array metadata
  .zgroup    # Group metadata (if applicable)
  .zattrs    # Attributes (optional)
  0.0        # Chunk files
  0.1
v3:
my_array/
  zarr.json  # Unified metadata (includes attributes)
  c/         # Chunk directory
    0/
      0      # Chunk files
      1
Impact: Code that directly reads .zarray files will not work with v3 arrays.
Solution: Use ExZarr's open/1 function which automatically detects the version:
# Works for both v2 and v3
{:ok, array} = ExZarr.open(path: "/path/to/array")
2. Codec Configuration
v2:
{:ok, array} = ExZarr.create(
  shape: {1000, 1000},
  chunks: {100, 100},
  dtype: :float64,
  filters: [{:shuffle, [elementsize: 8]}],
  compressor: :zlib,
  zarr_version: 2
)
v3:
{:ok, array} = ExZarr.create(
  shape: {1000, 1000},
  chunks: {100, 100},
  dtype: :float64,
  codecs: [
    %{name: "shuffle", configuration: %{elementsize: 8}},
    %{name: "bytes"}, # Required
    %{name: "gzip", configuration: %{level: 5}}
  ],
  zarr_version: 3
)
Impact: The filters and compressor options are replaced by a single codecs array.
Solution: Convert your codec configuration using the mapping below.
3. Chunk Key Format
v2: Dot-separated indices
0.0
0.1
1.0
1.1
v3: Slash-separated with prefix
c/0/0
c/0/1
c/1/0
c/1/1
Impact: Direct file path manipulation breaks.
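When checking files on disk (for example in a test), it can help to compute the key expected for a given chunk index. The following is a minimal sketch of the two layouts listed above; ChunkKeySketch and key/2 are hypothetical names used for illustration and are not part of ExZarr.

```elixir
defmodule ChunkKeySketch do
  # Illustrative only: builds chunk keys in the two layouts described in this guide.

  # v2: indices joined with dots, e.g. {0, 1} becomes "0.1"
  def key(index, 2) when is_tuple(index) do
    index |> Tuple.to_list() |> Enum.join(".")
  end

  # v3: a "c" prefix, then indices joined with slashes, e.g. {0, 1} becomes "c/0/1"
  def key(index, 3) when is_tuple(index) do
    Enum.join(["c" | Tuple.to_list(index)], "/")
  end
end
```

A helper like this is only useful for inspecting storage; reads and writes should still go through the ExZarr API.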
Solution: Use ExZarr's chunk operations instead of direct file access:
# Don't do this
File.read!("#{path}/0.0") # v2 specific
# Do this instead
{:ok, array} = ExZarr.open(path: path)
{:ok, data} = ExZarr.Array.get_slice(array, start: {0, 0}, stop: {100, 100})
Step-by-Step Migration
Step 1: Update Array Creation
Change from v2 to v3 format:
Before (v2):
def create_my_array do
  ExZarr.create(
    shape: {1000, 1000},
    chunks: {100, 100},
    dtype: :float64,
    compressor: :zlib,
    filters: [{:shuffle, [elementsize: 8]}],
    storage: :filesystem,
    path: "/data/my_array"
  )
end
After (v3):
def create_my_array do
  ExZarr.create(
    shape: {1000, 1000},
    chunks: {100, 100},
    dtype: :float64,
    codecs: [
      %{name: "shuffle", configuration: %{elementsize: 8}},
      %{name: "bytes"},
      %{name: "gzip", configuration: %{level: 5}}
    ],
    zarr_version: 3,
    storage: :filesystem,
    path: "/data/my_array"
  )
end
Step 2: Codec Mapping Reference
Use these tables to convert v2 configuration to v3 codecs:
Compressor Mapping
| v2 Compressor | v3 Codec |
|---|---|
| :zlib | %{name: "gzip", configuration: %{level: 5}} |
| :zstd | %{name: "zstd", configuration: %{level: 5}} |
| :lz4 | %{name: "lz4"} |
| :blosc | %{name: "blosc"} |
| :bzip2 | %{name: "bz2"} |
| :crc32c | %{name: "crc32c"} |
| :none | (omit bytes-to-bytes codec) |
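If you have many call sites to update, the compressor table can be encoded as a small lookup. This is a sketch under the assumption that the table above is complete for your code base; CompressorMappingSketch and to_codec/1 are hypothetical names, not ExZarr functions.

```elixir
defmodule CompressorMappingSketch do
  # Hypothetical helper mirroring the compressor table in this guide.
  # Returns nil for :none, meaning "add no bytes-to-bytes codec".
  def to_codec(:zlib), do: %{name: "gzip", configuration: %{level: 5}}
  def to_codec(:zstd), do: %{name: "zstd", configuration: %{level: 5}}
  def to_codec(:lz4), do: %{name: "lz4"}
  def to_codec(:blosc), do: %{name: "blosc"}
  def to_codec(:bzip2), do: %{name: "bz2"}
  def to_codec(:crc32c), do: %{name: "crc32c"}
  def to_codec(:none), do: nil
end
```

Any compressor atom not listed in the table raises a FunctionClauseError, which makes unmapped cases visible early.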
Filter Mapping
| v2 Filter | v3 Codec |
|---|---|
| {:shuffle, [elementsize: N]} | %{name: "shuffle", configuration: %{elementsize: N}} |
| {:delta, [dtype: T]} | %{name: "delta", configuration: %{dtype: "typeN"}} |
| {:quantize, [digits: N, dtype: T]} | %{name: "quantize", configuration: %{digits: N, dtype: "typeN"}} |
| {:astype, [encode_dtype: T]} | %{name: "astype", configuration: %{encode_dtype: "typeN"}} |
| {:bitround, [keepbits: N]} | %{name: "bitround", configuration: %{keepbits: N}} |
In these rows, "typeN" stands for the v3 string name of dtype T, for example "float64" for :float64.
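The filter table can be expressed the same way. The sketch below assumes filter options arrive as keyword lists and that dtype atoms map to v3 names by plain string conversion, as in the data type table; FilterMappingSketch is a hypothetical module, not part of ExZarr.

```elixir
defmodule FilterMappingSketch do
  # Hypothetical helper mirroring the filter table in this guide.
  def to_codec({:shuffle, opts}) do
    codec("shuffle", %{elementsize: Keyword.fetch!(opts, :elementsize)})
  end

  def to_codec({:delta, opts}) do
    codec("delta", %{dtype: dtype(opts, :dtype)})
  end

  def to_codec({:quantize, opts}) do
    codec("quantize", %{digits: Keyword.fetch!(opts, :digits), dtype: dtype(opts, :dtype)})
  end

  def to_codec({:astype, opts}) do
    codec("astype", %{encode_dtype: dtype(opts, :encode_dtype)})
  end

  def to_codec({:bitround, opts}) do
    codec("bitround", %{keepbits: Keyword.fetch!(opts, :keepbits)})
  end

  defp codec(name, configuration), do: %{name: name, configuration: configuration}

  # Converts a dtype atom such as :float64 to its string name "float64".
  defp dtype(opts, key), do: opts |> Keyword.fetch!(key) |> Atom.to_string()
end
```

Keyword.fetch!/2 raises if a required option is missing, so an incomplete v2 filter spec fails loudly instead of producing a partial codec map.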
Data Type Mapping
| v2 Atom | v3 String |
|---|---|
| :int8 | "int8" |
| :int16 | "int16" |
| :int32 | "int32" |
| :int64 | "int64" |
| :uint8 | "uint8" |
| :uint16 | "uint16" |
| :uint32 | "uint32" |
| :uint64 | "uint64" |
| :float32 | "float32" |
| :float64 | "float64" |
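Because every v3 name in this table is the atom's own text, the mapping reduces to a guarded string conversion. The following sketch restricts conversion to the dtypes listed above; DtypeMappingSketch and its error tuple are assumptions made for illustration.

```elixir
defmodule DtypeMappingSketch do
  # Only the dtypes listed in the table above are accepted.
  @dtypes ~w(int8 int16 int32 int64 uint8 uint16 uint32 uint64 float32 float64)a

  # Hypothetical helper: converts a dtype atom to its v3 string name.
  def to_v3(dtype) when dtype in @dtypes, do: {:ok, Atom.to_string(dtype)}
  def to_v3(other), do: {:error, {:unsupported_dtype, other}}
end
```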
Step 3: Update Array Opening
Version detection is automatic:
Before:
{:ok, array} = ExZarr.open(path: "/data/my_array")
# Assume it's v2
After:
{:ok, array} = ExZarr.open(path: "/data/my_array")
# Automatically detects v2 or v3
# Optionally check version
case array.version do
  2 -> handle_v2_array(array)
  3 -> handle_v3_array(array)
end
Step 4: Codec Ordering in v3
v3 enforces strict codec ordering:
1. Array → Array codecs (zero or more): Filters like shuffle, delta
2. Array → Bytes codec (exactly one, required): Always bytes
3. Bytes → Bytes codecs (zero or more): Compression like gzip, zstd
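As a rough illustration of this rule, a codec list can be checked before it is passed to ExZarr. The sketch below classifies codecs according to this guide's mapping tables, which is an assumption about how ExZarr categorizes them; CodecOrderSketch is a hypothetical module, and the real validation inside ExZarr may differ.

```elixir
defmodule CodecOrderSketch do
  # Classification taken from the filter and compressor tables in this guide.
  @array_to_array ~w(shuffle delta quantize astype bitround)
  @bytes_to_bytes ~w(gzip zstd lz4 blosc bz2 crc32c)

  # Stage number of a codec; raises FunctionClauseError for unknown names.
  def stage(%{name: "bytes"}), do: 2
  def stage(%{name: name}) when name in @array_to_array, do: 1
  def stage(%{name: name}) when name in @bytes_to_bytes, do: 3

  # Valid when exactly one "bytes" codec is present and stages never decrease.
  def validate(codecs) do
    stages = Enum.map(codecs, &stage/1)

    cond do
      Enum.count(stages, &(&1 == 2)) != 1 -> {:error, :invalid_codec_order}
      stages != Enum.sort(stages) -> {:error, :invalid_codec_order}
      true -> :ok
    end
  end
end
```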
Correct order:
codecs: [
  %{name: "shuffle", configuration: %{elementsize: 8}}, # 1. Array→Array
  %{name: "delta", configuration: %{dtype: "int64"}},   # 1. Array→Array
  %{name: "bytes"},                                     # 2. Array→Bytes (required)
  %{name: "gzip", configuration: %{level: 5}},          # 3. Bytes→Bytes
  %{name: "zstd", configuration: %{level: 3}}           # 3. Bytes→Bytes
]
Incorrect order (will fail validation):
# Wrong: compression before bytes codec
codecs: [
  %{name: "gzip"},   # Error: bytes→bytes before array→bytes
  %{name: "bytes"}
]
# Wrong: filter after bytes codec
codecs: [
  %{name: "bytes"},
  %{name: "shuffle"} # Error: array→array after array→bytes
]
Automatic v2-to-v3 Conversion
ExZarr automatically converts v2-style configuration when you specify zarr_version: 3:
# Specify v2-style filters/compressor but request v3 format
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :int64,
  filters: [{:shuffle, [elementsize: 8]}],
  compressor: :zlib,
  zarr_version: 3 # Request v3 format
)
# ExZarr automatically converts to:
# codecs: [
#   %{name: "shuffle", configuration: %{elementsize: 8}},
#   %{name: "bytes"},
#   %{name: "gzip", configuration: %{level: 5}}
# ]
This allows gradual migration without immediately rewriting all codec specifications.
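The shape of the converted list can be sketched as follows: converted filters first, then the required bytes codec, then the converted compressor if there is one. This illustrates the result shown above and is not ExZarr's actual implementation; PipelineSketch and build/2 are hypothetical names.

```elixir
defmodule PipelineSketch do
  # filter_codecs: already-converted array-to-array codecs, in their v2 order.
  # compressor_codec: an already-converted bytes-to-bytes codec, or nil for none.
  def build(filter_codecs, compressor_codec \\ nil) do
    filter_codecs ++ [%{name: "bytes"}] ++ List.wrap(compressor_codec)
  end
end
```

List.wrap/1 turns nil into an empty list, so an uncompressed array ends up with the bytes codec as its last entry.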
Compatibility Mode
Reading v2 Arrays
v2 arrays open transparently:
# v2 array on disk
{:ok, v2_array} = ExZarr.open(path: "/data/old_v2_array")
assert v2_array.version == 2
# All operations work normally
{:ok, data} = ExZarr.Array.get_slice(v2_array, start: {0}, stop: {100})
Reading v3 Arrays
v3 arrays also open transparently:
# v3 array on disk
{:ok, v3_array} = ExZarr.open(path: "/data/new_v3_array")
assert v3_array.version == 3
# Same API as v2
{:ok, data} = ExZarr.Array.get_slice(v3_array, start: {0}, stop: {100})
Mixed v2/v3 Usage
You can work with both versions simultaneously:
# Open both versions
{:ok, v2} = ExZarr.open(path: "/data/legacy_v2")
{:ok, v3} = ExZarr.open(path: "/data/modern_v3")
# Use identical API for both
{:ok, v2_data} = ExZarr.Array.to_binary(v2)
{:ok, v3_data} = ExZarr.Array.to_binary(v3)
# Version is tracked internally
IO.puts("v2 array version: #{v2.version}") # 2
IO.puts("v3 array version: #{v3.version}") # 3
Common Patterns
Pattern 1: Simple Compression
v2:
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :float64,
  compressor: :zlib
)
v3:
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :float64,
  codecs: [
    %{name: "bytes"},
    %{name: "gzip", configuration: %{level: 5}}
  ],
  zarr_version: 3
)
Pattern 2: No Compression
v2:
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :int32,
  compressor: :none
)
v3:
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :int32,
  codecs: [%{name: "bytes"}], # Only required codec, no compression
  zarr_version: 3
)
Pattern 3: Shuffle + Compression
v2:
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :int64,
  filters: [{:shuffle, [elementsize: 8]}],
  compressor: :zlib
)
v3:
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :int64,
  codecs: [
    %{name: "shuffle", configuration: %{elementsize: 8}},
    %{name: "bytes"},
    %{name: "gzip", configuration: %{level: 5}}
  ],
  zarr_version: 3
)
Pattern 4: Multiple Filters
v2:
{:ok, array} = ExZarr.create(
  shape: {1000, 1000},
  chunks: {100, 100},
  dtype: :float64,
  filters: [
    {:quantize, [digits: 2, dtype: :float64]},
    {:shuffle, [elementsize: 8]}
  ],
  compressor: :zstd
)
v3:
{:ok, array} = ExZarr.create(
  shape: {1000, 1000},
  chunks: {100, 100},
  dtype: :float64,
  codecs: [
    %{name: "quantize", configuration: %{digits: 2, dtype: "float64"}},
    %{name: "shuffle", configuration: %{elementsize: 8}},
    %{name: "bytes"},
    %{name: "zstd", configuration: %{level: 5}}
  ],
  zarr_version: 3
)
Testing Your Migration
Test Both Versions
Ensure your code handles both v2 and v3:
defmodule MyApp.ArrayTest do
  use ExUnit.Case
  test "handles v2 arrays" do
    {:ok, array} = ExZarr.create(
      shape: {100},
      chunks: {10},
      dtype: :int32,
      compressor: :zlib,
      zarr_version: 2,
      storage: :memory
    )
    assert array.version == 2
    # Sample payload: 100 little-endian int32 values.
    # Assumption: set_slice accepts a raw binary; adjust to the input format your ExZarr version expects.
    data = for i <- 0..99, into: <<>>, do: <<i::signed-little-32>>
    # Test your operations
    :ok = ExZarr.Array.set_slice(array, data, start: {0}, stop: {100})
    {:ok, read_data} = ExZarr.Array.get_slice(array, start: {0}, stop: {100})
    assert read_data == data
  end
  test "handles v3 arrays" do
    {:ok, array} = ExZarr.create(
      shape: {100},
      chunks: {10},
      dtype: :int32,
      codecs: [%{name: "bytes"}, %{name: "gzip"}],
      zarr_version: 3,
      storage: :memory
    )
    assert array.version == 3
    # Same sample payload as in the v2 test (same assumption about the input format)
    data = for i <- 0..99, into: <<>>, do: <<i::signed-little-32>>
    # Same operations should work
    :ok = ExZarr.Array.set_slice(array, data, start: {0}, stop: {100})
    {:ok, read_data} = ExZarr.Array.get_slice(array, start: {0}, stop: {100})
    assert read_data == data
  end
end
Verify File Structure
Check that persisted arrays have the correct structure:
# v2 structure
assert File.exists?("#{v2_path}/.zarray")
assert File.exists?("#{v2_path}/0") # First chunk of a 1-D array (multi-dimensional keys use dots, e.g. 0.0)
# v3 structure
assert File.exists?("#{v3_path}/zarr.json")
assert File.dir?("#{v3_path}/c") # Chunk directory
assert File.exists?("#{v3_path}/c/0") # Chunk with slash notation
Troubleshooting
Problem: "Unknown codec" error with v3 array
Symptom: Error when opening v3 array with custom codecs
Solution: Ensure codec names match v3 specification. v2 uses atoms (:shuffle), v3 uses strings ("shuffle").
Problem: Chunks not found after migration
Symptom: {:error, :not_found} when reading chunks from v3 array
Cause: Chunk key format changed from dot-notation to slash-notation
Solution: Don't manually construct chunk paths. Use ExZarr's API:
# Wrong: Manual path construction
File.read!("#{path}/0.0")
# Correct: Use ExZarr API
{:ok, array} = ExZarr.open(path: path)
{:ok, data} = ExZarr.Array.get_slice(array, ...)
Problem: Metadata parse error
Symptom: {:error, :invalid_metadata} when opening array
Cause: Mixed v2/v3 metadata files in same directory
Solution: Keep v2 and v3 arrays in separate directories. Don't try to convert in-place.
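To find directories that contain both kinds of metadata, you can check which metadata files exist. The sketch below relies only on the file names described earlier in this guide (.zarray and .zgroup for v2, zarr.json for v3); MetadataLayoutSketch and its return atoms are hypothetical.

```elixir
defmodule MetadataLayoutSketch do
  # Hypothetical helper: classifies a directory by the metadata files it contains.
  def detect(dir) do
    v2? = Enum.any?([".zarray", ".zgroup"], &File.exists?(Path.join(dir, &1)))
    v3? = File.exists?(Path.join(dir, "zarr.json"))

    case {v2?, v3?} do
      {true, true} -> :mixed
      {true, false} -> :v2
      {false, true} -> :v3
      {false, false} -> :none
    end
  end
end
```

A result of :mixed points at the situation described in this problem and suggests moving one of the two arrays to its own directory.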
Problem: Codec ordering validation fails
Symptom: {:error, :invalid_codec_order}
Cause: v3 enforces strict codec ordering
Solution: Follow this order:
1. Array→Array codecs (filters)
2. Array→Bytes codec (bytes, required)
3. Bytes→Bytes codecs (compression)
Best Practices
1. Use v3 for New Arrays
Unless you need compatibility with older tools, use v3 for all new arrays:
# Prefer this for new arrays
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :float64,
  codecs: [%{name: "bytes"}, %{name: "gzip"}],
  zarr_version: 3
)
2. Keep v2 Arrays as v2
Don't convert existing v2 arrays to v3 unless necessary. Both formats work seamlessly:
# It's fine to keep v2 arrays as v2
{:ok, legacy_array} = ExZarr.open(path: "/data/legacy_v2_array")
# All operations work normally
3. Use Automatic Conversion for Gradual Migration
When creating new arrays, use v2-style syntax with zarr_version: 3 for gradual migration:
# Transitional approach: v2 syntax, v3 format
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :int64,
  filters: [{:shuffle, [elementsize: 8]}], # v2 style
  compressor: :zlib,                       # v2 style
  zarr_version: 3                          # But save as v3
)
# Later, migrate to explicit v3 syntax
# codecs: [
#   %{name: "shuffle", configuration: %{elementsize: 8}},
#   %{name: "bytes"},
#   %{name: "gzip", configuration: %{level: 5}}
# ]
4. Version-Agnostic Code
Write code that works with both versions:
def process_array(path) do
  {:ok, array} = ExZarr.open(path: path)
  # Version-agnostic operations
  {:ok, data} = ExZarr.Array.to_binary(array)
  process_data(data)
end
5. Document Version Requirements
If your code requires a specific version, document it:
@doc """
Processes a Zarr array.
Requires: Zarr v3 format for optimal performance.
"""
def process_modern_array(path) do
  {:ok, array} = ExZarr.open(path: path)
  if array.version == 3 do
    # Process array
  else
    {:error, :requires_v3_format}
  end
end
Summary
Migrating from v2 to v3 in ExZarr:
- For new arrays: Use zarr_version: 3 with a unified codecs array
- For existing arrays: No migration needed; both versions work seamlessly
- Gradual migration: Use v2-style syntax with zarr_version: 3 for automatic conversion
- Testing: Ensure your code handles both v2 and v3 arrays
- Best practice: Write version-agnostic code using ExZarr's API
The transition from v2 to v3 is designed to be gradual and non-breaking. ExZarr handles version differences internally, allowing you to focus on your application logic.