Python Interoperability
View SourceExZarr implements the Zarr specification to enable reliable cross-language collaboration between Elixir and Python. This guide shows how to read and write arrays across languages, maps data types, and identifies compatibility boundaries.
Overview and Use Cases
Why Cross-Language Compatibility Matters
Scientific and data engineering workflows increasingly span multiple languages:
- Python ecosystem: NumPy, Pandas, Xarray, scikit-learn for analysis and ML
- Elixir/BEAM: Distributed data pipelines, web services, fault-tolerant systems
- Shared storage format: Enables specialization without vendor lock-in
Zarr provides the interoperability layer. ExZarr and zarr-python both implement the same specification, ensuring arrays can be exchanged reliably.
Common Interoperability Patterns
1. Python Producer, Elixir Consumer
ML training in Python, inference service in Elixir:
Python (scikit-learn) → Train model → Export weights to Zarr
↓
Elixir (Phoenix) → Load weights → Serve predictions via HTTPUse case: Training happens offline in Python (rich ML ecosystem), production inference runs in Elixir (high concurrency, low latency).
2. Elixir Producer, Python Consumer
Data pipeline in Elixir, analysis in Python:
Elixir (Broadway) → Process stream → Write results to Zarr
↓
Python (Pandas/Jupyter) → Load results → Analyze and visualizeUse case: Real-time ingestion in Elixir (distributed, fault-tolerant), exploratory analysis in Python (interactive, visualization-rich).
3. Bidirectional Collaboration
Incremental processing across languages:
Python → Initial processing → Zarr
↓
Elixir → Additional processing → Update Zarr
↓
Python → Final analysis → ResultsUse case: Collaborative workflows where each language handles steps best suited to its strengths.
4. Archive Interchange
Share datasets between teams using different stacks:
Team A (Python) → Publish dataset as Zarr archive
↓
Team B (Elixir) → Consume dataset for production workloadUse case: Cross-team collaboration without forcing tool standardization.
Zarr Specification Compliance
How interoperability works:
- Zarr specification: Defines file format, metadata structure, compression
- zarr-python: Reference implementation (Python)
- ExZarr: Full implementation (Elixir)
- Specification compliance: Ensures compatibility
ExZarr implements:
- Zarr v2: Full compliance (stable, widely supported)
- Zarr v3: Full compliance, production-ready (modern features, works with zarr-python 3.x)
When both sides follow the spec, arrays are interchangeable.
Reading Python Arrays from Elixir
Complete Example: Python to Elixir
Step 1: Create array in Python
# create_array.py
import zarr
import numpy as np
# Create array with zarr-python
z = zarr.open_array(
'/shared/data/measurements',
mode='w',
shape=(1000, 500),
chunks=(100, 50),
dtype='float64',
compressor=zarr.Blosc(cname='zstd', clevel=3)
)
# Fill with random data
z[:, :] = np.random.randn(1000, 500)
# Add metadata for consumers
z.attrs['units'] = 'meters'
z.attrs['description'] = 'Sensor measurements'
z.attrs['created_by'] = 'Python zarr-python 2.16.0'
print(f"Created array: shape={z.shape}, dtype={z.dtype}")
print(f"Chunks: {z.chunks}")
print(f"Compressor: {z.compressor}")Run the Python script:
$ python create_array.py
Created array: shape=(1000, 500), dtype=float64
Chunks: (100, 50)
Compressor: Blosc(cname='zstd', clevel=3, shuffle=SHUFFLE, blocksize=0)
Step 2: Read array in Elixir
# read_array.exs
# Open array created by Python
{:ok, array} = ExZarr.open(
storage: :filesystem,
path: "/shared/data/measurements"
)
# Inspect metadata (automatically detected from Python format)
IO.puts("Shape: #{inspect(array.metadata.shape)}")
IO.puts("Dtype: #{inspect(array.metadata.dtype)}")
IO.puts("Chunks: #{inspect(array.metadata.chunks)}")
IO.puts("Compressor: #{inspect(array.metadata.compressor)}")
# Read attributes created by Python
IO.puts("\nAttributes:")
IO.puts(" Units: #{array.metadata.attributes["units"]}")
IO.puts(" Description: #{array.metadata.attributes["description"]}")
IO.puts(" Created by: #{array.metadata.attributes["created_by"]}")
# Read a slice of data
{:ok, data} = ExZarr.Array.get_slice(array,
start: {0, 0},
stop: {100, 50}
)
# Data is nested tuples (Elixir native format)
IO.puts("\nRead slice [0:100, 0:50]")
IO.puts("First value: #{elem(elem(data, 0), 0)}")
IO.puts("Data structure: nested tuples")
# Optionally convert to Nx tensor for numerical operations
tensor = data
|> Tuple.to_list()
|> Enum.map(&Tuple.to_list/1)
|> Nx.tensor()
IO.puts("Converted to Nx tensor: #{inspect(Nx.shape(tensor))}")Run the Elixir script:
$ elixir read_array.exs
Shape: {1000, 500}
Dtype: :float64
Chunks: {100, 50}
Compressor: :blosc
Attributes:
Units: meters
Description: Sensor measurements
Created by: Python zarr-python 2.16.0
Read slice [0:100, 0:50]
First value: -0.4532891
Data structure: nested tuples
Converted to Nx tensor: {100, 50}
Key Considerations
Automatic format detection:
ExZarr automatically detects Zarr v2 or v3 format by inspecting metadata files (.zarray for v2, zarr.json for v3).
Dtype mapping:
Python dtype float64 automatically maps to ExZarr :float64. See Data Type Compatibility Matrix for full mapping.
Compressor compatibility:
If Blosc (or another optional codec) is not available in ExZarr, the read will fail. Use ExZarr.Codecs.codec_available?(:blosc) to check. See Compression Codec Compatibility.
Data structure differences:
- Python: NumPy
ndarray(C-contiguous memory) - Elixir: Nested tuples (immutable, BEAM-native)
- Conversion: Use Nx for numerical operations
Writing Elixir Arrays for Python
Complete Example: Elixir to Python
Step 1: Create array in Elixir
# create_array.exs
# Create array for Python consumption
{:ok, array} = ExZarr.create(
shape: {500, 500},
chunks: {50, 50},
dtype: :int32,
compressor: :zlib, # Universally supported
storage: :filesystem,
path: "/shared/data/results",
zarr_version: 2 # Use v2 for maximum compatibility
)
# Generate data (nested tuples)
data = for i <- 0..499 do
for j <- 0..499 do
i * 500 + j
end
|> List.to_tuple()
end
|> List.to_tuple()
# Write data
:ok = ExZarr.Array.set_slice(array, data,
start: {0, 0},
stop: {500, 500}
)
# Add attributes for Python users
# (Provides context for consumers)
:ok = ExZarr.Array.put_attributes(array, %{
"units" => "counts",
"description" => "Processing results from Elixir pipeline",
"created_by" => "ExZarr v1.0.0",
"timestamp" => DateTime.utc_now() |> DateTime.to_iso8601()
})
IO.puts("Created array for Python consumption")
IO.puts("Path: /shared/data/results")
IO.puts("Format: Zarr v2 (compatible with zarr-python 2.x)")
IO.puts("Compressor: zlib (universally supported)")Run the Elixir script:
$ elixir create_array.exs
Created array for Python consumption
Path: /shared/data/results
Format: Zarr v2 (compatible with zarr-python 2.x)
Compressor: zlib (universally supported)
Step 2: Read array in Python
# read_array.py
import zarr
import numpy as np
# Read array created by Elixir
z = zarr.open_array('/shared/data/results', mode='r')
print(f"Shape: {z.shape}")
print(f"Dtype: {z.dtype}")
print(f"Chunks: {z.chunks}")
print(f"Compressor: {z.compressor}")
# Read attributes written by Elixir
print("\nAttributes:")
for key, value in z.attrs.items():
print(f" {key}: {value}")
# Read data (returns NumPy array)
data = z[:]
print(f"\nData type: {type(data)}")
print(f"First value: {data[0, 0]}")
print(f"Last value: {data[499, 499]}")
# Verify data integrity
expected_first = 0 * 500 + 0 # i * 500 + j
expected_last = 499 * 500 + 499
assert data[0, 0] == expected_first
assert data[499, 499] == expected_last
print("\nData integrity verified!")Run the Python script:
$ python read_array.py
Shape: (500, 500)
Dtype: int32
Chunks: (50, 50)
Compressor: Zlib(level=6)
Attributes:
units: counts
description: Processing results from Elixir pipeline
created_by: ExZarr v1.0.0
timestamp: 2026-01-29T10:30:00Z
Data type: <class 'numpy.ndarray'>
First value: 0
Last value: 249999
Data integrity verified!
Best Practices for Python Compatibility
1. Choose appropriate Zarr version
# For modern Python tools (zarr-python 3.x)
zarr_version: 3 # Recommended for new projects
# For legacy compatibility (zarr-python 2.x)
zarr_version: 2 # Maximum compatibility with older toolsBoth Zarr v2 and v3 are production-ready in ExZarr. Use v3 for new projects with modern Python tools, or v2 for compatibility with older zarr-python 2.x installations.
2. Use standard compressors
compressor: :zlib # Always available
# OR
compressor: :zstd # Widely available, better compressionAvoid custom codecs unless coordinated with Python consumers.
3. Use common dtypes
dtype: :int32 # Standard 32-bit integer
dtype: :float64 # Standard 64-bit floatAvoid exotic types. See Data Type Compatibility Matrix.
4. Include descriptive attributes
ExZarr.Array.put_attributes(array, %{
"units" => "meters",
"description" => "Clear description of data",
"schema_version" => "1.0"
})Attributes help Python users understand the data without reading source code.
5. Test with zarr-python before deploying
# After creating array in Elixir
$ python -c "import zarr; z = zarr.open('/path/to/array'); print(z.shape)"
Verify Python can read the array before committing to format.
Data Type Compatibility Matrix
ExZarr supports 10 standard numeric dtypes with full bidirectional compatibility:
| ExZarr Atom | Python NumPy | Zarr v2 String | Zarr v3 Name | Bytes | Interoperable |
|---|---|---|---|---|---|
:int8 | int8 | |i1, <i1 | int8 | 1 | ✓ Yes |
:int16 | int16 | <i2 | int16 | 2 | ✓ Yes |
:int32 | int32 | <i4 | int32 | 4 | ✓ Yes |
:int64 | int64 | <i8 | int64 | 8 | ✓ Yes |
:uint8 | uint8 | |u1, <u1 | uint8 | 1 | ✓ Yes |
:uint16 | uint16 | <u2 | uint16 | 2 | ✓ Yes |
:uint32 | uint32 | <u4 | uint32 | 4 | ✓ Yes |
:uint64 | uint64 | <u8 | uint64 | 8 | ✓ Yes |
:float32 | float32 | <f4 | float32 | 4 | ✓ Yes |
:float64 | float64 | <f8 | float64 | 8 | ✓ Yes |
Byte Order
Zarr v2 uses byte order prefixes:
<- Little-endian (default, most common)>- Big-endian|- Native/not applicable (for single-byte types like int8, uint8)
ExZarr automatically handles all byte orders when reading. When writing, ExZarr defaults to little-endian (matching NumPy default).
Unsupported Python Types
ExZarr does not support:
| Python Type | Description | Status | Workaround |
|---|---|---|---|
bool | Boolean (True/False) | Not supported | Use uint8 with 0/1 values |
complex64 | 64-bit complex number | Not supported | Store real/imaginary in separate arrays |
complex128 | 128-bit complex number | Not supported | Store real/imaginary in separate arrays |
datetime64 | Temporal data | Not supported | Store as int64 Unix timestamps |
timedelta64 | Time duration | Not supported | Store as int64 (e.g., microseconds) |
object | Variable-length strings | Not supported | Store as attributes or external file |
str, unicode | Text data | Not supported | Use attributes or external format |
| Structured dtypes | Record/struct types | Not supported | Use multiple arrays or external format |
Workarounds for Unsupported Types
Boolean data:
# Instead of Python bool
# Store as uint8
{:ok, array} = ExZarr.create(
dtype: :uint8, # 0 = false, 1 = true
# ... other options
)
# Document in attributes
ExZarr.Array.put_attributes(array, %{
"dtype_semantic" => "boolean",
"false_value" => 0,
"true_value" => 1
})# Read in Python
z = zarr.open_array('/path/to/array')
bool_data = z[:].astype(bool) # Convert uint8 to boolDatetime data:
# Store as int64 Unix timestamps (microseconds since epoch)
{:ok, array} = ExZarr.create(
dtype: :int64,
# ... other options
)
# Document in attributes
ExZarr.Array.put_attributes(array, %{
"dtype_semantic" => "datetime64[us]",
"epoch" => "1970-01-01T00:00:00Z"
})
# Convert DateTime to int64 microseconds
timestamp = DateTime.utc_now()
|> DateTime.to_unix(:microsecond)# Read in Python
z = zarr.open_array('/path/to/array')
timestamps = z[:]
datetimes = pd.to_datetime(timestamps, unit='us') # Convert to datetimeComplex numbers:
# Store real and imaginary parts in separate arrays
{:ok, real_array} = ExZarr.create(
path: "/data/complex_real",
dtype: :float64,
# ... options
)
{:ok, imag_array} = ExZarr.create(
path: "/data/complex_imag",
dtype: :float64,
# ... options
)# Read in Python
real = zarr.open_array('/data/complex_real')[:]
imag = zarr.open_array('/data/complex_imag')[:]
complex_data = real + 1j * imag # Reconstruct complex arrayString data:
# Store strings as attributes (for small datasets)
ExZarr.Array.put_attributes(array, %{
"labels" => ["label1", "label2", "label3"]
})
# Or store in separate JSON file
labels = ["label1", "label2", "label3"]
File.write!("/data/labels.json", Jason.encode!(labels))Metadata Translation Details
Zarr v2 Metadata Files
Structure:
/path/to/array/
.zarray # Array metadata (shape, dtype, chunks, compressor)
.zattrs # User attributes (optional, JSON object)
0.0, 0.1, ... # Chunk files.zarray example (Python-created):
{
"zarr_format": 2,
"shape": [1000, 1000],
"chunks": [100, 100],
"dtype": "<f8",
"compressor": {
"id": "zlib",
"level": 5
},
"fill_value": 0.0,
"order": "C",
"filters": null
}ExZarr equivalent:
%ExZarr.Metadata{
zarr_format: 2,
shape: {1000, 1000},
chunks: {100, 100},
dtype: :float64,
compressor: :zlib,
compressor_config: [level: 5],
fill_value: 0.0,
order: "C",
filters: nil
}Dtype String Parsing
ExZarr automatically converts between NumPy dtype strings and Elixir atoms:
| NumPy String | ExZarr Atom | Notes |
|---|---|---|
<i4 | :int32 | Little-endian 32-bit int |
>i4 | :int32 | Big-endian 32-bit int (handled) |
|i1 | :int8 | Native byte order (single byte) |
<f8 | :float64 | Little-endian 64-bit float |
<u2 | :uint16 | Little-endian 16-bit unsigned int |
Dimension Separator (v2 Quirk)
Zarr v2 allows two chunk key formats:
Dot separator (default):
0.0, 0.1, 1.0, 1.1 # Chunks named with dotsSlash separator:
0/0, 0/1, 1/0, 1/1 # Chunks named with slashesExZarr supports both formats:
- Reading: Automatically detects separator from metadata
- Writing: Defaults to
.for compatibility with Python default
Override with dimension_separator option:
{:ok, array} = ExZarr.create(
dimension_separator: "/", # Use slash separator
# ... other options
)Compression Codec Compatibility
Codec Compatibility Matrix
| Codec | ExZarr Support | zarr-python Support | Interoperable | Notes |
|---|---|---|---|---|
| zlib | Always (Erlang) | Always | ✓ Yes | Recommended for compatibility |
| gzip | Always (Erlang) | Always | ✓ Yes | Same as zlib with gzip headers |
| zstd | Optional (Zig NIF) | Yes (numcodecs) | ✓ Yes | Requires libzstd on both sides |
| lz4 | Optional (Zig NIF) | Yes (numcodecs) | ✓ Yes | Requires liblz4 on both sides |
| blosc | Optional (Zig NIF) | Yes (numcodecs) | ✓ Yes | Requires libblosc on both sides |
| snappy | Optional (Zig NIF) | Yes (numcodecs) | Partial | Less common in Python ecosystem |
| bzip2 | Optional (Zig NIF) | Yes (numcodecs) | ✓ Yes | Requires libbz2 on both sides |
| Custom | Via behavior | Via numcodecs | ✗ No | Requires identical codec implementation |
Recommendation for Maximum Compatibility
Use zlib:
# Always works, no dependencies
compressor: :zlibFor better compression (if both sides have zstd):
# Check availability first
if ExZarr.Codecs.codec_available?(:zstd) do
compressor: :zstd
else
compressor: :zlib # Fallback
endChecking Codec Availability
In Elixir:
ExZarr.Codecs.codec_available?(:zstd)
# => true (if libzstd installed)
ExZarr.Codecs.available_codecs()
# => [:zlib, :gzip, :zstd, :lz4, :blosc, :bzip2, :crc32c]In Python:
import numcodecs
print(numcodecs.blosc.list_compressors())
# ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']Compatibility Boundaries
What Works ✓
Fully compatible:
- All 10 standard numeric dtypes (int8-int64, uint8-uint64, float32-float64)
- Zarr v2 format (ExZarr <-> zarr-python 2.x)
- Zarr v3 format (ExZarr <-> zarr-python 3.x)
- Standard compressors (zlib, gzip, zstd, lz4, blosc, bzip2)
- N-dimensional arrays (1D, 2D, 3D, ..., any dimension count)
- Chunked reading and writing
- Attributes and metadata (JSON-serializable)
- Hierarchical groups (nested arrays)
- Fill values (uninitialized chunks)
- Standard filters (shuffle, delta)
What Partially Works WARNING:
Requires coordination:
Zarr v3 core features:
- Unified codec pipeline: Full support in ExZarr and zarr-python 3.x
- Dimension names: Full support in ExZarr and zarr-python 3.x
- Improved metadata format: Full support in ExZarr and zarr-python 3.x
Zarr v3 extension features:
- Sharding: ExZarr supports, requires zarr-python 3.x with sharding extension
- Custom chunk grids: ExZarr supports, requires zarr-python 3.x
Recommendation: Use v3 for new projects with modern Python tools (zarr-python 3.x). Use v2 for maximum compatibility with legacy zarr-python 2.x installations.
Filters:
- Shuffle, Delta: ExZarr supports, zarr-python requires
numcodecspackage - Quantize: ExZarr supports, zarr-python requires
numcodecspackage
Recommendation: Ensure Python side has numcodecs installed: pip install numcodecs
What Doesn't Work ✗
Not supported in ExZarr:
- Complex dtypes (
complex64,complex128) - Boolean dtype (
bool) - use uint8 workaround - Datetime dtype (
datetime64) - use int64 timestamp workaround - Timedelta dtype (
timedelta64) - Object dtype (
object, variable-length strings) - Structured dtypes (record arrays)
- Custom codecs without matching implementation on both sides
Edge Cases
Very large arrays (> 1 TB):
- Both implementations handle large arrays
- Test at scale before production deployment
- Monitor memory usage (chunks loaded into RAM)
Concurrent writes:
- Both implementations support concurrent writes to different chunks
- Need external coordination for writes to same chunk
- Use file locking or application-level coordination
Symbolic links:
- Filesystem backend behavior may differ between implementations
- Test if using symlinks in storage paths
Network filesystems:
- NFS, SMB may have different consistency guarantees
- Test cross-language access on shared network storage
Troubleshooting Cross-Language Issues
Issue: Python Can't Read ExZarr Array
Symptom:
>>> import zarr
>>> z = zarr.open_array('/data/array')
ValueError: Unsupported compressor: 'custom_codec'Diagnosis:
Check Zarr version compatibility:
# Is array v3? Does Python have zarr-python 3.0+? cat /data/array/zarr.json # v3 indicatorCheck compressor availability:
import numcodecs print(numcodecs.blosc.list_compressors())Validate
.zarrayfile:cat /data/array/.zarray | python -m json.tool
Fix:
- Use Zarr v2:
zarr_version: 2in ExZarr - Use standard codec:
compressor: :zlib - Install missing codec:
pip install numcodecs
Issue: ExZarr Can't Read Python Array
Symptom:
{:error, {:unsupported_dtype, "complex128"}}Diagnosis:
Check dtype:
import zarr z = zarr.open_array('/data/array') print(z.dtype) # Is it in compatibility matrix?Check compressor:
print(z.compressor)Check for custom filters:
print(z.filters) # ExZarr may not support all filters
Fix:
- Use supported dtype (see compatibility matrix)
- Use standard codec:
compressor=zarr.Zlib() - Remove unsupported filters
Issue: Data Values Don't Match
Symptom: Values read in Python differ from values written in Elixir.
Diagnosis:
Check dtype matches exactly:
# Elixir array.metadata.dtype # :int32 # Python z.dtype # int32 (must match)Check byte order (should both be little-endian):
# ExZarr always writes little-endianCheck fill value handling:
# Uninitialized chunks return fill_value array.metadata.fill_value # 0.0Read small slice and compare byte-by-byte:
{:ok, data} = ExZarr.Array.get_slice(array, start: {0, 0}, stop: {2, 2})data = z[0:2, 0:2] print(data)
Fix:
- Ensure dtype is identical
- Verify both use same fill_value
- Check that data was actually written (not returning fill_value)
Issue: Metadata Looks Correct But Reading Fails
Symptom: Metadata parses correctly, but chunk reading fails.
Diagnosis:
Check file permissions:
ls -la /data/array/ # All files readable?Check dimension separator:
cat /data/array/.zarray | grep dimension_separator # "." or "/"?Verify chunk files exist:
ls /data/array/ | grep -E '^[0-9]+\.[0-9]+$' # Or: ls /data/array/ | grep -E '^[0-9]+/[0-9]+$'Compare chunk file names to expected coordinates:
# Expected chunk (0, 0) # v2 dot: "0.0" # v2 slash: "0/0"
Fix:
- Fix file permissions:
chmod -R a+r /data/array - Ensure dimension_separator matches chunk naming
- Verify chunks were actually written to storage
Issue: Performance Differs Between Python and Elixir
Symptom: Same operation takes different time in each language.
Expected behavior:
- Different compression implementations may have different speeds
- BEAM parallelism may give ExZarr advantage on multi-chunk reads
- Python NumPy operations may be faster (C-optimized)
Diagnosis:
Check chunk size appropriate for workload:
array.metadata.chunks # Too small? Too large?Check compression level:
array.metadata.compressor_configProfile both implementations:
# Elixir {time_us, _result} = :timer.tc(fn -> ExZarr.Array.get_slice(array, start: {0, 0}, stop: {1000, 1000}) end) IO.puts("#{div(time_us, 1000)}ms")# Python import time start = time.time() data = z[0:1000, 0:1000] print(f"{(time.time() - start) * 1000}ms")
Not a bug if:
- Both complete successfully with correct results
- Performance difference explained by implementation choices
- Both meet production requirements
Summary
This guide covered Python interoperability:
- Use cases: Python producer/Elixir consumer, bidirectional, archive interchange
- Reading Python arrays: Automatic format detection, dtype mapping, codec compatibility
- Writing for Python: Use v2, standard codecs, common dtypes, descriptive attributes
- Dtype compatibility: 10 standard types fully supported, workarounds for unsupported
- Metadata translation: Automatic conversion between NumPy strings and Elixir atoms
- Codec compatibility: zlib recommended, zstd widely available, custom codecs incompatible
- Boundaries: What works (numeric types), partial (v3 features), doesn't work (complex, bool, datetime)
- Troubleshooting: 5 common issues with diagnosis steps and fixes
Key recommendations:
- Use Zarr v2 for maximum compatibility
- Use zlib compressor (always works)
- Test cross-language before deploying
- Document dtype semantics in attributes
- Check codec availability on both sides
Next Steps
Now that you understand Python interoperability:
- Performance Tuning: Optimize for your workload in Performance Guide
- Nx Integration: Work with tensors in Nx Integration Guide
- Storage Providers: Apply interop knowledge to cloud storage in Storage Providers Guide
- Troubleshooting: Debug issues in Troubleshooting Guide