Changelog
View SourceAll notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.1.0] - 2026-06-12
BEAM-Native Streaming and Concurrent Zarr Processing
ExZarr v1.1.0 adds streaming APIs, pipeline integrations, and telemetry for large-scale array processing on the BEAM.
Added
Streaming APIs
ExZarr.Array.stream_chunks/2- lazy chunk streaming with concurrency, metadata, and filteringExZarr.Array.stream_slices/3- dimension-wise slice streamingExZarr.Array.write_stream/3- chunk ingestion from enumerables with validation and checkpoints- Shared streaming internals module (internal; not part of the public API)
chunk_stream/2retained as backward-compatible alias forstream_chunks/2
Pipeline Integrations (Optional Dependencies)
ExZarr.Flowwithchunk_flow/2andslice_flow/3ExZarr.GenStagewithChunkProducerandSliceProducerExZarr.BroadwaywithChunkProducerand pipeline helpers
Observability
ExZarr.Telemetrymodule with chunk read/write and stream start/stop events
Documentation
docs/architecture_review.md,docs/gap_analysis.md,docs/v1_1_design.mddocs/cloud_storage_patterns.mddocs/cookbook/starter recipes for large-array workflowslivebooks/broadway_pipeline.livemd,livebooks/nx_streaming.livemdrelease_notes_v1_1_0.md,migration_guide_v1_1_0.md
Benchmarks
benchmarks/streaming_bench.exsfor streaming throughput measurement
Changed
chunk_stream/2now delegates tostream_chunks/2:paralleloption aliased to:concurrencyin streaming APIs (logs deprecation warning)- Maximum
:concurrencyincreased from 10 (chunk_stream/2legacy cap) to 128 - Optional dependencies added:
flow,gen_stage,broadway ziglerconstrained to~> 0.16(requires zig 0.16.0; runmix zig.getbefore compile)
[1.0.0] - 2026-01-27
First Stable Release!
ExZarr 1.0.0 marks the first production-ready release with comprehensive testing, security hardening, and extensive documentation.
Added
Testing & Quality Assurance
- Comprehensive Test Suite: 1,713 total tests (146 doctests + 65 properties + 1,502 unit tests)
- Zero test failures across all test suites
- 4 tests intentionally skipped for environment-specific features
- 151 tests excluded (cloud storage backends requiring credentials)
- Property-Based Testing: Expanded from 21 to 65 properties
- New
test/ex_zarr_codecs_property_test.exswith 19 codec-focused properties - Enhanced
test/ex_zarr_property_test.exswith 25 additional properties - Comprehensive coverage of compression, indexing, storage, and metadata operations
- New
- Backend Test Coverage: New comprehensive test files
test/ex_zarr/storage_comprehensive_test.exs- 39 tests for storage operationstest/ex_zarr/storage/backend/filesystem_test.exs- 41 tests (82% coverage)test/ex_zarr/storage/backend/zip_test.exs- 40 tests (95.2% coverage)test/mix/tasks/fix_nif_rpaths_test.exs- 14 tests for Mix task
- Core Module Coverage: Significantly improved test coverage
format_converter.ex: 20% → 80% (+60 percentage points, 36 tests)indexing.ex: 12.1% → 85.1% (+73pp, 69 tests)metadata.ex: 59.1% → 79.5% (+20pp, 56 tests)storage.ex: ~29% → 68.1% (+39pp, 39 tests)filesystem.ex: 0% → 82% (+82pp, 41 tests)zip.ex: 66.6% → 95.2% (+29pp, 40 tests)
- Overall Coverage: 80.3% (up from 76.3%), with 100% coverage on 6 critical modules
Security & Documentation
- Security Policy: Comprehensive
SECURITY.mdwith 550+ lines- Vulnerability reporting process and timelines
- Input validation best practices with code examples
- Cloud authentication security patterns
- Path traversal prevention guidelines
- Resource limit recommendations
- Security checklist for production deployments
- Sobelow Integration: Static security analysis configured
.sobelow-confconfiguration file with documented exceptions- All high/medium confidence warnings resolved
- 45 low-confidence warnings documented as expected behavior
- Detailed explanation of file traversal, String.to_atom, and configuration warnings
- Enhanced Error Handling Guide:
guides/error_handling.md- Comprehensive error handling patterns
- Recovery strategies for common failures
- Circuit breaker and retry patterns
- Logging and debugging recommendations
- Telemetry Guide:
guides/telemetry.md- Complete instrumentation documentation
- Integration examples for monitoring systems
- Performance metrics and event tracking
Code Quality
- Zero Compilation Warnings: Clean compilation across all environments
- Credo Grade A+: Strict mode with 0 issues (1,396 mods/funs analyzed)
- Dialyzer Passing: All type specs validated, 14 known issues properly suppressed
- Documentation Coverage:
- Zero
mix docswarnings - All public functions have
@docannotations - All modules have
@moduledocannotations - Comprehensive guides in
guides/directory
- Zero
Changed
API Stability
- Semantic Versioning Commitment: v1.0.0 marks API stability
- No breaking changes planned for 1.x series
- Deprecation warnings will be added for any future API changes
- At least one minor version deprecation period before removal
- Dependency Versions: Updated to stable releases
:telemetryadded for observability support- All dependencies pinned to stable versions
Fixed
Test Stability
- Fixed metadata tests to include required
fill_valuefield - Fixed storage tests to handle mock backend registration idempotency
- Fixed property tests to only test public API functions
- Corrected filter metadata encoding to include both
dtypeandastypefields
Documentation
- Fixed all relative path references in guides
- Corrected README.md installation instructions
- Updated all version references to 1.0.0
Testing Metrics
Test Suite Performance
- Total execution time: ~5.8 seconds
- Async tests: 3.4 seconds
- Sync tests: 2.4 seconds
- All tests: 100% passing rate
Code Coverage by Module
- 100% coverage:
ex_zarr.ex,application.ex,chunk_cache.ex,version.ex,storage/backend.ex,codecs/codec.ex 90% coverage:
chunk_key.ex(96%),storage/backend/zip.ex(95.2%),codecs/sharding_indexed.ex(93.9%),array_server.ex(93%),memory.ex(90.9%)80% coverage: 15 additional modules
- Overall project: 80.3%
Security
Vulnerability Status: No security vulnerabilities reported or discovered
Security Hardening
- Comprehensive path validation examples
- Safe usage patterns for all file operations
- Secure cloud storage authentication patterns
- Input sanitization guidelines
- Resource limit recommendations
- Documented DoS prevention strategies
Static Analysis Results (Sobelow)
- 0 high confidence warnings
- 0 medium confidence warnings
- 45 low confidence warnings (all documented as expected for data storage library)
Breaking Changes
None. This is the first stable release, establishing the baseline API.
Deprecations
None.
Migration Guide
For users upgrading from v0.7.0:
- Update dependency in
mix.exs:{:ex_zarr, "~> 1.0"} - Run
mix deps.get - No code changes required - full backward compatibility maintained
Contributors
Special thanks to all contributors who made v1.0.0 possible through testing, feedback, and code contributions.
Looking Forward
Planned for v1.1.0:
- Additional cloud storage backend optimizations
- Zarr v3 extension features (variable chunking, additional codecs)
- Performance improvements for large arrays
- Additional convenience functions for common patterns
[0.7.0] - 2026-01-26
Added
Chunk Streaming and Parallel Processing
Array.chunk_stream/2- Stream chunks lazily with constant memory usage- Sequential mode using
Stream.resourcefor truly lazy evaluation - Parallel mode with configurable concurrency (max 10 workers)
- Progress callback support for monitoring long operations
- Filter option to process subset of chunks
- Ordered and unordered streaming modes
- Sequential mode using
Array.parallel_chunk_map/3- Process chunks in parallel with custom mapper function- Configurable concurrency and timeout
- Automatic error handling for failed tasks
- Integration with existing chunk cache and locking mechanisms
Custom Chunk Key Encoding
ChunkKey.Encoderbehavior - Define custom chunk naming schemesencode/2callback - Convert chunk index to string keydecode/2callback - Parse string key back to chunk indexpattern/1callback - Provide regex for key validation
ChunkKey.V2EncoderandChunkKey.V3Encoder- Default encoder implementationsChunkKey.Registry- Runtime encoder registration with AgentChunkKey.register_encoder/2- Register custom encoders by nameChunkKey.encode_with/3anddecode_with/3- Use registered encoders- Centralized chunk key logic in S3 and GCS backends
Group Convenience Features
- Access behavior implementation - Use bracket notation for path-based access
group["experiments/exp1/results"]syntax support- Automatic intermediate group creation on write
- Works with
get_in,put_in,update_infunctions
Group.get_item/2- Lazy load arrays and groups from storage- Path-based access with forward slash separator
- Caches loaded items in memory
- Tracks checked paths to avoid redundant storage queries
Group.put_item/3- Add items with auto-creation of parent groupsGroup.remove_item/2- Remove items from in-memory structureGroup.require_group/2- Create group hierarchy like mkdir -p- Returns existing group if present
- Creates all intermediate groups
- Returns error if path conflicts with array
Group.tree/2- ASCII tree visualization of group hierarchy- Box-drawing characters for structure (├── └──)
- Array [A] and group [G] markers
- Optional depth limiting
- Optional shape display
Group.batch_create/2- Create multiple groups/arrays in parallel- Concurrent metadata writes for cloud storage efficiency
- Mixed group and array creation support
- Up to 10 concurrent operations
Fixed
Type Safety
- Fixed unmatched return value in
Array.chunk_streamlock acquisition - Fixed
ChunkCache.putargument order in streaming code - All dialyzer warnings resolved
Test Stability
- Memory efficiency test now uses sequential streaming for consistent results
- Changed from parallel to sequential mode
- Increased threshold to account for OTP version variance
- Uses
Enum.reducefor truly lazy processing
- ArrayServer FIFO queue test uses staggered delays to ensure reliable ordering
- Added task-specific delays to guarantee queue order
- Uses Agent to track actual acquisition order
- Prevents race conditions in CI environments
Path Handling
- Fixed
Group.create_group/3to use correct storage path - Fixed
Group.create_array/3to properly join storage path with array path - Corrected filesystem metadata file locations
Changed
Storage Backend Refactoring
- S3 and GCS backends now use centralized
ChunkKey.encodefunction - Eliminated duplicate chunk key encoding logic
- Simplified pattern matching with
ChunkKey.chunk_key_pattern
Group Structure
- Added
_loadedfield to Group struct for lazy loading cache - Type specification updated to include MapSet for loaded paths
Testing
Test Coverage
- Total tests: 1246 (up from 794 in v0.5.0)
- New chunk streaming tests: 12 tests
- New chunk key encoding tests: 32 tests
- New group convenience tests: 37 tests
- Success rate: 100% passing (0 failures, 6 skipped)
- Quality checks: All passing (dialyzer, format checks)
Test Files
test/ex_zarr/chunk_streaming_test.exs- Chunk iteration and parallel processingtest/ex_zarr/chunk_key_encoding_test.exs- Custom encoder behavior and registrytest/ex_zarr/group_convenience_test.exs- Group access and convenience features
0.5.0 - 2026-01-25
Added
Metadata Serialization and Deserialization
- Complete JSON encoding/decoding for Zarr v3 metadata
MetadataV3.to_json/1- Serialize metadata structures to JSON stringsMetadataV3.from_json/1- Parse JSON to MetadataV3 structures- Full support for chunk grids (regular and irregular)
- Full support for codec pipeline encoding (nested configurations)
- Full support for dimension names and storage transformers
- 54 new tests covering JSON serialization and zarr-python 3.x file compatibility
Format Conversion
- Bidirectional conversion between Zarr v2 and v3 formats
MetadataV3.from_v2/1- Convert v2 metadata to v3 formatMetadataV3.to_v2/1- Convert v3 metadata to v2 format (with validation)ExZarr.FormatConvertermodule - Convert entire arrays between formatsFormatConverter.convert/1- Copy arrays with proper chunk key encodingFormatConverter.check_v2_compatibility/1- Pre-validate v3→v2 conversion- Automatic handling of codec pipeline differences
- Clear error messages for incompatible features (sharding, irregular grids)
- 18 conversion tests including round-trip verification
S3 Storage Backend Enhancements
- Localstack and minio support for local testing
- Custom endpoint URL configuration
- Automatic parsing of
AWS_ENDPOINT_URLenvironment variable - ExAws configuration for S3-compatible services (scheme, host, port)
- 45 mock tests using Mox for fast CI/CD testing without AWS
- 31 integration tests with full localstack support
- Comprehensive testing guide (
test/ex_zarr/storage/S3_TESTING.md) - Example usage script (
examples/s3_storage.exs)
Dependencies
- sweet_xml ~> 0.7 - Required for ExAws.S3 XML parsing
Fixed
Type Specifications
- MetadataV3.from_v2/1 return type - Removed unreachable
{:error, term()}case- Function always succeeds with v2 metadata input
- Updated spec to match success typing:
{:ok, t()}
S3 Backend Configuration
- ExAws configuration for S3-compatible services
- Fixed endpoint URL parsing from single string to ExAws format
- Corrected credential handling for localstack/minio
- Fixed
build_ex_aws_config/2to parse URI components properly - Resolved test setup issues with ExUnit callback return values
Changed
Test Organization
- S3 integration tests now properly skip when
AWS_ENDPOINT_URLnot configured - setup_all callback returns proper values for ExUnit compatibility
- Mock tests run independently without requiring AWS services
Documentation
- Updated S3 backend moduledoc with localstack/minio configuration
- Created S3_TESTING.md with complete setup and troubleshooting guide
- Updated examples with S3 endpoint URL usage patterns
Testing
Test Coverage
- Total tests: 794 (up from 482 in v0.4.0)
- S3 mock tests: 45 tests
- S3 integration tests: 31 tests
- Format conversion tests: 18 tests
- JSON serialization tests: 54 tests
- Success rate: 100% passing (0 failures, 3 skipped)
- Quality checks: All passing (format, credo strict, dialyzer)
Technical Details
Format Conversion Limitations
- v3 → v2 conversion has known limitations detected by validation:
- Sharding codec not supported in v2 (rejected with clear error)
- Dimension names lost in conversion (documented)
- Irregular chunk grids not supported in v2 (rejected with clear error)
- Array→array codecs (transpose, quantize, bitround) dropped
- v2 → v3 conversion fully supported without limitations
S3 Configuration
- Endpoint URL parsing converts HTTP URL to ExAws format:
http://localhost:4566→scheme: "http://", host: "localhost", port: 4566- Credentials from environment:
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY - Region configuration: default
us-east-1or specified
- Backward compatible - works with real AWS S3 when endpoint not specified
Breaking Changes
None - Full backward compatibility maintained with v0.1.0, v0.3.0, and v0.4.0
0.1.0 - 2026-01-23
Added
Core Functionality
- Complete array slicing implementation with
get_sliceandset_slicefunctions - Chunked storage system for efficient I/O and memory usage
- N-dimensional array support (1D to N-D) with optimized implementations for 1D and 2D
- 10 data types: int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float64
- Compression support: zlib (fully working), with graceful fallbacks for zstd and lz4
- Two storage backends:
- Memory storage (using Agent for persistent state)
- Filesystem storage (Zarr v2 directory structure)
Validation and Safety
- Comprehensive index validation for all slicing operations
- Bounds checking to prevent out-of-bounds access
- Data size validation ensuring data matches slice dimensions
- Type validation for indices and data
- Clear error messages with actionable feedback
Interoperability
- Full Zarr v2 specification compliance
- Bidirectional compatibility with zarr-python
- 14 integration tests verifying Python - Elixir compatibility
- All data types work across implementations
- Metadata format compatibility
Documentation
- Comprehensive module documentation with examples
INTEROPERABILITY.mdguide for multi-language workflows- Interactive demo script (
examples/python_interop_demo.exs) - Integration test documentation (
test/support/README.md) - Python helper scripts for testing
Testing
- 196 tests covering all functionality
- 21 property-based tests using StreamData
- 35 validation tests for bounds checking and error handling
- 19 slicing tests for read/write operations
- 14 Python integration tests for cross-language compatibility
- 100% passing test suite with zero failures
Code Quality
- Passes all Credo checks (strict mode)
- Well-documented functions and modules
- Type specifications for public APIs
- Consistent error handling patterns
Technical Details
Array Operations
- Row-major (C-order) data layout
- Lazy chunk loading (only loads needed chunks)
- Efficient slice extraction across chunk boundaries
- Partial chunk updates (read-modify-write)
- Fill value support for uninitialized regions
Performance Optimizations
- Dimension-specific implementations (1D, 2D, ND)
- Minimal data copying during operations
- Efficient binary pattern matching
- Agent-based memory storage for fast writes
Metadata
- Zarr v2 JSON metadata format
- Shape, chunks, dtype, compressor configuration
- Fill value preservation
- Automatic metadata generation
Breaking Changes
None (initial release)
Deprecated
None
Fixed
- Memory storage now correctly persists writes using Agent
- Chunk boundary calculations work correctly for all dimensions
- Data size validation accounts for element size
- Index validation prevents invalid operations before I/O
Security
- Input validation prevents buffer overflows
- Bounds checking prevents out-of-bounds access
- Type checking ensures data integrity
0.3.0 - 2026-01-24
Added
Compression Codecs - Complete Implementation
All Major Codecs via Zig NIFs:
- zstd - Zstandard compression (native implementation via Zig NIF + libzstd)
- lz4 - LZ4 fast compression (native implementation via Zig NIF + liblz4)
- snappy - Snappy compression (native implementation via Zig NIF + libsnappy)
- blosc - Blosc meta-compressor (native implementation via Zig NIF + libblosc)
- bzip2 - Bzip2 compression (native implementation via Zig NIF + libbz2)
- crc32c - CRC32C checksum codec (pure Zig implementation, RFC 3720 compliant)
- zlib - Standard zlib compression (via Erlang
:zlib, already present in v0.1.0) - none - No compression option
Codec Features:
- High-performance native implementations using Zig NIFs
- Full compatibility with Python zarr's compression formats
- CRC32C uses Castagnoli polynomial (0x1EDC6F41) matching RFC 3720 specification
- Corruption detection and validation for CRC32C
- All codecs tested for Python interoperability
Custom Codec Plugin System
Extensible Architecture:
ExZarr.Codecs.Codecbehavior - Contract defining codec interfacecodec_id/0- Unique atom identifiercodec_info/0- Metadata (name, version, type, description)available?/0- Runtime availability checkencode/2- Compression/transformation functiondecode/2- Decompression/inverse transformationvalidate_config/1- Configuration validation
ExZarr.Codecs.RegistryGenServer - Dynamic codec management- Runtime registration:
ExZarr.Codecs.register_codec/2 - Runtime unregistration:
ExZarr.Codecs.unregister_codec/1 - Codec queries:
list_codecs/0,available_codecs/0,codec_info/1 - Force registration option for codec replacement
- Protection against unregistering built-in codecs
- Runtime registration:
Application supervision tree -
ExZarr.Applicationwith supervised registry- Fault-tolerant codec registry under supervision
- Automatic recovery on crashes
- OTP-compliant architecture
Plugin Capabilities:
- Create compression codecs (like zlib, zstd)
- Create transformation codecs (like transpose, shuffle)
- Create checksum codecs (like crc32c)
- Register at runtime without recompiling ExZarr
- Seamless integration with built-in codecs
- Can be chained with other codecs
Examples and Documentation
Example Codecs (examples/custom_codec_example.exs):
UppercaseCodec- Simple transformation codec demonstrating APIRleCodec- Run-length encoding compression codec- Complete usage demonstration with:
- Codec registration/unregistration
- Encoding and decoding operations
- Querying codec information
- Chaining custom and built-in codecs
Documentation Updates:
- Updated README with "Custom Codecs" section
- Code examples for creating custom codecs
- Usage patterns and best practices
- Updated compression codec list with all implementations
- Updated roadmap to reflect completed features
Testing
Comprehensive Test Coverage:
- 10 CRC32C tests - Encoding, decoding, corruption detection, edge cases
- 29 custom codec tests - Full plugin system coverage including:
- Behavior validation (
Codec.implements?/1) - Registry operations (register, unregister, list, info)
- Custom codec compression/decompression
- Availability checks
- Integration with built-in codecs
- Error handling for failing codecs
- Protection against invalid operations
- Behavior validation (
Total Test Suite:
- 238 tests (up from 196 in v0.1.0)
- 21 property-based tests with 2,100+ generated test cases
- 100% passing with 72.5% code coverage
- All codecs verified for Python zarr compatibility
Changed
Codec Module Refactoring:
- Refactored
ExZarr.Codecsto route through registry - Split built-in codec logic into
compress_builtin/3anddecompress_builtin/2 - Updated
available_codecs/0to dynamically query registry - Updated
codec_available?/1to check registry and call custom codec'savailable?/0 - Changed
@type codecfrom fixed atom list toatom()for extensibility
Mix Configuration:
- Updated
mix.exsto startExZarr.Applicationwith supervision tree - Added application module configuration for codec registry initialization
Documentation:
- Updated
GAP_ANALYSIS.mdto reflect codec completion (v1.0 → v1.1) - Updated compression performance comparison
- Updated test statistics
- Updated feature implementation status
Technical Details
CRC32C Implementation
- Pure Zig implementation with 256-entry lookup table
- Castagnoli polynomial: 0x1EDC6F41 (RFC 3720)
- 4-byte overhead in little-endian format
- Compatible with Python zarr's
google-crc32clibrary - Validates data integrity on decode
- Returns error on checksum mismatch
Zig NIF Integration
- Uses Zigler 0.13 for seamless Zig-to-Elixir integration
- Automatic memory management via beam.allocator
Proper error handling with Elixir result tuples
{:ok, data} | {:error, reason}- Platform-specific library linking (macOS, Linux)
- Post-compile RPATH fixing for library loading
Custom Codec Architecture
- GenServer-based registry pattern following OTP best practices
- ETS table for O(1) codec lookup performance
- Behavior contract ensures consistent codec API
- Support for both
:compressionand:transformationcodec types - Validation prevents registering invalid codecs
- Protection prevents unregistering built-in codecs
Breaking Changes
None - Full backward compatibility maintained
Deprecated
None
Fixed
- Removed codec fallbacks - all codecs now have native implementations
- Fixed
available_codecs/0to return dynamically registered codecs - Fixed
codec_available?/1to properly check custom codecs
Performance
- Native Zig implementations provide significant performance improvements over fallbacks
- CRC32C table-driven algorithm for fast checksum computation
- GenServer registry with ETS backend for fast codec lookups
- Zero-copy binary operations where possible
Security
- CRC32C detects data corruption and tampering
- Codec validation prevents malformed codec registration
- Input validation on all encode/decode operations
- Protection against unregistering critical built-in codecs
0.4.0 - 2026-01-25
Added
Zarr v3 Specification Support
- Complete Zarr v3 implementation with full specification compliance
- Python zarr 3.x interoperability - bidirectional compatibility verified
- 16 Python v3 integration tests covering all v3 features
- Unified codec pipeline - array→array, array→bytes, bytes→bytes stages
- v3 metadata format -
zarr.jsonwithnode_type,data_type, unifiedcodecs - v3 chunk key encoding - slash-separated paths (
c/0/1/2) with configurable encoding - Automatic version detection - seamlessly works with both v2 and v3 arrays
- Version-aware array operations - smart routing based on zarr_format field
- Group metadata support - explicit group nodes with v3 format
- Data type conversion utilities - automatic mapping between v2 and v3 type systems
Python v3 Interoperability Testing
- Comprehensive test suite (
test/ex_zarr_v3_python_interop_test.exs):- 7 tests: Python 3.x → ExZarr v3 compatibility
- 6 tests: ExZarr v3 → Python 3.x compatibility
- 3 tests: v3 metadata compatibility
- 2 tests: v3 codec compatibility
- Enhanced Python helper script (
test/support/zarr_python_helper.py):check_zarr_version()- Detects zarr-python versioncreate_v3_array()- Creates v3 arraysread_v3_array()- Reads v3 arraysverify_v3_array()- Validates v3 arrays
- Automatic test exclusion - Tests tagged with
:python_v3excluded when zarr-python 3.x not available
Documentation
docs/V3_PYTHON_INTEROP.md- Comprehensive Python v3 testing guidedocs/V3_PYTHON_INTEROP_FIX.md- Metadata save pattern documentationdocs/V3_GZIP_AND_CODEC_CONFIG_FIX.md- Gzip format and codec configuration fixes- Migration guide - v2 to v3 conversion patterns (in plan documentation)
Changed
Gzip Codec Implementation
- Fixed gzip format - Now produces true gzip format (RFC 1952) with magic bytes
0x1F 0x8B - Previous issue: Was producing zlib/deflate format (RFC 1950) with magic bytes
0x78 0x9C - New implementation (
lib/ex_zarr/codecs/pipeline_v3.ex):gzip_compress/2- Uses:zlib.deflateInit/6withwindowBits = 16 + 15gzip_decompress/1- Uses:zlib.inflateInit/2withwindowBits = 16 + 15
- Python compatibility - Gzip-compressed arrays now readable by zarr-python 3.x
Codec Configuration Serialization
- Always include
configurationkey in v3 codec specs (required by Zarr v3 spec) - Previous issue: Was omitting
configurationwhen empty, causing zarr-python validation errors - Fixed in
lib/ex_zarr/storage.ex-encode_codecs_v3/1function - Compliance: Matches Zarr v3 specification requirement for codec metadata
Type Specifications
- Fixed 12 dialyzer warnings - Narrowed type specs to match success typing
- Files updated:
lib/ex_zarr/codecs/pipeline_v3.ex- More specific error typeslib/ex_zarr/metadata_v3.ex- Precise validation error tupleslib/ex_zarr/data_type.ex- Specific return types (1 | 2 | 4 | 8instead ofpos_integer())lib/ex_zarr/version.ex- Non-empty list indicators
- Result: Zero dialyzer errors
Fixed
Python Interoperability Issues
- Issue #101: Python v3 interoperability testing implementation
- Gzip format mismatch: Fixed codec to produce RFC 1952 gzip format instead of RFC 1950 zlib
- Missing configuration key: Fixed codec metadata to always include
configurationfield - Metadata persistence: Documented requirement for
ExZarr.save/2afterExZarr.create/1for v3 arrays
Type System
- Dialyzer compliance: All type specifications now match inferred success types
- More precise error types: Better error reporting with specific error tuple shapes
- Better tooling support: IDE autocomplete and static analysis work correctly
Technical Details
Zarr v3 Architecture
Version abstraction layer (
lib/ex_zarr/version.ex):detect_version/1- Automatic v2/v3 detection from metadatadefault_version/0- Configurable default (v3 by default)supported_versions/0- Lists supported versions
v3 metadata module (
lib/ex_zarr/metadata_v3.ex):- Complete v3 metadata struct with validation
node_typefield for array vs group distinctiondata_typestring format (e.g., "int32", "float64")codecsarray with three-stage pipelinechunk_gridandchunk_key_encodingextensions
Chunk key encoding (
lib/ex_zarr/chunk_key.ex):- v2: dot-separated (e.g., "0.1.2")
- v3: slash-separated with prefix (e.g., "c/0/1/2")
encode/2anddecode/2for version-aware conversion
Codec pipeline (
lib/ex_zarr/codecs/pipeline_v3.ex):- Three-stage pipeline validation and execution
- Array→array codecs (filters/transforms)
- Array→bytes codec (required serializer)
- Bytes→bytes codecs (compression)
- Strict ordering enforcement per v3 spec
Gzip Format Details
- windowBits parameter in Erlang's
:zlib:15= Zlib/Deflate format (RFC 1950)16 + 15= Gzip format (RFC 1952)+16modifier adds gzip wrapper (header + CRC32 trailer)
- Magic bytes:
- Zlib:
0x78 0x9C - Gzip:
0x1F 0x8B
- Zlib:
- Compatibility: Gzip format required by zarr-python 3.x
Testing
- Total test count: 482 tests (up from 238 in v0.2.0)
- New v3 tests: 16 Python interoperability tests
- Test exclusions: Python tests automatically excluded without zarr-python 3.x
- Success rate: 100% passing (0 failures)
- Dialyzer: Zero type errors
Breaking Changes
None - Full backward compatibility maintained with v2 arrays
Planned Features
- S3 storage backend
- Parallel chunk operations
- Advanced indexing (fancy indexing, boolean indexing)
- Filter pipeline support (delta, quantize, shuffle)
- Additional codecs (lzma)
- v3 sharding extension
- v3 storage transformers