ExZarr.Storage.Backend.MongoGridFS (ExZarr v1.1.0)

MongoDB GridFS storage backend for Zarr arrays.

Stores chunks and metadata in MongoDB using GridFS, which is designed for storing and retrieving large files. Ideal for distributed deployments with MongoDB replication.

Configuration

Accepts the following options:

  • :url - MongoDB connection URL (required)
  • :database - Database name (required)
  • :bucket - GridFS bucket name (optional, default: "zarr")
  • :array_id - Unique identifier for this array (required)
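
The option rules above can be sketched as a small check that runs before any connection is opened. This is only an illustration: the module `MongoGridFSOptionsSketch` and its `normalize/1` function are hypothetical and not part of ExZarr, and the backend may validate its options differently.

```elixir
defmodule MongoGridFSOptionsSketch do
  # Hypothetical helper, not part of ExZarr. It mirrors the option list
  # documented above: three required keys and one optional key with a default.
  @required [:url, :database, :array_id]
  @default_bucket "zarr"

  # Returns {:ok, opts_with_defaults} or {:error, {:missing_options, keys}}.
  def normalize(opts) when is_list(opts) do
    case Enum.reject(@required, &Keyword.has_key?(opts, &1)) do
      [] -> {:ok, Keyword.put_new(opts, :bucket, @default_bucket)}
      missing -> {:error, {:missing_options, missing}}
    end
  end
end
```

Calling `normalize/1` with only `:url`, `:database`, and `:array_id` yields options whose `:bucket` is `"zarr"`, matching the documented default.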

Dependencies

Requires the mongodb_driver package:

{:mongodb_driver, "~> 1.4"}

Example

# Register the MongoDB GridFS backend
:ok = ExZarr.Storage.Registry.register(ExZarr.Storage.Backend.MongoGridFS)

# Create array with MongoDB GridFS storage
{:ok, array} = ExZarr.create(
  shape: {1000, 1000},
  chunks: {100, 100},
  dtype: :float64,
  storage: :mongo_gridfs,
  url: "mongodb://localhost:27017",
  database: "zarr_db",
  bucket: "arrays",
  array_id: "experiment_001"
)

# Write and read data (`data` must already hold the values for the 100x100 region)
ExZarr.Array.set_slice(array, data, start: {0, 0}, stop: {100, 100})
{:ok, result} = ExZarr.Array.get_slice(array, start: {0, 0}, stop: {100, 100})

GridFS Structure

Files are stored with the following naming convention:

{array_id}/.zarray           # Metadata
{array_id}/0.0               # Chunk at index (0, 0)
{array_id}/0.1               # Chunk at index (0, 1)
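
The naming convention above can be reproduced with a few lines of Elixir. The module `GridFSNamingSketch` and its functions are hypothetical names used for illustration only; the backend's own key-building code may be organised differently.

```elixir
defmodule GridFSNamingSketch do
  # Hypothetical helper, not part of ExZarr. It rebuilds the documented
  # GridFS filenames from an array id and a chunk index tuple.

  # "{array_id}/.zarray" holds the array metadata.
  def metadata_filename(array_id), do: "#{array_id}/.zarray"

  # A chunk index such as {0, 1} becomes "{array_id}/0.1".
  def chunk_filename(array_id, chunk_index) when is_tuple(chunk_index) do
    key =
      chunk_index
      |> Tuple.to_list()
      |> Enum.map_join(".", &Integer.to_string/1)

    "#{array_id}/#{key}"
  end
end
```

Because every filename starts with the array id, several arrays can share one bucket without their names colliding.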

Performance Considerations

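  • GridFS splits each stored file into 255 KiB pieces by default (these pieces are distinct from Zarr chunks)
  • GridFS targets files larger than MongoDB's 16 MB BSON document limit, so it suits large Zarr chunks; smaller chunks are also stored through GridFS
  • Benefits from MongoDB sharding for large deployments
  • Consider indexing the filename field of the bucket's files collection for faster lookups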
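
To make the size arithmetic concrete, the sketch below estimates how many GridFS pieces one Zarr chunk occupies. It assumes the default 255 KiB piece size and uncompressed data; a compressor would shrink the stored size. The module `GridFSPiecesSketch` is hypothetical and not part of ExZarr.

```elixir
defmodule GridFSPiecesSketch do
  # Hypothetical helper, not part of ExZarr.
  # Default GridFS piece size: 255 KiB = 261,120 bytes.
  @gridfs_piece_bytes 255 * 1024

  # Size in bytes of one uncompressed Zarr chunk, given its shape
  # and the byte width of one element (8 for :float64).
  def uncompressed_chunk_bytes(chunk_shape, item_bytes) when is_tuple(chunk_shape) do
    chunk_shape
    |> Tuple.to_list()
    |> Enum.reduce(item_bytes, fn dim, acc -> dim * acc end)
  end

  # Number of GridFS pieces needed for a file of the given size
  # (integer ceiling division).
  def pieces_for(byte_size) when is_integer(byte_size) and byte_size >= 0 do
    div(byte_size + @gridfs_piece_bytes - 1, @gridfs_piece_bytes)
  end
end
```

For the `{100, 100}` `:float64` chunks used in the example, each chunk is 80,000 bytes and fits in a single GridFS piece, whereas a 16 MiB chunk would span 65 pieces.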

Error Handling

MongoDB errors are returned as {:error, reason} tuples. Common errors:

  • :connection_error - Cannot connect to MongoDB
  • :not_found - File doesn't exist
  • :database_error - MongoDB operation failed
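
A caller can pattern match on these tuples to decide what to do next. The sketch below is an assumption about how an application might react, not behaviour defined by ExZarr: the module `GridFSErrorSketch`, its `classify/1` function, and the returned atoms (`:missing`, `:retry`, `:fail`) are all hypothetical.

```elixir
defmodule GridFSErrorSketch do
  # Hypothetical helper, not part of ExZarr. It maps the documented
  # {:error, reason} tuples to an application-level decision.

  # Successful results pass through unchanged.
  def classify({:ok, value}), do: {:ok, value}

  # The file does not exist in the bucket.
  def classify({:error, :not_found}), do: :missing

  # The server was unreachable; the caller may choose to retry.
  def classify({:error, :connection_error}), do: :retry

  # The operation reached MongoDB but failed.
  def classify({:error, :database_error}), do: :fail

  # Any reason not listed in the documentation.
  def classify({:error, other}), do: {:unexpected, other}
end
```

A result such as the one returned by `ExZarr.Array.get_slice/2` could be piped into `classify/1`, keeping the retry or failure policy in one place.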