ExESDB Architecture Analysis

Overview

ExESDB is a BEAM-native Event Store built on top of the Khepri library, which is itself built on Ra, the RabbitMQ team's Raft consensus implementation. It is designed as a distributed, fault-tolerant event sourcing system that leverages the strengths of the BEAM ecosystem for concurrent, distributed workloads.

High-Level Architecture

graph TD
    A[ExESDB.App] --> B[ExESDB.System]
    B --> C[PubSub Layer]
    B --> D[Store Management]
    B --> E[Cluster Management]
    B --> F[Event Processing]
    B --> G[Gateway Layer]
    
    D --> D1[StoreManager]
    D --> D2[Store Workers]
    D --> D3[Khepri Backend]
    
    E --> E1[LibCluster]
    E --> E2[ClusterSystem]
    E --> E3[KhepriCluster]
    E --> E4[LeaderSystem]
    
    F --> F1[Streams]
    F --> F2[Snapshots]
    F --> F3[Subscriptions]
    
    G --> G1[GatewaySupervisor]
    G --> G2[GatewayWorker]

Core Components

1. Application Layer

ExESDB.App

  • Purpose: Main application entry point
  • Responsibilities:
    • Application lifecycle management
    • Initial configuration loading
    • Starting the main supervisor tree
    • Graceful shutdown handling
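
As a rough illustration of the responsibilities above, an OTP application module for this role could look like the sketch below. The module body, option handling, and the :ex_esdb application key are assumptions based on the component names in this document, not ExESDB's actual source.

    defmodule ExESDB.App do
      # Sketch only: starts the top-level ExESDB.System supervisor and
      # provides a hook for graceful shutdown.
      use Application

      @impl true
      def start(_type, _args) do
        # Load the initial configuration and hand it to the supervision tree
        opts = Application.get_all_env(:ex_esdb)

        children = [{ExESDB.System, opts}]
        Supervisor.start_link(children, strategy: :one_for_one, name: ExESDB.AppSupervisor)
      end

      @impl true
      def stop(_state) do
        # Invoked by OTP when the application shuts down
        :ok
      end
    end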

ExESDB.System

  • Purpose: Top-level supervisor for the entire system
  • Responsibilities:
    • Supervises all major subsystems
    • Manages system startup sequence
    • Handles OS signal processing
    • Dynamically configures components based on deployment mode (single vs cluster)
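
A minimal sketch of how such a supervisor can switch its children on deployment mode is shown below. The child modules and the :db_type option mirror names used in this document, but the real implementation may wire them differently.

    defmodule ExESDB.System do
      # Sketch only: clustering components are appended to the child list
      # when the node is configured for cluster mode.
      use Supervisor

      def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

      @impl true
      def init(opts) do
        base_children = [
          ExESDB.StoreManager,
          ExESDB.Streams,
          ExESDB.Snapshots,
          ExESDB.Subscriptions,
          ExESDB.GatewaySupervisor
        ]

        cluster_children =
          case Keyword.get(opts, :db_type, :single) do
            :cluster -> [ExESDB.ClusterSystem, ExESDB.KhepriCluster, ExESDB.LeaderSystem]
            :single -> []
          end

        Supervisor.init(base_children ++ cluster_children, strategy: :one_for_one)
      end
    end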

2. Storage Layer

ExESDB.StoreManager

  • Purpose: Multi-store management and coordination
  • Responsibilities:
    • Dynamic store creation and removal
    • Store lifecycle management
    • Configuration management per store
    • Store status tracking

ExESDB.Store

  • Purpose: Individual event store wrapper around Khepri
  • Responsibilities:
    • Khepri store initialization
    • Store state management
    • Direct interaction with Khepri API

graph LR
    A[StoreManager] --> B[Store1]
    A --> C[Store2]
    A --> D[StoreN]
    
    B --> E[Khepri Instance 1]
    C --> F[Khepri Instance 2]
    D --> G[Khepri Instance N]
    
    E --> H[Data Directory 1]
    F --> I[Data Directory 2]
    G --> J[Data Directory N]
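
The Store wrapper's interaction with Khepri can be pictured with the sketch below. The :khepri.start/2 and :khepri.put/3 call shapes follow Khepri's documented API, but the module and path layout are assumptions and should be checked against the Khepri version ExESDB depends on.

    defmodule StoreSketch do
      # Illustrative wrapper around one Khepri instance (not ExESDB's code)
      use GenServer

      def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

      @impl true
      def init(opts) do
        data_dir = Keyword.fetch!(opts, :data_dir)
        store_id = Keyword.fetch!(opts, :store_id)

        # Start a named Khepri store backed by the given data directory
        {:ok, store_id} = :khepri.start(data_dir, store_id)
        {:ok, %{store_id: store_id}}
      end

      @impl true
      def handle_call({:put, path, event}, _from, %{store_id: store_id} = state) do
        # Persist an event under a Khepri tree path, e.g. [:streams, "orders-123", 42]
        {:reply, :khepri.put(store_id, path, event), state}
      end
    end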

3. Clustering Layer

The clustering layer provides distributed coordination and fault tolerance:

ExESDB.KhepriCluster

  • Purpose: Khepri-specific cluster coordination
  • Responsibilities:
    • Cluster join/leave operations
    • Leadership detection and tracking
    • Membership monitoring
    • Node health monitoring
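
Joining and inspecting a Khepri cluster goes through Khepri's khepri_cluster module; a fragment along these lines is sketched below. The store and node names are made up, and return shapes vary between Khepri versions.

    # Illustrative fragment, not ExESDB's actual join logic
    store_id = :main_store
    peer = :"exesdb@node1"

    # Ask the local Khepri store to join the cluster that `peer` belongs to
    join_result = :khepri_cluster.join(store_id, peer)

    # Membership information used for leadership detection and health checks
    members = :khepri_cluster.members(store_id)
    IO.inspect({join_result, members}, label: "khepri cluster state")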

ExESDB.ClusterSystem

  • Purpose: High-level cluster coordination
  • Responsibilities:
    • Supervises cluster coordination components
    • Manages cluster-specific services
    • Handles split-brain prevention

ExESDB.LeaderSystem

  • Purpose: Leadership management
  • Responsibilities:
    • Leader election coordination
    • Leader-specific functionality activation
    • Leader state tracking

graph TD
    A[LibCluster] --> B[Node Discovery]
    B --> C[KhepriCluster]
    C --> D[Cluster Join/Leave]
    C --> E[Leadership Detection]
    C --> F[Membership Monitoring]
    
    G[ClusterSystem] --> H[ClusterCoordinator]
    G --> I[NodeMonitor]
    
    J[LeaderSystem] --> K[LeaderWorker]
    J --> L[LeaderTracker]
    
    style A fill:#e1f5fe
    style G fill:#f3e5f5
    style J fill:#e8f5e8
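
Node discovery itself is typically configured through libcluster topologies. The example below uses the gossip strategy purely as an illustration; the strategy and names ExESDB ships with may differ.

    # config/runtime.exs (example topology, not ExESDB's shipped defaults)
    import Config

    config :libcluster,
      topologies: [
        exesdb_cluster: [
          # Multicast gossip discovery for nodes on the same network segment
          strategy: Cluster.Strategy.Gossip,
          config: [port: 45_892]
        ]
      ]

    # In the supervision tree these topologies are handed to libcluster, e.g.:
    #   {Cluster.Supervisor, [topologies, [name: ExESDB.LibCluster]]}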

4. Event Processing Layer

ExESDB.Streams

  • Purpose: Event stream management
  • Responsibilities:
    • Stream read/write operations
    • Stream partitioning via PartitionSupervisor
    • Worker pool management for stream operations

ExESDB.Snapshots

  • Purpose: Snapshot management for event sourcing
  • Responsibilities:
    • Snapshot creation and retrieval
    • Snapshot versioning
    • Snapshot storage path management

ExESDB.Subscriptions

  • Purpose: Event subscription management
  • Responsibilities:
    • Subscription lifecycle management
    • Event delivery to subscribers
    • Subscription persistence

graph TD
    A[Streams] --> B[StreamsWriters Pool]
    A --> C[StreamsReaders Pool]
    
    D[Snapshots] --> E[SnapshotsWriters Pool]
    D --> F[SnapshotsReaders Pool]
    
    G[Subscriptions] --> H[SubscriptionsReader]
    G --> I[SubscriptionsWriter]
    
    B --> J[DynamicSupervisor]
    C --> K[DynamicSupervisor]
    E --> L[DynamicSupervisor]
    F --> M[DynamicSupervisor]
    
    J --> N[StreamWriterWorker1]
    J --> O[StreamWriterWorker2]
    K --> P[StreamReaderWorker1]
    K --> Q[StreamReaderWorker2]

5. Communication Layer

PubSub Integration

  • Purpose: Inter-process and inter-node communication
  • Responsibilities:
    • Event broadcasting
    • Subscription management
    • Message routing
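
Assuming a Phoenix.PubSub-compatible interface (an assumption; ExESDB may wrap this behind its own modules), the broadcast/subscribe round trip looks like this:

    # Example only; the pubsub name and topic convention are illustrative
    # A subscriber registers interest in one stream's events
    Phoenix.PubSub.subscribe(ExESDB.PubSub, "streams:orders-123")

    # After a successful write, the event is fanned out to local and remote subscribers
    Phoenix.PubSub.broadcast(
      ExESDB.PubSub,
      "streams:orders-123",
      {:event_appended, %{event_type: "OrderPlaced"}}
    )

    # The subscriber receives it as an ordinary message
    receive do
      {:event_appended, event} -> IO.inspect(event, label: "received event")
    end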

ExESDB.GatewaySupervisor & GatewayWorker

  • Purpose: External API gateway
  • Responsibilities:
    • External client request handling
    • API endpoint management
    • Request routing to appropriate subsystems

Architecture Patterns

1. Supervision Tree Pattern

graph TD
    A[ExESDB.App] --> B[ExESDB.System]
    B --> C[PubSub]
    B --> D[StoreManager]
    B --> E[Streams]
    B --> F[Snapshots]
    B --> G[Subscriptions]
    B --> H[LeaderSystem]
    B --> I[KhepriCluster]
    B --> J[GatewaySupervisor]
    B --> K[ClusterSystem]
    B --> L[EmitterPools]
    
    E --> M[PartitionSupervisor - Writers]
    E --> N[PartitionSupervisor - Readers]
    
    F --> O[PartitionSupervisor - Writers]
    F --> P[PartitionSupervisor - Readers]
    
    H --> Q[LeaderWorker]
    H --> R[LeaderTracker]
    
    K --> S[ClusterCoordinator]
    K --> T[NodeMonitor]
    
    J --> U[GatewayWorker]

2. Worker Pool Pattern

ExESDB extensively uses worker pools for different types of operations:

graph LR
    A[Client Request] --> B[PartitionSupervisor]
    B --> C[DynamicSupervisor]
    C --> D[Worker1]
    C --> E[Worker2]
    C --> F[WorkerN]
    
    G[Hash Ring] --> B
    H[Load Balancing] --> B
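
On the BEAM this pattern maps directly onto PartitionSupervisor routing over DynamicSupervisor pools. The names below are illustrative; ExESDB's actual pool modules are listed in the supervision tree above.

    # Illustrative pool wiring
    children = [
      {PartitionSupervisor, child_spec: DynamicSupervisor, name: StreamsWriterPools}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)

    # Requests are routed by hashing a key (here the stream id), so all writes
    # for one stream land on the same partition's DynamicSupervisor.
    stream_id = "orders-123"

    DynamicSupervisor.start_child(
      {:via, PartitionSupervisor, {StreamsWriterPools, stream_id}},
      {Task, fn -> IO.puts("writing to #{stream_id}") end}
    )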

3. Distributed State Management

sequenceDiagram
    participant C as Client
    participant G as Gateway
    participant L as Leader
    participant F as Follower
    participant K as Khepri
    
    C->>G: Write Request
    G->>L: Route to Leader
    L->>K: Write to Khepri
    K->>F: Replicate to Followers
    F->>K: Acknowledge
    K->>L: Confirm Write
    L->>G: Success Response
    G->>C: Return Result

Deployment Modes

Single Node Mode

  • Configuration: db_type: :single
  • Characteristics:
    • No clustering components
    • Local-only operations
    • Simplified architecture
    • Development/testing focused

Cluster Mode

  • Configuration: db_type: :cluster
  • Characteristics:
    • Full clustering capabilities
    • Distributed consensus via Ra
    • Fault tolerance
    • Production-ready

graph TD
    A[Configuration] --> B{db_type}
    B -->|:single| C[Single Node Components]
    B -->|:cluster| D[Cluster Components]
    
    C --> E[StoreManager]
    C --> F[Local Streams]
    C --> G[Local Snapshots]
    
    D --> H[LibCluster]
    D --> I[ClusterSystem]
    D --> J[KhepriCluster]
    D --> K[Distributed Components]
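
Mode selection is a configuration concern; a runtime configuration along the lines below illustrates the idea. The application name and keys are assumptions and should be checked against ExESDB's documentation.

    # config/runtime.exs (illustrative keys)
    import Config

    config :ex_esdb,
      db_type: :cluster,          # :single for local development and testing
      data_dir: "/var/lib/ex_esdb",
      store_id: :main_store,
      timeout: 10_000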

Data Flow

Event Write Flow

sequenceDiagram
    participant C as Client
    participant GW as Gateway
    participant SM as StoreManager
    participant S as Store
    participant K as Khepri
    participant PS as PubSub
    
    C->>GW: Write Event
    GW->>SM: Route to Store
    SM->>S: Write Request
    S->>K: Store Event
    K->>S: Confirm Write
    S->>PS: Publish Event
    PS->>C: Event Notification
    S->>GW: Success Response
    GW->>C: Return Result
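
From the client's point of view the whole write flow collapses into one append call. The function and argument shapes below are placeholders chosen for illustration, not ExESDB's published API.

    # Hypothetical client-side shape; append_events/3 is a stub standing in
    # for whatever the gateway actually exposes
    append_events = fn _store, _stream, events -> {:ok, length(events)} end

    event = %{
      event_type: "OrderPlaced",
      data: %{order_id: "orders-123", total_cents: 4_200},
      metadata: %{correlation_id: "abc-1"}
    }

    # The gateway routes the write to the owning stream worker, Khepri persists it,
    # subscribers are notified via PubSub, and the new stream version is returned.
    {:ok, new_version} = append_events.(:main_store, "orders-123", [event])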

Event Read Flow

sequenceDiagram
    participant C as Client
    participant GW as Gateway
    participant SR as StreamReader
    participant S as Store
    participant K as Khepri
    
    C->>GW: Read Stream
    GW->>SR: Route to Reader
    SR->>S: Read Request
    S->>K: Query Events
    K->>S: Return Events
    S->>SR: Event Data
    SR->>GW: Stream Response
    GW->>C: Return Events

Key Design Decisions

1. Khepri as Backend

  • Rationale: BEAM-native, Ra-based distributed database
  • Benefits:
    • Native Erlang integration
    • Built-in clustering
    • Strong consistency guarantees
    • Fault tolerance

2. Supervisor Tree Architecture

  • Rationale: Leverages OTP supervision principles
  • Benefits:
    • Fault isolation
    • Automatic restart strategies
    • System resilience
    • Clear responsibility boundaries

3. Worker Pool Pattern

  • Rationale: Efficient concurrent processing
  • Benefits:
    • Load distribution
    • Resource management
    • Scalability
    • Fault tolerance

4. Multi-Store Architecture

  • Rationale: Support for multiple event stores in a single cluster
  • Benefits:
    • Tenant isolation
    • Resource optimization
    • Flexible deployment
    • Gradual migration support

Performance Considerations

Partitioning Strategy

  • Uses PartitionSupervisor for distributing workload
  • Hash-based routing for even distribution
  • Separate pools for read/write operations
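
Hash-based routing of this kind boils down to :erlang.phash2/2 over a routing key, which is also the hash PartitionSupervisor uses for its own routing; a tiny illustration:

    # Illustrative: a stream id always hashes to the same partition, so writes
    # for one stream are serialized through a single writer worker.
    partitions = System.schedulers_online()
    stream_id = "orders-123"

    partition = :erlang.phash2(stream_id, partitions)
    IO.puts("#{stream_id} -> writer partition #{partition}")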

Clustering Optimization

  • Configurable probe intervals for node monitoring
  • Failure thresholds to prevent cascade failures
  • Efficient membership change detection

Resource Management

  • Dynamic worker creation/destruction
  • Configurable timeout values
  • Memory-efficient event storage via Khepri

Security Considerations

Network Security

  • Node-to-node communication via Erlang distribution
  • Cluster authentication via shared secrets (e.g. the Erlang distribution cookie)
  • Network partitioning detection and handling

Access Control

  • Gateway-based request filtering
  • Store-level access control
  • Subscription-based permissions

Monitoring and Observability

Metrics Collection

  • Built-in metrics module
  • Performance monitoring
  • Cluster health tracking

Logging Strategy

  • Structured logging throughout
  • Configurable log levels
  • Cluster-aware log correlation

Health Checks

  • Node health monitoring
  • Store availability checks
  • Leadership status tracking

Scalability Patterns

Horizontal Scaling

  • Add nodes to existing cluster
  • Automatic workload redistribution
  • Leader election for coordination

Vertical Scaling

  • Worker pool sizing
  • Memory allocation tuning
  • Timeout configuration

This architecture provides a solid foundation for building distributed, fault-tolerant event sourcing systems while leveraging the unique strengths of the BEAM ecosystem.