Reconnection Architecture Guide
View SourceOverview
WebsockexAdapter implements a dual-layer reconnection architecture that provides both network-level and process-level reliability. This design leverages Gun's flexible process ownership model to ensure robust connection management for financial trading systems.
Dual-Layer Design Rationale
Layer 1: Client-Level Reconnection (Network Recovery)
The WebsockexAdapter.Client
GenServer provides built-in reconnection for network-level failures:
- Handles temporary network outages
- Manages WebSocket protocol errors
- Preserves connection state during recovery
- Configurable via
reconnect_on_error
option (default: true)
Layer 2: Adapter-Level Supervision (Process Recovery)
Platform adapters like DeribitGenServerAdapter
provide process-level supervision:
- Monitors Client process health
- Handles complete process crashes
- Restores authentication state
- Re-establishes subscriptions
- Always sets
reconnect_on_error: false
to prevent duplicate attempts
Gun Process Ownership Model
Gun creates a dedicated process for each WebSocket connection with flexible ownership:
┌─────────────────┐
│ Client │
│ GenServer │─── owns ──→ ┌─────────────┐
└─────────────────┘ │ Gun Process │
└─────────────┘
Key Features:
- The process that creates the Gun connection owns it initially
- Gun sends all messages to the owning process
- Ownership can be transferred between processes if needed
- If owner dies, Gun process terminates
- New connections require new Gun processes
This ownership model enables our Client GenServer to directly receive and handle all Gun messages, including WebSocket frames and heartbeats.
When to Use Each Reconnection Mechanism
Use Client-Level Reconnection (Standalone Pattern)
For simple use cases without complex state requirements:
# Client handles its own reconnection
{:ok, client} = Client.connect(url) # reconnect_on_error: true (default)
Characteristics:
- Network failures trigger automatic reconnection
- Connection state preserved
- No authentication restoration
- No subscription restoration
- Suitable for read-only market data
Use Adapter-Level Supervision (Supervised Pattern)
For production trading systems requiring full state restoration:
# Adapter disables client reconnection
connect_opts = [
reconnect_on_error: false, # Client stops cleanly on errors
heartbeat_config: %{...} # Other options preserved
]
Characteristics:
- Adapter monitors Client process
- Full state restoration on crashes
- Authentication automatically restored
- Subscriptions re-established
- Cancel-on-disconnect protection maintained
Message and Failure Flows
Normal Operation Flow
User Request
↓
Adapter GenServer
↓
Client GenServer
↓
Gun Process ←──── WebSocket ────→ Exchange
↓
Client GenServer (receives Gun messages directly)
↓
Adapter GenServer
↓
User Response
Network Failure with Client Reconnection
Network Failure Detected
↓
Gun Process notifies Client GenServer
↓
Client checks reconnect_on_error
↓
If true: Exponential backoff retry
↓
Create new Gun process (Client owns it)
↓
Re-establish WebSocket
↓
Resume message flow
Process Crash with Adapter Supervision
Client Process Crash
↓
Adapter receives {:DOWN, ...}
↓
Adapter initiates reconnection
↓
Create new Client (reconnect_on_error: false)
↓
New Client creates Gun process (owns it)
↓
Restore authentication
↓
Re-establish subscriptions
↓
Resume trading operations
Architecture Decision Records
ADR-001: Dual-Layer Reconnection
Context: Different use cases require different reconnection strategies. Decision: Implement reconnection at both Client and Adapter layers. Consequences: Clear separation of concerns, no duplicate attempts, flexible deployment.
ADR-002: Configuration-Based Behavior
Context: Need to prevent duplicate reconnection attempts in supervised scenarios.
Decision: Use reconnect_on_error
flag to control Client behavior.
Consequences: Backward compatibility maintained, explicit configuration required.
ADR-003: Client GenServer Owns Gun Connection
Context: Gun sends messages to the owning process; integrated heartbeat needs these messages. Decision: Client GenServer always owns the Gun connection it creates. Consequences: Direct message handling, simplified architecture, no message forwarding needed.
ADR-004: Adapter Ownership of Reconnection Logic
Context: Production systems need full state restoration beyond network recovery. Decision: Adapters always disable Client reconnection and handle it themselves. Consequences: Consistent pattern, predictable behavior, simplified debugging.
Performance Considerations
Connection Pooling: Each reconnection creates a new Gun process. Consider connection pooling for high-frequency reconnection scenarios.
Exponential Backoff: Both layers implement exponential backoff to prevent thundering herd problems.
State Restoration Cost: Adapter-level reconnection has higher overhead due to authentication and subscription restoration.
Memory Usage: Failed Gun processes are garbage collected. Monitor process count during extended outages.
Message Routing: Client GenServer owns Gun connection, eliminating message forwarding overhead.
Best Practices
- Always use supervised pattern for production trading
- Monitor both Client and Adapter processes
- Implement circuit breakers for persistent failures
- Log reconnection attempts with correlation IDs
- Test reconnection scenarios in staging environments
- Set appropriate timeout values for your latency requirements
- Ensure Client GenServer owns Gun connection for direct message handling
Implementation Details
Client Configuration
defmodule WebsockexAdapter.Config do
defstruct [
# ... other fields ...
reconnect_on_error: true, # Default: enabled for standalone use
# ... other fields ...
]
end
Adapter Pattern
# In adapter's do_connect function
connect_opts = [
heartbeat_config: %{
type: :deribit,
interval: heartbeat_interval
},
reconnect_on_error: false # Critical: prevents duplicate reconnection
]
{:ok, client} = Client.connect(url, connect_opts)
Monitor Pattern
# Adapter monitors Client for process-level failures
ref = Process.monitor(client.server_pid)
# Handle Client death
def handle_info({:DOWN, ^ref, :process, _pid, reason}, state) do
# Initiate adapter-level reconnection with full state restoration
end