ExLLM.Infrastructure.CircuitBreaker.HealthCheck (ex_llm v0.8.1)
Health check and monitoring system for circuit breakers.
Provides comprehensive health assessment for both individual circuits and the overall circuit breaker system. Includes health scoring, issue detection, and recommendations for maintaining optimal fault tolerance.
Health Scoring
Health scores range from 0 to 100:
- 90-100: Excellent - Circuit is performing optimally
- 70-89: Good - Circuit is stable with minor concerns
- 50-69: Fair - Circuit has issues that should be monitored
- 30-49: Poor - Circuit requires attention
- 0-29: Critical - Circuit needs immediate intervention
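The score bands above can be sketched as a small lookup. This is only an illustration of the documented ranges: the module `HealthLevelSketch` and the function `level_for/1` are hypothetical names, not part of ExLLM, and the library's own mapping may be implemented differently.

```elixir
defmodule HealthLevelSketch do
  # Hypothetical helper (not an ExLLM function): maps a 0..100 health score
  # to the level names used in the score bands listed above.
  @spec level_for(0..100) :: :excellent | :good | :fair | :poor | :critical
  def level_for(score) when score in 90..100, do: :excellent
  def level_for(score) when score in 70..89, do: :good
  def level_for(score) when score in 50..69, do: :fair
  def level_for(score) when score in 30..49, do: :poor
  def level_for(score) when score in 0..29, do: :critical
end

IO.inspect(HealthLevelSketch.level_for(72))
```

Scores outside 0..100 deliberately raise a `FunctionClauseError` in this sketch, so an out-of-range value is noticed rather than silently clamped.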
Health Factors
- State: Circuit breaker state (closed/open/half-open)
- Failure Rate: Recent failure percentage
- Recovery Time: Time circuits spend in open state
- Frequency: How often circuits are being triggered
- Bulkhead Utilization: Concurrency and queue usage
- Configuration: Threshold appropriateness
Usage
# Check overall system health
ExLLM.Infrastructure.CircuitBreaker.HealthCheck.system_health()
# Check specific circuit health
ExLLM.Infrastructure.CircuitBreaker.HealthCheck.circuit_health("api_service")
# Get health summary for all circuits
ExLLM.Infrastructure.CircuitBreaker.HealthCheck.health_summary()
# Get detailed health report
ExLLM.Infrastructure.CircuitBreaker.HealthCheck.health_report()
Types
@type circuit_health() :: %{
  circuit_name: String.t(),
  health_score: health_score(),
  health_level: health_level(),
  state: :closed | :open | :half_open,
  issues: [String.t()],
  recommendations: [String.t()],
  metrics: map(),
  last_updated: DateTime.t()
}
@type health_level() :: :excellent | :good | :fair | :poor | :critical
@type health_score() :: 0..100
@type system_health() :: %{
  overall_score: health_score(),
  overall_level: health_level(),
  total_circuits: non_neg_integer(),
  healthy_circuits: non_neg_integer(),
  unhealthy_circuits: non_neg_integer(),
  critical_circuits: non_neg_integer(),
  issues: [String.t()],
  recommendations: [String.t()],
  last_updated: DateTime.t()
}
Functions
@spec circuit_health(String.t(), keyword()) :: {:ok, circuit_health()} | {:error, term()}
Get detailed health status for a specific circuit.
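Because the spec returns a tagged tuple, callers will usually pattern match on both branches. The sketch below shows one way to do that. `CircuitHealthSketch.describe/1` is a hypothetical helper, and the sample map is hand-built to match the `circuit_health()` type rather than produced by the library, so the snippet runs without ExLLM installed.

```elixir
defmodule CircuitHealthSketch do
  # Hypothetical helper (not an ExLLM function): formats a result shaped like
  # the return value of circuit_health/2 as a single log line.
  def describe({:ok, %{circuit_name: name, health_score: score, health_level: level,
                       state: state, issues: issues}}) do
    base = "#{name}: #{level} (#{score}/100), state=#{state}"

    case issues do
      [] -> base
      _ -> base <> "; issues: " <> Enum.join(issues, ", ")
    end
  end

  def describe({:error, reason}), do: "health check failed: #{inspect(reason)}"
end

# Hand-built sample following the circuit_health() type; the values are made up.
sample =
  {:ok,
   %{
     circuit_name: "api_service",
     health_score: 42,
     health_level: :poor,
     state: :half_open,
     issues: ["high failure rate"],
     recommendations: [],
     metrics: %{},
     last_updated: DateTime.utc_now()
   }}

IO.puts(CircuitHealthSketch.describe(sample))
```

In real use the argument would be the result of `ExLLM.Infrastructure.CircuitBreaker.HealthCheck.circuit_health("api_service")`. The shape of the `{:error, term()}` reason is not documented here, so the sketch only inspects it.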
Get circuits that need immediate attention.
Get a detailed health report for dashboard/monitoring systems.
Get a summary of health status for all circuits.
Check if the circuit breaker system is healthy overall.
@spec system_health(keyword()) :: {:ok, system_health()} | {:error, term()}
Get comprehensive health status for the entire circuit breaker system.
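A monitoring job might reduce the `system_health()` map to a single alert decision. The following is a minimal sketch under assumptions: `SystemHealthSketch.alert?/1` and its policy (alert on any critical circuit, on a `:poor` or `:critical` overall level, or on an error result) are hypothetical and not something the library prescribes.

```elixir
defmodule SystemHealthSketch do
  # Hypothetical alerting policy (not ExLLM behavior): decides whether a
  # result shaped like the return value of system_health/1 warrants an alert.
  def alert?({:ok, %{critical_circuits: critical, overall_level: level}}) do
    critical > 0 or level in [:poor, :critical]
  end

  # Treat a failed health check as alert-worthy, since the state is unknown.
  def alert?({:error, _reason}), do: true
end

# Hand-built sample following the system_health() type; the values are made up.
sample =
  {:ok,
   %{
     overall_score: 81,
     overall_level: :good,
     total_circuits: 4,
     healthy_circuits: 3,
     unhealthy_circuits: 1,
     critical_circuits: 1,
     issues: [],
     recommendations: [],
     last_updated: DateTime.utc_now()
   }}

IO.inspect(SystemHealthSketch.alert?(sample))
```

Matching on only the two fields the policy needs keeps the function tolerant of extra keys in the map.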