ExMCP.Reliability.HealthCheck (ex_mcp v0.9.2)
Health check system for MCP clients and servers.
Provides proactive health monitoring, automatic failure detection, and recovery mechanisms for MCP services.
Features
- Periodic health checks with configurable intervals
- Multiple check strategies (ping, capability check, custom)
- Automatic status updates and notifications
- Integration with circuit breakers and retry logic
- Health metrics and history tracking
Usage
alias ExMCP.Reliability.HealthCheck

# Start a health checker for a client
{:ok, checker} =
  HealthCheck.start_link(
    name: :my_health_check,
    target: client_pid,
    check_interval: 30_000,
    timeout: 5_000,
    failure_threshold: 3,
    recovery_threshold: 2
  )

# Get the current health status
HealthCheck.get_status(checker)
#=> {:healthy, %{last_check: ~U[...], consecutive_successes: 5}}

# Subscribe to health events
HealthCheck.subscribe(checker)

# Trigger a manual health check
HealthCheck.check_now(checker)
Types
@type check_result() :: %{
  status: status(),
  timestamp: DateTime.t(),
  duration_ms: non_neg_integer(),
  details: map()
}
@type status() :: :healthy | :unhealthy | :degraded | :unknown
@type t() :: %ExMCP.Reliability.HealthCheck{
  check_fn: (any() -> {:ok, map()} | {:error, any()}) | nil,
  check_interval: pos_integer(),
  consecutive_failures: non_neg_integer(),
  consecutive_successes: non_neg_integer(),
  failure_threshold: pos_integer(),
  history: [check_result()],
  last_check_result: check_result() | nil,
  last_check_time: DateTime.t() | nil,
  metadata: map(),
  name: atom(),
  on_status_change: (status(), status() -> any()) | nil,
  recovery_threshold: pos_integer(),
  status: status(),
  subscribers: MapSet.t(pid()),
  target: pid() | atom(),
  timeout: pos_integer(),
  timer_ref: reference() | nil
}
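The counters and thresholds in this struct suggest a simple state machine. The following is a minimal sketch of how such transitions might work, not the library's actual implementation: `ThresholdSketch` and `record/2` are hypothetical names, the state is a plain map that mirrors only a few struct fields, and the `:degraded` status is left out because this page does not say when it applies.

```elixir
defmodule ThresholdSketch do
  # Hypothetical helper, not part of ExMCP. Folds one check outcome
  # (:ok or :error) into a map mirroring a few HealthCheck struct fields.

  def record(state, :ok) do
    successes = state.consecutive_successes + 1

    # Assumption: the status flips to :healthy only once enough
    # consecutive successes have accumulated.
    status =
      if successes >= state.recovery_threshold, do: :healthy, else: state.status

    %{state | consecutive_successes: successes, consecutive_failures: 0, status: status}
  end

  def record(state, :error) do
    failures = state.consecutive_failures + 1

    # Assumption: the status flips to :unhealthy only once enough
    # consecutive failures have accumulated.
    status =
      if failures >= state.failure_threshold, do: :unhealthy, else: state.status

    %{state | consecutive_failures: failures, consecutive_successes: 0, status: status}
  end
end

initial = %{
  status: :unknown,
  consecutive_failures: 0,
  consecutive_successes: 0,
  failure_threshold: 3,
  recovery_threshold: 2
}

# Three failures followed by two successes.
outcomes = [:error, :error, :error, :ok, :ok]
final = Enum.reduce(outcomes, initial, &ThresholdSketch.record(&2, &1))
```

Under these assumptions, the third failure moves the sketch to `:unhealthy`, and the second consecutive success brings it back to `:healthy`.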
Functions
@spec check_now(GenServer.server()) :: check_result()
Triggers an immediate health check.
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
See Supervisor.
@spec get_history(GenServer.server(), pos_integer()) :: [check_result()]
Gets health check history.
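Because the history is a list of `check_result()` maps, it can be summarized with ordinary `Enum` functions. The sketch below is one way to do that. `HistorySketch` and `summarize/1` are hypothetical names, and the sample list is hand-written data in the documented shape rather than output captured from the library.

```elixir
defmodule HistorySketch do
  # Hypothetical helper, not part of ExMCP. Summarizes a list of
  # check_result() maps as documented in the Types section.

  def summarize([]), do: %{checks: 0, healthy_ratio: nil, avg_duration_ms: nil}

  def summarize(history) do
    total = length(history)
    healthy = Enum.count(history, &(&1.status == :healthy))
    duration = history |> Enum.map(& &1.duration_ms) |> Enum.sum()

    %{checks: total, healthy_ratio: healthy / total, avg_duration_ms: duration / total}
  end
end

# Hand-written sample data in the shape of check_result().
history = [
  %{status: :healthy, timestamp: ~U[2024-01-01 00:00:00Z], duration_ms: 10, details: %{}},
  %{status: :unhealthy, timestamp: ~U[2024-01-01 00:01:00Z], duration_ms: 30, details: %{}}
]

summary = HistorySketch.summarize(history)
```

In real use, the `history` binding would presumably come from a `get_history/2` call instead of a literal list.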
@spec get_status(GenServer.server()) :: {status(), map()}
Gets the current health status.
Creates a health check function for MCP clients.
This function attempts to list tools as a health check.
Creates a health check function for MCP servers.
This function sends an initialize request to check server health.
@spec pause(GenServer.server()) :: :ok
Pauses health checks.
@spec resume(GenServer.server()) :: :ok
Resumes health checks.
@spec start_link(keyword()) :: GenServer.on_start()
Starts a health check process.
Options
- :name - Process name (required)
- :target - PID or name of process to check (required)
- :check_fn - Custom check function (optional, defaults to MCP ping)
- :check_interval - Ms between checks (default: 60000)
- :timeout - Check timeout in ms (default: 5000)
- :failure_threshold - Failures before unhealthy (default: 3)
- :recovery_threshold - Successes before healthy (default: 2)
- :on_status_change - Callback for status changes
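A custom :check_fn has to match the type shown in the struct: one argument in, `{:ok, map()}` or `{:error, reason}` out. The following sketch shows one possible liveness check. `CheckFnSketch` and `alive_check/0` are hypothetical names, and it assumes the function is called with the configured :target, which this page does not state explicitly.

```elixir
defmodule CheckFnSketch do
  # Hypothetical module, not part of ExMCP. Builds a function with the
  # shape documented for :check_fn:
  #   (any() -> {:ok, map()} | {:error, any()})
  def alive_check do
    fn target ->
      # Assumption: the argument is the configured :target (a PID or a
      # registered name).
      pid = if is_pid(target), do: target, else: Process.whereis(target)

      cond do
        is_nil(pid) -> {:error, :not_registered}
        Process.alive?(pid) -> {:ok, %{message_queue_len: queue_len(pid)}}
        true -> {:error, :not_alive}
      end
    end
  end

  # Process.info/2 returns nil if the process exits between the two calls.
  defp queue_len(pid) do
    case Process.info(pid, :message_queue_len) do
      {:message_queue_len, n} -> n
      nil -> 0
    end
  end
end

check = CheckFnSketch.alive_check()
result = check.(self())
```

Such a function could then be passed as `check_fn: CheckFnSketch.alive_check()` alongside the other options. Note that `Process.alive?/1` only works for local PIDs, so this sketch does not cover remote targets.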
@spec subscribe(GenServer.server()) :: :ok
Subscribes to health status changes.
Subscribers receive messages: {:health_status_changed, old_status, new_status, metadata}
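A subscriber is typically a process that pattern-matches on this tuple. Here is a minimal sketch of a GenServer that records status transitions. `HealthListenerSketch` is a hypothetical module, and the message is sent by hand so the example runs without a live checker.

```elixir
defmodule HealthListenerSketch do
  # Hypothetical module, not part of ExMCP. Collects status transitions
  # delivered in the documented message format.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def transitions(server), do: GenServer.call(server, :transitions)

  @impl true
  def init(:ok) do
    # In a real application this is where the process would likely call
    # HealthCheck.subscribe(checker), assuming the caller is the process
    # that gets registered as a subscriber.
    {:ok, []}
  end

  @impl true
  def handle_call(:transitions, _from, transitions) do
    {:reply, Enum.reverse(transitions), transitions}
  end

  @impl true
  def handle_info({:health_status_changed, old_status, new_status, _metadata}, transitions) do
    {:noreply, [{old_status, new_status} | transitions]}
  end

  # Ignore unrelated messages instead of crashing.
  def handle_info(_other, transitions), do: {:noreply, transitions}
end

{:ok, listener} = HealthListenerSketch.start_link()

# Simulate the notification a checker would send.
send(listener, {:health_status_changed, :healthy, :unhealthy, %{}})
seen = HealthListenerSketch.transitions(listener)
```

Keeping a catch-all `handle_info/2` clause means stray messages do not take the listener down.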
@spec unsubscribe(GenServer.server()) :: :ok
Unsubscribes from health status changes.