ExMCP.Reliability.HealthCheck (ex_mcp v0.9.2)

Health check system for MCP clients and servers.

Provides proactive health monitoring, automatic failure detection, and recovery mechanisms for MCP services.

Features

  • Periodic health checks with configurable intervals
  • Multiple check strategies (ping, capability check, custom)
  • Automatic status updates and notifications
  • Integration with circuit breakers and retry logic
  • Health metrics and history tracking

Usage

alias ExMCP.Reliability.HealthCheck

# Start a health checker for a client (client_pid is an already running MCP client)
{:ok, checker} = HealthCheck.start_link(
  name: :my_health_check,
  target: client_pid,
  check_interval: 30_000,
  timeout: 5_000,
  failure_threshold: 3,
  recovery_threshold: 2
)

# Get current health status
HealthCheck.get_status(checker)
#=> {:healthy, %{last_check: ~U[...], consecutive_successes: 5}}

# Subscribe to health events
HealthCheck.subscribe(checker)

# Manual health check
HealthCheck.check_now(checker)

Summary

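Functions

  • check_now(checker) - Triggers an immediate health check.
  • child_spec(init_arg) - Returns a specification to start this module under a supervisor.
  • get_history(checker, limit \\ 10) - Gets health check history.
  • get_status(checker) - Gets the current health status.
  • mcp_client_check_fn() - Creates a health check function for MCP clients.
  • mcp_server_check_fn() - Creates a health check function for MCP servers.
  • pause(checker) - Pauses health checks.
  • resume(checker) - Resumes health checks.
  • start_link(opts) - Starts a health check process.
  • subscribe(checker) - Subscribes to health status changes.
  • unsubscribe(checker) - Unsubscribes from health status changes.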

Types

check_result()

@type check_result() :: %{
  status: status(),
  timestamp: DateTime.t(),
  duration_ms: non_neg_integer(),
  details: map()
}
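As an illustration of the check_result() shape, the following minimal sketch times a check and wraps its outcome in a map with the same four keys. CheckResultSketch and run/1 are hypothetical names that are not part of ExMCP, and mapping {:ok, _} to :healthy and {:error, _} to :unhealthy is an assumption about how a single result might be classified.

```elixir
defmodule CheckResultSketch do
  # Hypothetical helper, not part of ExMCP: runs a zero-arity check function
  # and wraps its outcome in a map shaped like check_result().
  def run(check_fn) when is_function(check_fn, 0) do
    started = System.monotonic_time(:millisecond)

    # Assumption: a single success counts as :healthy, a single error as :unhealthy.
    {status, details} =
      case check_fn.() do
        {:ok, details} when is_map(details) -> {:healthy, details}
        {:error, reason} -> {:unhealthy, %{error: reason}}
      end

    %{
      status: status,
      timestamp: DateTime.utc_now(),
      duration_ms: System.monotonic_time(:millisecond) - started,
      details: details
    }
  end
end
```

Using the monotonic clock for duration_ms keeps the value non-negative even if the wall clock is adjusted during the check.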

status()

@type status() :: :healthy | :unhealthy | :degraded | :unknown

t()

@type t() :: %ExMCP.Reliability.HealthCheck{
  check_fn: (any() -> {:ok, map()} | {:error, any()}) | nil,
  check_interval: pos_integer(),
  consecutive_failures: non_neg_integer(),
  consecutive_successes: non_neg_integer(),
  failure_threshold: pos_integer(),
  history: [check_result()],
  last_check_result: check_result() | nil,
  last_check_time: DateTime.t() | nil,
  metadata: map(),
  name: atom(),
  on_status_change: (status(), status() -> any()) | nil,
  recovery_threshold: pos_integer(),
  status: status(),
  subscribers: MapSet.t(pid()),
  target: pid() | atom(),
  timeout: pos_integer(),
  timer_ref: reference() | nil
}

Functions

check_now(checker)

@spec check_now(GenServer.server()) :: check_result()

Triggers an immediate health check.

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

get_history(checker, limit \\ 10)

@spec get_history(GenServer.server(), pos_integer()) :: [check_result()]

Gets health check history.
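Since get_history/2 returns a list of check_result() maps, simple metrics can be derived from it with plain Enum calls. The sketch below is one possible summary. HistorySketch and summarize/1 are hypothetical names, and the input is an ordinary list so the example does not depend on a running checker.

```elixir
defmodule HistorySketch do
  # Hypothetical helper, not part of ExMCP: summarizes a list of
  # check_result() maps such as the one returned by get_history/2.
  def summarize([]), do: %{checks: 0, healthy_ratio: nil, avg_duration_ms: nil}

  def summarize(history) when is_list(history) do
    total = length(history)
    healthy = Enum.count(history, &(&1.status == :healthy))
    total_ms = history |> Enum.map(& &1.duration_ms) |> Enum.sum()

    %{
      checks: total,
      healthy_ratio: healthy / total,
      avg_duration_ms: total_ms / total
    }
  end
end
```

In practice the argument would be something like HealthCheck.get_history(checker, 50).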

get_status(checker)

@spec get_status(GenServer.server()) :: {status(), map()}

Gets the current health status.
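The {status(), map()} return value lends itself to pattern matching. As a rough sketch, a caller might gate work on the reported status like this. StatusSketch and accept_traffic?/1 are hypothetical, and treating :degraded as still usable is an assumption, not documented behavior.

```elixir
defmodule StatusSketch do
  # Hypothetical policy, not part of ExMCP: decides whether to keep using a
  # target based on a {status, metadata} tuple as returned by get_status/1.
  def accept_traffic?({:healthy, _meta}), do: true
  # Assumption: a degraded target is still usable.
  def accept_traffic?({:degraded, _meta}), do: true
  def accept_traffic?({status, _meta}) when status in [:unhealthy, :unknown], do: false
end
```

A call site would then read StatusSketch.accept_traffic?(HealthCheck.get_status(checker)).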

mcp_client_check_fn()

@spec mcp_client_check_fn() :: (pid() -> {:ok, map()} | {:error, any()})

Creates a health check function for MCP clients.

The returned function attempts to list tools as its health check.

mcp_server_check_fn()

@spec mcp_server_check_fn() :: (pid() -> {:ok, map()} | {:error, any()})

Creates a health check function for MCP servers.

The returned function sends an initialize request to check server health.
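Both built-in check functions have the shape (pid() -> {:ok, map()} | {:error, any()}), so a custom function for the :check_fn option can follow the same contract. The sketch below is a hypothetical example that reports a local process as healthy while it is alive and its mailbox stays under a limit. CustomCheckSketch, mailbox_check_fn/1 and the error terms are illustrative names, and it assumes the checker passes the target pid to the function.

```elixir
defmodule CustomCheckSketch do
  # Hypothetical custom check, not part of ExMCP. Returns a one-arity function
  # matching the (pid() -> {:ok, map()} | {:error, any()}) contract.
  def mailbox_check_fn(max_queue_len \\ 1_000) do
    fn pid when is_pid(pid) ->
      # Process.info/2 returns nil when the (local) process is not alive.
      case Process.info(pid, :message_queue_len) do
        {:message_queue_len, len} when len <= max_queue_len ->
          {:ok, %{message_queue_len: len}}

        {:message_queue_len, len} ->
          {:error, {:mailbox_overloaded, len}}

        nil ->
          {:error, :not_alive}
      end
    end
  end
end
```

Such a function could be supplied as check_fn: CustomCheckSketch.mailbox_check_fn(500) when starting the checker. Process.info/2 only works for processes on the local node, so this particular check does not suit remote targets.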

pause(checker)

@spec pause(GenServer.server()) :: :ok

Pauses health checks.

resume(checker)

@spec resume(GenServer.server()) :: :ok

Resumes health checks.

start_link(opts)

@spec start_link(keyword()) :: GenServer.on_start()

Starts a health check process.

Options

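  • :name - Name under which the health check process is registered (required)
  • :target - PID or registered name of the process to check (required)
  • :check_fn - Custom check function returning {:ok, map()} or {:error, any()} (optional, defaults to MCP ping)
  • :check_interval - Milliseconds between checks (default: 60000)
  • :timeout - Timeout for a single check in milliseconds (default: 5000)
  • :failure_threshold - Consecutive failures before the status becomes unhealthy (default: 3)
  • :recovery_threshold - Consecutive successes before the status becomes healthy again (default: 2)
  • :on_status_change - Callback invoked on status changes; per t(), a function taking two status() values (optional)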
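To make the interplay of :failure_threshold and :recovery_threshold concrete, here is a minimal model of how consecutive counters could drive status transitions. It is a sketch under stated assumptions, not the library's actual implementation: ThresholdSketch and record/2 are hypothetical, each check outcome is reduced to :ok or :error, and the :degraded status is ignored.

```elixir
defmodule ThresholdSketch do
  # Hypothetical model of the threshold counters; not the library's code.
  defstruct status: :unknown,
            consecutive_failures: 0,
            consecutive_successes: 0,
            failure_threshold: 3,
            recovery_threshold: 2

  # A success resets the failure streak; the status flips to :healthy only
  # once recovery_threshold successes have occurred in a row.
  def record(%__MODULE__{} = s, :ok) do
    successes = s.consecutive_successes + 1
    status = if successes >= s.recovery_threshold, do: :healthy, else: s.status
    %{s | status: status, consecutive_successes: successes, consecutive_failures: 0}
  end

  # A failure resets the success streak; the status flips to :unhealthy only
  # once failure_threshold failures have occurred in a row.
  def record(%__MODULE__{} = s, :error) do
    failures = s.consecutive_failures + 1
    status = if failures >= s.failure_threshold, do: :unhealthy, else: s.status
    %{s | status: status, consecutive_failures: failures, consecutive_successes: 0}
  end
end
```

With the default thresholds in this model, two failures after a healthy streak leave the status unchanged, a third marks the target unhealthy, and two consecutive successes are then needed to recover. Requiring streaks in both directions keeps a single slow or lucky check from flipping the status.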

subscribe(checker)

@spec subscribe(GenServer.server()) :: :ok

Subscribes to health status changes.

Subscribers receive messages of the form {:health_status_changed, old_status, new_status, metadata}
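A subscribed process can handle these messages with an ordinary receive. The sketch below simulates the notification by sending the documented tuple to self(), so it runs without a checker. SubscriberSketch, await_change/1 and the metadata contents are hypothetical.

```elixir
defmodule SubscriberSketch do
  # Hypothetical helper, not part of ExMCP: waits for one status-change
  # message in the documented format, or gives up after timeout_ms.
  def await_change(timeout_ms \\ 1_000) do
    receive do
      {:health_status_changed, old_status, new_status, metadata} ->
        {:changed, old_status, new_status, metadata}
    after
      timeout_ms -> :no_change
    end
  end
end

# Simulated notification; a real one would arrive after HealthCheck.subscribe(checker).
send(self(), {:health_status_changed, :healthy, :unhealthy, %{reason: :timeout}})
result = SubscriberSketch.await_change()
```

In a GenServer subscriber, the same tuple would be matched in a handle_info/2 clause instead of a blocking receive.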

unsubscribe(checker)

@spec unsubscribe(GenServer.server()) :: :ok

Unsubscribes from health status changes.