Mimir.Health (Mimir v0.1.0)

Failure-streak table for router lanes.

This is NOT a circuit breaker. There is no half-open state and no automatic probe logic. Any single success resets the streak to zero. The :degraded state is purely informational: the Oracle uses it to exclude the lane from routing candidates during a snapshot window. Recovery happens the moment the next successful completion arrives.
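
The streak rules above can be modelled as a small pure function. This is only an illustrative sketch: `StreakSketch` is a hypothetical module, not part of Mimir, and it hard-codes the default threshold of 3 rather than reading it from application config.

```elixir
defmodule StreakSketch do
  # Hypothetical model of the streak rules described above; not Mimir's code.
  @threshold 3

  # A failure extends the streak by one.
  def step(streak, :failure), do: streak + 1
  # Any single success clears the streak, however long it was.
  def step(_streak, :success), do: 0

  # A lane is degraded only while its streak is at or above the threshold.
  def state(streak) when streak >= @threshold, do: :degraded
  def state(_streak), do: :ok
end

outcomes = [:failure, :failure, :failure, :success]

# Running streak after each outcome, starting from 0.
streaks =
  Enum.scan(outcomes, 0, fn outcome, streak ->
    StreakSketch.step(streak, outcome)
  end)

# streaks == [1, 2, 3, 0]
states = Enum.map(streaks, &StreakSketch.state/1)
# states == [:ok, :ok, :degraded, :ok]
```

Note that the last step goes straight from :degraded back to :ok; there is no intermediate probing state.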

State is stored in an ETS table owned by this GenServer. All reads and writes go directly to ETS, with no GenServer call overhead. The GenServer exists only for lifecycle: it creates the table on start and owns it for as long as the process runs.
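
A minimal sketch of this pattern, for readers unfamiliar with it. Everything here is an assumption made for illustration: the module name `TableOwnerSketch`, the table name, the table options, and the `{lane, streak}` row shape are not taken from Mimir's source.

```elixir
defmodule TableOwnerSketch do
  use GenServer

  # Hypothetical table name; Mimir's actual table name is not documented here.
  @table :table_owner_sketch

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Runs in the caller's process and writes straight to ETS.
  def record_failure(lane) do
    # {2, 1} adds 1 to tuple position 2; {lane, 0} is used if no row exists yet.
    :ets.update_counter(@table, lane, {2, 1}, {lane, 0})
    :ok
  end

  def record_success(lane) do
    :ets.insert(@table, {lane, 0})
    :ok
  end

  # Also runs in the caller's process; lanes with no row count as streak 0.
  def streak(lane) do
    case :ets.lookup(@table, lane) do
      [{^lane, n}] -> n
      [] -> 0
    end
  end

  @impl true
  def init(_opts) do
    # :public lets any process write; :named_table lets callers use the atom.
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    {:ok, nil}
  end
end

{:ok, _pid} = TableOwnerSketch.start_link()
TableOwnerSketch.record_failure("anthropic")
TableOwnerSketch.record_failure("anthropic")
after_failures = TableOwnerSketch.streak("anthropic")
TableOwnerSketch.record_success("anthropic")
after_success = TableOwnerSketch.streak("anthropic")
```

The GenServer in this sketch handles no messages at all. Its only job is to be the process that owns the table, since an ETS table is deleted when its owner exits.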

Lane naming

A "lane" is the provider prefix of a resolved model string — e.g. the lane for "anthropic:claude-sonnet-4-6" is "anthropic". The telemetry handler derives this by splitting on the first ":". This matches the convention used in the router catalog (lane: "anthropic") and the Oracle's snap.health lookup key.
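
Splitting on the first ":" might look like the following. `LaneSketch` is a hypothetical helper, not Mimir's handler code, and its behaviour for a model string with no ":" (it returns the whole string) is a property of this sketch rather than something the documentation specifies.

```elixir
defmodule LaneSketch do
  # Hypothetical helper; the real telemetry handler is not shown in these docs.
  def lane(model) when is_binary(model) do
    # parts: 2 stops after the first ":", so later colons stay in the remainder.
    [prefix | _rest] = String.split(model, ":", parts: 2)
    prefix
  end
end

lane = LaneSketch.lane("anthropic:claude-sonnet-4-6")
# lane == "anthropic"
```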

Threshold

A lane is :degraded when its streak >= Application.get_env(:mimir, :health_threshold, 3). The threshold is read at call time, so tests can override it without restarting the GenServer.
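
The effect of reading the threshold at call time can be shown with a small stand-in. `ThresholdSketch` is hypothetical and takes the streak as an argument instead of looking it up in ETS; only the `Application.get_env/3` call mirrors the expression above.

```elixir
defmodule ThresholdSketch do
  # Hypothetical stand-in for the state check; the streak is passed in directly.
  def state(streak) do
    # Looked up on every call, so a later put_env takes effect immediately.
    threshold = Application.get_env(:mimir, :health_threshold, 3)
    if streak >= threshold, do: :degraded, else: :ok
  end
end

# With no override the default of 3 applies, so a streak of 2 is still :ok.
with_default = ThresholdSketch.state(2)

# A test can lower the threshold without restarting anything.
Application.put_env(:mimir, :health_threshold, 2)
with_override = ThresholdSketch.state(2)

# Restore the default so later code is unaffected.
Application.delete_env(:mimir, :health_threshold)
```

In an ExUnit suite, the `put_env` call would typically go in `setup` with the matching `delete_env` in an `on_exit` callback.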

Completion event

attach/0 binds the handler to Application.get_env(:mimir, :completion_event, [:mimir, :completion]). An embedder that emits its own app-namespaced completion event can point Health at it by setting :mimir, :completion_event in config.
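
As a configuration sketch, an embedder could point Health at its own event like this. The event name `[:my_app, :llm, :completion]` is a made-up example; substitute whatever event the embedding application actually emits.

```elixir
# config/config.exs
import Config

# Hypothetical app-namespaced event; the default is [:mimir, :completion].
config :mimir, completion_event: [:my_app, :llm, :completion]
```

Because attach/0 reads this setting when it binds the handler, the value presumably needs to be in place before attach/0 is called.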

Functions

all()

@spec all() :: %{required(String.t()) => :ok | :degraded}

Returns a lane → state map for every lane that has been recorded. Used by Snapshot.assemble/1 to populate health.

attach()

@spec attach() :: :ok | {:error, term()}

Attach the telemetry handler for the configured completion event (:mimir, :completion_event).

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

detach()

@spec detach() :: :ok | {:error, :not_found}

Detach the telemetry handler attached by attach/0.

record_failure(lane)

@spec record_failure(String.t()) :: :ok

Increment the failure streak for lane by 1.

record_success(lane)

@spec record_success(String.t()) :: :ok

Reset the failure streak for lane to 0.

reset()

@spec reset() :: :ok

Delete all rows from the health table. Intended for test isolation only.

start_link(opts \\ [])

@spec start_link(keyword()) :: GenServer.on_start()

Start the health table owner. opts are unused; accepted for supervision-tree conformance.

state(lane)

@spec state(String.t()) :: :ok | :degraded

Returns :ok or :degraded for lane. Unknown lanes are :ok.
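
The functions above can be exercised together in a test. This sketch assumes that Mimir.Health has already been started (so the table exists) and that :health_threshold is left at its default of 3. The test module name and the "openai" lane are illustrative.

```elixir
defmodule HealthUsageTest do
  # async: false because the health table is shared across processes.
  use ExUnit.Case, async: false

  setup do
    # Clear rows left behind by other tests.
    Mimir.Health.reset()
    :ok
  end

  test "three failures degrade a lane and one success recovers it" do
    # Unknown lanes are :ok.
    assert Mimir.Health.state("openai") == :ok

    for _ <- 1..3, do: Mimir.Health.record_failure("openai")

    # The streak is now 3, which meets the default threshold.
    assert Mimir.Health.state("openai") == :degraded
    assert Mimir.Health.all() == %{"openai" => :degraded}

    # A single success resets the streak to zero.
    Mimir.Health.record_success("openai")
    assert Mimir.Health.state("openai") == :ok
  end
end
```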