Failure-streak table for router lanes.
This is NOT a circuit breaker. There is no half-open state and no
automatic probe logic. Any single success resets the streak to zero. The
:degraded state is purely informational: the Oracle uses it to exclude
the lane from routing candidates during a snapshot window. Recovery happens
the moment the next successful completion arrives.
State is stored in an ETS table owned by this GenServer. All reads and writes
go directly to ETS, with no GenServer call overhead. The GenServer exists only
for lifecycle: it creates the table on start and, as the owning process, keeps
the table alive for as long as it runs under its supervisor.
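The streak semantics and the ETS ownership pattern described above can be sketched as follows. This is a minimal illustration and not this module's actual source: the module name `HealthSketch`, the table name `:health_sketch`, and the `{lane, streak}` row layout are all assumptions made for the example.

```elixir
defmodule HealthSketch do
  # Hypothetical stand-in for the documented module; names are illustrative.
  use GenServer

  @table :health_sketch

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    # Public named table: callers read and write without messaging the owner.
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    {:ok, nil}
  end

  def record_failure(lane) do
    # Atomic increment of the streak; a missing row starts from {lane, 0}.
    :ets.update_counter(@table, lane, {2, 1}, {lane, 0})
    :ok
  end

  def record_success(lane) do
    # Any single success resets the streak to zero.
    :ets.insert(@table, {lane, 0})
    :ok
  end

  def state(lane) do
    # Threshold is read on every call, so config changes apply immediately.
    threshold = Application.get_env(:mimir, :health_threshold, 3)

    case :ets.lookup(@table, lane) do
      [{^lane, streak}] when streak >= threshold -> :degraded
      _ -> :ok
    end
  end
end

{:ok, _pid} = HealthSketch.start_link()
for _ <- 1..3, do: HealthSketch.record_failure("anthropic")
HealthSketch.state("anthropic")          # :degraded with the default threshold of 3
HealthSketch.record_success("anthropic")
HealthSketch.state("anthropic")          # :ok again after one success
```

Because there is no half-open state, nothing in the sketch probes a degraded lane; the lane only recovers when a caller records a success.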
Lane naming
A "lane" is the provider prefix of a resolved model string — e.g. the lane
for "anthropic:claude-sonnet-4-6" is "anthropic". The telemetry handler
derives this by splitting on the first ":". This matches the convention
used in the router catalog (lane: "anthropic") and the Oracle's
snap.health lookup key.
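As a small illustration of the split-on-first-colon rule, the derivation might look like the sketch below. `LaneSketch.lane_of/1` is a hypothetical helper introduced for this example; the real telemetry handler may be structured differently.

```elixir
defmodule LaneSketch do
  # Hypothetical helper, not part of the documented API.
  @spec lane_of(String.t()) :: String.t()
  def lane_of(model) when is_binary(model) do
    # parts: 2 splits on the first ":" only; later colons stay in the model name.
    model |> String.split(":", parts: 2) |> hd()
  end
end

LaneSketch.lane_of("anthropic:claude-sonnet-4-6")
# => "anthropic"
```

In this sketch a string with no ":" is returned unchanged, which is an assumption about how such input would be treated.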
Threshold
A lane is :degraded when its streak >= Application.get_env(:mimir, :health_threshold, 3).
The threshold is read at call time so it can be overridden in tests without
restarting the GenServer.
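Reading the threshold at call time means an override takes effect on the very next check. The sketch below shows the idea with a hypothetical `ThresholdSketch.degraded?/1` helper that mirrors the rule above; it is not the module's real code.

```elixir
defmodule ThresholdSketch do
  # Hypothetical helper mirroring the documented rule.
  def degraded?(streak) do
    # Looked up on every call rather than cached at start-up.
    streak >= Application.get_env(:mimir, :health_threshold, 3)
  end
end

ThresholdSketch.degraded?(2)                        # false with the default of 3
Application.put_env(:mimir, :health_threshold, 2)   # e.g. in a test's setup
ThresholdSketch.degraded?(2)                        # true, no restart needed
Application.delete_env(:mimir, :health_threshold)   # restore the default
```

A test that overrides the value this way should restore it afterwards, since application env is global to the VM.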
Completion event
attach/0 binds the handler to the event name returned by
Application.get_env(:mimir, :completion_event, [:mimir, :completion]).
An embedder that emits its own app-namespaced completion event can point
Health at it by setting :mimir, :completion_event in config.
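For an embedder, the override might look like the following config fragment. The event name `[:my_app, :llm, :completion]` is a hypothetical example; only the `:mimir, :completion_event` key comes from the text above.

```elixir
# config/config.exs
import Config

# Hypothetical app-namespaced event; replace with the event your app emits.
config :mimir, completion_event: [:my_app, :llm, :completion]
```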
Summary
Functions
all()
Returns a lane → state map for every lane that has been recorded.
Used by Snapshot.assemble/1 to populate health.
attach()
Attach the telemetry handler for the configured completion event (:mimir, :completion_event).
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
detach()
Detach the telemetry handler attached by attach/0.
record_failure(lane)
Increment the failure streak for lane by 1.
record_success(lane)
Reset the failure streak for lane to 0.
reset()
Delete all rows from the health table. Intended for test isolation only.
start_link(opts)
Start the health table owner. opts are unused; accepted for supervision-tree conformance.
state(lane)
Returns :ok or :degraded for lane. Unknown lanes are :ok.
Functions
@spec all() :: %{required(String.t()) => :ok | :degraded}
Returns a lane → state map for every lane that has been recorded.
Used by Snapshot.assemble/1 to populate health.
@spec attach() :: :ok | {:error, term()}
Attach the telemetry handler for the configured completion event (:mimir, :completion_event).
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
See Supervisor.
@spec detach() :: :ok | {:error, :not_found}
Detach the telemetry handler attached by attach/0.
@spec record_failure(String.t()) :: :ok
Increment the failure streak for lane by 1.
@spec record_success(String.t()) :: :ok
Reset the failure streak for lane to 0.
@spec reset() :: :ok
Delete all rows from the health table. Intended for test isolation only.
@spec start_link(keyword()) :: GenServer.on_start()
Start the health table owner. opts are unused; accepted for supervision-tree conformance.
@spec state(String.t()) :: :ok | :degraded
Returns :ok or :degraded for lane. Unknown lanes are :ok.