Hemdal.Check (Hemdal v1.1.0)
Every check performed by Hemdal is driven by a state machine. The state machine runs a command, inspects the return value and, depending on that value and on whether the command executed successfully, determines its next state.
The state machine has the following states:
disabled: no checks are performed; the machine waits until it is activated.
normal: the command runs correctly and always returns a success state. The state timeout is configured from check_in_sec in Hemdal.Config.Alert.
failing: when a failed response is received in the normal state, the machine moves to the failing state. The state timeout is configured from recheck_in_sec and, if the check does not recover after the number of retries, the machine moves to broken (see Hemdal.Config.Alert).
broken: the command has not run correctly for some time. The subject under check is considered broken and is rechecked every broken_recheck_in_sec seconds. Only when it recovers does the machine go back to the normal state.
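The transitions described above can be sketched as a small pure module. This is only an illustration under stated assumptions, not Hemdal's implementation: the module name CheckStateSketch, the functions new/1, handle_result/2 and timeout_setting/1, the :ok and :error result atoms, and the exact way retries are counted are all hypothetical.

```elixir
defmodule CheckStateSketch do
  # Hypothetical pure model of the transitions described above.
  # It is not Hemdal's implementation; names and retry counting are assumptions.
  defstruct state: :normal, retries: 0, max_retries: 3

  def new(opts \\ []), do: struct!(__MODULE__, opts)

  # A disabled machine ignores results until it is activated.
  def handle_result(%__MODULE__{state: :disabled} = machine, _result), do: machine

  # Any success brings the machine (back) to :normal and resets the counter.
  def handle_result(%__MODULE__{} = machine, :ok),
    do: %{machine | state: :normal, retries: 0}

  # The first failure while :normal moves the machine to :failing.
  def handle_result(%__MODULE__{state: :normal} = machine, :error),
    do: %{machine | state: :failing, retries: 0}

  # Failed rechecks are counted; once max_retries is reached the subject is :broken.
  def handle_result(%__MODULE__{state: :failing, retries: r, max_retries: limit} = machine, :error)
      when r + 1 >= limit,
      do: %{machine | state: :broken, retries: r + 1}

  def handle_result(%__MODULE__{state: :failing, retries: r} = machine, :error),
    do: %{machine | retries: r + 1}

  # A broken machine stays broken until a check succeeds.
  def handle_result(%__MODULE__{state: :broken} = machine, :error), do: machine

  # Name of the alert setting that drives the next check in each state.
  def timeout_setting(:normal), do: :check_in_sec
  def timeout_setting(:failing), do: :recheck_in_sec
  def timeout_setting(:broken), do: :broken_recheck_in_sec
  def timeout_setting(:disabled), do: nil
end

# Feed a sequence of check results through the model and collect the states.
states =
  [:error, :error, :error, :ok]
  |> Enum.scan(CheckStateSketch.new(max_retries: 2), &CheckStateSketch.handle_result(&2, &1))
  |> Enum.map(& &1.state)

IO.inspect(states)
```

With max_retries set to 2, the sequence above goes through failing twice, then broken, and returns to normal on the first success.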
Types
@type alert_id() :: String.t()
The alert ID used to identify the state machine that runs the checks for the alert.
@type returned_status() :: map()
The status returned by the process is a map with string keys; the type of each value depends on the key. The keys are:
status: an atom, one of :ok, :disabled, :warn or :error.
alert: a map with information about the alert itself, such as id, name, host and command.
last_update: a naive datetime generated at the moment.
result: a map with information about the executed command.
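As an illustration of the shape described above, the following sketch builds a sample map by hand and pattern matches on its string keys. The module ReturnedStatusSketch and its functions are hypothetical, the sample values are invented, and the string keys used inside the alert map are an assumption, since the description only states the key type for the outer map.

```elixir
defmodule ReturnedStatusSketch do
  # Hypothetical helpers, not part of Hemdal.

  # True when the status is one that usually needs attention.
  def needs_attention?(%{"status" => status}) when status in [:warn, :error], do: true
  def needs_attention?(%{"status" => status}) when status in [:ok, :disabled], do: false

  # Short human-readable line built from the status and the update time.
  def describe(%{"status" => status, "last_update" => %NaiveDateTime{} = at}) do
    "#{status} at #{NaiveDateTime.to_iso8601(at)}"
  end
end

# Invented sample data following the documented keys.
sample = %{
  "status" => :warn,
  # The key type inside "alert" is an assumption.
  "alert" => %{"id" => "example-id", "name" => "example alert"},
  "last_update" => ~N[2024-01-01 12:00:00],
  "result" => %{}
}

IO.puts(ReturnedStatusSketch.describe(sample))
IO.inspect(ReturnedStatusSketch.needs_attention?(sample))
```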
@type status() :: :ok | :warn | :error | :disabled
The status available inside the events. It is valid for both the current and the previous state.
@type t() :: %Hemdal.Check{
  alert: Hemdal.Config.Alert.t() | nil,
  fail_started: NaiveDateTime.t() | nil,
  last_update: NaiveDateTime.t(),
  retries: non_neg_integer(),
  status: returned_status() | nil
}
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
Check if the alert is running.
@spec get_all() :: [returned_status()]
Get all of the running alerts. It requests the list of all alerts from the supervisor and gathers the status of each one using the get_status/1 function.
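A list of returned_status() maps like the one this function produces can be summarised per status. The sketch below is a hypothetical helper (StatusCountSketch is not part of Hemdal) and runs on hand-written sample maps so that it stays self-contained; against a running system the input would presumably come from Hemdal.Check.get_all/0.

```elixir
defmodule StatusCountSketch do
  # Hypothetical helper: counts how many alerts are in each status.
  # Expects maps that carry the documented "status" string key.
  def count_by_status(statuses) do
    Enum.frequencies_by(statuses, &Map.fetch!(&1, "status"))
  end
end

# Invented sample input; with Hemdal running this could be the result of
# Hemdal.Check.get_all() instead.
samples = [
  %{"status" => :ok},
  %{"status" => :ok},
  %{"status" => :error},
  %{"status" => :disabled}
]

IO.inspect(StatusCountSketch.count_by_status(samples))
```

Map.fetch!/2 is used on purpose so that a map without the "status" key raises instead of being counted silently.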
Returns the PID of the alert process if it's running.
@spec get_status(pid() | alert_id()) :: [returned_status()]
Get the status of an alert. It requests the status directly from the process.
@spec reload_all() :: :ok
Reload all of the alerts based on the configuration backend. See Hemdal.Config for further information. If an alert isn't running, it is started.
@spec start_all() :: :ok
Ensure all of the alerts are started.
@spec update_alert(Hemdal.Config.Alert.t()) :: {:ok, pid()}
Update the alert by passing the new configuration to the process. This is useful for changing the command, the host or any other setting of the alert/check.