Hemdal.Check (Hemdal v1.2.0)

Every check performed by Hemdal is driven by a state machine. The machine runs a command, inspects the result, and determines its next state from that result and from whether the command executed successfully.

The state machine has the following states:

  • disabled: no checks are performed; the machine waits until it is activated.
  • normal: the command is running correctly and every run reports success. The state timeout is set from check_in_sec in Hemdal.Config.Alert.
  • failing: entered from the normal state when a failed response is received. The state timeout is set from recheck_in_sec; if the check does not recover after a number of retries, the machine moves to broken (see Hemdal.Config.Alert).
  • broken: the check has been failing for some time. The subject under check is considered broken and is rechecked every broken_recheck_in_sec seconds. The machine returns to the normal state only once the check recovers.
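
The transitions listed above can be sketched as a pure function. This is a simplified, hypothetical model and not Hemdal's implementation: the module name CheckStateSketch, the max_retries argument, and the rule that a single success in failing returns the machine to normal are all assumptions. Only the state names and the three timeout field names come from the description above.

```elixir
defmodule CheckStateSketch do
  # Hypothetical model of the documented states; not part of Hemdal.

  # A disabled machine ignores results until it is activated.
  def next(:disabled, _result, _retries, _max_retries), do: {:disabled, 0}

  # Assumption: any success returns the machine to normal and resets retries.
  def next(_state, :ok, _retries, _max_retries), do: {:normal, 0}

  # The first failure seen in normal moves the machine to failing.
  def next(:normal, :error, _retries, _max_retries), do: {:failing, 0}

  # Further failures: move to broken once the retries are exhausted.
  def next(:failing, :error, retries, max_retries) when retries + 1 >= max_retries,
    do: {:broken, 0}

  def next(:failing, :error, retries, _max_retries), do: {:failing, retries + 1}

  # A broken check stays broken until it succeeds.
  def next(:broken, :error, _retries, _max_retries), do: {:broken, 0}

  # State timeout in milliseconds; `alert` is a plain map standing in for
  # Hemdal.Config.Alert, and the fields are assumed to hold seconds.
  def timeout_ms(:normal, alert), do: alert.check_in_sec * 1000
  def timeout_ms(:failing, alert), do: alert.recheck_in_sec * 1000
  def timeout_ms(:broken, alert), do: alert.broken_recheck_in_sec * 1000
end

# Three consecutive failures after the first one exhaust max_retries = 3.
{state, retries} = CheckStateSketch.next(:normal, :error, 0, 3)
{state, retries} = CheckStateSketch.next(state, :error, retries, 3)
{state, retries} = CheckStateSketch.next(state, :error, retries, 3)
{state, _retries} = CheckStateSketch.next(state, :error, retries, 3)
IO.inspect(state)
```

Keeping the transition logic free of side effects makes it easy to test each rule on its own; the real process would additionally arm the state timeout after every transition.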

Types

alert_id()

@type alert_id() :: String.t()

The alert ID used to identify the state machine running the checks for the alert.

returned_status()

@type returned_status() :: map()

The status returned by the process is a map with string keys; the type of each value depends on the key. The keys are:

  • status is an atom: one of :ok, :disabled, :warn, or :error.
  • alert is a map with information about the alert itself, such as id, name, host, and command.
  • last_update is a naive datetime generated at the moment of the update.
  • result is a map with information about the executed command.
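
As an illustration of the shape described above, the following sketch builds such a map by hand and reads it with pattern matching. Only the four top-level string keys come from this page. The values, the StatusSketch module, and the assumption that the nested alert map also uses string keys are hypothetical.

```elixir
defmodule StatusSketch do
  # Hypothetical helper; not part of Hemdal.

  # Build a one-line summary from the documented top-level keys.
  def describe(%{"status" => status, "alert" => alert, "last_update" => at}) do
    # Assumption: the nested alert map uses string keys too.
    "#{alert["name"]}: #{status} at #{NaiveDateTime.to_iso8601(at)}"
  end
end

# Placeholder values chosen for the example only.
example = %{
  "status" => :warn,
  "alert" => %{
    "id" => "example-id",
    "name" => "ping",
    "host" => "localhost",
    "command" => "placeholder command"
  },
  "last_update" => ~N[2024-01-01 12:00:00],
  "result" => %{}
}

IO.puts(StatusSketch.describe(example))
```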

status()

@type status() :: :ok | :warn | :error | :disabled

The status reported in events. It applies to both the current and the previous state.

t()

@type t() :: %Hemdal.Check{
  alert: Hemdal.Config.Alert.t() | nil,
  fail_started: NaiveDateTime.t() | nil,
  last_update: NaiveDateTime.t(),
  retries: non_neg_integer(),
  status: returned_status() | nil
}

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

exists?(alert_id)

@spec exists?(alert_id()) :: boolean()

Check if the alert is running.

get_all()

@spec get_all() :: [returned_status()]

Get all running alerts. It asks the supervisor for the list of all alerts and gathers the status of each one using get_status/1.
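
A list shaped like the one get_all/0 is specified to return can be summarised with a small helper. The sketch below is hypothetical (AlertSummarySketch is not part of Hemdal) and runs on hand-written sample maps; it relies only on the documented "status" key.

```elixir
defmodule AlertSummarySketch do
  # Hypothetical helper: count how many alerts are in each status.
  def count_by_status(statuses) do
    Enum.frequencies_by(statuses, &Map.get(&1, "status"))
  end
end

# Sample data standing in for the result of Hemdal.Check.get_all/0.
sample = [
  %{"status" => :ok},
  %{"status" => :error},
  %{"status" => :ok}
]

IO.inspect(AlertSummarySketch.count_by_status(sample))
```

With the library loaded, the same helper could presumably be fed from Hemdal.Check.get_all() directly.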

get_pid(alert_id)

@spec get_pid(alert_id()) :: pid() | nil

Returns the PID of the alert process if it's running.

get_status(pid)

@spec get_status(pid() | alert_id()) :: [returned_status()]

Get the status of an alert by requesting it directly from the process.
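
One way to combine exists?/1 with get_status/1 is to guard the status request so that it is only sent when the alert is running. The following is a minimal sketch under stated assumptions: AlertLookupSketch and StubCheck are hypothetical names, and the module under test is passed as an argument so the example runs without Hemdal installed. What get_status/1 returns for an alert that is not running is not documented here, which is why the sketch checks first.

```elixir
defmodule AlertLookupSketch do
  # Hypothetical wrapper; `check_mod` defaults to Hemdal.Check but can be
  # replaced by any module exposing exists?/1 and get_status/1.
  def status_if_running(alert_id, check_mod \\ Hemdal.Check) do
    if check_mod.exists?(alert_id) do
      {:ok, check_mod.get_status(alert_id)}
    else
      {:error, :not_running}
    end
  end
end

defmodule StubCheck do
  # Stand-in for Hemdal.Check, used only to exercise the sketch.
  def exists?(alert_id), do: alert_id == "running-alert"
  def get_status(_alert_id), do: %{"status" => :ok}
end

IO.inspect(AlertLookupSketch.status_if_running("running-alert", StubCheck))
IO.inspect(AlertLookupSketch.status_if_running("unknown-alert", StubCheck))
```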

reload_all()

@spec reload_all() :: :ok

Reload all of the alerts from the configuration backend. See Hemdal.Config for further information. Any alert that isn't running is started.

start_all()

@spec start_all() :: :ok

Ensure all of the alerts are started.

update_alert(alert)

@spec update_alert(Hemdal.Config.Alert.t()) :: {:ok, pid()}

Update the alert by passing the new configuration to the process. This is useful for changing the command, the host, or anything else inside the alert/check.