Observability and monitors


HostKit lets resources carry observability intent alongside their runtime declarations.

use HostKit.DSL, providers: [HostKit.Providers.Caddy]

project :prod do
  observability do
    logs driver: :journald,
         retention: "14d",
         ship: true,
         attributes: [deployment_environment: :prod]

    telemetry service: "hostkit-prod",
              endpoint: "otel.example.com:4317"
  end

  service :api do
    daemon do
      description "API"
      exec ["/opt/api/bin/server"]
      listen :http, port: 4000
      logs stdout: :journal, stderr: :journal, identifier: "api"
      monitor :systemd, name: :api_unit, expect: [state: :active], severity: :critical
    end

    caddy_site "api.example.com" do
      reverse_proxy :http
      logs driver: :access, attributes: [service_name: :api]
      monitor :http, name: :api_http, url: "https://api.example.com", expect: [status: 200]
    end
  end
end

The declarations can be used to:

Monitors can also attach to the most recently declared resource:

service :data do
  directory "/srv/data", mode: :private_dir
  monitor :filesystem, name: :data_dir, expect: [exists: true]
end

Operational validation commands can be modeled as monitor metadata, without turning them into resources that are applied:

service :ops do
  file "/usr/local/sbin/dr-validate", content: "..."

  monitor :command,
    name: :dr_validate,
    exec: argv("/usr/local/sbin/dr-validate"),
    expect: [exit: 0],
    severity: :critical
end

Use the same exec: command shapes as command resources: argv(...), ~SH, {command, args}, or [command | args]. Use command/2 resources for operations that change host state and need plan/down behavior; use monitor :command for observational checks.

This keeps checks stable even as resources are refactored.
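To make the observational side concrete: a command check amounts to running a command and comparing its exit status against the expectation, without planning or applying anything. The following is a minimal sketch in plain Elixir, not HostKit's implementation. The module CommandCheckSketch and its functions are hypothetical, and it handles only the two plain-data shapes, {command, args} and [command | args], on the assumption that both denote the same argv.

```elixir
defmodule CommandCheckSketch do
  @moduledoc "Hypothetical illustration; not part of HostKit."

  # Normalize the two plain-data command shapes into one argv list.
  def to_argv({command, args}) when is_binary(command) and is_list(args),
    do: [command | args]

  def to_argv([command | _rest] = argv) when is_binary(command), do: argv

  # Run the command and compare its exit status with expect[:exit].
  # Nothing is planned or applied; the outcome is only reported.
  def check(exec, expect) do
    [command | args] = to_argv(exec)
    {_output, status} = System.cmd(command, args, stderr_to_stdout: true)

    if status == Keyword.fetch!(expect, :exit) do
      :ok
    else
      {:failed, status}
    end
  end
end
```

On a host with sh available, CommandCheckSketch.check({"sh", ["-c", "exit 0"]}, exit: 0) returns :ok, while CommandCheckSketch.check(["sh", "-c", "exit 3"], exit: 0) returns {:failed, 3}.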

External monitoring config should be generated from the same intent when possible. HTTP monitors can carry provider-neutral fields such as group, interval, expect, and alerts:

monitor :http,
  name: "api",
  group: "prod",
  url: "https://api.example.com/health",
  interval: "1m",
  expect: [status: 200, response_time_lt: 5000],
  alerts: [:telegram]

Then a provider can render those specs. The Gatus provider maps endpoint specs to Gatus endpoints and conditions:

endpoints = HostKit.Providers.Gatus.endpoints_from_monitors(project)
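As a rough illustration of that mapping, the sketch below converts the provider-neutral fields of the HTTP monitor above into a map shaped like a Gatus endpoint entry. It is plain Elixir and does not reflect the actual HostKit.Providers.Gatus code: the module GatusMappingSketch and its function names are hypothetical, and the handling of the expect keys is an assumption. The condition strings use Gatus's [STATUS] and [RESPONSE_TIME] placeholders.

```elixir
defmodule GatusMappingSketch do
  @moduledoc "Hypothetical illustration; not HostKit.Providers.Gatus."

  # Build a Gatus-style endpoint map from a monitor keyword list.
  # Optional fields that are absent come through as nil or [].
  def endpoint(monitor) do
    %{
      name: Keyword.fetch!(monitor, :name),
      group: Keyword.get(monitor, :group),
      url: Keyword.fetch!(monitor, :url),
      interval: Keyword.get(monitor, :interval),
      conditions:
        monitor
        |> Keyword.get(:expect, [])
        |> Enum.map(&condition/1),
      alerts:
        monitor
        |> Keyword.get(:alerts, [])
        |> Enum.map(fn type -> %{type: to_string(type)} end)
    }
  end

  # Assumed translation of each expectation into a Gatus condition string.
  defp condition({:status, code}), do: "[STATUS] == #{code}"
  defp condition({:response_time_lt, ms}), do: "[RESPONSE_TIME] < #{ms}"
end
```

Given the monitor fields shown earlier, the conditions list would be ["[STATUS] == 200", "[RESPONSE_TIME] < 5000"] and the alerts list [%{type: "telegram"}]. Keeping the translation in one place like this is what lets the same declared intent feed more than one monitoring backend.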