# HostKit

Elixir-native host infrastructure declarations, planning, and runtime control.

HostKit is intended to be used from a normal Mix project with `.exs` infrastructure files. The DSL compiles to plain inspectable structs; Mix tasks are wrappers around the runtime API.

For naming, block shape, defaults, and reference style, see [DSL design guidelines](dsl-guidelines.md).

## Design

- Core owns systemd/systemdkit persistent units.
- Core owns unitctl transient runtime primitives.
- Integrations such as Caddy, Forgejo, object storage, and monitoring are providers.
- DSL evaluation never applies changes to a host.
- Planning and rendering are available as runtime APIs.

## Example

```elixir
use HostKit.DSL

project :toys do
  roots source: "/opt/toys/src",
        data: "/srv/toys",
        state: "/var/lib/toys",
        config: "/etc/toys"

  prefixes user: "toys-", unit: "toys-"

  host :elixir_toys, at: "elixir.toys" do
    ssh do
      user "dannote"
      sudo true
    end
  end

  service :exograph do
    account system: true
    storage :data, mode: 0o755
    storage :state, mode: 0o750

    daemon do
      description "Exograph search"
      after_target :network_online
      wants :network_online
      working_directory path(:source)
      exec ["/usr/local/bin/mix", "exograph.index.hex", "--web", "--port", "4200"]
      restart :on_failure
      restart_sec 10

      isolate do
        writable :data
        writable :state
        network :loopback
      end
    end
  end
end
```

## Plans and down plans

Rollback is represented as another HostKit plan. Each plan change already carries `before` and `after` state, so HostKit can derive a down plan from the exact plan that was applied. Down plans include coverage stats that count how many original changes were reversible, explicit no-ops, or skipped:

```elixir
{:ok, plan} = HostKit.plan(project, target: prod)
{:ok, down_plan} = HostKit.down(plan)

HostKit.format_plan(down_plan)
HostKit.apply(down_plan, confirm: true)
```
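The core idea of deriving a down plan from `before`/`after` state can be sketched with plain maps. This is not HostKit's implementation. The change shape, the `DownPlanSketch` module, and its `invert/1` and `down/1` functions are hypothetical, and the sketch ignores rollback policies such as keeping created directories. Each change is inverted by swapping its states, and the list is reversed so the last applied change is undone first.

```elixir
defmodule DownPlanSketch do
  # Hypothetical change shape: %{id: term, action: atom, before: map | nil, after: map | nil}.
  # This is not HostKit's internal representation.

  # Invert one change by swapping its before/after state.
  def invert(%{action: :create, after: state} = change),
    do: %{change | action: :delete, before: state, after: nil}

  def invert(%{action: :delete, before: state} = change),
    do: %{change | action: :create, before: nil, after: state}

  def invert(%{action: :update, before: old, after: new} = change),
    do: %{change | before: new, after: old}

  # Anything else (for example an explicit no-op) has nothing to undo.
  def invert(_change), do: :skip

  # Undo changes newest-first and report simple coverage counts.
  def down(changes) do
    inverted = changes |> Enum.reverse() |> Enum.map(&invert/1)
    kept = Enum.reject(inverted, &(&1 == :skip))
    {kept, %{reversible: length(kept), skipped: length(inverted) - length(kept)}}
  end
end
```

Reversing the list matters when changes depend on each other: a directory created before a file inside it must be removed after that file.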

Partial rollback uses the same plan model:

```elixir
{:ok, down_plan} =
  HostKit.down(plan, only: [{:file, "/etc/gatehouse/config.exs"}])
```

Command-like operations need semantic down steps because HostKit cannot infer the opposite of an arbitrary command:

```elixir
command :migrate,
  exec: {"bin/app", ["eval", "App.Release.migrate()"]},
  phase: :before_start,
  down: {"bin/app", ["eval", "App.Release.rollback()"]}

command :warm_cache,
  exec: {"bin/app", ["eval", "App.Cache.warm()"]},
  down: :noop
```

The down command is emitted as an ordinary command change in the down plan. `down: :irreversible` records an explicit warning and omits the command from the down plan.

Created resources use conservative rollback policies. File-like resources and symlinks can be deleted by a down plan, but directories are kept unless explicitly opted in:

```elixir
file "/etc/app/config", content: "..."
symlink "/opt/app/current", to: "/opt/app/releases/20260615"
directory "/tmp/demo", rollback: :delete_if_created
directory "/srv/app", rollback: :keep
account :app, system: true, rollback: :keep
package :caddy, rollback: :keep
```

Symlink ownership is unmanaged unless `owner:` or `group:` is explicitly set. This keeps release/current links reproducible across platforms where changing symlink inode ownership is unsupported or unreliable. When explicit symlink ownership is requested, apply verifies it and fails if the target cannot enforce it.

CLI usage mirrors this:

```sh
mix host_kit.plan infra/config.exs --host prod --out up.plan.json
mix host_kit.down up.plan.json --out down.plan.json
mix host_kit.apply --plan down.plan.json --confirm
```

## Run tracking

Tracked applies write minimal run records under the project-configured HostKit runs root:

```sh
mix host_kit.apply --track --plan up.plan.json --confirm
mix host_kit.runs --host prod infra/config.exs
mix host_kit.runs --host prod --verbose infra/config.exs
mix host_kit.runs --host prod --latest --verbose infra/config.exs
mix host_kit.down --host prod --run 20260614-101148-demo-up --out down.plan.json infra/config.exs
```

Run records are intentionally compact: they identify the run, project, direction, timestamp, and applied change statuses. They do not replace plan artifacts; use plan artifacts for inspectable up/down plan contents. When a tracked apply is started from `--plan`, HostKit copies that up-plan artifact under the runs root and records the copied path so `mix host_kit.down --last` can work from the tracked run.

Tracked applies also write backup payloads for previous file-like state when that state was captured in the plan. Backup payloads live under `hostkit_backups/<run-id>/`, or under the `--backups-root` override.

`mix host_kit.down --last` and `mix host_kit.down --run RUN_ID` rewrite supported previous file-like state into `%HostKit.BackupRef{}` entries, so generated down plans restore from backup payloads instead of embedding prior content. Backup-backed restore currently covers ordinary files plus rendered file resources (env files, Caddy sites, proxy config, firewall/egress files, and systemd unit files) when their previous rendered content was captured. Symlink rollback restores the previous link target directly in the plan.

Use `mix host_kit.runs --verbose`, `--latest`, or `--id RUN_ID` to inspect copied plan artifacts and backup payload paths.

Source updates are intentionally not inferred as reversible by default: a previous Git remote/ref may no longer be reachable. Treat source rollback as an explicit lifecycle operation or pair it with a backup/source-bundle strategy.

Run retention is explicit. Use `mix host_kit.runs --prune --keep N` to remove older run records plus their copied plan artifact and backup payload directories.

## Elixir app lifecycle helpers

The Elixir app recipe can emit lifecycle commands for common BEAM deployment operations. Ecto migrations are represented as normal commands with explicit down commands:

```elixir
elixir_app :shop do
  source github: "acme/shop", path: ".", ref: "main"
  phoenix host: "shop.example.com", secret_key_base: secret_env("SECRET_KEY_BASE")

  ecto release: "Shop.Release"
end
```

This emits a `:before_start` migration command that runs through the built release and a matching down command that calls `Shop.Release.rollback()`.

For multiple repos, HostKit emits one ordered command per repo. Down plans reverse that order:

```elixir
elixir_app :shop do
  source github: "acme/shop", path: ".", ref: "main"
  phoenix host: "shop.example.com", secret_key_base: secret_env("SECRET_KEY_BASE")

  ecto release: "Shop.Release" do
    repo "Shop.Repo"
    repo "Shop.AnalyticsRepo"
  end
end
```

The default per-repo expressions, shown here for `Shop.Repo`, are:

```elixir
Shop.Release.migrate(Shop.Repo)
Shop.Release.rollback(Shop.Repo)
```

Use `:migrate` and `:rollback` for custom release functions when the defaults do not fit.
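As a rough model of the ordering rule, the sketch below builds one migrate expression per repo in declaration order and the rollback expressions in reverse order. The `MigrationSketch` module and `commands/2` function are hypothetical; only the expression strings mirror the defaults shown above.

```elixir
defmodule MigrationSketch do
  # Hypothetical: one migrate expression per repo, in declaration order,
  # and one rollback expression per repo, in reverse order.
  def commands(release, repos) do
    up = Enum.map(repos, fn repo -> "#{release}.migrate(#{repo})" end)
    down = repos |> Enum.reverse() |> Enum.map(fn repo -> "#{release}.rollback(#{repo})" end)
    {up, down}
  end
end

{up, down} = MigrationSketch.commands("Shop.Release", ["Shop.Repo", "Shop.AnalyticsRepo"])
```

Reversing the down list means `Shop.AnalyticsRepo` is rolled back before `Shop.Repo`, undoing migrations in the opposite order from which they ran.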

## OTP release artifacts

The OTP release recipe consumes a BEAM-native release artifact manifest written as ETF and expands it into ordinary HostKit resources. The app repository remains responsible for building the Mix release tarball; HostKit remains responsible for accounts, directories, env files, systemd, readiness, planning, apply, and down plans.

[ReleaseKit](https://hex.pm/packages/release_kit) is the reference producer for this manifest format. Applications should configure ReleaseKit artifact defaults and prebuild steps in application config, then run `mix release_kit.artifact` directly. For example, frontend assets belong in ReleaseKit prebuild steps such as `ReleaseKit.Step.Volt`, not in HostKit or app-specific artifact wrapper tasks.

Import the recipe explicitly:

```elixir
use HostKit.DSL, recipes: [HostKit.Recipes.OTPRelease]
```

Then reference the manifest:

```elixir
project :example do
  otp_release :demo_app,
    manifest: "_build/prod/demo_app.etf",
    port: 4000,
    base_dir: "/opt/example/demo_app",
    config_dir: "/etc/example/demo_app"
end
```

Use the `:account_home` option when an existing service account should keep a home directory outside the release base. Use the `:env` option to add deployment-specific clear environment variables to the generated service env file without rebuilding the artifact manifest:

```elixir
otp_release :demo_app,
  manifest: "_build/prod/demo_app.etf",
  account_home: "/var/lib/demo_app/home",
  env: %{"APP_DATA_DIR" => "/srv/demo"}
```

The manifest is decoded with:

```elixir
:erlang.binary_to_term(binary, [:safe])
```
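The `:safe` option matters because the manifest comes from outside the deploying process: it prevents decoding from creating new atoms, and unsafe input fails with `ArgumentError`. The sketch below shows both cases. The `ManifestSketch` module and the manifest keys are hypothetical and do not describe ReleaseKit's actual format.

```elixir
defmodule ManifestSketch do
  # Decode ETF without creating new atoms; unsafe or malformed input becomes an error tuple.
  def decode(binary) when is_binary(binary) do
    {:ok, :erlang.binary_to_term(binary, [:safe])}
  rescue
    ArgumentError -> {:error, :unsafe_or_invalid}
  end
end

# Hypothetical manifest keys, for illustration only (not ReleaseKit's format).
manifest = %{name: :demo_app, version: "0.1.0"}
{:ok, ^manifest} = ManifestSketch.decode(:erlang.term_to_binary(manifest))

# Hand-built ETF for an atom the VM has never seen:
# version byte 131, SMALL_ATOM_UTF8_EXT tag 119, length, then the UTF-8 name.
name = "hostkit_unknown_atom_zq"

{:error, :unsafe_or_invalid} =
  ManifestSketch.decode(<<131, 119, byte_size(name), name::binary>>)
```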

HostKit does not embed release tarball bytes into the plan. The tarball path recorded in the manifest must be available to the target where the generated unpack command runs.

## RPC service bindings

`rpc` models service-to-service RPC wiring. HostKit owns service names, listener locations, module-level bindings, and local socket access; the runtime RPC protocol owns exact operations, typespecs, and handshakes.

Same-host RPC defaults to Unix sockets instead of TCP ports:

```elixir
service :catalog do
  daemon do
    listen :rpc, protocol: :rpc
  end

  rpc do
    expose Catalog.API
    expose Catalog.Admin
  end
end

service :web do
  bind :catalog
end
```

With `roots run: "/run/apps"`, the default RPC socket for `catalog` is:

```text
/run/apps/catalog/rpc.sock
```

The provider side uses `expose` for RPC modules. Do not list every runtime operation in HostKit; SafeRPC or another RPC runtime should describe the exact callable functions during the handshake.

The caller side uses `bind` to declare Docker-like service bindings. `bind :catalog` means the current service may discover and connect to `catalog`'s exposed RPC modules. Use `bind :catalog, modules: [Catalog.Admin]` only when the caller should narrow the binding metadata to a subset.

HostKit validates RPC bindings during planning:

- the target service must exist;
- the target listener must exist;
- the target service must expose requested modules when a module subset is specified;
- a service cannot bind itself.
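The four planning rules can be sketched over plain maps. `RpcBindingSketch`, its `services` map shape, and the error atoms are hypothetical; HostKit reports its own diagnostics.

```elixir
defmodule RpcBindingSketch do
  # Hypothetical input: services = %{name => %{listeners: [atom], exposes: [module]}}.
  def validate(services, caller, target, opts \\ []) do
    listener = Keyword.get(opts, :listener, :rpc)
    modules = Keyword.get(opts, :modules)

    with :ok <- check(caller != target, :self_binding),
         {:ok, provider} <- fetch(services, target),
         :ok <- check(listener in provider.listeners, :missing_listener),
         :ok <- check_modules(modules, provider.exposes) do
      :ok
    end
  end

  defp check(true, _reason), do: :ok
  defp check(false, reason), do: {:error, reason}

  defp fetch(services, target) do
    case Map.fetch(services, target) do
      {:ok, provider} -> {:ok, provider}
      :error -> {:error, :unknown_service}
    end
  end

  # No subset requested: the binding covers every exposed module.
  defp check_modules(nil, _exposed), do: :ok

  defp check_modules(modules, exposed),
    do: check(Enum.all?(modules, &(&1 in exposed)), :module_not_exposed)
end
```

Running these checks at plan time means a typo in `bind` surfaces before anything is written to the host.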

For each service with RPC bindings, HostKit emits a caller-local SafeRPC binding term under the service runtime directory and injects its path as `HOSTKIT_RPC_BINDINGS` into the caller's systemd services:

```text
/run/<service>/rpc.etf
```

For example, with `roots run: "/run/apps"`, the `web` caller's file is:

```text
/run/apps/web/rpc.etf
```

The ETF file contains only bindings for that caller:

```elixir
%{
  catalog: %{
    listener: :rpc,
    socket: "/run/apps/catalog/rpc.sock",
    upstream: "unix:/run/apps/catalog/rpc.sock",
    modules: [Catalog.API, Catalog.Admin],
    unit: "catalog.service"
  }
}
```

Consumers read it with:

```elixir
bindings =
  System.fetch_env!("HOSTKIT_RPC_BINDINGS")
  |> File.read!()
  |> :erlang.binary_to_term([:safe])
```
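After decoding, a caller usually needs only the socket path for one provider. The `BindingLookup.socket_for/3` helper below is hypothetical and belongs to neither HostKit nor SafeRPC; it returns the socket only when the requested module is covered by the binding.

```elixir
defmodule BindingLookup do
  # Return the Unix socket path for `service` if `module` is covered by its binding.
  def socket_for(bindings, service, module) do
    case Map.fetch(bindings, service) do
      {:ok, %{socket: socket, modules: modules}} ->
        if module in modules, do: {:ok, socket}, else: {:error, :module_not_bound}

      :error ->
        {:error, :service_not_bound}
    end
  end
end
```

OTP's `:gen_tcp` can open such a path as the address `{:local, path}` with port `0`; the protocol spoken over that socket belongs to the RPC runtime, not to HostKit.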

HostKit also derives the local access boundary from `bind`. The provider RPC socket metadata defaults to the provider service user/group with mode `0660`, and the caller service account is added to the provider service group when an account resource is declared for the caller. For example, `bind :catalog` lets the `web` service account join the `catalog` service group so it can open `/run/apps/catalog/rpc.sock`.

Gatehouse/SafeRPC config can build on the same metadata later.
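The access rule derived from `bind` can be sketched as a pure function from bindings to group memberships. `RpcGroupSketch` and its input shape are hypothetical, and the sketch assumes the provider group is named after the service, ignoring user prefixes such as `toys-`. Callers without a declared account are skipped, matching the rule described above.

```elixir
defmodule RpcGroupSketch do
  # Hypothetical input: %{service => %{account?: boolean, binds: [service]}}.
  # Assumes each provider group is named after its service (no prefix).
  def supplementary_groups(services) do
    for {caller, %{account?: true, binds: [_ | _] = binds}} <- services, into: %{} do
      {caller, binds |> Enum.map(&Atom.to_string/1) |> Enum.sort()}
    end
  end
end
```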

Use TCP explicitly only when the RPC endpoint must cross a host/container boundary:

```elixir
daemon do
  listen :rpc, protocol: :rpc, port: 4451, on: :loopback
end
```

## Providers

Providers can contribute DSL modules, resource types, renderers, validators, and read/plan/apply lifecycle operations. Systemd and Unitctl are core primitives, not providers; integrations such as Caddy should be providers.

```elixir
use HostKit.DSL, providers: [HostKit.Providers.Caddy]

project :demo do
  provider :caddy, HostKit.Providers.Caddy do
    set :sites_dir, "/etc/caddy/sites"
  end

  service :web do
    daemon do
      exec ["/opt/web/bin/server"]
      listen :http, port: 4000
    end

    caddy_site "example.com", path: "web.caddy" do
      encode [:zstd, :gzip]
      reverse_proxy :http
    end
  end
end
```

Providers should keep generated resources inspectable. For example, the Gatus provider is a thin structured-config helper: it emits an ordinary `yaml/2` config resource rather than hiding a daemon or runtime lifecycle.

```elixir
use HostKit.DSL, providers: [HostKit.Providers.Gatus]

project :demo, providers: [HostKit.Providers.Gatus] do
  service :api do
    file "/srv/api/health.txt", content: "ok"

    monitor :http,
      name: "API",
      group: "demo",
      url: "https://api.example.com/health",
      interval: "1m",
      expect: [status: 200],
      alerts: [:telegram]
  end

  service :monitoring do
    gatus_config path(:config, "gatus.yaml"), owner: "root", group: service_user(), mode: 0o640 do
      web address: "127.0.0.1", port: 8080
      gatus_storage :sqlite, path: path(:state, "gatus.db")

      telegram_alerting token: "${MONITORING_TELEGRAM_BOT_TOKEN}", id: "${MONITORING_TELEGRAM_CHAT_ID}" do
        default_alert enabled: true, "failure-threshold": 3, "success-threshold": 2
      end

      gatus_monitor_endpoints order: ["API"]
    end
  end
end
```

## Instances and nested hosts

Top-level `host` declarations describe existing connection targets. `instance` declarations describe lifecycle-managed compute boundaries: a selected backend implements the lifecycle, and ordinary HostKit contents are nested inside.

```elixir
use HostKit.DSL

project :demo do
  instance :demo_vm do
    backend :incus
    image "images:ubuntu/24.04"
    kind :container
    lifecycle :ephemeral

    expose :ssh, host: 2222, guest: 22
    expose :web, host: 18080, guest: 80

    target_host :guest

    host :guest, at: "127.0.0.1" do
      ssh do
        user "root"
        password "hostkit-demo"
        port 2222
        accept_hosts true
      end
    end

    service :web do
      package :caddy

      daemon do
        exec ["/usr/bin/env", "true"]
        listen :http, port: 80
      end
    end
  end
end
```

The instance owns compute lifecycle metadata (`backend`, `image`, `kind`, `lifecycle`, `expose`). The nested host owns connection metadata. Nested services/resources are ordinary HostKit declarations scoped to the instance contents. Plans emit the instance lifecycle resource first, then nested content resources annotated with the nested host target so read/apply operations run through that endpoint. If an instance declares more than one nested host, use `target_host :name` to choose the endpoint for nested content resources; otherwise HostKit uses the first nested host.

Down plans delete `lifecycle :ephemeral` instances after their nested content has been rolled back. Persistent instances are intentionally skipped in down plans and reported as warnings rather than destroyed implicitly.

Backend implementations are intentionally separate from the generic DSL. Incus is implemented as a backend for `instance`, not as a user-facing `incus_machine` DSL. The Incus backend maps `expose` declarations to Incus proxy devices.

Backend configuration stays on the `backend` declaration instead of leaking backend-specific flags into generic plan/apply commands:

```elixir
instance :demo_vm do
  backend :incus, sudo: true, project: "hostkit"
end
```

For multi-line configuration, declare each setting with `option` inside a `backend` block:

```elixir
instance :demo_vm do
  backend :incus do
    option :sudo, true
    option :project, "hostkit"
  end
end
```

Backend authors implement `HostKit.Instance.Backend`:

- `read/2` returns the observed instance or `nil`,
- `apply/2` creates/starts/configures/waits for the instance,
- `delete/2` destroys an instance when an ephemeral down plan requests it.

Backends should emit apply events for long-running lifecycle work so CLI and Livebook progress remain mailbox-first.

## Host bootstrap packages and mise-managed runtimes

HostKit can install OS packages through the target package manager. The DSL is distribution-neutral by default and can be pinned to a manager when needed.

```elixir
bootstrap do
  package :ca_certificates
  package :build_essential, as: "build-essential", update: true
end
```

HostKit can also bootstrap `mise` and install system-wide tool versions. This is intended for host bootstrap and workspace agents; application services should still prefer packaged release artifacts where possible.

```elixir
bootstrap do
  mise do
    tool :erlang, "29.0.2"
    tool :elixir, "1.20.1"
  end
end
```

This applies through the `mise` CLI contract: it installs the binary with `mise.run` when missing, then runs `mise install --system` with `MISE_SYSTEM_DATA_DIR` set.

Package planning resolves semantic package names through Repology and caches responses in `.host_kit/cache/repology` for 24 hours by default. Use locks for deterministic apply:

```sh
mix host_kit.plan --write-package-lock host_kit.package.lock infra/config.exs
mix host_kit.apply --package-lock host_kit.package.lock --confirm infra/config.exs
```

Plan/apply artifacts make remote changes inspectable before apply. Prefer declaring the remote host in normal `.exs` HostKit config and selecting it with `--host`:

```elixir
use HostKit.DSL

project :infra do
  host :prod, at: "host.example" do
    ssh do
      user "root"
      identity_file Path.expand("~/.ssh/id_ed25519")
      password secret_env("HOSTKIT_SSH_PASSWORD")
      accept_hosts true
      retry attempts: 3, base_delay: 250, max_delay: 2_000
    end
  end
end
```

```sh
mix host_kit.plan --host prod \
  --package-lock host_kit.package.lock \
  --out host_kit.plan.json infra/config.exs

mix host_kit.apply --host prod \
  --plan host_kit.plan.json --confirm infra/config.exs
```

`ssh retry: ...` is an SSH transport policy. It retries connection establishment after transient SSH startup or network failures; it does not blindly rerun arbitrary deployment commands once a command has been sent to the remote host. Use `retry: 3` as shorthand for three attempts, `retry: false` to disable retries, or keyword options with `:attempts`, `:base_delay`/`:base_delay_ms`, and `:max_delay`/`:max_delay_ms`. Retry progress is emitted as apply events and mirrored to Logger for collection.
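The exact delay schedule is not specified here. A common choice, shown purely as an assumption, is exponential backoff from `base_delay` capped at `max_delay`. The `RetrySketch` module is hypothetical; it only illustrates how `attempts: 3, base_delay: 250, max_delay: 2_000` could translate into waits, and it keeps the "connection only" rule by retrying a `connect` function rather than a deployment command.

```elixir
defmodule RetrySketch do
  # Assumed schedule (not documented HostKit behavior): base * 2^(n - 1) ms,
  # capped at `cap`, for the waits before attempts 2..attempts.
  def delays(attempts, base, cap) when attempts >= 1 do
    for n <- 1..(attempts - 1)//1, do: min(base * Integer.pow(2, n - 1), cap)
  end

  # `connect` should only establish a connection; it must never wrap a
  # deployment command that may already have run on the remote host.
  def with_retry(connect, opts) do
    attempts = Keyword.fetch!(opts, :attempts)
    base = Keyword.fetch!(opts, :base_delay)
    cap = Keyword.fetch!(opts, :max_delay)

    Enum.reduce_while([0 | delays(attempts, base, cap)], nil, fn delay, _last ->
      Process.sleep(delay)

      case connect.() do
        {:ok, _conn} = ok -> {:halt, ok}
        {:error, _reason} = error -> {:cont, error}
      end
    end)
  end
end

RetrySketch.delays(3, 250, 2_000)
# => [250, 500]
```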

Plan artifacts are JSON and intended to be inspectable. They include:

- an artifact version and target metadata;
- dumped project, resources, and changes;
- source identities and diagnostics;
- aggregate resource/action statistics;
- source-location metadata on changes, where available;
- structured diffs for resources that support semantic review.

Structured diffs are generated through HostKit's diff wrapper around JSON Patch concepts; HostKit stores its own stable diff structs rather than exposing the dependency as the artifact contract. Dotenv/INI/YAML resources diff public keys or paths. Templates diff public assign metadata and redacted assign names, not arbitrary rendered text. Secret references are stored as references, not values, for example:

```json
{
  "$type": "struct",
  "module": "Elixir.HostKit.Secret",
  "fields": {
    "source": {
      "$type": "tuple",
      "items": [
        {"$type": "atom", "value": "env"},
        "HOSTKIT_SSH_PASSWORD"
      ]
    }
  }
}
```

`secret_env/1` records an environment-backed secret reference and resolves it only at the control-plane boundary that needs the value. Use it for HostKit's own credentials, such as SSH passwords or future provider API tokens. Target application environment files use contextual `env` declarations. Inside `service`, `env :name do ... end` declares a managed env file at the service's config path. Inside `daemon`, `env :name` attaches that same file to the systemd unit:

```elixir
service :app do
  env :runtime do
    set :mix_env, :prod
    secret :database_url, env: "DATABASE_URL"
  end

  daemon do
    env :runtime
    exec ["/opt/app/bin/server"]
  end
end
```

Use `dotenv path do ... end` when you need an explicit dotenv-format file at a specific path.

Raw SSH flags remain available as an escape hatch: `--remote`, `--user`, `--port`, `--identity-file`, `--password`, and `--password-env`.

For Linux integration testing, use Incus as the lightweight native container/VM backend:

```sh
HOSTKIT_INCUS_SUDO=true HOSTKIT_SSH_PUBLIC_KEY=$HOME/.ssh/id_ed25519.pub \
  scripts/incus_integration_vm.sh ensure
HOSTKIT_INCUS_SUDO=true scripts/incus_integration_vm.sh ip
```

Set `HOSTKIT_INCUS_TYPE=vm` to launch an Incus VM instead of the default container, and `HOSTKIT_INCUS_INSTANCE=name` to change the instance name. Run the remote CLI integration against Incus with `HOSTKIT_INTEGRATION_TOOL=incus`, or against a pre-existing host declared in `.exs` config with `HOSTKIT_INTEGRATION_TOOL=remote HOSTKIT_INTEGRATION_CONFIG=examples/integration_hosts.example.exs`.

A real remote validation can use the same host config and a shell-provided secret:

```sh
HOSTKIT_SSH_PASSWORD='...' \
HOSTKIT_INTEGRATION_TOOL=remote \
HOSTKIT_INTEGRATION_CONFIG=examples/integration_hosts.example.exs \
mix test test/integration/cli_remote_test.exs --include integration
```

## Project-local DSLs

Use `HostKit.ProjectDSL` in consuming projects to build local conventions without baking them into HostKit.
Load project-local DSL files explicitly through the runtime API or the Mix task `--require` option:

```elixir
# infra/toys_infra.exs
defmodule ToysInfra do
  use HostKit.ProjectDSL

  root :source, "/opt/toys/src"
  root :data, "/srv/toys"
  root :state, "/var/lib/toys"
  root :config, "/etc/toys"

  prefix :user, "toys-"
  prefix :unit, "toys-"

  defservice :toy_service do
    let :service_user, do: prefixed(:user, service_name())
    let :unit_name, do: prefixed(:unit, service_name()) <> ".service"

    path :source_dir, root(:source), service_name()
    path :data_dir, root(:data), service_name()
    path :state_dir, root(:state), service_name()
    path :config_dir, root(:config), service_name()

    macro :standard_user do
      account service_user(), system: true, home: state_path("home")
    end
  end
end
```

```elixir
# infra/config.exs
use HostKit.DSL, providers: [HostKit.Providers.Caddy]
use ToysInfra

project :toys do
  toy_service :exograph do
    standard_user()

    systemd_service unit_name() do
      working_directory source_dir()
      read_write_paths [data_dir(), state_dir(), source_dir()]
    end
  end
end
```

## Runtime API

```elixir
{:ok, project} = HostKit.load("infra/config.exs", require: ["toys_infra.exs"])
{:ok, plan} = HostKit.plan(project)
#=> %HostKit.Plan{changes: [%HostKit.Change{action: :create, ...}]}

prod = HostKit.Target.ssh(:prod, host: "elixir.toys", user: "dannote", sudo: true)
{:ok, remote_plan} = HostKit.plan(project, target: prod, reader: HostKit.Remote)

HostKit.format_plan(plan)
execution_graph = HostKit.Plan.ExecutionGraph.build(plan)
HostKit.Plan.ExecutionGraph.format(execution_graph)
{:ok, results} = HostKit.apply(plan, dry_run: true)

# Supported apply resources include accounts, directories, files, structured configs,
# templates, symlinks, env files, systemd units, commands, packages, and provider-rendered files.
{:ok, results} = HostKit.apply(plan, confirm: true, sudo: true)

# Command and filesystem operations are routed through a runner boundary.
{:ok, results} = HostKit.apply(plan, confirm: true, runner: HostKit.Runner.Local)

# Apply remotely through the prod SSH target defined earlier.
{:ok, results} = HostKit.apply(plan, target: prod, confirm: true)

{:ok, conn} = HostKit.Runner.SSH.Connection.open(host: "elixir.toys", user: "dannote")
try do
  prod = HostKit.Target.ssh(:prod, runner: {HostKit.Runner.SSH.Connection, conn: conn}, sudo: true)
  {:ok, remote_plan} = HostKit.plan(project, target: prod, reader: HostKit.Remote)
after
  HostKit.Runner.SSH.Connection.close(conn)
end

{:ok, unit} = HostKit.Render.render(project, {:systemd_service, "toys-exograph.service"})
```

Plans can also be inspected as an execution dependency graph. The graph is derived from active create/update/delete changes and records why ordering exists: declared `depends_on`, parent directories, owner/group accounts, command source inputs, symlink target paths, systemd timer/service relationships, systemd service file/path references, and systemd readiness checks. It is currently an inspection/debug artifact; future parallel apply can consume the same graph without changing the plan format.

```sh
mix host_kit.plan infra/config.exs --host prod --show-graph
mix host_kit.plan infra/config.exs --host prod --graph-format json
```

The JSON graph output is a JSON-safe map with display labels and `HostKit.Resource.dump/1` terms for resource ids; it does not encode raw Elixir structs or embed full before/after resource payloads. See [Parallel apply design](parallel-apply-design.md) for how this graph may later feed a bounded scheduler.
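The graph's core guarantee, that every change runs after the changes it depends on, can be illustrated with Kahn's topological sort. This sketch works on a hypothetical `%{node => [dependencies]}` map rather than `HostKit.Plan.ExecutionGraph`'s own structure, and it releases one ready node at a time (sorted, for deterministic output). A parallel scheduler could instead release every ready node at once.

```elixir
defmodule GraphSketch do
  # deps: %{node => [nodes it depends on]}. Returns {:ok, order} or {:error, :cycle}.
  def order(deps) do
    nodes =
      deps
      |> Map.keys()
      |> Enum.concat(Enum.flat_map(deps, fn {_node, ds} -> ds end))
      |> Enum.uniq()

    pending = Map.new(nodes, fn n -> {n, MapSet.new(Map.get(deps, n, []))} end)
    loop(pending, [])
  end

  defp loop(pending, acc) when map_size(pending) == 0, do: {:ok, Enum.reverse(acc)}

  defp loop(pending, acc) do
    # Nodes whose dependencies have all been scheduled.
    ready = for {n, ds} <- pending, MapSet.size(ds) == 0, do: n

    case Enum.sort(ready) do
      [] ->
        {:error, :cycle}

      [next | _] ->
        rest =
          pending
          |> Map.delete(next)
          |> Map.new(fn {n, ds} -> {n, MapSet.delete(ds, next)} end)

        loop(rest, [next | acc])
    end
  end
end
```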

## Storage volumes

HostKit models storage as named metadata instead of repeated path strings:

```elixir
volume =
  HostKit.Storage.volume(:repositories,
    path: "/srv/toys/forgejo/repositories",
    owner: "toys-forgejo",
    group: "toys-forgejo",
    mode: 0o750,
    backup: true
  )

directory HostKit.Storage.directory(volume)
read_write_paths HostKit.Storage.read_write_paths([volume])
```

Service conventions can derive these paths without project-specific macros and later reuse the same volume metadata for systemd sandboxing, Unitctl transient runtimes, and backups.

```elixir
project :toys do
  roots data: "/srv/toys", config: "/etc/toys"
  prefixes user: "toys-", unit: "toys-"

  service :forgejo do
    storage :repositories, under: :data, path: "repositories", mode: 0o750, backup: true
    storage :config, under: :config, owner: "root", group: service_user(), writable: false, secret: true

    daemon unit_name() do
      run user: service_user(), read_write_paths: writable_storage_paths()
    end
  end
end
```
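Deriving sandbox paths from volume metadata mostly comes down to filtering writable volumes. `VolumeSketch` is a hypothetical stand-in for `HostKit.Storage` that uses plain maps; it assumes volumes are writable unless marked `writable: false`, mirroring the `writable:` storage option. The rendered line uses systemd's `ReadWritePaths=`, which accepts a space-separated list of paths.

```elixir
defmodule VolumeSketch do
  # Hypothetical volume: %{name: atom, path: String.t(), writable: boolean (default true)}.
  def read_write_paths(volumes) do
    volumes
    |> Enum.filter(&Map.get(&1, :writable, true))
    |> Enum.map(& &1.path)
  end

  # systemd's ReadWritePaths= takes a space-separated list of paths.
  def read_write_directive(volumes),
    do: "ReadWritePaths=" <> Enum.join(read_write_paths(volumes), " ")
end
```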

## HostKit agent

HostKit can run as a supervised OTP application. The supervision tree currently starts agent state and a monitor worker:

```elixir
HostKit.Agent.status()
HostKit.Agent.configure(project: project, target: HostKit.Target.local(:prod))
HostKit.Agent.run_plan()
HostKit.Agent.run_monitor()
```

HostKit can also declare its own outer systemd supervisor unit:

```elixir
HostKit.Agent.Systemd.service(
  exec_start: ["/opt/host_kit/bin/host_kit", "agent", "--config", "/etc/host_kit/config.exs"]
)
```

State snapshots can be written for audit/drift history:

```elixir
HostKit.State.write(plan, "/var/lib/host_kit/state/latest-plan.json")
HostKit.State.read("/var/lib/host_kit/state/latest-plan.json")
```

This gives a clean two-layer supervision model: OTP inside the BEAM and systemd outside it.

## Firewall policy

HostKit can declare project- or host-scoped firewall policy:

```elixir
firewall do
  allow tcp: 22, from: :any
  allow tcp: [80, 443], from: :any
  allow tcp: 9100, from: {10, 44, 0, 0, 24}
  deny :all
end
```

Host-scoped policy lives inside `host`:

```elixir
host :prod, at: "elixir.toys" do
  firewall do
    allow tcp: 22, from: :any
    deny :all
  end
end
```

Extract, render, plan, and apply policies with:

```elixir
HostKit.Firewall.policies(project)
HostKit.Firewall.Nftables.render(policy)
{:ok, plan} = HostKit.plan(project, reader: HostKit.Local)
HostKit.apply(plan, confirm: true, nft_reload: true)
```

Firewall policy is written to `/etc/nftables.d/hostkit.nft` by default and validated with `nft -c -f` before optional reload.
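To give a feel for what the rendered policy might contain, the sketch below turns the rule shapes from the first firewall example into nftables rule statements. `NftSketch` and its tuple rule format are hypothetical and do not reproduce `HostKit.Firewall.Nftables.render/1`; the table and chain wrapper a complete `hostkit.nft` file needs is omitted.

```elixir
defmodule NftSketch do
  # Hypothetical rule tuples:
  #   {:allow, port | [port], :any | {a, b, c, d, prefix}} or {:deny, :all}
  def rule({:deny, :all}), do: "drop"

  def rule({:allow, port_spec, from}) do
    [saddr(from), "tcp dport " <> dport(port_spec), "accept"]
    |> Enum.reject(&(&1 == nil))
    |> Enum.join(" ")
  end

  defp saddr(:any), do: nil
  defp saddr({a, b, c, d, prefix}), do: "ip saddr #{a}.#{b}.#{c}.#{d}/#{prefix}"

  # A single port, or an anonymous nftables set for several ports.
  defp dport(port) when is_integer(port), do: Integer.to_string(port)

  defp dport(ports) when is_list(ports),
    do: "{ " <> Enum.map_join(ports, ", ", &Integer.to_string/1) <> " }"
end
```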

## Workspace inside monitoring

Workspace services can declare checks that are intended to run inside the sandbox later via a workspace agent:

```elixir
workspace :blog, owner: :alice do
  service :preview do
    inside do
      monitor :mix, task: "test", every: "5m"
      monitor :port, port: 4000
      monitor :git, clean: true
    end
  end
end
```

Extract them with:

```elixir
HostKit.Workspace.inside_monitors(project)
```

## Workspace execution and tenants

Tenants can own workspaces:

```elixir
tenant :alice, quota: [memory: "4G"] do
  agent port: 4173
end
```

Workspace command specs can be built for transient execution:

```elixir
HostKit.Workspace.exec_spec(project, :alice, :blog, ["mix", "test"])
HostKit.Workspace.exec(project, :alice, :blog, ["mix", "test"])
```

Inside monitors currently return `:pending_workspace_agent`, reserving execution for the sandbox agent boundary.

## OpenTelemetry Collector config

Telemetry declarations can be converted to an OpenTelemetry Collector config map:

```elixir
HostKit.OtelCollector.config(project, endpoint: "otel.example:4317")
```

## Workspace sandbox profiles

Systemd-backed isolation profiles can be applied inside daemons:

```elixir
workspace :blog, owner: :alice do
  service :preview do
    daemon do
      exec ["mix", "phx.server"]

      isolate :vibe_dev do
        writable path(:data)
        network :loopback
      end
    end
  end
end
```

Profiles include `:vibe_dev`, `:strict_app`, and `:untrusted`, and can be overridden inside `isolate`:

```elixir
isolate :untrusted do
  memory_max "256M"
  private_network false
end
```

## Workspace preview helper

Workspace services can expose a preview route with a named listener and Caddy site:

```elixir
workspace :blog, owner: :alice do
  service :preview do
    daemon unit_name() do
      run exec_start: ["mix", "phx.server"]
    end

    preview :http, port: 4000, domain: "alice-blog.dev.example.com"
  end
end
```

This expands to `listen :http`, a Caddy reverse proxy to that listener, an HTTP monitor, telemetry metadata, and Caddy access-log metadata.

## Workspace agent helper

Workspaces can declare the default sandbox agent service as ordinary HostKit resources:

```elixir
workspace :blog, owner: :alice do
  agent port: 4173
end
```

This expands to a service with an account, workspace directory, systemd daemon, loopback listener, logs, telemetry, systemd monitor, and loopback-only network policy.

## Workspace scope

`workspace` scopes ordinary HostKit DSL for user sandboxes while keeping resources inspectable:

```elixir
workspace :blog, owner: :alice do
  service :preview do
    directory path(:data), mode: :private_dir

    daemon unit_name() do
      run exec_start: ["mix", "phx.server"]
      listen :http, port: 4000, on: :loopback
    end
  end
end
```

Inside a workspace, services get workspace metadata plus separate path and identity names:

```elixir
path(:data)  # .../alice/blog/preview
unit_name()  # prefix-alice-blog-preview.service
```

## Named listeners

Services can declare named listeners and reuse them from provider declarations:

```elixir
daemon unit_name() do
  run exec_start: ["/usr/bin/env", "true"]
  listen :http, port: 3000, on: :loopback
end

caddy_site "web.example.com" do
  reverse_proxy :http
end
```

Named listeners are stored as service metadata and render Caddy upstreams like `127.0.0.1:3000` at the provider boundary.

## Network addresses and policy

Network addresses can use Elixir tuple forms and semantic aliases:

```elixir
listen 3000, on: :loopback
listen 4000, on: {127, 0, 0, 1}
network_policy deny: :all, allow: [:loopback, {10, 44, 0, 0, 24}]
```

Systemd services compile network policy to:

```ini
IPAddressDeny=any
IPAddressAllow=localhost 10.44.0.0/24
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
```
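The address translation behind these directives can be sketched as a small function. `AddressSketch` is hypothetical; the mappings of `:loopback` to `localhost` and `:all` to `any` follow the output above, and both are special values systemd accepts in `IPAddressAllow=`/`IPAddressDeny=`. Only IPv4 tuples are handled here.

```elixir
defmodule AddressSketch do
  # Translate address terms into systemd IPAddressAllow=/IPAddressDeny= tokens.
  # IPv4 only; IPv6 tuples are not handled in this sketch.
  def to_systemd(:loopback), do: "localhost"
  def to_systemd(:all), do: "any"
  def to_systemd({a, b, c, d}), do: Enum.join([a, b, c, d], ".")
  def to_systemd({a, b, c, d, prefix}), do: to_systemd({a, b, c, d}) <> "/#{prefix}"

  def directives(policy) do
    deny = Keyword.fetch!(policy, :deny)
    allow = Keyword.fetch!(policy, :allow)

    [
      "IPAddressDeny=" <> to_systemd(deny),
      "IPAddressAllow=" <> Enum.map_join(allow, " ", &to_systemd/1)
    ]
  end
end
```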

## Log management intent

Log management can be declared globally, per service, or on individual resources:

```elixir
observability do
  logs driver: :journald,
       retention: "14d",
       ship: true,
       attributes: [deployment_environment: :prod]
end
```

Systemd service log declarations also add unit directives:

```elixir
daemon unit_name() do
  run exec_start: ["/usr/bin/env", "true"]
  logs identifier: service_name(), stdout: :journal, stderr: :journal
end
```

Extract log intent with:

```elixir
HostKit.Logs.configs(project)
```

Read recent journald logs through local or remote targets:

```elixir
HostKit.Logs.read("toys-forgejo.service", target: prod, since: "1h")
HostKit.Logs.tail("toys-forgejo.service", target: prod, lines: 100)
```

## OpenTelemetry collection intent

Observability defaults can be enabled once at project or service scope and inherited by resources:

```elixir
observability do
  telemetry logs: true,
            metrics: true,
            traces: false,
            attributes: [deployment_environment: :prod]
end
```

Resource-level overrides are still available:

```elixir
daemon unit_name() do
  run exec_start: ["/usr/bin/env", "true"]
  telemetry logs: :journald, metrics: false, service_name: service_name()
end
```

Extract collection intent with:

```elixir
HostKit.Telemetry.signals(project)
```

Systemd services and Caddy sites get default collection intent even without global defaults:

```elixir
# systemd: logs: :journald, metrics: :systemd
# caddy: logs: :access, metrics: :http
```
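
The inheritance can be sketched as layered keyword merges on top of the per-kind defaults listed above. `TelemetryIntentSketch`, the precedence order (kind defaults, then project, service, and resource), and the rule that a bare `true` keeps a more specific inherited value are all assumptions, not documented HostKit behavior.

```elixir
defmodule TelemetryIntentSketch do
  # Per-kind defaults from the comments above; the layering rules are assumptions.
  @kind_defaults %{
    systemd: [logs: :journald, metrics: :systemd],
    caddy: [logs: :access, metrics: :http]
  }

  # Later layers win: kind defaults, then project, then service, then resource.
  def effective(kind, project, service, resource) do
    Enum.reduce([project, service, resource], Map.get(@kind_defaults, kind, []), &layer(&2, &1))
  end

  defp layer(inherited, overrides) do
    Keyword.merge(inherited, overrides, fn
      # A bare `true` means "enabled": keep a more specific inherited value such as :journald.
      _key, old, true when old not in [nil, false] -> old
      _key, _old, new -> new
    end)
  end
end
```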

## Monitoring metadata

Declarations can carry monitoring intent for a monitoring service or config generator to consume later:

```elixir
daemon do
  exec ["/usr/bin/env", "true"]
  listen :http, port: 4000
  monitor :systemd, expect: [state: :active], severity: :critical
end

caddy_site "web.example.com" do
  reverse_proxy :http
  monitor :http, url: "https://web.example.com", expect: [status: 200]
end
```

Extract, project, or run checks with:

```elixir
HostKit.Monitor.checks(project)
HostKit.Monitor.endpoint_checks(project, group: "prod", interval: "1m")
HostKit.Providers.Gatus.endpoints_from_monitors(project)
HostKit.Monitor.run(project, target: prod)
```

Check execution currently supports systemd state, HTTP status, filesystem existence, and command exit checks. Command monitors accept the same `exec:` command shapes as command resources:

```elixir
monitor :command,
  name: :dr_validate,
  exec: argv("/usr/local/sbin/dr-validate"),
  expect: [exit: 0],
  severity: :critical
```
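
A minimal sketch of evaluating `expect: [exit: 0]` for an argv-style command with `System.cmd/3`. `CommandCheckSketch` and its result tuples are hypothetical; HostKit's monitor runner, remote targets, and result shape are not modeled.

```elixir
defmodule CommandCheckSketch do
  # Runs an argv list locally and compares its exit status with `expect: [exit: n]`.
  def run([executable | args], expect) do
    {output, status} = System.cmd(executable, args, stderr_to_stdout: true)
    wanted = Keyword.get(expect, :exit, 0)

    if status == wanted do
      {:ok, status}
    else
      {:error, %{expected: wanted, got: status, output: output}}
    end
  end
end

CommandCheckSketch.run(["sh", "-c", "exit 3"], exit: 0)
```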

Endpoint projection currently turns HTTP monitors into provider-neutral external endpoint specs; providers such as Gatus can render those specs into concrete monitoring config.
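
As an illustration of that projection, this sketch maps one HTTP monitor onto the `name`, `group`, `url`, `interval`, and `conditions` keys of a Gatus endpoint, using Gatus's `[STATUS] == 200` condition syntax. `GatusEndpointSketch`, its argument shapes, and the `"1m"` default interval are assumptions; HostKit's provider-neutral endpoint spec is not modeled.

```elixir
defmodule GatusEndpointSketch do
  # Projects one HTTP monitor into a map shaped like a Gatus `endpoints` entry.
  def endpoint(name, monitor, opts \\ []) do
    status = monitor |> Keyword.get(:expect, []) |> Keyword.get(:status, 200)

    [
      {"name", name},
      {"group", Keyword.get(opts, :group)},
      {"url", Keyword.fetch!(monitor, :url)},
      {"interval", Keyword.get(opts, :interval, "1m")},
      {"conditions", ["[STATUS] == #{status}"]}
    ]
    # Drop optional keys (such as "group") that were not provided.
    |> Enum.reject(fn {_key, value} -> is_nil(value) end)
    |> Map.new()
  end
end
```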

## Binary release layouts

Use `release/2` when a service follows the common unpacked-binary pattern of a versions directory and a `current/<name>` symlink. It is only a helper: it emits ordinary `directory/2` and `symlink/2` resources that remain visible in plans.

```elixir
service :gatus do
  release :gatus, version: "5.36.0", owner: "deploy", group: "deploy"

  daemon do
    exec [path(:opt, "current/gatus/gatus")]
  end
end
```

The default layout is:

- versions directory: `path(:opt, "releases/<name>")`
- current symlink: `path(:opt, "current/<name>")`
- symlink target: `<versions_dir>/<version>`

Use `current_dir: [owner: ..., group: ..., mode: ...]` when HostKit should also manage the parent `current` directory. See [Release design notes](release-design.md) for the intended boundary before adding artifact download, activation, retention, or rollback behavior.
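
The default layout is plain path arithmetic. A minimal sketch, assuming `opt_root` stands in for whatever `path(:opt)` resolves to in a project; `ReleaseLayoutSketch` is hypothetical and only computes paths, it creates no directories or symlinks.

```elixir
defmodule ReleaseLayoutSketch do
  # Computes the default release layout for a release `name` at `version`.
  def layout(opt_root, name, version) do
    versions_dir = Path.join([opt_root, "releases", to_string(name)])

    %{
      versions_dir: versions_dir,
      current_symlink: Path.join([opt_root, "current", to_string(name)]),
      symlink_target: Path.join(versions_dir, version)
    }
  end
end
```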

## File modes

Mode values can be raw octal, semantic aliases, tuples, keywords, or capability lists:

```elixir
mode: :secret_group_file
mode: {:rw, :r, nil}
mode: [owner: :rw, group: :r]
mode: [:setgid, :owner_rwx, :group_rwx, :other_rx]
```

Resources store normalized integer modes, so plan/apply remains simple.
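
To show what normalizing to integers involves, here is a sketch that folds the tuple, keyword, and capability-list forms into standard Unix permission bits. `ModeSketch` is hypothetical; semantic aliases such as `:secret_group_file` are not modeled because their exact values are not listed here.

```elixir
defmodule ModeSketch do
  import Bitwise

  # Special bits and per-class shifts follow the standard Unix permission layout.
  @special %{setuid: 0o4000, setgid: 0o2000, sticky: 0o1000}
  @shift %{"owner" => 6, "group" => 3, "other" => 0}

  def normalize(mode) when is_integer(mode), do: mode

  # {owner, group, other}, e.g. {:rw, :r, nil}
  def normalize({owner, group, other}),
    do: normalize(owner: owner, group: group, other: other)

  def normalize(list) when is_list(list) do
    if Keyword.keyword?(list) do
      # [owner: :rw, group: :r]; classes that are left out get no bits
      Enum.reduce(list, 0, fn {class, perms}, acc ->
        acc ||| (bits(perms) <<< Map.fetch!(@shift, Atom.to_string(class)))
      end)
    else
      # [:setgid, :owner_rwx, :group_rwx, :other_rx]
      Enum.reduce(list, 0, fn capability, acc -> acc ||| capability_bits(capability) end)
    end
  end

  defp capability_bits(capability) do
    case Map.fetch(@special, capability) do
      {:ok, value} ->
        value

      :error ->
        [class, perms] = capability |> Atom.to_string() |> String.split("_", parts: 2)
        bits(perms) <<< Map.fetch!(@shift, class)
    end
  end

  # nil means "no permissions" for that class.
  defp bits(nil), do: 0
  defp bits(perms) when is_atom(perms), do: bits(Atom.to_string(perms))

  defp bits(perms) when is_binary(perms) do
    perms
    |> String.graphemes()
    |> Enum.reduce(0, fn
      "r", acc -> acc + 4
      "w", acc -> acc + 2
      "x", acc -> acc + 1
    end)
  end
end
```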

## Env files and secrets

HostKit has a Dotenvy-validated `dotenv` resource for explicit env files. Secret values are resolved at apply time. Drift detection compares metadata and non-secret `set` entries with structured key-level diffs; secret entry values are not read into plan artifacts for comparison.

Use `secret KEY, env: :redacted` for existing or generated env-file secrets that should be modeled but never rendered by HostKit. Secret sources support `env: "NAME"`, `file: "/run/secrets/name"`, and `command: ["pass", "show", "name"]`.

```elixir
service :web do
  env :runtime do
    set :MIX_ENV, :prod
    set :PORT, 4000
    secret :SECRET_KEY_BASE, env: "SECRET_KEY_BASE"
    secret :API_TOKEN, file: "/run/secrets/api-token"
    secret :GENERATED_TOKEN, env: :redacted
  end

  daemon do
    env :runtime
    exec ["/opt/web/bin/server"]
  end
end
```

For env files at explicit paths, use `dotenv`, the env-file counterpart of the `ini` and `yaml` resources:

```elixir
dotenv path(:config, "env"), owner: "root", group: service_user(), mode: 0o640 do
  set "MIX_ENV", "prod"
  set "PORT", 4000
  secret "GENERATED_TOKEN", env: :redacted
end
```
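
Key-level drift for env files can be sketched as a comparison of `set` entries against the file parsed from the host, skipping `secret` entries entirely. `DotenvDriftSketch`, its tuple shapes, and the assumption that values compare as strings are illustrative, not HostKit internals.

```elixir
defmodule DotenvDriftSketch do
  # desired: entries in declaration order, {:set, key, value} or {:secret, key, source}
  # current: %{"KEY" => "value"} as parsed from the env file already on the host
  # Returns key-level changes for `set` entries only; secret values are never compared.
  def diff(desired, current) do
    Enum.flat_map(desired, fn
      {:secret, _key, _source} ->
        []

      {:set, key, value} ->
        key = to_string(key)
        wanted = to_string(value)

        case Map.fetch(current, key) do
          {:ok, ^wanted} -> []
          {:ok, found} -> [{:update, key, found, wanted}]
          :error -> [{:add, key, wanted}]
        end
    end)
  end
end
```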

## Structured config files

Use `ini/2` and `yaml/2` when a managed file is naturally data. Structured config resources are first-class resources in plans and render to ordinary managed files during read/apply.

```elixir
service :forgejo, path: "forgejo" do
  ini path(:config, "app.ini"), owner: "root", group: service_user(), mode: 0o640 do
    set "APP_NAME", "elixir.toys git"

    section "server" do
      set "DOMAIN", "git.elixir.toys"
      set "ROOT_URL", "https://git.elixir.toys/"
      set "HTTP_PORT", 3000
      secret "LFS_JWT_SECRET", env: :redacted
    end

    section "database" do
      set "DB_TYPE", "sqlite3"
      set "PATH", path(:data, "forgejo.db")
    end
  end
end
```

Secret and redacted INI/YAML values are omitted from public drift comparison. For public values, HostKit produces structured plan diffs with operations, paths, before/after values, and human-readable output such as `~ server.HTTP_PORT: 3000 -> 4000`; redacted values are reported as redacted paths without reading or storing their actual values.

HostKit decodes YAML with `yaml_elixir` for public-path comparison, renders YAML scalars with `ymlr`, and uses JSON Patch-style operations internally for structured diffs; it does not hand-roll YAML quoting or parsing.

`env: :redacted` is useful for modeling existing generated secrets without storing or rendering them, and it is intentionally not renderable during apply. When HostKit should render the file during apply, use a resolvable secret source (`env:`, `file:`, or `command:`) instead:

```elixir
secret "TOKEN", env: "APP_TOKEN"
secret "TOKEN", file: "/run/secrets/app-token"
secret "TOKEN", command: ["pass", "show", "app/token"]
```
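
The structured diffs described above can be sketched as a path-based walk over nested maps that emits operations named after JSON Patch (`add`, `remove`, `replace`) and renders them in the `~ server.HTTP_PORT: 3000 -> 4000` style. `StructuredDiffSketch` and its `(redacted)` rendering are hypothetical; a desired value of `:redacted` is reported by path only, and nothing is compared for it.

```elixir
defmodule StructuredDiffSketch do
  # Walks two nested maps and emits one operation per changed leaf path.
  def diff(before, desired, path \\ [])

  def diff(before, desired, path) when is_map(before) and is_map(desired) do
    (Map.keys(before) ++ Map.keys(desired))
    |> Enum.uniq()
    |> Enum.sort()
    |> Enum.flat_map(fn key ->
      diff(Map.get(before, key, :absent), Map.get(desired, key, :absent), path ++ [key])
    end)
  end

  # Redacted leaves are reported by path; their values are never read or compared.
  def diff(_before, :redacted, path), do: [{:redacted, path}]
  def diff(same, same, _path), do: []
  def diff(:absent, value, path), do: [{:add, path, value}]
  def diff(value, :absent, path), do: [{:remove, path, value}]
  def diff(before, desired, path), do: [{:replace, path, before, desired}]

  # Human-readable rendering of a single operation.
  def format({:replace, path, before, desired}),
    do: "~ #{Enum.join(path, ".")}: #{inspect(before)} -> #{inspect(desired)}"

  def format({:add, path, value}), do: "+ #{Enum.join(path, ".")}: #{inspect(value)}"
  def format({:remove, path, _value}), do: "- #{Enum.join(path, ".")}"
  def format({:redacted, path}), do: "~ #{Enum.join(path, ".")}: (redacted)"
end
```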

YAML configs use Elixir keyword data for stable order and may contain redacted secret leaves:

```elixir
yaml path(:config, "gatus.yaml"),
  content: [
    storage: [type: "sqlite", path: path(:state, "gatus.db")],
    alerting: [telegram: [token: :redacted, id: "chat-id"]],
    endpoints: [
      [
        name: "Forgejo",
        url: "https://git.elixir.toys",
        conditions: ["[STATUS] == 200"]
      ]
    ]
  ],
  owner: "root",
  group: service_user(),
  mode: 0o640
```

## Elixir `.exs` files

Use `exs/2` when the desired file is Elixir configuration code and should be represented as quoted AST rather than an EEx string template.

```elixir
exs path(:config, "runtime.exs"), owner: "root", group: service_user(), mode: 0o640 do
  import Config

  config :my_app,
    url: unquote(value("https://example.com")),
    secret_key_base: unquote(secret("SECRET_KEY_BASE", env: "SECRET_KEY_BASE"))
end
```

The block is captured and rendered; it is not evaluated. HostKit currently interprets only strict placeholder forms inside `unquote(...)`: `value(literal)` and `secret(literal, literal_opts)`. Use templates for free-form text generation.
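
The capture-and-rewrite idea can be sketched with the standard library: parse source into AST with `Code.string_to_quoted!/1`, replace only the strict placeholder forms with `Macro.prewalk/2`, and print the result with `Macro.to_string/1`. `ExsPlaceholderSketch` and its `resolve_secret` callback are hypothetical, and the sketch starts from a source string for simplicity, whereas the DSL captures a `do` block.

```elixir
defmodule ExsPlaceholderSketch do
  # Replaces only unquote(value(literal)) and unquote(secret(literal, literal_opts)),
  # then prints the AST back to source. The code itself is never evaluated.
  def render(source, resolve_secret) do
    source
    |> Code.string_to_quoted!()
    |> Macro.prewalk(fn
      {:unquote, _, [{:value, _, [literal]}]} = node ->
        if Macro.quoted_literal?(literal), do: literal, else: reject!(node)

      {:unquote, _, [{:secret, _, [name, opts]}]} = node ->
        if is_binary(name) and Macro.quoted_literal?(opts) do
          resolve_secret.(name, opts)
        else
          reject!(node)
        end

      # Any other unquote(...) form is rejected rather than evaluated.
      {:unquote, _, _} = node ->
        reject!(node)

      node ->
        node
    end)
    |> Macro.to_string()
  end

  defp reject!(node), do: raise(ArgumentError, "unsupported placeholder: " <> Macro.to_string(node))
end
```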

## Templates

Use `template/2` for deterministic EEx-rendered text resources. Templates are first-class resources in plans and render to ordinary managed files during read/apply.

```elixir
service :forgejo, path: "forgejo" do
  template path(:config, "app.ini"),
    from: "templates/forgejo/app.ini.eex",
    assigns: %{
      domain: "git.elixir.toys",
      data_dir: path(:data),
      repositories_dir: path(:data, "repositories")
    },
    owner: "root",
    group: service_user(),
    mode: 0o640
end
```

In DSL configs, relative `from:` paths are resolved relative to the declaring config file. Runtime code may use absolute `from:` paths or inline `source:`:

```elixir
HostKit.Resources.Template.new("/etc/app.conf",
  source: "port=<%= @port %>\n",
  assigns: %{port: 4000}
)
```

Templates support regular EEx bindings (`<%= port %>`) and assigns syntax (`<%= @port %>`). Assign keys must be atoms because they become EEx bindings. Keep templates inspectable and deterministic; do not hide runtime behavior in templates. Template assigns may contain `%HostKit.Secret{}` references; plans show public assign diffs and redacted assign names without resolving secret values. `:redacted` assign values are useful for modeling existing generated values but cannot be rendered or applied by HostKit.
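
Both binding styles can be enabled at once with plain `EEx.eval_string/2`: pass each assign as a binding and the whole map as `assigns`. This is a sketch, not HostKit's renderer; `TemplateSketch` is hypothetical, and it illustrates why assign keys must be atoms, since Elixir variable bindings are keyed by atoms.

```elixir
defmodule TemplateSketch do
  # <%= port %> reads a plain binding; <%= @port %> reads from the `assigns` binding.
  def render(source, assigns) when is_map(assigns) do
    if not Enum.all?(Map.keys(assigns), &is_atom/1) do
      raise ArgumentError, "assign keys must be atoms because they become EEx bindings"
    end

    EEx.eval_string(source, [assigns: assigns] ++ Map.to_list(assigns))
  end
end

TemplateSketch.render("port=<%= @port %>\n", %{port: 4000})
#=> "port=4000\n"
```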

## Read, audit, and facts APIs

Runtime APIs are primary; Mix tasks wrap them. Besides `HostKit.plan/2`, projects expose focused read/audit helpers:

```elixir
{:ok, current_resources} = HostKit.Project.read(project, target: HostKit.Target.local(:prod))
{:ok, audit_plan} = HostKit.Project.audit(project, target: HostKit.Target.local(:prod))
{:ok, facts} = HostKit.Facts.collect(HostKit.Target.local(:prod), only: [:os, :users, :systemd, :ports])
```

`read/2` returns the current snapshots captured for each desired resource. `audit/2` returns the same plan shape as `HostKit.plan/2`, so callers can inspect creates, updates, deletes, read errors, and no-ops without going through Mix tasks. CLI wrappers are available as `mix host_kit.read`, `mix host_kit.audit`, and `mix host_kit.facts`.

## Command argv builder

Use `argv/2` when a service command has many CLI options. It keeps argv inspectable without hand-writing long flag lists.

```elixir
daemon :search do
  exec argv(path(:bin, "mix"),
    args: ["exograph.web"],
    opts: [
      backend: "duckdb",
      manifest_path: path(:data, "hex-manifest.json"),
      duckdb_memory_limit: "2GB",
      port: 4200
    ]
  )
end
```

Option styles are configurable:

```elixir
argv("cmd", opts: [foo_bar: "baz"], style: :gnu)         # --foo-bar baz
argv("cmd", opts: [foo_bar: "baz"], style: :equals)      # --foo-bar=baz
argv("cmd", opts: [foo_bar: "baz"], style: :single_dash) # -foo-bar baz
argv("cmd", opts: [f: "baz", v: true], style: :short)    # -f baz -v
argv("cmd", opts: [foo_bar: "baz"], style: :underscore)  # --foo_bar baz
```

Options set to `true` emit a bare flag, `false` and `nil` options are omitted, and list values repeat the option once per element.
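
A minimal sketch of those rules for each `style:` value. `ArgvSketch` is hypothetical; placing options after the positional `args:` and emitting a bare flag for `true` under `:equals` are assumptions of this sketch.

```elixir
defmodule ArgvSketch do
  # Builds a flat argv list: executable, positional args, then options.
  def build(executable, opts \\ []) do
    style = Keyword.get(opts, :style, :gnu)
    positional = Keyword.get(opts, :args, [])
    options = Enum.flat_map(Keyword.get(opts, :opts, []), &option(&1, style))

    [to_string(executable) | positional] ++ options
  end

  # false/nil are omitted, true is a bare flag, lists repeat the option.
  defp option({_key, value}, _style) when value in [false, nil], do: []
  defp option({key, true}, style), do: [flag(key, style)]

  defp option({key, values}, style) when is_list(values),
    do: Enum.flat_map(values, &option({key, &1}, style))

  defp option({key, value}, :equals), do: ["#{flag(key, :equals)}=#{value}"]
  defp option({key, value}, style), do: [flag(key, style), to_string(value)]

  defp flag(key, style) do
    name = Atom.to_string(key)

    case style do
      :underscore -> "--" <> name
      :short -> "-" <> name
      :single_dash -> "-" <> String.replace(name, "_", "-")
      _gnu_or_equals -> "--" <> String.replace(name, "_", "-")
    end
  end
end
```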

BEAM command builders wrap the same argv structure:

```elixir
mix("ecto.migrate", opts: [quiet: true])
mix("exograph.web", opts: [port: 4200])
elixir("script.exs", opts: [name: "demo"])
elixir(args: ["--version"])
eval("IO.puts(:ok)")
```

These builders return `%HostKit.CommandLine{}` structs and can be used anywhere `exec:` or `exec_start:` accepts a command line. In the DSL, `mix`, `elixir`, and `eval` resolve their executables under the `:bin` root (`path(:bin, "mix")` / `path(:bin, "elixir")`), so projects can override the executable root with `roots bin: ...`.

## Systemd unit names

`daemon`, `job`, and `schedule` normalize systemd unit suffixes. String names without a suffix get the matching one, and names that already end in `.service` or `.timer` are kept as-is. Atom names get the configured `:unit` prefix, with underscores turned into dashes.

```elixir
daemon "custom" do ... end      # custom.service
schedule "custom" do ... end    # custom.timer

daemon :health_alert do ... end    # e.g. toys-health-alert.service
schedule :health_alert do ... end  # e.g. toys-health-alert.timer
```

Use raw `systemd_service`/`systemd_timer` only when you intentionally want the low-level resource constructor.
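
The suffix and prefix rules can be sketched as a small pure function. `UnitNameSketch` is hypothetical, the `"toys-"` prefix stands in for the configured `:unit` prefix, and giving `job` a `.service` suffix is an assumption.

```elixir
defmodule UnitNameSketch do
  # Suffix per DSL kind; mapping `job` to ".service" is an assumption.
  @suffixes %{daemon: ".service", job: ".service", schedule: ".timer"}

  # Atom names: dash the name, then apply the unit prefix.
  def normalize(kind, name, unit_prefix) when is_atom(name) do
    dashed = name |> Atom.to_string() |> String.replace("_", "-")
    normalize(kind, unit_prefix <> dashed, unit_prefix)
  end

  # String names: keep an existing suffix, otherwise add the kind's suffix.
  def normalize(kind, name, _unit_prefix) when is_binary(name) do
    if String.ends_with?(name, [".service", ".timer"]) do
      name
    else
      name <> Map.fetch!(@suffixes, kind)
    end
  end
end
```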

## Timer schedule helpers

`schedule` supports typed helpers for common systemd timer shapes while keeping raw systemd calendar syntax available through `timer on_calendar: ...`.

```elixir
schedule :backup do
  daily at: ~T[02:30:00]
  jitter "15m"
  persistent true
  wanted_by :timers
end

schedule :weekly_maintenance do
  weekly :monday, at: "03:00"
end

schedule :monthly_report do
  monthly day: 1, at: "04:00"
end
```

- `daily at: time` renders `*-*-* HH:MM:SS`.
- `weekly day, at: time` renders `Day *-*-* HH:MM:SS`.
- `monthly day: n, at: time` renders `*-*-NN HH:MM:SS`.
- Times may be `Time` structs or strict `"HH:MM"` / `"HH:MM:SS"` strings.
- `jitter value` sets `RandomizedDelaySec`.
- `repeat_after value` sets `OnUnitActiveSec`.
- `after_boot value` and `on_boot value` set `OnBootSec`.
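
A minimal sketch of how the `daily`, `weekly`, and `monthly` helpers could render `OnCalendar=` expressions, including strict time parsing. `CalendarSketch` is hypothetical and takes positional arguments instead of the DSL's keyword form; `jitter`, `repeat_after`, and boot timers are not modeled.

```elixir
defmodule CalendarSketch do
  # systemd weekday abbreviations used in OnCalendar= expressions.
  @days %{
    monday: "Mon", tuesday: "Tue", wednesday: "Wed", thursday: "Thu",
    friday: "Fri", saturday: "Sat", sunday: "Sun"
  }

  def daily(at), do: "*-*-* " <> clock(at)
  def weekly(day, at), do: Map.fetch!(@days, day) <> " *-*-* " <> clock(at)

  def monthly(day, at) when day in 1..31,
    do: "*-*-" <> String.pad_leading(Integer.to_string(day), 2, "0") <> " " <> clock(at)

  # Accepts Time structs or strict "HH:MM" / "HH:MM:SS" strings.
  defp clock(%Time{} = time), do: Calendar.strftime(time, "%H:%M:%S")

  defp clock(text) when is_binary(text) do
    case Regex.run(~r/\A(\d{2}):(\d{2})(?::(\d{2}))?\z/, text, capture: :all_but_first) do
      [h, m] -> clock(h, m, "00")
      [h, m, ""] -> clock(h, m, "00")
      [h, m, s] -> clock(h, m, s)
      nil -> raise ArgumentError, "expected HH:MM or HH:MM:SS, got: #{inspect(text)}"
    end
  end

  # Time.new/3 rejects out-of-range values such as "25:00".
  defp clock(h, m, s) do
    {:ok, time} = Time.new(String.to_integer(h), String.to_integer(m), String.to_integer(s))
    clock(time)
  end
end
```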

## Runtime isolation

HostKit uses shared runtime isolation structs for persistent systemd units and future transient Unitctl workloads:

```elixir
sandbox = HostKit.Runtime.Sandbox.new(:strict_web)
resources = HostKit.Runtime.Resources.new(memory_max: "512M", cpu_quota: "50%")

sandbox_options = HostKit.Runtime.Sandbox.to_systemd_service_options(sandbox)
resource_options = HostKit.Runtime.Resources.to_systemd_service_options(resources)
```

Built-in profiles include `:web_service`, `:strict_web`, `:strict_app`, `:small`, `:medium`, and `:large`.

The daemon DSL exposes a human-oriented isolation block for common service hardening:

```elixir
service :api do
  storage :data, mode: 0o750

  daemon do
    exec ["/opt/api/bin/server"]

    isolate do
      memory_max "512M"
      writable :data
      network :loopback
    end
  end
end
```

`daemon do ... end` derives the unit name from the enclosing service and enables it for `multi-user.target` by default. Use explicit systemd directives only when you need non-default unit behavior.

## Runtime controls

HostKit exposes Unitctl as its core transient runtime layer:

```elixir
{:ok, spec} =
  HostKit.Runtime.Spec.new(
    name: "demo-check",
    command: ["/usr/bin/env", "true"],
    sandbox: %{no_new_privileges: true, private_tmp: true}
  )

{:ok, instance} = HostKit.Runtime.start(spec)
{:ok, state} = HostKit.Runtime.status(instance)
:ok = HostKit.Runtime.stop(instance)
```

## Mix tasks

```sh
mix host_kit.dump --require toys_infra.exs infra/config.exs
mix host_kit.plan --require toys_infra.exs infra/config.exs
mix host_kit.plan --require toys_infra.exs infra/config.exs --local
mix host_kit.plan --require toys_infra.exs infra/config.exs --local --ignore systemd_service:toys-exograph.service
mix host_kit.plan --require toys_infra.exs infra/config.exs --remote elixir.toys --user dannote --sudo
mix host_kit.apply --require toys_infra.exs infra/config.exs --local --dry-run
mix host_kit.render --require toys_infra.exs infra/config.exs systemd_service toys-exograph.service
```