Tank.Reconciler (Tank v0.1.0)


The level-triggered control loop that converges running pods to the desired state in Tank.Store. This closes the declarative loop: you pass a pod to Tank.apply/1 and the reconciler starts it; you never start a container imperatively.

Each pass reads Tank.Store.list_pods/0, diffs it against what's tracked, and actuates:

  • desired ∧ ¬tracked → start a Tank.Runtime under the loop's DynamicSupervisor,
  • tracked ∧ ¬desired → stop it,
  • tracked ∧ desired-but-changed → restart it with the new spec.
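
The three cases above can be sketched as a pure function over two maps. This is a minimal illustration, not Tank's implementation: the module DiffSketch, the shape of the maps (pod name => spec), and the example image names are all assumptions made for the sketch.

```elixir
defmodule DiffSketch do
  # Hypothetical helper, not part of Tank. Both arguments are assumed to be
  # maps of pod name => spec; the real reconciler's state may look different.
  def diff(desired, tracked) do
    # desired but not tracked: start
    to_start = for {name, _spec} <- desired, not Map.has_key?(tracked, name), do: name

    # tracked but no longer desired: stop
    to_stop = for {name, _spec} <- tracked, not Map.has_key?(desired, name), do: name

    # tracked and desired, but the spec differs: restart with the new spec
    to_restart =
      for {name, spec} <- desired,
          Map.has_key?(tracked, name),
          tracked[name] != spec,
          do: name

    %{start: Enum.sort(to_start), stop: Enum.sort(to_stop), restart: Enum.sort(to_restart)}
  end
end

# Illustrative data only.
desired = %{"web" => %{image: "nginx:1.27"}, "db" => %{image: "postgres:16"}}
tracked = %{"db" => %{image: "postgres:15"}, "old" => %{image: "busybox"}}

IO.inspect(DiffSketch.diff(desired, tracked))
```

Here "web" lands in the start list, "old" in the stop list, and "db" in the restart list because its spec changed.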

Resync runs on a timer, which is the source of truth: drift, crashes, and reboots are corrected on the next pass. The loop can also be woken early by nudge/0 (the write path calls it, debounced). Runtimes are started as :temporary children, so the reconciler, not the supervisor, owns restarts.

Crash handling

Each pod carries a status: :running, :backing_off, or :terminal. When a runtime exits unexpectedly, the loop honours the pod's :restart policy (:always restarts on any exit; :on_failure restarts only on an abnormal exit; :never does not restart):

  • restartable → wait min(base · 2ⁿ, cap) then restart (status :backing_off); n resets after a stable run, so a crash loop backs off while an occasional crash recovers fast.
  • not restartable → status :terminal; resync leaves it stopped (until its spec changes or it is deleted).
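
The backoff formula and the policy check can be sketched as follows. This is an illustration under stated assumptions, not Tank's code: the module CrashSketch is hypothetical, n is assumed to start at 0 for the first restart, and treating only :normal and :shutdown as normal exits is a guess at what "abnormal" means here. The default base and cap are the documented defaults.

```elixir
defmodule CrashSketch do
  # Hypothetical helpers, not part of Tank.

  # Delay in ms before restart number n: min(base * 2^n, cap).
  # Assumption: n counts from 0 and resets after a stable run.
  def delay(n, base \\ 10_000, cap \\ 300_000) when is_integer(n) and n >= 0 do
    min(base * Integer.pow(2, n), cap)
  end

  # Whether an exit is restartable under the pod's :restart policy.
  # Assumption: :normal and :shutdown are the only "normal" exit reasons.
  def restartable?(:always, _reason), do: true
  def restartable?(:never, _reason), do: false
  def restartable?(:on_failure, reason), do: reason not in [:normal, :shutdown]
end

IO.inspect(Enum.map(0..5, &CrashSketch.delay/1))
```

With the default bounds the delays double from 10 s (10, 20, 40, 80, 160 s) and are capped at 300 s from the sixth consecutive restart on.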

Options

  • :runtime — the runtime module (default Tank.Runtime); injectable for tests.
  • :owner — forwarded to each runtime's :owner (default: none).
  • :runtime_opts — extra opts merged into each runtime's start (e.g. [image: [cache: …]]).
  • :interval — resync period in ms (default 5000).
  • :backoff_base / :backoff_cap — restart backoff bounds in ms (defaults 10_000 / 300_000, per the PLAN).
  • :stable_window — a run lasting at least this long (ms) resets the backoff (default 600_000).
  • :name — GenServer name (default Tank.Reconciler).
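
As a sketch of how these options might be passed, the child spec below tightens the timings for a development setup. It assumes that child_spec/1 forwards its argument to start_link/1 (the usual GenServer behaviour) and that Tank.Store is already running; the specific values are illustrative, not recommendations.

```elixir
# Illustrative values only; the option names come from the list above.
children = [
  {Tank.Reconciler,
   interval: 1_000,
   backoff_base: 500,
   backoff_cap: 30_000,
   stable_window: 60_000}
]

# In an application that depends on Tank, this list would typically be handed
# to a supervisor, for example:
#
#     Supervisor.start_link(children, strategy: :one_for_one)
IO.inspect(children)
```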

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

nudge(server \\ __MODULE__)

@spec nudge(GenServer.server()) :: :ok

Wake the loop early (debounced). Best-effort: a no-op if not running.

running(server \\ __MODULE__)

@spec running(GenServer.server()) :: %{optional(String.t()) => pid()}

The pods currently running, as name => runtime_pid.

start_link(opts \\ [])

@spec start_link(keyword()) :: GenServer.on_start()

status(server \\ __MODULE__)

@spec status(GenServer.server()) :: %{optional(String.t()) => map()}

Each tracked pod's %{status:, retries:}. Mainly for introspection/tests.

sync(server \\ __MODULE__, timeout \\ 5000)

@spec sync(GenServer.server(), timeout()) :: :ok

Force a synchronous resync pass and return. Mainly for tests.