Condukt.Sandbox.Kubernetes (Condukt v1.2.0)


Sandbox that runs each session inside a dedicated Kubernetes Pod.

One Pod per session. All filesystem reads and writes and all subprocess execution happen inside the Pod via the Kubernetes exec API. The agent cannot reach the host running the Condukt BEAM process.

Idempotent init via the session id

init/1 is idempotent on a stable id: it derives a deterministic Pod name from it and either adopts an existing Pod or creates a fresh one. The session's :id (passed to Condukt.Session.start_link/1, or auto-generated) flows into the sandbox by default, so a single id at the session level drives both the session and the Pod. This is the recommended pattern for Oban-style workers where the job lifecycle and the Pod lifecycle are decoupled:

defmodule MyApp.AgentWorker do
  use Oban.Worker, queue: :agents, max_attempts: 3

  @impl true
  def perform(%Oban.Job{id: job_id, args: %{"prompt" => prompt}}) do
    {:ok, agent} =
      MyApp.CodingAgent.start_link(
        id: job_id,
        api_key: System.get_env("ANTHROPIC_API_KEY"),
        sandbox: {Condukt.Sandbox.Kubernetes, namespace: "agents"}
      )

    Condukt.Session.run(agent, prompt)
  end
end

If the job is retried after a crash, the same job_id flows through and the sandbox reattaches to the existing Pod. Repo clones and in-progress file edits persist (they live in an emptyDir volume mounted at the session cwd, which survives container restarts within the same Pod).
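
To make the reattach behaviour concrete, here is a minimal sketch of how a deterministic name could be derived from an id. The module `PodNameSketch`, the `condukt-` prefix, and the choice of digest are all assumptions made for illustration; Condukt's actual naming scheme is not described in this section and may differ.

```elixir
defmodule PodNameSketch do
  @moduledoc """
  Hypothetical illustration only. This is NOT Condukt's real naming scheme.
  """

  # Assumed prefix, chosen for the example.
  @prefix "condukt-"

  # The same id always maps to the same name, so a retried job that passes
  # the same id would look up (and find) the Pod created by the first attempt.
  def pod_name(id) do
    fingerprint =
      id
      |> to_string()
      # MD5 is used here purely as a stable fingerprint, not for security.
      |> :erlang.md5()
      |> Base.encode16(case: :lower)
      |> binary_part(0, 16)

    # Lowercase hex plus a fixed prefix stays within Kubernetes naming rules.
    @prefix <> fingerprint
  end
end
```

Because the name depends only on the id, no lookup table is needed to find the Pod again after a crash: the retry recomputes the same name and checks whether that Pod exists.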

Pass :id explicitly on the sandbox spec only when you want the pod identity to diverge from the session identity. An explicit value wins over the session-supplied default. When no id is supplied at the session level, one is generated and the pod is single-use: shutdown/1 deletes it.

Init options

  • :id — stable identifier used to derive the pod name. Defaults to the session id when invoked through Condukt.Session. Pass it explicitly on the sandbox spec only to diverge from the session id.
  • :namespace — Kubernetes namespace (default "default").
  • :image — container image (default "debian:bookworm-slim").
  • :cwd — working directory inside the pod, also where the workspace volume is mounted (default "/workspace").
  • :env — environment variables to set on the pod container, as a map or list of {key, value} pairs.
  • :labels — additional pod labels (caller-supplied; merged on top of Condukt's defaults).
  • :annotations — additional pod annotations.
  • :resources — Kubernetes resource requests/limits map, e.g. %{requests: %{cpu: "500m", memory: "1Gi"}, limits: %{cpu: "2", memory: "4Gi"}}.
  • :service_account — Kubernetes ServiceAccount the pod runs as.
  • :active_deadline_seconds — hard ceiling on the pod's lifetime, enforced by Kubernetes itself (default 8 hours). Acts as insurance against abandoned pods.
  • :heartbeat_interval — milliseconds between pod heartbeat annotation updates (default 60_000). Pass false to disable. Use reap_stale/1 from a separate process to delete pods whose heartbeat is too old.
  • :workspace_source — git repository to clone into the workspace at init. Accepts a git URL string or a keyword/map with :git and optional :ref. The runtime image must include git.
  • :workspace_source_timeout — milliseconds to wait for the workspace clone or checkout command (default 300_000).
  • :ready_timeout — milliseconds to wait for a created pod to reach Running phase (default 120_000).
  • :on_stale — what to do when adopting a pod that is in an unexpected phase (Succeeded / Failed). :error (default) returns {:error, {:stale_pod, phase}}; :recreate deletes and recreates.
  • :delete_on_shutdown — whether shutdown/1 deletes the pod. Defaults to false when :id is supplied (the pod outlives any single BEAM process), true when no id is given.
  • :conn — already-built K8s.Conn. Skips kubeconfig/in-cluster resolution.
  • :kubeconfig — path to a kubeconfig file (default ~/.kube/config).
  • :context — kubeconfig context name (default: current-context).
  • :in_cluster — true to use the pod's mounted ServiceAccount token. Auto-detected when KUBERNETES_SERVICE_HOST is set, so usually not needed.
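
As a rough illustration of how several of these options combine, the sandbox spec below uses only option names from the list above. The values are assumptions: the image name and repository URL are placeholders, and string keys and values for :env are a guess at a reasonable shape.

```elixir
# Sketch of a sandbox spec. All values are illustrative placeholders.
sandbox =
  {Condukt.Sandbox.Kubernetes,
   namespace: "agents",
   # Hypothetical image; it must ship git because :workspace_source is set.
   image: "registry.example.com/agents/runtime:latest",
   cwd: "/workspace",
   # String keys and values are an assumption about the expected shape.
   env: %{"CI" => "true"},
   resources: %{
     requests: %{cpu: "500m", memory: "1Gi"},
     limits: %{cpu: "2", memory: "4Gi"}
   },
   # Placeholder repository URL and ref.
   workspace_source: [git: "https://example.com/acme/app.git", ref: "main"],
   # Replace a pod found in Succeeded/Failed instead of returning an error.
   on_stale: :recreate,
   # Heartbeat every 30 seconds instead of the default 60.
   heartbeat_interval: 30_000}
```

The resulting tuple is what would be passed as the `sandbox:` option when starting a session, as in the worker example earlier on this page.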

RBAC

The Kubernetes identity used by the Condukt process needs:

- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "create", "patch", "delete"]

- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]

See guides/sandbox.md for a full sample Role + RoleBinding.

Limitations

  • mount/3 is not supported. Volumes cannot be added to a running pod.
  • Node failure loses the pod's emptyDir workspace. Mount a PersistentVolumeClaim into the pod manifest if you need cross-node durability. This currently requires a custom :image setup and is not exposed as an init option in v1.
  • :workspace_source shells out to git inside the pod. Use an image that includes git when enabling it.

Functions

heartbeat(state)

Updates the heartbeat annotation on a Kubernetes sandbox pod.

By default the sandbox starts a heartbeat worker tied to the owner process. This helper is exposed for callers that disable that worker (heartbeat_interval: false) and want to drive heartbeats from their own supervision tree.

reap_stale(opts \\ [])

Deletes Condukt-managed pods whose heartbeat annotation is older than :stale_after.

Options:

  • :namespace - namespace to scan, default "default".
  • :stale_after - heartbeat age in milliseconds, default 15 minutes.
  • :now - DateTime used for tests, default DateTime.utc_now().
  • K8s connection options accepted by init/1, such as :conn, :kubeconfig, :context, and :in_cluster.
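
One way to run the reaper "from a separate process" is a small periodic GenServer. The sketch below is an assumption about how a caller might wire this up: the module name `MyApp.PodReaper` and its `:interval`, `:reap`, and `:reap_opts` options are hypothetical, and because the return value of reap_stale/1 is not documented in this section, it is ignored.

```elixir
defmodule MyApp.PodReaper do
  @moduledoc """
  Hypothetical periodic reaper. Module name and options are illustrative.
  """
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      # How often to scan, in milliseconds.
      interval: Keyword.get(opts, :interval, :timer.minutes(5)),
      # Injectable so the loop can be exercised without a cluster.
      reap: Keyword.get(opts, :reap, &Condukt.Sandbox.Kubernetes.reap_stale/1),
      # Options forwarded to the reap function on every tick.
      reap_opts:
        Keyword.get(opts, :reap_opts,
          namespace: "agents",
          stale_after: :timer.minutes(15)
        )
    }

    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:reap, state) do
    # Result shape is not documented here, so it is deliberately discarded.
    _ = state.reap.(state.reap_opts)
    schedule(state.interval)
    {:noreply, state}
  end

  # Re-arm the timer for the next scan.
  defp schedule(interval), do: Process.send_after(self(), :reap, interval)
end
```

Placing `{MyApp.PodReaper, []}` under an application supervisor would keep the scan running independently of any single session. If the reap call raises, the process crashes and the supervisor restarts it.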

terminate(id, opts \\ [])

Explicitly delete the pod backing a session.

When :id is set, the pod outlives the BEAM process by default. Use this function when a session is truly done and the pod should be deleted.

Condukt.Sandbox.Kubernetes.terminate(id, namespace: "agents")
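
A worker that wants the pod kept across retries but removed after success could wrap its work along these lines. This is a sketch under assumptions: `MyApp.AgentCleanup` is a hypothetical helper, the `{:ok, _}` success shape is assumed rather than taken from the Condukt.Session.run documentation, and the return value of terminate/2 is ignored because it is not described here.

```elixir
defmodule MyApp.AgentCleanup do
  @moduledoc """
  Hypothetical helper: delete the pod only once the work has succeeded.
  """

  # `work` is a zero-arity function; `terminate` is injectable for testing.
  def run_then_terminate(id, work, terminate \\ &Condukt.Sandbox.Kubernetes.terminate/2) do
    case work.() do
      {:ok, _} = ok ->
        # Assumed success shape. The session is done, so remove the pod.
        _ = terminate.(id, namespace: "agents")
        ok

      other ->
        # Leave the pod in place so a retry with the same id can reattach.
        other
    end
  end
end
```

Inside an Oban worker this might be called as `MyApp.AgentCleanup.run_then_terminate(job_id, fn -> Condukt.Session.run(agent, prompt) end)`, so failed attempts keep their workspace while a successful run cleans up after itself.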