ExAtlas.Orchestrator (ExAtlas v0.5.0)


Opt-in OTP orchestration for transient-per-user compute sessions.

The core ExAtlas API is stateless — each call hits the provider directly. The orchestrator adds one lightweight GenServer per spawned resource. That server:

  • Holds the resource's metadata (id, auth handle, proxy URL, user context).
  • Heartbeats the resource via touch/1 so idle sessions auto-terminate.
  • Traps exits and calls ExAtlas.terminate/2 on shutdown, so pods are not leaked when the tracker stops.
  • Broadcasts state changes over Phoenix.PubSub so LiveViews can react.
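The heartbeat and trap-exit behaviour in the list above can be illustrated with plain OTP. The following is a minimal sketch, not ExAtlas's actual implementation: the module name IdleTrackerSketch and the :on_terminate option are hypothetical, and the callback stands in for the call to ExAtlas.terminate/2.

```elixir
defmodule IdleTrackerSketch do
  @moduledoc false
  use GenServer

  # Hypothetical options: :id, :idle_ttl_ms, :on_terminate (a 1-arity function).
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Heartbeat: resets the idle countdown.
  def touch(pid), do: GenServer.call(pid, :touch)

  @impl true
  def init(opts) do
    # Trap exits so terminate/2 runs when a supervisor shuts this process down.
    Process.flag(:trap_exit, true)

    state = %{
      id: Keyword.fetch!(opts, :id),
      idle_ttl_ms: Keyword.fetch!(opts, :idle_ttl_ms),
      on_terminate: Keyword.fetch!(opts, :on_terminate)
    }

    # The third element is a GenServer timeout: a :timeout message arrives
    # if no other message is received within idle_ttl_ms.
    {:ok, state, state.idle_ttl_ms}
  end

  @impl true
  def handle_call(:touch, _from, state) do
    # Returning the timeout again restarts the idle countdown.
    {:reply, :ok, state, state.idle_ttl_ms}
  end

  @impl true
  def handle_info(:timeout, state) do
    # No heartbeat within the TTL: stop normally, which invokes terminate/2.
    {:stop, :normal, state}
  end

  # Any other message (for example a trapped exit) also restarts the countdown.
  def handle_info(_other, state), do: {:noreply, state, state.idle_ttl_ms}

  @impl true
  def terminate(_reason, state) do
    # Stand-in for releasing the upstream resource.
    state.on_terminate.(state.id)
    :ok
  end
end
```

Using the built-in GenServer timeout keeps the sketch short. A production tracker would more likely use an explicit timer so that unrelated messages do not extend the session.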

The full supervision tree (Registry + DynamicSupervisor + PubSub + Reaper) only starts when you opt in:

# config/config.exs
config :ex_atlas, start_orchestrator: true

When opted out (the default), ExAtlas boots with no processes — library-only consumers never pay for processes they don't use.
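One common way to implement this kind of opt-in is to build the child list from a flag. The sketch below assumes that pattern and is not taken from the ExAtlas source. OptInChildrenSketch and its process names are hypothetical, and the PubSub and Reaper children are omitted so the example depends only on OTP.

```elixir
defmodule OptInChildrenSketch do
  @moduledoc false

  # Returns child specs only when the caller has opted in.
  def children(start_orchestrator?) do
    if start_orchestrator? do
      [
        # Maps resource ids to tracker pids.
        {Registry, keys: :unique, name: OptInChildrenSketch.Registry},
        # Starts one tracker process per spawned resource.
        {DynamicSupervisor,
         name: OptInChildrenSketch.DynamicSupervisor, strategy: :one_for_one}
      ]
    else
      []
    end
  end

  def start(start_orchestrator?) do
    Supervisor.start_link(children(start_orchestrator?), strategy: :one_for_one)
  end
end
```

In an application callback the flag would typically be read with Application.get_env(:ex_atlas, :start_orchestrator, false).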

Spawning a tracked resource

{:ok, pid, compute} =
  ExAtlas.Orchestrator.spawn(
    provider: :runpod,
    gpu: :h100,
    image: "pytorch/pytorch:2.5.0-cuda12.1-cudnn9-runtime",
    ports: [{8000, :http}],
    auth: :bearer,
    user_id: current_user.id,
    idle_ttl_ms: 15 * 60_000
  )

# LiveView can subscribe for state changes:
Phoenix.PubSub.subscribe(ExAtlas.PubSub, "compute:" <> compute.id)

Heartbeating

# Any time the user is still actively using the session:
ExAtlas.Orchestrator.touch(compute.id)

If no heartbeat arrives within idle_ttl_ms, the resource is terminated gracefully.
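Because touch/1 is specified to return {:error, :not_tracked} for an unknown id, a caller may want to handle a session that has already expired. A minimal sketch follows. HeartbeatSketch, keep_alive/2, and the :expired return value are hypothetical names, and the touch function is injectable so the helper can be exercised without a running orchestrator.

```elixir
defmodule HeartbeatSketch do
  @moduledoc false
  require Logger

  # touch_fun defaults to the documented function; pass a stub in tests.
  def keep_alive(id, touch_fun \\ &ExAtlas.Orchestrator.touch/1) do
    case touch_fun.(id) do
      :ok ->
        :ok

      {:error, :not_tracked} ->
        # The tracker is gone (idle timeout or manual stop), so the caller
        # should treat the session as finished.
        Logger.warning("session #{id} is no longer tracked")
        :expired
    end
  end
end
```

A LiveView could call HeartbeatSketch.keep_alive(compute.id) from its event handlers and redirect the user when it returns :expired.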

Manual termination

:ok = ExAtlas.Orchestrator.stop_tracked(compute.id)

Summary

Functions

info(id)
  Fetch the latest tracked state for a resource.

list_ids()
  Return the list of currently-tracked resource ids.

spawn(opts)
  Spawn a compute resource under supervision.

stop_tracked(id)
  Gracefully stop tracking and terminate the upstream resource.

touch(id)
  Record activity so the idle-reaper keeps the resource alive.

Functions

info(id)

@spec info(String.t()) :: {:ok, map()} | {:error, :not_tracked}

Fetch the latest tracked state for a resource.

list_ids()

@spec list_ids() :: [String.t()]

Return the list of currently-tracked resource ids.
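list_ids/0 and info/1 can be combined to snapshot every tracked resource. A resource may stop between the two calls, so the sketch below skips ids for which info/1 returns {:error, :not_tracked}. TrackedSnapshotSketch is a hypothetical helper, and both functions are injectable so it can run without the orchestrator.

```elixir
defmodule TrackedSnapshotSketch do
  @moduledoc false

  # Builds a map of id => state for every resource that is still tracked.
  def snapshot(
        list_ids_fun \\ &ExAtlas.Orchestrator.list_ids/0,
        info_fun \\ &ExAtlas.Orchestrator.info/1
      ) do
    # The {:ok, state} pattern acts as a filter: ids whose lookup returns
    # {:error, :not_tracked} are dropped instead of raising.
    for id <- list_ids_fun.(), {:ok, state} <- [info_fun.(id)], into: %{} do
      {id, state}
    end
  end
end
```

The shape of the state map is not documented here, so callers should not rely on specific keys without checking the info/1 result.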

spawn(opts)

@spec spawn(keyword()) :: {:ok, pid(), ExAtlas.Spec.Compute.t()} | {:error, term()}

Spawn a compute resource under supervision.

Returns {:ok, pid, compute}, where pid is the tracking GenServer and compute is the ExAtlas.Spec.Compute struct that ExAtlas.spawn_compute/1 would normally return.

stop_tracked(id)

@spec stop_tracked(String.t()) :: :ok | {:error, :not_tracked}

Gracefully stop tracking and terminate the upstream resource.

touch(id)

@spec touch(String.t()) :: :ok | {:error, :not_tracked}

Record activity so the idle-reaper keeps the resource alive.