This is the scenario ExAtlas was built for: a Phoenix app spawns a GPU pod per active user, the user's browser talks directly to the pod, and the pod is reaped when the user leaves.
Why not proxy through the Phoenix app?
For real-time workloads (video inference, audio transcription, generative streaming) the extra hop doubles latency and forces your Phoenix node to carry per-user bandwidth. Handing the browser a URL that points straight at the pod keeps Phoenix out of the data path.
The flow
Browser                      Phoenix (Fly.io)                RunPod pod
   │                             │                               │
   │ 1. open session             │                               │
   ├────────────────────────────►│                               │
   │                             │ 2. spawn_compute              │
   │                             ├──────────────────────────────►│
   │                             │ (inject ATLAS_PRESHARED_KEY env var)
   │ 3. {url, token}             │◄──────────────────────────────┤
   │◄────────────────────────────┤                               │
   │                                                             │
   │ 4. inference over HTTPS with Authorization: Bearer <token>  │
   ├────────────────────────────────────────────────────────────►│
   │                                                             │
   │ 5. touch heartbeats         │                               │
   ├────────────────────────────►│                               │
   │                             │                               │
   │ 6. idle_ttl_ms passes with no heartbeat                     │
   │                             │ 7. terminate                  │
   │                             ├──────────────────────────────►│

Implementation
The LiveView
defmodule MyAppWeb.InferenceLive do
use MyAppWeb, :live_view
@idle_ttl_ms 15 * 60_000 # 15 minutes
  def mount(_params, _session, socket) do
    if connected?(socket) do
      # mount/3 also runs for the initial static render; only spawn the
      # (expensive) pod once the websocket is actually up, or every page
      # load would spawn twice.
      {:ok, _pid, compute} =
        ExAtlas.Orchestrator.spawn(
          gpu: :h100,
          image: "ghcr.io/me/my-inference-server:latest",
          ports: [{8000, :http}],
          auth: :bearer,
          user_id: socket.assigns.current_user.id,
          idle_ttl_ms: @idle_ttl_ms,
          name: "atlas-" <> to_string(socket.assigns.current_user.id)
        )

      Phoenix.PubSub.subscribe(ExAtlas.PubSub, "compute:" <> compute.id)

      {:ok,
       assign(socket,
         compute_id: compute.id,
         inference_url: hd(compute.ports).url,
         inference_token: compute.auth.token
       )}
    else
      {:ok, assign(socket, compute_id: nil, inference_url: nil, inference_token: nil)}
    end
  end
  def handle_event("ping", _params, socket) do
    _ = ExAtlas.Orchestrator.touch(socket.assigns.compute_id)
    {:noreply, socket}
  end

  def handle_info({:atlas_compute, _id, {:status, :terminated}}, socket) do
    {:noreply,
     socket
     |> put_flash(:info, "Inference session ended")
     |> redirect(to: ~p"/")}
  end

  def handle_info({:atlas_compute, _id, _other}, socket), do: {:noreply, socket}
  def terminate(_reason, socket) do
    # LiveView process is dying; cut the pod short to save $.
    # compute_id is nil if we never made it past the static render.
    if id = socket.assigns.compute_id do
      _ = ExAtlas.Orchestrator.stop_tracked(id)
    end

    :ok
  end
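
  # A minimal sketch of how the URL and token reach the browser. The hook
  # name "Inference" is hypothetical; the corresponding JS would read these
  # data attributes and talk to the pod directly. Until the connected mount
  # runs, the assigns are nil and HEEx simply omits the attributes.
  def render(assigns) do
    ~H"""
    <div
      id="inference-session"
      phx-hook="Inference"
      data-url={@inference_url}
      data-token={@inference_token}
    >
    </div>
    """
  end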
end

The inference server (inside the pod)
defmodule InferenceServer do
@moduledoc """
Minimal Plug app running inside the RunPod pod. Rejects any request
that doesn't carry the preshared key injected by ExAtlas.
"""
import Plug.Conn
@behaviour Plug
  def init(opts), do: opts

  def call(conn, _opts) do
    if authenticated?(conn) do
      handle(conn)
    else
      conn |> send_resp(401, "unauthorized") |> halt()
    end
  end

  defp authenticated?(conn) do
    preshared = System.fetch_env!("ATLAS_PRESHARED_KEY")

    case get_req_header(conn, "authorization") do
      ["Bearer " <> token] -> Plug.Crypto.secure_compare(token, preshared)
      _ -> false
    end
  end
defp handle(conn) do
# ... your inference logic ...
end
end
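
To sanity-check the bearer path end to end, you can hit the pod from iex. A sketch, assuming Req as the HTTP client and a hypothetical /healthz route on the inference server:

# /healthz is a placeholder route, not part of the server above;
# inference_url and inference_token come from the spawned compute.
Req.get!(inference_url <> "/healthz", auth: {:bearer, inference_token})

Signed URLs for media streams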
A <video> element can't attach an Authorization header, so bearer tokens don't work for media streams. Use ExAtlas.Auth.SignedUrl instead:
# Generate a secret once per pod, inject it via env var (ExAtlas already does
# this when auth: :signed_url)
signed =
ExAtlas.Auth.SignedUrl.sign(
hd(compute.ports).url <> "/video/session-42.m3u8",
secret: compute.auth.token,
expires_in: 3600
)
# In the LiveView template:
<video src={signed} />
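
On the pod side, the stream handler must verify the signature before serving bytes. The exact parameter names and signing scheme are ExAtlas internals not shown here; purely as an illustration, a typical HMAC check over path plus expiry looks like this (exp and sig are placeholder names):

defmodule SignedUrlCheck do
  # Illustrative only: verify an HMAC over the request path and expiry.
  # Assumes Plug.Conn.fetch_query_params/1 has already run on the conn,
  # and that "exp"/"sig" match whatever ExAtlas.Auth.SignedUrl emits.
  def valid?(conn, secret) do
    with %{"exp" => exp, "sig" => sig} <- conn.query_params,
         {exp_int, ""} <- Integer.parse(exp),
         true <- exp_int > System.system_time(:second) do
      expected =
        :crypto.mac(:hmac, :sha256, secret, conn.request_path <> ":" <> exp)
        |> Base.url_encode64(padding: false)

      Plug.Crypto.secure_compare(expected, sig)
    else
      _ -> false
    end
  end
end

Choosing idle_ttl_ms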
- Too short: users blink and the pod dies. Bad UX, repeated cold starts (and RunPod boot times on some GPUs can be 30–90 seconds).
- Too long: abandoned sessions burn $/hour until the reaper catches them.
A good default is 2–3× your expected user-idle window. If your app
sends a :ping every 30 seconds and users normally stay active,
idle_ttl_ms: 120_000 is reasonable. For exploratory/bursty tools
(generative art, Jupyter-like), go higher (10–15 min).
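
The :ping above is assumed to come from a client-side hook. If you'd rather keep the heartbeat server-side, a timer in the LiveView works too, since the LiveView process only lives while the socket is connected. A sketch:

# In mount/3, inside the connected?(socket) branch:
:timer.send_interval(30_000, self(), :heartbeat)

# And alongside the other handle_info clauses:
def handle_info(:heartbeat, socket) do
  _ = ExAtlas.Orchestrator.touch(socket.assigns.compute_id)
  {:noreply, socket}
end

The trade-off: this keeps the pod alive for as long as the tab is open, even if the user walked away, so it only fits apps where "tab open" is a good proxy for "user active".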
What the orchestrator protects against
- Node crashes. When the Phoenix node restarts, the Reaper finds orphan pods (live on RunPod, not tracked locally, name prefix matches) and terminates them within :reap_interval_ms.
- LiveView disconnect without clean shutdown. The ComputeServer's idle timer fires regardless of what's talking to it.
- Provider API hiccups. terminate/2 errors are logged and broadcast as {:terminate_failed, error} but don't cause the server to hang.
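
If you want to surface a failed teardown in the UI, you can match on that broadcast in the LiveView. A sketch, assuming the event arrives in the same {:atlas_compute, id, event} envelope as the status messages above; place it before the catch-all handle_info clause:

# Needs `require Logger` in the module; the envelope shape is assumed
# to match the {:atlas_compute, id, event} messages shown earlier.
def handle_info({:atlas_compute, id, {:terminate_failed, error}}, socket) do
  Logger.warning("pod #{id} failed to terminate: #{inspect(error)}")
  {:noreply, put_flash(socket, :error, "Couldn't cleanly end your session")}
end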
Pitfalls
- Don't share a single pod across users unless you've designed for isolation. The preshared-key model assumes one key per pod.
- Don't put the orchestrator in a cluster-shared PubSub — ExAtlas's PubSub is per-node. If you need cluster-wide visibility, subscribe from each node and reduce upstream.
- Don't spawn pods from an unsupervised Task.start/1. If the task crashes between the provider call and the ComputeServer start, the pod is live on the cloud but untracked. The Reaper will eventually catch it, but your budget won't thank you. A supervised alternative is sketched below.
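
A sketch of that alternative: async/await under a Task.Supervisor links the task to the caller, so a crash between the provider call and tracking propagates instead of silently leaking a pod. MyApp.TaskSupervisor is an assumed name here:

# Start the supervisor in your application's supervision tree with
# {Task.Supervisor, name: MyApp.TaskSupervisor} before using this.
task =
  Task.Supervisor.async(MyApp.TaskSupervisor, fn ->
    ExAtlas.Orchestrator.spawn(
      gpu: :h100,
      image: "ghcr.io/me/my-inference-server:latest",
      idle_ttl_ms: 120_000
    )
  end)

# Crashes inside the task re-raise here instead of vanishing.
{:ok, _pid, compute} = Task.await(task, 30_000)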