Every HTTP request that ExAtlas makes emits a :telemetry event, so you
can wire the library into your existing metrics pipeline without writing
provider-specific code.
Events
[:ex_atlas, <provider>, :request]
Emitted after every REST, runtime, or GraphQL call.
Measurements:
| Key | Type | Value |
|---|---|---|
| status | int | HTTP status code |
Metadata:
| Key | Type | Value |
|---|---|---|
| api | atom | :management / :runtime / :graphql |
| method | atom | :get / :post / :delete / ... |
| url | string | Full request URL |
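
As an illustration of how the measurements and metadata above might be consumed, here is a minimal sketch that flattens one request event into a summary map. The module name `MyApp.AtlasRequestSummary` and the `status_class` field are hypothetical and not part of ExAtlas; only the event shape comes from the tables above.

```elixir
defmodule MyApp.AtlasRequestSummary do
  @moduledoc false

  # Collapse an HTTP status code into a coarse class such as "2xx" or "5xx".
  def status_class(status) when is_integer(status) and status >= 100 and status <= 599 do
    "#{div(status, 100)}xx"
  end

  def status_class(_other), do: "unknown"

  # Pattern-match the provider out of the event name and keep only
  # low-cardinality fields (the full URL is deliberately left out).
  def summarize([:ex_atlas, provider, :request], %{status: status}, metadata) do
    %{
      provider: provider,
      api: metadata.api,
      method: metadata.method,
      status_class: status_class(status)
    }
  end
end
```

Matching on the event name means one function can serve every provider, which fits the goal of avoiding provider-specific code.
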
[:ex_atlas, :fly, :token, :acquire] (span)
:start / :stop / :exception events are emitted around every
ExAtlas.Fly.Tokens.get/1 call. Use them to measure cache-hit rate, CLI
acquisition latency, and resolution failures.
:stop metadata:
| Key | Type | Value |
|---|---|---|
| app | string | Fly app name |
| source | atom | :ets / :storage / :config / :cli / :manual / :none (resolution failed) |
| acquirer | atom | :facade (cross-process ETS fast-path hit) / :app_server (slow path or coalesced) |
Measurements follow the standard :telemetry.span/3 shape (system_time and
monotonic_time on :start; duration and monotonic_time on :stop).
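
Because :telemetry.span/3 reports duration in native time units, a handler usually converts it before logging or exporting. The following is a minimal sketch of such a handler for the :stop event; the module name `MyApp.AtlasSpanTiming` and the handler id "atlas-token-timing" are hypothetical.

```elixir
defmodule MyApp.AtlasSpanTiming do
  require Logger

  @stop_event [:ex_atlas, :fly, :token, :acquire, :stop]

  # :duration arrives in native time units; convert to milliseconds.
  def duration_ms(%{duration: duration}) when is_integer(duration) do
    System.convert_time_unit(duration, :native, :millisecond)
  end

  # Handler for the :stop event. The app and source keys are the
  # metadata fields documented for this span.
  def handle_stop(@stop_event, measurements, metadata, _config) do
    Logger.debug(
      "token acquire app=#{metadata.app} source=#{metadata.source} " <>
        "took=#{duration_ms(measurements)}ms"
    )
  end

  # Attach with a module function capture rather than an anonymous function.
  def attach do
    :telemetry.attach("atlas-token-timing", @stop_event, &__MODULE__.handle_stop/4, nil)
  end
end
```
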
Reading source + acquirer together
- source: :ets, acquirer: :facade - pure fast-path cache hit. No mailbox round-trip. This is what you want for the vast majority of requests once caches are warm.
- source: :ets, acquirer: :app_server - coalescing success. The caller entered the AppServer mailbox, and by the time handle_call ran, a concurrent first-mover had already filled ETS. Mostly seen during cold-start thundering herds; proves the per-app serialization is coalescing CLI calls.
- source: :cli, acquirer: :app_server - first-in-line caller doing the actual fly tokens create readonly work. One of these per app per cold start (plus expiries).
- source: :storage, acquirer: :app_server - ETS empty but DETS storage had a valid token. Expect a burst of these right after VM restart.
- source: :none, acquirer: :app_server - full resolution chain miss. Worth alerting on if sustained.
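
The combinations above can be turned into a single metric tag. This is a minimal sketch; the module name `MyApp.AtlasTokenOutcome` and the label atoms are hypothetical names chosen for illustration, and any pairing not listed above falls through to :other.

```elixir
defmodule MyApp.AtlasTokenOutcome do
  # Map the (source, acquirer) pair from :stop metadata to one label.
  def classify(%{source: :ets, acquirer: :facade}), do: :fast_path_hit
  def classify(%{source: :ets, acquirer: :app_server}), do: :coalesced
  def classify(%{source: :cli, acquirer: :app_server}), do: :cli_acquire
  def classify(%{source: :storage, acquirer: :app_server}), do: :storage_restore
  def classify(%{source: :none, acquirer: :app_server}), do: :resolution_failure

  # Combinations not described in the list above (for example :config
  # or :manual sources) are grouped rather than guessed at.
  def classify(_metadata), do: :other
end
```

A sustained rise in :resolution_failure is the case the list singles out as worth alerting on.
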
[:ex_atlas, :fly, :logs, :fetch] (span)
:start / :stop / :exception around ExAtlas.Fly.Logs.Client.fetch_logs/3.
Emitted regardless of whether you call fetch_logs/3 directly or go
through fetch_logs_with_retry/2.
:stop metadata:
| Key | Type | Value |
|---|---|---|
| app | string | Fly app name |
| status | term | :ok / {:error, reason} |
| count | int | Number of entries returned |
Log line content is never included in metadata — Fly log bodies may contain bearer tokens, and we do not want them flowing into a metrics pipeline.
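
Since status is an arbitrary term, it helps to reduce it to a low-cardinality tag before using it as a metric label. The sketch below shows one way to do that; the module name `MyApp.AtlasLogsOutcome` is hypothetical, and defaulting a missing count to 0 is an assumption, since the table does not say whether count is present on errors.

```elixir
defmodule MyApp.AtlasLogsOutcome do
  # Reduce the :status term to a small set of atoms.
  def status_tag(:ok), do: :ok
  def status_tag({:error, reason}) when is_atom(reason), do: reason
  def status_tag({:error, _reason}), do: :error
  def status_tag(_other), do: :unknown

  # Build a flat summary from :stop metadata.
  # Assumption: a missing :count is treated as zero entries.
  def summarize(%{app: app, status: status} = metadata) do
    %{app: app, outcome: status_tag(status), entries: Map.get(metadata, :count, 0)}
  end
end
```
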
[:ex_atlas, :fly, :deploy, :line] and [:ex_atlas, :fly, :deploy, :exit]
Two events from ExAtlas.Fly.Deploy.stream_deploy/3:
- :line fires once per non-empty output line. Its measurements are %{count: 1}, so a Counter reporter sums to total lines.
- :exit fires once when the deploy terminates.
:line metadata:
| Key | Type | Value |
|---|---|---|
| ticket_id | string | The deploy ticket ID |
:exit metadata:
| Key | Type | Value |
|---|---|---|
| ticket_id | string | The deploy ticket ID |
| result | term | :ok / {:error, :timeout} / {:error, {:exit_code, N}} |
Line content is deliberately excluded — Fly build output can contain bearer tokens.
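
To show how the two deploy events fit together, here is a minimal sketch of a pure reducer that tallies :line events per ticket and produces a summary when :exit arrives. The module name `MyApp.AtlasDeployTally` and the summary shape are hypothetical. Where the state lives (a GenServer, an Agent, ETS) is left open.

```elixir
defmodule MyApp.AtlasDeployTally do
  @line [:ex_atlas, :fly, :deploy, :line]
  @exit [:ex_atlas, :fly, :deploy, :exit]

  # State is a map of ticket_id => lines seen so far.
  # Returns {new_state, summary}; summary is nil until the deploy exits.
  def apply_event(state, @line, %{count: n}, %{ticket_id: id}) do
    {Map.update(state, id, n, &(&1 + n)), nil}
  end

  def apply_event(state, @exit, _measurements, %{ticket_id: id, result: result}) do
    # Drop the ticket from state so finished deploys do not accumulate.
    {lines, rest} = Map.pop(state, id, 0)
    {rest, %{ticket_id: id, lines: lines, result: result}}
  end
end
```

Keeping the reducer free of side effects makes it straightforward to unit-test without attaching any handlers.
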
Wiring into Logger
# Logger.info/1 is a macro, so Logger must be required first
require Logger

:telemetry.attach(
  "atlas-http-logger",
  [:ex_atlas, :runpod, :request],
  fn _event, measurements, metadata, _config ->
    Logger.info(
      "ExAtlas → #{metadata.api} #{metadata.method} #{metadata.url} → #{measurements.status}"
    )
  end,
  nil
)

Wiring into :telemetry_metrics
defmodule MyAppWeb.Telemetry do
  use Supervisor
  import Telemetry.Metrics

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    # Add your reporter of choice here and pass it metrics()
    children = []
    Supervisor.init(children, strategy: :one_for_one)
  end

  def metrics do
    [
      # Count RunPod requests, tagged by API and method
      counter("atlas.runpod.request.count",
        event_name: [:ex_atlas, :runpod, :request],
        measurement: :status,
        tags: [:api, :method]
      ),
      # Watch error rates: only count responses with status >= 400
      counter("atlas.runpod.request.errors",
        event_name: [:ex_atlas, :runpod, :request],
        measurement: :status,
        tags: [:api, :method],
        keep: fn _metadata, measurements ->
          measurements.status >= 400
        end
      )
    ]
  end
end

Plug into Grafana / Prometheus / StatsD via whichever reporter you
prefer (TelemetryMetricsPrometheus, TelemetryMetricsStatsd, ...).
Event attachment on application start
defmodule MyApp.AtlasTelemetry do
  @events [
    # Provider HTTP requests
    [:ex_atlas, :runpod, :request],
    [:ex_atlas, :fly, :request],
    [:ex_atlas, :lambda_labs, :request],
    [:ex_atlas, :vast, :request],
    # Fly platform ops (spans emit :start + :stop + :exception)
    [:ex_atlas, :fly, :token, :acquire, :start],
    [:ex_atlas, :fly, :token, :acquire, :stop],
    [:ex_atlas, :fly, :logs, :fetch, :start],
    [:ex_atlas, :fly, :logs, :fetch, :stop],
    [:ex_atlas, :fly, :deploy, :line],
    [:ex_atlas, :fly, :deploy, :exit]
  ]

  def attach do
    :telemetry.attach_many(
      "atlas-telemetry",
      @events,
      &__MODULE__.handle/4,
      nil
    )
  end

  def handle(event, measurements, metadata, _config) do
    # Dispatch to your metrics system
  end
end

# lib/my_app/application.ex
def start(_type, _args) do
  MyApp.AtlasTelemetry.attach()
  # ...
end

Orchestrator events
PubSub broadcasts from the orchestrator are covered in the README —
subscribe to "compute:<id>" on ExAtlas.PubSub for state-change
notifications. These are PubSub messages, not Telemetry events.
If you want Telemetry-style metrics for spawn/terminate counts, wrap
ExAtlas.Orchestrator.spawn/1 in your own helper that emits a Telemetry
event alongside the call.
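
A wrapper of that kind might look like the sketch below. Everything here is an assumption made for illustration: the module name `MyApp.Compute`, the event name [:my_app, :compute, :spawn], and in particular the {:ok, _} / {:error, _} return shape of ExAtlas.Orchestrator.spawn/1, which this guide does not document.

```elixir
defmodule MyApp.Compute do
  # Hypothetical application-level event name, not emitted by ExAtlas.
  @event [:my_app, :compute, :spawn]

  # Calls the orchestrator, then emits one event per spawn attempt.
  # spawn_fun is injectable so the wrapper can be exercised in tests.
  def spawn_compute(opts, spawn_fun \\ &ExAtlas.Orchestrator.spawn/1) do
    result = spawn_fun.(opts)
    :telemetry.execute(@event, %{count: 1}, %{result: result_tag(result)})
    result
  end

  # Assumption: the orchestrator returns {:ok, _} or {:error, _}.
  # Anything else is tagged :unknown instead of raising.
  def result_tag({:ok, _}), do: :ok
  def result_tag({:error, _}), do: :error
  def result_tag(_other), do: :unknown
end
```

The wrapper returns the orchestrator result unchanged, so callers can switch to it without altering their pattern matches.
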