Every HTTP request that ExAtlas makes emits a `:telemetry` event, so you can wire the library into your existing metrics pipeline without writing provider-specific code.

## Events

### `[:ex_atlas, <provider>, :request]`

Emitted after every management (REST), runtime, or GraphQL call.

Measurements:

| Key      | Type  | Value            |
| -------- | ----- | ---------------- |
| `status` | `int` | HTTP status code |

Metadata:

| Key      | Type     | Value                                   |
| -------- | -------- | --------------------------------------- |
| `api`    | `atom`   | `:management` / `:runtime` / `:graphql` |
| `method` | `atom`   | `:get` / `:post` / `:delete` / ...      |
| `url`    | `string` | Full request URL                        |

### `[:ex_atlas, :fly, :token, :acquire]` (span)

`:start` / `:stop` / `:exception` events around every `ExAtlas.Fly.Tokens.get/1` call. Use them to measure cache-hit rate, CLI acquisition latency, and resolution failures.

`:stop` metadata:

| Key        | Type     | Value                                                                               |
| ---------- | -------- | ----------------------------------------------------------------------------------- |
| `app`      | `string` | Fly app name                                                                         |
| `source`   | `atom`   | `:ets` / `:storage` / `:config` / `:cli` / `:manual` / `:none` (resolution failed)   |
| `acquirer` | `atom`   | `:facade` (cross-process ETS fast-path hit) / `:app_server` (slow path or coalesced) |

Measurements follow the standard `:telemetry.span/3` shape (`system_time` on `:start`, `duration` + `monotonic_time` on `:stop`).
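Since `duration` arrives in native time units, a `Telemetry.Metrics` definition (see the wiring section below) can convert it at declaration time. A minimal sketch, assuming a reporter that supports `summary`; the metric name is arbitrary:

```elixir
import Telemetry.Metrics

# Token-acquisition latency in milliseconds, tagged by resolution path.
# :source and :acquirer both live in the :stop metadata.
summary("ex_atlas.fly.token.acquire.stop.duration",
  unit: {:native, :millisecond},
  tags: [:source, :acquirer]
)
```

Swap `summary` for `distribution` if your reporter wants explicit histogram buckets (the Prometheus reporter, for instance).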

#### Reading `source` + `acquirer` together

- `source: :ets, acquirer: :facade` — pure fast-path cache hit. No mailbox round-trip. This is what you want for the vast majority of requests once caches are warm.
- `source: :ets, acquirer: :app_server` — coalescing success. The caller entered the AppServer mailbox, and by the time `handle_call` ran, a concurrent first-mover had already filled ETS. Mostly seen during cold-start thundering herds; proves the per-app serialization is coalescing CLI calls.
- `source: :cli, acquirer: :app_server` — first-in-line caller doing the actual `fly tokens create readonly` work. Expect one of these per app per cold start (plus expiries).
- `source: :storage, acquirer: :app_server` — ETS empty, but DETS storage had a valid token. Expect a burst of these right after a VM restart.
- `source: :none, acquirer: :app_server` — full resolution-chain miss. Worth alerting on if sustained.
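To turn those combinations into a cache-hit-rate signal, a plain handler on the `:stop` event is enough. A minimal sketch; the handler id and `MyApp.Metrics.increment/2` are hypothetical stand-ins for your own metrics API:

```elixir
:telemetry.attach(
  "atlas-token-cache-stats",
  [:ex_atlas, :fly, :token, :acquire, :stop],
  fn _event, _measurements, metadata, _config ->
    # :ets + :facade is the pure fast path; everything else went through
    # the AppServer. Tracking the two counters side by side gives hit rate.
    case {metadata.source, metadata.acquirer} do
      {:ets, :facade} ->
        MyApp.Metrics.increment("atlas.token.cache_hit", app: metadata.app)

      {source, _} ->
        MyApp.Metrics.increment("atlas.token.cache_miss", app: metadata.app, source: source)
    end
  end,
  nil
)
```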

### `[:ex_atlas, :fly, :logs, :fetch]` (span)

`:start` / `:stop` / `:exception` around `ExAtlas.Fly.Logs.Client.fetch_logs/3`. Emitted regardless of whether you call `fetch_logs/3` directly or go through `fetch_logs_with_retry/2`.

`:stop` metadata:

| Key      | Type     | Value                      |
| -------- | -------- | -------------------------- |
| `app`    | `string` | Fly app name               |
| `status` | `term`   | `:ok` / `{:error, reason}` |
| `count`  | `int`    | Number of entries returned |

Log line content is never included in metadata — Fly log bodies may contain bearer tokens, and we do not want them flowing into a metrics pipeline.
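For throughput (entries fetched over time) rather than call counts, note that `count` lives in metadata, not measurements, so the metric needs an arity-2 measurement function receiving both. A sketch with a hypothetical metric name, assuming `count` is present on every `:stop` event as the table above states:

```elixir
import Telemetry.Metrics

# Total log entries fetched, tagged by app. The arity-2 measurement
# function pulls :count out of the event metadata.
sum("ex_atlas.fly.logs.fetched_entries",
  event_name: [:ex_atlas, :fly, :logs, :fetch, :stop],
  measurement: fn _measurements, metadata -> metadata.count end,
  tags: [:app]
)
```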

### `[:ex_atlas, :fly, :deploy, :line]` and `[:ex_atlas, :fly, :deploy, :exit]`

Two events from `ExAtlas.Fly.Deploy.stream_deploy/3`:

- `:line` fires once per non-empty output line. Measurements are `%{count: 1}`, so a `counter` metric sums to total lines.
- `:exit` fires once when the deploy terminates.

`:line` metadata:

| Key         | Type     | Value                |
| ----------- | -------- | -------------------- |
| `ticket_id` | `string` | The deploy ticket ID |

`:exit` metadata:

| Key         | Type     | Value                                                      |
| ----------- | -------- | ---------------------------------------------------------- |
| `ticket_id` | `string` | The deploy ticket ID                                       |
| `result`    | `term`   | `:ok` / `{:error, :timeout}` / `{:error, {:exit_code, N}}` |

Line content is deliberately excluded — Fly build output can contain bearer tokens.
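Both events map cleanly onto counters; counters increment once per event and ignore the measurement value. A sketch for a `metrics/0` list like the one in the wiring section below; the metric names are arbitrary, and `tag_values` stringifies tuple results so they work as flat tag values:

```elixir
import Telemetry.Metrics

[
  # Total output lines per deploy; :line carries measurements %{count: 1}.
  counter("atlas.fly.deploy.lines",
    event_name: [:ex_atlas, :fly, :deploy, :line],
    measurement: :count,
    tags: [:ticket_id]
  ),

  # Deploy outcomes; inspect/1 turns {:error, {:exit_code, N}} into a string tag.
  counter("atlas.fly.deploy.exits",
    event_name: [:ex_atlas, :fly, :deploy, :exit],
    measurement: :count,
    tags: [:result],
    tag_values: fn metadata -> %{metadata | result: inspect(metadata.result)} end
  )
]
```

Beware tag cardinality here: if ticket IDs are unbounded, tagging by `ticket_id` will grow your time-series set without limit.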

## Wiring into Logger

```elixir
# Assumes `require Logger` in the enclosing module.
:telemetry.attach(
  "atlas-http-logger",
  [:ex_atlas, :runpod, :request],
  fn _event, measurements, metadata, _config ->
    Logger.info(
      "ExAtlas → #{metadata.api} #{metadata.method} #{metadata.url} → #{measurements.status}"
    )
  end,
  nil
)
```

## Wiring into `:telemetry_metrics`

```elixir
defmodule MyAppWeb.Telemetry do
  use Supervisor
  import Telemetry.Metrics

  def metrics do
    [
      # Count requests, grouped by API surface and HTTP method. Counters
      # increment once per event; the measurement value itself is ignored.
      counter("atlas.runpod.request.count",
        event_name: [:ex_atlas, :runpod, :request],
        measurement: :status,
        tags: [:api, :method]
      ),

      # Watch error rates. `:keep` only receives event *metadata*, and
      # `status` is a measurement, so filter with a measurement function
      # instead: contribute 1 for errors, 0 otherwise.
      sum("atlas.runpod.request.errors",
        event_name: [:ex_atlas, :runpod, :request],
        measurement: fn %{status: status} -> if status >= 400, do: 1, else: 0 end,
        tags: [:api, :method]
      )
    ]
  end
end
```

Plug into Grafana / Prometheus / StatsD via whichever reporter you prefer (`TelemetryMetricsPrometheus`, `TelemetryMetricsStatsd`, ...).

## Event attachment on application start

```elixir
defmodule MyApp.AtlasTelemetry do
  @events [
    # Provider HTTP requests
    [:ex_atlas, :runpod, :request],
    [:ex_atlas, :fly, :request],
    [:ex_atlas, :lambda_labs, :request],
    [:ex_atlas, :vast, :request],
    # Fly platform ops (spans emit :start + :stop + :exception)
    [:ex_atlas, :fly, :token, :acquire, :start],
    [:ex_atlas, :fly, :token, :acquire, :stop],
    [:ex_atlas, :fly, :token, :acquire, :exception],
    [:ex_atlas, :fly, :logs, :fetch, :start],
    [:ex_atlas, :fly, :logs, :fetch, :stop],
    [:ex_atlas, :fly, :logs, :fetch, :exception],
    [:ex_atlas, :fly, :deploy, :line],
    [:ex_atlas, :fly, :deploy, :exit]
  ]

  def attach do
    :telemetry.attach_many(
      "atlas-telemetry",
      @events,
      &__MODULE__.handle/4,
      nil
    )
  end

  def handle(event, measurements, metadata, _config) do
    # Dispatch to your metrics system
  end
end
```

```elixir
# lib/my_app/application.ex
def start(_type, _args) do
  MyApp.AtlasTelemetry.attach()
  # ...
end
```

## Orchestrator events

PubSub broadcasts from the orchestrator are covered in the README — subscribe to `"compute:<id>"` on `ExAtlas.PubSub` for state-change notifications. These are PubSub messages, not Telemetry events.

If you want Telemetry-style metrics for spawn/terminate counts, wrap `ExAtlas.Orchestrator.spawn/1` in your own helper that emits a Telemetry event alongside the call.
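A minimal sketch using `:telemetry.span/3`; the module name, event prefix, and stop metadata are your own choices rather than anything ExAtlas defines, and the `{:ok, _}` match assumes a tagged-tuple return:

```elixir
defmodule MyApp.Compute do
  # Wraps the orchestrator call in a span, emitting
  # [:my_app, :compute, :spawn, :start | :stop | :exception].
  def spawn_compute(spec) do
    :telemetry.span([:my_app, :compute, :spawn], %{spec: spec}, fn ->
      result = ExAtlas.Orchestrator.spawn(spec)

      # :telemetry.span/3 expects {result, stop_metadata}.
      {result, %{spec: spec, ok?: match?({:ok, _}, result)}}
    end)
  end
end
```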