ExAtlas (ExAtlas v0.5.0)


ExAtlas is a composable, pluggable Elixir SDK for managing GPU and CPU compute across multiple cloud providers (RunPod, Fly.io Machines, Lambda Labs, Vast.ai, or any module you write that implements ExAtlas.Provider).

The top-level API is intentionally thin: it validates input, resolves the provider, builds a ctx, and delegates to the provider module. As a result, you write the same call against RunPod today, Lambda Labs tomorrow, and your own bare-metal backend the day after; only the :provider option changes.

Quick start

# 1. Configure
config :ex_atlas, default_provider: :runpod
config :ex_atlas, :runpod, api_key: System.get_env("RUNPOD_API_KEY")

# 2. Spawn a GPU pod
{:ok, compute} =
  ExAtlas.spawn_compute(
    gpu: :h100,
    image: "pytorch/pytorch:2.5.0-cuda12.1-cudnn9-runtime",
    ports: [{8000, :http}],
    auth: :bearer
  )

compute.ports
# [%{internal: 8000, external: nil, protocol: :http,
#    url: "https://<pod_id>-8000.proxy.runpod.net"}]

compute.auth.header
# "Authorization: Bearer kX9fP..."

# 3. Your user's browser talks to the pod directly (bearer token guards access).

# 4. Shut it down when done
:ok = ExAtlas.terminate(compute.id)

Running a serverless inference job

{:ok, job} =
  ExAtlas.run_job(
    endpoint: "abc123",
    input: %{prompt: "a beautiful sunset"},
    mode: :async
  )

{:ok, done} = ExAtlas.get_job(job.id)
done.output
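
With mode: :async, the job may still be running when get_job is called, so callers usually poll until it finishes. The following is a minimal sketch of such a loop, not part of ExAtlas: the JobPoller module, its :interval_ms and :max_attempts options, and the done_fun predicate are all hypothetical. It assumes nothing about the fields of ExAtlas.Spec.Job; you supply the fetch function and decide what "finished" means.

```elixir
defmodule JobPoller do
  @moduledoc "Hypothetical polling helper; not part of ExAtlas."

  # fetch_fun: 0-arity function returning {:ok, job} | {:error, reason}
  # done_fun:  1-arity predicate that returns true once the job has finished
  def await(fetch_fun, done_fun, opts \\ []) do
    interval = Keyword.get(opts, :interval_ms, 1_000)
    attempts = Keyword.get(opts, :max_attempts, 60)
    poll(fetch_fun, done_fun, interval, attempts)
  end

  # Give up once the attempt budget is used.
  defp poll(_fetch_fun, _done_fun, _interval, 0), do: {:error, :timeout}

  defp poll(fetch_fun, done_fun, interval, attempts_left) do
    case fetch_fun.() do
      {:ok, job} ->
        if done_fun.(job) do
          {:ok, job}
        else
          Process.sleep(interval)
          poll(fetch_fun, done_fun, interval, attempts_left - 1)
        end

      # Pass provider errors straight through to the caller.
      {:error, _reason} = error ->
        error
    end
  end
end
```

In an application this might be called as JobPoller.await(fn -> ExAtlas.get_job(job.id) end, &finished?/1), where finished?/1 is your own check against whatever status the job struct exposes.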

Stream a job

ExAtlas.stream_job(job.id) |> Enum.each(&IO.inspect/1)
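
Because stream_job returns a lazy Enumerable, results are produced only as they are consumed, and the stream composes with the standard Stream and Enum functions. The sketch below illustrates that behaviour with a stand-in built on Stream.resource/3. FakeJobStream and the %{token: ...} chunk shape are invented for illustration; the real chunk format depends on the provider and endpoint.

```elixir
defmodule FakeJobStream do
  # Hypothetical stand-in for a job stream; the chunk shape is invented.
  def chunks do
    Stream.resource(
      # start: the "remote" chunks still to be delivered
      fn -> ["a", " beautiful", " sunset", " over", " water"] end,
      # next: emit one chunk per step, halt when none remain
      fn
        [] -> {:halt, []}
        [chunk | rest] -> {[%{token: chunk}], rest}
      end,
      # cleanup: nothing to release in this sketch
      fn _remaining -> :ok end
    )
  end
end

# Stream.map/2 stays lazy; Enum.take/2 pulls only the first three chunks.
first_three =
  FakeJobStream.chunks()
  |> Stream.map(& &1.token)
  |> Enum.take(3)

IO.inspect(first_three)
# prints ["a", " beautiful", " sunset"]
```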

Swapping providers

ExAtlas.spawn_compute(provider: :runpod, gpu: :h100, ...)
ExAtlas.spawn_compute(provider: :lambda_labs, gpu: :h100, ...)  # v0.2
ExAtlas.spawn_compute(provider: MyInternalCloud.Provider, gpu: :h100, ...)

See ExAtlas.Provider for the behaviour contract and ExAtlas.Config for how provider + API key resolution works.
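
The thin dispatch described in the overview (validate input, resolve the provider, build a ctx, delegate) can be pictured with a small self-contained sketch. Everything here is hypothetical and much simpler than the real library: Sketch, Sketch.Provider, and Sketch.FakeCloud are invented names, and the callback signature is an assumption rather than the actual ExAtlas.Provider contract.

```elixir
defmodule Sketch.Provider do
  @moduledoc "Hypothetical stand-in for a provider behaviour."
  @callback spawn_compute(request :: map(), ctx :: map()) ::
              {:ok, map()} | {:error, term()}
end

defmodule Sketch.FakeCloud do
  @behaviour Sketch.Provider

  @impl true
  def spawn_compute(request, ctx) do
    {:ok, %{id: "fake-1", gpu: request.gpu, provider: ctx.provider}}
  end
end

defmodule Sketch do
  # Shorthand atoms map to modules; any other value is treated as a module.
  @builtin %{fake_cloud: Sketch.FakeCloud}

  def spawn_compute(opts) do
    {provider, opts} = Keyword.pop(opts, :provider, :fake_cloud)

    # Validate input, resolve the provider, then delegate with a ctx.
    with {:ok, gpu} <- Keyword.fetch(opts, :gpu),
         {:ok, mod} <- resolve(provider) do
      mod.spawn_compute(%{gpu: gpu}, %{provider: mod})
    else
      :error -> {:error, :missing_gpu}
      {:error, _reason} = error -> error
    end
  end

  defp resolve(provider) do
    mod = Map.get(@builtin, provider, provider)

    if Code.ensure_loaded?(mod) and function_exported?(mod, :spawn_compute, 2) do
      {:ok, mod}
    else
      {:error, {:unknown_provider, provider}}
    end
  end
end
```

With this shape, Sketch.spawn_compute(provider: :fake_cloud, gpu: :h100) and Sketch.spawn_compute(provider: Sketch.FakeCloud, gpu: :h100) reach the same module, which is the property that lets a caller swap providers by changing one option.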

Summary

Functions

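cancel_job(id, opts \\ [])

Cancel an in-flight serverless job.

capabilities(provider)

Return the capability atoms honored by a provider.

get_compute(id, opts \\ [])

Fetch a compute resource by id.

get_job(id, opts \\ [])

Fetch a serverless job by id.

list_compute(opts \\ [])

List compute resources, optionally filtered.

list_gpu_types(opts \\ [])

Return the provider's catalog of GPU types + pricing.

run_job(opts), run_job(req, opts)

Submit a serverless inference job.

spawn_compute(opts), spawn_compute(req, opts)

Spawn a compute resource.

start(id, opts \\ [])

Resume a stopped compute resource.

stop(id, opts \\ [])

Stop a compute resource without destroying storage.

stream_job(id, opts \\ [])

Stream partial results from a running job as a lazy Enumerable.

terminate(id, opts \\ [])

Terminate and destroy a compute resource.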

Types

opts()

@type opts() :: keyword()

Functions

cancel_job(id, opts \\ [])

@spec cancel_job(String.t(), opts()) :: :ok | {:error, term()}

Cancel an in-flight serverless job.

capabilities(provider)

@spec capabilities(atom() | module()) :: [atom()]

Return the capability atoms honored by a provider.

get_compute(id, opts \\ [])

@spec get_compute(String.t(), opts()) ::
  {:ok, ExAtlas.Spec.Compute.t()} | {:error, term()}

Fetch a compute resource by id.

get_job(id, opts \\ [])

@spec get_job(String.t(), opts()) :: {:ok, ExAtlas.Spec.Job.t()} | {:error, term()}

Fetch a serverless job by id.

list_compute(opts \\ [])

@spec list_compute(opts()) :: {:ok, [ExAtlas.Spec.Compute.t()]} | {:error, term()}

List compute resources, optionally filtered.

list_gpu_types(opts \\ [])

@spec list_gpu_types(opts()) :: {:ok, [ExAtlas.Spec.GpuType.t()]} | {:error, term()}

Return the provider's catalog of GPU types + pricing.

run_job(opts)

@spec run_job(opts()) :: {:ok, ExAtlas.Spec.Job.t()} | {:error, term()}

Submit a serverless inference job.

Accepts either a keyword list (convenience) or a pre-built ExAtlas.Spec.JobRequest.

run_job(req, opts)

@spec run_job(ExAtlas.Spec.JobRequest.t(), opts()) ::
  {:ok, ExAtlas.Spec.Job.t()} | {:error, term()}

spawn_compute(opts)

@spec spawn_compute(opts()) :: {:ok, ExAtlas.Spec.Compute.t()} | {:error, term()}

Spawn a compute resource.

Accepts either a keyword list (convenience) or a pre-built ExAtlas.Spec.ComputeRequest. See ExAtlas.Spec.ComputeRequest for the full field list.
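
One common way to support both call shapes is for the keyword clause to build the request struct and hand off to the struct clause. The sketch below shows that pattern in isolation. It is an assumption about the general technique, not a description of ExAtlas internals: RequestSketch, its struct fields, and the way options are split are all hypothetical, and the real ExAtlas.Spec.ComputeRequest may differ.

```elixir
defmodule RequestSketch.ComputeRequest do
  # Hypothetical fields chosen for illustration only.
  defstruct [:gpu, :image, ports: []]
end

defmodule RequestSketch.Spawner do
  alias RequestSketch.ComputeRequest

  @request_keys [:gpu, :image, :ports]

  # Keyword form: separate request fields from call options, then delegate.
  def spawn_compute(opts) when is_list(opts) do
    {fields, call_opts} = Keyword.split(opts, @request_keys)
    spawn_compute(struct!(ComputeRequest, fields), call_opts)
  end

  # Struct form: the request is already built; opts carry call options only.
  def spawn_compute(%ComputeRequest{} = req, opts) when is_list(opts) do
    # A real implementation would hand off to a provider here.
    {:ok, %{request: req, opts: opts}}
  end
end
```

Both forms end up in the same clause, so validation and provider dispatch only need to be written once.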

spawn_compute(req, opts)

@spec spawn_compute(ExAtlas.Spec.ComputeRequest.t(), opts()) ::
  {:ok, ExAtlas.Spec.Compute.t()} | {:error, term()}

start(id, opts \\ [])

@spec start(String.t(), opts()) :: :ok | {:error, term()}

Resume a stopped compute resource.

stop(id, opts \\ [])

@spec stop(String.t(), opts()) :: :ok | {:error, term()}

Stop a compute resource without destroying storage.

stream_job(id, opts \\ [])

@spec stream_job(String.t(), opts()) :: Enumerable.t()

Stream partial results from a running job as a lazy Enumerable.

terminate(id, opts \\ [])

@spec terminate(String.t(), opts()) :: :ok | {:error, term()}

Terminate and destroy a compute resource.