ALLM is a provider-neutral LLM execution library for Elixir. You write your workflow once — building a request, picking an engine, calling generate/3 or chat/3 — and run it against OpenAI, Anthropic, Gemini, or any custom adapter without changing the call site.

This guide walks you from a blank mix.exs to a working round-trip against a real provider in five minutes. We'll use ALLM.Providers.Fake (the deterministic test adapter that ships with the library) for the first pass — it requires no API key and no network — then swap to a real provider.

Install

Add ALLM to your mix.exs deps:

def deps do
  [
    {:allm, "~> 0.3"}
  ]
end

Run mix deps.get. ALLM pulls in req, finch, jason, and telemetry as transitive deps; you don't need to declare them yourself.

The toolchain floor is Elixir ~> 1.17 and Erlang/OTP 27+.

Hello, ALLM (no network)

The simplest possible round-trip uses the fake adapter. Open iex -S mix in your project and paste:

iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [script: [{:text, "Hello, ALLM!"}, {:finish, :stop}]]
...> )
iex> {:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
...>   ALLM.chat(engine, [ALLM.user("Hi.")])
iex> text
"Hello, ALLM!"

Three things happened:

  1. ALLM.Engine.new/1 built a runtime engine. Engines hold the non-serializable bits — adapter module, adapter opts, optional key resolver. They're cheap to construct and safe to share across processes.
  2. ALLM.chat/3 ran the auto-loop. With no tools declared, the loop completes after a single round-trip and returns an %ALLM.ChatResult{} wrapping the final %ALLM.Response{}.
  3. The fake adapter ignored the request entirely and returned the scripted reply ("Hello, ALLM!"). That's the whole point — Fake is for testing orchestration, not provider wire fidelity.

Handling responses — the three-clause pattern

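Every Layer-C call (generate/3, chat/3, etc.) produces one of the shapes below. generate/3 returns the %Response{} directly; chat/3 wraps it as the final_response of a %ChatResult{}.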

  • {:ok, %Response{finish_reason: :stop, output_text: text}} — the happy path.
  • {:ok, %Response{finish_reason: :error, metadata: %{error: e}}} — a mid-stream adapter failure (rate limit, content filter, network blip) folded back into the response. The call-site tuple stays {:ok, _} — matching only {:error, _} silently swallows these.
  • {:error, struct} — a synchronous pre-flight failure (no adapter, no key, invalid request).

The full three-clause shape:

case ALLM.generate(engine, request) do
  {:ok, %ALLM.Response{finish_reason: :stop, output_text: text}} ->
    {:ok, text}

  {:ok, %ALLM.Response{finish_reason: :error, metadata: %{error: err}}} ->
    {:error, err}

  {:error, err} ->
    {:error, err}
end
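
To see why the second clause matters, here is a minimal, self-contained sketch. Demo.Response is a hypothetical stand-in that copies only the three fields named above (it is not ALLM's struct), and Demo.Handlers with its two functions is an illustrative name, not part of the library.

```elixir
defmodule Demo.Response do
  # Hypothetical stand-in for %ALLM.Response{}; only the fields discussed above.
  defstruct finish_reason: :stop, output_text: nil, metadata: %{}
end

defmodule Demo.Handlers do
  alias Demo.Response

  # Naive: treats every {:ok, _} as success, so a folded-in error slips through.
  def naive({:ok, %Response{output_text: text}}), do: {:ok, text}
  def naive({:error, err}), do: {:error, err}

  # Three-clause: checks finish_reason before trusting the text.
  def strict({:ok, %Response{finish_reason: :stop, output_text: text}}), do: {:ok, text}
  def strict({:ok, %Response{finish_reason: :error, metadata: %{error: err}}}), do: {:error, err}
  def strict({:error, err}), do: {:error, err}
end

# A mid-stream failure as described above: the outer tuple is still {:ok, _}.
failed =
  {:ok, %Demo.Response{finish_reason: :error, output_text: "", metadata: %{error: :rate_limited}}}

IO.inspect(Demo.Handlers.naive(failed))   # prints {:ok, ""}
IO.inspect(Demo.Handlers.strict(failed))  # prints {:error, :rate_limited}
```

The naive handler reports success with an empty string, which is exactly the silent failure the second clause guards against.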

When you just want the text or a clear error, reach for ALLM.unwrap/1 — it collapses the three clauses into one call:

{:ok, text} = ALLM.unwrap(ALLM.generate(engine, request))

unwrap/1 also handles non-stop finishes (:length, :tool_calls, :content_filter) and structured-content responses; see its @doc for the full clause list.
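
When the failure path needs handling too, unwrap/1 can feed a two-branch case. The sketch below reuses the Fake engine from this guide; the {:error, reason} return shape is an assumption based on the description above, so confirm it against the @doc before relying on it.

```elixir
engine =
  ALLM.Engine.new(
    adapter: ALLM.Providers.Fake,
    adapter_opts: [script: [{:text, "ok"}, {:finish, :stop}]]
  )

request = ALLM.request([ALLM.user("ping")])

result =
  case ALLM.unwrap(ALLM.generate(engine, request)) do
    # Happy path: the scripted text comes back unwrapped.
    {:ok, text} -> {:reply, text}
    # Assumed shape for every failure that unwrap/1 collapses.
    {:error, reason} -> {:failed, reason}
  end
```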

Building a request explicitly

ALLM.chat/3 accepts either a list of messages or a %Request{}. The list form is shorthand. Here's the explicit form:

iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [script: [{:text, "Three primes: 2, 3, 5."}, {:finish, :stop}]]
...> )
iex> req = ALLM.request([
...>   ALLM.system("Be concise."),
...>   ALLM.user("Name three primes.")
...> ])
iex> {:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
...>   ALLM.chat(engine, req)
iex> text
"Three primes: 2, 3, 5."

ALLM.request/2 accepts the same opts you'd set on the request struct directly: :model, :tools, :tool_choice, :response_format, :stream, :max_tokens, :temperature, :metadata.
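
As an illustration, a request with two of those opts set might look like the sketch below. The values 64 and 0.2 are arbitrary examples, and the Fake engine ignores them; they only take effect against a real provider.

```elixir
engine =
  ALLM.Engine.new(
    adapter: ALLM.Providers.Fake,
    adapter_opts: [script: [{:text, "Three primes: 2, 3, 5."}, {:finish, :stop}]]
  )

# Second argument: opts from the list above (illustrative values).
req =
  ALLM.request(
    [ALLM.system("Be concise."), ALLM.user("Name three primes.")],
    max_tokens: 64,
    temperature: 0.2
  )

{:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
  ALLM.chat(engine, req)
```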

When to reach for what

You want to…                               Use this                    Returns
One-shot completion                        ALLM.generate/3             {:ok, %Response{}}
One-shot streaming                         ALLM.stream_generate/3      {:ok, Enumerable.t} of events
Single round-trip with tool execution      ALLM.step/3                 {:ok, %StepResult{}}
Multi-turn auto-loop with tools            ALLM.chat/3                 {:ok, %ChatResult{}}
Multi-turn auto-loop, streaming            ALLM.stream/3               {:ok, Enumerable.t}
Multi-turn with persistence between turns  ALLM.Session.*              {:ok, %Session{}}
Generate / edit / vary images              ALLM.generate_image/3 etc.  {:ok, %ImageResponse{}}

Swap to a real provider

The engine is the only thing that changes — everything downstream stays identical. For OpenAI:

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  model: "gpt-4.1-mini"
)

{:ok, response} = ALLM.generate(engine, ALLM.request([ALLM.user("Hi.")]))

For Anthropic:

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.Anthropic,
  model: "claude-sonnet-4-6"
)

For Gemini:

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.Gemini,
  model: "gemini-3-flash-preview"
)

Each provider has its own model strings; otherwise the call site is byte-identical.
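
Because only the engine changes, provider choice can be isolated in one place. The sketch below assumes a hypothetical application module, MyApp.LLM; it uses only the adapters, model strings, and functions shown in this guide.

```elixir
defmodule MyApp.LLM do
  # Hypothetical module: one engine constructor per provider.
  def engine(:openai),
    do: ALLM.Engine.new(adapter: ALLM.Providers.OpenAI, model: "gpt-4.1-mini")

  def engine(:anthropic),
    do: ALLM.Engine.new(adapter: ALLM.Providers.Anthropic, model: "claude-sonnet-4-6")

  def engine(:gemini),
    do: ALLM.Engine.new(adapter: ALLM.Providers.Gemini, model: "gemini-3-flash-preview")

  # The call site is the same regardless of which provider was picked.
  def ask(provider, prompt) do
    provider
    |> engine()
    |> ALLM.generate(ALLM.request([ALLM.user(prompt)]))
    |> ALLM.unwrap()
  end
end
```

Calling MyApp.LLM.ask(:anthropic, "Hi.") would then hit Anthropic, provided a key is available through one of the resolution paths described in the next section.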

Where do API keys come from?

You have four resolution paths, in priority order:

  1. Per-call: ALLM.generate(engine, req, api_key: "sk-..."). Wins over everything. Use this for multi-tenant SaaS where the key changes per request.
  2. Engine-level resolver: ALLM.Engine.new(adapter: ..., keys: %{my_provider: fn -> System.fetch_env!("MY_KEY") end}).
  3. Application config: config :allm, :keys, openai: "sk-...".
  4. Environment variable: each provider has a default (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY).

%ALLM.Engine{} has no :api_key field; keys resolve at call time, for example via opts[:api_key] or via the :allm, :keys application config (see ALLM.Keys). Because an engine struct carries no secrets, it round-trips safely through ETF and JSON. See multi_tenant_keys.md for the full resolution chain.
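
The priority order can be pictured with a small, self-contained sketch. KeyChainSketch is a hypothetical module written only to illustrate the ordering of the four paths; it is not ALLM.Keys, and the real resolver may differ in its details.

```elixir
defmodule KeyChainSketch do
  # Default environment variable per provider, as listed above.
  @env_vars %{
    openai: "OPENAI_API_KEY",
    anthropic: "ANTHROPIC_API_KEY",
    gemini: "GEMINI_API_KEY"
  }

  # call_opts: keyword list; engine_keys: map of zero-arity resolver functions;
  # app_keys: keyword list standing in for `config :allm, :keys`.
  def resolve(provider, call_opts, engine_keys, app_keys) do
    env_var = Map.get(@env_vars, provider)

    cond do
      key = Keyword.get(call_opts, :api_key) -> {:ok, key, :per_call}
      resolver = Map.get(engine_keys, provider) -> {:ok, resolver.(), :engine_resolver}
      key = Keyword.get(app_keys, provider) -> {:ok, key, :app_config}
      key = env_var && System.get_env(env_var) -> {:ok, key, :env_var}
      true -> {:error, :no_key}
    end
  end
end

System.put_env("OPENAI_API_KEY", "sk-env")
engine_keys = %{openai: fn -> "sk-engine" end}

KeyChainSketch.resolve(:openai, [api_key: "sk-call"], engine_keys, openai: "sk-app")
# => {:ok, "sk-call", :per_call}
KeyChainSketch.resolve(:openai, [], engine_keys, openai: "sk-app")
# => {:ok, "sk-engine", :engine_resolver}
KeyChainSketch.resolve(:openai, [], %{}, openai: "sk-app")
# => {:ok, "sk-app", :app_config}
KeyChainSketch.resolve(:openai, [], %{}, [])
# => {:ok, "sk-env", :env_var}
```

Each source is consulted only when every higher-priority source comes up empty, which is why a per-call key always wins.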

Where to next

Pick the path that matches what you're building:

  • Streaming UI: streaming.md (events, filters, cancellation).
  • Tool calls: tools.md (auto loop, manual mode, ask-user).
  • Multi-turn persistence: sessions.md (%Session{} and the status union).
  • Multi-modal input: vision.md (TextPart and ImagePart).
  • Image generation: image_generation.md (generate_image/3, edit_image/4, image_variations/3).
  • Production hardening: errors_and_retries.md and multi_tenant_keys.md.

Testing your integration

ALLM.Providers.Fake is the canonical test vehicle. Build your test engine with it (for example, from settings in config/test.exs) and write deterministic assertions against scripted replies: no network, no flakes, no mocking infrastructure.

iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [script: [{:text, "ok"}, {:finish, :stop}]]
...> )
iex> {:ok, %ALLM.Response{output_text: text}} =
...>   ALLM.generate(engine, ALLM.request([ALLM.user("ping")]))
iex> text
"ok"

The examples/ directory in the repository contains 15 numbered scripts you can run against any of the bundled providers — see examples/README.md.
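
In an ExUnit suite, the Fake-backed check from this section might look like the sketch below. MyApp.PingTest is a hypothetical module name, and the file is assumed to live under test/ in a Mix project where test_helper.exs has already called ExUnit.start().

```elixir
defmodule MyApp.PingTest do
  use ExUnit.Case

  test "generate/3 returns the scripted reply" do
    # Fake replays the script verbatim, so the assertion is deterministic.
    engine =
      ALLM.Engine.new(
        adapter: ALLM.Providers.Fake,
        adapter_opts: [script: [{:text, "ok"}, {:finish, :stop}]]
      )

    assert {:ok, %ALLM.Response{output_text: "ok"}} =
             ALLM.generate(engine, ALLM.request([ALLM.user("ping")]))
  end
end
```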