ALLM is a provider-neutral LLM execution library for Elixir. You write
your workflow once — building a request, picking an engine, calling
generate/3 or chat/3 — and run it against OpenAI, Anthropic, Gemini,
or any custom adapter without changing the call site.
This guide walks you from a blank mix.exs to a working round-trip
against a real provider in five minutes. We'll use ALLM.Providers.Fake
(the deterministic test adapter that ships with the library) for the
first pass — it requires no API key and no network — then swap to a real
provider.
Install
Add ALLM to your mix.exs deps:
def deps do
  [
    {:allm, "~> 0.3"}
  ]
end
Run mix deps.get. ALLM pulls in req, finch, jason, and
telemetry as transitive deps; you don't need to declare them yourself.
The toolchain floor is Elixir ~> 1.17 and Erlang/OTP 27+.
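For orientation, a complete mix.exs might look like the sketch below. The MyApp module name, the :my_app app name, and the version are placeholders rather than anything ALLM requires; the elixir requirement simply mirrors the toolchain floor stated above.

```elixir
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,        # placeholder app name
      version: "0.1.0",    # placeholder version
      elixir: "~> 1.17",   # matches the stated toolchain floor
      deps: deps()
    ]
  end

  def application do
    [extra_applications: [:logger]]
  end

  defp deps do
    [
      # The only dependency you declare; req, finch, jason, and telemetry
      # arrive transitively.
      {:allm, "~> 0.3"}
    ]
  end
end
```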
Hello, ALLM (no network)
The simplest possible round-trip uses the fake adapter. Open
iex -S mix in your project and paste:
iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [script: [{:text, "Hello, ALLM!"}, {:finish, :stop}]]
...> )
iex> {:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
...>   ALLM.chat(engine, [ALLM.user("Hi.")])
iex> text
"Hello, ALLM!"
Three things happened:
- ALLM.Engine.new/1 built a runtime engine. Engines hold the non-serializable bits: adapter module, adapter opts, optional key resolver. They're cheap to construct and safe to share across processes.
- ALLM.chat/3 ran the auto-loop. With no tools declared, the loop completes after a single round-trip and returns an %ALLM.ChatResult{} wrapping the final %ALLM.Response{}.
- The fake adapter ignored the request entirely and returned the scripted reply ("Hello, ALLM!"). That's the whole point: Fake is for testing orchestration, not provider wire fidelity.
Building a request explicitly
ALLM.chat/3 accepts either a list of messages or a %Request{}. The
list form is shorthand. Here's the explicit form:
iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [script: [{:text, "Three primes: 2, 3, 5."}, {:finish, :stop}]]
...> )
iex> req = ALLM.request([
...>   ALLM.system("Be concise."),
...>   ALLM.user("Name three primes.")
...> ])
iex> {:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
...>   ALLM.chat(engine, req)
iex> text
"Three primes: 2, 3, 5."
ALLM.request/2 accepts the same opts you'd set on the request struct
directly: :model, :tools, :tool_choice, :response_format,
:stream, :max_tokens, :temperature, :metadata.
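As a rough illustration of passing those opts, the sketch below sets :max_tokens and :temperature on a request and runs it through the fake adapter. It assumes the opts go in a keyword list as the second argument to ALLM.request/2, which the arity suggests but the text above does not spell out; the option values are arbitrary placeholders.

```elixir
# Fake engine, built the same way as in the examples above.
engine =
  ALLM.Engine.new(
    adapter: ALLM.Providers.Fake,
    adapter_opts: [script: [{:text, "2, 3, 5"}, {:finish, :stop}]]
  )

# Assumption: opts are passed as a keyword list in the second argument.
# The values below are illustrative, not recommended defaults.
req =
  ALLM.request(
    [ALLM.system("Be concise."), ALLM.user("Name three primes.")],
    max_tokens: 64,
    temperature: 0.2
  )

{:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
  ALLM.chat(engine, req)
```

Because the fake adapter ignores the request, the opts have no visible effect here; they matter once the engine points at a real provider.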
When to reach for what
| You want to… | Use this | Returns |
|---|---|---|
| One-shot completion | ALLM.generate/3 | {:ok, %Response{}} |
| One-shot streaming | ALLM.stream_generate/3 | {:ok, Enumerable.t} of events |
| Single round-trip with tool execution | ALLM.step/3 | {:ok, %StepResult{}} |
| Multi-turn auto-loop with tools | ALLM.chat/3 | {:ok, %ChatResult{}} |
| Multi-turn auto-loop, streaming | ALLM.stream/3 | {:ok, Enumerable.t} |
| Multi-turn with persistence between turns | ALLM.Session.* | {:ok, %Session{}} |
| Generate / edit / vary images | ALLM.generate_image/3 etc. | {:ok, %ImageResponse{}} |
Swap to a real provider
The engine is the only thing that changes — everything downstream stays identical. For OpenAI:
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  model: "gpt-4.1-mini"
)
{:ok, response} = ALLM.generate(engine, ALLM.request([ALLM.user("Hi.")]))
For Anthropic:
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.Anthropic,
  model: "claude-sonnet-4-6"
)
For Gemini:
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.Gemini,
  model: "gemini-3-flash-preview"
)
Each provider has its own model strings; otherwise the call site is byte-identical.
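One way to keep the provider choice in a single place is a small lookup helper. The sketch below is hypothetical (EngineOptsSketch and for_provider/1 are not part of ALLM); only the adapter modules and model strings are taken from the examples above.

```elixir
defmodule EngineOptsSketch do
  # Hypothetical helper, not part of ALLM. It only returns the keyword
  # options shown in the examples above, keyed by a provider atom.
  def for_provider(:openai),
    do: [adapter: ALLM.Providers.OpenAI, model: "gpt-4.1-mini"]

  def for_provider(:anthropic),
    do: [adapter: ALLM.Providers.Anthropic, model: "claude-sonnet-4-6"]

  def for_provider(:gemini),
    do: [adapter: ALLM.Providers.Gemini, model: "gemini-3-flash-preview"]
end

# Usage (requires the :allm dependency):
#   engine = ALLM.Engine.new(EngineOptsSketch.for_provider(:anthropic))
#   {:ok, response} = ALLM.generate(engine, ALLM.request([ALLM.user("Hi.")]))
```

The call site after ALLM.Engine.new/1 stays the same whichever clause is selected.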
Where do API keys come from?
You have four resolution paths, in priority order:
- Per-call: ALLM.generate(engine, req, api_key: "sk-..."). Wins over everything. Use this for multi-tenant SaaS where the key changes per request.
- Engine-level resolver: ALLM.Engine.new(adapter: ..., keys: %{my_provider: fn -> System.fetch_env!("MY_KEY") end}).
- Application config: config :allm, :keys, openai: "sk-...".
- Environment variable: each provider has a default (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY).
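The priority order above can be pictured as a first-match lookup over four sources. The sketch below is a plain-Elixir illustration of that ordering only; it is not ALLM's resolver, and the KeyChainSketch module, its resolve/5 signature, and the :missing_api_key error atom are all hypothetical. The sources are passed in explicitly so the function stays pure.

```elixir
defmodule KeyChainSketch do
  # Hypothetical illustration of the priority order; not ALLM's implementation.

  # Default environment variable names listed in the guide.
  @env_vars %{
    openai: "OPENAI_API_KEY",
    anthropic: "ANTHROPIC_API_KEY",
    gemini: "GEMINI_API_KEY"
  }

  # call_opts:   keyword list given at the call site (may contain :api_key)
  # engine_keys: map of provider => zero-arity resolver function
  # app_keys:    keyword list, as stored under config :allm, :keys
  # env:         map of environment variable name => value
  def resolve(provider, call_opts, engine_keys, app_keys, env) do
    sources = [
      fn -> Keyword.get(call_opts, :api_key) end,
      fn -> call_resolver(Map.get(engine_keys, provider)) end,
      fn -> Keyword.get(app_keys, provider) end,
      fn -> Map.get(env, Map.get(@env_vars, provider)) end
    ]

    # Sources are evaluated lazily, in order; the first non-nil value wins.
    case Enum.find_value(sources, fn source -> source.() end) do
      nil -> {:error, :missing_api_key}
      key -> {:ok, key}
    end
  end

  defp call_resolver(fun) when is_function(fun, 0), do: fun.()
  defp call_resolver(_), do: nil
end
```

Wrapping each source in a function means a lower-priority source, such as an engine resolver that calls System.fetch_env!/1, is never evaluated when a higher-priority key is already present.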
Engines never persist API keys — they round-trip safely through ETF and
JSON. See multi_tenant_keys.md for the full resolution chain.
Where to next
Pick the path that matches what you're building:
- Streaming UI → streaming.md: events, filters, cancellation.
- Tool calls → tools.md: auto loop, manual mode, ask-user.
- Multi-turn persistence → sessions.md: %Session{} and the status union.
- Multi-modal input → vision.md: TextPart and ImagePart.
- Image generation → image_generation.md: generate_image/3, edit_image/4, image_variations/3.
- Production hardening → errors_and_retries.md and multi_tenant_keys.md.
Testing your integration
ALLM.Providers.Fake is the canonical test vehicle. Build your test
engine with it (for example, from settings in config/test.exs) and
write deterministic assertions against scripted replies: no network,
no flakes, no mocking infrastructure.
iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [script: [{:text, "ok"}, {:finish, :stop}]]
...> )
iex> {:ok, %ALLM.Response{output_text: text}} =
...>   ALLM.generate(engine, ALLM.request([ALLM.user("ping")]))
iex> text
"ok"
The examples/ directory in the repository contains 15 numbered scripts
you can run against any of the bundled providers — see
examples/README.md.
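The fake-adapter round-trip shown in this section can also be wrapped in an ExUnit test. The sketch below is a minimal example, assuming a hypothetical test module name (MyApp.GreetingTest) and that the :allm dependency is available in the test environment; every ALLM call in it is copied from the iex session above.

```elixir
defmodule MyApp.GreetingTest do
  # Hypothetical test module; only the ALLM calls come from the guide.
  use ExUnit.Case, async: true

  test "generate/3 returns the scripted reply from the fake adapter" do
    engine =
      ALLM.Engine.new(
        adapter: ALLM.Providers.Fake,
        adapter_opts: [script: [{:text, "ok"}, {:finish, :stop}]]
      )

    # The match fails the test if the call errors or the text differs.
    assert {:ok, %ALLM.Response{output_text: "ok"}} =
             ALLM.generate(engine, ALLM.request([ALLM.user("ping")]))
  end
end
```

Building the engine inside the test keeps each case independent of the others, since the script belongs to that engine alone.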