Provider-neutral LLM execution and agentic loops for Elixir — one engine surface, swap the adapter to retarget OpenAI, Anthropic, or Gemini without touching call sites.

Why ALLM?

  • One surface, three providers. Pick OpenAI, Anthropic, or Gemini by changing one line. Vision input, structured output, tool use, and image generation all share the same caller code.
  • Streaming is the primitive. Every non-streaming entry point is a reducer over a token-by-token event stream. Drop into deltas when a UI needs them; pop back up when it doesn't.
  • State is plain data. Threads, requests, and sessions round-trip through :erlang.term_to_binary/1 and JSON. Persist them, ship them between nodes, resume them tomorrow — no PIDs, refs, funs, or API keys leak in.

Public API is stable across minor versions within v0.x; we'll bump major before breaking changes.

Install

Add ALLM to your mix.exs deps:

def deps do
  [
    {:allm, "~> 0.3"}
  ]
end

Run mix deps.get. Toolchain floor: Elixir ~> 1.17, Erlang/OTP 27+.

Hello, ALLM

Drive a one-shot chat against the deterministic ALLM.Providers.Fake adapter — no API key, no network:

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.Fake,
  adapter_opts: [script: [{:text, "Hello, ALLM!"}, {:finish, :stop}]]
)

{:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
  ALLM.chat(engine, [ALLM.user("Hi.")])

text
# => "Hello, ALLM!"

The block above is the canonical first-run snippet. The same code lives as a runnable doctest on the ALLM module — both copies are kept in lock-step by test/readme_hello_consistency_test.exs.

Pick a provider

Construct an engine for any of the three bundled providers. Once an engine is in hand, every call site below this section is identical across providers — pick once, swap freely.

# OpenAI
engine = ALLM.Engine.new(adapter: ALLM.Providers.OpenAI, model: "gpt-4.1-mini")

# Anthropic
engine = ALLM.Engine.new(adapter: ALLM.Providers.Anthropic, model: "claude-sonnet-4-5")

# Gemini
engine = ALLM.Engine.new(adapter: ALLM.Providers.Gemini, model: "gemini-2.5-flash")

The shared call site any of those engines drops into:

# chat/3 returns a %ALLM.ChatResult{}; the text lives at result.final_response.output_text
{:ok, result} = ALLM.chat(engine, [ALLM.user("Say hi.")])

API keys come from OPENAI_API_KEY, ANTHROPIC_API_KEY, and GEMINI_API_KEY by default — see Real providers below for per-call BYOK and the full resolution chain.
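One way to keep the provider choice in a single place is a small lookup from a provider name to the engine options shown above. This is a minimal sketch: the `ProviderChoice` module and its `engine_opts/1` function are hypothetical helpers, not part of ALLM, and only the adapter modules and model names come from the constructors in this section.

```elixir
defmodule ProviderChoice do
  # Hypothetical helper (not part of ALLM): maps a provider name to the
  # engine options used in the three constructors above.
  @providers %{
    "openai" => [adapter: ALLM.Providers.OpenAI, model: "gpt-4.1-mini"],
    "anthropic" => [adapter: ALLM.Providers.Anthropic, model: "claude-sonnet-4-5"],
    "gemini" => [adapter: ALLM.Providers.Gemini, model: "gemini-2.5-flash"]
  }

  # Returns {:ok, opts} for a known provider name, :error otherwise.
  def engine_opts(name), do: Map.fetch(@providers, name)
end

{:ok, opts} = ProviderChoice.engine_opts("anthropic")

# With ALLM installed, the keyword list can be handed to the constructor:
#     engine = ALLM.Engine.new(opts)
```

The name could just as well come from configuration or an environment variable; the call sites that consume the engine stay unchanged either way.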

The 5-minute tour

A quick tour of ALLM in practice. Every snippet uses the same engine value: pick a provider once, and every call site keeps working when you swap.

1. Generate — single round-trip

# Synchronous — get the final response
{:ok, %ALLM.Response{output_text: text}} =
  ALLM.generate(engine, ALLM.request([ALLM.user("Name three primes.")]))

# Streaming — same engine, same request, token-by-token
{:ok, stream} =
  ALLM.stream_generate(engine, ALLM.request([ALLM.user("Name three primes.")]))

Enum.each(stream, fn
  {:text_delta, %{delta: t}} -> IO.write(t)
  _other                     -> :ok
end)

generate/3 is implemented as a fold over stream_generate/3. Streaming is the primitive; sync is the convenience. Deeper dive: see guides/streaming.md.
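The "sync is a fold over the stream" idea can be illustrated without any provider. The sketch below reduces a hand-written list of events into the final text. It is not ALLM's actual implementation: `FoldSketch` and the `:other_event` tuple are made up for illustration, and only the `{:text_delta, %{delta: t}}` shape is taken from the snippet above.

```elixir
defmodule FoldSketch do
  # Collapses an enumerable of events into the concatenated text,
  # ignoring every event that is not a text delta.
  def collect_text(events) do
    events
    |> Enum.reduce([], fn
      {:text_delta, %{delta: t}}, acc -> [acc, t]
      _other, acc -> acc
    end)
    |> IO.iodata_to_binary()
  end
end

# Hand-written stand-in for a provider stream.
events = [
  {:text_delta, %{delta: "2, "}},
  {:text_delta, %{delta: "3, "}},
  {:other_event, %{}},
  {:text_delta, %{delta: "5"}}
]

text = FoldSketch.collect_text(events)
# text => "2, 3, 5"
```

Because the reducer only needs an enumerable, the same function works on a list in a test and on a lazy stream in production.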

2. Stream — token-by-token

ALLM.stream_generate/3 (single round-trip) and ALLM.stream/3 (multi-turn, including tool calls) both return a lazy enumerable of ALLM.Event tagged tuples. No event fires until you reduce.

{:ok, stream} = ALLM.stream(engine, [ALLM.user("Tell me a haiku.")])

stream
|> Enum.each(fn
  {:text_delta, %{delta: t}}         -> IO.write(t)
  {:step_completed, %{response: r}}  -> IO.puts("\n[step] #{r.finish_reason}")
  {:chat_completed, %{result: r}}    -> IO.puts("\n[done] #{r.halted_reason}")
  _                                  -> :ok
end)

Filter knobs (:emit_text_deltas, :emit_tool_deltas, :include_raw_chunks, :on_event) live on every streaming entry point. See guides/streaming.md for the full event union, cancellation semantics, and observer-callback rules.
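The laziness described above ("no event fires until you reduce") is ordinary Elixir `Stream` behaviour. The sketch below uses a plain `Stream.map/2` over a list as a stand-in for a provider stream and a counter to record when elements are actually produced. Nothing here calls ALLM.

```elixir
# Counts how many events have actually been produced.
produced = :counters.new(1, [])

# Stand-in for a provider stream: each element bumps the counter when pulled.
stream =
  Stream.map(["An ", "old ", "pond"], fn chunk ->
    :counters.add(produced, 1, 1)
    {:text_delta, %{delta: chunk}}
  end)

# Building the stream has not produced anything yet.
before_reduce = :counters.get(produced, 1)

# Reducing pulls the events through one at a time.
text = Enum.map_join(stream, fn {:text_delta, %{delta: t}} -> t end)

after_reduce = :counters.get(produced, 1)
# before_reduce => 0, after_reduce => 3, text => "An old pond"
```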

3. Chat — multi-turn loop

{:ok, result} =
  ALLM.chat(engine, [
    ALLM.system("You are a concise assistant."),
    ALLM.user("Hi! Who are you?")
  ])

result.final_response.output_text
# => "I'm a concise assistant. How can I help?"

# Continue the conversation by appending and re-issuing
followup =
  result.thread
  |> ALLM.Thread.add_message(ALLM.user("Tell me a joke."))

{:ok, result} = ALLM.chat(engine, followup)

chat/3 runs the full model-tool loop until completion and returns an %ALLM.ChatResult{} with the final response, the accumulated thread, and per-step records. The streaming sibling ALLM.stream/3 emits the same lifecycle as events.

4. Tools — declare, run, done

weather =
  ALLM.tool(
    name: "get_weather",
    description: "Return the current weather for a city.",
    schema: %{
      "type" => "object",
      "properties" => %{"city" => %{"type" => "string"}},
      "required" => ["city"]
    },
    handler: fn %{"city" => city} ->
      {:ok, %{forecast: "sunny", city: city}}
    end
  )

engine = ALLM.Engine.put_tools(engine, [weather])

{:ok, result} =
  ALLM.chat(engine, [ALLM.user("What's the weather in Boston?")])

result.final_response.output_text
# => "It's sunny in Boston."

length(result.steps)
# => 2  — model called the tool, then summarized

The handler is a plain Elixir function. The engine runs it, encodes the result for the next turn, and feeds it back to the model. For mode: :manual (caller computes the tool result), per-tool manual: true, {:ask_user, _} suspension, and the full tool-error policy, see guides/tools.md.
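Because the handler is an ordinary function, it can be exercised directly, with no engine or model involved. The sketch below calls the same handler as in the tool definition above with a string-keyed argument map. How the engine itself reacts to a handler that raises is governed by the tool-error policy in guides/tools.md and is not shown here.

```elixir
# Same handler as in the tool definition above.
handler = fn %{"city" => city} ->
  {:ok, %{forecast: "sunny", city: city}}
end

# Direct call with the arguments the model would supply.
{:ok, result} = handler.(%{"city" => "Boston"})
# result => %{forecast: "sunny", city: "Boston"}

# Arguments without the "city" key match no clause, so the call raises.
missing =
  try do
    handler.(%{})
  rescue
    FunctionClauseError -> :no_matching_clause
  end
```

Testing handlers this way keeps tool logic covered by plain unit tests, separate from any provider behaviour.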

5. Sessions — pick up where you left off

# Earlier — store the session after a turn:
#     binary = :erlang.term_to_binary(session)
#     MyApp.Repo.update!(conversation, session_blob: binary)

# Later, possibly on a different node, in a different request:
session = :erlang.binary_to_term(blob_from_db)

{:ok, session, result} =
  ALLM.Session.reply(engine, session, "What did I just ask?")

session.status
# => :completed
result.final_response.output_text
# => "You asked about the weather in Boston."

A %ALLM.Session{} bundles the thread with a status (:idle, :awaiting_user, :awaiting_tools, :completed, :error) and any pending tool calls or ask-user prompt. Round-trip it through ETF or JSON, hand it to a worker, store it in a database column — when you're ready, hand it back to ALLM.Session.reply/4 (or stream_reply/4). Deeper dive: guides/sessions.md.
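The round-trip itself is plain Erlang term serialization. The sketch below uses a small map as a stand-in for a session (the real %ALLM.Session{} struct has its own fields, which are not reproduced here) and decodes with the `:safe` option, which refuses to create new atoms. Whether `:safe` suits a given deployment is an assumption to check against guides/sessions.md.

```elixir
# Plain-map stand-in for a session; field names are illustrative only.
session = %{
  status: :awaiting_user,
  thread: [%{role: :user, content: "What's the weather in Boston?"}]
}

# Encode to a binary suitable for a database column or a message queue.
blob = :erlang.term_to_binary(session)

# Decode later; :safe rejects binaries that would create new atoms.
restored = :erlang.binary_to_term(blob, [:safe])

round_trips? = restored == session
# round_trips? => true
```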

Worked examples

The examples/ directory ships 15 runnable scripts that double as integration tests. Each is self-asserting and runs against a real provider. See examples/README.md for the full table; the deeper-dive guides cross-link the relevant scripts at the bottom of each section.

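For narrative walkthroughs, jump to a guide. Those referenced in this README are guides/streaming.md, guides/tools.md, guides/sessions.md, and guides/multi_tenant_keys.md.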

Real providers

ALLM ships three production adapters: ALLM.Providers.OpenAI, ALLM.Providers.Anthropic, and ALLM.Providers.Gemini.

Configure via env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY) or per-call:

{:ok, response} = ALLM.generate(engine, request, api_key: tenant_key)

The per-call :api_key opt has the highest precedence in ALLM.Keys's five-level resolution chain — it overrides env vars, app config, and the runtime store. The engine itself is safe to cache and share across tenants. See guides/multi_tenant_keys.md for the full chain.
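The precedence rule can be pictured with a deliberately reduced lookup. The sketch below models only two of the five levels (per-call option first, then an environment variable). `KeySketch`, `resolve/2`, and the `SKETCH_API_KEY` variable are hypothetical and do not reflect ALLM.Keys's actual code or the position of the remaining levels.

```elixir
defmodule KeySketch do
  # Hypothetical two-level lookup: a per-call :api_key wins, otherwise the
  # named environment variable is used. The real chain has five levels.
  def resolve(opts, env_var) do
    case Keyword.get(opts, :api_key) || System.get_env(env_var) do
      nil -> {:error, :missing_api_key}
      key -> {:ok, key}
    end
  end
end

# Illustrative variable name, set here so the sketch is self-contained.
System.put_env("SKETCH_API_KEY", "from-env")

{:ok, per_call} = KeySketch.resolve([api_key: "tenant-key"], "SKETCH_API_KEY")
{:ok, fallback} = KeySketch.resolve([], "SKETCH_API_KEY")
# per_call => "tenant-key", fallback => "from-env"
```

Since the key is resolved per call rather than stored on the engine, one cached engine can serve many tenants.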

To run the bundled live-call examples:

OPENAI_API_KEY=sk-...     mix run examples/run_all.exs
ANTHROPIC_API_KEY=sk-...  ALLM_PROVIDER=anthropic mix run examples/run_all.exs
GEMINI_API_KEY=...        ALLM_PROVIDER=gemini    mix run examples/run_all.exs

Compatibility

  • Elixir ~> 1.17
  • Erlang/OTP 27+

ALLM follows semantic versioning. Within v0.x, public APIs and on-disk session shapes are stable across minor releases — we'll bump major before any breaking change.

Development

mix deps.get
mix compile
mix test                  # full suite (80% coverage threshold)
mix format
mix credo --strict
mix dialyzer
iex -S mix

The included dev container installs a compatible toolchain automatically.

License

MIT.