Multi-tenant keys (BYOK)


In a multi-tenant SaaS, where every customer brings their own LLM API key, the engine must NOT hold a key. Engines round-trip through ETF and JSON, so a key stored on the engine ends up in your job queue, your session store, and your audit log. ALLM's resolution chain defers credential lookup to call time and lets you swap keys per request.

This guide covers ALLM.Keys's five-level resolution chain, the per-call :api_key opt, app config, environment variables, custom resolvers, and the BYOK pattern in practice.

Resolution order

When an adapter needs an API key, ALLM.Keys.get/2 walks five sources in priority order. The first that returns a value wins:

  1. Per-call: ALLM.generate(engine, request, api_key: "sk-...")
  2. Engine :keys resolver: function, map, or resolver module on the engine
  3. ALLM.Keys.put/2 runtime store: global Agent (use sparingly)
  4. Application config: config :allm, :keys, [openai: "sk-..."]
  5. Environment variable: provider-specific default

If no source returns a key, the adapter raises ALLM.Error.AdapterError with reason: :authentication.
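The first-match-wins rule can be pictured with a small, self-contained sketch. This is not ALLM's implementation: KeyChainSketch, the closure-per-level shape, and the {:error, :authentication} return (where ALLM's adapter raises instead) are all hypothetical stand-ins.

```elixir
defmodule KeyChainSketch do
  @moduledoc """
  Hypothetical first-match-wins key lookup (not ALLM's implementation).
  Each source is a zero-arity function returning {:ok, key} or :error.
  """

  @spec resolve([(-> {:ok, binary()} | :error)]) ::
          {:ok, binary()} | {:error, :authentication}
  def resolve(sources) do
    # Try sources in order; stop at the first {:ok, key}.
    Enum.find_value(sources, {:error, :authentication}, fn source ->
      case source.() do
        {:ok, key} when is_binary(key) -> {:ok, key}
        :error -> nil
      end
    end)
  end
end

# Hypothetical inputs for the five levels, in priority order.
call_opts = []
engine_keys = %{openai: "sk-engine"}

sources = [
  fn -> Keyword.fetch(call_opts, :api_key) end,   # 1. per-call opt (absent here)
  fn -> Map.fetch(engine_keys, :openai) end,      # 2. engine :keys map
  fn -> :error end,                               # 3. runtime store (empty)
  fn -> :error end,                               # 4. app config (empty)
  fn -> System.fetch_env("OPENAI_API_KEY") end    # 5. environment variable
]

KeyChainSketch.resolve(sources)
# => {:ok, "sk-engine"}
```

Level 2 answers first, so the environment variable is never read. With call_opts = [api_key: "sk-call"], level 1 would win instead.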

Per-call (the BYOK primitive)

The highest-priority source is the per-call :api_key opt:

engine = ALLM.Engine.new(adapter: ALLM.Providers.OpenAI, model: "gpt-4.1-mini")

{:ok, response} = ALLM.generate(engine, request, api_key: tenant.openai_key)

The engine itself never holds the key. Cache the engine, share it across processes, persist it; the key is supplied per request.

Available on every entry point: generate/3, stream_generate/3, step/3, stream_step/3, chat/3, stream/3, Session.start/3, Session.reply/4, Session.continue/3, generate_image/3, edit_image/4, image_variations/3.

Engine resolver

For static deployments where one engine maps to one provider with one key, set the resolver at engine construction:

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  model: "gpt-4.1-mini",
  keys: %{openai: System.fetch_env!("OPENAI_API_KEY")}
)

Or pass a function, which is re-evaluated on every call (useful for rotating credentials):

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  model: "gpt-4.1-mini",
  keys: fn :openai -> MyApp.Vault.fetch!(:openai_key) end
)

The resolver receives the provider's key tag (:openai, :anthropic, :gemini, or whatever a custom adapter declares) and must return a binary key.
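Both resolver forms answer the same question: "what is the key for this provider tag?" The hypothetical sketch below (EngineKeysSketch.lookup/2 is not part of ALLM) normalizes a map and a one-arity function into the same lookup, and uses an Erlang :counters ref only to show that a function resolver runs again on every lookup, which is what makes it suitable for rotating credentials.

```elixir
defmodule EngineKeysSketch do
  @moduledoc "Hypothetical: treat a map or a 1-arity function as a key lookup."

  # A map resolver is a plain, static lookup.
  def lookup(keys, tag) when is_map(keys), do: Map.fetch(keys, tag)

  # A function resolver is invoked on every lookup and must return a binary.
  def lookup(keys, tag) when is_function(keys, 1) do
    case keys.(tag) do
      key when is_binary(key) -> {:ok, key}
      _other -> :error
    end
  end
end

# Simulated rotating secret: each call bumps a counter and returns a new key.
counter = :counters.new(1, [])

rotating = fn :openai ->
  :counters.add(counter, 1, 1)
  "sk-rotated-#{:counters.get(counter, 1)}"
end

EngineKeysSketch.lookup(%{openai: "sk-static"}, :openai)  # => {:ok, "sk-static"}
EngineKeysSketch.lookup(rotating, :openai)                 # => {:ok, "sk-rotated-1"}
EngineKeysSketch.lookup(rotating, :openai)                 # => {:ok, "sk-rotated-2"}
```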

Application config

Library-wide defaults belong in config/runtime.exs:

config :allm, :keys,
  openai: System.fetch_env!("OPENAI_API_KEY"),
  anthropic: System.fetch_env!("ANTHROPIC_API_KEY"),
  gemini: System.fetch_env!("GEMINI_API_KEY")

This is the shape you want for single-tenant apps where every call uses the same key. Multi-tenant apps should NOT use it; the per-call override is the right primitive.
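In plain Elixir, the keyword list set by config :allm, :keys is readable at runtime with Application.get_env/3. The sketch below assumes a lookup of that shape purely for illustration (ALLM's internal read may differ), and uses Application.put_env/3 as a stand-in for config/runtime.exs so the snippet runs on its own.

```elixir
# Stand-in for `config :allm, :keys, openai: "sk-..."` in config/runtime.exs.
Application.put_env(:allm, :keys, openai: "sk-from-config")

# Hypothetical lookup: read the keyword list, then fetch one provider tag.
lookup_config_key = fn tag ->
  :allm
  |> Application.get_env(:keys, [])
  |> Keyword.fetch(tag)
end

lookup_config_key.(:openai)     # => {:ok, "sk-from-config"}
lookup_config_key.(:anthropic)  # => :error, so a chain would fall through
```

Because there is exactly one value per provider for the whole node, every tenant would share it, which is why this level suits single-tenant apps only.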

Environment variables

Each provider has a default env var:

  • OpenAI → OPENAI_API_KEY
  • Anthropic → ANTHROPIC_API_KEY
  • Gemini → GEMINI_API_KEY

If nothing higher in the chain matches, ALLM.Keys reads the env var at call time. That is adequate for scripts and one-shot tools, but insufficient for multi-tenant production.
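A minimal sketch of the tag-to-variable defaults listed above, read at call time with System.fetch_env/1. EnvKeySketch is hypothetical and only mirrors the documented names; ALLM's actual lookup may be structured differently.

```elixir
defmodule EnvKeySketch do
  @moduledoc "Hypothetical: provider tag to default env var, read at call time."

  @env_vars %{
    openai: "OPENAI_API_KEY",
    anthropic: "ANTHROPIC_API_KEY",
    gemini: "GEMINI_API_KEY"
  }

  # Reads the variable on every call instead of caching it at startup.
  # Returns :error for unknown tags or unset variables.
  def fetch(tag) do
    with {:ok, var} <- Map.fetch(@env_vars, tag),
         {:ok, key} <- System.fetch_env(var) do
      {:ok, key}
    end
  end
end

System.put_env("GEMINI_API_KEY", "gemini-test-key")
EnvKeySketch.fetch(:gemini)       # => {:ok, "gemini-test-key"}
EnvKeySketch.fetch(:my_provider)  # => :error (no default env var for this tag)
```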

Custom resolver behaviour

For non-trivial cases — Vault integration, dynamic key rotation, per-tenant override on a shared engine — implement the ALLM.Keys.Resolver behaviour:

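defmodule MyApp.LLMKeys do
  @behaviour ALLM.Keys.Resolver

  # Per-tenant OpenAI key. Process.get/1 reads this process's dictionary,
  # so the caller must have run Process.put(:current_tenant, tenant)
  # in the same process before the call.
  @impl true
  def fetch(:openai, _opts) do
    case Process.get(:current_tenant) do
      nil -> :error
      tenant -> {:ok, MyApp.Vault.openai_key(tenant)}
    end
  end

  # System.fetch_env/1 already returns {:ok, value} or :error,
  # so an unset variable falls through instead of raising.
  def fetch(:anthropic, _opts), do: System.fetch_env("ANTHROPIC_API_KEY")

  # Any other provider: fall through to the next link in the chain.
  def fetch(_provider, _opts), do: :error
end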

Wire it on the engine:

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  model: "gpt-4.1-mini",
  keys: MyApp.LLMKeys
)

fetch/2 returns {:ok, binary} on a hit, or :error to fall through to the next link in the chain.
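One caveat with the Process.get(:current_tenant) approach in MyApp.LLMKeys: the process dictionary is private to each process, so the tenant is only visible if the resolver runs in the same process that stored it. The standalone sketch below shows this with plain Process and Task calls; the tenant_key closure is a hypothetical stand-in for the resolver, and whether ALLM invokes resolvers in the caller's process is not stated here, so treat it as something to verify for your setup.

```elixir
# Hypothetical resolver body in the same shape as MyApp.LLMKeys.fetch/2.
tenant_key = fn ->
  case Process.get(:current_tenant) do
    nil -> :error
    tenant -> {:ok, "sk-for-#{tenant}"}
  end
end

Process.put(:current_tenant, "acme")

# Same process: the tenant stored above is visible.
same_process = tenant_key.()

# A Task is a different process with its own, empty dictionary.
other_process = Task.async(fn -> tenant_key.() end) |> Task.await()

IO.inspect(same_process)   # {:ok, "sk-for-acme"}
IO.inspect(other_process)  # :error
```

When a call may cross a process boundary, passing the key explicitly through the per-call :api_key opt sidesteps the problem.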

The BYOK pattern in practice

A canonical multi-tenant SaaS using ALLM looks like this:

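defmodule MyApp.Chat do
  # Built once at compile time as a module attribute.
  # It carries no credentials, so it is safe to share and persist.
  @engine ALLM.Engine.new(
    adapter: ALLM.Providers.OpenAI,
    model: "gpt-4.1-mini"
  )

  def ask(tenant_id, message) do
    tenant = MyApp.Tenants.get!(tenant_id)

    # The tenant's key is supplied per call and never stored on @engine.
    ALLM.chat(@engine, [ALLM.user(message)], api_key: tenant.openai_key)
  end
end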

The engine is a module attribute: built once at compile time and embedded in the compiled module. The key arrives per call. Because the engine never holds a key, a crash dump, an ETF dump of the engine, or a log line that prints the engine cannot expose tenant credentials.

What NOT to do

# DON'T put per-tenant keys on the engine.
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  keys: %{openai: tenant.openai_key}  # leaks into ETF, JSON, crash dumps
)

# DON'T use ALLM.Keys.put/2 for BYOK.
ALLM.Keys.put(:openai, tenant.openai_key)
# ^^ this is a globally-named Agent. Two concurrent requests for two
# different tenants race — request B reads request A's key.
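The race is easy to reproduce with a plain Agent standing in for a global key store. This sketch does not use ALLM.Keys; :key_store_sketch is a hypothetical name, and the interleaving is written out step by step so the outcome is deterministic rather than timing-dependent.

```elixir
# A globally named Agent: one slot shared by every process on the node.
{:ok, _pid} = Agent.start_link(fn -> nil end, name: :key_store_sketch)

put_key = fn key -> Agent.update(:key_store_sketch, fn _old -> key end) end
get_key = fn -> Agent.get(:key_store_sketch, & &1) end

# Request A (tenant A) stores its key...
put_key.("sk-tenant-a")
# ...request B (tenant B) stores its key before A has made its call...
put_key.("sk-tenant-b")
# ...and request A now reads the shared slot.
key_used_by_a = get_key.()

IO.inspect(key_used_by_a)  # "sk-tenant-b": tenant A's request goes out with B's key
```

A per-call key has no shared slot to overwrite: each request carries its own value as an argument.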

ALLM.Keys.put/2 is for development and single-tenant scripts. For multi-tenant production, ALWAYS use the per-call opt or a custom resolver.

Verifying keys aren't on engines

ALLM's tests verify this invariant — if you persist an engine, no key material appears in the binary. You can verify locally:

iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [script: [{:text, "ok"}, {:finish, :stop}]]
...> )
iex> binary = :erlang.term_to_binary(engine)
iex> String.contains?(binary, "sk-")
false

(With Fake there's no key to leak. With a real provider, do the same check after constructing the engine — there should be no key material in the term.)
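To see what a real leak looks like, the sketch below serializes plain maps standing in for engines (the map shapes and the "sk-leaked-123" value are hypothetical). Uncompressed ETF stores binaries byte for byte, so a byte-level search with :binary.match/2 finds the key. Searching the output of inspect/1 does not: the ETF binary starts with byte 131, so inspect renders it as <<131, ...>> integers rather than text.

```elixir
# Hypothetical stand-ins: one engine wrongly holds a key, one does not.
leaky_engine = %{adapter: :openai, model: "gpt-4.1-mini", keys: %{openai: "sk-leaked-123"}}
clean_engine = %{adapter: :openai, model: "gpt-4.1-mini"}

leaky_etf = :erlang.term_to_binary(leaky_engine)
clean_etf = :erlang.term_to_binary(clean_engine)

# Byte-level search over the serialized term.
contains_bytes = fn etf, needle -> :binary.match(etf, needle) != :nomatch end

contains_bytes.(leaky_etf, "sk-leaked-123")  # => true
contains_bytes.(clean_etf, "sk-")            # => false

# Searching inspect/1 output misses the leak entirely.
String.contains?(inspect(leaky_etf), "sk-leaked-123")  # => false
```

For a real provider, searching for the actual key value rather than a prefix gives the strongest check.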

Where to next

  • getting_started.md: the quick install and first-call tour.
  • errors_and_retries.md: the :authentication reason and recovery.
  • examples/README.md § "SaaS bring-your-own-key (BYOK)": the runnable pattern.
  • ALLM.Keys module docs: the full API reference.