ALLM.Providers.Fake is the deterministic, scripted adapter that ships with the library. It's the canonical test vehicle — fast (~50µs per call), serializable, requires no network, and passes every conformance suite that real provider adapters do.

This guide consolidates the script-entry vocabulary, the cursor model, and the test-only :usage / :record opts. Reach for it whenever you write a test against ALLM's orchestration layer.

When to reach for it

Use ALLM.Providers.Fake for every orchestration test:

  • chat/3 / step/3 flows including tool execution.
  • Streaming tests (stream/3, stream_step/3).
  • Session state transitions (:idle → :awaiting_tools → :completed).
  • Error-path tests (rate limits, content filters, mid-stream failures).
  • Multi-turn loop bound tests (:max_turns, :halt_when, ask-user).

Use real-provider wire tests (@tag :wire, Bypass/Plug.Test) ONLY when you're testing request/response byte-shape. For everything else, the Fake is faster, deterministic, and decoupled from provider quirks.

Script-entry vocabulary

A script is a list of tagged tuples — each tuple describes one event the Fake will produce. Two disjoint vocabularies exist; the leading tag disambiguates.

Spec entries (user-facing)

| Tag | Shape | Emits |
|---|---|---|
| {:text, s} | binary | :text_delta (streaming) / accumulates text (non-streaming) |
| {:tool_call, kw} | keyword with :id, :name, :arguments | :tool_call_completed + sets finish_reason: :tool_calls |
| {:tool_call_delta, kw} | keyword with :id, :arguments_delta | :tool_call_delta |
| {:usage, map} | map of %Usage{} fields | sets response.usage (non-streaming) / metadata.usage on :message_completed (streaming) |
| {:raw_chunk, term} | opaque | :raw_chunk |
| {:finish, reason} | atom | terminal :message_completed |
| {:error, term} | atom (legal reason) or any term | :error event (mid-stream) |
| {:delay, ms} | non-neg int | Process.sleep(ms); no event |
| {:sleep, ms} | non-neg int | deprecated alias of :delay |

Conformance-harness entries

| Tag | Shape | Notes |
|---|---|---|
| {:ok, map} | a %Response{}-shaped map | one entry per call |
| {:error, reason, opts} | 3-tuple | hands off to AdapterError.new/2 |
| {:text_delta, s} | streaming-only | identical to {:text, s} |
| {:preflight_error, reason, opts} | streaming-only | synchronous {:error, _} from stream/2 |
| {:error_event, reason, opts} | streaming-only | mid-stream :error event |
| {:stream_error, reason, opts} | streaming-only | %StreamError{} mid-stream |

The full grammar lives in ALLM.Providers.Fake.Script's moduledoc.
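To make the non-streaming column of the spec-entry table concrete, here is a toy fold over a script. It is a sketch only: ScriptSketch and the plain map it returns are made up for illustration, it handles just four tags, and it is not the Fake's actual interpreter.

```elixir
defmodule ScriptSketch do
  # Toy model of the non-streaming path; other tags are deliberately omitted
  # and would raise FunctionClauseError here.
  def run(script) do
    initial = %{text: "", tool_calls: [], finish_reason: nil}

    Enum.reduce_while(script, initial, fn
      # {:text, s} accumulates text
      {:text, s}, acc when is_binary(s) ->
        {:cont, %{acc | text: acc.text <> s}}

      # {:tool_call, kw} records the call and sets finish_reason: :tool_calls
      {:tool_call, kw}, acc ->
        call = Map.new(kw)
        {:cont, %{acc | tool_calls: acc.tool_calls ++ [call], finish_reason: :tool_calls}}

      # {:delay, ms} sleeps and produces nothing
      {:delay, ms}, acc when is_integer(ms) and ms >= 0 ->
        Process.sleep(ms)
        {:cont, acc}

      # {:finish, reason} is terminal
      {:finish, reason}, acc when is_atom(reason) ->
        {:halt, %{acc | finish_reason: reason}}
    end)
  end
end

response = ScriptSketch.run([{:text, "he"}, {:text, "llo"}, {:finish, :stop}])
```

Here response.text is "hello" and response.finish_reason is :stop.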

Construction

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.Fake,
  adapter_opts: [
    script: [{:text, "ok"}, {:finish, :stop}]
  ]
)

For multi-call tests, use :scripts (a list of per-call lists):

adapter_opts: [
  scripts: [
    [{:tool_call, id: "c0", name: "echo", arguments: %{"x" => 1}}, {:finish, :tool_calls}],
    [{:text, "done"}, {:finish, :stop}]
  ]
]

Streaming uses :stream_script with the same shapes (it accepts either a flat list for a single call or a list-of-lists for multi-call).
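One way to picture the "flat list or list-of-lists" rule is a normalizer that always yields one script per call. ScriptShape.per_call/1 is a hypothetical helper written for this guide, not a function the library exposes.

```elixir
defmodule ScriptShape do
  # A list whose first element is itself a list is already one script per call.
  def per_call([first | _] = scripts) when is_list(first), do: scripts

  # A flat list of tagged tuples is a single call's script, so wrap it.
  def per_call(script) when is_list(script), do: [script]
end

single = ScriptShape.per_call([{:text, "ok"}, {:finish, :stop}])
multi = ScriptShape.per_call([[{:text, "a"}, {:finish, :stop}], [{:text, "b"}, {:finish, :stop}]])
```

Both results are lists of per-call scripts: single has one element, multi has two.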

Cursor patterns

Multi-call scripts advance a per-process cursor on every call. By default the cursor lives in the process dictionary keyed by :erlang.phash2(scripts) — isolated per ExUnit test process (async: true), GC'd on pid-down, zero-setup for the common case.

Footgun: content-equal scripts collide

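Two engines built with byte-identical :scripts values in the same process share the cursor, because equal content hashes to the same key. Workaround: give each engine its own explicit cursor via :script_cursor: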

cursor = ALLM.Providers.Fake.start_script_cursor()

engine1 = ALLM.Engine.new(
  adapter: ALLM.Providers.Fake,
  adapter_opts: [scripts: scripts, script_cursor: cursor]
)

start_script_cursor/0 returns an Agent pid; cursor_index/1 reads it so a test can assert how many calls have been consumed.
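The collision is easy to reproduce with a small model of a hash-keyed, process-dictionary cursor. CursorSketch and its key shape are assumptions for illustration; only the use of :erlang.phash2/1 over the scripts value comes from the description above.

```elixir
defmodule CursorSketch do
  # Model of a per-process cursor keyed by a hash of the scripts value.
  # The {:cursor_sketch, hash} key is made up for this example.
  def next_script(scripts) do
    key = {:cursor_sketch, :erlang.phash2(scripts)}
    index = Process.get(key, 0)
    Process.put(key, index + 1)
    Enum.at(scripts, index)
  end
end

scripts_a = [[{:text, "first"}], [{:text, "second"}]]
scripts_b = [[{:text, "first"}], [{:text, "second"}]]

first = CursorSketch.next_script(scripts_a)
# scripts_b is a separate binding with equal content, so it hashes to the
# same key and continues where scripts_a left off.
second = CursorSketch.next_script(scripts_b)
```

The second call returns the second script even though it was made with a "different" scripts value, which is the shared-cursor behavior an explicit :script_cursor avoids.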

Cross-process cursor sharing

When a test dispatches the adapter call across processes (Task.async/1), the explicit cursor is load-bearing — process-dict isolation would otherwise reset the cursor for each Task.
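The difference is visible with plain OTP primitives. The sketch below uses no ALLM calls; the :cursor_sketch key and the Agent are stand-ins for the two cursor styles.

```elixir
# Process-dictionary state does not follow work into a Task:
Process.put(:cursor_sketch, 2)
seen_in_task = Task.await(Task.async(fn -> Process.get(:cursor_sketch, 0) end))

# An Agent-backed cursor is shared by every process that holds its pid:
{:ok, cursor} = Agent.start_link(fn -> 0 end)
advance = fn -> Agent.get_and_update(cursor, fn i -> {i, i + 1} end) end

indices =
  1..3
  |> Enum.map(fn _ -> Task.async(advance) end)
  |> Enum.map(&Task.await/1)
  |> Enum.sort()

consumed = Agent.get(cursor, & &1)
```

The Task sees the default 0 rather than 2, while the three Tasks sharing the Agent consume indices 0, 1, and 2 between them.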

The :usage opt (Phase 21.2)

adapter_opts[:usage] materializes a %ALLM.Usage{} on every response without writing the usage entry per script:

adapter_opts: [
  script: [{:text, "ok"}, {:finish, :stop}],
  usage: [input_tokens: 12, output_tokens: 4]
]

Accepts a pre-built %Usage{} or a keyword list (normalized via Usage.new/1). The opt wins over any per-script {:usage, _} entry for the same call.
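The precedence rule can be sketched as follows. UsageSketch is hypothetical, and plain maps stand in for %Usage{} so the example runs without the library.

```elixir
defmodule UsageSketch do
  # Stand-in for normalization: maps pass through, keyword lists become maps.
  def normalize(usage) when is_map(usage), do: usage
  def normalize(usage) when is_list(usage), do: Map.new(usage)

  # The adapter-level :usage opt, when present, wins over a per-script entry.
  def resolve(adapter_opts, script) do
    from_script =
      Enum.find_value(script, fn
        {:usage, u} -> normalize(u)
        _ -> nil
      end)

    case Keyword.fetch(adapter_opts, :usage) do
      {:ok, u} -> normalize(u)
      :error -> from_script
    end
  end
end

script = [{:text, "ok"}, {:usage, %{input_tokens: 1}}, {:finish, :stop}]

with_opt = UsageSketch.resolve([usage: [input_tokens: 12, output_tokens: 4]], script)
without_opt = UsageSketch.resolve([], script)
```

With the opt set, the script's own {:usage, _} entry is ignored; without it, the script entry is used.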

On streaming, the Usage rides on the :message_completed payload's metadata.usage key (an additive payload-key extension, not a new event variant). ALLM.StreamCollector.apply_event/2 copies it onto state.usage, so collecting the stream into a response produces a %Response{usage: _}.

A per-script {:usage, _} entry behaves the same on streaming: it accumulates into metadata.usage rather than emitting a :raw_chunk. Real adapters emitting {:raw_chunk, {:usage, _}} keep their existing path; the change is scoped to Fake's {:usage, _} entry.

The :record opt (Phase 21.2)

adapter_opts[:record] accepts a pid that receives {:allm_fake_record, %Request{}, opts} verbatim before script interpretation runs. The recording fires once per call: generate/2 sends before interpreting the script, and stream/2 sends before opening the stream.

test "tool call sends the right schema" do
  me = self()

  engine = ALLM.Engine.new(
    adapter: ALLM.Providers.Fake,
    adapter_opts: [
      script: [{:text, "ok"}, {:finish, :stop}],
      record: me
    ],
    tools: [my_tool]
  )

  {:ok, _} = ALLM.chat(engine, [ALLM.user("trigger")])

  assert_receive {:allm_fake_record, %ALLM.Request{tools: [tool]}, _opts}
  assert tool.schema["properties"]["city"]["type"] == "string"
end

opts are forwarded verbatim — no key scrubbing. The caller owns the opts they passed in; redact via Keyword.delete/2 before asserting if needed. A dead recording pid raises ArgumentError — a dead pid is a test bug.
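A redaction step before the assertion might look like this. The message is simulated with send/2 so the snippet runs on its own; :api_key and :temperature are hypothetical opts, and a plain map stands in for the %Request{}.

```elixir
# Simulate what the recording pid would receive.
send(self(), {:allm_fake_record, %{messages: ["trigger"]}, [api_key: "secret", temperature: 0.2]})

redacted =
  receive do
    {:allm_fake_record, _request, opts} ->
      # Drop the sensitive key before comparing or printing opts.
      Keyword.delete(opts, :api_key)
  after
    100 -> raise "no record message received"
  end
```

After redaction, only the non-sensitive opts remain to assert against.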

Cleanup observation

For streaming tests asserting that Stream.resource/3's after_fun runs:

ref = :counters.new(1, [:atomics])

{:ok, stream} = ALLM.Providers.Fake.stream(req,
  adapter_opts: [script: [...], cleanup_observer: ref])

_ = Enum.take(stream, 2)
assert :counters.get(ref, 1) == 1

The counter increments at most once per stream (on consumer halt, reducer throws, or Stream.run/1 scope exit). Brutal Process.exit(pid, :kill) skips cleanup per OTP design — don't simulate :kill in tests.
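The underlying mechanism is standard Stream.resource/3 behavior and can be checked without the Fake. In this sketch an infinite counter stream plays the role of the scripted stream.

```elixir
counter = :counters.new(1, [:atomics])

stream =
  Stream.resource(
    # start_fun: initial state
    fn -> 1 end,
    # next_fun: emit the current number, then advance
    fn n -> {[n], n + 1} end,
    # after_fun: runs once when the consumer halts or the stream ends
    fn _n -> :counters.add(counter, 1, 1) end
  )

taken = Enum.take(stream, 2)
cleanups = :counters.get(counter, 1)
```

Enum.take/2 halts the stream after two elements, which triggers after_fun exactly once, so cleanups is 1.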

Retry simulation

adapter_opts[:retry_until_call] makes the first n - 1 calls fail transiently (with :timeout) and the n-th call succeed:

adapter_opts: [
  script: [{:text, "ok"}, {:finish, :stop}],
  retry_until_call: 3
]

generate/2 retries automatically under the default policy. stream/2 emits the transient failure as a mid-stream {:error, _} event so the consumer reduces to %Response{finish_reason: :error} per the mid-stream error contract (ALLM.Runner / chat/3 do not retry the streaming arm — spec §6.1).
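The non-streaming behavior can be modeled with a call counter and a small retry loop. RetrySketch and its max_attempts policy are assumptions for illustration; the library's default retry policy may differ.

```elixir
defmodule RetrySketch do
  # Model: calls 1..n-1 fail with :timeout, call n succeeds.
  def call(counter, succeed_on) do
    attempt = Agent.get_and_update(counter, fn i -> {i + 1, i + 1} end)
    if attempt < succeed_on, do: {:error, :timeout}, else: {:ok, attempt}
  end

  # Hypothetical policy: retry on :timeout until max_attempts calls are used up.
  def with_retry(fun, max_attempts) when max_attempts > 0 do
    case fun.() do
      {:error, :timeout} when max_attempts > 1 -> with_retry(fun, max_attempts - 1)
      other -> other
    end
  end
end

{:ok, counter} = Agent.start_link(fn -> 0 end)
result = RetrySketch.with_retry(fn -> RetrySketch.call(counter, 3) end, 5)

# With too few attempts, the transient failure surfaces to the caller.
{:ok, short_counter} = Agent.start_link(fn -> 0 end)
exhausted = RetrySketch.with_retry(fn -> RetrySketch.call(short_counter, 3) end, 2)
```

The first run succeeds on the third call; the second gives up after two calls and returns the :timeout error.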

Cross-process engine injection

When a test fans work out across Task processes (Task.async/1, Task.async_stream/2) and you want the workers to see the test's engine, use ALLM.Sandbox.set_engine/1:

test "fan-out workers use the test engine" do
  ALLM.Sandbox.set_engine(fake_engine())

  results =
    ["a", "b", "c"]
    |> Task.async_stream(fn input ->
      ALLM.generate(ALLM.Sandbox.get_engine(), ALLM.request([ALLM.user(input)]))
    end)
    |> Enum.map(fn {:ok, r} -> r end)

  assert length(results) == 3
end

Sandbox.get_engine/0 walks $callers so worker processes inherit the registering ancestor's engine — same idiom as Mox.allow/3 and Ecto.Adapters.SQL.Sandbox.allow/3.
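A minimal model of the $callers walk, assuming an ETS table keyed by the registering pid. SandboxSketch is illustrative and is not how ALLM.Sandbox is necessarily implemented; the only fact relied on is that Task sets the :"$callers" process-dictionary key in the processes it spawns.

```elixir
defmodule SandboxSketch do
  @table :sandbox_sketch

  def init do
    :ets.new(@table, [:named_table, :public, :set])
    :ok
  end

  # Register the engine under the calling process's pid.
  def set_engine(engine), do: :ets.insert(@table, {self(), engine})

  # Check the current process first, then each ancestor listed in $callers.
  def get_engine do
    [self() | Process.get(:"$callers", [])]
    |> Enum.find_value(fn pid ->
      case :ets.lookup(@table, pid) do
        [{^pid, engine}] -> engine
        [] -> nil
      end
    end)
  end
end

SandboxSketch.init()
SandboxSketch.set_engine(:fake_engine)

seen =
  ["a", "b", "c"]
  |> Task.async_stream(fn _input -> SandboxSketch.get_engine() end)
  |> Enum.map(fn {:ok, engine} -> engine end)
```

Each worker finds the engine registered by the process that spawned it, without any explicit hand-off.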

Where to next

  • streaming.md — the event-shape vocabulary the scripts emit.
  • tools.md — tool-loop tests against scripted tool calls.
  • sessions.md — multi-turn persistence tests.
  • ALLM.Providers.Fake and ALLM.Providers.Fake.Script moduledocs — reference-level documentation of every entry tag.