A "tool" is a function the model can call: a weather lookup, a database
query, an action in your app. ALLM ships a synchronous tool loop that
handles the round-trip: the model emits a tool call, your code runs the
tool, the result is fed back to the model, and the model produces a final
reply. This guide covers the auto-loop, manual mode, per-tool manual
control, and the {:ask_user, prompt, metadata} suspension protocol.
Declaring a tool
A tool has a name, a description, and a JSON Schema for its arguments:

```elixir
weather = ALLM.tool(
  name: "get_weather",
  description: "Returns the current weather for a city.",
  schema: %{
    "type" => "object",
    "properties" => %{
      "city" => %{"type" => "string"}
    },
    "required" => ["city"]
  }
)
```

ALLM.tool/1 returns a %ALLM.Tool{} struct. Pass it to
ALLM.request/2 (or directly to ALLM.chat/3) via the :tools option:

```elixir
req = ALLM.request([ALLM.user("Weather in Boston?")], tools: [weather])
```

The model now knows the tool exists. To actually run it when the model asks for it, configure a tool executor on the engine.
The default tool executor
ALLM.ToolExecutor.Default ships with the library. It takes a map from
tool name to handler function (1-arity here; see Handler context (arity-2) below for the 2-arity form):

```elixir
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  model: "gpt-4.1-mini",
  tool_executor: {ALLM.ToolExecutor.Default, tools: %{
    "get_weather" => fn %{"city" => city} ->
      {:ok, %{temperature: 62, conditions: "sunny", city: city}}
    end
  }}
)
```

The function receives the parsed argument map and must return one of:

- {:ok, term}: a JSON-encodable result. The default encoder is ALLM.ToolResultEncoder.JSON.
- {:error, reason}: the tool hit a domain error. By default the chat loop continues by feeding the error back to the model, which can recover or give up (see :on_tool_error policy below).
- {:ask_user, prompt, metadata}: suspend the loop and ask the user.
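Each of the three return shapes leads to a different next step in the loop. The sketch below spells out that mapping as plain Elixir under the default :on_tool_error policy. ToolReturnSketch, the action atoms it returns, and the example handlers are hypothetical illustrations, not part of ALLM's API.

```elixir
defmodule ToolReturnSketch do
  # Hypothetical: maps each documented handler return shape to what the
  # chat loop does next, assuming the default :on_tool_error policy.
  def next_action({:ok, _result}), do: :feed_result_to_model
  def next_action({:error, _reason}), do: :feed_error_to_model
  def next_action({:ask_user, _prompt, _metadata}), do: :suspend_for_user
end

# Illustrative handlers, one per return shape.
handlers = %{
  "get_weather" => fn %{"city" => city} -> {:ok, %{city: city, temperature: 62}} end,
  "flaky_lookup" => fn _args -> {:error, :upstream_timeout} end,
  "delete_db" => fn _args -> {:ask_user, "Really delete?", %{action: :delete_db}} end
}

ToolReturnSketch.next_action(handlers["get_weather"].(%{"city" => "Boston"}))
```

Any other return value falls outside the contract. In this sketch it raises a FunctionClauseError, which makes a malformed handler fail loudly.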
The auto-loop
Pass the request to chat/3 and the loop handles the round-trip:

```elixir
iex> engine = ALLM.Engine.new(
...>   adapter: ALLM.Providers.Fake,
...>   adapter_opts: [scripts: [
...>     [
...>       {:tool_call, %{id: "call_1", name: "get_weather", args: %{"city" => "Boston"}}},
...>       {:finish, :tool_calls}
...>     ],
...>     [
...>       {:text, "It's 62F and sunny in Boston."},
...>       {:finish, :stop}
...>     ]
...>   ]],
...>   tool_executor: {ALLM.ToolExecutor.Default, tools: %{
...>     "get_weather" => fn _args -> {:ok, %{temperature: 62}} end
...>   }}
...> )
iex> weather = ALLM.tool(name: "get_weather", description: "weather", schema: %{"type" => "object"})
iex> req = ALLM.request([ALLM.user("Weather?")], tools: [weather])
iex> {:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
...>   ALLM.chat(engine, req)
iex> text
"It's 62F and sunny in Boston."
```

The loop made two round-trips. The first produced a tool call, the executor ran the tool, and its result was fed back in. The second produced the final assistant text.
step/3 is the same minus the loop — one round-trip, one
%StepResult{} returned. Use it when you want explicit control over
each iteration.
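The round-trip that chat/3 automates can be modelled as a toy in plain Elixir. In this sketch the "model" is a list of scripted turns, mirroring the Fake adapter's scripts above. Tool results are appended to a history list, and a max_turns cap stops runaway loops. ToyLoop and all the tuple shapes in it are hypothetical stand-ins, not ALLM's internals.

```elixir
defmodule ToyLoop do
  # Scripted turns are {:tool_calls, [call]} or {:text, binary};
  # a call is %{id: id, name: name, args: map}. Hypothetical shapes.
  def run(script, tools, max_turns \\ 8), do: loop(script, tools, [], 0, max_turns)

  # Stop once the turn budget is spent, even if the script has more turns.
  defp loop(_script, _tools, history, round_trips, max_turns) when round_trips >= max_turns,
    do: {:halted, :max_turns, history}

  # Nothing left to say: the toy "model" ran out of scripted turns.
  defp loop([], _tools, history, _round_trips, _max_turns),
    do: {:halted, :script_exhausted, history}

  # A text turn ends the loop; it counts as one more model round-trip.
  defp loop([{:text, text} | _rest], _tools, history, round_trips, _max_turns),
    do: {:ok, text, round_trips + 1, history}

  # A tool-call turn: run each tool, record its result, then ask the model again.
  defp loop([{:tool_calls, calls} | rest], tools, history, round_trips, max_turns) do
    results =
      for %{id: id, name: name, args: args} <- calls do
        {:tool_result, id, Map.fetch!(tools, name).(args)}
      end

    loop(rest, tools, history ++ results, round_trips + 1, max_turns)
  end
end

script = [
  {:tool_calls, [%{id: "call_1", name: "get_weather", args: %{"city" => "Boston"}}]},
  {:text, "It's 62F and sunny in Boston."}
]

tools = %{"get_weather" => fn _args -> {:ok, %{temperature: 62}} end}

ToyLoop.run(script, tools)
```

Calling ToyLoop.run(script, tools, 1) stops after the first round-trip. It behaves roughly like stopping after a single step/3 call, with the tool result recorded but no final text.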
Manual mode (engine-wide)
Sometimes you don't want the loop to run tools at all — you want the
model's tool calls returned to your code so you can audit them, queue
them, or run them in a different process. Pass mode: :manual on the
engine:

```elixir
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  model: "gpt-4.1-mini",
  mode: :manual
)
```

Now chat/3 halts after one round-trip whenever the model emits tool
calls. The %ChatResult{} carries halted_reason: :tool_calls, and the
calls live in the final response's tool_calls field. You're
responsible for executing them, constructing a :tool message
containing each result, and then re-issuing chat/3 with the augmented
thread.
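In manual mode the execution step is yours. The sketch below shows a minimal version of it. It assumes the returned calls can be read as maps with :id, :name and :arguments keys, matching the %ALLM.ToolCall{} fields listed under Handler context. ManualRoundTrip is hypothetical. The plain %{role: :tool, ...} map only stands in for a :tool message; in real code, build the message with whatever constructor ALLM provides.

```elixir
defmodule ManualRoundTrip do
  # Hypothetical helper for manual mode: run each returned tool call yourself
  # and build one result entry per call, keyed by the call id so the model
  # can pair results with requests on the next chat/3 call.
  def execute(tool_calls, tools) do
    Enum.map(tool_calls, fn %{id: id, name: name, arguments: args} ->
      result =
        case Map.fetch(tools, name) do
          {:ok, fun} -> fun.(args)
          :error -> {:error, {:unknown_tool, name}}
        end

      # Plain-map stand-in for a :tool message; not ALLM's real message type.
      %{role: :tool, tool_call_id: id, content: result}
    end)
  end
end

tools = %{"get_weather" => fn %{"city" => city} -> {:ok, %{city: city, temperature: 62}} end}

calls = [
  %{id: "call_1", name: "get_weather", arguments: %{"city" => "Boston"}},
  %{id: "call_2", name: "get_stock_price", arguments: %{"ticker" => "ACME"}}
]

ManualRoundTrip.execute(calls, tools)
```

Turning an unknown tool into an {:error, _} entry, instead of crashing, keeps one result per call id. This lets the model see which request failed.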
Per-tool manual control
Mix-and-match: most tools auto, one tool manual. Set manual: true on
the tool definition:

```elixir
auto_tool = ALLM.tool(
  name: "get_weather",
  description: "Returns the current weather for a city.",
  schema: %{"type" => "object", "properties" => %{"city" => %{"type" => "string"}}}
)

manual_tool = ALLM.tool(
  name: "confirm_action",
  description: "Asks the user to confirm an irreversible action.",
  schema: %{"type" => "object", "properties" => %{"action" => %{"type" => "string"}}},
  manual: true
)

req = ALLM.request([ALLM.user("...")], tools: [auto_tool, manual_tool])
```

Under mode: :auto (the default), the chat orchestrator runs the auto
bucket (calls to tools without manual: true) eagerly. If the model also calls a manual tool in the same
round, the loop halts with halted_reason: :manual_tool_calls and the
manual subset surfaces in metadata.manual_tool_calls (for chat/3 and
stream/3) or in Session.pending_tool_calls (for Session.start/3).
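The auto/manual split within one round can be sketched as a simple partition. This is a hypothetical model of the rule above, not ALLM's orchestrator. It assumes each call is a map with a :name key and that you know which tool names were declared with manual: true.

```elixir
defmodule ToolBuckets do
  # Hypothetical sketch of the per-tool split: calls to tools declared with
  # manual: true go to the manual bucket; everything else runs automatically.
  def split(tool_calls, manual_names) do
    manual = MapSet.new(manual_names)
    Enum.split_with(tool_calls, fn %{name: name} -> not MapSet.member?(manual, name) end)
  end

  # Any manual call in the round means the loop halts after the auto bucket runs.
  def halted_reason([]), do: nil
  def halted_reason([_ | _]), do: :manual_tool_calls
end

calls = [
  %{id: "call_1", name: "get_weather", arguments: %{"city" => "Boston"}},
  %{id: "call_2", name: "confirm_action", arguments: %{"action" => "drop_table"}}
]

{auto, manual} = ToolBuckets.split(calls, ["confirm_action"])
ToolBuckets.halted_reason(manual)
```

Enum.split_with/2 keeps the original call order inside each bucket. That is convenient when you later match results back to call ids.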
After you've handled the manual tool, append a :tool message
containing the result and re-issue chat/3 (or call
Session.submit_tool_result/3 then Session.continue/3).
examples/14_per_tool_manual.exs and
examples/15_per_tool_manual_session.exs are runnable smoke tests of
this flow.
:on_tool_error policy
When a tool returns {:error, reason}, the loop's default behaviour is
to feed the error back to the model and continue. Override this with
:on_tool_error:

```elixir
ALLM.chat(engine, req, on_tool_error: :halt)
```

Legal values:

- :continue (default): feed the error back to the model.
- :halt: halt the loop with halted_reason: :tool_error.
- A 2-arity function fn tool_call, error -> :continue | :halt end: decide per call.
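A per-call policy is an ordinary 2-arity function. The sketch below halts when a tool with real-world side effects fails and lets the model recover from everything else. The tool names are hypothetical. The only assumption about the first argument is that it has a :name field, like %ALLM.ToolCall{}; the error term is ignored.

```elixir
# Hypothetical policy (tool names are illustrative): errors from tools with
# real-world side effects halt the loop; anything else goes back to the model.
side_effecting = ["charge_card", "send_email"]

policy = fn tool_call, _error ->
  if tool_call.name in side_effecting, do: :halt, else: :continue
end
```

You would then pass it as ALLM.chat(engine, req, on_tool_error: policy).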
Ask-user suspension
A tool can return {:ask_user, prompt, metadata} to halt the loop and
wait for human input. The chat loop returns with
halted_reason: :ask_user, and the prompt and metadata are carried on the result.

```elixir
ask_tool = fn _args ->
  {:ask_user, "Confirm deleting the production database?", %{action: :delete_db}}
end
```

Resume by appending the user's reply as a :user message and re-issuing
chat/3, or by calling Session.reply/4 if you're using sessions.
examples/09_ask_user.exs is a runnable smoke test.
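A common shape for an ask-user tool is "confirm first, act second": suspend on the first call, then perform the action once the user has agreed. The sketch below assumes the tool's schema exposes a boolean "confirmed" argument that the model sets after the user's reply. That argument name is a hypothetical convention, not something ALLM defines.

```elixir
# Hypothetical confirm-first handler. Assumes the tool's schema exposes a
# boolean "confirmed" argument that the model sets once the user agrees.
delete_db = fn args ->
  if Map.get(args, "confirmed") == true do
    # The real deletion would happen here; this sketch only reports success.
    {:ok, %{deleted: true}}
  else
    {:ask_user, "Confirm deleting the production database?", %{action: :delete_db}}
  end
end
```

The metadata map travels back to your code with the suspension. In this sketch it lets a caller recognise which action is waiting on the user.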
Streaming tool calls
stream/3 is the streaming version of chat/3. Tool calls arrive as
:tool_call_delta events (the argument blob accumulates) followed by a
:tool_call event when the call is complete. The auto-loop dispatches
the tool, emits a :tool_result event, and continues the loop.
See streaming.md for the full event-shape table.
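The separate :tool_call event matters because a partial fragment such as {"city": is not valid JSON, so arguments can only be parsed once every delta has arrived. The sketch below accumulates fragments per call id. The event tuple shapes, including the :arguments_delta key, are hypothetical; streaming.md documents the real ones.

```elixir
defmodule DeltaAccumulator do
  # Hypothetical event shapes, not ALLM's; see streaming.md for the real ones.
  # Concatenates argument fragments per call id and ignores other events.
  def reduce(events) do
    Enum.reduce(events, %{}, fn
      {:tool_call_delta, %{id: id, arguments_delta: chunk}}, acc ->
        Map.update(acc, id, chunk, &(&1 <> chunk))

      _other, acc ->
        acc
    end)
  end
end

events = [
  {:tool_call_delta, %{id: "call_1", arguments_delta: "{\"city\":"}},
  {:tool_call_delta, %{id: "call_1", arguments_delta: "\"Boston\"}"}},
  {:tool_call, %{id: "call_1", name: "get_weather"}}
]

DeltaAccumulator.reduce(events)
```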
Handler context (arity-2)
A tool handler may be 1-arity (fn args -> ... end) or 2-arity
(fn args, context -> ... end). ALLM detects the arity at dispatch
time and routes accordingly.
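Arity-based routing is cheap in Elixir because is_function/2 checks both that a value is a function and its arity. Here is a minimal sketch of what such dispatch could look like. ArityRouting and the sample context list are hypothetical; this is not ALLM's dispatcher.

```elixir
defmodule ArityRouting do
  # Hypothetical sketch of dispatch-time arity detection; not ALLM's code.
  # is_function/2 checks both that the value is a function and its arity.
  def invoke(handler, args, ctx) when is_function(handler, 2), do: handler.(args, ctx)
  def invoke(handler, args, _ctx) when is_function(handler, 1), do: handler.(args)
end

one_arity = fn %{"city" => city} -> {:ok, city} end

two_arity = fn _args, ctx ->
  user_id = ctx |> Keyword.get(:context, %{}) |> Map.get(:user_id)
  {:ok, user_id}
end

# Illustrative context; the real keys are listed in the table below.
ctx = [context: %{user_id: 7}, session_id: nil, request_id: nil]
```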
The second argument of a 2-arity handler is a keyword list carrying call context. The standard keys provided by
ALLM.ToolExecutor.Default are:

| Key | Type | Notes |
|---|---|---|
| :context | term() | The opaque value passed via ALLM.chat(engine, thread, context: ...) or Session.reply(session, msg, context: ...). Its shape is caller-defined. |
| :session_id | String.t() \| nil | The %Session{}.id when invoked through the Session API; nil for stateless chat/3 and step/3 calls. |
| :tool_call | %ALLM.ToolCall{} | The exact tool call the assistant emitted (:id, :name, :arguments). |
| :engine | %ALLM.Engine{} | The engine driving the call. Handlers that need to issue downstream LLM calls can reuse it via ALLM.generate/3. |
| :request_id | String.t() \| nil | Telemetry-correlation id from the parent span. |
```elixir
handler = fn args, ctx ->
  case Keyword.get(ctx, :context) do
    # lookup_for_user/2 stands in for your own application code.
    %{user_id: id} -> {:ok, lookup_for_user(id, args)}
    _ -> {:ok, args}
  end
end
```

Reach for the 1-arity form when a handler doesn't need context, because it keeps
the call site simple. Custom keys in :context are passed through
unchanged, so tests can inject arbitrary correlation data.
Adapter-call cadence
Each turn of the tool loop consumes two adapter calls: one for the
assistant's tool-call request and one for the post-tool-result
assistant turn. Adapter calls, and with them roughly the token bill, therefore scale with turn_count × 2. A turn
with parallel tool calls still costs one assistant call in each direction:
the number of turns drives the multiplier, not the number of tools called.

A loop running three tool-call turns therefore issues six adapter requests. With
max_turns: 8 (the library default), the upper bound is sixteen calls
per ALLM.chat/3 invocation.
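The cadence rule above is easy to encode next to a budget check. The sketch below simply restates the two-calls-per-turn rule from this section. The Cadence module is hypothetical.

```elixir
defmodule Cadence do
  # Restates the rule above: two adapter calls per tool-loop turn,
  # regardless of how many tools a turn calls in parallel.
  def adapter_calls(turns) when is_integer(turns) and turns >= 0, do: turns * 2

  # Upper bound for one ALLM.chat/3 call under a max_turns cap
  # (8 is the library default according to this guide).
  def max_adapter_calls(max_turns \\ 8), do: adapter_calls(max_turns)
end

Cadence.adapter_calls(3)
Cadence.max_adapter_calls()
```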
Structured response after tool loop
When you need the post-tool-loop assistant turn to return JSON matching
a schema (rather than free-form text), pass both :response_format and
structured_finalize: true:

```elixir
schema = ALLM.json_schema("answer", %{
  "type" => "object",
  "properties" => %{"answer" => %{"type" => "string"}},
  "required" => ["answer"]
})

{:ok, result} =
  ALLM.chat(engine, [ALLM.user("What is 6×7?")],
    response_format: schema,
    structured_finalize: true
  )

{:ok, %{"answer" => "42"}} = Jason.decode(result.final_response.output_text)
```

structured_finalize: true runs a two-pass orchestration: pass 1 runs
the tool loop freely (the model may emit any text or tool calls); pass 2
re-prompts the model with response_format constrained to the schema so
the final turn is guaranteed to match.
The result's metadata carries observability data for the two passes:

- result.metadata.structured_finalize.pass_1_halted: the halt reason pass 1 reached (typically :completed).
- result.metadata.structured_finalize.pass_1_response: pass 1's raw %Response{}, for inspection.
result.steps contains the merged step list from both passes so step
indexes remain stable across the two-pass boundary.
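"Stable indexes across the boundary" means that pass 2's first step continues numbering from pass 1's last step instead of restarting at zero. The sketch below shows a minimal version of that merge. It uses plain maps with hypothetical :kind and :index keys rather than ALLM's step structs, whose fields this guide does not list.

```elixir
defmodule StepMerge do
  # Hypothetical sketch: steps are plain maps here, not ALLM's step structs.
  # Concatenate both passes, then number once, so pass 2 continues from
  # pass 1's last index instead of restarting at 0.
  def merge(pass_1_steps, pass_2_steps) do
    (pass_1_steps ++ pass_2_steps)
    |> Enum.with_index()
    |> Enum.map(fn {step, index} -> Map.put(step, :index, index) end)
  end
end

pass_1 = [%{kind: :tool_call}, %{kind: :tool_result}, %{kind: :text}]
pass_2 = [%{kind: :structured_output}]
StepMerge.merge(pass_1, pass_2)
```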
Where to next
- sessions.md: multi-turn tool flows with persistence.
- streaming.md: tool calls in the event stream.
- examples/03_single_tool_call.exs: runnable single-tool smoke test.
- examples/04_parallel_tool_calls.exs: two tools in one round.
- examples/07_manual_tool_round_trip.exs: engine-wide manual mode.