LLM.stream/3 opens the stream, collects chunks, executes tool calls, and assembles the final LLM.Response automatically, returning it as {:ok, response}.
Basic usage
{:ok, response} = LLM.stream("Tell me a story",
  provider: :openai,
  model: "gpt-4"
)
response.message.content
#=> "Once upon a time..."
The first argument is the prompt, the second is the options keyword list, and the third is an optional callbacks keyword list.
Callbacks
Process each chunk as it arrives:
{:ok, response} = LLM.stream("Write a poem",
  [provider: :openai, model: "gpt-4"],
  on_chunk: fn
    %LLM.Stream.Chunk{text: text} -> IO.write(text)
    %LLM.Stream.Thinking{text: text} -> IO.puts("\n[thinking] #{text}")
    %LLM.Stream.Stop{reason: reason} -> IO.puts("\n[stopped: #{reason}]")
    _ -> :ok
  end
)
on_message
The :on_message callback fires once per completed LLM.Message in conversation order — every assistant turn and every tool result:
{:ok, response} = LLM.stream("What's the weather?",
  [provider: :openai, model: "gpt-4", tools: [WeatherTool]],
  on_message: fn
    %LLM.Message{role: :assistant} = msg -> IO.puts("assistant: #{msg.content}")
    %LLM.Message{role: :tool} = msg -> IO.puts("tool #{msg.name}: #{msg.content}")
  end
)
The full reconstructed conversation is also available afterwards as response.messages.
Avoid sending messages to the calling process's mailbox from inside the callback, as the stream's receive loop will consume and discard unknown messages — use an Agent or ETS table to accumulate instead.
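The Agent approach mentioned above can be sketched as follows. This is a minimal illustration that runs without the library: the simulated messages are plain maps standing in for LLM.Message structs, and the names collector, on_message, and simulated are chosen here for the example only.

```elixir
# Sketch: collect messages in an Agent instead of the caller's mailbox.
{:ok, collector} = Agent.start_link(fn -> [] end)

# A one-argument callback; it prepends each message to the Agent's state.
on_message = fn msg -> Agent.update(collector, fn acc -> [msg | acc] end) end

# Stand-in for the messages the library would deliver (plain maps here,
# not real LLM.Message structs), so the sketch runs on its own.
simulated = [
  %{role: :assistant, content: "Checking the weather..."},
  %{role: :tool, content: "72F and sunny"}
]

Enum.each(simulated, on_message)

# Reverse once at the end to restore conversation order.
collected = Agent.get(collector, &Enum.reverse/1)
Agent.stop(collector)
```

In real use, the same function would presumably be passed as on_message: on_message in the callbacks keyword list, and collected would then hold the messages in the order they were delivered.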
Controlling tool execution
By default, stream/3 automatically executes tool calls and loops. Disable this with:
{:ok, response} = LLM.stream("Hello",
  provider: :openai,
  model: "gpt-4",
  tools: [MyTool],
  auto_tools: false
)
Limit the number of tool call rounds:
{:ok, response} = LLM.stream("Hello",
  provider: :openai,
  model: "gpt-4",
  tools: [MyTool],
  max_rounds: 3
)
stream!
The bang variant raises on error:
response = LLM.stream!("Hello", provider: :openai, model: "gpt-4")
Chunk types
Streaming responses produce a mix of chunk types:
| Chunk | Module | Fields | Description |
|---|---|---|---|
| Text | LLM.Stream.Chunk | text, index | A piece of generated text |
| Thinking | LLM.Stream.Thinking | text, signature | Reasoning/thinking content |
| Tool call | LLM.Stream.ToolCall | id, name, arguments, index, complete | A tool invocation |
| Stop | LLM.Stream.Stop | reason, usage | Stream ended |
| Error | LLM.Stream.Error | message, code | An error occurred |
Text chunks
%LLM.Stream.Chunk{text: "Hello", index: 0}
Thinking chunks
Providers that support reasoning (Anthropic, Gemini, OpenAI with o-series) emit thinking chunks:
%LLM.Stream.Thinking{text: "Let me consider...", signature: nil}
Anthropic thinking blocks include a signature field for verification:
%LLM.Stream.Thinking{
  text: "The user is asking about...",
  signature: "EqQBCgIYAhIM..."
}
Tool call chunks
Tool calls arrive as chunks with complete: true when the full call is available:
%LLM.Stream.ToolCall{
  id: "call_abc123",
  name: "get_weather",
  arguments: %{"location" => "San Francisco"},
  index: 0,
  complete: true
}
During streaming, partial tool call arguments may arrive in multiple chunks. The library accumulates them internally and only emits a chunk when complete: true.
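To illustrate the general idea of accumulating partial arguments, here is a small sketch. It is not the library's implementation: the {index, text} fragment shape and the ToolCallSketch module are assumptions made for this example, and decoding the finished JSON string is left out.

```elixir
defmodule ToolCallSketch do
  # Hypothetical fragment shape: {index, partial_argument_text}.
  # Returns a map of index => concatenated argument string.
  def accumulate(fragments) do
    Enum.reduce(fragments, %{}, fn {index, part}, acc ->
      # First fragment for an index starts the buffer; later ones append.
      Map.update(acc, index, part, fn so_far -> so_far <> part end)
    end)
  end
end

# Two tool calls whose argument text arrives interleaved.
fragments = [
  {0, "{\"location\": "},
  {1, "{\"city\": \"Paris\"}"},
  {0, "\"San Francisco\"}"}
]

buffers = ToolCallSketch.accumulate(fragments)
```

Keying the buffers by index keeps interleaved fragments from parallel tool calls separate until each argument string is whole.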
Stop chunks
%LLM.Stream.Stop{reason: :stop, usage: %LLM.Usage{input_tokens: 10, output_tokens: 50}}
Common stop reasons:
- :stop - natural end of response
- :length - hit max_tokens limit
- :tool_calls - model wants to call tools (stream continues)
- :content_filter - content was filtered
Structured output
When schema: is set, the model returns JSON matching the given schema. Use stream/3 to get streaming with automatic parsing:
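{:ok, response} = LLM.stream("Extract name and age",
  [provider: :openai, model: "gpt-4o", schema: %{name: "person", schema: schema}],
  # on_chunk receives chunk structs, so extract the text before writing it
  on_chunk: fn
    %LLM.Stream.Chunk{text: text} -> IO.write(text)
    _ -> :ok
  end
)
response.parsed #=> %{"name" => "Alice", "age" => 30}
Error handling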
case LLM.stream("Hello", provider: :openai, model: "gpt-4") do
  {:ok, response} -> handle_response(response)
  {:error, reason} -> handle_error(reason)
end
Common errors:
- :timeout - no data received within the timeout period
- %{status: 401, body: ...} - authentication failure
- %{status: 429, body: ...} - rate limit exceeded
- %{status: 500, body: ...} - server error
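One way to act on these error shapes is to sort them into retryable and non-retryable cases. The sketch below is an assumption-laden example, not part of the library: ErrorSketch and classify/1 are hypothetical names, and the policy itself (retry timeouts, rate limits, and server errors; fail on everything else) is a design choice you may want to change.

```elixir
defmodule ErrorSketch do
  # Hypothetical helper: decide whether a failed call is worth retrying.
  # Timeouts and rate limits are usually transient.
  def classify(:timeout), do: :retry
  def classify(%{status: 429}), do: :retry

  # Server-side errors (5xx) may succeed on a later attempt.
  def classify(%{status: status}) when is_integer(status) and status >= 500, do: :retry

  # Authentication failures will not fix themselves.
  def classify(%{status: 401}), do: :fail

  # Anything unrecognized is treated as a hard failure.
  def classify(_other), do: :fail
end
```

A handle_error/1 function like the one in the example above could call ErrorSketch.classify(reason) and back off before retrying when the result is :retry.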
Manual chunk processing (advanced)
For fine-grained control, use LLM.Stream.start/2 and LLM.Stream.next/1:
{:ok, stream} = LLM.Stream.start(context, opts)
# next/1 returns a batch of chunks together with the updated stream
case LLM.Stream.next(stream) do
  {:ok, chunks, stream} ->
    Enum.each(chunks, fn
      %LLM.Stream.Chunk{text: text} -> IO.write(text)
      %LLM.Stream.ToolCall{name: name, arguments: args} ->
        IO.puts("\nTool call: #{name}(#{inspect(args)})")
      _ -> :ok
    end)
  {:halt, stream} ->
    IO.puts("\nStream finished")
  {:error, reason} ->
    IO.puts("\nError: #{inspect(reason)}")
end
Next steps
- Messages, Roles, and Tool Calls — message structure and the tool call lifecycle
- Tools — define tools for use with streaming
- Configuration — HTTP client and runtime options