All LLM requests in this library go through the streaming path. Even generate/2 starts a stream and collects it internally. This guide covers how to work with streams directly.
Starting a stream
{:ok, stream} = LLM.stream("Tell me a story",
  provider: :openai,
  model: "gpt-4"
)

The returned stream is an LLM.Stream.t() struct containing the HTTP response reference, adapter state, and configuration.
Collecting chunks
The simplest way to consume a stream is LLM.Stream.collect/2:
{:ok, response} = LLM.Stream.collect(stream)
response.message.content
#=> "Once upon a time..."

With a callback
Process each chunk as it arrives:
{:ok, response} = LLM.Stream.collect(stream,
  on_chunk: fn
    %LLM.Stream.Chunk{text: text} -> IO.write(text)
    %LLM.Stream.Thinking{text: text} -> IO.puts("\n[thinking] #{text}")
    %LLM.Stream.Stop{reason: reason} -> IO.puts("\n[stopped: #{reason}]")
    _ -> :ok
  end
)

Controlling tool execution
By default, collect/2 automatically executes tool calls and loops. Disable this with:
{:ok, response} = LLM.Stream.collect(stream, auto_tools: false)

Limit the number of tool call rounds:
{:ok, response} = LLM.Stream.collect(stream, max_rounds: 3)

Manual chunk processing
For fine-grained control, use LLM.Stream.next/1:
case LLM.Stream.next(stream) do
  {:ok, chunks, stream} ->
    Enum.each(chunks, fn
      %LLM.Stream.Chunk{text: text} -> IO.write(text)
      %LLM.Stream.ToolCall{name: name, arguments: args} ->
        IO.puts("\nTool call: #{name}(#{inspect(args)})")
      _ -> :ok
    end)
    # Continue by calling LLM.Stream.next/1 again with the updated stream...
  {:halt, stream} ->
    IO.puts("\nStream finished")
  {:error, reason} ->
    IO.puts("\nError: #{inspect(reason)}")
end

Chunk types
Streaming responses produce a mix of chunk types:
| Chunk | Module | Fields | Description |
|---|---|---|---|
| Text | LLM.Stream.Chunk | text, index | A piece of generated text |
| Thinking | LLM.Stream.Thinking | text, signature | Reasoning/thinking content |
| Tool call | LLM.Stream.ToolCall | id, name, arguments, index, complete | A tool invocation |
| Stop | LLM.Stream.Stop | reason, usage | Stream ended |
| Error | LLM.Stream.Error | message, code | An error occurred |
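A loop that keeps reading until the stream halts, dispatching on chunk type, could look roughly like the sketch below. It does not call the library. The `Sketch.*` modules and `fake_next/1` are hypothetical stand-ins, and the only thing assumed about LLM.Stream.next/1 is the return shape shown in the manual processing example.

```elixir
# Stand-in structs so the sketch runs without the library; the field names
# mirror the table above.
defmodule Sketch.Chunk do
  defstruct [:text, index: 0]
end

defmodule Sketch.Stop do
  defstruct [:reason, :usage]
end

defmodule Sketch.Drain do
  # `next_fun` is any function with the same return shape as next/1:
  # {:ok, chunks, stream} | {:halt, stream} | {:error, reason}.
  def run(stream, next_fun, acc \\ []) do
    case next_fun.(stream) do
      {:ok, chunks, stream} ->
        # Fold this batch into the accumulator, then read the next batch.
        run(stream, next_fun, Enum.reduce(chunks, acc, &handle/2))

      {:halt, _stream} ->
        {:ok, acc |> Enum.reverse() |> IO.iodata_to_binary()}

      {:error, reason} ->
        {:error, reason}
    end
  end

  # Keep text; ignore everything else (stop markers, unknown chunk types).
  defp handle(%Sketch.Chunk{text: text}, acc), do: [text | acc]
  defp handle(_other, acc), do: acc

  # Hypothetical fake stream: a list of chunk batches, one consumed per call.
  def fake_next([batch | rest]), do: {:ok, batch, rest}
  def fake_next([]), do: {:halt, []}
end

batches = [
  [%Sketch.Chunk{text: "Hello, "}],
  [%Sketch.Chunk{text: "world"}, %Sketch.Stop{reason: :stop}]
]

{:ok, text} = Sketch.Drain.run(batches, &Sketch.Drain.fake_next/1)
IO.puts(text)
```

With the real library, the same recursion would presumably pass `&LLM.Stream.next/1` as `next_fun` and match on the LLM.Stream.* structs instead of the stand-ins.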
Text chunks
%LLM.Stream.Chunk{text: "Hello", index: 0}

Thinking chunks
Providers that support reasoning (Anthropic, Gemini, OpenAI with o-series) emit thinking chunks:
%LLM.Stream.Thinking{text: "Let me consider...", signature: nil}

Anthropic thinking blocks include a signature field for verification:
%LLM.Stream.Thinking{
  text: "The user is asking about...",
  signature: "EqQBCgIYAhIM..."
}

Tool call chunks
Tool calls arrive as chunks with complete: true when the full call is available:
%LLM.Stream.ToolCall{
  id: "call_abc123",
  name: "get_weather",
  arguments: %{"location" => "San Francisco"},
  index: 0,
  complete: true
}

During streaming, the arguments of a tool call may arrive in several partial pieces. The library accumulates them internally and emits a ToolCall chunk only once the call is complete (complete: true).
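To make the accumulation step concrete, here is a minimal sketch of buffering partial tool call data by index. This is not the library's implementation. The module name `Sketch.ToolCallBuffer` and the delta shape (`:index`, `:id`, `:name`, `:fragment`) are assumptions chosen for illustration, loosely modeled on how providers stream argument JSON in pieces.

```elixir
defmodule Sketch.ToolCallBuffer do
  # Hypothetical delta shape:
  #   %{index: 0, id: "call_1", name: "get_weather", fragment: "{\"loc"}
  # Later deltas for the same index may carry only :index and :fragment.
  def new, do: %{}

  # Merge one delta into the buffer, keyed by the tool call's index.
  def push(buffer, %{index: index} = delta) do
    Map.update(buffer, index, start(delta), fn call ->
      %{
        call
        | id: call.id || delta[:id],
          name: call.name || delta[:name],
          json: call.json <> (delta[:fragment] || "")
      }
    end)
  end

  defp start(delta) do
    %{id: delta[:id], name: delta[:name], json: delta[:fragment] || ""}
  end

  # Return the buffered calls ordered by index. The :json field still holds
  # the raw argument string; decoding it is left to a JSON library.
  def finish(buffer) do
    buffer
    |> Enum.sort_by(fn {index, _call} -> index end)
    |> Enum.map(fn {index, call} -> Map.put(call, :index, index) end)
  end
end

calls =
  Sketch.ToolCallBuffer.new()
  |> Sketch.ToolCallBuffer.push(%{index: 0, id: "call_abc123", name: "get_weather", fragment: "{\"location\":"})
  |> Sketch.ToolCallBuffer.push(%{index: 0, fragment: " \"San Francisco\"}"})
  |> Sketch.ToolCallBuffer.finish()

IO.inspect(calls)
```

Buffering the raw string and decoding once at the end avoids trying to parse incomplete JSON on every fragment.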
Stop chunks
%LLM.Stream.Stop{reason: :stop, usage: %LLM.Usage{input_tokens: 10, output_tokens: 50}}

Common stop reasons:
- :stop - natural end of response
- :length - hit the max_tokens limit
- :tool_calls - model wants to call tools (stream continues)
- :content_filter - content was filtered
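When processing chunks manually, the stop reason usually decides what happens next. The sketch below maps the reasons listed above to a caller-side action. The module name and the returned action atoms are hypothetical, not part of the library.

```elixir
defmodule Sketch.StopReason do
  # Translate a stop reason into what the caller might do next.
  def next_action(:stop), do: :done
  def next_action(:length), do: {:warn, "response was cut off at max_tokens"}
  def next_action(:tool_calls), do: :run_tools
  def next_action(:content_filter), do: {:error, :content_filtered}
  # Providers may add reasons over time, so keep a catch-all clause.
  def next_action(other), do: {:error, {:unknown_stop_reason, other}}
end

IO.inspect(Sketch.StopReason.next_action(:tool_calls))
```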
Stream timeout
The default timeout is 30 seconds. Override it in the options:
{:ok, stream} = LLM.stream("Long generation", provider: :openai, model: "gpt-4")
# The stream struct's timeout can be set via opts

Error handling
case LLM.stream("Hello", provider: :openai, model: "gpt-4") do
  {:ok, stream} ->
    case LLM.Stream.collect(stream) do
      {:ok, response} -> handle_response(response)
      {:error, reason} -> handle_error(reason)
    end
  {:error, reason} ->
    handle_error(reason)
end

Common errors:
- :timeout - no data received within the timeout period
- %{status: 401, body: ...} - authentication failure
- %{status: 429, body: ...} - rate limit exceeded
- %{status: 500, body: ...} - server error
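Some of these errors are usually transient (timeouts, rate limits, server errors) while others are not (authentication failures). One way to act on that distinction is a small retry wrapper with exponential backoff. The sketch below is an assumption about how a caller might do this. `Sketch.Retry` is hypothetical, and the library may or may not retry on its own.

```elixir
defmodule Sketch.Retry do
  # Runs `fun` (zero-arity, returning {:ok, _} or {:error, _}) and retries
  # transient errors, doubling the delay after each failed attempt.
  def with_retry(fun, attempts \\ 3, delay_ms \\ 200) do
    case fun.() do
      {:error, reason} = error ->
        if attempts > 1 and retryable?(reason) do
          Process.sleep(delay_ms)
          with_retry(fun, attempts - 1, delay_ms * 2)
        else
          error
        end

      other ->
        other
    end
  end

  # Error shapes follow the list above.
  defp retryable?(:timeout), do: true
  defp retryable?(%{status: 429}), do: true
  defp retryable?(%{status: status}) when is_integer(status) and status >= 500, do: true
  defp retryable?(_reason), do: false
end

# Usage with a stand-in function; a real caller would wrap the
# LLM.stream/2 plus LLM.Stream.collect/2 sequence instead.
result = Sketch.Retry.with_retry(fn -> {:error, %{status: 401, body: "denied"}} end)
IO.inspect(result)
```

An authentication failure is returned immediately, since retrying with the same credentials would not help.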
Combining streaming with tools
When using collect/2 with the default auto_tools: true, the library will:
- Collect all chunks from the stream
- If tool calls are present, execute them
- Append the tool results to the conversation
- Start a new stream with the updated context
- Repeat until no more tool calls or max_rounds is reached
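The steps above can be sketched as a small recursive loop. This is an illustration under stated assumptions, not the library's code. `Sketch.ToolLoop`, the plain-map message shape, the `stream_fun` callback, and returning `{:error, :max_rounds_reached}` when the limit is hit are all hypothetical; the source does not say what collect/2 returns in that case.

```elixir
defmodule Sketch.ToolLoop do
  # `stream_fun` takes the message list and returns
  # {:ok, %{text: binary, tool_calls: list}} for one collected stream.
  # `tools` maps tool names to one-argument functions.
  def run(messages, stream_fun, tools, max_rounds) do
    {:ok, reply} = stream_fun.(messages)

    cond do
      # No tool calls: the model has produced its final answer.
      reply.tool_calls == [] ->
        {:ok, reply.text}

      # Tool calls requested but no rounds left (assumed behavior).
      max_rounds <= 0 ->
        {:error, :max_rounds_reached}

      true ->
        # Execute each requested tool and record its result as a message.
        results =
          Enum.map(reply.tool_calls, fn %{name: name, arguments: args} ->
            %{role: :tool, name: name, content: tools[name].(args)}
          end)

        assistant = %{role: :assistant, content: reply.text, tool_calls: reply.tool_calls}

        # Start a new round with the updated conversation.
        run(messages ++ [assistant | results], stream_fun, tools, max_rounds - 1)
    end
  end
end

# Fake model: asks for the weather once, then answers using the tool result.
fake_stream = fn messages ->
  if Enum.any?(messages, &(&1.role == :tool)) do
    {:ok, %{text: "It is sunny.", tool_calls: []}}
  else
    {:ok, %{text: "", tool_calls: [%{name: "get_weather", arguments: %{"location" => "SF"}}]}}
  end
end

tools = %{"get_weather" => fn _args -> "sunny" end}

IO.inspect(Sketch.ToolLoop.run([%{role: :user, content: "Weather?"}], fake_stream, tools, 3))
```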
This is transparent to the caller: the final LLM.Response contains the complete message after all tool call rounds.
{:ok, response} = LLM.generate("Read mix.exs and summarize it",
  provider: :openai,
  model: "gpt-4",
  tools: [MyApp.ReadFileTool, MyApp.SummarizeTool],
  max_rounds: 5
)
# The response includes text from after all tool executions
response.message.content
#=> "The mix.exs file defines..."

Next steps
- Messages, Roles, and Tool Calls — message structure and the tool call lifecycle
- Tools — define tools for use with streaming
- Configuration — HTTP client and runtime options