AI SDK - Streaming Capabilities
Overview
The Elixir AI SDK provides streaming capabilities for text generation, allowing you to receive AI model responses incrementally as they're generated instead of waiting for the complete response. This provides a more responsive user experience, especially for longer responses.
The SDK implements Server-Sent Events (SSE) streaming with a Finch-based EventSource implementation that processes responses chunk by chunk in real time with minimal memory overhead. The implementation handles backpressure, connection management, and parsing of the SSE protocol.
Streaming Modes
The SDK supports two streaming modes to accommodate different use cases:
- String Mode (Default): Returns a stream of plain text chunks, ideal for simple use cases
- Event Mode: Returns a stream of event tuples for fine-grained control over different event types
Getting Started with Streaming
String Mode (Default)
The default mode returns a stream of text strings, making it extremely simple to use:
{:ok, result} = AI.stream_text(%{
  model: AI.openai("gpt-3.5-turbo"),
  prompt: "Write a short story about a robot learning to paint."
})
# Process the stream - each chunk is a simple string
result.stream
|> Stream.each(&IO.write/1) # Write each chunk as it arrives
|> Stream.run() # Start consuming the stream
# Or collect all chunks into a single string
full_text = Enum.join(result.stream, "")
Event Mode
For applications that need to handle different types of streaming events (text, errors, completion), use event mode:
{:ok, result} = AI.stream_text(%{
  model: AI.openai("gpt-3.5-turbo"),
  prompt: "Write a short story about a robot learning to paint.",
  mode: :event # Enable event mode
})
# Process different event types
result.stream
|> Enum.each(fn
  {:text_delta, chunk} -> IO.write(chunk)
  {:finish, reason} -> IO.puts("\nCompleted: #{reason}")
  {:error, error} -> IO.puts("\nError: #{inspect(error)}")
  _ -> :ok # Ignore other events
end)
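Since the stream is a standard Elixir enumerable, you can also fold over the events, for example to accumulate the full text while capturing the finish reason:
{full_text, finish_reason} =
  Enum.reduce(result.stream, {"", nil}, fn
    {:text_delta, chunk}, {text, reason} -> {text <> chunk, reason}
    {:finish, reason}, {text, _} -> {text, reason}
    _other, acc -> acc # Ignore metadata and other events
  end)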
Stream Format
String Mode Format
In string mode, the stream produces text chunks directly as strings. Each chunk is a fragment of the model's response, making it easy to consume and work with.
Event Mode Format
In event mode, the stream produces tuples representing different event types:
- {:text_delta, String.t()} - A chunk of text from the model
- {:finish, String.t()} - The stream has completed, with a reason (e.g., "stop", "length")
- {:metadata, map()} - Additional metadata from the model
- {:error, term()} - An error occurred during streaming
These events are produced by parsing the Server-Sent Events (SSE) format returned by AI providers. The SDK handles all the complexities of SSE parsing, including multi-line data fields, JSON parsing, and event formatting.
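For illustration, an OpenAI-style SSE response body looks roughly like this on the wire (payloads abbreviated), and in event mode it surfaces as the tuples shown above:
# Abbreviated OpenAI-style SSE wire format (illustrative only):
#
#   data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}
#
#   data: {"choices":[{"delta":{"content":" world"},"finish_reason":null}]}
#
#   data: {"choices":[{"delta":{},"finish_reason":"stop"}]}
#
#   data: [DONE]
#
# Parsed into event-mode tuples:
#   {:text_delta, "Hello"}
#   {:text_delta, " world"}
#   {:finish, "stop"}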
Examples
Basic Streaming
{:ok, result} = AI.stream_text(%{
  model: AI.openai("gpt-3.5-turbo"),
  prompt: "Explain how streaming works in LLMs."
})
result.stream
|> Stream.each(&IO.write/1)
|> Stream.run()
With System Message
{:ok, result} = AI.stream_text(%{
  model: AI.openai("gpt-3.5-turbo"),
  system: "You are a helpful assistant that responds in the style of Shakespeare.",
  prompt: "Tell me about the weather today."
})
result.stream
|> Stream.each(&IO.write/1)
|> Stream.run()
Collecting the Full Response
If you want to collect all chunks into a single string:
{:ok, result} = AI.stream_text(%{
  model: AI.openai("gpt-3.5-turbo"),
  prompt: "Write a haiku about programming."
})
full_text = Enum.join(result.stream, "")
IO.puts("Full response: #{full_text}")
Error Handling
case AI.stream_text(%{
  model: AI.openai("gpt-3.5-turbo"),
  prompt: "Tell me a joke."
}) do
  {:ok, result} ->
    # Simply process the text chunks
    result.stream
    |> Stream.each(&IO.write/1)
    |> Stream.run()

  {:error, error} ->
    IO.puts("Failed to start streaming: #{inspect(error)}")
end
The SDK handles several types of errors during streaming:
- Initial connection failures
- Network interruptions during streaming
- Invalid SSE format
- Timeouts (default: 30 seconds for initial connection)
- HTTP error responses from the provider
- Finch-related errors
- JSON parsing errors from malformed responses
All error conditions are properly propagated, either as {:error, reason} events in the stream or as an error response from the initial connection.
Advanced Usage
EventSource Implementation
The SDK uses a custom AI.Provider.Utils.EventSource module that implements the Server-Sent Events (SSE) protocol. This implementation:
- Creates and manages HTTP connections with proper headers
- Parses the SSE format according to specification
- Handles event reassembly from chunks
- Provides proper backpressure for efficient streaming (see the sketch after this list)
- Cleans up resources when the stream is done
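Conceptually, an SSE-backed stream follows Elixir's Stream.resource/3 pattern: elements are produced only when the consumer demands them, which is where the backpressure comes from. A simplified sketch, not the SDK's actual code; open_sse/0, next_event/1, and close/1 are hypothetical helpers:
# Illustration only; open_sse/0, next_event/1, and close/1 are hypothetical
# helpers, not part of the SDK's public API.
Stream.resource(
  fn -> open_sse() end,              # start_fun: open the HTTP connection
  fn conn ->
    # next_fun runs only when the consumer asks for more elements
    case next_event(conn) do
      {:event, event} -> {[event], conn}
      :done -> {:halt, conn}
    end
  end,
  fn conn -> close(conn) end         # after_fun: always clean up the connection
)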
Custom Stream Processing
You can use all of Elixir's stream processing capabilities. For example, to re-assemble the incoming chunks into complete lines:
{:ok, result} = AI.stream_text(%{
  model: AI.openai("gpt-3.5-turbo"),
  prompt: "List 10 programming languages."
})
result.stream
|> Stream.transform("", fn chunk, buffer ->
  # Chunk boundaries rarely align with line breaks, so buffer partial lines
  # (trailing text without a final newline is dropped in this sketch)
  parts = String.split(buffer <> chunk, "\n")
  {complete, [rest]} = Enum.split(parts, length(parts) - 1)
  {complete, rest}
end)
|> Stream.reject(&(&1 == ""))
|> Stream.each(fn line -> IO.puts("Language: #{line}") end)
|> Stream.run()
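Because the stream is lazy, chunks can also go straight to a sink such as a file, without buffering the whole response in memory:
File.open!("response.txt", [:write], fn file ->
  result.stream
  |> Stream.each(&IO.binwrite(file, &1)) # Write each chunk as it arrives
  |> Stream.run()
end)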
Working with Phoenix LiveView
Streaming works especially well with Phoenix LiveView, allowing you to update the UI in real-time as responses are generated:
def handle_event("generate", %{"prompt" => prompt}, socket) do
  # Capture the LiveView pid: inside the task, self() is the task's own pid,
  # so sending to self() there would never reach this LiveView
  lv_pid = self()

  # Task.start/1 is fire-and-forget; unlike Task.async/1, it sends no reply
  # message that would need an extra handle_info clause
  Task.start(fn ->
    case AI.stream_text(%{
      model: AI.openai("gpt-3.5-turbo"),
      prompt: prompt
    }) do
      {:ok, result} ->
        result.stream
        |> Stream.each(fn chunk ->
          # Send each chunk to the LiveView process
          send(lv_pid, {:stream_chunk, chunk})
        end)
        |> Stream.run()

        send(lv_pid, :stream_complete)

      {:error, error} ->
        send(lv_pid, {:stream_error, error})
    end
  end)

  {:noreply, socket |> assign(generating: true, response: "")}
end
def handle_info({:stream_chunk, chunk}, socket) do
  # Append the new chunk to the existing response
  new_response = socket.assigns.response <> chunk
  {:noreply, socket |> assign(response: new_response)}
end

def handle_info(:stream_complete, socket) do
  {:noreply, socket |> assign(generating: false)}
end

def handle_info({:stream_error, error}, socket) do
  {:noreply, socket |> assign(generating: false, error: inspect(error))}
end
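For completeness, a minimal render/1 to pair with these handlers might look like the sketch below; the markup and the phx-value-prompt attribute are illustrative, and mount/3 is assumed to set generating: false and response: "":
def render(assigns) do
  ~H"""
  <button phx-click="generate" phx-value-prompt="Write a short poem">Generate</button>
  <p :if={@generating}>Generating...</p>
  <pre><%= @response %></pre>
  """
end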
API Reference
AI.stream_text/1
@spec stream_text(map()) :: {:ok, map()} | {:error, any()}
Options:
- :model - The language model to use
- :system - A system message that will be part of the prompt
- :prompt - A simple text prompt (can use either prompt or messages)
- :messages - A list of messages (can use either prompt or messages)
- :max_tokens - Maximum number of tokens to generate
- :temperature - Temperature setting for randomness
- :top_p - Nucleus sampling
- :top_k - Top-k sampling
- :frequency_penalty - Penalize new tokens based on their frequency
- :presence_penalty - Penalize new tokens based on their presence
- :tools - Tools that are accessible to and can be called by the model
- :mode - Stream output format: :string (default) or :event
Streaming-specific options:
- :timeout - Connection timeout in milliseconds (default: 30000)
- :max_line_length - Maximum length of an SSE line (default: 16384)
- :retry_interval - Time to wait before reconnecting in ms (default: 3000)
- :test_mode - For testing only; can be :basic, :openai, :multi_line, or :error
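For example, several of these options combined (the specific values are illustrative):
{:ok, result} = AI.stream_text(%{
  model: AI.openai("gpt-3.5-turbo"),
  prompt: "Summarize the history of Elixir in three sentences.",
  max_tokens: 256,
  temperature: 0.7,
  mode: :event,
  timeout: 60_000
})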
Returns:
- {:ok, result} - Success, with result containing:
  - stream - The stream of text chunks (strings in string mode, event tuples in event mode)
  - warnings - Any warnings generated during processing
  - provider_metadata - Additional provider-specific metadata
  - response - The raw response from the provider
- {:error, reason} - Error with reason for failure
Limitations and Future Improvements
- Tool calls via streaming are not yet fully supported
- Support for structured output streaming is planned for future releases
- Future enhancements will include:
- Improved backpressure handling for very large responses
- Automatic reconnection for temporary network issues
- Progress tracking and statistics
- Configurable stream processing pipelines
- Additional providers beyond OpenAI
Technical Implementation
The streaming functionality is built on several key components:
- AI.stream_text/1 - The main API entry point
- AI.Core.StreamText - Core implementation of stream handling
- AI.Provider.Utils.EventSource - SSE protocol implementation
- Provider-specific do_stream implementations (e.g., OpenAI)
The SSE protocol implementation follows the HTML5 EventSource specification and is compatible with the streaming APIs of major providers like OpenAI.