Server-Sent Events (SSE) streaming for LLM inference.
Streams tokens from local engines and remote providers as they are generated, calling a user-supplied callback for each chunk.
Usage
Candil.Stream.chat(:llama3, [
  %{role: "user", content: "Write a haiku about Elixir"}
], fn chunk ->
  IO.write(chunk.content)
end)
The callback receives a chunk() map:
%{content: "token", finish_reason: nil | "stop" | "length", done: false}
When streaming ends, the callback is called once more with done: true.
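The chunk shape described above lends itself to collecting the full reply as it streams. The following is a minimal sketch, not part of Candil: ChunkCollector and its functions are hypothetical names, the chunks are simulated rather than produced by a real engine, and it assumes the final done: true chunk may carry empty or missing content.

```elixir
defmodule ChunkCollector do
  # Hypothetical helper, not part of Candil: gathers streamed chunk content
  # in an Agent so the full reply can be read once the stream has ended.

  # Starts an Agent holding the pieces received so far (newest first)
  # and a flag recording whether the final chunk has arrived.
  def start do
    Agent.start_link(fn -> %{parts: [], done: false} end)
  end

  # Builds a one-argument function that can be passed as the stream callback.
  def callback(agent) do
    fn chunk ->
      Agent.update(agent, fn state ->
        # Assumption: content may be nil or absent on the final chunk.
        content = Map.get(chunk, :content) || ""
        finished = state.done or Map.get(chunk, :done, false) == true
        %{state | parts: [content | state.parts], done: finished}
      end)
    end
  end

  # Returns {text, done?}: the concatenated content and the completion flag.
  def result(agent) do
    Agent.get(agent, fn %{parts: parts, done: done} ->
      {parts |> Enum.reverse() |> IO.iodata_to_binary(), done}
    end)
  end
end

# Simulated chunks in the shape described above. In a real run,
# ChunkCollector.callback(agent) would be passed to Candil.Stream.chat
# in place of the inline fn.
{:ok, agent} = ChunkCollector.start()
on_chunk = ChunkCollector.callback(agent)

Enum.each(
  [
    %{content: "Hello", finish_reason: nil, done: false},
    %{content: ", world", finish_reason: "stop", done: false},
    %{content: "", finish_reason: "stop", done: true}
  ],
  on_chunk
)

{text, done?} = ChunkCollector.result(agent)
IO.puts(text)
```

Storing the pieces newest-first and reversing once at the end keeps each callback invocation cheap, which matters when a callback fires for every token.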
Provider support
OpenAI, Anthropic, Ollama, OpenAI-compatible providers, and local llama-server.
Functions
@spec chat(atom(), [Candil.Inference.message()], stream_callback(), keyword()) :: :ok | {:error, any()}
Streams a chat completion from a running local engine identified by alias.
@spec chat(Candil.Model.t(), Candil.Provider.t(), [Candil.Inference.message()], stream_callback(), keyword()) :: :ok | {:error, any()}
Streams a chat completion from a remote provider.
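Both specs return :ok | {:error, any()}, so callers will usually want to branch on the result. The sketch below shows one way to do that, under stated assumptions: StreamOutcome is a hypothetical helper, the zero-arity functions stand in for real Candil.Stream.chat calls, and :timeout is an illustrative error reason rather than one the library is known to return.

```elixir
defmodule StreamOutcome do
  # Hypothetical helper, not part of Candil: runs a streaming call and turns
  # its :ok | {:error, reason} result into a tagged, human-readable message.
  def run(stream_fun) when is_function(stream_fun, 0) do
    case stream_fun.() do
      :ok -> {:ok, "stream completed"}
      {:error, reason} -> {:error, "stream failed: " <> inspect(reason)}
    end
  end
end

# Stand-ins for real calls. In practice the function body would be a
# Candil.Stream.chat call with messages and a callback.
ok_result = StreamOutcome.run(fn -> :ok end)

# :timeout is an illustrative reason only; the spec allows any term.
error_result = StreamOutcome.run(fn -> {:error, :timeout} end)

IO.inspect(ok_result)
IO.inspect(error_result)
```

Because the error reason is typed as any(), the sketch uses inspect/1 rather than string interpolation, which would raise for terms that do not implement String.Chars.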