Internal
This module is internal — it's documented for transparency, but
call sites should use ALLM.generate/3 instead.
Layer-C non-streaming entry point. run/3 delegates to
ALLM.StreamRunner.run/3, folds the returned stream through
ALLM.StreamCollector, and wraps the final %Response{} in
{:ok, _}.
Generate is stream-collected under the hood
Every public Layer-C entry point (ALLM.generate/3, ALLM.step/3,
ALLM.chat/3) routes through this module, which in turn delegates to
ALLM.StreamRunner.run/3. The non-streaming public API is therefore
a stream-collector reduction of the streaming path; the adapter's
ALLM.Adapter.generate/2 callback is never invoked from the
public façade. Consequence: [:allm, :adapter, :retry] telemetry
(which is forbidden on streaming calls because partial output has
already been delivered) does not fire from ALLM.generate/3. The
retry surface is exercised by direct adapter calls and by the image
façade — see ALLM.Retry.
Stream-first
Non-streaming generation is a reducer over the streaming path:
    {:ok, stream} = ALLM.StreamRunner.run(engine, request, opts)

    stream
    |> Enum.reduce(ALLM.StreamCollector.new(), &ALLM.StreamCollector.apply_event(&2, &1))
    |> ALLM.StreamCollector.to_response()

This is the same algorithm consumers can run manually against
ALLM.stream_generate/3; it exists here so generate/3 has one
canonical code path and stream-equivalence is preserved by
construction.
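
To make the fold concrete without depending on the library, here is a minimal sketch of the same reduce over a plain list of events. `ToyCollector` is a hypothetical stand-in, not `ALLM.StreamCollector`; the `{:text, _}` and `{:finish, _}` event shapes are borrowed from the Fake adapter script in the Examples section and are assumptions about what the real collector consumes.

```elixir
# Hypothetical stand-in for ALLM.StreamCollector, for illustration only.
# The event shapes below are assumed, not taken from the library source.
defmodule ToyCollector do
  # Empty accumulator; a plain map stands in for %ALLM.Response{}.
  def new, do: %{output_text: "", finish_reason: nil, metadata: %{}}

  # Text chunks are appended in arrival order.
  def apply_event(acc, {:text, chunk}), do: %{acc | output_text: acc.output_text <> chunk}

  # The finish event records why the stream ended.
  def apply_event(acc, {:finish, reason}), do: %{acc | finish_reason: reason}

  # Any other event leaves the accumulator unchanged in this toy model.
  def apply_event(acc, _other), do: acc

  # Same shape as the pipeline above: reduce, then return the accumulator.
  def collect(events), do: Enum.reduce(events, new(), &apply_event(&2, &1))
end

response = ToyCollector.collect([{:text, "he"}, {:text, "llo"}, {:finish, :stop}])
IO.inspect({response.output_text, response.finish_reason})
# prints {"hello", :stop}
```

Because the accumulator is threaded through every event in order, collecting the stream and consuming it incrementally see the same sequence, which is the sense in which stream-equivalence holds by construction.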
Pre-flight vs. mid-stream errors
- Pre-flight: StreamRunner.run/3 returns {:error, struct} synchronously (no stream opened). run/3 bubbles the error up verbatim.
- Mid-stream: the adapter opened a stream and then emitted a terminal {:error, struct} event. StreamCollector.apply_event/2 folds the error into %Response{finish_reason: :error, metadata: %{error: struct}}. run/3 still returns {:ok, response}; the caller inspects response.finish_reason == :error to detect it.
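
A caller therefore has three outcomes to distinguish, not two. The sketch below shows one way to pattern-match them. `RunResult` and its return tags are hypothetical names introduced for illustration, and plain maps stand in for `%ALLM.Response{}` so the snippet runs without the library.

```elixir
# Hypothetical helper, not part of ALLM: classifies the result of run/3.
defmodule RunResult do
  # Pre-flight failure: no stream was opened, the error arrives as {:error, _}.
  def classify({:error, error}), do: {:preflight_error, error}

  # Mid-stream failure: still an {:ok, _} tuple, flagged by finish_reason.
  # This clause must come before the generic {:ok, _} clause.
  def classify({:ok, %{finish_reason: :error, metadata: %{error: error}}}),
    do: {:midstream_error, error}

  # Anything else is a normally completed response.
  def classify({:ok, response}), do: {:completed, response}
end

# Plain maps stand in for %ALLM.Response{} here.
ok = {:ok, %{finish_reason: :stop, output_text: "hi", metadata: %{}}}
broken = {:ok, %{finish_reason: :error, output_text: "h", metadata: %{error: :timeout}}}

IO.inspect(RunResult.classify(ok))
IO.inspect(RunResult.classify(broken))
IO.inspect(RunResult.classify({:error, :invalid_request}))
```

Matching on `{:ok, _}` alone would treat a mid-stream failure as success, so the `finish_reason` check is the part that is easy to miss.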
Usage carve-out
StreamRunner.run/3's include_raw_chunks: false filter preserves
{:raw_chunk, {:usage, _}} events regardless of the caller's filter
preference, so the collector always sees usage and populates
response.usage — no Runner-side override needed.
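
The carve-out can be pictured as a filter predicate with one exception clause. This is a sketch of the described behaviour, not the StreamRunner implementation: `RawChunkFilter`, the `{:raw_chunk, {:delta, _}}` event, and the usage map keys are all hypothetical.

```elixir
# Hypothetical model of the filter described above; not ALLM source code.
defmodule RawChunkFilter do
  # Usage chunks always pass, whatever the caller asked for.
  def keep?({:raw_chunk, {:usage, _}}, _include_raw?), do: true

  # Other raw chunks pass only when the caller opted in.
  def keep?({:raw_chunk, _}, include_raw?), do: include_raw?

  # Non-raw events are never filtered.
  def keep?(_event, _include_raw?), do: true

  # Lazily filters an enumerable of events.
  def filter(events, include_raw?), do: Stream.filter(events, &keep?(&1, include_raw?))
end

events = [
  {:raw_chunk, {:delta, "h"}},
  {:text, "h"},
  {:raw_chunk, {:usage, %{input_tokens: 3, output_tokens: 1}}},
  {:finish, :stop}
]

events |> RawChunkFilter.filter(false) |> Enum.to_list() |> IO.inspect()
# prints [{:text, "h"}, {:raw_chunk, {:usage, %{input_tokens: 3, output_tokens: 1}}}, {:finish, :stop}]
```

Placing the usage clause first means the exception wins regardless of the flag, which is why nothing downstream needs to override the caller's preference.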
Functions
@spec run(ALLM.Engine.t(), ALLM.Request.t(), keyword()) ::
        {:ok, ALLM.Response.t()}
        | {:error,
           ALLM.Error.EngineError.t()
           | ALLM.Error.AdapterError.t()
           | ALLM.Error.ValidationError.t()}
Dispatch a non-streaming request by reducing the streaming adapter's
output via ALLM.StreamCollector.
Returns {:ok, %Response{}} when the stream runs to completion, or
{:error, struct} on a synchronous pre-flight failure. A mid-stream
{:error, _} event still returns {:ok, _}, with
response.finish_reason == :error (see module doc).
Examples
iex> engine = ALLM.Engine.new(
...> adapter: ALLM.Providers.Fake,
...> adapter_opts: [script: [{:text, "hi"}, {:finish, :stop}]]
...>)
iex> req = ALLM.request([ALLM.user("say hi")])
iex> {:ok, response} = ALLM.Runner.run(engine, req)
iex> {response.output_text, response.finish_reason}
{"hi", :stop}