ALLM.Runner (allm v0.4.0)

Internal

This module is internal — it's documented for transparency, but call sites should use ALLM.generate/3 instead.

Layer-C non-streaming entry point. run/3 delegates to ALLM.StreamRunner.run/3, folds the returned stream through ALLM.StreamCollector, and wraps the final %Response{} in {:ok, _}.

Generate is stream-collected under the hood

Every public Layer-C entry point (ALLM.generate/3, ALLM.step/3, ALLM.chat/3) routes through this module, which in turn delegates to ALLM.StreamRunner.run/3. The non-streaming public API is therefore a stream-collector reduction of the streaming path; the adapter's ALLM.Adapter.generate/2 callback is never invoked from the public façade. Consequence: [:allm, :adapter, :retry] telemetry does not fire from ALLM.generate/3, because that event is forbidden on streaming calls, where partial output has already been delivered. The retry surface is exercised by direct adapter calls and by the image façade; see ALLM.Retry.

Stream-first

Non-streaming generation is a reducer over the streaming path:

{:ok, stream} = ALLM.StreamRunner.run(engine, request, opts)

stream
|> Enum.reduce(ALLM.StreamCollector.new(), &ALLM.StreamCollector.apply_event(&2, &1))
|> ALLM.StreamCollector.to_response()

This is the same algorithm consumers can run manually against ALLM.stream_generate/3; it exists here so generate/3 has one canonical code path and stream-equivalence is preserved by construction.
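
The shape of that reduction can be tried without the library. The sketch below is only an illustration: ToyCollector is a hypothetical stand-in for ALLM.StreamCollector, and the {:text_delta, _} and {:finish, _} event tuples are assumed shapes, not ALLM's documented event format.

```elixir
defmodule ToyCollector do
  # Hypothetical stand-in for ALLM.StreamCollector; event shapes are assumed.
  defstruct text: [], finish_reason: nil

  def new, do: %__MODULE__{}

  # Text deltas are appended as iodata and flattened once at the end.
  def apply_event(%__MODULE__{} = acc, {:text_delta, chunk}),
    do: %{acc | text: [acc.text, chunk]}

  # The terminal event records why generation stopped.
  def apply_event(%__MODULE__{} = acc, {:finish, reason}),
    do: %{acc | finish_reason: reason}

  # Events this toy collector does not understand leave the accumulator unchanged.
  def apply_event(%__MODULE__{} = acc, _other), do: acc

  def to_response(%__MODULE__{} = acc) do
    %{output_text: IO.iodata_to_binary(acc.text), finish_reason: acc.finish_reason}
  end
end

events = [{:text_delta, "he"}, {:text_delta, "llo"}, {:finish, :stop}]

response =
  events
  |> Enum.reduce(ToyCollector.new(), &ToyCollector.apply_event(&2, &1))
  |> ToyCollector.to_response()

# response == %{output_text: "hello", finish_reason: :stop}
```

Because Enum.reduce/3 calls its function with (element, accumulator), the capture swaps the arguments so the collector comes first, matching the apply_event(&2, &1) form used above.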

Pre-flight vs. mid-stream errors

  • Pre-flight: StreamRunner.run/3 returns {:error, struct} synchronously (no stream is opened). run/3 returns the error verbatim.
  • Mid-stream: the adapter opened a stream and then emitted a terminal {:error, struct} event. StreamCollector.apply_event/2 folds the error into %Response{finish_reason: :error, metadata: %{error: struct}}. run/3 still returns {:ok, response}; the caller checks response.finish_reason == :error to detect the failure.
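
A caller therefore has three outcomes to tell apart. One way to do so is sketched below; RunResultSketch and its classify/1 function are hypothetical helpers, not part of ALLM. The clauses match on plain map keys, so they accept any response shaped like the one described in the two cases above.

```elixir
defmodule RunResultSketch do
  # Hypothetical helper, not part of ALLM.

  # Pre-flight failure: no stream was opened, the error tuple arrives as-is.
  def classify({:error, reason}), do: {:preflight_error, reason}

  # Mid-stream failure: the collector folded the error into the response.
  def classify({:ok, %{finish_reason: :error, metadata: %{error: reason}}}),
    do: {:midstream_error, reason}

  # Anything else is a normally completed response.
  def classify({:ok, response}), do: {:ok, response}
end

RunResultSketch.classify({:error, :invalid_request})
# => {:preflight_error, :invalid_request}

RunResultSketch.classify({:ok, %{finish_reason: :error, metadata: %{error: :timeout}}})
# => {:midstream_error, :timeout}
```

The mid-stream clause must come before the generic {:ok, response} clause, since Elixir tries function clauses in order and the generic one would match first otherwise.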

Usage carve-out

StreamRunner.run/3's include_raw_chunks: false filter preserves {:raw_chunk, {:usage, _}} events regardless of the caller's filter preference, so the collector always sees usage and populates response.usage — no Runner-side override needed.
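
The carve-out can be pictured as a filter with one exception. The following is a rough sketch, not StreamRunner's actual implementation: RawChunkFilterSketch is a hypothetical module, and every event shape other than {:raw_chunk, {:usage, _}} (the one named above) is assumed for illustration.

```elixir
defmodule RawChunkFilterSketch do
  # Hypothetical illustration of the carve-out; not ALLM's real code.

  # When raw chunks are requested, pass every event through untouched.
  def filter(events, true), do: events

  # Otherwise drop raw chunks lazily, except the ones that carry usage.
  def filter(events, false), do: Stream.filter(events, &keep?/1)

  defp keep?({:raw_chunk, {:usage, _}}), do: true
  defp keep?({:raw_chunk, _}), do: false
  defp keep?(_event), do: true
end

events = [
  {:text_delta, "hi"},
  {:raw_chunk, {:usage, %{input_tokens: 3}}},
  {:raw_chunk, {:other, "opaque provider payload"}}
]

filtered = events |> RawChunkFilterSketch.filter(false) |> Enum.to_list()
# filtered == [{:text_delta, "hi"}, {:raw_chunk, {:usage, %{input_tokens: 3}}}]
```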

Functions

run(engine, request, opts \\ [])

Dispatch a non-streaming request by reducing the streaming adapter's output via ALLM.StreamCollector.

Returns {:ok, %Response{}} when the stream runs to completion, or {:error, struct} on a synchronous pre-flight failure. A mid-stream {:error, _} event still yields {:ok, response} with response.finish_reason == :error (see the module doc).

Examples

iex> engine = ALLM.Engine.new(
...> adapter: ALLM.Providers.Fake,
...> adapter_opts: [script: [{:text, "hi"}, {:finish, :stop}]]
...>)
iex> req = ALLM.request([ALLM.user("say hi")])
iex> {:ok, response} = ALLM.Runner.run(engine, req)
iex> {response.output_text, response.finish_reason}
{"hi", :stop}