Process topology
One agent process per conversation. It is a :gen_statem, not a GenServer,
because a turn is genuinely a state machine and several features (human-in-the-loop
suspension above all) are state transitions rather than flags. States:
idle— no turn in flight.preparing— running pre-message hooks, assembling and reducing context.streaming— an LLM turn is streaming via a task.executing_tools— one or more tool tasks in flight.awaiting_input— suspended on external resolution (human or client tool).terminating— shutting down (idle eviction or explicit stop).
Supporting processes: a Registry for addressing agents by conversation_id, a
DynamicSupervisor for spawning them, and Phoenix.PubSub for broadcasting live
events to subscribers. The agent knows nothing about LiveView — it publishes to a
per-conversation topic and any number of subscribers consume.
The agent holds only the working set, never the full history — see
07-memory-and-sizing.md. Footprint is flat in conversation length and linear in
the active agent count.
Two event planes
There is not one event stream; there are two, with different masters. Conflating them is the trap behind "the event contract."
- Canonical events are written to the log in-process, synchronously, as part
of the agent's transition:
user_msg,assistant_msg(final),tool_call,tool_result,suspension,resolution. They are the source of truth and must never depend on PubSub delivery — a dropped broadcast cannot lose history. - Live events are broadcast over PubSub for observers and include ephemeral things never logged: token deltas, thinking deltas, tool progress, state transitions. If a subscriber misses them, nothing is lost; durable state is reconstructable from the log.
This is why a renderer is snapshot + live tail (see 06): read the log to rebuild
durable assigns on mount, then subscribe to live events for the delta.
The non-blocking principle
The agent process never blocks on I/O. The LLM stream runs in a monitored task that
sends chunks back as messages; tool calls run in their own monitored tasks and
report results back as messages. The agent's mailbox preserves per-turn ordering,
and because the agent is never parked inside an HTTP call it can always handle
cancel, inspect, and (later) queued input mid-turn. The only thing that happens
inside the agent's own execution is bookkeeping: append to the log, update the
pending-calls map, broadcast, decide the next transition.
ensure_started — the unifying door
Callers never hold an agent pid. Everything addresses by conversation_id through a
single ensure_started(conversation_id) that returns the live process (via its
Registry via tuple) or starts it under the DynamicSupervisor, rehydrating from
persistence on the way up. Both a new user message and a pending-call resolution
enter through this same door, which is why resumability falls out for free.
Persisted fsm_state (deliberately small)
fsm_state = %{
state: :idle | :awaiting_input, # only ever persisted in these two
pending: %{tool_call_id => %{executor, kind, prompt}},
last_seq: N # log position the working set was built from
}kind is :approval | :elicitation | :client_exec — the discriminator the
renderer switches on (see 06); executor alone can't distinguish a gated
:server call from an elicitation.
There is no persisted streaming or executing_tools state. This one shape
resolves "what to persist," "which states are safe to evict," and "what a
suspended-on-human agent carries" at once.
Because suspension and resolution are canonical event types, fsm_state is
strictly a cache over the log — rebuildable from it, never authoritative. On
any disagreement the log wins. This removes the atomicity problem between the log
append and the snapshot write (the ETS adapter has no transactions).
Resolved: suspend-safe states and mid-turn kills
Only idle and awaiting_input are snapshotted and evictable — they are clean. A
kill in streaming or mid-:server-tool is not frozen and resumed; mid-turn
durability is "recover from the log," not "freeze the stream." The two dangling
log shapes recover differently:
- Log ends in a
user_msg(killed while streaming, nothing dispatched) — re-run the LLM turn. Safe; no side effects happened yet. - Log ends in a
tool_callwith notool_result(killed mid-tool) — do not re-roll the LLM: it would mint new calls with new ids and re-fire side effects (send_emailtwice). Re-dispatch that exact call, sametool_call_id. The library passes the id through; only the user's tool can honor it — document loudly that side-effecting:servertools should usetool_call_idas their idempotency key.
Cancellation
"Stop generating" works from any non-idle state, not only streaming:
- From
streaming— kill the streaming task and tear down the HTTP connection. ReqLLM's stream response carries cancellation as a captured closure, invokedstream_response.cancel.()(there is no module-levelcancel/1); verify in tests the socket actually closes. Records a cancelled/partial assistant turn. - From
executing_tools—Task.shutdownthe tool tasks and log synthesized"[cancelled]"tool results so everytool_callkeeps its paired result — providers reject a rendered context with an orphaned call (see05). - From
awaiting_input— resolve all pending calls as "user cancelled" tool-errors. This is also the escape hatch from the single-in-flight composer lock: a user parked on an elicitation they don't want to answer cancels the turn instead of waiting out the timeout.
Resolved: idle eviction (two tiers)
- Short idle, parked in
awaiting_input→ hibernate. Compacts the heap via GC while keeping the process alive and addressable, so the human's eventual resolution arrives without a DB rehydrate. - Long idle → persist
fsm_stateand terminate, dropping from memory. Revival throughensure_startedis cheap, so terminating frees more than hibernating holds.
Concurrent-message policy
v0 is single-in-flight: a user message arriving mid-turn is rejected (the default UI disables the composer). Post-v0 we expect two caller-chosen modes — direct-send (interrupt/run-alongside) and queue (serialize). The state machine should handle "user input arrived in a non-idle state" explicitly per state to leave room for this.
Crash semantics
A crashing tool task is isolated by its task boundary and surfaces as an error result
fed back to the model, never an agent crash. If the agent crashes, the supervisor
restarts it and it rehydrates from the log via ensure_started. The log is the
recovery boundary.
ReqLLM operational notes (verified v1.16.0)
- Streaming pools are HTTP/1-only by default (a known Finch ALPN bug with mixed-protocol large bodies) — revisit pool config before chasing streaming concurrency at scale.
- Don't reach into ReqLLM struct internals: the substrate already migrated once within 1.x (TypedStruct → Zoi). Treat public constructors as the contract.
Remaining open
- Multi-node addressing (Registry is local; a distributed registry such as Horde or
syn would be needed for clustering). Out of scope for v0 — the invariant to protect
is that
ensure_startedstays the only addressing point, so this becomes a one-file change later.