Process topology

One agent process per conversation. It is a :gen_statem, not a GenServer, because a turn is genuinely a state machine and several features (human-in-the-loop suspension above all) are state transitions rather than flags. States:

  • idle — no turn in flight.
  • preparing — running pre-message hooks, assembling and reducing context.
  • streaming — an LLM turn is streaming via a task.
  • executing_tools — one or more tool tasks in flight.
  • awaiting_input — suspended on external resolution (human or client tool).
  • terminating — shutting down (idle eviction or explicit stop).
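
A minimal sketch of how the state list above might map onto a :gen_statem callback module. Only the state names and the v0 rule that a user message is rejected while a turn is in flight come from this document; the module name, the event shapes ({:user_msg, text}, :current_state), and the {:error, :turn_in_flight} reply are hypothetical.

```elixir
defmodule TopologySketch.Agent do
  # Hypothetical module; the real agent handles many more events per state.
  @behaviour :gen_statem

  def start_link(conversation_id) do
    :gen_statem.start_link(__MODULE__, conversation_id, [])
  end

  @impl true
  def callback_mode, do: :handle_event_function

  @impl true
  def init(conversation_id) do
    # Every agent starts in :idle with nothing pending.
    {:ok, :idle, %{conversation_id: conversation_id, pending: %{}}}
  end

  @impl true
  # A user message in :idle starts a turn by moving to :preparing.
  def handle_event({:call, from}, {:user_msg, _text}, :idle, data) do
    {:next_state, :preparing, data, [{:reply, from, :ok}]}
  end

  # In any other state the message is rejected (single-in-flight).
  def handle_event({:call, from}, {:user_msg, _text}, _state, _data) do
    {:keep_state_and_data, [{:reply, from, {:error, :turn_in_flight}}]}
  end

  # Inspection is answered from every state.
  def handle_event({:call, from}, :current_state, state, _data) do
    {:keep_state_and_data, [{:reply, from, state}]}
  end
end
```

Because the state is an explicit argument of handle_event/4, "what happens to this event in this state" is a function clause rather than a flag check.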

Supporting processes: a Registry for addressing agents by conversation_id, a DynamicSupervisor for spawning them, and Phoenix.PubSub for broadcasting live events to subscribers. The agent knows nothing about LiveView — it publishes to a per-conversation topic and any number of subscribers consume.

The agent holds only the working set, never the full history — see 07-memory-and-sizing.md. Footprint is flat in conversation length and linear in the active agent count.

Two event planes

There is not one event stream; there are two, with different masters. Conflating them is the trap behind "the event contract."

  • Canonical events are written to the log in-process, synchronously, as part of the agent's transition: user_msg, assistant_msg (final), tool_call, tool_result, suspension, resolution. They are the source of truth and must never depend on PubSub delivery — a dropped broadcast cannot lose history.
  • Live events are broadcast over PubSub for observers and include ephemeral things never logged: token deltas, thinking deltas, tool progress, state transitions. If a subscriber misses them, nothing is lost; durable state is reconstructable from the log.

This is why a renderer is snapshot + live tail (see 06): read the log to rebuild durable assigns on mount, then subscribe to live events for the delta.
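
The ordering between the two planes can be sketched as follows. This is an illustration under stated assumptions, not the library's API: the log is a plain list, broadcast is an injected one-argument function standing in for PubSub, and the module and function names are hypothetical. The sketch also assumes canonical events are broadcast as well as logged.

```elixir
defmodule EventPlanesSketch do
  # Canonical event types, as listed in the section above.
  @canonical [:user_msg, :assistant_msg, :tool_call, :tool_result, :suspension, :resolution]

  def canonical?(%{type: type}), do: type in @canonical

  # Returns the updated log (oldest event first). A canonical event is
  # appended before anything is broadcast; a live-only event (token delta,
  # tool progress, ...) is broadcast and never stored.
  def emit(log, event, broadcast) do
    log = if canonical?(event), do: log ++ [event], else: log
    safe_broadcast(broadcast, event)
    log
  end

  # A failed broadcast is swallowed: history must not depend on delivery.
  defp safe_broadcast(broadcast, event) do
    broadcast.(event)
    :ok
  rescue
    _ -> :dropped
  end
end
```

With a broadcast function that raises, emit/3 still returns a log containing the canonical event, which is the property the renderer's snapshot step relies on.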

The non-blocking principle

The agent process never blocks on I/O. The LLM stream runs in a monitored task that sends chunks back as messages; tool calls run in their own monitored tasks and report results back as messages. The agent's mailbox preserves per-turn ordering, and because the agent is never parked inside an HTTP call it can always handle cancel, inspect, and (later) queued input mid-turn. The only thing that happens inside the agent's own execution is bookkeeping: append to the log, update the pending-calls map, broadcast, decide the next transition.
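
A sketch of the bookkeeping side of this principle, assuming tool work is started with Task.Supervisor.async_nolink/2. The owner then receives either {ref, result} on success or a :DOWN message on a crash, and both are handled as ordinary mailbox messages. The module name, the data shape (tasks and results maps), and the function names are hypothetical; in the real agent these clauses would sit inside the :gen_statem info handling.

```elixir
defmodule NonBlockingSketch do
  # data.tasks:   task monitor ref => tool_call_id
  # data.results: tool_call_id     => {:ok, value} | {:error, reason}

  # Start the work elsewhere and return immediately; the caller never waits.
  def start_tool(data, supervisor, tool_call_id, fun) do
    task = Task.Supervisor.async_nolink(supervisor, fun)
    put_in(data, [:tasks, task.ref], tool_call_id)
  end

  # Success: the task replies with {ref, result}. Drop the monitor so no
  # :DOWN message follows for this task.
  def handle_info({ref, result}, %{tasks: tasks} = data) when is_map_key(tasks, ref) do
    Process.demonitor(ref, [:flush])
    finish(data, ref, {:ok, result})
  end

  # Crash: the task died before replying. It becomes an error result,
  # not a crash of the owner.
  def handle_info({:DOWN, ref, :process, _pid, reason}, %{tasks: tasks} = data)
      when is_map_key(tasks, ref) do
    finish(data, ref, {:error, reason})
  end

  # Anything else is not a task message; leave the data untouched.
  def handle_info(_other, data), do: data

  defp finish(data, ref, outcome) do
    {tool_call_id, tasks} = Map.pop(data.tasks, ref)
    %{data | tasks: tasks, results: Map.put(data.results, tool_call_id, outcome)}
  end
end
```

Since every clause returns quickly, a cancel or inspect message queued behind a task reply is handled as soon as the reply is recorded.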

ensure_started — the unifying door

Callers never hold an agent pid. Everything addresses by conversation_id through a single ensure_started(conversation_id) that returns the live process (addressed through its Registry :via tuple) or starts it under the DynamicSupervisor, rehydrating from persistence on the way up. Both a new user message and a pending-call resolution enter through this same door, which is why resumability falls out for free.
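
One way the door could look, as a sketch. The registry and supervisor names are hypothetical, and the worker is a bare GenServer stand-in rather than the real :gen_statem agent, used only to keep the example short. Rehydration would happen in the worker's init and is left as a comment.

```elixir
defmodule EnsureStartedSketch do
  # Hypothetical process names; both must be started by the application.
  @registry EnsureStartedSketch.AgentRegistry
  @supervisor EnsureStartedSketch.AgentSupervisor

  def via(conversation_id), do: {:via, Registry, {@registry, conversation_id}}

  # Returns a via tuple, never a pid, so callers cannot hold a stale pid.
  def ensure_started(conversation_id) do
    case Registry.lookup(@registry, conversation_id) do
      [{_pid, _value}] ->
        {:ok, via(conversation_id)}

      [] ->
        spec = {EnsureStartedSketch.Worker, conversation_id}

        case DynamicSupervisor.start_child(@supervisor, spec) do
          {:ok, _pid} -> {:ok, via(conversation_id)}
          # Another caller won the start race; the process exists, so succeed.
          {:error, {:already_started, _pid}} -> {:ok, via(conversation_id)}
          {:error, reason} -> {:error, reason}
        end
    end
  end
end

defmodule EnsureStartedSketch.Worker do
  # Stand-in for the agent process (a GenServer here, not a :gen_statem).
  use GenServer

  def start_link(conversation_id) do
    GenServer.start_link(__MODULE__, conversation_id,
      name: EnsureStartedSketch.via(conversation_id)
    )
  end

  @impl true
  def init(conversation_id) do
    # The real agent would rehydrate from persistence here.
    {:ok, %{conversation_id: conversation_id}}
  end

  @impl true
  def handle_call(:conversation_id, _from, data) do
    {:reply, data.conversation_id, data}
  end
end
```

Calling ensure_started/1 twice for the same conversation_id yields the same via tuple and a single child under the supervisor.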

Persisted fsm_state (deliberately small)

fsm_state = %{
  state:    :idle | :awaiting_input,   # only ever persisted in these two
  pending:  %{tool_call_id => %{executor, kind, prompt}},
  last_seq: N                          # log position the working set was built from
}

kind is :approval | :elicitation | :client_exec — the discriminator the renderer switches on (see 06); executor alone can't distinguish a gated :server call from an elicitation.

There is no persisted streaming or executing_tools state. This one shape resolves "what to persist," "which states are safe to evict," and "what a suspended-on-human agent carries" at once.

Because suspension and resolution are canonical event types, fsm_state is strictly a cache over the log — rebuildable from it, never authoritative. On any disagreement the log wins. This removes the atomicity problem between the log append and the snapshot write (the ETS adapter has no transactions).
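
"A cache over the log" can be made concrete with a fold. The sketch below assumes each log event is a map with :seq and :type keys, and that suspension and resolution events carry a :tool_call_id; those shapes and the module name are hypothetical. It covers only the two persisted states; logs that end mid-turn are a recovery concern and are not modelled here.

```elixir
defmodule FsmStateSketch do
  # Rebuilds the persisted shape from the log alone (oldest event first).
  def rebuild(log) do
    initial = %{state: :idle, pending: %{}, last_seq: 0}

    log
    |> Enum.reduce(initial, fn event, acc ->
      apply_event(event, %{acc | last_seq: event.seq})
    end)
    |> derive_state()
  end

  # A suspension parks a call in the pending map.
  defp apply_event(%{type: :suspension} = event, acc) do
    entry = Map.take(event, [:executor, :kind, :prompt])
    %{acc | pending: Map.put(acc.pending, event.tool_call_id, entry)}
  end

  # A resolution removes it again.
  defp apply_event(%{type: :resolution} = event, acc) do
    %{acc | pending: Map.delete(acc.pending, event.tool_call_id)}
  end

  defp apply_event(_event, acc), do: acc

  # Nothing pending means :idle; anything pending means :awaiting_input.
  defp derive_state(%{pending: pending} = acc) when map_size(pending) == 0 do
    %{acc | state: :idle}
  end

  defp derive_state(acc), do: %{acc | state: :awaiting_input}
end
```

If a stored snapshot disagrees with rebuild/1 over the same log, the rebuilt value is the one to trust, which is what "the log wins" means in practice.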

Resolved: suspend-safe states and mid-turn kills

Only idle and awaiting_input are snapshotted and evictable — they are clean. A kill in streaming or mid-:server-tool is not frozen and resumed; mid-turn durability is "recover from the log," not "freeze the stream." The two dangling log shapes recover differently:

  • Log ends in a user_msg (killed while streaming, nothing dispatched) — re-run the LLM turn. Safe; no side effects happened yet.
  • Log ends in a tool_call with no tool_result (killed mid-tool) — do not re-roll the LLM: it would mint new calls with new ids and re-fire side effects (send_email twice). Re-dispatch that exact call, same tool_call_id. The library passes the id through; only the user's tool can honor it — document loudly that side-effecting :server tools should use tool_call_id as their idempotency key.
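
The two recovery rules above can be sketched as a pure function over the log. Event shapes and the return values (:rerun_llm_turn, {:redispatch, calls}, :no_action_specified) are hypothetical. The sketch also assumes that a call with a suspension and no resolution is parked in awaiting_input rather than dangling, which this section does not state explicitly.

```elixir
defmodule RecoverySketch do
  def recovery_action(log) do
    # Collect the tool_call_ids that appear on events of a given type.
    ids = fn type ->
      for %{type: ^type, tool_call_id: id} <- log, into: MapSet.new(), do: id
    end

    resulted = ids.(:tool_result)
    # Assumption: suspended-but-unresolved calls are waiting on a human.
    parked = MapSet.difference(ids.(:suspension), ids.(:resolution))

    dangling =
      for %{type: :tool_call, tool_call_id: id} = call <- log,
          not MapSet.member?(resulted, id),
          not MapSet.member?(parked, id),
          do: call

    cond do
      # Killed mid-tool: re-dispatch the exact calls, same tool_call_id.
      dangling != [] -> {:redispatch, dangling}
      # Killed while streaming, nothing dispatched: re-run the LLM turn.
      match?(%{type: :user_msg}, List.last(log)) -> :rerun_llm_turn
      # The text names only the two shapes above.
      true -> :no_action_specified
    end
  end
end
```

The re-dispatched calls are the original maps, so the tool_call_id a tool sees on the second attempt is the one it saw on the first, which is what makes it usable as an idempotency key.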

Cancellation

"Stop generating" works from any non-idle state, not only streaming:

  • From streaming — kill the streaming task and tear down the HTTP connection. ReqLLM's stream response carries cancellation as a captured closure, invoked as stream_response.cancel.() (there is no module-level cancel/1); verify in tests that the socket actually closes. Records a cancelled/partial assistant turn.
  • From executing_tools — Task.shutdown the tool tasks and log synthesized "[cancelled]" tool results so every tool_call keeps its paired result — providers reject a rendered context with an orphaned call (see 05).
  • From awaiting_input — resolve all pending calls as "user cancelled" tool-errors. This is also the escape hatch from the single-in-flight composer lock: a user parked on an elicitation they don't want to answer cancels the turn instead of waiting out the timeout.
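
A sketch of the executing_tools case, assuming the agent tracks its tool tasks in a map of tool_call_id => Task. The module name, event shapes, and the orphaned_calls/1 helper are hypothetical. Task.shutdown/2 must be called from the process that owns the tasks, and the sketch ignores the case where a task finished just before the shutdown.

```elixir
defmodule CancelSketch do
  # Shuts down every in-flight tool task and returns one synthesized
  # tool_result per call, ready to be appended to the log.
  def cancel_tools(tasks_by_id) do
    for {tool_call_id, task} <- tasks_by_id do
      Task.shutdown(task, :brutal_kill)
      %{type: :tool_result, tool_call_id: tool_call_id, content: "[cancelled]"}
    end
  end

  # Tool calls in the log that have no paired tool_result.
  def orphaned_calls(log) do
    resulted =
      for %{type: :tool_result, tool_call_id: id} <- log, into: MapSet.new(), do: id

    for %{type: :tool_call, tool_call_id: id} = call <- log,
        not MapSet.member?(resulted, id),
        do: call
  end
end
```

A check like orphaned_calls/1 is useful in tests: after any cancellation path it should return an empty list for the resulting log.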

Resolved: idle eviction (two tiers)

  • Short idle, parked in awaiting_input → hibernate. Compacts the heap via GC while keeping the process alive and addressable, so the human's eventual resolution arrives without a DB rehydrate.
  • Long idle → persist fsm_state and terminate, dropping from memory. Revival through ensure_started is cheap, and a terminated agent frees everything, whereas a hibernated one still holds its compacted heap.
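
The two tiers can be sketched with :gen_statem state timeouts and the :hibernate action. This is an illustration only: it models just the awaiting_input state, the option names (:short_idle_ms, :long_idle_ms, :persist, :pending) are hypothetical, and persist is an injected function standing in for the snapshot write.

```elixir
defmodule EvictionSketch do
  @behaviour :gen_statem

  def start_link(opts), do: :gen_statem.start_link(__MODULE__, opts, [])

  @impl true
  def callback_mode, do: :handle_event_function

  @impl true
  def init(opts) do
    data = %{
      long_idle_ms: Keyword.fetch!(opts, :long_idle_ms),
      persist: Keyword.fetch!(opts, :persist),
      fsm_state: %{
        state: :awaiting_input,
        pending: Keyword.get(opts, :pending, %{}),
        last_seq: 0
      }
    }

    # Arm the first tier as soon as the agent is parked.
    short = Keyword.fetch!(opts, :short_idle_ms)
    {:ok, :awaiting_input, data, [{:state_timeout, short, :short_idle}]}
  end

  @impl true
  # Tier 1: compact the heap but stay alive, and arm the second tier.
  def handle_event(:state_timeout, :short_idle, :awaiting_input, data) do
    {:keep_state_and_data, [:hibernate, {:state_timeout, data.long_idle_ms, :long_idle}]}
  end

  # Tier 2: write the small snapshot, then leave memory entirely.
  def handle_event(:state_timeout, :long_idle, :awaiting_input, data) do
    data.persist.(data.fsm_state)
    {:stop, :normal}
  end

  # The process stays addressable between the two tiers.
  def handle_event({:call, from}, :ping, _state, _data) do
    {:keep_state_and_data, [{:reply, from, :pong}]}
  end
end
```

A state timeout is not reset by ordinary events, so answering :ping between the tiers does not postpone the long-idle termination in this sketch.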

Concurrent-message policy

v0 is single-in-flight: a user message arriving mid-turn is rejected (the default UI disables the composer). Post-v0 we expect two caller-chosen modes — direct-send (interrupt/run-alongside) and queue (serialize). The state machine should handle "user input arrived in a non-idle state" explicitly per state to leave room for this.

Crash semantics

A crashing tool task is isolated by its task boundary and surfaces as an error result fed back to the model, never an agent crash. If the agent crashes, the supervisor restarts it and it rehydrates from the log via ensure_started. The log is the recovery boundary.

ReqLLM operational notes (verified v1.16.0)

  • Streaming pools are HTTP/1-only by default (a known Finch ALPN bug with mixed-protocol large bodies) — revisit pool config before chasing streaming concurrency at scale.
  • Don't reach into ReqLLM struct internals: the substrate already migrated once within 1.x (TypedStruct → Zoi). Treat public constructors as the contract.

Remaining open

  • Multi-node addressing (Registry is local; a distributed registry such as Horde or syn would be needed for clustering). Out of scope for v0 — the invariant to protect is that ensure_started stays the only addressing point, so this becomes a one-file change later.