Persistence and resumability

Copy Markdown View Source

Pluggable behaviour

Persistence is a behaviour with two shipped adapters: an ephemeral/ETS default for people who don't want a database, and an Ecto/Postgres adapter for durable, resumable conversations. The framing is Oban-style: an append-only, ordered, queryable log is the source of truth.

ReqLLM's Context, Message, and ContentPart structs all implement Jason.Encoder, so writes to jsonb are free. Reads are not: ReqLLM (verified v1.16.0) has no public JSON→struct decode path — Response.decode_* decodes provider wire payloads, not persisted JSON. Agentix owns a small deserializer for those three structs (worth offering upstream); budget for the content-part variants and tool-call-args edge cases there.

The ETS table in the default adapter is owned by a supervisor-side owner process (or is a named public table) — never by the agent, or kill-and-resume dies with the very process it is supposed to survive.

Schema (sketch)

conversations
  id              uuid pk
  settings        jsonb        -- provider/model/system prompt snapshot
  fsm_state       jsonb        -- the persisted machine state (see below)
  status          enum         -- active | suspended | idle | ended
  inserted_at / updated_at

events                          -- append-only, ordered by per-conversation seq
  id              uuid pk
  conversation_id uuid fk
  seq             bigint       -- monotonic per conversation
  type           enum          -- user_msg | assistant_msg | tool_call
                                --  | tool_result | suspension | resolution
  content         jsonb
  inserted_at

summaries                       -- derived compaction artifacts; double as snapshots
  id              uuid pk
  conversation_id uuid fk
  from_seq        bigint        -- span of events this summary covers
  to_seq          bigint
  content         jsonb
  version         text
  inserted_at

tool_calls                      -- so HITL suspensions survive a kill
  id              text pk       -- the tool_call_id (correlation key)
  conversation_id uuid fk
  executor        enum
  status          enum          -- pending | resolved | errored | expired
  args            jsonb
  result          jsonb
  inserted_at / resolved_at

model_calls                     -- OPTIONAL audit, off by default (see below)
  id              uuid pk
  conversation_id uuid fk
  turn_ref        bigint
  rendered_context jsonb        -- exactly what was sent to the model
  model           text
  usage           jsonb
  latency_ms      integer
  summary_version text          -- which compaction summary applied
  evictions       jsonb         -- what was stubbed/evicted this turn
  inserted_at

The events table is the canonical log. tool_calls is partly derived but kept separate because its row-level mutable status (pending → resolved) is exactly what a revived agent needs to reconstruct in-flight suspensions.

summaries is derived, never canonical (compaction must not touch the log — see 05), but it is load-bearing: revival reads "latest summary + events after its to_seq," and the summarization reducer's state ("latest summary covers up to seq N") is derived from this table rather than carried loose.

Kill-and-resume

Both a new user message and a pending-call resolution enter through ensure_started(conversation_id) (see 01), so revival is automatic: kill the agent on idle, and the next event of either kind starts it back up and rehydrates from the log.

What you persist is not just the message log. A revived agent must also know it was suspended on a human and which calls are still pending — so the fsm_state snapshot carries the current state and the pending map. Reconstruct in-flight tool_calls from their rows. Without this, a revived agent comes back not knowing it owes the model a tool result.

Resolved: the fsm_state payload shape

The persisted snapshot is deliberately small (the canonical definition lives in 01):

fsm_state = %{
  state:    :idle | :awaiting_input,   # only ever persisted in these two
  pending:  %{tool_call_id => %{executor, kind, prompt}},
  last_seq: N                          # log position the working set was built from
}

last_seq is what lets revival read "latest summary + events since that summary's span" instead of replaying from zero (see snapshot cadence below). This one shape is the join between the resolver (03), the kill/resume path, and the schema. And because suspension/resolution are canonical event types, fsm_state is strictly a cache over the log — rebuildable, never authoritative; on disagreement the log wins (canonical statement in 01).

Safe-to-suspend states: idle and awaiting_input are clean to snapshot and evict. streaming and mid-:server-tool are not — there is no persisted streaming or executing_tools. A kill in those states is not frozen and resumed; recovery is from the log, and the two dangling shapes differ: a log ending in a user_msg re-runs the LLM turn (safe — no side effects yet); a log ending in a tool_call with no tool_result re-dispatches that exact call with the same tool_call_id — never re-rolls the LLM, which would mint new ids and duplicate side effects (canonical statement in 01). Idempotency on tool_call_id also covers the kill → revive → late-answer race.

Timeout machinery

Suspension timeouts belong to the persistence behaviour (schedule_expiry / cancel_expiry), not to core machinery — a per-agent timer dies with the agent, and Oban cannot be a core dependency when persistence is pluggable (Oban requires Ecto/Postgres; the default adapter is ETS). The Ecto adapter backs expiry with Oban jobs ("expire pending tool call X if still unresolved"), which survive kill/revive. The ETS adapter uses Process.send_after best-effort — acceptable, since ETS doesn't survive a restart anyway. Oban stays an optional dep of the adapter, not of Agentix.

Resolved: scope on revival

%Agentix.Scope{} is runtime ambient state (current user, db handle) and is not persisted. It is supplied per entry call: a LiveView resolution passes its own scope; a webhook or job passes what it has. Timeout-driven resolutions (the expiry job) run with a documented system scope — a tool that needs a real user scope and receives the system scope fails as a tool-error rather than guessing. Apps that need more can stash a serializable scope seed in conversations.settings, but that is app-level composition, not library machinery.

Context vs message storage

The principle that resolves the "store them together?" question: the message log is canonical; the rendered context is derived.

  • Messages (what the user said, what the model said, tool calls and results) are always logged. They define the logical conversation and are what replay reconstructs.
  • The resolved context a hook injects for a given turn (retrieval hits, memory) is a per-turn artifact — a function of the message plus external state at that instant. It does not go in the message history. Storing it inline conflates "what the user said" with "how we happened to augment that turn," bloats the canonical record, and lies on replay, because re-running the conversation should generally re-derive fresh context, not resurrect a stale snapshot.

So: messages always logged; resolved context logged separately and optionally, keyed by turn, in model_calls. That optional table is also where summarization version and evictions land (see 05), so that when someone reports "it forgot the address I gave it," you can tell whether it was compacted out or the model ignored it. The tradeoff "optional" buys is reproducibility: without recording exactly what was rendered, you cannot perfectly reconstruct why the model said what it said — valuable for evals and debugging, but it costs storage and has privacy implications, so it is off by default and switched on when evaluating.

Resolved: snapshot cadence

Event-sourced truth + snapshots as an optimization — and the compaction summaries are the snapshots. Pure replay-from-zero gets slow on long conversations; snapshot-only loses auditability. But the prefix summaries compaction already produces (see 05) are prefix snapshots of the conversation, keyed by the span of events they cover. So revival reads "latest summary + events since its span" (using fsm_state.last_seq) rather than replaying everything. One mechanism serves both compaction and snapshotting — no separate snapshot table or cadence to tune.

Resolved: model_calls GC

TTL-based, configurable, off by default. It is the fastest-growing, least- permanent table and exists only for debugging/evals, so a simple time-based drop (or a per-conversation row cap) is enough. Since the audit table itself is off by default (below), there is usually nothing to GC; when it is switched on for an eval run, the TTL keeps it from growing without bound.

Open questions

  • Multi-node persistence story (follows the addressing question in 01; out of scope for v0).