Pluggable behaviour
Persistence is a behaviour with two shipped adapters: an ephemeral/ETS default for people who don't want a database, and an Ecto/Postgres adapter for durable, resumable conversations. The framing is Oban-style: an append-only, ordered, queryable log is the source of truth.
ReqLLM's Context, Message, and ContentPart structs all implement
Jason.Encoder, so writes to jsonb are free. Reads are not: ReqLLM
(verified v1.16.0) has no public JSON→struct decode path — Response.decode_*
decodes provider wire payloads, not persisted JSON. Agentix owns a small
deserializer for those three structs (worth offering upstream); budget for the
content-part variants and tool-call-args edge cases there.
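A minimal sketch of what such a deserializer could look like. The `Part` and `Message` structs below are hypothetical stand-ins, not ReqLLM's real definitions, and the field names and JSON keys are assumptions made for illustration. The input is the string-keyed map a JSON decoder would hand back from a jsonb column.

```elixir
defmodule DecodeSketch do
  # Hypothetical stand-ins for the persisted structs, NOT ReqLLM's definitions.
  defmodule Part do
    defstruct [:type, :text, :tool_call_id, :arguments]
  end

  defmodule Message do
    defstruct [:role, content: []]
  end

  # Explicit whitelist: never String.to_atom/1 on persisted input.
  @roles %{"system" => :system, "user" => :user, "assistant" => :assistant, "tool" => :tool}

  # Input is the string-keyed map a JSON decoder returns for a jsonb column.
  def message(%{"role" => role, "content" => raw_parts}) when is_list(raw_parts) do
    with {:ok, role} <- Map.fetch(@roles, role),
         {:ok, parts} <- parts(raw_parts) do
      {:ok, %Message{role: role, content: parts}}
    else
      _ -> {:error, :invalid_message}
    end
  end

  def message(_other), do: {:error, :invalid_message}

  def part(%{"type" => "text", "text" => text}) when is_binary(text) do
    {:ok, %Part{type: :text, text: text}}
  end

  # Edge case: arguments may be absent or null for a zero-argument tool.
  def part(%{"type" => "tool_call", "tool_call_id" => id} = raw) when is_binary(id) do
    case Map.get(raw, "arguments") do
      nil -> {:ok, %Part{type: :tool_call, tool_call_id: id, arguments: %{}}}
      args when is_map(args) -> {:ok, %Part{type: :tool_call, tool_call_id: id, arguments: args}}
      _other -> {:error, :invalid_part}
    end
  end

  def part(_other), do: {:error, :invalid_part}

  # Decodes every part, stopping at the first one that fails.
  defp parts(raw_parts) do
    result =
      Enum.reduce_while(raw_parts, {:ok, []}, fn raw, {:ok, acc} ->
        case part(raw) do
          {:ok, decoded} -> {:cont, {:ok, [decoded | acc]}}
          error -> {:halt, error}
        end
      end)

    with {:ok, reversed} <- result, do: {:ok, Enum.reverse(reversed)}
  end
end
```

Returning tagged tuples rather than raising keeps a single malformed row from crashing revival; the caller can decide whether to skip or fail.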
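The ETS table in the default adapter is owned by a long-lived process on the supervisor side (typically created as a named public table so agents can read and write it directly), never by the agent: an ETS table is destroyed when its owning process exits, so an agent-owned table would die with the very process that kill-and-resume is supposed to outlive.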
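The ownership rule can be sketched with a tiny owner process. The module and table names here are hypothetical; the only point is that the process creating the table is not the agent, so writers can come and go without taking the data with them.

```elixir
defmodule TableOwnerSketch do
  use GenServer

  # Hypothetical table name, for illustration only.
  @table :agentix_sketch_events

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def table, do: @table

  @impl true
  def init(_opts) do
    # :public lets agent processes write directly; this process does no work
    # beyond keeping the table alive for as long as it runs.
    :ets.new(@table, [:named_table, :public, :ordered_set])
    {:ok, %{}}
  end
end
```

In a supervision tree this owner would be started before whatever supervises the agents, so the table exists by the time the first agent boots.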
Schema (sketch)
conversations
  id                uuid pk
  settings          jsonb     -- provider/model/system prompt snapshot
  fsm_state         jsonb     -- the persisted machine state (see below)
  status            enum      -- active | suspended | idle | ended
  inserted_at / updated_at

events              -- append-only, ordered by per-conversation seq
  id                uuid pk
  conversation_id   uuid fk
  seq               bigint    -- monotonic per conversation
  type              enum      -- user_msg | assistant_msg | tool_call
                              -- | tool_result | suspension | resolution
  content           jsonb
  inserted_at

summaries           -- derived compaction artifacts; double as snapshots
  id                uuid pk
  conversation_id   uuid fk
  from_seq          bigint    -- span of events this summary covers
  to_seq            bigint
  content           jsonb
  version           text
  inserted_at

tool_calls          -- so HITL suspensions survive a kill
  id                text pk   -- the tool_call_id (correlation key)
  conversation_id   uuid fk
  executor          enum
  status            enum      -- pending | resolved | errored | expired
  args              jsonb
  result            jsonb
  inserted_at / resolved_at

model_calls         -- OPTIONAL audit, off by default (see below)
  id                uuid pk
  conversation_id   uuid fk
  turn_ref          bigint
  rendered_context  jsonb     -- exactly what was sent to the model
  model             text
  usage             jsonb
  latency_ms        integer
  summary_version   text      -- which compaction summary applied
  evictions         jsonb     -- what was stubbed/evicted this turn
  inserted_at

The events table is the canonical log. tool_calls is partly derived but kept
separate because its row-level mutable status (pending → resolved) is exactly what a
revived agent needs to reconstruct in-flight suspensions.
summaries is derived, never canonical (compaction must not touch the log — see
05), but it is load-bearing: revival reads "latest summary + events after its
to_seq," and the summarization reducer's state ("latest summary covers up to seq
N") is derived from this table rather than carried loose.
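The revival read described above can be sketched as a pure function over already-loaded rows. Plain maps stand in for the summaries and events rows, and the assumption that seq starts at 1 (so 0 means "no summary yet") is mine, not the schema's.

```elixir
defmodule RevivalSketch do
  # Returns {latest_summary_or_nil, events_after_it}, events in seq order.
  def window(summaries, events) do
    latest = Enum.max_by(summaries, & &1.to_seq, fn -> nil end)

    # Assumption: seq starts at 1, so a floor of 0 keeps every event.
    floor = if latest, do: latest.to_seq, else: 0

    tail =
      events
      |> Enum.filter(&(&1.seq > floor))
      |> Enum.sort_by(& &1.seq)

    {latest, tail}
  end
end
```

With an SQL-backed adapter the same shape would presumably be two queries (newest summary, then events with seq greater than its to_seq) rather than an in-memory filter.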
Kill-and-resume
Both a new user message and a pending-call resolution enter through
ensure_started(conversation_id) (see 01), so revival is automatic: kill the
agent on idle, and the next event of either kind starts it back up and rehydrates
from the log.
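A rough sketch of the single-entry-point idea, using only Registry and DynamicSupervisor from the standard library. The canonical ensure_started lives in 01; everything below (module names, the stand-in agent, the registry and supervisor names) is hypothetical and only illustrates "start it if it is not already running".

```elixir
defmodule EnsureStartedSketch do
  defmodule ConversationAgent do
    # :temporary so an agent stopped on idle is not restarted by the supervisor;
    # only the next incoming event brings it back.
    use GenServer, restart: :temporary

    def start_link(id), do: GenServer.start_link(__MODULE__, id, name: via(id))

    def via(id), do: {:via, Registry, {EnsureStartedSketch.Reg, id}}

    @impl true
    def init(id) do
      # A real agent would rehydrate from the log here.
      {:ok, %{conversation_id: id}}
    end
  end

  # Starts the hypothetical registry and supervisor this sketch relies on.
  def start_infra do
    {:ok, _} = Registry.start_link(keys: :unique, name: EnsureStartedSketch.Reg)
    {:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: EnsureStartedSketch.Sup)
    :ok
  end

  # Idempotent: returns the running agent, or starts one if none exists.
  def ensure_started(conversation_id) do
    case DynamicSupervisor.start_child(EnsureStartedSketch.Sup, {ConversationAgent, conversation_id}) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, pid}} -> {:ok, pid}
      other -> other
    end
  end
end
```

Because both entry paths call the same function, neither a user message nor a late resolution needs to know whether the agent was evicted in the meantime.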
What you persist is not just the message log. A revived agent must also know it
was suspended on a human and which calls are still pending — so the fsm_state
snapshot carries the current state and the pending map. Reconstruct in-flight
tool_calls from their rows. Without this, a revived agent comes back not knowing
it owes the model a tool result.
Resolved: the fsm_state payload shape
The persisted snapshot is deliberately small (the canonical definition lives in
01):
fsm_state = %{
  state: :idle | :awaiting_input,   # only ever persisted in these two
  pending: %{tool_call_id => %{executor, kind, prompt}},
  last_seq: N                       # log position the working set was built from
}

last_seq is what lets revival read "latest summary + events since that summary's
span" instead of replaying from zero (see snapshot cadence below). This one shape is
the join between the resolver (03), the kill/resume path, and the schema. And
because suspension/resolution are canonical event types, fsm_state is strictly
a cache over the log — rebuildable, never authoritative; on disagreement the log
wins (canonical statement in 01).
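To make "a cache over the log" concrete, here is a sketch that rebuilds the snapshot by folding over events. The event content shape (atom keys, a tool_call_id on both suspension and resolution events) is an assumption for illustration; the canonical definition is in 01. Dangling tails such as a trailing user_msg are deliberately ignored here, since they belong to the recovery path.

```elixir
defmodule FsmCacheSketch do
  # Rebuilds the fsm_state shape from the canonical event log.
  def rebuild(events) do
    ordered = Enum.sort_by(events, & &1.seq)

    pending =
      Enum.reduce(ordered, %{}, fn
        # A suspension opens a pending entry keyed by tool_call_id.
        %{type: :suspension, content: c}, acc ->
          Map.put(acc, c.tool_call_id, Map.take(c, [:executor, :kind, :prompt]))

        # A resolution closes it again.
        %{type: :resolution, content: c}, acc ->
          Map.delete(acc, c.tool_call_id)

        _other, acc ->
          acc
      end)

    last_seq =
      case List.last(ordered) do
        nil -> 0
        event -> event.seq
      end

    state = if map_size(pending) == 0, do: :idle, else: :awaiting_input
    %{state: state, pending: pending, last_seq: last_seq}
  end
end
```

If a stored fsm_state ever disagrees with what this fold produces, the fold result is the one to trust.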
Safe-to-suspend states: idle and awaiting_input are clean to snapshot and
evict. streaming and mid-:server-tool are not; there is no persisted
streaming or executing_tools state. An agent killed in one of those states is
not frozen and resumed. It recovers from the log, and the two dangling shapes
differ. A log ending in a user_msg re-runs the LLM turn (safe, since no side
effects have happened yet). A log ending in a tool_call with no tool_result
re-dispatches that exact call with the same tool_call_id. It never re-rolls the
LLM, which would mint new ids and duplicate side effects (canonical statement in
01). Idempotency on tool_call_id also covers the kill → revive → late-answer race.
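The two dangling shapes reduce to a decision on the last event in the log. The sketch below assumes atom-keyed event maps and a single in-flight call at the tail; the return tags are made up for illustration.

```elixir
defmodule RecoverySketch do
  # Decides what a revived agent should do, based only on the log tail.
  def action(events) do
    case events |> Enum.sort_by(& &1.seq) |> List.last() do
      # The model never answered: re-running the turn has no side effects.
      %{type: :user_msg} ->
        :rerun_llm_turn

      # The call was issued but has no result: re-dispatch with the SAME id.
      %{type: :tool_call, content: %{tool_call_id: id}} ->
        {:redispatch, id}

      # Anything else (or an empty log) is a clean tail.
      _ ->
        :clean
    end
  end
end
```

Keeping the id stable is what lets the executor's idempotency check recognise a call it has already performed.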
Timeout machinery
Suspension timeouts belong to the persistence behaviour (schedule_expiry /
cancel_expiry), not to core machinery — a per-agent timer dies with the agent,
and Oban cannot be a core dependency when persistence is pluggable (Oban requires
Ecto/Postgres; the default adapter is ETS). The Ecto adapter backs expiry with Oban
jobs ("expire pending tool call X if still unresolved"), which survive kill/revive.
The ETS adapter uses Process.send_after best-effort — acceptable, since ETS
doesn't survive a restart anyway. Oban stays an optional dep of the adapter, not of
Agentix.
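A best-effort version of the two callbacks, in the spirit of the ETS adapter, might look like the sketch below. The callback names come from the text above; their arities, the notify pid, and the `{:tool_call_expired, id}` message are assumptions for illustration. Timers live in one long-lived process rather than in the agent, so they survive agent eviction (though not a node restart).

```elixir
defmodule EtsExpirySketch do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def schedule_expiry(tool_call_id, timeout_ms, notify) when is_pid(notify) do
    GenServer.call(__MODULE__, {:schedule, tool_call_id, timeout_ms, notify})
  end

  def cancel_expiry(tool_call_id), do: GenServer.call(__MODULE__, {:cancel, tool_call_id})

  @impl true
  def init(_opts), do: {:ok, %{}}

  @impl true
  def handle_call({:schedule, id, ms, notify}, _from, timers) do
    timers = cancel(timers, id)
    # The token lets handle_info tell a live timer from a stale message.
    token = make_ref()
    timer = Process.send_after(self(), {:expire, id, token, notify}, ms)
    {:reply, :ok, Map.put(timers, id, {timer, token})}
  end

  def handle_call({:cancel, id}, _from, timers) do
    {:reply, :ok, cancel(timers, id)}
  end

  @impl true
  def handle_info({:expire, id, token, notify}, timers) do
    case timers do
      %{^id => {_timer, ^token}} ->
        send(notify, {:tool_call_expired, id})
        {:noreply, Map.delete(timers, id)}

      # Cancelled or rescheduled after the message was already queued.
      _stale ->
        {:noreply, timers}
    end
  end

  defp cancel(timers, id) do
    case Map.pop(timers, id) do
      {nil, rest} ->
        rest

      {{timer, _token}, rest} ->
        Process.cancel_timer(timer)
        rest
    end
  end
end
```

An Oban-backed adapter would replace the timer map with a scheduled job per pending call, which is what makes expiry durable across restarts.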
Resolved: scope on revival
%Agentix.Scope{} is runtime ambient state (current user, db handle) and is not
persisted. It is supplied per entry call: a LiveView resolution passes its own
scope; a webhook or job passes what it has. Timeout-driven resolutions (the expiry
job) run with a documented system scope — a tool that needs a real user scope
and receives the system scope fails as a tool-error rather than guessing. Apps that
need more can stash a serializable scope seed in conversations.settings, but that
is app-level composition, not library machinery.
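The "fail rather than guess" rule can be illustrated with a stand-in scope struct. `ScopeSketch` is hypothetical and says nothing about the real fields of %Agentix.Scope{}; `send_receipt/2` is an invented example tool.

```elixir
defmodule ScopeSketch do
  # Hypothetical stand-in for a runtime scope; never persisted.
  defstruct user: nil, system?: false

  # The documented scope an expiry job would run with.
  def system, do: %__MODULE__{system?: true}

  # A tool that needs a real user succeeds only when one is present.
  def send_receipt(%__MODULE__{user: %{email: email}}, order_id) do
    {:ok, "receipt for #{order_id} sent to #{email}"}
  end

  # With the system scope (or no user) it reports a tool-error instead of guessing.
  def send_receipt(%__MODULE__{}, _order_id) do
    {:error, :user_scope_required}
  end
end
```

The error tuple flows back to the model as an ordinary tool result, so a timeout-driven resolution degrades into a visible failure rather than acting on behalf of nobody.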
Context vs message storage
The principle that resolves the "store them together?" question: the message log is canonical; the rendered context is derived.
- Messages (what the user said, what the model said, tool calls and results) are always logged. They define the logical conversation and are what replay reconstructs.
- The resolved context a hook injects for a given turn (retrieval hits, memory) is a per-turn artifact — a function of the message plus external state at that instant. It does not go in the message history. Storing it inline conflates "what the user said" with "how we happened to augment that turn," bloats the canonical record, and lies on replay, because re-running the conversation should generally re-derive fresh context, not resurrect a stale snapshot.
So: messages always logged; resolved context logged separately and optionally,
keyed by turn, in model_calls. That optional table is also where summarization
version and evictions land (see 05), so that when someone reports "it forgot the
address I gave it," you can tell whether it was compacted out or the model ignored
it. What "optional" costs is reproducibility: without a record of exactly what
was rendered, you cannot fully reconstruct why the model said what it said. That
record is valuable for evals and debugging, but it costs storage and has privacy
implications, so it is off by default and switched on when evaluating.
Resolved: snapshot cadence
Event-sourced truth + snapshots as an optimization — and the compaction summaries
are the snapshots. Pure replay-from-zero gets slow on long conversations;
snapshot-only loses auditability. But the prefix summaries compaction already
produces (see 05) are prefix snapshots of the conversation, keyed by the span of
events they cover. So revival reads "latest summary + events since its span" (using
fsm_state.last_seq) rather than replaying everything. One mechanism serves both
compaction and snapshotting — no separate snapshot table or cadence to tune.
Resolved: model_calls GC
TTL-based, configurable, off by default. It is the fastest-growing, least-permanent table and exists only for debugging/evals, so a simple time-based drop (or a per-conversation row cap) is enough. Since the audit table itself is off by default (see Context vs message storage), there is usually nothing to GC; when it is switched on for an eval run, the TTL keeps it from growing without bound.
Open questions
- Multi-node persistence story (follows the addressing question in
01; out of scope for v0).