Every Jidoka operation declares one idempotency policy. That single field
drives whether the runtime can retry, whether resume can replay, whether
the spec compiles without an operation control, and whether incomplete
work surfaces to application reconciliation. This guide documents each
policy in detail, the Jidoka.Effect.Journal semantics on resume, and the
production guardrails that depend on getting this right.
When To Use This
- Use this guide when an agent is about to perform an external action you cannot blindly retry (charges, sends, deletes, deploys).
- Use this guide when you add a new operation source and need to pick the right idempotency: value.
- Use this guide when an incomplete Jidoka.resume/2 should not blow up the agent loop but instead route to a reconciliation worker.
Prerequisites
- An agent with at least one operation. See Getting Started.
- Familiarity with Jidoka.Agent.Spec.Operation (the idempotency field lives there).
- Familiarity with snapshots and resume; see Snapshots And Resume.
mix deps.get
mix test
Quick Example
Declare a single risky operation. The spec refuses to compile until a matching operation control is attached.
defmodule MyApp.SupportAgent do
  use Jidoka.Agent

  agent :support_agent do
    instructions "Refund only when explicitly approved."
  end

  tools do
    action MyApp.RefundOrder, idempotency: :unsafe_once
  end

  controls do
    operation MyApp.RequireRefundApproval,
      when: [name: :refund_order, idempotency: :unsafe_once]
  end
end

Compiling the plan now succeeds:

{:ok, _plan} = Jidoka.plan(MyApp.SupportAgent)

Remove the control and the plan refuses to compile with
{:error, {:unsafe_once_requires_control, "refund_order", :action}}.
Concepts
Idempotency policy is the contract between the agent definition, the runtime, and the journal. The runtime never assumes; it always asks.
╭───────────────────────╮     ╭──────────────────────╮
│ Operation.idempotency │────▶│ Spec validation      │
╰───────────────────────╯     │ (compile-time gate)  │
                              ╰──────┬───────────────╯
                                     │
                                     ▼
                              ╭──────────────────────╮
                              │ Effect.Intent        │
                              │ idempotency: ...     │
                              │ idempotency_key: ... │
                              ╰──────┬───────────────╯
                                     │
                      ╭──────────────┼──────────────╮
                      ▼              ▼              ▼
                 Journal has    Journal has    Journal has
                  no intent     intent only    intent + result
                      │              │              │
                      ▼              ▼              ▼
               run capability   per-policy     replay journal
                                 decision        result

Per-policy resume rules:
- :pure and :idempotent retry safely from inputs. Resume will replay the journaled result; missing results are recomputed.
- :dedupe prefers a recorded journal result. Use it for operations that are expensive but safe to repeat.
- :reconcile allows the application to inspect incomplete work after resume. The runtime returns the intent and lets a reconciliation worker decide.
- :unsafe_once forbids automatic retry. Resume returns a typed error when an intent is recorded without a result.
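The rules above, combined with the three journal states in the diagram, can be modelled as a small decision function. This is a minimal sketch, not Jidoka's implementation: the module ResumeRulesSketch, the function decide/2, and the state atoms are hypothetical names. The :dedupe branch for an intent without a result is an assumption (with nothing journaled to prefer, the sketch falls back to running the capability), and operation controls are ignored entirely.

```elixir
defmodule ResumeRulesSketch do
  @moduledoc "Hypothetical model of the per-policy resume rules. Not part of Jidoka."

  @type policy :: :pure | :idempotent | :dedupe | :reconcile | :unsafe_once
  @type journal_state :: :no_intent | :intent_only | :intent_and_result

  @spec decide(policy(), journal_state()) ::
          :run | :replay | :reconcile | {:error, :unsafe_once_incomplete_effect}
  # A recorded result is replayed regardless of policy.
  def decide(_policy, :intent_and_result), do: :replay

  # Nothing was journaled, so this is a first execution.
  def decide(_policy, :no_intent), do: :run

  # Intent without a result: the policy decides.
  def decide(policy, :intent_only) when policy in [:pure, :idempotent], do: :run

  # Assumption: with no journaled result to prefer, :dedupe runs again.
  def decide(:dedupe, :intent_only), do: :run

  # The application inspects the incomplete intent out of band.
  def decide(:reconcile, :intent_only), do: :reconcile

  # Never retried automatically.
  def decide(:unsafe_once, :intent_only),
    do: {:error, :unsafe_once_incomplete_effect}
end
```

Reading the clauses top to bottom shows that the policy only matters in the middle column of the diagram, where an intent exists without a result.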
How To
Step 1: Pick A Policy For Each Operation
Use the smallest policy that is still correct.
- :pure - the operation is a deterministic function of its arguments with no observable side effects. Lookups, transformations, schema validations.
- :idempotent (default) - calling twice with the same key has the same external outcome. Most external APIs that accept idempotency keys.
- :dedupe - calling twice may be expensive or noisy, but is otherwise safe. Prefer the journaled result.
- :reconcile - external work can leave the system in an in-between state (an enqueued job whose status is unknown). The application owns reconciliation.
- :unsafe_once - calling twice is unsafe. Charges, sends, deletes, one-way deploys.
tools do
  action MyApp.LookupOrder, idempotency: :pure
  action MyApp.ChargeCard, idempotency: :unsafe_once
  action MyApp.EnqueueJob, idempotency: :reconcile
end

Step 2: Add Controls For :unsafe_once
Jidoka.Agent.Spec.Operation.requires_control?/1 returns true for
:unsafe_once. The plan compiler refuses to produce a plan without a
matching operation control.
controls do
  operation MyApp.RequireChargeApproval,
    when: [name: :charge_card, idempotency: :unsafe_once]
end

The control can allow, block, or interrupt for human review. See Human In The Loop for the durable approval flow.
Step 3: Understand The Journal On Resume
Every effect is recorded as an intent before the capability runs and as a result when the capability returns. On resume:
- If the journal already has a result for the pending intent, the effect interpreter replays it and never calls the capability.
- If only the intent is recorded, the per-policy validation runs.
- If no intent is recorded, the effect never started, so the runtime runs the capability to produce the result.
%Jidoka.Effect.Journal{
  intents: %{"operation:abc" => intent},
  results: %{"operation:abc" => result}
}

Jidoka.Effect.Journal.result_for/2 and Jidoka.Effect.Journal.intent_for/2
are the lookup helpers. Jidoka.Effect.Journal.incomplete_intent?/2 is
true when an intent has no recorded result.
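The three resume cases can be told apart with two map lookups. The following is a minimal sketch over a plain map that mirrors the intents/results shape shown above. JournalStateSketch and its functions are hypothetical stand-ins, not the real Jidoka.Effect.Journal helpers, whose argument shapes are not shown in this guide.

```elixir
defmodule JournalStateSketch do
  @moduledoc "Hypothetical plain-map model of the journal; not Jidoka.Effect.Journal."

  # Classifies one effect id into the three states used on resume.
  # Assumes a result is only ever written after its intent.
  def classify(%{intents: intents, results: results}, id) do
    cond do
      Map.has_key?(results, id) -> :intent_and_result
      Map.has_key?(intents, id) -> :intent_only
      true -> :no_intent
    end
  end

  # Lists the ids of intents that have no recorded result, sorted for stable output.
  def incomplete_ids(%{intents: intents, results: results}) do
    intents
    |> Map.keys()
    |> Enum.reject(&Map.has_key?(results, &1))
    |> Enum.sort()
  end
end

journal = %{
  intents: %{
    "operation:abc" => %{idempotency: :reconcile},
    "operation:def" => %{idempotency: :pure}
  },
  results: %{"operation:def" => {:ok, 42}}
}

:intent_only = JournalStateSketch.classify(journal, "operation:abc")
:intent_and_result = JournalStateSketch.classify(journal, "operation:def")
:no_intent = JournalStateSketch.classify(journal, "operation:xyz")
["operation:abc"] = JournalStateSketch.incomplete_ids(journal)
```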
Step 4: Handle Reconciliation Paths
For :reconcile operations, the application is expected to observe
incomplete intents and resolve them out of band. A common pattern is to
enumerate snapshots whose journal has incomplete intents and route them
to a reconciler.
def reconcile_pending(snapshot) do
  journal = snapshot.turn_state.journal

  for {_id, intent} <- journal.intents,
      is_nil(Jidoka.Effect.Journal.result_for(journal, intent)),
      intent.idempotency == :reconcile do
    MyApp.Reconciler.queue(intent)
  end
end

After reconciliation completes externally, persist the result into the journal (or rebuild the snapshot through your own session pipeline) and resume.
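Persisting a reconciled result amounts to filling the empty slot for an intent that is already recorded. The sketch below shows that write path on a plain-map journal. ReconcileSketch and record_result/3 are hypothetical; the real journal exposes put_result/2, but its argument shape is not shown in this guide, so it is not used here. Refusing unknown ids and existing results is a design assumption of the sketch, chosen so a reconciler cannot overwrite history.

```elixir
defmodule ReconcileSketch do
  @moduledoc "Hypothetical write path over a plain-map journal; not Jidoka.Effect.Journal."

  # Records an externally reconciled result for an intent that is still incomplete.
  def record_result(%{intents: intents, results: results} = journal, id, result) do
    cond do
      not Map.has_key?(intents, id) ->
        {:error, {:unknown_intent, id}}

      Map.has_key?(results, id) ->
        {:error, {:already_resolved, id}}

      true ->
        {:ok, %{journal | results: Map.put(results, id, result)}}
    end
  end
end

journal = %{
  intents: %{"operation:abc" => %{idempotency: :reconcile}},
  results: %{}
}

{:ok, resolved} = ReconcileSketch.record_result(journal, "operation:abc", {:ok, :job_finished})

# A second write for the same intent is rejected rather than overwritten.
{:error, {:already_resolved, "operation:abc"}} =
  ReconcileSketch.record_result(resolved, "operation:abc", {:ok, :other})
```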
Step 5: Trust The :unsafe_once Guard On Resume
The runtime never quietly retries an :unsafe_once operation that has
an incomplete intent. Resume returns a typed error instead:
# Logger macros must be required before use.
require Logger

case Jidoka.resume(snapshot, llm: llm, operations: operations) do
  {:error, %Jidoka.Error{} = error} ->
    case Jidoka.error_to_map(error) do
      %{reason: :unsafe_once_incomplete_effect, intent_id: id} ->
        MyApp.UnsafeOnceQueue.route(id, snapshot)

      _ ->
        Logger.error("resume failed: " <> inspect(error))
    end

  other ->
    other
end

Approved interrupts are the supported way to re-enter an :unsafe_once
intent: the operation control approves the specific interrupt, and the
runtime stamps metadata["approved_interrupt_id"] on the effect. The
journal then accepts the call exactly once.
Step 6: Distinguish :dedupe From :idempotent
Both policies are safe to retry. The difference is intent:
- :idempotent says "retry is correct; do not avoid it." Use it for HTTP calls with idempotency keys, database upserts, and most external APIs that already promise safe retries.
- :dedupe says "retry is correct but wasteful; prefer the journaled result." Use it for cache-fronted lookups or any operation that incurs a cost (LLM call, paid API, expensive aggregation) you do not want to repeat.
In practice this guides resume behavior: :dedupe on resume always
prefers the journaled result; :idempotent is happy to call the
capability again when the intent has no result.
Common Patterns
- Default to :idempotent. It is the spec default for a reason.
- Promote to :unsafe_once for external state changes. Charges, sends, deletes, and deploys deserve the strongest guard.
- Use :reconcile when truth lives elsewhere. Async work queued to an external system is the canonical case.
- Pair :unsafe_once with an approval workflow. A blocking control is fine for "never allow"; an interrupting control is needed for "allow once a reviewer says so."
- Treat the journal as the contract. Tests should make assertions against Effect.Journal.intent_recorded?/2 and Effect.Journal.result_for/2, not on capability call counts alone.
Testing
A simple test exercises both the compile-time gate and the resume-time guard.
test "unsafe_once requires an operation control before plan compiles" do
  spec =
    Jidoka.agent!(
      id: "risky",
      instructions: "Charge only when explicit.",
      operations: [
        %{name: "charge_card", idempotency: :unsafe_once, kind: :action}
      ]
    )

  assert {:error, {:unsafe_once_requires_control, "charge_card", :action}} =
           Jidoka.plan(spec)
end

test "incomplete unsafe_once intent fails on resume" do
  llm = fn _intent, _journal ->
    {:ok, %{type: :operation, name: "charge_card",
            arguments: %{"order_id" => "A1"}}}
  end

  operations = fn _intent, _journal -> raise "boom" end

  assert {:error, _error} =
           Jidoka.turn(MyApp.SupportAgent, "Charge A1",
             llm: llm,
             operations: operations
           )

  # The application persisted the snapshot for a reconciler before
  # surfacing the failure; resuming it never retries the operation.
end

For :dedupe and :reconcile operations, build a journal that already
has the desired shape and assert that resume routes correctly.
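One way to structure such a test is to pre-fill the journal and pass a capability that raises if it is ever called, so the assertion is about the replay path and not about call counts. The sketch below illustrates the technique with a tiny stand-in interpreter over a plain-map journal. ReplaySketch and run_effect/3 are hypothetical and do not reflect the signature of Jidoka.Runtime.EffectInterpreter.

```elixir
defmodule ReplaySketch do
  @moduledoc "Hypothetical stand-in interpreter used only to illustrate the test shape."

  # Replays a journaled result when one exists; otherwise runs the capability.
  def run_effect(%{results: results}, id, capability) when is_function(capability, 0) do
    case Map.fetch(results, id) do
      {:ok, recorded} -> {:replayed, recorded}
      :error -> {:executed, capability.()}
    end
  end
end

# Journal fixture that already holds the result for the :dedupe operation.
journal = %{
  intents: %{"operation:abc" => %{idempotency: :dedupe}},
  results: %{"operation:abc" => {:ok, "cached"}}
}

# If the replay path is wrong, this capability raises and the test fails loudly.
must_not_run = fn -> raise "capability must not run when a result is journaled" end

{:replayed, {:ok, "cached"}} = ReplaySketch.run_effect(journal, "operation:abc", must_not_run)
```

The same fixture with an empty results map exercises the opposite branch, where the capability is expected to run.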
Troubleshooting
| Symptom | Likely Cause | Fix |
|---|---|---|
| {:error, {:unsafe_once_requires_control, name, kind}} | An :unsafe_once operation has no matching operation control. | Add a controls do ... operation ... when: [name: name] end clause. |
| {:error, %Jidoka.Error{reason: :unsafe_once_incomplete_effect}} | Resume saw a recorded intent without a result. | Route the snapshot to reconciliation; do not auto-retry. |
| Capability called twice on resume | Operation is :idempotent and the journal lost the result. | Persist the full snapshot including its journal; ensure your store preserves all fields. |
| Reconciliation never fires | The journal had no incomplete intents because the runtime did call the capability. | Confirm the policy is :reconcile, not :idempotent, and inspect result.journal. |
| Approved interrupt still errors on :unsafe_once | Approval target was the wrong interrupt id. | Build the response with Jidoka.Review.Response.approve(review.interrupt_id). |
| :dedupe operation still runs every time | The journal across resumes is empty because each call started a new turn. | Use a session so the journal persists, or pass the prior snapshot to resume/2. |
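For the "journal lost the result" row, a round-trip check against your own store's encode and decode functions can catch dropped fields before they cause a double call. This is a minimal sketch under stated assumptions: StoreRoundTripSketch is hypothetical, snapshots are modelled as plain maps with the turn_state.journal path used earlier in this guide, and Erlang's term_to_binary/binary_to_term stand in for a real store.

```elixir
defmodule StoreRoundTripSketch do
  @moduledoc "Hypothetical store check; snapshots are modelled as plain maps."

  # Returns true when the journal survives an encode/decode round trip unchanged.
  def journal_preserved?(snapshot, encode, decode)
      when is_function(encode, 1) and is_function(decode, 1) do
    restored = snapshot |> encode.() |> decode.()
    journal(restored) == journal(snapshot)
  end

  defp journal(snapshot), do: get_in(snapshot, [:turn_state, :journal])
end

snapshot = %{
  turn_state: %{
    journal: %{
      intents: %{"operation:abc" => %{idempotency: :idempotent}},
      results: %{"operation:abc" => {:ok, "done"}}
    }
  }
}

# A faithful store: the external term format keeps every field.
true =
  StoreRoundTripSketch.journal_preserved?(
    snapshot,
    &:erlang.term_to_binary/1,
    &:erlang.binary_to_term/1
  )

# A lossy store that drops results on load is caught by the same check.
lossy_decode = fn binary ->
  binary
  |> :erlang.binary_to_term()
  |> put_in([:turn_state, :journal, :results], %{})
end

false =
  StoreRoundTripSketch.journal_preserved?(snapshot, &:erlang.term_to_binary/1, lossy_decode)
```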
Reference
Key modules touched in this guide:
- Jidoka.Agent.Spec.Operation - valid_idempotencies/0, requires_control?/1, replay_safe?/1, kind/1.
- Jidoka.Agent.Spec - validate_operation_policies/1, validate_operation_policy/2.
- Jidoka.Effect.Intent - struct that carries the policy and the deterministic idempotency_key.
- Jidoka.Effect.Journal - put_intent/2, put_result/2, result_for/2, intent_recorded?/2, incomplete_intent?/2.
- Jidoka.Runtime.EffectInterpreter - effect shell that enforces the per-policy resume rules.
- Jidoka.Review.Response - the approval path that lets :unsafe_once operations execute exactly once.
Related Guides
- Controls - the operation control surface required by :unsafe_once.
- Human In The Loop - durable approvals for risky operations.
- Snapshots And Resume - the durable artifact the journal lives inside.
- Sessions And Stores - the durable session that preserves the journal between turns.