You need memory-aware responses and budget controls before production traffic scales.
After this guide, you can enable retrieval and quota plugins, run retrieval/quota actions directly, and understand request-rewrite behavior.
⚠️ ReAct agents and retrieval enrichment
You enabled retrieval but your ReAct agent ignores memory? That's expected.
Jido.AI.Agent.ask/ask_syncemitsai.react.query, and retrieval auto-enrichment only runs onchat.messageandreasoning.*.runsignals.For ReAct query enrichment, recall memory first and prepend it to the prompt explicitly. See First Agent for the ReAct request flow.
1. Enable Retrieval And Quota In An Agent
defmodule MyApp.Actions.Multiply do
use Jido.Action,
name: "multiply",
schema: Zoi.object(%{a: Zoi.integer(), b: Zoi.integer()})
@impl true
def run(%{a: a, b: b}, _context), do: {:ok, %{product: a * b}}
end
defmodule MyApp.SupportAgent do
use Jido.AI.Agent,
name: "support_agent",
model: :fast,
tools: [MyApp.Actions.Multiply],
retrieval: %{
enabled: true,
namespace: "support_memory",
top_k: 3,
max_snippet_chars: 280
},
quota: %{
enabled: true,
scope: "support_ops",
window_ms: 60_000,
max_requests: 100,
max_total_tokens: 40_000,
error_message: "quota exceeded for current window"
}
endThe retrieval: and quota: options are opt-in plugin config shortcuts on Jido.AI.Agent.
2. Manage Memory With Retrieval Actions
alias Jido.AI.Actions.Retrieval.{UpsertMemory, RecallMemory, ClearMemory}
context = %{plugin_state: %{retrieval: %{namespace: "support_memory"}}}
{:ok, %{retrieval: %{last_upsert: _entry}}} =
Jido.Exec.run(UpsertMemory, %{text: "Customer prefers email updates", metadata: %{customer_id: "c-123"}}, context)
{:ok, %{retrieval: %{memories: memories}}} =
Jido.Exec.run(RecallMemory, %{query: "How should we contact customer c-123?", top_k: 3}, context)
# Use ClearMemory when you need to drop namespace memory:
{:ok, %{retrieval: %{cleared: _count}}} = Jido.Exec.run(ClearMemory, %{}, context)namespace resolution for retrieval actions:
- explicit action param
context[:plugin_state][:retrieval][:namespace]context[:state][:retrieval][:namespace]context[:agent][:id]"default"
3. Understand Retrieval Auto-Enrichment Scope
Retrieval plugin enrichment currently applies to:
chat.messagereasoning.*.run
Per-request opt-out:
signal =
Jido.Signal.new!(
"chat.message",
%{prompt: "Draft a response for customer c-123", disable_retrieval: true},
source: "/docs"
)See the ReAct + retrieval gotcha at the top of this guide for details on why ai.react.query is not enriched.
4. Track And Reset Quota
alias Jido.AI.Actions.Quota.{GetStatus, Reset}
context = %{plugin_state: %{quota: %{scope: "support_ops", window_ms: 60_000, max_total_tokens: 40_000}}}
{:ok, %{quota: status}} = Jido.Exec.run(GetStatus, %{}, context)
# status includes: usage, limits, remaining, over_budget?
{:ok, %{quota: %{scope: "support_ops", reset: true}}} = Jido.Exec.run(Reset, %{}, context)scope resolution for quota actions:
- explicit action param
context[:plugin_state][:quota][:scope]context[:state][:quota][:scope]context[:agent][:id]"default"
5. Quota Rewrite Behavior
When over budget, request/query signals in these families are rewritten to ai.request.error:
chat.*ai.*.queryreasoning.*.run
Rewrite payload fields:
request_idreason: :quota_exceededmessagefrom quota config
ai.usage drives counters; token accounting uses total_tokens first, then input_tokens + output_tokens.
Failure Mode: Retrieval Returns No Memories
Symptom:
RecallMemoryreturnsmemories: []- no retrieval snippets appear in enriched prompts
Fix:
- verify namespace alignment (
upsertandrecallmust target same namespace) - increase
top_k - ensure query text overlaps stored memory text
Failure Mode: Requests Rejected As Over Budget
Symptom:
- request resolves to quota-related
ai.request.error
Fix:
- inspect quota status via
Jido.AI.Actions.Quota.GetStatus - increase
max_requests/max_total_tokensor shorten response budgets - reset counters with
Jido.AI.Actions.Quota.Resetduring testing
Defaults You Should Know
- Retrieval defaults:
enabled: true,top_k: 3,max_snippet_chars: 280 - Quota defaults:
enabled: true,window_ms: 60_000,max_requests: nil,max_total_tokens: nil - Quota error default message in plugin runtime:
"quota exceeded for current window"
When To Use / Not Use
Use this path when:
- responses should use short-term memory context
- you must enforce request/token budgets
Do not use this path when:
- workload is stateless and unconstrained (for example local prototyping only)