Retrieval And Quota

Copy Markdown View Source

You need memory-aware responses and budget controls before production traffic scales.

After this guide, you can enable retrieval and quota plugins, run retrieval/quota actions directly, and understand request-rewrite behavior.

⚠️ ReAct agents and retrieval enrichment

You enabled retrieval but your ReAct agent ignores memory? That's expected. Jido.AI.Agent.ask/ask_sync emits ai.react.query, and retrieval auto-enrichment only runs on chat.message and reasoning.*.run signals.

For ReAct query enrichment, recall memory first and prepend it to the prompt explicitly. See First Agent for the ReAct request flow.

1. Enable Retrieval And Quota In An Agent

defmodule MyApp.Actions.Multiply do
  use Jido.Action,
    name: "multiply",
    schema: Zoi.object(%{a: Zoi.integer(), b: Zoi.integer()})

  @impl true
  def run(%{a: a, b: b}, _context), do: {:ok, %{product: a * b}}
end

defmodule MyApp.SupportAgent do
  use Jido.AI.Agent,
    name: "support_agent",
    model: :fast,
    tools: [MyApp.Actions.Multiply],
    retrieval: %{
      enabled: true,
      namespace: "support_memory",
      top_k: 3,
      max_snippet_chars: 280
    },
    quota: %{
      enabled: true,
      scope: "support_ops",
      window_ms: 60_000,
      max_requests: 100,
      max_total_tokens: 40_000,
      error_message: "quota exceeded for current window"
    }
end

The retrieval: and quota: options are opt-in plugin config shortcuts on Jido.AI.Agent.

2. Manage Memory With Retrieval Actions

alias Jido.AI.Actions.Retrieval.{UpsertMemory, RecallMemory, ClearMemory}

context = %{plugin_state: %{retrieval: %{namespace: "support_memory"}}}

{:ok, %{retrieval: %{last_upsert: _entry}}} =
  Jido.Exec.run(UpsertMemory, %{text: "Customer prefers email updates", metadata: %{customer_id: "c-123"}}, context)

{:ok, %{retrieval: %{memories: memories}}} =
  Jido.Exec.run(RecallMemory, %{query: "How should we contact customer c-123?", top_k: 3}, context)

# Use ClearMemory when you need to drop namespace memory:
{:ok, %{retrieval: %{cleared: _count}}} = Jido.Exec.run(ClearMemory, %{}, context)

namespace resolution for retrieval actions:

  1. explicit action param
  2. context[:plugin_state][:retrieval][:namespace]
  3. context[:state][:retrieval][:namespace]
  4. context[:agent][:id]
  5. "default"

3. Understand Retrieval Auto-Enrichment Scope

Retrieval plugin enrichment currently applies to:

  • chat.message
  • reasoning.*.run

Per-request opt-out:

signal =
  Jido.Signal.new!(
    "chat.message",
    %{prompt: "Draft a response for customer c-123", disable_retrieval: true},
    source: "/docs"
  )

See the ReAct + retrieval gotcha at the top of this guide for details on why ai.react.query is not enriched.

4. Track And Reset Quota

alias Jido.AI.Actions.Quota.{GetStatus, Reset}

context = %{plugin_state: %{quota: %{scope: "support_ops", window_ms: 60_000, max_total_tokens: 40_000}}}

{:ok, %{quota: status}} = Jido.Exec.run(GetStatus, %{}, context)
# status includes: usage, limits, remaining, over_budget?

{:ok, %{quota: %{scope: "support_ops", reset: true}}} = Jido.Exec.run(Reset, %{}, context)

scope resolution for quota actions:

  1. explicit action param
  2. context[:plugin_state][:quota][:scope]
  3. context[:state][:quota][:scope]
  4. context[:agent][:id]
  5. "default"

5. Quota Rewrite Behavior

When over budget, request/query signals in these families are rewritten to ai.request.error:

  • chat.*
  • ai.*.query
  • reasoning.*.run

Rewrite payload fields:

  • request_id
  • reason: :quota_exceeded
  • message from quota config

ai.usage drives counters; token accounting uses total_tokens first, then input_tokens + output_tokens.

Failure Mode: Retrieval Returns No Memories

Symptom:

  • RecallMemory returns memories: []
  • no retrieval snippets appear in enriched prompts

Fix:

  • verify namespace alignment (upsert and recall must target same namespace)
  • increase top_k
  • ensure query text overlaps stored memory text

Failure Mode: Requests Rejected As Over Budget

Symptom:

  • request resolves to quota-related ai.request.error

Fix:

Defaults You Should Know

  • Retrieval defaults: enabled: true, top_k: 3, max_snippet_chars: 280
  • Quota defaults: enabled: true, window_ms: 60_000, max_requests: nil, max_total_tokens: nil
  • Quota error default message in plugin runtime: "quota exceeded for current window"

When To Use / Not Use

Use this path when:

  • responses should use short-term memory context
  • you must enforce request/token budgets

Do not use this path when:

  • workload is stateless and unconstrained (for example local prototyping only)

Next