Nous.Teams.RateLimiter (nous v0.15.8)

Token-bucket rate limiter for per-team and per-agent usage control.

Tracks token usage and enforces budget limits and rate limits (requests per minute, tokens per minute) for a team's agents.

Architecture

Each team optionally gets its own RateLimiter GenServer. Before making an LLM request, an agent calls acquire/3 to atomically reserve an estimated token count plus one request slot. After the call completes, the agent calls record_usage/3 with the reservation ref to reconcile actual against estimated usage. If the call errored before completing, the agent calls release/2 instead to refund the reservation.

Pre-deduction is what makes the limiter race-safe under concurrent acquires (M-9): without it, two callers could both see "budget remaining" before either's usage was recorded.
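
The race can be illustrated with a toy model. This is a minimal sketch and not the Nous implementation: ToyBudget and its functions are hypothetical names, and it tracks only a token budget held in an Agent. Because the check and the deduction run inside one Agent.get_and_update/2 callback, concurrent callers are serialized and cannot both claim the same remaining budget.

```elixir
defmodule ToyBudget do
  # Hypothetical toy model of pre-deduction; not the Nous.Teams.RateLimiter internals.

  # Start an Agent holding the remaining token budget.
  def start(remaining), do: Agent.start_link(fn -> remaining end)

  # Check and deduct in one step. The Agent runs one callback at a time,
  # so no other caller can observe the budget between check and deduction.
  def reserve(agent, tokens) do
    Agent.get_and_update(agent, fn remaining ->
      if tokens <= remaining do
        {:ok, remaining - tokens}
      else
        {{:error, :budget_exceeded}, remaining}
      end
    end)
  end

  def remaining(agent), do: Agent.get(agent, & &1)
end

{:ok, budget} = ToyBudget.start(1_000)

# Ten concurrent callers each try to reserve 300 tokens.
results =
  1..10
  |> Enum.map(fn _ -> Task.async(fn -> ToyBudget.reserve(budget, 300) end) end)
  |> Enum.map(&Task.await/1)

# Only as many reservations succeed as the budget can cover.
granted = Enum.count(results, &(&1 == :ok))
```

With a 1,000-token budget, three of the ten callers are granted a reservation and the remaining seven receive {:error, :budget_exceeded}, regardless of scheduling order.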

Quick Start

alias Nous.Teams.RateLimiter

{:ok, pid} = RateLimiter.start_link(
  team_id: "team_1",
  budget: 10.0,
  per_agent_budget: 5.0,
  rpm: 60,
  tpm: 100_000
)

# Reserve, run, reconcile:
{:ok, ref} = RateLimiter.acquire(pid, "alice", 1000)

case do_llm_call(...) do
  {:ok, response} ->
    actual_tokens = response.usage.total_tokens
    actual_cost = response.usage.cost
    RateLimiter.record_usage(pid, "alice", %{
      tokens: actual_tokens, cost: actual_cost, reservation: ref
    })

  {:error, _} ->
    RateLimiter.release(pid, ref)
end
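
The reserve, run, reconcile pattern above can be wrapped in a helper so that a reservation is always either reconciled or released. The sketch below is one possible shape, not part of Nous: ReservedCall, its run/5 function, and the {:ok, result, usage} return shape expected from the callback are all assumptions. The limiter module is passed as an argument (defaulting to Nous.Teams.RateLimiter) so the helper can be exercised against a stub.

```elixir
defmodule ReservedCall do
  # Hypothetical helper, not part of Nous. `limiter` is any module that
  # provides acquire/3, record_usage/3 and release/2.

  # `fun` must return {:ok, result, %{tokens: t, cost: c}} or {:error, reason}.
  def run(pid, agent, estimate, fun, limiter \\ Nous.Teams.RateLimiter) do
    # If acquire/3 fails, `with` returns its error tuple unchanged.
    with {:ok, ref} <- limiter.acquire(pid, agent, estimate) do
      try do
        fun.()
      rescue
        exception ->
          # The call raised before producing usage: refund, then re-raise.
          limiter.release(pid, ref)
          reraise exception, __STACKTRACE__
      else
        {:ok, result, %{tokens: tokens, cost: cost}} ->
          # Reconcile the estimate against what was actually used.
          limiter.record_usage(pid, agent, %{tokens: tokens, cost: cost, reservation: ref})
          {:ok, result}

        {:error, _reason} = error ->
          # No usage to record: refund the reservation.
          limiter.release(pid, ref)
          error
      end
    end
  end
end
```

A caller would then write something like ReservedCall.run(pid, "alice", 1000, fn -> ... end), where the anonymous function performs the LLM call and maps its response into one of the two expected shapes.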

Backward compatibility

record_usage/3 called without a :reservation key still works as post-hoc accounting (legacy semantics). This keeps direct calls such as RateLimiter.record_usage(pid, "alice", %{tokens: 500}) valid for callers that don't go through acquire/3.

Reservations that are never reconciled or released are pruned after :reservation_ttl_ms (default 5 minutes) with a Logger.warning/1, so a missing release/2 doesn't leak budget forever.
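
To make the pruning behaviour concrete, here is a minimal sketch of a TTL sweep over open reservations. It is an illustration under stated assumptions, not the limiter's actual code: the ReservationSweep module, the map layout, and the field names :agent, :tokens and :created_at are all hypothetical, as is the choice to treat a reservation as expired once its age reaches the TTL.

```elixir
defmodule ReservationSweep do
  # Hypothetical sketch; the state layout is an assumption, not Nous internals.
  require Logger

  # Splits a map of reservations into {live, expired} for the given time
  # (in milliseconds) and logs one warning per expired reservation.
  def prune(reservations, now_ms, ttl_ms) do
    {expired, live} =
      Enum.split_with(reservations, fn {_ref, r} -> now_ms - r.created_at >= ttl_ms end)

    for {_ref, r} <- expired do
      Logger.warning("reservation for #{r.agent} expired; refunding #{r.tokens} tokens")
    end

    {Map.new(live), Map.new(expired)}
  end
end

old_ref = make_ref()
new_ref = make_ref()

reservations = %{
  old_ref => %{agent: "alice", tokens: 1_000, created_at: 0},
  new_ref => %{agent: "bob", tokens: 500, created_at: 250_000}
}

# At t = 300_000 ms with the default TTL, only the first reservation has expired.
{live, expired} = ReservationSweep.prune(reservations, 300_000, 300_000)
```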

Configuration

  • :budget — team-wide budget in USD (default: :infinity)
  • :per_agent_budget — per-agent budget in USD (default: :infinity)
  • :rpm — requests per minute limit (default: :infinity)
  • :tpm — tokens per minute limit (default: :infinity)
  • :reservation_ttl_ms — reservation expiry (default: 300_000 = 5 min)

Types

reservation_ref()

@type reservation_ref() :: reference()

status()

@type status() :: %{
  budget_remaining: float() | :infinity,
  agents: %{
    required(String.t()) => %{
      cost: float(),
      tokens: non_neg_integer(),
      requests: non_neg_integer()
    }
  },
  open_reservations: non_neg_integer()
}

Functions

acquire(pid, agent_name, tokens \\ 1)

@spec acquire(pid(), String.t(), non_neg_integer()) ::
  {:ok, reservation_ref()}
  | {:error, :budget_exceeded}
  | {:error, :rate_limited}

Atomically reserve tokens and 1 request for agent_name.

Returns {:ok, reservation_ref} if within budget and rate limits. The ref must be passed back to either record_usage/3 (with :reservation) or release/2 so the reservation isn't held forever.

Examples

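# Within budget and rate limits:
{:ok, ref} = RateLimiter.acquire(pid, "alice", 1000)

# Team or per-agent budget would be exceeded:
{:error, :budget_exceeded} = RateLimiter.acquire(pid, "alice", 1000)

# Requests-per-minute or tokens-per-minute limit reached:
{:error, :rate_limited} = RateLimiter.acquire(pid, "alice", 1000)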
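
The two error returns call for different handling: a rate limit may clear as the per-minute window moves on, while the documentation does not describe the budget as replenishing. The sketch below retries only on :rate_limited, doubling the delay each time. AcquireRetry and attempt/3 are hypothetical names, and the backoff policy is an assumption; the acquire call is passed in as a zero-arity function, for example fn -> RateLimiter.acquire(pid, "alice", 1000) end.

```elixir
defmodule AcquireRetry do
  # Hypothetical helper, not part of Nous. `acquire_fun` wraps the real call.

  def attempt(acquire_fun, retries, delay_ms) do
    case acquire_fun.() do
      {:ok, ref} ->
        {:ok, ref}

      {:error, :rate_limited} when retries > 0 ->
        # The rate window may free up: wait, then retry with a doubled delay.
        Process.sleep(delay_ms)
        attempt(acquire_fun, retries - 1, delay_ms * 2)

      {:error, reason} ->
        # :budget_exceeded, or :rate_limited with no retries left.
        {:error, reason}
    end
  end
end
```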

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.
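
As a sketch of how this might be wired into an application's supervision tree: the fragment below assumes the generated child spec passes the keyword list straight to start_link/1 and that :name accepts a locally registered atom. MyApp.TeamOneLimiter is a hypothetical name.

```elixir
# Assumed application-side wiring; MyApp.TeamOneLimiter is a hypothetical name.
children = [
  {Nous.Teams.RateLimiter,
   team_id: "team_1",
   budget: 10.0,
   rpm: 60,
   tpm: 100_000,
   name: MyApp.TeamOneLimiter}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# The function specs take a pid(), so resolve the registered name first.
pid = Process.whereis(MyApp.TeamOneLimiter)
```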

get_status(pid)

@spec get_status(pid()) :: status()

Get the current status of the rate limiter.

Returns a map with :budget_remaining, :agents usage breakdown, and :open_reservations (held but not yet reconciled or released).
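
Because the status is a plain map, it can be summarized with ordinary Enum functions. The helper below is a hypothetical example (StatusReport is not part of Nous) that works on any map shaped like status(); the sample data is made up for illustration.

```elixir
defmodule StatusReport do
  # Hypothetical helper operating on a map shaped like status().

  # Sums tokens, requests and cost across all agents.
  def totals(%{agents: agents}) do
    Enum.reduce(agents, %{tokens: 0, requests: 0, cost: 0.0}, fn {_name, usage}, acc ->
      %{
        tokens: acc.tokens + usage.tokens,
        requests: acc.requests + usage.requests,
        cost: acc.cost + usage.cost
      }
    end)
  end

  # Returns the name of the agent with the highest cost, or nil if there are none.
  def top_spender(%{agents: agents}) when map_size(agents) == 0, do: nil

  def top_spender(%{agents: agents}) do
    {name, _usage} = Enum.max_by(agents, fn {_name, usage} -> usage.cost end)
    name
  end
end

# Made-up sample in the shape returned by get_status/1.
status = %{
  budget_remaining: 9.5,
  agents: %{
    "alice" => %{cost: 0.3, tokens: 1_200, requests: 2},
    "bob" => %{cost: 0.2, tokens: 800, requests: 1}
  },
  open_reservations: 0
}

totals = StatusReport.totals(status)
top = StatusReport.top_spender(status)
```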

record_usage(pid, agent_name, usage_map)

@spec record_usage(pid(), String.t(), map()) :: :ok

Record actual usage after an LLM call completes.

Two modes:

  • With :reservation key — reconciles actual vs estimated. The reservation is consumed (dropped from open reservations) and the delta (actual - estimate) is applied to totals/agent/window.

  • Without :reservation key (legacy) — adds the actual usage as a fresh entry, with no reconciliation. Use this only when you didn't go through acquire/3.

Examples

# Reservation-based (race-safe)
{:ok, ref} = RateLimiter.acquire(pid, "alice", 1000)
RateLimiter.record_usage(pid, "alice",
  %{tokens: 850, cost: 0.012, reservation: ref})

# Post-hoc (legacy, not race-safe)
RateLimiter.record_usage(pid, "alice", %{tokens: 500, cost: 0.01})
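
The reconciliation arithmetic for the reservation-based example can be worked through by hand. The snippet below only illustrates the delta (actual - estimate) described above; Reconcile is a hypothetical module, and the single running total stands in for whatever counters the limiter actually keeps.

```elixir
defmodule Reconcile do
  # Hypothetical illustration of the delta arithmetic only.
  def delta(actual, estimated), do: actual - estimated

  # Adjusts a running total that already includes the estimate.
  def apply_delta(used, actual, estimated), do: used + delta(actual, estimated)
end

# acquire/3 reserved 1000 tokens, so the running total already counts them.
used_after_acquire = 1_000

# The call actually consumed 850 tokens, so the delta is 850 - 1000 = -150.
used_after_record = Reconcile.apply_delta(used_after_acquire, 850, 1_000)
```

The total ends at 850, the actual usage: the 150 over-reserved tokens are handed back. An underestimate works the same way with a positive delta.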

release(pid, ref)

@spec release(pid(), reservation_ref()) :: :ok

Cancel a reservation. Refunds the reserved tokens and the request slot.

Use this when an LLM call errored before completing and you don't have actual usage to record.

start_link(opts)

@spec start_link(keyword()) :: GenServer.on_start()

Start a RateLimiter for a team.

Options

  • :team_id (required) — unique identifier for the team
  • :budget — total team budget in USD (default: :infinity)
  • :per_agent_budget — per-agent budget in USD (default: :infinity)
  • :rpm — requests per minute (default: :infinity)
  • :tpm — tokens per minute (default: :infinity)
  • :reservation_ttl_ms — reservation expiry (default: 300_000)
  • :name — optional GenServer name