Nous.Teams.RateLimiter (nous v0.16.4)
Token-bucket rate limiter for per-team and per-agent usage control.
Tracks token usage and enforces budget limits and rate limits (requests per minute, tokens per minute) for a team's agents.
Architecture
Each team optionally gets its own RateLimiter GenServer. The agent runner
calls acquire/3 BEFORE making an LLM request (when a limiter is wired into
the agent's deps as :rate_limiter_pid) to atomically reserve an estimated
token count + 1 request slot. After the call completes, it calls
record_usage/3 with the reservation ref to reconcile actual vs estimated;
if the call errored before completing, it calls release/2 to refund.
Pre-deduction makes the token (tpm) and request (rpm) limits race-safe
under concurrent acquires (M-9). Note: the cost budget is reconciled
post-hoc — acquire/3 reserves 0 cost (the runtime has no per-token cost
model), so N concurrent in-flight calls can overshoot the dollar budget by
the cost of those calls. tpm/rpm are the hard concurrency guards.
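The pre-deduction idea can be illustrated with a small pure-Elixir model. This is a minimal sketch, not the module's actual implementation: `ReservationSketch`, its map-based state, and the single tokens-per-minute window (no rpm, no budget, no window rollover) are all assumptions made for illustration. The point it shows is that the limit check and the deduction happen in one state transition, so a second caller sees the first caller's reservation already counted.

```elixir
defmodule ReservationSketch do
  @moduledoc false
  # Hypothetical, simplified model: one tokens-per-minute window only.

  # State is a plain map: the limit, tokens already counted, open reservations.
  def new(tpm), do: %{tpm: tpm, used: 0, reservations: %{}}

  # Check the limit and deduct the estimate in a single step.
  def acquire(state, estimate) do
    if within_limit?(state.used + estimate, state.tpm) do
      ref = make_ref()
      reservations = Map.put(state.reservations, ref, estimate)
      {{:ok, ref}, %{state | used: state.used + estimate, reservations: reservations}}
    else
      # Rejected: the state is returned unchanged, nothing is reserved.
      {{:error, :rate_limited}, state}
    end
  end

  defp within_limit?(_total, :infinity), do: true
  defp within_limit?(total, limit), do: total <= limit
end
```

Inside a GenServer, a transition like this would run in a single `handle_call/3`, which is what serializes concurrent callers.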
Quick Start
{:ok, pid} = RateLimiter.start_link(
  team_id: "team_1",
  budget: 10.0,
  per_agent_budget: 5.0,
  rpm: 60,
  tpm: 100_000
)
# Reserve, run, reconcile:
{:ok, ref} = RateLimiter.acquire(pid, "alice", 1000)
case do_llm_call(...) do
  {:ok, response} ->
    actual_tokens = response.usage.total_tokens
    actual_cost = response.usage.cost
    RateLimiter.record_usage(pid, "alice", %{
      tokens: actual_tokens, cost: actual_cost, reservation: ref
    })
  {:error, _} ->
    RateLimiter.release(pid, ref)
end
Backward compatibility
record_usage/3 called WITHOUT a :reservation key still works as
post-hoc accounting (legacy semantics). This keeps direct usage like
RateLimiter.record_usage(pid, "alice", %{tokens: 500}) valid for
callers that don't go through acquire.
Reservations that are never reconciled or released are pruned after
:reservation_ttl_ms (default 5 minutes) with a Logger.warning/1,
so a missing release/2 doesn't leak budget forever.
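The TTL pruning described above could look roughly like the following. This is a sketch under stated assumptions, not the library's code: `PruneSketch`, the reservation shape (`%{tokens: n, reserved_at_ms: t}`), and the log message are hypothetical. It splits the open reservations into expired and live ones, sums the tokens to refund, and logs a warning when anything was dropped.

```elixir
defmodule PruneSketch do
  @moduledoc false
  require Logger

  # `reservations` maps ref => %{tokens: n, reserved_at_ms: t} (hypothetical shape).
  # Returns {live_reservations, refunded_tokens}.
  def prune(reservations, now_ms, ttl_ms) do
    {expired, live} =
      Enum.split_with(reservations, fn {_ref, r} ->
        now_ms - r.reserved_at_ms > ttl_ms
      end)

    refunded = expired |> Enum.map(fn {_ref, r} -> r.tokens end) |> Enum.sum()

    if expired != [] do
      Logger.warning(
        "pruned #{length(expired)} stale reservation(s), refunding #{refunded} tokens"
      )
    end

    {Map.new(live), refunded}
  end
end
```

A limiter process would typically run such a pass on a timer and subtract the refunded tokens from its current window.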
Configuration
:budget - team-wide budget in USD (default: :infinity)
:per_agent_budget - per-agent budget in USD (default: :infinity)
:rpm - requests per minute limit (default: :infinity)
:tpm - tokens per minute limit (default: :infinity)
:reservation_ttl_ms - reservation expiry (default: 300_000 = 5 min)
Summary
Functions
Atomically reserve tokens and 1 request for agent_name.
Returns a specification to start this module under a supervisor.
Get the current status of the rate limiter.
Record actual usage after an LLM call completes.
Cancel a reservation. Refunds the reserved tokens + request.
Start a RateLimiter for a team.
Types
@type reservation_ref() :: reference()
@type status() :: %{
  budget_remaining: float() | :infinity,
  agents: %{
    required(String.t()) => %{
      cost: float(),
      tokens: non_neg_integer(),
      requests: non_neg_integer()
    }
  },
  open_reservations: non_neg_integer()
}
Functions
@spec acquire(pid(), String.t(), non_neg_integer()) ::
  {:ok, reservation_ref()} | {:error, :budget_exceeded} | {:error, :rate_limited}
Atomically reserve tokens and 1 request for agent_name.
Returns {:ok, reservation_ref} if within budget and rate limits.
The ref must be passed back to either record_usage/3 (with
:reservation) or release/2 so the reservation isn't held forever.
Examples
{:ok, ref} = RateLimiter.acquire(pid, "alice", 1000)
{:error, :budget_exceeded} = RateLimiter.acquire(pid, "alice", 1000)
{:error, :rate_limited} = RateLimiter.acquire(pid, "alice", 1000)
Returns a specification to start this module under a supervisor.
See Supervisor.
Get the current status of the rate limiter.
Returns a map with :budget_remaining, :agents usage breakdown, and
:open_reservations (held but not yet reconciled or released).
Record actual usage after an LLM call completes.
Two modes:
With :reservation key - reconciles actual vs estimated. The reservation is consumed (dropped from open reservations) and the delta (actual - estimate) is applied to totals/agent/window.
Without :reservation key (legacy) - adds the actual usage as a fresh entry, with no reconciliation. Use this only when you didn't go through acquire/3.
Examples
# Reservation-based (race-safe)
{:ok, ref} = RateLimiter.acquire(pid, "alice", 1000)
RateLimiter.record_usage(pid, "alice",
  %{tokens: 850, cost: 0.012, reservation: ref})
# Post-hoc (legacy, not race-safe)
RateLimiter.record_usage(pid, "alice", %{tokens: 500, cost: 0.01})
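The reconciliation arithmetic for the reservation-based mode can be sketched as below. This is an illustrative model only: `ReconcileSketch` and its map-based state are hypothetical, and falling back to post-hoc accounting for an unknown ref is an assumption, not documented behavior. With an estimate of 1000 and an actual of 850, the delta is -150, so the running total drops from 1000 to 850.

```elixir
defmodule ReconcileSketch do
  @moduledoc false

  # `state` is a plain map: %{used: tokens_counted, reservations: %{ref => estimate}}.
  # Consumes `ref` and applies (actual - estimate) to the running total.
  def reconcile(state, ref, actual_tokens) do
    case Map.pop(state.reservations, ref) do
      {nil, _reservations} ->
        # Unknown or already-pruned ref: treated here as post-hoc usage (assumption).
        %{state | used: state.used + actual_tokens}

      {estimate, reservations} ->
        delta = actual_tokens - estimate
        %{state | used: state.used + delta, reservations: reservations}
    end
  end
end
```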
@spec release(pid(), reservation_ref()) :: :ok
Cancel a reservation. Refunds the reserved tokens + request.
Use this when an LLM call errored before completing and you don't have actual usage to record.
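One way to make sure every reservation ends in either record_usage/3 or release/2 is to wrap the call in a helper. The following is a sketch, not part of the library: `LimitedCall` is a hypothetical module, `limiter` is any module exposing acquire/3, record_usage/3 and release/2 as documented here, and `fun` is assumed to be a zero-arity function returning `{:ok, usage_map}` or `{:error, reason}`.

```elixir
defmodule LimitedCall do
  @moduledoc false

  # Reserve, run `fun`, then reconcile on success or release on failure.
  def run(limiter, pid, agent, estimate, fun) do
    with {:ok, ref} <- limiter.acquire(pid, agent, estimate) do
      try do
        fun.()
      rescue
        exception ->
          # The call crashed before producing usage: refund, then re-raise.
          limiter.release(pid, ref)
          reraise exception, __STACKTRACE__
      else
        {:ok, usage} = ok ->
          limiter.record_usage(pid, agent, Map.put(usage, :reservation, ref))
          ok

        {:error, _reason} = error ->
          limiter.release(pid, ref)
          error
      end
    end
  end
end
```

Usage would look like `LimitedCall.run(RateLimiter, pid, "alice", 1000, fn -> ... end)`. When acquire/3 itself returns an error, `with` passes that error through unchanged and nothing needs to be released.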
@spec start_link(keyword()) :: GenServer.on_start()
Start a RateLimiter for a team.
Options
:team_id (required) - unique identifier for the team
:budget - total team budget in USD (default: :infinity)
:per_agent_budget - per-agent budget in USD (default: :infinity)
:rpm - requests per minute (default: :infinity)
:tpm - tokens per minute (default: :infinity)
:reservation_ttl_ms - reservation expiry (default: 300_000)
:name - optional GenServer name