# Usage & Billing

## Overview

ReqLLM provides normalized usage tracking and best-effort cost calculation for API requests. Every response includes usage data that works consistently across providers, with detailed breakdowns for tokens, tools, and images when the provider exposes enough information.

## Pricing Policy

ReqLLM currently targets **"some assistance, no guarantees"** for pricing. In practice, that means:

- `response.usage` is intended to be useful for product analytics, tenant attribution, dashboards, and rough billing estimates
- token, tool, image, and caching costs are calculated from provider usage data plus model pricing metadata when those inputs exist
- the resulting USD totals are not guaranteed to match provider invoices exactly

When exact billing matters, treat ReqLLM usage as a helpful estimate and reconcile against provider-side reporting. For the full contract, known gaps, and production guidance, see the [Pricing Policy](pricing-policy.md) guide.

## The Usage Structure

Every `ReqLLM.Response` includes a `usage` map with normalized metrics:

```elixir
{:ok, response} = ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello")

response.usage
#=> %{
#     # Token counts
#     input_tokens: 8,
#     output_tokens: 12,
#     total_tokens: 20,
#
#     # Cost summary (USD)
#     input_cost: 0.00024,
#     output_cost: 0.00036,
#     total_cost: 0.0006,
#
#     # Detailed cost breakdown
#     cost: %{
#       tokens: 0.0006,
#       tools: 0.0,
#       images: 0.0,
#       total: 0.0006
#     }
#   }
```

## Token Usage

### Standard Tokens

All providers report basic token counts:

| Field | Description |
|-------|-------------|
| `input_tokens` | Tokens in the request (prompt, context, tools) |
| `output_tokens` | Tokens generated by the model |
| `total_tokens` | Sum of input and output tokens |

### Reasoning Tokens

For reasoning models (OpenAI o1/o3/gpt-5, Anthropic extended thinking, Google thinking):

```elixir
{:ok, response} = ReqLLM.generate_text("openai:o3-mini", prompt)

response.usage.reasoning_tokens
#=> 1250
# Tokens used for internal reasoning
```

The `reasoning_tokens` field tracks tokens used for chain-of-thought reasoning. These may be billed differently than standard tokens depending on the provider.

### Cached Tokens

For providers that support prompt caching (Anthropic, OpenAI):

```elixir
response.usage.cached_tokens
#=> 500 # Input tokens served from cache

response.usage.cache_creation_tokens
#=> 0 # Tokens used to create new cache entries
```

Cached tokens are typically billed at a reduced rate. See [Anthropic Prompt Caching](anthropic.md#anthropic_prompt_cache) for details.

## Tool Usage

When using tools like web search, usage is tracked in `tool_usage`:

```elixir
response.usage.tool_usage
#=> %{
#     web_search: %{count: 2, unit: "call"}
#   }
```

### Web Search

Each provider has slightly different web search tracking:

| Provider | Unit | Notes |
|----------|------|-------|
| Anthropic | `"call"` | $10 per 1,000 searches |
| OpenAI | `"call"` | Responses API models only |
| xAI | `"call"` or `"source"` | Varies by response format |
| Google | `"query"` | Grounding queries |

**Anthropic Example:**

```elixir
{:ok, response} =
  ReqLLM.generate_text(
    "anthropic:claude-sonnet-4-5",
    "What's happening in AI today?",
    provider_options: [web_search: %{max_uses: 5}]
  )

response.usage.tool_usage.web_search
#=> %{count: 3, unit: "call"}
```

**xAI Example:**

```elixir
{:ok, response} =
  ReqLLM.generate_text(
    "xai:grok-4-1-fast-reasoning",
    "Latest tech news",
    xai_tools: [%{type: "web_search"}]
  )

response.usage.tool_usage.web_search
#=> %{count: 5, unit: "call"}
```

**Google Grounding Example:**

```elixir
{:ok, response} =
  ReqLLM.generate_text(
    "google:gemini-3-flash-preview",
    "Current stock market trends",
    provider_options: [google_grounding: %{enable: true}]
  )

response.usage.tool_usage.web_search
#=> %{count: 2, unit: "query"}
```

## Image Usage

For image generation, usage is tracked in `image_usage`:

```elixir
{:ok, response} = ReqLLM.generate_image("openai:gpt-image-1", prompt)
response.usage.image_usage
#=> %{
#     generated: %{count: 1, size_class: "1024x1024"}
#   }
```

### Size Classes

Image costs vary by resolution:

| Provider | Size Classes |
|----------|-------------|
| OpenAI GPT Image | `"1024x1024"`, `"1536x1024"`, `"1024x1536"`, `"auto"` |
| OpenAI DALL-E 3 | `"1024x1024"`, `"1792x1024"`, `"1024x1792"` |
| Google | Based on aspect ratio |

### Multiple Images

```elixir
{:ok, response} = ReqLLM.generate_image("openai:dall-e-2", prompt, n: 3)

response.usage.image_usage.generated
#=> %{count: 3, size_class: "1024x1024"}
```

## Cost Breakdown

The `cost` map provides a detailed breakdown by category:

```elixir
response.usage.cost
#=> %{
#     tokens: 0.001,    # Token-based costs (input + output)
#     tools: 0.02,      # Web search and tool costs
#     images: 0.04,     # Image generation costs
#     total: 0.061,     # Sum of all costs
#     line_items: [...] # Per-component details
#   }
```

### Line Items

For detailed billing analysis, `line_items` provides per-component costs:

```elixir
response.usage.cost.line_items
#=> [
#     %{component: "token.input", cost: 0.0003, quantity: 100},
#     %{component: "token.output", cost: 0.0007, quantity: 50},
#     %{component: "tool.web_search", cost: 0.02, quantity: 2}
#   ]
```

## Provider-Specific Notes

### Anthropic

- **Web search**: $10 per 1,000 searches
- **Prompt caching**: Reduced rates for cached tokens
- **Extended thinking**: Reasoning tokens tracked separately

### OpenAI

- **Responses API**: Web search available for o1, o3, gpt-5 models
- **Chat Completions API**: No built-in web search
- **Image generation**: Costs vary by model and size

### xAI

- **Web search**: Via `xai_tools` option
- **Deprecated**: `live_search` is no longer supported
- **Units**: May report as `"call"` or `"source"`

### Google

- **Grounding**: Search via `google_grounding` option
- **Units**: Reports as `"query"`
- **Image generation**: Gemini image models supported

## Known Limits

ReqLLM does not currently guarantee support for every provider billing
surface. In particular:

- realtime audio/text billing is not modeled yet
- video generation billing is not modeled yet
- account-specific discounts, credits, taxes, and regional pricing are outside the public contract

## Telemetry

ReqLLM emits three telemetry event families:

- `[:req_llm, :request, :start | :stop | :exception]` for lifecycle timing, request and response summaries, usage, and standardized reasoning metadata
- `[:req_llm, :reasoning, :start | :update | :stop]` for provider-neutral thinking and reasoning milestones
- `[:req_llm, :token_usage]` for backwards-compatible token and cost tracking

For billing and tenant attribution, use `[:req_llm, :request, :stop]` as the source of truth. It includes duration in measurements plus `request_id`, `usage`, `finish_reason`, and normalized `reasoning` metadata in the event metadata. The token usage event remains useful if you only want token and cost totals.

When you audit reasoning-heavy workloads, prefer the normalized `reasoning` snapshot on the request lifecycle events over raw provider payloads. It captures both the originally requested reasoning settings and the effective translated request, so you can see when a provider rewrites or disables a reasoning configuration before attributing cost or behavior to a tenant.
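As a concrete sketch of that audit guidance, a handler on the request lifecycle event can log each request's reasoning snapshot next to its id for later review. The handler id below is arbitrary, and `metadata.reasoning` is treated as an opaque value here; see the Telemetry Guide for its exact fields:

```elixir
:telemetry.attach(
  "reasoning-audit",
  [:req_llm, :request, :stop],
  fn _event, _measurements, metadata, _config ->
    # `metadata.reasoning` is the normalized snapshot described above:
    # it carries both the requested and the effective reasoning settings,
    # so logging it per request makes provider-side rewrites visible.
    if metadata[:reasoning] do
      IO.inspect({metadata.request_id, metadata.reasoning}, label: "Reasoning audit")
    end
  end,
  nil
)
```

Detach with `:telemetry.detach("reasoning-audit")` once the handler is no longer needed.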
For example, a single handler can subscribe to both the request lifecycle and token-usage events:

```elixir
:telemetry.attach_many(
  "my-req-llm-billing",
  [
    [:req_llm, :request, :stop],
    [:req_llm, :request, :exception],
    [:req_llm, :token_usage]
  ],
  fn event, measurements, metadata, _config ->
    case event do
      [:req_llm, :request, :stop] ->
        duration_ms =
          System.convert_time_unit(measurements.duration, :native, :millisecond)

        IO.inspect(
          %{
            request_id: metadata.request_id,
            duration_ms: duration_ms,
            finish_reason: metadata.finish_reason,
            usage: metadata.usage,
            reasoning: metadata.reasoning
          },
          label: "Request"
        )

      [:req_llm, :request, :exception] ->
        IO.inspect(metadata, label: "Failed request")

      [:req_llm, :token_usage] ->
        IO.inspect(%{measurements: measurements, metadata: metadata}, label: "Usage")
    end
  end,
  nil
)
```

`[:req_llm, :token_usage]` remains available on every request, including streaming:

```elixir
:telemetry.attach(
  "my-usage-handler",
  [:req_llm, :token_usage],
  fn _event, measurements, metadata, _config ->
    IO.inspect(measurements, label: "Usage")
    IO.inspect(metadata, label: "Metadata")
  end,
  nil
)
```

Event measurements include:

- `input_tokens`, `output_tokens`, `total_tokens`
- `input_cost`, `output_cost`, `total_cost`
- `reasoning_tokens` (when applicable)

See the [Telemetry Guide](telemetry.md) for the full event contract, reasoning lifecycle, milestone semantics, and payload capture options.
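Building on these events, usage can be accumulated in memory rather than only printed. The sketch below is illustrative, not part of ReqLLM's API: the module name, handler id, and state shape are all assumptions. It keeps a running USD total and request count from `[:req_llm, :request, :stop]`:

```elixir
defmodule CostLedger do
  # Accumulates best-effort USD totals from ReqLLM request lifecycle events.
  use Agent

  def start_link(_opts \\ []) do
    {:ok, pid} = Agent.start_link(fn -> %{total_cost: 0.0, requests: 0} end, name: __MODULE__)

    :ok =
      :telemetry.attach(
        "cost-ledger",
        [:req_llm, :request, :stop],
        &__MODULE__.handle_event/4,
        nil
      )

    {:ok, pid}
  end

  @doc false
  def handle_event(_event, _measurements, metadata, _config) do
    # `metadata.usage` follows the normalized usage map shown earlier;
    # `total_cost` may be nil when pricing metadata is unavailable.
    cost = get_in(metadata, [:usage, :total_cost]) || 0.0

    Agent.update(__MODULE__, fn state ->
      %{state | total_cost: state.total_cost + cost, requests: state.requests + 1}
    end)
  end

  def snapshot, do: Agent.get(__MODULE__, & &1)
end
```

Per the pricing policy above, totals gathered this way are estimates; reconcile against provider-side reporting when exact billing matters.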
## Example: Complete Usage Tracking

```elixir
defmodule UsageTracker do
  def track_request(model, prompt, opts \\ []) do
    {duration_us, result} =
      :timer.tc(fn -> ReqLLM.generate_text(model, prompt, opts) end)

    case result do
      {:ok, response} ->
        usage = response.usage

        IO.puts("""
        Request completed in #{duration_us / 1000}ms

        Tokens:
          Input:  #{usage.input_tokens}
          Output: #{usage.output_tokens}
          Total:  #{usage.total_tokens}
          #{if usage.reasoning_tokens, do: "Reasoning: #{usage.reasoning_tokens}", else: ""}

        Cost:
          Input:  $#{format_cost(usage.input_cost)}
          Output: $#{format_cost(usage.output_cost)}
          Total:  $#{format_cost(usage.total_cost)}

        #{format_tool_usage(usage.tool_usage)}
        #{format_image_usage(usage.image_usage)}
        """)

        {:ok, response}

      error ->
        error
    end
  end

  defp format_cost(nil), do: "n/a"
  defp format_cost(cost), do: :erlang.float_to_binary(cost, decimals: 6)

  defp format_tool_usage(nil), do: ""

  defp format_tool_usage(tool_usage) do
    Enum.map_join(tool_usage, "\n", fn {tool, %{count: count, unit: unit}} ->
      "Tool Usage: #{tool} = #{count} #{unit}(s)"
    end)
  end

  defp format_image_usage(nil), do: ""

  defp format_image_usage(%{generated: %{count: count, size_class: size}}) do
    "Image Usage: #{count} image(s) at #{size}"
  end

  defp format_image_usage(_), do: ""
end
```

## See Also

- [Data Structures](data-structures.md) - Response structure details
- [Anthropic Guide](anthropic.md) - Web search and prompt caching
- [OpenAI Guide](openai.md) - Responses API and image generation
- [xAI Guide](xai.md) - Grok web search
- [Google Guide](google.md) - Grounding and search
- [Image Generation Guide](image-generation.md) - Image costs