# Data Structures

## Overview

ReqLLM's canonical data model is the foundation for provider-agnostic AI interactions. It normalizes provider differences by enforcing a small, consistent set of structs that represent models, conversations, tools, responses, and streaming.

## Hierarchy

```
LLMDB.Model                 # Canonical model metadata struct used by ReqLLM
  ↓
ReqLLM.Context              # Conversation history
  ↓
ReqLLM.Message              # A turn in the conversation
  ↓
ReqLLM.Message.ContentPart  # Typed content (text, images, files, tool calls, results)
  ↓
ReqLLM.Tool                 # Tool definitions (name, description, schema)
  ↓
ReqLLM.StreamChunk          # Streaming events (content, tool_call, thinking, meta)
  ↓
ReqLLM.Response             # Canonical final response with usage and helpers
  ↓
ReqLLM.StreamResponse       # Streaming handle with helpers
```

## Design goals

- **Provider-agnostic**: One set of types works across Anthropic, OpenAI, Google, etc.
- **Typed and explicit**: Discriminated unions for content; consistent fields, no surprises.
- **Composable and immutable**: Build contexts and messages with simple, predictable APIs.
- **Extensible**: Metadata fields and new content types can be added without breaking shape.

## 1) LLMDB.Model

Represents a model choice for a specific provider plus optional routing and capability metadata.

**Typical fields**:

- `provider`: atom, e.g., `:anthropic`
- `id`: string, e.g., `"claude-haiku-4-5"`
- `provider_model_id`: optional provider-facing wire ID
- `base_url`: optional per-model endpoint metadata
- `capabilities`, `limits`, `modalities`, `cost`, `pricing`, `extra`: optional metadata (often sourced from LLMDB)

**Constructors**:

```elixir
{:ok, model} = ReqLLM.model("anthropic:claude-haiku-4-5")

model =
  ReqLLM.model!(%{
    provider: :openai,
    id: "gpt-6-mini",
    base_url: "http://localhost:8000/v1"
  })

# Direct struct creation if you need full control
model =
  LLMDB.Model.new!(%{
    provider: :anthropic,
    id: "claude-3-5-sonnet-20241022",
    capabilities: %{tool_call: true},
    modalities: %{input: [:text, :image], output: [:text]},
    cost: %{input: 3.0, output: 15.0}
  })
```

**How this supports normalization**:

- One way to specify models across providers.
- Common options are normalized; provider-specific options are translated by the provider adapter.
- See the [Model Specs](model-specs.md) guide for the full model-spec resolution rules and explicit model-specification path.

## 2) ReqLLM.Context

A conversation wrapper around a list of `Message` structs. Implements `Enumerable` and `Collectable` for ergonomic manipulation (see the sketch after section 3 below).

**Constructors and helpers**:

```elixir
alias ReqLLM.Context
alias ReqLLM.Message.ContentPart
import ReqLLM.Context, only: [system: 1, user: 1]

context =
  Context.new([
    system("You are a helpful assistant."),
    user("Summarize this document."),
    user([
      # pdf_data: binary contents of a PDF read elsewhere
      ContentPart.file(pdf_data, "report.pdf", "application/pdf")
    ])
  ])
```

**How this supports normalization**:

- One conversation format for all providers (no provider-specific role/content layouts).
- Multimodal content is embedded uniformly via `ContentPart`.

## 3) ReqLLM.Message

Represents one conversational turn with a role and a list of `ContentPart` items.

**Typical fields**:

- `role`: `:system` | `:user` | `:assistant` | `:tool` (when appropriate)
- `content`: list of `ContentPart`

**Examples**:

```elixir
alias ReqLLM.Message.ContentPart

msg = %ReqLLM.Message{
  role: :user,
  content: [ContentPart.text("Hello!")]
}
```

**How this supports normalization**:

- Every message has a uniform shape; multimodality is handled by `ContentPart` rather than provider-specific message types.
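Because `Context` implements `Enumerable` and `Collectable`, contexts and messages compose with the standard `Enum` API. A minimal sketch, assuming enumeration yields `Message` structs and the `Collectable` implementation appends messages in order:

```elixir
alias ReqLLM.Context
alias ReqLLM.Message.ContentPart

context =
  Context.new([
    Context.system("You are terse."),
    Context.user("Hi!")
  ])

# Enumerable: standard Enum functions iterate over the messages
user_turn_count = Enum.count(context, &(&1.role == :user))

# Collectable: Enum.into/2 collects new messages into the context
# (append semantics assumed here)
reply = %ReqLLM.Message{role: :assistant, content: [ContentPart.text("Hello!")]}
context = Enum.into([reply], context)
```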
## 4) ReqLLM.Message.ContentPart

Typed content elements that compose a `Message`. Common variants:

- `text/1`: `ContentPart.text("...")`
- `text/2`: `ContentPart.text("...", metadata)` with metadata map
- `image_url/1`: `ContentPart.image_url("https://...")`
- `image_url/2`: `ContentPart.image_url("https://...", metadata)` with metadata
- `image/2`: `ContentPart.image(binary, "image/png")`
- `image/3`: `ContentPart.image(binary, "image/png", metadata)` with metadata
- `file/3`: `ContentPart.file(binary, "name.ext", "mime/type")`
- `thinking/1`: `ContentPart.thinking("...")` for models that expose reasoning tokens
- `tool_call/2`: `ContentPart.tool_call("name", %{arg: "value"})` for assistant-issued calls
- `tool_result/2`: `ContentPart.tool_result("tool_call_id", %{...})` for tool outputs

**Example**:

```elixir
parts = [
  ContentPart.text("Analyze:"),
  ContentPart.image_url("https://example.com/chart.png")
]
```

**Metadata field**:

The `metadata` field allows passing provider-specific attributes through to the wire format. Currently supported metadata keys:

- `cache_control`: Anthropic prompt caching control (e.g., `%{type: "ephemeral"}`)

```elixir
# Enable prompt caching for text content
cached_text =
  ContentPart.text(
    "Long system prompt to cache...",
    %{cache_control: %{type: "ephemeral"}}
  )

# Enable prompt caching for images
cached_image =
  ContentPart.image_url(
    "https://example.com/large-diagram.png",
    %{cache_control: %{type: "ephemeral"}}
  )

# Or with binary image data
cached_binary_image =
  ContentPart.image(
    image_data,
    "image/png",
    %{cache_control: %{type: "ephemeral"}}
  )
```

**How this supports normalization**:

- Discriminated union eliminates polymorphism across providers.
- New content types can be added without changing the `Message` shape.
- Metadata enables provider-specific features without breaking the canonical model.

## 5) ReqLLM.Tool

Defines callable functions (aka "tools" or "function calling") with validation.

**Typical fields**:

- `name`: string
- `description`: string
- `parameter_schema`: `NimbleOptions`-based schema for argument validation
- `callback`: function or MFA tuple to execute the tool

**Example**:

```elixir
{:ok, tool} =
  ReqLLM.Tool.new(
    name: "get_weather",
    description: "Gets weather by city",
    parameter_schema: [city: [type: :string, required: true]],
    callback: fn %{city: city} -> {:ok, "Weather in #{city}: sunny"} end
  )

# Execute locally (e.g., after a model issues a tool_call)
{:ok, result} = ReqLLM.Tool.execute(tool, %{"city" => "NYC"})
```

**How this supports normalization**:

- One tool definition is used across providers that support function/tool calling.
- Tool calls/results appear in `ContentPart` and `StreamChunk` the same way for all providers.
- Structured tool results should be represented in the content body as JSON when they carry model-visible semantics like `%{ok: true, result: ...}` or `%{ok: false, error: ...}`, as sketched below. Message metadata can preserve the original native output for adapters and local consumers, but it should not be the only source of meaning for follow-up model turns.
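For instance, after executing a tool locally, its output can be fed back to the model as a `:tool` turn. A minimal sketch reusing the `tool` from the example above; the tool-call ID `"call_123"` and the exact message shape are illustrative assumptions:

```elixir
alias ReqLLM.Message.ContentPart

# Wrap the tool's native output in the model-visible %{ok: ..., result: ...} shape
result =
  case ReqLLM.Tool.execute(tool, %{"city" => "NYC"}) do
    {:ok, value} -> %{ok: true, result: value}
    {:error, reason} -> %{ok: false, error: inspect(reason)}
  end

# Feed the structured result back as a tool turn for the next model call;
# "call_123" stands in for the ID from the model's tool_call
tool_turn = %ReqLLM.Message{
  role: :tool,
  content: [ContentPart.tool_result("call_123", result)]
}
```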
## 6) ReqLLM.StreamChunk

Unified streaming event payloads emitted during `stream_text`.

**Common chunk types**:

- `:content` — text tokens or content fragments
- `:thinking` — reasoning tokens (if provider exposes them)
- `:tool_call` — a call intent with name and arguments
- `:meta` — metadata such as `finish_reason`, usage deltas, etc.

**Example**:

```elixir
%ReqLLM.StreamChunk{type: :content, text: "Hello"}
%ReqLLM.StreamChunk{type: :tool_call, name: "get_weather", arguments: %{city: "NYC"}}
%ReqLLM.StreamChunk{type: :meta, metadata: %{finish_reason: "stop"}}
```

**How this supports normalization**:

- All providers' streaming formats are mapped into this single, consistent event model.

## 7) ReqLLM.Response

Canonical final response returned by non-streaming calls (and available after streaming completes, when applicable).

**Typical fields and helpers**:

- `content`/`messages`: unified assistant output as Messages/ContentParts
- `usage`: normalized token/cost data when available
- Helpers: `ReqLLM.Response.text/1`, `ReqLLM.Response.object/1`, `ReqLLM.Response.usage/1`

**Example**:

```elixir
{:ok, response} =
  ReqLLM.generate_text(
    "anthropic:claude-haiku-4-5",
    [ReqLLM.Context.user("Hello")]
  )

text = ReqLLM.Response.text(response)
usage = ReqLLM.Response.usage(response)
```

### Usage Structure

The `usage` field contains normalized usage data with token counts, costs, and tool/image usage:

```elixir
%{
  # Token counts
  input_tokens: 150,
  output_tokens: 200,
  total_tokens: 350,
  reasoning_tokens: 0,        # For reasoning models (o1, o3, gpt-5)
  cached_tokens: 100,         # Cached input tokens
  cache_creation_tokens: 0,   # Tokens used to create cache

  # Cost breakdown (USD)
  input_cost: 0.00045,
  output_cost: 0.0006,
  total_cost: 0.00105,

  # Detailed cost by category
  cost: %{
    tokens: 0.00105,
    tools: 0.02,              # Web search, function calls
    images: 0.0,              # Image generation
    total: 0.02105,
    line_items: [...]         # Per-component cost details
  },

  # Tool usage (web search, etc.)
  tool_usage: %{
    web_search: %{count: 2, unit: "call"}
  },

  # Image usage (for image generation)
  image_usage: %{
    generated: %{count: 1, size_class: "1024x1024"}
  }
}
```

See the [Usage & Billing Guide](usage-and-billing.md) for comprehensive documentation.

**How this supports normalization**:

- One response object to extract text, structured objects, and usage across providers.

## 8) ReqLLM.StreamResponse

Handle for streaming operations with helpers to consume chunks or tokens.

**Example**:

```elixir
{:ok, sr} =
  ReqLLM.stream_text(
    "anthropic:claude-haiku-4-5",
    [ReqLLM.Context.user("Tell me a story")]
  )

# Stream raw chunks
sr
|> ReqLLM.StreamResponse.stream()
|> Stream.each(fn chunk ->
  case chunk.type do
    :content -> IO.write(chunk.text)
    :tool_call -> IO.inspect(chunk, label: "Tool call")
    :meta -> :ok
    _ -> :ok
  end
end)
|> Stream.run()

# Or use the tokens helper (if available)
sr
|> ReqLLM.StreamResponse.tokens()
|> Stream.each(&IO.write/1)
|> Stream.run()
```

**How this supports normalization**:

- Same streaming consumption API for every provider; adapters convert SSE/WS specifics into `StreamChunk`.
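When the full text is needed rather than incremental output, the token stream can be reduced back to a single string. A minimal sketch, assuming the `tokens/1` helper mentioned above is available:

```elixir
{:ok, sr} =
  ReqLLM.stream_text(
    "anthropic:claude-haiku-4-5",
    [ReqLLM.Context.user("Tell me a story")]
  )

# Drain the token stream and join it into one string
story =
  sr
  |> ReqLLM.StreamResponse.tokens()
  |> Enum.join()

IO.puts(story)
```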
## 9) Validation and type safety

ReqLLM provides validation utilities so you can fail early and clearly:

- `ReqLLM.Context.validate/1`
- `ReqLLM.StreamChunk.validate/1`
- Tool argument validation via `NimbleOptions` schemas

**Example**:

```elixir
case ReqLLM.Context.validate(context) do
  {:ok, ctx} -> ReqLLM.generate_text(model, ctx)
  {:error, reason} -> raise ArgumentError, "Invalid context: #{inspect(reason)}"
end
```

## 10) End-to-end example (provider-agnostic)

```elixir
alias ReqLLM.Message.ContentPart

{:ok, model} = ReqLLM.model("anthropic:claude-haiku-4-5")

{:ok, tool} =
  ReqLLM.Tool.new(
    name: "get_weather",
    description: "Gets weather by city",
    parameter_schema: [city: [type: :string, required: true]],
    callback: fn %{city: city} -> {:ok, "Weather in #{city}: sunny"} end
  )

context =
  ReqLLM.Context.new([
    ReqLLM.Context.system("You are a helpful assistant."),
    ReqLLM.Context.user([
      ContentPart.text("What is the weather in NYC today?")
    ])
  ])

{:ok, response} = ReqLLM.generate_text(model, context, tools: [tool])

IO.puts("Answer: " <> ReqLLM.Response.text(response))
IO.inspect(ReqLLM.Response.usage(response), label: "Usage")
```

**How this supports normalization**:

- At no point does your application code need to branch on provider (see the closing sketch below).
- Providers translate request/response specifics into these canonical types.

## Key takeaways

- The canonical data structures are the heart of ReqLLM's "normalize everything" approach.
- Build contexts, messages, and tools once; reuse them across providers.
- Consume streaming and final results through a single, consistent API.
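As a closing illustration, the same `context` and `tool` from section 10 can run against different providers by changing only the model spec. A minimal sketch; the OpenAI model ID reuses the illustrative `"gpt-6-mini"` from section 1:

```elixir
# Same context and tool; only the model spec string changes per provider
for spec <- ["anthropic:claude-haiku-4-5", "openai:gpt-6-mini"] do
  {:ok, response} = ReqLLM.generate_text(spec, context, tools: [tool])
  IO.puts("#{spec} -> " <> ReqLLM.Response.text(response))
end
```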