ExLLM User Guide
This comprehensive guide covers all features and capabilities of the ExLLM library.
Table of Contents
- Installation and Setup
- Configuration
- Basic Usage
- Providers
- Chat Completions
- Streaming
- Session Management
- Context Management
- Function Calling
- Vision and Multimodal
- Embeddings
- Structured Outputs
- Cost Tracking
- Error Handling and Retries
- Caching
- Response Caching
- Model Discovery
- Provider Capabilities
- Logging
- Testing with Mock Adapter
- Advanced Topics
Installation and Setup
Adding to Your Project
Add ExLLM to your mix.exs dependencies:
def deps do
[
{:ex_llm, "~> 0.4.1"},
# Included dependencies (automatically installed with ex_llm):
# - {:instructor, "~> 0.1.0"} - For structured outputs
# - {:bumblebee, "~> 0.5"} - For local model inference
# - {:nx, "~> 0.7"} - For numerical computing
# Optional hardware acceleration (choose one):
# {:exla, "~> 0.7"} # For CUDA/ROCm GPUs
# {:emlx, github: "elixir-nx/emlx", branch: "main"} # For Apple Silicon
]
end
Run mix deps.get to install the dependencies.
Dependencies
- Req: HTTP client (automatically included)
- Jason: JSON parser (automatically included)
- Instructor: Structured outputs with schema validation (automatically included)
- Bumblebee: Local model inference (automatically included)
- Nx: Numerical computing (automatically included)
- EXLA: CUDA/ROCm GPU acceleration (optional)
- EMLX: Apple Silicon Metal acceleration (optional)
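If you add EXLA for GPU acceleration, Nx typically also needs to be pointed at it as its default backend. A minimal sketch, assuming a standard Nx/EXLA setup (adjust to your hardware and versions):
# config/config.exs
config :nx, default_backend: EXLA.Backend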
Configuration
ExLLM supports multiple configuration methods to suit different use cases.
Environment Variables
The simplest way to configure ExLLM:
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_API_BASE="https://api.openai.com/v1" # Optional custom endpoint
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# Google Gemini
export GOOGLE_API_KEY="..."
# or
export GEMINI_API_KEY="..."
# Groq
export GROQ_API_KEY="gsk_..."
# OpenRouter
export OPENROUTER_API_KEY="sk-or-..."
# X.AI
export XAI_API_KEY="xai-..."
# Mistral AI
export MISTRAL_API_KEY="..."
# Perplexity
export PERPLEXITY_API_KEY="pplx-..."
# AWS Bedrock
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"
# Ollama
export OLLAMA_API_BASE="http://localhost:11434"
# LM Studio
export LMSTUDIO_API_BASE="http://localhost:1234"
Static Configuration
For more control, use static configuration:
config = %{
openai: %{
api_key: "sk-...",
api_base: "https://api.openai.com/v1",
default_model: "gpt-4o"
},
anthropic: %{
api_key: "sk-ant-...",
default_model: "claude-3-5-sonnet-20241022"
}
}
{:ok, provider} = ExLLM.ConfigProvider.Static.start_link(config)
# Use with config_provider option
{:ok, response} = ExLLM.chat(:openai, messages, config_provider: provider)
Custom Configuration Provider
Implement your own configuration provider:
defmodule MyApp.ConfigProvider do
@behaviour ExLLM.ConfigProvider
def get([:openai, :api_key]), do: fetch_from_vault("openai_key")
def get([:anthropic, :api_key]), do: fetch_from_vault("anthropic_key")
def get(_path), do: nil
def get_all() do
%{
openai: %{api_key: fetch_from_vault("openai_key")},
anthropic: %{api_key: fetch_from_vault("anthropic_key")}
}
end
end
# Use it
{:ok, response} = ExLLM.chat(:openai, messages,
config_provider: MyApp.ConfigProvider
)
Basic Usage
Simple Chat
messages = [
%{role: "user", content: "Hello, how are you?"}
]
{:ok, response} = ExLLM.chat(:openai, messages)
IO.puts(response.content)
Provider/Model Syntax
# Use provider/model string syntax
{:ok, response} = ExLLM.chat("anthropic/claude-3-haiku-20240307", messages)
# Equivalent to
{:ok, response} = ExLLM.chat(:anthropic, messages,
model: "claude-3-haiku-20240307"
)
Response Structure
%ExLLM.Types.LLMResponse{
content: "I'm doing well, thank you!",
model: "gpt-4o",
finish_reason: "stop",
usage: %{
input_tokens: 12,
output_tokens: 8,
total_tokens: 20
},
cost: %{
input_cost: 0.00006,
output_cost: 0.00016,
total_cost: 0.00022,
currency: "USD"
}
}
Providers
Supported Providers
ExLLM supports these providers out of the box:
- :openai - OpenAI GPT models
- :anthropic - Anthropic Claude models
- :gemini - Google Gemini models
- :groq - Groq fast inference
- :mistral - Mistral AI models
- :perplexity - Perplexity search-enhanced models
- :ollama - Local models via Ollama
- :lmstudio - Local models via LM Studio
- :bedrock - AWS Bedrock
- :openrouter - OpenRouter (300+ models)
- :xai - X.AI Grok models
- :bumblebee - Local models via Bumblebee/NX
- :mock - Mock adapter for testing
Checking Provider Configuration
# Check if a provider is configured
if ExLLM.configured?(:openai) do
{:ok, response} = ExLLM.chat(:openai, messages)
end
# Get default model for a provider
model = ExLLM.default_model(:anthropic)
# => "claude-3-5-sonnet-20241022"
# List available models
{:ok, models} = ExLLM.list_models(:openai)
for model <- models do
IO.puts("#{model.id}: #{model.context_window} tokens")
end
Chat Completions
Basic Options
{:ok, response} = ExLLM.chat(:openai, messages,
model: "gpt-4o", # Specific model
temperature: 0.7, # Sampling temperature; higher = more creative (allowed range varies by provider)
max_tokens: 1000, # Max response length
top_p: 0.9, # Nucleus sampling
frequency_penalty: 0.5, # Reduce repetition
presence_penalty: 0.5, # Encourage new topics
stop: ["\n\n", "END"], # Stop sequences
seed: 12345, # Reproducible outputs
timeout: 60_000 # Request timeout in ms (default: provider-specific)
)
Timeout Configuration
Different providers have different timeout requirements. ExLLM allows you to configure timeouts per request:
# Ollama with function calling (can be slow)
{:ok, response} = ExLLM.chat(:ollama, messages,
functions: functions,
timeout: 300_000 # 5 minutes
)
# Quick requests with shorter timeout
{:ok, response} = ExLLM.chat(:openai, messages,
timeout: 30_000 # 30 seconds
)
Default timeouts:
- Ollama: 120,000ms (2 minutes) - Local models can be slower
- Other providers: Use their HTTP client defaults (typically 30-60 seconds)
System Messages
messages = [
%{role: "system", content: "You are a helpful coding assistant."},
%{role: "user", content: "How do I read a file in Elixir?"}
]
{:ok, response} = ExLLM.chat(:openai, messages)
Multi-turn Conversations
conversation = [
%{role: "user", content: "What's the capital of France?"},
%{role: "assistant", content: "The capital of France is Paris."},
%{role: "user", content: "What's the population?"}
]
{:ok, response} = ExLLM.chat(:openai, conversation)
Streaming
Basic Streaming
{:ok, stream} = ExLLM.stream_chat(:openai, messages)
for chunk <- stream do
case chunk do
%{content: content} when content != nil ->
IO.write(content)
%{finish_reason: reason} when reason != nil ->
IO.puts("\nFinished: #{reason}")
_ ->
# Other chunk types (role, etc.)
:ok
end
end
Streaming with Callback
{:ok, stream} = ExLLM.stream_chat(:openai, messages,
on_chunk: fn chunk ->
if chunk.content, do: IO.write(chunk.content)
end
)
# Consume the stream
Enum.to_list(stream)
Collecting Streamed Response
{:ok, stream} = ExLLM.stream_chat(:openai, messages)
# Collect all chunks into a single response
full_content =
stream
|> Enum.map(& &1.content)
|> Enum.reject(&is_nil/1)
|> Enum.join("")
Stream Recovery
Enable automatic stream recovery for interrupted streams:
{:ok, stream} = ExLLM.stream_chat(:openai, messages,
stream_recovery: true,
recovery_strategy: :exact # :exact, :paragraph, or :summarize
)
# If stream is interrupted, you can resume
{:ok, resumed_stream} = ExLLM.resume_stream(recovery_id)
Session Management
Sessions provide stateful conversation management with automatic token tracking.
Creating and Using Sessions
# Create a new session
session = ExLLM.new_session(:openai, name: "Customer Support")
# Chat with session (automatically manages message history)
{:ok, {response, session}} = ExLLM.chat_with_session(
session,
"What's the weather like?"
)
# Continue the conversation
{:ok, {response2, session}} = ExLLM.chat_with_session(
session,
"What should I wear?"
)
# Check token usage
total_tokens = ExLLM.session_token_usage(session)
IO.puts("Total tokens used: #{total_tokens}")
Managing Session Messages
# Add messages manually
session = ExLLM.add_session_message(session, "user", "Hello!")
session = ExLLM.add_session_message(session, "assistant", "Hi there!")
# Get message history
messages = ExLLM.get_session_messages(session)
recent_10 = ExLLM.get_session_messages(session, 10)
# Clear messages but keep session metadata
session = ExLLM.clear_session(session)
Persisting Sessions
# Save session to JSON
{:ok, json} = ExLLM.save_session(session)
File.write!("session.json", json)
# Load session from JSON
{:ok, json} = File.read("session.json")
{:ok, restored_session} = ExLLM.load_session(json)
Session with Context
# Create session with default context
session = ExLLM.new_session(:openai,
name: "Tech Support",
context: %{
temperature: 0.3,
system_message: "You are a technical support agent."
}
)
# Context is automatically applied to all chats
{:ok, {response, session}} = ExLLM.chat_with_session(session, "Help!")
Context Management
Automatically manage conversation context to fit within model limits.
Context Window Validation
# Check if messages fit in context window
case ExLLM.validate_context(messages, provider: :openai, model: "gpt-4") do
{:ok, token_count} ->
IO.puts("Messages use #{token_count} tokens")
{:error, reason} ->
IO.puts("Messages too large: #{reason}")
end
# Get context window size for a model
window_size = ExLLM.context_window_size(:anthropic, "claude-3-opus-20240229")
# => 200000
Automatic Message Truncation
# Prepare messages to fit in context window
truncated = ExLLM.prepare_messages(long_conversation,
provider: :openai,
model: "gpt-4",
max_tokens: 4000, # Reserve tokens for response
strategy: :sliding_window, # or :smart
preserve_messages: 5 # Always keep last 5 messages
)
Truncation Strategies
- :sliding_window - Keep most recent messages
- :smart - Preserve system messages and recent context
# Smart truncation preserves important context
{:ok, response} = ExLLM.chat(:openai, very_long_conversation,
strategy: :smart,
preserve_messages: 10
)
Context Statistics
stats = ExLLM.context_stats(messages)
# => %{
# message_count: 20,
# total_tokens: 1500,
# by_role: %{"user" => 10, "assistant" => 9, "system" => 1},
# avg_tokens_per_message: 75
# }
Function Calling
Enable AI models to call functions/tools in your application.
Basic Function Calling
# Define available functions
functions = [
%{
name: "get_weather",
description: "Get current weather for a location",
parameters: %{
type: "object",
properties: %{
location: %{
type: "string",
description: "City and state, e.g. San Francisco, CA"
},
unit: %{
type: "string",
enum: ["celsius", "fahrenheit"],
description: "Temperature unit"
}
},
required: ["location"]
}
}
]
# Let the AI decide when to call functions
{:ok, response} = ExLLM.chat(:openai,
[%{role: "user", content: "What's the weather in NYC?"}],
functions: functions,
function_call: "auto" # or "none" or %{name: "get_weather"}
)
Handling Function Calls
# Parse function calls from response
case ExLLM.parse_function_calls(response, :openai) do
{:ok, [function_call | _]} ->
# AI wants to call a function
IO.inspect(function_call)
# => %ExLLM.FunctionCalling.FunctionCall{
# name: "get_weather",
# arguments: %{"location" => "New York, NY"}
# }
# Execute the function
result = get_weather_impl(function_call.arguments["location"])
# Format result for conversation
function_message = ExLLM.format_function_result(
%ExLLM.FunctionCalling.FunctionResult{
name: "get_weather",
result: result
},
:openai
)
# Continue conversation with function result
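# `response_message` stands for the assistant message built from `response`
# (the turn that contains the function call)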
messages = messages ++ [response_message, function_message]
{:ok, final_response} = ExLLM.chat(:openai, messages)
{:ok, []} ->
# No function call, regular response
IO.puts(response.content)
end
Function Execution
# Define functions with handlers
functions_with_handlers = [
%{
name: "calculate",
description: "Perform mathematical calculations",
parameters: %{
type: "object",
properties: %{
expression: %{type: "string"}
},
required: ["expression"]
},
handler: fn args ->
# Your implementation
{result, _} = Code.eval_string(args["expression"])
%{result: result}
end
}
]
# Execute function automatically
{:ok, result} = ExLLM.execute_function(function_call, functions_with_handlers)
Provider-Specific Notes
Different providers use different terminology:
- OpenAI: "functions" and "function_call"
- Anthropic: "tools" and "tool_use"
- ExLLM normalizes these automatically, so the same function definitions work with either provider (see the sketch below)
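For example, the same functions list from the Basic Function Calling example can be sent to either provider. A minimal sketch, assuming both providers are configured:
# ExLLM translates the definitions into each provider's native format
{:ok, openai_response} = ExLLM.chat(:openai, messages, functions: functions)
{:ok, claude_response} = ExLLM.chat(:anthropic, messages, functions: functions)
# Parsing is normalized the same way
{:ok, calls} = ExLLM.parse_function_calls(claude_response, :anthropic)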
Vision and Multimodal
Work with images and other media types.
Basic Image Analysis
# Create a vision message
{:ok, message} = ExLLM.vision_message(
"What's in this image?",
["path/to/image.jpg"]
)
# Send to vision-capable model
{:ok, response} = ExLLM.chat(:openai, [message],
model: "gpt-4o" # or any vision model
)
Multiple Images
{:ok, message} = ExLLM.vision_message(
"Compare these images",
[
"image1.jpg",
"image2.jpg",
"https://example.com/image3.png" # URLs work too
],
detail: :high # :low, :high, or :auto
)
Loading Images
# Load image with options
{:ok, image_part} = ExLLM.load_image("photo.jpg",
detail: :high,
resize: {1024, 1024} # Optional resizing
)
# Build custom message
message = %{
role: "user",
content: [
%{type: "text", text: "Describe this image"},
image_part
]
}
Checking Vision Support
# Check if provider/model supports vision
if ExLLM.supports_vision?(:anthropic, "claude-3-opus-20240229") do
# This model supports vision
end
# Find all vision-capable models
vision_models = ExLLM.find_models_with_features([:vision])
Text Extraction from Images
# OCR-like functionality
{:ok, text} = ExLLM.extract_text_from_image(:openai, "document.png",
model: "gpt-4o",
prompt: "Extract all text, preserving formatting and layout"
)
Image Analysis
# Analyze multiple images
{:ok, analysis} = ExLLM.analyze_images(:anthropic,
["chart1.png", "chart2.png"],
"Compare these charts and identify trends",
model: "claude-3-5-sonnet-20241022"
)
Embeddings
Generate vector embeddings for semantic search and similarity.
Basic Embeddings
# Generate embeddings for text
{:ok, response} = ExLLM.embeddings(:openai,
["Hello world", "Goodbye world"]
)
# Response structure
%ExLLM.Types.EmbeddingResponse{
embeddings: [
[0.0123, -0.0456, ...], # 1536 dimensions for text-embedding-3-small
[0.0789, -0.0234, ...]
],
model: "text-embedding-3-small",
usage: %{total_tokens: 8}
}
Embedding Options
{:ok, response} = ExLLM.embeddings(:openai, texts,
model: "text-embedding-3-large",
dimensions: 256, # Reduce dimensions (model-specific)
encoding_format: "float" # or "base64"
)
Similarity Search
# Calculate similarity between embeddings
similarity = ExLLM.cosine_similarity(embedding1, embedding2)
# => 0.87 (1.0 = identical, 0.0 = orthogonal, -1.0 = opposite)
# Find similar items
query_embedding = get_embedding("search query")
items = [
%{id: 1, text: "Document 1", embedding: [...]},
%{id: 2, text: "Document 2", embedding: [...]},
# ...
]
results = ExLLM.find_similar(query_embedding, items,
top_k: 10,
threshold: 0.7 # Minimum similarity
)
# => [
# %{item: %{id: 2, ...}, similarity: 0.92},
# %{item: %{id: 5, ...}, similarity: 0.85},
# ...
# ]
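The get_embedding/1 call above is a placeholder. A minimal way to produce the query embedding with ExLLM.embeddings/3 (the model choice here is just an example):
# Embed the query text and take the single returned vector
{:ok, %ExLLM.Types.EmbeddingResponse{embeddings: [query_embedding]}} =
  ExLLM.embeddings(:openai, ["search query"], model: "text-embedding-3-small")

results = ExLLM.find_similar(query_embedding, items, top_k: 10, threshold: 0.7)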
Listing Embedding Models
{:ok, models} = ExLLM.list_embedding_models(:openai)
for model <- models do
IO.puts("#{model.name}: #{model.dimensions} dimensions")
end
Caching Embeddings
# Enable caching for embeddings
{:ok, response} = ExLLM.embeddings(:openai, texts,
cache: true,
cache_ttl: :timer.hours(24)
)
Structured Outputs
Generate structured data with schema validation using Instructor integration.
Basic Structured Output
defmodule EmailClassification do
use Ecto.Schema
embedded_schema do
field :category, Ecto.Enum, values: [:personal, :work, :spam]
field :priority, Ecto.Enum, values: [:high, :medium, :low]
field :summary, :string
end
end
{:ok, result} = ExLLM.chat(:openai,
[%{role: "user", content: "Classify this email: Meeting tomorrow at 3pm"}],
response_model: EmailClassification,
max_retries: 3 # Retry on validation failure
)
IO.inspect(result)
# => %EmailClassification{
# category: :work,
# priority: :high,
# summary: "Meeting scheduled for tomorrow"
# }
Complex Schemas
defmodule ProductExtraction do
use Ecto.Schema
embedded_schema do
field :name, :string
field :price, :decimal
field :currency, :string
field :in_stock, :boolean
embeds_many :features, Feature do
field :name, :string
field :value, :string
end
end
def changeset(struct, params) do
struct
|> Ecto.Changeset.cast(params, [:name, :price, :currency, :in_stock])
|> Ecto.Changeset.cast_embed(:features)
|> Ecto.Changeset.validate_required([:name, :price])
|> Ecto.Changeset.validate_number(:price, greater_than: 0)
end
end
{:ok, product} = ExLLM.chat(:anthropic,
[%{role: "user", content: "Extract product info from: iPhone 15 Pro, $999, 256GB storage, A17 chip"}],
response_model: ProductExtraction
)
Lists and Collections
defmodule TodoList do
use Ecto.Schema
embedded_schema do
embeds_many :todos, Todo do
field :task, :string
field :priority, Ecto.Enum, values: [:high, :medium, :low]
field :completed, :boolean, default: false
end
end
end
{:ok, todo_list} = ExLLM.chat(:openai,
[%{role: "user", content: "Create a todo list for launching a new feature"}],
response_model: TodoList
)
Cost Tracking
ExLLM automatically tracks API costs for all operations.
Automatic Cost Tracking
{:ok, response} = ExLLM.chat(:openai, messages)
# Cost is included in response
IO.inspect(response.cost)
# => %{
# input_cost: 0.00003,
# output_cost: 0.00006,
# total_cost: 0.00009,
# currency: "USD"
# }
# Format for display
IO.puts(ExLLM.format_cost(response.cost.total_cost))
# => "$0.009¢"
Manual Cost Calculation
usage = %{input_tokens: 1000, output_tokens: 500}
cost = ExLLM.calculate_cost(:openai, "gpt-4", usage)
# => %{
# input_cost: 0.03,
# output_cost: 0.06,
# total_cost: 0.09,
# currency: "USD",
# per_million_input: 30.0,
# per_million_output: 120.0
# }
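The figures follow directly from the token counts and the per-million rates in the result:
# input_cost  = 1_000 / 1_000_000 * 30.0  = 0.03
# output_cost =   500 / 1_000_000 * 120.0 = 0.06
# total_cost  = 0.03 + 0.06               = 0.09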
Token Estimation
# Estimate tokens for text
tokens = ExLLM.estimate_tokens("Hello, world!")
# => 4
# Estimate for messages
tokens = ExLLM.estimate_tokens([
%{role: "user", content: "Hi"},
%{role: "assistant", content: "Hello!"}
])
# => 12
Disabling Cost Tracking
{:ok, response} = ExLLM.chat(:openai, messages,
track_cost: false
)
# response.cost will be nil
Error Handling and Retries
Automatic Retries
Retries are enabled by default with exponential backoff:
{:ok, response} = ExLLM.chat(:openai, messages,
retry: true, # Default: true
retry_count: 3, # Default: 3 attempts
retry_delay: 1000, # Default: 1 second initial delay
retry_backoff: :exponential, # or :linear
retry_jitter: true # Add randomness to prevent thundering herd
)
Error Types
case ExLLM.chat(:openai, messages) do
{:ok, response} ->
IO.puts(response.content)
{:error, %ExLLM.Error{type: :rate_limit} = error} ->
IO.puts("Rate limited. Retry after: #{error.retry_after}")
{:error, %ExLLM.Error{type: :invalid_api_key}} ->
IO.puts("Check your API key configuration")
{:error, %ExLLM.Error{type: :context_length_exceeded}} ->
IO.puts("Message too long for model")
{:error, %ExLLM.Error{type: :timeout}} ->
IO.puts("Request timed out")
{:error, error} ->
IO.inspect(error)
end
Custom Retry Logic
defmodule MyApp.RetryHandler do
def with_custom_retry(provider, messages, opts \\ []) do
Enum.reduce_while(1..5, nil, fn attempt, _acc ->
case ExLLM.chat(provider, messages, Keyword.put(opts, :retry, false)) do
{:ok, response} ->
{:halt, {:ok, response}}
{:error, %{type: :rate_limit} = error} when attempt < 5 ->
wait_time = Map.get(error, :retry_after) || :timer.seconds(attempt * 10)
Process.sleep(wait_time)
{:cont, nil}
{:error, _} = error ->
if attempt == 5 do
{:halt, error}
else
Process.sleep(:timer.seconds(attempt))
{:cont, nil}
end
end
end)
end
end
Caching
Cache responses to reduce API calls and costs.
Basic Caching
# Enable caching globally
Application.put_env(:ex_llm, :cache_enabled, true)
# Or per request
{:ok, response} = ExLLM.chat(:openai, messages,
cache: true,
cache_ttl: :timer.minutes(15) # Default: 15 minutes
)
# Same request will use cache
{:ok, cached_response} = ExLLM.chat(:openai, messages, cache: true)
Cache Management
# Clear specific cache entry
ExLLM.Cache.delete(cache_key)
# Clear all cache
ExLLM.Cache.clear()
# Get cache stats
stats = ExLLM.Cache.stats()
# => %{size: 42, hits: 100, misses: 20}
Custom Cache Keys
# Cache key is automatically generated from:
# - Provider
# - Messages
# - Relevant options (model, temperature, etc.)
# You can also use manual cache management
cache_key = ExLLM.Cache.generate_cache_key(:openai, messages, options)
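For example, the generated key can be used to invalidate a single entry (a sketch; options stands for the same options passed with the original request):
# Invalidate one cached entry; the next cached call repopulates it
cache_key = ExLLM.Cache.generate_cache_key(:openai, messages, options)
ExLLM.Cache.delete(cache_key)
{:ok, fresh_response} = ExLLM.chat(:openai, messages, cache: true)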
Response Caching
Cache real provider responses for offline testing and development cost reduction.
ExLLM provides two approaches for response caching:
- Unified Cache System (Recommended) - Extends the runtime cache with optional disk persistence
- Legacy Response Cache - Standalone response collection system
Unified Cache System (Recommended)
The unified cache system extends ExLLM's runtime performance cache with optional disk persistence. This provides both speed benefits and testing capabilities from a single system.
Enabling Unified Cache Persistence
# Method 1: Environment variables (temporary)
export EX_LLM_CACHE_PERSIST=true
export EX_LLM_CACHE_DIR="/path/to/cache" # Optional
# Method 2: Runtime configuration (recommended for tests)
ExLLM.Cache.configure_disk_persistence(true, "/path/to/cache")
# Method 3: Application configuration
config :ex_llm,
cache_persist_disk: true,
cache_disk_path: "/tmp/ex_llm_cache"
Automatic Response Collection with Unified Cache
When persistence is enabled, all cached responses are automatically stored to disk:
# Normal caching usage - responses automatically persist to disk when enabled
{:ok, response} = ExLLM.chat(:openai, messages, cache: true)
{:ok, response} = ExLLM.chat(:anthropic, messages, cache: true)
Benefits of Unified Cache System
- Zero performance impact when persistence is disabled (default)
- Single configuration controls both runtime cache and disk persistence
- Natural development workflow - enable during development, disable in production
- Automatic mock integration - cached responses work seamlessly with Mock adapter
Legacy Response Cache System
For compatibility, the original response cache system is still available:
Enabling Legacy Response Caching
# Enable response caching via environment variables
export EX_LLM_CACHE_RESPONSES=true
export EX_LLM_CACHE_DIR="/path/to/cache" # Optional: defaults to /tmp/ex_llm_cache
Automatic Response Collection
When caching is enabled, all provider responses are automatically stored:
# Normal usage - responses are automatically cached
{:ok, response} = ExLLM.chat(:openai, messages)
{:ok, stream} = ExLLM.stream_chat(:anthropic, messages)
Cache Structure
Responses are organized by provider and endpoint:
/tmp/ex_llm_cache/
├── openai/
│ ├── chat.json # Chat completions
│ └── streaming.json # Streaming responses
├── anthropic/
│ ├── chat.json # Claude messages
│ └── streaming.json # Streaming responses
└── openrouter/
└── chat.json # OpenRouter responses
Manual Response Storage
# Store a specific response
ExLLM.ResponseCache.store_response(
"openai", # Provider
"chat", # Endpoint
%{messages: messages}, # Request data
%{"choices" => [...]} # Response data
)
Mock Adapter Integration
Configure the Mock adapter to replay cached responses from any provider:
Using Unified Cache System
With the unified cache system, responses are automatically available for mock testing when disk persistence is enabled:
# 1. Enable disk persistence during development/testing
ExLLM.Cache.configure_disk_persistence(true, "/tmp/ex_llm_cache")
# 2. Use normal caching to collect responses
{:ok, response} = ExLLM.chat(:openai, messages, cache: true)
{:ok, response} = ExLLM.chat(:anthropic, messages, cache: true)
# 3. Configure mock adapter to use cached responses
ExLLM.ResponseCache.configure_mock_provider(:openai)
# 4. Mock calls now return authentic cached responses
{:ok, response} = ExLLM.chat(:mock, messages)
# Returns real OpenAI response structure and content
# 5. Switch to different provider responses
ExLLM.ResponseCache.configure_mock_provider(:anthropic)
{:ok, response} = ExLLM.chat(:mock, messages)
# Now returns real Anthropic response structure
Using Legacy Response Cache
For compatibility with the original caching approach:
# Enable legacy response caching
export EX_LLM_CACHE_RESPONSES=true
# Use cached OpenAI responses for realistic testing
ExLLM.ResponseCache.configure_mock_provider(:openai)
# Now mock calls return authentic OpenAI responses
{:ok, response} = ExLLM.chat(:mock, messages)
# Returns real OpenAI response structure and content
Response Collection for Testing
Collect comprehensive test scenarios:
# Collect responses for common test cases
ExLLM.CachingInterceptor.create_test_collection(:openai)
# Collect specific scenarios
test_cases = [
{[%{role: "user", content: "Hello"}], []},
{[%{role: "user", content: "What is 2+2?"}], [max_tokens: 10]},
{[%{role: "user", content: "Tell me a joke"}], [temperature: 0.8]}
]
ExLLM.CachingInterceptor.collect_test_responses(:anthropic, test_cases)
Cache Management
# List available cached providers
providers = ExLLM.ResponseCache.list_cached_providers()
# => [{"openai", 15}, {"anthropic", 8}] # {provider, response_count}
# Clear cache for specific provider
ExLLM.ResponseCache.clear_provider_cache("openai")
# Clear all cached responses
ExLLM.ResponseCache.clear_all_cache()
# Get specific cached response
cached = ExLLM.ResponseCache.get_response("openai", "chat", request_data)
Configuration Options
# Environment variables
EX_LLM_CACHE_RESPONSES=true # Enable/disable caching
EX_LLM_CACHE_DIR="/custom/path" # Custom cache directory
# Check if caching is enabled
ExLLM.ResponseCache.caching_enabled?()
# => true
# Get current cache directory
ExLLM.ResponseCache.cache_dir()
# => "/tmp/ex_llm_cache"
Use Cases
Development Testing with Unified Cache:
# 1. Enable disk persistence during development
ExLLM.Cache.configure_disk_persistence(true)
# 2. Use normal caching - responses get collected automatically
{:ok, response} = ExLLM.chat(:openai, messages, cache: true)
{:ok, response} = ExLLM.chat(:anthropic, messages, cache: true)
# 3. Use cached responses in tests
ExLLM.ResponseCache.configure_mock_provider(:openai)
# Tests now use real OpenAI response structures
Development Testing with Legacy Cache:
# 1. Collect responses during development
export EX_LLM_CACHE_RESPONSES=true
# Run your app normally - responses get cached
# 2. Use cached responses in tests
ExLLM.ResponseCache.configure_mock_provider(:openai)
# Tests now use real OpenAI response structures
Cost Reduction:
# Unified cache approach - enable persistence temporarily
ExLLM.Cache.configure_disk_persistence(true)
# Cache expensive model responses during development
{:ok, response} = ExLLM.chat(:openai, messages,
cache: true,
model: "gpt-4o" # Expensive model
)
# Response is cached automatically both in memory and disk
# Later testing uses cached response - no API cost
ExLLM.ResponseCache.configure_mock_provider(:openai)
{:ok, same_response} = ExLLM.chat(:mock, messages)
# Disable persistence for production
ExLLM.Cache.configure_disk_persistence(false)
Cross-Provider Testing:
# Test how your app handles different provider response formats
ExLLM.ResponseCache.configure_mock_provider(:openai)
test_openai_format()
ExLLM.ResponseCache.configure_mock_provider(:anthropic)
test_anthropic_format()
ExLLM.ResponseCache.configure_mock_provider(:openrouter)
test_openrouter_format()
Advanced Usage
Streaming Response Caching:
# Streaming responses are automatically cached
{:ok, stream} = ExLLM.stream_chat(:openai, messages)
chunks = Enum.to_list(stream)
# Later, mock can replay the exact same stream
ExLLM.ResponseCache.configure_mock_provider(:openai)
{:ok, cached_stream} = ExLLM.stream_chat(:mock, messages)
# Returns identical streaming chunks
Interceptor Wrapping:
# Manually wrap API calls for caching
{:ok, response} = ExLLM.CachingInterceptor.with_caching(:openai, fn ->
ExLLM.Adapters.OpenAI.chat(messages)
end)
# Wrap streaming calls
{:ok, stream} = ExLLM.CachingInterceptor.with_streaming_cache(
:anthropic,
messages,
options,
fn -> ExLLM.Adapters.Anthropic.stream_chat(messages, options) end
)
Model Discovery
Finding Models
# Get model information
{:ok, info} = ExLLM.get_model_info(:openai, "gpt-4o")
IO.inspect(info)
# => %ExLLM.ModelCapabilities.ModelInfo{
# id: "gpt-4o",
# context_window: 128000,
# max_output_tokens: 16384,
# capabilities: %{
# vision: %{supported: true},
# function_calling: %{supported: true},
# streaming: %{supported: true},
# ...
# }
# }
# Check specific capability
if ExLLM.model_supports?(:openai, "gpt-4o", :vision) do
# Model supports vision
end
Model Recommendations
# Get recommendations based on requirements
recommendations = ExLLM.recommend_models(
features: [:vision, :function_calling],
min_context_window: 100_000,
max_cost_per_1k_tokens: 1.0,
prefer_local: false,
limit: 5
)
for {provider, model, info} <- recommendations do
IO.puts("#{provider}/#{model}")
IO.puts(" Score: #{info.score}")
IO.puts(" Context: #{info.context_window}")
IO.puts(" Cost: $#{info.cost_per_1k}/1k tokens")
end
Finding Models by Feature
# Find all models with specific features
models = ExLLM.find_models_with_features([:vision, :streaming])
# => [
# {:openai, "gpt-4o"},
# {:anthropic, "claude-3-opus-20240229"},
# ...
# ]
# Group models by capability
grouped = ExLLM.models_by_capability(:vision)
# => %{
# supported: [{:openai, "gpt-4o"}, ...],
# not_supported: [{:openai, "gpt-3.5-turbo"}, ...]
# }
Comparing Models
comparison = ExLLM.compare_models([
{:openai, "gpt-4o"},
{:anthropic, "claude-3-5-sonnet-20241022"},
{:gemini, "gemini-1.5-pro"}
])
# See feature support across models
IO.inspect(comparison.features[:vision])
# => [
# %{model: "gpt-4o", supported: true, details: %{...}},
# %{model: "claude-3-5-sonnet", supported: true, details: %{...}},
# %{model: "gemini-1.5-pro", supported: true, details: %{...}}
# ]
Provider Capabilities
Capability Normalization
ExLLM automatically normalizes different provider terminologies:
# These all work and refer to the same capability
ExLLM.provider_supports?(:openai, :function_calling) # => true
ExLLM.provider_supports?(:anthropic, :tool_use) # => true
ExLLM.provider_supports?(:openai, :tools) # => true
# Find providers using any terminology
ExLLM.find_providers_with_features([:tool_use]) # Works!
ExLLM.find_providers_with_features([:function_calling]) # Also works!
Provider Discovery
# Get provider capabilities
{:ok, caps} = ExLLM.get_provider_capabilities(:openai)
IO.inspect(caps)
# => %ExLLM.ProviderCapabilities.ProviderInfo{
# id: :openai,
# name: "OpenAI",
# endpoints: [:chat, :embeddings, :images, ...],
# features: [:streaming, :function_calling, ...],
# limitations: %{max_file_size: 512MB, ...}
# }
# Find providers by feature
providers = ExLLM.find_providers_with_features([:embeddings, :streaming])
# => [:openai, :gemini, :bedrock, ...]
# Check authentication requirements
if ExLLM.provider_requires_auth?(:openai) do
# Provider needs API key
end
# Check if provider is local
if ExLLM.is_local_provider?(:ollama) do
# No API costs
end
Provider Recommendations
recommendations = ExLLM.recommend_providers(%{
required_features: [:vision, :streaming],
preferred_features: [:embeddings, :function_calling],
exclude_providers: [:mock],
prefer_local: false,
prefer_free: false
})
for %{provider: provider, score: score, matched_features: features} <- recommendations do
IO.puts("#{provider}: #{Float.round(score, 2)}")
IO.puts(" Features: #{Enum.join(features, ", ")}")
end
Comparing Providers
comparison = ExLLM.compare_providers([:openai, :anthropic, :gemini])
# See all features across providers
IO.puts("All features: #{Enum.join(comparison.features, ", ")}")
# Check specific provider capabilities
openai_features = comparison.comparison.openai.features
# => [:streaming, :function_calling, :embeddings, ...]
Logging
ExLLM provides a unified logging system with security features.
Basic Logging
alias ExLLM.Logger
# Log at different levels
Logger.debug("Starting chat request")
Logger.info("Chat completed in #{duration}ms")
Logger.warn("Rate limit approaching")
Logger.error("API request failed", error: reason)
Structured Logging
# Log with metadata
Logger.info("Chat completed",
provider: :openai,
model: "gpt-4o",
tokens: 150,
duration_ms: 523
)
# Context-aware logging
Logger.with_context(request_id: "abc123") do
Logger.info("Processing request")
# All logs in this block include request_id
end
Security Features
# API keys are automatically redacted
Logger.info("Using API key", api_key: "sk-1234567890")
# Logs: "Using API key [api_key: REDACTED]"
# Configure content filtering
Application.put_env(:ex_llm, :log_redact_messages, true)
Configuration
# In config/config.exs
config :ex_llm,
log_level: :info, # Minimum level to log
log_redact_keys: true, # Redact API keys
log_redact_messages: false, # Don't log message content
log_include_metadata: true, # Include structured metadata
log_filter_components: [:cache] # Don't log from cache component
See the Logger User Guide for complete documentation.
Testing with Mock Adapter
The mock adapter helps you test LLM integrations without making real API calls.
Basic Mocking
# Start the mock adapter
{:ok, _} = ExLLM.Adapters.Mock.start_link()
# Configure mock response
{:ok, response} = ExLLM.chat(:mock, messages,
mock_response: "This is a mock response"
)
assert response.content == "This is a mock response"
Dynamic Responses
# Use a handler function
{:ok, response} = ExLLM.chat(:mock, messages,
mock_handler: fn messages, _options ->
last_message = List.last(messages)
%ExLLM.Types.LLMResponse{
content: "You said: #{last_message.content}",
model: "mock-model",
usage: %{input_tokens: 10, output_tokens: 20}
}
end
)
Simulating Errors
# Simulate specific errors
{:error, error} = ExLLM.chat(:mock, messages,
mock_error: %ExLLM.Error{
type: :rate_limit,
message: "Rate limit exceeded",
retry_after: 60
}
)
Streaming Mocks
{:ok, stream} = ExLLM.stream_chat(:mock, messages,
mock_chunks: [
%{content: "Hello"},
%{content: " world"},
%{content: "!", finish_reason: "stop"}
],
chunk_delay: 100 # Milliseconds between chunks
)
for chunk <- stream do
IO.write(chunk.content || "")
end
Request Capture
# Capture requests for assertions
ExLLM.Adapters.Mock.clear_requests()
{:ok, _} = ExLLM.chat(:mock, messages,
capture_requests: true,
mock_response: "OK"
)
requests = ExLLM.Adapters.Mock.get_requests()
assert length(requests) == 1
assert List.first(requests).messages == messages
Advanced Topics
Custom Adapters
Create your own adapter for unsupported providers:
defmodule MyApp.CustomAdapter do
@behaviour ExLLM.Adapter
@impl true
def configured?(options) do
# Check if adapter is properly configured
config = get_config(options)
config[:api_key] != nil
end
@impl true
def default_model() do
"custom-model-v1"
end
@impl true
def chat(messages, options) do
# Implement chat logic
# Return {:ok, %ExLLM.Types.LLMResponse{}} or {:error, reason}
end
@impl true
def stream_chat(messages, options) do
# Return {:ok, stream} where stream yields StreamChunk structs
end
# Optional callbacks
@impl true
def list_models(options) do
# Return {:ok, [%ExLLM.Types.Model{}]}
end
@impl true
def embeddings(inputs, options) do
# Return {:ok, %ExLLM.Types.EmbeddingResponse{}}
end
end
Stream Processing
Advanced stream handling:
defmodule StreamProcessor do
def process_with_buffer(provider, messages, opts) do
{:ok, stream} = ExLLM.stream_chat(provider, messages, opts)
stream
|> Stream.scan("", fn chunk, buffer ->
case chunk do
%{content: nil} -> buffer
%{content: text} -> buffer <> text
end
end)
|> Stream.each(fn buffer ->
# Process complete sentences
if String.ends_with?(buffer, ".") do
IO.puts("\nComplete: #{buffer}")
end
end)
|> Stream.run()
end
end
Token Budget Management
Manage token usage across multiple requests:
defmodule TokenBudget do
use GenServer
def init(budget) do
{:ok, %{budget: budget, used: 0}}
end
def track_usage(pid, tokens) do
GenServer.call(pid, {:track, tokens})
end
def handle_call({:track, tokens}, _from, state) do
new_used = state.used + tokens
if new_used <= state.budget do
{:reply, :ok, %{state | used: new_used}}
else
{:reply, {:error, :budget_exceeded}, state}
end
end
end
# Use with ExLLM
{:ok, budget} = GenServer.start_link(TokenBudget, 10_000)
{:ok, response} = ExLLM.chat(:openai, messages)
:ok = TokenBudget.track_usage(budget, response.usage.total_tokens)
Multi-Provider Routing
Route requests to different providers based on criteria:
defmodule ProviderRouter do
def route_request(messages, requirements) do
cond do
# Use local for development
Mix.env() == :dev ->
ExLLM.chat(:ollama, messages)
# Use Groq for speed-critical requests
requirements[:max_latency_ms] < 1000 ->
ExLLM.chat(:groq, messages)
# Use OpenAI for complex reasoning
requirements[:complexity] == :high ->
ExLLM.chat(:openai, messages, model: "gpt-4o")
# Default to Anthropic
true ->
ExLLM.chat(:anthropic, messages)
end
end
end
Batch Processing
Process multiple requests efficiently:
defmodule BatchProcessor do
def process_batch(items, opts \\ []) do
# Use Task.async_stream for parallel processing
items
|> Task.async_stream(
fn item ->
ExLLM.chat(opts[:provider] || :openai, [
%{role: "user", content: item}
])
end,
max_concurrency: opts[:concurrency] || 5,
timeout: opts[:timeout] || 30_000,
on_timeout: :kill_task # yield {:exit, :timeout} instead of crashing the caller
)
|> Enum.map(fn
{:ok, {:ok, response}} -> {:ok, response}
{:ok, {:error, reason}} -> {:error, reason}
{:exit, reason} -> {:error, {:timeout, reason}}
end)
end
end
Custom Configuration Management
Implement advanced configuration strategies:
defmodule ConfigManager do
use GenServer
def start_link(opts) do
GenServer.start_link(__MODULE__, opts, name: __MODULE__)
end
def init(_opts) do
# Load from multiple sources
config = %{}
|> load_from_env()
|> load_from_file()
|> load_from_vault()
|> validate_config()
{:ok, config}
end
def get_config(provider) do
GenServer.call(__MODULE__, {:get, provider})
end
defp load_from_vault(config) do
# Fetch from HashiCorp Vault, AWS Secrets Manager, etc.
Map.merge(config, fetch_secrets())
end
end
Best Practices
- Always handle errors - LLM APIs can fail for various reasons
- Use streaming for long responses - Better user experience
- Enable caching for repeated queries - Save costs
- Monitor token usage - Stay within budget
- Use appropriate models - Don't use GPT-4 for simple tasks
- Implement fallbacks - Have backup providers ready (see the sketch after this list)
- Test with mocks - Don't make API calls in tests
- Use context management - Handle long conversations properly
- Track costs - Monitor spending across providers
- Follow rate limits - Respect provider limitations
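For the fallback point above, a minimal sketch (the module name and provider order are illustrative):
defmodule MyApp.FallbackChat do
  # Try providers in order, skipping any that are not configured
  def chat(messages, providers \\ [:openai, :anthropic, :ollama]) do
    Enum.reduce_while(providers, {:error, :no_provider_available}, fn provider, acc ->
      with true <- ExLLM.configured?(provider),
           {:ok, response} <- ExLLM.chat(provider, messages) do
        {:halt, {:ok, response}}
      else
        _ -> {:cont, acc}
      end
    end)
  end
end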
Troubleshooting
Common Issues
"API key not found"
- Check environment variables
- Verify configuration provider is started
- Use ExLLM.configured?/1 to debug
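A quick way to see which providers ExLLM considers configured:
for provider <- [:openai, :anthropic, :gemini, :ollama] do
  IO.puts("#{provider} configured? #{ExLLM.configured?(provider)}")
end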
"Context length exceeded"
- Use context management strategies
- Choose models with larger context windows
- Truncate conversation history
"Rate limit exceeded"
- Enable automatic retry
- Implement backoff strategies
- Consider multiple API keys
"Stream interrupted"
- Enable stream recovery
- Implement reconnection logic
- Check network stability
"Invalid response format"
- Check provider documentation
- Verify model capabilities
- Use appropriate options
Debug Mode
Enable debug logging:
# In config
config :ex_llm, :log_level, :debug
# Or at runtime
Logger.configure(level: :debug)
Getting Help
- Check the API documentation
- Review example applications
- Open an issue on GitHub
- Read provider-specific documentation
Additional Resources
- Quick Start Guide - Get started quickly
- Provider Capabilities - Detailed provider information
- Logger Guide - Logging system documentation
- API Reference - Complete API documentation