# ExLLM User Guide

This comprehensive guide covers all features and capabilities of the ExLLM library.

## Table of Contents

1. [Installation and Setup](#installation-and-setup)
2. [Configuration](#configuration)
3. [Basic Usage](#basic-usage)
4. [Providers](#providers)
5. [Chat Completions](#chat-completions)
6. [Streaming](#streaming)
7. [Session Management](#session-management)
8. [Context Management](#context-management)
9. [Function Calling](#function-calling)
10. [Vision and Multimodal](#vision-and-multimodal)
11. [Embeddings](#embeddings)
12. [Structured Outputs](#structured-outputs)
13. [Cost Tracking](#cost-tracking)
14. [Error Handling and Retries](#error-handling-and-retries)
15. [Caching](#caching)
16. [Response Caching](#response-caching)
17. [Model Discovery](#model-discovery)
18. [Provider Capabilities](#provider-capabilities)
19. [Logging](#logging)
20. [Testing with Mock Adapter](#testing-with-mock-adapter)
21. [Advanced Topics](#advanced-topics)
22. [Best Practices](#best-practices)
23. [Troubleshooting](#troubleshooting)
24. [Additional Resources](#additional-resources)

## Installation and Setup

### Adding to Your Project

Add ExLLM to your `mix.exs` dependencies:

```elixir
def deps do
  [
    {:ex_llm, "~> 0.4.1"},

    # Included dependencies (automatically installed with ex_llm):
    # - {:instructor, "~> 0.1.0"} - For structured outputs
    # - {:bumblebee, "~> 0.5"} - For local model inference
    # - {:nx, "~> 0.7"} - For numerical computing

    # Optional hardware acceleration (choose one):
    # {:exla, "~> 0.7"}                                  # For CUDA/ROCm GPUs
    # {:emlx, github: "elixir-nx/emlx", branch: "main"}  # For Apple Silicon
  ]
end
```

Run `mix deps.get` to install the dependencies.

### Dependencies

- **Req**: HTTP client (automatically included)
- **Jason**: JSON parser (automatically included)
- **Instructor**: Structured outputs with schema validation (automatically included)
- **Bumblebee**: Local model inference (automatically included)
- **Nx**: Numerical computing (automatically included)
- **EXLA**: CUDA/ROCm GPU acceleration (optional)
- **EMLX**: Apple Silicon Metal acceleration (optional)

## Configuration

ExLLM supports multiple configuration methods to suit different use cases.

### Environment Variables

The simplest way to configure ExLLM:

```bash
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_API_BASE="https://api.openai.com/v1"  # Optional custom endpoint

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Google Gemini
export GOOGLE_API_KEY="..."
# or
export GEMINI_API_KEY="..."

# Groq
export GROQ_API_KEY="gsk_..."

# OpenRouter
export OPENROUTER_API_KEY="sk-or-..."

# X.AI
export XAI_API_KEY="xai-..."

# Mistral AI
export MISTRAL_API_KEY="..."

# Perplexity
export PERPLEXITY_API_KEY="pplx-..."

# AWS Bedrock
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Ollama
export OLLAMA_API_BASE="http://localhost:11434"

# LM Studio
export LMSTUDIO_API_BASE="http://localhost:1234"
```

### Static Configuration

For more control, use static configuration:

```elixir
config = %{
  openai: %{
    api_key: "sk-...",
    api_base: "https://api.openai.com/v1",
    default_model: "gpt-4o"
  },
  anthropic: %{
    api_key: "sk-ant-...",
    default_model: "claude-3-5-sonnet-20241022"
  }
}

{:ok, provider} = ExLLM.ConfigProvider.Static.start_link(config)

# Use with config_provider option
{:ok, response} = ExLLM.chat(:openai, messages, config_provider: provider)
```

### Custom Configuration Provider

Implement your own configuration provider:

```elixir
defmodule MyApp.ConfigProvider do
  @behaviour ExLLM.ConfigProvider

  def get([:openai, :api_key]), do: fetch_from_vault("openai_key")
  def get([:anthropic, :api_key]), do: fetch_from_vault("anthropic_key")
  def get(_path), do: nil

  def get_all() do
    %{
      openai: %{api_key: fetch_from_vault("openai_key")},
      anthropic: %{api_key: fetch_from_vault("anthropic_key")}
    }
  end
end

# Use it
{:ok, response} = ExLLM.chat(:openai, messages,
  config_provider: MyApp.ConfigProvider
)
```

## Basic Usage

### Simple Chat

```elixir
messages = [
  %{role: "user", content: "Hello, how are you?"}
]

{:ok, response} = ExLLM.chat(:openai, messages)
IO.puts(response.content)
```

### Provider/Model Syntax

```elixir
# Use provider/model string syntax
{:ok, response} = ExLLM.chat("anthropic/claude-3-haiku-20240307", messages)

# Equivalent to
{:ok, response} = ExLLM.chat(:anthropic, messages,
  model: "claude-3-haiku-20240307"
)
```

### Response Structure

```elixir
%ExLLM.Types.LLMResponse{
  content: "I'm doing well, thank you!",
  model: "gpt-4o",
  finish_reason: "stop",
  usage: %{
    input_tokens: 12,
    output_tokens: 8,
    total_tokens: 20
  },
  cost: %{
    input_cost: 0.00006,
    output_cost: 0.00016,
    total_cost: 0.00022,
    currency: "USD"
  }
}
```

## Providers

### Supported Providers

ExLLM supports these providers out of the box:

- **:openai** - OpenAI GPT models
- **:anthropic** - Anthropic Claude models
- **:gemini** - Google Gemini models
- **:groq** - Groq fast inference
- **:mistral** - Mistral AI models
- **:perplexity** - Perplexity search-enhanced models
- **:ollama** - Local models via Ollama
- **:lmstudio** - Local models via LM Studio
- **:bedrock** - AWS Bedrock
- **:openrouter** - OpenRouter (300+ models)
- **:xai** - X.AI Grok models
- **:bumblebee** - Local models via Bumblebee/NX
- **:mock** - Mock adapter for testing

### Checking Provider Configuration

```elixir
# Check if a provider is configured
if ExLLM.configured?(:openai) do
  {:ok, response} = ExLLM.chat(:openai, messages)
end

# Get default model for a provider
model = ExLLM.default_model(:anthropic)
# => "claude-3-5-sonnet-20241022"

# List available models
{:ok, models} = ExLLM.list_models(:openai)
for model <- models do
  IO.puts("#{model.id}: #{model.context_window} tokens")
end
```

## Chat Completions

### Basic Options

```elixir
{:ok, response} = ExLLM.chat(:openai, messages,
  model: "gpt-4o",          # Specific model
  temperature: 0.7,         # 0.0-2.0 for OpenAI (many providers cap at 1.0); higher = more creative
  max_tokens: 1000,         # Max response length
  top_p: 0.9,               # Nucleus sampling
  frequency_penalty: 0.5,   # Reduce repetition
  presence_penalty: 0.5,    # Encourage new topics
  stop: ["\n\n", "END"],    # Stop sequences
  seed: 12345,              # Reproducible outputs
  timeout: 60_000           # Request timeout in ms (default: provider-specific)
)
```

### Timeout Configuration

Different providers have different timeout requirements.
ExLLM allows you to configure timeouts per request: ```elixir # Ollama with function calling (can be slow) {:ok, response} = ExLLM.chat(:ollama, messages, functions: functions, timeout: 300_000 # 5 minutes ) # Quick requests with shorter timeout {:ok, response} = ExLLM.chat(:openai, messages, timeout: 30_000 # 30 seconds ) ``` Default timeouts: - **Ollama**: 120,000ms (2 minutes) - Local models can be slower - **Other providers**: Use their HTTP client defaults (typically 30-60 seconds) ### System Messages ```elixir messages = [ %{role: "system", content: "You are a helpful coding assistant."}, %{role: "user", content: "How do I read a file in Elixir?"} ] {:ok, response} = ExLLM.chat(:openai, messages) ``` ### Multi-turn Conversations ```elixir conversation = [ %{role: "user", content: "What's the capital of France?"}, %{role: "assistant", content: "The capital of France is Paris."}, %{role: "user", content: "What's the population?"} ] {:ok, response} = ExLLM.chat(:openai, conversation) ``` ## Streaming ### Basic Streaming ```elixir {:ok, stream} = ExLLM.stream_chat(:openai, messages) for chunk <- stream do case chunk do %{content: content} when content != nil -> IO.write(content) %{finish_reason: reason} when reason != nil -> IO.puts("\nFinished: #{reason}") _ -> # Other chunk types (role, etc.) :ok end end ``` ### Streaming with Callback ```elixir {:ok, stream} = ExLLM.stream_chat(:openai, messages, on_chunk: fn chunk -> if chunk.content, do: IO.write(chunk.content) end ) # Consume the stream Enum.to_list(stream) ``` ### Collecting Streamed Response ```elixir {:ok, stream} = ExLLM.stream_chat(:openai, messages) # Collect all chunks into a single response full_content = stream |> Enum.map(& &1.content) |> Enum.reject(&is_nil/1) |> Enum.join("") ``` ### Stream Recovery Enable automatic stream recovery for interrupted streams: ```elixir {:ok, stream} = ExLLM.stream_chat(:openai, messages, stream_recovery: true, recovery_strategy: :exact # :exact, :paragraph, or :summarize ) # If stream is interrupted, you can resume {:ok, resumed_stream} = ExLLM.resume_stream(recovery_id) ``` ## Session Management Sessions provide stateful conversation management with automatic token tracking. ### Creating and Using Sessions ```elixir # Create a new session session = ExLLM.new_session(:openai, name: "Customer Support") # Chat with session (automatically manages message history) {:ok, {response, session}} = ExLLM.chat_with_session( session, "What's the weather like?" ) # Continue the conversation {:ok, {response2, session}} = ExLLM.chat_with_session( session, "What should I wear?" 
) # Check token usage total_tokens = ExLLM.session_token_usage(session) IO.puts("Total tokens used: #{total_tokens}") ``` ### Managing Session Messages ```elixir # Add messages manually session = ExLLM.add_session_message(session, "user", "Hello!") session = ExLLM.add_session_message(session, "assistant", "Hi there!") # Get message history messages = ExLLM.get_session_messages(session) recent_10 = ExLLM.get_session_messages(session, 10) # Clear messages but keep session metadata session = ExLLM.clear_session(session) ``` ### Persisting Sessions ```elixir # Save session to JSON {:ok, json} = ExLLM.save_session(session) File.write!("session.json", json) # Load session from JSON {:ok, json} = File.read("session.json") {:ok, restored_session} = ExLLM.load_session(json) ``` ### Session with Context ```elixir # Create session with default context session = ExLLM.new_session(:openai, name: "Tech Support", context: %{ temperature: 0.3, system_message: "You are a technical support agent." } ) # Context is automatically applied to all chats {:ok, {response, session}} = ExLLM.chat_with_session(session, "Help!") ``` ## Context Management Automatically manage conversation context to fit within model limits. ### Context Window Validation ```elixir # Check if messages fit in context window case ExLLM.validate_context(messages, provider: :openai, model: "gpt-4") do {:ok, token_count} -> IO.puts("Messages use #{token_count} tokens") {:error, reason} -> IO.puts("Messages too large: #{reason}") end # Get context window size for a model window_size = ExLLM.context_window_size(:anthropic, "claude-3-opus-20240229") # => 200000 ``` ### Automatic Message Truncation ```elixir # Prepare messages to fit in context window truncated = ExLLM.prepare_messages(long_conversation, provider: :openai, model: "gpt-4", max_tokens: 4000, # Reserve tokens for response strategy: :sliding_window, # or :smart preserve_messages: 5 # Always keep last 5 messages ) ``` ### Truncation Strategies 1. **:sliding_window** - Keep most recent messages 2. **:smart** - Preserve system messages and recent context ```elixir # Smart truncation preserves important context {:ok, response} = ExLLM.chat(:openai, very_long_conversation, strategy: :smart, preserve_messages: 10 ) ``` ### Context Statistics ```elixir stats = ExLLM.context_stats(messages) # => %{ # message_count: 20, # total_tokens: 1500, # by_role: %{"user" => 10, "assistant" => 9, "system" => 1}, # avg_tokens_per_message: 75 # } ``` ## Function Calling Enable AI models to call functions/tools in your application. ### Basic Function Calling ```elixir # Define available functions functions = [ %{ name: "get_weather", description: "Get current weather for a location", parameters: %{ type: "object", properties: %{ location: %{ type: "string", description: "City and state, e.g. 
San Francisco, CA"
        },
        unit: %{
          type: "string",
          enum: ["celsius", "fahrenheit"],
          description: "Temperature unit"
        }
      },
      required: ["location"]
    }
  }
]

# Let the AI decide when to call functions
{:ok, response} = ExLLM.chat(:openai,
  [%{role: "user", content: "What's the weather in NYC?"}],
  functions: functions,
  function_call: "auto"  # or "none" or %{name: "get_weather"}
)
```

### Handling Function Calls

```elixir
# Parse function calls from response
case ExLLM.parse_function_calls(response, :openai) do
  {:ok, [function_call | _]} ->
    # AI wants to call a function
    IO.inspect(function_call)
    # => %ExLLM.FunctionCalling.FunctionCall{
    #   name: "get_weather",
    #   arguments: %{"location" => "New York, NY"}
    # }

    # Execute the function
    result = get_weather_impl(function_call.arguments["location"])

    # Format result for conversation
    function_message = ExLLM.format_function_result(
      %ExLLM.FunctionCalling.FunctionResult{
        name: "get_weather",
        result: result
      },
      :openai
    )

    # Continue conversation with function result
    # (response_message is the assistant message that requested the call)
    messages = messages ++ [response_message, function_message]
    {:ok, final_response} = ExLLM.chat(:openai, messages)

  {:ok, []} ->
    # No function call, regular response
    IO.puts(response.content)
end
```

### Function Execution

```elixir
# Define functions with handlers
functions_with_handlers = [
  %{
    name: "calculate",
    description: "Perform mathematical calculations",
    parameters: %{
      type: "object",
      properties: %{
        expression: %{type: "string"}
      },
      required: ["expression"]
    },
    handler: fn args ->
      # Your implementation
      # Warning: Code.eval_string executes arbitrary code -
      # never eval untrusted input outside of an example like this
      {result, _} = Code.eval_string(args["expression"])
      %{result: result}
    end
  }
]

# Execute function automatically
{:ok, result} = ExLLM.execute_function(function_call, functions_with_handlers)
```

### Provider-Specific Notes

Different providers use different terminology:

- OpenAI: "functions" and "function_call"
- Anthropic: "tools" and "tool_use"
- ExLLM normalizes these automatically

## Vision and Multimodal

Work with images and other media types.

### Basic Image Analysis

```elixir
# Create a vision message
{:ok, message} = ExLLM.vision_message(
  "What's in this image?",
  ["path/to/image.jpg"]
)

# Send to vision-capable model
{:ok, response} = ExLLM.chat(:openai, [message],
  model: "gpt-4o"  # or any vision model
)
```

### Multiple Images

```elixir
{:ok, message} = ExLLM.vision_message(
  "Compare these images",
  [
    "image1.jpg",
    "image2.jpg",
    "https://example.com/image3.png"  # URLs work too
  ],
  detail: :high  # :low, :high, or :auto
)
```

### Loading Images

```elixir
# Load image with options
{:ok, image_part} = ExLLM.load_image("photo.jpg",
  detail: :high,
  resize: {1024, 1024}  # Optional resizing
)

# Build custom message
message = %{
  role: "user",
  content: [
    %{type: "text", text: "Describe this image"},
    image_part
  ]
}
```

### Checking Vision Support

```elixir
# Check if provider/model supports vision
if ExLLM.supports_vision?(:anthropic, "claude-3-opus-20240229") do
  # This model supports vision
end

# Find all vision-capable models
vision_models = ExLLM.find_models_with_features([:vision])
```

### Text Extraction from Images

```elixir
# OCR-like functionality
{:ok, text} = ExLLM.extract_text_from_image(:openai, "document.png",
  model: "gpt-4o",
  prompt: "Extract all text, preserving formatting and layout"
)
```

### Image Analysis

```elixir
# Analyze multiple images
{:ok, analysis} = ExLLM.analyze_images(:anthropic,
  ["chart1.png", "chart2.png"],
  "Compare these charts and identify trends",
  model: "claude-3-5-sonnet-20241022"
)
```

## Embeddings

Generate vector embeddings for semantic search and similarity.
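Similarity between two embeddings is usually measured with cosine similarity: the dot product of the vectors divided by the product of their magnitudes, giving a score from -1.0 to 1.0. The sketch below computes it from first principles so the scores in this section are easy to interpret; the `CosineSketch` module is illustrative only, and in practice you would use the built-in `ExLLM.cosine_similarity/2` shown under Similarity Search.

```elixir
# Minimal cosine similarity, for illustration only -
# ExLLM provides this as ExLLM.cosine_similarity/2.
defmodule CosineSketch do
  def similarity(a, b) do
    dot = Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    dot / (norm(a) * norm(b))
  end

  defp norm(v), do: :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end))
end

CosineSketch.similarity([1.0, 0.0], [0.0, 1.0])  # => 0.0 (orthogonal)
CosineSketch.similarity([1.0, 2.0], [2.0, 4.0])  # => 1.0 (same direction)
```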
### Basic Embeddings ```elixir # Generate embeddings for text {:ok, response} = ExLLM.embeddings(:openai, ["Hello world", "Goodbye world"] ) # Response structure %ExLLM.Types.EmbeddingResponse{ embeddings: [ [0.0123, -0.0456, ...], # 1536 dimensions for text-embedding-3-small [0.0789, -0.0234, ...] ], model: "text-embedding-3-small", usage: %{total_tokens: 8} } ``` ### Embedding Options ```elixir {:ok, response} = ExLLM.embeddings(:openai, texts, model: "text-embedding-3-large", dimensions: 256, # Reduce dimensions (model-specific) encoding_format: "float" # or "base64" ) ``` ### Similarity Search ```elixir # Calculate similarity between embeddings similarity = ExLLM.cosine_similarity(embedding1, embedding2) # => 0.87 (1.0 = identical, 0.0 = orthogonal, -1.0 = opposite) # Find similar items query_embedding = get_embedding("search query") items = [ %{id: 1, text: "Document 1", embedding: [...]}, %{id: 2, text: "Document 2", embedding: [...]}, # ... ] results = ExLLM.find_similar(query_embedding, items, top_k: 10, threshold: 0.7 # Minimum similarity ) # => [ # %{item: %{id: 2, ...}, similarity: 0.92}, # %{item: %{id: 5, ...}, similarity: 0.85}, # ... # ] ``` ### Listing Embedding Models ```elixir {:ok, models} = ExLLM.list_embedding_models(:openai) for model <- models do IO.puts("#{model.name}: #{model.dimensions} dimensions") end ``` ### Caching Embeddings ```elixir # Enable caching for embeddings {:ok, response} = ExLLM.embeddings(:openai, texts, cache: true, cache_ttl: :timer.hours(24) ) ``` ## Structured Outputs Generate structured data with schema validation using Instructor integration. ### Basic Structured Output ```elixir defmodule EmailClassification do use Ecto.Schema embedded_schema do field :category, Ecto.Enum, values: [:personal, :work, :spam] field :priority, Ecto.Enum, values: [:high, :medium, :low] field :summary, :string end end {:ok, result} = ExLLM.chat(:openai, [%{role: "user", content: "Classify this email: Meeting tomorrow at 3pm"}], response_model: EmailClassification, max_retries: 3 # Retry on validation failure ) IO.inspect(result) # => %EmailClassification{ # category: :work, # priority: :high, # summary: "Meeting scheduled for tomorrow" # } ``` ### Complex Schemas ```elixir defmodule ProductExtraction do use Ecto.Schema embedded_schema do field :name, :string field :price, :decimal field :currency, :string field :in_stock, :boolean embeds_many :features, Feature do field :name, :string field :value, :string end end def changeset(struct, params) do struct |> Ecto.Changeset.cast(params, [:name, :price, :currency, :in_stock]) |> Ecto.Changeset.cast_embed(:features) |> Ecto.Changeset.validate_required([:name, :price]) |> Ecto.Changeset.validate_number(:price, greater_than: 0) end end {:ok, product} = ExLLM.chat(:anthropic, [%{role: "user", content: "Extract product info from: iPhone 15 Pro, $999, 256GB storage, A17 chip"}], response_model: ProductExtraction ) ``` ### Lists and Collections ```elixir defmodule TodoList do use Ecto.Schema embedded_schema do embeds_many :todos, Todo do field :task, :string field :priority, Ecto.Enum, values: [:high, :medium, :low] field :completed, :boolean, default: false end end end {:ok, todo_list} = ExLLM.chat(:openai, [%{role: "user", content: "Create a todo list for launching a new feature"}], response_model: TodoList ) ``` ## Cost Tracking ExLLM automatically tracks API costs for all operations. 
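Each cost figure is simple arithmetic over per-million-token rates: `cost = tokens / 1_000_000 * rate_per_million`, computed separately for input and output and then summed. The snippet below reproduces the numbers used in the Manual Cost Calculation section that follows (the $30/$120 rates are taken from that example, not from a live price list):

```elixir
# Reproducing the Manual Cost Calculation figures by hand
# (rates are illustrative, taken from the example below)
input_cost  = 1_000 / 1_000_000 * 30.0   # => 0.03 USD
output_cost =   500 / 1_000_000 * 120.0  # => 0.06 USD
total_cost  = input_cost + output_cost   # => 0.09 USD
```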
### Automatic Cost Tracking

```elixir
{:ok, response} = ExLLM.chat(:openai, messages)

# Cost is included in response
IO.inspect(response.cost)
# => %{
#   input_cost: 0.00003,
#   output_cost: 0.00006,
#   total_cost: 0.00009,
#   currency: "USD"
# }

# Format for display
IO.puts(ExLLM.format_cost(response.cost.total_cost))
# => "0.009¢"
```

### Manual Cost Calculation

```elixir
usage = %{input_tokens: 1000, output_tokens: 500}
cost = ExLLM.calculate_cost(:openai, "gpt-4", usage)
# => %{
#   input_cost: 0.03,
#   output_cost: 0.06,
#   total_cost: 0.09,
#   currency: "USD",
#   per_million_input: 30.0,
#   per_million_output: 120.0
# }
```

### Token Estimation

```elixir
# Estimate tokens for text
tokens = ExLLM.estimate_tokens("Hello, world!")
# => 4

# Estimate for messages
tokens = ExLLM.estimate_tokens([
  %{role: "user", content: "Hi"},
  %{role: "assistant", content: "Hello!"}
])
# => 12
```

### Disabling Cost Tracking

```elixir
{:ok, response} = ExLLM.chat(:openai, messages,
  track_cost: false
)
# response.cost will be nil
```

## Error Handling and Retries

### Automatic Retries

Retries are enabled by default with exponential backoff:

```elixir
{:ok, response} = ExLLM.chat(:openai, messages,
  retry: true,                 # Default: true
  retry_count: 3,              # Default: 3 attempts
  retry_delay: 1000,           # Default: 1 second initial delay
  retry_backoff: :exponential, # or :linear
  retry_jitter: true           # Add randomness to prevent thundering herd
)
```

### Error Types

```elixir
case ExLLM.chat(:openai, messages) do
  {:ok, response} ->
    IO.puts(response.content)

  {:error, %ExLLM.Error{type: :rate_limit} = error} ->
    IO.puts("Rate limited. Retry after: #{error.retry_after}")

  {:error, %ExLLM.Error{type: :invalid_api_key}} ->
    IO.puts("Check your API key configuration")

  {:error, %ExLLM.Error{type: :context_length_exceeded}} ->
    IO.puts("Message too long for model")

  {:error, %ExLLM.Error{type: :timeout}} ->
    IO.puts("Request timed out")

  {:error, error} ->
    IO.inspect(error)
end
```

### Custom Retry Logic

```elixir
defmodule MyApp.RetryHandler do
  def with_custom_retry(provider, messages, opts \\ []) do
    Enum.reduce_while(1..5, nil, fn attempt, _acc ->
      case ExLLM.chat(provider, messages, Keyword.put(opts, :retry, false)) do
        {:ok, response} ->
          {:halt, {:ok, response}}

        {:error, %{type: :rate_limit} = error} ->
          # Map.get works for both plain maps and structs such as %ExLLM.Error{}
          wait_time = Map.get(error, :retry_after) || :timer.seconds(attempt * 10)
          Process.sleep(wait_time)
          {:cont, nil}

        {:error, _} = error ->
          if attempt == 5 do
            {:halt, error}
          else
            Process.sleep(:timer.seconds(attempt))
            {:cont, nil}
          end
      end
    end)
  end
end
```

## Caching

Cache responses to reduce API calls and costs.

### Basic Caching

```elixir
# Enable caching globally
Application.put_env(:ex_llm, :cache_enabled, true)

# Or per request
{:ok, response} = ExLLM.chat(:openai, messages,
  cache: true,
  cache_ttl: :timer.minutes(15)  # Default: 15 minutes
)

# Same request will use cache
{:ok, cached_response} = ExLLM.chat(:openai, messages, cache: true)
```

### Cache Management

```elixir
# Clear specific cache entry
ExLLM.Cache.delete(cache_key)

# Clear all cache
ExLLM.Cache.clear()

# Get cache stats
stats = ExLLM.Cache.stats()
# => %{size: 42, hits: 100, misses: 20}
```

### Custom Cache Keys

```elixir
# Cache key is automatically generated from:
# - Provider
# - Messages
# - Relevant options (model, temperature, etc.)

# You can also use manual cache management
cache_key = ExLLM.Cache.generate_cache_key(:openai, messages, options)
```

## Response Caching

Cache real provider responses for offline testing and development cost reduction.
ExLLM provides two approaches for response caching:

1. **Unified Cache System** (Recommended) - Extends the runtime cache with optional disk persistence
2. **Legacy Response Cache** - Standalone response collection system

### Unified Cache System (Recommended)

The unified cache system extends ExLLM's runtime performance cache with optional disk persistence. This provides both speed benefits and testing capabilities from a single system.

#### Enabling Unified Cache Persistence

```bash
# Method 1: Environment variables (temporary)
export EX_LLM_CACHE_PERSIST=true
export EX_LLM_CACHE_DIR="/path/to/cache"  # Optional
```

```elixir
# Method 2: Runtime configuration (recommended for tests)
ExLLM.Cache.configure_disk_persistence(true, "/path/to/cache")

# Method 3: Application configuration
config :ex_llm,
  cache_persist_disk: true,
  cache_disk_path: "/tmp/ex_llm_cache"
```

#### Automatic Response Collection with Unified Cache

When persistence is enabled, all cached responses are automatically stored to disk:

```elixir
# Normal caching usage - responses automatically persist to disk when enabled
{:ok, response} = ExLLM.chat(:openai, messages, cache: true)
{:ok, response} = ExLLM.chat(:anthropic, messages, cache: true)
```

#### Benefits of Unified Cache System

- **Zero performance impact** when persistence is disabled (default)
- **Single configuration** controls both runtime cache and disk persistence
- **Natural development workflow** - enable during development, disable in production
- **Automatic mock integration** - cached responses work seamlessly with Mock adapter

### Legacy Response Cache System

For compatibility, the original response cache system is still available:

#### Enabling Legacy Response Caching

```bash
# Enable response caching via environment variables
export EX_LLM_CACHE_RESPONSES=true
export EX_LLM_CACHE_DIR="/path/to/cache"  # Optional: defaults to /tmp/ex_llm_cache
```

### Automatic Response Collection

When caching is enabled, all provider responses are automatically stored:

```elixir
# Normal usage - responses are automatically cached
{:ok, response} = ExLLM.chat(:openai, messages)
{:ok, stream} = ExLLM.stream_chat(:anthropic, messages)
```

### Cache Structure

Responses are organized by provider and endpoint:

```
/tmp/ex_llm_cache/
├── openai/
│   ├── chat.json          # Chat completions
│   └── streaming.json     # Streaming responses
├── anthropic/
│   ├── chat.json          # Claude messages
│   └── streaming.json     # Streaming responses
└── openrouter/
    └── chat.json          # OpenRouter responses
```

### Manual Response Storage

```elixir
# Store a specific response
ExLLM.ResponseCache.store_response(
  "openai",               # Provider
  "chat",                 # Endpoint
  %{messages: messages},  # Request data
  %{"choices" => [...]}   # Response data
)
```

### Mock Adapter Integration

Configure the Mock adapter to replay cached responses from any provider:

#### Using Unified Cache System

With the unified cache system, responses are automatically available for mock testing when disk persistence is enabled:

```elixir
# 1. Enable disk persistence during development/testing
ExLLM.Cache.configure_disk_persistence(true, "/tmp/ex_llm_cache")

# 2. Use normal caching to collect responses
{:ok, response} = ExLLM.chat(:openai, messages, cache: true)
{:ok, response} = ExLLM.chat(:anthropic, messages, cache: true)

# 3. Configure mock adapter to use cached responses
ExLLM.ResponseCache.configure_mock_provider(:openai)

# 4. Mock calls now return authentic cached responses
{:ok, response} = ExLLM.chat(:mock, messages)
# Returns real OpenAI response structure and content

# 5. Switch to different provider responses
ExLLM.ResponseCache.configure_mock_provider(:anthropic)
{:ok, response} = ExLLM.chat(:mock, messages)
# Now returns real Anthropic response structure
```

#### Using Legacy Response Cache

For compatibility with the original caching approach:

```bash
# Enable legacy response caching
export EX_LLM_CACHE_RESPONSES=true
```

```elixir
# Use cached OpenAI responses for realistic testing
ExLLM.ResponseCache.configure_mock_provider(:openai)

# Now mock calls return authentic OpenAI responses
{:ok, response} = ExLLM.chat(:mock, messages)
# Returns real OpenAI response structure and content
```

### Response Collection for Testing

Collect comprehensive test scenarios:

```elixir
# Collect responses for common test cases
ExLLM.CachingInterceptor.create_test_collection(:openai)

# Collect specific scenarios
test_cases = [
  {[%{role: "user", content: "Hello"}], []},
  {[%{role: "user", content: "What is 2+2?"}], [max_tokens: 10]},
  {[%{role: "user", content: "Tell me a joke"}], [temperature: 0.8]}
]

ExLLM.CachingInterceptor.collect_test_responses(:anthropic, test_cases)
```

### Cache Management

```elixir
# List available cached providers
providers = ExLLM.ResponseCache.list_cached_providers()
# => [{"openai", 15}, {"anthropic", 8}]  # {provider, response_count}

# Clear cache for specific provider
ExLLM.ResponseCache.clear_provider_cache("openai")

# Clear all cached responses
ExLLM.ResponseCache.clear_all_cache()

# Get specific cached response
cached = ExLLM.ResponseCache.get_response("openai", "chat", request_data)
```

### Configuration Options

```bash
# Environment variables
EX_LLM_CACHE_RESPONSES=true      # Enable/disable caching
EX_LLM_CACHE_DIR="/custom/path"  # Custom cache directory
```

```elixir
# Check if caching is enabled
ExLLM.ResponseCache.caching_enabled?()
# => true

# Get current cache directory
ExLLM.ResponseCache.cache_dir()
# => "/tmp/ex_llm_cache"
```

### Use Cases

**Development Testing with Unified Cache:**

```elixir
# 1. Enable disk persistence during development
ExLLM.Cache.configure_disk_persistence(true)

# 2. Use normal caching - responses get collected automatically
{:ok, response} = ExLLM.chat(:openai, messages, cache: true)
{:ok, response} = ExLLM.chat(:anthropic, messages, cache: true)

# 3. Use cached responses in tests
ExLLM.ResponseCache.configure_mock_provider(:openai)
# Tests now use real OpenAI response structures
```

**Development Testing with Legacy Cache:**

```bash
# 1. Collect responses during development
export EX_LLM_CACHE_RESPONSES=true
# Run your app normally - responses get cached
```

```elixir
# 2. Use cached responses in tests
ExLLM.ResponseCache.configure_mock_provider(:openai)
# Tests now use real OpenAI response structures
```

**Cost Reduction:**

```elixir
# Unified cache approach - enable persistence temporarily
ExLLM.Cache.configure_disk_persistence(true)

# Cache expensive model responses during development
{:ok, response} = ExLLM.chat(:openai, messages,
  cache: true,
  model: "gpt-4o"  # Expensive model
)
# Response is cached automatically both in memory and on disk

# Later testing uses cached response - no API cost
ExLLM.ResponseCache.configure_mock_provider(:openai)
{:ok, same_response} = ExLLM.chat(:mock, messages)

# Disable persistence for production
ExLLM.Cache.configure_disk_persistence(false)
```

**Cross-Provider Testing:**

```elixir
# Test how your app handles different provider response formats
ExLLM.ResponseCache.configure_mock_provider(:openai)
test_openai_format()

ExLLM.ResponseCache.configure_mock_provider(:anthropic)
test_anthropic_format()

ExLLM.ResponseCache.configure_mock_provider(:openrouter)
test_openrouter_format()
```

### Advanced Usage

**Streaming Response Caching:**

```elixir
# Streaming responses are automatically cached
{:ok, stream} = ExLLM.stream_chat(:openai, messages)
chunks = Enum.to_list(stream)

# Later, mock can replay the exact same stream
ExLLM.ResponseCache.configure_mock_provider(:openai)
{:ok, cached_stream} = ExLLM.stream_chat(:mock, messages)
# Returns identical streaming chunks
```

**Interceptor Wrapping:**

```elixir
# Manually wrap API calls for caching
{:ok, response} = ExLLM.CachingInterceptor.with_caching(:openai, fn ->
  ExLLM.Adapters.OpenAI.chat(messages)
end)

# Wrap streaming calls
{:ok, stream} = ExLLM.CachingInterceptor.with_streaming_cache(
  :anthropic, messages, options, fn ->
    ExLLM.Adapters.Anthropic.stream_chat(messages, options)
  end
)
```

## Model Discovery

### Finding Models

```elixir
# Get model information
{:ok, info} = ExLLM.get_model_info(:openai, "gpt-4o")

IO.inspect(info)
# => %ExLLM.ModelCapabilities.ModelInfo{
#   id: "gpt-4o",
#   context_window: 128000,
#   max_output_tokens: 16384,
#   capabilities: %{
#     vision: %{supported: true},
#     function_calling: %{supported: true},
#     streaming: %{supported: true},
#     ...
#   }
# }

# Check specific capability
if ExLLM.model_supports?(:openai, "gpt-4o", :vision) do
  # Model supports vision
end
```

### Model Recommendations

```elixir
# Get recommendations based on requirements
recommendations = ExLLM.recommend_models(
  features: [:vision, :function_calling],
  min_context_window: 100_000,
  max_cost_per_1k_tokens: 1.0,
  prefer_local: false,
  limit: 5
)

for {provider, model, info} <- recommendations do
  IO.puts("#{provider}/#{model}")
  IO.puts("  Score: #{info.score}")
  IO.puts("  Context: #{info.context_window}")
  IO.puts("  Cost: $#{info.cost_per_1k}/1k tokens")
end
```

### Finding Models by Feature

```elixir
# Find all models with specific features
models = ExLLM.find_models_with_features([:vision, :streaming])
# => [
#   {:openai, "gpt-4o"},
#   {:anthropic, "claude-3-opus-20240229"},
#   ...
# ]

# Group models by capability
grouped = ExLLM.models_by_capability(:vision)
# => %{
#   supported: [{:openai, "gpt-4o"}, ...],
#   not_supported: [{:openai, "gpt-3.5-turbo"}, ...]
# } ``` ### Comparing Models ```elixir comparison = ExLLM.compare_models([ {:openai, "gpt-4o"}, {:anthropic, "claude-3-5-sonnet-20241022"}, {:gemini, "gemini-1.5-pro"} ]) # See feature support across models IO.inspect(comparison.features[:vision]) # => [ # %{model: "gpt-4o", supported: true, details: %{...}}, # %{model: "claude-3-5-sonnet", supported: true, details: %{...}}, # %{model: "gemini-1.5-pro", supported: true, details: %{...}} # ] ``` ## Provider Capabilities ### Capability Normalization ExLLM automatically normalizes different provider terminologies: ```elixir # These all work and refer to the same capability ExLLM.provider_supports?(:openai, :function_calling) # => true ExLLM.provider_supports?(:anthropic, :tool_use) # => true ExLLM.provider_supports?(:openai, :tools) # => true # Find providers using any terminology ExLLM.find_providers_with_features([:tool_use]) # Works! ExLLM.find_providers_with_features([:function_calling]) # Also works! ``` ### Provider Discovery ```elixir # Get provider capabilities {:ok, caps} = ExLLM.get_provider_capabilities(:openai) IO.inspect(caps) # => %ExLLM.ProviderCapabilities.ProviderInfo{ # id: :openai, # name: "OpenAI", # endpoints: [:chat, :embeddings, :images, ...], # features: [:streaming, :function_calling, ...], # limitations: %{max_file_size: 512MB, ...} # } # Find providers by feature providers = ExLLM.find_providers_with_features([:embeddings, :streaming]) # => [:openai, :gemini, :bedrock, ...] # Check authentication requirements if ExLLM.provider_requires_auth?(:openai) do # Provider needs API key end # Check if provider is local if ExLLM.is_local_provider?(:ollama) do # No API costs end ``` ### Provider Recommendations ```elixir recommendations = ExLLM.recommend_providers(%{ required_features: [:vision, :streaming], preferred_features: [:embeddings, :function_calling], exclude_providers: [:mock], prefer_local: false, prefer_free: false }) for %{provider: provider, score: score, matched_features: features} <- recommendations do IO.puts("#{provider}: #{Float.round(score, 2)}") IO.puts(" Features: #{Enum.join(features, ", ")}") end ``` ### Comparing Providers ```elixir comparison = ExLLM.compare_providers([:openai, :anthropic, :gemini]) # See all features across providers IO.puts("All features: #{Enum.join(comparison.features, ", ")}") # Check specific provider capabilities openai_features = comparison.comparison.openai.features # => [:streaming, :function_calling, :embeddings, ...] ``` ## Logging ExLLM provides a unified logging system with security features. 
### Basic Logging ```elixir alias ExLLM.Logger # Log at different levels Logger.debug("Starting chat request") Logger.info("Chat completed in #{duration}ms") Logger.warn("Rate limit approaching") Logger.error("API request failed", error: reason) ``` ### Structured Logging ```elixir # Log with metadata Logger.info("Chat completed", provider: :openai, model: "gpt-4o", tokens: 150, duration_ms: 523 ) # Context-aware logging Logger.with_context(request_id: "abc123") do Logger.info("Processing request") # All logs in this block include request_id end ``` ### Security Features ```elixir # API keys are automatically redacted Logger.info("Using API key", api_key: "sk-1234567890") # Logs: "Using API key [api_key: REDACTED]" # Configure content filtering Application.put_env(:ex_llm, :log_redact_messages, true) ``` ### Configuration ```elixir # In config/config.exs config :ex_llm, log_level: :info, # Minimum level to log log_redact_keys: true, # Redact API keys log_redact_messages: false, # Don't log message content log_include_metadata: true, # Include structured metadata log_filter_components: [:cache] # Don't log from cache component ``` See the [Logger User Guide](LOGGER.md) for complete documentation. ## Testing with Mock Adapter The mock adapter helps you test LLM integrations without making real API calls. ### Basic Mocking ```elixir # Start the mock adapter {:ok, _} = ExLLM.Adapters.Mock.start_link() # Configure mock response {:ok, response} = ExLLM.chat(:mock, messages, mock_response: "This is a mock response" ) assert response.content == "This is a mock response" ``` ### Dynamic Responses ```elixir # Use a handler function {:ok, response} = ExLLM.chat(:mock, messages, mock_handler: fn messages, _options -> last_message = List.last(messages) %ExLLM.Types.LLMResponse{ content: "You said: #{last_message.content}", model: "mock-model", usage: %{input_tokens: 10, output_tokens: 20} } end ) ``` ### Simulating Errors ```elixir # Simulate specific errors {:error, error} = ExLLM.chat(:mock, messages, mock_error: %ExLLM.Error{ type: :rate_limit, message: "Rate limit exceeded", retry_after: 60 } ) ``` ### Streaming Mocks ```elixir {:ok, stream} = ExLLM.stream_chat(:mock, messages, mock_chunks: [ %{content: "Hello"}, %{content: " world"}, %{content: "!", finish_reason: "stop"} ], chunk_delay: 100 # Milliseconds between chunks ) for chunk <- stream do IO.write(chunk.content || "") end ``` ### Request Capture ```elixir # Capture requests for assertions ExLLM.Adapters.Mock.clear_requests() {:ok, _} = ExLLM.chat(:mock, messages, capture_requests: true, mock_response: "OK" ) requests = ExLLM.Adapters.Mock.get_requests() assert length(requests) == 1 assert List.first(requests).messages == messages ``` ## Advanced Topics ### Custom Adapters Create your own adapter for unsupported providers: ```elixir defmodule MyApp.CustomAdapter do @behaviour ExLLM.Adapter @impl true def configured?(options) do # Check if adapter is properly configured config = get_config(options) config[:api_key] != nil end @impl true def default_model() do "custom-model-v1" end @impl true def chat(messages, options) do # Implement chat logic # Return {:ok, %ExLLM.Types.LLMResponse{}} or {:error, reason} end @impl true def stream_chat(messages, options) do # Return {:ok, stream} where stream yields StreamChunk structs end # Optional callbacks @impl true def list_models(options) do # Return {:ok, [%ExLLM.Types.Model{}]} end @impl true def embeddings(inputs, options) do # Return {:ok, %ExLLM.Types.EmbeddingResponse{}} end end ``` ### Stream 
Processing Advanced stream handling: ```elixir defmodule StreamProcessor do def process_with_buffer(provider, messages, opts) do {:ok, stream} = ExLLM.stream_chat(provider, messages, opts) stream |> Stream.scan("", fn chunk, buffer -> case chunk do %{content: nil} -> buffer %{content: text} -> buffer <> text end end) |> Stream.each(fn buffer -> # Process complete sentences if String.ends_with?(buffer, ".") do IO.puts("\nComplete: #{buffer}") end end) |> Stream.run() end end ``` ### Token Budget Management Manage token usage across multiple requests: ```elixir defmodule TokenBudget do use GenServer def init(budget) do {:ok, %{budget: budget, used: 0}} end def track_usage(pid, tokens) do GenServer.call(pid, {:track, tokens}) end def handle_call({:track, tokens}, _from, state) do new_used = state.used + tokens if new_used <= state.budget do {:reply, :ok, %{state | used: new_used}} else {:reply, {:error, :budget_exceeded}, state} end end end # Use with ExLLM {:ok, budget} = GenServer.start_link(TokenBudget, 10_000) {:ok, response} = ExLLM.chat(:openai, messages) :ok = TokenBudget.track_usage(budget, response.usage.total_tokens) ``` ### Multi-Provider Routing Route requests to different providers based on criteria: ```elixir defmodule ProviderRouter do def route_request(messages, requirements) do cond do # Use local for development Mix.env() == :dev -> ExLLM.chat(:ollama, messages) # Use Groq for speed-critical requests requirements[:max_latency_ms] < 1000 -> ExLLM.chat(:groq, messages) # Use OpenAI for complex reasoning requirements[:complexity] == :high -> ExLLM.chat(:openai, messages, model: "gpt-4o") # Default to Anthropic true -> ExLLM.chat(:anthropic, messages) end end end ``` ### Batch Processing Process multiple requests efficiently: ```elixir defmodule BatchProcessor do def process_batch(items, opts \\ []) do # Use Task.async_stream for parallel processing items |> Task.async_stream( fn item -> ExLLM.chat(opts[:provider] || :openai, [ %{role: "user", content: item} ]) end, max_concurrency: opts[:concurrency] || 5, timeout: opts[:timeout] || 30_000 ) |> Enum.map(fn {:ok, {:ok, response}} -> {:ok, response} {:ok, {:error, reason}} -> {:error, reason} {:exit, reason} -> {:error, {:timeout, reason}} end) end end ``` ### Custom Configuration Management Implement advanced configuration strategies: ```elixir defmodule ConfigManager do use GenServer def start_link(opts) do GenServer.start_link(__MODULE__, opts, name: __MODULE__) end def init(_opts) do # Load from multiple sources config = %{} |> load_from_env() |> load_from_file() |> load_from_vault() |> validate_config() {:ok, config} end def get_config(provider) do GenServer.call(__MODULE__, {:get, provider}) end defp load_from_vault(config) do # Fetch from HashiCorp Vault, AWS Secrets Manager, etc. Map.merge(config, fetch_secrets()) end end ``` ## Best Practices 1. **Always handle errors** - LLM APIs can fail for various reasons 2. **Use streaming for long responses** - Better user experience 3. **Enable caching for repeated queries** - Save costs 4. **Monitor token usage** - Stay within budget 5. **Use appropriate models** - Don't use GPT-4 for simple tasks 6. **Implement fallbacks** - Have backup providers ready 7. **Test with mocks** - Don't make API calls in tests 8. **Use context management** - Handle long conversations properly 9. **Track costs** - Monitor spending across providers 10. **Follow rate limits** - Respect provider limitations ## Troubleshooting ### Common Issues 1. 
**"API key not found"** - Check environment variables - Verify configuration provider is started - Use `ExLLM.configured?/1` to debug 2. **"Context length exceeded"** - Use context management strategies - Choose models with larger context windows - Truncate conversation history 3. **"Rate limit exceeded"** - Enable automatic retry - Implement backoff strategies - Consider multiple API keys 4. **"Stream interrupted"** - Enable stream recovery - Implement reconnection logic - Check network stability 5. **"Invalid response format"** - Check provider documentation - Verify model capabilities - Use appropriate options ### Debug Mode Enable debug logging: ```elixir # In config config :ex_llm, :log_level, :debug # Or at runtime Logger.configure(level: :debug) ``` ### Getting Help - Check the [API documentation](https://hexdocs.pm/ex_llm) - Review [example applications](../examples/) - Open an issue on [GitHub](https://github.com/azmaveth/ex_llm) - Read provider-specific documentation ## Additional Resources - [Quick Start Guide](QUICKSTART.md) - Get started quickly - [Provider Capabilities](PROVIDER_CAPABILITIES.md) - Detailed provider information - [Logger Guide](LOGGER.md) - Logging system documentation - [API Reference](https://hexdocs.pm/ex_llm) - Complete API documentation