# LLM Providers

The Rag library supports multiple LLM providers through a unified interface, enabling flexible provider selection and failover.

## Available Providers

| Provider | Module | Embeddings | Tools | Streaming | Max Context |
|----------|--------|------------|-------|-----------|-------------|
| **Gemini** | `Rag.Ai.Gemini` | Yes | Yes | Yes | 1M tokens |
| **Claude** | `Rag.Ai.Claude` | No | Yes | Yes | 200K tokens |
| **Codex** | `Rag.Ai.Codex` | No | Yes | Yes | 128K tokens |
| **OpenAI** | `Rag.Ai.OpenAI` | Yes | Yes | Yes | Model-dependent |
| **Ollama** | `Rag.Ai.Ollama` | Yes | Yes | Yes | Model-dependent |
| **Cohere** | `Rag.Ai.Cohere` | Yes | Yes | Yes | Model-dependent |
| **Nx** | `Rag.Ai.Nx` | Yes | No | Config | Local |

## Provider Behaviour

All providers implement the `Rag.Ai.Provider` behaviour:

```elixir
@callback new(attrs :: map()) :: struct()
@callback generate_embeddings(provider, texts, opts) :: {:ok, list(embedding())} | {:error, any()}
@callback generate_text(provider, prompt, opts) :: {:ok, response()} | {:error, any()}
```

## Configuration

### Environment Variables

```bash
# Gemini (recommended for embeddings)
export GEMINI_API_KEY="your-api-key"

# Claude (best for analysis)
export ANTHROPIC_API_KEY="your-api-key"

# OpenAI/Codex (best for code)
export OPENAI_API_KEY="your-api-key"
# or
export CODEX_API_KEY="your-api-key"
```

## Gemini Provider

The default provider, with full embedding support. Models are resolved through `Gemini.Config`, so you can pass alias keys (e.g., `:flash_lite_latest`) or omit `:model` to use auth-aware defaults.
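Like every provider, Gemini satisfies the three `Rag.Ai.Provider` callbacks shown earlier. As a quick, self-contained illustration of those callback shapes, here is a hypothetical stub provider (the module name and return values are illustrative only, not part of the library):

```elixir
defmodule MyApp.Ai.Stub do
  # Hypothetical provider illustrating the Rag.Ai.Provider callback shapes.
  # A real provider would call an LLM API instead of returning canned data.
  defstruct [:model]

  # new/1: build the provider struct from an attrs map
  def new(attrs), do: struct(__MODULE__, attrs)

  # generate_text/3: {:ok, response} | {:error, reason}
  def generate_text(%__MODULE__{}, prompt, _opts), do: {:ok, "stub: " <> prompt}

  # generate_embeddings/3: {:ok, list of vectors} | {:error, reason}
  def generate_embeddings(%__MODULE__{}, texts, _opts),
    do: {:ok, Enum.map(texts, fn _text -> List.duplicate(0.0, 8) end)}
end

provider = MyApp.Ai.Stub.new(%{model: :stub})
{:ok, "stub: hi"} = MyApp.Ai.Stub.generate_text(provider, "hi", [])
```

Because all providers share this contract, call sites can swap one provider module for another without changing their shape.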
Optional app-wide defaults:

```elixir
config :rag, Rag.Ai.Gemini,
  model: Gemini.Config.default_model(),
  embedding_model: Gemini.Config.default_embedding_model()
```

### Usage

```elixir
alias Rag.Ai.Gemini
# After the alias above, the bare name `Gemini` expands to `Rag.Ai.Gemini`,
# so reach the SDK's config module via its fully qualified name.
alias Elixir.Gemini.Config, as: GeminiConfig

# Create provider instance
provider = Gemini.new(%{model: :flash_lite_latest})

# Text generation
{:ok, response} = Gemini.generate_text(provider, "Hello!", [])

# Streaming
{:ok, stream} = Gemini.generate_text(provider, "Hello!", stream: true)
Enum.each(stream, &IO.write/1)

# Embeddings
{:ok, embeddings} = Gemini.generate_embeddings(provider, ["text1", "text2"], [])
```

### Options

```elixir
# Text generation options
[
  stream: false,      # Enable streaming
  temperature: 0.7,   # Randomness (0.0-2.0)
  max_tokens: 1024,   # Max output tokens
  top_p: 0.9,         # Nucleus sampling
  top_k: 40           # Top-K sampling
]

# Embedding options
[
  task_type: :retrieval_document,                 # or :retrieval_query
  model: GeminiConfig.default_embedding_model()   # Auth-aware default
]
```

### Capabilities

```elixir
Gemini.supports_tools?()       # true
Gemini.supports_embeddings?()  # true
Gemini.max_context_tokens()    # 1_000_000
Gemini.cost_per_1k_tokens()    # {0.000075, 0.000300}
```

## Claude Provider

Best for analysis, reasoning, and agentic workflows.

### Usage

```elixir
alias Rag.Ai.Claude

provider = Claude.new(%{
  model: "claude-sonnet-4-20250514",
  max_turns: 10
})

# Text generation
{:ok, response} = Claude.generate_text(provider, "Analyze this code", [])

# With system prompt
{:ok, response} = Claude.generate_text(provider, "Hello",
  system_prompt: "You are a helpful assistant."
)

# Embeddings NOT supported
{:error, :not_supported} = Claude.generate_embeddings(provider, ["text"], [])
```

### Options

```elixir
[
  stream: false,                # Enable streaming
  system_prompt: "You are...",  # System instruction
  output_format: :text          # Output format
]
```

## Codex Provider (OpenAI-compatible)

Best for code generation and structured output.
### Usage

```elixir
alias Rag.Ai.Codex

provider = Codex.new(%{
  model: "gpt-4o",
  reasoning_effort: :medium  # :low, :medium, or :high
})

# Text generation
{:ok, response} = Codex.generate_text(provider, "Write a function", [])

# With structured output
{:ok, response} = Codex.generate_text(provider, "Generate JSON",
  output_schema: %{type: "object", properties: %{...}}
)
```

## OpenAI Provider (Direct HTTP)

Alternative OpenAI implementation that calls the HTTP API directly, without an SDK.

### Usage

```elixir
alias Rag.Ai.OpenAI

provider = OpenAI.new(%{
  embeddings_url: "https://api.openai.com/v1/embeddings",
  embeddings_model: "text-embedding-3-small",
  text_url: "https://api.openai.com/v1/chat/completions",
  text_model: "gpt-4o",
  api_key: System.get_env("OPENAI_API_KEY")
})

{:ok, embeddings} = OpenAI.generate_embeddings(provider, ["text"], [])
{:ok, response} = OpenAI.generate_text(provider, "Hello", [])
```

## Ollama Provider (Local)

For self-hosted local models.

### Usage

```elixir
alias Rag.Ai.Ollama

provider = Ollama.new(%{
  embeddings_url: "http://localhost:11434/api/embed",
  embeddings_model: "nomic-embed-text",
  text_url: "http://localhost:11434/api/chat",
  text_model: "llama2"
})

{:ok, embeddings} = Ollama.generate_embeddings(provider, ["text"], [])
{:ok, response} = Ollama.generate_text(provider, "Hello", [])
```

## Nx Provider (On-Device)

For local inference using Bumblebee models.
### Usage

```elixir
alias Rag.Ai.Nx

# Must pre-configure Nx.Serving instances
provider = Nx.new(%{
  embeddings_serving: embedding_serving,  # from Bumblebee
  text_serving: text_serving
})

{:ok, embeddings} = Nx.generate_embeddings(provider, ["text"], [])
```

## Capabilities Module

Query provider capabilities at runtime:

```elixir
alias Rag.Ai.Capabilities

# Get all providers
Capabilities.all()
# %{gemini: %{embeddings: true, ...}, claude: %{...}, codex: %{...}}

# Get available providers (with valid credentials)
Capabilities.available()

# Check a specific capability
Capabilities.can_handle?(:gemini, :embeddings)  # true
Capabilities.can_handle?(:claude, :embeddings)  # false

# Get providers with a capability
Capabilities.with_capability(:embeddings)
# [{:gemini, %{...}}]

# Best provider for a task
Capabilities.best_for(:embeddings)       # :gemini
Capabilities.best_for(:code_generation)  # :codex
Capabilities.best_for(:analysis)         # :claude
Capabilities.best_for(:long_context)     # :gemini
```

### Task Mappings

| Task | Best Provider | Reason |
|------|---------------|--------|
| `:embeddings` | Gemini | Recommended embedding provider |
| `:long_context` | Gemini | 1M token context window |
| `:multimodal` | Gemini | Image/audio support |
| `:cost` | Gemini | Most cost-effective |
| `:speed` | Gemini | Fastest inference |
| `:code_generation` | Codex | Optimized for code |
| `:structured_output` | Codex | Best JSON generation |
| `:analysis` | Claude | Deep reasoning |
| `:writing` | Claude | Best prose quality |
| `:agentic` | Claude | Multi-step workflows |
| `:reasoning` | Claude | Complex logic |
| `:safety` | Claude | Strongest safety |

## Streaming Responses

All major providers support streaming:

```elixir
{:ok, stream} = Router.execute(router, :text, "Count to 10", stream: true)

# Consume stream
Enum.each(stream, fn chunk -> IO.write(chunk) end)
```

## Cost Comparison

| Provider | Input (per 1M tokens) | Output (per 1M tokens) |
|----------|----------------------|------------------------|
| Gemini | $0.075 | $0.30 |
| Claude | $3.00 | $15.00 |
| Codex/GPT-4o | $2.50 | $10.00 |

## Best Practices

1. **Use Gemini for embeddings** - The default and recommended embedding provider (OpenAI, Ollama, Cohere, and Nx also support embeddings)
2. **Use Claude for analysis** - Best reasoning capabilities
3. **Use Codex for code** - Optimized for code generation
4. **Configure fallback** - Use Router with multiple providers for reliability
5. **Check capabilities first** - Use `Capabilities.can_handle?/2` before calling

## Next Steps

- [Smart Router](router.md) - Learn about routing strategies
- [Embeddings](embeddings.md) - Deep dive into embedding generation
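As a closing sketch of the fallback best practice: because every provider shares the same callback shapes, trying providers in order until one succeeds needs only a small helper. Everything below (the module name, the `{module, provider}` tuple convention) is hypothetical, not part of the library; the Router provides a richer version of this behaviour.

```elixir
defmodule MyApp.Fallback do
  # Hypothetical helper: walk an ordered list of {module, provider} pairs,
  # returning the first {:ok, _} result, or an error if all providers fail.
  def generate_text([], _prompt, _opts), do: {:error, :all_providers_failed}

  def generate_text([{mod, provider} | rest], prompt, opts) do
    case mod.generate_text(provider, prompt, opts) do
      {:ok, _response} = ok -> ok
      {:error, _reason} -> generate_text(rest, prompt, opts)
    end
  end
end

# Usage with configured provider structs (illustrative):
# MyApp.Fallback.generate_text(
#   [{Rag.Ai.Gemini, gemini}, {Rag.Ai.Claude, claude}],
#   "Hello",
#   []
# )
```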