API Reference ex_llm v0.5.0


Modules

ExLLM - Unified Elixir client library for Large Language Models.
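
As a minimal sketch of the top-level entry point (the function name, provider atom, and option keys below are assumptions based on common ExLLM usage and may differ in this release):

```elixir
# Illustrative only: ExLLM.chat/3 and its options are assumed, not verified.
messages = [%{role: "user", content: "Hello, world!"}]

case ExLLM.chat(:anthropic, messages, model: "claude-3-5-sonnet-20241022") do
  {:ok, response} -> IO.puts(response.content)
  {:error, reason} -> IO.inspect(reason)
end
```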

Behaviour for LLM backend adapters.
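
A hypothetical adapter skeleton illustrating how a custom backend might implement this behaviour; the callback names (`chat/2`, `configured?/1`) are assumptions and should be checked against the actual `ExLLM.Adapter` callbacks:

```elixir
defmodule MyApp.CustomAdapter do
  @behaviour ExLLM.Adapter

  # Assumed callback: translate ExLLM messages into a provider request
  # and map the provider's reply into ExLLM's standard response shape.
  @impl true
  def chat(_messages, _options) do
    {:ok, %{content: "stub reply", usage: %{input_tokens: 0, output_tokens: 0}}}
  end

  # Assumed callback: report whether required credentials are present.
  @impl true
  def configured?(_options), do: true
end
```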

Anthropic Claude API adapter for ExLLM.

AWS Bedrock adapter for ExLLM. Supports multiple providers including Claude, Titan, Llama, Cohere, AI21, and Mistral through Bedrock.

Bumblebee adapter for on-device LLM inference.

Google Gemini API adapter for ExLLM.

Groq adapter for ExLLM.

LM Studio adapter for local LLM inference.

Mistral AI API adapter for ExLLM.

Mock adapter for testing ExLLM integrations.
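
A sketch of exercising the mock adapter from an ExUnit test; the module path `ExLLM.Adapters.Mock` and the `set_response/1` helper are assumptions about the mock's API:

```elixir
defmodule MyApp.ChatTest do
  use ExUnit.Case

  test "returns the stubbed response" do
    # Hypothetical helper for stubbing the next reply.
    ExLLM.Adapters.Mock.set_response(%{content: "mocked"})

    {:ok, response} = ExLLM.chat(:mock, [%{role: "user", content: "hi"}])
    assert response.content == "mocked"
  end
end
```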

Ollama API adapter for ExLLM - provides local model inference via Ollama server.

OpenAI GPT API adapter for ExLLM.

Base implementation for OpenAI-compatible API providers.

OpenRouter API adapter for ExLLM.

Perplexity AI API adapter for ExLLM.

Shared configuration management utilities for adapters.

Shared error handling utilities for ExLLM adapters.

Shared HTTP client utilities for ExLLM adapters.

Shared message formatting utilities for ExLLM adapters.

Unified behaviour and utilities for fetching models from LLM provider APIs.

Shared utilities for model management across adapters.

Unified request building for LLM providers.

Shared utilities for building standardized responses across adapters.

Shared streaming behaviour and utilities for ExLLM adapters.

Unified streaming coordinator for all LLM adapters.

Shared validation functions for ExLLM adapters.

Unified vision/multimodal content formatting for LLM providers.

Adapter for X.AI's Grok models.

Configuration module for EXLA/EMLX backend optimization.

Handles loading and caching of Bumblebee models for local inference.

Token counting utilities for local models.

Unified caching system for ExLLM providing both runtime performance caching and optional disk persistence for development/testing.
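
An illustrative way caching might be enabled per request; the `:cache` and `:cache_ttl` option keys are assumptions, not the documented interface:

```elixir
messages = [%{role: "user", content: "Summarize OTP supervision trees."}]

# Assumed options: opt a single call into the runtime cache with a TTL.
{:ok, response} =
  ExLLM.chat(:openai, messages,
    cache: true,
    cache_ttl: :timer.minutes(15)
  )
```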

Cache statistics.

Behaviour for cache storage backends.

ETS-based storage backend for ExLLM cache.

Interceptor module for automatically caching provider responses.

Unified capability querying with automatic normalization.

Behaviour for configuration providers.

Default configuration provider that reads from environment variables. This is an alias for the Env provider for backward compatibility.

Environment-based configuration provider.

Static configuration provider for testing and library usage.

Context management for LLM conversations.

Message truncation strategies for context window management.
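
A hedged sketch of invoking a truncation strategy; the function name `truncate_messages/2` and the `:sliding_window` strategy atom are assumptions about this module's API:

```elixir
history = [
  %{role: "system", content: "You are a helpful assistant."},
  %{role: "user", content: "First question in a long conversation..."}
]

# Hypothetical call: drop oldest messages until the history fits the budget.
truncated =
  ExLLM.Context.truncate_messages(history,
    max_tokens: 4_000,
    strategy: :sliding_window
  )
```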

Cost calculation functionality for ExLLM.

Standardized error types and utilities for ExLLM.

Function calling support for ExLLM.

Represents a callable function/tool.

Represents a function call request from the LLM.

Represents the result of executing a function.
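
The three structs above can be illustrated with a sketch like the following; the JSON-schema shape of `parameters` follows common function-calling conventions, and the `:functions` option key is an assumption:

```elixir
# Hypothetical tool declaration passed alongside a chat request.
weather_fn = %{
  name: "get_weather",
  description: "Look up the current weather for a city",
  parameters: %{
    type: "object",
    properties: %{city: %{type: "string"}},
    required: ["city"]
  }
}

{:ok, response} =
  ExLLM.chat(:openai, [%{role: "user", content: "Weather in Oslo?"}],
    functions: [weather_fn]
  )
```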

OAuth2 authentication helper for Google Gemini APIs that require user authentication.

Base HTTP request functionality for Gemini API modules.

Google Gemini Context Caching API implementation.

Represents cached content that can be reused across requests.

Token usage information for cached content.

Chunk management for Gemini's Semantic Retrieval API.

Response from batch operations containing a list of chunks.

Extracted data that represents the Chunk content.

User provided metadata stored as key-value pairs.

Response from listing chunks with pagination support.

User provided string values assigned to a single metadata key.

Google Gemini Content Generation API implementation.

A generated candidate response.

Represents content with a role and parts.

Request structure for content generation.

Response from content generation.

Configuration for content generation.

Represents a content part which can be text, inline data, or function call/response.

Safety settings for content generation.

Tool definitions for function calling.

Configuration for tool usage.

Token usage information.

Google Gemini Corpus Management API implementation.

Filter condition for metadata values.

Information about a corpus.

Request structure for creating a corpus.

Request structure for listing corpora.

Response from listing corpora.

Filter for chunk and document metadata.

Request structure for querying a corpus.

Response from querying a corpus.

A chunk relevant to a query with its relevance score.

Request structure for updating a corpus.

Document management for Gemini's Semantic Retrieval API.

Filter condition applicable to a single key.

User provided metadata stored as key-value pairs.

Response from listing documents with pagination support.

User provided filter to limit retrieval based on Chunk or Document level metadata values.

Response from document query containing a list of relevant chunks.

The information for a chunk relevant to a query.

User provided string values assigned to a single metadata key.

Google Gemini Embeddings API implementation.

A list of floats representing an embedding.

Request containing the Content for the model to embed.

Google Gemini Files API implementation.

Represents an uploaded file in the Gemini API.

Error status information.

Metadata for video files.

Google Gemini Live API implementation using WebSockets.

Content structure for the Live API.

Function call details.

Generation configuration for the Live API.

Server disconnect notification.

Real-time input message.

Initial session setup message.

Tool call message from the server.

Google Gemini Models API implementation.

Represents a Gemini model with its capabilities and metadata.

Google Gemini Permissions API implementation.

Response from listing permissions.

Permission resource that grants access to a tuned model or corpus.

Request to transfer ownership of a tuned model.

Google Gemini Question Answering API implementation.

Request structure for generating grounded answers.

Response structure for grounded answers.

A single passage included inline with a grounding configuration.

A list of passages provided inline with the request.

Feedback related to the input data used to answer the question.

Configuration for retrieving grounding content from a Corpus or Document created using the Semantic Retriever API.

Google Gemini Token Counting API implementation.

Request structure for counting tokens.

Response from the token counting API.

Token count breakdown by modality (TEXT, IMAGE, AUDIO, VIDEO).

Google Gemini Fine-tuning API implementation.

Dataset for training or validation.

Hyperparameters controlling the tuning process.

Response from listing tuned models.

A fine-tuned model created using the tuning API.

Tuned model as a source for training a new model.

A single example for tuning.

A set of tuning examples.

Record for a single tuning step.

Tuning task that creates tuned models.

Structured output support for ExLLM using instructor_ex.
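
A hypothetical structured-output call mirroring instructor_ex conventions; the module name `ExLLM.Instructor`, the `:response_model` option, and the Ecto schema wiring are all assumptions:

```elixir
defmodule EmailClassification do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :category, Ecto.Enum, values: [:spam, :not_spam]
  end
end

# Assumed API: validate the model's reply against the schema above.
{:ok, result} =
  ExLLM.Instructor.chat(:anthropic,
    [%{role: "user", content: "Classify this email: 'You won a prize!'"}],
    response_model: EmailClassification
  )
```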

Unified logging for ExLLM with automatic context and security features.

Model capability discovery and management for ExLLM.

Represents a model capability or feature.

Complete information about a model's capabilities.

Model configuration loader for ExLLM.

Dynamic model loader for ExLLM adapters.

Provider-level capability tracking for ExLLM.

Represents provider-level information and capabilities.

Response caching system for collecting and storing real provider responses.

Request retry logic with exponential backoff for ExLLM.

Circuit breaker to prevent cascading failures.

Defines a retry policy for a specific provider or operation.
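
Retry behavior might be tuned per request roughly as follows; the `:retry` option and its key names are assumptions about the policy struct described above:

```elixir
messages = [%{role: "user", content: "Hello"}]

# Hypothetical per-call retry policy with exponential backoff (ms).
{:ok, response} =
  ExLLM.chat(:openai, messages,
    retry: [max_attempts: 3, base_delay: 500, max_delay: 5_000]
  )
```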

Session management for ExLLM - handles conversation sessions with LLM providers.

Type definitions for ExLLM.Session.

Represents a conversation session with message history and metadata.
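
A minimal session sketch; `new/1`, `add_message/3`, and `get_messages/1` are assumed function names for the session API:

```elixir
# Hypothetical session flow: accumulate history across turns.
session = ExLLM.Session.new("anthropic")
session = ExLLM.Session.add_message(session, "user", "What is OTP?")
messages = ExLLM.Session.get_messages(session)
```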

Streaming error recovery and resumption support for ExLLM.

Represents a partial streaming response that can be resumed.

Shared type definitions used across ExLLM modules.

Represents an available embedding model.

Represents an embedding response from an LLM provider.

Standard response format from LLM adapters with integrated cost calculation.

Represents an available LLM model.

Represents a chunk from a streaming LLM response.

Vision and multimodal support for ExLLM.
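
An illustrative multimodal message; the content-part shape (`:type`, `:image_url`) follows common provider conventions and is an assumption about ExLLM's normalized format:

```elixir
# Hypothetical vision request mixing text and an image part.
messages = [
  %{
    role: "user",
    content: [
      %{type: "text", text: "Describe this image."},
      %{type: "image_url", image_url: %{url: "https://example.com/photo.jpg"}}
    ]
  }
]

{:ok, response} = ExLLM.chat(:gemini, messages)
```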