Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.8.1] - 2025-06-17

Added

  • Comprehensive API Documentation - Complete public API reference
    • docs/API_REFERENCE.md - Full public API documentation with examples
    • guides/internal_modules.md - Internal modules guide with migration examples
    • Enhanced ExDoc configuration with organized guide sections
    • Clear separation between public API and internal implementation

Fixed

  • All compilation warnings resolved
    • Replace deprecated Logger.warn with Logger.warning
    • Fix unreachable error clauses in HTTP client and metrics modules
    • Add conditional compilation for optional dependencies (Prometheus, StatsD)
    • Remove unused streaming functions and helper methods
    • Fix module attribute ordering issues
    • Add embeddings function stubs to all provider implementations
    • Fix nil module reference warnings using apply/3
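The last two fixes follow a common Elixir pattern for optional dependencies; a minimal sketch, assuming a hypothetical OptionalMetricsBackend module in place of a real optional dependency:

```elixir
# Sketch of the general pattern, not ExLLM's exact code. OptionalMetricsBackend
# is a hypothetical stand-in for an optional dependency such as a Prometheus
# or StatsD client.
defmodule MyApp.Metrics do
  @backend OptionalMetricsBackend

  def report(event, value) do
    if Code.ensure_loaded?(@backend) do
      # apply/3 keeps the call dynamic, so the compiler emits no warning
      # when the optional module is absent at compile time.
      apply(@backend, :report, [event, value])
    else
      :ok
    end
  end
end
```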

Enhanced

  • Developer Experience - Better documentation structure and API clarity
  • Code Quality - Clean compilation with zero warnings
  • Documentation Organization - Logical grouping of guides and references

[0.8.0] - 2025-06-16

Added

  • Advanced Streaming Infrastructure - Production-ready streaming enhancements
    • StreamBuffer - Memory-efficient circular buffer with overflow protection
    • FlowController - Advanced flow control with backpressure handling
    • ChunkBatcher - Intelligent chunk batching for optimized I/O
    • Configurable consumer types: :direct, :buffered, :managed
    • Comprehensive streaming metrics and monitoring
    • Adaptive batching based on chunk characteristics
    • Graceful degradation for slow consumers
  • Comprehensive Telemetry System - Complete observability and instrumentation
    • Telemetry events for all major operations (chat, streaming, cache, session, context)
    • Optional telemetry_metrics and OpenTelemetry integration
    • Context and session management instrumentation
    • Cache operation tracking with hit/miss/put events
    • Cost calculation and threshold monitoring
    • Default logging handlers with configurable levels
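A hedged sketch of how the pieces above might be used; the :consumer option name and the telemetry event name are assumptions based on the lists above:

```elixir
messages = [%{role: "user", content: "Hello"}]

# Select a consumer type for streaming (the :consumer option name is an
# assumption based on the list above):
{:ok, _stream} = ExLLM.stream_chat(:anthropic, messages, consumer: :buffered)

# Attach a handler to one of the telemetry events; [:ex_llm, :chat, :stop]
# is an assumed event name - check the emitted events for the exact names.
:telemetry.attach(
  "log-chat-stop",
  [:ex_llm, :chat, :stop],
  fn _event, measurements, metadata, _config ->
    IO.puts("chat via #{inspect(metadata[:provider])}: #{inspect(measurements)}")
  end,
  nil
)
```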

Enhanced

  • Streaming Performance - Reduced system calls through intelligent batching
  • Memory Safety - Fixed-size buffers prevent unbounded memory growth
  • User Experience - Smooth output even with fast providers (Groq, Claude)

Fixed

  • Test Infrastructure - Comprehensive test tagging and organization improvements
    • Fixed 11 failing unit tests by properly categorizing integration vs unit tests
    • Improved test tagging strategy with :unit, :integration, :model_loading, :requires_service tags
    • Fixed MockConfigProvider implementation in Gemini tokens tests
    • Separated unit tests from integration tests requiring external dependencies

[0.7.1] - 2025-06-14

Added

  • Comprehensive Documentation System - Complete ExDoc configuration with organized structure
    • 24 Mix test aliases for targeted testing (provider, capability, and type-based)
    • Organized documentation into logical groups: Guides, References
    • Complete test documentation covering semantic tagging and caching system

Changed

  • Updated ExDoc configuration to include all public documentation files
  • Streamlined documentation structure by removing internal development docs
  • Enhanced README with current feature set and improved examples

Fixed

  • Resolved all ExDoc file reference warnings
  • Fixed documentation generation for publication-ready docs

[0.7.0] - 2025-06-14

Added

  • Advanced Test Response Caching System - Complete caching infrastructure for integration tests
    • Intelligent cache storage with JSON-based persistence
    • TTL-based cache expiration and cleanup
    • Request/response matching with fuzzy algorithms
    • Cache statistics and performance monitoring
    • Automatic cache key generation and indexing
    • Smart fallback strategies for cache misses
    • Configurable cache organization (by provider, test module, or tag)
    • Environment-based cache configuration
    • Mix task for cache management: mix ex_llm.cache
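A hedged sketch of wiring this up in a test config; only the Mix task name is taken from the list above, and the keys under :ex_llm are hypothetical illustrations of the environment-based configuration:

```elixir
# config/test.exs - hedged sketch; these keys are hypothetical illustrations
# of the environment-based cache configuration described above.
import Config

config :ex_llm, :test_cache,
  enabled: true,                 # hypothetical key
  ttl: :timer.hours(24),         # hypothetical key
  organize_by: :provider         # hypothetical key, per the organization options above
```

Cached responses can then be inspected or pruned with the task named above, e.g. `mix ex_llm.cache`.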

Enhanced

  • Test Caching Performance - 25x speed improvement for integration tests
  • Cache Detection - Automatic detection of destructive operations
  • Response Interception - Transparent request/response caching for HTTP calls
  • Metadata Tracking - Comprehensive test context and response metadata

[0.6.0] - 2025-06-14

Added

  • Comprehensive Test Tagging System - Replaced all 138 generic @tag :skip with meaningful semantic tags
    • :live_api - Tests that call live provider APIs
    • :requires_api_key - Tests needing API keys with provider-specific checking
    • :requires_oauth - Tests needing OAuth2 authentication
    • :requires_service - Tests needing local services (Ollama, LM Studio)
    • :requires_resource - Tests needing pre-existing resources (tuned models, corpora)
    • :integration - Integration tests with external services
    • :external - Tests making external network calls
    • Provider-specific tags: :anthropic, :openai, :gemini, etc.
  • Enhanced Test Caching System - Intelligent caching based on test tags
    • Uses :live_api tag to determine which tests to cache
    • Automatic detection of destructive operations (create, delete, modify)
    • Smart cache exclusion for corpus deletion and state-changing tests
    • 25x speed improvement for cached integration tests (2.2s → 0.09s)
  • Mix Test Aliases - 24 new test aliases for targeted testing
    • Provider-specific: mix test.anthropic, mix test.openai, etc.
    • Tag-based: mix test.integration, mix test.oauth2, mix test.live_api
    • Capability-based: mix test.streaming, mix test.vision
  • ExLLM.Case Test Module - Custom test case with automatic requirement checking
    • Dynamic skipping with meaningful messages when requirements aren't met
    • API key validation per provider
    • OAuth2 token validation
    • Service availability checking
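Put together, a test module using this scheme might look like the following hedged sketch; the tags are from the list above, but treat the exact ExLLM.Case behaviour as an assumption:

```elixir
defmodule MyApp.AnthropicIntegrationTest do
  # ExLLM.Case checks requirements (API keys, OAuth tokens, services) and
  # skips with a meaningful message when they are not met.
  use ExLLM.Case, async: false

  @moduletag :integration
  @moduletag :anthropic

  @tag :live_api
  @tag :requires_api_key
  test "chat completion against the live API" do
    messages = [%{role: "user", content: "Say hello"}]
    assert {:ok, response} = ExLLM.chat(:anthropic, messages)
    assert is_binary(response.content)
  end
end
```

Such a module can then be targeted with the matching aliases, e.g. `mix test.anthropic` or `mix test.live_api`, or excluded wholesale with `mix test --exclude live_api`.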

Changed

  • BREAKING: Migrated from generic :skip tags to semantic tagging system
  • Enhanced OAuth2 test helper to use consistent :requires_oauth tag
  • Improved test cache detection to prevent caching destructive operations
  • Updated all provider integration tests with proper module-level tags

Fixed

  • Fixed undefined variable service in ExLLM.Case rescue clause
  • Fixed OpenRouter test compilation error with undefined function
  • Fixed OAuth2 tag inconsistency (now uses :requires_oauth everywhere)
  • Fixed test cache configuration for destructive operation detection

[0.5.0] - 2025-06-13

Added

  • Complete Google Gemini API Implementation - All 15 Gemini APIs now fully implemented
    • Live API: Real-time bidirectional communication with WebSocket support
      • Text, audio, and video streaming capabilities
      • Tool/function calling in live sessions
      • Session resumption and context compression
      • Activity detection and management
      • Audio transcription for input/output
    • Models API: List and get model information
    • Content Generation API: Chat and streaming with multimodal support
    • Token Counting API: Count tokens for any content
    • Files API: Upload and manage media files
    • Context Caching API: Cache content for reuse across requests
    • Embeddings API: Generate text embeddings
    • Fine-tuning API: Create and manage custom tuned models
    • Permissions API: Manage access to tuned models and corpora
    • Question Answering API: Semantic search and QA
    • Corpus Management API: Create and manage knowledge corpora
    • Document Management API: Manage documents within corpora
    • Chunk Management API: Fine-grained document chunk management
    • Retrieval Permissions API: Control access to retrieval resources
  • Gun WebSocket Library: Added Gun dependency for Live API WebSocket support
  • OAuth2 Authentication: Full OAuth2 support for Gemini APIs requiring user auth
  • Comprehensive Test Suite: 477 tests covering all Gemini functionality
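For the common case, all of this remains reachable through the standard entry points; a hedged sketch, where the oauth_token option name is an assumption:

```elixir
# API-key based chat through the Gemini adapter:
{:ok, response} =
  ExLLM.chat(:gemini, [%{role: "user", content: "Summarize RFC 2616 in a sentence"}])

# APIs that require user authorization (tuned models, corpora, permissions)
# use OAuth2 tokens instead; the option name here is an assumption:
{:ok, _} =
  ExLLM.chat(:gemini, [%{role: "user", content: "Hi"}],
    oauth_token: System.get_env("GOOGLE_OAUTH_TOKEN")
  )
```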

Changed

  • Updated Gemini adapter to use new modular API implementation
  • Enhanced authentication to support both API keys and OAuth2 tokens
  • Improved error handling with Gemini-specific error messages
  • Updated documentation with complete Gemini API coverage

Fixed

  • Fixed unused variable warnings in Gemini auth module
  • Fixed Live API compilation errors with proper string escaping
  • Fixed content parsing to handle JSON response formats correctly

[0.4.2] - 2025-06-08

Changed

  • BREAKING: Renamed :local provider atom to :bumblebee for clarity
    • All references to :local in code and documentation have been updated
    • Update any code using ExLLM.chat(:local, ...) to ExLLM.chat(:bumblebee, ...)
  • Changed default Bumblebee model from microsoft/phi-2 to Qwen/Qwen3-0.6B
  • Excluded emlx dependency from Hex package until it's published
  • Updated README with instructions for adding emlx manually for Apple Silicon support
  • Updated documentation to clarify that instructor, bumblebee, and nx are required dependencies
  • Clarified that exla and emlx are optional hardware acceleration backends

Fixed

  • Mock adapter now properly checks for mock_error option in chat function

[0.4.1] - 2025-06-08

Added

  • Response Caching System - Cache real provider responses for offline testing and development
    • Automatic Response Collection: All provider responses automatically cached when enabled
    • Mock Integration: Configure Mock adapter to replay cached responses from any provider
    • Cache Management: Full CRUD operations for cached responses with provider organization
    • Fuzzy Matching: Robust request matching handles real-world usage variations
    • Environment Configuration: Simple enable/disable via EX_LLM_CACHE_RESPONSES environment variable
    • Cost Reduction: Reduce API costs during development by replaying cached responses
    • Realistic Testing: Use authentic provider responses in tests without API calls
    • Streaming Support: Cache and replay streaming responses with exact chunk reproduction
    • Cross-Provider Testing: Test application compatibility across different provider response formats
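A hedged sketch of the round trip; only the EX_LLM_CACHE_RESPONSES variable is taken from the list above, and the Mock configuration keys are hypothetical:

```elixir
# 1. Collect real responses while developing:
#      export EX_LLM_CACHE_RESPONSES=true
#
# 2. Replay them later through the Mock adapter (hypothetical keys):
import Config

config :ex_llm, :mock,
  replay_from_cache: true,    # hypothetical key
  replay_provider: :openai    # hypothetical key
```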

Changed

  • Enhanced shared response builder to support more response formats (completion, image, audio, moderation)
  • Extended HTTP client with provider-specific headers for 15+ providers
  • Improved error handling with normalization and retry logic for multiple providers

Fixed

  • Fixed pre-push hook to exclude integration tests, preventing timeouts
  • Fixed unsafe String.to_atom usage throughout codebase (Sobelow warnings)
  • Fixed length() > 0 warnings by using pattern matching
  • Fixed typing warnings for potentially nil values
  • Fixed ModelConfig runtime path resolution for test environment
  • Fixed ResponseCache JSON key atomization for proper cache loading
  • Fixed capability normalization to handle already-normalized capability names
  • Added missing model capabilities (vision for Claude-3-Opus, reasoning for XAI models)

[0.4.0] - 2025-06-06

Added

  • Complete OpenAI API Implementation - Full support for modern OpenAI API features
    • Audio Features: Support for audio input in messages and audio output configuration
    • Web Search Integration: Support for web search options in chat completions
    • O-Series Model Features: Reasoning effort parameter and developer role support
    • Predicted Outputs: Support for faster regeneration with prediction hints
    • Additional APIs: Six new OpenAI API endpoints
      • moderate_content/2 - Content moderation using OpenAI's moderation API
      • generate_image/2 - DALL-E image generation with configurable parameters
      • transcribe_audio/2 - Whisper audio transcription (basic implementation)
      • upload_file/3 - File upload for assistants and other endpoints (basic implementation)
      • create_assistant/2 - Create assistants with custom instructions and tools
      • create_batch/2 - Batch processing for multiple requests
    • Enhanced Message Support: Multiple content parts per message (text + audio/image)
    • Modern Request Parameters: Support for all modern OpenAI API parameters
      • max_completion_tokens, top_p, frequency_penalty, presence_penalty
      • seed, stop, service_tier, logprobs, top_logprobs
    • JSON Response Formats: JSON mode and JSON Schema structured outputs
    • Modern Tools API: Full support for tools API replacing deprecated functions
    • Enhanced Streaming: Tool calls and usage information in streaming responses
    • Enhanced Usage Tracking: Detailed token usage with cached/reasoning/audio tokens
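A hedged sketch of two of the new endpoints; the module path follows the ExLLM.Adapters.* convention used elsewhere in this changelog, and the options shown are assumptions:

```elixir
alias ExLLM.Adapters.OpenAI

# Content moderation (options argument shape is an assumption):
{:ok, moderation} = OpenAI.moderate_content("some user-submitted text", [])

# DALL-E image generation with configurable parameters:
{:ok, image} = OpenAI.generate_image("a watercolor fox", size: "1024x1024")
```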

Changed

  • MessageFormatter: Added support for "developer" role for O1+ models
  • OpenAI Adapter: Comprehensive test coverage with 46 tests following TDD methodology
  • Response Types: Enhanced LLMResponse struct with new fields (refusal, logprobs, tool_calls)

Technical

  • Implemented using Test-Driven Development (TDD) methodology
  • Maintains full backward compatibility with existing API
  • All features validated with comprehensive test suite
  • Proper error handling and API key validation for all new endpoints
  • Ollama Configuration Management - Generate and update local model configurations
    • New generate_config/1 function to create YAML config for all installed models
    • New update_model_config/2 function to update specific model configurations
    • Automatic capability detection using /api/show endpoint
    • Real context window sizes from model metadata
    • Preserves existing configuration when merging
    • Example: ExLLM.Adapters.Ollama.generate_config(save: true)
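A hedged sketch expanding on the example above; the update_model_config/2 argument shapes and the return values are assumptions:

```elixir
# Regenerate the Ollama YAML config from all locally installed models:
{:ok, _} = ExLLM.Adapters.Ollama.generate_config(save: true)

# Refresh a single model's entry, preserving the rest of the file:
{:ok, _} = ExLLM.Adapters.Ollama.update_model_config("llama3", save: true)
```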

[0.3.2] - 2025-06-06

Added

  • Capability Normalization - Automatic normalization of provider-specific capability names
    • New ExLLM.Capabilities module providing unified capability interface
    • Normalizes different provider terminologies (e.g., tool_use → function_calling)
    • Works transparently with all capability query functions
    • Comprehensive mappings for common capability variations
    • Example: find_providers_with_features([:tool_use]) works across all providers
  • Enhanced provider capability tracking with real-time API discovery
    • New fetch_provider_capabilities.py script for API-based capability detection
    • Updated fetch_provider_models.py with better context window detection
    • Fixed incorrect context windows (e.g., GPT-4o now correctly shows 128,000)
    • Automatic capability detection from model IDs
  • New capability normalization demo in example app (option 6 in Provider Capabilities Explorer)
  • Comprehensive Documentation
    • New Quick Start Guide (docs/QUICKSTART.md) - Get up and running in 5 minutes
    • New User Guide (docs/USER_GUIDE.md) - Complete documentation of all features
    • Reorganized documentation into docs/ directory
    • Added prominent documentation links to README
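In practice the normalization is invisible at the call site; a brief hedged sketch using the functions named above:

```elixir
# Both spellings resolve to the same canonical capability:
ExLLM.find_providers_with_features([:tool_use])
ExLLM.find_providers_with_features([:function_calling])

# Single-provider queries accept either terminology as well:
ExLLM.provider_supports?(:anthropic, :tool_use)
```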

Changed

  • Updated provider_supports?/2, model_supports?/3, find_providers_with_features/1, and find_models_with_features/1 to use normalized capabilities

Fixed

  • Mock provider now properly supports Instructor integration for structured outputs
  • Cost formatting now consistently uses dollars with appropriate decimal places (e.g., "$0.000324" instead of "$0.032¢")
  • Anthropic provider now includes required max_tokens parameter when using Instructor
  • Mock provider now generates semantically meaningful embeddings for realistic similarity search
  • Fixed KeyError when using providers without pricing data (e.g., Ollama)
  • Cost tracking now properly adds cost information to chat responses
  • Ollama now properly supports function calling for compatible models
  • Made request timeouts configurable via :timeout option (defaults: Ollama 2min, others use client defaults)
  • Fixed MatchError in example app when displaying providers without capabilities info
  • Provider and model capability queries now accept any provider's terminology
  • Moved LOGGER.md, PROVIDER_CAPABILITIES.md, and DROPPED.md to docs/ directory
  • Enhanced provider capabilities with data from API discovery scripts

[0.3.1] - 2025-06-05

Added

  • Major Code Refactoring - Reduced code duplication by ~40% through shared modules:
    • StreamingCoordinator - Unified streaming implementation for all adapters
      • Standardized SSE parsing and buffering
      • Provider-agnostic chunk handling
      • Integrated error recovery support
      • Simplified adapter streaming implementations
    • RequestBuilder - Common request construction patterns
      • Unified parameter handling across providers
      • Provider-specific transformations via callbacks
      • Support for chat, embeddings, and completion endpoints
    • ModelFetcher - Standardized model discovery behavior
      • Common API fetching patterns
      • Unified filter/parse/transform pipeline
      • Integration with ModelLoader for caching
    • VisionFormatter - Centralized vision/multimodal content handling
      • Provider-specific image formatting (Anthropic, OpenAI, Gemini)
      • Media type detection from file extensions and magic bytes
      • Base64 encoding/decoding utilities
      • Image size validation
  • Unified ExLLM.Logger module replacing multiple logging approaches
    • Single consistent API for all logging needs
    • Simple Logger-like interface: Logger.info("message")
    • Automatic context tracking with with_context/2
    • Structured logging for LLM-specific events (requests, retries, streaming)
    • Configurable log levels and component filtering
    • Security features: API key and content redaction
    • Performance tracking with automatic duration measurement
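A hedged sketch of the API named above; with_context/2's exact signature is an assumption:

```elixir
alias ExLLM.Logger

Logger.info("starting chat request")

# Assumed shape of with_context/2: a context keyword list plus a function,
# so log lines emitted inside carry the provider/model metadata.
Logger.with_context([provider: :openai, model: "gpt-4.1-nano"], fn ->
  Logger.info("request sent")
end)
```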

Changed

  • BREAKING: Replaced all Logger and DebugLogger usage with unified ExLLM.Logger
    • All modules now use alias ExLLM.Logger instead of require Logger
    • Consistent logging interface across the entire codebase
    • Simplified developer experience with one logging API
  • Enhanced HTTPClient with unified streaming support via post_stream/3
  • Improved error handling consistency across all shared modules
  • Better separation of concerns in adapter implementations

Technical Improvements

  • Reduced code duplication significantly across adapters
  • More maintainable and testable codebase structure
  • Easier to add new providers using shared behaviors
  • Consistent patterns for common operations

[0.3.0] - 2025-06-05

Added

  • X.AI adapter implementation with complete feature support
    • Full OpenAI-compatible API integration
    • Support for all Grok models (Beta, 2, 3, Vision variants)
    • Streaming, function calling, vision, and structured outputs
    • Web search and reasoning capabilities
    • Complete Instructor integration for structured outputs
  • Synced model metadata from LiteLLM (1053 models across 56 providers)
    • New OpenAI models: GPT-4.1 series (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano)
    • New OpenAI O1 reasoning models (o1-pro, o1, o1-mini, o1-preview)
    • New XAI Grok-3 models (grok-3, grok-3-beta, grok-3-fast, grok-3-mini variants)
    • New model capabilities: structured_output, prompt_caching, reasoning, web_search
    • Updated pricing and context windows for all models
  • Fetched latest models from provider APIs (606 models from 6 providers)
    • New Anthropic models: Claude 4 Opus/Sonnet/Haiku with 32K output tokens
    • New Groq models: DeepSeek R1 distilled models, QwQ-32B, Mistral Saba
    • New Gemini models: Gemini 2.5 Pro/Flash, Gemini 2.0 Flash with multimodal support
    • New OpenAI models: O3/O4 series, GPT-4.5 preview, search-enabled models
    • Updated context windows and capabilities from live APIs
  • Groq support for structured outputs via Instructor integration

Changed

  • Updated default models:
    • OpenAI: Set to gpt-4.1-nano
    • Anthropic: Set to claude-3-5-sonnet-latest
  • Enhanced Instructor module to support Groq provider
  • Updated example app to include Groq in structured output demos
  • Updated README.md with current model information:
    • Anthropic: Added Claude 4 series and Claude 3.7
    • OpenAI: Added GPT-4.1 series and O1 reasoning models
    • Gemini: Added Gemini 2.5 and 2.0 series
    • Groq: Added Llama 4 Scout, DeepSeek R1 Distill, and QwQ-32B
  • Task reorganization:
    • Created docs/DROPPED.md for features that don't align with core library mission
    • Reorganized TASKS.md with clearer priorities and focused roadmap
    • Added refactoring tasks to reduce code duplication by ~40%

Fixed

  • Instructor integration now correctly separates params and config for chat_completion
  • Advanced features demo uses correct Mock adapter method (set_stream_chunks)
  • Module reference errors in Context management demo

[0.2.1] - 2025-06-05

Added

  • Provider Capability Discovery System
    • New ExLLM.ProviderCapabilities module for tracking API-level provider capabilities (see the sketch after this list)
    • Provider feature discovery independent of specific models
    • Authentication method tracking (API key, OAuth, AWS signature, etc.)
    • Provider endpoint discovery (chat, embeddings, images, audio, etc.)
    • Provider recommendations based on required/preferred features
    • Provider comparison tools for feature analysis
    • Integrated provider capability functions into main ExLLM module
    • Added provider capability explorer to example app demo
  • Environment variable wrapper script (scripts/run_with_env.sh) for Claude CLI usage
  • Groq models API support (https://api.groq.com/openai/v1/models)
  • Dynamic model loading from provider APIs
    • All adapters now fetch models dynamically from provider APIs when available
    • Automatic fallback to YAML configuration when API is unavailable
    • Created ExLLM.ModelLoader module for centralized model loading with caching
    • Anthropic adapter now uses /v1/models API endpoint
    • OpenAI adapter fetches from /v1/models and filters chat models
    • Gemini adapter uses Google's models API
    • Ollama adapter fetches from local server's /api/tags
    • OpenRouter adapter uses public /api/v1/models API
  • OpenRouter adapter with access to 300+ models from multiple providers
    • Support for Claude, GPT-4, Llama, PaLM, and many other model families
    • Unified API interface for different model architectures
    • Automatic model discovery and cost-effective access to premium models
  • External YAML configuration system for model metadata
    • Model pricing, context windows, and capabilities stored in config/models/*.yml
    • Runtime configuration loading with ETS caching for performance
    • Separation of model data from code for easier maintenance
    • Support for easy updates without code changes
  • OpenAI-Compatible base adapter for shared implementation
    • Reduces code duplication across providers with OpenAI-compatible APIs
    • Groq adapter as first implementation using the base adapter
  • Model configuration sync script from LiteLLM
    • Python script to sync model data from LiteLLM's database
    • Added 1048 models with pricing, context windows, and capabilities
    • Automatic conversion from LiteLLM's JSON to ExLLM's YAML format
  • Extracted ALL provider configurations from LiteLLM
    • Created YAML files for 56 unique providers (49 new providers)
    • Includes Azure, Mistral, Perplexity, Together AI, Databricks, and more
    • Ready-to-use configurations for future adapter implementations
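To make the capability-discovery entries above concrete, a hedged sketch; the function names here (get_capabilities/1, recommend_providers/1) are hypothetical illustrations of the features listed:

```elixir
# Inspect what a provider's API offers, independent of any model:
{:ok, caps} = ExLLM.ProviderCapabilities.get_capabilities(:openai)  # hypothetical name

# Ask for a provider recommendation from required/preferred features:
ExLLM.ProviderCapabilities.recommend_providers(%{                   # hypothetical name
  required: [:chat, :streaming],
  preferred: [:embeddings]
})
```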

Changed

  • BREAKING: Model configuration moved from hardcoded maps to external YAML files
    • All providers now use ExLLM.ModelConfig for pricing and context window data
    • Default models, pricing, and context windows loaded from YAML configuration
    • Added yaml_elixir dependency for YAML parsing
  • Updated Bedrock adapter with comprehensive model support:
    • Added all latest Anthropic models (Claude 4, 3.7, 3.5 series)
    • Added Amazon Nova models (Micro, Lite, Pro, Premier)
    • Added AI21 Labs Jamba series (1.5-large, 1.5-mini, instruct)
    • Added Cohere Command R series (R, R+)
    • Added DeepSeek R1 model
    • Added Meta Llama 4 and 3.x series models
    • Added Mistral Pixtral Large 2025-02
    • Added Writer Palmyra X4 and X5 models
    • Changed default model from "claude-3-sonnet" to "nova-lite" for cost efficiency
  • Updated pricing data for all Bedrock providers with per-1M token rates
  • Updated context window sizes for all new Bedrock models
  • Enhanced streaming support for all new providers (Writer, DeepSeek)
  • All adapters now use ModelConfig for consistent default model retrieval
  • BREAKING: Refactored ExLLM.Adapters.OpenAICompatible base adapter
    • Extracted common helper functions (format_model_name/1, default_model_transformer/2) as public module functions
    • Simplified adapter implementations by removing duplicate code
    • Added ModelLoader integration to base adapter for consistent dynamic model loading
    • Added filter_model/1 and parse_model/1 callbacks for customizing model parsing
  • Made Instructor a required dependency instead of optional
  • OpenAI default model changed to gpt-4.1-nano
  • Instructor now uses dynamic default models from YAML configs
  • Example app no longer hardcodes model names

Fixed

  • Anthropic models API fetch now correctly parses response structure (uses data field instead of models)
  • Python model fetch script updated to handle Anthropic's API response format
  • OpenRouter pricing parser now handles string values correctly
  • Groq adapter compilation warnings for undefined callbacks
  • DateTime serialization in MessageFormatter for session persistence
  • OpenAI adapter streaming termination handling
  • JSON double-encoding issue in HTTPClient
  • Token field name standardization across adapters (input_tokens/output_tokens)
  • Instructor integration API parameter passing
  • Context management module reference errors in example app
  • Function calling demo error handling with string keys
  • Streaming chat demo now shows token usage and cost estimates

Improved

  • Code organization with shared modules to eliminate duplication:
    • Created ExLLM.Adapters.Shared.Validation for API key validation
    • All adapters now use ModelUtils.format_model_name for consistent formatting
    • All adapters now use ConfigHelper.ensure_default_model for default models
    • Test files updated to use TestHelpers consistently
  • Example app enhancements:
    • Session management shows full conversation history
    • Function calling demo clearly shows available tools
    • Advanced features demo now has real implementations
    • Cost formatting uses decimal notation instead of scientific

Removed

  • Removed hardcoded model names from adapters
  • Removed model_capabilities.ex.bak backup file
  • Removed DUPLICATE_CODE_ANALYSIS.md after completing all refactoring

[0.2.0] - 2025-05-25

Added

  • OpenAI adapter with GPT-4 and GPT-3.5 support
  • Ollama adapter for local model inference
  • AWS Bedrock adapter with full multi-provider support (Anthropic, Amazon Titan, Meta Llama, Cohere, AI21, Mistral)
    • Complete AWS credential chain support (environment vars, profiles, instance metadata, ECS task roles)
    • Provider-specific request/response formatting
    • Native streaming support
    • Dynamic model listing via AWS Bedrock API
  • Google Gemini adapter with Pro, Ultra, and Nano models
  • Context management functionality to automatically handle LLM context windows
  • ExLLM.Context module with the following features:
    • Automatic message truncation to fit within model context windows
    • Multiple truncation strategies (sliding_window, smart)
    • Context window validation
    • Token estimation and statistics
    • Model-specific context window sizes
  • Session management functionality for conversation state tracking
  • ExLLM.Session module with the following features:
    • Conversation state management
    • Message history tracking
    • Token usage tracking
    • Session persistence (save/load)
    • Export to markdown/JSON formats
  • Local model support via Bumblebee integration
  • ExLLM.Adapters.Local with the following features:
    • Support for Phi-2, Llama 2, Mistral, GPT-Neo, and Flan-T5
    • Hardware acceleration (Metal, CUDA, ROCm, CPU)
    • Model lifecycle management with ModelLoader GenServer
    • Zero-cost inference (no API fees)
    • Privacy-preserving local execution
  • New public API functions in main ExLLM module:
    • Context management: prepare_messages/2, validate_context/2, context_window_size/2, context_stats/1
    • Session management: new_session/2, chat_with_session/2, save_session/2, load_session/1, etc.
  • Automatic context management in chat/3 and stream_chat/3
  • Optional dependencies (Bumblebee, Nx, EXLA) for local model support
  • Application supervisor for managing ModelLoader lifecycle
  • Comprehensive test coverage for all new features
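A hedged sketch tying the new session API together; beyond the function names listed above, the return shapes and option names are assumptions:

```elixir
session = ExLLM.new_session(:anthropic, name: "support-chat")

# Assumed return shape: the response plus the updated session state.
{:ok, {response, session}} = ExLLM.chat_with_session(session, "What is OTP?")
IO.puts(response.content)

:ok = ExLLM.save_session(session, "support-chat.json")
{:ok, session} = ExLLM.load_session("support-chat.json")
```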

Changed

  • Updated chat/3 and stream_chat/3 to automatically apply context truncation
  • Enhanced documentation with context management and session examples
  • ExLLM is now a comprehensive all-in-one solution including cost tracking, context management, and session handling

[0.1.0] - 2025-05-24

Added

  • Initial release with unified LLM interface
  • Support for Anthropic Claude models
  • Streaming support via Server-Sent Events
  • Integrated cost tracking and calculation
  • Token estimation functionality
  • Configurable provider system
  • Comprehensive error handling