ExLLM.Adapters.Local (ex_llm v0.1.0)
Local LLM adapter using Bumblebee for on-device inference.
This adapter enables running language models locally using Bumblebee and EXLA/EMLX backends. It supports Apple Silicon (via EMLX), NVIDIA GPUs (via CUDA), and CPU inference.
Configuration
The local adapter doesn't require API keys but may need backend configuration:
# For Apple Silicon (automatic detection)
messages = [%{role: "user", content: "Hello, how are you?"}]
{:ok, response} = ExLLM.chat(:local, messages)
# With a specific model
{:ok, response} = ExLLM.chat(:local, messages, model: "microsoft/phi-2")
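If automatic detection does not pick the accelerator you want, you can set the Nx default backend explicitly in your application config. This is a minimal sketch, assuming the emlx and exla packages are installed as dependencies; the backend module names (EMLX.Backend, EXLA.Backend) come from those packages, not from ExLLM itself:
# config/config.exs
# Apple Silicon (assumes the emlx package is installed)
config :nx, default_backend: EMLX.Backend
# or, for NVIDIA GPU / CPU inference (assumes the exla package is installed):
config :nx, default_backend: EXLA.Backend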
Available Models
- microsoft/phi-2 - Phi-2 (2.7B) - Default
- meta-llama/Llama-2-7b-hf - Llama 2 (7B)
- mistralai/Mistral-7B-v0.1 - Mistral (7B)
- EleutherAI/gpt-neo-1.3B - GPT-Neo (1.3B)
- google/flan-t5-base - Flan-T5 Base
Features
- On-device inference with no API calls
- Automatic hardware acceleration detection
- Support for Apple Silicon, NVIDIA GPUs, and CPUs
- Model caching for faster subsequent loads
- Streaming support for real-time generation (see the sketch below)
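A minimal streaming sketch follows. The stream_chat/3 function name and the shape of each chunk (a content field) are assumptions for illustration; check the ExLLM module documentation for the actual streaming API:
messages = [%{role: "user", content: "Summarize Bumblebee in one sentence."}]

# Hypothetical streaming call; function name and chunk shape are assumptions
{:ok, stream} = ExLLM.stream_chat(:local, messages, model: "microsoft/phi-2")

# Print each generated chunk as it arrives
stream
|> Stream.each(fn chunk -> IO.write(chunk.content) end)
|> Stream.run()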