ExLLM.Local.EXLAConfig (ex_llm v0.1.0)
Configuration module for EXLA/EMLX backend optimization.
Provides optimal settings for CPU and GPU inference, including Apple Silicon support. This module automatically detects available hardware acceleration and configures the appropriate backend for best performance.
Supported Backends
- EMLX - Apple Silicon (Metal) acceleration
- CUDA - NVIDIA GPU acceleration
- ROCm - AMD GPU acceleration
- CPU - Optimized CPU inference
Features
- Automatic hardware detection
- Mixed precision support
- Memory optimization
- Dynamic batch sizing
- Parallel execution configuration
Functions
Get information about available acceleration.
Returns a map with acceleration details including type, name, and capabilities.
Configure EXLA/EMLX backend with optimal settings based on available hardware.
Returns {:ok, backend}, where backend is :emlx, :cuda, :rocm, :cpu, or :binary.
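A caller will usually pattern match on this return value. The following is a minimal sketch of doing so, not part of ex_llm: the module name BackendResultSketch and its describe/1 helper are hypothetical, and reading :binary as the plain Nx binary backend (a fallback with no acceleration) is an assumption this page does not confirm.

```elixir
defmodule BackendResultSketch do
  # Hypothetical helper (not an ex_llm function): maps the documented
  # {:ok, backend} result to a human-readable label.
  def describe({:ok, :emlx}), do: "Apple Silicon (Metal) via EMLX"
  def describe({:ok, :cuda}), do: "NVIDIA GPU via CUDA"
  def describe({:ok, :rocm}), do: "AMD GPU via ROCm"
  def describe({:ok, :cpu}), do: "optimized CPU inference"

  # Assumption: :binary means the default Nx binary backend was kept.
  def describe({:ok, :binary}), do: "Nx binary backend (no acceleration)"
end

# Example call with a result tuple of the documented shape.
IO.puts(BackendResultSketch.describe({:ok, :cuda}))
```

Matching on each atom explicitly means an unexpected backend value raises a FunctionClauseError instead of being silently ignored.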
Determine optimal backend options based on available hardware.
Returns a map of backend configuration options.
Enable mixed precision training/inference for better performance.
Optimize memory usage for large models.
Get optimal compiler options for model serving.
Returns keyword list of options for Bumblebee serving configuration.
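The exact keys in the returned list are not given on this page. As an illustration only, the sketch below assumes a list built from the :compile and :defn_options keys that Bumblebee servings accept, and shows how a caller might override part of it with Keyword.merge/2. The module ServingOptsSketch, its functions, and the default values are hypothetical stand-ins, not the library's actual output.

```elixir
defmodule ServingOptsSketch do
  # Hypothetical stand-in for the keyword list returned by this module.
  # The values below are illustrative, not ex_llm's real defaults.
  def defaults do
    [
      compile: [batch_size: 1, sequence_length: 512],
      defn_options: [compiler: EXLA]
    ]
  end

  # Caller-supplied top-level keys replace the defaults; others are kept.
  def with_overrides(overrides) do
    Keyword.merge(defaults(), overrides)
  end
end

opts = ServingOptsSketch.with_overrides(compile: [batch_size: 4, sequence_length: 1024])
IO.inspect(opts[:compile][:batch_size])
```

Note that Keyword.merge/2 replaces the whole value stored under a top-level key, so an override for :compile has to repeat every nested option it wants to keep.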