ExLLM.Providers.LMStudio (ex_llm v0.8.1)

LM Studio adapter for local LLM inference.

This adapter provides integration with LM Studio, a desktop application for running local LLMs with an OpenAI-compatible API. LM Studio supports models from Hugging Face and provides both GUI and server modes for local inference.

Configuration

LM Studio runs a local server with OpenAI-compatible endpoints. By default, it listens on http://localhost:1234 and accepts the placeholder API key "lm-studio".

# Basic usage
{:ok, response} = ExLLM.chat(:lmstudio, messages)

# With custom endpoint
{:ok, response} = ExLLM.chat(:lmstudio, messages, 
  host: "192.168.1.100", 
  port: 8080
)
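
The messages argument uses the same role/content map format as the other ExLLM providers. A minimal, self-contained sketch (the prompt text is illustrative, and the response field access assumes the standard ExLLM response struct):

# Hypothetical prompt; assumes the LM Studio server is running with a model loaded
messages = [
  %{role: "system", content: "You are a helpful assistant."},
  %{role: "user", content: "Summarize what LM Studio does in one sentence."}
]

{:ok, response} = ExLLM.chat(:lmstudio, messages)

# `response.content` assumes the standard ExLLM response struct; adjust if your
# version returns a different shape.
IO.puts(response.content)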

Features

  • OpenAI-compatible API (/v1/chat/completions, /v1/models, /v1/embeddings)
  • Native LM Studio REST API (/api/v0/*) with enhanced model information
  • Model loading status and quantization details
  • TTL (Time-To-Live) parameter for automatic model unloading
  • Support for both llama.cpp and MLX engines on Apple Silicon
  • Streaming chat completions (see the sketch after this list)
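
The following is a minimal streaming sketch. It assumes ExLLM exposes a stream_chat entry point that returns a stream of chunks carrying incremental text in a content field; check the top-level ExLLM documentation for the exact streaming API in your version.

# Hedged sketch: streaming a chat completion from LM Studio.
# Assumes ExLLM.stream_chat/2 returns {:ok, stream} of chunk structs whose
# :content field holds the incremental text.
{:ok, stream} = ExLLM.stream_chat(:lmstudio, messages)

stream
|> Stream.each(fn chunk -> IO.write(chunk.content || "") end)
|> Stream.run()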

Requirements

  1. Install LM Studio from https://lmstudio.ai
  2. Download and load at least one model in LM Studio
  3. Start the local server (by default, http://localhost:1234)
  4. Ensure the server is running when using this adapter
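
Before calling the adapter, you can confirm the server is reachable by hitting the OpenAI-compatible /v1/models endpoint. A sketch using the Req HTTP client is shown below (any HTTP client, or curl, works equally well):

# Quick reachability check against the default LM Studio server.
# Assumes the Req library is available in your project.
case Req.get("http://localhost:1234/v1/models") do
  {:ok, %Req.Response{status: 200, body: body}} ->
    IO.inspect(body["data"], label: "available models")

  {:ok, %Req.Response{status: status}} ->
    IO.puts("LM Studio responded with unexpected status #{status}")

  {:error, reason} ->
    IO.puts("LM Studio server not reachable: #{inspect(reason)}")
end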

API Endpoints

This adapter uses both OpenAI-compatible and native LM Studio endpoints:

  • OpenAI Compatible: /v1/chat/completions, /v1/models, /v1/embeddings
  • Native API: /api/v0/models, /api/v0/chat/completions (enhanced features)

The native API provides additional information like model loading status, quantization details, architecture information, and performance metrics.
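
As an illustration, the native models endpoint can be queried directly. The field names used below (such as "state" and "quantization") are assumptions about the /api/v0/models response shape and may differ between LM Studio versions; they are shown only to indicate the kind of metadata the native API returns.

# Hedged sketch: reading enhanced model metadata from the native API.
# Assumes the Req library is available; field names are assumptions.
{:ok, %Req.Response{status: 200, body: %{"data" => models}}} =
  Req.get("http://localhost:1234/api/v0/models")

for model <- models do
  IO.puts("#{model["id"]}: state=#{model["state"]} quant=#{model["quantization"]}")
end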