ReqLLM.Providers.VLLM
(ReqLLM v1.12.0)
vLLM provider – self-hosted OpenAI-compatible Chat Completions API.
Implementation
This provider uses ReqLLM's built-in OpenAI-style encoding and decoding defaults. Because vLLM serves an OpenAI-compatible API, no custom request or response handling is needed.
Self-Hosted Configuration
vLLM is a self-hosted inference server. Users must:
- Deploy a vLLM service (via pip install, Docker, or other methods)
- Configure the model to serve (e.g., --served-model-name my-model)
- Set the base_url to point to their vLLM instance
Each vLLM instance typically serves a single model, so deployments with several models usually run them on different ports. Use the :base_url option per request, or configure model entries with their specific URLs.
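As an illustration of routing several models to their own instances, the sketch below keeps a lookup table from served model name to base URL and turns it into per-request options. The module name `VLLMEndpoints`, the hosts, the ports, and the fallback URL are all hypothetical; the fallback assumes vLLM's default port 8000. Only the `:base_url` option and the `vllm:` model prefix come from this page.

```elixir
defmodule VLLMEndpoints do
  # Hypothetical table: served model name => base URL of the instance serving it.
  # Hosts and ports are examples only.
  @endpoints %{
    "llama-3" => "http://my-server:8001/v1",
    "mistral-7b" => "http://my-server:8002/v1"
  }

  # Assumed fallback: a local vLLM server on its default port 8000.
  @default_base_url "http://localhost:8000/v1"

  # Builds the ReqLLM model spec string, e.g. "vllm:llama-3".
  def model_spec(model), do: "vllm:" <> model

  # Returns per-request options carrying the :base_url for the given model.
  def opts_for(model) do
    [base_url: Map.get(@endpoints, model, @default_base_url)]
  end
end

# Usage (needs ReqLLM and a running vLLM instance, so it is not executed here):
# ReqLLM.generate_text(VLLMEndpoints.model_spec("llama-3"), "Hello!",
#   VLLMEndpoints.opts_for("llama-3"))
```

Keeping the table in one module means adding an instance only needs one new entry; the call sites stay unchanged.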
Authentication
By default, this provider reads the API key from the OPENAI_API_KEY environment variable. A value must be present, but vLLM typically does not validate it unless the server was started with an API key. Set any non-empty value if authentication is not configured on your vLLM server.
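A minimal sketch of the "any non-empty value" rule, for scripts that do not use a `.env` file: it sets a placeholder `OPENAI_API_KEY` only when the variable is unset or empty, so a real key (needed if the server was started with an API key) is left alone. The module name `VLLMAuth` is hypothetical, and the sketch assumes ReqLLM reads the variable when the request is made rather than only at application start.

```elixir
defmodule VLLMAuth do
  # Environment variable the vLLM provider reads by default.
  @env_key "OPENAI_API_KEY"

  # Hypothetical helper: sets a placeholder key only when none is configured,
  # so an existing real key is never overwritten. Returns the effective value.
  def ensure_placeholder_key(placeholder \\ "any-value-for-vllm") do
    case System.get_env(@env_key) do
      value when value in [nil, ""] ->
        System.put_env(@env_key, placeholder)
        placeholder

      value ->
        value
    end
  end
end

# Call once before the first request, e.g. at the top of a script:
# VLLMAuth.ensure_placeholder_key()
```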
Configuration
```bash
# Add to .env file (automatically loaded)
OPENAI_API_KEY=any-value-for-vllm
```
Examples
```elixir
# Basic usage with default localhost
ReqLLM.generate_text("vllm:my-local-model", "Hello!")

# With custom base_url for a specific vLLM instance
ReqLLM.generate_text("vllm:llama-3", "Hello!",
  base_url: "http://my-server:8001/v1"
)

# Streaming
ReqLLM.stream_text("vllm:mistral-7b", "Tell me a story")
|> Enum.each(&IO.write/1)
```
Functions
Default implementation of attach/3.
Sets up Bearer token authentication and standard pipeline steps.
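For reference, the Bearer scheme amounts to a single `authorization` header. The sketch below builds it with plain tuples instead of a Req request, so it does not show the other pipeline steps; `BearerAuthSketch` and its functions are hypothetical names, not part of ReqLLM.

```elixir
defmodule BearerAuthSketch do
  # Builds the Authorization header expected by OpenAI-compatible servers.
  def auth_header(api_key) when is_binary(api_key) and api_key != "" do
    {"authorization", "Bearer " <> api_key}
  end

  # Adds the header to a list of {name, value} headers, replacing any
  # existing "authorization" entry in place or appending it otherwise.
  def put_auth(headers, api_key) do
    {name, _value} = header = auth_header(api_key)
    List.keystore(headers, name, 0, header)
  end
end
```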
Default implementation of attach_stream/4.
Builds complete streaming requests using OpenAI-compatible format.
Default implementation of build_body/1.
Builds request body using OpenAI-compatible format for chat and embedding operations.
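The OpenAI-compatible request bodies are plain maps before JSON encoding. This hypothetical `OpenAIBodySketch` module shows the general shape only: a Chat Completions body with `model` and role/content `messages`, and an Embeddings body with `model` and `input`. Which optional fields ReqLLM actually forwards is not covered here; `temperature` and `max_tokens` are standard OpenAI parameters used as examples, and `stream` is the flag a streaming request sets to `true`.

```elixir
defmodule OpenAIBodySketch do
  # Chat Completions body: the served model name plus role/content messages.
  # Optional parameters are added only when given; :stream defaults to false.
  def chat_body(model, messages, opts \\ []) do
    %{model: model, messages: Enum.map(messages, &to_message/1)}
    |> maybe_put(:temperature, Keyword.get(opts, :temperature))
    |> maybe_put(:max_tokens, Keyword.get(opts, :max_tokens))
    |> Map.put(:stream, Keyword.get(opts, :stream, false))
  end

  # Embeddings body: the model name plus one string or a list of strings.
  def embedding_body(model, input), do: %{model: model, input: input}

  # Accepts {role, content} pairs such as {:user, "Hello!"}.
  defp to_message({role, content}), do: %{role: Atom.to_string(role), content: content}

  defp maybe_put(map, _key, nil), do: map
  defp maybe_put(map, key, value), do: Map.put(map, key, value)
end

# Example: a body shaped like one for a single user message "Hello!"
# OpenAIBodySketch.chat_body("my-local-model", [user: "Hello!"])
```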
Default implementation of decode_response/1.
Handles success and error responses, building a standard ReqLLM.Response.
Default implementation of decode_stream_event/2.
Decodes SSE events using OpenAI-compatible format.
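OpenAI-style streams arrive as Server-Sent Events: each event carries one or more `data:` lines with a JSON chunk, and the stream ends with `data: [DONE]`. The sketch below extracts those payloads from raw text; it is a simplified, hypothetical helper (`SSESketch`) that assumes whole events are present in the input rather than handling events split across network chunks, and it leaves JSON decoding out so it has no dependencies.

```elixir
defmodule SSESketch do
  # Returns the data payloads of the complete events in `raw`, in order,
  # stopping at the OpenAI-style "[DONE]" sentinel.
  def data_payloads(raw) do
    raw
    |> String.split(~r/\r?\n\r?\n/, trim: true)
    |> Enum.map(&event_data/1)
    |> Enum.reject(&is_nil/1)
    |> Enum.take_while(&(&1 != "[DONE]"))
  end

  # Joins the "data:" lines of one event; SSE allows several per event.
  # Comment lines (starting with ":") and other fields are ignored.
  defp event_data(event) do
    lines =
      event
      |> String.split(~r/\r?\n/)
      |> Enum.flat_map(fn
        "data: " <> rest -> [rest]
        "data:" <> rest -> [rest]
        _other -> []
      end)

    if lines == [], do: nil, else: Enum.join(lines, "\n")
  end
end
```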
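Callback implementation for ReqLLM.Provider.default_env_key/0.
Returns the name of the environment variable read for the API key (OPENAI_API_KEY, as described under Authentication).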
Default implementation of encode_body/1.
Encodes request body using OpenAI-compatible format for chat and embedding operations.
Default implementation of extract_usage/2.
Extracts usage data from standard usage field in response body.
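In an OpenAI-compatible response, the `usage` object reports `prompt_tokens`, `completion_tokens`, and `total_tokens`. A minimal sketch of reading it from an already-decoded body follows; the output keys (`:input_tokens`, `:output_tokens`, `:total_tokens`) and the `UsageSketch` module name are illustrative and may differ from ReqLLM's own usage structure.

```elixir
defmodule UsageSketch do
  # Reads the OpenAI-style "usage" object from a decoded response body.
  # Output key names are illustrative, not ReqLLM's actual structure.
  # Missing counts default to 0 (embedding responses report no completion tokens).
  def extract_usage(%{"usage" => %{} = usage}) do
    {:ok,
     %{
       input_tokens: Map.get(usage, "prompt_tokens", 0),
       output_tokens: Map.get(usage, "completion_tokens", 0),
       total_tokens: Map.get(usage, "total_tokens", 0)
     }}
  end

  # Bodies without a usage object yield an error tuple instead of raising.
  def extract_usage(_body), do: {:error, :no_usage}
end
```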
Default implementation of prepare_request/4.
Handles :chat, :object, and :embedding operations using OpenAI-compatible patterns.
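Each operation ultimately targets an OpenAI-compatible endpoint under the configured base URL. The hypothetical `RequestPathSketch` mapping below sends `:chat` and `:embedding` to `/chat/completions` and `/embeddings`; routing `:object` (structured output) through Chat Completions is an assumption, not something this page states.

```elixir
defmodule RequestPathSketch do
  # OpenAI-compatible endpoint path per operation. Sending :object through
  # Chat Completions is an assumption made for this sketch.
  def path_for(:chat), do: "/chat/completions"
  def path_for(:object), do: "/chat/completions"
  def path_for(:embedding), do: "/embeddings"

  def path_for(other) do
    raise ArgumentError, "unsupported operation: #{inspect(other)}"
  end

  # Joins a base URL such as "http://my-server:8001/v1" with the path,
  # tolerating a trailing slash on the base URL.
  def url_for(base_url, operation) do
    String.trim_trailing(base_url, "/") <> path_for(operation)
  end
end
```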
Default implementation of translate_options/3.
Pass-through implementation that returns options unchanged.