Google Vertex AI
View SourceAccess Claude, Gemini, and MaaS models through Google Cloud's Vertex AI platform.
Configuration
Vertex AI uses Google Cloud OAuth2 authentication. ReqLLM uses Application Default Credentials (ADC) by default.
Application Default Credentials (Recommended)
For local development:
gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_REGION="global"
ReqLLM checks ADC sources in this order:
GOOGLE_APPLICATION_CREDENTIALSpointing to an ADC credential fileGOOGLE_APPLICATION_CREDENTIALS_JSONcontaining ADC credential JSON- The well-known gcloud ADC file, such as
~/.config/gcloud/application_default_credentials.json(honorsCLOUDSDK_CONFIG) - The Google Cloud metadata server
Supported ADC credential types include user ADC credentials from gcloud auth application-default login, service account keys, workload identity credential
configuration files, and metadata server credentials. Local ADC files of type
impersonated_service_account are not supported yet.
Service Account
Environment Variables:
GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
GOOGLE_CLOUD_PROJECT="your-project-id"
GOOGLE_CLOUD_REGION="global"
Application Config:
config :req_llm, :google_vertex,
service_account_json: "/path/to/service-account.json",
project_id: "your-project-id",
region: "global"Provider Options:
ReqLLM.generate_text(
"google_vertex:claude-sonnet-4-5@20250929",
"Hello",
provider_options: [
service_account_json: "/path/to/service-account.json",
project_id: "your-project-id",
region: "global"
]
)Model Specs
For the full model-spec workflow, see Model Specs.
Use exact Vertex model IDs from LLMDB.xyz when possible. For MaaS and other OpenAI-compatible Vertex models that are not in the registry yet, build a full explicit model spec with ReqLLM.model!/1. Some MaaS model IDs also need extra.family when the family cannot be inferred from the ID alone.
Provider Options
Passed via :provider_options keyword:
service_account_json
- Type: String (file path or JSON string) or map
- Purpose: Explicit Google Cloud service account JSON credentials
- Fallback:
config :req_llm, :google_vertex - Example:
provider_options: [service_account_json: "/path/to/credentials.json"] - Note: For normal ADC usage, prefer
GOOGLE_APPLICATION_CREDENTIALSinstead of this option.
access_token
- Type: String
- Purpose: Use an existing OAuth2 access token generated outside ReqLLM (e.g., via Goth or gcloud)
- Behavior: Bypasses the service account JSON flow and internal token management
- Fallback:
config :req_llm, :google_vertex - Example:
provider_options: [access_token: "your-access-token"]
project_id
- Type: String
- Purpose: Google Cloud project ID
- Fallback:
config :req_llm, :google_vertex, thenGOOGLE_CLOUD_PROJECTenv var - Example:
provider_options: [project_id: "my-project-123"] - Required: Yes
region
- Type: String
- Default:
"global" - Purpose: GCP region for Vertex AI endpoint
- Fallback:
config :req_llm, :google_vertex, thenGOOGLE_CLOUD_REGIONenv var - Example:
provider_options: [region: "us-central1"] - Note: Use
"global"for newest models, specific regions for regional deployment
additional_model_request_fields
- Type: Map
- Purpose: Model-specific request fields (e.g., thinking configuration)
- Example:
provider_options: [ additional_model_request_fields: %{ thinking: %{type: "enabled", budget_tokens: 4096} } ]
labels
- Type: Map of strings to strings
- Purpose: Custom metadata labels attached to the request. Used by Google Cloud for billing and reporting — labels are filterable in billing reports and BigQuery exports.
- Constraints: Up to 64 labels per request; keys 1–63 chars starting with a lowercase letter; keys and values may only contain lowercase letters, numbers, underscores, and dashes.
- Availability: Vertex AI only — the direct Gemini API (
generativelanguage.googleapis.com) does not support this field. - Example:
provider_options: [ labels: %{ "team" => "engineering", "environment" => "production", "use_case" => "contract_analysis" } ] - Reference: Custom metadata labels
Claude-Specific Options
Vertex AI supports the same Claude options as native Anthropic:
anthropic_top_k
- Type:
1..40 - Purpose: Sample from top K options per token
- Example:
provider_options: [anthropic_top_k: 20]
stop_sequences
- Type: List of strings
- Purpose: Custom stop sequences
- Example:
provider_options: [stop_sequences: ["END", "STOP"]]
anthropic_metadata
- Type: Map
- Purpose: Request metadata for tracking
- Example:
provider_options: [anthropic_metadata: %{user_id: "123"}]
thinking
- Type: Map
- Purpose: Enable extended thinking/reasoning
- Example:
provider_options: [thinking: %{type: "enabled", budget_tokens: 4096}] - Access:
ReqLLM.Response.thinking(response)
anthropic_prompt_cache
- Type: Boolean
- Purpose: Enable prompt caching
- Example:
provider_options: [anthropic_prompt_cache: true]
anthropic_prompt_cache_ttl
- Type: String (e.g.,
"1h") - Purpose: Cache TTL (default ~5min if omitted)
- Example:
provider_options: [anthropic_prompt_cache_ttl: "1h"]
Supported Models
Claude 4.5 Family
Haiku 4.5:
google_vertex:claude-haiku-4-5@20251001- Fast, cost-effective
- Full tool calling and reasoning support
Sonnet 4.5:
google_vertex:claude-sonnet-4-5@20250929- Balanced performance and capability
- Extended thinking support
Opus 4.1:
google_vertex:claude-opus-4-1@20250805- Highest capability
- Advanced reasoning
Claude 4.0 & Earlier
- Sonnet 4.0:
google_vertex:claude-sonnet-4@20250514 - Opus 4.0:
google_vertex:claude-opus-4@20250514 - Sonnet 3.7:
google_vertex:claude-3-7-sonnet@20250219 - Sonnet 3.5 v2:
google_vertex:claude-3-5-sonnet@20241022 - Haiku 3.5:
google_vertex:claude-3-5-haiku@20241022
Model ID Format
Vertex uses the @ symbol for versioning:
- Format:
claude-{tier}-{version}@{date} - Example:
claude-sonnet-4-5@20250929
Wire Format Notes
- Authentication: OAuth2 with service account tokens (auto-refreshed)
- Endpoint: Model-specific paths under
aiplatform.googleapis.com - API: Uses Anthropic's raw message format (compatible with native API)
- Streaming: Standard Server-Sent Events (SSE)
- Region routing: Global endpoint for newest models, regional for specific deployments
All differences handled automatically by ReqLLM.