ReqLLM.Providers.Google (ReqLLM v1.12.0)

Google Gemini provider – built on the OpenAI baseline defaults with Gemini-specific customizations.

Implementation

Uses built-in defaults with custom encoding/decoding to translate between OpenAI format and Gemini API format.

Google-Specific Extensions

Beyond standard OpenAI parameters, Google supports:

  • google_api_version - Select API version ("v1" or "v1beta"). Defaults to "v1beta", which supports function calling (tools) and Google Search grounding. Set to "v1" only for legacy compatibility (see API Version Selection below).
  • google_safety_settings - List of safety filter configurations
  • google_candidate_count - Number of response candidates to generate (default: 1)
  • google_grounding - Enable Google Search grounding (built-in web search). Requires google_api_version: "v1beta"
  • google_thinking_budget - Thinking token budget for Gemini 2.5 models (cannot be combined with google_thinking_level)
  • google_thinking_level - Thinking level for Gemini 3+ models (:minimal, :low, :medium, :high). Cannot be combined with google_thinking_budget
  • cached_content - Reference to cached content, which can reduce the cost of reused input by up to 90% (see Context Caching below)
  • dimensions - Number of dimensions for embedding vectors
  • task_type - Task type for embeddings (e.g., RETRIEVAL_QUERY)
  • response_modalities - Control output modalities for image generation (e.g., ["IMAGE"] for image-only)

See provider_schema/0 for the complete Google-specific schema and ReqLLM.Provider.Options for inherited OpenAI parameters.
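The two thinking options listed above are mutually exclusive, and a caller may want to catch that before sending a request. The following is a minimal sketch of such a check; `ThinkingOptionsSketch` and its `check/1` function are hypothetical names for illustration and are not part of ReqLLM, whose own validation may behave differently.

```elixir
defmodule ThinkingOptionsSketch do
  # Hypothetical helper, not part of ReqLLM. It mirrors the documented rule
  # that google_thinking_budget and google_thinking_level cannot be combined.

  # Levels as listed in the option description above.
  @levels [:minimal, :low, :medium, :high]

  def check(provider_options) when is_list(provider_options) do
    budget = Keyword.get(provider_options, :google_thinking_budget)
    level = Keyword.get(provider_options, :google_thinking_level)

    cond do
      # Both options present: reject, since they target different model families.
      budget != nil and level != nil ->
        {:error, :thinking_budget_and_level_are_exclusive}

      # A level was given but is not one of the documented atoms.
      level != nil and level not in @levels ->
        {:error, {:unknown_thinking_level, level}}

      true ->
        {:ok, provider_options}
    end
  end
end
```

If the check returns `{:ok, opts}`, the same keyword list could then be passed as `provider_options:` to `ReqLLM.generate_text/3`.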

Context Caching

Gemini models support explicit context caching to reduce costs by up to 90% when reusing large amounts of content:

# Create a cache with large context
{:ok, cache} = ReqLLM.Providers.Google.CachedContent.create(
  provider: :google,
  model: "gemini-2.5-flash",
  api_key: System.get_env("GOOGLE_API_KEY"),
  contents: [%{role: "user", parts: [%{text: large_document}]}],
  system_instruction: "You are a helpful assistant.",
  ttl: "3600s"
)

# Use the cache in requests (90% discount on cached tokens!)
{:ok, response} = ReqLLM.generate_text(
  "google:gemini-2.5-flash",
  "Question about the document?",
  provider_options: [cached_content: cache.name]
)

# Check token usage - note the cached_tokens field
IO.inspect(response.usage)
# %{input_tokens: 50, cached_tokens: 10000, output_tokens: 100, ...}

See ReqLLM.Providers.Google.CachedContent for full API documentation.
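To make the savings concrete, the arithmetic can be sketched with a small helper. This is only an illustration under stated assumptions: `CacheSavingsSketch` is a hypothetical module, it assumes `cached_tokens` is reported separately from `input_tokens` (as in the sample usage map above), and it assumes cached tokens are billed at a flat 10% of the normal input rate. Actual pricing depends on the model and on Google's current rates.

```elixir
defmodule CacheSavingsSketch do
  # Hypothetical helper, not part of ReqLLM.
  # Returns the number of input tokens billed at the full rate once the
  # assumed discount on cached tokens has been applied.
  def billable_input_tokens(usage, discount_percent \\ 90) do
    input = Map.get(usage, :input_tokens, 0)
    cached = Map.get(usage, :cached_tokens, 0)

    # Integer arithmetic avoids floating-point rounding surprises.
    input + div(cached * (100 - discount_percent), 100)
  end
end

usage = %{input_tokens: 50, cached_tokens: 10_000, output_tokens: 100}
IO.inspect(CacheSavingsSketch.billable_input_tokens(usage))
# 1050
```

Under these assumptions the 10,050 input tokens in the example cost the same as 1,050 uncached tokens.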

API Version Selection

The provider defaults to Google's v1beta API, which supports all features, including function calling (tools) and Google Search grounding. For legacy compatibility you can force v1 by setting google_api_version: "v1", but v1 supports neither function calling nor grounding. With the default version, grounding needs no extra version option:

ReqLLM.generate_text(
  "google:gemini-2.5-flash",
  "What are today's tech headlines?",
  provider_options: [
    google_grounding: %{enable: true}
  ]
)

Note: Setting google_api_version: "v1" with function calling (tools) or grounding will return an error.
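The compatibility rule in the note can be expressed as a small pre-flight check. The sketch below is hypothetical: `ApiVersionSketch.check/1` is not a ReqLLM function, the error atom is invented for illustration, and the check assumes the v1beta default described in this section.

```elixir
defmodule ApiVersionSketch do
  # Hypothetical pre-flight check, not ReqLLM's actual validation logic.
  # Takes the same keyword options a caller would pass to generate_text/3.
  def check(opts) when is_list(opts) do
    provider_options = Keyword.get(opts, :provider_options, [])

    # Assumes "v1beta" is used when no version is given.
    version = Keyword.get(provider_options, :google_api_version, "v1beta")
    grounding? = Keyword.has_key?(provider_options, :google_grounding)
    tools? = Keyword.get(opts, :tools, []) != []

    if version == "v1" and (grounding? or tools?) do
      {:error, :feature_requires_v1beta}
    else
      :ok
    end
  end
end
```

A caller could run this before issuing the request and fall back to the default version when it returns an error.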

Configuration

# Add to .env file (automatically loaded)
GOOGLE_API_KEY=AIza...

Functions

attach(request, model_input, user_opts)

Default implementation of attach/3.

Sets up Bearer token authentication and standard pipeline steps.

attach_stream(model, context, opts, finch_name)

Default implementation of attach_stream/4.

Builds complete streaming requests using OpenAI-compatible format.

base_url()

build_body(request)

Default implementation of build_body/1.

Builds request body using OpenAI-compatible format for chat and embedding operations.

decode_response(request_response)

Default implementation of decode_response/1.

Handles success/error responses with standard ReqLLM.Response creation.

decode_stream_event(event, model)

Default implementation of decode_stream_event/2.

Decodes SSE events using OpenAI-compatible format.

default_base_url()

default_env_key()

Callback implementation for ReqLLM.Provider.default_env_key/0.

encode_body(request)

Default implementation of encode_body/1.

Encodes request body using OpenAI-compatible format for chat and embedding operations.

extract_usage(body, model)

Default implementation of extract_usage/2.

Extracts usage data from standard usage field in response body.

pre_validate_options(operation, model, opts)

prepare_request(operation, model_spec, input, opts)

Custom prepare_request for chat operations to use Google's specific endpoints.

Uses Google's :generateContent and :streamGenerateContent endpoints instead of the standard OpenAI /chat/completions endpoint.
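As a rough illustration of how these endpoints differ from /chat/completions, the sketch below builds URLs in the shape used by Google's public Gemini REST API. `EndpointSketch` is a hypothetical module, the host is an assumption (the value returned by `default_base_url/0` is not shown on this page), and the `alt=sse` query parameter for streaming is likewise an assumption about how server-sent events are requested.

```elixir
defmodule EndpointSketch do
  # Hypothetical helper, not part of ReqLLM.
  # Assumed host of Google's Gemini REST API.
  @base "https://generativelanguage.googleapis.com"

  def url(model, version \\ "v1beta", stream? \\ false) do
    # The action is appended to the model name after a colon,
    # rather than being a separate path segment.
    action =
      if stream? do
        "streamGenerateContent?alt=sse"
      else
        "generateContent"
      end

    "#{@base}/#{version}/models/#{model}:#{action}"
  end
end

IO.puts(EndpointSketch.url("gemini-2.5-flash"))
# https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent
```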

provider_extended_generation_schema()

provider_id()

provider_schema()

supported_provider_options()

translate_options(operation, model, opts)

Default implementation of translate_options/3.

Pass-through implementation that returns options unchanged.