ALLM.Providers.Gemini (allm v0.3.1)


Google Gemini provider adapter (Layer B: bundled adapters).

This module ships the non-streaming ALLM.Adapter callback set against the Generative Language API at https://generativelanguage.googleapis.com/v1beta. Streaming (ALLM.StreamAdapter) is exposed through stream/2; tools, vision, and image output land in Phases 16.3, 16.4, and 16.5 respectively.

This module implements:

  • generate/2: fires POST /v1beta/models/{model}:generateContent via Req, wrapped in ALLM.Retry.run/3 with the default retry policy. Gemini's 429, 500, 503, and 504 responses are already in the default retryable set, so no Gemini-specific wrapper is needed.
  • prepare_request/2: returns an unfired %Req.Request{} with the API key injected as the x-goog-api-key header.
  • translate_options/2: identity. Gemini's camelCase rename and generationConfig nesting happen inside to_generation_config/1 at request-build time.

Single translator

Gemini exposes one chat endpoint, generateContent, that covers both text and image generation; image generation is selected by toggling generationConfig.responseModalities. The request builder (to_gemini_request_body/2) is therefore a single function shared by the chat adapter and the image adapter, which eliminates the PHASE_10 dual-translator drift class entirely.

Auth header

The API key flows on the x-goog-api-key request header, not the documented ?key=... query parameter. Both forms are equivalent server-side; the header form keeps the API key out of HTTP access logs and metrics. The same header is reused for the streaming endpoint.
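To make the header-based auth concrete, here is a minimal sketch that builds the URL and header list for both the non-streaming and streaming endpoints. GeminiSketch.Endpoint is a hypothetical module (not ALLM's code), and it returns plain data rather than a %Req.Request{}; it only illustrates that the key never enters the URL and that the two modes differ only in path.

```elixir
defmodule GeminiSketch.Endpoint do
  # Hypothetical sketch, not ALLM's implementation. The API key travels only
  # in the x-goog-api-key header, so the URL (which may end up in access logs)
  # never contains it. Streaming and non-streaming differ only in the path.
  @base "https://generativelanguage.googleapis.com/v1beta"

  def build(model, api_key, mode \\ :generate) do
    path =
      case mode do
        :generate -> "/models/#{model}:generateContent"
        # ?alt=sse is the only query parameter; auth still uses the header.
        :stream -> "/models/#{model}:streamGenerateContent?alt=sse"
      end

    {@base <> path, [{"x-goog-api-key", api_key}]}
  end
end
```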

Wire field map

| Concern | Gemini wire field |
| --- | --- |
| Endpoint host | https://generativelanguage.googleapis.com/v1beta |
| Method (chat, non-streaming) | POST /models/{model}:generateContent |
| Auth header | x-goog-api-key: $key |
| Roles | user, model (:assistant → "model") |
| System prompt | top-level systemInstruction.parts[].text |
| Generation params | nested under generationConfig.{maxOutputTokens, temperature, topP, topK, stopSequences, responseMimeType, responseSchema} |
| finish_reason | candidates[0].finishReason (UPPER_SNAKE_CASE; see the mapping table below) |
| Prompt-blocked path | promptFeedback.blockReason (top-level, no candidates) |
| Usage location | usageMetadata.{promptTokenCount, candidatesTokenCount, totalTokenCount} |
| Error envelope | {"error": {"code", "status", "message"}} |

Finish-reason mapping

Gemini's enum has 19 documented values. ALLM's Response.finish_reason is a closed 6-atom union; the raw string is preserved at Response.raw_finish_reason for non-canonical rows.

| Gemini finishReason | ALLM Response.finish_reason |
| --- | --- |
| STOP | :stop |
| MAX_TOKENS | :length |
| SAFETY | :content_filter |
| RECITATION | :content_filter |
| LANGUAGE | :content_filter |
| BLOCKLIST | :content_filter |
| PROHIBITED_CONTENT | :content_filter |
| SPII | :content_filter |
| IMAGE_SAFETY | :content_filter |
| IMAGE_PROHIBITED_CONTENT | :content_filter |
| IMAGE_RECITATION | :content_filter |
| IMAGE_OTHER | :other |
| NO_IMAGE | :other |
| MALFORMED_FUNCTION_CALL | :error |
| UNEXPECTED_TOOL_CALL | :error |
| TOO_MANY_TOOL_CALLS | :error |
| MISSING_THOUGHT_SIGNATURE | :error |
| MALFORMED_RESPONSE | :error |
| OTHER / FINISH_REASON_UNSPECIFIED / unknown | :other |
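The mapping table reads naturally as a multi-clause function. The sketch below is a hypothetical module (GeminiSketch.FinishReason is not part of ALLM) that mirrors the table and the {atom, raw_string_or_nil} return shape described for parse_finish_reason/1; treat it as an illustration of the mapping, not the adapter's source.

```elixir
defmodule GeminiSketch.FinishReason do
  # Hypothetical illustration of the finishReason table; not ALLM's source.

  @content_filter ~w(SAFETY RECITATION LANGUAGE BLOCKLIST PROHIBITED_CONTENT SPII
                     IMAGE_SAFETY IMAGE_PROHIBITED_CONTENT IMAGE_RECITATION)

  @error ~w(MALFORMED_FUNCTION_CALL UNEXPECTED_TOOL_CALL TOO_MANY_TOOL_CALLS
            MISSING_THOUGHT_SIGNATURE MALFORMED_RESPONSE)

  # Returns {canonical_atom, raw_string_or_nil}.
  def parse(nil), do: {nil, nil}
  # Natural completion: the raw string is dropped.
  def parse("STOP"), do: {:stop, nil}
  def parse("MAX_TOKENS" = raw), do: {:length, raw}
  def parse(raw) when raw in @content_filter, do: {:content_filter, raw}
  def parse(raw) when raw in @error, do: {:error, raw}
  # IMAGE_OTHER, NO_IMAGE, OTHER, FINISH_REASON_UNSPECIFIED, and any unknown value.
  def parse(raw) when is_binary(raw), do: {:other, raw}
end
```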

Empty-candidates branches (Decisions #9 + #10)

  • promptFeedback.blockReason with empty candidates → {:ok, %Response{finish_reason: :content_filter, content: ""}}. The block reason is preserved at metadata.error.reason = "blocked:<BLOCK_REASON>".
  • Empty candidates with no promptFeedback.blockReason → {:error, %AdapterError{reason: :malformed_response}}.
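The two branches can be sketched as pattern matches on the decoded JSON body. This hypothetical GeminiSketch.Candidates module uses plain maps in place of %Response{} and %AdapterError{}, and its {:candidates, list} result is an assumed placeholder for the normal decoding path.

```elixir
defmodule GeminiSketch.Candidates do
  # Hypothetical sketch of the empty-candidates branches. Plain maps stand in
  # for ALLM's %Response{} and %AdapterError{} structs.

  # Normal path (placeholder): at least one candidate is present.
  def classify(%{"candidates" => [_ | _] = candidates}), do: {:candidates, candidates}

  # Prompt was blocked: a successful call whose finish reason is a content filter.
  def classify(%{"promptFeedback" => %{"blockReason" => reason}}) when is_binary(reason) do
    {:ok,
     %{
       finish_reason: :content_filter,
       content: "",
       metadata: %{error: %{reason: "blocked:" <> reason}}
     }}
  end

  # No candidates and no block reason: the body is not a usable response.
  def classify(_body), do: {:error, %{reason: :malformed_response}}
end
```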

Usage decoding

usageMetadata.candidatesTokenCount is canonical; usageMetadata.responseTokenCount is read as a defensive fallback when candidatesTokenCount is absent. If both are missing, Usage.output_tokens is left as nil and a single Logger.warning/1 is emitted for that call.
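A minimal sketch of the fallback order follows. GeminiSketch.Usage is hypothetical, returns a plain map instead of ALLM's %Usage{} struct, and the input_tokens and total_tokens keys are assumptions (only output_tokens is named above).

```elixir
defmodule GeminiSketch.Usage do
  # Hypothetical sketch of the usage-decoding rule; not ALLM's source.
  # input_tokens / total_tokens key names are assumptions.
  require Logger

  def decode(%{} = usage_metadata) do
    output =
      case usage_metadata do
        # Canonical field wins when present.
        %{"candidatesTokenCount" => n} when is_integer(n) ->
          n

        # Defensive fallback when the canonical field is absent.
        %{"responseTokenCount" => n} when is_integer(n) ->
          n

        _ ->
          # Logged once per decode, i.e. once per call.
          Logger.warning("Gemini usageMetadata carried no output token count")
          nil
      end

    %{
      input_tokens: Map.get(usage_metadata, "promptTokenCount"),
      output_tokens: output,
      total_tokens: Map.get(usage_metadata, "totalTokenCount")
    }
  end
end
```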

Error envelope mapping

Maps Google's {error: {code, status, message}} envelope onto %AdapterError{reason:...}:

| HTTP | Google status | AdapterError.reason |
| --- | --- | --- |
| 400 | INVALID_ARGUMENT (message has no context-window marker) | :invalid_request |
| 400 | INVALID_ARGUMENT (message contains "exceeds the maximum number of tokens") | :context_length_exceeded |
| 401 | UNAUTHENTICATED | :authentication_failed |
| 403 | PERMISSION_DENIED | :authentication_failed |
| 404 | NOT_FOUND | :invalid_request |
| 429 | RESOURCE_EXHAUSTED | :rate_limited |
| 500 | INTERNAL | :provider_unavailable |
| 503 | UNAVAILABLE | :provider_unavailable |
| 504 | DEADLINE_EXCEEDED | :provider_unavailable |
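The table can be sketched as a function of the HTTP status and the decoded error envelope. This hypothetical GeminiSketch.Errors module keys on the HTTP code plus the documented message substring; it does not inspect the Google status string, and statuses outside the table are deliberately left unhandled (they raise FunctionClauseError).

```elixir
defmodule GeminiSketch.Errors do
  # Hypothetical sketch of the error-envelope table; returns the atom that
  # would populate %AdapterError{reason: ...}. Statuses not in the table
  # have no clause.

  @ctx_marker "exceeds the maximum number of tokens"

  # 400 splits on the message text: context-window overflow vs. generic.
  def reason(400, %{"error" => %{"message" => msg}}) when is_binary(msg) do
    if String.contains?(msg, @ctx_marker), do: :context_length_exceeded, else: :invalid_request
  end

  def reason(400, _body), do: :invalid_request
  def reason(status, _body) when status in [401, 403], do: :authentication_failed
  def reason(404, _body), do: :invalid_request
  def reason(429, _body), do: :rate_limited
  def reason(status, _body) when status in [500, 503, 504], do: :provider_unavailable
end
```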

Retry policy

There is no Gemini-specific retry-policy wrapper. The default policy in lib/allm/retry.ex already retries HTTP 429, 500, 502, 503, and 504, as well as :timeout and :network_error. Streaming requests are never retried.
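The default retryable set can be restated as a small predicate, which also shows why Gemini's 429 / 500 / 503 / 504 need no extra wrapper. GeminiSketch.Retry and its {:http_status, code} tagging are hypothetical; the real policy lives in lib/allm/retry.ex and may represent failures differently.

```elixir
defmodule GeminiSketch.Retry do
  # Hypothetical restatement of the default retryable set; the
  # {:http_status, code} shape is an assumption, not ALLM's API.
  @retryable_statuses [429, 500, 502, 503, 504]

  def retryable?({:http_status, status}), do: status in @retryable_statuses
  def retryable?(reason) when reason in [:timeout, :network_error], do: true
  # Everything else (bad requests, auth failures, malformed bodies) is final.
  def retryable?(_other), do: false
end
```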

Key resolution

API keys are never stored on the engine. prepare_request/2 and generate/2 call ALLM.Keys.fetch!(:gemini, opts) at request-build time. The :gemini provider atom is not in ALLM.Keys's @env_var_table, so the unknown-provider fallback at lib/allm/keys.ex:189-194 supplies the environment variable name "GEMINI_API_KEY".
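One plausible shape for that fallback is sketched below, assuming it upcases the provider atom and appends _API_KEY; the text above only states the result for :gemini, so the derivation rule is an assumption. GeminiSketch.Keys is hypothetical, and its table entry is illustrative only.

```elixir
defmodule GeminiSketch.Keys do
  # Hypothetical sketch of an env-var fallback; the real logic is in
  # lib/allm/keys.ex. The table entry below is illustrative only.
  @env_var_table %{openai: "OPENAI_API_KEY"}

  def env_var(provider) when is_atom(provider) do
    # Providers missing from the table fall back to "<PROVIDER>_API_KEY".
    Map.get_lazy(@env_var_table, provider, fn ->
      (provider |> Atom.to_string() |> String.upcase()) <> "_API_KEY"
    end)
  end

  # Resolve the key from the environment at request-build time.
  def fetch(provider), do: System.get_env(env_var(provider))
end
```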

Summary

Functions

  • generate(request, opts): Execute a non-streaming generateContent request synchronously.
  • parse_finish_reason(other): Map a Gemini finishReason string to ALLM's closed Response.finish_reason enum, returning {atom, raw_string_or_nil}.
  • prepare_request(request, opts): Build an unfired %Req.Request{} with the resolved API key injected as x-goog-api-key: <key>.
  • stream(request, opts): Open an SSE stream against streamGenerateContent?alt=sse.
  • to_gemini_request_body(request, opts): Compose the JSON request body for generateContent from a canonical %Request{}. Pure function; no I/O.
  • to_gemini_tool_config(name): Translate an ALLM canonical tool_choice to Gemini's functionCallingConfig map.
  • to_gemini_tools(tools): Translate a list of canonical %ALLM.Tool{}s to Gemini's functionDeclarations shape.
  • translate_options(opts, request): Identity translator. Gemini accepts ALLM's canonical :max_tokens, :temperature, :top_p, etc.; the camelCase rename and generationConfig nesting happen in to_generation_config/1 at request-build time, not here.

Functions

generate(request, opts)

Execute a non-streaming generateContent request synchronously.

Wraps the HTTP call in ALLM.Retry.run/3 with the default retry policy. Returns {:ok, %Response{}} on 2xx success or {:error, %AdapterError{}} for every failure shape.

Empty-candidates handling (Decisions #9 + #10)

  • promptFeedback.blockReason with empty candidates → {:ok, %Response{finish_reason: :content_filter, content: ""}} (a successful HTTP response is a successful call from the adapter's perspective; the content filter is a finish reason).
  • Empty candidates with no promptFeedback.blockReason → {:error, %AdapterError{reason: :malformed_response}}.

Error reasons

| HTTP status / failure | AdapterError.reason |
| --- | --- |
| 400 (generic) | :invalid_request |
| 400 (context window exceeded) | :context_length_exceeded |
| 401 / 403 | :authentication_failed |
| 404 | :invalid_request |
| 429 | :rate_limited |
| 500 / 503 / 504 | :provider_unavailable |
| network drop | :network_error |
| malformed body | :malformed_response |

Examples

iex> ALLM.Keys.put(:gemini, "AIza-doctest-gen")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gemini-2.5-flash")
iex> {:error, %ALLM.Error.AdapterError{reason: :authentication_failed}} =
...>   ALLM.Providers.Gemini.generate(req,
...>     retry: false,
...>     adapter_opts: [
...>       plug: fn conn ->
...>         conn
...>         |> Plug.Conn.put_resp_content_type("application/json")
...>         |> Plug.Conn.resp(401, ~s({"error":{"code":401,"status":"UNAUTHENTICATED","message":"bad"}}))
...>       end
...>     ]
...>   )
iex> ALLM.Keys.delete(:gemini)
:ok

parse_finish_reason(other)

@spec parse_finish_reason(String.t() | nil) ::
  {ALLM.Response.finish_reason() | nil, String.t() | nil}

Map a Gemini finishReason string to ALLM's closed Response.finish_reason enum, returning {atom, raw_string_or_nil}.

STOP collapses to {:stop, nil} (the canonical "natural completion" row); every other row preserves the raw string at index 1 so callers can recover provider fidelity from Response.raw_finish_reason.

Examples

iex> ALLM.Providers.Gemini.parse_finish_reason("STOP")
{:stop, nil}

iex> ALLM.Providers.Gemini.parse_finish_reason("MAX_TOKENS")
{:length, "MAX_TOKENS"}

iex> ALLM.Providers.Gemini.parse_finish_reason("SAFETY")
{:content_filter, "SAFETY"}

iex> ALLM.Providers.Gemini.parse_finish_reason("OTHER")
{:other, "OTHER"}

iex> ALLM.Providers.Gemini.parse_finish_reason(nil)
{nil, nil}

prepare_request(request, opts)

@spec prepare_request(
  ALLM.Request.t(),
  keyword()
) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.AdapterError.t()}

Build an unfired %Req.Request{} with the resolved API key injected as x-goog-api-key: <key>.

Per ALLM.Keys.fetch!/2, this function raises %ALLM.Error.EngineError{reason: :missing_key} when no key resolver yields a value.

Honors opts[:request_timeout] (forwarded as Req's :receive_timeout) and opts[:adapter_opts][:endpoint] (URL host override, primarily for testing).

Examples

iex> ALLM.Keys.put(:gemini, "AIza-doctest-prep")
iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "hi"}], model: "gemini-2.5-flash")
iex> {:ok, %Req.Request{} = http} = ALLM.Providers.Gemini.prepare_request(req, [])
iex> Req.Request.get_header(http, "x-goog-api-key")
["AIza-doctest-prep"]
iex> ALLM.Keys.delete(:gemini)
:ok

stream(request, opts)

Open an SSE stream against streamGenerateContent?alt=sse.

Returns {:ok, enumerable} on success. The enumerable is lazy: the HTTP request fires on the first reduce. Returns {:error, %AdapterError{}} only for synchronous pre-flight failures. A key-resolution failure instead raises %EngineError{} directly, per the Keys.fetch!/2 contract, and is surfaced through the existing with-chain at the call site.

Per the mid-stream-error invariant in CLAUDE.md, HTTP-shaped errors observed after the consumer starts reducing are folded into a terminating {:error, _} event, so the call-site tuple stays {:ok, stream}. This includes 4xx status codes received before the first SSE event (the {:status, code} Finch frame is folded via handle_finch_payload/2).
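The folding rule can be sketched with Stream.transform/3. In this hypothetical GeminiSketch.Stream, a plain list of frames ({:status, code}, {:data, chunk}, :done) stands in for the Finch payloads, and the {:chunk, _} and {:http_status, code} tags are assumptions; the point is only that open/1 always returns {:ok, stream} and a non-2xx status surfaces later as a terminating {:error, _} event.

```elixir
defmodule GeminiSketch.Stream do
  # Hypothetical sketch; `frames` stands in for Finch payloads. Nothing is
  # evaluated until the caller reduces the returned stream.

  def open(frames) do
    stream =
      Stream.transform(frames, :open, fn
        # After a terminating event, stop consuming frames.
        _frame, :halted -> {:halt, :halted}
        # 2xx status frames emit nothing.
        {:status, code}, state when code in 200..299 -> {[], state}
        # A non-2xx status becomes the final {:error, _} event.
        {:status, code}, _state -> {[{:error, {:http_status, code}}], :halted}
        {:data, chunk}, state -> {[{:chunk, chunk}], state}
        # Termination comes from :done, not from a data: [DONE] lookahead.
        :done, _state -> {[:message_completed], :halted}
      end)

    # The call-site tuple is {:ok, stream} even if an error surfaces later.
    {:ok, stream}
  end
end
```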

Decision references

  • The request body is byte-equal to generate/2's; only the URL path differs (:streamGenerateContent?alt=sse vs :generateContent).
  • ?alt=sse is the only required query parameter; auth still flows via x-goog-api-key.
  • usageMetadata may appear on intermediate chunks. The chunk mapper emits {:raw_chunk, {:usage, _}} on every appearance, and StreamCollector.apply_event/2 overwrites the previous value, so the last report wins.
  • The stream terminates on Finch's :done payload, not on a data: [DONE] lookahead. The synthetic :message_completed event is built from accumulated state.
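The accumulation described in the last two bullets can be sketched as a reducer: usage events overwrite, text deltas append, and :done produces the synthetic completion from accumulated state. GeminiSketch.Collector is hypothetical and only loosely modeled on StreamCollector.apply_event/2; the {:text, delta} event shape is an assumption, while {:raw_chunk, {:usage, _}} comes from the text above.

```elixir
defmodule GeminiSketch.Collector do
  # Hypothetical reducer in the spirit of StreamCollector.apply_event/2;
  # the {:text, delta} event shape is an assumption.

  def collect(events) do
    Enum.reduce(events, %{text: "", usage: nil, completed: nil}, &apply_event/2)
  end

  defp apply_event({:text, delta}, acc), do: %{acc | text: acc.text <> delta}

  # usageMetadata may repeat on intermediate chunks; the latest one wins.
  defp apply_event({:raw_chunk, {:usage, usage}}, acc), do: %{acc | usage: usage}

  # :done builds the synthetic completion from accumulated state.
  defp apply_event(:done, acc) do
    %{acc | completed: {:message_completed, %{content: acc.text, usage: acc.usage}}}
  end
end
```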

Options

  • :stream_timeout (default 60_000 ms): the timeout of the receive loop's after clause, i.e. the maximum wait between chunks.
  • :finch_module (default Finch): test injection seam.
  • :finch_name (default ALLM.Finch).
  • :finch_stub_ref: opaque ref forwarded to the Finch shim (used only by ALLM.Test.FinchStub).
  • :adapter_opts[:endpoint]: endpoint override (testing).

to_gemini_request_body(request, opts)

@spec to_gemini_request_body(
  ALLM.Request.t(),
  keyword()
) :: map()

Compose the JSON request body for generateContent from a canonical %Request{}. Pure function; no I/O.

Performs system-message extraction (hoist into top-level systemInstruction), role mapping (:assistant → "model"), and generationConfig composition.

This builder covers only the text surface; tools (16.3) and image output (16.5) extend it via opts flags without changing the text-only path.
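The three steps (system hoisting, role mapping, generationConfig composition) can be sketched as one pure function. GeminiSketch.Body is hypothetical: messages are plain maps with :role and :content rather than %ALLM.Message{}, and the :top_k and :stop option names are assumptions; only the wire field names come from the wire field map.

```elixir
defmodule GeminiSketch.Body do
  # Hypothetical sketch of a generateContent body builder; not ALLM's source.
  # Canonical option => Gemini wire field (:top_k and :stop are assumed names).
  @gen_config_keys [
    max_tokens: "maxOutputTokens",
    temperature: "temperature",
    top_p: "topP",
    top_k: "topK",
    stop: "stopSequences"
  ]

  def build(messages, opts \\ []) do
    {system, rest} = Enum.split_with(messages, &(&1.role == :system))

    body = %{
      "contents" => Enum.map(rest, &to_content/1),
      "generationConfig" => generation_config(opts)
    }

    case system do
      [] ->
        body

      # Hoist system text into the top-level systemInstruction.
      msgs ->
        parts = Enum.map(msgs, fn m -> %{"text" => m.content} end)
        Map.put(body, "systemInstruction", %{"parts" => parts})
    end
  end

  defp to_content(%{role: role, content: text}) do
    %{"role" => gemini_role(role), "parts" => [%{"text" => text}]}
  end

  # Gemini only knows "user" and "model" in contents.
  defp gemini_role(:assistant), do: "model"
  defp gemini_role(:user), do: "user"

  # camelCase rename; options that were not supplied are omitted.
  defp generation_config(opts) do
    for {key, wire} <- @gen_config_keys, Keyword.has_key?(opts, key), into: %{} do
      {wire, Keyword.fetch!(opts, key)}
    end
  end
end
```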

Examples

iex> req = ALLM.Request.new(
...> [%ALLM.Message{role: :system, content: "Be concise."},
...> %ALLM.Message{role: :user, content: "Hi"}],
...> model: "gemini-2.5-flash", max_tokens: 256
...>)
iex> body = ALLM.Providers.Gemini.to_gemini_request_body(req, [])
iex> {body["systemInstruction"], length(body["contents"]), body["generationConfig"]["maxOutputTokens"]}
{%{"parts" => [%{"text" => "Be concise."}]}, 1, 256}

to_gemini_tool_config(name)

@spec to_gemini_tool_config(ALLM.Request.tool_choice() | {:tool, String.t()}) :: map()

Translate an ALLM canonical tool_choice to Gemini's functionCallingConfig map.

| ALLM canonical | Gemini wire |
| --- | --- |
| :auto | %{"mode" => "AUTO"} |
| :required | %{"mode" => "ANY"} |
| :none | %{"mode" => "NONE"} |
| {:tool, "name"} | %{"mode" => "ANY", "allowedFunctionNames" => ["name"]} |
| "name" (string) | shorthand for {:tool, "name"} |

Map shapes (%{"mode" => "AUTO"}, etc.) are passed through verbatim so callers can hand-craft Gemini-specific extensions.
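A clause-per-row sketch of the table, including the string shorthand and the map pass-through, follows. GeminiSketch.ToolConfig is hypothetical, not ALLM's implementation.

```elixir
defmodule GeminiSketch.ToolConfig do
  # Hypothetical sketch of the tool_choice table; not ALLM's source.
  def to_function_calling_config(:auto), do: %{"mode" => "AUTO"}
  def to_function_calling_config(:required), do: %{"mode" => "ANY"}
  def to_function_calling_config(:none), do: %{"mode" => "NONE"}

  # Force one specific function: ANY mode restricted to a single name.
  def to_function_calling_config({:tool, name}) when is_binary(name),
    do: %{"mode" => "ANY", "allowedFunctionNames" => [name]}

  # A bare string is shorthand for {:tool, name}.
  def to_function_calling_config(name) when is_binary(name),
    do: to_function_calling_config({:tool, name})

  # Hand-crafted maps pass through untouched.
  def to_function_calling_config(%{} = map), do: map
end
```

In the Gemini request body, the resulting map sits under toolConfig.functionCallingConfig.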

Examples

iex> ALLM.Providers.Gemini.to_gemini_tool_config(:auto)
%{"mode" => "AUTO"}

iex> ALLM.Providers.Gemini.to_gemini_tool_config({:tool, "set_color"})
%{"mode" => "ANY", "allowedFunctionNames" => ["set_color"]}

to_gemini_tools(tools)

@spec to_gemini_tools([ALLM.Tool.t()]) :: [map()]

Translate a list of canonical %ALLM.Tool{}s to Gemini's functionDeclarations shape.

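Gemini's tools field is an array of %{functionDeclarations: [...]} objects, not a flat array of declarations. This function returns the declarations themselves (as the example below shows); in the request body they must be nested inside a functionDeclarations entry. Each declaration carries :name, :description, and :parameters (Gemini's name for the JSON-Schema field, distinct from OpenAI's parameters key on the tool's function sub-map and from Anthropic's input_schema).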
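The two levels (per-tool declarations, then the request-level wrapper) can be sketched as follows. GeminiSketch.Tools is hypothetical: tools are plain maps with :name, :description, and :schema standing in for %ALLM.Tool{}, and to_request_tools/1 is an illustrative helper, not part of ALLM.

```elixir
defmodule GeminiSketch.Tools do
  # Hypothetical sketch; plain maps stand in for %ALLM.Tool{}.

  # One declaration per tool; the JSON Schema goes under "parameters".
  def to_declarations(tools) do
    Enum.map(tools, fn tool ->
      %{"name" => tool.name, "description" => tool.description, "parameters" => tool.schema}
    end)
  end

  # Request-level "tools" is a list of objects that each hold a
  # functionDeclarations list, not a flat list of declarations.
  def to_request_tools([]), do: []
  def to_request_tools(tools), do: [%{"functionDeclarations" => to_declarations(tools)}]
end
```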

Examples

iex> tool = ALLM.Tool.new(name: "get_weather", description: "weather", schema: %{"type" => "object"})
iex> ALLM.Providers.Gemini.to_gemini_tools([tool])
[%{"name" => "get_weather", "description" => "weather", "parameters" => %{"type" => "object"}}]

translate_options(opts, request)

@spec translate_options(
  keyword(),
  ALLM.Request.t()
) :: keyword()

Identity translator. Gemini accepts ALLM's canonical :max_tokens, :temperature, :top_p, etc.; the camelCase rename and generationConfig nesting happen in to_generation_config/1 at request-build time, not here.

Examples

iex> req = ALLM.Request.new([%ALLM.Message{role: :user, content: "x"}], model: "gemini-2.5-flash")
iex> ALLM.Providers.Gemini.translate_options([max_tokens: 100, temperature: 0.7], req)
[max_tokens: 100, temperature: 0.7]