ALLM.Providers.Gemini.Images (allm v0.4.0)

Google Gemini native image-out adapter. Implements ALLM.ImageAdapter against generateContent with responseModalities: ["TEXT", "IMAGE"] on the Gemini-native image preview models (gemini-3.1-flash-image-preview / "Nano Banana 2", gemini-3-pro-image-preview / "Nano Banana Pro").

Layer B: runtime. Consumed through the `ALLM.generate_image/3` façade. Keys resolve via `ALLM.Keys.fetch!(:gemini, opts)` at request-build time per the documented contract; no key ever lives on the engine.

## Single translator

Image generation is `generateContent` with `responseModalities` set to `["TEXT", "IMAGE"]`. The request body is built by `ALLM.Providers.Gemini.to_gemini_request_body/2` (the same translator the chat adapter uses). The image adapter then overrides `generationConfig.responseModalities` and adds `generationConfig.imageConfig.aspectRatio` from the documented contract's size-mapping table. The `:edit` operation reuses the chat translator's `part_to_block/1` for source-image translation by synthesizing a user-role message with `[%TextPart{}, %ImagePart{}, ...]` content.

## Aspect-ratio mapping

| ALLM `ImageRequest.size` | Gemini `imageConfig.aspectRatio` |
|--------------------------|----------------------------------|
| `"1024x1024"`, `"512x512"`, `"256x256"`, any square | `"1:1"` |
| `"1792x1024"`, any 16:9 | `"16:9"` |
| `"1024x1792"`, any 9:16 | `"9:16"` |
| `"1024x768"`, any 4:3 | `"4:3"` |
| `"768x1024"`, any 3:4 | `"3:4"` |
| `nil` | omit `imageConfig` (Gemini default) |
| anything else | `{:error, %ImageAdapterError{reason: :invalid_request}}` |

Pixel sizing (`imageSize: "1K"|"2K"|"4K"`) is not exposed in v0.2's `ImageRequest.size` field and is deferred. Aspect ratio is the only knob.

## Operation gate

`supported_operations/0` returns `[:generate, :edit]`. `:variation` is rejected with `:unsupported_operation` BEFORE any HTTP I/O per `ImageAdapter` invariant 4.

## Test-injection escape hatch

`opts[:adapter_opts][:image_script]`, when present, delegates to `ALLM.Providers.FakeImages.generate/2` BEFORE any pre-flight gate runs. Mirrors the OpenAI.Images precedent at `lib/allm/providers/openai/images.ex:251`.

## Shared response decoder (Cross-function invariant)

Response bodies are decoded via `ALLM.Providers.Gemini.Decode.candidate_parts/1`, the same helper `Gemini.generate/2` calls (see `lib/allm/providers/gemini.ex:991` post-Phase-16.5 refactor). The image adapter consumes the `image_parts` element of the returned tuple while the chat adapter consumes `text` + `tool_calls`; both walk the parts list once. Per cross-function invariants lines 217-219.
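The single-pass idea behind the shared response decoder can be sketched as follows. This is a minimal illustration, not the library's code: the module name `CandidatePartsSketch`, the `{text, tool_calls, image_parts}` tuple order, and the shape of each image entry are assumptions. Only the `candidates`/`content`/`parts` layout with `text`, `functionCall`, and `inlineData` parts follows the Gemini REST response format.

```elixir
defmodule CandidatePartsSketch do
  # Hypothetical stand-in for a shared decoder. Walks the first candidate's
  # parts list exactly once and sorts each part into one of three buckets.
  # The {text, tool_calls, image_parts} return shape is an assumption.
  def candidate_parts(%{"candidates" => [%{"content" => %{"parts" => parts}} | _]})
      when is_list(parts) do
    {texts, calls, images} =
      Enum.reduce(parts, {[], [], []}, fn
        %{"text" => text}, {texts, calls, images} when is_binary(text) ->
          {[text | texts], calls, images}

        %{"functionCall" => call}, {texts, calls, images} ->
          {texts, [call | calls], images}

        %{"inlineData" => %{"mimeType" => mime, "data" => b64}}, {texts, calls, images} ->
          {texts, calls, [%{mime_type: mime, base64: b64} | images]}

        # Unknown part kinds are skipped rather than treated as errors.
        _other, acc ->
          acc
      end)

    # Buckets were built by prepending, so restore the original order.
    {texts |> Enum.reverse() |> Enum.join(), Enum.reverse(calls), Enum.reverse(images)}
  end

  # Bodies without a usable candidate decode to empty buckets.
  def candidate_parts(_body), do: {"", [], []}
end
```

Under this sketch, a chat caller would read the first two tuple elements and an image caller the third, so neither needs a second traversal of the parts list.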

Summary

Functions

endpoint_for(model)

Return the Gemini endpoint path (relative to the API base URL) for the image-generation operation.

generate(request, opts)

Execute an image-generation or edit request synchronously.

prepare_request(request, opts)

Return an unfired Req.Request configured exactly as generate/2 would fire it.

resolve_image_bytes(image, opts)

Resolve an %Image{} source to raw bytes. Mirrors the OpenAI seam at lib/allm/providers/openai/images.ex:858.

supported_operations()

Return the closed list of operations Gemini's image adapter supports.

to_aspect_ratio(s)

Map ImageRequest.size to Gemini's imageConfig.aspectRatio per the documented contract. Returns {:ok, ratio} with the aspect-ratio string, :omit for nil, or {:error, :invalid_size} for an unmappable size.

to_image_request_body(request, opts)

Build the JSON request body for an image request.

Functions

endpoint_for(model)

(since 0.3.0)
@spec endpoint_for(String.t()) :: String.t()

Return the Gemini endpoint path (relative to the API base URL) for the image-generation operation.

Both :generate and :edit route through generateContent; the request body shape differs, but the URL path does not. :variation is rejected pre-flight by gate_operation/2.

Examples

iex> ALLM.Providers.Gemini.Images.endpoint_for("gemini-3.1-flash-image-preview")
"/models/gemini-3.1-flash-image-preview:generateContent"

generate(request, opts)

@spec generate(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, ALLM.ImageResponse.t()} | {:error, ALLM.Error.ImageAdapterError.t()}

Execute an image-generation or edit request synchronously.

Pre-flight gates (per ImageAdapter invariant 4)

Before any HTTP I/O, generate/2 checks (in order):

  1. Test-injection escape hatch. When opts[:adapter_opts][:image_script] is non-nil, the call delegates to ALLM.Providers.FakeImages.generate/2.
  2. Operation gate. request.operation must be a member of supported_operations/0. Failure → :unsupported_operation with metadata: %{operation: op}.
  3. Aspect-ratio gate. request.size, when non-nil, must map to one of "1:1" | "16:9" | "9:16" | "4:3" | "3:4". Failure → :invalid_request.

Key resolution (ALLM.Keys.fetch!/2) runs AFTER the gates — a request rejected pre-flight does not require a valid key.
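The gate ordering above can be sketched with a `with` chain. This is an illustrative model only, under stated assumptions: `PreflightSketch`, the plain-map request, the `:fetch_key` option, the abbreviated size table, and the bare error tuples are all hypothetical stand-ins for the real `%ImageRequest{}`, `ALLM.Keys.fetch!/2`, the full mapping, and `%ImageAdapterError{}`.

```elixir
defmodule PreflightSketch do
  @supported [:generate, :edit]

  # Abbreviated size table for illustration; the real mapping covers more sizes.
  @sizes %{
    "1024x1024" => "1:1",
    "1792x1024" => "16:9",
    "1024x1792" => "9:16",
    "1024x768" => "4:3",
    "768x1024" => "3:4"
  }

  # `request` is a plain map with :operation and :size keys in this sketch.
  def run(request, opts) do
    # Gate 1: the test-injection script short-circuits everything else.
    case get_in(opts, [:adapter_opts, :image_script]) do
      nil -> gated(request, opts)
      script -> {:scripted, script}
    end
  end

  defp gated(request, opts) do
    # Gates 2 and 3 run in order; the first failure is returned as-is.
    with :ok <- gate_operation(request.operation),
         {:ok, ratio} <- gate_size(request.size) do
      # The key is resolved only after both gates pass.
      # opts[:fetch_key] is a hypothetical zero-arity function.
      key = opts[:fetch_key].()
      {:ok, %{aspect_ratio: ratio, key: key}}
    end
  end

  defp gate_operation(op) when op in @supported, do: :ok
  defp gate_operation(op), do: {:error, {:unsupported_operation, %{operation: op}}}

  defp gate_size(nil), do: {:ok, :omit}

  defp gate_size(size) do
    case Map.fetch(@sizes, size) do
      {:ok, ratio} -> {:ok, ratio}
      :error -> {:error, :invalid_request}
    end
  end
end
```

Because the key lookup sits inside the `do` body of the `with`, a request that fails either gate never touches it, which is the property the paragraph above describes.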

Request-id / metadata round-trip (invariants 5 + 6)

opts[:request_id] is reflected onto response.request_id. request.metadata round-trips onto response.metadata unchanged.

prepare_request(request, opts)

@spec prepare_request(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, Req.Request.t()} | {:error, ALLM.Error.ImageAdapterError.t()}

Return an unfired Req.Request configured exactly as generate/2 would fire it.

Same gate ordering as generate/2. Returns {:error, %ImageAdapterError{}} for any pre-flight failure.

resolve_image_bytes(image, opts)

(since 0.3.0)
@spec resolve_image_bytes(
  ALLM.Image.t(),
  keyword()
) :: {:ok, binary(), String.t()} | {:error, ALLM.Error.ImageAdapterError.t()}

Resolve an %Image{} source to raw bytes. Mirrors the OpenAI seam at lib/allm/providers/openai/images.ex:858.

For Gemini, this helper exists for parity with the OpenAI image-adapter testing surface. The actual :edit request build delegates source translation to Gemini.part_to_block/1 via the chat translator, which handles :binary, :base64, and :file sources; :url is rejected by Gemini.reject_unsupported_image_sources/1.

supported_operations()

@spec supported_operations() :: [:generate | :edit]

Return the closed list of operations Gemini's image adapter supports.

Per the documented contract, the list is [:generate, :edit]. :variation is not supported by the Gemini-native image models and is rejected pre-flight.

Examples

iex> ALLM.Providers.Gemini.Images.supported_operations()
[:generate, :edit]

to_aspect_ratio(s)

(since 0.3.0)
@spec to_aspect_ratio(ALLM.ImageRequest.size() | nil) ::
  {:ok, String.t()} | :omit | {:error, :invalid_size}

Map ImageRequest.size to Gemini's imageConfig.aspectRatio per the documented contract. Returns {:ok, ratio} with the aspect-ratio string, :omit for nil, or {:error, :invalid_size} for an unmappable size.

Square sizes ("NxN" or {n, n}) collapse to "1:1". Non-square sizes use exact ratio comparison rather than substring matching so "768x1024" (3:4) and "1024x1792" (~9:16) are disambiguated.

Examples

iex> ALLM.Providers.Gemini.Images.to_aspect_ratio("1024x1024")
{:ok, "1:1"}

iex> ALLM.Providers.Gemini.Images.to_aspect_ratio({1792, 1024})
{:ok, "16:9"}

iex> ALLM.Providers.Gemini.Images.to_aspect_ratio(nil)
:omit

iex> ALLM.Providers.Gemini.Images.to_aspect_ratio("999x111")
{:error, :invalid_size}
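One way to get the ratio comparison described above is to reduce width and height by their greatest common divisor and look the reduced pair up in a closed table. The sketch below is an assumption about how such a mapping could work, not the module's actual implementation. In particular, 1792x1024 reduces to 7:4 and 1024x1792 to 4:7, so this sketch lists those two pairs explicitly to reproduce the documented results. How the real function handles them is not stated here. `AspectRatioSketch` is a hypothetical name.

```elixir
defmodule AspectRatioSketch do
  # Reduced {width, height} pairs and the Gemini aspect-ratio string for each.
  # {7, 4} and {4, 7} are the reduced forms of 1792x1024 and 1024x1792, which
  # are only approximately 16:9 and 9:16; listing them is an assumption.
  @ratios %{
    {1, 1} => "1:1",
    {16, 9} => "16:9",
    {7, 4} => "16:9",
    {9, 16} => "9:16",
    {4, 7} => "9:16",
    {4, 3} => "4:3",
    {3, 4} => "3:4"
  }

  def to_aspect_ratio(nil), do: :omit

  def to_aspect_ratio(size) when is_binary(size) do
    # Parse "WxH" strictly: both halves must be whole integers.
    with [w, h] <- String.split(size, "x"),
         {w, ""} <- Integer.parse(w),
         {h, ""} <- Integer.parse(h) do
      to_aspect_ratio({w, h})
    else
      _ -> {:error, :invalid_size}
    end
  end

  def to_aspect_ratio({w, h}) when is_integer(w) and is_integer(h) and w > 0 and h > 0 do
    # Reduce by the GCD so every size with the same ratio hits the same key.
    g = Integer.gcd(w, h)

    case Map.fetch(@ratios, {div(w, g), div(h, g)}) do
      {:ok, ratio} -> {:ok, ratio}
      :error -> {:error, :invalid_size}
    end
  end

  def to_aspect_ratio(_other), do: {:error, :invalid_size}
end
```

Reducing by the GCD means any square size collapses to `{1, 1}` without a special case, and a string such as "999x111" reduces to `{9, 1}`, which is absent from the table and therefore rejected.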

to_image_request_body(request, opts)

(since 0.3.0)
@spec to_image_request_body(
  ALLM.ImageRequest.t(),
  keyword()
) :: {:ok, map()} | {:error, ALLM.Error.ImageAdapterError.t()}

Build the JSON request body for an image request.

Synthesizes a chat-equivalent %Request{} (single user message whose content is the prompt for :generate, or [%TextPart{}, %ImagePart{},...] for :edit) and delegates to Gemini.to_gemini_request_body/2 per the documented contract. Then overrides generationConfig.responseModalities = ["TEXT", "IMAGE"] and (when the size maps to a known aspect ratio) adds generationConfig.imageConfig.aspectRatio. :n > 1 adds generationConfig.candidateCount: n.

Returns {:error, %ImageAdapterError{reason: :invalid_request}} for unmappable sizes per the documented contract's closed table.
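The override step described above can be sketched as a pure function over an already-built body map. This is a hedged illustration: `ImageBodySketch.apply_image_config/3` is a hypothetical helper, the base body is assumed to be a string-keyed map like the one a chat translator would emit, and the aspect argument reuses the `{:ok, ratio} | :omit` shape from `to_aspect_ratio/1`.

```elixir
defmodule ImageBodySketch do
  # Applies the image-specific generationConfig overrides to a base body.
  # Existing generationConfig keys (e.g. temperature) are preserved.
  def apply_image_config(base_body, aspect, n) do
    config =
      base_body
      |> Map.get("generationConfig", %{})
      |> Map.put("responseModalities", ["TEXT", "IMAGE"])
      |> put_aspect(aspect)
      |> put_count(n)

    Map.put(base_body, "generationConfig", config)
  end

  # A nil size maps to :omit, so imageConfig is left out entirely.
  defp put_aspect(config, :omit), do: config

  defp put_aspect(config, {:ok, ratio}),
    do: Map.put(config, "imageConfig", %{"aspectRatio" => ratio})

  # candidateCount is only added when more than one image is requested.
  defp put_count(config, n) when is_integer(n) and n > 1,
    do: Map.put(config, "candidateCount", n)

  defp put_count(config, _n), do: config
end
```

Keeping the overrides in a function separate from the translator means the chat path and the image path can share one body builder and differ only in this final step.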