Image generation lives on a parallel surface to the text APIs. %ALLM.ImageRequest{} and %ALLM.ImageResponse{} mirror the Request/Response shape; the engine has a separate :image_adapter slot; and the entry points (ALLM.generate_image/3, ALLM.edit_image/4, ALLM.image_variations/3) take the same engine and return image responses.

This guide covers what each entry point does, the parallel adapter slot, OpenAI vs Gemini coverage, and the FakeImages adapter for deterministic testing.

Three operations

| Operation | Function | What it does |
|---|---|---|
| Generate | ALLM.generate_image/3 | Produces a new image from a text prompt |
| Edit (inpaint) | ALLM.edit_image/4 | Modifies an existing image, optionally masked |
| Variations | ALLM.image_variations/3 | Produces visual variations of an existing image |

Each returns {:ok, %ALLM.ImageResponse{}} with :images (list of %ALLM.Image{}) and :usage (provider-reported counts).

The image-adapter engine slot

An engine has two adapter slots: :adapter for chat and :image_adapter for images. Set whichever you need:

engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,             # for chat, optional here
  image_adapter: ALLM.Providers.OpenAI.Images,
  image_default_model: "dall-e-2"
)

If you only generate images (no chat), the :adapter slot can stay unset.
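Here is a minimal sketch of the two-slot idea, assuming an engine modeled as a plain keyword list. SlotSketch, adapter_for/2, and SomeImageAdapter are hypothetical names, and this is not how ALLM.Engine is implemented. It only shows that an image call reads :image_adapter and never needs :adapter.

```elixir
# Hypothetical sketch, NOT ALLM's implementation: an "engine" is just a
# keyword list, and each kind of call reads only its own slot.
defmodule SlotSketch do
  # Image operations only ever consult :image_adapter.
  def adapter_for(engine_opts, :image), do: fetch_slot(engine_opts, :image_adapter)

  # Chat operations only ever consult :adapter.
  def adapter_for(engine_opts, :chat), do: fetch_slot(engine_opts, :adapter)

  defp fetch_slot(engine_opts, key) do
    case Keyword.get(engine_opts, key) do
      nil -> {:error, {:missing_slot, key}}
      adapter -> {:ok, adapter}
    end
  end
end

# An image-only configuration: :adapter is simply left out.
image_only = [image_adapter: SomeImageAdapter, image_default_model: "dall-e-2"]

IO.inspect(SlotSketch.adapter_for(image_only, :image))
# prints {:ok, SomeImageAdapter}
IO.inspect(SlotSketch.adapter_for(image_only, :chat))
# prints {:error, {:missing_slot, :adapter}}
```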

Generating an image

iex> engine = ALLM.Engine.new(
...>   image_adapter: ALLM.Providers.FakeImages,
...>   image_adapter_opts: [
...>     scripts: [[{:ok, %{
...>       images: [%ALLM.Image{source: {:bytes, <<137, 80, 78, 71>>}, mime_type: "image/png"}]
...>     }}]]
...>   ]
...> )
iex> {:ok, %ALLM.ImageResponse{images: [%ALLM.Image{} = img]}} =
...>   ALLM.generate_image(engine, "a watercolor kestrel")
iex> img.mime_type
"image/png"
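The scripted bytes <<137, 80, 78, 71>> are the first four bytes of the 8-byte PNG file signature (0x89 followed by "PNG"). That is why the fake pairs them with mime_type "image/png". If you want to cross-check a :mime_type against real bytes, a small signature sniffer is enough. The sketch below is hypothetical (MagicSketch is not part of ALLM) and covers only PNG, JPEG, GIF, and WebP.

```elixir
# Hypothetical helper, not part of ALLM: guess a MIME type from leading bytes.
defmodule MagicSketch do
  # PNG: 0x89 'P' 'N' 'G' (first 4 bytes of the 8-byte signature).
  def sniff(<<137, 80, 78, 71, _rest::binary>>), do: {:ok, "image/png"}
  # JPEG: SOI marker 0xFF 0xD8, followed by another 0xFF marker byte.
  def sniff(<<0xFF, 0xD8, 0xFF, _rest::binary>>), do: {:ok, "image/jpeg"}
  # GIF: ASCII "GIF8" (covers GIF87a and GIF89a).
  def sniff(<<"GIF8", _rest::binary>>), do: {:ok, "image/gif"}
  # WebP: a RIFF container whose form type at offset 8 is "WEBP".
  def sniff(<<"RIFF", _size::binary-size(4), "WEBP", _rest::binary>>), do: {:ok, "image/webp"}
  def sniff(_other), do: :error
end

IO.inspect(MagicSketch.sniff(<<137, 80, 78, 71>>))
# prints {:ok, "image/png"}
```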

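ALLM.generate_image/3 accepts these options:

  • :model overrides the engine's default model (set with :image_default_model).
  • :size takes a string such as "512x512" or "1024x1024", or a {w, h} tuple. Supported sizes differ by provider and model; OpenAI's dall-e-2 only supports 256×256, 512×512, and 1024×1024.
  • :n sets the number of images to generate. OpenAI's dall-e-3 accepts only n: 1.
  • :response_format selects :url or :b64_json. On OpenAI, :url is the default for dall-e-2 and dall-e-3; gpt-image-1 always returns base64-encoded data.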
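Because :size accepts both strings and tuples, it can help to normalize it once and reject unsupported values before making a request. The following is a minimal sketch under that assumption. SizeSketch is hypothetical, not an ALLM module, and its allow-list only encodes the dall-e-2 sizes quoted above.

```elixir
# Hypothetical helper (SizeSketch is not an ALLM module): normalize a :size
# value to a "WxH" string and reject sizes dall-e-2 does not accept.
defmodule SizeSketch do
  # The dall-e-2 sizes quoted in this guide.
  @dalle2_sizes ["256x256", "512x512", "1024x1024"]

  # {w, h} tuple of positive integers -> "WxH".
  def normalize({w, h}) when is_integer(w) and is_integer(h) and w > 0 and h > 0,
    do: {:ok, "#{w}x#{h}"}

  # "WxH" string -> validated and re-rendered "WxH".
  def normalize(size) when is_binary(size) do
    with [w, h] <- String.split(size, "x"),
         {width, ""} when width > 0 <- Integer.parse(w),
         {height, ""} when height > 0 <- Integer.parse(h) do
      {:ok, "#{width}x#{height}"}
    else
      _ -> {:error, {:bad_size, size}}
    end
  end

  def normalize(other), do: {:error, {:bad_size, other}}

  # dall-e-2 gets the extra allow-list check.
  def check("dall-e-2", size) do
    with {:ok, s} <- normalize(size) do
      if s in @dalle2_sizes, do: {:ok, s}, else: {:error, {:unsupported_size, s}}
    end
  end

  # Other models: normalize only, since their limits are not listed here.
  def check(_model, size), do: normalize(size)
end

IO.inspect(SizeSketch.check("dall-e-2", {512, 512}))
# prints {:ok, "512x512"}
IO.inspect(SizeSketch.check("dall-e-2", "300x300"))
# prints {:error, {:unsupported_size, "300x300"}}
```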

Editing an image (inpaint)

ALLM.edit_image/4 takes the engine, the base image, the prompt, and an options list. Pass the mask, if any, as the :mask option:

base = File.read!("base.png")
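# OpenAI convention: fully transparent mask pixels mark the area to edit;
# opaque pixels are kept. The mask must have the same dimensions as the base.
mask = File.read!("mask.png")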

{:ok, response} = ALLM.edit_image(engine, base, "add a fountain", mask: mask)
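OpenAI documents that an edit mask must be a PNG with the same dimensions as the base image. A pre-flight check can read both sizes from the PNG IHDR chunk, which the PNG format requires to come immediately after the 8-byte signature, and compare them before you spend a request. PngDimsSketch below is a hypothetical helper, not an ALLM function. It returns false both for mismatched sizes and for non-PNG input.

```elixir
# Hypothetical pre-flight check, not part of ALLM: read width and height from
# a PNG's IHDR chunk and confirm that the mask matches the base image.
defmodule PngDimsSketch do
  # 8-byte signature, then IHDR: 4-byte length, "IHDR", 4-byte big-endian
  # width, 4-byte big-endian height.
  def dimensions(<<137, 80, 78, 71, 13, 10, 26, 10, _len::32, "IHDR", w::32, h::32, _::binary>>) do
    {:ok, {w, h}}
  end

  def dimensions(_other), do: {:error, :not_png}

  # true only if both are PNGs and their dimensions are equal.
  def same_size?(base, mask) do
    with {:ok, dims} <- dimensions(base),
         {:ok, ^dims} <- dimensions(mask) do
      true
    else
      _ -> false
    end
  end
end
```

In the example above, you might call PngDimsSketch.same_size?(base, mask) before ALLM.edit_image/4.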

The base and mask can each be raw bytes, a file path wrapped in a tuple ({:file, "/path/to/x.png"}), or an %ALLM.Image{}.
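These three input shapes map naturally onto multi-clause pattern matching. The sketch below is hypothetical and is not ALLM's code. SketchImage stands in for %ALLM.Image{} so the example runs without the library, and InputSketch.to_image/1 is an invented name.

```elixir
# Hypothetical sketch, not ALLM's implementation. SketchImage stands in for
# %ALLM.Image{} so this runs without the library installed.
defmodule SketchImage do
  defstruct [:source, :mime_type]
end

defmodule InputSketch do
  # Already an image struct: pass it through unchanged.
  def to_image(%SketchImage{} = image), do: {:ok, image}

  # {:file, path}: read the file into memory, surfacing read errors.
  def to_image({:file, path}) when is_binary(path) do
    case File.read(path) do
      {:ok, bytes} -> {:ok, %SketchImage{source: {:bytes, bytes}}}
      {:error, reason} -> {:error, {:file, path, reason}}
    end
  end

  # Raw bytes: wrap them as a {:bytes, binary} source.
  def to_image(bytes) when is_binary(bytes),
    do: {:ok, %SketchImage{source: {:bytes, bytes}}}
end
```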

Variations

ALLM.image_variations/3 produces visual variations of an existing image — no prompt:

{:ok, response} = ALLM.image_variations(engine, base_image, n: 3)

OpenAI is the only bundled provider with native variation support, and only on dall-e-2 (OpenAI's variations endpoint accepts 256×256, 512×512, or 1024×1024).

Provider coverage

| Operation | OpenAI | Gemini |
|---|---|---|
| Generate (generate_image/3) | yes (dall-e-2, dall-e-3, gpt-image-1) | yes (gemini-2.5-flash-image-preview) |
| Edit (edit_image/4) | yes (dall-e-2, gpt-image-1) | yes |
| Variations (image_variations/3) | yes (dall-e-2 only) | no |

Anthropic does not ship an image adapter — set :image_adapter to OpenAI's or Gemini's even when your chat adapter is Anthropic.
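If you route requests across providers, you can encode the coverage table as data and fail fast before making a call. This is a minimal sketch. CoverageSketch and supported?/3 are hypothetical and are not an ALLM API. The model lists are copied from the table above and may fall out of date.

```elixir
# Hypothetical sketch, not an ALLM API: the coverage table as data.
defmodule CoverageSketch do
  # {provider, operation} => models that support it, per the table above.
  # :any means the table says "yes" without naming a model.
  @coverage %{
    {:openai, :generate} => ["dall-e-2", "dall-e-3", "gpt-image-1"],
    {:openai, :edit} => ["dall-e-2", "gpt-image-1"],
    {:openai, :variations} => ["dall-e-2"],
    {:gemini, :generate} => ["gemini-2.5-flash-image-preview"],
    {:gemini, :edit} => :any
  }

  # Anything missing from the map (Gemini variations, any Anthropic
  # operation) is treated as unsupported.
  def supported?(provider, operation, model) do
    case Map.get(@coverage, {provider, operation}) do
      nil -> false
      :any -> true
      models -> model in models
    end
  end
end
```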

Materializing the result

A %ALLM.Image{} carries a :source (either {:bytes, binary} or {:url, string}) and a :mime_type. To get raw bytes regardless of source:

{:ok, bytes} = ALLM.Image.to_binary(image)

This handles the URL fetch transparently if needed.
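The difference between the two sources matters in practice. Bytes are already in memory, but a URL costs a network round-trip, and OpenAI documents that its generated-image URLs are only valid for 60 minutes. The sketch below illustrates that branching. It is hypothetical and is not ALLM.Image.to_binary/1. The fetch function is injected so the example runs offline, and ToBinarySketch is an invented name.

```elixir
# Hypothetical sketch of what a to_binary/1-style function must do; NOT
# ALLM.Image.to_binary/1. fetch/1 is injected so this runs offline.
defmodule ToBinarySketch do
  # Bytes are already in memory: return them directly.
  def to_binary(%{source: {:bytes, bytes}}, _fetch) when is_binary(bytes), do: {:ok, bytes}

  # URLs need a network round-trip, delegated to the caller-supplied fetch/1.
  def to_binary(%{source: {:url, url}}, fetch) when is_function(fetch, 1), do: fetch.(url)

  def to_binary(other, _fetch), do: {:error, {:unsupported_source, other}}
end
```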

To write to disk:

{:ok, bytes} = ALLM.Image.to_binary(image)
File.write!("output.png", bytes)
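Hard-coding "output.png" assumes the provider returned a PNG. A small lookup from :mime_type to an extension keeps the file name honest. ExtSketch is a hypothetical helper, not part of ALLM, and it only knows four common image types.

```elixir
# Hypothetical helper, not part of ALLM: pick a file extension from the
# image's :mime_type so the file name matches its content.
defmodule ExtSketch do
  @extensions %{
    "image/png" => ".png",
    "image/jpeg" => ".jpg",
    "image/webp" => ".webp",
    "image/gif" => ".gif"
  }

  def path_for(basename, mime_type) do
    case Map.fetch(@extensions, mime_type) do
      {:ok, ext} -> {:ok, basename <> ext}
      :error -> {:error, {:unknown_mime_type, mime_type}}
    end
  end
end

# Usage, with `image` and `bytes` as in the snippet above:
# {:ok, path} = ExtSketch.path_for("output", image.mime_type)
# File.write!(path, bytes)
```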

Testing with FakeImages

ALLM.Providers.FakeImages is the canonical test vehicle for image flows — same idea as ALLM.Providers.Fake for chat. Build a scripted response and assert against it:

iex> engine = ALLM.Engine.new(
...>   image_adapter: ALLM.Providers.FakeImages,
...>   image_adapter_opts: [
...>     scripts: [[{:ok, %{
...>       images: [
...>         %ALLM.Image{source: {:bytes, <<137, 80, 78, 71, 0, 0>>}, mime_type: "image/png"}
...>       ]
...>     }}]]
...>   ]
...> )
iex> {:ok, %ALLM.ImageResponse{images: images}} =
...>   ALLM.generate_image(engine, "anything")
iex> length(images)
1

Fake replies are deterministic, async-test-safe (per-process cursor), and require no network or API key.
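This guide does not describe how FakeImages stores its cursor. One common way to get per-process isolation in Elixir is the process dictionary, because each async test runs in its own process. The sketch below illustrates that idea only. CursorSketch, load/1, and next/0 are hypothetical and are not FakeImages' code.

```elixir
# Hypothetical sketch of a per-process script cursor; NOT FakeImages' code.
# Each process has its own dictionary, so concurrent tests cannot consume
# each other's canned replies.
defmodule CursorSketch do
  @key {__MODULE__, :remaining}

  # Load a list of canned replies for the calling process only.
  def load(replies) when is_list(replies), do: Process.put(@key, replies)

  # Pop the next reply for the calling process.
  def next do
    case Process.get(@key, []) do
      [reply | rest] ->
        Process.put(@key, rest)
        reply

      [] ->
        {:error, :script_exhausted}
    end
  end
end
```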

Common patterns

Generate + persist

{:ok, %ALLM.ImageResponse{images: [image]}} =
  ALLM.generate_image(engine, prompt, size: "1024x1024")

{:ok, bytes} = ALLM.Image.to_binary(image)
File.write!(target_path, bytes)

Long-running requests

generate_image/3 and friends are non-streaming. Long generations block until the provider returns the bytes. Set a longer timeout via the engine's :request_options if needed.
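Besides raising the engine's timeout, a caller can bound how long it is willing to wait by running the blocking call in a Task. This is a sketch of that standard OTP pattern, not an ALLM option. DeadlineSketch is a hypothetical name, and in real code `work` would wrap a call such as ALLM.generate_image/3.

```elixir
# Hypothetical caller-side deadline, not an ALLM option.
defmodule DeadlineSketch do
  def run_with_deadline(work, timeout_ms) when is_function(work, 0) do
    task = Task.async(work)

    # Task.yield/2 returns nil if no reply arrived in time; Task.shutdown/2
    # then unlinks and kills the task (returning nil unless it just finished).
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      # Only reachable if the caller traps exits; otherwise a crash in
      # `work` propagates to the caller through the Task.async link.
      {:exit, reason} -> {:error, {:exit, reason}}
      nil -> {:error, :timeout}
    end
  end
end
```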

Multi-tenant key resolution

Image-adapter calls go through the same ALLM.Keys resolution chain as chat calls. Pass :api_key per-call for BYOK SaaS:

ALLM.generate_image(engine, prompt, api_key: tenant.openai_key)
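For a concrete picture of "per-call key wins", here is a minimal precedence sketch. It is not the ALLM.Keys resolution chain, whose full order this guide does not spell out. KeySketch and the environment-variable fallback are assumptions for illustration only.

```elixir
# Hypothetical precedence sketch; NOT ALLM.Keys. A per-call :api_key wins,
# otherwise fall back to a (hypothetical) environment variable.
defmodule KeySketch do
  def resolve(call_opts, env_var) do
    case Keyword.get(call_opts, :api_key) || System.get_env(env_var) do
      nil -> {:error, :no_api_key}
      key -> {:ok, key}
    end
  end
end
```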

Where to next

  • vision.md — sending images TO the model, vs generating new ones.
  • examples/10_generate_image.exs — runnable smoke test.
  • examples/11_edit_image.exs — inpaint with mask.
  • examples/13_image_variations.exs — OpenAI-only variation flow.