ExAthena supports multimodal messages — text plus images — through the
ExAthena.Messages.ContentPart struct. Two entry points are available: the
ergonomic images: shorthand for quick one-liners, and full ContentPart
construction for complex payloads.
Quick start: images: shorthand
Pass images: [...] to ExAthena.query/2, ExAthena.stream/3, or
ExAthena.run/2 alongside a prompt string:
```elixir
png = File.read!("diagram.png")

{:ok, response} =
  ExAthena.query("Describe what you see",
    provider: :ollama,
    model: "llava",
    images: [%{data: png, media_type: "image/png"}]
  )

IO.puts(response.text)
```

Each entry in the images: list may be one of:
| Shape | Description |
|---|---|
| `%{data: binary(), media_type: String.t()}` | Inline image bytes |
| `%{data: binary()}` | Inline image; media type defaults to `"image/png"` |
| `%{url: String.t()}` | Remote image URL |
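
The three shapes in the table can be told apart with plain pattern matching. The following is a minimal sketch, not ExAthena's own implementation: `ImageEntrySketch`, `normalize/1`, and the tagged tuples it returns are hypothetical names chosen for illustration. Only the `"image/png"` default comes from the table.

```elixir
defmodule ImageEntrySketch do
  # Hypothetical helper, not part of ExAthena: classifies one entry of the
  # images: list into a tagged tuple.
  @default_media_type "image/png"

  # Inline bytes with an explicit media type.
  def normalize(%{data: data, media_type: media_type})
      when is_binary(data) and is_binary(media_type) do
    {:inline, data, media_type}
  end

  # Inline bytes without a media type fall back to the documented default.
  def normalize(%{data: data}) when is_binary(data) do
    {:inline, data, @default_media_type}
  end

  # Remote image referenced by URL.
  def normalize(%{url: url}) when is_binary(url) do
    {:url, url}
  end

  # Anything else is rejected (assumed behaviour, for illustration only).
  def normalize(other) do
    {:error, {:unsupported_image_entry, other}}
  end
end
```

Clause order matters here: the clause that matches both `data` and `media_type` has to come before the `data`-only clause, because a map pattern also matches maps with extra keys.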
ExAthena builds a multimodal user message with the text part first, followed
by the image parts. When no prompt is given, the images are merged into the
last user message in :messages, or appended as a new user message.
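
The placement rules above can be sketched with plain maps. This is an illustration under stated assumptions, not the library's code: `MergeSketch`, `build/2`, and `merge/2` are hypothetical, messages and parts are modelled as maps rather than ExAthena structs, and "last user message" is read as "the final message in the list, if its role is `:user`".

```elixir
defmodule MergeSketch do
  # Hypothetical stand-in for the behaviour described above, using plain maps
  # instead of ExAthena's message and ContentPart structs.

  # With a prompt: text part first, followed by the image parts.
  def build(prompt, image_parts) when is_binary(prompt) and is_list(image_parts) do
    %{role: :user, content: [%{type: :text, text: prompt} | image_parts]}
  end

  # Without a prompt: merge into the final message when it is a user message
  # (assumption), otherwise append a new user message holding only the images.
  def merge(messages, image_parts) when is_list(messages) and is_list(image_parts) do
    case List.last(messages) do
      %{role: :user, content: content} = last ->
        merged = %{last | content: to_parts(content) ++ image_parts}
        List.replace_at(messages, -1, merged)

      _ ->
        messages ++ [%{role: :user, content: image_parts}]
    end
  end

  # A plain string is promoted to a single text part so images can follow it.
  defp to_parts(content) when is_binary(content), do: [%{type: :text, text: content}]
  defp to_parts(content) when is_list(content), do: content
end
```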
Full ContentPart approach
For finer control — mixing text, images, and files in arbitrary order — build
ContentPart structs directly and pass them as the message content:
```elixir
alias ExAthena.Messages
alias ExAthena.Messages.ContentPart

png = File.read!("chart.png")
pdf = File.read!("report.pdf")

parts = [
  ContentPart.text("Summarize the chart and cross-reference the report:"),
  ContentPart.image(png, "image/png"),
  ContentPart.file(pdf, "report.pdf", "application/pdf")
]

{:ok, response} =
  ExAthena.query(nil,
    provider: :claude,
    model: "claude-opus-4-7",
    messages: [Messages.user(parts)]
  )
```

ContentPart factory functions
| Function | Type | Fields |
|---|---|---|
| `ContentPart.text(content)` | `:text` | `text` |
| `ContentPart.image(data, media_type \\ "image/png")` | `:image` | `data`, `media_type` |
| `ContentPart.image_url(url)` | `:image_url` | `url` |
| `ContentPart.file(data, filename, media_type \\ "application/octet-stream")` | `:file` | `data`, `filename`, `media_type` |
Provider examples
Ollama (llava, qwen2-vl)
```elixir
# config/config.exs
config :ex_athena, :ollama,
  base_url: "http://localhost:11434",
  model: "llava"

# usage
png = File.read!("screenshot.png")

{:ok, response} =
  ExAthena.query("What is shown in this screenshot?",
    provider: :ollama,
    model: "llava",
    images: [%{data: png, media_type: "image/png"}]
  )
```

Pull a vision-capable model first:

```bash
ollama pull llava
# or
ollama pull qwen2-vl
```
Ollama vision support is model-dependent. Non-vision models will return an error or silently ignore image parts.
OpenAI-compatible (gpt-4o)
```elixir
{:ok, response} =
  ExAthena.query("What's in this image?",
    provider: :openai_compatible,
    model: "gpt-4o",
    images: [
      %{url: "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png"}
    ]
  )
```

For inline images with the OpenAI API:
```elixir
# The file is a JPEG, so the variable name and media type both say so.
jpg = File.read!("photo.jpg")

{:ok, response} =
  ExAthena.query("Describe the photo",
    provider: :openai_compatible,
    model: "gpt-4o-mini",
    images: [%{data: jpg, media_type: "image/jpeg"}]
  )
```

Anthropic Claude
```elixir
png = File.read!("diagram.png")

{:ok, response} =
  ExAthena.query("Explain this architecture diagram",
    provider: :claude,
    model: "claude-opus-4-7",
    images: [%{data: png, media_type: "image/png"}]
  )
```

Claude supports PNG, JPEG, GIF, and WebP. Maximum image size is 5 MB per image.
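
Checking an image against these limits before sending it avoids a wasted round trip. The sketch below is a hypothetical pre-flight helper, not something ExAthena provides: `ClaudeImageCheck` and `validate/1` are illustrative names, and "5 MB" is assumed to mean 5 * 1024 * 1024 bytes of raw image data.

```elixir
defmodule ClaudeImageCheck do
  # Hypothetical pre-flight check built from the limits stated above.
  @allowed ~w(image/png image/jpeg image/gif image/webp)

  # Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes of raw image data.
  @max_bytes 5 * 1024 * 1024

  def validate(%{data: data, media_type: media_type}) when is_binary(data) do
    cond do
      media_type not in @allowed ->
        {:error, {:unsupported_media_type, media_type}}

      byte_size(data) > @max_bytes ->
        {:error, {:too_large, byte_size(data)}}

      true ->
        :ok
    end
  end
end
```

A caller could run `ClaudeImageCheck.validate/1` over each entry of the images: list and only call `ExAthena.query/2` when every entry returns `:ok`.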
Google Gemini
```elixir
png = File.read!("chart.png")

{:ok, response} =
  ExAthena.query("What trend does this chart show?",
    provider: :gemini,
    model: "gemini-2.5-flash",
    images: [%{data: png, media_type: "image/png"}]
  )
```

Using images: in the agent loop
ExAthena.run/2 forwards images: to Request.new/2 so the first turn
has the image attached:
```elixir
png = File.read!("codebase_diagram.png")

{:ok, result} =
  ExAthena.run("Implement the architecture shown in this diagram",
    provider: :claude,
    model: "claude-opus-4-7",
    cwd: "/path/to/project",
    images: [%{data: png, media_type: "image/png"}]
  )
```

Image format notes
- Inline images are sent as base64-encoded data to the provider. The req_llm adapter handles encoding transparently.
- Image URLs (%{url: ...}) are forwarded as-is. The provider fetches the image at inference time. Not all providers support URL references, so prefer inline data for maximum compatibility.
- media_type should match the actual image format ("image/png", "image/jpeg", "image/gif", "image/webp"). Some providers are lenient; others require an accurate MIME type.
- Multiple images in one message are supported by all major providers (Claude, OpenAI, Gemini). Ollama support is model-dependent.
Vision support by provider
| Provider | Vision support | Notes |
|---|---|---|
| `:ollama` | Model-dependent | llava, qwen2-vl, llava-phi3, bakllava |
| `:openai_compatible` | ✅ gpt-4o, gpt-4o-mini | URL + inline; other OAI-compat endpoints vary |
| `:claude` | ✅ Any claude-3+ model | PNG, JPEG, GIF, WebP; max 5 MB per image |
| `:gemini` | ✅ Any gemini-1.5+ model | Inline + URL; very generous size limits |
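
Since some providers require an accurate media_type, it can help to derive it from the image bytes rather than from the file extension. The following is a minimal sketch covering the four formats named in this guide; `MediaTypeSniff`, `detect/1`, and `to_base64/1` are hypothetical helpers, not part of ExAthena. `Base.encode64/1` from the Elixir standard library is shown only to illustrate what base64 encoding of an inline payload looks like; whether the adapter uses it is not confirmed here.

```elixir
defmodule MediaTypeSniff do
  # Hypothetical helper: guesses the media type from well-known leading
  # "magic bytes" of each format.
  def detect(<<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, _::binary>>), do: {:ok, "image/png"}
  def detect(<<0xFF, 0xD8, 0xFF, _::binary>>), do: {:ok, "image/jpeg"}
  def detect(<<"GIF87a", _::binary>>), do: {:ok, "image/gif"}
  def detect(<<"GIF89a", _::binary>>), do: {:ok, "image/gif"}

  # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
  def detect(<<"RIFF", _size::binary-size(4), "WEBP", _::binary>>), do: {:ok, "image/webp"}

  # Unknown or truncated data.
  def detect(data) when is_binary(data), do: :error

  # Base64 form of the raw bytes, for illustration.
  def to_base64(data) when is_binary(data), do: Base.encode64(data)
end
```

With a helper like this, an entry could be built as `%{data: bytes, media_type: media_type}` after `{:ok, media_type} = MediaTypeSniff.detect(bytes)`, so the declared type always agrees with the content.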