planck_ai is a typed LLM provider abstraction for Elixir, built on top of req_llm. It gives you a single, consistent interface for streaming and completing requests across Anthropic, OpenAI, Google Gemini, Ollama, and llama.cpp — without leaking provider-specific details into your application.

Installation

# mix.exs
{:planck_ai, "~> 0.1"}

Providers

Provider             Atom          API key env var
Anthropic (Claude)   :anthropic    ANTHROPIC_API_KEY
OpenAI (GPT)         :openai       OPENAI_API_KEY
Google (Gemini)      :google       GOOGLE_API_KEY
Ollama (local)       :ollama       (none)
llama.cpp (local)    :llama_cpp    (none)

Quick start

alias Planck.AI
alias Planck.AI.{Context, Message}

# 1. Pick a model from the catalog
{:ok, model} = AI.get_model(:anthropic, "claude-sonnet-4-6")

# 2. Build a context
context = %Context{
  system: "You are a helpful assistant.",
  messages: [
    %Message{role: :user, content: [{:text, "What is the Planck length?"}]}
  ]
}

# 3. Stream the response
model
|> AI.stream(context, temperature: 0.7)
|> Enum.each(fn
  {:text_delta, text} -> IO.write(text)
  {:done, _meta}      -> IO.puts("")
  {:error, reason}    -> IO.puts("Error: #{inspect(reason)}")
  _                   -> :ok
end)

# Or block for the full message
{:ok, %Message{content: content}} = AI.complete(model, context)

Model catalog

Cloud providers (:anthropic, :openai, :google) source their catalog from a bundled LLMDB snapshot loaded offline at startup — no network call required. Local providers (:ollama, :llama_cpp) query the running server at call time.

# List all providers
AI.list_providers()
#=> [:anthropic, :openai, :google, :ollama, :llama_cpp]

# List models for a provider
AI.list_models(:anthropic)
#=> [%Planck.AI.Model{id: "claude-opus-4-7", ...}, ...]

# Fetch a specific model by ID
{:ok, model} = AI.get_model(:anthropic, "claude-sonnet-4-6")
{:error, :not_found} = AI.get_model(:anthropic, "does-not-exist")

Anthropic

models = AI.list_models(:anthropic)
{:ok, model} = AI.get_model(:anthropic, "claude-sonnet-4-6")

Requires ANTHROPIC_API_KEY.

OpenAI

models = AI.list_models(:openai)
{:ok, model} = AI.get_model(:openai, "gpt-4o")

Requires OPENAI_API_KEY.

Google Gemini

models = AI.list_models(:google)
{:ok, model} = AI.get_model(:google, "gemini-2.5-flash")

Requires GOOGLE_API_KEY. Models that support extended thinking have supports_thinking: true set in the catalog. To enable thinking on a request, pass the budget via the Google-specific opt:

AI.stream(model, context, google_thinking_budget: 8_192)
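
When thinking is enabled, the reasoning text arrives as {:thinking_delta, ...} events alongside the normal text deltas (see the streaming events table below). As an example, a consumer could send the reasoning to stderr and the answer to stdout:

model
|> AI.stream(context, google_thinking_budget: 8_192)
|> Enum.each(fn
  # Reasoning text emitted while the model thinks
  {:thinking_delta, text} -> IO.write(:stderr, text)
  # The visible assistant answer
  {:text_delta, text}     -> IO.write(text)
  {:done, _meta}          -> IO.puts("")
  _                       -> :ok
end)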

Ollama

Ollama has no static catalog — the available models depend on what you have pulled into your local instance. Use all/1 to discover them at runtime, or model/2 to build one directly:

# Discover all models from the running server
models = Planck.AI.Models.Ollama.all()
models = Planck.AI.Models.Ollama.all(base_url: "http://10.0.0.5:11434")

# Build a model struct directly (no server call)
model = Planck.AI.Models.Ollama.model("llama3.2")
model = Planck.AI.Models.Ollama.model("deepseek-r1",
  base_url:          "http://10.0.0.5:11434",
  context_window:    64_000,
  max_tokens:        8_192,
  supports_thinking: true
)

Ollama must be running at http://localhost:11434 (or the specified base_url). No API key needed.

llama.cpp

llama.cpp has no static catalog because the loaded model depends on your server. Use all/1 to discover models, or model/2 to build one directly:

# Discover models from the running server
models = Planck.AI.Models.LlamaCpp.all(base_url: "http://localhost:8080")
models = Planck.AI.Models.LlamaCpp.all(base_url: "http://10.0.0.5:8080", api_key: "secret")

# Build a model struct directly
model = Planck.AI.Models.LlamaCpp.model("mistral-7b",
  base_url:       "http://localhost:8080",
  context_window: 32_768,
  max_tokens:     4_096
)

Pass api_key: when the server requires a token — it is sent as a Bearer header in both all/1 (discovery) and via req_llm during inference.
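
As a sketch, building a model for a token-protected server might look like this (the URL and the LLAMA_CPP_API_KEY env var are illustrative, and this assumes model/2 accepts the same api_key: option as all/1):

model = Planck.AI.Models.LlamaCpp.model("mistral-7b",
  base_url: "https://llama.internal.example.com",
  api_key:  System.fetch_env!("LLAMA_CPP_API_KEY")
)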

Per-model inference defaults

%Planck.AI.Model{} has a default_opts field for inference parameters that should apply to every call for that model. Opts passed explicitly to stream/3 or complete/3 override the defaults.

model = Planck.AI.Models.LlamaCpp.model("qwen3-coder",
  default_opts: [temperature: 1.0, top_p: 0.95, top_k: 64, min_p: 0.01]
)

# temperature: 1.0 applies unless overridden
AI.stream(model, context)

# temperature: 0.3 overrides the model default
AI.stream(model, context, temperature: 0.3)

Config file loader

Planck.AI.Config loads a list of models from a JSON file — useful for configuring local servers without hardcoding model structs in your application.

JSON format

Only "id" and "provider" are required. All other fields are optional and have the same defaults as model/2.

[
  {
    "id": "qwen3-coder-q4",
    "provider": "llama_cpp",
    "name": "Qwen3 Coder Q4",
    "base_url": "http://localhost:8080",
    "context_window": 40960,
    "max_tokens": 8192,
    "default_opts": {
      "temperature": 1.0,
      "top_p": 0.95,
      "top_k": 40,
      "min_p": 0.01
    }
  },
  {
    "id": "llama3.2:latest",
    "provider": "ollama",
    "context_window": 4096
  }
]

Valid "provider" values: "anthropic", "openai", "google", "ollama", "llama_cpp".

Valid "input_types" values: "text", "image", "image_url", "file", "video_url". Note that "video_url" is only supported by Google Gemini.
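
As a sketch, a multimodal entry might declare its accepted inputs like this (assuming "input_types" is given as a JSON array; the entry itself is illustrative):

[
  {
    "id": "gemini-2.5-flash",
    "provider": "google",
    "input_types": ["text", "image", "image_url", "file", "video_url"]
  }
]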

Loading

{:ok, models} = Planck.AI.Config.load("config/models.json")

model = Enum.find(models, &(&1.id == "qwen3-coder-q4"))
AI.stream(model, context)

Invalid entries are skipped with a warning; a file read or JSON parse error is propagated as {:error, reason}.
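
A caller can therefore fall back to an empty catalog when the file is missing or malformed, for example:

models =
  case Planck.AI.Config.load("config/models.json") do
    {:ok, models} ->
      models

    {:error, reason} ->
      IO.warn("model config not loaded: #{inspect(reason)}")
      []
  end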

Streaming events

AI.stream/3 returns a lazy Enumerable of tagged tuples:

Event                                          Meaning
{:text_delta, string}                          A chunk of assistant text
{:thinking_delta, string}                      A chunk of extended-thinking text
{:tool_call_complete, %{id:, name:, args:}}    A fully-assembled tool call
{:done, %{stop_reason:, usage:}}               Stream finished; usage stats included
{:error, reason}                               Transport or API error; stream halts

Exceptions raised during enumeration (e.g. a dropped HTTP connection) are caught and emitted as {:error, exception} events, so the stream never raises.

Streaming patterns

AI.stream(model, context)
|> Enum.each(fn
  {:text_delta, text} -> IO.write(text)
  {:done, _}          -> IO.puts("")
  {:error, reason}    -> IO.puts("\nError: #{inspect(reason)}")
  _                   -> :ok
end)
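
To collect the full response instead of printing it, the same events can be folded into a string, for example:

response =
  AI.stream(model, context)
  |> Enum.reduce("", fn
    {:text_delta, text}, acc -> acc <> text
    _event, acc              -> acc
  end)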

Forward events to another process

Since AI.stream/3 returns a lazy enumerable, you can run it in a Task and send each event to a LiveView or any other process as chunks arrive:

parent = self()

Task.start(fn ->
  AI.stream(model, context)
  |> Stream.each(fn event -> send(parent, {:llm_event, event}) end)
  |> Stream.run()
end)

# Handle in a LiveView or GenServer:
def handle_info({:llm_event, {:text_delta, text}}, socket) do
  {:noreply, update(socket, :response, &(&1 <> text))}
end

def handle_info({:llm_event, {:done, _}}, socket) do
  {:noreply, assign(socket, :streaming, false)}
end

def handle_info({:llm_event, _}, socket), do: {:noreply, socket}

Inference parameters

All keyword opts accepted by AI.stream/3 and AI.complete/3 are forwarded directly to req_llm, which handles per-provider translation:

AI.complete(model, context,
  temperature: 0.8,
  top_p:       0.95,
  max_tokens:  2_048
)

Tool calling

Define tools with Tool.new/1 and attach them to the context:

alias Planck.AI.Tool

read_file = Tool.new(
  name: "read_file",
  description: "Read the contents of a file",
  parameters: %{
    "type" => "object",
    "properties" => %{
      "path" => %{"type" => "string", "description" => "Absolute path to the file"}
    },
    "required" => ["path"]
  }
)

context = %Context{
  system: "You are a coding assistant.",
  messages: [
    %Message{role: :user, content: [{:text, "Show me lib/app.ex"}]}
  ],
  tools: [read_file]
}

{:ok, %Message{content: content}} = AI.complete(model, context)

# Inspect the tool calls in the response
for {:tool_call, id, name, args} <- content do
  IO.inspect({id, name, args})
end

To complete the loop, append a tool result message and call complete/3 again:

# `assistant_msg` is the assistant reply returned by the previous complete/3 call;
# `call_id` and `args` come from its {:tool_call, id, name, args} content part.
result_msg = %Message{
  role: :tool_result,
  content: [{:tool_result, call_id, File.read!(args["path"])}]
}

updated_context = %{context | messages: context.messages ++ [assistant_msg, result_msg]}
{:ok, final} = AI.complete(model, updated_context)
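
Putting the loop together, here is a minimal sketch that keeps completing until the reply contains no more tool calls. run_tool/2 is a hypothetical dispatcher you would replace with your own implementations, and packing several results into one :tool_result message is an assumption:

defmodule ToolLoop do
  alias Planck.AI
  alias Planck.AI.{Context, Message}

  # Complete, run any requested tools, and recurse until the reply
  # contains no tool calls.
  def run(model, %Context{} = context) do
    {:ok, %Message{content: content} = assistant_msg} = AI.complete(model, context)

    calls = for {:tool_call, id, name, args} <- content, do: {id, name, args}

    if calls == [] do
      assistant_msg
    else
      results = for {id, name, args} <- calls, do: {:tool_result, id, run_tool(name, args)}
      result_msg = %Message{role: :tool_result, content: results}
      context = %{context | messages: context.messages ++ [assistant_msg, result_msg]}
      run(model, context)
    end
  end

  # Hypothetical dispatcher: map tool names to real implementations here.
  defp run_tool("read_file", %{"path" => path}), do: File.read!(path)
end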

Multimodal input

Four content part types carry non-text data:

# Binary image
{:image, File.read!("photo.png"), "image/png"}

# Image by URL (all cloud providers)
{:image_url, "https://example.com/photo.png"}

# Binary file / document (Anthropic PDFs, Google files)
{:file, File.read!("report.pdf"), "application/pdf"}

# Video by URL (Google Gemini only)
{:video_url, "https://example.com/clip.mp4"}

# Combine parts in a message's content list:
%Message{
  role: :user,
  content: [
    {:image_url, "https://example.com/screenshot.png"},
    {:text, "What do you see in this image?"}
  ]
}

Support depends on the model's input_types field in the catalog.
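
For example, a caller could check the catalog before attaching binary parts (a sketch, assuming input_types is a list of atoms matching the part tags above):

# Attach the raw image only when the model accepts :image input
parts =
  if :image in model.input_types do
    [{:image, File.read!("photo.png"), "image/png"}, {:text, "Describe this photo."}]
  else
    [{:text, "Describe the photo at https://example.com/photo.png"}]
  end

message = %Message{role: :user, content: parts}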