Candil.Inference (Candil v1.0.0)


Inference execution for Candil.

Handles chat completions and embeddings for both local engines and remote providers. Normalises the request/response format across the supported provider APIs (OpenAI, Anthropic, Ollama, OpenAI-compatible).

Message format

All messages are plain maps with the atom keys :role and :content, each holding a string value:

%{role: "system", content: "You are a helpful assistant."}
%{role: "user", content: "Hello!"}
%{role: "assistant", content: "Hi there!"}
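The literals above use atom keys with string values, so a map with string keys such as "role" does not match the format. A minimal sketch of a shape check follows; MessageSketch and valid?/1 are hypothetical names used for illustration and are not part of Candil.

```elixir
defmodule MessageSketch do
  # Hypothetical helper, not part of Candil: returns true when the term has
  # the documented shape (atom keys :role and :content, binary values).
  def valid?(%{role: role, content: content})
      when is_binary(role) and is_binary(content),
      do: true

  # Anything else (string keys, missing keys, non-binary values) is rejected.
  def valid?(_other), do: false
end

messages = [
  %{role: "system", content: "You are a helpful assistant."},
  %{role: "user", content: "Hello!"}
]

# Check the whole conversation before sending it.
Enum.all?(messages, &MessageSketch.valid?/1)
```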

Response format

On success, all chat functions return {:ok, response}, where response is a response() map:

%{
  content: "Hello, how can I help?",
  role: "assistant",
  model: "llama-3-8b",
  finish_reason: "stop",
  usage: %{prompt_tokens: 12, completion_tokens: 8, total_tokens: 20}
}
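Since the chat functions are specified as returning {:ok, response()} | {:error, any()}, and the usage field may be nil, callers will usually pattern match on the result. The sketch below shows one way to do that; ResponseSketch and summarize/1 are hypothetical names, not part of Candil.

```elixir
defmodule ResponseSketch do
  # Hypothetical helper, not part of Candil: turns a chat result into a
  # short human-readable string.

  # Success with token usage reported.
  def summarize({:ok, %{content: content, usage: %{total_tokens: total}}}) do
    "#{content} (#{total} tokens)"
  end

  # Success where usage is nil, which the response() type allows.
  def summarize({:ok, %{content: content}}), do: content

  # Any error term is passed through inspect/1 for display.
  def summarize({:error, reason}), do: "inference failed: #{inspect(reason)}"
end
```

The clause order matters: the first clause only matches when usage is a map containing :total_tokens, so a nil usage falls through to the second clause.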

Summary

Functions

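chat_local(model_alias, messages, opts \\ [])

Runs a chat completion against a local llama-server.

chat_remote(model, provider, messages, opts)

Runs a chat completion against a remote provider.

embed_local(model_alias, texts, opts \\ [])

Generates embeddings for a list of texts against a local engine.

embed_remote(model, provider, texts, opts)

Generates embeddings for a list of texts against a remote provider.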

Types

embed_response()

@type embed_response() :: [[float()]]

message()

@type message() :: %{role: binary(), content: binary()}

response()

@type response() :: %{
  content: binary(),
  role: binary(),
  model: binary(),
  finish_reason: binary() | nil,
  usage: usage() | nil
}

usage()

@type usage() :: %{
  prompt_tokens: non_neg_integer(),
  completion_tokens: non_neg_integer(),
  total_tokens: non_neg_integer()
}

Functions

chat_local(model_alias, messages, opts \\ [])

@spec chat_local(atom(), [message()], keyword()) ::
  {:ok, response()} | {:error, any()}

Runs a chat completion against a local llama-server.

The engine must be running and healthy. The server URL is resolved from the registry using the model alias.

Options

  • :temperature — sampling temperature 0.0–2.0 (default: 0.7)
  • :max_tokens — maximum tokens to generate (default: 512)
  • :stop — list of stop sequences (default: [])
  • :system — system prompt string (prepended to messages if set)
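To make the option semantics concrete, here is a minimal sketch of how the documented defaults and the :system option could be applied. It is an illustration under assumptions, not Candil's implementation; ChatOptsSketch, with_defaults/1 and prepend_system/2 are hypothetical names.

```elixir
defmodule ChatOptsSketch do
  # Defaults copied from the option list above.
  @defaults [temperature: 0.7, max_tokens: 512, stop: []]

  # Hypothetical helper: caller-supplied options override the defaults.
  def with_defaults(opts), do: Keyword.merge(@defaults, opts)

  # Hypothetical helper: prepends a system message when :system is set,
  # and leaves the message list unchanged otherwise.
  def prepend_system(messages, opts) do
    case Keyword.get(opts, :system) do
      nil -> messages
      prompt when is_binary(prompt) -> [%{role: "system", content: prompt} | messages]
    end
  end
end
```

With these options, a call would look like `Candil.Inference.chat_local(:my_model, messages, temperature: 0.2, system: "Be brief.")`, where :my_model is a placeholder alias.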

chat_remote(model, provider, messages, opts)

@spec chat_remote(Candil.Model.t(), Candil.Provider.t(), [message()], keyword()) ::
  {:ok, response()} | {:error, any()}

Runs a chat completion against a remote provider.

Dispatches to the appropriate protocol based on provider.type.

Options

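Same options as chat_local/3. Note that opts has no default value in this function's signature, so it must be passed explicitly.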
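Dispatching on provider.type can be pictured as one function clause per provider type. The sketch below is purely illustrative: the type atoms are guesses derived from the provider list in the module overview, the protocol labels are invented, and DispatchSketch is a hypothetical module that is not part of Candil.

```elixir
defmodule DispatchSketch do
  # Assumption: the :type values below mirror the providers named in the
  # module overview. The real values of provider.type may differ.
  def protocol_for(%{type: :openai}), do: {:ok, :openai_chat}
  def protocol_for(%{type: :openai_compatible}), do: {:ok, :openai_chat}
  def protocol_for(%{type: :anthropic}), do: {:ok, :anthropic_messages}
  def protocol_for(%{type: :ollama}), do: {:ok, :ollama_chat}

  # Unknown types produce an error tuple instead of raising.
  def protocol_for(%{type: other}), do: {:error, {:unsupported_provider, other}}
end
```

Using one clause per type keeps each protocol's request and response handling isolated, and the final catch-all clause gives callers an error tuple in the same {:ok, _} | {:error, _} style as the rest of the module.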

embed_local(model_alias, texts, opts \\ [])

@spec embed_local(atom(), [binary()], keyword()) ::
  {:ok, embed_response()} | {:error, any()}

Generates embeddings for a list of texts against a local engine.

The model must have :embeddings in its usage list.

embed_remote(model, provider, texts, opts)

@spec embed_remote(Candil.Model.t(), Candil.Provider.t(), [binary()], keyword()) ::
  {:ok, embed_response()} | {:error, any()}

Generates embeddings for a list of texts against a remote provider.
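Both embedding functions return an embed_response(), a list with one vector of floats per input text. A common next step is comparing two vectors; the sketch below computes cosine similarity in plain Elixir. EmbedSketch is a hypothetical module, not part of Candil, and it assumes both vectors have the same length and are non-zero.

```elixir
defmodule EmbedSketch do
  # Cosine similarity between two equal-length, non-zero vectors.
  # A zero vector would raise ArithmeticError on the division below.
  def cosine(a, b) do
    dot =
      a
      |> Enum.zip(b)
      |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)

    dot / (norm(a) * norm(b))
  end

  # Euclidean length of a vector.
  defp norm(v) do
    v
    |> Enum.reduce(0.0, fn x, acc -> acc + x * x end)
    |> :math.sqrt()
  end
end
```

Given a successful result such as `{:ok, [v1, v2]}` from embed_local/3 with two input texts, `EmbedSketch.cosine(v1, v2)` yields a score between -1.0 and 1.0.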