Agens.Serving behaviour (agens v0.2.0)
The Serving module provides functions for starting, stopping and running Servings.
A Serving is a module that implements the Agens.Serving behaviour and is started as a GenServer. Its job is to take a prepared Agens.Message, perform LM inference, and return a structured Agens.Serving.Result. The inference call inside handle_message/3 can target any backend: an HTTP API (OpenAI, Anthropic, Ollama, etc.), an in-process Nx.Serving/Bumblebee pipeline, or a local rules engine. Agens is unopinionated about the choice.
A single Agens.Serving process can be reused across many Nodes (Agens.Job.Node) and Jobs. In most cases you will only need to start one text-generation Serving, shared by most, if not all, of your Nodes.
You may also run additional Servings for more specific use cases such as image generation or speech recognition.
Routing
A Serving owns routing for any Node that declares it. The router can be the Serving module itself (the "merged" pattern) or a separate module (the "split" pattern):
    # Merged: Serving is its own Router
    defmodule MyServing do
      use Agens.Serving
      use Agens.Router
      # ...handle_message/3, handle_result/3, outputs/1, resolve/2
    end

    # Split: a dedicated Router module reused across Servings
    defmodule MyRouter do
      use Agens.Router
      # outputs/1, resolve/2
    end

    defmodule MyServing do
      use Agens.Serving, router: MyRouter
      # ...handle_message/3, handle_result/3
    end

When :router is omitted it defaults to __MODULE__, so merging is the zero-config path.
After handle_result/3 or handle_sub/3 returns a Result, the macro automatically invokes the router's route/1 on the message and outputs, but only when the returned next is empty or nil. Callbacks that need to set next explicitly (e.g. :end, :retry) can still do so directly, and the router will not override it.
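The gating rule above can be illustrated with a small standalone sketch. It is not the macro's actual code: `RoutingGateSketch`, the plain map standing in for Agens.Serving.Result, and the `route_fun` argument (a stand-in for the router's route/1 that only looks at outputs) are all assumptions made for illustration.

```elixir
defmodule RoutingGateSketch do
  # `result` is a plain map standing in for Agens.Serving.Result (assumed
  # fields: :outputs and :next). `route_fun` stands in for the router's route/1.
  def maybe_route(%{next: next} = result, route_fun) when next in [nil, []] do
    # No explicit `next` was set: defer to the router.
    %{result | next: route_fun.(result.outputs)}
  end

  # An explicit `next` (e.g. :end or :retry) is left untouched.
  def maybe_route(result, _route_fun), do: result
end

# Hypothetical routing rule keyed on an "approved" output.
route = fn outputs -> if outputs["approved"], do: :publish, else: :revise end

RoutingGateSketch.maybe_route(%{outputs: %{"approved" => true}, next: nil}, route)
# next is now :publish

RoutingGateSketch.maybe_route(%{outputs: %{"approved" => true}, next: :end}, route)
# next stays :end
```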
Sub-Jobs (handle_sub/3)
When a Node declares both :serving and :sub, the Sub-Job runs in place of a Serving inference call. Once the Sub completes, the parent invokes handle_sub/3 on the Node's declared Serving to derive the parent Node's outputs and next from the Sub's final Agens.Message.
The default implementation generates the router's outputs/1 keys with nil values and lets the router's resolve/2 produce the fallback next. Override handle_sub/3 when the Sub's output schema differs from the parent's — typical implementations map the Sub's outputs/body into the parent's output schema, optionally via an LM call.
Summary
Callbacks
build_prompt/3: Renders an Agens.Message into the {system, user} prompt pair sent to the LM.
build_schema/1: Builds the full JSON schema sent to the LM for structured-output enforcement.
handle_message/3: Performs the LM inference call for a single Node run.
handle_result/3: Validates the parsed LM response and converts it into an Agens.Serving.Result.
handle_sub/3: Maps a completed Sub-Job's final message back onto the parent Node's Agens.Serving.Result.
load_context/2: Loads per-agent context for injection under the Context prefix.
load_resource/3: Resolves an Agens.Resource declared on the Node before inference.
outputs_schema/1: Returns the {property_key, schema} pair for the structured-output fragment of the response.
response_schema/1: Returns the base response-shape JSON schema (root object passed to build_schema/1).
start/1: Initialization hook called once during init/1.
tool_call/3: Executes a single tool call requested by the LM.
tools_schema/1: Returns the {property_key, schema} pair for the tool-call fragment of the response.
Functions
get_config/1: Retrieves the Serving configuration by Serving name or pid.
run/1: Executes an Agens.Message against an Agens.Serving.
start/1: Starts an Agens.Serving process.
stop/1: Stops an Agens.Serving process.
Types
@type state() :: %{:config => Agens.Serving.Config.t(), optional(atom()) => any()}
The internal state passed to every Serving callback.
Always a map containing at least :config (the Agens.Serving.Config the Serving was started
with). Additional keys are managed by the macro injected by use Agens.Serving (queue, counters,
etc) and may be augmented by start/1.
Callbacks
@callback build_prompt(Agens.Message.t(), Agens.Prefixes.t(), binary() | nil) :: {String.t(), String.t()}
Renders an Agens.Message into the {system, user} prompt pair sent to the LM.
Receives the message, the Serving's Agens.Prefixes, and any context string returned by
load_context/2. The default implementation calls Agens.Prompt.build/3 and joins each
section into two heading-prefixed strings — appropriate for OpenAI-style system/user
concatenation.
Override for providers that need a different shape, e.g. a chat-message array with role
labels for Anthropic, multi-turn message lists, or custom delimiters. The two-string return
contract is fixed; the strings can be whatever the provider's handle_message/3 expects to
receive on message.system / message.user.
Optional — defaults to the heading-and-detail format described above.
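As a rough illustration of the "custom delimiters" case, the sketch below wraps each section in XML-style tags while keeping the two-string return contract. Everything about the inputs is assumed: `message` is a plain map with hypothetical :objective and :input keys, and `prefixes` is a plain map of section names; the real Agens.Message and Agens.Prefixes structs may be shaped differently.

```elixir
defmodule XmlPromptSketch do
  # Stand-ins only: `message` and `prefixes` are plain maps, not the real
  # Agens.Message / Agens.Prefixes structs.
  def build_prompt(message, prefixes, context) do
    system =
      [{prefixes.objective, message.objective}, {prefixes.context, context}]
      # Sections without content (e.g. a nil context) are omitted.
      |> Enum.reject(fn {_heading, body} -> body in [nil, ""] end)
      |> Enum.map_join("\n", fn {heading, body} -> wrap(heading, body) end)

    # The contract is always a {system, user} pair of strings.
    {system, wrap(prefixes.input, message.input)}
  end

  defp wrap(tag, body), do: "<#{tag}>\n#{body}\n</#{tag}>"
end
```

In a real Serving this body would live in the build_prompt/3 override of a module that calls use Agens.Serving.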
@callback build_schema(Agens.Message.t()) :: map()
Builds the full JSON schema sent to the LM for structured-output enforcement.
Default implementation composes response_schema/1, outputs_schema/1, and
tools_schema/1 into one object: it starts from response_schema/1, layers the
outputs and tools fragments under their declared property keys, and sets required to all
top-level properties (strict-mode compatible).
Override only when the layered callbacks below can't express the shape you need.
Optional.
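The layering described above can be sketched with plain maps. This is an approximation of the documented default, not the library's code: the string keys, the example fragments, and the zero-arity helper functions (the real callbacks take an Agens.Message) are assumptions.

```elixir
defmodule SchemaComposeSketch do
  # Stand-in for the minimal shell described for Agens.Schema.response/0.
  def response_schema, do: %{"type" => "object", "properties" => %{}}

  # Hypothetical fragments; real ones come from outputs_schema/1 and tools_schema/1.
  def outputs_schema do
    {"outputs", %{"type" => "object", "properties" => %{"summary" => %{"type" => "string"}}}}
  end

  def tools_schema, do: {"tool_calls", %{"type" => "array", "items" => %{"type" => "object"}}}

  def build_schema do
    base = response_schema()
    {outputs_key, outputs} = outputs_schema()
    {tools_key, tools} = tools_schema()

    # Layer both fragments under their declared property keys.
    properties =
      base["properties"]
      |> Map.put(outputs_key, outputs)
      |> Map.put(tools_key, tools)

    base
    |> Map.put("properties", properties)
    # Strict mode: every top-level property is listed as required.
    |> Map.put("required", properties |> Map.keys() |> Enum.sort())
  end
end
```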
@callback handle_message(state(), Agens.Message.t(), map()) :: {:ok, term()} | {:error, term()}
Performs the LM inference call for a single Node run.
Receives the current state, the prepared Agens.Message (with :system and :user already
populated by build_prompt/3 and :agent_id/:objective/etc. carried through from the
Node), and the JSON schema assembled from the Router's declared outputs and the Node's tools.
The host application writes the actual HTTP call (OpenAI, Anthropic, Ollama, etc.) or Nx.Serving /
Bumblebee invocation here, returning either {:ok, parsed}, where parsed is whatever
shape the host wants handle_result/3 to receive, or {:error, reason}.
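Because the backend is left open, a handle_message/3 body does not have to call a network API at all. The sketch below uses a tiny keyword-based rules engine as the "inference" step. The `RulesBackendSketch` module, the sentiment rules, and the plain map standing in for Agens.Message (only :user is read) are illustrative assumptions.

```elixir
defmodule RulesBackendSketch do
  # `message` is a plain map standing in for Agens.Message; only :user is read.
  def handle_message(_state, %{user: user}, _schema) when is_binary(user) do
    sentiment =
      cond do
        user =~ ~r/\b(great|love|thanks)\b/i -> "positive"
        user =~ ~r/\b(broken|refund|angry)\b/i -> "negative"
        true -> "neutral"
      end

    # The shape of `parsed` is up to the host; handle_result/3 receives it as-is.
    {:ok, %{"outputs" => %{"sentiment" => sentiment}}}
  end

  # Anything without a usable user prompt is reported as an error tuple.
  def handle_message(_state, _message, _schema), do: {:error, :missing_user_prompt}
end
```

An HTTP-backed Serving would follow the same contract, replacing the `cond` with a request to the provider and a decode of its response.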
@callback handle_result({:ok, term()} | {:error, any()}, state(), Agens.Message.t()) :: {:ok, Agens.Serving.Result.t()} | {:error, any()} | {:retry, String.t()}
Validates the parsed LM response and converts it into an Agens.Serving.Result.
Receives the tagged tuple from handle_message/3, the current state, and the Agens.Message
the inference ran against. Returns one of:
{:ok, %Agens.Serving.Result{}}: the response passed validation; routing continues.
{:retry, reason}: the response failed validation; the runtime increments the retry counter and re-runs the Node with reason injected under the Retry prefix. Bounded by Agens.Job.Config.max_retries.
{:error, reason}: hard error; the Job terminates via the normal error path.
This is the seam for custom validation — domain-specific business rules, structured-output shape checks, downstream API errors that should retry rather than abort.
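The three return shapes can be sketched as a set of function clauses. This is a minimal illustration under stated assumptions: the "sentiment" output and its allowed values are invented, the :rate_limited error atom is hypothetical, and a plain map with :outputs, :next and :body stands in for %Agens.Serving.Result{}.

```elixir
defmodule ResultValidationSketch do
  @allowed ~w(positive negative neutral)

  # Returns a plain map where a real Serving would build %Agens.Serving.Result{};
  # the :outputs/:next/:body fields are assumed from the prose.
  def handle_result({:ok, %{"outputs" => %{"sentiment" => s} = outputs}}, _state, _message)
      when s in @allowed do
    # Leaving :next as nil hands routing to the router.
    {:ok, %{outputs: outputs, next: nil, body: nil}}
  end

  # The call succeeded but the content is invalid: ask the runtime to retry.
  def handle_result({:ok, _parsed}, _state, _message) do
    {:retry, "sentiment must be one of: " <> Enum.join(@allowed, ", ")}
  end

  # A downstream error that is worth retrying rather than aborting.
  def handle_result({:error, :rate_limited}, _state, _message), do: {:retry, "rate limited"}

  # Everything else is a hard error.
  def handle_result({:error, reason}, _state, _message), do: {:error, reason}
end
```

The retry reason is written as a sentence the LM can act on, since it is injected back into the prompt on the next attempt.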
@callback handle_sub(state(), Agens.Message.t(), Agens.Message.t()) :: {:ok, Agens.Serving.Result.t()} | {:error, any()}
Maps a completed Sub-Job's final message back onto the parent Node's Agens.Serving.Result.
Invoked when a Node declares both :serving and :sub and the Sub-Job runs in place of
inference. The first message is the Sub's final Agens.Message (with :result populated by
the Sub's last Node); the second is the parent Node's Agens.Message at the point the Sub
was launched.
Returns {:ok, %Agens.Serving.Result{}} populating the parent Node's outputs and next.
Typical implementations either reshape the Sub's outputs to match the parent's output schema
directly, or feed the Sub's body into an LM call to derive parent outputs.
Optional — the default implementation populates body from the Sub's :result and zeroes
outputs against the router's declared keys, leaving routing to the router's route/1
fallback.
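The "reshape the Sub's outputs" variant might look roughly like the sketch below. Both messages are plain maps standing in for Agens.Message, the Sub's :result is assumed to carry :outputs and :body, and the "verdict" and "approved" keys are invented output names.

```elixir
defmodule SubMappingSketch do
  # First message: the Sub's final message. Second: the parent Node's message.
  # The :outputs/:body shape of :result is an assumption, not documented here.
  def handle_sub(_state, %{result: %{outputs: sub_outputs, body: body}}, _parent_message) do
    # Reshape the Sub's hypothetical "verdict" output into the parent's "approved" flag.
    parent_outputs = %{"approved" => sub_outputs["verdict"] == "pass"}

    # :next stays nil so the router decides where the parent goes next.
    {:ok, %{outputs: parent_outputs, next: nil, body: body}}
  end

  # A Sub that finished without a usable result is reported as an error.
  def handle_sub(_state, _sub_message, _parent_message), do: {:error, :sub_result_missing}
end
```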
@callback load_context(state(), Agens.Message.t()) :: String.t() | nil
Loads per-agent context for injection under the Context prefix.
Called before build_prompt/3 on every Node run, with the current state and the
Agens.Message (which carries the Node's :agent_id). Return a string to surface as context,
or nil to omit the section. Typical uses: per-agent system prompts/personas, retrieved
memory, and conversation history for turn-based / multi-turn Servings — Agens does not
store messages across runs, so multi-turn flows load prior turns here keyed by run_id,
parent_run_id, or an application-defined conversation id carried via agent_id.
Optional — defaults to nil.
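A multi-turn flow could load prior turns along the lines of the sketch below. The in-memory `state.history` map, its {role, text} tuples, and the use of :agent_id as the conversation key are assumptions; an application would more likely read from a database or an ETS table.

```elixir
defmodule HistoryContextSketch do
  # `state.history` is a hypothetical map of conversation id => list of
  # {role, text} turns, oldest first.
  def load_context(%{history: history}, %{agent_id: conversation_id}) do
    case Map.get(history, conversation_id, []) do
      # No prior turns: returning nil omits the Context section.
      [] -> nil
      turns -> Enum.map_join(turns, "\n", fn {role, text} -> "#{role}: #{text}" end)
    end
  end
end
```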
@callback load_resource(state(), Agens.Resource.t(), Agens.Message.t()) :: Agens.Resource.t()
Resolves an Agens.Resource declared on the Node before inference.
Called once per declared resource. Receives the current state, the Agens.Resource struct
(URI, name, optional description), and the Agens.Message. Return the resource with :content
populated — file contents, vector-DB result, MCP resources/read response, HTTP GET body, or
whatever the URI maps to in your application. Loaded content is surfaced under the Resources
prefix in the prompt.
Optional — defaults to returning the resource unchanged.
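One possible shape for a resolver is to dispatch on the URI scheme. In the sketch below, `resource` is a plain map standing in for Agens.Resource; the :uri key name is an assumption (the source only says the struct carries a URI), and only file:// URIs are handled.

```elixir
defmodule ResourceLoaderSketch do
  # `resource` is a plain map with assumed :uri and :content keys.
  def load_resource(_state, %{uri: "file://" <> path} = resource, _message) do
    case File.read(path) do
      {:ok, content} -> %{resource | content: content}
      # The callback still returns a resource, so the failure is surfaced as content.
      {:error, reason} -> %{resource | content: "unavailable: #{reason}"}
    end
  end

  # Unknown schemes fall through unchanged, matching the documented default.
  def load_resource(_state, resource, _message), do: resource
end
```

Whether a failed read should become placeholder content, as here, or be handled some other way is an application decision.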
@callback outputs_schema(Agens.Message.t()) :: {binary(), map()}
Returns the {property_key, schema} pair for the structured-output fragment of the response.
Default returns {"outputs", Agens.Schema.outputs()} — an empty placeholder. Most Servings
override this to derive a strict schema from the Router's declared Agens.Router.Output list
(see examples/servings/instructor_serving.ex).
Optional.
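Deriving a strict fragment from declared outputs could look roughly like this. The sketch takes a hypothetical list of {name, json_type} pairs instead of an Agens.Message, because the Agens.Router.Output struct is not described in this section; the referenced example file remains the authoritative version.

```elixir
defmodule OutputsSchemaSketch do
  # `declared` is a hypothetical list of {name, json_type} pairs standing in
  # for the Router's declared outputs.
  def outputs_schema(declared) do
    properties = Map.new(declared, fn {name, type} -> {name, %{"type" => type}} end)

    schema = %{
      "type" => "object",
      "properties" => properties,
      # Strict structured-output modes generally want every key required
      # and no undeclared keys.
      "required" => Enum.map(declared, &elem(&1, 0)),
      "additionalProperties" => false
    }

    # The property key under which build_schema/1 would mount this fragment.
    {"outputs", schema}
  end
end
```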
@callback response_schema(Agens.Message.t()) :: map()
Returns the base response-shape JSON schema (root object passed to build_schema/1).
Default is Agens.Schema.response/0 — a minimal {type: "object", properties: %{}} shell
that build_schema/1 augments with outputs and tool_calls properties. Override to
declare additional top-level response fields the LM should emit.
Optional.
Initialization hook (start/1) called once during init/1.
Receives the initial state (already populated with :config, :queue, :count, and :limit)
and returns {:ok, state}. Typical implementations stash provider-specific configuration from
state.config.args (API base URLs, credentials, model identifiers, HTTP clients) into the
state for use in handle_message/3.
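A start hook that stashes provider settings might look like the sketch below. It assumes `state.config.args` is a keyword list, and the :base_url and :model option names, the default URL, and the :provider state key are made up for illustration.

```elixir
defmodule StartHookSketch do
  # `state.config.args` is assumed to be a keyword list; the option names
  # and the :provider key are hypothetical.
  def start(%{config: %{args: args}} = state) do
    provider = %{
      base_url: Keyword.get(args, :base_url, "http://localhost:11434"),
      # Fail fast at startup if no model was configured.
      model: Keyword.fetch!(args, :model)
    }

    # Existing keys (:config, :queue, :count, :limit) are preserved.
    {:ok, Map.put(state, :provider, provider)}
  end
end
```

handle_message/3 can then read `state.provider` instead of re-parsing the configuration on every run.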
@callback tool_call(state(), map(), Agens.Message.t()) :: {binary() | integer(), any()} | {:error, term()}
Executes a single tool call requested by the LM.
Invoked for each entry in the LM response's tool_calls. Receives the current state, the
tool-call args map (matching the tool's declared parameter schema), and the Agens.Message.
Returns {tool_id, result} — where tool_id matches the LM's tool-call id and result is
whatever JSON-serializable value should be surfaced under the Tool Results prefix on the
next Node run — or {:error, reason} on failure.
The host writes the actual tool effect: hitting an MCP server, calling an HTTP API, executing local code, querying a database.
Optional — defaults to {:error, :tool_exec_not_implemented}.
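A tool dispatcher can be sketched as pattern-matched clauses. The "id", "name" and "arguments" keys are assumptions about the args map, which this section does not spell out, and the "add" tool is invented for the example.

```elixir
defmodule ToolCallSketch do
  # Hypothetical "add" tool: returns {tool_id, result} on success.
  def tool_call(_state, %{"id" => id, "name" => "add", "arguments" => %{"a" => a, "b" => b}}, _message)
      when is_number(a) and is_number(b) do
    {id, a + b}
  end

  # A named call that no clause above could handle (unknown tool or bad arguments).
  def tool_call(_state, %{"name" => name}, _message), do: {:error, {:unsupported_call, name}}

  # Anything that does not even name a tool.
  def tool_call(_state, _args, _message), do: {:error, :malformed_tool_call}
end
```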
@callback tools_schema(Agens.Message.t()) :: {binary(), map()}
Returns the {property_key, schema} pair for the tool-call fragment of the response.
Default returns {"tool_calls", Agens.Schema.tools()}. Override when the Node's tool
schemas need provider-specific shaping or when the property key needs to match a provider's
expected tool-call field name.
Optional.
Functions
@spec get_config(atom() | pid()) :: {:ok, Agens.Serving.Config.t()} | {:error, :serving_not_found}
Retrieves the Serving configuration by Serving name or pid.
@spec run(Agens.Message.t()) :: {:ok, Agens.Serving.Result.t()} | {:error, term()} | {:retry, String.t()}
Executes an Agens.Message against an Agens.Serving.
@spec start(Agens.Serving.Config.t()) :: {:ok, pid()} | {:error, term()}
Starts an Agens.Serving process.
@spec stop(atom()) :: :ok | {:error, :serving_not_found}
Stops an Agens.Serving process.