Agens.Serving behaviour (agens v0.2.0)

The Serving module provides functions for starting, stopping and running Servings.

A Serving is a module that implements the Agens.Serving behaviour and is started as a GenServer. Its job is to take a prepared Agens.Message, perform LM inference, and return a structured Agens.Serving.Result. The actual inference call inside handle_message/3 can target anything: an HTTP API (OpenAI, Anthropic, Ollama, etc.), an in-process Nx.Serving/Bumblebee pipeline, or a local rules engine. Agens is unopinionated about the backend.

A single Agens.Serving process can be reused across many Agens.Job.Nodes and Jobs. In most cases you will only need to start one text-generation Serving to be used by most, if not all, of your Nodes.

In some cases, you may have additional Servings for more specific use cases such as image generation, speech recognition, etc.

Routing

A Serving owns routing for any Node that declares it. The router can be the Serving module itself (the "merged" pattern) or a separate module (the "split" pattern):

# Merged: Serving is its own Router
defmodule MyServing do
  use Agens.Serving
  use Agens.Router

  # ...handle_message/3, handle_result/3, outputs/1, resolve/2
end

# Split: a dedicated Router module reused across Servings
defmodule MyRouter do
  use Agens.Router
  # outputs/1, resolve/2
end

defmodule MyServing do
  use Agens.Serving, router: MyRouter
  # ...handle_message/3, handle_result/3
end

When :router is omitted it defaults to __MODULE__, so merging is the zero-config path.

After handle_result/3 or handle_sub/3 returns a Result, the macro invokes the router's route/1 with the message and outputs, but only when the returned next is empty or nil. Callbacks that need to set next explicitly (e.g. :end, :retry) can still do so, and the router will not override them.
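
The precedence rule above can be sketched as a plain function. This is an illustration only, not the macro's actual code: RoutingSketch, apply_router/2, the plain map standing in for Agens.Serving.Result, passing only the outputs to the routing function, and treating nil, "" and [] as "empty" are all assumptions made for the example.

```elixir
defmodule RoutingSketch do
  # Hypothetical helper. `result` is a plain map standing in for
  # Agens.Serving.Result; `route_fun` stands in for the router's route/1.
  def apply_router(%{next: next} = result, route_fun) when next in [nil, "", []] do
    # The callback did not set next, so the router decides.
    %{result | next: route_fun.(result.outputs)}
  end

  # An explicit next (e.g. :end or :retry) is left untouched.
  def apply_router(result, _route_fun), do: result
end
```

With a routing function that returns :publish for approved outputs, a result whose next is nil comes back with next set to :publish, while a result that already carries next: :end is returned unchanged.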

Sub-Jobs (handle_sub/3)

When a Node declares both :serving and :sub, the Sub-Job runs in place of a Serving inference call. Once the Sub completes, the parent invokes handle_sub/3 on the Node's declared Serving to derive the parent Node's outputs and next from the Sub's final Agens.Message.

The default implementation generates the router's outputs/1 keys with nil values and lets the router's resolve/2 produce the fallback next. Override handle_sub/3 when the Sub's output schema differs from the parent's — typical implementations map the Sub's outputs/body into the parent's output schema, optionally via an LM call.

Types

@type state() :: %{:config => Agens.Serving.Config.t(), optional(atom()) => any()}

The internal state passed to every Serving callback.

Always a map containing at least :config (the Agens.Serving.Config the Serving was started with). Additional keys are managed by the macro injected by use Agens.Serving (queue, counters, etc.) and may be augmented by start/1.

Callbacks

build_prompt(t, t, arg3)

@callback build_prompt(Agens.Message.t(), Agens.Prefixes.t(), binary() | nil) ::
  {String.t(), String.t()}

Renders an Agens.Message into the {system, user} prompt pair sent to the LM.

Receives the message, the Serving's Agens.Prefixes, and any context string returned by load_context/2. The default implementation calls Agens.Prompt.build/3 and joins each section into two heading-prefixed strings — appropriate for OpenAI-style system/user concatenation.

Override for providers that need a different shape, e.g. a chat-message array with role labels for Anthropic, multi-turn message lists, or custom delimiters. The two-string return contract is fixed; the strings can be whatever the provider's handle_message/3 expects to receive on message.system / message.user.

Optional — defaults to the heading-and-detail format described above.
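
As a rough illustration of an override that uses custom delimiters, the sketch below returns the required two-string tuple. It is written against plain maps so it runs on its own; PromptSketch, the XML-style tags, and the choice to ignore the prefixes argument are assumptions for the example, not Agens conventions. Only :agent_id and :objective are taken from the message fields mentioned in this documentation.

```elixir
defmodule PromptSketch do
  # Same argument order as build_prompt/3: message, prefixes, context.
  # The prefixes argument is ignored here to keep the sketch small.
  def build_prompt(message, _prefixes, context) do
    # System prompt: an agent line plus the optional context string.
    system =
      ["You are agent #{message.agent_id}.", context]
      |> Enum.reject(&is_nil/1)
      |> Enum.join("\n\n")

    # User prompt: the objective wrapped in a custom delimiter.
    user = "<objective>\n#{message.objective}\n</objective>"

    {system, user}
  end
end
```

A provider that wants a role-labelled message list could still build it inside handle_message/3 from these two strings, which keeps the return contract intact.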

build_schema(t) (optional)

@callback build_schema(Agens.Message.t()) :: map()

Builds the full JSON schema sent to the LM for structured-output enforcement.

Default implementation composes response_schema/1, outputs_schema/1, and tools_schema/1 into one object: it starts from response_schema/1, layers the outputs and tools fragments under their declared property keys, and sets required to all top-level properties (strict-mode compatible).

Override only when the layered callbacks below can't express the shape you need.

Optional.
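
The layering described above can be sketched as a pure function. This is not the library's implementation: SchemaSketch and compose/2 are hypothetical names, and the use of string keys ("properties", "required") for the schema map is an assumption.

```elixir
defmodule SchemaSketch do
  # `base` plays the role of the response_schema/1 result; `fragments` is a
  # list of {property_key, schema} pairs such as those returned by
  # outputs_schema/1 and tools_schema/1.
  def compose(base, fragments) do
    # Layer each fragment under its declared property key.
    properties =
      Enum.reduce(fragments, Map.get(base, "properties", %{}), fn {key, schema}, acc ->
        Map.put(acc, key, schema)
      end)

    # Strict mode expects every top-level property to be listed as required.
    base
    |> Map.put("properties", properties)
    |> Map.put("required", properties |> Map.keys() |> Enum.sort())
  end
end
```

Calling compose/2 with an empty object shell and the two fragments yields one object whose required list names both "outputs" and "tool_calls".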

handle_message(state, t, map)

@callback handle_message(state(), Agens.Message.t(), map()) ::
  {:ok, term()} | {:error, term()}

Performs the LM inference call for a single Node run.

Receives the current state, the prepared Agens.Message (with :system and :user already populated by build_prompt/3 and :agent_id/:objective/etc. carried through from the Node), and the JSON schema assembled from the Router's declared outputs and the Node's tools.

The host writes the actual HTTP call (OpenAI, Anthropic, Ollama, etc.) or Nx.Serving / Bumblebee invocation here and returns either {:ok, parsed}, where parsed is whatever shape the host wants handle_result/3 to receive, or {:error, reason}.
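
A minimal sketch of such a callback body follows. To stay runnable without an HTTP client, the transport is injected as a function under a hypothetical state.request_fun key; in a real Serving this would be a call to whichever HTTP library the host uses. InferenceSketch, state.model, and the OpenAI-style request body are assumptions for illustration.

```elixir
defmodule InferenceSketch do
  # Same argument order as handle_message/3: state, message, schema.
  def handle_message(state, message, schema) do
    # OpenAI-style chat request with structured-output enforcement.
    body = %{
      model: state.model,
      messages: [
        %{role: "system", content: message.system},
        %{role: "user", content: message.user}
      ],
      response_format: %{
        type: "json_schema",
        json_schema: %{name: "response", strict: true, schema: schema}
      }
    }

    # `request_fun` is expected to return {:ok, %{status: ..., body: ...}}
    # with an already-decoded body, or {:error, reason}.
    case state.request_fun.(body) do
      {:ok, %{status: 200, body: parsed}} -> {:ok, parsed}
      {:ok, %{status: status}} -> {:error, {:http_status, status}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Injecting the transport also makes the callback easy to exercise in tests with a stub function, without reaching a real provider.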

handle_result(arg1, state, t)

@callback handle_result({:ok, term()} | {:error, any()}, state(), Agens.Message.t()) ::
  {:ok, Agens.Serving.Result.t()} | {:error, any()} | {:retry, String.t()}

Validates the parsed LM response and converts it into an Agens.Serving.Result.

Receives the tagged tuple from handle_message/3, the current state, and the Agens.Message the inference ran against. Returns one of:

  • {:ok, %Agens.Serving.Result{}} — the response passed validation; routing continues.
  • {:retry, reason} — the response failed validation; the runtime increments the retry counter and re-runs the Node with reason injected under the Retry prefix. Bounded by Agens.Job.Config.max_retries.
  • {:error, reason} — hard error; the Job terminates via the normal error path.

This is the seam for custom validation — domain-specific business rules, structured-output shape checks, downstream API errors that should retry rather than abort.
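
The three return shapes can be illustrated with a small, self-contained sketch. A plain map stands in for %Agens.Serving.Result{} so the code runs without Agens; ResultSketch, the "score" business rule, and the string-keyed parsed response are hypothetical.

```elixir
defmodule ResultSketch do
  # Transient transport failure: ask the runtime to re-run the Node.
  def handle_result({:error, :timeout}, _state, _message),
    do: {:retry, "The previous attempt timed out. Please answer again."}

  # Any other failure from handle_message/3 is treated as a hard error.
  def handle_result({:error, reason}, _state, _message), do: {:error, reason}

  # Hypothetical business rule: outputs must carry an integer score in 0..100.
  def handle_result({:ok, %{"outputs" => %{"score" => score} = outputs} = parsed}, _state, _message)
      when is_integer(score) and score >= 0 and score <= 100 do
    # next is left nil so the router can choose the following Node.
    {:ok, %{outputs: outputs, body: Map.get(parsed, "body"), next: nil}}
  end

  # Anything else failed validation; the reason is fed back to the LM.
  def handle_result({:ok, _parsed}, _state, _message),
    do: {:retry, "outputs.score must be an integer between 0 and 100."}
end
```

Writing the retry reason as an instruction is a deliberate choice here, since the text is injected into the next prompt.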

handle_sub(state, t, t) (optional)

@callback handle_sub(state(), Agens.Message.t(), Agens.Message.t()) ::
  {:ok, Agens.Serving.Result.t()} | {:error, any()}

Maps a completed Sub-Job's final message back onto the parent Node's Agens.Serving.Result.

Invoked when a Node declares both :serving and :sub and the Sub-Job runs in place of inference. The first message is the Sub's final Agens.Message (with :result populated by the Sub's last Node); the second is the parent Node's Agens.Message at the point the Sub was launched.

Returns {:ok, %Agens.Serving.Result{}} populating the parent Node's outputs and next. Typical implementations either reshape the Sub's outputs to match the parent's output schema directly, or feed the Sub's body into an LM call to derive parent outputs.

Optional. The default implementation populates body from the Sub's :result and sets outputs to nil for each of the router's declared keys, leaving routing to the router's route/1 fallback.
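
The "reshape the Sub's outputs" approach might look like the sketch below. It is not the library's code: SubSketch is a hypothetical module, the declared keys are passed in explicitly (so the function takes four arguments rather than the callback's three), the Sub's :result is assumed to be a map with :outputs and :body, and a plain map stands in for Agens.Serving.Result.

```elixir
defmodule SubSketch do
  # Default-like behaviour: every declared key present, every value nil.
  def nil_outputs(declared_keys), do: Map.new(declared_keys, &{&1, nil})

  # Override-like behaviour: copy what the Sub produced for the keys the
  # parent declares; anything the Sub did not produce stays nil.
  def handle_sub(_state, sub_message, _parent_message, declared_keys) do
    sub_outputs = Map.get(sub_message.result, :outputs, %{})

    outputs =
      declared_keys
      |> nil_outputs()
      |> Map.merge(Map.take(sub_outputs, declared_keys))

    # next is left nil so the router can choose the following Node.
    {:ok, %{outputs: outputs, body: Map.get(sub_message.result, :body), next: nil}}
  end
end
```

Keys the Sub emits but the parent does not declare are dropped, which keeps the parent's output shape stable regardless of what the Sub returns.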

load_context(state, t) (optional)

@callback load_context(state(), Agens.Message.t()) :: String.t() | nil

Loads per-agent context for injection under the Context prefix.

Called before build_prompt/3 on every Node run, with the current state and the Agens.Message (which carries the Node's :agent_id). Return a string to surface as context, or nil to omit the section.

Typical uses are per-agent system prompts or personas, retrieved memory, and conversation history for turn-based or multi-turn Servings. Agens does not store messages across runs, so multi-turn flows load prior turns here, keyed by run_id, parent_run_id, or an application-defined conversation id carried via agent_id.

Optional — defaults to nil.
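
For the multi-turn case, a sketch might look up prior turns by run_id and render them as text. Where the history lives is up to the application; here it is a hypothetical :history map kept in the state, and ContextSketch and the {role, text} turn shape are likewise assumptions.

```elixir
defmodule ContextSketch do
  # Same argument order as load_context/2: state, message.
  def load_context(state, message) do
    case Map.get(state.history, message.run_id, []) do
      # No prior turns: omit the Context section entirely.
      [] ->
        nil

      # Render each stored turn as "role: text", one per line.
      turns ->
        Enum.map_join(turns, "\n", fn {role, text} -> "#{role}: #{text}" end)
    end
  end
end
```

In practice the lookup would more likely hit a database or cache than the Serving's own state, but the return contract (a string or nil) stays the same.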

load_resource(state, t, t) (optional)

@callback load_resource(state(), Agens.Resource.t(), Agens.Message.t()) ::
  Agens.Resource.t()

Resolves an Agens.Resource declared on the Node before inference.

Called once per declared resource. Receives the current state, the Agens.Resource struct (URI, name, optional description), and the Agens.Message. Return the resource with :content populated — file contents, vector-DB result, MCP resources/read response, HTTP GET body, or whatever the URI maps to in your application. Loaded content is surfaced under the Resources prefix in the prompt.

Optional — defaults to returning the resource unchanged.
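
As one possible shape for this callback, the sketch below resolves file:// URIs from disk and leaves every other resource unchanged. A plain map stands in for Agens.Resource, and the :uri field name is an assumption (the documentation only says the struct carries a URI); ResourceSketch is a hypothetical module.

```elixir
defmodule ResourceSketch do
  # Same argument order as load_resource/3: state, resource, message.
  def load_resource(_state, resource, _message) do
    with %URI{scheme: "file", path: path} when is_binary(path) <- URI.parse(resource.uri),
         {:ok, content} <- File.read(path) do
      # Populate :content with the file's bytes.
      Map.put(resource, :content, content)
    else
      # Unsupported scheme or unreadable file: return the resource as-is,
      # matching the documented default behaviour.
      _ -> resource
    end
  end
end
```

Returning the resource unchanged on failure keeps the callback's return type intact; an application that needs to surface load errors would have to record them elsewhere, for example in :content itself.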

outputs_schema(t) (optional)

@callback outputs_schema(Agens.Message.t()) :: {binary(), map()}

Returns the {property_key, schema} pair for the structured-output fragment of the response.

Default returns {"outputs", Agens.Schema.outputs()} — an empty placeholder. Most Servings override this to derive a strict schema from the Router's declared Agens.Router.Output list (see examples/servings/instructor_serving.ex).

Optional.
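
Deriving a strict fragment from a list of declared outputs could be sketched as follows. The fields of Agens.Router.Output are not described here, so the example uses hypothetical maps with :key and :type, and takes the list directly rather than an Agens.Message; OutputsSchemaSketch is also a made-up name. It is not the referenced instructor_serving.ex example.

```elixir
defmodule OutputsSchemaSketch do
  # `outputs` is a list of hypothetical %{key: binary, type: binary} maps.
  def outputs_schema(outputs) do
    # One JSON-schema property per declared output.
    properties = Map.new(outputs, fn %{key: key, type: type} -> {key, %{"type" => type}} end)

    schema = %{
      "type" => "object",
      "properties" => properties,
      # Strict structured output: all keys required, nothing extra allowed.
      "required" => properties |> Map.keys() |> Enum.sort(),
      "additionalProperties" => false
    }

    # The property key under which build_schema/1 would layer the fragment.
    {"outputs", schema}
  end
end
```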

response_schema(t) (optional)

@callback response_schema(Agens.Message.t()) :: map()

Returns the base response-shape JSON schema (root object passed to build_schema/1).

Default is Agens.Schema.response/0 — a minimal {type: "object", properties: %{}} shell that build_schema/1 augments with outputs and tool_calls properties. Override to declare additional top-level response fields the LM should emit.

Optional.

@callback start(state()) :: {:ok, state()}

Initialization hook called once during init/1.

Receives the initial state (already populated with :config, :queue, :count, and :limit) and returns {:ok, state}. Typical implementations stash provider-specific configuration from state.config.args (API base URLs, credentials, model identifiers, HTTP clients) into the state for use in handle_message/3.
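
A sketch of such a hook is shown below, written against plain maps so it runs standalone. StartSketch, the :base_url and :model argument names, and the fallback values are assumptions for illustration; args is read through the Access syntax, which works whether it is a keyword list or a map.

```elixir
defmodule StartSketch do
  # Same shape as start/1: takes the initial state, returns {:ok, state}.
  def start(state) do
    args = state.config.args

    state =
      state
      # Hypothetical defaults pointing at a local Ollama-style endpoint.
      |> Map.put(:base_url, args[:base_url] || "http://localhost:11434")
      |> Map.put(:model, args[:model] || "llama3")

    {:ok, state}
  end
end
```

Copying these values into the state once means handle_message/3 can read state.model directly instead of digging through state.config.args on every run.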

tool_call(state, map, t) (optional)

@callback tool_call(state(), map(), Agens.Message.t()) ::
  {binary() | integer(), any()} | {:error, term()}

Executes a single tool call requested by the LM.

Invoked for each entry in the LM response's tool_calls. Receives the current state, the tool-call args map (matching the tool's declared parameter schema), and the Agens.Message. Returns {tool_id, result} — where tool_id matches the LM's tool-call id and result is whatever JSON-serializable value should be surfaced under the Tool Results prefix on the next Node run — or {:error, reason} on failure.

The host writes the actual tool effect: hitting an MCP server, calling an HTTP API, executing local code, querying a database.

Optional — defaults to {:error, :tool_exec_not_implemented}.
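
A dispatch-by-name implementation might look like the sketch below. The exact shape of the tool-call map is not spelled out above, so the "id", "name" and "arguments" keys are assumptions, as are ToolSketch and the toy "add" tool.

```elixir
defmodule ToolSketch do
  # Assumed call shape: %{"id" => ..., "name" => ..., "arguments" => %{...}}.
  def tool_call(_state, %{"id" => id, "name" => "add", "arguments" => %{"a" => a, "b" => b}}, _message)
      when is_number(a) and is_number(b) do
    # Return the LM's tool-call id alongside a JSON-serializable result.
    {id, a + b}
  end

  # Known tool, but the arguments did not match its parameter schema.
  def tool_call(_state, %{"name" => "add"}, _message), do: {:error, :invalid_arguments}

  # A tool name this Serving does not implement.
  def tool_call(_state, %{"name" => name}, _message), do: {:error, {:unknown_tool, name}}

  # Anything that does not look like a tool call at all.
  def tool_call(_state, _call, _message), do: {:error, :malformed_tool_call}
end
```

Real tools would replace the arithmetic with the actual side effect (an MCP request, an HTTP call, a database query) while keeping the same return shapes.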

tools_schema(t) (optional)

@callback tools_schema(Agens.Message.t()) :: {binary(), map()}

Returns the {property_key, schema} pair for the tool-call fragment of the response.

Default returns {"tool_calls", Agens.Schema.tools()}. Override when the Node's tool schemas need provider-specific shaping or when the property key needs to match a provider's expected tool-call field name.

Optional.

Functions

@spec get_config(atom() | pid()) ::
  {:ok, Agens.Serving.Config.t()} | {:error, :serving_not_found}

Retrieves the Serving configuration by Serving name or pid.

@spec run(Agens.Message.t()) ::
  {:ok, Agens.Serving.Result.t()} | {:error, term()} | {:retry, String.t()}

Executes an Agens.Message against an Agens.Serving.

@spec start(Agens.Serving.Config.t()) :: {:ok, pid()} | {:error, term()}

Starts an Agens.Serving process.

@spec stop(atom()) :: :ok | {:error, :serving_not_found}

Stops an Agens.Serving process.