Nx.Serving integration for ExBurn.
Provides batched, concurrent inference using Nx.Serving so that ExBurn
can be used in Bumblebee-style production pipelines.
Usage
# Define a serving for a compiled model
serving =
  ExBurn.Serving.new(model,
    batch_size: 32,
    batch_timeout: 50,
    partitions: System.schedulers_online()
  )
# Run batched inference
Nx.Serving.run(serving, input_tensor)
Options
:batch_size - Maximum number of inputs to batch together (default: 32)
:batch_timeout - Maximum number of milliseconds to wait for a full batch (default: 50)
:partitions - Number of serving partitions (default: scheduler count)
:padding - Whether to pad batches to full size (default: false)
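To make the interplay between :batch_size and :batch_timeout concrete, here is a small self-contained sketch of size-or-timeout batching over a process mailbox. It is an illustration only, not ExBurn's or Nx.Serving's implementation; BatchSketch and collect/2 are hypothetical names, and the timeout here is measured from the start of collection, which may differ from the real behaviour.

```elixir
defmodule BatchSketch do
  # Collects {:input, value} messages from the caller's mailbox until either
  # `batch_size` inputs have arrived or `batch_timeout` milliseconds have
  # elapsed since collection started, whichever happens first.
  def collect(batch_size, batch_timeout) do
    deadline = System.monotonic_time(:millisecond) + batch_timeout
    do_collect([], batch_size, deadline)
  end

  # The batch is full: return it in arrival order.
  defp do_collect(acc, 0, _deadline), do: Enum.reverse(acc)

  defp do_collect(acc, remaining, deadline) do
    # Time left before the deadline, never negative.
    wait = max(deadline - System.monotonic_time(:millisecond), 0)

    receive do
      {:input, value} -> do_collect([value | acc], remaining - 1, deadline)
    after
      # Deadline reached: return the partial batch.
      wait -> Enum.reverse(acc)
    end
  end
end

# Three inputs are already queued, so a batch of size 2 fills immediately.
for x <- [1, 2, 3], do: send(self(), {:input, x})
full = BatchSketch.collect(2, 50)

# Only one input is left, so the second batch is cut short by the timeout.
partial = BatchSketch.collect(2, 50)
```

A larger :batch_size favours throughput, while a smaller :batch_timeout bounds how long a lone request waits for companions.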
Types
@type model() :: ExBurn.Model.t()
@type t() :: %ExBurn.Serving{
  batch_size: pos_integer(),
  batch_timeout: pos_integer(),
  model: model(),
  padding: boolean(),
  partitions: pos_integer()
}
Functions
@spec build(model(), keyword()) :: Nx.Serving.t()
Builds an Nx.Serving for the given model and options.
This is the primary entry point for production use. The returned
Nx.Serving can be used with Nx.Serving.run/2 or supervised
in your application tree.
Examples
serving =
  ExBurn.Serving.build(model,
    batch_size: 16,
    batch_timeout: 100
  )
# Run inference
output = Nx.Serving.run(serving, input)
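For the supervised case, the following sketch shows one way the serving returned by build/2 might be placed in an application tree. MyApp.Application, MyApp.Serving, MyApp.Supervisor and MyApp.Models.load/0 are hypothetical names. It also assumes the batching options given to build/2 are carried by the returned serving; if they are not, the Nx.Serving child spec itself accepts :batch_size and :batch_timeout.

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Hypothetical helper that returns an already-compiled ExBurn.Model.t().
    model = MyApp.Models.load()

    serving = ExBurn.Serving.build(model, batch_size: 16, batch_timeout: 100)

    children = [
      # The Nx.Serving child spec takes the serving and a name to register it under.
      {Nx.Serving, serving: serving, name: MyApp.Serving}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

Once the process is running, callers would normally go through Nx.Serving.batched_run/2 with the registered name (here MyApp.Serving) rather than Nx.Serving.run/2, so that requests from different processes can share a batch.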
Creates a new ExBurn serving for the given compiled model.
Returns a struct that can be passed to Nx.Serving or used directly
with run/2.
@spec run(t(), Nx.Tensor.t()) :: Nx.Tensor.t()
Runs inference on a single input tensor using the serving.
This is a convenience wrapper around Nx.Serving.run/2.
Returns the status of the serving as a map.
Returns
%{batch_size: pos_integer(), batch_timeout: pos_integer(),
partitions: pos_integer(), padding: boolean()}
@spec with_batch_size(t(), pos_integer()) :: t()
Returns a new serving with the specified batch size.
@spec with_timeout(t(), pos_integer()) :: t()
Returns a new serving with the specified batch timeout.
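As an illustration of how the two setters might be combined, the sketch below derives a low-latency variant from an existing serving. The name low_latency and the concrete values are assumptions chosen for the example; only with_batch_size/2 and with_timeout/2 come from the specs above.

```elixir
# Takes an existing ExBurn.Serving struct and returns a tuned copy.
# Elixir data is immutable, so the serving passed in is left unchanged.
low_latency = fn serving ->
  serving
  |> ExBurn.Serving.with_batch_size(8)
  |> ExBurn.Serving.with_timeout(10)
end
```

Because each call returns a new struct, one base serving can be used to derive several differently tuned variants.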