Nx.Serving integration for ExBurn.
Provides batched, concurrent inference using Nx.Serving so that ExBurn
can be used in Bumblebee-style production pipelines.
Usage
# Build a serving for a compiled model
serving =
  ExBurn.Serving.build(model,
    batch_size: 32,
    batch_timeout: 50,
    partitions: System.schedulers_online()
  )

# Run batched inference
Nx.Serving.run(serving, input_tensor)

Options
  :batch_size - Maximum number of inputs to batch together (default: 32).
  :batch_timeout - Maximum number of milliseconds to wait for a full batch (default: 50).
  :partitions - Number of serving partitions (default: scheduler count).
  :padding - Whether to pad batches to full size (default: false).
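The options above behave like a keyword list merged over the documented defaults. The sketch below shows that reading in plain Elixir with Keyword.validate!/2 (Elixir 1.13+). The module ServingOptionsSketch and its resolve/1 function are hypothetical and are not ExBurn's actual option handling.

```elixir
defmodule ServingOptionsSketch do
  # Hypothetical helper, not part of ExBurn: merges caller options over the
  # defaults documented for ExBurn.Serving.
  @spec resolve(keyword()) :: keyword()
  def resolve(opts) do
    # Keyword.validate!/2 raises ArgumentError on unknown keys (catching
    # typos such as :batch_sise) and fills in any missing keys with the
    # defaults listed here.
    Keyword.validate!(opts,
      batch_size: 32,
      batch_timeout: 50,
      partitions: System.schedulers_online(),
      padding: false
    )
  end
end

ServingOptionsSketch.resolve(batch_size: 16)
```

Validating up front means a misspelled option fails loudly instead of silently falling back to a default.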
Summary
Functions
build/2 - Builds an Nx.Serving for the given model and options.
new/2 - Creates a new ExBurn serving for the given compiled model.
run/2 - Runs inference on a single input tensor using the serving.
Types
t()
@type t() :: %ExBurn.Serving{
        batch_size: pos_integer(),
        batch_timeout: pos_integer(),
        model: ExBurn.Model.t(),
        padding: boolean(),
        partitions: pos_integer()
      }
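The fields of t() are the model plus one field per option. A struct of this shape can be sketched as below; ServingStructSketch and its new/2 are hypothetical, and the model field is typed as term() only so the sketch runs without ExBurn. One detail worth noting: a defstruct default is evaluated once when the module is compiled, so a runtime default such as the scheduler count is filled in by the constructor instead.

```elixir
defmodule ServingStructSketch do
  # Hypothetical mirror of the documented %ExBurn.Serving{} fields.
  # :model and :partitions have no compile-time default, so they are enforced.
  @enforce_keys [:model, :partitions]
  defstruct [:model, :partitions, batch_size: 32, batch_timeout: 50, padding: false]

  @type t :: %__MODULE__{
          model: term(),
          batch_size: pos_integer(),
          batch_timeout: pos_integer(),
          partitions: pos_integer(),
          padding: boolean()
        }

  # Hypothetical constructor: resolves the partition count at runtime, then
  # uses struct!/2, which raises KeyError on keys the struct does not define.
  @spec new(term(), keyword()) :: t()
  def new(model, opts \\ []) do
    opts = Keyword.put_new_lazy(opts, :partitions, &System.schedulers_online/0)
    struct!(__MODULE__, [{:model, model} | opts])
  end
end

ServingStructSketch.new(:my_model, batch_size: 8)
```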
Functions
build/2
@spec build(ExBurn.Model.t(), keyword()) :: Nx.Serving.t()
Builds an Nx.Serving for the given model and options.
This is the primary entry point for production use. The returned
Nx.Serving can be run inline with Nx.Serving.run/2, or started under
your application's supervision tree and called with
Nx.Serving.batched_run/2.
Examples
serving =
  ExBurn.Serving.build(model,
    batch_size: 16,
    batch_timeout: 100
  )

# Run inference
output = Nx.Serving.run(serving, input)
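Starting the serving under a supervisor is what lets requests from many processes be grouped into batches; Nx.Serving.run/2 runs inline in the caller without cross-caller batching. The sketch below uses Nx.Serving's own child spec and Nx.Serving.batched_run/2. To keep it runnable without ExBurn, a small hand-written Nx.Serving that doubles its input stands in for the value build/2 returns, and the registered name MyApp.Serving is a placeholder. The batch_size and batch_timeout passed here are Nx.Serving process options; how ExBurn's own options map onto them is an assumption this page does not cover.

```elixir
Mix.install([{:nx, "~> 0.7"}])

# Stand-in for the Nx.Serving that ExBurn.Serving.build/2 would return:
# a JIT-compiled function that doubles every element of the batch.
serving =
  Nx.Serving.new(fn defn_opts -> Nx.Defn.jit(&Nx.multiply(&1, 2), defn_opts) end)

children = [
  # Nx.Serving's child spec; MyApp.Serving is a placeholder name.
  {Nx.Serving, serving: serving, name: MyApp.Serving, batch_size: 16, batch_timeout: 100}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Any process can now send work to the named serving; concurrent requests
# are grouped into batches of up to 16 entries.
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3])])
result = Nx.Serving.batched_run(MyApp.Serving, batch)
```

In a real application the children list would live in your Application module's start/2 rather than in a script.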
new/2
@spec new(ExBurn.Model.t(), keyword()) :: t()
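Creates a new ExBurn serving for the given compiled model.
Returns an %ExBurn.Serving{} struct (see t()) holding the model and
the serving options. Pass it to run/2; when you need an Nx.Serving.t()
for Nx.Serving.run/2 or a supervision tree, use build/2 instead.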
run/2
@spec run(t(), Nx.Tensor.t()) :: Nx.Tensor.t()
Runs inference on a single input tensor using the serving.
This is a convenience wrapper around Nx.Serving.run/2.
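One plausible shape for a single-tensor wrapper, assuming it wraps the tensor in a batch of one and strips the batch axis from the result, is sketched below against a plain Nx.Serving. SingleInputSketch is hypothetical and is not ExBurn's code; unlike ExBurn.Serving.run/2 it takes an Nx.Serving.t() rather than an %ExBurn.Serving{} struct, and the add-one serving is only a stand-in model.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule SingleInputSketch do
  # Hypothetical helper: single tensor in, single tensor out.
  def run(%Nx.Serving{} = serving, %Nx.Tensor{} = input) do
    # Wrap the lone input in a batch of size 1 and run it inline...
    result = Nx.Serving.run(serving, Nx.Batch.stack([input]))
    # ...then index the first entry, which drops the leading batch axis.
    result[0]
  end
end

# Stand-in serving that adds 1 to every element.
serving =
  Nx.Serving.new(fn defn_opts -> Nx.Defn.jit(&Nx.add(&1, 1), defn_opts) end)

SingleInputSketch.run(serving, Nx.tensor([1, 2, 3]))
```

A wrapper like this suits scripts and tests; for concurrent production traffic, the batched path through a supervised serving avoids running one forward pass per request.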