Nx.Serving integration for ExBurn models in Dala.
Provides batched, concurrent inference using Nx.Serving so that
ExBurn models can be used in production pipelines within Dala apps.
Usage
# Compile a model
model = Dala.ML.Burn.compile(axon_model, loss: :cross_entropy, optimizer: :adam)
# Create a serving
serving = Dala.ML.Burn.Serving.build(model, batch_size: 16, batch_timeout: 100)
# Run batched inference
output = Nx.Serving.run(serving, input_tensor)
# Or supervise it in your app tree
children = [
  {Nx.Serving,
   serving: Dala.ML.Burn.Serving.build(trained_model, batch_size: 32),
   name: :my_model_serving}
]

Options
:batch_size - Maximum number of inputs to batch together (default: 32)
:batch_timeout - Maximum number of milliseconds to wait for a full batch (default: 50)
:partitions - Number of serving partitions (default: scheduler count)
:padding - Whether to pad batches to full size (default: false)
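To illustrate how a batch size and batch timeout interact with concurrent callers, here is a minimal sketch that uses plain Nx.Serving with a stand-in function instead of a compiled ExBurn model. It assumes the options above behave like Nx.Serving's own :batch_size and :batch_timeout process options, which this page does not confirm. The serving name :demo_serving and the doubling function are hypothetical.

```elixir
# Assumes network access so Mix.install can fetch Nx.
Mix.install([{:nx, "~> 0.7"}])

# Hypothetical stand-in for a compiled model: doubles every element.
serving = Nx.Serving.jit(&Nx.multiply(&1, 2))

# Here the batching limits are passed as Nx.Serving process options.
{:ok, _sup} =
  Supervisor.start_link(
    [{Nx.Serving, serving: serving, name: :demo_serving, batch_size: 16, batch_timeout: 100}],
    strategy: :one_for_one
  )

# Four concurrent callers, each submitting a batch of one input.
# Requests that arrive within the timeout window may be grouped into one batch.
results =
  1..4
  |> Task.async_stream(fn i ->
    batch = Nx.Batch.stack([Nx.tensor([i, i + 1])])
    :demo_serving |> Nx.Serving.batched_run(batch) |> Nx.to_list()
  end)
  |> Enum.map(fn {:ok, out} -> out end)

IO.inspect(results)
```

Each caller gets back only its own slice of the result, regardless of how the requests were grouped on the server side.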
Summary
Functions
build/2: Builds an Nx.Serving for the given model and options.
new/2: Creates a new ExBurn serving for the given compiled model.
run/2: Runs inference on a single input tensor using the serving.
status/1: Returns the status of the serving as a map.
supervise/2: Builds an Nx.Serving and supervises it under a DynamicSupervisor.
with_batch_size/2: Returns a new serving with the specified batch size.
with_timeout/2: Returns a new serving with the specified batch timeout.
Functions
@spec build(ExBurn.Model.t(), keyword()) :: Nx.Serving.t()
Builds an Nx.Serving for the given model and options.
This is the primary entry point for production use. The returned
Nx.Serving can be used with Nx.Serving.run/2 or supervised
in your application tree.
@spec new(ExBurn.Model.t(), keyword()) :: ExBurn.Serving.t()
Creates a new ExBurn serving for the given compiled model.
@spec run(ExBurn.Serving.t(), Nx.Tensor.t()) :: Nx.Tensor.t()
Runs inference on a single input tensor using the serving.
This is a convenience wrapper around Nx.Serving.run/2.
@spec status(ExBurn.Serving.t()) :: map()
Returns the status of the serving as a map.
@spec supervise(ExBurn.Model.t(), keyword()) :: {:ok, pid()} | {:error, term()}
Builds an Nx.Serving and supervises it under a DynamicSupervisor.
Options
:name - Name for the serving (default: :burn_serving)
:supervisor - DynamicSupervisor pid or name (required)
Returns {:ok, pid} on success.
Example
Dala.ML.Burn.Serving.supervise(model,
  name: :my_model,
  supervisor: MyApp.DynamicSupervisor
)

Alternatively, add the serving directly to your app's children list:
children = [
  {Nx.Serving,
   serving: Dala.ML.Burn.Serving.build(model, batch_size: 32),
   name: :my_model}
]
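For readers who want to see what starting a serving under a DynamicSupervisor looks like with plain OTP and Nx calls, here is a minimal sketch. It is not the implementation of supervise/2, whose internals are not shown on this page. DemoSupervisor and the doubling function are hypothetical stand-ins, and the defaults for name, batch size and batch timeout are borrowed from the option lists above.

```elixir
# Assumes network access so Mix.install can fetch Nx.
Mix.install([{:nx, "~> 0.7"}])

# Hypothetical supervisor standing in for MyApp.DynamicSupervisor.
{:ok, _sup} = DynamicSupervisor.start_link(name: DemoSupervisor, strategy: :one_for_one)

# Hypothetical stand-in for a compiled model: doubles every element.
serving = Nx.Serving.jit(&Nx.multiply(&1, 2))

# Start the serving as a dynamically supervised child.
{:ok, pid} =
  DynamicSupervisor.start_child(
    DemoSupervisor,
    {Nx.Serving, serving: serving, name: :burn_serving, batch_size: 32, batch_timeout: 50}
  )

# A supervised serving is addressed by name through Nx.Serving.batched_run/2.
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3])])
output = :burn_serving |> Nx.Serving.batched_run(batch) |> Nx.to_list()

IO.inspect({pid, output})
```

Starting the serving dynamically is useful when models are loaded at runtime. A static children list is simpler when the set of models is known at boot.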
@spec with_batch_size(ExBurn.Serving.t(), pos_integer()) :: ExBurn.Serving.t()
Returns a new serving with the specified batch size.
@spec with_timeout(ExBurn.Serving.t(), pos_integer()) :: ExBurn.Serving.t()
Returns a new serving with the specified batch timeout.
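Both setters are described as returning a new serving instead of changing the one passed in. A minimal sketch of that pattern follows, using a hypothetical ServingConfigSketch struct, since the actual fields of ExBurn.Serving.t() are not shown on this page. The default values are taken from the Options list above.

```elixir
defmodule ServingConfigSketch do
  # Hypothetical stand-in for ExBurn.Serving.t(); field names are assumptions.
  defstruct batch_size: 32, batch_timeout: 50

  # Builds a config from keyword options, falling back to the defaults above.
  def new(opts \\ []), do: struct!(__MODULE__, opts)

  # Returns an updated copy; the original struct is left untouched.
  def with_batch_size(%__MODULE__{} = cfg, size) when is_integer(size) and size > 0 do
    %{cfg | batch_size: size}
  end

  # Returns an updated copy with a new batch timeout in milliseconds.
  def with_timeout(%__MODULE__{} = cfg, ms) when is_integer(ms) and ms > 0 do
    %{cfg | batch_timeout: ms}
  end
end

base = ServingConfigSketch.new()

tuned =
  base
  |> ServingConfigSketch.with_batch_size(16)
  |> ServingConfigSketch.with_timeout(100)

IO.inspect(base)
IO.inspect(tuned)
```

Because each call returns a fresh value, the setters chain naturally with the pipe operator and the base configuration can be reused for several variants.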