Dala.ML.Burn.Serving (dala v0.8.0)

Nx.Serving integration for ExBurn models in Dala.

Provides batched, concurrent inference using Nx.Serving so that ExBurn models can be used in production pipelines within Dala apps.

Usage

# Compile a model
model = Dala.ML.Burn.compile(axon_model, loss: :cross_entropy, optimizer: :adam)

# Create a serving
serving = Dala.ML.Burn.Serving.build(model, batch_size: 16, batch_timeout: 100)

# Run inference inline in the calling process
output = Nx.Serving.run(serving, input_tensor)

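# Or supervise it in your app tree
children = [
  {Nx.Serving,
   serving: Dala.ML.Burn.Serving.build(trained_model, batch_size: 32),
   name: :my_model_serving}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Requests from any process are then batched by the named serving
output = Nx.Serving.batched_run(:my_model_serving, input_tensor)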

Options

  • :batch_size — Maximum number of inputs to batch together (default: 32)
  • :batch_timeout — Max milliseconds to wait for a full batch (default: 50)
  • :partitions — Number of serving partitions (default: scheduler count)
  • :padding — Whether to pad batches to full size (default: false)
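The option list above can be illustrated with a small, self-contained sketch. The module `ServingOptsSketch` and its functions are hypothetical and not part of Dala; they only mirror the documented defaults. Two further assumptions: "scheduler count" is taken to mean `System.schedulers_online/0`, and `:padding` is assumed to fill a short final batch by repeating its last input, which may not match what ExBurn actually does.

```elixir
defmodule ServingOptsSketch do
  # Hypothetical helper, not part of Dala: applies the documented defaults and
  # rejects unknown keys (Keyword.validate!/2 raises ArgumentError for those).
  def normalize(opts) do
    opts =
      Keyword.validate!(opts,
        batch_size: 32,
        batch_timeout: 50,
        # Assumption: "scheduler count" means the number of online schedulers.
        partitions: System.schedulers_online(),
        padding: false
      )

    for key <- [:batch_size, :batch_timeout, :partitions] do
      value = Keyword.fetch!(opts, key)

      unless is_integer(value) and value > 0 do
        raise ArgumentError, "#{inspect(key)} must be a positive integer, got: #{inspect(value)}"
      end
    end

    unless is_boolean(Keyword.fetch!(opts, :padding)) do
      raise ArgumentError, ":padding must be a boolean"
    end

    opts
  end

  # Splits inputs into batches of at most :batch_size. With padding: true, a short
  # final batch is filled by repeating its last input (an assumed strategy).
  def batches(inputs, opts) do
    size = Keyword.fetch!(opts, :batch_size)
    pad = Keyword.fetch!(opts, :padding)

    inputs
    |> Enum.chunk_every(size)
    |> Enum.map(&maybe_pad(&1, size, pad))
  end

  defp maybe_pad(batch, size, true) when length(batch) < size do
    batch ++ List.duplicate(List.last(batch), size - length(batch))
  end

  defp maybe_pad(batch, _size, _pad), do: batch
end
```

With `batch_size: 2`, five inputs become the batches `[1, 2]`, `[3, 4]` and `[5]`; adding `padding: true` turns the last one into `[5, 5]`.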

Summary

Functions

build(model, opts \\ [])
Builds an Nx.Serving for the given model and options.

new(model, opts \\ [])
Creates a new ExBurn serving for the given compiled model.

run(serving, input)
Runs inference on a single input tensor using the serving.

server()
Returns the ExBurn.Serving.Server module. This is the Nx.Serving behaviour implementation that handles batching and dispatching inference requests to the ExBurn backend.

status(serving)
Returns the status of the serving as a map.

supervise(model, opts \\ [])
Builds an Nx.Serving and supervises it under a DynamicSupervisor.

with_batch_size(serving, batch_size)
Returns a new serving with the specified batch size.

with_timeout(serving, timeout)
Returns a new serving with the specified batch timeout.

Functions

build(model, opts \\ [])

@spec build(
  ExBurn.Model.t(),
  keyword()
) :: Nx.Serving.t()

Builds an Nx.Serving for the given model and options.

This is the primary entry point for production use. The returned Nx.Serving can be used with Nx.Serving.run/2 or supervised in your application tree.

new(model, opts \\ [])

@spec new(
  ExBurn.Model.t(),
  keyword()
) :: ExBurn.Serving.t()

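Creates a new ExBurn serving for the given compiled model. Unlike build/2, which returns an Nx.Serving, this returns an ExBurn.Serving struct, the type accepted by status/1, with_batch_size/2, and with_timeout/2.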

run(serving, input)

Runs inference on a single input tensor using the serving.

This is a convenience wrapper around Nx.Serving.run/2.

server()

@spec server() :: module()

Returns the ExBurn.Serving.Server module. This is the Nx.Serving behaviour implementation that handles batching and dispatching inference requests to the ExBurn backend.

Implementation details

The server:

You typically don't need to use this directly — use build/2 instead.
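The size-or-timeout behaviour that `:batch_size` and `:batch_timeout` control can be sketched with a plain `receive ... after` loop. This is not ExBurn.Serving.Server's code: `BatchCollectorSketch` and the `{:infer, input}` message shape are hypothetical, and a real server would also have to dispatch each batch to the ExBurn backend and reply to every caller.

```elixir
defmodule BatchCollectorSketch do
  # Hypothetical sketch, not ExBurn.Serving.Server: gathers {:infer, input} messages
  # from the mailbox until batch_size inputs arrive or batch_timeout ms have elapsed.
  def collect(batch_size, batch_timeout)
      when is_integer(batch_size) and batch_size > 0 and
             is_integer(batch_timeout) and batch_timeout >= 0 do
    deadline = System.monotonic_time(:millisecond) + batch_timeout
    do_collect([], 0, batch_size, deadline)
  end

  # Batch is full: return inputs in arrival order.
  defp do_collect(acc, size, size, _deadline), do: Enum.reverse(acc)

  defp do_collect(acc, count, size, deadline) do
    # Wait only for what is left of the overall deadline, not a fresh timeout per message.
    remaining = max(deadline - System.monotonic_time(:millisecond), 0)

    receive do
      {:infer, input} -> do_collect([input | acc], count + 1, size, deadline)
    after
      remaining -> Enum.reverse(acc)
    end
  end
end
```

For example, after three `{:infer, input}` messages are sent to the current process, `collect(2, 50)` returns the first two inputs immediately, while a following `collect(2, 10)` returns the remaining input after waiting up to 10 ms for the batch to fill.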

status(serving)

@spec status(ExBurn.Serving.t()) :: map()

Returns the status of the serving as a map.

supervise(model, opts \\ [])

@spec supervise(
  ExBurn.Model.t(),
  keyword()
) :: {:ok, pid()} | {:error, term()}

Builds an Nx.Serving and supervises it under a DynamicSupervisor.

Options

  • :name — Name for the serving (default: :burn_serving)
  • :supervisor — DynamicSupervisor pid or name (required)

Returns {:ok, pid} on success or {:error, reason} on failure.

Example

Dala.ML.Burn.Serving.supervise(model,
  name: :my_model,
  supervisor: MyApp.DynamicSupervisor
)

Alternatively, add the serving directly to your app's children list:

children = [
  {Nx.Serving,
   serving: Dala.ML.Burn.Serving.build(model, batch_size: 32),
   name: :my_model}
]
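The `{:ok, pid} | {:error, term()}` contract of supervise/2 matches what DynamicSupervisor.start_child/2 returns. The sketch below shows that pattern using only the standard library: an Agent stands in for the Nx.Serving process, and `SketchSupervisor` is a hypothetical name. It assumes, without confirmation from this page, that supervise/2 starts its child in a similar way under the given supervisor.

```elixir
# Start a DynamicSupervisor (in an app this would live in your supervision tree).
{:ok, _sup} = DynamicSupervisor.start_link(strategy: :one_for_one, name: SketchSupervisor)

# Stand-in child: an Agent registered as :my_model. With Dala this would presumably
# be an Nx.Serving child, e.g. {Nx.Serving, serving: ..., name: :my_model}.
child_spec = %{
  id: :my_model,
  start: {Agent, :start_link, [fn -> %{batch_size: 32} end, [name: :my_model]]}
}

# Same return shape as documented for supervise/2: {:ok, pid} | {:error, term()}.
{:ok, pid} = DynamicSupervisor.start_child(SketchSupervisor, child_spec)
```

Starting a second child under the same registered name returns `{:error, {:already_started, pid}}` instead of a new process, which is one way the error branch can arise.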

with_batch_size(serving, batch_size)

@spec with_batch_size(ExBurn.Serving.t(), pos_integer()) :: ExBurn.Serving.t()

Returns a new serving with the specified batch size.

with_timeout(serving, timeout)

@spec with_timeout(ExBurn.Serving.t(), pos_integer()) :: ExBurn.Serving.t()

Returns a new serving with the specified batch timeout.
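with_batch_size/2 and with_timeout/2 return a new serving rather than mutating the one passed in, which is the usual Elixir struct-update idiom. A minimal sketch, assuming a struct with `batch_size` and `batch_timeout` fields: `ServingConfigSketch` is hypothetical, and the real ExBurn.Serving fields are not documented here. The guards mirror the `pos_integer()` arguments in the specs above.

```elixir
defmodule ServingConfigSketch do
  # Hypothetical stand-in for ExBurn.Serving.t(); field names and defaults are assumed.
  defstruct batch_size: 32, batch_timeout: 50

  @type t :: %__MODULE__{batch_size: pos_integer(), batch_timeout: pos_integer()}

  @spec new() :: t()
  def new, do: %__MODULE__{}

  @spec with_batch_size(t(), pos_integer()) :: t()
  def with_batch_size(%__MODULE__{} = serving, batch_size)
      when is_integer(batch_size) and batch_size > 0 do
    # Struct update syntax builds a new struct; `serving` itself is unchanged.
    %__MODULE__{serving | batch_size: batch_size}
  end

  @spec with_timeout(t(), pos_integer()) :: t()
  def with_timeout(%__MODULE__{} = serving, timeout)
      when is_integer(timeout) and timeout > 0 do
    %__MODULE__{serving | batch_timeout: timeout}
  end
end
```

Piping `ServingConfigSketch.new()` through `with_batch_size(64)` and `with_timeout(20)` yields a struct with both fields updated, while the original keeps its defaults of 32 and 50. A non-positive argument fails the guard with a FunctionClauseError.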