Dala.ML.Burn.Serving (dala v0.6.0)


Nx.Serving integration for ExBurn models in Dala.

Provides batched, concurrent inference using Nx.Serving so that ExBurn models can be used in production pipelines within Dala apps.

Usage

# Compile a model
model = Dala.ML.Burn.compile(axon_model, loss: :cross_entropy, optimizer: :adam)

# Create a serving
serving = Dala.ML.Burn.Serving.build(model, batch_size: 16, batch_timeout: 100)

# Run batched inference
output = Nx.Serving.run(serving, input_tensor)

# Or supervise it in your app tree
children = [
  {Nx.Serving, serving: serving, name: :my_model_serving}
]
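Once a serving is supervised under a name, callers normally reach it through Nx.Serving.batched_run/2 rather than Nx.Serving.run/2. The following is a minimal sketch of that call path, not Dala's own implementation. It assumes a stand-in serving built with Nx.Serving.jit/1 in place of a compiled ExBurn model; the doubling function, the :demo_serving name, and the Nx version constraint are illustrative assumptions.

```elixir
# Assumption: Nx is fetched here only to keep the sketch self-contained.
Mix.install([{:nx, "~> 0.7"}])

# Stand-in for the serving returned by Dala.ML.Burn.Serving.build/2:
# a plain Nx.Serving that doubles its input.
serving = Nx.Serving.jit(&Nx.multiply(&1, 2))

children = [
  {Nx.Serving, serving: serving, name: :demo_serving, batch_size: 16, batch_timeout: 100}
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

# A plain Nx.Serving expects an Nx.Batch as input. Whether a Dala-built
# serving accepts a bare tensor instead depends on its preprocessing.
batch = Nx.Batch.stack([Nx.tensor([1.0, 2.0, 3.0])])

# Sends the batch to the named serving process and waits for the result.
output = Nx.Serving.batched_run(:demo_serving, batch)
```

Nx.Serving.run/2 executes the computation in the calling process, while Nx.Serving.batched_run/2 sends the request to the named process, which can combine it with requests from other callers.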

Options

  • :batch_size — Maximum number of inputs to batch together (default: 32)
  • :batch_timeout — Max milliseconds to wait for a full batch (default: 50)
  • :partitions — Number of serving partitions (default: scheduler count)
  • :padding — Whether to pad batches to full size (default: false)
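The way :batch_size and :batch_timeout interact can be sketched with a plain Nx.Serving process. Whether Dala forwards these options to Nx.Serving unchanged is an assumption; the :options_demo name and the add-one function are illustrative only.

```elixir
# Assumption: Nx is fetched here only to keep the sketch self-contained.
Mix.install([{:nx, "~> 0.7"}])

# Stand-in serving that adds 1 to every element of its input.
serving = Nx.Serving.jit(&Nx.add(&1, 1))

# The process collects requests until it holds 4 entries or 50 ms have
# passed since the first pending request, whichever comes first.
{:ok, _pid} =
  Nx.Serving.start_link(
    serving: serving,
    name: :options_demo,
    batch_size: 4,
    batch_timeout: 50
  )

# Four concurrent callers, each submitting a single-entry batch.
results =
  1..4
  |> Task.async_stream(fn i ->
    batch = Nx.Batch.stack([Nx.tensor([i * 1.0])])

    :options_demo
    |> Nx.Serving.batched_run(batch)
    |> Nx.to_flat_list()
  end)
  |> Enum.map(fn {:ok, value} -> value end)
```

Each caller receives only the slice of the output that corresponds to its own input, regardless of how the requests were grouped.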

Summary

Functions

build(model, opts \\ [])

Builds an Nx.Serving for the given model and options.

new(model, opts \\ [])

Creates a new ExBurn serving for the given compiled model.

run(serving, input)

Runs inference on a single input tensor using the serving.

supervise(model, opts \\ [])

Builds an Nx.Serving and supervises it under a registry.
Functions

build(model, opts \\ [])

@spec build(
  ExBurn.Model.t(),
  keyword()
) :: Nx.Serving.t()

Builds an Nx.Serving for the given model and options.

This is the primary entry point for production use. The returned Nx.Serving can be used with Nx.Serving.run/2 or supervised in your application tree.

new(model, opts \\ [])

@spec new(
  ExBurn.Model.t(),
  keyword()
) :: ExBurn.Serving.t()

Creates a new ExBurn serving for the given compiled model.

run(serving, input)

Runs inference on a single input tensor using the serving.

This is a convenience wrapper around Nx.Serving.run/2.

supervise(model, opts \\ [])

@spec supervise(
  ExBurn.Model.t(),
  keyword()
) :: {:ok, pid()} | {:error, term()}

Builds an Nx.Serving and supervises it under a registry.

Returns {:ok, pid} on success and {:error, reason} on failure.
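Because the spec allows both {:ok, pid} and {:error, term}, callers may want to match on each shape. This sketch uses a hypothetical helper module, SuperviseOutcome, which is not part of Dala. Treating {:error, {:already_started, pid}} as a success follows the usual OTP convention and is an assumption about how supervise/2 reports an existing process.

```elixir
defmodule SuperviseOutcome do
  # Hypothetical helper: normalizes the documented return shapes of
  # Dala.ML.Burn.Serving.supervise/2 into {:started, pid} or {:failed, reason}.

  def handle({:ok, pid}) when is_pid(pid), do: {:started, pid}

  # Assumption: an already running serving is reported in the OTP style.
  def handle({:error, {:already_started, pid}}) when is_pid(pid), do: {:started, pid}

  def handle({:error, reason}), do: {:failed, reason}
end

# In an application this would wrap the real call, for example:
#   model |> Dala.ML.Burn.Serving.supervise(batch_size: 16) |> SuperviseOutcome.handle()
# Here a stand-in tuple is used so the sketch runs without a model.
outcome = SuperviseOutcome.handle({:ok, self()})
```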