ExTorch.AOTI.Server (extorch v0.4.0)


A GenServer that wraps a loaded AOTI (.pt2) model for concurrent serving.

Provides the same OTP fault tolerance and telemetry instrumentation as ExTorch.JIT.Server, but for AOTInductor-compiled models.

Telemetry Events

The server emits the following :telemetry events:

  • [:extorch, :aoti, :load, :start | :stop] - Model loading.

  • [:extorch, :aoti, :forward, :start | :stop | :exception] - Inference.

All events include %{path: String.t()} in their metadata. Forward events additionally include %{input_count: integer()} in their metadata.
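A handler for these events could look like the sketch below. The module name `MyApp.AOTITelemetry`, the handler id and the log format are hypothetical. The code also assumes that `:stop` and `:exception` measurements carry a `:duration` in native time units, which is the `:telemetry.span/3` convention but is not stated above, so the handler falls back to `nil` when it is missing.

```elixir
defmodule MyApp.AOTITelemetry do
  # Hypothetical handler module; the name, handler id and log format are
  # illustrative and not part of ExTorch.
  require Logger

  @events [
    [:extorch, :aoti, :load, :stop],
    [:extorch, :aoti, :forward, :stop],
    [:extorch, :aoti, :forward, :exception]
  ]

  # Call once at application start. Needs the :telemetry library loaded.
  def attach do
    :telemetry.attach_many("myapp-aoti-logger", @events, &__MODULE__.handle_event/4, nil)
  end

  # Assumption: measurements carry a :duration in native time units
  # (the :telemetry.span/3 convention); nil is logged when it is absent.
  def handle_event([:extorch, :aoti, stage, kind], measurements, metadata, _config) do
    duration_ms =
      case measurements do
        %{duration: d} -> System.convert_time_unit(d, :native, :millisecond)
        _ -> nil
      end

    # :input_count is only present on forward events, hence Map.get/2.
    message =
      "aoti #{stage} #{kind} path=#{metadata.path} " <>
        "inputs=#{inspect(Map.get(metadata, :input_count))} duration_ms=#{inspect(duration_ms)}"

    Logger.info(message)
    message
  end
end
```

Attaching with a remote capture (`&__MODULE__.handle_event/4`) rather than an anonymous function is what `:telemetry` recommends, since local function handlers are slower to invoke.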

Example

```elixir
{:ok, pid} = ExTorch.AOTI.Server.start_link(path: "model.pt2")
# `input` is an ExTorch.Tensor prepared by the caller
[output] = ExTorch.AOTI.Server.predict(pid, [input])
```

Named servers

```elixir
{:ok, _} = ExTorch.AOTI.Server.start_link(path: "model.pt2", name: FastModel)
[output] = ExTorch.AOTI.Server.predict(FastModel, [input])
```

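Summary

Functions

  • child_spec(init_arg) - Returns a specification to start this module under a supervisor.
  • info(server) - Get information about the loaded model.
  • predict(server, inputs, timeout \\ 30000) - Run inference on the model (synchronous).
  • start_link(opts) - Start an AOTI model server.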

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.
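A minimal sketch of supervising the server, assuming `child_spec/1` is the default one generated by `use GenServer`, which forwards the keyword list to `start_link/1`. `MyApp.ModelSupervisor` is a hypothetical module name; the child list is kept in its own function so it can be inspected without starting anything.

```elixir
defmodule MyApp.ModelSupervisor do
  # Hypothetical supervisor; the child tuple below relies on the default
  # child_spec/1, which calls ExTorch.AOTI.Server.start_link(opts).
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg, name: __MODULE__)

  # The keyword list becomes the `opts` passed to start_link/1.
  def children do
    [
      {ExTorch.AOTI.Server, path: "model.pt2", name: FastModel}
    ]
  end

  @impl true
  def init(_arg) do
    # :one_for_one restarts only the child that crashed.
    Supervisor.init(children(), strategy: :one_for_one)
  end
end
```

With the default spec, the child `:id` is the module itself, so two `ExTorch.AOTI.Server` children under one supervisor need distinct ids, for example via `Supervisor.child_spec/2`.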

info(server)

@spec info(GenServer.server()) :: map()

Get information about the loaded model.

predict(server, inputs, timeout \\ 30000)

@spec predict(GenServer.server(), [ExTorch.Tensor.t()], timeout()) :: [
  ExTorch.Tensor.t()
]

Run inference on the model (synchronous).

Returns

A list of output tensors.
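Because `predict/3` is synchronous and takes a timeout, it is most likely backed by `GenServer.call/3`. If so, a slow model or a missing server surfaces as an exit in the caller rather than as an error tuple. The hypothetical `MyApp.Inference` wrapper below converts the two common cases into tagged tuples under that assumption; the 5-second default is illustrative (the documented default is 30000 ms).

```elixir
defmodule MyApp.Inference do
  # Hypothetical wrapper. Assumes predict/3 is a GenServer.call/3, so a
  # timeout or an unregistered server shows up as an exit in this process.
  def safe_predict(server, inputs, timeout \\ 5_000) do
    guard_exit(fn -> ExTorch.AOTI.Server.predict(server, inputs, timeout) end)
  end

  # Maps the two most common call exits to tagged tuples; any other exit
  # reason propagates unchanged.
  def guard_exit(fun) when is_function(fun, 0) do
    {:ok, fun.()}
  catch
    :exit, {:timeout, _call} -> {:error, :timeout}
    :exit, {:noproc, _call} -> {:error, :not_running}
  end
end
```

Catching a call timeout does not cancel the work already queued in the server, so the model may still run the request after the caller has given up.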

start_link(opts)

@spec start_link(keyword()) :: GenServer.on_start()

Start an AOTI model server.

Options

  • :path (required) - Path to the .pt2 model package.
  • :name - Optional registered name.
  • :model_name - Model name within the package (default: "model").
  • :device_index - CUDA device index (default: -1 for CPU).
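As an illustration of these options, the hypothetical `MyApp.ModelPool` helper below builds one child spec per CUDA device, each with its own `:name`, `:device_index` and supervisor `:id`. It writes the spec map directly, mirroring the default spec that `use GenServer` generates; the module name and naming scheme are assumptions.

```elixir
defmodule MyApp.ModelPool do
  # Hypothetical helper: one ExTorch.AOTI.Server per CUDA device.
  def child_specs(path, device_indices) do
    for idx <- device_indices do
      opts = [
        path: path,
        # Registered as MyApp.Model.GPU0, MyApp.Model.GPU1, ...
        name: Module.concat(MyApp.Model, "GPU#{idx}"),
        device_index: idx
      ]

      # Plain child-spec map with a distinct :id per device, so several
      # servers of the same module can live under one supervisor.
      %{id: {:aoti_model, idx}, start: {ExTorch.AOTI.Server, :start_link, [opts]}}
    end
  end
end
```

The specs can then be started with `Supervisor.start_link(MyApp.ModelPool.child_specs("model.pt2", [0, 1]), strategy: :one_for_one)`, after which `ExTorch.AOTI.Server.predict(MyApp.Model.GPU0, [input])` targets device 0.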