ExTorch.AOTI.Server
(extorch v0.4.0)
A GenServer that wraps a loaded AOTI (.pt2) model for concurrent serving.
Provides the same OTP fault tolerance and telemetry instrumentation as
ExTorch.JIT.Server, but for AOTInductor-compiled models.
Telemetry Events
The server emits the following :telemetry events:
- [:extorch, :aoti, :load, :start | :stop] - Model loading.
- [:extorch, :aoti, :forward, :start | :stop | :exception] - Inference.
All events include %{path: String.t()} in metadata. Forward events also
include %{input_count: integer()}.
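As an illustration, a handler that logs forward-pass latency might be attached like this. This is a sketch, not part of the ExTorch API: it assumes the standard :telemetry package and the common telemetry-span convention that :stop events carry a :duration measurement in native time units.

```elixir
# Sketch: log inference latency from the [:extorch, :aoti, :forward, :stop] event.
# Assumes :telemetry is a dependency and that :stop measurements include :duration.
:telemetry.attach(
  "log-aoti-forward",
  [:extorch, :aoti, :forward, :stop],
  fn _event, measurements, %{path: path} = _metadata, _config ->
    ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    IO.puts("forward on #{path} took #{ms}ms")
  end,
  nil
)
```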
Example
{:ok, pid} = ExTorch.AOTI.Server.start_link(path: "model.pt2")
[output] = ExTorch.AOTI.Server.predict(pid, [input])

Named servers
{:ok, _} = ExTorch.AOTI.Server.start_link(path: "model.pt2", name: FastModel)
[output] = ExTorch.AOTI.Server.predict(FastModel, [input])
Summary
Functions
Returns a specification to start this module under a supervisor.
Get information about the loaded model.
Run inference on the model (synchronous).
Start an AOTI model server.
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
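Because the module provides a child spec, it can be placed directly in a supervision tree. A minimal sketch, assuming a model package at "model.pt2" and the registered name FastModel from the example above:

```elixir
# Sketch: supervise the AOTI server; the keyword list is passed to start_link/1.
children = [
  {ExTorch.AOTI.Server, path: "model.pt2", name: FastModel}
]

Supervisor.start_link(children, strategy: :one_for_one)
```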
@spec info(GenServer.server()) :: map()
Get information about the loaded model.
@spec predict(GenServer.server(), [ExTorch.Tensor.t()], timeout()) :: [ExTorch.Tensor.t()]
Run inference on the model (synchronous).
Returns
A list of output tensors.
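Since predict/3 is a synchronous call, the optional third argument bounds how long the caller waits, following the usual GenServer timeout convention. A sketch (the 30_000 ms value and FastModel name are illustrative):

```elixir
# Sketch: allow up to 30 seconds for a slow batch before the call times out.
[output] = ExTorch.AOTI.Server.predict(FastModel, [input], 30_000)
```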
@spec start_link(keyword()) :: GenServer.on_start()
Start an AOTI model server.
Options
- :path (required) - Path to the .pt2 model package.
- :name - Optional registered name.
- :model_name - Model name within the package (default: "model").
- :device_index - CUDA device index (default: -1 for CPU).
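Putting the options together, a server loading a package onto the first CUDA device might be started like this. A sketch only: the "resnet.pt2" path and ResNet name are hypothetical placeholders.

```elixir
# Sketch: registered server on CUDA device 0; model_name shown at its default.
{:ok, _pid} =
  ExTorch.AOTI.Server.start_link(
    path: "resnet.pt2",
    name: ResNet,
    model_name: "model",
    device_index: 0
  )
```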