ExTorch.JIT.Server (extorch v0.3.0)


A GenServer that wraps a loaded TorchScript model for concurrent serving.

Provides process isolation, fault tolerance, and serialized access to model inference. Forward calls are serialized through the GenServer to ensure thread safety for models with mutable state (e.g., BatchNorm, Dropout).

Telemetry Events

The server emits the following :telemetry events:

  • [:extorch, :jit, :load, :start] - When model loading begins.

    • Measurements: %{system_time: integer}
    • Metadata: %{path: String.t(), device: atom()}
  • [:extorch, :jit, :load, :stop] - When model loading completes.

    • Measurements: %{duration: native_time}
    • Metadata: %{path: String.t(), device: atom()}
  • [:extorch, :jit, :load, :exception] - When model loading fails.

    • Measurements: %{duration: native_time}
    • Metadata: %{path: String.t(), device: atom(), kind: atom(), reason: term()}
  • [:extorch, :jit, :forward, :start] - When inference begins.

    • Measurements: %{system_time: integer}
    • Metadata: %{path: String.t(), device: atom(), input_count: integer()}
  • [:extorch, :jit, :forward, :stop] - When inference completes.

    • Measurements: %{duration: native_time}
    • Metadata: %{path: String.t(), device: atom(), input_count: integer()}
  • [:extorch, :jit, :forward, :exception] - When inference fails.

    • Measurements: %{duration: native_time}
    • Metadata: %{path: String.t(), device: atom(), input_count: integer(), kind: atom(), reason: term()}

Example

{:ok, pid} = ExTorch.JIT.Server.start_link(path: "model.pt", device: :cpu)
result = ExTorch.JIT.Server.predict(pid, [input_tensor])

Summary

Functions

Returns a specification to start this module under a supervisor.

Get information about the loaded model.

Run inference on the model (synchronous).

Start a model server.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

info(server)

@spec info(GenServer.server()) :: map()

Get information about the loaded model.

predict(server, inputs, timeout \\ 30_000)

@spec predict(GenServer.server(), [ExTorch.Tensor.t()], timeout()) :: term()

Run inference on the model (synchronous).

Arguments

  • server - PID or registered name of the model server.
  • inputs - List of input tensors.
  • timeout - Call timeout in milliseconds (default: 30_000).

Returns

The model output.
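For example, a longer timeout can be passed for large batches. `MyApp.Model` is a hypothetical registered name (set via the `:name` option to `start_link/1`):

```elixir
# Allow up to 60 seconds for a large batch instead of the
# default 30_000 ms. MyApp.Model is a hypothetical name.
result = ExTorch.JIT.Server.predict(MyApp.Model, [batch_tensor], 60_000)
```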

start_link(opts)

@spec start_link(keyword()) :: GenServer.on_start()

Start a model server.

Options

  • :path (required) - Path to the .pt model file.
  • :device - Device to load the model onto (default: :cpu).
  • :name - Optional registered name for the server.
  • :eval - Whether to set the model to eval mode on load (default: true).
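Because `child_spec/1` is defined, the server can also be started under a supervision tree. A minimal sketch, assuming a hypothetical model path and registered name:

```elixir
children = [
  {ExTorch.JIT.Server,
   # Path and name below are illustrative placeholders.
   path: "priv/models/model.pt",
   device: :cpu,
   name: MyApp.Model}
]

Supervisor.start_link(children, strategy: :one_for_one)
```

Supervised startup means a crashed model server is restarted automatically, which is the fault-tolerance property described above.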