ExTorch.Export (extorch v0.4.0)


Read and introspect PyTorch ExportedProgram .pt2 archives.

This module provides a pure-Elixir reader for .pt2 files produced by torch.export.save(). It can extract the model graph, weight metadata, and raw weight tensors without requiring Python or C++ ExportedProgram support.

Python export workflow

import torch

model = MyModel()
model.eval()
exported = torch.export.export(model, (example_input,))
torch.export.save(exported, "model.pt2")

Elixir usage

# Load and run inference directly
model = ExTorch.Export.load("model.pt2")
output = ExTorch.Export.forward(model, [input])

# Or read schema and weights separately
schema = ExTorch.Export.read_schema("model.pt2")
weights = ExTorch.Export.read_weights("model.pt2")

# Generate DSL source code
IO.puts(ExTorch.Export.to_elixir("model.pt2", "MyModel"))

Note

This reads .pt2 files from torch.export.save, NOT from aoti_compile_and_package. AOTI-compiled .pt2 files don't contain the graph or separable weights -- use ExTorch.AOTI for those.

Summary

Functions

forward(model, inputs)
Run inference on a loaded Export model.

forward_compiled(model, inputs)
Run inference using the pre-compiled graph executor.

forward_native(model, inputs)
Run inference using the native graph executor.

forward_profiled(model, inputs)
Run forward/2 with per-node timing instrumentation.

load(path, opts \\ [])
Load an exported .pt2 model for inference.

read_schema(path)
Read the model schema from an exported .pt2 archive.

read_weights(path)
Load weight tensors from an exported .pt2 archive.

to_elixir(path, module_name \\ "MyModel")
Generate an ExTorch.NN.Module DSL definition from an exported .pt2 archive.

Functions

forward(model, inputs)

Run inference on a loaded Export model.

Interprets the ATen computation graph, dispatching each operation to the corresponding ExTorch tensor function.

Args

  • model (ExTorch.Export.Model) - the loaded model.
  • inputs ([ExTorch.Tensor]) - input tensors, matching the model's user inputs.

Returns

The output tensor (or list of tensors for multi-output models).

Example

model = ExTorch.Export.load("model.pt2")
input = ExTorch.randn({1, 10})
output = ExTorch.Export.forward(model, [input])

forward_compiled(model, inputs)

@spec forward_compiled(ExTorch.Export.Model.t(), [ExTorch.Tensor.t()]) ::
  ExTorch.Tensor.t() | [ExTorch.Tensor.t()]

Run inference using the pre-compiled graph executor.

The fastest Export inference path. All op schemas were resolved and argument templates pre-built at load/2 time. This function only passes tensors to C++ and gets tensors back — zero encoding overhead.

Falls back to forward_native/2 if the graph couldn't be pre-compiled.

model = ExTorch.Export.load("model.pt2", device: :cuda)
output = ExTorch.Export.forward_compiled(model, [input])

forward_native(model, inputs)

Run inference using the native graph executor.

Compiles the schema graph into an instruction stream and executes the entire graph in a single NIF call via execute_graph, eliminating per-node NIF boundary crossings. This is significantly faster than forward/2 for high-node-count models (e.g., ViT with 430 nodes) while still supporting all ops through the c10::Dispatcher.

Falls back gracefully for ops registered via ExTorch.Export.OpRegistry since those are also dispatched through the same C++ dispatcher.

model = ExTorch.Export.load("vit_b_16.pt2", device: :cuda)
input = ExTorch.Tensor.to(input, device: :cuda)
output = ExTorch.Export.forward_native(model, [input])

forward_profiled(model, inputs)

@spec forward_profiled(ExTorch.Export.Model.t(), [ExTorch.Tensor.t()]) ::
  {ExTorch.Tensor.t() | [ExTorch.Tensor.t()], map()}

Run forward/2 with per-node timing instrumentation. Returns {output, %{op_target => %{count: N, total_us: T}}}, aggregated by op target so you can see which ops dominate inference time.

Intended for diagnostics only. Timing via :erlang.monotonic_time/1 adds roughly 1 μs of measurement overhead per node.
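A typical profiling session sorts the aggregate map by cumulative time to find the hot ops. The Enum pipeline below is one way to rank them (the model path and input are placeholders):

```elixir
model = ExTorch.Export.load("model.pt2")
{output, profile} = ExTorch.Export.forward_profiled(model, [input])

# Rank op targets by total time and print the five most expensive
profile
|> Enum.sort_by(fn {_op, %{total_us: t}} -> -t end)
|> Enum.take(5)
|> Enum.each(fn {op, %{count: n, total_us: t}} ->
  IO.puts("#{op}: #{n} calls, #{t} μs total")
end)
```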

load(path, opts \\ [])

@spec load(
  String.t(),
  keyword()
) :: ExTorch.Export.Model.t()

Load an exported .pt2 model for inference.

Reads the graph and weights, and prepares the model for forward/2.

Args

  • path (String) - path to the .pt2 file from torch.export.save.
  • opts (keyword) - optional:
    • :device (:cpu | :cuda | {:cuda, index}) - device to place all weight tensors on. Defaults to :cpu. When set to :cuda, every loaded parameter/buffer is moved to the GPU at load time, so subsequent forward/2 calls run entirely on the GPU (as long as the user input is also on the GPU).

Returns

An %ExTorch.Export.Model{} struct.

Example

# CPU (default)
model = ExTorch.Export.load("model.pt2")
output = ExTorch.Export.forward(model, [input_tensor])

# GPU
model = ExTorch.Export.load("model.pt2", device: :cuda)
input = ExTorch.Tensor.to(cpu_input, device: :cuda)
output = ExTorch.Export.forward(model, [input])

read_schema(path)

@spec read_schema(String.t()) :: map()

Read the model schema from an exported .pt2 archive.

Returns a map with:

  • :graph - the computation graph as a list of node maps
  • :inputs - graph input names
  • :outputs - graph output names
  • :weights - weight metadata (name → shape, dtype, requires_grad)
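For example, the schema can be used to build a histogram of the ops appearing in the graph. The snippet below assumes each node map exposes its op target under a :target key, which is an assumption about the node-map layout, not a documented guarantee:

```elixir
schema = ExTorch.Export.read_schema("model.pt2")

IO.inspect(schema.inputs, label: "inputs")
IO.inspect(schema.outputs, label: "outputs")

# Count nodes per op target (assumes a :target key on each node map)
schema.graph
|> Enum.frequencies_by(& &1[:target])
|> IO.inspect(label: "op histogram")
```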

read_weights(path)

@spec read_weights(String.t()) :: %{required(String.t()) => ExTorch.Tensor.t()}

Load weight tensors from an exported .pt2 archive.

Returns a map of %{fqn => %ExTorch.Tensor{}}.
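A quick way to inspect the result is to list the fully-qualified names, which mirror the PyTorch module tree (the example names are illustrative):

```elixir
weights = ExTorch.Export.read_weights("model.pt2")

IO.puts("#{map_size(weights)} tensors loaded")

# FQNs follow PyTorch naming, e.g. "linear1.weight", "linear1.bias"
weights |> Map.keys() |> Enum.sort() |> Enum.each(&IO.puts/1)
```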

to_elixir(path, module_name \\ "MyModel")

@spec to_elixir(String.t(), String.t()) :: String.t()

Generate an ExTorch.NN.Module DSL definition from an exported .pt2 archive.

Maps ATen operations in the graph to ExTorch NN layer types where possible.

Args

  • path - path to the .pt2 file.
  • module_name - name for the generated Elixir module.
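One common pattern is to write the generated source to a file so it can be reviewed or checked into the project (the output filename here is arbitrary):

```elixir
source = ExTorch.Export.to_elixir("model.pt2", "MyModel")
File.write!("my_model.ex", source)
```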