GPU-accelerated ML inference via ExCubecl.
Provides a high-level interface for running ML models on the GPU,
integrating with Dala's existing Dala.ML modules and Nx tensors.
Architecture
Nx Tensor → GPU Buffer → CubeCL Kernels → GPU Buffer → Nx Tensor
Supported Models
Models are loaded from pre-compiled CubeCL kernel libraries:
:mobilenet_v2 - Image classification (224x224 RGB)
:yolo_v5 - Object detection (640x640 RGB)
:blazeface - Face detection (128x128 RGB)
:posenet - Pose estimation (257x257 RGB)
:deeplab - Semantic segmentation (513x513 RGB)
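The input sizes listed above can be kept in a small lookup table so that callers can check an image's dimensions before preprocessing. The following is a minimal sketch, not part of the library: the module name `ModelInputs`, the function `input_shape/1`, and the height-width-channels ordering of the returned tuple are all assumptions made for illustration.

```elixir
defmodule ModelInputs do
  # Sizes copied from the supported-models list above.
  # The module and function names are hypothetical, not part of Dala.
  @sizes %{
    mobilenet_v2: {224, 224},
    yolo_v5: {640, 640},
    blazeface: {128, 128},
    posenet: {257, 257},
    deeplab: {513, 513}
  }

  # Returns {:ok, {height, width, channels}} for a known model,
  # or {:error, :unknown_model}. The HWC ordering is an assumption;
  # the list above only states "RGB", i.e. three channels.
  def input_shape(model) when is_atom(model) do
    case Map.fetch(@sizes, model) do
      {:ok, {h, w}} -> {:ok, {h, w, 3}}
      :error -> {:error, :unknown_model}
    end
  end
end

{:ok, {224, 224, 3}} = ModelInputs.input_shape(:mobilenet_v2)
```

A table like this fails with a tagged error for an unknown model atom instead of raising, which matches the `{:ok, _} | {:error, _}` style used by the functions in this module.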
Example
# Load a model
{:ok, model} = Dala.ML.Gpu.load_model(:mobilenet_v2)
# Preprocess image
input_tensor = Dala.ML.preprocess(image_data, size: {224, 224})
# Run inference on GPU
{:ok, output} = Dala.ML.Gpu.predict(model, input_tensor)
# Post-process results
classes = Dala.ML.Gpu.top_k(output, k: 5)
GPU-to-GPU Frame Inference
For video pipelines, run inference directly on GPU frame buffers without a CPU round-trip:
{:ok, model} = Dala.ML.Gpu.load_model(:mobilenet_v2)
# Bind video frames (GPU textures) to the model pipeline
{:ok, model} = Dala.ML.Gpu.load_model_from_frames(model, video_frames)
# Run inference on a single frame (GPU-to-GPU)
{:ok, output_tensor} = Dala.ML.Gpu.predict_frame(model, frame)
Integration with Dala.ML
This module complements, rather than replaces, the existing Dala.ML modules:
Dala.ML.CoreML - iOS-native CoreML (best performance on iOS)
Dala.ML.EMLX - MLX backend for Apple Silicon
Dala.ML.ONNX - Cross-platform ONNX Runtime
Dala.ML.Gpu.Inference - GPU compute via CubeCL (this module)
Use Dala.ML.predict/2 for automatic backend selection, or call
this module directly for GPU-specific control.
Summary
Functions
List available pre-compiled models.
Free a model's GPU pipeline resources.
Load a pre-compiled model for GPU inference.
Load model weights from GPU video frames for GPU-to-GPU inference.
Return model metadata.
Run inference on a loaded model with an Nx tensor input.
Run inference asynchronously.
Run inference directly on a VideoFrame (GPU-to-GPU).
Return the top-k predictions from a classification output.
Types
Functions
@spec available_models() :: [atom()]
List available pre-compiled models.
Free a model's GPU pipeline resources.
Load a pre-compiled model for GPU inference.
@spec load_model_from_frames(model(), [ExCubecl.VideoFrame.t() | binary()]) :: {:ok, model()} | {:error, term()}
Load model weights from GPU video frames for GPU-to-GPU inference.
This enables processing of ExCubecl.VideoFrame structs without a
CPU round-trip. The frames are uploaded to GPU buffers and bound
to the model pipeline.
Parameters
model - a loaded model struct
frames - list of ExCubecl.VideoFrame structs or raw binaries
Returns
{:ok, updated_model} with frame buffers bound to the pipeline.
Example
frames = ExCubecl.VideoFrame.stream(camera_source, max_frames: 30)
{:ok, model} = Dala.ML.Gpu.load_model(:mobilenet_v2)
{:ok, model} = Dala.ML.Gpu.load_model_from_frames(model, frames)
Return model metadata.
@spec predict(model(), Nx.Tensor.t()) :: {:ok, Nx.Tensor.t()} | {:error, term()}
Run inference on a loaded model with an Nx tensor input.
@spec predict_async(model(), Nx.Tensor.t()) :: {:ok, reference()} | {:error, term()}
Run inference asynchronously.
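The spec shows that `predict_async/2` returns a reference, but this page does not say how the result is delivered. The sketch below illustrates one common Elixir pattern for such an API, under the explicit assumption that the result arrives as a `{ref, result}` message in the caller's mailbox. `AsyncSketch`, `fake_predict_async/1`, and `await/2` are hypothetical stand-ins and not part of Dala; check the library source for the actual delivery mechanism.

```elixir
defmodule AsyncSketch do
  # Stand-in for predict_async/2: returns {:ok, ref} immediately and later
  # sends {ref, result} to the caller. The message shape is an assumption.
  def fake_predict_async(input) when is_list(input) do
    caller = self()
    ref = make_ref()
    spawn(fn -> send(caller, {ref, {:ok, Enum.sum(input)}}) end)
    {:ok, ref}
  end

  # Waits for the message tagged with `ref`, or gives up after `timeout` ms.
  def await(ref, timeout \\ 5_000) do
    receive do
      {^ref, result} -> result
    after
      timeout -> {:error, :timeout}
    end
  end
end

{:ok, ref} = AsyncSketch.fake_predict_async([1, 2, 3])
{:ok, 6} = AsyncSketch.await(ref)
```

Pinning the reference with `^ref` means unrelated messages stay in the mailbox, and the timeout keeps a lost reply from blocking the caller indefinitely.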
@spec predict_frame(model(), ExCubecl.VideoFrame.t() | binary()) :: {:ok, Nx.Tensor.t()} | {:error, term()}
Run inference directly on a VideoFrame (GPU-to-GPU).
This avoids a CPU round-trip for the input by running the model pipeline directly on the GPU texture backing the VideoFrame. Only the output is copied back: it is returned as an Nx tensor, which requires one GPU→CPU read.
Parameters
model - a loaded model with frame buffers (from load_model_from_frames/2)
frame - an ExCubecl.VideoFrame struct or raw binary frame data
Returns
{:ok, output_tensor} on success.
Example
{:ok, model} = Dala.ML.Gpu.load_model(:mobilenet_v2)
{:ok, model} = Dala.ML.Gpu.load_model_from_frames(model, calibration_frames)
# Process each frame in the video stream
for frame <- video_stream do
  {:ok, predictions} = Dala.ML.Gpu.predict_frame(model, frame)
  # Use predictions...
end
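The loop above pattern-matches on `{:ok, predictions}`, so a single failed frame raises a `MatchError` and stops the stream. Because `predict_frame/2` can also return `{:error, term()}`, a long-running pipeline may prefer to collect failures and continue. The following is a minimal sketch of that idea: `FrameLoopSketch` and the `predict_fun` argument are hypothetical, with `predict_fun` standing in for something like `&Dala.ML.Gpu.predict_frame(model, &1)`.

```elixir
defmodule FrameLoopSketch do
  # Runs `predict_fun` on every frame, separating successes from failures
  # instead of crashing on the first {:error, reason}.
  def run(frames, predict_fun) when is_function(predict_fun, 1) do
    {oks, errors} =
      Enum.reduce(frames, {[], []}, fn frame, {oks, errors} ->
        case predict_fun.(frame) do
          {:ok, output} -> {[output | oks], errors}
          {:error, reason} -> {oks, [{frame, reason} | errors]}
        end
      end)

    # Both lists were built by prepending, so restore the input order.
    %{outputs: Enum.reverse(oks), errors: Enum.reverse(errors)}
  end
end

# Stub predictor used only for this illustration.
stub = fn
  :bad -> {:error, :decode_failed}
  frame -> {:ok, {:scores, frame}}
end

FrameLoopSketch.run([:f1, :bad, :f2], stub)
```

Passing the predictor in as a function keeps the loop testable without a GPU. In real use the frames would be `ExCubecl.VideoFrame` structs or binaries, not atoms.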
@spec top_k(Nx.Tensor.t(), keyword()) :: [{number(), non_neg_integer()}]
Return the top-k predictions from a classification output.
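To make the return type concrete, here is a minimal sketch of top-k selection over a plain list of scores. It is not the library's implementation, which takes an Nx tensor. It assumes that each returned tuple is `{score, class_index}`, which is one reading of the `{number(), non_neg_integer()}` spec, and that `:k` defaults to 5, which this page does not state. `TopKSketch` is a hypothetical module name.

```elixir
defmodule TopKSketch do
  # Plain-list stand-in for a 1-D classification output.
  # Returns the k highest scores as {score, index} tuples, best first.
  def top_k(scores, opts \\ []) when is_list(scores) do
    # Default of 5 is an assumption for this sketch.
    k = Keyword.get(opts, :k, 5)

    scores
    |> Enum.with_index()
    |> Enum.sort_by(fn {score, _index} -> score end, :desc)
    |> Enum.take(k)
  end
end

[{0.7, 1}, {0.15, 3}] = TopKSketch.top_k([0.1, 0.7, 0.05, 0.15], k: 2)
```

The index in each tuple is the position of the score in the original output, which is what a caller would use to look up a class label.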