Overview
ExBurn compiles trained models for mobile deployment via Burn's CubeCL backend. The pipeline optimizes models for the target GPU backend:
- iOS: Metal via CubeCL
- Android: Vulkan via CubeCL
ExBurn is designed as a library: it provides the Nx backend and GPU acceleration layer that other frameworks can build on.
Compiling a Model
# Define a model with Axon
model =
  Axon.input("input", shape: {nil, 784})
  |> Axon.dense(128, activation: :relu)
  |> Axon.dropout(rate: 0.2)
  |> Axon.dense(10)
# Compile for training/inference
compiled = ExBurn.Model.compile(model,
  loss: :cross_entropy,
  optimizer: :adam,
  learning_rate: 0.001
)
# Run inference
{:ok, output} = ExBurn.Model.predict(compiled, input_tensor)
# Save for deployment
ExBurn.Model.save(compiled, "model.bin")
# Load
{:ok, loaded} = ExBurn.Model.load(compiled, "model.bin")
Using ExCubecl for GPU Inference
ExBurn integrates with ExCubecl for GPU buffer management and kernel execution:
# Create GPU buffers via ExCubecl
{:ok, input_buf} = ExCubecl.buffer([1.0, 2.0, 3.0], [3], :f32)
{:ok, output_buf} = ExCubecl.buffer([0.0, 0.0, 0.0], [3], :f32)
# Run a kernel
ExCubecl.run_kernel("elementwise_add", [input_buf, input_buf], output_buf)
# Read results back
{:ok, data} = ExCubecl.read(output_buf)
Using ExBurn.Serving for Batched Inference
For production inference with concurrent batching:
# Build a serving from a compiled model
serving = ExBurn.Serving.build(compiled,
  batch_size: 32,
  batch_timeout: 50
)
# Run batched inference
output = Nx.Serving.run(serving, input_tensor)
Model Optimization Tips
- Use f16 quantization: Halves parameter memory relative to f32, usually with minimal accuracy loss
- Reduce model size: Target under 10 MB for mobile apps
- Batch inference: Process multiple inputs together for better throughput
- Use ExCubecl pipelines: Chain multiple GPU kernels without CPU round-trips
- Profile on device: Benchmark on the target hardware before deploying
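To make the size and precision tips concrete, the following sketch estimates the raw parameter footprint of the 784 -> 128 -> 10 dense network from "Compiling a Model" at f32 and f16. The `ModelSizeEstimate` module is a hypothetical helper written for this illustration, not part of ExBurn. It assumes each dense layer has a bias, counts parameters only, ignores any overhead in the saved file format, and treats 10 MB as 10 * 1024 * 1024 bytes.

```elixir
defmodule ModelSizeEstimate do
  @moduledoc """
  Back-of-the-envelope size estimate for a stack of dense layers.
  Hypothetical helper for illustration only; not part of ExBurn.
  """

  # Bytes needed to store one parameter at each precision.
  @bytes_per_param %{f32: 4, f16: 2}

  # A dense layer {inputs, outputs} holds inputs * outputs weights
  # plus outputs biases (assumes every layer uses a bias).
  def param_count(layers) do
    Enum.reduce(layers, 0, fn {inputs, outputs}, acc ->
      acc + inputs * outputs + outputs
    end)
  end

  # Raw parameter storage in bytes, ignoring file-format overhead.
  def size_bytes(layers, precision) do
    param_count(layers) * Map.fetch!(@bytes_per_param, precision)
  end

  # True when the raw parameters fit in the budget (assumed default: 10 MB).
  def within_budget?(layers, precision, budget_bytes \\ 10 * 1024 * 1024) do
    size_bytes(layers, precision) <= budget_bytes
  end
end

# Dense layers of the example model; dropout adds no parameters.
layers = [{784, 128}, {128, 10}]

IO.inspect(ModelSizeEstimate.param_count(layers), label: "parameters")
IO.inspect(ModelSizeEstimate.size_bytes(layers, :f32), label: "f32 bytes")
IO.inspect(ModelSizeEstimate.size_bytes(layers, :f16), label: "f16 bytes")
IO.inspect(ModelSizeEstimate.within_budget?(layers, :f16), label: "fits 10 MB budget")
```

The example model has 101,770 parameters, so it needs about 407 KB at f32 and about 204 KB at f16, well inside the budget. The same arithmetic is more useful for larger models, where the choice of precision can decide whether the budget is met.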
Supported Operations
| Operation | iOS (Metal) | Android (Vulkan) |
|---|---|---|
| Dense | ✅ | ✅ |
| Conv2D | ✅ | ✅ |
| ReLU | ✅ | ✅ |
| Sigmoid | ✅ | ✅ |
| Softmax | ✅ | ✅ |
| Dropout | ✅ | ✅ |
| LayerNorm | ✅ | ✅ |
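One way to use the table above is as a pre-deployment check that lists any operation in a model that the table does not cover. The sketch below is a hypothetical helper, not part of ExBurn: the `SupportedOps` module and the atom names (`:dense`, `:layer_norm`, and so on) are labels chosen for this illustration, and the caller is assumed to supply the list of operations by hand. An operation missing from the table is not necessarily unsupported, so treat a non-empty result as a prompt to verify on the target device.

```elixir
defmodule SupportedOps do
  @moduledoc """
  Hypothetical pre-deployment check; not part of ExBurn.
  The set below mirrors the rows of the Supported Operations table.
  """

  @supported MapSet.new([
               :dense,
               :conv2d,
               :relu,
               :sigmoid,
               :softmax,
               :dropout,
               :layer_norm
             ])

  # Returns the operations in `ops` that are not listed in the table,
  # in first-seen order and without duplicates.
  def unsupported(ops) do
    ops
    |> Enum.reject(&MapSet.member?(@supported, &1))
    |> Enum.uniq()
  end
end

# Operations used by the example model: every one appears in the table.
IO.inspect(SupportedOps.unsupported([:dense, :relu, :dropout, :dense]))

# A model that uses operations the table does not list.
IO.inspect(SupportedOps.unsupported([:conv2d, :batch_norm, :relu, :gelu]))
```

The first call prints an empty list; the second prints the two operations that are absent from the table and would need checking before deployment.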