`Nx.Defn.Compiler` implementation that runs `defn` computations on
`Emily.Backend`.
The compiler walks `Nx.Defn.Expr` in Elixir and dispatches each node
through the active backend — exactly what `Nx.Defn.Evaluator` already
does — with two adjustments specific to Emily:
* `__to_backend__/1` returns `{Emily.Backend, [device: …]}` so
  `Nx.Defn.to_backend/1` (and the callers that consult it, including
  `Nx.Serving`) allocate inputs and outputs on Emily rather than the
  process-default backend.

* `__partitions_options__/1` always returns a single partition. MLX's
  Metal runtime was historically unsafe for concurrent kernel dispatch
  from multiple OS threads. `:max_concurrency` is accepted for API
  compatibility with `Nx.Serving` but capped at 1. For concurrent
  inference on a shared model use `Emily.Stream`.
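A minimal sketch of what these two callbacks could look like, assuming the documented return shapes; the exact option handling inside `Emily.Compiler` is an assumption, not the shipped implementation:

```elixir
defmodule Emily.Compiler.CallbacksSketch do
  # Hypothetical sketch — only the return shapes are documented above.

  # Route tensor allocation to Emily, honoring the :device option.
  def __to_backend__(opts) do
    {Emily.Backend, [device: Keyword.get(opts, :device, :gpu)]}
  end

  # A single partition, always: MLX kernel dispatch is not thread-safe,
  # so :max_concurrency is capped at 1 regardless of what was passed.
  def __partitions_options__(opts) do
    [Keyword.put(opts, :max_concurrency, 1)]
  end
end
```

Returning a one-element list from `__partitions_options__/1` is what tells `Nx.Serving` to run a single partition.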
## Public API
Users do not call this module directly. Install it as the default
compiler and `Nx.Serving` / Bumblebee pick it up:
```elixir
Nx.Defn.global_default_options(compiler: Emily.Compiler)
```

Or attach it per-call:

```elixir
Nx.Defn.jit(&my_fn/1, compiler: Emily.Compiler).(input)
```

The four callbacks on `Nx.Defn.Compiler` (`__jit__/5`,
`__compile__/4`, `__partitions_options__/1`, `__to_backend__/1`)
are invoked by `Nx.Defn` on your behalf.
## Design notes
`__jit__/5` and `__compile__/4` delegate to `Nx.Defn.Evaluator`
after option validation. There is no external JIT cache beyond the
closure `Nx.Defn.compile/3` already returns: Bumblebee and
`Nx.Serving` hold that closure on warmup, so subsequent calls skip
the walk.
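That caching behaviour can be relied on directly via `Nx.Defn.compile/3`; the function and template below are illustrative:

```elixir
# Walk the expression once, up front; the returned closure re-dispatches
# through the backend on each call without repeating the walk.
template = Nx.template({8, 128}, :f32)
compiled = Nx.Defn.compile(&Nx.sum/1, [template], compiler: Emily.Compiler)

# Subsequent calls reuse the closure — this is the only "cache".
compiled.(Nx.broadcast(1.0, {8, 128}))
```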
The compiler does not wrap `mlx::core::compile`. The bench harness
under `bench/native/` measured the fusion win at <1.2× on
transformer-shaped workloads — below the threshold that would justify
the integration cost.
## Options
* `:device` — `:gpu` (default) or `:cpu`. Forwarded to
  `Emily.Backend` via the `__to_backend__/1` callback.

* `:hooks`, `:debug_options`, `:garbage_collect` — passed through to
  `Nx.Defn.Evaluator` unchanged. See its moduledoc.

* `:max_concurrency` — accepted for `Nx.Serving` compatibility, but
  multi-partition serving is rejected because MLX kernel dispatch
  isn't thread-safe. Pass `1` (the default) to avoid the rejection.
  For concurrent inference see `Emily.Stream`.

* `:batch_keys`, `:cache` — accepted and ignored. `Nx.Serving`
  propagates `:batch_keys` to the compiler via `defn_options` for
  arity-1 serving builders (e.g.
  `Bumblebee.Audio.speech_to_text_whisper/5`), and Bumblebee passes
  `:cache` through for its own per-scope cache suffixing. Neither is
  used by the Evaluator walk, but rejecting them would break those
  servings.
## Examples
Process-global installation (typical for `Nx.Serving` / Bumblebee):

```elixir
Nx.global_default_backend(Emily.Backend)
Nx.Defn.global_default_options(compiler: Emily.Compiler)
```

Per-call:

```elixir
add_one = Nx.Defn.jit(fn x -> Nx.add(x, 1) end, compiler: Emily.Compiler)
add_one.(Nx.tensor([1.0, 2.0]))
#=> #Nx.Tensor<f32[2] [2.0, 3.0]> on Emily.Backend
```
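As a sketch of how a Bumblebee serving would pick the compiler up through `:defn_options` (the model, serving builder, and compile shapes here are illustrative choices, not requirements of Emily):

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "bert-base-uncased"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-uncased"})

# :defn_options carries the compiler into the serving's warmup compile.
serving =
  Bumblebee.Text.fill_mask(model_info, tokenizer,
    compile: [batch_size: 4, sequence_length: 64],
    defn_options: [compiler: Emily.Compiler]
  )

Nx.Serving.run(serving, "The capital of France is [MASK].")
```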