An `Nx.Defn.Compiler` implementation that runs `defn` computations on
`Emily.Backend`.
The compiler walks `Nx.Defn.Expr` in Elixir and dispatches each node
through the active backend, exactly what `Nx.Defn.Evaluator` already
does, with two adjustments specific to Emily:

  * `__to_backend__/1` returns `{Emily.Backend, [device: …]}`, so
    `Nx.Defn.to_backend/1` (and the callers that consult it, including
    `Nx.Serving`) allocates inputs and outputs on Emily rather than on
    the process-default backend.

  * `__partitions_options__/1` always returns a single partition. MLX's
    Metal runtime was historically unsafe for concurrent kernel dispatch
    from multiple OS threads. `:max_concurrency` is accepted for API
    compatibility with `Nx.Serving` but capped at 1. For concurrent
    inference on a shared model, use `Emily.Stream`.
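The two adjustments above can be sketched as follows. This is a hedged
illustration, not Emily's actual source: the module name and callback
bodies are hypothetical; only the callback names come from the
`Nx.Defn.Compiler` behaviour.

```elixir
defmodule EmilyCompilerSketch do
  # Illustrative sketch only; Emily.Compiler's real bodies may differ.

  # Route allocations to Emily, honoring :device (:gpu by default).
  def __to_backend__(opts) do
    {Emily.Backend, [device: Keyword.get(opts, :device, :gpu)]}
  end

  # Expose exactly one partition: MLX kernel dispatch is not safe from
  # multiple OS threads, so any requested :max_concurrency is capped at 1.
  def __partitions_options__(opts) do
    [Keyword.put(opts, :max_concurrency, 1)]
  end
end
```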
## Public API

Users do not call this module directly. Install it as the default
compiler and `Nx.Serving` / Bumblebee pick it up:

    Nx.Defn.global_default_options(compiler: Emily.Compiler)

Or attach it per call:

    Nx.Defn.jit(&my_fn/1, compiler: Emily.Compiler).(input)

The four callbacks on `Nx.Defn.Compiler` (`__jit__/5`, `__compile__/4`,
`__partitions_options__/1`, and `__to_backend__/1`) are invoked by
`Nx.Defn` on your behalf.
## Design notes

`__jit__/5` and `__compile__/4` delegate to `Nx.Defn.Evaluator`
after filtering the option list down to the keys this module
consumes. There is no external JIT cache beyond the closure
`Nx.Defn.compile/3` already returns: Bumblebee and `Nx.Serving` hold
that closure on warmup, so subsequent calls skip the walk.
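A minimal sketch of that delegation, under stated assumptions: the
filtered key list mirrors the Evaluator-facing options documented below,
and the module name is hypothetical; the real module may filter
differently.

```elixir
defmodule EmilyDelegationSketch do
  # Illustrative sketch: forward the defn walk to Nx.Defn.Evaluator,
  # keeping only the option keys that module understands.
  @evaluator_opts [:hooks, :debug_options, :garbage_collect]

  def __jit__(key, vars, fun, args, opts) do
    Nx.Defn.Evaluator.__jit__(key, vars, fun, args, Keyword.take(opts, @evaluator_opts))
  end

  def __compile__(key, vars, fun, opts) do
    Nx.Defn.Evaluator.__compile__(key, vars, fun, Keyword.take(opts, @evaluator_opts))
  end
end
```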
The compiler does not wrap `mlx::core::compile`. The bench harness
under `bench/native/` measured the fusion win at <1.2× on
transformer-shaped workloads, below the threshold that would justify
the integration cost.
## Options

  * `:device` - `:gpu` (default) or `:cpu`. Forwarded to `Emily.Backend`
    via the `__to_backend__/1` callback.

  * `:hooks`, `:debug_options`, `:garbage_collect` - passed through to
    `Nx.Defn.Evaluator` unchanged. See its moduledoc.

  * `:max_concurrency` - accepted for `Nx.Serving` compatibility, but
    multi-partition serving is rejected because MLX kernel dispatch
    isn't thread-safe. Pass `1` (the default) to avoid the rejection.
    For concurrent inference, see `Emily.Stream`.

  * `:batch_keys`, `:cache` - accepted and ignored. `Nx.Serving`
    propagates `:batch_keys` to the compiler via `defn_options` for
    arity-1 serving builders (e.g.
    `Bumblebee.Audio.speech_to_text_whisper/5`), and Bumblebee passes
    `:cache` through for its own per-scope cache suffixing. Neither is
    used by the Evaluator walk, but rejecting them would break those
    servings.
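For instance, the options above can be mixed in a single call: `:device`
stays with this compiler while the Evaluator keys pass through. A hedged
sketch, reusing the hypothetical `my_fn/1` and `input` from the per-call
example:

```elixir
Nx.Defn.jit(&my_fn/1,
  compiler: Emily.Compiler,
  # consumed here and forwarded to Emily.Backend via __to_backend__/1
  device: :cpu,
  # passed through to Nx.Defn.Evaluator unchanged
  garbage_collect: true
).(input)
```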
Any other option is silently dropped. This matches how
`Nx.Defn.Evaluator` and EXLA handle their own option lists, and it is
the contract higher-level libraries rely on when they forward
caller-supplied options to the JIT compiler, e.g. `Axon.build/2`,
whose docs state that "all other options are forwarded to the
underlying JIT compiler".
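Under that contract, forwarding through `Axon.build/2` might look like
the following sketch (the `model` variable is assumed; `:device` rides
along to this compiler):

```elixir
{init_fn, predict_fn} =
  Axon.build(model, compiler: Emily.Compiler, device: :gpu)
```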
## Examples

Process-global installation (typical for `Nx.Serving` / Bumblebee):

    Nx.global_default_backend(Emily.Backend)
    Nx.Defn.global_default_options(compiler: Emily.Compiler)

Per call:

    add_one = Nx.Defn.jit(fn x -> Nx.add(x, 1) end, compiler: Emily.Compiler)
    add_one.(Nx.tensor([1.0, 2.0]))
    #=> #Nx.Tensor<f32[2] [2.0, 3.0]> on Emily.Backend