Emily.Compiler (emily v0.7.2)

Nx.Defn.Compiler implementation that runs defn computations on Emily.Backend.

The compiler walks Nx.Defn.Expr in Elixir and dispatches each node through the active backend — exactly what Nx.Defn.Evaluator already does — with two adjustments specific to Emily:

  • __to_backend__/1 returns {Emily.Backend, [device: …]} so Nx.Defn.to_backend/1 (and the callers that consult it, including Nx.Serving) allocate inputs and outputs on Emily rather than the process-default backend.
  • __partitions_options__/1 always returns a single partition. MLX's Metal runtime was historically unsafe for concurrent kernel dispatch from multiple OS threads. :max_concurrency is accepted for API compatibility with Nx.Serving but capped at 1. For concurrent inference on a shared model use Emily.Stream.

Public API

Users do not call this module directly. Install it as the default compiler, and Nx.Serving and Bumblebee pick it up:

Nx.Defn.global_default_options(compiler: Emily.Compiler)

Or attach it per-call:

Nx.Defn.jit(&my_fn/1, compiler: Emily.Compiler).(input)

The four callbacks on Nx.Defn.Compiler (__jit__/5, __compile__/4, __partitions_options__/1, __to_backend__/1) are invoked by Nx.Defn on your behalf.

Design notes

__jit__/5 and __compile__/4 delegate to Nx.Defn.Evaluator after filtering the option list down to the keys this module consumes. There is no external JIT cache beyond the closure Nx.Defn.compile/3 already returns: Bumblebee and Nx.Serving hold that closure on warmup, so subsequent calls skip the walk.
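
The filtering step can be pictured as a plain allowlist over the keyword list. What follows is a minimal sketch, not Emily's actual implementation: the module OptionFilterSketch and its two helpers are hypothetical, and the key lists are simply the option names documented on this page.

```elixir
defmodule OptionFilterSketch do
  # Hypothetical module for illustration only; not part of Emily.

  # Option names documented on this page; anything else is dropped.
  @known_keys [
    :device, :hooks, :debug_options, :garbage_collect, :max_concurrency,
    :batch_keys, :cache, :native, :native_fallback, :fuse
  ]

  # Keys the page says are passed through to Nx.Defn.Evaluator unchanged.
  @evaluator_keys [:hooks, :debug_options, :garbage_collect]

  # Keep only documented options, preserving their original order.
  def known(opts) when is_list(opts), do: Keyword.take(opts, @known_keys)

  # Narrow further to what the Evaluator would receive.
  def for_evaluator(opts) when is_list(opts), do: Keyword.take(opts, @evaluator_keys)
end

opts = [device: :cpu, garbage_collect: true, cache: "scope", made_up: 1]

known = OptionFilterSketch.known(opts)
evaluator_opts = OptionFilterSketch.for_evaluator(opts)

IO.inspect(known)
IO.inspect(evaluator_opts)
```

Here :made_up disappears without an error, and only :garbage_collect survives the narrower Evaluator allowlist.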

The compiler does not wrap mlx::core::compile by default. The single-NIF replay is the load-bearing win (it collapses the per-op BEAM↔worker round-trips); mx::compile is exposed as an opt-in compiled eval mode on the program resource, which fuses the elementwise runs the replay leaves separate. On a decode-shaped transformer block bench/program_compile.exs measures ~1.6× over the sync replay (kernel-launch + intermediate-memory overhead dominates at small sequence lengths, and fusion removes it), at the cost of last-few-ULP f32 reassociation and a shape-stability requirement — hence opt-in, not the default for the general compiler.

Options

  • :device - :gpu (default) or :cpu. Forwarded to Emily.Backend via the __to_backend__/1 callback.
  • :hooks, :debug_options, :garbage_collect — passed through to Nx.Defn.Evaluator unchanged. See its moduledoc.
  • :max_concurrency — accepted for Nx.Serving compatibility, but multi-partition serving is rejected because MLX kernel dispatch isn't thread-safe. Pass 1 (the default) to silence. For concurrent inference see Emily.Stream.
  • :batch_keys, :cache — accepted and ignored. Nx.Serving propagates :batch_keys to the compiler via defn_options for arity-1 serving builders (e.g. Bumblebee.Audio.speech_to_text_whisper/5), and Bumblebee passes :cache through for its own per-scope cache suffixing. Neither is used by the Evaluator walk, but rejecting them would break those servings.
  • :native - true (the default) compiles the traced Nx.Defn.Expr to a flat IR and replays the whole graph in a single NIF call per invocation; false runs the op-by-op Evaluator walk instead. The default is read from config :emily, :native (itself defaulting to true), so config :emily, native: false opts every defn out of the native lane application-wide, e.g. on a memory-constrained host where the one-shot compile peak is too large. The per-call option wins over the app env, so an explicit native: false overrides a global config :emily, native: true and vice versa. A non-boolean raises ArgumentError.
  • :native_fallback - :eval (default) or :raise. Controls what happens when native: true but the expression contains an op or construct the IR can't lower yet. :eval routes the whole defn through Nx.Defn.Evaluator (each op then dispatches through Emily.Backend, with its own per-op via_binary fallback) and fires a one-shot [:emily, :compiler, :fallback] event, so installing compiler: Emily.Compiler, native: true globally is safe on any model. :raise re-raises the lowering error instead; use it in CI to prove a model lowers fully native. The per-call option wins over config :emily, :native_fallback, :eval | :raise.
  • :fuse - true evaluates the compiled program in the mx::compile'd mode instead of the plain replay. For a while-free forward pass this fuses the elementwise runs the replay leaves separate (the CM6 win); for a while loop in a Bumblebee.Text.generation defn it keeps the decode loop host-controlled but fuses each loop body under mx::compile, replaying the cached fused callable every token. Defaults to false; a non-boolean raises ArgumentError. Opt-in because the fusion reassociates f32 to within a few ULP, so logits are not bit-identical to the evaluator. Greedy argmax is robust to that drift (greedy token ids matched the evaluator in our tests), but the match is empirical, not guaranteed: any discrete decision the drift can tip (argmax on a near-tie, or a while trip count whose condition reads a reassociated reduction) diverges once it flips. Sampling strategies diverge from the evaluator under fusion even with a fixed seed. Only the native path consults :fuse, so it is ignored unless native: true.
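
The precedence described for :native (per-call option first, then the app env, then true) together with the boolean check can be sketched in a few lines. This is an illustration under stated assumptions, not Emily's code: NativeOptionSketch and resolve_native/1 are hypothetical names, and the error message wording is made up.

```elixir
defmodule NativeOptionSketch do
  # Hypothetical helper mirroring the documented precedence; not part of Emily.
  def resolve_native(opts) when is_list(opts) do
    # App env supplies the default, itself defaulting to true.
    default = Application.get_env(:emily, :native, true)

    # A per-call :native, when present, wins over the app env.
    case Keyword.get(opts, :native, default) do
      value when is_boolean(value) ->
        value

      other ->
        raise ArgumentError, "expected :native to be a boolean, got: #{inspect(other)}"
    end
  end
end

# Simulates `config :emily, native: false` for this script only.
Application.put_env(:emily, :native, false)

app_wide = NativeOptionSketch.resolve_native([])
per_call = NativeOptionSketch.resolve_native(native: true)

IO.inspect({app_wide, per_call})
```

With the app env set to false, the call without options resolves to false while the explicit native: true still resolves to true; a value such as native: :yes raises ArgumentError.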

Any other option is silently dropped. This matches how Nx.Defn.Evaluator and EXLA handle their own option lists, and is the contract higher-level libraries rely on when they forward caller-supplied options to the JIT compiler — e.g. Axon.build/2, whose docs state that "all other options are forwarded to the underlying JIT compiler".

Examples

Process-global installation (typical for Nx.Serving / Bumblebee):

Nx.global_default_backend(Emily.Backend)
Nx.Defn.global_default_options(compiler: Emily.Compiler)

Per-call:

add_one = Nx.Defn.jit(fn x -> Nx.add(x, 1) end, compiler: Emily.Compiler)
add_one.(Nx.tensor([1.0, 2.0]))
# => #Nx.Tensor<f32[2] [2.0, 3.0]> on Emily.Backend
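
Strict native lowering (for CI):

The Options section suggests :raise in CI to prove a model lowers fully native. One way to set that application-wide is through the app env; the fragment below is a sketch that assumes a standard Mix project, and the choice of config/test.exs is an assumption rather than something this page prescribes.

```elixir
# config/test.exs (file choice is an assumption)
import Config

# Raise on any defn that cannot be lowered to the native IR,
# instead of falling back to the Nx.Defn.Evaluator walk.
config :emily,
  native: true,
  native_fallback: :raise
```

A per-call native_fallback: :eval still takes precedence over this setting for individual defns.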