Emily.Compiler (emily v0.3.3)


Nx.Defn.Compiler implementation that runs defn computations on Emily.Backend.

The compiler walks Nx.Defn.Expr in Elixir and dispatches each node through the active backend — exactly what Nx.Defn.Evaluator already does — with two adjustments specific to Emily:

  • __to_backend__/1 returns {Emily.Backend, [device: …]} so that Nx.Defn.to_backend/1 (and the callers that consult it, including Nx.Serving) allocates inputs and outputs on Emily rather than on the process-default backend.
  • __partitions_options__/1 always returns a single partition. MLX's Metal runtime was historically unsafe for concurrent kernel dispatch from multiple OS threads. :max_concurrency is accepted for API compatibility with Nx.Serving but capped at 1. For concurrent inference on a shared model use Emily.Stream.
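The two adjustments above can be sketched in a few lines. This is an illustrative stand-in, not the module's actual source: the real Emily.Compiler declares `@behaviour Nx.Defn.Compiler` and also implements __jit__/5 and __compile__/4.

```elixir
defmodule EmilyCompilerSketch do
  # Simplified sketch of the two Emily-specific callbacks described above.

  # Route tensor allocation to Emily.Backend, honouring the :device option.
  def __to_backend__(opts) do
    {Emily.Backend, [device: Keyword.get(opts, :device, :gpu)]}
  end

  # Always a single partition: MLX kernel dispatch is not thread-safe,
  # so any requested :max_concurrency is capped at 1.
  def __partitions_options__(opts) do
    [Keyword.put(opts, :max_concurrency, 1)]
  end
end
```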

Public API

Users do not call this module directly. Install it as the default compiler and Nx.Serving / Bumblebee will pick it up:

Nx.Defn.global_default_options(compiler: Emily.Compiler)

Or attach it per-call:

Nx.Defn.jit(&my_fn/1, compiler: Emily.Compiler).(input)

The four callbacks on Nx.Defn.Compiler (__jit__/5, __compile__/4, __partitions_options__/1, __to_backend__/1) are invoked by Nx.Defn on your behalf.

Design notes

__jit__/5 and __compile__/4 delegate to Nx.Defn.Evaluator after filtering the option list down to the keys this module consumes. There is no external JIT cache beyond the closure Nx.Defn.compile/3 already returns: Bumblebee and Nx.Serving hold that closure on warmup, so subsequent calls skip the walk.
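The option-filtering step amounts to a Keyword.take/2 over an allow-list before delegating. The key list below is illustrative, not the module's actual list:

```elixir
# Keys this sketch assumes the Evaluator consumes (illustrative only).
evaluator_keys = [:hooks, :debug_options, :garbage_collect]

# A caller-supplied option list, as Nx.Serving might forward it:
opts = [compiler: Emily.Compiler, garbage_collect: true, batch_keys: [:default]]

# Everything outside the allow-list is dropped before delegation.
Keyword.take(opts, evaluator_keys)
#=> [garbage_collect: true]
```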

The compiler does not wrap mlx::core::compile. The bench harness under bench/native/ measured the fusion win at <1.2× on transformer-shaped workloads — below the threshold that justified the integration cost.

Options

  • :device — :gpu (default) or :cpu. Forwarded to Emily.Backend via the __to_backend__/1 callback.
  • :hooks, :debug_options, :garbage_collect — passed through to Nx.Defn.Evaluator unchanged. See its moduledoc.
  • :max_concurrency — accepted for Nx.Serving compatibility, but multi-partition serving is rejected because MLX kernel dispatch isn't thread-safe; pass 1 (the default) to avoid it. For concurrent inference see Emily.Stream.
  • :batch_keys, :cache — accepted and ignored. Nx.Serving propagates :batch_keys to the compiler via defn_options for arity-1 serving builders (e.g. Bumblebee.Audio.speech_to_text_whisper/5), and Bumblebee passes :cache through for its own per-scope cache suffixing. Neither is used by the Evaluator walk, but rejecting them would break those servings.

Any other option is silently dropped. This matches how Nx.Defn.Evaluator and EXLA handle their own option lists, and is the contract higher-level libraries rely on when they forward caller-supplied options to the JIT compiler — e.g. Axon.build/2, whose docs state that "all other options are forwarded to the underlying JIT compiler".
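Putting the options together, a per-call invocation might look like this (a hedged sketch, assuming an Apple-silicon host with Emily installed; the exact option values are illustrative):

```elixir
fun =
  Nx.Defn.jit(&Nx.sum/1,
    compiler: Emily.Compiler,
    # run on the CPU stream instead of the default GPU
    device: :cpu,
    # forwarded unchanged to Nx.Defn.Evaluator
    garbage_collect: true
  )

fun.(Nx.tensor([1.0, 2.0, 3.0]))
```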

Examples

Process-global installation (typical for Nx.Serving / Bumblebee):

Nx.global_default_backend(Emily.Backend)
Nx.Defn.global_default_options(compiler: Emily.Compiler)

Per-call:

add_one = Nx.Defn.jit(fn x -> Nx.add(x, 1) end, compiler: Emily.Compiler)
add_one.(Nx.tensor([1.0, 2.0]))
# => #Nx.Tensor<f32[2] [2.0, 3.0]> on Emily.Backend
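For serving, the compiler slots in via the defn_options argument of Nx.Serving.new/2. A minimal sketch, modeled on the example in the Nx.Serving docs (the doubling function is illustrative):

```elixir
# The arity-1 builder receives the defn_options (including :compiler)
# and returns the jitted function Nx.Serving will invoke per batch.
serving =
  Nx.Serving.new(
    fn opts -> Nx.Defn.jit(&Nx.multiply(&1, 2), opts) end,
    compiler: Emily.Compiler
  )

batch = Nx.Batch.concatenate([Nx.tensor([1, 2, 3])])
Nx.Serving.run(serving, batch)
```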