Emily.Compiler (emily v0.3.2)


Nx.Defn.Compiler implementation that runs defn computations on Emily.Backend.

The compiler walks Nx.Defn.Expr in Elixir and dispatches each node through the active backend — exactly what Nx.Defn.Evaluator already does — with two adjustments specific to Emily:

  • __to_backend__/1 returns {Emily.Backend, [device: …]} so Nx.Defn.to_backend/1 (and the callers that consult it, including Nx.Serving) allocate inputs and outputs on Emily rather than the process-default backend.
  • __partitions_options__/1 always returns a single partition. MLX's Metal runtime was historically unsafe for concurrent kernel dispatch from multiple OS threads. :max_concurrency is accepted for API compatibility with Nx.Serving but capped at 1. For concurrent inference on a shared model use Emily.Stream.
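The two Emily-specific callbacks above can be sketched as follows. This is illustrative only: the option handling and defaults are assumptions based on this doc, not the actual implementation.

```elixir
defmodule Emily.Compiler.Sketch do
  @behaviour Nx.Defn.Compiler

  # Route input/output allocation to Emily rather than the
  # process-default backend (as described above).
  @impl true
  def __to_backend__(opts) do
    {Emily.Backend, [device: opts[:device] || :gpu]}
  end

  # MLX kernel dispatch is not safe across OS threads, so return
  # exactly one partition regardless of :max_concurrency.
  @impl true
  def __partitions_options__(opts) do
    [Keyword.put(opts, :max_concurrency, 1)]
  end
end
```

Returning a single-element list from __partitions_options__/1 is what tells Nx.Serving to run one partition; EXLA, by contrast, returns one keyword list per device.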

Public API

Users do not call this module directly. Install it as the default compiler and Nx.Serving / Bumblebee will pick it up:


Nx.Defn.global_default_options(compiler: Emily.Compiler)

Or attach it per-call:

Nx.Defn.jit(&my_fn/1, compiler: Emily.Compiler).(input)

The four callbacks on Nx.Defn.Compiler (__jit__/5, __compile__/4, __partitions_options__/1, __to_backend__/1) are invoked by Nx.Defn on your behalf.

Design notes

__jit__/5 and __compile__/4 delegate to Nx.Defn.Evaluator after option validation. There is no external JIT cache beyond the closure Nx.Defn.compile/3 already returns: Bumblebee and Nx.Serving hold that closure on warmup, so subsequent calls skip the walk.
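A hedged sketch of that delegation (the validate_options!/1 helper is hypothetical; argument names follow the Nx.Defn.Compiler callback shape):

```elixir
# Illustrative only: validate Emily-specific options, then hand the
# expression walk to the stock evaluator.
@impl true
def __jit__(key, vars, fun, args_list, opts) do
  opts = validate_options!(opts)  # hypothetical helper
  Nx.Defn.Evaluator.__jit__(key, vars, fun, args_list, opts)
end
```

Because the evaluator does the walk, any caching behavior is exactly whatever Nx.Defn.compile/3 already provides.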

The compiler does not wrap mlx::core::compile. The bench harness under bench/native/ measured the fusion win at <1.2× on transformer-shaped workloads — below the threshold that would justify the integration cost.

Options

  • :device — :gpu (default) or :cpu. Forwarded to Emily.Backend via the __to_backend__/1 callback.
  • :hooks, :debug_options, :garbage_collect — passed through to Nx.Defn.Evaluator unchanged. See its moduledoc.
  • :max_concurrency — accepted for Nx.Serving compatibility, but capped at 1 because MLX kernel dispatch isn't thread-safe; multi-partition serving is not supported. For concurrent inference see Emily.Stream.
  • :batch_keys, :cache — accepted and ignored. Nx.Serving propagates :batch_keys to the compiler via defn_options for arity-1 serving builders (e.g. Bumblebee.Audio.speech_to_text_whisper/5), and Bumblebee passes :cache through for its own per-scope cache suffixing. Neither is used by the Evaluator walk, but rejecting them would break those servings.
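For instance, forcing CPU execution for a single call (a usage sketch; only the :device option documented above is assumed):

```elixir
# Compiler options ride along in the defn options keyword list.
cpu_sum = Nx.Defn.jit(&Nx.sum/1, compiler: Emily.Compiler, device: :cpu)
cpu_sum.(Nx.iota({4}))
```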

Examples

Process-global installation (typical for Nx.Serving / Bumblebee):

Nx.global_default_backend(Emily.Backend)
Nx.Defn.global_default_options(compiler: Emily.Compiler)

Per-call:

add_one = Nx.Defn.jit(fn x -> Nx.add(x, 1) end, compiler: Emily.Compiler)
add_one.(Nx.tensor([1.0, 2.0]))
# => #Nx.Tensor<f32[2] [2.0, 3.0]> on Emily.Backend
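
Passing the compiler to a serving works the same way; the serving function below is an illustrative stand-in, following the standard Nx.Serving.new/2 shape:

```elixir
serving =
  Nx.Serving.new(
    fn defn_opts -> Nx.Defn.jit(&Nx.multiply(&1, 2), defn_opts) end,
    compiler: Emily.Compiler
  )

batch = Nx.Batch.stack([Nx.tensor([1, 2, 3])])
Nx.Serving.run(serving, batch)
```

Because __partitions_options__/1 always yields a single partition, this serving dispatches kernels from one OS thread regardless of any :max_concurrency setting.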