All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.5.1 - 2026-05-23
Fixed
- CHANGELOG.md: corrected the 0.5.0 entry. The published release carried two ### Changed headings and listed three new-functionality items (mix emily.doctor, config :emily, fallback:, and the Emily.Memory public allocator API) under Changed rather than Added. Merged the duplicate Changed sections, moved the new-functionality items to Added, and put items into reverse chronological order. No code change.
0.5.0 - 2026-05-23
Added
- Emily.Quantization.dequantize_defn/1 now supports the nvfp4 microscaled mode in addition to affine, mxfp4, and mxfp8, so the full MLX QuantizationMode enum now runs through the defn-native dequant path. nvfp4 reuses the FP4-E2M1 lane LUT from mxfp4 and the FP8-E4M3 LUT from mxfp8 (consumed against the per-group scale bytes rather than lane codes; the NVIDIA microscaled convention uses finer-grained group_size=16 with FP8-E4M3 scales instead of mxfp4/mxfp8's group_size=32 with FP8-E8M0 scales). Output dtype is bf16 to match QuantizedWeight.to_dense/1, and the round-trip is bit-identical (max abs diff = 0.0). Emily.Quantization.Transform accepts mode: "nvfp4".
- Emily.Quantization.dequantize_defn/1 now supports the mxfp8 microscaled mode in addition to affine and mxfp4. Each 8-bit lane code decodes through a 256-entry FP8-E4M3 lookup table precomputed via MLX's FromFP8 bit-trick (strip sign, shift the low 7 bits left by 7 to align the E4M3 exponent into f16's exponent field, multiply by 256 for the bias difference, restore sign). Per-group scales reuse the FP8-E8M0 decode from the mxfp4 path. Output dtype is bf16 to match QuantizedWeight.to_dense/1, and the round-trip is bit-identical (max abs diff = 0.0) on realistic data. Emily.Quantization.Transform accepts mode: "mxfp8"; only nvfp4 (which uses an FP8-E4M3 per-group scale instead of FP8-E8M0) remains defn-unsupported.
- Emily.Quantization.dequantize_defn/1 now supports the mxfp4 microscaled mode in addition to affine. Each 4-bit lane code decodes through MLX's FP4-E2M1 lookup table (+0.0, +0.5, +1.0, +1.5, +2.0, +3.0, +4.0, +6.0 and their negatives); each u8 scale byte decodes through 2^(s - 127) (FP8-E8M0). Output dtype is bf16 to match QuantizedWeight.to_dense/1, and the round-trip is bit-identical (max abs diff = 0.0) on realistic scale bytes because every FP4 LUT entry and every E8M0 power-of-two is exact in bf16. Emily.Quantization.Transform gains a :mode option (default "affine", accepts "mxfp4"); mxfp8 and nvfp4 are still defn-unsupported and route through the Native NIF.
- Emily.Quantization.dequantize_defn/1 now supports int3 and int6 weights in addition to int2/int4/int8. The new path reads each lane's two adjacent u32 words as a u64, shifts by the in-word bit offset, and masks, handling the cross-u32 packing MLX uses for bit widths that don't divide 32 cleanly. defn_supported_bits/0 now returns [2, 3, 4, 6, 8]; quantized Axon graphs rewritten via Emily.Quantization.Transform (and Emily.Quantization.Layers.quantized_dense/4) pick the expanded set up automatically. Previously the defn path rejected bits ∈ {3, 6} and callers had to fall back to QuantizedWeight.to_dense/1 (the Native NIF).
- ARCHITECTURE.md: current shape of the library extracted from PLAN.md. Covers the four-layer dispatch model, the worker-thread-per-process-stream concurrency model, the public Emily.Memory allocator API, the telemetry event catalogue, the :debug_bounds_check / :debug_detect_nan_inf compile-time flags, build/packaging notes, the per-layer testing oracle table, and the active risk register. Linked from the README under a new Documentation section and grouped under "Project" in the HexDocs sidebar.
- ROADMAP.md: active and future work, separated from the historical milestone log. Lists deferred-to-post-1.0 items (typed exceptions, GPU interop pointers, source-build doctor probes) and the open in-roadmap MLX capability gaps (sparse / MoE matmuls, FP8 dtype, ThreadLocalStream).
- mix emily.doctor: diagnostic Mix task that verifies the local Emily runtime installation. Checks the host platform (OS, arch, macOS version against the active variant's minimum), the active MLX variant, priv/libemily.so and priv/mlx.metallib, NIF loadability, and a tiny Emily.Backend smoke test that asserts the result didn't silently fall back to Nx.BinaryBackend. Checks short-circuit: when a prerequisite fails, dependent checks report [skip] rather than producing cascading noise. Supports --variant aot|jit for "would this host satisfy :jit?" probes and --help for usage.
- config :emily, fallback: :silent | :warn | :raise (strict fallback modes for development and CI). :silent (the default) preserves today's behaviour; :warn emits the one-shot Logger.warning per {op, input_shapes} pair previously gated by :warn_on_fallback; :raise raises RuntimeError with op, shapes, and dtypes on entry, letting CI fail the build when a hot path unexpectedly routes through Nx.BinaryBackend. An invalid :fallback value raises ArgumentError on the first fallback so typos surface immediately.
- Emily.Memory: public allocator API for long-running serving and training workloads that need to observe and manage MLX memory without reaching into Emily.Native. Exposes stats/0 (active, peak, and cached bytes, also emitting [:emily, :memory, :stats]), reset_peak/0, and clear_cache/0. Documented under the README's Observability section and grouped with Emily.Telemetry in the ExDoc sidebar.
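The mxfp4 decode and the int3/int6 unpacking described in the dequantize_defn/1 entries above can be illustrated outside Elixir. The following is a minimal Python sketch, not the library's defn code: the function names are hypothetical, the FP4 sign is assumed to sit in bit 3 of the lane code, and lanes are assumed to be packed as a contiguous little-endian bit stream across u32 words.

```python
# FP4-E2M1 magnitudes as listed in the mxfp4 entry; bit 3 of the code is
# assumed to be the sign bit.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_fp4(code: int) -> float:
    """Decode one 4-bit lane code through the FP4-E2M1 lookup table."""
    magnitude = FP4_E2M1[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_e8m0(scale_byte: int) -> float:
    """Decode one u8 scale byte as 2^(s - 127) (FP8-E8M0)."""
    return 2.0 ** (scale_byte - 127)


def dequantize_mxfp4_group(codes, scale_byte):
    """Dequantize one group of lane codes that share a single scale byte."""
    scale = decode_e8m0(scale_byte)
    return [decode_fp4(code) * scale for code in codes]


def unpack_lane(words, lane, bits):
    """Extract lane `lane` of width `bits` from a list of u32 words.

    Reads the word holding the lane's first bit together with its
    neighbour as one 64-bit value, shifts by the in-word bit offset, and
    masks, so lanes that straddle a u32 boundary come out intact.
    """
    word, offset = divmod(lane * bits, 32)
    low = words[word]
    high = words[word + 1] if word + 1 < len(words) else 0
    combined = (high << 32) | low
    return (combined >> offset) & ((1 << bits) - 1)


def pack_lanes(values, bits):
    """Inverse of unpack_lane, used here only to build example input."""
    stream = 0
    for index, value in enumerate(values):
        stream |= (value & ((1 << bits) - 1)) << (index * bits)
    n_words = (len(values) * bits + 31) // 32
    return [(stream >> (32 * w)) & 0xFFFFFFFF for w in range(n_words)]
```

With 3-bit lanes, lane 10 starts at bit 30 and spills into the next word, which is the case the two-word read exists for.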
Changed
- PLAN.md slimmed to its milestone-history role. The current-shape sections (architecture diagram, core design decisions, testing philosophy, risks-and-mitigations) moved to ARCHITECTURE.md; goals, non-goals, and deferred-milestone summaries moved to ROADMAP.md. The M0–M27 milestone narratives, the ratified project decisions, and the 2026-04-22 MLX capability audit stay in PLAN.md as the historical record. The stale "narrow with_stream/2 + new/1 + synchronize/1 surface" reference (no synchronize/1 ever shipped) and the planned set_default_stream/1 primary deliverable (removed during the post-M14 fixes) drop out with the prologue rewrite.
- Emily.Native now annotates NIF errors with operation, input shape/dtype, options, and worker context. ArgumentError and RuntimeError raised from async ops get an Emily.Native context: op=… inputs=[…] options=[…] stream=… suffix, so common failures (shape mismatches in matmul, divisibility errors in quantize, mask shape bugs in fast_scaled_dot_product_attention, etc.) are diagnosable from the message alone. The error-formatting path is total: bad context maps degrade to ? markers rather than masking the underlying NIF error.
- The legacy config :emily, :warn_on_fallback, true boolean is soft-deprecated in favour of :fallback. It is still honoured when :fallback is unset (true → :warn); when both are set, :fallback wins.
- Emily.Telemetry.memory_stats/0 now delegates to Emily.Memory.stats/0. Behaviour is unchanged (same event, measurements, and return shape), but new code should prefer the Emily.Memory entry point.
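The precedence between the legacy :warn_on_fallback boolean and the :fallback key can be written as a small decision function. This is a language-neutral sketch in Python rather than Emily's Elixir code: resolve_fallback_mode and the dict-shaped config are hypothetical, and ValueError stands in for the ArgumentError the library raises.

```python
VALID_MODES = ("silent", "warn", "raise")


def resolve_fallback_mode(config: dict) -> str:
    """Return the effective fallback mode for an application config."""
    if "fallback" in config:
        # An explicit :fallback always wins over the legacy boolean.
        mode = config["fallback"]
        if mode not in VALID_MODES:
            raise ValueError(f"invalid :fallback value: {mode!r}")
        return mode
    if config.get("warn_on_fallback") is True:
        # Legacy key is honoured only when :fallback is unset.
        return "warn"
    return "silent"
```

For example, a config that sets both warn_on_fallback to true and fallback to "raise" resolves to "raise".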
0.4.0 - 2026-05-17
Changed
- Upgraded to Nx 0.12 / Bumblebee 0.7 / Axon 0.8. Nx 0.12 replaces the optional-callback list (lu, svd, qr, cholesky, eigh, solve, take, take_along_axis, fft2, ifft2, cumulative_*, logical_not, all_close) with a single generic Nx.Backend.block/4 dispatch keyed on Nx.Block.* structs. Emily.Backend now routes every previously-native op through block/4, preserving the MLX fast paths without losing the BinaryBackend fallback when an unknown block arrives. Existing Emily.Backend consumers see no behavioural change.
- Migrated Emily.Fast.* from the now-removed Nx.Defn.Expr.optional/3 extension point to Nx.block/4. Each fused kernel (rms_norm, layer_norm, rope, rope_with_freqs, scaled_dot_product_attention with and without mask/sinks) now emits an Emily.Fast.Block.* struct that Emily.Backend.block/4 pattern-matches to the matching mx::fast::* NIF. The composed-defn fallbacks under non-Emily backends are unchanged.
- Bumblebee 0.7 ships Qwen3 first-class, so notebooks/qwen3_quantized.livemd no longer needs the main-ref Bumblebee pin from the 0.6.3 era.
Added
- Nx.rfft/2 and Nx.irfft/2 support. The underlying Native.rfftn / Native.irfftn NIFs were already in place from earlier MLX work; Nx 0.12 surfaces these as backend-block ops so Emily wires them up at no MLX-side cost.
- Smoke tests for three new Bumblebee 0.7 model families on Emily.Backend: NomicBERT (:nomic_embeddings), SmolLM3 (:smollm3), and ModernBERT (:modernbert). All three drive a tiny synthetic spec end-to-end through Axon.predict so they remain offline-friendly; tagged :conformance.
- Runnable Livebooks for each of the three new Bumblebee 0.7 families: notebooks/nomic_embeddings.livemd (NomicBERT embeddings with cosine similarity), notebooks/smollm3_chat.livemd (SmolLM3-3B chat completion with a <think> toggle for hybrid reasoning), and notebooks/modernbert_classification.livemd (ModernBERT NLI fine-tune). All three are published under the HexDocs Notebooks group.
- A [:emily, :block, :fallback] telemetry event fires whenever Emily.Backend.block/4 falls through to the supplied default fun. Surfaces ops we used to handle natively but now land on the composed-defn path, which is useful in soak runs to spot silent regressions after a Bumblebee bump.
Fixed
- mix docs no longer emits autolinker warnings for the Emily.Backend.block/4 and Nx.Defn.Expr.optional/3 references in the Emily.Fast and Emily.Fast.Block moduledocs. The references resolved to @doc false callees (the backend callback is hidden by Nx.Backend, and optional/3 was removed in Nx 0.12); the prose stays, the Mod.fun/arity shape is broken up so the autolinker no longer follows it. Same pattern as the earlier fix in ee32c7c.
Removed
- {:f8_e4m3fn, 8} (introduced in Nx 0.11) is rejected at the backend boundary with the same "no MLX primitive" ArgumentError pattern as {:f, 64}. MLX has no float-8 dtype; cast to :f16 or :bf16.
0.3.5 - 2026-05-03
0.3.4 - 2026-05-03
Fixed
- Nx.LinAlg.svd(tensor, full_matrices?: false) on rank-2 inputs no longer routes through MLX's full-matrices SVD and post-slices. MLX's SVD has no thin switch, so the old path materialised the full m × m U on device and instantly OOM'd Metal for tall matrices like the Qwen3-0.6B embedder kernel (151936 × 1024 → ~92 GB U). The thin case now computes G = MᵀM → eigh → S, V; U = MV / S (or the symmetric MMᵀ route for wide matrices), keeping the decomposition at min(m, n)². See the Emily.Backend moduledoc Divergences section for the numerical caveat (the Gram step squares M's condition number). Refs #84.
- mix docs runs cleanly. The MNIST notebook referenced Axon.Loop's trainer/2 (no such arity); three other inline references resolved to @doc false callees in upstream libraries (Nx.Defn.Expr's optional/3, Bumblebee's rms_norm/2) and triggered autolinker warnings on every doc build. The notebook now uses the correct trainer/3 arity, and the prose references have been reshaped so the autolinker no longer follows them, keeping the build warning-free for future --warnings-as-errors enforcement. Refs #83.
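The memory figures behind the SVD fix can be checked with a few lines of arithmetic. The sketch below is in Python and assumes 4-byte f32 elements, which the entry does not state but which reproduces the ~92 GB figure for the full U.

```python
BYTES_PER_F32 = 4  # assumption: the matrices hold f32 elements


def matrix_bytes(rows, cols, itemsize=BYTES_PER_F32):
    """Size in bytes of a dense rows x cols matrix."""
    return rows * cols * itemsize


# Shape of the Qwen3-0.6B embedder kernel quoted in the SVD entry.
m, n = 151936, 1024

full_u = matrix_bytes(m, m)  # full-matrices U is m x m
gram = matrix_bytes(n, n)    # Gram matrix G = MᵀM is n x n
thin_u = matrix_bytes(m, n)  # thin U = MV / S is m x n

print(f"full U: {full_u / 1e9:.1f} GB")
print(f"Gram  : {gram / 2**20:.1f} MiB")
print(f"thin U: {thin_u / 1e9:.2f} GB")
```

Under that assumption the Gram route keeps the eigendecomposition at n × n (a few MiB) and the largest remaining buffer is the thin U, which is the same size as the input matrix.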
0.3.3 - 2026-05-03
Fixed
- Emily.Compiler now silently drops options it doesn't recognise instead of raising ArgumentError. This matches the behaviour of Nx.Defn.Evaluator and EXLA, and restores compatibility with higher-level libraries that forward caller-supplied options through the JIT compiler, notably Axon.build/2, whose contract states that "all other options are forwarded to the underlying JIT compiler". Hit when running a Bumblebee-built Axon model with Axon.predict(..., global_layer_options: [output_hidden_states: true]) under Emily as the global defn compiler. Refs #81.
0.3.2 - 2026-04-25
0.3.1 - 2026-04-25
Fixed
- Precompiled NIF download no longer times out on the :peer.call/4 default 5s gen_server.call deadline. Consumers installing {:emily, "~> 0.3"} on a cold cache could see :gen_server.call timeouts while fetching the multi-MB tarball; the .sha256 sidecar fit in the window but the main asset did not. The peer RPC now runs with :infinity so httpc's own request timing drives cancellation.
0.3.0 - 2026-04-25
Changed
- Hex consumers now receive a precompiled NIF (libemily.{so,dylib} + mlx.metallib) instead of source. First mix compile downloads the matching emily-nif-<v>-<variant>-<target>.tar.gz (and its .sha256 sidecar) from the emily GitHub release for the pinned version, verifies the tarball against the published SHA256, and extracts into priv/. No cmake / Xcode / C++ toolchain is needed on the consumer side.
- In-repo / CI builds now clone MLX's source via a Mix git dep (:mlx_src) and build libmlx from source; release-mlx.yml is retired.
- Variant selection is unified under the :variant app-config key (:aot | :jit). Contributors flip variants via EMILY_MLX_VARIANT=jit (read by config/config.exs); consumers set config :emily, variant: :jit in their own config/config.exs. The old :mlx_variant key and config/local.exs override are gone.
- macOS default cache location moves from ~/Library/Caches/emily/ to DARWIN_USER_CACHE_DIR (/private/var/folders/<hash>/C/emily), the per-user sandboxed cache root Apple's own sandboxed apps use. Persistent across reboots, lives outside ~/Library/. Linux / Windows still use the XDG convention. Override via EMILY_CACHE. Existing macOS users can rm -rf ~/Library/Caches/emily/ to reclaim the orphaned data after upgrade.
- NIF object files move from the user-level cache to $(MIX_APP_PATH)/obj/ (i.e. _build/<env>/lib/emily/obj/). As a consequence, plain mix clean now correctly removes them via the existing Makefile rule; they were previously left behind because make clean didn't see the cache-dir env vars.
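The download-then-verify step in the first item of this section follows a common pattern. A minimal Python sketch of checking a tarball against a .sha256 sidecar is shown below; it is not the code in mix.exs, the function names are hypothetical, and the sidecar is assumed to start with the hex digest (optionally followed by a file name).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_sidecar(tarball, sidecar):
    """Raise ValueError unless the tarball matches the sidecar digest."""
    # Assumption: the first whitespace-separated token is the hex digest.
    expected = Path(sidecar).read_text().split()[0].lower()
    actual = sha256_of(tarball)
    if actual != expected:
        raise ValueError(
            f"checksum mismatch for {tarball}: expected {expected}, got {actual}"
        )
    return actual
```

Verifying before extraction means a truncated or tampered download never reaches priv/.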
Added
- .github/workflows/release-nif.yml: on bare-semver tag push, builds the precompiled NIF for each (variant × target) cell and uploads tarball + .sha256 sidecar to a draft GitHub release. workflow_dispatch is also wired for out-of-band rebuilds (artefacts go to workflow storage; the release is untouched).
- mix clean.mlx: wipes the MLX install dir(s) under the cache. Plain mix clean deliberately preserves them since rebuilding MLX from source is ~5-7 minutes.
Fixed
- MLX source builds are now atomic. The build script installs into ${PREFIX}.staging and only mvs onto the final path after the artefact sanity checks pass; an EXIT trap wipes the scratch dirs on failure. Previously, an interrupted build (Ctrl-C, killed process, concurrent run) left an empty install dir that subsequent mix compile runs misread as "MLX is already installed", silently skipping the build and bombing out in elixir_make with make: *** No rule to make target '.../mlx.metallib'. The compile-time check now requires both lib/libmlx.a and lib/mlx.metallib to be present before trusting the dir.
- Concurrent invocations of build-mlx.sh against the same install prefix are now serialised via a mkdir-based lock with stale-PID reclaim. ElixirLS uses its own build path (.elixir_ls/build/...) so an LSP-driven mix compile and a CLI mix compile.emily_mlx --force lock on different Mix.Project.with_build_lock keys and freely raced into the same MLX cache dir, clobbering each other's ${PREFIX}.build/ mid-build and surfacing as clang ... Rename failed: ... No such file or directory during Metal-shader compilation.
- CMake's FetchContent sub-build of metal_cpp / json / fmt during configure runs with CMAKE_BUILD_PARALLEL_LEVEL=1, dodging a race in its download → extract → rename → stamp-touch pipeline that surfaced as getcwd: cannot access parent directories followed by cd: <dir>/_deps: No such file or directory. The main MLX build still runs at full NCPU jobs.
- The MLX scratch build dir (${PREFIX}.build) is preserved on configure failure so CMakeError.log survives for diagnostics.
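The staging-then-rename install described in the first item of this list can be sketched as follows. This is an illustrative Python version, not build-mlx.sh itself: install_atomically, is_installed, and the build callback are hypothetical names, and the try/finally block plays the role of the shell EXIT trap.

```python
import os
import shutil

# Artefacts that must exist before an install dir is trusted
# (the two paths named in the changelog entry).
REQUIRED = ("lib/libmlx.a", "lib/mlx.metallib")


def is_installed(prefix):
    """Trust an install dir only if every required artefact is present."""
    return all(os.path.isfile(os.path.join(prefix, rel)) for rel in REQUIRED)


def install_atomically(prefix, build):
    """Run build(staging_dir) and publish it to prefix only on success."""
    staging = prefix + ".staging"
    shutil.rmtree(staging, ignore_errors=True)
    os.makedirs(staging)
    try:
        build(staging)
        if not is_installed(staging):
            raise RuntimeError("build finished but required artefacts are missing")
        # Publish only after the sanity check: prefix is never left
        # half-populated by an interrupted build.
        shutil.rmtree(prefix, ignore_errors=True)
        os.rename(staging, prefix)
    finally:
        # Mirrors the EXIT trap: wipe scratch state on any failure.
        # After a successful rename the staging dir no longer exists.
        shutil.rmtree(staging, ignore_errors=True)
```

A failed or interrupted build leaves no directory at prefix, so a later is_installed(prefix) check reports False instead of mistaking an empty directory for a finished install.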
Removed
- config/local.exs override (obsoleted by the env-var plumbing).
- .github/workflows/release-mlx.yml (MLX build is folded into the NIF workflow).
- scripts/build-mlx-prebuilt.sh (superseded by in-tree scripts/build-mlx.sh).
- scripts/smoke-test-package.sh and the tagged smoke-test job in ci.yml (simulated a source-compile consumer, no longer applicable).
See MAINTAINING.md for the updated release flow.
0.2.2 - 2026-04-23
Fixed
- MLX prebuilt download now runs on a peer VM (:peer.start_link/1 with stdio connection) so it is unaffected by Mix's code-path pruning during dep compilation. Previous releases crashed in the tagged smoke-test CI lane with {:error, :nofile} / "module :public_key is not available" on clean caches, because Mix removed the :ssl / :public_key / :asn1 / :inets ebin directories from the parent VM's code path even though the apps were started. The peer node has a fresh code path, so standard httpc + public_key work without further shimming.
0.2.1 - 2026-04-22
Fixed
- mix compile crash on a cold MLX download in a clean consumer project. http_download!/2 in mix.exs called :public_key.cacerts_get/0 right after Application.ensure_all_started(:ssl). The app-start path pulled :public_key in transitively, but the module itself was not guaranteed to be loaded at call time; the tag-triggered Hex smoke test on CI blew up with UndefinedFunctionError ... module :public_key is not available on 0.2.0. http_download! now force-loads the module via :code.ensure_loaded/1 before touching it. Any checkout with a populated ~/Library/Caches/emily/mlx-<v>-* directory skipped this path, which is why the break only surfaced in the first clean CI run.
0.2.0 - 2026-04-22
Added
- MLX prebuilt-release workflow (.github/workflows/release-mlx.yml). Manual workflow that builds libmlx.a + mlx.metallib + headers from a chosen ml-explore/mlx tag and uploads the tarball to a draft GitHub release tagged mlx-<version> on this repo. Used to produce the prebuilts that Emily's compile step downloads instead of the previous source-build path. To cut a new MLX prebuilt release:
  - Run the workflow with build_type=no-jit on macos-14 (produces mlx-<v>-macos-arm64-aot.tar.gz).
  - Run it again with build_type=jit on macos-26 (produces mlx-<v>-macos-arm64-jit.tar.gz).
  - Copy the two SHA256s from the draft release's .sha256 sidecars into @mlx_checksums in mix.exs.
  - Un-draft the release so consumers can fetch.

  The heavy lifting sits in scripts/build-mlx-prebuilt.sh, which runs standalone for local debugging: scripts/build-mlx-prebuilt.sh path/to/mlx-src 0.31.2 0.
- Emily.Fast.einsum/2: eager-only wrapper around MLX's path-optimised mx::einsum. Accepts a standard Einstein-summation string and a list of Emily.Backend-backed tensors; MLX picks the contraction order internally. Operands on any other backend raise ArgumentError with a transfer-first message. The helper is a direct-call eager helper (same pattern as Emily.Quantization.quantized_matmul/2) and is intentionally not defn-callable: a fallback via Nx.Defn.Expr's optional/3 would require a full einsum-string parser and is deferred until a user needs cross-backend composability.
Fixed
- Nx.top_k/2 on Emily tensors. The backend's top_k/3 override pattern-matched out as a single %Nx.Tensor{} and returned a single tensor, but the real Nx callback contract takes {out_values, out_indices} and returns a {values, indices} tuple. Any call to Nx.top_k raised FunctionClauseError. Dropped the override so Nx falls back to argsort(:desc) + take_along_axis + slice_along_axis, each of which routes through Emily's backend.
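The fallback composition named in this fix (descending argsort, then take, then slice) is easy to see on a plain list. The Python sketch below is a one-dimensional illustration of that composition, not Nx's implementation; top_k here is a hypothetical helper.

```python
def top_k(values, k):
    """Return (top_values, top_indices) for a 1-D sequence."""
    # argsort(:desc): indices ordered by descending value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    # take_along_axis: gather the values in that order.
    gathered = [values[i] for i in order]
    # slice_along_axis: keep the first k of each.
    return gathered[:k], order[:k]
```

Note that the result is a pair of values and indices, which is the tuple shape the removed override failed to return.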
Changed
- MLX prebuilt download replaces the vendored source build. The vendor/mlx submodule and the cmake-from-source path are gone. mix compile now downloads a SHA256-verified libmlx.a + mlx.metallib + headers tarball for the pinned @mlx_version from this repo's releases into $EMILY_CACHE and links the NIF against it directly. Consumer prerequisites drop from "Xcode + Metal toolchain + cmake + submodule checkout" to just macOS Apple Silicon. The JIT / no-JIT switch moves from the EMILY_MLX_JIT env var to config :emily, mlx_variant: :jit | :no_jit in config/config.exs (default :no_jit); variant is read via Config.Reader.read! at project load, so a gitignored config/local.exs is the supported per-checkout override. Version bumps are a single-commit change of @mlx_version + @mlx_checksums in mix.exs, paired with a new mlx-<version> GitHub release produced by release-mlx.yml. First MLX pin under the new scheme: 0.31.2.
- Microscaled quantization modes on Emily.QuantizedWeight. The container now carries a :mode field (default "affine") and accepts "mxfp4", "mxfp8", "nvfp4", covering MLX's full QuantizationMode enum (vendor/mlx/mlx/primitives.h:155). from_dense/2, to_dense/1, and Emily.Quantization.quantized_matmul/2 all thread the mode through to MLX; mode-specific {group_size, bits} constraints are validated up front with a clear Emily error before the NIF call. Microscaled modes carry a placeholder biases tensor: MLX's fp_quantize returns only (wq, scales), and the Native layer substitutes nil before the MLX call. Emily.Quantization.dequantize_defn/1 is affine-only (it's a hand-rolled nibble unpacker) and now raises ArgumentError on non-affine modes, pointing users at to_dense/1. Smoke-tested end-to-end on Metal for all four modes (Apple Silicon, macOS 26).
- SDPA attention sinks (mx::fast::scaled_dot_product_attention sinks param). Emily.Fast.scaled_dot_product_attention/4 and scaled_dot_product_attention_with_mask/5 now accept an optional :sinks keyword opt: a per-head tensor broadcastable to {1, heads, 1, 1} whose entries participate in the softmax denominator as extra "null destinations" (StreamingLLM). When absent the helpers emit the pre-existing optional-node, so Emily.Bumblebee.FastKernels and direct callers stay source- and bit-compatible. The defn fallback implements the same semantics in numerically-stable form; equivalence vs. the fused kernel was measured at ~2e-7 max-abs-diff on f32.
- MLX JIT build no longer patches vendored MLX. The patches/mlx-jit-nax-gate.patch workaround (and the maybe_apply_mlx_patches plumbing in mix.exs) has been removed. The JIT build now requires the macOS 26.2+ SDK directly, which ships <MetalPerformancePrimitives/MetalPerformancePrimitives.h>; the AOT (default) build is unchanged and still works on older macOS. Upstream discussion: ml-explore/mlx#3426.
- CI matrix split across macOS versions. The jit=0 row stays on macos-14 to keep AOT coverage on older macOS; the jit=1 row now runs on macos-26 so the Metal Performance Primitives SDK is available natively.
- Native axis reversal via mx::slice with stride -1. The descending branches of Nx.sort and Nx.argsort (and Nx.reverse) previously built an arange index tensor and gathered with take. They now call a new Native.flip/3 NIF that lowers to a single strided slice, saving the index allocation and gather kernel per call.
- Parallel NIF C++ build. elixir_make doesn't pass -j by default and mix.exs didn't set :make_args, so every .cpp in c_src/ compiled serially. mix.exs now passes -j#{System.schedulers_online()} through, and the vestigial JOBS / MAKE_JOBS pair in the Makefile (computed but never referenced) has been removed. On an 8-core M-series, a clean NIF build drops from ~19 s to ~7 s.
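The attention-sink semantics from the SDPA item above (a sink entry that joins the softmax denominator but receives no output weight) can be shown for a single query row. This is a minimal Python sketch under stated assumptions: one head, one row of already-scaled scores, and a scalar sink logit; softmax_with_sink is a hypothetical name and not Emily's defn fallback.

```python
import math


def softmax_with_sink(scores, sink):
    """Softmax over scores with one extra sink logit in the denominator."""
    # Subtract the running maximum (including the sink) for stability.
    peak = max(max(scores), sink)
    exps = [math.exp(s - peak) for s in scores]
    denominator = sum(exps) + math.exp(sink - peak)
    # The sink contributes to the denominator only, so the returned
    # weights sum to less than 1 whenever the sink term is non-zero.
    return [e / denominator for e in exps]
```

With a sink of negative infinity the extra term vanishes and the function reduces to an ordinary softmax.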
0.1.2 - 2026-04-19
Fixed
- HexDocs source links. mix.exs's source_url_pattern prepended a v prefix to the version tag, but the project's release convention (via mix publisho) uses bare semver tags. The generated [source] links in HexDocs pointed at nonexistent v<version> tags. Dropped the prefix so links resolve to the actual tag.
0.1.1 - 2026-04-19
Initial release. See the git history for per-milestone detail.
Added
- Nx backend. Emily.Backend implements every required Nx.Backend callback against MLX, with transparent fallback to Nx.BinaryBackend for ops without a native primitive.
- Defn compiler. Emily.Compiler runs defn / Nx.Serving / Bumblebee on Emily; pins the result backend and caps partition concurrency so Nx.Serving stays compatible.
- Fused transformer kernels. Emily.Fast exposes mx::fast::rms_norm, layer_norm, rope, and scaled-dot-product attention as defn-callable helpers with composed-defn fallbacks for non-Emily backends. Emily.Bumblebee.FastKernels rewrites a Bumblebee Axon graph to call the fused kernels in place; declared as an optional dep on :axon + :bumblebee, elides cleanly if either is absent.
- Affine group-wise quantization. Emily.QuantizedWeight and Emily.Quantization wrap MLX quantize / dequantize / quantized_matmul for int2 / int4 / int8 inference. Emily.Quantization.dequantize_defn/1 provides a defn-native dequantize for use inside Axon forward passes.
- Mixed-precision training. Emily.MixedPrecision ships the bf16 recipe: cast_params for the forward pass, f32 master weights, dynamic loss scaling with overflow detection.
- Per-process Metal streams. Emily.Stream lets each BEAM process own its own Metal command queue, enabling concurrent inference on a shared model.
- Zero-copy to_binary. Nx.to_binary/1 on an Emily tensor returns a BEAM resource binary aliasing the MLX buffer, with no memcpy.
- Native gradient + training primitives. gather, scatter, scatter_add, conv, and the window-reduction family lower directly to MLX so Nx.Defn.grad and CNN training stay native.
- Native linalg. lu, svd, qr, cholesky, eigh, solve, and triangular_solve dispatch to mx::linalg::* instead of routing through Nx.BinaryBackend.
- Telemetry. [:emily, :eval, *], [:emily, :to_binary, *], [:emily, :fallback, *], and [:emily, :memory, :stats] span events; opt-in one-shot fallback warnings via config :emily, :warn_on_fallback, true.
- Compile-time debug flags. :debug_bounds_check and :debug_detect_nan_inf re-enable runtime assertions on hot paths; default off with zero runtime cost.
- Bumblebee conformance. End-to-end suites for DistilBERT, Qwen3-0.6B (dense and quantized), ViT-base, and Whisper-tiny, pinned against HuggingFace reference values.
- Worker-thread dispatch. Each MLX stream is owned by a dedicated OS thread. NIFs enqueue work on the worker and return immediately; the worker posts the result back to the caller via enif_send, and the public wrapper awaits it with receive. No BEAM scheduler (regular or dirty) blocks on MLX work, and the per-thread Metal CommandEncoder state stays consistent regardless of how the BEAM migrates Elixir processes between schedulers.
- Vendored MLX build. MLX is built from source via cmake from vendor/mlx (git submodule); no prebuilt download. Build cache keyed on the submodule SHA under ~/Library/Caches/emily/.
- Documentation. Per-module HexDocs, five runnable Livebooks (notebooks/distilbert_qa.livemd, notebooks/qwen3_quantized.livemd, notebooks/mnist_training.livemd, notebooks/whisper_transcription.livemd, notebooks/fast_kernels.livemd), and worked Bumblebee examples in the conformance suite.
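The worker-thread dispatch item above describes an enqueue-and-reply pattern: the caller hands work to a dedicated thread, gets control back at once, and later waits on a reply. A rough Python analogue is sketched below; StreamWorker and call are hypothetical names, a per-request queue stands in for the BEAM mailbox, and nothing here reflects the actual NIF code.

```python
import queue
import threading


class StreamWorker:
    """One dedicated thread that owns all work for a single stream."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                break
            fn, args, mailbox = job
            try:
                mailbox.put(("ok", fn(*args)))
            except Exception as exc:  # report failures to the caller
                mailbox.put(("error", exc))

    def enqueue(self, fn, *args):
        """Queue work and return at once with a mailbox for the reply."""
        mailbox = queue.Queue(maxsize=1)
        self._jobs.put((fn, args, mailbox))
        return mailbox

    def stop(self):
        self._jobs.put(None)
        self._thread.join()


def call(worker, fn, *args):
    """Public wrapper: enqueue, then block only this caller on the reply."""
    status, value = worker.enqueue(fn, *args).get()
    if status == "error":
        raise value
    return value
```

Because every job for a stream runs on the same thread, any per-thread state the job touches stays consistent no matter which caller submitted it.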