All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.3.2 - 2026-04-25
0.3.1 - 2026-04-25
Fixed
- Precompiled NIF download no longer times out on the `:peer.call/4` default 5 s `gen_server.call` deadline. Consumers installing `{:emily, "~> 0.3"}` on a cold cache could see `:gen_server.call` timeouts while fetching the multi-MB tarball; the `.sha256` sidecar fit in the window but the main asset did not. The peer RPC now runs with `:infinity`, so httpc's own request timing drives cancellation.
0.3.0 - 2026-04-25
Changed
- Hex consumers now receive a precompiled NIF (`libemily.{so,dylib}` + `mlx.metallib`) instead of source. The first `mix compile` downloads the matching `emily-nif-<v>-<variant>-<target>.tar.gz` (and its `.sha256` sidecar) from the emily GitHub release for the pinned version, verifies the tarball against the published SHA256, and extracts it into `priv/`. No cmake, Xcode, or C++ toolchain is needed on the consumer side.
- In-repo and CI builds now clone MLX's source via a Mix git dep (`:mlx_src`) and build libmlx from source; `release-mlx.yml` is retired.
- Variant selection is unified under the `:variant` app-config key (`:aot` | `:jit`). Contributors flip variants via `EMILY_MLX_VARIANT=jit` (read by `config/config.exs`); consumers set `config :emily, variant: :jit` in their own `config/config.exs`. The old `:mlx_variant` key and the `config/local.exs` override are gone.
- The macOS default cache location moves from `~/Library/Caches/emily/` to `DARWIN_USER_CACHE_DIR` (`/private/var/folders/<hash>/C/emily`), the per-user sandboxed cache root that Apple's own sandboxed apps use. It persists across reboots and lives outside `~/Library/`. Linux and Windows still use the XDG convention. Override via `EMILY_CACHE`. Existing macOS users can `rm -rf ~/Library/Caches/emily/` to reclaim the orphaned data after upgrading.
- NIF object files move from the user-level cache to `$(MIX_APP_PATH)/obj/` (i.e. `_build/<env>/lib/emily/obj/`). As a consequence, plain `mix clean` now correctly removes them via the existing Makefile rule; they were previously left behind because `make clean` didn't see the cache-dir env vars.
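The verify-then-extract order described in the first entry can be sketched in Python. This is an illustration, not the `mix.exs` code: it assumes the `.sha256` sidecar holds the hex digest as its first whitespace-separated token (the `sha256sum` layout), and `verify_and_extract` / `expected_digest` are hypothetical names.

```python
import hashlib
import io
import tarfile


def expected_digest(sidecar_text: str) -> str:
    """Assumed sidecar layout: '<hex digest>  <filename>' (sha256sum style)."""
    return sidecar_text.split()[0].lower()


def verify_and_extract(tarball: bytes, sidecar_text: str, dest: str) -> None:
    """Unpack a .tar.gz into dest only if its SHA-256 matches the sidecar."""
    actual = hashlib.sha256(tarball).hexdigest()
    expected = expected_digest(sidecar_text)
    if actual != expected:
        # Refuse to touch dest at all on a mismatch.
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        tar.extractall(dest)
```

Checking the digest over the whole byte string before opening the archive means a truncated or tampered download never reaches `priv/`. On Python 3.12+ one would normally also pass `filter="data"` to `extractall` for archives from the network.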
Added
- `.github/workflows/release-nif.yml`: on bare-semver tag push, builds the precompiled NIF for each `(variant × target)` cell and uploads the tarball + `.sha256` sidecar to a draft GitHub release. `workflow_dispatch` is also wired for out-of-band rebuilds (artefacts go to workflow storage; the release is untouched).
- `mix clean.mlx`: wipes the MLX install dir(s) under the cache. Plain `mix clean` deliberately preserves them, since rebuilding MLX from source takes ~5-7 minutes.
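The `(variant × target)` matrix amounts to a Cartesian product of release assets. In the Python sketch below, the `emily-nif-<v>-<variant>-<target>.tar.gz` pattern and the `aot`/`jit` variants come from the 0.3.0 entries, but the target names are hypothetical placeholders, not the workflow's actual target list.

```python
from itertools import product

VARIANTS = ("aot", "jit")  # values of the :variant app-config key
# Hypothetical target names; the real workflow's targets may differ.
TARGETS = ("aarch64-apple-darwin", "x86_64-linux-gnu")


def asset_names(version: str) -> list[str]:
    """One tarball plus one .sha256 sidecar per (variant x target) cell."""
    names = []
    for variant, target in product(VARIANTS, TARGETS):
        tarball = f"emily-nif-{version}-{variant}-{target}.tar.gz"
        names.extend([tarball, tarball + ".sha256"])
    return names
```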
Fixed
- MLX source builds are now atomic. The build script installs into `${PREFIX}.staging` and only `mv`s it onto the final path after the artefact sanity checks pass; an EXIT trap wipes the scratch dirs on failure. Previously, an interrupted build (Ctrl-C, killed process, concurrent run) left an empty install dir that subsequent `mix compile` runs misread as "MLX is already installed", silently skipping the build and bombing out in `elixir_make` with `make: *** No rule to make target '.../mlx.metallib'`. The compile-time check now requires both `lib/libmlx.a` and `lib/mlx.metallib` to be present before trusting the dir.
- Concurrent invocations of `build-mlx.sh` against the same install prefix are now serialised via a `mkdir`-based lock with stale-PID reclaim. ElixirLS uses its own build path (`.elixir_ls/build/...`), so an LSP-driven `mix compile` and a CLI `mix compile.emily_mlx --force` locked on different `Mix.Project.with_build_lock` keys and freely raced into the same MLX cache dir, clobbering each other's `${PREFIX}.build/` mid-build and surfacing as `clang ... Rename failed: ... No such file or directory` during Metal-shader compilation.
- CMake's FetchContent sub-build of metal_cpp / json / fmt during configure runs with `CMAKE_BUILD_PARALLEL_LEVEL=1`, dodging a race in its download → extract → rename → stamp-touch pipeline that surfaced as `getcwd: cannot access parent directories` followed by `cd: <dir>/_deps: No such file or directory`. The main MLX build still runs at full NCPU jobs.
- The MLX scratch build dir (`${PREFIX}.build`) is preserved on configure failure so `CMakeError.log` survives for diagnostics.
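The staging-then-rename install and the `mkdir`-based lock are shell techniques in `build-mlx.sh`; the Python sketch below mirrors the same ideas for illustration only. Assumptions not taken from the script: the lock directory holds a `pid` file, the `build` callback and all function names are hypothetical, liveness is probed with signal 0 (POSIX only), and the simplified reclaim ignores the race where two waiters observe the same stale PID.

```python
import os
import shutil

REQUIRED = ("lib/libmlx.a", "lib/mlx.metallib")


def is_installed(prefix: str) -> bool:
    """Trust an install dir only if every required artefact is present."""
    return all(os.path.isfile(os.path.join(prefix, p)) for p in REQUIRED)


def _pid_alive(pid: int) -> bool:
    """POSIX only: signal 0 checks that a process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # alive, but owned by another user
    return True


def _read_pid(lock_dir: str):
    try:
        with open(os.path.join(lock_dir, "pid")) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # holder has not written its PID yet


def acquire_lock(lock_dir: str, _may_reclaim: bool = True) -> bool:
    """mkdir is atomic, so at most one caller creates lock_dir."""
    try:
        os.mkdir(lock_dir)
    except FileExistsError:
        holder = _read_pid(lock_dir)
        if holder is None or _pid_alive(holder) or not _may_reclaim:
            return False
        # Stale lock left by a dead process: remove it and try exactly once more.
        shutil.rmtree(lock_dir, ignore_errors=True)
        return acquire_lock(lock_dir, _may_reclaim=False)
    with open(os.path.join(lock_dir, "pid"), "w") as f:
        f.write(str(os.getpid()))
    return True


def release_lock(lock_dir: str) -> None:
    shutil.rmtree(lock_dir, ignore_errors=True)


def atomic_install(prefix: str, build) -> None:
    """Build into <prefix>.staging; rename onto <prefix> only after checks pass."""
    staging = prefix + ".staging"
    shutil.rmtree(staging, ignore_errors=True)
    os.makedirs(staging)
    try:
        build(staging)  # hypothetical callback that populates the staging dir
        missing = [p for p in REQUIRED if not os.path.isfile(os.path.join(staging, p))]
        if missing:
            raise RuntimeError(f"sanity check failed, missing: {missing}")
        os.rename(staging, prefix)  # atomic within one filesystem
    finally:
        # Like the EXIT trap: clear scratch state; a no-op after a successful rename.
        shutil.rmtree(staging, ignore_errors=True)
```

Because the final path only ever appears via a single rename, a reader such as `is_installed` sees either no directory or a complete one, never the empty dir that caused the old `mlx.metallib` failure.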
Removed
- `config/local.exs` override (obsoleted by the env-var plumbing).
- `.github/workflows/release-mlx.yml` (MLX build is folded into the NIF workflow).
- `scripts/build-mlx-prebuilt.sh` (superseded by the in-tree `scripts/build-mlx.sh`).
- `scripts/smoke-test-package.sh` and the tagged `smoke-test` job in `ci.yml` (simulated a source-compile consumer, no longer applicable).
See MAINTAINING.md for the updated release flow.
0.2.2 - 2026-04-23
Fixed
- MLX prebuilt download now runs on a peer VM (`:peer.start_link/1` with stdio connection), so it is unaffected by Mix's code-path pruning during dep compilation. Previous releases crashed in the tagged `smoke-test` CI lane with `{:error, :nofile}` / "module :public_key is not available" on clean caches, because Mix removed the `:ssl` / `:public_key` / `:asn1` / `:inets` ebin directories from the parent VM's code path even though the apps were started. The peer node has a fresh code path, so standard `httpc` + `public_key` work without further shimming.
0.2.1 - 2026-04-22
Fixed
- `mix compile` crash on a cold MLX download in a clean consumer project. `http_download!/2` in `mix.exs` called `:public_key.cacerts_get/0` right after `Application.ensure_all_started(:ssl)`. The app-start path pulled `:public_key` in transitively, but the module itself was not guaranteed to be loaded at call time; the tag-triggered Hex smoke test on CI blew up with `UndefinedFunctionError ... module :public_key is not available` on 0.2.0. `http_download!` now force-loads the module via `:code.ensure_loaded/1` before touching it. Any checkout with a populated `~/Library/Caches/emily/mlx-<v>-*` directory skipped this path, which is why the break only surfaced in the first clean CI run.
0.2.0 - 2026-04-22
Added
- MLX prebuilt-release workflow (`.github/workflows/release-mlx.yml`). A manual workflow that builds `libmlx.a` + `mlx.metallib` + headers from a chosen `ml-explore/mlx` tag and uploads the tarball to a draft GitHub release tagged `mlx-<version>` on this repo. It produces the prebuilts that Emily's compile step downloads instead of using the previous source-build path. To cut a new MLX prebuilt release:
  - Run the workflow with `build_type=no-jit` on macos-14 (produces `mlx-<v>-macos-arm64-aot.tar.gz`).
  - Run it again with `build_type=jit` on macos-26 (produces `mlx-<v>-macos-arm64-jit.tar.gz`).
  - Copy the two SHA256s from the draft release's `.sha256` sidecars into `@mlx_checksums` in `mix.exs`.
  - Un-draft the release so consumers can fetch.

  The heavy lifting sits in `scripts/build-mlx-prebuilt.sh`, which runs standalone for local debugging: `scripts/build-mlx-prebuilt.sh path/to/mlx-src 0.31.2 0`.
- `Emily.Fast.einsum/2`: an eager-only wrapper around MLX's path-optimised `mx::einsum`. It accepts a standard Einstein-summation string and a list of `Emily.Backend`-backed tensors; MLX picks the contraction order internally. Operands on any other backend raise `ArgumentError` with a transfer-first message. The helper is a direct-call eager helper (same pattern as `Emily.Quantization.quantized_matmul/2`) and is intentionally not `defn`-callable: a fallback via `Nx.Defn.Expr.optional/3` would require a full einsum-string parser and is deferred until a user needs cross-backend composability.
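For readers new to the notation, the meaning of a spec string like `"ij,jk->ik"` can be spelled out as a brute-force loop. This Python sketch is only a reference for the semantics: `naive_einsum` is a hypothetical name, it supports explicit `->` output only (no implicit mode, no `...` broadcasting), and it performs none of the contraction-order optimisation that MLX's `mx::einsum` does.

```python
from itertools import product


def _shape(tensor):
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return shape


def _get(tensor, idx):
    for i in idx:
        tensor = tensor[i]
    return tensor


def naive_einsum(spec, *operands):
    """Sum the product of operands over every label absent from the output."""
    inputs, output = spec.replace(" ", "").split("->")
    in_labels = inputs.split(",")
    if len(in_labels) != len(operands):
        raise ValueError("operand count does not match the spec")
    sizes = {}
    for labels, op in zip(in_labels, operands):
        shape = _shape(op)
        if len(labels) != len(shape):
            raise ValueError(f"{labels!r} does not match rank {len(shape)}")
        for label, n in zip(labels, shape):
            if sizes.setdefault(label, n) != n:
                raise ValueError(f"size mismatch for index {label!r}")
    all_labels = sorted(sizes)
    acc = {}
    for combo in product(*(range(sizes[l]) for l in all_labels)):
        env = dict(zip(all_labels, combo))
        term = 1
        for labels, op in zip(in_labels, operands):
            term *= _get(op, [env[l] for l in labels])
        key = tuple(env[l] for l in output)
        acc[key] = acc.get(key, 0) + term

    def build(prefix):
        if len(prefix) == len(output):
            return acc.get(tuple(prefix), 0)
        return [build(prefix + [i]) for i in range(sizes[output[len(prefix)]])]

    return build([])
```

The loop visits every index combination, which is exactly the cost a path-optimised implementation avoids when three or more operands are contracted.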
Fixed
- `Nx.top_k/2` on Emily tensors. The backend's `top_k/3` override pattern-matched `out` as a single `%Nx.Tensor{}` and returned a single tensor, but the real Nx callback contract takes `{out_values, out_indices}` and returns a `{values, indices}` tuple, so any call to `Nx.top_k` raised `FunctionClauseError`. Dropped the override so Nx falls back to `argsort(:desc)` + `take_along_axis` + `slice_along_axis`, each of which routes through Emily's backend.
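The composite Nx falls back to can be mirrored in plain Python to show how the `{values, indices}` pair falls out of three primitives. This is a sketch over nested lists along the last axis, not Nx code; `top_k` here is a hypothetical stand-in, and its tie-breaking (Python's stable sort) may differ from Nx's.

```python
def top_k(rows, k):
    """top_k along the last axis as argsort(desc) + take_along_axis + slice."""
    values, indices = [], []
    for row in rows:
        # argsort(:desc): reverse=True keeps equal elements in original order.
        order = sorted(range(len(row)), key=row.__getitem__, reverse=True)
        taken = [row[i] for i in order]  # take_along_axis
        values.append(taken[:k])  # slice_along_axis on the values
        indices.append(order[:k])  # same slice on the indices
    return values, indices
```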
Changed
- MLX prebuilt download replaces the vendored source build. The `vendor/mlx` submodule and the cmake-from-source path are gone. `mix compile` now downloads a SHA256-verified `libmlx.a` + `mlx.metallib` + headers tarball for the pinned `@mlx_version` from this repo's releases into `$EMILY_CACHE` and links the NIF against it directly. Consumer prerequisites drop from "Xcode + Metal toolchain + cmake + submodule checkout" to just macOS on Apple Silicon. The JIT / no-JIT switch moves from the `EMILY_MLX_JIT` env var to `config :emily, mlx_variant: :jit | :no_jit` in `config/config.exs` (default `:no_jit`); the variant is read via `Config.Reader.read!` at project load, so a gitignored `config/local.exs` is the supported per-checkout override. Version bumps are a single-commit change of `@mlx_version` + `@mlx_checksums` in `mix.exs`, paired with a new `mlx-<version>` GitHub release produced by `release-mlx.yml`. First MLX pin under the new scheme: 0.31.2.
- Microscaled quantization modes on `Emily.QuantizedWeight`. The container now carries a `:mode` field (default `"affine"`) and also accepts `"mxfp4"`, `"mxfp8"`, and `"nvfp4"`, covering MLX's full `QuantizationMode` enum (`vendor/mlx/mlx/primitives.h:155`). `from_dense/2`, `to_dense/1`, and `Emily.Quantization.quantized_matmul/2` all thread the mode through to MLX; mode-specific `{group_size, bits}` constraints are validated up front with a clear Emily error before the NIF call. Microscaled modes carry a placeholder biases tensor: MLX's `fp_quantize` returns only `(wq, scales)`, and the Native layer substitutes `nil` before the MLX call. `Emily.Quantization.dequantize_defn/1` is affine-only (it's a hand-rolled nibble unpacker) and now raises `ArgumentError` on non-affine modes, pointing users at `to_dense/1`. Smoke-tested end-to-end on Metal for all four modes (Apple Silicon, macOS 26).
- SDPA attention sinks (`mx::fast::scaled_dot_product_attention` `sinks` param). `Emily.Fast.scaled_dot_product_attention/4` and `scaled_dot_product_attention_with_mask/5` now accept an optional `:sinks` keyword opt: a per-head tensor broadcastable to `{1, heads, 1, 1}` whose entries participate in the softmax denominator as extra "null destinations" (StreamingLLM). When it is absent, the helpers emit the pre-existing optional node, so `Emily.Bumblebee.FastKernels` and direct callers stay source- and bit-compatible. The defn fallback implements the same semantics in numerically stable form; equivalence vs. the fused kernel was measured at ~2e-7 max-abs-diff on f32.
- MLX JIT build no longer patches vendored MLX. The `patches/mlx-jit-nax-gate.patch` workaround (and the `maybe_apply_mlx_patches` plumbing in `mix.exs`) has been removed. The JIT build now requires the macOS 26.2+ SDK directly, which ships `<MetalPerformancePrimitives/MetalPerformancePrimitives.h>`; the AOT (default) build is unchanged and still works on older macOS. Upstream discussion: ml-explore/mlx#3426.
- CI matrix split across macOS versions. The `jit=0` row stays on `macos-14` to keep AOT coverage on older macOS; the `jit=1` row now runs on `macos-26` so the Metal Performance Primitives SDK is available natively.
- Native axis reversal via `mx::slice` with stride -1. The descending branches of `Nx.sort` and `Nx.argsort` (and `Nx.reverse`) previously built an `arange` index tensor and gathered with `take`. They now call a new `Native.flip/3` NIF that lowers to a single strided slice, saving the index allocation and gather kernel per call.
- Parallel NIF C++ build. `elixir_make` doesn't pass `-j` by default and `mix.exs` didn't set `:make_args`, so every `.cpp` in `c_src/` compiled serially. `mix.exs` now passes `-j#{System.schedulers_online()}` through, and the vestigial `JOBS` / `MAKE_JOBS` pair in the `Makefile` (computed but never referenced) has been removed. On an 8-core M-series machine, a clean NIF build drops from ~19 s to ~7 s.
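The attention-sink semantics described above (sink entries join the softmax denominator but carry no value) can be sketched for one head and one query row. This is plain Python under that reading, not Emily's defn fallback or the MLX kernel; `softmax_with_sink` and `attend` are hypothetical names.

```python
import math


def softmax_with_sink(scores, sink):
    """Softmax over one query row where a per-head sink logit joins the denominator.

    The sink is a 'null destination': it absorbs probability mass but has no
    value vector, so the returned weights sum to less than 1.
    """
    m = max(max(scores), sink)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink - m)
    return [e / denom for e in exps]


def attend(q, keys, values, sink, scale):
    """Single-head, single-query scaled dot-product attention with a sink."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax_with_sink(scores, sink)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

With `sink = float("-inf")` the extra term is `exp(-inf) = 0`, so the function reduces to an ordinary softmax, which matches the "absent sinks" path staying bit-compatible in spirit.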
0.1.2 - 2026-04-19
Fixed
- HexDocs source links. `mix.exs`'s `source_url_pattern` prepended a `v` prefix to the version tag, but the project's release convention (via `mix publish`) uses bare semver tags, so the generated `[source]` links in HexDocs pointed at nonexistent `v<version>` tags. Dropped the prefix so links resolve to the actual tag.
0.1.1 - 2026-04-19
Initial release. See the git history for per-milestone detail.
Added
- Nx backend. `Emily.Backend` implements every required `Nx.Backend` callback against MLX, with transparent fallback to `Nx.BinaryBackend` for ops without a native primitive.
- Defn compiler. `Emily.Compiler` runs `defn` / `Nx.Serving` / Bumblebee on Emily; it pins the result backend and caps partition concurrency so `Nx.Serving` stays compatible.
- Fused transformer kernels. `Emily.Fast` exposes `mx::fast::rms_norm`, `layer_norm`, `rope`, and scaled-dot-product attention as defn-callable helpers with composed-defn fallbacks for non-Emily backends. `Emily.Bumblebee.FastKernels` rewrites a Bumblebee Axon graph to call the fused kernels in place; it is declared as an optional dep on `:axon` + `:bumblebee` and elides cleanly if either is absent.
- Affine group-wise quantization. `Emily.QuantizedWeight` and `Emily.Quantization` wrap MLX `quantize` / `dequantize` / `quantized_matmul` for int2 / int4 / int8 inference. `Emily.Quantization.dequantize_defn/1` provides a defn-native dequantize for use inside Axon forward passes.
- Mixed-precision training. `Emily.MixedPrecision` ships the bf16 recipe: `cast_params` for the forward pass, f32 master weights, and dynamic loss scaling with overflow detection.
- Per-process Metal streams. `Emily.Stream` lets each BEAM process own its own Metal command queue, enabling concurrent inference on a shared model.
- Zero-copy `to_binary`. `Nx.to_binary/1` on an Emily tensor returns a BEAM resource binary aliasing the MLX buffer, with no memcpy.
- Native gradient + training primitives. `gather`, `scatter`, `scatter_add`, `conv`, and the window-reduction family lower directly to MLX, so `Nx.Defn.grad` and CNN training stay native.
- Native linalg. `lu`, `svd`, `qr`, `cholesky`, `eigh`, `solve`, and `triangular_solve` dispatch to `mx::linalg::*` instead of routing through `Nx.BinaryBackend`.
- Telemetry. `[:emily, :eval, *]`, `[:emily, :to_binary, *]`, `[:emily, :fallback, *]`, and `[:emily, :memory, :stats]` span events; opt-in one-shot fallback warnings via `config :emily, :warn_on_fallback, true`.
- Compile-time debug flags. `:debug_bounds_check` and `:debug_detect_nan_inf` re-enable runtime assertions on hot paths; both default to off with zero runtime cost.
- Bumblebee conformance. End-to-end suites for DistilBERT, Qwen3-0.6B (dense and quantized), ViT-base, and Whisper-tiny, pinned against HuggingFace reference values.
- Worker-thread dispatch. Each MLX stream is owned by a dedicated OS thread. NIFs enqueue work on the worker and return immediately; the worker posts the result back to the caller via `enif_send`, and the public wrapper awaits it with `receive`. No BEAM scheduler (regular or dirty) blocks on MLX work, and the per-thread Metal `CommandEncoder` state stays consistent regardless of how the BEAM migrates Elixir processes between schedulers.
- Vendored MLX build. MLX is built from source via cmake from `vendor/mlx` (git submodule); there is no prebuilt download. The build cache is keyed on the submodule SHA under `~/Library/Caches/emily/`.
- Documentation. Per-module HexDocs, five runnable Livebooks (`notebooks/distilbert_qa.livemd`, `notebooks/qwen3_quantized.livemd`, `notebooks/mnist_training.livemd`, `notebooks/whisper_transcription.livemd`, `notebooks/fast_kernels.livemd`), and worked Bumblebee examples in the conformance suite.
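The worker-thread dispatch pattern can be illustrated with Python threads standing in for the NIF machinery: a job queue plays the worker's queue, a one-slot reply queue plays the `enif_send` message, and a blocking `get()` plays `receive`. This is an analogy only, not Emily's C++, and `StreamWorker` is a hypothetical name. Unlike a BEAM process suspended in `receive`, the calling Python thread really does block in `call`; the point shown is that every job runs on the one thread that owns the stream.

```python
import queue
import threading


class StreamWorker:
    """A dedicated thread owns the 'stream'; callers only enqueue and wait."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                return
            fn, args, reply = job
            try:
                reply.put(("ok", fn(*args)))
            except Exception as exc:  # post failures back instead of killing the worker
                reply.put(("error", exc))

    def submit(self, fn, *args):
        """Enqueue and return immediately (analogue of the NIF returning)."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((fn, args, reply))
        return reply

    def call(self, fn, *args):
        """Public wrapper: enqueue, then wait for the posted result."""
        status, value = self.submit(fn, *args).get()
        if status == "error":
            raise value
        return value

    def stop(self):
        self._jobs.put(None)
        self._thread.join()
```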