This document explains how ex_data_sketch ships its Rust NIF as a set
of precompiled binary artifacts, why this matters for adoption, and how
the v0.8.0 release pipeline produces those artifacts.
Why precompiled NIFs matter
A Rust NIF is a .so / .dylib / .dll file that the BEAM dynamically
loads at runtime. Building it requires:
- a working Rust toolchain (rustc, cargo, the target's standard library);
- a working C linker (cc, link.exe, etc.) for system glue;
- network access so that cargo can download crate dependencies;
- a non-trivial amount of CPU time (~30s on a modern laptop for a clean release build).
For a typical Elixir application that adds ex_data_sketch as a
dependency, this means:
- developers cannot run mix deps.get && mix compile and have it Just Work unless they install Rust first;
- CI pipelines need to either bake Rust into the base image or pay the toolchain install cost on every run;
- Docker layer caching for mix deps.compile is invalidated whenever the dependency tree changes.
The RustlerPrecompiled library (and the rustler_precompiled_action
GitHub Action) solves this by:
- Building the NIF on every supported platform at release time and uploading each result as a GitHub Release asset.
- At mix deps.compile time on the downstream side, downloading the pre-built .tar.gz matching the host's target-triple + nif-version from the GitHub Release URL.
- Verifying the downloaded artifact's SHA-256 against a checksum file shipped in the Hex package.
- Falling back to a source build (via EX_DATA_SKETCH_BUILD=true) if the user explicitly opts in, or if the precompiled artifact is missing for the host.
The end result for downstream users: a Rust-toolchain-free
mix deps.get, with a verifiable supply chain.
Platform matrix (v0.8.0)
| Target triple | OS / Architecture | Runner | Cross? |
|---|---|---|---|
| aarch64-apple-darwin | macOS 11+ (Apple Silicon) | macos-14 | no |
| x86_64-apple-darwin | macOS 10.15+ (Intel) | macos-14 | no |
| x86_64-unknown-linux-gnu | glibc Linux (x86_64) | ubuntu-22.04 | no |
| x86_64-unknown-linux-musl | musl Linux (Alpine, etc.) | ubuntu-22.04 | yes |
| aarch64-unknown-linux-gnu | glibc Linux (ARM64) | ubuntu-22.04 | yes |
| aarch64-unknown-linux-musl | musl Linux (ARM64) | ubuntu-22.04 | yes |
| x86_64-pc-windows-msvc | Windows 10+ (x86_64) | windows-2022 | no |
| aarch64-pc-windows-msvc | Windows 11 (ARM64) | windows-2022 | no |

Two NIF API versions are produced per target (2.16 and 2.17), giving 16 artifacts per release.
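
The artifact count follows directly from the table. As a minimal sketch (the module `MatrixSketch` is hypothetical and not part of ex_data_sketch; the file-name pattern is the one quoted in the build_release description, and real artifact names may differ in detail), the combinations can be enumerated like this:

```elixir
defmodule MatrixSketch do
  # Hypothetical helper for illustration; not part of ex_data_sketch.
  @targets ~w(
    aarch64-apple-darwin
    x86_64-apple-darwin
    x86_64-unknown-linux-gnu
    x86_64-unknown-linux-musl
    aarch64-unknown-linux-gnu
    aarch64-unknown-linux-musl
    x86_64-pc-windows-msvc
    aarch64-pc-windows-msvc
  )
  @nif_versions ~w(2.16 2.17)

  # Every {target, nif_version} pair a release is expected to cover.
  def combinations do
    for target <- @targets, nif <- @nif_versions, do: {target, nif}
  end

  # Tarball name following the pattern documented for the build job.
  def artifact_name(version, {target, nif}) do
    "libex_data_sketch_nif-v#{version}-nif-#{nif}-#{target}.tar.gz"
  end
end

# 8 targets x 2 NIF versions
IO.inspect(length(MatrixSketch.combinations()))
# => 16
```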
Targets explicitly NOT covered (with rationale)
- FreeBSD / NetBSD / OpenBSD: GitHub Actions does not provide BSD runners; cross-compilation to BSD requires a libc shim that cross-rs does not bundle by default. Users on BSD must build from source with EX_DATA_SKETCH_BUILD=1. Volume is low enough to defer to v1.0+.
- riscv64gc-unknown-linux-gnu: too small a user base to justify the cross-build complexity. Users build from source.
- x86_64-pc-windows-gnu (MinGW): superseded by MSVC. MSVC is the Microsoft-supported default toolchain and aligns with what Erlang itself ships.
- Old macOS Intel (pre-10.15): xxhash-rust requires recent macOS SDKs; pre-10.15 is no longer supported by Apple and is not in our test matrix.
Release pipeline
The release pipeline is .github/workflows/release.yml. It runs on
every git tag v* push and has three jobs that execute in sequence:
1. build_release
Matrix-builds the NIF for all 8 targets × 2 NIF versions = 16 jobs. Each job:
- Checks out the repo at the tagged commit.
- Installs the Rust toolchain via dtolnay/rust-toolchain@stable.
- On non-native targets, installs the cross-compile target via rustup target add.
- Builds the NIF via philss/rustler-precompiled-action@v1.1.4, which under the hood:
  - runs cargo build --release --target <triple>;
  - optionally invokes cross for Linux musl / ARM64 targets;
  - packages the resulting .so / .dylib / .dll into libex_data_sketch_nif-v<VERSION>-nif-<NIF>-<TRIPLE>.tar.gz;
  - emits the file name and path as action outputs.
- Uploads the tarball as a GitHub Actions artifact.
2. release
Downloads all 16 build artifacts, flattens them into a single nifs/
directory, and creates a GitHub Release with all tarballs attached.
The release notes are auto-generated by softprops/action-gh-release.
3. publish_hex
Once the release exists:
- Checks out the repo.
- Installs Elixir + Erlang + Rust. The Rust toolchain is needed only to satisfy rustler_precompiled's compile-time validation, NOT to produce the released NIF. EX_DATA_SKETCH_BUILD=true is set in this job; it would normally trigger a source build, but here its purpose is to skip the artifact download check during checksum generation.
- Runs mix rustler_precompiled.download ExDataSketch.Nif --all --print, which fetches every artifact from the just-published GitHub Release and writes the SHA-256 checksums to checksum-Elixir.ExDataSketch.Nif.exs.
- Runs mix hex.publish --yes, which uploads the Hex package including the now-populated checksum file.
The end result is a Hex package that, when installed by a downstream
project on any of the 16 supported (target, NIF) combinations, will:
- read the checksum-Elixir.ExDataSketch.Nif.exs map at compile time;
- look up the SHA-256 for the host's triple + NIF version;
- download the matching .tar.gz from the GitHub Release;
- verify the SHA-256 against the checksum file;
- extract the .so / .dylib / .dll into priv/native/;
- load it via :erlang.load_nif/2 at module load time.
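
The lookup-and-verify steps can be illustrated with a small sketch. RustlerPrecompiled performs the real check internally; the module `ChecksumSketch` below is hypothetical, and it assumes checksum entries of the form "sha256:<hex digest>" keyed by artifact file name, which may not match the actual file layout exactly.

```elixir
defmodule ChecksumSketch do
  # Hypothetical helper; RustlerPrecompiled does the real verification.

  # Lowercase hex SHA-256 digest of a binary.
  def sha256_hex(data) when is_binary(data) do
    :crypto.hash(:sha256, data) |> Base.encode16(case: :lower)
  end

  # Compares downloaded bytes against the expected entry in the checksum map.
  # Assumes entries look like "sha256:<hex digest>".
  def verify(checksums, file_name, data) when is_map(checksums) do
    case Map.fetch(checksums, file_name) do
      {:ok, "sha256:" <> expected} ->
        if sha256_hex(data) == expected, do: :ok, else: {:error, :checksum_mismatch}

      {:ok, _other} ->
        {:error, :unsupported_checksum_format}

      :error ->
        {:error, :no_checksum_for_artifact}
    end
  end
end

# Usage with placeholder bytes standing in for a downloaded tarball.
data = "example tarball bytes"
checksums = %{"nif.tar.gz" => "sha256:" <> ChecksumSketch.sha256_hex(data)}

IO.inspect(ChecksumSketch.verify(checksums, "nif.tar.gz", data))
IO.inspect(ChecksumSketch.verify(checksums, "nif.tar.gz", "tampered"))
IO.inspect(ChecksumSketch.verify(checksums, "other.tar.gz", data))
```

The three calls cover the success case, a digest mismatch, and an artifact with no checksum entry, which correspond to the download failure causes listed under "Failure modes and recovery".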
Source-compile fallback
The RustlerPrecompiled setup in lib/ex_data_sketch/nif.ex is gated
on EX_DATA_SKETCH_SKIP_NIF at compile time:
unless System.get_env("EX_DATA_SKETCH_SKIP_NIF") in ["1", "true"] do
  use RustlerPrecompiled,
    otp_app: :ex_data_sketch,
    crate: "ex_data_sketch_nif",
    base_url: "https://github.com/thanos/ex_data_sketch/releases/download/v#{version}",
    version: version,
    nif_versions: ["2.16", "2.17"],
    targets: [...]
end

Two compile-time env vars influence this:
- EX_DATA_SKETCH_SKIP_NIF=true: skips use RustlerPrecompiled entirely. The NIF stubs (def xxhash3_64_nif(...), do: :erlang.nif_error(:not_loaded)) are the only code that gets loaded. Any call into the NIF raises :erlang.nif_error(:not_loaded). ExDataSketch.Hash.nif_available?/0 correctly returns false.
- EX_DATA_SKETCH_BUILD=true: see config/config.exs. Sets config :rustler_precompiled, :force_build, ex_data_sketch: true. This causes RustlerPrecompiled to invoke rustler and build the NIF from source instead of downloading the precompiled artifact.
The two flags are independent and intentionally so:
- SKIP_NIF is for fast iterative test cycles where the user does not need NIF-accelerated paths (CI's NIF-off matrix lane uses this).
- BUILD is for development on a target that has no precompiled artifact (e.g. FreeBSD, NetBSD), or for verifying that the source matches the precompiled artifact.
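
How the two flags combine can be sketched as a pure function. This is an illustration only: the module `NifModeSketch` is hypothetical, and two points are assumptions rather than documented behavior. First, EX_DATA_SKETCH_BUILD is assumed to accept the same "1" / "true" values as EX_DATA_SKETCH_SKIP_NIF. Second, SKIP_NIF is assumed to win when both are set, on the grounds that skipping use RustlerPrecompiled leaves nothing for force_build to act on.

```elixir
defmodule NifModeSketch do
  # Hypothetical helper that mirrors the decision described above.
  @truthy ["1", "true"]

  # Takes an env map (for example the result of System.get_env/0)
  # and returns the compile-time mode the flags would select.
  def mode(env) when is_map(env) do
    cond do
      # Assumption: skipping the NIF takes precedence over forcing a build.
      Map.get(env, "EX_DATA_SKETCH_SKIP_NIF") in @truthy -> :skip_nif
      # Assumption: BUILD accepts the same values as SKIP_NIF.
      Map.get(env, "EX_DATA_SKETCH_BUILD") in @truthy -> :build_from_source
      true -> :download_precompiled
    end
  end
end

IO.inspect(NifModeSketch.mode(%{}))
IO.inspect(NifModeSketch.mode(%{"EX_DATA_SKETCH_BUILD" => "1"}))
IO.inspect(NifModeSketch.mode(%{"EX_DATA_SKETCH_SKIP_NIF" => "true"}))
```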
Validating the contract
The contract between the precompiled setup and the user-facing API is
locked by test/ex_data_sketch/nif_availability_test.exs. It asserts:
- Hash.nif_available?/0 returns a stable boolean and is cached in :persistent_term.
- Hash.default_algorithm/0 is :xxhash3 when the NIF is loaded and :phash2 otherwise.
- The :available? flag of Hash.algorithm_info/1 reflects the NIF state for :xxhash3 and is true for :murmur3 and :phash2.
- Backend.Rust.available?/0 mirrors Hash.nif_available?/0.
- Backend.default/0 is Pure unless the application has been explicitly configured to use the Rust backend; the NIF is never silently selected as the default.
- The XXH3 wrapper raises ArgumentError when the NIF is unavailable (rather than silently falling back).
- The pure-Elixir Murmur3 path works without the NIF.
- The checksum file exists and is a valid Elixir map.
- The target list declared in nif.ex matches the expected matrix (developer-facing alignment guard between nif.ex and release.yml).
These tests run in both NIF-on and NIF-off CI lanes; the body of each
test branches on Hash.nif_available?/0.
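
The "stable boolean cached in :persistent_term" property can be illustrated with a small sketch. The module name, the cache key, and the probe function are all hypothetical; the actual implementation in ExDataSketch.Hash may be structured differently.

```elixir
defmodule AvailabilityCacheSketch do
  # Hypothetical module; the key and the probe function are illustrative.
  @key {__MODULE__, :nif_available?}

  # Evaluates `probe` on the first call, stores a strict boolean in
  # :persistent_term, and returns the stored value on later calls.
  def nif_available?(probe) when is_function(probe, 0) do
    case :persistent_term.get(@key, :unset) do
      :unset ->
        available? = probe.() == true
        :persistent_term.put(@key, available?)
        available?

      cached when is_boolean(cached) ->
        cached
    end
  end
end

# The second probe is never consulted because the first answer is cached.
IO.inspect(AvailabilityCacheSketch.nif_available?(fn -> false end))
IO.inspect(AvailabilityCacheSketch.nif_available?(fn -> true end))
```

Reads from :persistent_term are cheap, which suits a value that is written once and queried on every hash call; the trade-off is that the answer cannot change for the lifetime of the VM unless the key is erased.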
Reproducing the release locally
For maintainers verifying the pipeline:
# Build for the host's native target, source-compiled.
EX_DATA_SKETCH_BUILD=1 mix compile
# Verify the test suite under both modes. Use the dedicated aliases
# so the per-env rustler_precompiled state is reset automatically.
EX_DATA_SKETCH_BUILD=1 mix test.nif_on
EX_DATA_SKETCH_SKIP_NIF=true mix test.nif_off
# Dry-run a cross-build (Linux musl from macOS, using `cross`).
cd native/ex_data_sketch_nif
cross build --release --target aarch64-unknown-linux-musl
The full 16-artifact release matrix can only be exercised on GitHub Actions because some targets (Apple Silicon, Windows ARM64) cannot be cross-compiled from a Linux runner.
Why two aliases?
The force_build: true / false value in config/config.exs is read by
rustler_precompiled as a compile-time setting (it determines
whether to bake in the precompiled-download logic or the source-build
logic). When a maintainer flips EX_DATA_SKETCH_BUILD between local
runs, the runtime value disagrees with the previously-compiled
_build/<env>/lib/rustler_precompiled/ebin/ state and the BEAM aborts
startup with:
the application :rustler_precompiled has a different value set for path [:ex_data_sketch] inside key :force_build during runtime compared to compile time
The test.nif_on and test.nif_off aliases avoid this by running
mix deps.clean rustler_precompiled --build before mix test. CI sets
the env once per job and does not flip modes, so it does not need them.
Failure modes and recovery
"Precompiled NIF download failed"
Caused by:
- a target that is not yet in the matrix (most likely a new platform);
- a checksum mismatch (a corrupted upload, very rare);
- the GitHub Release artifact being deleted or renamed;
- network failure during mix deps.compile.
User remedy:
# Force source compilation.
EX_DATA_SKETCH_BUILD=1 mix deps.compile ex_data_sketch
This requires the user to have a working Rust toolchain. If Rust is not available, the user can also fall back to the pure backend:
EX_DATA_SKETCH_SKIP_NIF=true mix deps.compile ex_data_sketch
…and use the pure-Elixir paths (:phash2 or :murmur3 hash strategy
with Backend.Pure). The pure paths are ~15× slower than the NIF
paths (see hll_performance.md) but correct.
"Hex publish failed: checksum file empty"
Caused by mix rustler_precompiled.download --all --print not finding
the GitHub Release artifacts. The release job must complete
successfully before publish_hex runs; if a build target failed, the
release will still be created but the checksum-download step will see
a missing artifact for that target.
Maintainer remedy: re-run only the failed build_release matrix
entries, then re-run publish_hex manually.
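
Before re-running publish_hex, a maintainer could check which artifacts are still missing from the checksum map. The sketch below is a hypothetical helper, not part of the pipeline; the file names follow the pattern quoted in the build_release description and the digest values are placeholders. The real map could be loaded with Code.eval_file/1, assuming the checksum file evaluates to a plain map.

```elixir
defmodule ChecksumCoverageSketch do
  # Hypothetical maintainer check; not part of the release pipeline.

  # Returns the expected artifact names that have no entry in the checksum map.
  def missing(checksums, expected_names) when is_map(checksums) do
    Enum.reject(expected_names, &Map.has_key?(checksums, &1))
  end
end

# Example with two expected artifacts, one of which has no checksum entry.
expected = [
  "libex_data_sketch_nif-v0.8.0-nif-2.16-x86_64-unknown-linux-gnu.tar.gz",
  "libex_data_sketch_nif-v0.8.0-nif-2.17-x86_64-unknown-linux-gnu.tar.gz"
]

# Placeholder digest; a real entry would hold the artifact's SHA-256.
checksums = %{hd(expected) => "sha256:<digest>"}

IO.inspect(ChecksumCoverageSketch.missing(checksums, expected))
```

An empty result means every expected artifact has a checksum entry; any names returned point at the build_release matrix entries that need to be re-run.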
"Stale checksum file in git"
If a developer accidentally commits a populated
checksum-Elixir.ExDataSketch.Nif.exs from a local
mix rustler_precompiled.download, the next release will overwrite it
in the publish_hex step. Before a release, the file should remain %{} in
git; the release pipeline owns its content.
Future work (out of scope for v0.8.0)
- FreeBSD target. Requires a FreeBSD GitHub Actions runner (which GitHub does not provide) or a cross-compile pipeline using cross-rs with a custom FreeBSD libc image. Deferred to v0.10+.
- NIF 2.18+ support. Currently we ship 2.16 and 2.17 only. The next NIF API bump will require adding to the matrix.
- Reproducible builds. The current pipeline does not guarantee byte-identical artifacts across rebuilds. cargo build is mostly deterministic, but build outputs can embed absolute paths and timestamps. Closing this gap requires --remap-path-prefix and a frozen build environment. Out of scope for v0.8.0.
- SBOM / SLSA provenance. Generating a Software Bill of Materials and SLSA Level 3 provenance for each release artifact. The actions/attest-build-provenance action makes this easy; deferred to a v1.0 hardening pass.
- Mirror artifacts to a CDN. Currently all artifacts are served by GitHub Releases. For high-volume downstream installs, a CDN mirror (or Hex itself hosting the NIFs) would reduce latency.
References
- lib/ex_data_sketch/nif.ex: the use RustlerPrecompiled block.
- .github/workflows/release.yml: the release pipeline.
- checksum-Elixir.ExDataSketch.Nif.exs: the SHA-256 catalog.
- mix.exs package/0: the Hex package file list.
- config/config.exs: the force_build toggle.
- test/ex_data_sketch/nif_availability_test.exs: the contract tests.
- philss/rustler_precompiled
- philss/rustler-precompiled-action