This document explains how ex_data_sketch ships its Rust NIF as a set of precompiled binary artifacts, why this matters for adoption, and how the v0.8.0 release pipeline produces those artifacts.

Why precompiled NIFs matter

A Rust NIF is a .so / .dylib / .dll file that the BEAM dynamically loads at runtime. Building it requires:

  • a working Rust toolchain (rustc, cargo, the target's standard library);
  • a working C linker (cc, link.exe, etc.) for system glue;
  • network access so that cargo can download crate dependencies;
  • a non-trivial amount of CPU time (~30s on a modern laptop for a clean release build).

For a typical Elixir application that adds ex_data_sketch as a dependency, this means:

  • developers cannot run mix deps.get && mix compile and have it Just Work unless they install Rust first;
  • CI pipelines need to either bake Rust into the base image or pay the toolchain install cost on every run;
  • Docker layer caching for mix deps.compile is invalidated whenever the dependency tree changes.

The RustlerPrecompiled library (and the rustler-precompiled-action GitHub Action) solves this by:

  1. Building the NIF on every supported platform at release time and uploading each result as a GitHub Release asset.
  2. At mix deps.compile time on the downstream side, downloading the pre-built .tar.gz matching the host's target-triple + nif-version from the GitHub Release URL.
  3. Verifying the downloaded artifact's SHA-256 against a checksum file shipped in the Hex package.
  4. Building from source instead when the user explicitly opts in via EX_DATA_SKETCH_BUILD=true, for example because no precompiled artifact exists for the host.

The end result for downstream users: mix deps.get && mix compile works without a Rust toolchain, and every downloaded artifact is verified against a checksum shipped in the Hex package.
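
The verification in step 3 can be illustrated with a short sketch. It assumes the checksum file maps each artifact file name to a string of the form "sha256:<hex digest>"; the module and function names are hypothetical and are not RustlerPrecompiled's API.

```elixir
defmodule ChecksumSketch do
  # Hypothetical helper for illustration only.

  # Lowercase hex SHA-256 digest of a binary.
  def sha256_hex(binary) when is_binary(binary) do
    :crypto.hash(:sha256, binary) |> Base.encode16(case: :lower)
  end

  # Compares the digest of `contents` with the entry stored for `file_name`.
  # Assumes entries look like "sha256:<hex digest>".
  def verify(file_name, contents, checksums) do
    case Map.fetch(checksums, file_name) do
      {:ok, "sha256:" <> expected} ->
        if sha256_hex(contents) == expected do
          :ok
        else
          {:error, :checksum_mismatch}
        end

      {:ok, _other} ->
        {:error, :unsupported_checksum_format}

      :error ->
        {:error, :missing_checksum}
    end
  end
end
```

A tarball whose bytes differ from what was hashed at release time yields {:error, :checksum_mismatch}, which is the case the "checksum mismatch" failure mode later in this document refers to.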

Platform matrix (v0.8.0)

Target triple               OS / Architecture          Runner        Cross?
aarch64-apple-darwin        macOS 11+ (Apple Silicon)  macos-14      no
x86_64-apple-darwin         macOS 10.15+ (Intel)       macos-14      no
x86_64-unknown-linux-gnu    glibc Linux (x86_64)       ubuntu-22.04  no
x86_64-unknown-linux-musl   musl Linux (Alpine, etc.)  ubuntu-22.04  yes
aarch64-unknown-linux-gnu   glibc Linux (ARM64)        ubuntu-22.04  yes
aarch64-unknown-linux-musl  musl Linux (ARM64)         ubuntu-22.04  yes
x86_64-pc-windows-msvc      Windows 10+ (x86_64)       windows-2022  no
aarch64-pc-windows-msvc     Windows 11 (ARM64)         windows-2022  no

Two NIF API versions are produced per target (2.16 and 2.17), giving 16 artifacts per release.
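
The arithmetic behind the 16 artifacts can be spelled out as a small sketch. The target triples and NIF versions are copied from this section; the module name is hypothetical.

```elixir
defmodule MatrixSketch do
  # Target triples from the platform matrix in this document.
  @targets [
    "aarch64-apple-darwin",
    "x86_64-apple-darwin",
    "x86_64-unknown-linux-gnu",
    "x86_64-unknown-linux-musl",
    "aarch64-unknown-linux-gnu",
    "aarch64-unknown-linux-musl",
    "x86_64-pc-windows-msvc",
    "aarch64-pc-windows-msvc"
  ]

  # NIF API versions produced per target.
  @nif_versions ["2.16", "2.17"]

  # Every {target, nif_version} pair a release is expected to cover.
  def combinations do
    for target <- @targets, nif <- @nif_versions, do: {target, nif}
  end
end

# 8 targets x 2 NIF versions
IO.puts(length(MatrixSketch.combinations()))
# prints 16
```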

Targets explicitly NOT covered (with rationale)

  • FreeBSD / NetBSD / OpenBSD — GitHub Actions does not provide BSD runners; cross-compilation to BSD requires a libc shim that cross-rs does not bundle by default. Users on BSD must build from source with EX_DATA_SKETCH_BUILD=1. Volume is low enough to defer to v1.0+.
  • riscv64gc-unknown-linux-gnu — too small a user base to justify the cross-build complexity. Users build from source.
  • x86_64-pc-windows-gnu (MinGW) — superseded by MSVC. MSVC is the Microsoft-supported default toolchain and aligns with what Erlang itself ships.
  • Old macOS Intel (pre-10.15): xxhash-rust requires recent macOS SDKs; pre-10.15 is no longer supported by Apple and is not in our test matrix.

Release pipeline

The release pipeline is defined in .github/workflows/release.yml. It runs whenever a git tag matching v* is pushed and has three jobs that execute in sequence:

1. build_release

Matrix-builds the NIF for all 8 targets × 2 NIF versions = 16 jobs. Each job:

  1. Checks out the repo at the tagged commit.
  2. Installs the Rust toolchain via dtolnay/rust-toolchain@stable.
  3. On non-native targets, installs the cross-compile target via rustup target add.
  4. Builds the NIF via philss/rustler-precompiled-action@v1.1.4, which under the hood:
    • runs cargo build --release --target <triple>;
    • optionally invokes cross for Linux musl / ARM64 targets;
    • packages the resulting .so / .dylib / .dll into libex_data_sketch_nif-v<VERSION>-nif-<NIF>-<TRIPLE>.tar.gz;
    • emits the file name and path as action outputs.
  5. Uploads the tarball as a GitHub Actions artifact.
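
As a rough illustration of the naming in step 4, the sketch below formats a tarball name and its download URL. It follows the pattern quoted above and the base_url shown in nif.ex; the helper names are hypothetical, and the names the action actually emits may differ in detail, so treat the GitHub Release page as authoritative.

```elixir
defmodule ArtifactNameSketch do
  # Hypothetical helpers; they mirror the pattern quoted in this document,
  # not the source of rustler-precompiled-action.
  @base "https://github.com/thanos/ex_data_sketch/releases/download"

  # Tarball name for one {version, nif_version, target} combination.
  def tarball(version, nif_version, target) do
    "libex_data_sketch_nif-v#{version}-nif-#{nif_version}-#{target}.tar.gz"
  end

  # Release asset URL: base URL, then the version tag, then the tarball name.
  def url(version, nif_version, target) do
    "#{@base}/v#{version}/#{tarball(version, nif_version, target)}"
  end
end

IO.puts(ArtifactNameSketch.tarball("0.8.0", "2.17", "x86_64-unknown-linux-gnu"))
# prints libex_data_sketch_nif-v0.8.0-nif-2.17-x86_64-unknown-linux-gnu.tar.gz
```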

2. release

Downloads all 16 build artifacts, flattens them into a single nifs/ directory, and creates a GitHub Release with all tarballs attached. The release notes are auto-generated by softprops/action-gh-release.

3. publish_hex

Once the release exists:

  1. Checks out the repo.
  2. Installs Elixir, Erlang, and Rust. The published artifacts come from build_release, so the Rust toolchain is not used here to build them; it is needed only to satisfy rustler_precompiled's compile-time validation. EX_DATA_SKETCH_BUILD=true, which would normally trigger a source build, is set in this job only to skip the artifact download check while the checksums are being generated.
  3. Runs mix rustler_precompiled.download ExDataSketch.Nif --all --print which fetches every artifact from the just-published GitHub Release and writes the SHA-256 checksums to checksum-Elixir.ExDataSketch.Nif.exs.
  4. Runs mix hex.publish --yes, which uploads the Hex package including the now-populated checksum file.

The end result is a Hex package that, when installed by a downstream project on any of the 16 supported (target, NIF) combinations, will:

  • read the checksum-Elixir.ExDataSketch.Nif.exs map at compile time;
  • look up the SHA-256 for the host's triple + NIF version;
  • download the matching .tar.gz from the GitHub Release;
  • verify the SHA-256 against the checksum file;
  • extract the .so / .dylib / .dll into priv/native/;
  • load it via :erlang.load_nif/2 at module load time.
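
The "host's NIF version" lookup can be sketched as follows. This is an assumption about resolver behaviour rather than something stated above: it picks the highest shipped NIF version that shares the host's major version and is not newer than the host's. The module and function names are hypothetical.

```elixir
defmodule NifVersionSketch do
  # NIF versions shipped per release, according to this document.
  @shipped ["2.16", "2.17"]

  # NIF version of the running VM as a string such as "2.17".
  def host_version do
    :erlang.system_info(:nif_version) |> List.to_string()
  end

  # Assumed rule: same major version, highest minor not above the host's.
  def pick(host \\ host_version(), shipped \\ @shipped) do
    {host_major, host_minor} = parse(host)

    compatible =
      Enum.filter(shipped, fn candidate ->
        {major, minor} = parse(candidate)
        major == host_major and minor <= host_minor
      end)

    case compatible do
      [] -> :error
      list -> {:ok, Enum.max_by(list, &parse/1)}
    end
  end

  # "2.17" becomes {2, 17}; any further components are ignored.
  defp parse(version) do
    [major, minor | _] =
      version |> String.split(".") |> Enum.map(&String.to_integer/1)

    {major, minor}
  end
end
```

Under this assumed rule a host reporting an older NIF version than 2.16 gets :error, which corresponds to the case where no precompiled artifact matches and a source build is the remaining option.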

Source-compile fallback

The RustlerPrecompiled setup in lib/ex_data_sketch/nif.ex is gated on EX_DATA_SKETCH_SKIP_NIF at compile time:

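# Assumption (not shown in the original excerpt): `version` is bound earlier
# in the module to the package version from mix.exs.
version = Mix.Project.config()[:version]

unless System.get_env("EX_DATA_SKETCH_SKIP_NIF") in ["1", "true"] do
  use RustlerPrecompiled,
    otp_app: :ex_data_sketch,
    crate: "ex_data_sketch_nif",
    base_url: "https://github.com/thanos/ex_data_sketch/releases/download/v#{version}",
    version: version,
    nif_versions: ["2.16", "2.17"],
    targets: [...]
end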

Two compile-time env vars influence this:

  • EX_DATA_SKETCH_SKIP_NIF=true — skips use RustlerPrecompiled entirely. The NIF stubs (def xxhash3_64_nif(...), do: :erlang.nif_error(:not_loaded)) are the only code that gets loaded. Any call into the NIF raises :erlang.nif_error(:not_loaded). ExDataSketch.Hash.nif_available?/0 correctly returns false.
  • EX_DATA_SKETCH_BUILD=true — see config/config.exs. Sets config :rustler_precompiled, :force_build, ex_data_sketch: true. This causes RustlerPrecompiled to invoke rustler and build the NIF from source instead of downloading the precompiled artifact.

The two flags are independent and intentionally so:

  • SKIP_NIF is for fast iterative test cycles where the user does not need NIF-accelerated paths (CI's NIF-off matrix lane uses this).
  • BUILD is for development on a target that has no precompiled artifact (e.g. FreeBSD, NetBSD), or for checking that a source build behaves the same as the precompiled artifact.
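
How the two flags combine can be summarised in a small decision sketch. It assumes both variables accept "1" and "true" (the document uses both spellings for EX_DATA_SKETCH_BUILD, and nif.ex checks both for EX_DATA_SKETCH_SKIP_NIF); the module, function names, and result atoms are hypothetical.

```elixir
defmodule EnvFlagSketch do
  # Values treated as "on"; assumed to apply to both variables.
  @truthy ["1", "true"]

  # True when `name` is set to a truthy value in the given environment map.
  def enabled?(name, env \\ System.get_env()) do
    Map.get(env, name) in @truthy
  end

  # SKIP_NIF wins because `use RustlerPrecompiled` is skipped entirely,
  # so the force_build setting has nothing left to act on.
  def mode(env \\ System.get_env()) do
    cond do
      enabled?("EX_DATA_SKETCH_SKIP_NIF", env) -> :stubs_only
      enabled?("EX_DATA_SKETCH_BUILD", env) -> :source_build
      true -> :precompiled_download
    end
  end
end
```

With neither variable set, the result is :precompiled_download, which is the path downstream users normally take.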

Validating the contract

The contract between the precompiled setup and the user-facing API is locked by test/ex_data_sketch/nif_availability_test.exs. It asserts:

  1. Hash.nif_available?/0 returns a stable boolean and is cached in :persistent_term.
  2. Hash.default_algorithm/0 is :xxhash3 when the NIF is loaded and :phash2 otherwise.
  3. Hash.algorithm_info/1 :available? flag reflects the NIF state for :xxhash3 and is true for :murmur3 and :phash2.
  4. Backend.Rust.available?/0 mirrors Hash.nif_available?/0.
  5. Backend.default/0 is Pure unless the application has been explicitly configured to use the Rust backend — the NIF is never silently selected as the default.
  6. The XXH3 wrapper raises ArgumentError when the NIF is unavailable (rather than silently falling back).
  7. The pure-Elixir Murmur3 path works without the NIF.
  8. The checksum file exists and is a valid Elixir map.
  9. The target list declared in nif.ex matches the expected matrix (developer-facing alignment guard between nif.ex and release.yml).

These tests run in both NIF-on and NIF-off CI lanes; the body of each test branches on Hash.nif_available?/0.
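
Items 1 and 2 can be illustrated with a stand-in module. This is a minimal sketch, not the ExDataSketch.Hash implementation: the module name, the probe argument, and the reset helper are hypothetical, and only the :persistent_term caching and the :xxhash3 / :phash2 choice are taken from the list above.

```elixir
defmodule AvailabilitySketch do
  # Hypothetical stand-in for ExDataSketch.Hash.
  @key {__MODULE__, :nif_available?}

  # Runs `probe` on the first call, caches the boolean in :persistent_term,
  # and returns the cached value on every later call.
  def nif_available?(probe) when is_function(probe, 0) do
    case :persistent_term.get(@key, :unset) do
      :unset ->
        available? = probe.() == true
        :persistent_term.put(@key, available?)
        available?

      cached ->
        cached
    end
  end

  # :xxhash3 when the NIF is loaded, :phash2 otherwise.
  def default_algorithm(nif_available?) do
    if nif_available?, do: :xxhash3, else: :phash2
  end

  # Clears the cached value; intended for tests only.
  def reset, do: :persistent_term.erase(@key)
end
```

Because the first answer is cached, a later probe that would return a different value does not change the result, which is what "stable boolean" means in item 1.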

Reproducing the release locally

For maintainers verifying the pipeline:

# Build for the host's native target, source-compiled.
EX_DATA_SKETCH_BUILD=1 mix compile

# Verify the test suite under both modes. Use the dedicated aliases
# so the per-env rustler_precompiled state is reset automatically.
EX_DATA_SKETCH_BUILD=1 mix test.nif_on
EX_DATA_SKETCH_SKIP_NIF=true mix test.nif_off

# Dry-run a cross-build (Linux musl from macOS, using `cross`).
cd native/ex_data_sketch_nif
cross build --release --target aarch64-unknown-linux-musl

The full 16-artifact release matrix can only be exercised on GitHub Actions because some targets (Apple Silicon, Windows ARM64) cannot be cross-compiled from a Linux runner.

Why two aliases?

The force_build: true / false value in config/config.exs is read by rustler_precompiled as a compile-time setting (it determines whether to bake in the precompiled-download logic or the source-build logic). When a maintainer flips EX_DATA_SKETCH_BUILD between local runs, the runtime value disagrees with the previously-compiled _build/<env>/lib/rustler_precompiled/ebin/ state and the BEAM aborts startup with:

the application :rustler_precompiled has a different value set for path [:ex_data_sketch] inside key :force_build during runtime compared to compile time

The test.nif_on and test.nif_off aliases avoid this by running mix deps.clean rustler_precompiled --build before mix test. CI sets the env once per job and does not flip modes, so it does not need them.
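
For orientation, the aliases might be declared along these lines. This is a sketch based only on the description above; the project's real mix.exs may define them differently, and the wrapper module exists only to keep the example self-contained.

```elixir
defmodule AliasesSketch do
  # Hypothetical shape of the aliases/0 list that mix.exs would return.
  # Each alias first drops the compiled rustler_precompiled build so the
  # force_build value is re-read, then runs the test suite.
  def aliases do
    [
      "test.nif_on": ["deps.clean rustler_precompiled --build", "test"],
      "test.nif_off": ["deps.clean rustler_precompiled --build", "test"]
    ]
  end
end
```

The aliases themselves do not set any environment variables; EX_DATA_SKETCH_BUILD or EX_DATA_SKETCH_SKIP_NIF is still supplied on the command line, as in the commands shown under "Reproducing the release locally".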

Failure modes and recovery

"Precompiled NIF download failed"

Caused by:

  • a target that is not yet in the matrix (most likely a new platform);
  • a checksum mismatch (a corrupted upload, very rare);
  • the GitHub Release artifact being deleted or renamed;
  • network failure during mix deps.compile.

User remedy:

# Force source compilation.
EX_DATA_SKETCH_BUILD=1 mix deps.compile ex_data_sketch

This requires the user to have a working Rust toolchain. If Rust is not available, the user can also fall back to the pure backend:

EX_DATA_SKETCH_SKIP_NIF=true mix deps.compile ex_data_sketch

…and use the pure-Elixir paths (:phash2 or :murmur3 hash strategy with Backend.Pure). The pure paths are ~15× slower than the NIF paths (see hll_performance.md) but correct.

"Hex publish failed: checksum file empty"

Caused by mix rustler_precompiled.download --all --print not finding the GitHub Release artifacts. The release job must complete successfully before publish_hex runs; if a build target failed, the release will still be created but the checksum-download step will see a missing artifact for that target.

Maintainer remedy: re-run only the failed build_release matrix entries, then re-run publish_hex manually.
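
Before re-running publish_hex, a maintainer could check which artifacts still lack a checksum entry. The following is a minimal sketch, assuming the checksum file evaluates to a plain map keyed by artifact file name; the module and function names are hypothetical.

```elixir
defmodule ChecksumCoverageSketch do
  # Evaluates a checksum file and checks that it yields a map.
  def load(path) do
    {checksums, _bindings} = Code.eval_file(path)

    if is_map(checksums) do
      {:ok, checksums}
    else
      {:error, :not_a_map}
    end
  end

  # Expected artifact names that have no entry in the checksum map.
  def missing(checksums, expected_names) do
    Enum.reject(expected_names, &Map.has_key?(checksums, &1))
  end
end
```

An empty result from missing/2 for the full list of expected artifact names suggests the checksum file covers every build; any names returned point at the matrix entries that need to be rebuilt.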

"Stale checksum file in git"

If a developer accidentally commits a populated checksum-Elixir.ExDataSketch.Nif.exs produced by a local mix rustler_precompiled.download run, the publish_hex step of the next release will overwrite it. Before a release, the file should remain %{} in git; the release pipeline owns its content.

Future work (out of scope for v0.8.0)

  • FreeBSD target. Requires a FreeBSD GitHub Actions runner (which GitHub does not provide) or a cross-compile pipeline using cross-rs with a custom FreeBSD libc image. Deferred to v0.10+.
  • NIF 2.18+ support. Currently we ship 2.16 and 2.17 only. The next NIF API bump will require adding to the matrix.
  • Reproducible builds. The current pipeline does not guarantee byte-identical artifacts across rebuilds. cargo build is mostly deterministic, but the resulting binaries can embed absolute paths and, depending on the platform linker, timestamps. Closing this gap requires --remap-path-prefix and a frozen build environment.
  • SBOM / SLSA provenance. Generating a Software Bill of Materials and SLSA Level 3 provenance for each release artifact. The actions/attest-build-provenance action makes this easy; deferred to a v1.0 hardening pass.
  • Mirror artifacts to a CDN. Currently all artifacts are served by GitHub Releases. For high-volume downstream installs, a CDN mirror (or Hex itself hosting the NIFs) would reduce latency.

References

  • lib/ex_data_sketch/nif.ex — the use RustlerPrecompiled block.
  • .github/workflows/release.yml — the release pipeline.
  • checksum-Elixir.ExDataSketch.Nif.exs — the SHA-256 catalog.
  • mix.exs package/0 — the Hex package file list.
  • config/config.exs — the force_build toggle.
  • test/ex_data_sketch/nif_availability_test.exs — the contract tests.
  • philss/rustler_precompiled
  • philss/rustler-precompiled-action