How to upgrade the llama.cpp submodule and publish a new release.

Prerequisites

  • Elixir 1.18+, Erlang/OTP 27+
  • cmake
  • A GGUF model file for testing (e.g. Qwen3.5-0.8B)
  • An embedding model file for embedding tests (e.g. Qwen3-Embedding-0.6B)

1. Update the submodule

# Fetch latest upstream commits
git -C vendor/llama.cpp fetch origin

# Check what's new since the current pin
git -C vendor/llama.cpp log --oneline HEAD..origin/master

# Checkout the target commit
git -C vendor/llama.cpp checkout <commit-hash>

2. Check API compatibility

Before building, verify the llama.cpp APIs used by the NIF haven't changed:

# Diff the public header between old and new commits
git -C vendor/llama.cpp diff <old-commit>..<new-commit> -- include/llama.h

# Diff common headers used by the NIF
git -C vendor/llama.cpp diff <old-commit>..<new-commit> -- common/chat.h
git -C vendor/llama.cpp diff <old-commit>..<new-commit> -- common/json-schema-to-grammar.h

The NIF uses these key APIs (grep llama_nif.cpp for the full list):

  • llama_model_*, llama_context_*, llama_vocab_* — model/context/vocab management
  • llama_tokenize, llama_detokenize, llama_token_to_piece — tokenization
  • llama_batch_*, llama_decode — inference
  • llama_sampler_* — sampling chain
  • llama_memory_* — KV cache / memory management
  • llama_get_embeddings_*, llama_pooling_type — embeddings
  • llama_chat_apply_template — legacy chat templates
  • common_chat_templates_init, common_chat_templates_apply — Jinja chat templates
  • json_schema_to_grammar — grammar generation

If any signatures changed, update c_src/llama_cpp_ex/llama_nif.cpp and/or llama_nif.h.

3. Build and test

# Bump version first (so it builds from source instead of downloading precompiled)
# Edit mix.exs @version

# Clean build
mix clean && mix compile

# Run full test suite
LLAMA_MODEL_PATH=~/Downloads/Qwen3.5-0.8B-UD-Q4_K_XL.gguf \
LLAMA_EMBEDDING_MODEL_PATH=~/Downloads/Qwen3-Embedding-0.6B-f16.gguf \
mix test

# Verify formatting and types
mix format --check-formatted
mix dialyzer

4. Update version and changelog

  1. mix.exs line 40: bump @version (e.g. "0.6.5""0.6.6")
  2. CHANGELOG.md: add a new ## vX.Y.Z section at the top with:
    • The submodule commit range and count
    • Notable changes categorized by subsystem (follow existing format)

To list commits for the changelog:

git -C vendor/llama.cpp log --oneline <old-commit>..<new-commit>

5. Commit

git add vendor/llama.cpp mix.exs CHANGELOG.md
git commit -m "Bump llama.cpp to <short-hash>, release vX.Y.Z"

6. Tag and push

git tag vX.Y.Z
git push origin master
git push origin vX.Y.Z

The tag push triggers the precompile workflow (.github/workflows/precompile.yml) which:

  1. Builds precompiled NIFs for macOS (Metal) and Linux (CPU) across OTP 27 (NIF 2.17) and OTP 29 (NIF 2.18)
  2. Uploads .tar.gz artifacts to the GitHub release
  3. Runs mix elixir_make.checksum --all --ignore-unavailable and auto-commits checksum.exs to master

7. Publish to Hex

After the CI checksum commit lands:

git pull origin master   # get the updated checksum.exs
mix hex.publish

Troubleshooting

Compilation errors after upgrade

  • Missing function: check if the API was renamed or removed in include/llama.h
  • Struct field changes: check llama_model_params, llama_context_params, llama_batch structs
  • Common library changes: common/chat.h is the most volatile dependency — check common_chat_templates_inputs and common_chat_msg

Build downloads precompiled binary instead of compiling from source

The precompiler fetches binaries by version. If you haven't bumped the version, it'll find and use the old binary. Bump @version in mix.exs before running mix compile.

CI precompile fails

Check .github/workflows/precompile.yml for the build matrix. Common issues:

  • New llama.cpp dependencies not available in CI runners
  • CMake flag changes requiring updates to the Makefile