All notable changes to whisper_cpp will be documented in this file. The format follows Keep a Changelog; this project adheres to Semantic Versioning.

[0.4.0] - 2026-06-11

Added

  • Built-in voice activity detection: pass :vad_model_path (a silero GGML model from huggingface.co/ggml-org/whisper-vad) to strip silence before the encoder, with :vad_threshold, :vad_min_speech_ms, :vad_min_silence_ms, and :vad_speech_pad_ms tuning options. Audio with no detected speech returns an empty transcription. The NIF runs the VAD itself and remaps all timestamps back to the original timeline - whisper.cpp's own VAD hook is dead code on the state-based API whisper-rs uses.

Changed

  • Native whisper.cpp/GGML logging is filtered to warnings and errors; the dozens of info lines per model load no longer reach stderr. WHISPER_CPP_NATIVE_LOG accepts none, error, warn (default), info, and debug. VAD contexts stay per-call: loading the silero model costs about a millisecond, and a shared context would serialise detection across concurrent transcribes.
  • Integer options are bounded to u32 and the VAD millisecond knobs to two minutes, returning :invalid_request instead of raising or overflowing inside the detector. Option validation now also runs before the sub-0.3 s transcribe_slice short-circuit, and an abort raised during the VAD pass is honoured before the encoder starts.
  • :duration_ms must be at least 1. 0 previously meant "whole audio" without VAD but "empty window" with it; the ambiguity is rejected as :invalid_request.
  • Passing :vad_threshold, :vad_min_speech_ms, :vad_min_silence_ms, or :vad_speech_pad_ms without :vad_model_path returns :invalid_request instead of being silently ignored.
  • Buffers above i32::MAX samples (about 37 hours) are rejected instead of silently truncating at the FFI boundary.

Fixed

  • Multi-segment transcriptions no longer contain doubled spaces in Transcription.text (whisper segments carry their own leading space; the join added another). Space-free scripts no longer gain spurious spaces.
  • :temperature is validated to 0.0..1.0 (above 1.0 whisper.cpp's retry ladder is empty and the decoder state undefined), :n_threads to GGML's 512-thread abort threshold, and :beam_size/:best_of to whisper.cpp's 8-decoder limit - all returning :invalid_request instead of native crashes or opaque inference errors.
  • :best_of defaults to 5, matching whisper.cpp, and now also applies to temperature-fallback passes in beam-search mode.
  • Sub-0.3 s transcribe_slice windows validate options, buffer bounds, and alignment before returning the documented empty transcription, and a window of exactly 0.3 s transcribes instead of being dropped by float subtraction error. The empty result keeps the pinned language.
  • translate: true on English-only models returns :invalid_request instead of being silently ignored; use_gpu: false wins over a conflicting :device; invalid UTF-8 string options and non-keyword option lists return :invalid_request instead of raising.
  • Native error messages no longer leak the internal "kind=..." routing tag; results with no decoded segments echo the requested language instead of fabricating "en"; progress percentages are clamped to the documented 0..100.
  • Pcm.slice/4 rounds sample positions instead of truncating, so millisecond-precise windows keep their last sample.
  • Builds with two GPU features fail at compile time instead of silently picking one; unknown WHISPER_CPP_VARIANT values fail the build instead of falling back to the CPU artefact.
  • :abort_handle and :progress_pid callbacks no longer leak memory per call: the vendored whisper-rs (branch vendor/whisper-rs-0.16.0-patched) fixes the abort-trampoline type confusion and the callback closure leak at the source (upstream issues 277/271, fix PR 278), replacing the downstream pre-boxing and sentinel workarounds. The progress sender thread now exits via natural channel close. The same vendor patch stops set_language, set_initial_prompt, and the VAD path from leaking one CString per call.

[0.3.1] - 2026-06-11

Changed

  • Vendored whisper.cpp 1.8.3 -> 1.8.6. whisper-rs has no release vendoring anything newer, so whisper-rs-sys is patched via [patch.crates-io] to this repo's vendor/whisper-rs-sys-1.8.6 branch - the published whisper-rs-sys 0.15.0 with only its whisper.cpp submodule bumped. The patch applies to source builds and the precompiled NIF artefacts alike, and is dropped as soon as upstream re-vendors (see issue #18).
  • CI: sccache-action v0.0.9 -> v0.0.10 (Node 24; GitHub retires the Node 20 runtime on 2026-06-16).

[0.3.0] - 2026-06-11

Changed

  • rustler 0.37 → 0.38 (Rust crate and optional Hex package). Additive upstream release; no NIF API changes needed. The vendored whisper.cpp stays at 1.8.3 until whisper-rs ships a release vendoring something newer - upstream's latest (0.16.0, 2026-03-12) predates whisper.cpp 1.8.4.
  • language: nil now actually auto-detects on multilingual models, as the docs always claimed. Previously nil silently fell through to whisper.cpp's forced-"en" default, decoding non-English audio as English. English-only models resolve nil/"auto" to "en".
  • :language is validated against whisper.cpp's language table. Unknown codes - including BCP 47 tags such as "de-CH" - return :invalid_request instead of silently corrupting the decoder prompt with an invalid language token. Passing a non-English language to an English-only model is rejected the same way instead of being silently ignored.
  • The :beam_size and :best_of docs state the real defaults: greedy decoding with best_of: 1. The docs previously claimed a beam-search default of 5 that no code path produced.

Fixed

  • :abort_handle cancellation works now. The abort callback is passed to whisper-rs as a boxed trait object so the trampoline polls the real flag; the bare closure was reinterpreted memory (out-of-bounds reads) and the flag was never consulted, so cancellation silently did nothing.
  • :progress_pid no longer leaks one OS thread per call. The progress sender thread is shut down explicitly after inference; the previous design waited for a channel close that whisper-rs's leaked callback closure could never trigger.
  • :word_timestamps no longer corrupts multibyte UTF-8. Token bytes are accumulated per word and converted once, so characters split across BPE tokens (umlauts and most non-Latin scripts) survive instead of turning into replacement characters.
  • Dropping the last reference to a loaded model frees the whisper context on a detached thread instead of the garbage-collecting BEAM scheduler, which a multi-gigabyte free would stall.
  • {:pcm_f32, _} buffers containing NaN or infinity samples are rejected with :invalid_request instead of being fed to inference.

[0.2.0] - 2026-05-20

Added

  • WhisperCpp.load_model/2: GGML/GGUF model loading with :cpu, :cuda, :hipblas, :vulkan, :metal, :coreml, :intel_sycl, and :auto device selection.
  • WhisperCpp.transcribe/3: full whisper.cpp transcription on {:pcm_f32, binary} buffers (little-endian f32 mono at 16 kHz) with segment, token, and optional per-word output.
  • WhisperCpp.transcribe_slice/4: time-shifted per-slice transcription that reuses one decoded PCM buffer.
  • WhisperCpp.AbortHandle: cooperative cancellation. Pass an %AbortHandle{} via :abort_handle and call AbortHandle.abort/1 from another process to stop in-flight inference; the partial transcription produced before the abort is returned.
  • :progress_pid transcribe option: receive {:whisper_progress, pct} messages as work advances; duplicate percentages are coalesced.
  • :word_timestamps option for per-word timing.
  • WhisperCpp.available_devices/0: backend introspection for the loaded NIF artefact.
  • WhisperCpp.Pcm: PCM slicing helpers. Audio file decoding is intentionally out of scope; callers decode upstream (ffmpeg, Bumblebee, ...) and share one decoded PCM buffer across stages.
  • Rustler NIF built on whisper-rs, with cargo features for cuda, hipblas, vulkan, metal, coreml, intel-sycl, openblas, and openmp. Inference does not serialise across processes sharing one loaded model.
  • Precompiled NIF artefacts via rustler_precompiled for x86_64 / aarch64 Linux (CPU, CUDA, hipBLAS variants) and aarch64 macOS (Metal).