WhisperCpp.Segment (whisper_cpp v0.2.0)

One segment of a whisper.cpp transcription.

start and end are times in seconds within the input audio. tokens is the list of raw text-token IDs, with timestamp tokens stripped. no_speech_prob is whisper.cpp's no_speech probability for the segment. avg_logprob is the segment's average token log probability; filter on it (for example, drop segments with avg_logprob < -1.0) to reject low-confidence hallucinations. words is nil unless :word_timestamps was set on the transcribe call; when present, it holds one %WhisperCpp.Word{} per word, each with its own time span.
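The avg_logprob filter described above can be sketched as follows. This is a minimal illustration, not part of whisper_cpp: SegmentFilter and confident/2 are hypothetical names, plain maps stand in for %WhisperCpp.Segment{} so the snippet runs without the library, and the sample values are made up.

```elixir
defmodule SegmentFilter do
  # Hypothetical helper for illustration; whisper_cpp does not ship this module.

  # Keeps segments whose avg_logprob is at or above min_logprob.
  # The -1.0 default mirrors the example threshold in the field description.
  def confident(segments, min_logprob \\ -1.0) do
    Enum.filter(segments, fn seg -> seg.avg_logprob >= min_logprob end)
  end
end

# Made-up sample data. Plain maps are used here because field access
# (seg.avg_logprob) works the same way on maps and on structs.
segments = [
  %{text: "Hello there.", avg_logprob: -0.21, no_speech_prob: 0.02},
  %{text: "Thanks for watching!", avg_logprob: -1.37, no_speech_prob: 0.71}
]

kept = SegmentFilter.confident(segments)
```

With the sample data, only the first segment survives the default threshold. The right cutoff depends on the model and the audio, so treat -1.0 as a starting point rather than a fixed rule.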

Summary

Types

t()

@type t() :: %WhisperCpp.Segment{
  avg_logprob: float(),
  end: float(),
  no_speech_prob: float(),
  start: float(),
  text: String.t(),
  tokens: [non_neg_integer()],
  words: [WhisperCpp.Word.t()] | nil
}
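Because words may be nil, code that consumes a segment usually needs a clause for each case. The sketch below shows one way to handle that with pattern matching, along with a simple timestamped rendering of a segment. SegmentFormat, line/1, and word_count/1 are hypothetical names, and plain maps with the same keys as the struct stand in for %WhisperCpp.Segment{}.

```elixir
defmodule SegmentFormat do
  # Hypothetical helper for illustration; whisper_cpp does not ship this module.

  # Renders a segment as "[start - end] text", with times in seconds.
  def line(%{start: start, end: stop, text: text}) do
    "[#{secs(start)} - #{secs(stop)}] #{text}"
  end

  # words is nil unless word timestamps were requested on the transcribe call.
  def word_count(%{words: nil}), do: :unavailable
  def word_count(%{words: words}) when is_list(words), do: {:ok, length(words)}

  # `t / 1` always yields a float, so integer input is tolerated as well.
  defp secs(t), do: :erlang.float_to_binary(t / 1, decimals: 2) <> "s"
end

# Made-up sample segment with no word timestamps.
segment = %{start: 0.0, end: 2.48, text: "Hello there.", words: nil}

rendered = SegmentFormat.line(segment)
count = SegmentFormat.word_count(segment)
```

Returning :unavailable rather than 0 keeps "no word timestamps requested" distinct from "a segment with no words", which is a design choice of this sketch, not behaviour of the library.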