AgentSea.Voice.STT.Whisper (agentsea_bumblebee v0.1.0)

In-process speech-to-text via Whisper (Bumblebee + Nx): a local AgentSea.Voice.STT implementation that needs no external transcription API.

Build a serving once (it loads the model), then pass it per call:

serving = AgentSea.Voice.STT.Whisper.serving("openai/whisper-tiny")
{:ok, %{text: text}} = AgentSea.Voice.STT.Whisper.transcribe(audio, serving: serving)

The serving accepts whatever Bumblebee's Whisper serving accepts: raw PCM samples as an Nx.Tensor, or a file path when ffmpeg is available. The call into the serving is injectable through the :run option, so the transcription plumbing can be tested without loading a model. Add the :exla dependency for practical throughput (see AgentSea.Embedder.Bumblebee).
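
To make the injectable :run idea concrete, here is a minimal, self-contained sketch of the pattern. It is not this module's source: SketchSTT is a hypothetical stand-in, :run is assumed to be a 2-arity function shaped like Nx.Serving.run/2, and the result is assumed to follow Bumblebee's Whisper output shape (a map with a :chunks list whose entries carry :text).

```elixir
# Hypothetical stand-in for the transcription plumbing; not the library's code.
defmodule SketchSTT do
  # Assumption: :run is a function (serving, audio) -> result, like Nx.Serving.run/2.
  # Assumption: the result looks like %{chunks: [%{text: "..."}, ...]}.
  def transcribe(audio, opts) do
    serving = Keyword.fetch!(opts, :serving)
    run = Keyword.fetch!(opts, :run)

    case run.(serving, audio) do
      %{chunks: chunks} when is_list(chunks) ->
        # Join the per-chunk texts and drop surrounding whitespace.
        text = chunks |> Enum.map_join(& &1.text) |> String.trim()
        {:ok, %{text: text}}

      other ->
        {:error, {:unexpected_result, other}}
    end
  end
end

# A fake :run that never touches a model, so the plumbing runs in a plain test.
fake_run = fn :fake_serving, _audio ->
  %{chunks: [%{text: " Hello"}, %{text: " world."}]}
end

{:ok, %{text: "Hello world."}} =
  SketchSTT.transcribe(<<0, 0>>, serving: :fake_serving, run: fake_run)
```

With the real module, the same idea would presumably amount to passing a stub function as :run alongside any placeholder :serving; check the module source for the exact contract.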

Functions

serving(model_id \\ "openai/whisper-tiny", opts \\ [])

Build an Nx.Serving for a Whisper model. Downloads the model on first use.
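
For orientation, the sketch below shows roughly what building a Whisper serving looks like when using Bumblebee directly. It is an assumption about what serving/2 wraps, not a copy of its implementation; WhisperSketch and build/2 are hypothetical names, and the Bumblebee version constraint is only illustrative.

```elixir
Mix.install([{:bumblebee, "~> 0.6"}])

# Hypothetical helper; AgentSea's own serving/2 may differ in its details.
defmodule WhisperSketch do
  def build(model_id \\ "openai/whisper-tiny", opts \\ []) do
    repo = {:hf, model_id}

    # Each load_* call downloads from the Hugging Face Hub on first use
    # and reads from the local cache afterwards.
    {:ok, model_info} = Bumblebee.load_model(repo)
    {:ok, featurizer} = Bumblebee.load_featurizer(repo)
    {:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
    {:ok, generation_config} = Bumblebee.load_generation_config(repo)

    # opts are passed straight through, e.g. defn_options: [compiler: EXLA].
    Bumblebee.Audio.speech_to_text_whisper(
      model_info,
      featurizer,
      tokenizer,
      generation_config,
      opts
    )
  end
end

# Usage (downloads the model, so it is left commented out here):
# serving = WhisperSketch.build()
# Nx.Serving.run(serving, {:file, "sample.wav"})
```

Because loading is the expensive step, build the serving once at startup and reuse it for every call.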

serving_options()

Serving options from app config (e.g. an EXLA compiler), overridable per call.
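
The "app config, overridable per call" behaviour can be pictured as a keyword merge. The sketch below is illustrative only: :sketch_app and :serving_options are hypothetical names, not the configuration keys this library actually reads, and the shallow merge is an assumption about how overrides are applied.

```elixir
# Hypothetical application and key names, used purely for illustration.
Application.put_env(:sketch_app, :serving_options, [defn_options: [compiler: EXLA]])

# Per-call options win over configured defaults (shallow, key-by-key merge).
resolve = fn call_opts ->
  :sketch_app
  |> Application.get_env(:serving_options, [])
  |> Keyword.merge(call_opts)
end

defaults_only = resolve.([])

overridden =
  resolve.(defn_options: [compiler: Nx.Defn.Evaluator], compile: [batch_size: 1])

[compiler: EXLA] = Keyword.fetch!(defaults_only, :defn_options)
[compiler: Nx.Defn.Evaluator] = Keyword.fetch!(overridden, :defn_options)
```

Note that Keyword.merge/2 replaces a whole top-level key, so overriding :defn_options per call discards any nested defaults under that key.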