ReqLLM.Transcription
(ReqLLM v1.12.0)
Speech-to-text transcription functionality for ReqLLM.
Inspired by the Vercel AI SDK's transcribe() function, this module provides
audio transcription capabilities with support for:
- Audio file transcription from binary data or file paths
- Transcript segments with timing information
- Language detection
- Duration extraction
- Provider-specific options
Usage
# Transcribe from a file path
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", "/path/to/audio.mp3")
result.text
#=> "Hello, this is a transcription test."
result.segments
#=> [%{text: "Hello, this is a transcription test.", start_second: 0.0, end_second: 2.5}]
result.language
#=> "en"
result.duration_in_seconds
#=> 2.5
# Transcribe from binary audio data
audio_data = File.read!("/path/to/audio.mp3")
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", {:binary, audio_data, "audio/mpeg"})
# With provider-specific options
{:ok, result} =
  ReqLLM.transcribe("openai:whisper-1", "/path/to/audio.mp3",
    language: "en",
    provider_options: [prompt: "ZyntriQix, Currentex, Reiterwood"]
  )
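The segment list shown above lends itself to simple post-processing. The following is a minimal sketch of turning segments into a timestamped transcript. It assumes each segment exposes the `text`, `start_second`, and `end_second` keys seen in the usage example; `TranscriptFormatter` is a hypothetical helper, not part of ReqLLM.

```elixir
defmodule TranscriptFormatter do
  # Hypothetical helper for illustration; not part of ReqLLM.

  # Formats a single segment as "[<start>s - <end>s] <text>".
  def format_segment(%{text: text, start_second: start_s, end_second: end_s}) do
    "[#{start_s}s - #{end_s}s] #{text}"
  end

  # Joins all formatted segments, one per line.
  def format(segments) when is_list(segments) do
    Enum.map_join(segments, "\n", &format_segment/1)
  end
end

# Sample data copied from the usage example above.
segments = [
  %{text: "Hello, this is a transcription test.", start_second: 0.0, end_second: 2.5}
]

IO.puts(TranscriptFormatter.format(segments))
# prints: [0.0s - 2.5s] Hello, this is a transcription test.
```

With a real result, `TranscriptFormatter.format(result.segments)` would be the equivalent call, provided the provider returned segments.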
Summary
Functions
schema/0 - Returns the base transcription options schema.
transcribe/3 - Transcribes audio using an AI model.
transcribe!/3 - Transcribes audio, raising on error.
Functions
@spec schema() :: NimbleOptions.t()
Returns the base transcription options schema.
@spec transcribe(
        ReqLLM.model_input(),
        String.t() | {:binary, binary(), String.t()} | {:base64, String.t(), String.t()},
        keyword()
      ) :: {:ok, ReqLLM.Transcription.Result.t()} | {:error, term()}
Transcribes audio using an AI model.
Returns a ReqLLM.Transcription.Result containing the transcribed text,
segments with timing, detected language, and duration.
Parameters
- model_spec - Model specification (e.g., "openai:whisper-1", "groq:whisper-large-v3")
- audio - Audio input in one of these formats:
  - String.t() - File path to an audio file
  - {:binary, binary(), String.t()} - Raw audio binary with media type (e.g., {:binary, data, "audio/mpeg"})
  - {:base64, String.t(), String.t()} - Base64-encoded audio with media type
- opts - Additional options (keyword list)
Options
- :language - Language hint in ISO-639-1 format (e.g., "en")
- :provider_options - Provider-specific options
- :receive_timeout - HTTP timeout in milliseconds (default: 120_000)
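Because the function returns either `{:ok, result}` or `{:error, reason}`, callers typically branch on both shapes. The sketch below shows one way to do that while raising the HTTP timeout for a long recording. `TranscribeExample` and the `300_000` value are illustrative assumptions; only `ReqLLM.transcribe/3`, `:language`, and `:receive_timeout` come from the documentation above.

```elixir
defmodule TranscribeExample do
  # Hypothetical wrapper for illustration; not part of ReqLLM.

  # Transcribes the file at `path` with a longer timeout than the
  # documented 120_000 ms default (300_000 is an arbitrary example value).
  def run(path) do
    "openai:whisper-1"
    |> ReqLLM.transcribe(path, language: "en", receive_timeout: 300_000)
    |> summarize()
  end

  # Turns either return shape into a human-readable string.
  def summarize({:ok, result}), do: "Transcript: #{result.text}"
  def summarize({:error, reason}), do: "Transcription failed: #{inspect(reason)}"
end
```

Keeping `summarize/1` separate from the network call makes the branching logic easy to exercise with plain tuples.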
Examples
# From file path
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", "speech.mp3")
result.text #=> "Hello world"
# From binary data
data = File.read!("speech.mp3")
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", {:binary, data, "audio/mpeg"})
# With language hint
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", "speech.mp3", language: "en")
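The `{:base64, String.t(), String.t()}` input form has no example above, so here is a minimal sketch of building one with the standard library's `Base.encode64/1`. The `AudioInput` module and its function names are hypothetical; the tuple shape is taken from the spec.

```elixir
defmodule AudioInput do
  # Hypothetical helper for illustration; not part of ReqLLM.

  # Wraps raw audio bytes in the {:base64, data, media_type} tuple
  # accepted as the second argument of ReqLLM.transcribe/3.
  def base64(binary, media_type) when is_binary(binary) and is_binary(media_type) do
    {:base64, Base.encode64(binary), media_type}
  end

  # Reads an MP3 file and transcribes it via the Base64 input form.
  def transcribe_file(path) do
    audio = path |> File.read!() |> base64("audio/mpeg")
    ReqLLM.transcribe("openai:whisper-1", audio)
  end
end
```

This form is mainly useful when the audio already arrives Base64-encoded, for example in a JSON payload; for a file on disk, passing the path directly is simpler.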
@spec transcribe!(
        ReqLLM.model_input(),
        String.t() | {:binary, binary(), String.t()} | {:base64, String.t(), String.t()},
        keyword()
      ) :: ReqLLM.Transcription.Result.t() | no_return()
Transcribes audio, raising on error.
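Accepts the same arguments as transcribe/3, but returns the ReqLLM.Transcription.Result directly instead of an {:ok, result} tuple and raises an exception where transcribe/3 would return {:error, reason}.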
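The raising variant suits scripts and pipelines where a failure should halt execution. When a caller still wants to recover, the exception can be rescued. The sketch below is one possible pattern; `SafeTranscribe` is a hypothetical helper, and since the exception type raised is not stated here, it rescues any exception rather than a specific one.

```elixir
defmodule SafeTranscribe do
  # Hypothetical helper for illustration; not part of ReqLLM.

  # Runs a zero-arity function that is expected to return a result with a
  # :text field (such as a call to ReqLLM.transcribe!/3) and converts any
  # raised exception back into an {:error, message} tuple.
  def text(fun) when is_function(fun, 0) do
    {:ok, fun.().text}
  rescue
    error -> {:error, Exception.message(error)}
  end
end

# Intended usage (requires ReqLLM and provider credentials):
#
#   SafeTranscribe.text(fn ->
#     ReqLLM.transcribe!("openai:whisper-1", "speech.mp3")
#   end)
```

If recovery is the common case, calling transcribe/3 and matching on the tuple is usually the more direct choice.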