ReqLLM.Transcription (ReqLLM v1.12.0)

Speech-to-text transcription functionality for ReqLLM.

Inspired by the Vercel AI SDK's transcribe() function, this module provides audio transcription capabilities with support for:

  • Audio transcription from file paths, raw binary data, or Base64-encoded data
  • Transcript segments with timing information
  • Language detection
  • Duration extraction
  • Provider-specific options

Usage

# Transcribe from a file path
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", "/path/to/audio.mp3")

result.text
#=> "Hello, this is a transcription test."

result.segments
#=> [%{text: "Hello, this is a transcription test.", start_second: 0.0, end_second: 2.5}]

result.language
#=> "en"

result.duration_in_seconds
#=> 2.5

# Transcribe from binary audio data
audio_data = File.read!("/path/to/audio.mp3")
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", {:binary, audio_data, "audio/mpeg"})

# With provider-specific options
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", "/path/to/audio.mp3",
  language: "en",
  provider_options: [prompt: "ZyntriQix, Currentex, Reiterwood"]
)
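
The segment shape shown above (text, start_second, end_second) lends itself to caption-style output. The following is a minimal sketch, assuming segments are maps or structs with exactly those three keys; SegmentFormatter is a hypothetical helper and not part of ReqLLM.

```elixir
defmodule SegmentFormatter do
  # Hypothetical helper (not part of ReqLLM). Assumes each segment exposes
  # :text, :start_second and :end_second, as in the usage example.

  # Renders segments as numbered, SRT-style caption entries.
  def to_captions(segments) do
    segments
    |> Enum.with_index(1)
    |> Enum.map_join("\n\n", fn {segment, index} ->
      range = "#{timestamp(segment.start_second)} --> #{timestamp(segment.end_second)}"
      "#{index}\n#{range}\n#{segment.text}"
    end)
  end

  # Converts a number of seconds into "HH:MM:SS,mmm".
  def timestamp(seconds) do
    total_ms = round(seconds * 1000)
    total_s = div(total_ms, 1000)

    hours = div(total_s, 3600)
    minutes = rem(div(total_s, 60), 60)
    secs = rem(total_s, 60)
    millis = rem(total_ms, 1000)

    "#{pad(hours, 2)}:#{pad(minutes, 2)}:#{pad(secs, 2)},#{pad(millis, 3)}"
  end

  # Left-pads an integer with zeros to the given width.
  defp pad(number, width) do
    number |> Integer.to_string() |> String.pad_leading(width, "0")
  end
end
```

With a successful result bound as in the usage example, SegmentFormatter.to_captions(result.segments) would return the caption text as a single string.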

Summary

Functions

schema()

Returns the base transcription options schema.

transcribe(model_spec, audio, opts \\ [])

Transcribes audio using an AI model.

transcribe!(model_spec, audio, opts \\ [])

Transcribes audio, raising on error.

Functions

schema()

@spec schema() :: NimbleOptions.t()

Returns the base transcription options schema.

transcribe(model_spec, audio, opts \\ [])

@spec transcribe(
  ReqLLM.model_input(),
  String.t()
  | {:binary, binary(), String.t()}
  | {:base64, String.t(), String.t()},
  keyword()
) :: {:ok, ReqLLM.Transcription.Result.t()} | {:error, term()}

Transcribes audio using an AI model.

Returns a ReqLLM.Transcription.Result containing the transcribed text, segments with timing, detected language, and duration.

Parameters

  • model_spec - Model specification (e.g., "openai:whisper-1", "groq:whisper-large-v3")
  • audio - Audio input in one of these formats:
    • String.t() - File path to an audio file
    • {:binary, binary(), String.t()} - Raw audio binary with media type (e.g., {:binary, data, "audio/mpeg"})
    • {:base64, String.t(), String.t()} - Base64-encoded audio with media type
  • opts - Additional options (keyword list)
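
The tuple formats listed above can be built with a few small constructors. This is a sketch only: AudioInput is a hypothetical module, and its extension-to-media-type table is an illustrative assumption rather than anything ReqLLM defines.

```elixir
defmodule AudioInput do
  # Hypothetical helper (not part of ReqLLM). The mapping below is an
  # illustrative assumption; extend it for the formats your provider accepts.
  @media_types %{
    ".mp3" => "audio/mpeg",
    ".wav" => "audio/wav",
    ".flac" => "audio/flac",
    ".ogg" => "audio/ogg"
  }

  # Looks up a media type from a file extension, case-insensitively.
  # Returns {:ok, media_type} or :error for unknown extensions.
  def media_type(path) do
    extension = path |> Path.extname() |> String.downcase()
    Map.fetch(@media_types, extension)
  end

  # Wraps raw audio bytes in the {:binary, data, media_type} form.
  def binary(data, media_type) when is_binary(data) do
    {:binary, data, media_type}
  end

  # Base64-encodes raw audio bytes into the {:base64, encoded, media_type} form.
  def base64(data, media_type) when is_binary(data) do
    {:base64, Base.encode64(data), media_type}
  end
end
```

Either tuple can then be passed as the audio argument, for example ReqLLM.transcribe("openai:whisper-1", AudioInput.binary(data, "audio/mpeg")).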

Options

  • :language - Language hint as an ISO 639-1 code (e.g., "en")
  • :provider_options - Provider-specific options
  • :receive_timeout - HTTP timeout in milliseconds (default: 120_000)
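
Because the options are a plain keyword list, application-level defaults can be merged with per-call overrides before calling transcribe/3. A minimal sketch, assuming a hypothetical TranscribeOpts module and default values chosen only for illustration:

```elixir
defmodule TranscribeOpts do
  # Hypothetical helper (not part of ReqLLM). The default values are
  # assumptions for illustration, e.g. a longer timeout for long recordings.
  @defaults [language: "en", receive_timeout: 300_000]

  # Merges caller overrides on top of the defaults; overrides win.
  def build(overrides \\ []) when is_list(overrides) do
    Keyword.merge(@defaults, overrides)
  end
end
```

The merged list can be passed as the third argument, for example ReqLLM.transcribe("openai:whisper-1", "speech.mp3", TranscribeOpts.build(language: "de")).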

Examples

# From file path
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", "speech.mp3")
result.text #=> "Hello world"

# From binary data
data = File.read!("speech.mp3")
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", {:binary, data, "audio/mpeg"})

# With language hint
{:ok, result} = ReqLLM.transcribe("openai:whisper-1", "speech.mp3", language: "en")
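
The examples above match only on {:ok, result}, which crashes the caller if transcription fails. Handling both return shapes from the spec might look like the sketch below; TranscriptReport is a hypothetical module, and it relies only on the result fields this page documents (text, language, duration_in_seconds).

```elixir
defmodule TranscriptReport do
  # Hypothetical helper (not part of ReqLLM). Pattern-matches on the
  # documented result fields, so it works with structs and plain maps.

  # Success: build a one-line summary of the transcription.
  def summarize({:ok, %{text: text, language: language, duration_in_seconds: duration}}) do
    {:ok, "[#{language}, #{duration}s] #{text}"}
  end

  # Failure: keep the error tuple, but make the reason printable.
  def summarize({:error, reason}) do
    {:error, "transcription failed: #{inspect(reason)}"}
  end
end
```

The return value of transcribe/3 can be piped straight in: ReqLLM.transcribe("openai:whisper-1", "speech.mp3") |> TranscriptReport.summarize().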

transcribe!(model_spec, audio, opts \\ [])

@spec transcribe!(
  ReqLLM.model_input(),
  String.t()
  | {:binary, binary(), String.t()}
  | {:base64, String.t(), String.t()},
  keyword()
) :: ReqLLM.Transcription.Result.t() | no_return()

Transcribes audio, raising on error.

Same as transcribe/3, but returns the ReqLLM.Transcription.Result directly and raises on error instead of returning {:error, reason}.
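
The relationship between the two functions follows the usual Elixir convention for bang variants. The sketch below illustrates that convention with a hypothetical BangSketch module; it is not ReqLLM's implementation, and the exception type that transcribe!/3 actually raises is not stated on this page, so RuntimeError here is an assumption.

```elixir
defmodule BangSketch do
  # Hypothetical illustration of the bang convention (not ReqLLM code).

  # Success tuples are unwrapped to the bare value.
  def unwrap!({:ok, value}), do: value

  # Error tuples are turned into a raised exception.
  # RuntimeError is an assumption made for this sketch.
  def unwrap!({:error, reason}) do
    raise RuntimeError, "transcription failed: #{inspect(reason)}"
  end
end
```

The bang form suits scripts and pipelines where a failure should halt execution; prefer transcribe/3 when the caller needs to recover from errors.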