ReqLLM.Speech (ReqLLM v1.14.0)

Text-to-speech generation functionality for ReqLLM.

Inspired by the Vercel AI SDK's generateSpeech() function, this module provides speech synthesis capabilities with support for:

  • Multiple voices and output formats
  • Speed control
  • Provider-specific instructions (e.g., tone, style)
  • Language selection

Usage

# Basic speech generation
{:ok, result} = ReqLLM.speak("openai:tts-1", "Hello, how are you?", voice: "alloy")
File.write!("greeting.mp3", result.audio)

# With options
{:ok, result} = ReqLLM.speak("openai:tts-1-hd", "Welcome to our app!",
  voice: "nova",
  speed: 1.2,
  output_format: :wav
)

# With instructions (gpt-4o-mini-tts)
{:ok, result} = ReqLLM.speak("openai:gpt-4o-mini-tts", "Breaking news!",
  voice: "coral",
  provider_options: [instructions: "Speak in an excited, energetic tone"]
)

Summary

Functions

schema()

Returns the base speech generation options schema.

speak(model_spec, text, opts \\ [])

Generates speech audio from text using an AI model.

speak!(model_spec, text, opts \\ [])

Generates speech audio from text, raising on error.

Functions

schema()

@spec schema() :: NimbleOptions.t()

Returns the base speech generation options schema.
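
Because the spec says the return value is a NimbleOptions.t(), it can presumably be handed to the standard NimbleOptions functions. The following is a minimal sketch under that assumption: it prints the generated option docs and validates a sample keyword list. The sample option values are illustrative, and whether they pass depends on the actual schema, so both outcomes are handled.

```elixir
# Sketch only: assumes the :req_llm and :nimble_options packages are available.
schema = ReqLLM.Speech.schema()

# NimbleOptions.docs/1 renders the schema as Markdown text.
IO.puts(NimbleOptions.docs(schema))

# Validate a sample keyword list against the schema; the values are illustrative.
case NimbleOptions.validate([voice: "alloy", speed: 1.0], schema) do
  {:ok, opts} ->
    IO.inspect(opts, label: "valid options")

  {:error, %NimbleOptions.ValidationError{} = error} ->
    IO.puts("invalid options: " <> Exception.message(error))
end
```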

speak(model_spec, text, opts \\ [])

@spec speak(
  ReqLLM.model_input(),
  String.t(),
  keyword()
) :: {:ok, ReqLLM.Speech.Result.t()} | {:error, term()}

Generates speech audio from text using an AI model.

On success, returns {:ok, result}, where result is a ReqLLM.Speech.Result containing the generated audio binary, media type, and format information. On failure, returns {:error, reason}.

Parameters

  • model_spec - Model specification (e.g., "openai:tts-1", "openai:gpt-4o-mini-tts")
  • text - The text to convert to speech
  • opts - Additional options (keyword list)

Options

  • :voice - Voice identifier (e.g., "alloy", "echo", "fable", "onyx", "nova", "shimmer")
  • :speed - Speech speed multiplier (0.25 to 4.0)
  • :output_format - Audio format: :mp3, :opus, :aac, :flac, :wav, :pcm
  • :language - ISO 639-1 language code (e.g., "en")
  • :provider_options - Provider-specific options (e.g., [instructions: "Speak calmly"])
  • :receive_timeout - HTTP timeout in milliseconds (default: 120_000)
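
The options above can be combined with explicit handling of both return shapes. The sketch below is one way to do that, not part of ReqLLM: SpeechExample, filename/2, and save/4 are hypothetical names, and falling back to :mp3 when :output_format is absent is an assumption made only to pick a file extension.

```elixir
defmodule SpeechExample do
  # Hypothetical helper module, not part of ReqLLM.

  # The output formats listed in the options above.
  @formats [:mp3, :opus, :aac, :flac, :wav, :pcm]

  # Builds a file name whose extension matches the requested format.
  def filename(base, format) when format in @formats, do: "#{base}.#{format}"

  # Calls ReqLLM.speak/3 and writes the audio to disk.
  # Returns {:ok, path} or passes the {:error, reason} tuple through.
  def save(model_spec, text, base, opts \\ []) do
    # Assumption: use :mp3 for the extension when no format is requested.
    format = Keyword.get(opts, :output_format, :mp3)

    case ReqLLM.speak(model_spec, text, opts) do
      {:ok, result} ->
        path = filename(base, format)
        File.write!(path, result.audio)
        {:ok, path}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

A call such as SpeechExample.save("openai:tts-1", "Hello", "greeting", voice: "alloy", output_format: :wav, receive_timeout: 60_000) would then write greeting.wav on success.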

Examples

{:ok, result} = ReqLLM.speak("openai:tts-1", "Hello world", voice: "alloy")
File.write!("hello.mp3", result.audio)

{:ok, result} = ReqLLM.speak("openai:tts-1-hd", "High quality audio",
  voice: "nova",
  output_format: :wav
)

speak!(model_spec, text, opts \\ [])

Generates speech audio from text, raising on error.

Same as speak/3, but raises an exception instead of returning an {:error, reason} tuple.
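
Since the raising variant signals failure with an exception, callers that still want a tuple can rescue at the boundary. The following is a rough sketch of that pattern; SpeakBangExample and run/1 are hypothetical names, and the exception type raised by speak!/3 is not documented here, so the sketch rescues any exception and reports only its message.

```elixir
defmodule SpeakBangExample do
  # Hypothetical wrapper, not part of ReqLLM.
  # Runs a zero-arity function and converts a raised exception into a tuple.
  def run(fun) when is_function(fun, 0) do
    {:ok, fun.()}
  rescue
    error -> {:error, Exception.message(error)}
  end
end

# Usage (performs a real request, so it is left commented out):
# SpeakBangExample.run(fn ->
#   ReqLLM.speak!("openai:tts-1", "Hello world", voice: "alloy")
# end)
```

In most code the non-raising speak/3 is the simpler choice when errors are expected; the wrapper is mainly useful when speak!/3 is called deep inside a pipeline.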