Deepgram.Speak (Deepgram v0.1.0)

Text-to-Speech services for the Deepgram API.

The Deepgram.Speak module converts text to natural-sounding speech through Deepgram's API. It supports two modes: synchronous requests over the REST API, and streaming synthesis over the WebSocket API.

Key Features

  • Text-to-Speech Synthesis - Convert text to high-quality audio
  • Multiple Voice Models - Access to various voice models like Aura 2
  • Voice Customization - Control pitch, rate, and other voice characteristics
  • Audio Format Options - Support for various audio formats and encodings
  • Streaming TTS - Real-time text-to-speech via WebSocket connections
  • SSML Support - Speech Synthesis Markup Language for fine-grained control
  • Asynchronous Callbacks - Send results to a webhook when processing completes

Authentication

All functions in this module require a properly configured Deepgram.Client struct, which can be created using Deepgram.new/1.

Example:

# Create client with API key
client = Deepgram.new(api_key: System.get_env("DEEPGRAM_API_KEY"))

# Or with OAuth token
client = Deepgram.new(token: "your-oauth-token")

Basic Usage

Synthesize text to speech and get audio data:

client = Deepgram.new(api_key: System.get_env("DEEPGRAM_API_KEY"))
text_source = %{text: "Welcome to Deepgram's text to speech API."}
options = %{model: "aura-2-thalia-en", encoding: "mp3"}
{:ok, audio_data} = Deepgram.Speak.synthesize(client, text_source, options)

Save synthesized audio to a file:

{:ok, response} = Deepgram.Speak.save_to_file(client, "welcome.mp3", text_source, options)
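
The examples above match directly on {:ok, ...}, which raises a MatchError if the request fails. A minimal sketch of handling both outcomes follows. SpeakResultExample and persist/2 are hypothetical names, not part of the Deepgram library, and the sketch assumes synthesize/3 returns {:ok, binary} on success and {:error, reason} on failure, as the other specs on this page suggest.

```elixir
defmodule SpeakResultExample do
  # Hypothetical helper, not part of the Deepgram library.
  # Assumes the synthesis result is {:ok, binary} or {:error, reason}.

  # Success: write the audio bytes to disk and report how many were written.
  def persist({:ok, audio}, path) when is_binary(audio) do
    case File.write(path, audio) do
      :ok -> {:ok, byte_size(audio)}
      {:error, posix} -> {:error, {:file, posix}}
    end
  end

  # Failure: pass the reason through, tagged so callers can tell
  # a synthesis error from a file system error.
  def persist({:error, reason}, _path), do: {:error, {:synthesis, reason}}
end
```

With a configured client, this could be called as SpeakResultExample.persist(Deepgram.Speak.synthesize(client, text_source, options), "welcome.mp3").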

Advanced Usage

Using Speech Synthesis Markup Language (SSML):

# Create request with SSML
ssml_source = %{ssml: "<speak><p>Welcome to <emphasis>Deepgram's</emphasis> API.</p></speak>"}
{:ok, audio_data} = Deepgram.Speak.synthesize(client, ssml_source, options)

Live streaming synthesis via WebSocket:

options = %{model: "aura-2-thalia-en", encoding: "linear16"}
{:ok, ws} = Deepgram.Speak.live_synthesis(client, options)

Summary

Functions

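live_synthesis(client, options \\ %{})
Starts a live text-to-speech WebSocket connection.

save_to_file(client, file_path, text_source, options \\ %{})
Synthesizes text to speech and saves it to a file.

synthesize(client, text_source, options \\ %{})
Synthesizes text to speech and returns the audio data.

synthesize_callback(client, text_source, callback_url, options \\ %{})
Synthesizes text to speech with callback support (asynchronous).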

Functions

live_synthesis(client, options \\ %{})

@spec live_synthesis(Deepgram.Client.t(), Deepgram.Types.Speak.speak_ws_options()) ::
  {:ok, pid()} | {:error, any()}

Starts a live text-to-speech WebSocket connection.

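Parameters

  • client - A Deepgram.Client struct
  • options - Optional streaming synthesis options (see Deepgram.Types.Speak.speak_ws_options/0)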

Examples

iex> client = Deepgram.new(api_key: "your-api-key")
iex> options = %{model: "aura-2-thalia-en", encoding: "linear16"}
iex> {:ok, websocket} = Deepgram.Speak.live_synthesis(client, options)
{:ok, #PID<...>}

save_to_file(client, file_path, text_source, options \\ %{})

Synthesizes text to speech and saves it to a file.

Parameters

  • client - A Deepgram.Client struct
  • file_path - Path where the audio file should be saved
  • text_source - A map containing the text: %{text: "Hello, world!"}
  • options - Optional synthesis options (see Deepgram.Types.Speak.speak_options/0)

Examples

iex> client = Deepgram.new(api_key: "your-api-key")
iex> text_source = %{text: "Hello, world!"}
iex> options = %{model: "aura-2-thalia-en", encoding: "linear16"}
iex> {:ok, response} = Deepgram.Speak.save_to_file(client, "output.wav", text_source, options)
{:ok, %{content_type: "audio/wav", ...}}

synthesize(client, text_source, options \\ %{})

Synthesizes text to speech and returns the audio data.

Parameters

  • client - A Deepgram.Client struct
  • text_source - A map containing the text: %{text: "Hello, world!"}
  • options - Optional synthesis options (see Deepgram.Types.Speak.speak_options/0)

Examples

iex> client = Deepgram.new(api_key: "your-api-key")
iex> text_source = %{text: "Hello, world!"}
iex> options = %{model: "aura-2-thalia-en", encoding: "linear16"}
iex> {:ok, audio_data} = Deepgram.Speak.synthesize(client, text_source, options)
{:ok, <<binary_audio_data>>}
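
Because options is a plain map, shared defaults can be merged with per-call overrides before calling synthesize/3. The sketch below uses only Map.merge/2 from the standard library. SpeakOptionsExample and build/1 are hypothetical names, and the default values are simply the ones used in the examples on this page, not defaults defined by the library.

```elixir
defmodule SpeakOptionsExample do
  # Hypothetical helper, not part of the Deepgram library.
  # Default values are taken from the examples on this page.
  @defaults %{model: "aura-2-thalia-en", encoding: "linear16"}

  # Keys in overrides win over the defaults.
  def build(overrides \\ %{}) when is_map(overrides) do
    Map.merge(@defaults, overrides)
  end
end
```

For example, SpeakOptionsExample.build(%{encoding: "mp3"}) keeps the default model and changes only the encoding.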

synthesize_callback(client, text_source, callback_url, options \\ %{})

@spec synthesize_callback(
  Deepgram.Client.t(),
  Deepgram.Types.Speak.text_source(),
  String.t(),
  Deepgram.Types.Speak.speak_options()
) :: {:ok, map()} | {:error, any()}

Synthesizes text to speech with callback support (asynchronous).

Parameters

  • client - A Deepgram.Client struct
  • text_source - A map containing the text: %{text: "Hello, world!"}
  • callback_url - URL to receive the audio result
  • options - Optional synthesis options (see Deepgram.Types.Speak.speak_options/0)

Examples

iex> client = Deepgram.new(api_key: "your-api-key")
iex> text_source = %{text: "Hello, world!"}
iex> callback_url = "https://example.com/webhook"
iex> options = %{model: "aura-2-thalia-en", encoding: "linear16"}
iex> {:ok, response} = Deepgram.Speak.synthesize_callback(client, text_source, callback_url, options)
{:ok, %{request_id: "..."}}
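
Since the result is delivered to callback_url after the call returns, a malformed URL may only surface as a missing webhook delivery. A minimal sketch of a local check before calling synthesize_callback/4 follows. CallbackUrlExample and valid?/1 are hypothetical names, not part of the Deepgram library, and the rule used here (absolute http or https URL with a host) is an assumption about what a usable callback URL looks like.

```elixir
defmodule CallbackUrlExample do
  # Hypothetical helper, not part of the Deepgram library.
  # Accepts only absolute http(s) URLs that include a non-empty host.
  def valid?(url) when is_binary(url) do
    case URI.parse(url) do
      %URI{scheme: scheme, host: host}
      when scheme in ["http", "https"] and is_binary(host) and host != "" ->
        true

      _ ->
        false
    end
  end

  # Anything that is not a string is rejected.
  def valid?(_other), do: false
end
```

A caller could guard the request with if CallbackUrlExample.valid?(callback_url), do: Deepgram.Speak.synthesize_callback(client, text_source, callback_url, options).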