ExAzureSpeech.SpeechToText.Websocket (ex_azure_speech v0.1.0)

Websocket Connection with the Azure Cognitive Services Speech-to-Text service.

The SpeechSDK internals are straightforward and work like this:

  1. Open a websocket connection to the Azure Cognitive Services Speech-to-Text service.
  2. The client sends an ExAzureSpeech.Common.Messages.SpeechConfigMessage specifying the basic configuration for the recognition.
  3. The client sends an ExAzureSpeech.SpeechToText.Messages.SpeechContextMessage to configure the context of the recognition, such as the language, whether it should run a pronunciation assessment, and so on.
  4. The server processes the data and answers with a speech.startDetected message followed by a turn.start message, indicating that it is ready to receive the audio input.
  5. The client sends the audio input in chunks using the ExAzureSpeech.Common.Messages.AudioMessage message.
  6. Optionally, the client sends an ExAzureSpeech.Common.Messages.AudioMessage.end_of_stream message to indicate the end of the audio input.
  7. The client waits for the recognition response, a JSON object with the recognition result and, optionally, the pronunciation assessment.
  8. The server sends a speech.endDetected message followed by a turn.end message, indicating that the recognition is over.
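The steps above can be summarized as a single wire timeline for one recognition turn (`->` is client to server, `<-` is server to client). The message module names come from this library; payload contents are elided:

```
->  SpeechConfigMessage            step 2: basic recognition configuration
->  SpeechContextMessage           step 3: language, pronunciation assessment, ...
<-  speech.startDetected           step 4
<-  turn.start                     step 4: server is ready for audio
->  AudioMessage (chunk)           step 5: repeated for each audio chunk
->  AudioMessage.end_of_stream     step 6: optional end-of-audio marker
<-  speech.hypothesis              interim results (zero or more)
<-  speech.endDetected             step 8
<-  speech.phrase                  step 7: final JSON recognition result
<-  turn.end                       step 8: the turn is over
```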

Summary

Types

All possible responses from the Azure Cognitive Services Speech-to-Text service.

Functions

Opens a WebSocket connection with the Azure Cognitive Services Speech-to-Text service.

Types

@type expected_responses() ::
  :speech_start_detected
  | :turn_start
  | :speech_hypothesis
  | :speech_end_detected
  | :speech_phrase
  | :turn_end

All possible responses from the Azure Cognitive Services Speech-to-Text service.

speech_start_detected: The server accepted the speech configs and contexts and will start a recognition turn.
turn_start: The server started a recognition turn and is waiting for audio input.
speech_hypothesis: The server is processing the audio input and has produced an interim hypothesis.
speech_end_detected: The server detected the end of the speech input and will return the results.
speech_phrase: The server has a recognition result.
turn_end: The server ended the recognition turn; the connection is ready for a new recognition turn or to be closed.
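Since each response is tagged with one of the expected_responses/0 atoms, a caller can dispatch on them with plain pattern matching. RecognitionHandler and its log strings below are hypothetical, not part of the library:

```elixir
defmodule RecognitionHandler do
  # Maps each server response atom described above to a short log line.
  # This module is an illustration only; the library does not ship it.
  def handle_response(:speech_start_detected), do: "server will start a recognition turn"
  def handle_response(:turn_start), do: "turn started; send audio chunks"
  def handle_response(:speech_hypothesis), do: "interim hypothesis available"
  def handle_response(:speech_end_detected), do: "end of speech detected; results coming"
  def handle_response(:speech_phrase), do: "final recognition result"
  def handle_response(:turn_end), do: "turn finished; reuse or close the connection"
end
```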

Functions


open_connection(opts, context_config, stream)


Opens a WebSocket connection with the Azure Cognitive Services Speech-to-Text service.
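A minimal usage sketch. Only the module and the open_connection/3 arity are taken from this page; the option keys, the shape of context_config, the chunk size, and the return value shown here are assumptions for illustration:

```elixir
# Assumed option keys; consult the library docs for the real ones.
opts = [region: "westeurope", subscription_key: System.fetch_env!("AZURE_SPEECH_KEY")]

# Assumed context shape: language plus pronunciation-assessment toggle (step 3).
context_config = [language: "en-US", pronunciation_assessment: false]

# Audio input as a stream of binary chunks (step 5).
audio_stream = File.stream!("sample.wav", [], 32_768)

# Return shape is an assumption; match on whatever the library actually returns.
ExAzureSpeech.SpeechToText.Websocket.open_connection(opts, context_config, audio_stream)
```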