ExAzureSpeech.SpeechToText.Websocket (ex_azure_speech v0.2.2)
Websocket Connection with the Azure Cognitive Services Speech-to-Text service.
The SpeechSDK internals are straightforward and work like this:
- Open a WebSocket connection to the Azure Cognitive Services Speech-to-Text service.
- The client sends an ExAzureSpeech.Common.Messages.SpeechConfigMessage informing the basic configuration for the recognition.
- The client sends an ExAzureSpeech.SpeechToText.Messages.SpeechContextMessage to configure the context of the recognition: the language, whether it should run a pronunciation assessment, and so on.
- The server processes the data and answers with a speech.startDetected message followed by a turn.start message, indicating that it is ready to receive the audio input.
- The client sends the audio input in chunks using the ExAzureSpeech.Common.Messages.AudioMessage message.
- (optional) The client sends an ExAzureSpeech.Common.Messages.AudioMessage.end_of_stream message to indicate the end of the audio input.
- The client waits for the recognition response, which is a JSON object with the recognition result and, optionally, the pronunciation assessment.
- The server sends a speech.endDetected message followed by a turn.end message, indicating that the recognition turn is over.
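Putting the walkthrough together, a caller-side sketch might look like the following. This is only a sketch: the socket_config and context_config structs are assumed to be built beforehand (see ExAzureSpeech.SpeechToText.SocketConfig and ExAzureSpeech.SpeechToText.SpeechContextConfig), audio_stream is any enumerable of audio chunks, and the zero-arity close callback is an assumption.

```elixir
defmodule MyApp.Recognizer do
  alias ExAzureSpeech.SpeechToText.Websocket

  # Sketch only: the config structs and the audio chunk enumerable are built
  # by the caller; the zero-arity shape of the close callback is an assumption.
  def recognize(socket_config, context_config, audio_stream) do
    with {:ok, websocket_pid} <-
           Websocket.open_connection(socket_config, context_config, audio_stream),
         {:ok, result_stream} <-
           Websocket.process_to_stream(websocket_pid, fn -> :ok end) do
      # Consuming the stream drives the recognition turn; each element is a
      # server response (speech.hypothesis, speech.phrase, ...).
      {:ok, Enum.to_list(result_stream)}
    end
  end
end
```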
Summary
Types
All possible responses from the Azure Cognitive Services Speech-to-Text service.
Functions
Opens a WebSocket connection with the Azure Cognitive Services Speech-to-Text service.
Types
@type expected_responses() ::
        :speech_start_detected
        | :turn_start
        | :speech_hypothesis
        | :speech_end_detected
        | :speech_phrase
        | :turn_end
All possible responses from the Azure Cognitive Services Speech-to-Text service.
speech_start_detected: The server accepted the speech configs and contexts and will start a recognition turn.
turn_start: The server started a recognition turn and is waiting for audio input.
speech_hypothesis: The server is processing the audio input and has already produced a partial hypothesis.
speech_end_detected: The server detected the end of the speech input and will return the results.
speech_phrase: The server has a recognition result.
turn_end: The server ended the recognition turn and the connection is ready for a new recognition turn or to be closed.
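As an illustration only, a consumer could map these response atoms back to the descriptions above; the module and function names here are hypothetical:

```elixir
defmodule MyApp.ResponseLabels do
  # Hypothetical helper mapping each expected_responses() atom to a short label.
  def describe(:speech_start_detected), do: "configs accepted, recognition turn starting"
  def describe(:turn_start), do: "server is waiting for audio input"
  def describe(:speech_hypothesis), do: "partial hypothesis available"
  def describe(:speech_end_detected), do: "end of speech detected, results on the way"
  def describe(:speech_phrase), do: "recognition result available"
  def describe(:turn_end), do: "turn finished, connection reusable or ready to close"
end
```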
Functions
@spec open_connection(
        ExAzureSpeech.SpeechToText.SocketConfig.t(),
        ExAzureSpeech.SpeechToText.SpeechContextConfig.t(),
        Enumerable.t()
      ) ::
        {:ok, pid()}
        | {:error,
           ExAzureSpeech.Auth.Errors.Unauthorized.t() | ExAzureSpeech.Auth.Errors.Failure.t()}
Opens a WebSocket connection with the Azure Cognitive Services Speech-to-Text service.
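A hedged example of handling the error types named in the spec; it assumes socket_config, context_config, and audio_stream already exist, and that the error types are structs, as their t() types suggest:

```elixir
alias ExAzureSpeech.SpeechToText.Websocket
alias ExAzureSpeech.Auth.Errors.{Failure, Unauthorized}

case Websocket.open_connection(socket_config, context_config, audio_stream) do
  {:ok, websocket_pid} ->
    {:ok, websocket_pid}

  {:error, %Unauthorized{} = error} ->
    # Credentials rejected during authentication.
    {:error, error}

  {:error, %Failure{} = error} ->
    # Any other authentication failure.
    {:error, error}
end
```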
@spec process_to_stream(
        websocket_pid :: pid(),
        close_connection_callback :: function()
      ) ::
        {:ok, Enumerable.t()}
        | {:error,
           ExAzureSpeech.Common.Errors.WebsocketConnectionFailed.t()
           | ExAzureSpeech.Common.Errors.FailedToDispatchCommand.t()}
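A sketch of consuming the returned stream and handling the error types named in the spec; the struct matches and the shape of close_connection_callback are assumptions:

```elixir
alias ExAzureSpeech.SpeechToText.Websocket
alias ExAzureSpeech.Common.Errors.{FailedToDispatchCommand, WebsocketConnectionFailed}

case Websocket.process_to_stream(websocket_pid, close_connection_callback) do
  {:ok, result_stream} ->
    # The stream is an Enumerable; consuming it yields the recognition responses.
    Enum.to_list(result_stream)

  {:error, %WebsocketConnectionFailed{} = error} ->
    # The underlying WebSocket connection could not be used.
    {:error, error}

  {:error, %FailedToDispatchCommand{} = error} ->
    # A command could not be dispatched to the connection process.
    {:error, error}
end
```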