Membrane Element: GCloud SpeechToText v0.3.1

Membrane.Element.GCloud.SpeechToText

An element providing speech recognition via the Google Cloud Speech-to-Text service, using its Streaming API.

The element has to handle the API's connection time limit (currently 5 minutes). It does so by spawning multiple streaming clients: each client stops streaming after streaming_time_limit (see t/0), and a new client is spawned to continue streaming. The old client is kept alive for results_await_time so that it can still receive recognition results for the audio it has streamed.

This means that the first results from the new client might arrive before the last results from the old one.

Bear in mind that streaming_time_limit + results_await_time must be smaller than the recognition time limit of the Google Streaming API (currently 5 minutes).
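The reconnection scheme can be modelled with a minimal, self-contained sketch. It is not the element's actual implementation: the module name `SessionScheduleSketch` is hypothetical, all durations are plain integer seconds rather than Membrane.Time.t() values, and it assumes that audio arrives in real time and that each new session re-sends the last reconnection_overlap_time of audio before continuing.

```elixir
defmodule SessionScheduleSketch do
  # Hypothetical model of the reconnection scheme; not the element's code.
  # All durations are plain integer seconds, not Membrane.Time.t().
  # Assumes audio arrives in real time, so an audio position in seconds
  # is also roughly the wall-clock time at which it is streamed.

  @doc """
  Splits `total` seconds of audio into client sessions.

  Each session streams at most `limit` seconds of audio, the next session
  re-sends the last `overlap` seconds, and a session that stopped
  streaming is kept alive for another `await` seconds.
  """
  def schedule(total, limit, overlap, await)
      when limit > overlap and overlap >= 0 do
    build(0, total, limit, overlap, await, [])
  end

  defp build(from, total, limit, overlap, await, acc) do
    # Audio range streamed by this session and the time it is shut down
    to = min(from + limit, total)
    session = %{from: from, to: to, alive_until: to + await}

    if to >= total do
      Enum.reverse([session | acc])
    else
      # The next session starts `overlap` seconds before this one stopped
      build(to - overlap, total, limit, overlap, await, [session | acc])
    end
  end
end
```

For 450 s of audio with the default values (200 s streaming, 2 s overlap, 90 s await), `SessionScheduleSketch.schedule(450, 200, 2, 90)` yields three sessions. The second one starts at audio position 198 s while the first stays alive until 290 s, which is why results from the two clients can interleave. Each session's lifetime (streaming plus awaiting) is 290 s here, below the 5-minute API limit.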

Element options

Passed via the struct Membrane.Element.GCloud.SpeechToText.t/0

  • reconnection_overlap_time

    Default value: 2 |> Membrane.Time.seconds()

    Duration of audio re-sent to a new client session after reconnection.

  • results_await_time

    Default value: 90 |> Membrane.Time.seconds()

    How long a client that has stopped streaming is kept alive, awaiting results from the recognition API.

  • streaming_time_limit

    Default value: 200 |> Membrane.Time.seconds()

    Determines how much audio can be sent to the recognition API in one client session. After this time, a new client session is created, while the old one is kept alive for some time to receive recognition results.

    Bear in mind that streaming_time_limit + results_await_time must be smaller than the recognition time limit of the Google Streaming API (currently 5 minutes).

  • model

    Default value: :default

    Model used for speech recognition. Bear in mind that the :video model is a premium model that costs more than the standard rate.

  • speech_contexts

    Default value: []

    A list of speech recognition contexts. See the docs for more info.

  • word_time_offsets

    Default value: false

    If true, the top result includes a list of words and the start and end time offsets (timestamps) for those words.

  • interim_results

    Default value: false

    If set to true, interim results may be returned by the recognition API. See the Google API docs for more info.

  • language_code

    Default value: "en-US"

    The language of the supplied audio. See Language Support for a list of supported language codes.
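The timing constraint stated for streaming_time_limit can be checked up front. The sketch below is a hypothetical helper (`OptionsCheckSketch` is not part of the library): it fills in the documented defaults, expressed as plain seconds instead of Membrane.Time.t(), restricts model to the values listed in t(), and rejects option sets whose streaming_time_limit + results_await_time reaches the 5-minute API limit.

```elixir
defmodule OptionsCheckSketch do
  # Hypothetical helper, not part of the library. Durations are plain
  # seconds here, whereas the element itself uses Membrane.Time.t().
  @api_limit_s 5 * 60
  @models [:default, :video, :phone_call, :command_and_search]

  # Documented defaults, converted to seconds
  @defaults %{
    reconnection_overlap_time: 2,
    results_await_time: 90,
    streaming_time_limit: 200,
    model: :default,
    speech_contexts: [],
    word_time_offsets: false,
    interim_results: false,
    language_code: "en-US"
  }

  # Merges user-supplied options (keyword list or map) over the defaults,
  # then checks the model and the timing constraint.
  def validate(opts) do
    o = Map.merge(@defaults, Map.new(opts))
    total = o.streaming_time_limit + o.results_await_time

    cond do
      o.model not in @models -> {:error, {:unknown_model, o.model}}
      total >= @api_limit_s -> {:error, {:exceeds_api_limit, total}}
      true -> {:ok, o}
    end
  end
end
```

With the defaults, 200 + 90 = 290 s, which stays below the 300 s limit; raising streaming_time_limit to 220 s alone would push the sum to 310 s and fail the check.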

Pads

:input

Availability: always
Caps: Membrane.Caps.Audio.FLAC
Demand unit: buffers
Direction: input
Mode: pull

Summary

Types

t()

Struct containing options for Membrane.Element.GCloud.SpeechToText

Functions

Returns a description of the options available for this module

Types

t()
t() :: %Membrane.Element.GCloud.SpeechToText{
  interim_results: boolean(),
  language_code: String.t(),
  model: :default | :video | :phone_call | :command_and_search,
  reconnection_overlap_time: Membrane.Time.t(),
  results_await_time: Membrane.Time.t(),
  speech_contexts: [%Google.Cloud.Speech.V1.SpeechContext{phrases: term()}],
  streaming_time_limit: Membrane.Time.t(),
  word_time_offsets: boolean()
}

Struct containing options for Membrane.Element.GCloud.SpeechToText
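As an illustration, an options struct overriding a few fields might look as follows. This is a configuration fragment rather than a standalone script: it needs this package and its Google Cloud dependencies to compile, and the chosen values (language, model, phrases, durations) are arbitrary examples. Fields left out should take the defaults listed under Element options.

```elixir
# Hypothetical example values; requires this library and its Google Cloud deps.
alias Membrane.Element.GCloud.SpeechToText
alias Google.Cloud.Speech.V1.SpeechContext

%SpeechToText{
  language_code: "pl-PL",
  # :phone_call is one of the models listed in t()
  model: :phone_call,
  interim_results: true,
  # Phrases that hint the recognizer towards expected vocabulary
  speech_contexts: [%SpeechContext{phrases: ["Membrane", "Elixir"]}],
  # Keep streaming_time_limit + results_await_time below 5 minutes (180 + 90 = 270 s)
  streaming_time_limit: Membrane.Time.seconds(180),
  results_await_time: Membrane.Time.seconds(90)
}
```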

Functions

Returns descriptions of the pads of Membrane.Element.GCloud.SpeechToText

Returns a description of the options available for this module