ExAzureSpeech.SpeechToText.SpeechContextConfig (ex_azure_speech v0.2.1)

Configures the Speech-to-Text context. The Speech Context supplies additional data to the Speech-to-Text service so that it can better understand the user's speech. It also carries the configuration for speech assessment and detailed output analysis.

Summary

Types

t()
  • :phrase_detection (keyword/0) - Specifies details about the phrase detection.

Functions

Returns a valid configuration for the Speech-to-Text context.

Types

@type t() :: [phrase_detection: keyword(), speech_assessment: keyword()]
  • :phrase_detection (keyword/0) - Specifies details about the phrase detection.

    • :recognition_mode - The recognition mode to be used. :interactive is optimized for short phrases, :conversation is optimized for conversational speech, and :dictation is optimized for long-form speech.

    • :speech_segmentation_silence_ms (integer/0) - The minimum length of silence, in milliseconds, that indicates the end of a phrase.

  • :speech_assessment (keyword/0) - Configuration for the speech assessment. If omitted, no assessment is performed.

    • :reference_text (String.t/0) - The reference text to be used to evaluate the user's speech.
      This is optional for unscripted assessment. The default value is "".

    • :grading_system - The grading system to be used to evaluate the user's speech.

      Supported grading systems:

      • :five_point - The user's speech will be graded on a scale of 1 to 5.
      • :hundred_mark - The user's speech will be graded on a scale of 0 to 100.

      The default value is :five_point.

    • :granularity - The granularity to be used to evaluate the user's speech.

      Supported granularities:

      • :phoneme - The user's speech will be evaluated at the phoneme level.
      • :word - The user's speech will be evaluated at the word level.
      • :fulltext - The user's speech will be evaluated at the full text level.

      The default value is :phoneme.

    • :dimension - Which dimensions are output for the user's speech.

      Supported dimensions:

      • :comprehensive - All dimensions are output.
      • :basic - Only the basic dimensions are output.

      The default value is :comprehensive.

    • :enable_prosody_assessment (boolean/0) - Whether prosody assessment is enabled. The default value is false.

    • :enable_miscue (boolean/0) - Whether miscues are validated in the prosody assessment. The default value is false.

Example Configuration

[
  phrase_detection: [
    speech_segmentation_silence_ms: 500
  ],
  speech_assessment: [
    reference_text: "The quick brown fox jumps over the lazy dog.",
    grading_system: :five_point,
    granularity: :phoneme,
    dimension: :comprehensive,
    enable_prosody_assessment: true,
    enable_miscue: true
  ]
]
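
The options can also be combined differently. The fragment below is a sketch of an unscripted assessment: :reference_text is left out (it defaults to ""), and the recognition mode is set explicitly. The specific values (1000 ms of silence, :dictation, :hundred_mark, :word, :basic) are illustrative choices, not recommendations from the library.

```elixir
[
  phrase_detection: [
    # Optimized for long-form speech, per the option description above.
    recognition_mode: :dictation,
    # Illustrative value: treat one second of silence as the end of a phrase.
    speech_segmentation_silence_ms: 1000
  ],
  speech_assessment: [
    # No :reference_text, so the assessment is unscripted.
    grading_system: :hundred_mark,
    granularity: :word,
    dimension: :basic
  ]
]
```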

Functions

@spec new(Keyword.t()) :: {:ok, t()} | {:error, NimbleOptions.ValidationError.t()}

Returns a valid configuration for the Speech-to-Text context.
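
As a minimal sketch of how the spec above might be used, the script below builds a configuration with new/1 and handles both return shapes. It assumes the package is published on Hex as :ex_azure_speech at the version shown in the page title. The option values are illustrative.

```elixir
# Fetches the dependency when run as a standalone script (elixir example.exs).
# Remove this call inside a Mix project that already lists :ex_azure_speech.
Mix.install([{:ex_azure_speech, "~> 0.2.1"}])

alias ExAzureSpeech.SpeechToText.SpeechContextConfig

opts = [
  phrase_detection: [speech_segmentation_silence_ms: 500],
  speech_assessment: [
    reference_text: "The quick brown fox jumps over the lazy dog.",
    grading_system: :hundred_mark,
    granularity: :word
  ]
]

# new/1 returns {:ok, t()} or {:error, NimbleOptions.ValidationError.t()}.
result = SpeechContextConfig.new(opts)

case result do
  {:ok, config} ->
    # `config` is the validated keyword list described by t().
    IO.inspect(config, label: "speech context")

  {:error, %NimbleOptions.ValidationError{} = error} ->
    # ValidationError is an exception struct, so Exception.message/1 applies.
    IO.puts("Invalid speech context: " <> Exception.message(error))
end
```

Matching on the error struct keeps validation failures, such as an unsupported :grading_system, separate from any other error handling in the caller.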