Bumblebee.Audio (Bumblebee v0.3.0)

High-level tasks related to audio processing.

Types

speech_to_text_input()

@type speech_to_text_input() :: Nx.t() | {:file, String.t()}

A term representing audio.

Can be either of:

  • a 1-dimensional Nx.Tensor with audio samples

  • {:file, path}, where path points to an audio file (this requires ffmpeg to be installed)
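
As a rough illustration of the tensor form, the sketch below turns raw PCM bytes into a 1-dimensional tensor. RawAudio is a hypothetical helper, not part of Bumblebee. It assumes the data is mono, 32-bit float, in the machine's native byte order, and already at the sampling rate the featurizer expects.

```elixir
defmodule RawAudio do
  # Hypothetical helper for illustration only; not part of Bumblebee.

  # Interprets the binary as a flat sequence of 32-bit floats and
  # returns a 1-dimensional tensor with one element per sample.
  # Assumption: mono audio, native-endian f32 samples.
  def from_binary(binary) when is_binary(binary) do
    Nx.from_binary(binary, :f32)
  end

  # Reads the whole file into memory before converting it.
  def from_file(path) do
    path |> File.read!() |> from_binary()
  end
end
```

The resulting tensor could then be passed to a serving in place of a {:file, path} tuple, for example Nx.Serving.run(serving, RawAudio.from_file("/path/to/audio.raw")), where both the serving and the path are placeholders.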

speech_to_text_output()

@type speech_to_text_output() :: %{results: [speech_to_text_result()]}

speech_to_text_result()

@type speech_to_text_result() :: %{text: String.t()}

Functions

speech_to_text(model_info, featurizer, tokenizer, generation_config, opts \\ [])

Builds serving for speech-to-text generation.

The serving accepts speech_to_text_input/0 and returns speech_to_text_output/0. A list of inputs is also supported.

Options

  • :seed - the random seed to use when sampling. Defaults to the current timestamp

  • :compile - compiles all computations for predefined input shapes during serving initialization. Should be a keyword list with the following keys:

    • :batch_size - the maximum batch size of the input. Inputs are optionally padded to always match this batch size

    In production, it is advisable to set this option and also to configure a defn compiler via :defn_options, so that inference time is reduced as much as possible.

  • :defn_options - the options for JIT compilation. Defaults to []
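
The options above might be combined as in the following sketch. MyApp.SpeechServing is a hypothetical wrapper module, the seed and batch size values are arbitrary, and the EXLA compiler is assumed to be available as a dependency; only the option names come from the list above.

```elixir
defmodule MyApp.SpeechServing do
  # Hypothetical wrapper for illustration; the four arguments are the
  # values returned by the corresponding Bumblebee.load_* functions.
  def build(model_info, featurizer, tokenizer, generation_config) do
    Bumblebee.Audio.speech_to_text(model_info, featurizer, tokenizer, generation_config,
      # Fixed seed (arbitrary value) so that sampling is repeatable.
      seed: 0,
      # Compile during initialization for batches of up to 4 inputs.
      compile: [batch_size: 4],
      # Assumes the :exla dependency is installed.
      defn_options: [compiler: EXLA]
    )
  end
end
```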

Examples

{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer, generation_config,
    defn_options: [compiler: EXLA]
  )

Nx.Serving.run(serving, {:file, "/path/to/audio.wav"})
#=> %{results: [%{text: "There is a cat outside the window."}]}
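
The output map can be unpacked with ordinary pattern matching. The sketch below uses a hypothetical TranscriptText helper that is not part of Bumblebee. Its second clause assumes that running the serving on a list of inputs yields a list of output maps, one per input, which this page does not spell out.

```elixir
defmodule TranscriptText do
  # Hypothetical helper for illustration only; not part of Bumblebee.

  # A single output map: collect the text of every result.
  def extract(%{results: results}) when is_list(results) do
    Enum.map(results, fn %{text: text} -> text end)
  end

  # A list of output maps (assumed shape for a list of inputs).
  def extract(outputs) when is_list(outputs) do
    Enum.map(outputs, &extract/1)
  end
end

output = %{results: [%{text: "There is a cat outside the window."}]}
TranscriptText.extract(output)
#=> ["There is a cat outside the window."]
```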