Membrane.Gemini.Bin (Membrane Gemini plugin v0.1.1)

A Membrane Bin for integrating with Google's Gemini Live API.

Session lifecycle

A Gemini.Live.Session is started during element initialisation and connected when the bin enters the :playing state. The session runs for the lifetime of the element.

If the server sends a go_away message, the session is transparently restarted. When a resume handle is available the new session picks up the previous conversation context; otherwise the session starts fresh.

The parent can send a :reset_session notification to Membrane.Gemini.Bin at any time to force an immediate session restart.
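As a minimal sketch of how a parent might trigger such a restart, the pipeline below forwards a message to the bin using Membrane's `notify_child` action. The module name, the `:gemini` child name and the `:restart_conversation` message are assumptions made for illustration; only the `:reset_session` notification comes from this documentation.

```elixir
defmodule MyApp.AssistantPipeline do
  use Membrane.Pipeline

  # `:gemini` is a hypothetical name given to Membrane.Gemini.Bin
  # in this pipeline's spec (the spec itself is omitted here).

  @impl true
  def handle_info(:restart_conversation, _ctx, state) do
    # Forward the documented :reset_session notification to the bin.
    {[notify_child: {:gemini, :reset_session}], state}
  end
end
```

With this in place, any process holding the pipeline's pid could request a restart with `send(pipeline_pid, :restart_conversation)`.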

In addition to audio buffers, the :output pad emits events relevant to the streamed response:

End of stream

End of stream (EOS) is propagated to the :output pad once both input pads have received EOS and the model is not currently generating a response. If a response is in progress when the second input pad receives EOS, propagation is deferred until the current turn finishes.
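The rule above can be modelled as a small state machine. The sketch below is purely illustrative and is not the bin's actual implementation; the module name `EosGateSketch` and all of its functions are hypothetical.

```elixir
defmodule EosGateSketch do
  # Illustrative model only, not the bin's actual implementation.
  defstruct pending_inputs: [:text_input, :audio_input],
            generating?: false,
            eos_sent?: false

  # An input pad received end of stream.
  def input_eos(%__MODULE__{} = gate, pad) do
    maybe_propagate(%{gate | pending_inputs: List.delete(gate.pending_inputs, pad)})
  end

  # The model started generating a response.
  def turn_started(%__MODULE__{} = gate), do: %{gate | generating?: true}

  # The current turn finished; a deferred EOS may now go out.
  def turn_finished(%__MODULE__{} = gate) do
    maybe_propagate(%{gate | generating?: false})
  end

  # EOS goes out only when no input is pending and no response is in progress.
  defp maybe_propagate(%__MODULE__{eos_sent?: false, generating?: false} = gate) do
    if gate.pending_inputs == [], do: %{gate | eos_sent?: true}, else: gate
  end

  defp maybe_propagate(gate), do: gate
end
```

In this model, if both inputs end while a turn is in progress, `eos_sent?` stays `false` until `turn_finished/1` is called.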

Bin options

Passed via struct Membrane.Gemini.Bin.t/0

  • mode

    :paced | :raw

    Default value: :paced
    Whether the element should output audio as a continuous, real-time stream that interleaves the response audio with silence (:paced), or output only the response audio buffers as they arrive (:raw).

    The bin stops on-server response generation upon interruption by the user (barge-in).

    In :raw mode, received response buffers are sent downstream immediately. A developer who wants to detect and discard buffers from an interrupted response therefore has to provide a custom mechanism, e.g. by adding a response UID to the buffer's metadata.

    :paced mode discards buffers from the interrupted response automatically, and as such is preferred for straightforward LLM integrations.

  • model

    nil | String.t()

    Default value: "gemini-2.5-flash-native-audio-latest"
    Name of the model that should be used. For details, see Gemini.Live.Models.

  • system_instruction

    nil | String.t()

    Default value: nil
    The system instruction that will be attached to each prompt for the model to follow.

  • extra_opts

    Keyword.t()

    Default value: []
    Extra options that will be passed to Gemini.Live.Session.start_link/1.

    NOTE: The bin relies on the following fields to have specific values:

    1. generation_config.response_modalities == [:audio]
    2. realtime_input_config.automatic_activity_detection.disabled == false

    Changing them may break functionality.

    Examples:

    Changing the voice

    %Membrane.Gemini.Bin{
      extra_opts: [
        generation_config: %{
          # This has to be set
          response_modalities: [:audio],
          speech_config: %{
            voice_config: %{
              prebuilt_voice_config: %{
                voice_name: "Sadachbia"
              }
            }
          }
        }
      ]
    }

    Enabling thinking (off by default for Gemini 3 models)

    %Membrane.Gemini.Bin{
      extra_opts: [
        generation_config: %{
          # This has to be set
          response_modalities: [:audio],
          thinking_config: %{
            thinking_budget: 1024,
            include_thoughts: true
          }
        }
      ]
    }

    Enabling context window compression

    %Membrane.Gemini.Bin{
      extra_opts: [
        context_window_compression: %{
          trigger_tokens: 16_000,
          sliding_window: %{
            target_tokens: 8_000
          }
        }
      ]
    }

    Fine-tuning automatic VAD

    %Membrane.Gemini.Bin{
      extra_opts: [
        realtime_input_config: %{
          automatic_activity_detection: %{
            start_of_speech_sensitivity: :high,
            end_of_speech_sensitivity: :low,
            prefix_padding_ms: 100,
            silence_duration_ms: 500
          }
        }
      ]
    }

Pads

:text_input

Accepted formats:

%Membrane.RemoteStream{type: :bytestream}
Direction: :input
Availability: :always

:audio_input

Accepted formats:

%RawAudio{sample_format: :s16le, channels: 1, sample_rate: 16000}
Direction: :input
Availability: :always

:output

Accepted formats:

%RawAudio{sample_format: :s16le, channels: 1, sample_rate: 24000}
Direction: :output
Availability: :always
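A minimal sketch of how the three pads might be linked in a parent pipeline is shown below. `MyApp.MicSource`, `MyApp.TextSource` and `MyApp.SpeakerSink` are hypothetical elements assumed to produce or consume the stream formats listed above, and the child names and option values are chosen only for illustration. Because all three pads have :always availability, each of them has to be linked.

```elixir
defmodule MyApp.VoiceAssistant do
  use Membrane.Pipeline

  @impl true
  def handle_init(_ctx, _opts) do
    spec = [
      # Hypothetical source of 16 kHz mono s16le audio.
      child(:mic, MyApp.MicSource)
      |> via_in(:audio_input)
      |> child(:gemini, %Membrane.Gemini.Bin{
        mode: :paced,
        system_instruction: "Answer briefly."
      })
      |> via_out(:output)
      # Hypothetical sink accepting 24 kHz mono s16le audio.
      |> child(:speaker, MyApp.SpeakerSink),

      # Hypothetical source of a %Membrane.RemoteStream{type: :bytestream}.
      child(:prompts, MyApp.TextSource)
      |> via_in(:text_input)
      |> get_child(:gemini)
    ]

    {[spec: spec], %{}}
  end
end
```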

Types

t()

@type t() :: %Membrane.Gemini.Bin{
  extra_opts: Keyword.t(),
  mode: :paced | :raw,
  model: nil | String.t(),
  system_instruction: nil | String.t()
}

Struct containing options for Membrane.Gemini.Bin

Functions

options()

@spec options() :: keyword()

Returns description of options available for this module