Membrane.Gemini.Bin
(Membrane Gemini plugin v0.1.1)
A Membrane Bin for integrating with Google's Gemini Live API.
Session lifecycle
A Gemini.Live.Session is started during element initialisation and connected when
the bin enters the :playing state. The session runs for the lifetime of the element.
If the server sends a go_away message, the session is transparently restarted.
When a resume handle is available, the new session picks up the previous conversation context;
otherwise, the session starts fresh.
The parent can send a :reset_session notification to Membrane.Gemini.Bin
at any time to force an immediate session restart.
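A minimal sketch of forcing a restart from the parent, assuming Membrane Core 1.x (where a parent addresses a child through the `notify_child` action). The module name, the `:gemini` child name, and the `:reset_conversation` message are hypothetical and chosen only for illustration; the pipeline spec that spawns the bin is omitted.

```elixir
defmodule MyApp.ResetSketch do
  # Hypothetical parent pipeline; only the reset path is shown.
  use Membrane.Pipeline

  @impl true
  def handle_init(_ctx, _opts) do
    # A real pipeline would return a spec here that spawns
    # Membrane.Gemini.Bin under the child name :gemini.
    {[], %{}}
  end

  @impl true
  def handle_info(:reset_conversation, _ctx, state) do
    # :reset_conversation is an arbitrary message chosen for this sketch.
    # The action delivers :reset_session to the child named :gemini.
    {[notify_child: {:gemini, :reset_session}], state}
  end
end
```

With such a pipeline running, `send(pipeline_pid, :reset_conversation)` would trigger the restart.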
In addition to audio buffers, the :output pad emits events relevant to the streamed response:
Membrane.Gemini.Events.ResponseStart: signals the beginning of a new model turn.
Membrane.Gemini.Events.ResponseEnd: signals turn completion or barge-in interruption (interrupted?: true).
Membrane.Gemini.Events.Thinking: carries intermediate thinking text when the model's thinking mode is enabled.
Membrane.Gemini.Events.Transcript: carries transcription segments for both input audio (audio_origin: :client) and the model's audio output (audio_origin: :server).
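One way to observe these events is a pass-through filter placed downstream of the :output pad. The sketch below assumes Membrane Core 1.x; the module name is hypothetical, and only the fields mentioned above (`interrupted?` and `audio_origin`) are matched, since the remaining fields of the event structs are not described here.

```elixir
defmodule MyApp.GeminiEventLogger do
  # Hypothetical filter: logs Gemini events and passes everything through.
  use Membrane.Filter

  require Logger

  alias Membrane.Gemini.Events.{ResponseEnd, ResponseStart, Thinking, Transcript}

  def_input_pad :input, accepted_format: _any
  def_output_pad :output, accepted_format: _any

  @impl true
  def handle_buffer(:input, buffer, _ctx, state) do
    # Audio buffers are forwarded untouched.
    {[buffer: {:output, buffer}], state}
  end

  @impl true
  def handle_event(:input, %ResponseStart{} = event, _ctx, state) do
    Logger.info("model turn started")
    {[forward: event], state}
  end

  def handle_event(:input, %ResponseEnd{interrupted?: interrupted?} = event, _ctx, state) do
    Logger.info("model turn ended (interrupted?: #{inspect(interrupted?)})")
    {[forward: event], state}
  end

  def handle_event(:input, %Thinking{} = event, _ctx, state) do
    Logger.debug("thinking: #{inspect(event)}")
    {[forward: event], state}
  end

  def handle_event(:input, %Transcript{audio_origin: origin} = event, _ctx, state) do
    Logger.info("transcript from #{inspect(origin)}: #{inspect(event)}")
    {[forward: event], state}
  end

  def handle_event(_pad, event, _ctx, state) do
    # Any other event is forwarded unchanged.
    {[forward: event], state}
  end
end
```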
End of stream
EOS is propagated to the :output pad once all input pads have received EOS and
the model is not currently generating a response. If a response is in progress when
EOS arrives on both inputs, propagation is deferred until the current turn finishes.
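The deferral rule can be pictured as a small state machine. The following is an illustrative model in plain Elixir, not the bin's actual implementation; the module, its functions, and the returned action lists are assumptions made for this sketch.

```elixir
defmodule EosSketch do
  # Illustrative model of the rule above; not the bin's real code.
  defstruct pending_inputs: MapSet.new([:audio_input, :text_input]),
            responding?: false,
            eos_sent?: false

  def new, do: %__MODULE__{}

  # An input pad reached end of stream.
  def input_eos(state, pad) do
    maybe_propagate(%{state | pending_inputs: MapSet.delete(state.pending_inputs, pad)})
  end

  # The model started generating a turn.
  def response_started(state), do: {[], %{state | responding?: true}}

  # The current turn finished (or was interrupted).
  def response_ended(state), do: maybe_propagate(%{state | responding?: false})

  # EOS goes out once: all inputs are done and no turn is in progress.
  defp maybe_propagate(%{eos_sent?: false, responding?: false} = state) do
    if MapSet.size(state.pending_inputs) == 0 do
      {[end_of_stream: :output], %{state | eos_sent?: true}}
    else
      {[], state}
    end
  end

  defp maybe_propagate(state), do: {[], state}
end

# EOS arrives on both inputs while a turn is in progress, so it is deferred.
state = EosSketch.new()
{[], state} = EosSketch.response_started(state)
{[], state} = EosSketch.input_eos(state, :audio_input)
{[], state} = EosSketch.input_eos(state, :text_input)
{[end_of_stream: :output], _state} = EosSketch.response_ended(state)
```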
Bin options
Passed via struct Membrane.Gemini.Bin.t/0
mode
:paced | :raw
Default value: :paced
Whether the element should output audio as a continuous, real-time stream, intertwining the response audio with silence (:paced), or just the response audio buffers as they come (:raw).
The bin stops on-server response generation upon interruption by the user (barge-in).
When the bin is working in :raw mode, received response buffers are immediately sent downstream in the pipeline, and so the developer has to provide custom mechanisms to detect and get rid of buffers from the interrupted response, if they choose to do so, e.g. by adding a response UID to the buffer's metadata.
:paced mode discards buffers from the interrupted response automatically, and as such is preferred for straightforward LLM integrations.

model
nil | String.t()
Default value: "gemini-2.5-flash-native-audio-latest"
Name of the model that should be used. For details, see Gemini.Live.Models. Defaults to "gemini-2.5-flash-native-audio-latest".

system_instruction
nil | String.t()
Default value: nil
The system instruction that will be attached to each prompt for the model to follow.

extra_opts
Keyword.t()
Default value: []
Extra options that will be passed to Gemini.Live.Session.start_link/1.
NOTE: The bin relies on the following fields to have specific values:
generation_config.response_modalities == [:audio]
realtime_input_config.automatic_activity_detection.disabled == false
Changing them may break functionality.
Examples:
Changing the voice

    %Membrane.Gemini.Bin{
      extra_opts: [
        generation_config: %{
          # This has to be set
          response_modalities: [:audio],
          speech_config: %{
            voice_config: %{
              prebuilt_voice_config: %{
                voice_name: "Sadachbia"
              }
            }
          }
        }
      ]
    }

Enabling thinking (off by default for Gemini 3 models)

    %Membrane.Gemini.Bin{
      extra_opts: [
        generation_config: %{
          # This has to be set
          response_modalities: [:audio],
          thinking_config: %{
            thinking_budget: 1024,
            include_thoughts: true
          }
        }
      ]
    }

Enabling context window compression

    %Membrane.Gemini.Bin{
      extra_opts: [
        context_window_compression: %{
          trigger_tokens: 16_000,
          sliding_window: %{
            target_tokens: 8_000
          }
        }
      ]
    }

Fine-tuning automatic VAD

    %Membrane.Gemini.Bin{
      extra_opts: [
        realtime_input_config: %{
          automatic_activity_detection: %{
            start_of_speech_sensitivity: :high,
            end_of_speech_sensitivity: :low,
            prefix_padding_ms: 100,
            silence_duration_ms: 500
          }
        }
      ]
    }
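For :raw mode, the response-UID approach mentioned under the mode option could look roughly like the sketch below. It is plain Elixir with buffers modelled as maps; the `:response_id` metadata key and both helper functions are hypothetical, and how the UID is attached to real buffers is left to the developer.

```elixir
defmodule RawModeSketch do
  # Illustrative only: buffers are plain maps here, and :response_id is a
  # hypothetical metadata key that the developer would attach themselves.

  # Remembers that a response was interrupted (for example after a
  # ResponseEnd event with interrupted?: true).
  def mark_interrupted(interrupted, response_id) do
    MapSet.put(interrupted, response_id)
  end

  # Keeps only the buffers that do not belong to an interrupted response.
  def drop_interrupted(buffers, interrupted) do
    Enum.reject(buffers, fn buffer ->
      MapSet.member?(interrupted, buffer.metadata.response_id)
    end)
  end
end

queued = [
  %{payload: <<1, 2>>, metadata: %{response_id: "r1"}},
  %{payload: <<3, 4>>, metadata: %{response_id: "r1"}},
  %{payload: <<5, 6>>, metadata: %{response_id: "r2"}}
]

interrupted = RawModeSketch.mark_interrupted(MapSet.new(), "r1")
remaining = RawModeSketch.drop_interrupted(queued, interrupted)
# remaining now holds only the buffer tagged "r2"
```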
Pads
:text_input
| Accepted formats: | %Membrane.RemoteStream{type: :bytestream} |
| Direction: | :input |
| Availability: | :always |
:audio_input
| Accepted formats: | %RawAudio{sample_format: :s16le, channels: 1, sample_rate: 16000} |
| Direction: | :input |
| Availability: | :always |
:output
| Accepted formats: | %RawAudio{sample_format: :s16le, channels: 1, sample_rate: 24000} |
| Direction: | :output |
| Availability: | :always |
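A sketch of how the three pads might be linked in a parent pipeline, assuming Membrane Core 1.x. `MyApp.MicSource`, `MyApp.PromptSource`, and `MyApp.SpeakerSink` are hypothetical elements standing in for whatever produces and consumes the formats listed above; they are not part of this plugin.

```elixir
defmodule MyApp.AssistantPipeline do
  # Hypothetical pipeline wiring both inputs and the output of the bin.
  use Membrane.Pipeline

  @impl true
  def handle_init(_ctx, _opts) do
    spec = [
      # Hypothetical source producing s16le, mono, 16 kHz raw audio.
      child(:mic, MyApp.MicSource)
      |> via_in(:audio_input)
      |> child(:gemini, %Membrane.Gemini.Bin{mode: :paced})
      # Hypothetical sink accepting s16le, mono, 24 kHz raw audio.
      |> child(:speaker, MyApp.SpeakerSink),

      # Hypothetical source emitting text as a bytestream.
      child(:prompts, MyApp.PromptSource)
      |> via_in(:text_input)
      |> get_child(:gemini)
    ]

    {[spec: spec], %{}}
  end
end
```

Both input pads have :always availability, so both need to be linked even when only one kind of input is used in practice.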
Types
@type t() :: %Membrane.Gemini.Bin{
  extra_opts: Keyword.t(),
  mode: :paced | :raw,
  model: nil | String.t(),
  system_instruction: nil | String.t()
}
Struct containing options for Membrane.Gemini.Bin
Functions
@spec options() :: keyword()
Returns a description of the options available for this module.