ReqLLM.Bedrock.NovaSonic
(ReqLLM v1.17.0)
Amazon Nova Sonic speech-to-speech client over Bedrock bidirectional streaming.
Nova Sonic is driven by an ordered sequence of JSON events on a single HTTP/2
bidirectional stream (see ReqLLM.Bedrock.BidiStream). This module provides the
event builders and a small orchestration API on top:
    alias ReqLLM.Bedrock.NovaSonic

    {:ok, s} =
      NovaSonic.start("amazon.nova-sonic-v1:0",
        system_prompt: "You are a terse assistant.",
        region: "us-east-1"
      )

    # Continuous input: open one audio block, stream frames into it, close it.
    {:ok, s} = NovaSonic.start_audio(s)
    :ok = NovaSonic.send_audio(s, pcm_16bit_mono_chunks)
    {:ok, s} = NovaSonic.end_audio(s)
    # (or, one-shot: {:ok, s} = NovaSonic.audio_turn(s, pcm_16bit_mono_chunks))

    # Closes any open audio, then sends promptEnd + sessionEnd.
    :ok = NovaSonic.finish(s)

    # Consume model output (transcription, text, base64 audio).
    Stream.repeatedly(fn -> NovaSonic.next(s) end)
    |> Enum.take_while(&match?({:ok, _}, &1))

Event schemas follow the Nova Sonic v1 bidirectional API:
https://docs.aws.amazon.com/nova/latest/userguide/input-events.html
https://docs.aws.amazon.com/nova/latest/userguide/output-events.html
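The output pipeline above stops at a list of {:ok, event} results. As a sketch of a possible next step, the hypothetical module below (not part of ReqLLM) folds normalized {type, body} events into transcript lines and raw PCM bytes. It assumes, based on the AWS output-events docs, that textOutput and audioOutput bodies carry a "content" key and that textOutput also carries a "role" key; check these against the events you actually receive.

```elixir
defmodule NovaSonicOutput do
  # Hypothetical helper, not part of ReqLLM. Assumes textOutput and
  # audioOutput bodies carry a "content" key and textOutput a "role" key,
  # as described in the AWS Nova Sonic output-events docs.

  @doc "Folds {type, body} events into %{text: [{role, text}], audio: pcm_binary}."
  def collect(events) do
    acc = Enum.reduce(events, %{text: [], audio: []}, &step/2)
    # Both lists were built by prepending, so reverse to restore arrival order.
    %{
      text: Enum.reverse(acc.text),
      audio: acc.audio |> Enum.reverse() |> IO.iodata_to_binary()
    }
  end

  defp step({"textOutput", %{"content" => text} = body}, acc),
    do: %{acc | text: [{Map.get(body, "role"), text} | acc.text]}

  # Audio arrives base64-encoded; decode it back to raw LPCM bytes.
  defp step({"audioOutput", %{"content" => b64}}, acc),
    do: %{acc | audio: [Base.decode64!(b64) | acc.audio]}

  # Everything else (contentStart, contentEnd, bare maps, ...) is skipped.
  defp step(_other, acc), do: acc
end
```

It would slot in after the take_while step, for example by piping through Enum.map(fn {:ok, event} -> event end) and then NovaSonicOutput.collect/1.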
Summary
Functions
audio_turn/3
One-shot convenience: open an audio content block, stream all chunks, and
close it. For continuous/multi-turn input, use start_audio/2 + send_audio/3
and keep the block open.
end_audio/1
Closes the open audio content block (contentEnd). No-op if none is open.
finish/1
Ends the interaction: closes any open audio content block, then sends
promptEnd + sessionEnd and half-closes the request stream.
next/2
Pulls the next output event, normalized to {type, body} (e.g.
{"textOutput", %{...}}), or returns :halt / {:error, reason}.
send_audio/3
Streams PCM frames (a binary or list of binaries, 16-bit mono LPCM) into the
open audio content block. Use :pace_ms to pace frames at roughly mic cadence.
start/2
Opens a bidirectional stream and runs the opening handshake:
sessionStart, promptStart, and the SYSTEM text prompt.
start_audio/2
Opens a single USER audio content block (contentStart[USER/AUDIO]) and
remembers its contentName on the session.
Functions
@spec audio_turn(ReqLLM.Bedrock.NovaSonic.Session.t(), iodata() | [binary()], keyword()) :: {:ok, ReqLLM.Bedrock.NovaSonic.Session.t()} | {:error, term()}
One-shot convenience: open an audio content block, stream all chunks, and
close it. For continuous/multi-turn input, use start_audio/2 + send_audio/3
and keep the block open.
@spec close(ReqLLM.Bedrock.NovaSonic.Session.t()) :: :ok
@spec end_audio(ReqLLM.Bedrock.NovaSonic.Session.t()) :: {:ok, ReqLLM.Bedrock.NovaSonic.Session.t()} | {:error, term()}
Closes the open audio content block (contentEnd). No-op if none is open.
@spec finish(ReqLLM.Bedrock.NovaSonic.Session.t()) :: :ok | {:error, term()}
Ends the interaction: closes any open audio content block, then sends
promptEnd + sessionEnd and half-closes the request stream.
@spec next(ReqLLM.Bedrock.NovaSonic.Session.t(), non_neg_integer()) :: {:ok, {String.t(), map()}} | {:ok, map()} | :halt | {:error, term()}
Pulls the next output event, normalized to {:ok, {type, body}} (e.g.
{:ok, {"textOutput", %{...}}}), or returns :halt or {:error, reason}. The spec
also allows a bare {:ok, map()} result.
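Because next/2 can return four shapes ({:ok, {type, body}}, {:ok, map()}, :halt, {:error, reason}), an explicit loop may be clearer than Stream.repeatedly when you also want to keep the error. The hypothetical module below (not part of ReqLLM) takes a zero-arity function standing in for fn -> NovaSonic.next(s) end, so the loop can be exercised without a live Bedrock stream.

```elixir
defmodule NovaSonicDrain do
  # Hypothetical sketch, not part of ReqLLM. `next_fun` stands in for
  # fn -> ReqLLM.Bedrock.NovaSonic.next(session) end so the loop can run
  # without a live Bedrock stream.

  @doc "Collects events until :halt ({:ok, events}) or an error ({:error, reason, events})."
  def drain(next_fun, acc \\ []) do
    case next_fun.() do
      # Normalized event: {type, body}.
      {:ok, {_type, _body} = event} -> drain(next_fun, [event | acc])
      # The spec also allows a bare map; keep it untouched.
      {:ok, %{} = raw} -> drain(next_fun, [raw | acc])
      :halt -> {:ok, Enum.reverse(acc)}
      # Return what was received before the failure alongside the reason.
      {:error, reason} -> {:error, reason, Enum.reverse(acc)}
    end
  end
end
```

Returning the partial event list with the error is a design choice of this sketch: transcription text that arrived before a dropped connection is still available to the caller.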
@spec send_audio(ReqLLM.Bedrock.NovaSonic.Session.t(), iodata() | [binary()], keyword()) :: :ok | {:error, term()}
Streams PCM frames (a binary or list of binaries, 16-bit mono LPCM) into the
open audio content block. Use :pace_ms to pace frames at roughly mic cadence.
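send_audio/3 accepts a list of binaries, so one plausible way to feed it recorded audio is to cut the PCM into fixed-duration frames first. The hypothetical helper below (not part of ReqLLM) does that for 16-bit mono LPCM. The 16 kHz default input rate is an assumption; use whatever rate your audio source and session are actually configured for.

```elixir
defmodule PcmFrames do
  # Hypothetical helper, not part of ReqLLM. The 16_000 Hz default input
  # rate is an assumption; use the rate your audio source actually produces.

  # 16-bit samples, single channel.
  @bytes_per_sample 2

  @doc "Bytes in a frame of `frame_ms` milliseconds at `rate` Hz (whole samples only)."
  def frame_bytes(frame_ms, rate \\ 16_000),
    do: div(rate * frame_ms, 1000) * @bytes_per_sample

  @doc "Splits a PCM binary into `frame_ms` frames; the last frame may be shorter."
  def split(pcm, frame_ms, rate \\ 16_000) when is_binary(pcm) do
    case frame_bytes(frame_ms, rate) do
      size when size > 0 -> chunk(pcm, size, [])
      _ -> raise ArgumentError, "frame_ms too short to hold one sample at #{rate} Hz"
    end
  end

  defp chunk(<<>>, _size, acc), do: Enum.reverse(acc)
  defp chunk(pcm, size, acc) when byte_size(pcm) <= size, do: Enum.reverse([pcm | acc])

  defp chunk(pcm, size, acc) do
    <<frame::binary-size(size), rest::binary>> = pcm
    chunk(rest, size, [frame | acc])
  end
end
```

With a live session this might be used as NovaSonic.send_audio(s, PcmFrames.split(pcm, 20), pace_ms: 20); matching the frame duration to :pace_ms is an assumption about how pacing is applied, not documented behavior.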
@spec start(String.t(), keyword()) :: {:ok, ReqLLM.Bedrock.NovaSonic.Session.t()} | {:error, term()}
Opens a bidirectional stream and runs the opening handshake:
sessionStart, promptStart, and the SYSTEM text prompt.
Options: :system_prompt, :voice_id, :max_tokens, :top_p, :temperature,
:output_sample_rate, plus anything BidiStream.connect/2 accepts
(:credentials, :region).
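Since start/2 mixes session options with connection options, a caller building options from configuration may want to separate the two. The hypothetical sketch below (not ReqLLM's code; how start/2 actually routes options is an assumption) uses Keyword.split/2 to pull out the session keys listed above and leave everything else, such as :credentials and :region, for BidiStream.connect/2.

```elixir
defmodule NovaSonicOpts do
  # Hypothetical sketch, not ReqLLM's implementation. The session keys are
  # the ones listed in the start/2 docs; everything else is assumed to be
  # passed through to BidiStream.connect/2.
  @session_keys [
    :system_prompt,
    :voice_id,
    :max_tokens,
    :top_p,
    :temperature,
    :output_sample_rate
  ]

  @doc "Returns {session_opts, connect_opts}, preserving the caller's key order."
  def split(opts) when is_list(opts), do: Keyword.split(opts, @session_keys)
end
```

The values used in an actual call (voice names, sample rates, token limits) depend on the model and are not fixed by this sketch.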
@spec start_audio(ReqLLM.Bedrock.NovaSonic.Session.t(), keyword()) :: {:ok, ReqLLM.Bedrock.NovaSonic.Session.t()} | {:error, term()}
Opens a single USER audio content block (contentStart[USER/AUDIO]) and
remembers its contentName on the session.
Nova Sonic models continuous microphone input as one audio container that
stays open across the whole interaction; barge-in and multi-turn detection
happen within it via the server's voice activity detection (VAD). Open it
once, stream frames with send_audio/3, and close it with end_audio/1 (or
finish/1).
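The single-container rule, together with end_audio/1 being a no-op when nothing is open and finish/1 closing audio before promptEnd + sessionEnd, implies a fixed input-event order. The hypothetical pure model below (not ReqLLM's implementation) records only event names, which can help when testing code that drives a session. Sending the SYSTEM prompt as its own contentStart/textInput/contentEnd block and one audioInput event per frame follow the AWS input-events docs; the contentName format is made up.

```elixir
defmodule AudioBlockModel do
  # Hypothetical, pure model of the event order described above; it records
  # event names only and is not ReqLLM's implementation. Names follow the
  # AWS Nova Sonic input-events docs.
  defstruct audio: nil, seq: 0, sent: []

  # start/2's handshake: sessionStart, promptStart, then the SYSTEM text
  # prompt as its own contentStart/textInput/contentEnd block.
  @handshake ["sessionStart", "promptStart", "contentStart[SYSTEM/TEXT]", "textInput", "contentEnd"]

  def start, do: Enum.reduce(@handshake, %__MODULE__{}, fn ev, s -> push(s, ev) end)

  # Opens the single USER audio block and stores a made-up contentName.
  # This model refuses a second open with a FunctionClauseError.
  def start_audio(%__MODULE__{audio: nil} = s) do
    %{s | audio: "audio-#{s.seq + 1}", seq: s.seq + 1} |> push("contentStart[USER/AUDIO]")
  end

  # One audioInput event per frame, only while a block is open.
  def send_audio(%__MODULE__{audio: name} = s, frames) when is_binary(name),
    do: Enum.reduce(frames, s, fn _frame, acc -> push(acc, "audioInput") end)

  # contentEnd for the open block; a no-op when none is open.
  def end_audio(%__MODULE__{audio: nil} = s), do: s
  def end_audio(%__MODULE__{} = s), do: %{push(s, "contentEnd") | audio: nil}

  # Close any open audio block, then promptEnd + sessionEnd.
  def finish(s), do: s |> end_audio() |> push("promptEnd") |> push("sessionEnd")

  # Events in the order they were sent.
  def events(s), do: Enum.reverse(s.sent)

  defp push(s, event), do: %{s | sent: [event | s.sent]}
end
```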