ReqLLM.Bedrock.NovaSonic (ReqLLM v1.17.0)


Amazon Nova Sonic speech-to-speech client over Bedrock bidirectional streaming.

Nova Sonic is driven by an ordered sequence of JSON events on a single HTTP/2 bidirectional stream (see ReqLLM.Bedrock.BidiStream). This module provides the event builders and a small orchestration API on top of that stream:

```elixir
alias ReqLLM.Bedrock.NovaSonic

{:ok, s} = NovaSonic.start("amazon.nova-sonic-v1:0",
             system_prompt: "You are a terse assistant.",
             region: "us-east-1")

# Continuous input: open one audio block, stream frames into it, close it.
# pcm_16bit_mono_chunks is a placeholder for your own PCM binary or list of binaries.
{:ok, s} = NovaSonic.start_audio(s)
:ok = NovaSonic.send_audio(s, pcm_16bit_mono_chunks)
{:ok, s} = NovaSonic.end_audio(s)
# (or, one-shot: {:ok, s} = NovaSonic.audio_turn(s, pcm_16bit_mono_chunks))

:ok = NovaSonic.finish(s)   # closes any open audio, then promptEnd + sessionEnd

# Consume model output (transcription, text, base64 audio).
Stream.repeatedly(fn -> NovaSonic.next(s) end)
|> Enum.take_while(&match?({:ok, _}, &1))
```

Event schemas follow the Nova Sonic v1 bidirectional API:

Input events: https://docs.aws.amazon.com/nova/latest/userguide/input-events.html
Output events: https://docs.aws.amazon.com/nova/latest/userguide/output-events.html
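To make the ordering concrete, the hypothetical sketch below builds the full input-event sequence for one session as plain Elixir maps in the `%{"event" => %{name => payload}}` envelope used by the AWS input-event docs. The module `NovaSonicEventOrder`, its helpers, the fixed content names, and the trimmed payloads are all assumptions for illustration; they are not ReqLLM's event builders, and real events carry more configuration fields.

```elixir
defmodule NovaSonicEventOrder do
  # Hypothetical sketch of Nova Sonic input-event ordering for one session.
  # Payloads are trimmed to the fields that show the structure; real events
  # carry more configuration (see the AWS input-event docs).

  # Wrap a payload in the {"event": {name: payload}} envelope.
  defp event(name, payload), do: %{"event" => %{name => payload}}

  # Full event list for one SYSTEM text prompt and a list of base64 audio frames.
  def session_events(prompt_name, system_prompt, b64_frames) do
    sys = "system-1"  # hypothetical contentName for the SYSTEM text block
    mic = "audio-1"   # hypothetical contentName for the USER audio block
    ids = fn content -> %{"promptName" => prompt_name, "contentName" => content} end

    opening = [
      event("sessionStart", %{}),
      event("promptStart", %{"promptName" => prompt_name}),
      event("contentStart", Map.merge(ids.(sys), %{"type" => "TEXT", "role" => "SYSTEM"})),
      event("textInput", Map.put(ids.(sys), "content", system_prompt)),
      event("contentEnd", ids.(sys))
    ]

    # One audio container: contentStart, N audioInput frames, contentEnd.
    audio =
      [event("contentStart", Map.merge(ids.(mic), %{"type" => "AUDIO", "role" => "USER"}))] ++
        Enum.map(b64_frames, &event("audioInput", Map.put(ids.(mic), "content", &1))) ++
        [event("contentEnd", ids.(mic))]

    closing = [event("promptEnd", %{"promptName" => prompt_name}), event("sessionEnd", %{})]

    opening ++ audio ++ closing
  end

  # Event names only, in order.
  def names(events), do: Enum.map(events, fn %{"event" => e} -> e |> Map.keys() |> hd() end)
end
```

Calling `NovaSonicEventOrder.session_events("p1", "Be terse.", frames) |> NovaSonicEventOrder.names()` shows the nesting at a glance: session, then prompt, then content blocks, each closed in reverse order of opening.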

Summary

Functions

audio_turn(session, chunks, opts \\ [])
One-shot convenience: opens an audio content block, streams all chunks into it, and closes it. For continuous or multi-turn input, use start_audio/2 and send_audio/3 instead and keep the block open.

close(session)

end_audio(session)
Closes the open audio content block (contentEnd). No-op if none is open.

finish(session)
Ends the interaction: closes any open audio content block, then sends promptEnd + sessionEnd and half-closes the request stream.

next(session, timeout \\ 30000)
Pulls the next output event, normalized to {type, body} (e.g. {"textOutput", %{...}}); otherwise returns :halt or {:error, reason}.

send_audio(session, pcm, opts \\ [])
Streams PCM frames (a binary or list of binaries, 16-bit mono LPCM) into the open audio content block. Use :pace_ms to pace frames at roughly microphone cadence.

start(model_id \\ "amazon.nova-sonic-v1:0", opts \\ [])
Opens a bidirectional stream and runs the opening handshake: sessionStart, promptStart, and the SYSTEM text prompt.

start_audio(session, opts \\ [])
Opens a single USER audio content block (contentStart[USER/AUDIO]) and remembers its contentName on the session.

Functions

audio_turn(session, chunks, opts \\ [])

@spec audio_turn(
  ReqLLM.Bedrock.NovaSonic.Session.t(),
  iodata() | [binary()],
  keyword()
) ::
  {:ok, ReqLLM.Bedrock.NovaSonic.Session.t()} | {:error, term()}

One-shot convenience: opens an audio content block, streams all chunks into it, and closes it. For continuous or multi-turn input, use start_audio/2 and send_audio/3 instead and keep the block open.
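The one-shot call can be read as a `with` chain over the three continuous-input steps. The sketch below assumes that composition; `AudioTurnSketch` is a hypothetical in-memory stand-in that never talks to Bedrock and simply records each "sent" event as a message to the calling process. Its error for sending without an open block is stub behavior, not documented ReqLLM behavior.

```elixir
defmodule AudioTurnSketch do
  # Hypothetical stand-in for a session: it never talks to Bedrock. Each
  # "sent" event is recorded as a message to the calling process instead.
  defp record(name), do: send(self(), {:sent, name})

  def new, do: %{audio_content_name: nil}

  def start_audio(session) do
    record("contentStart")
    {:ok, %{session | audio_content_name: "audio-1"}}
  end

  # Stub behavior (hypothetical): refuse frames when no block is open.
  def send_audio(%{audio_content_name: nil}, _chunks), do: {:error, :no_open_audio_block}

  def send_audio(_session, chunks) do
    # A lone binary counts as one frame; a list contributes one frame per element.
    chunks |> List.wrap() |> Enum.each(fn _frame -> record("audioInput") end)
    :ok
  end

  # No-op when nothing is open, matching the documented end_audio/1 behavior.
  def end_audio(%{audio_content_name: nil} = session), do: {:ok, session}

  def end_audio(session) do
    record("contentEnd")
    {:ok, %{session | audio_content_name: nil}}
  end

  # One-shot turn: open, stream everything, close. An error from any step
  # falls through the `with` unchanged.
  def audio_turn(session, chunks) do
    with {:ok, session} <- start_audio(session),
         :ok <- send_audio(session, chunks) do
      end_audio(session)
    end
  end

  # Drain the recorded event names from the mailbox, oldest first.
  def sent_events(acc \\ []) do
    receive do
      {:sent, name} -> sent_events([name | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end
```

Because the block is closed at the end of each call, repeated `audio_turn` calls open a fresh container per turn, which is exactly what continuous input avoids.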

close(session)

@spec close(ReqLLM.Bedrock.NovaSonic.Session.t()) :: :ok

end_audio(session)

@spec end_audio(ReqLLM.Bedrock.NovaSonic.Session.t()) ::
  {:ok, ReqLLM.Bedrock.NovaSonic.Session.t()} | {:error, term()}

Closes the open audio content block (contentEnd). No-op if none is open.

finish(session)

@spec finish(ReqLLM.Bedrock.NovaSonic.Session.t()) :: :ok | {:error, term()}

Ends the interaction: closes any open audio content block, then sends promptEnd + sessionEnd and half-closes the request stream.
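The closing order can be sketched as a pure function that returns the events finish/1 is documented to send. `FinishSketch.closing_events/1` and the `:prompt_name` / `:audio_content_name` session keys are hypothetical. The half-close itself happens at the transport layer (ReqLLM.Bedrock.BidiStream), so it appears only as a comment.

```elixir
defmodule FinishSketch do
  # Hypothetical: the events finish/1 is documented to send, in order.
  # `session` is a plain map with :prompt_name and :audio_content_name keys.
  def closing_events(%{prompt_name: prompt, audio_content_name: audio}) do
    close_audio =
      case audio do
        # No open audio block: nothing to close (end_audio/1 is a no-op here).
        nil ->
          []

        name ->
          [%{"event" => %{"contentEnd" => %{"promptName" => prompt, "contentName" => name}}}]
      end

    # After writing these, the client half-closes its side of the HTTP/2
    # request stream; model output can still be read with next/2.
    close_audio ++
      [
        %{"event" => %{"promptEnd" => %{"promptName" => prompt}}},
        %{"event" => %{"sessionEnd" => %{}}}
      ]
  end
end
```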

next(session, timeout \\ 30000)

@spec next(ReqLLM.Bedrock.NovaSonic.Session.t(), non_neg_integer()) ::
  {:ok, {String.t(), map()}} | {:ok, map()} | :halt | {:error, term()}

Pulls the next output event, normalized to {type, body} (e.g. {"textOutput", %{...}}); otherwise returns :halt or {:error, reason}.
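A minimal sketch of the {type, body} normalization, assuming each decoded output event is a map whose `"event"` key holds exactly one entry (the envelope shown in the AWS output-event docs), and that `textOutput` and `audioOutput` bodies carry their payload under `"content"`. `NextSketch` is hypothetical, and passing other shapes through as `{:ok, map}` is only a guess at what the spec's `{:ok, map()}` branch covers.

```elixir
defmodule NextSketch do
  # Hypothetical normalizer: %{"event" => %{"textOutput" => body}} -> {"textOutput", body}.
  def normalize(%{"event" => event}) when is_map(event) and map_size(event) == 1 do
    [{type, body}] = Map.to_list(event)
    {:ok, {type, body}}
  end

  # Anything else is passed through as-is (assumption).
  def normalize(other) when is_map(other), do: {:ok, other}

  # Example consumer: collect text and decoded audio bytes from normalized events.
  def collect(results) do
    Enum.reduce(results, %{text: [], audio: <<>>}, fn
      {:ok, {"textOutput", %{"content" => text}}}, acc ->
        %{acc | text: acc.text ++ [text]}

      {:ok, {"audioOutput", %{"content" => b64}}}, acc ->
        %{acc | audio: acc.audio <> Base.decode64!(b64)}

      _other, acc ->
        acc
    end)
  end
end
```

Matching on the type string keeps consumers simple: each event kind gets one clause, and unknown kinds fall through to the catch-all.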

send_audio(session, pcm, opts \\ [])

@spec send_audio(
  ReqLLM.Bedrock.NovaSonic.Session.t(),
  iodata() | [binary()],
  keyword()
) ::
  :ok | {:error, term()}

Streams PCM frames (a binary or list of binaries, 16-bit mono LPCM) into the open audio content block. Use :pace_ms to pace frames at roughly microphone cadence.
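Pacing is easiest to reason about in bytes per millisecond. Assuming a 16 kHz input rate (check the rate your audio block actually declares), 16-bit mono LPCM is 16,000 samples/s × 2 bytes = 32,000 bytes/s, i.e. 32 bytes per millisecond, so a 1,024-byte frame carries 32 ms of audio. The hypothetical `PacingSketch` below splits a PCM binary into fixed-size frames, base64-encodes each into a trimmed `audioInput` payload, and sleeps `pace_ms` between frames. It returns the payloads instead of writing them to a stream and is not ReqLLM's implementation.

```elixir
defmodule PacingSketch do
  @bytes_per_sample 2  # 16-bit LPCM

  # Duration in ms of a mono 16-bit frame at the given sample rate.
  def frame_ms(frame_bytes, sample_rate \\ 16_000) do
    div(frame_bytes * 1000, sample_rate * @bytes_per_sample)
  end

  # Split PCM into frames of `frame_bytes` (the last one may be shorter).
  # Keep frame_bytes even so a 16-bit sample is never split across frames.
  def frames(<<>>, _frame_bytes), do: []
  def frames(pcm, frame_bytes) when byte_size(pcm) <= frame_bytes, do: [pcm]

  def frames(pcm, frame_bytes) do
    <<frame::binary-size(frame_bytes), rest::binary>> = pcm
    [frame | frames(rest, frame_bytes)]
  end

  # Trimmed audioInput payloads (no promptName/contentName), sleeping
  # pace_ms between frames; pace_ms = 0 sends as fast as possible.
  def paced_payloads(pcm, frame_bytes, pace_ms) do
    pcm
    |> frames(frame_bytes)
    |> Enum.with_index()
    |> Enum.map(fn {frame, i} ->
      if i > 0 and pace_ms > 0, do: Process.sleep(pace_ms)
      %{"event" => %{"audioInput" => %{"content" => Base.encode64(frame)}}}
    end)
  end
end
```

Setting `pace_ms` equal to `frame_ms(frame_bytes)` approximates real-time microphone delivery for pre-recorded audio.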

start(model_id \\ "amazon.nova-sonic-v1:0", opts \\ [])

@spec start(
  String.t(),
  keyword()
) :: {:ok, ReqLLM.Bedrock.NovaSonic.Session.t()} | {:error, term()}

Opens a bidirectional stream and runs the opening handshake: sessionStart, promptStart, and the SYSTEM text prompt.

Options: :system_prompt, :voice_id, :max_tokens, :top_p, :temperature, :output_sample_rate, plus anything BidiStream.connect/2 accepts (such as :credentials and :region).
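The event options map naturally onto the sessionStart and promptStart payloads, while transport options configure the connection. In the sketch below, the camelCase field names (`inferenceConfiguration`, `maxTokens`, `topP`, `audioOutputConfiguration`, `sampleRateHertz`, `voiceId`) are assumptions based on the AWS input-event docs; the module `StartOptsSketch` and the choice to omit unset options are hypothetical and say nothing about ReqLLM's defaults.

```elixir
defmodule StartOptsSketch do
  @event_keys [:system_prompt, :voice_id, :max_tokens, :top_p, :temperature, :output_sample_rate]

  # Split caller opts into event options and transport options (e.g. :region).
  def split(opts), do: Keyword.split(opts, @event_keys)

  # sessionStart carries the inference settings.
  def session_start(opts) do
    inference =
      reject_nils(%{
        "maxTokens" => opts[:max_tokens],
        "topP" => opts[:top_p],
        "temperature" => opts[:temperature]
      })

    %{"event" => %{"sessionStart" => %{"inferenceConfiguration" => inference}}}
  end

  # promptStart carries the output audio settings.
  def prompt_start(prompt_name, opts) do
    audio_out =
      reject_nils(%{
        "mediaType" => "audio/lpcm",
        "sampleRateHertz" => opts[:output_sample_rate],
        "voiceId" => opts[:voice_id]
      })

    %{"event" => %{"promptStart" => %{"promptName" => prompt_name, "audioOutputConfiguration" => audio_out}}}
  end

  # Drop options the caller did not set (hypothetical policy).
  defp reject_nils(map), do: map |> Enum.reject(fn {_k, v} -> is_nil(v) end) |> Map.new()
end
```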

start_audio(session, opts \\ [])

@spec start_audio(
  ReqLLM.Bedrock.NovaSonic.Session.t(),
  keyword()
) :: {:ok, ReqLLM.Bedrock.NovaSonic.Session.t()} | {:error, term()}

Opens a single USER audio content block (contentStart[USER/AUDIO]) and remembers its contentName on the session.

Nova Sonic models continuous microphone input as one audio container that stays open for the whole interaction; barge-in and turn detection across multiple turns happen within it via the server's voice activity detection (VAD). Open it once, stream frames with send_audio/3, and close it with end_audio/1 (or finish/1).
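A sketch of opening that single container: generate a unique contentName, build the contentStart event, and keep the name on the session so that every later audioInput and the final contentEnd can refer to it. The `audioInputConfiguration` values (16 kHz, 16-bit, mono, base64) and the `interactive` flag are assumptions based on the AWS input-event docs. `StartAudioSketch`, the plain-map session, its `:audio_content_name` key, and the name format are hypothetical; unlike the real start_audio/2, this returns the event alongside the session.

```elixir
defmodule StartAudioSketch do
  # Hypothetical unique content name (real clients often use UUIDs).
  defp new_content_name, do: "audio-" <> Integer.to_string(System.unique_integer([:positive]))

  # Returns {:ok, session, event}; the real start_audio/2 returns {:ok, session}.
  def start_audio(%{prompt_name: prompt} = session) do
    name = new_content_name()

    event = %{
      "event" => %{
        "contentStart" => %{
          "promptName" => prompt,
          "contentName" => name,
          "type" => "AUDIO",
          "role" => "USER",
          "interactive" => true,
          "audioInputConfiguration" => %{
            "mediaType" => "audio/lpcm",
            "sampleRateHertz" => 16_000,
            "sampleSizeBits" => 16,
            "channelCount" => 1,
            "audioType" => "SPEECH",
            "encoding" => "base64"
          }
        }
      }
    }

    # Remember the name: every audioInput and the closing contentEnd reuse it.
    {:ok, Map.put(session, :audio_content_name, name), event}
  end
end
```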