Mojentic exposes Mojentic.Realtime.Broker as the entry point for realtime voice sessions against providers that speak the OpenAI Realtime API (the only built-in provider today is OpenAI's own).

It mirrors the Python and TypeScript ports: a long-lived broker, a short-lived per-session GenServer that owns a WebSocket, and a vendor-neutral event stream you subscribe to.

30-second example (text mode)

alias Mojentic.Realtime.{Broker, Config, OpenAIGateway, Session}

config =
  Config.new(
    modalities: [:text],
    instructions: "You are a concise assistant.",
    turn_detection: :none,
    input_audio_transcription: false
  )

broker =
  Broker.new("gpt-realtime-2",
    gateway: OpenAIGateway.new(),
    config: config
  )

{:ok, session} = Broker.connect(broker)
:ok = Session.subscribe(session, self())
:ok = Session.send_text(session, "What's the capital of Canada?")

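# Print assistant text as it arrives and keep reading until the turn
# completes or the session closes.
receive_loop = fn loop ->
  receive do
    {:realtime_event, %{kind: :assistant_text, payload: %{text: text}}} ->
      IO.puts(text)
      loop.(loop)

    {:realtime_event, %{kind: :assistant_turn_completed}} ->
      :done

    # Ignore every other event kind and keep waiting.
    {:realtime_event, _} ->
      loop.(loop)

    {:realtime_close, _} ->
      :done
  end
end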

receive_loop.(receive_loop)
Session.close(session)

Events

Mojentic.Realtime.Event is a struct with two fields: :kind (an atom, one of the 23 kinds listed below) and :payload (a map of the event's fields). Pattern match on it in receive blocks or case expressions.

Group              Kinds
Session lifecycle  :session_opened, :session_updated, :session_closed
User speech        :user_speech_started, :user_speech_stopped, :user_transcript_delta, :user_transcript
Assistant output   :assistant_turn_started, :assistant_text_delta, :assistant_text, :assistant_transcript_delta, :assistant_transcript, :assistant_audio_delta, :assistant_turn_completed
Tool calls         :tool_call_started, :tool_call_args_delta, :tool_call_dispatched, :tool_call_completed, :tool_call_failed, :tool_batch_submitted
Control            :interrupted, :rate_limited, :error
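
As a rough illustration of matching on :kind and :payload, the sketch below dispatches on a few of the kinds above with a case expression. EventSketch is a hypothetical module, and plain maps stand in for Mojentic.Realtime.Event structs so the snippet runs without a live session. The only payload shape taken from this page is %{text: text} for :assistant_text; the shape of the :error payload is not documented here, so it is simply inspected.

```elixir
defmodule EventSketch do
  # Hypothetical helper: turn one event into a short log line.
  def describe(event) do
    case event do
      %{kind: :assistant_text, payload: %{text: text}} ->
        "assistant: " <> text

      %{kind: :assistant_turn_completed} ->
        "turn completed"

      # Payload shape for :error is an unknown here, so print it as-is.
      %{kind: :error, payload: payload} ->
        "error: " <> inspect(payload)

      # Any other kind falls through to a generic line.
      %{kind: kind} ->
        "ignored " <> inspect(kind)
    end
  end
end

# Plain maps standing in for %Mojentic.Realtime.Event{} structs.
events = [
  %{kind: :assistant_turn_started, payload: %{}},
  %{kind: :assistant_text, payload: %{text: "Ottawa."}},
  %{kind: :assistant_turn_completed, payload: %{}}
]

Enum.each(events, &IO.puts(EventSketch.describe(&1)))
```

Because map patterns also match structs, the same clauses should work unchanged against real events delivered in {:realtime_event, event} messages.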

Tools

Pass tools (modules or struct instances implementing the Mojentic.LLM.Tools.Tool behaviour) via Config.new(tools: [...]). By default the session dispatches them through Mojentic.LLM.Tools.ParallelToolRunner: when the model emits multiple function_call items in one turn, they execute concurrently via Task.async_stream/3, and the results are submitted back as function_call_output items before the next response.create is sent.
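
To make the concurrent dispatch concrete, here is a minimal sketch of running a batch of calls with Task.async_stream/3. It is not the ParallelToolRunner implementation: ToolBatchSketch, the call maps (:id, :name, :args), and the anonymous-function "tools" are all hypothetical stand-ins chosen so the snippet is self-contained.

```elixir
defmodule ToolBatchSketch do
  # Hypothetical: `calls` is a list of %{id, name, args} maps and `tools`
  # maps a tool name to a one-argument function.
  def run(calls, tools) do
    calls
    |> Task.async_stream(
      fn %{id: id, name: name, args: args} ->
        tool = Map.fetch!(tools, name)
        %{call_id: id, output: tool.(args)}
      end,
      # Each call runs in its own task; results come back in input order
      # because :ordered defaults to true.
      timeout: 5_000
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

tools = %{
  "add" => fn %{"a" => a, "b" => b} -> a + b end,
  "upcase" => fn %{"s" => s} -> String.upcase(s) end
}

calls = [
  %{id: "call_1", name: "add", args: %{"a" => 2, "b" => 3}},
  %{id: "call_2", name: "upcase", args: %{"s" => "ottawa"}}
]

IO.inspect(ToolBatchSketch.run(calls, tools))
```

In this sketch a tool that raises crashes the caller, since the tasks are linked to it. A real runner would more likely capture failures per call, which is presumably what the :tool_call_failed event kind reports.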

Audio I/O

The library is hardware-free. Use Session.send_audio_frame/2 with raw PCM16 binaries and consume :assistant_audio_delta events to play audio back. For a portable example, read frames from a WAV file and write the assistant's response to another WAV file.
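
One way to prepare input for Session.send_audio_frame/2 is to cut a raw PCM16 binary into fixed-duration frames. The sketch below assumes 24 kHz mono audio and 20 ms frames; both numbers are assumptions, so check what your provider and session configuration expect. PcmFramesSketch is a hypothetical helper, not part of Mojentic.

```elixir
defmodule PcmFramesSketch do
  # PCM16 uses two bytes per sample (mono assumed).
  @bytes_per_sample 2

  # Size in bytes of one frame lasting `frame_ms` at `sample_rate` Hz.
  def frame_bytes(sample_rate, frame_ms) do
    div(sample_rate * frame_ms, 1000) * @bytes_per_sample
  end

  # Split a PCM binary into frames of `size` bytes; the last frame may be shorter.
  def split(pcm, size) when is_binary(pcm) and size > 0 do
    do_split(pcm, size, [])
  end

  defp do_split(<<>>, _size, acc), do: Enum.reverse(acc)

  defp do_split(pcm, size, acc) when byte_size(pcm) <= size do
    Enum.reverse([pcm | acc])
  end

  defp do_split(pcm, size, acc) do
    <<frame::binary-size(size), rest::binary>> = pcm
    do_split(rest, size, [frame | acc])
  end
end

# 2000 bytes of silence standing in for decoded WAV sample data.
pcm = :binary.copy(<<0>>, 2000)
size = PcmFramesSketch.frame_bytes(24_000, 20)
frames = PcmFramesSketch.split(pcm, size)
IO.inspect(Enum.map(frames, &byte_size/1))

# With a live session, each frame would then be sent in order:
# Enum.each(frames, &Session.send_audio_frame(session, &1))
```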

For live device I/O, integrate a platform audio library at the boundary (Membrane, PortAudio, etc.); the session API stays the same.

Interruption

The default on_interrupt: :drop policy discards tool outputs from a cancelled batch so the next turn isn't polluted by stale answers. Alternatives:

  • :submit_completed_only — submit only outcomes that finished before the cancel landed (snake_case; matches the atom in code).
  • :submit — submit every outcome, even after the cancel.
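
The three policies can be modelled as a filter over the outcomes of a cancelled batch. This is a sketch of the semantics described above, not the session's actual code: InterruptPolicySketch and the :completed_before_cancel flag are hypothetical names.

```elixir
defmodule InterruptPolicySketch do
  # Hypothetical: each outcome is a map with a :completed_before_cancel flag.

  # :drop discards everything from the cancelled batch.
  def outputs_to_submit(:drop, _outcomes), do: []

  # :submit_completed_only keeps outcomes that finished before the cancel.
  def outputs_to_submit(:submit_completed_only, outcomes) do
    Enum.filter(outcomes, & &1.completed_before_cancel)
  end

  # :submit keeps every outcome, including late ones.
  def outputs_to_submit(:submit, outcomes), do: outcomes
end

outcomes = [
  %{call_id: "call_1", completed_before_cancel: true},
  %{call_id: "call_2", completed_before_cancel: false}
]

for policy <- [:drop, :submit_completed_only, :submit] do
  ids = policy |> InterruptPolicySketch.outputs_to_submit(outcomes) |> Enum.map(& &1.call_id)
  IO.inspect({policy, ids})
end
```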

To interrupt manually, call Session.interrupt(session). Server-driven barge-in needs no extra code: the session detects input_audio_buffer.speech_started mid-response and cancels automatically.