Baton.Debug (Baton v0.1.0)


Capture and inspect full LLM context windows at each workflow step.

Why this exists

When debugging a multi-step LLM pipeline you need to answer:

  • What exact prompt did step 3 send?
  • What upstream context did it have access to?
  • Did the model see the right information or was something lost in transit?
  • What did the raw response look like before parsing?

This module captures the full request/response payload and stores it in a dedicated table, separate from the job's meta (which only holds the parsed result) and separate from workflow_step_stats (which only holds numeric usage data).

How to use it

Option A: Automatic capture with call_llm/3

Use call_llm/3 as a drop-in replacement for your LLM client call; it handles capture automatically:

def perform_workflow(%Oban.Job{} = job) do
  {:ok, parsed} = Results.get_result(job, :parse_patent)

  messages = [
    %{role: "system", content: "You are a patent analyst."},
    %{role: "user", content: "Assess this patent: #{parsed["title"]}"}
  ]

  case Debug.call_llm(job, messages, model: "claude-sonnet-4-20250514") do
    {:ok, response, _debug_log} ->
      {:ok, %{
        quality: Jason.decode!(response.text),
        llm_usage: %{model: response.model, ...}
      }}

    {:error, reason} ->
      {:error, reason}
  end
end

Option B: Manual capture

If you need more control, call log_request/3 and log_response/2 yourself:

# log_request/3 returns {:ok, debug_log} or {:error, changeset}
{:ok, debug} = Debug.log_request(job, request_payload)
{:ok, response} = MyApp.LLM.complete(request_payload)
Debug.log_response(debug, response)

Option C: Capture upstream context separately

upstream = Results.get_all_results(job)
Debug.log_upstream_context(job, upstream)

Enabling/disabling

Debug logging is controlled by config. In production, disable it to avoid storing large payloads:

# config/dev.exs
config :baton, Baton.Debug, enabled: true

# config/prod.exs
config :baton, Baton.Debug, enabled: false

Or enable it selectively per-workflow:

Baton.new(workflow_name: "debug-run", debug: true)

Pruning

Debug logs can be large. Prune old ones periodically:

# In a daily cron job or Oban plugin
Baton.Debug.prune_older_than(days: 7)
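One way to schedule the daily prune is an Oban worker triggered by Oban's Cron plugin. The following is a minimal sketch, not part of Baton: the worker module name, the :maintenance queue, the :my_app application name, and the 03:00 UTC schedule are all assumptions to adapt to your project. The return value of prune_older_than/1 is not documented here, so the worker returns :ok explicitly.

```elixir
# lib/my_app/workers/prune_debug_logs.ex (hypothetical module)
defmodule MyApp.Workers.PruneDebugLogs do
  use Oban.Worker, queue: :maintenance, max_attempts: 1

  @impl Oban.Worker
  def perform(%Oban.Job{}) do
    # Delete debug logs older than one week
    Baton.Debug.prune_older_than(days: 7)
    :ok
  end
end

# config/config.exs (hypothetical app name; merge into your existing Oban config)
config :my_app, Oban,
  queues: [maintenance: 1],
  plugins: [
    # Run the prune worker every day at 03:00 UTC
    {Oban.Plugins.Cron, crontab: [{"0 3 * * *", MyApp.Workers.PruneDebugLogs}]}
  ]
```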

Security

Captured logs contain the full prompt (messages) and request options, and may include PII — disable debug logging in production unless you need it. As a safeguard, options whose key looks like a credential (:api_key, :authorization, :token, :headers, …) are masked as "[REDACTED]" before storage, so secrets passed through your LLM client's options don't leak into the database.
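To make the masking behaviour concrete, here is a minimal sketch of how such a redaction pass could work. It is an illustration only, not Baton's implementation: the module name is hypothetical, the key list contains just the four keys named above (the real list is longer), and it matches keys exactly rather than by "looks like a credential" heuristics.

```elixir
defmodule RedactSketch do
  # Hypothetical key list: only the keys named in the docs above.
  @sensitive_keys [:api_key, :authorization, :token, :headers]
  @mask "[REDACTED]"

  @doc "Replaces the value of every option whose key is in @sensitive_keys."
  def redact(opts) when is_list(opts) do
    Enum.map(opts, fn
      {key, _value} when key in @sensitive_keys -> {key, @mask}
      other -> other
    end)
  end
end
```

With this sketch, RedactSketch.redact(model: "claude-sonnet-4-20250514", api_key: "sk-secret") keeps the :model entry unchanged and replaces the :api_key value with "[REDACTED]".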

Summary

Functions

call_llm(job, messages, opts \\ [])

Call the LLM and automatically capture the full request/response for debugging.

context_window(workflow_id, step_name)

Get just the request messages for a step — the "context window" view. Returns the messages array from the stored request.

enabled?()

Check whether debug logging is enabled globally.

enabled_for_job?(job)

Check whether debug logging is enabled for a specific job.

format_context_window(debug_log)

Reconstruct the full conversation as the LLM saw it, formatted for display. Returns a list of %{role: string, content: string, token_estimate: integer}.

log_error(arg1, reason)

Update a debug log with error info after a failed call.

log_for_step(workflow_id, step_name)

Get the debug log for a specific step (most recent attempt).

log_request(job, request_payload, upstream_context \\ nil)

Log the full request payload before making the LLM call. Returns {:ok, debug_log} or {:error, changeset}.

log_response(arg1, response)

Update a debug log with the LLM response after a successful call.

log_upstream_context(job, context)

Store the upstream context for a step separately. Useful when you want to capture what a step received from its deps independently of the prompt that was built from it.

logs_for_workflow(workflow_id)

Get all debug logs for a workflow, ordered by step execution time.

prune_older_than(opts)

Delete debug logs older than the given duration.

Functions

call_llm(job, messages, opts \\ [])

Call the LLM and automatically capture the full request/response for debugging.

This is a thin wrapper around the configured LLM client's complete/2 that:

  1. Builds a request map from the messages and options
  2. Logs the request before calling the LLM
  3. Logs the response (or error) after the call returns
  4. Returns {:ok, response, debug_log} or {:error, reason}

The debug_log is the inserted DebugLog struct — you can ignore it.
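The four steps above can be sketched as follows. This is a simplified stand-in, not Baton's source: the client module is passed in explicitly instead of being read from config, the "debug log" is a plain map rather than a persisted DebugLog struct, and the client's complete/2 is assumed to take (messages, opts) and return {:ok, response} or {:error, reason}. Both module names are hypothetical.

```elixir
defmodule CallLlmSketch do
  # Hypothetical stand-in for the wrapper described above.
  def call_llm(client, messages, opts \\ []) do
    # 1. Build a request map from the messages and options
    request = %{messages: messages, options: Map.new(opts)}

    # 2. "Log" the request before the call (in memory only in this sketch)
    debug_log = %{request: request, response: nil, error: nil}

    # 3. Call the client, then record the response or the error
    case client.complete(messages, opts) do
      {:ok, response} ->
        # 4. Success: return the response together with the updated log
        {:ok, response, %{debug_log | response: response}}

      {:error, reason} ->
        # A real implementation would persist this; the sketch discards it
        _failed_log = %{debug_log | error: reason}
        {:error, reason}
    end
  end
end

defmodule StubClient do
  # Hypothetical client that always succeeds, for trying the sketch out
  def complete(_messages, opts), do: {:ok, %{text: "ok", model: opts[:model]}}
end
```

Calling CallLlmSketch.call_llm(StubClient, [%{role: "user", content: "hi"}], model: "m") returns an {:ok, response, debug_log} tuple whose debug_log holds both the request map and the response.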

Options

The client is resolved from config :baton, llm_client: MyApp.LLM.

All options are passed through to the client's complete/2. Common ones:

  • :model — model string
  • :system — system prompt (will be included in the captured request)
  • :max_tokens — max completion tokens
  • :temperature — sampling temperature

Upstream context capture

Pass :upstream_context to also store what this step received from its deps:

Debug.call_llm(job, messages,
  model: "claude-sonnet-4-20250514",
  upstream_context: Results.get_all_results(job)
)

context_window(workflow_id, step_name)

Get just the request messages for a step — the "context window" view. Returns the messages array from the stored request.

enabled?()

Check whether debug logging is enabled globally.

enabled_for_job?(job)

Check whether debug logging is enabled for a specific job.

format_context_window(debug_log)

Reconstruct the full conversation as the LLM saw it, formatted for display. Returns a list of %{role: string, content: string, token_estimate: integer}.
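The shape of the returned list can be illustrated with a small sketch. How Baton computes token_estimate is not documented here; the version below assumes a rough heuristic of about four characters per token, rounded up, and takes a plain list of message maps instead of a DebugLog struct. The module name is hypothetical.

```elixir
defmodule ContextWindowSketch do
  # Assumption: roughly 4 characters per token, a common rule of thumb for English text.
  @chars_per_token 4

  @doc "Adds a rough token estimate to each message map."
  def format(messages) do
    Enum.map(messages, fn %{role: role, content: content} ->
      %{role: role, content: content, token_estimate: estimate_tokens(content)}
    end)
  end

  # Character count divided by the heuristic ratio, rounded up
  defp estimate_tokens(content) do
    ceil(String.length(content) / @chars_per_token)
  end
end
```

An estimate like this is only good for spotting steps whose prompt is unexpectedly large or small; use the provider's reported usage when exact counts matter.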

log_error(arg1, reason)

Update a debug log with error info after a failed call.

log_for_step(workflow_id, step_name)

Get the debug log for a specific step (most recent attempt).

log_request(job, request_payload, upstream_context \\ nil)

Log the full request payload before making the LLM call. Returns {:ok, debug_log} or {:error, changeset}.

log_response(arg1, response)

Update a debug log with the LLM response after a successful call.

log_upstream_context(job, context)

Store the upstream context for a step separately. Useful when you want to capture what a step received from its deps independently of the prompt that was built from it.

logs_for_workflow(workflow_id)

Get all debug logs for a workflow, ordered by step execution time.

prune_older_than(opts)

Delete debug logs older than the given duration.

Examples

Debug.prune_older_than(days: 7)
Debug.prune_older_than(hours: 24)
Debug.prune_older_than(seconds: 3_600)
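A duration option like these has to be turned into a cutoff timestamp before anything can be deleted. The sketch below shows one plausible conversion; it is an assumption about the approach, not Baton's code. The module name is hypothetical, only the three units shown in the examples are supported, and the actual delete query is left out.

```elixir
defmodule PruneCutoffSketch do
  # Only the units that appear in the examples above
  @seconds_per %{seconds: 1, hours: 3_600, days: 86_400}

  @doc "Converts opts such as [days: 7] into the cutoff DateTime."
  def cutoff(opts, now \\ DateTime.utc_now()) do
    total_seconds =
      Enum.reduce(opts, 0, fn {unit, amount}, acc ->
        # Map.fetch!/2 raises on an unsupported unit instead of silently ignoring it
        acc + amount * Map.fetch!(@seconds_per, unit)
      end)

    # Logs inserted before this instant would be eligible for deletion
    DateTime.add(now, -total_seconds, :second)
  end
end
```

Passing now explicitly, as in PruneCutoffSketch.cutoff([days: 7], ~U[2025-01-08 00:00:00Z]), makes the conversion easy to check in isolation.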