mix req_managed_agents.qa_checkpoint (ReqManagedAgents v0.1.0)


QA-CHECKPOINT: the canonical proof that the Provider/Session refactor changed no observable behavior in either provider.

It runs the SAME deterministic capture (qa/checkpoint_capture_test.exs) against two states of the codebase and diffs the resulting behavior fingerprints:

  • PR11 (baseline): the three old drivers, checked out in a throwaway jj worktree at --base.
  • PR13 (current): the unified Session, in this worktree.

The capture drives both providers through the public facade with deterministic transports (the Bedrock invoke_fun seam and a Bypass SSE stub), so the codebase is the only variable. For each scenario it records the result tag, terminal, normalized stop reason, the tool calls the loop ran, the final event count, and any error. These fields must match exactly. One further field, stop_reason_raw_kind, is informational only. It reflects the documented Claude map→string change, so a difference in it is reported but does not fail the run.
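The per-scenario rule can be sketched as a small pure function. This is a minimal sketch, not the task's actual code. The field atoms (:result_tag, :terminal, :stop_reason, :tool_calls, :event_count, :error) and the FingerprintCheck module are hypothetical stand-ins for the recorded fields. Only stop_reason_raw_kind is named in the source.

```elixir
defmodule FingerprintCheck do
  # Hypothetical keys for the exactly-compared fields; the real fingerprint keys may differ.
  @compared [:result_tag, :terminal, :stop_reason, :tool_calls, :event_count, :error]
  # Reported when it differs, but never fails the scenario.
  @informational [:stop_reason_raw_kind]

  # Returns {:pass | :fail, failing_fields, informational_fields_that_differ}.
  def check(baseline, current) do
    failing = differing(@compared, baseline, current)
    info = differing(@informational, baseline, current)
    {if(failing == [], do: :pass, else: :fail), failing, info}
  end

  # Lists the fields whose values differ between the two fingerprints.
  defp differing(fields, a, b) do
    Enum.filter(fields, fn field -> Map.get(a, field) != Map.get(b, field) end)
  end
end
```

In this sketch a stop_reason_raw_kind change from a map to a string still yields :pass, with the field listed in the third element so it can be reported.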

mix req_managed_agents.qa_checkpoint
mix req_managed_agents.qa_checkpoint --base main@origin --rebuild

Options:

  • --base REV: baseline revision (default main@origin)
  • --rebuild: recreate the baseline worktree from scratch
  • --keep: leave the baseline worktree in place after running so the next run can reuse it (this is the default)
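The documented switches map naturally onto Elixir's standard OptionParser. The sketch below is a guess at how they could be parsed, not the task's source. The QaCheckpointOpts module and the error handling are hypothetical. The defaults mirror the option list: --base falls back to main@origin, and keeping the worktree is the default, so --keep does not change the result.

```elixir
defmodule QaCheckpointOpts do
  # Hypothetical parser for the documented switches; strict mode rejects unknown flags.
  @switches [base: :string, rebuild: :boolean, keep: :boolean]

  def parse(argv) do
    {opts, _args, invalid} = OptionParser.parse(argv, strict: @switches)

    # Fail loudly on unknown or malformed switches instead of silently ignoring them.
    if invalid != [] do
      raise ArgumentError, "unknown or malformed options: #{inspect(invalid)}"
    end

    %{
      base: Keyword.get(opts, :base, "main@origin"),
      rebuild: Keyword.get(opts, :rebuild, false),
      # Keeping the worktree is the documented default.
      keep: Keyword.get(opts, :keep, true)
    }
  end
end
```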

Functions

compare(pr11, pr13)

Pure comparison of two fingerprint scenario lists. Returns %{scenarios:, pass:, total:, allowlisted:}; a scenario passes when all of its @compared fields match exactly. The function is public so the pass/fail gate can be unit-tested (a gate that cannot fail proves nothing).
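A compare/2 with the documented return shape might look like the sketch below. It is illustrative only and rests on several assumptions. Each scenario map is assumed to carry a :name key used to pair PR11 with PR13. The field atoms are hypothetical. "allowlisted" is read here as "passes, but the informational field differs". Scenarios that appear only in the PR13 list are ignored, and a scenario missing from PR13 fails.

```elixir
defmodule CheckpointCompare do
  # Hypothetical field lists; only stop_reason_raw_kind is named in the docs.
  @compared [:result_tag, :terminal, :stop_reason, :tool_calls, :event_count, :error]
  @informational [:stop_reason_raw_kind]

  def compare(pr11, pr13) do
    # Index the current run by scenario name (assumed key) for pairing.
    current = Map.new(pr13, &{&1.name, &1})

    scenarios =
      Enum.map(pr11, fn base ->
        # A scenario missing from PR13 compares against an empty map and fails.
        cur = Map.get(current, base.name, %{})
        pass? = Map.take(base, @compared) == Map.take(cur, @compared)
        info_diff? = Map.take(base, @informational) != Map.take(cur, @informational)
        %{name: base.name, pass: pass?, allowlisted: pass? and info_diff?}
      end)

    %{
      scenarios: scenarios,
      pass: Enum.count(scenarios, & &1.pass),
      total: length(scenarios),
      allowlisted: Enum.count(scenarios, & &1.allowlisted)
    }
  end
end
```

Because the function is pure, a unit test can feed it a deliberately broken PR13 list and assert that pass drops below total. This is the "gate that can fail" property the docs call for.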