QA-CHECKPOINT — canonical proof that the Provider/Session refactor changed no observable behavior of either provider.
It runs the SAME deterministic capture (qa/checkpoint_capture_test.exs) against two states of
the codebase and diffs the resulting behavior fingerprints:
- PR11 (baseline) — the three old drivers, in a throwaway jj worktree at
--base. - PR13 (current) — the unified
Session, in this worktree.
The capture drives both providers through the public facade with deterministic transports (the
Bedrock invoke_fun seam + a Bypass SSE stub), so the only variable is the codebase. Per
scenario it records: result tag, terminal, normalized stop-reason, the tool calls the loop ran,
final event count, and any error. Those fields must match exactly. One field —
stop_reason_raw_kind — is informational (the documented Claude map→string change) and is
reported but not failed.
mix req_managed_agents.qa_checkpoint
mix req_managed_agents.qa_checkpoint --base main@origin --rebuildOptions:
--base REVbaseline revision (defaultmain@origin)--rebuildrecreate the baseline worktree from scratch--keepleave the baseline worktree in place after running (default; reused next run)
Summary
Functions
Pure comparison of two fingerprint scenario-lists. Returns %{scenarios:, pass:, total:, allowlisted:}; a scenario passes when its @compared fields match exactly. Public so the
pass/fail gate is unit-tested (a gate that cannot fail proves nothing).