Merges a first test run with an automatic retry run to distinguish flaky failures from confirmed ones.
mix test.json re-runs the previously failed tests (ExUnit's native
--failed) after a run with failures, then calls merge/2 to overlay the
two result documents. The goal is to never block an AI agent on a failure
that heals on re-run, while never hiding a failure that doesn't.
Classification
Tests are matched across runs by their {module, name} identity, which is ExUnit's
canonical manifest key. Matching does not use file:line, which can shift between runs.
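The identity matching can be illustrated with a minimal sketch. It assumes each test entry carries "module" and "name" keys; those key names and the RetryMergeSketch.Identity module are hypothetical, not taken from the library. The point is that a line shift between runs does not break the lookup.

```elixir
defmodule RetryMergeSketch.Identity do
  # Hypothetical helper, not part of the library. Assumes each test entry
  # carries "module" and "name" keys; "file" and "line" are ignored on purpose
  # because they can shift between runs.
  def key(%{"module" => module, "name" => name}), do: {module, name}

  # Index a run's tests by identity so run 1 entries can be looked up in run 2.
  def index(tests) when is_list(tests) do
    Map.new(tests, fn test -> {key(test), test} end)
  end
end

# The same test, reported at a different line in each run.
run1_test = %{"module" => "MyApp.CacheTest", "name" => "test expires entries", "line" => 12, "state" => "failed"}
run2_test = %{"module" => "MyApp.CacheTest", "name" => "test expires entries", "line" => 14, "state" => "passed"}

run2_index = RetryMergeSketch.Identity.index([run2_test])
found = Map.fetch!(run2_index, RetryMergeSketch.Identity.key(run1_test))
IO.puts(found["state"])
# prints: passed
```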
- confirmed: failed run 1 and did not pass run 2. Stays in tests and keeps run 1's failure detail. Counts toward summary.failed.
- flaky: failed run 1 but passed run 2. Moved out of tests into a top-level flaky array, with run 1's failure detail preserved so the agent sees what flaked. Counted in summary.flaky, never in summary.failed.
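A minimal sketch of the confirmed/flaky split over run 1's tests list follows. It assumes entries carry "module", "name" and "state" keys, with "failed" and "passed" as state values; those names and the RetryMergeSketch.Partition module are illustrative assumptions, not the library's code. Flaky entries are run 1's own maps, so their failure detail travels with them.

```elixir
defmodule RetryMergeSketch.Partition do
  # Hypothetical sketch. Assumes test entries carry "module", "name" and
  # "state" keys, with "failed" and "passed" as state values.
  def split(run1_tests, run2_tests) do
    # Identities that run 2 observed passing.
    passed_in_run2 =
      for %{"state" => "passed"} = t <- run2_tests, into: MapSet.new() do
        {t["module"], t["name"]}
      end

    {flaky, kept} =
      Enum.split_with(run1_tests, fn t ->
        t["state"] == "failed" and MapSet.member?(passed_in_run2, {t["module"], t["name"]})
      end)

    # Both lists hold run 1's own entries, so failure detail is preserved.
    {kept, flaky}
  end
end

run1 = [
  %{"module" => "MyApp.CacheTest", "name" => "test a", "state" => "passed"},
  %{"module" => "MyApp.CacheTest", "name" => "test b", "state" => "failed", "failure" => "timeout"},
  %{"module" => "MyApp.CacheTest", "name" => "test c", "state" => "failed", "failure" => "no match"}
]

run2 = [
  %{"module" => "MyApp.CacheTest", "name" => "test b", "state" => "passed"},
  %{"module" => "MyApp.CacheTest", "name" => "test c", "state" => "failed"}
]

{kept, flaky} = RetryMergeSketch.Partition.split(run1, run2)
# Confirmed failures are the failed entries still left in tests.
confirmed = Enum.count(kept, &(&1["state"] == "failed"))
```

Here "test b" healed and moves to flaky, "test c" failed again and stays confirmed, and the passing "test a" remains in tests.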
A test that failed run 1 but was not re-verified as passing (e.g. --failed
could not re-run it) stays confirmed. This is conservative by design: nothing is
marked flaky unless it was observed passing.
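The conservative default amounts to treating "no evidence" the same as "failed again". A minimal sketch, assuming run 2 is indexed by {module, name} and that "passed" is the passing "state" value; RetryMergeSketch.Conservative and verdict/2 are hypothetical names.

```elixir
defmodule RetryMergeSketch.Conservative do
  # Hypothetical sketch: a run 1 failure is flaky only if run 2 holds an
  # explicit passing entry for the same {module, name}. Assumes "passed" is
  # the passing "state" value.
  def verdict(key, run2_index) do
    case Map.fetch(run2_index, key) do
      {:ok, %{"state" => "passed"}} -> :flaky
      # Re-ran and did not pass.
      {:ok, _other} -> :confirmed
      # Never re-run (e.g. --failed skipped it): not observed passing.
      :error -> :confirmed
    end
  end
end

run2_index = %{
  {"MyApp.CacheTest", "test b"} => %{"state" => "passed"},
  {"MyApp.CacheTest", "test c"} => %{"state" => "failed"}
}

IO.inspect(RetryMergeSketch.Conservative.verdict({"MyApp.CacheTest", "test d"}, run2_index))
# prints: :confirmed
```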
setup_all failures (module_failures) are classified the same way, matched by module
name: a failure that recurs in run 2 stays confirmed; one that clears moves to flaky,
tagged "scope" => "module".
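A minimal sketch of the module-level split, assuming each module_failures entry has a "module" key (the "error" field below is illustrative). RetryMergeSketch.ModuleFailures is hypothetical, and this sketch simply treats any module absent from run 2's module_failures as cleared; the real code may additionally check that the module actually re-ran.

```elixir
defmodule RetryMergeSketch.ModuleFailures do
  # Hypothetical sketch. Assumes each module_failures entry has a "module"
  # key. Any module absent from run 2's module_failures is treated as cleared.
  def split(run1_failures, run2_failures) do
    recurring = MapSet.new(run2_failures, & &1["module"])

    {confirmed, cleared} =
      Enum.split_with(run1_failures, &MapSet.member?(recurring, &1["module"]))

    # Cleared setup_all failures become flaky entries tagged with their scope;
    # run 1's failure detail stays on the entry.
    flaky = Enum.map(cleared, &Map.put(&1, "scope", "module"))
    {confirmed, flaky}
  end
end

run1 = [
  %{"module" => "MyApp.DbTest", "error" => "connection refused"},
  %{"module" => "MyApp.PortTest", "error" => "address in use"}
]

run2 = [%{"module" => "MyApp.PortTest", "error" => "address in use"}]

{confirmed, flaky} = RetryMergeSketch.ModuleFailures.split(run1, run2)
```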
Output shape (additive to schema v1)
The merged document is run 1's document with:
- tests: flaky entries removed (passing and confirmed entries are preserved, so --all runs keep their passing tests)
- module_failures: only the recurring (confirmed) ones; omitted if none
- flaky: flaky tests and modules; omitted when empty
- summary.failed: confirmed test count
- summary.flaky: flaky count
- summary.result: "passed" iff nothing is confirmed
- retry: %{"ran" => true, "passes" => 1, "retried" => N, "confirmed" => X, "flaky" => Y}
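A minimal sketch of how the additive fields could be written onto run 1's document once classification is done. RetryMergeSketch.Summary and finalize/5 are hypothetical. Several details are assumptions not stated on this page: the non-passing summary.result value is taken to be "failed", summary.flaky counts every flaky entry (tests and modules), and retry "confirmed" counts confirmed tests only.

```elixir
defmodule RetryMergeSketch.Summary do
  # Hypothetical sketch, not the library's code. Assumptions: the
  # non-passing result value is "failed", and retry "confirmed" counts
  # confirmed tests only.
  def finalize(doc, confirmed_tests, confirmed_modules, flaky, retried) do
    nothing_confirmed? = confirmed_tests == 0 and confirmed_modules == []

    doc
    |> Map.update!("summary", fn summary ->
      Map.merge(summary, %{
        "failed" => confirmed_tests,
        "flaky" => length(flaky),
        "result" => if(nothing_confirmed?, do: "passed", else: "failed")
      })
    end)
    # Only recurring setup_all failures survive; the key is dropped if none do.
    |> put_or_drop("module_failures", confirmed_modules)
    |> put_or_drop("flaky", flaky)
    |> Map.put("retry", %{
      "ran" => true,
      "passes" => 1,
      "retried" => retried,
      "confirmed" => confirmed_tests,
      "flaky" => length(flaky)
    })
  end

  # Empty lists are omitted from the document rather than written as [].
  defp put_or_drop(doc, key, []), do: Map.delete(doc, key)
  defp put_or_drop(doc, key, list), do: Map.put(doc, key, list)
end

run1_doc = %{
  "tests" => [],
  "summary" => %{"total" => 3, "failed" => 2, "result" => "failed"},
  "module_failures" => [%{"module" => "MyApp.DbTest"}]
}

flaky = [
  %{"module" => "MyApp.CacheTest", "name" => "test b"},
  %{"module" => "MyApp.DbTest", "scope" => "module"}
]

merged = RetryMergeSketch.Summary.finalize(run1_doc, 0, [], flaky, 2)
```

Because the fields are merged into run 1's existing summary, keys such as "total" survive untouched, which keeps the output additive to schema v1.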
All functions are pure — the orchestration (subprocess spawn, file IO, exit
code) lives in Mix.Tasks.Test.Json.
Summary
Types
A decoded (string-keyed) JSON result document.
Functions
Merges run 1 and run 2 result documents into a single document distinguishing confirmed failures from flaky ones.
Both arguments are string-keyed maps (as produced by :json.decode/1 on a
buffered run). Run 2 is expected to have been produced with --all, so that every
re-run test, passing or not, carries its "state".