ExUnitJSON.Retry (ex_unit_json v0.5.1)

Merges a first test run with an automatic retry run to distinguish flaky failures from confirmed ones.

After a run with failures, mix test.json re-runs the previously failed tests (using ExUnit's native --failed), then calls merge/2 to overlay the two result documents. The goal is to never block an AI agent on a failure that heals on re-run, while never hiding a failure that doesn't.

Classification

Tests are matched across runs by their {module, name} identity (ExUnit's canonical manifest key — file:line can shift between runs).

  • confirmed — failed run 1 and did not pass run 2. Stays in tests and keeps run 1's failure detail. Counts toward summary.failed.
  • flaky — failed run 1 but passed run 2. Moved out of tests into a top-level flaky array (run 1's failure detail preserved so the agent sees what flaked). Counted in summary.flaky, never in summary.failed.

A test that failed run 1 but was not re-verified as passing (e.g. --failed could not re-run it) stays confirmed. This is conservative by design: nothing is marked flaky unless it was observed passing.
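
The classification rule above can be sketched as follows. This is a minimal illustration, not the library's implementation: the ClassifySketch module is hypothetical, and it assumes each test entry is a string-keyed map with "module", "name", and "state" keys, with "passed" and "failed" as state values.

```elixir
defmodule ClassifySketch do
  # Hypothetical helper, not part of ExUnitJSON.Retry.
  # Assumed entry shape: %{"module" => ..., "name" => ..., "state" => ...}.

  # Splits run 1's failed entries into {confirmed, flaky}.
  def classify(run1_tests, run2_tests) do
    # Identities that were actually observed passing in run 2.
    passed_in_run2 =
      run2_tests
      |> Enum.filter(&(&1["state"] == "passed"))
      |> MapSet.new(&key/1)

    run1_tests
    |> Enum.filter(&(&1["state"] == "failed"))
    # Anything not seen passing stays confirmed, including tests
    # that run 2 never re-ran.
    |> Enum.split_with(&(not MapSet.member?(passed_in_run2, key(&1))))
  end

  # Tests are matched across runs by {module, name}, never by file:line.
  defp key(test), do: {test["module"], test["name"]}
end
```

Under these assumptions, a run-1 failure that is absent from run 2 lands in the confirmed list, because membership in the "seen passing" set is the only way to become flaky.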

setup_all failures (module_failures) are classified the same way by module name: a failure that recurs in run 2 stays confirmed; one that clears moves to flaky, tagged "scope" => "module".

Invalid tests (setup_all casualties)

When a setup_all fails, ExUnit marks the module's tests "invalid". ExUnit's failure manifest records invalid tests like failures, so the retry re-runs them. The merge resolves each run-1 invalid test against its run-2 state:

  • passed run 2 → healed: replaced by its run-2 (passing) entry, moved from summary.invalid into summary.passed.
  • failed run 2 → confirmed failure: replaced by its run-2 entry (run-2 failure detail). Counts toward summary.failed.
  • still invalid / not re-run → stays as-is; its recurring module failure keeps the run red.

In failures-only output (the default), run-1 invalid tests are not present in tests. Run-2 failures of those tests are surfaced into tests so a real failure is never hidden, and summary.invalid goes to zero only when no module failure recurred (invalid tests cannot outlive their setup_all failure).
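
The three-way resolution of a run-1 invalid test can be sketched as below. The InvalidSketch module and its return tags are hypothetical, and the entry keys ("module", "name", "state") and the "passed" / "failed" state values are assumptions made for illustration.

```elixir
defmodule InvalidSketch do
  # Hypothetical helper, not part of ExUnitJSON.Retry.

  # Resolves one run-1 invalid entry against run 2's entries.
  # Returns {:healed, run2_entry}, {:confirmed, run2_entry},
  # or {:still_invalid, run1_entry}.
  def resolve(invalid_entry, run2_tests) do
    id = {invalid_entry["module"], invalid_entry["name"]}
    run2_entry = Enum.find(run2_tests, &({&1["module"], &1["name"]} == id))

    case run2_entry do
      # Passed on retry: the run-2 entry replaces the invalid one.
      %{"state" => "passed"} -> {:healed, run2_entry}
      # Failed on retry: a real failure, reported with run-2 detail.
      %{"state" => "failed"} -> {:confirmed, run2_entry}
      # Still invalid, or not re-run at all (nil): keep the run-1 entry.
      _ -> {:still_invalid, invalid_entry}
    end
  end
end
```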

Output shape (additive to schema v1)

The merged document is run 1's document with:

  • tests — flaky entries removed; healed/re-failed invalid entries replaced by their run-2 selves (passing/confirmed entries preserved, so --all runs keep their passing tests)
  • module_failures — only the recurring (confirmed) ones, omitted if none
  • flaky — flaky tests and modules, omitted when empty
  • summary.failed is the confirmed test count; summary.flaky is the flaky count; summary.passed / summary.invalid are adjusted for healed invalid tests; summary.result is "passed" iff nothing is confirmed and nothing is still invalid
  • retry is set to %{"ran" => true, "passes" => 1, "retried" => N, "confirmed" => X, "flaky" => Y}, where retried counts the failed tests, invalid tests, and failed modules that were re-run
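
As a rough illustration of the additive keys listed above, the sketch below attaches "flaky" and "retry" to a run-1 document and derives the result string. ShapeSketch, the counts argument, and the "failed" value for a non-passing result are all assumptions, as is taking the retry "flaky" count from the length of the flaky list.

```elixir
defmodule ShapeSketch do
  # Hypothetical helper, not part of ExUnitJSON.Retry.

  # Adds the "flaky" and "retry" keys to a run-1 document.
  # `counts` is an assumed map with :retried and :confirmed integers.
  def annotate(run1_doc, flaky_entries, counts) do
    retry = %{
      "ran" => true,
      "passes" => 1,
      "retried" => counts.retried,
      "confirmed" => counts.confirmed,
      "flaky" => length(flaky_entries)
    }

    run1_doc
    |> put_unless_empty("flaky", flaky_entries)
    |> Map.put("retry", retry)
  end

  # "passed" only when nothing is confirmed and nothing is still invalid.
  # The "failed" value for the other case is an assumption.
  def result(confirmed, still_invalid) do
    if confirmed == 0 and still_invalid == 0, do: "passed", else: "failed"
  end

  # The key is omitted entirely when there is nothing to report.
  defp put_unless_empty(doc, _key, []), do: doc
  defp put_unless_empty(doc, key, entries), do: Map.put(doc, key, entries)
end
```

Omitting empty keys rather than emitting empty arrays keeps the merged document identical in shape to a plain run-1 document when nothing flaked, which is what makes the change additive.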

All functions are pure — the orchestration (subprocess spawn, file IO, exit code) lives in Mix.Tasks.Test.Json.

Types

document()

@type document() :: %{optional(String.t()) => term()}

A decoded (string-keyed) JSON result document.

Functions

merge(run1, run2)

@spec merge(document(), document()) :: document()

Merges run 1 and run 2 result documents into a single document distinguishing confirmed failures from flaky ones.

Both arguments are string-keyed maps (as produced by :json.decode/1 on a buffered run). Run 2 is expected to have been produced with --all so every re-run test carries its "state".