ExUnitJSON.Retry (ex_unit_json v0.5.0)

Merges a first test run with an automatic retry run to distinguish flaky failures from confirmed ones.

After a run with failures, mix test.json re-runs the previously failed tests (ExUnit's native --failed), then calls merge/2 to overlay the two result documents. The goal is to never block an AI agent on a failure that heals on re-run, while never hiding a failure that does not.

Classification

Tests are matched across runs by their {module, name} identity (ExUnit's canonical manifest key — file:line can shift between runs).

  • confirmed — failed run 1 and did not pass run 2. Stays in tests and keeps run 1's failure detail. Counts toward summary.failed.
  • flaky — failed run 1 but passed run 2. Moved out of tests into a top-level flaky array (run 1's failure detail preserved so the agent sees what flaked). Counted in summary.flaky, never in summary.failed.

A test that failed run 1 but was not re-verified as passing (e.g. --failed could not re-run it) stays confirmed — conservative by design: nothing is marked flaky we could not observe passing.
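The test-level rules above can be sketched in a few lines of Elixir. This is only an illustration, not the library's implementation: the RetrySketch.Classify module and its functions are hypothetical, and it assumes that each test entry is a string-keyed map with "module", "name" and "state" keys and that the state strings are "passed" and "failed".

```elixir
defmodule RetrySketch.Classify do
  @moduledoc false
  # Hypothetical helper, not part of ex_unit_json.
  # Assumed entry shape: %{"module" => ..., "name" => ..., "state" => ...}
  # with "passed" / "failed" as the state strings.

  # Identity used to match a test across the two runs.
  def identity(test), do: {test["module"], test["name"]}

  # Returns {confirmed, flaky}; both lists hold run 1's failure entries,
  # so run 1's failure detail is what survives in either bucket.
  def split_failures(run1_tests, run2_tests) do
    # Identities that were actually observed passing in run 2.
    passed_in_run2 =
      for %{"state" => "passed"} = test <- run2_tests,
          into: MapSet.new(),
          do: identity(test)

    run1_tests
    |> Enum.filter(&(&1["state"] == "failed"))
    |> Enum.split_with(&(not MapSet.member?(passed_in_run2, identity(&1))))
  end
end

run1_tests = [
  %{"module" => "AppTest", "name" => "test a", "state" => "failed"},
  %{"module" => "AppTest", "name" => "test b", "state" => "failed"},
  %{"module" => "AppTest", "name" => "test c", "state" => "failed"},
  %{"module" => "AppTest", "name" => "test d", "state" => "passed"}
]

# "test c" is absent from run 2, as if --failed could not re-run it.
run2_tests = [
  %{"module" => "AppTest", "name" => "test a", "state" => "passed"},
  %{"module" => "AppTest", "name" => "test b", "state" => "failed"}
]

{confirmed, flaky} = RetrySketch.Classify.split_failures(run1_tests, run2_tests)
```

In this sample, "test b" and "test c" end up confirmed and only "test a" is flaky: a test missing from run 2 is never promoted to flaky, which mirrors the conservative rule described above.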

setup_all failures (module_failures) are classified the same way by module name: recurs in run 2 → stays confirmed; cleared → moves to flaky tagged "scope" => "module".
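A minimal sketch of the module-level rule, under stated assumptions: the RetrySketch.Modules module is hypothetical, and each module_failures entry is assumed to be a string-keyed map carrying a "module" key. A module failure counts as recurring when the same module name appears in run 2's module_failures.

```elixir
defmodule RetrySketch.Modules do
  @moduledoc false
  # Hypothetical helper, not part of ex_unit_json.
  # Assumed entry shape: %{"module" => "SomeTest", ...}.

  # Returns {confirmed, flaky}. Cleared entries are tagged with
  # "scope" => "module" before they join the flaky list.
  def split(run1_failures, run2_failures) do
    # Module names whose setup_all failure showed up again in run 2.
    recurring = MapSet.new(run2_failures, & &1["module"])

    {confirmed, cleared} =
      Enum.split_with(run1_failures, &MapSet.member?(recurring, &1["module"]))

    {confirmed, Enum.map(cleared, &Map.put(&1, "scope", "module"))}
  end
end

run1_failures = [%{"module" => "DbTest"}, %{"module" => "CacheTest"}]
run2_failures = [%{"module" => "DbTest"}]

{confirmed_modules, flaky_modules} =
  RetrySketch.Modules.split(run1_failures, run2_failures)
```

Here DbTest recurs and stays confirmed, while CacheTest cleared and is returned with the module scope tag.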

Output shape (additive to schema v1)

The merged document is run 1's document with:

  • tests — flaky entries removed (passing/confirmed entries preserved, so --all runs keep their passing tests)
  • module_failures — only the recurring (confirmed) ones, omitted if none
  • flaky — flaky tests and modules, omitted when empty
  • summary.failed is the confirmed test count, summary.flaky is the flaky count, and summary.result is "passed" iff nothing is confirmed
  • retry is set to %{"ran" => true, "passes" => 1, "retried" => N, "confirmed" => X, "flaky" => Y}
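The summary and retry fields listed above could be assembled roughly as follows. This is a sketch, not the library's code: RetrySketch.Summary is hypothetical, the "total" key in the sample summary is made up for illustration, and "failed" as the non-passing value of summary.result is an assumption (the text above only fixes when the result is "passed"). The counts are taken as inputs because how N, X and Y are derived, for example whether module-level entries are included, is not specified here.

```elixir
defmodule RetrySketch.Summary do
  @moduledoc false
  # Hypothetical helper, not part of ex_unit_json.

  # Returns {summary, retry} built from run 1's summary and the given counts.
  def overlay(run1_summary, retried, confirmed, flaky) do
    # "passed" iff nothing is confirmed; the "failed" string is an assumption.
    result = if confirmed == 0, do: "passed", else: "failed"

    summary =
      Map.merge(run1_summary, %{
        "failed" => confirmed,
        "flaky" => flaky,
        "result" => result
      })

    retry = %{
      "ran" => true,
      "passes" => 1,
      "retried" => retried,
      "confirmed" => confirmed,
      "flaky" => flaky
    }

    {summary, retry}
  end
end

# Three failures were retried: one recurred, two healed.
{summary, retry} =
  RetrySketch.Summary.overlay(%{"total" => 10, "failed" => 3}, 3, 1, 2)

# With nothing confirmed the run counts as passed.
{healed_summary, _retry} =
  RetrySketch.Summary.overlay(%{"total" => 10, "failed" => 2}, 2, 0, 2)
```

Map.merge/2 keeps every other key from run 1's summary untouched, which matches the "additive" framing of the output shape.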

All functions are pure — the orchestration (subprocess spawn, file IO, exit code) lives in Mix.Tasks.Test.Json.

Types

document()

@type document() :: %{optional(String.t()) => term()}

A decoded (string-keyed) JSON result document.

Functions

merge(run1, run2)

@spec merge(document(), document()) :: document()

Merges run 1 and run 2 result documents into a single document distinguishing confirmed failures from flaky ones.

Both arguments are string-keyed maps (as produced by :json.decode/1 on a buffered run). Run 2 is expected to have been produced with --all so every re-run test carries its "state".