Firebreak.FailureSim (Firebreak v0.1.0)


Per-process failure simulation — "if this process crashes right now, what actually goes down, and who blocks?"

The supervision tree shows containment; it does not show that a supervisor's strategy couples the fates of its children. Under :one_for_all a crash in any child restarts every sibling; under :rest_for_one it restarts every sibling started after it. So a crash in child A — which nothing outside the tree depends on — can still co-restart sibling B, and every external caller of B then observes :noproc/:timeout. That coupling is invisible in the tree and is exactly what this pass recovers.
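The strategy semantics described above can be sketched as a small function. This is only an illustration: `CoRestartSketch` and `co_restarted/3` are hypothetical names, not part of Firebreak, and the children are assumed to be given in start order with the victim among them.

```elixir
defmodule CoRestartSketch do
  # Hypothetical helper (not Firebreak's API): given a supervisor strategy,
  # its children in start order, and the child that crashed, return the
  # children the supervisor terminates and restarts.
  def co_restarted(:one_for_one, _children, victim), do: [victim]

  # Any crash restarts every child of the supervisor.
  def co_restarted(:one_for_all, children, _victim), do: children

  # The victim plus every sibling started after it.
  def co_restarted(:rest_for_one, children, victim) do
    Enum.drop_while(children, &(&1 != victim))
  end
end

IO.inspect(CoRestartSketch.co_restarted(:one_for_all, [:a, :b, :c], :a))
# prints [:a, :b, :c]
IO.inspect(CoRestartSketch.co_restarted(:rest_for_one, [:a, :b, :c], :b))
# prints [:b, :c]
```

Under :one_for_one the result is just the victim, which is why only the other two strategies amplify a crash.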

For each supervised process we compute its restart closure: the transitive set of processes terminated alongside it under its parent's strategy (its own subtree, plus the sibling subtrees the strategy pulls in). We then find the resolved coupling edges that cross from outside that closure into it; the source modules of those edges are the ones that block when the victim crashes.
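A minimal sketch of the restart closure and the external blockers, under simplifying assumptions: the tree is a plain map of `supervisor => {strategy, children_in_start_order}`, edges are `{from, to}` tuples, and `ClosureSketch` with its functions is hypothetical. Firebreak's actual forest and `Firebreak.Edge` structures are not shown in this page and may differ.

```elixir
defmodule ClosureSketch do
  # Restart closure of `victim` under `parent`: the subtrees of every child
  # the parent's strategy restarts together with the victim.
  def closure(tree, parent, victim) do
    {strategy, children} = Map.fetch!(tree, parent)

    roots =
      case strategy do
        :one_for_all -> children
        :rest_for_one -> Enum.drop_while(children, &(&1 != victim))
        _other -> [victim]
      end

    roots |> Enum.flat_map(&subtree(tree, &1)) |> MapSet.new()
  end

  # A process plus everything beneath it; leaves have no entry in the map.
  defp subtree(tree, proc) do
    case Map.get(tree, proc) do
      nil -> [proc]
      {_strategy, children} -> [proc | Enum.flat_map(children, &subtree(tree, &1))]
    end
  end

  # Sources of edges that start outside the closure and end inside it.
  def external_blockers(closure, edges) do
    edges
    |> Enum.filter(fn {from, to} ->
      MapSet.member?(closure, to) and not MapSet.member?(closure, from)
    end)
    |> Enum.map(fn {from, _to} -> from end)
    |> Enum.uniq()
  end
end

tree = %{
  app_sup: {:one_for_all, [:a, :b_sup]},
  b_sup: {:one_for_one, [:b1, :b2]}
}

edges = [{:web, :b1}, {:a, :b2}]

closure = ClosureSketch.closure(tree, :app_sup, :a)
IO.inspect(ClosureSketch.external_blockers(closure, edges))
# prints [:web]
```

Here nothing outside the tree depends on `:a`, yet its crash takes down `:b1`, so `:web` blocks. The edge from `:a` to `:b2` starts inside the closure and is therefore not counted.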

Findings (:crash_cascade) are emitted per amplifying supervisor (:one_for_all / :rest_for_one) when external modules depend on processes in its co-restart group. Each finding calls out the trigger children: those with no external dependents of their own whose crash nonetheless cascades to a sibling that does have them.
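One way to picture trigger detection, as a sketch rather than Firebreak's implementation: `TriggerSketch.triggers/2` is a hypothetical function that treats the children of a single :one_for_all supervisor as leaves and takes edges as `{from, to}` tuples. For :rest_for_one the check would additionally need to confirm that the depended-on sibling starts after the trigger.

```elixir
defmodule TriggerSketch do
  # group: children of one :one_for_all supervisor, treated as leaves.
  # edges: [{from, to}] coupling edges.
  def triggers(group, edges) do
    members = MapSet.new(group)

    # Group members that something outside the group depends on.
    depended_on =
      for {from, to} <- edges,
          to in members,
          from not in members,
          into: MapSet.new(),
          do: to

    if MapSet.size(depended_on) == 0 do
      # No external dependents anywhere in the group: nothing to report.
      []
    else
      # Members with no external dependents whose crash still restarts
      # a sibling that has them.
      Enum.reject(group, &MapSet.member?(depended_on, &1))
    end
  end
end

IO.inspect(TriggerSketch.triggers([:a, :b], [{:web, :b}]))
# prints [:a]
```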

Summary

Types

simulation()

@type simulation() :: %{
  victim: module(),
  closure: MapSet.t(),
  external_blockers: [module()],
  edges: [Firebreak.Edge.t()],
  amplified?: boolean(),
  sync?: boolean()
}

Functions

analyze(forest, edges)