Firebreak.Observe (Firebreak v0.1.0)


Ground-truth enrichment from a live BEAM node.

Static analysis (and even the init/1 path in Firebreak.Runtime) recovers the supervision tree as it is declared. It cannot see what is actually running: a DynamicSupervisor is empty in init/1 but may hold thousands of children at runtime, and a registered name assembled at runtime never appears in the source. --observe attaches to a running node and reads the real picture, which:

  • raises the soundness ceiling: live children (especially under DynamicSupervisors) are folded into the supervision forest, so a caller that crosses into a runtime-only child is no longer invisible to the blast-radius and cascade checks;
  • recovers names static analysis can't: a process registered under a name we could not bind to a module statically is bound from the live registry, so previously unresolved coupling edges resolve;
  • measures process cardinality: how many processes each supervisor is really running, surfaced as runtime_fanout findings at :exact confidence (derived from observed reality, not source).
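
The gap between declared and live children can be seen with a plain DynamicSupervisor. This is a minimal illustration that uses only the Elixir standard library, not Firebreak itself; the Agent children are stand-ins for real workers.

```elixir
# Start an empty DynamicSupervisor: this is all that init/1 declares.
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# The declared picture: no children yet.
declared = DynamicSupervisor.count_children(sup).active

# Children only appear once the running system starts them.
for i <- 1..3 do
  {:ok, _pid} = DynamicSupervisor.start_child(sup, {Agent, fn -> i end})
end

# The live picture, which only an attached observer can read.
live = DynamicSupervisor.count_children(sup).active

IO.inspect({declared, live}, label: "declared vs live")
```

The declared count is 0 while the live count is 3, which is the kind of difference a runtime_fanout finding reports.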

How it attaches

attach/2 connects to the target node over distributed Erlang (starting a local distribution if one isn't running and setting the cookie if given). All data is then read with :rpc calls to standard-library MFAs only (:supervisor.which_children/1, :erlang.registered/0, :proc_lib.translate_initial_call/1) — never a shipped closure — so the target does not need Firebreak loaded. Observing the local node (node() == target) short-circuits the transport and runs the same reads directly, which is how the gathering path is exercised in tests without bringing up distribution.
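
The read path described above might look roughly like the following sketch. ObserveSketch and its read/4 helper are hypothetical names, not Firebreak's API; only the :rpc, :supervisor, :erlang and :proc_lib calls are standard library. The example observes the local node, so it takes the short-circuit branch and needs no distribution.

```elixir
defmodule ObserveSketch do
  # Hypothetical helper: run a standard-library MFA on `target`.
  # Local target: call directly, skipping the transport.
  def read(target, mod, fun, args) when target == node() do
    apply(mod, fun, args)
  end

  # Remote target: ship only the MFA, never a closure.
  def read(target, mod, fun, args) do
    case :rpc.call(target, mod, fun, args) do
      {:badrpc, reason} -> {:error, reason}
      result -> result
    end
  end
end

# A small supervisor to observe (stand-in for a real application tree).
{:ok, sup} = Supervisor.start_link([{Agent, fn -> 0 end}], strategy: :one_for_one)

children = ObserveSketch.read(node(), :supervisor, :which_children, [sup])
names = ObserveSketch.read(node(), :erlang, :registered, [])

# Translate each live child's initial call to an {module, function, arity} tuple.
calls =
  for {_id, pid, _type, _mods} <- children, is_pid(pid) do
    ObserveSketch.read(node(), :proc_lib, :translate_initial_call, [pid])
  end

IO.inspect(calls, label: "initial calls")
```

Because every read is a plain MFA that already exists on any BEAM node, the same three calls work unchanged against a remote target.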

Everything here is best-effort: a node we can't reach, a supervisor not registered under its module name, or a child whose module we can't translate is simply skipped, exactly as the rest of the tool treats what it can't resolve.

Functions

attach(node, opts)

@spec attach(
  node(),
  keyword()
) :: {:ok, node()} | {:error, term()}

Connect to node. Observing the local node is a no-op. Otherwise, start local distribution if needed, set the cookie from opts[:cookie], and connect with Node.connect/1.
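
The documented steps could be sketched as follows. This is a hypothetical reconstruction, not Firebreak's source: the AttachSketch module, the :local_name option, the default observer name, and the error atoms are all assumptions made for illustration. Only the Node functions are real.

```elixir
defmodule AttachSketch do
  @spec attach(node(), keyword()) :: {:ok, node()} | {:error, term()}
  def attach(target, opts \\ []) do
    if target == node() do
      # Observing the local node: nothing to connect.
      {:ok, target}
    else
      with :ok <- ensure_distribution(opts),
           :ok <- set_cookie(target, opts[:cookie]) do
        case Node.connect(target) do
          true -> {:ok, target}
          false -> {:error, :unreachable}
          :ignored -> {:error, :not_distributed}
        end
      end
    end
  end

  # Start local distribution only if this node is not already alive.
  defp ensure_distribution(opts) do
    if Node.alive?() do
      :ok
    else
      # :local_name is an assumed option, not a documented one.
      name = Keyword.get(opts, :local_name, :firebreak_observer)

      case Node.start(name, :shortnames) do
        {:ok, _pid} -> :ok
        {:error, reason} -> {:error, reason}
      end
    end
  end

  defp set_cookie(_target, nil), do: :ok

  defp set_cookie(target, cookie) when is_atom(cookie) do
    Node.set_cookie(target, cookie)
    :ok
  end
end

IO.inspect(AttachSketch.attach(node(), []), label: "local attach")
```

Attaching to the local node returns {:ok, node()} without touching distribution, which matches the no-op behaviour described above.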

cardinality_findings(snap, modules)

@spec cardinality_findings(Firebreak.Snapshot.t(), [Firebreak.ModuleInfo.t()]) :: [
  Firebreak.Finding.t()
]

runtime_fanout findings: a supervisor running more live children than static analysis modelled. Reported at :exact confidence, since this is observed reality. The finding is rated :low when the strategy co-restarts the whole set (:one_for_all / :rest_for_one), and :info otherwise.
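
The rating rule can be illustrated with plain maps. The sketch below does not use Firebreak.Snapshot or Firebreak.Finding, whose fields are not shown here; every key name (including :severity) and the MyApp modules are assumptions for illustration.

```elixir
defmodule FanoutSketch do
  # Strategies under which a crash can restart sibling children too.
  @co_restart [:one_for_all, :rest_for_one]

  # `supervisors` is a list of hypothetical maps with the modelled
  # (static) and observed (live) child counts for each supervisor.
  def findings(supervisors) do
    for %{module: mod, strategy: strategy, static: static, live: live} <- supervisors,
        live > static do
      %{
        check: :runtime_fanout,
        module: mod,
        confidence: :exact,
        severity: if(strategy in @co_restart, do: :low, else: :info),
        live_children: live,
        modelled_children: static
      }
    end
  end
end

observed = [
  %{module: MyApp.WorkerSup, strategy: :one_for_one, static: 0, live: 4200},
  %{module: MyApp.CoreSup, strategy: :one_for_all, static: 2, live: 5},
  %{module: MyApp.QuietSup, strategy: :one_for_one, static: 3, live: 3}
]

findings = FanoutSketch.findings(observed)
IO.inspect(Enum.map(findings, &{&1.module, &1.severity}))
```

MyApp.QuietSup produces no finding because its live count matches the model; the other two are rated :info and :low respectively.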

enrich_modules(modules, snap)

Fold a snapshot into the parsed modules so the existing forest/coupling/checks see runtime reality: live children become synthetic dynamic children of their supervisor, and recovered names are attributed to the module that owns them.

mailbox_findings(snapshot, edges)

@spec mailbox_findings(Firebreak.Snapshot.t(), [Firebreak.Edge.t()]) :: [
  Firebreak.Finding.t()
]

runtime_mailbox_backlog findings: a process with a deep mailbox (>= 1000 queued messages) that something calls synchronously. A synchronous caller blocks until the process drains the backlog ahead of its message, so a sustained backlog leads to GenServer.call timeouts and a stall that cascades back through the callers. Reported at :exact confidence (observed). Only synchronous callers count, because a fire-and-forget caster doesn't block, so the check flags only those backed-up processes that are coupling hazards. Requires the resolved coupling edges.
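
A rough sketch of the check, assuming edges are plain maps with :to and :kind keys (the real Firebreak.Edge fields are not shown here, so these names are hypothetical). Process.info/2 with :message_queue_len is the standard way to read mailbox depth.

```elixir
defmodule MailboxSketch do
  @threshold 1000

  # Returns {pid, queue_length} for each backed-up process that has
  # at least one synchronous (:call) edge pointing at it.
  def backlogged(pids, edges) do
    sync_targets = for %{to: to, kind: :call} <- edges, into: MapSet.new(), do: to

    for pid <- pids,
        MapSet.member?(sync_targets, pid),
        # Process.info/2 returns nil for a dead process; those are skipped.
        {:message_queue_len, len} <- [Process.info(pid, :message_queue_len)],
        len >= @threshold do
      {pid, len}
    end
  end
end

# A process that never reads its mailbox, so messages pile up.
stuck = spawn(fn -> Process.sleep(:infinity) end)
for i <- 1..1_000, do: send(stuck, {:work, i})

called = MailboxSketch.backlogged([stuck], [%{to: stuck, kind: :call}])
cast_only = MailboxSketch.backlogged([stuck], [%{to: stuck, kind: :cast}])

IO.inspect(length(called), label: "flagged when called synchronously")
IO.inspect(length(cast_only), label: "flagged when only cast to")
```

The same backed-up process is flagged when a caller waits on it and ignored when it is only cast to, which is the distinction the check relies on.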

maybe_snapshot(modules, opts)

@spec maybe_snapshot(
  [Firebreak.ModuleInfo.t()],
  keyword()
) :: {:ok, Firebreak.Snapshot.t()} | :none

If opts[:observe] names a node, attach and take a Snapshot; otherwise :none (the default, fully-static path). A node we can't reach degrades to :none with a note on stderr rather than failing the run.

snapshot(node, sup_modules)

@spec snapshot(node(), [module()]) :: Firebreak.Snapshot.t()

Read the live supervision tree rooted at each of sup_modules (those that are registered under their module name) plus the node's registered names, into a Snapshot.
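
A minimal sketch of such a walk on the local node, assuming supervisors are looked up by their registered module name and skipped when not found. TreeSketch, DemoRoot and the returned tuple shape are hypothetical; the real Snapshot struct is not shown here.

```elixir
defmodule TreeSketch do
  # Walk from a supervisor registered under `name`; skip it if unregistered.
  def walk(name) when is_atom(name) do
    case Process.whereis(name) do
      nil -> :skipped
      pid -> {name, children(pid)}
    end
  end

  # Recurse into child supervisors; record workers as leaves.
  defp children(sup) do
    for {id, pid, type, _mods} <- :supervisor.which_children(sup), is_pid(pid) do
      case type do
        :supervisor -> {id, children(pid)}
        :worker -> {id, :worker}
      end
    end
  end
end

defmodule DemoRoot do
  use Supervisor

  def start_link(_arg), do: Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok) do
    children = [
      {Agent, fn -> 0 end},
      {DynamicSupervisor, strategy: :one_for_one, name: DemoRoot.Dyn}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

{:ok, _sup} = DemoRoot.start_link([])
# A runtime-only child that init/1 never declares.
{:ok, _child} = DynamicSupervisor.start_child(DemoRoot.Dyn, {Agent, fn -> 1 end})

tree = TreeSketch.walk(DemoRoot)
registered = Process.registered()

IO.inspect(tree, label: "live tree")
```

The child started under DemoRoot.Dyn shows up in the walked tree with the id :undefined, since DynamicSupervisor children carry no id.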