The Firebreak model IR — a contract for backends

Copy Markdown View Source

mix firebreak --format model emits a supervision-model intermediate representation: per supervisor, exactly the facts a lifecycle model needs and nothing it doesn't. It's a stable, versioned, checkable interface that other tools build on. Firebreak.Tla (TLA+ spec generation) and Firebreak.Conformance (topology-drift detection) are simply the first two consumers — both are pure functions of this IR, and so is anything you write against it.

This document is the contract: the envelope, the schema, the serialization rules, the design law that decides what's in and out, and the versioning promise. If you produce or consume the IR, this is the spec.

mix firebreak --format model            # emit the IR as JSON

In Elixir you can also call Firebreak.Model.build/1 for the in-memory term form (atom keys) and Firebreak.Model.json/1 for the JSON string.

Document envelope

{ "schema_version": 1, "supervisors": [ <bundle>, ... ] }
  • schema_version (integer) — always present, always first. Check it before trusting the document (see Versioning).
  • supervisors — one bundle per supervisor in the analysis, in no guaranteed order. Match bundles by supervisor, not by position.

Bundle schema

Each bundle is the complete lifecycle projection of one supervisor:

FieldTypeMeaning
supervisormodule stringThe supervisor this bundle describes.
strategystring | null"one_for_one" / "one_for_all" / "rest_for_one" / "simple_one_for_one", or null if unresolved.
max_restartsint | nullRestart intensity (the OTP default of 3 is filled in when the source didn't set it).
max_secondsint | nullIntensity window (default 5).
parentmodule string | nullThe supervisor this one escalates to, or null for a root.
childrenarrayChildren in declaration order (order matters for :rest_for_one).
inbound_crossingsarrayResolved coupling edges that cross into this subtree from outside it.

children[] — each child:

FieldTypeMeaning
modulemodule stringThe child's module.
restartstringEffective restart type: "permanent" / "transient" / "temporary".
typestring"worker" or "supervisor".
is_supervisorboolWhether this child is itself a supervisor (a subtree, not a leaf).

inbound_crossings[] — each crossing:

FieldTypeMeaning
frommodule stringThe caller, outside this subtree.
tomodule stringThe target, inside this subtree.
kindstringCoupling kind: "call", "cast", "whereis", "send", "registry", "ets", "pubsub", "pg".
syncboolWhether the caller blocks (:call/:whereis). Only sync crossings justify an availability property — a backend should model the caller hitting :noproc only here.
in_initboolWhether the crossing happens during the caller's init/1 (a boot-order signal).

Serialization conventions

  • Modules and atoms serialize as strings via Elixir's Atom.to_string/1, so a module is "Elixir.Sample.Cache", an atom is "one_for_one". Consumers in other languages should treat the Elixir. prefix as part of the name (or strip it for display).
  • nil is JSON null.
  • Key order is deterministic within the envelope (schema_version first); bundle and child keys may appear in any order — parse by key, never by position.
  • Zero dependencies: the encoder is a tiny built-in (Firebreak.JSON). No decoder ships — an Elixir consumer should use Firebreak.Model.build/1 directly.

The design law: topology in, behaviour out

The IR carries lifecycle / topology facts — strategy, intensity, ordered children, restart types, escalation parent, and the resolved cross-subtree edges. It deliberately excludes anything behavioural: when or why a call fires, message contents, protocol state, timing. For the lifecycle layer the correct abstraction is "a crossing call may happen at any time," so topology is sufficient to reason about restart blast radius and cross-tree availability. Protocol-level facts belong to a runtime tool, not here. This boundary is what keeps backends pure functions of the IR rather than re-derivations of the whole analysis.

Versioning

schema_version is a single integer. It bumps on a breaking change to the bundle shape — a removed, renamed, or retyped field. Additive fields (new optional keys) do not bump it, so a consumer that ignores unknown keys keeps working. A consumer should reject a document whose schema_version is greater than the one it was written against.

Current version: 1.

A real document

From mix firebreak --format model on the test fixture (test/fixtures/sample):

{
  "schema_version": 1,
  "supervisors": [
    {
      "parent": null,
      "supervisor": "Elixir.Sample.App",
      "strategy": "one_for_one",
      "max_restarts": 3,
      "max_seconds": 5,
      "children": [
        { "module": "Elixir.Sample.CacheSup", "restart": "permanent", "type": "worker", "is_supervisor": true },
        { "module": "Elixir.Sample.Worker", "restart": "permanent", "type": "worker", "is_supervisor": false }
      ],
      "inbound_crossings": []
    },
    {
      "parent": "Elixir.Sample.App",
      "supervisor": "Elixir.Sample.CacheSup",
      "strategy": "one_for_one",
      "max_restarts": 3,
      "max_seconds": 5,
      "children": [
        { "module": "Elixir.Sample.Cache", "restart": "permanent", "type": "worker", "is_supervisor": false }
      ],
      "inbound_crossings": [
        { "from": "Elixir.Sample.Worker", "to": "Elixir.Sample.Cache", "kind": "call", "sync": true, "in_init": false }
      ]
    }
  ]
}

Read it: Sample.Worker (under the root) synchronously calls Sample.Cache (under Sample.CacheSup) — a sync crossing into CacheSup's subtree. That single fact is what lets a backend assert "a restart of CacheSup surfaces :noproc in Sample.Worker."

Writing a backend (the worked example)

A backend is a pure function from a bundle to whatever you want — a spec, a diagram, a scenario, a check. The reference consumer is Firebreak.Tla (mix firebreak.spec), which turns each bundle into a TLA+ lifecycle spec. Its whole dispatch is a function of the bundle's strategy and crossings:

def generate(%{children: [_ | _]} = bundle) do
  cond do
    # a :temporary child that's a synchronous target -> models a permanent :noproc
    temp_sync_target?(bundle)          -> temp_target_tla(bundle)
    bundle.strategy == :one_for_all    -> one_for_all_tla(bundle)
    bundle.strategy in [:one_for_one, :rest_for_one] -> budget_tla(bundle)
    is_nil(bundle.strategy)            -> {:unsupported, "no resolved strategy"}
    true                               -> {:unsupported, "strategy not templated"}
  end
end

def generate(_), do: {:unsupported, "no children to model"}

Two patterns to copy:

  1. Read only what the bundle gives you. max_restarts/max_seconds → the restart budget; strategy → the blast shape; children order → who co-restarts; parent → the escalation target.
  2. Gate availability properties on sync. Firebreak.Tla only emits the cross-tree :noproc property (extStuck) when inbound_crossings contains a crossing with sync: true — because only a synchronous caller blocks. Do the same: an async crossing is not an availability hazard.

The same shape works for any backend — a lockstep scenario, a sequence diagram, a service-graph overlay. If you can express it as "for each supervisor, given these facts," it's a pure function of this IR.

Validating a document

Firebreak.Model.valid?/1 (and validate/1, which returns reasons) checks a projection — the build/1 list form — against this contract, so a producer or consumer can self-verify:

bundles = Firebreak.Model.build(analysis)
Firebreak.Model.valid?(bundles)          # => true
Firebreak.Model.validate(malformed)      # => {:error, ["bundle 0: missing key(s) [...]", ...]}