AtpClient.ResultNormalization (AtpClient v0.5.0)

Copy Markdown View Source

Interprets output from various provers into SZS-faithful atoms.

Each backend returns the same atp_result/0 shape: {:ok, status} where status is the SZS Ontology verdict downcased to an Elixir atom (e.g. :theorem, :unsatisfiable, :counter_satisfiable, :satisfiable, :gave_up, :timeout, …), or {:error, reason} when the call could not produce any SZS verdict (network failure, malformed prover output, missing binary). The distinction between e.g. :theorem ("conjecture follows from the axioms") and :unsatisfiable ("the clause set has no model") is preserved so callers do not lose information across backends — see https://tptp.org/Seminars/SZSOntologies/Summary.html for the full ontology.

TPTP-oriented prover outputs (SystemOnTPTP, StarExec, LocalExec) are classified by interpret_result/1; Isabelle use_theories payloads by interpret_isabelle_result/1 and per_lemma_results/3.

Summary

Types

Standardized format for ATP outputs.

Backend errors that prevented any SZS verdict from being produced.

Per-lemma result returned by per_lemma_results/3.

Specification for one lemma in a generated theory body, as computed by AtpClient.Isabelle.lemma_specs/1.

SZS NoSuccess statuses recognised explicitly by interpret_result/1.

SZS Ontology status atoms surfaced to callers. The Success/NoSuccess split mirrors the SZS hierarchy. The unions below enumerate the atoms the classifier maps explicitly — i.e. the SZS verdicts a sledgehammer-targeted prover (or a contemporary TPTP-compliant system) is likely to emit. A permissive fallback in interpret_result/1 also converts any unrecognised % SZS status <CamelCase> line to its snake-case atom (Tautology:tautology, EquivalentTheorem:equivalent_theorem), so the runtime shape is atom(); the typespec lists the well-known names for callers that pattern-match.

SZS Success statuses recognised explicitly by interpret_result/1.

Functions

Concatenates all "message" strings from a use_theories payload (the result field of a finished %IsabelleClient.Task{}) into a single newline-separated string.

Interprets the result payload of a finished use_theories task — the map returned in %IsabelleClient.Task{status: :finished, result: ...}, with top-level keys "ok", "errors", "nodes", etc.

Interprets the output of provers that follow the SZS ontology (including those available on SystemOnTPTP) into the standardized representation.

Classifies a finished use_theories payload into one entry per lemma, in the order given by lemma_specs.

Types

atp_result()

@type atp_result() :: {:ok, szs_status()} | {:error, failure_t()}

Standardized format for ATP outputs.

failure_t()

@type failure_t() ::
  :internal_error
  | :input_error
  | {:prover_not_found, String.t()}
  | {:unrecognized_output, String.t()}

Backend errors that prevented any SZS verdict from being produced.

These are not SZS statuses — they describe problems on the path between the caller and the prover (no executable, output the classifier could not parse, prover crash with no SZS line, malformed input rejected before classification).

lemma_result()

@type lemma_result() :: %{name: String.t() | nil, result: atp_result()}

Per-lemma result returned by per_lemma_results/3.

Carries the lemma name from the input body (not parsed out of Isabelle messages, which omit it for oops-based diagnostic methods) and the classified outcome. The on-disk line number is intentionally not exposed: it refers to the generated theory file, not the caller's TPTP source, so it would mislead more often than it would help.

lemma_spec()

@type lemma_spec() :: %{name: String.t(), range: Range.t()}

Specification for one lemma in a generated theory body, as computed by AtpClient.Isabelle.lemma_specs/1.

  • name — lemma name as written in the body (the same string the classifier surfaces in lemma_result/0).
  • range — body-line range (inclusive) the lemma covers, from its lemma <name>: line through the line before the next lemma (or the end of the body). Used by per_lemma_results/3 to bucket Isabelle messages without relying on the exact pos.line of each message, which is noisy for sledgehammer / nitpick output.

szs_no_success()

@type szs_no_success() ::
  :gave_up
  | :unknown
  | :incomplete
  | :timeout
  | :resource_out
  | :memory_out
  | :forced
  | :user
  | :inappropriate
  | :error
  | :input_error

SZS NoSuccess statuses recognised explicitly by interpret_result/1.

szs_status()

@type szs_status() :: szs_success() | szs_no_success() | atom()

SZS Ontology status atoms surfaced to callers. The Success/NoSuccess split mirrors the SZS hierarchy. The unions below enumerate the atoms the classifier maps explicitly — i.e. the SZS verdicts a sledgehammer-targeted prover (or a contemporary TPTP-compliant system) is likely to emit. A permissive fallback in interpret_result/1 also converts any unrecognised % SZS status <CamelCase> line to its snake-case atom (Tautology:tautology, EquivalentTheorem:equivalent_theorem), so the runtime shape is atom(); the typespec lists the well-known names for callers that pattern-match.

  • Success — a definite verdict: :theorem, :unsatisfiable, :satisfiable, :counter_satisfiable, :contradictory_axioms, :equivalent, :counter_equivalent, :counter_theorem, :equivalent_counter_theorem, :equi_satisfiable, :tautology, :tautologous_conclusion, :weaker_conclusion, :no_consequence.
  • NoSuccess — the prover finished without classifying the input as Success: :gave_up, :unknown, :incomplete, :timeout, :resource_out, :memory_out, :forced, :user, :inappropriate, :error, :input_error.

szs_success()

@type szs_success() ::
  :theorem
  | :unsatisfiable
  | :satisfiable
  | :counter_satisfiable
  | :contradictory_axioms
  | :equivalent
  | :counter_equivalent
  | :counter_theorem
  | :equivalent_counter_theorem
  | :equi_satisfiable
  | :tautology
  | :tautologous_conclusion
  | :weaker_conclusion
  | :no_consequence

SZS Success statuses recognised explicitly by interpret_result/1.

Functions

extract_isabelle_text(payload)

@spec extract_isabelle_text(map()) :: String.t()

Concatenates all "message" strings from a use_theories payload (the result field of a finished %IsabelleClient.Task{}) into a single newline-separated string.

Walks all nodes in the payload, not just the first — useful when a theory transitively imports others and the server reports messages against multiple nodes.

interpret_isabelle_result(payload)

@spec interpret_isabelle_result(map()) :: atp_result()

Interprets the result payload of a finished use_theories task — the map returned in %IsabelleClient.Task{status: :finished, result: ...}, with top-level keys "ok", "errors", "nodes", etc.

Classification is driven by the messages Isabelle emits, not by whether the task finished without errors. Isabelle is goal-directed (it proves stated conjectures), so successful proofs map to :theorem rather than :unsatisfiable, and a refuted goal maps to :counter_satisfiable:

  • a "Nitpick / Quickcheck found a counterexample" message yields {:ok, :counter_satisfiable};
  • a "Nitpick / Quickcheck found a model" message yields {:ok, :satisfiable};
  • a Sledgehammer "found a proof" message yields {:ok, :theorem};
  • any message starting with "theorem " (Isabelle's proof-completion notification, emitted for every discharged goal regardless of the tactic used) yields {:ok, :theorem};
  • a "timed out" / "TIMEOUT" message yields {:ok, :timeout};
  • a "Out of memory" message yields {:ok, :memory_out};
  • remaining cases yield {:ok, :gave_up}.

Failed tasks should be surfaced through AtpClient.Isabelle.prove_theory/4's {:error, {:isabelle_failed, _, _}} channel and never reach this function.

interpret_result(res_str)

@spec interpret_result(String.t()) :: atp_result()

Interprets the output of provers that follow the SZS ontology (including those available on SystemOnTPTP) into the standardized representation.

Three layers run in order:

  1. Prover-specific patterns for solvers that do not (always) emit an SZS status line — Alt-Ergo, E, iProver, LEO-II, SPASS, Vampire, Waldmeister. Mirrors the sledgehammer source (https://github.com/seL4/isabelle/blob/master/src/HOL/Tools/Sledgehammer/sledgehammer_atp_systems.ML) but maps to the honest SZS atom rather than sledgehammer's negated-conjecture interpretation: e.g. SPASS "Completion found" becomes {:ok, :satisfiable}, not {:ok, :gave_up}.
  2. The explicit SZS table (@known_szs_results) recognises every status in szs_success/0 and szs_no_success/0 from a "SZS status <Name>" or " says <Name>" substring.
  3. A permissive fallback converts any remaining "SZS status <CamelCase>" (or " says <CamelCase>") line to its snake-case atom, so SZS additions like EquivalentTheorem become {:ok, :equivalent_theorem} without a code change.

Output that none of the three layers matches comes back as {:error, {:unrecognized_output, res_str}}.

per_lemma_results(payload, lemma_specs, opts \\ [])

@spec per_lemma_results(map(), [lemma_spec()], keyword()) :: [lemma_result()]

Classifies a finished use_theories payload into one entry per lemma, in the order given by lemma_specs.

Messages from each lemma's body-line range are bucketed and scanned one-at-a-time — never across the concatenated text of a bucket. The cross-message scan that earlier versions of this function used produced false :counter_satisfiable verdicts when one message read Nitpick found a model and another read Nitpick found no counterexample (the "Nitpick found a" + "counterexample" substring test fires across the two), and false :theorem verdicts when by auto failed on a False goal (Isabelle still echoes theorem name: <goal> at the by position next to the Failed to finish proof error).

Options

  • :file — keep only messages whose pos.file ends with the given suffix (typically "/<theory_name>.thy"). Messages from the bundled TPTP.thy and from transitively imported theories are dropped. Defaults to no filter.
  • :line_offset — number to subtract from each Isabelle-reported pos.line before comparing against lemma_spec.range. Auto-wrap adds one line (theory <name> imports … begin) to the on-disk file, so callers that pass a body without the header set this to 1. Defaults to 0.

Verdict precedence

Within a lemma bucket, signals collected per-message are reconciled in this order (each step short-circuits):

  1. :counter_satisfiable (Nitpick / Quickcheck counter-example) — disproves the goal; dominates any concurrent proof attempt.
  2. :theorem from a sledgehammer / metis "found a proof" line.
  3. :satisfiable (Nitpick / Quickcheck model).
  4. Proof-method failure veto — an error-kind message containing "Failed to finish proof" cancels a theorem name: verdict from the same bucket.
  5. :theorem from a theorem name: completion notification.
  6. :timeout, :memory_out, then :gave_up.

Lemmas whose bucket is empty come back as {:ok, :gave_up}.