Interprets output from various provers into a standardized format.
The TPTP-oriented prover outputs (used by SystemOnTPTP and StarExec backends
when solvers follow the SZS Ontology) are interpreted by interpret_result/1.
Isabelle use_theories results are interpreted by
interpret_isabelle_result/1.
Summary
Types
Standardized format for ATP outputs.
Unexpected failures (including parsing errors of user input)
Per-lemma result returned by per_lemma_results/3.
Specification for one lemma in a generated theory body, as computed by
AtpClient.Isabelle.lemma_specs/1.
Expected ATP results
Functions
Concatenates all "message" strings from a use_theories payload (the
result field of a finished %IsabelleClient.Task{}) into a single
newline-separated string.
Interprets the result payload of a finished use_theories task — the map
returned in %IsabelleClient.Task{status: :finished, result: ...}, with
top-level keys "ok", "errors", "nodes", etc.
Interprets the output of provers that follow the SZS ontology (including those available on SystemOnTPTP) into the standardized representation.
Classifies a finished use_theories payload into one entry per lemma,
in the order given by lemma_specs.
Types
Standardized format for ATP outputs.
@type failure_t() :: :internal_error | :malformed_input | {:prover_not_found, String.t()} | {:unrecognized_output, String.t()}
Unexpected failures (including parsing errors of user input)
@type lemma_result() :: %{name: String.t() | nil, result: atp_result()}
Per-lemma result returned by per_lemma_results/3.
Carries the lemma name from the input body (not parsed out of Isabelle
messages, which omit it for oops-based diagnostic methods) and the
classified outcome. The on-disk line number is intentionally not
exposed: it refers to the generated theory file, not the caller's TPTP
source, so it would mislead more often than it would help.
Specification for one lemma in a generated theory body, as computed by
AtpClient.Isabelle.lemma_specs/1.
name— lemma name as written in the body (the same string the classifier surfaces inlemma_result/0).range— body-line range (inclusive) the lemma covers, from itslemma <name>:line through the line before the nextlemma(or the end of the body). Used byper_lemma_results/3to bucket Isabelle messages without relying on the exactpos.lineof each message, which is noisy for sledgehammer / nitpick output.
@type success_t() ::
:thm | :csat | :sat | :timeout | :out_of_resources | :gave_up | :interrupted
Expected ATP results
Functions
Concatenates all "message" strings from a use_theories payload (the
result field of a finished %IsabelleClient.Task{}) into a single
newline-separated string.
Walks all nodes in the payload, not just the first — useful when a theory transitively imports others and the server reports messages against multiple nodes.
@spec interpret_isabelle_result(map()) :: atp_result()
Interprets the result payload of a finished use_theories task — the map
returned in %IsabelleClient.Task{status: :finished, result: ...}, with
top-level keys "ok", "errors", "nodes", etc.
Classification is driven by the messages Isabelle emits, not by whether the task finished without errors:
- a "Nitpick found a counterexample" message yields
{:ok, :csat}; - a "Nitpick found a model" message yields
{:ok, :sat}; - a Sledgehammer "found a proof" message yields
{:ok, :thm}; - any message starting with
"theorem "(Isabelle's proof-completion notification, emitted for every discharged goal regardless of the tactic used) yields{:ok, :thm}; - a "timed out" / "TIMEOUT" message yields
{:ok, :timeout}; - a "Out of memory" message yields
{:ok, :out_of_resources}; - remaining cases yield
{:ok, :gave_up}.
Failed tasks should be surfaced through AtpClient.Isabelle.prove_theory/4's
{:error, {:isabelle_failed, _, _}} channel and never reach this function.
@spec interpret_result(String.t()) :: atp_result()
Interprets the output of provers that follow the SZS ontology (including those available on SystemOnTPTP) into the standardized representation.
This might not work for all available systems but at least for those available
through the sledgehammer tool in Isabelle/HOL. Much of the result
interpretation is taken directly from the sledgehammer source code
(https://github.com/seL4/isabelle/blob/master/src/HOL/Tools/Sledgehammer/sledgehammer_atp_systems.ML).
@spec per_lemma_results(map(), [lemma_spec()], keyword()) :: [lemma_result()]
Classifies a finished use_theories payload into one entry per lemma,
in the order given by lemma_specs.
Messages from each lemma's body-line range are bucketed and scanned
one-at-a-time — never across the concatenated text of a bucket. The
cross-message scan that earlier versions of this function used produced
false :csat verdicts when one message read Nitpick found a model
and another read Nitpick found no counterexample (the
"Nitpick found a" + "counterexample" substring test fires across the
two), and false :thm verdicts when by auto failed on a False goal
(Isabelle still echoes theorem name: <goal> at the by position next
to the Failed to finish proof error).
Options
:file— keep only messages whosepos.fileends with the given suffix (typically"/<theory_name>.thy"). Messages from the bundledTPTP.thyand from transitively imported theories are dropped. Defaults to no filter.:line_offset— number to subtract from each Isabelle-reportedpos.linebefore comparing againstlemma_spec.range. Auto-wrap adds one line (theory <name> imports … begin) to the on-disk file, so callers that pass a body without the header set this to1. Defaults to0.
Verdict precedence
Within a lemma bucket, signals collected per-message are reconciled in this order (each step short-circuits):
:csat(Nitpick / Quickcheck counter-example) — disproves the goal; dominates any concurrent proof attempt.:thmfrom a sledgehammer / metis"found a proof"line.:sat(Nitpick / Quickcheck model).- Proof-method failure veto — an
error-kind message containing"Failed to finish proof"cancels atheorem name:verdict from the same bucket. :thmfrom atheorem name:completion notification.:timeout,:out_of_resources, then:gave_up.
Lemmas whose bucket is empty come back as {:ok, :gave_up}.