Interprets output from various provers into a standardized format.
The TPTP-oriented prover outputs (used by SystemOnTPTP and StarExec backends
when solvers follow the SZS Ontology) are interpreted by interpret_result/1.
Isabelle use_theories results are interpreted by
interpret_isabelle_result/1.
Summary
Types
Standardized format for ATP outputs.
Unexpected failures (including parsing errors of user input)
Per-lemma result returned by per_lemma_results/2.
Expected ATP results
Functions
Concatenates all "message" strings from a use_theories payload (the
result field of a finished %IsabelleClient.Task{}) into a single
newline-separated string.
Interprets the result payload of a finished use_theories task — the map
returned in %IsabelleClient.Task{status: :finished, result: ...}, with
top-level keys "ok", "errors", "nodes", etc.
Interprets the output of provers that follow the SZS ontology (including those available on SystemOnTPTP) into the standardized representation.
Parses a finished use_theories payload into one entry per lemma.
Types
Standardized format for ATP outputs.
@type failure_t() :: :internal_error | :malformed_input | {:prover_not_found, String.t()} | {:unrecognized_output, String.t()}
Unexpected failures (including parsing errors of user input)
@type lemma_result() :: %{ line: non_neg_integer() | nil, name: String.t() | nil, result: atp_result() }
Per-lemma result returned by per_lemma_results/2.
line— source line in the theory file as reported by Isabelle;nilwhen the underlying message carries no position. If the theory was auto-wrapped byprove_lemmas/4, this is already adjusted to the caller's line numbering (body line 1 = 1, not 2).name— lemma name extracted from Isabelle's completion message, ornilfor anonymous lemmas, sledgehammer/nitpick verdicts (which Isabelle does not tag with the lemma name), and unnamed error entries.result— any value ofatp_result/0. The same signal-table classifier used byinterpret_isabelle_result/1runs per line, sosledgehammer ... oopsandnitpickverdicts surface as{:ok, :thm}/{:ok, :csat}/{:ok, :sat}/{:ok, :gave_up}/{:ok, :timeout}/{:ok, :out_of_resources}instead of collapsing to:gave_up.
@type success_t() ::
:thm | :csat | :sat | :timeout | :out_of_resources | :gave_up | :interrupted
Expected ATP results
Functions
Concatenates all "message" strings from a use_theories payload (the
result field of a finished %IsabelleClient.Task{}) into a single
newline-separated string.
Walks all nodes in the payload, not just the first — useful when a theory transitively imports others and the server reports messages against multiple nodes.
@spec interpret_isabelle_result(map()) :: atp_result()
Interprets the result payload of a finished use_theories task — the map
returned in %IsabelleClient.Task{status: :finished, result: ...}, with
top-level keys "ok", "errors", "nodes", etc.
Classification is driven by the messages Isabelle emits, not by whether the task finished without errors:
- a "Nitpick found a counterexample" message yields
{:ok, :csat}; - a "Nitpick found a model" message yields
{:ok, :sat}; - a Sledgehammer "found a proof" message yields
{:ok, :thm}; - any message starting with
"theorem "(Isabelle's proof-completion notification, emitted for every discharged goal regardless of the tactic used) yields{:ok, :thm}; - a "timed out" / "TIMEOUT" message yields
{:ok, :timeout}; - a "Out of memory" message yields
{:ok, :out_of_resources}; - remaining cases yield
{:ok, :gave_up}.
Failed tasks should be surfaced through AtpClient.Isabelle.prove_theory/4's
{:error, {:isabelle_failed, _, _}} channel and never reach this function.
@spec interpret_result(String.t()) :: atp_result()
Interprets the output of provers that follow the SZS ontology (including those available on SystemOnTPTP) into the standardized representation.
This might not work for all available systems but at least for those available
through the sledgehammer tool in Isabelle/HOL. Much of the result
interpretation is taken directly from the sledgehammer source code
(https://github.com/seL4/isabelle/blob/master/src/HOL/Tools/Sledgehammer/sledgehammer_atp_systems.ML).
@spec per_lemma_results( map(), keyword() ) :: [lemma_result()]
Parses a finished use_theories payload into one entry per lemma.
Returns a list of lemma_result/0 maps sorted by line number. Messages
are grouped by their reported pos.line and each group is classified in
this order:
- Isabelle's
"theorem name:"completion notification →{:ok, :thm}, withnameextracted from the message; "Nitpick found a counterexample"→{:ok, :csat};"Nitpick found a model"→{:ok, :sat};- Sledgehammer
"found a proof"→{:ok, :thm}(namenil— the message does not carry the lemma name); "timed out"/"TIMEOUT"→{:ok, :timeout};"Out of memory"→{:ok, :out_of_resources};"Nitpick found no …"/"No proof found"→{:ok, :gave_up};- any other error-kind message at this line →
{:ok, :gave_up}; - groups with none of the above produce no entry.
Groups with pos.line == nil are classified the same way; the resulting
entry carries line: nil.
Pass line_offset: n to subtract n from all Isabelle-reported line
numbers. AtpClient.Isabelle.prove_lemmas/4 uses this to undo the +1 shift
introduced by auto-wrapping.