Contains a tokenize/1 function for tokenizing a string representing a
formula in THF syntax or a TPTP TH0/TH1 problem file using NimbleParsec.
This is mainly used as a preprocessing step for parsing. For information about
the returned structure, see
https://hexdocs.pm/nimble_parsec/NimbleParsec.html.
Every token is a 3-tuple {type, value, byte_offset} where byte_offset is
the position of the first byte of the token in the original input string.
Downstream parsers use this to report the position of type errors.
Examples
iex> {:ok, tokens, "", _, _, _} = tokenize("A & B")
iex> tokens
[{:var "A", 0}, {:and, "&", 2}, {:var, "B", 4}]
Summary
Types
Tokens are generated by the lexer as a list of {type, value, byte_offset}
triples, where type is a label given as atom, value is the original
string representation of the token, and byte_offset is the position in
the original input.
Functions
Converts a byte offset into the source string into a {line, column} pair
(1-indexed). Used for decorating type-error messages produced by the parser.
Parses the given binary as tokenize.
Types
@type tokens() :: [{atom(), String.t() | atom(), non_neg_integer()}]
Tokens are generated by the lexer as a list of {type, value, byte_offset}
triples, where type is a label given as atom, value is the original
string representation of the token, and byte_offset is the position in
the original input.
Functions
@spec byte_offset_to_line_col(String.t(), non_neg_integer()) :: {pos_integer(), pos_integer()}
Converts a byte offset into the source string into a {line, column} pair
(1-indexed). Used for decorating type-error messages produced by the parser.
@spec tokenize(binary(), keyword()) :: {:ok, [term()], rest, context, line, byte_offset} | {:error, reason, rest, context, line, byte_offset} when line: {pos_integer(), byte_offset}, byte_offset: non_neg_integer(), rest: binary(), reason: String.t(), context: map()
Parses the given binary as tokenize.
Returns {:ok, [token], rest, context, position, byte_offset} or
{:error, reason, rest, context, line, byte_offset} where position
describes the location of the tokenize (start position) as {line, offset_to_start_of_line}.
To column where the error occurred can be inferred from byte_offset - offset_to_start_of_line.
Options
:byte_offset- the byte offset for the whole binary, defaults to 0:line- the line and the byte offset into that line, defaults to{1, byte_offset}:context- the initial context value. It will be converted to a map