Tokeniser for the Snowball string-processing language.
Converts a Snowball source binary into a flat list of token/0 tuples.
NimbleParsec is used to compose the individual token parsers; the public
tokenize/1 function wraps the generated parser and strips whitespace and
comments from the result.
Token format
Each token is a tagged tuple {tag, value, line}:
- {:integer, n, line}: decimal or hexadecimal integer literal.
- {:string, s, line}: single-quoted string literal (UTF-8 binary). {'} inside the string produces a literal apostrophe; {X} for a single char X produces that char literally.
- {:name, s, line}: identifier (not a reserved keyword).
- {:keyword, atom, line}: reserved keyword, e.g. {:keyword, :define, 5}.
- {:sym, atom, line}: punctuation or operator, e.g. {:sym, :lparen, 3}.
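Because every token shares the {tag, value, line} shape, consumers can dispatch on the tag with ordinary pattern matching. The following is a minimal sketch, not part of the lexer: the TokenSketch module and its describe/1 function are hypothetical, and the token list is written by hand from the shapes documented above rather than produced by the tokeniser.

```elixir
defmodule TokenSketch do
  # Hypothetical helper (not part of Snowball.Lexer): renders one token as text.
  # Each clause matches one of the five documented token tags.
  def describe({:integer, n, line}) when is_integer(n), do: "line #{line}: integer #{n}"
  def describe({:string, s, line}) when is_binary(s), do: "line #{line}: string #{inspect(s)}"
  def describe({:name, s, line}) when is_binary(s), do: "line #{line}: name #{s}"
  def describe({:keyword, kw, line}) when is_atom(kw), do: "line #{line}: keyword #{kw}"
  def describe({:sym, sym, line}) when is_atom(sym), do: "line #{line}: symbol #{sym}"
end

# Hand-written sample tokens, for illustration only.
tokens = [
  {:keyword, :define, 5},
  {:sym, :lparen, 3},
  {:string, "hello", 1}
]

Enum.each(tokens, fn token -> IO.puts(TokenSketch.describe(token)) end)
```

Matching on the tag in the function head keeps each case separate and makes an unexpected tuple fail loudly with a FunctionClauseError.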
Summary
Types
@type token() :: {:integer, integer(), pos_integer()} | {:string, binary(), pos_integer()} | {:name, binary(), pos_integer()} | {:keyword, atom(), pos_integer()} | {:sym, atom(), pos_integer()}
A single Snowball token.
- {:integer, n, line}: integer literal.
- {:string, s, line}: string literal (UTF-8 binary).
- {:name, s, line}: identifier.
- {:keyword, atom, line}: reserved keyword.
- {:sym, atom, line}: punctuation / operator.
Functions
@spec _tokenize_impl(binary(), keyword()) :: {:ok, [term()], rest, context, line, byte_offset} | {:error, reason, rest, context, line, byte_offset} when line: {pos_integer(), byte_offset}, byte_offset: non_neg_integer(), rest: binary(), reason: String.t(), context: map()
Parses the given binary with the generated _tokenize_impl parser.
Returns {:ok, [token], rest, context, position, byte_offset} or
{:error, reason, rest, context, position, byte_offset}, where position
(named line in the spec above) describes the location of the _tokenize_impl
parser (start position) as {line, offset_to_start_of_line}.
The column where the error occurred can be inferred from byte_offset - offset_to_start_of_line.
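The column arithmetic can be sketched as follows. This is an illustration under stated assumptions: the ColumnSketch module and both of its functions are hypothetical and not part of the lexer, and the inputs are hand-built rather than taken from a real parse. Note that the subtraction yields a zero-based column in bytes, which differs from the character column when the line contains multi-byte UTF-8 characters.

```elixir
defmodule ColumnSketch do
  # Hypothetical helper: zero-based byte column, computed as
  # byte_offset - offset_to_start_of_line.
  def byte_column({_line, line_start}, byte_offset)
      when is_integer(line_start) and is_integer(byte_offset) and
             byte_offset >= line_start do
    byte_offset - line_start
  end

  # Hypothetical helper: zero-based character column, counted by measuring
  # the text between the start of the line and the reported offset.
  def char_column(source, {_line, line_start} = position, byte_offset)
      when is_binary(source) do
    width = byte_column(position, byte_offset)
    source |> binary_part(line_start, width) |> String.length()
  end
end

# Hand-built input: "é" (U+00E9) occupies two bytes, so "?" sits at byte 3.
source = "a\u00E9?"

IO.inspect(ColumnSketch.byte_column({1, 0}, 3))         # prints 3
IO.inspect(ColumnSketch.char_column(source, {1, 0}, 3)) # prints 2
```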
Options
- :byte_offset - the byte offset for the whole binary, defaults to 0
- :line - the line and the byte offset into that line, defaults to {1, byte_offset}
- :context - the initial context value. It will be converted to a map.
@spec tokenize(binary()) :: {:ok, [token()]} | {:error, binary(), binary(), pos_integer()}
Tokenize a Snowball source binary.
Arguments
- source is the UTF-8 binary source text.
Returns
- {:ok, tokens}: a list of token/0 tuples in source order.
- {:error, reason, rest, line}: the tokeniser failed at rest on line with the given reason.
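A caller will typically branch on the two result shapes. The sketch below is one possible way to do that, assuming nothing beyond the tuple shapes documented above: the ResultSketch module and summarize/1 are hypothetical, and the error tuple (including its reason text) is invented for illustration rather than captured from the tokeniser.

```elixir
defmodule ResultSketch do
  # Hypothetical helper: turns a tokenize/1-shaped result into a short summary.
  def summarize({:ok, tokens}) when is_list(tokens) do
    "#{length(tokens)} token(s)"
  end

  def summarize({:error, reason, rest, line}) when is_binary(rest) do
    # Show only the first few characters of the unparsed remainder.
    "line #{line}: #{reason} near #{inspect(String.slice(rest, 0, 10))}"
  end
end

# Success shape, as in the documented example for "define".
IO.puts(ResultSketch.summarize({:ok, [{:keyword, :define, 1}]}))

# Hand-built error shape; the reason string is an assumption.
IO.puts(ResultSketch.summarize({:error, "unexpected input", "@ foo", 2}))
```

Truncating rest keeps the message readable when the failure occurs early in a long source file.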
Examples
iex> Snowball.Lexer.tokenize("define")
{:ok, [{:keyword, :define, 1}]}
iex> Snowball.Lexer.tokenize("'hello'")
{:ok, [{:string, "hello", 1}]}