Snowball.Lexer (snowball v0.1.1)


Tokeniser for the Snowball string-processing language.

Converts a Snowball source binary into a flat list of token/0 tuples. NimbleParsec is used to compose the individual token parsers; the public tokenize/1 function wraps the generated parser and strips whitespace and comments from the result.

Token format

Each token is a tagged tuple {tag, value, line}:

  • {:integer, n, line} — decimal or hexadecimal integer literal.

  • {:string, s, line} — single-quoted string literal (UTF-8 binary). {'} inside the string produces a literal apostrophe; {X} for a single char X produces that char literally.

  • {:name, s, line} — identifier (not a reserved keyword).

  • {:keyword, atom, line} — reserved keyword, e.g. {:keyword, :define, 5}.

  • {:sym, atom, line} — punctuation or operator, e.g. {:sym, :lparen, 3}.
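The {'} and {X} escape rule for string literals can be sketched as a small binary-matching scanner. This is a hypothetical helper: StringEscapeSketch is not part of Snowball.Lexer, which builds its parsers with NimbleParsec. The sketch assumes its input begins just after the opening quote. It passes any other brace sequence, such as a multi-character {...}, through literally, and the real lexer may handle those differently.

```elixir
defmodule StringEscapeSketch do
  # Hypothetical helper, NOT part of Snowball.Lexer.
  # Scans the body of a single-quoted literal; `input` starts just after
  # the opening quote. Returns {:ok, decoded, rest_after_closing_quote}.
  def scan(input), do: scan(input, [])

  # {X} for a single character X yields X literally (so {'} yields ').
  defp scan(<<"{", c::utf8, "}", rest::binary>>, acc), do: scan(rest, [<<c::utf8>> | acc])

  # An unescaped apostrophe closes the literal.
  defp scan(<<"'", rest::binary>>, acc),
    do: {:ok, acc |> Enum.reverse() |> IO.iodata_to_binary(), rest}

  # Any other UTF-8 character is copied as-is (including a stray "{").
  defp scan(<<c::utf8, rest::binary>>, acc), do: scan(rest, [<<c::utf8>> | acc])

  # Input ran out before the closing quote.
  defp scan(<<>>, _acc), do: {:error, :unterminated}

  # Bytes that are not valid UTF-8.
  defp scan(_invalid, _acc), do: {:error, :invalid_utf8}
end
```

For example, scanning the body of 'it{'}s' yields the binary it's, followed by whatever came after the closing quote.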

Summary

Types

A single Snowball token.

Functions

Runs the generated _tokenize_impl parser over the given binary.

Tokenize a Snowball source binary.

Types

token()

@type token() ::
  {:integer, integer(), pos_integer()}
  | {:string, binary(), pos_integer()}
  | {:name, binary(), pos_integer()}
  | {:keyword, atom(), pos_integer()}
  | {:sym, atom(), pos_integer()}

A single Snowball token.

  • {:integer, n, line} — integer literal.

  • {:string, s, line} — string literal (UTF-8 binary).

  • {:name, s, line} — identifier.

  • {:keyword, atom, line} — reserved keyword.

  • {:sym, atom, line} — punctuation / operator.
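Because every token is a tagged tuple, consumers typically pattern-match on the first element. The sketch below mirrors the token() type and adds a keyword/name classifier. TokenSketch, identifier/2 and describe/1 are hypothetical, and the keyword table is an illustrative subset only, not Snowball.Lexer's real reserved-word list.

```elixir
defmodule TokenSketch do
  # Same shapes as Snowball.Lexer's token() type.
  @type token ::
          {:integer, integer(), pos_integer()}
          | {:string, binary(), pos_integer()}
          | {:name, binary(), pos_integer()}
          | {:keyword, atom(), pos_integer()}
          | {:sym, atom(), pos_integer()}

  # Illustrative subset of reserved words (assumption, not the real list).
  @keywords %{"define" => :define, "as" => :as, "among" => :among}

  # A map lookup (rather than String.to_atom/1) means arbitrary
  # identifiers never create new atoms.
  @spec identifier(binary(), pos_integer()) :: token()
  def identifier(word, line) do
    case Map.fetch(@keywords, word) do
      {:ok, kw} -> {:keyword, kw, line}
      :error -> {:name, word, line}
    end
  end

  # One clause per tag: the usual way to consume a token list.
  @spec describe(token()) :: String.t()
  def describe({:integer, n, line}), do: "line #{line}: integer #{n}"
  def describe({:string, s, line}), do: "line #{line}: string #{inspect(s)}"
  def describe({:name, s, line}), do: "line #{line}: name #{s}"
  def describe({:keyword, kw, line}), do: "line #{line}: keyword #{kw}"
  def describe({:sym, sym, line}), do: "line #{line}: symbol #{sym}"
end
```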

Functions

_tokenize_impl(binary, opts \\ [])

@spec _tokenize_impl(binary(), keyword()) ::
  {:ok, [term()], rest, context, line, byte_offset}
  | {:error, reason, rest, context, line, byte_offset}
when line: {pos_integer(), byte_offset},
     byte_offset: non_neg_integer(),
     rest: binary(),
     reason: String.t(),
     context: map()

Runs the generated _tokenize_impl parser over the given binary.

Returns {:ok, [token], rest, context, position, byte_offset} on success or {:error, reason, rest, context, position, byte_offset} on failure, where position (named line in the spec above) is a {line, offset_to_start_of_line} tuple and byte_offset is an offset into the whole binary.

The column where the error occurred can be computed as byte_offset - offset_to_start_of_line, a 0-based count of bytes from the start of the line.
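The column rule above can be applied directly to an error tuple of the shape given in the spec. ErrorColumnSketch is a hypothetical helper, and the tuple in the usage below is built by hand rather than taken from a real parse.

```elixir
defmodule ErrorColumnSketch do
  # Hypothetical helper: extracts {line, column} from an error tuple shaped
  # like the one in the _tokenize_impl/2 spec. The column is counted in
  # bytes (not characters); the + 1 converts it to a 1-based column.
  def location({:error, _reason, _rest, _context, {line, line_start}, byte_offset}) do
    {line, byte_offset - line_start + 1}
  end
end
```

With {3, 20} as the position and 24 as the byte offset, the error sits on line 3 at 1-based byte column 5. Multi-byte UTF-8 characters earlier on the line will make this differ from a character column.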

Options

  • :byte_offset - the starting byte offset of the binary within the whole input; defaults to 0.
  • :line - the starting position as {line, offset_to_start_of_line}; defaults to {1, byte_offset}.
  • :context - the initial context value; it will be converted to a map.
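When tokenizing a fragment that sits partway through a larger file, the :byte_offset and :line options can be derived from the text that precedes it so that reported positions refer to the whole file. A minimal sketch; ResumeOpts.for_prefix/1 is a hypothetical helper, and it only builds the keyword list (it does not call the lexer).

```elixir
defmodule ResumeOpts do
  # Hypothetical helper: given the text preceding a fragment, build the
  # :byte_offset and :line options described above.
  def for_prefix(prefix) when is_binary(prefix) do
    byte_offset = byte_size(prefix)
    # Positions of every "\n" in the prefix, as {pos, len} pairs.
    newlines = :binary.matches(prefix, "\n")
    line = length(newlines) + 1

    # The current line starts just after the last newline (or at 0).
    line_start =
      case List.last(newlines) do
        nil -> 0
        {pos, _len} -> pos + 1
      end

    [byte_offset: byte_offset, line: {line, line_start}]
  end
end
```

The result could then be passed as the opts argument, for example Snowball.Lexer._tokenize_impl(fragment, ResumeOpts.for_prefix(prefix)). An empty prefix produces the documented defaults.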

tokenize(source)

@spec tokenize(binary()) ::
  {:ok, [token()]} | {:error, binary(), binary(), pos_integer()}

Tokenize a Snowball source binary.

Arguments

  • source is the UTF-8 binary source text.

Returns

  • {:ok, tokens} — a list of token/0 tuples in source order.

  • {:error, reason, rest, line} — tokenisation failed: reason describes the failure, rest is the unconsumed input starting at the failure point, and line is the line on which it occurred.
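A caller usually handles both documented return shapes in one place. In the sketch below, TokenizeResultSketch is hypothetical and receives the tuples directly so it runs without the library. In real code the argument would be the result of Snowball.Lexer.tokenize(source), and the reason string in the usage is invented for illustration.

```elixir
defmodule TokenizeResultSketch do
  # Hypothetical helper: turns either return shape of tokenize/1
  # into a short human-readable message.
  @spec summarize({:ok, list()} | {:error, binary(), binary(), pos_integer()}) :: String.t()
  def summarize({:ok, tokens}), do: "ok: #{length(tokens)} token(s)"

  def summarize({:error, reason, rest, line}) do
    # Show only a short prefix of the unconsumed input.
    snippet = String.slice(rest, 0, 10)
    "line #{line}: #{reason} near #{inspect(snippet)}"
  end
end
```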

Examples

iex> Snowball.Lexer.tokenize("define")
{:ok, [{:keyword, :define, 1}]}

iex> Snowball.Lexer.tokenize("'hello'")
{:ok, [{:string, "hello", 1}]}
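Every token carries its line number as the third tuple element, so downstream code can recover line structure from the flat list. A minimal sketch; TokensByLine.group/1 is a hypothetical helper, and the token list in the usage is written by hand to match the documented shapes rather than produced by running the lexer.

```elixir
defmodule TokensByLine do
  # Hypothetical helper: groups a tokenize/1-style token list by line.
  # Enum.group_by/2 keeps tokens in their original order within each group.
  def group(tokens) do
    Enum.group_by(tokens, fn {_tag, _value, line} -> line end)
  end
end
```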