Grammar (Grammar v0.1.0)

This module provides a DSL to define parsers for structured inputs. Parsers are defined as grammars.

Grammars must be LL(1), i.e. they must be unambiguous and parseable with a single token of lookahead.

A grammar is defined by a set of rules, each rule being a set of clauses. Clauses are disjoint alternative paths in the rule resolution, like the or operator (|) in classic grammar notation.
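To illustrate the LL(1) requirement, here is a minimal sketch of two clauses that would share their first token, together with the usual left-factoring fix: the common prefix is consumed once, and a second rule chooses between the alternatives on the next token. FactoredGreetingParser and its rules are hypothetical, and the Grammar library is assumed to be available as a project dependency.

```elixir
# Assumes the Grammar library is a dependency of the project.
defmodule FactoredGreetingParser do
  use Grammar

  # NOT LL(1): both clauses would start with the same token "hello",
  # so a single token of lookahead could not choose between them:
  #
  #   rule start("hello", "world") do ... end
  #   rule start("hello", "there") do ... end
  #
  # Left-factored version: consume the shared "hello" once, then let
  # the :target rule pick a clause on the next token.
  rule start("hello", :target) do
    [_hello, target] = params
    target
  end

  # These two clauses start with different tokens, so they are disjoint.
  rule target("world") do
    [_world] = params
    :world
  end

  rule target("there") do
    [_there] = params
    :there
  end
end
```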

The tokenization process relies on the TokenExtractor protocol, which is used to extract tokens from the input string. This protocol is implemented for BitString and Regex, and can be extended to custom token types.

To declare a parser module, use the Grammar module in your module and define your rules with the rule/2 and rule?/2 macros. The new module exposes a parse/1 function that parses the input string and returns {:ok, result} on success, or an error tuple such as {:error, {1, 6}, :no_token} on failure, where {1, 6} is the position (line, column) at which parsing stopped.

See rule/2 for a full example.
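A minimal sketch of calling parse/1 and handling both outcomes, assuming the result shapes shown in the examples on this page: {:ok, result} on success and a three-element {:error, position, reason} tuple on failure. YesNoParser is a hypothetical module, and the Grammar library is assumed to be a project dependency.

```elixir
# Assumes the Grammar library is a dependency of the project.
defmodule YesNoParser do
  use Grammar

  # Two clauses of the entry rule; they start with different tokens.
  rule start("yes") do
    [_yes] = params
    true
  end

  rule start("no") do
    [_no] = params
    false
  end
end

# Pattern match on the two result shapes seen in the examples.
case YesNoParser.parse("yes") do
  {:ok, answer} -> IO.puts("parsed: #{answer}")
  {:error, position, reason} -> IO.puts("failed at #{inspect(position)}: #{inspect(reason)}")
end
```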

Spaces and line breaks handling

By default, the tokenizer will drop spaces and line breaks. If you want to keep them, you can pass the drop_spaces: false option to the use Grammar macro. In this case, you are fully responsible for handling spaces and line breaks in your rules.

Options

  • drop_spaces: true (default): if set to false, the tokenizer will not drop spaces and line breaks.

Example

In the following MyModuleKO module, the start rule doesn't handle spaces and line breaks, so it will fail if the input contains them.

iex> defmodule MyModuleKO do
...>   use Grammar, drop_spaces: false
...>
...>   # spaces and linebreaks not handled
...>
...>   rule start("hello", "world") do
...>     [_hello, _world] = params
...>     "hello world"
...>   end
...> end
iex> MyModuleKO.parse("helloworld")
{:ok, "hello world"}
iex> MyModuleKO.parse("hello world")
{:error, {1, 6}, :no_token}

But in the MyModuleOK module, the start rule explicitly handles spaces and line breaks between "hello" and "world".

Moreover, since this rule requires at least one whitespace character between "hello" and "world", parsing fails when none is found.

iex> defmodule MyModuleOK do
...>   use Grammar, drop_spaces: false
...>
...>   # spaces and line breaks handled by the ~r/[\s]+/ step
...>
...>   rule start("hello", ~r/[\s]+/, "world") do
...>     [_hello, _spaces, _world] = params
...>     "hello world"
...>   end
...> end
iex> MyModuleOK.parse("helloworld")
{:error, {1, 6}, :no_token}
iex> MyModuleOK.parse(~s/hello  \t world/)
{:ok, "hello world"}

Summary

Functions

rule(arg, list): Use this macro to define rules of your grammar.

rule?(arg, list): Same as rule/2 but relaxed: if the rule cannot be matched, it will be valued as nil.

Functions

rule(arg, list)

Use this macro to define rules of your grammar.

The first rule defined will be the entry rule of the grammar.

Calls to this macro that share the same name are grouped together, since they define the same rule; each call is one possible path in the rule resolution.

A single call to rule is called a clause. All clauses of a rule must be disjoint, i.e. they must not start with the same token. They can be understood as alternatives joined by the or operator.

Each rule clause is defined by:

  • a name, which is an atom
  • a definition, which is a list of atoms (names of other rules) or token prototypes (such as strings or regexes)
  • a block, which is the code to execute when the clause is fully matched

When executed, the code block has access to a params binding: a list holding the result of each step of the clause, in order.
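A minimal sketch of how params lines up with the steps of a clause. AssignmentParser and its rules are hypothetical. Following the item rule of the example below, a regex step is assumed to contribute the matched string, and a rule step (an atom) contributes the value returned by that rule's block.

```elixir
# Assumes the Grammar library is a dependency of the project.
defmodule AssignmentParser do
  use Grammar

  # params has one entry per step, in order: the regex match for the key,
  # the "=" literal, and the value returned by the :value rule.
  rule start(~r/[a-z]+/, "=", :value) do
    [key, _equals, value] = params
    {key, value}
  end

  # A regex step yields the matched string, converted here to an integer.
  rule value(~r/[0-9]+/) do
    [digits] = params
    String.to_integer(digits)
  end
end
```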

If a rule cannot be matched, a RuntimeError is raised (see rule?/2 for a relaxed version).

Example

iex> defmodule NumberOfNameListParser do
...>   use Grammar
...>
...>   rule start("[", :list_or_empty_list) do
...>     [_, list] = params
...>     list || []
...>   end
...>
...>   rule? list_or_empty_list(:item, :list_tail, "]") do
...>     [item, list_tail, _] = params
...>     [item | (list_tail || [])]
...>   end
...>
...>   rule? list_tail(",", :item, :list_tail) do
...>     [_, item, list_tail] = params
...>     [item | (list_tail || [])]
...>   end
...>
...>   rule item(~r/[0-9]+/) do
...>     [number] = params
...>     String.to_integer(number)
...>   end
...>
...>   rule item(~r/[a-zA-Z]+/) do
...>     [string] = params
...>     string
...>   end
...> end
iex> NumberOfNameListParser.parse("[1, toto, 23]")
{:ok, [1, "toto", 23]}

rule?(arg, list)

Same as rule/2 but relaxed: if the rule cannot be matched, it will be valued as nil.

Useful for optional or recursive rules.

See example in rule/2.
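A minimal sketch of an optional step with rule?/2. WeightParser and its rules are hypothetical, and the Grammar library is assumed to be a project dependency. The unit may be omitted, in which case the :unit step is valued as nil; since "kg" and ";" are different tokens, one token of lookahead still decides whether the optional rule applies.

```elixir
# Assumes the Grammar library is a dependency of the project.
defmodule WeightParser do
  use Grammar

  # Accepts "12 kg;" as well as "12;".
  rule start(~r/[0-9]+/, :unit, ";") do
    [number, unit, _semicolon] = params
    {String.to_integer(number), unit}
  end

  # Relaxed rule: when the next token is not "kg", it is valued as nil
  # instead of raising.
  rule? unit("kg") do
    [_kg] = params
    :kg
  end
end
```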