More documentation will be added, for now please refer to the documentation of NPeg, a Nim implementation of a similar PEG parser.
Introduction
XPeg is a pure Elixir pattern matching library. It provides macros to compile patterns and grammars (PEGs) to Elixir function which will parse a string and collect selected parts of the input. PEGs are not unlike regular expressions, but offer more power and flexibility, and have less ambiguities. (More about PEGs on Wikipedia)
Some use cases where XPeg is useful are configuration or data file parsers, robust protocol implementations, input validation, lexing of programming languages or domain specific languages.
Installation
def deps do
[
{:xpeg, "~> 0.3.0"}
]
end
Quickstart
Here is a simple example showing the power of XPeg: The macro peg
compiles a
grammar definition into a parser
functiion, which is used to match a string and
place the key-value pairs into a list of tuples:
p = Xpeg.peg :dict do
:dict <- :pair * star("," * :pair) * !1
:pair <- :word * "=" * :number * fn [a,b|cs] -> [{b,a}|cs] end
:word <- cap(+{'a'..'z'})
:number <- cap(+{'0'..'9'}) * fn [v|cs] -> [String.to_integer(v) | cs] end
end
Xpeg.match(p, "grass=4,horse=1,star=2")
Output:
[{"star", 2}, {"horse", 1}, {"grass", 4}]
Usage
The basic operation consists of the provided grammar, which consists of a set of named rules. A rule is made up of a number of atoms (not to be confused with Elixirs atom) and operators, which are executed to match the input string. Rules can also call into other rules, allowing for recursive grammars.
For example, the grammar below matches a comma-separated list of words
p = peg :list do
:list <- :word * star( "," * :word )
:word <- +{'a'..'z'}
end
- The
:list
rule matches one:word
, followed by zero or more (star(P)
) times a,
followed by a:word
- The
:word
rule matches one-or-more (+P
) times the set of characters ({}
) consisting of all letters from'a'
to'z'
During the execution of the grammar, matching parts of the subject strings can
be captured with the cap()
operator. All captures are stored on the
captures
list inside the parser state. This list is returned by the match()
function, but can also be used by in-grammar functions to perform conversions
or transformations.
Below is the same grammar as above, but in this case it captures all
the individual :word:
s:
p = peg :list do
:list <- :word * star( "," * :word )
:word <- cap(+{'a'..'z'})
end
match(p, "one,two,three")
The above will return these following list of captures:
["three", "two", "one"]
A powerful feature allows mixing of Elixir functions with the grammar, which can be used to perform transformations of the captures or build abstract syntax trees (ASTs) on-the-fly.
For example, the grammar above is changed to match numbers instead of words,
and a conversion function is called after every matching number that
converts the last captured value on the captures
list to an integer:
p = peg :list do
:list <- :word * star( "," * :word )
:word <- cap(+{'0'..'9'}) *
fn [v|cs] ->
[String.to_integer(v)|cs]
end
end
match(p, "123,42,31415")
which results in the following captures:
[31415, 42, 123]
More elaborate examples can be found in examples_test.exs, including a parser for arithmatic expressions and a full JSON parser.
Syntax
The XPeg syntax is similar to normal PEG notation, but some changes were made to allow the grammar to be properly parsed by the Elixir compiler:
- XPeg uses prefix operators instead of suffix operators for
+
,-
- Elixir does not support the
*
and?
prefix operators, so insteadstar(P)
andopt(P)
are used - The explicit
*
infix operator is used for concatenation
XPeg patterns and grammars can be composed of the following parts:
Atoms:
0 # matches always and consumes nothing
1 # matches any character
n # matches exactly n characters
'x' # matches literal character 'x'
"xyz" # matches literal string "xyz"
{'x'..'y'} # matches any character in the range from 'x'..'y'
{'x','y','z'} # matches any character from the set
Operators:
P1 * P2 # concatenation
P1 | P2 # ordered choice
P1 - P2 # matches P1 if P2 does not match
(P) # grouping
!P # matches everything but P
&P # matches P without consuming input
opt(P) # matches P zero or one times
star(P) # matches P zero or more times
+P # matches P one or more times
P[n] # matches P n times
P[m..n] # matches P m to n times
Captures:
cap(P) # Adds the matched patterns to the captures list
Elixir function:
fn(captures) # Elixir function for transormations
TODO
XPeg is already useful, but still young and immature. Here's a list of things I want to fix:
- I do not like the
star()
andopt()
syntax of the AST, but given the limited support for prefix operators in Elixir I'm not yet sure how to make this better - Port most of the optimizations from NPeg to XPeg
- Fix the subpar compile time performance