ExAequo.RegexTokenizer (ExAequo v0.6.6)

Allows tokenizing text by means of prioritized regular expressions
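
Patterns are tried in priority order, i.e. in the order in which they are listed, so a more specific pattern should come before a more general one. A minimal illustration, reusing the token forms from the examples further down (the remark after the sketch about reversing the order is an inference, not taken from these docs):

iex> tokens = [
...>   { "\\d+", &String.to_integer/1 },   # listed first, so runs of digits become integers
...>   { "\\w+", &String.to_atom/1 } ]     # more general, only tried if the first pattern fails
...> tokenize("42", tokens)
{:ok, [42]}

Had the "\\w+" token been listed first, it would match "42" as well and the result would presumably be {:ok, [:"42"]}.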

Summary

Functions

tokenize(string, tokens, options \\ [])

A simple example first

iex(1)> tokens = [
...(1)>   { "\\d+", &String.to_integer/1 },
...(1)>   { "[\\s,]+", &nil_fn/1 },                   # from ExAequo.Fn
...(1)>   { "\\w+", &String.to_atom/1 } ]
...(1)> tokenize("42, and 43", tokens)
{:ok, [42, nil, :and, nil, 43]}

If we want to ignore nil (or other values), we can pass the ignores: option

iex(2)> tokens = [
...(2)>   { "\\d+", &String.to_integer/1 },
...(2)>   { "[\\s,]+", &nil_fn/1 },                   # from ExAequo.Fn
...(2)>   { "\\w+", &String.to_atom/1 } ]
...(2)> tokenize("42, and 43", tokens, ignores: [nil])
{:ok, [42, :and, 43]}

And a slightly more complex example, as used in this library

iex(3)> tokens = [
...(3)>   "\\\\(.)",                                  # same as {"\\.\\s+", &(&1)}
...(3)>   "\\.\\s+",
...(3)>   { "\\.(\\w+)\\.", &String.to_atom/1 },
...(3)>   ".[^\\\\.]+" ]
...(3)> [
...(3)> tokenize!(".red.hello", tokens),
...(3)> tokenize!(". \\.red.blue\\..green.", tokens)]
[
  [:red, "hello"],
  [". ", ".", "red", ".blue", ".", :green]
]

tokenize!(string, tokens, options \\ [])

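As the more complex example above shows, the bang variant returns the token list directly instead of wrapping it in an {:ok, _} tuple. A minimal sketch, assuming it otherwise accepts the same arguments as tokenize/3:

iex> tokens = [
...>   { "\\d+", &String.to_integer/1 },
...>   { "[\\s,]+", &nil_fn/1 },                   # from ExAequo.Fn
...>   { "\\w+", &String.to_atom/1 } ]
...> tokenize!("42, and 43", tokens)
[42, nil, :and, nil, 43]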