expug v0.7.3 Expug.Tokenizer

Tokenizes a Pug template into a list of tokens. The main entry point is tokenize/1.

iex> Expug.Tokenizer.tokenize("title= name")
[
  {{1, 8}, :buffered_text, "name"},
  {{1, 1}, :element_name, "title"},
  {{1, 1}, :indent, 0}
]

Note that the tokens are in reverse order! Prepending to the head of a list is cheaper than appending to its tail, so the tokenizer builds the list head-first.
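
Callers that want the tokens back in source order can simply reverse the list; the example above, flipped:

iex> "title= name" |> Expug.Tokenizer.tokenize() |> Enum.reverse()
[
  {{1, 1}, :indent, 0},
  {{1, 1}, :element_name, "title"},
  {{1, 8}, :buffered_text, "name"}
]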

This output is then consumed by Expug.Compiler, which turns the tokens into an Abstract Syntax Tree.

Token types

div.blue#box
  • :indent - 0
  • :element_name - "div"
  • :element_class - "blue"
  • :element_id - "box"
div(name="en")
  • :attribute_open - "("
  • :attribute_key - "name"
  • :attribute_value - "\"en\""
  • :attribute_close - ")"
div= hello
  • :buffered_text - "hello"
div!= hello
  • :unescaped_text - "hello"
div hello
  • :raw_text - "hello"
| Hello there
  • :raw_text - "Hello there"
= Hello there
  • :buffered_text - "Hello there"
- foo = bar
  • :statement - "foo = bar"
doctype html5
  • :doctype - "html5"
-# comment
  more comments
  • :line_comment - "comment"
  • :subindent - "more comments"
// comment
  more comments
  • :html_comment - "comment"
  • :subindent - "more comments"
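
To see these token types end to end, the positions can be stripped so only the {type, value} pairs remain. A sketch for the div(name="en") example above; the pairs shown are an inference from the lists in this section (the :indent and :element_name tokens come from the element line itself, and exact positions are omitted since they depend on the source text):

iex> tokens = Expug.Tokenizer.tokenize("div(name=\"en\")")
iex> for {_pos, type, value} <- Enum.reverse(tokens), do: {type, value}
[
  {:indent, 0},
  {:element_name, "div"},
  {:attribute_open, "("},
  {:attribute_key, "name"},
  {:attribute_value, "\"en\""},
  {:attribute_close, ")"}
]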

Also see

Expug.Compiler, which consumes this tokenizer’s output.

Summary

Functions

attribute(state)
Matches foo='val' or foo

attribute_list(state)
Matches foo='val' bar='val'

attribute_separator(state)
Matches an optional comma in between attributes

attributes_block(state)
Matches [name='foo' ...]

doctype(state)
Matches doctype html

document(state)
Matches an entire document

element(state)
Matches div.foo[id="name"]= Hello world

element_class(state)
Matches .foo

element_class_or_id(state)
Matches .foo or #id (just one)

element_class_or_id_list(state)
Matches .foo.bar#baz

element_descriptor(state)
Matches div, div.foo, div.foo.bar#baz, etc.

element_descriptor_full(state)
Matches div.foo.bar#baz

element_id(state)
Matches #id

element_name(state)
Matches title in title= hello

element_or_text(state)
Matches an HTML element, text node, or, you know… the basic statements. I don’t know what to call this

get_next_indent(state)
Returns the next indentation level after some newlines. Infers the last indentation level based on doc

get_next_indent(state, level)
Returns the next indentation level after some newlines

indent(state)
Matches an indentation. Gives a token that looks like {_, :indent, 2} where the last number is the number of spaces/tabs

newlines(state)
Matches any number of blank newlines. Whitespaces are accounted for

sole_raw_text(state)
Matches text

tokenize(source, opts \\ [])
Tokenizes a string. Returns a list of tokens. Each token is in the format {position, token, value}

whitespace(state)
Matches whitespace; no tokens emitted

whitespace_or_newline(state)
Matches whitespace or newline; no tokens emitted

Functions

attribute(state)

Matches foo='val' or foo

attribute_brace(state)
attribute_bracket(state)
attribute_equal(state)
attribute_key(state)
attribute_key_value(state)
attribute_list(state)

Matches foo='val' bar='val'

attribute_paren(state)
attribute_separator(state)

Matches an optional comma in between attributes.

div(id=a class=b)
div(id=a, class=b)
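
Put differently, the comma should only shift positions, not change the token stream. A small sketch under that assumption, comparing the two forms with positions stripped:

iex> a = for {_pos, type, value} <- Expug.Tokenizer.tokenize("div(id=a class=b)"), do: {type, value}
iex> b = for {_pos, type, value} <- Expug.Tokenizer.tokenize("div(id=a, class=b)"), do: {type, value}
iex> a == b
true
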
attribute_value(state)
attributes_block(state)

Matches [name='foo' ...]

block_text(state)
buffered_text(state)
doctype(state)

Matches doctype html.

document(state)

Matches an entire document.

element(state)

Matches div.foo[id="name"]= Hello world

element_class(state)

Matches .foo

element_class_or_id(state)

Matches .foo or #id (just one)

element_class_or_id_list(state)

Matches .foo.bar#baz

element_descriptor(state)

Matches div, div.foo, div.foo.bar#baz, etc.

element_descriptor_full(state)

Matches div.foo.bar#baz

element_id(state)

Matches #id

element_name(state)

Matches title in title= hello

element_or_text(state)

Matches an HTML element, text node, or, you know… the basic statements. I don’t know what to call this.

get_indent(list)
get_next_indent(state)

Returns the next indentation level after some newlines. Infers the last indentation level based on doc.

iex> source = "-#\n  span"
iex> doc = [{0, :indent, 0}]
iex> Expug.Tokenizer.get_next_indent(%{tokens: doc, source: source, position: 2})
2
get_next_indent(state, level)

Returns the next indentation level after some newlines.

iex> source = "-#\n  span"
iex> Expug.Tokenizer.get_next_indent(%{tokens: [], source: source, position: 2}, 0)
2

iex> source = "-#\n\n\n  span"
iex> Expug.Tokenizer.get_next_indent(%{tokens: [], source: source, position: 2}, 0)
2
html_comment(state)
indent(state)

Matches an indentation. Gives a token that looks like {_, :indent, 2} where the last number is the number of spaces/tabs.

Doesn’t really care if you use spaces or tabs; a tab is treated like a single space.
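
So a line indented with one tab should report the same level as a line indented with one space. A sketch based on that description (the resulting levels are an inference, not a documented doctest):

iex> tokens = Expug.Tokenizer.tokenize("div\n\tspan")
iex> for {_pos, :indent, level} <- Enum.reverse(tokens), do: level
[0, 1]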

line_comment(state)
multiline_buffered_text(state)
multiline_statement(state)
multiline_unescaped_text(state)
newlines(state)

Matches any number of blank newlines. Whitespaces are accounted for.

one_line_buffered_text(state)
one_line_statement(state)
one_line_unescaped_text(state)
optional_whitespace(state)
optional_whitespace_or_newline(state)
raw_text(state)
sole_buffered_text(state)

Matches =

sole_raw_text(state)

Matches text

sole_unescaped_text(state)

Matches !=

statement(state)
subindent(state, level)
subindent_block(state)
tokenize(source, opts \\ [])

Tokenizes a string. Returns a list of tokens. Each token is in the format {position, token, value}.
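
Because each token is a plain three-element tuple, callers can pattern match on it directly; a small sketch reusing the module’s own title= name example:

iex> [{{line, col}, type, value} | _rest] = Expug.Tokenizer.tokenize("title= name")
iex> {line, col, type, value}
{1, 8, :buffered_text, "name"}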

unescaped_text(state)
whitespace(state)

Matches whitespace; no tokens emitted

whitespace_or_newline(state)

Matches whitespace or newline; no tokens emitted