Char-by-char lexer for the ~NFT sigil and .nft files.
Mirrors the architecture of Phoenix.LiveView.TagEngine.Tokenizer
and of nft's own src/scanner.l: an explicit stack of
start conditions (lex states) lets context-sensitive
constructs add a new state without disturbing the rest of the
lexer.
The conditions in play:
- :default - top-level lexing of keywords, identifiers, literals, operators, punctuation, statement separators.
- :line_comment - # to end of line.
- :block_comment - /* ... */; supports nesting (nft itself doesn't, but supporting nesting costs ~5 lines and prevents a real footgun on hand-edited files).
- :string - "..." with \\, \", \n, \t, \r, and \0 escapes. (String-internal Elixir interpolation is not yet supported; when added, it will push :elixir_expr from :string, with no other change required.)
- :elixir_expr - only enterable when the :interpolation? option is true. Scans an Elixir expression up to the matching }, skipping } characters that appear inside strings, charlists, or comments within the expression.
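The stack discipline can be illustrated with a minimal sketch. It is not the library's implementation: the module name StartConditionSketch and the function strip_comments/1 are hypothetical, only three of the conditions are modelled, and the "tokens" are simply the characters that survive comment stripping. It assumes valid UTF-8 input.

```elixir
defmodule StartConditionSketch do
  # Hypothetical, simplified model of a lexer driven by a stack of start
  # conditions. Only :default, :line_comment and :block_comment exist here.

  def strip_comments(source) when is_binary(source) do
    step(source, [:default], [])
  end

  # End of input: an open block comment is an error, anything else is fine.
  defp step(<<>>, [:block_comment | _], _acc), do: {:error, :unterminated_block_comment}
  defp step(<<>>, _stack, acc), do: {:ok, acc |> Enum.reverse() |> IO.iodata_to_binary()}

  # "/*" pushes :block_comment, both from :default and from inside another
  # block comment. Pushing again is all that nesting requires.
  defp step(<<"/*", rest::binary>>, [top | _] = stack, acc)
       when top in [:default, :block_comment],
       do: step(rest, [:block_comment | stack], acc)

  # "*/" pops exactly one nesting level.
  defp step(<<"*/", rest::binary>>, [:block_comment | stack], acc), do: step(rest, stack, acc)

  # Any other character inside a block comment is skipped.
  defp step(<<_::utf8, rest::binary>>, [:block_comment | _] = stack, acc),
    do: step(rest, stack, acc)

  # "#" pushes :line_comment from :default.
  defp step(<<"#", rest::binary>>, [:default | _] = stack, acc),
    do: step(rest, [:line_comment | stack], acc)

  # A newline pops :line_comment; the newline itself is left for the
  # enclosing condition to handle.
  defp step(<<"\n", _::binary>> = input, [:line_comment | stack], acc),
    do: step(input, stack, acc)

  defp step(<<_::utf8, rest::binary>>, [:line_comment | _] = stack, acc),
    do: step(rest, stack, acc)

  # :default keeps the character.
  defp step(<<char::utf8, rest::binary>>, [:default | _] = stack, acc),
    do: step(rest, stack, [<<char::utf8>> | acc])
end
```

Each condition only ever inspects the top of the stack, so adding a new condition in this sketch means adding clauses for the new atom without editing the existing ones.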
Token shape
Each token is a 2- or 3-tuple:
    {:kind, meta}          # punctuation with no payload
    {:kind, value, meta}   # everything else
where meta is %{line: pos_integer(), column: pos_integer()},
pointing at the start of the token.
Identifiers are emitted as {:identifier, "name", meta} — the
parser decides which names are keywords. (Pattern-matching on
binaries is ergonomic in Elixir; this avoids a 200-entry
keyword table here.)
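As a rough illustration of the parser-side keyword decision, the sketch below pattern-matches on the identifier's binary. The module KeywordSketch, the function classify/1, the output tuple shapes, and the four names shown are assumptions chosen for the example, not the library's actual keyword set.

```elixir
defmodule KeywordSketch do
  # Hypothetical parser-side helper: the tokenizer only ever emits
  # {:identifier, name, meta}; the parser decides what the name means.

  def classify({:identifier, "table", meta}), do: {:keyword, :table, meta}
  def classify({:identifier, "chain", meta}), do: {:keyword, :chain, meta}
  def classify({:identifier, "accept", meta}), do: {:verdict, :accept, meta}
  def classify({:identifier, "drop", meta}), do: {:verdict, :drop, meta}

  # Anything else stays a user-chosen name (table name, chain name, ...).
  def classify({:identifier, name, meta}) when is_binary(name), do: {:name, name, meta}
end
```

With this shape, supporting a new keyword is one extra function clause in the parser and no change to the tokenizer.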
Statement separators
In nft syntax, statements inside a { ... } body are separated
by either ; or a newline. To keep parsing simple, the
tokenizer emits a single :stmt_sep token for every ; and
for every (possibly multi-line) run of newlines, collapsing
consecutive separators into one. Newlines that appear inside
brackets are still emitted — the parser ignores spurious
separators in positions where they're not meaningful.
Line continuations (a backslash immediately followed by a newline, \\\n) are consumed silently and produce no separator.
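The collapsing rule can be sketched as follows. This is a simplified stand-in, not the real tokenizer: the module StmtSepSketch is hypothetical, every other run of characters becomes a {:word, text} token, bracket tracking is left out, and the line/column metadata that real tokens carry is omitted, so the separator appears as a bare :stmt_sep atom.

```elixir
defmodule StmtSepSketch do
  # Hypothetical simplification of separator handling.

  def scan(source) when is_binary(source) do
    source |> do_scan([]) |> Enum.reverse()
  end

  defp do_scan(<<>>, acc), do: acc

  # Line continuation: backslash + newline is dropped, no separator emitted.
  defp do_scan(<<"\\\n", rest::binary>>, acc), do: do_scan(rest, acc)

  # ";" or newline: emit :stmt_sep unless the previous token already is one.
  defp do_scan(<<c, rest::binary>>, acc) when c in [?;, ?\n] do
    case acc do
      [:stmt_sep | _] -> do_scan(rest, acc)
      _ -> do_scan(rest, [:stmt_sep | acc])
    end
  end

  # Horizontal whitespace never produces a token.
  defp do_scan(<<c, rest::binary>>, acc) when c in [?\s, ?\t, ?\r], do: do_scan(rest, acc)

  # Anything else starts a word; the first byte is always consumed so the
  # scan is guaranteed to make progress.
  defp do_scan(<<c, rest::binary>>, acc) do
    {word, rest} = take_word(rest, [c])
    do_scan(rest, [{:word, word} | acc])
  end

  defp take_word(<<c, rest::binary>>, acc) when c not in [?;, ?\n, ?\s, ?\t, ?\r, ?\\],
    do: take_word(rest, [c | acc])

  defp take_word(rest, acc), do: {acc |> Enum.reverse() |> IO.iodata_to_binary(), rest}
end
```

Because whitespace produces no token, a ; followed by blank lines and another ; still sees :stmt_sep at the head of the accumulator and collapses to a single separator.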
Numeric / address literals
Network primitives need a small lookahead to disambiguate:
- 0x... / 0X... - hex integer.
- 0b... / 0B... - binary integer.
- \d+ followed by no . or : or / - plain decimal integer.
- \d+\.\d+\.\d+\.\d+ - IPv4 literal (optional /N CIDR).
- IPv6: any run starting with hex chars that contains : and whose contents are valid IPv6 syntax.
- MAC: six 2-char hex octets joined by :.
Identifiers that happen to begin with hex letters (e.g. eth0
or even fe80) are still tagged as identifiers when not
followed by :. If the identifier is all-hex and followed by
: plus a hex char, the lexer rewinds and re-scans as an
IPv6/MAC literal.
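The decision order can be approximated with a classifier over one already-isolated lexeme. This is only a sketch under assumptions: the real lexer works char-by-char with lookahead and rewinding, whereas the hypothetical LiteralSketch.classify/1 below receives the whole lexeme up front, omits metadata and IPv6 CIDR, and delegates IPv6 validation to Erlang's :inet.parse_ipv6strict_address/1.

```elixir
defmodule LiteralSketch do
  # Hypothetical classifier; returns {kind, value} without metadata.

  def classify(lexeme) when is_binary(lexeme) do
    cond do
      lexeme =~ ~r/\A0[xX][0-9a-fA-F]+\z/ ->
        <<_prefix::binary-size(2), digits::binary>> = lexeme
        {:integer, String.to_integer(digits, 16)}

      lexeme =~ ~r/\A0[bB][01]+\z/ ->
        <<_prefix::binary-size(2), digits::binary>> = lexeme
        {:integer, String.to_integer(digits, 2)}

      lexeme =~ ~r/\A\d+\z/ ->
        {:integer, String.to_integer(lexeme)}

      lexeme =~ ~r/\A\d+\.\d+\.\d+\.\d+\/\d+\z/ ->
        {:cidr_v4, lexeme}

      lexeme =~ ~r/\A\d+\.\d+\.\d+\.\d+\z/ ->
        {:ipv4, lexeme}

      lexeme =~ ~r/\A[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}\z/ ->
        {:mac, lexeme}

      # Only lexemes containing ":" are candidates for IPv6, so an all-hex
      # name such as "fe80" falls through to :identifier.
      String.contains?(lexeme, ":") and ipv6?(lexeme) ->
        {:ipv6, lexeme}

      true ->
        {:identifier, lexeme}
    end
  end

  defp ipv6?(text) do
    match?({:ok, _}, :inet.parse_ipv6strict_address(String.to_charlist(text)))
  end
end
```

Like the patterns listed above, the IPv4 branch checks shape only and does not range-check octets.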
Errors
Anything the tokenizer can't classify raises a
Linx.NFT.ParseError with {file, line, column} and the
offending source line. The caller (sigil macro, parse/1,
parse_file/1) catches and either re-raises (compile-time) or
returns {:error, %ParseError{}}.
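The raise-then-convert pattern might look roughly like the sketch below. SketchParseError is a stand-in for Linx.NFT.ParseError, whose exact fields are not documented here, so the field names are assumptions; ErrorBoundarySketch and its stub tokenizer (which rejects any "@" and otherwise splits on whitespace) are likewise hypothetical. Columns in this sketch are byte-based.

```elixir
defmodule SketchParseError do
  # Stand-in exception; field names are assumptions for illustration.
  defexception [:file, :line, :column, :source_line, :description]

  @impl true
  def message(%__MODULE__{} = e) do
    "#{e.file}:#{e.line}:#{e.column}: #{e.description}\n    #{e.source_line}"
  end
end

defmodule ErrorBoundarySketch do
  # Hypothetical tokenizer stub: raises on the first "@" it finds.
  def tokenize!(source, file) do
    source
    |> String.split("\n")
    |> Enum.with_index(1)
    |> Enum.each(fn {text, line} ->
      case :binary.match(text, "@") do
        {pos, _len} ->
          raise SketchParseError,
            file: file,
            line: line,
            column: pos + 1,
            source_line: text,
            description: "unexpected character @"

        :nomatch ->
          :ok
      end
    end)

    String.split(source)
  end

  # Runtime entry point: the exception becomes a tagged tuple. A
  # compile-time caller would let the exception propagate instead.
  def parse(source, file \\ "nofile") do
    {:ok, tokenize!(source, file)}
  rescue
    e in SketchParseError -> {:error, e}
  end
end
```

Keeping the raise inside the tokenizer and the conversion at the boundary lets the lexing code bail out from any depth without threading error tuples through every step.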
Extensibility
The architecture is designed for incremental extension: the supported grammar is the common ~85% subset, and the long tail of nft constructs (synproxy, secmark, osf, fib, jhash, advanced ct, dup/fwd, tproxy, xfrm, tunnel) will be added per construct over time. Each addition consists of:
- (Optional) a new start condition pushed from somewhere in :default - add a clause and a step function.
- (Optional) a new token kind - extend the @type token union and the parser's pattern matches.
The stack discipline means none of these touch existing conditions.
Types
    @type token() ::
            {:identifier, String.t(), token_meta()}
            | {:integer, integer(), token_meta()}
            | {:string, String.t(), token_meta()}
            | {:ipv4, String.t(), token_meta()}
            | {:ipv6, String.t(), token_meta()}
            | {:mac, String.t(), token_meta()}
            | {:cidr_v4, String.t(), token_meta()}
            | {:cidr_v6, String.t(), token_meta()}
            | {:elixir_expr, String.t(), token_meta()}
            | {:stmt_sep, token_meta()}
            | {atom(), token_meta()}
    @type token_meta() :: %{line: pos_integer(), column: pos_integer()}
Functions
    @spec tokenize(String.t(), keyword()) :: {:ok, [token()]} | {:error, Linx.NFT.ParseError.t()}
Tokenizes source into a flat list of tokens.
Options
- :file - source filename for error messages (default "nofile").
- :line - starting line number (default 1); useful when called from a ~NFT sigil with __CALLER__.line to make error locations line up with the surrounding .ex source.
- :column - starting column number (default 1).
- :interpolation? - whether to recognize #{...} Elixir interpolation (default false). The sigil sets this to true; parse/1 / parse_file/1 leave it false.
Returns {:ok, tokens} or {:error, %Linx.NFT.ParseError{}}.
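When :interpolation? is true, the lexer has to find the } that closes #{...} without being fooled by braces inside the embedded expression. A minimal sketch of that scan, assuming the input begins just after the opening #{, is shown here. The module ElixirExprSketch and take_expr/1 are hypothetical, and the sketch deliberately ignores heredocs, sigils, ?} character literals, and interpolation nested inside strings.

```elixir
defmodule ElixirExprSketch do
  # Hypothetical brace matcher. Returns {:ok, expr, rest} where `rest` is
  # the input following the closing "}".

  def take_expr(input) when is_binary(input), do: code(input, 0, [])

  # In code: track brace depth; a "}" at depth 0 ends the expression.
  defp code(<<"}", rest::binary>>, 0, acc), do: {:ok, finish(acc), rest}
  defp code(<<"}", rest::binary>>, depth, acc), do: code(rest, depth - 1, ["}" | acc])
  defp code(<<"{", rest::binary>>, depth, acc), do: code(rest, depth + 1, ["{" | acc])

  defp code(<<q, rest::binary>>, depth, acc) when q in [?", ?'],
    do: quoted(rest, q, depth, [q | acc])

  defp code(<<"#", rest::binary>>, depth, acc), do: comment(rest, depth, ["#" | acc])
  defp code(<<c, rest::binary>>, depth, acc), do: code(rest, depth, [c | acc])
  defp code(<<>>, _depth, _acc), do: {:error, :unterminated_interpolation}

  # In a string or charlist: a backslash escapes the next byte, and only
  # the matching quote returns to code. Braces here are plain text.
  defp quoted(<<"\\", c, rest::binary>>, q, depth, acc),
    do: quoted(rest, q, depth, [c, "\\" | acc])

  defp quoted(<<c, rest::binary>>, q, depth, acc) when c == q,
    do: code(rest, depth, [c | acc])

  defp quoted(<<c, rest::binary>>, q, depth, acc), do: quoted(rest, q, depth, [c | acc])
  defp quoted(<<>>, _q, _depth, _acc), do: {:error, :unterminated_string}

  # In a comment: everything up to the newline is skipped over, braces included.
  defp comment(<<"\n", rest::binary>>, depth, acc), do: code(rest, depth, ["\n" | acc])
  defp comment(<<c, rest::binary>>, depth, acc), do: comment(rest, depth, [c | acc])
  defp comment(<<>>, _depth, _acc), do: {:error, :unterminated_interpolation}

  defp finish(acc), do: acc |> Enum.reverse() |> IO.iodata_to_binary()
end
```

The captured text is returned verbatim, which matches the {:elixir_expr, String.t(), token_meta()} token shape in carrying the expression as a string for later parsing.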