ExSQL.Tokenizer (exsql v0.1.4)

Copy Markdown

Lexical analysis for SQL text.

Mirrors SQLite's tokenize.c, which classifies bytes through a 256-entry lookup table and resolves keywords with a generated perfect hash. In Elixir the same job is done with binary pattern matching: each clause of scan/3 corresponds to a character class, and keyword recognition is a MapSet lookup on the downcased identifier.

Tokens are {type, value, line} tuples:

  • {:keyword, :select, 1} — case-insensitive SQL keywords
  • {:id, "users", 1} — identifiers (bare, "quoted", [bracketed], or backticked)
  • {:int, 42, 1} / {:float, 3.14, 1} — numeric literals
  • {:string, "abc", 1} — single-quoted string literals ('' escapes a quote)
  • {:blob, <<0xAB>>, 1}x'AB' hex blob literals
  • punctuation and operators such as {:comma, ",", 1}, {:le, "<=", 1}

Summary

Functions

Tokenizes sql, returning {:ok, tokens} or {:error, reason}.

Types

token()

@type token() :: {atom(), term(), pos_integer()}

Functions

tokenize(sql)

@spec tokenize(String.t()) :: {:ok, [token()]} | {:error, term()}

Tokenizes sql, returning {:ok, tokens} or {:error, reason}.

The token list does not include trailing end-of-input markers; the parser treats the empty list as end of input.