Snowball.Preprocessor (snowball v0.1.1)


Source-level preprocessor for Snowball .sbl files.

Handles the two Snowball directives that must be resolved before tokenisation:

  • stringescapes LB RB declares LB and RB as the escape delimiter characters. Inside string literals, LB name RB is then a substitution reference; for example, stringescapes {} makes {name} the substitution syntax.

  • stringdef name 'value' defines a named string alias. The value may contain {U+XXXX} Unicode escapes.
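The directive handling can be pictured with a minimal regex-based sketch. It is illustrative only and is not the code of Snowball.Preprocessor: the module DirectiveSketch and its collect/1 function are hypothetical, and the sketch assumes each directive sits on its own line and that stringdef values are single-quoted with no embedded quotes.

```elixir
defmodule DirectiveSketch do
  # Hypothetical helper for illustration only; not the code of
  # Snowball.Preprocessor. Assumes each directive sits on its own line
  # and stringdef values are single-quoted with no embedded quotes.

  # Returns {delimiters, defs, remaining_source}.
  def collect(source) do
    escapes_re = ~r/^[ \t]*stringescapes[ \t]+(\S)(\S)[ \t]*$/m
    stringdef_re = ~r/^[ \t]*stringdef[ \t]+(\S+)[ \t]+'([^']*)'[ \t]*$/m

    # The declared delimiter pair, e.g. {"{", "}"}, or nil if absent.
    delimiters =
      case Regex.run(escapes_re, source) do
        [_, lb, rb] -> {lb, rb}
        nil -> nil
      end

    # Map of alias name => raw (still unexpanded) value.
    defs =
      stringdef_re
      |> Regex.scan(source)
      |> Map.new(fn [_, name, value] -> {name, value} end)

    # Strip both kinds of declaration; the line breaks are left in place.
    remaining =
      source
      |> String.replace(escapes_re, "")
      |> String.replace(stringdef_re, "")

    {delimiters, defs, remaining}
  end
end
```

Leaving the line breaks in place keeps line numbers in the remaining source unchanged, which may help error reporting; whether the real preprocessor does the same is not stated here.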

In the source returned after preprocessing:

  • All stringescapes and stringdef declarations are removed.

  • Inside every remaining string literal, each {U+XXXX} escape is replaced by the corresponding character encoded as UTF-8, and each {name} reference is replaced by the value of the matching stringdef.

The resulting source can be fed directly to Snowball.Lexer.tokenize/1.
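The expansion step described above might look roughly like the following sketch. It is a hypothetical illustration, not the library's implementation: ExpandSketch, expand_literal/2 and expand_source/2 are invented names, the sketch hard-codes {} delimiters, treats every '...' span as a literal (including any inside comments), and leaves unknown {name} references untouched, whereas the real preprocessor may instead report an error.

```elixir
defmodule ExpandSketch do
  # Hypothetical sketch, not Snowball.Preprocessor's code. Assumes {}
  # delimiters and single-quoted literals without embedded quotes.

  # Expands {U+XXXX} and {name} inside one literal body.
  def expand_literal(body, defs) do
    Regex.replace(~r/\{([^}]*)\}/, body, fn whole, inner ->
      case inner do
        # Hex code point -> the character, encoded as UTF-8.
        "U+" <> hex -> <<String.to_integer(hex, 16)::utf8>>
        # Alias reference -> its value; unknown names are kept as-is.
        name -> Map.get(defs, name, whole)
      end
    end)
  end

  # Applies expand_literal/2 to the body of every '...' literal.
  def expand_source(source, defs) do
    Regex.replace(~r/'([^']*)'/, source, fn _whole, body ->
      "'" <> expand_literal(body, defs) <> "'"
    end)
  end
end
```

Because stringdef values may themselves contain {U+XXXX} escapes, a caller of this sketch would first expand each raw value, for example with expand_literal(value, %{}), before passing the resulting map to expand_source/2.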

Functions

preprocess(source)

@spec preprocess(binary()) :: binary()

Preprocess a Snowball source binary.

Arguments

  • source is the raw UTF-8 source text.

Returns

  • The preprocessed source binary with all stringdef / stringescapes declarations removed and all string-escape sequences expanded.