mix snowball.gen (snowball v0.1.1)

Generate Elixir stemmer modules from Snowball .sbl algorithm sources.

Reads .sbl files from the algorithms directory (all of them by default, or only the algorithms named on the command line), runs each one through the Snowball compiler pipeline (Lexer -> Analyser -> Generator), and writes the resulting Elixir source to the output directory.

Usage

mix snowball.gen                   # generate all algorithms
mix snowball.gen english french    # generate specific algorithms

Options

  • --module-prefix is the Elixir module prefix to use for generated stemmer modules. Defaults to Snowball.Stemmers. The full module name is the prefix joined with the PascalCase algorithm suffix (for example, the prefix Text.Stemmer.Stemmers and the algorithm dutch_porter give Text.Stemmer.Stemmers.DutchPorter).

  • --output-dir is the directory into which generated .ex files are written. Defaults to lib/snowball/stemmers.

  • --algorithms-dir is the directory from which .sbl source files are read. Defaults to src/algorithms.
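
As a rough illustration of how the three options and their documented defaults could fit together, here is a minimal sketch using Elixir's standard OptionParser. The module GenOptionsSketch and its parse/1 function are hypothetical and are not part of the snowball package; the task's real argument handling may differ.

```elixir
defmodule GenOptionsSketch do
  # Hypothetical helper for illustration only; not the task's actual code.

  # Switches mirror the documented options. OptionParser maps
  # --module-prefix on the command line to the :module_prefix key.
  @switches [module_prefix: :string, output_dir: :string, algorithms_dir: :string]

  # Defaults as stated in the Options list above.
  @defaults [
    module_prefix: "Snowball.Stemmers",
    output_dir: "lib/snowball/stemmers",
    algorithms_dir: "src/algorithms"
  ]

  # Returns {options, algorithm_names}. Positional arguments such as
  # "english" or "french" are treated as algorithm names.
  def parse(argv) do
    {opts, algorithms, _invalid} = OptionParser.parse(argv, strict: @switches)
    {Keyword.merge(@defaults, opts), algorithms}
  end
end
```

Under these assumptions, GenOptionsSketch.parse(["english", "--output-dir", "tmp/stemmers"]) yields the algorithm list ["english"], with :output_dir overridden and the other two options left at their defaults.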

Language name mapping

The file stem (e.g. dutch_porter) becomes both the Elixir module suffix in PascalCase (DutchPorter) and the language atom passed to the generator (:dutch_porter).
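
The mapping described above can be sketched with standard library functions. NameMappingSketch and map/2 are hypothetical names introduced for illustration, and the use of Macro.camelize/1 for the PascalCase conversion is an assumption; the generator may derive the names differently.

```elixir
defmodule NameMappingSketch do
  # Hypothetical helper for illustration only; not part of the snowball package.

  # Derives the module name and language atom from an .sbl file path.
  def map(sbl_path, module_prefix \\ "Snowball.Stemmers") do
    # "src/algorithms/dutch_porter.sbl" -> "dutch_porter"
    stem = Path.basename(sbl_path, ".sbl")

    # "dutch_porter" -> "DutchPorter"
    suffix = Macro.camelize(stem)

    %{
      # Prefix joined with the PascalCase suffix.
      module: Module.concat(module_prefix, suffix),
      # Language atom passed to the generator, e.g. :dutch_porter.
      language: String.to_atom(stem)
    }
  end
end
```

With the default prefix, NameMappingSketch.map("src/algorithms/dutch_porter.sbl") gives a map whose :module is Snowball.Stemmers.DutchPorter and whose :language is :dutch_porter.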