Penelope v0.2.3 Penelope.NLP.Tokenize.Tokenizer behaviour

The behaviour implemented by all tokenizers.

Summary

Callbacks

detokenize(list)

Reverse the tokenization process, turning a list of tokens into a single string. Tokenization is often lossy, so detokenization is not guaranteed to return a string identical to the tokenizer's original input.

tokenize(arg0)

Separate a string into a list of tokens.

Callbacks

detokenize(list)
detokenize([String.t()]) :: String.t()

Reverse the tokenization process, turning a list of tokens into a single string. Tokenization is often lossy, so detokenization is not guaranteed to return a string identical to the tokenizer's original input.
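The lossiness described above can be illustrated with a small sketch. It does not use any Penelope tokenizer; it assumes a naive whitespace scheme built from the standard library's `String.split/1` and `Enum.join/2`, purely to show why a round trip may not reproduce the input.

```elixir
# Assumption: whitespace splitting stands in for a real tokenizer here.
original = "Hello,   world!"

# String.split/1 splits on runs of whitespace, so the run of three spaces
# leaves no trace in the token list.
tokens = String.split(original)
# tokens == ["Hello,", "world!"]

# Rejoining with a single space is the best this scheme can do.
rebuilt = Enum.join(tokens, " ")
# rebuilt == "Hello, world!"

# The round trip loses the original spacing.
IO.inspect(rebuilt == original)
# prints false
```

The token list carries no record of how many spaces separated the words, so no detokenizer for this scheme could recover them.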

tokenize(arg0)
tokenize(String.t()) :: [String.t()]

Separate a string into a list of tokens.
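As a rough illustration of how a module might adopt this behaviour, the sketch below implements both callbacks with simple whitespace splitting. The module name `MyApp.WhitespaceTokenizer` is hypothetical and not part of Penelope, and the example assumes Penelope is available as a dependency so that the `@behaviour` reference resolves.

```elixir
defmodule MyApp.WhitespaceTokenizer do
  @moduledoc """
  Hypothetical example module (not part of Penelope): a tokenizer that
  splits on whitespace and rejoins tokens with single spaces.
  """

  # Assumes the Penelope library is compiled and available.
  @behaviour Penelope.NLP.Tokenize.Tokenizer

  # Matches the callback spec tokenize(String.t()) :: [String.t()].
  @impl true
  def tokenize(text) when is_binary(text) do
    String.split(text)
  end

  # Matches the callback spec detokenize([String.t()]) :: String.t().
  @impl true
  def detokenize(tokens) when is_list(tokens) do
    Enum.join(tokens, " ")
  end
end

MyApp.WhitespaceTokenizer.tokenize("to be or not")
# returns ["to", "be", "or", "not"]

MyApp.WhitespaceTokenizer.detokenize(["to", "be", "or", "not"])
# returns "to be or not"
```

Marking each function with `@impl true` lets the compiler check that the function names and arities match the callbacks the behaviour declares.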