Penelope v0.1.0 Penelope.NLP.PennTreebankTokenizer

Implements the tokenization scheme used to create the Penn Treebank corpus. See ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenization.html.

Some alterations have been made to the original script to better handle common Unicode replacement characters.
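
To give a feel for the scheme, here is a minimal sketch of a few well-known Penn Treebank conventions: punctuation is split from adjacent words, and contractions such as "don't" and clitics such as "'s" become separate tokens. The `TreebankSketch` module is hypothetical and is not Penelope's implementation; it covers only a small subset of the rules and does not attempt the Unicode handling mentioned above.

```elixir
defmodule TreebankSketch do
  # Hypothetical illustration only; not part of Penelope.
  # Covers a handful of Penn Treebank conventions, not the full rule set.
  def tokenize(text) do
    text
    # Separate common punctuation marks from the words they touch.
    |> String.replace(~r/([,;:?!])/, " \\1 ")
    # Split off a period at the very end of the input.
    |> String.replace(~r/\.\s*$/, " .")
    # Split the n't contraction: "don't" becomes "do n't".
    |> String.replace(~r/(n't)\b/i, " \\1")
    # Split clitics such as 's, 're, 've, 'll, 'd and 'm.
    |> String.replace(~r/'(s|re|ve|ll|d|m)\b/i, " '\\1")
    # Whitespace now marks every token boundary.
    |> String.split()
  end
end

TreebankSketch.tokenize("They don't know, do they?")
# => ["They", "do", "n't", "know", ",", "do", "they", "?"]
```

The sketch rewrites the string in several passes and only splits on whitespace at the end, which is one common way to structure this kind of rule-based tokenizer.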

Summary

Functions

Callback implementation for Penelope.NLP.Tokenizer.tokenize/1.
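
For readers unfamiliar with Elixir behaviours, the following sketch shows what a callback implementation of a `tokenize/1` function can look like. Both modules are hypothetical stand-ins, and the callback's type signature (a string in, a list of strings out) is an assumption, since this page does not state the actual types declared by `Penelope.NLP.Tokenizer`.

```elixir
defmodule SketchTokenizer do
  # Hypothetical stand-in for a tokenizer behaviour.
  # The types here are assumed, not taken from Penelope.
  @callback tokenize(text :: String.t()) :: [String.t()]
end

defmodule WhitespaceTokenizer do
  # Hypothetical implementation that simply splits on whitespace.
  @behaviour SketchTokenizer

  @impl SketchTokenizer
  def tokenize(text), do: String.split(text)
end

# Any module implementing the behaviour can be swapped in at the call site.
tokenizer = WhitespaceTokenizer
tokenizer.tokenize("a simple example")
# => ["a", "simple", "example"]
```

Because callers depend only on the behaviour's contract, a different tokenizer module can be substituted without changing the calling code.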