Penelope v0.1.1 Penelope.NLP.PennTreebankTokenizer
The tokenization scheme used for the creation of the Penn Treebank corpus. See ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenization.html.
Some alterations have been made to the original script to better handle common Unicode replacement characters.
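To give a feel for what Penn Treebank-style tokenization does, here is a minimal sketch of a few of its best-known rules: separating punctuation, splitting a sentence-final period, and splitting contractions and clitics such as "n't" and "'s". The module name `PtbSketch` is hypothetical and the rule set is a deliberately small assumption. It is not Penelope's implementation and does not cover the Unicode handling mentioned above.

```elixir
defmodule PtbSketch do
  # Hypothetical illustration module; NOT part of Penelope.
  # Implements only a small, assumed subset of Penn Treebank rules.
  def tokenize(text) do
    text
    # Separate common punctuation and brackets from adjacent words.
    |> String.replace(~r/([,;:?!()])/, " \\1 ")
    # Split off a period at the very end of the input.
    |> String.replace(~r/\.(\s*)$/, " .\\1")
    # Split "n't" from its host word: "don't" becomes "do n't".
    |> String.replace(~r/(\w)(n't)\b/i, "\\1 \\2")
    # Split common clitics: "John's" becomes "John 's".
    |> String.replace(~r/(\w)('(?:s|m|d|re|ve|ll))\b/i, "\\1 \\2")
    # Tokens are now whitespace-delimited.
    |> String.split()
  end
end

IO.inspect(PtbSketch.tokenize("They don't like it, do they?"))
# => ["They", "do", "n't", "like", "it", ",", "do", "they", "?"]
```

The real tokenizer is invoked as `Penelope.NLP.PennTreebankTokenizer.tokenize(text)`; its exact output may differ from this sketch.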
Summary
Functions
Callback implementation for Penelope.NLP.Tokenizer.tokenize/1
Functions
tokenize(text)
Callback implementation for Penelope.NLP.Tokenizer.tokenize/1.