tflite_beam_wordpiece_tokenizer (tflite_beam v0.3.4)
Summary
Functions
Tokenizes a piece of text into its word pieces.
Functions
-spec tokenize(binary(), map()) -> [binary()].
Tokenizes a piece of text into its word pieces.
Uses a greedy longest-match-first algorithm to tokenize the text against the given vocabulary.
For example:
Input = "unaffable".
Output = ["un", "##aff", "##able"].
Input = "unaffableX".
Output = ["[UNK]"].
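The greedy longest-match-first behavior shown in the examples can be sketched as follows. This is a minimal Python illustration of the general WordPiece algorithm, not the module's Erlang implementation; the function name, the "##" continuation prefix, and the "[UNK]" token follow the BERT convention used in the examples above and are assumptions here.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenization of a single word.

    At each position, try the longest remaining substring first and shrink
    it until a vocabulary entry matches. Pieces after the first carry the
    "##" continuation prefix. If no prefix of the remaining text matches,
    the whole word becomes the unknown token.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation piece
            if sub in vocab:
                match = sub
                break
            end -= 1  # shrink and retry: longest match first
        if match is None:
            return [unk]  # any unmatchable remainder makes the word unknown
        pieces.append(match)
        start = end
    return pieces


vocab = {"un", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", vocab))   # ['un', '##aff', '##able']
print(wordpiece_tokenize("unaffableX", vocab))  # ['[UNK]']
```

Note that a single unmatched trailing character (the "X" in the second example) discards the pieces already found and yields only the unknown token, which matches the example output above.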
Related link: https://github.com/tensorflow/examples/blob/master/lite/examples/bert_qa/ios/BertQACore/Models/Tokenizers/WordpieceTokenizer.swift