tflite_beam_wordpiece_tokenizer (tflite_beam v0.3.8)

Runs WordPiece tokenization.

Functions

tokenize(BinaryText, VocabularyID)

-spec tokenize(binary(), map()) -> [binary()].

Tokenizes a piece of text into its word pieces.

This uses a greedy longest-match-first algorithm to perform tokenization using the given vocabulary.

For example:

  Input = "unaffable".
  Output = ["una", "##ffa", "##ble"].
  Input = "unaffableX".
  Output = ["[UNK]"].
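
The greedy longest-match-first behavior shown above can be sketched as follows. This is an illustrative Python sketch, not the library's implementation; the vocabulary set and the "##" continuation prefix follow BERT WordPiece conventions.

```python
def wordpiece_tokenize(text, vocab, unk_token="[UNK]"):
    # Greedily match the longest vocabulary entry at each position.
    tokens = []
    start = 0
    while start < len(text):
        end = len(text)
        piece = None
        # Shrink the candidate substring until it appears in the vocabulary.
        while start < end:
            sub = text[start:end]
            if start > 0:
                sub = "##" + sub  # non-initial pieces carry the ## prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            # Any span with no vocabulary match makes the whole word unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"una", "##ffa", "##ble"}
print(wordpiece_tokenize("unaffable", vocab))   # ['una', '##ffa', '##ble']
print(wordpiece_tokenize("unaffableX", vocab))  # ['[UNK]']
```

Note how the trailing "X" in the second call cannot be matched, so the entire word collapses to the unknown token rather than a partial piece list.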

Related link: https://github.com/tensorflow/examples/blob/master/lite/examples/bert_qa/ios/BertQACore/Models/Tokenizers/WordpieceTokenizer.swift