Tokenizers.Model.WordPiece (Tokenizers v0.5.1)
Summary
Functions
empty/0 - Instantiate an empty WordPiece model.
from_file/2 - Instantiate a WordPiece model from the given vocab file.
init/2 - Instantiate a WordPiece model from the given vocab.
Types
@type options() :: [ unk_token: String.t(), max_input_chars_per_word: number(), continuing_subword_prefix: String.t() ]
Options for model initialisation.
:unk_token - the unknown token to be used by the model. Defaults to "[UNK]".
:max_input_chars_per_word - the maximum number of characters to allow in a single word. Defaults to 100.
:continuing_subword_prefix - the prefix to attach to subword units that do not start a word. Defaults to "##".
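To make the three options concrete, here is a minimal sketch of the greedy longest-match-first lookup commonly associated with WordPiece. It is an illustration under stated assumptions, not the library's implementation: the module name `WordPieceSketch` and its `tokenize/3` function are hypothetical, and the sketch counts graphemes where a real implementation may count characters differently.

```elixir
defmodule WordPieceSketch do
  # Illustrative only: a simplified greedy longest-match-first lookup.
  # This module is hypothetical and is not part of the Tokenizers library.
  @defaults [
    unk_token: "[UNK]",
    max_input_chars_per_word: 100,
    continuing_subword_prefix: "##"
  ]

  def tokenize(word, vocab, opts \\ []) do
    opts = Keyword.merge(@defaults, opts)
    chars = String.graphemes(word)

    if length(chars) > opts[:max_input_chars_per_word] do
      # Overlong words are not split; the whole word maps to the unknown token.
      [opts[:unk_token]]
    else
      split(chars, vocab, opts, true, [])
    end
  end

  # All characters consumed: return the pieces in their original order.
  defp split([], _vocab, _opts, _first?, acc), do: Enum.reverse(acc)

  defp split(chars, vocab, opts, first?, acc) do
    # Only pieces that do not start the word carry the continuation prefix.
    prefix = if first?, do: "", else: opts[:continuing_subword_prefix]

    # Try the longest candidate first, then shrink one character at a time.
    match =
      Enum.find(length(chars)..1//-1, fn n ->
        Map.has_key?(vocab, prefix <> Enum.join(Enum.take(chars, n)))
      end)

    case match do
      # No piece fits at this position: the whole word becomes unknown.
      nil ->
        [opts[:unk_token]]

      n ->
        piece = prefix <> Enum.join(Enum.take(chars, n))
        split(Enum.drop(chars, n), vocab, opts, false, [piece | acc])
    end
  end
end

vocab = %{"[UNK]" => 0, "un" => 1, "##aff" => 2, "##able" => 3}
IO.inspect(WordPieceSketch.tokenize("unaffable", vocab))
# prints ["un", "##aff", "##able"]
```

With this toy vocabulary, a word with no matching pieces, or a word longer than :max_input_chars_per_word, yields a single unknown token.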
Functions
@spec empty() :: {:ok, Tokenizers.Model.t()}
Instantiate an empty WordPiece model.
@spec from_file(String.t(), options()) :: {:ok, Tokenizers.Model.t()}
Instantiate a WordPiece model from the given vocab file.
@spec init(%{required(String.t()) => integer()}, options()) :: {:ok, Tokenizers.Model.t()}
Instantiate a WordPiece model from the given vocab.
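As a usage sketch, the constructors above might be called as follows. The vocabulary contents and the file path are hypothetical, and the script assumes network access so that `Mix.install` can fetch the package; the return shapes follow the specs listed above.

```elixir
# Assumption: run as a standalone script with network access.
Mix.install([{:tokenizers, "~> 0.5.1"}])

# Hypothetical toy vocabulary mapping each token to an integer id.
vocab = %{"[UNK]" => 0, "un" => 1, "##aff" => 2, "##able" => 3}

# Build a model from an in-memory vocab, spelling out the documented defaults.
{:ok, model} =
  Tokenizers.Model.WordPiece.init(vocab,
    unk_token: "[UNK]",
    max_input_chars_per_word: 100,
    continuing_subword_prefix: "##"
  )

# Build a model with no vocabulary at all.
{:ok, empty_model} = Tokenizers.Model.WordPiece.empty()

# Loading from disk takes a path plus the same options. The path below is
# hypothetical, so the call is left commented out:
# {:ok, file_model} =
#   Tokenizers.Model.WordPiece.from_file("path/to/vocab.txt", unk_token: "[UNK]")
```

Passing the options explicitly, even when they equal the defaults, keeps the model's configuration visible next to the vocabulary it was built from.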