Tokenizers.Model.BPE (Tokenizers v0.5.1)
Summary
Functions
Instantiate an empty BPE model.
Instantiate a BPE model from the given vocab and merges files.
Instantiate a BPE model from the given vocab and merges.
Types
@type options() :: [ cache_capacity: number(), dropout: float(), unk_token: String.t(), continuing_subword_prefix: String.t(), end_of_word_suffix: String.t(), fuse_unk: boolean(), byte_fallback: boolean() ]
Options for model initialisation.
:byte_fallback - whether to use the byte fallback trick
:cache_capacity - the number of words that the BPE cache can contain. The cache speeds up tokenization by keeping the results of the merge operations for a number of words. Defaults to 10_000
:dropout - the BPE dropout to use. Must be a float between 0 and 1
:unk_token - the unknown token to be used by the model
:continuing_subword_prefix - the prefix to attach to subword units that don't represent the beginning of a word
:end_of_word_suffix - the suffix to attach to subword units that represent the end of a word
Functions
@spec empty() :: {:ok, Tokenizers.Model.t()}
Instantiate an empty BPE model.
@spec from_file(String.t(), String.t(), options()) :: {:ok, Tokenizers.Model.t()}
Instantiate a BPE model from the given vocab and merges files.
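As a minimal sketch of loading from files, the snippet below writes two tiny scratch files and passes their paths to from_file/3. The file names are hypothetical. The file formats are assumptions not stated on this page: the vocab file is taken to be a JSON object mapping tokens to ids, and the merges file to hold one space-separated pair per line.

```elixir
# Hypothetical scratch files; a real project would point at existing files.
dir = System.tmp_dir!()
vocab_path = Path.join(dir, "bpe_example_vocab.json")
merges_path = Path.join(dir, "bpe_example_merges.txt")

# Assumed formats: JSON object (token => id) and one "left right" pair per line.
File.write!(vocab_path, ~s({"<unk>": 0, "a": 1, "b": 2, "ab": 3}))
File.write!(merges_path, "a b\n")

# Options are passed as a keyword list, as described by options().
{:ok, model} =
  Tokenizers.Model.BPE.from_file(vocab_path, merges_path, unk_token: "<unk>")
```

Every token named in the merges file also appears in the vocab file here, including the merged result "ab", which keeps the two files consistent with each other.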
@spec init( %{required(String.t()) => integer()}, [{String.t(), String.t()}], options() ) :: {:ok, Tokenizers.Model.t()}
Instantiate a BPE model from the given vocab and merges.
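A small illustration of init/3, assuming a toy vocabulary invented for this example: the vocab is a map from token to integer id, the merges are a list of two-element string tuples, and the options are a keyword list matching options().

```elixir
# Toy data (hypothetical): both sides of each merge, and the merged
# result, are present in the vocab.
vocab = %{"<unk>" => 0, "a" => 1, "b" => 2, "ab" => 3}
merges = [{"a", "b"}]

{:ok, model} =
  Tokenizers.Model.BPE.init(vocab, merges,
    unk_token: "<unk>",
    # 10_000 is the documented default, spelled out here for clarity.
    cache_capacity: 10_000
  )
```

The third argument can carry any of the keys listed under options(), for example dropout: 0.1 to enable BPE dropout.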