Tokenizers.Trainer (Tokenizers v0.5.1)
A Trainer is responsible for training a model. It is fed lines or sentences and then trains the given Model on them.
Summary
Types
Options for BPE trainer initialisation. All options can be omitted.
Options for Unigram trainer initialisation. All options can be omitted.
Options for WordLevel trainer initialisation. All options can be omitted.
Options for WordPiece trainer initialisation. All options can be omitted.
Functions
Creates a new BPE Trainer.
Gets trainer info.
Creates a new Unigram Trainer.
Creates a new WordLevel Trainer.
Creates a new WordPiece Trainer.
Types
@type bpe_options() :: [
  vocab_size: non_neg_integer(),
  min_frequency: non_neg_integer(),
  special_tokens: [String.t()],
  limit_alphabet: non_neg_integer(),
  initial_alphabet: [char()],
  show_progress: boolean(),
  continuing_subword_prefix: String.t(),
  end_of_word_suffix: String.t()
]
Options for BPE trainer initialisation. All options can be omitted.
@type t() :: %Tokenizers.Trainer{resource: reference()}
@type unigram_options() :: [
  vocab_size: non_neg_integer(),
  n_sub_iterations: non_neg_integer(),
  shrinking_factor: float(),
  special_tokens: [String.t()],
  initial_alphabet: [char()],
  uni_token: String.t(),
  max_piece_length: non_neg_integer(),
  seed_size: non_neg_integer(),
  show_progress: boolean()
]
Options for Unigram trainer initialisation. All options can be omitted.
@type wordlevel_options() :: [
  vocab_size: non_neg_integer(),
  min_frequency: non_neg_integer(),
  special_tokens: [String.t()],
  show_progress: boolean()
]
Options for WordLevel trainer initialisation. All options can be omitted.
@type wordpiece_options() :: [
  vocab_size: non_neg_integer(),
  min_frequency: non_neg_integer(),
  special_tokens: [String.t()],
  limit_alphabet: non_neg_integer(),
  initial_alphabet: [char()],
  show_progress: boolean(),
  continuing_subword_prefix: String.t(),
  end_of_word_suffix: String.t()
]
Options for WordPiece trainer initialisation. All options can be omitted.
Functions
@spec bpe(bpe_options()) :: {:ok, t()} | {:error, any()}
Creates a new BPE Trainer.
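As a minimal sketch of this call, the snippet below creates a BPE trainer using a few of the keys from `bpe_options()`. The option values and special-token strings are illustrative, not recommended defaults, and the `Mix.install/1` line assumes the code runs as a standalone script rather than inside a Mix project.

```elixir
# Assumption: standalone script. Inside a Mix project, add :tokenizers
# to the deps in mix.exs instead of calling Mix.install/1.
Mix.install([{:tokenizers, "~> 0.5.1"}])

# Every option may be omitted; the values here are illustrative only.
{:ok, trainer} =
  Tokenizers.Trainer.bpe(
    vocab_size: 1_000,
    min_frequency: 2,
    special_tokens: ["[UNK]", "[PAD]"],
    show_progress: false
  )

# Per the t() type, the trainer is a struct wrapping a native resource reference.
%Tokenizers.Trainer{resource: resource} = trainer
true = is_reference(resource)
```

Matching on `{:ok, trainer}` raises a `MatchError` if the call returns `{:error, reason}`, which is usually acceptable in a script but may call for a `case` in application code.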
Gets trainer info.
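The page gives no spec for this function, so the sketch below rests on an assumption: that it is exposed as `Tokenizers.Trainer.info/1`, takes a trainer, and returns a map describing it. Treat the name and return shape as unverified against this page.

```elixir
# Assumption: standalone script. Inside a Mix project, add :tokenizers
# to the deps in mix.exs instead of calling Mix.install/1.
Mix.install([{:tokenizers, "~> 0.5.1"}])

# Build any trainer first; the vocab_size value is illustrative.
{:ok, trainer} = Tokenizers.Trainer.wordlevel(vocab_size: 500)

# Assumption: info/1 accepts the trainer and returns a map of its settings.
info = Tokenizers.Trainer.info(trainer)

# Print whatever the library reports, without relying on specific keys.
IO.inspect(info, label: "trainer info")
```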
@spec unigram(unigram_options()) :: {:ok, t()} | {:error, any()}
Creates a new Unigram Trainer.
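The following sketch shows one way to create a Unigram trainer and handle both shapes of the return value from the spec. The option values are illustrative assumptions, and `Mix.install/1` is only needed when running outside a Mix project.

```elixir
# Assumption: standalone script. Inside a Mix project, add :tokenizers
# to the deps in mix.exs instead of calling Mix.install/1.
Mix.install([{:tokenizers, "~> 0.5.1"}])

# Keys come from unigram_options(); the values are illustrative only.
result =
  Tokenizers.Trainer.unigram(
    vocab_size: 8_000,
    n_sub_iterations: 2,
    shrinking_factor: 0.75,
    max_piece_length: 16,
    special_tokens: ["<unk>"],
    show_progress: false
  )

# Handle both return shapes from the spec instead of asserting success.
case result do
  {:ok, %Tokenizers.Trainer{}} ->
    IO.puts("Unigram trainer created")

  {:error, reason} ->
    IO.puts("could not create trainer: #{inspect(reason)}")
end
```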
@spec wordlevel(wordlevel_options()) :: {:ok, t()} | {:error, any()}
Creates a new WordLevel Trainer.
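Because every option can be omitted, a WordLevel trainer can be created from an empty option list. The sketch below shows that alongside a configured variant; the values are illustrative, and `Mix.install/1` assumes a standalone script.

```elixir
# Assumption: standalone script. Inside a Mix project, add :tokenizers
# to the deps in mix.exs instead of calling Mix.install/1.
Mix.install([{:tokenizers, "~> 0.5.1"}])

# No options: the library's own defaults apply.
{:ok, default_trainer} = Tokenizers.Trainer.wordlevel([])

# Options are a plain keyword list, so they can be built up separately.
opts = [vocab_size: 5_000, min_frequency: 3, special_tokens: ["[UNK]"]]
{:ok, custom_trainer} = Tokenizers.Trainer.wordlevel(opts)
```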
@spec wordpiece(wordpiece_options()) :: {:ok, t()} | {:error, any()}
Creates a new WordPiece Trainer.
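A minimal sketch of creating a WordPiece trainer follows. The `"##"` prefix is the marker conventionally used for non-initial word pieces; it and the other values are illustrative choices rather than defaults documented on this page. `Mix.install/1` assumes a standalone script.

```elixir
# Assumption: standalone script. Inside a Mix project, add :tokenizers
# to the deps in mix.exs instead of calling Mix.install/1.
Mix.install([{:tokenizers, "~> 0.5.1"}])

# Keys come from wordpiece_options(); the values are illustrative only.
{:ok, trainer} =
  Tokenizers.Trainer.wordpiece(
    vocab_size: 30_000,
    min_frequency: 2,
    special_tokens: ["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    continuing_subword_prefix: "##",
    show_progress: false
  )
```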