Tokenizers.Trainer (Tokenizers v0.5.0)

A Trainer is responsible for training a model. It is fed lines or sentences and uses them to train the given Model.

Summary

Types

bpe_options()

Options for BPE trainer initialisation. All options can be omitted.

t()

unigram_options()

Options for Unigram trainer initialisation. All options can be omitted.

wordlevel_options()

Options for WordLevel trainer initialisation. All options can be omitted.

wordpiece_options()

Options for WordPiece trainer initialisation. All options can be omitted.

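Functions

bpe(options \\ [])

Creates a new BPE Trainer.

info(trainer)

Returns information about the given trainer as a map.

unigram(options \\ [])

Creates a new Unigram Trainer.

wordlevel(options \\ [])

Creates a new WordLevel Trainer.

wordpiece(options \\ [])

Creates a new WordPiece Trainer.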

Types

@type bpe_options() :: [
  vocab_size: non_neg_integer(),
  min_frequency: non_neg_integer(),
  special_tokens: [String.t()],
  limit_alphabet: non_neg_integer(),
  initial_alphabet: [char()],
  show_progress: boolean(),
  continuing_subword_prefix: String.t(),
  end_of_word_suffix: String.t()
]

Options for BPE trainer initialisation. All options can be omitted.

@type t() :: %Tokenizers.Trainer{resource: reference()}

@type unigram_options() :: [
  vocab_size: non_neg_integer(),
  n_sub_iterations: non_neg_integer(),
  shrinking_factor: float(),
  special_tokens: [String.t()],
  initial_alphabet: [char()],
  uni_token: String.t(),
  max_piece_length: non_neg_integer(),
  seed_size: non_neg_integer(),
  show_progress: boolean()
]

Options for Unigram trainer initialisation. All options can be omitted.

@type wordlevel_options() :: [
  vocab_size: non_neg_integer(),
  min_frequency: non_neg_integer(),
  special_tokens: [String.t()],
  show_progress: boolean()
]

Options for WordLevel trainer initialisation. All options can be omitted.

@type wordpiece_options() :: [
  vocab_size: non_neg_integer(),
  min_frequency: non_neg_integer(),
  special_tokens: [String.t()],
  limit_alphabet: non_neg_integer(),
  initial_alphabet: [char()],
  show_progress: boolean(),
  continuing_subword_prefix: String.t(),
  end_of_word_suffix: String.t()
]

Options for WordPiece trainer initialisation. All options can be omitted.

Functions

@spec bpe(bpe_options()) :: {:ok, t()} | {:error, any()}

Creates a new BPE Trainer.
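
A minimal sketch of the call, assuming a standalone script that fetches the `tokenizers` package with `Mix.install/2` (inside a Mix project that already depends on it, drop that line). The option values are illustrative only, not recommended defaults.

```elixir
# Standalone script setup; omit inside a Mix project that already
# lists :tokenizers as a dependency.
Mix.install([{:tokenizers, "~> 0.5.0"}])

# Every key comes from bpe_options(); any of them may be left out.
{:ok, trainer} =
  Tokenizers.Trainer.bpe(
    vocab_size: 5_000,
    min_frequency: 2,
    special_tokens: ["[UNK]", "[PAD]"],
    show_progress: false
  )

# The result is an opaque %Tokenizers.Trainer{} struct wrapping a reference.
IO.inspect(trainer, label: "bpe trainer")
```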

@spec info(t()) :: map()

Returns information about the given trainer as a map.
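
As a rough illustration, the map can be inspected to check how a trainer was configured. The keys it contains are not listed here, so this sketch only prints the map rather than relying on any particular key. It assumes a standalone script using `Mix.install/2`.

```elixir
# Standalone script setup; omit inside a Mix project that already
# lists :tokenizers as a dependency.
Mix.install([{:tokenizers, "~> 0.5.0"}])

{:ok, trainer} = Tokenizers.Trainer.bpe(vocab_size: 1_000)

# info/1 returns a plain map (not a tagged tuple), per its @spec.
info = Tokenizers.Trainer.info(trainer)

IO.inspect(info, label: "trainer info")
```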

@spec unigram(unigram_options()) :: {:ok, t()} | {:error, any()}

Creates a new Unigram Trainer.
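
A possible invocation, with option names taken from `unigram_options()` and values chosen purely for illustration. It assumes a standalone script using `Mix.install/2`.

```elixir
# Standalone script setup; omit inside a Mix project that already
# lists :tokenizers as a dependency.
Mix.install([{:tokenizers, "~> 0.5.0"}])

{:ok, trainer} =
  Tokenizers.Trainer.unigram(
    vocab_size: 8_000,
    n_sub_iterations: 2,
    # shrinking_factor is typed as float(), so pass 0.75 rather than an integer.
    shrinking_factor: 0.75,
    max_piece_length: 16,
    special_tokens: ["<unk>"],
    show_progress: false
  )

IO.inspect(trainer, label: "unigram trainer")
```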

wordlevel(options \\ [])

@spec wordlevel(wordlevel_options()) :: {:ok, t()} | {:error, any()}

Creates a new WordLevel Trainer.
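
Because the function returns either `{:ok, trainer}` or `{:error, reason}`, callers that do not want a match error can branch on the result. The sketch below shows one way to do that, assuming a standalone script using `Mix.install/2`; the messages are illustrative.

```elixir
# Standalone script setup; omit inside a Mix project that already
# lists :tokenizers as a dependency.
Mix.install([{:tokenizers, "~> 0.5.0"}])

result =
  Tokenizers.Trainer.wordlevel(
    vocab_size: 10_000,
    min_frequency: 1,
    special_tokens: ["[UNK]"],
    show_progress: false
  )

# Handle both shapes of the return value declared in the @spec.
message =
  case result do
    {:ok, %Tokenizers.Trainer{}} -> "WordLevel trainer created"
    {:error, reason} -> "could not create trainer: #{inspect(reason)}"
  end

IO.puts(message)
```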

wordpiece(options \\ [])

@spec wordpiece(wordpiece_options()) :: {:ok, t()} | {:error, any()}

Creates a new WordPiece Trainer.
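
Since `options` defaults to `[]`, the trainer can be created with no arguments or with a subset of `wordpiece_options()`. The sketch below builds both and prints their info maps for comparison. It assumes a standalone script using `Mix.install/2`, and the `"##"` prefix is only an example value.

```elixir
# Standalone script setup; omit inside a Mix project that already
# lists :tokenizers as a dependency.
Mix.install([{:tokenizers, "~> 0.5.0"}])

# No options: the library's defaults apply.
{:ok, default_trainer} = Tokenizers.Trainer.wordpiece()

# A subset of options: everything not listed keeps its default.
{:ok, custom_trainer} =
  Tokenizers.Trainer.wordpiece(
    vocab_size: 8_000,
    limit_alphabet: 100,
    continuing_subword_prefix: "##",
    show_progress: false
  )

IO.inspect(Tokenizers.Trainer.info(default_trainer), label: "default")
IO.inspect(Tokenizers.Trainer.info(custom_trainer), label: "custom")
```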