Tokenizers.Model.WordPiece (Tokenizers v0.5.1)

Summary

Types

options()
Options for model initialisation.

Functions

empty()
Instantiate an empty WordPiece model.

from_file(vocab_path, options \\ [])
Instantiate a WordPiece model from the given vocab file.

init(vocab, options \\ [])
Instantiate a WordPiece model from the given vocab.

Types

@type options() :: [
  unk_token: String.t(),
  max_input_chars_per_word: number(),
  continuing_subword_prefix: String.t()
]

Options for model initialisation.

  • :unk_token - the unknown token to be used by the model. Defaults to "[UNK]".

  • :max_input_chars_per_word - the maximum number of characters to allow in a single word. Defaults to 100.

  • :continuing_subword_prefix - the prefix to attach to subword units that do not begin a word. Defaults to "##".
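
To illustrate how these three options interact, here is a minimal sketch of greedy longest-match-first WordPiece segmentation in plain Elixir. It is not the library's implementation (which is a native binding); the module `WordPieceSketch`, the function `tokenize_word/3`, and the toy vocab are hypothetical, and counting characters as graphemes is an assumption made for this sketch.

```elixir
defmodule WordPieceSketch do
  # Hypothetical helper for illustration only; not part of Tokenizers.
  # Splits one word into pieces by repeatedly taking the longest vocab match.
  def tokenize_word(word, vocab, opts \\ []) do
    unk = Keyword.get(opts, :unk_token, "[UNK]")
    max_chars = Keyword.get(opts, :max_input_chars_per_word, 100)
    prefix = Keyword.get(opts, :continuing_subword_prefix, "##")

    chars = String.graphemes(word)

    if length(chars) > max_chars do
      # Overlong words are not segmented at all in this sketch.
      [unk]
    else
      segment(chars, vocab, prefix, unk, true, [])
    end
  end

  # No characters left: return the collected pieces in order.
  defp segment([], _vocab, _prefix, _unk, _first?, acc), do: Enum.reverse(acc)

  defp segment(chars, vocab, prefix, unk, first?, acc) do
    case longest_match(chars, length(chars), vocab, prefix, first?) do
      # If any part of the word cannot be matched, the whole word is unknown.
      nil -> [unk]
      {piece, rest} -> segment(rest, vocab, prefix, unk, false, [piece | acc])
    end
  end

  defp longest_match(_chars, 0, _vocab, _prefix, _first?), do: nil

  defp longest_match(chars, len, vocab, prefix, first?) do
    {head, rest} = Enum.split(chars, len)
    candidate = Enum.join(head)
    # Pieces that do not start the word carry the continuing-subword prefix.
    piece = if first?, do: candidate, else: prefix <> candidate

    if Map.has_key?(vocab, piece) do
      {piece, rest}
    else
      longest_match(chars, len - 1, vocab, prefix, first?)
    end
  end
end

vocab = %{"[UNK]" => 0, "un" => 1, "##aff" => 2, "##able" => 3}

WordPieceSketch.tokenize_word("unaffable", vocab)
# => ["un", "##aff", "##able"]
```

With the same toy vocab, a word with no matching pieces (for example "xyz") yields `["[UNK]"]`, as does any word longer than `:max_input_chars_per_word`.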

Functions

empty()

@spec empty() :: {:ok, Tokenizers.Model.t()}

Instantiate an empty WordPiece model.

from_file(vocab_path, options \\ [])

@spec from_file(String.t(), options()) :: {:ok, Tokenizers.Model.t()}

Instantiate a WordPiece model from the given vocab file.
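
A minimal usage sketch follows. It assumes the `:tokenizers` dependency is available and that the vocab file uses the BERT-style layout of one token per line, with ids assigned by line position; the file name and toy tokens are made up for illustration.

```elixir
alias Tokenizers.Model.WordPiece

# Assumption: one token per line, ids taken from the line position.
vocab_path = Path.join(System.tmp_dir!(), "wordpiece_vocab.txt")
File.write!(vocab_path, Enum.join(["[UNK]", "un", "##aff", "##able"], "\n") <> "\n")

# Options are optional; this one is passed only to show the keyword syntax.
{:ok, model} = WordPiece.from_file(vocab_path, unk_token: "[UNK]")
```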

init(vocab, options \\ [])

@spec init(%{required(String.t()) => integer()}, options()) ::
  {:ok, Tokenizers.Model.t()}

Instantiate a WordPiece model from the given vocab.
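
As a hedged usage sketch, assuming the `:tokenizers` dependency is available, the calls below build a model from an in-memory vocab map and, for comparison, an empty model. The toy vocab is illustrative, and the option values shown simply restate the documented defaults.

```elixir
alias Tokenizers.Model.WordPiece

# Toy vocab mapping each token to its integer id (illustrative only).
vocab = %{"[UNK]" => 0, "un" => 1, "##aff" => 2, "##able" => 3}

# All three options are spelled out with their documented default values.
{:ok, model} =
  WordPiece.init(vocab,
    unk_token: "[UNK]",
    max_input_chars_per_word: 100,
    continuing_subword_prefix: "##"
  )

# An empty model takes no vocab and no options.
{:ok, empty_model} = WordPiece.empty()
```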