Tokenizers.AddedToken (Tokenizers v0.5.1)
This struct represents a token added to the tokenizer's vocabulary.
Summary
Types
@type t() :: %Tokenizers.AddedToken{resource: reference()}
Functions
Retrieves information about the added token.
Builds a new added token.
Options
* `:special` - defines whether this token is a special token. Defaults to `false`.
* `:single_word` - defines whether this token should only match single words. If `true`, this token will never match inside of a word. For example, the token `ing` would match on `tokenizing` if this option is `false`. The notion of "inside of a word" is defined by the word boundaries pattern in regular expressions (i.e. the token should start and end with word boundaries). Defaults to `false`.
* `:lstrip` - defines whether this token should strip all potential whitespace on its left side. If `true`, this token will greedily match any whitespace on its left. For example, matching the token `[MASK]` with `lstrip=true` in the text `"I saw a [MASK]"` would match on `" [MASK]"` (note the space on the left). Defaults to `false`.
* `:rstrip` - defines whether this token should strip all potential whitespace on its right side. If `true`, this token will greedily match any whitespace on its right. It works just like `:lstrip`, but on the right. Defaults to `false`.
* `:normalized` - defines whether this token should match against the normalized version of the input text. For example, with the added token `"yesterday"` and a normalizer in charge of lowercasing the text, the token could be extracted from the input `"I saw a lion Yesterday"`. If `true`, the token will be extracted from the normalized input `"i saw a lion yesterday"`. If `false`, the token will be extracted from the original input `"I saw a lion Yesterday"`. Defaults to `false` for special tokens and `true` otherwise.
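As a short sketch of how these options fit together, the following builds an added token configured like the `[MASK]` example above and reads its configuration back with `info/1`. (The exact shape of the map returned by `Tokenizers.AddedToken.info/1`, and the use of `Tokenizers.Tokenizer.add_special_tokens/2` to register the token, are assumptions based on this library's documented function summaries, not verbatim output.)

```elixir
# Build an added token that greedily matches whitespace on its
# left and is flagged as special (so :normalized defaults to false).
mask = Tokenizers.AddedToken.new("[MASK]", lstrip: true, special: true)

# Inspect the token's configuration (returns a map describing
# the options it was built with).
Tokenizers.AddedToken.info(mask)

# The token can then be registered with a tokenizer, e.g. via
# Tokenizers.Tokenizer.add_special_tokens(tokenizer, [mask]).
```

Because `:special` is set, the token matches against the original (un-normalized) input text unless `:normalized` is explicitly overridden.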