Tokenizers.Encoding (Tokenizers v0.1.2)
The struct and associated functions for an encoding, the output of a tokenizer.
Use these functions to retrieve the inputs needed for a natural language processing machine learning model.
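As a rough sketch of how the getters on this page fit together, the snippet below encodes one sentence and collects the lists a transformer model typically consumes. It assumes that `Tokenizers.Tokenizer.from_pretrained/1` and `Tokenizers.Tokenizer.encode/2` from the same package each return `{:ok, value}`, that the `bert-base-cased` identifier can be downloaded, and that the `:tokenizers` dependency is already available; none of this is stated on this page. The map keys are illustrative names only.

```elixir
# Assumption: Tokenizer.from_pretrained/1 and Tokenizer.encode/2 belong to the
# same package and return {:ok, value}; they are not documented on this page.
alias Tokenizers.{Tokenizer, Encoding}

{:ok, tokenizer} = Tokenizer.from_pretrained("bert-base-cased")
{:ok, encoding} = Tokenizer.encode(tokenizer, "Hello there!")

# Collect the per-token lists a model usually takes as input.
# The key names are illustrative, not required by the library.
inputs = %{
  input_ids: Encoding.get_ids(encoding),
  attention_mask: Encoding.get_attention_mask(encoding),
  token_type_ids: Encoding.get_type_ids(encoding)
}

# Each list should hold one entry per token.
IO.inspect(Encoding.n_tokens(encoding), label: "tokens")
IO.inspect(inputs, label: "inputs")
```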
Summary
Functions
Get the attention mask from an encoding.
Get the ids from an encoding.
Get offsets from an encoding.
Get special tokens mask from an encoding.
Get the tokens from an encoding.
Get token type ids from an encoding.
Returns the number of tokens in an Encoding.t().
Pad the encoding to the given length.
Truncate the encoding to the given length.
Types
Specs
Functions
Specs
get_attention_mask(Encoding.t()) :: [integer()]
Get the attention mask from an encoding.
Specs
get_ids(Encoding.t()) :: [integer()]
Get the ids from an encoding.
Specs
Get offsets from an encoding.
Specs
get_special_tokens_mask(Encoding.t()) :: [integer()]
Get special tokens mask from an encoding.
Specs
get_tokens(Encoding.t()) :: [binary()]
Get the tokens from an encoding.
Specs
get_type_ids(Encoding.t()) :: [integer()]
Get token type ids from an encoding.
Specs
n_tokens(encoding :: Encoding.t()) :: non_neg_integer()
Returns the number of tokens in an Encoding.t().
Specs
pad(encoding :: Encoding.t(), length :: pos_integer(), opts :: Keyword.t()) :: Encoding.t()
Pad the encoding to the given length.
Options
direction - The padding direction. Can be :right or :left. Default: :right.
pad_id - The id corresponding to the padding token. Default: 0.
pad_token - The padding token to use. Default: "[PAD]".
pad_type_id - The type ID corresponding to the padding token. Default: 0.
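To make the effect of the padding options concrete, here is a minimal sketch that pads plain lists rather than a real `Encoding.t()`. `PadSketch` is a hypothetical helper and not part of Tokenizers; the sample ids and tokens are made up, and filling the attention mask with `0` at padded positions is an assumption based on common convention, not something this page states.

```elixir
defmodule PadSketch do
  # Hypothetical helper, not part of Tokenizers. It works on a plain map of
  # lists so the effect of each option is easy to see.
  def pad(enc, target, opts \\ []) do
    direction = Keyword.get(opts, :direction, :right)

    # One filler value per list, using the defaults documented above.
    fillers = %{
      ids: Keyword.get(opts, :pad_id, 0),
      tokens: Keyword.get(opts, :pad_token, "[PAD]"),
      type_ids: Keyword.get(opts, :pad_type_id, 0),
      # Assumption: padded positions are masked out with 0.
      attention_mask: 0
    }

    # Nothing is added when the lists already reach the target length.
    missing = max(target - length(enc.ids), 0)

    Map.new(enc, fn {key, list} ->
      padding = List.duplicate(Map.fetch!(fillers, key), missing)

      case direction do
        :right -> {key, list ++ padding}
        :left -> {key, padding ++ list}
      end
    end)
  end
end

# Made-up sample values, for illustration only.
enc = %{
  ids: [11, 12, 13],
  tokens: ["a", "b", "c"],
  type_ids: [0, 0, 0],
  attention_mask: [1, 1, 1]
}

right = PadSketch.pad(enc, 5)
left = PadSketch.pad(enc, 5, direction: :left, pad_id: 9)

IO.inspect(right.ids)            # [11, 12, 13, 0, 0]
IO.inspect(right.attention_mask) # [1, 1, 1, 0, 0]
IO.inspect(left.ids)             # [9, 9, 11, 12, 13]
```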
Specs
Truncate the encoding to the given length.
Options
direction - The truncation direction. Can be :right or :left. Default: :right.
stride - The length of previous content to be included in each overflowing piece. Default: 0.
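The stride option is easier to picture on a plain list of ids. The sketch below is a hypothetical helper (`TruncateSketch` is not part of Tokenizers) that models only the `:right` direction. It assumes that each overflowing piece is at most `max_len` long and starts `stride` ids before the end of the previous piece. How the library stores or exposes overflowing pieces is not covered on this page.

```elixir
defmodule TruncateSketch do
  # Hypothetical helper, not part of Tokenizers. Only :right is modelled.
  # Returns {kept, overflow}: the truncated piece and the overflowing pieces.
  def split(ids, max_len, stride \\ 0)
      when max_len > 0 and stride >= 0 and stride < max_len do
    # Each new window starts (max_len - stride) ids after the previous one,
    # so consecutive windows share `stride` ids.
    [kept | overflow] = windows(ids, max_len, max_len - stride, [])
    {kept, overflow}
  end

  defp windows(ids, max_len, step, acc) do
    window = Enum.take(ids, max_len)

    if length(ids) <= max_len do
      # The remaining ids fit in one window, so this is the last piece.
      Enum.reverse([window | acc])
    else
      windows(Enum.drop(ids, step), max_len, step, [window | acc])
    end
  end
end

ids = Enum.to_list(1..10)

{kept, overflow} = TruncateSketch.split(ids, 4, 1)
IO.inspect(kept)     # [1, 2, 3, 4]
IO.inspect(overflow) # [[4, 5, 6, 7], [7, 8, 9, 10]]

{kept0, overflow0} = TruncateSketch.split(ids, 4)
IO.inspect(kept0)     # [1, 2, 3, 4]
IO.inspect(overflow0) # [[5, 6, 7, 8], [9, 10]]
```

With a stride of 1, every overflowing piece repeats the last id of the piece before it; with the default stride of 0, the pieces do not overlap.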