View Source Evision.DNN.Tokenizer (Evision v1.0.0-rc.0)
Summary
Functions
decode
Encode UTF-8 text to token ids (special tokens currently disabled).
Load a tokenizer from a model directory.
Types
@type t() :: %Evision.DNN.Tokenizer{ref: reference()}
Type that represents an DNN.Tokenizer struct.
ref.
reference()The underlying erlang resource variable.
Functions
decode
Positional Arguments
- self:
Evision.DNN.Tokenizer.t() - tokens:
[integer()]
Return
- retval:
string
Python prototype (for reference only):
decode(tokens) -> retval
Encode UTF-8 text to token ids (special tokens currently disabled).
Positional Arguments
self:
Evision.DNN.Tokenizer.t()text:
string.UTF-8 input string.
Return
- retval:
[integer()]
Calls the underlying CoreBPE::encode with an empty allowed-special set.
@return Vector of token ids (32-bit ids narrowed to int for convenience).
Python prototype (for reference only):
encode(text) -> retval
@spec load(Keyword.t()) :: any() | {:error, String.t()}
@spec load(binary()) :: t() | {:error, String.t()}
Load a tokenizer from a model directory.
Positional Arguments
model_config:
string.Path to config.json for model.
Return
- retval:
Tokenizer
Expects the directory to contain:
config.jsonwith fieldmodel_typewith value "gpt2" or "gpt4".tokenizer.jsonproduced by the corresponding model family.
The argument is a path prefix; this function concatenates file
names directly (e.g. model_dir + "config.json"), so model_dir must
end with an appropriate path separator.
@return A Tokenizer ready for use. Throws cv::Exception if files are missing or model_type is unsupported.
Python prototype (for reference only):
load(model_config) -> retval