View Source Evision.DNN.Tokenizer (Evision v1.0.0-rc.0)

Summary

Types

t()

Type that represents an DNN.Tokenizer struct.

Functions

Encode UTF-8 text to token ids (special tokens currently disabled).

Load a tokenizer from a model directory.

Types

@type t() :: %Evision.DNN.Tokenizer{ref: reference()}

Type that represents an DNN.Tokenizer struct.

  • ref. reference()

    The underlying erlang resource variable.

Functions

@spec decode(Keyword.t()) :: any() | {:error, String.t()}
@spec decode(Evision.DNN.DNN.Tokenizer.t(), [integer()]) ::
  binary() | {:error, String.t()}

decode

Positional Arguments
  • self: Evision.DNN.Tokenizer.t()
  • tokens: [integer()]
Return
  • retval: string

Python prototype (for reference only):

decode(tokens) -> retval
@spec encode(Keyword.t()) :: any() | {:error, String.t()}
@spec encode(Evision.DNN.DNN.Tokenizer.t(), binary()) ::
  [integer()] | {:error, String.t()}

Encode UTF-8 text to token ids (special tokens currently disabled).

Positional Arguments
  • self: Evision.DNN.Tokenizer.t()

  • text: string.

    UTF-8 input string.

Return
  • retval: [integer()]

Calls the underlying CoreBPE::encode with an empty allowed-special set. @return Vector of token ids (32-bit ids narrowed to int for convenience).

Python prototype (for reference only):

encode(text) -> retval
@spec load(Keyword.t()) :: any() | {:error, String.t()}
@spec load(binary()) :: t() | {:error, String.t()}

Load a tokenizer from a model directory.

Positional Arguments
  • model_config: string.

    Path to config.json for model.

Return
  • retval: Tokenizer

Expects the directory to contain:

  • config.json with field model_type with value "gpt2" or "gpt4".
  • tokenizer.json produced by the corresponding model family.

The argument is a path prefix; this function concatenates file names directly (e.g. model_dir + "config.json"), so model_dir must end with an appropriate path separator. @return A Tokenizer ready for use. Throws cv::Exception if files are missing or model_type is unsupported.

Python prototype (for reference only):

load(model_config) -> retval