AI.Tokenizer behaviour (fnord v0.4.34)

Oh yeah? I'm gonna make my own tokenizer, with blackjack and hookers!

                                                -- ~Bender~ ChatGPT

The only tokenizer modules available when this was written were either too old to count tokens correctly for OpenAI's newer models (Gpt3Tokenizer) or unusable from an escript because they require priv directory access or OTP support beyond what escripts provide (Tokenizers).

This module tokenizes text using the o200k_base vocabulary and merge rules from OpenAI's tiktoken repository.
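As a sketch of how the behaviour's callbacks fit together, a module implementing it might look like the following. The module name and the byte-level encoding are hypothetical, for illustration only; a real implementation would apply the o200k_base BPE merges rather than mapping bytes directly to token ids.

```elixir
# Hypothetical implementation of the AI.Tokenizer behaviour.
# NOTE: this is NOT a real BPE tokenizer -- it simply treats each
# byte as a token id so that encode/1 and decode/1 round-trip.
defmodule MyApp.NaiveTokenizer do
  @behaviour AI.Tokenizer

  @impl AI.Tokenizer
  def encode(text) when is_binary(text) do
    # One "token" per byte; a real implementation would emit BPE ids.
    :binary.bin_to_list(text)
  end

  @impl AI.Tokenizer
  def decode(tokens) when is_list(tokens) do
    # Reassemble the original binary from the byte "tokens".
    :binary.list_to_bin(tokens)
  end
end
```

With this shape, `decode(encode(text)) == text` holds for any binary, which is the invariant callers of the behaviour would expect.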

Summary

Callbacks:

  - decode(list) — @callback decode(list()) :: String.t()
  - encode(t) — @callback encode(String.t()) :: list()

Functions:

  - chunk(input, max_tokens)
  - get_impl()
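A hedged usage sketch of the functions above. The return shapes (get_impl/0 returning the configured implementation module, chunk/2 returning a list of strings each within max_tokens tokens) are assumptions inferred from the names, not confirmed by the summary:

```elixir
# Hypothetical usage; exact return values depend on the configured
# tokenizer implementation returned by get_impl/0.
impl = AI.Tokenizer.get_impl()

tokens = impl.encode("hello world")
text   = impl.decode(tokens)

# chunk/2 presumably splits the input so that each chunk stays
# within max_tokens tokens; assumed here to return a list of strings.
long_text = String.duplicate("lorem ipsum ", 5_000)
chunks = AI.Tokenizer.chunk(long_text, 8_000)
```

Routing through get_impl/0 rather than calling a tokenizer module directly lets the implementation be swapped (e.g. in tests) without changing call sites.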