AI.Tokenizer behaviour (fnord v0.4.34)
Oh yeah? I'm gonna make my own tokenizer, with blackjack and hookers!
-- ~Bender~ ChatGPT
When this module was written, the only available tokenizer modules were either outdated and miscounted tokens for OpenAI's newer models (Gpt3Tokenizer), or unusable in an escript because they require priv access or OTP support beyond what an escript provides (Tokenizers).
This module tokenizes text using the o200k_base vocabulary and merges files from OpenAI's tiktoken repo.
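To illustrate how a vocabulary and a merges table work together, here is a minimal sketch of a byte-pair-encoding merge loop. It is not this module's implementation: the `BPESketch` module, its toy `@merges` and `@vocab` tables, and the `encode/1` function are all hypothetical, and the sketch works on graphemes of a single word, whereas a real o200k_base tokenizer operates on bytes after splitting the input into chunks.

```elixir
defmodule BPESketch do
  @moduledoc "Toy byte-pair-encoding merge loop (illustrative only)."

  # Hypothetical merges table: {left, right} => rank. Lower ranks merge first.
  @merges %{{"l", "o"} => 0, {"lo", "w"} => 1, {"e", "r"} => 2}

  # Hypothetical vocabulary: token string => token id.
  @vocab %{"low" => 0, "er" => 1, "l" => 2, "o" => 3, "w" => 4, "e" => 5, "r" => 6}

  # Split a word into graphemes, merge pairs by rank, then look up token ids.
  def encode(word) do
    word
    |> String.graphemes()
    |> merge()
    |> Enum.map(&Map.fetch!(@vocab, &1))
  end

  # Repeatedly apply the lowest-ranked merge until no adjacent pair is mergeable.
  defp merge(parts) do
    candidates =
      parts
      |> Enum.chunk_every(2, 1, :discard)
      |> Enum.map(&List.to_tuple/1)
      |> Enum.filter(&Map.has_key?(@merges, &1))

    case candidates do
      [] ->
        parts

      _ ->
        pair = Enum.min_by(candidates, &Map.fetch!(@merges, &1))
        parts |> apply_merge(pair) |> merge()
    end
  end

  # Replace every adjacent occurrence of the pair with its concatenation.
  defp apply_merge([a, b | rest], {a, b}), do: [a <> b | apply_merge(rest, {a, b})]
  defp apply_merge([head | rest], pair), do: [head | apply_merge(rest, pair)]
  defp apply_merge([], _pair), do: []
end
```

With these toy tables, `BPESketch.encode("lower")` merges `l`+`o`, then `lo`+`w`, then `e`+`r`, and returns `[0, 1]`. Counting tokens then amounts to taking the length of the returned list.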