Lua.VM.Stdlib.Utf8 (Lua v1.0.0-rc.1)

Lua 5.3 utf8 standard library (§6.5).

Operates over byte strings. Lua strings have no Unicode awareness of their own, so this library treats the bytes as a UTF-8 encoded sequence and validates each decoded codepoint against the full Unicode range [0, 0x10FFFF] (the BMP plus the supplementary planes). Overlong encodings (e.g. \xC0\x80 for U+0000), continuation bytes in the lead position, and codepoints above 0x10FFFF are all reported as "invalid UTF-8 code".
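The three rejection cases can be exercised with a short Lua sketch. It assumes this implementation mirrors reference Lua 5.3, where utf8.len signals invalid input through its return values and utf8.codepoint raises an error; the table name `samples` and its keys are illustrative only.

```lua
-- Each sample is malformed for a different reason.
local samples = {
  overlong     = "\xC0\x80",          -- two-byte encoding of U+0000
  continuation = "\x80",              -- continuation byte in the lead position
  out_of_range = "\xF4\x90\x80\x80",  -- would decode to U+110000
}

for name, s in pairs(samples) do
  -- utf8.len does not raise: it returns nil plus the offending byte position.
  local count, bad_pos = utf8.len(s)
  -- utf8.codepoint raises instead, so the call is wrapped in pcall.
  local ok, err = pcall(utf8.codepoint, s, 1)
  print(name, count, bad_pos, ok, err)
end
```

In reference Lua 5.3 every sample yields `nil` and position 1 from utf8.len, and an error message containing "invalid UTF-8 code" from utf8.codepoint.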

Functions

  • utf8.char(...) — codepoints to UTF-8 string
  • utf8.codepoint(s [, i [, j]]) — UTF-8 string slice to codepoints
  • utf8.codes(s) — stateless (byte_pos, codepoint) iterator
  • utf8.len(s [, i [, j]]) — codepoint count, or nil plus the byte position of the first invalid sequence in the slice
  • utf8.offset(s, n [, i]) — byte position of the n-th codepoint
  • utf8.charpattern — Lua pattern matching one UTF-8 byte sequence
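A brief tour of the functions listed above, written as a sketch against reference Lua 5.3 semantics; the expected values in the comments assume this implementation matches them. The variable names `s` and `chars` are illustrative.

```lua
-- Build a string from codepoints: "H" (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes).
local s = utf8.char(72, 0xE9, 0x20AC)
print(#s)                        -- 6  (byte length, not codepoint count)
print(utf8.len(s))               -- 3
print(utf8.codepoint(s, 1, -1))  -- 72  233  8364
print(utf8.offset(s, 3))         -- 4  (byte where the 3rd codepoint starts)

-- utf8.codes yields the starting byte position and codepoint of each character.
for pos, cp in utf8.codes(s) do
  print(pos, cp)                 -- 1 72, then 2 233, then 4 8364
end

-- utf8.charpattern splits a valid string into one match per encoded character.
local chars = {}
for ch in s:gmatch(utf8.charpattern) do
  chars[#chars + 1] = ch
end
print(#chars)                    -- 3

-- On invalid input utf8.len returns nil and the offending byte position.
print(utf8.len("ab\xFFcd"))      -- nil  3
```

Note that utf8.charpattern assumes its subject is already valid UTF-8, so it is best paired with a prior utf8.len check when the input is untrusted.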