A decoder for the "utf-8-variants" family of not-quite-UTF-8 encodings,
including CESU-8 (UTF-8 layered over UTF-16 surrogate pairs) and Java's
"modified UTF-8", which encodes the null character as the two bytes
0xc0 0x80.
This is a port of ftfy.bad_codecs.utf8_variants. Encoding is identical to
standard UTF-8, so only decoding is interesting.
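The two variant forms named above can be illustrated with a minimal, self-contained sketch. It is not the ported decoder: `VariantSketch` is a hypothetical module, and it copies every other byte through unchanged without validating it, so it only shows how a CESU-8 surrogate pair and the 0xc0 0x80 null map back to ordinary UTF-8.

```elixir
defmodule VariantSketch do
  # Hypothetical illustration only; not the module documented here.
  import Bitwise

  # Java "modified UTF-8": the overlong pair 0xC0 0x80 stands for U+0000.
  def decode(<<0xC0, 0x80, rest::binary>>), do: <<0>> <> decode(rest)

  # CESU-8: a high surrogate (0xED 0xA0..0xAF xx) immediately followed by a
  # low surrogate (0xED 0xB0..0xBF xx), each written as a 3-byte sequence.
  def decode(<<0xED, b1, b2, 0xED, b4, b5, rest::binary>>)
      when b1 in 0xA0..0xAF and b2 in 0x80..0xBF and
             b4 in 0xB0..0xBF and b5 in 0x80..0xBF do
    # The lead byte 0xED contributes the top four bits (0xD) of each surrogate.
    high = 0xD000 ||| ((b1 &&& 0x3F) <<< 6) ||| (b2 &&& 0x3F)
    low = 0xD000 ||| ((b4 &&& 0x3F) <<< 6) ||| (b5 &&& 0x3F)

    # Combine the UTF-16 surrogate pair into one supplementary code point.
    codepoint = 0x10000 + ((high - 0xD800) <<< 10) + (low - 0xDC00)
    <<codepoint::utf8>> <> decode(rest)
  end

  # Simplification: every other byte is passed through without validation.
  def decode(<<byte, rest::binary>>), do: <<byte>> <> decode(rest)
  def decode(<<>>), do: <<>>
end
```

For example, the six bytes `<<0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x80>>` are the CESU-8 form of U+1F600, and the sketch turns them into the four-byte UTF-8 encoding of that code point.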
The decoder works incrementally so that a byte stream can be fed in arbitrary
chunks (see feed/3); decode/2 is the one-shot convenience that decodes a
whole binary at once.
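Why an incremental decoder must carry state between chunks can be sketched with standard UTF-8 alone. The example below is an assumption-laden illustration, not this module's implementation: `ChunkSketch.push/2` is a hypothetical helper that leans on Erlang's `:unicode.characters_to_binary/1`, which reports a truncated trailing sequence as `{:incomplete, decoded, rest}`. It does not handle the CESU-8 or 0xc0 0x80 variants.

```elixir
defmodule ChunkSketch do
  # Hypothetical illustration only; handles standard UTF-8, not the variants.
  #
  # `pending` holds bytes left over from the previous chunk.
  # Returns {new_pending, decoded_text}.
  def push(pending, chunk) when is_binary(pending) and is_binary(chunk) do
    case :unicode.characters_to_binary(pending <> chunk) do
      # The input ended in the middle of a multi-byte sequence:
      # emit what is complete and hold the tail back for the next chunk.
      {:incomplete, decoded, rest} ->
        {rest, decoded}

      # A byte sequence that can never become valid UTF-8.
      {:error, _decoded, _rest} ->
        raise ArgumentError, "invalid UTF-8 in chunk"

      # Everything decoded cleanly; nothing is pending.
      decoded when is_binary(decoded) ->
        {<<>>, decoded}
    end
  end
end

# Usage: "é" is the two bytes 0xC3 0xA9, split here across two chunks.
{pending, first} = ChunkSketch.push(<<>>, <<?a, 0xC3>>)
{pending, second} = ChunkSketch.push(pending, <<0xA9>>)
IO.inspect({pending, first <> second})
```

A final chunk differs only in that leftover pending bytes can no longer be completed, so they have to be treated as an error or replaced rather than held back.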
Summary
Functions
Decode a whole binary in one shot. Returns {:ok, string} on success, or
{:error, :invalid} if decoding fails in strict mode.
Feed a chunk of bytes to an incremental decoder. Returns
{decoder, decoded_string}. Pass final: false while more bytes may follow,
and final: true (the default) for the last chunk. In strict mode, raises on
a decode error.
Create an incremental decoder with the given error mode.