Ftfy.Codecs.Utf8Variants (ftfy v0.1.0)

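A decoder for the "utf-8-variants" family of not-quite-UTF-8 encodings. This family includes CESU-8, which writes each character outside the Basic Multilingual Plane as a UTF-16 surrogate pair and then encodes each surrogate as its own three-byte UTF-8-style sequence, and Java's "modified UTF-8", which uses the same surrogate-pair scheme but also encodes the null character (U+0000) as the two bytes 0xC0 0x80.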
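The following sketch makes the byte layouts concrete. It computes the CESU-8 and modified UTF-8 bytes for a code point and compares them with standard UTF-8. The module VariantBytesSketch and its functions are hypothetical illustrations and are not part of Ftfy.

```elixir
defmodule VariantBytesSketch do
  # Hypothetical helpers for illustration only; not part of Ftfy.

  # Standard UTF-8, as produced by Elixir's built-in utf8 binary type.
  def utf8(cp), do: <<cp::utf8>>

  # CESU-8: a code point above U+FFFF is split into a UTF-16 surrogate
  # pair, and each 16-bit surrogate is written as a 3-byte UTF-8-style
  # sequence. BMP code points are encoded exactly as in UTF-8.
  def cesu8(cp) when cp > 0xFFFF do
    v = cp - 0x10000
    hi = 0xD800 + div(v, 0x400)
    lo = 0xDC00 + rem(v, 0x400)
    three_byte(hi) <> three_byte(lo)
  end

  def cesu8(cp), do: <<cp::utf8>>

  # Java's modified UTF-8: CESU-8 plus U+0000 written as the overlong
  # two-byte form 0xC0 0x80 instead of a single 0x00 byte.
  def modified_utf8(0), do: <<0xC0, 0x80>>
  def modified_utf8(cp), do: cesu8(cp)

  # Spread 16 bits over the 1110xxxx 10yyyyyy 10zzzzzz template.
  defp three_byte(u) do
    <<a::4, b::6, c::6>> = <<u::16>>
    <<0b1110::4, a::4, 0b10::2, b::6, 0b10::2, c::6>>
  end
end
```

For U+1F600, this produces the six bytes ED A0 BD ED B8 80 in CESU-8, compared with F0 9F 98 80 in standard UTF-8. A strict standard UTF-8 decoder rejects both the encoded surrogates and the overlong 0xC0 0x80, which is why these variants need a decoder of their own.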

This is a port of ftfy.bad_codecs.utf8_variants from the Python ftfy library. Encoding with this codec is the same as encoding standard UTF-8, so this module implements only decoding.

The decoder works incrementally so that a byte stream can be fed in arbitrary chunks (see feed/3); decode/2 is the one-shot convenience that decodes a whole binary at once.

Summary

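Functions

decode(bin, errors \\ "strict")

Decode a whole binary in one shot. Returns {:ok, string} on success, or {:error, :invalid} if the input is invalid and errors is "strict".

feed(dec, input, final \\ true)

Feed a chunk of bytes to an incremental decoder. Returns {decoder, decoded_string}, where decoder is the updated state to pass to the next call. Pass false as the final argument while more bytes may follow, and true (the default) for the last chunk. Raises on a decode error in strict mode.

new(errors \\ "strict")

Create an incremental decoder with the given error mode.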

Types

t()

@type t() :: %Ftfy.Codecs.Utf8Variants{buffer: binary(), errors: String.t()}

Functions

decode(bin, errors \\ "strict")

@spec decode(binary(), String.t()) :: {:ok, binary()} | {:error, :invalid}

Decode a whole binary in one shot. Returns {:ok, string} on success, or {:error, :invalid} if the input is invalid and errors is "strict".
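The strict-only sketch below shows one way to implement a one-shot decoder that returns the same shape as decode/2. The module name VariantDecodeSketch is hypothetical, the errors argument is omitted, and rejecting unpaired surrogates is an assumption about strict mode. It illustrates the technique and does not reproduce Ftfy's code.

```elixir
defmodule VariantDecodeSketch do
  # Hypothetical strict-only sketch of a one-shot utf-8-variants decoder.
  # Not Ftfy's implementation; it only illustrates the idea.

  @spec decode(binary()) :: {:ok, binary()} | {:error, :invalid}
  def decode(bin) when is_binary(bin), do: do_decode(bin, <<>>)

  defp do_decode(<<>>, acc), do: {:ok, acc}

  # CESU-8 surrogate pair: a high surrogate (ED A0..AF xx) followed by a
  # low surrogate (ED B0..BF xx). Each half carries 10 payload bits.
  defp do_decode(
         <<0xED, 0b1010::4, h1::4, 0b10::2, h2::6,
           0xED, 0b1011::4, l1::4, 0b10::2, l2::6, rest::binary>>,
         acc
       ) do
    hi = h1 * 64 + h2
    lo = l1 * 64 + l2
    cp = 0x10000 + hi * 0x400 + lo
    do_decode(rest, acc <> <<cp::utf8>>)
  end

  # Any other encoded surrogate (a lone high or low half) is rejected.
  defp do_decode(<<0xED, b2, _::binary>>, _acc) when b2 in 0xA0..0xBF,
    do: {:error, :invalid}

  # Java modified UTF-8: the overlong pair C0 80 stands for U+0000.
  defp do_decode(<<0xC0, 0x80, rest::binary>>, acc),
    do: do_decode(rest, acc <> <<0>>)

  # Everything else must be ordinary, well-formed UTF-8.
  defp do_decode(<<cp::utf8, rest::binary>>, acc),
    do: do_decode(rest, acc <> <<cp::utf8>>)

  defp do_decode(_invalid, _acc), do: {:error, :invalid}
end
```

The variant-specific clauses (surrogate pair, lone surrogate, 0xC0 0x80) come before the generic utf8 clause so that they take precedence. Everything else is left to Elixir's built-in utf8 binary matching, which fails on malformed bytes such as a stray continuation byte. Those bytes then reach the final error clause.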

feed(dec, input, final \\ true)

@spec feed(t(), binary(), boolean()) :: {t(), binary()}

Feed a chunk of bytes to an incremental decoder. Returns {decoder, decoded_string}, where decoder is the updated state to pass to the next call. Pass false as the final argument while more bytes may follow, and true (the default) for the last chunk. Raises on a decode error in strict mode.
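Here is a strict-only sketch of the buffering that feed/3 implies. Bytes at the end of a chunk that might begin a longer sequence (a truncated UTF-8 character, a lone 0xC0, or a surrogate that may still be followed by its partner) are kept in the buffer and prepended to the next chunk. When final is true, any leftover bytes are an error. VariantStreamSketch, its struct, and its private helpers are hypothetical. The sketch drops the errors field and raises ArgumentError (the exception type is an assumption, since the docs above only say that feed/3 raises). It is not Ftfy's implementation.

```elixir
defmodule VariantStreamSketch do
  # Hypothetical strict-only sketch of incremental utf-8-variants decoding.
  # Not Ftfy's implementation; it only illustrates the carry-over buffer.
  defstruct buffer: <<>>

  def new, do: %__MODULE__{}

  # Decode as much of buffer <> input as possible. Undecided trailing bytes
  # stay in the buffer unless this is the final chunk, where they are an error.
  def feed(%__MODULE__{buffer: buf} = dec, input, final \\ true) do
    {out, rest} = decode_prefix(buf <> input, <<>>)

    cond do
      rest == <<>> -> {%{dec | buffer: <<>>}, out}
      final -> raise ArgumentError, "truncated utf-8-variants input"
      true -> {%{dec | buffer: rest}, out}
    end
  end

  defp decode_prefix(data, acc) do
    case step(data) do
      :done -> {acc, <<>>}
      {:ok, chunk, rest} -> decode_prefix(rest, acc <> chunk)
      :incomplete -> {acc, data}
      :invalid -> raise ArgumentError, "invalid utf-8-variants input"
    end
  end

  # Decode one character from the front of the remaining data.
  defp step(<<>>), do: :done

  # CESU-8 surrogate pair: high (ED A0..AF xx) then low (ED B0..BF xx).
  defp step(<<0xED, 0b1010::4, h1::4, 0b10::2, h2::6,
              0xED, 0b1011::4, l1::4, 0b10::2, l2::6, rest::binary>>) do
    cp = 0x10000 + (h1 * 64 + h2) * 0x400 + (l1 * 64 + l2)
    {:ok, <<cp::utf8>>, rest}
  end

  # An encoded surrogate that is not a full pair yet: wait until six bytes
  # are available, since the low half may arrive in the next chunk.
  defp step(<<0xED, b2, _::binary>> = data) when b2 in 0xA0..0xBF do
    if byte_size(data) < 6, do: :incomplete, else: :invalid
  end

  # Java modified UTF-8 null, possibly split across two chunks.
  defp step(<<0xC0, 0x80, rest::binary>>), do: {:ok, <<0>>, rest}
  defp step(<<0xC0>>), do: :incomplete

  # Ordinary well-formed UTF-8.
  defp step(<<cp::utf8, rest::binary>>), do: {:ok, <<cp::utf8>>, rest}

  # Anything else is either a truncated UTF-8 sequence or garbage.
  defp step(data), do: if(truncated_utf8?(data), do: :incomplete, else: :invalid)

  # True when data is a proper prefix of a multi-byte UTF-8 sequence.
  defp truncated_utf8?(<<lead, tail::binary>>) do
    needed =
      cond do
        lead in 0xC2..0xDF -> 1
        lead in 0xE0..0xEF -> 2
        lead in 0xF0..0xF4 -> 3
        true -> 0
      end

    byte_size(tail) < needed and
      Enum.all?(:binary.bin_to_list(tail), &(&1 in 0x80..0xBF))
  end
end
```

Waiting for six bytes before judging a surrogate keeps the sketch short. The cost is that some invalid input is reported one chunk late. A more careful decoder could reject a lone low surrogate immediately.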

new(errors \\ "strict")

@spec new(String.t()) :: t()

Create an incremental decoder with the given error mode.