Details about characters and the encodings that use them: the regexes that detect mojibake-ish byte and character sequences, the "could this string have come from this single-byte encoding?" check, and the translation tables for ligatures, character width, and control-character removal.
Port of ftfy.chardata.
Functions
Remove from text every character whose codepoint is in set.
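
As an illustration only, codepoint deletion can be sketched with a binary comprehension. The module and function names below are hypothetical (not this module's API), and the sketch assumes set is a MapSet of integer codepoints.

```elixir
defmodule RemoveSketch do
  # Hypothetical helper, not the module's real function.
  # Assumes `set` is a MapSet of integer codepoints and `text` is valid UTF-8.
  def remove_codepoints(text, set) when is_binary(text) do
    # Walk the string one codepoint at a time and keep only those not in `set`.
    for <<cp::utf8 <- text>>, not MapSet.member?(set, cp), into: "" do
      <<cp::utf8>>
    end
  end
end

# Strip a soft hyphen (U+00AD) from a word.
RemoveSketch.remove_codepoints("he\u00ADllo", MapSet.new([0xAD]))
# => "hello"
```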
Whether text could have been decoded from the given single-byte encoding
(possibly sloppily). Mirrors chardata.possible_encoding.
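
A minimal sketch of the idea behind this check, under simplifying assumptions: ASCII characters are always accepted, and any other character must be one the encoding can produce. The names are hypothetical, and only "ascii" and "latin-1" are modeled here; the real tables for other single-byte encodings, and for sloppy decoding, are not reproduced.

```elixir
defmodule EncodingSketch do
  # Hypothetical helper, not the module's real function.
  # True if every codepoint in `text` could come from decoding `encoding`.
  def possible_encoding?(text, encoding) when is_binary(text) do
    text
    |> String.to_charlist()
    |> Enum.all?(fn cp -> allowed?(cp, encoding) end)
  end

  # ASCII codepoints are accepted for every encoding in this sketch.
  defp allowed?(cp, _encoding) when cp < 0x80, do: true
  # Latin-1 maps bytes 0x80..0xFF directly to the same codepoints.
  defp allowed?(cp, "latin-1"), do: cp in 0x80..0xFF
  # Anything else (including non-ASCII under "ascii") is rejected.
  defp allowed?(_cp, _encoding), do: false
end

EncodingSketch.possible_encoding?("café", "latin-1")  # => true
EncodingSketch.possible_encoding?("café", "ascii")    # => false
EncodingSketch.possible_encoding?("€", "latin-1")     # => false
```

A string that passes this check is only a candidate: it could have been produced by that encoding, not that it was.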
Translate a string by a %{codepoint => replacement_string} map, leaving
characters absent from the map unchanged. Equivalent to Python's
str.translate for string replacements.
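
The translation step can be sketched as a per-codepoint map lookup. This is an illustrative version with hypothetical names, assuming the table keys are integer codepoints and the values are strings, as in the %{codepoint => replacement_string} shape described above.

```elixir
defmodule TranslateSketch do
  # Hypothetical helper, not the module's real function.
  # Replaces each codepoint found in `table`; others pass through unchanged.
  def translate(text, table) when is_binary(text) and is_map(table) do
    text
    |> String.to_charlist()
    |> Enum.map(fn cp -> Map.get(table, cp, <<cp::utf8>>) end)
    |> IO.iodata_to_binary()
  end
end

# Expand the "fi" ligature (U+FB01) into two plain letters.
TranslateSketch.translate("\uFB01ne", %{0xFB01 => "fi"})
# => "fine"
```

Because the replacement is a string rather than a single codepoint, one character can expand into several, which is what ligature tables need.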