Ftfy.Chardata (ftfy v0.1.0)


Details about characters and the encodings that use them: the regexes that detect mojibake-ish byte and character sequences, the "could this string have come from this single-byte encoding?" check, and the translation tables for ligatures, character width, and control-character removal.

Port of the Python module ftfy.chardata.

Functions

a_grave_word_re()

altered_utf8_re()

c1_control_re()

charmap_encodings()

control_chars()

delete_codepoints(text, set)

@spec delete_codepoints(binary(), MapSet.t()) :: binary()

Delete every character whose codepoint is in set from text.
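The behavior described here can be illustrated with a minimal standard-library sketch. `DeleteSketch` is a hypothetical stand-in written for this example, not the library's source, and it assumes `text` is valid UTF-8.

```elixir
defmodule DeleteSketch do
  # Hypothetical illustration only; not the implementation in Ftfy.Chardata.
  @spec delete_codepoints(binary(), MapSet.t()) :: binary()
  def delete_codepoints(text, set) do
    # Walk the binary one UTF-8 codepoint at a time and keep only
    # the codepoints that are not members of the set.
    for <<cp::utf8 <- text>>, not MapSet.member?(set, cp), into: "" do
      <<cp::utf8>>
    end
  end
end

# Strip a soft hyphen (U+00AD) and a zero-width space (U+200B).
invisible = MapSet.new([0x00AD, 0x200B])
IO.inspect(DeleteSketch.delete_codepoints("a\u00ADb\u200Bc", invisible))
# prints "abc"
```

Because the set holds integer codepoints rather than strings, membership is checked once per decoded character.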

double_quote_re()

html_entity_re()

ligatures()

lossy_utf8_re()

possible_encoding(text, encoding)

@spec possible_encoding(binary(), String.t()) :: boolean()

Whether text could have been decoded from the given single-byte encoding (possibly sloppily). Mirrors chardata.possible_encoding.
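To make the idea concrete, here is a deliberately simplified sketch for one encoding only. It assumes Latin-1, where each byte 0x00 to 0xFF decodes to the codepoint with the same number, so the check reduces to "no codepoint above 0xFF". `EncodingSketch.possible_latin1?/1` is a hypothetical name; the real function takes an encoding argument and may apply additional rules (for example the "sloppy" variants mentioned above) that this sketch ignores.

```elixir
defmodule EncodingSketch do
  # Hypothetical, Latin-1-only simplification; not the library's logic.
  @spec possible_latin1?(binary()) :: boolean()
  def possible_latin1?(text) do
    # String.to_charlist/1 raises on invalid UTF-8, so valid input is assumed.
    text
    |> String.to_charlist()
    |> Enum.all?(fn cp -> cp <= 0xFF end)
  end
end

# "Ã©" (U+00C3 U+00A9) is typical mojibake and fits in Latin-1.
IO.inspect(EncodingSketch.possible_latin1?("\u00C3\u00A9"))
# prints true

# The euro sign (U+20AC) has no Latin-1 byte.
IO.inspect(EncodingSketch.possible_latin1?("\u20AC5"))
# prints false
```

A `true` result only means the text is consistent with the encoding, not that it actually came from it.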

single_quote_re()

translate(text, map)

@spec translate(binary(), %{optional(integer()) => binary()}) :: binary()

Translate a string by a %{codepoint => replacement_string} map, leaving characters absent from the map unchanged. Equivalent to Python's str.translate for string replacements.
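The mapping semantics can be sketched with the standard library alone. `TranslateSketch` and the two-entry ligature map below are hypothetical, chosen for illustration; they are not the library's source or the contents of `ligatures()`, and valid UTF-8 input is assumed.

```elixir
defmodule TranslateSketch do
  # Hypothetical illustration only; not the implementation in Ftfy.Chardata.
  @spec translate(binary(), %{optional(integer()) => binary()}) :: binary()
  def translate(text, map) do
    # For each codepoint, emit its replacement string if the map has one,
    # otherwise emit the original character unchanged.
    for <<cp::utf8 <- text>>, into: "" do
      Map.get(map, cp, <<cp::utf8>>)
    end
  end
end

# Illustrative map: expand the "fi" and "fl" ligatures.
example_map = %{0xFB01 => "fi", 0xFB02 => "fl"}
IO.inspect(TranslateSketch.translate("\uFB01nal \uFB02ow", example_map))
# prints "final flow"
```

A replacement may be longer than one character or empty, so the same shape covers both expansion and deletion.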

utf8_detector_re()

width_map()