ftfy: fixes text for you.
A port of the Python ftfy library for making text less broken — most importantly, fixing mojibake (text that was decoded in the wrong encoding).
iex> Ftfy.fix_text("âœ” No problems")
"✔ No problems"
iex> Ftfy.fix_text("Broken text… it’s flubberific!")
"Broken text… it's flubberific!"
See Ftfy.TextFixerConfig for the available options. The top-level functions
accept either a %Ftfy.TextFixerConfig{} or a keyword list of overrides.
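To make "decoded in the wrong encoding" concrete, the sketch below produces mojibake on purpose, using only the Erlang :unicode module. MojibakeSketch and garble/1 are hypothetical names for illustration and are not part of Ftfy; Latin-1 is assumed as the wrong encoding.

```elixir
defmodule MojibakeSketch do
  # Hypothetical helper for illustration; not part of Ftfy.
  # Treats every byte of the UTF-8 binary `text` as one Latin-1 character
  # and re-encodes the result as UTF-8. This is the classic mistake that
  # turns one accented letter into two odd-looking characters.
  def garble(text) when is_binary(text) do
    :unicode.characters_to_binary(text, :latin1, :utf8)
  end
end

MojibakeSketch.garble("só")
# => "sÃ³"
```

The two bytes that encode "ó" in UTF-8 (0xC3, 0xB3) each become a separate character, which is the kind of damage the functions in this module try to detect and reverse.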
Summary
Functions
Apply a plan (a list of {operation, arg} steps) to fix the encoding of text.
Print a breakdown of each codepoint in text (number, glyph, and category)
for debugging mysterious Unicode.
Fix text as a single segment, returning {fixed_text, explanation}.
Apply just the encoding-fixing steps of ftfy, discarding the explanation.
Apply just the encoding-fixing steps of ftfy, returning {fixed, explanation}.
Fix text found in a stream of lines, returning a stream of fixed lines.
Given Unicode text, fix inconsistencies and glitches such as mojibake.
Fix text as a single segment, discarding the explanation.
Guess a reasonable decoding for some bytes in an unknown encoding, returning
{string, encoding_name}.
Functions
Apply a plan (a list of {operation, arg} steps) to fix the encoding of text.
operation is one of "encode", "decode", "transcode", or "apply".
iex> {_text, plan} = Ftfy.fix_and_explain("schÃ¶n")
iex> Ftfy.apply_plan("schÃ¶n", plan)
"schön"
@spec explain_unicode(binary()) :: :ok
Print a breakdown of each codepoint in text (number, glyph, and category)
for debugging mysterious Unicode.
Note: unlike the Python original, this does not print the Unicode name of each character, since the BEAM does not ship the Unicode names database.
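A stripped-down version of such a breakdown can be sketched with the standard library alone. CodepointSketch is a hypothetical module, not Ftfy's implementation: it returns the data instead of printing it and leaves out the category column.

```elixir
defmodule CodepointSketch do
  # Hypothetical helper for illustration; not part of Ftfy.
  # Returns one {"U+XXXX", glyph} tuple per codepoint in `text`.
  def breakdown(text) when is_binary(text) do
    for cp <- String.to_charlist(text) do
      # Format the codepoint number as at least four uppercase hex digits.
      hex = cp |> Integer.to_string(16) |> String.pad_leading(4, "0")
      {"U+" <> hex, <<cp::utf8>>}
    end
  end
end

CodepointSketch.breakdown("sÃ³")
# => [{"U+0073", "s"}, {"U+00C3", "Ã"}, {"U+00B3", "³"}]
```

Seeing U+00C3 followed by another character in the U+0080 to U+00BF range is a common sign that UTF-8 bytes were decoded as Latin-1.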
@spec fix_and_explain(binary(), Ftfy.TextFixerConfig.t() | keyword()) :: {binary(), [{String.t(), String.t()}] | nil}
Fix text as a single segment, returning {fixed_text, explanation}.
The explanation is a list of {action, parameter} steps that can be replayed
with apply_plan/2, or nil if explain is false.
@spec fix_encoding(binary(), Ftfy.TextFixerConfig.t() | keyword()) :: binary()
Apply just the encoding-fixing steps of ftfy, discarding the explanation.
iex> Ftfy.fix_encoding("Ã³")
"ó"
iex> Ftfy.fix_encoding("ó")
"ó"
@spec fix_encoding_and_explain(binary(), Ftfy.TextFixerConfig.t() | keyword()) :: {binary(), [{String.t(), String.t()}]}
Apply just the encoding-fixing steps of ftfy, returning {fixed, explanation}.
iex> Ftfy.fix_encoding_and_explain("sÃ³")
{"só", [{"encode", "latin-1"}, {"decode", "utf-8"}]}
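The explanation in the result above reads as a recipe: encode the text as Latin-1 bytes, then decode those bytes as UTF-8. As a rough sketch of what replaying such a plan involves, the code below handles only these two steps with the Erlang :unicode module. PlanSketch is a hypothetical module, not Ftfy's implementation, and it ignores every other operation and encoding a real plan may contain.

```elixir
defmodule PlanSketch do
  # Hypothetical helper for illustration; not part of Ftfy.

  # "encode" as Latin-1: every codepoint must fit in one byte (<= 0xFF).
  def step(text, {"encode", "latin-1"}) do
    case :unicode.characters_to_binary(text, :utf8, :latin1) do
      bytes when is_binary(bytes) -> bytes
      _error -> raise ArgumentError, "text has characters outside Latin-1"
    end
  end

  # "decode" as UTF-8: Elixir strings are UTF-8 binaries, so decoding
  # amounts to checking that the bytes are valid UTF-8.
  def step(bytes, {"decode", "utf-8"}) do
    if String.valid?(bytes), do: bytes, else: raise(ArgumentError, "not valid UTF-8")
  end

  # Apply each step in order, feeding the result of one into the next.
  def replay(text, plan), do: Enum.reduce(plan, text, &step(&2, &1))
end

PlanSketch.replay("sÃ³", [{"encode", "latin-1"}, {"decode", "utf-8"}])
# => "só"
```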
@spec fix_file(Enumerable.t(), keyword()) :: Enumerable.t()
Fix text found in a stream of lines, returning a stream of fixed lines.
opts may include :encoding (the encoding used to decode each line from bytes;
if it is nil or absent, the encoding is guessed) and :config.
@spec fix_text(binary(), Ftfy.TextFixerConfig.t() | keyword()) :: binary()
Given Unicode text, fix inconsistencies and glitches such as mojibake.
Fixes the text in independent segments (usually lines), and discards any
explanation. Pass a %Ftfy.TextFixerConfig{} or keyword overrides as the
second argument.
iex> Ftfy.fix_text("ＬＯＵＤ　ＮＯＩＳＥＳ")
"LOUD NOISES"
@spec fix_text_segment(binary(), Ftfy.TextFixerConfig.t() | keyword()) :: binary()
Fix text as a single segment, discarding the explanation.
Guess a reasonable decoding for some bytes in an unknown encoding, returning
{string, encoding_name}.
The guess is a heuristic: it can be wrong, and a wrong guess can introduce new Unicode problems. It is not the recommended way to use ftfy.