ftfy: fixes text for you.

A port of the Python ftfy library for making text less broken — most importantly, fixing mojibake (text that was decoded in the wrong encoding).

iex> Ftfy.fix_text("✔ No problems")
"✔ No problems"

iex> Ftfy.fix_text("Broken text… it’s flubberific!")
"Broken text… it's flubberific!"

See Ftfy.TextFixerConfig for the available options. The top-level functions accept either a %Ftfy.TextFixerConfig{} or a keyword list of overrides.

Summary

Functions

Apply a plan (a list of {operation, arg} steps) to fix the encoding of text.

Print a breakdown of each codepoint in text (number, glyph, and category) for debugging mysterious Unicode.

Fix text as a single segment, returning {fixed_text, explanation}.

Apply just the encoding-fixing steps of ftfy, discarding the explanation.

Apply just the encoding-fixing steps of ftfy, returning {fixed, explanation}.

Fix text found in a stream of lines, returning a stream of fixed lines.

Given Unicode text, fix inconsistencies and glitches such as mojibake.

Fix text as a single segment, discarding the explanation.

Guess a reasonable decoding for some bytes in an unknown encoding, returning {string, encoding_name}.

Functions

apply_plan(text, plan)

@spec apply_plan(binary(), [{String.t(), String.t()}]) :: binary()

Apply a plan (a list of {operation, arg} steps) to fix the encoding of text.

operation is one of "encode", "decode", "transcode", or "apply".

iex> {_text, plan} = Ftfy.fix_and_explain("schön")
iex> Ftfy.apply_plan("schön", plan)
"schön"

explain_unicode(text)

@spec explain_unicode(binary()) :: :ok

Print a breakdown of each codepoint in text (number, glyph, and category) for debugging mysterious Unicode.

Note: unlike the Python original, this does not print the Unicode name of each character, since the BEAM does not ship the Unicode names database.

fix_and_explain(text, config_or_opts \\ [])

@spec fix_and_explain(binary(), Ftfy.TextFixerConfig.t() | keyword()) ::
  {binary(), [{String.t(), String.t()}] | nil}

Fix text as a single segment, returning {fixed_text, explanation}.

The explanation is a list of {action, parameter} steps that can be replayed with apply_plan/2, or nil if explain is false.

fix_encoding(text, config_or_opts \\ [])

@spec fix_encoding(binary(), Ftfy.TextFixerConfig.t() | keyword()) :: binary()

Apply just the encoding-fixing steps of ftfy, discarding the explanation.

iex> Ftfy.fix_encoding("ó")
"ó"
iex> Ftfy.fix_encoding("ó")
"ó"

fix_encoding_and_explain(text, config_or_opts \\ [])

@spec fix_encoding_and_explain(binary(), Ftfy.TextFixerConfig.t() | keyword()) ::
  {binary(), [{String.t(), String.t()}]}

Apply just the encoding-fixing steps of ftfy, returning {fixed, explanation}.

iex> Ftfy.fix_encoding_and_explain("só")
{"só", [{"encode", "latin-1"}, {"decode", "utf-8"}]}

fix_file(lines, opts \\ [])

@spec fix_file(
  Enumerable.t(),
  keyword()
) :: Enumerable.t()

Fix text found in a stream of lines, returning a stream of fixed lines.

opts may include :encoding (decode each line from these bytes; nil or absent means guess) and :config.

fix_text(text, config_or_opts \\ [])

@spec fix_text(binary(), Ftfy.TextFixerConfig.t() | keyword()) :: binary()

Given Unicode text, fix inconsistencies and glitches such as mojibake.

Fixes the text in independent segments (usually lines), and discards any explanation. Pass a %Ftfy.TextFixerConfig{} or keyword overrides as the second argument.

iex> Ftfy.fix_text("LOUD NOISES")
"LOUD NOISES"

fix_text_segment(text, config_or_opts \\ [])

@spec fix_text_segment(binary(), Ftfy.TextFixerConfig.t() | keyword()) :: binary()

Fix text as a single segment, discarding the explanation.

fixers()

guess_bytes(bytes)

@spec guess_bytes(binary()) :: {binary(), String.t()}

Guess a reasonable decoding for some bytes in an unknown encoding, returning {string, encoding_name}.

This is not accurate and may create Unicode problems. It is not the recommended way to use ftfy.

version()