An Elixir port of the Python ftfy library (version 6.3.1). It takes in broken Unicode text and makes it less broken — most importantly, it detects and fixes mojibake (text that was decoded in the wrong encoding).
iex> Ftfy.fix_text("âœ” No problems")
"✔ No problems"
iex> Ftfy.fix_text("Broken text&hellip; it&#x2019;s flubberific!")
"Broken text… it's flubberific!"
iex> Ftfy.fix_text("ＬＯＵＤ　ＮＯＩＳＥＳ")
"LOUD NOISES"
iex> Ftfy.fix_encoding_and_explain("sÃ³")
{"só", [{"encode", "latin-1"}, {"decode", "utf-8"}]}
What it does
Ftfy.fix_text/2 runs a sequence of fixes, each individually configurable via
Ftfy.TextFixerConfig:
- fix_encoding: detect mojibake and undo it by re-encoding and re-decoding
through the right pair of encodings (the heart of ftfy), including the
sub-fixes restore_byte_a0, replace_lossy_sequences, decode_inconsistent_utf8,
and fix_c1_controls
- unescape_html: decode HTML entities (&amp;, &eacute;, &rsquo;, …)
- remove_terminal_escapes: strip ANSI color codes
- fix_latin_ligatures: ﬁ → fi
- fix_character_width: fullwidth/halfwidth → standard width
- uncurl_quotes: curly quotes → straight quotes
- fix_line_breaks: CRLF, CR, LS, PS, NEL → \n
- fix_surrogates: repair UTF-16 surrogate pairs
- remove_control_chars: strip useless control characters
- Unicode normalization (NFC by default)
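To make the fix_encoding idea concrete, here is a minimal sketch of how one kind of mojibake arises and how a single encode/decode round trip undoes it. The MojibakeSketch module and its function names are hypothetical and not part of Ftfy; the real fixer also weighs many encoding pairs with heuristics, which this sketch does not attempt.

```elixir
defmodule MojibakeSketch do
  # Illustrative helpers only; these are assumptions, not Ftfy functions.

  # Misread the UTF-8 bytes of `text` as Latin-1, which produces mojibake.
  def garble(text) when is_binary(text) do
    :unicode.characters_to_binary(text, :latin1, :utf8)
  end

  # Undo it: encode the text back to Latin-1 bytes, then accept those bytes
  # only if they form valid UTF-8.
  def ungarble(text) when is_binary(text) do
    case :unicode.characters_to_binary(text, :utf8, :latin1) do
      bytes when is_binary(bytes) ->
        if String.valid?(bytes), do: {:ok, bytes}, else: :error

      _not_representable_in_latin1 ->
        :error
    end
  end
end

garbled = MojibakeSketch.garble("só")
IO.inspect(garbled)
# prints "sÃ³"
IO.inspect(MojibakeSketch.ungarble(garbled))
# prints {:ok, "só"}
```

Text that contains characters outside Latin-1, or whose Latin-1 bytes are not valid UTF-8, returns :error here, which is the signal that this particular pair of encodings does not explain the input.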
Other entry points mirror the Python API: fix_and_explain/2,
fix_encoding/2, fix_encoding_and_explain/2, fix_text_segment/2,
apply_plan/2, guess_bytes/1, fix_file/2, and explain_unicode/1. The
Ftfy.Fixes, Ftfy.Badness, Ftfy.Chardata, Ftfy.Codecs, and
Ftfy.Formatting modules expose the lower-level building blocks.
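As a rough illustration of what replaying an explanation plan involves, the sketch below folds a list of {operation, encoding} steps over a string. PlanSketch is a hypothetical stand-in that handles only the two step types seen in the example plan above; its argument order and error handling are assumptions, and it is not the code behind Ftfy.apply_plan/2.

```elixir
defmodule PlanSketch do
  # Hypothetical illustration; only "encode latin-1" and "decode utf-8" are handled.

  # Apply each step in order, threading the intermediate value through.
  def apply_plan(text, plan) when is_binary(text) and is_list(plan) do
    Enum.reduce(plan, text, &step/2)
  end

  # "encode": turn the current text into Latin-1 bytes.
  defp step({"encode", "latin-1"}, text) do
    case :unicode.characters_to_binary(text, :utf8, :latin1) do
      bytes when is_binary(bytes) -> bytes
      _ -> raise ArgumentError, "text cannot be encoded as latin-1"
    end
  end

  # "decode": Elixir strings are already UTF-8 binaries, so decoding
  # reduces to checking that the bytes are valid UTF-8.
  defp step({"decode", "utf-8"}, bytes) do
    if String.valid?(bytes) do
      bytes
    else
      raise ArgumentError, "bytes are not valid UTF-8"
    end
  end
end

plan = [{"encode", "latin-1"}, {"decode", "utf-8"}]
IO.puts(PlanSketch.apply_plan("sÃ³", plan))
# prints: só
```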
Configuration
Pass a keyword list or a %Ftfy.TextFixerConfig{}:
Ftfy.fix_text(text, uncurl_quotes: false)
Ftfy.fix_text(text, %Ftfy.TextFixerConfig{normalization: "NFKC"})
Command line
Build the escript and fix text from a file or stdin:
mix escript.build
echo '✔ No problems' | ./ftfy
./ftfy -e latin-1 broken.txt -o fixed.txt
Installation
Add ftfy to your dependencies in mix.exs:
def deps do
  [
    {:ftfy, "~> 0.1.0"}
  ]
end
Notes on the port
- The encoding-detection data tables (HTML entities, the single-byte charmap
encodings, the fullwidth/halfwidth map, the wcwidth width tables) and the two
large heuristic regexes are generated from the reference implementation by
scripts/gen_data.py into the Ftfy.Data module (internal, undocumented). The
reference package is vendored as a git submodule at vendor/python-ftfy (pinned
to the v6.3.1 tag); run git submodule update --init before regenerating.
- Ftfy.Codecs reimplements the Python library's bad_codecs: the
sloppy-windows-* and related charmap encodings, and the utf-8-variants
(CESU-8 / Java modified UTF-8) decoder, including incremental decoding.
- The behavioral test corpus is read directly from the pinned
vendor/python-ftfy submodule (tests/test_cases.json); the unit tests are ported
from python-ftfy. All 151 "pass" cases and 10 "known failure" cases match the
reference. (Running the tests therefore needs the submodule:
git submodule update --init.)
- One deliberate difference: the BEAM cannot represent lone UTF-16 surrogate
codepoints in a binary, so Ftfy.Fixes.fix_surrogates/1 is effectively a no-op
on valid strings, and explain_unicode/1 omits the Unicode character name (the
BEAM has no names database).
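The surrogate and CESU-8 points above can be illustrated with a small sketch. It shows that a lone surrogate encoded as three bytes is not a valid Elixir string, and how a CESU-8 surrogate pair (six bytes) can be combined into one proper code point. SurrogateSketch and decode_pair/1 are hypothetical names used for illustration; this is not a description of how Ftfy.Codecs is implemented.

```elixir
defmodule SurrogateSketch do
  import Bitwise

  # Illustrative only; an assumption-laden sketch, not Ftfy.Codecs.

  # Decode one CESU-8 surrogate pair at the head of a binary.
  # A high surrogate is encoded as ED A0..AF xx, a low one as ED B0..BF xx.
  def decode_pair(<<0xED, h1, h2, 0xED, l1, l2, rest::binary>>)
      when h1 in 0xA0..0xAF and l1 in 0xB0..0xBF and
             h2 in 0x80..0xBF and l2 in 0x80..0xBF do
    # Rebuild the two UTF-16 surrogates from their 3-byte encodings.
    high = 0xD000 ||| ((h1 &&& 0x3F) <<< 6) ||| (h2 &&& 0x3F)
    low = 0xD000 ||| ((l1 &&& 0x3F) <<< 6) ||| (l2 &&& 0x3F)
    # Combine the pair into a single supplementary-plane code point.
    codepoint = 0x10000 + ((high - 0xD800) <<< 10) + (low - 0xDC00)
    {:ok, <<codepoint::utf8>>, rest}
  end

  def decode_pair(_other), do: :error
end

# A lone surrogate (U+D83D as three bytes) is not a valid string on the BEAM.
IO.inspect(String.valid?(<<0xED, 0xA0, 0xBD>>))
# prints false

# The CESU-8 form of U+1F600 decodes to one ordinary UTF-8 character.
IO.inspect(SurrogateSketch.decode_pair(<<0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x80>>))
# prints {:ok, "😀", ""}
```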
License and credits
This library is a port of ftfy ("fixes text for you"), created by Robyn Speer. ftfy is the result of years of careful work on the messy reality of broken Unicode, and this Elixir port exists only because of it — our deepest thanks to Robyn Speer for building and maintaining the original, and for releasing it under a permissive license.
- Original ftfy: Copyright 2023 Robyn Speer, licensed under the Apache License, Version 2.0 — https://github.com/rspeer/python-ftfy
- This Elixir port: Copyright 2026 FashionUnited, also licensed under the Apache License, Version 2.0.
The data tables and test corpus in this repository are generated from / ported
directly from python-ftfy 6.3.1 and remain the work of the original author.
See LICENSE for the full license text and NOTICE for the attribution and
change notice required by the Apache License.
If you use ftfy in research, please cite the original author's work as described at https://github.com/rspeer/python-ftfy.