HTML2Text (HTML2Text v0.2.0)
A natively implemented HTML to plain text converter built on a Rust NIF.
This module converts HTML documents to plain text with configurable line-width wrapping. It uses the Rust html2text crate under the hood for high-performance HTML parsing and text extraction.
The converter handles HTML entities, removes tags, and formats the output as readable plain text while preserving the logical structure of the content.
Summary
Functions
Converts HTML content to plain text.
Converts HTML content to plain text, raising on failure.
Types
@type opts() :: [
  width: pos_integer() | :infinity,
  decorate: boolean(),
  link_footnotes: boolean(),
  table_borders: boolean(),
  pad_block_width: boolean(),
  allow_width_overflow: boolean(),
  min_wrap_width: pos_integer(),
  raw: boolean(),
  wrap_links: boolean(),
  unicode_strikeout: boolean()
]
Functions
@spec convert(html :: String.t(), opts()) :: {:ok, text :: String.t()} | {:error, reason :: String.t()}
Converts HTML content to plain text.
Options
:width - Maximum line width (positive integer or :infinity). Defaults to 80. Setting it to :infinity disables line wrapping and outputs the entire text on a single line.
:decorate - Enables text decorations like bold or italic. Boolean, defaults to true. When false, output is plain text without styling.
:link_footnotes - Adds numbered link footnotes at the end of the text. Boolean, defaults to true. When false, links are omitted.
:table_borders - Shows ASCII borders around table cells. Boolean, defaults to true. When false, tables render without borders.
:pad_block_width - Pads blocks with spaces to align text to full width. Boolean, defaults to false. Useful for fixed-width layouts.
:allow_width_overflow - Allows lines to exceed the specified width if wrapping is impossible. Boolean, defaults to false. Prevents errors when content can't fit.
:min_wrap_width - Minimum length of text chunks when wrapping lines. Integer ≥ 1, defaults to 3. Helps avoid awkwardly narrow wraps.
:raw - Enables raw mode with minimal processing and formatting. Boolean, defaults to false. Produces plain, raw text output.
:wrap_links - Wraps long URLs or links onto multiple lines. Boolean, defaults to true. When false, links stay on a single line and may overflow.
:unicode_strikeout - Uses Unicode characters for strikeout text. Boolean, defaults to true. When false, strikeout renders in simpler styles.
Examples
iex> html = "<h1>Title</h1><p>Some paragraph text.</p>"
...> HTML2Text.convert(html, width: 15)
{:ok, "# Title\n\nSome paragraph\ntext.\n"}
iex> HTML2Text.convert("<b>Important</b>", decorate: false)
{:ok, "Important\n"}
iex> HTML2Text.convert("<table><tr><td>A</td><td>B</td></tr></table>", [])
{:ok, "─┬─\nA│B\n─┴─\n"}
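Because convert/2 returns a tagged tuple, callers usually match on both outcomes. The following is a minimal sketch of combining several of the options above and handling the result; the HTML snippet and the example.com URL are placeholders, and no particular output is implied.

```elixir
# Placeholder input; any HTML string works here.
html = """
<h1>Release notes</h1>
<p>See the <a href="https://example.com/changelog">changelog</a> for details.</p>
"""

# Disable wrapping, text decorations, and link footnotes.
opts = [width: :infinity, decorate: false, link_footnotes: false]

case HTML2Text.convert(html, opts) do
  # Success: `text` is the converted plain text.
  {:ok, text} -> IO.puts(text)
  # Failure: per the spec, `reason` is a string describing the error.
  {:error, reason} -> IO.puts("conversion failed: " <> reason)
end
```

Passing the options as a keyword list keeps the call site readable and lets the same list be reused across calls.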
Converts HTML content to plain text, raising on failure.
This function behaves like convert/2, but raises an error if conversion fails.
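The raising variant suits pipelines where a failed conversion should abort the caller. The sketch below assumes the function follows the usual Elixir naming convention and is exposed as HTML2Text.convert!/2. Neither that name nor the exception type is shown in this section, so both are assumptions, and the rescue clause matches any exception.

```elixir
# Assumption: the raising variant is named HTML2Text.convert!/2.
html = "<p>Hello, <b>world</b>!</p>"

try do
  # Returns the converted text directly instead of an {:ok, text} tuple.
  text = HTML2Text.convert!(html, width: 40)
  IO.puts(text)
rescue
  # The concrete exception type is not documented here, so rescue any exception.
  error -> IO.puts("conversion failed: " <> Exception.message(error))
end
```

When a failure is expected and should be handled locally, matching on the result of convert/2 is usually the simpler choice.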