HTML2Text (HTML2Text v0.1.0)
A natively implemented HTML to plain text converter, built as a Rust NIF.
This module converts HTML documents to plain text with configurable line-width wrapping. It uses the Rust html2text crate under the hood for high-performance HTML parsing and text extraction.
The converter handles HTML entities, removes tags, and formats the output as readable plain text while preserving the logical structure of the content.
Functions
@spec convert(String.t(), pos_integer() | :infinity) :: String.t()
Converts HTML content to plain text with configurable line width.
This function converts HTML content to plain text and optionally wraps lines at the specified width. The width is either a positive integer giving the maximum number of characters per line, or :infinity for unlimited line width.
Parameters
html - A binary containing the HTML content to convert
width - Either a positive integer for line width or :infinity for unlimited width
Return Value
Returns a string containing the plain text representation of the HTML content.
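The spec above accepts only a positive integer or :infinity as the width, and the documentation does not say what happens for other values. The following is a minimal sketch of a caller-side guard, assuming HTML2Text is available as a dependency. The PlainText module, its function names, the error atoms, and the :infinity default are all hypothetical and not part of the library.

```elixir
defmodule PlainText do
  @moduledoc """
  Hypothetical helper (not part of HTML2Text) that checks arguments
  against the documented spec before delegating to HTML2Text.convert/2.
  """

  @type width :: pos_integer() | :infinity

  @doc "Returns {:ok, width} when the width matches the documented spec, :error otherwise."
  @spec validate_width(term()) :: {:ok, width()} | :error
  def validate_width(:infinity), do: {:ok, :infinity}
  def validate_width(width) when is_integer(width) and width > 0, do: {:ok, width}
  def validate_width(_other), do: :error

  @doc "Converts HTML to text, returning a tagged tuple for invalid arguments."
  @spec to_text(term(), term()) ::
          {:ok, String.t()} | {:error, :invalid_html | :invalid_width}
  # Defaulting to :infinity is a choice made for this sketch only.
  def to_text(html, width \\ :infinity)

  def to_text(html, width) when is_binary(html) do
    case validate_width(width) do
      # Only validated arguments ever reach the native function.
      {:ok, valid} -> {:ok, HTML2Text.convert(html, valid)}
      :error -> {:error, :invalid_width}
    end
  end

  def to_text(_html, _width), do: {:error, :invalid_html}
end
```

With this guard, a call such as PlainText.to_text(html, 0) returns {:error, :invalid_width} without invoking the NIF, which can be convenient when the width comes from user input or configuration.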
Examples
# Converting with specific width
iex> html = "<h1>Welcome to Our Amazing Website</h1><p>This is a comprehensive guide that covers everything you need to know about our services and products.</p>"
iex> HTML2Text.convert(html, 30)
"# Welcome to Our Amazing\n# Website\n\nThis is a comprehensive guide\nthat covers everything you\nneed to know about our\nservices and products.\n"
# Converting with unlimited width
iex> html = "<div><strong>Important:</strong> Please read all the terms and conditions carefully before proceeding with your purchase.</div>"
iex> HTML2Text.convert(html, :infinity)
"**Important:** Please read all the terms and conditions carefully before proceeding with your purchase.\n"
# Converting lists and complex HTML
iex> html = "<ul><li>First item with some detailed description</li><li>Second item that also has quite a bit of text</li><li>Third item</li></ul>"
iex> HTML2Text.convert(html, 25)
"* First item with some\n  detailed description\n* Second item that also\n  has quite a bit of text\n* Third item\n"
# Converting tables and structured content
iex> html = "<table><tr><td>Product Name</td><td>Description</td><td>Price</td></tr><tr><td>Widget</td><td>A useful widget for everyday tasks</td><td>$19.99</td></tr></table>"
iex> HTML2Text.convert(html, 50)
"""
───────────┬────────────────────────────────┬─────
Product    │Description                     │Price
Name       │                                │
───────────┼────────────────────────────────┼─────
Widget     │A useful widget for everyday    │$19.9
           │tasks                           │9
───────────┴────────────────────────────────┴─────
"""