Html2Markdown.Parser (html2markdown v0.3.0)

Handles HTML preprocessing and parsing operations.

This module is responsible for:

  1. Parsing HTML content using Floki
  2. Filtering out non-content elements
  3. Preparing the document tree for conversion

Filtering Strategy

The parser removes elements in two ways:

  • Tag-based filtering: removes non-content elements such as <script>, <style>, and <nav>
  • Class-based filtering: removes elements whose class marks them as navigation, such as "footer" or "sidebar"
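
The two filtering passes can be illustrated with a minimal sketch built directly on Floki. This is not the module's actual implementation: the FilterSketch module, its function names, and the tag and class lists are assumptions made for illustration, and the Floki version requirement is only a guess at a compatible release.

```elixir
# Fetches Floki when run as a standalone script (version requirement is an assumption).
Mix.install([{:floki, "~> 0.36"}])

defmodule FilterSketch do
  # Hypothetical lists for illustration; the library's real defaults may differ.
  @non_content_tags MapSet.new(["script", "style", "nav"])
  @navigation_classes MapSet.new(["footer", "sidebar"])

  # Parses the HTML and drops every element matched by tag or by class.
  def filter(html) do
    {:ok, tree} = Floki.parse_document(html)

    Floki.traverse_and_update(tree, fn
      # Element nodes: returning nil deletes the node and its subtree.
      {tag, attrs, _children} = node when is_binary(tag) ->
        if drop_tag?(tag) or drop_class?(attrs), do: nil, else: node

      # Comments, doctypes, and other node kinds pass through unchanged.
      other ->
        other
    end)
  end

  defp drop_tag?(tag), do: MapSet.member?(@non_content_tags, tag)

  # An element is dropped if any of its space-separated classes is listed.
  defp drop_class?(attrs) do
    case List.keyfind(attrs, "class", 0) do
      {"class", value} when is_binary(value) ->
        value
        |> String.split()
        |> Enum.any?(&MapSet.member?(@navigation_classes, &1))

      _ ->
        false
    end
  end
end

html = ~s(<div><script>x()</script><p>Hi</p><div class="sidebar main">Links</div></div>)
IO.inspect(FilterSketch.filter(html))
```

In this sketch the <script> element is removed by tag and the inner <div> by its "sidebar" class, leaving only the outer <div> and its <p> child.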

Performance

The tags and classes to filter are stored in MapSets, so each membership check is effectively O(1) rather than a linear scan through a list.
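
As a small illustration of the lookup pattern, the sketch below builds a MapSet once at compile time and checks membership against it. The LookupSketch module and its tag list are hypothetical and are not taken from the library.

```elixir
defmodule LookupSketch do
  # Hypothetical tag list for illustration only; built once at compile time.
  @skip_tags MapSet.new(~w(script style nav))

  # A single membership check against the set, with no list traversal.
  def skip?(tag), do: MapSet.member?(@skip_tags, tag)
end

IO.inspect(LookupSketch.skip?("script")) # prints true
IO.inspect(LookupSketch.skip?("p"))      # prints false
```

The alternative, `tag in ["script", "style", "nav"]`, walks the list on every check, which costs more as the list grows and the check runs once per node in the tree.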

Summary

Functions

preprocess_content(content, opts)

Preprocesses HTML content by parsing it and filtering out non-content elements.

Types

html_tree()

@type html_tree() :: [Floki.html_node()]

Functions

preprocess_content(content, opts)

@spec preprocess_content(String.t(), Html2Markdown.Options.t()) :: html_tree()

Preprocesses HTML content by parsing it and filtering out non-content elements.