Html2Markdown.Parser (html2markdown v0.3.0)
Handles HTML preprocessing and parsing operations.
This module is responsible for:
- Parsing HTML content using Floki
- Filtering out non-content elements
- Preparing the document tree for conversion
Filtering Strategy
The parser removes elements in two ways:
- Tag-based filtering: Removes elements like `<script>`, `<style>`, `<nav>`
- Class-based filtering: Removes elements with navigation classes like "footer", "sidebar"
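
The two rules above can be sketched on a Floki-style tree, where each element is a `{tag, attributes, children}` tuple. This is a minimal illustration, not the library's implementation: the module name `FilterSketch` and both filter lists are assumptions, and the parsing step is left out so the snippet runs with the standard library alone.

```elixir
defmodule FilterSketch do
  # Hypothetical filter lists; the library's real lists may differ.
  @skip_tags MapSet.new(["script", "style", "nav"])
  @skip_classes MapSet.new(["footer", "sidebar"])

  # Walks a list of nodes and drops every element matched by either rule.
  def filter(nodes) when is_list(nodes) do
    Enum.flat_map(nodes, &filter_node/1)
  end

  # Element nodes are {tag, attributes, children} tuples with a string tag.
  defp filter_node({tag, attrs, children}) when is_binary(tag) do
    if skip_tag?(tag) or skip_class?(attrs) do
      []
    else
      [{tag, attrs, filter(children)}]
    end
  end

  # Text and all other node kinds (comments, doctype, ...) pass through.
  defp filter_node(other), do: [other]

  # Tag-based rule: the tag name itself is in the skip set.
  defp skip_tag?(tag), do: MapSet.member?(@skip_tags, tag)

  # Class-based rule: any class in the class attribute is in the skip set.
  defp skip_class?(attrs) do
    case List.keyfind(attrs, "class", 0) do
      {"class", value} ->
        value
        |> String.split()
        |> Enum.any?(&MapSet.member?(@skip_classes, &1))

      nil ->
        false
    end
  end
end

tree = [
  {"nav", [], ["Menu"]},
  {"div", [{"class", "main sidebar"}], ["Ads"]},
  {"article", [], [{"script", [], ["x()"]}, {"p", [], ["Hello"]}]}
]

IO.inspect(FilterSketch.filter(tree))
# prints [{"article", [], [{"p", [], ["Hello"]}]}]
```

Filtering recurses into children, so a `<script>` nested inside kept content is removed as well.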
Performance
Uses MapSet so that checking whether a tag or class should be filtered is an O(1) membership lookup rather than a list scan.
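
As a rough illustration of the lookup, the snippet below builds the sets once and then tests membership. The set contents and variable names are assumptions made for the example, not the library's actual configuration.

```elixir
# Hypothetical skip lists, built once up front.
skip_tags = MapSet.new(~w(script style nav))
skip_classes = MapSet.new(~w(footer sidebar))

# A tag check is a single membership test against the set.
script_hit = MapSet.member?(skip_tags, "script")
paragraph_hit = MapSet.member?(skip_tags, "p")

# An element can carry several classes; it matches when its classes
# share at least one member with the skip set.
element_classes = MapSet.new(String.split("widget sidebar"))
class_hit = not MapSet.disjoint?(element_classes, skip_classes)

IO.inspect({script_hit, paragraph_hit, class_hit})
# prints {true, false, true}
```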
Types
@type html_tree() :: [Floki.html_node()]
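
For orientation, a value of this type is a list of Floki nodes: elements are `{tag, attributes, children}` tuples and text nodes are plain binaries. The tree below is written by hand for illustration and is not output produced by this module.

```elixir
# A hand-written tree with the shape of html_tree().
tree = [
  {"p", [{"class", "intro"}], ["Hello ", {"strong", [], ["world"]}]}
]

# Pattern matching pulls the single top-level element apart.
[{tag, attrs, children}] = tree

# The children mix a text node and a nested element.
[text, {child_tag, _child_attrs, child_children}] = children

IO.inspect({tag, attrs, text, child_tag, child_children})
# prints {"p", [{"class", "intro"}], "Hello ", "strong", ["world"]}
```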
Functions
@spec preprocess_content(String.t(), Html2Markdown.Options.t()) :: html_tree()
Preprocesses HTML content by parsing it and filtering out non-content elements.