scrape v2.0.0 API Reference

Modules

Refine a given Website struct into a fully analyzed content article. Whereas website parsing only inspects the HTML structure, this module processes the plain-text content extracted from the HTML

A minimal abstraction that provides basic selectors that work similarly to jQuery. Getters are enough for now; setters are irrelevant for scraping. The HTML nodes themselves are also unimportant, only the actual content matters

These functions transform a given list of string results into specific subsets. Useful for normalizing results from Floki
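
As an illustration of this kind of normalization, here is a minimal sketch using only the Elixir standard library. The module `ResultSketch` and its function names are hypothetical and are not taken from scrape's API; the exact subsets the real module offers may differ.

```elixir
defmodule ResultSketch do
  # Hypothetical helper module, for illustration only.

  # Trim every result, drop empty strings, and remove duplicates.
  def clean(results) do
    results
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
    |> Enum.uniq()
  end

  # First usable result, or nil when nothing is left after cleaning.
  def first(results), do: results |> clean() |> List.first()

  # Longest usable result, or nil when nothing is left after cleaning.
  def longest(results) do
    case clean(results) do
      [] -> nil
      cleaned -> Enum.max_by(cleaned, &String.length/1)
    end
  end
end

ResultSketch.clean(["  Title ", "", "Title", "A longer title"])
# => ["Title", "A longer title"]
```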

Common helpers for normalizing and modifying URLs
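
One common normalization step is turning a relative link into an absolute one. The sketch below shows this with `URI.merge/2` from the Elixir standard library; the module `UrlSketch` and the example URLs are assumptions made for illustration and do not describe scrape's own functions.

```elixir
defmodule UrlSketch do
  # Hypothetical helper module, for illustration only.

  # Resolve a possibly relative link against the URL of the page it was
  # found on. The base URL must be absolute, otherwise URI.merge/2 raises.
  def absolute(link, base) do
    base
    |> URI.merge(link)
    |> URI.to_string()
  end
end

UrlSketch.absolute("/img/logo.png", "http://example.com/blog/post")
# => "http://example.com/img/logo.png"
```

Links that are already absolute pass through unchanged, so the same call can be applied to every link on a page.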

Compiles a list of “stopwords” into a list-filtering function. “Stopwords” are words that carry little meaning on their own and can be skipped when calculating tags from a text
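
The idea of compiling a word list into a filter can be sketched with a closure over a `MapSet`. The module `StopwordSketch`, the function name `compile/1`, and the case-insensitive matching are assumptions for this example; the library's actual interface may look different.

```elixir
defmodule StopwordSketch do
  # Hypothetical helper module, for illustration only.

  # Build the lookup set once and return a function that removes
  # stopwords from any list of words (compared case-insensitively).
  def compile(stopwords) do
    set = MapSet.new(stopwords, &String.downcase/1)

    fn words ->
      Enum.reject(words, fn word -> MapSet.member?(set, String.downcase(word)) end)
    end
  end
end

filter = StopwordSketch.compile(["the", "a", "over"])
filter.(["The", "quick", "fox", "jumps", "over", "a", "dog"])
# => ["quick", "fox", "jumps", "dog"]
```

Building the set up front means each later call only pays for one set lookup per word.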

Calculate relevant keywords (aka tags) from a given text
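
A simple way to approximate tag calculation is to count word frequencies after removing stopwords. This is a sketch of that general technique, not a description of scrape's algorithm; the module `TagSketch`, its parameters, and the tie-breaking rule (alphabetical) are all assumptions. It relies on `Enum.frequencies/1`, which requires Elixir 1.10 or later.

```elixir
defmodule TagSketch do
  # Hypothetical helper module, for illustration only.

  # Return the `limit` most frequent non-stopword terms in `text`.
  # Ties are broken alphabetically so the result is deterministic.
  def tags(text, stopwords, limit \\ 3) do
    ignored = MapSet.new(stopwords)

    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> Enum.reject(&MapSet.member?(ignored, &1))
    |> Enum.frequencies()
    |> Enum.sort_by(fn {word, count} -> {-count, word} end)
    |> Enum.take(limit)
    |> Enum.map(fn {word, _count} -> word end)
  end
end

TagSketch.tags("Elixir is fun. Elixir is fast, and fast is good.", ["is", "and"], 2)
# => ["elixir", "fast"]
```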

Small helper functions for dealing with plain text, sanitizing HTML snippets, and the like
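
For a rough idea of what sanitizing an HTML snippet can involve, the sketch below strips tags with a regular expression and collapses whitespace. This is a deliberately naive approach chosen for brevity (it does not decode entities or handle malformed markup), and the module `TextSketch` is hypothetical rather than part of scrape.

```elixir
defmodule TextSketch do
  # Hypothetical helper module, for illustration only.

  # Replace tags with spaces, collapse runs of whitespace, and trim.
  def clean(snippet) do
    snippet
    |> String.replace(~r/<[^>]*>/, " ")
    |> String.replace(~r/\s+/u, " ")
    |> String.trim()
  end
end

TextSketch.clean("<p>Hello   <b>world</b></p>\n<p>again</p>")
# => "Hello world again"
```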

Every function in this module takes an HTML string and returns some data extracted from it, mostly strings. Floki is used for parsing the raw HTML
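
The "HTML string in, extracted strings out" pattern can be sketched directly with Floki's public functions `Floki.parse_document/1`, `Floki.find/2`, `Floki.text/1`, and `Floki.attribute/3`. The module `HtmlSketch`, its function names, and the Floki version requirement are assumptions for this example and are not scrape's API. The script uses `Mix.install/1`, which needs Elixir 1.12 or later and network access on first run.

```elixir
# Assumed dependency version; adjust to whatever your project uses.
Mix.install([{:floki, "~> 0.36"}])

defmodule HtmlSketch do
  # Hypothetical helper module, for illustration only.

  # Text content of the <title> element.
  def title(html) do
    {:ok, document} = Floki.parse_document(html)
    document |> Floki.find("title") |> Floki.text()
  end

  # All href values of <a> elements, in document order.
  def links(html) do
    {:ok, document} = Floki.parse_document(html)
    Floki.attribute(document, "a", "href")
  end
end

html = """
<html>
  <head><title>Example</title></head>
  <body><a href="/about">About</a> <a href="http://example.com/contact">Contact</a></body>
</html>
"""

HtmlSketch.title(html)
HtmlSketch.links(html)
```

Each function parses the string itself, so callers never have to handle HTML nodes, only the extracted content.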