scrape v2.0.0 API Reference
Modules
Refines a given Website struct into a fully analyzed content article. Website parsing only inspects the HTML structure, whereas this module processes the plain-text content extracted from the HTML.
A minimal abstraction that provides basic jQuery-like selectors. Getters are sufficient for now, since setters are irrelevant for scraping. The HTML nodes themselves are unimportant; only the actual content matters.
These functions transform a given list of string results into specific subsets, which is useful for normalizing results from Floki.
Common helpers for normalizing and modifying URLs.
Compiles a list of “stopwords” into a list-filtering function. Stopwords are words that carry little meaning on their own and can be skipped when calculating tags from a text.
Calculates relevant keywords (aka tags) from a given text.
Small helper functions for dealing with plain text, sanitizing HTML snippets, and the like.
Every function in this module takes an HTML string and returns data extracted from it, mostly strings. Floki is used to parse the raw HTML.
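The stopword and tag modules describe a two-step pipeline: compile a stopword list into a filter function, then rank the words that survive it. The sketch below illustrates that idea in Python rather than the library's own Elixir. The names `compile_stopwords` and `top_tags` are hypothetical, and ranking by raw word frequency is an assumption; the library's actual tokenization and scoring may differ.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List


def compile_stopwords(stopwords: Iterable[str]) -> Callable[[List[str]], List[str]]:
    """Hypothetical helper: turn a stopword list into a list-filtering function."""
    # A frozenset gives constant-time membership checks for each word.
    blocked = frozenset(word.lower() for word in stopwords)

    def filter_words(words: List[str]) -> List[str]:
        # Keep only words that are not stopwords (case-insensitive comparison).
        return [word for word in words if word.lower() not in blocked]

    return filter_words


def top_tags(text: str, stopword_filter: Callable[[List[str]], List[str]], limit: int = 3) -> List[str]:
    """Hypothetical helper: rank non-stopwords by raw frequency (an assumed scoring)."""
    # Lowercase the text and treat every run of ASCII letters as a word.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(stopword_filter(words))
    # most_common orders ties by first occurrence in the filtered word list.
    return [word for word, _ in counts.most_common(limit)]


english_filter = compile_stopwords(["the", "a", "is", "and", "of"])
print(top_tags("The cat and the dog. The cat is a cat of the house.", english_filter))
# prints ['cat', 'dog', 'house']
```

Compiling the list once into a closure means the stopword set is built a single time and the returned function can be reused across many texts.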
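URL helpers in a scraper typically resolve relative links found in a page against the page's own URL and normalize the result so equivalent URLs compare equal. The following is a minimal Python sketch of that idea using the standard `urllib.parse` module, not the library's Elixir implementation. The function name `normalize_url` is hypothetical, and the chosen rules (lowercase scheme and host, default path `/`, drop the fragment) are assumptions about what "normalizing" means here.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize_url(base: str, link: str) -> str:
    """Hypothetical helper: resolve `link` against `base` and normalize the result."""
    # Resolve relative links such as "../about" against the page URL.
    absolute = urljoin(base, link)
    parts = urlsplit(absolute)
    # Scheme and host are case-insensitive, so lowercase them. Note that this
    # simple sketch lowercases the whole netloc, including any user:password part.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # An empty path is treated as "/"; the query is kept and the fragment dropped.
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(normalize_url("https://Example.com/blog/post", "../about?x=1#team"))
# prints https://example.com/about?x=1
```

Dropping the fragment is a deliberate choice for scraping: fragments only address a position within a page, so two links differing only by fragment point at the same document.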