Crawler v1.1.1 API Reference
Modules
A high-performance web crawler in Elixir (a basic usage sketch follows this module list).
Dispatches requests to a queue for crawling.
A worker that performs the crawling.
Fetches pages and performs tasks on them.
Captures and prepares HTTP response headers.
Modifies request options and headers before dispatch.
Checks a series of conditions to determine whether it is okay to continue.
Records information about each crawl for internal use.
Makes HTTP requests.
Handles retries for failed crawls.
Spec for defining a fetch retrier.
A placeholder module that lets all URLs pass through.
Spec for defining a URL filter (a custom filter sketch follows this module list).
Custom HTTPoison base module for potential customisation.
A set of high-level functions for constructing online and offline URLs and links.
Builds a path for a link (either a full URL or a relative link) based on the input string, a URL with or without its protocol.
Expands the path by resolving any "." and ".." segments.
Finds different components of a given URL, e.g. its domain name, directory path, or full path.
Transforms a link to be storeable and linkable offline.
Returns prefixes (a series of "../") according to the given URL's structure.
Options for the crawler.
Parses pages and calls a link handler to handle the detected links.
Parses CSS files.
Detects whether a page is parsable.
Parses HTML files.
Parses links and transforms them if necessary.
Expands a link into a full URL.
Spec for defining a parser.
Handles the queueing of crawl requests.
A placeholder module that demonstrates the scraping interface.
Spec for defining a scraper (a custom scraper sketch follows this module list).
Stores crawled pages offline.
Makes a new (nested) folder according to the options provided.
Replaces links found in a page so they work offline.
An internal data store for information related to each crawl.
An internal struct for keeping the url and content of a crawled page.
Handles the crawl tasks.
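The sketches below are illustrative rather than verbatim documentation. The first shows the typical entry point: starting a crawl with Crawler.crawl/2 and a keyword list of options. The option names and values shown are assumptions and should be checked against the options module listed above.

```elixir
# A minimal usage sketch, assuming Crawler.crawl/2 accepts a URL and a
# keyword list of options (the option names below are assumptions; see
# the crawler's options module for the authoritative list and defaults).
{:ok, _opts} = Crawler.crawl("http://elixir-lang.org",
  max_depths: 2,             # how many levels of links to follow
  workers:    10,            # number of concurrent workers
  interval:   100,           # pause (ms) between dispatched requests
  timeout:    5_000,         # per-request timeout (ms)
  user_agent: "MyCrawler/1.0",
  save_to:    "/tmp/crawls"  # store fetched pages offline
)
```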
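Because URL filtering is pluggable, a custom module can replace the pass-through placeholder listed above. The sketch assumes the behaviour expects a filter/2 callback that receives the URL and the crawl options and returns {:ok, boolean}; the callback name, return shape, and the :url_filter option are assumptions to verify against Crawler.Fetcher.UrlFilter.Spec.

```elixir
defmodule MyApp.UrlFilter do
  # A sketch of a custom URL filter, assuming the behaviour defines a
  # filter/2 callback returning {:ok, true | false} (an assumption; check
  # Crawler.Fetcher.UrlFilter.Spec for the exact callback signature).
  @behaviour Crawler.Fetcher.UrlFilter.Spec

  @impl true
  def filter(url, _opts) do
    # Only continue crawling URLs on the target host.
    {:ok, String.contains?(url, "elixir-lang.org")}
  end
end

# Plugged in via the (assumed) :url_filter option:
# Crawler.crawl("http://elixir-lang.org", url_filter: MyApp.UrlFilter)
```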
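Scraping is likewise pluggable. This sketch assumes the scraper callback receives the internal page struct listed above (with at least :url and :body fields) and returns {:ok, page}; the scrape/1 arity, the struct fields, and the :scraper option are assumptions to confirm against Crawler.Scraper.Spec and the placeholder scraper module.

```elixir
defmodule MyApp.Scraper do
  # A sketch of a custom scraper, assuming the behaviour defines scrape/1
  # taking a %Crawler.Store.Page{} and returning {:ok, page} (assumptions;
  # see Crawler.Scraper.Spec and the placeholder scraper module).
  @behaviour Crawler.Scraper.Spec

  @impl true
  def scrape(%Crawler.Store.Page{url: url, body: body} = page) do
    # Do something with the fetched body, e.g. extract and store data.
    IO.puts("Scraped #{url} (#{byte_size(body)} bytes)")
    {:ok, page}
  end
end

# Plugged in via the (assumed) :scraper option:
# Crawler.crawl("http://elixir-lang.org", scraper: MyApp.Scraper)
```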