Crawler v1.0.0 API Reference
Modules
Crawler - A high performance web crawler in Elixir
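For orientation, a minimal usage sketch based on the library's README (the option value here is illustrative):

```elixir
# Kick off a crawl; returns {:ok, opts} with the normalised crawl options.
{:ok, opts} = Crawler.crawl("http://elixir-lang.org", max_depths: 2)
```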
Crawler.Dispatcher - Dispatches requests to a queue for crawling
Crawler.Dispatcher.Worker - A worker that performs the crawling
Crawler.Fetcher - Fetches pages and performs tasks on them
Crawler.Fetcher.HeaderPreparer - Captures and prepares HTTP response headers
Crawler.Fetcher.Policer - Checks a series of conditions to determine whether it is okay to continue crawling
Crawler.Fetcher.Recorder - Records information about each crawl for internal use
Crawler.Fetcher.Requester - Makes HTTP requests
Crawler.Fetcher.Retrier - Handles retries for failed crawls
Crawler.Fetcher.Retrier.Spec - Spec for defining a fetch retrier
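A custom retrier can be supplied via the :retrier option. A minimal sketch, assuming the spec's callback is perform/2 taking the fetch function and the crawl options (the module name NoRetry is illustrative):

```elixir
defmodule NoRetry do
  @behaviour Crawler.Fetcher.Retrier.Spec

  # Perform the fetch exactly once; on failure, give up instead of retrying.
  def perform(fetch_url, _opts), do: fetch_url.()
end
```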
Crawler.Fetcher.UrlFilter - A placeholder module that lets all URLs pass through
Crawler.Fetcher.UrlFilter.Spec - Spec for defining a URL filter
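A custom filter can replace the pass-through placeholder via the :url_filter option. A sketch, assuming the spec's filter/2 callback returns {:ok, boolean} (the SameHostFilter name and the host are illustrative):

```elixir
defmodule SameHostFilter do
  @behaviour Crawler.Fetcher.UrlFilter.Spec

  # Follow a link only when it stays on the assumed example.com host.
  def filter(url, _opts) do
    {:ok, URI.parse(url).host == "example.com"}
  end
end
```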
Crawler.HTTP - Custom HTTPoison base module for potential customisation
Crawler.Linker - A set of high level functions for making online and offline URLs and links
Crawler.Linker.PathBuilder - Builds a path for a link (either a URL itself or a relative link) based on an input string, which is a URL with or without its protocol
Crawler.Linker.PathExpander - Expands a path by resolving any "." and ".." segments
Crawler.Linker.PathFinder - Finds different components of a given URL, e.g. its domain name, directory path, or full path
Crawler.Linker.PathOffliner - Transforms a link so it can be stored and linked to offline
Crawler.Linker.PathPrefixer - Returns prefixes ("../"s) according to the given URL's structure
Crawler.Options - Options for the crawler
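A sketch of how these options are passed (option names as in the README; the values and the SameHostFilter module sketched earlier are illustrative):

```elixir
Crawler.crawl("http://example.com",
  max_depths: 3,               # how many levels of links to follow
  workers: 10,                 # number of concurrent workers
  interval: 100,               # pause between requests, in ms
  timeout: 5_000,              # per-request timeout, in ms
  user_agent: "MyBot/1.0",     # sent with every request
  url_filter: SameHostFilter   # custom filter module, see above
)
```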
Crawler.Parser - Parses pages and calls a link handler to handle the detected links
Crawler.Parser.CssParser - Parses CSS files
Crawler.Parser.Guarder - Detects whether a page is parsable
Crawler.Parser.HtmlParser - Parses HTML files
Crawler.Parser.LinkParser - Parses links and transforms them if necessary
Crawler.Parser.LinkParser.LinkExpander - Expands a link into a full URL
Crawler.Parser.Spec - Spec for defining a parser
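A custom parser can be swapped in via the :parser option. A sketch, assuming the spec's parse/1 callback receives a page and returns {:ok, page} (LoggingParser is an illustrative name):

```elixir
defmodule LoggingParser do
  @behaviour Crawler.Parser.Spec

  # Log each page, then return it unchanged; note that bypassing the
  # default parser also bypasses its link detection.
  def parse(page) do
    IO.puts("parsed #{page.url}")
    {:ok, page}
  end
end
```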
Crawler.QueueHandler - Handles the queueing of crawl requests
Crawler.Scraper - A placeholder module that demonstrates the scraping interface
Crawler.Scraper.Spec - Spec for defining a scraper
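A sketch of a scraper plugged in via the :scraper option, assuming the spec's scrape/1 callback receives a Crawler.Store.Page and returns {:ok, page} (the TitleScraper name and the regex are illustrative):

```elixir
defmodule TitleScraper do
  @behaviour Crawler.Scraper.Spec

  # Extract the <title> of each crawled page with a naive regex.
  def scrape(%Crawler.Store.Page{url: url, body: body} = page) do
    case Regex.run(~r{<title>(.*?)</title>}s, body) do
      [_, title] -> IO.puts("#{url}: #{title}")
      _ -> :noop
    end

    {:ok, page}
  end
end
```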
Crawler.Snapper - Stores crawled pages offline
Crawler.Snapper.DirMaker - Makes a new (nested) folder according to the options provided
Crawler.Snapper.LinkReplacer - Replaces links found in a page so they work offline
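The Snapper modules are engaged when a save path is provided. A sketch (the path and asset list are illustrative; option names per the README):

```elixir
# Save crawled pages (and selected asset types) under /tmp/snapshot;
# links inside saved pages are rewritten so the copy browses offline.
Crawler.crawl("http://example.com",
  save_to: "/tmp/snapshot",
  assets: ["css", "js", "images"]
)
```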
Crawler.Store - An internal data store for information related to each crawl
Crawler.Store.Page - An internal struct for keeping the URL and content of a crawled page
Crawler.Worker - Handles the crawl tasks
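Workers run under a supervision tree; per the README, the opts returned by Crawler.crawl/2 can be used to control a running crawl:

```elixir
{:ok, opts} = Crawler.crawl("http://elixir-lang.org")

Crawler.pause(opts)   # pause dispatching of new requests
Crawler.resume(opts)  # resume crawling
Crawler.stop(opts)    # stop the crawl
```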