Crawler v0.3.0 API Reference
Modules
A high-performance web crawler in Elixir
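For orientation, a crawl is kicked off with Crawler.crawl/2, which takes a URL and a keyword list of options. A minimal sketch; the option names follow the project README, though their availability in v0.3.0 specifically is not verified here:

```elixir
# Crawl a site two levels deep with ten concurrent workers.
Crawler.crawl("http://elixir-lang.org", max_depths: 2, workers: 10)
```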
Dispatches requests to a queue for crawling
A worker that performs the crawling
Fetches pages and performs tasks on them
Checks a series of conditions to determine whether it is okay to continue, i.e. to allow Crawler.Fetcher.fetch/1 to begin its tasks
Records information about each crawl for internal use
Makes HTTP requests
Handles retries for failed crawls
Spec for defining a fetch retrier
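A custom retrier can be swapped in to change how failed crawls are retried. A sketch under assumptions: the perform/2 callback shape shown here (a zero-arity fetch function plus the crawl options) is illustrative, not the confirmed v0.3.0 spec:

```elixir
defmodule NoRetryRetrier do
  @behaviour Crawler.Fetcher.Retrier.Spec

  # Hypothetical callback shape: perform the fetch once, with no retries.
  def perform(fetch_fun, _opts), do: fetch_fun.()
end
```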
A placeholder module that lets all URLs pass through
Spec for defining a URL filter
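Because the built-in filter is a pass-through placeholder, restricting a crawl means supplying your own module. A sketch, assuming a filter/1 callback that returns {:ok, boolean}:

```elixir
defmodule SameDomainFilter do
  @behaviour Crawler.Fetcher.UrlFilter.Spec

  # Assumed callback shape: {:ok, true} to crawl the URL, {:ok, false} to skip it.
  def filter(url) do
    {:ok, String.contains?(url, "example.com")}
  end
end
```

The module would then be handed to the crawler via the url_filter option documented in the project README, e.g. url_filter: SameDomainFilter.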
Custom HTTPoison base module for potential customisation
A set of high-level functions for making online and offline URLs and links
Builds a path for a link (which can be a URL itself or a relative link) based on an input string that is a URL with or without its protocol
Expands the path by expanding any "." and ".." characters
Finds different components of a given URL, e.g. its domain name, directory path, or full path
Transforms a link to be storeable and linkable offline
Returns prefixes ("../") according to the given URL's structure
Options for the crawler
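The options below are taken from the project README; whether every one of them exists in v0.3.0 is an assumption, so treat this as a sketch rather than an exhaustive reference:

```elixir
Crawler.crawl("http://example.com",
  max_depths: 3,      # how many levels of links to follow
  workers:    10,     # number of concurrent workers
  interval:   100,    # wait between requests, in milliseconds
  timeout:    5_000,  # per-request timeout, in milliseconds
  user_agent: "Custom UserAgent"
)
```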
Parses pages and calls a link handler to handle the detected links
Parses CSS files
Detects whether a page is parsable
Parses HTML files
Parses links and transforms them if necessary
Detects the file type of a given link
Expands a link into a full URL
Spec for defining a parser
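A custom parser can wrap or replace the default one, for example to inspect each fetched page. The parse/1 callback shape below, and the delegation to Crawler.Parser.parse/1, are assumptions for illustration:

```elixir
defmodule LoggingParser do
  @behaviour Crawler.Parser.Spec

  # Assumed shape: receive the fetched page, do custom work, then
  # delegate to the default parser so links are still followed.
  def parse(page) do
    IO.inspect(page, label: "fetched")
    Crawler.Parser.parse(page)
  end
end
```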
Handles the queueing of crawl requests
Stores crawled pages offline
Makes a new (nested) folder according to the options provided
Replaces links found in a page so they work offline
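Offline snapshots are enabled by pointing the crawler at a directory; save_to is documented in the project README:

```elixir
# Pages are written under the given directory, with their links
# rewritten so the snapshot can be browsed offline.
Crawler.crawl("http://example.com", save_to: "/tmp/snapshots", max_depths: 2)
```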
An internal data store for information related to each crawl
Starts the crawl tasks
A supervisor for dynamically starting workers
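The worker/supervisor split follows the usual OTP pattern of starting one short-lived worker per queued request. A generic illustration of that pattern using Elixir's DynamicSupervisor, not the library's actual code:

```elixir
# One supervisor, many crawl tasks started on demand.
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

{:ok, _pid} =
  DynamicSupervisor.start_child(sup, {Task, fn ->
    IO.puts("crawling http://example.com")
  end})
```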