Crawler v0.3.0 API Reference

Modules

A high-performance web crawler in Elixir
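
For orientation, here is a minimal usage sketch based on the project's README; the exact options accepted by v0.3.0 may differ:

```elixir
# Minimal usage sketch. The `max_depths` option name follows the
# project README and is an assumption for this exact version.
Crawler.crawl("http://elixir-lang.org", max_depths: 2)
```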

Dispatches requests to a queue for crawling

A worker that performs the crawling

Fetches pages and performs tasks on them

Checks a series of conditions to determine whether it is okay to continue, i.e. to allow Crawler.Fetcher.fetch/1 to begin its tasks

Records information about each crawl for internal use

Makes HTTP requests

Handles retries for failed crawls

Spec for defining a fetch retrier
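
A custom retrier would implement this spec. The sketch below assumes the conventional Elixir behaviour pattern and a `perform/2` callback that receives the fetch function; that signature is an assumption, not a documented fact:

```elixir
# Hypothetical sketch: a retrier that runs the fetch once and never
# retries. The `perform/2` callback shape is an assumption; consult
# Crawler.Fetcher.Retrier.Spec for the actual callback it defines.
defmodule MyApp.NoRetry do
  @behaviour Crawler.Fetcher.Retrier.Spec

  def perform(fetch_fun, _opts), do: fetch_fun.()
end
```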

A placeholder module that lets all URLs pass through

Spec for defining a URL filter
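
A custom URL filter would implement this spec. The callback name and return shape below are assumptions inferred from the placeholder module above:

```elixir
# Hypothetical sketch: only crawl URLs on a single domain. The
# `filter/2` signature and `{:ok, boolean}` return are assumptions;
# consult Crawler.Fetcher.UrlFilter.Spec for the actual callback.
defmodule MyApp.SameDomainFilter do
  @behaviour Crawler.Fetcher.UrlFilter.Spec

  def filter(url, _opts) do
    {:ok, URI.parse(url).host == "example.com"}
  end
end
```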

Custom HTTPoison base module for potential customisation

A set of high-level functions for making online and offline URLs and links

Builds a path for a link (which can itself be a URL or a relative link) based on an input string that is a URL with or without its protocol

Expands a path by resolving any "." and ".." segments
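
To illustrate the idea (a generic sketch, not this module's actual implementation): "." segments are dropped, and ".." removes the previously kept segment:

```elixir
# Generic dot-segment expansion sketch, for illustration only.
defmodule DotExpansionSketch do
  def expand(path) do
    path
    |> String.split("/")
    |> Enum.reduce([], fn
      ".", acc -> acc    # "." keeps the current directory
      "..", [] -> []     # nothing left to pop
      "..", acc -> tl(acc)  # ".." pops the previous segment
      seg, acc -> [seg | acc]
    end)
    |> Enum.reverse()
    |> Enum.join("/")
  end
end

# DotExpansionSketch.expand("a/b/../c/./d") returns "a/c/d"
```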

Finds different components of a given URL, e.g. its domain name, directory path, or full path
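
For reference, Elixir's standard URI module exposes the same kinds of components; this is standard-library usage, not this module's own API:

```elixir
uri = URI.parse("https://example.com/dir/page.html")
uri.host                # "example.com"
uri.path                # "/dir/page.html"
Path.dirname(uri.path)  # "/dir"
```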

Transforms a link to be storable and linkable offline

Returns prefixes ("../") according to the given URL's structure

Options for the crawler
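
Options are passed as a keyword list to Crawler.crawl/2. The names below come from the project's README; their availability in v0.3.0 is an assumption:

```elixir
# Option names taken from the project's README; treat them as an
# assumption for this exact version.
Crawler.crawl("http://elixir-lang.org",
  max_depths: 3,   # how many levels of links to follow
  workers: 10,     # number of concurrent workers
  interval: 100,   # interval between requests, in milliseconds
  timeout: 5_000,  # request timeout, in milliseconds
  save_to: "/tmp", # directory to save crawled pages to
  user_agent: "Custom UserAgent"
)
```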

Parses pages and calls a link handler to handle the detected links

Parses CSS files

Detects whether a page is parsable

Parses HTML files

Parses links and transforms them if necessary

Detects the file type of a given link

Expands a link into a full URL
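
The effect can be illustrated with the standard library (this is URI.merge/2, not this module's own API):

```elixir
# Expanding a relative link against the current page's URL.
URI.merge("https://example.com/dir/page.html", "../about.html")
|> to_string()
# "https://example.com/about.html"
```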

Spec for defining a parser

Handles the queueing of crawl requests

Stores crawled pages offline

Makes a new (nested) folder according to the options provided

Replaces links found in a page so they work offline
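
As a rough illustration of the idea (a generic sketch, not this module's implementation):

```elixir
# Generic sketch: rewrite absolute links into relative ones so a
# saved page keeps working offline.
html = ~s(<a href="https://example.com/about.html">About</a>)
String.replace(html, "https://example.com/", "./")
# ~s(<a href="./about.html">About</a>)
```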

An internal data store for information related to each crawl

Starts the crawl tasks

A supervisor for dynamically starting workers