Crawler v1.0.0 API Reference

Modules

Crawler - A high performance web crawler in Elixir.
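For orientation, Crawler.crawl/2 is the library's main entry point, as documented in the project README. A minimal sketch; the {:ok, opts} return shape and the pause/stop calls are assumptions based on the README's examples:

    # Crawl a site, following links up to 2 levels deep.
    {:ok, opts} = Crawler.crawl("http://elixir-lang.org", max_depths: 2)

    # The returned opts can later be used to control the crawl (assumption):
    Crawler.pause(opts)
    Crawler.stop(opts)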

Crawler.Dispatcher - Dispatches requests to a queue for crawling.

Crawler.Dispatcher.Worker - A worker that performs the crawling.

Crawler.Fetcher - Fetches pages and performs tasks on them.

Crawler.Fetcher.HeaderPreparer - Captures and prepares HTTP response headers.

Crawler.Fetcher.Policer - Checks a series of conditions to determine whether it is okay to continue crawling.

Crawler.Fetcher.Recorder - Records information about each crawl for internal use.

Crawler.Fetcher.Requester - Makes HTTP requests.

Crawler.Fetcher.Retrier - Handles retries for failed crawls.

Crawler.Fetcher.Retrier.Spec - Spec for defining a fetch retrier.
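A custom retrier plugs in through this spec. A minimal sketch, assuming the behaviour expects a perform/2 callback that receives the fetch function and the crawl options (the default retrier wraps the fetch in a retry policy); the module name and callback details here are assumptions:

    defmodule NoRetryRetrier do
      @behaviour Crawler.Fetcher.Retrier.Spec

      # Hypothetical retrier: perform the fetch once and never retry.
      def perform(fetch_url, _opts), do: fetch_url.()
    end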

Crawler.Fetcher.UrlFilter - A placeholder module that lets all URLs pass through.

Crawler.Fetcher.UrlFilter.Spec - Spec for defining a URL filter.
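A custom URL filter implements this spec to decide which links get crawled. A sketch, assuming a filter/2 callback that receives the URL and the crawl options and returns {:ok, boolean}; the module name is hypothetical:

    defmodule SameSiteFilter do
      @behaviour Crawler.Fetcher.UrlFilter.Spec

      # Allow only URLs on the starting site; everything else is skipped.
      def filter(url, _opts) do
        {:ok, String.contains?(url, "elixir-lang.org")}
      end
    end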

Crawler.HTTP - Custom HTTPoison base module for potential customisation.

Crawler.Linker - A set of high-level functions for making online and offline URLs and links.

Crawler.Linker.PathBuilder - Builds a path for a link (either a full URL or a relative link) based on an input string, which is a URL with or without its protocol.

Crawler.Linker.PathExpander - Expands a path by resolving any . and .. segments.

Crawler.Linker.PathFinder - Finds the different components of a given URL, e.g. its domain name, directory path, or full path.

Crawler.Linker.PathOffliner - Transforms a link so it is storable and linkable offline.

Crawler.Linker.PathPrefixer - Returns prefixes (../s) according to the given URL's structure.

Crawler.Options - Options for the crawler.
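Options are passed to Crawler.crawl/2 as a keyword list. A sketch using option names documented in the project README; the values shown (and SameSiteFilter) are illustrative assumptions, not defaults:

    Crawler.crawl("http://elixir-lang.org",
      max_depths: 3,              # how many levels of links to follow
      workers: 10,                # number of concurrent workers
      interval: 0,                # milliseconds to wait between requests
      timeout: 5_000,             # per-request timeout in milliseconds
      save_to: "/tmp/crawls",     # directory for offline snapshots (see Crawler.Snapper)
      url_filter: SameSiteFilter  # custom filter (see Crawler.Fetcher.UrlFilter.Spec)
    )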

Crawler.Parser - Parses pages and calls a link handler to handle the detected links.

Crawler.Parser.CssParser - Parses CSS files.

Crawler.Parser.Guarder - Detects whether a page is parsable.

Crawler.Parser.HtmlParser - Parses HTML files.

Crawler.Parser.LinkParser - Parses links and transforms them if necessary.

Crawler.Parser.LinkParser.LinkExpander - Expands a link into a full URL.

Crawler.Parser.Spec - Spec for defining a parser.
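A custom parser implements this spec. A minimal sketch, assuming a parse/1 callback that receives a Crawler.Store.Page struct and returns it in an :ok tuple; the module name is hypothetical:

    defmodule PassthroughParser do
      @behaviour Crawler.Parser.Spec

      # Hypothetical parser that leaves the page untouched.
      def parse(page), do: {:ok, page}
    end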

Crawler.QueueHandler - Handles the queueing of crawl requests.

Crawler.Scraper - A placeholder module that demonstrates the scraping interface.

Crawler.Scraper.Spec - Spec for defining a scraper.
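A custom scraper implements this spec to do something useful with each crawled page. A sketch, assuming a scrape/1 callback that receives a Crawler.Store.Page struct and returns {:ok, page}; the module name is hypothetical:

    defmodule UrlLoggingScraper do
      @behaviour Crawler.Scraper.Spec

      require Logger

      # Log the URL of every page the crawler processes.
      def scrape(%Crawler.Store.Page{url: url} = page) do
        Logger.info("Crawled #{url}")
        {:ok, page}
      end
    end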

Crawler.Snapper - Stores crawled pages offline.

Crawler.Snapper.DirMaker - Makes a new (nested) folder according to the options provided.

Crawler.Snapper.LinkReplacer - Replaces links found in a page so they work offline.

Crawler.Store - An internal data store for information related to each crawl.

Crawler.Store.Page - An internal struct for keeping the URL and content of a crawled page.
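When writing scrapers or parsers it helps to pattern match on this struct. A sketch, assuming url and body fields (the field names are an assumption based on the description above):

    # Hypothetical helper: summarise a crawled page.
    describe_page = fn %Crawler.Store.Page{url: url, body: body} ->
      "#{url} is #{byte_size(body)} bytes"
    end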

Crawler.Worker - Handles the crawl tasks.