Crawlie v0.5.1

Crawlie

The simple Elixir web crawler.
Summary

Functions

crawl(source, parser_logic, options \\ [])

Crawls the urls provided in source, using the Crawlie.ParserLogic implementation provided in parser_logic.

Functions

crawl(source, parser_logic, options \\ [])

Crawls the urls provided in source, using the Crawlie.ParserLogic implementation provided in parser_logic.
The options are used to tweak the crawler's behaviour. You can use most of the options accepted by HTTPoison, as well as Crawlie-specific options.
Arguments

source - a Stream or an Enum containing the urls to crawl
parser_logic - a Crawlie.ParserLogic behaviour implementation
options - a Keyword List of options (see the usage sketch after this list)
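For orientation, here is a minimal usage sketch. MyParserLogic and the urls are hypothetical placeholders, not part of Crawlie, and the snippet assumes the returned pipeline is a Flow (Crawlie's crawling is Flow-based, as the phase options below indicate), so it can be consumed with Enum.

```elixir
# Minimal usage sketch. MyParserLogic is a hypothetical placeholder for your
# own Crawlie.ParserLogic implementation; the urls are placeholders too.
urls = ["https://example.com/", "https://example.org/"]

results = Crawlie.crawl(urls, MyParserLogic, max_depth: 2, max_retries: 3)

# Assuming the result is a Flow, it implements Enumerable and the crawl
# runs when you force it, e.g.:
results |> Enum.to_list()
```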
Crawlie-specific options

:http_client - module implementing the Crawlie.HttpClient behaviour, used to make the requests. If not provided, defaults to Crawlie.HttpClient.HTTPoisonClient.
:mock_client_fun - if you're using the Crawlie.HttpClient.MockClient, this is the url :: String.t -> {:ok, body :: String.t} | {:error, term} function simulating making the requests. See Crawlie.HttpClient.MockClient for details, and the sketch after this list.
:max_depth - maximum crawling "depth". 0 by default.
:max_retries - maximum number of times Crawlie will try to fetch any individual page before giving up. 3 by default.
:fetch_phase - Flow partition configuration for the fetching phase of the crawling Flow. It should be a Keyword List containing any subset of the :min_demand, :max_demand and :stages properties. For the meaning of these options, see the Flow documentation.
:process_phase - same as :fetch_phase, but for the processing (page parsing, data and link extraction) part of the process.
:pqueue_module - one of the pqueue implementations: :pqueue, :pqueue2, :pqueue3, :pqueue4. Different implementations have different performance characteristics and allow for different :max_depth values. Consult the pqueue docs for details. By default :pqueue3 is used - good performance, and arbitrary :max_depth values are allowed.
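To show how these options compose, below is a hedged sketch of an offline run using the MockClient mentioned above. MyParserLogic, the url, and the HTML body are illustrative assumptions; the option keys are exactly the ones documented in this list.

```elixir
# Offline test sketch using the documented Crawlie-specific options.
# MyParserLogic and the url/body literals are hypothetical placeholders.
mock_fun = fn
  "https://example.com/" -> {:ok, "<html><body>hello</body></html>"}
  _other_url -> {:error, :not_found}
end

Crawlie.crawl(
  ["https://example.com/"],
  MyParserLogic,
  http_client: Crawlie.HttpClient.MockClient,  # use the mock instead of HTTPoison
  mock_client_fun: mock_fun,                   # simulated request function
  max_depth: 3,                                # follow links up to 3 levels deep
  max_retries: 2,                              # give up on a page after 2 tries
  fetch_phase: [stages: 4, max_demand: 10],    # Flow settings for the fetch phase
  process_phase: [stages: 2],                  # Flow settings for the processing phase
  pqueue_module: :pqueue3                      # the default priority queue
)
```

Dropping the :http_client and :mock_client_fun entries switches the crawl back to the default HTTPoisonClient.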