crawlie v0.3.0

Crawlie
The simple Elixir web crawler.
Summary

Functions

crawl(source, parser_logic, options \\ []) - Crawls the urls provided in source, using the Crawlie.ParserLogic provided in parser_logic.
Functions
crawl(source, parser_logic, options \\ [])
crawl(Stream.t, module, Keyword.t) :: Experimental.Flow.t
Crawls the urls provided in source, using the Crawlie.ParserLogic implementation provided in parser_logic.

The options are used to tweak the crawler’s behaviour. You can use most of the HTTPoison options, as well as Crawlie-specific options.
Arguments

- source - a Stream or an Enum containing the urls to crawl
- parser_logic - a Crawlie.ParserLogic behaviour implementation
- options - a Keyword List of options
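For example, a minimal invocation could look like the following sketch. MyParser is a hypothetical module standing in for your own Crawlie.ParserLogic implementation, and the urls are placeholders:

```elixir
# MyParser is assumed to implement the Crawlie.ParserLogic behaviour.
urls = ["https://example.com/", "https://example.org/"]

# crawl/3 returns a Flow describing the crawl; nothing runs until it is consumed.
flow = Crawlie.crawl(urls, MyParser, max_depth: 2)

# Consuming the Flow runs the crawl and collects the extracted results.
results = Enum.to_list(flow)
```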
Crawlie-specific options

- :http_client - module implementing the Crawlie.HttpClient behaviour, used to make the requests. If not provided, defaults to Crawlie.HttpClient.HTTPoisonClient.
- :mock_client_fun - if you’re using the Crawlie.HttpClient.MockClient, this is the url -> {:ok, body :: String.t} | {:error, term} function simulating making the requests. See Crawlie.HttpClient.MockClient for details.
- :max_depth - maximum crawling “depth”. 0 by default.
- :max_retries - maximum number of times Crawlie will try to fetch any individual page before giving up. 3 by default.
- :fetch_phase - Flow partition configuration for the fetching phase of the crawling Flow. It should be a Keyword List containing any subset of the :min_demand, :max_demand and :stages properties. For the meaning of these options, see the Flow documentation. An example appears in the sketch after this list.
- :process_phase - same as :fetch_phase, but for the processing (page parsing, data and link extraction) phase of the process.
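Putting these options together, a test-oriented setup could look like the following sketch. The parser module, urls and response bodies are invented for illustration; only the option names come from the list above:

```elixir
# A function simulating HTTP requests, in the shape :mock_client_fun expects:
# url -> {:ok, body :: String.t} | {:error, term}
mock_fun = fn
  "https://example.com/" -> {:ok, "<html><body>hello</body></html>"}
  _other_url -> {:error, :not_found}
end

Crawlie.crawl(
  ["https://example.com/"],
  MyParser,  # hypothetical Crawlie.ParserLogic implementation
  http_client: Crawlie.HttpClient.MockClient,
  mock_client_fun: mock_fun,
  max_depth: 1,
  max_retries: 5,
  # tune the Flow partitions for each phase of the crawl
  fetch_phase: [stages: 4, min_demand: 1, max_demand: 10],
  process_phase: [stages: 2]
)
```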