Crawlie v0.2.0-alpha1

The simple Elixir web crawler.

Functions

crawl(source, parser_logic, options \\ [])

Crawls the urls provided in source, using the Crawlie.ParserLogic provided in parser_logic.

The options are used to tweak the crawler's behaviour. You can use most of the options supported by HTTPoison, as well as Crawlie-specific options.

Arguments

  • source - the urls to crawl
  • parser_logic - a module implementing the Crawlie.ParserLogic behaviour
  • options - a keyword list of options, described below

Crawlie options

  • :http_client - module implementing the Crawlie.HttpClient behaviour to be used to make the requests. If not provided, will default to Crawlie.HttpClient.HTTPoisonClient.
  • :mock_client_fun - If you’re using the Crawlie.HttpClient.MockClient, this would be the url -> {:ok, body :: String.t} | {:error, term} function simulating making the requests.
  • :min_demand, :max_demand - see Flow documentation for details
  • :max_depth - maximum crawling “depth”. 0 by default.
  • :max_retries - maximum number of times Crawlie should try to fetch any individual page before giving up. 3 by default.
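As a sketch of how these options fit together (the parser module `MyParser` and the urls are hypothetical placeholders, not part of the library):

```elixir
# Hypothetical usage sketch. MyParser stands in for a module implementing
# the Crawlie.ParserLogic behaviour; the urls are placeholders.
urls = ["https://example.com/", "https://example.com/about"]

results =
  Crawlie.crawl(urls, MyParser,
    max_depth: 2,    # follow links up to 2 levels deep (default: 0)
    max_retries: 5,  # try each page up to 5 times before giving up (default: 3)
    max_demand: 10   # backpressure tuning, see the Flow documentation
  )
```

Any option not recognized by Crawlie itself would be passed through to the HTTP client, so HTTPoison options such as timeouts can be mixed into the same keyword list.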

is_ok_tuple(arg1)
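This entry carries no documentation; judging by the name alone, it presumably tests whether a term is an `{:ok, _}` tuple. A minimal sketch of such a predicate (this implementation is an assumption, not Crawlie's actual code):

```elixir
defmodule OkTupleSketch do
  # Hypothetical sketch: true for {:ok, _} tuples, false for anything else.
  def is_ok_tuple({:ok, _}), do: true
  def is_ok_tuple(_), do: false
end

OkTupleSketch.is_ok_tuple({:ok, "body"})        # => true
OkTupleSketch.is_ok_tuple({:error, :nxdomain})  # => false
```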