Sputnik v0.2.0 Queue
This module crawls all pages linked from an initial URL and returns them as a list of tuples.
The crawler never leaves the host of the given URL.
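The host restriction can be pictured as a filter applied to every discovered link. The sketch below is a hypothetical helper, not part of Sputnik's documented API. It assumes links may be relative, so it resolves them with the standard library's URI.merge/2 before comparing hosts.

```elixir
defmodule HostFilterSketch do
  # Hypothetical helper, NOT part of Sputnik's documented API.
  # Keeps only links whose host matches the host of the initial URL.

  # Returns true when `link` resolves to the same host as `base_url`.
  # Relative links such as "/about" are resolved against `base_url` first.
  def same_host?(base_url, link) do
    resolved = URI.merge(base_url, link)
    resolved.host == URI.parse(base_url).host
  end

  # Drops every link that would lead the crawler to another host.
  def filter_links(base_url, links) do
    Enum.filter(links, &same_host?(base_url, &1))
  end
end
```

For example, filtering ["/a", "https://other.org/c"] against "https://example.com" keeps only "/a". Comparing parsed hosts, instead of string prefixes, avoids accepting look-alike hosts such as "example.com.evil.org".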
Summary
Functions
Asynchronously crawls all pages linked from the initial URL
Functions
Asynchronously crawls all pages linked from the initial URL.
It returns a list of tuples, each tuple containing:
- status code
- page URL
- map of CSS selectors to their counts
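A caller will usually post-process this list. The following is a minimal sketch, assuming each element has the shape {status_code, page_url, %{selector => count}} with an integer status code, in the order listed above. The module and function names are hypothetical.

```elixir
defmodule ResultSketch do
  # Hypothetical post-processing of the crawler's result list.
  # Assumes each element is {status_code, page_url, %{selector => count}}.

  # URLs of pages that answered with HTTP 200; tuples that do not
  # match the pattern are skipped by the comprehension.
  def ok_pages(results) do
    for {200, url, _counts} <- results, do: url
  end

  # Sums the per-page selector counts into a single map for the whole site.
  def total_counts(results) do
    Enum.reduce(results, %{}, fn {_status, _url, counts}, acc ->
      Map.merge(acc, counts, fn _selector, a, b -> a + b end)
    end)
  end
end
```

With the made-up input [{200, "https://example.com/", %{"a" => 12, "img" => 3}}, {404, "https://example.com/missing", %{}}, {200, "https://example.com/about", %{"a" => 5}}], ok_pages/1 keeps the two 200 pages and total_counts/1 yields %{"a" => 17, "img" => 3}.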
Parameters
url
: the initial URL to crawl
query
: list of valid CSS selectors as strings
options
: keyword list of options, like [{:connections, 10}]
sputnik_pid
: the pid that will receive the output
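The options parameter is an ordinary Elixir keyword list, so the documented form [{:connections, 10}] is the same list as the shorthand [connections: 10]. The sketch below shows how such an option might be read. The module name and the fallback value of 5 are assumptions for illustration only; this page does not document a default.

```elixir
defmodule ArgsSketch do
  # Hypothetical: this default is an assumption, not Sputnik's documented value.
  @default_connections 5

  # Reads :connections from a keyword list such as [{:connections, 10}],
  # falling back to the assumed default when the key is absent.
  def connections(options) when is_list(options) do
    Keyword.get(options, :connections, @default_connections)
  end
end
```

When the calling process should receive the output itself, passing self() as sputnik_pid is the usual Elixir idiom; the exact message format it will receive is not described here.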