Sputnik v0.2.0 Queue

This module crawls every page reachable from an initial URL and returns the results as a list of tuples, one per page.

The crawler never follows links outside the host of the given URL.
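
The host restriction can be pictured as a same-host check on each discovered link. The following is only a minimal sketch of that idea, not Sputnik's actual implementation: the module `HostScopeSketch` and the function `same_host?/2` are hypothetical names introduced for illustration, and the example URLs are placeholders.

```elixir
defmodule HostScopeSketch do
  # Hypothetical helper, not part of Sputnik: returns true when `link`
  # resolves to the same host as `base_url`.
  def same_host?(base_url, link) do
    base = URI.parse(base_url)
    # Resolve relative links against the base URL before comparing hosts.
    target = URI.merge(base, link)
    target.host == base.host
  end
end

HostScopeSketch.same_host?("https://example.com/docs", "/about")
HostScopeSketch.same_host?("https://example.com/docs", "https://other.org/x")
```

Under this sketch a relative link such as `/about` stays in scope, while an absolute link to a different host is skipped. How Sputnik treats subdomains or differing ports is not stated in this page.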

Summary

Functions

Asynchronously crawls all pages linked from the initial URL.

Functions

start(url, query, options, sputnik_pid)

Asynchronously crawls all pages linked from the initial URL.

It returns a list of tuples, each tuple containing:

  • status code
  • page url
  • map of CSS selectors to their counts on that page
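
A list in the shape described above can be processed with ordinary pattern matching. The sketch below assumes the tuple order `{status, url, counts}` given in the list; the module `ResultSketch` and the sample data are hypothetical and do not come from a real crawl.

```elixir
defmodule ResultSketch do
  # Sums the per-page selector counts into a single map.
  def total_counts(results) do
    Enum.reduce(results, %{}, fn {_status, _url, counts}, acc ->
      Map.merge(acc, counts, fn _selector, a, b -> a + b end)
    end)
  end

  # Returns the URLs of pages whose status code is not 200.
  def failed_urls(results) do
    for {status, url, _counts} <- results, status != 200, do: url
  end
end

# Hypothetical sample data, made up for illustration only.
results = [
  {200, "https://example.com/", %{"h1" => 1, "a" => 12}},
  {200, "https://example.com/about", %{"h1" => 1, "a" => 7}},
  {404, "https://example.com/missing", %{"h1" => 0, "a" => 0}}
]

ResultSketch.total_counts(results)
ResultSketch.failed_urls(results)
```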

Parameters

  • url: the initial URL to crawl
  • query: list of valid CSS selectors as strings
  • options: keyword list of options, such as [{:connections, 10}] (equivalently [connections: 10])
  • sputnik_pid: the pid of the process that will receive the output
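
As a small illustration of the options parameter, the sketch below shows that the tuple form and the shorthand form build the same keyword list, and how a value can be read with a fallback. The module `OptionsSketch` is hypothetical, and the fallback of 4 is a placeholder chosen for this example, not Sputnik's documented default.

```elixir
defmodule OptionsSketch do
  # Placeholder fallback for illustration; Sputnik's real default is not
  # stated in this page.
  @fallback_connections 4

  # Reads :connections from a keyword list, using the fallback when absent.
  def connections(options) do
    Keyword.get(options, :connections, @fallback_connections)
  end
end

# Both literals build the same keyword list.
options = [{:connections, 10}]
same? = options == [connections: 10]

OptionsSketch.connections(options)
```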