Sputnik v0.2.1 Crawl

This module exposes a synchronous and an asynchronous way to find all hrefs in an HTML body string.

Summary

Functions

start(body, request_url)

Finds all links in the given HTML body string.

start(body, request_url, pid)

Spawns a new process that finds all links in the given HTML body string. It sends back a message to the given pid with the links it found.

Functions

start(body, request_url)

Finds all links in the given HTML body string.

It automatically converts relative URLs to absolute URLs.

Parameters

  • body: the HTML page as a string
  • request_url: the URL of the page, needed to convert relative URLs to absolute ones
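
The relative-to-absolute conversion described above can be illustrated with the Elixir standard library. This is a minimal sketch, not Sputnik's own code: the module name `AbsolutizeSketch`, the function `absolutize/2`, and the example URLs are hypothetical, and it assumes the hrefs have already been extracted from the page.

```elixir
# Illustrative sketch only; not Sputnik's implementation.
defmodule AbsolutizeSketch do
  # Resolve each href against the page URL using the standard library's
  # URI.merge/2. Hrefs that are already absolute keep their own host.
  def absolutize(hrefs, request_url) do
    Enum.map(hrefs, fn href ->
      request_url |> URI.merge(href) |> URI.to_string()
    end)
  end
end

hrefs = ["/about", "post/1", "https://elixir-lang.org/"]

hrefs
|> AbsolutizeSketch.absolutize("http://example.com/blog/")
|> IO.inspect()
```

`URI.merge/2` raises an `ArgumentError` when the base URL is not absolute, which is one reason a function like this needs the full page URL as `request_url`.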

start(body, request_url, pid)

Spawns a new process that finds all links in the given HTML body string. It sends back a message to the given pid with the links it found.

It automatically converts relative URLs to absolute URLs.

Parameters

  • body: the HTML page as a string
  • request_url: the URL of the page, needed to convert relative URLs to absolute ones
  • pid: the pid that will receive a message with the found links
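
The spawn-and-reply pattern described above can be sketched with plain Elixir processes. This is a hedged illustration rather than Sputnik's code: the module `AsyncSketch`, its functions, the regex-based href extraction, and in particular the `{:links, links}` message shape are all assumptions, since this page does not document the format of the message that is sent back.

```elixir
# Illustrative sketch only; not Sputnik's implementation.
defmodule AsyncSketch do
  # Stand-in for the link-finding work. A real crawler would use an HTML
  # parser; a regex keeps this sketch dependency-free.
  def find_links(body, request_url) do
    ~r/href="([^"]*)"/
    |> Regex.scan(body, capture: :all_but_first)
    |> List.flatten()
    |> Enum.map(fn href -> request_url |> URI.merge(href) |> URI.to_string() end)
  end

  # Spawn a process that does the work and reports the result to `pid`.
  # The {:links, links} tuple is an assumed message shape.
  def find_links_async(body, request_url, pid) do
    spawn(fn -> send(pid, {:links, find_links(body, request_url)}) end)
  end
end

AsyncSketch.find_links_async(~s(<a href="/a">A</a>), "http://example.com/", self())

# The caller is free to do other work here before collecting the reply.
receive do
  {:links, links} -> IO.inspect(links)
after
  1_000 -> IO.puts("no reply within one second")
end
```

Passing `self()` as the pid lets the calling process keep working while the page is scanned, and the `after` clause keeps it from blocking forever if no reply arrives.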