Crawly v0.2.0 Crawly.Middlewares.RobotsTxt

Obey robots.txt

A robots.txt file tells search engine crawlers which pages or files they may or may not request from a site. It is used mainly to keep crawlers from overloading a site with requests.

Please note: the first rule of web crawling is that you do not harm the website. The second rule of web crawling is that you do NOT harm the website.
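The middleware takes effect once it is listed in the `:middlewares` setting of the Crawly application configuration. A minimal sketch, assuming the standard Crawly config keys; the other middlewares shown are common companions from the Crawly docs, not requirements:

```elixir
# config/config.exs
use Mix.Config

config :crawly,
  middlewares: [
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest,
    # Drop any request whose URL is disallowed by the site's robots.txt
    Crawly.Middlewares.RobotsTxt
  ]
```

With this in place, requests to URLs disallowed by a site's robots.txt are dropped from the pipeline before they are fetched.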

Summary

Functions

Callback implementation for Crawly.Pipeline.run/2.

Functions

Callback implementation for Crawly.Pipeline.run/2.