ReqCrawl.Robots (ReqCrawl v0.1.0)
A Req plugin to parse robots.txt files.

You can attach this plugin to any `%Req.Request` you use for a crawler and it will only run against URLs with a path of `/robots.txt`.
It outputs a map with the following fields:

- `:errors` - A list of any errors encountered during parsing
- `:sitemaps` - A list of the sitemaps
- `:rules` - A map of the rules with User-Agents as the keys and maps with the following fields as the values:
  - `:allow` - A list of the allowed paths
  - `:disallow` - A list of the disallowed paths
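
For example, attaching the plugin and reading the parsed map from the response body might look like the sketch below. This assumes the plugin follows the usual Req plugin convention of exposing an `attach` function and that, by default, the parsed output replaces the response body; the output values shown are illustrative.

```elixir
# A minimal sketch, assuming ReqCrawl.Robots exposes the conventional
# Req plugin attach function and writes the parsed map over the body
# by default.
req =
  Req.new(url: "https://example.com/robots.txt")
  |> ReqCrawl.Robots.attach()

resp = Req.get!(req)

# With the default output target, resp.body holds the parsed map, e.g.:
# %{
#   errors: [],
#   sitemaps: ["https://example.com/sitemap.xml"],
#   rules: %{
#     "*" => %{allow: ["/"], disallow: ["/admin"]}
#   }
# }
```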
Options
- `:robots_output_target` - Where to store the parsed output. Defaults to `:body`
  - `:body` - Overwrites the existing body
  - `:header` - Stores in the response headers under the `:robots` key
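
To keep the original response body intact, the output can be redirected to the response headers instead. A sketch, assuming `attach` accepts the option documented above and that the parsed map can be read back from the headers under the `:robots` key:

```elixir
# A sketch of the :header output target, assuming attach/2 accepts the
# :robots_output_target option.
req =
  Req.new(url: "https://example.com/robots.txt")
  |> ReqCrawl.Robots.attach(robots_output_target: :header)

resp = Req.get!(req)

# The parsed map is stored under the :robots key in the response headers,
# leaving resp.body untouched. Exact access depends on how the response
# represents headers.
parsed = resp.headers[:robots]
```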