ReqCrawl.Sitemap (ReqCrawl v0.2.1)
Gathers all URLs from a Sitemap or SitemapIndex according to the specification described at https://sitemaps.org/protocol.html
Supports the following formats:
- .xml (for sitemap and sitemapindex)
- .txt (for sitemap)
Outputs a 2-tuple of {type, urls}, where type is one of :sitemap or :sitemapindex and urls is a list of URL strings extracted from the body.
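
To make the {type, urls} shape concrete, here is a minimal sketch of how such a tuple could be derived from each supported format. It is not ReqCrawl's implementation: the SitemapSketch module and its extract/2 function are hypothetical names, the XML branch uses a simple regular expression rather than a real XML parser, and it does not decode entities such as &amp;.

```elixir
defmodule SitemapSketch do
  # Hypothetical helper, not part of ReqCrawl.
  # Returns {type, urls} for a sitemap body in the given format.

  # XML: a <sitemapindex> root means a sitemap index; otherwise we assume
  # a regular sitemap (<urlset>). URLs live in <loc> elements in both cases.
  def extract(body, :xml) do
    type =
      if String.contains?(body, "<sitemapindex"), do: :sitemapindex, else: :sitemap

    urls =
      ~r/<loc>\s*(.*?)\s*<\/loc>/s
      |> Regex.scan(body, capture: :all_but_first)
      |> List.flatten()

    {type, urls}
  end

  # Text: one URL per line, and the format only describes a plain sitemap.
  def extract(body, :txt) do
    urls =
      body
      |> String.split(["\r\n", "\n"], trim: true)
      |> Enum.map(&String.trim/1)
      |> Enum.reject(&(&1 == ""))

    {:sitemap, urls}
  end
end

xml = """
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap1.xml</loc></sitemap>
</sitemapindex>
"""

IO.inspect(SitemapSketch.extract(xml, :xml))
IO.inspect(SitemapSketch.extract("https://example.com/a\nhttps://example.com/b\n", :txt))
```

A production parser would normally use a proper XML library; the regular expression here only keeps the sketch dependency-free.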
Output is stored in the Req.Response, in the private field under the :crawl_sitemap key.
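
As a rough illustration of consuming the stored output, the sketch below reads the :crawl_sitemap entry from a response's private map and branches on the type. The SitemapResult module and the :pages / :child_sitemaps labels are hypothetical, and a plain map stands in for the response so that the snippet runs without any dependencies; the assumption is that a real response exposes private as a map in the same way.

```elixir
defmodule SitemapResult do
  # Hypothetical helper, not part of ReqCrawl.
  # Reads the {type, urls} tuple stored under :crawl_sitemap, if present.
  def fetch(response) do
    case Map.get(response.private, :crawl_sitemap) do
      # A regular sitemap lists page URLs directly.
      {:sitemap, urls} when is_list(urls) -> {:pages, urls}
      # A sitemap index lists further sitemaps that still need fetching.
      {:sitemapindex, urls} when is_list(urls) -> {:child_sitemaps, urls}
      # Nothing was stored for this response.
      nil -> :none
    end
  end
end

# Stand-in for a response: only the private field matters for this sketch.
response = %{
  private: %{crawl_sitemap: {:sitemapindex, ["https://example.com/sitemap1.xml"]}}
}

IO.inspect(SitemapResult.fetch(response))
```

Distinguishing the two types matters because URLs from a :sitemapindex point at other sitemaps rather than at pages, so a crawler would typically fetch each of them in turn.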