Roboxir (Roboxir v0.1.1)
Roboxir is a straightforward robots.txt parser that tells you whether a crawler with a given name is allowed to crawl a website. The parser exposes two functions, crawlable/2 and crawlable?/2.
Summary
Functions
Similarly to crawlable?/2, parses the robots.txt of the target website and
returns a struct that can be used to determine the allowed/disallowed URL paths per agent.
Checks whether a user-agent is allowed to crawl the website; returns true if the agent can crawl the page, false otherwise.
Functions
Specs
crawlable(String.t(), String.t()) :: Roboxir.UserAgent.t()
Similarly to crawlable?/2, parses the robots.txt of the target website and
returns a struct that can be used to determine the allowed/disallowed URL paths per agent.
Examples
iex> user_agent = Roboxir.crawlable("some_random_agent", "https://google.com/")
%Roboxir.UserAgent{
allowed_urls: ["/js/", "/finance", "/maps/reserve/partners", "/maps/reserve",
"/searchhistory/", "/alerts/$", "/alerts/remove", "/alerts/manage",
"/accounts/o8/id", "/s2/static"],
delay: 0,
disallowed_urls: ["/nonprofits/account/", "/localservices/*", "/local/tab/",
"/local/place/rap/", "/local/place/reviews/", ...],
name: "google",
sitemap_urls: []
}
iex> user_agent = Roboxir.crawlable("some_random_agent", "https://google.com/")
iex> user_agent.disallowed_urls
["/nonprofits/account/", "/localservices/*", "/local/tab/", "/local/place/rap/",
"/local/place/reviews/", "/local/place/products/", "/local/dining/",
"/local/dealership/", "/local/cars/", "/local/cars", "/intl/*/about/views/",
"/about/views/", ...]
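The struct fields above can drive a crawl decision directly. The following is a minimal sketch, assuming plain prefix matching against disallowed_urls; real robots.txt semantics also include * wildcards and $ anchors, which this sketch does not handle:

```elixir
# Parse the robots.txt once, then test candidate paths against the
# disallow list. "my_bot" and the path below are illustrative values.
agent = Roboxir.crawlable("my_bot", "https://google.com/")

blocked? =
  Enum.any?(agent.disallowed_urls, fn rule ->
    # Naive prefix match only; does not interpret * or $ in rules.
    String.starts_with?("/local/tab/some-page", rule)
  end)
```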
Specs
crawlable?(String.t(), String.t()) :: boolean()
Checks whether a user-agent is allowed to crawl the website; returns true if the agent can crawl the page, false otherwise.
Examples
iex> Roboxir.crawlable?("your_agent_name", "https://google.com/")
true
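In practice crawlable?/2 works as a guard before fetching. A minimal sketch, where fetch/1 stands in for a hypothetical HTTP-client function that is not part of Roboxir:

```elixir
# Only fetch the page if robots.txt permits this agent to crawl it.
url = "https://google.com/"

if Roboxir.crawlable?("your_agent_name", url) do
  # fetch/1 is a placeholder for your own HTTP request function.
  fetch(url)
else
  {:error, :disallowed_by_robots}
end
```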