Roboxir

A zero-dependency, straightforward robots.txt parser. Plug it in, parse, and do whatever you need with the result via a convenient UserAgent struct. The parser exposes two functions: crawlable/2 and crawlable?/2.

Usage

crawlable/2 usage example:

iex> Roboxir.crawlable("some_random_agent", "https://google.com/")
%Roboxir.UserAgent{
  allowed_urls: ["/js/", "/finance", "/maps/reserve/partners", "/maps/reserve",
   "/searchhistory/", "/alerts/$", "/alerts/remove", "/alerts/manage",
   "/accounts/o8/id", "/s2/static", ..],
  delay: 0,
  disallowed_urls: ["/nonprofits/account/", "/localservices/*", "/local/tab/",
   "/local/place/rap/", "/local/place/reviews/", ..],
  name: "google",
  sitemap_urls: []
}

crawlable?/2 usage example:

iex> Roboxir.crawlable?("other_random_agent", "https://google.com/")
true

Using crawlable/2 is recommended: apply your own logic to iterate over disallowed_urls and decide what you can or can't crawl. crawlable?/2 is still under development.
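
As a minimal sketch of such logic (the path "/finance/quote" and the plain prefix match are illustrative assumptions; real robots.txt rules can contain wildcards like /localservices/* or anchors like /alerts/$ that need extra handling):

agent = Roboxir.crawlable("some_random_agent", "https://google.com/")

# Hypothetical path we want to check before fetching it.
path = "/finance/quote"

# Treat the path as blocked if it starts with any disallowed prefix,
# trimming a trailing "*" so simple wildcard rules still match.
blocked? =
  Enum.any?(agent.disallowed_urls, fn rule ->
    String.starts_with?(path, String.trim_trailing(rule, "*"))
  end)

if not blocked? do
  # safe to fetch and parse the page
end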

Config

You can skip passing the url param every time by adding this line to your config.exs:

config :roboxir, url: "https://your_website.com/"
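
For context, this is a standard application environment entry; in a typical config/config.exs it sits below import Config:

# config/config.exs
import Config

config :roboxir, url: "https://your_website.com/"

You can confirm the value at runtime with Application.get_env/2:

iex> Application.get_env(:roboxir, :url)
"https://your_website.com/"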

Installation

The package can be installed by adding roboxir to your list of dependencies in mix.exs:

def deps do
  [
    {:roboxir, "~> 0.1.0"}
  ]
end
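
Then fetch the new dependency:

mix deps.get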

The docs can be found at https://hexdocs.pm/roboxir.