UrlFetcher
UrlFetcher fetches the URLs found in the image and anchor tags of the page at a given URL.
Usage
UrlFetcher.fetch("https://myawesome.url/page.html") will retrieve all link and image URLs present in https://myawesome.url/page.html, returning them as the lists links and assets in a UrlFetcher.SiteData struct.
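To illustrate how anchors map to links and images map to assets, here is a deliberately naive, regex-based sketch. It is not how UrlFetcher necessarily parses HTML, and the module and function names are hypothetical. A real implementation would normally use an HTML parser, because regular expressions miss cases such as unquoted attributes or markup inside comments and scripts.

```elixir
defmodule LinkExtractSketch do
  @moduledoc """
  Hypothetical illustration: href values of <a> tags become links,
  src values of <img> tags become assets. Regex-based, so it only
  handles attributes quoted with " or '.
  """

  @spec extract(String.t()) :: %{links: [String.t()], assets: [String.t()]}
  def extract(html) when is_binary(html) do
    %{links: captures(anchor_href_regex(), html), assets: captures(img_src_regex(), html)}
  end

  # \b after the tag name keeps <abbr> or <image> from matching.
  defp anchor_href_regex, do: ~r/<a\b[^>]*?\bhref\s*=\s*["']([^"']*)["']/i
  defp img_src_regex, do: ~r/<img\b[^>]*?\bsrc\s*=\s*["']([^"']*)["']/i

  # With capture: :all_but_first, Regex.scan/3 returns [[group], ...],
  # so flattening yields the captured attribute values in document order.
  defp captures(regex, html) do
    regex
    |> Regex.scan(html, capture: :all_but_first)
    |> List.flatten()
  end
end
```

Calling LinkExtractSketch.extract/1 on a page body returns a map with the two lists, mirroring the links and assets fields described above.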
The fetcher accepts the following options:
http_client: the HTTP client to use. It must implement the UrlFetcher.Http.Client behaviour. Defaults to UrlFetcher.Http.Adapter.Poison.
unique: boolean. When true, duplicate URLs are removed from the results. Defaults to true.
normalize: set to :absolute to convert all URLs to absolute form, or to :original to leave them as they appear in the page. Defaults to :original.
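The effect of the unique and normalize options can be sketched on a plain list of URLs. This is a hypothetical helper written for illustration, not part of UrlFetcher; it assumes relative URLs are resolved against the URL of the page they were found on, using Elixir's standard URI.merge/2.

```elixir
defmodule NormalizeSketch do
  @moduledoc """
  Hypothetical helper showing what `unique` and `normalize` mean
  for a list of extracted URLs. Not part of UrlFetcher.
  """

  @spec apply_options([String.t()], String.t(), keyword()) :: [String.t()]
  def apply_options(urls, page_url, opts \\ []) do
    # Defaults mirror the ones documented above.
    unique? = Keyword.get(opts, :unique, true)
    mode = Keyword.get(opts, :normalize, :original)

    urls
    |> normalize(page_url, mode)
    |> then(fn list -> if unique?, do: Enum.uniq(list), else: list end)
  end

  # :absolute resolves each URL against the page URL (RFC 3986 merging).
  defp normalize(urls, page_url, :absolute) do
    base = URI.parse(page_url)
    Enum.map(urls, fn url -> base |> URI.merge(url) |> URI.to_string() end)
  end

  # :original leaves URLs exactly as written in the HTML.
  defp normalize(urls, _page_url, :original), do: urls
end
```

Normalizing before deduplicating means "/about" and "https://myawesome.url/about" collapse into one entry under :absolute, while under :original they stay distinct.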
HTTP Client behaviour
The HTTP client behaviour is defined in UrlFetcher.Http.Client. You can use any HTTP client you prefer, as long as it implements that behaviour, either directly or through a wrapper you write. Note that the HTTP client is expected to follow redirects.
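If the client you want to wrap does not follow redirects on its own, the wrapper can do it. The sketch below assumes a hypothetical request function returning {:ok, status, headers, body} or {:error, reason}; it does not reproduce the actual callbacks of UrlFetcher.Http.Client, which are documented on HexDocs.

```elixir
defmodule RedirectSketch do
  @moduledoc """
  Hypothetical redirect-following wrapper. `request_fun` is an assumed
  function shape: url -> {:ok, status, headers, body} | {:error, reason}.
  """

  @redirect_statuses [301, 302, 303, 307, 308]

  def get(url, request_fun, max_redirects \\ 5)

  # Give up once the redirect budget is exhausted (e.g. a redirect loop).
  def get(_url, _request_fun, max) when max < 0, do: {:error, :too_many_redirects}

  def get(url, request_fun, max) do
    case request_fun.(url) do
      {:ok, status, headers, _body} when status in @redirect_statuses ->
        case location(headers) do
          nil ->
            {:error, {:missing_location, status}}

          loc ->
            # Location may be relative, so resolve it against the current URL.
            next = url |> URI.merge(loc) |> URI.to_string()
            get(next, request_fun, max - 1)
        end

      {:ok, status, _headers, body} ->
        {:ok, status, body}

      {:error, _reason} = error ->
        error
    end
  end

  # Header names are case-insensitive, so compare in lowercase.
  defp location(headers) do
    Enum.find_value(headers, fn {name, value} ->
      if String.downcase(name) == "location", do: value
    end)
  end
end
```

Capping the number of hops keeps a misconfigured server from sending the fetcher into an endless redirect loop.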
Installation
The package is available on Hex and can be installed by adding url_fetcher to your list of dependencies in mix.exs:
def deps do
  [
    {:url_fetcher, "~> 0.1.0"}
  ]
end
Documentation can be found at https://hexdocs.pm/url_fetcher/.