Crawler v1.1.1 Crawler.Snapper
Stores crawled pages offline.
Link to this section Summary
Functions
In order to store pages offline, it provides the following functionalities
Link to this section Functions
Link to this function
snap(body, opts)
In order to store pages offline, it provides the following functionalities:
- replaces all URLs to their equivalent relative paths
- creates directories when necessary to store the files
Examples
iex> Snapper.snap("hello", %{save_to: tmp("snapper"), url: "http://hello-world.local"})
iex> File.read(tmp("snapper/hello-world.local", "index.html"))
{:ok, "hello"}
iex> Snapper.snap("hello", %{save_to: tmp("snapper"), url: "http://snapper.local/index.html"})
iex> File.read(tmp("snapper/snapper.local", "index.html"))
{:ok, "hello"}
iex> Snapper.snap("hello", %{save_to: "nope", url: "http://snapper.local/index.html"})
{:error, "Cannot write to file nope/snapper.local/index.html, reason: enoent"}
iex> Snapper.snap("hello", %{save_to: tmp("snapper"), url: "http://snapper.local/hello"})
iex> File.read(tmp("snapper/snapper.local/hello", "index.html"))
{:ok, "hello"}
iex> Snapper.snap("hello", %{save_to: tmp("snapper"), url: "http://snapper.local/hello1/"})
iex> File.read(tmp("snapper/snapper.local/hello1", "index.html"))
{:ok, "hello"}
iex> Snapper.snap(
iex> "<a href='http://another.domain/page'></a>",
iex> %{
iex> save_to: tmp("snapper"),
iex> url: "http://snapper.local/depth0",
iex> depth: 1,
iex> max_depths: 2,
iex> html_tag: "a",
iex> content_type: "text/html",
iex> }
iex> )
iex> File.read(tmp("snapper/snapper.local/depth0", "index.html"))
{:ok, "<a href='../../another.domain/page/index.html'></a>"}
iex> Snapper.snap(
iex> "<a href='https://another.domain:8888/page'></a>",
iex> %{
iex> save_to: tmp("snapper"),
iex> url: "http://snapper.local:7777/dir/depth1",
iex> depth: 1,
iex> max_depths: 2,
iex> html_tag: "a",
iex> content_type: "text/html",
iex> }
iex> )
iex> File.read(tmp("snapper/snapper.local-7777/dir/depth1", "index.html"))
{:ok, "<a href='../../../another.domain-8888/page/index.html'></a>"}