Funkspector (funkspector v1.0.0)

Funkspector is a web scraper that lets you extract data from web pages.

Functions

page_scrape(url, options \\ %{})

Parses an HTML document.

This can be used to request a document by passing its URL, like:

Funkspector.page_scrape("https://example.com")

Or to scrape an already loaded document, by passing its HTML contents:

Funkspector.page_scrape("https://example.com", contents: "<html>...</html>")

Example: request a document

iex> {:ok, document} = Funkspector.page_scrape("https://jaimeiniesta.com")
iex> Enum.take(document.data.links.http.external, 3)
["http://www.archive.elixirconf.eu/elixirconf2016", "https://steadyhq.com/", "https://stuart.com/"]

Example: site not found

iex> Funkspector.page_scrape("https://notfoundwebsite.com")
{:error, "https://notfoundwebsite.com", %HTTPoison.Error{reason: :nxdomain, id: nil}}
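
Both result shapes shown above can be handled with pattern matching. The following is a minimal sketch, not part of Funkspector: the `ScrapeResult` module and its `external_links/2` function are hypothetical names, and the sample values are plain maps that only mimic the shape of the results in the examples (the real error carries an `%HTTPoison.Error{}` struct).

```elixir
defmodule ScrapeResult do
  # Hypothetical helper, not part of Funkspector.
  # Takes a result shaped like the ones page_scrape returns in the
  # examples above and extracts the first `count` external links.
  def external_links({:ok, document}, count) do
    {:ok, Enum.take(document.data.links.http.external, count)}
  end

  def external_links({:error, url, reason}, _count) do
    {:error, "could not scrape #{url}: #{inspect(reason)}"}
  end
end

# Stand-in values (assumptions): plain maps instead of the real structs.
ok =
  {:ok,
   %{data: %{links: %{http: %{external: ["https://steadyhq.com/", "https://stuart.com/"]}}}}}

error = {:error, "https://notfoundwebsite.com", %{reason: :nxdomain}}

IO.inspect(ScrapeResult.external_links(ok, 1))
# => {:ok, ["https://steadyhq.com/"]}

IO.inspect(ScrapeResult.external_links(error, 1))
```

In real use, the first argument would be the return value of `Funkspector.page_scrape/2`.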

scrape(url, options, scraping_function)

sitemap_scrape(url, options \\ %{})

Parses an XML sitemap.

This can be used to request a sitemap by passing its URL, like:

Funkspector.sitemap_scrape("https://example.com/sitemap.xml")

Or to scrape an already loaded document, by passing its XML contents:

Funkspector.sitemap_scrape("https://example.com/sitemap.xml", contents: "<xml>...</xml>")

Example

iex> {:ok, document} = Funkspector.sitemap_scrape("https://rocketvalidator.com/sitemap.xml")
iex> length(document.data.locs)
1238
iex> Enum.take(document.data.locs, 3)
["https://rocketvalidator.com/", "https://rocketvalidator.com/pricing?billing=weekly", "https://rocketvalidator.com/pricing?billing=monthly"]
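
Since `document.data.locs` is a plain list of URL strings, it can be post-processed with the standard library. As a rough sketch, the snippet below collapses query-string variants into unique paths; the `locs` list is copied from the example output above rather than fetched, and the variable names are illustrative only.

```elixir
# Sample data copied from the example output above (not fetched live).
locs = [
  "https://rocketvalidator.com/",
  "https://rocketvalidator.com/pricing?billing=weekly",
  "https://rocketvalidator.com/pricing?billing=monthly"
]

# Parse each URL, keep only its path, and drop duplicates.
unique_paths =
  locs
  |> Enum.map(&URI.parse/1)
  |> Enum.map(& &1.path)
  |> Enum.uniq()

IO.inspect(unique_paths)
# => ["/", "/pricing"]
```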

text_sitemap_scrape(url, options \\ %{})

Parses a text sitemap.

This can be used to request a sitemap by passing its URL, like:

Funkspector.text_sitemap_scrape("https://example.com/sitemap.txt")

Or to scrape an already loaded document, by passing its text contents:

Funkspector.text_sitemap_scrape("https://example.com/sitemap.txt", contents: "...")

Example

iex> {:ok, document} = Funkspector.text_sitemap_scrape("https://rocketvalidator.com/sitemap.txt")
iex> length(document.data.lines)
1238
iex> Enum.take(document.data.lines, 3)
["https://rocketvalidator.com/", "https://rocketvalidator.com/pricing?billing=weekly", "https://rocketvalidator.com/pricing?billing=monthly"]
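
A text sitemap lists one URL per line, which is why the result exposes `lines`. To illustrate the format only, here is a small sketch that splits a text body into lines with the standard library. It is an assumption about the input shape, not a description of how Funkspector parses it, and the `contents` value is made up from the URLs in the example above.

```elixir
# Illustrative text sitemap body (assumption), one URL per line,
# with a blank line to show that empty entries are skipped.
contents = """
https://rocketvalidator.com/
https://rocketvalidator.com/pricing?billing=weekly

https://rocketvalidator.com/pricing?billing=monthly
"""

lines =
  contents
  |> String.split("\n", trim: true)
  |> Enum.map(&String.trim/1)
  |> Enum.reject(&(&1 == ""))

IO.inspect(length(lines))
# => 3
```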