Crawler v0.3.1 Crawler.Parser

Parses pages and calls a link handler to handle the detected links.

Functions

parse(input, link_handler \\ &(Dispatcher.dispatch(&1, &2)))

Parses the links and returns the page.

The optional second argument, link_handler, is useful when a custom parser calls this default parser but needs a different link handler to process the detected links. It defaults to Dispatcher.dispatch/2.
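The pattern described above can be sketched without the library itself. In this minimal, self-contained illustration, SketchParser, CustomParser and default_handler/2 are hypothetical stand-ins (they are not part of Crawler), the regex-based link extraction is a simplification, and the shape of the two arguments passed to the handler (a link and the opts map) is an assumption inferred from the two-argument default handler.

```elixir
defmodule SketchParser do
  # Hypothetical stand-in for a default parser. It extracts href values with a
  # simple regex (an assumption, not the library's extraction logic), passes
  # each link together with opts to link_handler, and returns the page.
  def parse(%{page: page, opts: opts}, link_handler \\ &SketchParser.default_handler/2) do
    ~r/href='([^']+)'/
    |> Regex.scan(page.body, capture: :all_but_first)
    |> List.flatten()
    |> Enum.each(&link_handler.(&1, opts))

    page
  end

  # Hypothetical default handler that ignores the link.
  def default_handler(_link, _opts), do: :ok
end

defmodule CustomParser do
  # A custom parser that reuses the default parser but supplies its own
  # handler. Here the handler sends each link to the calling process.
  def parse(input) do
    SketchParser.parse(input, fn link, _opts -> send(self(), {:link, link}) end)
  end
end

input = %{
  page: %{body: "<a href='http://parser/1'>Link</a>"},
  opts: %{html_tag: "a", content_type: "text/html"}
}

# The page comes back unchanged; the link arrives as a message instead.
page = CustomParser.parse(input)
```

Passing the handler as an argument keeps link extraction in one place while letting each caller decide what happens to a detected link, for example queueing it, logging it, or collecting it in a test.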

Examples

iex> Parser.parse(%{page: %Page{
iex>   body: "Body"
iex> }, opts: %{html_tag: "a", content_type: "text/html"}})
%Page{body: "Body"}

iex> Parser.parse(%{page: %Page{
iex>   body: "<a href='http://parser/1'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html"}})
%Page{body: "<a href='http://parser/1'>Link</a>"}

iex> Parser.parse(%{page: %Page{
iex>   body: "<a name='hello'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html"}})
%Page{body: "<a name='hello'>Link</a>"}

iex> Parser.parse(%{page: %Page{
iex>   body: "<a href='http://parser/2' target='_blank'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html"}})
%Page{body: "<a href='http://parser/2' target='_blank'>Link</a>"}

iex> Parser.parse(%{page: %Page{
iex>   body: "<a href='parser/2'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html", referrer_url: "http://hello"}})
%Page{body: "<a href='parser/2'>Link</a>"}

iex> Parser.parse(%{page: %Page{
iex>   body: "<a href='../parser/2'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html", referrer_url: "http://hello"}})
%Page{body: "<a href='../parser/2'>Link</a>"}

iex> Parser.parse(%{page: %Page{
iex>   body: image_file()
iex> }, opts: %{html_tag: "img", content_type: "image/png"}})
%Page{body: "#{image_file()}"}

parse_links(body, opts, link_handler)