Crawler v0.3.1 Crawler.Parser
Parses pages and calls a link handler to handle the detected links.
Summary
Functions
Parses the links and returns the page
Functions
parse(input, link_handler \\ &(Dispatcher.dispatch(&1, &2)))
Parses the links and returns the page.
The second argument, link_handler, is useful when a custom parser calls this default parser but uses a different link handler to process the links.
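The delegation pattern described here can be sketched as follows. This is a minimal, self-contained illustration and not Crawler's actual implementation: SketchParser and CollectingParser are hypothetical modules, the regex-based href extraction is a stand-in for whatever the library really does, and the meaning of the two handler arguments (a URL and the opts map) is an assumption based only on the arity of the documented default handler.

```elixir
# Hypothetical stand-in for the default parser; not Crawler's actual code.
defmodule SketchParser do
  # The handler takes two arguments, mirroring the arity of the documented
  # default `&(Dispatcher.dispatch(&1, &2))`. What each argument holds
  # (here: a URL string and the opts map) is an assumption.
  def parse(input, link_handler \\ &default_handler/2)

  def parse(%{page: page, opts: opts}, link_handler) do
    # Naive href extraction with a regex, for illustration only.
    ~r/href='([^']*)'/
    |> Regex.scan(page.body, capture: :all_but_first)
    |> List.flatten()
    |> Enum.each(fn url -> link_handler.(url, opts) end)

    # Return the page unchanged, as the documented examples do.
    page
  end

  # Default handler used when the caller does not supply one.
  defp default_handler(_url, _opts), do: :ok
end

# Hypothetical custom parser that delegates to the default parser
# but supplies its own link handler.
defmodule CollectingParser do
  def parse(input) do
    SketchParser.parse(input, &collect/2)
  end

  # Sends each link to the calling process instead of dispatching it.
  defp collect(url, _opts), do: send(self(), {:link, url})
end

page = %{body: "<a href='http://parser/1'>Link</a>"}

# The custom parser still returns the page it was given.
^page = CollectingParser.parse(%{page: page, opts: %{html_tag: "a"}})
```

Because the handler is an ordinary two-argument function, a custom parser can swap in any behaviour (collecting, filtering, logging) without changing how the default parser walks the page.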
Examples
iex> Parser.parse(%{page: %Page{
iex> body: "Body"
iex> }, opts: %{html_tag: "a", content_type: "text/html"}})
%Page{body: "Body"}
iex> Parser.parse(%{page: %Page{
iex> body: "<a href='http://parser/1'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html"}})
%Page{body: "<a href='http://parser/1'>Link</a>"}
iex> Parser.parse(%{page: %Page{
iex> body: "<a name='hello'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html"}})
%Page{body: "<a name='hello'>Link</a>"}
iex> Parser.parse(%{page: %Page{
iex> body: "<a href='http://parser/2' target='_blank'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html"}})
%Page{body: "<a href='http://parser/2' target='_blank'>Link</a>"}
iex> Parser.parse(%{page: %Page{
iex> body: "<a href='parser/2'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html", referrer_url: "http://hello"}})
%Page{body: "<a href='parser/2'>Link</a>"}
iex> Parser.parse(%{page: %Page{
iex> body: "<a href='../parser/2'>Link</a>"
iex> }, opts: %{html_tag: "a", content_type: "text/html", referrer_url: "http://hello"}})
%Page{body: "<a href='../parser/2'>Link</a>"}
iex> Parser.parse(%{page: %Page{
iex> body: image_file()
iex> }, opts: %{html_tag: "img", content_type: "image/png"}})
%Page{body: "#{image_file()}"}
parse_links(body, opts, link_handler)