UrlFetcher v0.2.1 UrlFetcher.Parser

HTML parser module

Summary

Functions

Parses an HTML document and extracts URLs from the values of the given attribute on the given tags. Assumes those attribute values are URLs.

Functions

parse(html, base_url, arg, opts \\ [])

Parses an HTML document and extracts URLs from the values of the given attribute on the given tags. Assumes those attribute values are URLs.

Parameters

  • html: String. HTML content.
  • base_url: String. Base URL of the given content, used to normalize URLs to absolute form.
  • tag: String. HTML tag to look for.
  • attribute: String. Attribute of the HTML tag whose value is extracted.
  • opts: [key: value]. Options for the parser.

Available options:

  • unique: Boolean. If true, removes duplicate URLs from the results. Defaults to true.
  • normalize: Atom. :absolute converts all URLs to absolute form (resolved against base_url); :original leaves them as found. Defaults to :original.
  • internal_only: Boolean. If true, keeps only URLs internal to the site being fetched. Defaults to false.
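
The options above can be illustrated with a minimal sketch that uses only the Elixir standard library. It is not UrlFetcher's implementation: the module `OptionsSketch` and the function `apply_opts/3` are hypothetical names, the input is a list of URL strings that have already been extracted, "internal" is assumed to mean "same host as base_url, or no host at all", and the order of the three steps is an assumption.

```elixir
defmodule OptionsSketch do
  # Hypothetical helper, NOT part of UrlFetcher. It applies the three
  # documented options to a list of already-extracted URL strings.
  def apply_opts(urls, base_url, opts \\ []) do
    # Defaults mirror the documented ones.
    unique = Keyword.get(opts, :unique, true)
    mode = Keyword.get(opts, :normalize, :original)
    internal_only = Keyword.get(opts, :internal_only, false)

    urls
    |> filter_internal(base_url, internal_only)
    |> absolutize(base_url, mode)
    |> dedupe(unique)
  end

  defp filter_internal(urls, _base_url, false), do: urls

  # Assumption: a URL is internal if it is relative (no host)
  # or its host equals the host of base_url.
  defp filter_internal(urls, base_url, true) do
    base_host = URI.parse(base_url).host

    Enum.filter(urls, fn url ->
      host = URI.parse(url).host
      is_nil(host) or host == base_host
    end)
  end

  defp absolutize(urls, _base_url, :original), do: urls

  # URI.merge/2 resolves a relative reference against an absolute base.
  defp absolutize(urls, base_url, :absolute) do
    Enum.map(urls, fn url ->
      base_url |> URI.merge(url) |> URI.to_string()
    end)
  end

  # Enum.uniq/1 keeps the first occurrence of each URL.
  defp dedupe(urls, true), do: Enum.uniq(urls)
  defp dedupe(urls, false), do: urls
end

urls = ["/about", "page.html", "https://other.org/x", "/about"]
base = "https://example.com/docs/"

IO.inspect(OptionsSketch.apply_opts(urls, base))
IO.inspect(OptionsSketch.apply_opts(urls, base, normalize: :absolute, internal_only: true))
```

In this sketch, duplicates are removed after normalization, so a relative and an absolute spelling of the same URL collapse into one entry when normalize is :absolute. Whether UrlFetcher orders the steps this way is not stated in the documentation above.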