UrlFetcher v0.2.0 UrlFetcher.Parser
HTML parser module
Summary
Functions
parse/4: Parses an HTML document, extracting URLs from the values of the given attribute on the given tag. Assumes these attribute values are URLs.
Functions
Specs
parse(Floki.html_tree(), UrlFetcher.Http.Client.url(), {String.t(), String.t()}, [{:key, any()}]) :: [UrlFetcher.Http.Client.url()]
Parses an HTML document, extracting URLs from the values of the given attribute on the given tag. Assumes these attribute values are URLs.
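A Floki.html_tree() is a list of nodes in which an element is a {tag, attributes, children} tuple and text is a plain string. The sketch below uses only the Elixir standard library to illustrate how attribute values could be collected from such a tree for a given {tag, attribute} pair. ExtractSketch and extract/2 are hypothetical names; this is not UrlFetcher's implementation, which is not shown on this page.

```elixir
# Hypothetical sketch: not UrlFetcher's implementation.
defmodule ExtractSketch do
  # Collects the values of `attribute` from every element whose tag is `tag`,
  # walking the tree depth-first in document order.
  def extract(tree, {tag, attribute}) when is_list(tree) do
    Enum.flat_map(tree, &extract_node(&1, tag, attribute))
  end

  # Element node: {tag, attributes, children}.
  defp extract_node({node_tag, attrs, children}, tag, attribute)
       when is_binary(node_tag) and is_list(attrs) and is_list(children) do
    own =
      if node_tag == tag do
        # Keep only attribute pairs whose name matches `attribute`.
        for {^attribute, value} <- attrs, do: value
      else
        []
      end

    own ++ extract(children, {tag, attribute})
  end

  # Text nodes (binaries), comments and other special nodes yield nothing.
  defp extract_node(_other, _tag, _attribute), do: []
end

tree = [
  {"html", [],
   [
     {"body", [],
      [
        {"a", [{"href", "/about"}], ["About"]},
        {"a", [{"class", "ext"}, {"href", "https://other.org/"}], ["Other"]},
        {"img", [{"src", "/logo.png"}], []}
      ]}
   ]}
]

# prints ["/about", "https://other.org/"]
IO.inspect(ExtractSketch.extract(tree, {"a", "href"}))
```

The values come back exactly as written in the markup, relative or absolute; turning them into absolute URLs is left to a separate step, which matches the way the normalize option is described as a transformation applied to the extracted URLs.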
Parameters
- html: Floki.html_tree(). HTML content, as a parsed Floki tree.
- base_url: String. Base URL of the given content, used to normalize URLs to absolute form.
- {tag, attribute}: {String, String}. The HTML tag to look for and the tag attribute to extract, passed together as a tuple.
- opts: Keyword list. Options for the parser.
Available options:
- unique: Boolean. If true, removes duplicate URLs from the results. Defaults to true.
- normalize: Atom. Converts all URLs to absolute form (resolved against base_url) when set to :absolute, or leaves them unchanged when set to :original. Defaults to :original.
- internal_only: Boolean. If true, keeps only URLs internal to the site being fetched. Defaults to false.
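The three options can be illustrated with a stdlib-only sketch that post-processes a list of extracted URLs. OptionsSketch and apply_opts/3 are hypothetical names, and two details are assumptions rather than documented behavior: the order in which the options are applied (filter, then normalize, then deduplicate), and the rule that a URL is internal when its host, after resolving it against base_url with URI.merge/2, equals the host of base_url.

```elixir
# Hypothetical sketch of the documented options; not UrlFetcher's implementation.
defmodule OptionsSketch do
  def apply_opts(urls, base_url, opts \\ []) do
    # Defaults mirror the documented ones.
    unique = Keyword.get(opts, :unique, true)
    normalize = Keyword.get(opts, :normalize, :original)
    internal_only = Keyword.get(opts, :internal_only, false)
    # base_url must be absolute (have a scheme) for URI.merge/2 to accept it.
    base = URI.parse(base_url)

    urls
    |> maybe_filter_internal(base, internal_only)
    |> maybe_normalize(base, normalize)
    |> maybe_unique(unique)
  end

  # Assumed rule: internal means the resolved host equals base_url's host.
  defp maybe_filter_internal(urls, base, true) do
    Enum.filter(urls, fn url ->
      %URI{host: host} = URI.merge(base, url)
      host == base.host
    end)
  end

  defp maybe_filter_internal(urls, _base, false), do: urls

  # :absolute resolves each URL against base_url; :original keeps it as-is.
  defp maybe_normalize(urls, base, :absolute),
    do: Enum.map(urls, fn url -> base |> URI.merge(url) |> URI.to_string() end)

  defp maybe_normalize(urls, _base, :original), do: urls

  # Enum.uniq/1 keeps the first occurrence of each URL.
  defp maybe_unique(urls, true), do: Enum.uniq(urls)
  defp maybe_unique(urls, false), do: urls
end

urls = ["/about", "https://example.com/about", "https://other.org/", "/about"]

# prints ["/about", "https://example.com/about", "https://other.org/"]
IO.inspect(OptionsSketch.apply_opts(urls, "https://example.com/"))

# prints ["https://example.com/about"]
IO.inspect(
  OptionsSketch.apply_opts(urls, "https://example.com/", normalize: :absolute, internal_only: true)
)
```

Deduplicating after normalization means that "/about" and "https://example.com/about" collapse into one entry only when normalize is :absolute; with :original they remain distinct strings.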