z_html_parse (zotonic_stdlib v1.15.1)

Loosely tokenizes and generates parse trees for (X)HTML and XML. Adapted by Maas-Maarten Zeeman. Extended for basic XML parsing by Marc Worrell.

Summary

Functions

Escape a string such that it is safe for HTML (&amp;, &lt;, &gt;).
Escape a string such that it is safe for HTML attributes (&amp;, &lt;, &gt;, &quot;).
Tokenize and then transform the token stream into an HTML tree.
Parse an HTML/XML document into a JSON-compatible map. Attributes are added as keys under an @attributes key, elements are mapped to keys whose values are lists, and all keys are lowercased.
Parse an HTML/XML document into a JSON-compatible map. Attributes are added as keys under an @attributes key, elements are mapped to keys whose values are lists, and all keys are lowercased.
Transform the output of tokens(Doc) into an HTML tree.
Convert a list of html_token() values, or an HTML tree, into an HTML document.
Convert an html_node() tree into a list of tokens.
Transform the input UTF-8 HTML into a token stream.

Types

-type end_tag() :: {end_tag, Name :: binary()}.
-type html_attr() :: {html_attr_name(), html_attr_value()}.
-type html_attr_name() :: binary() | string() | atom().
-type html_attr_value() :: binary() | string() | atom() | number().
-type html_comment() :: {comment, Comment :: binary()}.
-type html_data() :: {data, binary(), Whitespace :: boolean()}.
-type html_doctype() :: {doctype, [Doctype :: any()]}.
-type html_element() ::
    html_node() |
    html_comment() |
    html_nop() |
    pi_tag() |
    inline_html() |
    {html_tag()} |
    {html_tag(), [html_element()]} |
    binary().
-type html_node() :: {html_tag(), [html_attr()], [html_element()]}.
-type html_nop() :: {nop, [html_element()]}.
Special node used by the sanitizer for unwanted elements.
-type html_tag() :: binary() | string() | atom().
-type html_token() ::
    html_data() |
    start_tag() |
    end_tag() |
    pi_tag() |
    inline_html() |
    html_comment() |
    html_doctype().
-type html_tree() ::
    html_doctype() |
    html_node() |
    html_comment() |
    inline_html() |
    {html_tag()} |
    {html_tag(), [html_element()]} |
    pi_tag().
-type inline_html() :: {'=', binary()}.
-type options() :: #{mode => xml | html, escape => boolean(), lowercase => boolean()}.
-type pi_tag() :: {pi, binary()} | {pi, Tag :: binary(), [html_attr()]}.
-type start_tag() :: {start_tag, Name :: binary(), [html_attr()], Singleton :: boolean()}.

Functions

Escape a string such that it is safe for HTML (&amp;, &lt;, &gt;).
Escape a string such that it is safe for HTML attributes (&amp;, &lt;, &gt;, &quot;).
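The two escaping rules above can be illustrated with a small, self-contained sketch. The module html_escape_sketch and its functions escape/1 and escape_attr/1 are hypothetical names chosen for this example; the sketch only accepts binaries and is not the zotonic_stdlib implementation.

```erlang
-module(html_escape_sketch).
-export([escape/1, escape_attr/1]).

%% Hypothetical illustration, not the zotonic_stdlib source.
%% Replaces &, < and > with their HTML entities.
-spec escape(binary()) -> binary().
escape(Bin) when is_binary(Bin) ->
    << <<(escape_char(C))/binary>> || <<C>> <= Bin >>.

%% Attribute variant: additionally escapes the double quote.
-spec escape_attr(binary()) -> binary().
escape_attr(Bin) when is_binary(Bin) ->
    << <<(escape_attr_char(C))/binary>> || <<C>> <= Bin >>.

escape_char($&) -> <<"&amp;">>;
escape_char($<) -> <<"&lt;">>;
escape_char($>) -> <<"&gt;">>;
escape_char(C) -> <<C>>.

escape_attr_char($") -> <<"&quot;">>;
escape_attr_char(C) -> escape_char(C).
```

Because every byte of a multi-byte UTF-8 sequence is 16#80 or above, escaping byte by byte leaves non-ASCII characters intact.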
-spec parse(iodata()) -> {ok, html_node()} | {error, nohtml}.
Tokenize and then transform the token stream into an HTML tree.
-spec parse(iodata(), options()) -> {ok, html_node()} | {error, nohtml}.
-spec parse_to_map(Input :: iodata() | {binary, list(), list()}) -> {ok, map()} | {error, term()}.
Parse an HTML/XML document into a JSON-compatible map. Attributes are added as keys under an @attributes key, elements are mapped to keys whose values are lists, and all keys are lowercased.
parse_to_map(Input, Options)
-spec parse_to_map(Input :: iodata() | {binary, list(), list()}, options()) ->
                {ok, map()} | {error, term()}.
Parse an HTML/XML document into a JSON-compatible map. Attributes are added as keys under an @attributes key, elements are mapped to keys whose values are lists, and all keys are lowercased.
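A rough sketch of the mapping described above, assuming the input is an already-parsed html_node() tree with binary tag and attribute names. The module html_map_sketch and node_to_map/1 are hypothetical; wrapping the root element in a one-element list is a guess, and text, comments and other non-element children are simply dropped here. The real parse_to_map/1,2 may represent these differently.

```erlang
-module(html_map_sketch).
-export([node_to_map/1]).

%% Hypothetical sketch: returns #{LowercasedRootTag => [ElementMap]}.
node_to_map({Tag, Attrs, Children}) when is_binary(Tag) ->
    #{lower(Tag) => [element_map(Attrs, Children)]}.

%% Attributes go under <<"@attributes">>; child elements are collected
%% under their lowercased tag name as a list, in document order.
element_map(Attrs, Children) ->
    Base = case Attrs of
        [] -> #{};
        _ -> #{<<"@attributes">> => maps:from_list([{lower(K), V} || {K, V} <- Attrs])}
    end,
    lists:foldl(fun add_child/2, Base, Children).

add_child({Tag, Attrs, Children}, Acc) when is_binary(Tag) ->
    Child = element_map(Attrs, Children),
    maps:update_with(lower(Tag), fun(List) -> List ++ [Child] end, [Child], Acc);
add_child(_TextOrOther, Acc) ->
    %% Text, comments and other nodes are ignored in this sketch.
    Acc.

lower(Bin) when is_binary(Bin) ->
    string:lowercase(Bin).
```

Repeated child elements (here two b elements with different casing) end up in the same list, which is what makes the result easy to serialize as JSON.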
-spec parse_tokens([html_token()]) -> {ok, html_node()} | {error, nohtml}.
Transform the output of tokens(Doc) into an HTML tree.
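A common way to turn a token stream into a tree is a stack of open elements. The sketch below assumes binary tag names and handles only start tags, end tags, data and comments; stray end tags are ignored and elements still open at the end of the input are closed implicitly. The module tree_build_sketch and build/1 are hypothetical, and the library's own recovery rules for malformed HTML are likely more elaborate.

```erlang
-module(tree_build_sketch).
-export([build/1]).

%% Hypothetical stack-based tree builder. The stack holds open elements
%% as {Tag, Attrs, ReversedChildren}, with a synthetic root at the bottom.
build(Tokens) ->
    Stack = lists:foldl(fun step/2, [{root, [], []}], Tokens),
    {root, [], TopLevel} = close_all(Stack),
    %% Return the first top-level element; text and comments are skipped.
    case [Node || {_, _, _} = Node <- TopLevel] of
        [First | _] -> {ok, First};
        [] -> {error, nohtml}
    end.

step({start_tag, Name, Attrs, true}, Stack) ->
    add_child({Name, Attrs, []}, Stack);
step({start_tag, Name, Attrs, false}, Stack) ->
    [{Name, Attrs, []} | Stack];
step({end_tag, Name}, Stack) ->
    %% Ignore end tags that do not match any open element.
    case lists:keymember(Name, 1, Stack) of
        true -> pop_until(Name, Stack);
        false -> Stack
    end;
step({data, Text, _Whitespace}, Stack) ->
    add_child(Text, Stack);
step({comment, _} = Comment, Stack) ->
    add_child(Comment, Stack);
step(_Other, Stack) ->
    %% Doctypes, processing instructions etc. are dropped in this sketch.
    Stack.

%% Close open elements up to and including the one named Name.
pop_until(Name, [{Name, _, _} = Top | Rest]) ->
    add_child(close(Top), Rest);
pop_until(Name, [Top | Rest]) ->
    pop_until(Name, add_child(close(Top), Rest)).

%% Close whatever is still open at the end of the input.
close_all([Root]) ->
    close(Root);
close_all([Top | Rest]) ->
    close_all(add_child(close(Top), Rest)).

close({Tag, Attrs, RevChildren}) ->
    {Tag, Attrs, lists:reverse(RevChildren)}.

add_child(Child, [{Tag, Attrs, RevChildren} | Rest]) ->
    [{Tag, Attrs, [Child | RevChildren]} | Rest].
```

Children are accumulated in reverse and flipped once when an element is closed, which keeps each step constant-time.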
-spec to_html([html_token()] | html_tree()) -> iodata().
Convert a list of html_token() values, or an HTML tree, into an HTML document.
-spec to_html([html_token()] | html_tree(), options()) -> iodata().
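A minimal serializer for the html_token() shapes listed under Types, assuming binary names and values. Data and attribute values are escaped, inline_html() ({'=', Raw}) is emitted verbatim, and singleton start tags are written as <br />; these are choices of this hypothetical token_html_sketch module rather than documented behaviour of to_html/1,2.

```erlang
-module(token_html_sketch).
-export([to_html/1]).

%% Hypothetical serializer for html_token()-shaped tuples.
%% Returns iodata; use iolist_to_binary/1 to flatten it.
to_html(Tokens) ->
    [token(T) || T <- Tokens].

token({start_tag, Name, Attrs, Singleton}) ->
    Close = case Singleton of
        true -> <<" />">>;
        false -> <<">">>
    end,
    [$<, Name, [attr(A) || A <- Attrs], Close];
token({end_tag, Name}) ->
    [<<"</">>, Name, $>];
token({data, Text, _Whitespace}) ->
    escape(Text, fun esc/1);
token({comment, Text}) ->
    [<<"<!--">>, Text, <<"-->">>];
token({'=', RawHtml}) ->
    %% inline_html() is passed through unescaped.
    RawHtml.

attr({Name, Value}) ->
    [$\s, Name, <<"=\"">>, escape(Value, fun esc_attr/1), $"].

%% Applies EscFun to every byte of Bin.
escape(Bin, EscFun) ->
    << <<(EscFun(C))/binary>> || <<C>> <= Bin >>.

esc($&) -> <<"&amp;">>;
esc($<) -> <<"&lt;">>;
esc($>) -> <<"&gt;">>;
esc(C) -> <<C>>.

esc_attr($") -> <<"&quot;">>;
esc_attr(C) -> esc(C).
```

Returning nested iodata instead of a flat binary avoids copying; the caller flattens only when needed.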
-spec to_tokens(html_tree()) -> [html_token()].
Convert an html_node() tree into a list of tokens.
-spec to_tokens(html_tree(), options()) -> [html_token()].
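Going the other way, a tree can be flattened into tokens with a depth-first walk. In this hypothetical tree_tokens_sketch module, childless elements become singleton start tags and the Whitespace flag of html_data() is assumed to mean "whitespace-only text"; neither assumption is confirmed by the documentation above.

```erlang
-module(tree_tokens_sketch).
-export([to_tokens/1]).

%% Hypothetical depth-first flattening of an html_node()-shaped tree.
to_tokens(Tree) ->
    lists:reverse(walk(Tree, [])).

%% walk/2 prepends tokens to Acc, so the list is built in reverse.
walk({Tag, Attrs, []}, Acc) when is_binary(Tag) ->
    %% Assumption of this sketch: childless elements become singletons.
    [{start_tag, Tag, Attrs, true} | Acc];
walk({Tag, Attrs, Children}, Acc) when is_binary(Tag) ->
    Acc1 = lists:foldl(fun walk/2, [{start_tag, Tag, Attrs, false} | Acc], Children),
    [{end_tag, Tag} | Acc1];
walk({comment, Text}, Acc) ->
    [{comment, Text} | Acc];
walk(Text, Acc) when is_binary(Text) ->
    [{data, Text, is_whitespace(Text)} | Acc].

%% True when the text consists only of spaces, tabs and newlines.
is_whitespace(Text) ->
    lists:all(fun(C) -> lists:member(C, " \t\r\n") end, binary_to_list(Text)).
```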
-spec tokens(iodata()) -> [html_token()].
Transform the input UTF-8 HTML into a token stream.
-spec tokens(iodata(), options()) -> [html_token()].
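To give an idea of what a token stream looks like, here is a deliberately simplified tokenizer. It is a hypothetical sketch, not the library's tokenizer: it recognizes only comments, end tags, start tags with double-quoted attributes, and text; it lowercases tag and attribute names; and it does not handle doctypes, processing instructions, CDATA, unquoted attributes or a > inside attribute values.

```erlang
-module(tokenizer_sketch).
-export([tokens/1]).

%% Hypothetical, simplified tokenizer; not the zotonic_stdlib implementation.
tokens(Bin) when is_binary(Bin) ->
    lists:reverse(tok(Bin, [])).

%% tok/2 prepends tokens to Acc; clause order matters ("<!--" before "</" before "<").
tok(<<>>, Acc) ->
    Acc;
tok(<<"<!--", Rest/binary>>, Acc) ->
    {Comment, Rest1} = split_at(Rest, <<"-->">>),
    tok(Rest1, [{comment, Comment} | Acc]);
tok(<<"</", Rest/binary>>, Acc) ->
    {Name, Rest1} = split_at(Rest, <<">">>),
    tok(Rest1, [{end_tag, string:lowercase(string:trim(Name))} | Acc]);
tok(<<"<", Rest/binary>>, Acc) ->
    {Inner, Rest1} = split_at(Rest, <<">">>),
    tok(Rest1, [start_tag(Inner) | Acc]);
tok(Bin, Acc) ->
    %% Text runs until the next "<" or the end of the input.
    {Text, Rest} = case binary:match(Bin, <<"<">>) of
        {Pos, _Len} -> {binary:part(Bin, 0, Pos), binary:part(Bin, Pos, byte_size(Bin) - Pos)};
        nomatch -> {Bin, <<>>}
    end,
    tok(Rest, [{data, Text, is_whitespace(Text)} | Acc]).

%% Split at the first occurrence of Sep; a missing Sep consumes everything.
split_at(Bin, Sep) ->
    case binary:split(Bin, Sep) of
        [Before, After] -> {Before, After};
        [Before] -> {Before, <<>>}
    end.

start_tag(Inner0) ->
    {Inner, Singleton} = split_singleton(string:trim(Inner0)),
    {Name, Attrs} = case binary:split(Inner, [<<" ">>, <<"\t">>, <<"\n">>]) of
        [N, AttrStr] -> {N, attrs(AttrStr)};
        [N] -> {N, []}
    end,
    {start_tag, string:lowercase(Name), Attrs, Singleton}.

%% A trailing "/" (as in <br/>) marks a singleton tag.
split_singleton(B) ->
    Size = byte_size(B),
    case Size > 0 andalso binary:at(B, Size - 1) =:= $/ of
        true -> {string:trim(binary:part(B, 0, Size - 1)), true};
        false -> {B, false}
    end.

%% Only name="value" pairs are recognized.
attrs(Str) ->
    case re:run(Str, "([^\\s=]+)\\s*=\\s*\"([^\"]*)\"", [global, {capture, all_but_first, binary}]) of
        {match, Pairs} -> [{string:lowercase(K), V} || [K, V] <- Pairs];
        nomatch -> []
    end.

is_whitespace(Text) ->
    lists:all(fun(C) -> lists:member(C, " \t\r\n") end, binary_to_list(Text)).
```

The output has the same tuple shapes as the html_token() type, so it could be fed to a tree builder such as the one sketched for parse_tokens/1.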