z_html_parse (zotonic_stdlib v1.23.1)

Loosely tokenizes and generates parse trees for (X)HTML and XML. Adapted by Maas-Maarten Zeeman. Extended for basic XML parsing by Marc Worrell.

Summary

Functions

escape(B)

Escape a string so that it is safe for HTML, replacing &, < and > with &amp;, &lt; and &gt;.

escape_attr(B)

Escape a string so that it is safe for HTML attributes, replacing &, <, > and " with &amp;, &lt;, &gt; and &quot;.

parse(Input)

Tokenize and then transform the token stream into an HTML tree.

parse_to_map(Input)

Parse an HTML/XML document to a JSON-compatible map. Attributes are added as keys under an @attributes key. Elements are mapped to keys whose values are lists. All keys are lowercased.

parse_to_map(Input, Options)

Parse an HTML/XML document to a JSON-compatible map. Attributes are added as keys under an @attributes key. Elements are mapped to keys whose values are lists. All keys are lowercased.

parse_tokens(Tokens)

Transform the output of tokens(Doc) into an HTML tree.

to_html(Node)

Convert a list of html_token() or an html_tree() to an HTML document.

to_tokens(HtmlNode)

Convert an html_node() tree to a list of tokens.

tokens(Input)

Transform the input UTF-8 HTML into a token stream.

Types

end_tag/0

-type end_tag() :: {end_tag, Name :: binary()}.

html_attr/0

-type html_attr() :: {html_attr_name(), html_attr_value()}.

html_attr_name/0

-type html_attr_name() :: binary() | string() | atom().

html_attr_value/0

-type html_attr_value() :: binary() | string() | atom() | number().

html_comment/0

-type html_comment() :: {comment, Comment :: binary()}.

html_data/0

-type html_data() :: {data, binary(), Whitespace :: boolean()}.

html_doctype/0

-type html_doctype() :: {doctype, [Doctype :: any()]}.

html_element/0

-type html_element() ::
          html_node() |
          html_comment() |
          html_nop() |
          pi_tag() |
          inline_html() |
          {html_tag()} |
          {html_tag(), [html_element()]} |
          binary().

html_node/0

-type html_node() :: {html_tag(), [html_attr()], [html_element()]}.
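
To illustrate the {Tag, Attrs, Children} shape of html_node(), here is a minimal sketch that walks such a tree and collects its text. The module html_text_sketch and the function text/1 are hypothetical and not part of z_html_parse. The sketch only follows binary() children and three-element nodes and treats every other element form as empty.

```erlang
-module(html_text_sketch).
-export([text/1]).

%% Collect the text content of an html_node()-shaped term.
%% Hypothetical helper, not part of z_html_parse.
-spec text(term()) -> binary().
text(Bin) when is_binary(Bin) ->
    %% A binary() child is plain text.
    Bin;
text({pi, _Tag, _Attrs}) ->
    %% Processing instructions carry no text content.
    <<>>;
text({_Tag, _Attrs, Children}) when is_list(Children) ->
    %% html_node(): recurse into the children and join the results.
    iolist_to_binary([text(C) || C <- Children]);
text(_Other) ->
    %% Comments, nop nodes and the shorter tuple forms are ignored here.
    <<>>.
```

A call such as html_text_sketch:text({<<"p">>, [], [<<"Hello ">>, {<<"b">>, [], [<<"world">>]}]}) joins the text of the node and its descendants into one binary.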

html_nop/0

-type html_nop() :: {nop, [html_element()]}.

Special node used by the sanitizer for unwanted elements.

html_tag/0

-type html_tag() :: binary() | string() | atom().

html_token/0

-type html_token() ::
          html_data() |
          start_tag() |
          end_tag() |
          pi_tag() |
          inline_html() |
          html_comment() |
          html_doctype().

html_tree/0

-type html_tree() ::
          html_doctype() |
          html_node() |
          html_comment() |
          inline_html() |
          {html_tag()} |
          {html_tag(), [html_element()]} |
          pi_tag().

inline_html/0

-type inline_html() :: {'=', binary()}.

options/0

-type options() :: #{mode => xml | html, escape => boolean(), lowercase => boolean()}.

pi_tag/0

-type pi_tag() :: {pi, binary()} | {pi, Tag :: binary(), [html_attr()]}.

start_tag/0

-type start_tag() :: {start_tag, Name :: binary(), [html_attr()], Singleton :: boolean()}.

Functions

escape(B)

Escape a string so that it is safe for HTML, replacing &, < and > with &amp;, &lt; and &gt;.

escape_attr(B)

Escape a string so that it is safe for HTML attributes, replacing &, <, > and " with &amp;, &lt;, &gt; and &quot;.
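
As a rough illustration of where each escaping function applies, the sketch below builds a small element from untrusted text. The module escape_sketch and the function titled_span/2 are hypothetical. No spec is given for escape/1 or escape_attr/1, so the sketch assumes both return iodata and wraps the result in iolist_to_binary/1.

```erlang
-module(escape_sketch).
-export([titled_span/2]).

%% Build a <span> with a title attribute from untrusted text.
%% Hypothetical helper; only escape/1 and escape_attr/1 come from z_html_parse.
%% Assumption: both functions return iodata.
-spec titled_span(binary(), binary()) -> binary().
titled_span(Title, Text) ->
    iolist_to_binary([
        <<"<span title=\"">>,
        z_html_parse:escape_attr(Title),   % attribute value, double quotes included
        <<"\">">>,
        z_html_parse:escape(Text),         % element content
        <<"</span>">>
    ]).
```

The attribute value goes through escape_attr/1 because a double quote would otherwise end the attribute early. Element content only needs escape/1.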

parse(Input)

-spec parse(iodata()) -> {ok, html_node()} | {error, nohtml}.

Tokenize and then transform the token stream into an HTML tree.

parse(Input, Options)

-spec parse(iodata(), options()) -> {ok, html_node()} | {error, nohtml}.
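
A minimal usage sketch for parse/1 and parse/2 that handles both return values from the specs. The module parse_sketch and its functions are hypothetical. Passing only the mode key assumes that omitted options fall back to defaults, as the optional keys in options() suggest.

```erlang
-module(parse_sketch).
-export([root_tag/1, root_tag_xml/1]).

%% Return the tag of the root node, or 'undefined' when no element is found.
%% Hypothetical helper built on z_html_parse:parse/1.
-spec root_tag(iodata()) -> term().
root_tag(Input) ->
    root_of(z_html_parse:parse(Input)).

%% Same, but requests XML mode through the options() map.
%% Assumption: keys left out of the map use the library defaults.
-spec root_tag_xml(iodata()) -> term().
root_tag_xml(Input) ->
    root_of(z_html_parse:parse(Input, #{mode => xml})).

%% Both result shapes come straight from the parse/1,2 specs.
root_of({ok, {Tag, _Attrs, _Children}}) -> Tag;
root_of({error, nohtml}) -> undefined.
```

Matching on {error, nohtml} keeps input without any element from crashing the caller.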

parse_to_map(Input)

-spec parse_to_map(Input :: iodata() | {binary, list(), list()}) -> {ok, map()} | {error, term()}.

Parse an HTML/XML document to a JSON-compatible map. Attributes are added as keys under an @attributes key. Elements are mapped to keys whose values are lists. All keys are lowercased.

parse_to_map(Input, Options)

-spec parse_to_map(Input :: iodata() | {binary, list(), list()}, options()) ->
                      {ok, map()} | {error, term()}.

Parse an HTML/XML document to a JSON-compatible map. Attributes are added as keys under an @attributes key. Elements are mapped to keys whose values are lists. All keys are lowercased.
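
A small sketch of calling parse_to_map/1 and handling both results from its spec. The module to_map_sketch and the function keys/1 are hypothetical. The sketch only inspects the top-level keys and makes no assumption about the nested layout of the map.

```erlang
-module(to_map_sketch).
-export([keys/1]).

%% Return the top-level keys of the parsed map, or [] when parsing fails.
%% Hypothetical helper built on z_html_parse:parse_to_map/1.
-spec keys(iodata()) -> [term()].
keys(Input) ->
    case z_html_parse:parse_to_map(Input) of
        {ok, Map} when is_map(Map) ->
            maps:keys(Map);
        {error, _Reason} ->
            %% The spec allows any term() as the error reason.
            []
    end.
```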

parse_tokens(Tokens)

-spec parse_tokens([html_token()]) -> {ok, html_node()} | {error, nohtml}.

Transform the output of tokens(Doc) into an HTML tree.
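
One reason to call tokens/1 and parse_tokens/1 separately is to adjust the token stream before the tree is built. The sketch below drops comment tokens first. The module parse_tokens_sketch and its functions are hypothetical helpers, not part of z_html_parse.

```erlang
-module(parse_tokens_sketch).
-export([strip_comments/1, parse_without_comments/1]).

%% Drop {comment, _} tokens (html_comment()) from a token list.
-spec strip_comments([tuple()]) -> [tuple()].
strip_comments(Tokens) ->
    [T || T <- Tokens, not is_comment(T)].

is_comment({comment, _}) -> true;
is_comment(_) -> false.

%% Tokenize, filter, then build the tree from the remaining tokens.
-spec parse_without_comments(iodata()) -> {ok, tuple()} | {error, nohtml}.
parse_without_comments(Input) ->
    Tokens = z_html_parse:tokens(Input),
    z_html_parse:parse_tokens(strip_comments(Tokens)).
```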

to_html(Node)

-spec to_html([html_token()] | html_tree()) -> iodata().

Convert a list of html_token() or an html_tree() to an HTML document.

to_html(Node, Options)

-spec to_html([html_token()] | html_tree(), options()) -> iodata().
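
Because to_html/1 returns iodata(), callers that need a single binary have to flatten the result themselves. The sketch below does that for a hand-built html_node(). The module to_html_sketch and the function render/1 are hypothetical.

```erlang
-module(to_html_sketch).
-export([render/1]).

%% Serialize a tree (or token list) and flatten the iodata() into one binary.
%% Hypothetical helper built on z_html_parse:to_html/1.
-spec render(term()) -> binary().
render(TreeOrTokens) ->
    iolist_to_binary(z_html_parse:to_html(TreeOrTokens)).
```

For example, to_html_sketch:render({<<"p">>, [{<<"class">>, <<"note">>}], [<<"Hi">>]}) serializes a paragraph with one attribute and one text child.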

to_tokens(HtmlNode)

-spec to_tokens(html_tree()) -> [html_token()].

Convert an html_node() tree to a list of tokens.

to_tokens(T, Options)

-spec to_tokens(html_tree(), options()) -> [html_token()].
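
Flattening a tree with to_tokens/1 can make simple queries easier than recursing over nested nodes. As a sketch, the hypothetical function count_elements/1 below counts the start_tag() tokens produced for a tree. The module to_tokens_sketch is likewise not part of z_html_parse.

```erlang
-module(to_tokens_sketch).
-export([count_elements/1]).

%% Count the elements in a tree by counting its start_tag() tokens.
%% Hypothetical helper built on z_html_parse:to_tokens/1.
-spec count_elements(tuple()) -> non_neg_integer().
count_elements(Tree) ->
    Tokens = z_html_parse:to_tokens(Tree),
    %% start_tag() is {start_tag, Name, Attrs, Singleton}.
    length([ok || {start_tag, _Name, _Attrs, _Singleton} <- Tokens]).
```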

tokens(Input)

-spec tokens(iodata()) -> [html_token()].

Transform the input UTF-8 HTML into a token stream.

tokens(Input, Options)

-spec tokens(iodata(), options()) -> [html_token()].
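
The token stream can be inspected directly with pattern matching on the html_token() shapes. As a minimal sketch, the hypothetical module tokens_sketch below lists the names of all start tags in document order.

```erlang
-module(tokens_sketch).
-export([tag_names/1, tag_names_in/1]).

%% Extract the tag names from the start_tag() tokens of a token list.
%% Other token kinds (data, end_tag, comment, ...) are skipped by the pattern.
-spec tag_names([tuple()]) -> [binary()].
tag_names(Tokens) ->
    [Name || {start_tag, Name, _Attrs, _Singleton} <- Tokens].

%% Tokenize the input first, then extract the names.
-spec tag_names_in(iodata()) -> [binary()].
tag_names_in(Input) ->
    tag_names(z_html_parse:tokens(Input)).
```

The Singleton flag of start_tag() is ignored here, so self-closing tags are listed alongside ordinary ones.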