antikythera v0.4.0 Antikythera.Xml

Convenient XML parser module wrapping fast_xml.

decode/2 parses XML into an Antikythera.Xml.Element.t, and encode/2 serializes an Antikythera.Xml.Element.t back into an XML string.

Antikythera.Xml.Element.t is the data structure for an XML element, and it is a JSON-convertible struct. You can safely convert elements to JSON using Poison.encode/2 while keeping the order of appearance of children, and convert them back to Antikythera.Xml.Element.t with Poison.decode/2 and Antikythera.Xml.Element.new/1.
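The round trip described above might look roughly like the following sketch. It assumes Poison is available as a dependency. The return shape of Antikythera.Xml.Element.new/1 is not stated in this section, so the result is left unmatched; a tagged tuple such as {:ok, element} is only an assumption.

```elixir
# Parse a small document (decode!/1 as used in the examples on this page).
element = Antikythera.Xml.decode!("<a id='1'>foo<b>bar</b>baz</a>")

# Serialize to JSON; Poison.encode/2 returns {:ok, json} on success.
{:ok, json} = Poison.encode(element)

# Decode the JSON into a plain map, then rebuild the element struct from it.
{:ok, map} = Poison.decode(json)
restored = Antikythera.Xml.Element.new(map)

# Assumption: `restored` is, or wraps, an element equivalent to `element`.
IO.inspect(restored)
```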

Note that the order of attributes is not preserved, since attribute order is not significant in XML.

Namespace prefixes of tags (e.g. "ns" in <ns:tag>) are kept as is in the :name of elements.

Namespace definitions (e.g. xmlns:ns='http://example.com/ns') are treated as plain attributes, and kept as is in the :attributes of elements.
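A small sketch of what the two namespace rules imply in practice. The document and the "ns:child" tag are made up for illustration, and the values in the comments are what the rules above lead one to expect rather than captured output.

```elixir
xml = "<ns:tag xmlns:ns='http://example.com/ns'><ns:child/></ns:tag>"
element = Antikythera.Xml.decode!(xml)

# The prefix stays in the element name.
element[:name]
# Expected from the rules above: "ns:tag"

# The namespace definition is an ordinary entry in the attributes map.
element[:attributes]
# Expected from the rules above: %{"xmlns:ns" => "http://example.com/ns"}

# Child lookups therefore need the prefixed name as well.
element["ns:child"]
```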

Access behaviour

Antikythera.Xml.Element implements the Access behaviour for convenient lookups and updates. The following access patterns are available:

  • element[:name], element[:attributes], element[:children]
    • Fetch values of fields in dynamic lookup style.
  • element["@some_attr"]
    • Fetch value of "some_attr" in :attributes map.
  • element[:texts]
    • Fetch text (character data) children. Always returns a list.
  • element["some_name"]
    • Fetch child elements with name: "some_name". Always returns a list.

You can also use these patterns in Kernel.get_in/2 and its variants.

iex> xml = "<a>foo<b>bar</b>baz</a>"
iex> element = Antikythera.Xml.decode!(xml)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
  "foo",
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
  "baz",
]}
iex> get_in(element, [:texts])
["foo", "baz"]
iex> get_in(element, ["b", Access.at(0), :texts])
["bar"]
iex> get_and_update_in(element, [:children, Access.at(0)], fn _ -> :pop end)
{"foo",
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
  "baz",
]}}
iex> update_in(element, [:children, Access.all()], fn
...>   text when is_binary(text) -> %Antikythera.Xml.Element{name: "b", attributes: %{}, children: [text]}
...>   e -> e
...> end)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["foo"]},
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["baz"]},
]}
iex> update_in(element, ["@id"], fn _ -> "001" end)
%Antikythera.Xml.Element{name: "a", attributes: %{"id" => "001"}, children: [
  "foo",
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
  "baz",
]}

Notes on updating with Kernel.get_and_update_in/3 and its variants:

  • Struct fields are static and cannot be popped.
  • Custom access keys other than "@some_attr" cannot be used for updating. Use :children instead to update children while preserving their order of appearance.

Summary

Functions

decode(xml_string, opts \\ [])

Reads an XML string and parses it into Antikythera.Xml.Element.t.

encode(xml_element, opts \\ [])

Serializes Antikythera.Xml.Element.t into an XML string.

Types

Specs

decode_option() :: {:trim, boolean()}

Specs

encode_option() :: {:pretty | :with_header, boolean()}

Functions

decode(xml_string, opts \\ [])

Reads an XML string and parses it into Antikythera.Xml.Element.t.

Comments and the header are discarded.

It can read XHTML documents as long as they are well-formed. However, it does not understand Document Type Definitions (DTD, the header line starting with "<!DOCTYPE html PUBLIC ..."), so you must remove them before parsing.

It always tries to read the document as UTF-8, regardless of the "encoding" attribute in the header.

Options:

  • :trim - Drop whitespace-only texts. Defaults to false.
    • There is no universal way to distinguish significant from insignificant whitespace, so this option may alter the meaning of the original document. Use with caution.
    • The W3C recommendation states that whitespace texts (character data) are basically significant and must be preserved.
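To illustrate the effect of :trim, here is a minimal sketch on an indented document. It assumes decode!/2 accepts the same options as decode/2 (only decode!/1 appears in the examples on this page). The comments describe what the option description implies, not captured output.

```elixir
xml = "<a>\n  <b>bar</b>\n</a>"

# Default (trim: false): whitespace-only texts are kept as children of <a>.
untrimmed = Antikythera.Xml.decode!(xml)

# With trim: true, whitespace-only texts are dropped.
# Assumption: decode!/2 takes the same options as decode/2.
trimmed = Antikythera.Xml.decode!(xml, trim: true)

untrimmed[:texts] # whitespace-only strings are expected here
trimmed[:texts]   # expected to be [], since <a> has no other text
trimmed["b"]      # the <b> element is still present
```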

encode(xml_element, opts \\ [])

Serializes Antikythera.Xml.Element.t into an XML string.

Specifications:

  • No trailing newline is generated.
  • All single and double quotation marks in attribute values or entity values are escaped to &apos; and &quot;, respectively.
  • All attribute values are SINGLE-quoted.
  • No whitespace is inserted before "/>" in elements without children.
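As a small sketch of these rules taken together, the element below has one attribute containing a single quote and no children. The "title" attribute is made up for illustration, and the snippet assumes encode/2 returns the XML binary directly, which is not stated in this section.

```elixir
element = %Antikythera.Xml.Element{
  name: "a",
  attributes: %{"title" => "it's"},
  children: []
}

# Assumption: encode/2 returns the XML binary itself, not a tagged tuple.
xml = Antikythera.Xml.encode(element)
IO.puts(xml)
# Following the rules above, this should print: <a title='it&apos;s'/>
```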

Options:

  • :pretty - Pretty-print with 2-space indents. Defaults to false.
    • As with the :trim option in decode/2, inserted whitespace may be significant, so this option can alter the meaning of the original document. Use with caution.
    • It does not insert whitespace into elements with mixed content or their descendants, in order to reduce the chance of altering the meaning of the original document.
  • :with_header - Prepend <?xml version='1.0' encoding='UTF-8'?>\n to the output. Defaults to false.
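The options can be tried out along these lines. This is a sketch that again assumes encode/2 returns the XML binary directly. The exact layout of the pretty-printed output is left to the library, so no expected output is shown for it.

```elixir
element = Antikythera.Xml.decode!("<a><b>bar</b><c/></a>")

# Defaults: compact output, no header, no trailing newline.
compact = Antikythera.Xml.encode(element)

# Indented output; safe here because no element has mixed content.
pretty = Antikythera.Xml.encode(element, pretty: true)

# Prepend the XML header line described above.
with_header = Antikythera.Xml.encode(element, with_header: true)
String.starts_with?(with_header, "<?xml version='1.0' encoding='UTF-8'?>\n")
```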