antikythera v0.2.0 Antikythera.Xml
Convenient XML parser module wrapping fast_xml.
decode/2 parses an XML string into Antikythera.Xml.Element.t, and encode/2 serializes Antikythera.Xml.Element.t back into an XML string.
Antikythera.Xml.Element.t is the XML element data structure, and it is a JSON-convertible struct. You can safely convert it to JSON using Poison.encode/2 while keeping the order of appearance of children, and convert it back to Antikythera.Xml.Element.t with Poison.decode/2 and Antikythera.Xml.Element.new/1.
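Note that the order of attributes is not preserved, since it is not significant: the XML 1.0 specification states that the order of attribute specifications in a start-tag or empty-element tag is not significant.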
Namespace prefixes of tags (e.g. “ns” in <ns:tag>) are kept as is in the :name of elements.
Namespace definitions (e.g. xmlns:ns='http://example.com/ns') are treated as plain attributes and kept as is in the :attributes of elements.
Access behaviour
Antikythera.Xml.Element implements the Access behaviour for convenient lookups and updates.
The following access patterns are available:
- element[:name], element[:attributes], element[:children] - Fetch the values of the struct fields in dynamic lookup style.
- element["@some_attr"] - Fetch the value of “some_attr” in the :attributes map.
- element[:texts] - Fetch the text (character data) children. It always returns a list.
- element["some_name"] - Fetch the child elements with name: "some_name". It always returns a list.
You can also use these patterns in Kernel.get_in/2 and its variants.
iex> xml = "<a>foo<b>bar</b>baz</a>"
iex> element = Antikythera.Xml.decode!(xml)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
"foo",
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}
iex> get_in(element, [:texts])
["foo", "baz"]
iex> get_in(element, ["b", Access.at(0), :texts])
["bar"]
iex> get_and_update_in(element, [:children, Access.at(0)], fn _ -> :pop end)
{"foo",
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}}
iex> update_in(element, [:children, Access.all()], fn
...> text when is_binary(text) -> %Antikythera.Xml.Element{name: "b", attributes: %{}, children: [text]}
...> e -> e
...> end)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["foo"]},
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["baz"]},
]}
iex> update_in(element, ["@id"], fn _ -> "001" end)
%Antikythera.Xml.Element{name: "a", attributes: %{"id" => "001"}, children: [
"foo",
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}
Notes on updating with Kernel.get_and_update_in/3 and its variants:
- Struct fields are static and cannot be popped.
- Custom access keys other than “@some_attr” cannot be used for updating. Use :children instead to update children while preserving their order of appearance.
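The following is a minimal sketch of how a struct can implement the Access behaviour to support the keys listed above, including the update restrictions from the notes. MiniXml.Element is a hypothetical, simplified stand-in written only for illustration; Antikythera.Xml.Element's actual implementation may differ, for example in how it reports errors.

```elixir
defmodule MiniXml.Element do
  # Hypothetical, simplified stand-in for Antikythera.Xml.Element, used only
  # to illustrate how the custom Access keys described above can be served.
  @behaviour Access

  defstruct name: "", attributes: %{}, children: []

  @impl Access
  # Struct fields are read directly.
  def fetch(%__MODULE__{} = e, key) when key in [:name, :attributes, :children],
    do: {:ok, Map.fetch!(e, key)}

  # :texts returns all character-data children, always as a list.
  def fetch(%__MODULE__{children: cs}, :texts),
    do: {:ok, Enum.filter(cs, &is_binary/1)}

  def fetch(%__MODULE__{attributes: attrs, children: cs}, key) when is_binary(key) do
    case key do
      # "@some_attr" looks up "some_attr" in the attributes map.
      "@" <> attr -> Map.fetch(attrs, attr)
      # "some_name" returns all child elements with that name, always as a list.
      name -> {:ok, Enum.filter(cs, fn child -> match?(%__MODULE__{name: ^name}, child) end)}
    end
  end

  def fetch(_e, _key), do: :error

  @impl Access
  # "@some_attr" keys support updating and popping the attribute.
  def get_and_update(%__MODULE__{attributes: attrs} = e, "@" <> attr, fun) do
    case fun.(Map.get(attrs, attr)) do
      {current, new_value} -> {current, %{e | attributes: Map.put(attrs, attr, new_value)}}
      :pop -> {Map.get(attrs, attr), %{e | attributes: Map.delete(attrs, attr)}}
    end
  end

  # Struct fields can be updated, but not popped.
  def get_and_update(%__MODULE__{} = e, key, fun) when key in [:name, :attributes, :children] do
    case fun.(Map.fetch!(e, key)) do
      {current, new_value} -> {current, Map.put(e, key, new_value)}
      :pop -> raise ArgumentError, "cannot pop struct field #{inspect(key)}"
    end
  end

  # Other custom keys (:texts, "some_name") are lookup-only.
  def get_and_update(_e, key, _fun),
    do: raise(ArgumentError, "key #{inspect(key)} cannot be used for updating")

  @impl Access
  def pop(%__MODULE__{attributes: attrs} = e, "@" <> attr),
    do: {Map.get(attrs, attr), %{e | attributes: Map.delete(attrs, attr)}}

  def pop(_e, key), do: raise(ArgumentError, "key #{inspect(key)} cannot be popped")
end
```

With this sketch, element[:texts], get_in(element, ["b", Access.at(0), :texts]) and update_in(element, ["@id"], fn _ -> "001" end) behave like the examples above, while updating through a key such as "b" raises ArgumentError.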
Summary
Functions
decode/2 - Reads an XML string and parses it into Antikythera.Xml.Element.t.
encode/2 - Serializes Antikythera.Xml.Element.t into an XML string.
Types
encode_option() :: {:pretty | :with_header, boolean()}
Functions
decode(String.t(), [decode_option()]) :: Croma.Result.t(Antikythera.Xml.Element.t())
Reads an XML string and parses it into Antikythera.Xml.Element.t.
Comments and the header are discarded.
It can read XHTML documents as long as they are well-formed, though it does not understand Document Type Definitions (DTD, i.e. the header line starting with “<!DOCTYPE html PUBLIC …”), so you must remove them beforehand.
It reads the document as UTF-8, regardless of the “encoding” attribute in the header.
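Since the DOCTYPE line has to be removed before decoding, a naive preprocessing step could look like the sketch below. MiniXml.Doctype.strip/1 is a hypothetical helper, not part of Antikythera, and the regex assumes the DOCTYPE has no internal subset containing “>”.

```elixir
defmodule MiniXml.Doctype do
  # Hypothetical helper (not part of Antikythera). It naively removes the first
  # <!DOCTYPE ...> declaration plus any whitespace that follows it. It does not
  # handle an internal subset ("[...]") that itself contains ">".
  def strip(xml) when is_binary(xml) do
    Regex.replace(~r/<!DOCTYPE[^>]*>\s*/i, xml, "", global: false)
  end
end
```

The result could then be passed to Antikythera.Xml.decode/2.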
Options:
:trim - Drop whitespace-only texts. Default false.
  - There is no universal way to distinguish significant whitespace from insignificant whitespace, so this option may alter the meaning of the original document. Use with caution.
  - The W3C recommendation states that whitespace texts (character data) are basically significant and must be preserved.
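To illustrate the effect of :trim, here is a sketch that recursively drops whitespace-only text children. It models elements as plain maps rather than Antikythera.Xml.Element structs, and treating space, tab, CR and LF as the whitespace set is an assumption taken from the XML definition of whitespace, not necessarily what Antikythera does.

```elixir
defmodule MiniXml.Trim do
  # Hypothetical sketch of a :trim-like transformation: recursively drop text
  # children consisting only of XML whitespace (space, tab, CR, LF).
  # Elements are modeled as plain maps %{name:, attributes:, children:}.
  def trim(%{children: children} = element) do
    kept =
      children
      |> Enum.reject(&whitespace_only?/1)
      |> Enum.map(fn
        child when is_binary(child) -> child
        child -> trim(child)
      end)

    %{element | children: kept}
  end

  defp whitespace_only?(text) when is_binary(text), do: String.match?(text, ~r/\A[ \t\r\n]*\z/)
  defp whitespace_only?(_element), do: false
end
```

For example, in <p>a <b>b</b> <i>c</i></p> the single space between </b> and <i> is whitespace-only and would be dropped, so the rendered text changes from "a b c" to "a bc". This is the kind of alteration the caution above refers to.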
decode!(String.t(), [decode_option()]) :: Antikythera.Xml.Element.t()
encode(Antikythera.Xml.Element.t(), [encode_option()]) :: String.t()
Serializes Antikythera.Xml.Element.t into an XML string.
Specifications:
- No trailing newline is generated.
- All single and double quotation marks in attribute values or entity values are escaped to &apos; and &quot; respectively.
- All attribute values are SINGLE-quoted.
- No whitespace is inserted before “/>” in elements without children.
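The attribute rules above can be sketched as follows. MiniXml.Attr is hypothetical; escaping &, < and > in addition to the two quotation marks is an assumption made so that the output stays well-formed, and is not stated in the specification list above.

```elixir
defmodule MiniXml.Attr do
  # Hypothetical sketch of the attribute rules listed above: values are
  # SINGLE-quoted, and quotes are escaped to &apos; / &quot;. Escaping "&",
  # "<" and ">" as well is an assumption added so the output stays well-formed.
  @escapes [{"&", "&amp;"}, {"<", "&lt;"}, {">", "&gt;"}, {"'", "&apos;"}, {"\"", "&quot;"}]

  def escape(value) when is_binary(value) do
    # "&" is replaced first so the entities introduced later are not re-escaped.
    Enum.reduce(@escapes, value, fn {char, entity}, acc -> String.replace(acc, char, entity) end)
  end

  def encode_attrs(attrs) when is_map(attrs) do
    Enum.map_join(attrs, fn {name, value} -> " #{name}='#{escape(value)}'" end)
  end

  # An element without children is closed with "/>" and no preceding whitespace.
  def empty_tag(name, attrs), do: "<#{name}#{encode_attrs(attrs)}/>"
end
```

Single-quoting every value means only the escaping of ' is strictly required inside values; escaping " as well keeps the output independent of the quoting style.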
Options:
:pretty - Pretty print with 2-space indents. Default false.
  - Similar to the :trim option in decode/2, inserted whitespace may be significant, so this option can alter the meaning of the original document. Use with caution.
  - It does not insert whitespace into elements with mixed content or into their descendants, in order to reduce the probability of altering the meaning of the original document.
:with_header - Prepend <?xml version='1.0' encoding='UTF-8'?>\n. Default false.
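A simplified sketch of how :pretty and :with_header could interact with mixed content is shown below. MiniXml.Pretty is hypothetical: it models elements as plain maps, omits attributes and text escaping to keep the focus on indentation, and assumes that text-only elements are also emitted without inner whitespace.

```elixir
defmodule MiniXml.Pretty do
  # Hypothetical sketch of :pretty and :with_header. Elements are plain maps
  # %{name:, attributes:, children:}; attributes are omitted and text is not
  # escaped, to keep the focus on indentation.
  @header "<?xml version='1.0' encoding='UTF-8'?>\n"

  def encode(element, opts \\ []) do
    body =
      if Keyword.get(opts, :pretty, false),
        do: pretty(element, 0),
        else: compact(element)

    if Keyword.get(opts, :with_header, false), do: @header <> body, else: body
  end

  # Compact form: no whitespace is added anywhere, and no trailing newline.
  defp compact(text) when is_binary(text), do: text
  defp compact(%{name: n, children: []}), do: "<#{n}/>"
  defp compact(%{name: n, children: cs}), do: "<#{n}>#{Enum.map_join(cs, &compact/1)}</#{n}>"

  # Pretty form with 2-space indents.
  defp pretty(%{children: cs} = e, depth) do
    pad = String.duplicate("  ", depth)

    cond do
      # Mixed content: the element and all its descendants are emitted compactly.
      mixed?(cs) ->
        pad <> compact(e)

      # Empty or text-only elements: adding whitespace inside would change the text.
      cs == [] or Enum.all?(cs, &is_binary/1) ->
        pad <> compact(e)

      # Element-only children: one child per line, indented one level deeper.
      true ->
        inner = Enum.map_join(cs, "\n", &pretty(&1, depth + 1))
        "#{pad}<#{e.name}>\n#{inner}\n#{pad}</#{e.name}>"
    end
  end

  defp mixed?(cs), do: Enum.any?(cs, &is_binary/1) and Enum.any?(cs, &is_map/1)
end
```

Because any element with a text child is emitted compactly in this sketch, whitespace is only inserted between sibling elements, which is where it is least likely to be significant.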