Relyra.Security.XML.SaxyTree (relyra v1.2.0)


A Saxy.Handler that turns a raw SAML XML binary into a structured parse tree. Each element node carries its verbatim qualified name, its raw attributes in document order, and a computed in-scope namespace stack inherited from ancestors.

This module is the parse-substrate foundation for Phase 28. It applies the three Relyra-owned infoset-normalization layers that Saxy does not provide (Saxy performs zero namespace resolution, zero attribute-value normalization, and zero line-ending normalization):

  1. In-scope namespace stack (xml-exc-c14n visibly-utilized precondition): each element node's :ns map = the parent's in-scope map overlaid with the element's own xmlns / xmlns:prefix declarations.
  2. Attribute-value whitespace normalization (XML 1.0 §3.3.3, CDATA-type rule — SAML is DTD-less so every attribute is CDATA-type): each literal #x9 (tab) / #xA (LF) / #xD (CR) inside an attribute value becomes a single #x20 (space).
  3. Line-ending normalization (XML 1.0 §2.11): \r\n and a lone \r become \n in all parsed text / CDATA content.
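The three layers can be sketched as plain functions. This is a minimal illustration under stated assumptions, not the module's actual code: `NormalizationSketch` and its function names are hypothetical, and the sketch does not decide details the text above leaves open (for example, how an `xmlns=""` undeclaration is represented, or whether a literal CRLF inside an attribute value ends up as one space or two).

```elixir
defmodule NormalizationSketch do
  @moduledoc false
  # Hypothetical helper module for illustration only; not part of Relyra.

  # Layer 1: overlay the element's own xmlns / xmlns:prefix declarations
  # on the in-scope map inherited from the parent. The default namespace
  # is stored under the "" key. Other attributes are ignored here.
  def in_scope_ns(parent_ns, attrs) do
    Enum.reduce(attrs, parent_ns, fn
      {"xmlns", uri}, acc -> Map.put(acc, "", uri)
      {"xmlns:" <> prefix, uri}, acc -> Map.put(acc, prefix, uri)
      _other, acc -> acc
    end)
  end

  # Layer 2: each literal tab, LF, or CR in a CDATA-type attribute value
  # becomes exactly one space. Runs are NOT collapsed.
  def normalize_attr_value(value) do
    String.replace(value, ["\t", "\n", "\r"], " ")
  end

  # Layer 3: CRLF and a lone CR both become LF in text content.
  # CRLF is handled first so that it yields one LF, not two.
  def normalize_line_endings(text) do
    text
    |> String.replace("\r\n", "\n")
    |> String.replace("\r", "\n")
  end
end

NormalizationSketch.in_scope_ns(
  %{"" => "urn:example:default"},
  [{"xmlns:ds", "http://www.w3.org/2000/09/xmldsig#"}, {"ID", "_1"}]
)
# => %{"" => "urn:example:default", "ds" => "http://www.w3.org/2000/09/xmldsig#"}

NormalizationSketch.normalize_attr_value("a\tb\nc")
# => "a b c"

NormalizationSketch.normalize_line_endings("a\r\nb\rc")
# => "a\nb\nc"
```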

These are infoset normalizations applied at tree-build time. They are kept STRICTLY SEPARATE from C14N escaping (e.g. &#x9; / &#xA;), which is a serialize-time concern owned by the exclusive-C14N engine in a later plan.
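For contrast, serialize-time escaping of an attribute value might look like the sketch below. It follows the attribute-value escaping rules of Canonical XML (`&`, `<`, `"`, tab, LF, CR). The module and function names are hypothetical, and the real engine belongs to a later plan and may be structured differently. The point is only that this step runs on output, whereas the normalizations above run while the tree is built.

```elixir
defmodule C14NEscapeSketch do
  @moduledoc false
  # Hypothetical illustration; NOT the exclusive-C14N engine itself.

  # Escape an attribute value for canonical output. "&" is replaced
  # first so that the ampersands introduced by later replacements are
  # not escaped a second time.
  def escape_attr(value) do
    value
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace("\"", "&quot;")
    |> String.replace("\t", "&#x9;")
    |> String.replace("\n", "&#xA;")
    |> String.replace("\r", "&#xD;")
  end
end

C14NEscapeSketch.escape_attr("a\tb")
# => "a&#x9;b"
```

A tab can only reach this step if it is present in the tree's attribute value; a literal tab in the source would already have become a space under layer 2.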

Tree-node shape (the contract for Plans 02 and 03)

The tree is built from Relyra.Security.XML.SaxyTree.Node structs. This shape is the stable interface the exclusive-C14N engine (Plan 02) and the seam re-wiring (Plan 03) build against — do not reshape it without updating those plans.

%Relyra.Security.XML.SaxyTree.Node{
  qname:    String.t(),                      # verbatim qualified name, e.g. "ds:Signature" or "Assertion"
  prefix:   String.t(),                      # derived namespace prefix; "" when the element is unprefixed
  local:    String.t(),                      # derived local name (qname with the prefix stripped)
  attrs:    [{String.t(), String.t()}],      # raw attributes in DOCUMENT ORDER; each value is
                                             #   attribute-value normalized (layer #2). xmlns / xmlns:*
                                             #   declarations are retained here verbatim (as attrs) so the
                                             #   C14N engine can render them; they are ALSO surfaced in :ns.
  ns:       %{optional(String.t()) => String.t()},
                                             # in-scope namespace map: prefix => URI. The default namespace
                                             #   uses the "" key. Inherited from ancestors + this element's
                                             #   own declarations (layer #1).
  content:  [{:text, String.t()} | {:element, t()}],
                                             # ORDERED document-order content (D-09): interleaved text
                                             #   segments and child elements in source order, e.g.
                                             #   `[{:text, "x"}, {:element, %Node{}}, {:text, "y"}]`. This is
                                             #   the SINGLE SOURCE OF TRUTH for document order, consumed by
                                             #   the exclusive-C14N engine (Plan 02) so text and child
                                             #   elements canonicalize in source order (mixed content /
                                             #   inter-element whitespace). Text segments are line-ending
                                             #   normalized (layer #3), not whitespace-collapsed.
  children: [t()],                           # DERIVED view: the `{:element, _}` segments of `content`, in
                                             #   document order (kept for downstream helpers; unchanged shape).
  text:     String.t()                       # DERIVED view: concatenation of the `{:text, _}` segments of
                                             #   `content`, line-ending normalized (layer #3), in document
                                             #   order; NOT whitespace-collapsed.
}
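
As a small illustration of how `prefix` and `local` relate to `qname`, the split could be written as below. This is a sketch only: `QNameSketch.split_qname/1` is a hypothetical name, and the module's actual derivation may differ.

```elixir
defmodule QNameSketch do
  @moduledoc false
  # Hypothetical helper for illustration only.

  # Split a verbatim qualified name into {prefix, local}.
  # An unprefixed name yields "" as its prefix.
  def split_qname(qname) do
    case String.split(qname, ":", parts: 2) do
      [prefix, local] -> {prefix, local}
      [local] -> {"", local}
    end
  end
end

QNameSketch.split_qname("ds:Signature")
# => {"ds", "Signature"}

QNameSketch.split_qname("Assertion")
# => {"", "Assertion"}
```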

content vs children/text (D-09): content is the ordered single source of truth; children and text are DERIVED views over it (the element segments and the concatenated text segments respectively), kept byte-identical to their pre-D-09 values so downstream helpers (Relyra.Security.XML.PureBeam field derivation) need no change.
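
The relationship between `content` and its two derived views can be sketched as follows. The helper module is hypothetical, and plain maps stand in for `Node` structs so the example runs on its own.

```elixir
defmodule ContentViewsSketch do
  @moduledoc false
  # Hypothetical helpers for illustration only.

  # children: the {:element, _} segments of content, in document order.
  def children(content) do
    for {:element, node} <- content, do: node
  end

  # text: the {:text, _} segments of content concatenated in document
  # order, with no whitespace collapsing.
  def text(content) do
    for {:text, segment} <- content, into: "", do: segment
  end
end

# Stand-in for a %Node{}; a real tree would hold Node structs here.
child = %{qname: "b"}
content = [{:text, "x"}, {:element, child}, {:text, "y"}]

ContentViewsSketch.children(content)
# => [%{qname: "b"}]

ContentViewsSketch.text(content)
# => "xy"
```

Note that `text` alone cannot say where the child element sat relative to "x" and "y"; only `content` preserves that ordering.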

Functions

parse(xml)

@spec parse(binary()) :: {:ok, t()} | {:error, Saxy.ParseError.t()}

Parse an XML binary into a Relyra.Security.XML.SaxyTree.Node tree.

Returns {:ok, root_node} for well-formed input, or {:error, %Saxy.ParseError{}} for input Saxy rejects as not well-formed. Callers in the seam (Plan 03) map the Saxy.ParseError to the existing :malformed_xml member of the Relyra.Security.XML.xml_error_type union — no new error atom is introduced.
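
A caller-side mapping along these lines might look like the sketch below. `SeamSketch` is hypothetical, and the `{:error, :malformed_xml}` return shape is an assumption: the text above only states that the error maps to the `:malformed_xml` member of the union. The parse function is passed in as an argument so the sketch runs without Relyra or Saxy loaded.

```elixir
defmodule SeamSketch do
  @moduledoc false
  # Hypothetical seam helper for illustration only.

  # `parse_fun` stands in for Relyra.Security.XML.SaxyTree.parse/1.
  # Any parse error is folded into :malformed_xml; the error tuple
  # shape is an assumption, not taken from the Relyra source.
  def parse_or_malformed(xml, parse_fun) when is_binary(xml) do
    case parse_fun.(xml) do
      {:ok, root} -> {:ok, root}
      {:error, _parse_error} -> {:error, :malformed_xml}
    end
  end
end
```

In an application with Relyra available, the call would presumably be `SeamSketch.parse_or_malformed(xml, &Relyra.Security.XML.SaxyTree.parse/1)`.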