Meeseeks v0.5.0 Meeseeks.Document

A Meeseeks.Document represents a flattened, queryable view of an HTML document in which:

  • Each node (element, comment, or text) has been assigned an id
  • Parent-child relationships have been made explicit
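The flattening described above can be pictured with a minimal sketch. It is not Meeseeks' parser: the module name FlattenSketch and the plain maps it produces are hypothetical, and it handles only elements and text, numbering nodes depth-first from 1.

```elixir
# Hypothetical sketch: not Meeseeks' parser, only an illustration of the idea.
defmodule FlattenSketch do
  # Flattens a {tag, attributes, children} tuple tree into a map of
  # id => node, assigning ids depth-first starting at 1.
  def flatten(tree) do
    {_root_id, nodes, _last_id} = add(tree, nil, %{}, 0)
    nodes
  end

  # Text node: a bare string. It records its parent but has no children.
  defp add(text, parent, nodes, last_id) when is_binary(text) do
    id = last_id + 1
    {id, Map.put(nodes, id, %{id: id, parent: parent, content: text}), id}
  end

  # Element node: register it first, then flatten its children in order.
  defp add({tag, attributes, children}, parent, nodes, last_id) do
    id = last_id + 1
    element = %{id: id, parent: parent, tag: tag, attributes: attributes, children: []}
    nodes = Map.put(nodes, id, element)

    {child_ids, nodes, last_id} =
      Enum.reduce(children, {[], nodes, id}, fn child, {ids, acc, last} ->
        {child_id, acc, last} = add(child, id, acc, last)
        {[child_id | ids], acc, last}
      end)

    # Child ids were collected in reverse, so restore document order.
    nodes = Map.update!(nodes, id, fn el -> %{el | children: Enum.reverse(child_ids)} end)
    {id, nodes, last_id}
  end
end

tree = {"div", [], [{"p", [], ["1"]}, {"p", [], ["2"]}]}
nodes = FlattenSketch.flatten(tree)

IO.inspect(nodes[1].children) # prints [2, 4]
IO.inspect(nodes[3].parent)   # prints 2
```

Storing both a parent id and a list of child ids on each node is what lets relationships be looked up directly instead of re-walking a nested tree.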

Examples

The full contents of a document quickly become unwieldy in iex, so the inspect value of a document is always #Meeseeks.Document<{...}> regardless of its content. The example below ignores this for educational purposes.

tuple_tree = {"html", [],
               [{"head", [], []},
                {"body", [],
                 [{"h1", [{"id", "greeting"}], ["Hello, World!"]},
                  {"div", [], [
                      {"p", [], ["1"]},
                      {"p", [], ["2"]},
                      {"p", [], ["3"]}]}]}]}

document = Meeseeks.Parser.parse(tuple_tree)
#=> %Meeseeks.Document{
#      id_counter: 12,
#      roots: [1],
#      nodes: %{
#        1 => %Meeseeks.Document.Element{attributes: [], children: [3, 2],
#         id: 1, namespace: nil, parent: nil, tag: "html"},
#        2 => %Meeseeks.Document.Element{attributes: [], children: [], id: 2,
#         namespace: nil, parent: 1, tag: "head"},
#        3 => %Meeseeks.Document.Element{attributes: [], children: [6, 4], id: 3,
#         namespace: nil, parent: 1, tag: "body"},
#        4 => %Meeseeks.Document.Element{attributes: [{"id", "greeting"}],
#         children: [5], id: 4, namespace: nil, parent: 3, tag: "h1"},
#        5 => %Meeseeks.Document.Text{content: "Hello, World!", id: 5, parent: 4},
#        6 => %Meeseeks.Document.Element{attributes: [], children: [7, 9, 11],
#         id: 6, namespace: nil, parent: 3, tag: "div"},
#        7 => %Meeseeks.Document.Element{attributes: [], children: [8], id: 7,
#         namespace: nil, parent: 6, tag: "p"},
#        8 => %Meeseeks.Document.Text{content: "1", id: 8, parent: 7},
#        9 => %Meeseeks.Document.Element{attributes: [], children: [10], id: 9,
#         namespace: nil, parent: 6, tag: "p"},
#        10 => %Meeseeks.Document.Text{content: "2", id: 10, parent: 9},
#        11 => %Meeseeks.Document.Element{attributes: [], children: [12], id: 11,
#         namespace: nil, parent: 6, tag: "p"},
#        12 => %Meeseeks.Document.Text{content: "3", id: 12, parent: 11}}}

Meeseeks.Document.children(document, 6)
#=> [7, 9, 11]

Meeseeks.Document.descendants(document, 6)
#=> [7, 8, 9, 10, 11, 12]

Summary

Functions

ancestors(document, node_id)
Returns the node ids of node_id's ancestors in the context of the document

children(document, node_id)
Returns the node ids of node_id's children in the context of the document

descendants(document, node_id)
Returns the node ids of node_id's descendants in the context of the document

element?(document, node_id)
Checks if a node_id refers to a Meeseeks.Document.Element in the context of the document

get_node(document, node_id)
Returns the node referred to by node_id in the context of the document

get_nodes(document)
Returns all of the document's nodes

get_nodes(document, node_ids)
Returns the nodes referred to by node_ids in the context of the document

next_siblings(document, node_id)
Returns the node ids of the siblings that come after node_id in the context of the document

parent(document, node_id)
Returns the node id of node_id's parent in the context of the document, or nil if node_id does not have a parent

previous_siblings(document, node_id)
Returns the node ids of the siblings that come before node_id in the context of the document

siblings(document, node_id)
Returns the node ids of node_id's siblings in the context of the document

Types

node_id()
node_id() :: integer
node_t()
node_t() :: Meeseeks.Document.Node.t
t()
t() :: %Meeseeks.Document{id_counter: node_id | nil, nodes: %{optional(node_id) => node_t}, roots: [node_id]}

Functions

ancestors(document, node_id)
ancestors(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of node_id's ancestors in the context of the document.

Returns the ancestors in reverse document order, nearest first: [parent, grandparent, ...]
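The nearest-first ordering falls out naturally from following parent links upward. The sketch below illustrates that idea only: AncestorsSketch and the parents map are hypothetical stand-ins, not the library's implementation, with ids borrowed from the example document.

```elixir
defmodule AncestorsSketch do
  # Follows parent links upward, collecting ids nearest-first.
  def ancestors(parents, node_id) do
    case Map.fetch!(parents, node_id) do
      nil -> []
      parent_id -> [parent_id | ancestors(parents, parent_id)]
    end
  end
end

# Hypothetical stand-in for a document: node id => parent id.
# text "1" (8) -> p (7) -> div (6) -> body (3) -> html (1)
parents = %{1 => nil, 3 => 1, 6 => 3, 7 => 6, 8 => 7}

# charlists: :as_lists keeps small integer ids from printing as a charlist.
IO.inspect(AncestorsSketch.ancestors(parents, 8), charlists: :as_lists) # prints [7, 6, 3, 1]
```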

children(document, node_id)
children(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of node_id's children in the context of the document.

Returns all children, not just those that are Meeseeks.Document.Elements.

Returns children in depth-first order.

descendants(document, node_id)
descendants(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of node_id's descendants in the context of the document.

Returns all descendants, not just those that are Meeseeks.Document.Elements.

Returns descendants in depth-first order.
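Depth-first order here means each child is listed immediately before its own descendants. A minimal sketch of that traversal follows; DescendantsSketch and the children map are hypothetical stand-ins that mirror the div subtree from the example, not the library's code.

```elixir
defmodule DescendantsSketch do
  # Pre-order walk: each child is emitted, followed by its own descendants,
  # before the next child is visited.
  def descendants(children, node_id) do
    children
    |> Map.fetch!(node_id)
    |> Enum.flat_map(fn child_id -> [child_id | descendants(children, child_id)] end)
  end
end

# Hypothetical stand-in for a document: node id => ordered child ids.
children = %{
  6 => [7, 9, 11],
  7 => [8],
  8 => [],
  9 => [10],
  10 => [],
  11 => [12],
  12 => []
}

# charlists: :as_lists keeps small integer ids from printing as a charlist.
IO.inspect(DescendantsSketch.descendants(children, 6), charlists: :as_lists)
# prints [7, 8, 9, 10, 11, 12]
```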

element?(document, node_id)
element?(Meeseeks.Document.t, node_id) :: boolean

Checks if a node_id refers to a Meeseeks.Document.Element in the context of the document.
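One way to picture this check is as a lookup followed by a struct match. In the sketch below, SketchElement and SketchText are hypothetical stand-ins for the real node structs, and returning false for an unknown id is an assumption of the sketch rather than documented behaviour.

```elixir
# Hypothetical stand-ins for Meeseeks.Document.Element and Meeseeks.Document.Text.
defmodule SketchElement do
  defstruct [:id, :tag]
end

defmodule SketchText do
  defstruct [:id, :content]
end

defmodule ElementCheckSketch do
  # True only when the id resolves to an element struct. An unknown id
  # yields nil from Map.get/2, which does not match, so the result is false.
  def element?(nodes, node_id) do
    match?(%SketchElement{}, Map.get(nodes, node_id))
  end
end

nodes = %{
  4 => %SketchElement{id: 4, tag: "h1"},
  5 => %SketchText{id: 5, content: "Hello, World!"}
}

IO.inspect(ElementCheckSketch.element?(nodes, 4)) # prints true
IO.inspect(ElementCheckSketch.element?(nodes, 5)) # prints false
```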

get_node(document, node_id)

Returns the node referred to by node_id in the context of the document.

get_nodes(document)
get_nodes(Meeseeks.Document.t) :: [node_t]

Returns all of the document's nodes.

Returns nodes in depth-first order.

get_nodes(document, node_ids)
get_nodes(Meeseeks.Document.t, [node_id]) :: [node_t]

Returns the nodes referred to by node_ids in the context of the document.

Returns nodes in the same order as the provided node_ids.
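The ordering guarantee can be illustrated with a small sketch in which each id is looked up in turn. GetNodesSketch is a hypothetical module, the nodes map holds tag names instead of full node structs, and raising on an unknown id is a choice of the sketch, not a statement about the library.

```elixir
defmodule GetNodesSketch do
  # Looks up each id in turn, so the result follows the order of node_ids
  # rather than document order. Map.fetch!/2 raises on an unknown id.
  def get_nodes(nodes, node_ids) do
    Enum.map(node_ids, fn node_id -> Map.fetch!(nodes, node_id) end)
  end
end

# Hypothetical stand-in: tags instead of full node structs.
nodes = %{1 => "html", 2 => "head", 3 => "body"}

IO.inspect(GetNodesSketch.get_nodes(nodes, [3, 1, 2])) # prints ["body", "html", "head"]
```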

next_siblings(document, node_id)
next_siblings(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of the siblings that come after node_id in the context of the document.

Returns all of these siblings, not just those that are Meeseeks.Document.Elements.

Returns siblings in depth-first order.

parent(document, node_id)
parent(Meeseeks.Document.t, node_id) :: node_id | nil

Returns the node id of node_id's parent in the context of the document, or nil if node_id does not have a parent.

previous_siblings(document, node_id)
previous_siblings(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of the siblings that come before node_id in the context of the document.

Returns all of these siblings, not just those that are Meeseeks.Document.Elements.

Returns siblings in depth-first order.

siblings(document, node_id)
siblings(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of node_id's siblings in the context of the document.

Returns all siblings, including node_id itself, and not just those that are Meeseeks.Document.Elements.

Returns siblings in depth-first order.
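The three sibling functions relate to one another in a simple way: siblings are the parent's children, and the previous and next siblings are the parts of that list before and after node_id. The sketch below shows that relationship under stated assumptions: SiblingsSketch and its two maps are hypothetical, and it only handles nodes that have a parent.

```elixir
defmodule SiblingsSketch do
  # `children` maps a parent id to its ordered child ids; `parents` maps a
  # node id to its parent id. Both are hypothetical stand-ins for a document.
  # Root nodes (no parent) are outside the scope of this sketch.
  def siblings(children, parents, node_id) do
    Map.fetch!(children, Map.fetch!(parents, node_id))
  end

  # Everything in the sibling list that comes before node_id.
  def previous_siblings(children, parents, node_id) do
    children
    |> siblings(parents, node_id)
    |> Enum.take_while(fn id -> id != node_id end)
  end

  # Everything in the sibling list that comes after node_id.
  def next_siblings(children, parents, node_id) do
    children
    |> siblings(parents, node_id)
    |> Enum.drop_while(fn id -> id != node_id end)
    |> Enum.drop(1)
  end
end

children = %{6 => [7, 9, 11]}
parents = %{7 => 6, 9 => 6, 11 => 6}

# charlists: :as_lists keeps small integer ids from printing as a charlist.
opts = [charlists: :as_lists]
IO.inspect(SiblingsSketch.siblings(children, parents, 9), opts)          # prints [7, 9, 11]
IO.inspect(SiblingsSketch.previous_siblings(children, parents, 9), opts) # prints [7]
IO.inspect(SiblingsSketch.next_siblings(children, parents, 9), opts)     # prints [11]
```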