Meeseeks v0.7.5 Meeseeks.Document

A Meeseeks.Document represents a flattened, queryable view of an HTML document in which:

  • The nodes (element, comment, or text) have been provided an id
  • Parent-child relationships have been made explicit

Examples

The actual contents of a document quickly become unwieldy in iex, so the inspect value of a document is always #Meeseeks.Document<{...}> regardless of the content. The example below ignores this fact for educational purposes.

tuple_tree = {"html", [],
               [{"head", [], []},
                {"body", [],
                 [{"h1", [{"id", "greeting"}], ["Hello, World!"]},
                  {"div", [], [
                      {"p", [], ["1"]},
                      {"p", [], ["2"]},
                      {"p", [], ["3"]}]}]}]}

document = Meeseeks.Parser.parse(tuple_tree)
#=> %Meeseeks.Document{
#      id_counter: 12,
#      roots: [1],
#      nodes: %{
#        1 => %Meeseeks.Document.Element{attributes: [], children: [2, 3],
#         id: 1, namespace: nil, parent: nil, tag: "html"},
#        2 => %Meeseeks.Document.Element{attributes: [], children: [], id: 2,
#         namespace: nil, parent: 1, tag: "head"},
#        3 => %Meeseeks.Document.Element{attributes: [], children: [4, 6], id: 3,
#         namespace: nil, parent: 1, tag: "body"},
#        4 => %Meeseeks.Document.Element{attributes: [{"id", "greeting"}],
#         children: [5], id: 4, namespace: nil, parent: 3, tag: "h1"},
#        5 => %Meeseeks.Document.Text{content: "Hello, World!", id: 5, parent: 4},
#        6 => %Meeseeks.Document.Element{attributes: [], children: [7, 9, 11],
#         id: 6, namespace: nil, parent: 3, tag: "div"},
#        7 => %Meeseeks.Document.Element{attributes: [], children: [8], id: 7,
#         namespace: nil, parent: 6, tag: "p"},
#        8 => %Meeseeks.Document.Text{content: "1", id: 8, parent: 7},
#        9 => %Meeseeks.Document.Element{attributes: [], children: [10], id: 9,
#         namespace: nil, parent: 6, tag: "p"},
#        10 => %Meeseeks.Document.Text{content: "2", id: 10, parent: 9},
#        11 => %Meeseeks.Document.Element{attributes: [], children: [12], id: 11,
#         namespace: nil, parent: 6, tag: "p"},
#        12 => %Meeseeks.Document.Text{content: "3", id: 12, parent: 11}}}

Meeseeks.Document.children(document, 6)
#=> [7, 9, 11]

Meeseeks.Document.descendants(document, 6)
#=> [7, 8, 9, 10, 11, 12]

Summary

Functions

ancestors(document, node_id)
Returns the node ids of node_id's ancestors in the context of the document

children(document, node_id)
Returns the node ids of node_id's children in the context of the document

descendants(document, node_id)
Returns the node ids of node_id's descendants in the context of the document

element?(document, node_id)
Checks if a node_id refers to a Meeseeks.Document.Element in the context of the document

get_node(document, node_id)
Returns the node referred to by node_id in the context of the document

get_nodes(document)
Returns all of the document's nodes

get_nodes(document, node_ids)
Returns the nodes referred to by node_ids in the context of the document

get_root_nodes(document)
Returns all of the document's root nodes

html(document)
Returns the HTML of the document

next_siblings(document, node_id)
Returns the node ids of the siblings that come after node_id in the context of the document

parent(document, node_id)
Returns the node id of node_id's parent in the context of the document, or nil if node_id does not have a parent

previous_siblings(document, node_id)
Returns the node ids of the siblings that come before node_id in the context of the document

siblings(document, node_id)
Returns the node ids of node_id's siblings in the context of the document

tree(document)
Returns the Meeseeks.TupleTree of the document

Types

node_id()
node_id() :: integer
node_t()
node_t() :: Meeseeks.Document.Node.t
t()
t() :: %Meeseeks.Document{id_counter: node_id | nil, nodes: %{optional(node_id) => node_t}, roots: [node_id]}

Functions

ancestors(document, node_id)

Returns the node ids of node_id's ancestors in the context of the document.

Returns the ancestors nearest first, that is, in reverse document order: [parent, grandparent, ...]
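
To make that ordering concrete, here is a minimal sketch that follows parent links over a plain map. The AncestorsSketch module and its id-to-parent map are hypothetical stand-ins that reuse the ids from the Examples section; this is not how Meeseeks itself is implemented.

```elixir
defmodule AncestorsSketch do
  # Hypothetical stand-in for a document: node id => parent id (nil for a root).
  # The ids and parents mirror the example document from the Examples section.
  @parents %{
    1 => nil, 2 => 1, 3 => 1, 4 => 3, 5 => 4, 6 => 3,
    7 => 6, 8 => 7, 9 => 6, 10 => 9, 11 => 6, 12 => 11
  }

  def parents, do: @parents

  # Follow parent links until a root is reached, collecting the nearest ancestor first.
  def ancestors(parents, node_id) do
    case Map.fetch!(parents, node_id) do
      nil -> []
      parent_id -> [parent_id | ancestors(parents, parent_id)]
    end
  end
end

AncestorsSketch.ancestors(AncestorsSketch.parents(), 8)
#=> [7, 6, 3, 1]
```

Node 8 is the text inside the first p, so the walk yields the p (7), the div (6), the body (3), and finally the html root (1).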

children(document, node_id)

Returns the node ids of node_id's children in the context of the document.

Returns all children, not just those that are Meeseeks.Document.Elements.

Returns children in depth-first order.

descendants(document, node_id)
descendants(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of node_id's descendants in the context of the document.

Returns all descendants, not just those that are Meeseeks.Document.Elements.

Returns descendants in depth-first order.
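
As an illustration of what depth-first order means here, the sketch below performs a pre-order walk over a plain id-to-children map. The DescendantsSketch module and its map are hypothetical and model only the div (id 6) subtree from the Examples section; this is not the library's implementation.

```elixir
defmodule DescendantsSketch do
  # Hypothetical stand-in for a document: node id => ordered child ids.
  # Only the div (6) subtree of the example document is modelled.
  @children %{
    6 => [7, 9, 11],
    7 => [8], 8 => [],
    9 => [10], 10 => [],
    11 => [12], 12 => []
  }

  def children_map, do: @children

  # Pre-order, depth-first walk: each child is followed immediately by its own descendants.
  def descendants(children, node_id) do
    children
    |> Map.fetch!(node_id)
    |> Enum.flat_map(fn child_id -> [child_id | descendants(children, child_id)] end)
  end
end

DescendantsSketch.descendants(DescendantsSketch.children_map(), 6)
#=> [7, 8, 9, 10, 11, 12]
```

Each p is immediately followed by its text node, which matches the descendants output shown in the Examples section.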

element?(document, node_id)
element?(Meeseeks.Document.t, node_id) :: boolean

Checks if a node_id refers to a Meeseeks.Document.Element in the context of the document.

get_node(document, node_id)

Returns the node referred to by node_id in the context of the document.

get_nodes(document)
get_nodes(Meeseeks.Document.t) :: [node_t]

Returns all of the document's nodes.

Returns nodes in depth-first order.

get_nodes(document, node_ids)
get_nodes(Meeseeks.Document.t, [node_id]) :: [node_t]

Returns the nodes referred to by node_ids in the context of the document.

Returns nodes in the same order as the provided node_ids.
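
The ordering guarantee can be pictured as a lookup per requested id. The sketch below uses a hypothetical GetNodesSketch module with plain maps in place of Meeseeks node structs. Raising on an unknown id is a choice made for the sketch, not a statement about how the library handles that case.

```elixir
defmodule GetNodesSketch do
  # Hypothetical stand-in for a document's node map: node id => node data.
  @nodes %{
    7 => %{id: 7, tag: "p"},
    9 => %{id: 9, tag: "p"},
    11 => %{id: 11, tag: "p"}
  }

  def nodes, do: @nodes

  # Look up each requested id in turn, so the result follows the order of node_ids.
  # Map.fetch!/2 raises on an unknown id (a sketch-only choice).
  def get_nodes(nodes, node_ids) do
    Enum.map(node_ids, fn node_id -> Map.fetch!(nodes, node_id) end)
  end
end

GetNodesSketch.nodes()
|> GetNodesSketch.get_nodes([11, 7])
|> Enum.map(& &1.id)
#=> [11, 7]
```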

get_root_nodes(document)
get_root_nodes(Meeseeks.Document.t) :: [node_t]

Returns all of the document's root nodes.

Returns nodes in depth-first order.

html(document)

Returns the HTML of the document.

next_siblings(document, node_id)
next_siblings(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of the siblings that come after node_id in the context of the document.

Returns all of these siblings, not just those that are Meeseeks.Document.Elements.

Returns siblings in depth-first order.

parent(document, node_id)
parent(Meeseeks.Document.t, node_id) :: node_id | nil

Returns the node id of node_id's parent in the context of the document, or nil if node_id does not have a parent.

previous_siblings(document, node_id)
previous_siblings(Meeseeks.Document.t, node_id) :: [node_id]

Returns the node ids of the siblings that come before node_id in the context of the document.

Returns all of these siblings, not just those that are Meeseeks.Document.Elements.

Returns siblings in depth-first order.

siblings(document, node_id)

Returns the node ids of node_id's siblings in the context of the document.

Returns all siblings, including node_id itself, and not just those that are Meeseeks.Document.Elements.

Returns siblings in depth-first order.
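
The three sibling functions relate to one another roughly as in the sketch below: siblings are the parent's children, and the next and previous siblings are the parts of that list after and before the node. SiblingsSketch and its node map are hypothetical, model only the div (id 6) and its three paragraphs, and treat a parentless node as its own only sibling purely as a simplification.

```elixir
defmodule SiblingsSketch do
  # Hypothetical stand-in for a document: node id => %{parent: id | nil, children: [id]}.
  @nodes %{
    6 => %{parent: nil, children: [7, 9, 11]},
    7 => %{parent: 6, children: []},
    9 => %{parent: 6, children: []},
    11 => %{parent: 6, children: []}
  }

  def nodes, do: @nodes

  # All children of the node's parent, the node itself included.
  def siblings(nodes, node_id) do
    case nodes[node_id].parent do
      # Sketch-only simplification: a parentless node is its own only sibling.
      nil -> [node_id]
      parent_id -> nodes[parent_id].children
    end
  end

  # Siblings that come after node_id: drop everything up to and including node_id.
  def next_siblings(nodes, node_id) do
    nodes
    |> siblings(node_id)
    |> Enum.drop_while(&(&1 != node_id))
    |> Enum.drop(1)
  end

  # Siblings that come before node_id: keep everything until node_id is reached.
  def previous_siblings(nodes, node_id) do
    nodes
    |> siblings(node_id)
    |> Enum.take_while(&(&1 != node_id))
  end
end

nodes = SiblingsSketch.nodes()

SiblingsSketch.siblings(nodes, 9)
#=> [7, 9, 11]

SiblingsSketch.next_siblings(nodes, 9)
#=> [11]

SiblingsSketch.previous_siblings(nodes, 9)
#=> [7]
```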

tree(document)

Returns the Meeseeks.TupleTree of the document.