Datalog
Copy MarkdownIntro
These notebooks are Livebook adaptations of the upstream egglog tutorial from https://egraphs-good.github.io/egglog-tutorial/. The egglog examples follow the upstream chapter order closely; the extra Elixir cells are there to make normally silent egglog state visible in Livebook.
Each executable egglog block is sent directly to one live mutable Egglog.EGraph session. The helper below is only for displaying outputs and snapshots; the cells that change the e-graph call Egglog.EGraph.run/3 themselves.
When using the published package, replace the dependency with {:egglog_elixir, "~> 0.1"}.
Mix.install([
{:egglog_elixir, path: Path.expand("../..", __DIR__)},
{:kino, "~> 0.19.0"}
])defmodule Tutorial do
def print_outputs(result, code \\ "") do
(result.outputs ++ check_outputs(code))
|> Enum.map(& &1.text)
|> Enum.reject(&is_nil/1)
|> Enum.map(&String.trim_trailing/1)
|> Enum.reject(&(&1 == ""))
|> Enum.each(&IO.puts/1)
end
def render_snapshot(%{format: :svg, svg: svg}) when is_binary(svg) do
kino_html = Module.concat(Kino, HTML)
if Code.ensure_loaded?(kino_html) do
apply(kino_html, :new, [svg])
else
svg
end
end
def render_snapshot(%{text: text}), do: text
def summarize_egraph(egraph, opts \\ []) do
summary =
egraph
|> json_snapshot(opts)
|> Egglog.Snapshot.summary()
%{
e_nodes: summary.nodes,
e_classes: summary.classes,
by_operator: summary.operators,
omitted: summary.omitted
}
end
defp json_snapshot(egraph, opts) do
Egglog.EGraph.snapshot!(egraph, Keyword.merge([render: :json], opts))
end
defp check_outputs(code) do
code
|> String.split("\n")
|> Enum.map(&String.trim/1)
|> Enum.reject(&(&1 == ""))
|> Enum.filter(&String.contains?(&1, "(check "))
|> Enum.map(fn line ->
text =
if String.starts_with?(line, "(fail ") do
"expected failure observed: #{line}"
else
"check succeeded: #{line}"
end
%{type: :check_success, text: text <> "\n"}
end)
end
end
{:ok, egraph} = Egglog.EGraph.new()Datalog
Datalog is a relational language for deductive reasoning. In chapter 1, rules mostly enlarged equivalence classes of terms. Here, rules also derive rows of relations and functions.
The useful unifying picture is this:
- A relation such as
(edge 1 2)is a table row. - A function such as
(path 1 3) -> 20is a table row with a distinguished output value. - A constructor such as
(mk 3)is also function-like: equal inputs imply equal outputs. - Equality saturation and Datalog-style rule evaluation share the same e-graph database.
This chapter has three examples:
- Transitive closure as a relation.
- Shortest paths as a function with a merge expression.
- Reachability over unionable terms, where congruence closure and Datalog interact.
In this lesson, we will define multiple relations with the same name,
so we use the (push) and (pop) commands to clone and reset the database.
Under the hood, (push) clones the current database and pushes it onto a stack,
and (pop) resets the database to the top of the stack.
(push)egglog = """
(push)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)push and pop are native egglog commands. They operate on the same live egraph handle, so the stack behavior is visible across cells.
Let's first define relations edge and path.
We use (edge a b) to mean the tuple (a, b) is in the edge relation.
(edge a b) means there are directed edges from a to b,
and we will use it to compute the path relation,
where (path a b) means there is a (directed) path from a to b.
(relation edge (i64 i64))
(relation path (i64 i64))egglog = """
(relation edge (i64 i64))
(relation path (i64 i64))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)We can insert edges into our relation by asserting facts:
(edge 1 2)
(edge 2 3)
(edge 3 4)egglog = """
(edge 1 2)
(edge 2 3)
(edge 3 4)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Fact definitions are similar to definitions using (let ..) definitions in the last lesson,
in that facts are immediately added to relations.
Now let's tell egglog how to derive the path relation.
First, if an edge from a to b exists, then it is already a proof that there exists a path from a to b.
(rule ((edge a b))
((path a b)))egglog = """
(rule ((edge a b))
((path a b)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)A rule has the form (rule (atom1 atom2 ..) (action1 action2 ..)).
For the rule we have just defined, the only atom in the query is (edge a b), which asks egglog to search
for possible a and b such that (edge a b) is a fact in the database.
We call the first part the "query" of a rule, and the second part the "actions" of a rule.
In Datalog terminology, confusingly, the first part is called the "body" of the rule
while the second part is called the "head" of the rule. This is because Datalog rules
are usually written as head :- body. To avoid confusion, we will refrain from using
Datalog terminology.
The rule above defines the base case of the path relation. The inductive case reads as follows:
if we know there is a path from a to b, and there is an edge from b to c, then
there is also a path from a to c.
This can be expressed as egglog rule below:
(rule ((path a b) (edge b c))
((path a c)))egglog = """
(rule ((path a b) (edge b c))
((path a c)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Again, defining a rule does not mean running it in egglog, which may be a surprise to those familiar with Datalog. The user still needs to run the program. For instance, the following check would fail at this point.
(fail (check (path 1 4)))egglog = """
(fail (check (path 1 4)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)But it passes after we run our rules for 10 iterations.
(run 10)
(check (path 1 4)) ; this succeeds
(print-function path 20)egglog = """
(run 10)
(check (path 1 4))
(print-function path 20)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)The relation table already shows the derived paths. The graph view is a complementary picture: relation facts appear as nodes, and the saturated transitive closure becomes visible as a larger connected structure.
For a compact, non-visual inspection, the JSON snapshot gives counts by operator. This is useful when the graph drawing is too busy.
Tutorial.summarize_egraph(egraph)Egglog.EGraph.snapshot!(egraph)
|> Tutorial.render_snapshot()For many deductive rules, we do not know the number of iterations
needed to reach a fixpoint. The egglog language provides the saturate scheduling primitive to run the rules until fixpoint.
(run-schedule (saturate (run)))egglog = """
(run-schedule (saturate (run)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)We will cover more details about schedules later in the tutorial.
Exercises:
Consider the variant of our last rule:
(rule ((path a b) (path b c))
((path a c)))Does this rule compute the same relation as the original rule? How does this rule compare to the original rule? Hint: it's slow, why?
(pop) ;; resets egglog
(push)egglog = """
(pop)
(push)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Our last example determines whether there is a path from one node to another, but we don't know the details about the path. Let's slightly extend our program to obtain the length of the shortest path between any two nodes.
(function edge (i64 i64) i64 :no-merge)
(function path (i64 i64) i64 :merge (min old new))egglog = """
(function edge (i64 i64) i64 :no-merge)
(function path (i64 i64) i64 :merge (min old new))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Egglog.EGraph.snapshot!(egraph) |> Tutorial.render_snapshot()Here, we use a new keyword called function to define a table that respects the functional dependency.
A relation is just a function with output domain Unit.
By defining edge and path with function, we can associate a length to each path.
What happens if the same tuple of a function is mapped to two values?
In the case of relation, this is easy: Unit only has one value, so the two values must be identical.
But in general, that would be a violation of functional dependency, the property that a = b implies f(a) = f(b).
Egglog allows us to specify how to reconcile two values that are mapped from the same tuple using merge expressions.
For instance, for path, the merge expression is (min old new); old and new are two special values in an expression
that denotes the current output of the tuple and the output of the new, to-be-inserted value.
The merge expression for path says that, when there are two paths from a to b with lengths old and new,
we keep the shorter one, i.e., (min old new).
For edge, we can define the merge expression the same as path, which means that we only keep the shortest edge
if there are multiple edges. But we can also assert that edge does not have a merge expression using :no-merge.
This means we don't expect there will be multiple edges between two nodes. More generally, it is the user's
responsibility to ensure no tuples with conflicting output values exist. If a conflict happens, native egglog reports an error through the wrapper.
Now let's insert the same edges as before, but we will assign a length to each edge. This is done using the set action,
which takes a tuple and an output value:
(set (edge 1 2) 10)
(set (edge 2 3) 10)
(set (edge 1 3) 30)egglog = """
(set (edge 1 2) 10)
(set (edge 2 3) 10)
(set (edge 1 3) 30)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Let us define the edge base rule and transitive rule for the path function.
In this rule, we use the set action to set the output value of the path function.
On the query side, we use = to bind the output value of a function.
(rule ((= (edge x y) len))
((set (path x y) len)))
(rule ((= (path x y) xy)
(= (edge y z) yz))
((set (path x z) (+ xy yz))))egglog = """
(rule ((= (edge x y) len))
((set (path x y) len)))
(rule ((= (path x y) xy)
(= (edge y z) yz))
((set (path x z) (+ xy yz))))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Let's run our rules and check we get the desired shortest path.
(run-schedule (saturate (run)))
(check (= (path 1 3) 20))
(print-function path 20)egglog = """
(run-schedule (saturate (run)))
(check (= (path 1 3) 20))
(print-function path 20)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Here the interesting thing is not just reachability but the :merge (min old new) behavior: the direct path 1 -> 3 has cost 30, while the derived path through 2 has cost 20. The serialized graph lets us inspect the function rows that survived merging.
Tutorial.summarize_egraph(egraph)Egglog.EGraph.snapshot!(egraph)
|> Tutorial.render_snapshot()(pop)
(push)egglog = """
(pop)
(push)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Now let us combine the knowledge we have learned in lessons 1 and 2 to write a program that combines both equality saturation and Datalog.
We reuse our path example, but this time the nodes are terms constructed using the mk constructor,
We start by defining a new, union-able sort.
(sort Node)egglog = """
(sort Node)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)We can then define a new constructor for our sort.
(constructor mk (i64) Node)egglog = """
(constructor mk (i64) Node)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Note: We could have equivalently written
(datatype Node
(mk i64))datatype is syntax sugar for sorts and constructors.
(relation edge (Node Node))
(relation path (Node Node))
(rule ((edge x y))
((path x y)))
(rule ((path x y) (edge y z))
((path x z)))
(edge (mk 1) (mk 2))
(edge (mk 2) (mk 3))
(edge (mk 3) (mk 1))
(edge (mk 5) (mk 6))egglog = """
(relation edge (Node Node))
(relation path (Node Node))
(rule ((edge x y))
((path x y)))
(rule ((path x y) (edge y z))
((path x z)))
(edge (mk 1) (mk 2))
(edge (mk 2) (mk 3))
(edge (mk 3) (mk 1))
(edge (mk 5) (mk 6))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Because we defined our nodes using a sort, we can "union" two nodes.
This makes them indistinguishable to rules in egglog.
(union (mk 3) (mk 5))egglog = """
(union (mk 3) (mk 5))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)union is a new keyword here, but it is our old friend: rewrites are implemented as rules whose
actions are unions. For instance, (rewrite (Add x y) (Add y x)) is lowered to the following
rule:
(rule ((= e (Add x y)))
((union e (Add y x))))(run-schedule (saturate (run)))
(check (edge (mk 3) (mk 6)))
(check (path (mk 1) (mk 6)))
(print-function edge 20)
(print-function path 20)egglog = """
(run-schedule (saturate (run)))
(check (edge (mk 3) (mk 6)))
(check (path (mk 1) (mk 6)))
(print-function edge 20)
(print-function path 20)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)This is the first example where congruence closure and Datalog-style reachability interact directly: after (mk 3) and (mk 5) are unioned, paths through one representative can satisfy queries through the other.
In universal-algebra terms, the mk terms live in a quotient. The edge and path relations are interpreted over representatives of that quotient, so equal nodes share relational consequences.
Tutorial.summarize_egraph(egraph)Egglog.EGraph.snapshot!(egraph)
|> Tutorial.render_snapshot()We can also give a new meaning to equivalence by adding the following rule.
(rule ((path x y)
(path y x))
((union x y)))egglog = """
(rule ((path x y)
(path y x))
((union x y)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)This rule says that if there is a path from x to y and from y to x, then
x and y are equivalent.
This rule allows us to tell if a and b are in the same connected component by checking
(check (= (mk a) (mk b))).
(run-schedule (saturate (run)))
(check (= (mk 1) (mk 2))
(= (mk 1) (mk 3))
(= (mk 2) (mk 3)))
(print-function path 20)egglog = """
(run-schedule (saturate (run)))
(check (= (mk 1) (mk 2))
(= (mk 1) (mk 3))
(= (mk 2) (mk 3)))
(print-function path 20)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Exercises:
(Parametrized rewrite rule) In lesson 1, it is tedious to write associativity and commutativity (AC) rules for every AC operator (e.g., Add, Mul, ...). One way to save us from this repetitive work is to parameterize over the operators.
In the following program, instead of Mul and Add, Expr has a Bop operator
that takes an operator kind and two operands.
(datatype BopKind (Add) (Mul) (Div))
(datatype Expr (Num i64) (Var String) (Bop BopKind Expr Expr))First, add a relation AcBopKind that holds BopKinds which are associative and commutative.
Second, insert the appropriate BopKinds into that relation.
Third, add a single rule that implements commutativity for all Bops with an AcBopKind.
Finally, add a single rule that implements associativity for all Bops with an AcBopKind.
(Universe relation)
One of the rewrite rules that we cannot make a birewrite in lesson 1 is the rule
(rewrite (Add x (Num 0)) x). This is because it is not clear what x is bound to
in the inverse rule, (rewrite x (Add x (Num 0))). Therefore, defining this rule in egglog
causes an error. One way to fix this is to introduce a "universe" relation, which contains
every term in the e-graph. So we can write our rules as follows:
(rule ((universe x))
((union x (Add x (Num 0)))))Please define such a universe relation so that the above rule works.
Side Note: The above rule can also be written using the shorthand rewrite with a :when
condition: (rewrite x (Add x (Num 0)) :when ((universe x))), and similarly the birewrite
rule (birewrite (Add x (Num 0)) x :when ((universe x))), which introduces a bit of
overhead for the forward direction, but is more concise.
Everything is a function
We have talked about relations being a special case of functions, but in fact, constructors like Add and Mul
are also function tables under the hood!
For instance, the Add constructor
(constructor Add (Expr Expr) Expr)is a function with two Expr inputs and one Expr output. Its merge expression is simply to union the two output values.
This corresponds to functional dependency: If x1 = x2, and y1 = y2, then it follows that
(Add x1 y1) and (Add x2 y2) are equivalent.
In other words, functions, functional dependencies, and merge expressions are the fundamental unifying concepts in our framework.
This allows egglog to be implemented as a database system using advanced database techniques.
There is one last caveat about constructors though: Different from function but similar to relation, constructors have default values, so expressions
like (let e1 (Num 2)) succeeds even when (Num 2) is not already in the e-graph, in which case it creates a new id for the output.
In contrast, since relations are functions with output domain Unit, their default value is the only value of type Unit.