Scheduling
Copy MarkdownIntro
These notebooks are Livebook adaptations of the upstream egglog tutorial from https://egraphs-good.github.io/egglog-tutorial/. The egglog examples follow the upstream chapter order closely; the extra Elixir cells are there to make normally silent egglog state visible in Livebook.
Each executable egglog block is sent directly to one live mutable Egglog.EGraph session. The helper below is only for displaying outputs and snapshots; the cells that change the e-graph call Egglog.EGraph.run/3 themselves.
When using the published package, replace the dependency with {:egglog_elixir, "~> 0.1"}.
Mix.install([
{:egglog_elixir, path: Path.expand("../..", __DIR__)},
{:kino, "~> 0.19.0"}
])defmodule Tutorial do
def print_outputs(result, code \\ "") do
(result.outputs ++ check_outputs(code))
|> Enum.map(& &1.text)
|> Enum.reject(&is_nil/1)
|> Enum.map(&String.trim_trailing/1)
|> Enum.reject(&(&1 == ""))
|> Enum.each(&IO.puts/1)
end
def render_snapshot(%{format: :svg, svg: svg}) when is_binary(svg) do
kino_html = Module.concat(Kino, HTML)
if Code.ensure_loaded?(kino_html) do
apply(kino_html, :new, [svg])
else
svg
end
end
def render_snapshot(%{text: text}), do: text
def summarize_egraph(egraph, opts \\ []) do
summary =
egraph
|> json_snapshot(opts)
|> Egglog.Snapshot.summary()
%{
e_nodes: summary.nodes,
e_classes: summary.classes,
by_operator: summary.operators,
omitted: summary.omitted
}
end
defp json_snapshot(egraph, opts) do
Egglog.EGraph.snapshot!(egraph, Keyword.merge([render: :json], opts))
end
defp check_outputs(code) do
code
|> String.split("\n")
|> Enum.map(&String.trim/1)
|> Enum.reject(&(&1 == ""))
|> Enum.filter(&String.contains?(&1, "(check "))
|> Enum.map(fn line ->
text =
if String.starts_with?(line, "(fail ") do
"expected failure observed: #{line}"
else
"check succeeded: #{line}"
end
%{type: :check_success, text: text <> "\n"}
end)
end
end
{:ok, egraph} = Egglog.EGraph.new()Scope note: the executable part of this Livebook covers rulesets,
run-schedule,saturate,repeat, and ordinaryrun. The upstream chapter also uses experimental conveniences and scheduler constructs fromegglog-experimental:with-ruleset,let-scheduler,run-with, andback-off.with-rulesetis expanded below to ordinary:rulesetannotations. The custom scheduler blocks are preserved as reference text rather than executed through the thin wrapper.
Scheduling
This chapter is about controlling when rules run.
The key lesson is not that one schedule is always best. The key lesson is that equality saturation has two competing forces:
- Analysis rules often derive useful facts without creating many new terms.
- Optimization rewrites can create many equivalent terms, especially associativity and commutativity rules.
Scheduling lets us spend budget where it matters. We start by using the same expression language as chapter 3.
(datatype Expr
(Num BigRat)
(Var String)
(Add Expr Expr)
(Mul Expr Expr)
(Div Expr Expr))
(let zero (bigrat (bigint 0) (bigint 1)))
(let one (bigrat (bigint 1) (bigint 1)))
(let two (bigrat (bigint 2) (bigint 1)))egglog = """
(datatype Expr
(Num BigRat)
(Var String)
(Add Expr Expr)
(Mul Expr Expr)
(Div Expr Expr))
(let zero (bigrat (bigint 0) (bigint 1)))
(let one (bigrat (bigint 1) (bigint 1)))
(let two (bigrat (bigint 2) (bigint 1)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Rulesets
Rulesets name groups of rules. Here we separate rules that derive semantic facts from rules that grow the syntactic search space.
(ruleset optimizations)
(ruleset analysis)egglog = """
(ruleset optimizations)
(ruleset analysis)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)We can add rules to rulesets using the :ruleset annotation on rules, rewrites, and birewrites.
Leaving off :ruleset causes the rule to be added to the default ruleset, which is what we've shown so far.
We can run rulesets using (run ruleset iters), or (run iters) for running the default ruleset.
Here, we add leq rules to the analysis ruleset, because they don't add new Expr nodes to the e-graph.
(relation leq (Expr Expr))
(rule ((leq e1 e2) (leq e2 e3)) ((leq e1 e3)) :ruleset analysis)
(rule ((= e1 (Num n1)) (= e2 (Num n2)) (<= n1 n2)) ((leq e1 e2)) :ruleset analysis)
(rule ((= v (Var x))) ((leq v v)) :ruleset analysis)
(rule ((= e1 (Add e1a e1b)) (= e2 (Add e2a e2b)) (leq e1a e2a) (leq e1b e2b))
((leq e1 e2))
:ruleset analysis)egglog = """
(relation leq (Expr Expr))
(rule ((leq e1 e2) (leq e2 e3)) ((leq e1 e3)) :ruleset analysis)
(rule ((= e1 (Num n1)) (= e2 (Num n2)) (<= n1 n2)) ((leq e1 e2)) :ruleset analysis)
(rule ((= v (Var x))) ((leq v v)) :ruleset analysis)
(rule ((= e1 (Add e1a e1b)) (= e2 (Add e2a e2b)) (leq e1a e2a) (leq e1b e2b))
((leq e1 e2))
:ruleset analysis)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)In contrast, the following axiomatic rules are doing optimizations, so we add them to the optimizations ruleset.
(birewrite (Add x (Add y z)) (Add (Add x y) z) :ruleset optimizations)
(birewrite (Mul x (Mul y z)) (Mul (Mul x y) z) :ruleset optimizations)
(rewrite (Add x y) (Add y x) :ruleset optimizations)
(rewrite (Mul x y) (Mul y x) :ruleset optimizations)
(rewrite (Mul x (Add y z)) (Add (Mul x y) (Mul x z)) :ruleset optimizations)
(rewrite (Add x (Num zero)) x :ruleset optimizations)
(rewrite (Mul x (Num one)) x :ruleset optimizations)
(rewrite (Add (Num a) (Num b)) (Num (+ a b)) :ruleset optimizations)
(rewrite (Mul (Num a) (Num b)) (Num (* a b)) :ruleset optimizations)egglog = """
(birewrite (Add x (Add y z)) (Add (Add x y) z) :ruleset optimizations)
(birewrite (Mul x (Mul y z)) (Mul (Mul x y) z) :ruleset optimizations)
(rewrite (Add x y) (Add y x) :ruleset optimizations)
(rewrite (Mul x y) (Mul y x) :ruleset optimizations)
(rewrite (Mul x (Add y z)) (Add (Mul x y) (Mul x z)) :ruleset optimizations)
(rewrite (Add x (Num zero)) x :ruleset optimizations)
(rewrite (Mul x (Num one)) x :ruleset optimizations)
(rewrite (Add (Num a) (Num b)) (Num (+ a b)) :ruleset optimizations)
(rewrite (Mul (Num a) (Num b)) (Num (* a b)) :ruleset optimizations)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)It is tedious and error-prone to manually annotate each rule with a ruleset,
so egglog-experimental provides a convenience syntax with-ruleset:
(with-ruleset optimizations
(birewrite (Add x (Add y z)) (Add (Add x y) z))
...
)From now on, we will use this syntax.
Here we add the rest of the rules from the last section, but tagged with the appropriate rulesets.
(function upper-bound (Expr) BigRat :merge (min old new))
(function lower-bound (Expr) BigRat :merge (max old new))
(rule ((leq e (Num n))) ((set (upper-bound e) n)) :ruleset analysis)
(rule ((leq (Num n) e)) ((set (lower-bound e) n)) :ruleset analysis)
(rule ((= e (Add e1 e2)) (= (upper-bound e1) u1) (= (upper-bound e2) u2))
((set (upper-bound e) (+ u1 u2))) :ruleset analysis)
(rule ((= e (Add e1 e2)) (= (lower-bound e1) l1) (= (lower-bound e2) l2))
((set (lower-bound e) (+ l1 l2))) :ruleset analysis)
(rule ((= e (Mul e1 e2))
(= le1 (lower-bound e1)) (= le2 (lower-bound e2))
(= ue1 (upper-bound e1)) (= ue2 (upper-bound e2)))
((set (lower-bound e)
(min (* le1 le2) (min (* le1 ue2) (min (* ue1 le2) (* ue1 ue2)))))
(set (upper-bound e)
(max (* le1 le2) (max (* le1 ue2) (max (* ue1 le2) (* ue1 ue2))))))
:ruleset analysis)
(relation non-zero (Expr))
(rule ((< (upper-bound e) zero)) ((non-zero e)) :ruleset analysis)
(rule ((> (lower-bound e) zero)) ((non-zero e)) :ruleset analysis)egglog = """
(function upper-bound (Expr) BigRat :merge (min old new))
(function lower-bound (Expr) BigRat :merge (max old new))
(rule ((leq e (Num n))) ((set (upper-bound e) n)) :ruleset analysis)
(rule ((leq (Num n) e)) ((set (lower-bound e) n)) :ruleset analysis)
(rule ((= e (Add e1 e2)) (= (upper-bound e1) u1) (= (upper-bound e2) u2))
((set (upper-bound e) (+ u1 u2))) :ruleset analysis)
(rule ((= e (Add e1 e2)) (= (lower-bound e1) l1) (= (lower-bound e2) l2))
((set (lower-bound e) (+ l1 l2))) :ruleset analysis)
(rule ((= e (Mul e1 e2))
(= le1 (lower-bound e1)) (= le2 (lower-bound e2))
(= ue1 (upper-bound e1)) (= ue2 (upper-bound e2)))
((set (lower-bound e)
(min (* le1 le2) (min (* le1 ue2) (min (* ue1 le2) (* ue1 ue2)))))
(set (upper-bound e)
(max (* le1 le2) (max (* le1 ue2) (max (* ue1 le2) (* ue1 ue2))))))
:ruleset analysis)
(relation non-zero (Expr))
(rule ((< (upper-bound e) zero)) ((non-zero e)) :ruleset analysis)
(rule ((> (lower-bound e) zero)) ((non-zero e)) :ruleset analysis)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Finally, we have optimization rules that depend on the analysis rules we defined above.
(rewrite (Div x x) (Num one) :when ((non-zero x)) :ruleset optimizations)
(rewrite (Mul x (Div y x)) y :when ((non-zero x)) :ruleset optimizations)egglog = """
(rewrite (Div x x) (Num one) :when ((non-zero x)) :ruleset optimizations)
(rewrite (Mul x (Div y x)) y :when ((non-zero x)) :ruleset optimizations)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Now consider the following program, which consists of a long sequence of additions inside a cancelling division.
(push)
; a + (b + (c + (d + (e + f))))
(let addition-chain
(Add (Var "a") (Add (Var "b")
(Add (Var "c") (Add (Var "d")
(Add (Var "e") (Var "f")))))))
; 1 + 1 + 1 + 1 + 2
(let nonzero-expr
(Add (Num one) (Add (Num one)
(Add (Num one) (Add (Num one) (Num two))))))
(let expr (Mul nonzero-expr (Div addition-chain nonzero-expr)))egglog = """
(push)
(let addition-chain
(Add (Var "a") (Add (Var "b")
(Add (Var "c") (Add (Var "d")
(Add (Var "e") (Var "f")))))))
(let nonzero-expr
(Add (Num one) (Add (Num one)
(Add (Num one) (Add (Num one) (Num two))))))
(let expr (Mul nonzero-expr (Div addition-chain nonzero-expr)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)We want the following check to pass after running the rules.
(fail (check (= expr addition-chain)))egglog = """
(fail (check (= expr addition-chain)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Before any schedule runs, the e-graph contains the long addition chain and the cancelling division, but not yet the analysis facts needed by the guarded rewrite.
Tutorial.summarize_egraph(egraph)Egglog.EGraph.snapshot!(egraph)
|> Tutorial.render_snapshot()To make this check pass, we have to first discover that nonzero-expr is indeed non-zero,
which allows the rule from (Mul x (Div y x)) to y to fire.
On the other hand, if we apply the optimization rules, we risk the exponential blowup from
the associative and commutative permutations of the addition-chain.
Therefore, if we try to run both rulesets directly, egglog will spend lots of effort reassociating and
commuting the terms in the addition-chain, even though the optimization that we actually
want to run only takes one iteration. However, that optimization requires knowing a fact
that takes multiple iterations to compute (propagating lower- and upper-bounds
through nonzero-expr). We can build a more efficient schedule.
(push)egglog = """
(push)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Schedules
Our schedule starts by saturating the analysis rules, fully propagating the non-zero information without
adding any e-nodes to the e-graph.
(run-schedule (saturate (run analysis)))egglog = """
(run-schedule (saturate (run analysis)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)After saturating only the analysis rules, the useful facts have been derived without running the associativity/commutativity-heavy optimization ruleset.
Compare this summary with the previous one: the database has learned facts, but it has avoided most term explosion.
Tutorial.summarize_egraph(egraph)Egglog.EGraph.snapshot!(egraph)
|> Tutorial.render_snapshot()Then, just run one iteration of the optimizations ruleset.
(run optimizations 1)egglog = """
(run optimizations 1)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Now the guarded optimization has had a chance to use the analysis facts. Comparing this snapshot with the previous one shows why schedule order matters.
Tutorial.summarize_egraph(egraph)Egglog.EGraph.snapshot!(egraph)
|> Tutorial.render_snapshot()Or equivalently,
(run-schedule
(saturate (run analysis))
(run optimizations))This makes our check pass
(check (= expr addition-chain))
(pop)egglog = """
(check (= expr addition-chain))
(pop)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)The one-shot schedule above is effective for this specific term, but some problems require interleaving: optimize a little, analyze the new terms, optimize again. Egglog's scheduling language gives explicit control with primitives such as repeat, seq, saturate, and run.
The idea behind the following schedule is to always saturate analyses before running optimizations.
This combination is wrapped in a repeat block to give us control over how long to run egglog.
With repeat 1 it is the same schedule as before, but now we can increase the iteration
count if we want to optimize harder with more time and space budget.
(run-schedule
(repeat 2
(saturate (run analysis))
(run optimizations)))
(pop)egglog = """
(run-schedule
(repeat 2
(saturate (run analysis))
(run optimizations)))
(pop)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Running more iterations does not help our above example per se, but if we had started with a slightly more complex program to optimize...
; a + (b + (c + (d + (e + f))))
(let addition-chain
(Add (Var "a") (Add (Var "b")
(Add (Var "c") (Add (Var "d")
(Add (Var "e") (Var "f")))))))
; x * 0
(let x-times-zero (Mul (Var "x") (Num zero)))
(let nonzero-expr
(Add (Num one) (Add (Num one)
(Add (Num one) (Add (Num one)
x-times-zero)))))
(let expr (Mul nonzero-expr (Div addition-chain nonzero-expr)))egglog = """
(let addition-chain
(Add (Var "a") (Add (Var "b")
(Add (Var "c") (Add (Var "d")
(Add (Var "e") (Var "f")))))))
(let x-times-zero (Mul (Var "x") (Num zero)))
(let nonzero-expr
(Add (Num one) (Add (Num one)
(Add (Num one) (Add (Num one)
x-times-zero)))))
(let expr (Mul nonzero-expr (Div addition-chain nonzero-expr)))
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)For the purpose of this example, we add this rule
(rewrite (Mul x (Num zero)) (Num zero) :ruleset optimizations)egglog = """
(rewrite (Mul x (Num zero)) (Num zero) :ruleset optimizations)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)To prove expr is equivalent to addition-chian by applying the cancellation law,
we need to prove nonzero-expr is nonzero, which requires proving
x-times-zero's bound.
To show x-time-zero's bound, we need to apply an optimization rule to rewrite
it to 0.
In other words, this requires running analyses in between two runs of optimization rules
(the cancellation law and Mul's identity law)
Therefore, only running our schedule with one iteration (repeat 1) does not give us the optimal program.
(push)
(run-schedule
(repeat 1
(saturate (run analysis))
(run optimizations)))
(extract expr)
(pop)egglog = """
(push)
(run-schedule
(repeat 1
(saturate (run analysis))
(run optimizations)))
(extract expr)
(pop)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)Instead, we need to increase the iteration number.
(push)
(run-schedule
(repeat 2
(saturate (run analysis))
(run optimizations)))
(extract expr)
(pop)egglog = """
(push)
(run-schedule
(repeat 2
(saturate (run analysis))
(run optimizations)))
(extract expr)
(pop)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)The contrast between repeat 1 and repeat 2 is the point of the example. The first schedule cannot prove enough facts before extraction; the second gives the analysis a second chance after optimization has simplified x * 0.
Using custom schedulers
However, sometimes just having an iteration number does not give you enough control. For example, for many rules, such as associativity and commutativity (AC), the size of the e-graph grows hyper-exponentially with respect to the number of iterations.
Let's go back to this example, and run for five iterations.
(push)
(run-schedule
(repeat 5
(saturate (run analysis))
(run optimizations)))
(print-size Mul)
(pop)egglog = """
(push)
(run-schedule
(repeat 5
(saturate (run analysis))
(run optimizations)))
(print-size Mul)
(pop)
"""
{:ok, result} = Egglog.EGraph.run(egraph, egglog)
Tutorial.print_outputs(result, egglog)At iteration 5, the Mul function has size 582. However, if we bump that to 6,
the size of the Mul function will increase to 13285! Therefore, the iteration number is too coarse
of a granularity for defining the search space.
To this end, egglog provides a scheduler mechanism. A scheduler can decide which matches are important and need to be applied,
while others can be delayed or skipped. To use scheduler, there are two operations: let-scheduler and run-with.
(let-scheduler sched ..) instantiates a scheduler sched, and (run-with sched ruleset) rules a ruleset with that scheduler.
Currently, egglog-experimental implements one scheduler, back-off. (We will implement our own scheduler
in Section 6.) The idea of back-off is that it will ban a rule from applying if that rule grows the
e-graph too fast. The decision to ban is based on a threshold, which is initially small and increases as rules are banned.
This scheduler works well when the ruleset contains explosive rules like AC.
In this example, the back-off scheduler can prevent the associativity rule from dominating the equality saturation: when the the associativity rule (or any other rule) is fired too much, the scheduler will automatically ban this rule for a few iterations, so that other rules can catch up.
(run-schedule
(let-scheduler bo (back-off))
(repeat 10 (run-with bo optimizations)))
(print-size Mul)This block is reference code in this Livebook clone. It uses egglog functionality outside the thin Elixir wrapper path exercised here.
It is important that the scheduler bo is instantiated outside the repeat loop, since each scheduler carries some state that is updated
when run. For example, the following schedule has a very semantics than the schedule above.
(run-schedule
(repeat 10
(let-scheduler bo (back-off))
(run-with bo optimizations)))This schedule instantiates a (fresh) back-off scheduler for each run-with, so the ten iterations of rulesets are all run
with the initial configuration of the back-off scheduler, which has a very low threshold for banning rules.