Case study: LA compiler
Copy MarkdownIntro
These notebooks are Livebook adaptations of the upstream egglog tutorial from https://egraphs-good.github.io/egglog-tutorial/. The egglog examples follow the upstream chapter order closely; the extra Elixir cells are there to make normally silent egglog state visible in Livebook.
Each executable egglog block is sent directly to one live mutable Egglog.EGraph session. The helper below is only for displaying outputs and snapshots; the cells that change the e-graph call Egglog.EGraph.run/3 themselves.
When using the published package, replace the dependency with {:egglog_elixir, "~> 0.1"}.
Mix.install([
{:egglog_elixir, path: Path.expand("../..", __DIR__)},
{:kino, "~> 0.19.0"}
])defmodule Tutorial do
def print_outputs(result, code \\ "") do
(result.outputs ++ check_outputs(code))
|> Enum.map(& &1.text)
|> Enum.reject(&is_nil/1)
|> Enum.map(&String.trim_trailing/1)
|> Enum.reject(&(&1 == ""))
|> Enum.each(&IO.puts/1)
end
def render_snapshot(%{format: :svg, svg: svg}) when is_binary(svg) do
kino_html = Module.concat(Kino, HTML)
if Code.ensure_loaded?(kino_html) do
apply(kino_html, :new, [svg])
else
svg
end
end
def render_snapshot(%{text: text}), do: text
def summarize_egraph(egraph, opts \\ []) do
summary =
egraph
|> json_snapshot(opts)
|> Egglog.Snapshot.summary()
%{
e_nodes: summary.nodes,
e_classes: summary.classes,
by_operator: summary.operators,
omitted: summary.omitted
}
end
defp json_snapshot(egraph, opts) do
Egglog.EGraph.snapshot!(egraph, Keyword.merge([render: :json], opts))
end
defp check_outputs(code) do
code
|> String.split("\n")
|> Enum.map(&String.trim/1)
|> Enum.reject(&(&1 == ""))
|> Enum.filter(&String.contains?(&1, "(check "))
|> Enum.map(fn line ->
text =
if String.starts_with?(line, "(fail ") do
"expected failure observed: #{line}"
else
"check succeeded: #{line}"
end
%{type: :check_success, text: text <> "\n"}
end)
end
end
{:ok, egraph} = Egglog.EGraph.new()Scope note: this case study is about using egglog as a Rust library. It involves a custom parser, AST lowering, Rust-side command construction,
extract_value_with_cost_model, custom schedulers, and custom cost models. The Livebook preserves the upstream tutorial text and points to the upstream Rust project, but it does not reimplement those Rust extension hooks through the thin Elixir wrapper.
Case study: LA compiler
This chapter is different from the previous ones. The earlier chapters taught the egglog language through the thin Elixir wrapper. This case study is about what you can do when you embed egglog as a Rust library and extend it.
The important boundary is:
- The Elixir wrapper can run ordinary egglog programs, inspect outputs, take snapshots, and keep a live e-graph session.
- This case study customizes Rust-side parsing, scheduling, and cost-model behavior. Those extension hooks are not reimplemented in the thin wrapper.
- The Livebook therefore serves as a guided map of the upstream case-study source rather than as an executable Elixir reconstruction.
The upstream case-study project is here:
https://github.com/egraphs-good/egglog-tutorial/tree/main/case-study-linear-algebra-compiler
This library does not vendor that Rust project. The notes below describe the
case study conceptually so egglog_elixir remains self-contained.
In this section, we learn about functionality egglog provides as a Rust library. Besides providing basic e-graph operations, egglog allows extensions in base values, primitive functions, containers, schedulers, commands, and cost models.
For the first half of the case study, we will use the E-graph operations egglog provides to build a compiler for linear algebra. For the second half, we will showcase how we can extend egglog with application-specific scheduler and cost models.
The code is accessible at this GitHub directory.
To see the answer to each question, please take a look at the solutions branch.
Language for Linear Algebra
The input language for the compiler is a simple straight-line language consisting of a list of declarations, followed by a list of bindings. Each declaration declares an input from the user and has the form
V: Type;where V is a variable name and Type is either R (reals) or [R; NxM] (matrix of dimension N x M, where N and M are natural numbers).
Each binding has the form
V = E;where V is a variable name and E is a variable, a number, or binary expressions with
operators *, /, +, -.
For example, below is a General Matrix-Multiply (GEMM) operation expressed in our DSL.
a: R;
b: R;
A: [R; 32x32];
B: [R; 32x32];
C: [R; 32x32];
R = a*(A*B)+b*C;Note that the semantics of binary operators are overloaded: * can mean any of matrix
multiplication, scalar-matrix multiplication, or scalar multiplication, depending on the types
of its operands.
The parser for this language has already been implemented. The result of parsing is stored in the Rust CoreBindings struct in src/ast.rs.
For an Elixir user, the conceptual translation is: the host language parses a user-facing DSL, lowers it to egglog terms, runs saturation, extracts a representative, and prints the optimized DSL back out. The thin wrapper does the middle part; the case study demonstrates how far one can go when the host language is Rust and can plug into egglog internals directly.
Part 1: Building the initial e-graph
In this part, we will process the parsed expressions and turn them into egglog ASTs.
To start, take a look at the schema definitions in src/defn.egg.
There are several ways of running egglog in Rust: the user can call
EGraph::parse_and_run_program with the program string, call EGraph::run_program with a
command AST, or define rules using the convenience method provided in egglog::prelude::*. In
this part, we will use EGraph::parse_and_run_program for Problems 1 and 2, and
EGraph::run_program for Problem 3.
The corresponding thin-wrapper mental model is Egglog.EGraph.run(egraph, source): send egglog source to a live e-graph. The Rust case study goes lower level because it constructs and runs command ASTs directly.
Part 2: Running optimizations
We then run the rules we defined in src/defn.egg to grow the e-graph (Problem 4).
After the e-graph is grown, we will run e-graph extraction to obtain the optimized program
(Problem 5). To extract a program from the e-graph, you can use
EGraph::extract_value_with_cost_model, which requires a cost model as the input.
Since we used dynamic cost (via set-cost) in our egglog program, we need to use
DynamicCostModel from egglog-experimental to ensure the cost incorporates those dynamically set.
This is the same extraction story as chapter 5, but with a richer cost model. Chapter 5 showed static constructor costs; the case study uses matrix dimensions and dynamic costs to prefer cheaper linear-algebra plans.
Part 3: Custom scheduling and cost models
In this part, we will further customize egglog's capability.
In Problem 7, we will implement FirstNScheduler, which applies at most N matches in each iteration.
Every scheduler in egglog needs to implement a filter_matches method, which takes rule metainfo and a Matches struct.
The Matches struct encapsulates a set of matches and allows users to choose the
subset of matches to be applied.
Matches that are not chosen for application will be delayed for scheduling to the next iteration.
Additionally, filter_matches returns a boolean flag, indicating whether the next run of the
rule needs to search the database again to fetch more matches.
For instance, if the scheduler decides to not mark any match for application in the current
iteration, it may also not want to request more matches until the current batch of matches is
applied.
In problem 8, we will customize the cost model. We will define a cost model that prefers terms
with smaller depth. Such a cost model is useful when e.g., we want to minimize the depth of
computation for maximal parallelism. The cost model can be defined in egglog by implementing
the CostModel trait, which requires the user to define the cost of an e-node, a container
node, a base value, and additionally, the cost of an AST given the cost of the node itself
and its children.