Case study: LA compiler

Copy Markdown

Intro

These notebooks are Livebook adaptations of the upstream egglog tutorial from https://egraphs-good.github.io/egglog-tutorial/. The egglog examples follow the upstream chapter order closely; the extra Elixir cells are there to make normally silent egglog state visible in Livebook.

Each executable egglog block is sent directly to one live mutable Egglog.EGraph session. The helper below is only for displaying outputs and snapshots; the cells that change the e-graph call Egglog.EGraph.run/3 themselves.

When using the published package, replace the dependency with {:egglog_elixir, "~> 0.1"}.

Mix.install([
  {:egglog_elixir, path: Path.expand("../..", __DIR__)},
  {:kino, "~> 0.19.0"}
])
defmodule Tutorial do
  def print_outputs(result, code \\ "") do
    (result.outputs ++ check_outputs(code))
    |> Enum.map(& &1.text)
    |> Enum.reject(&is_nil/1)
    |> Enum.map(&String.trim_trailing/1)
    |> Enum.reject(&(&1 == ""))
    |> Enum.each(&IO.puts/1)
  end

  def render_snapshot(%{format: :svg, svg: svg}) when is_binary(svg) do
    kino_html = Module.concat(Kino, HTML)

    if Code.ensure_loaded?(kino_html) do
      apply(kino_html, :new, [svg])
    else
      svg
    end
  end

  def render_snapshot(%{text: text}), do: text

  def summarize_egraph(egraph, opts \\ []) do
    summary =
      egraph
      |> json_snapshot(opts)
      |> Egglog.Snapshot.summary()

    %{
      e_nodes: summary.nodes,
      e_classes: summary.classes,
      by_operator: summary.operators,
      omitted: summary.omitted
    }
  end

  defp json_snapshot(egraph, opts) do
    Egglog.EGraph.snapshot!(egraph, Keyword.merge([render: :json], opts))
  end

  defp check_outputs(code) do
    code
    |> String.split("\n")
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
    |> Enum.filter(&String.contains?(&1, "(check "))
    |> Enum.map(fn line ->
      text =
        if String.starts_with?(line, "(fail ") do
          "expected failure observed: #{line}"
        else
          "check succeeded: #{line}"
        end

      %{type: :check_success, text: text <> "\n"}
    end)
  end
end

{:ok, egraph} = Egglog.EGraph.new()

Scope note: this case study is about using egglog as a Rust library. It involves a custom parser, AST lowering, Rust-side command construction, extract_value_with_cost_model, custom schedulers, and custom cost models. The Livebook preserves the upstream tutorial text and points to the upstream Rust project, but it does not reimplement those Rust extension hooks through the thin Elixir wrapper.

Case study: LA compiler

This chapter is different from the previous ones. The earlier chapters taught the egglog language through the thin Elixir wrapper. This case study is about what you can do when you embed egglog as a Rust library and extend it.

The important boundary is:

  • The Elixir wrapper can run ordinary egglog programs, inspect outputs, take snapshots, and keep a live e-graph session.
  • This case study customizes Rust-side parsing, scheduling, and cost-model behavior. Those extension hooks are not reimplemented in the thin wrapper.
  • The Livebook therefore serves as a guided map of the upstream case-study source rather than as an executable Elixir reconstruction.

The upstream case-study project is here:

https://github.com/egraphs-good/egglog-tutorial/tree/main/case-study-linear-algebra-compiler

This library does not vendor that Rust project. The notes below describe the case study conceptually so egglog_elixir remains self-contained.

In this section, we learn about functionality egglog provides as a Rust library. Besides providing basic e-graph operations, egglog allows extensions in base values, primitive functions, containers, schedulers, commands, and cost models.

For the first half of the case study, we will use the E-graph operations egglog provides to build a compiler for linear algebra. For the second half, we will showcase how we can extend egglog with application-specific scheduler and cost models.

The code is accessible at this GitHub directory. To see the answer to each question, please take a look at the solutions branch.

Language for Linear Algebra

The input language for the compiler is a simple straight-line language consisting of a list of declarations, followed by a list of bindings. Each declaration declares an input from the user and has the form

 V: Type;

where V is a variable name and Type is either R (reals) or [R; NxM] (matrix of dimension N x M, where N and M are natural numbers).

Each binding has the form

 V = E;

where V is a variable name and E is a variable, a number, or binary expressions with operators *, /, +, -.

For example, below is a General Matrix-Multiply (GEMM) operation expressed in our DSL.

 a: R;
 b: R;
 A: [R; 32x32];
 B: [R; 32x32];
 C: [R; 32x32];
 R = a*(A*B)+b*C;

Note that the semantics of binary operators are overloaded: * can mean any of matrix multiplication, scalar-matrix multiplication, or scalar multiplication, depending on the types of its operands.

The parser for this language has already been implemented. The result of parsing is stored in the Rust CoreBindings struct in src/ast.rs.

For an Elixir user, the conceptual translation is: the host language parses a user-facing DSL, lowers it to egglog terms, runs saturation, extracts a representative, and prints the optimized DSL back out. The thin wrapper does the middle part; the case study demonstrates how far one can go when the host language is Rust and can plug into egglog internals directly.

Part 1: Building the initial e-graph

In this part, we will process the parsed expressions and turn them into egglog ASTs. To start, take a look at the schema definitions in src/defn.egg. There are several ways of running egglog in Rust: the user can call EGraph::parse_and_run_program with the program string, call EGraph::run_program with a command AST, or define rules using the convenience method provided in egglog::prelude::*. In this part, we will use EGraph::parse_and_run_program for Problems 1 and 2, and EGraph::run_program for Problem 3.

The corresponding thin-wrapper mental model is Egglog.EGraph.run(egraph, source): send egglog source to a live e-graph. The Rust case study goes lower level because it constructs and runs command ASTs directly.

Part 2: Running optimizations

We then run the rules we defined in src/defn.egg to grow the e-graph (Problem 4). After the e-graph is grown, we will run e-graph extraction to obtain the optimized program (Problem 5). To extract a program from the e-graph, you can use EGraph::extract_value_with_cost_model, which requires a cost model as the input. Since we used dynamic cost (via set-cost) in our egglog program, we need to use DynamicCostModel from egglog-experimental to ensure the cost incorporates those dynamically set.

This is the same extraction story as chapter 5, but with a richer cost model. Chapter 5 showed static constructor costs; the case study uses matrix dimensions and dynamic costs to prefer cheaper linear-algebra plans.

Part 3: Custom scheduling and cost models

In this part, we will further customize egglog's capability. In Problem 7, we will implement FirstNScheduler, which applies at most N matches in each iteration. Every scheduler in egglog needs to implement a filter_matches method, which takes rule metainfo and a Matches struct. The Matches struct encapsulates a set of matches and allows users to choose the subset of matches to be applied. Matches that are not chosen for application will be delayed for scheduling to the next iteration. Additionally, filter_matches returns a boolean flag, indicating whether the next run of the rule needs to search the database again to fetch more matches. For instance, if the scheduler decides to not mark any match for application in the current iteration, it may also not want to request more matches until the current batch of matches is applied.

In problem 8, we will customize the cost model. We will define a cost model that prefers terms with smaller depth. Such a cost model is useful when e.g., we want to minimize the depth of computation for maximal parallelism. The cost model can be defined in egglog by implementing the CostModel trait, which requires the user to define the cost of an e-node, a container node, a base value, and additionally, the cost of an AST given the cost of the node itself and its children.