0x0f0f0f - Alessandro Cheli

Metatheory.jl and Julia Symbolics - GSOC 2021

Published 18 August 2021

Tags: #julia, #metatheory.jl, #metatheory, #symbolics, #gsoc

Google Summer of Code 2021

This year I've applied for Google Summer of Code. My proposed project involved many practical applications of e-graph term rewriting for existing packages in the Julia programming language ecosystem through my package Metatheory.jl.

See the GSoC project page here.

Metatheory.jl is a general purpose metaprogramming and algebraic computation library for the Julia programming language, designed to take advantage of the powerful reflection capabilities to bridge the gap between symbolic mathematics, abstract interpretation, equational reasoning, optimization, composable compiler transforms, and advanced homoiconic pattern matching features. The core feature of Metatheory.jl is e-graph rewriting, a fresh approach to term rewriting achieved through an equality saturation algorithm. Our implementation of equality saturation on e-graphs is based on the excellent, state-of-the-art technique implemented in the egg library, reimplemented in pure Julia. MT (Metatheory.jl for short) allows users to define concise rewriting rules in pure, syntactically valid Julia on a high level of abstraction.

There is a lot of unexplored potential behind this novel technique of e-graph rewriting, and Julia felt like the natural language where we could explore and unleash this great symbolic manipulation potential.

Talk at JuliaCon

During this period of time, me and Philip Zucker have prepared a talk for the JuliaCon 2021 conference explaining Metatheory.jl and its potential. I have to thank Philip for being the first person to have the idea to implement e-graph rewriting in Julia. You can check out his early blog posts showcasing the ideas here and here. Before reading this article, which goes quite into technical implementation details, I strongly suggest readers to take a look at the Recommended Readings section in the Metatheory.jl README file. In this 25 min talk, we try to give an high level explanation of most things and facts behind the equality saturation algorithm, e-graph rewriting and why Metatheory.jl can be useful in the Julia ecosystem.

Another great talk which explains the equality saturation algorithm more concisely has been given by Max Willsey for POPL21. Please note that this lightning talk is about the original egg library, implemented in Rust. Watch the talk here.

We will also discuss some SymbolicUtils.jl and Symbolics.jl implementation details. You can check out this talk by Shashi Gowda and Yingbo Ma about Symbolics.jl at JuliaCon 2021:

Goals Proposed

For this GSOC project, I've proposed a bunch of goals, though, all these goals imply a large amount of work that is impractical to tackle alone. Completing all of them would have required a huge amount of time. Together with my mentor Sashi Gowda, we pinpointed and chosen some of these goals that fit most in the GSoC period, most regarding the SymbolicUtils.jl and Symbolics.jl integration with Metatheory.jl

Positive and Negative Results

Not all of the goals have been achieved. We achieved many positive results in integrating MT with SymbolicUtils.jl and Symbolics.jl, while we've encountered some problems that caused a slowdown in researching e-graph rewriting for category theoretical applications. Here's a recap of the positive results achieved:

What's new in Metatheory.jl

The latest release of Metatheory.jl is 0.4.1. Here are the things that changed in the package approximately since the start of GSoC:

┌ Info: Equality Saturation Report
│ =================
│       Stop Reason: Iteration Timeout
│       Iterations: 8
│       EGraph Size: 2 eclasses, 1165 nodes
│       Total Time: (time = 0.084321744, bytes = 25927056, gctime = 0.012734903)
│       Search Time: (time = 0.051270235000000004, bytes = 18560096, gctime = 0.012734903)
│       Apply Time: (time = 0.025271813000000004, bytes = 5952600, gctime = 0.0)
└       Rebuild Time: (time = 0.007779695999999999, bytes = 1414360, gctime = 0.0)
Configurable Parameters for the equality saturation process.
@with_kw mutable struct SaturationParams
    timeout::Int = 8
    matchlimit::Int = 5000
    eclasslimit::Int = 5000
    enodelimit::Int = 15000
    goal::Union{Nothing, SaturationGoal} = nothing
    stopwhen::Function = ()->false
    scheduler::Type{<:AbstractScheduler} = BackoffScheduler

What about TermInterface.jl?

We have decided, together with Shashi, to create TermInterface.jl, a package that defines a shared interface for all the common operations on symbolic terms and expressions, a feature that is missing in the Base module in the Julia standard library and is only present for the native Expr type. I have then updated Metatheory.jl, Symbolics.jl and SymbolicUtils.jl to support this shared interface and it is working great.

The evil plan consists of a simple design choice: any type that overrides and dispatches on the TermInterface.jl functions can be used as a symbolic expression in all the Julia packages that do symbolic manipulation and term rewriting magic: Symbolics.jl, SymbolicUtils.jl and Metatheory.jl (and possibly more!).

Unreleased Changes in Metatheory.jl!

However, I am close to release version 0.5 of Metatheory.jl with the full integration with TermInterface.jl (and consequently SymbolicUtils.jl and Symbolics.jl)! Here are the new commits on the master branch since the 0.4.1 release. What is new and close to be released?

Before releasing this new version tho, my PR has to be merged in SymbolicUtils.jl, since I want to add a dependency on it for Rewriters combinators.

SymbolicUtils.jl + Metatheory.jl = 🚀

The integration with TermInterface.jl is still on PR branches of SymbolicUtils.jl and Symbolics.jl, so you have to manually add my branches if you want to try. (SU PR and Symbolics PR), please note that it is still a work in progress.

First, a Symbolics.optimize function has been introduced. It takes a symbolic expression and returns another (numerically-approximately) equivalent expression which is less costly to compute, the cost function for optimization is defined to approximate the cost in CPU cycles.

This is based on the work that we published in the paper 'High-performance symbolic-numerics via multiple dispatch', submitted to arxiv in May with Shashi Gowda, Yingbo Ma, Maja Gwozdz, Viral B. Shah, Alan Edelman and Christopher Rackauckas. We showcase an e-graph ruleset which optimizes Symbolics.jl expressions minimizing the number of CPU cycles during expression evaluation, and demonstrate how it simplifies a real-world chemical reaction-network from a Catalyst.jl simulation to halve the runtime required, all thanks to this new TermInterface.jl integration between Symbolics.jl and Metatheory.jl

We submitted the paper to the ISSAC 2021 conference, Chris Rackauckas has given a nice talk about the paper, Symbolics.jl and also explaining the Metatheory.jl integration in detail.

Metatheory.jl rules now are callable functions (as SU rules), so one can do cool things using SU's Rewriter combinators (and more). Metatheory.jl classical rewriting module was slow and unstable so now most of the magic happens thanks to what was already in SU, Here is a short demo of how it will work.

using Metatheory.jl
using SymbolicUtils
using SymbolicUtils.Rewriters

arithm_rules = @theory begin
	(a + b, σ) 		  => (a, σ) + (b, σ)
	(a * b, σ) 		  => (a, σ) * (b, σ)
	(a - b, σ) 		  => (a, σ) - (b, σ)
	(a::Int + b::Int) |> a + b
	(a::Int * b::Int) |> a * b
	(a::Int - b::Int) |> a - b
	(n::Int, σ) |> n

strategy = (Fixpoint ∘ Postwalk ∘ Fixpoint ∘ Chain)

eval_arithm(ex, mem) = 
	strategy(read_mem ∪ arithm_rules)(:($ex, $mem))

All the operation, similarterm, arguments and istree methods in my branches of Symbolics and SU have been moved and renamed to dispatches of the functions in the TermInterface.jl package, namely gethead, similarterm, getargs, isterm. There's still some things to polish. Some codegen and minor things are broken, the core works fine and I have to define deprecated methods to make it still work with other packages such as ModelingToolkit.jl. Most importantly it allows developers to implement a simple interface for their symbolic expression types to be handled by both Metatheory (EGraph rewriting) and Symbolics and potentially by any other symbolic magic package. This provides Expr dispatches as symbolic types by default.

A working demo/blogpost of Metatheory.jl optimizing a Catalyst.jl chemical reaction network through Symbolics.jl is coming soon!

Ideas for Future Work on SymbolicUtils.jl

After working with SymbolicUtils and Symbolics, I have a couple of suggestions/critics for some SymbolicUtils.jl design choices. I'm eager to dedicate some time to code and improve the packages in the near future.

for f ∈ [:+, :-, :*, :/, :^]
    @eval begin
        Base.$f(x::Union{Expr, Symbol}, y::Number) = Expr(:call, $f, x, y)
        Base.$f(x::Number, y::Union{Expr, Symbol}) = Expr(:call, $f, x, y)
        Base.$f(x::Union{Expr, Symbol}, y::Union{Expr, Symbol}) = (Expr(:call, $f, x, y))


What about the goals about IR rewriting in the compiler pipeline? Sadly, the time was short and focusing on all the symbolic mathematics, Metatheory.jl internals and compiler pipeline goals at the same would have been too hard for me in a single summer. Together with Chen Zhao and Roger Luo, we discussed and proposed some experiments of Metatheory.jl e-graph rewriting for optimizing their quantum circuit IR for their quantum computing package Yao.jl!

This work is still in an early stage, but you can check out our experiments in this repo YaoEGraphs.jl. If this high level optimization use case is successful, this will show that optimizing with e-graph rewriting is also practicable on the various native Julia compiler IRs. McCoy Becker has also spent a lot of time working in his Compiler Plugin Github Organization, we tried to experiment a bit in optimizing Julia code using his package Mixtape.jl and Metatheory.jl, but in the end, during the past few months we didn't really end up defining a TermInterface.jl implementation for the native Julia IR constructs (or higher level abstractions).

Results with Category Theory and Catlab.

Regarding the (currently negative) results about rewriting in category theory, we have experimented a lot with Philip Zucker and James Fairbanks. The Catlab.jl package, designed for modeling category theoretical and algebraic constructs inside of Julia works through a different kind of symbolic expression, based on the theory of Generalized Algebraic Theories. These expressions require a complex way of embedding their types into the expressions. In higher order category contexts, those type tags often directly contain other expressions and not only 'type objects'. We tried many different encodings of those 'type tags' and tried to produce some proofs in various categories, but we found out that the equality saturation algorithm sometimes does not terminate even on trivial proofs and 'loses' a lot of time trying to apply equalities that are not relevant for the proof in question.

This is why we need:

Proof Certificates are Hard

Seeing why an e-graph rewrite fails is hard. This is why, when experimenting with rewriting for Catlab.jl expressions I have spent quite some time trying to design and implement an algorithm that provides an human readable proof (trace of rule application), after Metatheory.jl tries to prove that two expressions are equal using e-graphs. Max Willsey then let me know that Oliver Flatt is working on this problem on the Rust side of this e-graph rewriting world. In a long video chat with Oliver, he explained to me his new, almost finished algorithm. Turns out this task is much more complicated than I've expected, so, I've decided to focus on the Symbolics.jl integration goal for the rest of the GSoC, leaving this much harder problem for future work together with Catlab.jl integration.


I have to thank Max Willsey for creating the awesome egg library, Philip Zucker for having the awesome idea of implementing e-graph rewriting in Julia, Shashi Gowda and Keno Fisher for mentoring me in this GSoC project. Shashi Gowda, Yingbo Ma, Maja Gwozdz, Viral B. Shah, Alan Edelman, Christopher Rackauckas for giving me the awesome opportunity to co-author the Symbolics.jl paper with them, Shashi Gowda, Chris Rackauckas and Yingbo Ma for helping me understand more and work on Symbolics.jl and SymbolicUtils.jl and introducing me in the JuliaSymbolics Github organization. Filippo Bonchi for teaching me a lot about category theory and guiding me through this work, with many friendly revisions of my talks and papers. Evan Patterson and James Fairbanks for creating Catlab.jl, helping me understand it and work with it, Chen Zhao and Roger Luo for their interest in my project and letting me experiment with them on the Yao quantum IR, McCoy Becker for explaining me a lot about how the Julia compiler internals work, and for his Mixtape.jl package and his experiments with Metatheory.jl, Marek Kaluba for his interest in the project and for showing me some cool group theoretical problems, Miguel Raz Guzmán Macedo for his support, Taine Zhao for the MatchCore.jl library and help during the initial development phase of Metatheory.jl. Thanks to Philip Zucker, Chen Zhao, David P. Sanders, Greg Peairs, Mosè Giordano, 'NumHack' and McCoy R. Becker for contributing to the Metatheory.jl package on Github, and finally, thanks to everybody that has given me support in my project from the beginning!