E-Graphs as a Persistent Compiler Abstraction
Recent algorithmic advances have made equality saturation an appealing approach to program optimization because it avoids the phase-ordering problem. Existing work uses external equality saturation libraries, or custom implementations that are deeply tied to the specific application. However, these works only apply equality saturation at a single level of abstraction, or discard the discovered equalities when code is transformed by other compiler passes. We propose an alternative approach that represents an e-graph natively in the compiler’s intermediate representation, facilitating the application of constructive compiler passes that maintain the e-graph state throughout the compilation flow. We build on a Python-based MLIR framework, xDSL, and introduce a new MLIR dialect, eqsat, that represents e-graphs in MLIR code. We show that this representation expands the scope of equality saturation in the compiler, allowing us to interleave pattern rewriting with other compiler transformations. The eqsat dialect provides a unified abstraction for compilers to utilize equality saturation across various levels of intermediate representations concurrently within the same MLIR flow.
💡 Research Summary
The paper “E‑Graphs as a Persistent Compiler Abstraction” introduces a novel way to embed equality‑saturation directly into a compiler’s intermediate representation, eliminating the need for external e‑graph libraries and preserving discovered equivalences across multiple compilation stages. Traditional approaches convert a program at a single abstraction level into an e‑graph, run saturation, and then extract a single optimized representation for the next pass. This workflow suffers from two major drawbacks: (1) translation overhead and loss of information when moving between the compiler IR and an external e‑graph library, and (2) the inability to reuse the rich set of equivalent expressions during later transformations such as lowering, inlining, or domain‑specific optimizations.
To address these issues, the authors build on the Python‑based MLIR framework xDSL and define a new dialect called eqsat. In eqsat, an e‑graph is represented as a graph region—a first‑class MLIR construct that can contain cycles and is fully integrated with the SSA‑based value system. Each e‑class and e‑node becomes an MLIR operation, allowing the e‑graph to be manipulated using the same infrastructure that drives conventional compiler passes.
A key technical contribution is the reinterpretation of MLIR’s Pattern Description Language (PDL). Instead of applying PDL patterns destructively (replacing matched sub‑trees immediately), the authors implement a non‑destructive e‑matching interpreter that adds new nodes to the e‑graph or merges existing ones while leaving the original IR untouched. This enables the reuse of existing declarative rewrite specifications for both traditional lowering and equality‑saturation, without requiring developers to learn a new language.
The paper also extends MLIR’s sparse data‑flow analysis framework to support e‑class analyses and a cost model. Each e‑class can be annotated with operation costs (e.g., number of scalar operations), and a dedicated extraction pass replaces every e‑class with its lowest‑cost member after saturation reaches a fixed point or a user‑defined iteration limit.
The authors validate their approach with a case study involving complex‑number arithmetic. They start from high‑level expressions such as complex(p, q) / complex(r, s) and apply a set of rewrite rules that lower complex operations to scalar arithmetic, compute absolute values, and perform algebraic simplifications. When using an external e‑graph library, the saturation process requires many iterations and the e‑graph grows explosively because all rewrite rules are applied indiscriminately. By contrast, with the eqsat dialect the same high‑level expressions are inserted directly into the e‑graph, and non‑monotonic rules (e.g., associativity, commutativity) can be applied safely. The system discovers the optimal scalar expression (eight operations) in far fewer iterations, demonstrating both performance gains and the ability to interleave saturation with other passes.
Beyond the case study, the paper highlights several broader benefits:
- Zero translation overhead – No need to serialize/deserialize between IR and an external data structure.
- Preservation of equivalences – All discovered equalities remain available for subsequent passes, enabling cross‑level optimizations.
- Reuse of existing infrastructure – Existing PDL patterns, SSA‑based analyses, and passes can operate on the embedded e‑graph without modification.
- Sparse, region‑based rewriting – Because MLIR’s SSA use‑def chains are available, transformations can target only the relevant sub‑graph, avoiding unnecessary work.
- Support for cyclic e‑graphs – Graph regions naturally allow cycles, which are essential for representing certain equivalence classes.
The authors argue that this approach is especially suitable for modern compilers that manipulate multiple abstraction levels (e.g., tensor compilers, digital circuit synthesis tools, high‑performance numerical libraries). By making equality‑saturation a first‑class, persistent abstraction, the framework removes the classic phase‑ordering problem and opens the door to more aggressive, exploration‑driven optimization strategies.
In summary, the paper presents a compelling integration of e‑graphs into MLIR, demonstrates its practical advantages through a concrete mathematical example, and outlines a path for broader adoption in diverse compiler ecosystems.
Comments & Academic Discussion
Loading comments...
Leave a Comment