GROOT: Graph Edge Re-growth and Partitioning for the Verification of Large Designs in Logic Synthesis
Traditional verification methods in chip design are highly time-consuming and computationally demanding, especially for large-scale circuits. Graph neural networks (GNNs) have gained popularity as a potential solution to improve verification efficiency. However, no existing framework jointly considers chip-design domain knowledge, graph theory, and GPU kernel design. To address this challenge, we introduce GROOT, an algorithm and system co-design framework that combines chip-design domain knowledge with redesigned GPU kernels to improve verification efficiency. More specifically, we create node features utilizing the circuit node types and the polarity of the input edges to nodes in And-Inverter Graphs (AIGs). We utilize a graph partitioning algorithm to divide large graphs into smaller sub-graphs for fast GPU processing, and develop a graph edge re-growth algorithm to recover verification accuracy. We carefully profile EDA graph workloads and observe the uniqueness of their polarized distribution of high-degree (HD) and low-degree (LD) nodes. We redesign two GPU kernels (HD-kernel and LD-kernel) to fit the EDA graph learning workload on a single GPU. We compare the results with state-of-the-art (SOTA) methods: GAMORA, a GNN-based approach, and the traditional ABC framework. Results show that GROOT achieves a significant reduction in memory footprint (59.38%) with high accuracy (99.96%) for a very large CSA multiplier, i.e., a 1,024-bit design with a batch size of 16, which consists of 134,103,040 nodes and 268,140,544 edges. We also compare GROOT with SOTA GPU kernel designs such as cuSPARSE, MergePath-SpMM, and GNNAdvisor, and achieve up to 1.104x, 5.796x, and 1.469x improvement in runtime, respectively.
💡 Research Summary
The paper presents “GROOT,” an innovative algorithm and system co-design framework specifically engineered to overcome the computational bottlenecks in verifying large-scale circuit designs during logic synthesis. As modern chip designs grow exponentially in complexity, traditional verification methods like the ABC framework become prohibitively slow, while existing Graph Neural Network (GNN) approaches, such as GAMORA, struggle with memory constraints and a lack of hardware-level optimization for massive-scale graphs.
The core innovation of GROOT lies in its holistic integration of three critical domains: EDA (Electronic Design Automation) domain knowledge, graph theory, and GPU kernel architecture. To address the limitations of general-purpose GNNs, the authors first implemented advanced feature engineering by incorporating node types and the polarity of connections within And-Inverter Graphs (AIGs). This allows the model to capture the essential logical properties of the circuit, rather than just the structural topology.
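To make the feature-engineering idea concrete, here is a minimal sketch of how node type and input-edge polarity might be encoded as a feature vector for an AIG node. The feature layout, type set, and function name are illustrative assumptions, not the exact encoding used in the paper:

```python
# Hedged sketch: building AIG node features from node type and input-edge
# polarity. The type set and feature layout are assumptions for illustration;
# the paper's actual encoding may differ.

def aig_node_features(node_type, in_edge_polarities):
    """Return a small feature vector for one AIG node.

    node_type: 'PI' (primary input), 'AND', or 'CONST'
    in_edge_polarities: list of booleans, True if the incoming edge is inverted
    """
    type_onehot = {
        'PI':    [1, 0, 0],
        'AND':   [0, 1, 0],
        'CONST': [0, 0, 1],
    }[node_type]
    # Counting inverted vs. non-inverted fan-in edges captures the logical
    # polarity information that plain structural features would miss.
    n_inverted = sum(in_edge_polarities)
    n_plain = len(in_edge_polarities) - n_inverted
    return type_onehot + [n_plain, n_inverted]

# Example: an AND gate with one inverted and one plain input.
print(aig_node_features('AND', [True, False]))  # -> [0, 1, 0, 1, 1]
```

The point of the encoding is that two structurally identical gates with different edge polarities implement different Boolean functions, so polarity must be visible to the GNN.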
To handle the immense scale of modern circuits, the researchers introduced a sophisticated graph partitioning strategy. Since a single GPU cannot accommodate graphs with hundreds of millions of nodes, the large graph is divided into manageable sub-graphs. To mitigate the accuracy loss typically associated with partitioning—where boundary information is lost—the authors developed an “Edge Re-growth” algorithm. This mechanism reconstructs the missing edge information at the partition boundaries, enabling the framework to achieve a remarkable verification accuracy of 99.96%.
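A toy sketch of the partition-then-re-grow idea follows. The actual algorithm in the paper is more sophisticated; here nodes are simply split into equal contiguous blocks, and each cut edge is "re-grown" into the destination's partition by replicating the boundary source node as a halo node, so no fan-in information is lost at the boundary. All names and the blocking scheme are illustrative assumptions:

```python
# Hedged sketch of graph partitioning with edge re-growth. Nodes are split
# into equal contiguous blocks (an assumption for illustration); cut edges
# are re-grown into the destination partition with a replicated halo node.

def partition_with_regrowth(num_nodes, edges, num_parts):
    part_size = (num_nodes + num_parts - 1) // num_parts
    part_of = lambda v: v // part_size
    parts = [{'nodes': set(range(p * part_size,
                                 min((p + 1) * part_size, num_nodes))),
              'edges': []} for p in range(num_parts)]
    for u, v in edges:
        pu, pv = part_of(u), part_of(v)
        # Every edge lives in the partition that owns its destination node.
        parts[pv]['edges'].append((u, v))
        if pu != pv:
            # Edge re-growth: copy the boundary source into the destination
            # partition so the cut edge's information is recovered.
            parts[pv]['nodes'].add(u)
    return parts

edges = [(0, 1), (1, 2), (2, 3)]   # a 4-node chain
parts = partition_with_regrowth(4, edges, 2)
# The cut edge (1, 2) is re-grown into partition 1; node 1 becomes a halo.
print(sorted(parts[1]['nodes']))   # -> [1, 2, 3]
```

Without the re-growth step, partition 1 would never see the edge (1, 2), which is exactly the boundary information loss the paper's mechanism is designed to repair.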
Furthermore, the paper addresses the hardware-level inefficiency of processing EDA-specific workloads. By profiling the workload, the authors identified a unique, polarized distribution of high-degree (HD) and low-degree (LD) nodes. Leveraging this insight, they redesigned specialized GPU kernels—the HD-kernel and LD-kernel—to optimize memory access and computational throughput. This hardware-aware approach yielded significant performance gains over state-of-the-art GPU-based sparse matrix multiplication (SpMM) libraries: up to a 5.796x speedup over MergePath-SpMM and 1.469x over GNNAdvisor.
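The degree-bucketing idea behind the two kernels can be sketched as follows: nodes are split by fan-in so that the few very-high-degree nodes go to a warp-balanced HD kernel while the long tail of low-degree nodes goes to a lightweight LD kernel. The threshold value and function name are assumptions for illustration; the paper tunes its kernels to the actual GPU:

```python
# Hedged sketch of the HD/LD workload split. The threshold of 8 is an
# illustrative assumption; real kernels would pick it from profiling.

def split_by_degree(in_degrees, hd_threshold=8):
    """Bucket node ids into high-degree and low-degree work lists."""
    hd_nodes = [v for v, d in enumerate(in_degrees) if d >= hd_threshold]
    ld_nodes = [v for v, d in enumerate(in_degrees) if d < hd_threshold]
    return hd_nodes, ld_nodes

# AIG-like workloads are polarized: most nodes have tiny fan-in (an AND
# gate has exactly 2 inputs), while a few hub nodes have very large degree.
degrees = [2, 2, 2, 2, 2, 64, 2, 128]
hd, ld = split_by_degree(degrees)
print(hd)  # -> [5, 7]
print(ld)  # -> [0, 1, 2, 3, 4, 6]
```

Dispatching each bucket to a kernel shaped for its degree profile avoids the load imbalance that a single one-size-fits-all SpMM kernel suffers on such polarized graphs.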
The empirical results are highly impressive: when tested on a massive 1,024-bit CSA multiplier containing approximately 134 million nodes and 268 million edges, GROOT reduced the memory footprint by 59.38% while maintaining near-perfect accuracy. Ultimately, GROOT demonstrates that the synergy between domain-specific algorithmic innovation and hardware-centric system design is the key to enabling the next generation of large-scale, high-efficiency semiconductor verification.