RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Floorplanning determines the coordinates and shape of each module in an integrated circuit (IC). As technology nodes scale, adhering to complex hardware design rules during the floorplanning stage has become increasingly challenging, especially in 3D scenarios with multiple stacked layers. Current methods handle only a limited set of specific design rules, while violations of the remaining rules require meticulous manual adjustment. This leads to labor-intensive and time-consuming post-processing for expert engineers. In this paper, we propose an all-in-one deep reinforcement learning-based approach to tackle these challenges, and design novel representations for real-world IC design rules that have not been addressed by previous approaches. Specifically, the processing of various hardware design rules is unified into a single framework with three key components: 1) novel matrix representations to model the design rules, 2) constraints on the action space to filter out invalid actions that cause rule violations, and 3) quantitative analysis of constraint satisfaction as reward signals. Experiments on public benchmarks demonstrate the effectiveness and validity of our approach. Furthermore, transferability is well demonstrated on unseen circuits. Our framework is extensible to accommodate new design rules, thus providing flexibility to address emerging challenges in future chip design. Code will be available at: https://github.com/Thinklab-SJTU/EDA-AI

💡 Research Summary

The paper introduces RulePlanner, a novel deep reinforcement‑learning (RL) framework that simultaneously satisfies a comprehensive set of hardware design rules in three‑dimensional (3D) integrated‑circuit (IC) floorplanning. As technology nodes shrink, 3D floorplanning must obey increasingly complex constraints such as non‑overlap, boundary alignment, grouping (abutted blocks), inter‑die alignment, pre‑placement, outline, and shape‑ratio limits. Existing analytical, heuristic, and RL methods each handle only a subset of these rules; the rest are left to manual post‑processing, which is time‑consuming and error‑prone.

RulePlanner addresses this gap by (1) converting each design rule into a matrix representation, (2) using these matrices to construct binary masks that prune illegal actions directly in the policy’s output, and (3) defining quantitative metrics for every rule that become components of the reward signal. Two key matrices are introduced: the Adjacent Terminal Mask, which stores the Manhattan distance from a candidate block position to each terminal it must align with, and the Adjacent Block Mask, which stores the adjacency length between a candidate block and already‑placed blocks that must be physically abutted. Both matrices are computed in O(WH) time using GPU‑accelerated meshgrid operations; when multiple terminals or blocks are involved, max/min or sum operators merge the individual masks.
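The meshgrid-based computation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of measuring distance from the block center, and the reading of W and H as the canvas width and height are all assumptions made for illustration, and the paper runs the equivalent operations on a GPU.

```python
import numpy as np

def terminal_distance_map(width, height, block_w, block_h, terminal_xy):
    """Manhattan distance from every candidate position to one terminal.

    Entry [y, x] corresponds to placing the block's lower-left corner at
    (x, y). Assumption: distance is measured from the block center; the
    paper may use a different reference point.
    """
    # xs[y, x] == x and ys[y, x] == y, both of shape (height, width).
    xs, ys = np.meshgrid(np.arange(width), np.arange(height), indexing="xy")
    cx = xs + block_w / 2.0
    cy = ys + block_h / 2.0
    tx, ty = terminal_xy
    return np.abs(cx - tx) + np.abs(cy - ty)

def merge_maps(maps, how="sum"):
    """Merge per-terminal maps with a sum, max, or min operator."""
    stacked = np.stack(maps)
    if how == "sum":
        return stacked.sum(axis=0)
    if how == "max":
        return stacked.max(axis=0)
    if how == "min":
        return stacked.min(axis=0)
    raise ValueError(f"unknown merge operator: {how}")

# Hypothetical 4 x 3 canvas, a 2 x 2 block, and two terminals.
d1 = terminal_distance_map(4, 3, 2, 2, (1, 1))
d2 = terminal_distance_map(4, 3, 2, 2, (4, 3))
total = merge_maps([d1, d2], how="sum")
```

Every cell is computed once with vectorized arithmetic, which is consistent with the O(WH) cost stated above. Which merge operator suits which rule is left open here.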

The floorplanning problem is modeled as an episodic Markov Decision Process. At each step the actor‑critic agent observes a state consisting of the stacked rule matrices and the netlist graph, then outputs a hybrid action (x, y, AR) where (x, y) is a discrete placement coordinate and AR is a continuous aspect‑ratio value. The binary masks filter out invalid (x, y) positions, and AR is clipped to satisfy shape constraints. Consequently, the agent never proposes actions that would violate hard constraints, eliminating the need for large penalty terms and accelerating convergence.
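One common way to realize this kind of action-space pruning is to set the logits of invalid positions to negative infinity before the softmax, so they receive exactly zero probability. The sketch below assumes that approach; the function names and the aspect-ratio bounds are hypothetical, and the paper's policy network may apply its masks differently.

```python
import numpy as np

def masked_position_probs(logits, valid_mask):
    """Softmax over placement positions, with invalid positions zeroed out."""
    logits = np.asarray(logits, dtype=float)
    valid = np.asarray(valid_mask, dtype=bool)
    if not valid.any():
        raise ValueError("no valid placement position remains")
    # Invalid cells get -inf, so exp() maps them to exactly 0.
    masked = np.where(valid, logits, -np.inf)
    # Subtract the largest valid logit for numerical stability.
    masked = masked - masked[valid].max()
    exp = np.exp(masked)
    return exp / exp.sum()

def clip_aspect_ratio(ar, ar_min, ar_max):
    """Clip the continuous aspect-ratio output into the allowed range."""
    return float(np.clip(ar, ar_min, ar_max))

# Hypothetical 2 x 2 canvas where only the diagonal positions are legal.
probs = masked_position_probs([[1.0, 2.0], [3.0, 4.0]], [[1, 0], [0, 1]])
ar = clip_aspect_ratio(3.0, 0.5, 2.0)
```

Because masked positions have zero probability, sampling from `probs` can never select a placement the masks have ruled out.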

The reward function aggregates seven rule‑specific scores: (a) block‑terminal distance (to be minimized), (b) block‑block adjacency length (to be maximized), (c) inter‑die alignment area (maximized), (d) pre‑placement deviation (minimized), (e) overlap area (minimized), (f) outline violation (minimized), and (g) shape‑ratio deviation (minimized). Standard objectives such as half‑perimeter wire length (HPWL) are also included. This multi‑objective reward directly guides the policy toward designs that respect all constraints.
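As a rough illustration of how such terms might be combined, the sketch below computes HPWL with its standard definition and aggregates the scores as a weighted sum. The weighted-sum form, the score names, and the weight values are assumptions; the summary does not state how the paper combines or scales the terms. Quantities to be minimized carry negative weights and quantities to be maximized carry positive ones.

```python
def hpwl(nets):
    """Half-perimeter wire length: sum of bounding-box half-perimeters.

    Each net is a list of (x, y) pin coordinates.
    """
    total = 0.0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def aggregate_reward(scores, weights):
    """Weighted sum of per-rule scores (hypothetical aggregation scheme)."""
    return sum(weights[name] * value for name, value in scores.items())

# Hypothetical example: two nets and two of the reward terms.
nets = [[(0, 0), (3, 4)], [(1, 1), (1, 5), (2, 2)]]
scores = {"hpwl": hpwl(nets), "adjacency_length": 4.0}
weights = {"hpwl": -1.0, "adjacency_length": 0.5}
reward = aggregate_reward(scores, weights)
```

In practice the individual terms would probably need normalizing before weighting, since wire length and adjacency length live on different scales.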

Experiments on publicly available 3D floorplanning benchmarks and on larger, unseen circuits demonstrate that RulePlanner achieves >90 % compliance across all seven rule categories, reduces HPWL and overlap by 15–20 % compared with the best prior methods, and converges faster due to the action‑space pruning. Zero‑shot transfer experiments show that a model trained on one set of circuits can be applied to completely new designs with minimal performance loss, confirming strong generalization. Table 1 in the paper highlights that prior approaches satisfy at most three or four rule types, whereas RulePlanner satisfies all.

Limitations include the memory footprint of the rule matrices for very large floorplan canvases and the current focus on a fixed number of layers. The framework also does not yet incorporate power‑density or thermal constraints, which are important for advanced nodes. The authors suggest future work on sparse or hierarchical mask representations, multi‑scale policy networks, and integration of additional physical models to create a truly holistic floorplanning optimizer.

In summary, RulePlanner is presented as a unified RL solution that handles a broad set of industrial 3D IC floorplanning constraints, including rules that previous approaches did not address. By embedding rule knowledge directly into the state and action representations and by rewarding quantitative rule satisfaction, it reduces the need for costly manual legalization and opens the door to scalable, automated chip layout generation. The authors state that the code will be made available at the repository linked above, facilitating further research and industry adoption.
