A Framework for Fine-Tuning Large Language Models for Cloud EDA Job Prediction
📝 Abstract
The rapid growth of cloud computing in the Electronic Design Automation (EDA) industry has created a critical need for resource and job lifetime prediction to achieve optimal scheduling. Traditional machine learning methods often struggle with the complexity and heterogeneity of EDA workloads, requiring extensive feature engineering and domain expertise. We propose a novel framework that fine-tunes Large Language Models (LLMs) to address this challenge through text-to-text regression. We introduce the scientific notation and prefix filling to constrain the LLM, significantly improving output format reliability. Moreover, we found that full-attention finetuning and inference improves the prediction accuracy of sliding-window-attention LLMs. We demonstrate the effectiveness of our proposed framework on real-world cloud datasets, setting a new baseline for performance prediction in the EDA domain.
📄 Content
The semiconductor industry relies heavily on Electronic Design Automation (EDA) tools to design and verify complex ICs. As chip designs grow in complexity, the computational demands of EDA workloads have skyrocketed, leading to a massive migration of these tasks to cloud computing platforms (Bavikadi et al., 2022; Stok, 2014). While the cloud offers scalability and flexibility, efficiently managing resources to control costs without compromising performance remains a fundamental challenge (Liu et al., 2023). Accurate prediction of a job’s compute resource requirements (e.g., CPU, memory, and disk) and its execution time, or lifetime, is crucial for efficient workload prioritization, real-time resource provisioning, and long-term infrastructure planning.
Traditional approaches to this prediction problem often rely on statistical methods or machine learning models built on Directed Acyclic Graphs (DAGs) (Huang, 2021) or graph convolutional networks (Kipf, 2016; Zhu et al., 2024). However, these methods require structured, tabular data, forcing engineers to perform extensive and often brittle feature engineering. EDA job configurations are inherently complex and semi-structured, comprising tool settings, design parameters, technology node details, and script configurations. Flattening this rich information into a fixed-length vector is challenging and often leads to a loss of critical contextual information, limiting predictive performance.
The recent success of Large Language Models (LLMs) in diverse domains has opened up new possibilities for tackling complex regression tasks through a text-to-text formulation (Song and Bahri, 2025; Song et al., 2024). However, this opportunity has not yet been explored for EDA cloud job prediction. For the first time, by representing the entire EDA job configuration as a single string, we directly train an LLM to “read” the configuration and “write” the predicted resource and lifetime values. This approach employs LLMs to encode the semi-structured job representations and learn to extract predictive signals from the inherent structure, relations, and dependencies of the data. In this paper, we present a framework for fine-tuning LLMs for EDA job prediction. We demonstrate that this first text-to-text regression approach is not only feasible but highly effective:
• We provide the first validation of training LLMs on semi-structured EDA data for predicting the resource consumption and lifetime of EDA cloud jobs, establishing a new modeling paradigm for this problem space.
• We propose two key techniques to enhance performance: representing numerical outputs in scientific notation to handle large dynamic ranges, and using constrained decoding to guide the model’s output, improving both accuracy and robustness. Moreover, we demonstrate that full-attention fine-tuning can further improve the generation accuracy of a sliding-window pre-trained LLM.
• We empirically validate our framework on real-world EDA datasets, demonstrating significant improvements over various manual and heuristic baselines.
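The two output-side techniques above can be sketched concretely. The following is a minimal illustration, not the paper's actual implementation: the `to_sci` helper, the `metric=` prefix template, and the three-digit mantissa width are all assumptions, since the exact output grammar is not reproduced here.

```python
def to_sci(value: float, mantissa_digits: int = 3) -> str:
    """Render a metric in fixed-width scientific notation (e.g. 4.82e+04),
    so values spanning many orders of magnitude share one token pattern."""
    return f"{value:.{mantissa_digits - 1}e}"

def build_prefix(metric: str) -> str:
    """Prefix filling: seed the decoder with 'metric=' so the model only
    generates the constrained numeric payload, not the surrounding format."""
    return f"{metric}="

# Training targets for two hypothetical metrics:
print(build_prefix("peak_mem_mb") + to_sci(48231.0))  # peak_mem_mb=4.82e+04
print(build_prefix("runtime_s") + to_sci(0.0137))     # runtime_s=1.37e-02
```

At inference time, the same prefix is pre-filled into the decoder before sampling, so the model never has to reproduce the metric name or the `mantissa e exponent` scaffold itself, which is what makes the output format reliable enough to parse back with `float()`.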
EDA workflows consist of a series of computational jobs, such as logic synthesis, place-and-route, timing analysis, and physical verification. These jobs’ performance is highly dependent on a multitude of factors, including but not limited to:
Design Characteristics: The size and complexity of the circuit design (e.g., number of logic gates, memory blocks).
Tool Configuration: The specific EDA tool, its version, and the multitude of settings and flags used for a particular run.
Technology Node: The target semiconductor manufacturing process (e.g., 7nm, 5nm), which dictates physical design rules.
Execution Environment: The underlying cloud infrastructure, including VM types and storage solutions.
The interplay between these factors creates a high-dimensional and complex feature space. A minor change in a synthesis script could lead to a drastically different netlist and cause a tenfold increase in the runtime of the subsequent place-and-route stage. This sensitivity makes prediction a complicated regression task.
Let X denote the space of heterogeneous EDA job configurations, where a specific job instance 𝑋 ∈ X encapsulates parameters such as dependency graphs, command-line arguments, and hardware constraints. Our objective is to predict a set of performance metrics 𝑌 ∈ ℝ^𝑚, which include peak memory usage, disk I/O, CPU utilization, and wall-clock execution time.
Traditional approaches typically frame this as a regression problem, requiring a feature extraction function 𝜙 : X → ℝ^𝑑 to map the complex configuration 𝑋 into a fixed-size feature vector. A regression model 𝑓_𝜃 is then learned such that 𝑓_𝜃(𝜙(𝑋)) ≈ 𝑌. The primary limitation of this paradigm lies in the design of 𝜙(·), which often struggles to capture the nuanced semantics of textual and hierarchical data within 𝑋. In contrast, we formulate this task as a sequence-to-sequence generation problem. We define a serialization function that renders the entire configuration 𝑋 as a single text string, and train the LLM to generate the target metrics 𝑌 directly as text.
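The serialization step can be illustrated with a minimal sketch. The field names, the nested-dict handling, and the `" | "` delimiter below are all hypothetical choices for illustration; the paper does not specify its serialization format, only that the full semi-structured configuration becomes one string.

```python
import json

def serialize_job(config: dict) -> str:
    """Flatten a semi-structured EDA job configuration into a single
    'key: value' string for the LLM input (hypothetical schema)."""
    parts = []
    for key, value in sorted(config.items()):
        if isinstance(value, dict):
            # Nested tool settings are kept as compact JSON rather than
            # flattened into a fixed-length vector, preserving structure.
            value = json.dumps(value, sort_keys=True)
        parts.append(f"{key}: {value}")
    return " | ".join(parts)

job = {
    "tool": "synthesis_v3.2",
    "node": "5nm",
    "flags": {"effort": "high", "retime": True},
}
print(serialize_job(job))
# flags: {"effort": "high", "retime": true} | node: 5nm | tool: synthesis_v3.2
```

The point of this formulation is that no information is discarded up front: the model sees the raw tool settings, node, and flags as text and learns which of them are predictive, rather than relying on a hand-designed 𝜙(·).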