DUET: Agentic Design Understanding via Experimentation and Testing
Gus Henry Smith1*, Sandesh Adhikary2, Vineet Thumuluri2, Karthik Suresh2, Vivek Pandit2,
Kartik Hegde2, Hamid Shojaei2, Chandra Bhagavatula2
1Southmountain Research, 2ChipStack**
gus@southmountain.ai, {sandesha,vineett,sureshk,vivekp,
kartikv,hamids,bchandra}@cadence.com
Abstract–AI agents powered by large language models (LLMs) are being used to solve increasingly complex software
engineering challenges, but struggle with hardware design tasks. Register Transfer Level (RTL) code presents a unique
challenge for LLMs, as it encodes complex, dynamic, time-evolving behaviors using the low-level language features of
SystemVerilog. LLMs struggle to infer these complex behaviors from the syntax of RTL alone, which limits their ability
to complete downstream tasks such as code completion, documentation, or verification. In response to this issue, we
present DUET: a general methodology for developing Design Understanding via Experimentation and Testing. DUET
mimics how hardware design experts develop an understanding of complex designs: not just via a one-off readthrough
of the RTL, but via iterative experimentation using a number of tools. DUET iteratively generates hypotheses, tests
them with EDA tools (e.g., simulation, waveform inspection, and formal verification), and integrates the results to build
a bottom-up understanding of the design. In our evaluations, we show that DUET improves AI agent performance on
formal verification, when compared to a baseline flow without experimentation.
I. INTRODUCTION
AI agents are quickly taking over many tasks in software engineering and beyond. Powered by large language
models (LLMs), AI agents can be deployed as powerful autonomous software workflows. An AI agent takes a
text prompt as input, and is generally also equipped with tools it can call (e.g., Python functions or command-line
utilities to read and write files, execute computations, or run installed tools). The agent then iteratively queries
the LLM (starting with the prompt) to get the next action. When the LLM requests an action, the function or
command-line utility is called and the results are sent back to the LLM. Eventually, the LLM decides to stop (or
is stopped externally), and sends a final result. Throughout the process, the agent may have also created a number
of artifacts, such as files in the filesystem. Using this general structure, AI agents have been able to replicate many
human tasks.
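The agent loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: `stub_llm`, `read_file`, the `TOOLS` registry, and the in-memory file contents are all hypothetical stand-ins, and a real agent would replace `stub_llm` with a call to an actual LLM.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool: reads from an in-memory "filesystem" so the sketch is self-contained.
def read_file(path: str) -> str:
    files = {"counter.sv": "module counter(input clk, output [2:0] q); endmodule"}
    return files.get(path, "<not found>")

# Hypothetical tool registry mapping tool names to Python functions.
TOOLS: Dict[str, Callable[[str], str]] = {"read_file": read_file}

def stub_llm(history: List[str]) -> Tuple[str, str]:
    """Stand-in for a real LLM call; returns (action, argument)."""
    # Request one tool call first, then stop once a tool result is available.
    if not any(msg.startswith("tool_result:") for msg in history):
        return ("read_file", "counter.sv")
    return ("stop", "The file declares a module named counter.")

def run_agent(prompt: str,
              llm: Callable[[List[str]], Tuple[str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    """Iteratively query the LLM, run the requested tool, and feed the result back."""
    history = [f"prompt: {prompt}"]
    for _ in range(max_steps):
        action, arg = llm(history)
        if action == "stop":          # the LLM decided to stop
            return arg
        result = tools[action](arg)   # call the requested tool
        history.append(f"tool_result: {result}")
    return "<stopped externally: step limit reached>"

if __name__ == "__main__":
    print(run_agent("Describe counter.sv", stub_llm, TOOLS))
```

The `max_steps` limit plays the role of the external stop mentioned above; without it, an LLM that never emits a stop action would loop forever.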
While AI agents have shown human-level performance in software engineering tasks, they continue to struggle
with similar tasks in hardware design. Among the many possible reasons, we hypothesize that a central one is that the underlying
LLMs powering AI-assisted tools struggle with Register Transfer Level (RTL) code. RTL is inherently complex,
often capturing dynamic, time-evolving behaviors using the low-level, heavily implicit syntax of languages like
SystemVerilog. This is in stark contrast to software languages like Python or Java, which contain more structure in
the syntax of the code itself; for example, sequential lines in imperative programming languages generally correspond
to instructions that execute one after another over time, and function names can be used to understand when control flow jumps
across files in the codebase. RTL, on the other hand, does not have such structure. For example, sequential states in
a finite state machine (FSM) might be separated by tens or hundreds of lines in a case statement, and the control
flow between these states may be much less explicit. Thus, when LLMs are given only the RTL, they struggle
to understand the deeper behaviors of hardware designs and build design understanding. As a result, AI agents
perform worse on downstream hardware tasks like verification or debugging.
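To make the gap between textual order and temporal order concrete, the following Python sketch models a small FSM whose branches are deliberately listed in a different order than the states occur in time, much like a long case statement. The state names, inputs, and stimulus are hypothetical and chosen only for illustration; they are not taken from any design discussed in this paper.

```python
from typing import List, Tuple

def next_state(state: str, start: bool, done: bool) -> str:
    """Hypothetical FSM; branches appear in source order IDLE, ACK, WAIT, SEND."""
    if state == "IDLE":
        return "SEND" if start else "IDLE"
    if state == "ACK":
        return "IDLE"
    if state == "WAIT":
        return "ACK" if done else "WAIT"
    if state == "SEND":
        return "WAIT"
    raise ValueError(f"unknown state: {state}")

def simulate(inputs: List[Tuple[bool, bool]], state: str = "IDLE") -> List[str]:
    """Apply one (start, done) input pair per clock cycle and record the state trace."""
    trace = [state]
    for start, done in inputs:
        state = next_state(state, start, done)
        trace.append(state)
    return trace

# One transaction: assert start, wait two cycles, then assert done.
stimulus = [(True, False), (False, False), (False, False), (False, True), (False, False)]
trace = simulate(stimulus)
print(" -> ".join(trace))  # IDLE -> SEND -> WAIT -> WAIT -> ACK -> IDLE
```

Reading the branches top to bottom suggests the order IDLE, ACK, WAIT, SEND, whereas the simulated trace shows the actual temporal order. This is the kind of information that is easy to obtain by running the design and hard to recover from the syntax alone.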
However, hardware designers themselves do not build their understanding of a design simply by reading the RTL.
Instead, the process is more dynamic; designers will use tools like simulation, waveform viewers, and formal tools to
understand the design. Even if the designer builds their understanding of the design purely from documentation, that
documentation contains descriptions of the dynamic behavior of the design such as timing diagrams and waveforms.
*work done while at ChipStack; **work done before ChipStack acquisition by Cadence Design Systems
arXiv:2512.06247v2 [cs.SE] 22 Jan 2026
We present DUET: a general methodology for developing Design Understanding via Experimentation and Testing.
DUET mimics how hardware design experts develop an understanding of complex designs: not just through
reading the RTL, but via iterative experimentation. DUET enables AI agents to generate hypotheses about a design,
and equips them with tools to test and confirm or refine these hypotheses.
Let’s consider a simple example. Imagine we task an AI agent with describing a design with a reasonably complex
finite state machine—for example, an implementation of I2C. Specifically