AutoFSM: A Multi-agent Framework for FSM Code Generation with IR and SystemC-Based Testing


With the rapid advancement of large language models (LLMs) in code generation, their applications in hardware design are receiving growing attention. However, existing LLMs face several challenges when generating Verilog code for finite state machine (FSM) control logic, including frequent syntax errors, low debugging efficiency, and heavy reliance on test benchmarks. To address these challenges, this paper proposes AutoFSM, a multi-agent collaborative framework designed for FSM code generation tasks. AutoFSM introduces a structurally clear intermediate representation (IR) to reduce the syntax error rate during code generation and provides a supporting toolchain to enable automatic translation from IR to Verilog. Furthermore, AutoFSM is the first to integrate SystemC-based modeling with automatic testbench generation, thereby improving debugging efficiency and feedback quality. To systematically evaluate the framework’s performance, we construct SKT-FSM, the first hierarchical FSM benchmark in the field, comprising 67 FSM samples across different complexity levels. Experimental results show that, under the same base LLM, AutoFSM consistently outperforms the open-source framework MAGE on the SKT-FSM benchmark, achieving up to an 11.94% improvement in pass rate and up to a 17.62% reduction in syntax error rate. These results demonstrate the potential of combining LLMs with structured IR and automated testing to improve the reliability and scalability of register-transfer level (RTL) code generation.


💡 Research Summary

The paper presents AutoFSM, a multi‑agent framework that targets three persistent problems in large‑language‑model (LLM)‑driven hardware code generation: high syntax‑error rates, low debugging efficiency, and heavy reliance on test benchmarks. AutoFSM introduces a structurally clear, JSON‑based intermediate representation (IR) for finite‑state‑machine (FSM) specifications. An “FSMExtractor” agent translates natural‑language design descriptions into this IR, which explicitly enumerates states, transitions, inputs, outputs, and initial conditions. Because the semantic understanding performed by the LLM is decoupled from the syntactic generation of Verilog, the IR can be fed into an existing tool (fsm2sv) that emits syntactically correct Verilog. This is the mechanism behind the reported reduction in syntax‑error rate of up to 17.62% relative to MAGE.
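To make the IR idea concrete, here is a minimal Python sketch of a JSON description of a two‑state FSM and a small emitter that turns it into Verilog text. The field names (`states`, `transitions`, `moore_outputs`, and so on), the helper functions, and the example FSM are all assumptions chosen for illustration; they are not the schema used by AutoFSM, nor the behavior of fsm2sv.

```python
import json

# Hypothetical IR for a two-state FSM. Every field name here is an
# illustrative assumption, not the actual AutoFSM schema.
IR_JSON = """
{
  "name": "toggle_fsm",
  "inputs": ["go"],
  "outputs": ["busy"],
  "states": ["IDLE", "RUN"],
  "initial": "IDLE",
  "transitions": [
    {"from": "IDLE", "to": "RUN", "cond": "go"},
    {"from": "RUN", "to": "IDLE", "cond": "!go"}
  ],
  "moore_outputs": {"IDLE": {"busy": 0}, "RUN": {"busy": 1}}
}
"""


def validate_ir(ir):
    """Return a list of structural problems in the IR (empty if none)."""
    problems = []
    states = set(ir["states"])
    if ir["initial"] not in states:
        problems.append(f"initial state {ir['initial']} is not declared")
    for t in ir["transitions"]:
        for key in ("from", "to"):
            if t[key] not in states:
                problems.append(f"transition uses undeclared state {t[key]}")
    return problems


def emit_verilog(ir):
    """Emit a two-process Verilog FSM; assumes single-bit Moore outputs."""
    states = ir["states"]
    width = max(1, (len(states) - 1).bit_length())  # bits to encode states
    ports = ["input clk", "input rst"]
    ports += [f"input {p}" for p in ir["inputs"]]
    ports += [f"output reg {p}" for p in ir["outputs"]]
    lines = [f"module {ir['name']} (" + ", ".join(ports) + ");"]
    for i, s in enumerate(states):
        lines.append(f"  localparam {s} = {width}'d{i};")
    lines.append(f"  reg [{width - 1}:0] state, next_state;")
    # Sequential process: state register with synchronous reset.
    lines.append("  always @(posedge clk) begin")
    lines.append(f"    if (rst) state <= {ir['initial']};")
    lines.append("    else state <= next_state;")
    lines.append("  end")
    # Combinational process: next-state and output logic with defaults.
    lines.append("  always @(*) begin")
    lines.append("    next_state = state;")
    for out in ir["outputs"]:
        lines.append(f"    {out} = 1'b0;")
    lines.append("    case (state)")
    for s in states:
        lines.append(f"      {s}: begin")
        for out, val in ir["moore_outputs"].get(s, {}).items():
            lines.append(f"        {out} = 1'b{val};")
        first = True
        for t in ir["transitions"]:
            if t["from"] == s:
                keyword = "if" if first else "else if"
                lines.append(f"        {keyword} ({t['cond']}) next_state = {t['to']};")
                first = False
        lines.append("      end")
    lines.append(f"      default: next_state = {ir['initial']};")
    lines.append("    endcase")
    lines.append("  end")
    lines.append("endmodule")
    return "\n".join(lines)


ir = json.loads(IR_JSON)
verilog = emit_verilog(ir)
print(validate_ir(ir))
print(verilog)
```

The point of the sketch is the division of labor: structural checks such as undeclared states run on the IR before any Verilog exists, and the emitter is deterministic, so syntax errors of the kind an LLM might introduce cannot arise in that step.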

The system is organized around six cooperating agents: (1) FSMExtractor builds the IR, (2) Verifier checks that the IR matches the original intent, (3) Coder generates Verilog from the IR, (4) Tester automatically creates a SystemC model of the FSM and a corresponding testbench, (5) Fixer amends the IR, the Verilog, or the test code when errors are detected, and (6) Judge analyzes simulation logs to pinpoint the root cause of functional failures and directs the Fixer. This division of labor creates a closed feedback loop: compilation errors go straight to the Fixer, while functional mismatches are first diagnosed by the Judge so that the repair targets the faulty artifact.
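The control flow of this loop can be sketched in Python with stub agents. Everything below is a hypothetical illustration: the function `run_pipeline`, the agent dictionary, the status strings, and the round limit are assumptions, and the stubs stand in for LLM calls and for real compilation and simulation.

```python
def run_pipeline(spec, agents, simulate, max_rounds=3):
    """Drive the closed loop; returns (passed, artifacts, history)."""
    history = []
    art = {"spec": spec, "ir": agents["extractor"](spec)}
    if not agents["verifier"](spec, art["ir"]):
        art = agents["fixer"]("ir", art, "verifier rejected the IR")
        history.append("fix:ir")
    art["verilog"] = agents["coder"](art["ir"])
    art["testbench"] = agents["tester"](art["ir"])
    for _ in range(max_rounds):
        status, log = simulate(art)
        history.append(status)
        if status == "pass":
            return True, art, history
        if status == "compile_error":
            target = "verilog"              # compile errors skip the Judge
        else:
            target = agents["judge"](log)   # Judge picks what to repair
        art = agents["fixer"](target, art, log)
        history.append(f"fix:{target}")
    return False, art, history


# --- Stubs standing in for LLM agents and the simulator (assumptions) ---
def simulate(art):
    if not art["verilog"].rstrip().endswith("endmodule"):
        return "compile_error", "syntax error near end of file"
    if "expect_wrong" in art["testbench"]:
        return "mismatch", "testbench expectation disagrees with spec"
    return "pass", ""


def fixer(target, art, log):
    fixed = dict(art)  # leave the caller's artifacts untouched
    if target == "verilog":
        fixed["verilog"] = "module m; endmodule"
    elif target == "testbench":
        fixed["testbench"] = "expect_ok"
    return fixed


agents = {
    "extractor": lambda spec: {"states": ["A", "B"], "initial": "A"},
    "verifier": lambda spec, ir: ir.get("initial") in ir.get("states", []),
    "coder": lambda ir: "module m; endmodul",   # deliberately broken
    "tester": lambda ir: "expect_wrong",        # deliberately wrong
    "judge": lambda log: "testbench" if "testbench" in log else "verilog",
    "fixer": fixer,
}

passed, final, history = run_pipeline("two-state toggle", agents, simulate)
print(passed, history)
```

In this run the stub Coder emits a truncated module and the stub Tester emits a wrong expectation, so the history shows one compile‑error repair, one Judge‑directed testbench repair, and then a pass.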

To evaluate the approach, the authors construct SKT‑FSM, a new hierarchical benchmark of 67 FSM samples selected from 100 open‑source FSMs, which were filtered and de‑duplicated with LLM assistance. Each sample is annotated with a design description, a reference Verilog model, and an automatically generated test program. Complexity is quantified by normalized line count, state count, and transition count, yielding three difficulty tiers (Easy, Medium, Hard).
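One plausible way to turn the three metrics into tiers is sketched below. The summary does not give the actual formula, so the min‑max normalization, the equal weighting, the tier thresholds, and the three sample records are all assumptions made for illustration; none of the numbers come from SKT‑FSM.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def assign_tiers(samples, easy_max=1 / 3, medium_max=2 / 3):
    """Average the normalized metrics and bucket the score into tiers."""
    metrics = ("lines", "states", "transitions")
    columns = [min_max([s[m] for s in samples]) for m in metrics]
    tiers = {}
    for i, sample in enumerate(samples):
        score = sum(col[i] for col in columns) / len(metrics)
        if score <= easy_max:
            tiers[sample["name"]] = "Easy"
        elif score <= medium_max:
            tiers[sample["name"]] = "Medium"
        else:
            tiers[sample["name"]] = "Hard"
    return tiers


# Made-up records for illustration only.
samples = [
    {"name": "fsm_a", "lines": 30, "states": 2, "transitions": 2},
    {"name": "fsm_b", "lines": 90, "states": 5, "transitions": 8},
    {"name": "fsm_c", "lines": 150, "states": 8, "transitions": 14},
]

tiers = assign_tiers(samples)
print(tiers)
```

Normalizing each metric before combining them keeps line count, which has a much larger numeric range than state count, from dominating the score.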

Experiments compare AutoFSM against the open‑source multi‑agent framework MAGE, using the same underlying LLM for both. Across the full benchmark, AutoFSM improves the pass rate by up to 11.94 percentage points and cuts the syntax‑error rate by up to 17.62 percentage points. Gains are especially pronounced on Hard‑level FSMs, demonstrating that the IR‑centric pipeline scales to more intricate control logic.
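For a sense of scale, the short calculation below shows what a pass‑rate difference means on a 67‑sample benchmark. It assumes each sample is counted once with a single pass/fail outcome, which the summary does not state; if results are averaged over several runs, the mapping to whole samples does not apply.

```python
def pass_rate(passed, total):
    """Pass rate in percent."""
    return 100.0 * passed / total


TOTAL = 67  # number of samples in SKT-FSM

# Under the one-outcome-per-sample assumption, each additional passing
# sample moves the pass rate by about 1.49 points.
per_sample = round(pass_rate(1, TOTAL), 2)

# Eight additional passing samples give the 11.94-point figure.
eight_samples = round(pass_rate(8, TOTAL), 2)

print(per_sample, eight_samples)
```

Under that assumption, the largest reported pass‑rate gain corresponds to eight more FSMs passing out of 67.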

The paper also discusses limitations. The current IR is tailored to FSMs; extending the approach to other RTL constructs (pipelines, memory controllers, etc.) will require new schema definitions and possibly additional conversion tools. SystemC‑based simulation, while providing high‑fidelity functional checking, incurs non‑trivial computational cost, suggesting future work on simulation scheduling or parallel execution for large designs.

In summary, AutoFSM successfully combines LLM semantic capabilities with a structured IR and automated SystemC testing, delivering a more reliable and scalable workflow for RTL code generation. The framework’s modular agent design, error‑reduction strategy, and benchmark contributions constitute a significant step toward fully autonomous hardware design and verification pipelines.

