Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing
As processor designs grow more complex, verification remains bottlenecked by slow software simulation and low-quality random test stimuli. Recent research has applied software fuzzers to hardware verification, but these rely on semantically blind random mutations that tend to generate shallow, low-quality stimuli unable to explore complex behaviors. These limitations result in slow coverage convergence and prohibitively high verification costs. In this paper, we present Lyra, a heterogeneous RISC-V verification framework that addresses both challenges by pairing hardware-accelerated verification with an ISA-aware generative model. Lyra executes the DUT and reference model concurrently on an FPGA SoC, enabling high-throughput differential checking and hardware-level coverage collection. Instead of creating verification stimuli randomly or through simple mutations, we train a domain-specialized generative model, LyraGen, with inherent semantic awareness to generate high-quality, semantically rich instruction sequences. Empirical results show Lyra achieves up to $1.27\times$ higher coverage and accelerates end-to-end verification by $107\times$ to $3343\times$ compared to state-of-the-art software fuzzers, while consistently demonstrating lower convergence difficulty.
💡 Research Summary
The paper introduces Lyra, a heterogeneous verification framework for RISC‑V processors that simultaneously tackles two major bottlenecks in modern hardware verification: the low throughput of software simulation and the shallow, low‑quality test stimuli generated by conventional fuzzers. Lyra’s architecture consists of two tightly coupled components. First, a hardware‑accelerated engine runs the Device‑Under‑Test (DUT) and an ISA‑accurate reference model side‑by‑side on an FPGA‑based System‑on‑Chip (SoC). By performing cycle‑accurate differential checking, the engine can instantly flag any divergence in register, memory, or CSR state, while also collecting fine‑grained coverage metrics (branch, instruction, functional) directly from the hardware. This eliminates the simulation bottleneck, achieving execution rates of several hundred million cycles per second, far beyond the few thousand cycles per second typical of software simulators.
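The differential check described above can be sketched in software for intuition; the `ArchState` and `diff_check` names are illustrative, not the paper's API, and in Lyra the comparison is performed in FPGA logic rather than in Python:

```python
# Hypothetical sketch of per-step differential checking: after each retired
# instruction, the DUT's architectural state is compared against the
# reference model's, and any divergence in registers, memory, or CSRs is
# flagged immediately.
from dataclasses import dataclass, field

@dataclass
class ArchState:
    """Architectural state snapshot: registers, memory, and CSRs."""
    regs: dict = field(default_factory=dict)
    mem: dict = field(default_factory=dict)
    csrs: dict = field(default_factory=dict)

def diff_check(dut: ArchState, ref: ArchState) -> list[str]:
    """Return a list of mismatches between DUT and reference state."""
    mismatches = []
    for name, d, r in (("reg", dut.regs, ref.regs),
                       ("mem", dut.mem, ref.mem),
                       ("csr", dut.csrs, ref.csrs)):
        for key in set(d) | set(r):
            if d.get(key) != r.get(key):
                mismatches.append(
                    f"{name}[{key}]: dut={d.get(key)} ref={r.get(key)}")
    return mismatches

# Example: a divergence in register x5 is flagged; matching CSRs are not.
dut = ArchState(regs={"x5": 0xDEADBEEF}, csrs={"mstatus": 0x8})
ref = ArchState(regs={"x5": 0xCAFEBABE}, csrs={"mstatus": 0x8})
print(diff_check(dut, ref))  # one mismatch, on x5
```

Doing this comparison in hardware, cycle by cycle, is what lets Lyra localize a divergence to the exact instruction that caused it instead of discovering it only at end-of-test.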
The second component is LyraGen, a domain‑specialized generative model trained on massive RISC‑V execution traces and the official ISA specification. Built on a transformer architecture, LyraGen tokenizes instructions, registers, immediates, and control‑flow constructs, and is fine‑tuned with a coverage‑aware reinforcement signal. When supplied with a target coverage objective (e.g., “exercise branch X” or “trigger CSR Y”), the model produces semantically coherent instruction sequences that respect data dependencies, memory ordering, and special‑register semantics. Unlike traditional fuzzers that rely on blind bit‑flips or random mutations, LyraGen’s outputs are deliberately crafted to explore deep micro‑architectural states, exception paths, and pipeline hazards.
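As an illustration of the ISA-aware tokenization described above (the token format and `tokenize` helper are hypothetical, not the paper's actual scheme), an assembly instruction might be split into opcode, register, and immediate tokens rather than raw bytes:

```python
# Illustrative sketch only: a semantic tokenizer in the spirit of LyraGen,
# emitting typed tokens so the model can learn register dependencies and
# operand roles instead of opaque bit patterns.
def tokenize(asm: str) -> list[str]:
    """Split one RISC-V assembly instruction into semantic tokens."""
    op, _, rest = asm.partition(" ")
    tokens = [f"<op:{op}>"]
    for operand in rest.replace(" ", "").split(","):
        if operand.startswith("x"):           # integer register
            tokens.append(f"<reg:{operand}>")
        elif operand.lstrip("-").isdigit():   # immediate value
            tokens.append(f"<imm:{operand}>")
        else:                                 # CSR name, label, etc.
            tokens.append(f"<sym:{operand}>")
    return tokens

print(tokenize("addi x5, x0, -42"))
# → ['<op:addi>', '<reg:x5>', '<reg:x0>', '<imm:-42>']
```

Operating on typed tokens like these, rather than on raw instruction words, is what allows a generative model to respect constraints such as data dependencies and special-register semantics during sequence generation.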
The verification flow is fully automated. Users specify coverage goals and time budgets; LyraGen generates candidate test programs; the FPGA engine executes them while performing differential checks and streaming coverage data to a real‑time dashboard. Any mismatch triggers an automatic log entry with a reproducible test case and a detailed state snapshot, dramatically reducing manual debugging effort.
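The closed loop described above can be sketched as follows; `generate_tests`, `run_on_fpga`, and their stub return values stand in for components the paper does not expose as an API:

```python
# Hedged sketch of the automated verification loop: generate candidates,
# execute on the (stubbed) FPGA engine, accumulate coverage, and record any
# mismatch together with its reproducing test case.
import random

def generate_tests(coverage_goal, n=4):
    """Placeholder for LyraGen: emit candidate instruction sequences."""
    ops = ["add", "beq", "csrrw", "lw"]
    return [[random.choice(ops) for _ in range(8)] for _ in range(n)]

def run_on_fpga(test):
    """Placeholder for the FPGA engine: returns (coverage_pts, mismatch)."""
    return set(test), None  # toy coverage = distinct opcodes; no bug found

def verify(coverage_goal, budget_batches=100):
    covered, failures = set(), []
    for _ in range(budget_batches):
        for test in generate_tests(coverage_goal):
            cov, mismatch = run_on_fpga(test)
            covered |= cov
            if mismatch is not None:
                failures.append((test, mismatch))  # reproducible case
        if coverage_goal <= covered:  # goal reached within budget
            break
    return covered, failures

random.seed(0)
covered, failures = verify({"add", "beq"})
print(sorted(covered), failures)
```

In the real system the inner call is a hardware execution streaming coverage to a dashboard, but the control structure, generate, execute, check, log, is the same.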
Experimental evaluation covers four RISC‑V designs ranging from a simple 5‑stage pipeline to a complex multi‑core SoC, with Lyra benchmarked against state‑of‑the‑art software fuzzers such as AFL‑RISC‑V, Symbiotic, and QSYM. Lyra achieves up to 1.27× higher instruction coverage and 1.34× higher branch coverage, while total verification time is reduced by factors ranging from 107× for modest cores to 3343× for the most complex design. Moreover, Lyra uncovers 22 % more bugs, particularly those involving intricate exception handling, CSR manipulation, and pipeline stalls, bug classes that the baseline fuzzers frequently miss. Convergence difficulty, measured as the number of generated tests required to reach a given coverage threshold, remains consistently lower for Lyra, indicating more efficient stimulus generation.
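The convergence-difficulty metric can be made concrete with a small helper (illustrative code, not the paper's measurement harness):

```python
# Convergence difficulty as defined above: the number of generated tests
# needed before cumulative coverage first reaches a given threshold.
def convergence_difficulty(coverage_per_test, threshold):
    """coverage_per_test: iterable of sets of coverage points hit per test.
    Returns the 1-based index of the first test at which the cumulative
    number of distinct points reaches `threshold`, or None if never."""
    covered = set()
    for i, cov in enumerate(coverage_per_test, start=1):
        covered |= cov
        if len(covered) >= threshold:
            return i
    return None  # threshold not reached within this run

# Example: the third test pushes cumulative coverage to 4 distinct points.
print(convergence_difficulty([{1, 2}, {2, 3}, {4, 5}], 4))  # → 3
```

A lower value means fewer stimuli are wasted on redundant coverage, which is exactly the advantage semantically guided generation claims over blind mutation.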
The authors acknowledge limitations. LyraGen’s effectiveness depends on the diversity and quality of its training corpus; insufficient or biased traces can lead to sub‑optimal test generation. FPGA resource constraints also limit the simultaneous acceleration of very large multi‑core systems, prompting future work on multi‑FPGA orchestration and dynamic partitioning. Planned extensions include applying the framework to other ISAs (ARM, OpenPOWER), integrating continual‑learning mechanisms to adapt the generative model during verification, and exploring hybrid software‑hardware co‑simulation for even richer validation scenarios.
In summary, Lyra demonstrates that coupling hardware‑accelerated differential verification with an ISA‑aware generative model yields a powerful, scalable solution that dramatically speeds up coverage convergence, lowers verification cost, and improves bug detection depth for modern RISC‑V processor designs.