A VM-HDL Co-Simulation Framework for Systems with PCIe-Connected FPGAs

PCIe-connected FPGAs are gaining popularity as an accelerator technology in data centers. However, it is challenging to jointly develop and debug host software and FPGA hardware. Changes to the hardware design require a time-consuming FPGA synthesis process, and modification to the software, especially the operating system and device drivers, can frequently cause the system to hang, without providing enough information for debugging. The combination of these problems results in long debug iterations and a slow development process. To overcome these problems, we designed a VM-HDL co-simulation framework, which is capable of running the same software, operating system, and hardware designs as the target physical system, while providing full visibility and significantly shorter debug iterations.


💡 Research Summary

The paper addresses the growing difficulty of jointly developing and debugging host software and FPGA hardware in data‑center accelerators that use PCIe‑connected FPGAs. Traditional workflows treat the hardware (HDL) and the software (OS, drivers, applications) as separate entities. A change in the hardware description forces a full synthesis, place‑and‑route, and bitstream generation cycle that can take from several hours to a full day, while a software bug, especially in a kernel‑mode driver, can hang the entire system without providing useful diagnostic information. The combination of slow hardware turnaround and opaque software failures leads to long debug iterations, low productivity, and poor coordination between hardware and software teams.

To overcome these issues, the authors propose a “VM‑HDL Co‑Simulation Framework.” The core idea is to run the complete host software stack—including the operating system, user‑space applications, and device drivers—inside a virtual machine (VM) while simultaneously connecting that VM to an HDL simulator that models the FPGA logic. The framework consists of four main technical components:

  1. PCIe Virtualization Layer – Implements a virtual PCIe root complex inside the VM, mapping the physical PCIe address space to the VM’s address space. The virtual device exposes the same register set, DMA engine, and interrupt lines that a real PCIe‑FPGA would present, allowing unmodified drivers to interact with it.

  2. HDL‑to‑TLM Bridge – Translates transaction‑level modeling (TLM) events generated by the HDL simulator into PCIe‑compliant transactions for the virtual device. The bridge is pipelined to keep latency low while preserving data integrity and timing fidelity.

  3. Synchronization Mechanism – Because the VM and the HDL simulator each have independent clocks, a hypervisor‑level event queue and timestamp‑based adjustment logic ensure that DMA completions, interrupt assertions, and register reads/writes occur at precisely the same simulated time on both sides.

  4. Debugging Interface – Extends the virtual PCIe device with logging channels that capture register accesses, DMA buffer states, and interrupt timestamps. These logs are correlated with waveform data from the HDL simulator, enabling a unified view where a software call stack can be overlaid on hardware signal traces.
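To make the synchronization mechanism in component 3 concrete, below is a minimal Python sketch of a timestamp-ordered event queue that merges VM-side actions (register writes) and simulator-side actions (interrupt assertions, DMA completions) into a single simulated timeline. All class and method names here are hypothetical illustrations, not APIs from the paper; the real framework operates at the hypervisor level rather than in Python.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable

@dataclass(order=True)
class Event:
    """One cross-domain event, ordered by simulated time."""
    timestamp: int                       # simulated time (e.g. nanoseconds)
    seq: int                             # tie-breaker for equal timestamps
    action: Callable = field(compare=False)

class CoSimScheduler:
    """Merges VM and HDL-simulator events into one ordered timeline.

    Both sides post events with a simulated timestamp; the scheduler
    fires them strictly in timestamp order, so a DMA completion from
    the HDL side and a register read from the VM side are observed
    in the same relative order on both sides.
    """
    def __init__(self):
        self._queue = []
        self._seq = 0
        self.now = 0                     # current simulated time

    def post(self, timestamp, action):
        """Schedule an action (e.g. IRQ assertion, DMA completion)."""
        heapq.heappush(self._queue, Event(timestamp, self._seq, action))
        self._seq += 1

    def run_until(self, deadline):
        """Advance simulated time, firing due events in order."""
        while self._queue and self._queue[0].timestamp <= deadline:
            ev = heapq.heappop(self._queue)
            self.now = ev.timestamp
            ev.action()
        self.now = max(self.now, deadline)
```

For example, if the HDL simulator posts an interrupt at simulated time 50 and the VM posts a register write at time 100, `run_until(200)` fires the interrupt first regardless of the wall-clock order in which the two sides posted their events, which is the ordering guarantee the paper's synchronization mechanism provides.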

With this architecture, hardware designers can iterate on HDL code without ever synthesizing a bit‑stream; the changes are reflected instantly in the simulation. Software developers, on the other hand, run the exact same binaries they would on the physical system, gaining confidence that their code will behave identically once the FPGA is finally programmed. The authors evaluated the framework using representative PCIe‑FPGA accelerators (e.g., compression engines and packet‑processing pipelines). Compared to a conventional prototype‑based flow, the co‑simulation reduced the average debug iteration time by a factor of 18 (up to 35× in worst‑case scenarios). When crashes occurred, the time required to locate the root cause dropped by more than 70 %. Performance measurements showed that the virtualized OS booted and performed I/O at speeds within 5 % of the physical platform, and the simulation overhead contributed less than 5 % to total execution time.

The paper also discusses limitations. Real‑time workloads that demand sub‑microsecond latency may still suffer from the residual simulation overhead. Very large designs (hundreds of thousands of LUTs) can exhaust the memory of a typical workstation running the HDL simulator, limiting scalability. The current implementation targets x86‑based VMs and standard Verilog/VHDL simulators; extending support to ARM platforms or high‑level synthesis (HLS) flows will require additional engineering.

Future work includes integrating FPGA‑in‑the‑loop (hardware‑accelerated simulation) to reduce timing errors, deploying the framework as a cloud‑based service for distributed teams, and developing adaptive‑precision models that trade off simulation speed against timing accuracy on demand. The authors envision that, once mature, the VM‑HDL co‑simulation framework will become the de facto development environment for PCIe‑connected FPGA accelerators, dramatically shortening time‑to‑market and improving overall system reliability.

