Designing a CPU model: from a pseudo-formal document to fast code

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original paper on arXiv.

To validate low-level embedded software, engineers use simulators that take the real binary as input. Like the real hardware, these full-system simulators are organized as a set of components. The main component is the CPU simulator, or Instruction Set Simulator (ISS), because it is usually the bottleneck for simulation speed, and its development is a long and repetitive task. Previous work showed that an ISS can be generated from an Architecture Description Language (ADL). In the work reported in this paper, we generate a CPU simulator directly from the pseudo-formal descriptions of the reference manual. For each instruction, we extract the information describing its behavior, its binary encoding, and its assembly syntax. Next, after automatically applying many optimizations to the extracted information, we generate a SystemC/TLM ISS. We also generate tests for the decoder and a formal specification in Coq. Experiments show that the generated ISS is as fast and stable as our previous hand-written ISS.


💡 Research Summary

The paper presents a fully automated methodology for generating a high‑performance Instruction Set Simulator (ISS) directly from the pseudo‑formal sections of the ARMv6 reference manual. The authors first convert the PDF manual to plain text, apply a small patch to fix typographical errors, and then use three dedicated OCaml parsers to extract (1) the binary encoding tables, (2) the pseudo‑code describing instruction semantics, and (3) the assembly syntax. These extracts are represented as abstract syntax trees (ASTs).
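The encoding tables give each instruction a fixed bit pattern interleaved with named fields. The Python sketch below is a rough illustration only and is not the authors' OCaml parser. It takes a hypothetical one-line rendering of a table row, written most-significant bit first, and turns it into two things: a mask/value pair for the fixed bits, and the bit range of each named field. The layout syntax (`name:width` for fields, literal bit strings otherwise) and the names `parse_encoding` and `get_field` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Encoding:
    name: str
    mask: int    # 1 for every bit fixed by the encoding
    value: int   # required value of those fixed bits
    fields: dict  # field name -> (high bit, low bit)


def parse_encoding(name: str, layout: str) -> Encoding:
    """Parse a hypothetical layout string, most significant bit first.

    Tokens are either literal bits ("0", "1", "0100") or named fields
    written "name:width". The widths must add up to 32.
    """
    mask = value = 0
    fields = {}
    pos = 32  # one past the highest bit not yet assigned
    for tok in layout.split():
        if ":" in tok:
            field_name, width_text = tok.split(":")
            width = int(width_text)
            fields[field_name] = (pos - 1, pos - width)
        else:
            width = len(tok)
            mask |= ((1 << width) - 1) << (pos - width)
            value |= int(tok, 2) << (pos - width)
        pos -= width
    if pos != 0:
        raise ValueError(f"{name}: layout covers {32 - pos} bits, expected 32")
    return Encoding(name, mask, value, fields)


def get_field(enc: Encoding, word: int, field_name: str) -> int:
    """Extract a named field from a 32-bit instruction word."""
    hi, lo = enc.fields[field_name]
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)
```

For ADD, written as `"cond:4 00 I:1 0100 S:1 Rn:4 Rd:4 shifter_operand:12"`, the fixed bits 27..26 (`00`) and 24..21 (`0100`) give mask `0x0DE00000` and value `0x00800000`. The word `0xE0810002` (ADD r0, r1, r2) satisfies `word & mask == value` and has Rn = 1.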

A series of transformations is applied to the ASTs. Correctness-preserving passes include two rewrites. The first replaces helper calls whose argument is a symbolic expression with dedicated functions that take the operands explicitly (e.g., OverflowFrom(Rn+op+C) becomes OverflowFromAdd3(Rn,op,C)). The second inlines the addressing-mode code into the corresponding instruction. Write-back is also reordered so that register updates occur only after any memory access that may raise an exception, which preserves the architectural behavior.
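In the manual, a helper such as OverflowFrom receives the whole expression Rn + op + C. A function that received only the truncated 32-bit value of that sum could not tell whether the signed addition overflowed, so the helper needs the operands themselves. The Python sketch below shows one way to perform this rewrite on a toy AST. The node classes, the generic `{Flag}AddN` naming (only OverflowFromAdd3 appears in the paper's example), and the runtime helper `overflow_from_add3` are illustrative assumptions, not the authors' OCaml code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


Expr = Union[Var, Add, Call]

FLAG_HELPERS = ("OverflowFrom", "CarryFrom")


def flatten_sum(e: Expr) -> list:
    """Operands of a (possibly nested) chain of additions, left to right."""
    if isinstance(e, Add):
        return flatten_sum(e.left) + flatten_sum(e.right)
    return [e]


def explicit_flag_calls(e: Expr) -> Expr:
    """Rewrite Flag(a + b + ...) into FlagAddN(a, b, ...), bottom-up."""
    if isinstance(e, Add):
        return Add(explicit_flag_calls(e.left), explicit_flag_calls(e.right))
    if isinstance(e, Call):
        args = tuple(explicit_flag_calls(a) for a in e.args)
        if e.func in FLAG_HELPERS and len(args) == 1:
            ops = flatten_sum(args[0])
            if len(ops) > 1:
                return Call(f"{e.func}Add{len(ops)}", tuple(ops))
        return Call(e.func, args)
    return e


def overflow_from_add3(a: int, b: int, c: int) -> bool:
    """Signed overflow of a + b + c, for 32-bit a and b and a carry bit c."""
    def signed(x: int) -> int:
        return x - (1 << 32) if x & 0x80000000 else x

    total = signed(a) + signed(b) + c
    return not (-(1 << 31) <= total < (1 << 31))
```

Because the rewrite works bottom-up on the AST, it also applies to flag helpers nested inside larger expressions. Calls whose argument is not a sum are left unchanged.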

Performance‑oriented optimizations consist of pre‑computing static sub‑expressions (e.g., Hamming‑weight of a register list), instruction flattening (merging addressing modes with their host instructions), and specialization (generating multiple variants of an instruction with fixed condition flags or S‑bit values). Specialization is guided by profiling data that indicate which variants are worth materialising, thereby controlling code‑size explosion.
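Both kinds of optimization move work from execution time to decode time or generation time. The Python sketch below illustrates them under assumed names.

1. **Pre-computation.** The number of registers in an LDM register list is computed once, at decode time.
2. **Specialization as code generation.** The generator emits C source for ADD variants in which the S bit and an always-true condition (AL) are fixed, so the emitted body no longer tests them. It emits only the variants that a profile marks as hot.

`LdmParams`, `CPU`, `AddParams`, `cond_holds` and `set_nzcv_add` are hypothetical names. The paper's actual generator and its choice of specialized fields may differ.

```python
from dataclasses import dataclass
from itertools import product


# --- Pre-computation of a static sub-expression ---------------------------
@dataclass
class LdmParams:
    """Hypothetical decoded parameters of LDM (names are illustrative)."""
    rn: int
    register_list: int
    nb_regs: int  # number of set bits in register_list, computed once


def decode_ldm(word: int) -> LdmParams:
    reg_list = word & 0xFFFF
    # Number_Of_Set_Bits_In(register_list) depends only on the instruction
    # word, so it is evaluated when decoding rather than at every execution.
    return LdmParams(rn=(word >> 16) & 0xF, register_list=reg_list,
                     nb_regs=bin(reg_list).count("1"))


# --- Specialization: emit C variants with S and "cond == AL" fixed --------
def gen_add_variant(s_bit: int, cond_always: bool) -> str:
    """Emit C source for ADD with the S bit and the condition test resolved."""
    name = f"add_s{s_bit}_{'al' if cond_always else 'cond'}"
    lines = [f"static void {name}(CPU *cpu, const AddParams *p) {{"]
    if not cond_always:
        lines.append("  if (!cond_holds(cpu, p->cond)) return;")
    # Read the operands before writing Rd, since Rd may equal Rn.
    lines.append("  uint32_t a = cpu->regs[p->rn], b = p->op, r = a + b;")
    lines.append("  cpu->regs[p->rd] = r;")
    if s_bit:
        lines.append("  set_nzcv_add(cpu, a, b, r);")
    lines.append("}")
    return "\n".join(lines)


def gen_hot_variants(hot: set) -> dict:
    """Emit only the (s_bit, cond_always) variants that profiling marked as hot;
    all other cases would keep using a generic, unspecialized ADD."""
    return {key: gen_add_variant(*key)
            for key in product((0, 1), (False, True)) if key in hot}
```

Restricting generation to the hot set is what keeps code size under control. With two binary choices there are at most four ADD variants, but the number of variants multiplies with every field that is specialized.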

The optimized intermediate representation is then fed to a C/C++ code generator that produces a SystemC/TLM‑compatible ISS. The generated code includes: (a) a union‑based instruction descriptor with automatically derived parameter structs (80 distinct layouts), (b) two decoders (ARM and Thumb) that first select candidate instructions via a switch statement and then filter them using validity constraints, (c) semantics functions directly translated from the pseudo‑code, (d) a “may‑branch” function for basic‑block analysis, and (e) assembly printers for debugging. In addition, the toolchain automatically emits Coq specifications and decoder unit tests, enabling formal verification of the generated model.
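The two-stage decoding scheme can be mimicked in a few lines. The Python sketch below is not the generated C code. It uses a tiny hand-written subset of the ARM encoding table and works in two stages:

- **Stage 1** selects candidates from bits 27..25. A dictionary stands in for the C `switch`.
- **Stage 2** keeps the first candidate whose fixed bits match and whose validity constraint holds.

It also sketches a conservative may-branch test and a random decoder-test generator in the spirit of the generated unit tests. The table contents, the treatment of cond = 0b1111 as undefined, and all function names are simplifying assumptions.

```python
import random
from collections import namedtuple

# Hypothetical subset of the ARM table: (name, mask, value, validity constraint).
Enc = namedtuple("Enc", "name mask value valid")


def bit(w: int, n: int) -> int:
    return (w >> n) & 1


ENCODINGS = [
    # Data-processing ADD; with I == 0, bits 7 and 4 both set select another space.
    Enc("ADD", 0x0DE00000, 0x00800000,
        lambda w: not (bit(w, 25) == 0 and bit(w, 7) and bit(w, 4))),
    Enc("MUL", 0x0FE000F0, 0x00000090, lambda w: True),
    Enc("LDM", 0x0E100000, 0x08100000, lambda w: True),
    Enc("B/BL", 0x0E000000, 0x0A000000, lambda w: True),
]

OP_BITS = 0x0E000000  # bits 27..25, the selector of the first stage


def build_buckets(encodings):
    """Stage 1 table: for each value of bits 27..25, the encodings it may select."""
    buckets = {key: [] for key in range(8)}
    for enc in encodings:
        for key in range(8):
            if (((key << 25) ^ enc.value) & enc.mask & OP_BITS) == 0:
                buckets[key].append(enc)
    return buckets


BUCKETS = build_buckets(ENCODINGS)


def decode(word: int):
    """Return the name of the instruction encoded by word, or None if undefined."""
    if (word >> 28) == 0xF:  # unconditional space: not handled in this sketch
        return None
    for enc in BUCKETS[(word >> 25) & 0x7]:                      # stage 1
        if (word & enc.mask) == enc.value and enc.valid(word):  # stage 2
            return enc.name
    return None


def may_branch(word: int) -> bool:
    """Conservative test used to end a basic block (subset only)."""
    name = decode(word)
    if name == "B/BL":
        return True
    if name == "ADD":
        return ((word >> 12) & 0xF) == 15  # Rd == PC
    if name == "LDM":
        return bit(word, 15) == 1  # PC in register_list
    return False


def gen_decoder_tests(encodings, per_enc=100, seed=0):
    """Random decoder tests: fixed bits from the table, free bits random."""
    rng = random.Random(seed)
    tests = []
    for enc in encodings:
        found = 0
        while found < per_enc:
            word = enc.value | (rng.getrandbits(32) & ~enc.mask & 0xFFFFFFFF)
            if (word >> 28) != 0xF and enc.valid(word):
                tests.append((word, enc.name))
                found += 1
    return tests
```

For instance, `decode(0xE0000291)` (MUL r0, r1, r2) reaches bucket 0, is rejected by the ADD pattern, and is accepted as MUL. `may_branch(0xE8BD8010)` (LDM sp!, {r4, pc}) is true because the PC is in the register list.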

Experimental results show that the automatically generated ISS matches the speed of a hand-written counterpart, achieving over 90 MIPS on several benchmarks, with the same stability and functional coverage. Prior work relies on Architecture Description Languages (ADLs) or on other simulation techniques, such as those of QEMU and FACILE. In contrast, this approach leverages the existing pseudo-formal documentation, requires no manual coding of the instruction set, and integrates dynamic translation to handle dynamically loaded code.

The authors conclude that their pipeline dramatically reduces development effort and the opportunities for human error. They also conclude that it can be extended to other architectures such as PowerPC, MIPS, and SH-2, whose manuals share a similar structure. Future work includes adding cycle-accurate timing models and deeper static analyses to further improve the fidelity of the generated simulators.

