Reasoning About LLVM Code Using Codewalker

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, refer to the original arXiv paper.

This paper reports on initial experiments using J Moore’s Codewalker to reason about programs compiled to the Low-Level Virtual Machine (LLVM) intermediate form. Previously, we reported on a translator from LLVM to the applicative subset of Common Lisp accepted by the ACL2 theorem prover, producing executable ACL2 formal models, and allowing us both to prove theorems about the translated models and to validate those models by testing. That translator provided many of the benefits of a pure decompilation into logic approach, but had the disadvantage of not being verified. The availability of Codewalker as of ACL2 7.0 has provided an opportunity to revisit this idea, and employ a more trustworthy decompilation into logic tool. Thus, we have employed the Codewalker method to create an interpreter for a subset of the LLVM instruction set, and have used Codewalker to analyze some simple array-based C programs compiled to LLVM form. We discuss advantages and limitations of the Codewalker-based method compared to the previous method, and provide some challenge problems for future Codewalker development.

💡 Research Summary

This paper presents an initial exploration of using J Moore’s Codewalker, introduced in ACL2 7.0, to reason about programs compiled to the Low‑Level Virtual Machine (LLVM) intermediate representation. The authors previously built an OCaml‑based translator that converted LLVM IR into the applicative subset of Common Lisp accepted by the ACL2 theorem prover. That approach produced executable ACL2 models and allowed both theorem proving and testing, but the translator itself was not verified, leaving a gap in the overall trustworthiness of the verification pipeline.

With Codewalker, which provides a machine‑independent decompilation‑to‑logic framework, the authors set out to construct a more trustworthy pipeline. They first selected a simple C function that counts the occurrences of a given value in the first n elements of an array. Compiling this function with clang -O1 yields LLVM IR in SSA form, featuring phi nodes, conditional branches, integer comparisons, loads, and arithmetic operations. The authors manually translated this LLVM code into a Lisp‑like ACL2 representation in which each instruction becomes a list such as (ADD x y z).
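Phi nodes are what set this SSA code apart from ordinary assembly: a phi node's value depends on which predecessor block control arrived from. As a minimal sketch, assuming a typical clang -O1 shape for the counting loop (the block names, register names, and exact instruction sequence in the comments are hypothetical and not taken from the paper), the control-flow graph can be evaluated in Python by tracking the predecessor block explicitly:

```python
def count_ssa(arr, n, key):
    """Evaluate a hypothetical SSA/CFG rendering of count(arr, n, key).

    Blocks: 'entry' -> 'loop' -> ('loop' | 'exit'). The phi nodes at the top
    of 'loop' and 'exit' choose a value according to the predecessor block.
    """
    pred, block = None, "entry"
    i_next = cnt_next = None  # defined in 'loop', read by the phi nodes
    while True:
        if block == "entry":
            # %nonempty = icmp sgt %n, 0 ; br %nonempty, %loop, %exit
            pred, block = "entry", ("loop" if n > 0 else "exit")
        elif block == "loop":
            # %i   = phi [0, %entry], [%i.next,   %loop]
            # %cnt = phi [0, %entry], [%cnt.next, %loop]
            # Both phi inputs are read together, i.e. as a parallel assignment.
            i, cnt = (0, 0) if pred == "entry" else (i_next, cnt_next)
            hit = 1 if arr[i] == key else 0   # load, icmp eq, zext
            cnt_next = cnt + hit              # %cnt.next = add %cnt, %hit
            i_next = i + 1                    # %i.next = add %i, 1
            # %more = icmp slt %i.next, %n ; br %more, %loop, %exit
            pred, block = "loop", ("loop" if i_next < n else "exit")
        else:
            # %res = phi [0, %entry], [%cnt.next, %loop] ; ret %res
            return 0 if pred == "entry" else cnt_next


print(count_ssa([5, 3, 5, 7, 5, 1, 2, 9], 8, 5))  # 3
```

Because all phi nodes at the head of a block take effect simultaneously, any interpreter for this code needs some way to perform them as a parallel assignment rather than one register at a time.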

The core of the new approach is the LL2 interpreter, an interpreter for a small subset of the LLVM instruction set written directly in ACL2. LL2 models the machine state as a single‑threaded object (stobj) containing a program counter, a vector of locals (the SSA registers), a memory list, a stack, and the program itself. For each supported opcode (ADD, BR, CONST, EQ, etc.) there is a corresponding ACL2 function (execute‑ADD, execute‑BR, …). Each such function reads the current state, performs the operation, writes the result to the appropriate register, and advances the PC (or, for a branch, sets it to the branch target). To handle phi nodes, the authors introduced auxiliary instructions (CONST, PUSH, POPTO) that emulate the semantics of phi nodes.
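The interpreter style described above can be sketched in Python. This is a hedged illustration, not the paper's ACL2 code: the opcode names CONST, ADD, EQ, BR, PUSH and POPTO follow the summary, but the operand layout (destination first), the extra LT, LOAD, JMP and HALT instructions, the word-addressed memory, and all register names are assumptions, and the separate execute-OP functions are collapsed into one dispatch. The loop's phi nodes are realized on the back edge by pushing every new value before popping any of them into the phi registers, which behaves like a parallel assignment.

```python
from dataclasses import dataclass, field

# Hypothetical LL2-style program for count(base, n, key); operands are
# destination-first, and BR/JMP targets are instruction indices.
PROG = [
    ("CONST", "i", 0),             # 0  loop-entry values for the phi registers
    ("CONST", "cnt", 0),           # 1
    ("CONST", "one", 1),           # 2
    ("LT", "c", "i", "n"),         # 3  loop head: c = (i < n)
    ("BR", "c", 5, 15),            # 4  continue the loop or exit
    ("ADD", "addr", "base", "i"),  # 5  word-addressed element address (assumption)
    ("LOAD", "v", "addr"),         # 6
    ("EQ", "e", "v", "key"),       # 7  e = 1 if v == key else 0
    ("ADD", "cnt1", "cnt", "e"),   # 8
    ("ADD", "i1", "i", "one"),     # 9
    ("PUSH", "i1"),                # 10 back-edge phi values: push all of them...
    ("PUSH", "cnt1"),              # 11
    ("POPTO", "cnt"),              # 12 ...then pop into the phi registers (LIFO)
    ("POPTO", "i"),                # 13
    ("JMP", 3),                    # 14
    ("HALT",),                     # 15
]


@dataclass
class State:
    pc: int
    locs: dict                     # SSA registers ("locals")
    mem: dict                      # address -> value
    prog: list
    stack: list = field(default_factory=list)


def step(s):
    """Execute one instruction, updating s in place (as a stobj would be)."""
    op, *args = s.prog[s.pc]
    r = s.locs
    nxt = s.pc + 1
    if op == "CONST":
        dst, c = args
        r[dst] = c
    elif op == "ADD":
        dst, a, b = args
        r[dst] = r[a] + r[b]       # unbounded integer addition
    elif op == "EQ":
        dst, a, b = args
        r[dst] = 1 if r[a] == r[b] else 0
    elif op == "LT":
        dst, a, b = args
        r[dst] = 1 if r[a] < r[b] else 0
    elif op == "LOAD":
        dst, addr = args
        r[dst] = s.mem[r[addr]]
    elif op == "BR":
        cond, t, f = args
        nxt = t if r[cond] != 0 else f
    elif op == "JMP":
        (nxt,) = args
    elif op == "PUSH":
        (src,) = args
        s.stack.append(r[src])
    elif op == "POPTO":
        (dst,) = args
        r[dst] = s.stack.pop()
    elif op == "HALT":
        nxt = s.pc                 # stay put: further steps do nothing
    else:
        raise ValueError(f"unsupported opcode: {op}")
    s.pc = nxt
    return s


def run(s, n):
    """Clock-driven execution: take exactly n steps."""
    for _ in range(n):
        step(s)
    return s


mem = {100 + k: v for k, v in enumerate([5, 3, 5, 7, 5, 1, 2, 9])}  # hypothetical data
s = run(State(pc=0, locs={"base": 100, "n": 8, "key": 5}, mem=mem, prog=PROG), 200)
print(s.locs["cnt"])  # 3
```

Making HALT a self-loop means the step count passed to run only needs to be large enough, which mirrors the clock-argument style of an ACL2 run function.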

Concrete execution tests demonstrate that LL2 can correctly run the example program on a concrete initial state (array base address 100, length 8, several values stored in memory). Running (ll2 s 113) yields the expected count of three occurrences, and the interpreter processes roughly 226 000 LLVM instructions per second on a typical laptop. This is about one‑tenth the speed of the earlier compiled ACL2 model, which is unsurprising given the interpreted nature of LL2, but the speed remains sufficient for validation testing.
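Validation by testing of this kind needs an independent oracle for the expected answer. A minimal Python sketch follows, assuming a word-addressed memory with one array element per address; the paper's exact memory layout and array contents are not given here, so the values below are hypothetical and merely chosen to contain the key three times. It builds the concrete memory image at base address 100 and computes the expected count directly from it, so that an interpreter's final result can be checked against it.

```python
def make_memory(base, values):
    """Word-addressed memory image: element k of the array lives at base + k."""
    return {base + k: v for k, v in enumerate(values)}


def expected_count(mem, base, n, key):
    """Reference specification: occurrences of key among the first n elements."""
    return sum(1 for k in range(n) if mem[base + k] == key)


def check_result(observed, mem, base, n, key):
    """Validation check: does an interpreter's observed result match the spec?"""
    exp = expected_count(mem, base, n, key)
    if observed != exp:
        raise AssertionError(f"model returned {observed}, spec expects {exp}")
    return True


values = [5, 3, 5, 7, 5, 1, 2, 9]   # hypothetical contents, 8 elements
mem = make_memory(100, values)
print(expected_count(mem, 100, 8, 5))  # 3
```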

With the LL2 interpreter in place, the authors then integrated Codewalker. They defined a “model API” that tells Codewalker how to invoke the interpreter’s step function, access the PC, and manipulate the state. The program is divided into two focus regions: a preamble (the code before the loop) and the loop body itself. For each region they supplied invariants (e.g., loop‑inv asserting that the loop index is less than n) and a measure function (clk‑8‑measure) that quantifies the number of remaining loop iterations. Using the def‑semantics command, Codewalker automatically generated semantic functions (sem‑pre‑amble‑0, sem‑loop‑8) that compute the state at the end of each region directly, without stepping the interpreter, and proved correctness theorems stating that running LL2 for the appropriate number of steps yields the same state as evaluating the generated semantic function.
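The shape of what def-semantics produces can be illustrated by testing rather than proof. In this minimal Python sketch (a toy three-instruction loop, not LL2 and not Codewalker output; all names are hypothetical), sem_loop plays the role of a generated semantic function such as sem-loop-8, clk_loop plays the role of a clock function giving the number of steps, and loop_inv and measure play the roles of the user-supplied invariant and measure. The correctness equation run(s, clk_loop(s)) == sem_loop(s) is checked on random states that satisfy the invariant; Codewalker proves the corresponding equation in ACL2 for all such states, whereas the sketch only samples it.

```python
import random

# Toy machine: pc 0 = loop test, 1 = compare/accumulate, 2 = increment, 3 = halt.
def step(s):
    t = dict(s)
    if s["pc"] == 0:
        t["pc"] = 1 if s["i"] < s["n"] else 3
    elif s["pc"] == 1:
        t["cnt"] = s["cnt"] + (1 if s["arr"][s["i"]] == s["key"] else 0)
        t["pc"] = 2
    elif s["pc"] == 2:
        t["i"] = s["i"] + 1
        t["pc"] = 0
    return t  # pc 3 (halted): state unchanged


def run(s, k):
    for _ in range(k):
        s = step(s)
    return s


def loop_inv(s):
    """Invariant at the loop head (the index may equal n when the loop exits)."""
    return s["pc"] == 0 and 0 <= s["i"] <= s["n"] <= len(s["arr"])


def measure(s):
    """Remaining iterations; decreases on every recursive call below."""
    return s["n"] - s["i"]


def clk_loop(s):
    """Number of steps from the loop head until the halt instruction."""
    if not s["i"] < s["n"]:
        return 1  # one step: the failing loop test
    return 3 + clk_loop({**s, "i": s["i"] + 1})


def sem_loop(s):
    """Final state of the loop region, computed without stepping."""
    if not s["i"] < s["n"]:
        return {**s, "pc": 3}
    hit = 1 if s["arr"][s["i"]] == s["key"] else 0
    return sem_loop({**s, "i": s["i"] + 1, "cnt": s["cnt"] + hit})


def check_correctness(trials=500, seed=0):
    """Sample run(s, clk_loop(s)) == sem_loop(s) for states satisfying loop_inv."""
    rng = random.Random(seed)
    for _ in range(trials):
        arr = [rng.randint(0, 3) for _ in range(rng.randint(0, 12))]
        n = rng.randint(0, len(arr))
        s = {"pc": 0, "i": rng.randint(0, n), "cnt": rng.randint(0, 5),
             "n": n, "key": rng.randint(0, 3), "arr": arr}
        assert loop_inv(s) and measure(s) >= 0
        assert run(s, clk_loop(s)) == sem_loop(s)
        assert sem_loop(s)["cnt"] == s["cnt"] + arr[s["i"]:n].count(s["key"])
    return True


print(check_correctness())  # True
```

Note that the random states start anywhere the invariant allows, not only at i = 0; that generality is what lets the preamble's result be composed with the loop's semantic function.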

The paper discusses several practical challenges encountered with Codewalker. The tool is “touchy” about focus‑region boundaries, invariant strength, and measure annotations; the user often must prove additional lemmas to help the ACL2 simplifier reduce terms to canonical forms. Moreover, the current interpreter uses unbounded integers for all registers and memory locations, sidestepping the fixed‑width modular (wraparound) arithmetic of LLVM integer types such as i32 and i64. This simplification limits the fidelity of the model for programs whose behavior depends on overflow, and the authors note that incorporating a proper modular arithmetic model is a priority for future work.
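The gap between an unbounded-integer model and LLVM's fixed-width integers appears only on overflow. LLVM's plain `add` on an iN type produces the result modulo 2^N; with the `nsw` or `nuw` flags, signed or unsigned overflow instead yields a poison value. A minimal Python sketch of the difference, reading bit patterns as two's-complement signed values (the helper names are hypothetical):

```python
def to_signed(x, width):
    """Reduce x modulo 2**width and read the bit pattern as two's complement."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def add_unbounded(a, b):
    """Unbounded model: mathematical integers, as in the interpreter described above."""
    return a + b


def add_fixed(a, b, width):
    """Plain LLVM `add` on i<width> (no nsw/nuw flags): wraps modulo 2**width."""
    return to_signed(a + b, width)


def models_agree(a, b, width):
    """True when both models give the same result (a, b in the signed range)."""
    return add_unbounded(a, b) == add_fixed(a, b, width)


INT32_MAX = 2**31 - 1
print(add_unbounded(INT32_MAX, 1))  # 2147483648
print(add_fixed(INT32_MAX, 1, 32))  # -2147483648
print(models_agree(3, 4, 32))       # True
```

For a counting loop bounded by n, the two models agree as long as no intermediate value leaves the signed range, which is why the unbounded simplification is harmless for the example but not in general.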

In summary, the authors demonstrate that an LLVM‑subset interpreter written in ACL2, combined with Codewalker, can produce trustworthy decompilations of LLVM code into ACL2 semantic functions. The advantages of this approach include a decompilation step whose results are proved as ACL2 theorems rather than trusted, automatic generation of semantic functions and associated correctness theorems, and the ability to relate the compiled code to a high‑level functional specification of the original C program within ACL2. The limitations are the relatively narrow subset of LLVM instructions supported, the performance penalty compared to the earlier compiled model, and the substantial manual effort needed to craft invariants, measures, and auxiliary lemmas. Future directions include extending the interpreter to cover more LLVM opcodes, automating invariant discovery, modeling LLVM’s fixed‑width modular integer arithmetic, and improving Codewalker’s ergonomics for larger, more complex programs.
