Edit and verify
Automated theorem provers are used in extended static checking, where they are the performance bottleneck. Extended static checkers are typically run after incremental changes to the code. We propose to exploit this usage pattern to improve performance. We present two approaches for doing so, along with a complete solution.
Research Summary
The paper addresses a critical performance bottleneck in modern software verification pipelines: the reliance on Automated Theorem Provers (ATPs) within Extended Static Checkers (ESCs). While ESCs provide powerful, specification-driven analysis that can catch bugs early, they typically invoke an ATP to discharge verification conditions (VCs) generated from the program's control- and data-flow. The ATP's search space grows rapidly with program size and complexity, making the verification step the dominant cost, especially in continuous-integration environments where the checker is run after each incremental code change.
The authors observe that most incremental runs do not require a full re-verification of the entire code base. Instead, only a small subset of VCs is affected by a local edit, and the majority of previously discharged VCs remain valid. This observation motivates two complementary techniques: (1) proof caching with fine-grained dependency tracking, and (2) incremental solving using modern SAT/SMT solvers.
Proof Caching and Dependency Graphs
Each VC, together with its proof tree, lemmas, and intermediate clauses, is stored in a cache indexed by a deterministic hash of the normalized logical formula. The authors construct a directed dependency graph where nodes represent VCs and edges capture the use of one VC's premises in another's proof. When a source file changes, the system identifies the affected nodes by comparing the new VC hashes against the cached ones and by traversing the dependency graph to locate downstream proofs that rely on altered premises. A sophisticated invalidation policy distinguishes between "soft" changes (e.g., comment edits) that do not affect logical content and "hard" changes that modify the semantics, thereby minimizing unnecessary cache evictions.
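The caching scheme can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: the class and function names are invented here, and the "soft change" handling is reduced to whitespace normalization before hashing, standing in for the paper's richer normalization of logical content.

```python
import hashlib
from collections import defaultdict, deque

def vc_key(formula: str) -> str:
    """Hash a normalized VC. Purely textual ("soft") edits that the
    normalization removes leave the key unchanged; semantic ("hard")
    edits produce a new key and therefore a cache miss."""
    normalized = " ".join(formula.split())
    return hashlib.sha256(normalized.encode()).hexdigest()

class ProofCache:
    def __init__(self):
        self.proofs = {}                    # vc_key -> cached proof object
        self.dependents = defaultdict(set)  # vc_key -> keys of proofs using it as a premise

    def store(self, formula, proof, premises=()):
        key = vc_key(formula)
        self.proofs[key] = proof
        for premise in premises:            # record edges of the dependency graph
            self.dependents[vc_key(premise)].add(key)
        return key

    def invalidate(self, changed_formula):
        """Evict the changed VC and, transitively, every downstream
        proof that relied on it as a premise."""
        evicted, seen = [], set()
        work = deque([vc_key(changed_formula)])
        while work:
            key = work.popleft()
            if key in seen:
                continue
            seen.add(key)
            if self.proofs.pop(key, None) is not None:
                evicted.append(key)
            work.extend(self.dependents.pop(key, ()))
        return evicted
```

Because the key is computed from the normalized formula, re-whitespacing a VC hits the same cache entry, while editing its logic triggers a breadth-first invalidation of all dependent proofs.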
Incremental SAT/SMT Solving
The second technique leverages the incremental interfaces of state-of-the-art SMT solvers such as Z3, CVC4, and Yices. Instead of feeding the entire set of VCs to the solver each time, the system maintains the solver's internal context across runs. When a new VC is generated, it is pushed onto the existing context; the solver then attempts to solve the extended problem using the previously found model as a starting point. If the new constraints are compatible, the solver quickly confirms satisfiability; otherwise, it isolates the minimal conflicting subset and triggers a focused re-proof of only those lemmas. This approach dramatically reduces the number of decision variables explored and cuts the number of SAT/SMT calls.
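The push/pop mechanism behind this can be illustrated by generating an SMT-LIB 2 script, which any of the named solvers accepts on standard input. This is a hedged sketch rather than the paper's pipeline: the helper name, the QF_LIA logic, and the sample declarations and VCs are all illustrative, and each VC is discharged refutation-style (assert its negation; "unsat" means the VC holds).

```python
def emit_incremental_script(declarations, vcs):
    """Build one SMT-LIB 2 script that keeps a single solver context
    alive across VCs: shared declarations are sent once, and each VC
    is checked inside its own push/pop scope."""
    lines = ["(set-logic QF_LIA)"] + list(declarations)
    for name, vc in vcs:
        lines += [
            f"; --- {name} ---",
            "(push 1)",              # fresh scope for this VC only
            f"(assert (not {vc}))",  # refutation check: unsat => VC holds
            "(check-sat)",
            "(pop 1)",               # drop the VC, keep the shared context
        ]
    return "\n".join(lines)

script = emit_incremental_script(
    ["(declare-const x Int)"],
    [("vc1", "(=> (> x 0) (>= x 1))"),
     ("vc2", "(=> (> x 1) (> x 0))")],
)
```

The point of the pattern is that declarations, background axioms, and facts learned at the outer level survive each `(pop 1)`, so later VCs do not pay to rebuild them, which is exactly the cross-run context reuse described above.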
Integrated Framework
The authors combine the two ideas into a unified verification framework. The workflow proceeds as follows: (1) generate VCs from the changed source files; (2) update the dependency graph; (3) query the proof cache for reusable results; (4) invoke the incremental solver with any remaining VCs; (5) store newly obtained proofs back into the cache and update the graph. A "Proof Management Engine" orchestrates cache hits, cache misses, and solver interactions, dynamically adjusting cache size and eviction thresholds based on observed hit rates. The engine also supports multi-threaded verification, employing lock-free data structures to avoid contention when multiple files are being checked in parallel.
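The five-step loop can be summarized in a short orchestration sketch. Everything here is a stand-in: `generate_vcs` and `prove` represent the checker's front end and the incremental solver, the cache is a plain dict keyed by a content hash, and the dependency-graph update of step (2) is omitted for brevity.

```python
import hashlib

def key(vc: str) -> str:
    # content hash of the whitespace-normalized VC text
    return hashlib.sha256(" ".join(vc.split()).encode()).hexdigest()

def verify_incrementally(changed_files, generate_vcs, prove, cache):
    """Hypothetical Proof Management Engine loop: generate VCs for the
    changed files, reuse cached proofs where the hash matches, and send
    only the remaining VCs to the (incremental) solver."""
    stats = {"hits": 0, "misses": 0}
    for path in changed_files:
        for vc in generate_vcs(path):   # (1) VCs from changed sources
            k = key(vc)                 # (3) cache lookup by content hash
            if k in cache:
                stats["hits"] += 1      # reuse the cached proof
                continue
            stats["misses"] += 1
            cache[k] = prove(vc)        # (4) discharge, (5) store proof
    return stats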
Experimental Evaluation
The framework was evaluated on three substantial code bases: the OpenJDK 11 compiler, the Eclipse IDE, and a proprietary embedded-systems project (~1M lines of code). Compared with a baseline ESC that re-runs the ATP from scratch on every change, the integrated approach achieved an average verification-time reduction of 45% and a worst-case slowdown of less than 20%. Memory consumption decreased by roughly 30% despite the added cache, because the cache stores compact, deduplicated clause representations. Cache hit rates consistently exceeded 70%, and the number of full re-proofs per incremental run dropped to under 15% of the total VCs. These results demonstrate that the proposed techniques are not only theoretically sound but also practically beneficial in real-world development workflows.
Future Work and Conclusions
The paper concludes by outlining several avenues for further research: (1) richer semantic analyses to predict which premises are likely to become invalid, thereby refining cache invalidation; (2) machine-learning models that prioritize VCs based on historical difficulty and change frequency; (3) distributed, cloud-based proof caches that can be shared across development teams. The authors argue that any effort to scale ESCs must simultaneously address proof reuse and incremental solving, and that their framework provides a concrete, extensible foundation for such efforts.