Integrating Testing and Interactive Theorem Proving
Using an interactive theorem prover to reason about programs involves a sequence of interactions where the user challenges the theorem prover with conjectures. Invariably, many of the conjectures posed are in fact false, and users often spend considerable effort examining the theorem prover’s output before realizing this. We present a synergistic integration of testing with theorem proving, implemented in the ACL2 Sedan (ACL2s), for automatically generating concrete counterexamples. Our method uses the full power of the theorem prover and associated libraries to simplify conjectures; this simplification can transform conjectures for which finding counterexamples is hard into conjectures where finding counterexamples is trivial. In fact, our approach even leads to better theorem proving, e.g. if testing shows that a generalization step leads to a false conjecture, we force the theorem prover to backtrack, allowing it to pursue more fruitful options that may yield a proof. The focus of the paper is on the engineering of a synergistic integration of testing with interactive theorem proving; this includes extending ACL2 with new functionality that we expect to be of general interest. We also discuss our experience in using ACL2s to teach freshman students how to reason about their programs.
💡 Research Summary
The paper presents a tightly integrated framework that combines automated testing with the interactive theorem prover ACL2, as realized in the ACL2 Sedan (ACL2s). The authors observe that during interactive proof development users frequently pose conjectures that turn out to be false, yet distinguishing a genuine proof failure from a false conjecture can be time‑consuming. To address this, they embed a testing engine directly into the proof process, automatically generating concrete counterexamples whenever possible, and they also allow testing results to influence the proof search itself.
The core technical contribution is a two‑way feedback loop. First, as ACL2 attempts to prove a goal it repeatedly applies rewrite rules, simplifiers, and library lemmas, often transforming the original conjecture into a much simpler subgoal. The authors exploit this simplification by extracting type‑like information from hypotheses using ACL2s’s “defdata” data‑definition framework. When a user declares a type (e.g., a list of integers), ACL2s automatically creates a predicate (loip) and a surjective enumerator (nth‑loi) that maps natural numbers to values of that type. These enumerators feed a random‑testing subsystem that can quickly produce inputs satisfying the extracted type constraints. Because the subgoal has been simplified, the probability of finding a counterexample rises dramatically compared to naïve random testing on the original, possibly highly constrained conjecture.
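The enumerator idea can be illustrated outside ACL2. The Python sketch below (all names — `enum_int`, `enum_list_of_int`, `find_counterexample` — are illustrative, not the paper's actual API) mimics a surjective map from natural numbers to lists of integers and uses it to randomly search for a counterexample, the same shape of pipeline the defdata enumerators feed:

```python
import random

def enum_int(n):
    """Surjective map from naturals to integers: 0, 1, -1, 2, -2, ..."""
    return (n + 1) // 2 if n % 2 else -(n // 2)

def enum_list_of_int(n):
    """Map naturals to lists of integers by peeling off base-16 "digits".
    (The radix 16 is arbitrary, so only a finite element range is covered
    in this sketch; a real enumerator would interleave over all integers.)"""
    xs = []
    while n > 0:
        n -= 1                 # shift so n == 0 encodes the empty list
        n, k = divmod(n, 16)   # split off one element code
        xs.append(enum_int(k))
    return xs

def find_counterexample(prop, enum, tries=1000, seed=0):
    """Random testing: sample natural-number indices, decode them through
    the enumerator, and evaluate the conjecture on the resulting value."""
    rng = random.Random(seed)
    for _ in range(tries):
        x = enum(rng.randrange(10**6))
        if not prop(x):
            return x           # concrete counterexample
    return None

# False conjecture: "reversing a list of integers leaves it unchanged".
cex = find_counterexample(lambda xs: xs == list(reversed(xs)),
                          enum_list_of_int)
```

Because the enumerator only ever produces values of the declared type, every random index yields a well-typed test input, which is what makes testing simplified, type-constrained subgoals effective.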
Second, the framework feeds testing outcomes back into the prover. If testing on a simplified subgoal yields a concrete counterexample, the system interprets this as evidence that a recent generalization step was unsound. It then triggers a “backtrack hint” that forces ACL2 to abandon the offending generalization and explore alternative proof strategies. To implement this, the authors extend ACL2’s computed‑hint mechanism with three new capabilities: (1) recording why variables are eliminated during proof so that counterexamples on subgoals can be lifted to the top‑level conjecture; (2) “override‑hints,” which inject testing‑related hints without overwriting user‑provided hints; and (3) “backtrack‑hints,” which permit limited backtracking based on testing feedback. This dynamic interaction makes the prover more robust: it avoids wasting effort on proof paths that are provably false and can automatically recover from premature generalizations.
The paper situates its work among prior efforts in counterexample generation, such as SAT/SMT‑based approaches (Pythia, Nitpick) and random testing in functional languages (QuickCheck). Unlike those, the presented system does not require translating ACL2’s untyped, executable logic into a decidable fragment; it works directly on executable ACL2 formulas, preserving soundness while offering full automation. Moreover, the authors claim that no previous ACL2‑based system has automatically tested arbitrary subgoals generated during proof and used the results to steer the proof engine.
A substantial engineering effort underlies the integration. The authors modify ACL2’s core to log variable‑elimination reasons, augment the data‑definition framework to generate type enumerators for primitive and user‑defined types, and implement random‑sampling strategies (pseudo‑uniform and pseudo‑geometric distributions). The testing subsystem can operate in bounded exhaustive mode or pure random mode; by default it uses random sampling for efficiency.
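A geometric-style distribution over the enumerator's natural-number domain favors small indices (small, structurally simple values) while still occasionally reaching large ones. The concrete distribution below is my own guess at what "pseudo-geometric" could mean, not the paper's definition:

```python
import random

def pseudo_geometric_index(rng, p=0.05):
    """Sample a natural number biased toward 0: count failures until the
    first success of a p-coin, so index k has probability p * (1 - p)**k."""
    k = 0
    while rng.random() >= p:
        k += 1
    return k

def sample_indices(n, seed=0, p=0.05):
    """Draw n enumerator indices; expected value is (1 - p) / p (19 here)."""
    rng = random.Random(seed)
    return [pseudo_geometric_index(rng, p) for _ in range(n)]

indices = sample_indices(1000)
# Small indices dominate, so simple candidate values are tried first,
# while the distribution's tail still produces larger, complex values.
```

Feeding such indices to a type enumerator gives the "try simple values first" behavior that makes random testing fast on simplified subgoals.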
Beyond technical contributions, the paper reports on an educational deployment. In freshman programming courses at Northeastern University, students use ACL2s to write specifications and immediately see concrete counterexamples when their conjectures are false. Because students are already familiar with evaluating programs on concrete inputs, the testing feedback provides an intuitive bridge to formal verification concepts. The system requires no special commands—testing runs automatically whenever a conjecture is admitted—making it accessible to novices while still offering powerful capabilities to expert users.
Empirical evaluation on a suite of ACL2 regression examples shows that the combined approach discovers counterexamples that pure random testing misses, and that the backtracking mechanism reduces proof time by avoiding fruitless generalizations. The authors conclude that the synergistic integration of testing and theorem proving yields a more user‑friendly, efficient, and pedagogically valuable verification environment. Future work includes extending the random‑generation heuristics, exploring machine‑learning‑guided input generation, and porting the ideas to other interactive provers such as Isabelle/HOL and Coq.