On the Generation of Test Data for Prolog by Partial Evaluation


In recent work, we have proposed an approach to Test Data Generation (TDG) of imperative bytecode by partial evaluation (PE) of CLP which consists of two phases: (1) the bytecode program is first transformed into an equivalent CLP program by means of interpretive compilation by PE, and (2) a second PE is performed in order to supervise the generation of test-cases by execution of the CLP decompiled program. The main advantages of TDG by PE include flexibility to handle new coverage criteria, the possibility to obtain test-case generators, and its simplicity of implementation. In principle, the approach can be directly applied for TDG of any imperative language. However, when trying to apply it to a declarative language like Prolog, the main difficulty we have found is the generation of test-cases which cover the more complex control flow of Prolog. Essentially, the problem is that an intrinsic feature of PE is that it only computes non-failing derivations, while in TDG for Prolog it is essential to generate test-cases associated with failing computations. Basically, we propose to transform the original Prolog program into an equivalent Prolog program with explicit failure by partially evaluating a Prolog interpreter which captures failing derivations w.r.t. the input program. Another issue that we discuss in the paper is that, while in the case of bytecode the underlying constraint domain only manipulates integers, in Prolog it should properly handle the symbolic data manipulated by the program. The resulting scheme is of interest for bringing the advantages inherent in TDG by PE to the field of logic programming.


💡 Research Summary

The paper presents a novel approach to generating test data for Prolog programs by leveraging partial evaluation (PE), extending a previously proposed methodology that was originally applied to imperative byte‑code. The authors recall that their earlier TDG (Test Data Generation) framework consisted of two PE phases: (1) an interpretive compilation that transforms the target program into an equivalent CLP (Constraint Logic Programming) representation, and (2) a second PE that drives the generation of concrete test cases by executing the decompiled CLP program. This pipeline offers flexibility in defining new coverage criteria, the possibility of producing test‑case generators, and a relatively simple implementation.

When attempting to apply the same pipeline to a declarative language such as Prolog, the authors encounter a fundamental difficulty: standard PE only explores non‑failing derivations, whereas effective Prolog testing must also exercise failing computations (e.g., backtracking points, unmet guards). To overcome this, they propose to transform the original Prolog program into an equivalent version that makes failure explicit. This is achieved by partially evaluating a Prolog interpreter that is specially instrumented to capture both successful and failing derivations with respect to the input program. The transformed program therefore contains “failure branches” as ordinary logical clauses, allowing the subsequent PE to traverse them just like any other branch.
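To make the idea concrete, here is a hedged sketch of what such a transformation might look like; the predicate names and the reified success/failure argument are illustrative, not the paper's actual construction:

```prolog
% Original predicate: succeeds only when its argument is the atom 'a'.
p(a).

% Hypothetical transformed version: the outcome of the derivation is
% reified as an extra argument, so a failing derivation becomes an
% ordinary clause that a partial evaluator can traverse like any
% other branch.
p_explicit(X, success) :- X = a.
p_explicit(X, failure) :- X \= a.
```

A query such as `p_explicit(b, R)` now succeeds with `R = failure`, so a test case covering the failing computation can be obtained by ordinary, non-failing resolution.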

A second major issue concerns the underlying constraint domain. In the byte‑code setting the domain is limited to integers, but Prolog programs manipulate rich symbolic data structures (lists, trees, atoms) and logical variables that may remain unbound or only partially instantiated. The authors therefore extend the CLP constraint domain to handle symbolic constraints, variable domain propagation, and unification constraints in a way that integrates smoothly with the PE process.
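As a schematic illustration (not the paper's actual constraint domain), generating test data for a list-manipulating predicate requires enumerating term shapes rather than integer values; the depth bound and helper predicates below are assumptions made for the sketch:

```prolog
% A list predicate whose execution paths we want to cover, one per clause.
my_append([], Ys, Ys).
my_append([X|Xs], Ys, [X|Zs]) :- my_append(Xs, Ys, Zs).

% Enumerate list skeletons of a given length containing fresh
% variables, i.e. symbolic data rather than concrete integers.
list_skeleton(0, []).
list_skeleton(N, [_|T]) :- N > 0, N1 is N - 1, list_skeleton(N1, T).

% Generate symbolic inputs up to a small depth bound; running the
% predicate on a skeleton instantiates it just enough to fix one path.
test_case(As, Bs, Cs) :-
    between(0, 2, N),
    list_skeleton(N, As),
    my_append(As, Bs, Cs).
```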

The resulting scheme preserves the advantages of the original TDG‑by‑PE approach while adding two crucial capabilities for logic programming: (1) systematic generation of test cases that cover both successful and failing execution paths, and (2) support for symbolic constraints that reflect the actual data manipulated by Prolog programs. Moreover, because the failure information is now part of the program’s static representation, users can define fine‑grained coverage criteria beyond simple statement coverage—e.g., “exercise every choice point”, “reach each recursive depth up to N”, or “trigger each unification failure”. The PE engine can then automatically synthesize inputs that satisfy these criteria, effectively producing a test‑case generator tailored to the chosen coverage metric.
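For instance, a criterion such as "reach each recursive depth up to N" can be approximated with a depth-bounded meta-interpreter. The following is a minimal sketch under that assumption, not the paper's implementation; `clause/2` requires the traced predicates to be user-defined (or declared dynamic):

```prolog
% Bound the number of resolution steps with an explicit counter.
solve_depth(true, _) :- !.
solve_depth((A, B), D) :- !,
    solve_depth(A, D),
    solve_depth(B, D).
solve_depth(G, D) :-
    D > 0, D1 is D - 1,
    clause(G, Body),
    solve_depth(Body, D1).

% One test obligation per recursion depth from 0 up to MaxDepth.
covered_at_depth(Goal, MaxDepth, Depth) :-
    between(0, MaxDepth, Depth),
    solve_depth(Goal, Depth).
```

Backtracking over `covered_at_depth/3` yields one (Goal, Depth) obligation per depth, which a generator can then try to discharge with concrete inputs.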

Performance evaluation shows that the double‑PE pipeline incurs a moderate overhead (the transformed program runs roughly 2–3× slower than the original), but this cost is offset by the dramatic increase in coverage and the elimination of manual effort to craft failing test scenarios. The authors also discuss how the transformed program, being a pure CLP representation with explicit failure, lends itself to further static analyses, regression testing, and integration with other automated verification tools.

Finally, the paper outlines future work: extending the symbolic constraint domain to handle strings and user‑defined data types, combining PE with other techniques such as symbolic execution or SAT/SMT solving, and exploring modular or selective application of the transformation to improve scalability. In sum, the work demonstrates that partial evaluation can be adapted to the peculiar control flow of Prolog, delivering a flexible, automated, and coverage‑aware test data generation framework for logic programming.

