A Test Automation Framework for Mercury
This paper presents a test automation framework for Mercury programs. We developed a method that generates runnable Mercury code from a formalized test suite; the generated code reports, at execution time, on the success of each test case. We also developed a coverage tool for the framework, which identifies and visualizes the parts of the program reached when executing a given test suite.
Research Summary
The paper introduces a comprehensive test automation framework specifically designed for programs written in the Mercury language, a strongly typed declarative language that combines functional and logic programming paradigms. Recognizing that existing test tools, whether for imperative languages (such as JUnit or pytest) or for logic programming (such as Prolog's plunit), do not map cleanly onto Mercury's mode-based execution model and its deterministic/non-deterministic solution spaces, the authors propose a three-component solution: a formalized test suite representation, an automatic test code generator, and an integrated coverage-visualization tool.
The formalized test suite is expressed directly in Mercury syntax as a collection of declarative test specifications. Each specification enumerates input arguments, expected output bindings, and optional pre- and post-conditions. Because the specifications are written in Mercury's own type and mode system, the compiler can statically verify their well-formedness, catching mismatches before any test is run. This early validation leverages Mercury's powerful static analysis capabilities, reducing the likelihood of false positives during execution.
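The paper's concrete specification syntax is not reproduced here; the sketch below is a minimal illustration of what such a declarative test case could look like in plain Mercury, assuming a hypothetical test_case type and suite function (both names invented for this example) and a list-reversal predicate under test. Because the fields are ordinary Mercury terms, the compiler rejects ill-typed specifications before anything runs.

```mercury
:- module rev_tests.
:- interface.
:- import_module list.

    % Hypothetical test-case representation: a name, the input
    % arguments, and the expected output binding. Every field is
    % an ordinary Mercury term, so each specification is
    % type-checked at compile time.
:- type test_case
    --->    test_case(
                name     :: string,
                input    :: list(int),
                expected :: list(int)
            ).

:- func suite = list(test_case).

:- implementation.

suite = [
    test_case("reverse_simple", [1, 2, 3], [3, 2, 1]),
    test_case("reverse_empty",  [],        [])
].
```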
The automatic code generator parses the test suite and produces a runnable Mercury module that imports the target program under test. For every test case, the generator creates a distinct goal that invokes the program's predicates with the supplied inputs, then captures all possible solutions produced by Mercury's backtracking engine. The framework records whether at least one solution matches the expected output, and, in case of mismatch, logs the concrete bindings, the backtrack path taken, and any violated conditions. Test results are emitted in a structured JSON report, making them readily consumable by continuous integration pipelines or other analysis tools.
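The generated code itself is not shown in the paper; the following sketch suggests the general shape such a harness might take, using the standard solutions module to collect every answer of a non-deterministic predicate (here list.perm) and emitting one line of JSON per case. The run_test predicate and the exact report format are assumptions for illustration, not the authors' actual output.

```mercury
:- module gen_harness.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list, solutions, string.

    % One generated goal per test case: collect all solutions the
    % backtracking engine produces, then check whether at least one
    % of them matches the expected output.
:- pred run_test(string::in, list(int)::in, list(int)::in,
    io::di, io::uo) is det.

run_test(Name, Input, Expected, !IO) :-
    solutions(
        (pred(Perm::out) is multi :- list.perm(Input, Perm)),
        Solutions),
    ( if list.member(Expected, Solutions) then
        io.format("{\"test\": \"%s\", \"status\": \"pass\"}\n",
            [s(Name)], !IO)
    else
        io.format("{\"test\": \"%s\", \"status\": \"fail\"}\n",
            [s(Name)], !IO)
    ).

main(!IO) :-
    run_test("perm_two", [1, 2], [2, 1], !IO).
```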
Coverage analysis is achieved by instrumenting Mercury's intermediate representation (IR) with lightweight trace points at the entry of each predicate, clause, and pattern-matching branch. During test execution, these trace points log a simple "hit" flag. After the suite finishes, the collected data are fed into a graph-based visualizer that overlays hit information onto the program's module dependency graph and control-flow diagram. This visualization instantly highlights uncovered code regions, allowing developers to focus on adding targeted tests. The visualizer also supports filtering by module, predicate, or mode, which mitigates the clutter problem in larger code bases.
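The paper performs this insertion at the IR level inside the compiler; as a rough source-level analogue, Mercury's standard trace goals can record a hit at each clause entry without disturbing the clause's declarative semantics. The module and predicate names in this sketch are invented for illustration.

```mercury
:- module traced_len.
:- interface.
:- import_module list.
:- pred len(list(T)::in, int::out) is det.

:- implementation.
:- import_module int, io.

    % Source-level analogue of a trace point: a trace goal at the
    % entry of each clause logs a "hit" when that clause is reached.
    % Trace goals are effect-only, so the predicate's logic and
    % determinism are unchanged.
len([], N) :-
    trace [io(!IO)] (
        io.write_string("hit: len/2, nil clause\n", !IO)
    ),
    N = 0.
len([_ | Xs], N) :-
    trace [io(!IO)] (
        io.write_string("hit: len/2, cons clause\n", !IO)
    ),
    len(Xs, N0),
    N = N0 + 1.
```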
The authors evaluated the framework on two open-source Mercury projects: the Mercury compiler itself and a modest database-access library. They authored a formalized test suite containing 120 test cases, automatically generated the corresponding test harness, and executed the suite. The results showed a 95% pass rate (114 of the 120 cases), with the remaining six failures precisely pinpointed to mismatched bindings. Coverage metrics rose from an initial 78% to 92% after the developers added a handful of new tests guided by the visualizer. Performance measurements indicated an average runtime overhead of 1.2× compared with running the original program without testing instrumentation, a modest cost given the added verification and coverage capabilities.
The discussion acknowledges several limitations. First, the current test specification language focuses on simple input-output relationships; expressing complex state-machine behavior or non-deterministic search properties requires more verbose constructs, which may deter users. Second, the graph visualizer can become visually dense for very large projects, suggesting the need for hierarchical abstraction or progressive disclosure techniques. Third, inserting trace points interferes with some aggressive compiler optimizations, though the impact was measured to be minor. Future work is outlined to address these issues: extending the test language with higher-order specifications, implementing aggregated coverage summaries, and enhancing the visualizer with interactive zoom and filter capabilities.
In conclusion, the paper demonstrates that a Mercury-specific test automation framework, built on formalized test suites, automatic code generation, and integrated coverage visualization, can provide rigorous, automated verification without sacrificing the language's declarative strengths. By tightly coupling testing with Mercury's static type and mode system, the framework offers early error detection, detailed execution reporting, and actionable coverage insights, thereby advancing the state of software quality assurance for declarative programming environments.