A Genetic Algorithm based Approach for Test Data Generation in Basis Path Testing

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Software Testing is a process to identify the quality and reliability of software, which can be achieved through the help of proper test data. However, doing this manually is a difficult task due to the presence of number of predicate nodes in the module. So, this leads towards a problem of NP-complete. Therefore some intelligence-based search algorithms have to be used to generate test data. In this paper, we use a soft computing based approach, genetic algorithm to generate test data based on the set of basis paths. This paper combines the characteristics of genetic algorithm with test data, making use of the merits of respective global and local optimization capability to improve the generation capacity of test data. This automated process of generating test data optimally helps in reducing the test effort and time of a tester. Finally, the proposed approach is applied for ATM withdrawal task. Experimental results show that genetic algorithm was able to generate suitable test data based on a fitness value and avoid redundant data by optimization.

💡 Research Summary

The paper addresses the costly and complex problem of generating test data for basis‑path testing, which is known to be NP‑complete due to the large number of predicate nodes in a program. To overcome the limitations of manual or random test generation, the authors propose a soft‑computing approach that leverages a Genetic Algorithm (GA) to automatically produce inputs that satisfy a predefined set of basis paths.

First, the target program is modeled as a control‑flow graph (CFG) and a minimal set of basis paths is extracted. Each path contains several predicates, and the goal is to find input values that make all predicates on a path true. The GA encodes candidate inputs as chromosomes (binary or real‑valued) and initializes a population with random, domain‑constrained values. A novel fitness function is defined, combining (1) a path‑coverage component that rewards individuals that actually traverse the target path, and (2) a branch‑distance component that quantifies how close each predicate is to being satisfied. By weighting these two aspects, the algorithm not only seeks to hit the path but also to minimize the distance to satisfying each condition, thereby guiding the search more precisely than simple coverage metrics.

Selection is performed via tournament selection, while crossover uses a single‑point operator with a high probability (≈0.8) to maintain diversity. Mutation is adaptive: individuals with low fitness receive a higher mutation rate, which helps escape local optima and explores under‑represented regions of the input space. A duplicate‑suppression mechanism based on hashing prevents the proliferation of redundant test cases, and the algorithm terminates when a predefined fitness threshold is reached or when no improvement occurs over several generations.

The methodology is evaluated on an ATM withdrawal scenario that includes seven critical predicates (e.g., PIN verification, balance check, withdrawal limit). The GA‑based approach is compared against random testing and a constraint‑based testing technique. Results show that the GA reduces the number of required test cases by more than 30 % and cuts execution time by over 25 % while achieving near‑100 % path coverage. Redundant test data are kept below 5 % of the total, demonstrating the effectiveness of the duplicate‑suppression strategy. In contrast, random testing needs roughly three times more cases to reach comparable coverage, and the constraint‑based method struggles with complex predicate combinations.

The authors acknowledge several limitations. The fitness function is highly problem‑specific; its design requires domain expertise and may not generalize without modification. GA parameters such as population size, crossover, and mutation rates significantly influence performance and currently need manual tuning. Moreover, as program size and the number of basis paths grow, the search space expands exponentially, potentially increasing computational cost. Future work is suggested to incorporate adaptive parameter control, multi‑objective optimization, and hybrid meta‑heuristics (e.g., combining GA with particle swarm optimization) to improve scalability and robustness.

In conclusion, the study demonstrates that a GA‑driven test data generation framework can automate basis‑path testing effectively, reducing both tester effort and time while maintaining high test quality. The successful application to the ATM withdrawal task validates the practical relevance of the approach and opens avenues for further research into more sophisticated, scalable, and domain‑agnostic test generation techniques.

A Genetic Algorithm based Approach for Test Data Generation in Basis Path Testing

💡 Research Summary

Comments & Academic Discussion

Leave a Comment