Heuristic Approach of Automated Test Data Generation for Program having Array of Different Dimensions and Loops with Variable Number of Iteration
Programs normally spend most of their execution time in loops, so automated test data generation pays special attention to loops to achieve better coverage. Generating test data automatically for programs that have loops with a variable number of iterations and variable-length arrays is a challenging problem, because for some programming constructs, such as merge sort, the number of paths may increase exponentially with array size. We propose a method that finds heuristics for different types of programming constructs with loops and arrays. Linear search, bubble sort, merge sort, and matrix multiplication programs are included to highlight the differences in execution between a single loop, a variable-length array, and nested loops with one- and two-dimensional arrays. We use two parameters (heuristics) to predict the minimum number of iterations required for generating automated test data: the longest path level (kL) and the saturation level (kS). Our procedure instruments the source code at the elementary level and then applies random inputs until all feasible paths, or all paths of the longest length, are collected. A filter is used to avoid duplicate paths. Our test data are the random numbers that cover each feasible path.
💡 Research Summary
The paper addresses the longstanding challenge of automatically generating test data for programs that combine loops with a variable number of iterations and arrays whose lengths are not fixed at compile time. Such programs can exhibit an exponential explosion of execution paths, especially for constructs like merge sort where the number of possible paths grows dramatically with array size. To tackle this, the authors introduce a heuristic framework built around two metrics: the longest path level (kL) and the saturation level (kS).
kL represents the length (in terms of loop iterations) of the deepest feasible execution path in the program. It serves as a lower bound on how many iterations a test generator must explore to reach the most complex behavior. kS, on the other hand, denotes the minimum number of iterations required until no new feasible paths appear – essentially the point at which the set of collected paths becomes saturated. By monitoring these two values during test generation, the framework can decide when to stop generating inputs, thereby avoiding unnecessary work while still achieving high coverage.
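To make the two metrics concrete, here is a minimal sketch that derives both values from a list of recorded paths, one per test run. It follows the definitions given in this summary, which may differ in detail from the paper's formal definitions; the function name `path_levels` and the tuple-of-branch-outcomes encoding are assumptions made for illustration.

```python
from typing import Hashable, Sequence, Tuple


def path_levels(runs: Sequence[Tuple[Hashable, ...]]) -> Tuple[int, int]:
    """Return (k_l, k_s) for a sequence of recorded paths, one per run.

    k_l: length of the longest path observed (our reading of "longest path level").
    k_s: 1-based number of the last run that produced a previously unseen path
         (our reading of "saturation level").
    """
    seen = set()
    k_l = 0
    k_s = 0
    for run_number, path in enumerate(runs, start=1):
        k_l = max(k_l, len(path))
        if path not in seen:
            seen.add(path)
            k_s = run_number  # a new path appeared, so saturation moves forward
    return k_l, k_s


# Each tuple is the sequence of branch outcomes taken in one run.
runs = [("T",), ("F", "T"), ("T",), ("F", "F", "T"), ("F", "T")]
print(path_levels(runs))  # (3, 4)
```

In this toy input the longest path has three branch outcomes, and no new path appears after the fourth run.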
The methodology proceeds in four main steps. First, source code is instrumented at the elementary (statement) level. The instrumentation records every loop entry and exit, array index accesses, and conditional branch outcomes, producing a fine‑grained execution trace. Second, a random input generator feeds the instrumented program with a stream of values. After each execution, the trace is examined to determine whether a previously unseen path has been traversed. If a new path is discovered, the corresponding kL and kS values are updated. Third, a filtering mechanism discards duplicate paths, ensuring that each unique path contributes only once to the coverage statistics. Finally, when the observed number of iterations reaches kS—meaning that additional random inputs no longer yield new paths—the test generation process terminates.
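The steps above can be sketched for the simplest subject, linear search. This is an illustrative sketch, not the authors' tool: the instrumentation is hand-written, the names `linear_search_traced` and `generate_tests` are hypothetical, and the stopping rule assumes the number of feasible paths is known in advance (for an array of length n, linear search has n + 1 paths). A `max_runs` cap is added as a safety net.

```python
import random


def linear_search_traced(arr, key):
    """Linear search that also returns the branch outcomes it took."""
    trace = []
    for i, value in enumerate(arr):
        hit = (value == key)
        trace.append(hit)              # record the outcome of each comparison
        if hit:
            return i, tuple(trace)
    return -1, tuple(trace)


def generate_tests(n, expected_paths, max_runs=10_000, seed=0):
    """Feed random inputs until every expected path has one covering input."""
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    tests = {}                         # path -> first input that covered it
    runs = 0
    while len(tests) < expected_paths and runs < max_runs:
        runs += 1
        arr = [rng.randint(0, n) for _ in range(n)]
        key = rng.randint(0, n)
        _, path = linear_search_traced(arr, key)
        if path not in tests:          # duplicate filter: keep one input per path
            tests[path] = (arr, key)
    return tests, runs


tests, runs = generate_tests(n=4, expected_paths=5)
for path, (arr, key) in sorted(tests.items(), key=lambda item: len(item[0])):
    print(path, arr, key)
print("runs needed:", runs)
```

The dictionary plays the role of the filter: a run whose path is already present contributes nothing to the test set.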
To validate the approach, the authors select four representative programs that span a range of loop and array complexities:
- Linear Search – a single loop operating on a fixed‑size array, serving as a baseline for simple control flow.
- Bubble Sort – a nested‑loop algorithm that works on a one‑dimensional array whose length can vary, illustrating the impact of multiple loops on path growth.
- Merge Sort – a recursive divide‑and‑conquer algorithm with a variable‑size array, exemplifying exponential path explosion due to recursive calls and dynamic partitioning.
- Matrix Multiplication – nested loops operating on two‑dimensional arrays, highlighting the challenges of multi‑dimensional data structures.
For each program, the authors compute kL and kS analytically and then run the random‑input generation process. The results show that even for merge sort, where the theoretical number of paths is astronomically large, the saturation level kS remains modest (e.g., kS ≈ 12 for an array of size 1024). This indicates that random inputs naturally explore a diverse set of partitioning patterns, quickly reaching a point where additional inputs seldom produce novel paths.
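The growth in path count for merge sort can be observed directly on small inputs. The sketch below is an assumption-laden illustration rather than the authors' instrumentation: it records only the comparison outcomes of the merge step and enumerates every permutation of n distinct values. Because a correct comparison sort must treat each permutation differently, the count equals n!.

```python
from itertools import permutations


def merge_sort_traced(values, trace):
    """Sort `values`, appending each merge comparison outcome to `trace`."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort_traced(values[:mid], trace)
    right = merge_sort_traced(values[mid:], trace)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        take_left = left[i] <= right[j]
        trace.append(take_left)        # one recorded branch outcome per comparison
        if take_left:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def count_paths(n):
    """Count distinct comparison traces over all permutations of n distinct values."""
    paths = set()
    for perm in permutations(range(n)):
        trace = []
        assert merge_sort_traced(list(perm), trace) == list(range(n))
        paths.add(tuple(trace))
    return len(paths)


for n in range(1, 6):
    print(n, count_paths(n))           # counts are 1, 2, 6, 24, 120 (n!)
```

Exhaustive enumeration like this is only practical for very small n, which is why a stopping heuristic is needed for realistic array sizes.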
Performance measurements reveal that the duplicate‑path filter reduces the total number of executions by roughly 30–45 % without sacrificing coverage; the achieved path coverage stays above 95 % across all test subjects. Moreover, the heuristic-driven termination condition prevents endless testing loops, delivering a clear cost‑benefit trade‑off.
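One way such a duplicate-path filter might be built is shown below. The class name `PathFilter` and the choice to store SHA-256 digests instead of full traces are assumptions made for this sketch; the source does not describe how the authors' filter is implemented.

```python
import hashlib


class PathFilter:
    """Keeps track of which paths have been seen (hypothetical helper)."""

    def __init__(self):
        self._digests = set()
        self.kept = 0
        self.dropped = 0

    def is_new(self, trace):
        """Return True the first time a trace of boolean outcomes is seen."""
        # Encode branch outcomes as bytes so long traces reduce to a fixed-size key.
        digest = hashlib.sha256(bytes(int(b) for b in trace)).digest()
        if digest in self._digests:
            self.dropped += 1
            return False
        self._digests.add(digest)
        self.kept += 1
        return True


path_filter = PathFilter()
observed = [(True,), (False, True), (True,), (False, True), (False, False)]
print([path_filter.is_new(t) for t in observed])  # [True, True, False, False, True]
print(path_filter.kept, path_filter.dropped)      # 3 2
```

Storing digests keeps memory use independent of trace length, at the cost of a negligible collision risk.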
The authors acknowledge limitations. When the input domain is highly constrained or when kL and kS are close in value (i.e., the program exhibits limited path diversity), the heuristic offers less advantage because saturation is reached almost immediately. Additionally, deep recursion can cause stack overflows in the instrumented program, suggesting that protective measures (e.g., recursion depth limits) are needed for robust deployment.
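A recursion depth limit of the kind mentioned above could look like the following sketch. The exception `DepthLimitExceeded`, the function `guarded_merge_sort`, and the explicit `depth` parameter are all hypothetical names introduced here; the source does not specify how such a guard would be implemented.

```python
import heapq


class DepthLimitExceeded(Exception):
    """Raised when the instrumented program recurses deeper than allowed."""


def guarded_merge_sort(values, max_depth, depth=0):
    """Merge sort that aborts cleanly instead of overflowing the stack."""
    if depth > max_depth:
        raise DepthLimitExceeded(
            f"recursion depth {depth} exceeds limit {max_depth}"
        )
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = guarded_merge_sort(values[:mid], max_depth, depth + 1)
    right = guarded_merge_sort(values[mid:], max_depth, depth + 1)
    return list(heapq.merge(left, right))  # merge two already sorted halves


data = [5, 3, 8, 1, 9, 2, 7, 4]
print(guarded_merge_sort(data, max_depth=3))  # [1, 2, 3, 4, 5, 7, 8, 9]

try:
    guarded_merge_sort(data, max_depth=2)
except DepthLimitExceeded as exc:
    print("aborted:", exc)
```

A test generator can catch the exception, discard the run, and continue with the next random input.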
In conclusion, the paper presents a practical, metric‑driven framework for automated test data generation that scales to programs with variable‑length arrays and complex looping structures. By focusing on the longest path level and saturation level, the approach balances thoroughness with efficiency, making it suitable for real‑world software where exhaustive symbolic analysis is infeasible. Future work is proposed in three directions: adaptive adjustment of kL/kS during testing, hybridization with symbolic execution to target hard‑to‑reach paths, and large‑scale distributed evaluations to assess scalability in industrial settings.