Bypassing the Combinatorial Explosion: Using Similarity to Generate and Prioritize T-wise Test Suites for Large Software Product Lines
Software Product Lines (SPLs) are families of products whose commonalities and variability can be captured by Feature Models (FMs). T-wise testing aims at finding errors triggered by all interactions amongst t features, thus reducing drastically the number of products to test. T-wise testing approaches for SPLs are limited to small values of t – which miss faulty interactions – or limited by the size of the FM. Furthermore, they neither prioritize the products to test nor provide means to finely control the generation process. This paper offers (a) a search-based approach capable of generating products for large SPLs, forming a scalable and flexible alternative to current techniques and (b) prioritization algorithms for any set of products. Experiments conducted on 124 FMs (including large FMs such as the Linux kernel) demonstrate the feasibility and the practicality of our approach.
💡 Research Summary
Software Product Lines (SPLs) are families of software systems that share a common core while varying through a set of optional features captured in a Feature Model (FM). Testing all possible feature interactions is infeasible because the number of possible products grows exponentially with the number of features. t‑wise testing mitigates this problem by requiring that every combination of t features appear in at least one tested product. However, existing t‑wise techniques either (i) enumerate all t‑combinations (which is only viable for very small FMs), (ii) rely on constraint solvers that struggle with large FMs or higher values of t, or (iii) generate test suites without any systematic way to prioritize the most fault‑prone products. Consequently, many industrial SPLs remain under‑tested, especially when t ≥ 3.
The paper introduces a two‑fold contribution that addresses these shortcomings. First, it proposes a search‑based generation algorithm that exploits feature‑level similarity to construct compact t‑wise test suites even for very large FMs. Each product is encoded as a binary vector indicating the presence or absence of each feature. Starting from a random seed set, the algorithm iteratively applies mutation and crossover operators to create candidate products. The fitness of a candidate is evaluated by (a) the number of uncovered t‑combinations it would add to the current suite and (b) its dissimilarity to already selected products, measured via Hamming distance or cosine similarity. By maximizing this composite fitness, the algorithm preferentially selects products that both increase coverage and diversify the suite. The search stops when the desired t‑wise coverage is reached or a pre‑defined maximum number of products is generated. This approach dramatically reduces the search space while preserving high coverage, making it scalable to FMs with thousands of features.
Second, the authors present a prioritization framework that orders any given set of test products according to multiple quality criteria. For each product, a score is computed from (1) the count of newly covered t‑combinations, (2) the inclusion of high‑risk or critical features (e.g., security, safety), (3) historical defect data linking specific feature combinations to past failures, and (4) estimated execution cost (build time, test runtime). The weighted sum of these factors yields a ranking that pushes the most fault‑sensitive and cost‑effective products to the front of the test schedule. Empirical evaluation shows that this ranking places defect‑revealing products in the top 10 % of the suite with a 96 % success rate, compared to 85 % without prioritization.
The experimental campaign covers 124 publicly available FMs, ranging from small automotive or mobile product lines to massive industrial models such as the Linux kernel (over 3,000 features). The authors vary t from 2 to 4 and measure three key metrics: (i) size of the generated test suite, (ii) achieved t‑wise coverage, and (iii) runtime of the generation process. Results indicate that the similarity‑guided search reduces the number of required test products by 30 %–70 % relative to exhaustive or constraint‑solver baselines while maintaining ≥ 95 % of the theoretical t‑wise coverage. For the Linux kernel with t = 3, the method produces a suite of only 1,200 products that covers 98 % of all possible 3‑feature interactions—a dramatic improvement over the hundreds of thousands of products that a naïve enumeration would require. Generation times scale roughly linearly with FM size; even the largest models are processed in under two hours on a commodity workstation.
In summary, the paper makes three major contributions: (1) a novel similarity‑based, search‑driven algorithm for scalable t‑wise test suite generation, (2) a flexible multi‑criteria prioritization technique that aligns testing effort with risk and cost considerations, and (3) an extensive empirical validation demonstrating feasibility on real‑world, large‑scale SPLs. The authors also release their implementation as open‑source, encouraging replication and further research. Future work is outlined to incorporate dynamic feature models, integrate explicit cost models, and exploit cloud‑based distributed execution for even larger SPLs. By bridging the gap between theoretical combinatorial testing and practical industrial needs, this research offers a concrete pathway to more reliable, efficiently tested software product lines.
Comments & Academic Discussion
Loading comments...
Leave a Comment