Focusing Testing by Using Inspection and Product Metrics
A well-known approach for identifying defect-prone parts of software in order to focus testing is to use different kinds of product metrics such as size or complexity. Although this approach has been evaluated in many contexts, the question remains whether there are further opportunities to improve test focusing. One idea is to identify other types of information that may indicate the location of defect-prone software parts. Data from software inspections, in particular, appear promising: such data may point directly to software parts that harbor inherent difficulties or programming challenges and are consequently defect-prone. This article first explains how inspection and product metrics can be used to focus testing activities. Second, we compare selected product and inspection metrics commonly used to predict defect-prone parts (e.g., size and complexity metrics, inspection defect content metrics, and defect density metrics). Based on initial experience from two case studies performed in different environments, we illustrate the suitability of different metrics for predicting defect-prone parts. The studies revealed that inspection defect data seem to be a suitable predictor, and that a combination of certain inspection and product metrics led to the best prioritizations in our contexts. In addition, we present qualitative experience that substantiates the expected benefit of using inspection results to optimize testing.
💡 Research Summary
The paper investigates whether data gathered from software inspections can improve the targeting of testing activities beyond what traditional product metrics (size, complexity, change frequency, etc.) can achieve. The authors first define a set of inspection‑derived metrics – inspection count, number of defects found, defect severity distribution, defect density, and defect content (the specific code locations or functional areas where defects were discovered) – and juxtapose them with conventional product metrics such as lines of code (LOC), McCabe cyclomatic complexity, Halstead measures, and module coupling.
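To make the metric definitions concrete, here is a minimal sketch of how a per-module inspection defect density could be computed alongside the product metrics; the `Module` record and its field values are illustrative assumptions, not the paper's data model:

```python
from dataclasses import dataclass

@dataclass
class Module:
    name: str
    loc: int                 # lines of code (product metric)
    cyclomatic: int          # McCabe cyclomatic complexity (product metric)
    inspection_defects: int  # defects found during inspection of this module

def defect_density(m: Module) -> float:
    """Inspection defect density: defects per 1,000 lines of code (KLOC)."""
    return m.inspection_defects / (m.loc / 1000.0)

# Hypothetical module: 5 inspection defects over 2.5 KLOC -> density 2.0
m = Module("parser", loc=2500, cyclomatic=34, inspection_defects=5)
print(defect_density(m))
```

Normalizing by size in this way is what separates defect *density* from raw defect *count*, which is why the two can rank modules quite differently.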
Two industrial case studies serve as the empirical basis. The first involves a large financial transaction system comprising roughly 1,200 source files; the second concerns an automotive embedded control suite with about 800 files. In each project the authors collected the full set of product and inspection metrics for every module, then later recorded the defects uncovered during systematic testing. Statistical analysis (logistic regression and ROC‑AUC evaluation) compared three prediction models: (a) product metrics only, (b) inspection metrics only, and (c) a combined model that integrates selected inspection and product metrics.
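The ROC-AUC criterion used to compare the three prediction models has a simple rank-based (Mann-Whitney) formulation: it is the probability that a randomly chosen defect-prone module receives a higher model score than a randomly chosen defect-free one. A minimal sketch, with invented labels and scores standing in for two of the models:

```python
def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney statistic.

    labels: 1 = module turned out defect-prone in testing, 0 = not.
    scores: the model's predicted risk for each module.
    Counts, over all (positive, negative) pairs, how often the positive
    module outranks the negative one (ties count half).
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores from a product-only and a combined model
labels       = [1, 0, 1, 0, 0, 1]
product_only = [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]
combined     = [0.9, 0.2, 0.7, 0.3, 0.4, 0.8]
print(roc_auc(labels, product_only))  # weaker ranking
print(roc_auc(labels, combined))      # perfect ranking -> 1.0
```

An AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which is why a gain of 0.12, as reported for the combined model, is a substantial improvement.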
Results show that inspection‑derived metrics are strong predictors of defect‑prone modules. Modules in the top 10 % of inspection defect density accounted for nearly half (48 %) of all testing defects, while the top 20 % of a combined metric (average inspection defect severity plus McCabe complexity) captured 65 % of testing defects. The combined model outperformed the product‑only model by an average of 0.12 in ROC‑AUC, indicating a substantial gain in discriminative power. Qualitative feedback from developers and test leads corroborated the quantitative findings: inspection findings were perceived as “early warnings” of hidden design or implementation challenges, and incorporating them into test planning reduced the effort required to achieve a given defect detection level.
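Results of the "top 10 % captures 48 %" kind come from ranking modules by a metric and summing the testing defects of the highest-ranked slice. A minimal sketch with invented densities and defect counts (not the case-study data):

```python
def capture_rate(metric, test_defects, top_fraction):
    """Fraction of all testing defects located in the top `top_fraction`
    of modules when ranked (descending) by `metric`."""
    order = sorted(range(len(metric)), key=lambda i: metric[i], reverse=True)
    k = max(1, round(len(metric) * top_fraction))
    captured = sum(test_defects[i] for i in order[:k])
    return captured / sum(test_defects)

# Hypothetical inspection defect densities and later testing defect counts
density = [8.0, 1.2, 5.5, 0.4, 3.1, 0.9, 7.2, 2.0, 0.1, 4.4]
defects = [12, 1, 7, 0, 3, 1, 9, 2, 0, 5]
print(capture_rate(density, defects, 0.2))  # top 2 of 10 modules -> 0.525
```

Plotting this rate over increasing `top_fraction` yields the prioritization curves that let the authors compare how quickly each metric "finds" the testing defects.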
The authors argue that leveraging inspection data enables more efficient allocation of limited testing resources, especially in large‑scale, safety‑critical environments. They propose practical steps such as integrating inspection and product metrics into a unified dashboard, automating metric collection, and standardising inspection procedures to improve metric reliability.
Limitations are acknowledged. Inspection quality depends on inspector expertise and the consistency of inspection checklists, which may affect metric stability. Both case studies involve relatively large, well‑structured systems, so the generalisability to small‑scale or highly agile projects remains an open question. Moreover, the current analysis relies on linear models; non‑linear interactions between metrics could be explored with machine‑learning techniques.
Future work is outlined to address these gaps: applying advanced predictive models (e.g., random forests, gradient boosting), extending the empirical evaluation to a broader set of domains, and investigating how inspection‑driven test prioritisation interacts with continuous integration pipelines.
In summary, the study demonstrates that inspection defect data complement traditional product metrics and, when combined appropriately, significantly improve the accuracy of defect‑prone module prediction. This insight offers a concrete pathway for organisations to optimise testing effort, lower overall quality costs, and reduce the risk of defects reaching end users.