Predicting Defect Content and Quality Assurance Effectiveness by Combining Expert Judgment and Defect Data - A Case Study


Planning quality assurance (QA) activities in a systematic way and controlling their execution are challenging tasks for companies that develop software or software-intensive systems. Both require estimation capabilities regarding the effectiveness of the applied QA techniques and the defect content of the checked artifacts. Existing approaches for these purposes need extensive measurement data from historical projects. Because many companies do not collect enough data to apply these approaches (especially early in the project lifecycle), they typically base their QA planning and controlling solely on expert opinion. This article presents a hybrid method that combines commonly available measurement data and context-specific expert knowledge. To evaluate the method’s applicability and usefulness, we conducted a case study in the context of independent verification and validation activities for critical software in the space domain. A hybrid defect content and effectiveness model was developed for the software requirements analysis phase and evaluated with available legacy data. One major result is that the hybrid model provides improved estimation accuracy compared to applicable models based solely on data. The mean magnitude of relative error (MMRE) determined by cross-validation is 29.6% compared to 76.5% obtained by the most accurate data-based model.


💡 Research Summary

The paper tackles two fundamental estimation problems that are essential for systematic quality‑assurance (QA) planning and control in software‑intensive projects: (1) how many defects are likely to be present in a given artifact (defect content) and (2) how effective a chosen QA technique will be at detecting those defects (QA effectiveness). Existing quantitative approaches rely heavily on large historical datasets, which many organizations lack, especially during the early phases of a project. Consequently, most companies still base QA scheduling and monitoring primarily on expert intuition.

To bridge this gap, the authors propose a hybrid method that fuses whatever limited defect measurement data are available with context‑specific expert knowledge. The technical core of the approach is a Bayesian estimation framework. Measured defect counts and detection rates from past projects form the likelihood component, while expert assessments of project attributes—such as requirements complexity, team experience, safety criticality, and domain risk—are transformed into prior probability distributions. By doing so, the model automatically gives more weight to expert judgment when data are sparse and lets empirical evidence dominate as more data become available.
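The summary does not reproduce the paper's exact probabilistic formulation. One standard way to realize the described behavior, where expert priors dominate under sparse data and measurements take over as they accumulate, is a conjugate Gamma-Poisson model. The sketch below is an illustrative assumption, not the authors' published model; the function name and the "pseudo-observation" encoding of prior strength are hypothetical:

```python
# Hypothetical sketch: defect counts per artifact are modeled as Poisson,
# with a Gamma prior on the defect rate elicited from experts.
# Conjugacy makes the posterior update a simple closed form.

def gamma_poisson_posterior(prior_mean, prior_strength, defect_counts):
    """Combine an expert prior with observed defect counts.

    prior_mean     -- expert-estimated defects per artifact
    prior_strength -- number of pseudo-observations the prior is worth
    defect_counts  -- defects observed in comparable past artifacts
    """
    alpha = prior_mean * prior_strength   # Gamma shape parameter
    beta = prior_strength                 # Gamma rate parameter
    alpha_post = alpha + sum(defect_counts)
    beta_post = beta + len(defect_counts)
    return alpha_post / beta_post         # posterior mean defect estimate

# With no data, the estimate equals the expert prior:
print(gamma_poisson_posterior(10.0, 2.0, []))         # -> 10.0
# Observed counts pull the estimate toward the data:
print(gamma_poisson_posterior(10.0, 2.0, [4, 6, 5]))  # -> 7.0
```

The `prior_strength` parameter makes the data-versus-expert weighting explicit: a prior worth two pseudo-observations is quickly outweighed once three real measurements arrive.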

The method is evaluated through a case study carried out in the independent verification and validation (IV&V) of critical space‑domain software. The focus is the requirements‑analysis phase, for which only a modest legacy dataset exists (historical defect logs and detection efficiencies of reviews, static analysis, and testing). A panel of seven domain experts (requirements engineers, QA managers, system architects) provided 5‑point Likert ratings on several contextual factors. These ratings were normalized and encoded as priors in the Bayesian model.
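The paper's exact transformation from Likert ratings to prior distributions is not given in this summary. A plausible minimal encoding is a linear mapping of the panel's averaged rating onto an expected defect-density range; the mapping and the range bounds below are illustrative assumptions:

```python
def likert_to_prior_mean(ratings, low, high):
    """Map averaged 5-point Likert ratings (1..5) onto a defect-content
    range [low, high] to seed a prior mean (hypothetical linear scheme)."""
    avg = sum(ratings) / len(ratings)
    frac = (avg - 1) / 4            # normalize 1..5 onto 0..1
    return low + frac * (high - low)

# Seven experts rate a contextual factor such as requirements complexity;
# the bounds 2..12 defects are an assumed calibration range.
prior_mean = likert_to_prior_mean([3, 4, 4, 5, 3, 4, 4], low=2.0, high=12.0)
print(round(prior_mean, 2))  # roughly 9.14
```

In practice, each contextual factor would get its own weight, and the combined score would parameterize the prior rather than a single averaged rating.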

Model performance was assessed using k‑fold cross‑validation and compared against three baselines: (a) a pure data‑driven regression model, (b) a rule‑based estimator that uses only expert input, and (c) a traditional defect‑density model taken from the literature. The hybrid Bayesian model achieved a mean magnitude of relative error (MMRE) of 29.6 %, dramatically outperforming the best data‑only model, which recorded an MMRE of 76.5 %. The expert‑only estimator yielded an MMRE of 58 %, confirming that the combination of data and expert knowledge produces a synergistic improvement.
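MMRE is a standard accuracy metric: the mean of the absolute relative errors between actual and predicted values. A minimal implementation makes the reported 29.6% and 76.5% figures concrete:

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - pred| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Toy data (not from the paper): three artifacts with known defect counts
# and model predictions.
print(mmre([10, 20, 30], [13, 16, 33]))  # average relative error of about 0.2
```

An MMRE of 0.296 thus means the hybrid model's defect estimates deviate from the true values by roughly 30% on average across the cross-validation folds.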

Key insights emerging from the study include:

  1. Practicality under data scarcity – Even with limited historical defect information, integrating expert priors yields reliable defect‑content and effectiveness estimates, enabling more informed QA resource allocation.
  2. Dynamic uncertainty handling – The Bayesian formulation quantifies uncertainty and automatically adjusts the influence of data versus expert opinion as the project progresses and more measurements become available.
  3. Scalability across development stages – Although demonstrated only for requirements analysis, the approach is conceptually extensible to design, implementation, and test phases, provided appropriate expert factors are identified.
  4. Potential for automation – The authors suggest future work on tool support that captures expert assessments via structured surveys and links them to real‑time defect‑tracking systems, allowing continuous model updates.
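The dynamic uncertainty handling in insight 2 can be sketched under an assumed conjugate Gamma-Poisson model (again an illustrative stand-in for the paper's formulation): the posterior standard deviation shrinks as each measurement arrives, so the expert prior's influence fades automatically:

```python
import math

def posterior_stats(alpha, beta, counts):
    """Gamma(alpha, beta) prior on the defect rate, Poisson-distributed
    counts. Returns the posterior mean and standard deviation
    (hypothetical model, not the paper's published one)."""
    a = alpha + sum(counts)
    b = beta + len(counts)
    return a / b, math.sqrt(a) / b

# Simulated defect measurements streaming in during the project.
stream = [6, 4, 7, 5, 6, 5]
for i in range(len(stream) + 1):
    mean, sd = posterior_stats(8.0, 1.0, stream[:i])
    print(f"after {i} observations: mean={mean:.2f}, sd={sd:.2f}")
# The standard deviation decreases with every observation, quantifying
# how the model shifts weight from expert opinion toward empirical data.
```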

The paper concludes that a hybrid expert‑data model can substantially improve estimation accuracy over traditional data‑only techniques, especially in early‑life‑cycle contexts where measurement data are insufficient. This finding supports a broader shift toward integrated, evidence‑based QA planning that leverages both quantitative metrics and qualitative expertise, offering a viable path for organizations seeking higher assurance without the overhead of extensive historical data collection.

