Supporting Defect Causal Analysis in Practice with Cross-Company Data on Causes of Requirements Engineering Problems

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

[Context] Defect Causal Analysis (DCA) represents an efficient practice to improve software processes. While knowledge on cause-effect relations is helpful to support DCA, collecting cause-effect data may require significant effort and time. [Goal] We propose and evaluate a new DCA approach that uses cross-company data to support the practical application of DCA. [Method] We collected cross-company data on causes of requirements engineering problems from 74 Brazilian organizations and built a Bayesian network. Our DCA approach uses the diagnostic inference of the Bayesian network to support DCA sessions. We evaluated our approach by applying a model for technology transfer to industry and conducted three consecutive evaluations: (i) in academia, (ii) with industry representatives of the Fraunhofer Project Center at UFBA, and (iii) in an industrial case study at the Brazilian National Development Bank (BNDES). [Results] We received positive feedback in all three evaluations, and the cross-company data was considered helpful for determining main causes. [Conclusions] Our results strengthen our confidence that supporting DCA with cross-company data is promising and should be further investigated.


💡 Research Summary

The paper addresses the challenge of efficiently performing Defect Causal Analysis (DCA) in the requirements engineering phase of software development. Traditional DCA relies heavily on organization‑specific experience, making it difficult for new teams or projects to identify root causes quickly. To overcome this limitation, the authors propose a cross‑company data‑driven approach that leverages Bayesian networks to support DCA sessions.

Data collection involved a survey of 74 Brazilian organizations, yielding over 1,200 instances of requirements‑related problems and 45 associated cause categories. After cleaning and structuring the data, the authors built a Bayesian network that models probabilistic dependencies among problem types and potential causes. The network’s structure was informed by both expert interviews and statistical learning, and conditional probability tables were estimated using maximum‑likelihood methods.
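
The maximum-likelihood step mentioned above can be illustrated with a minimal sketch. It assumes the survey has been reduced to (cause, problem) pairs and estimates P(problem | cause) as a relative frequency. The function name `estimate_cpt` and the sample records are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter, defaultdict


def estimate_cpt(records):
    """Maximum-likelihood estimate of P(problem | cause).

    records: iterable of (cause, problem) pairs.
    Returns a nested dict: cpt[cause][problem] -> relative frequency.
    """
    records = list(records)
    pair_counts = Counter(records)
    cause_counts = Counter(cause for cause, _ in records)
    cpt = defaultdict(dict)
    for (cause, problem), n in pair_counts.items():
        # The MLE for a categorical distribution is count / total per parent value.
        cpt[cause][problem] = n / cause_counts[cause]
    return dict(cpt)


# Hypothetical survey answers for illustration only (assumption, not study data).
records = [
    ("lack of time", "incomplete requirements"),
    ("lack of time", "incomplete requirements"),
    ("lack of time", "moving targets"),
    ("weak domain knowledge", "incomplete requirements"),
]
cpt = estimate_cpt(records)
```

With sparse data, plain relative frequencies assign zero probability to unseen combinations; a real implementation would likely add smoothing, which this sketch leaves out.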

In the proposed DCA workflow, a practitioner enters a specific defect; the Bayesian network then performs diagnostic inference, computing posterior probabilities for each cause node. Causes with the highest probabilities are presented as a ranked list, allowing analysts to focus their investigation on the most likely factors. The system also supports incremental learning: new cases can be added to refine the network over time.
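
The diagnostic inference described above amounts to applying Bayes' rule from an observed problem back to its candidate causes. The sketch below is a deliberate simplification, not the authors' network: it assumes a two-layer model in which exactly one cause is active at a time, and all priors, likelihoods, and the function name `rank_causes` are hypothetical values chosen for illustration.

```python
def rank_causes(priors, likelihoods, problem):
    """Rank causes by posterior P(cause | problem) using Bayes' rule.

    priors: dict cause -> P(cause)
    likelihoods: dict cause -> dict problem -> P(problem | cause)
    Assumes mutually exclusive causes (a simplification of a full network).
    """
    unnormalized = {
        cause: priors[cause] * likelihoods.get(cause, {}).get(problem, 0.0)
        for cause in priors
    }
    evidence = sum(unnormalized.values())  # P(problem)
    if evidence == 0:
        return []  # the observed problem is impossible under this model
    posterior = {cause: value / evidence for cause, value in unnormalized.items()}
    # Highest posterior first, as in a ranked list shown to analysts.
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


# Hypothetical numbers for illustration only (assumption, not study data).
priors = {"lack of time": 0.5, "weak domain knowledge": 0.3, "unclear roles": 0.2}
likelihoods = {
    "lack of time": {"incomplete requirements": 0.6},
    "weak domain knowledge": {"incomplete requirements": 0.8},
    "unclear roles": {"incomplete requirements": 0.1},
}
ranking = rank_causes(priors, likelihoods, "incomplete requirements")
```

In this toy setup "lack of time" ranks first even though "weak domain knowledge" has the higher likelihood, because the prior also contributes to the posterior. A full Bayesian network with several interacting causes would need a general inference algorithm such as variable elimination rather than this closed-form update.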

The approach was evaluated in three successive settings. First, a university pilot with 15 graduate students demonstrated that the tool could surface major causes within five minutes on average. Second, a workshop with 12 industry experts from the Fraunhofer Project Center at UFBA yielded a 92 % positive rating regarding the relevance of the suggested causes. Finally, an industrial case study at the Brazilian National Development Bank (BNDES) showed a 30 % reduction in DCA session duration and an 85 % or higher agreement with expert‑validated cause lists.

These results indicate that cross‑company data, when encoded in a Bayesian network, can substantially improve the speed and objectivity of DCA in practice. The authors acknowledge several limitations: the dataset is geographically confined to Brazil, so cultural and sectoral differences may affect generalizability; the network structure depends heavily on expert input, which could hinder replication in other domains; and the current model focuses solely on requirements engineering, leaving other development phases unexplored.

In conclusion, the study validates the feasibility and benefits of a data‑driven, probabilistic DCA support tool. Future work will aim to broaden the data pool across regions and industries, incorporate automated structure learning techniques, and extend the methodology to cover design, implementation, and testing stages, thereby enhancing the overall robustness of software process improvement initiatives.

