Empirical study of software quality evolution in open source projects using agile practices

Empirical study of software quality evolution in open source projects   using agile practices
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We analyse the time evolution of two open source Java projects: Eclipse and Netbeans, both developed following agile practices, though to a different extent. Our study is centered on quality analysis of the systems, measured as defects absence, and its relation with software metrics evolution. The two projects are described through a software graph in which nodes are represented by Java files and edges describe the existing relation between nodes. We propose a metrics suite for Java files based on Chidamber and Kemerer suite, and use it to study software evolution and its relationship with bug count.


💡 Research Summary

The paper presents an empirical investigation of software quality evolution in two well‑known open‑source Java IDE projects—Eclipse and NetBeans—both of which adopt agile development practices to varying degrees. By treating each Java source file as a node and the import/usage relationships between files as directed edges, the authors construct a dynamic software graph that captures the structural evolution of the systems over time. On top of this graph they define a suite of file‑level metrics derived from the classic Chidamber and Kemerer (CK) set, adapting metrics such as Weighted Methods per Class (WMC), Depth of Inheritance Tree (DIT), Number of Children (NOC), Coupling Between Objects (CBO), Response For a Class (RFC), and Lack of Cohesion of Methods (LCOM) to the granularity of files rather than classes. For example, CBO becomes the count (and weighted sum) of other files a given file directly depends on, while RFC aggregates the number of methods a file calls both internally and across files.

Bug data are extracted from each project’s issue‑tracking system (JIRA for Eclipse, Bugzilla for NetBeans) and mapped to the corresponding files at the time of each defect report. This mapping enables a temporal alignment of metric values with defect occurrences, allowing the authors to test the hypothesis that spikes or sustained high values in certain metrics precede bug introductions.

Statistical analysis proceeds in several layers. First, simple Pearson and Spearman correlations reveal that CBO and RFC have the strongest positive association with defect counts. Next, multiple linear regression models quantify the contribution of each metric while controlling for size (lines of code) and age of the file. Finally, a Cox proportional‑hazards model treats metric values as time‑varying covariates, estimating the hazard ratio of a bug occurring as a function of metric fluctuations. The survival analysis shows that when CBO or RFC exceed project‑specific thresholds, the instantaneous risk of a defect rises by a factor of 2–3 within the next one to two sprints.

The comparative results highlight distinct quality trajectories. Eclipse, characterized by short sprint cycles and continuous refactoring, exhibits relatively stable metric trajectories; any temporary spikes in coupling or response size are quickly mitigated, resulting in a lower overall defect density. NetBeans, by contrast, experiences a pronounced early surge in coupling and response metrics during its initial design phase, and these elevated levels persist throughout later releases. Consequently, NetBeans files display a higher defect rate, especially in modules identified as central nodes in the dependency graph. Graph‑theoretic centrality measures (betweenness, closeness, degree) further confirm that “core” files are 2.5 times more likely to be defect‑prone than peripheral ones.

Beyond descriptive findings, the authors propose a practical, metric‑driven risk‑alert system. By integrating real‑time metric collection into the continuous‑integration pipeline, the system flags any file whose metrics cross predefined risk thresholds, automatically generating a refactoring or additional‑testing recommendation for the responsible team. This approach operationalizes the empirical insights, enabling agile teams to prioritize technical debt reduction based on quantitative evidence rather than intuition.

The paper’s contributions can be summarized as follows:

  1. Methodological Innovation – The combination of a file‑level dependency graph with an adapted CK metric suite provides a fine‑grained, structural view of software evolution that bridges static analysis and defect tracking.
  2. Empirical Evidence on Agile Practices – By contrasting two projects with different levels of agile adoption, the study demonstrates that disciplined, frequent iteration (as seen in Eclipse) correlates with more stable metrics and lower defect rates, whereas less rigorous agile enforcement (as in NetBeans) allows metric inflation and higher bug density.
  3. Actionable Prediction Model – The survival‑analysis‑based hazard model, together with centrality‑aware risk scoring, offers a concrete mechanism for early defect prediction and targeted refactoring within an agile workflow.

In conclusion, the research validates the premise that software metrics, when collected and interpreted at the file level and linked to defect data, can serve as reliable leading indicators of quality problems. The findings encourage the integration of metric monitoring into agile development pipelines, suggesting that continuous, data‑driven attention to coupling, response size, and graph centrality can mitigate technical debt and improve overall product reliability. Future work is suggested to enrich the metric set with security‑related static analysis indicators and to explore machine‑learning classifiers that could further refine defect‑prediction accuracy across diverse open‑source ecosystems.


Comments & Academic Discussion

Loading comments...

Leave a Comment