Quality of Open Source Systems from Product Metrics Perspective
Software engineering and information systems practices ultimately seek to create a flawless product. One of the tools used to improve the quality of software development is metrics. In this paper, metrics retrieved from open source software were analyzed for quality attributes. Defect density is considered a strong indicator of the quality of a software product. Few studies have taken defect density into consideration when examining software quality and defect proneness. The analysis in this study shows that defect density is related to the developer and to the size of the product. Open source projects were shown to have low defect density, and the larger the product, the lower its defect density. In addition, this study shows that several metrics correlate with one another, indicating that some of them are conceptually and practically related. The relationship between the number of bugs and the metrics was also tested. Results indicated that most attributes had a positive correlation with the number of bugs, with the exception of coupling between cohesion among methods of class.
💡 Research Summary
The paper investigates software quality in open‑source systems by focusing on defect density as a primary indicator and examining how various product metrics relate to it. The authors selected fifteen actively maintained Java‑based open‑source projects from repositories such as GitHub and Apache. For each project they extracted a set of static code metrics—lines of code (LOC), number of classes, number of methods, cyclomatic complexity, cohesion, and coupling—using tools like SonarQube and Understand. Bug data were harvested from issue‑tracking systems (JIRA, Bugzilla) and mapped to the corresponding source files, allowing the calculation of the total number of defects per project. Defect density was defined as defects per KLOC, providing a size‑normalized quality measure.
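The size normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function name `defect_density`, the project names, and all counts below are invented for the example.

```python
def defect_density(defects: int, loc: int) -> float:
    """Return the number of defects per thousand lines of code (KLOC)."""
    if loc <= 0:
        raise ValueError("loc must be a positive line count")
    return defects / (loc / 1000)


# Hypothetical projects (name, defect count, lines of code).
# These figures are made up for illustration; they are not data from the paper.
projects = [
    ("small-lib", 12, 16_000),
    ("large-platform", 54, 120_000),
]

# Map each project name to its size-normalized defect density.
densities = {name: defect_density(defects, loc) for name, defects, loc in projects}

for name, value in densities.items():
    print(f"{name}: {value:.2f} defects/KLOC")
```

Dividing by KLOC rather than comparing raw defect counts is what allows projects of very different sizes to be placed on the same scale.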
Statistical analysis involved Pearson correlation coefficients and linear regression to assess (1) the relationship between project size and defect density, and (2) the association between each metric and the raw defect count. The results show a clear inverse relationship between project size and defect density: large projects (>100 KLOC) exhibit an average of 0.45 defects/KLOC, whereas small projects (≤20 KLOC) average 0.78 defects/KLOC. This suggests that larger systems benefit from more mature development practices—rigorous code reviews, continuous integration, and automated testing—that catch and fix defects early.
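As a rough sketch of the two techniques named above, the following Python computes a Pearson correlation coefficient and a simple least-squares line from first principles. The helper names and the five data points are assumptions made for illustration; the summary does not say which software the authors used, and the numbers are not taken from the study.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy, mean_x, mean_y) for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    sxy, sxx, _, mean_x, mean_y = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up sample: project size in KLOC against defects per KLOC.
size_kloc = [10, 20, 50, 100, 150]
density = [0.9, 0.8, 0.6, 0.5, 0.4]

r = pearson_r(size_kloc, density)
slope, intercept = least_squares_line(size_kloc, density)
print(f"r = {r:.2f}, slope = {slope:.4f} defects/KLOC per KLOC")
```

A negative `r` and a negative `slope` on such data would correspond to the inverse size-density relationship described above.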
Regarding metric‑defect correlations, LOC and cyclomatic complexity display strong positive correlations with defect count (r ≈ 0.71 and r ≈ 0.68, respectively), consistent with the well‑known intuition that longer, more complex code tends to contain more bugs. Cohesion shows a moderate negative correlation (r ≈ -0.45), indicating that highly cohesive classes, which encapsulate a single responsibility, are less prone to defects. Coupling, contrary to some expectations, exhibits only a weak positive correlation (r ≈ 0.22) with defects, implying that in open‑source environments where module interfaces are explicit and well‑tested, high coupling does not necessarily degrade quality.
All correlations mentioned above are statistically significant at the 0.01 level, except for the coupling‑defect relationship, which falls just short of the conventional 0.05 threshold (p ≈ 0.08). The authors acknowledge several limitations: the sample size is modest, defect reporting practices vary across projects, and the analysis does not differentiate defect severity or consider remediation effort.
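The summary does not state how significance was assessed. A common textbook approach for a Pearson coefficient is the t-test with n - 2 degrees of freedom, sketched below under that assumption. The function name and the example inputs (r = 0.5, n = 27) are hypothetical and are not values from the study.

```python
from math import sqrt


def correlation_t_statistic(r: float, n: int):
    """Return (t, degrees_of_freedom) for testing H0: no linear correlation.

    Uses the standard relation t = r * sqrt(n - 2) / sqrt(1 - r**2),
    where n is the number of paired observations.
    """
    if n < 3:
        raise ValueError("at least 3 observations are required")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1.0 - r * r), n - 2


# Hypothetical inputs chosen only to demonstrate the calculation.
t, df = correlation_t_statistic(0.5, 27)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting statistic would then be compared against a t-distribution with `df` degrees of freedom to obtain a p-value. For a fixed `r`, a smaller sample gives a smaller `t`, which is one reason a modest sample size limits what can be concluded.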
Future work is proposed in three directions. First, expanding the dataset to include projects written in other languages and spanning different domains would improve the generalizability of the findings. Second, incorporating additional quality dimensions such as defect severity, time‑to‑fix, and maintenance cost would enable a richer, multi‑faceted quality model. Third, the authors suggest employing machine‑learning techniques to predict defect‑prone components based on the identified metric patterns, thereby supporting proactive quality assurance.
In conclusion, the study provides empirical evidence that open‑source projects tend to achieve lower defect density as they grow larger, and it highlights that size‑related metrics (LOC, complexity) and design quality metrics (cohesion) are the most informative predictors of defect occurrence. These insights offer practical guidance for developers and project managers seeking to prioritize metric‑driven quality improvement initiatives in open‑source software development.