An Analysis of Bug Distribution in Object Oriented Systems

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

We introduce a new approach to describe Java software as a graph, where nodes represent Java files, called compilation units (CUs), and edges represent the relations between them. The software system is characterized by the degree distributions of graph properties, such as in- and out-links, as well as by the distributions of the Chidamber and Kemerer metrics computed on its CUs. Every CU can be related to one or more bugs during its life. We find a relationship between the software system and the bugs hitting its nodes. We find that the distributions of some metrics, of the number of bugs per CU, and of the number of CUs affected by a specific bug exhibit power-law behavior in their tails. We examine the evolution of software metrics across different releases to understand how the relationships between CU metrics and CU faultiness change over time.
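The abstract relies on two counts derived from the bug-CU association: bugs per CU and CUs per bug. The abstract does not say how that association is built, so the minimal Python sketch below assumes each bug is associated with the set of CUs changed to fix it; the file names and bug IDs are hypothetical.

```python
from collections import Counter

# Hypothetical bug-to-CU association: bug ID -> CUs (Java files) changed by its fix
bug_to_cus = {
    "BUG-1": ["Parser.java", "Lexer.java"],
    "BUG-2": ["Parser.java"],
    "BUG-3": ["Parser.java", "Ast.java", "Lexer.java"],
    "BUG-4": ["Util.java"],
}

def bugs_per_cu(mapping):
    """Number of distinct bugs associated with each CU."""
    counts = Counter()
    for cus in mapping.values():
        counts.update(set(cus))  # set(): count each bug at most once per CU
    return dict(counts)

def cus_per_bug(mapping):
    """Number of distinct CUs affected by each bug."""
    return {bug: len(set(cus)) for bug, cus in mapping.items()}

print(sorted(bugs_per_cu(bug_to_cus).items()))
# → [('Ast.java', 1), ('Lexer.java', 2), ('Parser.java', 3), ('Util.java', 1)]
print(cus_per_bug(bug_to_cus))
# → {'BUG-1': 2, 'BUG-2': 1, 'BUG-3': 3, 'BUG-4': 1}
```

The two resulting value collections are exactly the samples whose tails the abstract describes as power-law distributed.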


💡 Research Summary

The paper proposes a novel way to model Java-based object-oriented systems as a graph whose nodes are compilation units (CUs), essentially individual source files, and whose edges capture various static relationships such as imports, inheritance, interface implementation, and method calls. By treating each CU as a vertex, the authors can apply network-science metrics (degree, centrality, clustering) to capture the structural complexity of the whole system. In parallel, they compute the classic Chidamber and Kemerer (CK) suite of object-oriented metrics (WMC, DIT, NOC, CBO, RFC, LCOM) for every CU and link each CU to the bugs reported against it, using data from an issue-tracking system (e.g., JIRA, Bugzilla).
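A minimal sketch of the CU graph and its degree statistics, assuming edges are given as (source, target) pairs meaning "the source CU depends on the target CU". The edge list, file names, and the `coupling_proxy` helper are hypothetical; in particular, `coupling_proxy` is only a rough CU-level stand-in for CBO, not the CK metric the authors compute.

```python
from collections import Counter, defaultdict

# Hypothetical CU dependency edges: (source, target) = "source CU depends on target CU"
edges = [
    ("Parser.java", "Lexer.java"),
    ("Parser.java", "Ast.java"),
    ("Compiler.java", "Parser.java"),
    ("Compiler.java", "Ast.java"),
    ("Lexer.java", "Util.java"),
    ("Parser.java", "Util.java"),
]

def degrees(edge_list):
    """In-degree and out-degree of every CU, counting each distinct edge once."""
    nodes = {cu for edge in edge_list for cu in edge}
    in_deg = dict.fromkeys(nodes, 0)
    out_deg = dict.fromkeys(nodes, 0)
    for src, dst in set(edge_list):
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

def coupling_proxy(edge_list):
    """Distinct other CUs linked to each CU in either direction.
    A rough CU-level stand-in for CBO, not the CK definition."""
    neighbours = defaultdict(set)
    for src, dst in edge_list:
        if src != dst:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    return {cu: len(s) for cu, s in neighbours.items()}

in_deg, out_deg = degrees(edges)
# Degree distribution: number of CUs having each in-degree value
in_degree_distribution = Counter(in_deg.values())
print(sorted(in_degree_distribution.items()))  # → [(0, 1), (1, 2), (2, 2)]
print(coupling_proxy(edges)["Parser.java"])    # → 4
```

On a real system the edge list would come from parsing the Java sources; the distribution of in- and out-degree values is what the paper characterizes.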

The empirical study spans several releases of large open-source Java projects. For each release, the authors reconstruct the CU graph, calculate the CK metrics, and extract a bug-CU mapping. Statistical analysis of the degree distributions, the CK metric values, and the number of bugs per CU shows that several of these quantities, including the number of bugs per CU, exhibit heavy-tailed, power-law behavior in their right tails. In other words, while most CUs have low coupling, shallow inheritance, and few bugs, a small minority of “high-risk” CUs account for a disproportionate share of the overall complexity and defect count. Moreover, the number of CUs affected by a single bug also follows a power law in its tail, indicating that a few bugs touch many CUs at once.
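The power-law claim concerns the right tail. One common way to inspect and fit such a tail is to plot the empirical complementary CDF on log-log axes and estimate the exponent by maximum likelihood. The sketch below uses the discrete-data approximation from Clauset, Shalizi and Newman (2009); it is not necessarily the fitting method the authors used, it assumes `x_min` is chosen beforehand, and the `bugs` sample is hypothetical.

```python
import math
from bisect import bisect_left

def ccdf(values):
    """Empirical complementary CDF: (x, P(X >= x)) for each distinct value x."""
    s = sorted(values)
    n = len(s)
    # bisect_left gives the number of values strictly below x
    return [(x, (n - bisect_left(s, x)) / n) for x in sorted(set(s))]

def powerlaw_alpha_discrete(values, x_min):
    """Approximate MLE of alpha for a discrete power-law tail p(x) ~ x**(-alpha),
    x >= x_min, via alpha ~= 1 + n / sum(ln(x_i / (x_min - 0.5))).
    Assumes integer data (e.g. bugs per CU) and x_min >= 1."""
    if x_min < 1:
        raise ValueError("x_min must be >= 1")
    tail = [v for v in values if v >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(v / (x_min - 0.5)) for v in tail)
    return 1.0 + len(tail) / log_sum

# Hypothetical bugs-per-CU counts, for illustration only
bugs = [0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 5, 8, 13]
print(ccdf(bugs))
print(powerlaw_alpha_discrete(bugs, x_min=1))
```

In practice `x_min` is usually chosen by minimizing the Kolmogorov-Smirnov distance between the data and the fitted model; that step is omitted here to keep the sketch short.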

Temporal analysis across releases shows a strong correlation between changes in CK metrics and subsequent bug incidence. In particular, CUs whose Coupling Between Object classes (CBO) rises sharply between two consecutive releases show a bug-appearance probability roughly 2.5 times higher than that of the average CU. Similarly, an increase in Lack of Cohesion of Methods (LCOM), i.e., reduced cohesion, is associated with an elevated defect rate. Regression and correlation tests confirm that these relationships are statistically significant. CUs with large metric fluctuations repeatedly appear as “hot spots” in the bug-CU mapping, suggesting that metric volatility can serve as an early-warning indicator of future defects.
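The release-over-release comparison can be expressed as a ratio of bug-appearance rates. A minimal sketch, assuming per-CU CBO values for two consecutive releases and the set of CUs with bugs reported after the second release. All data and the `threshold` that defines a "sharp" rise are hypothetical, since the summary does not state how the authors define it.

```python
def cbo_delta(cbo_old, cbo_new):
    """CBO change for every CU present in both releases."""
    common = cbo_old.keys() & cbo_new.keys()
    return {cu: cbo_new[cu] - cbo_old[cu] for cu in common}

def bug_rate_ratio(deltas, buggy, threshold):
    """Bug-appearance rate among CUs whose CBO rose by at least `threshold`,
    divided by the rate among all CUs in `deltas`. Returns None if undefined."""
    rising = [cu for cu, d in deltas.items() if d >= threshold]
    if not rising:
        return None
    rate_rising = sum(cu in buggy for cu in rising) / len(rising)
    rate_all = sum(cu in buggy for cu in deltas) / len(deltas)
    return rate_rising / rate_all if rate_all > 0 else None

# Hypothetical data for two consecutive releases
cbo_r1 = {"A.java": 3, "B.java": 5, "C.java": 2, "D.java": 7, "E.java": 4}
cbo_r2 = {"A.java": 9, "B.java": 5, "C.java": 2, "D.java": 12, "E.java": 4}
buggy_after_r2 = {"A.java", "C.java"}  # CUs with a bug reported after release 2

deltas = cbo_delta(cbo_r1, cbo_r2)
print(bug_rate_ratio(deltas, buggy_after_r2, threshold=5))  # → 1.25
```

Here two of the five CUs (A and D) rise by at least 5; one of them is buggy, giving a rate of 0.5 against 0.4 overall, so the ratio is 1.25. A significance test on the underlying counts would still be needed before reading anything into such a ratio.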

Based on these findings, the authors recommend several practical actions for software engineers. First, design guidelines should explicitly aim to keep both CBO and LCOM low (i.e., promote low coupling and high cohesion) from the outset. Second, automated monitoring of CK metric trends after each release can flag CUs whose metrics change sharply, prompting targeted testing, code review, or refactoring. Third, CUs identified as bug hot spots should be prioritized for regular maintenance to prevent defects from spreading.
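Monitoring of this kind could be automated with a simple outlier rule. The sketch below flags CUs whose change in a metric between releases exceeds the mean change by more than `k` standard deviations; the rule, the default `k = 2.0`, and the data are hypothetical choices, not the authors' procedure. Only increases are flagged because, for CBO and LCOM, higher values are the unfavorable direction.

```python
import statistics

def flag_volatile(deltas, k=2.0):
    """Return, sorted, the CUs whose metric change exceeds mean + k * stdev.

    `deltas` maps CU name -> (metric in new release - metric in old release).
    Uses the population standard deviation over all CUs considered.
    """
    values = list(deltas.values())
    if len(values) < 2:
        return []
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # no variation, nothing stands out
    return sorted(cu for cu, d in deltas.items() if d > mu + k * sd)

# Hypothetical CBO changes: nine stable CUs and one whose coupling jumped
cbo_changes = {f"Stable{i}.java": 0 for i in range(9)}
cbo_changes["Hot.java"] = 10
print(flag_volatile(cbo_changes))  # → ['Hot.java']
```

With very few CUs, or a single extreme change, the standard deviation is inflated by the outlier itself; a robust alternative such as the median absolute deviation may work better in that case.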

In summary, the paper makes three key contributions: (1) it introduces a CU-level graph representation that enables network-theoretic analysis of object-oriented systems; (2) it shows that the tails of several structural-metric and defect distributions follow power laws, consistent with the Pareto (“80/20”) principle at the source-file level; and (3) it provides empirical evidence that metric evolution across releases is closely linked to bug emergence, offering a quantitative basis for proactive quality assurance. Together, these results provide a scalable framework for predicting software quality, reducing maintenance costs, and managing risk in large-scale object-oriented projects.

