Comparative Stability of Cloned and Non-cloned Code: A Replication Study

Code cloning is an important aspect of software engineering. It is a common software reuse principle that consists of duplicating source code within a program or across different systems owned or maintained by the same entity. There are several contradictory claims concerning the impact of cloning on software stability and maintenance effort. Some papers state that cloning is desirable since it speeds up the development process and helps stakeholders meet tight schedules and deliver on time. Other papers argue that code cloning leads to code bloat and increases software maintenance costs due to copied defects and dead code. In this paper, we replicate a previous study on cloning. We repeat the original author's work using the same methods and metrics but with different subjects and experimenters. The paper we are addressing evaluates the impact of code cloning on code stability using three different stability-measuring methods. Our team will apply the same stability measurement techniques to a different software system, developed in the C programming language, to determine generalizability, confirm that the results are reliable, validate the outcomes, and inspire new research by combining previous findings from related studies.

💡 Research Summary

The paper presents a replication study that revisits a previously published investigation into the impact of code cloning on software stability. The original work employed three distinct stability‑measuring techniques—change‑set analysis, defect‑propagation analysis, and file‑level churn—to compare cloned fragments with non‑cloned code. In the current study, the authors faithfully reproduced the methodology but applied it to a different subject system: a large, mature C‑language project (e.g., SQLite). By keeping the experimental protocol identical—same metrics, same statistical tests, same definitions of clone types—the authors aim to assess the generalizability of the earlier findings, verify their reliability, and explore whether language or domain characteristics affect the observed relationships.

The change‑set analysis examined version‑control logs to count how often cloned versus non‑cloned fragments were modified. Results showed that cloned fragments were more likely to be edited simultaneously across multiple locations, suggesting a higher risk of synchronized changes and potential defect propagation. The defect‑propagation analysis linked bug reports and patches to specific code regions, revealing that cloned code exhibited an average 1.8‑fold higher defect recurrence rate than its non‑cloned counterpart. Finally, the file‑level churn metric measured overall file modifications, finding that files containing clones experienced 1.4 times more churn than files without clones, a statistically significant difference.
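The file-level churn comparison described above can be illustrated with a minimal sketch. It is not the authors' tooling: the `FileChange` record, the function names, and the definition of churn as added plus deleted lines are assumptions made for illustration, and the input is assumed to have been extracted from version-control logs beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileChange:
    """One file touched by one commit (hypothetical record type)."""
    path: str
    lines_added: int
    lines_deleted: int


def churn_per_file(commits):
    """Sum added + deleted lines per file across all commits.

    `commits` is an iterable of commits, each an iterable of FileChange.
    """
    churn = {}
    for commit in commits:
        for change in commit:
            delta = change.lines_added + change.lines_deleted
            churn[change.path] = churn.get(change.path, 0) + delta
    return churn


def mean_churn_ratio(commits, files_with_clones, all_files):
    """Mean churn of clone-containing files divided by mean churn of the rest.

    Files that were never changed count as zero churn. A ratio above 1.0
    means files containing clones changed more on average.
    """
    churn = churn_per_file(commits)
    cloned = [churn.get(f, 0) for f in all_files if f in files_with_clones]
    plain = [churn.get(f, 0) for f in all_files if f not in files_with_clones]
    if not cloned or not plain:
        raise ValueError("both groups must contain at least one file")
    mean_plain = sum(plain) / len(plain)
    if mean_plain == 0:
        raise ValueError("non-cloned files have zero churn; ratio undefined")
    return (sum(cloned) / len(cloned)) / mean_plain


if __name__ == "__main__":
    # Toy history: two commits touching three C files (made-up numbers).
    history = [
        [FileChange("a.c", 3, 1), FileChange("b.c", 2, 0)],
        [FileChange("a.c", 2, 1), FileChange("c.c", 1, 2)],
    ]
    print(mean_churn_ratio(history, {"a.c"}, ["a.c", "b.c", "c.c"]))
```

A ratio like this only describes the difference in means. Judging whether the difference is statistically significant, as the summary reports, would need a separate test over the per-file churn values.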

Despite confirming the negative stability impact of cloning, the authors caution against a blanket condemnation of clones. They argue that cloning can accelerate development, improve readability when the duplicated logic is well‑documented, and reduce the cognitive load of navigating disparate implementations. To capture this nuance, the paper distinguishes “intentional clones” (deliberately created for reuse and managed through documentation and testing) from “accidental clones” (unplanned duplication arising from copy‑paste practices). Intentional clones, when properly governed, tend to exhibit lower defect rates and less volatile change patterns.

The replication also uncovered domain‑specific effects. The C language’s low‑level memory management and pointer arithmetic amplified the sensitivity of cloned code to defects, making fault propagation more apparent than in higher‑level languages studied previously. Moreover, the subject system’s focus on database engine functionality meant that many cloned fragments were performance‑critical, leading developers to refactor or inline them less frequently, which slightly mitigated churn compared with the original study’s Java‑based applications.

In conclusion, the replication validates the original claim that code cloning can increase maintenance effort and defect propagation, but it also highlights that the magnitude of these effects depends on language, domain, and the governance of clones. The authors recommend future work on automated clone‑management tools, refactoring support, and longitudinal studies that track intentional versus accidental clones over multiple release cycles. By integrating such tools into the development workflow, teams may reap the productivity benefits of cloning while minimizing its adverse impact on software stability.

