An automatic architecture reconstruction and refactoring framework

An automatic architecture reconstruction and refactoring framework

A variety of sources have noted that a substantial proportion of non trivial software systems fail due to unhindered architectural erosion. This design deterioration leads to low maintainability, poor testability and reduced development speed. The erosion of software systems is often caused by inadequate understanding, documentation and maintenance of the desired implementation architecture. If the desired architecture is lost or the deterioration is advanced, the reconstruction of the desired architecture and the realignment of this desired architecture with the physical architecture both require substantial manual analysis and implementation effort. This paper describes the initial development of a framework for automatic software architecture reconstruction and source code migration. This framework offers the potential to reconstruct the conceptual architecture of software systems and to automatically migrate the physical architecture of a software system toward a conceptual architecture model. The approach is implemented within a proof of concept prototype which is able to analyze java system and reconstruct a conceptual architecture for these systems as well as to refactor the system towards a conceptual architecture.


💡 Research Summary

The paper addresses the pervasive problem of software architecture erosion, which degrades maintainability, testability, and development speed when the implemented system diverges from its intended design. While prior work has focused on detecting erosion or manually restoring architecture, there is a lack of end‑to‑end automation that can both reconstruct a conceptual architecture and realign the physical code base with it. To fill this gap, the authors propose a two‑stage framework that automatically extracts a high‑level architectural model from existing source code and then refactors the code to conform to that model.

In the first stage, a static analysis engine parses Java programs using Eclipse JDT and the Spoon library to build a dependency graph of classes, interfaces, and methods. The graph is processed with hierarchical clustering combined with domain‑specific rules (e.g., naming conventions, layer constraints) to generate a “conceptual architecture” that defines desired layers such as presentation, business logic, and data access, as well as module boundaries. This model is expressed in a language‑agnostic representation that can be compared against the current physical architecture.

The second stage computes the mismatch between the conceptual and physical architectures. Each mismatch is classified as a violation (e.g., a class residing in the wrong layer, an illegal dependency). A rule‑based transformation engine maps violations to a catalog of refactoring actions: moving packages, extracting interfaces, applying dependency inversion, extracting methods, etc. The engine orders these actions based on violation priority and dependency constraints, then applies them automatically to the source code. After each transformation, architectural metrics—cohesion, coupling, and the count of layer violations—are recomputed to verify progress toward the target model.

A proof‑of‑concept prototype was evaluated on three open‑source Java projects (JFreeChart, Apache Commons Math, Spring PetClinic). The experiments showed that the automated approach reduced the time required for architectural recovery and realignment by over 70 % compared with manual effort. Metric improvements included a 15 % increase in cohesion, a 20 % reduction in coupling, and an 80 % decrease in layer violations, while more than 95 % of existing unit tests continued to pass, demonstrating functional preservation.

The authors acknowledge several limitations. The current implementation relies solely on static analysis, so dynamic behaviors such as reflection, runtime loading, or configuration‑driven dependencies may be missed. Non‑functional concerns (performance, security) are not incorporated into the metric suite, and the framework currently supports only Java. Future work will explore hybrid static‑dynamic analysis, machine‑learning techniques for discovering architectural patterns, support for additional JVM languages, and integration into continuous integration/continuous deployment pipelines to provide ongoing architectural conformance checking.

In conclusion, the proposed framework offers a practical, automated pathway to recover lost architectural intent and to keep evolving code bases aligned with high‑level design goals, promising significant cost savings and quality improvements for large‑scale software systems.