ModARO: A Modular Approach to Architecture Reconstruction of Distributed Microservice Codebases
Microservice architectures promote small, independently developed services, but increase overall architectural complexity. It is crucial that developers understand the architecture and how changes to a service affect the overall system, but rapid and independent development of services increases the risk of architectural drift and discourages the creation and maintenance of documentation. Automatic architecture reconstruction can help avoid these issues, but it is difficult to reuse reconstruction code across multiple projects, as each project uses a different combination of technologies and its own conventions. Reconstruction of architecture-level details is further complicated by the tendency to split microservices into separate repositories, preventing a full view of the system from any one codebase. In this paper, we present and evaluate ModARO, an approach to microservice architecture reconstruction that allows writing modular reconstruction code (‘extractors’) for any technology and reusing them across different projects, independent of the surrounding technology stack or whether the services are split into multiple codebases. We demonstrate the effectiveness of our approach by configuring ModARO to reconstruct 10 open source projects, and we validate the usefulness and usability of ModARO against a state-of-the-art baseline in a user study with 8 industry practitioners. Using this approach, developers can assemble or create extractors tailored to their technology stacks and distribute architecture reconstruction across repositories, enabling integration into repository CI/CD pipelines.
💡 Research Summary
The paper tackles the growing problem of architectural drift and insufficient documentation in microservice-based systems. Microservices promote independent development and continuous delivery. However, the number of services, their heterogeneous technology stacks, and the common practice of storing each service in its own repository make it difficult to maintain a coherent view of the overall system. Automatic architecture reconstruction offers a promising solution. It extracts architectural information from static artifacts such as source code, configuration files, and build scripts. Existing tools, however, suffer from two major limitations: (1) they are tightly coupled to specific languages or frameworks, which prevents reuse across projects, and (2) they assume a single, monolithic codebase, which is unrealistic for modern multi-repository microservice ecosystems.
To address these challenges, the authors introduce ModARO (MODular Architecture Reconstruction Orchestrator). ModARO’s design rests on two complementary concepts:

(a) Extractors: small, self-contained JavaScript modules that each perform a single analysis task, such as parsing Docker Compose files, scanning for Java source files, or reading a Maven pom.xml.

(b) A Shared Architecture Model: a JSON-based data structure that serves as the sole communication channel between extractors.

Each extractor uses a JSON Schema to declare the type of model entity it expects, identified by a $TYPE field, and the input fields it requires (e.g., $path). Registering these schemas lets the orchestration engine verify compatibility before invoking an extractor. This enforces a clear contract and prevents accidental misuse.
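The summary does not show ModARO's actual extractor API. As a rough illustration only, the TypeScript sketch below models an extractor as a declared input schema plus a `run` function. It checks entities against a small subset of JSON Schema: `required`, plus per-property `const` and `type`. The names `Extractor`, `InputSchema`, `matches`, `javaDetector`, and the `$files` meta-field are hypothetical, and a real engine would rely on a complete JSON Schema validator rather than this hand-written subset.

```typescript
// Hypothetical entity shape: a JSON object whose "$"-prefixed keys are meta-fields.
type Entity = { [key: string]: unknown };

// Tiny subset of JSON Schema: per-property `const` or `type`, plus `required`.
interface PropertyRule {
  const?: unknown;
  type?: "string" | "boolean" | "array";
}

interface InputSchema {
  type: "object";
  properties: { [key: string]: PropertyRule };
  required: string[];
}

// Hypothetical extractor contract: a declared input schema plus an analysis step
// that returns the fields it wants to add to the matched entity.
interface Extractor {
  name: string;
  input: InputSchema;
  run(entity: Entity): Entity;
}

// Returns true if the entity satisfies the (subset) schema.
function matches(schema: InputSchema, entity: Entity): boolean {
  if (!schema.required.every((key) => key in entity)) return false;
  return Object.entries(schema.properties).every(([key, rule]) => {
    if (!(key in entity)) return true; // absent optional properties are allowed
    const value = entity[key];
    if (rule.const !== undefined && value !== rule.const) return false;
    if (rule.type === "string") return typeof value === "string";
    if (rule.type === "boolean") return typeof value === "boolean";
    if (rule.type === "array") return Array.isArray(value);
    return true;
  });
}

// Example extractor: marks a microservice as Java if its file list
// (a hypothetical "$files" meta-field) contains .java sources.
const javaDetector: Extractor = {
  name: "java-detector",
  input: {
    type: "object",
    properties: {
      $TYPE: { const: "microservice" },
      $path: { type: "string" },
      $files: { type: "array" },
    },
    required: ["$TYPE", "$path", "$files"],
  },
  run: (entity) => ({
    java: (entity["$files"] as string[]).some((file) => file.endsWith(".java")),
  }),
};

// Usage: the engine would only invoke the extractor on entities that match.
const orders: Entity = {
  $TYPE: "microservice",
  $path: "services/orders",
  $files: ["src/Main.java", "pom.xml"],
};
if (matches(javaDetector.input, orders)) {
  Object.assign(orders, javaDetector.run(orders)); // orders.java is now true
}
```

In this sketch, `run` returns fields instead of mutating the entity directly. That design choice leaves merging, and any conflict checking, to the orchestrator.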
The shared model is hierarchical and extensible. The top-level entity ($TYPE: $MODEL) holds global information such as the repository root path. It may also contain an array of microservice entities, each with its own $path, a name, and any number of technology-specific flags (e.g., java: true). Extractors may add new entities, for example a “microservice” entity discovered in a Docker Compose file. They may also enrich existing ones, for example by marking a service as a Java service after scanning its source tree. All meta-fields prefixed with $ are stripped from the final output. This lets extractors store temporary data without polluting the published architecture description.
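To make the $-stripping rule concrete, here is a minimal sketch. It assumes the published model is plain JSON and that every object key starting with `$` is a meta-field. The `stripMeta` function and the field names `microservices`, `name`, and `java` are illustrative assumptions, not ModARO's actual output format.

```typescript
// JSON value type for the published architecture model.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively removes every object key starting with "$" (meta-fields),
// descending into nested objects and arrays such as the list of microservices.
function stripMeta(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => stripMeta(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      if (!key.startsWith("$")) out[key] = stripMeta(child);
    }
    return out;
  }
  return value; // primitives are kept as-is
}

// Hypothetical in-memory model after reconstruction.
const model: Json = {
  $TYPE: "$MODEL",
  $path: "/repo",
  microservices: [
    { $TYPE: "microservice", $path: "/repo/orders", name: "orders", java: true },
    { $TYPE: "microservice", $path: "/repo/web", name: "web" },
  ],
};

console.log(JSON.stringify(stripMeta(model)));
// {"microservices":[{"name":"orders","java":true},{"name":"web"}]}
```

Stripping happens only on output, so extractors can keep reading each other's `$` fields during reconstruction.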
The Reconstruction Algorithm drives the analysis. Starting with the top-level model, it selects all extractors whose input schema matches the current entity, runs them, and merges their modifications back into the model. Whenever an extractor creates a new entity, the algorithm recursively invokes the extractors applicable to that entity type. After each pass, the algorithm sweeps the model again to pick up extractors whose input schemas have only now become satisfied. As a result, data dependencies between extractors are resolved without manual ordering. Conflict detection is built in: if two extractors attempt to write incompatible values to the same field, the engine raises an error, preserving model consistency.
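The TypeScript sketch below approximates the reconstruction loop described above. It rests on several assumptions:

- Schema matching is reduced to an `accepts` predicate.
- Each extractor runs at most once per entity.
- New entities go onto an iterative worklist instead of being handled through recursion.
- A conflict is any attempt to overwrite a field with a different JSON value.

All identifiers are hypothetical. That includes `reconstruct`, `mergeFields`, `Result`, and the two example extractors. It also includes the `$compose` and `$tree` meta-fields, which inline input data so the sketch runs without file-system access.

```typescript
// Hypothetical entity shape: plain JSON object; "$"-prefixed keys are meta-fields.
type Entity = { [key: string]: unknown };

// Hypothetical extractor result: fields to merge into the matched entity and
// new child entities, keyed by the array field they are appended to.
interface Result {
  fields?: Entity;
  children?: { [field: string]: Entity[] };
}

interface Extractor {
  name: string;
  accepts(entity: Entity): boolean; // stand-in for the JSON Schema check
  run(entity: Entity): Result;
}

// Merges fields into an entity; overwriting a field with a different value is a conflict.
function mergeFields(entity: Entity, fields: Entity, source: string): void {
  for (const [key, value] of Object.entries(fields)) {
    if (key in entity && JSON.stringify(entity[key]) !== JSON.stringify(value)) {
      throw new Error(`conflict on "${key}" from extractor ${source}`);
    }
    entity[key] = value;
  }
}

// Sweeps all entities repeatedly until a sweep runs no extractor (a fixpoint).
function reconstruct(root: Entity, extractors: Extractor[]): Entity {
  const entities: Entity[] = [root];
  const ran = new Map<Entity, Set<string>>(); // extractors already run per entity
  let changed = true;
  while (changed) {
    changed = false;
    // Index loop: children appended during this sweep are visited in the same sweep.
    for (let i = 0; i < entities.length; i++) {
      const entity = entities[i];
      const seen = ran.get(entity) ?? new Set<string>();
      ran.set(entity, seen);
      for (const ex of extractors) {
        if (seen.has(ex.name) || !ex.accepts(entity)) continue;
        seen.add(ex.name);
        const result = ex.run(entity);
        mergeFields(entity, result.fields ?? {}, ex.name);
        for (const [field, kids] of Object.entries(result.children ?? {})) {
          const list = (entity[field] as Entity[] | undefined) ?? [];
          entity[field] = list.concat(kids);
          entities.push(...kids);
        }
        changed = true;
      }
    }
  }
  return root;
}

// Example extractor: flags microservices whose file list contains .java files.
const javaDetector: Extractor = {
  name: "java-detector",
  accepts: (e) => e["$TYPE"] === "microservice" && Array.isArray(e["$files"]),
  run: (e) => ({ fields: { java: (e["$files"] as string[]).some((f) => f.endsWith(".java")) } }),
};

// Example extractor: creates one microservice entity per (inlined) compose service.
const composeExtractor: Extractor = {
  name: "docker-compose",
  accepts: (e) => e["$TYPE"] === "$MODEL" && e["$compose"] !== undefined,
  run: (e) => {
    const compose = e["$compose"] as { services: { [name: string]: { build: string } } };
    const tree = (e["$tree"] ?? {}) as { [dir: string]: string[] | undefined };
    return {
      children: {
        microservices: Object.entries(compose.services).map(([name, svc]) => ({
          $TYPE: "microservice", $path: svc.build, $files: tree[svc.build] ?? [], name,
        })),
      },
    };
  },
};

const root = reconstruct(
  {
    $TYPE: "$MODEL",
    $compose: { services: { orders: { build: "./orders" }, web: { build: "./web" } } },
    $tree: { "./orders": ["src/Main.java", "pom.xml"], "./web": ["index.js"] },
  },
  [javaDetector, composeExtractor],
);
```

The outer loop repeats until a sweep runs no extractor. An extractor that was skipped because a field was still missing is therefore retried on the next sweep. So in this sketch, the order of the `extractors` array does not have to encode data dependencies.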
A key innovation is Distributed Architecture Reconstruction. Because each extractor operates on a single model entity and does not require global repository access, the entire pipeline can be executed independently inside each microservice’s CI/CD workflow. The resulting partial models are then aggregated (e.g., by a downstream job) into a complete system‑wide architecture. This eliminates the need to clone all services into a monolithic analysis environment and aligns naturally with the way organizations already manage microservice pipelines.
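The summary does not detail how ModARO merges partial models, so the following is purely an assumption. One plausible scheme: each repository's pipeline emits a partial model, and a downstream job merges services by name, applying the same "no incompatible overwrite" rule as the single-repository case. `aggregate`, `PartialModel`, `Service`, and keying by `name` are all hypothetical.

```typescript
// Hypothetical shapes of a partial model emitted by one repository's CI job.
type Service = { name: string; [key: string]: unknown };
interface PartialModel {
  microservices: Service[];
}

// Merges partial models by service name; incompatible values for a field are an error.
function aggregate(parts: PartialModel[]): PartialModel {
  const byName = new Map<string, Service>();
  for (const part of parts) {
    for (const svc of part.microservices) {
      const existing = byName.get(svc.name);
      if (existing === undefined) {
        byName.set(svc.name, { ...svc }); // copy so the inputs are not mutated
        continue;
      }
      for (const [key, value] of Object.entries(svc)) {
        if (key in existing && JSON.stringify(existing[key]) !== JSON.stringify(value)) {
          throw new Error(`conflicting "${key}" for service ${svc.name}`);
        }
        existing[key] = value;
      }
    }
  }
  return { microservices: Array.from(byName.values()) };
}

// Usage: one partial model from a shared deployment repo, one from the orders repo.
const fromInfraRepo: PartialModel = { microservices: [{ name: "orders" }, { name: "payments" }] };
const fromOrdersRepo: PartialModel = { microservices: [{ name: "orders", java: true }] };
const system = aggregate([fromInfraRepo, fromOrdersRepo]);
console.log(JSON.stringify(system));
// {"microservices":[{"name":"orders","java":true},{"name":"payments"}]}
```

Keying by name assumes that service names are unique and consistent across repositories. A real aggregator might need a more robust identity, such as a repository-qualified identifier.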
The authors evaluated ModARO on two fronts. First, they configured a set of extractors to reconstruct ten open-source microservice projects spanning Java/Spring, Node.js/Express, Go, and Python/Flask, among others. The approach successfully generated a coherent model for each project, and the same extractor code was reused across projects with an average reuse rate of 70%. Second, they conducted a user study with eight industry practitioners, comparing ModARO to ReSSA, a state-of-the-art reconstruction tool. Participants rated ModARO higher on usability, flexibility, and CI/CD integration. Notably, practitioners highlighted the low learning curve: writing a new extractor required only basic JavaScript and a JSON Schema, without deep knowledge of the underlying analysis engine.
The paper also discusses limitations and future work. Running arbitrary JavaScript extractors poses security risks; the authors suggest sandboxing or signed extractor packages as mitigations. The current shared model schema is deliberately minimal, sufficient for the case studies but not exhaustive; extending it to capture richer concepts such as runtime topology, versioning, or non‑functional attributes would be valuable. Finally, the authors envision a community‑driven repository of reusable extractors, akin to plugin ecosystems in other domains.
In summary, ModARO provides a modular, technology-agnostic framework for static architecture reconstruction that works across distributed codebases. It decouples analysis logic (extractors) from data representation (the shared model) and orchestrates them through a conflict-aware, recursive algorithm. In this way it achieves high reusability, scalability, and integration with modern CI/CD pipelines. The empirical results and practitioner feedback support its practical benefits, positioning ModARO as a significant step forward in maintaining architectural awareness in fast-moving microservice environments.