Distributed Architecture Reconstruction of Polyglot and Multi-Repository Microservice Projects
Microservice architectures encourage the use of small, independently developed services; however, this can lead to increased architectural complexity. Accurate documentation is crucial but challenging to maintain due to the rapid, independent evolution of services. While static architecture reconstruction provides a way to maintain up-to-date documentation, existing approaches suffer from technology limitations, mono-repo constraints, or high implementation barriers. This paper presents a novel framework for static architecture reconstruction that supports technology-specific analysis modules, called \emph{extractors}, and enables \emph{distributed architecture reconstruction} in multi-repo environments. We describe the core design concepts and algorithms that govern how extractors are executed, how data is passed between them, and how their outputs are unified. Furthermore, the framework is interoperable with existing static analysis tools and algorithms, allowing them to be invoked from or embedded within extractors.
💡 Research Summary
The paper tackles the growing problem of architectural documentation in microservice‑based systems, where services evolve independently and often reside in separate repositories. While dynamic tracing solutions such as OpenTelemetry are mature, static reconstruction—deriving an architectural model directly from source code and configuration—remains under‑explored, especially for polyglot, multi‑repo environments. Existing static approaches either focus narrowly on a single technology stack (e.g., Java Spring) or rely on language‑agnostic abstract syntax trees that demand substantial effort to support new languages.
To address these gaps, the authors introduce ModARO (Modular Architecture Reconstruction Orchestrator), a framework that centers on extractors—small, self‑contained analysis modules written as functions. Each extractor declares, via a JSON‑Schema definition, the shape of the model entity it can consume (e.g., a top‑level repository entity, a microservice entity with a $path field, etc.). The extractor receives a copy of the entity, inspects the underlying codebase (using file‑system globbing, external parsers, or any third‑party library), mutates the entity by adding new fields or sub‑entities, and returns the modified copy. Crucially, extractors are stateless and are guaranteed to run at most once per entity, which enables parallel execution, memoisation, and deterministic results.
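To make the extractor contract concrete, the following Python sketch shows what a registration mechanism and a single extractor could look like. This is not ModARO's actual API: the decorator, the `EXTRACTORS` registry, the simplified required-keys schema, and the `find_dockerfiles` extractor are all illustrative assumptions; only the overall shape (declare a schema, receive a copy, enrich, return) follows the paper's description.

```python
import copy
import glob
import os

# Hypothetical extractor registry: each entry pairs a JSON-Schema-like matcher
# with a stateless function that receives a copy of a model entity and returns
# an enriched copy. Names here are illustrative, not ModARO's real API.
EXTRACTORS = []

def extractor(schema):
    """Register an extractor together with the entity shape it consumes."""
    def register(fn):
        EXTRACTORS.append({"schema": schema, "fn": fn})
        return fn
    return register

@extractor({"type": "object", "required": ["$path", "language"]})
def find_dockerfiles(entity):
    # Work on a copy so the framework can diff the result against the original.
    entity = copy.deepcopy(entity)
    root = entity["$path"]
    # Plain file-system globbing is enough to detect Docker usage in a service.
    entity["dockerfiles"] = sorted(
        os.path.relpath(p, root)
        for p in glob.glob(os.path.join(root, "**", "Dockerfile"), recursive=True)
    )
    return entity
```

Because the extractor only reads the file system and returns a new copy, the framework is free to memoise its result or run it in parallel with other extractors, as the paper's statelessness guarantee implies.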
The reconstruction algorithm proceeds in two stages. First, createModelEntity builds the initial top‑level model, injecting configuration values such as the repository root and a unique UUID. Second, runExtractors iterates over the registered extractors, checks schema compatibility, and invokes matching extractors on a copy of the current entity. After all applicable extractors have run, the framework detects changes, merges them into the original entity, and performs a second pass to see whether the newly enriched entity now matches additional extractor schemas. If any sub‑entities were created or altered, the algorithm recurses on those sub‑entities. Conflict detection is performed when two extractors attempt incompatible scalar updates to the same field; non‑scalar updates (arrays, maps) are merged by union, allowing multiple extractors to contribute complementary information.
A key contribution is the support for distributed architecture reconstruction. In a multi‑repo setting, each microservice repository runs ModAR O as part of its CI/CD pipeline, producing a JSON model file that captures the service’s internal structure (endpoints, configuration, dependencies, etc.). These model files follow a common schema that enables a downstream aggregation step to merge them into a single system‑wide view. Inter‑service relationships (e.g., HTTP calls) are expressed as retroactive links: individual extractors record outgoing calls without needing knowledge of the target service; during aggregation, the framework resolves these links by matching recorded URIs, ports, or service identifiers across the collected models. This approach preserves the loose‑coupling principle of microservices while still delivering a coherent architectural diagram.
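A minimal aggregation step might look like the sketch below. The field names (`name`, `base_url`, `outgoing_calls`) and the prefix-matching strategy are assumptions chosen to illustrate retroactive link resolution; the paper only specifies that links are resolved by matching recorded URIs, ports, or service identifiers across the collected per-repository models.

```python
import json

def aggregate(model_files):
    """Merge per-repository model files (one JSON artifact per CI pipeline)
    into a single system view and resolve retroactive links."""
    services = []
    for path in model_files:
        with open(path) as f:
            services.append(json.load(f))
    # Index services by the base URIs they expose.
    by_url = {svc["base_url"]: svc["name"] for svc in services if "base_url" in svc}
    links = []
    for svc in services:
        for call in svc.get("outgoing_calls", []):
            # Each extractor recorded the call target as a raw URI; only now,
            # with all models collected, can it be matched to a concrete service.
            target = next(
                (name for url, name in by_url.items() if call.startswith(url)),
                None,  # unresolved calls are kept, pointing outside the system
            )
            links.append({"from": svc["name"], "to": target, "uri": call})
    return {"services": [s["name"] for s in services], "links": links}
```

Note that the producing service never needs to know its callers, and the caller never needs the target's model: each side records only local facts, and the aggregation step joins them, which is exactly the loose coupling the paper highlights.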
The framework is deliberately extensible. Existing static analysis tools—such as the ReSSA language‑agnostic AST extractor, SonarQube rule engines, or OpenAPI generators—can be wrapped inside an extractor, allowing their outputs to be injected directly into the model. Adding support for a new language or framework therefore requires only the implementation of a new extractor, not a rewrite of the core engine. The authors acknowledge that this flexibility introduces security considerations, as third‑party extractors may execute arbitrary code; they suggest sandboxing and verification mechanisms as future work.
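Wrapping an existing tool can be as simple as ingesting its output file inside an extractor. The sketch below assumes the wrapped tool has already emitted an OpenAPI document as `openapi.json` in the repository; the file name and the `endpoints` field are illustrative, not prescribed by the paper.

```python
import json
import os

def wrap_openapi_extractor(entity):
    """Illustrative wrapper extractor: rather than parsing source code itself,
    it reads the output of an existing generator (an OpenAPI JSON document)
    and injects the declared endpoints into the model entity."""
    spec_path = os.path.join(entity["$path"], "openapi.json")
    if not os.path.exists(spec_path):
        return entity  # nothing to contribute; entity is returned unchanged
    with open(spec_path) as f:
        spec = json.load(f)
    out = dict(entity)
    out["endpoints"] = sorted(spec.get("paths", {}).keys())
    return out
```

The same pattern works for any tool that can be invoked as a subprocess or library call from within the extractor function, which is how the paper envisions embedding tools like ReSSA or SonarQube.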
The paper also discusses internal model fields prefixed with $, which serve as a shared namespace for extractors to exchange auxiliary data (e.g., directory mappings in a mono‑repo). Because extractors cannot rely on global variables, these fields provide a controlled way to pass context without breaking the stateless contract.
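The $-prefixed shared namespace can be illustrated with two cooperating extractors: one publishes auxiliary context, the other consumes it. The hardcoded directory mapping below stands in for an actual mono-repo scan, and the field name `$service_dirs` is an assumption; the paper only specifies the $ prefix convention itself.

```python
def map_monorepo_dirs(entity):
    # First extractor: publish auxiliary data under a $-prefixed field so
    # later extractors can consume it without any global state. The mapping
    # is hardcoded here as a stand-in for a real directory scan.
    out = dict(entity)
    out["$service_dirs"] = {"billing": "services/billing", "auth": "services/auth"}
    return out

def create_service_entities(entity):
    # Second extractor: read the shared $ field instead of re-scanning disk,
    # and create one sub-entity per discovered service directory.
    out = dict(entity)
    out["services"] = [
        {"name": name, "$path": entity["$path"] + "/" + rel}
        for name, rel in sorted(entity.get("$service_dirs", {}).items())
    ]
    return out
```

Both functions return modified copies and leave their input untouched, so the stateless, run-at-most-once contract is preserved even though data flows between them.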
In evaluation, the authors provide a prototype implementation (open‑source on GitLab) and demonstrate its operation on a synthetic polyglot project containing Java Spring, Node.js, and Docker‑Compose services spread across three Git repositories. The prototype successfully generated individual service models during each service’s build, and a subsequent aggregation step produced a unified architecture graph showing service endpoints, database connections, and cross‑service HTTP calls.
Overall, ModARO offers a modular, scalable, and distributed solution for static microservice architecture reconstruction. By decoupling analysis logic into reusable extractors, supporting arbitrary technology stacks, and enabling per‑service CI/CD integration, the framework promises to keep architectural documentation up‑to‑date with minimal developer overhead. The authors identify future directions such as tighter integration with dynamic tracing, richer model schemas for non‑functional attributes, and secure execution environments for third‑party extractors.