From Monolith to Microservices: A Comparative Evaluation of Decomposition Frameworks

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original ArXiv source.

Software modernisation through the migration from monolithic architectures to microservices has become increasingly critical, yet identifying effective service boundaries remains a complex and unresolved challenge. Although numerous automated microservice decomposition frameworks have been proposed, their evaluation is often fragmented due to inconsistent benchmark systems, incompatible metrics, and limited reproducibility, thus hindering objective comparison. This work presents a unified comparative evaluation of state-of-the-art microservice decomposition approaches spanning static, dynamic, and hybrid techniques. Using a consistent metric computation pipeline, we assess decomposition quality across widely used benchmark systems (JPetStore, AcmeAir, DayTrader, and Plants) using Structural Modularity (SM), Interface Number (IFN), Inter-partition Communication (ICP), Non-Extreme Distribution (NED), and related indicators. Our analysis combines results reported in prior studies with experimentally reproduced outputs from available replication packages. Findings indicate that hierarchical clustering-based methods, particularly HDBScan, produce the most consistently balanced decompositions across benchmarks, achieving strong modularity while minimizing communication and interface overhead.


💡 Research Summary

The paper addresses the persistent challenge of automatically identifying service boundaries when refactoring monolithic applications into microservices. Although numerous decomposition frameworks have been proposed, their comparative evaluation has been hampered by fragmented benchmark sets, inconsistent metric definitions, and limited reproducibility. To remedy this, the authors conduct a unified, reproducible study of nine state‑of‑the‑art tools spanning static, dynamic, and hybrid analysis techniques. The selected tools are Bunch, MEM, CoGCN, HDBScan, a‑BMSC, MonoEmbed (static), FoSCI, Mono2Micro (dynamic), and CHGNN (hybrid).

Four widely used open‑source monoliths serve as benchmarks: JPetStore (e‑commerce), AcmeAir (airline reservation), DayTrader (stock trading), and Plants (online plant nursery). These systems provide a mix of architectural styles, codebase sizes, and runtime behaviors, enabling a comprehensive assessment of each tool’s strengths and weaknesses.

The evaluation relies on a consistent metric pipeline that computes four quantitative quality indicators for every decomposition:

  1. Structural Modularity (SM) – a composite measure of intra‑partition cohesion and inter‑partition coupling; higher values indicate better modularity.
  2. Interface Number (IFN) – average number of APIs per service; lower values suggest simpler, less fragmented services.
  3. Inter‑partition Communication (ICP) – proportion of runtime calls crossing service boundaries; lower values reflect reduced communication overhead.
  4. Non‑Extreme Distribution (NED) – balance of service sizes, penalizing partitions that are too large or too small.
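
As a concrete illustration, all four metrics can be computed from nothing more than a class-to-service assignment and a class-level call graph. The sketch below is a minimal Python rendition under assumed definitions: the toy `assignment` and `calls` data are invented, IFN is approximated by counting classes that receive cross-service calls, and NED uses a small [2, 20] size band (the literature often uses [5, 20]); the paper's exact formulas may differ.

```python
from collections import defaultdict

# Hypothetical toy decomposition (invented for illustration):
# class -> service index, plus a directed call graph between classes.
assignment = {"Cart": 0, "Order": 0, "Catalog": 1, "Item": 1, "Account": 2}
calls = [("Cart", "Order"), ("Cart", "Catalog"), ("Order", "Item"),
         ("Catalog", "Item"), ("Order", "Account")]

def icp(assignment, calls):
    """Inter-partition Communication: share of calls crossing services."""
    crossing = sum(1 for a, b in calls if assignment[a] != assignment[b])
    return crossing / len(calls)

def ifn(assignment, calls):
    """Interface Number proxy: a class is treated as an interface of its
    service if it receives at least one cross-service call; IFN is the
    average interface count per service."""
    iface = defaultdict(set)
    for a, b in calls:
        if assignment[a] != assignment[b]:
            iface[assignment[b]].add(b)
    k = len(set(assignment.values()))
    return sum(len(v) for v in iface.values()) / k

def ned(assignment, lo=2, hi=20):
    """Non-Extreme Distribution: fraction of classes in services whose
    size falls outside the [lo, hi] band (lower is better)."""
    sizes = defaultdict(int)
    for svc in assignment.values():
        sizes[svc] += 1
    extreme = sum(n for n in sizes.values() if not lo <= n <= hi)
    return extreme / len(assignment)

def sm(assignment, calls):
    """Structural Modularity: mean intra-service call density minus mean
    pairwise inter-service coupling (one common formulation)."""
    sizes = defaultdict(int)
    for svc in assignment.values():
        sizes[svc] += 1
    intra = defaultdict(int)   # calls inside each service
    inter = defaultdict(int)   # calls between each unordered service pair
    for a, b in calls:
        sa, sb = assignment[a], assignment[b]
        if sa == sb:
            intra[sa] += 1
        else:
            inter[frozenset((sa, sb))] += 1
    k = len(sizes)
    cohesion = sum(intra[s] / (sizes[s] ** 2) for s in sizes) / k
    pairs = k * (k - 1) / 2
    coupling = sum(m / (2 * sizes[tuple(p)[0]] * sizes[tuple(p)[1]])
                   for p, m in inter.items()) / pairs
    return cohesion - coupling

print(f"SM={sm(assignment, calls):.3f}  IFN={ifn(assignment, calls):.2f}  "
      f"ICP={icp(assignment, calls):.2f}  NED={ned(assignment):.2f}")
```

On this toy system, three of the five calls cross service boundaries, so ICP is 0.6; a real pipeline would extract the call graph from the monolith's source or traces instead of hard-coding it.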

For each benchmark, raw metric values are z‑score normalized across tools, then aggregated using a weighted sum of the z‑scores (SM = +3, IFN = ‑1, ICP = ‑1, NED = ‑1). The positive weight for SM reflects its desirability, while the negative weights penalize metrics that should be minimized.
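
The normalization-and-aggregation step can be sketched as follows. Only the weighting scheme comes from the paper; the per-tool metric values below are invented for illustration.

```python
import statistics

# Weights from the paper's aggregation scheme: reward SM, penalize the rest.
WEIGHTS = {"SM": 3, "IFN": -1, "ICP": -1, "NED": -1}

# Hypothetical per-tool metrics on a single benchmark (invented numbers).
results = {
    "HDBScan":    {"SM": 0.42, "IFN": 2.1, "ICP": 0.18, "NED": 0.10},
    "Bunch":      {"SM": 0.31, "IFN": 3.4, "ICP": 0.25, "NED": 0.35},
    "Mono2Micro": {"SM": 0.22, "IFN": 2.8, "ICP": 0.12, "NED": 0.40},
}

def zscores(values):
    """Standardize a list of values; zeros if all values are equal."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def aggregate(results):
    """Weighted sum of per-metric z-scores, one score per tool."""
    tools = list(results)
    scores = {t: 0.0 for t in tools}
    for metric, w in WEIGHTS.items():
        zs = zscores([results[t][metric] for t in tools])
        for t, z in zip(tools, zs):
            scores[t] += w * z
    return scores

scores = aggregate(results)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Because each metric is standardized per benchmark before weighting, tools are compared on relative rather than absolute performance, which is what makes scores from metrics on very different scales commensurable.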

Experimental execution was performed on multiple developer workstations; performance‑related aspects such as runtime latency were deliberately excluded to focus solely on decomposition quality. Where possible, the authors reproduced results by running CHGNN and MonoEmbed from the authors’ replication packages, applying the unified metric pipeline to the generated JSON outputs. For the remaining tools, metric values were taken directly from the original publications because executable artifacts were unavailable.
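
Applying the unified metric pipeline to a tool's generated output amounts to flattening its decomposition file into a class-to-service map. A minimal sketch, assuming a hypothetical JSON schema (the `services`/`classes` layout below is invented; real replication packages each use their own format):

```python
import json

# Hypothetical decomposition output from a tool's replication package.
raw = """
{"services": [{"name": "s0", "classes": ["Cart", "Order"]},
              {"name": "s1", "classes": ["Catalog", "Item"]}]}
"""

def load_assignment(text):
    """Flatten a service-list JSON into a class -> service-index map,
    the common input shape for the metric computations."""
    doc = json.loads(text)
    return {cls: i
            for i, svc in enumerate(doc["services"])
            for cls in svc["classes"]}

assignment = load_assignment(raw)
print(assignment)
```

A per-tool adapter like this is the only tool-specific piece needed; once every output is reduced to the same map, a single metric implementation can score all nine tools identically.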

The results reveal a clear winner: HDBScan, a hierarchical density‑based clustering algorithm, consistently achieves the highest aggregated scores across all four benchmarks. HDBScan delivers strong SM scores while simultaneously keeping IFN, ICP, and NED low, indicating well‑balanced, cohesive services with minimal cross‑service communication and evenly distributed sizes. Among static approaches, HDBScan outperforms other graph‑clustering methods (CoGCN, a‑BMSC) and the classic Bunch/MEM techniques.

Dynamic‑only tools (FoSCI, Mono2Micro) attain low ICP values, reflecting effective reduction of runtime coupling, but their SM scores are comparatively modest, suggesting that they preserve less internal cohesion. The hybrid CHGNN, despite integrating static call graphs with dynamic CRUD traces, does not surpass HDBScan in the weighted aggregate, highlighting that current hybrid implementations may still suffer from suboptimal feature fusion or parameter tuning.

The authors’ contributions are threefold: (1) they curate a common benchmark suite and metric computation pipeline, enabling direct, reproducible comparison of disparate decomposition frameworks; (2) they fill gaps in prior literature by experimentally evaluating tool‑benchmark pairs that had not been previously reported; and (3) they provide a systematic analysis that identifies HDBScan as the most effective approach for generating balanced microservice partitions under the chosen evaluation criteria.

Limitations are acknowledged. The distributed hardware environment introduces uncontrolled variability, although the authors argue that this mirrors real‑world practitioner settings. The equal negative weighting of IFN and ICP may oversimplify the nuanced cost differences between API surface complexity and network latency in production environments. Finally, the study’s scope is limited to four benchmarks; larger, more heterogeneous systems (e.g., microservice‑native applications like SockShop or domain‑specific monoliths) remain unexplored.

Future work is suggested in three directions: expanding the benchmark corpus to include large‑scale enterprise systems and cloud‑native microservice suites; enriching the metric set with operational cost indicators such as deployment time, orchestration overhead, and fault isolation effectiveness; and advancing hybrid decomposition algorithms to better fuse static structural information with dynamic behavioral traces, possibly through machine‑learning‑driven feature weighting.

In summary, this paper delivers a rigorous, reproducible comparative evaluation of microservice decomposition frameworks, demonstrating that hierarchical clustering—specifically HDBScan—offers the most balanced trade‑off among cohesion, coupling, interface simplicity, and service size distribution across diverse monolithic applications.

