Modeling Languages: metrics and assessing tools

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

Every traditional engineering field has metrics to rigorously assess the quality of its products. Engineers know that the output must satisfy the requirements, comply with production and market rules, and be competitive. Professionals in the younger field of software engineering began some years ago to define metrics to appraise their products: individual programs and software systems. This concern motivates the need to assess not only the outcome but also the process and the tools employed in its development. In this context, assessing the quality of programming languages is a legitimate objective; similarly, it makes sense to be concerned with models and modeling approaches, as more and more people start the software development process with a modeling phase. In this paper we introduce and motivate the assessment of model quality in the software development cycle. After a general discussion of the topic, we focus on the most popular modeling language, the UML, and present metrics for it. Through a case study, we present and explore two tools. To conclude, we identify what is still lacking on the tool side.


💡 Research Summary

The paper opens by drawing a parallel between traditional engineering disciplines, where quantitative metrics have long been used to assess product quality, and the relatively younger field of software engineering, which is only now beginning to adopt systematic measurement practices. It argues that, just as code and the development process are subject to evaluation, the modeling phase—particularly the use of modeling languages—should also be measured to ensure that the artifacts produced are of high quality and that the tools supporting them are effective.

Focusing on the Unified Modeling Language (UML), the most widely adopted visual modeling language, the authors first review existing literature on UML metrics and organize them into four principal categories. The first category, structural complexity, includes simple counts such as the number of classes, interfaces, relationships, inheritance depth, and overall diagram node/edge counts, providing a baseline sense of model size and intricacy. The second category addresses cohesion and coupling, measuring how tightly responsibilities are encapsulated within a class and how strongly classes depend on each other, using indicators like method call frequencies, attribute usage ratios, and average path lengths in dependency graphs. The third category evaluates readability and maintainability, considering layout density, label length, color contrast, and the presence of automatic alignment to gauge how easily a human can understand and modify the model. The fourth category examines consistency, checking adherence to UML naming conventions, stereotype usage, and cross‑diagram linking rules. Each metric yields a numeric value that can be mapped to quality thresholds, enabling designers to receive immediate, actionable feedback.
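As an illustration of the first category, the counts described above can be sketched in a few lines. The in-memory model representation and metric names below are invented for illustration; the paper does not prescribe a concrete API.

```python
# Hypothetical sketch: basic structural-complexity metrics over a toy
# in-memory representation of a UML class diagram. The data model and
# metric names are illustrative, not taken from the paper.

class UmlClass:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # superclass, if any

def inheritance_depth(cls):
    """Number of superclasses above a class (a DIT-style count)."""
    depth = 0
    while cls.parent is not None:
        depth += 1
        cls = cls.parent
    return depth

def structural_metrics(classes, relationships):
    """Return simple size/complexity counts for one class diagram."""
    return {
        "num_classes": len(classes),
        "num_relationships": len(relationships),
        "max_inheritance_depth": max(inheritance_depth(c) for c in classes),
    }

# Toy diagram: Shape <- Polygon <- Triangle, plus one association.
shape = UmlClass("Shape")
polygon = UmlClass("Polygon", parent=shape)
triangle = UmlClass("Triangle", parent=polygon)
classes = [shape, polygon, triangle]
relationships = [("Triangle", "Shape", "association")]

print(structural_metrics(classes, relationships))
```

Each resulting number can then be compared against a quality threshold, which is the "immediate, actionable feedback" loop the summary describes.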

To illustrate how these metrics can be applied in practice, the paper conducts a case study comparing two tools that claim to assess UML model quality. Tool A is a commercial offering that provides automated metric calculation, a visual dashboard, integration with continuous‑integration pipelines, and collaborative features. Its drawbacks are a high licensing cost and limited ability for users to customize metric definitions. Tool B is an open‑source platform built on a plug‑in architecture, allowing extensive extension and custom metric creation, but it suffers from a more complex user interface and performance degradation when handling very large models.

Both tools are fed the same UML model, and the resulting metric values are compared. For basic structural and cohesion/coupling metrics the tools produce nearly identical results, confirming that the underlying definitions are largely consistent. However, for composite metrics that combine multiple aspects—such as a weighted cohesion‑coupling score—differences emerge due to divergent implementation details. Tool A excels at delivering real‑time feedback and automatically generated reports, whereas Tool B shines in flexibility but lacks live validation capabilities.
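The divergence on composite metrics can be made concrete with a small sketch: given identical base cohesion and coupling values, two tools that merely weight them differently report different composite scores. The weights below are invented for illustration and do not come from either tool.

```python
# Hypothetical sketch: two tools computing a weighted cohesion-coupling
# score from the same base metrics. The weights are invented to show how
# implementation choices alone make composite results diverge.

def composite_score(cohesion, coupling, w_cohesion, w_coupling):
    """Weighted score: higher cohesion is good, higher coupling is bad."""
    return w_cohesion * cohesion - w_coupling * coupling

# Identical base metrics measured by both tools (normalized to [0, 1]).
cohesion, coupling = 0.8, 0.3

tool_a = composite_score(cohesion, coupling, w_cohesion=0.7, w_coupling=0.3)
tool_b = composite_score(cohesion, coupling, w_cohesion=0.5, w_coupling=0.5)

print(f"Tool A: {tool_a:.2f}")  # Tool A: 0.47
print(f"Tool B: {tool_b:.2f}")  # Tool B: 0.25
```

The base inputs agree exactly, yet the reported quality differs, which is why cross-tool comparison of composite metrics is unreliable without a shared definition.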

The authors conclude by identifying several systemic shortcomings in the current tool landscape. There is no universally accepted standard for metric definitions or interpretation, making cross‑tool comparison difficult. Real‑time quality monitoring throughout the modeling process is generally absent, and most tools do not support analysis of inter‑diagram relationships (e.g., consistency between class and sequence diagrams). Moreover, visualizations are often rudimentary, limiting the usefulness of the feedback for designers.
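The inter-diagram analysis the authors find missing could look like the following minimal sketch: a check that every message in a sequence diagram names an operation actually declared by the receiving class. The representations and function name are hypothetical, chosen only to illustrate the kind of consistency rule involved.

```python
# Hypothetical sketch: a cross-diagram consistency check verifying that
# every sequence-diagram message names an operation declared by the
# receiving class in the class diagram. Representations are illustrative.

def undeclared_messages(class_ops, sequence_messages):
    """Return messages whose receiver lacks the named operation.

    class_ops: {class_name: set of operation names}
    sequence_messages: list of (receiver_class, operation) pairs
    """
    return [
        (receiver, op)
        for receiver, op in sequence_messages
        if op not in class_ops.get(receiver, set())
    ]

class_ops = {
    "Order": {"add_item", "total"},
    "Invoice": {"issue"},
}
messages = [("Order", "add_item"), ("Invoice", "issue"), ("Order", "cancel")]

print(undeclared_messages(class_ops, messages))  # [('Order', 'cancel')]
```

Even a rule this simple spans two diagram types, which is precisely the analysis most current tools do not perform.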

Future research directions proposed include: (1) collaborating with standards bodies such as the Object Management Group (OMG) to formalize a common metric taxonomy; (2) leveraging machine‑learning techniques to predict model quality based on historical project data, thereby providing proactive guidance; and (3) developing an integrated quality‑monitoring framework that spans the entire development lifecycle—from initial modeling through code generation and deployment—offering continuous, actionable quality insights. By addressing these gaps, the authors argue that the industry can achieve more reliable, maintainable software systems while reducing overall development costs.

