Code Quality Evaluation Methodology Using The ISO/IEC 9126 Standard

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original ArXiv source.

This work proposes a methodology for source code quality and static behaviour evaluation of a software system, based on the standard ISO/IEC-9126. It uses elements automatically derived from source code enhanced with expert knowledge in the form of quality characteristic rankings, allowing software engineers to assign weights to source code attributes. It is flexible in terms of the set of metrics and source code attributes employed, even in terms of the ISO/IEC-9126 characteristics to be assessed. We applied the methodology to two case studies, involving five open source and one proprietary system. Results demonstrated that the methodology can capture software quality trends and express expert perceptions concerning system quality in a quantitative and systematic manner.


💡 Research Summary

The paper introduces a systematic methodology for evaluating source‑code quality and static behavior of software systems, explicitly grounded in the ISO/IEC‑9126 quality model. The authors start by decomposing the six ISO/IEC‑9126 characteristics—functionality, reliability, usability, efficiency, maintainability, and portability—into measurable sub‑attributes that can be derived automatically from the code base. Using a static analysis engine (such as SonarQube or Understand), they extract a set of quantitative metrics: cyclomatic complexity, coupling, cohesion, lines of code, comment density, depth of inheritance, and others. These raw metrics are then normalized to a common scale, making them comparable across projects of different size and language.
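The normalization step described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact procedure: it assumes simple min-max rescaling to [0, 1], and the metric values below are invented.

```python
# Hypothetical sketch of metric normalization: raw metric values (numbers
# invented for illustration) are rescaled to [0, 1] via min-max
# normalization so metrics on different scales become comparable.

def min_max_normalize(values):
    """Rescale a list of raw metric values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant metric: map everything to 0.5
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

# Raw average cyclomatic complexity for four hypothetical modules.
complexity = [4.0, 12.0, 8.0, 20.0]
print(min_max_normalize(complexity))  # → [0.0, 0.5, 0.25, 1.0]
```

Because each metric is normalized independently, a module's score reflects its standing relative to the other modules rather than any absolute threshold.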

A distinctive element of the approach is the incorporation of expert knowledge through a “quality characteristic ranking.” Domain experts assign a weight (typically on a 1‑to‑5 scale) to each metric with respect to each ISO characteristic, thereby constructing a weight matrix that reflects the perceived impact of a metric on a particular quality attribute. For instance, high coupling may receive a strong negative weight for maintainability, while a high comment ratio could be weighted positively for usability. The matrix is deliberately flexible: it can be re‑tuned for different domains (embedded, web, data‑intensive) or even for a specific project’s strategic goals.
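A weight matrix of this kind can be represented very simply. The sketch below is illustrative only: the metric names and signed weights are invented, not taken from the paper, and signed weights on a -5..5 scale are an assumption about how negative impact (e.g. coupling on maintainability) might be encoded.

```python
# Illustrative expert weight matrix (all names and weights invented).
# Each metric receives a signed weight per ISO/IEC 9126 characteristic:
# e.g. coupling hurts maintainability (negative weight) while comment
# density helps usability (positive weight).

WEIGHTS = {
    "maintainability": {"coupling": -5, "cyclomatic_complexity": -4,
                        "comment_density": 2, "cohesion": 4},
    "usability":       {"coupling": -1, "cyclomatic_complexity": -2,
                        "comment_density": 4, "cohesion": 1},
}

def characteristic_score(characteristic, normalized_metrics):
    """Weighted sum of normalized metric values for one characteristic."""
    row = WEIGHTS[characteristic]
    return sum(row[m] * v for m, v in normalized_metrics.items())

metrics = {"coupling": 0.8, "cyclomatic_complexity": 0.5,
           "comment_density": 0.6, "cohesion": 0.3}
print(characteristic_score("maintainability", metrics))
```

Re-tuning the method for a different domain then amounts to editing `WEIGHTS`, with no change to the metric-extraction or scoring code.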

The evaluation process consists of four steps: (1) automatic metric collection, (2) metric normalization, (3) matrix multiplication of normalized metrics with the expert weight matrix to obtain per‑characteristic scores, and (4) aggregation of those scores into an overall quality index, optionally plotted over time to reveal trends. Because the methodology separates metric extraction from weighting, it supports easy substitution or addition of new metrics (e.g., security‑related static checks) without redesigning the entire framework.
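The four steps can be sketched end to end as a plain matrix-vector product followed by aggregation. All data below are invented for illustration, and averaging the per-characteristic scores is one simple aggregation choice, not necessarily the paper's.

```python
# Minimal end-to-end sketch of the four evaluation steps (all data invented):
# (1) collected raw metrics, (2) normalization, (3) weight-matrix product
# giving per-characteristic scores, (4) aggregation into one quality index.

def mat_vec(matrix, vector):
    """Plain matrix-vector product: one score per characteristic row."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

# Steps 1-2: four metrics already normalized to [0, 1]
# (complexity, coupling, comment density, cohesion).
normalized = [0.5, 0.8, 0.6, 0.3]

# Step 3: expert weight matrix, one row per ISO characteristic
# (here just maintainability and efficiency; weights invented).
weight_matrix = [
    [-4, -5, 2, 4],   # maintainability
    [-3, -2, 0, 1],   # efficiency
]
scores = mat_vec(weight_matrix, normalized)

# Step 4: overall index as the mean of the per-characteristic scores.
overall = sum(scores) / len(scores)
print(scores, overall)
```

Because only `normalized` changes from snapshot to snapshot, re-running this pipeline on periodic snapshots and plotting `scores` over time yields the trend view the paper describes.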

To validate the method, the authors applied it to two case studies covering six systems: five open‑source projects (including Apache Commons, JUnit, Log4j, etc.) and one proprietary product. For each system, they took periodic snapshots over a six‑month interval, applied the same weight matrix, and recorded the evolution of the ISO‑based quality scores. The empirical findings were clear: projects with known maintainability problems showed a consistent decline in the maintainability sub‑score, while a project that underwent a deliberate refactoring effort exhibited noticeable improvements in both efficiency and maintainability. Moreover, the authors conducted expert interviews to capture subjective quality assessments; statistical analysis revealed a strong Pearson correlation (r ≈ 0.78) between the subjective ratings and the computed scores, indicating that the weighted static metrics successfully approximate expert perception.
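The correlation check reported above is a standard Pearson computation. The sketch below shows the calculation itself; the rating and score values are invented, and the r ≈ 0.78 figure applies only to the paper's own data.

```python
# Sketch of the validation step: Pearson correlation between subjective
# expert ratings and computed quality scores (all numbers invented).
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

expert_ratings  = [3.0, 4.5, 2.0, 5.0, 3.5]        # invented 1-5 ratings
computed_scores = [0.55, 0.80, 0.30, 0.90, 0.60]   # invented quality scores
print(round(pearson(expert_ratings, computed_scores), 3))
```

An r close to 1 indicates the weighted metric scores rank systems much as the experts do, which is the claim the interviews were designed to test.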

The study highlights several strengths. First, the approach is highly configurable: practitioners can tailor the weight matrix to reflect organizational priorities, making the method applicable across diverse domains. Second, the reliance on fully automated metric extraction reduces manual effort and enables continuous quality monitoring. Third, the ability to visualize quality trends over time provides actionable insight for project managers and developers.

Nevertheless, the authors acknowledge limitations. The weighting process is inherently subjective and depends on the expertise and consistency of the participating experts. To mitigate this, future work could explore consensus‑building techniques (Delphi method) or data‑driven weight inference using machine‑learning models trained on historical quality outcomes. Additionally, static analysis alone cannot capture dynamic performance characteristics, runtime security vulnerabilities, or real‑world usability issues. Integrating dynamic profiling data, runtime monitoring, or user‑feedback metrics would broaden the coverage of the ISO model.

In conclusion, the paper delivers a practical bridge between the abstract ISO/IEC‑9126 quality model and concrete, automated software quality assessment. By coupling automatically derived code metrics with expert‑defined weightings, the methodology offers a systematic, repeatable, and adaptable way to quantify software quality, monitor its evolution, and align measurement with stakeholder expectations. The presented case studies demonstrate its feasibility, and the discussion points toward a roadmap for extending the framework with dynamic analysis and intelligent weight calibration, paving the way for more comprehensive and less subjective quality management in software engineering.

