Measuring Software Quality in Use: State-of-the-Art and Research Challenges


Software quality in use comprises quality from the user’s perspective. It has gained importance in e-government applications, mobile-based applications, embedded systems, and even business process development. Users’ decisions on software acquisition are often ad hoc or based on preference because software quality in use is difficult to measure quantitatively. But why is quality-in-use measurement difficult? Although there are many software quality models, to the authors’ knowledge no prior work surveys the challenges related to software quality-in-use measurement. This article makes two main contributions: 1) it identifies and explains major issues and challenges in measuring software quality in use in the context of the ISO SQuaRE series and related software quality models, and highlights open research areas; and 2) it sheds light on a research direction that can be used to predict software quality in use. In short, the quality-in-use measurement issues stem from the complexity of the current standard models and from the limitations and incompleteness of customized software quality models. Sentiment analysis of software reviews is proposed to address these issues.


💡 Research Summary

The paper addresses the persistent problem of measuring software quality from the user’s perspective, commonly referred to as “quality‑in‑use” (Q‑i‑U). It begins by outlining the ISO/IEC 25010 standard and the broader ISO SQuaRE (Systems and software Quality Requirements and Evaluation) series, which define Q‑i‑U through characteristics such as effectiveness, efficiency, satisfaction, and freedom from risk. While these standards provide a comprehensive theoretical framework, the authors argue that their practical application is hampered by excessive complexity. Implementing the prescribed measurement procedures requires extensive data collection, such as detailed usage logs, controlled experiments, and user surveys, each of which incurs significant time and cost. Moreover, the mapping between qualitative user experiences and the quantitative metrics stipulated by the standards is often ambiguous, especially for efficiency and effectiveness, which vary dramatically across hardware configurations, network conditions, and workflow contexts.

The authors then review a range of traditional, customized quality models (e.g., McCall, Boehm, Dromey, COQUAMO). Although these models have been valuable for assessing internal product attributes (functionality, reliability, maintainability, etc.), they largely ignore the user‑centric dimensions that define Q‑i‑U. Their domain‑specific nature also limits applicability to modern software ecosystems such as mobile apps, cloud services, and embedded systems, where user experience and rapid iteration are paramount.

To bridge this gap, the paper proposes a novel research direction: leveraging sentiment analysis of user‑generated software reviews. The authors note that app stores, online marketplaces, forums, and open‑source repositories contain massive volumes of textual feedback that directly reflect users’ perceptions of efficiency (“the app loads quickly”), effectiveness (“it helps me complete my tasks”), satisfaction (“I love the UI”), and risk (“crashes frequently”). By applying natural language processing (NLP) techniques—tokenization, part‑of‑speech tagging, sentiment lexicons, and machine‑learning classifiers (SVM, Random Forest) or deep‑learning models (BERT)—these reviews can be transformed into positive/negative scores. These scores are then mapped onto the ISO Q‑i‑U characteristics, providing a scalable, real‑time proxy for traditional measurement methods.
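To make the idea concrete, here is a minimal lexicon-based sketch of this pipeline, not the authors' implementation: reviews are tokenized, scored against a tiny sentiment lexicon, and assigned to a Q‑i‑U characteristic by keyword overlap. The lexicon and keyword sets below are invented for illustration; a real system would use the classifiers or BERT-style models mentioned above.

```python
# Illustrative lexicon-based sketch: score a review's sentiment and
# bucket it under an ISO-style quality-in-use characteristic via
# keyword matching. Lexicon and keyword lists are invented examples.
SENTIMENT_LEXICON = {"love": 1, "quickly": 1, "helps": 1, "great": 1,
                     "crashes": -1, "slow": -1, "bug": -1}
CHARACTERISTIC_KEYWORDS = {
    "efficiency": {"loads", "quickly", "slow", "fast"},
    "effectiveness": {"tasks", "complete", "helps"},
    "satisfaction": {"love", "ui", "great"},
    "freedom_from_risk": {"crashes", "bug", "freezes"},
}

def score_review(text):
    """Return (characteristic, sentiment score) for one review."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    sentiment = sum(SENTIMENT_LEXICON.get(t, 0) for t in tokens)
    # Pick the characteristic whose keywords overlap the most tokens.
    best = max(CHARACTERISTIC_KEYWORDS,
               key=lambda c: len(CHARACTERISTIC_KEYWORDS[c] & set(tokens)))
    return best, sentiment

print(score_review("The app loads quickly"))  # efficiency, positive score
print(score_review("It crashes frequently"))  # freedom_from_risk, negative
```

Aggregating such per-review scores per characteristic yields the positive/negative proxies described above; the hard part, as the paper notes, is calibrating them against true ISO measurements.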

The authors acknowledge several challenges inherent to this approach. First, review data are noisy and may contain spam, promotional content, or biased opinions, necessitating robust filtering and credibility assessment mechanisms. Second, linguistic and cultural diversity across languages and domains requires multilingual sentiment resources and domain‑adapted models. Third, there is currently no standardized method for converting sentiment scores into the precise quantitative values required by ISO metrics (e.g., task completion time, error rates). The paper suggests that regression models, Bayesian networks, or other statistical mapping techniques must be developed to achieve this alignment.
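As a sketch of what such a statistical mapping could look like (the paper proposes the direction but no concrete model), the snippet below fits an ordinary least-squares line from mean review sentiment to an observed ISO-style metric such as task completion rate. The calibration data points are entirely hypothetical:

```python
# Sketch: least-squares line mapping aggregate review sentiment (x)
# to an observed quality-in-use metric, e.g. task completion rate (y).
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical calibration pairs: mean sentiment -> completion rate.
sentiment = [-0.8, -0.2, 0.1, 0.5, 0.9]
completion = [0.40, 0.55, 0.62, 0.75, 0.88]
slope, intercept = fit_line(sentiment, completion)

# Predict the metric for a new product with mean sentiment 0.3.
predicted = slope * 0.3 + intercept
```

Richer mappings (the Bayesian networks mentioned above, or multi-feature regressions over per-characteristic sentiment) would follow the same calibrate-then-predict pattern.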

Finally, the paper delineates four key research gaps that must be addressed to operationalize quality‑in‑use measurement: (1) development of automated measurement tools that integrate logs, surveys, and sentiment‑derived data; (2) creation of large, labeled, multi‑domain, multilingual review datasets; (3) formulation of rigorous models that link sentiment outputs to ISO‑defined Q‑i‑U metrics; and (4) construction of predictive quality models that support decision‑making for software acquisition, upgrade, and maintenance.

In conclusion, while existing standards and customized models offer a solid theoretical foundation, they fall short in delivering practical, cost‑effective, and user‑centric measurement solutions. Sentiment analysis of software reviews emerges as a promising avenue to overcome these limitations, provided that future work focuses on data quality assurance, multilingual handling, and robust quantitative mapping. If successfully realized, this approach could enable continuous, real‑time assessment of software quality‑in‑use, thereby informing both developers and purchasers with actionable, evidence‑based insights.

