Using Latent Semantic Analysis to Identify Quality in Use (QU) Indicators from User Reviews

The paper describes a novel approach to categorizing user reviews according to the three Quality in Use (QU) indicators defined in ISO: effectiveness, efficiency, and freedom from risk. With the tremendous number of reviews published each day, there is a need to summarize user reviews automatically in order to determine whether a piece of software meets a company's quality requirements. We implemented Latent Semantic Analysis (LSA) and its subspace method to predict QU indicators. We built a reduced-dimensionality universal semantic space from information-systems journal articles and Amazon reviews. Next, we projected a set of indicators' measurement scales into the universal semantic space and represented them as subspaces. In these subspaces, we can match similar measurement scales to unseen reviews and predict their QU indicators. Our preliminary study obtained an average F-measure of 0.3627.


💡 Research Summary

The paper tackles the problem of automatically identifying the three ISO‑defined Quality‑in‑Use (QU) indicators—effectiveness, efficiency, and freedom from risk—from large volumes of user reviews. Recognizing that modern software products generate thousands of reviews daily, the authors argue that manual analysis is infeasible for organizations that need to verify whether a product meets their quality requirements. To address this, they propose a pipeline based on Latent Semantic Analysis (LSA) that builds a “universal semantic space” from two heterogeneous corpora: scholarly articles from information systems journals and Amazon product reviews.

First, the authors collect and preprocess the texts (tokenization, stop‑word removal, stemming) and construct a TF‑IDF weighted term‑document matrix. They then apply singular value decomposition (SVD) to reduce dimensionality to a range of 300–500 latent dimensions, thereby creating a shared semantic space that is intended to capture general language patterns across domains.
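The space-construction step can be sketched in a few lines. This is a minimal, numpy-only illustration of the general TF‑IDF + truncated-SVD recipe, not the authors' code: the toy corpus, vocabulary handling, and the choice of `k = 2` are assumptions made for demonstration (the paper uses 300–500 dimensions on a far larger corpus).

```python
import numpy as np

# Toy corpus standing in for the journal articles + Amazon reviews (illustrative only).
docs = [
    "task completed quickly with no errors",
    "app crashed and lost my data",
    "fast response time and efficient workflow",
    "security risk exposed user information",
]

# Vocabulary and raw term-document count matrix.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[index[w], j] += 1

# TF-IDF weighting: term frequency scaled by inverse document frequency.
df = (counts > 0).sum(axis=1)
idf = np.log(len(docs) / df)
tfidf = counts * idf[:, None]

# Truncated SVD: keep k latent dimensions to form the reduced semantic space.
U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)
k = 2  # toy value; the paper reduces to a few hundred dimensions
Uk, sk = U[:, :k], s[:k]

# Documents in the reduced space: one k-dimensional vector per document.
doc_vecs = (np.diag(sk) @ Vt[:k]).T
print(doc_vecs.shape)  # (4, 2)
```

The key property is that `Uk` and `sk` define the shared space, so any new text weighted with the same TF‑IDF scheme can later be folded into it.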

Second, they operationalize each QU indicator by a set of measurement scales (e.g., “time to complete a task” for efficiency, “error frequency” for risk). These scales are expressed as short sentences or keyword lists, vectorized using the same TF‑IDF weighting, and projected into the universal semantic space. The collection of vectors associated with a single indicator forms a subspace that represents the semantic footprint of that indicator.

Third, an unseen user review is processed in the same way, projected into the universal space, and compared to each indicator subspace using cosine similarity. The review is assigned to the indicator whose subspace yields the highest similarity score. This approach requires no supervised training data; the mapping is driven entirely by semantic proximity.
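The projection-and-matching steps above can be sketched as follows. Everything here is illustrative: the vocabulary, the random stand-in TF‑IDF matrix, the tiny measurement-scale phrases, and the function names (`fold_in`, `predict`) are assumptions, not the paper's artifacts. The sketch uses the standard LSA fold-in formula (q̂ = Σₖ⁻¹ Uₖᵀ q) and scores a review against each indicator subspace by its best cosine similarity to any scale vector.

```python
import numpy as np

# Hypothetical pre-built pieces of a tiny semantic space (illustrative only).
vocab = ["fast", "slow", "crash", "error", "task", "time", "secure", "leak"]
rng = np.random.default_rng(0)
A = rng.random((len(vocab), 6))          # stand-in for a TF-IDF term-document matrix
U, s, _ = np.linalg.svd(A, full_matrices=False)
k = 3
Uk, sk = U[:, :k], s[:k]

def fold_in(text):
    """Project a bag-of-words text into the k-dim space (standard LSA fold-in)."""
    q = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            q[vocab.index(w)] += 1.0
    return (Uk.T @ q) / sk               # q_hat = S_k^{-1} U_k^T q

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Each indicator's measurement scales, folded in, form that indicator's subspace.
scales = {
    "efficiency": ["time task fast", "slow time"],
    "effectiveness": ["task error", "error task"],
    "freedom_from_risk": ["secure leak", "crash leak"],
}
subspaces = {ind: [fold_in(t) for t in texts] for ind, texts in scales.items()}

def predict(review):
    v = fold_in(review)
    # Assign the indicator whose subspace contains the most similar scale vector.
    return max(subspaces, key=lambda ind: max(cosine(v, u) for u in subspaces[ind]))

print(predict("the task was fast and saved time"))  # efficiency
```

Note that no labeled training reviews are involved: the decision is driven entirely by geometric proximity between the folded-in review and the scale vectors.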

The authors evaluate the method on a modestly sized dataset of reviews manually annotated with the appropriate QU indicator. Performance is measured using precision, recall, and their harmonic mean (the F‑measure). The reported average F‑measure across the three indicators is 0.3627, with individual scores of roughly 0.41 for efficiency, 0.35 for effectiveness, and 0.28 for freedom from risk.
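The evaluation metrics are standard and easy to reproduce. The sketch below computes per-class precision, recall, and F‑measure from gold and predicted labels and then macro-averages them, which is how a single average figure like 0.3627 is typically obtained; the label lists are invented toy data, not the paper's dataset.

```python
# Per-class precision, recall, and F-measure, then the macro average.
def f_scores(true, pred, labels):
    out = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        out[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return out

# Toy gold/predicted labels (illustrative only).
true = ["eff", "eff", "risk", "effect", "risk"]
pred = ["eff", "risk", "risk", "eff", "effect"]

scores = f_scores(true, pred, ["eff", "effect", "risk"])
macro = sum(scores.values()) / len(scores)
print(round(macro, 4))  # 0.3333
```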

In the discussion, the authors acknowledge that the modest performance reflects several limitations. The universal semantic space, while designed for generality, dilutes domain‑specific nuances that are crucial for distinguishing subtle quality aspects. The measurement scales are few in number and heavily dependent on expert judgment, which hampers reproducibility and scalability. Moreover, LSA’s linear algebraic foundation cannot capture polysemy or syntactic dependencies as effectively as modern contextual embeddings. The authors also note class imbalance in the labeled data and the variability of review length as additional sources of error.

Future work is outlined along four main directions: (1) replacing or augmenting LSA with transformer‑based language models (e.g., BERT, RoBERTa) to incorporate contextual information; (2) enriching the universal semantic space with domain‑specific corpora to improve relevance; (3) automating the generation of measurement scales through crowdsourcing or semi‑supervised techniques; and (4) applying advanced sampling or cost‑sensitive learning to mitigate class imbalance.

Overall, the paper contributes a novel problem formulation—automatic extraction of ISO QU indicators from user‑generated content—and demonstrates a proof‑of‑concept pipeline based on LSA and subspace projection. While the current results fall short of practical deployment, the work opens a pathway for integrating semantic analysis with software quality assessment, provided that more sophisticated language models, richer labeled data, and robust evaluation protocols are incorporated in subsequent research.

