The Diversity of Gamification Evaluation in the Software Engineering Education and Industry: Trends, Comparisons and Gaps
Gamification has been used to motivate and engage participants in software engineering education and practice activities. There is a significant demand for empirical studies to understand the impacts and efficacy of gamification. However, the lack of standard procedures and models for the evaluation of gamification is a challenge for the design, comparison, and reporting of results related to the assessment of gamification approaches and their effects. The goal of this study is to identify models and strategies for the evaluation of gamification reported in the literature. To achieve this goal, we conducted a systematic mapping study to investigate strategies for the evaluation of gamification in the context of software engineering. We selected 100 primary studies on gamification in software engineering (from 2011 to 2020). We categorized the studies regarding the presence of evaluation procedures or models for the evaluation of gamification, the purpose of the evaluation, the criteria used, the type of data, instruments, and procedures for data analysis. Our results show that 64 studies report procedures for the evaluation of gamification. However, only three studies actually propose evaluation models for gamification. We observed that the evaluation of gamification focuses on two aspects: the evaluation of the gamification strategy itself, related to the user experience and perceptions; and the evaluation of the outcomes and effects of gamification on its users and context. The most recurring criteria for the evaluation are ‘engagement’, ‘motivation’, ‘satisfaction’, and ‘performance’. Finally, the evaluation of gamification requires a mix of subjective and objective inputs, and qualitative and quantitative data analysis approaches. Depending on the focus of the evaluation (the strategy or the outcomes), one type of data and analysis predominates.
💡 Research Summary
The paper conducts a systematic mapping study to examine how gamification is evaluated within software engineering education and industry contexts. The authors searched the literature from 2011 to 2020 using a combination of “gamification” and “software engineering” keywords, initially retrieving 1,342 records. After removing duplicates and applying inclusion criteria—empirical studies that explicitly describe an evaluation procedure or model—100 primary studies were selected for detailed analysis.
Each study was coded along six dimensions: (1) presence of an evaluation procedure, (2) purpose of the evaluation (whether it targets the gamification strategy itself or its outcomes), (3) evaluation criteria (e.g., engagement, motivation, satisfaction, performance), (4) type of data collected (subjective versus objective), (5) measurement instruments (surveys, log files, interviews, tests, etc.), and (6) data‑analysis techniques (statistical tests, thematic analysis, machine learning, mixed methods). The mapping reveals that 64% of the papers report some form of evaluation procedure, yet only three papers (3%) actually propose a structured evaluation model. This discrepancy highlights a critical gap: while researchers broadly agree on what to evaluate, they lack consensus on how to evaluate gamification in a systematic, repeatable way.
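The six-dimension coding scheme could be represented as a small data structure; the following sketch is illustrative only — the field names and example codings are hypothetical, not the authors' actual codebook:

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class StudyCoding:
    """One primary study, coded along the six dimensions of the mapping."""
    has_evaluation: bool    # (1) reports an evaluation procedure?
    focus: str              # (2) "strategy" or "outcomes"
    criteria: tuple         # (3) e.g. ("engagement", "motivation")
    data_type: str          # (4) "subjective" or "objective"
    instruments: tuple      # (5) e.g. ("survey", "log files")
    analysis: str           # (6) e.g. "statistical", "thematic", "mixed"

# Hypothetical codings for three studies, to show how the scheme tallies up.
corpus = [
    StudyCoding(True, "strategy", ("engagement",), "subjective",
                ("survey",), "statistical"),
    StudyCoding(True, "outcomes", ("performance",), "objective",
                ("log files",), "statistical"),
    StudyCoding(False, "strategy", (), "subjective",
                ("interview",), "thematic"),
]

# Aggregate counts of the kind reported in the mapping.
evaluated = sum(s.has_evaluation for s in corpus)
criteria_counts = Counter(c for s in corpus for c in s.criteria)
print(evaluated)          # number of studies reporting an evaluation procedure
print(dict(criteria_counts))
```

Tallying such records per dimension is what yields frequency results like "64 of 100 studies report an evaluation procedure."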
The analysis identifies two dominant evaluation foci. The first concerns the gamification strategy itself, emphasizing user experience, perception, and affective responses. Studies in this category predominantly rely on self‑report questionnaires, using Likert‑scale items to capture motivation, satisfaction, immersion, and perceived fun. The second focus assesses the outcomes of gamification, such as learning gains, productivity improvements, or software quality enhancements. Here, researchers tend to collect objective metrics—log data, task completion times, defect counts, code quality scores, or academic grades.
Across the corpus, the most frequently cited evaluation criteria are engagement, motivation, satisfaction, and performance. These criteria are largely operationalized through established psychometric scales, with limited attention to broader, context‑specific indicators like long‑term knowledge retention, collaborative skill development, cost‑benefit analysis, or organizational impact. Consequently, the current body of work provides a relatively narrow view of gamification’s effectiveness.
Data‑type and analysis‑method patterns also emerge. Subjective data are most often analyzed using quantitative statistical techniques (t‑tests, ANOVA, regression), accounting for 55 % of the studies. Qualitative data—interviews, open‑ended responses, observation notes—are subjected to thematic or content analysis in about 30 % of the cases. Only roughly 15 % of the papers employ a mixed‑methods approach that integrates both quantitative and qualitative insights. This suggests a methodological bias toward single‑method designs, despite the complex, multi‑dimensional nature of gamification interventions.
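As a concrete instance of the quantitative techniques the corpus favors, a Welch's t-test on Likert-scale motivation scores can be computed from first principles. The two samples below are invented for illustration; real studies would compare a gamified group against a control:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    se2_a = variance(a) / na          # squared standard error, sample a
    se2_b = variance(b) / nb          # squared standard error, sample b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch–Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, df

# Invented 5-point Likert motivation scores for two hypothetical groups.
gamified = [4, 5, 4, 3, 5, 4, 4, 5]
control  = [3, 3, 4, 2, 3, 4, 3, 3]

t, df = welch_t(gamified, control)
print(f"t = {t:.2f}, df = {df:.1f}")  # compare against a t-distribution for a p-value
```

The same comparison could of course be delegated to a statistics library; the point is that studies focused on outcomes typically reduce to exactly this kind of two-group comparison on a single criterion.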
The authors conclude with several implications. First, the scarcity of formal evaluation models calls for the development of comprehensive, reusable frameworks that can guide researchers and practitioners in designing robust assessment protocols. Second, future evaluations should balance the two foci, combining subjective user‑experience measures with objective performance metrics to capture both perceived and actual effects. Third, expanding the set of evaluation criteria to include long‑term learning outcomes, teamwork dynamics, and economic efficiency will provide a richer understanding of gamification’s value proposition. Fourth, mixed‑methods designs should be encouraged to triangulate findings and mitigate the limitations inherent in purely quantitative or qualitative approaches.
In sum, this systematic mapping study offers a detailed snapshot of the current state of gamification evaluation in software engineering. It identifies a pronounced emphasis on short‑term, affective outcomes, a reliance on self‑report instruments, and a lack of standardized evaluation models. By highlighting these trends, comparisons, and gaps, the paper sets the agenda for future research to develop more rigorous, holistic, and context‑aware evaluation practices that can substantiate the claimed benefits of gamification in both educational and industrial settings.