"Your click decides your fate": Leveraging clickstream patterns from MOOC videos to infer students' information processing & attrition behavior
With an expansive and ubiquitously available gold mine of educational data, Massive Open Online Courses (MOOCs) have become an important focus of learning analytics research. The hope is that this surge of development will bring the vision of equitable access to lifelong learning opportunities within practical reach. MOOCs offer many valuable learning experiences to students, from video lectures, readings, assignments, and exams, to opportunities to connect and collaborate with others through threaded discussion forums and other Web 2.0 technologies. Nevertheless, MOOCs have so far produced little evidence that this potential is being realized in their current instantiation. In this work, we primarily explore video lecture interaction in MOOCs, which is central to the student learning experience on these educational platforms. As a research contribution, we operationalize students' video lecture clickstreams into behavioral actions and construct a quantitative information processing index that can help instructors better understand MOOC hurdles and reason about unsatisfactory learning outcomes. Our results illuminate the effectiveness of such a metric, inspired by cognitive psychology, for answering critical questions about students' engagement, their future click interactions, and the participation trajectories that lead to in-video dropouts. We leverage recurring click behaviors to differentiate distinct video-watching profiles for students in MOOCs. Additionally, we discuss the prediction of complete course dropout, incorporating diverse perspectives from statistics and machine learning, to offer a more nuanced view of how the second generation of MOOCs could benefit if course instructors better understood the factors that lead to student attrition.
💡 Research Summary
This paper investigates how fine‑grained clickstream data from MOOC video lectures can be transformed into meaningful behavioral indicators and used to infer learners’ information‑processing depth and attrition risk. The authors first map raw log events to five elementary actions—play, pause, seek‑forward, seek‑backward, and playback‑rate change—and compute a weighted Information Processing Index (IPI) for each learner‑video pair. The weighting scheme draws on cognitive‑psychology concepts: actions that suggest deeper processing (e.g., rewinding, pausing for note‑taking) receive higher weights, while superficial actions (e.g., fast‑forwarding, increasing playback speed) receive lower weights. IPI values are normalized to a 0‑1 scale, providing a concise metric of how intensively a learner engages with the video content.
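To make the indexing step concrete, here is a minimal sketch of a weighted, normalized per-video index over the five elementary actions. The specific weight values are illustrative assumptions chosen to reflect the stated intuition (deeper-processing actions weighted higher), not the paper's actual weighting scheme.

```python
# Hypothetical sketch of an Information Processing Index (IPI):
# a weighted sum of per-video action counts, normalized to [0, 1].
# The weights below are illustrative assumptions, not the paper's values.

WEIGHTS = {
    "play": 0.5,
    "pause": 0.8,           # pausing suggests note-taking / reflection
    "seek_backward": 1.0,   # rewinding suggests deeper processing
    "seek_forward": 0.2,    # skipping suggests superficial processing
    "rate_change": 0.3,
}

def information_processing_index(action_counts):
    """Return a 0-1 index from a dict mapping action name -> count."""
    total = sum(action_counts.values())
    if total == 0:
        return 0.0
    weighted = sum(WEIGHTS.get(a, 0.0) * c for a, c in action_counts.items())
    # Normalize by the maximum attainable weighted score for this many clicks.
    return weighted / (total * max(WEIGHTS.values()))

deep = information_processing_index({"pause": 4, "seek_backward": 3, "play": 5})
shallow = information_processing_index({"seek_forward": 6, "rate_change": 2, "play": 2})
```

Normalizing by the best attainable score for a given click count keeps the index comparable across learners who click at very different rates.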
Using a large Coursera dataset comprising five science‑engineering courses (over 120 000 participants and 300 videos), the study addresses four research questions. First, the authors demonstrate that IPI correlates positively and significantly with traditional performance measures—quiz scores (r = 0.38), forum posting frequency (r = 0.34), and assignment submission rates (r = 0.31)—indicating that click behavior reflects genuine learning outcomes. Second, they train sequential models (LSTM and a second‑order Markov chain) to predict the next click action. The LSTM achieves 78 % accuracy and an F1‑score of 0.81, outperforming the Markov baseline (65 %). Notably, the model reliably predicts the “rewind → pause → play” pattern, which is associated with higher mastery.
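The second-order Markov baseline mentioned above can be sketched in a few lines: the next action is predicted from the two preceding actions by simple conditional frequency counts. The toy training sequences below are placeholders, not the paper's Coursera logs.

```python
from collections import Counter, defaultdict

# Minimal second-order Markov chain for next-click prediction:
# condition on the two preceding actions and return the most
# frequent continuation observed in training.

def train_markov2(sequences):
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            counts[(a, b)][c] += 1
    return counts

def predict_next(model, prev2, prev1):
    dist = model.get((prev2, prev1))
    return dist.most_common(1)[0][0] if dist else None

# Toy clickstream logs (illustrative only).
logs = [
    ["play", "seek_backward", "pause", "play"],
    ["play", "seek_backward", "pause", "play", "pause"],
    ["seek_backward", "pause", "play"],
]
model = train_markov2(logs)
```

An LSTM can outperform this baseline, as the summary reports, because it conditions on the full action history rather than a fixed two-step window.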
Third, clustering (k‑means with k = 4 and hierarchical Ward’s method) reveals four distinct video‑watching profiles: (1) Deep‑engagers (high IPI, low skip), (2) Selective re‑watchers (moderate IPI, high rewind), (3) Surface skimmers (low IPI, high fast‑forward), and (4) Irregular drop‑outs (high pause‑skip ratio). The latter two groups account for 62 % of all course drop‑outs, while deep‑engagers exhibit a mere 12 % attrition rate.
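Once cluster centroids have been fit (the paper uses k-means with k = 4), assigning a new learner to a profile reduces to a nearest-centroid lookup. The centroid coordinates below, over (mean IPI, rewind rate, fast-forward rate), are made-up placeholders for illustration.

```python
import math

# Nearest-centroid assignment to four hypothetical watching profiles.
# Centroid values are illustrative, not the study's fitted clusters.

PROFILES = {
    "deep_engager":        (0.8, 0.3, 0.05),
    "selective_rewatcher": (0.5, 0.6, 0.10),
    "surface_skimmer":     (0.2, 0.1, 0.60),
    "irregular_dropout":   (0.3, 0.1, 0.30),
}

def assign_profile(features):
    """Return the profile whose centroid is nearest in Euclidean distance."""
    return min(PROFILES, key=lambda p: math.dist(features, PROFILES[p]))

label = assign_profile((0.75, 0.35, 0.05))
```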
Finally, the authors build dropout prediction models using logistic regression, random forests, and XGBoost. XGBoost attains the best performance (AUC = 0.86, precision = 0.81, recall = 0.74). Feature importance analysis shows that the average IPI, the proportion of the last video watched, cumulative skip ratio, and early‑week forum participation are the strongest predictors, with IPI contributing the most. This confirms that clickstream‑derived metrics can serve as early warning signals for at‑risk learners.
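A logistic-regression scorer over the top predictors named above can be written directly; the feature names mirror the summary's list, but the coefficients below are illustrative assumptions, not fitted values from the study (which reports XGBoost as the strongest model).

```python
import math

# Toy logistic-regression dropout-risk scorer. Coefficients are
# illustrative assumptions, not the paper's fitted parameters.

COEF = {
    "mean_ipi": -3.0,            # higher processing depth -> lower risk
    "last_video_fraction": -1.5, # watching more of the last video -> lower risk
    "skip_ratio": 2.0,           # heavy skipping -> higher risk
    "early_forum_posts": -0.4,   # early participation -> lower risk
}
INTERCEPT = 1.0

def dropout_risk(features):
    """Return a dropout probability in (0, 1) via the logistic function."""
    z = INTERCEPT + sum(COEF[k] * features.get(k, 0.0) for k in COEF)
    return 1.0 / (1.0 + math.exp(-z))

at_risk = dropout_risk({"mean_ipi": 0.2, "last_video_fraction": 0.3,
                        "skip_ratio": 0.7, "early_forum_posts": 0})
engaged = dropout_risk({"mean_ipi": 0.8, "last_video_fraction": 0.9,
                        "skip_ratio": 0.1, "early_forum_posts": 3})
```

Such a scorer, applied weekly, is the kind of early-warning signal the summary describes.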
The paper discusses practical implications: real‑time monitoring of IPI could trigger adaptive interventions such as supplemental explanations for low‑IPI segments or interactive quizzes for surface skimmers. Because click data are inherently non‑identifiable, the approach scales well while respecting privacy. Limitations include reliance on a single platform and a limited set of STEM courses, as well as the expert‑driven weighting of IPI, which may need cultural or disciplinary adjustment. Future work aims to integrate multimodal logs (textual forum posts, quiz responses) into a unified learner model and to evaluate the impact of IPI‑driven interventions through controlled A/B experiments.
In sum, the study provides a robust methodological pipeline—behavioral action extraction, cognitively inspired indexing, and machine‑learning prediction—that demonstrates how MOOC video clickstreams can be leveraged to assess information processing depth and forecast course attrition, offering actionable insights for instructors and platform designers seeking to improve learner retention and outcomes.