Capturing "attrition intensifying" structural traits from didactic interaction sequences of MOOC learners


This work is an attempt to discover hidden structural configurations in the learning activity sequences of students in Massive Open Online Courses (MOOCs). Leveraging combined representations of video clickstream interactions and forum activities, we seek to understand traits that are predictive of decreasing engagement over time. Grounded in the interdisciplinary field of network science, we follow a graph-based approach to extract indicators of active and passive MOOC participation that reflect persistence and regularity in the overall interaction footprint. Using these rich educational semantics, we focus on the problem of predicting student attrition, one of the major themes of the MOOC literature in recent years. Our results indicate an improvement over a baseline n-gram-based approach in capturing “attrition intensifying” features from the learning activities that MOOC learners engage in. Implications for future research are discussed.


💡 Research Summary

The paper tackles the persistent problem of learner attrition in Massive Open Online Courses (MOOCs) by modeling combined video click‑stream and discussion‑forum activities as directed, weighted graphs. Each raw event—such as video play, pause, forward/backward seek, rate change, as well as forum posts, comments, thread starts, up‑votes, down‑votes, and view actions—is first ordered chronologically and then transformed into a weekly interaction footprint. From these footprints the authors extract three families of features: (1) n‑grams of length 2‑5 to capture short sequential patterns; (2) proportions of “active” versus “passive” behaviors for both video and forum domains (e.g., seeking or commenting are active, while simple play or view actions are passive); and (3) a suite of graph‑theoretic metrics computed on a graph built by linking each pair of consecutive activities with a directed edge of weight one. The graph metrics include node and edge counts, density (allowing values > 1 due to self‑loops), number of self‑loops, number of strongly connected components (SCC), the top three activities by indegree centrality, and the edge with highest betweenness centrality. Control variables such as course week, user week, and a categorical indicator of activity type (video‑only, forum‑only, both, or none) are also added.
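The graph construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the activity labels (`play`, `seek_fw`, `view_thread`, and so on) and the function names are hypothetical, density is assumed to be the number of distinct edges divided by n(n-1), and in-degree is assumed to count distinct incoming edges. Edge betweenness is left out to keep the example short.

```python
from collections import Counter, defaultdict


def build_graph(sequence):
    """Map each directed edge (a, b) to its weight.

    Every pair of consecutive activities contributes weight 1, so a
    repeated transition accumulates weight (an assumption of this sketch).
    """
    return Counter(zip(sequence, sequence[1:]))


def count_sccs(nodes, edges):
    """Count strongly connected components with an iterative Kosaraju pass."""
    succ, pred = defaultdict(list), defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)

    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:  # all successors explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    assigned, components = set(), 0
    for start in reversed(order):
        if start in assigned:
            continue
        components += 1
        assigned.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for p in pred[node]:
                if p not in assigned:
                    assigned.add(p)
                    stack.append(p)
    return components


def graph_metrics(sequence):
    """Compute a subset of the structural features for one activity sequence."""
    edges = build_graph(sequence)
    nodes = sorted(set(sequence))
    n, m = len(nodes), len(edges)
    indegree = Counter(b for (_, b) in edges)
    # Ties in in-degree are broken alphabetically (a choice of this sketch).
    ranked = sorted(indegree.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "nodes": n,
        "edges": m,
        # Self-loops count as edges, so density can exceed 1.
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "self_loops": sum(1 for a, b in edges if a == b),
        "scc": count_sccs(nodes, edges),
        "top_indegree": [name for name, _ in ranked[:3]],
    }
```

For the hypothetical sequence `["play", "pause", "play", "seek_fw", "play", "play", "view_thread", "post", "view_thread"]`, this sketch yields 5 nodes, 8 distinct edges, a density of 0.4, one self-loop, and two strongly connected components (the video activities and the forum activities). The sequence `["a", "a", "b", "b", "a"]` gives a density of 2.0, showing how self-loops push the value above 1.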

Two experimental setups are evaluated on a Coursera course dataset (≈14 k students with valid video logs, ≈31 k forum view events, ≈48 k posts/comments). In the “Curr” setup, features from the current week only are used to predict dropout in the following week; in the “TCurr” setup, all features accumulated from the start of participation up to the current week are used. Classification models (logistic regression and random forests) are trained and compared against a baseline that relies solely on n‑grams. Results show that the TCurr configuration yields higher means and standard deviations for all graph metrics, reflecting the long‑tailed distribution typical of online communities (the 90‑9‑1 rule). Central activities and transitions are predominantly passive (e.g., view‑thread, rate‑change), suggesting that students whose interaction graphs are dominated by such patterns are more likely to disengage. Overall, the graph‑based feature set improves predictive performance over the n‑gram baseline, demonstrating that structural information about how learners switch between video and forum actions provides valuable signals of impending attrition.
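The difference between the two setups, together with the n-gram baseline, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's pipeline: the `weekly` mapping from week number to that week's ordered activities is a hypothetical data layout, the function names are invented, and "accumulated" is interpreted here as concatenating the weekly sequences in order, which is only one possible reading.

```python
from collections import Counter


def ngram_counts(sequence, n_min=2, n_max=5):
    """Count all contiguous n-grams of length n_min..n_max (baseline features)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(sequence) - n + 1):
            counts[tuple(sequence[i:i + n])] += 1
    return counts


def curr_sequence(weekly, week):
    """Curr setup: only the activities of the current week."""
    return list(weekly.get(week, []))


def tcurr_sequence(weekly, week):
    """TCurr setup: every activity from the first recorded week up to `week`.

    Concatenation introduces a transition across each week boundary,
    which is an assumption of this sketch.
    """
    merged = []
    for w in sorted(weekly):
        if w <= week:
            merged.extend(weekly[w])
    return merged
```

With `weekly = {1: ["play", "pause"], 2: ["play", "seek_fw", "play"]}` and `week = 2`, the Curr sequence has three activities and the TCurr sequence has five. Longer accumulated sequences naturally produce more n-grams and larger graphs, which is consistent with the higher means reported for TCurr.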

The study contributes a novel multi‑modal graph representation for MOOC behavior, highlights the importance of distinguishing active versus passive engagement, and offers a foundation for future work that could address data sparsity, incorporate longer‑range temporal dependencies, and integrate deep sequence models for even more robust dropout prediction and timely intervention.

