Sequence Analysis of Learning Behavior in Different Consecutive Activities
The purpose of this research is to study whether students can be identified statistically by analyzing their behavior across consecutive learning activities. The project covers three types of activity: animated examples, basic examples, and parameterized exercises. We extracted each student's behavior from the activity logs of the Mastery Grids platform. Additionally, we use an unsupervised learning technique to investigate whether students share common patterns while performing these activities. We conclude that students can be identified from their behavior and that some common patterns exist.
💡 Research Summary
This paper investigates whether students can be identified and grouped based on their interaction patterns across three consecutive activity types (animated examples, basic examples, and parameterized exercises) within the Mastery Grids learning environment used for an introductory object‑oriented programming course at the University of Pittsburgh. The authors collected log data over two semesters, covering 44 students who completed all three activity types. From each log entry they extracted the student identifier, session, topic, and activity name, and then transformed raw timestamps into binary duration labels (above or below the median) for each activity. For parameterized exercises they also encoded correctness (Pass/Fail), resulting in a compact symbolic representation such as “AnEx p F p”, where each symbol conveys both the activity type and a temporal or performance attribute.
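The labeling step can be illustrated with a minimal sketch. It assumes each log event has already been reduced to an activity name, a duration in seconds, and an optional pass/fail flag. The function name `label_events`, the suffixes `_long`, `_short`, `_P`, and `_F`, and the rule that a duration equal to the median counts as short are all hypothetical choices for illustration, not the paper's actual encoding.

```python
from statistics import median

def label_events(events):
    """Turn (activity, seconds, passed) tuples into symbolic labels.

    `passed` is None for activities without a correctness outcome.
    The median is computed separately for each activity type.
    """
    durations = {}
    for activity, seconds, _ in events:
        durations.setdefault(activity, []).append(seconds)
    medians = {a: median(v) for a, v in durations.items()}

    symbols = []
    for activity, seconds, passed in events:
        # Assumption: ties with the median are labeled "short".
        symbol = activity + ("_long" if seconds > medians[activity] else "_short")
        if passed is not None:
            symbol += "_P" if passed else "_F"
        symbols.append(symbol)
    return symbols

# Hypothetical events: animated examples (AnEx) and parameterized exercises (ParEx).
events = [
    ("AnEx", 30, None), ("AnEx", 10, None), ("AnEx", 20, None),
    ("ParEx", 50, True), ("ParEx", 5, False), ("ParEx", 20, True),
]
print(label_events(events))
```

A median split keeps the alphabet small, which keeps the later pattern mining tractable, at the cost of discarding finer timing differences.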
To discover frequent behavioral motifs, the authors applied the SPAM (Sequential PAttern Mining) algorithm, a bitmap‑based approach well‑suited for large sequential datasets. After empirical tuning they set the minimum support to 4 % of all sequences, the maximum gap between consecutive items to 1, and required a minimum pattern length of two items. This configuration yielded 15 common patterns from roughly 650 constructed sequences. Pattern frequencies were normalized per student; a small smoothing constant (0.0001) was added for patterns that never occurred, preventing division‑by‑zero issues in later distance calculations.
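To make the support threshold and the smoothing step concrete, here is a rough sketch. It is a brute-force stand-in, not the SPAM algorithm. It assumes that a maximum gap of 1 means pattern items must be adjacent in the sequence, and it caps pattern length with a hypothetical `max_len` parameter. The exact normalization used in the paper is not specified, so `pattern_profile` simply divides smoothed counts by their sum.

```python
from collections import Counter

def contiguous_patterns(seq, min_len=2, max_len=4):
    """Yield every run of min_len..max_len adjacent symbols in seq."""
    for n in range(min_len, max_len + 1):
        for i in range(len(seq) - n + 1):
            yield tuple(seq[i:i + n])

def mine_frequent(sequences, min_support=0.04, min_len=2, max_len=4):
    """Return patterns contained in at least min_support of all sequences."""
    support = Counter()
    for seq in sequences:
        # set(): a sequence contributes at most once to a pattern's support
        support.update(set(contiguous_patterns(seq, min_len, max_len)))
    threshold = min_support * len(sequences)
    return sorted(p for p, c in support.items() if c >= threshold)

def pattern_profile(sequences, patterns, eps=0.0001):
    """Per-student distribution over a non-empty pattern list.

    Patterns that never occur receive the small constant eps so that
    no entry of the resulting distribution is zero.
    """
    wanted = set(patterns)
    longest = max(len(p) for p in patterns)
    counts = Counter(
        p for seq in sequences
        for p in contiguous_patterns(seq, 2, longest) if p in wanted
    )
    raw = [counts[p] if counts[p] else eps for p in patterns]
    total = sum(raw)
    return [v / total for v in raw]

# Hypothetical symbol sequences from all students.
all_sequences = [["A", "B", "C"], ["A", "B"], ["B", "C", "A"], ["C", "A", "B"]]
patterns = mine_frequent(all_sequences, min_support=0.5)
print(patterns)
print(pattern_profile([["A", "B"], ["A", "B", "C"]], patterns))
```

With a 4 % threshold and about 650 sequences, a pattern would need to appear in roughly 26 sequences to be kept.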
The stability of these patterns was examined by randomly splitting each student’s sequence list into two halves and treating each half as a probability distribution over the discovered patterns. Two symmetric measures were used to compare these distributions: Jensen‑Shannon divergence (a symmetrized Kullback‑Leibler divergence) and cosine similarity. For every student, the intra‑student distance (self‑distance, between the student's own two halves) was significantly smaller than the inter‑student distance (distance‑to‑other), as confirmed by paired‑sample t‑tests (p < 0.001 for both metrics). Because the halves come from a random split, this result indicates that individual behavioral signatures are consistent across a student's own sequences and not merely random fluctuations.
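The self‑distance versus distance‑to‑other comparison can be sketched as follows. The Jensen‑Shannon divergence here uses base‑2 logarithms, so it lies between 0 and 1, and the cosine measure is expressed as a distance (one minus cosine similarity). The student identifiers and the two half‑distributions per student are invented for illustration, and the way distances to other students are averaged is an assumption.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cosine_distance(p, q):
    """One minus cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def self_and_other(halves, student, dist):
    """Distance between a student's halves, and mean distance to others' halves."""
    first, second = halves[student]
    others = [h for s, pair in halves.items() if s != student for h in pair]
    other = sum(dist(first, h) for h in others) / len(others)
    return dist(first, second), other

# Hypothetical pattern distributions for the two halves of each student.
halves = {
    "s1": ([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]),
    "s2": ([0.1, 0.2, 0.7], [0.2, 0.1, 0.7]),
}
for student in halves:
    print(student, self_and_other(halves, student, js_divergence))
```

Collecting the two values for every student gives the paired samples on which a paired t‑test can be run.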
Beyond identification, the authors explored whether students naturally form distinct behavioral groups. Using hierarchical agglomerative clustering with Ward’s linkage and forcing the solution to two clusters (k = 2), they derived two interpretable groups. Cluster 1 consists of students who devote a larger proportion of their time to parameterized exercises, repeatedly attempt these tasks, and achieve higher pass rates. Cluster 2, in contrast, emphasizes basic and animated examples, spending comparatively less time on the parameterized component. Visualizations (dendrogram and bar charts) illustrate these differences clearly. The authors argue that such clustering reveals divergent learning strategies that could inform adaptive tutoring or targeted interventions.
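As a rough illustration of Ward's linkage cut at two clusters, the sketch below uses a naive pure‑Python agglomeration. At each step it merges the pair of clusters whose union increases the within‑cluster sum of squares the least, which is Ward's criterion. The input vectors are hypothetical per‑student pattern distributions; the paper's actual feature vectors and tooling are not specified, and a library routine would normally replace this loop.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(vectors, k=2):
    """Greedily merge clusters by Ward's criterion until k remain.

    Returns clusters as sorted lists of row indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([vectors[x] for x in clusters[i]],
                                 [vectors[x] for x in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical profiles: two exercise-heavy and two example-heavy students.
profiles = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
print(ward_clusters(profiles, k=2))
```

Fixing k = 2 forces a binary split even if the data contain more gradations, which is one reason the cluster count deserves scrutiny.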
Overall, the study demonstrates three key contributions: (1) a pipeline for converting raw educational logs into a symbolic sequential representation that captures both temporal and performance dimensions; (2) evidence that frequent sequential patterns can serve as reliable identifiers of individual learners, with statistical validation of pattern stability; (3) a proof‑of‑concept that unsupervised clustering can uncover meaningful subpopulations with distinct study habits.
Nevertheless, the work has notable limitations. The sample size is modest (44 participants), and the analysis does not incorporate outcome variables such as final grades, assignment scores, or prior knowledge, which would be essential for linking behavioral patterns to academic success. The binary duration labeling based on median splits may obscure finer‑grained timing information, and the choice of only two clusters may oversimplify the spectrum of learning behaviors. Future research should expand the dataset, integrate richer performance metrics, and experiment with more sophisticated temporal models (e.g., hidden Markov models or recurrent neural networks) to capture dynamic changes in behavior. Moreover, linking identified patterns to pedagogical outcomes could enable the design of heuristic or reinforcement‑learning‑based tutoring systems that proactively guide students toward more effective learning trajectories.