Analysing Knowledge Construction in Online Learning: Adapting the Interaction Analysis Model for Unstructured Large-Scale Discourse
The rapid expansion of online courses and social media has generated large volumes of unstructured learner-generated text. Understanding how learners construct knowledge in these spaces is crucial for analysing learning processes, informing content design, and providing feedback at scale. However, existing approaches typically rely on manual coding of well-structured discussion forums, which does not scale to the fragmented discourse found in online learning. This study proposes and validates a framework that combines a codebook inspired by the Interaction Analysis Model with an automated classifier to enable large-scale analysis of knowledge construction in unstructured online discourse. We adapt four comment-level categories of knowledge construction: Non-Knowledge Construction, Share, Explore, and Integrate. Three trained annotators coded a balanced sample of 20,000 comments from YouTube education channels. The codebook demonstrated strong reliability, with Cohen’s kappa = 0.79 on the main dataset and 0.85–0.93 across four additional educational domains. For automated classification, bag-of-words baselines were compared with transformer-based language models using 10-fold cross-validation. A DeBERTa-v3-large model achieved the highest macro-averaged F1 score (0.841), outperforming all baselines and other transformer models. External validation on four domains yielded macro-F1 above 0.705, with stronger transfer in medicine and programming, where discourse was more structured and task-focused, and weaker transfer in language and music, where comments were more varied and context-dependent. Overall, the study shows that theory-driven, semi-automated analysis of knowledge construction at scale is feasible, enabling the integration of knowledge-construction indicators into learning analytics and the design of online learning environments.
💡 Research Summary
The paper addresses the growing need to analyze massive amounts of learner‑generated text that appear on unstructured platforms such as YouTube. Traditional approaches to studying knowledge construction rely on manual coding of well‑structured discussion forums, which does not scale to the fragmented, short, and often asynchronous comments typical of social media. To overcome this limitation, the authors adapt the Interaction Analysis Model (IAM) – originally a five‑phase framework describing progressive collaborative knowledge building – to the level of individual comments. They collapse the original phases into four pragmatic categories: Non‑Knowledge Construction, Share, Explore, and Integrate. This reconceptualization treats IAM phases as epistemic functions that can surface in isolated contributions, making the model applicable to non‑linear discourse.
A balanced sample of 20,000 YouTube education comments was annotated by three trained coders using the newly developed codebook. Inter‑rater reliability was strong, with Cohen’s κ = 0.79 on the primary dataset and κ ranging from 0.85 to 0.93 across four additional domains (medicine, programming, language, music), demonstrating the codebook’s robustness and cross‑domain validity.
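The agreement statistic reported above can be made concrete with a short, self-contained sketch of Cohen's kappa for two annotators. The category names follow the paper's codebook, but the label sequences below are purely illustrative, not the study's data:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two annotators labelling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labelled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: from each coder's marginal label frequencies,
    # assuming the two coders label independently.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Illustrative annotations using the four codebook categories:
a = ["Share", "Share", "Explore", "Non-KC", "Integrate", "Share", "Explore", "Non-KC"]
b = ["Share", "Explore", "Explore", "Non-KC", "Integrate", "Share", "Share", "Non-KC"]
print(round(cohens_kappa(a, b), 3))
```

In practice one would use a library implementation (e.g. scikit-learn's `cohen_kappa_score`); the point here is only that kappa discounts the agreement two coders would reach by chance, which is why it is a stricter reliability measure than raw percent agreement.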
For automated classification, the study compares traditional bag‑of‑words baselines (logistic regression, SVM) with several transformer‑based language models, including BERT, RoBERTa, and DeBERTa‑v3‑large. Using 10‑fold cross‑validation, DeBERTa‑v3‑large achieves the highest macro‑averaged F1 score of 0.841, outperforming all baselines and other transformers, particularly in distinguishing the more nuanced Explore and Integrate categories.
External validation is conducted on four distinct educational domains. The DeBERTa‑v3‑large model maintains macro‑F1 scores above 0.705 in all cases, with stronger transfer to medicine and programming where comments tend to be more structured and task‑focused, and weaker performance in language and music where discourse is more varied and context‑dependent. These results highlight both the promise and the limits of cross‑domain generalization.
The discussion emphasizes several contributions: (1) a theoretically grounded, comment‑level codebook that preserves the epistemic intent of IAM while accommodating fragmented discourse; (2) empirical evidence that large‑scale, balanced annotation combined with state‑of‑the‑art transformers can reliably automate knowledge‑construction detection; (3) practical implications for learning analytics, such as real‑time monitoring of cognitive engagement, targeted instructor feedback, and data‑driven instructional design. Limitations include the inability of isolated comments to capture higher‑order collaborative processes (e.g., negotiation of meaning) and the restriction to four coarse categories, which may overlook finer‑grained moves like critique or rebuttal.
Future work is suggested in three directions: integrating reply‑chain information or graph‑based representations to recover interactional context; expanding the label set to multi‑label or hierarchical schemes for richer epistemic granularity; and incorporating multimodal cues (video content, subtitles, audio) to strengthen the alignment between textual signals and underlying cognitive processes.
In sum, the study demonstrates that a theory‑driven, semi‑automated pipeline can scale knowledge‑construction analysis to large, noisy, unstructured online learning environments, opening avenues for more nuanced learning analytics and adaptive educational technologies.