Not All Students Engage Alike: Multi-Institution Patterns in GenAI Tutor Use

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

The emergence of generative artificial intelligence (GenAI) has created unprecedented opportunities to provide individualized learning support in classrooms as automated tutoring systems at scale. However, concerns have been raised that students may engage with these tools in ways that do not support learning. Moreover, student engagement with GenAI Tutors may vary across instructional contexts, potentially leading to unequal learning experiences. In this study, we utilize de-identified student interaction logs from an existing GenAI Tutor and the learning management system in which it is embedded. We systematically examined student engagement (N = 11,406) with the tool across 200 classes in ten post-secondary institutions through a two-stage pipeline: First, we identified four distinct engagement types at the conversation session level. In particular, 10.4% of sessions reflected “shallow engagement,” in which copy-pasting behavior was prevalent. Then, at the student level, we show that students transitioned across engagement types over time. However, students who exhibited shallow engagement with the tool were more likely to remain in this mode, whereas those who engaged deeply with the tool transitioned more flexibly across engagement types. Finally, at both the session and student levels, we show substantial heterogeneity in student engagement across institution selectivity and course disciplines. In particular, students from highly selective institutions were more likely to exhibit deep engagement. Together, our study advances the understanding of how GenAI Tutors are used in authentic educational settings and provides a framework for analyzing student engagement with GenAI Tutors, with implications for responsible implementation at scale.


💡 Research Summary

This paper presents a large‑scale, multi‑institutional analysis of how post‑secondary students interact with a generative AI (GenAI) tutor embedded in a commercial learning management system (LMS). Using de‑identified interaction logs from 11,406 student‑class enrollments (10,629 unique students) across 200 courses at ten universities, the authors investigate three research questions: (1) overall usage patterns and temporal/institutional variation, (2) the emergence of distinct session‑level engagement types, and (3) how individual students transition among these types over time, with attention to contextual factors such as institutional selectivity and discipline.

Data and Context
The GenAI tutor was launched in Fall 2024 and is powered by a three‑tier architecture that routes queries to a pool of ~30 large language models (including GPT‑4o, DeepSeek‑R1, GLM‑4‑Plus) and a domain‑knowledge engine that draws on instructor‑provided materials. The study focuses on the ten institutions with the highest median adoption rates and selects the 20 most‑adopted classes per institution during the Spring 2025 semester to capture stable usage after the initial novelty effect. The dataset includes timestamps, student prompts, tutor responses, LMS activity (quiz completions, scores), class‑level metadata (department, STEM vs. non‑STEM), and institution‑level attributes (highly selective vs. not). No demographic or grade data are available.

Methodology
A two‑stage learning‑analytics pipeline is employed. In Stage 1, each conversation session is represented by 12 quantitative features (e.g., prompt length, copy‑paste ratio, number of follow‑up questions, semantic similarity to prior sessions, and response length). K‑means clustering, guided by silhouette scores and the elbow method, yields four interpretable engagement clusters:

  1. Shallow Engagement – dominated by copy‑pasting, low semantic depth, high reuse of tutor output.
  2. Intermediate Engagement – moderate question complexity, limited self‑explanation.
  3. Deep Exploration – multi‑step queries, explicit requests for reasoning, frequent self‑generated follow‑ups.
  4. Repetitive Utilization – repeated queries on the same assignment, indicating procedural support.

Shallow engagement accounts for 10.4% of all sessions.

In Stage 2, student‑level sequences of session types are constructed. Transition probabilities are estimated via first‑order Markov matrices, and process mining (the α‑algorithm) visualizes common pathways. This reveals how students move between engagement modes across the weeks of a semester.
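The first‑order Markov estimate amounts to counting consecutive pairs of session labels within each student's trajectory and row‑normalizing. A minimal sketch, with illustrative labels and toy trajectories:

```python
from collections import Counter, defaultdict

STATES = ["Shallow", "Intermediate", "Deep", "Repetitive"]

def transition_matrix(sequences):
    """Estimate first-order Markov transition probabilities from
    per-student sequences of session engagement labels.

    A minimal sketch of the paper's Stage-2 analysis; states with no
    outgoing transitions are omitted from the result.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: counts[a][b] / total for b in STATES}
        for a in STATES
        if (total := sum(counts[a].values()))
    }

# Toy trajectories: shallow sessions tend to repeat themselves.
seqs = [
    ["Shallow", "Shallow", "Shallow", "Intermediate"],
    ["Deep", "Intermediate", "Deep", "Repetitive"],
    ["Shallow", "Shallow", "Deep"],
]
P = transition_matrix(seqs)
print(P["Shallow"])
```

The diagonal of such a matrix is the "stickiness" the findings refer to: a high self‑transition probability for Shallow means students entering that mode tend to stay in it.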

Findings

  • Session‑Level Patterns – Shallow sessions exhibit a 68% copy‑paste rate and the shortest average prompt count (≈2). Deep Exploration sessions contain the most prompts (an average of 7.2 per session) and the highest semantic diversity.

  • Student‑Level Transitions – Students whose dominant pattern is Shallow tend to remain in that state (≈68% self‑transition probability), indicating low flexibility. In contrast, Deep Explorers show diverse transitions: 45% shift to Intermediate, 20% to Repetitive Utilization, and only 35% stay in Deep Exploration, suggesting higher metacognitive regulation.

  • Contextual Variation – Highly selective institutions (3 of the 10) have a significantly higher proportion of Deep Exploration (34% of their sessions) compared with non‑selective schools (≈19%). STEM courses display more Repetitive Utilization (22%) and Deep Exploration (19%) than non‑STEM courses, reflecting discipline‑specific incentive structures.

  • Equity Implications – The concentration of shallow, low‑learning‑value interactions in less selective institutions raises concerns about widening educational inequality when GenAI tools are deployed at scale.
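The summary does not state which statistical test underlies the contextual comparisons; one standard choice for comparing two engagement‑type shares (e.g., the 34% vs. ≈19% Deep Exploration rates) is a pooled two‑proportion z‑test. The counts below are illustrative, not the study's actual session counts:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided, pooled two-proportion z-test.

    A hedged sketch for comparing engagement-type shares across
    institution groups; the test choice is an assumption, not the
    authors' stated method.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value via the complementary error function:
    # P(|Z| > z) = erfc(z / sqrt(2)) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative counts scaled from the reported 34% vs ~19% shares.
z, p = two_proportion_z(340, 1000, 190, 1000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

With group sizes anywhere near the study's session counts, a 15‑point gap in shares is far outside sampling noise, consistent with the "significantly higher" claim above.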

Contributions

  1. Empirical Insight – The first large‑scale, cross‑institution study of GenAI tutor usage, revealing nuanced engagement typologies beyond simple adoption metrics.
  2. Analytical Framework – A reproducible two‑stage pipeline that combines feature‑based clustering of AI‑student sessions with process‑mining of student‑level trajectories, applicable to other human‑AI interaction datasets.
  3. Policy Relevance – Evidence that contextual factors (institutional selectivity, discipline) shape engagement quality, informing targeted interventions (e.g., UI warnings for copy‑paste, scaffolded prompt‑design training, discipline‑specific tutor configurations).

Implications for Practice

  • For instructors: early onboarding that highlights the risks of shallow copy‑paste behavior, and assignment design that encourages iterative questioning rather than single‑shot answer retrieval.
  • For platform providers: implement real‑time detection of high copy‑paste ratios and offer contextual prompts encouraging deeper inquiry.
  • For administrators: allocate additional support resources (workshops, tutoring) to less selective institutions to mitigate potential inequities.
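The real‑time copy‑paste detection suggested for platform providers could be as simple as a per‑session threshold check. The threshold value and nudge text below are assumptions for illustration, not anything the paper specifies:

```python
def copy_paste_nudge(copy_paste_ratio, threshold=0.6):
    """Return a nudge message when too much of the student's prompt
    text is reused tutor output, else None.

    Both the threshold and the message are illustrative choices.
    """
    if copy_paste_ratio >= threshold:
        return "Try restating the tutor's answer in your own words."
    return None

print(copy_paste_nudge(0.72))  # high reuse -> nudge
```

A production version would likely debounce the check across turns and log the event for the kind of engagement analytics described above, rather than firing on a single session.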

Conclusion
By leveraging extensive interaction logs, the authors demonstrate that student engagement with GenAI tutors is heterogeneous, temporally dynamic, and strongly conditioned by institutional and disciplinary contexts. The proposed analytical pipeline not only advances our methodological toolkit for studying human‑AI learning interactions but also provides actionable evidence for responsible, equitable scaling of GenAI tutoring technologies in higher education.

