How K-12 Educators Use AI: LLM-Assisted Qualitative Analysis at Scale
This study investigates how K-12 educators use generative AI tools in real-world instructional contexts and how large language models (LLMs) can support scalable qualitative analysis of these interactions. Drawing on over 13,000 unscripted educator-AI conversations from an open-access platform, we examine educators’ use of AI for lesson planning, differentiation, assessment, and pedagogical reflection. Methodologically, we introduce a replicable, LLM-assisted qualitative analysis pipeline that supports inductive theme discovery, codebook development, and large-scale annotation while preserving researcher control over conceptual synthesis. Empirically, the findings surface concrete patterns in how educators prompt, adapt, and evaluate AI-generated suggestions as part of their instructional reasoning. This work demonstrates the feasibility of combining LLM support with qualitative rigor to analyze complex educator behaviors at scale and inform the design of AI-powered educational tools.
💡 Research Summary
This paper investigates how K‑12 teachers employ generative AI tools in authentic professional contexts and demonstrates a scalable, LLM‑assisted qualitative analysis pipeline for examining large‑scale teacher‑AI interaction data. Drawing on more than 13,000 unscripted conversations from an open‑access platform, the authors address two research questions: (RQ1) how can large language models, embedded within a structured human‑led workflow, support theme discovery, codebook development, and large‑scale annotation; and (RQ2) how do teachers use AI for core instructional and professional tasks.
Methodologically, the study adapts grounded‑theory coding (open, axial, selective) into a four‑stage process that integrates Claude 3.5 Haiku as a collaborative assistant. In the inductive theme discovery stage, the model generates initial codes from the raw dialogues, which researchers refine into eight high‑level themes (lesson‑goal setting, student‑level assessment, material restructuring, differentiation strategies, assessment design, feedback solicitation, professional development, and ethics/security considerations). During codebook construction, the LLM produces concise definitions and three illustrative examples for each code, markedly improving consistency and reducing the manual effort required to articulate nuanced educational categories.
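The codebook-construction step can be sketched as a simple prompt-assembly loop. The eight-theme list below comes from the paper; `build_codebook_prompt`, `build_codebook`, and the injected `call_llm` callable are hypothetical names used only to make the workflow concrete, not the authors' actual code.

```python
# Sketch of the codebook-construction stage: for each refined theme, ask
# the LLM for a concise definition plus three illustrative examples.
# The theme list is from the paper; everything else is illustrative.

THEMES = [
    "lesson-goal setting",
    "student-level assessment",
    "material restructuring",
    "differentiation strategies",
    "assessment design",
    "feedback solicitation",
    "professional development",
    "ethics/security considerations",
]

def build_codebook_prompt(theme: str) -> str:
    """Assemble the per-theme prompt sent to the model."""
    return (
        "You are assisting with qualitative coding of teacher-AI chats.\n"
        f"For the code '{theme}', write:\n"
        "1. A concise one-sentence definition.\n"
        "2. Three short illustrative example utterances.\n"
    )

def build_codebook(call_llm) -> dict:
    """Map each theme to the model's definition-plus-examples text.

    `call_llm` is any callable taking a prompt string and returning text
    (e.g., a thin wrapper around an API client); it is injected so the
    sketch stays self-contained and testable without network access.
    """
    return {theme: call_llm(build_codebook_prompt(theme)) for theme in THEMES}
```

Injecting the model call as a plain callable keeps the pipeline stage reproducible: the same prompt log can be replayed against any backend, in line with the paper's emphasis on transparent prompt logs.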
For large‑scale annotation, the pre‑defined codebook is applied to the entire corpus via prompts that ask the model to map each utterance to the most appropriate code and briefly justify the choice. The resulting automated coding achieves an average inter‑rater agreement of 0.87 against a human‑coded subset, with most discrepancies concentrated in high‑level interpretive codes (e.g., “educational judgment”), which are subsequently resolved through a second human review.
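The validation step can be illustrated as follows. The paper reports 0.87 average inter-rater agreement without naming the metric, so this sketch shows two common choices for nominal codes, raw percent agreement and Cohen's kappa, computed in plain Python on toy labels.

```python
# Compare LLM-assigned codes against human codes on a validation subset.
# The metric behind the reported 0.87 is not specified in the summary;
# both functions below are standard options, shown for illustration.
from collections import Counter

def percent_agreement(a: list, b: list) -> float:
    """Fraction of items on which the two raters assigned the same code."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list, b: list) -> float:
    """Chance-corrected agreement between two raters over nominal codes."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Expected chance agreement from each rater's marginal label frequencies.
    p_exp = sum(ca[lbl] * cb[lbl] for lbl in set(a) | set(b)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)
```

Reporting a chance-corrected statistic alongside raw agreement matters here because a few dominant codes (such as lesson planning) would inflate raw agreement on their own.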
The deductive analysis of the coded data reveals concrete usage patterns. Teachers most frequently leverage AI for lesson planning (≈45% of interactions), differentiation (≈28%), and assessment design (≈17%). Prompt types fall into three categories: goal‑statement prompts, example‑request prompts, and feedback‑request prompts. Notably, feedback‑request prompts trigger a critical evaluation loop in which teachers revise, adapt, and verify AI outputs, positioning the model as a “thought partner” rather than a substitute.
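To make the three-category prompt taxonomy concrete, here is a purely illustrative keyword heuristic. The authors classified prompts qualitatively; the keyword rules below are an assumption invented for this sketch, not their coding procedure.

```python
# Illustrative-only heuristic for the three prompt types named above.
# Real coding in the study was done qualitatively via the codebook.

def classify_prompt(text: str) -> str:
    """Assign a teacher prompt to one of the three observed prompt types."""
    t = text.lower()
    # Feedback-request: teacher asks the model to evaluate their own work.
    if any(kw in t for kw in ("feedback", "review my", "what do you think")):
        return "feedback-request"
    # Example-request: teacher asks for concrete materials or samples.
    if any(kw in t for kw in ("example", "give me", "show me", "sample")):
        return "example-request"
    # Default: teacher states an instructional goal for the model to address.
    return "goal-statement"
```

A heuristic like this could only seed, never replace, the human-led codebook: the feedback-request category in particular hinges on the evaluative intent of the utterance, not on surface keywords.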
The authors argue that the pipeline exemplifies effective human‑LLM collaboration: the LLM accelerates early coding and definition generation, cutting researcher workload by roughly 60%, while transparent prompt logs and reproducible scripts ensure methodological rigor. They also acknowledge limitations, including potential model hallucinations, bias, and the fact that the dataset originates from a single voluntary platform, which may introduce sample bias. Ethical considerations around analyzing teacher discourse are addressed through sustained human oversight, role transparency, and domain‑specific prompt engineering.
Implications for practice include the need for AI tool designers to embed clear prompt scaffolding and feedback mechanisms, and for professional development programs to teach educators how to critically engage with AI suggestions. Future work is suggested to replicate the pipeline with other LLMs, expand to diverse educational settings, and explore longitudinal impacts of AI‑augmented instructional decision‑making.
In sum, the paper contributes (1) a replicable, LLM‑assisted qualitative methodology that scales grounded‑theory analysis to tens of thousands of text units while preserving researcher control, and (2) an empirical portrait of K‑12 teachers’ real‑world AI usage, highlighting a three‑step workflow of idea generation, structuring, and validation that can inform the design of responsible, effective educational AI systems.