Exploring the Role of Tracing in AI-Supported Planning for Algorithmic Reasoning
AI-powered planning tools show promise in supporting programming learners by enabling early, formative feedback on their thinking processes prior to coding. To date, however, most AI-supported planning tools rely on students’ natural-language explanations, using LLMs to interpret learners’ descriptions of their algorithmic intent. Prior to the emergence of LLM-based systems, CS education research extensively studied trace-based planning in pen-and-paper settings, demonstrating that reasoning through stepwise execution with explicit state transitions helps learners build and refine mental models of program behavior. Despite its potential, little is known about how tracing interacts with AI-mediated feedback and whether integrating tracing into AI-supported planning tools leads to different learning processes or interaction dynamics compared to natural-language-based planning alone. We study how requiring learners to produce explicit execution traces with an AI-supported planning tool affects their algorithmic reasoning. In a between-subjects study with 20 students, tracing shifted learners away from code-like, line-by-line descriptions toward more goal-driven reasoning about program behavior. Moreover, it led to more consistent partially correct solutions, although final coding performance remained comparable across conditions. Notably, tracing did not significantly affect the quality or reliability of LLM-generated feedback. These findings reveal tradeoffs in combining tracing with AI-supported planning and inform design guidelines for integrating natural language, tracing, and coding to support iterative reasoning throughout the programming process.
💡 Research Summary
This paper investigates the impact of incorporating explicit execution tracing into AI‑supported planning tools for algorithmic reasoning. While recent AI‑driven planning environments rely primarily on free‑form natural‑language explanations interpreted by large language models (LLMs), earlier computer‑science education research highlighted the cognitive benefits of pen‑and‑paper tracing, where learners explicitly track intermediate program states. The authors ask two research questions: (RQ1) How does trace‑based planning affect learners’ planning representations and reasoning processes compared to natural‑language‑only planning? (RQ2) How does trace‑based planning influence the accuracy and perceived usefulness of LLM‑generated feedback?
To answer these questions, the authors conducted an exploratory between‑subjects study with 20 participants (10 per condition) who were proficient in introductory programming but had limited exposure to greedy algorithms. Participants were randomly assigned to either a natural‑language (NL) condition or a trace (T) condition. Both groups worked on the same introductory greedy algorithm problem – the “Jump Game” from LeetCode – using a web‑based planning interface that provided iterative LLM feedback. In the NL condition, learners wrote free‑form English descriptions of their intended algorithm. In the T condition, learners wrote the same description and filled out a step‑by‑step execution trace that recorded the values of three predefined variables (index, tracker, counter) for a given input.
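The paper's exact task variant and trace schema are not reproduced here, but the standard greedy solution to LeetCode's "Jump Game" (given an array of maximum jump lengths, can you reach the last index?), annotated with a per-step trace in the spirit of the T condition, might look like the sketch below. The `index` and `tracker` names mirror two of the study's predefined trace variables; the mapping is an assumption, and `counter` is omitted for brevity.

```python
def can_jump(nums):
    """Greedy Jump Game: can we reach the last index of nums?

    Also records a per-step trace of {index, tracker}, where tracker is
    the furthest index reachable so far (variable names chosen to echo
    the study's trace columns; the actual schema may differ).
    """
    trace = []
    tracker = 0  # furthest reachable index seen so far
    for index, jump in enumerate(nums):
        if index > tracker:  # this index can never be reached
            return False, trace
        tracker = max(tracker, index + jump)
        trace.append({"index": index, "tracker": tracker})
    return True, trace
```

Filling out such a table by hand for a concrete input, e.g. `[2, 3, 1, 1, 4]`, is essentially what the T-condition participants were asked to do before coding.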
The study protocol comprised four stages: a brief practice session, a 15‑minute planning phase with AI feedback, a 10‑minute coding phase (without AI assistance), and a post‑task survey plus semi‑structured interview. Data collected included all planning artifacts, LLM feedback, final code submissions, Likert‑scale responses on learning outcomes, cognitive load, and usability, and interview transcripts. Analyses combined quantitative measures (step count, control‑flow references, word count, semantic similarity between plan and code, coding correctness) with qualitative coding of plan content and interview excerpts. Inter‑rater reliability for plan coding was κ = 0.70.
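The reported κ = 0.70 is presumably Cohen's kappa for the two plan coders, which compares observed agreement p_o against chance agreement p_e as κ = (p_o − p_e) / (1 − p_e). A minimal sketch of that computation (the label sets and data below are illustrative, not the study's):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled alike.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    return (observed - expected) / (1 - expected)
```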
Key Findings
- Planning Representation – Trace‑based plans contained significantly fewer steps (p = 0.026) and fewer explicit references to control‑flow constructs such as if/while (p = 0.018) than NL plans. Total word count did not differ, indicating that traces led to more compact, less code‑adjacent descriptions rather than shorter text overall. Qualitative analysis revealed that T‑plans tended to bundle multiple operations into a single step and frequently included brief justifications, whereas NL‑plans listed fine‑grained, sequential actions mirroring code structure.
- Semantic Alignment – Cosine similarity between plan embeddings and final code embeddings showed no significant difference across conditions, suggesting that despite structural differences, both groups produced plans that were equally aligned with their eventual implementations.
- Coding Outcomes – Participants in the trace condition produced more partially correct intermediate solutions during coding, reflecting an iterative refinement process. However, final performance measured by test‑case pass rates and overall correctness scores did not differ significantly between groups.
- LLM Feedback Quality – The accuracy of LLM‑generated error identification and the quality of explanatory feedback were comparable across conditions. Survey responses indicated no significant differences in perceived learning gains, confidence, cognitive load, or interface usability.
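The semantic-alignment measure above is a standard one: embed the plan text and the final code with some embedding model (the specific model is not named here) and take the cosine of the angle between the two vectors. Given any two embedding vectors, the metric itself reduces to:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Values near 1 indicate a plan whose embedding points in nearly the same direction as the code's, i.e. high semantic overlap; the study found this score statistically indistinguishable between conditions.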
Implications
The study demonstrates that adding execution tracing to AI‑supported planning shifts learners toward goal‑oriented, higher‑level reasoning and yields more concise planning artifacts. However, current LLM feedback mechanisms appear insensitive to these representational changes, delivering similar quality feedback regardless of trace inclusion. Designers of future AI‑enhanced learning environments should therefore consider (a) how to surface trace information to the LLM (e.g., via richer prompts or structured input formats) so that feedback can be tailored to the learner’s state‑based reasoning, and (b) how to balance the additional effort required for tracing with its cognitive benefits. Moreover, the lack of performance differences suggests that tracing may be most valuable for scaffolding reasoning rather than directly boosting final code correctness.
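One way to "surface trace information to the LLM," as suggested in (a), is to serialize the learner's trace rows into a structured prompt rather than burying them in prose. The sketch below is purely illustrative: the function name, prompt wording, and JSON layout are assumptions, not the study's implementation.

```python
import json

def build_feedback_prompt(problem, plan_text, trace_rows):
    """Assemble an LLM feedback prompt that includes the learner's
    trace as structured JSON, so feedback can reference specific
    state transitions (hypothetical format, not the paper's)."""
    trace_json = json.dumps(trace_rows, indent=2)
    return (
        f"Problem: {problem}\n\n"
        f"Learner's plan:\n{plan_text}\n\n"
        f"Learner's execution trace (one object per step):\n{trace_json}\n\n"
        "Check whether each step's variable values follow from the plan, "
        "and identify the first step where the trace diverges."
    )
```

A structured format like this would let the feedback model cite the first inconsistent trace row directly, which free-form prose makes harder.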
Limitations and Future Work
The study’s sample size is modest and limited to a single greedy algorithm problem, which constrains generalizability. Future research should explore a broader set of programming tasks, larger and more diverse learner populations, and alternative AI models to assess whether the observed trade‑offs hold across contexts. Additionally, investigating adaptive feedback that explicitly references trace entries could reveal stronger synergies between tracing and AI assistance.
In summary, integrating explicit execution tracing into AI‑supported planning tools reshapes learners’ planning behavior toward more abstract, goal‑driven representations without degrading LLM feedback quality. Optimizing AI systems to leverage trace data promises richer, more reliable feedback and may enhance the overall efficacy of AI‑augmented programming education.