Drawing Your Programs: Exploring the Applications of Visual-Prompting with GenAI for Teaching and Assessment

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

When designing a program, novice programmers and seasoned developers alike often sketch out – or, perhaps more famously, whiteboard – their ideas. Yet despite the introduction of natively multimodal Generative AI models, work on Human-GenAI collaborative coding has remained overwhelmingly focused on textual prompts – largely ignoring the visual and spatial representations that programmers naturally use to reason about and communicate their designs. In this proposal and position paper, we argue, and provide tentative evidence, that this text-centric focus overlooks other forms of prompting GenAI models: problem decomposition diagrams, for example, can function as prompts for code generation in their own right, enabling new types of programming activities and assessments. To support this position, we present findings from a large introductory Python programming course in which students constructed decomposition diagrams that were then used to prompt GPT-4.1 for code generation. We demonstrate that current models are highly successful at generating code from student-constructed diagrams. We conclude by exploring the implications of embracing multimodal prompting for computing education, particularly in the context of assessment.


💡 Research Summary

The paper “Drawing Your Programs: Exploring the Applications of Visual‑Prompting with GenAI for Teaching and Assessment” puts forward a position that the current Human‑GenAI collaborative coding literature has been overwhelmingly text‑centric, despite the emergence of multimodal generative AI models capable of processing images, video, and speech. The authors argue that programmers—both novices and experts—regularly rely on visual artifacts such as sketches, whiteboard drawings, and decomposition diagrams to reason about and communicate program structure. Translating these visual artifacts into textual prompts adds an unnecessary cognitive step that can increase extraneous load, especially for beginners who may struggle to articulate design decisions in natural language.

The paper first surveys related work on GenAI in computing education, highlighting concerns about academic honesty, skill erosion, and the need to teach prompt engineering, testing, and debugging. It then reviews literature on sketching and diagramming, showing that visual representations support distributed cognition, improve problem‑solving success, and serve as a natural communication medium in both novice learning contexts and professional development practices.

The core contribution is a case study conducted in a large introductory Python course (N = 133). Students were tasked with designing a program for “Evil Hangman,” a variant of Hangman that maintains a maximal set of possible words after each guess. After a 50‑minute session, each student submitted a hand‑drawn program decomposition diagram (either a simple linear decomposition or a more complex annotated diagram). These images were fed to GPT‑4.1, a multimodal model that accepts image inputs, along with a short textual instruction to generate Python code from the diagram.

Results show that GPT‑4.1 successfully generated functional code for the majority of submissions. The generated programs passed a suite of behavioral tests that verified correct word‑set management, input handling, and win‑condition detection. The authors note that more intricate diagrams sometimes led to minor naming inconsistencies or omitted helper functions, but overall the model’s performance was comparable to human‑written reference solutions.
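To make the behavior those tests target concrete: the core "evil" step of Evil Hangman partitions the remaining candidate words by the letter pattern a guess would reveal, then keeps the largest family. The sketch below is our own minimal reconstruction under that description, not the students' or the authors' code:

```python
from collections import defaultdict

def evil_step(words: list[str], revealed: str, guess: str) -> tuple[str, list[str]]:
    """Partition candidate words by the pattern `guess` would reveal,
    then keep the largest family (ties broken arbitrarily)."""
    families: dict[str, list[str]] = defaultdict(list)
    for word in words:
        # Pattern shown to the player if this word were the secret.
        pattern = "".join(
            guess if word[i] == guess else revealed[i]
            for i in range(len(word))
        )
        families[pattern].append(word)
    # Keep the family that leaves the most words alive.
    best = max(families, key=lambda p: len(families[p]))
    return best, families[best]

# With words {cat, cot, dog} and guess "o", the "_o_" family (cot, dog)
# outnumbers "___" (cat), so the evil player keeps the larger set.
```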

From these findings, the authors derive several key insights:

  1. Authenticity and Alignment with Real‑World Practice – Visual prompting allows learners to use the same artifacts they would on a whiteboard, eliminating the “translation” step from diagram to text and preserving the natural workflow of software design.

  2. Reduced Extraneous Cognitive Load – By removing the intermediate textual description, students can move directly from mental model → diagram → code, which is especially beneficial for novices who have limited working memory capacity.

  3. New Assessment Paradigms – Diagrams can serve as specifications that are automatically evaluated by generating code and testing its behavior. This shifts assessment focus from code writing to problem‑specification skills, aligning with emerging competencies for Human‑GenAI collaboration.

  4. Iterative Sketch‑Generate‑Test‑Revise Activities – The feedback loop enabled by visual prompting supports formative learning activities where students iteratively refine their designs based on generated code outcomes, fostering a deeper understanding of the relationship between abstract design and concrete implementation.
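The assessment idea in point 3 (evaluating a diagram by generating code from it and then testing that code's behavior) can be sketched as a small harness. The function name and check format are our own illustrative assumptions; a real autograding pipeline would sandbox untrusted generated code rather than exec it in-process:

```python
def run_behavioral_checks(generated_code: str, checks: list) -> dict:
    """Execute model-generated code in a fresh namespace, then run each
    (name, check) pair; a check raises AssertionError on failure.
    Illustrative only: untrusted code should be sandboxed, not exec'd."""
    namespace: dict = {}
    try:
        exec(generated_code, namespace)  # "compile and load" the submission
    except Exception as exc:
        return {"__load__": f"error: {exc}"}
    results = {}
    for name, check in checks:
        try:
            check(namespace)
            results[name] = "pass"
        except AssertionError:
            results[name] = "fail"
    return results

# Usage: grade a (toy) generated program against one behavioral check.
def _adds_correctly(ns):
    assert ns["add"](2, 3) == 5

report = run_behavioral_checks("def add(a, b):\n    return a + b",
                               [("adds", _adds_correctly)])
# report == {"adds": "pass"}
```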

The paper also discusses limitations and open challenges. The lack of a standardized visual notation leads to variability in how students draw diagrams, which can affect model interpretation. Current multimodal models may struggle with complex or non‑standard symbols, suggesting a need for a “visual prompt language” or guidelines. Dependence on a single model (GPT‑4.1) raises questions about generalizability across other multimodal systems. Moreover, there is a risk that over‑reliance on AI‑generated code could diminish students’ debugging and testing skills if not carefully balanced in curriculum design.

Future work is outlined as follows: (a) expanding experiments to cover diverse programming paradigms (object‑oriented, functional) and more sophisticated diagram types (UML class diagrams, state‑machine diagrams); (b) developing tooling that integrates visual prompting into large‑scale autograding pipelines; (c) investigating hybrid human‑AI review processes where instructors and models jointly evaluate and improve student diagrams; and (d) studying the pedagogical impact of visual prompting on long‑term learning outcomes.

In conclusion, the authors present visual prompting as a promising, under‑explored modality for Human‑GenAI collaborative programming. Their preliminary evidence demonstrates that multimodal models can reliably translate student‑created decomposition diagrams into working code, opening avenues for more authentic design activities, reduced cognitive load, and novel assessment strategies in computing education. The paper calls for systematic research, standardization of visual specifications, and thoughtful curriculum integration to fully realize the educational potential of visual‑prompting with generative AI.

