Designing and Evaluating an AI-enhanced Immersive Multidisciplinary Simulation (AIMS) for Interprofessional Education
Interprofessional education has long relied on case studies and standardized patients to support teamwork, communication, and related collaborative competencies among healthcare professionals. However, traditional approaches are often limited by cost, poor scalability, and an inability to reproduce the dynamic complexity of real-world clinical scenarios. To address these challenges, we designed and developed AIMS (AI-enhanced Immersive Multidisciplinary Simulations), a virtual simulation that integrates a large language model (Gemini-2.5-Flash), a Unity-based virtual environment engine, and a character creation pipeline to support synchronized, multimodal interactions between the user and the virtual patient. AIMS was designed to enhance collaborative clinical reasoning and health promotion competencies among students from pharmacy, medicine, nursing, and social work. A formal usability testing session was conducted in which participants assumed professional roles on a healthcare team and engaged in a mix of scripted and unscripted conversations. Participants explored the patient’s symptoms, social context, and care needs. Usability issues were identified (e.g., audio routing, response latency) and used to guide subsequent refinements. Findings suggest that AIMS supports realistic, profession-specific, and contextually appropriate conversations. We discuss the technical innovations of AIMS and conclude with future directions.
💡 Research Summary
The paper presents the design, implementation, and preliminary evaluation of an AI‑enhanced Immersive Multidisciplinary Simulation (AIMS) intended to modernize interprofessional education (IPE) for health‑care students. Traditional IPE relies heavily on written case studies and standardized patients (SPs), which, while valuable, are limited by high costs, low scalability, and an inability to capture the dynamic, unpredictable nature of real clinical encounters. To address these shortcomings, the authors built AIMS by integrating three core components: (1) a Character Creation Engine that uses Reallusion Character Creator and iClone to produce a high‑fidelity virtual patient avatar (Jane Ryan) with a library of keyword‑triggered facial expressions and body gestures; (2) a Multimodal AI Engine powered by Google’s Gemini‑2.5‑Flash API that processes spoken input, applies prompt‑engineered conversational guides, and generates context‑appropriate verbal responses while also signalling the appropriate animation; and (3) a Virtual Environment Engine built in Unity and exported as a WebGL application, enabling browser‑based access to two clinical scenes (an emergency‑department triage and a primary‑care office).
The system is role‑aware: students from pharmacy, medicine, nursing, and social work each assume their professional perspective, and the AI model tailors its replies based on the detected role, ensuring that the virtual patient’s answers are profession‑specific and pedagogically relevant. For example, the patient is instructed not to disclose opioid use on the first medication‑history question, prompting learners to practice probing techniques. The architecture allows real‑time speech‑to‑text conversion, AI‑generated text‑to‑speech output, and synchronized animation, creating a multimodal feedback loop that mimics face‑to‑face interaction.
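The role-aware behavior and the gated opioid disclosure described above amount to conditional prompt construction. The sketch below illustrates one way such a system prompt could be assembled; the profile text, role guides, and gating condition are hypothetical stand-ins for the authors' prompt-engineered conversational guides:

```python
# Illustrative patient backstory and per-role conversational guides
# (placeholder text, not the paper's actual prompts).
PATIENT_PROFILE = "You are Jane Ryan, a virtual patient in a triage encounter."

ROLE_GUIDES = {
    "pharmacy": "Focus replies on medications, dosing history, and adherence.",
    "medicine": "Focus replies on symptoms and history of present illness.",
    "nursing": "Focus replies on daily functioning and care needs.",
    "social work": "Focus replies on housing, family support, and coping.",
}

def build_system_prompt(role: str, med_history_questions_asked: int) -> str:
    """Assemble a role-tailored system prompt, withholding opioid use
    until the learner probes past the first medication-history question."""
    parts = [PATIENT_PROFILE, ROLE_GUIDES[role]]
    if med_history_questions_asked < 1:
        parts.append("Do NOT mention your opioid use yet; "
                     "answer medication questions vaguely.")
    else:
        parts.append("You may now disclose your opioid use if asked directly.")
    return "\n".join(parts)
```

The gating counter would be incremented by the dialogue manager each time a medication-history question is detected, so the constraint relaxes only after the learner has practiced at least one probing follow-up.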
A formal usability study involved 40 interdisciplinary teams (approximately eight students per team) who interacted with AIMS during a scheduled IPE event conducted via Zoom. Participants alternated speaking turns, explored the patient’s symptoms, social context, and care needs, and were observed for usability problems. The most frequent issues were audio routing failures (students’ voices not reaching the AI), response latency (a 2–3 second delay on Gemini calls), and occasional mismatches between trigger keywords and the intended animation, which reduced the realism of non‑verbal cues. These problems were logged, severity‑rated, and fed into an iterative refinement cycle that introduced audio‑mixing improvements, API call caching, and expanded keyword‑animation mappings.
The authors argue that AIMS offers several advantages over SP‑based training: (i) substantially lower per‑student cost, (ii) unlimited repeatability and scalability through a web‑based deployment, (iii) the ability to embed rich, role‑specific conversational logic, and (iv) the inclusion of synchronized non‑verbal behavior that enhances learner immersion. Nonetheless, they acknowledge limitations such as the current Gemini model’s limited multi‑turn coherence, imperfect speech‑recognition accuracy, and the still‑nascent fidelity of emotional expression compared with live actors.
Future work outlined includes (a) integrating more sophisticated context‑aware dialogue management to sustain longer, more nuanced conversations, (b) exploring VR head‑mounted displays to further increase presence, (c) expanding the scenario library to cover diverse patient demographics and conditions, and (d) conducting large‑scale, controlled trials to measure learning outcomes, knowledge retention, and transfer to real clinical settings. The paper concludes that AI‑driven, screen‑based immersive simulations like AIMS represent a promising, cost‑effective pathway to modernize interprofessional education while preserving the essential human elements of clinical communication.