A Unified XAI-LLM Approach for Endotracheal Suctioning Activity Recognition


Endotracheal suctioning (ES) is an invasive yet essential clinical procedure that requires a high degree of skill to minimize patient risk, particularly in home care and educational settings where consistent supervision may be limited. Despite its critical importance, automated recognition and feedback systems for ES training remain underexplored. To address this gap, this study proposes a unified, LLM-centered framework for video-based activity recognition, benchmarked against conventional machine learning and deep learning approaches, together with a pilot study on feedback generation. Within this framework, the Large Language Model (LLM) serves as the central reasoning module, performing both spatiotemporal activity recognition and explainable decision analysis from video data. The LLM also verbalizes feedback in natural language, translating complex technical insights into accessible, human-understandable guidance for trainees. Experimental results demonstrate that the proposed LLM-based approach outperforms baseline models, improving both accuracy and F1 score by approximately 15-20%. Beyond recognition, the framework incorporates a pilot student-support module built on anomaly detection and explainable AI (XAI) principles, which provides automated, interpretable feedback highlighting correct actions and suggesting targeted improvements. Together, these contributions establish a scalable, interpretable, and data-driven foundation for advancing nursing education, enhancing training efficiency, and ultimately improving patient safety.


💡 Research Summary

The paper presents a novel framework that combines video‑based human activity recognition (HAR) with explainable AI (XAI) to automatically recognize and provide feedback on endotracheal suctioning (ES) procedures. Recognizing that traditional HAR approaches—primarily pose‑based machine learning or graph‑convolutional‑Transformer models—suffer from limited accuracy (often below 60 % F1) and lack interpretability, the authors propose using a large language model (LLM) as a unified reasoning engine. Specifically, Gemini 2.5 Pro is fed three synchronized inputs: raw video frames of the ES task, SHAP‑based attribution maps generated by an Isolation Forest anomaly detector, and a structured natural‑language prompt that frames the classification and explanation tasks.
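The anomaly-detection leg of this pipeline can be sketched as follows. This is a minimal, hypothetical illustration (the feature names, data, and attribution method are assumptions, not taken from the paper): an Isolation Forest is fit on pose-derived features, and a simplified SHAP-style attribution is approximated by occluding one feature at a time with its training mean and measuring the shift in the anomaly score.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical pose-derived features; the paper's actual feature set is not listed here.
features = ["wrist_angle", "elbow_speed", "catheter_depth", "head_tilt"]
X_train = rng.normal(size=(200, len(features)))  # frames from normal demonstrations
frame = rng.normal(size=(1, len(features)))      # one frame to score and explain

forest = IsolationForest(random_state=0).fit(X_train)
base_score = forest.decision_function(frame)[0]  # higher = more normal

# Crude per-feature attribution: replace one feature with its training mean
# and record how the anomaly score shifts. This is a single-feature-occlusion
# simplification of the SHAP idea, chosen for readability.
means = X_train.mean(axis=0)
attributions = {}
for j, name in enumerate(features):
    occluded = frame.copy()
    occluded[0, j] = means[j]
    attributions[name] = base_score - forest.decision_function(occluded)[0]

ranked = sorted(attributions, key=lambda k: abs(attributions[k]), reverse=True)
print(ranked)  # features ordered by influence on the anomaly score
```

In the full system, these per-feature attributions would be summarized into the prompt given to the LLM alongside the raw frames.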

The LLM performs three functions simultaneously: (i) zero‑shot classification of eight predefined ES action categories, (ii) generation of human‑readable explanations for each prediction, and (iii) production of context‑aware verbal feedback aimed at nursing trainees. Prompt engineering is central to the approach; the prompt explicitly asks the model to identify the current step, justify its decision using visual cues and SHAP scores, and suggest concrete improvements.
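A prompt covering those three tasks might be assembled as below. This is an illustrative sketch only: the step names and the prompt wording are assumptions, since the paper's exact eight categories and prompt text are not reproduced in this summary.

```python
# Hypothetical step list; the paper defines eight ES action categories
# but their exact names are not given in this summary.
ES_STEPS = [
    "hand hygiene", "equipment preparation", "patient positioning",
    "catheter insertion", "suction application", "catheter removal",
    "patient assessment", "equipment disposal",
]

def build_prompt(shap_summary: str) -> str:
    """Frame the three tasks: classify, justify, and give trainee feedback."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(ES_STEPS))
    return (
        "You are assessing an endotracheal suctioning training video.\n"
        f"Possible steps:\n{steps}\n\n"
        "Tasks:\n"
        "1. Identify the current step from the attached frames.\n"
        "2. Justify the decision using visual cues and the SHAP "
        f"attributions below:\n{shap_summary}\n"
        "3. Suggest one concrete improvement for the trainee.\n"
    )

prompt = build_prompt("wrist_angle: +0.31, elbow_speed: -0.04")
print(prompt)
```

The structured prompt and the frames would then be sent together to the multimodal model (Gemini 2.5 Pro in the paper).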

Experiments were conducted on a modest dataset of 44 videos captured from ten experienced nurses and twelve nursing students using a simulation mannequin. The videos were split into 32 for training baseline models and 12 for testing the LLM system, with participant‑level separation to avoid data leakage. Baseline comparisons included a pose‑based SVM/RandomForest pipeline, the SkeleTR GCN‑Transformer architecture, and a multi‑angle video fusion method. The LLM‑centric system achieved a 15‑20 % boost in both accuracy and F1 score, reaching an F1 of approximately 0.84, and showed particular gains in high‑risk steps such as catheter insertion and removal.
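Participant-level separation of this kind can be enforced with a group-aware splitter. The sketch below is an assumed mechanism (the paper reports the 32/12 split but not its implementation); participant assignments are simulated, and `test_size` is set to roughly the paper's test fraction rather than exactly 12 videos.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

n_videos = 44
rng = np.random.default_rng(1)
# Simulated mapping of each video to one of 22 participants (10 nurses + 12 students).
participant_of_video = rng.integers(0, 22, size=n_videos)

# GroupShuffleSplit holds out whole participants, so no person appears
# in both train and test, preventing identity leakage.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(
    splitter.split(np.zeros(n_videos), groups=participant_of_video)
)

train_people = set(participant_of_video[train_idx])
test_people = set(participant_of_video[test_idx])
assert train_people.isdisjoint(test_people)  # participant-level separation holds
```

Splitting by participant rather than by video is what prevents a model from scoring well simply by recognizing a person it has already seen.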

Explainability was evaluated by comparing SHAP visualizations with the textual rationales produced by the LLM; an 82 % agreement rate indicates that the model’s natural‑language explanations reliably reflect the underlying attribution signals. The pilot feedback module generated specific, actionable comments (e.g., “your wrist angle changed abruptly during catheter insertion”) that were readily understandable by trainees.
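One simple way such an agreement rate could be computed is to check whether the top-attributed SHAP feature is mentioned in the generated rationale. The sketch below is a hypothetical evaluation procedure with toy data; the paper's actual matching criterion is not described in this summary.

```python
def top_feature(shap_scores: dict) -> str:
    """Return the feature with the largest absolute SHAP attribution."""
    return max(shap_scores, key=lambda k: abs(shap_scores[k]))

# Toy (attribution, rationale) pairs for illustration only.
cases = [
    ({"wrist_angle": 0.4, "elbow_speed": 0.1},
     "The wrist_angle changed abruptly during insertion."),
    ({"wrist_angle": 0.05, "catheter_depth": -0.5},
     "Suction timing looks appropriate."),
]

# Agreement = fraction of cases where the rationale names the top feature.
agreement = sum(top_feature(s) in text for s, text in cases) / len(cases)
print(f"agreement rate: {agreement:.0%}")  # prints "agreement rate: 50%"
```

A stricter variant might require the rationale's stated direction of change to match the sign of the attribution as well.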

Limitations include the small sample size, the computational cost of feeding raw video to a large LLM, and reliance on zero‑shot prompting, which may require re‑engineering when new action classes are introduced. Future work aims to expand the dataset across multiple institutions, develop semi‑automatic labeling tools, explore lightweight LLM variants for real‑time deployment, and integrate continual learning mechanisms that preserve XAI capabilities.

Overall, the study demonstrates that an LLM‑centered, XAI‑enhanced HAR framework can outperform conventional models, provide transparent decision rationales, and deliver natural‑language feedback, thereby offering a scalable, data‑driven solution for improving nursing education and patient safety in endotracheal suctioning.

