HealthProcessAI: A Technical Framework and Proof-of-Concept for LLM-Enhanced Healthcare Process Mining
Process mining has emerged as a powerful analytical technique for understanding complex healthcare workflows, but its application faces significant barriers: technical complexity, a lack of standardized approaches, and limited access to practical training resources. We introduce HealthProcessAI, a GenAI framework that simplifies process mining in healthcare and epidemiology by providing a comprehensive wrapper around existing Python (PM4PY) and R (bupaR) libraries. To improve accessibility, the framework integrates multiple Large Language Models (LLMs) for automated process map interpretation and report generation, translating technical analyses into outputs that clinicians, data scientists, and researchers can readily understand. We validated the framework on sepsis progression data across four proof-of-concept scenarios and compared the outputs of five state-of-the-art LLMs accessed through the OpenRouter platform. The framework processed all four scenarios successfully, demonstrating robust technical performance and the capability to generate reports through automated LLM analysis. Evaluation by five independent LLMs acting as automated assessors revealed distinct model strengths: Claude Sonnet-4 and Gemini 2.5-Pro achieved the highest consistency scores (3.79/4.0 and 3.65/4.0, respectively). This combination of structured analytics and AI-driven interpretation represents a novel methodological advance in translating complex process mining results into potentially actionable insights for healthcare applications.
💡 Research Summary
HealthProcessAI is a comprehensive, modular framework that lowers the barriers to applying process mining in healthcare and epidemiology by wrapping the well‑established PM4PY (Python) and bupaR (R) libraries and integrating multiple large language models (LLMs) for automated interpretation and report generation. The authors designed six core modules:

1. Data loading and preparation: ingests CSV event logs, enforces international healthcare data standards, and performs extensive quality checks.
2. Process mining analysis: offers a suite of discovery algorithms (DFG, Heuristics Miner, Alpha Miner, Inductive Miner) plus healthcare‑specific enhancements such as guideline‑driven pathway discovery, causal treatment‑effect analysis, and risk stratification.
3. LLM integration: connects to five state‑of‑the‑art models (Claude Sonnet‑4, GPT‑4.1, Gemini 2.5‑Pro, DeepSeek R1, X‑AI Grok‑4) via the OpenRouter platform, employing carefully engineered prompts that embed clinical terminology and metadata.
4. Report orchestration: aggregates multi‑model outputs using voting, inter‑rater reliability (Cohen's Kappa), and uncertainty quantification to produce a single, transparent report that highlights consensus and disagreement.
5. Advanced analytics: demonstrates capabilities such as guideline conformance checking, patient clustering, bottleneck detection, and predictive process monitoring.
6. Evaluation framework: scores LLM‑generated reports on clinical accuracy, process‑mining understanding, actionable insights, readability, uncertainty expression, and overall consistency.
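The directly-follows graph (DFG) discovery at the heart of the process mining analysis module can be sketched in plain Python. This is a minimal, self-contained illustration of the idea, not the PM4PY implementation that the framework actually wraps; the toy event log and sepsis-style activity names are hypothetical:

```python
from collections import Counter

def discover_dfg(event_log):
    """Count directly-follows relations per case in a simple event log.

    event_log: list of (case_id, activity, timestamp) tuples.
    Returns a Counter mapping (activity_a, activity_b) -> frequency.
    """
    # Group events into one trace per case
    cases = {}
    for case_id, activity, ts in event_log:
        cases.setdefault(case_id, []).append((ts, activity))

    # Sort each trace by timestamp and count adjacent activity pairs
    dfg = Counter()
    for trace in cases.values():
        trace.sort()
        for (_, a), (_, b) in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# Toy event log with two patient cases (hypothetical activities)
log = [
    ("p1", "ER Registration", 1), ("p1", "Triage", 2), ("p1", "IV Antibiotics", 3),
    ("p2", "ER Registration", 1), ("p2", "Triage", 2), ("p2", "Discharge", 3),
]
print(discover_dfg(log))
```

In a real analysis, the resulting edge frequencies are what a DFG-based process map renders as arrows between activities; PM4PY additionally offers Heuristics, Alpha, and Inductive Miner variants for noisier logs.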
The framework was validated on four proof‑of‑concept (PoC) scenarios using sepsis progression data from the PhysioNet Challenge and process maps from the SCREAM database. In each scenario, the pipeline successfully performed data ingestion, model discovery, LLM‑based interpretation, and automated report generation within a few minutes on a standard research workstation, confirming technical feasibility and acceptable resource consumption.
For LLM evaluation, the authors used the same framework to let five independent LLMs act as automated assessors. Scores were weighted as follows: clinical accuracy 25%, process understanding 20%, actionable insights 20%, readability 15%, uncertainty handling 10%, and overall consistency 10%. Claude Sonnet‑4 achieved the highest consistency score (3.79/4.0), followed closely by Gemini 2.5‑Pro (3.65/4.0). GPT‑4.1 showed strong accuracy but higher latency and cost, while DeepSeek R1 and Grok‑4 performed adequately but lagged in clinical nuance.
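The weighted scoring scheme above is straightforward to reproduce. The weights mirror those reported; the per-criterion scores in the example are illustrative, not figures from the paper:

```python
def weighted_score(scores, weights):
    """Combine per-criterion scores (0-4 scale) into one weighted score."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(scores[criterion] * w for criterion, w in weights.items())

# Weights as reported in the evaluation framework
WEIGHTS = {
    "clinical_accuracy": 0.25,
    "process_understanding": 0.20,
    "actionable_insights": 0.20,
    "readability": 0.15,
    "uncertainty_handling": 0.10,
    "overall_consistency": 0.10,
}

# Hypothetical per-criterion scores for one LLM-generated report
example = {
    "clinical_accuracy": 3.8, "process_understanding": 3.6,
    "actionable_insights": 3.5, "readability": 3.9,
    "uncertainty_handling": 3.2, "overall_consistency": 3.7,
}
print(round(weighted_score(example, WEIGHTS), 2))
```

Weighting clinical accuracy highest reflects the framework's priority: a readable report that misstates clinical findings is worse than an awkward but correct one.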
Key insights include: (1) multi‑model LLM orchestration mitigates individual model blind spots, especially in medical terminology; (2) translating process‑mining artifacts into natural‑language reports dramatically improves accessibility for clinicians and non‑technical stakeholders; (3) the combination of open‑source mining tools with cloud‑based LLMs offers a cost‑effective, scalable solution; (4) the current work lacks external clinical validation, so a dedicated expert review pipeline will be essential before deployment in patient‑care settings.
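The multi-model orchestration in insight (1) can be illustrated with a simple majority vote over findings flagged by each model. This is a hypothetical sketch of the consensus idea, not the framework's actual aggregation logic (which also uses Cohen's Kappa and uncertainty quantification); the model names match the paper, but the findings and threshold are invented:

```python
from collections import Counter

def consensus(findings_by_model, threshold=0.6):
    """Keep findings flagged by at least `threshold` of the models;
    surface the rest as disagreements for transparent review."""
    n_models = len(findings_by_model)
    votes = Counter(
        f for findings in findings_by_model.values() for f in set(findings)
    )
    agreed = {f for f, v in votes.items() if v / n_models >= threshold}
    disputed = set(votes) - agreed
    return agreed, disputed

# Illustrative findings per model (not from the paper)
reports = {
    "claude-sonnet-4": ["triage bottleneck", "antibiotic delay"],
    "gemini-2.5-pro":  ["triage bottleneck", "antibiotic delay"],
    "gpt-4.1":         ["triage bottleneck"],
    "deepseek-r1":     ["triage bottleneck", "repeat lab loop"],
    "grok-4":          ["antibiotic delay"],
}
agreed, disputed = consensus(reports)
print("consensus:", agreed)
print("flagged for review:", disputed)
```

Reporting the disputed set rather than silently dropping it is what makes the aggregated report transparent about where the models disagree.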
Overall, HealthProcessAI represents a novel methodological advance that abstracts the technical complexity of process mining, provides extensive educational resources, and leverages LLMs to render complex analytical outputs into clinically meaningful narratives. The authors envision future extensions that incorporate full clinical validation, broader disease domains (e.g., oncology, mental health), and tighter integration with electronic health record systems to support real‑time decision‑making in healthcare environments.