CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction


📝 Abstract

Electronic health records (EHRs) are designed to synthesize diverse data types, including unstructured clinical notes, structured lab tests, and time-series visit data. Physicians draw on these multimodal and temporal sources of EHR data to form a comprehensive view of a patient’s health, which is crucial for informed therapeutic decision-making. Yet, most predictive models fail to fully capture the interactions, redundancies, and temporal patterns across multiple data modalities, often focusing on a single data type or overlooking these complexities.


📄 Content

CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction

Cong-Tinh Dao1,2, Nguyen Minh Thao Phan1,2, Jun-En Ding3, Chenwei Wu4, David Restrepo5, Dongsheng Luo6, Fanyi Zhao3, Chun-Chieh Liao3, Wen-Chih Peng1, Chi-Te Wang8, Pei-Fu Chen8,9, Ling Chen1, Xinglong Ju7, Feng Liu3, Fang-Ming Hung8*

1National Yang Ming Chiao Tung University, Taiwan.
2Can Tho University, Vietnam.
3Stevens Institute of Technology, USA.
4University of Michigan, USA.
5Massachusetts Institute of Technology, USA.
6Florida International University, USA.
7Southern Utah University, USA.
8Far Eastern Memorial Hospital, Taiwan.
9Yuan Ze University, Taiwan.

*Corresponding author(s). E-mail(s): philip@mail.femh.org.tw;
Contributing authors: dctinh@ctu.edu.vn; pnmthao@ctu.edu.vn; jding17@stevens.edu; chenweiw@umich.edu; davidres@mit.edu; dluo@fiu.edu; fanyi.zhao@icloud.com; cliao9@stevens.edu; wcpengcs@nycu.edu.tw; drwangct@gmail.com; femh96949@mail.femh.org.tw; ling.chen@nycu.edu.tw; xinglongju@suu.edu; fliu22@stevens.edu;

Abstract

Electronic health records (EHRs) are designed to synthesize diverse data types, including unstructured clinical notes, structured lab tests, and time-series visit data. Physicians draw on these multimodal and temporal sources of EHR data to form a comprehensive view of a patient’s health, which is crucial for informed therapeutic decision-making. Yet, most predictive models fail to fully capture the interactions, redundancies, and temporal patterns across multiple data modalities, often focusing on a single data type or overlooking these complexities.
In this paper, we present CURENet (Combining Unified Representations for Efficient chronic disease prediction), a multimodal model that integrates unstructured clinical notes, lab tests, and patients’ time-series data by using large language models (LLMs) for clinical text processing and textual lab tests, as well as transformer encoders for longitudinal sequential visits. CURENet captures the intricate interactions between different forms of clinical data and yields a more reliable predictive model for chronic illnesses. We evaluated CURENet on the public MIMIC-III and private FEMH datasets, where it achieved over 94% accuracy in predicting the top 10 chronic conditions in a multi-label framework. Our findings highlight the potential of multimodal EHR integration to enhance clinical decision-making and improve patient outcomes.

Keywords: Large Language Model Fine-tuning, Transformer, Electronic Health Records, Multi-Disease Prediction

arXiv:2511.11423v1 [cs.AI] 14 Nov 2025

1 Introduction

Noncommunicable diseases (NCDs), also known as chronic illnesses, are typically long-term and are caused by a combination of genetic, physiological, environmental, and behavioral factors [1]. According to the World Health Organization (WHO), NCDs account for 41 million deaths each year, or 74% of all deaths globally [1]. These conditions are widespread, impairing quality of life and placing a significant burden on healthcare systems. Chronic diseases remain among the leading causes of death [2]. They are neither fully preventable nor curable, and their impact persists over time. However, early identification can help mitigate their impact. Consequently, the importance of automated chronic disease prediction in healthcare is increasingly apparent. Medical practitioners can take early action against the disease via computerized prediction algorithms that evaluate and identify high-risk individuals on the basis of various studies.
These methods uncover hidden patterns in vast health datasets, helping clinicians focus on the most vulnerable patients and allocate resources more efficiently. Therefore, automated chronic disease prediction is essential for reducing the global burden of chronic diseases and advancing precision medicine.

Electronic health records (EHRs), which combine unstructured clinical notes with structured tabular information such as vital signs, diagnoses, and test findings, offer a rich, multimodal dataset. These many data sources, such as time-series data from ongoing monitoring, provide a significant opportunity for disease prediction. The achievements of deep learning (DL) provide a great opportunity to improve the precision and efficacy of disease prediction. Despite the sophisticated models that can improve patient outcomes [3–5] and the abundance and variety of multimodal EHR data, it can be challenging to gather and integrate therapeutically valuable data from EHRs because of their redundancy and variety. Figure 1 shows a patient’s medical journey and the progression of their health over time across three hospital stays. The green areas represent the time duration, which is the length of hospital stay at each visit, whereas the red areas represent the time intervals between visits, which are computed from the

Fig. 1 Example of EHR data with visits and diagnoses, with time durations and intervals.
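
The two quantities that Figure 1 describes, the duration of each visit and the interval between consecutive visits, can be illustrated with a minimal Python sketch. The source text is cut off before it states exactly how intervals are computed, so the definition used here (time from one visit's discharge to the next visit's admission) is an assumption, and the names `Visit`, `admit`, `discharge`, and `durations_and_intervals` are hypothetical and chosen only for illustration.

```python
from datetime import datetime
from typing import NamedTuple


class Visit(NamedTuple):
    """One hospital stay (hypothetical structure, not from the paper)."""
    admit: datetime
    discharge: datetime


def durations_and_intervals(visits):
    """Return (durations, intervals) as lists of timedelta objects.

    durations[i] is the length of stay of the i-th visit.
    intervals[i] is the gap between the discharge of visit i and the
    admission of visit i + 1 (an assumed definition; the source text
    is truncated before it specifies this).
    """
    # Order visits chronologically so consecutive pairs are adjacent stays.
    ordered = sorted(visits, key=lambda v: v.admit)
    durations = [v.discharge - v.admit for v in ordered]
    intervals = [
        nxt.admit - prev.discharge
        for prev, nxt in zip(ordered, ordered[1:])
    ]
    return durations, intervals


# Three made-up hospital stays, mirroring the three visits in Figure 1.
visits = [
    Visit(datetime(2021, 1, 1), datetime(2021, 1, 5)),
    Visit(datetime(2021, 1, 20), datetime(2021, 1, 23)),
    Visit(datetime(2021, 2, 10), datetime(2021, 2, 12)),
]
durations, intervals = durations_and_intervals(visits)
print("Length of stay (days):", [d.days for d in durations])
print("Gap between visits (days):", [i.days for i in intervals])
```

With three visits there are three durations but only two intervals, since an interval exists only between consecutive stays. A model that consumes such features would likely need to pad or otherwise handle the missing interval for the first visit; how CURENet does this is not stated in the text shown here.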

This content is AI-processed based on ArXiv data.
