Using Natural Language Processing to Screen Patients with Active Heart Failure: An Exploration for Hospital-wide Surveillance

📝 Abstract

In this paper, we propose two approaches, one rule-based and one based on machine learning, to automatically identify active heart failure cases by analyzing electronic health records (EHR). For the rule-based approach, we extracted cardiovascular data elements from clinical notes and assigned patients to color categories according to their heart failure condition, using rules provided by heart failure experts. This approach achieved 69.4% accuracy and a 0.729 F1-score. For the machine-learning approach, we used bigrams from clinical notes as features and tried four models; an SVM with a linear kernel performed best, with 87.5% accuracy and a 0.86 F1-score. From the comparison of the four models, we believe that linear models are better suited to this problem. Combining the machine-learning and rule-based algorithms will enable hospital-wide surveillance of active heart failure through increased accuracy and interpretability of the outputs.
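
The bigram features mentioned in the abstract can be illustrated with a minimal Python sketch. The tokenizer, the function name `bigram_counts`, and the sample sentence are assumptions made for illustration only; the paper's actual preprocessing and the training of the linear SVM are not shown here.

```python
import re
from collections import Counter


def bigram_counts(note: str) -> Counter:
    """Count adjacent word pairs (bigrams) in one clinical note.

    Hypothetical helper: lowercases the text and keeps only
    alphanumeric tokens, which is an assumed preprocessing choice.
    """
    tokens = re.findall(r"[a-z0-9]+", note.lower())
    # Pair each token with its successor to form the bigrams.
    return Counter(zip(tokens, tokens[1:]))


# Sample sentence in the style of a clinical note, not study data.
features = bigram_counts("Left ventricular function is severely reduced.")
print(features[("severely", "reduced")])
```

In a setup like the one the abstract describes, the counts for each note would form one row of a sparse feature matrix that a linear classifier is trained on.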

📄 Content

Shu Dong, MS¹; R. Kannan Mutharasan, MD²; Siddhartha Jonnalagadda, PhD¹

¹ Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
² Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL

Introduction

Reducing heart failure readmissions remains a major target for improving quality of care and reducing healthcare costs [1]. A major component of this effort is identifying heart failure cases upon hospitalization. Early identification of patients with heart failure allows clinical resources to be deployed early, which can improve quality and continuity of care and potentially reduce readmissions.

The traditional model of accessing specialist care in hospitals relies on placing a consultation request. Drawing from the operations management literature, this can be termed a “push” model of care delivery [2]. Hallmarks of push operations are operational delay and potential inaccuracy in identifying patients who could benefit from more advanced resources. These flaws reduce health care quality and worsen the patient experience. In contrast, hospital-wide surveillance by a central heart failure-focused team enables a “pull” model of specialist care and multidisciplinary intervention delivery. In a disease surveillance model, a disease-focused team specifically screens hospital admissions for the presence of that disease state and calls front-line providers to offer specialist care. A pull model of consultation and provision of multidisciplinary care may reduce time lags and increase accuracy, thereby improving quality of care.

Identification of patients admitted with heart failure can be automated by querying electronic health records (EHR). The use of discrete data from EHR is well established as a strategy for identifying disease states [3]. Using only discrete data, however, limits the efficacy of automated detection methods by neglecting the rich clinical information embedded in free text such as clinical notes, radiological reports, and nursing notes.

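
To illustrate what "clinical information embedded in free text" can look like in practice, the sketch below pulls a numeric ejection fraction out of a sentence with a regular expression. The pattern `EF_PATTERN` and the helper `extract_ef` are hypothetical and written for this illustration; they are not the extraction rules used in the paper.

```python
import re
from typing import Optional

# Hypothetical pattern: "ejection fraction" or "EF", up to 20 connecting
# characters that are neither digits nor "%", then a percentage value.
EF_PATTERN = re.compile(
    r"\b(?:ejection fraction|ef)\b[^0-9%]{0,20}?(\d{1,2}(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def extract_ef(text: str) -> Optional[float]:
    """Return the first ejection fraction percentage found, or None."""
    match = EF_PATTERN.search(text)
    return float(match.group(1)) if match else None


print(extract_ef("calculated biplane LV ejection fraction is 37%"))
print(extract_ef("left ventricular function is severely reduced"))
```

The second call returns `None` because a qualitative statement carries no number to capture, which hints at why pattern matching alone is unlikely to cover the full variety of clinical text.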
Natural language processing (NLP) is a strategy whereby clinical data can be mined from free text and parsed into a format tractable for further processing by computational algorithms, including machine learning. NLP, or more precisely information extraction, is the discovery by computers of new, previously unknown information through automatic extraction from different written resources [4]. Earlier attempts were predominantly dictionary- or rule-based systems; most modern systems, however, use supervised machine learning, in which a system is trained to recognize mentions in text based on specific (and numerous) features associated with those mentions, learned from annotated corpora.

Unstructured information occurs in a wide variety of formats, such as “calculated biplane LV ejection fraction is 37%” or “left ventricular function is severely reduced.” At a fundamental level, a medical concept such as a disease, treatment, or diagnostic test could be mentioned as a noun phrase, that is, an incomplete sentence (e.g., dry mucous membranes and myotic pupils), as a complete sentence (e.g., The patient had patent carotids bilaterally on her neck MRA), or as a list (e.g., Tylenol 650 mg p.o. q. 4-6h p.r.n. headache or pain; acyclovir 400 mg p.o. t.i.d;). Despite active interest from clinicians in its potential [5], clinical information extraction is an unsolved problem. Here, we propose a natural language processing-based strategy for automated detection of heart failure cases, with sensitivity and specificity high enough to enable operational use in a clinical setting.

Methods

Problem

We identified 3867 pa

This content is AI-processed based on ArXiv data.
