A Comparative Study for Predicting Heart Diseases Using Data Mining Classification Methods
📝 Abstract
Improving the precision of heart disease detection has been investigated by many researchers in the literature. This interest has been driven by overwhelming health care expenditures and erroneous diagnoses. As a result, various methodologies have been proposed to analyze the disease factors, aiming to decrease physicians' practice variation and reduce medical costs and errors. In this paper, our main motivation is to develop an effective intelligent medical decision support system based on data mining techniques. In this context, five data mining classification algorithms (Naïve Bayes, Decision Tree, Discriminant, Random Forest, and Support Vector Machine) have been applied to large datasets to assess and analyze the risk factors statistically related to heart diseases and to compare the performance of the implemented classifiers. To underscore the practical viability of our approach, the selected classifiers have been implemented in MATLAB with two datasets. The results of the conducted experiments showed that all classification algorithms are predictive and can give relatively correct answers. However, the decision tree outperforms the other classifiers with an accuracy rate of 99.0%, followed by Random Forest. This is the case because both have a relatively similar mechanism, but Random Forest builds an ensemble of decision trees. Although ensemble learning has been shown to produce superior results, in our case the decision tree outperformed its ensemble version.
📄 Content
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 14, No. 12, December 2016
https://sites.google.com/site/ijcsis/
ISSN 1947-5500
A Comparative Study for Predicting Heart Diseases Using Data Mining Classification Methods
Isra’a Ahmed Zriqat, Ahmad Mousa Altamimi, Mohammad Azzeh
Faculty of Information Technology
Applied Science Private University
Amman, Jordan
{i_zriqat, a_altamimi, m.y.azzah}@asu.edu.jo
Abstract- Improving the precision of heart disease detection has been investigated by many researchers in
the literature. This interest has been driven by overwhelming health care expenditures and erroneous
diagnoses. As a result, various methodologies have been proposed to analyze the disease factors, aiming to
decrease physicians' practice variation and reduce medical costs and errors. In this paper, our main
motivation is to develop an effective intelligent medical decision support system based on data mining
techniques. In this context, five data mining classification algorithms (Naïve Bayes, Decision Tree,
Discriminant, Random Forest, and Support Vector Machine) have been applied to large datasets to assess
and analyze the risk factors statistically related to heart diseases and to compare the performance of the
implemented classifiers. To underscore the practical viability of our approach, the selected classifiers have
been implemented in MATLAB with two datasets. The results of the conducted experiments showed that all
classification algorithms are predictive and can give relatively correct answers. However, the decision tree
outperforms the other classifiers with an accuracy rate of 99.0%, followed by Random Forest. This is the
case because both have a relatively similar mechanism, but Random Forest builds an ensemble of decision
trees. Although ensemble learning has been shown to produce superior results, in our case the decision
tree outperformed its ensemble version.
Keywords- Heart Diseases; Prediction Systems; Data Mining Classifiers; Ensemble Learning; Decision Tree
I. INTRODUCTION
Data mining techniques have been widely used for a variety of applications. In the health care industry,
for example, data mining plays an important role in predicting or diagnosing diseases with good accuracy.
One important application is the diagnosis of heart (cardiovascular) diseases, as these diseases are
recognized as the leading cause of death globally [1]. According to the World Heart
Federation and the World Health Organization, more than 17 million people died from cardiovascular
diseases in 2013, and around 3 million of these deaths occurred before the age of 60 [2]. However, 90% of
those deaths were estimated to be preventable if patients had been correctly diagnosed early and had
improved their habits, such as healthy eating, exercise, and the like [3].
In traditional healthcare environments, the diagnosis of a disease depends on the doctor’s judgment in
identifying the most likely cause from a person’s symptoms. However, this can lead to unwanted
errors that result in higher medical costs and affect the quality of service provided to patients. Instead,
expert systems (that use data mining techniques) [4] could be used to emulate the decision-making ability
of a human expert, answering not only simple questions like “What is the average age of patients who
have heart disease?” or “Which female patients are single and have been treated for heart
disease?”, but also complex ones like “Given patient records, what is the probability that a patient will be
diagnosed with heart disease?” or “What is the most significant risk factor for heart disease?”. Of course,
using such systems could reduce medical errors and decrease practice variation, and, perhaps surprisingly,
it can also improve diagnostic results.
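To make the distinction between simple and complex questions concrete, the following minimal sketch contrasts an aggregate query with a crude predictive estimate. It is only an illustration: the patient records, the field names, and the helper functions `average_age_with_disease` and `disease_rate_given` are hypothetical, and they are not taken from the datasets or the MATLAB implementation used in this paper.

```python
from statistics import mean

# Hypothetical toy records; all values are invented for illustration only.
PATIENTS = [
    {"age": 63, "sex": "F", "smoker": True,  "heart_disease": True},
    {"age": 45, "sex": "M", "smoker": False, "heart_disease": False},
    {"age": 58, "sex": "M", "smoker": True,  "heart_disease": True},
    {"age": 39, "sex": "F", "smoker": False, "heart_disease": False},
    {"age": 70, "sex": "F", "smoker": True,  "heart_disease": False},
    {"age": 52, "sex": "M", "smoker": False, "heart_disease": True},
]


def average_age_with_disease(records):
    """Simple question: average age of patients who have heart disease."""
    return mean(r["age"] for r in records if r["heart_disease"])


def disease_rate_given(records, **conditions):
    """Complex question (crudely): share of matching patients with heart disease.

    This is a relative-frequency estimate standing in for a trained
    classifier. Returns None when no record matches the conditions.
    """
    matching = [
        r for r in records
        if all(r[key] == value for key, value in conditions.items())
    ]
    if not matching:
        return None
    return sum(r["heart_disease"] for r in matching) / len(matching)


avg_age = average_age_with_disease(PATIENTS)
rate_smokers = disease_rate_given(PATIENTS, smoker=True)
print(f"Average age with heart disease: {avg_age:.1f}")
print(f"Estimated disease rate among smokers: {rate_smokers:.2f}")
```

In a real decision support system, the frequency estimate would be replaced by a trained classifier such as those compared in this paper. The sketch only shows why the second kind of question requires a model rather than a lookup.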
Data mining techniques can be used to discover knowledge in huge volumes of data by
detecting patterns and summarizing the data into an understandable format. In fact, there are three main
data mining techniques that can be utilized to classify previously unorganized data into predefin