An Innovative Imputation and Classification Approach for Accurate Disease Prediction


📝 Abstract

Imputing missing attribute values in medical datasets, so that hidden knowledge can be extracted from them, is an interesting and challenging research topic. Missing values in medical records cannot be eliminated. Some tests may not have been conducted because of their cost, values may have been missed while conducting clinical trials, or values may not have been recorded, to name some of the reasons. Data mining researchers have proposed various approaches to find and impute missing values in order to increase classification accuracy, so that disease may be predicted accurately. In this paper, we propose a novel approach for imputing missing values and for performing classification after the missing values are fixed. The approach is based on clustering and aims at dimensionality reduction of the records. The case study discussed shows that missing values can be fixed and imputed efficiently by achieving dimensionality reduction. The importance of the proposed approach for classification is visible in the case study, which assigns a single class label, in contrast to the multi-label assignment that results if dimensionality reduction is not performed.

📄 Content

An Innovative Imputation and Classification Approach for Accurate Disease Prediction
Yelipe UshaRani, Department of Information Technology, VNR VJIET, Hyderabad, INDIA
Dr. P. Sammulal, Dept. of Computer Science and Engineering, JNT University, Karimnagar, INDIA


Abstract—Imputing missing attribute values in medical datasets, so that hidden knowledge can be extracted from them, is an interesting and challenging research topic. Missing values in medical records cannot be eliminated. Some tests may not have been conducted because of their cost, values may have been missed while conducting clinical trials, or values may not have been recorded, to name some of the reasons. Data mining researchers have proposed various approaches to find and impute missing values in order to increase classification accuracy, so that disease may be predicted accurately. In this paper, we propose a novel approach for imputing missing values and for performing classification after the missing values are fixed. The approach is based on clustering and aims at dimensionality reduction of the records. The case study discussed shows that missing values can be fixed and imputed efficiently by achieving dimensionality reduction. The importance of the proposed approach for classification is visible in the case study, which assigns a single class label, in contrast to the multi-label assignment that results if dimensionality reduction is not performed.
Keywords—imputation; missing values; prediction; nearest neighbor; cluster; medical records; dimensionality reduction
I. INTRODUCTION
Preprocessing medical records is an important step that cannot be avoided in most situations when handling medical datasets. The attributes present in medical records may be of different data types. Also, the values of attributes have certain domains, and handling them requires proper knowledge of the medical domain.
Because of this diverse nature of medical records, handling them is quite challenging for data miners and researchers. The preprocessing techniques for medical records include fixing outliers in medical data, estimating and imputing missing values, normalizing medical attributes, handling inconsistent medical data, and applying smoothing techniques to the attribute values of medical records, to name some of them.
Data quality depends on data preprocessing techniques. Efficient preprocessing of medical records may increase their data quality. In this context, data preprocessing techniques have gained significant importance among medical data analysts and data miners. Incorrect and improper data values may mislead prediction and classification, thereby producing false classification results and leading to improper medical treatment, which is a very dangerous potential hazard. This research mainly aims at handling missing attribute values present in the medical records of a dataset. The attributes may be numeric, categorical, etc. The present method can handle all attribute types without the need to devise a different method for each attribute type. This is the first advantage of our approach. We outline the research objective and problem specification in the succeeding lines of this paper and then discuss the importance of our approach.
A. Research Objective
We have the following research objectives in this research towards finding missing values:

• Our first and foremost objective is to impute missing values.
• Aim at dimensionality reduction of medical records.
• Classify new medical records using the same approach used to find missing values.
• Cluster medical records to place similar records into one group.
B. Problem Specification
Given a dataset of medical records with and without missing values, the research objective is to fix all missing values in the medical records by using a novel, efficient imputation approach based on clustering the normal medical records, so as to estimate the missing values in the records that have them.
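The problem statement above can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in this section. As a stand-in, it runs plain k-means on the complete records and fills each missing value from the nearest centroid, with distance measured over the observed attributes only. The function names, the use of `None` as the missing-value marker, and the choice of `k` are all assumptions made for illustration, and only numeric attributes are handled.

```python
import math
import random


def _dist(a, b, dims):
    """Euclidean distance between a and b over the given attribute indices."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd's k-means over complete rows; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    dims = range(len(rows[0]))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda j: _dist(r, centroids[j], dims))
            groups[nearest].append(r)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def impute(records, k=2):
    """Fill None entries from the nearest centroid of the complete records."""
    complete = [r for r in records if None not in r]
    centroids = kmeans(complete, k)
    filled = []
    for r in records:
        observed = [d for d, v in enumerate(r) if v is not None]
        if len(observed) == len(r):
            filled.append(list(r))  # nothing to impute
            continue
        # Compare only on the attributes this record actually has.
        nearest = min(centroids, key=lambda c: _dist(r, c, observed))
        filled.append([nearest[d] if v is None else v for d, v in enumerate(r)])
    return filled
```

Calling `impute(records, k=2)` on a list of equal-length numeric rows returns a copy in which every `None` has been replaced. A record with very few observed attributes is matched to a cluster on little evidence, so a real pipeline would likely need a fallback for such rows.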
C. Importance of Present Approach
The present approach, which we propose, has the following advantages:

• The method may be used to find missing attribute values in medical records.
• The same approach used for finding missing values may be used to classify medical records.
• Disease prediction may be achieved using the proposed approach without the need to adopt a separate procedure.
• Handles all attribute types.
• Preserves attribute information.
• May be applied to datasets with and without class labels.

Special issue on “Computing Applications and Data Mining”, International Journal of Computer Science and Information Security (IJCSIS), Vol. 14 S1, February 2016. https://sites.google.com/site/ijcsis/ ISSN 1947-5500
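The claim that one distance-based procedure can serve both imputation and classification can be sketched as follows. This is an illustrative nearest-centroid classifier, not the method of the paper: each class is summarized by the mean of its labeled records, and a new record receives the label of the closest centroid, computed over the attributes it actually has. The function names, the `None` missing-value marker, and the example labels are hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(rows, labels):
    """Mean record per class label, computed from complete numeric rows."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def classify(record, centroids):
    """Return the label of the nearest centroid, ignoring None attributes."""
    observed = [d for d, v in enumerate(record) if v is not None]

    def dist(centroid):
        return math.sqrt(sum((record[d] - centroid[d]) ** 2 for d in observed))

    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical toy data: two attributes, two classes.
rows = [[1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [9.2, 8.8]]
labels = ["healthy", "healthy", "disease", "disease"]
centroids = class_centroids(rows, labels)
print(classify([1.1, None], centroids))
print(classify([8.5, 9.5], centroids))
```

Because the distance skips missing attributes, a record can be classified without imputing it first, which mirrors the idea of reusing a single procedure for both tasks.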

