An Innovative Imputation and Classification Approach for Accurate Disease Prediction
Yelipe UshaRani
Department of Information Technology
VNR VJIET
Hyderabad, INDIA
Dr. P. Sammulal
Dept. of Computer Science and Engineering
JNT University
Karimnagar, INDIA
Abstract—Imputing missing attribute values in medical
datasets, so that hidden knowledge can be extracted from them,
is a challenging research topic. Missing values cannot be
eliminated from medical records: some tests may not have been
conducted because they are costly, values may have been missed
during clinical trials, or values may simply not have been
recorded, to name a few reasons. Data mining researchers have
proposed various approaches to find and impute missing values
in order to increase classification accuracy, so that disease
can be predicted accurately. In this paper, we propose a novel
approach for imputing missing values and then performing
classification on the completed records. The approach is based
on clustering and aims at dimensionality reduction of the
records. The case study discussed shows that missing values can
be imputed efficiently once dimensionality reduction is
achieved. The importance of the proposed approach for
classification is visible in the case study, which assigns a
single class label, in contrast to the multi-label assignment
obtained when dimensionality reduction is not performed.
Keywords—imputation; missing values; prediction; nearest
neighbor; cluster; medical records; dimensionality reduction
I. INTRODUCTION
Preprocessing medical records is an important step that
cannot be avoided in most situations when handling medical
datasets. The attributes present in medical records may be of
different data types. The values of each attribute also belong
to a certain domain, and handling them requires proper
knowledge of the medical domain.
Because of this diverse nature, handling medical records is
quite challenging for data miners and researchers. The
preprocessing techniques for medical records include fixing
outliers, estimating and imputing missing values, normalizing
attributes, handling inconsistent data, and applying smoothing
techniques to attribute values, to name a few.
Data quality depends on data preprocessing techniques.
Efficient preprocessing of medical records may increase their
data quality. In this context, data preprocessing techniques
have gained significant importance among medical data analysts
and data miners. Incorrect and improper data values may mislead
prediction and classification, thereby producing false
classification results and leading to improper medical
treatment, which is a very dangerous potential hazard. This
research mainly aims at handling missing attribute values
present in the medical records of a dataset. The attributes may
be numeric, categorical, etc. The present method can handle all
attribute types without the need to devise a different method
for each attribute type. This is the first important feature of
our approach. We outline the research objective and problem
specification in the succeeding lines of this paper and then
discuss the importance of our approach.
A. Research Objective
This research has the following objectives towards finding
missing values:
• Impute missing values; this is our first and foremost
objective.
• Achieve dimensionality reduction of medical records.
• Classify new medical records using the same approach
used to find missing values.
• Cluster medical records to place similar records into
one group.
B. Problem Specification
Given a dataset of medical records with and without
missing values, the research objective is to fix the set of all
missing values in the medical records by using a novel,
efficient imputation approach. The approach clusters the normal
(complete) medical records and uses the resulting clusters to
estimate the missing values in the records that have them.
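The problem statement above can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in this section. It assumes numeric attributes only, a plain k-means over the complete records, Euclidean distance computed on the observed attributes, and missing values marked as None. The function names and the sample records are hypothetical.

```python
import math

def distance(a, b, dims):
    """Euclidean distance between a and b over the attribute indices in dims."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

def kmeans(rows, k, iters=20):
    """Plain Lloyd's k-means; the first k rows seed the centroids (arbitrary choice)."""
    dims = range(len(rows[0]))
    centroids = [list(r) for r in rows[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda j: distance(r, centroids[j], dims))
            groups[nearest].append(r)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def impute_by_cluster(records, k=2):
    """Fill None entries using the nearest centroid of the complete records."""
    complete = [r for r in records if None not in r]
    centroids = kmeans(complete, k)
    filled = []
    for r in records:
        # Compare only on the attributes that are actually present.
        observed = [d for d, v in enumerate(r) if v is not None]
        best = min(centroids, key=lambda cen: distance(r, cen, observed))
        filled.append([best[d] if v is None else v for d, v in enumerate(r)])
    return filled

# Hypothetical records: four complete, two with one missing attribute each.
records = [
    [1.0, 2.0, 1.0],
    [9.0, 8.0, 9.0],
    [1.2, 1.8, 1.2],
    [8.8, 8.2, 9.2],
    [1.1, None, 1.0],
    [9.1, 8.1, None],
]
filled = impute_by_cluster(records, k=2)
```

In this sketch each incomplete record takes the missing attribute from the centroid of the cluster it is closest to, so the complete records are never modified. Categorical attributes, which the paper says its method also handles, would need a different distance and a different fill rule than the ones assumed here.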
C. Importance of Present Approach
The approach we propose has the following advantages:
• The method may be used to find missing attribute
values in medical records.
• The same approach used for finding missing values may
be used to classify medical records.
• Disease prediction may be achieved using the proposed
approach without the need to adopt a separate procedure.
• Handles all attribute types.
• Preserves attribute information.
• May be applied to datasets with and without class
labels.
Special issue on “Computing Applications and Data Mining”
International Journal of Computer Science and Information Security (IJCSIS), Vol. 14 S1, February 2016
https://sites.google.com/site/ijcsis/
ISSN 1947-5500