An ensemble-based approach for dealing with missing data, without predicting or imputing the missing values, is proposed. This technique is suitable for online operation of neural networks and, as a result, is used for online condition monitoring. The proposed technique is tested on both classification and regression problems. An ensemble of Fuzzy-ARTMAPs is used for classification, whereas an ensemble of multi-layer perceptrons is used for the regression problem. Results obtained using this ensemble-based technique are compared to those obtained using a combination of auto-associative neural networks and genetic algorithms, and the findings show that this method can perform up to 9% better in regression problems. Another advantage of the proposed technique is that it eliminates the need to find the best estimate of the missing data and hence saves time.
Deep Dive into Fuzzy Artmap and Neural Network Approach to Online Processing of Inputs with Missing Values.
Real-time processing applications that depend heavily on newly arriving data often suffer from the problem of missing data. Where decisions have to be made using computational intelligence techniques, missing data become a hindering factor. The biggest challenge, on the one hand, is that most computational intelligence techniques, such as neural networks, cannot process input data with missing values and hence cannot perform classification or regression when some input data are missing. Various heuristics for handling missing data have, however, been proposed in the literature [1]. The simplest method is known as 'listwise deletion', which simply deletes instances with missing values [1]. The major disadvantage of this method is the dramatic loss of information in data sets. There is also well-documented evidence that ignoring or deleting cases with missing entries is not an effective strategy [1][2]. Other common techniques are imputation methods based on statistical procedures, such as mean computation, imputing the most dominant variable in the database, hot deck imputation and many more. Some of the best imputation techniques include the Expectation Maximization (EM) algorithm [3] as well as neural networks coupled with optimisation algorithms such as genetic algorithms, as used in [4] and [5]. Imputation techniques, in which missing data are replaced by estimates, are becoming increasingly popular, and a great deal of research has been done to find more accurate ways of approximating these estimates. Among others, Abdella and Marwala [4] used neural networks together with Genetic Algorithms (GA) to approximate missing data. Gabrys [6] has also used neuro-fuzzy techniques in the presence of missing data for pattern recognition problems.
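To make the contrast between listwise deletion and simple mean imputation concrete, the following is a minimal sketch in plain Python. The function names and the toy `readings` table are hypothetical and are not taken from the paper; missing entries are assumed to be encoded as NaN, and every column is assumed to contain at least one observed value.

```python
import math

NAN = float("nan")

def listwise_delete(rows):
    """Drop every row that contains at least one missing (NaN) value."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]

def mean_impute(rows):
    """Replace each NaN with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        for r in rows
    ]

# Hypothetical sensor table: 4 instances, 3 sensors, 2 missing entries.
readings = [
    [1.0, 10.0, 100.0],
    [2.0, NAN, 200.0],
    [3.0, 30.0, NAN],
    [4.0, 40.0, 400.0],
]

kept = listwise_delete(readings)   # only the fully observed rows survive
imputed = mean_impute(readings)    # all rows survive, gaps are filled
```

In this toy table only two entries out of twelve are missing, yet listwise deletion discards half of the instances, which illustrates the loss of information mentioned above.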
The other challenge in this work is that online condition monitoring uses time series data, and there is often limited time between readings, depending on how frequently the sensor is sampled. In classification and regression tasks, all decisions concerning how to proceed must be taken during this finite time period. Methods using optimisation techniques may take a long time to converge to a reliable estimate, and this depends on the complexity of the objective function being optimised. This calls for better techniques for dealing with the missing data problem.
We argue in this paper that it is not always necessary to predict the actual missing data. Put differently, the decision does not in all cases depend on every actual value. A vast amount of computational resources is therefore wasted in attempts to predict the missing values, whereas the ultimate result could have been achieved without them. In light of this challenge, this paper investigates a condition monitoring problem in which computational intelligence techniques are used to classify and regress in the presence of missing data without actually predicting the missing values. A novel approach, in which no attempt is made to recover the missing values, is presented for both regression and classification problems. An ensemble of fuzzy-ARTMAP classifiers is proposed for classification in the presence of missing data. The algorithm is further extended to a regression application, where a Multi-layer Perceptron (MLP) is used in an attempt to obtain the correct output from limited input variables. The proposed method is compared to a technique that combines neural networks with a Genetic Algorithm (GA) to approximate the missing data.
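The excerpt does not spell out how the ensemble is built, so the following is only a sketch of one plausible reading, under stated assumptions: each ensemble member is trained on a different subset of the input variables, and at prediction time only the members whose inputs are all observed cast a vote. The nearest-centroid base learner is a simple stand-in chosen for brevity, not Fuzzy-ARTMAP, and all names (`CentroidClassifier`, `train_ensemble`, `predict_with_missing`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

class CentroidClassifier:
    """Nearest-centroid classifier restricted to a fixed feature subset.

    Stand-in for an ensemble member; NOT the Fuzzy-ARTMAP used in the paper.
    """

    def __init__(self, features):
        self.features = tuple(features)
        self.centroids = {}

    def fit(self, X, y):
        grouped = {}
        for row, label in zip(X, y):
            grouped.setdefault(label, []).append([row[j] for j in self.features])
        # One centroid per class: the column-wise mean over that class's rows.
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, row):
        point = [row[j] for j in self.features]
        return min(self.centroids,
                   key=lambda label: math.dist(point, self.centroids[label]))

def train_ensemble(X, y, subset_size):
    """Train one member per combination of `subset_size` input features (assumed scheme)."""
    n_features = len(X[0])
    return [CentroidClassifier(s).fit(X, y)
            for s in combinations(range(n_features), subset_size)]

def predict_with_missing(ensemble, row):
    """Majority vote among members that need no missing input; None if none qualifies."""
    usable = [m for m in ensemble
              if not any(math.isnan(row[j]) for j in m.features)]
    if not usable:
        return None
    votes = Counter(m.predict(row) for m in usable)
    return votes.most_common(1)[0][0]

# Hypothetical training data: three sensors, two machine conditions.
X = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.9, 0.8, 0.9], [0.8, 0.9, 0.8]]
y = ["healthy", "healthy", "faulty", "faulty"]
ensemble = train_ensemble(X, y, subset_size=2)

nan = float("nan")
label = predict_with_missing(ensemble, [0.85, nan, 0.9])  # sensor 2 missing
```

No missing value is estimated at any point: the incomplete reading is simply routed to the members that never needed the missing sensor, which is what makes the idea attractive when the time between readings is short.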
According to Little and Rubin [1], missing data are categorized into three basic types, namely 'Missing at Random' (MAR), 'Missing Completely at Random' (MCAR) and 'Missing Not at Random' (MNAR). MAR is also known as the ignorable case [3]. Under MAR, the probability that a datum d from a sensor S is missing depends on measured variables from other sensors. A simple example of MAR is when sensor T is only read if the reading of sensor S is above a certain threshold. In this case, if the value read from sensor S is below the threshold, there is no need to read sensor T, and hence readings from T are declared missing at random. MCAR, on the other hand, refers to a condition where the probability that a value of S is missing is independent of any observed data. In this regard, the missing value depends neither on the previous state of the sensor nor on any reading from any other sensor. Lastly, MNAR occurs when data are neither MAR nor MCAR; it is also referred to as the non-ignorable case [1,3], as the missing observation depends on the outcome of interest. A detailed description of missing data theory can be found in [3]. In this paper, we shall assume that data are MAR.
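The three mechanisms can be illustrated with a small simulation. This is a sketch with made-up parameters: the uniform sensor readings, the threshold of 0.5, the MCAR drop probability of 0.3 and the MNAR cut-off of 0.8 are all assumptions chosen for illustration, and `simulate` is a hypothetical helper. Missing readings are encoded as `None`.

```python
import random

def simulate(n, threshold=0.5, p_mcar=0.3, mnar_cutoff=0.8, seed=0):
    """Generate n paired readings of sensors S and T under three missingness mechanisms."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        s = rng.random()
        t = rng.random()
        records.append({
            "S": s,
            # MAR: T is only read when the observed S exceeds the threshold.
            "T_mar": t if s > threshold else None,
            # MCAR: T is dropped with a fixed probability, independent of all data.
            "T_mcar": None if rng.random() < p_mcar else t,
            # MNAR: T goes missing because of its own (unobserved) value,
            # e.g. a sensor that fails to report high readings.
            "T_mnar": t if t <= mnar_cutoff else None,
        })
    return records

records = simulate(1000)
n_mar_missing = sum(r["T_mar"] is None for r in records)
```

In the MAR column, whether T is missing can be explained entirely by the observed value of S, which is the situation assumed in this paper. In the MNAR column the cause of the gap is the unobserved value itself, so no observed variable accounts for it.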
Neural networks may be viewed as systems that learn the complex input-output relationship from any given data. The training process of neural networks involves presenting the network with inputs and corresponding outputs and this process is termed supervised learn
…(Full text truncated)…