Controlling the complexity of a Feedforward Neural Network (FNN) for a given approximation task amounts to choosing an appropriate model size, a choice that strongly affects both generalization quality and computational efficiency. However, deciding on the right level of model complexity is highly challenging in FNN applications. In this paper, a new Model Selection algorithm based on Binary Ant Colony Optimization (MS-BACO) is proposed to find the optimal FNN model in terms of neural complexity and cross-entropy error. MS-BACO is a meta-heuristic algorithm that treats model selection as a combinatorial optimization problem. By quantifying both the correlation among hidden neurons and the sensitivity of the FNN output to each hidden neuron, using a sample-based sensitivity analysis method called the extended Fourier amplitude sensitivity test, the algorithm tends to select the FNN model whose hidden neurons have the most distinct hyperplanes and the highest contribution percentages. The performance of the proposed algorithm is investigated with three different designs of heuristic information. Comparison of the findings verifies that the newly introduced algorithm provides a more compact and accurate FNN model.
A compelling way to view Feed-forward Neural Networks (FNNs) is as function approximation machines: they are designed to learn a complex nonlinear mapping from a set of input values to output values. In the context of deep learning, FNNs are the quintessential models and have been broadly used as a powerful machine learning tool in various fields of science and engineering, e.g. identification and control of dynamical systems 1, robotics 2, forecasting of financial and economic time series 3, renewable power systems 4, and big data 5. Every hidden layer provides a new representation of the input data. The more complex the working dataset, the more complex a representation of the data (i.e., a larger hidden layer) is needed to allow the network to properly learn the desired function [6][7][8]. As a concrete example of this issue, three classifier FNNs with a single hidden layer of 2, 4 and 20 hidden neurons, respectively, are trained on an artificially generated dataset. The dataset consists of two-dimensional vectors divided into two non-linearly separable class labels. For comparison, the hyperplanes of the hidden neurons and the resulting global decision surface of each network over the test data are plotted in Fig. 1.

Fig. 1. Comparison of the hidden-neuron hyperplanes and the generalization ability of three classifier FNNs with a single hidden layer. Each pair (a, d), (b, e) and (c, f) shows the visualized hyperplanes of the neurons and the decision surface for the networks with 2, 4 and 20 hidden neurons, respectively, trained on an artificially generated dataset.
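The following minimal sketch illustrates how such a comparison can be reproduced. It is not the authors' code: the synthetic dataset generator (make_moons from scikit-learn), the tanh activation, and all hyperparameters are assumptions made purely for illustration.

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Synthetic two-dimensional, non-linearly separable two-class dataset (assumed generator).
    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Single-hidden-layer classifiers with 2, 4 and 20 hidden neurons, as in Fig. 1.
    for n_hidden in (2, 4, 20):
        clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation='tanh',
                            max_iter=2000, random_state=0)
        clf.fit(X_train, y_train)
        print(f"{n_hidden:2d} hidden neurons: test accuracy = {clf.score(X_test, y_test):.3f}")

Evaluating each classifier's decision function over a grid and plotting it would then yield decision surfaces analogous to those shown in Fig. 1.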
This experiment leads to several observations:

• The complexity of the network mainly influences the generalization quality of the model. Notably, a network that is too small cannot properly fit the true function described by the training data. In our case study, both larger networks achieve a classification rate of 100%, while the network with 2 hidden neurons reaches only 90%, since it cannot produce a hypersurface that partitions the underlying vector space into two sets, one for each class.

• Another key point is computational efficiency. The network with 20 hidden neurons, which is clearly larger than necessary, requires unneeded arithmetic operations and hence more computational resources. The network with 4 hidden neurons is more efficient in both forward and backward propagation while reaching the same classification accuracy. This matters all the more because back-propagation, the most popular gradient-based learning algorithm, cannot satisfy the growing real-time learning needs of many applications 9.

• A network with a larger hidden layer has more weight connections, which produces more dimensions in weight-space. As a result, more paths are created around the barriers of poor local minima that exist in lower-dimensional subspaces; the local minima problem therefore appears to be intensified in networks that are too small 10. On the other hand, Fig. 1.c shows that many hidden neurons in the large network have very similar or identical hyperplanes. This similarity may amount to redundancy among hidden neurons (a simple activation-correlation check of this kind is sketched at the end of this section).

• Complexity increases in an excessively large network, and even though such a network might accurately approximate the desired function, a larger network learns more quickly and is therefore more likely to generalize poorly due to overfitting 6,11. Consequently, the learning process would further require various regularization techniques, which do not always lead to the best solutions.

If two FNN models trained on the same training data have the same generalization ability, then the model with the simpler structure (the lower number of free parameters) should be selected as the best model. Altogether, the network with 4 hidden neurons is identified as having the finer level of complexity among the models in our experiment. However, avoiding overestimating or underestimating the size of the network in FNN applications is recognized to be a difficult task. Neural network architectures that perform well are still typically designed manually by experts in a cumbersome trial-and-error process 12, while finding the "minimal" architecture is usually NP-hard 13. In recent years, different algorithms based on techniques such as sensitivity measures [14][15][16][17][18][19][20], correlation 17,[21][22][23][24][25], state-space search [26][27][28][29][30], extreme learning machine [31][32][33][34], cascade-correlation 35,36, sparse signal representation 37 and the thermal perceptron rule 38 have been proposed to optimize the structure of an FNN. A collection of the prominent algorithms is depicted in Fig. 2. Notably, some researchers have tried to explore the state-space of all different combinations of hidden neurons and layers
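Regarding the redundancy observation above, one simple, illustrative way to quantify how similar two hidden neurons are is to correlate their activations over the training inputs; highly correlated pairs point to near-identical hyperplanes. This sketch is an assumption for illustration only, not the MS-BACO procedure, which instead relies on the extended Fourier amplitude sensitivity test and its own correlation measure.

    import numpy as np

    def hidden_activations(X, W, b):
        # tanh activations of a single hidden layer; W has shape (n_inputs, n_hidden).
        return np.tanh(X @ W + b)

    def redundant_pairs(H, threshold=0.95):
        # Index pairs of hidden neurons whose activation vectors are highly correlated.
        corr = np.corrcoef(H.T)  # (n_hidden, n_hidden) correlation matrix
        n = corr.shape[0]
        return [(i, j) for i in range(n) for j in range(i + 1, n)
                if abs(corr[i, j]) >= threshold]

    # Example with random weights standing in for a trained 20-neuron hidden layer.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    W, b = rng.normal(size=(2, 20)), rng.normal(size=20)
    print(redundant_pairs(hidden_activations(X, W, b)))

Pruning or merging one neuron from each flagged pair is one conceivable way to reduce such redundancy, at the cost of re-training the remaining weights.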