Using Feature Weights to Improve Performance of Neural Networks
📝 Abstract
Different features have different degrees of relevance to a given learning problem: some are only weakly relevant, while others are very important. Instead of selecting the most relevant features through feature selection, an algorithm can be given this knowledge of feature importance directly, based on expert opinion or prior learning. Learning can be faster and more accurate when learners take feature importance into account. We present the Correlation Aided Neural Network (CANN), such an algorithm. CANN treats feature importance as the correlation coefficient between the target attribute and each feature, and modifies the standard feed-forward neural network to fit both the correlation values and the training data. Empirical evaluation shows that CANN is faster and more accurate than the two-step approach of feature selection followed by a standard learning algorithm.
📄 Content
1 Introduction

Feature selection is a popular method for improving the performance of inductive machine learning algorithms. Many learning problems have a large feature set containing many redundant features, so extracting the useful features improves performance considerably [Guyon & Elisseeff, 2003]. However, feature selection algorithms are only preprocessors: they select and modify a dataset, discarding the less relevant features. After this preprocessing stage, machine learning algorithms treat all remaining features with equal importance. Yet less relevant features may still contribute to the learning problem, so discarding them entirely can impede accuracy; this is why feature selection degrades performance in some cases. Conversely, some features are more important to the learning problem than others, so a ranking based on importance toward the learning problem can be generated. In fact, such ranking measures are used in many feature selection algorithms [Leray & Gallinari, 1998; Bekkerman et al., 2003; Ruck et al., 1990]. Moreover, the principal problem in machine learning is not just achieving accuracy.
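As a concrete instance of such a ranking measure, the abstract states that CANN uses the correlation coefficient between each feature and the target as its importance weight. A minimal sketch of computing absolute Pearson correlations as per-feature importance weights might look as follows; the function name and the toy data are illustrative, not taken from the paper:

```python
import numpy as np

def correlation_weights(X, y):
    """Absolute Pearson correlation between each feature and the target.

    X: (n_samples, n_features) array; y: (n_samples,) target values.
    Returns one importance weight in [0, 1] per feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the target
    cov = Xc.T @ yc                  # unnormalized covariance per feature
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(cov / denom)

# Toy example: feature 0 tracks the target, feature 1 is pure noise,
# so feature 0 should receive a much larger weight.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
y = 2.0 * x0 + 0.1 * rng.normal(size=200)
w = correlation_weights(np.column_stack([x0, x1]), y)
```

Such weights could equally come from expert opinion or prior learning, as the abstract notes; correlation is simply a cheap, data-driven default.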
Rather, it is to maintain high accuracy when given little data. Training data is scarce in most fields. In data mining problems the amount of data is not an issue, as the goal is to learn from massive data warehouses; however, most learning problems in which little progress has been made are precisely those where data is limited. Hence, high accuracy with scarce data is essential for real-world use of machine learning. Yet most machine learning algorithms are inductive and require a large amount of data; how much data is needed is a central question of statistical learning theory [Kearns & Vazirani, 1994]. Developing ever more refined inductive algorithms is not the solution to the data-scarcity problem: learning theory places a fundamental limit on how much knowledge can be learned from a given set of data. The only solution, then, is to provide external knowledge along with the data. The focus of this paper is learning faster when given external knowledge in the form of feature importance weights. If learners treat features according to their importance rather than treating all features equally, they have been shown to perform better [Iqbal, 2011; Zhang & Wang, 2010]. This is a new area of machine learning research that has been gaining attention. IANN (Importance Aided Neural Network) [Iqbal, 2011] extended multilayer perceptrons [Mitchell, 1997] to use feature importance values. Domain knowledge was provided as feature weights, real values in the [0, 1] range representing the importance of each feature. IANN performed better than many empirical learning algorithms and also required significantly less training data to perform well, which is even more important than improved accuracy, since acquiring training data is expensive in most domains [Scott, 1991; Marcus, 1989]. Our research takes a different approach than the IANN system.
We present CANN (Correlation Aided Neural Network), a neural network system that can use feature importance values in the learning process to attain better performance. It is more robust and theoretically sound than IANN: while IANN is based on heuristics with little theoretical justification, CANN is based on the same principles as neural network backpropagation itself.

Ridwan Al Iqbal, American International University-Bangladesh, Dhaka, Bangladesh. stopofeger@yahoo.com

The IANN algorithm used the feature importance values by changing the learning rate based on importance. The connections between the input features and the first-hidden-layer nodes had different learning rates, scaled by the feature importance value. The weights were also initialized so that more important features had a higher probability of receiving a larger initial weight. The heuristic is that more important features have overall higher connection weight while
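The IANN-style learning-rate scheme described above can be sketched roughly as below. This is a hedged illustration, not the paper's implementation: the one-hidden-layer architecture, sigmoid activations, hyperparameters, and the function name are all assumptions; only the core idea, scaling each input-to-hidden weight update by its feature's importance, comes from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_importance_scaled(X, y, importance, hidden=4, lr=2.0,
                            epochs=3000, seed=1):
    """One-hidden-layer MLP (binary target) where each input->hidden
    connection's update is scaled by its feature's importance weight.
    Illustrative sketch of the IANN heuristic, not the original code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    imp = np.asarray(importance, dtype=float).reshape(d, 1)  # per-feature scale
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                 # forward pass
        out = sigmoid(h @ W2 + b2)
        err = out - y                            # dLoss/dlogit (cross-entropy)
        gW2 = h.T @ err / n
        gb2 = err.mean()
        dh = (err[:, None] * W2[None, :]) * h * (1 - h)
        gW1 = X.T @ dh / n
        gb1 = dh.mean(axis=0)
        W2 -= lr * gW2
        b2 -= lr * gb2
        W1 -= lr * imp * gW1                     # importance-scaled update
        b1 -= lr * gb1
    return W1, b1, W2, b2

# Toy usage: the target depends only on feature 0; feature 1 is irrelevant,
# so it is given a small importance weight.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 1., 1.])
W1, b1, W2, b2 = train_importance_scaled(X, y, importance=[1.0, 0.1])
p = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
```

Note that importance only rescales the gradient step for input-layer connections; the rest of backpropagation is unchanged, which is what makes the heuristic easy to bolt onto an existing network.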
This content is AI-processed based on ArXiv data.