Feature Selection (FS) plays an important role in learning and classification tasks. The objective of FS is to select relevant and non-redundant features. Given the huge number of features in real-world applications, FS methods based on batch learning cannot handle big data problems, especially when data arrive sequentially. In this paper, we propose an online feature selection system that addresses this problem. More specifically, we treat online supervised feature selection for binary classification as a decision-making problem. A philosophical view of this problem leads to a hybridization of two important domains: feature selection using online learning techniques (OFS) and automated negotiation (AN). The proposed OFS system, called MOANOFS (Multi-Objective Automated Negotiation based Online Feature Selection), uses two levels of decision. At the first level, from n learners (or OFS methods), we decide which k learners are trustworthy (i.e., have high confidence or trust values). These k elected learners participate in the second level, where we integrate our proposed Multilateral Automated Negotiation based OFS (MANOFS) method to make the final decision, i.e., to determine which features are relevant. We show that MOANOFS can be successfully applied to different domains and achieves high accuracy on several real-world applications. Index Terms: Feature selection, online learning, multi-objective automated negotiation, trust, classification, big data.
During the last three decades, Feature Selection (FS) has been extensively studied in Data Mining [1], [2], Pattern Classification [3], [4] and Machine Learning [5], [6]. FS is defined as the process of selecting a subset of relevant features and removing the redundant ones from a dataset in order to build effective prediction models.
In recent years, the volume of data (e.g., news, medical imaging) has grown enormously, and with it the amount of redundant information.
Worse, redundant and irrelevant data degrade the performance of the associated classification methods. With the rapid development of the Internet, tremendous amounts of data, up to millions or billions of samples, can now be collected for training machine learning models.
Most existing studies of feature selection are conducted in batch (off-line) learning. Batch learning assumes that all features of the training instances are available a priori. This assumption does not always hold in real-world applications, where training examples often arrive sequentially or where collecting the full information of the training data is expensive.
With the emergence of large-scale data and big data applications, feature selection based on batch learning becomes impractical.
Recently, Online Feature Selection (OFS) methods [7], [8], [9], [10] have been proposed to overcome the drawbacks of batch feature selection methods. These methods address feature selection tasks by exploiting online learning techniques from machine learning.
Today, a significant part of information is stored in textual databases (or text documents), which consist of large collections of documents from various sources such as news, articles, books, digital libraries, email messages and Web pages. Real-world applications must therefore deal with sequential, massive and high-dimensional training data. Online learning has been extensively studied in machine learning and data mining [11], [12], [13]. In a traditional online learning task (e.g., online classification), a learner is trained sequentially to predict the class labels of a sequence of instances.
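The traditional online learning setting described above can be illustrated with a minimal sketch: a mistake-driven Perceptron that predicts each label before seeing it and updates its weights only when it errs. The sketch assumes dense feature lists and labels in {-1, +1}, and the function name `online_perceptron` is hypothetical. It is a textbook illustration of online classification, not one of the OFS methods cited in this paper.

```python
from typing import Iterable, List, Tuple


def online_perceptron(
    stream: Iterable[Tuple[List[float], int]], dim: int
) -> Tuple[List[float], int]:
    """Train a linear classifier one instance at a time.

    Labels are assumed to be in {-1, +1}. Returns the final weight
    vector and the number of mistakes made while processing the stream.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        # Predict with the current model before the true label is revealed.
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score >= 0 else -1
        if y_hat != y:
            # Mistake-driven update: w <- w + y * x
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes


if __name__ == "__main__":
    stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.1], 1)]
    weights, n_mistakes = online_perceptron(stream, dim=2)
    print(weights, n_mistakes)
```

Each instance is seen once and then discarded, so memory use depends only on the dimension, which is the property that makes online learners attractive for sequentially arriving data.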
As distributed data mining has spread across disciplines, feature selection and online learning have both emerged as ways to improve the selection of relevant data. The selected data can then be mined online for various data mining tasks, enabling efficient knowledge discovery and collaborative computation.
In this paper, we address the problem of online feature selection with large-scale, ultra-high-dimensional data for classification, using a new perspective on data analysis. When tackling an online feature selection problem, several questions arise. Is the OFS method used to select relevant features the best one for enhancing classification performance? Does selecting, say, features 1, 2, 3, 8… yield lower error than selecting features 1, 3, 4, 8…? Can combining several OFS methods, or other intelligent methods, improve classification performance? These questions motivated the central idea of this paper: a hybridization of two domains to support decision making, namely online feature selection and automated negotiation (described here from a philosophical perspective).
This paper addresses large-scale online feature selection problems with big data. To this end, we propose a novel online feature selection system that builds on recent advances in online machine learning [11], [14], [15] and on a conflict-resolution technique, automated negotiation [16], in order to enhance classification performance on ultra-high-dimensional databases.
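The abstract describes MOANOFS as a two-level decision process: first elect the k most trusted of n OFS learners, then combine the elected learners' feature choices. The following minimal sketch only illustrates that data flow under loudly stated assumptions. The trust values, learner names, the `quorum` parameter and both function names are hypothetical. The second level here is a plain majority vote used as a stand-in; the paper's actual second level is the MANOFS multilateral negotiation, which this sketch does not reproduce.

```python
from collections import Counter
from typing import Dict, List, Set


def elect_trusted(trust: Dict[str, float], k: int) -> List[str]:
    """Level 1 (sketch): keep the k learners with the highest trust values."""
    return sorted(trust, key=lambda name: trust[name], reverse=True)[:k]


def combine_by_vote(
    proposals: Dict[str, Set[int]], elected: List[str], quorum: int
) -> Set[int]:
    """Level 2 stand-in: keep features proposed by at least `quorum` elected learners.

    NOTE: this is NOT the MANOFS negotiation; a simple vote is used here
    only to show how the elected learners' proposals could be merged.
    """
    votes = Counter(f for name in elected for f in proposals[name])
    return {f for f, count in votes.items() if count >= quorum}


if __name__ == "__main__":
    # Hypothetical trust scores, e.g. recent online accuracy of each learner.
    trust = {"A": 0.91, "B": 0.62, "C": 0.85, "D": 0.40}
    proposals = {"A": {1, 2, 3, 8}, "B": {5, 6}, "C": {1, 3, 4, 8}, "D": {2, 9}}
    elected = elect_trusted(trust, k=2)
    print(elected)  # ['A', 'C']
    print(sorted(combine_by_vote(proposals, elected, quorum=2)))  # [1, 3, 8]
```

With learners A and C proposing the feature sets {1, 2, 3, 8} and {1, 3, 4, 8} from the questions above, the vote keeps only the features both agree on. A negotiation-based second level is intended to resolve such disagreements more flexibly than a fixed quorum.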
The remainder of this paper is organized as follows. We review OFS and AN in Section 2. We then describe our proposed ANOFS methods and our OFS system in Sections 3 and 4, respectively. Finally, we conclude the paper and outline possible future work in Section 5.
In this section, we first give an overview of online feature selection methods. Then, we introduce the principle of automated negotiation (AN). In particular, we present state-of-the-art works using AN on which we based our OFS system.
Online Feature Selection (OFS) aims to solve the feature selection problem in an online fashion by effectively exploiting online learning techniques. The key challenge of OFS is how to make accurate predictions for an instance using only a small number of active features. We now review existing OFS methods.
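The key challenge above, predicting with only a small number of active features, can be illustrated with a minimal sketch of a generic perceptron-plus-truncation scheme. After each mistake-driven update, only the `budget` weights with the largest magnitude are kept and all others are set to zero, so the model never uses more than `budget` features. The budget value, labels in {-1, +1}, and the function names `truncate` and `online_fs_truncation` are assumptions for illustration. This is not the exact formulation of any method cited in [7]–[10], nor the system proposed in this paper.

```python
from typing import Iterable, List, Tuple


def truncate(w: List[float], budget: int) -> List[float]:
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    if budget >= len(w):
        return list(w)
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:budget])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def online_fs_truncation(
    stream: Iterable[Tuple[List[float], int]], dim: int, budget: int
) -> Tuple[List[float], List[int], int]:
    """Online linear classifier restricted to at most `budget` active features.

    Returns the final weights, the indices of the active (non-zero)
    features, and the number of mistakes made on the stream.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score >= 0 else -1
        if y_hat != y:
            mistakes += 1
            # Perceptron update, then enforce the feature budget.
            w = [wi + y * xi for wi, xi in zip(w, x)]
            w = truncate(w, budget)
    active = [i for i, wi in enumerate(w) if wi != 0.0]
    return w, active, mistakes


if __name__ == "__main__":
    stream = [([1.0, 0.5, 0.0], -1), ([0.0, 2.0, 0.0], 1), ([0.2, 0.0, 3.0], 1)]
    weights, active, n_mistakes = online_fs_truncation(stream, dim=3, budget=1)
    print(weights, active, n_mistakes)
```

Hard truncation is the simplest way to enforce a budget. It can, however, discard a feature that has just become useful, which is one reason the OFS literature explores more careful update and selection rules.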
One of the most straightforward approaches to online feature selection is applying the Perceptron algorithm via truncation (PETrun) [10]. Specifically, at each step, the classifier fi