An ensemble approach for feature selection of Cyber Attack Dataset

📝 Original Info

  • Title: An ensemble approach for feature selection of Cyber Attack Dataset
  • ArXiv ID: 0912.1014
  • Date: 2009-12-08
  • Authors: Shailendra Singh, Sanjay Silakari

📝 Abstract

Feature selection is an indispensable preprocessing step when mining huge datasets, and it can significantly improve overall system performance. In this paper we therefore focus on a hybrid approach to feature selection. The method has two phases. The filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed through the K-nearest neighbor classifier for classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.

📄 Full Content

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 2, 2009

An ensemble approach for feature selection of Cyber Attack Dataset

Shailendra Singh, Department of Information Technology, Rajiv Gandhi Technological University, Bhopal, India, e-mail: shailendrasingh@rgtu.net
Sanjay Silakari, Department of Computer Science & Engineering, Rajiv Gandhi Technological University, Bhopal, India, e-mail: ssilakari@rgtu.net

Abstract— Feature selection is an indispensable pre-processing step when mining huge datasets, and it can significantly improve overall system performance. In this paper we therefore focus on a hybrid approach to feature selection. The method has two phases. The filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed through the K-nearest neighbor classifier for classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.

Keywords- Filter, Wrapper, Information gain, K-nearest neighbor, KDDCUP99

I. INTRODUCTION

Feature selection aims to choose an optimal subset of features that are necessary and sufficient to describe the target concept. It has proven effective, in both theory and practice, in enhancing learning efficiency, increasing predictive accuracy, and reducing the complexity of learned results [1][2]. Optimal feature selection requires searching an exponentially large space (O(2^N)), where N is the number of features [3], so it may be too costly and impractical. Many feature selection methods have been proposed in recent years. The survey paper [4] gives an overview of the different approaches used in cyber attack detection systems. These methods fall into two approaches: filter and wrapper [5]. The difference between the filter model and the wrapper model is whether feature selection relies on a learning algorithm. The filter model is independent of any learning algorithm, and its advantages lie in better generality and low computational cost [6]. The wrapper model relies on a learning algorithm and can be expected to achieve high classification performance, but it is computationally expensive, especially when dealing with large-scale datasets [7] such as KDDCUP99.
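The contrast between the two models can be made concrete with a small sketch. It is illustrative only and is not the authors' implementation: the toy records, the function names, the Hamming distance, leave-one-out scoring, and the greedy loop are all assumptions made for this example. `information_gain` scores a feature without any classifier (filter style), while `knn_loo_accuracy` scores a feature subset by actually running a K-nearest neighbor classifier (wrapper style). `ranked_forward_selection` shows one simple way a filter ranking could steer a wrapper search.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Filter score: H(class) - H(class | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def knn_loo_accuracy(rows, labels, features, k=1):
    """Wrapper score: leave-one-out accuracy of k-NN using only `features`.

    Distance is the number of mismatching values (Hamming distance), an
    assumption that suits the discrete toy data below.
    """
    correct = 0
    for i, row in enumerate(rows):
        dists = sorted(
            (sum(row[f] != other[f] for f in features), j)
            for j, other in enumerate(rows)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(rows)


def ranked_forward_selection(rows, labels, k=1):
    """Hypothetical combination: try features in decreasing information-gain
    order and keep each one only if it improves the k-NN accuracy."""
    ranking = sorted(
        range(len(rows[0])),
        key=lambda f: information_gain([r[f] for r in rows], labels),
        reverse=True,
    )
    selected, best = [], 0.0
    for f in ranking:
        score = knn_loo_accuracy(rows, labels, selected + [f], k)
        if score > best:
            selected, best = selected + [f], score
    return selected, best


# Made-up toy records (protocol, service, flag); not taken from the paper.
TOY_ROWS = [
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("icmp", "ecr_i", "SF"),
    ("icmp", "ecr_i", "SF"),
    ("tcp", "private", "S0"),
    ("tcp", "private", "S0"),
]
TOY_LABELS = ["normal", "normal", "smurf", "smurf", "neptune", "neptune"]

if __name__ == "__main__":
    for f in range(3):
        gain = information_gain([r[f] for r in TOY_ROWS], TOY_LABELS)
        print(f"feature {f}: information gain = {gain:.3f}")
    print(ranked_forward_selection(TOY_ROWS, TOY_LABELS))
```

The filter score needs one pass over the data per feature, whereas every wrapper evaluation reruns the classifier. That cost difference is why an exhaustive wrapper search over all 2^N subsets quickly becomes impractical as N grows.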

This paper combines the two models to make use of the advantages of both. We adopt a two-phase feature selection method. The filter phase selects features and uses the feature estimation as heuristic information to guide the wrapper algorithm. We use information gain [8], an uncertainty-based measure, to obtain the feature estimation. The second phase is a data mining algorithm that is used to estimate the accuracy of cyber attack detection; we use a K-nearest neighbor based wrapper selector. The feature estimation obtained from the first phase is used to initialize the search process. The effectiveness of this method is demonstrated through an empirical study on the KDDCUP99 dataset [9].

II. THE KDDCUP99 DATASET

In the 1998 DARPA cyber attack detection evaluation program, an environment [9][10] was set up to acquire raw TCP/IP dump data for a network by simulating a typical U.S. Air Force LAN. The LAN was operated like a real environment but was blasted with multiple attacks. For each TCP/IP connection, 41 quantitative (continuous) and qualitative (discrete) features were extracted; of these 41 features, 34 are numeric and 7 are symbolic. The data contains 24 attack types that can be classified into four main categories:

  • DOS: Denial of Service attack.
  • R2L: Remote to Local (User) attack.
  • U2R: User to Root attack.
  • Probing: Surveillance and other probing.

A. Denial of Service Attack (DOS)

Denial of service (DOS) is a class of attack in which an attacker makes a computing or memory resource too busy or too full to handle legitimate requests, thus denying legitimate users access to a machine.

B. Remote to Local (User) Attacks

A remote to local (R2L) attack is a class of attack in which an attacker sends packets to a machine over a network and then exploits the machine's vulnerability to illegally gain local access to it.

C. User to Root Attacks

A user to root (U2R) attack is a class of attack in which an attacker starts with access to a normal user account on the system and is able to exploit a vulnerability to gain root access to the system.

D. Probing

Probing is a class of attack in which an attacker scans a network to gather information or find known vulnerabilities. An attacker with a map of the machines and services that are available on a network can use the information to look for exploits.

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

TABLE I. CLASS LABELS THAT APPEAR IN THE 10% DATA SET
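As a rough illustration of how raw class labels relate to the four attack categories of Section II, the following sketch maps a handful of labels to their category and tallies them. The label subset is illustrative and does not reproduce Table I; the names `CATEGORY_OF`, `categorize`, and `category_counts`, the extra "Normal" and "Unknown" buckets, and the handling of a trailing period on labels are assumptions of this example.

```python
from collections import Counter

# Illustrative subset of class labels only; NOT the paper's Table I.
CATEGORY_OF = {
    "normal": "Normal",          # benign traffic, not an attack category
    "smurf": "DOS",
    "neptune": "DOS",
    "guess_passwd": "R2L",
    "warezclient": "R2L",
    "buffer_overflow": "U2R",
    "rootkit": "U2R",
    "portsweep": "Probing",
    "ipsweep": "Probing",
}


def categorize(label):
    """Map a raw label such as 'smurf.' to its category.

    Assumes labels may carry surrounding whitespace or a trailing period;
    labels missing from the illustrative table map to 'Unknown'.
    """
    return CATEGORY_OF.get(label.strip().rstrip("."), "Unknown")


def category_counts(labels):
    """Count how many records fall into each category."""
    return Counter(categorize(label) for label in labels)


if __name__ == "__main__":
    sample = ["normal.", "smurf.", "neptune.", "rootkit.", "portsweep."]
    for category, count in sorted(category_counts(sample).items()):
        print(category, count)
```

Collapsing labels to categories in this way turns the classification task into a five-class problem (four attack categories plus normal traffic), which is one common way to report results on this dataset.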

…(Full text truncated)…

