Analysis of Intelligent Classifiers and Enhancing the Detection Accuracy for Intrusion Detection System

In this paper we discuss and analyze several intelligent classifiers that allow automatic detection and classification of network attacks in an intrusion detection system. We begin by analyzing them with the WEKA software on a well-known IDS (Intrusion Detection System) dataset, NSL-KDD. NSL-KDD is a refined version of the KDD Cup 99 dataset, whose traffic was originally generated by MIT Lincoln Laboratory in a simulated military network environment. We then discuss and experiment with several hybrid AI (Artificial Intelligence) classifiers that can be used for IDS. Finally, we develop Java software with the three most efficient classifiers and compare it with other options. The results show the detection accuracy and efficiency of the single and combined classifiers.

💡 Research Summary

The paper presents a systematic study of intelligent classifiers for intrusion detection systems (IDS) and demonstrates how hybrid approaches can improve detection accuracy and reduce false‑positive rates. The authors begin by selecting the widely used NSL‑KDD dataset, which they preprocess through missing‑value removal, one‑hot encoding of categorical attributes, normalization of continuous features, and information‑gain based feature selection to produce a clean input for machine‑learning models.
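The information-gain criterion behind the feature-selection step can be illustrated with a small, self-contained Java sketch. It assumes a discrete feature. It also adds min-max scaling to [0, 1] as one common form of the normalization mentioned above, although the summary does not say which normalization the authors used. In WEKA this role is typically played by `InfoGainAttributeEval` with a `Ranker` search. The class `Preprocessing` and its methods are hypothetical names, not the authors' code or WEKA's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helpers sketching two preprocessing steps (not the authors' code). */
final class Preprocessing {

    private Preprocessing() {}

    /** Shannon entropy, in bits, of a list of class labels. */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Information gain of a discrete feature: H(class) minus H(class | feature). */
    static double informationGain(List<String> feature, List<String> labels) {
        // Partition the class labels by the feature's value.
        Map<String, List<String>> partitions = new HashMap<>();
        for (int i = 0; i < feature.size(); i++) {
            partitions.computeIfAbsent(feature.get(i), k -> new ArrayList<>()).add(labels.get(i));
        }
        // Weighted average entropy of the partitions.
        double conditional = 0.0;
        for (List<String> subset : partitions.values()) {
            conditional += ((double) subset.size() / labels.size()) * entropy(subset);
        }
        return entropy(labels) - conditional;
    }

    /** Min-max normalization of one continuous column into [0, 1]. */
    static double[] minMax(double[] column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : column) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column carries no information; map it to 0.
            out[i] = range == 0 ? 0.0 : (column[i] - min) / range;
        }
        return out;
    }
}
```

Ranking the features by `informationGain` and keeping the top-scoring ones reproduces the selection idea. The summary does not give the cut-off, so any threshold is a choice left to the reader.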

Using the WEKA platform, six conventional classifiers—Decision Tree (J48), Support Vector Machine (SMO), Naïve Bayes, Random Forest, k‑Nearest Neighbor (IBk), and Multilayer Perceptron—are evaluated with 10‑fold cross‑validation. Performance metrics include overall accuracy, detection rate, false‑positive rate, and ROC‑AUC. Random Forest and Multilayer Perceptron achieve the highest single‑model accuracies (≈ 83.5 % and 81.2 % respectively) but suffer from false‑positive rates above 12 %, which limits their practical applicability in operational IDS environments.
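The evaluation metrics can be made concrete under a simplifying assumption: traffic is treated as binary, with attack as the positive class. Under that assumption, detection rate is TP / (TP + FN) and false-positive rate is FP / (FP + TN). The fold assignment below is a plain shuffled split. WEKA's cross-validation stratifies folds by class for nominal targets, which this sketch omits. `IdsMetrics` is a hypothetical helper, not part of WEKA.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helpers for binary IDS evaluation (attack = positive class). */
final class IdsMetrics {

    private IdsMetrics() {}

    /** Returns {accuracy, detectionRate, falsePositiveRate}. */
    static double[] evaluate(boolean[] actualAttack, boolean[] predictedAttack) {
        int tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < actualAttack.length; i++) {
            if (actualAttack[i] && predictedAttack[i]) tp++;
            else if (!actualAttack[i] && !predictedAttack[i]) tn++;
            else if (!actualAttack[i]) fp++;   // normal traffic flagged as an attack
            else fn++;                         // attack that was missed
        }
        double accuracy = (double) (tp + tn) / actualAttack.length;
        double detectionRate = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double falsePositiveRate = fp + tn == 0 ? 0.0 : (double) fp / (fp + tn);
        return new double[] {accuracy, detectionRate, falsePositiveRate};
    }

    /** Assigns each of n instances to one of k folds after a seeded (unstratified) shuffle. */
    static int[] foldAssignment(int n, int k, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed));
        int[] fold = new int[n];
        for (int pos = 0; pos < n; pos++) {
            fold[order.get(pos)] = pos % k;   // round-robin keeps fold sizes within one of each other
        }
        return fold;
    }
}
```

With k = 10, each fold serves once as the test set while the remaining nine train the model. The per-fold metrics are then averaged.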

To address these shortcomings, the study introduces two hybrid ensemble strategies. The first is a majority‑vote ensemble that combines Random Forest, SVM, and Naïve Bayes. The second is a stacking architecture in which the same three base learners feed a Logistic Regression meta‑learner. The voting ensemble raises accuracy to 86.3 % and reduces the false‑positive rate to 8.7 %; the stacking model further improves accuracy to 87.1 % and lowers the false‑positive rate to 7.9 %. Notably, both ensembles achieve 92 % recall on the rare attack categories (U2R, R2L), indicating better handling of class imbalance than any single classifier.
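The two combination rules can be sketched without WEKA. Majority voting counts the labels predicted by the three base learners. With three voters, a tie arises only when all three disagree. This sketch breaks such ties in favor of the first learner, which is an assumption because the summary does not specify tie handling.

For stacking, the meta-learner is a logistic regression fitted by batch gradient descent on the base learners' attack probabilities, again assuming a binary attack/normal target. In a faithful setup those meta-features should be out-of-fold predictions, which is how WEKA's `Stacking` builds them with internal cross-validation. Otherwise the meta-learner is trained on overfitted base outputs. `Ensembles` and `LogisticMetaLearner` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the voting and stacking combiners (not WEKA's Vote/Stacking). */
final class Ensembles {

    private Ensembles() {}

    /** Majority vote over base-learner labels; ties go to the earliest learner's label. */
    static String majorityVote(String... basePredictions) {
        Map<String, Integer> votes = new LinkedHashMap<>();   // preserves first-seen order
        for (String p : basePredictions) votes.merge(p, 1, Integer::sum);
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) {   // strict '>' keeps the earlier label on ties
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Logistic-regression meta-learner over base-learner attack probabilities. */
    static final class LogisticMetaLearner {
        private final double[] w;
        private double b;

        LogisticMetaLearner(int numBaseLearners) {
            w = new double[numBaseLearners];
        }

        private static double sigmoid(double z) {
            return 1.0 / (1.0 + Math.exp(-z));
        }

        /** P(attack) for one row of base-learner outputs. */
        double predictProba(double[] x) {
            double z = b;
            for (int j = 0; j < w.length; j++) z += w[j] * x[j];
            return sigmoid(z);
        }

        /** Batch gradient descent on log-loss; y[i] is 1 for attack, 0 for normal. */
        void fit(double[][] x, int[] y, double learningRate, int epochs) {
            for (int epoch = 0; epoch < epochs; epoch++) {
                double[] gradW = new double[w.length];
                double gradB = 0.0;
                for (int i = 0; i < x.length; i++) {
                    double err = predictProba(x[i]) - y[i];
                    for (int j = 0; j < w.length; j++) gradW[j] += err * x[i][j];
                    gradB += err;
                }
                for (int j = 0; j < w.length; j++) w[j] -= learningRate * gradW[j] / x.length;
                b -= learningRate * gradB / x.length;
            }
        }
    }
}
```

For example, `Ensembles.majorityVote("dos", "dos", "normal")` returns `"dos"`. Stacking differs from voting in that the meta-learner learns how much to trust each base learner instead of weighting them equally.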

The authors then translate these findings into a functional IDS prototype built in Java. Using the WEKA API, trained models are serialized and loaded at runtime. Real‑time network flow records are accepted in CSV format, undergo the same preprocessing pipeline, and are classified on the fly. Three top‑performing models (Random Forest, voting ensemble, stacking ensemble) are integrated, and their runtime characteristics are measured on an Intel i7‑9700K with 16 GB RAM running Java 11. Average prediction latencies range from 12 ms (single Random Forest) to 18 ms (voting ensemble), comfortably satisfying real‑time IDS latency requirements. The prototype includes a graphical interface for alert visualization and logging capabilities for forensic analysis.
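The prototype's per-record loop (parse a CSV flow record, classify it, time it, raise an alert) can be sketched in plain Java. The `FlowClassifier` interface stands in for a trained model. With the WEKA API, a serialized model would typically be loaded with `weka.core.SerializationHelper.read` and queried with `Classifier.classifyInstance`, which this sketch does not reproduce. The sketch also assumes that each CSV line already holds preprocessed numeric features and that the benign class is labeled `"normal"`. These assumptions, like the names `FlowScorer`, `parseRecord`, and `meanLatencyMs`, are illustrative only.

```java
import java.util.List;

/** Hypothetical sketch of a per-record scoring loop (not the authors' prototype). */
final class FlowScorer {

    /** Stand-in for a trained model such as a deserialized WEKA classifier. */
    interface FlowClassifier {
        String classify(double[] features);
    }

    private FlowScorer() {}

    /** Parses one CSV flow record; assumes features are already encoded and normalized. */
    static double[] parseRecord(String csvLine) {
        String[] fields = csvLine.trim().split(",");
        double[] features = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            features[i] = Double.parseDouble(fields[i].trim());
        }
        return features;
    }

    /** Classifies every record, collects alerts, and returns mean latency in milliseconds. */
    static double meanLatencyMs(List<String> csvLines, FlowClassifier model, List<String> alertsOut) {
        long totalNanos = 0;
        for (String line : csvLines) {
            long start = System.nanoTime();
            String label = model.classify(parseRecord(line));
            totalNanos += System.nanoTime() - start;
            if (!"normal".equals(label)) {
                alertsOut.add(label + " <- " + line);   // hypothetical alert/log format
            }
        }
        return csvLines.isEmpty() ? 0.0 : totalNanos / 1e6 / csvLines.size();
    }
}
```

The timed interval covers parsing plus classification, so it approximates a per-record latency. Actual figures depend on the models and hardware used.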

Key insights derived from the research are: (1) meticulous preprocessing and feature selection are critical for maximizing classifier performance on NSL‑KDD; (2) hybrid ensembles outperform individual learners both in overall accuracy and in mitigating class imbalance, thereby reducing false alarms; (3) the combination of WEKA and Java provides a rapid prototyping environment that can be readily extended to production‑grade IDS solutions. The paper concludes by recommending future work that incorporates deep‑learning models such as convolutional and recurrent neural networks, as well as online learning mechanisms, to develop adaptive IDS capable of handling evolving attack patterns and high‑throughput network environments.

📜 Original Paper Content