Unsupervised Anomaly Detection in NSL-KDD Using β-VAE: A Latent Space and Reconstruction Error Approach

Dylan Baptiste∗†, Ramla Saddem∗, Alexandre Philippot∗, François Foyer†
∗Université de Reims Champagne-Ardenne, CRESTIC, Reims, France
Email: {dylan.baptiste, ramla.saddem, alexandre.philippot}@univ-reims.fr
†Seckiot, Paris, France
Email: francois.foyer@seckiot.fr

Abstract—As Operational Technology increasingly integrates with Information Technology, the need for Intrusion Detection Systems becomes more important. This paper explores an unsupervised approach to anomaly detection in network traffic using β-Variational Autoencoders on the NSL-KDD dataset. We investigate two methods: leveraging the latent space structure by measuring distances from test samples to the training data projections, and using the reconstruction error as a conventional anomaly detection metric. By comparing these approaches, we provide insights into their respective advantages and limitations in an unsupervised setting. Experimental results highlight the effectiveness of latent space exploitation for classification tasks.

Index Terms—Autoencoder, Deep Learning, Unsupervised Learning, Anomaly Detection, Cybersecurity, Intrusion Detection System.

I. INTRODUCTION

The increasing integration of Operational Technology (OT) with Information Technology (IT) systems has led to a growing need for intrusion detection systems (IDS) in industrial environments. Anomaly detection is a crucial component of IDS, as it enables the identification of malicious activities that deviate from normal behavior. In this context, the NSL-KDD dataset [1] is a widely used benchmark for evaluating the performance of anomaly detection algorithms.

In this paper, we explore an unsupervised approach to anomaly detection in network traffic using β-Variational Autoencoders (β-VAE) [2].
β-VAEs are deep learning models that learn a low-dimensional representation of the input data, known as the latent space. By leveraging the latent space structure, we aim to detect anomalies in network traffic without the need for labeled data. We investigate two complementary methods for anomaly detection using β-VAEs: measuring the distances from test samples to the projections of the training data in the latent space, and using the reconstruction error as a conventional anomaly detection metric. By comparing these approaches, we provide insights into their respective advantages and limitations in an unsupervised setting.

(The work presented in this article was carried out as part of a collaboration between the CReSTIC laboratory and Seckiot, funded under the ANRT (Association Nationale de la Recherche et de la Technologie), the national association for Research and Technology.)

The remainder of this paper is organized as follows: Section II provides definitions and formalizations of the key concepts in this study, such as the β-VAE model and the NSL-KDD dataset. Section III presents related work in the field of anomaly detection and the use of the latent space in the autoencoder framework. Section IV describes the methodology used in this study, including the VAE architecture and the anomaly detection methods. Section V details how the β-VAE is exploited for classification. Section VI presents the experimental results and discusses the performance of the proposed methods. Finally, Section VII concludes the paper and outlines directions for future work.

II. DEFINITIONS

A. β-VAE Model

The β-VAE architecture is composed of an encoder q that maps the input data x to the latent space z and a decoder p that reconstructs the initial data from the latent representation.
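To make the encoder/decoder interplay concrete, here is a minimal NumPy sketch of the two quantities such a model is trained on: a reconstruction term and the closed-form KL divergence of a diagonal Gaussian posterior from the standard normal prior. The function names and the MSE stand-in for the log-likelihood term are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian
    posterior with mean mu and log-variance log_var, per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def beta_vae_loss(x, x_hat, mu, log_var, beta):
    """Negative ELBO: a reconstruction term (here an MSE stand-in for the
    negative log-likelihood) plus the beta-weighted KL regularizer."""
    l_rec = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    l_kl = np.mean(kl_to_standard_normal(mu, log_var))
    return l_rec + beta * l_kl

# Sanity check: a posterior equal to the prior contributes zero KL.
mu = np.zeros((4, 8))       # batch of 4 samples, 8-dimensional latent space
log_var = np.zeros((4, 8))  # sigma = 1
assert np.allclose(kl_to_standard_normal(mu, log_var), 0.0)
```

Raising `beta` increases the pull of the posterior toward the prior, which is exactly the regularization/reconstruction trade-off discussed below.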
The original training objective of a VAE is to maximize the evidence lower bound (ELBO), which for a β-VAE becomes

$\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \beta D_{KL}(q_\phi(z|x) \,\|\, p(z))$   (1)

where θ and ϕ are the parameters of the decoder and encoder networks, respectively, and z is the latent variable; q_ϕ(z|x) is the approximate posterior distribution of z given input x, p_θ(x|z) is the data likelihood given the latent variable, and D_KL is the Kullback–Leibler divergence [3] between the approximate posterior and the prior distribution p(z). The β term controls the balance between latent space regularization and reconstruction fidelity. Higher values of β enforce greater disentanglement but may compromise reconstruction accuracy.

In practice, the model is trained with a stochastic gradient descent algorithm by minimizing the negative ELBO, which is written as

$-\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] + \beta D_{KL}(q_\phi(z|x) \,\|\, p(z))$   (2)

TABLE I
CATEGORIES OF ATTACKS IN NSL-KDD

DoS: neptune, smurf, back, teardrop, pod, land, apache2, mailbomb, processtable, udpstorm, worm
Probe: ipsweep, nmap, portsweep, satan, mscan, saint
U2R: buffer_overflow, loadmodule, perl, rootkit, httptunnel, ps, sqlattack, xterm
R2L: ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, warezmaster, snmpgetattack, snmpguess, xlock, xsnoop

A reparameterization trick [4] is applied to sample z from the latent distribution q_ϕ(z|x), allowing gradients to be backpropagated through the stochastic sampling process.

In the rest of this study the reconstruction error will be denoted as L_rec = −E_{q_ϕ(z|x)}[log p_θ(x|z)] and the KL divergence as L_KL = D_KL(q_ϕ(z|x) ‖ N(0, I)), where N(0, I) is the standard normal distribution.

B. NSL-KDD Dataset
The NSL-KDD dataset is a benchmark dataset for evaluating the performance of intrusion detection systems. It is a modified version of the KDD Cup 1999 dataset, which contains network traffic data. The NSL-KDD dataset consists of 41 features, including 34 continuous and 7 categorical features. The dataset contains five classes of network traffic: normal, denial of service (DoS), probe, user-to-root (U2R), and remote-to-local (R2L). Table I shows the categories of attacks in the NSL-KDD dataset [1].

III. RELATED WORK

Anomaly detection in unsupervised settings has been the subject of numerous approaches based on autoencoders (AE) and their variants, with some recent studies focusing on leveraging the latent space. Previous works on unsupervised anomaly detection have explored various methods to handle high-dimensional or highly nonlinear data. Already in 2007, [5] proposed a fault detection method for industrial processes based on the k-nearest neighbors (k-NN) rule, using only data from normal operation. This approach addresses the absence of anomalous training data by modeling the distribution of distances between normal samples and their nearest neighbors. Anomalies are then identified as samples whose distance exceeds a threshold derived from this distribution. In this work, the authors clearly explain the principle that would later be revisited in works exploiting approaches that generate richer spaces, such as the latent spaces of AEs or their variants.

Hybrid models combining AEs or their variants with neighborhood techniques have been developed to enhance anomaly detection on high-dimensional data [6], [7], [8]. These methods highlighted the advantage of nonlinear representations while leveraging distance measures in the latent space.
Other works focusing on industrial applications and monitoring systems have also highlighted the interest of this approach, demonstrating that combining reconstruction error with latent space distance analysis can yield competitive or even superior performance compared to traditional anomaly detection methods [9], [10].

Subsequently, works introduced models based on Variational Autoencoders (VAE), and more specifically β-VAE, to achieve a more structured and interpretable organization of the latent space. These studies suggest that considering the distribution of latent variables can contribute to finer anomaly detection, whether through reconstruction error measures or distances in the latent space [11]. In a similar context, a method aimed at estimating a confidence measure through the exploitation of projections in the latent space and the Mahalanobis distance has been presented in [12] to enhance intrusion detection on datasets like NSL-KDD.

Other contributions have sought to constrain the latent space to promote the emergence of clusters with similar behaviors. For example, [13] limits the reconstruction capacity of AEs during training using an additional constraint that acts as a regularization on the latent space. In [14], compact clustering methods in the latent space were developed in a semi-supervised framework, allowing projections of samples with the same label to be grouped, attracting unlabeled projections in the space and thus better identifying deviations. These techniques illustrate the interest of latent structure for separating normal data from anomalies.

The work proposed in this article aligns with the previously mentioned studies. We leverage the structure of the latent space of a β-VAE for anomaly detection in network traffic, using the NSL-KDD dataset.
We formalize and compare the two methods (reconstruction error and distance in the latent space) for anomaly detection, highlighting their respective advantages and disadvantages. We show that exploiting the latent space yields results comparable to those of reconstruction error, while providing better interpretability of the results. We also observe the impact of certain parameters on the performance of both methods, particularly β and k, the number of neighbors considered for calculating the distance in the latent space.

IV. METHODOLOGY

In this study, we focus on the binary classification task of detecting normal traffic and anomalies in an unsupervised setting; in the analysis, we also present results per attack class and category.

A. Data Preprocessing

We adopt an unsupervised learning approach, using only the labeled normal data from the training dataset. We have therefore re-divided NSL-KDD, presented in Section II-B, into 3 parts:

• The anomalous dataset, which includes all attack data from the training and test datasets. This dataset is named X_attack.
• The training dataset, which includes only the labeled normal data from the training dataset. This dataset is named X_train.
• The test dataset, which includes only the labeled normal data from the test dataset. This dataset is named X_test.

The NSL-KDD dataset includes a mix of categorical, boolean, and continuous features. To prepare the data for training the β-VAE model, we first applied one-hot encoding to the 3 categorical features (protocol_type, service, and flag), converting them into binary vectors. The 4 boolean features (land, logged_in, is_guest_login, and is_host_login) were encoded as binary values (0 or 1). The remaining 33 features are continuous and were standardized using the mean and standard deviation computed from the training set X_train, ensuring all features operate on a comparable scale, which is essential for stable model training.
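As an illustration of the preprocessing just described, the following NumPy sketch one-hot encodes a categorical column and standardizes continuous columns with statistics from the training split only. The helper names are ours, and the small protocol list is only a sample of NSL-KDD's protocol_type values.

```python
import numpy as np

def one_hot(values, categories):
    """One-hot encode a 1-D array of categorical values against a fixed
    category list (here, a sample of NSL-KDD protocol_type values)."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out

def standardize(train_cont, other_cont):
    """Scale continuous features with the mean/std of the training split
    only, mirroring how X_train statistics are reused on the other sets."""
    mu = train_cont.mean(axis=0)
    sigma = train_cont.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant columns
    return (train_cont - mu) / sigma, (other_cont - mu) / sigma

protocols = ["tcp", "udp", "icmp"]
encoded = one_hot(np.array(["udp", "tcp"]), protocols)  # shape (2, 3)
```

Computing the scaling statistics on X_train alone avoids leaking information from the test or attack sets into the model.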
No feature selection was performed; we retained all features except for the difficulty attribute, which is not relevant to our study. Class labels were excluded during training to maintain an unsupervised learning setting.

B. Model Architecture

We use a β-VAE architecture as described in Section II-A. The encoder and decoder networks each comprise three fully connected hidden layers: 64, 32, and 16 neurons for the encoder, and 16, 32, and 64 neurons for the decoder. The encoder outputs the mean and log variance through two separate fully connected layers with 8 neurons each, which represent the parameters of the Gaussian distribution in this 8-dimensional latent space. The decoder takes the sampled latent variable and reconstructs the input data. For the stochastic gradient descent algorithm, we use the Adam optimizer [15] with a learning rate of 0.001 and a batch size of 2048.

During reconstruction, the total reconstruction loss is computed as a linear combination of three loss functions, each tailored to the nature of the feature:

• Categorical features: softmax activation with categorical cross-entropy loss

$\mathcal{L}_{cat} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} x_{ij} \log(\hat{x}_{ij})$   (3)

where n is the number of samples, m is the number of categories for a given feature, x_ij is a binary indicator (0 or 1) that the i-th sample belongs to category j, and x̂_ij is the predicted probability for category j in sample i.

• Boolean features: sigmoid activation with binary cross-entropy loss

$\mathcal{L}_{bool} = -\frac{1}{n}\sum_{i=1}^{n}\left[x_i \log(\hat{x}_i) + (1 - x_i)\log(1 - \hat{x}_i)\right]$   (4)

where x_i is the true binary value and x̂_i is the predicted probability for the i-th sample.

• Continuous features: linear activation with mean squared error (MSE)

$\mathcal{L}_{cont} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{x}_i)^2$   (5)

where x_i and x̂_i are the true and reconstructed continuous values for the i-th sample.
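The three per-feature reconstruction terms above can be sketched directly in NumPy. The small epsilon inside the logarithms is our numerical guard, not part of the paper's formulation.

```python
import numpy as np

EPS = 1e-9  # numerical guard inside the logs (our addition)

def l_cat(x, x_hat):
    """Eq. (3): categorical cross-entropy over one-hot targets x and
    softmax probabilities x_hat, averaged over the n samples."""
    return float(-np.mean(np.sum(x * np.log(x_hat + EPS), axis=1)))

def l_bool(x, x_hat):
    """Eq. (4): binary cross-entropy between boolean targets and
    sigmoid outputs."""
    return float(-np.mean(x * np.log(x_hat + EPS)
                          + (1 - x) * np.log(1 - x_hat + EPS)))

def l_cont(x, x_hat):
    """Eq. (5): mean squared error for standardized continuous features."""
    return float(np.mean((x - x_hat) ** 2))
```

A perfect reconstruction drives all three terms to (numerically) zero, which is the behavior expected of normal traffic after training.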
The total reconstruction loss L_rec is defined as a linear combination of the three components

$\mathcal{L}_{rec} = \mathcal{L}_{cat} + \mathcal{L}_{bool} + \mathcal{L}_{cont}$   (6)

The β-VAE loss L is then the combination of L_rec and the KL divergence term, as defined in Equation (2) and weighted by the β parameter

$\mathcal{L} = \mathcal{L}_{rec} + \beta \mathcal{L}_{KL}$   (7)

V. β-VAE EXPLOITATION FOR CLASSIFICATION

Anomaly detection can be approached in two distinct ways within the framework of our β-VAE model: through reconstruction error or by analyzing the latent space. Each of these methods allows classifying data as anomalous or normal, but based on different criteria. In both cases, we evaluate performance using the false positive rate (FPR) and true positive rate (TPR) for different thresholds, with the Area Under the Receiver Operating Characteristic curve (AUROC) as the performance metric.

A. Anomaly detection based on reconstruction error

The first anomaly detection approach relies on reconstruction error, a classic method in unsupervised learning. After training the β-VAE model, each data point from the set X_attack ∪ X_test is projected into the latent space using the encoder z ∼ q_ϕ(z|x), and then reconstructed by the decoder x̂ = p_θ(x|z). The goal is to quantify the difference between the original data x and its reconstruction x̂ from the latent space. This difference is measured by the reconstruction error L_rec presented in Section IV-B.

Once the reconstruction error is calculated, a threshold is set to distinguish normal data from anomalous data. Data points for which the error exceeds this threshold are considered anomalous, while those with an error below the threshold are classified as normal. This detection approach is named L_rec-classification in the rest of this work. Algorithm 1 implements L_rec-classification.

B. Anomaly detection based on latent space

The second approach involves leveraging the latent space of the β-VAE model to detect anomalies.
The idea is to project the normal data from the training set X_train into the latent space using the encoder q_ϕ of the β-VAE model. This approach will be referred to as Z_k-classification, where k is an integer representing the number of neighbors to consider for calculating the average Euclidean distance.

Algorithm 1 L_rec-classification
Require: x, a sample to classify; (q_ϕ, p_θ), a trained β-VAE; τ, the threshold
Ensure: y, classification label: normal or anomaly
1: z ∼ q_ϕ(z|x)  ▷ Encoder
2: x̂ ← p_θ(z)  ▷ Decoder
3: if L_rec(x, x̂) > τ then
4:   y ← anomaly
5: else
6:   y ← normal
7: end if
8: return y

We denote Z_k^X(x) the average Euclidean distance between z (the projection of x) and the k nearest neighbors among the projections of X. This average is calculated using formula (8):

$Z_k^X(x) = \frac{1}{k}\sum_{j=1}^{k} \| z - z'_{(j)} \|_2$   (8)

with z ∼ q_ϕ(z|x) and z'_(j) the j-th nearest neighbor of z in the set of projections of X.

Similarly to the reconstruction error-based method, if the average distance exceeds a threshold, the data point is considered anomalous; otherwise, it is classified as normal. Algorithm 2 implements Z_k-classification.

Algorithm 2 Z_k-classification
Require: x, a sample to classify; (q_ϕ, p_θ), a trained β-VAE; X_train, the training dataset; k, the number of neighbors; τ, the threshold
Ensure: y, classification label: normal or anomaly
1: Z_train ← { z_i ∼ q_ϕ(z|x_i), ∀ x_i ∈ X_train }
2: z ∼ q_ϕ(z|x)
3: Find the k nearest neighbors z'_(1), …, z'_(k) of z in Z_train
4: if Z_k^{X_train}(x) > τ then
5:   y ← anomaly
6: else
7:   y ← normal
8: end if
9: return y

VI. EXPERIMENTAL RESULTS

First, we present the performance of the two methods, L_rec-classification and Z_k-classification, on the binary classification of normal versus anomalous traffic. As stated in Section V, we evaluate performance using AUROC.
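Both decision rules above reduce to simple score-and-threshold tests. The NumPy sketch below illustrates them with `encode`/`decode` callables standing in for the trained β-VAE networks and a toy latent cluster in place of the real Z_train; all names and the toy data are illustrative.

```python
import numpy as np

def l_rec_classify(x, encode, decode, rec_error, tau):
    """Algorithm 1 sketch: flag x as an anomaly when its reconstruction
    error exceeds the threshold tau."""
    x_hat = decode(encode(x))
    return "anomaly" if rec_error(x, x_hat) > tau else "normal"

def z_k_score(z, z_train, k):
    """Eq. (8): mean Euclidean distance from z to its k nearest
    neighbors among the training projections."""
    d = np.linalg.norm(z_train - z, axis=1)
    return float(np.mean(np.sort(d)[:k]))

def z_k_classify(z, z_train, k, tau):
    """Algorithm 2 sketch: threshold the average k-NN latent distance."""
    return "anomaly" if z_k_score(z, z_train, k) > tau else "normal"

# Toy demonstration: an identity "model" and a tight latent cluster.
mse = lambda a, b: float(np.mean((a - b) ** 2))
rng = np.random.default_rng(0)
z_train = rng.normal(scale=0.1, size=(500, 8))  # stand-in for Z_train

lbl_rec = l_rec_classify(np.ones(5), lambda x: x, lambda z: z, mse, tau=0.1)
lbl_near = z_k_classify(np.zeros(8), z_train, k=5, tau=1.0)
lbl_far = z_k_classify(10 * np.ones(8), z_train, k=5, tau=1.0)
# lbl_rec == "normal"; lbl_near == "normal"; lbl_far == "anomaly"
```

In practice Z_train is large, so a k-d tree or similar index would replace the brute-force sort.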
To assess stability, we ran both methods over four runs with different seeds. The AUROC is computed for each run and then averaged over the four runs.

Figure 1 and Table II show the mean AUROC across β and k, averaged over four runs. We tested Z_k-classification with k values of 1, 100, 150, 200, 250, 300, 400, 500, 1000, 2000, 3000, 4000, and 5000. The β parameter was tested with values of 0, 0.00001, 0.0001, 0.001, 0.01, 0.1, and 0.5.

In Table II, bold values represent the best mean result per β. Underlined values indicate cases where the mean AUROC with Z_k-classification outperforms L_rec-classification for a specific β. In general, increasing k improves the AUROC of the Z_k-classification method. Results show that Z_k-classification can outperform L_rec-classification in some cases with large values of k. For L_rec-classification, the best mean AUROC is achieved with β = 0, and this method appears relatively insensitive to β; its AUROC ranges from 0.962 to 0.968. With this β setting, L_rec-classification is outperformed by Z_k-classification for k ≥ 200. For Z_k-classification, the best mean AUROC is obtained with β = 10⁻⁵ and k = 5000.

Fig. 1. Mean AUROC of L_rec-classification and Z_k-classification for varying β and k.

For the rest of the results, we focus on a model trained with β = 10⁻⁵ and k = 5000. The performance of this model is highlighted in Figures 2 and 3 as ROC curve analyses, and in Figure 4, which shows the distribution of data classified by both methods.
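The AUROC values reported here can be computed without an explicit threshold sweep: AUROC equals the probability that an anomalous sample receives a higher anomaly score than a normal one (the Mann–Whitney U statistic). A small sketch with hypothetical score arrays:

```python
import numpy as np

def auroc(scores_normal, scores_anomalous):
    """AUROC as the probability that an anomalous sample scores higher
    than a normal one (ties counted half); equivalent to the area under
    the TPR/FPR curve swept over all thresholds."""
    wins = 0.0
    for a in scores_anomalous:
        wins += np.sum(a > scores_normal) + 0.5 * np.sum(a == scores_normal)
    return float(wins / (len(scores_normal) * len(scores_anomalous)))

# Perfectly separated scores yield AUROC = 1.0; reversed scores yield 0.0.
assert auroc(np.array([0.1, 0.2]), np.array([0.8, 0.9])) == 1.0
assert auroc(np.array([0.8, 0.9]), np.array([0.1, 0.2])) == 0.0
```

The same function applies to either score, reconstruction error or latent distance, since both assign higher values to suspected anomalies.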
TABLE II
MEAN AUROC (%) OF L_rec-CLASSIFICATION AND Z_k-CLASSIFICATION FOR VARYING β AND k

β       | Z_1   | Z_100 | Z_150 | Z_200 | Z_250 | Z_300 | Z_400 | Z_500 | Z_1000 | Z_2000 | Z_3000 | Z_4000 | Z_5000 | L_rec
0       | 94.11 | 96.43 | 96.63 | 96.76 | 96.89 | 96.99 | 97.09 | 97.14 | 97.40  | 97.48  | 97.56  | 97.66  | 97.70  | 96.78
0.00001 | 94.49 | 96.79 | 97.03 | 97.16 | 97.25 | 97.32 | 97.46 | 97.52 | 97.68  | 97.75  | 97.81  | 97.87  | 97.90  | 96.23
0.0001  | 94.28 | 96.60 | 96.76 | 96.96 | 97.12 | 97.19 | 97.26 | 97.29 | 97.52  | 97.65  | 97.70  | 97.73  | 97.73  | 96.52
0.001   | 93.51 | 95.81 | 96.20 | 96.47 | 96.66 | 96.71 | 96.66 | 96.58 | 96.69  | 96.80  | 96.82  | 96.85  | 96.86  | 96.61
0.01    | 93.47 | 96.16 | 96.37 | 96.48 | 96.57 | 96.64 | 96.71 | 96.76 | 96.96  | 96.85  | 96.86  | 96.85  | 96.82  | 96.44
0.1     | 91.05 | 93.52 | 93.85 | 94.11 | 94.31 | 94.46 | 94.64 | 94.76 | 95.14  | 95.32  | 95.35  | 95.35  | 95.32  | 96.48
0.5     | 75.08 | 84.26 | 85.28 | 86.01 | 86.59 | 87.06 | 87.81 | 88.37 | 89.88  | 90.93  | 91.35  | 91.56  | 91.67  | 96.28

Fig. 2. ROC curves for the binary classification task with Z_5000-classification (AUC = 0.9824) and L_rec-classification (AUC = 0.9657).

Figure 2 shows the ROC curves for the two methods on the binary classification task.

Fig. 3. ROC curves on X_test and X_attack with L_rec-classification and Z_5000-classification, per attack class: DoS (AUC = 0.9835 vs. 0.9665), Probe (AUC = 0.9820 vs. 0.9653), U2R (AUC = 0.9809 vs. 0.9704), and R2L (AUC = 0.9676 vs. 0.9551), where each pair gives the Z_5000 and L_rec values.

Figure 3 shows the ROC curves for the two anomaly detection methods per attack class (Probe, DoS, U2R, and R2L) described in Section II-B. Some attack classes are more difficult to detect than others.

Fig. 4. Distribution of Z_5000-classification and L_rec-classification on X_test and X_attack. Blue points are classified as normal, purple as Probe, orange as DoS, green as U2R, and red as R2L. The distribution of each category is represented as a density on the opposing axes.

Figure 4 shows the distribution of data classified by both methods. Both approaches achieve excellent results: the normal distribution is clearly separated from the attack distribution in both methods. We can also see that certain normal data are sometimes well classified by one method and not the other. This suggests the two methods are not redundant and can be complementary. In fact, it is possible to implement an adaptive thresholding mechanism that considers both methods to classify the data.

VII. CONCLUSION AND PERSPECTIVES

This work studied unsupervised anomaly detection on NSL-KDD with a β-VAE, by comparing two decision signals based on different principles: reconstruction-based scoring (L_rec-classification) and latent space distance-based scoring computed as the mean Euclidean distance to the k nearest neighbors (Z_k-classification). We showed that the latent distance can match or surpass reconstruction error depending on β and k, while the two signals remain complementary for some samples.

The latent-space method enables incremental learning. Because decisions rely on reference embeddings, the model can be updated online by appending new normal and labeled abnormal projections without retraining the β-VAE. This makes it possible to adapt to evolving operating conditions and to progress from anomaly detection to behavior classification: clusters of latent patterns corresponding to distinct operating modes can be tracked and labeled over time, enabling fine-grained classification of behaviors in addition to binary anomaly flags.

Beyond Euclidean distance, we can also consider a Mahalanobis score in latent space.
Rather than sampling z from q_ϕ(z|x), we can use the encoder output as a deterministic embedding, the mean μ(x), and evaluate a Mahalanobis distance to the normal reference statistics estimated on Z_train [16]. The advantage of the Mahalanobis distance over the Euclidean distance is that it takes into account the covariance structure of the data, which can be particularly useful in high-dimensional spaces where features may be correlated, as discussed in [12].

Future work will focus on fusing reconstruction-based and latent-based scores via calibrated or learned aggregation; implementing and benchmarking Mahalanobis-based detectors in the latent space; broadening the evaluation to diverse datasets and operating conditions; and leveraging incremental learning to track latent clusters and enable behavior-aware intrusion analysis.

REFERENCES

[1] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Jul. 2009, pp. 1–6.
[2] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, "beta-VAE: Learning basic visual concepts with a constrained variational framework," in International Conference on Learning Representations, 2017.
[3] S. Kullback and R. A. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
[4] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," 2022.
[5] Q. He and J. Wang, "Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes," IEEE Transactions on Semiconductor Manufacturing, vol. 20, no. 4, pp. 345–354, 2007.
[6] H. Song, Z. Jiang, A. Men, and B. Yang, "A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data," Computational Intelligence and Neuroscience, vol. 2017, no. 1, p. 8501683, 2017.
[7] J. Guo, G. Liu, Y. Zuo, and J. Wu, "An Anomaly Detection Framework Based on Autoencoder and Nearest Neighbor," in 2018 15th International Conference on Service Systems and Service Management (ICSSSM), Jul. 2018, pp. 1–6.
[8] F. Angiulli, F. Fassetti, and L. Ferragina, "LatentOut: An unsupervised deep anomaly detection approach exploiting latent space distribution," Machine Learning, vol. 112, no. 11, pp. 4323–4349, Nov. 2023.
[9] Z. Zhang, T. Jiang, S. Li, and Y. Yang, "Automated feature learning for nonlinear process monitoring – An approach using stacked denoising autoencoder and k-nearest neighbor rule," Journal of Process Control, vol. 64, pp. 49–61, Apr. 2018.
[10] R. Corizzo, M. Ceci, and N. Japkowicz, "Anomaly Detection and Repair for Accurate Predictions in Geo-distributed Big Data," Big Data Research, vol. 16, pp. 18–35, Jul. 2019.
[11] S. Ramakrishna, Z. Rahiminasab, G. Karsai, A. Easwaran, and A. Dubey, "Efficient Out-of-Distribution Detection Using Latent Space of β-VAE for Cyber-Physical Systems," ACM Trans. Cyber-Phys. Syst., vol. 6, no. 2, Apr. 2022.
[12] I. Pitsiorlas, G. Arvanitakis, and M. Kountouris, "Trustworthy Intrusion Detection: Confidence Estimation Using Latent Space," 2024 22nd International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), pp. 92–98, 2024.
[13] M. Astrid, M. Z. Zaheer, and S. Lee, "Constricting Normal Latent Space for Anomaly Detection with Normal-only Training Data," in 5th Workshop on Practical ML for Limited/Low Resource Settings, 2024.
[14] K. Kamnitsas, D. Castro, L. L. Folgoc, I. Walker, R. Tanno, D. Rueckert, B. Glocker, A. Criminisi, and A. Nori, "Semi-Supervised Learning via Compact Latent Space Clustering," in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. PMLR, 10–15 Jul 2018, pp. 2459–2468.
[15] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6980
[16] G. J. McLachlan, "Mahalanobis distance," Resonance, vol. 4, no. 6, pp. 20–26, 1999.
