Omni SCADA Intrusion Detection Using Deep Learning Algorithms

Jun Gao, Luyun Gan, Fabiola Buschendorf, Liao Zhang, Hua Liu, Peixue Li, Xiaodai Dong and Tao Lu

Abstract—We investigate deep-learning-based omni intrusion detection systems (IDS) for supervisory control and data acquisition (SCADA) networks that are capable of detecting both temporally uncorrelated and correlated attacks. Among the IDSs developed in this paper, a feedforward neural network (FNN) can detect temporally uncorrelated attacks at an F1 of 99.967 ± 0.005% but correlated attacks at as low as 58 ± 2%. In contrast, long short-term memory (LSTM) detects correlated attacks at 99.56 ± 0.01% and uncorrelated attacks at 99.3 ± 0.1%. Combining LSTM and FNN through an ensemble approach further improves the IDS performance, with an F1 of 99.68 ± 0.04% regardless of the temporal correlations among the data packets.

Index Terms—Feedforward Neural Networks, Multilayer Perceptron, Intrusion detection, Network security, SCADA systems, Supervised learning, LSTM, IDS, Modbus, Denial of Service (DoS).

I. INTRODUCTION

SUPERVISORY control and data acquisition (SCADA) is a well-established industrial system used to automate and monitor processes and to gather data from remote or local equipment such as programmable logic controllers (PLC), remote terminal units (RTU) and human-machine interfaces (HMI). SCADA became popular in the 1960s for power plants, water treatment [1], oil pipelines [2], and similar facilities, which were usually disconnected from the Internet and made use of hardware devices running proprietary protocols. The network was secured from harmful attacks by its obscurity, so security measures were barely implemented. However, as more and more SCADA systems adopt the Modbus protocol over TCP and become accessible via the Internet, they are vulnerable to cyberattacks. In 2010, Stuxnet [3] spread over the world and damaged Iranian nuclear power plants.
Since then, the need for industrial network security has become urgent. To safeguard SCADA networks, an intrusion detection system (IDS) needs to be implemented. An IDS can be signature-based or anomaly-based. Traditionally, signature-based IDS has been the mainstream approach to detecting SCADA attacks. It identifies specific patterns in traffic data to detect malicious activities and can be implemented as policy rules in IDS software such as Snort [4], [5]. Ref. [6] investigates a set of attacks against Modbus and designs rules to detect them. Ref. [7] proposes a state-relation-based IDS (SRID) to increase the accuracy and decrease the false negative rate of denial-of-service (DoS) detection. However, these detection methods are complicated and only valid for specific scenarios. Overall, as found in previous research, signature-based IDS is only efficient at finding known attacks, and its performance relies heavily on experts' knowledge and experience. An anomaly-based IDS [8] overcomes these challenges by introducing machine learning to identify attack patterns from data.

Manuscript received June 14, 2019; revised XY, 2019. This work is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (Grant No. RGPIN-2015-06515), the Mitacs Globalink program, and an Nvidia Corporation TITAN-X GPU grant. (Corresponding author: Tao Lu)
J. Gao, L. Gan, L. Zhang, X. Dong and T. Lu are with the Department of Electrical and Computer Engineering, University of Victoria, EOW 448, 3800 Finnerty Rd., Victoria, British Columbia, V8P 5C2, Canada (e-mail: {jungao,luyun,liao,xdong,taolu}@uvic.ca).
F. Buschendorf was with the Department of Computer Science, University of Goettingen, Germany (e-mail: fabiola.buschendorf@protonmail.com).
H. Liu and P. Li are with Fortinet Technology Inc., 899 Kifer Road, Sunnyvale, California 94086, USA (e-mail: {hualiu,pxli}@fortinet.com).
Anomaly-based detection is also widely used in other applications such as mobile data misuse detection [9], software security [10] and wireless sensor security [11]. Several machine learning algorithms have been proposed for anomaly-based IDS. Linda et al. [12] tailored a neural network model with error back-propagation and Levenberg-Marquardt learning rules in their IDS. Rrushi and Kang [13] combined logistic regression and maximum likelihood estimation to detect anomalies in process control networks. Poojitha et al. [14] trained a feedforward neural network (FNN) to classify intrusions on the KDD99 dataset and an industrial control system dataset. Zhang et al. [15] used a support vector machine and an artificial immune system to identify malicious network traffic in the smart grid. Maglaras and Jiang [16] developed a one-class support vector machine module to train on network traces off-line and detect intrusions on-line. All these machine learning algorithms are excellent at recognizing attack patterns in the in-packet features. None of them, however, takes into account the temporal relations between packets, and thus they will not perform well on attacks such as DoS, which have strong temporal dependence. DoS attacks are among the most popular attacks used to slow down or even crash SCADA networks. Most devices in SCADA operate in low-power mode with limited capacity and are vulnerable to DoS [17]. To date, various DoS types, including spoofing [18], flooding and smurfing [19], have been reported. Among all types of DoS, flooding DoS is widely exploited: attackers send a massive number of packets to jam the target network. In [20], the authors exploit a TCP SYN flooding attack against the vulnerability of TCP transmission using the hping DoS attack tool. Flooding DoS, along with other DoS variants, is difficult to detect because the in-packet features extracted from each data packet may not display any suspicious pattern [21].
Similar to DoS, man-in-the-middle (MITM) is another attack that is hard to detect by observing in-packet features; it is more efficiently detected by observing inter-packet patterns in the time domain. Anomaly-based IDS for DoS and MITM has become popular along with the advances of machine learning. For example, in [22], an auto-associative kernel regression (AAKR) coupled with the sequential probability ratio test (SPRT) is implemented to detect DoS. The result is not satisfactory because the regression model does not take the temporal signatures of DoS into consideration. In [23], an FNN is used to classify abnormal packets in SCADA with 85% accuracy for MITM-based random response injection attacks and 90% accuracy for DoS-based random response injection attacks, but only 12% for replay-based attacks. The authors exploit various attacks, including DoS and man-in-the-middle (MITM) attacks, in a testbed built on Modbus/RTU instead of Modbus/TCP. In [24], the authors propose a one-class support vector machine (OCSVM) combined with a k-means clustering method to detect DoS. They set flags on every 10 packets to reflect the time-series relationships, but such handcrafted features may easily be bypassed by expert attackers. To detect temporally correlated attacks such as flooding DoS and MITM, one should capture the temporal anomaly these attacks introduce. However, the above-mentioned IDSs are not designed to extract temporal patterns from packet sequences. A more practical approach is to implement an IDS with the capability of time-series analysis. Recurrent neural networks (RNN) are machine learning models that incorporate the recognition of temporal patterns. Among RNN models, long short-term memory (LSTM) has gained popularity in speech recognition [25], music composition [26] and machine translation [27].
LSTM is designed to predict future events from information in previous time steps and is therefore suitable for detecting attacks with temporal correlation. For example, Ref. [28] applied LSTM to distributed DoS detection with a high success rate. In [29], the authors developed a time-series anomaly detector based on LSTM [30] networks to enhance IDS performance and applied this framework to the dataset in [31]. However, the number of DoS attacks in that dataset is relatively small and the time interval of the DoS attacks is too long, making the detection inefficient. Despite their excellent performance in detecting temporally correlated attacks such as DoS and MITM, RNNs are limited in detecting temporally uncorrelated attacks compared to other machine learning algorithms such as FNN. In this paper, utilizing the advantages of both RNN and FNN while avoiding their disadvantages, we implement an omni IDS that can detect all attacks regardless of their temporal dependence. On a SCADA testbed [17], we demonstrate that our IDS reaches the highest performance against all attacks compared to IDSs that employ RNN or FNN alone.

II. SCADA TESTBED AND DATA SYNTHESIS

Our IDS is tested on a simulated SCADA testbed. A simulated network has the advantage of being easy to maintain, change and operate, and is less costly than a network of real devices. A software testbed that simulates a SCADA industrial network and emulates the attacks was built by L. Zhang [17] on the basis of the work of T. Morris [32]. Several preliminary studies on SCADA security have been conducted on this testbed [33], [34]. The attack target is a simple SCADA network consisting of two tanks using Modbus over TCP. The liquid level of the tanks is controlled by pumps and measured by sensors via Modbus control information. The purpose of this network is to attract hackers and study possible defense methods.
Such a system is called a honeypot, as it fools attackers while exploiting their behaviour. The tank system is developed with the MBLogic HMIBuilder and HMIServer toolkit [35] and was extended by L. Zhang in [17]. The HMI's purpose is to pull data from the sensors and periodically send the desired pump speed to the motor. The back end of the HMI is a PLC, while the front end is a web browser. As the system is simulated, we make use of four virtual machines, as shown in Fig. 1. The SCADA system runs on one Modbus master and several slaves. The HMI is deployed on a virtual host called Nova, so we refer to this host as the Modbus master. To extend the network, additional Modbus slaves such as PLCs are simulated by the HoneyD software [36], providing a more realistic honeypot. The role of a Modbus slave is to process commands from the master by pulling sensory data about the tank system from the PLCs and sending it back to the master.

Fig. 1: Testbed architecture [17]

The data needed to feed the neural network is generated by an attack machine, a virtual host named Kali. Kali is a Debian-derived Linux distribution used for penetration testing and features many attack and defense tools. In addition to the message exchange between the Modbus master (Nova) and its slaves, we can launch normal traffic mixed with various attacks from Kali. A command-line tool, Modpoll [37], is used to send Modbus instructions to the PLC, which controls sensitive tank-system variables. An example Modpoll instruction that sends a pump speed of 5 to the system looks like this:

$ modpoll -0 -r 32210 10.0.0.5 5

The command addresses a simulated PLC with the IP address 10.0.0.5 and a register address that contains either a threshold value (registers 42212-42215), the current pump speed (32210) or the tank levels (42210, 42211) measured by the sensors.
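To make the register layout concrete, the mapping described above can be captured in a small lookup table (a hypothetical Python sketch; the function and variable names are ours, and only the register numbers come from the testbed description):

```python
# Registers of the simulated PLC (names are illustrative;
# the numbers come from the testbed description above).
REGISTER_MAP = {
    32210: "pump speed",
    42210: "tank 1 level",
    42211: "tank 2 level",
}

def describe_write(register: int, value: int) -> str:
    """Describe a Modpoll write to a tank-system register."""
    if register in REGISTER_MAP:
        target = REGISTER_MAP[register]
    elif 42212 <= register <= 42215:
        target = "threshold value"
    else:
        target = "unknown register"
    return f"write {value} to {target} (register {register})"

# The example command above, 'modpoll -0 -r 32210 10.0.0.5 5':
print(describe_write(32210, 5))  # write 5 to pump speed (register 32210)
```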
Modpoll sends Modbus requests with function code 16 to attempt a write action on the specified registers. By modifying the pump speed, an attacker can exceed the allowed tank level and cause serious damage to the system. A script on Kali randomly chooses between these normal and malicious Modbus instructions and launches a Modpoll instruction with another randomly chosen parameter. This ensures the desired distribution of attack and non-attack data. The traffic is recorded by the fourth virtual machine, referred to as the "Defense Wall", which operates in bridge mode and is thus invisible to the attacker. With PyShark we capture the traffic between Nova and the Modbus slaves and between the attacker machine Kali and the PLCs. During this process we can label each packet as malicious or normal.

A. Features extracted from the data packets

In our testbed, we use a self-developed IDS installed on the "Defense Wall" to extract 19 features from each captured data packet:

1) Source IP address;
2) Destination IP address;
3) Source port number;
4) Destination port number;
5) TCP sequence number;
6) Transaction identifier, set by the client to uniquely identify each request;
7) Function code, identifying the Modbus function used;
8) Reference number of the specified register;
9) Modbus register data;
10) Modbus exception code;
11) Time stamp;
12) Relative time;
13) Highest threshold;
14) Lowest threshold;
15) High threshold;
16) Low threshold;
17) Pump speed;
18) Tank 1 water level;
19) Tank 2 water level.

Here, "Relative time" is the time in seconds of a packet relative to the first packet in the same TCP session. To reduce the periodicity of this feature, we reset it to zero when it reaches 3,000 seconds.
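The relative-time wrap described above, together with the per-feature standardization the IDS applies before training, can be sketched as follows (a minimal numpy illustration; the array shapes, the modulo implementation of the reset, and the function names are our assumptions):

```python
import numpy as np

def wrap_relative_time(rel_time, period=3000.0):
    """Reset 'Relative time' to zero when it reaches 3,000 s
    (modelled here as a modulo wrap)."""
    return np.asarray(rel_time, dtype=float) % period

def standardize(features):
    """Re-scale each feature column to zero mean and unit variance."""
    x = np.asarray(features, dtype=float)
    mean, std = x.mean(axis=0), x.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (x - mean) / std
```

For example, `wrap_relative_time(3001.0)` yields 1.0, and standardizing a feature matrix column-wise gives each of the 19 features zero mean and unit variance.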
In our IDS, each feature x in the dataset is scaled according to

x' = (x − x̄) / σ_x     (1)

where x̄ and σ_x are the mean and standard deviation of the original feature x, and x' is the re-scaled feature with zero mean and unit variance.

B. Types of attacks in our datasets

Using our scripts, we created two datasets. As illustrated in Fig. 2, in addition to "Normal" data packets, Dataset I contains attacks that are uncorrelated in the time domain, while Dataset II contains temporally dependent attacks. We have incorporated 10 attacks in our testbed: 7 of them are temporally uncorrelated while the remaining 3 are correlated.

Fig. 2: Data packet type distribution in Datasets I, II and the online script. Types marked with a superscript "*" are temporally correlated attacks.

The temporally uncorrelated attacks include "Pump Speed" (Pump), "Tank 1 Level" (T1), "Tank 2 Level" (T2), "Threshold Highest" (HH), "Threshold Lowest" (LL), "Threshold High" (H) and "Threshold Low" (L); detailed descriptions can be found in [17], [32]. Among the temporally correlated attacks, two types of flooding DoS attacks are included [31]. The first, labelled "Scan flooding" (SCAN), sends massive scan commands, increasing the latency of communications between the HMI and the sensors in SCADA. The second, labelled "Incorrect CRC" (CRC), sends massive packets with incorrect cyclic redundancy checks (CRC) to cause latency at the master. The third temporally correlated attack in this testbed is the "Man-in-the-middle" (MITM) attack, an eavesdropping attack in which the attacker secretly monitors the communication traffic between two parties. Here, the MITM attack is launched by Ettercap [38] using ARP spoofing [39]. One effective way to detect ARP spoofing is to inspect the Media Access Control (MAC) address in layer 2 of the OSI model. However, most network IDSs (NIDS) do not support layer-2 protocols such as ARP.
Even Snort requires an ARP spoof preprocessor [40] to collect MAC address information to detect ARP spoofing. In addition, the victim host of an ARP spoofing attack experiences packet retransmissions, and for SCADA networks packet retransmissions or delays may cause great damage. Therefore, the IDS should raise an alert when it detects either an MITM attack or packet retransmissions. To make the IDS robust in detecting both MITM attacks and packet retransmissions, we remove the MAC address feature, which was used for labeling the MITM attack, from the datasets used to train the neural networks.

In the first stage, the FNN and LSTM IDSs are trained as binary classifiers that only separate attacks from normal traffic, and are tested on the two datasets separately for performance comparison. In the on-line phase, these two IDSs, along with our FNN-LSTM ensemble IDS, are trained as multi-class classifiers on the combined datasets to predict the various types of attacks in normal traffic and are deployed on the testbed. In addition, we implemented a script that launches real-time attacks for online testing. The online script randomly launches normal traffic and temporally uncorrelated and correlated attacks with the ratios shown in Fig. 2 to examine the omni-detection capability of the different IDSs.

III. IDS IMPLEMENTATION

In this paper we implemented three IDSs: a conventional FNN, an LSTM and an FNN-LSTM ensemble IDS. We use Keras [41] to implement TensorFlow [42] based machine learning models, trained with the Adam optimizer [43]. The structure of these IDSs is detailed in the following subsections.

A. FNN IDS

The basic structure of the FNN IDS is illustrated in Fig. 3.

Fig. 3: (a) Schematics of the FNN IDS. (b) Details of each neuron in the FNN.

A typical FNN is formed by an input layer, an output layer and one or more hidden layers in between.
Each layer contains a number of neurons that take the outputs of the previous layer as input and produce outputs for the neurons in the next layer. In our case, the inputs are the scaled and normalized features extracted from the data packets, and the outputs are the predictions of attack and normal events. Mathematically, the FNN can be expressed as:

z^(1) = W^(1) x + b^(1),        h_1 = f_h(z^(1))
z^(2) = W^(2) h_1 + b^(2),      h_2 = f_h(z^(2))
...
z^(N+1) = W^(N+1) h_N + b^(N+1),  ŷ = z^(N+1)     (2)

where N is the number of hidden layers, f_h is the ReLU activation function, and W^(1), ..., W^(N+1), b^(1), ..., b^(N+1) are the parameters to be trained. We use the softmax cross entropy as our loss function:

f_L(ŷ, y) = − Σ_{i=1}^{C} y_i log(f_s(ŷ_i))     (3)

where ŷ is the predicted label and y the ground truth, C is the number of possible classes, y_i and ŷ_i are the actual and predicted labels belonging to class i, and f_s is the softmax function.

B. LSTM IDS

The LSTM is built from a collection of single LSTM cells [31]; the structure of a single cell is shown in Fig. 4a. Each LSTM cell has 3 gates: an input gate, a forget gate and an output gate. The input gate selects useful information and pushes it into the cell; irrelevant information is discarded by the forget gate; the output gate outputs the activation state o_t. A hidden state vector h_t is passed to the next time step. The following equations describe a single LSTM cell:

f_t = σ(W_f x_t + U_f h_{t−1} + b_f)
i_t = σ(W_i x_t + U_i h_{t−1} + b_i)
o_t = σ(W_o x_t + U_o h_{t−1} + b_o)
c_t = f_t ∘ c_{t−1} + i_t ∘ σ_g(W_c x_t + U_c h_{t−1} + b_c)
h_t = o_t ∘ σ_g(c_t)     (4)

where σ_g is the hyperbolic tangent function, σ is the sigmoid function, ∘ denotes the element-wise product, and W, U, b are the weight matrices and biases of the gates. As shown in Fig. 4b, the LSTM IDS includes two LSTM layers with 10 LSTM cells in each layer.
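Eqs. (2)-(4) can be made concrete with a minimal numpy sketch (the shapes, the parameter container and the function names are our assumptions; this illustrates the math and is not the Keras implementation used in the paper):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fnn_forward(x, weights, biases):
    """Eq. (2): ReLU hidden layers followed by a linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]  # y_hat (logits)

def softmax_cross_entropy(y_hat, y):
    """Eq. (3): cross entropy between softmax(y_hat) and one-hot y."""
    p = np.exp(y_hat - y_hat.max())  # numerically stable softmax
    p /= p.sum()
    return -np.sum(y * np.log(p))

def lstm_cell(x_t, h_prev, c_prev, P):
    """Eq. (4): one LSTM time step; P maps gate names to W, U, b."""
    f = sigmoid(P["Wf"] @ x_t + P["Uf"] @ h_prev + P["bf"])  # forget gate
    i = sigmoid(P["Wi"] @ x_t + P["Ui"] @ h_prev + P["bi"])  # input gate
    o = sigmoid(P["Wo"] @ x_t + P["Uo"] @ h_prev + P["bo"])  # output gate
    c = f * c_prev + i * np.tanh(P["Wc"] @ x_t + P["Uc"] @ h_prev + P["bc"])
    h = o * np.tanh(c)  # hidden state passed to the next time step
    return h, c
```

Chaining `lstm_cell` over the packets of one window and feeding the final hidden state to a sigmoid output mirrors, in spirit, how the binary LSTM IDS classifies the last packet of a sequence.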
An activation layer with a sigmoid activation function is placed after the last LSTM layer. The vector {x_1, x_2, ..., x_t} is the input vector containing the features of the packets within t time steps; the dataset is reshaped into this format and fed to the LSTM model. In our model we set t = 10. The loss function of this model is the binary cross entropy and the optimizer is the Adam optimizer [44].

C. FNN-LSTM Ensemble IDS

Our FNN-LSTM ensemble IDS aims to combine the advantages of both FNN and LSTM while avoiding their weaknesses [45]. The schematics of this model are shown in Fig. 5. In this model, the data packet features are fed into the FNN and the LSTM simultaneously to predict attacks as a multi-class classifier. The output labels of both are concatenated as the input of a multilayer perceptron, which, through training, is capable of voting for the best prediction of the data packet under investigation.

Fig. 4: The structure of (a) a single LSTM cell and (b) the LSTM network.
Fig. 5: Ensemble model.

IV. EXPERIMENT AND RESULTS

To demonstrate their capability of detecting attacks with and without temporal correlation, we first implement the FNN and LSTM IDSs to establish references for comparison. At this stage, the IDSs only conduct binary classification to predict whether the data packet under investigation is normal (labeled "0") or an attack (labeled "1"). Consequently, the sigmoid function

σ(z) = e^z / (1 + e^z)     (5)

is selected as the activation function, where z is the output of the previous LSTM layer.

TABLE I: Comparison of temporally-uncorrelated-attacks detection (%)

        Precision        Recall         F1
FNN     99.996 ± 0.006   99.84 ± 0.05   99.92 ± 0.03
LSTM    99.88 ± 0.06     98.7 ± 0.4     99.3 ± 0.1

A. Hyperparameter tuning

Both IDSs are trained on 70% of randomly chosen samples from the two datasets and tested on the remaining 30%, following a 10-fold training/testing procedure, so that the average and standard deviation of the figures of merit (precision, recall and F1) can be used for evaluation. To determine the number of hidden layers needed in our FNN, we computed F1 with 0, 1 and 2 hidden layers, obtaining 99.22%, 99.96% and 99.97%, respectively. Employing 1 hidden layer increases F1 by more than 0.7%, while a second hidden layer yields only a minimal further improvement. Therefore, we select 1 hidden layer in our FNN implementation. In addition, to avoid overfitting, we adopt an early-stopping procedure in the FNN such that the optimization stops once 35 epochs have shown relative loss differences between consecutive epochs of less than 10^-6 [46]. Similarly, the LSTM stops early when a maximum of 3 epochs is reached. In the LSTM implementation, we connect 10 LSTM cells in the input layer, where the features of 10 consecutive data packets are entered into the cells to predict whether the last packet is normal or an attack. In training, we adopt mini-batches with a batch size of 1,000.

B. Detection of temporally uncorrelated attacks

We use Dataset I, described in Section II, to compare the capabilities of FNN and LSTM in detecting temporally uncorrelated attacks. To verify the models, learning curves are plotted in Fig. 6, where the training and testing losses are shown as a function of the number of training samples. The average values and standard deviations after 10-fold training/testing are represented by circle markers and error bars, respectively. With more than 40,000 training samples, the FNN training and testing losses (blue dashed lines) start to converge, while the LSTM (red solid lines) converges at sample sizes larger than 60,000.
Overall, this confirms that the number of samples in Dataset I is sufficient for training and testing our IDS. After the IDSs are trained, we use 30% of the samples in Dataset I for 10-fold testing.

Fig. 6: Learning curves of FNN and LSTM on the temporally-uncorrelated-attacks dataset (Dataset I).

TABLE II: Confusion matrices of temporally-uncorrelated-attacks detection using Dataset I (averaged over 10 trials)

                         Predicted Normal   Predicted Attacks
Actual Normal    FNN     69,845.4           0.6
                 LSTM    69,902.2           22.8
Actual Attacks   FNN     30.7               19,741.3
                 LSTM    241.9              19,448.1

As shown in Tables I and II, on average the FNN mislabels only 0.6 of the 69,846 normal data packets as attacks, and only 30.7 of the 19,771 actual attacks are mislabelled as normal traffic, yielding a precision, recall and F1 of 99.996 ± 0.006%, 99.84 ± 0.05% and 99.92 ± 0.03%. In comparison, the LSTM mislabels 22.8 normal packets as attacks and 241.9 attacks as normal packets, resulting in figures of merit of 99.88 ± 0.06%, 98.7 ± 0.4% and 99.3 ± 0.1%. This comparison demonstrates that the FNN outperforms the LSTM in detecting temporally uncorrelated attacks, where recognition of in-packet feature patterns is critical.

C. Detection of temporally correlated attacks

In this subsection, FNN and LSTM are re-trained and tested on Dataset II to compare their detection of temporally correlated attacks.

TABLE III: Comparison of temporally-correlated-attacks detection (%)

        Precision       Recall         F1
FNN     73 ± 2          49 ± 4         58 ± 2
LSTM    99.60 ± 0.01    99.52 ± 0.02   99.56 ± 0.01

TABLE IV: Confusion matrices of temporally-correlated-attacks detection

                         Predicted Normal   Predicted Attacks
Actual Normal    FNN     28,668.3           5,044.7
                 LSTM    33,504.0           105.0
Actual Attacks   FNN     13,510.4           13,169.6
                 LSTM    128.4              26,652.6

Fig. 7: Learning curves of FNN and LSTM on the temporally-correlated-attacks dataset (Dataset II).

Again, the learning curves in
Fig. 7 show that both the FNN (blue dashed lines) and the LSTM (red solid lines) converge at more than 10,000 training samples, with the LSTM clearly showing a lower testing loss. This confirms that the dataset is sufficient to generalize the IDS models. The performance of each model is compared in Tables III and IV. The FNN is inefficient at detecting temporally correlated attacks, with precision, recall and F1 scores as low as 73 ± 2%, 49 ± 4% and 58 ± 2%, respectively. In particular, 5,044.7 of the 33,713 normal packets are mislabelled as attacks, while 13,510.4 of the 26,680 actual attacks are mislabelled as normal traffic. The poor performance of the FNN is evidently caused by its inability to capture inter-packet features. In contrast, the LSTM delivers outstanding figures of merit of 99.60 ± 0.01%, 99.52 ± 0.02% and 99.56 ± 0.01%, with only 105.0 normal packets mislabelled as attacks and 128.4 attack packets mislabelled as normal traffic. As expected, the LSTM outperforms the FNN in detecting temporally correlated attacks due to its inherent ability to observe data patterns in the time domain.

D. Omni attack detection

Recognizing the complementary strengths of the FNN and LSTM IDSs in detecting temporally uncorrelated and correlated attacks, we combine the advantages of both into an omni attack detector through an ensemble approach. The structure of the FNN-LSTM ensemble is described in Subsection III-C. To implement it, we first remodelled the FNN and LSTM as multi-class classifiers so that different attacks can be distinguished.

TABLE V: Macro-averaged comparison of omni-attack detection (%)

           Precision       Recall         F1
FNN        88 ± 1          89.2 ± 0.8     87.4 ± 0.6
LSTM       99.54 ± 0.03    99.01 ± 0.07   99.27 ± 0.05
Ensemble   99.76 ± 0.05    99.57 ± 0.03   99.68 ± 0.04

Fig. 8: (a) Precision, (b) Recall and (c) F1 of individual attacks in omni-attack detection.
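The figures of merit above follow directly from the confusion-matrix counts. As a sanity check (a small Python sketch using the standard definitions; the function name is ours), plugging the FNN row of Table IV into precision = TP/(TP+FP), recall = TP/(TP+FN) and F1 = 2PR/(P+R) reproduces the values reported in Table III:

```python
def precision_recall_f1(tp, fp, fn):
    """Figures of merit from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# FNN counts from Table IV (10-trial averages, hence fractional):
# 13,169.6 detected attacks, 5,044.7 false alarms, 13,510.4 missed attacks.
p, r, f1 = precision_recall_f1(tp=13169.6, fp=5044.7, fn=13510.4)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.723 0.494 0.587
```

These match the 73 ± 2%, 49 ± 4% and 58 ± 2% reported for the FNN in Table III.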
Datasets I and II are combined and used to train the FNN and LSTM independently. The outputs of both are combined to form the input features of a multilayer perceptron for training. After training, the FNN, LSTM and FNN-LSTM ensemble IDSs are integrated into our SCADA testbed to detect and classify attacks. The traffic is generated online using the script, which produces a pre-determined ratio of normal, temporally correlated and temporally uncorrelated attacks, as described in Fig. 2. To estimate the figures of merit, we evenly divide the predicted labels into 10 portions and compute the average and standard deviation of the macro-averaged precision, recall and F1. As shown in Table V, among the three IDSs the FNN achieves the lowest performance, with macro-averaged figures of merit of 88 ± 1%, 89.2 ± 0.8% and 87.4 ± 0.6%, while the LSTM reaches 99.54 ± 0.03%, 99.01 ± 0.07% and 99.27 ± 0.05%. The FNN-LSTM ensemble IDS outperforms both, with figures of merit of 99.76 ± 0.05%, 99.57 ± 0.03% and 99.68 ± 0.04%. The detailed analysis in Fig. 8 further confirms that the under-performance of the FNN (yellow bars) is due to mislabelling of the temporally correlated attacks (MITM, CRC and SCAN), while the LSTM (red bars) is limited by the temporally uncorrelated attacks ("Pump Speed" (Pump), "Tank 1 Level" (T1), "Threshold High" (H), etc.). Overall, the FNN-LSTM ensemble consistently outperforms both on all types of attacks.

V. CONCLUSION

In this paper we demonstrated that the FNN-LSTM ensemble IDS can detect all types of cyberattacks regardless of their temporal correlation. In contrast, the FNN performs well only on temporally uncorrelated attacks, while the LSTM is relatively weak on uncorrelated attacks. In future research we will further improve our model through field trials.

ACKNOWLEDGMENT

REFERENCES

[1] S. Adepu and A.
Mathur, "An investigation into the response of a water treatment system to cyber attacks," in 17th IEEE International Symposium on High Assurance Systems Engineering (HASE 2016), pp. 141-148, 2016.
[2] H. Huang, W. Zhang, G. Qi, S. Ma, Y. Yang, F. Yan, and P. Chen, "Research on accident inversion and analysis method of the oil and gas pipeline SCADA system," in 2014 Sixth International Conference on Measuring Technology and Mechatronics Automation, pp. 492-496, 2014.
[3] S. Karnouskos, "Stuxnet worm impact on industrial cyber-physical system security," in IEEE 37th Annual Conference of the Industrial Electronics Society (IECON 2011), pp. 4490-4494, 2011.
[4] M. Roesch, "Snort - lightweight intrusion detection for networks," in Proceedings of the 13th USENIX Conference on System Administration (LISA '99), Berkeley, CA, USA: USENIX Association, pp. 229-238, 1999.
[5] M. A. Aydn, A. H. Zaim, and K. G. Ceylan, "A hybrid intrusion detection system design for computer network security," Computers & Electrical Engineering, vol. 35, no. 3, pp. 517-526, 2009.
[6] W. Gao and T. Morris, "On cyber attacks and signature based intrusion detection for modbus based industrial control systems," Journal of Digital Forensics, Security and Law, vol. 9, no. 1, article 3, 2014.
[7] Y. Wang, Z. Xu, J. Zhang, L. Xu, H. Wang, and G. Gu, "SRID: State relation based intrusion detection for false data injection attacks in SCADA," in Computer Security - ESORICS 2014, pp. 401-418, 2014.
[8] V. Jyothsna, V. V. Rama Prasad, and K. Munivara Prasad, "A review of anomaly based intrusion detection systems," International Journal of Computer Applications, vol. 28, pp. 26-35, Aug. 2011.
[9] D. Damopoulos, S. A. Menesidou, G. Kambourakis, M. Papadaki, N. Clarke, and S. Gritzalis, "Evaluation of anomaly-based IDS for mobile devices using machine learning classifiers," Security and Communication Networks, vol. 5, no. 1, pp. 3-14, 2012.
[10] G.
Nascimento and M. Correia, "Anomaly-based intrusion detection in software as a service," in 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 19-24, June 2011.
[11] M. S. Islam and S. A. Rahman, "Anomaly intrusion detection system in wireless sensor networks: security threats and existing approaches," International Journal of Advanced Science and Technology, vol. 36, no. 1, pp. 1-8, 2011.
[12] O. Linda, T. Vollmer, and M. Manic, "Neural network based intrusion detection system for critical infrastructures," in 2009 International Joint Conference on Neural Networks, pp. 1827-1834, 2009.
[13] J. Rrushi and K.-D. Kang, "Detecting anomalies in process control networks," Critical Infrastructure Protection III, pp. 151-165, 2009.
[14] G. Poojitha, K. N. Kumar, and P. J. Reddy, "Intrusion detection using artificial neural network," in 2010 International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1-7, 2010.
[15] Y. Zhang, L. Wang, W. Sun, R. C. Green II, and M. Alam, "Distributed intrusion detection system in a multi-layer network architecture of smart grids," IEEE Transactions on Smart Grid, vol. 2, no. 4, pp. 796-808, 2011.
[16] L. A. Maglaras and J. Jiang, "Intrusion detection in SCADA systems using machine learning techniques," in 2014 Science and Information Conference, London, pp. 626-631, 2014.
[17] L. Zhang, "An implementation of SCADA network security testbed," Master's thesis, University of Victoria, Victoria, BC, 2015.
[18] L. T. Heberlein and M. Bishop, "Attack class: address spoofing," in Proceedings of the 19th National Information Systems Security Conference, 1997.
[19] S.
Kumar , “Smurf-based distributed denial of service (ddos) attack am- plification in internet, ” in Second International Confer ence on Internet Monitoring and Pr otection (ICIMP 2007) , San Jose, CA, pp. 25–25, 2007. [20] B. Chen, N. Pattanaik, A. Goulart, K. Butler-Purry , and D. Kundur , “Implementing attacks for modbus/tcp protocol in a real-time cyber physical system test bed, ” Proceedings - CQR 2015: 2015 IEEE In- ternational W orkshop T echnical Committee on Communications Quality and Reliability , pp. 1–6, 2015. [21] N. Sayegh, A. Chehab, I. H. Elhajj, and A. Kayssi, “Internal security attacks on scada systems, ” in 2013 Third International Conference on Communications and Information T echnology (ICCIT) , pp. 22–27, 2013. [22] D. Y ang, A. Usynin, and J. Hines, “ Anomaly-based intrusion detection for SCADA systems, ” in Pr oceedings of the 5. International T opical Meeting on Nuclear Plant Instrumentation Contr ols, and Human Ma- chine Interface T echnology , vol. 43, no. 47, pp. 797–803, 2006. [23] W . Gao, T . Morris, B. Reaves, and D. Richey , “On scada control system command and response injection and intrusion detection, ” 2010 eCrime Resear chers Summit , Dallas, TX, pp. 1-9, 2010. [24] L. A. Maglaras, J. Jiang, and T . Cruz, “Integrated ocsvm mechanism for intrusion detection in SCAD A systems, ” Electronics Letters , v ol. 50, no. 25, pp. 1935–1936, 2014. [25] A. Graves, A.-r . Mohamed, and G. Hinton, “Speech Recognition with Deep Recurrent Neural Networks, ” arXiv e-prints , p. Mar . 2013. [26] D. Eck and J. Schmidhuber , “ A first look at music composition using lstm recurrent neural networks, ” T echnical Report. Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale, 2002. [27] I. Sutsk ev er, O. Vin yals, and Q. V . Le, “Sequence to Sequence Learning with Neural Networks, ” arXiv e-prints , p. arXi v:1409.3215, Sep. 2014. [28] C. D. McDermott, F . Majdani, and A. V . 
Petrovski, “Botnet detection in the internet of things using deep learning approaches, ” in 2018 International Joint Conference on Neural Networks (IJCNN) , Rio de Janeiro, pp. 1–8, 2018. [29] C. Feng, T . Li, and D. Chana, “Multi-level anomaly detection in industrial control systems via package signatures and lstm networks, ” in 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) , Den ver, CO, pp. 261–272, 2017. [30] S. Hochreiter and J. Schmidhuber, “Long short-term memory , ” Neural Comput. , vol. 9, no. 8, pp. 1735–1780, Nov . 1997. [31] T . Morris and W . Gao, “Industrial control system traffic data sets for intrusion detection research, ” in Critical Infrastructur e Pr otection VIII , J. Butts and S. Shenoi, Eds. ICCIP 2014. IFIP Advances in Information and Communication T echnology , v ol 441. pp 65–78,2014. [32] T . Morris, A. Sriv astava, B. Reaves, W . Gao, K. Pavurapu, and R. Reddi, “ A control system testbed to validate critical infrastructure protection concepts, ” International Journal of Critical Infrastructure Pr otection , vol. 4, no. 2, pp. 88 – 103, 2011. [33] H. W ang, T . Lu, X. Dong, P . Li, and M. Xie, “Hierarchical online intrusion detection for scada networks, ” arXiv , p. 1611.09418, 2016. [34] S. Patel, “IEC-61850 Protocol Analysis and Online Intrusion Detection System for SCAD A Networks using Machine Learning, ” Master’ s thesis, Univ ersity of Victoria, V ictoria, BC, 2017. [35] MBLogic. Mblogic homepage. [Online]. A vailable: http://mblogic. sourceforge.net/inde x.html [36] N. Pro vos. Honeyd. [Online]. A v ailable: http://www .honeyd.org/ [37] ProconX Pty Ltd. Modpoll modb us master simulator . [Online]. A vailable: http://www .modbusdri ver .com/modpoll.html [38] Ettercap, a comprehensive suite for man in the middle attacks. [Online]. A vailable: http://openmaniak.com/ettercap.php [39] Z. Trabelsi and W . 
El-Hajj, “ Arp spoofing: A comparative study for education purposes, ” in 2009 Information Security Curriculum Development Confer ence , ser . InfoSecCD ’09. New Y ork, NY , USA: A CM, pp. 60–66, 2009. [40] Snort ARP spoof preprocessor. [Online]. A vailable: http: //manual- snort- org.s3- website- us- east- 1.amazona ws.com/node17.html [41] F . Chollet et al. , “Keras, ” https://keras.io, 2015. [42] M. Abadi, et. al. “T ensorflow: A system for large-scale machine learning, ” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) , pp. 265–283, 2016. [43] D. Kingma and J. Ba, “ Adam: A method for stochastic optimization, ” arXiv preprint arXiv:1412.6980 , 2014. [44] D. P . Kingma and J. Ba, “ Adam: A method for stochastic optimization, ” CoRR , vol. abs/1412.6980, 2014. [45] T . G. Dietterich, “Ensemble methods in machine learning, ” in Pr oceedings of the F irst International W orkshop on Multiple Classifier Systems , ser . MCS ’00. London, UK, UK: Springer -V erlag, pp. 1–15, 2000. [46] L. Prechelt, “Early stopping-but when?” in Neural Networks: T ricks of the Tr ade . London, UK, UK: Springer-V erlag, pp. 55–69, 2012.