Driver Identification Based on Vehicle Telematics Data using LSTM-Recurrent Neural Network

Abenezer Girma, Student Member, IEEE, Xuyang Yan and Abdollah Homaifar
Autonomous Control and Information Technology (ACIT) Institute
Department of Electrical and Computer Engineering, North Carolina A&T State University
Email: aggirma@aggies.ncat.edu, xyan@aggies.ncat.edu, homaifar@ncat.edu

Abstract—Despite advancements in vehicle security systems, over the last decade auto-theft rates have increased, and cyber-security attacks on internet-connected and autonomous vehicles are becoming a new threat. In this paper, a deep learning model is proposed that can identify drivers from their driving behavior based on vehicle telematics data. The proposed Long-Short-Term-Memory (LSTM) model predicts the identity of the driver from the individual's unique driving patterns learned from the vehicle telematics data. Given that telematics is time-series data, the problem is formulated as a time-series prediction task to exploit the embedded sequential information. The performance of the proposed approach is evaluated on three naturalistic driving datasets, on which it gives highly accurate prediction results. The robustness of the model to noisy and anomalous data, usually caused by sensor defects or environmental factors, is also investigated. Results show that the prediction accuracy of the proposed model remains satisfactory and outperforms the other approaches despite the extent of anomalies and noise induced in the data.

Index Terms—Driver identification, deep learning, LSTM RNN, deep neural network, vehicle telematics data, OBD-II, CAN bus

I. INTRODUCTION

Although automobile technologies keep advancing, the vehicle security problem is not sufficiently addressed. Over the last decade, auto-theft rates have increased around the globe.
According to an FBI Crime Report, in 2017 there were an estimated 773,139 motor vehicle thefts in the USA, a 10.4% increase compared with the 2013 report¹. Secondly, connected and autonomous cars are linked to the internet, which makes them more vulnerable to cyber-attacks than ever. In 2015, Jeep recalled 1.4 million connected vehicles after hackers remotely hacked and controlled a 2014-model Jeep car over the Internet². Thirdly, in shared-mobility and insurance companies, identifying the car operator is vital in preventing dangers caused by unauthorized drivers. To address these vehicle security problems, this paper proposes a data-driven driver identification technique that can be implemented as an additional line of security for keeping cars safe from unauthorized drivers, including thieves and hackers. The proposed method is based on freely available vehicle telematics data, also called OBD-II (On-Board Diagnostics) data. The OBD-II interface of a vehicle provides in-vehicle sensor readings such as vehicle speed, engine RPM, throttle position, engine load, brake-pedal displacement, etc. As shown in Figure 1, OBD-II dongles can be used to extract these internal sensor data in real time to infer information about the car and its driver. In-vehicle sensor data are directly or indirectly influenced by the driver's driving style. The driving style of each individual varies depending on how they maneuver their vehicle: how frequently the driver uses the brake and gas pedals, how much pressure is applied on the brakes, or how the steering-wheel angle is adjusted at curves [1] are a few examples. The driver's unique driving-style attributes are directly or indirectly manifested in the generated vehicle telematics data.

¹ https://ucr.fbi.gov/crime-in-the-u.s/2017/crime-in-the-u.s.-2017/topic-pages/motor-vehicle-theft
² https://www.wired.com/2015/07/jeep-hack-chrysler-recalls-1-4m-vehicles-bug-fix/

Fig. 1.
In-Vehicle (telematics) data acquisition through OBD-II interface

Combining vehicle telematics data with appropriate machine learning tools helps one to recognize different driving styles [1], uncover driving behaviors and patterns [2], and even detect hazardous driving behaviors [3]. However, most of the traditional algorithms used in the driver identification task rely on rigorous data-preprocessing steps that require either domain-expert knowledge or an extensive data exploration process. Secondly, while vehicle telematics sensor data is time-series data, conventional machine learning algorithms do not have the inherent ability to exploit sequential relationships. In contrast, end-to-end deep-learning-based methods such as Recurrent Neural Networks (RNNs) can directly extract the most important features without data pre-processing and can also exploit temporal relationships in the data in a holistic data-driven approach [4], [5]. This paper proposes an end-to-end deep-learning-based driver identification technique using an RNN architecture variant called Long-Short Term Memory (LSTM). The end-to-end approach of LSTM enables extracting important features without rigorous data pre-processing procedures. The time-series nature of the LSTM algorithm allows holistically exploiting the inherent temporal information embedded in the time-series data captured from sensors during driving sessions. Additionally, based on the conducted studies, the proposed approach showed robust performance compared with conventional machine learning algorithms even under the increasing influence of anomalous and noisy sensor readings.
Finally, we have made our model code and its comparison with other models available at https://github.com/Abeni18/Deep-LSTM-for-Driver-Identification-

The main contributions of this paper are summarized as follows:
• We propose a data-driven robust driver identification system based on an end-to-end Long-Short-Term-Memory (LSTM)-Recurrent Neural Network model. The proposed model architecture uses a holistic data-driven approach to capture the driving signature of individuals from telematics data in order to identify the driver.
• An efficient LSTM architecture is searched for and implemented to achieve robust performance.
• The effect of sensor data anomalies and random environmental noise on the performance of the proposed approach is studied. We then compare the accuracy of the proposed driver identification model with three well-known conventional machine learning models, and a comprehensive comparison is presented.

The remainder of this paper is arranged as follows: Section II reviews related work in the literature. Section III presents a detailed discussion of our proposed methodology. Section IV discusses the experimental studies on real-world datasets, and the results are presented in Section V. Finally, the conclusions, including a summary of the major points, are given in Section VI.

II. RELATED WORKS

With the success of machine learning algorithms and data mining techniques, a growing interest has been shown in the last couple of years in utilizing OBD-II data for developing driver identification systems. This section provides a brief review of the work conducted by researchers to extract vital information from this data to identify drivers. Virtual simulators have been used by various researchers to generate data that is similar to real vehicle internal sensor data. Wakita et al. [6] and Zhang et al.
[7] have used simulation in controlled routes and settings to collect data to develop driver identification predictive models. Zhang et al. used a hidden Markov model (HMM) to model individual characteristics of driving behavior based on accelerator and steering-wheel-angle data, and managed to reach a maximum prediction accuracy of 85%. On the other hand, Wakita et al. used a Gaussian Mixture Model (GMM) with accelerator pedal, brake pedal, vehicle velocity, and distance from the front vehicle as input data, and achieved 81% accuracy with twelve drivers driving in a simulator and 73% accuracy with 30 drivers driving in an actual car. Both of these studies rely on a small number of features or sensor readings, which may not capture all the hidden behavioral information. Although these works provide insights into the success of machine learning approaches for this task, the results are not satisfactory enough, and they cannot be compared to the real-world situation, which involves various uncontrolled settings such as different traffic patterns and environmental conditions like weather. Kwak [8] studied the driving patterns of drivers using data collected in an uncontrolled environment from three road types (motorway, city way, and parking lot), with ten drivers participating in the experiment. This study compared Decision Tree, KNN, Random Forest, and Fully Connected Neural Network algorithms based on their performance in predicting the drivers accurately. They performed data preprocessing that includes feature selection and statistical feature formation (such as mean, median, and standard deviation) to increase model performance. According to the results, Random Forest and Decision Tree are the two most accurate algorithms [8], [9]. They also showed the importance of adding statistical features to the data to reach an accuracy level above 95%.
Fabio [9] classified drivers through a detailed analysis of human behavior characterization for driving-style recognition in in-vehicle systems. The pre-processed data is used to compare five classification algorithms: J48, J48graft, J48consolidated, Random-tree, and Rep-tree. According to their results, the J48 and J48graft classification algorithms showed better performance on different measurement metrics. This work focused on the statistical analysis of the data by comparing different classification algorithms on various performance metrics.

Most of these studies in the literature focused on rigorous data analysis that involves feature selection [10], [11] and statistical feature formation techniques. These processes are time-consuming, and some features may only exist in some vehicles, which limits the practicality of the work. In this paper, we do not use a feature selection and formation process. By taking raw data from any car, our approach can extract important features in a holistic data-driven fashion. Secondly, while OBD data has a time-series property, these works have not used time-series algorithms, forgoing the advantage of the temporal information embedded in the data. Thirdly, even though anomalous and noisy sensor readings are a common problem in real-world implementations, their effect on the performance of the algorithms used in driver identification systems has not been studied.

III. PROPOSED METHODOLOGY

This section discusses how we developed a driver identification model based on OBD data to provide an additional line of security for vehicles' protection. Firstly, the problem formulation of the driver identification system is discussed. Secondly, an explanation of the model and the learning process we adopted to solve the problem is presented.

Fig. 2. Driver Identification System Framework
Fig. 3. Deep LSTM Driver Identification model Architecture
Finally, more details about the proposed method are discussed.

A. Problem Formulation

Given that OBD-II data is a sequence of sensor data collected over time from the car during a driving trip, we formulate the driver identification problem as a time-series prediction task in which a window sequence 'S' of OBD-II data from time T_start to time T_finish is identified as coming from one of the drivers out of a given set of individuals, as shown in Figures 2 and 3. Additionally, the ability of the model to give a correct prediction under different levels of environmental noise and sensor anomalies is investigated.

B. Model development

Recurrent Neural Networks (RNNs) are among the most successful time-series techniques and have been widely used for time-series classification tasks like speech recognition, machine translation, and human activity recognition [12]. As shown in Fig. 3, an RNN takes sequential input X_t at time t and outputs y_t based on a decision made by an internal state h_t. The internal state is a connection used to extract important features and capture the temporal dynamics of time-series data, i.e., data coming from sensor readings. This internal state, named the hidden state, consists of multiple copies of a fully connected neural network, each passing a message to its successor for the next time step. Based on the number of inputs and expected outputs, RNN architectures can be designed in different ways. In this work, there are multiple input features at the input layer, and a single prediction output is expected at the output layer; because of that, a "Many to One" type of RNN architecture is used. As shown in Figure 3, our model takes feature vectors of time-series data with a window size of (X_0 to X_n) and predicts an output vector O_s. Then, out of the predicted outputs, the one with the highest probability is selected as the final prediction.
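A minimal Keras sketch of such a "Many to One" setup is shown below. The shapes are illustrative stand-ins (windows of 16 time steps, 51 features, 10 drivers, matching the datasets described later), and the layer sizes here are arbitrary, not the paper's tuned architecture:

```python
import numpy as np
from tensorflow.keras import layers, models

# Illustrative shapes: windows of 16 time steps, 51 telematics
# features per step, and 10 candidate drivers.
n_steps, n_features, n_drivers = 16, 51, 10

# "Many to One": the LSTM reads the whole window, and only its final
# hidden state feeds the classification layer.
model = models.Sequential([
    layers.Input(shape=(n_steps, n_features)),
    layers.LSTM(64),                                 # returns last step only
    layers.Dense(n_drivers, activation="softmax"),   # one score per driver
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# One dummy window -> a probability per driver; argmax is the prediction.
window = np.random.rand(1, n_steps, n_features).astype("float32")
probs = model.predict(window, verbose=0)
pred_driver = int(np.argmax(probs))
```

The softmax output plays the role of the per-driver likelihood scores described above, with the highest-probability driver taken as the final prediction.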
Out of the available features, at each time step one feature vector is analyzed by the model, which makes the number of time steps equal to the number of feature vectors in the data window. As the data passes through the hidden layers of the Deep-LSTM neural network, important features are extracted and temporal dependencies are captured automatically. Finally, after the last time step, the network produces a likelihood score for each driver as an output value; the one with the highest score is selected as the prediction.

Unfortunately, the "vanishing gradient" problem can make RNNs unable to learn long-term dependencies as the number of hidden states increases. One particular variant of RNN architectures, the Long-Short Term Memory (LSTM) network, addresses this problem by using a "memory block" in the hidden unit to capture the long-term dependencies that may exist in the data. This memorizing capability of LSTM has shown the best performance in many time-series tasks such as activity recognition, video captioning, and language translation [13], [14]. The cell state (memory block) of an LSTM has one or more memory cells that are regulated by structures called gates. Gates control the addition of new sequential information to memory and the removal of useless information from it. A gate is a combination of a sigmoid activation function and a pointwise multiplication operation, and gates are used to control the information that passes through the network. An LSTM has three gates, namely the forget, input, and output gates, which are summarized as follows:

Fig. 4. Long-Short Term Memory (LSTM) memory block graphical representation.

• Forget gate: The forget gate, equation (1), decides what information to keep in or remove from the cell state.

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)    (1)

• Input gate: The input gate, equation (2), decides what new information to add and how to update the old cell state C_{t−1} to the new cell state C_t for the next memory block.

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
Ĉ_t = tanh(W_C · [h_{t−1}, x_t] + b_C)
C_t = f_t ∗ C_{t−1} + i_t ∗ Ĉ_t    (2)

• Output gate: Finally, the output gate, equation (3), filters and decides which information to produce as the output of a memory block at a given time step t.

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t ∗ tanh(C_t)    (3)

TABLE I
TABLE OF PARAMETERS FOR THE LSTM MODEL

Variable | Definition
X_t and h_t | input and output of the memory cell
h_{t−1} | input from the previous state
f_t, i_t, o_t | activations of the forget, input, and output gates
W_f, W_i, W_C, W_o | weights of the forget, input, candidate, and output gates
b_f, b_i, b_C, b_o | biases of the forget, input, candidate, and output gates
Ĉ_t and C_t | candidate cell and updated cell state values

The memory blocks are connected to build the layers of the Deep LSTM neural network, as shown in Figure 3. The last layer of our Deep-LSTM model is a sigmoid function, shown in equation (4). It takes the last hidden layer's feature vector and outputs classification scores for the given set of drivers. The one with the highest probability score is then selected as the final prediction. During training, a cross-entropy loss function and the Adam optimization algorithm are used. These are some of the hyper-parameters that are fixed initially to build the model. There are other hyper-parameters, such as the number of neurons, the depth of the network, the input sequence length, and the data overlap amount; the best combination of these hyper-parameters has to be selected to develop an efficient model. Accordingly, several experiments were conducted to choose these hyper-parameters, which are discussed in the next section.
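Equations (1)-(3) can be traced with a small NumPy sketch of a single memory-block step. The dimensions and random weights below are arbitrary illustrative values, not the paper's trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM memory-block step implementing equations (1)-(3).
    Each W maps the concatenated [h_{t-1}, x_t] to a gate; b are biases."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, eq. (1)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, eq. (2)
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat         # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, eq. (3)
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t

# Illustrative sizes: 3 input features, 2 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```

Note how the sigmoid gates bound each gate activation to (0, 1), so the hidden state h_t = o_t ∗ tanh(C_t) always stays within (−1, 1).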
σ(z) = 1 / (1 + e^(−z))    (4)

where z is the output of the network.

C. OBD-II Data

European and North American countries adopted OBD (On-Board Diagnostics) technology to standardize the way vehicles can be checked for compliance. The OBD-I system was first used in the early 1980s, and since 1998 all vehicles sold in the USA have been required by law to be OBD-II equipped [15]. There are several standardized communication protocols used by car manufacturers to communicate data between the ECMs (Electronic Control Modules) of a car and a scan tool, such as ISO 9141-2, SAE J1850 PWM, SAE J1850 VPW, ISO 14230-4, SAE J2284, and ISO 15765-4 [15]. An OBD-II scan tool can automatically detect the communication protocol and vehicle features, which makes it a plug-and-play device. In compliance with OBD-II standards, the OBD program is divided into sub-group programs referred to as 'Service $xx', ranging from Service $01 to Service $09 and Service $0A. Service $01 displays an actual real-time reading of in-vehicle sensor data, where the data is commonly referred to as PID (Parameter Identification) data. Other services include Service $02, which displays the state of the PIDs when a fault occurred, and Service $03, which is used to access DTCs (Diagnostic Trouble Codes) to display trouble codes, and so on. Service $01 provides actual sensor readings, most of whose features commonly exist across different cars, and it is also real-time information. As a result, our approach mainly used data obtained from Service $01 as the source of information.

IV. EXPERIMENTS

The model development and experimentation process consists of three steps: data preparation, LSTM architecture design, and a study of model robustness on datasets affected by sensor data anomalies and noise.

A.
Data Description and Preparation

As shown in Figures 1 and 2, Bluetooth-, WiFi-, or USB-enabled OBD-II dongles can be plugged into the OBD-II interface of the car to collect the internal real-time sensor readings on external devices like a mobile phone, a laptop, or the cloud. The experiments conducted in this paper used the following three datasets:

Security Driving Dataset [16]: This dataset was collected with a KIA Motors Corporation car using CarbigsP as the OBD-II scanner. The experiment took place in an uncontrolled environment consisting of three types of path: city way, parking space, and motorway. Ten drivers participated in the experiments, and for a reliable classification, each driver completed two round trips on weekdays during off-peak hours from 8 p.m. to 11 p.m. There are 94,401 data points recorded with 51 different independent features.

Vehicular data trace Dataset-1 [17]: This dataset was collected as part of an Intelligent Transportation Systems (ITS) and Vehicular Ad-hoc Networks (VANETs) study. The data was obtained from the OBD-II interface via a Bluetooth connection to an Android app installed on a smartphone. In this data collection, a Hyundai HB20 model vehicle was shared by ten drivers for 36 trips covering a total trip time of 28 hours in their daily routines; therefore, the experiment is naturalistic and uncontrolled. Six males and four females with an age range of 25 to 61 participated in the experiment.

Vehicular data trace Dataset-2 [17]: This dataset was collected in a similar setting to Vehicular data trace Dataset-1, except that the experiment was controlled. In this experiment, a Renault Sandero model vehicle was shared by four drivers who drove through two different selected routes for a total trip time of 3 hours, which makes it a controlled experiment. Two males and two females with an age range of 20 to 53 participated in the experiment.
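As context for what such dongles record, Service $01 PID values decode from the raw response bytes via simple formulas defined by the OBD-II (SAE J1979) standard. A small sketch, where the feature names and example byte values are our own illustration:

```python
# Decoders for a few Service $01 PIDs; formulas follow the OBD-II
# (SAE J1979) standard. A and B are the raw data bytes of the response.
PID_DECODERS = {
    0x04: ("engine_load_pct",   lambda A, B=0: A * 100.0 / 255.0),
    0x0C: ("engine_rpm",        lambda A, B=0: (256 * A + B) / 4.0),
    0x0D: ("vehicle_speed_kph", lambda A, B=0: float(A)),
    0x11: ("throttle_pos_pct",  lambda A, B=0: A * 100.0 / 255.0),
}

def decode_pid(pid, data_bytes):
    """Turn a PID number plus raw bytes into a named physical reading."""
    name, fn = PID_DECODERS[pid]
    return name, fn(*data_bytes)

# Raw bytes 0x1A 0xF8 for PID 0x0C -> (256*26 + 248)/4 = 1726 rpm.
print(decode_pid(0x0C, [0x1A, 0xF8]))   # ('engine_rpm', 1726.0)
print(decode_pid(0x0D, [0x3C]))         # ('vehicle_speed_kph', 60.0)
```

Rows of such decoded readings, one per sampling instant, form the multivariate time series used in the datasets above.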
To cancel out the effect of different sensor measurement scales, the datasets are normalized using equation (5); no further preprocessing is applied. The datasets are then split into training (85%), validation (5%), and test (10%) data. All training, validation, and test data are then sliced into sequences of sliding-window data chunks. The study used 50% overlap between consecutive series of data chunks before the data is fed to the model. The overlapping window is important to smooth the flow of the sequence, capture sequential information, and increase the training data size for better generalization.

X_new = (X − X_min) / (X_max − X_min)    (5)

where X denotes a given data point, X_min and X_max are the minimum and maximum data points in the dataset, and X_new is the new normalized data.

Fig. 5. Overlap sliding window method between consecutive sequences of data of features 1 and 2

B. Deep-LSTM architecture design

As discussed in the previous section, a 'many-to-one' Deep-LSTM formation is appropriate for this problem. As shown in Figure 3, 'Many-to-One' only defines the number of inputs (many) and outputs (one) of the network, whereas the internal part of the network can be constructed in several ways. The number of layers and the number of neurons in the hidden layers are some of the internal network parameters that determine both the accuracy and the computational complexity of the model [4]. As the depth of the network and the number of neurons in the hidden layers increase, the accuracy often increases, but at a cost in computational resources [5]. A grid architecture search was applied using the training and validation data to find the most efficient architecture. A two-hidden-layer network with 160 neurons in the first hidden layer and 200 neurons in the second hidden layer was found to be the most efficient neural architecture. As a time-series algorithm, the LSTM model takes a sequence of data as input. Accordingly, we searched for an adequate sequence size.
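The preparation just described, min-max scaling per equation (5) followed by 50%-overlapping windows, can be sketched as follows. The helper names are ours, and the window size of 16 is the value the architecture search selects:

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise min-max scaling, equation (5)."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def sliding_windows(X, y, window=16, overlap=0.5):
    """Slice a trip into fixed-size windows with the given overlap.
    Every window keeps the driver label y of the trip it came from."""
    step = max(1, int(window * (1 - overlap)))   # 50% overlap -> step of 8
    chunks, labels = [], []
    for start in range(0, len(X) - window + 1, step):
        chunks.append(X[start:start + window])
        labels.append(y)
    return np.stack(chunks), np.array(labels)

# Toy trip: 48 samples of 3 features, all labeled driver 0.
X = np.arange(48 * 3, dtype=float).reshape(48, 3)
chunks, labels = sliding_windows(min_max_normalize(X), y=0)
print(chunks.shape)   # (5, 16, 3): windows start at rows 0, 8, 16, 24, 32
```

The resulting 3-D array (windows × time steps × features) is exactly the input shape the many-to-one LSTM consumes.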
With 85% training and 5% validation data, different window sizes (4, 8, 16, 32, 64, and 120 data points) were tested on the developed architecture. As shown in Fig. 6, a window size of 16 data points was found to be the best fit. In the next section, the performance of the designed architecture is tested on real-world datasets, and its robustness against noise and sensor anomalies is studied.

Fig. 6. Search for window size for sequence of data

C. Model Robustness to sensor data anomalies and noise

The performance of telematics-data-based driver identification systems depends on the reliability of the collected vehicle sensor data. However, sensors are prone to failures, defects, and cyberattack attempts, which introduce anomalies into the data [18]–[20]. Additionally, due to extreme environmental factors such as temperature, noise can also significantly and persistently corrupt the quality of sensor data during its generation and its transport to the central data collection unit [21]. Thus, the driver identification model has to be robust to such anomalous and noisy data to ensure a correct prediction in a real-world implementation. With this in mind, we have studied the effect of increasing levels of sensor data anomalies and noise on the performance of the proposed approach and compared it with three other widely used machine learning algorithms.

Noise is a random error or variance in a measured variable [22]. In this study, White Gaussian Noise (WGN) is applied to degrade the original dataset at hand. WGN is used to mimic the effect of random noise occurrence in electronic systems [21], [22]. It is assumed that the noise is randomly distributed for all models and independent of the original data. Then every independent variable X_old_i in the dataset is substituted by a noise-inflicted one X_new_i with a probability of n, where n refers to the level of noise.
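The substitution rule just described can be sketched in NumPy. This is a minimal illustration under our reading that each entry is corrupted with probability n; the noise level and scale values are arbitrary:

```python
import numpy as np

def add_wgn(X, n=0.3, std_scale=1.0, rng=None):
    """Replace each value with a noisy copy with probability n.
    The White Gaussian Noise is zero-mean, with a standard deviation
    proportional to each column's own std (std_scale ranges 0 to 2)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, std_scale * X.std(axis=0), size=X.shape)
    mask = rng.random(X.shape) < n          # which entries get corrupted
    return np.where(mask, X + noise, X)

X = np.random.default_rng(1).normal(size=(1000, 4))
X_noisy = add_wgn(X, n=0.3)
# Roughly 30% of the entries change; the rest stay untouched.
frac_changed = float(np.mean(X_noisy != X))
```

Because the noise is zero-mean and centered on the original value, the corrupted series keeps its overall shape while individual points wiggle away from the truth, which is the degradation mode studied below.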
Equation (6) is used to calculate the new noisy data:

X_new_i = { X_old_i + rand(σ_i, μ_i)   if p_i ≥ n
          { X_old_i                    if p_i < n    (6)

where the noise is generated with probability p_i using a random signal (rand) that has zero mean μ_i, which makes it centered on the original data value, and a standard deviation based on the data variance σ_i that describes how severely the noise affects the data. The induced standard deviation (std) varies from zero std (no noise) to two std; we chose two in order to keep the noise effect within the two-standard-deviation region of the normal distribution. When the noise is added to the data, it keeps the original patterns, but it wiggles individual data points away from the actual value to an extent determined by the standard deviation of the noise. As shown in Figure 7, the harshness of the noise can be increased by increasing its standard deviation.

Fig. 7. Visualization of dataset inflicted with different levels of noise

On the other hand, anomalies/outliers are abnormal extreme patterns or events in data that do not conform to a well-defined notion of normal behavior [23]. Most observations in data lie within the two-standard-deviation region; points that are far away from this region are usually referred to as anomalies or outliers [24]. Accordingly, we induced random anomalies in the original data that push some of the data points out of the normal region by increasing the values of the data points by a given percentage. As shown in Figure 8, when 40% of the Engine load sensor readings are affected by a 40% anomaly (a 40% increment from the original data point) or an 85% anomaly, some of the affected data points become outliers. In the next sections, the performance of the proposed approach is examined under the influence of such induced noise and sensor anomalies.

V.
RESULTS AND DISCUSSIONS

Three well-known evaluation measures, namely F1-Score, Precision, and Recall, are used to evaluate the performance of the proposed Deep LSTM model. These metrics are defined in the following equations.

• Precision:

Precision = TP / (TP + FP)    (7)

• Recall:

Recall = TP / (TP + FN)    (8)

• F1-score:

F1 = 2 × (Precision × Recall) / (Precision + Recall)    (9)

where TP denotes the number of samples whose predicted label matches the true class label, FP represents the number of samples that are classified into a class they do not belong to, and FN refers to the number of samples that the classifier fails to classify.

Fig. 8. Increasing level of anomalies rate induced in the data

In this study, the TensorFlow³ & Keras⁴ deep learning libraries are used to develop our LSTM model, and the sklearn⁵ machine learning library is used to replicate the other algorithms used for comparison. The computer used for developing, training, and testing the models is an HP Z840 workstation with an Intel Xeon CPU and 64 GB of RAM. Three datasets are used to evaluate the proposed approach, where each dataset is separately divided into 85% training, 5% validation, and 10% test data. From TABLE II, it is observed that the proposed LSTM model provides high accuracy in predicting driver identity. For instance, the average recall, precision, and F1-Score of the proposed LSTM model are above 97%, which indicates the efficacy of the model in driver identification tasks.
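The per-class metrics of equations (7)-(9) are available directly in scikit-learn; a small sketch with toy driver labels of our own invention:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy ground-truth driver IDs and model predictions, for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

# Macro averaging reports the unweighted mean of the per-driver scores,
# so every driver counts equally regardless of sample count.
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
```

Each score lies in [0, 1], with 1.0 reached only when every window is assigned to its true driver.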
TABLE II
LSTM MODEL ACCURACY ON NATURALISTIC DRIVING DATASETS

Dataset | Drivers | Precision | Recall | F1 score
Security Driving data | 10 | 0.988 | 0.981 | 0.98
Vehicular data trace-1 | 10 | 0.97 | 0.972 | 0.975
Vehicular data trace-2 | 4 | 0.99 | 0.991 | 0.987

To compare our model against other driver identification techniques, we picked the three most popular machine learning algorithms from the literature: Random Forest (RF), Decision Tree (DT), and Fully Connected Neural Network (FCNN). The same datasets used by other authors are also used in our study [8], [9], [25]. The same data split is used to train and test all models: 85% training, 5% validation, and 10% test of the overall data. As discussed in Section IV-C, in the first experimental study, all models are trained on training data with no noise and then evaluated on test data with noise. White Gaussian Noise (WGN) is applied to the dataset, where the standard deviation of the WGN controls the induced level of noise. Each model is separately evaluated on test data with different noise levels, varying from zero to two standard deviations (the normal distribution range). Each model is evaluated ten times for each noise level, the average accuracy value is taken, and the result is presented in Fig. 9.

³ https://www.tensorflow.org/  ⁴ https://keras.io/  ⁵ https://scikit-learn.org/

Fig. 9. Models accuracy comparison by inducing increasing level of noise on test-data from Security Driving Dataset
Fig. 10. Models accuracy comparison trained and tested on noise-inflicted Security Driving Dataset

On the other hand, building a supervised machine learning model that can effectively learn from a dataset already inflicted with noise is a problem of great practical importance [26]. In practical applications, it is observed that models usually overfit in the presence of noise in the training data [27].
To train a driver identification model in real time on unclean data coming directly from the vehicle, the model needs to be robust to noise in the training data too. Accordingly, in the second experimental study, we further experimented by training the selected models on noisy data and then testing them on noisy data. As shown in Figure 10, the neural-network-based models (LSTM followed by FCNN) performed better than the others by avoiding over-fitting on the noise.

As discussed in Section IV-C, to assess the impact of sensor data anomalies/outliers on the performance of the models, we introduced different levels of outliers into the original test data. Accordingly, in the third experimental study, 40% of the overall test data was affected by an anomaly rate ranging from 0% to 65%. As shown in Figures 11 and 12, a comparison of the accuracy of the proposed approach on anomalous data against the other models is presented.

Fig. 11. Models accuracy on increasing level of anomalies induced on test-data from Vehicular data trace Dataset-1
Fig. 12. Models accuracy on an increasing level of anomalies induced on VDT (Vehicular data-trace) and Security (Sc) datasets

As presented in the above results, on clean data the conventional machine learning models used in the literature have comparable or lower accuracy than the proposed model. However, under sensor anomalies/outliers or environmental noise, the other models' performance quickly dropped below an acceptable range, whereas the proposed Deep-LSTM model keeps its accuracy above an acceptable level in all cases. This difference is mainly attributed to the fact that, unlike LSTM, the other classical machine learning models do not have an inherent ability to exploit temporal relationships in time-series data. The other conventional machine learning models, including FCNN, examine a data point recorded at a single time step (a single row).
The LSTM, in contrast, examines a sequence of data points (multiple consecutive rows) to extract time-dependent patterns from the data. The internal memory of the LSTM helps it relate patterns from previous time steps to the current time step or event. As a result, the LSTM is considered one of the best machine learning algorithms at remembering pieces of information and keeping them stored over many time steps [5]. The importance of capturing temporal relationships is clearly shown when we compare two models from the same family: the Fully Connected Neural Network (FCNN) and the proposed LSTM Recurrent Neural Network. Even though the exact same network configuration was used to build both architectures, the LSTM performed well under noise, whereas the FCNN performed poorly. Therefore, due to the reasons discussed above and its architectural design (discussed in detail in Section III-B), the proposed LSTM-based approach performed better than the other conventional machine learning models.

VI. CONCLUSIONS

To address the growing vehicle security problem, we presented an end-to-end LSTM-RNN architecture as a driver identification model. The model is developed from freely available vehicle telematics data collected through the OBD-II interface of vehicles. The problem is formulated as a time-series prediction task, in which the model is trained on sequences of in-vehicle sensor data. The LSTM has an inherent ability to remember temporal information in the data and retain it over many more time steps than other conventional machine learning approaches. Accordingly, the proposed model efficiently learns each individual's unique driving patterns from the data to identify the driver. The holistic, data-driven nature of the technique also has the advantage of avoiding rigorous data pre-processing procedures.
The proposed method is evaluated on real-world datasets using different metrics and achieves better or comparable results against other models. Further studies on anomalous and noisy sensor data show that our model scores substantially better than the others. Even under increasing noise and outlier effects, the proposed approach maintains its accuracy above an acceptable value (88%), while the other models' accuracy drops below 40%.

VII. ACKNOWLEDGEMENT

This work is based on research supported by NASA Langley Research Center under agreement number C16-2B00-NCAT, and by the Air Force Research Laboratory and the Office of the Secretary of Defense (OSD) under agreement number FA8750-15-2-0116.

REFERENCES

[1] H. David, S. Abhijit, S. Rainer, L. Andreas, H. Markus, R. Martin, L. Jure, et al. Driver identification using automobile sensor data from a single turn. In Intelligent Transportation Systems (ITSC), 19th International Conference. IEEE, 2016.
[2] S. Amardeep, S. S. Omid, and H. John HL. Leveraging sensor information from portable devices towards automatic driving maneuver recognition. In Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference.
[3] T. Imkamon, P. Saensom, P. Tangamchit, and P. Pongpaibool. Detection of hazardous driving behavior using fuzzy logic. In ECTI-CON 5th International Conference. IEEE, 2008.
[4] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[5] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 2015.
[6] Toshihiro Wakita, Koji Ozawa, Chiyomi Miyajima, Kei Igarashi, Katunobu Itou, Kazuya Takeda, and Fumitada Itakura. Driver identification using driving behavior signals. IEICE Transactions on Information and Systems, 2006.
[7] Xingjian Zhang, Xiaohua Zhao, and Jian Rong. A study of individual characteristics of driving behavior based on hidden Markov model. Sensors & Transducers, 2014.
[8] Byung Il Kwak, JiYoung Woo, and Huy Kang Kim. Know your master: Driver profiling-based anti-theft method. In 2016 14th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2016.
[9] Fabio M., Francesco M., Albina O., V. N., A. Santone, and A. Kumar Sangaiah. Human behavior characterization for driving style recognition in vehicle system. Computers & Electrical Engineering, 2018.
[10] Xuyang Yan, Mohammad Razeghi-Jahromi, Abdollah Homaifar, Berat A. Erol, Abenezer Girma, and Edward Tunstel. A novel streaming data clustering algorithm based on fitness proportionate sharing. IEEE Access, 2019.
[11] Xuyang Yan, Abdollah Homaifar, Gabriel Awogbami, and Abenezer Girma. Unsupervised feature selection through fitness proportionate sharing clustering. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2018.
[12] K. Cho, Bart Van M., C. Gulcehre, D. Bahdanau, F. Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[13] Francisco Javier Ordóñez and Daniel Roggen. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors, 2016.
[14] Martin Sundermeyer, Ralf Schlüter, and Hermann Ney. LSTM neural networks for language modeling. In Conference of the International Speech Communication Association, 2012.
[15] Global OBD vehicle communication software manual.
[16] Hacking and Countermeasure Research Lab. Security dataset, http://ocslab.hksecurity.net/datasets/driving-dataset.
[17] Vehicular-trace dataset, http://www.rettore.com.br/prof/vehicular-trace/.
[18] S. Narayanan, Sudip Mittal, and Anupam Joshi. OBD SecureAlert: An anomaly detection system for vehicles. In 2016 IEEE International Conference on Smart Computing (SMARTCOMP). IEEE, 2016.
[19] Kevin Ni, Nithya Ramanathan, Mohamed Nabil Hajj Chehade, Laura Balzano, Sheela Nair, Sadaf Zahedi, Eddie Kohler, Greg Pottie, Mark Hansen, and Mani Srivastava. Sensor network data fault types. ACM Transactions on Sensor Networks (TOSN), 5(3):25, 2009.
[20] David J. Hill, Barbara S. Minsker, and Eyal Amir. Real-time Bayesian anomaly detection for environmental sensor data. In Proceedings of the Congress-International Association for Hydraulic Research, volume 32, page 503. Citeseer, 2007.
[21] Elias Kalapanidas, Nikolaos Avouris, Marian Craciun, and Daniel Neagu. Machine learning algorithms: a study on noise sensitivity. In Proc. 1st Balkan Conference in Informatics, pages 356–365, 2003.
[22] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
[23] M. Hayes and Miriam A. M. Capretz. Contextual anomaly detection framework for big sensor data. Journal of Big Data, 2(1):2, 2015.
[24] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15, 2009.
[25] C. Zhang, M. Patel, S. Buthpitiya, K. Lyons, B. Harrison, and Gregory D. Abowd. Driver classification based on driving behaviors. In International Conference on Intelligent User Interfaces. ACM, 2016.
[26] Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep K. Ravikumar, and Ambuj Tewari. Learning with noisy labels. In Advances in Neural Information Processing Systems, pages 1196–1204, 2013.
[27] Naresh Manwani and P. S. Sastry. Noise tolerance under risk minimization. IEEE Transactions on Cybernetics, 43(3):1146–1151, 2013.