Driver Action Prediction Using Deep (Bidirectional) Recurrent Neural Network

Oluwatobi Olabiyi, Eric Martinson, Vijay Chintalapudi, Rui Guo
Intelligent Computing Division, Toyota InfoTechnology Center USA, Mountain View, California, USA
{oolabiyi, emartinson, achintal, rguo}@us.toyota-itc.com

Abstract—Advanced driver assistance systems (ADAS) can be significantly improved with effective driver action prediction (DAP). Predicting driver actions early and accurately can help mitigate the effects of potentially unsafe driving behaviors and avoid possible accidents. In this paper, we formulate driver action prediction as a timeseries anomaly prediction problem. While detecting the anomaly (a driver action of interest) may be trivial in this context, finding patterns that consistently precede an anomaly requires searching for or extracting features across multi-modal sensory inputs. We present such a driver action prediction system, including a real-time data acquisition, processing and learning framework for predicting future or impending driver actions. The proposed system incorporates camera-based knowledge of the driving environment and of the driver themselves, in addition to traditional vehicle dynamics. It then uses a deep bidirectional recurrent neural network (DBRNN) to learn the correlation between sensory inputs and impending driver behavior, achieving accurate prediction with a long horizon. The proposed system performs better than other existing systems on driver action prediction tasks and can accurately predict key driver actions, including acceleration, braking, lane change and turning, up to 5 sec before the action is executed by the driver.

Keywords—timeseries modeling, driving assistant system, driver action prediction, driver intent estimation, deep recurrent neural network

I. INTRODUCTION

Driving is an indispensable component of daily life.
Over the decades, tremendous effort has been dedicated to improving the safety and efficiency of driving. Although earlier advanced driver assistance systems (ADAS) primarily focused on sensing dangerous external environmental factors that might impact safe driving, recent efforts are shifting to incorporating driver intention into the system [1-3] (e.g., is the driver already planning on braking in the near future?). Several of these recent studies have demonstrated that predictive driving assistant systems aware of the driving patterns of intentional driver behavior can be very helpful, providing early notification that further mitigates dangerous driving maneuvers [1-11]. While the high observability of the interaction between driver and vehicle makes it easy to recognize a driver action when it takes place, predicting the action before it takes place is still a daunting task, as it involves intricate, multi-dimensional dynamics [1].

Fig. 1. Examples of timeseries data that capture common changes (anomalies) in driver behavior. Upper left: left and right turn; upper right: braking and acceleration; lower left: left lane change; lower right: right lane change.

In this paper, we formulate driver action prediction as a timeseries anomaly prediction problem. Our proposed framework therefore contains two main components. The first is the anomaly detection or driver action recognition system; Fig. 1 shows the detection of common driver actions related to pedaling and steering as timeseries anomalies. The second component is the anomaly prediction or driver action prediction (DAP) system, which involves predicting earlier detected anomalies before they occur again in the future.
Our prediction system uses a deep Bidirectional Recurrent Neural Network (DBRNN), consisting of multiple Long Short-Term Memory (LSTM) and/or Gated Recurrent Unit (GRU) cells, that learns to identify, as early as possible, the spatial-temporal dependencies in timeseries data with respect to future driver actions. When those dependencies occur again in the data stream, our model can therefore accurately predict an impending driver action before the driver takes it. Our choice of prediction algorithm, in conjunction with the high observability of driver actions, makes it possible to train the driver action prediction model both offline and online using data obtained from the action recognition system.

The rest of the paper is organized as follows: Section II describes related driver action prediction work, Section III contains the system description, and Section IV presents the deep RNN principle and structure. Driver-vehicle-environment (DVE) feature formulation is detailed in Section V, and experimental analysis on the real-world driving dataset is in Section VI. We summarize the entire work in Section VII.

Fig. 2. Left: the proposed DAP system. Right: the proposed prediction network based on DBRNN.

II. RELATED WORK

Most existing work on driver action prediction focuses on feature engineering in addition to basic temporal sequence modeling. For each action of interest, features that are temporally correlated to that action are selected based on prior or expert knowledge. For example, in [7], vehicle speed measurements from the CAN bus, along with traffic light sensing data, are adopted for predicting driver braking behavior within a 3 sec time window. In [8], yaw and steering-wheel angles were considered for lane departure analysis.
Vehicle velocity, steering angle, and external GPS information were used for modeling the lane-change maneuver in [9]. Similarly, vehicle dynamics are used for identifying driver intention near intersections in [10]. Long-term trajectories are also informative for driver intention analysis: in [11], a Probabilistic Finite State Machine was used to model driver intention from trajectory data. Others have studied the use of features extracted from vision-based sensors such as cameras. In [12, 13], a front-facing camera was used to predict different driving behaviors; the authors also used ego-vehicle dynamics and lane information extracted from cameras to recognize driving-related events.

Driver behavior modeling can also involve advanced control and machine learning techniques. In some of the existing literature [14-17], driving models were built using optimal control theory for analyzing steering behavior, tracking the driving path or estimating the route. In [18, 19], Bayesian reasoning was used to recognize driver intention from high-dimensional input features. Other high-performance models [20-22] used dynamic Bayesian networks, or Hidden Markov Models (HMM) and their modified structures, to achieve remarkable results in various applications.

However, formulating DAP as a timeseries anomaly prediction problem reveals that hand-engineered features might not yield the best result, because the DAP system should be capable of extracting whichever sensing information is most useful for predicting a future driver action. Therefore, with sensor-rich input, there is a need for a learning framework that can take advantage of it. Although the authors in [2] explore sensor-rich input for lane change prediction, their learning framework, based on the relevance vector machine (RVM), has limited feature extraction capability.
The authors in [3] addressed this problem by using an LSTM-based recurrent neural network (capable of extracting richer discriminative features) with spatial sensory fusion to improve performance; they also include turn prediction in their work. In comparison, this work explores a richer sensory input than [2] and [3]. Our model can also predict both steering and braking/acceleration actions, unlike [2] and [3], which focused only on steering actions. More importantly, using a windowing approach similar to [2], we employ a deep bidirectional RNN to extract a richer and more stable representation of the sensory input. This is the core algorithmic difference between this paper and the proposition in [3]. While the double LSTM units used in [3] achieved spatial fusion by assigning in-vehicle features to one LSTM and outside-vehicle features to the other, ours assigns all features to both LSTM units, but one unit processes the sensory sequence from past to current and the other from future to current. This fusion of different temporal directions enables the model to find causes of future actions more accurately and much earlier than achievable in [3].

III. SYSTEM OVERVIEW

The purpose of the proposed DAP system is to correctly estimate driver intent as early as possible before the driver takes action. The system shown in Fig. 2 is capable of modeling the prediction of any recognizable driver action. The proposed DAP system, depicted on the left of Fig. 2, consists of a sensing module, feature extraction modules, an action recognition module, an action prediction module and a training module.

TABLE I.
EXAMPLE OF THE SPLIT OF SENSORY FEATURES BETWEEN THE ACTION RECOGNITION AND PREDICTION SYSTEMS

CAN bus data. Recognition: brake and accelerator pressure, steering angle. Prediction: brake and accelerator pressure, gear position, steering angle, velocity and acceleration, engine rpm, elevation (8).
Face camera. Recognition: none. Prediction: head pose (9).
Hand camera. Recognition: none. Prediction: hand positions, movement status and directions (19).
Dash camera. Recognition: left and right lane offset. Prediction: adjacent lane availability, relative position and velocity of road objects in the driver's and adjacent lanes, left and right lane offset and curvature (12).
GPS + map. Recognition: none. Prediction: distance and direction from nearest intersection (2).

The sensing module handles data acquisition from all sensors. The feature extraction modules independently extract features that can be used to recognize and predict driver actions, as shown in Table I. The driver action prediction module continuously predicts future driver actions by running the prediction network (PN) model on the features extracted for prediction. The driver action recognition module, on the other hand, runs the recognition network (RN) model on the features extracted for recognition in order to recognize driver actions. Lastly, the training module retroactively combines the features extracted for prediction with the recognized action labels to generate training examples, which are then used to train the PN model. Our system uses off-the-shelf feature extraction systems; our main algorithmic contribution is therefore the prediction network shown on the right of Fig. 2.

IV. DBRNN FOR DRIVER ACTION PREDICTION

The key component of this work is the use of the DBRNN framework, shown on the right-hand side of Fig. 2, to model future driver actions. The RNN cell can be a simple, long short-term memory, or gated recurrent unit RNN cell.
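The training module's retroactive pairing of prediction features with recognized action labels can be sketched as follows. This is our own illustration, not the authors' code: the helper names and toy data are assumptions, and the window length T and horizon d use the 5 sec / 10 Hz values adopted later in the paper.

```python
# Illustrative sketch (not the authors' code) of generating PN training
# examples from a feature stream plus recognition-network event labels.

T = 50   # samples per input window (5 s at 10 Hz)
d = 50   # maximum prediction horizon in samples (5 s at 10 Hz)

def make_training_examples(stream, event_times):
    """stream: per-timestep feature vectors x_t extracted for prediction.
    event_times: timesteps t_a where the recognition network marked an action.
    A window ending at t is labeled positive when t_a - d <= t < t_a for
    some event time t_a, i.e. an action begins within the next d samples."""
    examples = []
    for t in range(T - 1, len(stream)):
        window = stream[t - T + 1 : t + 1]        # X_t = (x_{t-T+1}, ..., x_t)
        label = int(any(ta - d <= t < ta for ta in event_times))
        examples.append((window, label))
    return examples

# Toy stream: 200 timesteps of 3-dim features, one recognized event at t = 120.
stream = [[0.0, 0.0, 0.0]] * 200
pairs = make_training_examples(stream, event_times=[120])
assert len(pairs[0][0]) == T              # each input covers T samples
assert pairs[70 - (T - 1)][1] == 1        # t = 70 lies within the 5 s horizon
assert pairs[69 - (T - 1)][1] == 0        # t = 69 lies just outside it
```

Because labels come from the recognition network rather than manual annotation, examples like these can be generated continuously, which is what makes both offline and online training possible.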
Since our goal is to predict a driver action before it takes place, and assuming that both negative and positive states can be recognized correctly by the recognition network, our DBRNN framework focuses on modeling the transition between the negative state (no-action state) and positive states (action states) in feature sequences.

Given a stream of driving data, we use the recognition system to mark the beginning of each driver action. Suppose an action event occurs at time t_a. We define an action event random variable a ∈ {a_0, ..., a_A} and a timing random variable τ ∈ {τ_0, ..., τ_K}. Note that a_0 is the normal (negative) action state while the rest are anomalous. We define two parameters: d, the maximum horizon (transition period), and T, the maximum sequence length per prediction. At any time t, the prediction system models the joint conditional probability

y_t = P(a = a_i, τ = τ_j | X_t; Θ)   (1)

where X_t is the model input at time t and Θ is the model parameters. In [2], X_t = (x_{t-T+1}, ..., x_t), where x_t is the input sample vector at time t, τ ∈ {t < t_a - d, t = t_a - d}, d = 2.5 sec and T = 2 sec. However, to overcome the timing problem in [2] (i.e., the difficulty of precisely timing the causes of a driver action), [3] uses an RNN model that relaxes the timing constraint, substituting τ ∈ {t < t_a - d, t_a - d ≤ t < t_a}, X_t = x_t, T = 0.8 sec (1 sample) and d = 4.8 sec (7 samples). Note that the internal state of the RNN is used in [3] to carry context from one timestamp to the next. In order to limit arbitrary context propagation, and thereby improve performance over [3], our DBRNN uses a window similar to [2] to model the conditional probability in (1), with τ ∈ {t < t_a - d, t_a - d ≤ t < t_a}, X_t = (x_{t-T+1}, ..., x_t) and d = T = 5 sec (50 samples) in all our evaluations.

A.
Network Architecture

The prediction network is implemented with a Recurrent Neural Network (RNN). The neural network takes the temporal sequence of observations X_t = (x_{t-T+1}, x_{t-T+2}, ..., x_t) as input and generates a sequence of vectors (h_{t-T+1}, h_{t-T+2}, ..., h_t) via a non-linear activation. The output vectors are hidden variables that describe the distribution of the observations. Unlike a traditional feed-forward neural network, an RNN also considers historical information, or so-called "memory". The label is determined by a non-linear mapping over these hidden variables. The mathematical formulation is:

h_t = H(W_xh x_t + W_hh h_{t-1} + b_h)   (2)
y_t = F(W_hy h_t + b_y)   (3)

where H is a non-linear function, conventionally chosen to be sigmoid or tanh. F is usually sigmoid if the output labels are independent, or softmax if they are mutually exclusive. The W terms are the associated weight matrices (e.g., W_xh is the input-hidden weight matrix) and the b terms are bias vectors (e.g., b_h is the hidden bias vector). These parameters are learned from the training data.

Researchers have demonstrated a problem with simple RNN structures where the gradient vanishes during training. The most popular replacements that solve this problem are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) cells, which can maintain their state over time and thereby capture long-term dependencies within timeseries data [23-27]. The versions of the LSTM and GRU cells used in this paper are given in [3, 25] and [27] respectively. We use the following shorthand notation to denote the LSTM and GRU cell operations:

h_t = H_LSTM(x_t, h_{t-1}, c_{t-1})   (4)
h_t = H_GRU(x_t, h_{t-1})   (5)

The main difference between the LSTM and the GRU is that the content of the GRU cell memory is always exposed to the output; the GRU is also easier to implement since it requires fewer network parameters. In summary, our RNN cell H(.) can be an LSTM (Eq. 4) or a GRU (Eq. 5).

B. Bidirectional RNN

One shortcoming of conventional RNNs is that they can only make use of past context. With a windowed approach, however, where the whole temporal context is available, there is no reason not to exploit future context as well. As shown on the right-hand side of Fig. 2, bidirectional RNNs (BRNNs) built from basic RNN units do this by processing the data in both directions with two separate hidden layers, which are then fed forward to the same output layer. A BRNN computes the forward hidden sequence h→_t, the backward hidden sequence h←_t, and the output sequence y_t by iterating the forward layer from t-T+1 to t and the backward layer from t to t-T+1, and then updating the output layer:

h→_t = H(W_xh→ x_t + W_h→h→ h→_{t-1} + b_h→)   (6)
h←_t = H(W_xh← x_t + W_h←h← h←_{t+1} + b_h←)   (7)
y_t = F(W_h→y h→_t + W_h←y h←_t + b_y)   (8)

Facilitating the LSTM with this bidirectional learning capability enables the system to access long-range context in both input directions and thus enriches its understanding of sequential data.

C. Deep RNN

With the potential to disentangle complicated temporal dependencies, deep RNNs can be created by stacking multiple RNN hidden layers on top of each other, with the output sequence of one layer forming the input sequence of the next, as shown in Fig. 2. Assuming the same hidden layer function is used for all N layers in the stack, the hidden vector sequences h^n_t are iteratively computed for n = 1 to N and t-T+1 to t:

h^n_t = H(W_h^(n-1)h^n h^(n-1)_t + W_h^n h^n h^n_{t-1} + b^n_h)   (9)
y_t = F(W_h^N y h^N_t + b_y)   (10)

where h^0_t = x_t.

D.
Deep Bidirectional RNN

Deep bidirectional RNNs can be implemented by replacing each hidden sequence h^n_t with the forward and backward sequences h→^n_t and h←^n_t, and ensuring that every hidden layer receives input from both the forward and backward layers at the level below. In the proposed network on the right of Fig. 2, a deeper architecture is obtained by replacing each RNN unit in the deep bidirectional RNN with a deep unidirectional RNN. Hence, for M stacked BRNNs, with each forward and backward cell having an N-stacked RNN configuration, we obtain a 2 x M x N RNN unit system. However, we found that very deep configurations do not yield better performance on our limited dataset, as they only overfit the training data. Therefore, for the dataset considered here, we use only a single bidirectional LSTM stacked with a single unidirectional GRU cell, yielding a three-RNN-unit system, i.e.

(h→^1_t, c→^1_t) = H_LSTM(x_t, h→^1_{t-1}, c→^1_{t-1})   (11)
(h←^1_t, c←^1_t) = H_LSTM(x_t, h←^1_{t+1}, c←^1_{t+1})   (12)
h^2_t = H_GRU([h→^1_t : h←^1_t], h^2_{t-1})   (13)
y_t = F(W_h^2 y h^2_t + b_y)   (14)

We believe, however, that a larger dataset would benefit from a deeper configuration.

E. Sensory Fusion

Although our system uses multi-modal inputs, we do not need to feed each input modality into its own RNN, since the modalities are already pre-processed. Therefore, we simply concatenate the features together and pass them through the network as a single vector. We also evaluated the spatial sensory fusion demonstrated in [3], but it did not yield improved performance with our deeper structure and bidirectional RNN.

F. Network Training

We train the network with backpropagation through time, using TensorFlow on an Nvidia GTX GPU running Ubuntu 14.04. The network weights and biases are adapted using the Adam optimizer with a learning rate of 1e-2 and a decay of 0.1 per 100 epochs. The maximum number of epochs is set to 1000.
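As a concrete toy rendering of the three-RNN-unit system in Eqs. (11)-(14): one forward LSTM, one backward LSTM, and a GRU stacked on their concatenated outputs. All weights below are random placeholders and the cell equations are the standard LSTM/GRU forms; this is a sketch of the architecture, not the trained TensorFlow network.

```python
import numpy as np

# Toy NumPy sketch of the three-RNN-unit DBRNN of Eqs. (11)-(14).
# Weights are random placeholders (assumptions), not learned parameters.

rng = np.random.default_rng(0)
D, H, T = 50, 64, 50                       # input dim, hidden units, window length
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

def make_lstm(din):
    W = rng.normal(size=(4 * H, din + H)) * 0.1
    b = np.zeros(4 * H)
    def step(x, h, c):                     # H_LSTM(x_t, h_{t-1}, c_{t-1})
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
        c = sig(f) * c + sig(i) * np.tanh(g)
        return sig(o) * np.tanh(c), c
    return step

def make_gru(din):
    Wz, Wr, Wh = (rng.normal(size=(H, din + H)) * 0.1 for _ in range(3))
    def step(x, h):                        # H_GRU(x_t, h_{t-1})
        z = sig(Wz @ np.concatenate([x, h]))           # update gate
        r = sig(Wr @ np.concatenate([x, h]))           # reset gate
        h_cand = np.tanh(Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_cand                # exposed memory = output
    return step

lstm_f, lstm_b, gru = make_lstm(D), make_lstm(D), make_gru(2 * H)
Why, by = rng.normal(size=(1, H)) * 0.1, np.zeros(1)

def dbrnn(X):                              # X: (T, D) window X_t
    hf, cf = np.zeros(H), np.zeros(H)
    hb, cb = np.zeros(H), np.zeros(H)
    fwd, bwd = [None] * T, [None] * T
    for t in range(T):                     # Eq. (11): past -> current
        hf, cf = lstm_f(X[t], hf, cf); fwd[t] = hf
    for t in reversed(range(T)):           # Eq. (12): future -> current
        hb, cb = lstm_b(X[t], hb, cb); bwd[t] = hb
    h2, ys = np.zeros(H), []
    for t in range(T):                     # Eqs. (13)-(14)
        h2 = gru(np.concatenate([fwd[t], bwd[t]]), h2)
        ys.append(sig(Why @ h2 + by))      # sigmoid output F
    return np.array(ys)

y = dbrnn(rng.normal(size=(T, D)))
assert y.shape == (T, 1) and np.all((y > 0) & (y < 1))
```

Note how the GRU at layer 2 consumes the concatenation [h→^1_t : h←^1_t], so every output y_t depends on the whole window in both temporal directions.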
The gradient on each RNN cell is clipped to a maximum value of 10 to avoid the gradient explosion problem. Each RNN cell has 64 hidden units.

V. FEATURE EXTRACTION

The performance of machine learning methods is heavily dependent on the choice of data representation (or features) to which they are applied. For the data used in this paper, the sensory input of the DAP system consists of multi-modality sequences derived from the following: (i) CAN bus: steering angle, pedal pressures, yaw rate, speed, acceleration, road slope and engine rpm; (ii) face camera: head pose and eye gaze; (iii) hand camera: hand positions, hand movement and on/off steering wheel status; (iv) dash camera: adjacent lane availability, right and left lane offset, and relative position and speed of road objects in the driver's and adjacent lanes; (v) GPS + map: distance and direction from the nearest intersection. The separation of the incoming sensory data for recognition and prediction purposes is depicted in Table I. The following is a detailed description of the features extracted for driver action prediction.

A. Vehicle Dynamic Information

Monitoring the CAN bus is the most straightforward method of determining the vehicle's working status and the driver-vehicle interaction. Data are obtained by decoding the CAN codes according to the OBD2 protocols. We obtain an 8-dimension feature vector from the CAN bus.

B. Driver Behavior Information

1) Face Camera Feature Extraction

The driver's head motion is determined by analyzing video of the driver's face. The processing pipeline consists of face detection, facial landmark tracking and feature extraction. Similar to [3], the face detection and landmark tracking tasks are implemented with the Constrained Local Neural Field (CLNF) model [28]. A total of 68 landmarks are extracted from the detected face.
Based on these landmark tracks, the driver's face movements and rotations are represented with histogram features. In particular, the matching landmarks between successive frames and their per-pixel horizontal changes are calculated to model the movements and rotations. Finally, the mean movement and histograms of horizontal and angular motion, with bins [≤ -2, -2 to 0, 0 to 2, ≥ 2] and [0 to π/2, π/2 to π, π to 3π/2, 3π/2 to 2π] respectively, are calculated. The final result is a 9-dimensional feature vector per sample representing the face features.

2) Hand Camera Feature Extraction

Hand gesture is another important driving behavior descriptor. The location and movement of the hands on the steering wheel can be highly correlated with an impending lane change or turn, and in some cases even with braking and acceleration. We adopt a Faster R-CNN based hand detection algorithm [29] and extract the following features: hand positions, distance and relative angle towards the steering center, relative position on the steering wheel, motion, moving distance and moving direction. Overall, 19 dimensions per sample encode hand information during driving.

C. Driving Environment Information

The environment provides the most influential driving context, affecting the driver's judgment and driving maneuver changes. Features describing the environment were constructed from the outputs of a Mobileye system [30] processing the dash camera images. The extracted features include: (1) two binary features indicating whether a lane exists on the left side and on the right side of the vehicle, with their respective offsets and curvatures; and (2) object (pedestrian or vehicle) position and speed information in the driver's and both adjacent lanes. A 12-dimension feature vector is obtained from the dash camera.

We also compare the vehicle GPS coordinates with street maps to build a 2-dimensional representation of proximity to an intersection.
This representation consists of: (1) a 3-factor feature indicating whether the vehicle is at (within 40 meters of), approaching, or departing a road artifact such as an intersection, turn or highway exit; and (2) the actual distance from the nearest intersection.

TABLE II. NUMBER OF POSITIVE AND NEGATIVE EXAMPLES (OBTAINED FROM THE RECOGNITION SYSTEM) USED FOR TRAINING (70%), CROSS-VALIDATION (15%) AND TESTING (15%)

Braking. Positives: 1033. Negatives (balanced): 1836 (1550).
Lane change. Positives: left 109, right 125. Negatives (balanced): 4509 (188).
Turns. Positives: left 264, right 269. Negatives (balanced): 3516 (404).

VI. EXPERIMENTS AND RESULTS

A. Data Collection

We collected about 35 hr of driving data under different driving conditions across 5 drivers, recorded in the south San Francisco Bay Area, California. The data collection vehicle is outfitted with three cameras, GPS and a CAN bus data logger. The sampling rates are 28-30 fps, 1 Hz and 80 Hz for the cameras, GPS and CAN bus respectively. For action prediction, we resample the incoming data to 10 Hz. Missing data are either extrapolated, for float-valued features, or filled with the nearest past value, for factor-valued features. For each positive and negative sample, a 5 sec sequence is used, yielding an input size of 50x50 per example (T = 50).

To demonstrate the proposed system's capabilities, we evaluate braking, lane change and turn anomaly action prediction. Table II contains the number of anomalous actions and normal states detected by our proposed recognition system (before and after class balancing). In all cases, we compare the performance of the proposed DBRNN to the unidirectional RNN system in [3] using a balanced dataset. Finally, we investigate the performance when training with data from an individual driver, compared to a generic model of all five drivers.

B. Braking Action Prediction

Fig.
3 plots the performance for predicting braking events up to 5 sec before braking is detected. The proposed DBRNN (Bi-LSTM Win) network gives comparable results to a unidirectional LSTM (Uni-LSTM), with a slight performance improvement at the beginning of the test sequence, yielding a better prediction horizon. The unidirectional LSTM is obtained by removing the backward LSTM of Eq. (12) from the network. Overall, the proposed DBRNN system achieves ~80% average accuracy, a 70% true positive rate and a 12% false positive rate 3 sec before the braking event.

Fig. 3. Piecewise performance vs. time-to-event for braking prediction.
Fig. 4. Piecewise performance vs. time-to-event for lane change prediction.

C. Lane Change Action Prediction

In Fig. 4, we depict the performance of lane change event prediction. Here, the proposed Bi-LSTM network shows a significant performance improvement over the unidirectional LSTM over the entire 5 sec preceding the event. However, due to the smaller number of positive lane change actions, the overall performance is lower than that of braking prediction. Note that the true positive rate here is the average of the positive rates for left and right lane changes.

D. Turning Action Prediction

When predicting left and right turns, the proposed DBRNN system again shows better performance than the unidirectional version (Fig. 5). Both models perform better than on lane change prediction but worse than on braking prediction. It is also worth noting that the performance gap between the DBRNN and the unidirectional version is larger than for braking prediction but smaller than for lane change prediction.

E. Individual Driver Action Prediction

Finally, we apply the predictive models to the braking data of an individual driver, as opposed to the data from all 5 drivers used in the evaluations above. The overall accuracy comparison is depicted in Fig. 6.
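The piecewise curves reported above show performance as a function of time-to-event. A minimal sketch of that evaluation, with our own helper names and toy data (not the authors' code), buckets per-timestep predictions by how far they fall before the labeled event:

```python
# Hypothetical sketch of a piecewise, time-to-event evaluation: group
# per-timestep predictions on a positive sequence by whole seconds before
# the event and report the true positive rate in each bucket.

def tpr_by_time_to_event(preds, event_time, hz=10):
    """preds: per-timestep binary predictions for a sequence whose action
    event occurs at timestep event_time. Returns {seconds_before: tpr}."""
    buckets = {}
    for t, p in enumerate(preds):
        if t >= event_time:
            break
        sec = (event_time - t) // hz        # whole seconds before the event
        hits, total = buckets.get(sec, (0, 0))
        buckets[sec] = (hits + p, total + 1)
    return {sec: hits / total for sec, (hits, total) in buckets.items()}

# Toy case: 50 timesteps (5 s at 10 Hz) before an event at t = 50;
# the model fires only in the last 2 s.
preds = [0] * 30 + [1] * 20
curve = tpr_by_time_to_event(preds, event_time=50)
assert curve[1] == 1.0 and curve[4] == 0.0
```

A real evaluation would average such buckets over all test sequences, and compute false positive rates analogously on negative sequences.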
The results show that both models perform better on an individual driver's data than on the combined drivers' data in Fig. 3. Moreover, the proposed DBRNN's performance improvement over the unidirectional version is larger for an individual driver.

Fig. 5. Piecewise performance vs. time-to-event for turning prediction.
Fig. 6. Overall accuracy vs. time-to-event for braking prediction, for an individual driver and the combined 5 drivers.

F. Summary of Results

In summary, the results indicate that the proposed DBRNN performs better than the unidirectional version in all but one of the evaluated scenarios. The exception is predicting braking using a generic (vs. individualized) model, where the two systems demonstrated comparable performance. In general, the performance improvement appears to increase as the number of detected positive examples decreases, as well as on individual driver data. Also, the performance of both models increases on individual driver data compared to generic driver data. This highlights that our predictive model is able to learn from the data of a single individual, making real-time training in the vehicle a real possibility for future predictive systems.

VII. CONCLUSION

In this paper, we have proposed a novel DAP system that integrates both recognition and prediction systems. The proposed prediction system is based on a DBRNN, which enables temporal fusion of both past and future context to learn the correlation between sensor data and future driver actions. The proposed DBRNN system performs better than the state-of-the-art DAP system based on a unidirectional RNN, and is potentially trainable in-vehicle for modeling individual drivers. The high prediction accuracy and long prediction horizon will enable new driver assistance capabilities that effectively alert drivers before they take dangerous driving actions.
Our proposed prediction system can also be applied to a general class of anomaly prediction problems. Our future work includes improving system performance by extending the input sensing modalities, as well as extending the system's capability to directly predict both the anomalous event and the time-to-event simultaneously.

ACKNOWLEDGMENT

The authors would like to thank Toyota Motor Corporation for sponsoring the research project.

REFERENCES

[1] Ohn-Bar, Eshed, Ashish Tawari, Sebastien Martin, and Mohan Manubhai Trivedi. "Predicting driver maneuvers by learning holistic features." In Intelligent Vehicles Symposium Proceedings, 2014 IEEE, pp. 719-724.
[2] Morris, B., A. Doshi, and M. Trivedi. "Lane change intent prediction for driver assistance: On-road design and evaluation." In IEEE Intelligent Vehicles Symposium Proceedings, 2011.
[3] Jain, Ashesh, Hema S. Koppula, Shane Soh, Bharad Raghavan, Avi Singh, and Ashutosh Saxena. "Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture." arXiv preprint arXiv:1601.00740 (2016).
[4] Oliver, Nuria, and Alexander P. Pentland. "Driver behavior recognition and prediction in a SmartCar." In AeroSense 2000, pp. 280-290. International Society for Optics and Photonics, 2000.
[5] Maye, Jérôme, Rudolph Triebel, Luciano Spinello, and Roland Siegwart. "Bayesian on-line learning of driving behaviors." In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pp. 4341-4346. IEEE, 2011.
[6] Miyajima, Chiyomi, Yoshihiro Nishiwaki, Koji Ozawa, Toshihiro Wakita, Katsunobu Itou, Kazuya Takeda, and Fumitada Itakura. "Driver modeling based on driving behavior and its evaluation in driver identification." Proceedings of the IEEE 95, no. 2 (2007): 427-437.
[7] Ortiz, Michaël Garcia, Jens Schmüdderich, Franz Kummert, and Alexander Gepperth. "Situation-specific learning for ego-vehicle behavior prediction systems."
In Intelligent Transportation Systems (ITSC), 2011 14th International IEEE Conference on, pp. 1237-1242. IEEE, 2011.
[8] Angkititrakul, Pongtep, Ryuta Terashima, and Toshihiro Wakita. "On the use of stochastic driver behavior model in lane departure warning." Intelligent Transportation Systems, IEEE Transactions on 12, no. 1 (2011): 174-183.
[9] Xu, Guoqing, Li Liu, Yongsheng Ou, and Zhangjun Song. "Dynamic modeling of driver control strategy of lane-change behavior and trajectory planning for collision prediction." Intelligent Transportation Systems, IEEE Transactions on 13, no. 3 (2012): 1138-1155.
[10] Liebner, Martin, Michael Baumann, Felix Klanner, and Christoph Stiller. "Driver intent inference at urban intersections using the intelligent driver model." In Intelligent Vehicles Symposium (IV), 2012 IEEE, pp. 1162-1167. IEEE, 2012.
[11] Kurt, Arda, John L. Yester, and Yutaka Mochizuki. "Hybrid-state driver/vehicle modelling, estimation and prediction." In Intelligent Transportation Systems (ITSC), 2010 13th International IEEE Conference on, pp. 806-811. IEEE, 2010.
[12] Heracles, M., Fernando Martinelli, and Jannik Fritsch. "Vision-based behavior prediction in urban traffic environments by scene categorization." (2010).
[13] Ortiz, Michaël Garcia, Franz Kummert, and Jens Schmüdderich. "Prediction of driver behavior on a limited sensory setting." In Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on, pp. 638-643. IEEE, 2012.
[14] Donges, Edmund. "A two-level model of driver steering behavior." Human Factors: The Journal of the Human Factors and Ergonomics Society 20, no. 6 (1978): 691-707.
[15] McRuer, Duane. "Human dynamics in man-machine systems." Automatica 16, no. 3 (1980): 237-253.
[16] Hess, R. A., and A. Modjtahedzadeh. "A control theoretic model of driver steering behavior." Control Systems Magazine, IEEE 10, no. 5 (1990): 3-8.
[17] MacAdam, Charles C.
"Application of an optimal preview control for simulation of closed-loop automobile driving." (1981).
[18] Oliver, Nuria, and Alex P. Pentland. "Graphical models for driver behavior recognition in a smartcar." In Intelligent Vehicles Symposium, 2000. IV 2000. Proceedings of the IEEE, pp. 7-12.
[19] Liu, Andrew, and Dario Salvucci. "Modeling and prediction of human driver behavior." In Intl. Conference on HCI, 2001.
[20] Meyer-Delius, Daniel, Christian Plagemann, and Wolfram Burgard. "Probabilistic situation recognition for vehicular traffic scenarios." In Robotics and Automation, 2009. ICRA '09. IEEE International Conference on, pp. 459-464. IEEE, 2009.
[21] Berndt, Holger, Jörg Emmert, and Klaus Dietmayer. "Continuous driver intention recognition with hidden Markov models." In Intelligent Transportation Systems, 2008. ITSC 2008. 11th International IEEE Conference on, pp. 1189-1194. IEEE, 2008.
[22] Kuge, Nobuyuki, Tomohiro Yamamura, Osamu Shimoyama, and Andrew Liu. "A driver behavior recognition method based on a driver model framework." SAE Transactions 109, no. 6 (2000): 469-476.
[23] Jain, Ashesh, Hema S. Koppula, Bharad Raghavan, Shane Soh, and Ashutosh Saxena. "Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models." arXiv preprint arXiv:1504.02789 (2015).
[24] Graves, Alex, and Jürgen Schmidhuber. "Framewise phoneme classification with bidirectional LSTM and other neural network architectures." Neural Networks 18, no. 5 (2005): 602-610.
[25] Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. "Speech recognition with deep recurrent neural networks." In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 6645-6649. IEEE, 2013.
[26] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.
[27] Cho, Kyunghyun, Bart van Merriënboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
[28] Baltrusaitis, Tadas, Peter Robinson, and Louis-Philippe Morency. "Constrained local neural fields for robust facial landmark detection in the wild." In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 354-361. 2013.
[29] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
[30] Mobileye homepage. http://www.mobileye.com/en-us/. Accessed Mar 3, 2017.