DeepBrain: Towards Personalized EEG Interaction through Attentional and Embedded LSTM Learning



Di Wu†, Huayan Wan†, Siping Liu†, Weiren Yu‡, Zhanpeng Jin§, Dakuo Wang¶
† Hunan University, China   ‡ University of Warwick, UK   § University at Buffalo, USA   ¶ IBM Research AI, USA

Abstract

The "mind-controlling" capability has always been in mankind's fantasy. With the recent advancements of electroencephalograph (EEG) techniques, brain-computer interface (BCI) researchers have explored various solutions to allow individuals to perform various tasks using their minds. However, the commercial off-the-shelf devices that support accurate EEG signal collection are usually expensive, and the comparably cheaper devices can only present coarse results, which prevents the practical application of these devices in domestic services. To tackle this challenge, we propose and develop an end-to-end solution, namely DeepBrain, that enables fine brain-robot interaction (BRI) through embedded learning of coarse EEG signals from low-cost devices, so that people who have difficulty moving, such as the elderly, can command and control a robot with their minds to perform some basic household tasks. Our contributions are two-fold: 1) We present a stacked long short-term memory (Stacked LSTM) structure with specific pre-processing techniques to handle the time-dependency of EEG signals and their classification. 2) We propose a personalized design to capture multiple features and achieve accurate recognition of individual EEG signals by enhancing the signal interpretation of the Stacked LSTM with an attention mechanism. Our real-world experiments demonstrate that the proposed low-cost, end-to-end solution achieves satisfactory run-time speed, accuracy and energy-efficiency.
Introduction

Brain-Computer Interface (BCI) design, as an emerging subfield of Machine Learning (ML) and Human-Computer Interaction (HCI), has made significant progress in recent years. It is also emerging as a typical application within the context of the artificial IoT (Zhang et al. 2010; Wu et al. 2020; Qin et al. 2018). In general, BCI systems rely on a head-worn device to collect electroencephalography (EEG) signals and interpret them into various user intentions. Based on this technology, many experimental BCI systems have been proposed in different scenarios. For example, Akram, Han, and Kim (2015) studied how to extract a user's EEG signal to simulate a mouse-click action on a PC. Mauss and Robinson (2009) took a step further: they designed a system to extract that signal from one participant and then transmit it into another participant's mind, to study whether it could influence the participants' video-game playing behavior. O.R. Pinheiro et al. (2016) dived into a different domain, aiming to design a BCI system that allowed patients to control a robot in healthcare.

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

The advancements of EEG-based BCI are also attributable to the powerful neural network architectures released in recent years. Nowadays, researchers can use advanced neural-network-based models, as opposed to the early-days regression models, to interpret EEG signals (Gudmundsson et al. 2007). This approach works extremely well with recurrent neural network (RNN) architectures and the derived Long Short-Term Memory (LSTM) architecture, as the EEG signal is a chronological sequence of data.
However, these existing systems suffer from a common drawback: most of them are experimental prototypes, or they were developed for institutional users (e.g., hospitals and governments (Williams et al. 2015)). Thus, the hardware cost is rather high, which inhibits wide adoption in people's daily-use scenarios. EEG collection equipment varies widely in price. As shown in Figure 1, the EMOTIV EPOC+ 14-Channel Mobile EEG costs $799.00, whereas the Brainlink costs $99. The use of such EEG collection equipment by ordinary users is often limited by the price. Another drawback of BCI-controlled robots is that they only allow users to perform one action, e.g., using a specific pattern of signals to move the robot forward (Nguyen et al. 2015).

Our key motivation is to provide an accessible end-to-end solution for general public users. Many people have difficulty walking or performing other movements, e.g., the elderly who live alone at home; they need to get many "simple" household tasks done, but those simple tasks (e.g., picking up a remote from the floor) are not simple for them anymore. We aim to design a solution that lets these users use only their brains to control a robot performing such actions. This solution has to be low-cost and energy-efficient, perform more than one category of task, and reach a relatively high accuracy. To ensure we capture the users' needs correctly, we adopt a participatory design (Muller 2003) approach and invite targeted users to be part of the iterative design process. We spent days visiting elderly care facilities and some houses where an elderly person lives alone.

Figure 1: Commercially available, consumer-grade EEG collection devices: (a) high-cost Emotiv EPOC; (b) low-cost Brainlink.
Through the analysis of observation notes, interviews, and participatory design sessions, we evolved our initial design (e.g., a NAO robot) into the final design (e.g., a TX2-based robot) over multiple iterations.

Contributions

To tackle these challenges, we propose and develop an end-to-end solution, namely DeepBrain, that enables fine brain-robot interaction (BRI) through embedded learning of coarse EEG signals from low-cost devices, so that our targeted users can command and control a robot with their minds to perform some basic household tasks. The main contributions of this work are as follows:

• On the technical aspect, we first present a stacked long short-term memory (Stacked LSTM) structure with specific pre-processing techniques to handle the time-dependency of EEG signals and their classification. Then we propose a personalized design in DeepBrain to capture multiple features and achieve accurate recognition of individual EEG signals by enhancing the signal interpretation of the Stacked LSTM with an attention mechanism. Thus, DeepBrain can process the time-dependency and the personal features of EEG signals at the same time.

• On the experimental aspect, we collect two datasets (one in a quiet environment and the other in a practical but noisy environment). These datasets come from different gender and age groups (3 males and 3 females, aged between 40 and 70) to illustrate the performance of our method. We compare our DeepBrain approach with current state-of-the-art works. The experimental results demonstrate that our method outperforms the other methods on run-time speed, accuracy and energy-efficiency.

Related Work

This section presents current research on wearable EEG devices and EEG processing using recurrent neural networks.
Wearable EEG Devices

Since emotions leave many traces inside and outside our body, various kinds of methods have been adopted for constructing emotion recognition models, such as facial expressions, voices, and so on (Calvo and D'Mello 2010). Among these approaches, EEG-based methods are considered promising for emotion recognition. Many findings in neuroscience support that EEG allows direct assessment of the "inner" states of users (Calvo and D'Mello 2014). However, most of these studies rely on wet electrodes (some with dozens of electrodes). Besides the time cost and high price of placing the electrodes, the unrelated channels may introduce noise into the system, which can badly affect its performance. The HCI community calls for user-friendly setups that support effective brain-computer interactions. With the rapid development of wearable devices and dry-electrode techniques (Huang et al. 2015), it is now possible to develop wearable EEG application devices. For instance, a person with a speech or hand disability wearing such a device could convey his or her emotions to a service robot if the device detects that he or she is in a certain emotional state. An easy-to-install EEG device for emotion recognition is therefore desirable in HCI. To realize this idea, in this paper we apply a relatively low number of electrodes for EEG collection: the collection device, called Brainlink, has two non-invasive dry electrodes at the forehead position. When the user is in a different state (focused or relaxed), the Brainlink displays different colors of breathing lights. Emotion recognition is then performed on the collected EEG data through the system described above (Zheng et al. 2018).
EEG Processing Using Recurrent Neural Networks

When a neural network is applied to EEG data, the connection between the data before and after a time point can be established by manually building a sliding window over the EEG time series. Deep neural networks (LeCun, Bengio, and Hinton 2015) have been applied to classify EEG data. LSTMs (Schmidhuber and Hochreiter 1997) are recurrent neural networks (RNNs) equipped with a special gating mechanism that controls access to memory cells. Long short-term memory cells controlled by gates allow information to pass unmodified over many time steps. Since the gates can prevent the rest of the network from modifying the contents of the memory cells for multiple time steps, LSTM networks preserve signals and propagate errors for much longer than ordinary RNNs. Through independently reading, writing and erasing contents from the memory cells, the gates can also be trained to select some input signals and neglect others. Stacked LSTM (Graves 2013) consists of LSTM units connected along the depth dimension. It is better at storing and generating longer-range patterns and is much more robust. The attention LSTM cells are present along the sequential computation of each LSTM network; however, they are not present in the vertical computation from one layer to the next. Multidimensional LSTM (Schmidhuber, Graves, and Fernandez 2007) replaces a single recurrent connection with many recurrent connections, so that it can deal with multi-dimensional tasks such as images and videos.

Recently, Zhang et al. (2017) presented a brain typing system for converting a user's thoughts to text via deep feature learning of EEG signals. The classifier used in their system achieves an accuracy of 95.53% on multivariate classification. Their latest work (Zhang et al. 2018) builds a universal EEG-based identification model which achieves an accuracy of 99.9%, without any actual deployed system.
However, among the above research approaches, some methods collect only one-channel EEG from the frontal cortex, which limits the calculation of the emotion index. Other approaches collect multi-channel EEG from the frontal cortex with more advanced equipment. Although the advanced devices can bring more diverse EEG data, every additional electrode adds considerable cost; one may end up spending thousands of dollars on data collection devices, which does not help in designing a complete and feasible BCI system. Few studies attempt to build a feasible, high-precision, civilian and easily deployable EEG-based emotion recognition system. Moreover, the sliding-window approach above is only applicable to short-term-dependent time series data. Our approach uses an enhanced LSTM prediction model. This model exploits the explicit internal LSTM unit structure and frequently updates the internal state values while acquiring input data at each time point, which guarantees that the EEG data before and after each time point maintain a strong connection.

System Overview

Our solution consists of two subsystems: an EEG signal collection and pre-processing module, and a neural-network-based EEG signal interpreter. The main goal of our method is to design a deep learning model that classifies the user's emotion status from raw EEG signals generated by our low-cost equipment in real time.

Figure 2: System architecture diagram. The EEG signal collection and pre-processing module (low-cost EEG equipment, raw EEG signal collection, Bluetooth transmission, EEG signal pre-processing) feeds the neural-network-based EEG signal interpreter (Stacked LSTM and personalization-enhanced Stacked LSTM) for brain-robot interaction in home settings.
In the EEG signal pre-processing section, we use an off-the-shelf headset to collect the raw EEG signal; we then feed the signal into an EEG signal pre-processing algorithm for multi-classification.

The neural-network-based EEG signal interpreter module reads in the processed EEG signals and translates them into one of four statuses with the help of an LSTM network architecture. First, we use a stacked LSTM to process the long-time dependency. Second, we propose an attention-based enhanced stacked LSTM to capture the user's EEG signal status. It is worth mentioning that we also incorporate a personalization step in this module, so that we can achieve more accurate prediction results after a simple fine-tuning step.

LSTM-Based Method for Processing EEG Signals

We propose a hybrid deep learning model to interpret the raw EEG signals. In this part, we first summarize the pre-processing step for EEG signals. Then, we outline the proposed LSTM-based method and its various components. At the end, we introduce the technical details of the brain-robot interaction system in subsequent subsections.

EEG Signal Collection and Pre-processing Module

Figure 3: Unpreprocessed EEG signals describing four states: (a) EEG signals from relaxed to focused; (b) EEG signals that keep relaxed; (c) EEG signals from focused to relaxed; (d) EEG signals that keep focused.

This paper uses low-cost EEG data collection devices. Although the device is resistant to noise from multiple signal channels from users, it is occasionally affected by external environments such as weather and sound.
Therefore, there are sometimes unusual data points in the data set. To improve the accuracy and stability of the results, it is necessary to perform data preprocessing. The raw EEG data collected over 180 seconds is shown in Figure 3. It can be seen that the data is quite dense and confusing; without appropriate data preprocessing, model training and forecasting would face big challenges. In Figure 3, the low-cost EEG data collection device provides a score value that is a decimal representation of the user's brain emotional state, i.e., a numerical representation of the above-mentioned EEG patterns. When the user's EEG signal value is high, the brain has a high probability of being in the Beta or Gamma pattern. Conversely, when the value is low, the brain has a high probability of being in the Delta or Theta pattern. As for the medium-conscious Alpha pattern, the low-cost EEG device gives corresponding numerical values that depend on the individual user (male or female).

Furthermore, we use low-cost equipment to collect EEG data. Compared to high-cost equipment such as the Emotiv Epoc+ headset, which detects 5 types of patterns (Delta, Theta, Alpha, Beta and Gamma), our EEG equipment can only detect two types of patterns: high and low.
Figure 4: EEG signal sessions illustrating the four statuses (EEG score over 60 frames of sampling in 180 seconds): (a) the status from relaxed to focused; (b) the status keeps relaxed; (c) the status from focused to relaxed; (d) the status keeps focused.

Figure 5: The difference in the EEG signal Alpha pattern between a male and a female of the same age: (a) Alpha pattern for male; (b) Alpha pattern for female.

For the middle Alpha pattern, the signal value is divided into focused or relaxed depending on the individual (e.g., male or female). Therefore, based on the changes between the two states, we divide the local datasets' labels into four categories. For the Alpha pattern, which lies in the middle between Delta and Gamma, we divide it into relaxed or focused based on the signal value of the male or female user.

First of all, our EEG collection device gives decimal scores that reflect the user's brain emotion status, as shown in Figure 4. Although our EEG equipment is resistant to noise from the user's multiple signal channels, it can occasionally be affected by the external environment, such as weather and sound. Thus, there are sometimes outlier data points in the dataset. We apply statistical methods to identify the outliers and discard them, and we replace each outlier with the average score of the data points before and after it (as all the data is a time-series sequence).

As described at the beginning of the article, we properly encode the two types of collected data, which achieves the expected four-way classification. This article uses a typical encoding method, one-hot encoding, for label processing, which effectively avoids numerical problems caused by the logarithmic calculation in the loss during training.
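The two pre-processing steps above, outlier replacement by temporal neighbors and one-hot label encoding, can be sketched as follows. This is a minimal illustration with hypothetical helper names and a hypothetical outlier criterion (distance from the mean in standard deviations); the paper does not specify its exact statistical test.

```python
# Sketch of the pre-processing described above. The threshold-based outlier
# test is an assumption; the paper only says "statistical methods".

def smooth_outliers(scores, threshold=2.0):
    """Replace interior points more than `threshold` std-devs from the mean
    with the average of the previous and next samples (the data is a time
    series); endpoints are kept as-is."""
    n = len(scores)
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    cleaned = list(scores)
    for i in range(1, n - 1):
        if abs(scores[i] - mean) > threshold * std:
            cleaned[i] = (scores[i - 1] + scores[i + 1]) / 2
    return cleaned

LABELS = ["relaxed", "relaxed->focused", "focused->relaxed", "focused"]

def one_hot(label):
    """Encode one of the four status labels as a 4-dimensional indicator."""
    vec = [0.0] * len(LABELS)
    vec[LABELS.index(label)] = 1.0
    return vec
```

With these helpers, a raw score sequence such as [50, 51, 52, 300, 53, 54, 55] has its spike replaced by the neighbor average 52.5, and each session label maps to a 4-dimensional one-hot vector fed to the classifier.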
In the initial stage of the multi-classification encoding work, the experimental results we obtained were always in a low numerical range. We suspected two possible causes: either there was a problem with the multi-class model architecture, or the input data could not be classified correctly. Through follow-up experiments, we excluded the first case and found that the problem lay in the second, since the multi-class model structure had no problem. We list the possible situations as follows: 1) There is a specific correspondence between the classification data and the label. 2) There may be some similarity among the features extracted from the classification data. 3) The classification data cannot be classified. 4) The categorical data can be a feature combination (A&B) but not a feature (A1&A2). The experimental data we collected fell into case 3) above, which prevented the model from classifying the data normally. Based on this, we give some additional explanation of case 3). Suppose we want to identify the ear characteristics of an animal, such as cat ears and dog ears; we can get the correct classification result by training an appropriate model. However, if the characteristics of dog ears and cat ears are each taken in half and combined into a new feature, the model cannot correctly classify this combined characteristic.

For the middle Alpha pattern, we divide it into relaxed or focused based on the EEG signal value of the male or female user. As shown in Figure 5, the intermediate state of the EEG data is higher for the male: even in the relaxed state the value is still close to 60, while the value for the female is less than 50. Therefore, the model can only judge the category by random division, which leads to the low numerical range of the experimental results.
If the above-mentioned mixed condition occurs in the features, it is hard to find a suitable function for feature classification. Taking this situation into consideration, we perform a secondary separation on the mixed features. The designed function needs to satisfy the following conditions: $F(u, v) \neq F(v, u)$, and the regions represented by $F(u)$ and $F(v)$ are separated at $u = v$. Here we give the mathematical formula used in the preprocessing, $Y = 2A - B$, where $A$ is the raw data and $B$ is part of the raw data. We have re-sampled the data several times to verify the feasibility of the formula. In addition, we have considered other functions, such as $Y = A^2 - B^2 + AB$ (with $A$ the raw data and $B$ part of the raw data), but they always introduced various problems that reduced the accuracy of the model.

Neural-Network-Based EEG Signal Interpreter

We focus on learning the meanings of the user's intent signals, which are 1-D vectors (collected at one time point). We express a single input EEG signal as $E_i$. Then, we feed $E_i$ to the LSTM structure for temporal feature learning. Finally, according to the learned temporal features $X_t$, the classification result is given (Stober et al. 2015).

The central idea of the DeepBrain workflow and its interaction operations is depicted in Figure 6. The input raw EEG data is a single sample vector. We first utilize two fully connected layers as our hidden layers, and then feed their output to the LSTM units. In addition, the arrows in the figure show the internal structure of the LSTM layer, where $\sigma$ and $\tanh$ represent the activation functions. $X_t$ is the input of the model. $h_t$ is the output of the LSTM cell at the $t$-th time step, and $h_{t-1}$ is derived in the previous sequence step. $S_t$ stands for the value of the LSTM memory cell at the $t$-th time step.
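The secondary-separation formula $Y = 2A - B$ can be sketched element-wise as below. Note that this is a hypothetical reading: the paper says only that $B$ is "part of the raw data", so the interpretation of $B$ as a per-sample reference sequence is an assumption for illustration.

```python
# Hypothetical sketch of the secondary separation Y = 2A - B described above.
# `raw` plays the role of A; `part` plays the role of B (its exact meaning,
# "part of the raw data", is left unspecified in the text).
def secondary_separation(raw, part):
    """Apply Y = 2A - B element-wise. Unlike a symmetric function such as
    A + B, this satisfies F(u, v) != F(v, u), as the separation requires."""
    return [2 * a - b for a, b in zip(raw, part)]
```

For example, `secondary_separation([60, 50], [40, 30])` yields `[80, 70]`, and swapping the arguments gives a different result, which is exactly the asymmetry condition the designed function must satisfy.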
Figure 6: An illustration of the DeepBrain workflow and interaction operations.

For the challenges of time series data processing, we first use a downsampling technique to obtain a characteristic subsequence of the original time series. Downsampling reduces the complexity of the original time series and makes its patterns easier to learn. At the same time, to speed up the convergence rate of the model, we normalize our time series data using min-max normalization, a linear transformation of the original data; the transformed values are mapped into the interval [0, 1].

In the temporal feature processing part, the powerful temporal feature extraction capability of the LSTM structure is well proven. An LSTM can explore feature dependencies over time through an inner state of the network, which permits it to model temporal trends. LSTM cells control the input, storage and output of data by introducing a set of gate mechanisms. As shown in Figure 6, the LSTM gate units receive the output of the LSTM internal units at the previous time step and the input sample at the current time step. If the previous layer of an LSTM cell layer is not the input layer, its gate units accept the output of the previous layer's LSTM internal units at the current time step and the output of its own LSTM internal units at the previous time step. We utilize an LSTM model that contains three components: one input layer, two hidden layers, and one output layer. The LSTM cells (shown as rectangles in Figure 6) are in the hidden layers. Assume that a batch of input EEG data contains $n_s$ (the batch size) EEG samples; the total input data then has the 3-D shape $[n_s, 30, 1]$.
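The two time-series transforms above can be sketched as follows; the downsampling factor shown is an assumption for illustration, not a value taken from the paper.

```python
# Minimal sketch of the downsampling and min-max normalization steps.
def downsample(series, factor=3):
    """Keep every `factor`-th point to obtain a characteristic subsequence."""
    return series[::factor]

def min_max_normalize(series):
    """Linearly map values into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(series), max(series)
    if hi == lo:                      # degenerate constant series: map to 0
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]
```

For instance, EEG scores [20, 40, 60, 80, 100] normalize to [0.0, 0.25, 0.5, 0.75, 1.0], which keeps the gradient scale uniform across subjects and speeds up convergence.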
Let the data in the $i$-th layer be denoted by $X^r_i = \{X^r_{ijk} \mid j = 1, 2, \cdots, n_s;\ k = 1, 2, \cdots, K_i\}$, $X^r_i \in \mathbb{R}^{[n_s, K_i, 1]}$, where $j$ denotes the $j$-th EEG sample and $K_i$ denotes the number of dimensions in the $i$-th layer. Assume that the weights between layer $i$ and layer $i+1$ are denoted by $W^r_{i(i+1)} \in \mathbb{R}^{[K_i, K_{i+1}]}$; e.g., $W^r_{23}$ denotes the weights between layer 2 and layer 3. $b^r_i \in \mathbb{R}^{K_i}$ denotes the biases of the $i$-th layer. The calculation between the $i$-th layer data and the $(i+1)$-th layer data is

$$X^r_{i+1} = X^r_i * W^r_{i,i+1} + b^r_i.$$

The calculations of the LSTM layers are as follows:

$$f_i = \mathrm{sigmoid}(H(X^r_{(i-1)j}, X^r_{i(j-1)}))$$
$$f_f = \mathrm{sigmoid}(H(X^r_{(i-1)j}, X^r_{i(j-1)}))$$
$$f_o = \mathrm{sigmoid}(H(X^r_{(i-1)j}, X^r_{i(j-1)}))$$
$$f_m = \tanh(H(X^r_{(i-1)j}, X^r_{i(j-1)}))$$
$$c_{ij} = f_f \odot c_{i(j-1)} + f_i \odot f_m$$
$$X^r_{ij} = f_o \odot \tanh(c_{ij})$$

where $f_i$, $f_f$, $f_o$ and $f_m$ represent the input gate, forget gate, output gate and input modulation gate respectively, and $\odot$ denotes element-wise multiplication. $c_{ij}$ denotes the state (memory) of the $j$-th LSTM cell in the $i$-th layer, which is the most important part for capturing the time-series relevance among EEG data samples. $H(X^r_{(i-1)j}, X^r_{i(j-1)})$ denotes the operation

$$X^r_{(i-1)j} * W + X^r_{i(j-1)} * W' + b,$$

where $W$, $W'$ and $b$ are the corresponding weights and biases. Subsequently, we use the Back Propagation Through Time (BPTT) algorithm to train the designed model. Finally, we obtain the model's prediction results and use the softmax cross-entropy as the loss function. The loss function is optimized by the Adam optimizer (Kingma and Ba 2014) with a learning rate of $10^{-4}$ and a minibatch size of 64.
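The gate equations above can be sketched as a single LSTM cell step. This is a scalar, pure-Python illustration for readability, not the paper's TensorFlow implementation; the weight containers and their names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W, Wh, b):
    """One LSTM cell step following the equations above. `x` is the input
    from the layer below, `h_prev`/`c_prev` are the previous output and
    memory; W, Wh, b hold one (input, recurrent, bias) weight per gate."""
    pre = {g: x * W[g] + h_prev * Wh[g] + b[g] for g in ("i", "f", "o", "m")}
    f_i = sigmoid(pre["i"])            # input gate
    f_f = sigmoid(pre["f"])            # forget gate
    f_o = sigmoid(pre["o"])            # output gate
    f_m = math.tanh(pre["m"])          # input modulation gate
    c = f_f * c_prev + f_i * f_m       # c_ij = f_f (.) c_prev + f_i (.) f_m
    h = f_o * math.tanh(c)             # cell output X^r_ij = f_o (.) tanh(c)
    return h, c
```

Stacking such cells along the depth dimension, so that each layer's `h` becomes the next layer's `x` at the same time step, yields the Stacked LSTM structure described above.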
Attention-based Enhanced Stacked LSTM

In addition to the proposed Stacked LSTM structure that handles the time-dependency of EEG signals, we are aware that different people might have various EEG signal patterns, and this aspect needs to be carefully tackled with an extended design on top of the Stacked LSTM. In order to provide a personalized solution in DeepBrain, we enhance the representation of the Stacked LSTM with an attention mechanism. The attention-based enhanced Stacked LSTM enables DeepBrain to learn the specific features of different people and tunes our system to achieve accurate recognition of individual EEG signals.

Figure 7 depicts the overall architecture of the attention-based enhanced stacked LSTM. We use the history timestamps to predict the current timestamp. The embedding layer is the first layer of the attention-based enhanced stacked LSTM. Then we use the stacked LSTM to capture the long-term dependency. After that, in order to customize the EEG recognition, we enhance the representation of the stacked LSTM with an attention selector. The attention selector accepts the final LSTM cell's input values to form the unnormalized attention weights $W'_{att}$, measured by the operation

$$W'_{att} = P'\left(c^r_{i(j-1)}, X^r_{i(j-1)}, X^r_{(i-1)j}\right),$$

where $c^r_{i(j-1)}$ denotes the hidden state of the $(j-1)$-th LSTM cell. The operation $P'(\cdot)$ is similar to the calculation process of the LSTM structure; the normalized attention weights $W_{att}$ are then computed as

$$W_{att} = \mathrm{softmax}(W'_{att}).$$

Figure 7: Attention-based enhanced stacked LSTM.

Then, a concatenation layer is used to connect the outputs of the Stacked LSTM and the attention selector. A dropout layer serves as a regularization technique to improve the generalization of the EEG signal recognition model. The final softmax layer classifies the four EEG signal patterns.
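The normalization step $W_{att} = \mathrm{softmax}(W'_{att})$ and its use for pooling LSTM outputs can be sketched as below. The scoring inputs here are generic raw scores; how $P'(\cdot)$ produces them is specific to the paper's architecture, so this sketch only covers the softmax-and-pool stage.

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    m = max(scores)                        # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(outputs, scores):
    """Weighted sum of per-step (scalar) LSTM outputs using softmaxed scores;
    the selector thereby emphasizes the time steps most indicative of the
    individual user's EEG pattern."""
    weights = softmax(scores)
    return sum(w * o for w, o in zip(weights, outputs))
```

With uniform scores, every time step contributes equally; training shifts the scores so that the pooled representation favors personally informative steps before the concatenation and softmax layers.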
Experiments and Results

In this section, we use the collected local dataset to evaluate the designed deep learning model.

Evaluation Metrics

We use the following metrics to comprehensively evaluate the performance of our model: Accuracy, Precision, Recall, F1, ROC (Receiver Operating Characteristic), AUC (Area Under Curve), TPR (True Positive Rate), and FPR (False Positive Rate). These metrics have been widely used to assess machine learning algorithms. They are defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

In general, accuracy is used to judge a model whose goal is classification; since our goal is to identify the EEG data category, precision and recall are also important metrics for evaluating our model. The precision rate mainly judges whether the classifier's positive predictions are correct, that is, it focuses on how reliably abnormal samples are identified. The recall rate mainly evaluates whether the classifier can identify all abnormal samples. The $F_\beta$ score combines the previous two metrics: if $\beta$ is greater than 1, the recall rate is weighted more heavily; conversely, for $\beta$ less than 1, the precision rate has a greater impact on the model quality assessment. The $F_1$ score is used as a general overview of the performance of the algorithm. The ROC is a graph composed of the false positive rate (horizontal axis) and the true positive rate (vertical axis). We can obtain different TPR/FPR pairs by adjusting the classifier's classification threshold; these pairs are the ROC data points, and they intuitively reflect the quality of the classifier. The AUC is the area under the ROC curve, and it reflects the performance of the classification model: the closer the AUC value is to 1, the better the classification.
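The metric definitions above can be computed directly from the four classification outcomes; a minimal sketch (function name is illustrative):

```python
# Sketch of the evaluation metrics defined above, computed from the four
# binary-classification outcomes TP, FP, TN, FN.
def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # identical to TPR
    fpr = fp / (fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "fpr": fpr, "f1": f1}
```

Sweeping the classification threshold and recording the resulting (FPR, TPR) pairs from this computation traces out the ROC curve whose area gives the AUC.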
The four outcomes of the classification are True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN), as used above. We show the ROC curves on the EEG data and analyze the ROC curves of the four models.

Experimental Settings

We first collect data through Arduino serial communication, which requires installing the corresponding driver module on the Jetson TX2. The subject is asked to wear the EEG device and control the robot by mind. At the beginning, we compile and configure the various toolkits needed for the experiment on the Jetson TX2. Specifically, we install the deep learning framework TensorFlow required for the experiment by compiling it from source. By running TensorFlow, we can smoothly load the model we designed. We have carefully annotated the EEG data with the corresponding actions undertaken by the subject, as available from context. In our experiments, we use a total of 800 labeled EEG samples collected from 4 subjects. Each sample is a vector of 180 elements and corresponds to one channel of the EEG data. To evaluate the performance, we use several evaluation metrics such as accuracy, CPU and GPU utilization, RAM footprint and so on.

EEG Signals Analysis

Furthermore, we concisely analyze the similarities between EEG signals corresponding to different intents and quantify them using the Spearman correlation, as shown in Table 1. In order to make the machine understand human intentions better, we present the two similarities used in our experiment: inter-class similarity and extra-class similarity. The inter-class similarity means the similarity of EEG signals within the same meaning. We randomly choose several EEG data samples from the same intent and calculate the Spearman correlation coefficients respectively. The inter-class similarity is measured as the average of the Spearman correlation coefficients of all samples.
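The Spearman correlation used for this similarity analysis can be sketched in a few lines. This self-contained version assumes no tied values (which keeps the rank-difference formula exact); a production analysis would typically use a library routine instead.

```python
# Self-contained sketch of the Spearman rank correlation used for the
# inter-/extra-class similarity analysis (ties are assumed absent).
def _ranks(xs):
    """Rank of each element, 1-based, in ascending order."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def spearman(x, y):
    """Spearman rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the
    per-position rank difference between the two sequences."""
    n = len(x)
    rx, ry = _ranks(x), _ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

Averaging `spearman` over sample pairs drawn from the same intent gives the inter-class similarity; pairs drawn from different intents give the extra-class entries of the matrix in Table 1.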
Likewise, extra-class similarity indicates the correlation coefficient between different EEG categories. We estimate the correlation coefficient matrix for each subject and then compute the average matrix. Table 1 shows the correlation coefficient matrix together with the relevant extra-class and inter-class similarity statistics. These observations indicate that feature representation and classification can be performed effectively.

Table 1: The correlation coefficient matrix of Self and Cross classes.

                  relaxed   relaxed→focused  focused→relaxed  focused   Self      Cross
relaxed           0.00155   0.04916          -0.00463         0.02890   0.00156   0.02448
relaxed→focused   0.04916   0.32900          -0.39446         0.05040   0.32900   -0.09830
focused→relaxed   -0.00463  -0.39446         0.42358          0.03106   0.42358   -0.36803
focused           0.02891   0.05040          0.03106          0.01933   0.01933   0.03679

Table 2: Comparison on the local dataset with noise.

Methods        Accuracy  Precision  Recall  F1     AUC
MLP            0.560     0.583      0.560   0.477  0.813
SVM            0.775     0.798      0.775   0.756  0.893
LSTM           0.765     0.879      0.765   0.705  0.960
Stacked LSTM   0.880     0.910      0.880   0.875  0.980
DeepBrain      0.970     0.972      0.970   0.970  0.997

Table 3: Comparison on the local dataset without noise.

Methods        Accuracy  Precision  Recall  F1     AUC
MLP            0.625     0.555      0.625   0.559  0.904
SVM            0.790     0.783      0.790   0.785  0.886
LSTM           0.760     0.816      0.760   0.695  0.963
Stacked LSTM   0.880     0.919      0.880   0.873  0.983
DeepBrain      0.975     0.977      0.975   0.975  0.998

Overall Comparison with Other Methods

In this section, we present the performance study and illustrate the efficiency of our approach by comparing it with other methods, including other deep learning algorithms. Recall that the designed approach is a hybrid model that uses the LSTM for feature learning and a softmax classifier for intent recognition. In our experiments, the EEG data are randomly divided into two parts, a training dataset and a testing dataset. Note that Brainlink collects EEG data under both noisy and noise-free conditions.
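As an illustration of the similarity analysis above, the inter-class (same-intent) and extra-class (different-intent) statistics can be estimated with the Spearman coefficient, here implemented as the Pearson correlation of ranks; the EEG samples below are synthetic placeholders, not our recordings:

```python
# Hypothetical sketch of the Spearman-based similarity analysis; the
# samples are synthetic stand-ins for the real 180-element EEG channels.
import numpy as np

def spearman(x, y):
    # Spearman coefficient as the Pearson correlation of ranks (no ties).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx = (rx - rx.mean()) / rx.std()
    ry = (ry - ry.mean()) / ry.std()
    return float(np.mean(rx * ry))

def mean_similarity(class_a, class_b):
    # Average pairwise coefficient between two sets of samples.
    return float(np.mean([spearman(x, y) for x in class_a for y in class_b]))

rng = np.random.default_rng(0)
relaxed = rng.normal(size=(10, 180))    # 10 synthetic "relaxed" samples
focused = rng.normal(size=(10, 180))    # 10 synthetic "focused" samples

inter_class = mean_similarity(relaxed, relaxed)  # within the same intent
extra_class = mean_similarity(relaxed, focused)  # across different intents
```

Averaging such pairwise coefficients per subject and then across subjects gives the matrix reported in Table 1.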
Our designed model achieves a multi-class classification accuracy of 0.975 on the noise-free local dataset and 0.970 on the noisy one. For a closer look at the results, we present the detailed classification reports in Table 3. We observe that the Stacked LSTM with the attention-enhanced layer is generally better than the plain hidden layer on every metric. Table 2 shows the metrics on the local dataset with noise; the evaluation metrics on the noise-free dataset are consistently better than those with noise. The ROC curves for models fed with both the noisy and noise-free data are shown in Figure 8. The area under the DeepBrain curve is larger than the areas under the curves of the other methods, which the AUC values also confirm: our method outperforms the others. Following the definition of the ROC curve, we continuously lower the classification threshold and count the resulting TPR and FPR values. Analyzing the ROC curves of DeepBrain in Figure 8, we find that the TPR rapidly reaches 0.9 as the threshold is moved down. We also analyze the data under moderate noise in Figure 8 (b). For our target group, we believe a reasonable noise level is below 48 dB, slightly quieter than normal human conversation, such as in a centrally air-conditioned room. The results indicate that our approach is considerably more robust: although the other methods also perform well, the growth rate of their TPR values is slightly worse than ours.
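The attention-enhanced layer referred to above can be illustrated as a weighted pooling over the per-step outputs of the (stacked) LSTM; the dot-product scoring, shapes, and parameter names in this sketch are simplifying assumptions, not the exact architecture:

```python
# Hypothetical sketch of attention pooling over LSTM outputs: each time
# step receives a softmax weight, and the context vector is the weighted
# sum of hidden states rather than just the final state.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(hidden_states, w):
    # hidden_states: (T, d) per-step LSTM outputs; w: (d,) learned vector.
    scores = hidden_states @ w      # one relevance score per time step
    alpha = softmax(scores)         # attention weights, summing to 1
    context = alpha @ hidden_states # (d,) pooled representation
    return context, alpha

T, d = 180, 32                      # 180-element EEG window, 32 LSTM units
H = np.random.default_rng(0).normal(size=(T, d))
w = np.ones(d)                      # placeholder attention parameters
context, alpha = attention_pool(H, w)
```

The context vector then feeds the softmax classifier, letting informative time steps dominate the intent decision.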
Figure 8: ROC curves with (a) quiet-condition and (b) moderate-noise data as model input. Quiet condition: SVM (AUC = 0.886), NN (AUC = 0.904), LSTM (AUC = 0.963), Stacked LSTM (AUC = 0.983), DeepBrain (AUC = 0.998). Moderate noise: SVM (AUC = 0.893), NN (AUC = 0.813), LSTM (AUC = 0.960), Stacked LSTM (AUC = 0.980), DeepBrain (AUC = 0.997).

The accuracy comparison between our method and the baselines is also listed in Figure 9. We compare DeepBrain with SVM, MLP, LSTM, and Stacked LSTM; the key parameters are a multi-layer perceptron (MLP) with 30 hidden-layer nodes and an LSTM with 32 unit cells. The results illustrate that our designed model achieves higher accuracy than the other methods, including the deep learning baselines MLP and LSTM. Furthermore, in contrast to existing EEG classification research that concentrates on binary classification, our designed model operates in a multi-class scenario and still achieves high accuracy. To illustrate the advantage of the robust features our model learns from raw EEG data, we also contrast DeepBrain with the single deep learning methods MLP and RNN. As the experimental results in Figure 9 (a) show, our method outperforms MLP, LSTM, SVM, and Stacked LSTM in classification accuracy by 35%, 21.5%, 18.5%, and 9.5%, respectively. Figure 9 (b) demonstrates how the accuracy changes with the number of training iterations under the three categories of feature learning methods: the designed model converges to its high accuracy in fewer iterations than the independent MLP and RNN.

Figure 9: Comparisons of accuracy and training time. (a) Accuracy comparison across methods. (b) Accuracy and training time versus the training data proportion.

Conclusions

In this paper, we propose DeepBrain, an application for people with disabilities. We demonstrate a viable technique in which an LSTM neural network models normal time-series behaviour and then uses its predictions to give real-time feedback to our domestic robot. DeepBrain produces relatively good results on a real-world dataset that involves both long-term and weak time dependencies and is difficult to predict. Compared with MLP, SVM, LSTM, and Stacked LSTM, our model achieves better results, indicating the robustness of our method.

Future work may consider different levels of network structure and more accurate EEG collection devices than the equipment used in this paper, which could support more classification categories while maintaining a high level of accuracy, since such a setup provides more categorizable data and a higher-precision network model. In general, the DeepBrain system with its associated methods presents a viable candidate for applying state-of-the-art AI techniques to the field of HCI applications.
