Real-Time Sleep Staging using Deep Learning on a Smartphone for a Wearable EEG
Abhay Koushik, Judith Amores, Pattie Maes
MIT Media Lab, Cambridge, MA, USA
koushika@mit.edu, amores@mit.edu, pattie@media.mit.edu

Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
Abstract

We present the first real-time sleep staging system that uses deep learning without the need for servers in a smartphone application for a wearable EEG. We employ real-time adaptation of a single channel of Electroencephalography (EEG) to infer from a Time-Distributed 1-D Deep Convolutional Neural Network. Polysomnography (PSG), the gold standard for sleep staging, requires a human scorer and is both complex and resource-intensive. Our work demonstrates an end-to-end on-smartphone pipeline that can infer sleep stages from single 30-second epochs, with an overall accuracy of 83.5% on 20-fold cross-validation for five-class classification of sleep stages using the open Sleep-EDF dataset.

1 Introduction and Background

Having a proper night of sleep and a regular circadian rhythm is crucial for physical, mental and social well-being [24, 28]. Sleep facilitates learning, memory consolidation and emotion processing [5, 14]. Identification of sleep stages is important not only in diagnosing and treating sleep disorders but also for understanding the neuroscience of healthy sleep. Polysomnography (PSG) is used in hospitals to study sleep and diagnose sleep disorders. It is considered the gold standard for sleep staging and involves recording multiple electrophysiological signals from the body, such as brain activity using EEG, heart rhythm through Electrocardiography (ECG), muscle tone through Electromyography (EMG) and eye movement through Electrooculography (EOG). PSG is a tedious procedure which requires skilled sleep technologists in a laboratory setting. Since EEG is the most reliable signal for sleep staging, automation of PSG can be achieved through accurate classification of EEG [16].

Research in Deep Learning [15, 13] has led to many efficient algorithms to classify different kinds of data, including bio-medical and physiological signals. In this paper, we focus on developing a real-time sleep staging application using Time-Distributed 1-D Deep Convolutional Neural Networks to classify the five sleep stages in a comfortable environment. As per the new AASM rules [6], these stages are Wake, Rapid-Eye-Movement (REM) and the Non-Rapid-Eye-Movement (N-REM) stages N1, N2, and N3. We make use of a single channel of EEG recorded through a modified research version of the Muse headband [12]. The headband is flexible and can be comfortably used while sleeping. It has 5 electrodes, namely AF7, AF8, TP9, TP10 and the reference Fpz, for brain activity measurements.

Importance of mobile systems for sleep staging

PSG requires a minimum of 22 wires attached to the body in order to monitor sleep activity. The complexity of this setup requires sleeping in a hospital or laboratory with an expert monitoring and scoring signals in real-time. This results in an unnatural and disturbed night of sleep for the subject, which may not only affect the diagnosis but also causes sub-optimal utilization of time and energy resources for recording and scoring, and is as such highly undesirable. There has been significant development in research on automating sleep staging with wireless signals [30] and more compact, wearable devices [7, 21]. Nevertheless, none of these systems implements a five-stage classification of sleep in real-time.
The goal of our research was to simplify and reliably automate PSG on-smartphone using single non-overlapping 30-second epochs, enabling automatic real-time interventions during experiments on sleep stages and cognition, since the minimum time-resolution of a single sleep score by an expert is 30 seconds. Automated classification is achieved through adaptation of a Time-Distributed Deep Convolutional Neural Network model. Simplification is achieved by developing a TensorFlow Lite Android application that uses only a single channel recording from a wearable EEG. We have also designed a friendly user interface that visualizes sleep stages and raw EEG data with real-time statistics about accuracy. Our app connects via Bluetooth Low Energy (BLE) to the flexible EEG headband, thus making it portable and not restricted to laboratory and hospital use.

2 Related work

Automatic analysis and sleep scoring using multi-layer Neural Networks [22] was done as early as 1996 using 3 channels of physiological data, namely EEG, EOG and EMG. This involved power spectral density calculations for feature extraction from raw EEG, which required a tedious laboratory setting to collect reliable data through these channels. More recent work has looked into creating portable sleep scoring systems, such as the work by Zhang et al. [29], which uses pulse, blood oxygen and motion sensors to predict sleep stages. In their paper, they do not detect sleep stages N1 and N2 separately, and N1 is usually the hardest one to predict. The authors already mention that these results cannot provide equally high accuracy as compared to the EEG and EOG signals of PSG. The same limitations apply to the work by Zhao et al. [30]. Our work achieves reliable accuracy by using only one channel from a wearable EEG, and overcomes the complexity of recording multiple signals.

Our model is based on Time-Distributed Deep Convolutional Neural Networks [8] and is inspired by DeepSleepNet from Supratak et al. [26]. DeepSleepNet makes use of representation learning with a Convolutional Neural Network (CNN) followed by sequence residual learning using Bidirectional Long Short-Term Memory cells (Bi-LSTM). The major drawback of this network is that it requires 25 epochs of raw EEG data to be fed in together to obtain 25 labels. This is mainly because of the Bi-LSTM, which relies on large temporal sequences to achieve better accuracy. The state-of-the-art network model, SeqSleepNet [19], processes multiple epochs and outputs the sleep labels all at once using end-to-end Hierarchical Recurrent Neural Networks. It uses all 3 channels, namely EEG, EMG and EOG, in order to give the best overall accuracy of 87.1% on the MASS dataset [17]. The CNN models by Sors et al. [25] and Tsinalis et al. [27], as well as SeqSleepNet and DeepSleepNet, all use longer temporal sequences for inference: 4, 5, 10 and 25 raw 30-second EEG epochs respectively. We overcome this limitation by using a Time-Distributed Deep CNN to predict single 30-second epochs with real-time adaptation from a wearable EEG. The flexibility of this wearable also makes it more preferable than the bulky system used by Lucey et al. [16]. The smartphone-based nature of our sleep-staging application overcomes the need for a client-server architecture as used in the Dreem headband [18]. Our TensorFlow Lite mobile application can also be adapted to other types of EEG devices for real-time settings.

3 Methods and Materials

3.1 Dataset description and pre-processing

We used the expanded Sleep-EDF database from the Physionet bank [11]. Single channel EEG (Fpz-Cz at 100 Hz) of 20 subjects is divided into a training set of 33 nights and a validation set of 4 nights. Together, they contain non-overlapping nights of 19 subjects for 20-fold cross-validation. The non-overlapping test set contains 2 nights (1 subject). We remove the extra wake states before and after half an hour of sleep as described in DeepSleepNet [26]. We excluded MOVEMENT and UNKNOWN stages, and combined N4 and N3 to follow the five-stage classification as per the new AASM rules [6].
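As a rough illustration of this label clean-up (not the authors' released code), the sketch below maps per-epoch hypnogram labels to the five AASM classes, drops MOVEMENT/UNKNOWN epochs, and trims wake epochs more than 30 minutes before and after sleep. The label strings, array names and helper function are assumptions for the sake of the example.

```python
import numpy as np

# Assumed short-form stage labels after parsing the Sleep-EDF hypnograms,
# mapped to the five AASM classes used in the paper (N3 and N4 merged).
AASM_MAP = {"W": 0, "1": 1, "2": 2, "3": 3, "4": 3, "R": 4}

def clean_hypnogram(stages, epoch_sec=30, pad_min=30):
    """Return 5-class labels and a boolean mask selecting the kept epochs.
    MOVEMENT/UNKNOWN epochs are dropped; wake epochs further than `pad_min`
    minutes from the first/last sleep epoch are trimmed."""
    keep = np.array([s in AASM_MAP for s in stages])
    labels = np.array([AASM_MAP.get(s, -1) for s in stages])

    sleep_idx = np.where((labels != 0) & keep)[0]   # epochs that are not Wake
    if sleep_idx.size == 0:
        return np.array([], dtype=int), np.zeros_like(keep)

    pad = int(pad_min * 60 / epoch_sec)             # 60 epochs for 30 minutes
    start = max(sleep_idx[0] - pad, 0)
    end = min(sleep_idx[-1] + pad, len(stages) - 1)

    window = np.zeros(len(stages), dtype=bool)
    window[start:end + 1] = True
    keep &= window
    return labels[keep], keep
```

The boolean mask returned alongside the labels can then be reused to select the matching 30-second windows of raw Fpz-Cz EEG.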
3.2 Model architecture and training

Our model architecture is described in Figure 1. The Base-CNN has 3 repeated sets of two 1-D convolutional (Conv1D) layers, 1-D max-pooling and spatial dropout layers. This is followed by two Conv1D, 1-D global max-pooling, dropout and dense layers. We finally have a dropout layer as the output of the Base-CNN. 30-second epochs of normalized raw EEG at 100 Hz are fed into the Time-Distributed Base-CNN model [8] as described in Figure 1. All Conv1D layers use Rectified-Linear-Unit (ReLU) activation. Training uses an Adam optimizer with an initial learning rate of 1e-3, which is reduced each time the validation accuracy plateaus using the ReduceLROnPlateau Keras callback.

Figure 1: Model architecture used for training and inference

3.3 EEG adaptation and experiments

Pre-processing data from wearable EEG. Real-time brain activity from the flexible EEG headband is streamed via BLE to the smartphone application. Raw EEG from the AF7 channel is downsampled to 100 Hz at the end of each 30-second epoch before being fed into the network.

EEG real-time adaptation. Since the raw EEG recording instrument used for training is different from the testing instrument, we adapt the EEG using a Z-score scaling with the wake-stage standard deviation for calibration. The main reason for choosing the standard deviation of EEG over any other metric is highlighted in Figure 2. We estimate the mutual information for a discrete sleep-stage variable, using the set of statistical features shown in the diagram, calculated every 30 seconds, as input and the corresponding sleep-stage labels of the dataset as output. The final feature shown is the Min-Max-Distance (MMD) described by Aboalayon et al. [1], calculated as a sum over the Euclidean distances between the minimum and maximum EEG values within a 1-second sliding window.

Figure 2: The heat-map shows the mutual-information importance of different statistical features of raw EEG. We choose the default of 3 nearest neighbours as the parameter for the mutual_info_classif function used for automated feature selection in the scikit-learn machine learning library.

We calculated the statistical feature importance of raw EEG for 3 randomly selected nights. The standard deviation clearly plays the major role, with an average relative importance of 53.33 percent over the other features for the classification of sleep stages. Z-score calibration of the wake stage employs this important feature; hence, using this method for 30-second epoch adaptation makes the raw EEG both instrument-independent and subject-independent as long as the signal is noise-reduced.
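To make the adaptation step concrete, the sketch below downsamples a single 30-second AF7 epoch to 100 Hz and applies the wake-calibrated Z-score before inference. This is an illustrative reimplementation rather than the application code; the assumed 256 Hz native sampling rate, function names and output shape are assumptions.

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_FS = 100   # the network expects 100 Hz, i.e. 3000 samples per 30-s epoch
NATIVE_FS = 256   # assumed native sampling rate of the wearable EEG stream

def calibrate_wake(wake_epochs):
    """Estimate calibration statistics (mean and the wake-stage standard
    deviation) from a few epochs recorded while the wearer is known to be awake."""
    wake = np.concatenate([np.asarray(e, dtype=float) for e in wake_epochs])
    return wake.mean(), wake.std()

def adapt_epoch(raw_epoch, wake_mean, wake_std,
                native_fs=NATIVE_FS, target_fs=TARGET_FS):
    """Downsample one 30-second AF7 epoch to 100 Hz and Z-score it with the
    wake-stage statistics so the wearable signal matches the training scale."""
    x = resample_poly(np.asarray(raw_epoch, dtype=float), target_fs, native_fs)
    x = (x - wake_mean) / wake_std                  # wake-calibrated Z-score
    return x[np.newaxis, :, np.newaxis]             # e.g. (1, 3000, 1); add a
                                                    # time axis if the deployed
                                                    # model expects epoch sequences
```

In the deployed app the same arithmetic would run on-device before each model invocation; the Python form is shown only for clarity.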
4 Results

Our model has an overall accuracy of 83.5% for 20-fold cross-validation of five-stage classification. The accuracy for individual nights from the test data is 72% in the worst case. This model achieves reliable accuracy given that the overall Inter-Rater Reliability (IRR) [10] among human experts scoring sleep recordings was reported to be about 80% (Cohen's κ = 0.68 to 0.76). Table 1 describes the precision, recall, F1-score and support of all five sleep stages on predictions from 5 test nights; the corresponding accuracy of 81.72% and F1-score of 76.23% were obtained. In addition, the micro, macro and weighted averages of these metrics are also calculated in order to give a better statistical understanding. The confusion matrix for the same test data is shown in the left part of Figure 3. The N1 stage shows the poorest agreement because of the absence of an occipital electrode [16]. Figure 4 shows the difference between the hypnogram of one full night predicted by the model and the ground-truth hypnogram as labeled in the dataset.

The mobile application which deploys this deep learning model is built using TensorFlow Lite. The raw data from the wearable EEG is wirelessly streamed to the phone and updated every 30 seconds with its corresponding sleep stage and confidence value. We successfully validated real-time Rapid-Eye-Movement (REM) detection using our wearable headband by re-creating the same closed, lateral eye movement during wakefulness. We have also simulated and successfully validated jaw clenching and blinks during wakefulness.

Table 1: Five-class classification report for sleep staging on 5 test nights

Label  Sleep stage       Precision  Recall  F1-score  Support
0      Wake              0.83       0.96    0.89      730
1      N1                0.47       0.42    0.44      337
2      N2                0.87       0.83    0.85      2248
3      N3                0.92       0.81    0.86      931
4      REM               0.71       0.83    0.77      903
       Micro average     0.82       0.82    0.82      5149
       Macro average     0.76       0.77    0.76      5149
       Weighted average  0.82       0.82    0.82      5149

Figure 3: Confusion matrix (left). Snapshot of the headband and Android application (right)

Figure 4: Comparison of the ground truth and predicted hypnograms of one full night
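For readers who want to follow the deployment path, the snippet below sketches the generic TensorFlow Lite workflow: converting a trained Keras model to a .tflite flatbuffer offline, then querying the interpreter once per 30-second epoch. This is the standard TFLite API shown with the Python interpreter for clarity, not the authors' application code; file names and the input shape are placeholders.

```python
import numpy as np
import tensorflow as tf

# --- Offline: convert the trained Keras model to a TFLite flatbuffer ---
model = tf.keras.models.load_model("sleep_stager.h5")        # placeholder path
converter = tf.lite.TFLiteConverter.from_keras_model(model)
open("sleep_stager.tflite", "wb").write(converter.convert())

# --- Inference side (Python interpreter shown; the app uses the Java API) ---
interpreter = tf.lite.Interpreter(model_path="sleep_stager.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

STAGES = ["Wake", "N1", "N2", "N3", "REM"]

def score_epoch(epoch):
    """epoch: one adapted 30-s EEG window shaped to the model's input tensor."""
    interpreter.set_tensor(inp["index"], epoch.astype(np.float32))
    interpreter.invoke()
    probs = interpreter.get_tensor(out["index"]).ravel()
    return STAGES[int(np.argmax(probs))], float(np.max(probs))  # stage, confidence
```

On Android, the equivalent calls would go through the TensorFlow Lite Java Interpreter, with the BLE stream buffered into 30-second windows before each invocation.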
5 Conclusion and future scope

This work demonstrates an end-to-end mobile pipeline for the fastest real-time sleep staging by adaptation of a wearable EEG. With the development of the new mobile TensorFlow Lite application, we achieve automated sleep staging without the need for servers, in a portable way that can be used anywhere, including the home. The application is versatile, as it can be adapted to take in single-channel (Fpz-Cz) recordings from any wearable EEG. We aim to use this work for real-time interventions using Brain-Computer Interfaces (BCI) for applications in Human-Computer Interaction (HCI), such as wearable olfactory interfaces [2, 3], real-time audio-neural feedback [23, 9] and sleep-based enhancement of learning and memory [20, 4].

References

[1] Khald Ali I. Aboalayon, Miad Faezipour, Wafaa S. Almuhammadi, and Saeid Moslehpour. Sleep stage classification using EEG signal analysis: a comprehensive survey and new investigation. Entropy, 18(9):272, 2016.

[2] Judith Amores, Javier Hernandez, Artem Dementyev, Xiqing Wang, and Pattie Maes. BioEssence: A wearable olfactory display that monitors cardio-respiratory information to support mental wellbeing. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5131–5134. IEEE, 2018.

[3] Judith Amores and Pattie Maes. Essence: Olfactory interfaces for unconscious influence of mood and cognitive performance. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 28–34. ACM, 2017.

[4] Thomas Andrillon, Daniel Pressnitzer, Damien Léger, and Sid Kouider. Formation and suppression of acoustic memories during human sleep. Nature Communications, 8(1):179, 2017.

[5] M. Alizadeh Asfestani, Elena Braganza, Jan Schwidetzky, J. Santiago, S. Soekadar, Jan Born, and Gordon B. Feld. Overnight memory consolidation facilitates rather than interferes with new learning of similar materials: a study probing NMDA receptors. Neuropsychopharmacology, 43(11):2292, 2018.

[6] Richard B. Berry, Rita Brooks, Charlene E. Gamaldo, Susan M. Harding, C. L. Marcus, B. V. Vaughn, et al. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. American Academy of Sleep Medicine, Darien, Illinois, 2012.

[7] Alexander J. Casson, David C. Yates, Shelagh J. M. Smith, John S. Duncan, and Esther Rodriguez-Villegas. Wearable electroencephalography. IEEE Engineering in Medicine and Biology Magazine, 29(3):44–56, 2010.

[8] Youness Mansar. EEG_classification. https://github.com/CVxTz/EEG_classification, 2018.

[9] Eliran Dafna, Ariel Tarasiuk, and Yaniv Zigel. Sleep staging using nocturnal sound analysis. Scientific Reports, 8(1):13474, 2018.

[10] Heidi Danker-Hopfe, Peter Anderer, Josef Zeitlhofer, Marion Boeck, Hans Dorn, Georg Gruber, Esther Heller, Erna Loretz, Doris Moser, Silvia Parapatics, et al. Interrater reliability for sleep scoring according to the Rechtschaffen & Kales and the new AASM standard. Journal of Sleep Research, 18(1):74–84, 2009.

[11] Bob Kemp, Aeilko H. Zwinderman, Bert Tuk, Hilbert A. C. Kamphuisen, and Josefien J. L. Oberye. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG. IEEE Transactions on Biomedical Engineering, 47(9):1185–1194, 2000.

[12] Olave E. Krigolson, Chad C. Williams, Angela Norton, Cameron D. Hassall, and Francisco L. Colino. Choosing MUSE: Validation of a low-cost, portable EEG system for ERP research. Frontiers in Neuroscience, 11:109, 2017.

[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[14] Laura B. F. Kurdziel, Jessica Kent, and Rebecca M. C. Spencer. Sleep-dependent enhancement of emotional memory in early childhood. Scientific Reports, 8, 2018.

[15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.

[16] Brendan P. Lucey, Jennifer S. McLeland, Cristina D. Toedebusch, Jill Boyd, John C. Morris, Eric C. Landsness, Kelvin Yamada, and David M. Holtzman. Comparison of a single-channel EEG sleep study to polysomnography. Journal of Sleep Research, 25(6):625–635, 2016.

[17] Christian O'Reilly, Nadia Gosselin, Julie Carrier, and Tore Nielsen. Montreal Archive of Sleep Studies: an open-access resource for instrument benchmarking and exploratory research. Journal of Sleep Research, 23(6):628–635, 2014.

[18] Amiya Patanaik, Ju Lynn Ong, Joshua J. Gooley, Sonia Ancoli-Israel, and Michael W. L. Chee. An end-to-end framework for real-time automatic sleep stage classification. Sleep, 41(5):zsy041, 2018.

[19] Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y. Chén, and Maarten De Vos. SeqSleepNet: End-to-end hierarchical recurrent neural network for sequence-to-sequence automatic sleep staging. arXiv preprint arXiv:1809.10932, 2018.
[20] Björn Rasch, Christian Büchel, Steffen Gais, and Jan Born. Odor cues during slow-wave sleep prompt declarative memory consolidation. Science, 315(5817):1426–1429, 2007.

[21] A. Sano, A. J. Phillips, A. W. McHill, S. Taylor, L. K. Barger, C. A. Czeisler, and R. W. Picard. 0182 Influence of weekly sleep regularity on self-reported wellbeing. Journal of Sleep and Sleep Disorders Research, 40(suppl_1):A67–A68, 2017.

[22] Nicolas Schaltenbrand, Régis Lengelle, M. Toussaint, R. Luthringer, G. Carelli, A. Jacqmin, E. Lainey, Alain Muzet, and Jean-Paul Macher. Sleep stage scoring using the neural network model: comparison between visual and automatic analysis in normal subjects and patients. Sleep, 19(1):26–35, 1996.

[23] Maren D. Schütze and Klaus Junghanns. The difficulty of staying awake during alpha/theta neurofeedback training. Applied Psychophysiology and Biofeedback, 40(2):85–94, 2015.

[24] Eti Ben Simon and Matthew P. Walker. Sleep loss causes social withdrawal and loneliness. Nature Communications, 9, 2018.

[25] Arnaud Sors, Stéphane Bonnet, Sébastien Mirek, Laurent Vercueil, and Jean-François Payen. A convolutional neural network for sleep stage scoring from raw single-channel EEG. Biomedical Signal Processing and Control, 42:107–114, 2018.

[26] Akara Supratak, Hao Dong, Chao Wu, and Yike Guo. DeepSleepNet: a model for automatic sleep stage scoring based on raw single-channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(11):1998–2008, 2017.

[27] Orestis Tsinalis, Paul M. Matthews, and Yike Guo. Automatic sleep stage scoring using time-frequency analysis and stacked sparse autoencoders. Annals of Biomedical Engineering, 44(5):1587–1597, 2016.

[28] Katharina Wulff, Silvia Gatti, Joseph G. Wettstein, and Russell G. Foster. Sleep and circadian rhythm disruption in psychiatric and neurodegenerative disease. Nature Reviews Neuroscience, 11(8):589, 2010.

[29] Jin Zhang, Dawei Chen, Jianhui Zhao, Mincong He, Yuanpeng Wang, and Qian Zhang. RASS: A portable real-time automatic sleep scoring system. In Real-Time Systems Symposium (RTSS), 2012 IEEE 33rd, pages 105–114. IEEE, 2012.

[30] Mingmin Zhao, Shichao Yue, Dina Katabi, Tommi S. Jaakkola, and Matt T. Bianchi. Learning sleep stages from radio signals: a conditional adversarial architecture. In International Conference on Machine Learning, pages 4100–4109, 2017.

6 Appendix

The Base-CNN model used in our work is described in Figure 5. All dropout layers have a rate of 0.01, the pooling layers have a size of 2, and the model is compiled with an Adam optimizer with a learning rate of 0.001. The corresponding parameters of each of the layers are given alongside for reference.

Figure 5: Architecture of the Base-CNN model
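Since Figure 5, which carries the exact layer parameters, is not reproduced here, the sketch below shows one plausible Keras realization of the described topology: three blocks of two Conv1D layers with max-pooling and spatial dropout, two further Conv1D layers, global max-pooling, dropout and a dense layer, wrapped in TimeDistributed and trained with Adam at 1e-3 plus ReduceLROnPlateau. The filter counts, kernel size and dense width are assumptions, not the published values; the dropout rate and pool size follow the appendix.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_base_cnn(epoch_len=3000, n_filters=(32, 64, 128), kernel=5):
    """Base-CNN over one 30-s epoch (3000 samples at 100 Hz, 1 channel).
    Filter counts and kernel size are illustrative assumptions."""
    inp = keras.Input(shape=(epoch_len, 1))
    x = inp
    for f in n_filters:                                   # 3 repeated blocks
        x = layers.Conv1D(f, kernel, activation="relu", padding="same")(x)
        x = layers.Conv1D(f, kernel, activation="relu", padding="same")(x)
        x = layers.MaxPooling1D(pool_size=2)(x)            # pool size 2 (appendix)
        x = layers.SpatialDropout1D(0.01)(x)               # dropout rate 0.01
    x = layers.Conv1D(256, kernel, activation="relu", padding="same")(x)
    x = layers.Conv1D(256, kernel, activation="relu", padding="same")(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dropout(0.01)(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.01)(x)                             # dropout as Base-CNN output
    return keras.Model(inp, x, name="base_cnn")

def build_time_distributed(n_epochs=1, n_classes=5):
    """Wrap the Base-CNN with TimeDistributed so each 30-s epoch in a sequence
    is scored independently, then map features to the 5 sleep stages."""
    base = build_base_cnn()
    seq_in = keras.Input(shape=(n_epochs, 3000, 1))
    feats = layers.TimeDistributed(base)(seq_in)
    out = layers.TimeDistributed(layers.Dense(n_classes, activation="softmax"))(feats)
    model = keras.Model(seq_in, out)
    model.compile(optimizer=keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# Training would reduce the learning rate when validation accuracy plateaus, e.g.:
# model.fit(..., callbacks=[keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy")])
```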