Diffuse to fuse EEG spectra -- intrinsic geometry of sleep dynamics for classification


Authors: Gi-Ren Liu, Yu-Lun Lo, John Malik, Yuan-Chung Sheu, Hau-Tieng Wu

Abstract. We propose a novel algorithm for sleep dynamics visualization and automatic annotation by applying a diffusion geometry based sensor fusion algorithm to fuse spectral information from two electroencephalogram (EEG) channels. The diffusion geometry approach helps organize the nonlinear dynamical structure hidden in the EEG signal. The visualization is achieved by the nonlinear dimension reduction capability of the chosen diffusion geometry algorithms. For the automatic annotation purpose, a support vector machine is trained to predict the sleep stage. The prediction performance is validated on a publicly available benchmark database, PhysioNet Sleep-EDF [expanded] SC* (SC = Sleep Cassette) and ST* (ST = Sleep Telemetry), with leave-one-subject-out cross validation. With a single EEG channel (Fpz-Cz), the overall accuracy, macro F1 and Cohen's kappa reach 82.72%, 75.91% and 76.1% respectively on Sleep-EDF SC*, and 78.63%, 73.58% and 69.48% on Sleep-EDF ST*. This performance is compatible with state-of-the-art results. With two EEG channels (Fpz-Cz and Pz-Oz), the overall accuracy, macro F1 and Cohen's kappa reach 84.44%, 78.25% and 78.36% respectively on Sleep-EDF SC*, and 79.05%, 74.73% and 70.31% on Sleep-EDF ST*. The results suggest the potential of the proposed algorithm in practical applications.

1. Introduction

Sleep is a universal, recurring dynamical and physiological activity in mammals.
According to the American Academy of Sleep Medicine (AASM) [1, 2], which generalizes the Rechtschaffen and Kales (R&K) criteria [3], sleep dynamics can be divided into two broad stages, rapid eye movement (REM) and non-rapid eye movement (NREM), and the NREM stage is further divided into shallow sleep (stages N1 and N2) and deep sleep (stage N3). Up to now, we have accumulated a lot of physiological knowledge about sleep dynamics [4], and much research is actively carried out to explore the unknowns [5]. Despite those unknowns, it is well known that a distortion of sleep dynamics can lead to catastrophic outcomes. For example, sleep deprivation impacts decision making [6], REM disturbance slows down perceptual skill improvement [7], deprivation of slow wave sleep is associated with Alzheimer's disease [8], insufficient N2 sleep is associated with weaning failure [9], several public disasters have been caused by low sleep quality [10, 11], etc. Therefore, in addition to the interest stemming from physiological aspects, knowing the sleep dynamics has many clinical applications.

Polysomnography (PSG) is the gold standard for evaluating sleep dynamics [12]. However, scoring the overnight sleep stages from the PSG outputs by human experts is time consuming and error-prone due to the huge signal load [13]. With the advance of technology and computational power, in the past decades a lot of effort has been devoted to establishing an artificial intelligence (AI) system for automatic sleep stage annotation. In addition to being able to accurately score sleep stages, an ideal AI system should also be able to "self-learn" or "accumulate knowledge" from the historical database. In other words, when the database with experts' annotations grows, an ideal AI system should be able to use those new annotations.
It is worth mentioning that this challenge is actually ubiquitous: for a newly arriving subject, how may one utilize the existing database with its expert annotations?

1.1. Related Work. There have been many proposed algorithms for automatic sleep stage scoring. Among the many papers, we shall distinguish two common cross validation (CV) schemes. According to whether the training data and the testing data come from different subjects, the literature is divided into two groups, leave-one-subject-out and non-leave-one-subject-out CV. When the validation set and the training set are determined on the subject level, that is, the training set and the validation set contain different subjects, we call it the leave-one-subject-out CV (LOSOCV) scheme; otherwise we call it the non-LOSOCV scheme. The main challenge of the LOSOCV scheme comes from inter-individual variability, but this scheme is close to the real-world scenario: predicting the sleep dynamics of a newly arrived subject from a given annotated database. On the other hand, in the non-LOSOCV scheme, the training set and the testing set are dependent, and the performance might be over-estimated. In this work, to better evaluate the proposed algorithm under the inter-individual variability issue, we focus on the LOSOCV scheme, and we only summarize papers considering the LOSOCV scheme and single- or two-channel EEG signals.

In [14], time-domain and frequency-domain features are designed to predict the awake, REM, light NREM (N1 and N2 combined) and deep sleep stages by a support vector machine. The performance is evaluated on a private database with 4 subjects. In [15], 104 features are extracted from the EEG signals and selected features are used to classify 4 sleep stages (N1 and N2 combined) by random forests. The following papers classify 5 sleep stages, as does our work.
In [16], time-domain and frequency-domain features are used as the input layer of a stacked sparse autoencoder, a specific type of neural network model with the sigmoid activation function. The performance of the stacked sparse autoencoder was evaluated on the publicly available benchmark database Sleep-EDF SC* [17], and the overall accuracy was 78.9%. Instead of extracting features based on domain knowledge, the features in [18] are automatically learned by convolutional neural networks (CNNs). The overall accuracy was 74.8% on the same Sleep-EDF SC* database. In [19], the authors also proposed a deep learning model, named DeepSleepNet, which utilizes CNNs to extract features and applies a bidirectional Long Short-Term Memory network to learn the stage transition rules from EEG epochs. The DeepSleepNet algorithm reaches the state-of-the-art 82.0% overall accuracy on the Sleep-EDF SC* database. In [20], a similar approach based on a deep CNN with modifications is considered and achieves a comparable result.

1.2. Our contribution. The main novelty and contribution of this paper is the diffusion geometry based feature extraction when there is only one electroencephalogram (EEG) channel, and the diffusion geometry based sensor fusion when there are two EEG channels. The goal is to extract the intrinsic information hidden in the EEG. This novel diffusion geometry approach addresses the above-mentioned challenge: an accurate automatic sleep stage scoring AI system. The proposed algorithm is composed of a feature extraction part and a learning part. The feature extraction part is based on a combination of two modern signal processing tools: the scattering transform [21], together with the diffusion map (DM) [22] when only a single channel is available, or a diffusion-based sensor fusion (the multiview DM [23]) when two channels are available.
The scattering transform is a nonlinear signal processing tool, motivated by the study of the convolutional neural network (CNN) and equipped with solid theoretical support, which allows us to extract useful features from the EEG signal. We mention that, while it is motivated by studying CNNs, labels are not used in the scattering transform, so it is an unsupervised learning tool. We call the extracted features scattering EEG spectral features. The scattering EEG spectral features, however, are in general high dimensional and differ from the intrinsic sleep dynamics. This difference comes from various sources, including, for example, the inevitable noise and artifacts. Thus, an extra step is needed to reduce the dimension and refine/reorganize the scattering EEG spectral features, which leads to the final intrinsic features for the sleep dynamics. We suggest the DM [22] or diffusion-based sensor fusion algorithms, like the multiview diffusion [23]. In short, the scattering EEG spectral features extracted by the scattering transform are re-organized by modern diffusion geometry based sensor fusion algorithms. We call the output features of the DM/diffusion-based sensor fusion the intrinsic sleep dynamical features. The learning part of the algorithm, for the prediction purpose, is based on the kernel support vector machine (SVM) [24]. The prediction model for each testing subject is established by the kernel SVM based on the intrinsic sleep dynamical features and the annotations of all remaining subjects in the database.
To show the performance of the proposed algorithm and have a fair comparison with reported results, we consider the publicly available benchmark databases, the PhysioNet Sleep-EDF SC* (SC = Sleep Cassette) database [17], which consists of healthy subjects without any sleep-related medication, and the PhysioNet Sleep-EDF ST* (ST = Sleep Telemetry) database [17], which consists of subjects who had mild difficulty falling asleep. We mention that, compared with the existing neural network (NN) based algorithms mentioned in Section 1.1, the proposed algorithm, including the feature extraction part and the learning part, has solid theoretical backing. In addition, later in this paper we show that the proposed algorithm performs better than, or at least on par with, the existing reported results based on NN approaches when there is a single EEG channel, and the performance improves if we fuse information from two EEG channels.

1.3. Organization of the paper. The rest of this paper is organized as follows. In Section 2, we introduce the proposed algorithm and the databases on which we evaluate it. In Section 2.1 we summarize the theoretical background of the scattering transform and introduce the scattering EEG spectrum as the dynamical features. In Section 2.2, we describe how to apply diffusion geometry to integrate and fuse the dynamical features of Section 2.1, and the prediction model based on the SVM is summarized in Section 2.3. In Section 2.4, we describe the two databases, the subsets of the Sleep-EDF Database [Expanded] from PhysioNet. The results of applying the proposed algorithm to the two databases are shown in Section 3.

2. Material and Method

The main novelty of the proposed algorithm is the feature extraction, which depends on the diffusion geometry based sensor fusion tools, the DM and the multiview DM. The feature extraction consists of two steps.
First, we apply the scattering transform to extract features from the EEG signal (indicated by Part 1 in Figure 1). Second, we apply the DM or the multiview DM (indicated by Parts 2-1 and 2-2 in Figure 1) with a properly designed metric to determine the final features. With the proposed features, we take the well-established SVM to build the prediction model. Below, we detail the algorithm implementation step by step.

2.1. Feature extraction step, part 1: scattering transform. In this paper, we apply the recently developed scattering transform [21] to extract spectral features [25] from a given EEG signal. The scattering transform is motivated by establishing a mathematical foundation for the convolutional neural network. It computes coefficients of the given EEG signal by iteratively applying wavelet transforms and nonlinear modulus operators. These coefficients have two benefits. First, they are invariant to time-shifts and stable to time-warping deformations [25]. Second, they can effectively characterize the self-similarity and intermittency properties of multiscale time series [26]. These two benefits provide us with a better quantification of the EEG dynamics. Here, we briefly review the scattering transform, particularly the first- and second-order coefficients, which we use as the spectral features for the given EEG signal.

Take a signal $x$ defined on $[0, T]$. Let $\psi$ be an analytic wavelet, that is, a complex band-pass filter with support over the positive frequencies and central frequency 1. Denote the associated father wavelet by $\phi$, which is a low-pass filter. Let $Q \in \mathbb{N}$ be the number of wavelets per octave; that is, the bandwidth of $\hat\psi$ is of order $Q^{-1}$. To evaluate the wavelet coefficients, we dilate the mother wavelet $\psi$ as follows:

(1)  $\psi_j(t) = 2^{-j/Q}\,\psi(2^{-j/Q} t)$,

where $j \in \{\ldots, -1, 0, 1, \ldots, H\}$, and $H \in \mathbb{N}$ is the maximal wavelet scale, so that the largest-scale wavelet will be of bandwidth $T$. Clearly, $\hat\psi_j$ is centered at $2^{-j/Q}$ with bandwidth $2^{-j/Q} Q^{-1}$, and hence the wavelet family captures the frequencies above $2^{-H/Q}$. For the low-frequency region $[0, 2^{-H/Q}]$, we follow the suggestion in [25] and cover this region by $Q-1$ equally spaced filters, denoted $\psi_{H+1}, \ldots, \psi_{H+Q-1}$, with frequency bandwidth $2^{-H/Q} Q^{-1}$ centered at $l\,2^{-H/Q} Q^{-1}$, where $l = 1, \ldots, Q-1$. Again following [25], we also call $\psi_{H+1}, \ldots, \psi_{H+Q-1}$ wavelets. Denote $\Lambda = \{2^{-j/Q}\}_{j=-\infty}^{H} \cup \{l\,2^{-H/Q} Q^{-1}\}_{l=1}^{Q-1}$ to be the grid of center frequencies of all considered wavelets.

With the above preparation, we are ready to establish the scattering transform coefficients of the signal $x$. We start by collecting the ordinary wavelet transform coefficients $x \star \psi_{j_1}(t)$, where $\star$ is the convolution operator and $-\infty < j_1 \le H+Q-1$, and the moving average by the father wavelet, $x \star \phi_H(t)$, where

(2)  $\phi_H(t) := Q^{-1} 2^{-H/Q}\,\phi(Q^{-1} 2^{-H/Q} t)$.

Note that $\hat\phi_H$ has frequency bandwidth $2^{-H/Q} Q^{-1}$. Denote

(3)  $S_0 x(t) := x \star \phi_H(t)$.

[Figure 1 here. The flow chart shows, for each EEG channel, Part 1 (extract scattering EEG spectral features by the scattering transform) and Part 2 (determine intrinsic sleep features by the DM with the $L^2$ metric), followed by the fusion step (determine common intrinsic sleep features by the multiview DM).]

Figure 1. The flow chart of the proposed feature extraction steps and a visualization of the extracted features by the diffusion map and sensor fusion. The signal is from 20 subjects in Sleep-EDF SC* [17], where channel 1 is Fpz-Cz and channel 2 is Pz-Oz. In the bottom figure, only one of the multiview DMs is shown.
The ratios of the stages Awake, REM, N1, N2, and N3 are 19%, 18%, 7%, 42%, and 14%, respectively.

The quantity $S_0 x(t)$ in (3) is called the zeroth-order coefficients of the scattering transform. Clearly, $S_0 x(t)$ contains only the low-frequency information of the signal $x$. On the other hand, while the wavelet transform $x \star \psi_j(t)$ provides multi-scale spectral information about $x$ at the scale $2^{-j/Q}$, the scattering transform outputs translation-invariant coefficients by averaging the modulus of the complex coefficients $x \star \psi_j(t)$ via the following moving average step:

(4)  $S_1 x(t, j_1) := |x \star \psi_{j_1}| \star \phi_H$,

where $-\infty < j_1 \le H$. We call $S_1 x(t, j_1)$ the first-order coefficients of the scattering transform. The time averaging operator in (4) results in the loss of the fine-scale information in $|x \star \psi_{j_1}|$, such as vibratos and attacks [25]. To retrieve the lost information, we compute $|x \star \psi_{j_1}| \star \psi_{j_2}$. Again, to output translation-invariant coefficients, we apply the modulus and the moving average to $|x \star \psi_{j_1}| \star \psi_{j_2}$ via

(5)  $S_2 x(t, j_1, j_2) := ||x \star \psi_{j_1}| \star \psi_{j_2}| \star \phi_H$,

where $-\infty < j_1 < j_2 \le H$. Note that only the wavelets $\psi_{j_2}$ with larger scale, $j_2 > j_1$, are needed, since $|x \star \psi_{j_1}|$ has approximately the same frequency support as $\psi_{j_1}$ [25, (29)]. The outputs are called the second-order coefficients of the scattering transform of $x$. This iterative procedure can be extended to higher orders. For example, the third-order coefficients of the scattering transform of $x$ can be defined by evaluating $|||x \star \psi_{j_1}| \star \psi_{j_2}| \star \psi_{j_3}| \star \phi_H$, where $-\infty < j_1 < j_2 < j_3 \le H$. In this paper, only the first- and second-order coefficients are exploited. Next, we follow the suggestions in [25] to post-process the scattering transform coefficients.
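Before moving to the post-processing, the cascade (3)-(5) can be sketched numerically. The following is a toy illustration, not the authors' implementation: Gaussian band-pass filters supported on the positive frequencies stand in for the analytic wavelets $\psi_j$, a Gaussian low-pass stands in for $\phi_H$, and the grid of center frequencies and bandwidths is a hypothetical choice.

```python
import numpy as np

def analytic_filter(n, center, bw):
    """Band-pass filter supported on positive frequencies only
    (a crude stand-in for the analytic wavelet psi_j of eq. (1))."""
    f = np.fft.fftfreq(n)
    h = np.exp(-(f - center)**2 / (2 * bw**2))
    h[f <= 0] = 0.0
    return h

def lowpass(x, bw=1.0 / 128):
    """Gaussian low-pass standing in for the father wavelet phi_H."""
    f = np.fft.fftfreq(len(x))
    return np.real(np.fft.ifft(np.fft.fft(x) * np.exp(-f**2 / (2 * bw**2))))

def scattering(x, Q=2, H=8):
    """Zeroth-, first-, and second-order scattering coefficients, eqs. (3)-(5)."""
    n = len(x)
    centers = [0.25 * 2.0**(-j / Q) for j in range(H + 1)]  # dyadic grid
    psis = [analytic_filter(n, c, c / Q) for c in centers]
    S0 = lowpass(x)                                   # eq. (3)
    S1, S2 = {}, {}
    X = np.fft.fft(x)
    for j1 in range(H + 1):
        u1 = np.abs(np.fft.ifft(X * psis[j1]))        # |x * psi_{j1}|
        S1[j1] = lowpass(u1)                          # eq. (4)
        U1 = np.fft.fft(u1)
        for j2 in range(j1 + 1, H + 1):               # larger scales only
            u2 = np.abs(np.fft.ifft(U1 * psis[j2]))
            S2[(j1, j2)] = lowpass(u2)                # eq. (5)
    return S0, S1, S2

rng = np.random.default_rng(0)
S0, S1, S2 = scattering(rng.standard_normal(1024))
```

Note that, as in the text, the second-order loop only visits $j_2 > j_1$, so for $H = 8$ there are $\binom{9}{2} = 36$ second-order channels.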
To de-correlate the different orders [25], the first-order scattering coefficients are renormalized as follows:

(6)  $\tilde S_1 x(t, j_1) = \dfrac{|x \star \psi_{j_1}| \star \phi_H(t)}{|x| \star \phi_H(t) + \varepsilon}$,

where $\varepsilon = 2^{-20}$, and the second-order scattering coefficients are renormalized by the coefficients of the first order:

(7)  $\tilde S_2 x(t, j_1, j_2) = \dfrac{||x \star \psi_{j_1}| \star \psi_{j_2}| \star \phi_H(t)}{|x \star \psi_{j_1}| \star \phi_H(t) + \varepsilon}$,

where again $\varepsilon = 2^{-20}$. Similarly to the Mel-frequency cepstral coefficients, we follow [25] and apply a logarithm to $x \star \phi_H(t)$ and to the normalized coefficients, i.e., we use $\log \tilde S_1 x(t, j_1)$ and $\log \tilde S_2 x(t, j_1, j_2)$. According to [25], this step transfers multiplicative components into additive ones and improves the classification performance.

We now apply the scattering transform to extract features from the EEG signal. Following the R&K [3] and AASM [2] classifications, we take an epoch for the sleep stage evaluation to be 30 seconds long. Suppose the $k$-th epoch starts at a certain time $t_k$ and ends at $t_k + 30$. Let $x_k(t)$ be the 90s EEG segment over the period $[t_k - 60, t_k + 30]$; that is, we consider 60s of extra EEG signal before the epoch of interest, and hence the signal length is $T = 90$s. The EEG signal is sampled uniformly every $\tau$ seconds, where $\tau$ is the reciprocal of the sampling rate; for example, $\tau = 1/100$ for the Sleep-EDF Database. In this work, we take the wavelet $\psi$ to be the Morlet mother wavelet and take $Q = 2$. Numerically, we represent $x_k$ as a 9,000-dimensional vector and take $H = 17$, with $0 \le j_1 \le H$ for $S_1 x_k$ and $0 \le j_1 < j_2 \le H$ for $S_2 x_k$. Since $S_0 x_k$, $S_1 x_k$ and $S_2 x_k$ are redundant along the time axis, they are subsampled at the rate $2^{H/Q-1}$; that is, over half-overlapping time windows of size $2^{H/Q}$ centered at the points $l\,2^{H/Q-1}$, where $l \in \mathbb{N}$.
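The renormalization (6)-(7) and the log step above can be sketched as follows, operating on precomputed coefficient arrays. The inputs here are toy constants, and the extra $\varepsilon$ inside the logarithm is our own guard against $\log 0$, not something prescribed by the paper.

```python
import numpy as np

EPS = 2.0**(-20)

def renormalize_and_log(S0_abs, S1, S2):
    """Eqs. (6)-(7) followed by the log step.

    S0_abs : array, |x| * phi_H (low-passed modulus of the signal)
    S1     : dict, j1 -> |x * psi_j1| * phi_H
    S2     : dict, (j1, j2) -> ||x * psi_j1| * psi_j2| * phi_H
    """
    S1t = {j1: s / (S0_abs + EPS) for j1, s in S1.items()}              # eq. (6)
    S2t = {(j1, j2): s / (S1[j1] + EPS) for (j1, j2), s in S2.items()}  # eq. (7)
    # The log turns multiplicative components into additive ones [25];
    # the EPS added inside the log is our guard, not part of [25].
    logS1 = {k: np.log(v + EPS) for k, v in S1t.items()}
    logS2 = {k: np.log(v + EPS) for k, v in S2t.items()}
    return logS1, logS2

n = 8
logS1, logS2 = renormalize_and_log(
    np.full(n, 2.0), {0: np.full(n, 1.0)}, {(0, 1): np.full(n, 0.5)})
```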
The collection of these subsampled coefficients is concatenated and called the scattering EEG spectral feature of the $k$-th epoch, which we denote as $u^{(k)}$.

2.2. Feature extraction step, part 2: (multiview) diffusion map. In the optimal situation, the sleep dynamics at each epoch would be well captured by the scattering EEG spectral features. However, the scattering EEG spectral features might be erroneous due to the inevitable noise, other sensor-specific artifacts, and the information distortion caused by the observation procedure. We therefore stabilize these features, to better quantify the intrinsic sleep dynamics, by applying the DM. The DM is a nonlinear manifold learning algorithm that not only is robust to noise but also respects the nonlinear structure of the intrinsic dynamics. We detail its implementation below, together with its generalization, the multiview DM, which fuses information from multiple channels.

2.2.1. Extract intrinsic sleep dynamics by diffusion map. Take the scattering EEG spectral features $U_{x,k} := \{u^{(j)}\}_{j=1}^{J_k}$ generated from the single-channel EEG signal $x(t)$ of the $k$-th subject, where $k = 1, \ldots, K$; this is a point cloud in a Euclidean space. Denote $U_x := \cup_{k=1}^{K} U_{x,k} = \{u^{(j)}\}_{j=1}^{J}$, where $J = \sum_{k=1}^{K} J_k$ and $\{u^{(J_1+\cdots+J_{k-1}+1)}, \ldots, u^{(J_1+\cdots+J_k)}\}$ comes from $U_{x,k}$. First, from the point cloud $U_x$, we build a graph with $U_x$ as the vertices. The affinity between the features $u^{(i)}$ and $u^{(j)}$ is defined as

(8)  $W_x(i,j) = \exp\left(-\dfrac{\|u^{(i)} - u^{(j)}\|^2}{\epsilon}\right)$,

for $i, j = 1, \ldots, J$, where $\epsilon > 0$ is chosen by the user and $\|\cdot\|$ is the Euclidean distance comparing two scattering EEG spectral features. Thus, we have a $J \times J$ affinity matrix $W_x$. Next, define the degree matrix $D_x$ of size $J \times J$, which is diagonal, as

(9)  $D_x(i,i) = \sum_{j=1}^{J} W_x(i,j)$, for $i = 1, \ldots, J$.
With the matrices $W_x$ and $D_x$, we define a random walk on the point cloud $U_x$ with the transition matrix

(10)  $A_x := D_x^{-1} W_x$.

Since $A_x$ is similar to the symmetric matrix $D_x^{-1/2} W_x D_x^{-1/2}$, it has a complete set of right eigenvectors $\phi_1, \phi_2, \ldots, \phi_J \in \mathbb{R}^J$ with corresponding eigenvalues $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_J \ge 0$, where $\phi_1 = [1, 1, \ldots, 1]^\top \in \mathbb{R}^J$. Indeed, from the eigendecomposition $D_x^{-1/2} W_x D_x^{-1/2} = O \Lambda O^\top$, where $O$ is a $J \times J$ orthonormal matrix and $\Lambda$ is a $J \times J$ diagonal matrix, we have $A_x = U \Lambda V^\top$, where $U = D_x^{-1/2} O$ and $V = D_x^{1/2} O$. With the decomposition $A_x = U \Lambda V^\top$, the DM is defined as

(11)  $\Phi^x_t : u^{(j)} \mapsto \left(\lambda_2^t \phi_2(j),\ \lambda_3^t \phi_3(j),\ \ldots,\ \lambda_{\hat d + 1}^t \phi_{\hat d + 1}(j)\right)$,

where $j = 1, \ldots, J$, $t > 0$ is the diffusion time chosen by the user, and $\hat d$ is an estimate of the dimension of the intrinsic state. Here, $\hat d$ can be obtained from the spectral gap in the decay of the eigenvalues $\{\lambda_j\}_{j=1}^{J}$. In other words, $\Phi^x_t(u^{(j)})$ consists of the second to the $(\hat d + 1)$-st coordinates of $e_j^\top U \Lambda^t$, where $e_j$ is the unit $J$-dimensional vector with the $j$-th entry equal to 1. As a result, the scattering EEG spectral features $U_x$ are converted into a set of new features in $\mathbb{R}^{\hat d}$ via the DM. Denote $F_x := \{\Phi^x_t(u^{(j)})\}_{j=1}^{J} \subset \mathbb{R}^{\hat d}$. Therefore, we obtain the intrinsic sleep features for the recorded EEG signal $x$ of the $k$-th subject, $F_{x,k} := \{\Phi^x_t(u^{(J_1+\cdots+J_{k-1}+1)}), \ldots, \Phi^x_t(u^{(J_1+\cdots+J_k)})\}$.

Note that the diffusion process plays an essential role in the whole algorithm, since (11) is based on the diffusion nature of the random walk with the transition matrix (10).
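Equations (8)-(11) can be sketched in a few lines of numpy. This is a minimal illustration on synthetic clusters, not the paper's code; the bandwidth, diffusion time, and toy data are arbitrary choices.

```python
import numpy as np

def diffusion_map(U, eps, t=0.3, d_hat=2):
    """Eqs. (8)-(11): affinity, degree, random walk, truncated embedding.

    U : (J, p) array whose rows are scattering EEG spectral features.
    """
    D2 = ((U[:, None, :] - U[None, :, :])**2).sum(axis=-1)
    W = np.exp(-D2 / eps)                          # eq. (8)
    deg = W.sum(axis=1)                            # eq. (9)
    # Eigendecompose the symmetric conjugate of A = D^{-1} W, eq. (10).
    S = W / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]                 # descending eigenvalues
    vals = np.clip(vals[order], 0.0, None)         # guard tiny negatives
    phi = vecs[:, order] / np.sqrt(deg)[:, None]   # right eigenvectors of A
    # Eq. (11): drop the trivial first pair, keep d_hat coordinates.
    return phi[:, 1:d_hat + 1] * vals[1:d_hat + 1]**t

rng = np.random.default_rng(1)
# Two separated toy clusters standing in for two sleep stages.
U = np.vstack([rng.normal(0.0, 0.1, (20, 5)), rng.normal(2.0, 0.1, (20, 5))])
emb = diffusion_map(U, eps=1.0)
```

On such data the first diffusion coordinate separates the two clusters, mirroring how epochs of different sleep stages separate in Figure 1.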
Based on the spectral geometry results reported in [27] and the asymptotic spectral convergence results shown in [28, 29], we know that the Riemannian manifold hosting the inaccessible intrinsic sleep features is almost isometrically recovered by the DM (11). In addition to recovering the intrinsic state space, another important property of the DM is its robustness to noise. Essentially, the diffusion process can be viewed as a nonlinear "averaging" process. This viewpoint was first captured and proved in [30], and later extended to larger noise in [31]. Another important fact is that the embedding provided by the DM is unique only up to a global rotation, so it is not feasible to construct intrinsic sleep features for each subject separately and pool them together for the prediction purpose. The proposed algorithm therefore applies the DM to all scattering EEG spectral features from all subjects to construct the intrinsic sleep features, which live over a universal coordinate system that allows a comparison among different subjects. See Figure 1 for an example of the DM when $\hat d = 3$. It is clear that epochs of different sleep stages are clustered and well separated.

2.2.2. Fuse two channels via multiview DM. If only one EEG signal $x$ is available, we proceed to the learning step in Section 2.3 with the intrinsic sleep features $F_x$ for channel $x$. If we have two or more channels, we can take them into account simultaneously. In general, this problem is understood as the sensor fusion problem. While different EEG channels capture information from the same brain, the information recorded might vary, and it might be contaminated by brain-activity-irrelevant artifacts from different sources, including noise and other sensor-specific nuisances. These artifacts not only deteriorate the quality of the extracted features but might also mislead the analysis result.
The main purpose of sensor fusion is to distill the brain information and remove those unwanted artifacts. A naive way to combine information from $x$ and $y$ is to simply concatenate the features into a new feature. However, due to the inevitable sensor-specific artifacts or errors, such a concatenation scheme might not be the optimal route to fuse sensors [32, 33]. We consider the recently developed diffusion-based sensor fusion approach, the multiview DM [23], or equivalently the alternating DM [33], to fuse the sensor information. Essentially, the information from different sensors is "diffused" to integrate the common information, while artifacts or noise specific to each sensor are simultaneously eliminated. In this work, we focus on two simultaneously recorded EEG channels $x$ and $y$.

Here we detail the multiview DM algorithm. For each channel, we apply Sections 2.1 and 2.2 to obtain its intrinsic sleep features, denoted as $F_x \subset \mathbb{R}^{\hat d_x}$ and $F_y \subset \mathbb{R}^{\hat d_y}$ respectively, where $\hat d_x$ might differ from $\hat d_y$. Define

(12)  $W_{x,y} = \begin{pmatrix} 0_{J \times J} & W_x W_y \\ W_y W_x & 0_{J \times J} \end{pmatrix} \in \mathbb{R}^{2J \times 2J}$,

where $W_x$ and $W_y$ are the affinity matrices defined in (8), and define a diagonal matrix $D_{x,y} \in \mathbb{R}^{2J \times 2J}$ so that its $i$-th diagonal entry is the sum of the $i$-th row of $W_{x,y}$. Denote $q_i \in \mathbb{R}^{2J}$ to be the $i$-th right eigenvector of the transition matrix $D_{x,y}^{-1} W_{x,y}$ corresponding to the eigenvalue $\sigma_i$. Since, for each $i$, $q_i(l)$ and $q_i(J + l)$ correspond to the $l$-th epoch for each $l \in \{1, \ldots, J\}$, we call the $2\tilde d$-dimensional vector

(13)  $v_j := [\sigma_2^t q_2(j), \cdots, \sigma_{\tilde d+1}^t q_{\tilde d+1}(j), \sigma_2^t q_2(J+j), \cdots, \sigma_{\tilde d+1}^t q_{\tilde d+1}(J+j)]^\top$

the common intrinsic sleep feature associated with the sleep stage of the $j$-th epoch, where $j \in \{1, \ldots, J\}$. Denote $F_{x,y} := \{v_j\}_{j=1}^{J}$.
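The construction (12)-(13) can be sketched as follows. This is a toy illustration under our own choices of affinities and data; note that since $W_x$ and $W_y$ are symmetric, the block matrix $W_{x,y}$ is symmetric, which lets us diagonalize through a symmetric conjugate.

```python
import numpy as np

def multiview_dm(Wx, Wy, t=0.3, d_tilde=2):
    """Eqs. (12)-(13): diffusion on the bipartite graph of two channels."""
    J = Wx.shape[0]
    Wxy = np.zeros((2 * J, 2 * J))
    Wxy[:J, J:] = Wx @ Wy                  # eq. (12); since Wx, Wy are
    Wxy[J:, :J] = Wy @ Wx                  # symmetric, Wxy is symmetric too
    deg = Wxy.sum(axis=1)
    S = Wxy / np.sqrt(np.outer(deg, deg))  # conjugate of D^{-1} W_{x,y}
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals = np.clip(vals[order], 0.0, None)
    q = vecs[:, order] / np.sqrt(deg)[:, None]   # right eigenvectors q_i
    sig = vals[1:d_tilde + 1]**t
    # Eq. (13): stack the two halves of each eigenvector, epoch by epoch.
    return np.hstack([q[:J, 1:d_tilde + 1] * sig, q[J:, 1:d_tilde + 1] * sig])

def affinity(U, eps=1.0):
    """Gaussian affinity of eq. (8)."""
    D2 = ((U[:, None, :] - U[None, :, :])**2).sum(axis=-1)
    return np.exp(-D2 / eps)

rng = np.random.default_rng(2)
# Two toy "channels" observing the same two clusters with small differences.
base = np.vstack([rng.normal(0.0, 0.1, (10, 3)), rng.normal(2.0, 0.1, (10, 3))])
V = multiview_dm(affinity(base), affinity(base + rng.normal(0, 0.01, base.shape)))
```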
Hence, we obtain the common intrinsic sleep features for the simultaneously recorded EEG signals $x$ and $y$ of the $k$-th subject, $F_{x,y,k} := \{v_{J_1+\cdots+J_{k-1}+1}, \ldots, v_{J_1+\cdots+J_k}\}$.

Note that since the transition matrix $D_{x,y}^{-1} W_{x,y}$ is associated with the random walk on the bipartite graph formed from the epochs of $x$ and $y$, its eigenvectors "co-cluster" the two channels [34]. Hence, we can use its eigenvectors as features for the sleep stage. We mention that when there are only two channels, as in our case, the multiview DM is equivalent to the alternating DM [33], but when there are more than two channels, the multiview DM differs from the generalization of the alternating DM to multiple channels [35]. An illustration of the result of the multiview DM with $\tilde d = 3$ is shown at the bottom of Figure 1.

2.3. Learning step: sleep stage classification by the support vector machine. In this work, we choose the standard and widely used kernel SVM [24] for the learning step. The SVM finds a hyperplane in the feature space that separates the data set into two disjoint subsets. Based on the reproducing kernel Hilbert space theory, the SVM is generalized to the kernel SVM, which allows for classification when there is a nonlinear structure in the feature space. Specifically, the kernel SVM finds a nonlinear surface separating the data set into two disjoint subsets. We refer the interested reader to [24] for the technical details. In this work, we choose the radial basis function, $K(x, x') = \exp\left(-\frac{\|x - x'\|_2^2}{2\sigma^2}\right)$, where $\sigma > 0$, as the kernel function. In this sleep dynamics classification problem, the label is multi-class, so we further generalize the kernel SVM to the multi-class SVM. To this end, we apply the one-versus-all (OVA) classification scheme [36].
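As a concrete illustration of this learning step, here is a hypothetical sketch with scikit-learn, where toy Gaussian blobs stand in for the intrinsic sleep features. Note that scikit-learn parametrizes the radial basis function as $K(x,x') = \exp(-\gamma\|x-x'\|^2)$, so the bandwidth above corresponds to $\gamma = 1/(2\sigma^2)$.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy stand-ins for intrinsic sleep features of three "stages".
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.2, size=(30, 4)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 30)

# RBF-kernel SVM wrapped in the one-versus-all scheme: one binary
# classifier per class, each separating that class from the rest.
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma=0.5))
clf.fit(X, y)
train_acc = (clf.predict(X) == y).mean()
```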
2.4. Material. To evaluate the proposed algorithm, we consider the commonly used benchmark database, the Sleep-EDF Database [Expanded], from the public repository PhysioNet [17]. It contains two subsets (marked as SC* and ST*). The first subset, SC*, contains Fpz-Cz/Pz-Oz EEG signals recorded from 10 males and 10 females without any sleep-related medication; the age range is 25-34 years old. There are two approximately 20-hour recordings per subject, apart from a single subject for whom there is only one recording. The EEG signals were recorded during two subsequent day-night periods at the subjects' homes, with a sampling rate of 100 Hz. The second subset, ST*, was obtained in a 1994 study of the effects of temazepam on the sleep of subjects with mild difficulty falling asleep. It contains Fpz-Cz/Pz-Oz EEG signals recorded from 7 males and 15 females. Since this data set was originally used to study the effects of temazepam, the EEG signals were recorded in the hospital for two nights, one of which was after temazepam intake; only the placebo nights can be downloaded from [17]. The sampling rate is again 100 Hz. For both the SC* and ST* sets, each 30s epoch of EEG data has been annotated into the classes Awake, REM, N1, N2, N3 and N4. The epochs corresponding to movement and unknown stages were excluded, and the epochs labeled N4 were relabeled as N3 according to the AASM standard. For more details of the database, we refer the reader to https://www.physionet.org/physiobank/database/sleep-edfx/.

2.5. Statistics. To evaluate the performance of the proposed automatic sleep scoring algorithm, we consider the LOSOCV scheme.
For each database, each subject in turn is chosen as the testing set, and the remaining subjects form the training set. Specifically, when there are $n$ subjects, for the $\ell$-th subject we train the kernel SVM model using the common intrinsic sleep features $\cup_{k=1,\ldots,n,\ k\ne\ell}\, F_{x,y,k}$ with the provided labels, test the trained model on $F_{x,y,\ell}$, and report the results; the same procedure is applied for every subject $\ell = 1, \ldots, n$. Note that this LOSOCV scheme helps prevent overfitting in constructing the prediction model.

All performance measurements used in this paper are computed from the unnormalized confusion matrix $M \in \mathbb{R}^{5 \times 5}$. For $1 \le p, q \le 5$, the entry $M_{pq}$ represents the number of expert-assigned $p$-class epochs that were predicted as the $q$-class. The precision ($\mathrm{PR}_p$), recall ($\mathrm{RE}_p$), and F1-score ($\mathrm{F1}_p$) of the $p$-th class, where $p = 1, \ldots, 5$, are computed respectively through

(14)  $\mathrm{PR}_p = \dfrac{M_{pp}}{\sum_{q=1}^{5} M_{qp}}$,  $\mathrm{RE}_p = \dfrac{M_{pp}}{\sum_{q=1}^{5} M_{pq}}$,  $\mathrm{F1}_p = \dfrac{2\,\mathrm{PR}_p \cdot \mathrm{RE}_p}{\mathrm{PR}_p + \mathrm{RE}_p}$.

The overall accuracy (ACC), the macro F1 score (Macro-F1) and Cohen's kappa ($\kappa$) coefficient are computed respectively through

(15)  $\mathrm{ACC} = \dfrac{\sum_{p=1}^{5} M_{pp}}{\sum_{p,q=1}^{5} M_{pq}}$,  $\mathrm{Macro\text{-}F1} = \dfrac{1}{5}\sum_{p=1}^{5} \mathrm{F1}_p$,  $\kappa = \dfrac{\mathrm{ACC} - \mathrm{EA}}{1 - \mathrm{EA}}$,

where EA denotes the expected accuracy, defined by

(16)  $\mathrm{EA} = \dfrac{\sum_{p=1}^{5}\left(\sum_{q=1}^{5} M_{pq}\right)\left(\sum_{q=1}^{5} M_{qp}\right)}{\left(\sum_{p,q=1}^{5} M_{pq}\right)^2}$.

To evaluate whether two matched samples have the same mean, we apply the one-tailed Wilcoxon signed-rank test under the null hypothesis that the difference between the pairs follows a symmetric distribution around zero. When we compare variances, we apply the one-tailed F test under the null hypothesis that there is no difference between the variances. We consider a significance level of 0.05. To handle the multiple comparison issue, we apply the Bonferroni correction.
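The LOSOCV loop and the scores (14)-(16) can be sketched together. This uses scikit-learn's LeaveOneGroupOut splitter and toy arrays as illustrative stand-ins (not the authors' Matlab code, and not real EEG data).

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

def scores(M):
    """ACC, macro-F1 and Cohen's kappa from a confusion matrix, eqs. (14)-(16)."""
    M = np.asarray(M, dtype=float)
    PR = np.diag(M) / M.sum(axis=0)              # precision: column-normalized
    RE = np.diag(M) / M.sum(axis=1)              # recall: row-normalized
    F1 = 2 * PR * RE / (PR + RE)
    ACC = np.trace(M) / M.sum()
    EA = (M.sum(axis=1) * M.sum(axis=0)).sum() / M.sum()**2  # eq. (16)
    return ACC, F1.mean(), (ACC - EA) / (1 - EA)

# A perfectly diagonal confusion matrix gives ACC = macro-F1 = kappa = 1.
acc, mf1, kappa = scores(np.diag([10, 20, 30, 40, 50]))

# LOSOCV: each fold holds out every epoch of exactly one subject.
X = np.random.default_rng(0).normal(size=(12, 3))
y = np.tile([0, 1, 2, 3], 3)
subjects = np.repeat([1, 2, 3], 4)
for tr, te in LeaveOneGroupOut().split(X, y, groups=subjects):
    # Training epochs never share a subject with testing epochs.
    assert set(subjects[tr]).isdisjoint(subjects[te])
    SVC(kernel="rbf").fit(X[tr], y[tr]).predict(X[te])
```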
Results. In this section, we report the results of applying the proposed algorithm to the two databases described above. We show a visualization of the (common) intrinsic sleep features; note that this can be viewed as a dimension reduction of the EEG signals. The dynamics of the intrinsic sleep features are also shown and compared with the hypnogram. The confusion matrices of the automatic sleep stage classification on the different databases are shown, and a performance comparison with other reported results is also provided.

The parameters used in the numerical implementation are listed here. For DM and multiview DM, the $\epsilon$ in (8) is chosen to be the 1% percentile of pairwise distances, and the diffusion time is $t = 0.3$. Both $\hat{d}$ and $\tilde{d}$ are chosen to be 80. No systematic parameter optimization is performed. For reproducibility purposes, the Matlab code will be provided upon request.

3.1. Sleep-EDF database (SC*). We start by showing the visualization of the intrinsic sleep features and the common intrinsic sleep features from the 20 subjects in the Sleep-EDF SC* database. See the middle of Figure 1 for a visualization of each EEG channel by DM. In the bottom of Figure 1, we plot the embeddings $\{[q_2(i), q_3(i), q_4(i)]\}_{i=1}^{J}$ and $\{[q_2(i+J), q_3(i+J), q_4(i+J)]\}_{i=1}^{J}$ for a visualization of fusing the two EEG channels by the multiview DM. Clearly, the Awake, REM, N2 and N3 stages are well clustered in all plots, while N1 is less clustered and tends to mix with other stages. While the sleep dynamics can be easily visualized in Figure 1, it is not easy to visualize the temporal dynamics. For this purpose, we show the final intrinsic features expanded along the time line in Figure 2.

Figure 2. A visualization of the temporal dynamics in common intrinsic sleep features extracted by the multiview DM.
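As an illustration of the parameter choice above, a generic diffusion map with the kernel bandwidth set to the 1% percentile of pairwise distances can be sketched as follows. This is a plain DM sketch under standard normalization assumptions; it does not reproduce the exact kernel construction in (8) or the multiview extension:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh

def diffusion_map(X, t=0.3, n_coords=10, eps_percentile=1.0):
    """Generic diffusion map sketch: Gaussian affinity whose bandwidth is
    the 1% percentile of the pairwise distances, diffusion time t."""
    D = squareform(pdist(X))
    eps = np.percentile(D[D > 0], eps_percentile)
    W = np.exp(-(D / eps) ** 2)                 # Gaussian kernel (PSD)
    deg = W.sum(axis=1)
    A = W / np.sqrt(np.outer(deg, deg))        # symmetric conjugate of the Markov matrix
    vals, vecs = eigh(A)
    order = np.argsort(vals)[::-1][1:n_coords + 1]  # drop the trivial top eigenvector
    phi = vecs[:, order] / np.sqrt(deg)[:, None]    # right eigenvectors of the Markov matrix
    return phi * (vals[order] ** t)                 # diffusion coordinates at time t
```

In the paper's setting, the rows of `X` would be the scattering EEG spectral features of the 30 s epochs, and `n_coords` would be 80.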
The first intrinsic feature is colored in red, and the tenth intrinsic feature is colored in blue. The expert's labels are plotted on the top.

Since there are long periods of wakefulness at the start and the end of the recordings, when a subject is not sleeping, [19] only includes 30 minutes of such periods just before and after the sleep periods. To have a fair comparison, we also follow this truncation rule in this work. As a result, the labeled epochs are imbalanced, with 42.4% of the epochs labeled N2 and only 6.6% labeled N1. We run the leave-one-subject-out CV. The averaged confusion matrix of the proposed algorithm over the 20 subjects is shown in Table 1 for the single-channel case and Table 2 for the two-channel case. For the single-channel case (Fpz-Cz), the overall accuracy is 82.72%, the macro F1 is 75.91% and Cohen's kappa equals 76.10%. For the two-channel case, the overall accuracy is 84.44% and the macro F1 is 78.25%, with Cohen's kappa 78.36%. Note that the N1 prediction is relatively low compared with the other stages. In particular, its F1 is 46% and most N1 epochs are classified as REM or N2. This misclassification is related to the scattered N1 epochs that can be visually observed in Figure 1, and it is the main reason the overall accuracy and macro F1 are dragged down. We also note that N3 is commonly misclassified as N2, Awake is commonly misclassified as N1, and REM is commonly misclassified as N2. To further examine the performance, the resulting hypnogram of one subject is shown in Figure 3. Note that the discrepancy between the experts' annotations and the prediction frequently happens at a "stage transition". The transition of the sleep dynamics from one stage to another often happens in the middle of an epoch.
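The 30-minute truncation rule of [19] amounts to keeping 60 awake epochs (30 s each) on either side of the sleep period. A minimal sketch (the label string "Awake" is an illustrative placeholder):

```python
import numpy as np

def truncate_wake(labels, margin_epochs=60):
    """Return the epoch indices kept by the truncation rule: the sleep
    period plus at most `margin_epochs` (60 epochs = 30 min) of
    wakefulness immediately before and after it."""
    labels = np.asarray(labels)
    sleep = np.flatnonzero(labels != "Awake")  # indices of non-awake epochs
    start = max(sleep[0] - margin_epochs, 0)
    end = min(sleep[-1] + margin_epochs, len(labels) - 1)
    return np.arange(start, end + 1)
```

Setting `margin_epochs=180` would correspond to the 90-minute variant considered in Section 3.3.1.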
Thus, the epochs containing a sleep dynamics transition carry information that is not purely from one stage, and are hence harder to classify. In Table 3, we compare the performance of the proposed algorithm with several reported algorithms validated by the leave-one-subject-out CV scheme.

Table 1. Performance of DM with the scattering transform evaluated by 20-fold leave-one-subject-out cross-validation on the Fpz-Cz channel (top) and Pz-Oz channel (bottom) from the Sleep-EDF SC∗ database. For the Fpz-Cz channel, the overall accuracy equals 82.72%, the macro F1 score equals 75.91% and Cohen's kappa equals 76.10%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 8.90% (resp. 7.71% and 11.30%). For the Pz-Oz channel, the overall accuracy equals 80.99%, the macro F1 score equals 72.69% and Cohen's kappa equals 73.49%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 5.06% (resp. 5.99% and 6.97%). The training set for each testing subject consists of the recordings of the remaining 19 subjects during sleep.
Fpz-Cz           Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (19%)      6871 (87%)   298 (4%)     563 (7%)     171 (2%)      24 (0%)       83   87   85
REM (18%)        295 (4%)     6253 (81%)   378 (5%)     790 (10%)     1 (0%)        79   81   80
N1 (7%)          549 (20%)    616 (22%)    1040 (37%)   591 (21%)     8 (0%)        46   37   41
N2 (42%)         416 (2%)     692 (4%)     272 (2%)     15821 (89%)   598 (3%)      87   89   88
N3 (14%)         101 (2%)     13 (0%)      0 (0%)       872 (15%)     4717 (83%)    88   83   85

Pz-Oz            Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (19%)      6803 (86%)   420 (5%)     466 (6%)     220 (3%)      18 (0%)       84   86   85
REM (18%)        310 (4%)     6038 (78%)   350 (5%)     1019 (13%)    0 (0%)        77   78   78
N1 (7%)          794 (28%)    647 (23%)    755 (27%)    602 (22%)     6 (0%)        42   27   33
N2 (42%)         211 (1%)     703 (4%)     226 (1%)     15943 (90%)   716 (4%)      84   90   87
N3 (14%)         4 (0%)       9 (0%)       2 (0%)       1250 (22%)    4438 (78%)    86   78   82

3.2. Sleep-EDF database (ST*). See Figures 4 and 5 for the visualizations by DM and multiview DM of the ST* database. While the Awake, REM, N2 and N3 stages

Table 2. Performance of the multiview DM with the scattering transform evaluated by 20-fold leave-one-subject-out cross-validation on the Fpz-Cz and Pz-Oz channels from the Sleep-EDF SC∗ database. The overall accuracy equals 84.44%, the macro F1 score equals 78.25% and Cohen's kappa equals 78.36%. If the classification accuracy, macro F1 score, and Cohen's kappa are computed for each night recording, the standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 5.36% (resp. 6.40% and 8.01%). The training set for each testing subject consists of the recordings of the remaining 19 subjects during sleep.
Fpz-Cz+Pz-Oz     Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (19%)      7034 (89%)   148 (2%)     525 (7%)     197 (2%)      23 (0%)       90   89   90
REM (18%)        125 (2%)     6070 (79%)   528 (7%)     991 (13%)     3 (0%)        85   79   82
N1 (7%)          498 (18%)    436 (16%)    1218 (43%)   643 (23%)     9 (0%)        47   43   46
N2 (42%)         115 (1%)     492 (3%)     313 (2%)     16337 (92%)   542 (3%)      86   92   89
N3 (14%)         17 (0%)      0 (0%)       1 (0%)       921 (16%)     4764 (84%)    89   86   84

Figure 3. The resulting hypnogram of one subject from SC*. The gray curve is the expert's label, and the black dots are the predicted sleep stages. The discrepancies are emphasized by the black circles.

are still well clustered in all plots, compared with the normal subjects in the SC* database the separation is less clear. We run the leave-one-subject-out CV. The averaged confusion matrix of the proposed algorithm over the 22 subjects is shown in Table 4 for the single-channel case and Table 5 for the two-channel case. For the single-channel case (Fpz-Cz), the overall accuracy is 78.63%, the macro F1 is 73.58%, and Cohen's kappa is 69.48%. For the two-channel case, the overall accuracy is 79.05% and the macro F1 is 74.73%, with Cohen's kappa 70.31%. In this database, 10% of the epochs are labeled N1, which is slightly higher than in the SC∗ database. While the prediction performance of N1 is slightly higher (50% when we use the Fpz-Cz

Table 3. Comparison between the proposed algorithm and other sleep stage scoring methods in terms of the overall accuracy (ACC), macro F1 score (MF1) and Cohen's kappa. Note that the total number of epochs in the ST∗ dataset is 21076, but the performance of our algorithm is evaluated on 20988 epochs.
In fact, we do not predict the sleep stages of the first 3 epochs and the last epoch of each recording in the ST∗ dataset, because the features for each epoch are extracted from a 90 s signal. As a result, 21076 − 88 = 20988 epochs in the ST∗ dataset are evaluated.

Author | Channel (epochs) | Extracted features | Model / Algorithm | ACC \ MF1 (κ)
[19] | SC∗ Fpz-Cz (41950) | Raw data without any prior processing | Recurrent neural networks with class-balanced random sampling and stochastic gradient descent | 82.0% \ 76.9% (κ = 76%)
[19] | SC∗ Pz-Oz (41950) | (same as above) | (same as above) | 79.8% \ 73.1% (κ = 72%)
[16] | SC∗ Fpz-Cz (37022) | (a) Complex Morlet wavelets; (b) time-domain amplitude characteristics; (c) Pearson correlation coefficients between each pair of frequency-band signals; (d) autocorrelation of the time-domain signal for 0.5 s lags | Stacked sparse autoencoders with class-balanced random sampling | 78.9% \ 73.7%
[18] | SC∗ Fpz-Cz (37022) | Raw data without any prior processing | Convolutional neural networks with class-balanced random sampling and stochastic gradient descent | 74.8% \ 69.8%
Ours | SC∗ Fpz-Cz (41950) | Scattering spectrum | (Multiview) diffusion map + support vector machine | 82.72% \ 75.91% (κ = 76.10%)
Ours | SC∗ Pz-Oz (41950) | (same as above) | (same as above) | 80.99% \ 72.69% (κ = 73.49%)
Ours | SC∗ Fpz-Cz + Pz-Oz (41950) | (same as above) | (same as above) | 84.44% \ 78.25% (κ = 78.36%)
Ours | ST∗ Fpz-Cz (20988) | (same as above) | (same as above) | 78.63% \ 73.58% (κ = 69.48%)
Ours | ST∗ Pz-Oz (20988) | (same as above) | (same as above) | 75.74% \ 69.97% (κ = 65.39%)
Ours | ST∗ Fpz-Cz + Pz-Oz (20988) | (same as above) | (same as above) | 79.05% \ 74.73% (κ = 70.31%)

Figure 4.
A visualization of the intrinsic sleep features (from a single channel) extracted from the 22 subjects in the Sleep-EDF database (ST*). In the first subplot we plot $\{[q_2(i), q_3(i), q_4(i)]\}_{i=1}^{J}$, and in the second subplot we show $\{[q_2(i+J), q_3(i+J), q_4(i+J)]\}_{i=1}^{J}$. The ratios of the stages Awake, REM, N1, N2, and N3 are 10%, 20%, 10%, 45%, and 15% respectively. Each point corresponds to a 30-second epoch.

Figure 5. A visualization of the common intrinsic sleep features (from two channels) extracted from the 22 subjects in the Sleep-EDF database (ST*). In the first subplot we plot $\{[q_2(i), q_3(i), q_4(i)]\}_{i=1}^{J}$, and in the second subplot we show $\{[q_2(i+J), q_3(i+J), q_4(i+J)]\}_{i=1}^{J}$. The ratios of the stages Awake, REM, N1, N2, and N3 are 10%, 20%, 10%, 45%, and 15% respectively. Each point corresponds to a 30-second epoch.

channel and 52% when we use two channels), it is still relatively low. Also, note that the prediction performance of N3 is lower, and a significant portion of N3 epochs is misclassified as N2.

3.3. More comparisons.

3.3.1. Handling imbalanced data. An ideal approach to handling imbalanced data is to collect more data for the small-sized groups to enhance the prediction accuracy. In the SC∗ dataset, there were long periods of awake epochs before the start and

Table 4. Performance of DM with the scattering transform evaluated by 22-fold leave-one-subject-out cross-validation on the Fpz-Cz or Pz-Oz channel from the Sleep-EDF ST∗ database. For Fpz-Cz, the overall accuracy equals 78.63%, the macro F1 score equals 73.58% and Cohen's kappa equals 69.48%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 22 night recordings is 7.97% (resp. 8.11% and 10.25%).
For Pz-Oz, the overall accuracy equals 75.74%, the macro F1 score equals 69.97% and Cohen's kappa equals 65.39%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 22 night recordings is 7.24% (resp. 8.30% and 8.92%).

Fpz-Cz           Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (10%)      1726 (78%)   63 (3%)      364 (16%)    55 (3%)       5 (0%)        79   78   78
REM (20%)        84 (2%)      3196 (77%)   326 (8%)     518 (13%)     6 (0%)        86   77   82
N1 (10%)         327 (16%)    256 (13%)    1003 (49%)   444 (22%)     2 (0%)        50   49   50
N2 (45%)         51 (0%)      190 (2%)     307 (3%)     8502 (90%)    439 (5%)      81   90   85
N3 (15%)         10 (0%)      1 (0%)       7 (0%)       1034 (33%)    2093 (67%)    82   67   74

Pz-Oz            Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (10%)      1803 (81%)   38 (2%)      334 (15%)    37 (2%)       1 (0%)        74   81   78
REM (20%)        59 (2%)      3272 (79%)   204 (5%)     592 (14%)     3 (0%)        78   79   79
N1 (10%)         449 (22%)    383 (19%)    801 (39%)    395 (20%)     4 (0%)        49   39   44
N2 (45%)         110 (1%)     401 (4%)     282 (3%)     8176 (86%)    520 (6%)      79   86   82
N3 (15%)         5 (0%)       75 (3%)      9 (0%)       1195 (38%)    1861 (59%)    78   59   67

after the end of sleep that we can use. To further evaluate the algorithm, we consider longer periods of wakefulness just before and after the sleep periods. Apart from 2 recordings (sc4092e0 and sc4192e0), we included 90 minutes of awake periods before and after the sleep periods. For the sc4092e0 and sc4192e0 recordings, we only included 60 minutes of awake periods just before and after the sleep periods, due to artifacts (labeled MOVEMENT and UNKNOWN) at the start or the end of each recording. With more awake epochs, the corresponding confusion matrix is shown in Table 6.
In comparison with the result subject to the 30-minute truncation rule, the accuracy and F1 of Awake increase from 88% and 90% to 91% and 94% respectively, and the accuracy and F1 of N1 change from 43% and 46% to 63% and 43% respectively. Note that the F1 of N1 decreases because its precision decreases, due to the increased number of awake epochs and the increased number of epochs erroneously classified as N1.

Another approach is to handle the imbalanced data with the class-balanced random sampling scheme proposed in [16], under the same 30-minute truncation rule. Specifically, for each recording, we randomly sample K epochs per stage,

Table 5. Performance of the multiview DM with the scattering transform evaluated by 22-fold leave-one-subject-out cross-validation on the Fpz-Cz and Pz-Oz channels from the Sleep-EDF ST∗ database. The overall accuracy equals 79.05%, the macro F1 score equals 74.73% and Cohen's kappa equals 70.31%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 22 night recordings is 7.21% (resp. 7.58% and 9.38%).

Fpz-Cz+Pz-Oz     Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (10%)      1838 (83%)   19 (1%)      323 (15%)    29 (1%)       4 (0%)        79   83   81
REM (20%)        26 (1%)      3344 (81%)   288 (7%)     465 (11%)     7 (0%)        86   81   84
N1 (10%)         344 (17%)    244 (12%)    1059 (52%)   380 (19%)     5 (0%)        52   52   52
N2 (45%)         109 (1%)     253 (3%)     352 (4%)     8265 (87%)    510 (5%)      81   87   84
N3 (15%)         8 (0%)       10 (0%)      10 (0%)      1016 (32%)    2101 (67%)    80   67   73

where K is the number of epochs of the least represented stage (N1). We apply this class-balanced random sampling scheme before using the SVM to train on the common intrinsic sleep features. See Table 6 for details. Compared with Table 2, the accuracy of N1 improves from 43% to 76%, while the overall accuracy decreases to 81.12%.

3.3.2.
With and without the sensor fusion. To appreciate the significance of the diffusion geometry based sensor fusion framework, we report the results without the critical component of the proposed algorithm, the sensor fusion. We consider the case where we simply concatenate the intrinsic sleep features of the two channels instead of taking the common intrinsic sleep features; that is, we concatenate $\Phi_t^x(u^{(j)})$ and $\Phi_t^y(u^{(j)})$ in (11) directly to replace (13) when we train the kernel SVM model. We also compare the single EEG channel results with the fusion result; that is, we train the kernel SVM on the intrinsic sleep features extracted from the Fpz-Cz or Pz-Oz channel alone. The different comparisons on the SC∗ database are shown in Figure 6. The averaged ACC, MF1 and Cohen's kappa are consistently downgraded in these three cases. In Figure 6, we see that, compared with a single channel or sensor fusion by direct concatenation, the proposed sensor fusion of the two channels consistently improves the result with statistical significance, for both the mean and the variance of ACC, MF1 and Cohen's kappa. This fact reflects an essential property of the diffusion-based algorithms: via the diffusion-based sensor fusion, the intrinsic sleep features are "stabilized", and hence have a smaller variance.

4. Discussion and Conclusion

A diffusion geometry based sensor fusion framework is proposed to capture the geometric structure of the sleep dynamics. We take the scattering transform of the EEG signals as the initial feature and test the framework on the publicly available benchmark database. With the SVM learning algorithm, we obtain an accurate

Table 6.
Performance of the multiview DM with the scattering transform evaluated by 20-fold leave-one-subject-out cross-validation on the Fpz-Cz and Pz-Oz channels from the Sleep-EDF SC* database, with two different approaches to handling the imbalanced groups: (1) taking longer awake periods; (2) applying the class-balanced random sampling scheme proposed in [16]. For approach (1), the overall accuracy equals 83.69%, the macro F1 score equals 77.11% and Cohen's kappa equals 78.14%. If the classification accuracy, macro F1 score, and Cohen's kappa are computed for each night recording, the standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 7.23% (resp. 5.47% and 8.77%). For approach (2), the overall accuracy equals 81.12%, the macro F1 score equals 77.33% and Cohen's kappa equals 74.82%. If the classification accuracy, macro F1 score, and Cohen's kappa are computed for each night recording, the standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 6.29% (resp. 7.30% and 8.77%).
Longer awake periods
Fpz-Cz+Pz-Oz     Predicted                                                          Per-class metrics
                 Awake         REM          N1           N2            N3           PR   RE   F1
Awake (34%)      15478 (91%)   221 (1%)     1245 (7%)    76 (1%)       44 (0%)      97   91   94
REM (15%)        114 (2%)      5954 (77%)   1071 (14%)   570 (7%)      8 (0%)       80   77   79
N1 (5%)          167 (6%)      499 (18%)    1754 (63%)   351 (12%)     33 (1%)      33   63   43
N2 (35%)         130 (1%)      740 (4%)     1196 (6%)    14337 (81%)   1396 (8%)    91   81   85
N3 (11%)         18 (0%)       3 (0%)       44 (1%)      407 (7%)      5231 (92%)   78   92   84

Class-balanced random sampling scheme [16]
Fpz-Cz+Pz-Oz     Predicted                                                          Per-class metrics
                 Awake         REM          N1           N2            N3           PR   RE   F1
Awake (19%)      6444 (81%)    103 (1%)     1313 (17%)   31 (0%)       36 (1%)      96   81   88
REM (18%)        35 (1%)       5794 (75%)   1322 (17%)   563 (7%)      3 (0%)       87   75   81
N1 (7%)          186 (7%)      238 (8%)     2123 (76%)   242 (9%)      15 (0%)      33   76   46
N2 (42%)         58 (0%)       529 (3%)     1595 (9%)    14517 (82%)   1100 (6%)    92   82   86
N3 (14%)         15 (0%)       1 (0%)       65 (1%)      469 (8%)      5153 (91%)   82   90   86

prediction model, and the result is compatible with several state-of-the-art algorithms based on neural networks (NN). In addition, the chosen tools all have solid theoretical backing. Altogether, this demonstrates the usefulness of the diffusion geometry framework for the study of sleep dynamics. We mention that the proposed framework is flexible enough to study other physiological dynamics, not only the sleep dynamics. For example, its variations have been applied to study the f-wave in subjects with atrial fibrillation [37], the intra-cranial EEG signal [38], fetal electrocardiogram (ECG) extraction from the single-lead trans-abdominal maternal ECG signal [39], etc.

4.1. Dealing with misclassification of stage 1 sleep. Although our overall prediction accuracy is compatible with that of the state-of-the-art prediction algorithms, like [19], on the Sleep-EDF SC* database, we see that the prediction accuracy of N1

Figure 6.
Comparison between different information fusion methods in terms of the accuracy (ACC), macro F1 score (MF1) and Cohen's kappa for the SC* database. To evaluate whether the mean is improved, the one-tailed Wilcoxon signed-rank test is applied under the null hypothesis that the differences between the pairs follow a symmetric distribution around zero. To evaluate whether the variance is smaller, we apply the one-tailed F test under the null hypothesis that there is no difference between the variances. ⋆ (respectively ⋆⋆) means statistical significance without (respectively with) the Bonferroni correction when the means are compared; # (respectively ##) means statistical significance without (respectively with) the Bonferroni correction when the variances are compared.

is low by our algorithm. When there is only a single channel (Fpz-Cz) and the same number of epochs, the F1 of the N1 prediction is 41% by our method and 46.6% in [19, Table III], while the overall accuracy is 82.72% by our method and 82% in [19, Table III]. The prediction rate of N1 is also low in the Sleep-EDF ST* database. This low prediction rate comes partially from the relatively small number of N1 epochs, and partially from the available channels, which we discuss below.

Based on the AASM criteria [1, 2], to distinguish N1 from REM, we need electrooculogram and electromyogram signals, which are not available in the dataset. The EEG backgrounds of N1 and N2 are the same, and experts distinguish N1 from N2 by the K-complex or spindle, as well as the 3-minute rule. While we use a 90-second signal to establish the intrinsic sleep features, the 3-minute rule is not considered in the algorithm. In the proposed algorithm, the temporal information among epochs is not fully utilized when we design the intrinsic sleep features.
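The paired tests summarized in the caption of Figure 6 can be sketched with scipy. This is a sketch under our reading of the one-tailed alternatives; scipy has no built-in F test for variances, so it is assembled from the F distribution:

```python
import numpy as np
from scipy import stats

def paired_comparison(base, fused, n_tests=1, alpha=0.05):
    """One-tailed Wilcoxon signed-rank test (is `fused` shifted above
    `base`?) and one-tailed F test (is Var(base) larger than Var(fused)?),
    reported against the Bonferroni-corrected level alpha / n_tests."""
    base, fused = np.asarray(base, float), np.asarray(fused, float)
    p_mean = stats.wilcoxon(fused, base, alternative="greater").pvalue
    F = base.var(ddof=1) / fused.var(ddof=1)          # F statistic for variances
    p_var = stats.f.sf(F, len(base) - 1, len(fused) - 1)
    return {"p_mean": p_mean, "p_var": p_var,
            "corrected_level": alpha / n_tests}
```

Here `base` and `fused` would be the per-subject ACC (or MF1, or kappa) values of a baseline and of the proposed fusion, and `n_tests` the number of simultaneous comparisons entering the Bonferroni correction.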
How to incorporate the temporal information into the diffusion geometry framework will be explored in future work. Furthermore, there is other information beyond the spectral information discussed in this paper. We do not extensively explore all possible sources of information, but focus on the diffusion geometry and sensor fusion framework. For example, while the vertex sharp wave is a common "landmark" indicating the transition from N1 to N2, we do not take it into account. Another interesting indication that it is possible to improve the N1 accuracy comes from the deep neural network (DNN) results of [19]. They suggest that, by taking the experts' labels into account, some distinguishable EEG structure of N1 that is not captured by the scattering transform can be efficiently extracted by the DNN framework proposed in [19]. In conclusion, since the proposed features depend solely on the scattering transform, we may need features of different categories to capture this unseen N1 structure.

Note that besides N1, the prediction performance of N3 is also lower in the Sleep-EDF ST* database, where the subjects took temazepam before data recording. It is well known that benzodiazepine hypnotics [40] generally reduce low-frequency activity and enhance spindle-frequency activity. Since our features are mainly based on spectral information, an N3 epoch might look more like N2 epochs, hence the confusion and the lower performance. This outcome emphasizes the importance of the drug history when designing the algorithm.

4.2. Comparison with DNN. Compared with the DNN approach [19], which is supervised in nature, our feature extraction step is unsupervised. Recall that the main difference between supervised and unsupervised learning is that the label information is taken into account in the supervised learning approach.
The success of DNN in many fields is well known [41], and it is not surprising that it has great potential to help medical data analysis. While DNN is in general a useful tool for engineering purposes, it is often criticized for working as a black box. For medical problems and datasets, where interpretation is needed, a mathematically solid and interpretable tool is useful. The algorithm we propose, on the other hand, has a clear interpretation with solid mathematical support. Specifically, the scattering transform we choose has solid harmonic analysis support, and it was motivated by capturing the essence of CNNs. In this sense, our approach combines the "neural network" idea and diffusion geometry to design features for the sleep dynamics. Note that, besides knowing that the scattering transform allows us to capture the dynamical information from the spectral domain in a nonlinear way, we cannot claim to have a full interpretation of what physiological information the scattering transform conveys. This is a physiological problem to explore in future work.

Moreover, a peculiar property of medical databases, the "uncertainty", deserves more discussion. Take the sleep dynamics studied in this paper as an example. It is well known that the inter-expert agreement rate is only about 80% for normal subjects, let alone for subjects with sleep problems [13]. With this uncertainty, a supervised learning algorithm might learn both good and bad labels. On the other hand, the proposed feature extraction approach is independent of the provided labels; the chosen scattering features all come from the EEG signal and speak solely for the sleep dynamics, not for the labels. To some extent, the "uncertainty" issue is less critical in our feature extraction approach, since the uncertain labels are not taken into account in the feature extraction step.
Since both supervised and unsupervised approaches have their own merits, it is natural to seek a way to combine them. We are exploring the possibility of combining DNN with the proposed feature extraction techniques, and the results will be reported in future work.

4.3. Limitations and future work. Despite the strength of the proposed method, the discussion is not complete without mentioning its limitations. While we test the algorithm on two databases and compare our results with those of state-of-the-art algorithms, these databases are small. To draw a firm conclusion and confirm its clinical applicability, a large-scale and prospective study is needed. Although extensive theoretical understanding of the applied algorithms, including the scattering transform and the diffusion-based sensor fusion, has been established in the past decade, there are still open problems to explore. For example, there are open questions about the statistical properties of the scattering transform on EEG. While the alternating DM and the multiview DM are designed for the same "sensor fusion" purpose, they were developed under different motivations, and the consequences of this difference have not been discussed. The relationship between the alternating DM and the multiview DM when there are more than 2 channels is also not clear at this moment. It is well known that DM leads to an embedding determined only up to rotation, so aggregating intrinsic sleep features from different subjects is not a suitable approach. While the proposed approach, constructing a universal coordinate system by applying DM to the pool of all scattering EEG spectral features from all subjects, leads to a satisfactory result, in general we run into a computational issue. Finding a systematic numerical solution to this problem is beyond the scope of this work. The imbalanced dataset is a general challenge in data science.
Whether more physiological knowledge can be exploited to conquer this limitation is an interesting direction to explore. Another open problem is how to further take the temporal information, like the 3-minute rule, into account when we deal with the inter-individual prediction. The above-mentioned limitations will be studied and reported in future work.

5. Conclusion

A novel sleep stage prediction algorithm with accurate automatic sleep stage annotation is proposed. The main novelty is organizing the scattering EEG spectral features by the unsupervised diffusion geometry based sensor fusion framework, which has solid mathematical backing. In addition to providing a visualization of the sleep dynamics and of the relationships between different sleep stages, the prediction results are compatible with several state-of-the-art algorithms based on neural networks, which suggests the algorithm's potential in practical applications. The flexibility of the proposed sensor fusion framework allows researchers to further explore other research problems with complicated dynamics. For reproducibility purposes and for its application to other problems, the Matlab code will be provided upon request.

6. Funding

G.-R. Liu is supported by Ministry of Science and Technology (MOST) grant MOST 106-2115-M-006-016-MY2. Y.-L. Lo is supported by MOST grants MOST 101-2220-E-182A-001, 102-2220-E-182A-001, 103-2220-E-182A-001 and MOST 104-220-E-182-002. Y.-C. Sheu is supported by MOST grant MOST 106-2115-M-009-006 and NCTS, Taiwan.

References

[1] C. Iber, S. Ancoli-Israel, A. Chesson Jr., S. Quan, The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications, American Academy of Sleep Medicine, 2007.
[2] R. B. Berry, D. G.
Budhiraja, et al., Rules for scoring respiratory events in sleep: update of the 2007 AASM manual for the scoring of sleep and associated events, J Clin Sleep Med 8 (5) (2012) 597–619.
[3] A. Rechtschaffen, A. Kales, A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects, Washington: Public Health Service, US Government Printing Office, 1968.
[4] C. B. Saper, The neurobiology of sleep, Continuum 19 (1) (2013) 19–31.
[5] T. Kanda, N. Tsujino, E. Kuramoto, Y. Koyama, E. A. Susaki, S. Chikahisa, H. Funato, Sleep as a biological problem: an overview of frontiers in sleep research, The Journal of Physiological Sciences 66 (1) (2016) 1–13.
[6] Y. Harrison, J. A. Horne, The impact of sleep deprivation on decision making: a review, Journal of Experimental Psychology: Applied 6 (3) (2000) 236.
[7] A. Karni, D. Tanne, B. S. Rubenstein, J. J. Askenasy, D. Sagi, Dependence on REM sleep of overnight improvement of a perceptual skill, Science 265 (5172) (1994) 679–682.
[8] J.-E. Kang, M. M. Lim, R. J. Bateman, J. J. Lee, L. P. Smyth, J. R. Cirrito, N. Fujiki, S. Nishino, D. M. Holtzman, Amyloid-β dynamics are regulated by orexin and the sleep-wake cycle, Science 326 (Nov 13) (2009) 1005–1007.
[9] F. Roche Campo, X. Drouot, A. W. Thille, F. Galia, B. Cabello, M.-P. D'Ortho, L. Brochard, Poor sleep quality is associated with late noninvasive ventilation failure in patients with acute hypercapnic respiratory failure, Critical Care Medicine 38 (2) (2010) 477–485.
[10] J. Horne, L. Reyner, Vehicle accidents related to sleep: a review, Occupational and Environmental Medicine 56 (5) (1999) 289–294.
[11] D. Leger, V. Bayon, J. Laaban, P. Philip, Impact of sleep apnea on economics, Sleep Medicine Reviews 16 (5) (2012) 455–462.
[12] N. A. Collop, W. M. Anderson, B. Boehlecke, D. Claman, R. Goldberg, D. J. Gottlieb, D. Hudgel, M. Sateia, R.
Sch wab, Clinical guidelines for the use of unattended p ortable monitors in the diagnosis of obstructive sleep apnea in adult patients, J Clin Sleep Med 3 (7) (2007) 737–747. [13] R. G. Norman, I. Pal, C. Stewart, J. A. W alsleb en, D. M. Rapoport, Interobserver Agreement Among Sleep Scorers F rom Differen t Cen ters in a Large Dataset, Sleep 23 (7) (2000) 901–908. [14] S. Gudmundsson, T. Runarsson, S. Sigurdsson, Automatic Sleep Staging using Supp ort V ec- tor Machines with Posterior Probability Estimates, in: CIMCA-IA WTIC’06, 2006, pp. 2–7. [15] P . Memar, F. F aradji, A nov el multi-class EEG-based sleep stage classification system, IEEE T ransactions on Neural Systems and Rehabilitation Engineering 26 (1) (2018) 84–95. [16] O. Tsinalis, P . M. Matthews, Y. Guo, Automatic Sleep Stage Scoring Using Time-F requency Analysis and Stack ed Sparse Auto enco ders, Annals of Biomedical Engineering 44 (5) (2016) 1587–1597. [17] A. Goldb erger, L. Amaral, L. Glass, J. Hausdorff, P . Iv anov, R. Mark, J. Mietus, G. Mo ody , C.-K. Peng, H. Stanley , Ph ysiobank, physiotoolkit, and physionet: Comp onents of a new research resource for complex physiologic signals., Circulation 101 (23) (2000) e215–e220. [18] O. Tsinalis, P . M. Matthews, Y. Guo, S. Zafeiriou, Automatic sleep stage scoring with single- channel EEG using convolutional neural networks, [19] A. Supratak, H. Dong, C. W u, Y. Guo, DeepSleepNet: a mo del for automatic sleep stage scoring based on raw single-channel EEG, IEEE T rans. Neural. Syst. Rehabil. Eng. 25 (2017) 1998–2008. DIFFUSE TO FUSE EEG SPECTRA 23 [20] A. Vilamala, K. H. Madsen, L. K. Hansen, Deep Con volutional Neural Net works for In ter- pretable Analysis of EEG Sleep Stage Scoring, in: 2017 IEEE in ternational w orkshop on machine learning for signal pro cessing, 2017. [21] S. Mallat, Group inv arian t scattering, Comm unications on Pure and Applied Mathematics 65 (10) (2012) 1331–1398. [22] R. R. Coifman, S. Lafon, Diffusion maps, Appl. Comput. 
Harmon. Anal. 21 (1) (2006) 5–30. [23] O. Lindenbaum, A. Y eredor, M. Salhov, A. Av erbuch, MultiView Diffusion Maps, under review (2015). [24] B. Scholk opf, A. Smola, Learning with Kernels, MIT Press, 2002. [25] J. And´ en, S. Mallat, Deep scattering spectrum, IEEE T ransactions on Signal Processing 62 (16) (2014) 4114–4128. [26] J. Bruna, S. Mallat, E. Bacry , J. Muzy , et al., Intermitten t pro cess analysis with scattering moments, The Annals of Statistics 43 (1) (2015) 323–351. [27] P . B ´ erard, G. Besson, S. Gallot, Embedding Riemannian manifolds by their heat kernel, Geom. F unct. Anal. 4 (1994) 373–398. [28] U. v on Luxburg, M. Belkin, O. Bousquet, Consistency of sp ectral clustering, Ann. Stat. 36 (2) (2008) 555–586. [29] A. Singer, H.-T. W u, Sp ectral con vergence of the connection Laplacian from random samples, Information and Inference: A Journal of the IMA 6 (1) (2017) 58–123. [30] N. El Karoui, On information plus noise kernel random matrices, Ann. Stat. 38 (5) (2010) 3191–3216. [31] N. El Karoui, H.-T. W u, Connection graph Laplacian metho ds can b e made robust to noise, Ann. Stat. 44 (1) (2016) 346–372. [32] R. R. Lederman, R. T almon, Learning the geometry of common latent v ariables using alternating-diffusion, Appl. Comp. Harmon. Anal. in press (2015). [33] R. T almon, H.-T. W u, Discov ering a latent common manifold with alternating diffusion for multimodal sensor data analysis, Appl. Comput. Harmon. Anal., In pr ess . [34] I. S. Dhillon, Co-clustering do cuments and words using bipartite spectral graph partition- ing, in: Proceedings of the sev en th A CM SIGKDD international conference on Knowledge discov ery and data mining, ACM, 2001, pp. 269–274. [35] O. Katz, R. T almon, Y.-L. Lo, H.-T. W u, Diffusion-based nonlinear filtering for multimodal data fusion with application to sleep stage assessment, Information F usion, In pr ess . [36] R. Rifkin, A. 
Klautau, In Defense of One-Vs-All Classification, Journal of Machine Learning Research 5 (2004) 101–141. [37] J. Malik, N. Reed, C.-L. W ang, H.-T. W u, Single-lead f-w a ve extraction using diffusion ge- ometry , Physiological Measurement 38 (2017) 1310–1334. [38] S. Alagapan, H. W. Shin, F. F rohlich, H.-T. W u, Diffusion geometry approach to efficien tly re- mov e electrical stim ulation artifacts in in tracranial electro encephalography , Journal of Neural Engineering in press (2018) http://iopscience.iop.org/article/10.1088/1741- - 2552/ aaf2ba . [39] R. Li, M. G. F rasch, H.-T. W u, Efficient fetal-maternal ecg signal separation from t wo channel maternal ab dominal ecg via diffusion-based c hannel selection, F ron tiers in Ph ysiology 8 (2017) 277. [40] A. Borb´ ely , P . Mattmann, M. Loepfe, I. Strauch, D. Lehmann, Effect of b enzodiazepine hypnotics on all-night sleep EEG spectra, Hum Neurobiol 4 (3) (1985) 189–94. [41] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444. 24 G.-R. LIU, Y.-L. LO, J. MALIK, Y.-C. SHEU, AND H.-T. WU Dep ar tment of Ma thema tics, Na tional Chen-Kung University, T ainan, T aiw an Dep ar tment of Thoracic Medicine, Heal thcare Center, Chang Gung Memorial Hos- pit al, Chang Gung University, School of Medicine, New T aipei, T aiw an Dep ar tment of Ma thema tics, Duke University, Durham, NC, USA Dep ar tment of Applied Ma thema tics, Na tional Chiao Tung University, Hsinchu, T ai- w an Dep ar tment of Ma thema tics and Dep ar tment of St a tistical Science, Duke Univer- sity, Durham, NC, USA. Ma thema tics Division, Na tional Center for Theoretical Sci- ences, T aipei, T aiw an E-mail address : hauwu@math.duke.edu