Diffuse to fuse EEG spectra -- intrinsic geometry of sleep dynamics for classification


Authors: Gi-Ren Liu, Yu-Lun Lo, John Malik, Yuan-Chung Sheu, Hau-Tieng Wu

Abstract. We propose a novel algorithm for sleep dynamics visualization and automatic annotation by applying a diffusion geometry based sensor fusion algorithm to fuse spectral information from two electroencephalogram (EEG) channels. The diffusion geometry approach helps organize the nonlinear dynamical structure hidden in the EEG signal. The visualization is achieved by the nonlinear dimension reduction capability of the chosen diffusion geometry algorithms. For the automatic annotation purpose, a support vector machine is trained to predict the sleep stage. The prediction performance is validated on a publicly available benchmark database, PhysioNet Sleep-EDF [expanded] SC* (SC = Sleep Cassette) and ST* (ST = Sleep Telemetry), with leave-one-subject-out cross validation. With a single EEG channel (Fpz-Cz), the overall accuracy, macro F1 and Cohen's kappa reach 82.72%, 75.91% and 76.1% respectively on Sleep-EDF SC*, and 78.63%, 73.58% and 69.48% on Sleep-EDF ST*. This performance is compatible with state-of-the-art results. With two EEG channels (Fpz-Cz and Pz-Oz), the overall accuracy, macro F1 and Cohen's kappa reach 84.44%, 78.25% and 78.36% respectively on Sleep-EDF SC*, and 79.05%, 74.73% and 70.31% on Sleep-EDF ST*. The results suggest the potential of the proposed algorithm in practical applications.

1. Introduction

Sleep is a universal, recurring dynamical and physiological activity in mammals.
According to the American Academy of Sleep Medicine (AASM) [1, 2], which generalizes the Rechtschaffen and Kales (R&K) criteria [3], sleep dynamics can be divided into two broad stages, rapid eye movement (REM) and non-rapid eye movement (NREM), and the NREM stage is further divided into shallow sleep (stages N1 and N2) and deep sleep (stage N3). Up to now, we have accumulated a lot of physiological knowledge about sleep dynamics [4], and much research is actively carried out to explore the unknowns [5]. Despite those unknowns, it is well known that a distortion of sleep dynamics can lead to catastrophic outcomes. For example, sleep deprivation impacts decision making [6], REM disturbance slows down perceptual skill improvement [7], deprivation of slow wave sleep is associated with Alzheimer's disease [8], insufficient N2 sleep is associated with weaning failure [9], several public disasters have been caused by low sleep quality [10, 11], etc. Therefore, in addition to the interest stemming from physiological aspects, knowing the sleep dynamics has many clinical applications.

Polysomnography (PSG) is the gold standard for evaluating sleep dynamics [12]. However, scoring the overnight sleep stages from the PSG outputs by human experts is time consuming and error-prone due to the huge signal load [13]. With the advance of technology and computational power, in the past decades a lot of effort has been devoted to establishing an artificial intelligence (AI) system for automatic sleep stage annotation. In addition to being able to accurately score sleep stages, an ideal AI system should also be able to "self-learn" or "accumulate knowledge" from the historical database. In other words, when the database with experts' annotations grows, an ideal AI system should be able to use those new annotations.
It is worth mentioning that this challenge is actually ubiquitous: for a newly arriving subject, how may one utilize the existing database with its expert annotations?

1.1. Related Work. There have been many proposed algorithms for automatic sleep stage scoring. Among the many papers, we shall distinguish two common cross validation (CV) schemes. According to whether the training data and the testing data come from different subjects, the literature is divided into two groups, leave-one-subject-out and non-leave-one-subject-out CV. When the validation set and the training set are determined on the subject level, that is, the training set and the validation set contain different subjects, we call it the leave-one-subject-out CV (LOSOCV) scheme; otherwise we call it the non-LOSOCV scheme. The main challenge of the LOSOCV scheme comes from inter-individual variability, but this scheme is close to the real-world scenario: predicting the sleep dynamics of a newly arrived subject from a given annotated database. On the other hand, in the non-LOSOCV scheme, the training set and the testing set are dependent, and the performance might be over-estimated. In this work, to better evaluate the proposed algorithm under the inter-individual variability issue, we focus on the LOSOCV scheme, and we only summarize papers considering the LOSOCV scheme and single- or two-channel EEG signals.

In [14], time-domain and frequency-domain features are designed to predict the awake, REM, light NREM (N1 and N2 combined) and deep sleep stages by a support vector machine. The performance is evaluated on a private database with 4 subjects. In [15], 104 features are extracted from the EEG signals and selected features are used to classify 4 sleep stages (N1 and N2 combined) by random forests. The following papers classify 5 sleep stages, as does our work.
In [16], time-domain and frequency-domain features are used as the input layer of a stacked sparse autoencoder, a specific type of neural network model with the sigmoid activation function. The performance of the stacked sparse autoencoder was evaluated on the publicly available benchmark database Sleep-EDF SC* [17], and the overall accuracy was 78.9%. Instead of extracting features based on domain knowledge, the features in [18] are automatically learned by convolutional neural networks (CNNs). The overall accuracy was 74.8% on the same Sleep-EDF SC* database. In [19], the authors also proposed a deep learning model, named DeepSleepNet, which utilizes CNNs to extract features and applies a bidirectional Long Short-Term Memory network to learn the stage transition rules from EEG epochs. The DeepSleepNet algorithm reaches the state-of-the-art 82.0% overall accuracy on the Sleep-EDF SC* database. In [20], a similar approach based on a deep CNN with modifications is considered and achieves a comparable result.

1.2. Our contribution. The main novelty and contribution of this paper is the diffusion geometry based feature extraction when there is only one electroencephalogram (EEG) channel, and the diffusion geometry based sensor fusion when there are two EEG channels. The goal is to extract the intrinsic information hidden in the EEG. This novel diffusion geometry approach addresses the above-mentioned challenge: an accurate automatic sleep stage scoring AI system. The proposed algorithm is composed of a feature extraction part and a learning part. The feature extraction part is based on a combination of two modern signal processing tools: the scattering transform [21], together with the diffusion map (DM) [22] when only a single channel is available, or a diffusion-based sensor fusion (the multiview DM [23]) when two channels are available.
The scattering transform is a nonlinear signal processing tool, motivated by the study of the convolutional neural network (CNN) and equipped with solid theoretical support, which allows us to extract useful features from the EEG signal. We mention that, while it is motivated by studying CNNs, labels are not used in the scattering transform, so it is an unsupervised learning tool. We call the extracted features scattering EEG spectral features. The scattering EEG spectral features, however, are in general high dimensional and differ from the intrinsic sleep dynamics. This difference comes from various sources, including, for example, the inevitable noise and artifacts. Thus, an extra step is needed to reduce the dimension and refine/reorganize the scattering EEG spectral features, which leads to the final intrinsic features for the sleep dynamics. We suggest the DM [22] or diffusion-based sensor fusion algorithms, like the multiview diffusion [23]. In short, the scattering EEG spectral features extracted by the scattering transform are re-organized by modern diffusion geometry based sensor fusion algorithms. We call the output features of the DM/diffusion-based sensor fusion the intrinsic sleep dynamical features. The learning part of the algorithm, for the prediction purpose, is based on the kernel support vector machine (SVM) [24]. The prediction model for each testing subject is established by the kernel SVM based on the intrinsic sleep dynamical features and the annotations of all remaining subjects in the database.
To show the performance of the proposed algorithm and have a fair comparison with reported results, we consider the publicly available benchmark databases, the PhysioNet Sleep-EDF SC* (SC = Sleep Cassette) database [17], which consists of healthy subjects without any sleep-related medication, and the PhysioNet Sleep-EDF ST* (ST = Sleep Telemetry) database [17], which consists of subjects who had mild difficulty falling asleep. We mention that, compared with the existing neural network (NN) based algorithms mentioned in Section 1.1, the proposed algorithm, including the feature extraction part and the learning part, has solid theoretical backing. In addition, later in this paper we show that the proposed algorithm performs better than, or at least on par with, the existing reported results based on NN approaches when there is a single EEG channel, and the performance improves if we fuse information from two EEG channels.

1.3. Organization of the paper. The rest of this paper is organized as follows. In Section 2, we introduce the proposed algorithm and the databases on which we evaluate it. In Section 2.1 we summarize the theoretical background of the scattering transform and introduce the scattering EEG spectrum as the dynamical features. In Section 2.2, we describe how to apply diffusion geometry to integrate and fuse the dynamical features of Section 2.1, and the prediction model based on the SVM is summarized in Section 2.3. In Section 2.4, we describe the two databases, the subsets of the Sleep-EDF Database [Expanded] from PhysioNet. The results of applying the proposed algorithm to the two databases are shown in Section 3.

2. Material and Method

The main novelty of the proposed algorithm is the feature extraction, which depends on the diffusion geometry based sensor fusion tools, the DM and the multiview DM. The feature extraction consists of two steps.
First, we apply the scattering transform to extract features from the EEG signal (indicated by Part 1 in Figure 1). Second, we apply the DM or the multiview DM (indicated by Parts 2-1 and 2-2 in Figure 1) with a properly designed metric to determine the final features. With the proposed features, we take the well-established SVM to build the prediction model. Below, we detail the algorithm implementation step by step.

2.1. Feature extraction step, part 1: scattering transform. In this paper, we apply the recently developed scattering transform [21] to extract spectral features [25] from a given EEG signal. The scattering transform is motivated by establishing a mathematical foundation for the convolutional neural network. It computes coefficients of the given EEG signal by iteratively applying wavelet transforms and nonlinear modulus operators. These coefficients have two benefits. First, they are invariant to time-shifts and stable to time-warping deformations [25]. Second, they can effectively characterize the self-similarity and intermittency properties of multiscale time series [26]. These two benefits provide us with a better quantification of the EEG dynamics. Here, we briefly review the scattering transform, particularly the first- and second-order coefficients, which we use as the spectral features for the given EEG signal.

Take a signal $x$ defined on $[0, T]$. Let $\psi$ be an analytic wavelet, that is, a complex band-pass filter with support over the positive frequencies and central frequency 1. Denote the associated father wavelet by $\phi$, which is a low-pass filter. Let $Q \in \mathbb{N}$ be the number of wavelets per octave; that is, the bandwidth of $\hat\psi$ is of order $Q^{-1}$. To evaluate the wavelet coefficients, we dilate the mother wavelet $\psi$ as follows:

(1)  $\psi_j(t) = 2^{-j/Q}\,\psi(2^{-j/Q} t)$,

where $j \in \{\ldots, -1, 0, 1, \ldots, H\}$, and $H \in \mathbb{N}$ is the maximal wavelet scale, so that the largest-scale wavelet will be of bandwidth $T$. Clearly, $\hat\psi_j$ is centered at $2^{-j/Q}$ with bandwidth $2^{-j/Q} Q^{-1}$, and hence the wavelet family captures the frequencies above $2^{-H/Q}$. For the low-frequency region $[0, 2^{-H/Q}]$, we follow the suggestion in [25] and cover this region by $Q-1$ equally spaced filters, denoted $\psi_{H+1}, \ldots, \psi_{H+Q-1}$, with frequency bandwidth $2^{-H/Q} Q^{-1}$ centered at $l\,2^{-H/Q} Q^{-1}$, where $l = 1, \ldots, Q-1$. Again following [25], we also call $\psi_{H+1}, \ldots, \psi_{H+Q-1}$ wavelets. Denote $\Lambda = \{2^{-j/Q}\}_{j=-\infty}^{H} \cup \{l\,2^{-H/Q} Q^{-1}\}_{l=1}^{Q-1}$ to be the grid of center frequencies of all considered wavelets.

With the above preparation, we are ready to establish the scattering transform coefficients of the signal $x$. We start by collecting the ordinary wavelet transform coefficients $x \star \psi_{j_1}(t)$, where $\star$ is the convolution operator and $-\infty < j_1 \le H+Q-1$, and the moving average by the father wavelet, $x \star \phi_H(t)$, where

(2)  $\phi_H(t) := Q^{-1} 2^{-H/Q}\,\phi(Q^{-1} 2^{-H/Q} t)$.

Note that $\hat\phi_H$ has frequency bandwidth $2^{-H/Q} Q^{-1}$. Denote

(3)  $S_0 x(t) := x \star \phi_H(t)$.

[Figure 1 here. The flow chart shows, for each EEG channel, Part 1 (extract scattering EEG spectral features by the scattering transform) and Part 2 (determine intrinsic sleep features by the DM with the $L^2$ metric), followed by the fusion step (determine common intrinsic sleep features by the multiview DM).]

Figure 1. The flow chart of the proposed feature extraction steps and a visualization of the extracted features by the diffusion map and sensor fusion. The signal is from 20 subjects in Sleep-EDF SC* [17], where channel 1 is Fpz-Cz and channel 2 is Pz-Oz. In the bottom figure, only one of the multiview DMs is shown.
The ratios of the stages Awake, REM, N1, N2, and N3 are 19%, 18%, 7%, 42%, and 14%, respectively.

The quantity $S_0 x(t)$ in (3) is called the zeroth-order coefficients of the scattering transform. Clearly, $S_0 x(t)$ contains only the low-frequency information of the signal $x$. On the other hand, while the wavelet transform $x \star \psi_j(t)$ provides multi-scale spectral information about $x$ at the scale $2^{-j/Q}$, the scattering transform outputs translation-invariant coefficients by averaging the modulus of the complex coefficients $x \star \psi_j(t)$ via the following moving average step:

(4)  $S_1 x(t, j_1) := |x \star \psi_{j_1}| \star \phi_H$,

where $-\infty < j_1 \le H$. We call $S_1 x(t, j_1)$ the first-order coefficients of the scattering transform. The time averaging operator in (4) results in the loss of the fine-scale information in $|x \star \psi_{j_1}|$, such as vibratos and attacks [25]. To retrieve the lost information, we compute $|x \star \psi_{j_1}| \star \psi_{j_2}$. Again, to output translation-invariant coefficients, we apply the modulus and the moving average to $|x \star \psi_{j_1}| \star \psi_{j_2}$ via

(5)  $S_2 x(t, j_1, j_2) := ||x \star \psi_{j_1}| \star \psi_{j_2}| \star \phi_H$,

where $-\infty < j_1 < j_2 \le H$. Note that only the wavelets $\psi_{j_2}$ with larger scale, $j_2 > j_1$, are needed, since $|x \star \psi_{j_1}|$ has approximately the same frequency support as $\psi_{j_1}$ [25, (29)]. The outputs are called the second-order coefficients of the scattering transform of $x$. This iterative procedure can be extended to higher orders. For example, the third-order coefficients of the scattering transform of $x$ can be defined by evaluating $|||x \star \psi_{j_1}| \star \psi_{j_2}| \star \psi_{j_3}| \star \phi_H$, where $-\infty < j_1 < j_2 < j_3 \le H$. In this paper, only the first- and second-order coefficients are exploited. Next, we follow the suggestions in [25] to post-process the scattering transform coefficients.
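Before moving to the post-processing, the cascade (3)-(5) can be sketched numerically. The following is a toy illustration, not the authors' implementation: Gaussian band-pass filters supported on the positive frequencies stand in for the analytic wavelets $\psi_j$, a Gaussian low-pass stands in for $\phi_H$, and the grid of center frequencies and bandwidths is a hypothetical choice.

```python
import numpy as np

def analytic_filter(n, center, bw):
    """Band-pass filter supported on positive frequencies only
    (a crude stand-in for the analytic wavelet psi_j of eq. (1))."""
    f = np.fft.fftfreq(n)
    h = np.exp(-(f - center)**2 / (2 * bw**2))
    h[f <= 0] = 0.0
    return h

def lowpass(x, bw=1.0 / 128):
    """Gaussian low-pass standing in for the father wavelet phi_H."""
    f = np.fft.fftfreq(len(x))
    return np.real(np.fft.ifft(np.fft.fft(x) * np.exp(-f**2 / (2 * bw**2))))

def scattering(x, Q=2, H=8):
    """Zeroth-, first-, and second-order scattering coefficients, eqs. (3)-(5)."""
    n = len(x)
    centers = [0.25 * 2.0**(-j / Q) for j in range(H + 1)]  # dyadic grid
    psis = [analytic_filter(n, c, c / Q) for c in centers]
    S0 = lowpass(x)                                   # eq. (3)
    S1, S2 = {}, {}
    X = np.fft.fft(x)
    for j1 in range(H + 1):
        u1 = np.abs(np.fft.ifft(X * psis[j1]))        # |x * psi_{j1}|
        S1[j1] = lowpass(u1)                          # eq. (4)
        U1 = np.fft.fft(u1)
        for j2 in range(j1 + 1, H + 1):               # larger scales only
            u2 = np.abs(np.fft.ifft(U1 * psis[j2]))
            S2[(j1, j2)] = lowpass(u2)                # eq. (5)
    return S0, S1, S2

rng = np.random.default_rng(0)
S0, S1, S2 = scattering(rng.standard_normal(1024))
```

Note that, as in the text, the second-order loop only visits $j_2 > j_1$, so for $H = 8$ there are $\binom{9}{2} = 36$ second-order channels.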
To de-correlate the different orders [25], the first-order scattering coefficients are renormalized as follows:

(6)  $\tilde S_1 x(t, j_1) = \dfrac{|x \star \psi_{j_1}| \star \phi_H(t)}{|x| \star \phi_H(t) + \varepsilon}$,

where $\varepsilon = 2^{-20}$, and the second-order scattering coefficients are renormalized by the coefficients of the first order:

(7)  $\tilde S_2 x(t, j_1, j_2) = \dfrac{||x \star \psi_{j_1}| \star \psi_{j_2}| \star \phi_H(t)}{|x \star \psi_{j_1}| \star \phi_H(t) + \varepsilon}$,

where again $\varepsilon = 2^{-20}$. Similarly to the Mel-frequency cepstral coefficients, we follow [25] and apply a logarithm to $x \star \phi_H(t)$ and to the normalized coefficients, i.e., we use $\log \tilde S_1 x(t, j_1)$ and $\log \tilde S_2 x(t, j_1, j_2)$. According to [25], this step transfers multiplicative components into additive ones and improves the classification performance.

We now apply the scattering transform to extract features from the EEG signal. Following the R&K [3] and AASM [2] classifications, we take an epoch for the sleep stage evaluation to be 30 seconds long. Suppose the $k$-th epoch starts at a certain time $t_k$ and ends at $t_k + 30$. Let $x_k(t)$ be the 90s EEG segment over the period $[t_k - 60, t_k + 30]$; that is, we consider 60s of extra EEG signal before the epoch of interest, and hence the signal length is $T = 90$s. The EEG signal is sampled uniformly every $\tau$ seconds, where $\tau$ is the reciprocal of the sampling rate; for example, $\tau = 1/100$ for the Sleep-EDF Database. In this work, we take the wavelet $\psi$ to be the Morlet mother wavelet and take $Q = 2$. Numerically, we represent $x_k$ as a 9,000-dimensional vector and take $H = 17$, with $0 \le j_1 \le H$ for $S_1 x_k$ and $0 \le j_1 < j_2 \le H$ for $S_2 x_k$. Since $S_0 x_k$, $S_1 x_k$ and $S_2 x_k$ are redundant along the time axis, they are subsampled at the rate $2^{H/Q-1}$; that is, over half-overlapping time windows of size $2^{H/Q}$ centered at the points $l\,2^{H/Q-1}$, where $l \in \mathbb{N}$.
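The renormalization (6)-(7) and the log step above can be sketched as follows, operating on precomputed coefficient arrays. The inputs here are toy constants, and the extra $\varepsilon$ inside the logarithm is our own guard against $\log 0$, not something prescribed by the paper.

```python
import numpy as np

EPS = 2.0**(-20)

def renormalize_and_log(S0_abs, S1, S2):
    """Eqs. (6)-(7) followed by the log step.

    S0_abs : array, |x| * phi_H (low-passed modulus of the signal)
    S1     : dict, j1 -> |x * psi_j1| * phi_H
    S2     : dict, (j1, j2) -> ||x * psi_j1| * psi_j2| * phi_H
    """
    S1t = {j1: s / (S0_abs + EPS) for j1, s in S1.items()}              # eq. (6)
    S2t = {(j1, j2): s / (S1[j1] + EPS) for (j1, j2), s in S2.items()}  # eq. (7)
    # The log turns multiplicative components into additive ones [25];
    # the EPS added inside the log is our guard, not part of [25].
    logS1 = {k: np.log(v + EPS) for k, v in S1t.items()}
    logS2 = {k: np.log(v + EPS) for k, v in S2t.items()}
    return logS1, logS2

n = 8
logS1, logS2 = renormalize_and_log(
    np.full(n, 2.0), {0: np.full(n, 1.0)}, {(0, 1): np.full(n, 0.5)})
```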
The collection of these subsampled coefficients is concatenated and called the scattering EEG spectral feature of the $k$-th epoch, which we denote as $u^{(k)}$.

2.2. Feature extraction step, part 2: (multiview) diffusion map. In the optimal situation, the sleep dynamics at each epoch would be well captured by the scattering EEG spectral features. However, the scattering EEG spectral features might be erroneous due to the inevitable noise, other sensor-specific artifacts, and the information distortion caused by the observation procedure. We therefore stabilize these features, to better quantify the intrinsic sleep dynamics, by applying the DM. The DM is a nonlinear manifold learning algorithm that not only is robust to noise but also respects the nonlinear structure of the intrinsic dynamics. We detail its implementation below, together with its generalization, the multiview DM, which fuses information from multiple channels.

2.2.1. Extract intrinsic sleep dynamics by diffusion map. Take the scattering EEG spectral features $U_{x,k} := \{u^{(j)}\}_{j=1}^{J_k}$ generated from the single-channel EEG signal $x(t)$ of the $k$-th subject, where $k = 1, \ldots, K$; this is a point cloud in a Euclidean space. Denote $U_x := \cup_{k=1}^{K} U_{x,k} = \{u^{(j)}\}_{j=1}^{J}$, where $J = \sum_{k=1}^{K} J_k$ and $\{u^{(J_1+\cdots+J_{k-1}+1)}, \ldots, u^{(J_1+\cdots+J_k)}\}$ comes from $U_{x,k}$. First, from the point cloud $U_x$, we build a graph with $U_x$ as the vertices. The affinity between the features $u^{(i)}$ and $u^{(j)}$ is defined as

(8)  $W_x(i,j) = \exp\left(-\dfrac{\|u^{(i)} - u^{(j)}\|^2}{\epsilon}\right)$,

for $i, j = 1, \ldots, J$, where $\epsilon > 0$ is chosen by the user and $\|\cdot\|$ is the Euclidean distance comparing two scattering EEG spectral features. Thus, we have a $J \times J$ affinity matrix $W_x$. Next, define the degree matrix $D_x$ of size $J \times J$, which is diagonal, as

(9)  $D_x(i,i) = \sum_{j=1}^{J} W_x(i,j)$, for $i = 1, \ldots, J$.
With the matrices $W_x$ and $D_x$, we define a random walk on the point cloud $U_x$ with the transition matrix

(10)  $A_x := D_x^{-1} W_x$.

Since $A_x$ is similar to the symmetric matrix $D_x^{-1/2} W_x D_x^{-1/2}$, it has a complete set of right eigenvectors $\phi_1, \phi_2, \ldots, \phi_J \in \mathbb{R}^J$ with corresponding eigenvalues $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_J \ge 0$, where $\phi_1 = [1, 1, \ldots, 1]^\top \in \mathbb{R}^J$. Indeed, from the eigendecomposition $D_x^{-1/2} W_x D_x^{-1/2} = O \Lambda O^\top$, where $O$ is a $J \times J$ orthonormal matrix and $\Lambda$ is a $J \times J$ diagonal matrix, we have $A_x = U \Lambda V^\top$, where $U = D_x^{-1/2} O$ and $V = D_x^{1/2} O$. With the decomposition $A_x = U \Lambda V^\top$, the DM is defined as

(11)  $\Phi^x_t : u^{(j)} \mapsto \left(\lambda_2^t \phi_2(j),\ \lambda_3^t \phi_3(j),\ \ldots,\ \lambda_{\hat d + 1}^t \phi_{\hat d + 1}(j)\right)$,

where $j = 1, \ldots, J$, $t > 0$ is the diffusion time chosen by the user, and $\hat d$ is an estimate of the dimension of the intrinsic state. Here, $\hat d$ can be obtained from the spectral gap in the decay of the eigenvalues $\{\lambda_j\}_{j=1}^{J}$. In other words, $\Phi^x_t(u^{(j)})$ consists of the second to the $(\hat d + 1)$-st coordinates of $e_j^\top U \Lambda^t$, where $e_j$ is the unit $J$-dimensional vector with the $j$-th entry equal to 1. As a result, the scattering EEG spectral features $U_x$ are converted into a set of new features in $\mathbb{R}^{\hat d}$ via the DM. Denote $F_x := \{\Phi^x_t(u^{(j)})\}_{j=1}^{J} \subset \mathbb{R}^{\hat d}$. Therefore, we obtain the intrinsic sleep features for the recorded EEG signal $x$ of the $k$-th subject, $F_{x,k} := \{\Phi^x_t(u^{(J_1+\cdots+J_{k-1}+1)}), \ldots, \Phi^x_t(u^{(J_1+\cdots+J_k)})\}$.

Note that the diffusion process plays an essential role in the whole algorithm, since (11) is based on the diffusion nature of the random walk with the transition matrix (10).
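Equations (8)-(11) can be sketched in a few lines of numpy. This is a minimal illustration on synthetic clusters, not the paper's code; the bandwidth, diffusion time, and toy data are arbitrary choices.

```python
import numpy as np

def diffusion_map(U, eps, t=0.3, d_hat=2):
    """Eqs. (8)-(11): affinity, degree, random walk, truncated embedding.

    U : (J, p) array whose rows are scattering EEG spectral features.
    """
    D2 = ((U[:, None, :] - U[None, :, :])**2).sum(axis=-1)
    W = np.exp(-D2 / eps)                          # eq. (8)
    deg = W.sum(axis=1)                            # eq. (9)
    # Eigendecompose the symmetric conjugate of A = D^{-1} W, eq. (10).
    S = W / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]                 # descending eigenvalues
    vals = np.clip(vals[order], 0.0, None)         # guard tiny negatives
    phi = vecs[:, order] / np.sqrt(deg)[:, None]   # right eigenvectors of A
    # Eq. (11): drop the trivial first pair, keep d_hat coordinates.
    return phi[:, 1:d_hat + 1] * vals[1:d_hat + 1]**t

rng = np.random.default_rng(1)
# Two separated toy clusters standing in for two sleep stages.
U = np.vstack([rng.normal(0.0, 0.1, (20, 5)), rng.normal(2.0, 0.1, (20, 5))])
emb = diffusion_map(U, eps=1.0)
```

On such data the first diffusion coordinate separates the two clusters, mirroring how epochs of different sleep stages separate in Figure 1.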
Based on the spectral geometry results reported in [27] and the asymptotic spectral convergence results shown in [28, 29], we know that the Riemannian manifold hosting the inaccessible intrinsic sleep features is almost isometrically recovered by the DM (11). In addition to recovering the intrinsic state space, another important property of the DM is its robustness to noise. Essentially, the diffusion process can be viewed as a nonlinear "averaging" process. This viewpoint was first captured and proved in [30], and later extended to larger noise in [31]. Another important fact is that the embedding provided by the DM is unique only up to a global rotation, so it is not feasible to construct intrinsic sleep features for each subject separately and pool them together for the prediction purpose. The proposed algorithm therefore applies the DM to all scattering EEG spectral features from all subjects to construct the intrinsic sleep features, which live over a universal coordinate system that allows a comparison among different subjects. See Figure 1 for an example of the DM when $\hat d = 3$. It is clear that epochs of different sleep stages are clustered and well separated.

2.2.2. Fuse two channels via multiview DM. If only one EEG signal $x$ is available, we proceed to the learning step in Section 2.3 with the intrinsic sleep features $F_x$ for channel $x$. If we have two or more channels, we can take them into account simultaneously. In general, this problem is understood as the sensor fusion problem. While different EEG channels capture information from the same brain, the information recorded might vary, and it might be contaminated by brain-activity-irrelevant artifacts from different sources, including noise and other sensor-specific nuisances. These artifacts not only deteriorate the quality of the extracted features but might also mislead the analysis result.
The main purpose of sensor fusion is to distill the brain information and remove those unwanted artifacts. A naive way to combine information from $x$ and $y$ is to simply concatenate the features into a new feature. However, due to the inevitable sensor-specific artifacts or errors, such a concatenation scheme might not be the optimal route to fuse sensors [32, 33]. We consider the recently developed diffusion-based sensor fusion approach, the multiview DM [23], or equivalently the alternating DM [33], to fuse the sensor information. Essentially, the information from different sensors is "diffused" to integrate the common information, while artifacts or noise specific to each sensor are simultaneously eliminated. In this work, we focus on two simultaneously recorded EEG channels $x$ and $y$.

Here we detail the multiview DM algorithm. For each channel, we apply Sections 2.1 and 2.2 to obtain its intrinsic sleep features, denoted as $F_x \subset \mathbb{R}^{\hat d_x}$ and $F_y \subset \mathbb{R}^{\hat d_y}$ respectively, where $\hat d_x$ might differ from $\hat d_y$. Define

(12)  $W_{x,y} = \begin{pmatrix} 0_{J \times J} & W_x W_y \\ W_y W_x & 0_{J \times J} \end{pmatrix} \in \mathbb{R}^{2J \times 2J}$,

where $W_x$ and $W_y$ are the affinity matrices defined in (8), and define a diagonal matrix $D_{x,y} \in \mathbb{R}^{2J \times 2J}$ so that its $i$-th diagonal entry is the sum of the $i$-th row of $W_{x,y}$. Denote $q_i \in \mathbb{R}^{2J}$ to be the $i$-th right eigenvector of the transition matrix $D_{x,y}^{-1} W_{x,y}$ corresponding to the eigenvalue $\sigma_i$. Since, for each $i$, $q_i(l)$ and $q_i(J + l)$ correspond to the $l$-th epoch for each $l \in \{1, \ldots, J\}$, we call the $2\tilde d$-dimensional vector

(13)  $v_j := [\sigma_2^t q_2(j), \cdots, \sigma_{\tilde d+1}^t q_{\tilde d+1}(j), \sigma_2^t q_2(J+j), \cdots, \sigma_{\tilde d+1}^t q_{\tilde d+1}(J+j)]^\top$

the common intrinsic sleep feature associated with the sleep stage of the $j$-th epoch, where $j \in \{1, \ldots, J\}$. Denote $F_{x,y} := \{v_j\}_{j=1}^{J}$.
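The construction (12)-(13) can be sketched as follows. This is a toy illustration under our own choices of affinities and data; note that since $W_x$ and $W_y$ are symmetric, the block matrix $W_{x,y}$ is symmetric, which lets us diagonalize through a symmetric conjugate.

```python
import numpy as np

def multiview_dm(Wx, Wy, t=0.3, d_tilde=2):
    """Eqs. (12)-(13): diffusion on the bipartite graph of two channels."""
    J = Wx.shape[0]
    Wxy = np.zeros((2 * J, 2 * J))
    Wxy[:J, J:] = Wx @ Wy                  # eq. (12); since Wx, Wy are
    Wxy[J:, :J] = Wy @ Wx                  # symmetric, Wxy is symmetric too
    deg = Wxy.sum(axis=1)
    S = Wxy / np.sqrt(np.outer(deg, deg))  # conjugate of D^{-1} W_{x,y}
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals = np.clip(vals[order], 0.0, None)
    q = vecs[:, order] / np.sqrt(deg)[:, None]   # right eigenvectors q_i
    sig = vals[1:d_tilde + 1]**t
    # Eq. (13): stack the two halves of each eigenvector, epoch by epoch.
    return np.hstack([q[:J, 1:d_tilde + 1] * sig, q[J:, 1:d_tilde + 1] * sig])

def affinity(U, eps=1.0):
    """Gaussian affinity of eq. (8)."""
    D2 = ((U[:, None, :] - U[None, :, :])**2).sum(axis=-1)
    return np.exp(-D2 / eps)

rng = np.random.default_rng(2)
# Two toy "channels" observing the same two clusters with small differences.
base = np.vstack([rng.normal(0.0, 0.1, (10, 3)), rng.normal(2.0, 0.1, (10, 3))])
V = multiview_dm(affinity(base), affinity(base + rng.normal(0, 0.01, base.shape)))
```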
Hence, we obtain the common intrinsic sleep features for the simultaneously recorded EEG signals $x$ and $y$ of the $k$-th subject, $F_{x,y,k} := \{v_{J_1+\cdots+J_{k-1}+1}, \ldots, v_{J_1+\cdots+J_k}\}$.

Note that since the transition matrix $D_{x,y}^{-1} W_{x,y}$ is associated with the random walk on the bipartite graph formed from the epochs of $x$ and $y$, its eigenvectors "co-cluster" the two channels [34]. Hence, we can use its eigenvectors as features for the sleep stage. We mention that when there are only two channels, as in our case, the multiview DM is equivalent to the alternating DM [33], but when there are more than two channels, the multiview DM differs from the generalization of the alternating DM to multiple channels [35]. An illustration of the result of the multiview DM with $\tilde d = 3$ is shown at the bottom of Figure 1.

2.3. Learning step: sleep stage classification by the support vector machine. In this work, we choose the standard and widely used kernel SVM [24] for the learning step. The SVM finds a hyperplane in the feature space that separates the data set into two disjoint subsets. Based on the reproducing kernel Hilbert space theory, the SVM is generalized to the kernel SVM, which allows for classification when there is a nonlinear structure in the feature space. Specifically, the kernel SVM finds a nonlinear surface separating the data set into two disjoint subsets. We refer the interested reader to [24] for the technical details. In this work, we choose the radial basis function, $K(x, x') = \exp\left(-\frac{\|x - x'\|_2^2}{2\sigma^2}\right)$, where $\sigma > 0$, as the kernel function. In this sleep dynamics classification problem, the label is multi-class, so we further generalize the kernel SVM to the multi-class SVM. To this end, we apply the one-versus-all (OVA) classification scheme [36].
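As a concrete illustration of this learning step, here is a hypothetical sketch with scikit-learn, where toy Gaussian blobs stand in for the intrinsic sleep features. Note that scikit-learn parametrizes the radial basis function as $K(x,x') = \exp(-\gamma\|x-x'\|^2)$, so the bandwidth above corresponds to $\gamma = 1/(2\sigma^2)$.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy stand-ins for intrinsic sleep features of three "stages".
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.2, size=(30, 4)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 30)

# RBF-kernel SVM wrapped in the one-versus-all scheme: one binary
# classifier per class, each separating that class from the rest.
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma=0.5))
clf.fit(X, y)
train_acc = (clf.predict(X) == y).mean()
```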
2.4. Material. To evaluate the proposed algorithm, we consider the commonly used benchmark database, the Sleep-EDF Database [Expanded], from the public repository PhysioNet [17]. It contains two subsets (marked as SC* and ST*). The first subset, SC*, contains Fpz-Cz/Pz-Oz EEG signals recorded from 10 males and 10 females without any sleep-related medication; the age range is 25-34 years old. There are two approximately 20-hour recordings per subject, apart from a single subject for whom there is only one recording. The EEG signals were recorded during two subsequent day-night periods at the subjects' homes, with a sampling rate of 100 Hz. The second subset, ST*, was obtained in a 1994 study of the effects of temazepam on the sleep of subjects with mild difficulty falling asleep. It contains Fpz-Cz/Pz-Oz EEG signals recorded from 7 males and 15 females. Since this data set was originally used to study the effects of temazepam, the EEG signals were recorded in the hospital for two nights, one of which was after temazepam intake; only the placebo nights can be downloaded from [17]. The sampling rate is again 100 Hz. For both the SC* and ST* sets, each 30s epoch of EEG data has been annotated into the classes Awake, REM, N1, N2, N3 and N4. The epochs corresponding to movement and unknown stages were excluded, and the epochs labeled N4 were relabeled as N3 according to the AASM standard. For more details of the database, we refer the reader to https://www.physionet.org/physiobank/database/sleep-edfx/.

2.5. Statistics. To evaluate the performance of the proposed automatic sleep scoring algorithm, we consider the LOSOCV scheme.
For each database, each subject in turn is chosen as the testing set, and the remaining subjects form the training set. Specifically, when there are $n$ subjects, for the $\ell$-th subject we train the kernel SVM model using the common intrinsic sleep features $\cup_{k=1,\ldots,n,\ k\ne\ell}\, F_{x,y,k}$ with the provided labels, test the trained model on $F_{x,y,\ell}$, and report the results; the same procedure is applied for every subject $\ell = 1, \ldots, n$. Note that this LOSOCV scheme helps prevent overfitting in constructing the prediction model.

All performance measurements used in this paper are computed from the unnormalized confusion matrix $M \in \mathbb{R}^{5 \times 5}$. For $1 \le p, q \le 5$, the entry $M_{pq}$ represents the number of expert-assigned $p$-class epochs that were predicted as the $q$-class. The precision ($\mathrm{PR}_p$), recall ($\mathrm{RE}_p$), and F1-score ($\mathrm{F1}_p$) of the $p$-th class, where $p = 1, \ldots, 5$, are computed respectively through

(14)  $\mathrm{PR}_p = \dfrac{M_{pp}}{\sum_{q=1}^{5} M_{qp}}$,  $\mathrm{RE}_p = \dfrac{M_{pp}}{\sum_{q=1}^{5} M_{pq}}$,  $\mathrm{F1}_p = \dfrac{2\,\mathrm{PR}_p \cdot \mathrm{RE}_p}{\mathrm{PR}_p + \mathrm{RE}_p}$.

The overall accuracy (ACC), the macro F1 score (Macro-F1) and Cohen's kappa ($\kappa$) coefficient are computed respectively through

(15)  $\mathrm{ACC} = \dfrac{\sum_{p=1}^{5} M_{pp}}{\sum_{p,q=1}^{5} M_{pq}}$,  $\mathrm{Macro\text{-}F1} = \dfrac{1}{5}\sum_{p=1}^{5} \mathrm{F1}_p$,  $\kappa = \dfrac{\mathrm{ACC} - \mathrm{EA}}{1 - \mathrm{EA}}$,

where EA denotes the expected accuracy, defined by

(16)  $\mathrm{EA} = \dfrac{\sum_{p=1}^{5}\left(\sum_{q=1}^{5} M_{pq}\right)\left(\sum_{q=1}^{5} M_{qp}\right)}{\left(\sum_{p,q=1}^{5} M_{pq}\right)^2}$.

To evaluate whether two matched samples have the same mean, we apply the one-tailed Wilcoxon signed-rank test under the null hypothesis that the difference between the pairs follows a symmetric distribution around zero. When we compare variances, we apply the one-tailed F test under the null hypothesis that there is no difference between the variances. We consider a significance level of 0.05. To handle the multiple comparison issue, we apply the Bonferroni correction.
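The LOSOCV loop and the scores (14)-(16) can be sketched together. This uses scikit-learn's LeaveOneGroupOut splitter and toy arrays as illustrative stand-ins (not the authors' Matlab code, and not real EEG data).

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

def scores(M):
    """ACC, macro-F1 and Cohen's kappa from a confusion matrix, eqs. (14)-(16)."""
    M = np.asarray(M, dtype=float)
    PR = np.diag(M) / M.sum(axis=0)              # precision: column-normalized
    RE = np.diag(M) / M.sum(axis=1)              # recall: row-normalized
    F1 = 2 * PR * RE / (PR + RE)
    ACC = np.trace(M) / M.sum()
    EA = (M.sum(axis=1) * M.sum(axis=0)).sum() / M.sum()**2  # eq. (16)
    return ACC, F1.mean(), (ACC - EA) / (1 - EA)

# A perfectly diagonal confusion matrix gives ACC = macro-F1 = kappa = 1.
acc, mf1, kappa = scores(np.diag([10, 20, 30, 40, 50]))

# LOSOCV: each fold holds out every epoch of exactly one subject.
X = np.random.default_rng(0).normal(size=(12, 3))
y = np.tile([0, 1, 2, 3], 3)
subjects = np.repeat([1, 2, 3], 4)
for tr, te in LeaveOneGroupOut().split(X, y, groups=subjects):
    # Training epochs never share a subject with testing epochs.
    assert set(subjects[tr]).isdisjoint(subjects[te])
    SVC(kernel="rbf").fit(X[tr], y[tr]).predict(X[te])
```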
Results. In this section, we report the results of applying the proposed algorithm to the two databases described above. We show a visualization of the (common) intrinsic sleep features; note that this can be viewed as a dimension reduction of the EEG signals. The dynamics of the intrinsic sleep features are also shown and compared with the hypnogram. The confusion matrices of the automatic sleep stage classification on the different databases are shown, and a performance comparison with other reported results is also provided.

The parameters used in the numerical implementation are listed here. For DM and multiview DM, the $\epsilon$ in (8) is chosen to be the 1% percentile of pairwise distances, and the diffusion time is $t = 0.3$. Both $\hat{d}$ and $\tilde{d}$ are chosen to be 80. No systematic parameter optimization is performed. For reproducibility purposes, the Matlab code will be provided upon request.

3.1. Sleep-EDF database (SC*). We start by showing the visualization of the intrinsic sleep features and the common intrinsic sleep features from the 20 subjects in the Sleep-EDF SC* database. See the middle of Figure 1 for a visualization of each EEG channel by DM. In the bottom of Figure 1, we plot the embeddings $\{[q_2(i), q_3(i), q_4(i)]\}_{i=1}^{J}$ and $\{[q_2(i+J), q_3(i+J), q_4(i+J)]\}_{i=1}^{J}$ for a visualization of fusing the two EEG channels by the multiview DM. Clearly, the Awake, REM, N2 and N3 stages are well clustered in all plots, while N1 is less clustered and tends to mix with other stages. While the sleep dynamics can be easily visualized in Figure 1, it is not easy to visualize the temporal dynamics. For this purpose, we show the final intrinsic features expanded along the time line in Figure 2.

Figure 2. A visualization of the temporal dynamics in common intrinsic sleep features extracted by the multiview DM.
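As an illustration of the parameter choice above, a generic diffusion map with the kernel bandwidth set to the 1% percentile of pairwise distances can be sketched as follows. This is a plain DM sketch under standard normalization assumptions; it does not reproduce the exact kernel construction in (8) or the multiview extension:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh

def diffusion_map(X, t=0.3, n_coords=10, eps_percentile=1.0):
    """Generic diffusion map sketch: Gaussian affinity whose bandwidth is
    the 1% percentile of the pairwise distances, diffusion time t."""
    D = squareform(pdist(X))
    eps = np.percentile(D[D > 0], eps_percentile)
    W = np.exp(-(D / eps) ** 2)                 # Gaussian kernel (PSD)
    deg = W.sum(axis=1)
    A = W / np.sqrt(np.outer(deg, deg))        # symmetric conjugate of the Markov matrix
    vals, vecs = eigh(A)
    order = np.argsort(vals)[::-1][1:n_coords + 1]  # drop the trivial top eigenvector
    phi = vecs[:, order] / np.sqrt(deg)[:, None]    # right eigenvectors of the Markov matrix
    return phi * (vals[order] ** t)                 # diffusion coordinates at time t
```

In the paper's setting, the rows of `X` would be the scattering EEG spectral features of the 30 s epochs, and `n_coords` would be 80.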
The first intrinsic feature is colored in red, and the tenth intrinsic feature is colored in blue. The expert's labels are plotted on the top.

Since there are long periods of wakefulness at the start and the end of the recordings, when a subject is not sleeping, [19] only includes 30 minutes of such periods just before and after the sleep periods. To have a fair comparison, we also follow this truncation rule in this work. As a result, the labeled epochs are imbalanced, with 42.4% of the epochs labeled N2 and only 6.6% labeled N1. We run the leave-one-subject-out CV. The averaged confusion matrix of the proposed algorithm over the 20 subjects is shown in Table 1 for the single-channel case and Table 2 for the two-channel case. For the single-channel case (Fpz-Cz), the overall accuracy is 82.72%, the macro F1 is 75.91% and Cohen's kappa equals 76.10%. For the two-channel case, the overall accuracy is 84.44% and the macro F1 is 78.25%, with Cohen's kappa 78.36%. Note that the N1 prediction is relatively low compared with the other stages. In particular, its F1 is 46% and most N1 epochs are classified as REM or N2. This misclassification is related to the scattered N1 epochs that can be visually observed in Figure 1, and it is the main reason the overall accuracy and macro F1 are dragged down. We also note that N3 is commonly misclassified as N2, Awake is commonly misclassified as N1, and REM is commonly misclassified as N2. To further examine the performance, the resulting hypnogram of one subject is shown in Figure 3. Note that the discrepancy between the experts' annotations and the prediction frequently happens at a "stage transition". The transition of the sleep dynamics from one stage to another often happens in the middle of an epoch.
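The 30-minute truncation rule of [19] amounts to keeping 60 awake epochs (30 s each) on either side of the sleep period. A minimal sketch (the label string "Awake" is an illustrative placeholder):

```python
import numpy as np

def truncate_wake(labels, margin_epochs=60):
    """Return the epoch indices kept by the truncation rule: the sleep
    period plus at most `margin_epochs` (60 epochs = 30 min) of
    wakefulness immediately before and after it."""
    labels = np.asarray(labels)
    sleep = np.flatnonzero(labels != "Awake")  # indices of non-awake epochs
    start = max(sleep[0] - margin_epochs, 0)
    end = min(sleep[-1] + margin_epochs, len(labels) - 1)
    return np.arange(start, end + 1)
```

Setting `margin_epochs=180` would correspond to the 90-minute variant considered in Section 3.3.1.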
Thus, the epochs containing a sleep dynamics transition carry information that is not purely from one stage, and are hence harder to classify. In Table 3, we compare the performance of the proposed algorithm with several reported algorithms validated by the leave-one-subject-out CV scheme.

Table 1. Performance of DM with the scattering transform evaluated by 20-fold leave-one-subject-out cross-validation on the Fpz-Cz channel (top) and Pz-Oz channel (bottom) from the Sleep-EDF SC∗ database. For the Fpz-Cz channel, the overall accuracy equals 82.72%, the macro F1 score equals 75.91% and Cohen's kappa equals 76.10%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 8.90% (resp. 7.71% and 11.30%). For the Pz-Oz channel, the overall accuracy equals 80.99%, the macro F1 score equals 72.69% and Cohen's kappa equals 73.49%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 5.06% (resp. 5.99% and 6.97%). The training set for each testing subject consists of the recordings of the remaining 19 subjects during sleep.
Fpz-Cz           Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (19%)      6871 (87%)   298 (4%)     563 (7%)     171 (2%)      24 (0%)       83   87   85
REM (18%)        295 (4%)     6253 (81%)   378 (5%)     790 (10%)     1 (0%)        79   81   80
N1 (7%)          549 (20%)    616 (22%)    1040 (37%)   591 (21%)     8 (0%)        46   37   41
N2 (42%)         416 (2%)     692 (4%)     272 (2%)     15821 (89%)   598 (3%)      87   89   88
N3 (14%)         101 (2%)     13 (0%)      0 (0%)       872 (15%)     4717 (83%)    88   83   85

Pz-Oz            Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (19%)      6803 (86%)   420 (5%)     466 (6%)     220 (3%)      18 (0%)       84   86   85
REM (18%)        310 (4%)     6038 (78%)   350 (5%)     1019 (13%)    0 (0%)        77   78   78
N1 (7%)          794 (28%)    647 (23%)    755 (27%)    602 (22%)     6 (0%)        42   27   33
N2 (42%)         211 (1%)     703 (4%)     226 (1%)     15943 (90%)   716 (4%)      84   90   87
N3 (14%)         4 (0%)       9 (0%)       2 (0%)       1250 (22%)    4438 (78%)    86   78   82

3.2. Sleep-EDF database (ST*). See Figures 4 and 5 for the visualizations by DM and multiview DM of the ST* database. While the Awake, REM, N2 and N3 stages

Table 2. Performance of the multiview DM with the scattering transform evaluated by 20-fold leave-one-subject-out cross-validation on the Fpz-Cz and Pz-Oz channels from the Sleep-EDF SC∗ database. The overall accuracy equals 84.44%, the macro F1 score equals 78.25% and Cohen's kappa equals 78.36%. If the classification accuracy, macro F1 score, and Cohen's kappa are computed for each night recording, the standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 5.36% (resp. 6.40% and 8.01%). The training set for each testing subject consists of the recordings of the remaining 19 subjects during sleep.
Fpz-Cz+Pz-Oz     Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (19%)      7034 (89%)   148 (2%)     525 (7%)     197 (2%)      23 (0%)       90   89   90
REM (18%)        125 (2%)     6070 (79%)   528 (7%)     991 (13%)     3 (0%)        85   79   82
N1 (7%)          498 (18%)    436 (16%)    1218 (43%)   643 (23%)     9 (0%)        47   43   46
N2 (42%)         115 (1%)     492 (3%)     313 (2%)     16337 (92%)   542 (3%)      86   92   89
N3 (14%)         17 (0%)      0 (0%)       1 (0%)       921 (16%)     4764 (84%)    89   86   84

Figure 3. The resulting hypnogram of one subject from SC*. The gray curve is the expert's label, and the black dots are the predicted sleep stages. The discrepancies are emphasized by the black circles.

are still well clustered in all plots, compared with the normal subjects in the SC* database the separation is less clear. We run the leave-one-subject-out CV. The averaged confusion matrix of the proposed algorithm over the 22 subjects is shown in Table 4 for the single-channel case and Table 5 for the two-channel case. For the single-channel case (Fpz-Cz), the overall accuracy is 78.63%, the macro F1 is 73.58%, and Cohen's kappa is 69.48%. For the two-channel case, the overall accuracy is 79.05% and the macro F1 is 74.73%, with Cohen's kappa 70.31%. In this database, 10% of the epochs are labeled N1, which is slightly higher than in the SC∗ database. While the prediction performance of N1 is slightly higher (50% when we use the Fpz-Cz

Table 3. Comparison between the proposed algorithm and other sleep stage scoring methods in terms of the overall accuracy (ACC), macro F1 score (MF1) and Cohen's kappa. Note that the total number of epochs in the ST∗ dataset is 21076, but the performance of our algorithm is evaluated on 20988 epochs.
In fact, we do not predict the sleep stages of the first 3 epochs and the last epoch of each recording in the ST∗ dataset, because the features for each epoch are extracted from a 90 s signal. As a result, 21076 − 88 = 20988 epochs in the ST∗ dataset are evaluated.

Author | Channel (epochs) | Extracted features | Model / Algorithm | ACC \ MF1 (κ)
[19] | SC∗ Fpz-Cz (41950) | Raw data without any prior processing | Recurrent neural networks with class-balanced random sampling and stochastic gradient descent | 82.0% \ 76.9% (κ = 76%)
[19] | SC∗ Pz-Oz (41950) | (same as above) | (same as above) | 79.8% \ 73.1% (κ = 72%)
[16] | SC∗ Fpz-Cz (37022) | (a) Complex Morlet wavelets; (b) time-domain amplitude characteristics; (c) Pearson correlation coefficients between each pair of frequency-band signals; (d) autocorrelation of the time-domain signal for 0.5 s lags | Stacked sparse autoencoders with class-balanced random sampling | 78.9% \ 73.7%
[18] | SC∗ Fpz-Cz (37022) | Raw data without any prior processing | Convolutional neural networks with class-balanced random sampling and stochastic gradient descent | 74.8% \ 69.8%
Ours | SC∗ Fpz-Cz (41950) | Scattering spectrum | (Multiview) diffusion map + support vector machine | 82.72% \ 75.91% (κ = 76.10%)
Ours | SC∗ Pz-Oz (41950) | (same as above) | (same as above) | 80.99% \ 72.69% (κ = 73.49%)
Ours | SC∗ Fpz-Cz + Pz-Oz (41950) | (same as above) | (same as above) | 84.44% \ 78.25% (κ = 78.36%)
Ours | ST∗ Fpz-Cz (20988) | (same as above) | (same as above) | 78.63% \ 73.58% (κ = 69.48%)
Ours | ST∗ Pz-Oz (20988) | (same as above) | (same as above) | 75.74% \ 69.97% (κ = 65.39%)
Ours | ST∗ Fpz-Cz + Pz-Oz (20988) | (same as above) | (same as above) | 79.05% \ 74.73% (κ = 70.31%)

Figure 4.
A visualization of the intrinsic sleep features (from a single channel) extracted from the 22 subjects in the Sleep-EDF database (ST*). In the first subplot we plot $\{[q_2(i), q_3(i), q_4(i)]\}_{i=1}^{J}$, and in the second subplot we show $\{[q_2(i+J), q_3(i+J), q_4(i+J)]\}_{i=1}^{J}$. The ratios of the stages Awake, REM, N1, N2, and N3 are 10%, 20%, 10%, 45%, and 15% respectively. Each point corresponds to a 30-second epoch.

Figure 5. A visualization of the common intrinsic sleep features (from two channels) extracted from the 22 subjects in the Sleep-EDF database (ST*). In the first subplot we plot $\{[q_2(i), q_3(i), q_4(i)]\}_{i=1}^{J}$, and in the second subplot we show $\{[q_2(i+J), q_3(i+J), q_4(i+J)]\}_{i=1}^{J}$. The ratios of the stages Awake, REM, N1, N2, and N3 are 10%, 20%, 10%, 45%, and 15% respectively. Each point corresponds to a 30-second epoch.

channel and 52% when we use two channels), it is still relatively low. Also, note that the prediction performance of N3 is lower, and a significant portion of N3 epochs is misclassified as N2.

3.3. More comparisons.

3.3.1. Handling imbalanced data. An ideal approach to handling imbalanced data is to collect more data for the small-sized groups to enhance the prediction accuracy. In the SC∗ dataset, there were long periods of awake epochs before the start and

Table 4. Performance of DM with the scattering transform evaluated by 22-fold leave-one-subject-out cross-validation on the Fpz-Cz or Pz-Oz channel from the Sleep-EDF ST∗ database. For Fpz-Cz, the overall accuracy equals 78.63%, the macro F1 score equals 73.58% and Cohen's kappa equals 69.48%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 22 night recordings is 7.97% (resp. 8.11% and 10.25%).
For Pz-Oz, the overall accuracy equals 75.74%, the macro F1 score equals 69.97% and Cohen's kappa equals 65.39%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 22 night recordings is 7.24% (resp. 8.30% and 8.92%).

Fpz-Cz           Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (10%)      1726 (78%)   63 (3%)      364 (16%)    55 (3%)       5 (0%)        79   78   78
REM (20%)        84 (2%)      3196 (77%)   326 (8%)     518 (13%)     6 (0%)        86   77   82
N1 (10%)         327 (16%)    256 (13%)    1003 (49%)   444 (22%)     2 (0%)        50   49   50
N2 (45%)         51 (0%)      190 (2%)     307 (3%)     8502 (90%)    439 (5%)      81   90   85
N3 (15%)         10 (0%)      1 (0%)       7 (0%)       1034 (33%)    2093 (67%)    82   67   74

Pz-Oz            Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (10%)      1803 (81%)   38 (2%)      334 (15%)    37 (2%)       1 (0%)        74   81   78
REM (20%)        59 (2%)      3272 (79%)   204 (5%)     592 (14%)     3 (0%)        78   79   79
N1 (10%)         449 (22%)    383 (19%)    801 (39%)    395 (20%)     4 (0%)        49   39   44
N2 (45%)         110 (1%)     401 (4%)     282 (3%)     8176 (86%)    520 (6%)      79   86   82
N3 (15%)         5 (0%)       75 (3%)      9 (0%)       1195 (38%)    1861 (59%)    78   59   67

after the end of sleep that we can use. To further evaluate the algorithm, we consider longer periods of wakefulness just before and after the sleep periods. Apart from 2 recordings (sc4092e0 and sc4192e0), we included 90 minutes of awake periods before and after the sleep periods. For the sc4092e0 and sc4192e0 recordings, we only included 60 minutes of awake periods just before and after the sleep periods, due to artifacts (labeled MOVEMENT and UNKNOWN) at the start or the end of each recording. With more awake epochs, the corresponding confusion matrix is shown in Table 6.
In comparison with the result subject to the 30-minute truncation rule, the accuracy and F1 of Awake increase from 88% and 90% to 91% and 94% respectively, and the accuracy and F1 of N1 change from 43% and 46% to 63% and 43% respectively. Note that the F1 of N1 decreases because its precision decreases, due to the increased number of awake epochs and the increased number of epochs erroneously classified as N1.

Another approach is to handle the imbalanced data with the class-balanced random sampling scheme proposed in [16], under the same 30-minute truncation rule. Specifically, for each recording, we randomly sample K epochs per stage,

Table 5. Performance of the multiview DM with the scattering transform evaluated by 22-fold leave-one-subject-out cross-validation on the Fpz-Cz and Pz-Oz channels from the Sleep-EDF ST∗ database. The overall accuracy equals 79.05%, the macro F1 score equals 74.73% and Cohen's kappa equals 70.31%. The standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 22 night recordings is 7.21% (resp. 7.58% and 9.38%).

Fpz-Cz+Pz-Oz     Predicted                                                          Per-class metrics
                 Awake        REM          N1           N2            N3            PR   RE   F1
Awake (10%)      1838 (83%)   19 (1%)      323 (15%)    29 (1%)       4 (0%)        79   83   81
REM (20%)        26 (1%)      3344 (81%)   288 (7%)     465 (11%)     7 (0%)        86   81   84
N1 (10%)         344 (17%)    244 (12%)    1059 (52%)   380 (19%)     5 (0%)        52   52   52
N2 (45%)         109 (1%)     253 (3%)     352 (4%)     8265 (87%)    510 (5%)      81   87   84
N3 (15%)         8 (0%)       10 (0%)      10 (0%)      1016 (32%)    2101 (67%)    80   67   73

where K is the number of epochs of the least represented stage (N1). We apply this class-balanced random sampling scheme before using the SVM to train on the common intrinsic sleep features. See Table 6 for details. Compared with Table 2, the accuracy of N1 improves from 43% to 76%, while the overall accuracy decreases to 81.12%.

3.3.2.
With and without the sensor fusion. To appreciate the significance of the diffusion geometry based sensor fusion framework, we report the results without the critical component of the proposed algorithm, the sensor fusion. We consider the case where we simply concatenate the intrinsic sleep features of the two channels instead of taking the common intrinsic sleep features; that is, we concatenate $\Phi_t^x(u^{(j)})$ and $\Phi_t^y(u^{(j)})$ in (11) directly to replace (13) when we train the kernel SVM model. We also compare the single EEG channel results with the fusion result; that is, we train the kernel SVM on the intrinsic sleep features extracted from the Fpz-Cz or Pz-Oz channel alone. The different comparisons on the SC∗ database are shown in Figure 6. The averaged ACC, MF1 and Cohen's kappa are consistently downgraded in these three cases. In Figure 6, we see that, compared with a single channel or sensor fusion by direct concatenation, the proposed sensor fusion of the two channels consistently improves the result with statistical significance, for both the mean and the variance of ACC, MF1 and Cohen's kappa. This fact reflects an essential property of the diffusion-based algorithms: via the diffusion-based sensor fusion, the intrinsic sleep features are "stabilized", and hence have a smaller variance.

4. Discussion and Conclusion

A diffusion geometry based sensor fusion framework is proposed to capture the geometric structure of the sleep dynamics. We take the scattering transform of the EEG signals as the initial feature and test the framework on the publicly available benchmark database. With the SVM learning algorithm, we obtain an accurate

Table 6.
Performance of the multiview DM with the scattering transform evaluated by 20-fold leave-one-subject-out cross-validation on the Fpz-Cz and Pz-Oz channels from the Sleep-EDF SC* database, with two different approaches to handling the imbalanced groups: (1) taking longer awake periods; (2) applying the class-balanced random sampling scheme proposed in [16]. For approach (1), the overall accuracy equals 83.69%, the macro F1 score equals 77.11% and Cohen's kappa equals 78.14%. If the classification accuracy, macro F1 score, and Cohen's kappa are computed for each night recording, the standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 7.23% (resp. 5.47% and 8.77%). For approach (2), the overall accuracy equals 81.12%, the macro F1 score equals 77.33% and Cohen's kappa equals 74.82%. If the classification accuracy, macro F1 score, and Cohen's kappa are computed for each night recording, the standard deviation of the classification accuracy (resp. the macro F1 score and Cohen's kappa) over the 39 night recordings is 6.29% (resp. 7.30% and 8.77%).
Longer awake periods
Fpz-Cz+Pz-Oz     Predicted                                                          Per-class metrics
                 Awake         REM          N1           N2            N3           PR   RE   F1
Awake (34%)      15478 (91%)   221 (1%)     1245 (7%)    76 (1%)       44 (0%)      97   91   94
REM (15%)        114 (2%)      5954 (77%)   1071 (14%)   570 (7%)      8 (0%)       80   77   79
N1 (5%)          167 (6%)      499 (18%)    1754 (63%)   351 (12%)     33 (1%)      33   63   43
N2 (35%)         130 (1%)      740 (4%)     1196 (6%)    14337 (81%)   1396 (8%)    91   81   85
N3 (11%)         18 (0%)       3 (0%)       44 (1%)      407 (7%)      5231 (92%)   78   92   84

Class-balanced random sampling scheme [16]
Fpz-Cz+Pz-Oz     Predicted                                                          Per-class metrics
                 Awake         REM          N1           N2            N3           PR   RE   F1
Awake (19%)      6444 (81%)    103 (1%)     1313 (17%)   31 (0%)       36 (1%)      96   81   88
REM (18%)        35 (1%)       5794 (75%)   1322 (17%)   563 (7%)      3 (0%)       87   75   81
N1 (7%)          186 (7%)      238 (8%)     2123 (76%)   242 (9%)      15 (0%)      33   76   46
N2 (42%)         58 (0%)       529 (3%)     1595 (9%)    14517 (82%)   1100 (6%)    92   82   86
N3 (14%)         15 (0%)       1 (0%)       65 (1%)      469 (8%)      5153 (91%)   82   90   86

prediction model, and the result is compatible with several state-of-the-art algorithms based on neural networks (NN). In addition, the chosen tools all have solid theoretical backing. Altogether, this demonstrates the usefulness of the diffusion geometry framework for the study of sleep dynamics. We mention that the proposed framework is flexible enough to study other physiological dynamics, not only the sleep dynamics. For example, its variations have been applied to study the f-wave in subjects with atrial fibrillation [37], the intra-cranial EEG signal [38], fetal electrocardiogram (ECG) extraction from the single-lead trans-abdominal maternal ECG signal [39], etc.

4.1. Dealing with misclassification of stage 1 sleep. Although our overall prediction accuracy is compatible with that of the state-of-the-art prediction algorithms, like [19], on the Sleep-EDF SC* database, we see that the prediction accuracy of N1

Figure 6.
Comparison between different information fusion methods in terms of the accuracy (ACC), macro F1 score (MF1) and Cohen's kappa for the SC* database. To evaluate whether the mean is improved, the one-tailed Wilcoxon signed-rank test is applied under the null hypothesis that the differences between the pairs follow a symmetric distribution around zero. To evaluate whether the variance is smaller, we apply the one-tailed F test under the null hypothesis that there is no difference between the variances. ⋆ (respectively ⋆⋆) means statistical significance without (respectively with) the Bonferroni correction when the means are compared; # (respectively ##) means statistical significance without (respectively with) the Bonferroni correction when the variances are compared.

is low by our algorithm. When there is only a single channel (Fpz-Cz) and the same number of epochs, the F1 of the N1 prediction is 41% by our method and 46.6% in [19, Table III], while the overall accuracy is 82.72% by our method and 82% in [19, Table III]. The prediction rate of N1 is also low in the Sleep-EDF ST* database. This low prediction rate comes partially from the relatively small number of N1 epochs, and partially from the available channels, which we discuss below.

Based on the AASM criteria [1, 2], to distinguish N1 from REM, we need electrooculogram and electromyogram signals, which are not available in the dataset. The EEG backgrounds of N1 and N2 are the same, and experts distinguish N1 from N2 by the K-complex or spindle, as well as the 3-minute rule. While we use a 90-second signal to establish the intrinsic sleep features, the 3-minute rule is not considered in the algorithm. In the proposed algorithm, the temporal information among epochs is not fully utilized when we design the intrinsic sleep features.
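The paired tests summarized in the caption of Figure 6 can be sketched with scipy. This is a sketch under our reading of the one-tailed alternatives; scipy has no built-in F test for variances, so it is assembled from the F distribution:

```python
import numpy as np
from scipy import stats

def paired_comparison(base, fused, n_tests=1, alpha=0.05):
    """One-tailed Wilcoxon signed-rank test (is `fused` shifted above
    `base`?) and one-tailed F test (is Var(base) larger than Var(fused)?),
    reported against the Bonferroni-corrected level alpha / n_tests."""
    base, fused = np.asarray(base, float), np.asarray(fused, float)
    p_mean = stats.wilcoxon(fused, base, alternative="greater").pvalue
    F = base.var(ddof=1) / fused.var(ddof=1)          # F statistic for variances
    p_var = stats.f.sf(F, len(base) - 1, len(fused) - 1)
    return {"p_mean": p_mean, "p_var": p_var,
            "corrected_level": alpha / n_tests}
```

Here `base` and `fused` would be the per-subject ACC (or MF1, or kappa) values of a baseline and of the proposed fusion, and `n_tests` the number of simultaneous comparisons entering the Bonferroni correction.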
How to incorporate the temporal information into the diffusion geometry framework will be explored in future work. Furthermore, there is other information beyond the spectral information discussed in this paper. We do not extensively explore all possible sources of information, but focus on the diffusion geometry and sensor fusion framework. For example, while the vertex sharp wave is a common "landmark" indicating the transition from N1 to N2, we do not take it into account. Another interesting indication that it is possible to improve the N1 accuracy comes from the deep neural network (DNN) results of [19]. They suggest that, by taking the experts' labels into account, some distinguishable EEG structure of N1 that is not captured by the scattering transform can be efficiently extracted by the DNN framework proposed in [19]. In conclusion, since the proposed features depend solely on the scattering transform, we may need features of different categories to capture this unseen N1 structure.

Note that besides N1, the prediction performance of N3 is also lower in the Sleep-EDF ST* database, where the subjects took temazepam before data recording. It is well known that benzodiazepine hypnotics [40] generally reduce low-frequency activity and enhance spindle-frequency activity. Since our features are mainly based on spectral information, an N3 epoch might look more like N2 epochs, hence the confusion and the lower performance. This outcome emphasizes the importance of the drug history when designing the algorithm.

4.2. Comparison with DNN. Compared with the DNN approach [19], which is supervised in nature, our feature extraction step is unsupervised. Recall that the main difference between supervised and unsupervised learning is that the label information is taken into account in the supervised learning approach.
The success of DNN in many fields is well known [41], and it is not surprising that it has great potential to help medical data analysis. While DNN is in general a useful tool for engineering purposes, it is often criticized for working as a black box. For medical problems and datasets, where interpretation is needed, a mathematically solid and interpretable tool is useful. The algorithm we propose, on the other hand, has a clear interpretation with solid mathematical support. Specifically, the scattering transform we choose has solid harmonic analysis support, and it was motivated by capturing the essence of CNNs. In this sense, our approach combines the "neural network" idea and diffusion geometry to design features for the sleep dynamics. Note that, besides knowing that the scattering transform allows us to capture the dynamical information from the spectral domain in a nonlinear way, we cannot claim to have a full interpretation of what physiological information the scattering transform conveys. This is a physiological problem to explore in future work.

Moreover, a peculiar property of medical databases, the "uncertainty", deserves more discussion. Take the sleep dynamics studied in this paper as an example. It is well known that the inter-expert agreement rate is only about 80% for normal subjects, let alone for subjects with sleep problems [13]. With this uncertainty, a supervised learning algorithm might learn both good and bad labels. On the other hand, the proposed feature extraction approach is independent of the provided labels; the chosen scattering features all come from the EEG signal and speak solely for the sleep dynamics, not for the labels. To some extent, the "uncertainty" issue is less critical in our feature extraction approach, since the uncertain labels are not taken into account in the feature extraction step.
Since both supervised and unsupervised approaches have their own merits, it is natural to seek a way to combine them. We are exploring the possibility of combining DNN with the proposed feature extraction techniques, and the results will be reported in future work.

4.3. Limitations and future work. Despite the strength of the proposed method, the discussion is not complete without mentioning its limitations. While we test the algorithm on two databases and compare our results with those of state-of-the-art algorithms, these databases are small. To draw a firm conclusion and confirm its clinical applicability, a large-scale and prospective study is needed. Although extensive theoretical understanding of the applied algorithms, including the scattering transform and the diffusion-based sensor fusion, has been established in the past decade, there are still open problems to explore. For example, there are open questions about the statistical properties of the scattering transform on EEG. While the alternating DM and the multiview DM are designed for the same "sensor fusion" purpose, they were developed under different motivations, and the consequences of this difference have not been discussed. The relationship between the alternating DM and the multiview DM when there are more than 2 channels is also not clear at this moment. It is well known that DM leads to an embedding determined only up to rotation, so aggregating intrinsic sleep features from different subjects is not a suitable approach. While the proposed approach, constructing a universal coordinate system by applying DM to the pool of all scattering EEG spectral features from all subjects, leads to a satisfactory result, in general we run into a computational issue. Finding a systematic numerical solution to this problem is beyond the scope of this work. The imbalanced dataset is a general challenge in data science.
Whether more physiological knowledge can be exploited to conquer this limitation is an interesting direction to explore. Another open problem is how to further take the temporal information, like the 3-minute rule, into account when we deal with the inter-individual prediction. The above-mentioned limitations will be studied and reported in future work.

5. Conclusion

A novel sleep stage prediction algorithm with accurate automatic sleep stage annotation is proposed. The main novelty is organizing the scattering EEG spectral features by the unsupervised diffusion geometry based sensor fusion framework, which has solid mathematical backing. In addition to providing a visualization of the sleep dynamics and of the relationships between different sleep stages, the prediction results are compatible with several state-of-the-art algorithms based on neural networks, which suggests the algorithm's potential in practical applications. The flexibility of the proposed sensor fusion framework allows researchers to further explore other research problems with complicated dynamics. For reproducibility purposes and for its application to other problems, the Matlab code will be provided upon request.

6. Funding

G.-R. Liu is supported by Ministry of Science and Technology (MOST) grant MOST 106-2115-M-006-016-MY2. Y.-L. Lo is supported by MOST grants MOST 101-2220-E-182A-001, 102-2220-E-182A-001, 103-2220-E-182A-001 and MOST 104-220-E-182-002. Y.-C. Sheu is supported by MOST grant MOST 106-2115-M-009-006 and NCTS, Taiwan.

References

[1] C. Iber, S. Ancoli-Israel, A. Chesson Jr., S. Quan, The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications, American Academy of Sleep Medicine, 2007.
[2] R. B. Berry, D. G.
Budhiraja, et al., Rules for scoring respiratory events in sleep: update of the 2007 AASM manual for the scoring of sleep and associated events, J Clin Sleep Med 8 (5) (2012) 597–619.
[3] A. Rechtschaffen, A. Kales, A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects, Washington: Public Health Service, US Government Printing Office, 1968.
[4] C. B. Saper, The neurobiology of sleep, Continuum 19 (1) (2013) 19–31.
[5] T. Kanda, N. Tsujino, E. Kuramoto, Y. Koyama, E. A. Susaki, S. Chikahisa, H. Funato, Sleep as a biological problem: an overview of frontiers in sleep research, The Journal of Physiological Sciences 66 (1) (2016) 1–13.
[6] Y. Harrison, J. A. Horne, The impact of sleep deprivation on decision making: a review, Journal of Experimental Psychology: Applied 6 (3) (2000) 236.
[7] A. Karni, D. Tanne, B. S. Rubenstein, J. J. Askenasy, D. Sagi, Dependence on REM sleep of overnight improvement of a perceptual skill, Science 265 (5172) (1994) 679–682.
[8] J.-E. Kang, M. M. Lim, R. J. Bateman, J. J. Lee, L. P. Smyth, J. R. Cirrito, N. Fujiki, S. Nishino, D. M. Holtzman, Amyloid-β dynamics are regulated by orexin and the sleep-wake cycle, Science 326 (Nov 13) (2009) 1005–1007.
[9] F. Roche Campo, X. Drouot, A. W. Thille, F. Galia, B. Cabello, M.-P. D'Ortho, L. Brochard, Poor sleep quality is associated with late noninvasive ventilation failure in patients with acute hypercapnic respiratory failure, Critical Care Medicine 38 (2) (2010) 477–485.
[10] J. Horne, L. Reyner, Vehicle accidents related to sleep: a review, Occupational and Environmental Medicine 56 (5) (1999) 289–294.
[11] D. Leger, V. Bayon, J. Laaban, P. Philip, Impact of sleep apnea on economics, Sleep Medicine Reviews 16 (5) (2012) 455–462.
[12] N. A. Collop, W. M. Anderson, B. Boehlecke, D. Claman, R. Goldberg, D. J. Gottlieb, D. Hudgel, M. Sateia, R.
Sch wab, Clinical guidelines for the use of unattended p ortable monitors in the diagnosis of obstructive sleep apnea in adult patients, J Clin Sleep Med 3 (7) (2007) 737–747. [13] R. G. Norman, I. Pal, C. Stewart, J. A. W alsleb en, D. M. Rapoport, Interobserver Agreement Among Sleep Scorers F rom Differen t Cen ters in a Large Dataset, Sleep 23 (7) (2000) 901–908. [14] S. Gudmundsson, T. Runarsson, S. Sigurdsson, Automatic Sleep Staging using Supp ort V ec- tor Machines with Posterior Probability Estimates, in: CIMCA-IA WTIC’06, 2006, pp. 2–7. [15] P . Memar, F. F aradji, A nov el multi-class EEG-based sleep stage classification system, IEEE T ransactions on Neural Systems and Rehabilitation Engineering 26 (1) (2018) 84–95. [16] O. Tsinalis, P . M. Matthews, Y. Guo, Automatic Sleep Stage Scoring Using Time-F requency Analysis and Stack ed Sparse Auto enco ders, Annals of Biomedical Engineering 44 (5) (2016) 1587–1597. [17] A. Goldb erger, L. Amaral, L. Glass, J. Hausdorff, P . Iv anov, R. Mark, J. Mietus, G. Mo ody , C.-K. Peng, H. Stanley , Ph ysiobank, physiotoolkit, and physionet: Comp onents of a new research resource for complex physiologic signals., Circulation 101 (23) (2000) e215–e220. [18] O. Tsinalis, P . M. Matthews, Y. Guo, S. Zafeiriou, Automatic sleep stage scoring with single- channel EEG using convolutional neural networks, [19] A. Supratak, H. Dong, C. W u, Y. Guo, DeepSleepNet: a mo del for automatic sleep stage scoring based on raw single-channel EEG, IEEE T rans. Neural. Syst. Rehabil. Eng. 25 (2017) 1998–2008. DIFFUSE TO FUSE EEG SPECTRA 23 [20] A. Vilamala, K. H. Madsen, L. K. Hansen, Deep Con volutional Neural Net works for In ter- pretable Analysis of EEG Sleep Stage Scoring, in: 2017 IEEE in ternational w orkshop on machine learning for signal pro cessing, 2017. [21] S. Mallat, Group inv arian t scattering, Comm unications on Pure and Applied Mathematics 65 (10) (2012) 1331–1398. [22] R. R. Coifman, S. Lafon, Diffusion maps, Appl. Comput. 
Harmon. Anal. 21 (1) (2006) 5–30. [23] O. Lindenbaum, A. Y eredor, M. Salhov, A. Av erbuch, MultiView Diffusion Maps, under review (2015). [24] B. Scholk opf, A. Smola, Learning with Kernels, MIT Press, 2002. [25] J. And´ en, S. Mallat, Deep scattering spectrum, IEEE T ransactions on Signal Processing 62 (16) (2014) 4114–4128. [26] J. Bruna, S. Mallat, E. Bacry , J. Muzy , et al., Intermitten t pro cess analysis with scattering moments, The Annals of Statistics 43 (1) (2015) 323–351. [27] P . B ´ erard, G. Besson, S. Gallot, Embedding Riemannian manifolds by their heat kernel, Geom. F unct. Anal. 4 (1994) 373–398. [28] U. v on Luxburg, M. Belkin, O. Bousquet, Consistency of sp ectral clustering, Ann. Stat. 36 (2) (2008) 555–586. [29] A. Singer, H.-T. W u, Sp ectral con vergence of the connection Laplacian from random samples, Information and Inference: A Journal of the IMA 6 (1) (2017) 58–123. [30] N. El Karoui, On information plus noise kernel random matrices, Ann. Stat. 38 (5) (2010) 3191–3216. [31] N. El Karoui, H.-T. W u, Connection graph Laplacian metho ds can b e made robust to noise, Ann. Stat. 44 (1) (2016) 346–372. [32] R. R. Lederman, R. T almon, Learning the geometry of common latent v ariables using alternating-diffusion, Appl. Comp. Harmon. Anal. in press (2015). [33] R. T almon, H.-T. W u, Discov ering a latent common manifold with alternating diffusion for multimodal sensor data analysis, Appl. Comput. Harmon. Anal., In pr ess . [34] I. S. Dhillon, Co-clustering do cuments and words using bipartite spectral graph partition- ing, in: Proceedings of the sev en th A CM SIGKDD international conference on Knowledge discov ery and data mining, ACM, 2001, pp. 269–274. [35] O. Katz, R. T almon, Y.-L. Lo, H.-T. W u, Diffusion-based nonlinear filtering for multimodal data fusion with application to sleep stage assessment, Information F usion, In pr ess . [36] R. Rifkin, A. 
Klautau, In Defense of One-Vs-All Classification, Journal of Machine Learning Research 5 (2004) 101–141. [37] J. Malik, N. Reed, C.-L. W ang, H.-T. W u, Single-lead f-w a ve extraction using diffusion ge- ometry , Physiological Measurement 38 (2017) 1310–1334. [38] S. Alagapan, H. W. Shin, F. F rohlich, H.-T. W u, Diffusion geometry approach to efficien tly re- mov e electrical stim ulation artifacts in in tracranial electro encephalography , Journal of Neural Engineering in press (2018) http://iopscience.iop.org/article/10.1088/1741- - 2552/ aaf2ba . [39] R. Li, M. G. F rasch, H.-T. W u, Efficient fetal-maternal ecg signal separation from t wo channel maternal ab dominal ecg via diffusion-based c hannel selection, F ron tiers in Ph ysiology 8 (2017) 277. [40] A. Borb´ ely , P . Mattmann, M. Loepfe, I. Strauch, D. Lehmann, Effect of b enzodiazepine hypnotics on all-night sleep EEG spectra, Hum Neurobiol 4 (3) (1985) 189–94. [41] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444. 24 G.-R. LIU, Y.-L. LO, J. MALIK, Y.-C. SHEU, AND H.-T. WU Dep ar tment of Ma thema tics, Na tional Chen-Kung University, T ainan, T aiw an Dep ar tment of Thoracic Medicine, Heal thcare Center, Chang Gung Memorial Hos- pit al, Chang Gung University, School of Medicine, New T aipei, T aiw an Dep ar tment of Ma thema tics, Duke University, Durham, NC, USA Dep ar tment of Applied Ma thema tics, Na tional Chiao Tung University, Hsinchu, T ai- w an Dep ar tment of Ma thema tics and Dep ar tment of St a tistical Science, Duke Univer- sity, Durham, NC, USA. Ma thema tics Division, Na tional Center for Theoretical Sci- ences, T aipei, T aiw an E-mail address : hauwu@math.duke.edu