Multitaper Spectral Estimation HDP-HMMs for EEG Sleep Inference

Multitaper Spectral Estimation HDP-HMMs f or EEG Sleep Infer ence Leon Chlon ∗ MIT Department of BCS Cambridge, MA 02139 lchlon@mit.edu Andrew H. Song ∗ MIT Department of EECS Cambridge, MA 02139 andrew90@mit.edu Sandya Subramanian Harvard-MIT HST Cambridge, MA 02139 sandya@mit.edu Hugo Soulat MIT Department of BCS Cambridge, MA 02139 hsoulat@mit.edu John T auber MIT Department of BCS Cambridge, MA 02139 jtauber@mit.edu Demba Ba SEAS Department of Harvard Cambridge, MA 02138 demba@seas.harvard.edu Michael Prerau MGH Department of Anesthesia, Critical Care and Pain Medicine 55 Fruit st, GRJ 4, Boston, MA 02114 prerau@nmr.mgh.harvard.edu Abstract Electroencephalographic (EEG) monitoring of neural acti vity is widely used for sleep disorder diagnostics and research. The standard of care is to manually classify 30-second epochs of EEG time-domain traces into 5 discrete sleep stages. Unfortunately , this scoring process is subjective and time-consuming, and the deﬁned stages do not capture the heterogeneous landscape of healthy and clinical neural dynamics. This motiv ates the search for a data-driven and principled way to identify the number and composition of salient, reoccurring brain states present during sleep. T o this end, we propose a Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM), combined with wide-sense stationary (WSS) time series spectral estimation to construct a generati ve model for personalized subject sleep states. In addition, we employ multitaper spectral estimation to further reduce the large v ariance of the spectral estimates inherent to ﬁnite-length EEG measurements. By applying our method to both simulated and human sleep data, we arriv e at three main results: 1) a Bayesian nonparametric automated algorithm that recov ers general temporal dynamics of sleep, 2) identiﬁcation of subject-speciﬁc "microstates" within canonical sleep stages, and 3) disco very of stage-dependent sub-oscillations with shared spectral signatures across subjects. 1 Introduction During sleep, the brain displays highly heterogeneous cortical oscillatory dynamics consisting of a complex interplay of numerous netw orks related to arousal and loss of consciousness [ 1 , 2 , 3 ]. The current clinical standard uses 30-second epochs of electroencephalogram (EEG) time-domain traces to categorize brain state during sleep into 5 discrete sleep stages: wake, rapid eye mo vement (REM) sleep, and non-REM sleep, which consists of 3 sub-stages, notated NREM1 through NREM3 [ 4 ]. The progression of these sleep stages through the night, called a hypnogram, is used to characterize ∗ These authors contributed equally to this w ork. Preprint. W ork in progress. sleep architecture. Ho wev er , as these stages are determined subjectiv ely through visual inspection by trained sleep technicians, this process is both time-consuming and inherently suffers from ∼ 20% inter-scorer v ariability [1, 5], which is greatly exacerbated in the case of pathological sleep [6]. The chief explanation for this variability is that sleep, in every dimension thus far studied, is a continuous and dynamic process [ 7 ], which is clea ved into a low-resolution, discrete, state-space by clinical staging. Furthermore, the current clinical standard is based on features easily observed in the time-domain by eye, and limited to a crude time and state resolution that is designed to reduce variability and scoring time in subjective visual categorization, rather than to maximize information content. Recent studies hav e highlighted the information-rich nature of neural oscillatory time-frequency dynamics within the sleep EEG that can observ ed through the multitaper spectrogram [ 5 ]. While there is indeed a continuum of changes in the sleep EEG, the identiﬁcation of rele vant and recurring combinations of oscillatory activity is vital for enhancing our understanding of the interaction of neural mechanisms underlying sleep, as well as pathophysiological de viations from the norm. It is therefore useful to de vise a method for parcellating sleep, which is data rather than semantically dri ven, incorporates time-frequency dynamics, and can determine the state resolution without limitations imposed by the need for human scorers or a pre-assumed number of states. Unsupervised machine learning methods are ideal candidates for this class of problem. Howe ver , existing parametric approaches including Deep Belief Network - Hidden Marko v Models (DBN- HMMs) [ 8 ], k-Nearest Neighbor (k-NN)-based methods [ 9 ], Conditional Random Fields (CRFs) [ 10 ] and others [ 11 , 12 ] constrain the total number of states. T o address this shortcoming, we propose a Bayesian nonparametric framework that integrates 1) the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) [ 13 ] for the underlying state dynamics and, 2) spectral representations and asymptotic properties of the wide-sense stationary (WSS) time series for the emission distribution of the generative model. The multitaper spectral estimation framework [ 14 ] is incorporated to further optimize the bias and variance of the estimates of spectral characteristics for each state, compared to the estimates based on simple F ourier coefﬁcients of the observ ations. Finally , we use the symmetrized form of the Kullback–Leibler (KL) div ergence [ 15 ] to cluster the inferred states across our patient cohort, with interpretation provided by subject-matter experts. T o our knowledge, ours is the ﬁrst HDP-HMM to applied to human sleep EEG data, along with the multitaper framework. Our work is readily extensible beyond sleep inference to other domains of neuroscience, such as nonparametric state modeling of the neurophysiology underlying epilepsy or Parkinson’ s disease. 2 Model If the discrete time series, { y t } t , can be assumed to be realizations of a wide-sense stationary (WSS) stochastic process, we can apply se veral useful properties for parameter estimation and inference in time series analysis. WSS, or second-order stationarity , states that the mean and the autocov ariance of the stochastic process is in variant with respect to time t . In practice, since the long time series ( ∼ sev eral hours of data for our study) may demonstrate nonstationarity , we se gment the data into multiple non-ov erlapping windows, within which we assume WSS. Let N be the number of sample points for the entire time series and J the number of samples in each window . Then, T= b N J c denotes the number of windows for the data. W ith t = 1 , · · · , T as the index for a window , we deﬁne y t = { y ( t − 1) J , · · · , y tJ − 1 } as the samples in the window t . 2.1 Spectral Representation for Wide-Sense Stationary Time Series T o analyze the discretely sampled time series with sampling rate F s in the frequency domain, we use the Discrete Fourier T ransform (DFT) [ 16 ]. Denoting y ( F ) t ∈ C J as the DFT coefﬁcients of y t ∈ R J , y ( F ) t ( w j ) = 1 J J X l =1 y ( t − 1) J + l W ( j − 1)( l − 1) J (1) where W J = exp  − i 2 π J  and w j = j − 1 J for j = 1 , · · · , J 2 is the normalized frequency . The con version between frequency in Hz, f j , and w j is giv en as f j = F s w j . W e will use w j to refer the frequency index throughout this work. Also for notational conv enience, we will use d Re j,t = Re  y ( F ) t ( w j )  and d I m j,t = I m  y ( F ) t ( w j )  . W e now introduce the follo wing lemma [17]: 2 Figure 1: The generati ve model for the HDP-HMM and example spectrogram. Left : Graphical Model of HDP-HMM. T op-Right : Multi-tapered spectrogram of 80 minutes of sleep EEG data. Bottom right : e xtracted PSD of 15 second windows. Proposition 1. If y t is a WSS time series, the DFT coefﬁcients, d Re j,t and d I m j,t , are distributed as asymptotically independent normal, as J → ∞ , d Re j,t ∼ N  0 , f ( w j ) 2  and d I m j,t ∼ N  0 , f ( w j ) 2  (2) where f ( w j ) is the true underlying PSD of the stochastic process at w j [ 17 ] (for further explanation, see Supplementary Material Section 1). The independence property is the guiding principle for our generativ e model in section 2.3 and clustering analysis in section 3.2. 2.2 Multitaper Spectral Estimation The spectral estimation problem, or the problem of estimating the true PSD, f ( w j ) , from observations (DFT coefﬁcients), has recei ved great attention. The simplest estimate, albeit with high bias and variance, is b f ( w j ) = k y ( F ) t ( w j ) k 2 , the periodogram . Numerous methods hav e attempted to lower both bias and variance of b f ( w j ) [ 18 ]. Multitaper spectral estimation [ 19 ] optimizes the reduction in bias and variance of the PSD estimate by applying a set of speciﬁc tapers, or windowing functions, to the observed time series to obtain M pseudo-observations, { y ( m,F ) t ( w j ) } M m =1 , y ( m,F ) t ( w j ) = 1 J J X l =1 h ( m ) l y ( t − 1) J + l W ( j − 1)( l − 1) J where h ( m ) l ∈ R J is the m th taper , M being the number of tapers. The discrete prolate spheroidal sequences, which are mutually orthogonal with optimal energy concentration properties, are used as tapers. W ith these tapers, the periodogram estimates formed from the pseudo-observ ations, {k y ( m,F ) t ( w j ) k 2 } M m =1 , are approximately uncorrelated [ 14 ]. Therefore, we can write the ﬁnal estimator , b f M T ( w j ) , as, b f M T ( w j ) = 1 M M X m =1 k y ( m,F ) t ( w j ) k 2 (3) Figure 1 shows an e xample of the multitapered spectrogram and PSD for a segment of sleep EEG. 2.3 Generative model Our generativ e model, as depicted in Figure 1, follo ws an HDP-HMM framework [ 13 ]. The observ a- tions, or DFT coefﬁcients for each time windo w across frequency bands of interest, are generated by the spectral r epresentation emission model . W e denote the spectral representation at each time window t as y ( m,F ) t ( w j ) , where t = 1 , · · · , T is the time windo w , m = 1 , · · · , M the tapers, and w j for j = 1 , · · · , J 2 is the frequency inde x. 3 2.3.1 Spectral Representation Emission Model W e use the normal distrib ution in eq.(2) as the likelihood, p  y ( F ) t    θ ( s t )  , where θ ( s t ) = { f ( w ( s t ) j ) } j ,  d Re 1 ,t , d I m 1 ,t , · · · , d Re J 2 ,t , d I m J 2 ,t  ∼ p  y ( F ) t    θ ( s t )  = N  ¯ 0 , Σ s t  = J 2 Y j =1 n N  0 , f ( w ( s t ) j ) 2 o 2 (4) where Σ s t = 1 2 diag ( f ( w ( s t ) 1 ) , f ( w ( s t ) 1 ) , · · · , f ( w ( s t ) J 2 ) , f ( w ( s t ) J 2 )) ∈ R 2 J × 2 J . Note that the diagonal cov ariance matrix is due to Proposition 1 in section 2.1. In practice, only indices that correspond to the frequency of interest are used. For the tapered case, we assume that the observation y ( m,F ) t is generated by the same state-dependent cov ariance matrix Σ s t . p  y ( m,F ) t    θ ( s t )  = N  ¯ 0 , Σ s t  for m = 1 , · · · , M (5) For the prior distribution on θ ( s t ) , we use the In verse Gamma distrib ution, due to the conjugacy between the normal likelihood and the In verse Gamma prior . 2.3.2 HDP-HMM construction W e model { y ( m,F ) t ( w j ) } T t =1 as the realization of a generati ve process dri ven by a set of local latent variables { s t } T t =1 ∈ { 1 , ..., K } that we term "sleep states". Sleep state modeling for each subject proceeds via the following construction for our HDP-HMM: β β β ∼ GEM ( γ ) , π π π k | β ∼ DP ( α, β β β ) , b ( s t ) j ∼ Gamma ( a 0 , b 0 ) , (6) f ( w ( s t ) j ) ∼ IG ( a, b ( s t ) j ) , s t | s t − 1 ∼ Multinomial ( π s t − 1 ) , y ( m,F ) t | s t ∼ N ( ¯ 0 , Σ s t ) (7) where IG denotes the In verse Gamma distribution for each frequenc y and state speciﬁc f ( w ( s t ) j ) as outlined in section 2.3.1. T o capture the frequency-le vel characteristics, we place an uninformati ve Gamma prior ov er the In verse Gamma hyperparameter b ( s t ) j . F ollowing standard con vention, β β β ∼ GEM ( γ ) denotes the stick-breaking construction for the Dirichlet Process (DP) gi ven by β k = β 0 k k − 1 Y i =1 (1 − β 0 i ) , (8) and β 0 k ∼ Beta (1 , γ ) . { β k 0 } K k 0 =1 represents a prior over the transition probabilities into a state k 0 , and π π π k = { π k,k 0 } K k 0 =1 represents the transition probability from state k to k 0 . Our HDP-HMM is fully nonparametric when we take the limit as K → ∞ in the GEM stick breaking construction. Howe ver , in our work, it is reasonable to truncate K since many poorly represented states will not necessarily confer any additional neurophysiological context, especially considering the ﬁnite nature of the data. Finally , we place uninformative gamma hyperpriors o ver the HDP hyperparameters γ and α : γ ∼ Gamma ( α γ , β γ ) and α ∼ Gamma ( α α , β α ) . 3 Inference 3.1 Subject-wise Inference - State T rajectory Sampling For model inference, we utilize beam sampling [ 20 ], which integrates slice sampling and dynamic programming to sample whole state trajectories from the model posterior . HDP-HMMs are inﬁnite- dimensional by formulation, which complicates the computation of state trajectories using ordinary dy- namic programming schema. By iterati vely sampling an auxiliary variable u t ∼ Uniform (0 , π s t − 1 ,s t ) , the beam sampler imposes a constraint π s t − 1 ,s t ≥ u t on sampled state transitions p ( s t | y ( m,F ) 1: t , u 1: t ) , and consequently sets an upper bound on the set of dynamically computed state trajectories (full details outlined in [ 20 ]). For the posterior distribution of f  w ( s t ) j  , we use following facts: 1) the 4 posterior distribution is also an In verse Gamma distribution due to conjugacy and 2) there are M pseudo-observations for the DFT coef ﬁcients. p  f  w ( s t ) j     { d Re, ( m ) j,k , d I m, ( m ) j,k } M m =1  = 1 2 IG  a j + M , b ( s t ) j + 1 2 M X m =1 n  d Re, ( m ) j,k  2 +  d Re, ( m ) j,k  2 o (9) with a gamma prior ov er the cluster and frequency-speciﬁc parameter b ( s t ) j . 3.2 Group Level Inference - Clustering Analysis Follo wing subject-lev el inference, we perform posthoc clustering analysis to in vestigate the shared set of states across different subjects. Since the HDP-HMM is modeled separately for each indi vidual, there is no guarantee that any pair of states from dif ferent subjects share similar spectral characteristics. Symmetric KL diver gence as the distance metric W e use the symmetric KL di ver gence between the Gaussian likelihoods [ 21 ], with the choice of the likelihood justiﬁed by the asymptotic normality from Proposition 1 . Let c = 1 , · · · , C , denote the cluster index and f c the spectral characteristic of the cluster c . Likewise, let b f s p be the spectral characteristic of the state s p of the subject p . This is chosen to be the MAP estimate of p  f s p    θ  in eq. (9). If the observ ation y ( F ) is generated from state s p , the log-likelihood is gi ven as (modulo the constants) log p  y ( F )    b f s p  = X w j h − log b f s p ( w j ) − k y ( F ) k 2 / b f s p ( w j ) i (10) If s p is clustered into cluster c , the log-likelihood with respect to f c would be log p  y ( F )    f c  = X w j h − log f c ( w j ) − k y ( F ) k 2 /f c ( w j ) i (11) T aking the expectation of the log-ratio of the lik elihoods, we obtain both sets of KL diver gence I ( b f s p ; f c ) = E p ( y ( F ) | b f s p ) " log p ( y ( F ) | b f s p ) p ( y ( F ) | f c ) # = X w j ( − log b f s p ( w j ) f c ( w j ) ! + b f s p ( w j ) f c ( w j ) − 1 ) (12) with I ( f c ; b f s p ) computed similarly . The symmetric KL diver gence, J ( b f s p ; f c ) , is deﬁned as follows J ( b f s p ; f c ) = I ( b f s p ; f c ) + I ( f c ; b f s p ) = X w j ( b f s p ( w j ) f c ( w j ) + f c ( w j ) b f s p ( w j ) − 1 ) (13) W eighted K-means clustering W ith eq.(13) as the distance metric, we now perform K-means clustering on all the states across the subjects, with f c as the centroid of the cluster c . Furthermore, we weight [ 22 ] J ( b f s p ; f c ) by n s p , the number of occurrences of s p in subject p . n s p effecti vely acts as another cov ariate for clustering the heterogeneous states. F or instance, high n s p would indicate that the state is likely to be one of the canonical sleep stages (REM or NREM). W e perform two additional preprocessing steps for the clustering analysis. First, we normalize power within each state, b f 0 s p ( w j ) = b f s p ( w j ) / P w j b f s p ( w j ) , to pre vent the total po wer from af fecting the clustering. The distance metric in Eq.(13) has a tendenc y to assign a high power state to a high po wer cluster , and vice versa, re gardless of the relativ e power distrib ution between frequency bands. After normalization, the algorithm is able to cluster the heterogeneous states based on a shared spectral signature. As a second procedure, we take the median v alue of n s p across posterior samples as the representativ e number of occurrences for the speciﬁc state for the patient. W e observe that after the burn-in period, n s p is quite consistent, which justiﬁes our choice of the median. 4 Datasets Simulated Sleep-Inspired Data T o test the robustness of our model, we simulate EEG time series with realistic spectral content and temporal dynamics. W e do not try to reproduce the extremely 5 Figure 2: Simulated Data. For each time window , an EEG signal was generated according to the current hidden state and its corresponding state-space model parameters. A typical EEG trace for one state is depicted on the top left panel. Its multitapered PSD is plotted on the top right panel (orange). The theoretical PSD (black) is determined by the state-space model parameters. The simulated and sampled state trajectories are presented on the bottom left panels. The 5 different states alternate according to a predeﬁned transition matrix, and the difference between simulated and sampled transition matrices are represented on the bottom right panel. complex short-time dynamics of sleep EEG; rather our simulated data gauges the model’ s capacity to extract a set of discrete spectral characteristics from time-series data. Simulated data is generated by combining the oscillation components time series decomposition method [ 23 ] and the sleep dynamics [5]. Our state-space model is described as follo w . For l = 1 , · · · , J x l +1 = Rx l + v l ∈ R 2 D , y l = ( 1 0 1 ... 1 0 ) x l + w l ∈ R (14) where D is the number of frequency components, v t ∼ N ( ¯ 0 , Q ) and w t ∼ N (0 , σ 2 r ) . W e deﬁne R as a block diagonal matrix composed of 2 by 2 rotation matrices R i where F s is the sampling frequency , and f i is a frequency (in Hz) of interest in the range i = 1 , ..., D and : R i = a i  cos (2 π f i /F s ) sin (2 π f i /F s ) sin (2 π f i /F s ) − cos (2 π f i /F s )  where 0 < a i < 1 , (15) Our observation PSD is the sum of the observation noise w t and the PSD of each oscillation. For any gi ven frequency , the latter peaks around f i and is fully determined by a i , f i and Q 2 i, 2 i (the 2 i th diagonal element of Q) [ 23 ]. Speciﬁc details on frequenc y bands modeled and the associated PSD for a typical 15s window of simulated data are a v ailable in the Supplementary Material Section 2. Sleep Study Data Our experimental data consists of 200Hz high-density (64-channel) EEG record- ings from nine healthy right-handed subjects (full clinical details in the original study [ 5 ]). W e used the occipital lobe (channel O1) recordings in accordance with a subject-matter expert, who selected the channel to assign canonical sleep stages to 30-second windows. The observ ations for our model consist of 15-second windows, intended to delineate high frequency sleep phenomena such as sleep ripples and spindles from the signal. W indows containing a total power greater than the 95th percentile of the entire signal were rejected from further analysis. Observation data consisted of multitapered spectral observ ations (5 tapers, Time Bandwith=4) in [0 . 5 − 2 . 5] Hz, [2 . 5 − 4 . 5] Hz, [4 . 5 − 6 . 5] Hz, [6 . 5 − 8 . 5] Hz, [10 . 5 − 12 . 5] Hz, and [12 . 5 − 35] Hz, where the increased resolution at the slo w and alpha frequencies is believ ed to assist the stratiﬁcation of the NREM1, NREM2, and NREM3 sleep stages [ 5 ]. Speciﬁcally , for each tapered windo w , we take the DFT and examine the power in each frequency band to determine the frequenc y index corresponding to the median. DFT coefﬁcients corresponding to the frequency index are selected as the spectral observ ations in the freqency band. 6 Figure 3: T op left: A multitaper spectogram across the entire sleep study duration for Subject 5. Middle left: A state trajectory chosen from the posterior distribution samples of the HDP-HMM according to the median Spearman rho estimate. Bottom left: Sleep expert hypnogram. T op right: Histogram of Spearman’ s rho values computed between all sampled state trajectories and the hypnogram, where the green v ertical line highlights the median of the distribution. Bottom right: The estimated power spectral density for each state in the state trajectory . 5 Results 5.1 Recovering Sleep State T rajectories For each run of the beam sampler, we initialize the following HDP hyperprior distributions: γ ∼ Gamma (1 , 1) , α ∼ Gamma (1 , 1) and b ( k ) j ∼ IG(1,1) . W e "burn-in" the ﬁrst 2000 poste- rior distribution samples, and draw a subsequent 100 samples for the simulation and 1000 samples for the real data with a step size of 50 to minimize inter-sample autocorrelation. Simulated Sleep-Inspired Data A typical simulation result is presented in Figure 2 in which we simulate 2000, 15-second time windows. F or each sample from the posterior , our model successfully recov ers the correct number of states and the exact state trajectory . Furthermore, the ground truth transition matrix is sampled almost perfectly . Finally , our model recov ers the discrete spectral characteristics of the dif ferent simulated states. The estimated cov ariances matrices correspond to the median of the theoretical PSD for each frequency band. As expected, the use of the multitaper framew ork resulted in a smaller v ariance in the estimated PSD when compared to a standard DFT approach. More detailed comparison can be found in the Supplementary Material Section 2. Human Subject EEG Hypnograms illustrate canonical sleep stage traces in terms of descending le vels of arousal, where a 5 on the hypnogram corresponds to the highest level of arousal (wake), and a 1 to the lowest (NREM3). F or comparison, our state labels are also reordered, using descending state- wise alpha band power as a proxy for arousal. Spearman’ s rho ρ t is used to compute the similarity in temporal dynamics between sampled state trajectories and the hypnogram. W e present Subject 5 as a case study in Figure 3, where all sampled state trajectories are used to construct a distribution over ρ t values. The distribution is tightly localized around a median value of ρ t = 0 . 73 , demonstrating that the model effecti vely captures the transition dynamics in the hypnogram. Remarkably , transitions between NREM and REM states in the hypnogram are mirrored by similar transitions between slow states S1, S2 and S3 to REM-like states such as S5, S7 and S8 in the state trajectory . Similarly , inter-NREM transitions in the hypnogram coincide with synon ymous transition ev ents in the state trajectory between S1, S2 and S3. Our model introduces additional "micro"-states beyond the standard canonical sleep stages, typically coinciding with more volatile dynamics in the spectrogram. This can clearly be discerned in Figure 3, where epochs of REM in the hypnogram are broken into alternating S8/S7 (characteristic REM PSDs) and S5 (slow dynamics dominated PSD) in the state trajectory . This is unsurprising considering that a) sleep consists of highly heterogeneous and ﬂuctuating oscillatory 7 Figure 4: Detailed Cluster Analysis for T wo Subjects. T op: Heatmaps illustrating the proportion of time spent in each sleep expert scored stage belongs to each cluster . Bottom: The cluster trajectories on the right show the temporal dynamics of the clusters compared to the sleep e xpert scored stages. dynamics, b) current deﬁned canonical sleep stages are unable to account for inter-subject v ariability and c) both additional and combined sleep stages hav e been reported [ 24 , 5 ]. When considering the variation introduced from additional subject-speciﬁc micro-states, the general temporal dynamics are recov ered well across all the subjects with an average distrib ution median value of ρ t = 0 . 69 . 5.2 Clustering Sleep States Across Subjects The clustering analysis aims to group states based on the underlying hypothesis that e very subject has both "REM-like" and "NREM-like" states; ho we ver , their speciﬁc spectral characteristics may v ary . Across the nine subjects, a total of 103 states and nine clusters were recovered. The HDP-HMM recov ered sleep states for each subject are then mapped to their respecti ve clusters. Finally , the cluster trajectory and hypnogram are compared for each subject. W e discuss their spectral characteristics in the Supplementary Material Section 3.1. Although nine clusters were identiﬁed, not all clusters were visited by each subject. An example of the heterogeneity in cluster dynamics across subjects is sho wn for two subjects in Figure 4. The heatmaps link the proportion of time spent in each sleep expert scored stage to the time spent in each cluster . The clusters are org anized from bottom to top in order of increasing power in the alpha band (8-12 Hz). For example, for Subject 3, the fourth column (REM) indicates that 90 percent of the windo ws labeled REM by the sleep e xpert were classiﬁed as belonging to Cluster 8 and the remaining 10 percent as belonging to Cluster 7. The cluster trajectory plots in Figure 4 show the temporal dynamics of the clusters compared to the sleep expert scored stages. There are se veral note worthy observ ations. Firstly , across all subjects, including the two in Figure 4, there is a clear shift to wards increasing power in the alpha band from deep sleep (NREM3) to lighter sleep (NREM1), REM, and then wake. This is evi denced by increasing proportions of time spent in higher clusters going from left to right on the heatmaps, which agrees with kno wn sleep stage physiology ([ 5 ]). Secondly , certain clusters are dominant during the same scored sleep stage across subjects, while others modulate their behavior between subjects. F or example, across all nine subjects, Clusters 1 and 6 are only dominant during the deepest sleep states (NREM3 and NREM2), while Clusters 3,5 and 9 are only dominant during wake states. Ho wev er , Cluster 4 is dominant during NREM2, NREM1, and REM in different subjects, which suggests that each person may have a slightly different spectral signature for the same scored sleep stage and that certain combinations of network acti vity may not be relegated solely to one stage. (For more details on cluster dominance during scored sleep stages, see T able S1 in the Supplementary Material Section 3.2.) 8 Finally , both the heatmaps and cluster trajectories demonstrate that there are faster sub-oscillations occurring within each sleep expert scored stage across subjects. Howe ver , these sub-oscillations occur at dif ferent times across subjects. For example, Subject 3 oscillates rapidly between clusters during deep sleep (NREM3), b ut is very stable during REM. On the other hand, Subject 7 is stable during deep sleep but oscillates more during lighter stages of sleep and wake. This stage-dependent oscillatory heterogeneity can been seen clearly in T able S2 (Supplementary Material Section 3.2), which has the computed transition rates per minute for each sleep stage for both subjects. In a broader context, stage-dependent sub-oscillations may help describe heterogeneity across subjects and populations. 6 Conclusion Overall, this method provides a r obust, Bayesian nonparametric frame work for identifying the spectral content, number , and dynamics of salient recurring oscillatory states during sleep across patients. In addition, we identify subject-speciﬁc "microstates" within canonical sleep stages and furthermore, discov er stage-dependent sub-oscillations with shared spectral signatures across subjects. This work can serve as a basis for no vel mechanistic studies focusing on the network dynamics of these states, as well as their clinical and scientiﬁc rele vance. By liberating sleep from the practical necessities of human scoring and a priori assumptions of machine learning algorithms, we pav e the way for a higher dimensional, data-driv en feature space for biomarker detection and clinical intervention. 9 References [1] Madeleine M Grigg-Damber ger . The AASM Scoring Manual four years later . J ournal of clinical sleep medicine : JCSM : ofﬁcial publication of the American Academy of Sleep Medicine , 8(3):323–32, 6 2012. [2] A. L. Loomis, E. N. Harv ey , and G. A. Hobart. Cerebral states during sleep, as studied by human brain potentials. Journal of Experimental Psyc hology , 21(2):127–144, 1937. [3] Ritchie E. Brown, Radhika Basheer, James T . McKenna, Robert E. Strecker , and Robert W . McCarley . Control of Sleep and W akefulness. Physiological Revie ws , 92(3):1087–1187, 7 2012. [4] Anthony Kales. A manual of standar dized terminology , techniques and scoring system for sleep stages of human subjects . United States Gov ernment Printing Ofﬁce, W ashington DC, 1968. [5] Michael J. Prerau, Ritchie E. Brown, Matt T . Bianchi, Jeffre y M. Ellenbogen, and Patrick L. Purdon. Sleep Neurophysiological Dynamics Through the Lens of Multitaper Spectral Analysis. Physiology , 32(1):60–92, 1 2017. [6] Michael H Silber , Sonia Ancoli-Israel, Michael H Bonnet, Sudhansu Chokroverty , Madeleine M Grigg- Damberger , Max Hirshkowitz, Sheldon Kapen, Sharon A K eenan, Meir H Kryger , Thomas Penzel, Mark R Pressman, and Conrad Iber . The visual scoring of sleep in adults. Journal of clinical sleep medicine : JCSM : ofﬁcial publication of the American Academy of Sleep Medicine , 3(2):121–31, 3 2007. [7] Robert D. Ogilvie. The process of falling asleep. Sleep Medicine Re views , 5(3):247–270, 6 2001. [8] Martin Längkvist, Lars Karlsson, and Amy Loutﬁ. Sleep Stage Classiﬁcation Using Unsupervised Feature Learning. Advances in Artiﬁcial Neural Systems , 2012:1–9, 7 2012. [9] Salih Güne ¸ s, Kemal Polat, and ¸ Sebnem Y osunkaya. Ef ﬁcient sleep stage recognition system based on EEG signal using k-means clustering based feature weighting. Expert Systems with Applications , 37(12):7922–7928, 12 2010. [10] Gang Luo and W anli Min. Subject-adaptive real-time sleep stage classiﬁcation based on conditional random ﬁeld. AMIA ... Annual Symposium pr oceedings. AMIA Symposium , pages 488–92, 10 2007. [11] I. Gath, C. Feuerstein, and A. Ge va. Unsupervised classiﬁcation and adapti ve deﬁnition of sleep patterns. P attern Recognition Letters , 15(10):977–984, 10 1994. [12] Genshiro A. Sunagaw a, Hiroyoshi Séi, Shigeki Shimba, Y oshihiro Urade, and Hiroki R. Ueda. F ASTER: an unsupervised fully automated sleep staging method for mice. Genes to Cells , 18(6):502–518, 6 2013. [13] Matthew J Beal, Zoubin Ghahramani, and Carl Edward Rasmussen. The Inﬁnite Hidden Mark ov Model. Advances in Neural Information Pr ocessing Systems 14 , pages 577–584, 2002. [14] Behtash Babadi and Emery N. Bro wn. A Revie w of Multitaper Spectral Analysis. IEEE T ransactions on Biomedical Engineering , 61(5):1555–1564, 5 2014. [15] Kullback. The Kullback-Leibler Distance. The American Statistician , 41(4), 1987. [16] Alan V . Oppenheim and Ronald W . Schafer . Discr ete-time signal pr ocessing . Pearson, 2010. [17] David R. Brillinger and Society for Industrial and Applied Mathematics. T ime series : data analysis and theory . Society for Industrial and Applied Mathematics (SIAM, 3600 Market Street, Floor 6, Philadelphia, P A 19104), 2001. [18] Donald B. Perci val and Andre w T . W alden. Spectral analysis for physical applications : multitaper and con ventional univariate techniques . Cambridge Uni versity Press, 1993. [19] D.J. Thomson. Spectrum estimation and harmonic analysis. Pr oceedings of the IEEE , 70(9):1055–1096, 1982. [20] Jurgen V an Gael, Y unus Saatci, Y ee Whye T eh, and Zoubin Ghahramani. Beam Sampling for the Inﬁnite Hidden Marko v Model. Pr oceedings of the 25th International Confer ence on Machine Learning , pages 1088–1095, 2008. [21] Robert H. Shumway and Da vid S. Stof fer . T ime Series Analysis and Its Applications . Springer T exts in Statistics. Springer International Publishing, Cham, 2017. [22] G. C. Tseng. Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics , 23(17):2247–2255, 9 2007. 10 [23] T akeru Matsuda and Fumiyasu Komaki. Multi variate T ime Series Decomposition into Oscillation Compo- nents. Neural Computation , 29(8):2055–2075, 8 2017. [24] Kyle Ulrich, David E Carlson, W enzhao Lian, Jana Schaich Borg, Kafui Dzirasa, and Lawrence Carin. Analysis of Brain States from Multi-Region LFP T ime-Series. Advances in Neural Information Pr ocessing Systems 27 , pages 2483–2491, 2014. 11

Multitaper Spectral Estimation HDP-HMMs for EEG Sleep Inference

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment