Spectral Correlation Hub Screening of Multivariate Time Series

This chapter discusses correlation analysis of stationary multivariate Gaussian time series in the spectral or Fourier domain. The goal is to identify the hub time series, i.e., those that are highly correlated with a specified number of other time s…

Authors: Hamed Firouzi, Dennis Wei, Alfred O. Hero III

Spectral Correlation Hub Screening of Multivariate Time Series
Sp ectral Correlation Hub Screening of Multiv ariate Time Series Hamed Firouzi, Dennis W ei and Alfred O. Hero I II Electrical Engineering and Computer Science Department, Univ ersit y of Michigan, USA firouzi@umich.edu, dlwei@eecs.umich.edu, hero@eecs.umich.edu Summary . This c hapter discusses correlation analysis of stationary multiv ariate Gaussian time series in the sp ectral or F ourier domain. The goal is to iden tify the h ub time series, i.e., those that are highly correlated with a sp ecified n umber of other time series. W e sho w that F ourier comp onen ts of the time series at differ- en t frequencies are asymptotically statistically indep enden t. This prop ert y p ermits indep enden t correlation analysis at each frequency , alleviating the computational and statistical challenges of high-dimensional time series. T o detect correlation hubs at each frequency , an existing correlation screening metho d is extended to the com- plex n umbers to accommo date complex-v alued F ourier comp onen ts. W e c haracterize the num b er of hub discov eries at sp ecified correlation and degree thresholds in the regime of increasing dimension and fixed sample size. The theory specifies appro- priate thresholds to apply to sample correlation matrices to detect h ubs and also allo ws statistical significance to b e attributed to hub discov eries. Numerical results illustrate the accuracy of the theory and the usefulness of the prop osed sp ectral framew ork. Key words: Complex-v alued correlation screening, Sp ectral correlation analysis, Gaussian stationary pro cesses, Hub screening, Correlation graph, Correlation net- w ork, Spatio-temporal analysis of multiv ariate time series, High dimensional data analysis 1 In tro duction Correlation analysis of multiv ariate time series is imp ortant in man y appli- cations suc h as wireless sensor net works, computer netw orks, neuroimaging, and finance [1, 2, 3, 4, 5]. This c hapter fo cuses on the problem of detecting hub time series, ones that ha ve a high degree of in teraction with other time series as measured by correlation or partial correlation. Detection of hubs can lead to reduced computational and/or sampling costs. F or example in wire- less sensor net works, the iden tification of hub nodes can be useful for reducing p o wer usage and adding or removing sensors from the netw ork [6, 7]. Hub de- tection can also giv e new insights ab out underlying structure in the dataset. 2 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II In neuroimaging for instance, studies hav e consisten tly shown the existence of highly connected hubs in brain graphs (connectomes) [8]. In finance, a hub migh t indicate a vulnerable financial instrument or a sector whose collapse could ha ve a ma jor effect on the mark et [9]. Correlation analysis becomes challenging for m ultiv ariate time series when the dimension p of the time series, i.e. the n umber of scalar time series, and the n umber of time samples N are large [4]. A naive approac h is to treat the time series as a set of indep enden t samples of a p -dimensional random vector and estimate the associated cov ariance or correlation matrix, but this approac h completely ignores temp oral correlations as it only considers dep endences at the same time instant and not b et ween differen t time instants. The work in [10] accoun ts for temp oral correlations by quan tifying their effect on conv ergence rates in co v ariance and precision matrix estimation; how ever, only correlations at the same time instan t are estimated. A more general approach is to consider all correlations b et ween any tw o time instants of an y tw o series within a windo w of n ≤ N consecutiv e samples, where the previous case corresp onds to n = 1. How ev er, in general this would entail the estimation of an np × np correlation matrix from a reduced sample of size m = N/n , which can b e computationally costly as w ell as statistically problematic. In this c hapter, w e prop ose sp e ctr al correlation analysis as a metho d of o vercoming the issues discussed ab o ve. As b efore, the time series are divided in to m temp oral segmen ts of n consecutive samples, but instead of estimating temp oral correlations directly , the metho d p erforms analysis on the Discrete F ourier T ransforms (DFT) of the time series. W e pro ve in Theorem 1 that for stationary , jointly Gaussian time series under the mild condition of abso- lute summabilit y of the auto- and cross-correlation functions, differen t F ourier comp onen ts (frequencies) b ecome asymptotically indep endent of eac h other as the DFT length n increases. This prop erty of stationary Gaussian processes allo ws us to fo cus on the p × p correlations at each frequency separately with- out ha ving to consider correlations b et ween different frequencies. Moreo ver, sp ectral analysis isolates correlations at sp ecific frequencies or timescales, p o- ten tially leading to greater insight. T o make aggregate inferences based on all frequencies, straightforw ard pro cedures for multiple inference can b e used as describ ed in Section 4. The sp ectral approach reduces the detection of hub time series to the indep enden t detection of hubs at eac h frequency . How ever, in exc hange for ac hieving sp ectral resolution, the sample size is reduced b y the factor n , from N to m = N /n . T o confidently detect hubs in this high-dimensional, low- sample regime (large p , small m ), as well as to accommo date complex-v alued DFTs, we develop a metho d that we call c omplex-value d (p artial) c orr elation scr e ening . This is a generalization of the correlation and partial correlation screening metho d of [11, 9, 12] to complex-v alued random v ariables. F or each frequency , the metho d computes the sample (partial) correlation matrix of the DFT comp onents of the p time series. Highly correlated v ariables (hubs) are then identified b y thresholding the sample correlation matrix at a level Sp ectral Correlation Hub Screening of Multiv ariate Time Series 3 ρ and screening for ro ws (or columns) with a sp ecified num b er δ of non-zero en tries. W e characterize the b eha vior of complex-v alued correlation screening in the high-dimensional regime of large p and fixed sample size m . Specifically , Theorem 2 and Corollary 2 giv e asymptotic expressions in the limit p → ∞ for the mean num b er of hubs detected at thresholds ρ, δ and the probability of discov ering at least one such hub. Bounds on the rates of con vergence are also provided. These results show that the num b er of hub discov eries under- go es a phase transition as ρ decreases from 1, from almost no discov eries to the maxim um num b er, p . An expression (33) for the critical threshold ρ c,δ is derived to guide the selection of ρ under different settings of p , m , and δ . F urthermore, given a null hypothesis that the population correlation matrix is sufficiently sparse, the expressions in Corollary 2 b ecome indep endent of the underlying probability distribution and can thus b e easily ev aluated. This allo ws the statistical significance of a hub discov ery to be quan tified, sp ecif- ically in the form of a p -v alue under the null h yp othesis. W e note that our results on complex-v alued correlation screening apply more generally than to sp ectral correlation analysis and thus may b e of indep enden t interest. The remainder of the chapter is organized as follows. Section 2 presen ts notation and definitions for multiv ariate time series and establishes the asymp- totic indep endence of sp ectral comp onents. Section 3 describ es complex- v alued correlation screening and characterizes its prop erties in terms of num- b ers of hub discov eries and phase transitions. Section 4 discusses the appli- cation of complex-v alued correlation screening to the sp ectra of multiv ariate time series. Finally , Sec. 5 illustrates the applicability of the prop osed frame- w ork through simulation analysis. 1.1 Notation A triplet ( Ω , F , P ) represents a probabilit y space with sample space Ω , σ - algebra of ev ents F , and probabilit y measure P . F or an even t A ∈ F , P ( A ) represen ts the probability of A . Scalar random v ariables and their realizations are denoted with upp er case and low er case letters, resp ectiv ely . Random v ectors and their realizations are denoted with b old upp er case and b old lo wer case letters. The exp ectation op erator is denoted as E . F or a random v ariable X , the cumulativ e probability distribution (cdf ) of X is defined as F X ( x ) = P ( X ≤ x ). F or an absolutely contin uous cdf F X ( . ) the probability densit y function (pdf ) is defined as f X ( x ) = dF X ( x ) /dx . The cdf and p df are defined similarly for random vectors. Moreov er, we follow the definitions in [13] for conditional probabilities, conditional exp ectations and conditional densities. F or a complex num b er z = a + b √ − 1 ∈ C , < ( z ) = a and = ( z ) = b repre- sen t the real and imaginary parts of z , resp ectiv ely . A complex-v alued random v ariable is comp osed of t wo real-v alued random v ariables as its real and imag- inary parts. A complex-v alued Gaussian v ariable has real and imaginary parts 4 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II that are Gaussian. A complex-v alued (Gaussian) random v ector is a vector whose entries are complex-v alued (Gaussian) random v ariables. The cov ari- ance of a p -dimensional complex-v alued random v ector Y and a q -dimensional complex-v alued random vector Z is a p × q matrix defined as co v ( Y , Z ) = E  ( Y − E [ Y ])( Z − E [ Z ]) H  , where H denotes the Hermitian transpose. W e write cov( Y ) for cov( Y , Y ) and v ar( Y ) = cov( Y , Y ) for the v ariance of a scalar random v ariable Y . The correlation co efficien t b et ween random v ariables Y and Z is defined as cor( Y , Z ) = co v ( Y , Z ) p v ar( Y )v ar( Z ) . Matrices are also denoted by b old upp er case letters. In most cases the distinction b et ween matrices and random vectors will b e clear from the con- text. F or a matrix A we represent the ( i, j )th entry of A b y a ij . Also D A represen ts the diagonal matrix that is obtained by zeroing out all but the diagonal en tries of A . 2 Sp ectral Represen tation of Multiv ariate Time Series 2.1 Definitions Let X ( k ) = [ X (1) ( k ) , X (2) ( k ) , · · · X ( p ) ( k )], k ∈ Z , be a m ultiv ariate time series with time index k . W e assume that the time series X (1) , X (2) , · · · X ( p ) are second-order stationary random pro cesses, i.e.: E [ X ( i ) ( k )] = E [ X ( i ) ( k + ∆ )] (1) and co v [ X ( i ) ( k ) , X ( j ) ( l )] = cov[ X ( i ) ( k + ∆ ) , X ( j ) ( l + ∆ )] (2) for an y integer time shift ∆ . F or 1 ≤ i ≤ p , let X ( i ) = [ X ( i ) ( k ) , · · · , X ( i ) ( k + n − 1)] denote any vector of n consecutiv e samples of time series X ( i ) . The n -p oin t Discrete F ourier T ransform (DFT) of X ( i ) is denoted by Y ( i ) = [ Y ( i ) (0) , · · · , Y ( i ) ( n − 1)] and defined b y Y ( i ) = WX ( i ) , 1 ≤ i ≤ p in whic h W is the DFT matrix: W = 1 √ n      1 1 · · · 1 1 ω · · · ω n − 1 . . . . . . . . . . . . 1 ω n − 1 · · · ω ( n − 1) 2      , Sp ectral Correlation Hub Screening of Multiv ariate Time Series 5 where ω = e − 2 π √ − 1 /n . W e denote the n × n p opulation cov ariance matrix of X ( i ) as C ( i,i ) = [ c ( i,i ) kl ] 1 ≤ k,l ≤ n and the n × n p opulation cross cov ariance matrix b et ween X ( i ) and X ( j ) as C ( i,j ) = [ c ( i,j ) kl ] 1 ≤ k,l ≤ n for i 6 = j . The translation inv ariance prop- erties (1) and (2) imply that C ( i,i ) and C ( i,j ) are T o eplitz matrices. Therefore c ( i,i ) kl and c ( i,j ) kl dep end on k and l only through the quan tit y k − l . Representing the ( k , l )th entry of a T o eplitz matrix T by t ( k − l ), we write c ( i,i ) kl = c ( i,i ) ( k − l ) and c ( i,j ) kl = c ( i,j ) ( k − l ) , where k − l takes v alues from 1 − n to n − 1. In addition, C ( i,i ) is symmetric. 2.2 Asymptotic Indep endence of Sp ectral Comp onents The follo wing theorem states that for stationary time series, DFT comp onen ts at different sp ectral indices (i.e. frequencies) are asymptotically uncorrelated under the condition that the auto-cov ariance and cross-co v ariance functions are absolutely summable. This theorem follows directly from the sp ectral the- ory of large T o eplitz matrices, see, for example, [14] and [15]. Ho wev er, for the b enefit of the reader we give a self contained pro of of the theorem. Theorem 1 Assume lim n →∞ P n − 1 t =0 | c ( i,j ) ( t ) | = M ( i,j ) < ∞ for al l 1 ≤ i, j ≤ p . Define err ( i,j ) ( n ) = M ( i,j ) − P n − 1 m 0 =0 | c ( i,j ) ( m 0 ) | and avg ( i,j ) ( n ) = 1 n P n − 1 m 0 =0 err ( i,j ) ( m 0 ) . Then for k 6 = l , we have: cor  Y ( i ) ( k ) , Y ( j ) ( l )  = O (max { 1 /n, a vg ( i,j ) ( n ) } ) . In other wor ds Y ( i ) ( k ) and Y ( j ) ( l ) ar e asymptotic al ly unc orr elate d as n → ∞ . Pr o of. Without loss of generalit y w e assume that the time series hav e zero mean (i.e. E [ X ( i ) ( k )] = 0 , 1 ≤ i ≤ p, 0 ≤ k ≤ n − 1). W e first establish a represen tation of E [ Z ( i ) ( k ) Z ( j ) ( l ) ∗ ] for general linear functionals: Z ( i ) ( k ) = n − 1 X m 0 =0 g k ( m 0 ) X ( i ) ( m 0 ) , in whic h g k ( . ) is an arbitrary complex sequence for 0 ≤ k ≤ n − 1. W e hav e: E [ Z ( i ) ( k ) Z ( j ) ( l ) ∗ ] = E " n − 1 X m 0 =0 g k ( m 0 ) X ( i ) ( m 0 ) ! n − 1 X n 0 =0 g l ( n 0 ) X ( j ) ( n 0 ) ! ∗ # = n − 1 X m 0 =0 g k ( m 0 ) n − 1 X n 0 =0 g l ( n 0 ) ∗ E [ X ( i ) ( m 0 ) X ( j ) ( n 0 ) ∗ ] = n − 1 X m 0 =0 g k ( m 0 ) n − 1 X n 0 =0 g l ( n 0 ) ∗ c ( i,j ) m 0 n 0 (3) 6 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II No w for a T oeplitz matrix T , define the circulant matrix D T as: D T =        t (0) t ( − 1) + t ( n − 1) · · · t (1 − n ) + t (1) t (1) + t (1 − n ) t (0) · · · t (2 − n ) + t (2) . . . . . . . . . . . . t ( n − 2) + t ( − 2) t ( n − 3) + t ( − 3) · · · t ( − 1) + t ( n − 1) t ( n − 1) + t ( − 1) t ( n − 2) + t ( − 2) · · · t (0)        W e can write: C ( i,j ) = D C ( i,j ) + E ( i,j ) for some T o eplitz matrix E ( i,j ) . Th us c ( i,j ) ( m 0 − n 0 ) = d ( i,j ) ( m 0 − n 0 ) + e ( i,j ) ( m 0 − n 0 ) where d ( i,j ) ( m 0 − n 0 ) and e ( i,j ) ( m 0 − n 0 ) are the ( m 0 , n 0 ) en- tries of D C ( i,j ) and E ( i,j ) , resp ectiv ely . Therefore, (3) can b e written as: n − 1 X m 0 =0 g k ( m 0 ) n − 1 X n 0 =0 g l ( n 0 ) ∗ d ( i,j ) ( m 0 − n 0 ) + n − 1 X m 0 =0 n − 1 X n 0 =0 g k ( m 0 ) g l ( n 0 ) ∗ e ( i,j ) ( m 0 − n 0 ) The first term can b e written as: n − 1 X m 0 =0 g k ( m 0 )  g ∗ l ~ d ( i,j )  ( m 0 ) = n − 1 X m 0 =0 g k ( m 0 ) v ( i,j ) l ( m 0 ) where we hav e recognized v ( i,j ) l ( m 0 ) = g ∗ l ~ d ( i,j ) as the circular con volution of g ∗ l ( . ) and d ( i,j ) ( . ) [16]. Let G k ( . ) and D ( i,j ) ( . ) b e the the DFT of g k ( . ) and d ( i,j ) ( . ), resp ectiv ely . By Plancherel’s theorem [17] we hav e: n − 1 X m 0 =0 g k ( m 0 ) v ( i,j ) l ( m 0 ) = n − 1 X m 0 =0 g k ( m 0 )  v ( i,j ) l ( m 0 ) ∗  ∗ = n − 1 X m 0 =0 G k ( m 0 )  G l ( m 0 ) D ( i,j ) ( − m 0 ) ∗  ∗ = n − 1 X m 0 =0 G k ( m 0 ) G l ( m 0 ) ∗ D ( i,j ) ( − m 0 ) . (4) No w let g k ( m 0 ) = ω km 0 / √ n for 0 ≤ k , m 0 ≤ n − 1. F or this choice of g k ( . ) w e hav e G k ( m 0 ) = 0 for all m 0 6 = n − k and G k ( n − k ) = 1. Hence for k 6 = l the quantit y (4) b ecomes 0. Therefore using the representation E ( i,j ) = C ( i,j ) − D C ( i,j ) w e hav e: Sp ectral Correlation Hub Screening of Multiv ariate Time Series 7 | co v  Y ( i ) ( k ) , Y ( j ) ( l )  | = | E [ Y ( i ) ( k ) Y ( j ) ( l ) ∗ ] | = | n − 1 X m 0 =0 n − 1 X n 0 =0 g k ( m 0 ) g l ( n 0 ) ∗ e ( i,j ) ( m 0 − n 0 ) | ≤ 1 n n − 1 X m 0 =0 n − 1 X n 0 =0 | e ( i,j ) ( m 0 − n 0 ) | = 2 n n − 1 X m 0 =0 m 0 | c ( i,j ) ( m 0 ) | , (5) in whic h the last equation is due to the fact that | c ( i,j ) ( − m 0 ) | = | c ( i,j ) ( m 0 ) | . No w using (4) and (5) we obtain expressions for v ar  Y ( i ) ( k )  and v ar  Y ( j ) ( l )  . Letting j = i and l = k in (4) and (5) gives: v ar  Y ( i ) ( k )  = co v  Y ( i ) ( k ) , Y ( i ) ( k )  = n − 1 X m 0 =0 G k ( m 0 ) G k ( m 0 ) ∗ D ( i,i ) ( − m 0 ) + n − 1 X m 0 =0 n − 1 X n 0 =0 g k ( m 0 ) g k ( n 0 ) ∗ e ( i,i ) ( m 0 − n 0 ) = n. 1 √ n . 1 √ n D ( i,i ) ( k ) + n − 1 X m 0 =0 n − 1 X n 0 =0 g k ( m 0 ) g k ( n 0 ) ∗ e ( i,i ) ( m 0 − n 0 ) = D ( i,i ) ( k ) + n − 1 X m 0 =0 n − 1 X n 0 =0 g k ( m 0 ) g k ( n 0 ) ∗ e ( i,i ) ( m 0 − n 0 ) , (6) in whic h the magnitude of the summation term is b ounded as: | n − 1 X m 0 =0 n − 1 X n 0 =0 g k ( m 0 ) g k ( n 0 ) ∗ e ( i,i ) ( m 0 − n 0 ) | ≤ 1 n n − 1 X m 0 =0 n − 1 X n 0 =0 | e ( i,i ) ( m 0 − n 0 ) | = 2 n n − 1 X m 0 =0 m 0 | c ( i,i ) ( m 0 ) | . (7) Similarly: v ar  Y ( j ) ( l )  = D ( j,j ) ( l ) + n − 1 X m 0 =0 n − 1 X n 0 =0 g l ( m 0 ) g l ( n 0 ) ∗ e ( j,j ) ( m 0 − n 0 ) , (8) in whic h 8 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II | n − 1 X m 0 =0 n − 1 X n 0 =0 g l ( m 0 ) g l ( n 0 ) ∗ e ( j,j ) ( m 0 − n 0 ) | ≤ 2 n n − 1 X m 0 =0 m 0 | c ( j,j ) ( m 0 ) | . (9) T o complete the pro of the following lemma is needed. Lemma 1 If { a m 0 } ∞ m 0 =0 is a se quenc e of non-ne gative numb ers such that P ∞ m 0 =0 a m 0 = M < ∞ . Define err( n ) = M − P n − 1 m 0 =0 a m 0 and avg( n ) = 1 n P n − 1 m 0 =0 err( m 0 ) . Then | 1 n P n − 1 m 0 =0 m 0 a m 0 | ≤ M /n + err( n ) + a vg ( n ) . Pr o of. Let S 0 = 0 and for n ≥ 1 define S n = P n − 1 m 0 =0 a m 0 . W e hav e: n − 1 X m 0 =0 ma m 0 = ( n − 1) S n − ( S 0 + S 1 + . . . + S n − 1 ) . Therefore: 1 n n − 1 X m 0 =0 m 0 a m 0 = n − 1 n S n − 1 − 1 n n − 1 X m 0 =0 S m 0 . Since M − M /n − err( n ) ≤ n − 1 n S n − 1 ≤ M and M − a vg( n ) ≤ 1 n P n − 1 m 0 =0 S m 0 ≤ M , using the triangle inequality the result follows. u t No w let a m 0 = | c ( i,j ) ( m 0 ) | . By assumption lim n →∞ P n − 1 m 0 =0 a m 0 = M ( i,j ) < ∞ . Therefore, Lemma 1 along with (5) concludes: co v  Y ( i ) ( k ) , Y ( j ) ( l )  = O (max { 1 /n, err (i , j) ( n ) , avg (i , j) ( n ) } ) . (10) err ( i,j ) ( n ) is a decreasing decreasing function of n . Therefore avg ( i,j ) ( n ) ≥ err ( i,j ) ( n ), for n ≥ 1. Hence: co v  Y ( i ) ( k ) , Y ( j ) ( l )  = O (max { 1 /n, a vg ( i,j ) ( n ) } ) . Similarly using Lemma 1 along with (6), (7), (8) and (9) we obtain: | v ar  Y ( i ) ( k )  − D ( i,i ) ( k ) | = O (max { 1 /n, avg ( i,i ) ( n ) } ) , (11) and | v ar  Y ( j ) ( l )  − D ( j,j ) ( l ) | = O (max { 1 /n, avg ( j,j ) ( n ) } ) . (12) Using the definition Sp ectral Correlation Hub Screening of Multiv ariate Time Series 9 cor  Y ( i ) ( k ) , Y ( j ) ( l )  = co v  Y ( i ) ( k ) , Y ( j ) ( l )  q v ar  Y ( i ) ( k )  q v ar  Y ( j ) ( l )  , and the fact that as n → ∞ , D ( i,i ) ( k ) and D ( j,j ) ( l ) con verge to constan ts C ( i,i ) ( k ) and C ( j,j ) ( l ), resp ectively , equations (10), (11) and (12) conclude: cor  Y ( i ) ( k ) , Y ( j ) ( l )  = O (max { 1 /n, a vg ( i,j ) ( n ) } ) . u t As an example we apply Theorem 1 to a scalar auto-regressive (AR) pro- cess X ( k ) sp ecified by X ( k ) = L X l =1 ϕ l X ( k − l ) + ε ( k ) , in whic h ϕ l are real-v alued co efficien ts and ε ( . ) is a stationary process with no temp oral correlation. The auto-cov ariance function of an AR pro cess can b e written as [18]: c ( t ) = L X l =1 α l r | t | l , in which r 1 , . . . , r l are the ro ots of the p olynomial β ( x ) = x L − P L l =1 ϕ l x L − l . It is known that for a stationary AR process, | r l | < 1 for all 1 ≤ l ≤ L [18]. Therefore, using the definition of err( . ) w e hav e: err( n ) = ∞ X t = n | c ( t ) | = ∞ X t = n | L X l =1 α l r t l | ≤ L X l =1 | α l | ∞ X t = n | r l | t = L X l =1 | α l | | r l | n 1 − | r l | ≤ C ζ n , in whic h C = P L l =1 | α l | / (1 − | r l | ) and ζ = max 1 ≤ l ≤ L | r l | < 1. Hence: a vg ( n ) = 1 n n − 1 X m 0 =0 err( m 0 ) ≤ 1 n n − 1 X m 0 =0 C ζ m 0 ≤ C n (1 − ζ ) . Therefore, Theorem 1 concludes: cor ( Y ( k ) , Y ( l )) = O (1 /n ) , k 6 = l, where Y ( . ) represents the n -p oint DFT of the AR pro cess X ( . ). In the sequel, we assume that the time series X is multiv ariate Gaussian, i.e., X (1) , . . . , X ( p ) are jointly Gaussian pro cesses. It follows that the DFT comp onen ts Y ( i ) ( k ) are join tly (complex) Gaussian as linear functionals of X . Theorem 1 then immediately implies asymptotic indep endence of DFT com- p onen ts through a well-kno wn prop erty of jointly Gaussian random v ariables. 10 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II Corollary 1 Assume that the time series X is multivariate Gaussian. Un- der the absolute summability c onditions in The or em 1, the DFT c omp onents Y ( i ) ( k ) and Y ( j ) ( l ) ar e asymptotic al ly indep endent for k 6 = l and n → ∞ . Corollary 1 implies that for large n , correlation analysis of the time series X can b e done indep enden tly on eac h frequency in the sp ectral domain. This reduces the problem of screening for hub time series to screening for h ub v ariables among the p DFT comp onen ts at a given frequency . A pro cedure for the latter problem and a corresp onding theory are describ ed next. 3 Complex-V alued Correlation Hub Screening This section discusses complex-v alued correlation h ub screening, a generaliza- tion of real-v alued correlation screening in [11, 9], for identifying highly corre- lated comp onen ts of a complex-v alued random vector from its sample v alues. The metho d is applied to multiv ariate time series in Section 4 to discov er correlation h ubs among the sp ectral comp onen ts at each frequency . Sections 3.1 and 3.2 describ e the underlying statistical mo del and the screening pro ce- dure. Sections 3.3 and 3.4 provide background on the U-score representation of correlation matrices and asso ciated definitions and prop erties. Section 3.5 con tains the main theoretical result c haracterizing the num b er of hub dis- co veries in the high-dimensional regime, while Section 3.6 elab orates on the phenomenon of phase transitions in the n umber of discov eries. 3.1 Statistical Mo del W e use the generic notation Z = [ Z 1 , Z 2 , · · · , Z p ] T in this section to refer to a complex-v alued random vector. The mean of Z is denoted as µ and its p × p non-singular cov ariance matrix is denoted as Σ . W e assume that the vector Z follo ws a complex elliptically contoured distribution with p df f Z ( z ) = g  ( z − µ ) H Σ − 1 ( z − µ )  , in whic h g : R ≥ 0 → R > 0 is an in tegrable and strictly decreasing function [19]. This assumption generalizes the Gaussian assumption made in Section 2 as the Gaussian distribution is one example of an elliptically con toured distribution. In correlation hub screening, the quantities of in terest are the correlation matrix and partial correlation matrix asso ciated with Z . These are defined as Γ = D − 1 2 Σ ΣD − 1 2 Σ and Ω = D − 1 2 Σ − 1 Σ − 1 D − 1 2 Σ − 1 , resp ectively . Note that Γ and Ω are normalized matrices with unit diagonals. 3.2 Screening Pro cedure The goal of correlation hub screening is to identify highly correlated comp o- nen ts of the random v ector Z from its sample realizations. Assume that m samples z 1 , . . . , z m ∈ R p of Z are a v ailable. T o simplify the dev elopment of the Sp ectral Correlation Hub Screening of Multiv ariate Time Series 11 theory , the samples are assumed to be independent and identically distributed (i.i.d.) although the theory also applies to dep enden t samples. W e compute sample correlation and partial correlation matrices from the samples z 1 , . . . , z m as surrogates for the unkno wn p opulation correlation ma- trices Γ and Ω in Section 3.1. First define the p × p sample cov ariance matrix S as S = 1 m − 1 P m i =1 ( z i − z )( z i − z ) H , where z is the sample mean, the av erage of z 1 , . . . , z m . The sample correlation and sample partial correlation matri- ces are then defined as R = D − 1 2 S SD − 1 2 S and P = D − 1 2 R † R † D − 1 2 R † , resp ectiv ely , where R † is the Mo ore-P enrose pseudo-inv erse of R . Correlation hubs are screened by applying thresholds to the sample (par- tial) correlation matrix. A v ariable Z i is declared a h ub screening disco very at degree lev el δ ∈ { 1 , 2 , . . . } and threshold level ρ ∈ [0 , 1] if |{ j : j 6 = i, | ψ ij | ≥ ρ }| ≥ δ, where Ψ = R for correlation screening and Ψ = P for partial correlation screening. W e denote by N δ,ρ ∈ { 0 , . . . , p } the total num b er of hub screening disco veries at levels δ, ρ . Correlation h ub screening can also b e interpreted in terms of the (p artial) c orr elation gr aph G ρ ( Ψ ), depicted in Fig. 1 and defined as follo ws. The v ertices of G ρ ( Ψ ) are v 1 , · · · , v p whic h corresp ond to Z 1 , · · · , Z p , resp ectively . F or 1 ≤ i, j ≤ p , v i and v j are connected by an edge in G ρ ( Ψ ) if the magnitude of the sample (partial) correlation co efficient b etw een Z i and Z j is at least ρ . A v ertex of G ρ ( Ψ ) is called a δ -hub if its degree, the num b er of incident edges, is at least δ . Then the num b er of discov eries N δ,ρ defined earlier is the n umber of δ -h ubs in the graph G ρ ( Ψ ). v 3 v 2 v 1 v p v j v i Fig. 1. Complex-v alued (partial) correlation hub screening thresholds the sample correlation or partial correlation matrix, denoted generically b y the matrix Ψ , to find v ariables Z i that are highly correlated with other v ariables. This is equiv alent to finding hubs in a graph G ρ ( Ψ ) with p vertices v 1 , · · · , v p . F or 1 ≤ i, j ≤ p, v i is connected to v j in G ρ ( Ψ ) if | ψ ij | ≥ ρ . 12 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II 3.3 U-score Representation of Correlation Matrices Our theory for complex-v alued correlation screening is based on the U-score represen tation of the sample correlation and partial correlation matrices. Sim- ilarly to the real case [9], it can be shown that there exists an ( m − 1) × p complex-v alued matrix U R with unit-norm columns u ( i ) R ∈ C m − 1 suc h that the follo wing representation holds: R = U H R U R . (13) Similar to Lemma 1 in [9] it is straightforw ard to show that: R † = U H R ( U R U H R ) − 2 U R . Hence by defining U P = ( U R U H R ) − 1 U R D − 1 2 U H R ( U R U H R ) − 2 U R w e hav e the represen- tation: P = U H P U P , (14) where the ( m − 1) × p matrix U P has unit-norm columns u ( i ) P ∈ C m − 1 . 3.4 Prop erties of U-scores The U-score factorizations in (13) and (14) sho w that sample (partial) cor- relation matrices can b e represen ted in terms of unit vectors in C m − 1 . This subsection presen ts definitions and prop erties related to U-scores that will b e used in Section 3.5. W e denote the unit spheres in R m − 1 and C m − 1 as S m − 1 and T m − 1 , respec- tiv ely . The surface areas of S m − 1 and T m − 1 are denoted as a m − 1 and b m − 1 resp ectiv ely . Define the interlea ving function h : R 2 m − 2 → C m − 1 as b elo w: h ([ x 1 , x 2 , · · · , x 2 m − 2 ] T ) = [ x 1 + x 2 √ − 1 , x 3 + x 4 √ − 1 , · · · , x 2 m − 3 + x 2 m − 2 √ − 1] T . Note that h ( . ) is a one-to-one and onto function and it maps S 2 m − 2 to T m − 1 . F or a fixed v ector u ∈ T m − 1 and a threshold 0 ≤ ρ ≤ 1 define the spherical cap in T m − 1 : A ρ ( u ) = { y : y ∈ T m − 1 , | y H u | ≥ ρ } . Also define P 0 as the probability that a random p oin t Y that is uniformly distributed on T m − 1 falls into A ρ ( u ). Below we give a simple expression for P 0 as a function of ρ and m . Lemma 2 L et Y b e an ( m − 1) -dimensional c omplex-value d r andom ve ctor that is uniformly distribute d over T m − 1 . We have P 0 = P ( Y ∈ A ρ ( u )) = (1 − ρ 2 ) m − 2 . Sp ectral Correlation Hub Screening of Multiv ariate Time Series 13 Pr o of. Without loss of generality we assume u = [1 , 0 , · · · , 0] T . W e hav e: P 0 = P ( | Y 1 | ≥ ρ ) = P ( < ( Y 1 ) 2 + = ( Y 1 ) 2 ≥ ρ 2 ) . Since Y is uniform on T m − 1 , we can write Y = X / k X k 2 , in which X is complex-v alued random vector whose entries are i.i.d. complex-v alued Gaus- sian v ariables with mean 0 and v ariance 1. Thus: P 0 = P  < ( X 1 ) 2 + = ( X 2 1 )  / k X k 2 2 ≥ ρ 2  = P (1 − ρ 2 )  < ( X 1 ) 2 + = ( X 1 ) 2  ≥ ρ 2 m − 1 X k =2 < ( X k ) 2 + = ( X k ) 2 ! . Define V 1 = < ( X 1 ) 2 + = ( X 1 ) 2 and V 2 = P m − 1 k =2 < ( X k ) 2 + = ( X k ) 2 . V 1 and V 2 are indep endent and ha ve chi-squared distributions with 2 and 2( m − 2) degrees of freedom, resp ectiv ely [20]. Therefore, P 0 = Z ∞ 0 Z ∞ ρ 2 v 2 / (1 − ρ 2 ) χ 2 2 ( v 1 ) χ 2 2( m − 2) ( v 2 ) dv 1 dv 2 = Z ∞ 0 χ 2 2( m − 2) ( v 2 ) Z ∞ ρ 2 v 2 / (1 − ρ 2 ) 1 2 e − v 1 / 2 dv 1 dv 2 = Z ∞ 0 1 2 m − 2 Γ ( m − 2) v m − 3 2 e − v 2 / 2 e − ρ 2 2(1 − ρ 2 ) v 2 dv 2 = 1 Γ ( m − 2) (1 − ρ 2 ) m − 2 Z ∞ 0 x m − 3 e − x dx = 1 Γ ( m − 2) (1 − ρ 2 ) m − 2 Γ ( m − 2) = (1 − ρ 2 ) m − 2 , in whic h we hav e made a change of v ariable x = v 2 2(1 − ρ 2 ) . u t Under the assumption that the joint p df of Z exists, the p columns of the U-score matrix ha ve joint p df f U 1 ,..., U p ( u 1 , . . . , u p ) on T p m − 1 = × p i =1 T m − 1 . The following ( δ + 1)-fold av erage of the joint p df will play a significant role in Section 3.5. This ( δ + 1)-fold av erage is defined as: f U ∗ 1 ,..., U ∗ δ +1 ( u 1 , . . . , u δ +1 ) = 1 (2 π ) δ +1 p  p − 1 δ  × X 1 ≤ i 1 < ··· 1. Theorem 2 L et U = [ U 1 , . . . , U p ] b e a ( m − 1) × p r andom matrix with U i ∈ T m − 1 wher e m > 2 . L et δ ≥ 1 b e a fixe d inte ger. Assume the joint p df of any subset of the U i ’s is b ounde d and differ entiable. Then, with Λ define d in (16), Sp ectral Correlation Hub Screening of Multiv ariate Time Series 15 | E [ N δ,ρ ] − Λ | ≤ O  η δ p,δ max n η p,δ p − 1 /δ , (1 − ρ ) 1 / 2 o . (18) F urthermor e, let N ∗ δ,ρ b e a Poisson distribute d r andom variable with r ate E [ N ∗ δ,ρ ] = Λ/ϕ ( δ ) . If ( p − 1) P 0 ≤ 1 , then   P ( N δ,ρ > 0) − P ( N ∗ δ,ρ > 0)   ≤    O  η δ p,δ max n η δ p,δ ( k /p ) δ +1 , Q p,k,δ , k ∆ p,m,k,δ k 1 , p − 1 /δ , (1 − ρ ) 1 / 2 o , δ > 1 O  η p, 1 max n η p, 1 ( k /p ) 2 , k ∆ p,m,k, 1 k 1 , p − 1 , (1 − ρ ) 1 / 2 o , δ = 1 , (19) with Q p,k,δ = η p,δ  k /p 1 /δ  δ +1 and k ∆ p,m,k,δ k 1 define d in (15) . Pr o of. The pro of is similar to the pro of of prop osition 1 in [9]. First we prov e (18). Let φ i = I ( d i ≥ δ ) b e the indicator of the ev ent that d i ≥ δ , in whic h d i represen ts the degree of the vertex v i in the graph G ρ ( Ψ ). W e hav e N δ,ρ = P p i =1 φ i . With φ ij b eing the indicator of the presence of an edge in G ρ ( Ψ ) b et ween v ertices v i and v j w e hav e the relation: φ i = p − 1 X l = δ X k ∈ ˘ C i ( p − 1 ,l ) l Y j =1 φ ik j p − 1 Y q = l +1 (1 − φ ik q ) (20) where w e hav e defined the index vector k = ( k 1 , . . . , k p − 1 ) and the set ˘ C i ( p − 1 , l ) = { k : k 1 < . . . < k l , k l +1 < . . . < k p − 1 k j ∈ { 1 , . . . , p } − { i } , k j 6 = k j 0 } . The inner summation in (20) simply sums ov er the set of distinct indices not equal to i that index all  p − 1 l  differen t types of pro ducts of the form: Q l j =1 φ ik j Q p − 1 q = l +1 (1 − φ ik q ). Subtracting P k ∈ ˘ C i ( p − 1 ,δ ) Q δ j =1 φ ik j from b oth sides of (20) φ i − X k ∈ ˘ C i ( p − 1 ,δ ) δ Y j =1 φ ik j = p − 1 X l = δ +1 X k ∈ ˘ C i ( p − 1 ,l ) l Y j =1 φ ik j p − 1 Y q = l +1 (1 − φ ik q ) + X k ∈ ˘ C i ( p − 1 ,l ) p − 1 X q = δ +1 ( − 1) q − δ X k 0 δ +1 <... 0 if and only if N δ,ρ > 0. This yields: 20 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II | P ( N δ,ρ > 0) − (1 − exp( − Λ )) | ≤    P ( ˜ N δ,ρ > 0) − P ( N δ,ρ > 0)    +    P ( ˜ N δ,ρ > 0) −  1 − exp( − E [ ˜ N δ,ρ ])     +    exp( − E [ ˜ N δ,ρ ]) − exp( − Λ )    ≤ b 1 + b 2 + b 3 + O     E [ ˜ N δ,ρ ] − Λ     (30) Com bining the ab ov e inequalities on b 1 , b 2 and b 3 yields the first three terms in the argumen t of the “max” on the right side of (19). It remains to b ound the term | E [ ˜ N δ,ρ ] − Λ | . Application of the mean v alue theorem to the m ultiple integral (23) gives      E " δ Y l =1 φ ii l # − P δ 0 J  f U i 1 ,..., U i δ , U i       ≤ O  P δ 0 r  . Applying relation (27) yields     E [ ˜ N δ,ρ ] − p  p − 1 δ  P δ 0 J  f U ∗ 1 ,..., U ∗ ( δ +1)      ≤ O  p δ +1 P δ 0 r  = O  η δ p,δ r  . Com bine this with (30) to obtain the bound (19). This completes the pro of of Theorem 2. u t An immediate consequence of Theorem 2 is the following result, similar to Prop osition 2 in [9], which provides asymptotic expressions for the mean n umber of δ -hubs and the probability of the even t N δ,ρ > 0 as p go es to ∞ and ρ con verges to 1 at a prescrib ed rate. Corollary 2 L et ρ p ∈ [0 , 1] b e a se quenc e c onver ging to one as p → ∞ such that η p,δ = p 1 /δ ( p − 1)(1 − ρ 2 p ) ( m − 2) → e m,δ ∈ (0 , ∞ ) . Then lim p →∞ E [ N δ,ρ p ] = Λ ∞ = e δ m,δ /δ ! lim p →∞ J ( f U ∗ 1 ,..., U ∗ ( δ +1) ) . (31) Assume that k = o ( p 1 /δ ) and that for the we ak dep endency c o efficient k ∆ p,m,k,δ k 1 , define d via (15) , we have lim p →∞ k ∆ p,m,k,δ k 1 = 0 . Then P ( N δ,ρ p > 0) → 1 − exp( − Λ ∞ /ϕ ( δ )) . (32) Corollary 2 sho ws that in the limit p → ∞ , the n umber of detected h ubs depends on the true population correlations only through the quantit y J ( f U ∗ 1 ,..., U ∗ ( δ +1) ). In some cases J ( f U ∗ 1 ,..., U ∗ ( δ +1) ) can b e ev aluated explicitly . Similar to the argument in [9], it can b e sho wn that if the p opulation cov ari- ance matrix Σ is sparse in the sense that its non-zero off-diagonal entries can b e arranged into a k × k submatrix by reordering rows and columns, then J ( f U ∗ 1 ,..., U ∗ ( δ +1) ) = 1 + O ( k /p ) . Sp ectral Correlation Hub Screening of Multiv ariate Time Series 21 Hence, if k = o ( p ) as p → ∞ , the quantit y J ( f U ∗ 1 ,..., U ∗ ( δ +1) ) conv erges to 1. If Σ is diagonal, then J ( f U ∗ 1 ,..., U ∗ ( δ +1) ) = 1 exactly . In such cases, the quan tity Λ ∞ in Corollary 2 do es not dep end on the unknown underlying distribution of the U-scores. As a result, the exp ected num b er of δ -h ubs in G ρ ( Ψ ) and the probabilit y of discov ery of at least one δ -hub do not dep end on the underlying distribution. W e will see in Sec. 4 that this result is useful in assigning statistical significance lev els to vertices of the graph G ρ ( Ψ ). 3.6 Phase T ransitions and Critical Threshold It can be seen from Theorem 2 and Corollary 2 that the n umber of δ -h ub disco veries exhibits a phase transition in the high-dimensional regime where the num b er of v ariables p can b e very large relative to the num b er of samples m . Sp ecifically , assume that the p opulation co v ariance matrix Σ is blo c k- sparse as in Section 3.5. Then as the correlation threshold ρ is reduced, the n umber of δ -h ub disco veries abruptly increases to the maximum, p . Con v ersely as ρ increases, the num b er of discov eries quic kly approac hes zero. Similarly , the family-wise error rate (i.e. the probabilit y of disco vering at least one δ - h ub in a graph with no true hubs) exhibits a phase transition as a function of ρ . Figure 3 shows the family-wise error rate obtained via expression (32) for δ = 1 and p = 1000, as a function of ρ and the num b er of samples m . It is seen that for a fixed v alue of m there is a sharp transition in the family-wise error rate as a function of ρ . Applied Threshold ρ Number of Observations m 0 0.2 0.4 0.6 0.8 1 200 400 600 800 1000 0 0.2 0.4 0.6 0.8 1 Fig. 3. F amily-wise error rate as a function of correlation threshold ρ and num b er of samples m for p = 1000 , δ = 1. The phase transition phenomenon is clearly observ able in the plot. 22 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II The phase transition phenomenon motiv ates the definition of a critical threshold ρ c,δ as the threshold ρ satisfying the follo wing slop e condition: ∂ E [ N δ,ρ ] /∂ ρ = − p. Using (16) the solution of the ab o ve equation can b e appro ximated via the expression b elo w: ρ c,δ = q 1 − ( c m,δ ( p − 1)) − 2 δ / ( δ (2 m − 3) − 2) , (33) where c m,δ = b m − 1 δ J ( f U ∗ 1 ,..., U ∗ ( δ +1) ). The screening threshold ρ should b e c hosen greater than ρ c,δ to preven t excessiv ely large n umbers of false positives. Note that the critical threshold ρ c,δ also do es not dep end on the underlying distribution of the U-scores when the co v ariance matrix Σ is blo c k-sparse. Expression (33) is similar to the expression obtained in [9] for the critical threshold in real-v alued correlation screening. How ev er, in the complex-v alued case the co efficient c m,δ and the exp onent of the term c m,δ ( p − 1) are different from the real case. This generally results in smaller v alues of ρ c,δ for fixed m and δ . Figure 4 shows the v alue of ρ c,δ obtained via (33) as a function of m for different v alues of δ and p . The critical threshold decreases as either the sample size m increases, the num b er of v ariables p decreases, or the vertex degree δ increases. Note that ev en for ten billion (10 10 ) dimensions (upp er triplet of curves in the figure) only a relativ ely small num b er of samples are necessary for complex-v alued correlation screening to b e useful. F or example, with m = 200 one can reliably disco v er connected v ertices ( δ = 1 in the figure) ha ving correlation greater than ρ c,δ = 0 . 5. 4 Application to Sp ectral Screening of Multiv ariate Gaussian Time Series In this section, the complex-v alued correlation hub screening metho d of Sec- tion 3 is applied to stationary multiv ariate Gaussian time series. Assume that the time series X (1) , · · · , X ( p ) defined in Section 2 satisfy the condi- tions of Corollary 1. Assume also that a total of N = n × m time samples of X (1) , · · · , X ( p ) are a v ailable. W e divide the N samples in to m parts of n consecutiv e samples and we tak e the n -p oin t DFT of eac h part. Therefore, for each time series, at eac h frequency f i = ( i − 1) /n , 1 ≤ i ≤ n , m sam- ples are av ailable. This allo ws us to construct a (partial) correlation graph corresp onding to each frequency . W e denote the (partial) correlation graph corresp onding to frequency f i and correlation threshold ρ i as G f i ,ρ i . G f i ,ρ i has p vertices v 1 , v 2 , · · · , v p corresp onding to time series X (1) , X (2) , · · · , X ( p ) , resp ectiv ely . V ertices v k and v l are connected if the magnitude of the sample (partial) correlation b et ween the DFTs of X ( k ) and X ( l ) at frequency f i (i.e. Sp ectral Correlation Hub Screening of Multiv ariate Time Series 23 200 400 600 800 1000 0 0.2 0.4 0.6 0.8 1 1 1 1 1 1 2 2 2 2 3 3 3 3 Critical Threshold ρ c, δ Number of Observations (m) 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 p=10000000000 p=1000 p=10 Fig. 4. The critical threshold ρ c,δ as a function of the sample size m for δ = 1 , 2 , 3 (curv e lab els) and p = 10 , 1000 , 10 10 (b ottom to top triplets of curves). The figure sho ws that the critical threshold decreases as either m or δ increases. When the n umber of samples m is small the critical threshold is close to 1 in whic h case reliable hub discov ery is impossible. Ho wev er a relativ ely small incremen t in m is sufficien t to reduce the critical threshold significantly . F or example for p = 10 10 , only m = 200 samples are enough to bring ρ c, 1 do wn to 0 . 5. the sample (partial) correlation b etw een Y ( k ) ( i − 1) and Y ( l ) ( i − 1)) is at least ρ i . Consider a single frequency f i and the n ull hypothesis, H 0 , that the cor- relations among the time series X (1) , X (2) , · · · , X ( p ) at frequency f i are blo c k sparse in the sense of Section 3.5. As discussed in Sec. 3.5, under H 0 the ex- p ected n um b er of δ -h ubs and the probabilit y of disco very of at least one δ -h ub in graph G f i ,ρ i are not functions of the unknown underlying distribution of the data. Therefore the results of Corollary 2 ma y b e used to quan tify the sta- tistical significance of declaring vertices of G f i ,ρ i to b e δ -hubs. The statistical significance is represen ted by the p-v alue, defined in general as the probabilit y of ha ving a test statistic at least as extreme as the v alue actually observed assuming that the null hypothesis H 0 is true. In the case of correlation h ub screening, the p-v alue pv δ ( j ) assigned to vertex v j for b eing a δ -h ub is the maximal probabilit y that v j main tains degree δ given the observed sample correlations, assuming that the blo c k-sparse hypothesis H 0 is true. The de- tailed pro cedure for assigning p-v alues is similar to the pro cedure in [9] for real-v alued correlation screening and is illustrated in Fig. 5. Equation (33) helps in c ho osing the initial threshold ρ ∗ . Giv en Corollary 1, for i 6 = j the correlation graphs G f i ,ρ i and G f j ,ρ j and their asso ciated inferences are approximately indep enden t. Thus we can solve m ultiple inference problems b y first p erforming correlation hub screening on 24 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II • Initialization: 1. Cho ose a degree threshold δ ≥ 1. 2. Cho ose an initial threshold ρ ∗ > ρ c,δ . 3. Calculate the degree d j of eac h vertex of graph G ρ ∗ ( Ψ ). 4. Select a v alue of δ ∈ { 1 , · · · , max 1 ≤ j ≤ p d j } . • F or each j = 1 , · · · , p find ρ j ( δ ) as the δ th greatest elemen t of the j th ro w of the sample (partial) correlation matrix. • Approximate the p-v alue corresp onding to vertex v j as pv δ ( j ) ≈ 1 − exp( − E [ N δ,ρ j ( δ ) ] /ϕ ( δ )), where E [ N δ,ρ j ( δ ) ] is approximated by the limiting expression (31) using J ( f U ∗ 1 ,..., U ∗ ( δ +1) ) = 1. • Screen v ariables b y thresholding the p-v alues pv δ ( j ) at desired significance lev el. Fig. 5. Pro cedure for assigning p-v alues to the vertices of G ρ ∗ ( Ψ ). eac h graph as discussed ab o ve and then aggregating the inferences at eac h frequency in a straightforw ard manner. Examples of aggregation pro cedures are describ ed b elo w. 4.1 Disjunctiv e Hubs One task that can b e easily p erformed is finding the p-v alue for a given time series to b e a hub in at least one of the graphs G f 1 ,ρ 1 , · · · , G f n ,ρ n . More sp ecif- ically , for each j = 1 , . . . , p denote the p-v alues for vertex v j b eing a δ -hub in G f 1 ,ρ 1 , · · · , G f n ,ρ n b y pv f 1 ,ρ 1 ,δ ( j ) , · · · , pv f n ,ρ n ,δ ( j ) resp ectiv ely . These p-v alues are obtained using the metho d of Fig. 5. Then pv δ ( j ), the p-v alue for the ver- tex v j b eing a δ -h ub in at least one of the frequency graphs G f 1 ,ρ 1 , · · · , G f n ,ρ n can b e approximated as: P ( ∃ i : d j,f i ≥ δ |H 0 ) ≈ ˆ pv δ ( j ) = 1 − n Y i =1 (1 − pv f i ,ρ i ,δ ( j )) , in whic h d j,f i is the degree of v j in the graph G f i ,ρ i . 4.2 Conjunctiv e Hubs Another prop ert y of interest is the existence of a hub at all frequencies for a particular time series. In this case w e hav e: P ( ∀ i : d j,f i ≥ δ |H 0 ) ≈ ˇ pv δ ( j ) = n Y i =1 pv f i ,ρ i ,δ ( j ) . Sp ectral Correlation Hub Screening of Multiv ariate Time Series 25 4.3 General Persisten t Hubs The general case is the even t that at least K frequencies hav e hubs of degree at least δ at vertex v j . F or this general case we hav e: P ( ∃ i 1 , . . . , i K : d j,f i 1 ≥ δ, . . . d j,f i K ≥ δ |H 0 ) = n X k 0 = K X i 1 <... 0) ≈ 10 − 65 (using δ = 1 in the asymptotic relation (32) with Λ ∞ = e δ m,δ /δ ! as sp ecified b y (31)). Note that the v alue of the correlation threshold is set to be higher than the criti- cal threshold ρ c = 0 . 24. It can b e observ ed that p erforming complex-v alued sp ectral correlation screening at each frequency correctly discov ers the cor- relations b et ween the time series which are active around that frequency . As an example, for f = 0 . 2 the disco vered hubs (for δ = 1) are the time series X ( i ) ( k ) for i ∈ { 200 , 700 } . These time series are the ones that are activ e at 30 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II frequency f = 0 . 2. Under the null hypothesis of diagonal cov ariance matrices, the p-v alues for the discov ered hubs are of order 10 − 65 or smaller. These re- sults show that complex-v alued sp ectral correlation screening is able to resolv e the sources of correlation b et ween time series in the sp ectral domain. 100 200 300 400 500 600 700 800 900 1000 f=0.1 100 200 300 400 500 600 700 800 900 1000 f=0.2 100 200 300 400 500 600 700 800 900 1000 f=0.3 100 200 300 400 500 600 700 800 900 1000 f=0.4 0 500 1000 0 500 1000 f = 0.1 0 500 1000 0 500 1000 f = 0.2 0 500 1000 0 500 1000 f = 0.3 0 500 1000 0 500 1000 f = 0.4 Fig. 11. Spectral correlation graphs G f ,ρ for f = [0 . 1 , 0 . 2 , 0 . 3 , 0 . 4] and correlation threshold ρ = 0 . 9, which corresp onds to a false p ositiv e probability of 10 − 65 . The data used here is a set of synthetic time series obtained b y band-pass filtering of a Gaussian white noise series with the band-pass filters sho wn in Fig. 9. As can b e seen, complex correlation screening is able to extract the correlations at sp ecific frequencies. This is not directly feasible in the time domain analysis. 6 Conclusion This chapter presented a spectral method for correlation analysis of stationary m ultiv ariate Gaussian time series with a focus on iden tifying correlation h ubs. The asymptotic independence of sp ectral comp onents at different frequen- cies allows the problem to b e decomp osed into indep endent problems at each frequency , th us improving computational and statistical efficiency for high- dimensional time series. The metho d of complex-v alued correlation screening is then applied to detect hub v ariables at each frequency . Using a character- ization of the num b er of h ubs disco vered b y the metho d, thresholds for h ub screening can b e selected to a void an excessiv e num b er of false positives or negativ es, and the statistical significance of hub discov eries can b e quantified. The theory sp ecifically considers the high-dimensional case where the num b er of samples at each frequency can b e significan tly smaller than the num b er of time series. Experimental results v alidated the theory and illustrated the applicabilit y of complex-v alued correlation screening to the sp ectral domain. Sp ectral Correlation Hub Screening of Multiv ariate Time Series 31 7 Ac kno wledgment This w ork was partially supp orted by AF OSR grant F A9550-13-1-0043. References 1. V uran, M.C., Ak an, ¨ O.B., Akyildiz, I.F.: Spatio-temp oral correlation: theory and applications for wireless sensor netw orks. Computer Netw orks 45 (3), 245–259 (2004) 2. P affenroth, R., du T oit, P ., Nong, R., Scharf, L., Jay asumana, A.P ., Bandara, V.: Space-time signal pro cessing for distributed pattern detection in sensor net- w orks. Selected T opics in Signal Processing, IEEE Journal of 7 (1), 38–49 (2013) 3. F riston, K.J., Ashburner, J.T., Kiebel, S.J., Nichols, T.E., P enny , W.D.: Sta- tistical Parametric Mapping: The Analysis of F unctional Brain Images: The Analysis of F unctional Brain Images. Academic Press (2011) 4. Zhang, P ., Huang, Y., Shekhar, S., Kumar, V.: Correlation analysis of spatial time series datasets: A filter-and-refine approach. In: Adv ances in Knowledge Disco very and Data Mining, pp. 532–544. Springer (2003) 5. Tsa y , R.S.: Analysis of financial time series, vol. 543. Wiley . com (2005) 6. Stanley , M., Gerv ais-Ducouret, S., Adams, J.: Intelligen t sensor h ub b enefits for wireless sensor netw orks. In: Sensors Applications Symp osium (SAS), 2012 IEEE, pp. 1–6. IEEE (2012) 7. Li, Y., Thai, M.T., W u, W.: Wireless sensor netw orks and applications. Springer (2008) 8. Bullmore, E., Sporns, O.: Complex brain netw orks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10 (3), 186–198 (2009) 9. Hero, A., Ra jaratnam, B.: Hub discov ery in partial correlation graphs. Infor- mation Theory , IEEE T ransactions on 58 (9), 6064–6078 (2012) 10. Chen, X., Xu, M., W u, W.B., et al.: Cov ariance and precision matrix estimation for high-dimensional time series. The Annals of Statistics 41 (6), 2994–3021 (2013) 11. Hero, A., Ra jaratnam, B.: Large-scale correlation screening. Journal of the American Statistical Asso ciation 106 (496), 1540–1552 (2011) 12. Firouzi, H., Ra jaratnam, B., Hero, A.: Predictiv e correlation screening: Ap- plication to tw o-stage predictor design in high dimension. In: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AIST A TS) (2013) 13. Durrett, R.: Probability: theory and examples, vol. 3. Cam bridge universit y press (2010) 14. Grenander, U., Szeg˝ o, G.: T oeplitz forms and their applications. Univ of Cali- fornia Press (1958) 15. Gra y , R.M.: T o eplitz and circulant matrices: A review. F oundations and T rends in Comm unications and Information Theory 2 (3), 155–239 (2006). DOI 10.1561/0100000006. URL http://dx.doi.org/10.1561/0100000006 16. Opp enheim, A.V., Schafer, R.W., Buck, J.R., et al.: Discrete-time signal pro- cessing, v ol. 2. Pren tice-hall Englewoo d Cliffs (1989) 32 Hamed Firouzi, Dennis W ei and Alfred O. Hero I II 17. Con wa y , J.B.: A course in functional analysis, vol. 96. Springer (1990) 18. Hamilton, J.D.: Time series analysis, vol. 2. Princeton univ ersity press Princeton (1994) 19. Mic heas, A.C., Dey , D.K., Mardia, K.V.: Complex elliptical distributions with application to shape analysis. Journal of statistical planning and inference 136 (9), 2961–2982 (2006) 20. Simon, M.K.: Probability distributions inv olving Gaussian random v ariables: A handb ook for engineers and scientists. Springer (2007) 21. Arratia, R., Goldstein, L., Gordon, L.: Poisson appro ximation and the Chen- Stein metho d. Statistical Science 5 (4), 403–424 (1990)

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment