A new Granger causality measure for eliminating the confounding influence of latent common inputs

In this paper, we propose a new Granger causality measure which is robust against the confounding influence of latent common inputs. This measure is inspired by partial Granger causality in the literature, and its variant. Using numerical experiments…

Authors: Takashi Arai

A new Granger causality measure for eliminating the confounding   influence of latent common inputs
A new Granger causalit y measure for eliminating the confounding influence of laten t common inputs T ak ashi Arai The Center f or Data Sci enc e Educ at io n and R ese ar ch , Shiga U niv ersity, Hiko ne 522-85 22, Jap an Abstract In this pap er , w e prop ose a new Granger causalit y measure w hic h is robu st against the confound- ing influence of laten t common inputs. This measure is inspired by partial Granger causalit y in the literature, and its v arian t. Using numerical exp erimen ts w e first sh o w that the test s tatistics for detecting d irected in teractio n s b et we en time series approximat ely ob ey the F -distributions w hen there a r e no in teractions. Then, w e prop ose a practical pro cedure for inferrin g directed int eractions, whic h is based on the idea of multiple statistical test in situations where th e confound in g influ ence of laten t common inpu ts ma y exist. The results of n umerical exp eriments demonstrate that the prop osed metho d successfully eliminates the influence of laten t common inputs while the normal Granger causalit y metho d detects sp urious interac tions du e to the influ ence of the confounder. 1 I. INTR ODUCT ION The Granger c ausality test is a metho d for inferring causal interactions b etw een time series b y a statistical test. This method is based on the idea b y Wiener that if past infor- mation of one time series improv es the prediction o f another time series, there is a causal influence [1]. Granger fo rmalized this notion of causalit y in the framew or k on autoregres- siv e (AR) mo deling of m ultiv ariate time series in the econometrics literature [2]. Strictly sp eaking, Granger causality do es not mean causality in the true sense, although it should b e considered as a necessary conditio n for the tr ue causalit y . Ho w ev er, the metho d has gained p opularity in a wide range of fields, suc h as econometrics and biolog y due to its conceptual simplicit y and ease o f implemen tation [3–5]. F or example, in the field of biology , the G ranger causalit y test w as applied to electro encephalogram [6], functional magnetic res- onance imaging of Blo o d Oxygen Lev el D ep enden t signals [7 – 10], m ulti-electro de arrays of lo cal field p oten tia ls and Calcium imaging of neuronal activit y [11]. Also, more recen tly , Granger causalit y has extended to p oin t pro cess time series and applied to spik e activit y of neurons [12 – 14]. No w adays, application o f the Granger causality test is more common. How ev er, when w e apply t he Grang er causalit y test to real dat a, we must pay at t ention to confounding en vironmen tal influences. In fact, in man y experimen ta l settings, w e are only able to record a subset of all related v ariables in a system. In suc h a case, applications o f Granger causality ma y detect spurious directed in teractions due to the influence of la ten t common inputs and unobserv ed v ariables. F o r example, o ne time series may falsely app ear to cause a nother if they are b oth influenced b y a third time series or a common input but with a delay . In addition, data ana lysis in practice inv olve s t he step of mo del selection, in whic h a relev a nt set of v ariables is selected f or analysis [15 ]. Then, this step is lik ely t o exclude some relev an t v ariables, whic h can lead to the detection of apparen t causal in teractions that a re actually spurious [1 6]. Hence, controlling for latent common inpu ts and laten t v ariables is a critical issue when applying G ranger causalit y to experimen tal data. P artial G ranger causality w as dev elop ed as a metho d for eliminating the confounding influence of latent common inputs and unobserv ed v ariables [17]. According to what the authors in the literature sa y , the metho d w a s inspired by the partial correlation in stat is- tics. They insisted that partial G ranger caus ality does estimate directed in teractions more 2 robustly than conditional Granger causality aga inst the influ ence of laten t common inputs and lat ent v ariables. In the recen t literature we can find some reviews and applications of this type of Grang er causalit y [18]. Although the idea of part ia l G ranger causality is fascinating and the metho d is applied to v ar ious systems, the justification of t he metho d is arguable [19]. In the origina l pa p er, it w as insisted t ha t the sampling distribution o f the test statistics F 1 cannot b e de termined analytically , and one has to res ort to the b o otstrap tec hnique. F urthermore, the test statistics can tak e a nega t iv e v alue, whic h was reasoned as b eing subtracted b y an additional term. By contrast, in the subs equen t study in whic h partial Granger causality was extended to multiv ariate system, it was argued that t he test statistics cannot tak e a negative v alue [20]. In another literature, it w as argued that the negativ e v alue in the test statistics of partial Gra ng er causalit y is a serious fla w, and this undermines the credibilit y of the obtained results and, thus, the v a lidit y o f the approach [19]. Therefore, it remains unclear that at whic h circumstance partial G r a nger causalit y is av ail- able and not eve n in n umerical e xp erimen ts. In this pap er, mo difying the idea of partial Granger causalit y , w e prop o se a new Granger causalit y measure which is robust a gainst the confounding influence of laten t common in- puts. W e first p oin t out that the original partial Grang er causalit y is attempting to eliminate the influence of the confounder b et w een the tar g et v a r ia ble and conditioning v ariable, rather than the target v ariable and source v aria ble. Then, w e in tro duce a G ranger causality mea- sure by elim inat ing the influence of latent common inputs betw een the target v a riable and source v aria ble. W e also discuss t he w a y of parameter estimation for the mo del. In our parameter estimation pr o cedure, the test statistics cannot take a negativ e v alue. Of the influence of laten t confounder, this pa p er only considers the latent common inputs, and the laten t v ariables are not considered whic h has a w orse effect on the estimation of directed in teractions [19]. The reason fo r this is that the influence of laten t v aria bles hav e to be dealt with in the fra mew ork o f the Mo ving Av erage (MA) mo del, rather than the AR mo del. This pap er is organized as fo llows. In Sec. I I, w e review partial G ranger causalit y and in tro duce our Granger c a usality measure to illus tra te and highligh t the difference b et we en partial Granger causalit y and prop osed G ranger causality intro duced in this pap er. F ur- thermore, we discuss the distribution that t he test statistics ob eys by using the lik eliho o d function of the observ ed time series. W e also discuss the w ay of parameter estimation for t he mo del and the reason wh y the test s ta t istics can take a negative v alue in the literature. In 3 Sec. I I I, w e sho w the v alidit y of our measure using the data generated b y the mo dels, whic h are v ariants of the mo dels extensiv ely studied in the literature for num erical exp erimen ts. F urthermore, w e prop ose a practical pro cedure fo r estimating directed inte ra ctions where laten t common inputs ma y exist. Sec. IV is dev oted to conclusions. I I. PR OPOSE D MEASURE FOR GRANGER CAUSALITY In this section, w e review partia l G ranger causalit y to illustrate and highlight the differ- ence b et w een partial Gra nger causalit y and prop osed G r anger causalit y introduced in this pap er, and p oint out the issue of partial Granger causality . Then, we in tro duce our new Granger causalit y measure . A. P artial Granger c ausality Seth et al., prop osed the following idea of Granger causalit y whic h eliminates the influence of laten t confounder. In their notation, t hey considered the follo wing AR mo del for the n ull h yp othesis: X t = ∞ X i =1 a 1 i X t − i + ∞ X i =1 c 1 i Z t − i + u 1 t , (1) Z t = ∞ X i =1 b 1 i Z t − i + ∞ X i =1 d 1 i X t − i + u 2 t , (2) where u 1 t , u 2 t are noise terms whic h ha v e zero mean and in v olve the influence of la t ent com- mon inputs and latent v ariables. They incorp orated the influence into the model exp licitly b y the fo llo wing cov a riance matrix: S =   S 11 S 12 S 21 S 22   =   v ar( u 1 t ) co v ( u 1 t , u 2 t ) co v ( u 2 t , u 1 t ) v ar( u 2 t )   . (3) 4 In the same w ay , the mo del with directe d in teractions for the alt ernativ e h yp othesis is expresse d as follows : X t = ∞ X i =1 a 2 i X t − i + ∞ X i =1 b 2 i Y t − i + ∞ X i =1 c 2 i Z t − i + u 3 t , (4) Y t = ∞ X i =1 d 2 i X t − i + ∞ X i =1 e 2 i Y t − i + ∞ X i =1 f 2 i Z t − i + u 4 t , (5) Z t = ∞ X i =1 g 2 i X t − i + ∞ X i =1 h 2 i Y t − i + ∞ X i =1 k 2 i Z t − i + u 5 t , (6) where eac h no ise has a fo llo wing cov a riance matrix: Σ =      Σ 11 Σ 12 Σ 13 Σ 21 Σ 22 Σ 23 Σ 31 Σ 32 Σ 33      =      v ar( u 3 t ) co v ( u 3 t , u 4 t ) co v ( u 3 t , u 5 t ) co v ( u 4 t , u 3 t ) v ar( u 4 t ) co v ( u 4 t , u 5 t ) co v ( u 5 t , u 3 t ) cov( u 5 t , u 4 t ) v a r( u 5 t )      . (7) Then, for each mo del, they introduce the v aria nce of noise for the target v aria ble X t b y eliminating the influence of the noise for Z t as R (1) X X | Z ≡ S 11 − S 12 S − 1 22 S 21 , (8) R (2) X X | Z ≡ Σ 11 − Σ 13 Σ − 1 33 Σ 31 . (9) Since R (1) X X | Z , R (2) X X | Z are v aria nce o f no ise fo r target v ariable with the influence of lat en t common inputs and la t ent v ariables eliminated, they insisted that directed interactions under the existence of laten t common inputs and latent v ariables can b e estimated b y the analysis of v a riance of R (1) X X | Z , R (2) X X | Z . The test stat istic is giv en b y F 1 = ln R (1) X X | Z R (2) X X | Z ! . (10) It is b ecause the test statistic F 1 quan tifies the decrease of v a riance b y including in teraction terms with the influence of confounder eliminated. They called the ab ov e statistic al test as partial Granger causality since it w as ins pired b y the partial correlation in statistics . 5 B. Proposed st atistical measure In our opinion, partial Granger causality intro duced in the previous subsection has a conceptual issue. The v ariance, Eqs. (8) a nd (9), eliminates the noise correlation b et w een the target v ariable X t and the conditioning v ar ia ble Z t . That is, they eliminate the confounding en vironmen tal influence b etw een the target v ariable and the conditioning v a riable. Ho wev er, when w e are going to estimate directed interactions from the source v ariable Y t to the target v ariable X t , the influence that should b e remo ve d is that b et w een the source v ar ia ble and the tar g et v aria ble, whic h can cause a spurious in teraction from Y t to X t . Therefore, w e define here t he v aria nce b y eliminating the influence of laten t common inputs betw een the source v a r ia ble and the target v aria ble in a simplest example of the AR mo del. Then, using this v ariance, we prop ose a new Granger causality test for directed in teractio ns under the existence of latent common inputs. The mo del we supp ose consists o f tw o time series o f the f ollo wing AR mo dels x t , y t , ho w ev er, the noise structure is different from the usual AR mo del: x t = ax t − 1 + cy t − 1 + ε t , (11) y t = by t − 1 + ξ t . (12) Here, the noise term ε t and ξ t are zero mean Gaussian noises, although they hav e a differen t- time correlatio n due to the la t ent common inputs with delay . That is, the co v a r ia nce matrix of the noise is ex pressed as follo ws, Σ =   Σ xx Σ xy Σ y x Σ y y   ≡   v ar( ε t ) co v ( ε t , ξ t − 1 ) co v ( ξ t − 1 , ε t ) v ar( ξ t − 1 )   =   σ 2 x ρσ x σ y ρσ x σ y σ 2 y   , (13) where ρ is a corr elat io n co efficien t b etw een ε t and ξ t − 1 , whic h means that the lat ent common input reac hes x t with dela y compared to y t . These noises do not ha ve auto correlation, co v ( ε t , ε t − i ) = cov( ξ t , ξ t − i ) = 0 , i ∈ N . In t his pap er, we express the influence of laten t v ariables b y an auto-correlated noise. Our interes t is t he es timatio n o f in t era ctio ns from y t to x t , tha t is, whether the mo del parameter c exists or not. F or simplicit y , w e assume that the in teraction of o pp osite direction, from x t to y t , do es not exist, that is, the system of our in terest comprises asymmetric net w ork structure. 6 Here, w e note the fo llowing f a ct. In the orig inal pap er of partial Granger caus ality , t w o t yp es of noise term whic h ar e generated by the confounding en vironmen tal influence are considered. The first is generated b y laten t common inputs, which has a cross correlation b et we en each time series but do es not hav e auto correlation. In addition, only the equal- time cross-correlation w as considered. Ho we ve r, the equal-time correlated noise generated b y latent common inputs does not pro duce a spurious directed interaction, this fact will b e n umerically confirmed lat er. Laten t common inputs with dela ys could generate a spurious directed in teraction when one uses the normal Granger causalit y test. Th e influence of laten t common inputs can b e remo v ed by the me tho d prop osed in this paper. The se cond is the noise gene rat ed b y the laten t v a r iables. The influence of lat en t v ariables is e xpressed b y an auto- correlated noise, cov( ε t , ε t − i ) 6 = 0 , i ∈ N . How ev er, strictly sp eaking, an auto- correlated noise is b ey ond the scope of the AR mo del. Therefore, in this pap er, w e do not supp ose the existence of la ten t v aria bles. The influence of laten t v ariables requires Gra ng er causalit y constructed by the MA model. The study in this direction has b een w o rk ed on in the framew ork of the state space mo del [21 ]. T o construct our new measure f or Gra ng er causalit y , it is useful to reparametrize the noise parameter ( σ x , σ y , ρ ) to ( τ , σ y , η ) as Σ =   τ 2 + η 2 σ 2 y η σ 2 y η σ 2 y σ 2 y   , (14) where τ 2 = Σ xx − Σ xy Σ − 1 y y Σ y x (15) is the v ariance of the noise with the influence of laten t common inputs eliminated simi- lar to partial G ranger causality . This reparameterization is equiv alent to follow ing linear relationship b et w een ε t and ξ t : ε t = η ξ t − 1 + ω t , (16) where the noise ω t has follow ing property: ω t ∼ N (0 , τ 2 ) , (17) co v ( ω t , ξ t − 1 ) = co v ( ω t , ε t ) = 0 . (18) 7 Using the ab ov e reparameterization, the mo del, Eqs. (11) and (12), c an b e re-expres sed as x t = ax t − 1 + cy t − 1 + η ξ t − 1 + ω t , (19) y t = by t − 1 + ξ t . (20) Since t he conditional v a riance, τ 2 = Σ xx − Σ xy Σ − 1 y y Σ y x , is a v aria nce of noise for x t from whic h the correlated comp onent with the noise fo r y t w as eliminated, we prop ose a stat istical measure fo r Granger causality in detecting directed in teractions as whether the v ariance of ω t decreases significan tly by including the in teraction pa r a meter c . That is, w e set the mo dels of the null h yp othesis and the alternativ e h yp othesis as f o llo ws:      H 0 : c = 0 , H 1 : c 6 = 0 . (21) C. In terpret ation by the lik eliho o d function In this subsection, w e re-ex press the form ulation explained in the previous subsection in terms of the lik eliho o d function in order to consider the dis tr ibutio n that our test statistic ob eys. First, w e note that the v a riance of t he noise, Eq. (1 5), is f o rmally equiv alen t to that of the conditional Gaussian distribution. In fact, the probabilit y of the AR mo del, Eq. (19), can b e expressed as a conditional Gaussian distribution: p ( x t | x t − 1 , y t − 1 , y t − 2 ) = N ( µ (1) t , Σ x | y ) , (22) where µ (1) t = ax t − 1 + cy t − 1 + η ( y t − 1 − by t − 2 ) = ax t − 1 + cy t − 1 + η ξ t − 1 , (23) Σ x | y =Σ xx − Σ xy Σ − 1 y y Σ y x = τ 2 . (24) Therefore, extracting the correlated comp onen t of noise due to laten t common inputs is equiv alen t to conditioning on y t . No w, w e consider the situatio n where time series x t and y t are observ ed, x = ( x T , x T − 1 , . . . , x 2 ), y = ( y T − 1 , y T − 2 , . . . , y 2 , y 1 ). W e denote the model parameters as β = ( a, c, η ) and b . Then, 8 the o ve ra ll like liho o d of the time series except fo r initial v alues is giv en by t he follow ing join t probabilit y dis tributio n: L ( β , b | x , y ) = p ( x T , x T − 1 , . . . , x 3 , y T − 1 , . . . , y 1 | x 2 , y 1 , β , b ) . (25) Since the time series is mo deled as the AR mo del and the net work structure of the time series is assumed to b e asymmetric, the a b ov e lik eliho o d function can b e divided in to the follo wing product o f t w o conditional distributions: L ( β , b | x , y ) = p ( x T , x T − 1 , . . . , y T − 1 , y T − 2 , . . . | x 2 , y 1 , β , b ) = p ( x T | x T − 1 y T − 1 , y T − 2 | x 2 , y 1 , β , b ) p ( x T − 1 , x T − 2 , . . . , y T − 1 , y T − 2 , . . . | x 2 , y 1 , β , b ) . . . = T Y t =3 p ( x t | x t − 1 , y t − 1 , y t − 2 , β , b ) T − 1 Y t =2 p ( y t | y t − 1 , b ) ≡ L x | y ( β , b | x , y ) L y ( b | y ) . (26) Therefore, testing whether the v ariance Σ x | y decreases significan tly , b y including directed in teractions, is equiv alen t to test whether the conditiona l lik eliho o d L x | y increases signifi- can tly . D. Parameter estimation Log-lik eliho o d functions of the o bserv ed time series are giv en b y log L x | y ( β , b | x , y ) = − T − 2 2 log τ 2 − 1 2 τ 2 T X t =3 h x t − ax t − 1 − cy t − 1 − η ( y t − 1 − by t − 2 ) i 2 = − T − 2 2 log τ 2 − 1 2 τ 2 T X t =3 h x t − ax t − 1 − cy t − 1 − η ξ t − 1 i 2 ≡ log L x | y ( β | x , y , ξ ) , (27) log L y ( b | y ) = − T − 2 2 log σ 2 y − 1 2 σ 2 y T − 1 X t =2 ( y t − by t − 1 ) 2 , (28) where we hav e omitted the constant terms not related to mo del parameters, and ξ = ( ξ T − 1 , ξ T − 2 , . . . , ξ 2 ). Though our interes t is the lik eliho o d function L x | y ( β , b | x , y ), this lik e- liho o d is o v er- pa rameterized and uniden tifiable. Th us, w e cannot determine the mo del 9 parameters uniquely b y using L x | y ( β , b | x , y ) alone. Therefore, w e prop ose a parameter es- timation pro cedure where w e first estimate the para meter b b y using maximu m like liho o d estimation fo r L y ( b | y ) and then estimate the r emaining parameters in L x | y ( β , b | x , y ) by sub- stituting the previously estimated pa rameter b = ˆ b . In other w ords, w e first estimate the noise term ξ t b y using maxim um likelih o o d estimation fo r L y ( b | y ) , then we su bstitute the maxim um lik eliho o d estimates ξ t = ˆ ξ t for the explanator y v ariables in L x | y ( β | x , y , ξ ). The justification for this par a meter estimation pro cedure is giv en as follo ws. Since w e a ssume the asym metric netw ork structure, the mo del parameter in L y ( b | y ) can be estimated inde- p enden tly on x t . That is, the mo del parameter b can b e estimated b y y t alone, a nd w e can obtain the residual time series ˆ ξ t ≡ y t − ˆ by t − 1 . Inserting the residual time series ˆ ξ t in to the lik eliho o d function L x | y ( β | x , y , ξ ) reduces t he model for x t to a simple AR mo del with the explanatory v ariables ˆ ξ t added, as show n in Eq. (27). Therefore, t he maxim um lik eliho o d estimate of the inserted lik eliho o d L x | y ( β | x , y , ˆ ξ ) alw a ys increases b y including in teraction term c . P artial Gra nger causality in the literature suffers from a negative v alue of the test statis- tics. In Ref. [19], it w as p oin ted out that the reason for this ma y b e due to small co ding errors. W e infer the reason for this from t he discuss ion of the parameter estimation of the mo del. In the metho d of partial Grang er causalit y , maxim um lik eliho o d estimation for join t distribution p ( x, z ) was prop osed as a metho d of parameter estimation [20], where x and z are the ta rget v a r iable and the conditioning v ariable, resp ectiv ely . Ev en in o ur Granger causalit y measure, maxim um lik eliho o d estimation for the ov erall lik eliho o d L is a p ossible candidate for the para meter estimation. Ho w ev er, when w e use maxim um lik eliho o d esti- mation for L , L alw a ys increases by inc luding the in t era ctio n term although L x | y do es not necessarily increase. It is b ecause, the in teraction parameter included is used to incre ase L itself, but not increase L x | y . W e ha v e n umerically confirmed that only o ccasionally maximum lik eliho o d estimation f o r L ma y lead to a negativ e v alue of the test statistics. If the noise term ξ t w ere observ ed, the distribution o f test statistics for inferring directed in teractions can b e obtained analytically based on the log-lik eliho o d ra t io statistic in the framew ork of the generalized linear mo del as follows. W e denote the mo del parameters of our interes t as β 1 = ( a, c, η ), and the pa rameters of a full mo del as β max . The f ull mo del is a generalized linear mo del w ith the same probabilit y distribution and the same link function as the mo del of our in terest, exc ept that it has a maxim um n umber of parameters that can 10 b e estimated from the data. The n umber of parameters for the mo del o f our in terest and the full mo del is p 1 and T − 1, respectiv ely . Since the like liho o d function L x | y ( β | x , y , ξ ) is expresse d as a pro duct o f that of the Gaussian distribution, the deviance statistic D 1 ob eys the chi-sq uare distribution, D 1 ∼ χ 2 ( T − 1 − p 1 ), pro vided that the mo del of our in terest describes the data as muc h as the full mo del [22, 23]: D 1 ≡ 2  log L x | y ( ˆ β max | x , y , ξ ) − log L x | y ( ˆ β 1 | x , y , ξ )  = 1 τ 2 T X t =2 ( x t − ˆ µ (1) t ) 2 , (29) where ˆ µ (1) t = ˆ ax t − 1 + ˆ cy t − 1 + ˆ η ξ t − 1 , (30) and hats denote maximum lik eliho o d estimates for L x | y ( β | x , y , ξ ). On the other hand, if there is a significan t difference in describing the data b et w een mo del of our in terest and the full mo del, the deviance ob eys the non-cen tral c hi- square distribution and th us, ta k es a larger v alue than that expected b y the central c hi-square distribution. When w e w ant to test the e xistence of directed interactions, w e set a n ull hy p othesis H 0 and an alternativ e h yp othesis H 1 . Mo del parameters corresp onding to t hese hypotheses are expresse d as β 0 = ( a, η ) and β 1 = ( a, c, η ), resp ectiv ely . These mo del are nested, that is, they ha v e the same probabilit y distribution and the same link function, but the parameters of the mo del for H 0 is a sp ecial cas e of the parameters of the mo del for H 1 . The num b er of parameters for these models is p 0 and p 1 , respective ly . The n, w e consider the difference of deviance b etw een these mo dels, ∆ D ≡ D 0 − D 1 =2  log L x | y ( ˆ β 1 | x , y , ξ ) − log L x | y ( ˆ β 0 | x , y , ξ )  = 1 τ 2 T X t =2 h ( x t − ˆ µ (0) t ) 2 − ( x t − ˆ µ (1) t ) 2 i , (31) where D 0 = 1 τ 2 T X t =2 ( x t − ˆ µ (0) t ) 2 , (32) ˆ µ (0) t =ˆ ax t − 1 + ˆ η ξ t − 1 , (33) 11 and hats denote maxim um likelih o o d estimates for L x | y ( β | x , y , ξ ). The difference of deviance ob eys the central c hi-square distribution, ∆ D ∼ χ 2 ( p 1 − p 0 ), prov ided that b oth mo dels describe the data as muc h as the full mo del. On the o ther hand, if there is a significan t difference in describing the data b et w een these mo dels, the difference of deviance ob eys the non- central chi-sq uar e distribution, and tak es a larger v alue than that expected b y the cen tral chi-sq uare distribution. Here, w e are assuming that t he dispersion parameter of the mo del τ cannot b e estimated from the data . Then, the difference of deviance is prop ortional to unknow n parameter τ , i.e., the difference of deviance is expres sed as a scaled deviance. In practice, w e can remo v e t his unkno wn parameter in the test s ta t istics by dividing dev iances. First, w e assume that the mo del for the alternativ e h yp othesis describ es the data as m uch as the full mo del. That is, the deviance D 1 ob eys the cen tral chi-squ a r e distribution. Next, w e divide the difference of deviance ∆ D b y D 1 to mak e a new statistic that is indep endent of the unknown pa r ameter τ , F = ∆ D / ( p 1 − p 0 ) D 1 / ( T − 1 − p 1 ) = 1 p 1 − p 0 P T t =2 h ( x t − ˆ µ (0) t ) 2 − ( x t − ˆ µ (1) t ) 2 i 1 T − 1 − p 1 P T t =2 ( x t − ˆ µ (1) t ) 2 . (34) The statistic F ob eys the F -distribution, F ∼ F ( p 1 − p 0 , T − 1 − p 1 ), provided that b oth mo dels describ e the data as m uch as the full mo del. On the other hand, if there is a significan t difference in describing the data b etw een the n ull mo del and the alternative mo del, the statistic F ob eys the non-cen tral F - distribution a nd ta k es a larger v alue than that expected b y t he cen tral F -distribution. In summary , the existence of directed inte ra ctions can b e tested as follo ws. If the test statistic F is in the significance lev el of the F - distribution, w e judge that there is no difference in describing the data betw een these models and accept a simpler mo del H 0 . If the test statistic F take s a v alue larger than the significance le vel, w e reject H 0 and accept H 1 . The ab ov e result strictly holds true if ξ t w ere observ ed. In fact, ξ t is not observ ed and has to b e estimated from the data . Here, w e approximate ξ t b y it s maxim um lik eliho o d 12 estimates for L y ( b | y ) : L x | y ( β | x , y , ξ ) ≃ L x | y ( β | x , y , ˆ ξ ) , (35) ˆ ξ t ≡ y t − ˆ by t − 1 . (36) Then, mo del parameters β are estimated from t he appro ximated lik eliho o d L x | y ( β | x , y , ˆ ξ ). In this pap er, w e assume that the test statistic ob eys the F -distribution approximately even if w e substitute ˆ ξ t . The ab o v e result can b e directly extended to a more general mo del where the mo del o rder of the inte ra cting term and the lag of the noise correlation tak e a n y v alues. In this case, the mo del of the time series is express ed as follow s: x t = l a X i =1 a i x t − i + l c X i =1 c i x t − i + ε t = l a X i =1 a i x t − i + l c X i =1 c i x t − i + η l η ξ t − l η + ω t , (37) y t = l b X i =1 b i y t − i + ξ t , (38) W e express the mo del pa r a meters as β = ( a , c , η l η ), where a = ( a 1 , a 2 , · · · , a l a ), c = ( c 1 , c 2 , · · · , c l c ) and l η denotes the time p oint of lag fo r noise correlation and tak es an in teger v alue larger than zero. Then, t he log-lik eliho o d functions for this mo del are giv en by log L x | y ( β , b | x , y ) = − ( T − l max ) 2 log τ 2 − 1 2 τ 2 T X t = l max+1 " x t − l a X i =1 a i x t − i − l c X i =1 c i y t − i − η l η  y t − l η − l b X i =1 b i y t − l η − i  # = − ( T − l max ) 2 log τ 2 − 1 2 τ 2 T X t = l max+1 " x t − l a X i =1 a i x t − i − l c X i =1 c i y t − i − η l η ξ t − l η # ≡ log L x | y ( β | x , y , ξ ) , (39) log L y ( b | y ) = − ( T − 1 − l b ) 2 log σ 2 y − 1 2 σ 2 y T − 1 X t = l b +1 " y t − l b X i =1 b i y t − i # , (40) where l max =max( l a , l c , l b + l η ) . (41) 13 It should b e noted that since w e are dealing with the AR mo del, the noise ε t and ξ t do es not hav e auto correlation. Therefore, ε t and ξ t cannot correlate with multiple time p oints. Th us, we just consider the correlation b et w een ε t and ξ t with a single t ime p o in t η l η ξ t − l η rather than m ultiple time p oints P l η i =1 η i ξ t − i . The sp ecific expression f o r the test statistic F is giv en b y F = 1 p 1 − p 0 P T t = l max +1 h ( x t − ˆ µ (0) t ) 2 − ( x t − ˆ µ (1) t ) 2 i 1 T − l max − p 1 P T t = l max +1 ( x t − ˆ µ (1) t ) 2 , (42) where ˆ µ (1) t = l a X i =1 ˆ a i x t − i + l c X i =1 ˆ c i y t − i + ˆ η l η ξ t − l η , (43) ˆ µ (0) t = l a X i =1 ˆ a i x t − i + ˆ η l η ξ t − l η , (44) and ha t s denote maximum lik eliho o d estimates for L x | y ( β | x , y , ξ ). The statistic F ob eys the F -distribution, F ∼ F ( p 1 − p 0 , T − l max − p 1 ), pro vided that both mo dels describe the data as m uch as the full model. Again, t he mo del parameters b are calculated by maxim um lik eliho o d estimation f or L y ( b | y ) first, and the estimated v a lues for ξ t is obtained as ˆ ξ t ≡ y t − l b X i =1 b i y t − i . (45) Then, the like liho o d function L x | y ( β | x , y , ξ ) is appro ximated b y inserting ξ t = ˆ ξ t : L x | y ( β | x , y , ξ ) ≃ L x | y ( β | x , y , ˆ ξ ) . (46) The mo del pa r ameters β are estimated fro m the approx imated lik eliho o d L x | y ( β | x , y , ˆ ξ ), and w e assume that the appro ximated test statistic appro ximately ob eys the F -distribution, ev en if w e substitute ˆ ξ t . I I I. NUMERICAL EXPERIMENTS In this section, we confirm the v alidit y of our parameter estimation pro cedure describ ed in the previous section and the distribution that t he test statistic F ob eys nu merically . Then, w e pro p ose a practical pro cedure for inferring directed in teractions, whic h is based 14 on a m ultiple statistical t est for the c ase where the influe nce of latent common inputs ma y exist. In all n umerical exp erimen ts in this pap er, w e assume that the mo del order of all the self in teractions l a and l b is kn own for simplicit y . In a ddition, w e assume that the net w or k structure is asymmetric, i.e., the in teraction fr o m x t to y t is absen t. W e consider to y models wh ich ha ve b een ex tensiv ely applied in tests of Granger caus al- it y [7, 24], exc ept that w e mo dify this mo del by adding laten t comm on inputs to eac h time series. W e simulated four toy mo dels with and without latent common inputs and directed in teractions, ( a ), ( b ), ( c ) and ( d ): x t = a 1 x t − 1 + a 2 x t − 2 + c 1 y t − 1 + c 2 y t − 2 + ε t , (47) y t = b 1 y t − 1 + b 2 y t − 2 + ξ t , (48) where the noise has a follo wing co v ariance mat r ix: Σ ≡   v ar( ε t ) cov( ε t , ξ t − 1 ) co v ( ξ t − 1 , ε t ) v ar( ξ t − 1 )   =   σ 2 x ρσ x σ y ρσ x σ y σ 2 y   . (49) The parameters a 1 , a 2 , b 1 , b 2 , σ x and σ y tak e the same v alue at eac h mo del, ho w ev er, the other parameters take differen t v alues. The v alues of the mo del parameters are listed in T ABLE I. The sc hematic view of the sim ulated mo dels and the run- sequence plots of time series are show n in FIG. 1 and FIG. 2, r esp ective ly . T ABLE I. The v alues of the mo del p arameters for eac h s imulated mo d el mo del a 1 a 2 b 1 b 2 c 1 c 2 σ x σ y ρ ( a ) 0.9 − 0 . 5 0.5 − 0 . 2 0 0 1 √ 0 . 7 0 ( b ) 0.9 − 0 . 5 0.5 − 0 . 2 0 0 1 √ 0 . 7 0.4 ( c ) 0.9 − 0 . 5 0.5 − 0 . 2 0.16 − 0 . 2 1 √ 0 . 7 0 ( d ) 0.9 − 0 . 5 0.5 − 0 . 2 0.16 − 0 . 2 1 √ 0 . 7 0.4 A. Sa mpling distribution of the test statistics In this subsection, w e first confirm the influence of the cor r elated noise generated b y latent common inputs num erically . F or simplicit y , w e assume that the mo del order of directed 15 ; < D ; ǎ < E ; < F ; ǎ < G FIG. 1. Sc hematic view of sim ulated mo dels. −2 0 2 0 10 20 30 40 50 (a) x y −2 0 2 0 10 20 30 40 50 (b) x y −2 0 2 0 10 20 30 40 50 (c) x y −2 0 2 0 10 20 30 40 50 (d) x y FIG. 2. Run -sequence plot for eac h simulat ed mo del. Blac k lines with circle p oint s denote x t and orange lines with triangle p oin ts denote y t . in teractions and the lag of correlated noise are kno wn. F or t he case where these lags are unkno wn will b e treated in the next subsection. W e first confirm t ha t the latent common inputs with delays pro duce a spurious directed in teraction while the equal-time correlated 16 noise do es not, though only the laten t common inputs without dela ys w ere considered in the literature. FIG. 3 shows the effect of dela ys of latent c ommon inputs on the sampling distributions of the test statistics for normal Gra nger causality . T he Figures sho w that when la t ent common inputs reac h sim ultaneously , the test statistic is the same as that of the n ull h yp othesis F -distribution, while in t he case of laten t common inputs with dela y the test s ta tistic deviates f rom the null distribution. The purp ose of t his pap er is to construct a robust statistical measure for Grang er causalit y a gainst the laten t common inputs with dela ys. 0 2000 4000 6000 0 5 10 15 F (b) l η = 0 0 2000 4000 6000 0 5 10 15 F (b) l η = 1 FIG. 3. The distr ib utions of test statistics for normal Gr an ger causalit y under the infl uence of correlated noise (the red h istograms). Time ser ies are simulate d using the mo del ( b ) but mo d ified so that the noise may ha ve correlations at v arious time lags. Equal-time correlated noise (left) and differen t-time correlate d n oise (right). These are the r esu lts for a sample s ize of 100 with 10,0 00 trials. Th e ligh t green histograms are samp led distrib utions from the F -distrib ution for reference, and the dashed lines denote the 95 p ercent quant ile of the F -d istribution. Next, we confirm the v alidity of our statistical measure and the parameter estimation pro cedure prop osed in the previous section. FIG. 4 sho ws the test statistics for prop osed Granger causality at mo del ( b ) with v arious sample sizes. F rom the figure, the prop osed test statistics approximately ob ey the F -distribution, a nd this prop erty do es not change ev en if the sample size v aries. Th us, the prop osed metho d is robust against the influence of latent common inputs. The reason for why the test statistic do es not exactly ob ey the F -distribution is that w e hav e used the approximated lik eliho o d L x | y ( β | x , y , ˆ ξ ) rather than L x | y ( β | x , y , ξ ). The ab ov e result alone cannot den y the criticism that the prop o sed metho d is just insen- 17 0 1000 2000 3000 4000 0 2 4 6 F (b) T = 75 0 1000 2000 3000 4000 0 2 4 6 F (b) T = 300 0 1000 2000 3000 4000 0 2 4 6 F (b) T = 3000 FIG. 4. The distributions of test statistic s for p rop osed metho d (the blue histograms). Results of sim u lation mo d el ( b ) with 10,000 trials. Sample sizes are 300 (left), 1,000 (cen ter) and 3,00 0 (righ t). Th e ligh t green histograms are sampled d istributions fr om the F -distribu tion for reference, and the dashed lines denote 95 p ercent qu an tile of the F -distribution. sitiv e to directed in teractions. Therefore, w e show that the prop osed metho d has an abilit y to detect true directed in teractions. FIG. 5 sho ws the test statistics o f s imulated mo del ( d ) for v arious sample sizes. As the sample size increases , the test statistic deviates from the F -distribution, th us the method has an abilit y to de tect directed in tera ctio ns. 0 2000 4000 6000 0 4 8 12 F (d) T = 100 0 2000 4000 6000 0 4 8 12 F (d) T = 300 0 2000 4000 6000 0 4 8 12 F (d) T = 1000 FIG. 5. Th e distributions of test statistics for prop osed metho d (the blue h istograms). Result of sim ulation mo del ( d ) with 10,0 00 trials. Sample s izes are 100 (left), 300 (cen ter) and 1,000 (right). The light green histograms are sampled distributions from the F -d istribution for reference, and the d ashed lines denote 95 p ercen t qu an tile of the F -distribution. B. Practical pro cedure The assumption made in the previous subsection that the mo del order of auto- regressiv e co efficien ts and the lag of correlated noise are know n, is not realistic. In pra ctice, these lags in t r ue mo dels are unkno wn in ma ny cases. F urthermore, since the r esult of a statistical t est 18 dep ends crucially on t he n umber of parameters to b e tested, it is necess ar y to estimate these lags prop erly in order to infer the in teractions accurately . Therefore, in this sub section, w e prop ose a practical pro cedure for testing directed interactions, whic h is based on a stepw ise v ariable increasing method through multiple tests. W e first estimate mo del parameter y t b y maxim um lik eliho o d estimation for L y , and obtain the residual time series ˆ ξ t . Then, w e select the b est la g s of the noise correlation and the directed in teraction, l η and l c , b y the Ba yes ian Information criterion (BIC) separately . BIC is giv en b y BIC = − 2 log L x | y + p log ( T − l ) =( T − l ) log  1 T − l T X t = l +1 ( x t − ˆ µ t ) 2  + p log ( T − l ) , (50) where l is the maxim um nu mber of la gs to b e searc hed for, p is the num b er of mo del parameters and we hav e omitted the irrelev an t constan ts whic h do no t dep end on the mo del parameters. Here w e hav e assumed that when w e calculate BIC, the disp ersion parameter can b e estimated from the data, τ = ˆ τ = 1 T − l P T t = l +1 ( x t − ˆ µ t ) 2 . Next, using the sele cted lags whic h minimize BIC, the p -v alues for the noise correlation and the directed in teraction are calculated separately . Then, if the p -v alue, whic h is the smaller one, tak es a smaller v alue than the multiple test threshold, w e reject the n ull hy p othesis and adopt the alternativ e h yp othesis. If either alternativ e h yp othesis is adopted, w e ag a in select the la g and p erfo rm a statistical test for the other v aria ble under the adopted alternativ e hypothesis. Since this pro cedure can b e seen as multiple tests, w e apply the Bonferro ni corr ection to t he threshold of the p -v alue in order f o r the familywise error ra te to b e less than the prescrib ed threshold. W e compare the p erformance in detecting directed in teractions b etw een the normal Granger causalit y pro cedure and the prop osed pro cedure in FIG. 6. In this n umerical exper- imen t and below, w e set the maxim um n umber of lags to b e searc hed, l η and l c , b e 6. Since the driving f orce of t he AR mo del is the noise, the correlation b et w een the v ariable y t and the noise ξ t , in man y cases, is large. Then, when we add b oth y t and ξ t on explanatory v ariables when exploring the b est lags, the parameter estimation o ccasionally cannot p erform due to m ulticollinearit y . Therefore, if the m ulticollinearity o ccurs, w e stop searc hing for lag s. As in the case of the prop osed pro cedure, we select the b est num b er of lags for the directed in teraction and p erform a statistical test on it for the normal G ranger causalit y procedure. 19 W e set the threshold of the p -v alue b e 0.0 5 and the Bonf erro ni correction co efficien t to 1 / 2 in the case of the multiple test in order the f a milywise error rate to b e 0.05. F rom FIG. 6, it turns out that when the sample size is small, laten t common inputs ma y pro duce a spurious directed in teraction, ev en in the prop o sed method. It is because, since the noise is the driving force o f the AR mo del, then b oth p -v alues for in teraction y t and ξ t tak e small v alues, and a comparison of p -v a lues b etw een y t and ξ t fails. In addition, the prop osed metho d has a lesser statistical p ow er in detecting the tr ue directed in teraction than the normal G ranger causality pro cedure. How ev er, these shortcomings a re r esolve d by increasing the sample size, and the directed in teraction is detectable with high acc uracy in an y sim ulat ed mo dels. On the other hand, the normal Granger causalit y pro cedure detec ts a spurious directed inte ra ction with a high proba bility ev en in the small sample size, and inevitably detects a spu rio us directed interaction when the sample size is la r g e. In order to quan tify the shortcoming of our prop osed pro cedure in detecting the true directed in teraction, w e compare the statistical p o w er in detecting directed interactions of the simu lat ion mo del ( c ) with the normal Granger causalit y pr o cedure. FIG . 7 plots the statistical p ow er against the sample size for b oth pro cedures. The prop o sed pro cedure has alw a ys p o o r statistical p ow er than the normal Granger causalit y , i.e., the detection p erformance is conserv ativ e. How ev er, the shortcoming is resolv ed by increasing the sample size, since the statistical pow er saturates to 1 as the sample size incre a ses. 0.971 0.914 0.7 0.104 0.114 0.266 0.321 0.988 0.0 0.2 0.4 0.6 0.8 1.0 (a) (b) (c) (d) simulation model accuracy T = 75 0.977 0.942 0.936 0 0.983 0.999 0.889 1 0.0 0.2 0.4 0.6 0.8 1.0 (a) (b) (c) (d) simulation model accuracy T = 1000 FIG. 6. C omparison of accuracy in d etecting d ir ected interac tion b et w een n orm al metho d (red bars with dash ed b ord er lines) and pr op osed metho d (blue bars with solid b ord er lines). The accuracy is calculated with 10,000 trials for a sample s ize T = 75 (left) and T = 1 , 0 00 (right). 20 0.0 0.2 0.4 0.6 0.8 1.0 0 500 1000 1500 2000 sample size statistical pow er normal proposed FIG. 7. C omparison of statistica l p o wer in d etecting directed in teraction against samp le size b et ween normal metho d (red dashed line with circle p oin ts) and prop osed metho d (blue solid line with triangle p oints). This is a result of sim ulation mo del ( c ) with 10,000 trials. IV. CONCLUSIO NS The o riginal Granger causalit y test detects a spurious directed interaction when there are the confounding influence of latent common inputs a nd laten t v aria bles. In this paper, w e prop ose a new Gr anger causalit y measure whic h is robust against the influence of latent common inputs. Our metho d is inspired by partial Granger causality in the literatur e and its v ariant, in whic h the influence of t he latent common inputs b et w een the source v ariable and the target v ariables w as eliminated from the test statistics. W e prop ose a statistical measure from whic h the correlated comp onen t of the noise b et w een the source v ariable and the target v ariable w as remo ved , whic h w as caused b y the influence of laten t common inputs, while in partial G ranger causality the correlated comp onent of t he noise b et w een the tar get v a r ia ble and the conditioning v ariable w as remo v ed. Then, w e discuss the sampling distribution of our test statistics by using the lik eliho o d function of t ime s eries and show n umerically that the statistics appro ximately ob ey the F -distribution. According to what the authors in the literature say , the metho d of eliminating the correlated compo nen t of the noise b etw een the target v ariable and the conditioning v aria ble was called a partial Granger causalit y , sinc e it w as inspired b y the partial correlation in statistics. Ho we ver, in the lig ht of the discuss ion in this pap er, our metho d whic h eliminates the correlated compo nen t of the noise b et we en the 21 source v ar ia ble and the t a rget v ariable should b e said to b e a natural extension of Granger causalit y to the case where the noise has a correlation at a v arious time la g . In practice, the mo del order of in teractions a nd the lag of the correlat ed noise ar e un- kno wn. Therefore, w e prop ose a pra ctical pro cedure in detecting directed in teractions for the case where the influence of laten t common inputs may exist, which is based on a step wise v ariable increasin g metho d by m ultiple tests. The performance of our pro cedure is verifie d b y using n umerically simu lat ed time series. The normal Granger causalit y test inevitably detects a spurious directed interaction as the sample size increases, w hile the prop osed metho d do es not. The robustness of the prop o sed metho d ag ainst the influence of latent common inputs do es not change as the sample size increases , th us our metho d can remo ve the influence of the la ten t common inputs prop erly . As a shortcoming of this robustness, the ability of prop osed method in detecting the true directed interaction is w eak er than the normal Granger causalit y pro cedure. How ev er, this shortcoming can b e resolv ed adequately b y increasing the sample size, therefore the pro p osed metho d is pra ctical enough. Of the confounding environmen tal influence, this study only treats the laten t common inputs while the influence of latent v ariables w as not considered. It is b ecause, the influence of laten t v ariables app ears as a n auto-cor r elat ed noise, whic h cannot b e dealt with in the framew ork of the AR mo del strictly . Granger causalit y in a mo del with an auto-correlated noise has to b e defined in the framew ork o f the MA mo del. The study in this direction has b een w orked on in the f ramew ork of the s ta t e space mo del [2 1 ]. Our metho d assumes t ha t the net work structure of the in teraction b etw een eac h v ariable is asymmetric. Therefore, the extension to the case where the net works structure ma y b e symmetric will be in future works . In addition, it is impor tan t to sho w a theoretical justi- fication of the parameter estimation prop osed in this pap er. F urthermore, it is interes ting to extend the pro p osed metho d to p oin t pro cess mo dels, suc h as spik e activity mo del of neuron. 22 REFERENCES [1] N. Wiener, The the ory o f pr e diction in Mo dern Mathematics for the Engine er (New Y ork: McGra w-Hill, 1956) p p. 165–19 0. [2] C . W. J. Granger, Econometrica 37 , 424 (1969 ) . [3] C . Granger, Jour nal of Economic Dynamics and Control 2 , 329 (1980). [4] L. Baccala, K. Sameshima, G. Balle ster, a n d A. C . d o V alle, Applied S ignal Pr o cessing 5 (1998 ). [5] B. Gourevitc h, R. L. B. Jeannes, and G. F aucon, Biologic al Cyb ernetics 95 (2006). [6] A. B. Barrett, M. Mu rphy , M.-A. Bru n o, Q. Norhomme, M. Boly , S. Laureys, and A. K. S eth, Journal of Neuroscience 35 , 3293 (2015). [7] M. Ding, Y. Chen, and S. L. Bressler, Gr anger Causality: Basic The ory and Applic ations to Neur oscienc e. In: S c h l t e r S , W i n t e r h a l d e r N . , Ti m m e r J . , e d s . h a n d b o o k o f t i m e s e r i e s a n a l y s i s (Wienheim: Wiley , 2006) Chap. 17, pp. 437– 460, https://onlinelibrary .wiley .com/doi/pd f /10.10 0 2/9783527609 9 7 0 . c h 1 7 . [8] Y. Chen, S . L. Bressler, and M. Ding, Journal of Neuroscience Metho ds 150 , 228 (2006). [9] S . L. Bressler and A. K. S eth, NeuroImage 58 , 323 (2011). [10] A. K. Seth, A. B. Barrett, a n d L. Barnett, Jour nal of Neur oscience 35 , 3293 (2015). [11] F. De Vico F allani, M. C orazzol, J. R. Sternberg, C. Wy art, and M. C h a ve z, IEEE T ransactions on Neural Sy s tems and Rehabilitatio n En gineering 23 , 333 (201 5) . [12] H. I . Steve n s on, M. J . Reb esco, G. N. Hatsop oulos, Z. Haga, E. L. Miller, and P . K. Kording, IEEE T ransactions on Neural Sy s tems and Rehabilitatio n En gineering 17 , 203 (200 9). [13] S. Kim, D. Putrino, S. Ghosh, and N. E . Bro wn, Plos Computational Bio logy 7 , e10 01110 (2011 ). [14] F. Gerhard, T. Kisp ersky , G. J. Gutierrez, E. Marder , M. Kramer, and U. Ed en, Plos Comp utational Biology 9 , e100 3138 (2013). [15] A. Ro ebro ec k, E. F ormisano, and R. Goeb el, NeuroImage 58 , 296 (2011). [16] J. P earl, Causality: Mo dels, R e asoning, a nd Infe r enc e (Cam bridge Univ ersit y Press, 2000 ). [17] S. Guo, A. K. Seth, K. M. Kendr ick, C. Zhou, and J. F eng, Journal of Neuroscience Metho ds 172 , 79 (2008). [18] S. M. Smith, K. L. Miller, G. Salimi-Khorshidi, M. W ebster, C. F. Bec km an n , T. E. Nic hols, 23 J. D. Ramsey , and M. W. W oolric h , NeuroImage 54 , 875 (2011). [19] B. Ro elstraete and Y. Rosseel, Journal of Neuroscience Metho d s 206 , 73 (201 2) . [20] A. B. Barrett, L. Barnett, and A. K. Seth, Phys. Rev. E 81 , 04190 7 ( 2010) . [21] L. Barnett and A. K. Seth, Phys. Rev. E 91 , 04010 1 ( 2015) . [22] A. J. Dobson and A. G. Barnett, An Intr o duction to Gener alize d Line ar Mo dels, Thir d Edition (CR C Press, 2008). [23] P . Mccullagh and J . A. Nelder, Gener alize d Line ar Mo dels, Se c ond Edition (Chapman and Hall, 2008). [24] L. A. Baccal´ a and K. Samesh ima, Biological Cyb er n etics 84 , 463 (20 01) . 24

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment