Counterfactual Reasoning in Linear Structural Equation Models

Zhihong Cai
Department of Biostatistics, Graduate School of Public Health, Kyoto University
Konoe-cho, Yoshida, Sakyo-ku, Kyoto, Japan
cai@pbh.med.kyoto-u.ac.jp

Manabu Kuroki
Department of Systems Innovation, Graduate School of Engineering Science, Osaka University
1-3, Machikaneyama-cho, Toyonaka, Osaka, Japan
mkuroki@sigmath.es.osaka-u.ac.jp

Abstract

Consider the case where causal relations among variables can be described as a Gaussian linear structural equation model. This paper deals with the problem of clarifying how the variance of a response variable would have changed if a treatment variable were assigned to some value (counterfactually), given that a set of variables is observed (actually). In order to achieve this aim, we reformulate the formulas of the counterfactual distribution proposed by Balke and Pearl (1995) through both the total effects and a covariance matrix of observed variables. We further extend the framework of Balke and Pearl (1995) from point observations to interval observations, and from an unconditional plan to a conditional plan. The results of this paper enable us to clarify the properties of the counterfactual distribution and to establish an optimal plan.

1 INTRODUCTION

Causal inference with counterfactual reasoning is widely used in epidemiology, economics, and political science. An example of a counterfactual is “if I had taken aspirin, my headache would have gone now”, which implies that in the actual world I did not take aspirin, and I still have the headache now. This example compares two outcomes: the actual outcome that I have the headache now because I did not take aspirin, and the counterfactual outcome that my headache would have gone if I had taken aspirin.
Evaluation of counterfactual queries plays an important role in treatment estimation, lawsuit compensation for hazardous exposure, planning and policy analysis. In medical science, comparison of the actual outcome and the counterfactual outcome is important for estimating the treatment or exposure effect in clinical trials and epidemiological studies. In economics, in order to evaluate the merit of a policy (e.g., taxation), all the possible influences in various counterfactual worlds are compared, where each world is created by a hypothetical implementation of a policy.

Counterfactual reasoning has been studied by many researchers in epidemiology (e.g. Greenland and Robins, 1988; Robins, 2004; Robins and Greenland, 1989a, 1989b). It is also an active topic in artificial intelligence (e.g. Pearl, 1999, 2000; Tian and Pearl, 2000a, 2000b). Balke and Pearl (1994a, 1994b, 1995) presented formal notation, semantics and inference algorithms for the evaluation of counterfactual queries, and provided computational methods for the counterfactual distribution in the context of structural equation models.

In this paper, we examine the counterfactual query problem in the framework of linear structural equation models. We assume that causal knowledge is specified by linear structural equation models. Then, given that a set of variables is observed in the actual world, the aim of this paper is to examine how the mean and variance of a response variable would have changed if a treatment variable were assigned to some other value in the counterfactual world. Balke and Pearl (1994a, 1995) provided counterfactual formulas based on the given distribution of disturbances, but in general it may be difficult to know the distribution of disturbances. Therefore, in this paper, we consider using the variables in a path diagram to represent the distribution of disturbances.
Then we formulate the mean and variance of the counterfactual distribution through both the total effect of a treatment variable on a response variable and the covariance matrix of a set of observed variables. Based on this formulation, we can decide which variables need to be observed in order to examine the change of the mean and variance of a response variable in the counterfactual world. It is shown that for this purpose we need not observe all the variables in the path diagram, but only a subset of them. This formulation enables us to clarify the properties of the counterfactual distribution.

This paper first considers the case where a set of point observations is available. We use the observed data to update the distribution of disturbances. We then evaluate the counterfactual mean and variance of a response variable if a fixed intervention of the treatment variable $X = x_0$ were conducted, which is called an unconditional plan. By reformulating the formulas proposed by Balke and Pearl (1995), we represent the counterfactual distribution through both the total effect and a covariance matrix of observed variables, which makes it easier to analyze the properties of the counterfactual distribution. Second, we consider the case where a set of interval observations $r_1 \le R \le r_2$ is available. We update the distribution of disturbances by using the interval observations. Then we evaluate how the mean and variance of a response variable would have changed if a conditional plan $X = x_0 + aW$ were conducted, which means that the value of $X$ was determined by a set of observed variables $W$. Through this formula, we can select an optimal plan that minimizes the variance of the response variable.

2 LINEAR STRUCTURAL EQUATION MODEL

2.1 PATH DIAGRAM

In statistical causal analysis, a directed acyclic graph that represents cause-effect relationships is called a path diagram.
A directed graph is a pair $G = (V, E)$, where $V$ is a finite set of vertices and the set $E$ of arrows is a subset of the set $V \times V$ of ordered pairs of distinct vertices. For the graph-theoretic terminology used in this paper, refer, for example, to Kuroki and Cai (2004).

DEFINITION 1 (PATH DIAGRAM)
Suppose a directed acyclic graph $G = (V, E)$ with a set $V = \{V_1, \cdots, V_n\}$ of variables is given. The graph $G$ is called a path diagram when each child-parent family in the graph $G$ represents a linear structural equation model

$$V_i = \sum_{V_j \in \mathrm{pa}(V_i)} \alpha_{v_i v_j} V_j + \epsilon_{v_i}, \qquad i = 1, \ldots, n, \tag{1}$$

where $\mathrm{pa}(V_i)$ denotes the set of parents of $V_i$ in $G$ and $\epsilon_{v_1}, \ldots, \epsilon_{v_n}$ are assumed to be normally distributed. In addition, $\alpha_{v_i v_j}$ $(\neq 0)$ is called a path coefficient. □

For a detailed discussion of linear structural equation models, refer to Bollen (1989).

Here, we define some notation for future use. Let $\sigma_{xy \cdot z} = \mathrm{cov}(X, Y \mid Z)$ and $\sigma_{yy \cdot z} = \mathrm{var}(Y \mid Z)$. In addition, $\beta_{yx \cdot z} = \sigma_{xy \cdot z} / \sigma_{xx \cdot z}$ is the regression coefficient of $x$ in the regression model of $Y$ on $\{x\} \cup z$. When $Z$ is an empty set, $Z$ is omitted from these arguments. Similar notation is used for other parameters.

For a set $Z$ of variables not including descendants of $V_j$, if $Z$ d-separates $V_i$ from $V_j$ in the graph obtained by deleting from $G$ the arrow pointing from $V_i$ to $V_j$, then $\beta_{v_j v_i \cdot z} = \alpha_{v_j v_i}$ holds true. This criterion is called "the single door criterion" (e.g. Pearl, 2000). In addition, when $Z$ d-separates $V_i$ from $V_j$ in the graph $G$, $V_i$ is conditionally independent of $V_j$ given $Z$ in the corresponding distribution (e.g. Pearl, 1988, 2000; Spirtes et al., 1993).
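The single door criterion can be checked numerically on a small example. The sketch below is only an illustration: the path diagram (Z → X, Z → Y, X → Y), the coefficient values, the assumption of mutually independent disturbances, and the helper names `implied_cov` and `partial_cov` are all hypothetical and not taken from the paper. It derives the covariance matrix implied by equation (1) and compares $\beta_{yx \cdot z}$ with the path coefficient $\alpha_{yx}$.

```python
import numpy as np

# Hypothetical path diagram (illustrative, not from the paper):
#   Z = eps_z,  X = 0.8 Z + eps_x,  Y = 0.5 Z + 1.5 X + eps_y
# Variable order [Z, X, Y]; A[i, j] is the path coefficient of V_j on V_i.
Z, X, Y = 0, 1, 2
A = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.5, 1.5, 0.0]])
eps_cov = np.diag([1.0, 0.5, 2.0])  # assumed independent disturbances


def implied_cov(A, eps_cov):
    """Covariance matrix of V implied by V = A V + eps."""
    inv = np.linalg.inv(np.eye(A.shape[0]) - A)
    return inv @ eps_cov @ inv.T


def partial_cov(sigma, rows, cols, given):
    """cov(rows, cols | given) for a jointly Gaussian vector."""
    s_rc = sigma[np.ix_(rows, cols)]
    if not given:
        return s_rc
    s_rg = sigma[np.ix_(rows, given)]
    s_gg = sigma[np.ix_(given, given)]
    s_gc = sigma[np.ix_(given, cols)]
    return s_rc - s_rg @ np.linalg.solve(s_gg, s_gc)


sigma = implied_cov(A, eps_cov)

# Z blocks the only path left between X and Y once the arrow X -> Y is
# deleted, and Z is not a descendant of Y, so the criterion applies.
beta_yx_z = (partial_cov(sigma, [Y], [X], [Z])
             / partial_cov(sigma, [X], [X], [Z])).item()
# Without conditioning on Z the regression coefficient is confounded.
beta_yx = (partial_cov(sigma, [Y], [X], [])
           / partial_cov(sigma, [X], [X], [])).item()

print("alpha_yx  =", A[Y, X])
print("beta_yx.z =", beta_yx_z)
print("beta_yx   =", beta_yx)
```

In this toy model the adjusted coefficient recovers the path coefficient, while the unadjusted one does not, which is the behaviour the criterion describes.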
2.2 IDENTIFIABILITY CRITERIA FOR TOTAL EFFECTS

Given a path diagram $G$, we wish to evaluate total effects from the correlation parameters between variables, where a total effect $\tau_{yx}$ of $X$ on $Y$ is defined as the total sum of the products of the path coefficients on the sequence of arrows along all directed paths from $X$ to $Y$. However, in many cases, it is difficult to obtain all the correlation parameters, since there usually exist unobserved variables. Hence, it is important to recognize sufficient sets of observed variables in order to evaluate total effects. Pearl (2000), Brito (2003), Brito and Pearl (2002a, 2002b, 2002c) and Tian (2004) provided identifiability criteria for causal parameters such as total effects, where "identifiable" means that causal parameters can be estimated consistently. In this paper, we introduce the back door criterion (e.g. Pearl, 2000) and the conditional instrumental variable method (Brito and Pearl, 2002a) as graphical identifiability criteria for total effects.

DEFINITION 2 (BACK DOOR CRITERION)
Let $\{X, Y\}$ and $T$ be disjoint subsets of $V$ in a path diagram $G$. If a set $T$ of variables satisfies the following conditions relative to an ordered pair $(X, Y)$ of variables, then $T$ is said to satisfy the back door criterion relative to $(X, Y)$.
1. No vertex in $T$ is a descendant of $X$, and
2. $T$ d-separates $X$ from $Y$ in $G_{\underline{X}}$, where $G_{\underline{X}}$ is the graph obtained by deleting from the graph $G$ all arrows emerging from vertices in $X$. □

If a set $T$ of observed variables satisfies the back door criterion relative to $(X, Y)$ in a path diagram $G$, then the total effect $\tau_{yx}$ of $X$ on $Y$ is identifiable through the observation of $\{X, Y\} \cup T$, and is given by the formula $\beta_{yx \cdot t}$ (Pearl, 2000).

DEFINITION 3 (CONDITIONAL INSTRUMENTAL VARIABLE (IV))
Let $\{X, Y, Z\}$ and $T$ be disjoint subsets of $V$ in a path diagram $G$.
If a set $T \cup \{Z\}$ of variables satisfies the following conditions relative to an ordered pair $(X, Y)$ of variables, then $Z$ is said to be a conditional instrumental variable (IV) given $T$ relative to $(X, Y)$.
1. $T$ is a subset of nondescendants of $Y$ in $G$,
2. $T$ d-separates $Z$ from $Y$ but not from $X$ in $G_{\underline{X}}$. □

When $Z$ is a conditional instrumental variable given $T$ relative to $(X, Y)$, the total effect of $X$ on $Y$ is identifiable through the observation of $\{X, Y, Z\} \cup T$, and is given by $\sigma_{yz \cdot t} / \sigma_{xz \cdot t}$ (Brito and Pearl, 2002a).

For a discussion of the selection of identifiability criteria, refer to Kuroki and Cai (2004).

3 COUNTERFACTUAL ANALYSIS

3.1 REFORMULATION OF BALKE AND PEARL (1995)

In this section, we follow the counterfactual reasoning procedure proposed by Balke and Pearl (1995), and reformulate their formulas by representing the counterfactual mean and variance through path coefficients and a covariance matrix of observed variables.

For this purpose, we partition the set $V$ of vertices in a path diagram $G$ into the following three disjoint sets:

$S = \{Y\} \cup U$: a set of descendants of $X$ whose first component is a response variable $Y$ of interest ($Y \notin U$),
$X$: a treatment variable,
$T = Z \cup W = V \setminus (\{X\} \cup S)$: a set of nondescendants of $X$ ($W \cap Z = \emptyset$).

Let $n_s$ denote the number of elements in $S$; similar notation is used for the other sets. According to the above partition of $V$, let $A_{st}$ be the path coefficient matrix of $T$ on $S$ whose $(i, j)$ component is the path coefficient of $T_j$ on $S_i$ ($S_i \in S$, $T_j \in T$). Let $0_{xs}$ be an $(n_x, n_s)$ zero matrix and $I_{ss}$ an $n_s$-dimensional identity matrix, respectively. Similar notation is used for other matrices. Then, equation (1) can be rewritten as follows:

$$\begin{pmatrix} S \\ X \\ T \end{pmatrix} = \begin{pmatrix} A_{ss} & A_{sx} & A_{st} \\ 0_{xs} & 0 & A_{xt} \\ 0_{ts} & 0_{tx} & A_{tt} \end{pmatrix} \begin{pmatrix} S \\ X \\ T \end{pmatrix} + \begin{pmatrix} \epsilon_s \\ \epsilon_x \\ \epsilon_t \end{pmatrix}, \tag{2}$$

where $\epsilon_s$, $\epsilon_x$ and $\epsilon_t$ are random disturbance vectors corresponding to $S$, $X$ and $T$, respectively.
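In addition,

$$A_{sx} = \begin{pmatrix} A_{yx} \\ A_{ux} \end{pmatrix}, \quad A_{tt} = \begin{pmatrix} A_{zz} & A_{zw} \\ A_{wz} & A_{ww} \end{pmatrix}, \quad A_{st} = (A_{sz}, A_{sw}) = \begin{pmatrix} A_{yz} & A_{yw} \\ A_{uz} & A_{uw} \end{pmatrix},$$

and $A_{xt} = (A_{xz}, A_{xw})$.

Here, we define some notation for future use. For sets $X$, $Y$ and $Z$, let $B_{yx \cdot z}$ be the regression coefficient matrix of $x$ in the regression model of $Y$ on $x \cup z$, and let $\Sigma_{xy \cdot z}$ be the conditional covariance matrix between $X$ and $Y$ given $Z$. In addition, let $\Sigma_{xx \cdot z}$ be the conditional covariance matrix of $X$ given $Z$. Then, $B_{yx \cdot z}$ can be evaluated by $\Sigma_{yx \cdot z} \Sigma_{xx \cdot z}^{-1}$. Furthermore, let $\boldsymbol{\mu}_{y \cdot z}$ be the conditional mean vector of $Y$ given $Z$. In particular, when $Y$ consists of one variable $Y$, the conditional mean of $Y$ given $Z$ is denoted by $\mu_{y \cdot z}$. When $Z$ is an empty set, $Z$ is omitted from these arguments. Similar notation is used for other matrices and parameters.

3.1.1 INTERVENTION

First, we evaluate the mean and variance of the response variable $Y$ when an external intervention $X = x_0$ is conducted. We use the variables in the path diagram and their path coefficients to represent the distribution of disturbances. From equation (2), the mean and the variance of $\epsilon_v$ can be provided as

$$\begin{pmatrix} \mu_{\epsilon_s} \\ \mu_{\epsilon_x} \\ \mu_{\epsilon_t} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix} \begin{pmatrix} \mu_s \\ \mu_x \\ \mu_t \end{pmatrix}$$

and

$$\begin{pmatrix} \Sigma_{\epsilon_s \epsilon_s} & \Sigma_{\epsilon_s \epsilon_x} & \Sigma_{\epsilon_s \epsilon_t} \\ \Sigma_{\epsilon_x \epsilon_s} & \sigma_{\epsilon_x \epsilon_x} & \Sigma_{\epsilon_x \epsilon_t} \\ \Sigma_{\epsilon_t \epsilon_s} & \Sigma_{\epsilon_t \epsilon_x} & \Sigma_{\epsilon_t \epsilon_t} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix} \begin{pmatrix} \Sigma_{ss} & \Sigma_{sx} & \Sigma_{st} \\ \Sigma_{xs} & \sigma_{xx} & \Sigma_{xt} \\ \Sigma_{ts} & \Sigma_{tx} & \Sigma_{tt} \end{pmatrix} \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix}' \tag{3}$$

respectively. Here, let $S^*$ and $T^*$ represent the set of descendants of $X$ and the set of nondescendants of $X$ after conducting an external intervention $X = x_0$ (similar notation is used in the other discussions). Then, the modified structural equation model can be provided as

$$\begin{pmatrix} S^* \\ T^* \end{pmatrix} = \begin{pmatrix} A_{ss} & A_{st} \\ 0_{ts} & A_{tt} \end{pmatrix} \begin{pmatrix} S^* \\ T^* \end{pmatrix} + \begin{pmatrix} A_{sx} \\ 0_{tx} \end{pmatrix} x_0 + \begin{pmatrix} \epsilon_s \\ \epsilon_t \end{pmatrix}. \tag{4}$$

Let $\mu_{s^*}$ and $\mu_{t^*}$ be the mean vectors of $S^*$ and $T^*$, respectively. In addition, let $\Sigma_{s^* s^*}$, $\Sigma_{s^* t^*}$ and $\Sigma_{t^* t^*}$ be the covariance matrix of $S^*$, the covariance matrix between $S^*$ and $T^*$, and the covariance matrix of $T^*$, respectively. Similar notation is used for other parameters. Then, the mean vector and the covariance matrix of equation (4) are

$$\begin{pmatrix} \mu_{s^*} \\ \mu_{t^*} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{st} \\ 0_{ts} & I_{tt}-A_{tt} \end{pmatrix}^{-1} \left( \begin{pmatrix} A_{sx} \\ 0_{tx} \end{pmatrix} x_0 + \begin{pmatrix} \mu_{\epsilon_s} \\ \mu_{\epsilon_t} \end{pmatrix} \right) \tag{5}$$

and

$$\begin{pmatrix} \Sigma_{s^* s^*} & \Sigma_{s^* t^*} \\ \Sigma_{t^* s^*} & \Sigma_{t^* t^*} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{st} \\ 0_{ts} & I_{tt}-A_{tt} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{\epsilon_s \epsilon_s} & \Sigma_{\epsilon_s \epsilon_t} \\ \Sigma_{\epsilon_t \epsilon_s} & \Sigma_{\epsilon_t \epsilon_t} \end{pmatrix} \left( \begin{pmatrix} I_{ss}-A_{ss} & -A_{st} \\ 0_{ts} & I_{tt}-A_{tt} \end{pmatrix}^{-1} \right)', \tag{6}$$

respectively. Here, since

$$\begin{pmatrix} I_{ss}-A_{ss} & -A_{st} \\ 0_{ts} & I_{tt}-A_{tt} \end{pmatrix}^{-1} \begin{pmatrix} I_{ss}-A_{ss} & -A_{st} & -A_{sx} \\ 0_{ts} & I_{tt}-A_{tt} & 0_{tx} \end{pmatrix} = \begin{pmatrix} I_{ss} & 0_{st} & -(I_{ss}-A_{ss})^{-1} A_{sx} \\ 0_{ts} & I_{tt} & 0_{tx} \end{pmatrix}, \tag{7}$$

by substituting equation (3) into equations (5) and (6), we can obtain

$$\mu_{s^*} = \mu_s + \tau_{sx}(x_0 - \mu_x) \tag{8}$$

$$\begin{aligned} \Sigma_{s^* s^*} &= \Sigma_{ss} - \Sigma_{sx}\tau_{sx}' - \tau_{sx}\Sigma_{xs} + \tau_{sx}\tau_{sx}'\sigma_{xx} \\ &= \Sigma_{ss \cdot x} + (\tau_{sx} - B_{sx})(\tau_{sx} - B_{sx})'\sigma_{xx}, \end{aligned} \tag{9}$$

where $\tau_{sx} = (I_{ss} - A_{ss})^{-1} A_{sx}$. Noting that the first component of $S$ is the response variable $Y$, the mean and the variance of $Y$ when an external intervention is conducted are provided as

$$\mu_{y^*} = \mu_y + \tau_{yx}(x_0 - \mu_x), \tag{10}$$

and

$$\sigma_{y^* y^*} = \sigma_{yy \cdot x} + (\tau_{yx} - \beta_{yx})^2 \sigma_{xx} \tag{11}$$

respectively. Equations (10) and (11) depend on the total effect $\tau_{yx}$, the means $\mu_x$ and $\mu_y$ and the variances $\sigma_{xx}$ and $\sigma_{yy}$ of $X$ and $Y$, and the covariance $\sigma_{xy}$ between $X$ and $Y$. Thus, the graphical criteria for identifying total effects stated in section 2.2 (the back door criterion and the conditional IV method) can be used to identify the mean and the variance of $Y$ when conducting an external intervention.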
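Equations (10) and (11) can be compared with a direct evaluation of the modified model in equation (4). The following is a minimal sketch under stated assumptions: the three-variable model, its coefficients, the disturbance moments, and the helper name `sem_moments` are hypothetical choices for illustration, not part of the paper.

```python
import numpy as np

# Hypothetical model (illustrative values, not from the paper):
#   Z = eps_z,  X = 0.8 Z + eps_x,  Y = 0.5 Z + 1.5 X + eps_y
# Variable order [Z, X, Y]; A[i, j] is the path coefficient of V_j on V_i.
Z, X, Y = 0, 1, 2
A = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.5, 1.5, 0.0]])
eps_mean = np.array([1.0, 0.0, 0.0])  # assumed disturbance means
eps_cov = np.diag([1.0, 0.5, 2.0])    # assumed independent disturbances


def sem_moments(A, eps_mean, eps_cov):
    """Mean vector and covariance matrix implied by V = A V + eps."""
    inv = np.linalg.inv(np.eye(A.shape[0]) - A)
    return inv @ eps_mean, inv @ eps_cov @ inv.T


mu, sigma = sem_moments(A, eps_mean, eps_cov)
x0 = 2.0
tau_yx = A[Y, X]  # single directed path X -> Y, so the total effect is 1.5

# Equations (10) and (11): total effect plus observed moments of (X, Y).
beta_yx = sigma[Y, X] / sigma[X, X]
var_y_given_x = sigma[Y, Y] - sigma[Y, X] ** 2 / sigma[X, X]
mean_formula = mu[Y] + tau_yx * (x0 - mu[X])
var_formula = var_y_given_x + (tau_yx - beta_yx) ** 2 * sigma[X, X]

# Direct check through the modified model of equation (4):
# cut the arrows into X and replace its equation by the constant x0.
A_do = A.copy()
A_do[X, :] = 0.0
eps_mean_do = eps_mean.copy()
eps_mean_do[X] = x0
eps_cov_do = eps_cov.copy()
eps_cov_do[X, :] = 0.0
eps_cov_do[:, X] = 0.0
mu_do, sigma_do = sem_moments(A_do, eps_mean_do, eps_cov_do)

print("mean:     formula", mean_formula, " direct", mu_do[Y])
print("variance: formula", var_formula, " direct", sigma_do[Y, Y])
```

The formula side uses only $\tau_{yx}$ and the moments of $(X, Y)$, whereas the direct side needs the full coefficient matrix; their agreement in this toy case illustrates why identifying $\tau_{yx}$ is sufficient.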
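3.1.2 INTERVENTION CONDITIONING ON OBSERVATIONS

Next, we evaluate the mean and variance of the response variable $Y$ if an external intervention were conducted in the counterfactual world, given that the actual point observations $R = r$ are observed. From equation (2), the conditional mean vector and the conditional covariance matrix of $\epsilon_v$ given $R = r$ are

$$\begin{pmatrix} \mu_{\epsilon_s \cdot r} \\ \mu_{\epsilon_x \cdot r} \\ \mu_{\epsilon_t \cdot r} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix} \begin{pmatrix} \mu_{s \cdot r} \\ \mu_{x \cdot r} \\ \mu_{t \cdot r} \end{pmatrix}$$

and

$$\begin{pmatrix} \Sigma_{\epsilon_s \epsilon_s \cdot r} & \Sigma_{\epsilon_s \epsilon_x \cdot r} & \Sigma_{\epsilon_s \epsilon_t \cdot r} \\ \Sigma_{\epsilon_x \epsilon_s \cdot r} & \sigma_{\epsilon_x \epsilon_x \cdot r} & \Sigma_{\epsilon_x \epsilon_t \cdot r} \\ \Sigma_{\epsilon_t \epsilon_s \cdot r} & \Sigma_{\epsilon_t \epsilon_x \cdot r} & \Sigma_{\epsilon_t \epsilon_t \cdot r} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix} \begin{pmatrix} \Sigma_{ss \cdot r} & \Sigma_{sx \cdot r} & \Sigma_{st \cdot r} \\ \Sigma_{xs \cdot r} & \sigma_{xx \cdot r} & \Sigma_{xt \cdot r} \\ \Sigma_{ts \cdot r} & \Sigma_{tx \cdot r} & \Sigma_{tt \cdot r} \end{pmatrix} \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix}'.$$

Let $(\epsilon_{s \cdot r}, \epsilon_{z \cdot r}, \epsilon_{w \cdot r})$ be the updated disturbances with the mean vector and covariance matrix above. The modified structural equation model if an external intervention $X = x_0$ were conducted in the counterfactual world is

$$\begin{pmatrix} S^* \\ T^* \end{pmatrix} = \begin{pmatrix} A_{ss} & A_{st} \\ 0_{ts} & A_{tt} \end{pmatrix} \begin{pmatrix} S^* \\ T^* \end{pmatrix} + \begin{pmatrix} A_{sx} \\ 0_{tx} \end{pmatrix} x_0 + \begin{pmatrix} \epsilon_{s \cdot r} \\ \epsilon_{t \cdot r} \end{pmatrix},$$

so the mean vector and the covariance matrix are

$$\begin{pmatrix} \mu_{s^*} \\ \mu_{t^*} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{st} \\ 0_{ts} & I_{tt}-A_{tt} \end{pmatrix}^{-1} \left( \begin{pmatrix} A_{sx} \\ 0_{tx} \end{pmatrix} x_0 + \begin{pmatrix} \mu_{\epsilon_s \cdot r} \\ \mu_{\epsilon_t \cdot r} \end{pmatrix} \right)$$

and

$$\begin{pmatrix} \Sigma_{s^* s^*} & \Sigma_{s^* t^*} \\ \Sigma_{t^* s^*} & \Sigma_{t^* t^*} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{st} \\ 0_{ts} & I_{tt}-A_{tt} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{\epsilon_s \epsilon_s \cdot r} & \Sigma_{\epsilon_s \epsilon_t \cdot r} \\ \Sigma_{\epsilon_t \epsilon_s \cdot r} & \Sigma_{\epsilon_t \epsilon_t \cdot r} \end{pmatrix} \left( \begin{pmatrix} I_{ss}-A_{ss} & -A_{st} \\ 0_{ts} & I_{tt}-A_{tt} \end{pmatrix}^{-1} \right)'$$

respectively. From equation (7), we can obtain

$$\mu_{s^*} = \mu_{s \cdot r} + \tau_{sx}(x_0 - \mu_{x \cdot r}),$$

$$\begin{aligned} \Sigma_{s^* s^*} &= \Sigma_{ss \cdot r} - \Sigma_{sx \cdot r}\tau_{sx}' - \tau_{sx}\Sigma_{xs \cdot r} + \tau_{sx}\tau_{sx}'\sigma_{xx \cdot r} \\ &= \Sigma_{ss \cdot xr} + (\tau_{sx} - B_{sx \cdot r})(\tau_{sx} - B_{sx \cdot r})'\sigma_{xx \cdot r} \\ &= \Sigma_{ss \cdot x} + (\tau_{sx} - B_{sx})(\tau_{sx} - B_{sx})'\sigma_{xx} - (B_{sr} - \tau_{sx}B_{xr})\Sigma_{rr}(B_{sr} - \tau_{sx}B_{xr})'. \end{aligned}$$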
Thus, given $R = r$, the mean and variance of $Y$ if an external intervention $X = x_0$ were conducted are evaluated as

$$\mu_{y^*} = \mu_{y \cdot r} + \tau_{yx}(x_0 - \mu_{x \cdot r}), \tag{12}$$

$$\sigma_{y^* y^*} = \sigma_{yy \cdot x} + (\tau_{yx} - \beta_{yx})^2 \sigma_{xx} - (B_{yr} - \tau_{yx}B_{xr})\Sigma_{rr}(B_{yr} - \tau_{yx}B_{xr})', \tag{13}$$

respectively. The last term in equation (13) is the correlation between $R$ and $Y$ excluding the correlation between $R$ and $Y$ via $X$.

3.2 PROPERTIES

Based on the reformulations in section 3.1, we can derive the following properties:

(I) It can be seen from equations (10), (11), (12) and (13) that the identifiability condition for the counterfactual mean and variance is the same as that for the total effect $\tau_{yx}$ of $X$ on $Y$ when both $X$ and $Y$ are observed. That is, since the only causal parameter on which these equations depend is the total effect $\tau_{yx}$, the graphical criteria for identifying total effects stated in section 2.2 (the back door criterion and the conditional IV method) can also be used to identify the counterfactual mean and variance.

(II) Regarding equations (11) and (13), the first term is the conditional variance of $Y$ given $X$, and the second term is the square of the spurious correlation between $X$ and $Y$. These two terms do not depend on the selection of $R$. On the other hand, the last term in equation (13) depends on $R$.

(III) When there is no confounder, since $(\tau_{yx} - \beta_{yx})^2 \sigma_{xx} = 0$ holds true, the variances in equations (11) and (13) are no larger than both the actual variance of $Y$ and the conditional variance of $Y$ given $X$.

(IV) When $R$ satisfies the back door criterion relative to $(X, Y)$, we can obtain

$$\begin{aligned} \sigma_{y^* y^*} &= \sigma_{yy \cdot x} + B_{yr \cdot x}B_{rx}\sigma_{xx}B_{rx}'B_{yr \cdot x}' - B_{yr \cdot x}\Sigma_{rr}B_{yr \cdot x}' \\ &= \sigma_{yy \cdot x} - B_{yr \cdot x}\Sigma_{rr \cdot x}B_{yr \cdot x}' \\ &= \sigma_{yy \cdot xr} \le \sigma_{yy \cdot x} \le \sigma_{yy} \end{aligned}$$

from Lemma 1 in Kuroki and Cai (2004).
That is, when $R$ satisfies the back door criterion relative to $(X, Y)$, the counterfactual variance of $Y$ is never larger than either the actual variance of $Y$ or the conditional variance of $Y$ given $X$.

(V) In the case where the total effect of $X$ on $Y$ and the spurious correlation between $X$ and $Y$ have different signs, the counterfactual variance of $Y$ may be larger than that of $Y$ in the actual world. For example, letting $R$ be an empty set, when $\tau_{yx} \neq 0$ but $\beta_{yx} = 0$ holds true in equation (11),

$$\sigma_{y^* y^*} = \sigma_{yy \cdot x} + (\tau_{yx} - \beta_{yx})^2 \sigma_{xx} = \sigma_{yy} + \tau_{yx}^2 \sigma_{xx} \ge \sigma_{yy}.$$

This situation occurs in the case where the correlation between $X$ and $Y$ is small although the spurious correlation is large, which is often called parametric cancellation (refer to Cox and Wermuth, 1996).

3.3 EXTENSION OF BALKE AND PEARL (1995)

Given that a set of point observations $R = r$ is observed, Balke and Pearl (1995) evaluated the counterfactual mean and variance of a response variable $Y$ if a fixed intervention of a treatment variable $X = x_0$ were conducted, which is called an unconditional plan in this paper. In this section, we extend their framework in two aspects: from an unconditional plan to a conditional plan, and from point observations to interval observations.

Suppose that a set of interval observations $r_1 \le R \le r_2$ is observed in the actual world. Here, $r_1 \le R \le r_2$ indicates that $r_{1i} \le R_i \le r_{2i}$ holds true for any $r_{1i} \in r_1$, $R_i \in R$ and $r_{2i} \in r_2$. In addition, $R$ can include a treatment variable $X$ and/or a response variable $Y$. We then suppose that a conditional plan were conducted in the counterfactual world, which means that the value of $X$ is set according to the following function, where $W$ is a set of observed variables that are nondescendants of $X$:

$$X = x_0 + aW, \tag{14}$$

where $x_0$ and $a$ are a constant value and a constant vector, respectively.
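When $a$ is a non-zero vector, equation (14) is called a conditional plan; otherwise it is called an unconditional plan (e.g. Pearl, 2000).

In this section, in order to extend the results of Balke and Pearl (1995), we define the following notation. Let $\sigma_{xy \cdot [z]} = \mathrm{cov}(X, Y \mid z_1 \le Z \le z_2)$, $\sigma_{yy \cdot [z]} = \mathrm{var}(Y \mid z_1 \le Z \le z_2)$ and $\mu_{y \cdot [z]} = E(Y \mid z_1 \le Z \le z_2)$. For sets $X$, $Y$ and $Z$, let $\boldsymbol{\mu}_{y \cdot [z]}$, $\Sigma_{xy \cdot [z]}$ and $\Sigma_{yy \cdot [z]}$ be the conditional mean vector of $Y$ given $z_1 \le Z \le z_2$, the conditional covariance matrix between $X$ and $Y$ given $z_1 \le Z \le z_2$, and the conditional covariance matrix of $Y$ given $z_1 \le Z \le z_2$, respectively. When $Z$ is an empty set, $Z$ is omitted from these arguments. Similar notation is used for other matrices and parameters.

First, we update the distribution of disturbances by using the set of interval observations $r_1 \le R \le r_2$. The mean vector and covariance matrix are

$$\begin{pmatrix} \mu_{\epsilon_s \cdot [r]} \\ \mu_{\epsilon_x \cdot [r]} \\ \mu_{\epsilon_t \cdot [r]} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix} \begin{pmatrix} \mu_{s \cdot [r]} \\ \mu_{x \cdot [r]} \\ \mu_{t \cdot [r]} \end{pmatrix} \tag{15}$$

and

$$\begin{pmatrix} \Sigma_{\epsilon_s \epsilon_s \cdot [r]} & \Sigma_{\epsilon_s \epsilon_x \cdot [r]} & \Sigma_{\epsilon_s \epsilon_t \cdot [r]} \\ \Sigma_{\epsilon_x \epsilon_s \cdot [r]} & \sigma_{\epsilon_x \epsilon_x \cdot [r]} & \Sigma_{\epsilon_x \epsilon_t \cdot [r]} \\ \Sigma_{\epsilon_t \epsilon_s \cdot [r]} & \Sigma_{\epsilon_t \epsilon_x \cdot [r]} & \Sigma_{\epsilon_t \epsilon_t \cdot [r]} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix} \begin{pmatrix} \Sigma_{ss \cdot [r]} & \Sigma_{sx \cdot [r]} & \Sigma_{st \cdot [r]} \\ \Sigma_{xs \cdot [r]} & \sigma_{xx \cdot [r]} & \Sigma_{xt \cdot [r]} \\ \Sigma_{ts \cdot [r]} & \Sigma_{tx \cdot [r]} & \Sigma_{tt \cdot [r]} \end{pmatrix} \begin{pmatrix} I_{ss}-A_{ss} & -A_{sx} & -A_{st} \\ 0_{xs} & 1 & -A_{xt} \\ 0_{ts} & 0_{tx} & I_{tt}-A_{tt} \end{pmatrix}', \tag{16}$$

respectively. Here,

$$\Sigma_{tt \cdot [r]} = \begin{pmatrix} \Sigma_{zz \cdot [r]} & \Sigma_{zw \cdot [r]} \\ \Sigma_{wz \cdot [r]} & \Sigma_{ww \cdot [r]} \end{pmatrix}, \quad \Sigma_{ss \cdot [r]} = \begin{pmatrix} \sigma_{yy \cdot [r]} & \Sigma_{yu \cdot [r]} \\ \Sigma_{uy \cdot [r]} & \Sigma_{uu \cdot [r]} \end{pmatrix}, \quad \Sigma_{st \cdot [r]} = \begin{pmatrix} \sigma_{yz \cdot [r]} & \Sigma_{yw \cdot [r]} \\ \Sigma_{uz \cdot [r]} & \Sigma_{uw \cdot [r]} \end{pmatrix}.$$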
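The interval-conditioned moments entering equations (15) and (16) are not given in closed form in the text. As one possible way to obtain them, the sketch below treats the special case of a single interval-observed variable $R$ in a jointly Gaussian vector, using the standard moments of a truncated normal distribution. The function names, the example covariance matrix and the restriction to scalar $R$ are assumptions made here for illustration; for a vector $R$ the truncated moments would generally require numerical integration or simulation.

```python
import math
import numpy as np


def _pdf(t):
    """Standard normal density, with the limit 0 at +/- infinity."""
    return 0.0 if math.isinf(t) else math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def truncated_moments(mean, var, lo, hi):
    """Mean and variance of N(mean, var) restricted to lo <= R <= hi."""
    sd = math.sqrt(var)
    alpha, beta = (lo - mean) / sd, (hi - mean) / sd
    mass = _cdf(beta) - _cdf(alpha)
    a_term = 0.0 if math.isinf(alpha) else alpha * _pdf(alpha)
    b_term = 0.0 if math.isinf(beta) else beta * _pdf(beta)
    shift = (_pdf(alpha) - _pdf(beta)) / mass
    scale = 1.0 + (a_term - b_term) / mass - shift ** 2
    return mean + sd * shift, var * scale


def interval_moments(mu, sigma, r, lo, hi):
    """Mean and covariance of a Gaussian vector given lo <= V[r] <= hi.

    Uses V = mu + b (R - mu_r) + e with e independent of R, so only the
    mean and variance of R change under the interval observation.
    """
    m_r, v_r = truncated_moments(mu[r], sigma[r, r], lo, hi)
    b = sigma[:, r] / sigma[r, r]  # regression coefficients on R
    mu_c = mu + b * (m_r - mu[r])
    sigma_c = sigma + np.outer(b, b) * (v_r - sigma[r, r])
    return mu_c, sigma_c


# Hypothetical moments for (Z, X, Y); R = Z is observed in [-1, 1].
mu = np.zeros(3)
sigma = np.array([[1.0, 0.8, 1.7],
                  [0.8, 1.14, 2.11],
                  [1.7, 2.11, 6.015]])
mu_c, sigma_c = interval_moments(mu, sigma, 0, -1.0, 1.0)
mu_w, sigma_w = interval_moments(mu, sigma, 0, -math.inf, math.inf)
print(sigma_c)
```

With an unbounded interval the function returns the unconditional moments, which is a quick sanity check on the construction.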
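When a conditional plan $X = x_0 + aW$ were conducted in the counterfactual world, we can thus obtain

$$\begin{pmatrix} S^* \\ Z^* \\ W^* \end{pmatrix} = \begin{pmatrix} A_{sx} \\ 0_{zx} \\ 0_{wx} \end{pmatrix} x_0 + \begin{pmatrix} \epsilon_{s \cdot [r]} \\ \epsilon_{z \cdot [r]} \\ \epsilon_{w \cdot [r]} \end{pmatrix} + \begin{pmatrix} A_{ss} & A_{sz} & A_{sw} + A_{sx}a \\ 0_{zs} & A_{zz} & A_{zw} \\ 0_{ws} & A_{wz} & A_{ww} \end{pmatrix} \begin{pmatrix} S^* \\ Z^* \\ W^* \end{pmatrix}, \tag{17}$$

where $(\epsilon_{s \cdot [r]}, \epsilon_{z \cdot [r]}, \epsilon_{w \cdot [r]})$ has the mean vector given by equation (15) and the covariance matrix given by equation (16). Thus, the mean vector and the covariance matrix if a conditional plan $X = x_0 + aW$ given $r_1 \le R \le r_2$ were conducted in the counterfactual world are

$$\begin{pmatrix} \mu_{s^*} \\ \mu_{z^*} \\ \mu_{w^*} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{sz} & -A_{sw}-A_{sx}a \\ 0_{zs} & I_{zz}-A_{zz} & -A_{zw} \\ 0_{ws} & -A_{wz} & I_{ww}-A_{ww} \end{pmatrix}^{-1} \left( \begin{pmatrix} A_{sx} \\ 0_{zx} \\ 0_{wx} \end{pmatrix} x_0 + \begin{pmatrix} \mu_{\epsilon_s \cdot [r]} \\ \mu_{\epsilon_z \cdot [r]} \\ \mu_{\epsilon_w \cdot [r]} \end{pmatrix} \right), \tag{18}$$

and

$$\begin{pmatrix} \Sigma_{s^* s^*} & \Sigma_{s^* z^*} & \Sigma_{s^* w^*} \\ \Sigma_{z^* s^*} & \Sigma_{z^* z^*} & \Sigma_{z^* w^*} \\ \Sigma_{w^* s^*} & \Sigma_{w^* z^*} & \Sigma_{w^* w^*} \end{pmatrix} = \begin{pmatrix} I_{ss}-A_{ss} & -A_{sz} & -A_{sw}-A_{sx}a \\ 0_{zs} & I_{zz}-A_{zz} & -A_{zw} \\ 0_{ws} & -A_{wz} & I_{ww}-A_{ww} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{\epsilon_s \epsilon_s \cdot [r]} & \Sigma_{\epsilon_s \epsilon_z \cdot [r]} & \Sigma_{\epsilon_s \epsilon_w \cdot [r]} \\ \Sigma_{\epsilon_z \epsilon_s \cdot [r]} & \Sigma_{\epsilon_z \epsilon_z \cdot [r]} & \Sigma_{\epsilon_z \epsilon_w \cdot [r]} \\ \Sigma_{\epsilon_w \epsilon_s \cdot [r]} & \Sigma_{\epsilon_w \epsilon_z \cdot [r]} & \Sigma_{\epsilon_w \epsilon_w \cdot [r]} \end{pmatrix} \left( \begin{pmatrix} I_{ss}-A_{ss} & -A_{sz} & -A_{sw}-A_{sx}a \\ 0_{zs} & I_{zz}-A_{zz} & -A_{zw} \\ 0_{ws} & -A_{wz} & I_{ww}-A_{ww} \end{pmatrix}^{-1} \right)', \tag{19}$$

respectively.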
Here, since

$$\begin{pmatrix} I_{ss}-A_{ss} & -A_{sz} & -A_{sw}-A_{sx}a \\ 0_{zs} & I_{zz}-A_{zz} & -A_{zw} \\ 0_{ws} & -A_{wz} & I_{ww}-A_{ww} \end{pmatrix}^{-1} \begin{pmatrix} I_{ss}-A_{ss} & -A_{sz} & -A_{sw} & -A_{sx} \\ 0_{zs} & I_{zz}-A_{zz} & -A_{zw} & 0_{zx} \\ 0_{ws} & -A_{wz} & I_{ww}-A_{ww} & 0_{wx} \end{pmatrix} = \begin{pmatrix} I_{ss} & C_{sz} & C_{sw} & -(I_{ss}-A_{ss})^{-1}A_{sx} \\ 0_{zs} & I_{zz} & 0_{zw} & 0_{zx} \\ 0_{ws} & 0_{wz} & I_{ww} & 0_{wx} \end{pmatrix},$$

where

$$\begin{aligned} (C_{sz}, C_{sw}) &= (I_{ss}-A_{ss})^{-1}(A_{sz}, A_{sw} + A_{sx}a) - (I_{ss}-A_{ss})^{-1}(A_{sz}, A_{sw}) \\ &= (I_{ss}-A_{ss})^{-1}(0_{sz}, A_{sx}a), \end{aligned}$$

by substituting equation (16) into equation (19), we can obtain

$$\begin{aligned} \Sigma_{s^* s^*} &= \Sigma_{ss \cdot [r]} + \tau_{sx}a\Sigma_{ww \cdot [r]}a'\tau_{sx}' + \tau_{sx}\sigma_{xx \cdot [r]}\tau_{sx}' + \tau_{sx}a\Sigma_{ws \cdot [r]} + \Sigma_{sw \cdot [r]}a'\tau_{sx}' \\ &\quad - \tau_{sx}\Sigma_{xs \cdot [r]} - \Sigma_{sx \cdot [r]}\tau_{sx}' - \tau_{sx}a\Sigma_{wx \cdot [r]}\tau_{sx}' - \tau_{sx}\Sigma_{xw \cdot [r]}a'\tau_{sx}' \\ &= \left( \Sigma_{ss \cdot [r]} - \frac{\Sigma_{sx \cdot [r]}\Sigma_{xs \cdot [r]}}{\sigma_{xx \cdot [r]}} \right) + \left( \tau_{sx} - \frac{\Sigma_{sx \cdot [r]}}{\sigma_{xx \cdot [r]}} \right)\left( \tau_{sx} - \frac{\Sigma_{sx \cdot [r]}}{\sigma_{xx \cdot [r]}} \right)'\sigma_{xx \cdot [r]} \\ &\quad + (\tau_{sx}a + \Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})\,\Sigma_{ww \cdot [r]}\,(\tau_{sx}a + \Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})' \\ &\quad - (\Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})\,\Sigma_{ww \cdot [r]}\,(\Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})'. \end{aligned} \tag{20}$$

It is seen that only the third term of equation (20) depends on $a$. When

$$\tau_{sx}a + \Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} = 0$$

holds true, denoting this $a$ by $\check{a}$, the variance of $Y$ if the conditional plan $X = x_0 + \check{a}W$ were conducted in the counterfactual world is the smallest among the conditional plans $X = x_0 + aW$. This conditional plan is called an optimal plan in this paper. For a detailed discussion of an optimal plan and its application, refer to Kuroki (2005).
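The role of the third term in equation (20) can be illustrated numerically. The sketch below is a toy check under stated assumptions, not the paper's procedure: it uses a hypothetical three-variable model with scalar $W = Z$ and an empty $R$ (so that the interval-conditioned moments reduce to the ordinary ones), and the function names `var_formula` and `var_direct` are introduced here only for illustration. It evaluates the $Y$-component of equation (20), compares it with the variance obtained directly from the modified model, and solves the optimality condition for $\check{a}$.

```python
import numpy as np

# Hypothetical model (illustrative values, not from the paper):
#   Z = eps_z,  X = 0.8 Z + eps_x,  Y = 0.5 Z + 1.5 X + eps_y
# W = {Z} is the observed nondescendant of X used in the plan X = x0 + a Z.
Z, X, Y = 0, 1, 2
A = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.5, 1.5, 0.0]])
eps_cov = np.diag([1.0, 0.5, 2.0])  # assumed independent disturbances
inv = np.linalg.inv(np.eye(3) - A)
sigma = inv @ eps_cov @ inv.T
tau_yx = A[Y, X]  # total effect of X on Y in this toy model

b_yw = sigma[Y, Z] / sigma[Z, Z]
b_xw = sigma[X, Z] / sigma[Z, Z]
# Optimality condition for scalar Y and W: tau*a + b_yw - tau*b_xw = 0.
a_opt = -(b_yw - tau_yx * b_xw) / tau_yx


def var_formula(a):
    """Y-component of equation (20) with scalar W and empty R."""
    beta_yx = sigma[Y, X] / sigma[X, X]
    first = sigma[Y, Y] - sigma[Y, X] ** 2 / sigma[X, X]
    second = (tau_yx - beta_yx) ** 2 * sigma[X, X]
    third = (tau_yx * a + b_yw - tau_yx * b_xw) ** 2 * sigma[Z, Z]
    fourth = (b_yw - tau_yx * b_xw) ** 2 * sigma[Z, Z]
    return first + second + third - fourth


def var_direct(a):
    """Variance of Y in the modified model where X = x0 + a * Z."""
    A_plan = A.copy()
    A_plan[X, :] = 0.0
    A_plan[X, Z] = a
    cov_plan = eps_cov.copy()
    cov_plan[X, X] = 0.0  # X is now a deterministic function of Z
    inv_plan = np.linalg.inv(np.eye(3) - A_plan)
    return (inv_plan @ cov_plan @ inv_plan.T)[Y, Y]


grid = np.linspace(-2.0, 2.0, 41)
for a in (0.0, a_opt, 1.0):
    print(a, var_formula(a), var_direct(a))
```

In this example the unconditional plan ($a = 0$) leaves the variance contributed by $Z$ in place, while $\check{a}$ removes it entirely, so only the disturbance variance of $Y$ remains.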
Under the optimal plan,

$$\mu_{s^*} = \mu_{s \cdot [r]} + \tau_{sx}(x_0 - \mu_{x \cdot [r]}) + (\tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})\mu_{w \cdot [r]},$$

and

$$\begin{aligned} \Sigma_{s^* s^*} &= \left( \Sigma_{ss \cdot [r]} - \frac{\Sigma_{sx \cdot [r]}\Sigma_{xs \cdot [r]}}{\sigma_{xx \cdot [r]}} \right) + \left( \tau_{sx} - \frac{\Sigma_{sx \cdot [r]}}{\sigma_{xx \cdot [r]}} \right)\left( \tau_{sx} - \frac{\Sigma_{sx \cdot [r]}}{\sigma_{xx \cdot [r]}} \right)'\sigma_{xx \cdot [r]} \\ &\quad - (\Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})\,\Sigma_{ww \cdot [r]}\,(\Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})'. \end{aligned} \tag{21}$$

Thus, the mean and the variance of the response variable $Y$ if the optimal conditional plan $X = x_0 + \check{a}W$ given $r_1 \le R \le r_2$ were conducted are evaluated as

$$\mu_{y^*} = \mu_{y \cdot [r]} + \tau_{yx}(x_0 - \mu_{x \cdot [r]}) + (\tau_{yx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \sigma_{yw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})\mu_{w \cdot [r]}$$

and

$$\begin{aligned} \sigma_{y^* y^*} &= \left( \sigma_{yy \cdot [r]} - \frac{\sigma_{yx \cdot [r]}^2}{\sigma_{xx \cdot [r]}} \right) + \left( \tau_{yx} - \frac{\sigma_{yx \cdot [r]}}{\sigma_{xx \cdot [r]}} \right)^2 \sigma_{xx \cdot [r]} \\ &\quad - (\sigma_{yw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{yx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})\,\Sigma_{ww \cdot [r]}\,(\sigma_{yw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{yx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1})', \end{aligned} \tag{22}$$

respectively.

3.4 PROPERTIES

Based on the formulation in section 3.3, the following properties are derived:

(I) Setting $a$ to $0$, in the case where $R$ is an empty set, equation (20) is consistent with equation (9). In the case where $R = r_1 = r_2$ holds true, equation (20) is consistent with the covariance formula of section 3.1.2, whose component for $Y$ is equation (13). Thus, equation (20) is an extension of Balke and Pearl (1995).

(II) When setting $a$ to $0$ in equation (20), we can obtain the counterfactual formulas in the case where an unconditional plan given $r_1 \le R \le r_2$ were conducted.
(III) Since we can obtain

$$(I_{ss}-A_{ss})^{-1}\Sigma_{\epsilon_s w \cdot [r]} = \Sigma_{sw \cdot [r]} - \tau_{sx}\Sigma_{xw \cdot [r]} - (I_{ss}-A_{ss})^{-1}A_{sw}\Sigma_{ww \cdot [r]} - (I_{ss}-A_{ss})^{-1}A_{sz}\Sigma_{zw \cdot [r]}$$

based on the updated distribution of disturbances, we can obtain

$$\Sigma_{s^* w^*} = (I_{ss}-A_{ss})^{-1}A_{sz}\Sigma_{zw \cdot [r]} + \left( (I_{ss}-A_{ss})^{-1}A_{sw} + \tau_{sx}\check{a} \right)\Sigma_{ww \cdot [r]} + (I_{ss}-A_{ss})^{-1}\Sigma_{\epsilon_s w \cdot [r]} = 0 \tag{23}$$

from equation (17) and $\tau_{sx}\check{a} + \Sigma_{sw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} - \tau_{sx}\Sigma_{xw \cdot [r]}\Sigma_{ww \cdot [r]}^{-1} = 0$. That is, the optimal plan can also be interpreted as the conditional plan which cancels the correlation between $S$ and $W$ given $r_1 \le R \le r_2$.

(IV) Let $\sigma^{(1)}_{y^* y^*}$ and $\sigma^{(2)}_{y^* y^*}$ be the counterfactual variances if an optimal plan $X = x_0 + \check{a}_{w_1}W_1$ and another optimal plan $X = x_0 + \check{a}_{w_2}W_2$ were conducted, respectively. If the third term in equation (22) satisfies

$$\begin{aligned} &(\sigma_{yw_1 \cdot [r]}\Sigma_{w_1 w_1 \cdot [r]}^{-1} - \tau_{yx}\Sigma_{xw_1 \cdot [r]}\Sigma_{w_1 w_1 \cdot [r]}^{-1})\,\Sigma_{w_1 w_1 \cdot [r]}\,(\sigma_{yw_1 \cdot [r]}\Sigma_{w_1 w_1 \cdot [r]}^{-1} - \tau_{yx}\Sigma_{xw_1 \cdot [r]}\Sigma_{w_1 w_1 \cdot [r]}^{-1})' \\ &\quad \ge (\sigma_{yw_2 \cdot [r]}\Sigma_{w_2 w_2 \cdot [r]}^{-1} - \tau_{yx}\Sigma_{xw_2 \cdot [r]}\Sigma_{w_2 w_2 \cdot [r]}^{-1})\,\Sigma_{w_2 w_2 \cdot [r]}\,(\sigma_{yw_2 \cdot [r]}\Sigma_{w_2 w_2 \cdot [r]}^{-1} - \tau_{yx}\Sigma_{xw_2 \cdot [r]}\Sigma_{w_2 w_2 \cdot [r]}^{-1})', \end{aligned}$$

then $\sigma^{(1)}_{y^* y^*} \le \sigma^{(2)}_{y^* y^*}$ holds true. This property provides a covariate selection criterion for minimizing the counterfactual variance of $Y$.

4 DISCUSSION

Counterfactual reasoning is an important issue in many practical sciences, yet its theory is less developed. This paper considered counterfactual problems when causal relations among variables can be described as a Gaussian linear structural equation model. We first reformulated the formulas proposed by Balke and Pearl (1995), which enables us to clarify the properties of the counterfactual distribution.
In addition, we extended the framework of Balke and Pearl (1995) in two aspects: from point observations to interval observations, and from an unconditional plan to a conditional plan. The results of this paper will promote the application and development of counterfactual reasoning theory.

Finally, we would like to point out some directions for further work on this theory. First, the discussion of this paper is based on linear structural equation models, so a natural extension is to nonparametric structural equation models, which may be of interest in a number of applications. Second, this paper evaluated the counterfactual distribution when an external intervention is conducted on a single treatment variable, so an extension to more than one treatment variable is also future work. Third, the results of this paper are applicable to acyclic graph models, so a corresponding theory for cyclic graph models needs to be developed.

ACKNOWLEDGEMENT

Thanks go to Kazushi Maruo and Hiroki Motogaito of Osaka University for their helpful discussion on this paper. The comments of the reviewers on preliminary versions of this paper are also acknowledged. This research was supported by the College Women's Association of Japan, the Sumitomo Foundation, the Murata Overseas Scholarship Foundation and the Ministry of Education, Culture, Sports, Science and Technology of Japan.

REFERENCES

Balke, A. and Pearl, J. (1994a). Probabilistic Evaluation of Counterfactual Queries, Proceedings of the 12th National Conference on Artificial Intelligence, 230-237.
Balke, A. and Pearl, J. (1994b). Counterfactual Probabilities: Computational Methods, Bounds and Identifications, Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, 11-18.
Balke, A. and Pearl, J. (1995). Counterfactuals and Policy Analysis in Structural Models, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 46-54.
Bollen, K. A. (1989). Structural Equations with Latent Variables, John Wiley & Sons.
Brito, C. (2003). A New Approach to the Identification Problem, Advances in Artificial Intelligence: The 16th Brazilian Symposium on Artificial Intelligence, 41-51.
Brito, C. and Pearl, J. (2002a). Generalized Instrumental Variables, Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, 85-93.
Brito, C. and Pearl, J. (2002b). A Graphical Criterion for the Identification of Causal Effects in Linear Models, Proceedings of the 18th National Conference on Artificial Intelligence, 533-538.
Brito, C. and Pearl, J. (2002c). A New Identification Condition for Recursive Models with Correlated Errors, Structural Equation Modeling, 9, 459-474.
Cox, D. R. and Wermuth, N. (1996). Multivariate Dependencies: Models, Analysis and Interpretation, Chapman & Hall.
Greenland, S. and Robins, J. M. (1988). Conceptual Problems in the Definition and Interpretation of Attributable Fractions, American Journal of Epidemiology, 128, 1185-1197.
Kuroki, M. (2005). Selection of a Control Plan by Using Causal Network in Statistical Process Analysis. Submitted.
Kuroki, M. and Cai, Z. (2004). Selection of Identifiability Criteria for Total Effects by Using Path Diagrams, Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 333-340.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann.
Pearl, J. (1999). Probabilities of Causation: Three Counterfactual Interpretations and Their Identification, Synthese, 121, 93-149.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.
Robins, J. M. (2004). Should Compensation Schemes be Based on the Probability of Causation or Expected Years of Life Lost? Journal of Law and Policy, 12, 537-548.
Robins, J. M. and Greenland, S. (1989a). Estimability and Estimation of Excess and Etiologic Fractions, Statistics in Medicine, 8, 845-859.
Robins, J. M. and Greenland, S. (1989b). The Probability of Causation under a Stochastic Model for Individual Risk, Biometrics, 45, 1125-1138.
Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search, Springer-Verlag.
Tian, J. (2004). Identifying Linear Causal Effects, Proceedings of the 19th National Conference on Artificial Intelligence, 104-110.
Tian, J. and Pearl, J. (2000a). Probabilities of Causation: Bounds and Identification, Annals of Mathematics and Artificial Intelligence, 28, 287-313.
Tian, J. and Pearl, J. (2000b). Probabilities of Causation: Bounds and Identification, Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, 589-598.
