The Evaluation of Causal Effects in Studies with an Unobserved Exposure/Outcome Variable: Bounds and Identification
Authors: Manabu Kuroki, Zhihong Cai
Manabu Kuroki, Department of Systems Innovation, Graduate School of Engineering Science, Osaka University, mkuroki@sigmath.es.osaka-u.ac.jp
Zhihong Cai, Department of Biostatistics, School of Public Health, Kyoto University, cai@pbh.med.kyoto-u.ac.jp

Abstract
This paper deals with the problem of evaluating the causal effect using observational data in the presence of an unobserved exposure/outcome variable, when cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding recursive factorization of a joint distribution. First, we propose identifiability criteria for causal effects when an unobserved exposure/outcome variable is considered to contain more than two categories. Next, when unmeasured variables exist between an unobserved outcome variable and its proxy variables, we provide the tightest bounds based on the potential outcome approach. The results of this paper are helpful for evaluating causal effects in the many practical fields where it is difficult or expensive to observe an exposure/outcome variable.

1 INTRODUCTION
The evaluation of causal effects from observational studies is one of the central aims in many fields of practical science. For this purpose, many researchers have attempted to clarify cause-effect relationships and to evaluate the causal effect of an exposure variable on an outcome variable from observed data. Statistical causal analysis, one of the most powerful tools for solving these problems, started with path analysis (Wright, 1923, 1934) and advanced to structural equation models (Wold, 1954; Bollen, 1989). It has also been modified to be applicable to categorical data (Goodman, 1973, 1974a, 1974b; Hagenaars, 1993).
Recently, Pearl (2000) developed a new framework of causal modeling based on a directed acyclic graph and the corresponding nonparametric structural equation model.
In observational studies, there often exist unobserved variables, which make it difficult to evaluate causal effects reliably. Many researchers have proposed useful approaches to evaluating causal effects when unobserved variables are confounding factors between an exposure variable and an outcome variable, such as the instrumental variable method and sensitivity analysis. In the context of graphical causal models, Pearl (2000) provided the mathematical definition of the causal effect. In addition, when both an exposure variable and an outcome variable are observed, Pearl (2000), Tian and Pearl (2002) and Shpitser and Pearl (2006) discussed several graphical identification conditions for causal effects, which enable us to recognize situations where the causal effects can be evaluated from observational data.
However, in some situations, the exposure/outcome variable itself is unobserved. For example, in a study examining whether the socioeconomic gradient influences low birth weight, socioeconomic status is measured by proxy variables such as income, wealth, education and occupation, since true socioeconomic status is unobserved (Finch, 2003). Another example of an unobserved exposure arises in occupational settings. Many epidemiological studies have addressed the question of carcinogenicity in workers exposed to diesel exhaust and coal mine dust, and most showed a low-to-medium increase in the risk of lung cancer. However, exposure in these studies is mainly inferred from job classifications, which may lead to misclassification (Hoffmann and Jockel, 2006). As an example of an unobserved outcome, Fleiss et al. (1976) reported a comparative clinical trial of ibuprofen, aspirin and placebo in the relief of post-extraction pain. Since the true outcome (pain relief) is unobserved, they used Ridit analysis (Bross, 1958) to divide patients into five categories of pain relief: none, poor, fair, good and very good. These examples show the importance of evaluating causal effects when an exposure/outcome variable is unobserved.
Kuroki et al. (2005) pointed out that it is difficult to apply the identification criteria proposed by Pearl and his colleagues to evaluate causal effects in such situations. They provided graphical identifiability criteria for the case where the unobserved exposure/outcome variable is continuous. In addition, Kuroki (2007) adapted the identification conditions proposed by Kuroki et al. (2005) to the case where the exposure/outcome variable is dichotomous. However, in many situations, researchers and practitioners are more interested in different exposure levels (e.g., none, low, medium and high) than in a purely binary exposure (exposed vs. unexposed). They are also more interested in response levels (e.g., none, poor, fair, good and very good) than in a simple binary response (improved vs. not improved).
The main purpose of this paper is therefore to provide identifiability criteria for causal effects from observational studies in the presence of an unobserved exposure/outcome variable with more than two categories. It will be shown that if we can observe some proxy variables that are affected by the unobserved variable, then the causal effect can be evaluated by using statistical causal analysis. More generally, we consider the case where unmeasured variables exist between the unobserved exposure/outcome variable and its proxy variables. In such a situation, the causal effect is not identifiable, but bounds on the causal effect can be derived.
Finally, we illustrate our results with an example from social science.

2 PRELIMINARIES
2.1 BAYESIAN NETWORKS
Let $f(v_1, v_2, \ldots, v_n)$ be a strictly positive joint distribution of a set $V = \{V_1, V_2, \ldots, V_n\}$ of variables. Let $f(v_i \mid v_j)$ be the conditional distribution of $V_i$ given $V_j = v_j$ ($V_i, V_j \in V$), and let $f(v_i)$ be the marginal distribution of $V_i$. Similar notation is used for other distributions. For the graph-theoretic terminology used in this paper, refer to Kuroki et al. (2005).
Suppose that a set $V$ of variables and a directed acyclic graph $G = (V, E)$ are given. When the joint distribution of $V$ is factorized recursively according to the graph $G$ as in the following equation, the graph is called a Bayesian network:
$$f(v_1, v_2, \ldots, v_n) = \prod_{i=1}^{n} f(v_i \mid \mathrm{pa}(v_i)). \tag{1}$$
When $\mathrm{pa}(v_i)$ is an empty set, $f(v_i \mid \mathrm{pa}(v_i))$ is the marginal distribution $f(v_i)$ of $V_i$.
If a joint distribution is factorized recursively according to the graph $G$, the conditional independencies implied by the factorization (1) can be read off the graph $G$ using the d-separation criterion (Pearl, 1988). That is, if $Z_1$ d-separates $Z_2$ from $Z_3$ in a directed acyclic graph $G$ ($Z_1, Z_2, Z_3 \subset V$), then $Z_2$ is conditionally independent of $Z_3$ given $Z_1$ in the corresponding recursive factorization (1); see, for example, Geiger et al. (1990).
2.2 CAUSAL EFFECT
Pearl (2000) defined a causal effect as the distribution of an outcome variable under an external intervention. Here, an 'external intervention' means that a variable is forced to take on some fixed value, regardless of the values of other variables.
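The factorization (1) and the d-separation property can be checked numerically on small discrete examples. The sketch below uses NumPy and two hypothetical three-variable graphs with made-up conditional probability tables (not taken from the paper).

In the chain A → B → C, the d-separation criterion guarantees that A and C are independent given B. In the collider A → B ← C, B does not d-separate A from C, and for these particular tables conditioning on B does make A and C dependent.

```python
import numpy as np

# Hypothetical binary chain A -> B -> C: pa(A) = {}, pa(B) = {A}, pa(C) = {B}.
f_a = np.array([0.3, 0.7])                       # f(a)
f_b_a = np.array([[0.8, 0.2], [0.25, 0.75]])     # f_b_a[a, b] = f(b | a)
f_c_b = np.array([[0.6, 0.4], [0.1, 0.9]])       # f_c_b[b, c] = f(c | b)

# Recursive factorization (1): f(a, b, c) = f(a) f(b | a) f(c | b)
chain = np.einsum("a,ab,bc->abc", f_a, f_b_a, f_c_b)

# Hypothetical collider A -> B <- C: f(a, b, c) = f(a) f(c) f(b | a, c)
f_c = np.array([0.4, 0.6])                       # f(c)
f_b_ac = np.array([[[0.9, 0.1], [0.2, 0.8]],     # f_b_ac[a, c, b] = f(b | a, c)
                   [[0.3, 0.7], [0.8, 0.2]]])
collider = np.einsum("a,c,acb->abc", f_a, f_c, f_b_ac)


def indep_given_middle(joint):
    """Return True if f(a, c | b) = f(a | b) f(c | b) for every value b."""
    f_b = joint.sum(axis=(0, 2))
    for b in range(joint.shape[1]):
        f_ac = joint[:, b, :] / f_b[b]           # f(a, c | b)
        if not np.allclose(f_ac, np.outer(f_ac.sum(axis=1), f_ac.sum(axis=0))):
            return False
    return True


print(chain.sum(), collider.sum())   # both factorizations define distributions
print(indep_given_middle(chain))     # True: B d-separates A from C
print(indep_given_middle(collider))  # False: B is a collider on the only path
```

Both tables are stored as dense arrays indexed by variable values. This makes `np.einsum` a direct transcription of the product in (1).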
If the distribution of the remaining variables represented in the directed acyclic graph remains essentially unchanged by such an external intervention, then the graph can be regarded as a causal diagram. The effect of the external intervention can then be calculated from the factorized joint distribution. The exact definition is as follows.
DEFINITION 1
Let $V = \{X, Y\} \cup Q$ ($\{X, Y\} \cap Q = \emptyset$) be a set of variables represented in a Bayesian network $G$. If the distribution of $Y$ after setting $X$ to a value $x$ is given by
$$f(y \mid \mathrm{set}(X = x)) = \sum_{q} \frac{f(x, y, q)}{f(x \mid \mathrm{pa}(x))}, \tag{2}$$
then $G$ is called a causal diagram with regard to $X$, and equation (2) is called the causal effect of $X$ on $Y$. Here, $\mathrm{set}(X = x)$ means that $X$ is set to the value $x$ by an external intervention. □
If Definition 1 holds with regard to all pairs of variables in the graph, then the whole graph is said to be causal. For more details about the relationship between Bayesian networks and causal diagrams, see Pearl (2000).
Given a causal diagram $G$, suppose we want to evaluate the causal effect $f(y \mid \mathrm{set}(X = x))$ of $X$ on $Y$ from a factorized joint distribution of observed variables. To do so, it is necessary to observe not only $X$ and $Y$ but also a set $Z$ of other variables, such as confounders. Pearl (2000) provided 'the back door criterion' as a graphical identifiability criterion for causal effects $f(y \mid \mathrm{set}(X = x))$. Here, 'identifiable' means that $f(y \mid \mathrm{set}(X = x))$ can be determined uniquely from the joint distribution of observed variables.
DEFINITION 2
Suppose that $X$ is a non-descendant of $Y$ in a directed acyclic graph $G$. If a set $Z$ of vertices satisfies the following conditions relative to an ordered pair $(X, Y)$ of vertices, then $Z$ is said to satisfy the back door criterion relative to $(X, Y)$:
(i) no vertex in $Z$ is a descendant of $X$;
(ii) $Z$ blocks every path between $X$ and $Y$ that contains an arrow pointing to $X$.
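Equation (2) can be illustrated with a minimal NumPy sketch. It assumes a hypothetical binary diagram Z → X, Z → Y, X → Y with made-up probability tables, so that Q = {Z} and pa(x) = {z}.

The function `causal_effect` evaluates (2) by dividing the factorized joint distribution by f(x | pa(x)). It is compared with the 'mutilated' model, in which the factor f(x | z) is dropped and X is fixed at x. Both are then contrasted with ordinary conditioning f(y | x). All function names are illustrative only.

```python
import numpy as np

# Hypothetical binary diagram Z -> X, Z -> Y, X -> Y, so Q = {Z} and pa(X) = {Z}.
f_z = np.array([0.6, 0.4])                        # f(z)
f_x_z = np.array([[0.9, 0.1], [0.2, 0.8]])        # f_x_z[z, x] = f(x | z)
f_y_xz = np.array([[[0.7, 0.3], [0.4, 0.6]],      # f_y_xz[z, x, y] = f(y | x, z)
                   [[0.5, 0.5], [0.1, 0.9]]])

# Recursive factorization (1), stored as joint[z, x, y]
joint = f_z[:, None, None] * f_x_z[:, :, None] * f_y_xz


def causal_effect(joint, f_x_pa, x):
    """Equation (2): sum over q of f(x, y, q) / f(x | pa(x)), here with q = z."""
    return (joint[:, x, :] / f_x_pa[:, x][:, None]).sum(axis=0)


def mutilated(x):
    """Drop the factor f(x | z), force X = x, and marginalize over z."""
    return (f_z[:, None] * f_y_xz[:, x, :]).sum(axis=0)


def conditional(joint, x):
    """Ordinary conditioning f(y | x), shown for contrast."""
    f_xy = joint.sum(axis=0)[x]
    return f_xy / f_xy.sum()


effect = causal_effect(joint, f_x_z, 1)
print(effect)                  # approximately [0.28 0.72]
print(mutilated(1))            # same values as effect
print(conditional(joint, 1))   # differs, because Z confounds X and Y
```

Here Z happens to satisfy the back door criterion relative to (X, Y), so the same numbers also equal the sum over z of f(y | x, z) f(z).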
□
If a set $Z$ of variables satisfies the back door criterion relative to $(X, Y)$, then the causal effect $f(y \mid \mathrm{set}(X = x))$ of $X$ on $Y$ is identifiable through the observation of $Z \cup \{X, Y\}$. It is given by the formula
$$f(y \mid \mathrm{set}(X = x)) = \sum_{z} f(y \mid x, z) f(z). \tag{3}$$
When the back door criterion cannot be applied to evaluate causal effects, Pearl (2000) provided 'the front door criterion', which is as follows:
DEFINITION 3
Suppose that $X$ is a non-descendant of $Y$ in a directed acyclic graph $G$. If a set $Z$ of variables satisfies the following conditions relative to an ordered pair $(X, Y)$ of variables, then $Z$ is said to satisfy the front door criterion relative to $(X, Y)$:
(i) $Z$ blocks all directed paths from $X$ to $Y$;
(ii) the empty set blocks every path between $X$ and $Z$ that contains an arrow pointing to $X$;
(iii) $X$ blocks every path between any vertex in $Z$ and $Y$ that contains an arrow pointing to that vertex in $Z$.
□
If a set $Z$ of variables satisfies the front door criterion relative to $(X, Y)$, then the causal effect $f(y \mid \mathrm{set}(X = x))$ of $X$ on $Y$ is identifiable through the observation of $Z \cup \{X, Y\}$. It is given by the formula
$$f(y \mid \mathrm{set}(X = x)) = \sum_{x', z} f(y \mid x', z) f(z \mid x) f(x'). \tag{4}$$
3 IDENTIFICATION OF CAUSAL EFFECTS
In Section 2, it is assumed that both the exposure variable and the outcome variable are observable. If either of them is unobserved, we cannot identify the causal effect of the exposure on the outcome, even if a set of variables satisfying the back door criterion or the front door criterion is observed. In this section, we consider the case where the unobserved exposure/outcome variable is assumed to be discrete. Let $X$ be an exposure variable and $Y$ be an outcome variable. Although $X$ or $Y$ is unobserved, researchers are interested in dividing it into $k$ categories. For example, when the domain of $Y$ is divided into $k = 3$ categories, $y_1$, $y_2$ and $y_3$ may represent the poor, fair and good response levels.
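Formulas (3) and (4) can be checked against a model whose hidden variable is known. The NumPy sketch below assumes a hypothetical binary diagram U → X, U → Y, X → Z → Y with made-up tables, and treats U as unobserved.

In this graph, Z satisfies the front door criterion relative to (X, Y), and {X} satisfies the back door criterion relative to (Z, Y). Both formulas are therefore computed from the observed f(x, z, y) alone. The `truth_*` functions use U and the truncated factorization (2) only for comparison. All names are illustrative.

```python
import numpy as np

# Hypothetical binary diagram U -> X, U -> Y, X -> Z -> Y; U is treated as unobserved.
f_u = np.array([0.5, 0.5])                         # f(u)
f_x_u = np.array([[0.8, 0.2], [0.3, 0.7]])         # [u, x] = f(x | u)
f_z_x = np.array([[0.9, 0.1], [0.25, 0.75]])       # [x, z] = f(z | x)
f_y_zu = np.array([[[0.8, 0.2], [0.6, 0.4]],       # [z, u, y] = f(y | z, u)
                   [[0.3, 0.7], [0.1, 0.9]]])

# Factorization (1) of the full model, then marginalize out the unobserved U.
full = np.einsum("u,ux,xz,zuy->uxzy", f_u, f_x_u, f_z_x, f_y_zu)
obs = full.sum(axis=0)                             # observed f(x, z, y)

f_x = obs.sum(axis=(1, 2))                         # f(x)
f_xz = obs.sum(axis=2)                             # f(x, z)
f_y_given_xz = obs / f_xz[:, :, None]              # [x, z, y] = f(y | x, z)
f_z_given_x = f_xz / f_x[:, None]                  # [x, z] = f(z | x)


def back_door(z):
    """Formula (3) for the effect of Z on Y, with {X} as the back door set."""
    return (f_y_given_xz[:, z, :] * f_x[:, None]).sum(axis=0)


def front_door(x):
    """Formula (4): sum over x', z of f(y | x', z) f(z | x) f(x')."""
    inner = (f_y_given_xz * f_x[:, None, None]).sum(axis=0)   # indexed [z, y]
    return (f_z_given_x[x][:, None] * inner).sum(axis=0)


def truth_z(z):
    """f(y | set(Z = z)) from (2) on the full model: sum over u of f(u) f(y | z, u)."""
    return (f_u[:, None] * f_y_zu[z]).sum(axis=0)


def truth_x(x):
    """f(y | set(X = x)) from (2) on the full model: sum over u, z of f(u) f(z | x) f(y | z, u)."""
    return sum(f_z_x[x, z] * truth_z(z) for z in range(2))


for x in range(2):
    print(x, front_door(x), truth_x(x))            # the two columns agree
```

The agreement follows because Z is independent of U given X in this graph. As a result, the inner sum over x' reproduces the sum over u of f(u) f(y | z, u).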
Then, let $U$ be whichever of $X$ and $Y$ is unobserved ($u \in \{u_1, \ldots, u_k\}$). In addition, let sets $S$ and $T$ be observed proxy variables that are affected by the unobserved variable $U$. Assume that we can select $k$ distinct vectors from the domains of $S$ and $T$, denoted by $s_1, \ldots, s_k$ and $t_1, \ldots, t_k$, respectively. The sets $W$ and $Z$ may consist of continuous and/or discrete variables. Furthermore, let $P$ and $Q$ be $k \times k$ nonsingular matrices such that
$$P = \begin{pmatrix} 1 & f(t_1 \mid z) & \cdots & f(t_{k-1} \mid z) \\ f(s_1 \mid z) & f(s_1, t_1 \mid z) & \cdots & f(s_1, t_{k-1} \mid z) \\ \vdots & \vdots & \ddots & \vdots \\ f(s_{k-1} \mid z) & f(s_{k-1}, t_1 \mid z) & \cdots & f(s_{k-1}, t_{k-1} \mid z) \end{pmatrix}, \tag{5}$$
$$Q = \begin{pmatrix} f(w \mid z) & f(w, t_1 \mid z) & \cdots & f(w, t_{k-1} \mid z) \\ f(w, s_1 \mid z) & f(w, s_1, t_1 \mid z) & \cdots & f(w, s_1, t_{k-1} \mid z) \\ \vdots & \vdots & \ddots & \vdots \\ f(w, s_{k-1} \mid z) & f(w, s_{k-1}, t_1 \mid z) & \cdots & f(w, s_{k-1}, t_{k-1} \mid z) \end{pmatrix}. \tag{6}$$
Then, the following theorem is obtained.
THEOREM 1
Given a causal diagram $G$ on $V$ with $S \cup T \cup \{U\} \cup Z \cup W$ ($\subset V$), suppose that
(i) $Z \cup \{U\}$ d-separates $S$ from $T$ and $W$ from $S \cup T$;
(ii) $f(u_1 \mid z) < \cdots$