Comment: Understanding OR, PS and DR
Comment on ``Understanding OR, PS and DR'' [arXiv:0804.2958]
Authors: Zhiqiang Tan
Statistical Science 2007, Vol. 22, No. 4, 560–568. DOI: 10.1214/07-STS227A. Main article DOI: 10.1214/07-STS227. © Institute of Mathematical Statistics, 2007.

Zhiqiang Tan is Assistant Professor, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 North Wolfe Street, Baltimore, Maryland 21205, USA (e-mail: ztan@jhsph.edu).

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2007, Vol. 22, No. 4, 560–568. This reprint differs from the original in pagination and typographic detail.

We congratulate Kang and Schafer (KS) on their excellent article comparing various estimators of a population mean in the presence of missing data, and thank the Editor for organizing the discussion. In this communication, we systematically examine the propensity score (PS) and the outcome regression (OR) approaches and doubly robust (DR) estimation, which are all discussed by KS. The aim is to clarify and better our understanding of the three interrelated subjects. Sections 1 and 2 contain the following main points, respectively.

(a) OR and PS are two approaches with different characteristics, and one does not necessarily dominate the other. The OR approach suffers the problem of implicitly making extrapolation. The PS-weighting approach tends to yield large weights, explicitly indicating uncertainty in the estimate.

(b) It seems more constructive to view DR estimation in the PS approach by incorporating an OR model rather than in the OR approach by incorporating a PS model. Tan's (2006) DR estimator can be used to improve upon any initial PS-weighting estimator with both variance and bias reduction.

Finally, Section 3 presents miscellaneous comments.

1. UNDERSTANDING OR AND PS

For a population, let $X$ be a vector of (pretreatment) covariates, $T$ be the treatment status, and $Y$ be the observed outcome given by $Y = (1 - T)Y_0 + TY_1$, where $(Y_0, Y_1)$ are potential outcomes. The observed data consist of independent and identically distributed copies $(X_i, T_i, Y_i)$, $i = 1, \ldots, n$. Assume that $T$ and $(Y_0, Y_1)$ are conditionally independent given $X$. The objective is to estimate $\mu_1 = E(Y_1)$, $\mu_0 = E(Y_0)$, and their difference, $\mu_1 - \mu_0$, which gives the average causal effect (ACE). KS throughout focused on the problem of estimating $\mu_1$ from the data $(X_i, T_i, T_iY_i)$, $i = 1, \ldots, n$, only, noting in Section 1.2 that estimation of the ACE can be separated into independent estimation of the means $\mu_1$ and $\mu_0$. We shall in Section 3 discuss subtle differences between causal inference and solving two separate missing-data problems, but until then we shall restrict our attention to estimation of $\mu_1$ from $(X_i, T_i, T_iY_i)$ only.

The model described at this stage is completely nonparametric. No parametric modeling assumption is made on either the regression function $m_1(X) = E(Y \mid T = 1, X)$ or the propensity score $\pi(X) = P(T = 1 \mid X)$. Robins and Rotnitzky (1995) and Hahn (1998) established the following fundamental result for semiparametric (or more precisely, nonparametric) estimation of $\mu_1$.

Proposition 1. Under certain regularity conditions, there exists a unique influence function, which hence must be the efficient influence function, given by

$$\tau_1 = \frac{T}{\pi(X)}\,Y - \mu_1 - \left(\frac{T}{\pi(X)} - 1\right) m_1(X) = m_1(X) - \mu_1 + \frac{T}{\pi(X)}\,\bigl(Y - m_1(X)\bigr).$$

The semiparametric variance bound (i.e., the lowest asymptotic variance any regular estimator of $\mu_1$ can achieve) is $n^{-1}E(\tau_1^2)$.

The semiparametric variance bound depends on both $m_1(X)$ and $\pi(X)$.
The bound becomes large or even infinite whenever $\pi(X) \approx 0$ for some values of $X$. Intuitively, it becomes difficult to infer the overall mean of $Y_1$ in this case, because very few values of $Y_1$ are observed among subjects with $\pi(X) \approx 0$. The difficulty holds whatever parametric approach, OR or PS, is taken for inference, although the symptoms can be different. This point is central to our subsequent discussion.

The problem of estimating $\mu_1$ is typically handled by introducing parametric modeling assumptions on either $m_1(X)$ or $\pi(X)$. The OR approach is to specify an OR model, say $m_1(X; \alpha)$, for $m_1(X)$ and then estimate $\mu_1$ by

$$\hat\mu_{\mathrm{OR}} = \frac{1}{n}\sum_{i=1}^{n} \hat m_1(X_i),$$

where $\hat m_1(X)$ is the fitted response. The PS approach is to specify a PS model, say $\pi(X; \gamma)$, for $\pi(X)$ and then estimate $\mu_1$ by

$$\hat\mu_{\mathrm{IPW}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_iY_i}{\hat\pi(X_i)} \quad\text{or}\quad \sum_{i=1}^{n} \frac{T_iY_i}{\hat\pi(X_i)} \Bigg/ \sum_{i=1}^{n} \frac{T_i}{\hat\pi(X_i)},$$

where $\hat\pi(X)$ is the fitted propensity score. The idea of inverse probability weighting (IPW) is to recover the joint distribution of $(X, Y_1)$ by attaching weight $\propto \hat\pi^{-1}(X_i)$ to each point in $\{(X_i, Y_i) : T_i = 1\}$ (see Tan, 2006, for a likelihood formulation). More generally, consider the following class of augmented IPW estimators $\hat\mu_{\mathrm{AIPW}} = \hat\mu_{\mathrm{AIPW}}(h)$ depending on a known function $h(X)$:

$$\hat\mu_{\mathrm{AIPW}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_iY_i}{\hat\pi(X_i)} - \frac{1}{n}\sum_{i=1}^{n} \left(\frac{T_i}{\hat\pi(X_i)} - 1\right) h(X_i).$$

A theoretical comparison of the two approaches is given by

Proposition 2. Assume that an OR model is correctly specified and $m_1(X)$ is efficiently estimated with adaptation to heteroscedastic $\mathrm{var}(Y_1 \mid X)$, and that a PS model is correctly specified and $\pi(X)$ may or may not be efficiently estimated. Then

$$\mathrm{asy.var}(\hat\mu_{\mathrm{OR}}) \le \mathrm{asy.var}(\hat\mu_{\mathrm{AIPW}}),$$

where asy.var denotes asymptotic variance as $n \to \infty$.
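To make the definitions of $\hat\mu_{\mathrm{OR}}$, $\hat\mu_{\mathrm{IPW}}$ (raw and ratio versions) and $\hat\mu_{\mathrm{AIPW}}(h)$ concrete, the following is a minimal Python sketch. It assumes that the fitted values $\hat m_1(X_i)$ and $\hat\pi(X_i)$ have already been obtained from some OR and PS models and are passed in as arrays. The function names are illustrative only and do not come from KS or from Tan (2006).

```python
import numpy as np


def mu_or(m1_hat):
    """OR estimator: sample average of fitted responses m1_hat(X_i)."""
    return float(np.mean(m1_hat))


def mu_ipw(t, y, pi_hat, ratio=False):
    """IPW estimator of mu_1 in the raw (default) or ratio version.

    Outcomes of untreated subjects (t == 0) are never used, so they may
    be NaN or any placeholder value.
    """
    t = np.asarray(t, dtype=float)
    w = t / np.asarray(pi_hat, dtype=float)        # weights T_i / pi_hat(X_i)
    ty = np.where(t == 1, y, 0.0)                  # T_i * Y_i without touching missing Y
    if ratio:
        return float(np.sum(w * ty) / np.sum(w))   # normalized by the sum of weights
    return float(np.mean(w * ty))                  # normalized by n


def mu_aipw(t, y, pi_hat, h):
    """Augmented IPW estimator for given values h(X_i)."""
    t = np.asarray(t, dtype=float)
    w = t / np.asarray(pi_hat, dtype=float)
    ty = np.where(t == 1, y, 0.0)
    return float(np.mean(w * ty) - np.mean((w - 1.0) * np.asarray(h, dtype=float)))
```

Taking `h` identically zero in `mu_aipw` reproduces the raw version of `mu_ipw`, in line with the remark that $\hat\mu_{\mathrm{IPW}}$ corresponds to $h(X) = 0$.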
In fact, the asymptotic variance of $\hat\mu_{\mathrm{OR}}$, which is the lowest under the parametric OR model, is no greater than the semiparametric variance bound under the nonparametric model, whereas that of $\hat\mu_{\mathrm{AIPW}}$ is no smaller than $n^{-1}E(\tau_1^2)$ because $\tau_1$ has the smallest variance among $\pi^{-1}(X)TY - (\pi^{-1}(X)T - 1)h(X)$ over all functions $h(X)$. In the degenerate case where $m_1(X)$ and $\pi(X)$ are known, the comparison can be attributed to Rao–Blackwellization because $E[\pi^{-1}(X)TY - (\pi^{-1}(X)T - 1)h(X) \mid X] = m_1(X)$. This result has interesting implications for understanding the two approaches.

First, the result formalizes the often-heard statement that the (A)IPW estimator is no more efficient than the OR estimator. If a correct OR model and a correct PS model were placed in two black boxes, respectively, and if a statistician were asked to open one and only one box, then the statistician should choose the box for the OR model in terms of asymptotic efficiency (minus the complication due to adaptation to heteroscedastic variance of $Y_1$ given $X$). However, one could immediately argue that this comparison is only of phantom significance, because all models (by human efforts) are wrong (in the presence of high-dimensional $X$) and therefore the hypothetical situation never occurs. In this sense, we emphasize that the result does not establish any absolute superiority of the OR approach over the PS approach.

Second, even though not implying one approach is better than the other, the result does shed light on different characteristics of the two approaches as an approximation to the ideal nonparametric estimation. Typically, increasingly complicated but nested parametric models can be specified in either approach to reduce the dependency on modeling assumptions.
For a sequence of OR models, the asymptotic variance of $\hat\mu_{\mathrm{OR}}$ is increasing to the semiparametric variance bound, whereas for a sequence of PS models, the asymptotic variance of $\hat\mu_{\mathrm{AIPW}}$ is decreasing to the semiparametric variance bound. For this difference, we suggest that the OR approach is aggressive and the PS approach is conservative. Correctly specifying an OR model ensures that $\hat\mu_{\mathrm{OR}}$ is consistent and has asymptotic variance no greater, whereas correctly specifying a PS model ensures that $\hat\mu_{\mathrm{AIPW}}$ is consistent and has asymptotic variance no smaller, than otherwise would be best attained without any modeling assumption. This interpretation agrees with the finding of Tan (2006) that the OR approach works directly with the usual likelihood, whereas the PS approach retains only part of the information and therefore ignores the remaining part on the joint distributions of covariates and potential outcomes.

Now the real, hard questions facing a statistician are:

(a) Which task is more likely to be accomplished, to correctly specify an OR model or a PS model?

(b) Which mistake (even a mild one) can lead to worse estimates, misspecification of an OR model or a PS model?

First of all, it seems that no definite comparison is possible, because answers to both questions depend on unmeasurable factors such as the statistician's effort and experience for question (a) and the degree and direction of model misspecification for question (b). Nevertheless, some informal comparisons are worth considering.

Regarding question (a), a first answer might be "equally likely," because both models involve the same vector of explanatory variables $X$. However, the two tasks have different forms of difficulties. The OR-model building works on the "truncated" data $\{(X_i, Y_i) : T_i = 1\}$ within treated subjects.
Therefore, any OR model relies on extrapolation to predict $m_1(X)$ at values of $X$ that are different from those for most treated subjects [i.e., $\pi(X) \approx 0$]. The usual model checking is not capable of detecting OR-model misspecification, whether mild or gross, in this region of $X$. (Note that finding high-leverage observations can point to the existence of such a region of $X$, not model misspecification.) This problem holds for low- or high-dimensional $X$, and is separate from the difficulty of capturing $m_1(X)$ within treated subjects when $X$ is high-dimensional [cf. KS's discussion below display (2)]. In contrast, the PS-model building works on the "full" data $\{(X_i, T_i)\}$ and does not suffer from data truncation, although it suffers the same curse of dimensionality. The exercise of model checking is capable of detecting PS-model misspecification. The matter of concern is that successful implementation is difficult when $X$ is high-dimensional.

Regarding question (b), KS (Section 2.1) suggested that the (A)IPW estimator is sensitive to misspecification of the PS model when $\pi(X) \approx 0$ for some values of $X$. For example, if $\pi(X) = 0.01$ is underestimated at 0.001, then, even though the absolute bias is small (= 0.009), the weight $\pi^{-1}(X)$ is overestimated by a factor of 10 (1000 instead of 100). In this case, the estimator has inflated standard error, which can be much greater than its bias. In contrast, if the OR model is misspecified, then the bias of the OR estimator is the average of those of $\hat m_1(X)$ across individual subjects in the original scale, and can be of similar magnitude to its standard deviation.

In summary, OR and PS are two approaches with different characteristics. If an OR model is correctly specified, then the OR estimator is consistent and has asymptotic variance no greater than the semiparametric variance bound.
Because of data truncation, any OR model suffers the problem of implicitly making extrapolation at values of $X$ with $\pi(X) \approx 0$. Finding high-leverage observations in model checking can point to the existence of such values of $X$. In contrast, the PS approach specifically examines $\pi(X)$ and addresses data truncation by weighting to recover the joint distribution of $(X, Y_1)$. The weights are necessarily large for treated subjects with $\pi(X) \approx 0$, in which case the standard error is large, explicitly indicating uncertainty in the estimate. If a PS model is correctly specified, then the (A)IPW estimator is consistent and has asymptotic variance no smaller than the semiparametric variance bound.

2. UNDERSTANDING DR

The OR or the (A)IPW estimator requires specification of an OR or a PS model, respectively. In contrast, a DR estimator uses the two models in a manner such that it remains consistent if either the OR or the PS model is correctly specified. The prototypical DR estimator of Robins, Rotnitzky and Zhao (1994) is

$$\hat\mu_{\mathrm{AIPW,fix}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_iY_i}{\hat\pi(X_i)} - \frac{1}{n}\sum_{i=1}^{n} \left(\frac{T_i}{\hat\pi(X_i)} - 1\right)\hat m_1(X_i) = \frac{1}{n}\sum_{i=1}^{n} \hat m_1(X_i) + \frac{1}{n}\sum_{i=1}^{n} \frac{T_i}{\hat\pi(X_i)}\bigl(Y_i - \hat m_1(X_i)\bigr).$$

The two equivalent expressions [resp. (9) and (8) in KS] correspond to those for the efficient influence function $\tau_1$ in Proposition 1. Proposition 3 collects theoretical comparisons between the three estimators.

Proposition 3. The following statements hold:

(i) $\hat\mu_{\mathrm{AIPW,fix}}$ is doubly robust.

(ii) $\hat\mu_{\mathrm{AIPW,fix}}$ is locally efficient: if a PS and an OR model are correctly specified, then $\hat\mu_{\mathrm{AIPW,fix}}$ achieves the semiparametric variance bound and hence $\mathrm{asy.var}(\hat\mu_{\mathrm{AIPW,fix}}) \le \mathrm{asy.var}(\hat\mu_{\mathrm{AIPW}})$.

(iii) If an OR model is correctly specified and $m_1(X)$ is efficiently estimated in $\hat\mu_{\mathrm{OR}}$, then $\mathrm{asy.var}(\hat\mu_{\mathrm{AIPW,fix}}) \ge \mathrm{asy.var}(\hat\mu_{\mathrm{OR}})$.

Compared with $\hat\mu_{\mathrm{OR}}$, $\hat\mu_{\mathrm{AIPW,fix}}$ is more robust in terms of bias if the OR model is misspecified but the PS model is correctly specified, but is less efficient in terms of variance if the OR model is correctly specified. The usual bias-variance trade-off takes effect. Compared with $\hat\mu_{\mathrm{AIPW}}$, $\hat\mu_{\mathrm{AIPW,fix}}$ is more robust in terms of bias if the PS model is misspecified but the OR model is correctly specified, and is more efficient in terms of variance if both the PS and the OR models are correctly specified. The usual bias-variance trade-off seems not to exist. Intuitively, the difference can be attributed to the characteristics of OR (being aggressive) and PS (being conservative) discussed in Section 1. It is possible for the PS approach to reduce both bias and variance by incorporating an OR model, but not so for the OR approach by incorporating a PS model.

Local efficiency implies that if the PS model is correctly specified, then $\hat\mu_{\mathrm{AIPW,fix}}$ gains efficiency over $\hat\mu_{\mathrm{AIPW}}$ for every function $h(X)$ under the condition that the OR model is also correctly specified. A more desirable situation is to find an estimator that is not only doubly robust and locally efficient but also, whenever the PS model is correctly specified, guaranteed to gain efficiency over $\hat\mu_{\mathrm{AIPW}}$ for any initial, fixed function $h(X)$. For simplicity, consider $\hat\mu_{\mathrm{IPW}}$ corresponding to $h(X) = 0$ as the initial estimator.
In this case, consider Tan's (2006) regression (tilde) estimator

$$\tilde\mu_{\mathrm{REG}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_iY_i}{\hat\pi(X_i)} - \tilde\beta^{(1)}\,\frac{1}{n}\sum_{i=1}^{n} \left(\frac{T_i}{\hat\pi(X_i)} - 1\right)\hat m_1(X_i),$$

where $\tilde\beta^{(1)}$ is the first element of $\tilde\beta = \tilde E^{-1}(\hat\xi\hat\zeta^{\top})\,\tilde E(\hat\xi\hat\eta)$, $\tilde E$ denotes sample average, and

$$\hat\eta = \frac{TY}{\hat\pi(X)}, \qquad \hat\xi = \left(\frac{T}{\hat\pi(X)} - 1\right)\left(\hat m_1(X),\ \frac{\partial\hat\pi/\partial\gamma^{\top}(X)}{1 - \hat\pi(X)}\right)^{\top}, \qquad \hat\zeta = \frac{T}{\hat\pi(X)}\left(\hat m_1(X),\ \frac{\partial\hat\pi/\partial\gamma^{\top}(X)}{1 - \hat\pi(X)}\right)^{\top}.$$

This estimator algebraically resembles Robins, Rotnitzky and Zhao's (1995) regression (hat) estimator

$$\hat\mu_{\mathrm{REG}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_iY_i}{\hat\pi(X_i)} - \hat\beta^{(1)}\,\frac{1}{n}\sum_{i=1}^{n} \left(\frac{T_i}{\hat\pi(X_i)} - 1\right)\hat m_1(X_i),$$

where $\hat\beta^{(1)}$ is the first element of $\hat\beta = \tilde E^{-1}(\hat\xi\hat\xi^{\top})\,\tilde E(\hat\xi\hat\eta)$. Compared with $\hat\mu_{\mathrm{AIPW,fix}}$, each estimator introduces an estimated regression coefficient, $\tilde\beta$ or $\hat\beta$, of $\hat\eta$ against control variates $\hat\xi$. Therefore, $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{REG}}$ share the advantage of optimally using control variates $\hat\xi$ [Proposition 4(ii)]. See Section 3 for a discussion about "control variates" and "regression estimators." On the other hand, $\hat\beta$ is defined in the classical manner, whereas $\tilde\beta$ is specially constructed by exploiting the structure of control variates $\hat\xi$. This subtle difference underlies Proposition 4(i).

Proposition 4. The following statements hold:

(i) $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{REG}}$ are locally efficient, but $\tilde\mu_{\mathrm{REG}}$ is doubly robust and $\hat\mu_{\mathrm{REG}}$ is not.

(ii) If a PS model is correctly specified and $\pi(X)$ is efficiently estimated, then $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{REG}}$ achieve the smallest asymptotic variance among

$$\frac{1}{n}\sum_{i=1}^{n} \frac{T_iY_i}{\hat\pi(X_i)} - b^{(1)}\,\frac{1}{n}\sum_{i=1}^{n} \left(\frac{T_i}{\hat\pi(X_i)} - 1\right)\hat m_1(X_i),$$

where $b^{(1)}$ is an arbitrary coefficient. The two estimators are asymptotically at least as efficient as $\hat\mu_{\mathrm{IPW}}$ and $\hat\mu_{\mathrm{AIPW,fix}}$, corresponding to $b^{(1)} = 0$ and $1$.
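The construction of $\hat\mu_{\mathrm{AIPW,fix}}$, $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{REG}}$ can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation behind the simulations reported in Section 3. It assumes a logistic PS model fitted by maximum likelihood and a linear OR model fitted by ordinary least squares on treated subjects. Under the logistic model, $\partial\hat\pi/\partial\gamma = \hat\pi(1 - \hat\pi)X$, so the second block of the control variates reduces to $\hat\pi X$. All function names are hypothetical.

```python
import numpy as np


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def fit_logistic(X, t, n_iter=100, tol=1e-10):
    """Newton-Raphson MLE for a logistic model; X must contain an intercept column."""
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ gamma)
        grad = X.T @ (t - p)                          # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # information matrix
        step = np.linalg.solve(hess, grad)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def dr_estimates(X, t, y):
    """Return IPW (raw), AIPW-fix, REG-tilde and REG-hat estimates of mu_1.

    Assumptions (ours): logistic PS model and linear OR model, both in X.
    Outcomes of untreated subjects are never used.
    """
    n = X.shape[0]
    treated = t == 1
    pi_hat = expit(X @ fit_logistic(X, t))
    alpha, *_ = np.linalg.lstsq(X[treated], y[treated], rcond=None)
    m1_hat = X @ alpha

    w = t / pi_hat
    eta = w * np.where(treated, y, 0.0)                  # eta_hat = T Y / pi_hat
    v = np.column_stack([m1_hat, pi_hat[:, None] * X])   # (m1_hat, dpi/dgamma / (1 - pi_hat))
    xi = (w - 1.0)[:, None] * v                          # control variates xi_hat
    zeta = w[:, None] * v                                # zeta_hat

    e_xi_eta = xi.T @ eta / n
    beta_tilde = np.linalg.solve(xi.T @ zeta / n, e_xi_eta)  # tilde coefficient
    beta_hat = np.linalg.solve(xi.T @ xi / n, e_xi_eta)      # classical (hat) coefficient

    ipw = eta.mean()
    cv = xi[:, 0].mean()   # sample mean of (T / pi_hat - 1) * m1_hat
    return {
        "ipw": float(ipw),
        "aipw_fix": float(ipw - cv),                    # b^(1) = 1
        "reg_tilde": float(ipw - beta_tilde[0] * cv),
        "reg_hat": float(ipw - beta_hat[0] * cv),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x])
    t = rng.binomial(1, expit(0.5 * x))
    y = 1.0 + 2.0 * x + rng.standard_normal(n)   # true mu_1 = 1 in this toy design
    print(dr_estimates(X, t, y))
```

Only the first coefficient enters each estimator because, at the logistic maximum likelihood estimate, the sample mean of the score block $(T - \hat\pi)X$ of $\hat\xi$ is zero. The toy design in the usage example has both models correctly specified and is not the KS design.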
Compared with $\hat\mu_{\mathrm{AIPW,fix}}$, $\tilde\mu_{\mathrm{REG}}$ provides a more concrete improvement upon $\hat\mu_{\mathrm{IPW}}$ due to the possession of three properties: optimality in using control variates, local efficiency and double robustness. Using $\tilde\mu_{\mathrm{REG}}$ achieves variance reduction if the PS model is correctly specified (the effect of which is maximal if the OR model is also correctly specified), and bias reduction if the PS model is misspecified but the OR model is correctly specified. On the other hand, comparison between $\hat\mu_{\mathrm{OR}}$ and $\tilde\mu_{\mathrm{REG}}$ is similarly subject to the usual bias-variance trade-off as that between $\hat\mu_{\mathrm{OR}}$ and $\hat\mu_{\mathrm{AIPW,fix}}$. That is, $\tilde\mu_{\mathrm{REG}}$ is more robust than $\hat\mu_{\mathrm{OR}}$ if the OR model is misspecified but the PS model is correctly specified, but is less efficient if the OR model is correctly specified.

The preceding comparisons between $\hat\mu_{\mathrm{AIPW,fix}}$, $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{OR}}$, $\hat\mu_{\mathrm{IPW}}$ present useful facts for understanding DR estimation. It seems more meaningful to consider $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ as an advance or improvement in the PS approach by incorporating an OR model rather than in the OR approach by incorporating a PS model. The OR and PS models play different roles, even though the models are equally referred to in the concept of DR and $\hat\mu_{\mathrm{AIPW,fix}}$ can be expressed as bias-corrected $\hat\mu_{\mathrm{OR}}$ or equivalently as bias-corrected $\hat\mu_{\mathrm{IPW}}$. This viewpoint is also supported by the construction of $\hat\mu_{\mathrm{AIPW,fix}}$ (in the first expression by Robins, Rotnitzky and Zhao, 1994) and $\tilde\mu_{\mathrm{REG}}$. Both of the estimators are derived under the assumption that the PS model is correct, and then examined in the situation where the OR model is also correct, or the PS model is misspecified but the OR model correct (see Tan, 2006, Section 3.2).
The different characteristics discussed in Section 1 persist between the PS (even using $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ with the DR benefit) and OR approaches. The asymptotic variance of $\hat\mu_{\mathrm{AIPW}}$, $\hat\mu_{\mathrm{AIPW,fix}}$, or $\tilde\mu_{\mathrm{REG}}$ if the PS model is correctly specified is no smaller, whereas that of $\hat\mu_{\mathrm{OR}}$ if the OR model is correctly specified is no greater, than the semiparametric variance bound. Moreover, if the OR model is correct, the asymptotic variance of $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ is still no smaller than that of $\hat\mu_{\mathrm{OR}}$. Therefore:

Proposition 5. The asymptotic variance of $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ if either a PS or an OR model is correctly specified is no smaller than that of $\hat\mu_{\mathrm{OR}}$ if the OR model is correctly specified and $m_1(X)$ is efficiently estimated in $\hat\mu_{\mathrm{OR}}$.

Like Proposition 2, this result does not establish absolute superiority of the OR approach over the PS-DR approach. Instead, it points to considering practical issues of model specification and consequences of model misspecification. There seems to be no definite comparison, because various, unmeasurable factors are involved. Nevertheless, the points regarding questions (a) and (b) in Section 1 remain relevant.

In summary, it seems more constructive to view DR estimation in the PS approach by incorporating an OR model rather than in the OR approach by incorporating a PS model. The estimator $\tilde\mu_{\mathrm{REG}}$ provides a concrete improvement upon $\hat\mu_{\mathrm{IPW}}$ with both variance and bias reduction in the sense that it gains efficiency whenever the PS model is correctly specified (and maximally so if the OR model is also correctly specified), and remains consistent if the PS model is misspecified but the OR model is correctly specified. On the other hand, comparison between $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{OR}}$ is complicated by the usual bias-variance trade-off.
Different characteristics are associated with the OR and the PS-DR approaches and should be carefully weighed in applications.

3. OTHER COMMENTS

Control Variates and Regression Estimators

The name "regression estimator" is adopted from the literatures of sampling survey (e.g., Cochran, 1977, Chapter 7) and Monte Carlo integration (e.g., Hammersley and Handscomb, 1964), and should be distinguished from "regression estimation" described by KS (Section 2.3). Specifically, the idea is to exploit the fact that if the PS model is correct, then $\hat\eta$ asymptotically has mean $\mu_1$ (to be estimated) and $\hat\xi$ has mean 0 (known). That is, $\hat\xi$ serves as auxiliary variables (in the terminology of survey sampling) or control variates (in that of Monte Carlo integration). Variance reduction can be achieved by using $\tilde E(\hat\eta) - b\,\tilde E(\hat\xi)$, instead of $\hat\mu_{\mathrm{IPW}} = \tilde E(\hat\eta)$, with $b$ an estimated regression coefficient of $\hat\eta$ against $\hat\xi$.

The control variates for $\tilde\mu_{\mathrm{REG}}$ in Section 2 include $(\hat\pi^{-1}T - 1)\hat m_1$ and $(T - \hat\pi)[\hat\pi(1 - \hat\pi)]^{-1}\,\partial\hat\pi/\partial\gamma$, the second of which is the score function for the PS model and is necessary for asymptotic optimality in Proposition 4(ii). If the PS model is correct, then $\tilde\mu_{\mathrm{REG}}$ is always at least as efficient as $\hat\mu_{\mathrm{IPW}}$ in the raw version, that is, $\hat\mu_{\mathrm{AIPW}}(0)$, but not always as efficient as $\hat\mu_{\mathrm{IPW}}$ in the ratio version. However, the indefiniteness can be easily resolved. If the control variate $\hat\pi^{-1}T - 1$ is added, or $(1, \hat m_1)^{\top}$ substituted for $\hat m_1$, then $\tilde\mu_{\mathrm{REG}}$ always gains efficiency over both versions of $\hat\mu_{\mathrm{IPW}}$. Furthermore, if $(1, h, \hat m_1)^{\top}$ is substituted for $\hat m_1$, then $\tilde\mu_{\mathrm{REG}}$ always gains efficiency also over the estimator $\hat\mu_{\mathrm{AIPW}}(h)$.

Causal Inference

Causal inference involves estimation of both $\mu_1$ and $\mu_0$.
Similar estimators of $\mu_0$ can be separately defined by replacing $T$, $\hat\pi$ and $\hat m_1$ with $1 - T$, $1 - \hat\pi$ and $\hat m_0$, where $m_0 = E(Y \mid T = 0, X)$. The control variates $((1 - \hat\pi)^{-1}(1 - T) - 1)(1, \hat m_0)^{\top}$ for estimating $\mu_0$ differ from $(\hat\pi^{-1}T - 1)(1, \hat m_1)^{\top}$ for estimating $\mu_1$. As a consequence, even though $\tilde\mu_{1,\mathrm{REG}}$ or $\tilde\mu_{0,\mathrm{REG}}$ individually gains efficiency over $\hat\mu_{1,\mathrm{IPW}}$ or $\hat\mu_{0,\mathrm{IPW}}$, the difference $\tilde\mu_{1,\mathrm{REG}} - \tilde\mu_{0,\mathrm{REG}}$ does not necessarily gain efficiency over $\hat\mu_{1,\mathrm{IPW}} - \hat\mu_{0,\mathrm{IPW}}$. The problem can be overcome by using a combined set of control variates, say, $[\hat\pi^{-1}T - (1 - \hat\pi)^{-1}(1 - T)](\hat\pi,\ 1 - \hat\pi,\ \hat\pi\hat m_0,\ (1 - \hat\pi)\hat m_1)^{\top}$. Then $\tilde\mu_{1,\mathrm{REG}} - \tilde\mu_{0,\mathrm{REG}}$ maintains optimality in using control variates in the sense of Proposition 4(ii), in addition to local efficiency and double robustness. The mechanism of using a common set of control variates for estimating both $\mu_1$ and $\mu_0$ is automatic in the likelihood PS approach of Tan (2006).

PS Stratification

KS (Section 2.2) described the stratification estimator of Rosenbaum and Rubin (1983) as a way "to coarsen the estimated propensity score into a few categories and compute weighted averages of the mean response across categories." It is helpful to rewrite the estimator in their display (6) as

$$\hat\mu_{\mathrm{strat}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_iY_i}{\hat\pi_{\mathrm{strat}}(X_i)},$$

where

$$\hat\pi_{\mathrm{strat}}(X) = \frac{\sum_{i=1}^{n} T_i\,1\{\hat\pi(X_i) \in \hat S_j\}}{\sum_{i=1}^{n} 1\{\hat\pi(X_i) \in \hat S_j\}} \quad\text{if } \hat\pi(X) \in \hat S_j$$

(the $j$th estimated PS stratum), $j = 1, \ldots, s$. That is, $\hat\mu_{\mathrm{strat}}$ is exactly an IPW estimator based on the discretized $\hat\pi_{\mathrm{strat}}(X)$. Comparison between $\hat\mu_{\mathrm{strat}}$ and $\hat\mu_{\mathrm{IPW}}$ is subject to the usual bias-variance trade-off. On one hand, $\hat\mu_{\mathrm{strat}}$ often has smaller variance than $\hat\mu_{\mathrm{IPW}}$.
On the other hand, the asymptotic limit of $\hat\mu_{\mathrm{strat}}$ can be shown to be

$$\sum_{j=1}^{s} \frac{E[\pi(X)\,m_1(X) \mid \pi^*(X) \in S_j^*]}{E[\pi(X) \mid \pi^*(X) \in S_j^*]}\; P(\pi^*(X) \in S_j^*),$$

where $\pi^*(X)$ is the limit of $\hat\pi(X)$, which agrees with the true $\pi(X)$ if the PS model is correct, and $S_j^*$ is that of $\hat S_j$. The ratio inside the above sum is the within-stratum average of $m_1(X)$ weighted proportionally to $\pi(X)$. Therefore, $\hat\mu_{\mathrm{strat}}$ is inconsistent unless $\pi(X)$ or $m_1(X)$ is constant within each stratum (cf. KS's discussion about crude DR in Section 2.4). The asymptotic bias depends on the joint behavior of $m_1(X)$ and $\pi(X)$, and can be substantial if $m_1(X)$ varies where $\pi(X) \approx 0$ varies so that $m_1(X)$ are weighted differentially, say, by a factor of 10 at two $X$'s with $\pi(X) = 0.01$ and 0.1.

Simulations

KS designed a simulation setup with an OR and a PS model appearing to be "nearly correct." The response is generated as $Y = 210 + 27.4Z_1 + 13.7Z_2 + 13.7Z_3 + 13.7Z_4 + \epsilon$, and the propensity score is $\pi = \mathrm{expit}(-Z_1 + 0.5Z_2 - 0.25Z_3 - 0.1Z_4)$, where $\epsilon$ and $(Z_1, Z_2, Z_3, Z_4)$ are independent, standard normal. The covariates seen by the statistician are $X_1 = \exp(Z_1/2)$, $X_2 = Z_2/(1 + \exp(Z_1)) + 10$, $X_3 = (Z_1Z_3/25 + 0.6)^3$ and $X_4 = (Z_2 + Z_4 + 20)^2$. The OR model is the linear model of $Y$ against $X$, and the PS model is the logistic model of $T$ against $X$.

In the course of replicating their simulations, we accidentally discovered that the following models also appear to be "nearly correct." The covariates seen by the statistician are the same $X_1$, $X_2$, $X_3$, but $X_4 = (Z_3 + Z_4 + 20)^2$. The OR model is linear and the PS model is logistic as in the KS models. For one simulated dataset, Figures 1 and 2 present scatterplots and boxplots similar to Figures 2 and 3 in KS.
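The data-generating design just described, in both the KS version and the alternative version, can be sketched in Python as follows. This is a minimal sketch based only on the formulas stated above. The function name and the returned dictionary layout are ours, and drawing $T$ as Bernoulli with probability $\pi$ is the natural reading of the definition of the propensity score rather than something spelled out in the text.

```python
import numpy as np


def simulate_ks(n, rng, alternative=False):
    """Simulate one dataset from the KS design.

    With alternative=True, X4 = (Z3 + Z4 + 20)^2 replaces the KS choice
    X4 = (Z2 + Z4 + 20)^2; everything else is unchanged.
    """
    z = rng.standard_normal((n, 4))
    eps = rng.standard_normal(n)
    z1, z2, z3, z4 = z.T

    y = 210.0 + 27.4 * z1 + 13.7 * z2 + 13.7 * z3 + 13.7 * z4 + eps
    lin = -z1 + 0.5 * z2 - 0.25 * z3 - 0.1 * z4
    pi = 1.0 / (1.0 + np.exp(-lin))      # true propensity score, expit(lin)
    t = rng.binomial(1, pi)              # assumed: T ~ Bernoulli(pi)

    # Covariates seen by the statistician.
    x1 = np.exp(z1 / 2.0)
    x2 = z2 / (1.0 + np.exp(z1)) + 10.0
    x3 = (z1 * z3 / 25.0 + 0.6) ** 3
    x4 = ((z3 if alternative else z2) + z4 + 20.0) ** 2

    return {"Z": z, "X": np.column_stack([x1, x2, x3, x4]), "T": t, "Y": y, "pi": pi}


if __name__ == "__main__":
    data = simulate_ks(200, np.random.default_rng(2007), alternative=True)
    print(data["X"].shape, data["T"].mean(), data["Y"].mean())
```

In an analysis of $\mu_1$, only the outcomes with $T = 1$ would be treated as observed. The true value of $\mu_1$ in this design is 210.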
For the OR model, the regression coefficients are highly significant and $R^2 = 0.97$. The correlation between the fitted values of $Y$ under the correct and the misspecified OR models is 0.99, and that between the linear predictors under the correct and the misspecified PS models is 0.93.

Fig. 1. Scatterplots of response versus covariates (alternative models).

Fig. 2. Boxplots of covariates and propensity scores (alternative models).

Tables 1 and 2 summarize our simulations for KS models and for the alternative models described above. The raw version of $\hat\mu_{\mathrm{IPW}}$ is used. The estimators $\tilde\mu^{(m)}_{\mathrm{REG}}$ and $\hat\mu^{(m)}_{\mathrm{REG}}$ are defined as $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{REG}}$ except that the score function for the PS model is dropped from $\hat\xi$. For these four estimators, $(1, \hat m_1)^{\top}$ is substituted for $\hat m_1$.

KS found that none of the DR estimators they tried improved upon the performance of the OR estimator; see also Table 1. This situation is consistent with the discussion in Section 2. The theory of DR estimation does not claim that a DR estimator is guaranteed to perform better than the OR estimator when the OR and the PS models are both misspecified, whether mildly or grossly. Therefore, KS's simulations serve as an example to remind us of this indefinite comparison.

On the other hand, neither is the OR estimator guaranteed to outperform DR estimators when the OR model is misspecified or even "nearly correct." As seen from Table 2, $\hat\mu_{\mathrm{OR}}$ yields greater RMSE values than the DR estimators, $\hat\mu_{\mathrm{WLS}}$, $\tilde\mu_{\mathrm{REG}}$ and $\tilde\mu^{(m)}_{\mathrm{REG}}$ when the alternative, misspecified OR and PS models are both used. For $n = 200$, the bias of $\hat\mu_{\mathrm{OR}}$ is 2.5 and that of $\tilde\mu_{\mathrm{REG}}$ is 0.44, which differ substantially from the corresponding biases −0.56 and −1.8 in Table 1 when KS models are used.

The consequences of model misspecification are difficult to study, because the degree and direction of model misspecification are subtle, even elusive. For the dataset examined earlier, the absolute differences between the (highly correlated) fitted values of $Y$ under the correct and the alternative, misspecified OR models present a more serious picture of model misspecification. In fact, the quartiles of these absolute differences are 2.0, 3.2 and 5.1, and the maximum is 20.

For both Tables 1 and 2, the DR estimators $\tilde\mu_{\mathrm{REG}}$ and $\tilde\mu^{(m)}_{\mathrm{REG}}$ perform overall better than the other DR estimators $\hat\mu_{\mathrm{AIPW,fix}}$ and $\hat\mu_{\mathrm{WLS}}$. Compared with $\hat\mu_{\mathrm{WLS}}$, $\tilde\mu_{\mathrm{REG}}$ has MSE reduced by 15–20% (Table 1) or by 20–25% (Table 2) when the PS model is correct but the OR model is misspecified, which agrees with the optimality property of $\tilde\mu_{\mathrm{REG}}$ in Proposition 4(ii). Even the simplified estimator $\tilde\mu^{(m)}_{\mathrm{REG}}$ gains similar efficiency, although the gain is not guaranteed in theory. The non-DR estimators $\hat\mu_{\mathrm{REG}}$ and $\hat\mu^{(m)}_{\mathrm{REG}}$ sometimes have sizeable biases even when the PS model is correct.

Table 1. Numerical comparison of estimators of $\mu_1$ (KS models)

n = 200

| Method | π-model correct: Bias | % Bias | RMSE | MAE | π-model incorrect: Bias | % Bias | RMSE | MAE |
|---|---|---|---|---|---|---|---|---|
| IPW | 0.080 | 0.64 | 12.6 | 6.11 | 16 | 32 | 52.7 | 8.99 |
| strat | −1.1 | −37 | 3.20 | 2.04 | −2.9 | −93 | 4.28 | 3.11 |

| π-model | Method | y-model correct: Bias | % Bias | RMSE | MAE | y-model incorrect: Bias | % Bias | RMSE | MAE |
|---|---|---|---|---|---|---|---|---|---|
| | OLS | −0.025 | −0.99 | 2.47 | 1.68 | −0.56 | 17 | 3.33 | 2.19 |
| correct | AIPW fix | −0.024 | −0.96 | 2.47 | 1.67 | 0.24 | 6.9 | 3.44 | 2.06 |
| correct | WLS | −0.025 | −1.0 | 2.47 | 1.68 | 0.39 | 13 | 2.99 | 1.89 |
| correct | REG tilde | −0.025 | −1.0 | 2.47 | 1.69 | 0.14 | 5.2 | 2.73 | 1.76 |
| correct | REG hat | −0.52 | −20 | 2.63 | 1.73 | −0.52 | −19 | 2.81 | 1.78 |
| correct | REG (m) tilde | −0.024 | −0.98 | 2.47 | 1.68 | 0.24 | 8.9 | 2.74 | 1.79 |
| correct | REG (m) hat | −0.21 | −8.4 | 2.48 | 1.68 | −0.086 | −3.2 | 2.65 | 1.74 |
| incorrect | AIPW fix | −0.026 | −1.0 | 2.48 | 1.71 | −5.1 | −44 | 12.6 | 3.75 |
| incorrect | WLS | −0.026 | −1.0 | 2.47 | 1.70 | −2.2 | −69 | 3.91 | 2.77 |
| incorrect | REG tilde | −0.027 | −1.1 | 2.47 | 1.71 | −1.8 | −62 | 3.47 | 2.41 |
| incorrect | REG hat | −0.45 | −18 | 2.60 | 1.71 | −2.2 | −76 | 3.68 | 2.53 |
| incorrect | REG (m) tilde | −0.026 | −1.1 | 2.47 | 1.69 | −2.0 | −68 | 3.56 | 2.47 |
| incorrect | REG (m) hat | −0.13 | −5.3 | 2.48 | 1.68 | −2.2 | −77 | 3.68 | 2.59 |

n = 1000

| Method | π-model correct: Bias | % Bias | RMSE | MAE | π-model incorrect: Bias | % Bias | RMSE | MAE |
|---|---|---|---|---|---|---|---|---|
| IPW | 0.098 | 2.0 | 4.98 | 3.04 | 68 | 9.2 | 746 | 14.7 |
| strat | −1.1 | −86 | 1.71 | 1.24 | −2.9 | −214 | 3.22 | 2.94 |

| π-model | Method | y-model correct: Bias | % Bias | RMSE | MAE | y-model incorrect: Bias | % Bias | RMSE | MAE |
|---|---|---|---|---|---|---|---|---|---|
| | OLS | −0.047 | −4.0 | 1.15 | 0.770 | −0.85 | −56 | 1.75 | 1.15 |
| correct | AIPW fix | −0.046 | −4.0 | 1.15 | 0.766 | 0.043 | 2.6 | 1.63 | 1.11 |
| correct | WLS | −0.046 | −4.0 | 1.15 | 0.769 | 0.12 | 8.7 | 1.37 | 0.943 |
| correct | REG tilde | −0.046 | −4.0 | 1.15 | 0.773 | 0.048 | 3.9 | 1.23 | 0.809 |
| correct | REG hat | −0.13 | −11 | 1.16 | 0.796 | −0.077 | −6.3 | 1.23 | 0.812 |
| correct | REG (m) tilde | −0.046 | −4.0 | 1.15 | 0.770 | 0.092 | 7.3 | 1.26 | 0.870 |
| correct | REG (m) hat | −0.083 | −7.2 | 1.15 | 0.768 | 0.024 | 1.9 | 1.24 | 0.857 |
| incorrect | AIPW fix | −0.10 | −6.5 | 1.61 | 0.769 | −26 | −8.5 | 308 | 5.56 |
| incorrect | WLS | −0.048 | −4.1 | 1.15 | 0.764 | −3.0 | −203 | 3.38 | 3.05 |
| incorrect | REG tilde | −0.046 | −4.0 | 1.15 | 0.764 | −1.7 | −120 | 2.21 | 1.73 |
| incorrect | REG hat | −0.045 | −3.9 | 1.16 | 0.786 | −1.7 | −122 | 2.24 | 1.75 |
| incorrect | REG (m) tilde | −0.046 | −4.0 | 1.15 | 0.763 | −2.1 | −152 | 2.48 | 2.04 |
| incorrect | REG (m) hat | −0.058 | −5.0 | 1.16 | 0.771 | −2.2 | −158 | 2.57 | 2.15 |

Table 2. Numerical comparison of estimators of $\mu_1$ (alternative models)

n = 200

| Method | π-model correct: Bias | % Bias | RMSE | MAE | π-model incorrect: Bias | % Bias | RMSE | MAE |
|---|---|---|---|---|---|---|---|---|
| IPW | 0.080 | 0.64 | 12.6 | 6.11 | 18 | 34 | 55.7 | 9.61 |
| strat | −1.1 | −37 | 3.20 | 2.04 | −1.1 | −36 | 3.22 | 2.21 |

| π-model | Method | y-model correct: Bias | % Bias | RMSE | MAE | y-model incorrect: Bias | % Bias | RMSE | MAE |
|---|---|---|---|---|---|---|---|---|---|
| | OLS | −0.025 | −0.99 | 2.47 | 1.68 | 2.5 | 80 | 4.04 | 2.73 |
| correct | AIPW fix | −0.024 | −0.96 | 2.47 | 1.67 | 0.53 | 14 | 3.82 | 2.32 |
| correct | WLS | −0.025 | −1.0 | 2.47 | 1.68 | 0.83 | 28 | 3.09 | 1.96 |
| correct | REG tilde | −0.025 | −1.0 | 2.47 | 1.69 | 0.33 | 13 | 2.63 | 1.71 |
| correct | REG hat | −0.52 | −20 | 2.63 | 1.73 | −0.34 | −13 | 2.70 | 1.74 |
| correct | REG (m) tilde | −0.024 | −0.98 | 2.47 | 1.68 | 0.45 | 17 | 2.74 | 1.78 |
| correct | REG (m) hat | −0.21 | −8.4 | 2.48 | 1.68 | 0.09 | 3.6 | 2.63 | 1.74 |
| incorrect | AIPW fix | −0.024 | −0.97 | 2.48 | 1.71 | −2.5 | −21 | 12.2 | 2.72 |
| incorrect | WLS | −0.026 | −0.10 | 2.47 | 1.70 | 0.33 | 11 | 3.11 | 2.05 |
| incorrect | REG tilde | −0.025 | −0.10 | 2.47 | 1.71 | 0.44 | 16 | 2.74 | 1.80 |
| incorrect | REG hat | −0.42 | −17 | 2.56 | 1.71 | −0.026 | −0.95 | 2.74 | 1.78 |
| incorrect | REG (m) tilde | −0.025 | −1.0 | 2.47 | 1.69 | 0.31 | 11 | 2.83 | 1.80 |
| incorrect | REG (m) hat | −0.22 | −8.9 | 2.48 | 1.71 | 0.035 | 1.3 | 2.76 | 1.77 |

n = 1000

| Method | π-model correct: Bias | % Bias | RMSE | MAE | π-model incorrect: Bias | % Bias | RMSE | MAE |
|---|---|---|---|---|---|---|---|---|
| IPW | 0.098 | 2.0 | 4.98 | 3.04 | 80 | 8.5 | 951 | 16.8 |
| strat | −1.1 | −86 | 1.71 | 1.24 | −0.96 | −72 | 1.65 | 1.17 |

| π-model | Method | y-model correct: Bias | % Bias | RMSE | MAE | y-model incorrect: Bias | % Bias | RMSE | MAE |
|---|---|---|---|---|---|---|---|---|---|
| | OLS | −0.047 | −4.0 | 1.15 | 0.770 | 2.2 | 152 | 2.67 | 2.21 |
| correct | AIPW fix | −0.046 | −4.0 | 1.15 | 0.766 | 0.061 | 3.3 | 1.87 | 1.17 |
| correct | WLS | −0.046 | −4.0 | 1.15 | 0.769 | 0.22 | 16 | 1.39 | 0.957 |
| correct | REG tilde | −0.046 | −4.0 | 1.15 | 0.773 | 0.12 | 10 | 1.21 | 0.818 |
| correct | REG hat | −0.13 | −11 | 1.16 | 0.796 | −0.012 | −0.97 | 1.19 | 0.801 |
| correct | REG (m) tilde | −0.046 | −4.0 | 1.15 | 0.770 | 0.14 | 12 | 1.25 | 0.849 |
| correct | REG (m) hat | −0.083 | −7.2 | 1.15 | 0.768 | 0.069 | 5.7 | 1.22 | 0.826 |
| incorrect | AIPW fix | −0.12 | −6.3 | 1.83 | 0.780 | −31 | −6.9 | 441 | 2.92 |
| incorrect | WLS | −0.048 | −4.1 | 1.15 | 0.768 | −0.55 | −38 | 1.55 | 1.12 |
| incorrect | REG tilde | −0.044 | −3.9 | 1.15 | 0.765 | 0.61 | 46 | 1.46 | 0.946 |
| incorrect | REG hat | −0.099 | −8.5 | 1.16 | 0.787 | 0.57 | 43 | 1.45 | 0.910 |
| incorrect | REG (m) tilde | −0.045 | −3.9 | 1.15 | 0.757 | 0.22 | 17 | 1.29 | 0.847 |
| incorrect | REG (m) hat | −0.16 | −14 | 1.17 | 0.764 | 0.13 | 10 | 1.28 | 0.836 |

Summary

One of the main points of KS is that two (moderately) misspecified models are not necessarily better than one. This point is valuable. But at the same time, neither are two misspecified models necessarily worse than one. Practitioners may choose to implement either of the OR and the PS-DR approaches, each with its own characteristics. It is helpful for statisticians to promote a common, rigorous understanding of each approach and to investigate new ways for improvement. We welcome KS's article and the discussion as a step forward in this direction.

ACKNOWLEDGMENTS

We thank Xiao-Li Meng and Dylan Small for helpful comments.

REFERENCES

Cochran, W. G. (1977). Sampling Techniques, 3rd ed. Wiley, New York. MR0474575

Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66 315–331. MR1612242

Hammersley, J. M. and Handscomb, D. C. (1964). Monte Carlo Methods. Methuen, London. MR0223065

Robins, J. M. and Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. J. Amer. Statist. Assoc. 90 122–129. MR1325119

Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89 846–866. MR1294730

Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J.
A mer. Stat ist. Asso c. 90 106–1 21. MR1325118 Ro senbaum, P. R. and R ubin, D. B. (1983). The central ro le of the prop ensity score in observ ational stud ies for causal effects. Biometrika 70 41–55. MR0742974 T an, Z. (2006). A distribut ional approac h for causal inference using propensity scores. J. A mer. Stat ist. Asso c. 101 1619– 1637. MR2279484