Vaccines, Contagion, and Social Networks
Consider the causal effect that one individual's treatment may have on another individual's outcome when the outcome is contagious, with specific application to the effect of vaccination on an infectious disease outcome. The effect of one individual'…
Authors: Elizabeth L. Ogburn, Tyler J. VanderWeele
Abstract

Consider the causal effect that one individual’s treatment may have on another individual’s outcome when the outcome is contagious, with specific application to the effect of vaccination on an infectious disease outcome. The effect of one individual’s vaccination on another’s outcome can be decomposed into two different causal effects, called the “infectiousness” and “contagion” effects. We present identifying assumptions and estimation or testing procedures for infectiousness and contagion effects in two different settings: (1) using data sampled from independent groups of observations, and (2) using data collected from a single interdependent social network. The methods that we propose for social network data require fitting generalized linear models (GLMs). GLMs and other statistical models that require independence across subjects have been used widely to estimate causal effects in social network data, but, because the subjects in networks are presumably not independent, the use of such models is generally invalid, resulting in inference that is expected to be anticonservative. We introduce a way to ensure that GLM residuals are uncorrelated across subjects despite the fact that outcomes are non-independent. This simultaneously demonstrates the possibility of using GLMs and related statistical models for network data and highlights their limitations.

1 Introduction

We are concerned here with the effect that one individual’s treatment may have on another individual’s outcome, when the outcome is contagious. In the infectious disease literature, this is often called an indirect effect of treatment (Halloran and Struchiner, 1991), while the effect of an individual’s treatment on his own outcome is a direct effect.
Indirect effects of infectious disease interventions are of significant importance for understanding infectious disease dynamics and for designing public health interventions. For example, the goal of many vaccination programs is to achieve herd immunity, whereby a large enough subset of a population is vaccinated that even those individuals who remain unvaccinated are protected against infection. This is one type of indirect effect of a vaccination program; it has been extensively studied in the infectious disease literature (Anderson et al., 1985; Fine, 1993; John and Samuel, 2000; O’Brien and Dagan, 2003). Recently, interest has turned towards the identification and estimation of average individual-level indirect effects (Halloran and Struchiner, 1991, 1995; Halloran and Hudgens, 2012; VanderWeele and Tchetgen Tchetgen, 2011a; VanderWeele et al., 2012b; VanderWeele and Tchetgen Tchetgen, 2011b), such as the effect on a single member of a community of two different vaccination programs implemented on the rest of the community (Halloran and Struchiner, 1995).

∗ Elizabeth Ogburn (email: eogburn@jhsph.edu) is Assistant Professor, Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205; Tyler VanderWeele is Professor, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA 02115. Dr. Ogburn’s research was supported by grants U54 GM088558 and ES017678 from the National Institutes of Health. Dr. VanderWeele’s research was supported by grant ES017678 from the National Institutes of Health.

VanderWeele et al. (2012b) demonstrated that the individual-level indirect effect of vaccination in communities of size two can be decomposed into two different effects, called the “infectiousness” and “contagion” effects.
These two effects represent distinct causal pathways by which one person’s vaccination may affect another’s disease status. The contagion effect is the indirect effect that vaccinating one individual may have on another by preventing the vaccinated individual from getting the disease and thereby from passing it on. The infectiousness effect is the indirect effect that vaccination might have if, instead of preventing the vaccinated individual from getting the disease, it renders the disease less infectious, thereby reducing the probability that a vaccinated individual who becomes infected transmits the disease.

VanderWeele et al. (2012b) only considered estimation of the infectiousness and contagion effects in a sample consisting of independent households of size two with one member of each household assumed to be homebound. The assumption that one individual is homebound and the assumption of independent households are restrictive, the latter because it requires that the households be sampled from distinct communities and geographic areas. Ogburn and VanderWeele (2013) considered the setting in which households are independent but both individuals may be exposed outside the household. Here, we relax the requirement of independent households of size two and provide extensions to independent groups of arbitrary size and to social networks.

Increasingly, data are available on the spread of contagious outcomes through social networks. This setting is considerably more complex than that considered in VanderWeele et al. (2012b), because the observed outcomes (e.g. disease status) are not independent of one another. There is a growing literature on the possibility of testing for the presence of different causal mechanisms using observational data from social networks and a consensus that more rigorous methods are needed.
An emerging body of work reports results from generalized linear models (GLMs) and, for longitudinal data, generalized estimating equations (GEEs) as estimates of peer effects, or the causal effect that one individual’s outcome may have on his or her social contacts’ outcomes (Ali and Dwyer, 2009; Cacioppo et al., 2009; Christakis and Fowler, 2007, 2008, 2013; Fowler and Christakis, 2008; Lazer et al., 2010; Rosenquist et al., 2010). This work has come under criticism that can largely be summarized into two overarching themes. First, much of the criticism focuses on the ability to control for confounding when estimating peer effects, and specifically on the identifying assumptions that are required in order to tell the difference between the well known problem of homophily (the phenomenon by which individuals with similar traits are more likely to form social ties with one another) and peer influence (Cohen-Cole and Fletcher, 2008; Lyons, 2011; Noel and Nyhan, 2011; Shalizi and Thomas, 2011; VanderWeele, 2011). Homophily will not be an issue in many infectious disease settings, as many such illnesses, for example the seasonal flu, are unlikely to change the nature of social ties. Adequate control for confounding is still crucial, but we assume throughout that all potential confounders of the causal effects of interest are observed. This assumption should be assessed in any application of these methods and it may not hold in many real data settings; however, we do not focus on this assumption in the remainder of this paper.

The second class of criticisms addresses the use of statistical models for independent observations in this dependent data setting. Lyons (2011) and VanderWeele et al.
(2012a) demonstrated the importance of ensuring that models are coherent when an observation can be both an outcome and a predictor (of social contacts’ outcomes); this is easily accomplished by using the observations at one time point as predictors and the observations at a subsequent time point as outcomes, a solution that was implemented in many of the applications of GLMs and GEEs to social network data referenced above. More challenging is the fact that, when an analysis assumes independence but observations are in fact positively correlated, as we would expect them to be for contagious outcomes in a social network, the resulting standard errors and statistical inference will generally be anticonservative. In some cases, the assumption of independent outcomes may hold under the null hypothesis (VanderWeele et al., 2012a), but it is unknown whether tests that rely on this fact have any power to detect the presence of the causal effects of interest (Shalizi, 2012).

Our contribution to methodology for social network analysis is to adapt GLMs to ensure that the models can be correctly specified, with uncorrelated residuals, even when the outcome is contagious. We demonstrate the possibility of testing for the presence of contagion and infectiousness effects using social network data and GLMs. We discuss the paradigmatic example of the effect of a vaccination on an infectious disease outcome, but effects like contagion and infectiousness are of interest in other settings as well. Our general approach to correctly specifying GLMs for a contagious outcome using network data could potentially be applied to any estimand for which GLMs are appropriate under independence. The tests that we propose have important limitations, most notably low power to detect effects unless networks are large and/or sparse.
However, this work represents an important proof of concept in the ongoing endeavor to develop methods for valid inference using data collected from a single network. Furthermore, it clarifies the issues of model misspecification and invalid standard errors raised by previous proposals for using GLMs to assess peer effects using network data.

2 Social networks and contagion

Formally, a social network is a collection of individuals and the ties between them. The presence of a tie between two individuals indicates that the individuals share some kind of relationship; what types of relationships are encoded by network ties depends on the context. For example, we might define a network tie to include familial relatedness, friendship, and shared place of work. Some types of relationships are mutual, for example familial relatedness and shared place of work. Others, like friendship, may go in only one direction: Tom may consider Sue to be his friend, while Sue does not consider Tom to be her friend. We will assume that all ties in our network are mutual or undirected, but the principles of our method extend to directed ties. A node whose characteristics we wish to explain is called an ego; nodes that share ties with the ego are its alters or contacts. If an ego’s outcome may be affected by his contacts’ outcomes, then we say that the outcome exhibits induction or contagion.

Social networks are crucial to understanding many features of infectious disease dynamics, and, increasingly, infectious disease researchers draw on social network data to refine their understanding of transmission patterns and treatment effects.
For example, many mathematical models of infectious disease now incorporate social network structure, whereas they previously generally assumed uniform mixing among members of a community (Eubank et al., 2004; Klovdahl, 1985; Klovdahl et al., 1994; Keeling and Eames, 2005), and researchers collect data on sexual contact networks, since properties of these networks can inform strategies for controlling sexually transmitted diseases (Latora et al., 2006; Eames and Keeling, 2002, 2004).

It is desirable for a number of reasons to study infectiousness and contagion in the context of social networks rather than in independent communities. First, social network data may be easier to collect or to access than data on independent communities, as the latter setting requires sampling from a large number of different locations or contexts that are separated by time or space. Second, assessing whether traits can be transmitted from one individual to another through network ties is one of the central questions in the study of social networks; assessing infectiousness and contagion contributes further insight into this problem. Finally, social network data more realistically capture the true interdependencies of the individuals whom we can hope to treat with any public health intervention. Vaccine programs do not in general target distant, independent pairs of individuals; they target villages, cities, or communities in which individuals are interconnected and their outcomes correlated. Therefore, assessing the presence of vaccine effects in social network data may be more informative for real-world applications. The methods we present here represent a first step towards being able to estimate and perform inference about such effects using social network data.

Interventions to prevent infectious diseases generally operate in two ways.
Some reduce the susceptibility of treated individuals to the disease, thereby preventing them from becoming infected. Examples of such interventions are vaccines for tetanus, hepatitis A and B, rabies, and measles (Keller and Stiehm, 2000). These vaccines have indirect effects that operate via contagion effects. Other interventions may reduce the likelihood that an infected individual passes on his infection to others. The malaria transmission-blocking vaccine is designed to prevent mosquitos from acquiring, and thereby from transmitting, malaria parasites upon biting infected individuals (Halloran and Struchiner, 1992). This vaccine has no protective effect for the vaccinated individual, but it renders vaccinated individuals less likely to transmit the disease. Therefore any indirect effect of the malaria transmission-blocking vaccine is due entirely to an infectiousness effect. Many interventions have indirect effects that operate via both contagion and infectiousness effects.

Existing methods for assessing causal effects using network data are limited. Some recent proposals give methods for assessing indirect effects when treatment can be randomized (Airoldi et al., 2013; Aronow and Samii, 2012; Bowers et al., 2013; Rosenbaum, 2007), but these methods are of limited use in observational settings or for teasing apart specific types of indirect effects like the infectiousness and contagion effects. Much of the extant literature relies on GLMs and GEEs, despite the fact that the key assumption of independent outcomes across subjects is unlikely to hold in social network settings (Lyons, 2011). In this paper, we introduce a way to ensure that GLM residuals are uncorrelated across subjects despite the fact that outcomes are non-independent; this facilitates the use of GLMs to assess infectiousness and contagion effects in social network contexts.
We demonstrate through simulations that our methods do have some power to detect the presence of contagion and infectiousness effects. However, ensuring that residuals are uncorrelated requires several adaptations to naive GLMs, and these adaptations can result in low power. The applications that we discuss in this paper do not require the use of GEEs to account for within-subject dependence over time, but the general principles that we use to adapt GLMs to the network setting apply to GEEs as well.

3 Infectiousness and contagion in independent groups of size 2

3.1 Notation and assumptions

Consider $K$ households consisting of two individuals each and separated by space or time such that an infectious disease cannot be transmitted between individuals in different households. Borrowing terminology from the social network literature, we will refer to one individual as the alter, denoted $a$, and the other as the ego, denoted $e$. For now we assume that in each household the ego is unvaccinated, and that all vaccination occurs before the start of follow-up.

Contagion and infectiousness effects are analogous to causal mediation effects of the alter’s vaccination on the ego’s outcome, mediated by the alter’s disease status (VanderWeele et al., 2012b). We formally define these effects in the next section after first introducing key notation and identifying assumptions.

For individual $i$ in household $k$, $i = a, e$, let $Y^{t}_{ik}$ be the outcome at time $t$ and $C_{ik}$ be a vector of covariates. Let $V_{ak}$ be an indicator of vaccination for the alter in household $k$. Below we omit the subscript $k$ when context allows. Define $Y^{t}_{ik}(v)$ to be the counterfactual outcome we would have observed for individual $i$ in household $k$ at time $t$ if, possibly contrary to fact, the alter had received treatment $v$. Let $M_k$ be a variable that lies on a causal pathway from $V_{ak}$ to $Y^{t}_{ek}$.
Let $Y^{t}_{ek}(v, m)$ be the counterfactual outcome for the ego at time $t$ that we would have observed if $V_{ak}$ had been set to $v$ and $M_k$ to $m$. Throughout we make the consistency assumptions that $M_k(v) = M_k$ when $V_{ak} = v$, that $Y_{ek}(v, m) = Y_{ek}$ when $V_{ak} = v$ and $M_k = m$, and that $Y_{ek}(v, M_k(v)) = Y_{ek}(v)$. Let $Y^{t}_{ek}(v, M_k(v'))$ be the counterfactual disease status for the ego in household $k$ that we would have observed at time $t$ if $V_{ak}$ had been set to $v$ and $M_k$ to its counterfactual value under $V_{ak} = v'$. To ensure that this counterfactual is well-defined, we assume that it is hypothetically possible to intervene on the mediator without intervening on $V_{ak}$. Let $C_k = (C_{ak}, C_{ek})$.

In order to identify functionals of nested counterfactuals like $Y^{t}_{e}(v, M(v'))$ we require the following four assumptions (Pearl, 2001):

$$Y^{t}_{e}(v, m) \perp V_a \mid C, \tag{1}$$

$$Y^{t}_{e}(v, m) \perp M \mid V_a, C, \tag{2}$$

$$M(v) \perp V_a \mid C, \tag{3}$$

and

$$Y^{t}_{e}(v, m) \perp M(v') \mid C, \tag{4}$$

where $A \perp B \mid C$ denotes that $A$ is independent of $B$ conditional on $C$. Assumptions (1), (2), and (3) correspond to the absence of unmeasured confounders for the effects of the exposure on the outcome ($V_a$ on $Y^{t}_{e}$), of the mediator on the outcome ($M$ on $Y^{t}_{e}$), and of the exposure on the mediator ($V_a$ on $M$), respectively. Assumption (4) requires that no confounder of the effect of $M$ on $Y^{t}_{e}$ is affected by $V_a$. Discussion of these assumptions in the context of mediation analysis can be found in Pearl (2001). Discussion and extension of these assumptions to settings with interference or spillover effects can be found in Ogburn and VanderWeele (2013), including discussion of how to determine which covariates must be included in $C$.

3.2 Previous methodology for decomposing the indirect effect into infectiousness and contagion effects

3.2.1 Identification

VanderWeele et al.
(2012b) described the decomposition of indirect effects into contagion and infectiousness effects in communities of size two. They assumed that the outcome can only occur once for each individual during the follow-up period. This is a reasonable assumption for many infectious disease outcomes, for example for the common flu with a follow-up period consisting of a single flu season. They further assumed that in each pair the ego cannot be exposed to the disease except by the alter, as might be the case if the ego were homebound. Let $t_f$ be the time of the end of follow-up and $Y^{t_f}_{ek}$ be an indicator of whether the ego in household $k$ has had the disease by the end of follow-up. VanderWeele et al. (2012b) defined the population average indirect effect of vaccination on the ego as $E[Y^{t_f}_{e}(1)] - E[Y^{t_f}_{e}(0)]$, or the expected difference in the counterfactual disease status of the ego at the end of follow-up when the alter is vaccinated compared to when the alter is not vaccinated.

When the ego is homebound, the indicator $Y^{t_f}_{ek}$ of whether the ego in household $k$ has had the disease by the end of follow-up is, equivalently, an indicator of whether the ego was infected by the alter in household $k$. In order to generalize the discussion of vaccine effects to settings in which the ego can be infected from outside the home, the outcome $Y^{t_f}_{ek}$ should be defined more precisely as the indicator of whether the ego was sick after the alter. Specifically, let

$$Y^{t_f}_{ek} = I(\text{alter was sick at time } T < t_f \text{ and ego was sick at time } S,\ T < S \le t_f).$$

The contagion effect is the protective effect that treating one individual has on another’s disease status by preventing the treated individual from getting the disease and thereby from transmitting it. This is akin to the effect of one individual’s treatment on another’s disease status as mediated by the first individual’s disease status.

Let $T_k$ be the time of the first case of the disease in household $k$. For the purposes of the analysis below, we define a disease case to begin when an individual becomes infectious. If infectiousness does not coincide with the appearance of disease symptoms then we may not observe the timing of disease cases directly, but we could infer the time based on when symptoms appear and on known disease dynamics. For example, an individual with the flu will generally be infectious one day before he is symptomatic (Earn et al., 2002). Therefore, if flu is the disease under study we would classify an individual as having the disease beginning one day before he reported having flu symptoms. We assume throughout that there are no asymptomatic carriers of the disease. If neither individual in household $k$ is ever sick then we define $T_k$ to be the end of follow-up. Now $Y^{T_k}_{ak}$ is an indicator of whether the alter is sick at time $T_k$, i.e. an indicator of whether the alter is the first individual in the group to get sick; if neither individual gets sick then it will be 0. Let $T_k(v)$ be the time at which the first infection in household $k$ would have occurred if the alter had, possibly contrary to fact, had vaccine status $v$. Let $Y^{T_k(v)}_{ak}(v)$ be the counterfactual disease status of the alter at time $T_k(v)$ had he had vaccine status $v$. Let $Y^{t_f}_{ek} = I(\text{individual } e_k \text{ became infectious after time } T_k \text{ and on or before time } t_f)$. The contagion effect is given by a contrast in counterfactuals of the form $Y^{t_f}_{e}(v, Y^{T(v')}_{a}(v'))$ where, unlike in the mediation framework we described in Section 3.1, the variable $Y^{T(v')}_{a}$ that plays the role of mediator may be a different random variable in the two terms in the contrast.
Specifically, the population average contagion effect is $E[Y^{t_f}_{e}(0, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(0)}_{a}(0))]$, and $Y^{T(0)}_{a}$ and $Y^{T(1)}_{a}$ will be different random variables whenever $T(0) \neq T(1)$. This contrast is the difference in expected counterfactual outcomes for the ego when the vaccine status of the alter is held constant at 0 but his infection status is set to that under vaccination in the first term and to that under no vaccination in the second term of the contrast. It captures the effect that vaccination might have had on the disease status of the ego by preventing the alter from contracting the disease. The nested counterfactuals are well-defined because we can imagine intervening on $Y^{T_k}_{ak}$ without intervening on $V_{ak}$, for example by administering immune boosters to prevent the alter from being infected or by exposing the alter to a high dose of flu virus in a laboratory setting to cause infection.

The population average infectiousness effect is $E[Y^{t_f}_{e}(1, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(1)}_{a}(1))]$. This is akin to the effect of one individual’s treatment on another’s disease status, not mediated through the first individual’s disease status. This effect operates if treatment renders cases of disease among treated individuals less likely to be transmitted. Suppose that the alter in group $k$ would get the flu first if vaccinated. That is, $Y^{T_k(1)}_{ak}(1) = 1$. Then the infectiousness effect is the difference in counterfactual outcomes for the ego comparing the scenario in which the alter is vaccinated and infected first with the scenario in which the alter is unvaccinated and infected first. If the alter in group $k$ would not get the flu first under vaccination, then the infectiousness effect for group $k$ is null.
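To make the nested counterfactuals concrete, the following minimal Python sketch evaluates the contagion and infectiousness contrasts on a hypothetical population of four households. It assumes a simplified setting in which timing is ignored, so that the alter’s infection status under vaccine status $v$ is a single binary potential outcome and the ego cannot be infected unless the alter is. The household values and all names (`households`, `ego_outcome`, `mean_nested`) are illustrative assumptions, not data or code from any study.

```python
from fractions import Fraction

# Hypothetical potential outcomes for four households (illustrative only).
# ya[v]      : alter's infection status if the alter has vaccine status v
# ye[(v, 1)] : ego's outcome if the alter has vaccine status v and is infected
households = [
    {"ya": {0: 1, 1: 0}, "ye": {(0, 1): 1, (1, 1): 1}},  # vaccine protects the alter
    {"ya": {0: 1, 1: 1}, "ye": {(0, 1): 1, (1, 1): 0}},  # vaccine blocks transmission only
    {"ya": {0: 1, 1: 1}, "ye": {(0, 1): 1, (1, 1): 1}},  # vaccine has no effect
    {"ya": {0: 0, 1: 0}, "ye": {(0, 1): 1, (1, 1): 1}},  # alter is never infected
]


def ego_outcome(household, v, m):
    """Ego's potential outcome Y_e(v, m); it is 0 whenever the alter is uninfected."""
    if m == 0:
        return 0
    return household["ye"][(v, m)]


def mean_nested(v, v_prime):
    """Population mean of the nested counterfactual Y_e(v, Y_a(v_prime))."""
    total = sum(ego_outcome(h, v, h["ya"][v_prime]) for h in households)
    return Fraction(total, len(households))


# Contagion: hold the alter's vaccine status at 0, change the infection status.
contagion = mean_nested(0, 1) - mean_nested(0, 0)
# Infectiousness: hold the infection status at Y_a(1), change the vaccine status.
infectiousness = mean_nested(1, 1) - mean_nested(0, 1)
# Total indirect effect of the alter's vaccination on the ego.
indirect = mean_nested(1, 1) - mean_nested(0, 0)

print(contagion, infectiousness, indirect)
```

In this toy population the first household contributes only to the contagion contrast and the second only to the infectiousness contrast, and the two contrasts add up to the total indirect effect.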
By the consistency assumption we made in Section 3.1 above, $E[Y^{t_f}_{e}(1, Y^{T(1)}_{a}(1))] = E[Y^{t_f}_{e}(1)]$ and $E[Y^{t_f}_{e}(0, Y^{T(0)}_{a}(0))] = E[Y^{t_f}_{e}(0)]$. The indirect effect of the vaccination of the alter on the ego decomposes into the sum of the contagion and infectiousness effects as follows:

$$\begin{aligned}
E[Y^{t_f}_{e}(1)] - E[Y^{t_f}_{e}(0)]
&= E[Y^{t_f}_{e}(1, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(0)}_{a}(0))] \\
&= \left\{ E[Y^{t_f}_{e}(1, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(1)}_{a}(1))] \right\} \\
&\quad + \left\{ E[Y^{t_f}_{e}(0, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(0)}_{a}(0))] \right\}.
\end{aligned}$$

The assumptions made by VanderWeele et al. (2012b) allow for the identification of the infectiousness and contagion effects even if disease status is only observed at the end of follow-up. Because the ego cannot be infected except by the alter, $Y^{T_k}_{ak} = 0$ if and only if neither individual is observed to get sick and $Y^{T_k}_{ak} = 1$ if and only if $Y^{t_f}_{ak} = 1$. Therefore $Y^{t_f}_{ak}$ can be substituted for $Y^{T_k}_{ak}$ in the expressions above.

Ogburn and VanderWeele (2013) gave identifying assumptions for the infectiousness and contagion effects in groups of size two when the time of infections is observed. They did not assume that only one member of each pair is exposed from outside of the group; instead they assumed that the probability of the ego contracting the disease within a fixed follow-up interval if exposed at time $t$ is constant in $t$. This ensures that the time of the first infection $T$ is not a confounder of the mediator-outcome relationship, which would constitute a violation of assumption (4) because $T$ is affected by $V_a$. Suppose that the two members of each pair are distinguishable from one another, for example parent-child pairs. We select one of the two to be the alter (e.g. the parent) and the other is the ego (the child).
Alternatively, if the individuals are exchangeable, that is, if we have no reason to think that the indirect effect and its components will be different for one than for the other, then we can randomly choose which subject is the alter and which is the ego.

Ogburn and VanderWeele (2013) defined the indicator $Y^{T+s}_{e}$ of whether the ego is sick after time $T$ and by time $T + s$ to be the outcome, where $s$ is a constant that allows $T$ to determine a new end of follow-up. This ensures that $T$ does not confound the mediator-outcome relationship. The constant $s$ should be chosen to be the sum of the infectious period ($f$) and the incubation period ($b$) of the disease under study. The infectious period is the length of time during which an infected individual is infectious, and the incubation period is the length of time between being infected and becoming infectious. If the alter becomes infectious at time $T_k$, then he can infect the ego until time $T_k + f$. If infected at time $T_k + f$, the ego will become infectious at time $T_k + f + b = T_k + s$. Therefore if the alter infects the ego, the ego must be infectious by time $T_k + s$. We assume throughout that the time to efficacy of vaccine is immediate and that the infectious and incubation periods are constant across individuals.

Let $Y^{T_k(v')+s}_{ek}(v, Y^{T_k(v')}_{ak}(v'))$ be the counterfactual outcome we would have observed for the ego in group $k$ at time $T_k(v') + s$ if the alter’s vaccine status were set to $v$ and the alter’s disease status at time $T_k(v')$ were set to its counterfactual under vaccine status $v'$. The average contagion effect in this setting is given by $E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] - E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))]$ and the average infectiousness effect by $E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))] - E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))]$.
The sum of these two effects is the average indirect effect $E[Y^{T(1)+s}_{e}(1)] - E[Y^{T(0)+s}_{e}(0)]$. Although the disease status of the ego is measured $s$ days after the first infection instead of at the end of follow-up, this indirect effect still captures any effect that the alter’s vaccination status can have on the ego’s disease status, because after time $T + s$ any change in the disease status of the ego cannot be caused by $V_a$.

So far we have described all effects on the difference scale, but everything we have written applies equally to effects on the ratio and odds ratio scales. On the ratio and odds ratio scales the indirect effect of vaccination decomposes into a product of the contagion and infectiousness effects. On the ratio scale, the average indirect effect of $V_a$ on the disease status of the ego is $E[Y^{T(1)+s}_{e}(1)] / E[Y^{T(0)+s}_{e}(0)]$, which is a product of the average infectiousness effect, $E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))] / E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))]$, and the average contagion effect, $E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] / E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))]$. On the odds ratio scale for a binary outcome the decomposition is

$$\begin{aligned}
&\frac{E[Y^{T(1)+s}_{e}(1)] \left\{ 1 - E[Y^{T(0)+s}_{e}(0)] \right\}}{E[Y^{T(0)+s}_{e}(0)] \left\{ 1 - E[Y^{T(1)+s}_{e}(1)] \right\}} \\
&\quad = \frac{E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))] \left\{ 1 - E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] \right\}}{E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] \left\{ 1 - E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))] \right\}} \\
&\quad \quad \times \frac{E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] \left\{ 1 - E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))] \right\}}{E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))] \left\{ 1 - E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] \right\}}
\end{aligned}$$

where the first line is the indirect effect, the second line is the infectiousness effect, and the third line is the contagion effect.

3.2.2 Estimation

The contagion and infectiousness effects are analogous to the natural indirect and direct effects, respectively, of $V_a$ on $Y^{T+s}_{e}$ with $Y^{T}_{a}$ as the mediator.
Natural indirect and direct effects have been written about extensively in the causal inference and mediation literature (see e.g. Pearl, 2001; Robins and Greenland, 1992; Robins and Richardson, 2010) and it is well known how to estimate them in a variety of settings (Imai et al., 2010; Valeri and VanderWeele, 2013). This setting differs from those considered by other authors because the outcome $Y^{T+s}_{e}$ is, by definition, equal to 0 whenever $Y^{T}_{a}$ is equal to 0; therefore one must be careful to ensure that any model specified for $E[Y^{T+s}_{e} \mid V_a, Y^{T}_{a}, C]$ is consistent with this restriction. VanderWeele et al. (2012b) describe how to estimate the contagion and infectiousness effects on the ratio scale in households of size two when one individual is homebound, but the procedure they present overlooks this restriction and therefore the models they suggest may fail to converge. We describe a procedure for estimating the contagion and infectiousness effects that is appropriate for the setting considered in VanderWeele et al. (2012b) and for the setting in which neither individual is assumed to be homebound. We describe estimation of the effects on the difference and ratio scales. Estimation of effects on the odds ratio scale is also possible.

Suppose that assumptions (1) through (4) hold for the effect of $V_a$ on $Y^{T+s}_{e}$ with $Y^{T}_{a}$ as the mediator and covariates $C$, and that the following two models are correctly specified:

$$\log\left\{ E[Y^{T+s}_{e} \mid V_a, Y^{T}_{a} = 1, C] \right\} = \gamma_0 + \gamma_1 V_a + \gamma_2' C \tag{5}$$

$$\operatorname{logit}\left\{ E[Y^{T}_{a} \mid V_a, C] \right\} = \eta_0 + \eta_1 V_a + \eta_2' C. \tag{6}$$

If the outcome is rare then (5) can be replaced with a logistic model.
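As a rough illustration of how the variables entering models (5) and (6) might be constructed and the models fit, the following Python sketch builds $Y^{T}_{a}$ and $Y^{T+s}_{e}$ from infectiousness onset times and then estimates the parameters in the special case with no covariates. In that special case both models are saturated in the binary $V_a$, so the maximum likelihood estimates are simple functions of stratum proportions; this shortcut is an assumption of the sketch, and with covariates an iterative GLM fit would be needed instead. The function names, the onset times, and the values $f = 5$ and $b = 2$ are all hypothetical.

```python
import math


def build_record(v_a, alter_onset, ego_onset, f, b, t_end):
    """Return (V_a, Y^T_a, Y^{T+s}_e) for one household.

    Onset times mark when a person became infectious (None if never).
    f and b are the infectious and incubation periods, so s = f + b.
    """
    s = f + b
    onsets = [t for t in (alter_onset, ego_onset) if t is not None]
    T = min(onsets) if onsets else t_end  # time of the first case
    y_a = int(alter_onset is not None and alter_onset == T)
    # Ego infectious after T and by T + s; this is always 0 when y_a is 0.
    y_e = int(ego_onset is not None and T < ego_onset <= T + s)
    return v_a, y_a, y_e


def fit_saturated(records):
    """Closed-form MLEs for models (5) and (6) without covariates.

    Assumes every stratum proportion lies strictly between 0 and 1
    (strictly above 0 for the ego proportions), so the logs are finite.
    """
    n = {0: 0, 1: 0}      # households per vaccine arm
    n_ya = {0: 0, 1: 0}   # households in which the alter is the first case
    n_ye = {0: 0, 1: 0}   # households in which the ego is sick by T + s
    for v, y_a, y_e in records:
        n[v] += 1
        n_ya[v] += y_a
        n_ye[v] += y_e
    p_a = {v: n_ya[v] / n[v] for v in (0, 1)}     # P(Y^T_a = 1 | V_a = v)
    p_e = {v: n_ye[v] / n_ya[v] for v in (0, 1)}  # P(Y^{T+s}_e = 1 | V_a = v, Y^T_a = 1)

    def logit(p):
        return math.log(p / (1.0 - p))

    eta0 = logit(p_a[0])
    gamma0 = math.log(p_e[0])
    return {
        "eta0": eta0,
        "eta1": logit(p_a[1]) - eta0,
        "gamma0": gamma0,
        "gamma1": math.log(p_e[1]) - gamma0,
    }


# Hypothetical data: (V_a, alter onset day, ego onset day).
raw = (
    [(0, 3, 6)] * 4 + [(0, 3, None)] * 1 + [(0, None, None)] * 5
    + [(1, 3, 8)] * 1 + [(1, 3, None)] * 1 + [(1, None, 4)] * 3
    + [(1, None, None)] * 5
)
records = [build_record(v, a, e, f=5, b=2, t_end=100) for v, a, e in raw]
est = fit_saturated(records)
print(est)
```

Households in which the ego is the first case contribute $Y^{T}_{a} = 0$ and $Y^{T+s}_{e} = 0$, so they enter model (6) but not model (5), which is fit only in the $Y^{T}_{a} = 1$ stratum.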
The contagion effect conditional on covariates $C = c$ on the difference scale is given by

\[
\begin{aligned}
& E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid c] - E[Y_e^{T(0)+s}(0, Y_a^{T(0)}(0)) \mid c] \\
&\quad = 0 + E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c] \, \{E[Y_a^T \mid V_a = 1, c] - E[Y_a^T \mid V_a = 0, c]\} \\
&\quad = e^{\gamma_0 + \gamma_2' c} \left( \frac{e^{\eta_0 + \eta_1 + \eta_2' c}}{1 + e^{\eta_0 + \eta_1 + \eta_2' c}} - \frac{e^{\eta_0 + \eta_2' c}}{1 + e^{\eta_0 + \eta_2' c}} \right),
\end{aligned}
\]

and the infectiousness effect conditional on covariates $C = c$ is given by

\[
\begin{aligned}
& E[Y_e^{T(1)+s}(1, Y_a^{T(1)}(1)) \mid c] - E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid c] \\
&\quad = 0 + E[Y_a^T \mid V_a = 1, c] \, \{E[Y_e^{T+s} \mid V_a = 1, Y_a^T = 1, c] - E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c]\} \\
&\quad = \frac{e^{\eta_0 + \eta_1 + \eta_2' c}}{1 + e^{\eta_0 + \eta_1 + \eta_2' c}} \left\{ e^{\gamma_0 + \gamma_1 + \gamma_2' c} - e^{\gamma_0 + \gamma_2' c} \right\}.
\end{aligned}
\]

In both expressions the leading 0 is the contribution of the $Y_a^T = 0$ stratum, in which $Y_e^{T+s} = 0$ by definition, and the terms in $\eta_1$ and $\gamma_1$ are models (6) and (5) evaluated at $V_a = 1$.

The contagion and infectiousness effects can be estimated by fitting models (5) and (6) and plugging the parameter estimates into the expressions above. The standard errors for these estimates can be bootstrapped or derived using the delta method (similar to those derived in Valeri and VanderWeele, 2013 for the natural direct and indirect effects). Alternatively, a Monte Carlo based approach similar to Imai et al. (2010) can be used for estimation of the effects and their standard errors. Software packages like the SAS and SPSS mediation macros (Valeri and VanderWeele, 2013) or the R mediation package (Imai et al., 2010) cannot be used in this setting because, instead of (5), which models the conditional expectation of $Y_e^{T+s}$ only in the $Y_a^T = 1$ stratum, these packages require fitting a model for $E[Y_e^{T+s} \mid V_a, Y_a^T, C]$.

If the ego can also be vaccinated then $V_e$ must be included in $C$. If $V_a$ interacts with $V_e$ or with any other covariates, these interactions can be incorporated into the models and pose no difficulty for estimation. To test whether there is a contagion effect, we can simply test whether $\eta_1 = 0$.
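As a minimal sketch of the plug-in step described above, the following Python function turns coefficient values for models (5) and (6) into the conditional contagion and infectiousness effects on the difference scale, together with the ratios of the same fitted quantities. The function name, argument names, and the convention of passing the already-evaluated linear terms $\gamma_2' c$ and $\eta_2' c$ are illustrative assumptions, not part of the original procedure; the coefficients themselves would come from whatever GLM software is used to fit (5) and (6).

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def plug_in_effects(gamma0, gamma1, gamma2_c, eta0, eta1, eta2_c):
    """Plug-in effects at covariate value c (hypothetical helper).

    gamma2_c and eta2_c are the linear terms gamma_2'c and eta_2'c,
    already evaluated at the chosen covariate value c.
    """
    # P(alter is infected first | V_a = v, c), from the logistic model (6).
    p_alter_v1 = expit(eta0 + eta1 + eta2_c)
    p_alter_v0 = expit(eta0 + eta2_c)
    # E[ego outcome | V_a = v, alter infected first, c], from the log model (5).
    r_ego_v1 = math.exp(gamma0 + gamma1 + gamma2_c)
    r_ego_v0 = math.exp(gamma0 + gamma2_c)
    return {
        "contagion_diff": r_ego_v0 * (p_alter_v1 - p_alter_v0),
        "infectiousness_diff": p_alter_v1 * (r_ego_v1 - r_ego_v0),
        "contagion_ratio": p_alter_v1 / p_alter_v0,
        "infectiousness_ratio": r_ego_v1 / r_ego_v0,
    }
```

On the difference scale the two returned effects add up to the indirect effect, and on the ratio scale they multiply to it, which gives a quick sanity check on any fitted values.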
To test whether there is an infectiousness effect we can simply test whether $\gamma_1 = 0$.

Using the parameters of models (5) and (6) we can also estimate the contagion and infectiousness effects on the ratio scale. The contagion effect conditional on $C = c$ is given by

\[
\begin{aligned}
\frac{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid c]}{E[Y_e^{T(0)+s}(0, Y_a^{T(0)}(0)) \mid c]}
&= \frac{0 + E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c] \, E[Y_a^T \mid V_a = 1, c]}{0 + E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c] \, E[Y_a^T \mid V_a = 0, c]} \\
&= \frac{E[Y_a^T \mid V_a = 1, c]}{E[Y_a^T \mid V_a = 0, c]} \\
&= \frac{e^{\eta_1} + e^{\eta_0 + \eta_1 + \eta_2' c}}{1 + e^{\eta_0 + \eta_1 + \eta_2' c}}
\end{aligned}
\tag{7}
\]

and the infectiousness effect is given by

\[
\begin{aligned}
\frac{E[Y_e^{T(1)+s}(1, Y_a^{T(1)}(1)) \mid c]}{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid c]}
&= \frac{0 + E[Y_e^{T+s} \mid V_a = 1, Y_a^T = 1, c] \, E[Y_a^T \mid V_a = 1, c]}{0 + E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c] \, E[Y_a^T \mid V_a = 1, c]} \\
&= \frac{E[Y_e^{T+s} \mid V_a = 1, Y_a^T = 1, c]}{E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c]} \\
&= e^{\gamma_1}.
\end{aligned}
\tag{8}
\]

Under the restriction that $Y_e^{T+s} = 0$ whenever $Y_a^T = 0$, the contagion effect on the ratio scale is simply a measure of the effect of the alter's vaccination on the alter's outcome. It is mathematically undefined if $E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c] = 0$, that is, if the alter's outcome has no effect on the ego's outcome, but it is natural to define it to be equal to the null value of 1 in this case. The infectiousness effect on the ratio scale is simply a measure of the effect of the alter's vaccination on the ego's outcome among pairs in which the alter is sick first, that is, in the $Y_a^T = 1$ stratum.

4 Infectiousness and contagion in groups of more than two

Although allowing both individuals in a household to be infected from outside the household generalizes the results of VanderWeele et al.
(2012b), it still requires the strong assumption, inherent in the identifying assumptions described in Section 3, that the alter and ego do not share any potentially infectious contacts. If both of the individuals in a given household could be infected from outside the household by the same mutual friend, then that friend's disease status would be a confounder of the mediator-outcome relationship; if unobserved, it would constitute a violation of assumption (2). We can relax the assumption of no mutual contacts outside of the household by collecting data on any such contacts and controlling for them as covariates in our estimating procedure.

In this section, we consider identification and estimation of the contagion and infectiousness effects when independent groups of individuals are sampled. We assume that each group includes a pair of individuals who furnish the exposure, mediator, and outcome variables, plus all mutual and potentially infectious contacts of the pair. Several types of sampling procedures could give rise to this data structure. For example, one possibility would be to sample workplaces and randomly select two individuals to play the role of the alter and ego; another would be to sample household pairs first, ascertain the identities of potential mutual contacts outside of the home, and include all such contacts in the data collection moving forward. The sampling procedure does not affect the identification or estimation results described below.

Let $k$ index the $k$th group, $k = 1, \ldots, K$. Let $Y_{ik}^t$ be an indicator of whether individual $i$ in group $k$ has had the disease by day $t$. As in Section 3, we define a case of the disease to begin when the individual becomes infectious and let $s = f + b$ be the sum of the infectious and incubation periods for the disease. We assume that vaccination occurs before the start of follow-up.
Given a non-rare outcome like the flu and time measured in discrete intervals like days, it is likely that we would observe multiple individuals getting sick on the same day. We therefore do not make the assumption, made in Section 3, that no two individuals can be observed to get sick at the same time.

For group $k$, let $e_k$ index the ego, whose flu status we wish to study, and let $a_k$ index the alter, whose vaccination status may or may not have an effect on the ego's disease status. We index the other individuals in group $k$ by $1, 2, \ldots, n_k$. Let $T_k$ be the time of the first infection in the $k$th alter-ego pair. As in Section 3, the ego furnishes the outcome, $Y_{e_k}^{T_k+s}$. The alter furnishes the treatment, vaccine status $V_{a_k}$, and the mediator, indicator of first infection $Y_{a_k}^{T_k}$. When context allows, we omit the subscript $k$.

The definition of the mediator needs to be modified slightly to reflect the fact that the alter and the ego could get sick at the same time: let $Y_a^T$ be an indicator of whether the alter was sick and the ego healthy at time $T$. Let $Y_e^{T+s}$ be an indicator of whether the ego got sick between time $T + b$, which is the first time at which the alter could have infected the ego, and time $T + s$, which is the last time at which the alter could have infected the ego. This definition preserves the interpretation of $Y_a^T$ as an indicator that the alter was sick before the ego; if the ego and the alter simultaneously fell ill on day $T$ then $Y_a^T$ will be 0, which is desirable because the ego cannot have caught the disease from the alter if they both fell ill on the same day. It also preserves the restriction, discussed in Section 3, that $Y_e^{T+s}$ is equal to 0 whenever $Y_a^T$ is.
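The modified definitions of the mediator and outcome can be made concrete with a short sketch. The function below is hypothetical: it assumes each person's record is the first day on which they became infectious (or `None` if never sick), that $b \geq 1$, and that a pair in which nobody falls ill is coded with both indicators equal to 0, a convention the text does not state.

```python
def pair_variables(alter_day, ego_day, b, s):
    """Return (T, Y_a, Y_e) for one alter-ego pair (illustrative helper).

    alter_day, ego_day: first day each person became infectious, or None.
    b: incubation period in days (assumed >= 1); s = f + b.
    """
    days = [d for d in (alter_day, ego_day) if d is not None]
    if not days:
        # Assumption: no infection in the pair, so T is undefined.
        return None, 0, 0
    T = min(days)
    # Mediator: alter sick and ego still healthy at time T.
    y_a = int(alter_day == T and (ego_day is None or ego_day > T))
    # Outcome: ego falls ill in the window [T + b, T + s].
    # With b >= 1 this is automatically 0 whenever y_a is 0, because
    # y_a == 0 means the ego was already sick at T, before T + b.
    y_e = int(ego_day is not None and T + b <= ego_day <= T + s)
    return T, y_a, y_e
```

For example, with $b = 1$ and $s = 4$, an alter who falls ill on day 3 and an ego who falls ill on day 5 give $T = 3$, $Y_a^T = 1$, and $Y_e^{T+s} = 1$, whereas a pair who both fall ill on day 3 give $Y_a^T = Y_e^{T+s} = 0$.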
$Y_e^{T(v')+s}(v, Y_a^{T(v')}(v'))$ is the counterfactual flu status of the ego at time $T(v') + s$ had the alter's vaccine status been set to $v$ and his flu status at time $T(v')$ set to its counterfactual value under vaccine status $v'$, where $T(v')$ is the time at which the first infection in the alter-ego pair would have occurred if $V_a$ had been set to $v'$. The effects of interest are the average contagion effect

\[ Con = \frac{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1))]}{E[Y_e^{T(0)+s}(0, Y_a^{T(0)}(0))]} \tag{9} \]

and the average infectiousness effect

\[ Inf = \frac{E[Y_e^{T(1)+s}(1, Y_a^{T(1)}(1))]}{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1))]}, \tag{10} \]

where the expectations are taken over all ego-alter pairs.

In order to identify the effects defined in (9) and (10), we must measure and control for all confounders of the relationship between $Y_e^{T+s}$ and $Y_a^T$, and in particular the potential mutual infectious contacts of the alter and ego. To motivate our procedure for controlling for these confounding contacts, consider the simple case of a group of size three, comprised of a child (ego), a parent (alter), and a grandparent. In the event that the grandparent contracted the flu first and transmitted it to both the child and the parent, the grandparent's flu status would clearly be a confounder of the mediator-outcome relationship. But the grandparent's entire disease trajectory is not a potential confounder; in particular anything that happens to the grandparent after time $T$, that is, after the first infection in the parent-child pair, occurs after the mediator and cannot possibly confound the mediator-outcome relationship.
In this simple, three-person group, it suffices to control for an indicator of whether the grandparent has been sick by time $T - b$, where $T$ is the time of the first infection between the parent and child, and $T - b$ is the latest time at which the grandparent could have been the cause of an infection at time $T$.

In practice, we will likely have to sample groups of size greater than three in order to control for confounding by potential mutually infectious contacts. It is generally sufficient to control for a summary measure of the infections occurring before $T - b$ in each group. If each infectious contact of an individual has an independent probability of transmitting the disease to the individual, then the sum $\sum_{i=1}^{n_k} Y_{ik}^{T-b}$ of indicators of whether each mutual contact has been sick by time $T - b$ suffices to control for confounding by potential mutual infectious contacts. Under a different transmission model, the proportion $\sum_{i=1}^{n_k} Y_{ik}^{T-b} / n_k$ of contacts who were sick by time $T - b$ could be the operative summary measure. If some of the mutual contacts may have been vaccinated, then separate summary measures (sum or proportion sick by time $T - b$) should be included for vaccinated and for unvaccinated contacts. In what follows we will assume that the sum is an adequate summary measure.

4.1 Alternative sampling schemes

Alter-centric sampling can also be used to collect data on variables that suffice to identify the contagion and infectiousness effects. Instead of sampling an alter-ego pair and all of their mutual contacts, we can sample an individual to serve as the alter and all of his potentially infectious contacts. The ego is randomly selected from among the alter's contacts.
Conditional on the number of the alter's contacts who have been infectious by day $T - b$, $Y_a^T$ is independent of the number of mutual contacts who were sick by time $T - b$. The number of mutual contacts is no longer a confounder of the relationship between $Y_a^T$ and $Y_e^{T+s}$ and there is no need to ascertain the identity or disease status of the mutual contacts. However, the number of potentially infectious contacts of a single person can be vast, and it may be easier to identify mutual contacts of a pair of individuals than all contacts of any one individual.

5 Infectiousness and contagion in social networks

So far, we have assumed that our observations, comprised of groups of individuals, were independent of one another. This assumption will, in general, be violated when the alter-ego pairs are sampled from a single community or social network. We introduce some new notation for this context after briefly describing the example that will serve as the basis for our exposition and later for our simulations and data analysis.

Consider tracking the seasonal flu in the student population of a college at which all students live in dorms on campus. Each student is a node in the network. We define a tie to exist between two nodes if the individuals regularly interact with one another in a way that could facilitate transmission of the flu. For example, if two individuals are roommates, eat together in the dining hall, or are close friends, then their nodes share a tie. We observe each individual's flu status every day over the course of the flu season, which lasts for 100 days.

The contagion and infectiousness effects $Con$ and $Inf$, defined in Section 4, are not estimable from social network data using the methods that we propose below.
Instead we can define new contagion and infectiousness effects such that hypothesis tests based on the new effects are valid and consistent tests of the hypotheses that $Con$ and $Inf$ are null. We give assumptions under which the new estimands are estimable from network data using GLMs and we demonstrate that tests of the hypotheses for the new estimands are valid and consistent for $Con$ and $Inf$.

5.1 Assumptions

Along with assumptions (1) - (4), we make several additional assumptions that facilitate inference using social network data. Define $\mathcal{A}_i = \{j : i \text{ and } j \text{ share a tie}\}$ to be the collection of indices for individual $i$'s contacts. We assume that

\[ Y_i^t \perp Y_j^r \;\Big|\; \Big\{ \sum_{m \in \mathcal{A}_i : V_m = v} Y_m^{t-b},\ v = 0, 1 \Big\} \quad \text{for all } j \notin \mathcal{A}_i \text{ and } r \leq t. \tag{11} \]

The set in the conditioning event includes the number of vaccinated contacts of individual $i$ who were sick on or before day $t - b$ and the number of unvaccinated contacts of individual $i$ who were sick on or before day $t - b$. This assumption says that the outcome of individual $i$ at time $t$ is independent of all past outcomes for non-contacts of $i$, conditional on a summary measure of the flu history of the contacts of $i$. In other words, contacts act as a causal barrier between two nodes who do not themselves share a tie. If two individuals, $i$ and $j$, do not share a tie, then they can have no effect on one another's disease status that is not through their contacts' disease statuses. Because $t - b$ is the latest time at which a disease transmission could affect $Y_i^t$, we do not need to condition on the contacts' outcomes past that time. This assumption implies that the total numbers of vaccinated and unvaccinated contacts of individual $i$ who have been sick by day $t - b$ are a sufficient summary measure of the complete history of all of $i$'s contacts.
It could easily be modified so that the probability of being infected at any given time depends on a different summary measure, for example on the proportion of alters who were infectious at or before time $t - b$. We also assume that

\[ Y_i^t \perp V_j \;\Big|\; \Big\{ \sum_{m \in \mathcal{A}_i : V_m = v} Y_m^{t-b},\ v = 0, 1 \Big\} \quad \text{for all } j \notin \mathcal{A}_i \tag{12} \]

and that, for any covariate $C$ that is required for (1) through (4) to hold,

\[ Y_i^t \perp C_j \;\Big|\; \Big\{ \sum_{m \in \mathcal{A}_i : V_m = v} Y_m^{t-b},\ v = 0, 1 \Big\} \quad \text{for all } j \notin \mathcal{A}_i. \tag{13} \]

These assumptions state that any effect of the covariates (including vaccination) of nodes without ties to $i$ on $i$'s disease status would again have to be mediated by the disease statuses of $i$'s contacts. Assumption (12) implies that the infectiousness effect is not transitive: whether individual $j$ caught the flu from a vaccinated or unvaccinated person has no influence on whether individual $j$ transmits the flu.

Embedded in assumptions (11)-(13) is the assumption that all ties are equivalent and all non-ties are equivalent with respect to transmission of the outcome. This is likely to be a simplification of reality. It can be relaxed (see Section 5.3), but we make it now for heuristic purposes. It rules out the possibility that some types of ties, like roommates, are more likely to facilitate disease transmission than others, like friends who live in different dorms. It allows an individual to come into contact with and possibly infect (or be infected by) people with whom he does not share a tie, but it entails that he will come into contact with any individual in the network who is not his contact with equal probability. This rules out, for example, the possibility that an individual is more likely to be infected by the friends of his friends than by a distant node on the network.
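The conditioning set shared by assumptions (11) through (13) is simple to compute from daily data. The sketch below is illustrative only: the data layout (a dictionary of contact lists, a dictionary of vaccination flags, and a dictionary giving each person's first sick day) and the function name are assumptions made for the example, not structures defined in the text.

```python
def contact_summaries(i, t, b, contacts, vaccinated, first_sick_day):
    """Counts of i's contacts who were sick on or before day t - b.

    contacts: dict mapping each node to an iterable of its contacts (A_i).
    vaccinated: dict mapping each node to True/False (V_m).
    first_sick_day: dict mapping each node to its first sick day;
        nodes that were never sick may be absent or map to None.
    Returns (unvaccinated_count, vaccinated_count).
    """
    counts = {0: 0, 1: 0}
    for m in contacts[i]:
        day = first_sick_day.get(m)
        if day is not None and day <= t - b:
            counts[int(vaccinated[m])] += 1
    return counts[0], counts[1]
```

Replacing the two counts by proportions of contacts would give the alternative summary measure mentioned above.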
We also make the no-unmeasured-confounding assumption that, if there exists a person with whom two individuals in the network interact regularly, then that person is also in the network (with ties to both individuals). In some settings it may be possible to satisfy this condition, e.g. in full sociometric studies conducted de novo, or in studies of online data.

5.2 Estimation and hypothesis testing

Consider the following strategy for estimating a new contagion and new infectiousness effect, defined below:

1. Randomly select from the network $K$ pairs of nodes such that the two nodes in each pair share a tie, but, for each pair, neither node nor any of their contacts has a tie to a node in any other pair or to the contacts of any member of any other pair. The number of possible such pairs will depend on the network size and topology. In the next section, we discuss methods for sampling these pairs. Randomly select one member of each pair to be the ego and one to be the alter.

2. Index the pairs by $k$, and let $e_k$ index the ego and $a_k$ the alter in the $k$th pair. For the $k$th pair, define a group, also indexed by $k$, that includes nodes $a_k$, $e_k$, $\mathcal{A}_{e_k}$, and $\mathcal{A}_{a_k}$. That is, it includes the alter-ego pair and all nodes with ties to either the alter or the ego. Due to the way we selected pairs, none of the members of group $k$ can belong to any other group. Below, we suppress the index $k$ when context allows. As in the sections above, $T_k$ is the time of the first infection in the pair $(a_k, e_k)$. Let $C_k$ be a collection of covariates for group $k$, where the variables included in $C$ are precisely those required for assumptions (1) through (4) to hold for outcome $Y_e^{T(1)+s}$, mediator $Y_a^{T(1)}$, and treatment $V_a$. Note that $V_e$ should be included in $C$ as it is likely to be a confounder of the mediator-outcome relationship.
   The number of mutual contacts of the alter and ego who were sick by time $T - b$ must also be included.

3. Let $U_{e_k}^{T_k+f}$ and $L_{e_k}^{T_k+f}$ be the number of unvaccinated and vaccinated nodes, respectively, with ties to $e_k$ who were sick by time $T_k + f$. Define $U_{a_k}^{T_k-b}$ and $L_{a_k}^{T_k-b}$ similarly as the number of unvaccinated and vaccinated nodes, respectively, with ties to $a_k$ who were sick by time $T_k - b$. Recall that $f$ is the infectiousness period and $b$ the incubation period, defined in Section 3.2.1.

4. Estimate an average modified contagion effect

\[ Con^* = \frac{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C]}{E[Y_e^{T(0)+s}(0, Y_a^{T(0)}(0)) \mid U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C]} \]

and an average modified infectiousness effect

\[ Inf^* = \frac{E[Y_e^{T(1)+s}(1, Y_a^{T(1)}(1)) \mid U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C]}{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C]} \]

and their standard errors.

Through Step 2, the procedure we described is nearly identical to the proposal in Section 4, the only difference being that groups are extracted from a network in Step 1 rather than being independently ascertained. Consideration for this sampling scheme becomes crucial when we estimate the parameters of GLMs like (5) and (6). The standard errors derived from these GLMs are consistent only if the residuals across groups are uncorrelated. The residuals are indeed uncorrelated for independent groups, but, in the network setting, they generally are not. However, the set of additional covariates introduced in Step 3 essentially blocks the flow of information between groups. Conditional on these additional covariates, the residuals are uncorrelated, even in the network setting (see next section for proof).
Roughly, because $U_{e_k}^{T_k+f}$ and $L_{e_k}^{T_k+f}$ summarize the disease statuses of the ego's contacts $b$ days before the outcome $Y_{e_k}^{T(1)+s}$ is assessed, conditioning on them ensures that the outcomes are uncorrelated across groups. Because $U_{a_k}^{T_k-b}$ and $L_{a_k}^{T_k-b}$ summarize the disease statuses of the alter's contacts $b$ days before the mediator $Y_{a_k}^{T(1)}$ is assessed, conditioning on them ensures that mediators are uncorrelated across groups.

The effects defined in Step 4 differ from $Con$ and $Inf$ only in the conditioning set, but this changes slightly the causal effect being estimated. Conditioning on $U_a^{T-b}$ and $L_a^{T-b}$ is just like conditioning on an extra pair of confounders: these variables occur before the mediator and are independent of the treatment; therefore they can be considered to be pre-treatment covariates. On the other hand, $U_e^{T+f}$ and $L_e^{T+f}$ occur after the mediator and lie on a possible pathway from the mediator to the outcome. Conditioning on these variables has the effect of biasing $Con^*$ and $Inf^*$ towards the null relative to $Con$ and $Inf$, because it blocks the path from $Y_a^T$ to $Y_e^{T+s}$ that operates when the alter infects a friend of the ego, who then infects the ego. However, conditioning on these variables leaves the direct path from $Y_a^T$ to $Y_e^{T+s}$ open, and this path operates whenever the alter infects the ego directly. Therefore, whenever $Con$ and $Inf$ are non-null so are $Inf^*$ and $Con^*$. Hypothesis tests using $Con^*$ and $Inf^*$ are conservative and consistent for hypothesis tests for $Con$ and $Inf$. Similarly, tests that $Con^*$ and $Inf^*$ are less than the null value or are greater than the null value are also valid and consistent for the analogous tests for $Con$ and $Inf$, respectively.
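The separation requirement in Step 1 can be illustrated with a simple greedy sweep over the ties in random order. This is a sketch under stated assumptions rather than the sampling scheme of the procedure itself: a greedy sweep does not sample uniformly from the set of admissible collections of pairs, and the function name and the adjacency-dictionary representation (node mapped to the set of its contacts, with integer node labels) are hypothetical.

```python
import random


def select_pairs(adj, seed=0):
    """Greedily pick tied (alter, ego) pairs whose groups are separated.

    adj: dict mapping each node to the set of nodes it shares a tie with.
    A pair's group is the two nodes plus all of their contacts. A new pair
    is accepted only if its group contains no member of, and no contact of
    a member of, any previously accepted group.
    """
    rng = random.Random(seed)
    ties = [(u, v) for u in adj for v in adj[u] if u < v]
    rng.shuffle(ties)
    blocked = set()  # members of accepted groups and their contacts
    pairs = []
    for u, v in ties:
        group = {u, v} | adj[u] | adj[v]
        if group & blocked:
            continue
        # Randomly assign the roles of alter and ego within the pair.
        alter, ego = (u, v) if rng.random() < 0.5 else (v, u)
        pairs.append((alter, ego))
        blocked |= group
        for node in group:
            blocked |= adj[node]
    return pairs
```

On a path network 0-1-2-3, for instance, any accepted pair blocks every other tie, so the function returns a single pair; on two disconnected ties it returns both.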
5.2.1 Justification for the use of GLMs

Suppose that the models

\[
\begin{aligned}
& g\left( E[Y_{e_k}^{T_k+s} \mid V_{a_k}, Y_{a_k}^{T_k} = 1, U_{a_k}^{T_k-b}, L_{a_k}^{T_k-b}, U_{e_k}^{T_k+f}, L_{e_k}^{T_k+f}, C_k] \right) \\
&\quad = \beta_0 + \beta_1 V_{a_k} + \beta_2 U_{a_k}^{T_k-b} + \beta_3 L_{a_k}^{T_k-b} + \beta_4 U_{e_k}^{T_k+f} + \beta_5 L_{e_k}^{T_k+f} + \beta_6' C_k
\end{aligned}
\tag{14}
\]

and

\[
\begin{aligned}
& m\left( E[Y_{a_k}^{T_k} \mid V_{a_k}, U_{a_k}^{T_k-b}, L_{a_k}^{T_k-b}, U_{e_k}^{T_k+f}, L_{e_k}^{T_k+f}, C_k] \right) \\
&\quad = \alpha_0 + \alpha_1 V_{a_k} + \alpha_2 U_{a_k}^{T_k-b} + \alpha_3 L_{a_k}^{T_k-b} + \alpha_4 U_{e_k}^{T_k+f} + \alpha_5 L_{e_k}^{T_k+f} + \alpha_6' C_k
\end{aligned}
\tag{15}
\]

are correctly specified for $g()$, $m()$ known link functions. For the effect on the ratio scale with a binary common outcome like the flu we would specify $g()$ to be the log link and $m()$ the logit link, as we did in Sections 3 and 4. We have only to prove that the residuals from model (14) are uncorrelated with one another and that the residuals from model (15) are uncorrelated with one another (Breslow, 1996; Gill, 2001).

Result 1. Let $Res_{a_k} = Y_{a_k}^{T_k} - m^{-1}\left( \alpha_0 + \alpha_1 V_{a_k} + \alpha_2 U_{a_k}^{T_k-b} + \alpha_3 L_{a_k}^{T_k-b} + \alpha_4 U_{e_k}^{T_k+f} + \alpha_5 L_{e_k}^{T_k+f} + \alpha_6' C_k \right)$. Then $Res_{a_k}$ and $Res_{a_h}$ are uncorrelated.

Proof. Without loss of generality assume that $T_k > T_h$. Under correct specification of (15), $E[Res_{a_k}] = E[Res_{a_h}] = 0$. Therefore $Cov(Res_{a_k}, Res_{a_h}) = E[Res_{a_k} Res_{a_h}]$. Letting $S_k$ denote the set of variables $\{V_{a_k}, U_{a_k}^{T_k-b}, L_{a_k}^{T_k-b}, U_{e_k}^{T_k+f}, L_{e_k}^{T_k+f}, C_k\}$, we have

\[
\begin{aligned}
E[Res_{a_k} Res_{a_h}]
&= E\left[ E[Res_{a_k} Res_{a_h} \mid S_k, S_h] \right] \\
&= E\left[ E\left[ \{Y_{a_k}^{T_k} - E[Y_{a_k}^{T_k} \mid S_k]\} \{Y_{a_h}^{T_h} - E[Y_{a_h}^{T_h} \mid S_h]\} \mid S_k, S_h \right] \right] \\
&= E\left[ E\left[ Y_{a_k}^{T_k} - E[Y_{a_k}^{T_k} \mid S_k] \mid S_k, S_h \right] \times E\left[ Y_{a_h}^{T_h} - E[Y_{a_h}^{T_h} \mid S_h] \mid S_k, S_h \right] \right] \\
&= E\left[ \{E[Y_{a_k}^{T_k} \mid S_k, S_h] - E[Y_{a_k}^{T_k} \mid S_k]\} \times E\left\{ Y_{a_h}^{T_h} - E[Y_{a_h}^{T_h} \mid S_h] \mid S_k, S_h \right\} \right] \\
&= E\left[ \{E[Y_{a_k}^{T_k} \mid S_k] - E[Y_{a_k}^{T_k} \mid S_k]\} \times E\left\{ Y_{a_h}^{T_h} - E[Y_{a_h}^{T_h} \mid S_h] \mid S_k, S_h \right\} \right] \\
&= 0.
\end{aligned}
\]

The second equality follows from the correct specification of (15).
The third equality holds because, by assumptions (11), (12), and (13), $Y_{a_k}^{T_k} \perp Y_{a_h}^{T_h} \mid S_k, S_h$. The fifth equality holds because $Y_{a_k}^{T_k} \perp S_h \mid S_k$, again by assumptions (11), (12), and (13).

Result 2. Let $Res_{e_k} = Y_{e_k}^{T_k+s} - g^{-1}\left( \beta_0 + \beta_1 V_{a_k} + \beta_2 U_{a_k}^{T_k-b} + \beta_3 L_{a_k}^{T_k-b} + \beta_4 U_{e_k}^{T_k+f} + \beta_5 L_{e_k}^{T_k+f} + \beta_6' C_k \right)$. Then $Res_{e_k}$ and $Res_{e_h}$ are uncorrelated.

The proof of Result 2 is very similar to the proof of Result 1 and we therefore omit it. It relies on the fact that $T + f = T + s - b$, so that conditioning on $U_{e_k}^{T_k+f}$ and $L_{e_k}^{T_k+f}$ satisfies the conditions of assumptions (11), (12), and (13) and renders $Y_{e_k}^{T_k+s}$ independent of outcomes, vaccines, and covariates for other groups.

5.2.2 Implementation

Step 1 is the most difficult to implement. One could enumerate all possible ways of partitioning the network into non-overlapping groups comprised of a pair of nodes and all of their contacts, associate the partitions with a discrete uniform distribution, and randomly sample one realization of the uniform distribution. Steps 2 and 3 of the testing procedure are perfunctory. If we define $C^* = (U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C)$ to be a new collection of covariates then Step 4 proceeds as in Sections 3 and 4. Interactions between components of $C^*$ and the other predictors in the model can easily be accommodated. To test the hypotheses that $Con$ and $Inf$ are null, we estimate 95% confidence intervals for the modified contagion and infectiousness estimands ($Con^*$ and $Inf^*$) based on the estimates and standard errors calculated in Step 4.
We reject the hypothesis that $Con$ is null if our confidence interval for the estimand in $Con^*$ does not include the null value and we reject the hypothesis that $Inf$ is null if our confidence interval for the estimand in $Inf^*$ does not include the null value.

5.3 Relaxing some assumptions

We assumed throughout that vaccination occurs before the start of follow-up, but this is not necessary for our methods. If vaccination can occur during follow-up, define $V_i^t$ to be an indicator of having been vaccinated by time $t$. Assume that the effect of vaccination, including any infectiousness effect, is immediate. If an individual becomes infectious on day $T$, he would have been infected on day $T - b$. If he was vaccinated by time $T - b$, then the vaccine would have been in full effect at the time of infection. Then $V_a^{T-b}$ can replace $V_a$ as the "treatment" in the contagion, infectiousness, and indirect effects. We similarly redefine the summary measures for vaccinated and unvaccinated contacts of the alter and ego that appear in assumptions (11) through (13) and that are included in $C$. Include $V_e^{T-b}$ in the set of confounders because the mediator occurs at time $T$ and therefore the ego's vaccination status at time $T - b$ suffices to control for any confounding.

We assumed throughout that the infectious and incubation periods ($f$ and $b$) are constant across individuals. These assumptions, along with the assumption that the effect of vaccination is immediate, could be relaxed if the determinants of time to efficacy of vaccine, length of infectious period, and length of incubation period were observed covariates. In this case we could, for example, infer effective time of vaccination, incubation period, and infectious period for each individual based on their covariates.
We assumed in Section 5.1 that the probability of disease transmission between two connected nodes does not depend on the type of tie. This assumption can be avoided with the addition of several covariates to models (14) and (15): we would condition on the type of tie that exists between the alter and the ego, and also include separate $U$ and $L$ terms for each type of tie.

We also assumed in Section 5.1 that an individual will come into contact with any individual in the network who is not his contact with equal probability. This can be relaxed by expanding the $K$ groups we define in Step 1 of the estimation procedure to include nodes within several degrees of separation from the alter and ego.

6 Simulations

6.1 Independent groups

We ran simulations for three different sample sizes, $K = 200$, $K = 500$, and $K = 1000$ independent groups. Each group comprised an alter, an ego, and $n_k$ mutual contacts. First we generated $K$ contact group sizes $n_k$ by sampling from a Poisson distribution with mean $\lambda = 3$. Next, we assigned vaccination statuses to each individual in each group, including the alters and egos, with probability 0.4. We simulated the behavior of each group during a flu epidemic over 100 days. For the purposes of the simulation, we assumed that each member of a group had contact with all other members of the same group. Each day, an uninfected member of a group had a baseline probability $p_o$ of being infected from outside of the group, a baseline probability $p_u$ of being infected by any infectious, unvaccinated member of the same group, and a baseline probability $p_v$ of being infected by any infectious, vaccinated member of the same group. If vaccinated, an individual's probability of being infected by any source was multiplied by $\delta \leq 1$.
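To make the daily transmission rule concrete, the sketch below computes one uninfected person's probability of becoming infected on a given day. It rests on an assumption the text does not spell out: that the outside source and each infectious group member act independently, so that the person escapes infection only by escaping every source. The function and argument names are illustrative.

```python
def daily_infection_probability(n_inf_unvacc, n_inf_vacc, p_o, p_u, p_v,
                                delta, vaccinated):
    """Probability that an uninfected person is infected today.

    n_inf_unvacc, n_inf_vacc: numbers of currently infectious unvaccinated
        and vaccinated members of the person's group.
    delta: multiplier (<= 1) applied to every source-specific probability
        when the person is vaccinated.
    Assumption: sources transmit independently of one another.
    """
    scale = delta if vaccinated else 1.0
    escape = (
        (1.0 - scale * p_o)
        * (1.0 - scale * p_u) ** n_inf_unvacc
        * (1.0 - scale * p_v) ** n_inf_vacc
    )
    return 1.0 - escape
```

With no infectious group members the probability reduces to the (possibly scaled) outside rate $p_o$, which is a convenient check on the implementation.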
If infected on day $t$, an individual was infectious from day $t + 1$ through day $t + 4$ and incapable of being infected or transmitting infection from day $t + 5$ until the end of follow-up. This corresponds to an incubation period of $b = 1$ and an infectious period of $f = 3$, and it mimics the flu, for which the incubation period is between one and three days and the infectious period is between three and six days (Earn et al., 2002). In all simulations, we fixed $p_o = 0.01$. We specified two different simulation settings for the parameters $\delta$, $p_v$, and $p_u$, one setting corresponding to the null of no infectiousness or contagion effects ($\delta = 1$; $p_v = p_u = 0.4$) and one setting corresponding to the presence of protective contagion and infectiousness effects ($\delta = 0.1$; $p_v = 0.5$, $p_u = 0.05$).

We simulated 500 epidemics under each of the two scenarios, and for each simulation we estimated the infectiousness and contagion effects as follows. Among the subset of groups with $Y_a^T = 1$ and using a log link function, we regressed $Y_e^{T+s}$ on $V_a$ and on the set of potential confounders comprised by the ego's vaccination status, the sum $U_a^{T-b}$ of unvaccinated mutual contacts who were infectious at time $T - b$, and the sum $L_a^{T-b}$ of vaccinated mutual contacts who were infectious at time $T - b$. We regressed $Y_a^T$ on the same covariates using a logistic link function.

Table 1: Simulation results for independent groups

Under H_0
Number of groups   Infectiousness (SE)   Coverage   Contagion (SE)   Coverage
K = 200            1.014 (0.138)         94%        1.016 (0.202)    94%
K = 500            1.001 (0.082)         92.2%      0.997 (0.117)    93.6%
K = 1000           0.997 (0.057)         95%        1.001 (0.083)    94%

Under H_A
Number of groups   Infectiousness (SE)   Power      Contagion (SE)   Power
K = 200            0.453 (1.160)         49%        0.258 (0.079)    100%
K = 500            0.443 (0.154)         86%        0.255 (0.049)    100%
K = 1000           0.445 (0.107)         100%       0.258 (0.034)    100%
The contagion and infectiousness effects are identified by the expressions given in (7) and (8), evaluated at the sample mean value of the covariates $U_a^{T-b}$ and $L_a^{T-b}$. We bootstrapped the standard errors with 500 bootstrap replications.

The results are given in Table 1. For each simulation setting, that is, for each sample size ($K$) and for both the null hypothesis and the alternative hypothesis, we present the mean point estimates for the infectiousness and contagion effects on the ratio scale, the mean bootstrap standard error estimator, and the percent coverage of the 95% confidence interval based on the 2.5th and 97.5th bootstrap quantiles. For simulations under the null hypothesis, we report coverage and for simulations under the alternative we report power, given by 100% minus the percent coverage. The point estimates are stable across sample sizes and the coverage of the basic bootstrap confidence interval is close to 95% under the null for all $K$. The power under the alternative is 100% for the contagion effect, but for the infectiousness effect power is low (49%) when $K = 200$.

6.2 Social network data

The procedure proposed in Section 5.2 for hypothesis testing using social network data suffers from low power. In part this is because $Con^*$ and $Inf^*$ are biased towards the null relative to $Con$ and $Inf$, but the primary reason for the loss of power is the extraction of conditionally independent pairs of nodes from the network. As the simulation illustrates, this results in a dramatic reduction in the sample size used for analysis.
Because infectious outcomes sampled from nodes in a network are dependent, the effective sample size for inference about such outcomes will always be smaller than the observed number of nodes, and how much more information about the parameters of interest is available depends on the specific setting. Important areas for future research include determining the effective sample size when observations are sampled from a network and are therefore dependent, and developing methods that make use of all available information.

We ran simulations for three different network sizes: 12000 nodes, 10000 nodes, and 8000 nodes. We simulated a network of 10000 nodes as follows: first, we simulated 2000 independent groups of 5 nodes, with each group being fully connected (i.e., there are ties between each pair of nodes in the group of 5). For each node we then added a tie to each out-of-group node with probability 0.0001. Because ties are undirected (if node $i$ is tied to node $j$, then by definition node $j$ is tied to node $i$), this results in approximately 2 expected out-of-group ties per node. To simulate networks of size 12000 and 8000, we simulated 2400 and 1600 independent groups, respectively, and scaled the probability of an out-of-group tie to maintain an expected value of approximately 2 for each node. This network structure could represent a sample of families living in a city, where individuals are fully connected to the members of their family and occasionally connected to members of other families. After running step 1 of the procedure outlined in Section 5.2, we were left with $K = 707$ alter-ego pairs for the network of size 12000, $K = 581$ for the network of size 10000, and $K = 466$ for the network of size 8000.
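The network construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the reported simulations: the function names `bernoulli_hits` and `simulate_network`, the adjacency-set representation, and the rule used to scale the tie probability (chosen so that the expected number of out-of-group ties per node is about 2 for any network size) are assumptions made here for illustration.

```python
import math
import random
from itertools import combinations


def bernoulli_hits(rng, n_trials, p):
    """Yield the indices in range(n_trials) where an independent Bernoulli(p) trial succeeds."""
    if p <= 0.0:
        return
    log_q = math.log(1.0 - p)
    idx = -1
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        idx += 1 + int(math.log(u) / log_q)  # geometric skip to the next success
        if idx >= n_trials:
            return
        yield idx


def simulate_network(n_groups=2000, group_size=5, expected_out_ties=2.0, seed=0):
    """Return a list of neighbor sets: fully connected groups plus sparse out-of-group ties."""
    if n_groups < 2:
        raise ValueError("need at least two groups to form out-of-group ties")
    rng = random.Random(seed)
    n = n_groups * group_size
    adj = [set() for _ in range(n)]

    # Every pair of nodes within a group is tied.
    for g in range(n_groups):
        members = range(g * group_size, (g + 1) * group_size)
        for i, j in combinations(members, 2):
            adj[i].add(j)
            adj[j].add(i)

    # Assumed scaling: each node proposes a tie to each out-of-group node with
    # probability p; ties are undirected, so a node gains ties both from its own
    # proposals and from others', giving about 2 * p * (n - group_size) in total.
    p = expected_out_ties / (2.0 * (n - group_size))
    for i in range(n):
        for j in bernoulli_hits(rng, n, p):
            if j // group_size != i // group_size:  # ignore draws that land in i's own group
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

With the defaults (2000 groups of 5), the assumed scaling gives a tie probability of 2 / (2 × 9995), which is approximately the 0.0001 stated in the text. Calling `simulate_network(n_groups=2400)` or `simulate_network(n_groups=1600)` would give the larger and smaller networks under the same assumption.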
On each of these three fixed networks, we simulated 200 epidemics under the null of no infectiousness or contagion effect and 200 epidemics under the alternative. For each simulation, we assigned vaccination statuses to each individual in the network with probability 0.5. We then simulated the behavior of each group during a flu epidemic over 100 days. An uninfected node had a probability of $p_o = 0.01$ of being infected from outside of the network on day 1, and there were no outside infections thereafter. Under the alternative, on each day an uninfected node had a baseline probability of $p_u = 0.5$ of being infected by any infectious, unvaccinated contact and a baseline probability of $p_v = 0.01$ of being infected by any infectious, vaccinated contact. If vaccinated, an individual's probability of being infected by any source was multiplied by $\delta = 0.2$. Under the null, on each day an uninfected node had a probability of $p_u = p_v = 0.5$ of being infected by any infectious contact (that is, any node with which it shared a tie). To ensure that the contagion effect was null, we specified that $\delta = 1$, that is, that vaccination had no protective effect against contracting the flu. In both settings, if infected on day $t$ an individual was infectious from day $t + 1$ through day $t + 4$ and incapable of being infected or transmitting infection from day $t + 5$ until the end of follow-up.

For each simulation, we estimated the infectiousness and contagion effects following the procedure described in Section 5.2. We evaluated these effects at the sample mean value of the covariates $U_a^{T-b}$, $L_a^{T-b}$, $U_e^{T+f}$, and $L_e^{T+f}$. We bootstrapped the standard errors with 1000 bootstrap replications. The results are given in Table 2.
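The epidemic mechanism described above might be simulated along the following lines. This is a hedged sketch rather than the simulation code behind the reported results: the function name `simulate_epidemic` and its argument names are hypothetical, each infectious contact is assumed to act independently of the others, and the multiplier $\delta$ is assumed to apply to outside infection on day 1 as well as to infection by contacts. The defaults correspond to the alternative setting; the null setting would use `p_u=0.5, p_v=0.5, delta=1.0`.

```python
import random


def simulate_epidemic(adj, p_o=0.01, p_u=0.5, p_v=0.01, delta=0.2,
                      n_days=100, infectious_days=4, seed=0):
    """Simulate one epidemic on a fixed network.

    adj is a list of neighbor sets. Returns (vaccinated, infected_on), where
    infected_on[i] is the day node i was infected, or None if never infected.
    """
    rng = random.Random(seed)
    n = len(adj)
    vaccinated = [rng.random() < 0.5 for _ in range(n)]
    infected_on = [None] * n

    def susceptibility(i):
        # Assumption: vaccination multiplies the infection probability from any source by delta.
        return delta if vaccinated[i] else 1.0

    # Day 1: infections from outside the network only.
    for i in range(n):
        if rng.random() < p_o * susceptibility(i):
            infected_on[i] = 1

    for day in range(2, n_days + 1):
        # A node infected on day t is infectious on days t + 1 through t + infectious_days.
        infectious = [j for j in range(n)
                      if infected_on[j] is not None
                      and 1 <= day - infected_on[j] <= infectious_days]
        if not infectious:
            break  # no outside infections after day 1, so the epidemic has ended
        newly_infected = set()
        for j in infectious:
            p_contact = p_v if vaccinated[j] else p_u
            for i in adj[j]:
                # Previously infected nodes can never be reinfected.
                if infected_on[i] is None and i not in newly_infected:
                    if rng.random() < p_contact * susceptibility(i):
                        newly_infected.add(i)
        for i in newly_infected:
            infected_on[i] = day
    return vaccinated, infected_on
```

Nodes infected on a given day are recorded only after all exposures for that day have been processed, so a newly infected node cannot transmit until the following day, in line with the one-day incubation period.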
Table 2: Simulation results for network data

Under H0
Network size    Infectiousness (SE)   Coverage   Contagion (SE)   Coverage
8000 nodes      0.996 (0.001)         100%       1.205 (1.657)    96%
10000 nodes     1.000 (0.001)         100%       1.183 (1.183)    94%
12000 nodes     1.001 (0.001)         100%       1.166 (0.)       94%

Under HA
Network size    Infectiousness (SE)   Power      Contagion (SE)   Power
8000 nodes      0.650 (0.259)         45%        0.168 (0.017)    99%
10000 nodes     0.616 (0.072)         53%        0.164 (0.013)    100%
12000 nodes     0.609 (0.054)         63%        0.164 (0.010)    100%

For each simulation setting, that is, for each network size and for both the null hypothesis and the alternative hypothesis, we present the mean point estimates for the infectiousness and contagion effects on the ratio scale, the mean bootstrap standard error estimator, and the percent coverage of the 95% confidence interval based on the 2.5th and 97.5th bootstrap quantiles. For simulations under the alternative hypothesis we calculated the power, given by 100% minus the percent coverage of the null value. For the 8000- and 10000-node networks, there were 6 and 1 simulations, respectively, out of 200, for which the GLMs used to estimate the parameters involved in the contagion and infectiousness effects did not converge due to empty strata of the predictors. We omit these simulations from the results in Table 2, but note that in extreme cases convergence could be an issue in addition to power. The point estimates are stable across network sizes and the coverage of the basic bootstrap confidence interval is close to or above 95% under the null for all network sizes. The power under the alternative is close to 100% for the contagion effect for all network sizes, but for the infectiousness effect power is low: 45% for the network of size 8000, increasing to 63% for the network of size 12000.
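The bootstrap summaries reported in the tables (standard error, 95% interval from the 2.5th and 97.5th bootstrap quantiles, and power as the percentage of intervals that exclude the null value) can be illustrated with a generic sketch. The function names `percentile_ci` and `percent_excluding` are hypothetical, and `estimator` stands in for whatever procedure produces the effect estimate; here it is any function of a resampled list of units, not the GLM-based estimator of Section 5.2.

```python
import random
import statistics


def percentile_ci(units, estimator, n_boot=500, seed=0):
    """Resample units with replacement and return (lower, upper, bootstrap_se).

    lower and upper are the 2.5th and 97.5th percentiles of the bootstrap
    estimates; bootstrap_se is their standard deviation.
    """
    rng = random.Random(seed)
    n = len(units)
    estimates = []
    for _ in range(n_boot):
        resample = [units[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    # With n=40 the cut points fall at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return cuts[0], cuts[-1], statistics.stdev(estimates)


def percent_excluding(intervals, null_value=1.0):
    """Percent of (lower, upper) intervals that do not contain null_value.

    Under the alternative this is the empirical power, that is, 100% minus
    the percent coverage of the null value.
    """
    excluded = sum(1 for lo, hi in intervals if not (lo <= null_value <= hi))
    return 100.0 * excluded / len(intervals)
```

In a simulation study of this kind, one interval would be computed per simulated epidemic and the resulting list passed to `percent_excluding` with the null value of 1 on the ratio scale.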
One concern that has been raised about previous uses of statistical models like GLMs and GEEs for network data is the possibility that the models lack any power to reject the null hypothesis when the alternative is true (Shalizi, 2012). This is a concern because the models are inherently misspecified under the alternative hypothesis, even if they are correctly specified under the null hypothesis. Because the methods we propose here can be correctly specified under both the null and the alternative hypotheses, they can be powered to reject the null hypothesis when the infectiousness or contagion effect is present.

7 Discussion

We proposed methods for consistently estimating contagion and infectiousness effects in independent groups of arbitrary size; these methods are easy to implement and perform well in simulations. We extended our methodology to groups sampled from social network data, providing a theoretically justified method for using GLMs to analyze network data. Note that the principles we applied to GLMs can be applied to GEEs as well, resulting in correctly specified GEEs for network data. The principles that justify our use of GLMs to estimate the contagion and infectiousness effects are easily extended to any estimand for which GLMs would be a desirable modeling tool. However, our network data methods require a large amount of data and are not appropriate for small or dense networks. On the one hand, this highlights the fact that dependence among observations in networks reduces effective sample size and necessitates larger samples; on the other hand, methods should be developed that can harness more information from the data and increase the power to detect contagion, infectiousness, and other causal effects.

References

Airoldi, E., Toulis, P., Kao, E., and Rubin, D. B. “Estimation of causal peer influence effects.” In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, JMLR: W&CP, volume 28 (2013).

Ali, M. M. and Dwyer, D. S. “Estimating peer effects in adolescent smoking behavior: A longitudinal analysis.” Journal of Adolescent Health, 45(4):402–408 (2009).

Anderson, R. M., May, R. M., et al. “Vaccination and herd immunity to infectious diseases.” Nature, 318(6044):323–329 (1985).

Aronow, P. M. and Samii, C. “Estimating average causal effects under general interference.” In Summer Meeting of the Society for Political Methodology, University of North Carolina, Chapel Hill, July, 19–21. Citeseer (2012).

Bowers, J., Fredrickson, M. M., and Panagopoulos, C. “Reasoning about Interference Between Units: A General Framework.” Political Analysis, 21(1):97–124 (2013).

Breslow, N. E. “Generalized linear models: checking assumptions and strengthening conclusions.” Statistica Applicata, 8:23–41 (1996).

Cacioppo, J. T., Fowler, J. H., and Christakis, N. A. “Alone in the crowd: the structure and spread of loneliness in a large social network.” Journal of Personality and Social Psychology, 97(6):977 (2009).

Christakis, N. A. and Fowler, J. H. “The spread of obesity in a large social network over 32 years.” New England Journal of Medicine, 357(4):370–379 (2007).

Christakis, N. A. and Fowler, J. H. “The collective dynamics of smoking in a large social network.” New England Journal of Medicine, 358(21):2249–2258 (2008).

Christakis, N. A. and Fowler, J. H. “Social contagion theory: examining dynamic social networks and human behavior.” Statistics in Medicine, 32(4):556–577 (2013).

Cohen-Cole, E. and Fletcher, J. M. “Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic.” Journal of Health Economics, 27(5):1382–1387 (2008).

Eames, K. T. D. and Keeling, M. J. “Modeling dynamic and network heterogeneities in the spread of sexually transmitted diseases.” Proceedings of the National Academy of Sciences, 99(20):13330–13335 (2002).

Eames, K. T. D. and Keeling, M. J. “Monogamous networks and the spread of sexually transmitted diseases.” Mathematical Biosciences, 189(2):115–130 (2004).

Earn, D. J. D., Dushoff, J., and Levin, S. A. “Ecology and evolution of the flu.” Trends in Ecology & Evolution, 17(7):334–340 (2002).

Eubank, S., Guclu, H., Kumar, V. S. A., Marathe, M. V., Srinivasan, A., Toroczkai, Z., and Wang, N. “Modelling disease outbreaks in realistic urban social networks.” Nature, 429(6988):180–184 (2004).

Fine, P. E. M. “Herd immunity: history, theory, practice.” Epidemiologic Reviews, 15(2):265–302 (1993).

Fowler, J. H. and Christakis, N. A. “Estimating peer effects on health in social networks: A response to Cohen-Cole and Fletcher; Trogdon, Nonnemaker, Pais.” Journal of Health Economics, 27(5):1400 (2008).

Gill, J. Generalized Linear Models: A Unified Approach, volume 134. Sage Publications, Inc (2001).

Halloran, M. E. and Hudgens, M. G. “Causal inference for vaccine effects on infectiousness.” International Journal of Biostatistics, 8(2) (2012).

Halloran, M. E. and Struchiner, C. J. “Study designs for dependent happenings.” Epidemiology, 2(5):331–338 (1991).

Halloran, M. E. and Struchiner, C. J. “Modeling transmission dynamics of stage-specific malaria vaccines.” Parasitology Today, 8(3):77–85 (1992).

Halloran, M. E. and Struchiner, C. J. “Causal inference in infectious diseases.” Epidemiology, 6(2):142–151 (1995).

Imai, K., Keele, L., and Tingley, D. “A general approach to causal mediation analysis.” Psychological Methods, 15(4):309–334 (2010).

John, T. J. and Samuel, R. “Herd immunity and herd effect: new insights and definitions.” European Journal of Epidemiology, 16(7):601–606 (2000).

Keeling, M. J. and Eames, K. T. “Networks and epidemic models.” Journal of the Royal Society Interface, 2(4):295–307 (2005).

Keller, M. A. and Stiehm, E. R. “Passive immunity in prevention and treatment of infectious diseases.” Clinical Microbiology Reviews, 13(4):602–614 (2000).

Klovdahl, A. S. “Social networks and the spread of infectious diseases: the AIDS example.” Social Science and Medicine, 21(11):1203–1216 (1985).

Klovdahl, A. S., Potterat, J. J., Woodhouse, D. E., Muth, J. B., Muth, S. Q., and Darrow, W. W. “Social networks and infectious disease: The Colorado Springs study.” Social Science and Medicine, 38(1):79–88 (1994).

Latora, V., Nyamba, A., Simpore, J., Sylvette, B., Diane, S., Sylvere, B., and Musumeci, S. “Network of sexual contacts and sexually transmitted HIV infection in Burkina Faso.” Journal of Medical Virology, 78(6):724–729 (2006).

Lazer, D., Rubineau, B., Chetkovich, C., Katz, N., and Neblo, M. “The coevolution of networks and political attitudes.” Political Communication, 27(3):248–274 (2010).

Lyons, R. “The spread of evidence-poor medicine via flawed social-network analysis.” Statistics, Politics, and Policy, 2(1) (2011).

Noel, H. and Nyhan, B. “The unfriending problem: The consequences of homophily in friendship retention for causal estimates of social influence.” Social Networks, 33(3):211–218 (2011).

O’Brien, K. and Dagan, R. “The potential indirect effect of conjugate pneumococcal vaccines.” Vaccine, 21(17-18):1815–1825 (2003).

Ogburn, E. L. and VanderWeele, T. J. “Causal diagrams for interference.” Technical report (2013).

Pearl, J. “Direct and indirect effects.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 411–420 (2001).

Robins, J. M. and Greenland, S. “Identifiability and exchangeability for direct and indirect effects.” Epidemiology, 3(2):143–155 (1992).

Robins, J. M. and Richardson, T. S. “Alternative graphical causal models and the identification of direct effects.” In Shrout, P. (ed.), Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures. Oxford University Press (2010).

Rosenbaum, P. “Interference between units in randomized experiments.” Journal of the American Statistical Association, 102(477):191–200 (2007).

Rosenquist, J. N., Murabito, J., Fowler, J. H., and Christakis, N. A. “The spread of alcohol consumption behavior in a large social network.” Annals of Internal Medicine, 152(7):426–433 (2010).

Shalizi, C. R. “Comment on "Why and When 'Flawed' Social Network Analyses Still Yield Valid Tests of no Contagion".” Statistics, Politics, and Policy, 3(1) (2012).

Shalizi, C. R. and Thomas, A. C. “Homophily and contagion are generically confounded in observational social network studies.” Sociological Methods & Research, 40(2):211–239 (2011).

Valeri, L. and VanderWeele, T. J. “Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros.” Psychological Methods, 18(2):137–150 (2013).

VanderWeele, T. J. “Sensitivity analysis for contagion effects in social networks.” Sociological Methods & Research, 40(2):240–255 (2011).

VanderWeele, T. J., Ogburn, E. L., and Tchetgen Tchetgen, E. J. “Why and When "Flawed" Social Network Analyses Still Yield Valid Tests of no Contagion.” Statistics, Politics, and Policy, 3(1):1–11 (2012a).

VanderWeele, T. J. and Tchetgen Tchetgen, E. J. “Bounding the Infectiousness Effect in Vaccine Trials.” Epidemiology, 22(5):686 (2011a).

VanderWeele, T. J. and Tchetgen Tchetgen, E. J. “Effect partitioning under interference in two-stage randomized vaccine trials.” Statistics & Probability Letters, 81(7):861–869 (2011b).

VanderWeele, T. J., Tchetgen Tchetgen, E. J., and Halloran, M. E. “Components of the indirect effect in vaccine trials: identification of contagion and infectiousness effects.” Epidemiology, 23(5):751–761 (2012b).