Vaccines, Contagion, and Social Networks

Elizabeth L. Ogburn, Tyler J. VanderWeele∗

Abstract

Consider the causal effect that one individual’s treatment may have on another individual’s outcome when the outcome is contagious, with specific application to the effect of vaccination on an infectious disease outcome. The effect of one individual’s vaccination on another’s outcome can be decomposed into two different causal effects, called the “infectiousness” and “contagion” effects. We present identifying assumptions and estimation or testing procedures for infectiousness and contagion effects in two different settings: (1) using data sampled from independent groups of observations, and (2) using data collected from a single interdependent social network. The methods that we propose for social network data require fitting generalized linear models (GLMs). GLMs and other statistical models that require independence across subjects have been used widely to estimate causal effects in social network data, but, because the subjects in networks are presumably not independent, the use of such models is generally invalid, resulting in inference that is expected to be anticonservative. We introduce a way to ensure that GLM residuals are uncorrelated across subjects despite the fact that outcomes are non-independent. This simultaneously demonstrates the possibility of using GLMs and related statistical models for network data and highlights their limitations.

1 Introduction

We are concerned here with the effect that one individual’s treatment may have on another individual’s outcome, when the outcome is contagious. In the infectious disease literature, this is often called an indirect effect of treatment (Halloran and Struchiner, 1991), while the effect of an individual’s treatment on his own outcome is a direct effect.
Indirect effects of infectious disease interventions are of significant importance for understanding infectious disease dynamics and for designing public health interventions. For example, the goal of many vaccination programs is to achieve herd immunity, whereby a large enough subset of a population is vaccinated that even those individuals who remain unvaccinated are protected against infection. This is one type of indirect effect of a vaccination program; it has been extensively studied in the infectious disease literature (Anderson et al., 1985; Fine, 1993; John and Samuel, 2000; O’Brien and Dagan, 2003). Recently, interest has turned towards the identification and estimation of average individual-level indirect effects (Halloran and Struchiner, 1991, 1995; Halloran and Hudgens, 2012; VanderWeele and Tchetgen Tchetgen, 2011a; VanderWeele et al., 2012b; VanderWeele and Tchetgen Tchetgen, 2011b), such as the effect on a single member of a community of two different vaccination programs implemented on the rest of the community (Halloran and Struchiner, 1995).

∗ Elizabeth Ogburn (email: eogburn@jhsph.edu) is Assistant Professor, Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205; Tyler VanderWeele is Professor, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA 02115. Dr. Ogburn’s research was supported by grants U54 GM088558 and ES017678 from the National Institutes of Health. Dr. VanderWeele’s research was supported by grant ES017678 from the National Institutes of Health.

VanderWeele et al. (2012b) demonstrated that the individual-level indirect effect of vaccination in communities of size two can be decomposed into two different effects, called the “infectiousness” and “contagion” effects.
These two effects represent distinct causal pathways by which one person’s vaccination may affect another’s disease status. The contagion effect is the indirect effect that vaccinating one individual may have on another by preventing the vaccinated individual from getting the disease and thereby from passing it on. The infectiousness effect is the indirect effect that vaccination might have if, instead of preventing the vaccinated individual from getting the disease, it renders the disease less infectious, thereby reducing the probability that a vaccinated individual who does become infected transmits the disease.

VanderWeele et al. (2012b) only considered estimation of the infectiousness and contagion effects in a sample comprised of independent households of size two with one member of each household assumed to be homebound. The assumption that one individual is homebound and the assumption of independent households are restrictive, the latter because it requires that the households be sampled from distinct communities and geographic areas. Ogburn and VanderWeele (2013) considered the setting in which households are independent but both individuals may be exposed outside the household. Here, we relax the requirement of independent households of size two and provide extensions to independent groups of arbitrary size and to social networks.

Increasingly, data are available on the spread of contagious outcomes through social networks. This setting is considerably more complex than that considered in VanderWeele et al. (2012b), because the observed outcomes (e.g. disease status) are not independent of one another. There is a growing literature on the possibility of testing for the presence of different causal mechanisms using observational data from social networks and a consensus that more rigorous methods are needed.
An emerging body of work reports results from generalized linear models (GLMs) and, for longitudinal data, generalized estimating equations (GEEs) as estimates of peer effects, or the causal effect that one individual’s outcome may have on his or her social contacts’ outcomes (Ali and Dwyer, 2009; Cacioppo et al., 2009; Christakis and Fowler, 2007, 2008, 2013; Fowler and Christakis, 2008; Lazer et al., 2010; Rosenquist et al., 2010). This work has come under criticism that can largely be summarized into two overarching themes. First, much of the criticism focuses on the ability to control for confounding when estimating peer effects, and specifically on the identifying assumptions that are required in order to tell the difference between the well known problem of homophily (the phenomenon by which individuals with similar traits are more likely to form social ties with one another) and peer influence (Cohen-Cole and Fletcher, 2008; Lyons, 2011; Noel and Nyhan, 2011; Shalizi and Thomas, 2011; VanderWeele, 2011). Homophily will not be an issue in many infectious disease settings, as many such illnesses, for example the seasonal flu, are unlikely to change the nature of social ties. Adequate control for confounding is still crucial, but we assume throughout that all potential confounders of the causal effects of interest are observed. This assumption should be assessed in any application of these methods and it may not hold in many real data settings; however, we do not focus on this assumption in the remainder of this paper.

The second class of criticisms addresses the use of statistical models for independent observations in this dependent data setting. Lyons (2011) and VanderWeele et al.
(2012a) demonstrated the importance of ensuring that models are coherent when an observation can be both an outcome and a predictor (of social contacts’ outcomes); this is easily accomplished by using the observations at one time point as predictors and the observations at a subsequent time point as outcomes, a solution that was implemented in many of the applications of GLMs and GEEs to social network data referenced above. More challenging is the fact that, when an analysis assumes independence but observations are in fact positively correlated, as we would expect them to be for contagious outcomes in a social network, the resulting standard errors and statistical inference will generally be anticonservative. In some cases, the assumption of independent outcomes may hold under the null hypothesis (VanderWeele et al., 2012a), but it is unknown whether tests that rely on this fact have any power to detect the presence of the causal effects of interest (Shalizi, 2012).

Our contribution to methodology for social network analysis is to adapt GLMs to ensure that the models can be correctly specified, with uncorrelated residuals, even when the outcome is contagious. We demonstrate the possibility of testing for the presence of contagion and infectiousness effects using social network data and GLMs. We discuss the paradigmatic example of the effect of a vaccination on an infectious disease outcome, but effects like contagion and infectiousness are of interest in other settings as well. Our general approach to correctly specifying GLMs for a contagious outcome using network data could potentially be applied to any estimand for which GLMs are appropriate under independence. The tests that we propose have important limitations, most notably low power to detect effects unless networks are large and/or sparse.
However, this work represents an important proof of concept in the ongoing endeavor to develop methods for valid inference using data collected from a single network. Furthermore, it clarifies the issues of model misspecification and invalid standard errors raised by previous proposals for using GLMs to assess peer effects using network data.

2 Social networks and contagion

Formally, a social network is a collection of individuals and the ties between them. The presence of a tie between two individuals indicates that the individuals share some kind of relationship; what types of relationships are encoded by network ties depends on the context. For example, we might define a network tie to include familial relatedness, friendship, and shared place of work. Some types of relationships are mutual, for example familial relatedness and shared place of work. Others, like friendship, may go in only one direction: Tom may consider Sue to be his friend, while Sue does not consider Tom to be her friend. We will assume that all ties in our network are mutual or undirected, but the principles of our method extend to directed ties. A node whose characteristics we wish to explain is called an ego; nodes that share ties with the ego are its alters or contacts. If an ego’s outcome may be affected by his contacts’ outcomes, then we say that the outcome exhibits induction or contagion.

Social networks are crucial to understanding many features of infectious disease dynamics, and, increasingly, infectious disease researchers draw on social network data to refine their understanding of transmission patterns and treatment effects.
For example, many mathematical models of infectious disease now incorporate social network structure, whereas they previously generally assumed uniform mixing among members of a community (Eubank et al., 2004; Klovdahl, 1985; Klovdahl et al., 1994; Keeling and Eames, 2005), and researchers collect data on sexual contact networks, since properties of these networks can inform strategies for controlling sexually transmitted diseases (Latora et al., 2006; Eames and Keeling, 2002, 2004).

It is desirable for a number of reasons to study infectiousness and contagion in the context of social networks rather than in independent communities. First, social network data may be easier to collect or to access than data on independent communities, as the latter setting requires sampling from a large number of different locations or contexts that are separated by time or space. Second, assessing whether traits can be transmitted from one individual to another through network ties is one of the central questions in the study of social networks; assessing infectiousness and contagion contributes further insight into this problem. Finally, social network data more realistically capture the true interdependencies of the individuals whom we can hope to treat with any public health intervention. Vaccine programs do not in general target distant, independent pairs of individuals; they target villages, cities, or communities in which individuals are interconnected and their outcomes correlated. Therefore, assessing the presence of vaccine effects in social network data may be more informative for real-world applications. The methods we present here represent a first step towards being able to estimate and perform inference about such effects using social network data.

Interventions to prevent infectious diseases generally operate in two ways.
Some reduce the susceptibility of treated individuals to the disease, thereby preventing them from becoming infected. Examples of such interventions are vaccines for tetanus, hepatitis A and B, rabies, and measles (Keller and Stiehm, 2000). These vaccines have indirect effects that operate via contagion effects. Other interventions may reduce the likelihood that an infected individual passes on his infection to others. The malaria transmission-blocking vaccine is designed to prevent mosquitos from acquiring, and thereby from transmitting, malaria parasites upon biting infected individuals (Halloran and Struchiner, 1992). This vaccine has no protective effect for the vaccinated individual, but it renders vaccinated individuals less likely to transmit the disease. Therefore any indirect effect of the malaria transmission-blocking vaccine is due entirely to an infectiousness effect. Many interventions have indirect effects that operate via both contagion and infectiousness effects.

Existing methods for assessing causal effects using network data are limited. Some recent proposals give methods for assessing indirect effects when treatment can be randomized (Airoldi et al., 2013; Aronow and Samii, 2012; Bowers et al., 2013; Rosenbaum, 2007), but these methods are of limited use in observational settings or for teasing apart specific types of indirect effects like the infectiousness and contagion effects. Much of the extant literature relies on GLMs and GEEs, despite the fact that the key assumption of independent outcomes across subjects is unlikely to hold in social network settings (Lyons, 2011). In this paper, we introduce a way to ensure that GLM residuals are uncorrelated across subjects despite the fact that outcomes are non-independent; this facilitates the use of GLMs to assess infectiousness and contagion effects in social network contexts.
We demonstrate through simulations that our methods have some power to detect the presence of contagion and infectiousness effects. However, ensuring that residuals are uncorrelated requires several adaptations to naive GLMs, and these adaptations can result in low power. The applications that we discuss in this paper do not require the use of GEEs to account for within-subject dependence over time, but the general principles that we use to adapt GLMs to the network setting apply to GEEs as well.

3 Infectiousness and contagion in independent groups of size 2

3.1 Notation and assumptions

Consider $K$ households comprised of two individuals each and separated by space or time such that an infectious disease cannot be transmitted between individuals in different households. Borrowing terminology from the social network literature, we will refer to one individual as the alter, denoted $a$, and the other as the ego, denoted $e$. For now we assume that in each household the ego is unvaccinated, and that all vaccination occurs before the start of follow-up.

Contagion and infectiousness effects are analogous to causal mediation effects of the alter’s vaccination on the ego’s outcome, mediated by the alter’s disease status (VanderWeele et al., 2012b). We formally define these effects in the next section after first introducing key notation and identifying assumptions. For individual $i$ in household $k$, $i = a, e$, let $Y^{t}_{ik}$ be the outcome at time $t$ and $C_{ik}$ be a vector of covariates. Let $V_{ak}$ be an indicator of vaccination for the alter in household $k$. Below we omit the subscript $k$ when context allows. Define $Y^{t}_{ik}(v)$ to be the counterfactual outcome we would have observed for individual $i$ in household $k$ at time $t$ if, possibly contrary to fact, the alter had received treatment $v$. Let $M_k$ be a variable that lies on a causal pathway from $V_{ak}$ to $Y^{t}_{ek}$.
Let $Y^{t}_{ek}(v, m)$ be the counterfactual outcome for the ego at time $t$ that we would have observed if $V_{ak}$ had been set to $v$ and $M_k$ to $m$. Throughout we make the consistency assumptions that $M_k(v) = M_k$ when $V_{ak} = v$, that $Y^{t}_{ek}(v, m) = Y^{t}_{ek}$ when $V_{ak} = v$ and $M_k = m$, and that $Y^{t}_{ek}(v, M_k(v)) = Y^{t}_{ek}(v)$. Let $Y^{t}_{ek}(v, M_k(v'))$ be the counterfactual disease status for the ego in household $k$ that we would have observed at time $t$ if $V_{ak}$ had been set to $v$ and $M_k$ to its counterfactual value under $V_{ak} = v'$. To ensure that this counterfactual is well-defined, we assume that it is hypothetically possible to intervene on the mediator without intervening on $V_{ak}$. Let $C_k = (C_{ak}, C_{ek})$. In order to identify functionals of nested counterfactuals like $Y^{t}_{e}(v, M(v'))$ we require the following four assumptions (Pearl, 2001):

\[ Y^{t}_{e}(v, m) \perp V_a \mid C, \tag{1} \]
\[ Y^{t}_{e}(v, m) \perp M \mid V_a, C, \tag{2} \]
\[ M(v) \perp V_a \mid C, \tag{3} \]
and
\[ Y^{t}_{e}(v, m) \perp M(v') \mid C, \tag{4} \]

where $A \perp B \mid C$ denotes that $A$ is independent of $B$ conditional on $C$. Assumptions (1), (2), and (3) correspond to the absence of unmeasured confounders for the effects of the exposure on the outcome ($V_a$ on $Y^{t}_{e}$), of the mediator on the outcome ($M$ on $Y^{t}_{e}$), and of the exposure on the mediator ($V_a$ on $M$), respectively. Assumption (4) requires that no confounder of the effect of $M$ on $Y^{t}_{e}$ is affected by $V_a$. Discussion of these assumptions in the context of mediation analysis can be found in Pearl (2001). Discussion and extension of these assumptions to settings with interference or spillover effects can be found in Ogburn and VanderWeele (2013), including discussion of how to determine which covariates must be included in $C$.

3.2 Previous methodology for decomposing the indirect effect into infectiousness and contagion effects

3.2.1 Identification

VanderWeele et al.
(2012b) described the decomposition of indirect effects into contagion and infectiousness effects in communities of size two. They assumed that the outcome can only occur once for each individual during the follow-up period. This is a reasonable assumption for many infectious disease outcomes, for example for the common flu with a follow-up period consisting of a single flu season. They further assumed that in each pair the ego cannot be exposed to the disease except by the alter, as might be the case if the ego were homebound. Let $t_f$ be the time of the end of follow-up and $Y^{t_f}_{ek}$ be an indicator of whether the ego in household $k$ has had the disease by the end of follow-up. VanderWeele et al. (2012b) defined the population average indirect effect of vaccination on the ego as $E[Y^{t_f}_{e}(1)] - E[Y^{t_f}_{e}(0)]$, or the expected difference in the counterfactual disease status of the ego at end of follow-up when the alter is vaccinated compared to when the alter is not vaccinated.

When the ego is homebound, the indicator $Y^{t_f}_{ek}$ of whether the ego in household $k$ has had the disease by the end of follow-up is, equivalently, an indicator of whether the ego was infected by the alter in household $k$. In order to generalize the discussion of vaccine effects to settings in which the ego can be infected from outside the home, the outcome $Y^{t_f}_{ek}$ should be defined more precisely as the indicator of whether the ego was sick after the alter. Specifically, let

\[ Y^{t_f}_{ek} = I(\text{alter was sick at time } T < t_f \text{ and ego was sick at time } S,\ T < S \le t_f). \]

The contagion effect is the protective effect that treating one individual has on another’s disease status by preventing the treated individual from getting the disease and thereby from transmitting it. Let $T_k$ be the time of the first case of the disease in household $k$.
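The refined outcome indicator can be illustrated with a minimal Python sketch. The function name, the argument names, and the convention of encoding "never sick" as `None` are assumptions introduced here for illustration; they are not part of the original notation.

```python
from typing import Optional


def ego_sick_after_alter(alter_onset: Optional[float],
                         ego_onset: Optional[float],
                         t_f: float) -> int:
    """Indicator that the alter was sick at T < t_f and the ego at S, T < S <= t_f.

    Onset times are None for individuals never observed to be sick
    (an encoding assumed here purely for illustration).
    """
    if alter_onset is None or ego_onset is None:
        return 0
    return int(alter_onset < t_f and alter_onset < ego_onset <= t_f)


# Hypothetical household: alter sick on day 3, ego on day 5, follow-up ends on day 30.
print(ego_sick_after_alter(3.0, 5.0, 30.0))  # prints 1
```

An ego who is sick before the alter, or whose alter is never sick, contributes an outcome of 0 under this definition, which is what separates it from the simpler "ego ever sick" indicator.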
The contagion effect is akin to the effect of one individual’s treatment on another’s disease status as mediated by the first individual’s disease status. For the purposes of the analysis below, we define a disease case to begin when an individual becomes infectious. If infectiousness does not coincide with the appearance of disease symptoms then we may not observe the timing of disease cases directly, but we could infer the time based on when symptoms appear and on known disease dynamics. For example, an individual with the flu will generally be infectious one day before he is symptomatic (Earn et al., 2002). Therefore, if flu is the disease under study we would classify an individual as having the disease beginning one day before he reported having flu symptoms. We assume throughout that there are no asymptomatic carriers of the disease.

If neither individual in household $k$ is ever sick then we define $T_k$ to be the end of follow-up. Now $Y^{T_k}_{ak}$ is an indicator of whether the alter is sick at time $T_k$, i.e. an indicator of whether the alter is the first individual in the group to get sick; if neither individual gets sick then it will be 0. Let $T_k(v)$ be the time at which the first infection in household $k$ would have occurred if the alter had, possibly contrary to fact, had vaccine status $v$. Let $Y^{T_k(v)}_{ak}(v)$ be the counterfactual disease status of the alter at time $T_k(v)$ had he had vaccine status $v$. Let $Y^{t_f}_{ek} = I(\text{individual } e_k \text{ became infectious after time } T_k \text{ and on or before time } t_f)$. The contagion effect is given by a contrast in counterfactuals of the form $Y^{t_f}_{e}(v, Y^{T(v')}_{a}(v'))$ where, unlike in the mediation framework we described in Section 3.1, the variable $Y^{T(v')}_{a}$ that plays the role of mediator may be a different random variable in the two terms in the contrast.
Specifically, the population average contagion effect is

\[ E[Y^{t_f}_{e}(0, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(0)}_{a}(0))], \]

and $Y^{T(0)}_{a}$ and $Y^{T(1)}_{a}$ will be different random variables whenever $T(0) \neq T(1)$. This contrast is the difference in expected counterfactual outcomes for the ego when the vaccine status of the alter is held constant at 0 but his infection status is set to that under vaccination in the first term and to that under no vaccination in the second term of the contrast. It captures the effect that vaccination might have had on the disease status of the ego by preventing the alter from contracting the disease. The nested counterfactuals are well-defined because we can imagine intervening on $Y^{T_k}_{ak}$ without intervening on $V_{ak}$, for example by administering immune boosters to prevent the alter from being infected or by exposing the alter to a high dose of flu virus in a laboratory setting to cause infection.

The population average infectiousness effect is

\[ E[Y^{t_f}_{e}(1, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(1)}_{a}(1))]. \]

This is akin to the effect of one individual’s treatment on another’s disease status, not mediated through the first individual’s disease status. This effect operates if treatment renders cases of disease among treated individuals less likely to be transmitted. Suppose that the alter in group $k$ would get the flu first if vaccinated. That is, $Y^{T_k(1)}_{ak}(1) = 1$. Then the infectiousness effect is the difference in counterfactual outcomes for the ego comparing the scenario in which the alter is vaccinated and infected first with the scenario in which the alter is unvaccinated and infected first. If the alter in group $k$ would not get the flu first under vaccination, then the infectiousness effect for group $k$ is null.
By the consistency assumption we made in Section 3.1 above, $E[Y^{t_f}_{e}(1, Y^{T(1)}_{a}(1))] = E[Y^{t_f}_{e}(1)]$ and $E[Y^{t_f}_{e}(0, Y^{T(0)}_{a}(0))] = E[Y^{t_f}_{e}(0)]$. The indirect effect of the vaccination of the alter on the ego decomposes into the sum of the contagion and infectiousness effects as follows:

\[
\begin{aligned}
E[Y^{t_f}_{e}(1)] - E[Y^{t_f}_{e}(0)]
&= E[Y^{t_f}_{e}(1, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(0)}_{a}(0))] \\
&= E[Y^{t_f}_{e}(1, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(1)}_{a}(1))] \\
&\quad + E[Y^{t_f}_{e}(0, Y^{T(1)}_{a}(1))] - E[Y^{t_f}_{e}(0, Y^{T(0)}_{a}(0))].
\end{aligned}
\]

The assumptions made by VanderWeele et al. (2012b) allow for the identification of the infectiousness and contagion effects even if disease status is only observed at the end of follow-up. Because the ego cannot be infected except by the alter, $Y^{T_k}_{ak} = 0$ if and only if neither individual is observed to get sick and $Y^{T_k}_{ak} = 1$ if and only if $Y^{t_f}_{ak} = 1$. Therefore $Y^{t_f}_{ak}$ can be substituted for $Y^{T_k}_{ak}$ in the expressions above.

Ogburn and VanderWeele (2013) gave identifying assumptions for the infectiousness and contagion effects in groups of size two when the time of infections is observed. They did not assume that only one member of each pair is exposed from outside of the group; instead they assumed that the probability of the ego contracting the disease within a fixed follow-up interval if exposed at time $t$ is constant in $t$. This ensures that the time of the first infection $T$ is not a confounder of the mediator-outcome relationship, which would constitute a violation of assumption (4) because $T$ is affected by $V_a$.

Suppose that the two members of each pair are distinguishable from one another, for example parent-child pairs. We select one of the two to be the alter (e.g. the parent) and the other is the ego (the child).
Alternatively, if the individuals are exchangeable, that is, if we have no reason to think that the indirect effect and its components will be different for one than for the other, then we can randomly choose which subject is the alter and which is the ego. Ogburn and VanderWeele (2013) defined the indicator $Y^{T+s}_{e}$ of whether the ego is sick after time $T$ and by time $T + s$ to be the outcome, where $s$ is a constant that allows $T$ to determine a new end of follow-up. This ensures that $T$ does not confound the mediator-outcome relationship. The constant $s$ should be chosen to be the sum of the infectious period ($f$) and the incubation period ($b$) of the disease under study. The infectious period is the length of time during which an infected individual is infectious, and the incubation period is the length of time between being infected and becoming infectious. If the alter becomes infectious at time $T_k$, then he can infect the ego until time $T_k + f$. If infected at time $T_k + f$, the ego will become infectious at time $T_k + f + b = T_k + s$. Therefore if the alter infects the ego, the ego must be infectious by time $T_k + s$. We assume throughout that the time to efficacy of vaccine is immediate and that the infectious and incubation periods are constant across individuals.

Let $Y^{T_k(v')+s}_{ek}(v, Y^{T_k(v')}_{ak}(v'))$ be the counterfactual outcome we would have observed for the ego in group $k$ at time $T_k(v') + s$ if the alter’s vaccine status were set to $v$ and the alter’s disease status at time $T_k(v')$ were set to its counterfactual under vaccine status $v'$. The average contagion effect in this setting is given by

\[ E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] - E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))] \]

and the average infectiousness effect by

\[ E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))] - E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))]. \]
The sum of these two effects is the average indirect effect $E[Y^{T(1)+s}_{e}(1)] - E[Y^{T(0)+s}_{e}(0)]$. Although the disease status of the ego is measured $s$ days after the first infection instead of at the end of follow-up, this indirect effect still captures any effect that the alter’s vaccination status can have on the ego’s disease status, because after time $T + s$ any change in the disease status of the ego cannot be caused by $V_a$.

So far we have described all effects on the difference scale, but everything we have written applies equally to effects on the ratio and odds ratio scales. On the ratio and odds ratio scales the indirect effect of vaccination decomposes into a product of the contagion and infectiousness effects. On the ratio scale, the average indirect effect of $V_a$ on the disease status of the ego is $E[Y^{T(1)+s}_{e}(1)] / E[Y^{T(0)+s}_{e}(0)]$, which is a product of the average infectiousness effect, $E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))] / E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))]$, and the average contagion effect, $E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] / E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))]$. On the odds ratio scale for a binary outcome the decomposition is

\[
\begin{aligned}
&\frac{E[Y^{T(1)+s}_{e}(1)] \, \{1 - E[Y^{T(0)+s}_{e}(0)]\}}{E[Y^{T(0)+s}_{e}(0)] \, \{1 - E[Y^{T(1)+s}_{e}(1)]\}} \\
&\quad = \frac{E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))] \, \{1 - E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))]\}}{E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] \, \{1 - E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))]\}} \\
&\quad \times \frac{E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))] \, \{1 - E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))]\}}{E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))] \, \{1 - E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))]\}}
\end{aligned}
\]

where the first line is the indirect effect, the second line is the infectiousness effect, and the third line is the contagion effect.
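The three decompositions can be checked numerically. The sketch below is only an illustration: it assumes the three counterfactual means were known, which they never are in practice, and the names `mu_11`, `mu_01`, `mu_00` (standing for $E[Y^{T(1)+s}_{e}(1, Y^{T(1)}_{a}(1))]$, $E[Y^{T(1)+s}_{e}(0, Y^{T(1)}_{a}(1))]$, and $E[Y^{T(0)+s}_{e}(0, Y^{T(0)}_{a}(0))]$) as well as the numerical values are hypothetical.

```python
import math


def odds(p: float) -> float:
    """Odds corresponding to a probability p in (0, 1)."""
    return p / (1.0 - p)


def decompose_indirect_effect(mu_11: float, mu_01: float, mu_00: float) -> dict:
    """Infectiousness, contagion, and indirect effects on three scales.

    mu_vw denotes the mean ego outcome when the alter's vaccine status is
    set to v and the alter's disease status to its value under status w.
    """
    for mu in (mu_11, mu_01, mu_00):
        if not 0.0 < mu < 1.0:
            raise ValueError("counterfactual means must lie strictly between 0 and 1")
    return {
        "difference": {
            "infectiousness": mu_11 - mu_01,
            "contagion": mu_01 - mu_00,
            "indirect": mu_11 - mu_00,
        },
        "ratio": {
            "infectiousness": mu_11 / mu_01,
            "contagion": mu_01 / mu_00,
            "indirect": mu_11 / mu_00,
        },
        "odds_ratio": {
            "infectiousness": odds(mu_11) / odds(mu_01),
            "contagion": odds(mu_01) / odds(mu_00),
            "indirect": odds(mu_11) / odds(mu_00),
        },
    }


# Hypothetical counterfactual means, chosen only to illustrate the algebra.
effects = decompose_indirect_effect(mu_11=0.10, mu_01=0.15, mu_00=0.30)
for scale, parts in effects.items():
    print(scale, parts)
```

On the difference scale the two components sum to the indirect effect; on the ratio and odds ratio scales they multiply to it, because the intermediate term involving `mu_01` cancels.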
3.2.2 Estimation

The contagion and infectiousness effects are analogous to the natural indirect and direct effects, respectively, of the effect of $V_a$ on $Y^{T+s}_{e}$ with $Y^{T}_{a}$ as the mediator. Natural indirect and direct effects have been written about extensively in the causal inference and mediation literature (see e.g. Pearl, 2001; Robins and Greenland, 1992; Robins and Richardson, 2010) and it is well known how to estimate them in a variety of settings (Imai et al., 2010; Valeri and VanderWeele, 2013). This setting differs from those considered by other authors because the outcome $Y^{T+s}_{e}$ is, by definition, equal to 0 whenever $Y^{T}_{a}$ is equal to 0; therefore one must be careful to ensure that any model specified for $E[Y^{T+s}_{e} \mid V_a, Y^{T}_{a}, C]$ is consistent with this restriction. VanderWeele et al. (2012b) describe how to estimate the contagion and infectiousness effects on the ratio scale in households of size two when one individual is homebound, but the procedure they present overlooks this restriction and therefore the models they suggest may fail to converge.

We describe a procedure for estimating the contagion and infectiousness effects that is appropriate for the setting considered in VanderWeele et al. (2012b) and for the setting in which neither individual is assumed to be homebound. We describe estimation of the effects on the difference and ratio scales. Estimation of effects on the odds ratio scale is also possible.

Suppose that assumptions (1) through (4) hold for the effect of $V_a$ on $Y^{T+s}_{e}$ with $Y^{T}_{a}$ as the mediator and covariates $C$, and that the following two models are correctly specified:

\[ \log\{E[Y^{T+s}_{e} \mid V_a, Y^{T}_{a} = 1, C]\} = \gamma_0 + \gamma_1 V_a + \gamma_2' C \tag{5} \]
\[ \operatorname{logit}\{E[Y^{T}_{a} \mid V_a, C]\} = \eta_0 + \eta_1 V_a + \eta_2' C. \tag{6} \]

If the outcome is rare then (5) can be replaced with a logistic model.
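To make the role of the structural restriction concrete, the following Python sketch simulates pairs under models (5) and (6) in the simplest case with no covariates, and then recovers the parameters. It is an illustration under stated assumptions rather than the estimation procedure of the paper: the function names and parameter values are hypothetical, and the fit uses the fact that with a single binary regressor both models are saturated, so their maximum likelihood estimates are functions of empirical proportions. Note that the outcome model is fit only among pairs with $Y^{T}_{a} = 1$.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate_pairs(n, eta0, eta1, gamma0, gamma1, seed=0):
    """Simulate (V_a, Y_a^T, Y_e^{T+s}) under models (5) and (6), no covariates."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        v_a = rng.randint(0, 1)                       # alter's vaccination status
        y_a = int(rng.random() < expit(eta0 + eta1 * v_a))   # model (6)
        # Model (5) applies only when the alter is sick first; otherwise Y_e is 0.
        p_e = math.exp(gamma0 + gamma1 * v_a) if y_a == 1 else 0.0
        y_e = int(rng.random() < p_e)
        pairs.append((v_a, y_a, y_e))
    return pairs


def fit_saturated(pairs):
    """Closed-form fits: logistic model for Y_a, log-linear model for Y_e given Y_a = 1."""
    def mean(values):
        return sum(values) / len(values)

    p_alter = {v: mean([y_a for v_a, y_a, _ in pairs if v_a == v]) for v in (0, 1)}
    # Restrict to the Y_a = 1 stratum, as model (5) requires.
    p_ego = {v: mean([y_e for v_a, y_a, y_e in pairs if v_a == v and y_a == 1])
             for v in (0, 1)}
    eta0 = logit(p_alter[0])
    gamma0 = math.log(p_ego[0])
    return {
        "eta0": eta0,
        "eta1": logit(p_alter[1]) - eta0,
        "gamma0": gamma0,
        "gamma1": math.log(p_ego[1]) - gamma0,
    }


# Hypothetical parameter values chosen so that exp(gamma0 + gamma1) stays below 1.
pairs = simulate_pairs(20000, eta0=0.0, eta1=-1.0,
                       gamma0=math.log(0.5), gamma1=math.log(0.6), seed=1)
print(fit_saturated(pairs))
```

With covariates in $C$ the closed forms no longer apply and the two models would be fit by iterative maximum likelihood in a GLM routine, still restricting model (5) to the $Y^{T}_{a} = 1$ stratum.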
The contagion effect conditional on covariates $C = c$ on the difference scale is given by

$$
\begin{aligned}
&E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid c] - E[Y_e^{T(0)+s}(0, Y_a^{T(0)}(0)) \mid c] \\
&\quad = 0 + E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c]\,\{E[Y_a^T \mid V_a = 1, c] - E[Y_a^T \mid V_a = 0, c]\} \\
&\quad = e^{\gamma_0 + \gamma_2' c}\left(\frac{e^{\eta_0 + \eta_1 + \eta_2' c}}{1 + e^{\eta_0 + \eta_1 + \eta_2' c}} - \frac{e^{\eta_0 + \eta_2' c}}{1 + e^{\eta_0 + \eta_2' c}}\right),
\end{aligned}
$$

and the infectiousness effect conditional on covariates $C = c$ is given by

$$
\begin{aligned}
&E[Y_e^{T(1)+s}(1, Y_a^{T(1)}(1)) \mid c] - E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid c] \\
&\quad = 0 + E[Y_a^T \mid V_a = 1, c]\,\{E[Y_e^{T+s} \mid V_a = 1, Y_a^T = 1, c] - E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c]\} \\
&\quad = \frac{e^{\eta_0 + \eta_1 + \eta_2' c}}{1 + e^{\eta_0 + \eta_1 + \eta_2' c}}\left\{e^{\gamma_0 + \gamma_1 + \gamma_2' c} - e^{\gamma_0 + \gamma_2' c}\right\}.
\end{aligned}
$$

In both expressions the terms involving $V_a$ are evaluated at $V_a = 1$, and the leading 0 is the contribution of the $Y_a^T = 0$ stratum, in which $Y_e^{T+s}$ is 0 by definition.

The contagion and infectiousness effects can be estimated by fitting models (5) and (6) and plugging the parameter estimates into the expressions above. The standard errors for these estimates can be bootstrapped or derived using the delta method (similar to those derived in Valeri and VanderWeele, 2013 for the natural direct and indirect effects). Alternatively, a Monte Carlo based approach similar to Imai et al. (2010) can be used for estimation of the effects and their standard errors. Software packages like the SAS and SPSS mediation macros (Valeri and VanderWeele, 2013) or the R mediation package (Imai et al., 2010) cannot be used in this setting because instead of (5), which models the conditional expectation of $Y_e^{T+s}$ only in the $Y_a^T = 1$ stratum, these packages require fitting a model for $E[Y_e^{T+s} \mid V_a, Y_a^T, C]$.

If the ego can also be vaccinated then $V_e$ must be included in $C$. If $V_a$ interacts with $V_e$ or with any other covariates, these interactions can be incorporated into the models and pose no difficulty for estimation.

To test whether there is a contagion effect, we can simply test whether $\eta_1 = 0$.
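The plug-in step can be sketched in a few lines. This is a minimal illustration that assumes the coefficients of models (5) and (6) have already been estimated elsewhere; the function name `plug_in_effects` and the coefficient values in the usage lines are hypothetical. The ratio-scale entries are simply the ratios of the same fitted conditional means.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def plug_in_effects(gamma0, gamma1, gamma2, eta0, eta1, eta2, c):
    """Plug-in contagion and infectiousness effects at covariate value c.

    gamma*: coefficients of the log-link model (5) for the ego's outcome
            in the stratum where the alter was infected first.
    eta*:   coefficients of the logistic model (6) for the alter's outcome.
    gamma2, eta2, and c are sequences of equal length.
    """
    g_c = sum(g * x for g, x in zip(gamma2, c))
    h_c = sum(h * x for h, x in zip(eta2, c))

    # E[Y_e^{T+s} | V_a = v, Y_a^T = 1, c] under model (5)
    ego_v1 = math.exp(gamma0 + gamma1 + g_c)
    ego_v0 = math.exp(gamma0 + g_c)
    # E[Y_a^T | V_a = v, c] under model (6)
    alter_v1 = expit(eta0 + eta1 + h_c)
    alter_v0 = expit(eta0 + h_c)

    return {
        "contagion_diff": ego_v0 * (alter_v1 - alter_v0),
        "infectiousness_diff": alter_v1 * (ego_v1 - ego_v0),
        "contagion_ratio": alter_v1 / alter_v0,
        "infectiousness_ratio": ego_v1 / ego_v0,
    }


# Hypothetical coefficient values, for illustration only.
print(plug_in_effects(-1.0, -0.7, [0.2], -0.5, -1.2, [0.1], c=[1.0]))
```

Standard errors are not computed here; in practice the whole procedure (model fitting plus plug-in) would be repeated within each bootstrap resample.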
To test whether there is an infectiousness effect we can simply test whether $\gamma_1 = 0$.

Using the parameters of models (5) and (6) we can also estimate the contagion and infectiousness effects on the ratio scale. The contagion effect conditional on $C = c$ is given by

$$
\begin{aligned}
\frac{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid c]}{E[Y_e^{T(0)+s}(0, Y_a^{T(0)}(0)) \mid c]}
&= \frac{0 + E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c]\,E[Y_a^T \mid V_a = 1, c]}{0 + E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c]\,E[Y_a^T \mid V_a = 0, c]} \\
&= \frac{E[Y_a^T \mid V_a = 1, c]}{E[Y_a^T \mid V_a = 0, c]}
= \frac{e^{\eta_1} + e^{\eta_0 + \eta_1 + \eta_2' c}}{1 + e^{\eta_0 + \eta_1 + \eta_2' c}}
\end{aligned}
\tag{7}
$$

and the infectiousness effect is given by

$$
\begin{aligned}
\frac{E[Y_e^{T(1)+s}(1, Y_a^{T(1)}(1)) \mid c]}{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid c]}
&= \frac{0 + E[Y_e^{T+s} \mid V_a = 1, Y_a^T = 1, c]\,E[Y_a^T \mid V_a = 1, c]}{0 + E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c]\,E[Y_a^T \mid V_a = 1, c]} \\
&= \frac{E[Y_e^{T+s} \mid V_a = 1, Y_a^T = 1, c]}{E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c]}
= e^{\gamma_1}.
\end{aligned}
\tag{8}
$$

Under the restriction that $Y_e^{T+s} = 0$ whenever $Y_a^T = 0$, the contagion effect on the ratio scale is simply a measure of the effect of the alter's vaccination on the alter's outcome. It is mathematically undefined if $E[Y_e^{T+s} \mid V_a = 0, Y_a^T = 1, c] = 0$, that is, if the alter's outcome has no effect on the ego's outcome, but it is natural to define it to be equal to the null value of 1 in this case. The infectiousness effect on the ratio scale is simply a measure of the effect of the alter's vaccination on the ego's outcome among pairs in which the alter is sick first, that is, in the $Y_a^T = 1$ stratum.

4 Infectiousness and contagion in groups of more than two

Although allowing both individuals in a household to be infected from outside the household generalizes the results of VanderWeele et al.
(2012b), it still requires the strong assumption, inherent in the identifying assumptions described in Section 3, that the alter and ego do not share any potentially infectious contacts. If both of the individuals in a given household could be infected from outside the household by the same mutual friend, then that friend's disease status would be a confounder of the mediator-outcome relationship; if unobserved, it would constitute a violation of assumption (2). We can relax the assumption of no mutual contacts outside of the household by collecting data on any such contacts and controlling for them as covariates in our estimating procedure.

In this section, we consider identification and estimation of the contagion and infectiousness effects when independent groups of individuals are sampled. We assume that each group includes a pair of individuals who furnish the exposure, mediator, and outcome variables, plus all mutual and potentially infectious contacts of the pair. Several types of sampling procedures could give rise to this data structure. For example, one possibility would be to sample workplaces and randomly select two individuals to play the role of the alter and ego; another would be to sample household pairs first, ascertain the identities of potential mutual contacts outside of the home, and include all such contacts in the data collection moving forward. The sampling procedure does not affect the identification or estimation results described below.

Let $k$ index the $k$th group, $k = 1, \ldots, K$. Let $Y_{ki}^t$ be an indicator of whether individual $i$ in group $k$ has had the disease by day $t$. As in Section 3, we define a case of the disease to begin when the individual becomes infectious and let $s = f + b$ be the sum of the infectious and incubation periods for the disease. We assume that vaccination occurs before the start of follow-up.
Given a non-rare outcome like the flu and time measured in discrete intervals like days, it is likely that multiple individuals would be observed to get sick on the same day. We therefore do not make the assumption, made in Section 3, that no two individuals can be observed to get sick at the same time.

For group $k$, let $e_k$ index the ego, whose flu status we wish to study, and let $a_k$ index the alter, whose vaccination status may or may not have an effect on the ego's disease status. We index the other individuals in group $k$ by $1, 2, \ldots, n_k$. Let $T_k$ be the time of the first infection in the $k$th alter-ego pair. As in Section 3, the ego furnishes the outcome, $Y_{e_k}^{T_k+s}$. The alter furnishes the treatment, vaccine status $V_{a_k}$, and the mediator, indicator of first infection $Y_{a_k}^{T_k}$. When context allows, we omit the subscript $k$.

The definition of the mediator needs to be modified slightly to reflect the fact that the alter and the ego could get sick at the same time: let $Y_a^T$ be an indicator of whether the alter was sick and the ego healthy at time $T$. Let $Y_e^{T+s}$ be an indicator of whether the ego got sick between time $T + b$, which is the first time at which the alter could have infected the ego, and time $T + s$, which is the last time at which the alter could have infected the ego. This definition preserves the interpretation of $Y_a^T$ as an indicator that the alter was sick before the ego; if the ego and the alter simultaneously fell ill on day $T$ then $Y_a^T$ will be 0, which is desirable because the ego cannot have caught the disease from the alter if they both fell ill on the same day. It also preserves the restriction, discussed in Section 3, that $Y_e^{T+s}$ is equal to 0 whenever $Y_a^T$ is.
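The modified definitions of the mediator and outcome can be made concrete with a short sketch. It assumes that each individual's record is the day on which he became infectious (or `None` if that never happened during follow-up), that the window from $T + b$ to $T + s$ is inclusive at both ends, and that $b \geq 1$; the function name `mediator_and_outcome` is hypothetical.

```python
def mediator_and_outcome(alter_day, ego_day, b, f):
    """Return (T, Y_a^T, Y_e^{T+s}) for one alter-ego pair.

    alter_day, ego_day: day each individual became infectious, or None.
    b: incubation period, f: infectious period, so that s = f + b.
    """
    if b < 1:
        raise ValueError("this sketch assumes an incubation period b >= 1")
    s = f + b
    observed = [d for d in (alter_day, ego_day) if d is not None]
    if not observed:
        # Neither member of the pair was ever sick: T is undefined.
        return None, 0, 0
    T = min(observed)
    # Mediator: the alter was sick and the ego healthy at time T.
    y_a = int(alter_day == T and (ego_day is None or ego_day > T))
    # Outcome: the ego got sick between T + b and T + s (assumed inclusive).
    y_e = int(ego_day is not None and T + b <= ego_day <= T + s)
    return T, y_a, y_e


print(mediator_and_outcome(alter_day=3, ego_day=5, b=1, f=3))  # alter first
print(mediator_and_outcome(alter_day=3, ego_day=3, b=1, f=3))  # same day
```

Because the ego's infection day equals $T$ whenever the mediator is 0 and a first infection exists, the outcome window (which starts at $T + b > T$) can never contain it, so the restriction that the outcome is 0 whenever the mediator is 0 holds by construction.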
$Y_e^{T(v')+s}(v, Y_a^{T(v')}(v'))$ is the counterfactual flu status of the ego at time $T(v') + s$ had the alter's vaccine status been set to $v$ and his flu status at time $T(v')$ set to its counterfactual value under vaccine status $v'$, where $T(v')$ is the time at which the first infection in the alter-ego pair would have occurred if $V_a$ had been set to $v'$. The effects of interest are the average contagion effect

$$Con = \frac{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1))]}{E[Y_e^{T(0)+s}(0, Y_a^{T(0)}(0))]} \tag{9}$$

and the average infectiousness effect

$$Inf = \frac{E[Y_e^{T(1)+s}(1, Y_a^{T(1)}(1))]}{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1))]}, \tag{10}$$

where the expectations are taken over all ego-alter pairs.

In order to identify the effects defined in (9) and (10), we must measure and control for all confounders of the relationship between $Y_e^{T+s}$ and $Y_a^T$, and in particular the potential mutual infectious contacts of the alter and ego. To motivate our procedure for controlling for these confounding contacts, consider the simple case of a group of size three, composed of a child (ego), a parent (alter), and a grandparent. In the event that the grandparent contracted the flu first and transmitted it to both the child and the parent, the grandparent's flu status would clearly be a confounder of the mediator-outcome relationship. But the grandparent's entire disease trajectory is not a potential confounder; in particular anything that happens to the grandparent after time $T$, that is after the first infection in the parent-child pair, occurs after the mediator and cannot possibly confound the mediator-outcome relationship.
In this simple, three-person group, it suffices to control for an indicator of whether the grandparent has been sick by time $T - b$, where $T$ is the time of the first infection between the parent and child, and $T - b$ is the latest time at which the grandparent could have been the cause of an infection at time $T$.

In practice, we will likely have to sample groups of size greater than three in order to control for confounding by potential mutually infectious contacts. It is generally sufficient to control for a summary measure of the infections occurring before $T - b$ in each group. If each infectious contact of an individual has an independent probability of transmitting the disease to the individual, then the sum $\sum_{i=1}^{n_k} Y_{ki}^{T-b}$ of indicators of whether each mutual contact has been sick by time $T - b$ suffices to control for confounding by potential mutual infectious contacts. Under a different transmission model, the proportion $\sum_{i=1}^{n_k} Y_{ki}^{T-b} / n_k$ of contacts who were sick by time $T - b$ could be the operative summary measure. If some of the mutual contacts may have been vaccinated, then separate summary measures (sum or proportion sick by time $T - b$) should be included for vaccinated and for unvaccinated contacts. In what follows we will assume that the sum is an adequate summary measure.

4.1 Alternative sampling schemes

Alter-centric sampling can also be used to collect data on variables that suffice to identify the contagion and infectiousness effects. Instead of sampling an alter-ego pair and all of their mutual contacts, we can sample an individual to serve as the alter and all of his potentially infectious contacts. The ego is randomly selected from among the alter's contacts. Conditional on the number of the alter's contacts who have been infectious by day $T - b$, $Y_a^T$ is independent of the number of mutual contacts who were sick by time $T - b$.
The number of mutual contacts is no longer a confounder of the relationship between $Y_a^T$ and $Y_e^{T+s}$ and there is no need to ascertain the identity or disease status of the mutual contacts. However, the number of potentially infectious contacts of a single person can be vast, and it may be easier to identify mutual contacts of a pair of individuals than all contacts of any one individual.

5 Infectiousness and contagion in social networks

So far, we have assumed that our observations, comprised of groups of individuals, were independent of one another. This assumption will, in general, be violated when the alter-ego pairs are sampled from a single community or social network. We introduce some new notation for this context after briefly describing the example that will serve as the basis for our exposition and later for our simulations and data analysis.

Consider tracking the seasonal flu in the student population of a college at which all students live in dorms on campus. Each student is a node in the network. We define a tie to exist between two nodes if the individuals regularly interact with one another in a way that could facilitate transmission of the flu. For example, if two individuals are roommates, eat together in the dining hall, or are close friends, then their nodes share a tie. We observe each individual's flu status every day over the course of the flu season, which lasts for 100 days.

The contagion and infectiousness effects $Con$ and $Inf$, defined in Section 4, are not estimable from social network data using the methods that we propose below. Instead we can define new contagion and infectiousness effects such that hypothesis tests based on the new effects are valid and consistent tests of the hypotheses that $Con$ and $Inf$ are null.
We give assumptions under which the new estimands are estimable from network data using GLMs and we demonstrate that tests of the hypotheses for the new estimands are valid and consistent for $Con$ and $Inf$.

5.1 Assumptions

Along with assumptions (1) - (4), we make several additional assumptions that facilitate inference using social network data. Define $\mathcal{A}_i = \{j : i \text{ and } j \text{ share a tie}\}$ to be the collection of indices for individual $i$'s contacts. We assume that

$$Y_i^t \perp Y_j^r \;\Big|\; \Big\{\sum_{m \in \mathcal{A}_i : V_m = v} Y_m^{t-b},\ v = 0, 1\Big\}, \quad \text{for all } j \notin \mathcal{A}_i \text{ and } r \leq t. \tag{11}$$

The set in the conditioning event includes the number of vaccinated contacts of individual $i$ who were sick on or before day $t - b$ and the number of unvaccinated contacts of individual $i$ who were sick on or before day $t - b$. This assumption says that the outcome of individual $i$ at time $t$ is independent of all past outcomes for non-contacts of $i$, conditional on a summary measure of the flu history of the contacts of $i$. In other words, contacts act as a causal barrier between two nodes who do not themselves share a tie. If two individuals, $i$ and $j$, do not share a tie, then they can have no effect on one another's disease status that is not through their contacts' disease statuses. Because $t - b$ is the latest time at which a disease transmission could affect $Y_i^t$, we do not need to condition on the contacts' outcomes past that time. This assumption implies that the total numbers of vaccinated and unvaccinated contacts of individual $i$ who have been sick by day $t - b$ are a sufficient summary measure of the complete history of all of $i$'s contacts. It could easily be modified so that the probability of being infected at any given time depends on a different summary measure, for example on the proportion of alters who were infectious at or before time $t - b$.
We also assume that

$$Y_i^t \perp V_j \;\Big|\; \Big\{\sum_{m \in \mathcal{A}_i : V_m = v} Y_m^{t-b},\ v = 0, 1\Big\} \quad \text{for all } j \notin \mathcal{A}_i \tag{12}$$

and that, for any covariate $C$ that is required for (1) through (4) to hold,

$$Y_i^t \perp C_j \;\Big|\; \Big\{\sum_{m \in \mathcal{A}_i : V_m = v} Y_m^{t-b},\ v = 0, 1\Big\} \quad \text{for all } j \notin \mathcal{A}_i. \tag{13}$$

These assumptions state that any effect of the covariates (including vaccination) of nodes without ties to $i$ on $i$'s disease status would again have to be mediated by the disease statuses of $i$'s contacts. Assumption (12) implies that the infectiousness effect is not transitive: whether individual $j$ caught the flu from a vaccinated or unvaccinated person has no influence on whether individual $j$ transmits the flu.

Embedded in assumptions (11)-(13) is the assumption that all ties are equivalent and all non-ties are equivalent with respect to transmission of the outcome. This is likely to be a simplification of reality. It can be relaxed (see Section 5.3), but we make it now for heuristic purposes. It rules out the possibility that some types of ties, like roommates, are more likely to facilitate disease transmission than others, like friends who live in different dorms. It allows an individual to come into contact with and possibly infect (or be infected by) people with whom he does not share a tie, but it entails that he will come into contact with any individual in the network who is not his contact with equal probability. This rules out, for example, the possibility that an individual is more likely to be infected by the friends of his friends than by a distant node on the network.

We also make the no-unmeasured-confounding assumption that, if there exists a person with whom two individuals in the network interact regularly, then that person is also in the network (with ties to both individuals). In some settings it may be possible to satisfy this condition, e.g.
in full sociometric studies conducted de novo, or in studies of online data.

5.2 Estimation and hypothesis testing

Consider the following strategy for estimating a new contagion and new infectiousness effect, defined below:

1. Randomly select from the network $K$ pairs of nodes such that the two nodes in each pair share a tie, but, for each pair, neither node nor any of their contacts has a tie to a node in any other pair or to the contacts of any member of any other pair. The number of possible such pairs will depend on the network size and topology. In the next section, we discuss methods for sampling these pairs. Randomly select one member of each pair to be the ego and one to be the alter.

2. Index the pairs by $k$, and let $e_k$ index the ego and $a_k$ the alter in the $k$th pair. For the $k$th pair, define a group, also indexed by $k$, that includes nodes $a_k$, $e_k$, $\mathcal{A}_{e_k}$, and $\mathcal{A}_{a_k}$. That is, it includes the alter-ego pair and all nodes with ties to either the alter or the ego. Due to the way we selected pairs, none of the members of group $k$ can belong to any other group. Below, we suppress the index $k$ when context allows. As in the sections above, $T_k$ is the time of the first infection in the pair $(a_k, e_k)$. Let $C_k$ be a collection of covariates for group $k$, where the variables included in $C$ are precisely those required for assumptions (1) through (4) to hold for outcome $Y_e^{T(1)+s}$, mediator $Y_a^{T(1)}$, and treatment $V_a$. Note that $V_e$ should be included in $C$ as it is likely to be a confounder of the mediator-outcome relationship. The number of mutual contacts of the alter and ego who were sick by time $T - b$ must also be included.

3. Let $U_{e_k}^{T_k+f}$ and $L_{e_k}^{T_k+f}$ be the number of unvaccinated and vaccinated nodes, respectively, with ties to $e_k$ who were sick by time $T_k + f$.
Define $U_{a_k}^{T_k-b}$ and $L_{a_k}^{T_k-b}$ similarly as the number of unvaccinated and vaccinated nodes, respectively, with ties to $a_k$ who were sick by time $T_k - b$. Recall that $f$ is the infectiousness period and $b$ the incubation period, defined in Section 3.2.1.

4. Estimate an average modified contagion effect

$$Con^* = \frac{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C]}{E[Y_e^{T(0)+s}(0, Y_a^{T(0)}(0)) \mid U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C]}$$

and an average modified infectiousness effect

$$Inf^* = \frac{E[Y_e^{T(1)+s}(1, Y_a^{T(1)}(1)) \mid U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C]}{E[Y_e^{T(1)+s}(0, Y_a^{T(1)}(1)) \mid U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C]}$$

and their standard errors.

Through Step 2, the procedure we described is nearly identical to the proposal in Section 4, the only difference being that groups are extracted from a network in Step 1 rather than being independently ascertained. Accounting for this sampling scheme becomes crucial when we estimate the parameters of GLMs like (5) and (6). The standard errors derived from these GLMs are consistent only if the residuals across groups are uncorrelated. The residuals are indeed uncorrelated for independent groups, but, in the network setting, they generally are not. However, the set of additional covariates introduced in Step 3 essentially blocks the flow of information between groups. Conditional on these additional covariates, the residuals are uncorrelated, even in the network setting (see next section for proof). Roughly, because $U_{e_k}^{T_k+f}$ and $L_{e_k}^{T_k+f}$ summarize the disease statuses of the ego's contacts $b$ days before the outcome $Y_{e_k}^{T(1)+s}$ is assessed, conditioning on them ensures that the outcomes are uncorrelated across groups.
Because $U_{a_k}^{T_k-b}$ and $L_{a_k}^{T_k-b}$ summarize the disease statuses of the alter's contacts $b$ days before the mediator $Y_{a_k}^{T(1)}$ is assessed, conditioning on them ensures that mediators are uncorrelated across groups.

The effects defined in Step 4 differ from $Con$ and $Inf$ only in the conditioning set, but this changes slightly the causal effect being estimated. Conditioning on $U_a^{T-b}$ and $L_a^{T-b}$ is just like conditioning on an extra pair of confounders: these variables occur before the mediator and are independent of the treatment; therefore they can be considered to be pre-treatment covariates. On the other hand, $U_e^{T+f}$ and $L_e^{T+f}$ occur after the mediator and lie on a possible pathway from the mediator to the outcome. Conditioning on these variables has the effect of biasing $Con^*$ and $Inf^*$ towards the null relative to $Con$ and $Inf$, because it blocks the path from $Y_a^T$ to $Y_e^{T+s}$ that operates when the alter infects a friend of the ego, who then infects the ego. However, conditioning on these variables leaves the direct path from $Y_a^T$ to $Y_e^{T+s}$ open, and this path operates whenever the alter infects the ego directly. Therefore, whenever $Con$ and $Inf$ are non-null so are $Inf^*$ and $Con^*$. Hypothesis tests using $Con^*$ and $Inf^*$ are conservative and consistent for hypothesis tests for $Con$ and $Inf$. Similarly, tests that $Con^*$ and $Inf^*$ are less than the null value or are greater than the null value are also valid and consistent for the analogous tests for $Con$ and $Inf$, respectively.
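The Step 3 covariates are simple counts over each node's contacts, which the sketch below illustrates. The data layout (dictionaries keyed by node) and the function names are hypothetical. Whether the alter and ego themselves count among each other's sick contacts is not settled by the description above; here the partner is excluded, which is an assumption of this sketch and can be changed through the `exclude` argument.

```python
def sick_contact_counts(node, cutoff_day, contacts, vaccinated, sick_day, exclude=()):
    """Count unvaccinated (U) and vaccinated (L) contacts of `node`
    who were sick on or before `cutoff_day`.

    contacts:   dict mapping each node to the set of nodes it shares a tie with
    vaccinated: dict mapping each node to True/False
    sick_day:   dict mapping each node to the day it became sick (None or
                missing if it never did)
    """
    u = l = 0
    for m in contacts[node]:
        if m in exclude:
            continue
        day = sick_day.get(m)
        if day is not None and day <= cutoff_day:
            if vaccinated[m]:
                l += 1
            else:
                u += 1
    return u, l


def step3_covariates(alter, ego, T, b, f, contacts, vaccinated, sick_day):
    """U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f} for one alter-ego pair."""
    # Assumption of this sketch: the pair partner is not counted as a contact.
    u_a, l_a = sick_contact_counts(alter, T - b, contacts, vaccinated, sick_day, exclude={ego})
    u_e, l_e = sick_contact_counts(ego, T + f, contacts, vaccinated, sick_day, exclude={alter})
    return {"U_a": u_a, "L_a": l_a, "U_e": u_e, "L_e": l_e}


# Hypothetical toy group: alter "a", ego "e", and contacts "x", "y", "z".
contacts = {"a": {"e", "x", "y"}, "e": {"a", "y", "z"}}
vaccinated = {"a": False, "e": False, "x": False, "y": True, "z": False}
sick_day = {"a": 4, "x": 1, "y": 6, "z": None}
print(step3_covariates("a", "e", T=4, b=1, f=3,
                       contacts=contacts, vaccinated=vaccinated, sick_day=sick_day))
```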
5.2.1 Justification for the use of GLMs

Suppose that the models

$$
\begin{aligned}
&g\big(E[Y_{e_k}^{T_k+s} \mid V_{a_k}, Y_{a_k}^{T_k} = 1, U_{a_k}^{T_k-b}, L_{a_k}^{T_k-b}, U_{e_k}^{T_k+f}, L_{e_k}^{T_k+f}, C_k]\big) \\
&\quad = \beta_0 + \beta_1 V_{a_k} + \beta_2 U_{a_k}^{T_k-b} + \beta_3 L_{a_k}^{T_k-b} + \beta_4 U_{e_k}^{T_k+f} + \beta_5 L_{e_k}^{T_k+f} + \beta_6' C_k
\end{aligned}
\tag{14}
$$

and

$$
\begin{aligned}
&m\big(E[Y_{a_k}^{T_k} \mid V_{a_k}, U_{a_k}^{T_k-b}, L_{a_k}^{T_k-b}, U_{e_k}^{T_k+f}, L_{e_k}^{T_k+f}, C_k]\big) \\
&\quad = \alpha_0 + \alpha_1 V_{a_k} + \alpha_2 U_{a_k}^{T_k-b} + \alpha_3 L_{a_k}^{T_k-b} + \alpha_4 U_{e_k}^{T_k+f} + \alpha_5 L_{e_k}^{T_k+f} + \alpha_6' C_k
\end{aligned}
\tag{15}
$$

are correctly specified for $g()$, $m()$ known link functions. For the effect on the ratio scale with a binary common outcome like the flu we would specify $g()$ to be the log link and $m()$ the logit link, like we did in Sections 3 and 4. We have only to prove that the residuals from model (14) are uncorrelated with one another and that the residuals from model (15) are uncorrelated with one another (Breslow, 1996; Gill, 2001).

Result 1. Let $Res_{a_k} = Y_{a_k}^{T_k} - m^{-1}\big(\alpha_0 + \alpha_1 V_{a_k} + \alpha_2 U_{a_k}^{T_k-b} + \alpha_3 L_{a_k}^{T_k-b} + \alpha_4 U_{e_k}^{T_k+f} + \alpha_5 L_{e_k}^{T_k+f} + \alpha_6' C_k\big)$. Then $Res_{a_k}$ and $Res_{a_h}$ are uncorrelated.

Proof. Without loss of generality assume that $T_k > T_h$. Under correct specification of (15), $E[Res_{a_k}] = E[Res_{a_h}] = 0$. Therefore $Cov(Res_{a_k}, Res_{a_h}) = E[Res_{a_k} Res_{a_h}]$. Letting $S_k$ denote the set of variables $\{V_{a_k}, U_{a_k}^{T_k-b}, L_{a_k}^{T_k-b}, U_{e_k}^{T_k+f}, L_{e_k}^{T_k+f}, C_k\}$, we have

$$
\begin{aligned}
E[Res_{a_k} Res_{a_h}]
&= E\big[E[Res_{a_k} Res_{a_h} \mid S_k, S_h]\big] \\
&= E\big[E\big[\{Y_{a_k}^{T_k} - E[Y_{a_k}^{T_k} \mid S_k]\}\{Y_{a_h}^{T_h} - E[Y_{a_h}^{T_h} \mid S_h]\} \mid S_k, S_h\big]\big] \\
&= E\big[E\big[Y_{a_k}^{T_k} - E[Y_{a_k}^{T_k} \mid S_k] \mid S_k, S_h\big] \times E\big[Y_{a_h}^{T_h} - E[Y_{a_h}^{T_h} \mid S_h] \mid S_k, S_h\big]\big] \\
&= E\big[\{E[Y_{a_k}^{T_k} \mid S_k, S_h] - E[Y_{a_k}^{T_k} \mid S_k]\} \times E\{Y_{a_h}^{T_h} - E[Y_{a_h}^{T_h} \mid S_h] \mid S_k, S_h\}\big] \\
&= E\big[\{E[Y_{a_k}^{T_k} \mid S_k] - E[Y_{a_k}^{T_k} \mid S_k]\} \times E\{Y_{a_h}^{T_h} - E[Y_{a_h}^{T_h} \mid S_h] \mid S_k, S_h\}\big] \\
&= 0.
\end{aligned}
$$

The second equality follows from the correct specification of (15).
The third equality holds because, by assumptions (11), (12), and (13), $Y_{a_k}^{T_k} \perp Y_{a_h}^{T_h} \mid S_k, S_h$. The fifth equality holds because $Y_{a_k}^{T_k} \perp S_h \mid S_k$, again by assumptions (11), (12), and (13).

Result 2. Let $Res_{e_k} = Y_{e_k}^{T_k+s} - g^{-1}\big(\beta_0 + \beta_1 V_{a_k} + \beta_2 U_{a_k}^{T_k-b} + \beta_3 L_{a_k}^{T_k-b} + \beta_4 U_{e_k}^{T_k+f} + \beta_5 L_{e_k}^{T_k+f} + \beta_6' C_k\big)$. Then $Res_{e_k}$ and $Res_{e_h}$ are uncorrelated.

The proof of Result 2 is very similar to the proof of Result 1 and we therefore omit it. It relies on the fact that $T + f = T + s - b$, so that conditioning on $U_{e_k}^{T_k+f}$ and $L_{e_k}^{T_k+f}$ satisfies the conditions of assumptions (11), (12), and (13) and renders $Y_{e_k}^{T_k+s}$ independent of outcomes, vaccines, and covariates for other groups.

5.2.2 Implementation

Step 1 is the most difficult to implement. One could enumerate all possible ways of partitioning the network into non-overlapping groups comprised of a pair of nodes and all of their contacts, associate the partitions with a discrete uniform distribution, and randomly sample one realization of the uniform distribution. Steps 2 and 3 of the testing procedure are perfunctory. If we define $C^* = (U_a^{T-b}, L_a^{T-b}, U_e^{T+f}, L_e^{T+f}, C)$ to be a new collection of covariates then Step 4 proceeds as in Sections 3 and 4. Interactions between components of $C^*$ and the other predictors in the model can easily be accommodated. To test the hypotheses that $Con$ and $Inf$ are null, we estimate 95% confidence intervals for the modified contagion and infectiousness estimands ($Con^*$ and $Inf^*$) based on the estimates and standard errors calculated in Step 4.
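Enumerating every admissible partition is rarely feasible on a network of realistic size. The sketch below swaps in a simple greedy heuristic for Step 1: it visits ties in random order and keeps a pair whenever its group (the pair plus all of their contacts) neither overlaps nor shares a tie with any group already kept. This is not the uniform sampling over partitions described above, and it need not produce the largest possible number of pairs; the function names and the dictionary-of-sets network representation are hypothetical.

```python
import random


def closed_group(u, v, contacts):
    """The pair (u, v) together with all contacts of u and of v."""
    return {u, v} | set(contacts[u]) | set(contacts[v])


def greedy_separated_pairs(contacts, seed=0):
    """Greedily select (alter, ego) pairs whose groups are disjoint and untied.

    contacts: dict mapping each node to the set of nodes it shares a tie with
              (ties are undirected, so the mapping is assumed symmetric).
    """
    rng = random.Random(seed)
    edges = sorted({tuple(sorted((u, v))) for u in contacts for v in contacts[u]})
    rng.shuffle(edges)

    blocked = set()  # members of accepted groups and everything tied to them
    pairs = []
    for u, v in edges:
        group = closed_group(u, v, contacts)
        if group & blocked:
            continue  # overlaps with, or is tied to, an accepted group
        # Randomly assign the alter and ego roles within the pair.
        alter, ego = (u, v) if rng.random() < 0.5 else (v, u)
        pairs.append((alter, ego))
        blocked |= group
        for node in group:
            blocked |= set(contacts[node])
    return pairs


# Hypothetical toy network: two triangles joined by the path 2 - 3 - 4.
contacts = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(greedy_separated_pairs(contacts, seed=1))
```

A candidate group is rejected if it intersects the set of accepted group members and their contacts, which is equivalent to requiring that groups be disjoint and that no tie run between two groups.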
We reject the hypothesis that $Con$ is null if our confidence interval for the estimand in $Con^*$ does not include the null value and we reject the hypothesis that $Inf$ is null if our confidence interval for the estimand in $Inf^*$ does not include the null value.

5.3 Relaxing some assumptions

We assumed throughout that vaccination occurs before the start of follow-up, but this is not necessary for our methods. If vaccination can occur during follow-up, define $V_i^t$ to be an indicator of having been vaccinated by time $t$. Assume that the effect of vaccination, including any infectiousness effect, is immediate. If an individual becomes infectious on day $T$, he would have been infected on day $T - b$. If he was vaccinated by time $T - b$, then the vaccine would have been in full effect at the time of infection. Then $V_a^{T-b}$ can replace $V_a$ as the "treatment" in the contagion, infectiousness, and indirect effects. We similarly redefine the summary measures for vaccinated and unvaccinated contacts of the alter and ego that appear in assumptions (11) through (13) and that are included in $C$. Include $V_e^{T-b}$ in the set of confounders because the mediator occurs at time $T$ and therefore the ego's vaccination status at time $T - b$ suffices to control for any confounding.

We assumed throughout that the infectious and incubation periods ($f$ and $b$) are constant across individuals. These assumptions, along with the assumption that the effect of vaccination is immediate, could be relaxed if the determinants of time to efficacy of vaccine, length of infectious period, and length of incubation period were observed covariates. In this case we could, for example, infer effective time of vaccination, incubation period, and infectious period for each individual based on their covariates.
We assumed in Section 5.2 that the probability of disease transmission between two connected nodes does not depend on the type of tie. This assumption can be avoided with the addition of several covariates to models (14) and (15): we would condition on the type of tie that exists between the alter and the ego, and also include separate $U$ and $L$ terms for each type of tie. We also assumed in Section 5.2 that an individual will come into contact with any individual in the network who is not his contact with equal probability. This can be relaxed by expanding the $K$ groups we define in Step 1 of the estimation procedure to include nodes within several degrees of separation from the alter and ego.

6 Simulations

6.1 Independent groups

We ran simulations for three different sample sizes, $K = 200$, $K = 500$, and $K = 1000$ independent groups. Each group comprised an alter, an ego, and $n_k$ mutual contacts. First we generated $K$ contact group sizes $n_k$ by sampling from a Poisson distribution with mean $\lambda = 3$. Next, we assigned vaccination statuses to each individual in each group, including the alters and egos, with probability 0.4. We simulated the behavior of each group during a flu epidemic over 100 days. For the purposes of the simulation, we assumed that each member of a group had contact with all other members of the same group. Each day, an uninfected member of a group had a baseline probability of $p_o$ of being infected from outside of the group, a baseline probability of $p_u$ of being infected by any infectious, unvaccinated member of the same group and a baseline probability of $p_v$ of being infected by any infectious, vaccinated member of the same group. If vaccinated, an individual's probability of being infected by any source was multiplied by $\delta \leq 1$.
If infected on day $t$, an individual was infectious from day $t + 1$ through day $t + 4$ and incapable of being infected or transmitting infection from day $t + 5$ until the end of follow-up. This corresponds to an incubation period of $b = 1$ and an infectious period of $f = 3$, and it mimics the flu, for which the incubation period is between one and three days and the infectious period is between three and six days (Earn et al., 2002). In all simulations, we fixed $p_o = 0.01$. We specified two different simulation settings for the parameters $\delta$, $p_v$, and $p_u$, one setting corresponding to the null of no infectiousness or contagion effects ($\delta = 1$; $p_v = p_u = 0.4$) and one setting corresponding to the presence of protective contagion and infectiousness effects ($\delta = 0.1$; $p_v = 0.5$, $p_u = 0.05$).

We simulated 500 epidemics under each of the two scenarios, and for each simulation we estimated the infectiousness and contagion effects as follows: Among the subset of groups with $Y_a^T = 1$ and using a log link function, we regressed $Y_e^{T+s}$ on $V_a$ and on the set of potential confounders comprised by the ego's vaccination status, the sum $U_a^{T-b}$ of unvaccinated mutual contacts who were infectious at time $T - b$, and the sum $L_a^{T-b}$ of vaccinated mutual contacts who were infectious at time $T - b$. We regressed $Y_a^T$ on the same covariates using a logistic link function.

Table 1: Simulation results for independent groups

Under $H_0$

| Number of groups | Infectiousness (SE) | Coverage | Contagion (SE) | Coverage |
|---|---|---|---|---|
| $K = 200$ | 1.014 (0.138) | 94% | 1.016 (0.202) | 94% |
| $K = 500$ | 1.001 (0.082) | 92.2% | 0.997 (0.117) | 93.6% |
| $K = 1000$ | 0.997 (0.057) | 95% | 1.001 (0.083) | 94% |

Under $H_A$

| Number of groups | Infectiousness (SE) | Power | Contagion (SE) | Power |
|---|---|---|---|---|
| $K = 200$ | 0.453 (1.160) | 49% | 0.258 (0.079) | 100% |
| $K = 500$ | 0.443 (0.154) | 86% | 0.255 (0.049) | 100% |
| $K = 1000$ | 0.445 (0.107) | 100% | 0.258 (0.034) | 100% |
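The within-group epidemic dynamics described in this section can be sketched as follows. This is an illustrative reimplementation, not the simulation code used for the reported results. It assumes that exposures from different sources act independently, so that the daily infection probability is one minus the product of the per-source escape probabilities, and that infections are updated synchronously at the end of each day; neither detail is stated in the text. The function name `simulate_group` is hypothetical.

```python
import random


def simulate_group(vaccinated, p_o, p_u, p_v, delta, days=100, rng=None):
    """Simulate one fully connected group; return each member's infection day.

    vaccinated: list of booleans, one per group member
    p_o: daily probability of infection from outside the group
    p_u, p_v: daily probability of infection by an infectious unvaccinated
              or vaccinated group member, respectively
    delta: multiplier applied to all infection probabilities of a
           vaccinated individual
    """
    rng = rng or random.Random()
    n = len(vaccinated)
    infected_on = [None] * n
    for t in range(1, days + 1):
        # Someone infected on day d is infectious on days d + 1 through d + 4.
        infectious = [j for j in range(n)
                      if infected_on[j] is not None and t - 4 <= infected_on[j] <= t - 1]
        newly_infected = []
        for i in range(n):
            if infected_on[i] is not None:
                continue  # already infected: infectious now or immune afterwards
            scale = delta if vaccinated[i] else 1.0
            escape = 1.0 - scale * p_o
            for j in infectious:
                escape *= 1.0 - scale * (p_v if vaccinated[j] else p_u)
            if rng.random() < 1.0 - escape:
                newly_infected.append(i)
        for i in newly_infected:
            infected_on[i] = t
    return infected_on


# One group of four under the null setting (delta = 1; p_v = p_u = 0.4).
print(simulate_group([True, False, False, True], p_o=0.01, p_u=0.4, p_v=0.4,
                     delta=1.0, rng=random.Random(2024)))
```

The infection days returned for the alter and ego, together with those of the mutual contacts, are the raw material from which the mediator, outcome, and summary covariates would then be derived.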
The contagion and infectiousness effects are identified by the expressions given in (7) and (8), evaluated at the sample mean value of the covariates $U_a^{T-b}$ and $L_a^{T-b}$. We bootstrapped the standard errors with 500 bootstrap replications.

The results are given in Table 1. For each simulation setting, that is, for each sample size ($K$) and for both the null hypothesis and the alternative hypothesis, we present the mean point estimates for the infectiousness and contagion effects on the ratio scale, the mean bootstrap standard error estimator, and the percent coverage of the 95% confidence interval based on the 2.5th and 97.5th bootstrap quantiles. For simulations under the null hypothesis, we report coverage and for simulations under the alternative we report power, given by 100% minus the percent coverage. The point estimates are stable across sample sizes and the coverage of the basic bootstrap confidence interval is close to 95% under the null for all $K$. The power under the alternative is 100% for the contagion effect, but for the infectiousness effect power is low (49%) when $K = 200$.

6.2 Social network data

The procedure proposed in Section 5.2 for hypothesis testing using social network data suffers from low power. In part this is because $Con^*$ and $Inf^*$ are biased towards the null relative to $Con$ and $Inf$, but the primary reason for the loss of power is the extraction of conditionally independent pairs of nodes from the network. As the simulation illustrates, this results in a dramatic reduction in the sample size used for analysis. Because infectious outcomes sampled from nodes in a network are dependent, the effective sample size for inference about such outcomes will always be smaller than the observed number of nodes, and how much more information about the parameters of interest is available depends on the specific setting.
Important areas for future research include determining the effective sample size when observations are sampled from a network and are therefore dependent, and developing methods that make use of all available information.

We ran simulations for three different network sizes: 12000 nodes, 10000 nodes, and 8000 nodes. We simulated a network of 10000 nodes as follows: first, we simulated 2000 independent groups of 5 nodes, with each group being fully connected (i.e., there are ties between each pair of nodes in the group of 5). For each node we then added a tie to each out-of-group node with probability 0.0001. Because ties are undirected (if node $i$ is tied to node $j$, then by definition node $j$ is tied to node $i$), this results in approximately 2 expected out-of-group ties per node. To simulate networks of size 12000 and 8000, we simulated 2400 and 1600 independent groups, respectively, and scaled the probability of an out-of-group tie to maintain an expected value of approximately 2 for each node. This network structure could represent a sample of families living in a city, where individuals are fully connected to the members of their family and occasionally connected to members of other families. After running step 1 of the procedure outlined in Section 5.2, we were left with $K = 707$ alter-ego pairs for the network of size 12000, $K = 581$ for the network of size 10000, and $K = 466$ for the network of size 8000.

On each of these three fixed networks, we simulated 200 epidemics under the null of no infectiousness or contagion effect and 200 epidemics under the alternative. For each simulation, we assigned each individual in the network to vaccination with probability 0.5. We then simulated the behavior of each group during a flu epidemic over 100 days. An uninfected node had a probability of $p_o = 0.01$ of being infected from outside of the network on day 1 and there were no outside infections thereafter. Under the alternative, on each day an uninfected node had a baseline probability of $p_u = 0.5$ of being infected by any infectious, unvaccinated contact and a baseline probability of $p_v = 0.01$ of being infected by any infectious, vaccinated contact. If vaccinated, an individual's probability of being infected by any source was multiplied by $\delta = 0.2$. Under the null, on each day an uninfected node had a probability of $p_u = p_v = 0.5$ of being infected by any infectious contact (that is, any node with which it shared a tie). To ensure that the contagion effect was null, we specified that $\delta = 1$, that is, that vaccination had no protective effect against contracting the flu. In both settings, if infected on day $t$ an individual was infectious from day $t + 1$ through day $t + 4$ and incapable of being infected or transmitting infection from day $t + 5$ until the end of follow-up.

For each simulation, we estimated the infectiousness and contagion effects following the procedure described in Section 5.2. We evaluated these effects at the sample mean value of the covariates $U_a^{T-b}$, $L_a^{T-b}$, $U_e^{T+f}$ and $L_e^{T+f}$. We bootstrapped the standard errors with 1000 bootstrap replications. The results are given in Table 2. For each simulation setting, that is, for each network size and for both the null hypothesis and the alternative hypothesis, we present the mean point estimates for the infectiousness and contagion effects on the ratio scale, the mean bootstrap standard error estimator, and the percent coverage of the 95% confidence interval based on the 2.5th and 97.5th bootstrap quantiles. For simulations under the alternative hypothesis we calculated the power, given by 100% minus the percent coverage.

Table 2: Simulation results for network data

Under $H_0$
Network size    Infectiousness (SE)   Coverage   Contagion (SE)   Coverage
8000 nodes      0.996 (0.001)         100%       1.205 (1.657)    96%
10000 nodes     1.000 (0.001)         100%       1.183 (1.183)    94%
12000 nodes     1.001 (0.001)         100%       1.166 (0.)       94%

Under $H_A$
Network size    Infectiousness (SE)   Power      Contagion (SE)   Power
8000 nodes      0.650 (0.259)         45%        0.168 (0.017)    99%
10000 nodes     0.616 (0.072)         53%        0.164 (0.013)    100%
12000 nodes     0.609 (0.054)         63%        0.164 (0.010)    100%

For the 8000- and 10000-node networks, there were 6 and 1 simulations, respectively, out of 200, for which the GLMs used to estimate the parameters involved in the contagion and infectiousness effects did not converge due to empty strata of the predictors. We omit these simulations from the results in Table 2, but note that in extreme cases convergence could be an issue in addition to power. The point estimates are stable across network sizes and the coverage of the basic bootstrap confidence interval is close to or above 95% under the null for all network sizes. The power under the alternative is close to 100% for the contagion effect for all network sizes, but for the infectiousness effect power is low: 45% for the network of size 8000, increasing to 63% for the network of size 12000.

One concern that has been raised about previous uses of statistical models like GLMs and GEEs for network data is the possibility that the models lack any power to reject the null hypothesis when the alternative is true (Shalizi, 2012). This is a concern because the models are inherently misspecified under the alternative hypothesis, even if they are correctly specified under the null hypothesis.
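For readers who want to experiment with the simulation design of this section, the network construction and epidemic can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The function names are hypothetical. Each infectious contact is assumed to transmit independently on each day, and the multiplier $\delta$ is assumed to apply to the day-1 outside infection as well, following the phrase "by any source".

```python
import random


def build_network(n_groups, group_size=5, expected_out_ties=2.0, seed=0):
    """Fully connected groups plus sparse random out-of-group ties.

    Each ordered out-of-group pair (i, j) gets a tie with probability p.
    Ties are undirected, so every unordered pair has two chances; p is
    chosen to give about `expected_out_ties` out-of-group ties per node.
    With 2000 groups of 5 this p is close to 0.0001, as in the text.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    p = expected_out_ties / (2 * (n - group_size))
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        start = (i // group_size) * group_size
        for j in range(n):  # O(n^2): fine for a demo, slow for 10000 nodes
            if i == j:
                continue
            same_group = start <= j < start + group_size
            if same_group or rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def simulate_epidemic(neighbors, p_o=0.01, p_u=0.5, p_v=0.01,
                      delta=0.2, days=100, seed=0):
    """Return (vaccinated, infected_on); infected_on[i] is a day or None."""
    rng = random.Random(seed)
    n = len(neighbors)
    vaccinated = [rng.random() < 0.5 for _ in range(n)]
    infected_on = [None] * n
    # Day 1: infections from outside the network only.
    for i in range(n):
        scale = delta if vaccinated[i] else 1.0  # assumption: delta applies here
        if rng.random() < scale * p_o:
            infected_on[i] = 1
    # Later days: transmission along ties from currently infectious contacts.
    for day in range(2, days + 1):
        for i in range(n):
            if infected_on[i] is not None:
                continue
            scale = delta if vaccinated[i] else 1.0
            for j in neighbors[i]:
                t = infected_on[j]
                # Contact j is infectious on days t+1 through t+4.
                if t is not None and t + 1 <= day <= t + 4:
                    p = p_v if vaccinated[j] else p_u
                    if rng.random() < scale * p:
                        infected_on[i] = day  # not infectious until day + 1
                        break
    return vaccinated, infected_on


# Small demo network; defaults are the alternative-hypothesis parameters.
net = build_network(n_groups=60)
vaccinated, infected_on = simulate_epidemic(net)
attack_rate = sum(t is not None for t in infected_on) / len(net)
```

The null setting corresponds to calling `simulate_epidemic(net, p_u=0.5, p_v=0.5, delta=1.0)`. The pair extraction in step 1 of Section 5.2 and the subsequent regressions are not reproduced here.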
Because the methods we propose here can be correctly specified under both the null and the alternative hypotheses, they can be powered to reject the null hypothesis when the infectiousness or contagion effect is present.

7 Discussion

We proposed methods for consistently estimating contagion and infectiousness effects in independent groups of arbitrary size; these methods are easy to implement and perform well in simulations. We extended our methodology to groups sampled from social network data, providing a theoretically justified method for using GLMs to analyze network data. Note that the principles we applied to GLMs can be applied to GEEs as well, resulting in correctly specified GEEs for network data. The principles that justify our use of GLMs to estimate the contagion and infectiousness effects are easily extended to any estimand for which GLMs would be a desirable modeling tool. However, our network data methods require a large amount of data and are not appropriate for small or dense networks. On the one hand this highlights the fact that dependence among observations in networks reduces effective sample size and necessitates larger samples; on the other hand methods should be developed that can harness more information from the data and increase the power to detect contagion, infectiousness, and other causal effects.

References

Airoldi, E., Toulis, P., Kao, E., and Rubin, D. B. “Estimation of causal peer influence effects.” In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, JMLR: W&CP, volume 28 (2013).

Ali, M. M. and Dwyer, D. S. “Estimating peer effects in adolescent smoking behavior: A longitudinal analysis.” Journal of Adolescent Health, 45(4):402–408 (2009).

Anderson, R. M., May, R. M., et al. “Vaccination and herd immunity to infectious diseases.” Nature, 318(6044):323–329 (1985).

Aronow, P. M. and Samii, C. “Estimating average causal effects under general interference.” In Summer Meeting of the Society for Political Methodology, University of North Carolina, Chapel Hill, July, 19–21. Citeseer (2012).

Bowers, J., Fredrickson, M. M., and Panagopoulos, C. “Reasoning about Interference Between Units: A General Framework.” Political Analysis, 21(1):97–124 (2013).

Breslow, N. E. “Generalized linear models: checking assumptions and strengthening conclusions.” Statistica Applicata, 8:23–41 (1996).

Cacioppo, J. T., Fowler, J. H., and Christakis, N. A. “Alone in the crowd: the structure and spread of loneliness in a large social network.” Journal of Personality and Social Psychology, 97(6):977 (2009).

Christakis, N. A. and Fowler, J. H. “The spread of obesity in a large social network over 32 years.” New England Journal of Medicine, 357(4):370–379 (2007).

Christakis, N. A. and Fowler, J. H. “The collective dynamics of smoking in a large social network.” New England Journal of Medicine, 358(21):2249–2258 (2008).

Christakis, N. A. and Fowler, J. H. “Social contagion theory: examining dynamic social networks and human behavior.” Statistics in Medicine, 32(4):556–577 (2013).

Cohen-Cole, E. and Fletcher, J. M. “Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic.” Journal of Health Economics, 27(5):1382–1387 (2008).

Eames, K. T. D. and Keeling, M. J. “Modeling dynamic and network heterogeneities in the spread of sexually transmitted diseases.” Proceedings of the National Academy of Sciences, 99(20):13330–13335 (2002).

Eames, K. T. D. and Keeling, M. J. “Monogamous networks and the spread of sexually transmitted diseases.” Mathematical Biosciences, 189(2):115–130 (2004).

Earn, D. J. D., Dushoff, J., and Levin, S. A. “Ecology and evolution of the flu.” Trends in Ecology & Evolution, 17(7):334–340 (2002).

Eubank, S., Guclu, H., Kumar, V. S. A., Marathe, M. V., Srinivasan, A., Toroczkai, Z., and Wang, N. “Modelling disease outbreaks in realistic urban social networks.” Nature, 429(6988):180–184 (2004).

Fine, P. E. M. “Herd immunity: history, theory, practice.” Epidemiologic Reviews, 15(2):265–302 (1993).

Fowler, J. H. and Christakis, N. A. “Estimating peer effects on health in social networks: A response to Cohen-Cole and Fletcher; Trogdon, Nonnemaker, Pais.” Journal of Health Economics, 27(5):1400 (2008).

Gill, J. Generalized Linear Models: A Unified Approach, volume 134. Sage Publications, Inc (2001).

Halloran, M. E. and Hudgens, M. G. “Causal inference for vaccine effects on infectiousness.” International Journal of Biostatistics, 8(2) (2012).

Halloran, M. E. and Struchiner, C. J. “Study designs for dependent happenings.” Epidemiology, 2(5):331–338 (1991).

Halloran, M. E. and Struchiner, C. J. “Modeling transmission dynamics of stage-specific malaria vaccines.” Parasitology Today, 8(3):77–85 (1992).

Halloran, M. E. and Struchiner, C. J. “Causal inference in infectious diseases.” Epidemiology, 6(2):142–151 (1995).

Imai, K., Keele, L., and Tingley, D. “A general approach to causal mediation analysis.” Psychological Methods, 15(4):309–334 (2010).

John, T. J. and Samuel, R. “Herd immunity and herd effect: new insights and definitions.” European Journal of Epidemiology, 16(7):601–606 (2000).

Keeling, M. J. and Eames, K. T. “Networks and epidemic models.” Journal of the Royal Society Interface, 2(4):295–307 (2005).

Keller, M. A. and Stiehm, E. R. “Passive immunity in prevention and treatment of infectious diseases.” Clinical Microbiology Reviews, 13(4):602–614 (2000).

Klovdahl, A. S. “Social networks and the spread of infectious diseases: the AIDS example.” Social Science and Medicine, 21(11):1203–1216 (1985).

Klovdahl, A. S., Potterat, J. J., Woodhouse, D. E., Muth, J. B., Muth, S. Q., and Darrow, W. W. “Social networks and infectious disease: The Colorado Springs study.” Social Science and Medicine, 38(1):79–88 (1994).

Latora, V., Nyamba, A., Simpore, J., Sylvette, B., Diane, S., Sylvere, B., and Musumeci, S. “Network of sexual contacts and sexually transmitted HIV infection in Burkina Faso.” Journal of Medical Virology, 78(6):724–729 (2006).

Lazer, D., Rubineau, B., Chetkovich, C., Katz, N., and Neblo, M. “The coevolution of networks and political attitudes.” Political Communication, 27(3):248–274 (2010).

Lyons, R. “The spread of evidence-poor medicine via flawed social-network analysis.” Statistics, Politics, and Policy, 2(1) (2011).

Noel, H. and Nyhan, B. “The unfriending problem: The consequences of homophily in friendship retention for causal estimates of social influence.” Social Networks, 33(3):211–218 (2011).

O’Brien, K. and Dagan, R. “The potential indirect effect of conjugate pneumococcal vaccines.” Vaccine, 21(17-18):1815–1825 (2003).

Ogburn, E. L. and VanderWeele, T. J. “Causal diagrams for interference.” Technical report (2013).

Pearl, J. “Direct and indirect effects.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 411–420 (2001).

Robins, J. M. and Greenland, S. “Identifiability and exchangeability for direct and indirect effects.” Epidemiology, 3(2):143–155 (1992).

Robins, J. M. and Richardson, T. S. “Alternative graphical causal models and the identification of direct effects.” In Shrout, P. (ed.), Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures. Oxford University Press (2010).

Rosenbaum, P. “Interference between units in randomized experiments.” Journal of the American Statistical Association, 102(477):191–200 (2007).

Rosenquist, J. N., Murabito, J., Fowler, J. H., and Christakis, N. A. “The spread of alcohol consumption behavior in a large social network.” Annals of Internal Medicine, 152(7):426–433 (2010).

Shalizi, C. R. “Comment on "Why and When 'Flawed' Social Network Analyses Still Yield Valid Tests of no Contagion".” Statistics, Politics, and Policy, 3(1) (2012).

Shalizi, C. R. and Thomas, A. C. “Homophily and contagion are generically confounded in observational social network studies.” Sociological Methods & Research, 40(2):211–239 (2011).

Valeri, L. and VanderWeele, T. J. “Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros.” Psychological Methods, 18(2):137–150 (2013).

VanderWeele, T. J. “Sensitivity analysis for contagion effects in social networks.” Sociological Methods & Research, 40(2):240–255 (2011).

VanderWeele, T. J., Ogburn, E. L., and Tchetgen Tchetgen, E. J. “Why and When "Flawed" Social Network Analyses Still Yield Valid Tests of no Contagion.” Statistics, Politics, and Policy, 3(1):1–11 (2012a).

VanderWeele, T. J. and Tchetgen Tchetgen, E. J. “Bounding the Infectiousness Effect in Vaccine Trials.” Epidemiology, 22(5):686 (2011a).

VanderWeele, T. J. and Tchetgen Tchetgen, E. J. “Effect partitioning under interference in two-stage randomized vaccine trials.” Statistics & Probability Letters, 81(7):861–869 (2011b).

VanderWeele, T. J., Tchetgen Tchetgen, E. J., and Halloran, M. E. “Components of the indirect effect in vaccine trials: identification of contagion and infectiousness effects.” Epidemiology, 23(5):751–761 (2012b).
