Contact intervals, survival analysis of epidemic data, and estimation of R_0

Biostatistics (2009), 0, 0, pp. 1–30
doi:10.1093/biostatistics/

EBEN KENAH*
Departments of Biostatistics and Global Health, University of Washington, Seattle, WA 98105, USA
eek4@u.washington.edu

SUMMARY

We argue that the time from the onset of infectiousness to infectious contact, which we call the contact interval, is a better basis for inference in epidemic data than the generation or serial interval. Since contact intervals can be right-censored, survival analysis is the natural approach to estimation. Estimates of the contact interval distribution can be used to estimate $R_0$ in both mass-action and network-based models.

Keywords: Basic reproductive number ($R_0$); Epidemic data; Generation intervals; Survival analysis

© The Author 2009. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

1. INTRODUCTION

Infectious disease remains one of the greatest threats to human health and commerce, and the analysis of epidemic data is one of the most important applications of statistics in public health. Some of the most important questions involve the basic reproductive number, $R_0$, the number of secondary infections caused by a typical infectious person in the early stages of an epidemic (Diekmann and Heesterbeek, 2000). Higher values of $R_0$ indicate that an epidemic will be larger and harder to control. The effects of interventions and the depletion of the susceptible population can be captured with the effective reproductive number $R(t)$, which is the number of secondary infections caused by a typical person infected at time $t$.

The generation interval of an infectious disease is the time between the infection of a secondary case and the infection of his or her infector.
The serial interval is the time between the symptom onset of a secondary case and the symptom onset of his or her infector. The generation and serial interval distributions are often considered characteristic features of an infectious disease (Fine, 2003). For a given $R_0$, a shorter mean serial or generation interval implies faster spread of the epidemic. Usually, generation intervals are times between unobserved events. Serial intervals, which are times between observed events, are often used instead.

Recent analyses of several past, emerging, and potentially emerging infectious diseases have been based on serial interval distributions, including the 1918 influenza (Mills et al., 2004), Severe Acute Respiratory Syndrome (SARS) (Lipsitch et al., 2003; Wallinga and Teunis, 2004), pandemic influenza A(H1N1) (Fraser et al., 2009; McBryde et al., 2009), and avian influenza (Ferguson et al., 2005, 2006). Three methods form the basis of these applications. With a measurement of the exponential growth rate at the beginning of an epidemic and a known serial interval distribution, $R_0$ can be estimated via the Lotka-Euler equation (Diekmann and Heesterbeek, 2000; Svensson, 2007; Wallinga and Lipsitch, 2007; Roberts and Heesterbeek, 2007). Two other methods use the time series of symptom onset times, assuming that all infections are symptomatic and observed. Wallinga and Teunis (2004) estimate $R(t)$ given a known serial interval distribution. Their approach has been adapted by other researchers (Cauchemez et al., 2006), often supplemented with serial-interval observations from contact-tracing data. White and Pagano (2008) jointly estimate $R_0$ and the serial interval distribution using a branching-process approximation to the initial spread of infection, assuming the number of secondary cases generated by each infectious person has a Poisson distribution with mean $R_0$.
There are several problems with estimators based on generation or serial intervals in the context of an emerging infection. The Lotka-Euler and Wallinga-Teunis estimators rely on a previously known generation or serial interval distribution. The Wallinga-Teunis and White-Pagano estimators assume that all serial intervals are independent and identically distributed, which occurs only if the incubation and infectious periods are constant. All three of these estimators assume a stable serial interval distribution, which limits their use to the early spread of infection. When multiple infectious persons compete to infect a given susceptible, the infector is the one who first makes infectious contact. Thus, the mean generation and serial intervals contract as the prevalence of infection increases either locally (e.g., within households) or globally (Svensson, 2007; Kenah et al., 2008).

In this paper, we outline an alternative analysis of epidemic data that applies methods from survival analysis to contact intervals. Informally, the contact interval from an infectious person $i$ to a susceptible person $j$ is the time between the onset of infectiousness in $i$ and the first infectious contact from $i$ to $j$, where we define infectious contact to be a contact sufficient to infect a susceptible individual. This interval will be right-censored if $j$ is infected by someone else prior to infectious contact from $i$ or if $i$ recovers from infection before making infectious contact with $j$. The contact interval is similar to the generation interval, except that its definition is not limited to contacts that actually cause infection and it begins with the onset of infectiousness rather than infection.

Here, we focus on the analysis of completely-observed "Susceptible-Exposed-Infectious-Recovered" (SEIR) epidemics.
The SEIR framework applies to acute, immunizing diseases that spread from person to person, such as measles, influenza, smallpox, and polio. We also assume that the epidemic is completely observed, so all cases are detected and their times of infection, onset of infectiousness, and recovery are observed. Most epidemics are only partially observed, so we plan to explore the analysis of more realistic data sets in future papers. However, partial observation is best viewed as a missing data problem, which requires that the methods for complete data be established first.

In Section 2, we define a general stochastic SEIR epidemic model and show that survival likelihoods for a vector $\theta$ of contact interval distribution parameters have score processes that are zero-mean martingales at the true parameter $\theta_0$. In Section 3, we show how estimates of the contact interval distribution can be used to estimate $R_0$ in mass-action and network-based models. In Section 4, we evaluate the performance of these methods in simulated epidemic data and show that assumptions about the underlying contact process play a crucial role in accurate statistical inference. In Section 5, we discuss the advantages and limitations of survival methods in epidemic data analysis.

2. METHODS

In this section, we show that the score processes from survival likelihoods for epidemic data can be written as stochastic integrals with respect to zero-mean martingales. We develop our methods in three stages. First, we describe the underlying stochastic SEIR model and the observed data. Second, we imagine that we observe who-infected-whom and derive counting-process martingales for an ordered pair $ij$ and for a fixed susceptible $j$. Finally, we consider the situation where we do not observe who-infected-whom and derive counting-process martingales for a fixed susceptible $j$ and for the complete observed data.
Our sources for the underlying theory are Kalbfleisch and Prentice (2002) and Serfling (1980).

2.1 Stochastic SEIR model and observed data

Consider a stochastic "Susceptible-Exposed-Infectious-Removed" (SEIR) model in a closed population of $n$ individuals assigned indices $1, \ldots, n$. Each person $i$ moves from S to E at his or her infection time $t_i$, with $t_i = \infty$ if $i$ is never infected. After infection, $i$ begins a latent period of length $\varepsilon_i$ during which he or she is infected but not infectious. At time $t_i + \varepsilon_i$, $i$ moves from E to I, beginning an infectious period of length $\iota_i$. At time $t_i + r_i$, $i$ recovers from infection and moves from I to R, where the recovery period $r_i = \varepsilon_i + \iota_i$ is the total time between infection and removal. Once in R, $i$ can no longer infect other persons or be infected. The latent period is a nonnegative random variable, the infectious and recovery periods are strictly positive random variables, and the recovery period is finite with probability one.

After becoming infectious at time $t_i + \varepsilon_i$, person $i$ makes infectious contact with person $j \neq i$ at their infectious contact time $t_{ij} = t_i + \varepsilon_i + \tau^*_{ij}$, where the infectious contact interval $\tau^*_{ij}$ is a strictly positive random variable with $\tau^*_{ij} = \infty$ if infectious contact never occurs. Since infectious contact must occur while $i$ is infectious or never, $\tau^*_{ij} \in (0, \iota_i]$ or $\tau^*_{ij} = \infty$. We define infectious contact to be sufficient to cause infection in a susceptible person, so $t_j \le t_{ij}$ with equality if and only if $j$ is susceptible at time $t_{ij}$.

An epidemic begins with one or more persons infected from outside the population, which we call imported infections. For simplicity, we assume that epidemics begin with one or more imported infections at time $t = 0$ and there are no other imported infections.
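The model just described can be sketched as an event-driven simulation. The block below is only an illustration, not the simulation code used for the paper: it assumes that every ordered pair can make contact, that latent and infectious periods are fixed, and that contact intervals have a constant hazard `beta / (n - 1)`. The function name `simulate_seir` and all of its parameters are hypothetical.

```python
import heapq
import math
import random


def simulate_seir(n, beta, latent=1.0, infectious=1.0, seed=0):
    """Simulate one epidemic; return infection times t_i and infectors.

    Assumptions (not from the paper): fixed latent and infectious periods
    and a constant hazard beta / (n - 1) of infectious contact per pair.
    """
    rng = random.Random(seed)
    t_inf = [math.inf] * n      # t_i = infinity if i is never infected
    infector = [None] * n
    # Each queue entry is (infectious contact time t_ij, source i, target j).
    # A source of -1 marks the single imported infection at time 0.
    queue = [(0.0, -1, 0)]
    while queue:
        t, i, j = heapq.heappop(queue)
        if t_inf[j] < math.inf:
            continue            # j was already infected, so t_j < t_ij
        t_inf[j], infector[j] = t, i
        onset = t + latent      # j moves from E to I
        for k in range(n):
            if k == j:
                continue
            tau = rng.expovariate(beta / (n - 1))
            if tau <= infectious:   # otherwise tau*_jk is infinite
                heapq.heappush(queue, (onset + tau, j, k))
    return t_inf, infector
```

Because the earliest infectious contact received by a susceptible person is the one that infects, popping contacts from the queue in time order reproduces the rule $t_j \le t_{ij}$ with equality only for the infector.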
Contact intervals. For each ordered pair $ij$, let $C_{ij} = 1$ if infectious contact from $i$ to $j$ is possible and $C_{ij} = 0$ otherwise. We assume that the infectious contact interval $\tau^*_{ij}$ is generated in the following way: A contact interval $\tau_{ij}$ is drawn from a distribution with hazard function $\lambda_{ij}(\tau)$. If $\tau_{ij} \le \iota_i$ and $C_{ij} = 1$, then $\tau^*_{ij} = \tau_{ij}$. Otherwise $\tau^*_{ij} = \infty$. In this paper, we assume all contact intervals have an absolutely continuous distribution and, for a fixed $i$ or a fixed $j$, the contact intervals $\tau_{ij}$, $i \neq j$, are independent.

Susceptibility and infectiousness processes. Let $S_i(t) = \mathbf{1}_{t \le t_i}$ and $I_i(t) = \mathbf{1}_{t \in (t_i + \varepsilon_i,\, t_i + r_i]}$ be the susceptibility and infectiousness processes, respectively, for person $i$, where $\mathbf{1}_X = 1$ if $X$ is true and zero otherwise. As defined, both processes are left-continuous and infectious contact from $i$ to $j$ is possible at time $t$ only if $C_{ij} I_i(t) S_j(t) = 1$.

Complete observed data. Our population has size $n$, and $m$ represents the number of infections we observe. Observation begins at time $t = 0$ and ends at time $t = T$. During this period, we observe the times of all S → E (infection), E → I (onset of infectiousness), and I → R (recovery) transitions that occur in the population. For all ordered pairs $ij$, we observe $C_{ij}$ and any covariates $X_{ij}$ needed to specify $\lambda_{ij}(\tau)$ up to an unknown parameter vector $\theta$ with true value $\theta_0$.

2.2 Score processes when who-infects-whom is observed

Choose an ordered pair $ij$ and let $N_{ij}(t) = \mathbf{1}_{t \ge t_i + \varepsilon_i + \tau_{ij}}$ count the number of infectious contacts from $i$ to $j$ on or before time $t$. We count only the first infectious contact because $j$ is infected on or before that time. Consider the filtration

$$\mathcal{H}^{ij}_t = \sigma\big( N_{ij}(u), I_i(u), S_j(u) : 0 \le u \le t \big).$$
We assume that $N_{ij}(0) = 0$ and $\lambda_{ij}(\tau)$ is predictable with respect to $\mathcal{H}^{ij}_t$, so

$$M_{ij}(t) = N_{ij}(t) - \int_0^t \lambda_{ij}(u - t_i - \varepsilon_i)\, C_{ij} I_i(u) S_j(u)\, du \tag{2.1}$$

is a zero-mean martingale with respect to $\mathcal{H}^{ij}_t$. Now suppose $\lambda_{ij}(\tau)$ is specified up to a parameter vector $\theta$ with true value $\theta_0$, so $\lambda_{ij}(\tau) = \lambda_{ij}(\tau; \theta_0)$. If the pair $ij$ is observed from time 0 until time $T$, the corresponding log likelihood is

$$\ell_{ij}(\theta) = \int_0^T \ln \lambda_{ij}(u - t_i - \varepsilon_i; \theta)\, dN_{ij}(u) - \int_0^T \lambda_{ij}(u - t_i - \varepsilon_i; \theta)\, C_{ij} I_i(u) S_j(u)\, du.$$

If $\ln \lambda_{ij}(\tau; \theta)$ is differentiable with respect to $\theta$ and we can interchange the order of differentiation and integration, the score process for data in the time interval $[0, t]$ is

$$U_{ij}(\theta, t) = \int_0^t \frac{\partial}{\partial\theta} \ln \lambda_{ij}(u - t_i - \varepsilon_i; \theta)\, dM_{ij}(\theta, u), \tag{2.2}$$

where

$$M_{ij}(\theta, t) = N_{ij}(t) - \int_0^t \lambda_{ij}(u - t_i - \varepsilon_i; \theta)\, C_{ij} I_i(u) S_j(u)\, du.$$

Therefore, $U_{ij}(\theta_0, t)$ is a zero-mean martingale because it is the integral of a predictable process with respect to $M_{ij}(\theta_0, t)$. When $C_{ij} = 0$, we have $M_{ij}(\theta, t) = U_{ij}(\theta, t) = 0$ for all $\theta$ and $t$.

Now fix $j$ and assume there exist covariates $X_{ij}$ such that $\lambda_{ij}(\tau; \theta) = \lambda(\tau; \theta, X_{ij})$ for all $i \neq j$. For each $i \neq j$, assume $N_{ij}(0) = 0$ and $\lambda_{ij}(\tau)$ is predictable with respect to $\mathcal{H}^{ij}_t$. Since the contact intervals $\tau_{ij}$ are independent for a fixed $j$ and absolutely continuous, the $M_{ij}(\theta_0, t)$ from equation (2.1) are orthogonal zero-mean martingales with respect to the filtration

$$\mathcal{H}^{\cdot j}_t = \sigma\big( N_{ij}(u), I_i(u), S_j(u) : 0 \le u \le t,\ i \neq j \big).$$

The total score process for $j$ is

$$U_{\cdot j}(\theta, t) = \sum_{i \neq j} U_{ij}(\theta, t), \tag{2.3}$$

and $U_{\cdot j}(\theta_0, t)$ is a zero-mean martingale with respect to $\mathcal{H}^{\cdot j}_t$ because it is a sum of zero-mean martingales.

The score process in equation (2.3) is that of a survival likelihood where the $t_{ij}$ are failure times and $C_{ij} I_i(t) S_j(t) = 1$ indicates risk of infectious contact in the ordered pair $ij$. At the earliest infectious contact, the contact intervals in all remaining pairs at risk are right-censored, which is a type II independent censoring mechanism (Kalbfleisch and Prentice, 2002).

2.3 Score processes when who-infects-whom is not observed

In the previous section, $U_{\cdot j}(\theta, t)$ is adapted only if we observe which of the $N_{ij}(t)$ jumps first, which is equivalent to observing the infector of person $j$. Now suppose that we observe the infection time of $j$ but not which person $i$ was the infector. This is equivalent to observing $N_{\cdot j}(t) = \sum_{i \neq j} \int_0^t S_j(u)\, dN_{ij}(u)$, which counts the first infectious contact received by $j$. The corresponding filtration is

$$\tilde{\mathcal{H}}^{\cdot j}_t = \sigma\big( N_{\cdot j}(u), I_i(u), S_j(u) : 0 \le u \le t,\ i \neq j \big),$$

and the corresponding zero-mean counting process martingale is $M_{\cdot j}(\theta_0, t)$, where

$$M_{\cdot j}(\theta, t) = N_{\cdot j}(t) - \int_0^t \lambda_{\cdot j}(u; \theta)\, S_j(u)\, du \tag{2.4}$$

and $\lambda_{\cdot j}(t; \theta) = \sum_{i \neq j} \lambda(t - t_i - \varepsilon_i; \theta, X_{ij})\, C_{ij} I_i(t)$.

We can no longer calculate $U_{\cdot j}(\theta, t)$ as defined in equation (2.3), but we can calculate its conditional expectation given $\tilde{\mathcal{H}}^{\cdot j}_t$. For each $ij$, define the expected score process

$$\tilde{U}_{ij}(\theta, t) = \int_0^t \frac{\partial}{\partial\theta} \ln \lambda(u - t_i - \varepsilon_i; \theta, X_{ij})\, E\big[ dM_{ij}(\theta, u) \mid \tilde{\mathcal{H}}^{\cdot j}_t \big]. \tag{2.5}$$

Given that $N_{\cdot j}$ jumps at time $t$, the probability that the jump occurred in $N_{ij}$ is

$$\Pr\big( dN_{ij}(t) = 1 \mid dN_{\cdot j}(t) = 1, \theta, \tilde{\mathcal{H}}^{\cdot j}_t \big) = \frac{\lambda(t - t_i - \varepsilon_i; \theta, X_{ij})\, C_{ij} I_i(t)}{\lambda_{\cdot j}(t; \theta)}.$$
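The conditional probability above says that each infectious person linked to $j$ is weighted by his or her current hazard of infectious contact. A minimal sketch of that calculation follows, assuming $C_{ij} = 1$ for every listed person and no covariates; the helper names `weibull_hazard` and `infector_probabilities` are hypothetical, and the Weibull form is the one used in the simulations of Section 4.

```python
def weibull_hazard(tau, alpha, beta):
    """Weibull hazard alpha * beta * (beta * tau)**(alpha - 1) for tau > 0."""
    return alpha * beta * (beta * tau) ** (alpha - 1)


def infector_probabilities(t, onsets, recoveries, hazard):
    """Probability that each candidate i infected a person infected at time t.

    onsets[i] is t_i + eps_i and recoveries[i] is t_i + r_i.
    Assumes C_ij = 1 for all candidates (an illustrative simplification).
    """
    weights = []
    for onset, recovery in zip(onsets, recoveries):
        at_risk = onset < t <= recovery      # I_i(t) = 1 on (onset, recovery]
        weights.append(hazard(t - onset) if at_risk else 0.0)
    total = sum(weights)                     # lambda_.j(t; theta)
    if total == 0.0:
        raise ValueError("no infectious candidate at time t")
    return [w / total for w in weights]
```

With a constant hazard every infectious candidate is equally likely to be the infector; with an increasing Weibull hazard (shape above one), candidates who have been infectious longer receive more weight.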
Thus,

$$E\big[ dM_{ij}(\theta, u) \mid \tilde{\mathcal{H}}^{\cdot j}_t \big] = \frac{\lambda(u - t_i - \varepsilon_i; \theta, X_{ij})\, C_{ij} I_i(u)}{\lambda_{\cdot j}(u; \theta)}\, dN_{\cdot j}(u) - \lambda(u - t_i - \varepsilon_i; \theta, X_{ij})\, C_{ij} I_i(u) S_j(u)\, du,$$

and equation (2.5) can be rewritten

$$\tilde{U}_{ij}(\theta, t) = \int_0^t \frac{\frac{\partial}{\partial\theta} \lambda(u - t_i - \varepsilon_i; \theta, X_{ij})\, C_{ij} I_i(u)}{\lambda_{\cdot j}(u; \theta)}\, dN_{\cdot j}(u) - \int_0^t \frac{\partial}{\partial\theta} \lambda(u - t_i - \varepsilon_i; \theta, X_{ij})\, C_{ij} I_i(u) S_j(u)\, du.$$

Therefore, the expected score process for person $j$ is

$$\tilde{U}_{\cdot j}(\theta, t) = \sum_{i \neq j} \tilde{U}_{ij}(\theta, t) = \int_0^t \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta)\, dM_{\cdot j}(\theta, u), \tag{2.6}$$

which is the score process of the log likelihood

$$\tilde{\ell}_{\cdot j}(\theta) = \int_0^T \ln \lambda_{\cdot j}(u; \theta)\, dN_{\cdot j}(u) - \int_0^T \lambda_{\cdot j}(u; \theta)\, S_j(u)\, du. \tag{2.7}$$

$\tilde{U}_{\cdot j}(\theta_0, t)$ is a zero-mean martingale with respect to $\tilde{\mathcal{H}}^{\cdot j}_t$ because it is the integral of a predictable process with respect to $M_{\cdot j}(\theta_0, t)$. For an imported infection $j$, $\tilde{U}_{\cdot j}(\theta, t) = 0$ for all $\theta$ and all $t \in [0, T]$.

Finally, consider the filtration

$$\tilde{\mathcal{H}}_t = \sigma\big( N_{\cdot j}(u), I_j(u), S_j(u) : 0 \le u \le t,\ j = 1, \ldots, n \big)$$

generated by the complete data described at the end of Section 2.1. Since we assume that the $\tau_{ij}$, $j \neq i$, are independent for a fixed $i$ and absolutely continuous, the $M_{\cdot j}(\theta_0, t)$ from equation (2.4) are orthogonal zero-mean martingales with respect to $\tilde{\mathcal{H}}_t$. The total expected score process is

$$\tilde{U}(\theta, t) = \sum_{j=1}^n \tilde{U}_{\cdot j}(\theta, t), \tag{2.8}$$

which is the score process for the log likelihood

$$\tilde{\ell}(\theta) = \sum_{j=1}^n \tilde{\ell}_{\cdot j}(\theta). \tag{2.9}$$

$\tilde{U}(\theta_0, t)$ is a zero-mean martingale with respect to $\tilde{\mathcal{H}}_t$ because it is a sum of zero-mean martingales. The maximum likelihood estimate (MLE) for $\theta$ is the solution to the equation $\tilde{U}(\hat{\theta}, T) = 0$.
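For intuition, consider the special case of a constant hazard $\lambda(\tau; \beta) = \beta$ with no covariates. Then $\lambda_{\cdot j}(u; \beta)$ is $\beta$ times the number of infectious persons linked to $j$, so the log likelihood (2.9) equals $m' \ln \beta - \beta E$ plus a constant, where $m'$ is the number of infections acquired within the population by time $T$ and $E = \sum_j \sum_{i \neq j} \int_0^T C_{ij} I_i(u) S_j(u)\, du$ is the total pair-time at risk. Solving the score equation gives $\hat{\beta} = m' / E$. The sketch below computes this under those assumptions; all function and argument names are hypothetical.

```python
import math


def exposure(onset_i, recovery_i, t_j, T):
    """Length of time in [0, T] with I_i(u) S_j(u) = 1."""
    return max(0.0, min(recovery_i, t_j, T) - onset_i)


def exponential_mle(t_inf, onset, recovery, T, imported, C=None):
    """MLE of a constant contact-interval hazard from complete data.

    t_inf, onset, recovery: per-person times (math.inf if never reached).
    imported: set of indices infected from outside the population.
    C: optional 0/1 matrix of possible contacts; None means all pairs.
    """
    n = len(t_inf)
    events, at_risk = 0, 0.0
    for j in range(n):
        if j not in imported and t_inf[j] <= T:
            events += 1                      # a jump of N_.j before time T
        for i in range(n):
            if i != j and (C is None or C[i][j]):
                at_risk += exposure(onset[i], recovery[i], t_inf[j], T)
    return events / at_risk
```

For example, with one imported case infectious on $(1, 3]$, one secondary case infected at time 2 and infectious on $(3, 5]$, one person never infected, and $T = 4$, the pair-time at risk is $1 + 2 + 1 = 4$ and the estimate is $1/4$.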
2.4 Asymptotic distribution of $\hat{\theta}$

In this section, we show that the variance of $\tilde{U}(\theta_0, t)$ can be estimated using its predictable and optional variation processes, which are unbiased estimators of the Fisher information from the survival likelihood. We then use the Lindeberg-Feller Central Limit Theorem to give a heuristic justification for standard maximum likelihood estimation with epidemic data. Throughout this section, we assume that $\lambda(\tau; \theta, X)$ has a bounded second derivative in $\theta$ and that integration and differentiation can be interchanged.

Taking the derivative of $\tilde{U}_{\cdot j}(\theta, t)$ with respect to $\theta$ in equation (2.6) leads to

$$-\frac{\partial}{\partial\theta} \tilde{U}_{\cdot j}(\theta, t) = -\int_0^t \frac{\partial^2}{\partial\theta^2} \ln \lambda_{\cdot j}(u; \theta)\, dM_{\cdot j}(\theta, u) + \int_0^t \Big[ \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta) \Big] \Big[ \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta) \Big]^T \lambda_{\cdot j}(u; \theta)\, S_j(u)\, du.$$

Setting $\theta = \theta_0$ makes the first term the integral of a predictable process with respect to a zero-mean martingale. Therefore,

$$E\Big[ -\frac{\partial}{\partial\theta} \tilde{U}_{\cdot j}(\theta_0, t) \Big] = E\Big[ \int_0^t \Big[ \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta_0) \Big] \Big[ \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta_0) \Big]^T \lambda_{\cdot j}(u; \theta_0)\, S_j(u)\, du \Big], \tag{2.10}$$

so the predictable variation process $\langle \tilde{U}_{\cdot j}(\theta_0) \rangle(t)$ is an unbiased estimator of $\mathrm{Var}[\tilde{U}_{\cdot j}(\theta_0, t)]$. By equation (2.8) and orthogonality of the $\tilde{U}_{\cdot j}(\theta_0, t)$, the total predictable variation process $\langle \tilde{U}(\theta_0) \rangle(t) = \sum_j \langle \tilde{U}_{\cdot j}(\theta_0) \rangle(t)$ is an unbiased estimator of $\mathrm{Var}[\tilde{U}(\theta_0, t)]$.

To show that the same result holds for the optional variation process, rearrange equation (2.6) to get

$$\tilde{U}_{\cdot j}(\theta, t) = \int_0^t \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta)\, dN_{\cdot j}(u) - \int_0^t \frac{\partial}{\partial\theta} \lambda_{\cdot j}(u; \theta)\, S_j(u)\, du.$$

Taking the derivative with respect to $\theta$ yields

$$-\frac{\partial}{\partial\theta} \tilde{U}_{\cdot j}(\theta, t) = \int_0^t \Big[ \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta) \Big] \Big[ \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta) \Big]^T dN_{\cdot j}(u) - \int_0^t \frac{\frac{\partial^2}{\partial\theta^2} \lambda_{\cdot j}(u; \theta)}{\lambda_{\cdot j}(u; \theta)}\, dM_{\cdot j}(\theta, u).$$

Setting $\theta = \theta_0$ makes the second term the integral of a predictable process with respect to a zero-mean martingale. Therefore,

$$E\Big[ -\frac{\partial}{\partial\theta} \tilde{U}_{\cdot j}(\theta_0, t) \Big] = E\Big[ \int_0^t \Big[ \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta_0) \Big] \Big[ \frac{\partial}{\partial\theta} \ln \lambda_{\cdot j}(u; \theta_0) \Big]^T dN_{\cdot j}(u) \Big], \tag{2.11}$$

so the optional variation process $[\tilde{U}_{\cdot j}(\theta_0)](t)$ is an unbiased estimator of $\mathrm{Var}[\tilde{U}_{\cdot j}(\theta_0, t)]$ and the total optional variation process $[\tilde{U}(\theta_0)](t) = \sum_j [\tilde{U}_{\cdot j}(\theta_0)](t)$ is an unbiased estimator of $\mathrm{Var}[\tilde{U}(\theta_0, t)]$.

Imagine a series of epidemics in larger and larger populations, and assume that the final sizes of the epidemics become infinite as the population size $n \to \infty$. For any fixed $T$, the number of infections will not become infinite as $n \to \infty$, which makes it difficult to apply the Martingale Central Limit Theorem to $\tilde{U}(\theta_0, T)$. Instead, imagine that we observe $m_n$ infections in a population of size $n$ between time 0 and time $T_n$, with $m_n \to \infty$ as $n \to \infty$. Let $\tilde{U}_n(\theta, T_n)$ be the corresponding total expected score process, and let $\hat{\theta}_n$ be the corresponding MLE. If the Lindeberg condition holds for the triangular array $\tilde{U}_{\cdot 1}(\theta_0, T_n), \ldots, \tilde{U}_{\cdot n}(\theta_0, T_n)$, then

$$\frac{\tilde{U}_n(\theta_0, T_n)}{\mathrm{Var}[\tilde{U}_n(\theta_0, T_n)]^{1/2}} \to N(0, 1)$$

in distribution as $n \to \infty$ by the Lindeberg-Feller Central Limit Theorem (Serfling, 1980). Heuristically, this justifies the use of maximum likelihood methods such as Wald, score, and likelihood ratio tests.

3. ESTIMATION OF $R_0$

The contact interval distribution can be used to estimate $R_0$ in both network-based and mass-action models. For simplicity, we assume that the hazard of infectious contact does not depend on covariates. Thus, $\lambda_{ij}(\tau; \theta) = \lambda(\tau; \theta)$ for all $ij$ and the results in this section apply to homogeneous populations.
For mass-action models, we describe an asymptotic likelihood that is valid for the initial spread of disease.

3.1 Network-based models

In a network-based model, transmission takes place across the edges of a contact network, so we have $C_{ij} = 1$ if and only if there is an edge leading from $i$ to $j$ in the contact network. Here, we will assume that contact networks are undirected, so $C_{ij} = C_{ji}$ for all $i$ and $j$.

In a network-based model, $R_0$ depends on the structure of the contact network. The most tractable models are those on configuration-model networks, which are maximally random except for their degree distribution (Molloy and Reed, 1995, 1998; Newman et al., 2002). More formally, let $D$ be a nonnegative discrete random variable with finite mean and variance. To construct a configuration-model network with $n$ nodes, assign each node $i = 1, \ldots, n$ a degree $d_i$ randomly sampled from the distribution of $D$. Then connect the stubs at random, erasing one stub if necessary so the sum of the degrees is even. As $n \to \infty$, the probability of multiple edges between two nodes or a loop from a node to itself goes to zero.

In these networks, there is a straightforward definition of $R_0$ (Andersson, 1998; Newman, 2002; Kenah and Robins, 2007). In the early stages of transmission, an infected node of degree $d$ has $d - 1$ edges across which infection can be transmitted. The probability of transmitting infection across each of these edges is $1 - \exp(-\Lambda(\iota; \theta_0))$, where $\iota$ is the infectious period and $\Lambda(t; \theta) = \int_0^t \lambda(u; \theta)\, du$. Since the probability of reaching a node by following edges is proportional to the degree of the node, the mean number of secondary infections generated by a typical infected node in the early stages of an epidemic is

$$R_0 = E\big[ 1 - e^{-\Lambda(\iota; \theta_0)} \big] \Big( \frac{E[D^2]}{E[D]} - 1 \Big), \tag{3.1}$$

where the first expectation is taken over the distribution of the infectious period $\iota$.
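As a small numerical illustration of this definition, the sketch below evaluates $R_0$ on a configuration-model network from a list of infectious periods and a list of degrees. It takes the per-edge transmission probability to be $1 - \exp(-\Lambda(\iota))$, the form that the plug-in estimators of Section 4.1.2 use, and replaces both expectations with sample means; the function name `network_r0` and its arguments are hypothetical.

```python
import math


def network_r0(cum_hazard, infectious_periods, degrees):
    """R0 on a configuration-model network from empirical samples.

    cum_hazard: function giving the cumulative hazard Lambda(iota).
    infectious_periods: sample of infectious periods iota.
    degrees: sample of node degrees from the whole network.
    """
    # Mean probability of transmission across a single edge.
    p_edge = sum(1.0 - math.exp(-cum_hazard(x)) for x in infectious_periods)
    p_edge /= len(infectious_periods)
    # E[D^2] / E[D] - 1: mean number of onward edges of a node
    # reached by following a randomly chosen edge.
    mean_d = sum(degrees) / len(degrees)
    mean_d2 = sum(d * d for d in degrees) / len(degrees)
    return p_edge * (mean_d2 / mean_d - 1.0)
```

With the same mean degree, a more variable degree distribution gives a larger $R_0$: degrees `[1, 1, 1, 5]` and `[2, 2, 2, 2]` both have mean 2, but the factor $E[D^2]/E[D] - 1$ is 2.5 for the first and 1 for the second.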
Network-based likelihood. In a network-based model, the likelihood $\tilde{\ell}(\theta)$ in equation (2.9) depends only on data about individuals who are either infected before time $T$ or connected to an infected person in the contact network. In principle, these people could be identified through surveillance and contact tracing. For all other individuals $j$, $\tilde{U}_{\cdot j}(\theta, t) = 0$ for all $t \in [0, T]$ because $C_{ij} I_i(t) = 0$ for all $i$. Since $E[D^2]/E[D]$ is the expected degree of persons who are infected by transmission within the population, it can be estimated by calculating the mean degree of persons who are infected.

3.2 Mass-action models

In a mass-action model, individuals form no stable social bonds and interact like gas molecules. Thus, $C_{ij} = 1$ for all $ij$ but the hazard of infectious contact is inversely proportional to the population size. If $\lambda_n(\tau; \theta)$ is the hazard function for the contact interval distribution in a population of size $n$,

$$\lambda_n(\tau; \theta) = \frac{\lambda_0(\tau; \theta)}{n - 1}$$

for a baseline hazard function $\lambda_0(\tau; \theta)$ with corresponding cumulative hazard function $\Lambda_0(\tau; \theta)$. As before, these functions are specified up to an unknown parameter vector $\theta$ with true value $\theta_0$.

The baseline hazard and cumulative hazard functions of a mass-action model have useful interpretations in terms of $R_0$ and the time course of infectiousness in the limit as $n \to \infty$. Given an infectious period $\iota$, the expected number of infectious contacts made is

$$R_0 = (n - 1) \Big( 1 - e^{-\frac{1}{n-1} \Lambda_0(\iota; \theta_0)} \Big) \to \Lambda_0(\iota; \theta_0). \tag{3.2}$$

Given that $i$ makes infectious contact with $j$ and has infectious period $\iota$, the probability density function of the infectious contact interval from $i$ to $j$ is

$$\frac{\frac{1}{n-1} \lambda_0(\tau; \theta_0)\, e^{-\frac{1}{n-1} \Lambda_0(\tau; \theta_0)}}{1 - e^{-\frac{1}{n-1} \Lambda_0(\iota; \theta_0)}} \to \frac{\lambda_0(\tau; \theta_0)}{R_0}. \tag{3.3}$$
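The limit in equation (3.2) is easy to check numerically. The sketch below evaluates the finite-population expression for a given value of the baseline cumulative hazard at the end of the infectious period; the function name `mass_action_r0` is hypothetical, and `math.expm1` is used only to avoid rounding error when the exponent is tiny.

```python
import math


def mass_action_r0(n, cum_hazard_at_iota):
    """Expected number of infectious contacts in a population of size n.

    cum_hazard_at_iota is Lambda_0(iota; theta_0), the baseline cumulative
    hazard evaluated at the end of the infectious period.
    """
    # (n - 1) * (1 - exp(-Lambda_0 / (n - 1))), written with expm1.
    return -(n - 1) * math.expm1(-cum_hazard_at_iota / (n - 1))
```

The finite-population value lies below the limit and approaches it from below as $n$ grows, because each of the $n - 1$ potential contacts can be counted at most once.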
Mass-action likelihood. Let $m$ be the total number of infections observed before time $T$. If $m \ll n$, an approximate likelihood that depends only on information about infected persons can be written in terms of $\lambda_0(\tau; \theta)$. Expanding equation (2.9) in terms of $\lambda_0(\tau; \theta)$, we get

$$\tilde{\ell}(\theta) = \sum_{j=1}^n \int_0^T \ln\Big( \sum_{i \neq j} \lambda_0(u - t_i - \varepsilon_i; \theta)\, I_i(u) \Big)\, dN_{\cdot j}(u) - \sum_{j=1}^n \int_0^T \ln(n - 1)\, dN_{\cdot j}(u) - \frac{1}{n-1} \sum_{j=1}^n \int_0^T \Big( \sum_{i \neq j} \lambda_0(u - t_i - \varepsilon_i; \theta)\, I_i(u) \Big) S_j(u)\, du. \tag{3.4}$$

All summands in the first term are zero except for those $j$ with $t_j \le T$. The second term is not a function of $\theta$ and can be ignored. The third term can be split into terms from $j$ who get infected on or before time $T$ and from those who remain uninfected at time $T$:

$\frac{1}{n-1} \sum_{j : t_j \le T} \Big( \sum_{i : t_i}$ [...]

[...] $0$, where $\beta > 0$ is the rate parameter. The Weibull distribution has the hazard function $\lambda(\tau; \alpha, \beta) = \alpha\beta(\beta\tau)^{\alpha - 1}$ for all $\tau > 0$, where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the rate parameter. Note that the exponential distribution is a Weibull distribution with $\alpha = 1$.

Parameter estimates. For network-based models, we used the likelihood in equation (2.9) to estimate the parameters of the contact interval distribution. For mass-action models, we used the asymptotic likelihood in equation (3.5) to estimate the parameters of the baseline contact interval distribution. Maximum likelihood estimates were obtained using the mle function in the R library stats4. Confidence intervals for each parameter were calculated using the confint function, which inverts the one-parameter likelihood ratio chi-squared test using a profile likelihood.

$R_0$ estimates. For network-based models, $R_0$ was estimated using equation (3.1). For mass-action models, $R_0$ was estimated using equation (3.2).
We calculated bootstrap percentile confidence intervals by sampling contact interval distribution parameters from their approximate joint normal distribution and combining each sample with a bootstrap sample of the observed infectious periods (and, for network-based models, observed degrees in the contact network). The 95% confidence interval was defined by the 2.5% and 97.5% quantiles of the point estimates from 10,000 samples.

Implementation. Simulations were implemented in Python 2.6 (www.python.org) using the SciPy 0.7 package (Jones et al., 2009). Analyses were performed in R 2.10 (R Development Core Team, 2009) via the RPy 2.0 package (Moreira and Warnes, 2009). Contact networks were generated using the NetworkX 0.99 package (Hagberg et al., 2008). Sampling from multivariate normal distributions was done using the Cholesky decomposition of the covariance matrix (Rizzo, 2008). The simulation code is included as supplementary material (http://www.biostatistics.oxfordjournals.org).

4.1.1 Mass-action models

For mass-action models with exponential contact intervals, $R_0 = \beta$ for both fixed and exponentially-distributed infectious periods. Let $\hat{\beta}$ denote the MLE of the rate parameter $\beta$, and let $\iota_k$ denote the infectious period of the $k$th infection observed. Our point estimate of $R_0$ is

$$\hat{R}_0 = \frac{1}{m} \sum_{k=1}^m \hat{\beta} \iota_k. \tag{4.1}$$

A bootstrap sample of $R_0$ is

$$R_0^* = \frac{1}{m} \sum_{k=1}^m \beta^* \iota_k^*, \tag{4.2}$$

where $\beta^*$ is a parametric bootstrap sample from the approximate normal distribution of $\hat{\beta}$ and $\iota_1^*, \ldots, \iota_m^*$ is a bootstrap sample from the observed $\iota_1, \ldots, \iota_m$.

For mass-action models with Weibull contact intervals, $R_0 = \beta^\alpha$ for a fixed infectious period and $R_0 = \beta^\alpha \Gamma(\alpha + 1)$ for exponentially-distributed infectious periods.
In both cases,

$$\hat{R}_0 = \frac{1}{m} \sum_{k=1}^m (\hat{\beta} \iota_k)^{\hat{\alpha}}, \tag{4.3}$$

where $\hat{\alpha}$ is the shape parameter MLE and $\hat{\beta}$ is the rate parameter MLE. A bootstrap sample of $R_0$ is

$$R_0^* = \frac{1}{m} \sum_{k=1}^m (\beta^* \iota_k^*)^{\alpha^*}, \tag{4.4}$$

where $(\alpha^*, \beta^*)$ is a sample from the approximate joint normal distribution of $(\hat{\alpha}, \hat{\beta})$.

Results. Table 1 shows the coverage probabilities achieved in 1,000 simulations and exact binomial 95% confidence intervals for the true coverage probabilities in each of the four types of mass-action model. Figure 1 shows a scatterplot of $\hat{R}_0$ versus $R_0$ for models with exponential contact interval and infectious period distributions. Figure 2 shows a scatterplot of estimated versus true $\ln(R_0)$ for models with Weibull contact interval distributions and exponential infectious period distributions. For these models, estimates of $R_0$ are right-skewed because of the exponent $\hat{\alpha}$ in equation (4.3); this is reduced by taking logarithms. Similar results were obtained in models with a fixed infectious period.

4.1.2 Network-based models

Let $\iota_k$ and $d_k$ denote the infectious period and degree, respectively, of the $k$th infection observed. In a contact network with $n$ nodes, let $\bar{D}$ be the mean degree and let

$$\tilde{D} = \frac{1}{n \bar{D}} \sum_{i=1}^n d_i (d_i - 1),$$

which is the empirical counterpart of $E[D^2]/E[D] - 1$ in equation (3.1). For network-based models with exponential contact intervals, $R_0 = (1 - \exp(-\beta)) \tilde{D}$ for a fixed infectious period and $R_0 = \frac{\beta}{\beta + 1} \tilde{D}$ for exponentially-distributed infectious periods. In both cases,

$$\hat{R}_0 = \frac{1}{m} \sum_{k=1}^m (1 - e^{-\hat{\beta} \iota_k})(d_k - 1).$$

A bootstrap sample of $R_0$ is

$$R_0^* = \frac{1}{m} \sum_{k=1}^m (1 - e^{-\beta^* \iota_k^*})(d_k^* - 1),$$

where $\beta^*$ is a sample from the approximate normal distribution of $\hat{\beta}$ and $(\iota_1^*, d_1^*), \ldots, (\iota_m^*, d_m^*)$ is a bootstrap sample from $(\iota_1, d_1), \ldots, (\iota_m, d_m)$.
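The plug-in estimators and bootstrap samples described above share one recipe: draw the contact interval parameters from their approximate normal distribution, resample the observed cases with replacement, and recompute the estimate. The following is a rough sketch of that recipe for the Weibull case (the exponential case is $\alpha = 1$), not the R code used in the paper. The function names, the rule of discarding non-positive parameter draws, and the crude order-statistic quantiles are all assumptions made for illustration.

```python
import math
import random


def r0_point(alpha, beta, cases, network):
    """Plug-in estimate; each case is (infectious period, degree)."""
    total = 0.0
    for iota, d in cases:
        cum_hazard = (beta * iota) ** alpha      # Weibull cumulative hazard
        if network:
            total += (1.0 - math.exp(-cum_hazard)) * (d - 1)
        else:
            total += cum_hazard                  # mass-action form
    return total / len(cases)


def draw_normal2(mean, cov, rng):
    """Sample a bivariate normal using the Cholesky factor of cov."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2


def bootstrap_ci(mle, cov, cases, network, reps=2000, seed=0):
    """Approximate 95% percentile interval for R0."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < reps:
        alpha, beta = draw_normal2(mle, cov, rng)
        if alpha <= 0 or beta <= 0:
            continue    # assumption: discard draws outside the parameter space
        resampled = [rng.choice(cases) for _ in cases]
        draws.append(r0_point(alpha, beta, resampled, network))
    draws.sort()
    return draws[int(0.025 * reps)], draws[int(0.975 * reps) - 1]
```

Resampling the pairs $(\iota_k, d_k)$ jointly, rather than separately, preserves any dependence between infectious period and degree among the observed cases.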
For network-based models with Weibull contact intervals, $R_0 = (1 - \exp(-\beta^\alpha)) \tilde{D}$ for a fixed infectious period and

$$R_0 = \Big( 1 - \int_0^\infty e^{-(\beta x)^\alpha - x}\, dx \Big) \tilde{D}$$

for exponentially-distributed infectious periods. In both cases,

$$\hat{R}_0 = \frac{1}{m} \sum_{k=1}^m \big( 1 - e^{-(\hat{\beta} \iota_k)^{\hat{\alpha}}} \big)(d_k - 1).$$

A bootstrap sample of $R_0$ is

$$R_0^* = \frac{1}{m} \sum_{k=1}^m \big( 1 - e^{-(\beta^* \iota_k^*)^{\alpha^*}} \big)(d_k^* - 1),$$

where $(\alpha^*, \beta^*)$ is a sample from the approximate joint normal distribution of $(\hat{\alpha}, \hat{\beta})$ and $(\iota_1^*, d_1^*), \ldots, (\iota_m^*, d_m^*)$ is a bootstrap sample from $(\iota_1, d_1), \ldots, (\iota_m, d_m)$.

Results. Table 2 shows the coverage probabilities achieved in 1,000 simulations and exact binomial 95% confidence intervals for the true coverage probability in each of the four types of network-based model. Figure 3 shows a scatterplot of the estimated versus true $R_0$ for models with exponential contact interval and infectious period distributions. Figure 4 shows a scatterplot of the estimated versus true $R_0$ for models with Weibull contact interval distributions and exponential infectious period distributions. Similar results were obtained in models with a fixed infectious period.

Mass-action estimates. To look at the effect of assumptions about the contact process on statistical inference during an epidemic, we applied the mass-action likelihoods to data generated by the network-based models, ignoring all information about the contact network. Table 2 shows the coverage probabilities achieved in 1,000 simulations and exact binomial 95% confidence intervals for the true coverage probabilities for mass-action estimates applied to network-based models. The '+' signs in Figures 3 and 4 show the mass-action estimates of $R_0$ versus the true $R_0$ in network-based models with exponential infectious periods.
Many of these points fall above the top edge of each graph. Similar results were obtained in models with a fixed infectious period.

4.2 Illustration: Influenza A(H1N1) in Mexico, 2009

To show the practicability of methods based on contact intervals, as well as the importance of data that is uncollected or unreported in emerging epidemics, we attempted to estimate $R_0$ based on two epidemic curves published at the beginning of the influenza A(H1N1) pandemic in Mexico. The first epidemic curve contains suspected cases in the village of Vera Cruz between March 9 and March 20 (Fraser et al., 2009). The second epidemic curve contains lab-confirmed cases in Mexico City between April 13 and April 24 (Ministry of Health, 2009). In both analyses, we assumed a latent period (between infection and the onset of infectiousness) of one day and an incubation period (between infection and onset of symptoms) of two days. With no data on links between cases or the duration of illness in each case, we assumed mass-action with a constant infectious period. Confidence intervals were generated as in the simulations.

Assuming an exponential contact interval distribution, we get $\hat{R}_0 = 1.95$ (1.63, 2.33) for Vera Cruz and $\hat{R}_0 = 2.31$ (2.15, 2.48) for Mexico City. These are high but consistent with some early estimates (Fraser et al., 2009; Yang et al., 2009). Assuming a Weibull contact interval distribution, we get $\hat{R}_0 = 3.08$ (2.55, 3.65) for Vera Cruz and $\hat{R}_0 = 4.37$ (4.06, 4.70) for Mexico City; in both cases, the null hypothesis of an exponential contact interval distribution is strongly rejected (likelihood ratio p-value < .001). The estimates are also sensitive to the assumed infectious period. Assuming a five-day infectious period and a Weibull contact interval distribution, we get $\hat{R}_0 = 3.53$ (2.79, 4.30) for Vera Cruz and $\hat{R}_0 = 7.14$ (6.63, 7.66) for Mexico City.

Subsequent experience shows that these $R_0$ estimates are far too high. This bias is consistent with the results obtained above when applying mass-action estimates to simulated data generated by network-based models. Since most influenza transmission takes place in households, workplaces, and schools (Yang et al., 2009), the true underlying transmission model is probably closer to a network-based model than to a mass-action model. Data on the duration of illness and, more importantly, on the social links between cases would allow better point and interval estimates of $R_0$. The estimates could also be improved with incomplete-data methods that take into account the discreteness of the data and allow variability in latent, incubation, and infectious periods.

5. DISCUSSION

The results of the simulations confirm that standard maximum-likelihood methods can be applied successfully to survival likelihoods written in terms of the contact interval distribution. In the mass-action models, performance deteriorated noticeably in moving from exponential to Weibull contact interval distributions, possibly because $\tilde{U}(\theta_0, T)$ was closer to a normal distribution in the simpler models. No such deterioration was noticeable in the network-based models, possibly due to the addition of contact-tracing information.

Our methods were deliberately simple: all point estimates were plug-in estimators and all confidence intervals were based on normal approximations for the joint distributions of the MLEs. More sophisticated methods, such as Bayesian methods, might produce point estimates and confidence intervals whose performance is even better.
The methods here would adapt quite well to a Bayesian analysis, and we believe that a Bayesian framework is the most natural setting for the development of methods to analyze partially observed epidemics.

Methods based on contact intervals can incorporate a much greater variety of transmission models than methods based on generation or serial intervals, which usually assume mass-action. The simulation results presented above show that this flexibility is essential for accurate statistical inference during an epidemic. The mass-action estimates failed spectacularly when applied to data generated by network-based models. The point estimates were severely biased upward, and all 95% confidence intervals had coverage probabilities below 85%, with most below 25%.

The methods and simulation results in this paper have important implications for data collection during an emerging epidemic. First, they require information on the onset and duration of infectiousness. For an acute infectious disease, the onset and duration of illness may provide a useful proxy, especially if there is some knowledge of the incubation period and the pattern of pathogen shedding. Second, they show the potential value of data about close contacts of cases, whether or not they are infected. Methods based on generation and serial intervals do not require such data, but this apparent advantage comes at a tremendous cost in terms of the flexibility and validity of the subsequent analysis. They are essentially missing-data methods with no complete-data counterparts, and they almost certainly understate the true data requirements for accurate estimation of $R_0$.

Limitations. The SEIR framework limits our methods to acute, immunizing diseases that spread person-to-person.
It does not apply to many diseases of public health importance, such as tuberculosis, meningococcal or pneumococcal diseases, foodborne or waterborne diseases, or HIV/AIDS. Most (though not all) emerging infections fit into the SEIR framework, and almost all methods currently used to analyze data from emerging epidemics make this assumption. We also assumed that all times of infection, onset of infectiousness, and recovery are observed. This is clearly unsatisfactory, but the development of incomplete-data methods must be based on complete-data methods. In Section 2, we assumed that the contact interval $\tau_{ij}$ is independent of the infectious period $\iota_i$ of $i$. This simplified the likelihoods, but it is probably unrealistic. This problem could be addressed by including $\iota_i$ as a covariate in $X_{ij}$ or by using multivariate survival methods. In Section 3, we assumed that the population is homogeneous. This simplified the estimation of $R_0$, but it is also unrealistic. In a heterogeneous population, estimates of $R_0$ would have to include the distribution of relevant covariates in the population.

Despite these limitations, methods based on contact intervals and survival analysis have the potential to become important tools in infectious disease epidemiology. The purpose of this paper was to introduce survival analysis based on contact intervals as a useful complete-data method, and we have done so in the simplest setting possible. These methods can be seen as descendants of methods based on generation and serial intervals, but they are more flexible and more explicit about assumptions and data requirements.

ACKNOWLEDGMENTS

I would like to thank M. Elizabeth Halloran for her guidance throughout the preparation of this manuscript. I am also grateful for the comments of Yang Yang, Ira M. Longini, Jr., participants in the workshop “Design and Analysis of Infectious Disease Studies” (Mathematisches Forschungsinstitut Oberwolfach, 1-7 November 2009), and the anonymous referees of Biostatistics. This work was supported by National Institute of General Medical Sciences grant F32GM085945, “Linking transmission models and data analysis in infectious disease epidemiology”. Office space and administrative support were provided by the Fred Hutchinson Cancer Research Center. Conflict of interest: None declared.

REFERENCES

Andersson, H. (1998). Limit theorems for a random graph epidemic model. Annals of Applied Probability 8, 1331–1349.

Cauchemez, S., P.-Y. Boelle, G. Thomas, and A.-J. Valleron (2006). Estimating in real time the efficacy of measures to control emerging communicable diseases. American Journal of Epidemiology 164, 591–597.

Diekmann, O. and J. A. P. Heesterbeek (2000). Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. Wiley Series in Mathematical and Computational Biology. Hoboken, NJ: John Wiley & Sons.

Ferguson, N. M., D. A. T. Cummings, S. Cauchemez, C. Fraser, S. Riley, A. Meeyai, S. Iamsirithaworn, and D. S. Burke (2005). Strategies for containing an emerging influenza pandemic in southeast asia. Nature 437, 209–214.

Ferguson, N. M., D. A. T. Cummings, C. Fraser, J. C. Cajka, P. C. Cooley, and D. S. Burke (2006). Strategies for mitigating an influenza pandemic. Nature 442, 448–452.

Fine, P. E. M. (2003). The interval between successive cases of an infectious disease. American Journal of Epidemiology 158, 1039–1047.

Fraser, C., C. A. Donnelly, S. Cauchemez, W. P. Hanage, M. D. Van Kerkhove, T. D. Hollingsworth, J. Griffin, R. F. Baggaley, H. E. Jenkins, E. J. Lyons, T. Jombart, W. R. Hinsley, N. C. Grassly, F. Balloux, A. C. Ghani, and N. M. Ferguson (2009).
Pandemic potential of a strain of influenza A (H1N1): Early findings. Science 324, 1557–1561.

Hagberg, A. A., D. A. Schult, and P. J. Swart (2008). Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), Pasadena, CA USA, pp. 11–15.

Jones, E., T. Oliphant, P. Peterson, et al. (2001–2009). SciPy: Open source scientific tools for Python.

Kalbfleisch, J. D. and R. L. Prentice (2002). The Statistical Analysis of Failure Time Data (Second ed.). Wiley Series in Probability and Statistics. Hoboken, NJ: John Wiley & Sons.

Kenah, E., M. Lipsitch, and J. M. Robins (2008). Generation interval contraction and epidemic data analysis. Mathematical Biosciences 213, 71–79.

Kenah, E. and J. M. Robins (2007). Second look at the spread of epidemics on networks. Physical Review E 76, 036113.

Lipsitch, M., T. Cohen, B. Cooper, J. M. Robins, S. Ma, L. James, G. Gopalakrishna, S. K. Chew, C. C. Tan, M. H. Samore, D. Fisman, and M. Murray (2003). Transmission dynamics and control of severe acute respiratory syndrome. Science 300, 1966–1970.

McBryde, E. S., I. Bergeri, C. van Gemert, J. Rotty, E. J. Headley, K. Simpson, R. A. Lester, M. Hellard, and J. E. Fielding (2009). Early transmission characteristics of influenza A(H1N1)v in Australia: Victorian State, 16 May–3 June 2009. Eurosurveillance 14, 19363.

Mills, C., J. M. Robins, and M. Lipsitch (2004). Transmissibility of 1918 pandemic influenza. Nature 432, 904–906.

Ministry of Health, G. o. M. (2009). Situación actual de la epidemia (20 de mayo del 2009).

Molloy, M. and B. Reed (1995). A critical point for random graphs with a given degree sequence. Random Structures and Algorithms 6, 161–180.

Molloy, M. and B. Reed (1998). The size of the giant component of a random graph with a given degree sequence. Combinatorics, Probability, and Computing 7, 295–305.

Moreira, W. and G. R. Warnes (2002–2009). RPy Reference Manual.

Newman, M., A.-L. Barabási, and D. J. Watts (2006). Structure and Dynamics of Networks. Princeton, NJ: Princeton University Press.

Newman, M. E. J. (2002). Spread of epidemic disease on networks. Physical Review E 66, 016128.

Newman, M. E. J., S. H. Strogatz, and D. J. Watts (2002). Random graphs with arbitrary degree distributions and their applications. Physical Review E 64, 026118.

R Development Core Team (2009). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Rizzo, M. L. (2008). Statistical Computing with R. Boca Raton, FL: Chapman & Hall/CRC.

Roberts, M. G. and J. A. P. Heesterbeek (2007). Model-consistent estimation of the basic reproduction number from the incidence of an emerging infection. Journal of Mathematical Biology 55, 803–816.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York, NY: John Wiley & Sons.

Svensson, Å. (2007). A note on generation times in epidemic models. Mathematical Biosciences 208, 300–311.

Wallinga, J. and M. Lipsitch (2007). How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B 274, 599–604.

Wallinga, J. and P. Teunis (2004). Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology 160, 509–516.

White, L. F. and M. Pagano (2008). A likelihood-based method for real-time estimate of the serial interval and reproductive number of an epidemic. Statistics in Medicine 27, 2999–3016.

Yang, Y., J. Sugimoto, M. E. Halloran, N. E. Basta, D. L. Chao, L. Matrajt, G. Potter, E. Kenah, and I. M. Longini, Jr (2009). The transmissibility and control of pandemic influenza A(H1N1) virus.
Science 326, 729–733.

Table 1. Coverage probabilities for mass-action models.

Contact interval   Infectious period   Parameter   Coverage probability   Exact binomial 95% CI
Exponential        Constant            $\beta$     .952                   (.937, .964)
Exponential        Constant            $R_0$       .950                   (.935, .962)
Exponential        Exponential         $\beta$     .951                   (.936, .964)
Exponential        Exponential         $R_0$       .963                   (.949, .974)
Weibull            Constant            $\alpha$    .936                   (.919, .950)
Weibull            Constant            $\beta$     .907                   (.887, .924)
Weibull            Constant            $R_0$       .879                   (.857, .899)
Weibull            Exponential         $\alpha$    .927                   (.909, .942)
Weibull            Exponential         $\beta$     .912                   (.893, .929)
Weibull            Exponential         $R_0$       .902                   (.882, .920)

Table 2. Coverage probabilities for network-based models. The first pair of result columns gives the network-based estimates; the second pair gives the mass-action estimates.

Contact interval   Infectious period   Parameter   Network-based coverage   Exact binomial 95% CI   Mass-action coverage   Exact binomial 95% CI
Exponential        Constant            $\beta$     .942                     (.926, .956)            .004                   (.001, .010)
Exponential        Constant            $R_0$       .948                     (.932, .961)            .210                   (.185, .237)
Exponential        Exponential         $\beta$     .943                     (.927, .957)            .035                   (.024, .048)
Exponential        Exponential         $R_0$       .962                     (.948, .973)            .000                   (.000, .004)
Weibull            Constant            $\alpha$    .936                     (.919, .950)            .798                   (.772, .822)
Weibull            Constant            $\beta$     .946                     (.930, .959)            .025                   (.016, .037)
Weibull            Constant            $R_0$       .945                     (.929, .958)            .509                   (.477, .540)
Weibull            Exponential         $\alpha$    .946                     (.930, .959)            .834                   (.809, .857)
Weibull            Exponential         $\beta$     .950                     (.935, .963)            .031                   (.021, .044)
Weibull            Exponential         $R_0$       .941                     (.925, .955)            .392                   (.362, .423)

[Figure 1: "Estimated versus true $R_0$"; x-axis: true $R_0$; y-axis: estimated $R_0$; legend: equality, estimates.]

Fig. 1. Scatterplot of estimated versus true $R_0$ for mass-action models with exponential contact interval and infectious period distributions, showing excellent agreement. Similar results were obtained in models with a fixed infectious period (not shown).

[Figure 2: "Estimated versus true $\ln(R_0)$"; x-axis: true $\ln(R_0)$; y-axis: estimated $\ln(R_0)$; legend: equality, estimates, smoothed mean of estimates.]

Fig. 2. Scatterplot of estimated versus true $\ln(R_0)$ for mass-action models with Weibull contact interval distributions and exponential infectious period distributions. Estimates are nearly unbiased at low $R_0$, but biased upward at high $R_0$. Similar results were obtained in models with a fixed infectious period (not shown). The smoothed mean was produced with the R command lowess. One simulation that produced $\hat{R}_0 = 3730.8$ (1089.9, 12,245.8) was excluded from the graph; it had a true $R_0 = 10.5$.

[Figure 3: "Estimated versus true $R_0$"; x-axis: true $R_0$; y-axis: estimated $R_0$; legend: equality, estimates, mass-action estimates.]

Fig. 3. Scatterplot of estimated versus true $R_0$ for network-based models with exponential contact interval and infectious period distributions, showing excellent agreement. The mass-action estimates are severely biased upward; most are out of range of the plot. Similar results were obtained in models with a fixed infectious period (not shown).

[Figure 4: "Estimated versus true $R_0$"; x-axis: true $R_0$; y-axis: estimated $R_0$; legend: equality, estimates, mass-action estimates.]

Fig. 4. Scatterplot of estimated versus true $R_0$ for network-based models with Weibull contact interval distributions and exponential infectious period distributions, showing excellent agreement. The mass-action estimates are severely biased upward. Similar results were obtained in models with a fixed infectious period (not shown).
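The exact binomial intervals reported alongside the coverage probabilities in Tables 1 and 2 can be checked with a short calculation. The sketch below assumes that "exact binomial" refers to the Clopper-Pearson interval, which the text does not state explicitly; the function names are illustrative, and the limits are found by simple bisection using only the Python standard library.

```python
import math


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def solve(f, target, increasing, tol=1e-10):
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(k, n, level=0.95):
    """Clopper-Pearson interval for a proportion with k successes in n trials."""
    tail = (1.0 - level) / 2.0
    # Lower limit solves P(X >= k | p) = tail; this probability increases with p.
    lower = 0.0 if k == 0 else solve(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), tail, True)
    # Upper limit solves P(X <= k | p) = tail; this probability decreases with p.
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p), tail, False)
    return lower, upper


# Number of simulations (out of 1,000) whose interval covered the true value.
for covered in (952, 879, 0):
    lo, hi = clopper_pearson(covered, 1000)
    print(f"{covered / 1000:.3f}  ({lo:.3f}, {hi:.3f})")
```

The printed intervals can be compared with the corresponding rows of Tables 1 and 2. With 1,000 simulations per model, the intervals around coverage near .95 are roughly .03 wide.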
