Respondent-driven sampling and an unusual epidemic
Respondent-driven sampling (RDS) is frequently used when sampling hard-to-reach and/or stigmatized communities. RDS utilizes a peer-driven recruitment mechanism where sampled individuals pass on participation coupons to at most $c$ of their acquainta…
Authors: Jens Malmros, Fredrik Liljeros, Tom Britton
Respondent-driven sampling and an unusual epidemic

Jens Malmros ∗ †, Fredrik Liljeros ‡, Tom Britton §

Abstract

Respondent-driven sampling (RDS) is frequently used when sampling hard-to-reach and/or stigmatized communities. RDS utilizes a peer-driven recruitment mechanism where sampled individuals pass on participation coupons to at most $c$ of their acquaintances in the community ($c = 3$ being a common choice), who then in turn pass on to their acquaintances if they choose to participate, and so on. This process of distributing coupons is shown to behave like a new Reed-Frost type network epidemic model, in which becoming infected corresponds to receiving a coupon. The difference from existing network epidemic models is that an infected individual cannot infect (i.e. sample) all of its contacts, but only at most $c$ of them. We calculate $R_0$, the probability of a major "outbreak", and the relative size of a major outbreak in the limit of infinite population size and evaluate their adequacy in finite populations. We study the effect of varying $c$ and compare RDS to the corresponding usual epidemic models, i.e. the case of $c = \infty$. Our results suggest that the number of coupons has a large effect on RDS recruitment. Additionally, we use our findings to explain previous empirical observations.

Keywords: Respondent-driven sampling; Epidemic model; Configuration model; Reed-Frost.

1 Introduction

Hidden populations are groups of individuals which i) have strong privacy concerns due to illicit or stigmatized behaviour, and ii) lack a sampling frame, i.e., their size and composition are unknown.
Examples of hidden populations include several groups that are at high risk for contracting and spreading HIV, e.g., men who have sex with men, sex workers, and injecting drug users [1, 2, 3]; it is therefore of great importance to obtain reliable sampling methods for hidden populations in order to plan and evaluate interventions in the global HIV epidemic [4, 5].

∗ Department of Mathematics, Stockholm University, SE-106 91 Stockholm, Sweden
† Corresponding author: jensm@math.su.se
‡ Department of Sociology, Stockholm University, SE-106 91 Stockholm, Sweden
§ Department of Mathematics, Stockholm University, SE-106 91 Stockholm, Sweden

Respondent-driven sampling (RDS) [6, 7] is a sampling methodology that utilizes the relationships between individuals in order to sample from the population. By combining an effective sampling scheme and the ability to produce unbiased population estimates, RDS has become perhaps the most preferred method when sampling from hidden populations.

A typical RDS study starts with the selection of a group of seed individuals. Each seed is provided with a number of coupons, typically between three and five, to distribute to his or her peers in the population. An individual is eligible for participation upon presenting a coupon at the study site. Because recruitment takes place by coupons, participants remain anonymous throughout the study, but each coupon is numbered with a unique ID to keep track of who recruited whom. Incentives are given both for the participation of an individual and for the participation of those to whom he or she passed coupons. After participation, which commonly includes survey questions and possibly being tested for diseases, newly recruited individuals (i.e., respondents) are also given coupons to disperse among their contacts in the population. This procedure is then repeated until the desired sample size has been reached.
The sampled individuals form a tree-like structure which is obtained from tracing the coupons. Recently, online-based RDS methods (webRDS), where recruitment takes place via email and a survey is filled out at a designated web site, have also been put into use [8, 9, 10]. There are several procedures available for estimating population characteristics from RDS data, most of which use a Markov model in order to approximate the actual recruitment process [11, 12, 13, 14, 15, 16]; this is not the focus of the present paper.

A frequent problem in RDS studies is the inability of the recruitment process to reach the desired sample size due to premature failure of the recruitment chains started by the seeds [17]. This is often mitigated by additional seeds that enter the study as the rate of recruitment declines; e.g., in [17], 43% of reviewed RDS studies with available data reported that additional seeds were used. Relatedly, it has been observed in webRDS studies, where recruitment is allowed to go on until it stops by itself, that the recruitment process fails to reach a large proportion of the population despite additional seeds joining in at a later time [10, 18]. While there are most likely several reasons behind recruitment chain failure, such as community structure in the population causing chains to become stuck in a sub-network and/or clustering that has a similar effect, but more locally, an important reason is the limited number of coupons in the RDS recruitment process. This is the main focus of this paper. Furthermore, recruitment chain failure is highly associated with the ability of the recruitment process to start successful recruitment chains, the probability of such chains occurring, and the relative size of the population that is reached by an RDS study, all of which are related to quantities typically studied in epidemic modelling.
As it turns out, it is possible to use models of infectious disease spread on social networks to describe coupon distribution in RDS, where the disease is defined as "participation in the study" and spreads by the RDS coupon distribution mechanism.

The simplest model of infectious disease spread is the Reed-Frost model, see e.g. [19, p. 11-18], where in each generation $i$, each infectious individual independently infects each susceptible individual with the same probability. The individuals that were infected by the individuals in generation $i$ make up generation $i + 1$ of infectious individuals in the epidemic. After spreading the disease, the individuals in generation $i$ are considered recovered (or dead) from the disease and are removed from the process. In the original version of the model, an infectious individual attempts to infect all susceptible individuals in the population. The model is however easily modified to the more realistic case when the structure of the population is described by a social network, hence imposing the restriction that an infectious individual may only spread the disease to his or her contacts in the social network, independently of each other and with the same probability.

Infectious diseases are usually able to spread to all contacts of an individual, and consequently, the Reed-Frost model and other epidemic models defined on social networks do not impose any restrictions on the number of individuals that an infectious individual can infect other than those given by population structure. The RDS recruitment process differs from infectious diseases in that its spread is restricted by the limited number of coupons. Consequently, individuals with more population contacts than the number of coupons distributed to them have less capability of recruiting than if RDS recruitment were to spread in the usual manner of an epidemic, i.e. without any limitations.
Depending on how the number of contacts (i.e., degrees) of population members are distributed, this may have a large effect on the capability of the RDS recruitment process to sustain and initiate recruitment. Furthermore, it may affect the ability of the recruitment process to reach a substantial proportion of the population, as the sampling procedure can limit recruitment to parts of the population.

In this paper, we model RDS as an epidemic taking place on a social network by defining a Reed-Frost type model which has an upper limit on the number of individuals that an infectious individual can infect. We will use both infectious disease terminology and RDS terminology when referring to this model. In order to be able to specify the degree distribution of the social network, we use the configuration model [20, 21] to describe the structure of the population. We calculate the basic reproduction number, i.e., the expected number of individuals that are infected by a typical infectious individual during the early stages of the epidemic. This is often denoted by $R_0$. We say that there is a major outbreak if a non-negligible proportion of the population is infected and calculate the probability $\tau$ of such outbreaks occurring. If $R_0 \le 1$, it is not possible for a major outbreak to occur, while if $R_0 > 1$, a major outbreak may occur. The critical value $R_0 = 1$ is often referred to as the epidemic threshold. We also calculate the relative size $z$ of an outbreak in case of a major outbreak using so-called susceptibility sets [22, 23]. Note that $\tau$ and $z$ are positive only if $R_0$ is larger than the epidemic threshold. We compare the RDS recruitment process to corresponding epidemics with unrestricted spread and investigate the effect of varying the number of coupons and the coupon transfer probability.
To our knowledge, there are no previous studies of epidemics on networks that describe behaviour similar to the present one, although the model in [24] allows for a restriction on the number of individuals that an infectious individual can infect in a homogeneously mixing population (i.e. a population without network structure).

2 Models

2.1 Network model

We consider a configuration model network consisting of $n$ vertices. In later calculations, we will assume that $n \to \infty$. Each individual $i$, $i = 1, \ldots, n$, is assigned an i.i.d. number of stubs (half-edges) $d_i$ from a prescribed distribution $D$ having support on the non-negative integers. The network is then formed by pairing stubs together uniformly at random. If $\sum_{i=1}^{n} d_i$ is odd, an extra stub is added to the $n$th vertex (this does not influence our results in the limit of infinite population size). This construction allows the formation of multiple edges and self-loops; it is however well known that the fraction of these is small if $D$ has finite second moment. Specifically, the probability of the resulting graph being simple is bounded away from 0 as $n \to \infty$; see [25, Theorem 7.8] and [26, Lemma 5.3]. Hence we can condition on the graph being simple given that $E(D^2) < \infty$. Alternatively, we may proceed by removing multiple edges and self-loops from the generated graph, since asymptotically this does not change the degree distribution if $D$ has finite second moment; see [25, Theorem 7.9]. Hence, we will from now on assume that the resulting graph is simple. Moreover, the graph is locally tree-like when $E(D^2) < \infty$, meaning that with high probability it does not contain short cycles [26]. Hence, we can take advantage of the branching process [e.g., 27] approximations that are often used for epidemics, see e.g. [19, ch. 3]. In what follows, we will assume that the degree distributions considered have finite second moment.
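The stub-pairing construction of Subsection 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sample_poisson` and `erased_configuration_model` are hypothetical, a Poisson degree distribution is chosen purely as an example, and self-loops and multiple edges are simply discarded (the "removal" variant mentioned in the text).

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Illustrative helper; adequate for moderate lam such as 8 or 12.
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def erased_configuration_model(degrees, rng):
    """Pair stubs uniformly at random, then drop self-loops and multi-edges.

    Returns a list of neighbour sets, one per vertex.
    """
    # One entry per stub (half-edge), labelled by the vertex that owns it.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2 == 1:
        # Odd total: give one extra stub to the last vertex, as in the text.
        stubs.append(len(degrees) - 1)
    rng.shuffle(stubs)  # a uniform shuffle induces a uniform stub pairing
    neighbours = [set() for _ in degrees]
    for u, v in zip(stubs[0::2], stubs[1::2]):
        if u != v:  # discard self-loops
            neighbours[u].add(v)  # sets silently merge repeated edges
            neighbours[v].add(u)
    return neighbours
```

For example, `erased_configuration_model([sample_poisson(8.0, rng) for _ in range(5000)], random.Random(1))` with `rng = random.Random(1)` gives a simple graph whose mean degree is close to 8, since only a small fraction of edges is erased when the degree distribution has finite second moment.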

2.2 Epidemic model

On this graph, describing the social structure in a community, we define an epidemic model mimicking the RDS recruitment process. In this model, becoming infected corresponds to participating in the RDS study. Initially, all members of the population (vertices) are susceptible. The epidemic starts with one randomly selected individual (vertex), the index case, being infected from the outside. The infected individual uniformly selects $c$ of his or her neighbours in the population and infects them independently of each other with the same probability $p$. The parameter $c$ corresponds to the number of coupons in RDS and the parameter $p$ to the probability of being successfully recruited to the RDS study. If the infected individual has fewer than $c$ contacts, he or she infects all his or her contacts independently of each other with probability $p$. The newly infected individuals make up the first generation of the epidemic. After spreading the disease, the initially infected individual recovers and becomes immune (or dies) and has no further role in the epidemic.

The individuals in the first generation each in turn select $c$ of their neighbours, excluding the one who infected them (which for the first generation is the index case), regardless of whether they are susceptible or not. If an individual has fewer than $c$ neighbours excluding the one who infected him or her, he or she selects all of these remaining neighbours. Then, they infect the selected contacts that are susceptible, independently of each other with probability $p$, and then recover; contacts with already infected individuals have no effect. The now infected individuals form the second generation of the epidemic. The disease continues to spread in the same fashion from the second generation and onward until there are no newly infected individuals in a generation.
The individuals that were infected during the course of the epidemic make up the outbreak, and the number of ultimately infected individuals is the final size of the outbreak. Note that if we let $c = \infty$, we get the standard Reed-Frost epidemic taking place on the configuration model network [26].

Because an individual only tries to infect those he or she selected, the spread of the disease, or coupon distribution mechanism, in our model is more similar to that of webRDS than physical RDS. We discuss this further and present other possible coupon distribution mechanisms in Section 5.

3 Calculations

3.1 The basic reproduction number $R_0$

Assume that we have a configuration model graph $G$ of size $n$, where $n$ is large, and let the degree distribution of $G$ be $D$, where $P(D = k) = p_k$. The degree of a given neighbour of an individual follows the size-biased degree distribution $\tilde{D}$, where $P(\tilde{D} = k) = \tilde{p}_k = k p_k / E(D)$. Assume that we have an epidemic spreading on this graph according to the description in Subsection 2.2. The degree of the index case is then distributed as $D$, and the degree of infected individuals in later generations during the early stages of an outbreak is distributed as $\tilde{D}$. As previously mentioned in Subsection 2.1, the graphs generated by the configuration model will with high probability not contain short cycles, meaning that we can approximate the spread of the epidemic with a (forward) branching process. Let $X$ and $\tilde{X}$ be the offspring of the ancestor (i.e., the index case) and of the later generations in this branching process, respectively. Given that the index case has degree $k \le c$, he or she can at most infect $k$ neighbours. If the index case has degree larger than or equal to $c + 1$, he or she infects at most $c$ neighbours.
Because infections happen independently with the same probability $p$, we have that, conditionally on the degree, the probability that the index case infects $j$ neighbours is

$$P(X = j \mid D = k) = \binom{c \wedge k}{j} p^{j} (1 - p)^{(c \wedge k) - j}, \tag{1}$$

where $j = 0, \ldots, c \wedge k$. Infectious individuals in later generations have one less contact available for infection (the one that infected them). Hence, we get that, conditionally on the degree, the probability that an infectious individual in later generations infects $j$ neighbours is

$$P(\tilde{X} = j \mid \tilde{D} = k) = \binom{c \wedge (k - 1)}{j} p^{j} (1 - p)^{(c \wedge (k - 1)) - j}, \tag{2}$$

where $j = 0, \ldots, c \wedge (k - 1)$. Because the ability of an individual to spread the disease will depend on its degree, the offspring distributions are obtained by conditioning on the degree:

$$P(X = j) = \sum_{k=j}^{\infty} P(X = j \mid D = k)\, p_k; \tag{3}$$

$$P(\tilde{X} = j) = \sum_{k=j+1}^{\infty} P(\tilde{X} = j \mid \tilde{D} = k)\, \tilde{p}_k, \tag{4}$$

where $j = 0, \ldots, c$, and the probabilities $P(X = j \mid D = k)$ and $P(\tilde{X} = j \mid \tilde{D} = k)$ come from Eqs. (1) and (2), respectively. From standard branching process theory [27] we have that $R_0$ is the expected number of individuals that get infected by an infectious individual in the second and later generations; hence

$$R_0 = E(\tilde{X}) = \sum_{j=0}^{c} j \sum_{k=1}^{\infty} P(\tilde{X} = j \mid \tilde{D} = k)\, \tilde{p}_k \tag{5}$$

$$\phantom{R_0} = \sum_{j=0}^{c} j \left( \sum_{k=j}^{c-1} \binom{k}{j} p^{j} (1 - p)^{k - j}\, \tilde{p}_{k+1} + \binom{c}{j} p^{j} (1 - p)^{c - j} \left( 1 - \sum_{k=1}^{c} \tilde{p}_k \right) \right),$$

where in the inner sum $k$ counts the neighbours available to an individual of degree $k + 1$. The obtained $R_0$ is increasing in $p$ and $c$, and for a fixed $p$, $R_0 \to R_0^{(\text{unrestricted})}$ as $c \to \infty$, where $R_0^{(\text{unrestricted})}$ is the $R_0$ value for the standard Reed-Frost epidemic on a configuration model network, given by [26]

$$R_0^{(\text{unrestricted})} = p \left( E(D) + \frac{\mathrm{Var}(D) - E(D)}{E(D)} \right).$$

3.2 Probability of major outbreak

When $R_0 > 1$, it is possible for a major outbreak to occur.
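Whether a given pair $(c, p)$ lies above the threshold can be checked numerically. The Python sketch below rests on one observation: since a binomial count with $m$ trials has mean $mp$, summing Eq. (5) over $j$ gives $R_0 = p\,E[c \wedge (\tilde{D} - 1)]$. The helper names (`poisson_pmf`, `size_biased`, `r0_rds`), the Poisson example, and the truncation of the degree distribution at `kmax` are assumptions made for illustration, not part of the paper.

```python
import math


def poisson_pmf(lam, kmax=150):
    """Poisson(lam) probabilities p_0, ..., p_kmax (truncated support)."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)  # p_k = p_{k-1} * lam / k
    return pmf


def size_biased(pmf):
    """Size-biased distribution: p~_k = k * p_k / E(D)."""
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return [k * pk / mean for k, pk in enumerate(pmf)]


def r0_rds(pmf, c, p):
    """R0 = p * E[min(c, D~ - 1)], i.e. Eq. (5) after summing over j.

    Passing c = math.inf gives the unrestricted Reed-Frost value.
    """
    tilde = size_biased(pmf)
    return p * sum(min(c, k - 1) * tk for k, tk in enumerate(tilde) if k >= 1)
```

With `pmf = poisson_pmf(8.0)`, comparing `r0_rds(pmf, 3, p)` against 1 for a grid of `p` values locates the epidemic threshold for three coupons, and `r0_rds(pmf, math.inf, p)` reduces to `8 * p` up to truncation error.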
The probability $\tau$ of such an outbreak occurring is given by the survival probability of the approximating branching process, which we get by standard techniques. We first consider a branching process with offspring distribution $\tilde{X}$ for all individuals, i.e. also for the index case. Let the extinction probability of this process be $\tilde{\pi}$. For the process to die out, all the branching processes initiated by the offspring of the ancestor must die out; hence by conditioning on the number of offspring in the first generation of the process, we get

$$\tilde{\pi} = \sum_{j=0}^{c} \tilde{\pi}^{j} P(\tilde{X} = j) = \tilde{\rho}(\tilde{\pi}), \tag{6}$$

where $\tilde{\rho}$ is the probability generating function of $\tilde{X}$. The solution to Equation (6) is obtained numerically. In our original branching process the ancestor has offspring distribution $X$ and later generations have offspring distribution $\tilde{X}$. Again by conditioning on the number of individuals in the first generation, we get that the extinction probability $\pi$ of the original branching process is

$$\pi = \rho(\tilde{\pi}), \tag{7}$$

where $\tilde{\pi}$ is the solution to Equation (6) and $\rho$ is the probability generating function of $X$. The value of $\pi$ in Equation (7) is evaluated numerically, and we obtain the probability of a major outbreak $\tau = 1 - \pi$. Note that if we have $1 < s < \infty$ initially infected individuals in the epidemic, the probability of a major outbreak is $1 - \pi^{s}$, which approaches 1 as $s$ becomes large. The number of initially infected individuals does not affect $R_0$ or the relative size of a major outbreak calculated in Subsection 3.3.

3.3 Relative size of a major outbreak

The relative size $z$ of a major outbreak, given that a major outbreak occurs, can be obtained using susceptibility sets, constructed as follows. For each individual $i$, we can obtain a random list of the neighbours that $i$ would infect if it were to be infected.
By combining the lists from all individuals in the population, it is possible to construct a directed graph with all vertices (individuals) in which there is an arc from vertex $i$ to vertex $j$ if $j$ is in $i$'s list. The susceptibility set of an individual $j$ consists of all individuals in this directed graph, including $j$ itself, from which there is a directed path to $j$. Hence, $j$'s susceptibility set is such that the infection of any individual in the set would result in the ultimate infection of $j$. Note that $j$ will be infected in the epidemic if and only if the initially infected individual is in $j$'s susceptibility set.

The susceptibility set of a randomly chosen individual, $i_0$ say, can be approximated with a (backward) branching process in which $i_0$ is the only member of the zeroth generation. We consider the number of neighbours that, if they were to be infected, would infect $i_0$ (as opposed to previously, when we considered the number of neighbours that an individual would infect were it to be infected). Suppose that $i_0$ has degree $d$. Because all neighbours of $i_0$ contact him or her with the same probability $\theta$ independently of each other, the number of neighbours that contact him or her is $\mathrm{Bin}(d, \theta)$-distributed; hence, the unconditional distribution of the number of neighbours that contact him or her is a mixed binomial distribution with parameters $D$ and $\theta$.

We now derive an equation for the contact probability $\theta$. The degree distribution of the neighbouring individuals is $\tilde{D}$, so we obtain

$$\theta = \sum_{k=0}^{\infty} \theta_k \tilde{p}_k, \tag{8}$$

where $\theta_k$ is the probability that a neighbour with degree $k$ contacts $i_0$. Because a neighbour of $i_0$ with degree $k$ has to be contacted first in order to become infected, only $k - 1$ edges are available for him or her to spread the disease. Therefore, a neighbour must have at least degree two in order to first become infected and then contact $i_0$.
If a neighbour has degree $k \ge c + 2$, he or she first selects $c$ of the available $k - 1$ contacts and then attempts to spread the disease to them. Hence, the contact probabilities are

$$\theta_k = \begin{cases} 0, & k = 0, 1; \\ p, & k = 2, \ldots, c + 1; \\ \dfrac{c}{k - 1}\, p, & k = c + 2, c + 3, \ldots. \end{cases} \tag{9}$$

The probability that a neighbour makes contact with $i_0$ depends on his or her degree. Hence, the degree distribution of individuals in the first generation, i.e. those neighbours of $i_0$ that make contact with $i_0$, and of individuals in later generations in the backward branching process is altered by the fact that they have contacted another individual. Conditionally on the event that a contact has been made, call it $C$, the distribution of the degree $D^{*}$ of an individual in the first and later generations of the susceptibility set process is given by

$$P(D^{*} = k) = P(\tilde{D} = k \mid C) = \frac{P(C \mid \tilde{D} = k)\, P(\tilde{D} = k)}{\sum_{k=0}^{\infty} P(C \mid \tilde{D} = k)\, P(\tilde{D} = k)} = \frac{\theta_k \tilde{p}_k}{\theta}, \tag{10}$$

so

$$P(D^{*} = k) = \begin{cases} 0, & k = 0, 1; \\ \dfrac{p\, \tilde{p}_k}{\theta}, & k = 2, \ldots, c + 1; \\ \dfrac{c p\, \tilde{p}_k}{(k - 1)\, \theta}, & k = c + 2, c + 3, \ldots. \end{cases} \tag{11}$$

An individual in later generations of the process will be contacted by any of his or her neighbours independently of other neighbours with the same probability $\theta$. Given that this individual has degree $k$, the number of neighbours that contact him or her is binomially distributed with parameters $k - 1$ and $\theta$. Hence, the unconditional distribution of the number of neighbours that contact an individual in later generations is mixed binomial with parameters $D^{*} - 1$ and $\theta$.

If the approximating backward branching process contains few individuals, it is unlikely that $i_0$ will be infected, whereas if the process reaches a large number of individuals (i.e. grows infinitely large), there is a positive probability that $i_0$ will not escape infection.
More specifically, the probability that $i_0$ will be infected during a major outbreak is given by the survival probability of the backward branching process. Because $i_0$ is chosen randomly, we also have that the relative size of an outbreak in case of a major outbreak is given by the survival probability of the backward branching process. Let $Y$ be the number of offspring of the ancestor and $Y^{*}$ the number of offspring of individuals in later generations in the approximating branching process. Hence, $Y \sim \mathrm{MixBin}(D, \theta)$ and $Y^{*} \sim \mathrm{MixBin}(D^{*} - 1, \theta)$. We obtain the survival probability of the process similarly as in Subsection 3.2. Let the extinction probability of a branching process with offspring distribution $Y^{*}$ be $\pi^{*}$. We have

$$\begin{aligned} \pi^{*} &= \sum_{j=0}^{\infty} (\pi^{*})^{j} P(Y^{*} = j) = E\left((\pi^{*})^{Y^{*}}\right) = E\left(E\left((\pi^{*})^{Y^{*}} \mid D^{*}\right)\right) \\ &= E\left((1 - \theta + \theta \pi^{*})^{D^{*} - 1}\right) = \sum_{k=2}^{\infty} (1 - \theta + \theta \pi^{*})^{k - 1} P(D^{*} = k) \\ &= \frac{p}{\theta} \sum_{k=2}^{c+1} (1 - \theta + \theta \pi^{*})^{k - 1}\, \tilde{p}_k + \frac{c p}{\theta} \sum_{k=c+2}^{\infty} (1 - \theta + \theta \pi^{*})^{k - 1}\, \frac{\tilde{p}_k}{k - 1}. \end{aligned} \tag{12}$$

The solution to Equation (12) for $\pi^{*}$ is obtained numerically. Let the extinction probability of the approximating branching process be $\pi_0$. Then,

$$\pi_0 = \sum_{j=0}^{\infty} (\pi^{*})^{j} P(Y = j) = E\left((\pi^{*})^{Y}\right) = E\left(E\left((\pi^{*})^{Y} \mid D\right)\right) = E\left((1 - \theta + \theta \pi^{*})^{D}\right) = f_D(1 - \theta + \theta \pi^{*}), \tag{13}$$

where $f_D(\cdot)$ is the probability generating function of $D$ and $\pi^{*}$ is the solution to Equation (12). The value of $\pi_0$ in Equation (13) is evaluated numerically, and the relative final size of the epidemic in case of a major outbreak is $z = 1 - \pi_0$.

A rigorous proof that $z = 1 - \pi_0$ is beyond the scope of this paper. It has been proved for Reed-Frost epidemics on random intersection graphs [28] and Reed-Frost epidemics on configuration model graphs [29] that the proportion of infected individuals during the epidemic converges in probability to the survival probability of the backward branching process.
Similar arguments could also be used for our process to provide a formal proof. Additionally, we believe that the techniques described in [30] could be used to obtain stronger results for the whole epidemic process.

4 Numerical results and simulations

We now numerically examine the analytical results obtained in Section 3. In particular, we examine the relation between $R_0$, $\tau$, and $z$ and the parameters $c$ and $p$, and compare the RDS recruitment process with unrestricted epidemics. We use two different degree distributions in our calculations, the Poisson degree distribution and a variant of the power-law degree distribution with exponential cut-off given by $p_k \propto k^{-\alpha} \exp(-k/\kappa)$, $k = 1, 2, \ldots$, where $\alpha$ is the power-law exponent and $\kappa$ refers to the exponential cut-off [e.g. 31].

In Figure 1, we show the $R_0$ values for the RDS recruitment process with $c = 3, 5, 10$ and the unrestricted epidemic for $p \in [0, 1]$. Figure 1 (a) shows the results for the Poisson degree distribution with parameter $\lambda = 8$ and Figure 1 (b) shows the results for the power-law degree distribution with parameters $\alpha = 2$ and $\kappa = 100$. For both degree distributions and a fixed value of $p$, the limitation imposed by the number of coupons on disease spread yields smaller $R_0$ values for the RDS recruitment process when compared to the unrestricted epidemic for all values of $c$. Especially for the power-law degree distribution, all values of $c$ give much smaller $R_0$ values than those of the unrestricted epidemic, and the value of $p$ for which $R_0$ becomes larger than 1 (i.e., the epidemic threshold) is larger than that of the unrestricted epidemic for all values of $c$.
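The remaining two quantities of Section 3, $\tau$ from Eqs. (6)-(7) and $z$ from Eqs. (8)-(13), can be evaluated in the same spirit. The paper only states that the equations are solved numerically; the sketch below assumes plain fixed-point iteration started at 0 (which converges to the smallest root in $[0, 1]$ for a probability generating function), a Poisson degree distribution truncated at `kmax`, and illustrative function names.

```python
import math


def poisson_pmf(lam, kmax=150):
    """Poisson(lam) probabilities p_0, ..., p_kmax (truncated support)."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def size_biased(pmf):
    """Size-biased distribution: p~_k = k * p_k / E(D)."""
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return [k * pk / mean for k, pk in enumerate(pmf)]


def fixed_point(g, tol=1e-13, max_iter=100_000):
    """Iterate s <- g(s) from s = 0 until successive values agree."""
    s = 0.0
    for _ in range(max_iter):
        nxt = g(s)
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s


def outbreak_probability(pmf, c, p):
    """tau = 1 - rho(pi~), with pi~ the smallest root of Eq. (6)."""
    tilde = size_biased(pmf)

    def rho_tilde(s):  # pgf of X~: Bin(min(c, k-1), p) mixed over D~
        return sum(tk * (1 - p + p * s) ** min(c, k - 1)
                   for k, tk in enumerate(tilde) if k >= 1)

    def rho(s):  # pgf of X: Bin(min(c, k), p) mixed over D
        return sum(pk * (1 - p + p * s) ** min(c, k)
                   for k, pk in enumerate(pmf))

    return 1.0 - rho(fixed_point(rho_tilde))


def outbreak_size(pmf, c, p):
    """z = 1 - f_D(1 - theta + theta * pi*), following Eqs. (8)-(13)."""
    tilde = size_biased(pmf)
    # Eq. (9): theta_k = 0 for k < 2, p up to k = c + 1, then c*p/(k-1).
    theta_k = [0.0 if k < 2 else p * min(1.0, c / (k - 1))
               for k in range(len(pmf))]
    theta = sum(t * tk for t, tk in zip(theta_k, tilde))  # Eq. (8)
    if theta == 0.0:
        return 0.0

    def g(s):  # right-hand side of Eq. (12)
        y = 1 - theta + theta * s
        return sum(theta_k[k] * tilde[k] * y ** (k - 1)
                   for k in range(2, len(pmf))) / theta

    y = 1 - theta + theta * fixed_point(g)
    return 1.0 - sum(pk * y ** k for k, pk in enumerate(pmf))  # Eq. (13)
```

Calling both functions with `c = math.inf` recovers the unrestricted Reed-Frost case, for which the two returned values coincide; for finite `c` they generally differ.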
[Figure 1. Panels: (a) D ~ Po(8); (b) D ~ PL(2, 100). Horizontal axis: probability of successful recruitment; vertical axis: $R_0$. Curves: unrestricted, 10 coupons, 5 coupons, 3 coupons.]

Figure 1: Comparison of $R_0$ for unrestricted epidemics and RDS recruitment processes with 10, 5, and 3 coupons and $p \in [0, 1]$. Plot (a) shows the results for the Poisson degree distribution with parameter $\lambda = 8$ and plot (b) shows the results for the power-law degree distribution with parameters $\alpha = 2$ and $\kappa = 100$. The dashed horizontal lines show the threshold value $R_0 = 1$.

Figure 2 shows the values of $\tau$ and $z$ for the RDS recruitment process with $c = 3, 5, 10$ and the unrestricted epidemic for $p \in [0, 1]$. Figures 2 (a) and 2 (b) show the results for $\tau$ and $z$, respectively, for the Poisson degree distribution with parameter $\lambda = 8$ and Figures 2 (c) and 2 (d) show the results for $\tau$ and $z$, respectively, for the power-law degree distribution with parameters $\alpha = 2$ and $\kappa = 100$. The relative size of a major outbreak is always smaller than the probability of a major outbreak for both degree distributions. For both degree distributions, the probability of a major outbreak for the RDS recruitment process is smaller than that of the unrestricted epidemic for small values of $p$ and approaches that of the unrestricted epidemic when $p \to 1$. For the power-law degree distribution, the size of a major outbreak is much smaller than that of the unrestricted epidemic for all values of $c$ and $p$.

We also make a brief evaluation of the adequacy of our asymptotic results in finite populations by means of simulations.
From simulated RDS recruitment processes (as described by the model), we estimate the probability of a major outbreak and the relative size of a major outbreak in case of a major outbreak by the relative proportion of major outbreaks and the mean relative size of major outbreaks, respectively. Given a degree distribution and number of coupons $c$, let $p_c$ be the smallest value of $p$ for which the process is above the epidemic threshold. Each simulation run consists of generating a network of size 5000 by an erased configuration model approach [32], for which we make use of the igraph R package [33]. Then, RDS recruitment processes are run on the generated network for values of $p \in [p_c, 1]$.

[Figure 2. Panels: (a) D ~ Po(8), probability of major outbreak; (b) D ~ Po(8), size of major outbreak; (c) D ~ PL(2, 100), probability of major outbreak; (d) D ~ PL(2, 100), size of major outbreak. Horizontal axis: probability of successful recruitment. Curves: unrestricted, 10 coupons, 5 coupons, 3 coupons.]

Figure 2: Comparison of the asymptotic probability of a major outbreak and relative size of a major outbreak for unrestricted epidemics and RDS recruitment processes with 10, 5, and 3 coupons and $p \in [0, 1]$. Plots (a) and (b) show the results for the Poisson degree distribution with parameter $\lambda = 8$ and plots (c) and (d) show the results for the power-law degree distribution with parameters $\alpha = 2$ and $\kappa = 100$.

[Figure 3. Panels: (a) D ~ Po(12), 3 coupons; (b) D ~ PL(2.5, 50), 10 coupons. Horizontal axis: probability of successful recruitment. Legend: size of major outbreak, probability of major outbreak, simulated size (o), simulated probability (x).]

Figure 3: Comparison of results from simulations of RDS recruitment processes and the asymptotic probability and relative size of a major outbreak. Plot (a) shows the results for the Poisson degree distribution with parameter $\lambda = 12$ for processes with $c = 3$ and plot (b) shows the results for the power-law degree distribution with parameters $\alpha = 2.5$ and $\kappa = 50$ for processes with $c = 10$. Note that the error bars for the simulated relative size are very narrow and not visible for most simulated values. Also note that the horizontal scales are different.

In Figure 3, we show the estimated probability of a major outbreak $\hat{\tau}$ and the estimated relative size of a major outbreak in case of a major outbreak $\hat{z}$ for varying $p$, together with the corresponding asymptotic results. Figure 3 (a) shows the results for the Poisson degree distribution with parameter $\lambda = 12$ from 5000 simulation runs of RDS recruitment processes with 3 coupons. Figure 3 (b) shows the results for the power-law degree distribution with parameters $\alpha = 2.5$ and $\kappa = 50$ from 5000 simulation runs of RDS recruitment processes with 10 coupons.
In both Figures 3 (a) and (b), we show error bars for the estimates based on $\pm 2$ standard errors. The standard error for $\hat{\tau}$ is estimated as $SE(\hat{\tau}) = (\hat{\tau}(1 - \hat{\tau})/m)^{1/2}$, where $m$ is the number of simulations, and the standard error for $\hat{z}$ is estimated as $SE(\hat{z}) = (\hat{\sigma}^2/m_{maj})^{1/2}$, where $\hat{\sigma}^2$ is the sample variance of the relative final sizes of major outbreaks and $m_{maj}$ is the number of simulations resulting in a major outbreak.

Note that it is not well defined what constitutes a major outbreak in small, finite populations. Usually, the threshold for when an outbreak constitutes a major outbreak is determined by inspecting the distribution of outbreak sizes. Typically, this distribution is bimodal with modes at 0 and $z$, corresponding to small and major outbreaks. In our model, outbreak sizes will depend on $p$. For $p$ close to $p_c$, where "close" depends on the degree distribution, small and major outbreaks are indistinguishable. Consequently, it is difficult to estimate $\tau$ and $z$ for such values of $p$. In Figure 3, we have chosen to set the (relatively small) threshold for major outbreaks to 2% of the population over the whole interval $[p_c, 1]$. This yields reasonably accurate estimates for $p$ close to $p_c$ and does not affect estimates for $p$ further away from $p_c$.

We see that both the estimated probability of a major outbreak and the estimated relative size of a major outbreak in case of a major outbreak are very well approximated by the asymptotic results for both of the evaluated degree distributions. As pointed out in [34], the relative size of the epidemic is more efficiently estimated than the probability of a major outbreak because each simulation yields many (correlated) observations of the backward process and only one observation of the forward process.
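The estimators and standard errors defined above can be computed from a list of simulated final sizes as in the following sketch. The function name, the argument names, and the use of a strict inequality at the 2% threshold are illustrative assumptions.

```python
import math


def outbreak_estimates(sizes, n, threshold=0.02):
    """Estimate tau and z, with standard errors, from simulated final sizes.

    sizes: final outbreak sizes (number of individuals), one per simulation
    n: population size
    threshold: fraction of n above which an outbreak counts as major
    """
    m = len(sizes)
    major = [s / n for s in sizes if s / n > threshold]
    m_maj = len(major)
    tau_hat = m_maj / m
    se_tau = math.sqrt(tau_hat * (1 - tau_hat) / m)
    if m_maj < 2:
        # The sample variance needs at least two major outbreaks.
        return tau_hat, se_tau, None, None
    z_hat = sum(major) / m_maj
    var_hat = sum((x - z_hat) ** 2 for x in major) / (m_maj - 1)
    se_z = math.sqrt(var_hat / m_maj)
    return tau_hat, se_tau, z_hat, se_z


# Toy input: ten runs in a population of 5000, four of them major.
example = [1, 2, 1, 4000, 4100, 3, 3900, 1, 4050, 2]
tau_hat, se_tau, z_hat, se_z = outbreak_estimates(example, n=5000)
```

Error bars of the kind shown in Figure 3 then correspond to `tau_hat ± 2 * se_tau` and `z_hat ± 2 * se_z`.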
5 Discussion and conclusions

When the RDS recruitment process is compared to the corresponding unrestricted epidemic, it is clear that the limited number of coupons has a large impact on $R_0$ and the value of $p_c$ corresponding to the epidemic threshold, the probability of a major outbreak, and the relative size of a major outbreak in case of a major outbreak. This is especially true for the power-law degree distribution, for which in particular $R_0$ and $z$ are much smaller than for the corresponding unrestricted epidemic. In social networks with power-law degree distribution, the vast majority of individuals will have small degrees. For these individuals, the probability of being infected in an epidemic will be small. Also, such an individual will, once infected, have few or no contacts to spread the disease to. Hence, the spread of an epidemic in such networks will be highly dependent on a few individuals with very large degrees that have the capacity to infect many of their (small degree) neighbours. Because of the relatively small value of $c$, the potential of large degree individuals to spread the disease is much impaired in RDS compared to an unrestricted epidemic with the same $p$, hence impairing the spread of the epidemic as a whole.

The impact of the number of coupons on the RDS recruitment process may in part explain why some RDS studies experience difficulties in obtaining the desired sample size and/or recruiting a substantial proportion of the study population. Given $p$, the number of coupons will be crucial to whether $R_0$ is above or below the epidemic threshold for the recruitment process; in the latter case all recruitment chains will eventually fail. Moreover, the proportion of the population recruited by the RDS recruitment process may be small even given that $p$ is relatively large and a major outbreak occurs.
For some parameter combinations, the proportion reached can be very small; this is especially important to consider in webRDS. We illustrate this by considering the webRDS studies in [10] and [18]. In both studies, each respondent was allowed to make 4 recruitments. In the latter study, 66% of started recruitment chains had a depth of one generation (i.e. index case and one generation of recruitments) and 11% had a depth of three generations or more. This indicates that $R_0$ is below the epidemic threshold for this study and that recruitment therefore never takes off. In the former study, the majority of recruitments came from long recruitment chains, implying that $R_0$ is above the epidemic threshold. Still, recruitment eventually declined and stopped completely before reaching a large part of the population despite additional seeds joining the study. As we see in Section 4, however, relatively many parameter combinations with $R_0 > 1$ yield small $z$ values, which could explain the observed behaviour. For both studies, heterogeneity in network structure such that $R_0 < 1$ locally may also be an explanation. It would be of interest to find proper inference procedures for our model to be used in further evaluation of actual RDS studies with respect to the quantities studied in this paper.

One might consider other ways to distribute coupons. The coupon distribution mechanism in our model, where a respondent selects some of his or her neighbours for attempted coupon transfer while ignoring those neighbours that were not selected, is most similar to a webRDS process. In a physical RDS study where coupons are handed over from person to person, a respondent may attempt to distribute a coupon to another neighbour if the originally intended recipient declines (here, distributing a coupon implies study participation). This modified mechanism is given as follows.
A respondent first attempts to give a coupon to a randomly chosen neighbour. If the coupon is rejected, the respondent may try to distribute the same coupon to another neighbour, randomly chosen among those who previously have not been offered a coupon. When the coupon is accepted, the procedure is repeated starting by randomly selecting among those neighbours that have not been offered a coupon. When there are no more neighbours and/or coupons left, no further distribution attempts are made. The offspring probabilities in the branching process are the same as previously for individuals with degree less than the number of coupons, but the distribution of the number of coupons given out by an individual with degree larger than $c$ will be tilted towards larger values compared to the previous model. The probabilities in Eq. (1) now become

$$
P(X = j \mid D = k) =
\begin{cases}
\binom{k}{j} p^{j} (1 - p)^{k - j}, & j < c; \\
\sum_{i = c}^{k} \binom{k}{i} p^{i} (1 - p)^{k - i}, & j = c.
\end{cases}
\tag{14}
$$

It is straightforward to calculate $R_0$ and $\tau$ using the same techniques as in Sections 3.1 and 3.2. Figure 4 shows the $R_0$ values for the modified RDS recruitment process with $c = 3, 5, 10$ and the unrestricted epidemic for $p \in [0, 1]$. In Figure 4 (a), we show the results for the Poisson degree distribution with $\lambda = 8$ and in Figure 4 (b) we show the results for the power-law degree distribution with parameters $\alpha = 2$ and $\kappa = 100$.
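Eq. (14) says that the number of coupons handed out under the modified mechanism is a binomial count capped at $c$, that is, $X = \min(\mathrm{Bin}(k, p), c)$. The sketch below evaluates Eq. (14) directly and compares it with a step-by-step simulation of the sequential offering procedure. It takes $k$ as written in Eq. (14), that is, as the number of neighbours that can be offered a coupon. Function names are hypothetical.

```python
import random
from math import comb


def modified_offspring_pmf(k, c, p):
    """Return [P(X = j | D = k) for j = 0..c] according to Eq. (14)."""
    binom = [comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k + 1)]
    pmf = [binom[j] if j <= k else 0.0 for j in range(c)]
    pmf.append(sum(binom[c:]))  # all outcomes with at least c acceptances
    return pmf


def simulate_modified(k, c, p, rng):
    """Offer coupons to neighbours one at a time until c have accepted or
    all k neighbours have been asked; return the number of acceptances.

    Neighbours are exchangeable here, so the random order in which they
    are approached does not need to be simulated explicitly.
    """
    accepted = 0
    for _ in range(k):
        if accepted == c:
            break
        if rng.random() < p:
            accepted += 1
    return accepted


pmf = modified_offspring_pmf(k=6, c=3, p=0.5)  # [1/64, 6/64, 15/64, 42/64]
rng = random.Random(0)
draws = [simulate_modified(6, 3, 0.5, rng) for _ in range(20000)]
empirical = [draws.count(j) / len(draws) for j in range(4)]
```

For $k < c$ the sum in the second case of Eq. (14) is empty, so the probability of handing out all $c$ coupons is zero and the distribution reduces to the plain binomial, in line with the remark that low-degree individuals are unaffected by the modification.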
It is clear that $R_0$ is larger for the modified recruitment process for all $p$ compared to the process described in Subsection 2.2, and the $p$ value corresponding to the epidemic threshold is considerably smaller. When $p \to 1$, the $R_0$ values converge to those seen in Figure 1. Because the modified process has similar epidemic threshold values in terms of $p$ for different $c$, the corresponding $\tau$ values (not shown) are close to those for the unrestricted epidemic when $R_0 > 1$. For the final size of the epidemic, the calculations are much harder and are thus outside the scope of this paper.

[Figure 4, panels (a) and (b). Horizontal axis: probability of successful recruitment. Vertical axis: $R_0$. Panel (a): D ~ Po(8); panel (b): D ~ PL(2, 100). Curves: unrestricted, 10 coupons, 5 coupons, 3 coupons.]

Figure 4: Comparison of $R_0$ for unrestricted epidemics and RDS recruitment processes where a recruiter tries to distribute a coupon until success. Plot (a) shows the results for the Poisson degree distribution with parameter $\lambda = 8$ and plot (b) shows the results for the power-law degree distribution with parameters $\alpha = 2$ and $\kappa = 100$.

There are several other complications that could be considered in terms of coupon distribution. For example, it is not likely that all coupon distribution attempts of a respondent will have the same success probability, both because the respondent may act differently depending on how many attempts he or she has previously made and because the relations to his or her neighbours may be different. Other complications include different respondent behaviour depending on (measurable) individual characteristics, geographical variations, and time dependence.
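The ordering of the $R_0$ values for the two coupon mechanisms and the unrestricted epidemic can be illustrated numerically. The sketch below is not the derivation of Section 3.1. It rests on assumptions that this section does not spell out: a recruit reached along an edge has a size-biased degree, one of its $k$ links is used by its recruiter, the original mechanism offers coupons to $\min(k - 1, c)$ neighbours who each accept with probability $p$, and the modified mechanism hands out $\min(\mathrm{Bin}(k - 1, p), c)$ coupons. All names are hypothetical.

```python
from math import comb, exp, factorial


def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax (the tail is truncated)."""
    return [exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)]


def mean_capped_binomial(n, c, p):
    """E[min(Bin(n, p), c)]."""
    return sum(
        min(i, c) * comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(n + 1)
    )


def r0(degree_pmf, c, p, mechanism):
    """Mean number of recruits made by a non-seed recruit.

    Assumption: the recruit's degree is size-biased, P(k) ~ k * q_k,
    and the link to its own recruiter cannot be used again.
    """
    mean_degree = sum(k * q for k, q in enumerate(degree_pmf))
    total = 0.0
    for k, q in enumerate(degree_pmf):
        if k == 0:
            continue
        available = k - 1
        if mechanism == "original":
            offspring = p * min(available, c)
        elif mechanism == "modified":
            offspring = mean_capped_binomial(available, c, p)
        else:  # "unrestricted"
            offspring = p * available
        total += k * q / mean_degree * offspring
    return total


degree_pmf = poisson_pmf(8, 60)
for p in (0.2, 0.5, 0.8, 1.0):
    values = [r0(degree_pmf, 3, p, m)
              for m in ("original", "modified", "unrestricted")]
    print(p, [round(v, 3) for v in values])
```

Under these assumptions the modified mechanism never gives a smaller value than the original one, and the two agree at $p = 1$, which matches the qualitative behaviour described for Figure 4.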
Overall, our results indicate that RDS studies which experience difficulties with respect to recruitment chain failure could benefit from an increased number of coupons, which would reduce the number of additional seeds needed. Furthermore, the longer recruitment chains obtained as a result of an increased number of coupons are more likely to reach remote parts of the population and meet equilibrium criteria for inference. As the recruitment potential of RDS increases with an increased number of coupons, the time to reach the desired sample size is shortened. Additionally, the study time is not subject to unexpected prolongation due to the addition of seeds. Hence, an increased number of coupons may result in lower and more predictable study costs. For webRDS studies in particular, the increase in the proportion of the population reached due to increasing the number of coupons facilitates larger sample sizes. We therefore advise that the recruitment potential of a planned RDS study be considered beforehand so that the number of coupons can be chosen large enough to facilitate sustained recruitment and an acceptable sample size.

Other factors may also increase recruitment potential. The coupon transfer probability $p$ could be increased by e.g. larger incentives or improved information about the study; this has an immediate effect on $R_0$, $\tau$, and $z$. Additionally, the selection of seeds could also affect recruitment capability; see e.g. [35], where different seed selection methods produce very different recruitment scenarios. In general, it is of interest to further study why certain RDS studies are more successful in reaching the desired sample size with a modest number of seeds.
The presented epidemic model is a novel contribution to the area of stochastic epidemic models and although many results from Reed-Frost epidemics on configuration model networks are expected to hold for this model, several properties of it remain to be studied. There are a number of extensions that can be considered, e.g. different recruitment probabilities through unequally weighted edges, controlling for network structural properties, e.g. clustering, and modifying the coupon distribution mechanism as previously described.

Acknowledgements

J.M. was supported by the Swedish Research Council, grant no. 2009-5759. T.B. and F.L. are grateful to Riksbankens jubileumsfond (contract P12-0705:1) for financial support.

References

[1] Beyrer C, Baral SD, van Griensven F, Goodreau SM, Chariyalertsak S, Wirtz AL, Brookmeyer R. 2012 Global epidemiology of HIV infection in men who have sex with men. The Lancet 380, 367–377. (doi:10.1016/S0140-6736(12)60821-6)
[2] Kerrigan D, Wirtz A, Semini I, N'Jie N, Stanciole A, Butler J, Oelrichs R, Beyrer C. 2012 The Global HIV Epidemics among Sex Workers. The World Bank. (doi:10.1596/978-0-8213-9774-9)
[3] Aceijas C, Stimson GV, Hickman M, Rhodes T. 2004 Global overview of injecting drug use and HIV infection among injecting drug users. AIDS 18, 2295–2303. (doi:10.1097/00002030-200411190-00010)
[4] Magnani R, Sabin K, Saidel T, Heckathorn D. 2005 Review of sampling hard-to-reach and hidden populations for HIV surveillance. AIDS 19, 67–72. (doi:10.1097/01.aids.0000172879.20628.e1)
[5] Lamptey PR, Dirks RG. 2008 HIV/AIDS, reaching high-risk populations. In Encyclopedia of Social Problems (ed. VN Parrillo). 443–447. CA: SAGE Publications, Inc. (doi:10.4135/9781412963930)
[6] Heckathorn DD. 1997 Respondent-driven sampling: a new approach to the study of hidden populations. Soc. Probl. 174–199. (doi:10.2307/3096941)
[7] Heckathorn D. 2002 Respondent-driven sampling II: Deriving valid population estimates from chain-referral samples of hidden populations. Soc. Probl. 49, 11–34. (doi:10.1525/sp.2002.49.1.11)
[8] Wejnert C, Heckathorn DD. 2008 Web-based network sampling - Efficiency and efficacy of respondent-driven sampling for online research. Sociol. Method Res. 37, 105–134. (doi:10.1177/0049124108318333)
[9] Wejnert C. 2009 An empirical test of respondent-driven sampling: Point estimates, variance, degree measures, and out-of-equilibrium data. Sociol. Methodol. 39, 73–116. (doi:10.1111/j.1467-9531.2009.01216.x)
[10] Bengtsson L, Lu X, Nguyen QC, Camitz M, Hoang NL, Nguyen TA, Liljeros F, Thorson A. 2012 Implementation of web-based respondent-driven sampling among men who have sex with men in Vietnam. PLoS ONE 7. (doi:10.1371/journal.pone.0049417)
[11] Salganik MJ, Heckathorn DD. 2004 Sampling and estimation in hidden populations using respondent-driven sampling. Sociol. Methodol. 34, 193–240. (doi:10.1111/j.0081-1750.2004.00152.x)
[12] Volz E, Heckathorn DD. 2008 Probability based estimation theory for respondent driven sampling. J. Off. Stat. 24, 79–97.
[13] Gile KJ. 2011 Improved inference for respondent-driven sampling data with application to HIV prevalence estimation. J. Am. Stat. Assoc. 106. (doi:10.1198/jasa.2011.ap09475)
[14] Gile KJ, Handcock MS. 2011 Network model-assisted inference from respondent-driven sampling data. arXiv preprint.
[15] Lu X, Malmros J, Liljeros F, Britton T. 2013 Respondent-driven sampling on directed networks. Electron. J. Stat. 7, 292–322. (doi:10.1214/13-EJS772)
[16] Malmros J, Masuda N, Britton T. 2013 Random walks on directed networks: Inference and respondent-driven sampling. arXiv preprint.
[17] Malekinejad M, Johnston LG, Kendall C, Franco Sansigolo Kerr LR, Rifkin MR, Rutherford GW. 2008 Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: A systematic review. AIDS Behav 12, 105–130. (doi:10.1007/s10461-008-9421-1)
[18] Stein ML, van Steenbergen JE, Chanyasanha C, Tipayamongkholgul M, Buskens V, van der Heijden PGM, Sabaiwan W, Bengtsson L, Lu X, Thorson AE, et al. 2014 Online Respondent-Driven Sampling for Studying Contact Patterns Relevant for the Spread of Close-Contact Pathogens: A Pilot Study in Thailand. PLoS ONE 9. (doi:10.1371/journal.pone.0085256)
[19] Andersson H, Britton T. 2000 Stochastic epidemic models and their statistical analysis. vol. 151. New York: Springer. (doi:10.1007/978-1-4612-1158-7)
[20] Molloy M, Reed B. 1995 A critical-point for random graphs with a given degree sequence. Random Struct. Algor. 6, 161–179. (doi:10.1002/rsa.3240060204)
[21] Molloy M, Reed B. 1998 The size of the giant component of a random graph with a given degree sequence. Comb. Probab. Comput. 7, 295–305. (doi:10.1017/S0963548398003526)
[22] Ball F, Lyne OD. 2001 Stochastic multi-type SIR epidemics among a population partitioned into households. Adv. Appl. Probab. 33, 99–123. (doi:10.1239/aap/999187899)
[23] Ball F, Neal P. 2002 A general model for stochastic SIR epidemics with two levels of mixing. Math. Biosci. 180, 73–102. (doi:10.1016/s0025-5564(02)00125-6)
[24] Martin-Löf A. 1986 Symmetric sampling procedures, general epidemic processes and their threshold limit theorems. J. Appl. Probab. 23, 265–282. (doi:10.2307/3214172)
[25] van der Hofstad R. 2009 Random graphs and complex networks. Available on http://www.win.tue.nl/rhofstad/NotesRGCN.pdf
[26] Britton T, Janson S, Martin-Löf A. 2007 Graphs with specified degree distributions, simple epidemics, and local vaccination strategies. Adv. Appl. Probab. 39, 922–948. (doi:10.1239/aap/1198177233)
[27] Athreya K, Ney P. 1972 Branching Processes. Grundlehren der mathematischen Wissenschaften. Berlin: Springer. (doi:10.1007/springerreference_60215)
[28] Ball FG, Sirl DJ, Trapman P. 2014 Epidemics on random intersection graphs. Ann. Appl. Probab. 24, 1081–1128. (doi:10.1214/13-aap942)
[29] Ball F, Sirl D. 2013 Acquaintance vaccination in an epidemic on a random graph with specified degree distribution. J. Appl. Probab. 50, 1147–1168. (doi:10.1239/jap/1389370105)
[30] Barbour AD, Reinert G. 2013 Approximating the epidemic curve. Electron. J. Probab 18, 1–30. (doi:10.1214/ejp.v18-2557)
[31] Newman ME. 2002 Spread of epidemic disease on networks. Phys. Rev. E 66, 016128. (doi:10.1103/physreve.66.016128)
[32] Britton T, Deijfen M, Martin-Löf A. 2006 Generating simple random graphs with prescribed degree distribution. J. Stat. Phys. 124, 1377–1397. (doi:10.1007/s10955-006-9168-x)
[33] Csardi G, Nepusz T. 2006 The igraph software package for complex network research. InterJournal Complex Systems, 1695.
[34] Ball F, Sirl D, Trapman P. 2009 Threshold behaviour and final outcome of an epidemic on a random network with household structure. Adv. Appl. Probab. 41, 765–796. (doi:10.1239/aap/1253281063)
[35] Wylie JL, Jolly AM. 2013 Understanding recruitment: outcomes associated with alternate methods for seed selection in respondent driven sampling. BMC Med. Res. Methodol. 13. (doi:10.1186/1471-2288-13-93)