Component Evolution in General Random Intersection Graphs

Milan Bradonjić∗, Aric Hagberg†, Nicolas W. Hengartner‡, Allon G. Percus§

∗ Theoretical Division, and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA, milan@lanl.gov
† Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA, hagberg@lanl.gov
‡ Statistical Sciences Group, Los Alamos National Laboratory, NM 87545, USA, nickh@lanl.gov
§ School of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711, USA, allon.percus@cgu.edu

Abstract

Random intersection graphs (RIGs) are an important random structure with applications in social networks, epidemic networks, blog readership, and wireless sensor networks. RIGs can be interpreted as a model for large randomly formed non-metric data sets. We analyze the component evolution in general RIGs, and give conditions on the existence and uniqueness of the giant component. Our techniques generalize existing methods for the analysis of component evolution: we analyze survival and extinction properties of a dependent, inhomogeneous Galton-Watson branching process on general RIGs. Our analysis relies on bounding the branching processes and inherits the fundamental concepts of the study of component evolution in Erdős-Rényi graphs. The major challenge comes from the underlying structure of RIGs, which involves both the set of nodes and the set of attributes, as well as the set of different probabilities among the nodes and attributes.

Keywords: Random graphs, branching processes, probabilistic methods, random generation of combinatorial structures, stochastic processes in relation with random discrete structures.

1 Introduction

Bipartite graphs, consisting of two sets of nodes with edges only connecting nodes in opposite sets, are a natural representation for many networks. A well-known example is a collaboration graph, where the two sets might be scientists and research papers, or actors and movies [25, 16]. Social networks can often be cast as bipartite graphs since they are built from sets of individuals connected to sets of attributes, such as membership of a club or organization, work colleagues, or fans of the same sports team.

Simulations of epidemic spread in human populations are often performed on networks constructed from bipartite graphs of people and the locations they visit during a typical day [11]. Bipartite structure, of course, is hardly limited to social networks. The relation between nodes and keys in secure wireless communication, for example, forms a bipartite network [6]. In general, bipartite graphs are well suited to the problem of classifying objects, where each object has a set of properties [10]. However, modeling such classification networks remains a challenge. The well-studied Erdős-Rényi model, $G_{n,p}$, successfully used for average-case analysis of algorithm performance, does not satisfactorily represent many randomly formed social or collaboration networks. For example, $G_{n,p}$ does not capture the typical scale-free degree distribution of many real-world networks [3]. More realistic degree distributions can be achieved by the configuration model [18] or the expected degree model [7], but even those fail to capture common properties of social networks such as the high number of triangles (or cliques) and strong degree-degree correlation [17, 1]. The most straightforward way of remedying these problems is to characterize each of the bipartite sets separately. One step in this direction is an extension of the configuration model that specifies degrees in both sets [14]. Another related approach is that of random intersection graphs (RIGs), first introduced in [24, 15].

Any undirected graph can be represented as an intersection graph [9]. The simplest version is the "uniform" RIG, $G(n,m,p)$, containing a set of $n$ nodes and a set of $m$ attributes, where any given node-attribute pair contains an edge with a fixed probability $p$, independently of other pairs. Two nodes in the graph are connected if and only if they are both connected to at least one common element of the attribute set. In our work, we study the more general RIG, $G(n,m,\mathbf{p})$ [20, 19], where the node-attribute edge probabilities are not given by a uniform value $p$ but rather by a set $\mathbf{p} = \{p_w\}_{w\in W}$: a node is attached to the attribute $w$ with probability $p_w$. This general model has only recently been developed, and only a few results have been obtained, such as expander properties, cover time, and the existence and efficient construction of large independent sets [20, 19, 21]. In this paper, we analyze the evolution of components in general RIGs. Related results have previously been obtained for the uniform RIG [4], and for two uniform cases of the RIG model in which a specific overlap threshold controls the connectivity of the nodes [6]. Our main contribution is a generalization of the analysis of component evolution to general RIGs. We provide stochastic bounds by analyzing the stopping time of the branching process on a general RIG, where the history of the process is directly dictated by the structure of the general RIG.
The major challenge comes from the underlying structure of RIGs, which involves both the set of nodes and the set of attributes, as well as the set of different probabilities $\mathbf{p} = \{p_w\}_{w\in W}$.

2 Model and previous work

In this paper, we consider the general intersection graph $G(n,m,\mathbf{p})$, introduced in [20, 19], with a set of probabilities $\mathbf{p} = \{p_w\}_{w\in W}$, where $p_w\in(0,1)$. We now formally define the model.

Model. There are two sets: the set of nodes $V = \{1,2,\dots,n\}$ and the set of attributes $W = \{1,2,\dots,m\}$. For a given set of probabilities $\mathbf{p} = \{p_w\}_{w\in W}$, independently over all $(v,w)\in V\times W$, let
$$A_{v,w} := \mathrm{Bernoulli}(p_w). \tag{1}$$
Every node $v\in V$ is assigned a random set of attributes $W(v)\subseteq W$,
$$W(v) := \{w\in W \mid A_{v,w} = 1\}. \tag{2}$$
The set of edges in $V$ is defined such that two different nodes $v_i, v_j\in V$ are connected if and only if
$$|W(v_i)\cap W(v_j)| \ge s, \tag{3}$$
for a given integer $s\ge1$. In our analysis, unlike in [4, 6], the $p_w$ are not necessarily all equal¹, and for simplicity we fix $s = 1$.

The component evolution of the uniform model $G(n,m,p)$ was analyzed by Behrisch in [4] for the case when the numbers of nodes and attributes scale as $m = n^\alpha$, with $\alpha\ne1$ and $p^2m = c/n$. Theorem 1 in [4] states that the size of the largest component $N(G(n,m,p))$ in the RIG satisfies
(i) $N(G(n,m,p)) \le \frac{9}{(1-c)^2}\log n$, for $\alpha>1$, $c<1$,
(ii) $N(G(n,m,p)) = (1+o(1))(1-\rho)n$, for $\alpha>1$, $c>1$,
(iii) $N(G(n,m,p)) \le \frac{10\sqrt{c}}{(1-c)^2}\sqrt{\frac{n}{m}}\,\log m$, for $\alpha<1$, $c<1$,
(iv) $N(G(n,m,p)) = (1+o(1))(1-\rho)\sqrt{cmn}$, for $\alpha<1$, $c>1$,
where $\rho$ is the solution in $(0,1)$ of the equation $\rho = \exp(c(\rho-1))$.
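The model (1)-(3) with $s = 1$ is easy to sample directly, which helps when experimenting with the statements below. The following is a minimal sketch (the function name `sample_general_rig` and the list-based representation of $\mathbf{p}$ are our own illustrative choices): it draws the indicators $A_{v,w}$ independently, builds the attribute sets $W(v)$, and joins two nodes whenever their attribute sets intersect.

```python
import random
from itertools import combinations


def sample_general_rig(n, p, rng=None):
    """Sample G(n, m, p) with s = 1; p[w] is the attachment probability of attribute w.

    Returns (attrs, edges): attrs[v] is the attribute set W(v), and edges holds the
    pairs (i, j), i < j, whose attribute sets share at least one attribute.
    """
    rng = rng or random.Random()
    # Equations (1)-(2): independent Bernoulli(p_w) indicators A_{v,w} define W(v).
    attrs = [{w for w, pw in enumerate(p) if rng.random() < pw} for _ in range(n)]
    # Equation (3) with s = 1: connect two nodes iff their attribute sets intersect.
    edges = {(i, j) for i, j in combinations(range(n), 2) if attrs[i] & attrs[j]}
    return attrs, edges


if __name__ == "__main__":
    attrs, edges = sample_general_rig(6, [0.3, 0.5, 0.2, 0.4], random.Random(1))
    print(attrs)
    print(sorted(edges))
```

The quadratic pair loop is only meant for small $n$; for larger graphs one would merge the holders of each attribute instead of testing all pairs.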
The component evolution for the case $s\ge1$ in the relation $|W(u)\cap W(v)|\ge s$ is considered in [6], where the following two RIG models are analyzed: (1) the $G_s(n,m,d)$ model, where $\mathbb{P}[W(v) = A] = \binom{m}{d}^{-1}$ for all $A\subseteq W$ with $d$ elements, for a given $d$; (2) the $G'_s(n,m,p)$ model, where $\mathbb{P}[W(v) = A] = p^{|A|}(1-p)^{m-|A|}$ for all $A\subseteq W$. In light of the results of [4], it has been shown in [6] that for $d = d(n)$, $p = p(n)$, $m = m(n)$, $n = o(m)$, where $s$ is a fixed integer and $d^{2s}\sim c\,m^s s!/n$, the largest component in $G_s(n,m,d)$ satisfies:
(i) $N(G_s(n,m,d)) \le \frac{9}{(1-c)^2}\log n$, for $c<1$,
(ii) $N(G_s(n,m,d)) = (1+o(1))(1-\rho)n$, for $c>1$,
in the case when $n\log n = o(m)$ for $s = 1$ and $n = o(m^{s/(2s-1)})$ for $s\ge2$. The same results for the giant component in $G'_s(n,m,p)$ still hold when $p^{2s} = c\,s!/(m^s n)$ and $n = o(m^{s/(2s-1)})$, see [6].

Both $G_s(n,m,d)$ and $G'_s(n,m,p)$ are special cases of a more general class studied in [13], where the number of attributes of each node is assigned randomly, as in the bipartite configuration model. That is, for a given probability distribution $(P_0,P_1,\dots,P_m)$, we have $\mathbb{P}[|W(v)| = k] = P_k$ for all $0\le k\le m$, and moreover, given the size $k$, all sets $W(v)$ are equally probable; that is, for any $A\subseteq W$ with $|A| = k$, $\mathbb{P}[W(v) = A \mid |W(v)| = k] = \binom{m}{k}^{-1}$. Thus $G_s(n,m,d)$ is equivalent to the model of [13] with the delta distribution that puts probability $1$ on $d$, while $G'_s(n,m,p)$ is equivalent to the model of [13] with the $\mathrm{Bin}(m,p)$ distribution.

¹ Note that the $p_w$ do not sum up to 1. Moreover, we can eliminate the cases $p_w = 0$ and $p_w = 1$. These two cases correspond, respectively, to none or all of the nodes $v$ being attached to the attribute $w$.
To complete the picture of previous work, in [8] it was shown that when $n = m$, a set of probabilities $\mathbf{p} = \{p_w\}_{w\in W}$ can be chosen to tune the degree and clustering coefficient of the graph.

3 Mathematical preliminaries

In this paper, we analyze the component evolution of the general RIG structure. As we have already mentioned, the major challenge comes from the underlying structure of RIGs, which involves both the set of nodes and the set of attributes, as well as the set of different probabilities $\mathbf{p} = \{p_w\}_{w\in W}$. Moreover, the edges in a RIG are not independent. Hence, a RIG cannot be treated as an Erdős-Rényi random graph $G_{n,\hat p}$ with edge probability $\hat p = 1-\prod_{w\in W}(1-p_w^2)$. However, in [12], the authors provide a comparison between $G_{n,\hat p}$ and $G(n,m,\mathbf{p})$, showing that for $m = n^\alpha$ and $\alpha>6$, these two classes of graphs have asymptotically the same properties. In [23], Rybarczyk has recently shown the equivalence of sharp threshold functions between $G_{n,\hat p}$ and $G(n,m,p)$ when $m\ge n^3$. In this work, we do not impose any constraints relating $n$ and $m$, and we develop methods for the analysis of branching processes on RIGs, since the existing methods for the analysis of branching processes on $G_{n,p}$ do not apply.

We now briefly describe the dependence among edges. Consider three distinct nodes $v_i, v_j, v_k$ from $V$. Conditionally on the set $W(v_k)$, by definition (2), the sets $W(v_i)\cap W(v_k)$ and $W(v_j)\cap W(v_k)$ are mutually independent, which implies conditional independence of the events $\{v_i\sim v_k\mid W(v_k)\}$ and $\{v_j\sim v_k\mid W(v_k)\}$, that is,
$$\mathbb{P}[v_i\sim v_k,\ v_j\sim v_k\mid W(v_k)] = \mathbb{P}[v_i\sim v_k\mid W(v_k)]\;\mathbb{P}[v_j\sim v_k\mid W(v_k)]. \tag{4}$$
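The dependence between edges sharing an endpoint can be checked exactly on a small attribute set. The sketch below (the helper name `edge_probabilities` is ours) enumerates every possible $W(v_k)$, uses the conditional independence (4) together with $\mathbb{P}[v_i\sim v_k\mid W(v_k)=S] = 1-\prod_{w\in S}q_w$, and returns the marginal edge probability and the joint probability of two edges at $v_k$. Exact rational arithmetic makes the comparison with $\hat p = 1-\prod_w(1-p_w^2)$ and with the product of marginals unambiguous.

```python
from fractions import Fraction
from itertools import product
from math import prod


def edge_probabilities(p):
    """Exact P[v_i ~ v_k] and P[v_i ~ v_k, v_j ~ v_k] in G(n, m, p) with s = 1.

    Enumerates every attribute set S = W(v_k); given S, the two events are
    independent (eq. (4)), each with probability 1 - prod_{w in S} q_w.
    """
    p = [Fraction(x) for x in p]
    single = Fraction(0)
    joint = Fraction(0)
    for bits in product((0, 1), repeat=len(p)):
        prob_s = prod(pw if b else 1 - pw for pw, b in zip(p, bits))   # P[W(v_k) = S]
        cond = 1 - prod(1 - pw for pw, b in zip(p, bits) if b)         # P[v_i ~ v_k | S]
        single += prob_s * cond
        joint += prob_s * cond * cond
    return single, joint


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
single, joint = edge_probabilities(p)
print(single, joint, single * single)   # joint exceeds single**2: the edges are dependent
```

The strict inequality is an instance of (5): the joint probability is $\mathbb{E}[g(W(v_k))^2]$, which exceeds $(\mathbb{E}[g(W(v_k))])^2$ whenever the conditional edge probability $g$ is not constant.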
However, the latter does not imply independence of the events $\{v_i\sim v_k\}$ and $\{v_j\sim v_k\}$, since in general
$$\mathbb{P}[v_i\sim v_k,\ v_j\sim v_k] = \mathbb{E}\big[\mathbb{P}[v_i\sim v_k,\ v_j\sim v_k\mid W(v_k)]\big] = \mathbb{E}\big[\mathbb{P}[v_i\sim v_k\mid W(v_k)]\;\mathbb{P}[v_j\sim v_k\mid W(v_k)]\big] \ne \mathbb{P}[v_i\sim v_k]\;\mathbb{P}[v_j\sim v_k]. \tag{5}$$
Furthermore, the conditional pairwise independence (4) does not extend to three or more nodes. Indeed, conditionally on the set $W(v_k)$, the sets $W(v_i)\cap W(v_j)$, $W(v_i)\cap W(v_k)$, and $W(v_j)\cap W(v_k)$ are not mutually independent, and hence neither are the events $\{v_i\sim v_j\}$, $\{v_i\sim v_k\}$, and $\{v_j\sim v_k\}$, that is,
$$\mathbb{P}[v_i\sim v_j,\ v_i\sim v_k,\ v_j\sim v_k\mid W(v_k)] \ne \mathbb{P}[v_i\sim v_j\mid W(v_k)]\;\mathbb{P}[v_i\sim v_k\mid W(v_k)]\;\mathbb{P}[v_j\sim v_k\mid W(v_k)]. \tag{6}$$
We now provide two identities, which we will use throughout this paper. For any $w\in W$, let $q_w := 1-p_w$, and define $\prod_{\alpha\in\emptyset}q_\alpha = 1$.

Claim 1. For any node $u\in V$ and given set $A\subseteq W$,
$$\mathbb{P}[W(u)\cap A = \emptyset\mid A] = \prod_{\alpha\in A}(1-p_\alpha) = \prod_{\alpha\in A}q_\alpha. \tag{7}$$

Proof. Write
$$\mathbb{P}[W(u)\cap A = \emptyset\mid A] = \mathbb{P}[\forall\alpha\in A,\ \alpha\notin W(u)\mid A] = \prod_{\alpha\in A}\mathbb{P}[\alpha\notin W(u)] = \prod_{\alpha\in A}(1-p_\alpha) = \prod_{\alpha\in A}q_\alpha,$$
where the second equality uses the independence of the indicators $A_{u,\alpha}$ in (1). This is the desired expression.

Claim 2. For any node $u\in V$ and given sets $A\subseteq B\subseteq W$,
$$\mathbb{P}[W(u)\cap A = \emptyset,\ W(u)\cap B\ne\emptyset\mid A,B] = \Big(\prod_{\alpha\in A}q_\alpha\Big)\Big(1-\prod_{\beta\in B\setminus A}q_\beta\Big) = \prod_{\alpha\in A}q_\alpha-\prod_{\beta\in B}q_\beta.$$

Proof. Since $A\subseteq B$, the event equals $\{W(u)\cap A = \emptyset\}\cap\{W(u)\cap(B\setminus A)\ne\emptyset\}$. The sets $A$ and $B\setminus A$ are disjoint, so these two events are independent, and the result follows from (7).

4 Auxiliary process on general random intersection graphs

Our analysis of the emergence of a giant component is inspired by the approach described in [2]. The difficulty in analyzing the evolution of the stochastic process defined by equations (1), (2), and (3) resides in the fact that we need, at least in principle, to keep track of the temporal evolution of the sets of nodes and attributes being explored. This results in a process that is not Markovian.
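Claim 2 can be sanity-checked by brute force on a handful of attributes. In this sketch (both function names are ours) only the attributes in $B$ matter, so we enumerate the possible sets $W(u)\cap B$ with their probabilities and compare the result with the closed form $\prod_{\alpha\in A}q_\alpha-\prod_{\beta\in B}q_\beta$ using exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import prod


def claim2_bruteforce(p, A, B):
    """P[W(u) ∩ A = ∅, W(u) ∩ B ≠ ∅] for A ⊆ B, enumerating which attributes of B node u holds.

    Attributes outside B do not affect the event, so they are marginalized out.
    """
    B = sorted(B)
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(B)):
        held = {w for w, b in zip(B, bits) if b}              # W(u) ∩ B
        weight = prod((p[w] if w in held else 1 - p[w] for w in B), start=Fraction(1))
        if held and not (held & A):
            total += weight
    return total


def claim2_closed_form(p, A, B):
    """Right-hand side of Claim 2: prod_{alpha in A} q_alpha - prod_{beta in B} q_beta."""
    prod_a = prod((1 - p[w] for w in A), start=Fraction(1))
    prod_b = prod((1 - p[w] for w in B), start=Fraction(1))
    return prod_a - prod_b


p = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 5), 3: Fraction(2, 7)}
print(claim2_bruteforce(p, {1, 3}, {0, 1, 2, 3}), claim2_closed_form(p, {1, 3}, {0, 1, 2, 3}))
```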
We construct an auxiliary process, which starts at an arbitrary node $v_0\in V$ and reaches zero for the first time in a number of steps equal to the size of the component containing $v_0$. The process is algorithmically defined as follows.

Auxiliary Process. Let us denote by $V_t$ the cumulative set of nodes visited by time $t$, which we initialize to $V_0 = \{v_0\}$, and set $W(v_0) = \{v\ne v_0 : W(v)\cap W(v_0)\ne\emptyset\}$. Starting with $Y_0 = 1$, the process evolves as follows. For $t = 1,2,3,\dots,n-1$, as long as $Y_{t-1}>0$, pick a node $v_t$ uniformly at random from the set $V\setminus V_{t-1}$ and update the set of visited nodes $V_t = V_{t-1}\cup\{v_t\}$. Denote by $W(v_t) = \{w\in W\mid A_{v_t,w} = 1\}$ the set of features associated with node $v_t$, and define
$$Y_t = \Big|\Big\{v\in V\setminus V_t \;\Big|\; W(v)\cap\bigcup_{\tau=0}^{t}W(v_\tau)\ne\emptyset\Big\}\Big|.$$
The random variable $Y_t$ counts the number of nodes outside the set of visited nodes $V_t$ that are connected to $V_t$. Following [2], we call $Y_t$ the number of alive nodes at time $t$. We note that we do not need to keep track of the actual list of neighbors of $V_t$,
$$\Big\{v\in V\setminus V_t \;\Big|\; W(v)\cap\bigcup_{\tau=0}^{t}W(v_\tau)\ne\emptyset\Big\}, \tag{8}$$
as in [2], because every node in $V\setminus V_t$ is equally likely to belong to the set (8). As a result, each time we need a random node from (8), we pick a node uniformly at random from $V\setminus V_t$.

To understand why this process is useful, notice that by time $t$, we know that the size of the component containing $v_0$ is at least as large as the number of visited nodes $|V_t|$ plus the number $Y_t$ of neighbors of $V_t$ not yet visited. Once the number $Y_t$ of nodes connected to $V_t$ but not yet visited drops to zero, the size of $V_t$ is equal to the size of the component containing $v_0$. We formalize this last statement by introducing the stopping time
$$T(v_0) = \inf\{t>0 : Y_t = 0\}, \tag{9}$$
whose value is $|C(v_0)|$, the size of the component $C(v_0)$ containing $v_0$.
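The stopping-time identity $T(v_0) = |C(v_0)|$ can be illustrated with a short exploration sketch. This is an assumption-laden variant of the auxiliary process, not the process exactly as stated: here the node visited at each step is drawn uniformly from the current alive set (the usual choice in exploration arguments such as [2]), and alive nodes are recognized through the cumulative attribute set, mirroring the definition of $Y_t$. The function name `explore_component` is ours.

```python
import random


def explore_component(attrs, v0, rng=None):
    """Exploration started at v0 in a RIG with s = 1, given attrs[v] = W(v).

    Assumption: the node visited at each step is drawn uniformly from the alive set.
    Returns the trajectory [Y_0, Y_1, ..., Y_T], so T = len(trajectory) - 1.
    """
    rng = rng or random.Random()
    alive = [v0]          # Y_0 = 1: only v0 is alive at time 0
    seen = {v0}           # nodes that are visited or alive
    covered = set()       # cumulative attribute set of the visited nodes
    ys = [1]
    while alive:
        v = alive.pop(rng.randrange(len(alive)))    # visit one alive node
        covered |= attrs[v]
        # Newly alive nodes: unseen nodes sharing an attribute with the covered set.
        new = [u for u in range(len(attrs)) if u not in seen and attrs[u] & covered]
        seen.update(new)
        alive.extend(new)
        ys.append(len(alive))                       # Y_t = Y_{t-1} - 1 + Z_t, cf. (12)
    return ys
```

Each iteration visits exactly one node, and every node of $C(v_0)$ eventually becomes alive, so the number of steps until the alive count first hits zero equals the component size.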
Finally, our analysis of this process requires us to keep track of the history of the feature sets uncovered by the process,
$$\mathcal{H}_t = \{W(v_0), W(v_1),\dots,W(v_t)\}. \tag{10}$$

4.1 Process description in terms of the random variable $Y_t$

As in [6], we denote the cumulative feature set associated with the sequence of nodes $v_0,\dots,v_t$ from the auxiliary process by
$$W_{[t]} := \bigcup_{\tau=0}^{t}W(v_\tau). \tag{11}$$
We will characterize the process $\{Y_t\}_{t\ge0}$ in terms of the number $Z_t$ of newly discovered neighbors of $V_t$. The latter is directly related to the increment of the process $Y_t$, defined by
$$Z_t = Y_t-Y_{t-1}+1, \tag{12}$$
where the term $+1$ reflects the fact that $Y_{t-1}$ decreases by one when the node $v_t$ becomes a visited node at time $t$. The events that any given node, which is neither visited nor alive, becomes alive at time $t$ are conditionally independent given the history $\mathcal{H}_t$, since each event involves a different subset of the indicator random variables $\{A_{v,w}\}$. In light of Claim 2 (with $A = W_{[t-1]}$ and $B = W_{[t]}$), the conditional probability that a node $u$ becomes alive at time $t$ is
$$\begin{aligned} r_t &:= \mathbb{P}[u\sim v_t,\ u\not\sim v_{t-1},\ u\not\sim v_{t-2},\dots,u\not\sim v_0\mid\mathcal{H}_t] = \mathbb{P}[W(u)\cap W(v_t)\ne\emptyset,\ W(u)\cap W_{[t-1]} = \emptyset\mid\mathcal{H}_t]\\ &= \mathbb{P}[W(u)\cap W(v_t)\ne\emptyset,\ W(u)\cap W_{[t-1]} = \emptyset\mid W(v_t), W_{[t-1]}] = \prod_{\alpha\in W_{[t-1]}}q_\alpha-\prod_{\beta\in W_{[t]}}q_\beta = \phi_{t-1}-\phi_t, \end{aligned}\tag{13}$$
where we set $\phi_t := \prod_{\alpha\in W_{[t]}}q_\alpha$, and use the conventions $W_{[-1]} = W(\emptyset)\equiv\emptyset$ and $\phi_{-1}\equiv1$. Observe that the probability (13) does not depend on $u$. Hence, conditionally on the history $\mathcal{H}_t$, the number of new alive nodes at time $t$ is a binomially distributed random variable with parameters $r_t$ and
$$N_t = n-t-Y_t. \tag{14}$$
Formally,
$$Z_{t+1}\mid\mathcal{H}_t\sim\mathrm{Bin}(N_t, r_t). \tag{15}$$
This allows us to describe the distribution of $Y_t$ in the next lemma.
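Equation (13) turns a history $\mathcal{H}_t$ into the sequence of rates $r_t$. The sketch below (the name `phi_and_r` is ours) computes $\phi_t$ and $r_t = \phi_{t-1}-\phi_t$ from a list of attribute sets with exact fractions; repeated attributes contribute nothing new, so a step that uncovers no new attribute has $r_t = 0$.

```python
from fractions import Fraction
from math import prod


def phi_and_r(p, history):
    """From p[w] and a history [W(v_0), ..., W(v_t)], return phi[t] = prod_{w in W_[t]} q_w
    and r[t] = phi[t-1] - phi[t] (with phi[-1] = 1), following (13)."""
    covered, phis, rs = set(), [], []
    prev = Fraction(1)                       # phi_{-1} = 1
    for attrs in history:
        covered |= set(attrs)                # W_[t] = W_[t-1] ∪ W(v_t)
        phi = prod((1 - Fraction(p[w]) for w in covered), start=Fraction(1))
        phis.append(phi)
        rs.append(prev - phi)                # probability that a fresh node becomes alive at t
        prev = phi
    return phis, rs


p = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 4)}
phis, rs = phi_and_r(p, [{0}, {0, 1}, set(), {2}])
print(phis, rs)
```

Summing the rates telescopes to $1-\phi_t$, which is the identity used later in (24).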
Lemma 3. For times $t\ge1$, the number of alive nodes satisfies
$$Y_t\mid\mathcal{H}_{t-1}\sim\mathrm{Bin}\Big(n-1,\ 1-\prod_{\tau=0}^{t-1}(1-r_\tau)\Big)-t+1. \tag{16}$$
The proof of this lemma requires us to first establish the following result.

Lemma 4. Let the random variables $\Lambda_1,\Lambda_2$ satisfy $\Lambda_1\sim\mathrm{Bin}(m,\nu_1)$ and $\Lambda_2\mid\Lambda_1\sim\mathrm{Bin}(\Lambda_1,\nu_2)$. Then, marginally, $\Lambda_2\sim\mathrm{Bin}(m,\nu_1\nu_2)$ and $\Lambda_1-\Lambda_2\sim\mathrm{Bin}(m,\nu_1(1-\nu_2))$.

Proof. Let $U_1,\dots,U_m$ and $V_1,\dots,V_m$ be i.i.d. Uniform$(0,1)$ random variables. Writing
$$\Lambda_1\overset{d}{=}\sum_{j=1}^{m}I(U_j\le\nu_1)\qquad\text{and}\qquad\Lambda_2\mid\Lambda_1\overset{d}{=}\sum_{k:\,U_k\le\nu_1}I(V_k\le\nu_2),$$
we have that
$$\Lambda_2\overset{d}{=}\sum_{k=1}^{m}I(U_k\le\nu_1)\,I(V_k\le\nu_2)\overset{d}{=}\sum_{k=1}^{m}I(U_k\le\nu_1\nu_2),$$
since the summands $I(U_k\le\nu_1)I(V_k\le\nu_2)$ are i.i.d. Bernoulli$(\nu_1\nu_2)$. Likewise, $\Lambda_1-\Lambda_2\overset{d}{=}\sum_{k=1}^{m}I(U_k\le\nu_1)\,I(V_k>\nu_2)$ is a sum of i.i.d. Bernoulli$(\nu_1(1-\nu_2))$ variables, from which the conclusion follows.

Proof (of Lemma 3). We prove the assertion of the lemma by induction on $t$. For $t = 0$, $Y_0 = 1$, and for $t = 1$, $Y_1 = Z_1\sim\mathrm{Bin}(n-1,r_0)$. Hence, the lemma is true for $t = 0$ and $t = 1$. Assume that the assertion is true for some $t\ge1$,
$$Y_t\mid\mathcal{H}_{t-1}\sim\mathrm{Bin}\Big(n-1,\ 1-\prod_{\tau=0}^{t-1}(1-r_\tau)\Big)-t+1. \tag{17}$$
From (15), we have $Z_{t+1}\mid\mathcal{H}_t\sim\mathrm{Bin}(N_t,r_t) = \mathrm{Bin}(n-t-Y_t,r_t)$. Now, from (12) and Lemma 4, it follows that
$$Y_{t+1}\mid\mathcal{H}_t\sim\mathrm{Bin}\Big(n-1,\ 1-\prod_{\tau=0}^{t}(1-r_\tau)\Big)-t. \tag{18}$$
Hence, by mathematical induction, the lemma holds for any $t\ge0$.

4.2 Expectation and variance of $\phi_t$

The history $\mathcal{H}_t$ embodies the evolution of how the features are discovered over time. It is insightful to recast that history in terms of the discovery times $\Gamma_w$ of each feature in $W$. Given any sequence of nodes $v_0,v_1,v_2,\dots$, the probability that a given feature $w$ is first discovered at time $t<n$ is
$$\mathbb{P}[\Gamma_w = t] = \mathbb{P}[A_{v_t,w} = 1,\ A_{v_{t-1},w} = 0,\dots,A_{v_0,w} = 0] = p_w(1-p_w)^t.$$
If a feature $w$ is not discovered by time $n-1$, we set $\Gamma_w = \infty$ and note that $\mathbb{P}[\Gamma_w = \infty] = (1-p_w)^n$.
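Lemma 4 (binomial thinning) can be confirmed exactly for small $m$ by mixing the binomial probability mass functions. The sketch below (helper names are ours) computes the marginal law of $\Lambda_2$ and of $\Lambda_1-\Lambda_2$ by summing over $\Lambda_1$, using rational arithmetic so the comparison with $\mathrm{Bin}(m,\nu_1\nu_2)$ and $\mathrm{Bin}(m,\nu_1(1-\nu_2))$ is exact rather than approximate.

```python
from fractions import Fraction
from math import comb


def binom_pmf(m, nu, k):
    """P[Bin(m, nu) = k]."""
    return comb(m, k) * nu**k * (1 - nu) ** (m - k)


def thinned_pmf(m, nu1, nu2):
    """Marginal pmf of Lambda_2 when Lambda_1 ~ Bin(m, nu1) and Lambda_2 | Lambda_1 ~ Bin(Lambda_1, nu2)."""
    return [sum(binom_pmf(m, nu1, l1) * binom_pmf(l1, nu2, k) for l1 in range(k, m + 1))
            for k in range(m + 1)]


def remainder_pmf(m, nu1, nu2):
    """Marginal pmf of Lambda_1 - Lambda_2 under the same two-stage sampling."""
    return [sum(binom_pmf(m, nu1, l1) * binom_pmf(l1, nu2, l1 - d) for l1 in range(d, m + 1))
            for d in range(m + 1)]


m, nu1, nu2 = 7, Fraction(2, 5), Fraction(3, 4)
print(thinned_pmf(m, nu1, nu2) == [binom_pmf(m, nu1 * nu2, k) for k in range(m + 1)])
```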
From the independence of the random variables $A_{v,w}$, it follows that the discovery times $\{\Gamma_w : w\in W\}$ are independent. We now focus on describing the distribution of $\phi_t = \prod_{\alpha\in W_{[t]}}q_\alpha$. For $t\ge0$, we have
$$\phi_t = \prod_{\alpha\in W_{[t]}}q_\alpha = \prod_{j=0}^{t}\ \prod_{\alpha\in W(v_j)\setminus W_{[j-1]}}q_\alpha\ \overset{d}{=}\ \prod_{j=0}^{t}\prod_{w\in W}q_w^{I(\Gamma_w = j)} = \prod_{w\in W}q_w^{I(\Gamma_w\le t)}. \tag{19}$$
Using the fact that for $B\sim\mathrm{Bernoulli}(r)$ the expectation is $\mathbb{E}[a^B] = 1-(1-a)r$, we can easily calculate the expectation of $\phi_t$:
$$\mathbb{E}[\phi_t] = \mathbb{E}\Big[\prod_{w\in W}q_w^{I(\Gamma_w\le t)}\Big] = \prod_{w\in W}\big(1-(1-q_w)\,\mathbb{P}[\Gamma_w\le t]\big) = \prod_{w\in W}\big(1-(1-q_w)(1-q_w^{t+1})\big). \tag{20}$$
The concentration of $\phi_0$ will be crucial for the analysis of the supercritical regime in Subsection 5.2. Hence, we provide $\mathbb{E}[\phi_0]$ and $\mathbb{E}[\phi_0^2]$ here. From (20) it follows that
$$\mathbb{E}[\phi_0] = \prod_{w\in W}(1-p_w^2) = 1-\sum_{w\in W}p_w^2+o\Big(\sum_{w\in W}p_w^2\Big). \tag{21}$$
Moreover, from (19) it follows that
$$\mathbb{E}[\phi_0^2] = \mathbb{E}\Big[\prod_{w\in W}q_w^{2I(\Gamma_w\le0)}\Big] = \prod_{w\in W}\big(1-(1-q_w^2)\,\mathbb{P}[\Gamma_w = 0]\big) = \prod_{w\in W}\big(1-(1-q_w^2)p_w\big) = \prod_{w\in W}\big(1-2p_w^2+p_w^3\big) = 1-2\sum_{w\in W}p_w^2+o\Big(\sum_{w\in W}p_w^2\Big). \tag{22}$$

5 Giant component

With the process $\{Y_t\}_{t\ge0}$ defined in the previous section, we analyze both the subcritical and the supercritical regime of our random intersection graph by adapting the percolation-based techniques used to analyze Erdős-Rényi random graphs [2]. The technical difficulty in analyzing the stopping time $T(v_0)$ rests in the fact that the distribution of $Y_t$ depends on the history of the process, dictated by the structure of the general RIG. In the next two subsections, we give conditions for the non-existence of a giant component, and for the existence and uniqueness of the giant component, in general RIGs.

5.1 Subcritical regime

Theorem 5. Let $\sum_{w\in W}p_w^3 = O(1/n^2)$ and $p_w = O(1/n)$ for all $w$.
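The closed forms (21) and (22) for the first two moments of $\phi_0 = \prod_{w\in W(v_0)}q_w$ can be verified exactly on a small attribute set. This sketch (the name `phi0_moments` is ours) enumerates every possible $W(v_0)$ and compares the resulting moments with $\prod_w(1-p_w^2)$ and $\prod_w(1-2p_w^2+p_w^3)$.

```python
from fractions import Fraction
from itertools import product
from math import prod


def phi0_moments(p):
    """Exact E[phi_0] and E[phi_0^2], with phi_0 = prod_{w in W(v_0)} q_w, by enumerating W(v_0)."""
    p = [Fraction(x) for x in p]
    m1 = m2 = Fraction(0)
    for bits in product((0, 1), repeat=len(p)):
        weight = prod(pw if b else 1 - pw for pw, b in zip(p, bits))   # P[W(v_0) = S]
        phi0 = prod(1 - pw for pw, b in zip(p, bits) if b)              # product of q_w over S
        m1 += weight * phi0
        m2 += weight * phi0 ** 2
    return m1, m2


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 10)]
m1, m2 = phi0_moments(p)
print(m1, prod(1 - pw**2 for pw in p))
print(m2, prod(1 - 2 * pw**2 + pw**3 for pw in p))
```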
For any positive constant $c<1$, if $\sum_{w\in W}p_w^2\le c/n$, then all components in a general random intersection graph $G(n,m,\mathbf{p})$ are of order $O(\log n)$, with high probability².

Proof. We generalize the techniques used in the proof for the subcritical case in $G_{n,p}$ presented in [2]. Let $T(v_0)$ be the stopping time defined in (9) for the process starting at node $v_0$, and note that $T(v_0) = |C(v_0)|$. We will bound the size of the largest component, and prove that under the conditions of the theorem, all components are of order $O(\log n)$, whp. For all $t\ge0$,
$$\mathbb{P}[T(v_0)>t] = \mathbb{E}\big[\mathbb{P}[T(v_0)>t\mid\mathcal{H}_t]\big]\le\mathbb{E}\big[\mathbb{P}[Y_t>0\mid\mathcal{H}_t]\big] = \mathbb{E}\Big[\mathbb{P}\Big[\mathrm{Bin}\Big(n-1,\ 1-\prod_{\tau=0}^{t-1}(1-r_\tau)\Big)\ge t\;\Big|\;\mathcal{H}_t\Big]\Big]. \tag{23}$$
Bounding from above, which can easily be proven by induction on $t$ for $r_\tau\in[0,1]$, we have
$$1-\prod_{\tau=0}^{t-1}(1-r_\tau)\le\sum_{\tau=0}^{t-1}r_\tau = \sum_{\tau=0}^{t-1}(\phi_{\tau-1}-\phi_\tau) = 1-\phi_{t-1}. \tag{24}$$
By using the stochastic ordering of the binomial distribution, both in $n$ and in $\sum_{\tau=0}^{t-1}r_\tau$, and for any positive constant $\nu$, to be specified later, it follows that
$$\begin{aligned}\mathbb{P}[T(v_0)>t\mid\mathcal{H}_t] &\le \mathbb{P}\Big[\mathrm{Bin}\Big(n,\sum_{\tau=0}^{t-1}r_\tau\Big)\ge t\;\Big|\;\mathcal{H}_t\Big] = \mathbb{P}[\mathrm{Bin}(n,1-\phi_{t-1})\ge t\mid\mathcal{H}_t]\\ &= \mathbb{P}[\mathrm{Bin}(n,1-\phi_{t-1})\ge t\mid 1-\phi_{t-1}<(1-\nu)t/n\cap\mathcal{H}_t]\;\mathbb{P}[1-\phi_{t-1}<(1-\nu)t/n\mid\mathcal{H}_t]\\ &\quad+\mathbb{P}[\mathrm{Bin}(n,1-\phi_{t-1})\ge t\mid 1-\phi_{t-1}\ge(1-\nu)t/n\cap\mathcal{H}_t]\;\mathbb{P}[1-\phi_{t-1}\ge(1-\nu)t/n\mid\mathcal{H}_t]\\ &\le \mathbb{P}[\mathrm{Bin}(n,1-\phi_{t-1})\ge t\mid 1-\phi_{t-1}<(1-\nu)t/n\cap\mathcal{H}_t]+\mathbb{P}[1-\phi_{t-1}\ge(1-\nu)t/n\mid\mathcal{H}_t].\end{aligned}\tag{25}$$
Furthermore, using the fact that the event $\{1-\phi_{t-1}<(1-\nu)t/n\}$ is $\mathcal{H}_t$-measurable, together with the stochastic ordering of the binomial distribution, we obtain
$$\mathbb{P}[\mathrm{Bin}(n,1-\phi_{t-1})\ge t\mid 1-\phi_{t-1}<(1-\nu)t/n\cap\mathcal{H}_t]\le\mathbb{P}[\mathrm{Bin}(n,(1-\nu)t/n)\ge t\mid\mathcal{H}_t].$$
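Inequality (24) combines the union-type bound $1-\prod_\tau(1-r_\tau)\le\sum_\tau r_\tau$ with the telescoping sum of the rates. A small numerical sketch, assuming a hypothetical randomly generated non-increasing sequence $\phi_{-1} = 1,\phi_0,\phi_1,\dots$ (the helper names are ours), checks both parts in floating point.

```python
import random
from math import prod


def product_bound(rs):
    """Return (1 - prod_tau (1 - r_tau), sum_tau r_tau); (24) states lhs <= rhs for r_tau in [0, 1]."""
    return 1 - prod(1 - r for r in rs), sum(rs)


def rates_from_phis(phis):
    """r_tau = phi_{tau-1} - phi_tau for a non-increasing list phis = [phi_{-1} = 1, phi_0, ...]."""
    return [a - b for a, b in zip(phis, phis[1:])]


rng = random.Random(0)
phis = [1.0]
for _ in range(25):
    phis.append(phis[-1] * rng.uniform(0.5, 1.0))   # hypothetical non-increasing phi_t
lhs, rhs = product_bound(rates_from_phis(phis))
print(lhs, rhs, 1 - phis[-1])   # lhs <= rhs, and rhs telescopes to 1 - phi_{t-1}
```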
² We use the notation "with high probability", denoted whp, to mean with probability $1-o(1)$ as the number of nodes $n\to\infty$.

Taking the expectation with respect to the history $\mathcal{H}_t$ in (25) yields
$$\mathbb{P}[T(v_0)>t]\le\mathbb{P}[\mathrm{Bin}(n,(1-\nu)t/n)\ge t]+\mathbb{P}[1-\phi_{t-1}\ge(1-\nu)t/n].$$
For $t = K_0\log n$, where $K_0$ is a constant large enough and independent of the initial node $v_0$, the Chernoff bound ensures that $\mathbb{P}[\mathrm{Bin}(n,(1-\nu)t/n)\ge t] = o(1/n)$. To bound $\mathbb{P}[1-\phi_{t-1}\ge(1-\nu)t/n]$, use (19) and the inequality $\phi_{t-1}\ge\prod_{w\in W}q_w^{I(\Gamma_w\le t)}$ to obtain
$$\{1-\phi_{t-1}\ge(1-\nu)t/n\}\subseteq\Big\{\prod_{w\in W}q_w^{I(\Gamma_w\le t)}\le1-\frac{(1-\nu)t}{n}\Big\} = \Big\{\sum_{w\in W}\log\Big(\frac{1}{1-p_w}\Big)I(\Gamma_w\le t)\ge-\log\Big(1-\frac{(1-\nu)t}{n}\Big)\Big\}.$$
Linearize $-\log(1-(1-\nu)t/n) = (1-\nu)t/n+o(t/n)$ and define the bounded auxiliary random variables $X_{t,w} = n\log(1/(1-p_w))\,I(\Gamma_w\le t)$. Direct calculations reveal that
$$\mathbb{E}[X_{t,w}] = n\log\Big(\frac{1}{1-p_w}\Big)(1-q_w^t) = n\big(p_w+o(p_w)\big)\big(1-(1-p_w)^t\big) = n\big(p_w+o(p_w)\big)\big(tp_w+o(tp_w)\big) = ntp_w^2+o(ntp_w^2), \tag{26}$$
which implies
$$\sum_{w\in W}\mathbb{E}[X_{t,w}] = nt\sum_{w\in W}p_w^2+o\Big(nt\sum_{w\in W}p_w^2\Big). \tag{27}$$
Thus, under the stated condition $n\sum_{w\in W}p_w^2\le c<1$, it follows that $0<(1-c)t\le t-\sum_{w\in W}\mathbb{E}[X_{t,w}]$ for $n$ large enough. In light of Bernstein's inequality [5], we bound
$$\mathbb{P}[1-\phi_{t-1}\ge(1-\nu)t/n]\le\mathbb{P}\Big[\sum_{w\in W}X_{t,w}\ge(1-\nu)t\Big]\le\mathbb{P}\Big[\sum_{w\in W}\big(X_{t,w}-\mathbb{E}[X_{t,w}]\big)\ge(1-\nu-c)t\Big]\le\exp\Big(-\frac{3}{2}\,\frac{((1-\nu-c)t)^2}{3\sum_{w\in W}\mathrm{Var}[X_{t,w}]+nt\max_w\{p_w\}(1+o(1))}\Big). \tag{28}$$
Since
$$\mathbb{E}[X_{t,w}^2] = \Big(n\log\Big(\frac{1}{1-p_w}\Big)\Big)^2(1-q_w^t) = n^2\big(p_w+o(p_w)\big)^2\big(1-(1-p_w)^t\big) = n^2\big(p_w^2+o(p_w^2)\big)\big(tp_w+o(tp_w)\big) = n^2tp_w^3+o(n^2tp_w^3), \tag{29}$$
it follows that, for some large constant $K_1>0$,
$$\sum_{w\in W}\mathrm{Var}[X_{t,w}]\le\sum_{w\in W}\mathbb{E}[X_{t,w}^2] = n^2t\sum_{w\in W}p_w^3+o\Big(n^2t\sum_{w\in W}p_w^3\Big)\le K_1t.$$
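The expansion (26) replaces the exact mean $n\log(1/(1-p_w))(1-q_w^t)$ by its leading term $ntp_w^2$, which is accurate when $tp_w\to0$. The sketch below (function names and the parameter values $n = 10^5$, $t = 20$ are illustrative assumptions) evaluates the ratio of the two for decreasing $p_w$, using `math.log1p` to avoid cancellation for small $p_w$.

```python
from math import log1p


def mean_x_exact(n, t, p):
    """E[X_{t,w}] as written in (26): n * log(1/(1-p)) * (1 - (1-p)**t)."""
    return n * -log1p(-p) * (1 - (1 - p) ** t)


def mean_x_leading(n, t, p):
    """Leading term n * t * p**2 of (26)."""
    return n * t * p**2


n, t = 10**5, 20
ratios = {p: mean_x_exact(n, t, p) / mean_x_leading(n, t, p) for p in (1e-3, 1e-4, 1e-5)}
print(ratios)   # the ratio approaches 1 as t * p -> 0
```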
Finally, the assumption of the theorem implies that there exists a constant $K_2>0$ such that $n\max_{w\in W}p_w\le K_2$. Substituting these bounds into (28) yields
$$\mathbb{P}[1-\phi_{t-1}\ge(1-\nu)t/n]\le\exp\Big(-\frac{3(1-\nu-c)^2}{2(3K_1+K_2)}\,t\Big),$$
and taking $\nu\in(0,1-c)$ and $t = K_3\log n$ for some constant $K_3$ large enough and not depending on the initial node $v_0$, we conclude that $\mathbb{P}[1-\phi_{t-1}\ge(1-\nu)t/n] = o(n^{-1})$. In turn, taking the constant $K_4 = \max\{K_0,K_3\}$ ensures that $\mathbb{P}[T(v_0)>K_4\log n] = o(1/n)$ for any initial node $v_0$. Finally, a union bound over the $n$ possible starting values $v_0$ implies that
$$\mathbb{P}\Big[\max_{v_0\in V}T(v_0)>K_4\log n\Big]\le n\,o(n^{-1}) = o(1),$$
which implies that all connected components in the random intersection graph are of size $O(\log n)$, whp.

Remarks. We now consider the conditions of the theorem. From the Cauchy-Schwarz inequality, we obtain $\big(\sum_{w\in W}p_w^3\big)\big(\sum_{w\in W}p_w\big)\ge\big(\sum_{w\in W}p_w^2\big)^2$. Moreover, given that $\sum_{w\in W}p_w^3 = O(1/n^2)$ and $p_w = O(1/n)$, so that $\sum_{w\in W}p_w = O(m/n)$, it follows that $\sum_{w\in W}p_w^2 = O(\sqrt{m/n^3})$. Hence, for $\sum_{w\in W}p_w^2 = c/n$ with $c<1$, it follows that $m = \Omega(n)$, which is consistent with the results in [4] on the non-existence of a giant component in a uniform RIG.

5.2 Supercritical regime

We now turn to the study of the supercritical regime, in which $\lim_{n\to\infty}n\sum_{w\in W}p_w^2 = c>1$.

Theorem 6. Let $\sum_{w\in W}p_w^3 = o\big(\frac{\log n}{n^2}\big)$ and $p_w = o\big(\frac{\log n}{n}\big)$ for all $w$. For any constant $c>1$, if $\sum_{w\in W}p_w^2\ge c/n$, then whp there exists a unique largest component in $G(n,m,\mathbf{p})$, of order $\Theta(n)$. Moreover, the size of the giant component is given by $n\zeta_c(1+o(1))$, where $\zeta_c$ is the solution in $(0,1)$ of the equation $1-e^{-c\zeta} = \zeta$, while all other components are of size $O(\log n)$.

Remarks. The conditions on $p_w$ and $\sum_wp_w^3$ are weaker than those in the subcritical regime.
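A single simulation cannot prove Theorem 5, but it gives a feel for the subcritical regime. The sketch below makes a hypothetical uniform choice $m = n$ and $p_w = \sqrt{c}/n$, so that $n\sum_wp_w^2 = c$, $p_w = O(1/n)$, and $\sum_wp_w^3 = O(1/n^2)$ as the theorem requires. It computes the largest component with a union-find structure: for $s = 1$, merging all holders of each attribute yields exactly the components of the RIG. The function name and parameter values are our own assumptions.

```python
import random


def largest_component(n, p, rng):
    """Largest component size of one sample of G(n, m, p) with s = 1, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for pw in p:
        holders = [v for v in range(n) if rng.random() < pw]   # nodes v with A_{v,w} = 1
        for v in holders[1:]:
            ra, rb = find(holders[0]), find(v)
            if ra != rb:
                parent[rb] = ra
    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


n, c = 1000, 0.5
p = [c**0.5 / n] * n   # hypothetical choice: m = n, p_w = sqrt(c)/n, so n * sum(p_w^2) = c
print(largest_component(n, p, random.Random(42)))
```

For this choice one expects a largest component of logarithmic order in $n$, far below $n$.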
The proof proceeds as follows. The first step is to bound, both from above and below, the value $1-\prod_{\tau=0}^{t-1}(1-r_\tau)$ that governs the behavior of the branching process $\{Y_t\}_{t\ge0}$; see Lemma 3. With the lower bound, we show the emergence, with high probability, of at least one giant component of size $\Theta(n)$. We use the upper bound to prove uniqueness of the giant component. Technically, we use these bounds to compare our branching process to branching processes arising in the study of Erdős-Rényi random graphs.

Proof. We start by bounding $1-\prod_{\tau=0}^{t-1}(1-r_\tau)$. The upper bound $\sum_{\tau=0}^{t-1}r_\tau$ has been established in (24). For the lower bound, we apply Jensen's inequality to the concave function $\log(1-x)$ to get
$$\log\prod_{\tau=0}^{t-1}(1-r_\tau) = \sum_{\tau=0}^{t-1}\log(1-r_\tau) = \sum_{\tau=0}^{t-1}\log\big(1-(\phi_{\tau-1}-\phi_\tau)\big)\le t\log\Big(1-\frac1t\sum_{\tau=0}^{t-1}(\phi_{\tau-1}-\phi_\tau)\Big) = t\log\Big(1-\frac{1-\phi_{t-1}}{t}\Big). \tag{30}$$
In light of (19), $\phi_t$ is non-increasing in $t$, and hence
$$1-\Big(1-\frac{1-\phi_0}{t}\Big)^t\le1-\Big(1-\frac{1-\phi_{t-1}}{t}\Big)^t\le1-\prod_{\tau=0}^{t-1}(1-r_\tau)\le\sum_{\tau=0}^{t-1}r_\tau = 1-\phi_{t-1}. \tag{31}$$
To further bound $1-\big(1-\frac{1-\phi_0}{t}\big)^t$, consider the function $f_t(x) = 1-(1-x/t)^t$ for $x$ in a neighborhood of the origin and $t\ge1$. For any fixed $x$, $f_t(x)$ decreases to $1-e^{-x}$ as $t$ tends to infinity. The latter function is concave, and hence for all $0\le x\le\varepsilon$, $\frac{1-e^{-\varepsilon}}{\varepsilon}\,x\le1-e^{-x}\le f_t(x)$. Note that $(1-e^{-\varepsilon})/\varepsilon$ can be made arbitrarily close to one by taking $\varepsilon$ small enough. Furthermore, $f_t(x)$ is increasing in $x$ for fixed $t$. From (19), $1-\phi_0\le1-\phi_{t-1}$, hence $1-\big(1-\frac{1-\phi_0}{t}\big)^t\le1-\big(1-\frac{1-\phi_{t-1}}{t}\big)^t$. Looking closer at $1-\phi_0$, from (21) and (22), by using Chebyshev's inequality with $\sum_{w\in W}p_w^2 = c/n$, it follows that $1-\phi_0$ is concentrated around its mean $1-\mathbb{E}[\phi_0] = c/n+o(1/n)$.
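The two properties of $f_t(x) = 1-(1-x/t)^t$ used in the proof, monotone decrease in $t$ towards $1-e^{-x}$ and the chord bound $\frac{1-e^{-\varepsilon}}{\varepsilon}x\le f_t(x)$ for $0\le x\le\varepsilon$, are easy to inspect numerically. The values $x = 0.05$ and $\varepsilon = 0.1$ below are illustrative assumptions.

```python
from math import exp


def f(t, x):
    """f_t(x) = 1 - (1 - x/t)**t from the proof of Theorem 6."""
    return 1 - (1 - x / t) ** t


x, eps = 0.05, 0.1
values = [f(t, x) for t in range(1, 200)]
limit = 1 - exp(-x)                   # pointwise limit as t -> infinity
chord = (1 - exp(-eps)) / eps * x     # concavity (chord) lower bound for 0 <= x <= eps
print(values[0], values[-1], limit, chord)
```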
That is, for any constant $\delta>0$, $1-\phi_0\in\big((1-\delta)c/n,(1+\delta)c/n\big)$ with probability $1-o(1/n)$. We conclude that for any sufficiently small $\delta>0$ there is $\varepsilon>0$ such that $(c-\delta)\frac{1-e^{-\varepsilon}}{\varepsilon}>1$, since the constant $c>1$. Indeed, since $\lim_{\varepsilon\to0}\frac{1-e^{-\varepsilon}}{\varepsilon} = 1$, by choosing $\varepsilon$ sufficiently small, $\frac{1-e^{-\varepsilon}}{\varepsilon}$ can be made arbitrarily close to $1$. It follows that $1-\prod_{\tau=0}^{t-1}(1-r_\tau)>c'/n$ for some constant $c'$ with $1<c'<c$, arbitrarily close to $c$. Hence, the branching process on the RIG is stochastically bounded from below by the process with $\mathrm{Bin}(n-1,c'/n)$ offspring, which in turn dominates the branching process on $G_{n,c'/n}$. Because $c'>1$, there exists whp a giant component of size $\Theta(n)$ in $G_{n,c'/n}$. This implies that the stopping time of the branching process associated with $G_{n,c'/n}$ is $\Theta(n)$ with high probability, and so is the stopping time $T(v)$ for some $v\in V$, which implies that there is a giant component in a general RIG, whp.

Let us look closer at the size of that giant component. From the representation (19) for $\phi_{t-1}$, consider the previously introduced random variables $X_{t,w} = n\log(1/(1-p_w))\,I(\Gamma_w\le t)$. Similarly to the proof of Theorem 5, it follows that under the conditions of the theorem there is a positive constant $\delta>0$ such that $\sum_wX_{t,w}$ is concentrated within $(1\pm\delta)\sum_w\mathbb{E}[X_{t,w}] = (1\pm\delta)ct$, with probability $1-o(1)$. Hence, there exists $p_+ = c_+/n$, for some constant $c_+>c>1$, such that $1-\phi_{t-1}\le1-(1-p_+)^t$, which is equivalent to $-\log\phi_{t-1}\le-t\log(1-p_+) = tp_++o(tp_+) = tc_+/n+o(t/n)$. Similarly, the concentration of $\phi_{t-1}$ implies that there exists $p_- = c_-/n$, with $c>c_->1$, such that $1-(1-p_-)^t\le1-\big(1-(1-\phi_{t-1})/t\big)^t$, which implies that $-\log\phi_{t-1}\ge-t\log(1-p_-) = tp_-+o(tp_-) = tc_-/n+o(t/n)$.
Combining the upper and lower bounds, we conclude that with probability $1-o(1)$, the rate of the branching process on the RIG is bracketed by
$$1-(1-p_-)^t\le1-\prod_{\tau=0}^{t-1}(1-r_\tau)\le1-(1-p_+)^t. \tag{32}$$
The stochastic dominance of the binomial distribution together with (32) implies
$$\mathbb{P}\big[\mathrm{Bin}\big(n-1,1-(1-p_-)^t\big)\ge t\big]\le\mathbb{P}\Big[\mathrm{Bin}\Big(n-1,1-\prod_{\tau=0}^{t-1}(1-r_\tau)\Big)\ge t\Big]\le\mathbb{P}\big[\mathrm{Bin}\big(n-1,1-(1-p_+)^t\big)\ge t\big]. \tag{33}$$
In light of (32), the branching process $\{Y_t\}_{t\ge0}$ associated with a RIG is stochastically bounded from below and from above by the branching processes associated with $G_{n,p_-}$ and $G_{n,p_+}$, respectively (for the analysis on an Erdős-Rényi graph, see [2]). Since both $c_-,c_+>1$, there exist giant components in both $G_{n,p_-}$ and $G_{n,p_+}$, whp. In [22], it has been shown that the giant component in $G_{n,\lambda/n}$, for $\lambda>1$, is unique and of size $\approx n\zeta_\lambda$, where $\zeta_\lambda$ is the unique solution in $(0,1)$ of the equation
$$1-e^{-\lambda\zeta} = \zeta. \tag{34}$$
Moreover, the size of the giant component in $G_{n,\lambda/n}$ satisfies the central limit theorem
$$\frac{\max_v|C(v)|-\zeta_\lambda n}{\sqrt n}\xrightarrow{d}\mathcal{N}\Big(0,\ \frac{\zeta_\lambda(1-\zeta_\lambda)}{(1-\lambda+\lambda\zeta_\lambda)^2}\Big). \tag{35}$$
From the definition of the stopping time, see (9), and from (33) and (35), it follows that there is a giant component in a RIG of size at least $n\zeta_\lambda(1-o(1))$, whp. Furthermore, the stopping times of the branching processes associated with $G_{n,p_-}$ and $G_{n,p_+}$ are approximately $\zeta n$, where $\zeta$ satisfies (34) with $\lambda_- = np_-$ and $\lambda_+ = np_+$, respectively. These two stopping times are close to one another, which follows from analyzing the function $F(\zeta,c) = 1-\zeta-e^{-c\zeta}$, where $(\zeta,c)$ is the solution of $F(\zeta,c) = 0$ for a given $c$. Since all partial derivatives of $F(\zeta,c)$ are continuous and bounded, the stopping times of the branching processes defined from $G_{n,p_-}$ and $G_{n,p_+}$ are close to the solution of (34) for $\lambda = c$.
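Equation (34) has no closed-form solution, but $g(\zeta) = 1-e^{-\lambda\zeta}-\zeta$ is concave, positive just to the right of $0$ when $\lambda>1$, and negative at $\zeta = 1$, so bisection on $(0,1]$ finds the root reliably. The sketch below (function names are ours; bisection is our choice of method) computes $\zeta_\lambda$ and the limiting variance appearing in (35).

```python
from math import exp


def zeta(lam, tol=1e-13):
    """Solution in (0, 1) of 1 - exp(-lam * z) = z for lam > 1, by bisection."""
    if lam <= 1:
        raise ValueError("a solution in (0, 1) exists only for lam > 1")

    def g(z):
        return 1 - exp(-lam * z) - z        # g > 0 on (0, zeta), g < 0 on (zeta, 1]

    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clt_variance(lam):
    """Limiting variance in (35): zeta (1 - zeta) / (1 - lam + lam * zeta)^2."""
    z = zeta(lam)
    return z * (1 - z) / (1 - lam + lam * z) ** 2


print(zeta(2.0), clt_variance(2.0))
```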
From (33), the stopping time of a RIG is bounded by the stopping times on $G_{n,p_-}$ and $G_{n,p_+}$.

We conclude by proving that whp the giant component of a RIG is unique, adapting the arguments in [2] to our setting. Let us assume that there are at least two giant components in a RIG, with node sets $V_1, V_2 \subset V$. Let us create a new, independent 'sprinkling' $\widehat{\mathrm{RIG}}$ on top of our RIG, with the same sets of nodes and attributes, while $\hat p_w = p_w^\gamma$, for $\gamma > 1$ to be defined later. Our object of interest is now $\mathrm{RIG}_{\mathrm{new}} = \mathrm{RIG} \cup \widehat{\mathrm{RIG}}$. Let us consider all $\Theta(n^2)$ pairs $\{v_1, v_2\}$, where $v_1 \in V_1$, $v_2 \in V_2$, which are independent in $\widehat{\mathrm{RIG}}$ (but not in RIG); hence the probability that two nodes $v_1, v_2 \in V$ are connected in $\widehat{\mathrm{RIG}}$ is given by
$$1 - \prod_w (1 - \hat p_w^2) = 1 - \prod_w (1 - p_w^{2\gamma}) = \sum_w p_w^{2\gamma} + o\Bigl(\sum_w p_w^{2\gamma}\Bigr), \tag{36}$$
which holds since $\gamma > 1$ and $p_w = O(1/n)$ for any $w$. Given that $\sum_w p_w^2 = c/n$, we choose $\gamma > 1$ so that $\sum_w p_w^{2\gamma} = \omega(1/n^2)$. Now, by the Markov inequality, whp there is a pair $\{v_1, v_2\}$ such that $v_1$ is connected to $v_2$ in $\widehat{\mathrm{RIG}}$, implying that whp $V_1$ and $V_2$ are connected, forming one connected component within $\mathrm{RIG}_{\mathrm{new}}$. From the previous analysis, it follows that this component is of size at least $2n\zeta_\lambda(1-\delta)$ for any small constant $\delta > 0$. On the other hand, the probabilities $p^{\mathrm{new}}_w$ in $\mathrm{RIG}_{\mathrm{new}}$ satisfy
$$p^{\mathrm{new}}_w = 1 - (1-p_w)(1-\hat p_w) = p_w + \hat p_w(1-p_w) = p_w + p_w^\gamma(1-p_w) = p_w(1+o(1)),$$
which again holds since $\gamma > 1$ and $p_w = O(1/n)$ for any $w$. Thus,
$$\sum_{w \in W} (p^{\mathrm{new}}_w)^2 = \sum_{w \in W} p_w^2 + \Theta\Bigl(\sum_{w \in W} p_w^{1+\gamma}(1-p_w)\Bigr) = \sum_{w \in W} p_w^2\,(1+o(1)) = c/n + o(1/n). \tag{37}$$
Given that the stopping time on RIG is bounded by the stopping times on $G_{n,p_-}$ and $G_{n,p_+}$, and by continuity, it follows that the giant component in $\mathrm{RIG}_{\mathrm{new}}$ cannot be of size $2n\zeta_\lambda(1-\delta)$, which is a contradiction.
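The asymptotic bookkeeping in (36) and (37) can be checked numerically on a concrete profile. The sketch below assumes a hypothetical uniform profile with $m = n$ attributes and $p_w = \sqrt{c/(nm)}$; for this profile $\sum_w p_w^{2\gamma} = c^\gamma n^{1-2\gamma}$, so the requirement $\sum_w p_w^{2\gamma} = \omega(1/n^2)$ holds exactly when $1 < \gamma < 3/2$, and the value $\gamma = 1.25$ is an illustrative choice. The function name `sprinkling_summary` is hypothetical, and $n^2$ stands in for the $\Theta(n^2)$ cross pairs up to a constant.

```python
import math


def sprinkling_summary(n, m, c, gamma):
    """Quantities from eqs. (36)-(37) for a hypothetical uniform profile
    p_w = sqrt(c / (n*m)) on m attributes, so that n * sum_w p_w^2 = c."""
    p = math.sqrt(c / (n * m))
    p_hat = p ** gamma  # sprinkling probability \hat p_w = p_w^gamma
    # Eq. (36): exact probability 1 - prod_w (1 - \hat p_w^2) that two fixed
    # nodes share a sprinkled attribute, and its first-order term.
    exact = -math.expm1(m * math.log1p(-p_hat ** 2))
    first_order = m * p ** (2 * gamma)
    # Eq. (37): p_w^new = p_w + \hat p_w (1 - p_w).
    p_new = p + p_hat * (1.0 - p)
    return {
        "pair_prob_exact": exact,
        "pair_prob_first_order": first_order,
        # n^2 stands in for the Theta(n^2) pairs (v1, v2) in V1 x V2.
        "n_squared_times_pair_prob": n * n * exact,
        "c_new": n * m * p_new ** 2,  # n * sum_w (p_w^new)^2
    }


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5, 10**6):
        s = sprinkling_summary(n, n, 2.0, 1.25)
        print(n, s["pair_prob_exact"] / s["pair_prob_first_order"],
              s["n_squared_times_pair_prob"], s["c_new"])
```

As $n$ grows, the ratio in (36) tends to 1, the expected number of sprinkled cross edges grows like $n^{2-2\gamma+1} = n^{1/2}$, and $n\sum_w (p^{\mathrm{new}}_w)^2$ approaches $c$, though only slowly, since $p^{\mathrm{new}}_w/p_w - 1 = p_w^{\gamma-1}(1-p_w)$.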
Thus, there is only one giant component in a RIG, of size $n\zeta_c(1+o(1))$, where $\zeta_c$ satisfies (34) for $\lambda = c$. Moreover, from (33) and the known behavior of $G_{n,p}$, it follows that all other components are of size $O(\log n)$.

6 Conclusion

The analysis of random models for bipartite graphs is important for the study of social networks, or of any network formed by associating nodes with shared attributes. In the random intersection graph (RIG) model, nodes have certain attributes with fixed probabilities. In this paper, we have considered the general RIG model, where these probabilities are represented by a set $\mathbf{p} = \{p_w\}_{w \in W}$, where $p_w$ denotes the probability that a node is attached to the attribute $w$. We have analyzed the evolution of components in general RIGs, giving conditions for the existence and uniqueness of the giant component. We have done so by generalizing the branching process argument used to study the birth of the giant component in Erdős-Rényi graphs. We have considered a dependent, inhomogeneous Galton-Watson process, where the number of offspring follows a binomial distribution with a different number of nodes and a different rate at each step of the evolution. The analysis of such a process is complicated by its dependence on its history, dictated by the structure of general RIGs. We have shown that, in spite of this difficulty, it is possible to give stochastic bounds on the branching process, and that under certain conditions the giant component appears at the threshold $n\sum_{w \in W} p_w^2 = 1$, with probability tending to one as the number of nodes tends to infinity.

Acknowledgments

Part of this work was funded by the Department of Energy at Los Alamos National Laboratory under contract DE-AC52-06NA25396 through the Laboratory-Directed Research and Development Program, and by the National Science Foundation grant CCF-0829945.
Nicolas W. Hengartner was supported by DOE-LDRD 20080391ER.

References

[1] Albert, R., and Barabási, A. L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 1 (2002), 47–97.
[2] Alon, N., and Spencer, J. H. The Probabilistic Method, 2nd ed. John Wiley & Sons, Inc., New York, 2000.
[3] Barabási, A. L., and Albert, R. Emergence of Scaling in Random Networks. Science 286, 5439 (1999), 509–512.
[4] Behrisch, M. Component evolution in random intersection graphs. In Electr. J. Comb. (2007), vol. 14.
[5] Bernstein, S. N. On a modification of Chebyshev's inequality and of the error formula of Laplace. Ann. Sci. Inst. Sav. Ukraine, Sect. Math. 4, 25 (1924).
[6] Bloznelis, M., Jaworski, J., and Rybarczyk, K. Component evolution in a secure wireless sensor network. Netw. 53, 1 (2009), 19–26.
[7] Chung, F., and Lu, L. The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences of the United States of America 99, 25 (2002), 15879–15882.
[8] Deijfen, M., and Kets, W. Random intersection graphs with tunable degree distribution and clustering. Probab. Eng. Inf. Sci. 23, 4 (2009), 661–674.
[9] Erdős, P., Goodman, A. W., and Pósa, L. The representation of a graph by set intersections. Canad. J. Math. 18 (1966), 106–112.
[10] Erhard Godehardt, Jerzy Jaworski, K. R. Random intersection graphs and classification. In Advances in Data Analysis (2007), vol. 45, pp. 67–74.
[11] Eubank, S., Guclu, H., Anil Kumar, V. S., Marathe, M. V., Srinivasan, A., Toroczkai, Z., and Wang, N. Modelling disease outbreaks in realistic urban social networks. Nature 429, 6988 (May 2004), 180–184.
[12] Fill, J. A., Scheinerman, E. R., and Singer-Cohen, K. B. Random intersection graphs when $m = \omega(n)$: An equivalence theorem relating the evolution of the $G(n, m, p)$ and $G(n, p)$ models. Random Struct. Algorithms 16, 2 (2000), 156–176.
[13] Godehardt, E., and Jaworski, J. Two models of random intersection graphs and their applications. Electronic Notes in Discrete Mathematics 10 (2001), 129–132.
[14] Guillaume, J.-L., and Latapy, M. Bipartite graphs as models of complex networks. Physica A: Statistical and Theoretical Physics 371, 2 (2006), 795–813.
[15] Karoński, M., Scheinerman, E., and Singer-Cohen, K. On random intersection graphs: the subgraph problem. Combinatorics, Probability and Computing 8 (1999).
[16] Newman, M. E. J. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 64, 1 (Jun 2001), 016131.
[17] Newman, M. E. J., and Park, J. Why social networks are different from other types of networks. Phys. Rev. E 68, 3 (Sep 2003), 036122.
[18] Newman, M. E. J., Strogatz, S. H., and Watts, D. J. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 2 (Jul 2001), 026118.
[19] Nikoletseas, S., Raptopoulos, C., and Spirakis, P. Large independent sets in general random intersection graphs. Theor. Comput. Sci. 406 (October 2008), 215–224.
[20] Nikoletseas, S. E., Raptopoulos, C., and Spirakis, P. G. The existence and efficient construction of large independent sets in general random intersection graphs. In ICALP (2004), J. Díaz, J. Karhumäki, A. Lepistö, and D. Sannella, Eds., vol. 3142 of Lecture Notes in Computer Science, Springer, pp. 1029–1040.
[21] Nikoletseas, S. E., Raptopoulos, C., and Spirakis, P. G. Expander properties and the cover time of random intersection graphs. Theor. Comput. Sci. 410, 50 (2009), 5261–5272.
[22] Remco van der Hofstad. Random graphs and complex networks. Lecture notes in preparation, http://www.win.tue.nl/~rhofstad/NotesRGCN.html.
[23] Rybarczyk, K. Equivalence of the random intersection graph and $G(n, p)$, 2009. Submitted, http://arxiv.org/abs/0910.5311.
[24] Singer-Cohen, K. Random intersection graphs. PhD thesis, Johns Hopkins University, 1995.
[25] Watts, D. J., and Strogatz, S. H. Collective dynamics of Small-World networks. Nature 393, 6684 (1998), 440–442.
