Finding Dense Subgraphs in G(n,1/2)

Atish Das Sarma* (Georgia Institute of Technology), Amit Deshpande (Microsoft Research), Ravi Kannan (Microsoft Research)

1 Introduction

Finding the largest clique is a notoriously hard problem, even on random graphs. It is known that the clique number of a random graph $G(n, 1/2)$ is almost surely either $k$ or $k+1$, where $k = \lceil 2\log n - 2\log\log n - 1 \rceil$ (Section 4.5 in [1], also [2]). However, a simple greedy algorithm finds a clique of size only $(1 + o(1))\log n$, with high probability, and finding larger cliques, even of size $(1+\epsilon)\log n$, in randomized polynomial time has been a long-standing open problem [3].

In this paper, we study the following generalization: given a random graph $G(n, 1/2)$, find the largest subgraph with edge density at least $1 - \delta$. We show that a simple modification of the greedy algorithm finds a subset of $2\log n$ vertices whose induced subgraph has edge density at least $0.951$, with high probability. To complement this, we show that almost surely there is no subset of $2.784\log n$ vertices whose induced subgraph has edge density $0.951$ or more.

We use $G(n, p)$ to denote a random graph on $n$ vertices where each pair of vertices appears as an edge independently with probability $p$. We use $V$ to denote its set of vertices and $E$ to denote its set of edges. Moreover, given two subsets $S \subseteq V$ and $T \subseteq V$, we use $E(S, T)$ to denote the set of edges with one endpoint in $S$ and the other endpoint in $T$. The density of the subgraph induced by the vertices in $S$ is given by
\[ \mathrm{density}(S) = \frac{|E(S, S)|}{\binom{|S|}{2}}. \]
Therefore, the expected density of $G(n, 1/2)$ is $1/2$ and the density of any clique is $1$.

In Section 2 we describe our algorithm for finding subgraphs of density $1 - \delta$. We give a bound on the largest subgraph of density $1 - \delta$ in the following Section 3.
Finally, in Section 4, we present some open problems.

* Work done while at Microsoft Research.

2 Algorithm for finding a large subgraph of density $1 - \delta$

In this section, we describe our algorithm and give a relationship between the size of the subgraph obtained by the algorithm and its density. In particular, we show that the algorithm can be used to obtain a subset of $2\log n$ vertices of density $0.951$, with high probability.

Greedy Algorithm to pick a dense subgraph:
Input: a random graph $G(n, 1/2)$ and $\delta > 0$.
Output: a subset $S \subseteq V$ of size $k = 2\log n$.
1. Partition the vertices into disjoint sets $V = V_1 \cup V_2 \cup \cdots \cup V_k$, each of size $n/k$.
2. Initialize $S_0 = \emptyset$.
3. For $i = 0$ to $k - 1$ do:
   (a) Pick $v_{i+1} \in V_{i+1}$ that has the maximum number of edges to $S_i$, i.e., $v_{i+1} = \operatorname{argmax}_{v \in V_{i+1}} |E(v, S_i)|$.
   (b) $S_{i+1} \leftarrow S_i \cup \{v_{i+1}\}$.
4. Return $S = S_k$.

Notice that the algorithm first partitions all nodes into $k$ random subsets of the same size, and then picks one vertex from each part. This partitioning is necessary to argue about independence in our analysis of choosing vertices greedily. In the analysis below, $H(\delta)$ is the standard notation for the Shannon entropy function, which is $-(\delta\log\delta + (1-\delta)\log(1-\delta))$. The following lemma gives a lower bound on the number of edges we can expect to add to our subgraph for the $i$-th vertex added by the algorithm.

Lemma 2.1. For any $0 \le i \le k$ and $\delta_i$ that satisfies
\[ H(\delta_i) \ge 1 - \frac{1}{i}\log\left(\frac{n}{2k\ln(\log n)}\right), \]
we have
\[ \Pr\left(|E(v_{i+1}, S_i)| \ge (1-\delta_i)\,i\right) \ge 1 - \frac{1}{\log^2 n}. \]

Proof. We know by the previous results that, as long as $k < \log n$, the vertex added has all edges to $S_{k-1}$. Consider $k \ge \log n$. The algorithm has $n/l$ vertices to choose from. The expected number of vertices among these, with at least $(1-\delta_k)k$ vertices is given by,

Fix $v \in V_{i+1}$.
The probability that $v$ has at least $(1-\delta_i)\,i$ edges to $S_i$ is
\[ \Pr\left(|E(v, S_i)| \ge (1-\delta_i)\,i\right) = \sum_{t=(1-\delta_i)i}^{i} \binom{i}{t} 2^{-i} = 2^{(H(\delta_i) + o(1) - 1)\,i}, \]
where $H(\delta) = -\delta\log\delta - (1-\delta)\log(1-\delta)$ is the Shannon entropy (here $\log$ is taken with base 2).

Using independence of these events for different $v \in V_{i+1}$, we get
\[ \Pr\left(|E(v, S_i)| < (1-\delta_i)\,i,\ \forall v \in V_{i+1}\right) \le \left(1 - 2^{(H(\delta_i) - 1)\,i}\right)^{n/k} \le \left(1 - \frac{2k\ln(\log n)}{n}\right)^{n/k} \le \frac{1}{\log^2 n}. \]
Therefore,
\[ \Pr\left(|E(v_{i+1}, S_i)| \ge (1-\delta_i)\,i\right) \ge 1 - \frac{1}{\log^2 n}. \]

We now give a union bound over all $k$ additions of vertices, using the previous lemma.

Lemma 2.2.
\[ \Pr\left(|E(S, S)| \ge \sum_{i=0}^{k-1} (1-\delta_i)\,i\right) \to 1 \quad \text{as } n \to \infty. \]

Proof. Since $V_1, V_2, \ldots, V_k$ are disjoint, using independence and Lemma 2.1 we get
\[ \Pr\left(|E(S, S)| \ge \sum_{i=0}^{k-1} (1-\delta_i)\,i\right) \ge \prod_{i=0}^{k-1} \Pr\left(|E(v_{i+1}, S_i)| \ge (1-\delta_i)\,i\right) \ge \left(1 - \frac{1}{\log^2 n}\right)^{k-1} \ge 1 - \frac{k-1}{\log^2 n} \ge 1 - \frac{2}{\log n}, \]
using $k = 2\log n$. The right-hand side tends to $1$ as $n \to \infty$.

The point is that we are picking exactly one vertex from each vertex set of the partition, and hence do not lose any randomness or independence of the edges. This now gives us a bound on the minimum number of edges one can expect, w.h.p., in the chosen set of $k$ vertices. We are not able to express, in closed form, the size of a subgraph obtainable using this algorithm for a specific density. Therefore, we state the best density one can guarantee w.h.p. for $k = 2\log n$. This is stated as a theorem below, which we prove subsequently.

Theorem 2.3. Our algorithm produces a subset $S \subseteq V$ of size $k = 2\log n$ such that $\mathrm{density}(S) \gtrsim 0.951$, almost surely.

Proof.
From Lemma 2.2 we have that, almost surely,
\begin{align*}
|E(S, S)| &\ge \sum_{i=0}^{k-1} (1-\delta_i)\,i \\
&\ge \sum_{i=0}^{k-1} \left(1 - H^{-1}\left(1 - \frac{1}{i}\log\left(\frac{n}{2k\ln(\log n)}\right)\right)\right) i \\
&= \sum_{i=0}^{k-1} i - \sum_{i=\log m}^{k-1} i\,H^{-1}\left(1 - \frac{\log m}{i}\right) \\
&= \binom{k}{2} - \sum_{i=\log m}^{k-1} i\,H^{-1}\left(1 - \frac{\log m}{i}\right), \tag{1}
\end{align*}
where $m = n / (2k\ln(\log n))$. Here we use the fact that we can choose $\delta_i = 0$ for the first $\log m$ steps. Now let $k - 1 = (1+\alpha)\log m$. Then
\begin{align*}
\sum_{i=\log m}^{k-1} i\,H^{-1}\left(1 - \frac{\log m}{i}\right) &= \sum_{i=\log m}^{(1+\alpha)\log m} i\,H^{-1}\left(1 - \frac{\log m}{i}\right) \\
&= \sum_{t=0}^{\alpha\log m} (\log m + t)\,H^{-1}\left(1 - \frac{\log m}{\log m + t}\right) \\
&= \log^2 m \sum_{x=0}^{\alpha} (1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right) \\
&\le \log^2 m \int_0^{\alpha} (1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right) dx. \tag{2}
\end{align*}
Here the sum over $x = t/\log m$ is understood as a Riemann sum with step $1/\log m$.

Now using Equations (1) and (2) we have
\begin{align*}
\mathrm{density}(S) = \frac{|E(S, S)|}{\binom{k}{2}} &\ge 1 - \frac{\log^2 m}{\binom{k}{2}} \int_0^{\alpha} (1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right) dx \\
&\ge 1 - \frac{1}{2}(1 + o(1)) \int_0^{\alpha} (1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right) dx \\
&\gtrsim 0.951,
\end{align*}
using
\[ \alpha = \frac{k}{\log m} - 1 = \frac{2\log n}{\log n - \log(4\log n \cdot \ln(\log n))} - 1 = 1 + o(1), \]
and computing an upper bound on the integral numerically.

3 Upper bound on largest subgraph of density $1 - \delta$

In this section, we upper bound the size of the largest subgraph of density $1 - \delta$ in $G(n, 1/2)$.

Theorem 3.1. A random graph $G(n, 1/2)$ has no subgraph of size
\[ \frac{2\log n + 2\log e}{1 - H(\delta) - o(1)} + 1 \]
and density at least $1 - \delta$, almost surely. In particular, there is no subgraph of size $2.784\log n$ and density at least $0.951$, almost surely.

Proof. For every $S \subseteq V$ of size $k$, define an indicator random variable $X_S$ as follows.
\[ X_S = \begin{cases} 1 & \text{if } S \text{ induces a subgraph of density} \ge 1 - \delta \\ 0 & \text{otherwise.} \end{cases} \]
Thus
\[ \mathrm{E}[X_S] = \sum_{i=(1-\delta)\binom{k}{2}}^{\binom{k}{2}} \binom{\binom{k}{2}}{i} 2^{-\binom{k}{2}} = 2^{(H(\delta) + o(1) - 1)\binom{k}{2}}. \]
By linearity of expectation, the expected number of subgraphs of size $k$ and density at least $1 - \delta$ is
\begin{align*}
\mathrm{E}\left[\sum_{S : |S| = k} X_S\right] = \sum_{S : |S| = k} \mathrm{E}[X_S] &= \binom{n}{k}\, 2^{(H(\delta) + o(1) - 1)\binom{k}{2}} \\
&\le \left(\frac{en}{k}\right)^k \left(2^{(H(\delta) + o(1) - 1)\frac{k-1}{2}}\right)^k \\
&= \left(\frac{en}{k} \cdot 2^{(H(\delta) + o(1) - 1)\frac{k-1}{2}}\right)^k \\
&\le \left(\frac{2^{(1 - H(\delta) - o(1))\frac{k}{2}}}{k} \cdot 2^{(H(\delta) + o(1) - 1)\frac{k-1}{2}}\right)^k \\
&= \left(\frac{2^{(1 - H(\delta) - o(1))/2}}{k}\right)^k \to 0, \quad \text{as } n \to \infty,
\end{align*}
using
\[ k = \frac{2\log n + 2\log e}{1 - H(\delta) - o(1)} + 1. \]
Therefore, by Markov's inequality we have
\[ \Pr\left(\sum_{S : |S| = k} X_S \ge 1\right) \le \mathrm{E}\left[\sum_{S : |S| = k} X_S\right] \to 0, \quad \text{as } n \to \infty. \]
In other words, almost surely there is no subset of $k$ vertices that induces a subgraph of density at least $1 - \delta$.

Notice that for density $0.951$, the ratio between the size of the largest subgraph that exists and the size of the largest subgraph that we can find is smaller than in the case of cliques. This is interesting, although not entirely unexpected, since for density $0.5$ the whole graph can be output. This ratio for density $0.951$ is however significantly smaller than $2$; it is $2.784/2 = 1.392$.

4 Conclusions

For a concrete open problem, is there a polynomial time algorithm that outputs a subgraph of density $1 - \epsilon$ and size $2\log n$ for any choice of $\epsilon > 0$? Are there simple algorithms that beat the density bound of $0.95$ for subgraphs of size $2\log n$? Is there an $O(n^{\log n})$ time algorithm that finds the largest clique in $G(n, 1/2)$? If not, what is the maximum density obtainable for a subgraph of size $2\log n$? Spectral techniques could be tried.

References

[1] Noga Alon and Joel Spencer, The probabilistic method, second ed., Wiley-Interscience, 2000.
[2] Béla Bollobás, Random graphs, Academic Press, New York, 1985.
[3] Richard Karp, The probabilistic analysis of some combinatorial search algorithms, (1976), 1–19.
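
As an illustration of the greedy procedure of Section 2, the following is a minimal Python sketch, not the authors' implementation. The function names (`sample_gnp_half`, `greedy_dense_subset`, `density`), the adjacency-matrix representation, and the handling of leftover vertices when $k$ does not divide $n$ are assumptions made for this example. Logarithms are taken base 2, as in the text.

```python
import math
import random


def sample_gnp_half(n, rng):
    """Sample G(n, 1/2) as a symmetric boolean adjacency matrix."""
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 0.5:
                adj[u][v] = adj[v][u] = True
    return adj


def greedy_dense_subset(adj, k, rng):
    """Pick one vertex from each of k disjoint random blocks, greedily."""
    n = len(adj)
    vertices = list(range(n))
    rng.shuffle(vertices)      # random partition into blocks V_1, ..., V_k
    block = n // k             # block size n/k (assumption: leftovers unused)
    chosen = []                # the current set S_i
    for i in range(k):
        part = vertices[i * block:(i + 1) * block]
        # v_{i+1}: the vertex of V_{i+1} with the most edges into S_i
        best = max(part, key=lambda v: sum(adj[v][u] for u in chosen))
        chosen.append(best)
    return chosen              # the final set S_k


def density(adj, subset):
    """Fraction of the C(|S|, 2) vertex pairs of the subset that are edges."""
    k = len(subset)
    edges = sum(adj[subset[a]][subset[b]]
                for a in range(k) for b in range(a + 1, k))
    return edges / math.comb(k, 2)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1024
    k = 2 * round(math.log2(n))    # k = 2 log n
    adj = sample_gnp_half(n, rng)
    S = greedy_dense_subset(adj, k, rng)
    print(len(S), density(adj, S))
```

The bound of Theorem 2.3 is asymptotic, so for small $n$ such as the one used here the observed density should be expected to fall well short of $0.951$ while still clearly exceeding the baseline $1/2$.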

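The two constants quoted in Theorems 2.3 and 3.1 can be checked numerically. The sketch below is one possible way to do so and is not taken from the paper: it inverts the binary entropy on $[0, 1/2]$ by bisection, evaluates $1 - \frac{1}{2}\int_0^{\alpha}(1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right)dx$ at $\alpha = 1$ with a midpoint rule (dropping the $o(1)$ terms), and computes the leading constant $2/(1 - H(\delta))$ of the size bound. All function names, the step count, and the choice of quadrature rule are assumptions of this example.

```python
import math


def binary_entropy(d):
    """Shannon entropy H(d) in bits, with H(0) = H(1) = 0."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log2(d) - (1 - d) * math.log2(1 - d)


def entropy_inverse(y, iters=60):
    """Inverse of H restricted to [0, 1/2], by bisection (H increases there)."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binary_entropy(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def greedy_density_bound(alpha=1.0, steps=2000):
    """1 - (1/2) * integral over [0, alpha] of (1+x) * H^{-1}(1 - 1/(1+x))."""
    h = alpha / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h          # midpoint of the j-th subinterval
        total += (1 + x) * entropy_inverse(1 - 1 / (1 + x))
    return 1 - 0.5 * total * h


def upper_bound_constant(delta):
    """Leading constant 2 / (1 - H(delta)) in the size bound of Theorem 3.1."""
    return 2 / (1 - binary_entropy(delta))


if __name__ == "__main__":
    print(greedy_density_bound())
    print(upper_bound_constant(1 - 0.951))
```

Under these assumptions the first value should land close to the $0.951$ of Theorem 2.3 and the second close to the $2.784$ of Theorem 3.1; small differences in the last digits depend on how $\delta$ is rounded.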