Finding Dense Subgraphs in G(n,1/2)
Finding the largest clique is a notoriously hard problem, even on random graphs. It is known that the clique number of a random graph G(n,1/2) is almost surely either k or k+1, where k = 2log n - 2log(log n) - 1. However, a simple greedy algorithm fi…
Authors: Atish Das Sarma, Amit Deshpande, Ravi Kannan
Atish Das Sarma∗ (Georgia Institute of Technology), Amit Deshpande (Microsoft Research), Ravi Kannan (Microsoft Research)

1 Introduction

Finding the largest clique is a notoriously hard problem, even on random graphs. It is known that the clique number of a random graph $G(n, 1/2)$ is almost surely either $k$ or $k+1$, where $k = \lceil 2\log n - 2\log\log n - 1 \rceil$ (Section 4.5 in [1], also [2]). However, a simple greedy algorithm finds a clique of size only $(1+o(1))\log n$, with high probability, and finding larger cliques, even of size $(1+\epsilon)\log n$, in randomized polynomial time has been a long-standing open problem [3]. In this paper, we study the following generalization: given a random graph $G(n, 1/2)$, find the largest subgraph with edge density at least $1-\delta$. We show that a simple modification of the greedy algorithm finds a subset of $2\log n$ vertices whose induced subgraph has edge density at least 0.951, with high probability. To complement this, we show that almost surely there is no subset of $2.784\log n$ vertices whose induced subgraph has edge density 0.951 or more.

We use $G(n, p)$ to denote a random graph on $n$ vertices where each pair of vertices appears as an edge independently with probability $p$. We use $V$ to denote its set of vertices and $E$ to denote its set of edges. Moreover, given two subsets $S \subseteq V$ and $T \subseteq V$, we use $E(S, T)$ to denote the set of edges with one endpoint in $S$ and another endpoint in $T$. The density of the subgraph induced by the vertices in $S$ is given by

$$\mathrm{density}(S) = \frac{|E(S, S)|}{\binom{|S|}{2}}.$$

Therefore, the expected density of $G(n, 1/2)$ is $1/2$ and the density of any clique is 1.

In Section 2 we describe our algorithm for finding subgraphs of density $1-\delta$. We give a bound on the largest subgraph of density $1-\delta$ in the following Section 3.
Finally, in Section 4, we present some open problems.

2 Algorithm for finding large subgraph of density $1-\delta$

In this section, we describe our algorithm and give a relationship between the size of the subgraph obtained by the algorithm and its density. In particular, we show that the algorithm can be used to obtain a subset of $2\log n$ vertices of density 0.951, with high probability.

∗ Work done while at Microsoft Research.

Greedy Algorithm to pick a dense subgraph:
Input: a random graph $G(n, 1/2)$ and $\delta > 0$.
Output: a subset $S \subseteq V$ of size $k = 2\log n$.
1. Partition the vertices into disjoint sets $V = V_1 \cup V_2 \cup \cdots \cup V_k$, each of size $n/k$.
2. Initialize $S_0 = \emptyset$.
3. For $i = 0$ to $k-1$ do:
   (a) Pick $v_{i+1} \in V_{i+1}$ that has the maximum number of edges to $S_i$, i.e., $v_{i+1} = \arg\max_{v \in V_{i+1}} |E(v, S_i)|$.
   (b) $S_{i+1} \leftarrow S_i \cup \{v_{i+1}\}$.
4. Return $S = S_k$.

Notice that the algorithm first partitions all nodes into $k$ random subsets of the same size, and then picks one vertex from each part. This partitioning is necessary to argue about independence in our analysis of choosing vertices greedily.

In the analysis below, $H(\delta)$ is the standard notation for the Shannon entropy function, which is $-(\delta\log\delta + (1-\delta)\log(1-\delta))$. The following lemma gives a lower bound on the number of edges we can expect to add to our subgraph for the $i$-th vertex added by the algorithm.

Lemma 2.1. For any $0 \le i \le k$ and $\delta_i$ that satisfies

$$H(\delta_i) \ge 1 - \frac{1}{i}\log\left(\frac{n}{2k\ln(\log n)}\right),$$

we have

$$\Pr\left(|E(v_{i+1}, S_i)| \ge (1-\delta_i)\,i\right) \ge 1 - \frac{1}{\log^2 n}.$$

Proof. We know by the previous results that, as long as $k < \log n$, the vertex added has all edges to $S_{k-1}$. Consider $k \ge \log n$. The algorithm has $n/l$ vertices to choose from. The expected number of vertices among these with at least $(1-\delta_k)k$ vertices is given by,

Fix $v \in V_{i+1}$.
The probability that $v$ has at least $(1-\delta_i)\,i$ edges to $S_i$ is

$$\Pr\left(|E(v, S_i)| \ge (1-\delta_i)\,i\right) = \sum_{t=(1-\delta_i)i}^{i} \binom{i}{t}\, 2^{-i} = 2^{(H(\delta_i)+o(1)-1)\,i},$$

where $H(\delta) = -\delta\log\delta - (1-\delta)\log(1-\delta)$ is the Shannon entropy (here log is taken with base 2). Using independence of these events for different $v \in V_{i+1}$, we get

$$\Pr\left(|E(v, S_i)| < (1-\delta_i)\,i,\ \forall v \in V_{i+1}\right) \le \left(1 - 2^{(H(\delta_i)-1)\,i}\right)^{n/k} \le \left(1 - \frac{2k\ln(\log n)}{n}\right)^{n/k} \le \frac{1}{\log^2 n}.$$

Therefore,

$$\Pr\left(|E(v_{i+1}, S_i)| \ge (1-\delta_i)\,i\right) \ge 1 - \frac{1}{\log^2 n}.$$

We now give a union bound over all $k$ additions of vertices, using the previous lemma.

Lemma 2.2.

$$\Pr\left(|E(S, S)| \ge \sum_{i=0}^{k-1} (1-\delta_i)\,i\right) \to 1 \quad \text{as } n \to \infty.$$

Proof. Since $V_1, V_2, \ldots, V_k$ are disjoint, using independence and Lemma 2.1 we get

$$\Pr\left(|E(S, S)| \ge \sum_{i=0}^{k-1} (1-\delta_i)\,i\right) \ge \prod_{i=0}^{k-1} \Pr\left(|E(v_{i+1}, S_i)| \ge (1-\delta_i)\,i\right) \ge \left(1 - \frac{1}{\log^2 n}\right)^{k-1} \ge 1 - \frac{k-1}{\log^2 n} \ge 1 - \frac{2}{\log n},$$

using $k = 2\log n$, and the right-hand side tends to 1 as $n \to \infty$. The point is that we are picking exactly one vertex from each part of the partition, and hence do not lose any randomness or independence of the edges.

This now gives us a bound on the minimum number of edges one can expect, w.h.p., in the chosen set of $k$ vertices. We are not able to express, in closed form, the size of a subgraph obtainable using this algorithm for a specific density. Therefore, we state the best density one can guarantee w.h.p. for $k = 2\log n$. This is stated as a theorem below, which we prove subsequently.

Theorem 2.3. Our algorithm produces a subset $S \subseteq V$ of size $k = 2\log n$ such that $\mathrm{density}(S) \gtrsim 0.951$, almost surely.

Proof. From Lemma 2.2 we have that, almost surely,

$$|E(S, S)| \ge \sum_{i=0}^{k-1} (1-\delta_i)\,i \ge \sum_{i=0}^{k-1} \left(1 - H^{-1}\left(1 - \frac{1}{i}\log\left(\frac{n}{2k\ln(\log n)}\right)\right)\right) i = \sum_{i=0}^{k-1} i - \sum_{i=\log m}^{k-1} i\,H^{-1}\left(1 - \frac{\log m}{i}\right) = \binom{k}{2} - \sum_{i=\log m}^{k-1} i\,H^{-1}\left(1 - \frac{\log m}{i}\right), \qquad (1)$$

where $m = n/(2k\ln(\log n))$.
Here we use the fact that we can choose $\delta_i = 0$ for the first $\log m$ steps. Now let $k - 1 = (1+\alpha)\log m$. Then, writing $x = t/\log m$ and $\Delta x = 1/\log m$,

$$\sum_{i=\log m}^{k-1} i\,H^{-1}\left(1 - \frac{\log m}{i}\right) = \sum_{i=\log m}^{(1+\alpha)\log m} i\,H^{-1}\left(1 - \frac{\log m}{i}\right) = \sum_{t=0}^{\alpha\log m} (\log m + t)\,H^{-1}\left(1 - \frac{\log m}{\log m + t}\right) = \log^2 m \sum_{x=0}^{\alpha} (1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right)\Delta x \le \log^2 m \int_0^{\alpha} (1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right) dx. \qquad (2)$$

Now using Equations (1) and (2) we have

$$\mathrm{density}(S) = \frac{|E(S, S)|}{\binom{k}{2}} \ge 1 - \frac{\log^2 m}{\binom{k}{2}} \int_0^{\alpha} (1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right) dx \ge 1 - \frac{1}{2}(1+o(1)) \int_0^{\alpha} (1+x)\,H^{-1}\left(1 - \frac{1}{1+x}\right) dx \gtrsim 0.951,$$

using

$$\alpha = \frac{k}{\log m} - 1 = \frac{2\log n}{\log n - \log(4\log n \cdot \ln(\log n))} - 1 = 1 + o(1),$$

and computing an upper bound on the integral numerically.

3 Upper bound on largest subgraph of density $1-\delta$

In this section, we upper bound the size of the largest subgraph of density $1-\delta$ in $G(n, 1/2)$.

Theorem 3.1. A random graph $G(n, 1/2)$ has no subgraph of size

$$\frac{2\log n + 2\log e}{1 - H(\delta) - o(1)} + 1$$

and density at least $1-\delta$, almost surely. In particular, there is no subgraph of size $2.784\log n$ and density at least 0.951, almost surely.

Proof. For every $S \subseteq V$ of size $k$, define an indicator random variable $X_S$ as follows:

$$X_S = \begin{cases} 1 & \text{if } S \text{ induces a subgraph of density} \ge 1-\delta, \\ 0 & \text{otherwise.} \end{cases}$$

Thus

$$E[X_S] = \sum_{i=(1-\delta)\binom{k}{2}}^{\binom{k}{2}} \binom{\binom{k}{2}}{i}\, 2^{-\binom{k}{2}} = 2^{(H(\delta)+o(1)-1)\binom{k}{2}}.$$

By linearity of expectation, the expected number of subgraphs of size $k$ and density at least $1-\delta$ is

$$E\left[\sum_{S : |S| = k} X_S\right] = \sum_{S : |S| = k} E[X_S] = \binom{n}{k}\, 2^{(H(\delta)+o(1)-1)\binom{k}{2}} \le \left(\frac{en}{k}\right)^{k} 2^{(H(\delta)+o(1)-1)\frac{k-1}{2}k} = \left(\frac{en}{k} \cdot 2^{(H(\delta)+o(1)-1)\frac{k-1}{2}}\right)^{k}$$

$$= \left(\frac{2^{(1-H(\delta)-o(1))\frac{k}{2}}}{k} \cdot 2^{(H(\delta)+o(1)-1)\frac{k-1}{2}}\right)^{k} = \left(\frac{2^{(1-H(\delta)+o(1))/2}}{k}\right)^{k} \to 0, \quad \text{as } n \to \infty,$$

using

$$k = \frac{2\log n + 2\log e}{1 - H(\delta) - o(1)} + 1.$$

Therefore, by Markov's inequality we have

$$\Pr\left(\sum_{S : |S| = k} X_S \ge 1\right) \le E\left[\sum_{S : |S| = k} X_S\right] \to 0, \quad \text{as } n \to \infty.$$
Or in other words, almost surely there is no subset of $k$ vertices that induces a subgraph of density at least $1-\delta$.

Notice that for density 0.951, the ratio between the largest subgraph that exists and the largest subgraph that we can find is smaller than in the case of cliques. This is interesting, although not entirely unexpected, since for density 0.5 the whole graph can be output. The ratio for density 0.951 is, however, significantly smaller than 2; it is $2.784/2 = 1.392$.

4 Conclusions

For a concrete open problem, is there a polynomial time algorithm that outputs a subgraph of density $1-\epsilon$ and size $2\log n$ for any choice of $\epsilon > 0$? Are there simple algorithms that beat the density bound of 0.95 for subgraphs of size $2\log n$? Is there an $O(n \log n)$ time algorithm that finds the largest clique in $G(n, 1/2)$? If not, what is the maximum density obtainable for a subgraph of size $2\log n$? Spectral techniques could be tried.

References

[1] Noga Alon and Joel Spencer, The probabilistic method, second ed., Wiley-Interscience, 2000.
[2] Béla Bollobás, Random graphs, Academic Press, New York, 1985.
[3] Richard Karp, The probabilistic analysis of some combinatorial search algorithms, (1976), 1–19.
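
As an illustration of Sections 2 and 3, the greedy procedure and the two numerical constants can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the partition is obtained by shuffling and ignoring leftover vertices, $H^{-1}$ is computed by bisection on $[0, 1/2]$, the integral is approximated by a midpoint rule, and every $o(1)$ term is dropped.

```python
import math
import random
from itertools import combinations


def sample_gnp_half(n, rng):
    """Adjacency sets of one sample from G(n, 1/2)."""
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < 0.5:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def density(adj, S):
    """|E(S, S)| divided by C(|S|, 2)."""
    edges = sum(1 for u, v in combinations(list(S), 2) if v in adj[u])
    return edges / math.comb(len(S), 2)


def greedy_dense_subset(adj, rng):
    """Pick one vertex per block, each maximizing edges into the current set."""
    n = len(adj)
    k = round(2 * math.log2(n))
    order = list(range(n))
    rng.shuffle(order)            # random partition into k blocks
    size = n // k                 # assumption: leftover vertices are ignored
    S = []
    for i in range(k):
        block = order[i * size:(i + 1) * size]
        chosen = set(S)
        S.append(max(block, key=lambda v: len(adj[v] & chosen)))
    return S


def entropy(d):
    """Binary Shannon entropy H(d), log base 2."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log2(d) - (1 - d) * math.log2(1 - d)


def entropy_inverse(y, tol=1e-12):
    """The d in [0, 1/2] with H(d) = y, by bisection (H is increasing there)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if entropy(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def upper_bound_coefficient(delta):
    """Leading coefficient 2 / (1 - H(delta)) of log n in Theorem 3.1, o(1) dropped."""
    return 2 / (1 - entropy(delta))


def density_guarantee(alpha=1.0, steps=2000):
    """1 - (1/2) * integral_0^alpha (1+x) H^{-1}(1 - 1/(1+x)) dx, midpoint rule."""
    h = alpha / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += (1 + x) * entropy_inverse(1 - 1 / (1 + x))
    return 1 - 0.5 * total * h
```

For example, with `rng = random.Random(0)` and `adj = sample_gnp_half(512, rng)`, the call `greedy_dense_subset(adj, rng)` returns 18 vertices, one from each block, whose density can be read off with `density(adj, S)`. With the $o(1)$ terms ignored, `upper_bound_coefficient(0.049)` comes out near 2.79, close to the $2.784\log n$ quoted in Theorem 3.1, and `density_guarantee()` comes out close to 0.951. Small instances are far from the asymptotic regime, so the empirical density on such a graph should not be read as a check of Theorem 2.3.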