Spectral Clustering and Block Models: A Review And A New Algorithm

Sharmodeep Bhattacharyya and Peter J. Bickel

Sharmodeep Bhattacharyya: Oregon State University, Department of Statistics, 44 Kidder Hall, Corvallis, OR, e-mail: bhattash@science.oregonstate.edu
Peter J. Bickel: University of California at Berkeley, Department of Statistics, 367 Evans Hall, Berkeley, CA, e-mail: bickel@stat.berkeley.edu

Abstract. We focus on spectral clustering of unlabeled graphs and review some results on clustering methods which achieve weak or strong consistent identification in data generated by such models. We also present a new algorithm which appears to perform optimally both theoretically, using asymptotic theory, and empirically.

1 Introduction

Since its introduction in [15], spectral analysis of various matrices associated to graphs has become one of the most widely used clustering techniques in statistics and machine learning. In the context of unlabeled graphs, a number of methods, all of which come under the broad heading of spectral clustering, have been proposed. These methods, based on spectral analysis of adjacency matrices or some derived matrix such as one of the Laplacians ([31], [28], [23], [29], [32]), have been studied in connection with their effectiveness in identifying members of blocks in exchangeable graph block models. In this paper, after introducing the methods and models, we intend to review some of the literature. We relate it to the results of Mossel, Neeman and Sly (2012) [26] and Massoulié (2014) [24], where it is shown that for very sparse models, there exists a phase transition below which members cannot be identified better than chance, and that above the phase transition one can do better using rather subtle methods.
In [6] we develop a spectral clustering method based on the matrix of geodesic distances between nodes which can achieve the goals of the work we cited and in fact behaves well for all unlabeled networks: sparse, semi-sparse and dense. We give a statement and sketch the proof of these claims in [] but give a full argument for the sparse case considered by the above authors only in this paper. We give the necessary preliminaries in Section 2, more history in Section 3, and show the theoretical properties of the method in Section 4.

2 Preliminaries

There are many standard methods of clustering based on numerical similarity matrices, which are discussed in a number of monographs (e.g., Hartigan [19], Leroy and Rousseeuw [30]). We shall not discuss these further. Our focus is on unlabeled graphs of n vertices characterized by adjacency matrices A = ||a_ij|| for n data points, with a_ij = 1 if there is an edge between i and j and a_ij = 0 otherwise. The natural assumption then is A = A^T. Our basic goal is to divide the points into K sets such that, on some average criterion, the points in a given subset are more similar to each other than to those of other subsets. Our focus is on methods of clustering based on the spectrum (eigenvalues and eigenvectors) of A or related matrices.

2.1 Notation and Formal Definition of the Stochastic Block Model

Definition 1. A graph G_K(B, (P, π)) generated from the stochastic block model (SBM) with K blocks and parameters P ∈ (0, 1)^{K×K} and π ∈ (0, 1)^K can be defined in the following way: each vertex of the graph G_n is assigned to a community c ∈ {1, ..., K}. The (c_1, ..., c_n) are independent outcomes of multinomial draws with parameter π = (π_1, ..., π_K), where π_i > 0 for all i. Conditional on the label vector c ≡ (c_1, ...
, c_n), the edge variables A_ij for i < j are independent Bernoulli variables with

E[A_ij | c] = P_{c_i c_j} = min{ρ_n B_{c_i c_j}, 1},   (1)

where P = [P_ab] and B = [B_ab] are K × K symmetric matrices. We call P the connection probability matrix and B the kernel matrix for the connection. So we have P_ab ≤ 1 for all a, b = 1, ..., K, and P1 ≤ 1 and 1^T P ≤ 1 element-wise. By definition A_ji = A_ij and A_ii = 0 (no self-loops).

This formulation is a reparametrization, due to Bickel and Chen (2009) [8], of the definition of Holland and Leinhardt [20]. It permits separate consideration asymptotically of the density of the graph and its structure as follows:

P(Vertex 1 belongs to block a, vertex 2 belongs to block b, and they are connected) = π_a π_b P_ab,

with P_ab depending on n through P_ab = ρ_n min(B_ab, 1/ρ_n). We can interpret ρ_n as the unconditional probability of an edge and B_ab essentially as P(Vertex 1 belongs to a and vertex 2 belongs to b | an edge between 1 and 2). Set Π = diag(π_1, ..., π_K).

1. Define the matrices M = ΠB and S = Π^{1/2} B Π^{1/2}.
2. Note that the eigenvalues of M are the same as those of the symmetric matrix S and in particular are real-valued.
3. The eigenvalues of the expected adjacency matrix Ā ≡ E(A) are also the same as those of S, but with multiplicities.

We denote the eigenvalues by their absolute order, λ_1 ≥ |λ_2| ≥ ... ≥ |λ_K|. Let us denote by (ϕ_1, ..., ϕ_K), ϕ_i ∈ R^K, the eigenvectors of S corresponding to the eigenvalues λ_1, ..., λ_K. If a set of the λ_j's are equal to λ, we choose eigenvectors from the eigenspace corresponding to λ as appropriate. Then we have φ_i = Π^{-1/2} ϕ_i and ψ_i = Π^{1/2} ϕ_i as the left and right eigenvectors of M. Also, ⟨φ_i, φ_j⟩_π = ∑_{k=1}^K π_k φ_ik φ_jk = δ_ij.
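As a concrete illustration of Definition 1, here is a minimal sampler (our sketch, not code from the paper; the function name and the parameter values in the usage are hypothetical) that draws labels from π and then edges as independent Bernoullis with probabilities min{ρ_n B_{c_i c_j}, 1}:

```python
import numpy as np

def sample_sbm(n, B, pi, rho_n, rng=None):
    """Draw (A, c) from the SBM of Definition 1: labels c are i.i.d. from pi and,
    given c, the edges A_ij (i < j) are independent Bern(min{rho_n*B_{c_i c_j}, 1})."""
    rng = np.random.default_rng(rng)
    c = rng.choice(len(pi), size=n, p=pi)          # community labels c_1, ..., c_n
    P = np.minimum(rho_n * B[np.ix_(c, c)], 1.0)   # connection probabilities P_{c_i c_j}
    upper = np.triu(rng.random((n, n)) < P, k=1)   # independent edges for i < j
    A = (upper | upper.T).astype(int)              # symmetrize: A_ji = A_ij
    np.fill_diagonal(A, 0)                         # no self-loops: A_ii = 0
    return A, c
```

With ρ_n = 1/n this produces a sparse graph whose average degree stays bounded, matching the sparse regime discussed later.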
The spectral decompositions of B, S and M are

B = ∑_{k=1}^K λ_k φ_k φ_k^T,   S = ∑_{k=1}^K λ_k ϕ_k ϕ_k^T,   M = ∑_{k=1}^K λ_k ψ_k φ_k^T.

2.2 Spectral Clustering

The basic goal of community detection is to infer the node labels c from the data. Although we do not explicitly consider parameter estimation, the parameters can be recovered from ĉ, an estimate of (c_1, ..., c_n), by

P̂_ab ≡ (1/O_ab) ∑_{i=1}^n ∑_{j=1}^n A_ij 1(ĉ_i = a, ĉ_j = b),   1 ≤ a, b ≤ K,   (2)

where

O_ab ≡ n_a n_b for 1 ≤ a, b ≤ K, a ≠ b;   O_aa ≡ n_a(n_a − 1) for 1 ≤ a ≤ K;   n_a ≡ ∑_{i=1}^n 1(ĉ_i = a), 1 ≤ a ≤ K.

There are a number of approaches to community detection based on modularities ([18], [8]), maximum likelihood and variational likelihood ([11], [7]), and approximations such as semidefinite programming approaches [3] and pseudolikelihood [2], but these all tend to be computationally intensive and/or require good initial assignments of blocks. The methods which have proved both computationally effective and asymptotically correct, in a sense we shall discuss, are related to spectral analysis of the adjacency or related matrices. They differ in important details. Given an n × n symmetric matrix M based on A, the algorithms are of the form:

1. Compute the spectral decomposition of M or solve a related generalized eigenproblem.
2. Obtain an n × K matrix of K n × 1 vectors.
3. Apply K-means clustering to the n K-dimensional row vectors of the matrix of Step 2.
4. Identify the indices of the rows belonging to cluster j, j = 1, ..., K, with the vertices belonging to block j.

In addition to A, three graph Laplacian matrices discussed by von Luxburg (2007) [33] have been considered extensively, as well as some others we shall mention briefly below and the matrix which we shall show has optimal asymptotic properties and discuss in greater detail.
The matrices popularly considered are:

• L = D − A: the graph Laplacian.
• L_rw = D^{-1} A: the random walk Laplacian.
• L_sym = D^{-1/2} A D^{-1/2}: the symmetric Laplacian.

Here D = diag(A1), the diagonal matrix whose diagonal is the vector of row sums of A. Von Luxburg considers optimization problems which are relaxed versions of combinatorial problems, ones which implicitly define clusters as sets of nodes with more internal than external edges. L and L_sym appear in two of these relaxations. The form of Step 2 differs for L and L_sym: the K vectors of the L problem correspond to the top K eigenvalues of the generalized eigenvalue problem Lv = λDv, while the n K-dimensional vectors of the L_sym problem are obtained by normalizing the rows of the matrix of K eigenvectors corresponding to the top K eigenvalues of L_sym. Their relation to the K-block model is through asymptotics.

Why is spectral clustering expected to work? Given A generated by a K-block model, let c ↔ (n_1, ..., n_K), where n_a is the number of vertices assigned to type a. Then we can write E(A | c) = PQP^T, where P is a permutation matrix and Q_{n×n} has successive blocks of n_1 rows, n_2 rows and so on, with all the vectors in each row the same. Thus rank(E(A | c)) = K. The same is true of the asymptotic limit of L given c. If asymptotics as n → ∞ justify concentration of A or L around their expectations, then we expect all eigenvalues other than the largest K in absolute value to be small. It follows that the n rows of the K eigenvectors associated with the top K eigenvalues should be resolvable into K clusters in R^K, with cluster members identified with rows of A_{n×n}; see [29], [32] for proofs.

2.3 Asymptotics

Now we can consider several asymptotic regimes as n → ∞. Let λ_n = nρ_n be the average degree of the graph.
(I) The dense regime: λ_n = Ω(n).
(II) The semidense regime: λ_n / log(n) → ∞.
(III) The semisparse regime: not semidense, but λ_n → ∞.
(IV) The sparse regime: λ_n = O(1).

Here are some results in the different regimes. We define a method of vertex assignment to communities as a random map δ : {1, ..., n} → {1, ..., K}, where the randomness comes through the dependence of δ on A as a function. Thus spectral clustering using the various matrices which depend on A is such a δ.

Definition 2. δ is said to be strongly consistent if P(i belongs to a and δ(i) = a for all i, a) → 1 as n → ∞. Note that the blocks are only determined up to permutation.

Bickel and Chen (2009) [8] show that in the (semi)dense regime a method called profile likelihood is strongly consistent under minimal identifiability conditions, and this result was later extended [7] to fitting by maximum likelihood or variational likelihood. In fact, in the (semi)dense regime, the block model likelihood asymptotically agrees with the joint likelihood of A and the vertex block identities, so that efficient estimation of all parameters is possible. It is easy to see that the result cannot hold in the (semi)sparse regime, since isolated points then exist with probability 1. Unfortunately, all of these methods are computationally intensive. Although spectral clustering is not strongly consistent, a slight variant, reassigning the vertices in any cluster a which are maximally connected to another cluster b rather than a, is strongly consistent.

Definition 3. δ is said to be weakly consistent if and only if

W ≡ n^{-1} ∑_{i=1}^n P(i ∈ a, δ(i) ≠ a) = o(1) for all i, a.

Spectral clustering applied to A [32] or to the Laplacians [29], in the manner we have described, has been shown to be weakly consistent in the semidense to dense regimes.
Even weak consistency fails for parts of the sparse regime [1]. The best that can be hoped for is W < 1/2. A sharp problem has been posed and eventually resolved in a series of papers: Decelle et al. [14], Mossel et al. [27]. These writers considered the case K = 2, π_1 = π_2, B_11 = B_22. First, Decelle et al. [14] argued on physical grounds that if

F = (B_11 − B_12)² / (2(B_11 + B_12)) ≤ 1,

then W ≥ 1/2 for any method, and the parameters are unestimable from the data even if they satisfy the minimal identifiability conditions given below. On the other hand, Mossel et al. [27] and independently Massoulié [24] devised admittedly slow methods such that if F > 1 then W < 1/2 and the parameters can be estimated consistently.

We now present a fast spectral clustering method, given in greater detail in [6], which yields weak consistency from the semisparse regime onwards and also has the properties of the Mossel et al. and Massoulié methods. In fact, it reaches the phase transition threshold for all K, not just K = 2, but still restricted to π_j = 1/K for all j and B_aa + 2∑{B_ab : b ≠ a} independent of a, for all a. We note that Zhao et al. (2015) [17] exhibit a two-stage algorithm with the same behavior, but its properties in the sparse case are unknown. The algorithm given in the next section involves spectral clustering of a new matrix: that of all geodesic distances between i and j.

3 Algorithm

As usual, let G_n, an undirected graph on n vertices, be the data. Denote the vertex set by V(G_n) ≡ {v_1, ..., v_n} and the edge set by E(G_n) ≡ {e_1, ..., e_m}, with cardinalities |V(G_n)| = n and |E(G_n)| = m. As usual, a path between vertices u and v is a set of edges {(u, v_1), (v_1, v_2), ..., (v_{ℓ-1}, v)}, and the length of such a path is ℓ.
The algorithm we propose depends on the graph distance or geodesic distance between vertices in a graph.

Definition 4. The graph or geodesic distance between two vertices i and j of a graph G is given by the length of the shortest path between the vertices i and j, if they are connected; otherwise, the distance is infinite. So, for any two vertices u, v ∈ V(G), the graph distance d_g is defined by

d_g(u, v) = min{ℓ : ∃ a path of length ℓ between u and v} if u and v are connected, and d_g(u, v) = ∞ otherwise.

For implementation, we can replace ∞ by n + 1 when u and v are not connected, since any path with loops cannot be a geodesic.

The main steps of the algorithm are as follows:

1. Find the graph distance matrix D = [d_g(v_i, v_j)]_{i,j=1}^n for the given network, but with the distance upper bounded by k log n. Assign non-connected vertices an arbitrary high value.
2. Perform hierarchical clustering to identify the giant component G_C of the graph G. Let n_C = |V(G_C)|.
3. Normalize the graph distance matrix D_C on G_C by

D̄_C = −(I − (1/n_C) 11^T)(D_C)²(I − (1/n_C) 11^T).

4. Perform an eigenvalue decomposition of D̄_C.
5. Consider the top K eigenvectors of the normalized distance matrix D̄_C and let W̃ be the n_C × K matrix formed by arranging the K eigenvectors as columns. Perform K-means clustering on the rows of W̃; that is, find an n_C × K matrix C which has K distinct rows and minimizes ||C − W̃||_F.
6. (Alternative to 5.) Perform Gaussian mixture model based clustering on the rows of W̃ when there is an indication of highly varying average degree between the communities.
7. Let ĉ : V ↦ [K] be the block assignment function according to the clustering of the rows of W̃ performed in either Step 5 or Step 6.
Here are some important observations about the implementation of the algorithm:

(a) There are standard algorithms for computing graph distances in the algorithmic graph theory literature, where the problem is known as the all-pairs shortest path problem. The two most popular algorithms are Floyd-Warshall ([16], [34]) and Johnson's algorithm [21].

(b) Step 3 of the algorithm is nothing but classical multidimensional scaling (MDS) of the graph distance matrix.

(c) In Step 5 of the algorithm, K-means clustering is appropriate if the expected degrees of the blocks are equal. However, if the expected degrees of the blocks are different, this leads to multiscale behavior in the eigenvectors of the normalized distance matrix and bad behavior in practice. So we perform Gaussian mixture model (GMM) based clustering instead of K-means to take this into account.

General theoretical results on the algorithm will be given in [6]. In this paper we restrict ourselves to the sparse regime. We do so because the arguments in the sparse regime are essentially different from the others. Curiously, it is only in the sparse regime and part of the semisparse regime that the matrix D̄_C concentrates around an n × n matrix with K distinct types of row vectors, as for the other methods of spectral clustering. It does not concentrate in the dense regime, while the opposite is true of A and L: they do not concentrate outside the semidense regime. That the geodesic matrix does not concentrate in the dense regime can easily be seen, since asymptotically all geodesic paths are of constant length. But the distributions of the path lengths differ from block to block, ensuring that spectral clustering works. We do not touch on this further here.
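Observation (b) can be checked directly: double-centering a squared Euclidean distance matrix recovers the centered Gram matrix, whose top eigenvectors give the classical MDS embedding. The following is a standard MDS identity on synthetic data (with the conventional factor 1/2, which rescales eigenvalues but leaves the eigenvectors used in Step 5 unchanged):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))                         # synthetic planar points
Xc = X - X.mean(axis=0)                              # centered coordinates
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # Euclidean distance matrix
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
G = -0.5 * J @ (D ** 2) @ J                          # double centering, as in Step 3
vals, vecs = np.linalg.eigh(G)                       # eigenvalues in ascending order
Y = vecs[:, -2:] * np.sqrt(vals[-2:])                # 2-D classical MDS embedding
```

Here G equals the Gram matrix of the centered points, so Y reproduces the configuration up to an orthogonal transformation and, in particular, preserves all pairwise distances.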
4 Theoretical Results

Throughout this section we take ρ_n = 1/n and specialize to the case

B = (p − q) I_{K×K} + q 11^T,

where I is the identity and 1 = (1, ..., 1)^T. That is, all K blocks have the same probability p of connecting two block members and probability q of connecting members of two different blocks, with p > q. We also assume that π_a = 1/K, a = 1, ..., K: all blocks are asymptotically of the same size. We restrict ourselves to this model here because it is the one treated by Mossel, Neeman and Sly (2013) [27], and already subtle technical details are not obscured. Here is the result we prove.

Theorem 1. For the given model, if

(p − q)² > K(p + (K − 1)q),   (3)

our algorithm is applied, ĉ results, and c is the true assignment function, then

P[ (1/n) ∑_{i=1}^n 1(c(v_i) ≠ ĉ(v_i)) < 1/2 ] → 1.   (4)

Notes:
1. (3) marks the phase transition conjectured by [14].
2. A close reading of our proof shows that as (p − q)²/(K(p + (K − 1)q)) → ∞, (1/n) ∑_{i=1}^n 1(c(v_i) ≠ ĉ(v_i)) →^P 0.

We conjecture that our conclusion in fact holds under the following conditions:

(A1) We consider λ_1 > 1, λ_1 > max_{2 ≤ j ≤ K} λ_j, and λ_K > 0. For M, there exists a k such that (M^k)_{ab} > 0 for all a, b = 1, ..., K. Also, π_j > 0 for j = 1, ..., K.

(A2) Each vertex has the same asymptotic average degree α > 1, that is,

α = ∑_{k=1}^K π_k B_{ak} = ∑_{k=1}^K M_{ak}, for all a ∈ {1, ..., K}.

(A3) We assume that λ_K² > λ_1 or, alternatively, that there exists a real positive t such that

∑_{k=1}^K φ_k(a) λ_k^t φ_k(b) ≤ n, for all a, b = 1, ..., K.

Note that (A1)-(A3) all hold for the case we consider. In fact, under our model,

λ_1 = (p + (K − 1)q)/K,   λ_2 = (p − q)/K,   λ_2 = λ_3 = ... = λ_K,

with (A3) being the condition of the Theorem.
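The eigenvalue formulas above can be verified numerically (our check, with hypothetical values p = 9, q = 1, K = 3); note that condition (3) is exactly the statement λ_2² > λ_1:

```python
import numpy as np

K, p, q = 3, 9.0, 1.0                          # hypothetical parameters with p > q
B = (p - q) * np.eye(K) + q * np.ones((K, K))
M = B / K                                      # M = Pi B with Pi = (1/K) I
lam = np.sort(np.linalg.eigvalsh(M))[::-1]     # eigenvalues in decreasing order
lam1, lam2 = lam[0], lam[1]
cond_3 = (p - q) ** 2 > K * (p + (K - 1) * q)  # condition (3)
cond_eig = lam2 ** 2 > lam1                    # lambda_2^2 > lambda_1, i.e. (A3)
```

The check confirms λ_1 = (p + (K − 1)q)/K, λ_2 = ... = λ_K = (p − q)/K, and that the two forms of the threshold agree: λ_2² > λ_1 multiplied through by K² is (p − q)² > K(p + (K − 1)q).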
Our argument will be stated in a form that is generalizable, and we will indicate revisions of intermediate statements as needed, pointing in particular to a lemma whose conclusion only holds if an implication of (A3) that we conjecture is valid. The theoretical analysis of the algorithm has two main parts:

I. Finding the limiting distribution of the graph distance between two typical vertices of type a and type b (where a, b = 1, ..., K). This part of the analysis is highly dependent on results from multi-type branching processes and their relation to stochastic block models. The proof techniques and results are borrowed from [9], [5] and [4].

II. Finding the behavior of the top K eigenvectors of the graph distance matrix D using the limiting distribution of the typical graph distances. This part of the analysis is highly dependent on perturbation theory for linear operators. The proof techniques and results are borrowed from [22], [12] and [32].

We will state two theorems corresponding to I and II above.

Theorem 2. Under our model, the graph distance d_G(u, v) between two uniformly chosen vertices of type a and b respectively, conditioned on being connected, satisfies the following asymptotic relations:

(i) If a = b, for any ε > 0, as n → ∞,

P[(1 − ε)τ_1 ≤ d_G(u, v) ≤ (1 + ε)τ_1] = 1 − o(1),   (5)

where τ_1 is the minimum real positive t which satisfies the relation

λ_2^t + (λ_1^t − λ_2^t)/K = n.   (6)

(ii) If a ≠ b, for any ε > 0, as n → ∞,

P[(1 − ε)τ_2 ≤ d_G(u, v) ≤ (1 + ε)τ_2] = 1 − o(1),   (7)

where τ_2 is the minimum real positive t which satisfies the relation

(λ_1^t − λ_2^t)/K = n.   (8)

In Theorem 2 we have a point-wise result. To use matrix perturbation theory for part II we need the following.

Theorem 3.
Let D_B be the restriction of the geodesic matrix to the vertices in the big component of G_n. Then, under our model,

P( || D_B / log n − D̄ ||_F ≤ o(n) ) = 1 − o(1),

where D̄_ij ≡ σ_1 = τ_1/log n if v_i and v_j have the same type, and D̄_ij ≡ σ_2 = τ_2/log n otherwise, with τ_1 and τ_2 the solutions t of Eqs. (6) and (8) respectively.

To generalize Theorem 1, we need appropriate generalizations of Theorems 2 and 3. Heuristically, it may be argued that the generalizations (τ_ab), a, b = 1, ..., K, should satisfy the equations

∑_{k=1}^K φ_k(a) λ_k^t φ_k(b) = (S^t)_{ab} = n, for a ≤ b ∈ [K].   (9)

Our conjecture is that (A1)-(A3) imply that these equations have asymptotic solutions and that the statements of Theorems 2 and 3 hold with obvious modifications. Note that in Theorem 2, since λ_j = λ_2 for 2 ≤ j ≤ K, there are effectively only two equations; modifications are also needed for other degeneracies in the parameters. We next turn to a branching process result in [10] which we will use heavily.

4.1 A Key Branching Process Result

As others have done, we link the network formed by the SBM with the tree network generated by a multi-type Galton-Watson branching process. In our case, the multi-type branching process (MTBP) has type space S = {1, ..., K}, where a particle of type a ∈ S is replaced in the next generation by a set of particles distributed as a Poisson process on S with intensity (B_ab π_b)_{b=1}^K = (M_ab)_{b=1}^K. Recall the definitions of B, M and S from Section 2.1. We denote this branching process, started with a single particle of type a, by B_{B,π}(a). We write B_{B,π} for the same process with the type of the initial particle random, distributed according to π.
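Equations (6) and (8) define τ_1 and τ_2 only implicitly, but when λ_1 > λ_2 > 1 both left-hand sides are strictly increasing in t, so the minimal positive root is the unique root and a standard root finder applies (our sketch under that assumption; the function name and the parameter values are hypothetical):

```python
import numpy as np
from scipy.optimize import brentq

def taus(lam1, lam2, K, n):
    """Minimal positive roots of Eqs. (6) and (8), assuming lam1 > lam2 > 1 so
    that both left-hand sides are strictly increasing in t."""
    f1 = lambda t: lam2 ** t + (lam1 ** t - lam2 ** t) / K - n  # Eq. (6)
    f2 = lambda t: (lam1 ** t - lam2 ** t) / K - n              # Eq. (8)
    hi = 10 * np.log(n) / np.log(lam2)   # generous bracket; tau_1, tau_2 = O(log n)
    return brentq(f1, 1e-9, hi), brentq(f2, 1e-9, hi)

# Hypothetical values: p = 9, q = 1, K = 3 give lam1 = 11/3, lam2 = 8/3.
tau1, tau2 = taus(lam1=11 / 3, lam2=8 / 3, K=3, n=10_000)
```

Since the left-hand side of (6) exceeds that of (8) by λ_2^t > 0 pointwise, the computed roots satisfy τ_1 < τ_2, matching the ordering of same-type and different-type distances in Theorem 2.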
According to Theorem 8.1 of Chapter 1 of [25], the branching process has a positive survival probability if λ_1 > 1, where λ_1 is the Perron-Frobenius eigenvalue of M, a positive regular matrix. Recall that for our special M, λ_1 = (p − q)/K + q.

Definition 5. (a) Define ρ(B, π; a) as the probability that the branching process B_{B,π}(a) survives for eternity.
(b) Define

ρ ≡ ρ(B, π) ≡ ∑_{a=1}^K ρ(B, π; a) π_a   (10)

as the survival probability of the branching process B_{B,π} given that its initial distribution is π.

We denote by Z_t = (Z_t(a))_{a=1}^K the population of particles of the K different types, with Z_t(a) denoting the particles of type a, at generation t of the Poisson multi-type branching process B_{B,π}, with B and π as defined in Section 4. From Theorem 24 of [10] we get:

Theorem 4 ([10]). Let β > 0 and Z_0 = x ∈ N^K be fixed. There exists C = C(x, β) > 0 such that with probability at least 1 − n^{−β}, for all k ∈ [K] and all s, t ≥ 0 with 0 ≤ s < t,

|⟨φ_k, Z_s⟩ − λ_k^{s−t} ⟨φ_k, Z_t⟩| ≤ C(t + 1)² λ_1^{s/2} (log n)^{3/2}.   (11)

Remark: The theorem stated above is a special case of the general theorem stated in [10]. The general theorem is required for generalizing Theorem 1. The general version of the theorem is:

Theorem 5 ([10]). Let β > 0 and Z_0 = x ∈ N^K be fixed. There exists C = C(x, β) > 0 such that with probability at least 1 − n^{−β}, for all k ∈ [K_0] (where K_0 is the largest integer such that λ_k² > λ_1 for all k ≤ K_0) and all s, t ≥ 0 with 0 ≤ s < t,

|⟨φ_k, Z_s⟩ − λ_k^{s−t} ⟨φ_k, Z_t⟩| ≤ C(t + 1)² λ_1^{s/2} (log n)^{3/2},   (12)

and for all k ∈ [K] \ [K_0], for all t ≥ 0,

|⟨φ_k, Z_t⟩| ≤ C(t + 1)² λ_1^{t/2} (log n)^{3/2}.   (13)

Finally, for all k ∈ [K] \ [K_0] and all t ≥ 0, E|⟨φ_k, Z_t⟩|² ≤ C(t + 1)³ λ_1^t.
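The process of Section 4.1 is easy to simulate, since the total offspring of type b in one generation is Poisson with mean (Z_t M)_b, the superposition of the parents' independent Poisson offspring counts. The growth rate λ_1^t appearing in Theorem 4 can then be observed empirically (our sketch; the parameters are hypothetical, with K = 2, p = 9, q = 1, so λ_1 = 5):

```python
import numpy as np

def simulate_mtbp(M, z0, T, rng):
    """Simulate Z_0, ..., Z_T for the Poisson MTBP: a particle of type a has
    Poisson(M_ab) children of type b, so Z_{t+1} | Z_t ~ Poisson(Z_t M)."""
    Z = [np.asarray(z0, dtype=np.int64)]
    for _ in range(T):
        Z.append(rng.poisson(Z[-1] @ M))
    return np.array(Z)

rng = np.random.default_rng(2)
K, p, q = 2, 9.0, 1.0                                # hypothetical parameters
M = ((p - q) * np.eye(K) + q * np.ones((K, K))) / K  # M = Pi B with Pi = (1/2) I
lam1 = (p + (K - 1) * q) / K                         # Perron-Frobenius root, here 5
Z = simulate_mtbp(M, z0=[5, 5], T=10, rng=rng)
ratio = Z.sum(axis=1) / lam1 ** np.arange(11)        # ~ <phi_1, Z_t> / lam_1^t
```

Here φ_1 ∝ (1, 1), so Z_t.sum() tracks ⟨φ_1, Z_t⟩, and on survival the normalized sequence `ratio` stabilizes, in line with the martingale-type bound (11).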
4.2 The Neighborhood Exploration Process

The neighborhood exploration process of a vertex v in a graph G generated from an SBM gives us a handle on the link between the local structure of a graph from an SBM and the multi-type branching process. Recall the definitions of the SBM parameters from Section 2.1 and the definition of the Poisson multi-type branching process from Section 4.1. We assume that every vertex v_i ∈ V(G_n) of a graph G_n generated from a stochastic block model has been assigned a community or type ξ_i (say).

The neighborhood exploration process (G, v)_L of a vertex v in the graph G_n generates a spanning tree of the induced subgraph of G_n consisting of the vertices at distance at most L from v. The spanning tree is formed from the exploration process which starts from the vertex v as the root in the random graph G_n generated from the stochastic block model. The set of vertices of type a of the random graph G_n that are neighbors of v and have not been previously explored is called Γ_{1,a}(v), with N_{1,a}(v) = |Γ_{1,a}(v)| for a = 1, ..., K and N_1(v) = (N_{1,1}(v), ..., N_{1,K}(v)). So Γ_1(v) = {Γ_{1,1}(v), ..., Γ_{1,K}(v)} are the children of the root v at step ℓ = 1 in the spanning tree of the neighborhood exploration process. The neighborhood exploration process is repeated at the second step by looking at the neighbors of type a of the vertices in Γ_1(v) that have not been previously explored; this set is called Γ_{2,a}(v), with N_{2,a}(v) = |Γ_{2,a}(v)| for a = 1, ..., K. Similarly, Γ_2(v) = {Γ_{2,1}(v), ..., Γ_{2,K}(v)} are the children of the vertices of Γ_1(v) at step ℓ = 2 in the spanning tree of the neighborhood exploration process. The exploration process is continued until step ℓ = L.
Note that the process stops when all the vertices in G_n have been explored. So, if G_n is connected, then L is at most the diameter of the graph G_n. Since we either consider G_n connected or consider only the giant component of G_n, the neighborhood exploration process will end in a finite number of steps, but the number of steps may depend on n and is equal to the diameter L of the connected component of the graph containing the root v. It follows from Theorem 14.11 of [9] that

L / log_{λ_1}(n) →^P 1.   (14)

Now we find a coupling relation between the neighborhood exploration process of a vertex of type a in the stochastic block model and a multi-type Galton-Watson process B(a) starting from a vertex of type a. The Lemma is based on Proposition 31 of [10].

Lemma 1. Let w(n) be a sequence such that w(n) → ∞ and w(n)/n → 0. Let (T, v) be the random rooted tree associated with the Poisson multi-type Galton-Watson branching process defined in Section 2.1 started from Z_0 = δ_{c_v}, and let (G, v) be the spanning tree associated with the neighborhood exploration process of the random SBM graph G_n starting from v. For ℓ ≤ τ, where τ is the number of steps required to explore w(n) vertices in (G, v), the total variation distance d_TV between the law of (G, v)_ℓ and the law of (T, v)_ℓ at step ℓ goes to zero as O(n^{−1/2} ∨ w(n)/n) = o(1).

Proof. Let us start the neighborhood exploration process at a vertex v of a graph generated from an SBM model with parameters (P, π) = (B/n, π). Correspondingly, the multi-type branching process starts from a single particle of type c_v, where c_v is the type or class of the vertex v in the SBM. Let t be such that 0 ≤ t < τ, where τ is defined in the Lemma statement. Now, for such a t ≥ 0, let (x_{t+1}(1), ...
, x_{t+1}(K)) be the leaves of (T, v) at time t starting from a vertex v_t generated at step t of class c_{v_t} = a. Let (y_{t+1}(1), ..., y_{t+1}(K)) be the vertices exposed at step t of the exploration process starting from a vertex of class a, where a ∈ [K]. Now, if c_{v_t} is of type a, then y_{t+1}(b) follows Bin(n_t(b), B_ab/n) and x_{t+1}(b) follows Poi(π_b B_ab) for b = 1, ..., K, where n_t(b) is the number of unused vertices of type b remaining at time t, for b = 1, ..., K. Also, the x_{t+1}(b) for different b are independent. Note that n_b ≥ n_t(b) ≥ n_b − w(n) for b = 1, ..., K. So, since |n_b/n − π_b| = O(n^{−1/2}) for b = 1, ..., K, we get that

|n_t(b)/n − π_b| < O(n^{−1/2} + w(n)/n) for b = 1, ..., K.

Now we know that

d_TV(Bin(m′, λ/m), Poi(m′λ/m)) ≤ λ/m,   d_TV(Poi(λ), Poi(λ′)) ≤ |λ − λ′|.

So now we have

d_TV(P_{t+1}, Q_{t+1}) ≤ O(n^{−1/2} ∨ w(n)/n) = o(1),

where P_{t+1} is the distribution of y_{t+1} under the neighborhood exploration process and Q_{t+1} is the distribution of x_{t+1} under the branching process, and hence Lemma 1 follows.

Now we restrict ourselves to the giant component of G_n. The size of the giant component of G_n, C_1(G_n), for a random graph generated from SBM(B, π), is related to the multi-type branching process through its survival probability, as given in Definition 5. According to Theorem 3.1 of [9], we have

(1/n) C_1(G_n) →^P ρ(B, π).   (15)

Under this additional condition of restricting to the giant component, the branching process can be coupled with another branching process with a different kernel. The kernel of that branching process is given in the following lemma.

Lemma 2.
If v is in the giant component of G_n, the new branching process has kernel

[ B_ab (2ρ(B, π)/K − ρ²(B, π)/K²) ]_{a,b=1}^K.

Proof. The proof is given in Section 10 of [9].

Since we will be restricting ourselves to the giant component of G_n, we shall use the matrix B′ ≡ [B_ab(2ρ(B, π)/K − ρ²(B, π)/K²)]_{a,b=1}^K as the connectivity matrix instead of B. We abuse notation by referring to the matrix B′ as B too.

We proceed to prove the limiting behavior of the typical distance between vertices v and w of G_n, where v, w ∈ V(G_n). We first try to find a lower bound for the distance between two vertices. We shall separately give upper and lower bounds for the distance between two vertices of the same type and of different types.

Lemma 3. Under our model, for vertices v, w ∈ V(G), if

(a) type of v = type of w = a (say), then

|{{v, w} : d_G(v, w) ≤ (1 − ε)τ_1}| ≤ O(n^{2−ε}) with high probability,

where τ_1 is the minimum real positive t which satisfies Eq. (6);

(b) type of v = a ≠ b = type of w (say), then

|{{v, w} : d_G(v, w) ≤ (1 − ε)τ_2}| ≤ O(n^{2−ε}) with high probability,

where τ_2 is the minimum real positive t which satisfies Eq. (8).

Proof. Let Γ_d(v) ≡ Γ_d(v, G_n) denote the d-distance set of v in G_n, i.e., the set of vertices of G_n at graph distance exactly d from v, and let Γ_{≤d}(v) ≡ Γ_{≤d}(v, G_n) denote the d-neighborhood ∪_{d′≤d} Γ_{d′}(v) of v. Let Γ_{d,a}(v) ≡ Γ_{d,a}(v, G_n) denote the set of vertices of type a at distance d in G_n, and let Γ_{≤d,a}(v) ≡ Γ_{≤d,a}(v, G_n) denote the d-neighborhood ∪_{d′≤d} Γ_{d′,a}(v) of v consisting of the vertices of type a. Let N_d^a be the number of particles at generation d of the branching process B_B(δ_a), and let N_{d,c}^a be the number of particles of type c at generation d of the branching process B_B(δ_a).
So $N^a_d = \sum_{c=1}^K N^a_{d,c}$ and $Z_t(k) = \sum_{d=0}^t N^a_{d,k}$.

Lemma 1 involved first showing that, for $n$ large enough, the neighborhood exploration process starting at a given vertex $v$ of $G_n$ with type $a$ can be coupled with the branching process $\mathcal{B}_{B'}(\delta_a)$, where $B'$ is defined by Lemma 2. As noted, we identify $B'$ with $B$. The neighborhood exploration process and the multi-type branching process can be coupled so that, for every $d$, $|\Gamma_d(v)|$ is at most $N_d + O(n^{-1/2} \vee w(n)/n)$, where $N_d$ is the number of particles in generation $d$ of $\mathcal{B}_B(\delta_a)$ and at most $w(n)$ vertices of $G_n$ have been explored in $d$ generations.

From Theorem 4, we get that with high probability
\[ \left| \frac{\langle \phi_k, Z_t \rangle}{\lambda_k^t} - \langle \phi_k, Z_0 \rangle \right| \le C (t+1)^2 (\log n)^{3/2}. \]
For any $x \in \mathbb{R}^K$ and any orthonormal basis $\{\phi_k\}_{k=1}^K$ of $\mathbb{R}^K$, we have the unique representation $x = \sum_{k=1}^K \langle x, \phi_k \rangle \phi_k$. Taking $x = e_b$, where $e_b$ is the unit vector with 1 in the $b$-th coordinate and 0 elsewhere, $b = 1, \ldots, K$, we get
\[ Z_t(b) \le \sum_{k=1}^K \phi_k(b)\, \lambda_k^t\, \phi_k(a) \left[ Z_0(a) + C(t+1)^2 (\log n)^{3/2} \right]. \]
Under our model, one representation of the eigenvectors is
\[ \phi_1 = \tfrac{1}{\sqrt{K}}(1, \ldots, 1), \quad \phi_2 = \tfrac{1}{\sqrt{2}}(-1, 1, 0, \ldots, 0), \quad \phi_3 = \tfrac{1}{\sqrt{6}}(-1, -1, 2, 0, \ldots, 0), \quad \ldots, \quad \phi_K = \tfrac{1}{\sqrt{K(K-1)}}(-1, \ldots, -1, K-1). \]
Now, using this representation of the eigenvectors for the branching process starting from a vertex of type $a$, $a \in [K]$, we get with high probability
\[ \sum_{k=1}^K Z_t(k) \le \lambda_1^t \left[ Z_0(a) + C(t+1)^2 (\log n)^{3/2} \right], \]
\[ Z_t(a) - Z_t(b) \ge \lambda_2^t \left[ -Z_0(a) - C(t+1)^2 (\log n)^{3/2} \right], \quad b = 1, \ldots, K \text{ and } b \ne a. \]
So, for each $a \in [K]$ with $Z_0(a) = 1$, we can simplify: with high probability,
\[ Z_t(a) \le \frac{1}{K}\big( \lambda_1^t + (K-1)\lambda_2^t \big) \left[ 1 + C(t+1)^2 (\log n)^{3/2} \right], \]
\[ Z_t(b) \le \frac{\lambda_1^t - \lambda_2^t}{K} \left[ 1 + C(t+1)^2 (\log n)^{3/2} \right], \quad b \in [K] \text{ and } b \ne a. \]
Set $D_1 = (1-\varepsilon)\tau_1$, where $\tau_1$ is the solution to the equation
\[ \lambda_2^t + \frac{\lambda_1^t - \lambda_2^t}{K} = n, \]
and set $D_2 = (1-\varepsilon)\tau_2$, where $\tau_2$ is the solution to the equation
\[ \frac{\lambda_1^t - \lambda_2^t}{K} = n, \]
where $\varepsilon > 0$ is fixed and small. Note that both $\tau_1$ and $\tau_2$ are of order $O(\log n)$. Thus, with high probability, for $v$ of type $a$ and $w(n) = O(n^{1-\varepsilon})$,
\[ |\Gamma_{\le D_1, a}(v)| = \sum_{d=0}^{D_1} N^a_{d,a} \le Z_{D_1}(a) + O\big(D_1 (n^{-1/2} \vee w(n)/n)\big) = O(n^{1-\varepsilon}), \]
\[ |\Gamma_{\le D_2, b}(v)| = \sum_{d=0}^{D_2} N^a_{d,b} \le Z_{D_2}(b) + O\big(D_2 (n^{-1/2} \vee w(n)/n)\big) = O(n^{1-\varepsilon}). \]
So, summing over $v \in C_a$ and $v \in C_b$, where $C_a = \{i \in V(G) \mid c_i = a\}$ and $C_b = \{i \in V(G) \mid c_i = b\}$, we have
\[ \sum_{v \in C_a} |\Gamma_{\le D_1, a}(v)| = |\{\{v, w\} : d_G(v, w) \le (1-\varepsilon)\tau_1,\ v, w \in C_a\}|, \]
\[ \sum_{v \in C_a} |\Gamma_{\le D_2, b}(v)| = |\{\{v, w\} : d_G(v, w) \le (1-\varepsilon)\tau_2,\ v \in C_a,\ w \in C_b\}|, \]
and so, with high probability,
\[ |\{\{v, w\} : d_G(v, w) \le (1-\varepsilon)\tau_1,\ v, w \in C_a\}| = \sum_{v \in V(G_n)} |\Gamma_{\le D_1, a}(v)| = O(n^{2-\varepsilon}), \]
\[ |\{\{v, w\} : d_G(v, w) \le (1-\varepsilon)\tau_2,\ v \in C_a,\ w \in C_b\}| = \sum_{v \in V(G_n)} |\Gamma_{\le D_2, b}(v)| = O(n^{2-\varepsilon}). \]
The above statement is equivalent to
\[ P\big( |\{\{v, w\} : d_G(v, w) \le (1-\varepsilon)\tau_1,\ v, w \in C_a\}| \le O(n^{2-\varepsilon}) \big) = 1 - o(1), \]
\[ P\big( |\{\{v, w\} : d_G(v, w) \le (1-\varepsilon)\tau_2,\ v \in C_a,\ w \in C_b\}| \le O(n^{2-\varepsilon}) \big) = 1 - o(1) \]
for any fixed $\varepsilon > 0$. Now, we upper bound the typical distance between two vertices of the SBM graph $G_n$.

Lemma 4.
Under our model, for vertices $v, w \in V(G)$, conditioned on the event that the exploration process starts from a vertex in the giant component of $G$:
(a) if type of $v$ = type of $w$ = $a$ (say), then $P\big(d_G(v, w) < (1+\varepsilon)\tau_1\big) = 1 - \exp(-\Omega(n^{2\eta}))$, where $\tau_1$ is the minimum real positive $t$ which satisfies Eq. (6);
(b) if type of $v = a \ne b =$ type of $w$ (say), then $P\big(d_G(v, w) < (1+\varepsilon)\tau_2\big) = 1 - \exp(-\Omega(n^{2\eta}))$, where $\tau_2$ is the minimum real positive $t$ which satisfies Eq. (8).

Proof. We consider the multi-type branching process with probability kernel $P_{ab} = B_{ab}/n$ for all $a, b = 1, \ldots, K$; the corresponding random graph $G_n$ generated from the stochastic block model has $n$ nodes in total. We condition on the event that the branching process $\mathcal{B}_K$ survives. Note that the upper bound 1 is obvious, since we are bounding a probability, so it suffices to prove a corresponding lower bound. We may and shall assume that $B_{ab} > 0$ for some $a, b$.

Again, let $\Gamma_d(v)$, $\Gamma_{\le d}(v)$, $\Gamma_{d,a}(v)$, and $\Gamma_{\le d,a}(v)$ be as defined in the proof of Lemma 3, and let $N^a_d$ and $N^a_{d,c}$ be the generation sizes of the branching process $\mathcal{B}_B(\delta_a)$ as before, so that $N^a_d = \sum_{c=1}^K N^a_{d,c}$ and $Z_t(k) = \sum_{d=0}^t N^a_{d,k}$. By Lemma 1, for $w(n) = o(n)$,
\[ |\Gamma_{d,c}(v)| \ge N_{d,c} - O\big(n^{-1/2} \vee w(n)/n\big), \quad c = 1, \ldots, K, \tag{16} \]
for all $d$ such that $|\Gamma_{\le d}(v)| < \omega(n)$.
This relation between $N_{d,c}$, the number of vertices of type $c$ at generation $d$ of the branching process $\mathcal{B}_B(\delta_a)$, and $|\Gamma_{d,c}(v)|$, the number of vertices of type $c$ at distance $d$ from $v$ in the neighborhood exploration process of $G_n$, becomes highly important later in this proof, where $c = 1, \ldots, K$. Note that the relation only holds when $|\Gamma_{\le d}(v)| < \omega(n)$ for some $\omega(n)$ such that $\omega(n)/n \to 0$ as $n \to \infty$.

From Theorem 4 for the branching process, we get that with high probability
\[ \left| \frac{\langle \phi_k, Z_t \rangle}{\lambda_k^t} - \langle \phi_k, Z_0 \rangle \right| \le C (\log n)^{3/2}. \]
Now, following the same line of argument as in the proof of Lemma 3, for each $a \in [K]$ with $Z_0(a) = 1$, with high probability we get
\[ Z_t(a) \le \frac{1}{K}\big( \lambda_1^t + (K-1)\lambda_2^t \big) \left[ 1 + C(t+1)^2 (\log n)^{3/2} \right], \]
\[ Z_t(b) \le \frac{\lambda_1^t - \lambda_2^t}{K} \left[ 1 + C(t+1)^2 (\log n)^{3/2} \right], \quad b \in [K] \text{ and } b \ne a. \]
Let $D_1$ be the integer part of $(1 + 2\eta)\tau'_1$, where $\tau'_1$ is the solution to the equation
\[ \lambda_2^t + \frac{\lambda_1^t - \lambda_2^t}{K} = n^{1/2 - \eta}. \tag{17} \]
Thus, conditioned on survival of the branching process $\mathcal{B}_B(\delta_a)$, $N^a_{D_1, a} \ge n^{1/2+\eta}/2$. Set $D_2 = (1+\eta)\tau'_2$, where $\tau'_2$ is the solution to the equation
\[ \lambda_1^t = n^{1/2+\eta}. \tag{18} \]
Thus, conditioned on survival of the branching process $\mathcal{B}_B(\delta_a)$, $N^a_{D_2, b} \ge n^{1/2+\eta}/2$ for $b = 1, \ldots, K$. Furthermore, $\lim_{d \to \infty} P(N^a_d \ne 0) = \rho(B, a)$, and we have conditioned on the branching process with kernel $B$ surviving; the right-hand side tends to $\rho(B, a) = 1$ as $\eta \to 0$. Hence, given any fixed $\gamma > 0$, if we choose $\eta > 0$ small enough, then for large enough $n$ we have
\[ P\big( \forall b : N^a_{D_2, b} \ge n^{1/2+\eta}/2 \big) = 1, \qquad P\big( N^a_{D_1, a} \ge n^{1/2+\eta}/2 \big) = 1. \]
Now, the neighborhood exploration process and the branching process can be coupled so that, for every $d$, $|\Gamma_d(v)|$ is at most the number $N_d$ of particles in generation $d$ of $\mathcal{B}_B(\delta_a)$, by Lemma 1 and Eq. (16). So, for $v$ of type $a$, with high probability,
\[ |\Gamma_{\le D_1, a}(v)| \le E\left[ \sum_{d=0}^{D_1} N_d \right] = o(n^{2/3}), \qquad |\Gamma_{\le D_2, b}(v)| \le E\left[ \sum_{d=0}^{D_2} N_d \right] = o(n^{2/3}) \]
if $\eta$ is small enough, since $D_1$ is the integer part of $(1+2\eta)\tau'_1$ and $D_2$ is the integer part of $(1+2\eta)\tau'_2$, where $\tau'_1$ and $\tau'_2$ are solutions to Eqs. (17) and (18). Note that the power $2/3$ here is arbitrary; we could have any power in the range $(1/2, 1)$. So now we are in a position to apply Eq. (16), as we have $|\Gamma_{\le D}(v)| \le O(n^{2/3}) < \omega(n)$, with $\omega(n)/n \to 0$.

Now let $v$ and $w$ be two fixed vertices of $G(n, P)$, of types $a$ and $b$ respectively. We explore both their neighborhoods at the same time, stopping either when we reach distance $D$ in both neighborhoods, or when we find an edge from one to the other, in which case $v$ and $w$ are within graph distance $2D + 1$. We consider two independent branching processes $\mathcal{B}_B(a)$ and $\mathcal{B}'_B(b)$, with $N^a_{d,c}$ and $N^b_{d,c}$ vertices of type $c$ in generation $d$, respectively. By the previous argument, with high probability we encounter $o(n)$ vertices in the exploration, so by the argument leading to (16), whp either the explorations meet, or
\[ |\Gamma^a_{d,c}(v)| \ge Z^{(a)}_d(c) - O\big(n^{-1/2} \vee n^{-1/3}\big), \quad c = 1, \ldots, K,\ c \ne a, \]
\[ |\Gamma^b_{d,c}(w)| \ge Z^{(b)}_d(c) - O\big(n^{-1/2} \vee n^{-1/3}\big), \quad c = 1, \ldots, K,\ c \ne b, \]
with the explorations not meeting, where $Z^{(a)}$ is the branching process starting from $Z_0 = \delta_a$, for $a = 1, \ldots, K$.
Using the bounds on $N^a_{d,c}$ and the independence of the branching processes, it follows that for $a = b$,
\[ P\Big( d(v, w) \le 2D_1 + 1 \ \text{ or } \ |\Gamma^a_{D_1,c}(v)|, |\Gamma^a_{D_1,c}(w)| \ge n^{1/2+\eta} \Big) \ge 1 - o(1), \]
and for $a \ne b$,
\[ P\Big( d(v, w) \le 2D_2 + 1 \ \text{ or } \ \forall c : |\Gamma^a_{D_2,c}(v)|, |\Gamma^b_{D_2,c}(w)| \ge n^{1/2+\eta} \Big) \ge 1 - o(1). \]
Write these probabilities as $P(A_j \cup B_j)$, $j = 1, 2$. We now show that $P(A_j^c \cap B_j) \to 0$; since $P(A_j \cup B_j) \to 1$, this gives $P(A_j) \to 1$. We have not examined any edges from $\Gamma_D(v)$ to $\Gamma_D(w)$, so these edges are present independently with their original unconditioned probabilities. For any end vertex types $c_1, c_2$, the expected number of these edges is at least $|\Gamma^a_{D,c_1}(v)|\,|\Gamma^a_{D,c_2}(w)|\, B_{c_1 c_2}/n$ for the first probability and $|\Gamma^a_{D,c_1}(v)|\,|\Gamma^b_{D,c_2}(w)|\, B_{c_1 c_2}/n$ for the second. Choosing $c_1, c_2$ such that $B_{c_1 c_2} > 0$, this expectation is $\Omega\big((n^{1/2+\eta}/2)^2/n\big) = \Omega(n^{2\eta})$. It follows that at least one such edge is present with probability $1 - \exp(-\Omega(n^{2\eta})) = 1 - o(1)$. If such an edge is present, then $d(v, w) \le 2D_1 + 1$ for the first probability and $d(v, w) \le 2D_2 + 1$ for the second. So, the probability that the second event in the above equation holds but not the first is $o(1)$. Thus, the last equation implies that
\[ P(d(v, w) \le 2D_1 + 1) \ge (1-\gamma)^2 - o(1) \ge 1 - 2\gamma - o(1), \]
\[ P(d(v, w) \le 2D_2 + 1) \ge (1-\gamma)^2 - o(1) \ge 1 - 2\gamma - o(1), \]
where $\gamma > 0$ is arbitrary. Choosing $\eta$ small enough, we have $2D + 1 \le (1+\varepsilon)\log(n)/\log \lambda$. As $\gamma$ is arbitrary, we have
\[ P\big(d(v, w) \le (1+\varepsilon)\tau_1\big) \ge 1 - \exp(-\Omega(n^{2\eta})), \qquad P\big(d(v, w) \le (1+\varepsilon)\tau_2\big) \ge 1 - \exp(-\Omega(n^{2\eta})), \]
and the lemma follows.
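The $O(\log n)$ typical-distance behavior that Lemmas 3 and 4 together establish is easy to see numerically. The following Python sketch is our own illustration, not part of the paper: the parameters $K = 2$, the kernel $B$, and $n$ are made-up values. It simulates a sparse SBM and compares the average BFS distance within components to $\log n / \log \lambda_1$, where $\lambda_1$ is the top eigenvalue of the mean offspring matrix $(B_{ab}\pi_b)$.

```python
# Illustrative check (not from the paper): in a sparse SBM, typical graph
# distances in the giant component grow like log(n)/log(lambda_1).
# K=2, B, and n below are made-up parameters for illustration only.
import math
import random
from collections import deque

random.seed(7)

n, K = 1000, 2
B = [[8.0, 2.0], [2.0, 8.0]]       # edge prob between types a, b is B[a][b]/n
types = [i % K for i in range(n)]  # balanced blocks, pi = (1/2, 1/2)

# Sample the SBM adjacency lists.
adj = [[] for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if random.random() < B[types[i]][types[j]] / n:
            adj[i].append(j)
            adj[j].append(i)

def bfs_dists(s):
    """Graph distances from source s to every reachable vertex."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# Average distance over pairs reachable from a few source vertices.
tot, cnt = 0, 0
for s in range(0, n, 100):         # 10 sources
    d = bfs_dists(s)
    tot += sum(d.values())
    cnt += len(d) - 1
avg = tot / cnt

# Mean offspring matrix M_ab = B_ab * pi_b = [[4,1],[1,4]], so lambda_1 = 5.
lam1 = 5.0
print(avg, math.log(n) / math.log(lam1))
```

With these parameters the prediction is $\log(1000)/\log 5 \approx 4.3$, and the simulated average should land close to that value.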
The equations (6) and (8) control the asymptotic bounds for the graph distance $d_G(v, w)$ between two vertices $v$ and $w$ in $V(G_n)$. Under condition (A3), it follows that $\lambda_2^2 > \lambda_1$. If we write $\lambda_2^2 = c\lambda_1$, where $c$ is a constant, then equations (6) and (8) can be written in the form of quadratic equations. So the solutions $\tau_1$ and $\tau_2$ exist, under the condition that $c^{\tau_1}$ and $c^{\tau_2}$ are of order $O(n)$, and the resulting solutions $\tau_1$ and $\tau_2$ are both of order $O(\log n)$. Also, from the expressions for the solutions $\tau_1$ and $\tau_2$, the limits of $\frac{\tau_1}{\log n}$ and $\frac{\tau_2}{\log n}$ exist; we define these limits as $\sigma_1$ and $\sigma_2$ respectively.

4.3 Proof of Theorem 2 and Theorem 3

4.3.1 Proof of Theorem 2

We prove the limiting behavior of the typical graph distance in the giant component as $n \to \infty$. The theorem essentially follows from Lemmas 3 and 4. Under the conditions stated in the theorem, part (a) follows from Lemmas 3(a) and 4(a), and part (b) follows from Lemmas 3(b) and 4(b).

4.3.2 Proof of Theorem 3

From Definition 4, $D_{ij}$ is the graph distance between vertices $v_i$ and $v_j$, where $v_i, v_j \in V(G_n)$. From Lemma 3, we get for any vertices $v$ and $w$, with high probability,
\[ |\{\{v, w\} : d_G(v, w) \le (1-\varepsilon)\tau_1\}| \le O(n^{2-\varepsilon}), \quad \text{if type of } v = \text{type of } w, \]
\[ |\{\{v, w\} : d_G(v, w) \le (1-\varepsilon)\tau_2\}| \le O(n^{2-\varepsilon}), \quad \text{if type of } v \ne \text{type of } w. \]
Also, from Lemma 4, we get
\[ P\big(d_G(v, w) < (1+\varepsilon)\tau_1\big) = 1 - \exp(-\Omega(n^{2\eta})), \quad \text{if type of } v = \text{type of } w, \]
\[ P\big(d_G(v, w) < (1+\varepsilon)\tau_2\big) = 1 - \exp(-\Omega(n^{2\eta})), \quad \text{if type of } v \ne \text{type of } w. \]
Now, $\sigma_1 = \tau_1/\log n$ and $\sigma_2 = \tau_2/\log n$ are asymptotically constant, as both $\tau_1$ and $\tau_2$ are of order $\log n$, as follows from equations (6) and (8).
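The claim that $\tau_1$ and $\tau_2$ are of order $\log n$, so that $\tau_1/\log n$ and $\tau_2/\log n$ stabilize to the constants $\sigma_1$ and $\sigma_2$, can be checked numerically. The following minimal Python sketch is our own illustration; the values $\lambda_1 = 5$, $\lambda_2 = 3$, $K = 2$ are made-up. It solves the two defining equations by bisection:

```python
# Numerical illustration (ours, not from the paper): the equations defining
# tau_1 and tau_2,
#   lambda_2^t + (lambda_1^t - lambda_2^t)/K = n      (tau_1)
#   (lambda_1^t - lambda_2^t)/K = n                   (tau_2)
# have solutions of order log n, so tau/log n stabilizes (the limits called
# sigma_1 and sigma_2 in the text). lambda_1=5, lambda_2=3, K=2 are made-up.
import math

lam1, lam2, K = 5.0, 3.0, 2

def solve(f, lo=1e-9, hi=200.0, iters=200):
    """Bisection for the root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tau1(n):
    return solve(lambda t: lam2**t + (lam1**t - lam2**t) / K - n)

def tau2(n):
    return solve(lambda t: (lam1**t - lam2**t) / K - n)

for n in [1e4, 1e6, 1e8]:
    print(n, tau1(n) / math.log(n), tau2(n) / math.log(n))
```

Both ratios approach $1/\log \lambda_1 \approx 0.621$ from above as $n$ grows, and $\tau_2 > \tau_1$, matching the fact that same-type pairs are typically closer than different-type pairs.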
So, putting the two statements together, we get that with high probability,
\[ \sum_{\substack{i,j=1 \\ \mathrm{type}(v_i) \ne \mathrm{type}(v_j)}}^{n} \left( \frac{D_{ij}}{\log n} - \mathcal{D}_{ij} \right)^2 = O(n^{2-\varepsilon}) + O(n^2)\,\varepsilon^2, \]
since, by Lemma 1, $\varepsilon = o(1)$ and $(1 - \exp(-\Omega(n^{2\eta})))^{n^2} \to 1$ as $n \to \infty$. So, putting the two cases together, we get that with high probability, for some $\varepsilon > 0$,
\[ \sum_{i,j=1}^{n} \left( \frac{D_{ij}}{\log n} - \mathcal{D}_{ij} \right)^2 = O(n^{2-\varepsilon}) + O(n^2)\,\varepsilon^2 = o(n^2). \]
Hence, for some $\varepsilon > 0$,
\[ \left\| \frac{D}{\log n} - \mathcal{D} \right\|_F \le o(n). \]
This completes the proofs of Theorems 2 and 3.

4.4 Perturbation Theory of Linear Operators

We now establish part II of our program. $D$ can be considered as a perturbation of the limiting operator $\mathcal{D}$. The Davis-Kahan theorem [13] gives a bound on the perturbation of an eigenspace instead of an eigenvector, as discussed previously.

Theorem 6 (Davis-Kahan (1970) [13]). Let $H, H' \in \mathbb{R}^{n \times n}$ be symmetric, suppose $\mathcal{V} \subset \mathbb{R}$ is an interval, and suppose for some positive integer $d$ that $W, W' \in \mathbb{R}^{n \times d}$ are such that the columns of $W$ form an orthonormal basis for the sum of the eigenspaces of $H$ associated with the eigenvalues of $H$ in $\mathcal{V}$, and that the columns of $W'$ form an orthonormal basis for the sum of the eigenspaces of $H'$ associated with the eigenvalues of $H'$ in $\mathcal{V}$. Let $\delta$ be the minimum distance between any eigenvalue of $H$ in $\mathcal{V}$ and any eigenvalue of $H$ not in $\mathcal{V}$. Then there exists an orthogonal matrix $R \in \mathbb{R}^{d \times d}$ such that
\[ \| WR - W' \|_F \le \frac{\sqrt{2}\, \| H - H' \|_F}{\delta}. \]

4.5 Proof of Theorem 1

The behavior of the eigenvalues of the limiting operator $\mathcal{D}$ can be stated as follows.

Lemma 5.
Under our model, the eigenvalues of $\mathcal{D}$, $|\mu_1(\mathcal{D})| \ge |\mu_2(\mathcal{D})| \ge \cdots \ge |\mu_n(\mathcal{D})|$, can be bounded as follows:
\[ \mu_1(\mathcal{D}) = O(n\sigma_1), \quad |\mu_K(\mathcal{D})| = O\big(n(\sigma_1 - \sigma_2)\big), \quad \mu_{K+1}(\mathcal{D}) = \cdots = \mu_n(\mathcal{D}) = -\sigma_1. \tag{19} \]
Also, with high probability, it holds that $|\mu_K(D/\log n)| = O(n(\sigma_1 - \sigma_2))$ and $\mu_{K+1}(D/\log n) \le o(n)$.

Proof. The matrix $\mathcal{D} + \sigma_1 I_{n \times n}$ is a block matrix with blocks of sizes $\{n_a\}_{a=1}^K$, with $\sum_{a=1}^K n_a = n$. The elements of the $(a,b)$-th block are all the same: equal to $\sigma_1$ if $a = b$, and equal to $\sigma_2$ if $a \ne b$. Note that the diagonal of $\mathcal{D}$ is zero, as the diagonal of $D$ is also zero. Now, the eigenvalues of the $K \times K$ matrix of the values in $\mathcal{D}$ are $(\sigma_1 + (K-1)\sigma_2, \sigma_1 - \sigma_2, \ldots, \sigma_1 - \sigma_2)$. If we write $\lambda_2^2 = c\lambda_1$, then for $c > 1$ we have $\sigma_1 > \sigma_2$; so, under our model, $\sigma_1 > \sigma_2$. Because of the repetitions in the block matrix, $\mu_1(\mathcal{D}) = O(n\sigma_1) = O(n)$ and $\mu_K(\mathcal{D}) = O(n(\sigma_1 - \sigma_2)) = O(n)$, since by assumption (A3), $n_a = O(n)$ for all $a = 1, \ldots, K$. The rest of the eigenvalues of $\mathcal{D} + \sigma_1 I_{n \times n}$ are zero, so the rest of the eigenvalues of $\mathcal{D}$ are $-\sigma_1$.

For the second part of the lemma: by Weyl's inequality, for all $i = 1, \ldots, n$,
\[ \big|\, |\mu_i(D/\log n)| - |\mu_i(\mathcal{D})| \,\big| \le \| D/\log n - \mathcal{D} \|_F \le o(n). \]
Since from (A1)-(A3) it follows that $\sigma_1 - \sigma_2 > c > 0$ for some constant $c$, we get $|\mu_K(D/\log n)| = O(n(\sigma_1 - \sigma_2)) - o(n) = O(n(\sigma_1 - \sigma_2))$ for large $n$, and $|\mu_{K+1}(D/\log n)| \le \sigma_1 + o(n) = o(n)$.

Now, let $W$ be the matrix of eigenvectors corresponding to the top $K$ absolute eigenvalues of $\mathcal{D}$ and $\tilde{W}$ the matrix of eigenvectors corresponding to the top $K$ absolute eigenvalues of $D/\log n$. Using the Davis-Kahan theorem, we get

Lemma 6. With high probability, there exists an orthogonal matrix $R \in \mathbb{R}^{K \times K}$ such that
\[ \| WR - \tilde{W} \|_F \le o\big((\sigma_1 - \sigma_2)^{-1}\big). \]

Proof.
The top $K$ absolute eigenvalues of both $\mathcal{D}$ and $D/\log n$ lie in $(Cn, \infty)$ for some $C > 0$. Also, the gap between the $K$-th and $(K+1)$-th eigenvalues of the matrix $\mathcal{D}$ is $\delta = O(n(\sigma_1 - \sigma_2))$. So we can apply the Davis-Kahan theorem (Theorem 6) together with Theorem 3 to get
\[ \| WR - \tilde{W} \|_F \le \frac{\sqrt{2}\, \| D/\log n - \mathcal{D} \|_F}{\delta} \le \frac{o(n)}{O(n(\sigma_1 - \sigma_2))} = o\big((\sigma_1 - \sigma_2)^{-1}\big). \]

Now, the relationship between the rows of $W$ can be specified as follows.

Lemma 7. For any two rows $u_i, u_j$ of the $n \times K$ matrix $W$, $\| u_i - u_j \|_2 \ge O(1/\sqrt{n})$ if type of $v_i \ne$ type of $v_j$.

Proof. The matrix $\mathcal{D} + \sigma_1 I_{n \times n}$ is a block matrix with blocks of sizes $\{n_a\}_{a=1}^K$, with $\sum_{a=1}^K n_a = n$; the elements of the $(a,b)$-th block are all equal to $\sigma_1$ if $a = b$ and to $\sigma_2$ if $a \ne b$, and the diagonal of $\mathcal{D}$ is zero. The rows of the eigenvectors of the $K \times K$ matrix of the values in $\mathcal{D}$ differ by a constant. Under our model, $\sigma_1 > \sigma_2$. So, because of the repetitions in the block matrix, the rows of the projection of $\mathcal{D}$ onto its top-$K$ eigenspace that correspond to different types differ by order $O(n^{-1/2})$.

Now, consider the $K$-means criterion as the clustering criterion on $\tilde{W}$; for the $K$-means minimizer, the centroid matrix $C$ is an $n \times K$ matrix with $K$ distinct rows corresponding to the $K$ centroids of the $K$-means algorithm. By the property of the $K$-means objective function and Lemma 6, with high probability,
\[ \| C - \tilde{W} \|_F \le \| WR - \tilde{W} \|_F, \]
\[ \| C - WR \|_F \le \| C - \tilde{W} \|_F + \| WR - \tilde{W} \|_F, \]
\[ \| C - WR \|_F^2 \le 4\, \| WR - \tilde{W} \|_F^2 \le o\big((\sigma_1 - \sigma_2)^{-2}\big). \]
By Lemma 7, for large $n$ we can find a constant $C$ such that the $K$ balls $B_1, \ldots, B_K$ of radius $r = C n^{-1/2}$ around the $K$ distinct rows of $W$ are disjoint.
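The block-constant structure underlying Lemma 7 can be illustrated numerically: rows of the top-$K$ eigenvector matrix of the idealized operator coincide within a type and are separated by $\Theta(n^{-1/2})$ across types. The NumPy sketch below is our own illustration; the values $\sigma_1 = 2$, $\sigma_2 = 1$, $n = 300$, $K = 3$ are made-up, and we work with the idealized block matrix directly rather than an estimated geodesic matrix.

```python
# Numerical sketch (ours) of the row-separation property: for the idealized
# block matrix (sigma_1 within blocks, sigma_2 across, zero diagonal), rows
# of the top-K eigenvector matrix W coincide within a type and are separated
# by Theta(1/sqrt(n)) across types. All parameter values are made-up.
import numpy as np

n, K = 300, 3
sigma1, sigma2 = 2.0, 1.0
types = np.repeat(np.arange(K), n // K)          # equal blocks of size 100

# Block-constant matrix, then subtract sigma1*I so the diagonal is zero.
M = np.where(types[:, None] == types[None, :], sigma1, sigma2).astype(float)
D_bar = M - sigma1 * np.eye(n)

vals, vecs = np.linalg.eigh(D_bar)
top = np.argsort(-np.abs(vals))[:K]              # top-K by absolute eigenvalue
W = vecs[:, top]                                 # n x K spectral embedding

# Same-type rows agree; different-type rows are ~ c/sqrt(n) apart.
d_same = np.linalg.norm(W[0] - W[1])             # vertices 0, 1: same type
d_diff = np.linalg.norm(W[0] - W[-1])            # first and last: different types
print(d_same, np.sqrt(n) * d_diff)
```

Because the top-$K$ eigenvectors are constant on blocks, `d_same` is zero up to floating-point error, while $\sqrt{n}\,\cdot$`d_diff` equals $\sqrt{2K} \approx 2.449$ in this setup, so balls of radius $Cn^{-1/2}$ around the $K$ distinct rows are indeed disjoint.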
Now note that, with high probability, the number of rows $i$ such that $\| C_i - (WR)_i \| > r$ is at most $cn(\sigma_1 - \sigma_2)^{-2}$, for an arbitrarily small constant $c > 0$. If the statement does not hold, then
\[ \| C - WR \|_F^2 > r^2 \cdot cn(\sigma_1 - \sigma_2)^{-2} \ge C n^{-1} \cdot cn(\sigma_1 - \sigma_2)^{-2} = \Omega\big((\sigma_1 - \sigma_2)^{-2}\big), \]
and we get a contradiction, since $\| C - WR \|_F^2 \le o\big((\sigma_1 - \sigma_2)^{-2}\big)$. Thus, the number of mistakes is at most $cn(\sigma_1 - \sigma_2)^{-2}$, with arbitrarily small constant $c > 0$.

So, for each $v_i \in V(G_n)$, if $c(v_i)$ is the type of $v_i$ and $\hat{c}(v_i)$ is the type of $v_i$ as estimated by applying $K$-means to the top-$K$ eigenspace of the geodesic matrix $D$, we get that, for an arbitrarily small constant $c > 0$,
\[ P\left[ \frac{1}{n} \sum_{i=1}^n \mathbb{1}\big(c(v_i) \ne \hat{c}(v_i)\big) < c(\sigma_1 - \sigma_2)^{-2} \right] \to 1. \]
So, for constant $\sigma_1$ and $\sigma_2$, we can find $c > 0$ such that
\[ P\left[ \frac{1}{n} \sum_{i=1}^n \mathbb{1}\big(c(v_i) \ne \hat{c}(v_i)\big) < \frac{1}{2} \right] \to 1. \]

5 Conclusion

We have given an overview of spectral clustering in the context of community detection in networks. We have also introduced a new method of community detection and shown bounds on its theoretical performance.

References

1. Abbe, E., Bandeira, A.S., Hall, G.: Exact recovery in the stochastic block model. arXiv preprint arXiv:1405.3267 (2014)
2. Amini, A.A., Chen, A., Bickel, P.J., Levina, E.: Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. 41(4), 2097–2122 (2013). DOI 10.1214/13-AOS1138
3. Amini, A.A., Levina, E.: On semidefinite relaxations for the block model. arXiv preprint arXiv:1406.5647 (2014)
4. Athreya, K.B., Ney, P.E.: Branching Processes, vol. 28. Springer-Verlag, Berlin (1972)
5. Bhamidi, S., van der Hofstad, R., Hooghiemstra, G.: First passage percolation on the Erdős–Rényi random graph.
Combinatorics, Probability & Computing 20(5), 683–707 (2011)
6. Bhattacharyya, S., Bickel, P.J.: Community detection in networks using graph distance. arXiv preprint arXiv:1401.3915 (2014)
7. Bickel, P., Choi, D., Chang, X., Zhang, H.: Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist. 41(4), 1922–1943 (2013). DOI 10.1214/13-AOS1124
8. Bickel, P.J., Chen, A.: A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences 106(50), 21068–21073 (2009)
9. Bollobás, B., Janson, S., Riordan, O.: The phase transition in inhomogeneous random graphs. Random Structures & Algorithms 31(1), 3–122 (2007)
10. Bordenave, C., Lelarge, M., Massoulié, L.: Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. arXiv preprint arXiv:1501.06087 (2015)
11. Celisse, A., Daudin, J.J., Pierre, L.: Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Stat. 6, 1847–1899 (2012). DOI 10.1214/12-EJS729
12. Chatelin, F.: Spectral Approximation of Linear Operators. SIAM (1983)
13. Davis, C., Kahan, W.M.: The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis 7(1), 1–46 (1970)
14. Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E 84(6), 066106 (2011)
15. Fiedler, M.: Algebraic connectivity of graphs. Czechoslovak Math. J. 23(98), 298–305 (1973)
16. Floyd, R.W.: Algorithm 97: Shortest path. Communications of the ACM 5(6), 345 (1962)
17. Gao, C., Ma, Z., Zhang, A.Y., Zhou, H.H.
: Achieving optimal misclassification proportion in stochastic block model. arXiv preprint arXiv:1505.03772 (2015)
18. Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99(12), 7821–7826 (2002)
19. Hartigan, J.A.: Clustering Algorithms. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York-London-Sydney (1975)
20. Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: First steps. Social Networks 5(2), 109–137 (1983)
21. Johnson, D.B.: Efficient algorithms for shortest paths in sparse networks. Journal of the ACM 24(1), 1–13 (1977)
22. Kato, T.: Perturbation Theory for Linear Operators, vol. 132. Springer (1995)
23. von Luxburg, U., Belkin, M., Bousquet, O.: Consistency of spectral clustering. Ann. Statist. 36(2), 555–586 (2008). DOI 10.1214/009053607000000640
24. Massoulié, L.: Community detection thresholds and the weak Ramanujan property. In: Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pp. 694–703. ACM (2014)
25. Mode, C.J.: Multitype Branching Processes: Theory and Applications, vol. 34. American Elsevier Pub. Co. (1971)
26. Mossel, E., Neeman, J., Sly, A.: Stochastic block models and reconstruction. arXiv preprint arXiv:1202.1499 (2012)
27. Mossel, E., Neeman, J., Sly, A.: A proof of the block model threshold conjecture. arXiv preprint arXiv:1311.4115 (2013)
28. Ng, A.Y., Jordan, M.I., Weiss, Y., et al.: On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 2, 849–856 (2002)
29. Rohe, K., Chatterjee, S., Yu, B.: Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39(4), 1878–1915 (2011). DOI 10.1214/11-AOS887
30. Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons, Inc., New York (1987). DOI 10.1002/0471725382
31. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
32. Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E.: A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Amer. Statist. Assoc. 107(499), 1119–1128 (2012). DOI 10.1080/01621459.2012.699795
33. von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416 (2007)
34. Warshall, S.: A theorem on Boolean matrices. Journal of the ACM 9(1), 11–12 (1962)