Exponential error rates of SDP for block models: Beyond Grothendieck's inequality

Yingjie Fei and Yudong Chen
School of Operations Research and Information Engineering
Cornell University
{yf275,yudong.chen}@cornell.edu

Abstract

In this paper we consider the cluster estimation problem under the Stochastic Block Model. We show that the semidefinite programming (SDP) formulation for this problem achieves an error rate that decays exponentially in the signal-to-noise ratio. The error bound implies weak recovery in the sparse graph regime with bounded expected degrees, as well as exact recovery in the dense regime. An immediate corollary of our results yields error bounds under the Censored Block Model. Moreover, these error bounds are robust, continuing to hold under heterogeneous edge probabilities and a form of the so-called monotone attack. Significantly, this error rate is achieved by the SDP solution itself without any further pre- or post-processing, and it improves upon existing polynomially decaying error bounds proved using Grothendieck's inequality. Our analysis has two key ingredients: (i) showing that the graph has a well-behaved spectrum, even in the sparse regime, after discounting an exponentially small number of edges, and (ii) an order-statistics argument that governs the final error rate. Both arguments highlight the implicit regularization effect of the SDP formulation.

1 Introduction

In this paper, we consider the cluster/community estimation problem under the Stochastic Block Model (SBM) [33] with a growing number of clusters. In this model, a set of $n$ nodes is partitioned into $k$ unknown clusters of equal size; a random graph is generated by independently connecting each pair of nodes with probability $p$ if they are in the same cluster, and with probability $q$ otherwise.
Given one realization of the graph represented by its adjacency matrix $A \in \{0,1\}^{n \times n}$, the goal is to estimate the underlying clusters. Much recent progress has been made on this problem, particularly in identifying the sharp thresholds for exact/weak recovery when there are a few communities of size linear in $n$. Moving beyond this regime, however, the understanding of the problem is much more limited, especially in characterizing its behaviors with a growing (in $n$) number of clusters with sublinear sizes, and how the cluster errors depend on the signal-to-noise ratio (SNR) in between the exact and weak recovery regimes [1, 44]. We focus on precisely these questions.

Let the ground-truth clusters be encoded by the cluster matrix $Y^* \in \{0,1\}^{n \times n}$ defined by
$$Y^*_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are in the same community, or if } i = j,\\ 0 & \text{if nodes } i \text{ and } j \text{ are in different communities.} \end{cases}$$
(The words cluster and community are used interchangeably in this paper.)

We consider a now standard semidefinite programming (SDP) formulation for estimating the ground-truth $Y^*$:
$$\hat{Y} = \arg\max_{Y \in \mathbb{R}^{n \times n}} \Big\langle Y,\; A - \frac{p+q}{2} J \Big\rangle \quad \text{s.t.} \quad Y \succeq 0,\quad 0 \le Y \le J,\quad Y_{ii} = 1,\ \forall i \in [n], \qquad (1)$$
where $J$ is the $n \times n$ all-one matrix and $\langle \cdot, \cdot \rangle$ denotes the trace inner product. We seek to characterize the accuracy of the SDP solution $\hat{Y}$ as an estimator of the true clustering. Our main focus is the $\ell_1$ error $\|\hat{Y} - Y^*\|_1$, where $\|M\|_1 := \sum_{i,j} |M_{ij}|$ denotes the entry-wise $\ell_1$ norm. This $\ell_1$ error is a natural metric that measures a form of pairwise cluster/link errors. In particular, note that the matrix $Y^*$ represents the pairwise cluster relationships between nodes; an estimator of such is given by the matrix $\hat{Y}^R \in \{0,1\}^{n \times n}$ obtained from rounding $\hat{Y}$ element-wise.
The above $\ell_1$ error satisfies $\|\hat{Y} - Y^*\|_1 \ge |\{(i,j) : \hat{Y}^R_{ij} \ne Y^*_{ij}\}|/2$, and therefore upper bounds the number of pairs whose relationships are incorrectly estimated by the SDP.

In a seminal paper [29], Guédon and Vershynin exhibited a remarkable use of Grothendieck's inequality, and obtained the following high-probability error bound for the solution $\hat{Y}$ of the SDP:
$$\frac{\|\hat{Y} - Y^*\|_1}{\|Y^*\|_1} \lesssim \sqrt{\frac{k^2}{\mathrm{SNR} \cdot n}}, \qquad (2)$$
where $\mathrm{SNR} = (p-q)^2 / \big(\tfrac{1}{k} p + (1 - \tfrac{1}{k}) q\big) \approx (p-q)^2/p$ is a measure of the signal-to-noise ratio. This bound holds even in the sparse graph regime with $p, q = \Theta(1/n)$, manifesting the power of Grothendieck's inequality.

In this paper, we go beyond the above results, and show that $\hat{Y}$ in fact satisfies (with high probability) the following exponentially decaying error bound
$$\frac{\|\hat{Y} - Y^*\|_1}{\|Y^*\|_1} \lesssim \exp\Big( -\Omega\Big( \frac{\mathrm{SNR} \cdot n}{k} \Big) \Big) \qquad (3)$$
as long as $\mathrm{SNR} \gtrsim \frac{k^2}{n}$ (Theorem 1). The bound is valid in both the sparse and dense regimes. Significantly, this error rate is achieved by the SDP (1) itself, without the need of a multi-step procedure, even though we are estimating a discrete structure by solving a continuous optimization problem. In particular, the SDP approach does not require pre-processing of the graph (such as trimming and splitting) or an initial estimate of the clusters, nor any non-trivial post-processing of $\hat{Y}$ (such as local cluster refinement or randomized rounding).

If an explicit clustering of the nodes is concerned, the result above also yields an error bound for estimating $\sigma^*$, the true cluster labels. In particular, an explicit cluster labeling $\hat{\sigma}$ can be obtained efficiently from $\hat{Y}$. Let $\mathrm{err}(\hat{\sigma}, \sigma^*)$ denote the fraction of nodes that are labeled differently by $\hat{\sigma}$ and $\sigma^*$ (up to permutation of the labels).
This mis-classification error can be shown to be upper bounded by the $\ell_1$ error $\|\hat{Y} - Y^*\|_1 / \|Y^*\|_1$, and therefore satisfies the same exponential bound (Theorem 2):
$$\mathrm{err}(\hat{\sigma}, \sigma^*) \lesssim \exp\Big( -\Omega\Big( \frac{\mathrm{SNR} \cdot n}{k} \Big) \Big). \qquad (4)$$

Specialized to different values of the errors, this single error bound (3) implies sufficient conditions for achieving exact recovery (strong consistency), almost exact recovery (weak consistency) and weak recovery; see Section 1.2 for the definitions of these recovery types. More generally, the above bound yields SNR conditions sufficient for achieving any $\delta$ error. As discussed in detail in Section 3.1.1, these conditions are (at least order-wise) optimal, and improve upon existing results, especially when the number of clusters $k$ is allowed to scale with $n$. In addition, we prove that the above guarantees for SDP are robust against deviations from the standard SBM. The same exponential bounds continue to hold in the presence of heterogeneous edge probabilities as well as a form of monotone attack where an adversary can modify the graph (Theorem 3). We also show that our results readily extend to the Censored Block Model, in which only partially observed data is available (Corollary 1).

In addition to improved error bounds, our results also involve the development of several new analytical techniques, as discussed below. We expect these techniques to be more broadly useful in the analysis of SDP and other algorithms for SBM and related statistical problems.

1.1 Technical highlights

Our analysis of the SDP formulation builds on two key ingredients. The first argument involves showing that the graph can be partitioned into two components, one with a well-behaved spectrum, and the other with an exponentially small number of edges; cf. Proposition 2.
Note that this partitioning is done in the analysis, rather than in the algorithm. It ensures that the SDP produces a useful solution all the way down to the sparse regime with $p, q = \Theta(\frac{1}{n})$. The second ingredient is an order-statistics argument that characterizes the interplay between the error matrix and the randomness in the graph; cf. Proposition 1. Upper bounds on the sum of the top order statistics are what ultimately dictate the exponential decay of the error. In both arguments, we make crucial use of the entry-wise boundedness of the SDP solution $\hat{Y}$, which is a manifestation of the implicit regularization effect of the SDP formulation.

Our results are non-asymptotic in nature, valid for finite values of $n$; letting $n \to \infty$ gives asymptotic results. All other parameters $p$, $q$ and $k$ are allowed to scale arbitrarily with $n$. In particular, the number of clusters $k$ may grow with $n$, the clusters may have size sublinear in $n$, and the edge probabilities $p$ and $q$ may range from the sparse case $\Theta(\frac{1}{n})$ to the densest case $\Theta(1)$. Our results therefore provide a general characterization of the relationship between the SNR, the cluster sizes and the recovery errors. This is particularly important in the regime of sublinear cluster sizes, in which case all values of $p$ and $q$ are of interest. The price of such generality is that we do not seek to obtain optimal values of the multiplicative constants in the error bounds, doing which typically requires asymptotic analysis with scaling restrictions on the parameters. In this sense, our results complement recent work on the fundamental limits and sharp recovery thresholds of SBM [1].

1.2 Related work

The SBM [33, 13], also known as the planted partition model in the computer science community, is a standard model for studying community detection and graph clustering.
There is a large body of work on the theoretical and algorithmic aspects of this model; see for example [20, 1, 45, 6] and the references therein. Here we only briefly discuss the most relevant work, and defer to Section 3 for a more detailed comparison after stating our main theorems.

Existing work distinguishes between several types of recovery [1, 23], including: (a) weak recovery, where the fraction of mis-clustered nodes satisfies $\mathrm{err}(\hat{\sigma}, \sigma^*) < 1 - \frac{1}{k}$ and is hence better than random guessing; (b) almost exact recovery (weak consistency), where $\mathrm{err}(\hat{\sigma}, \sigma^*) = o(1)$; (c) exact recovery (strong consistency), where $\mathrm{err}(\hat{\sigma}, \sigma^*) = 0$.

The SDP relaxation approach to SBM has been studied in [10, 9, 18, 20, 49, 36, 17, 19], which mostly focuses on exact recovery in the logarithmic-degree regime $p \gtrsim \frac{\log n}{n}$. Using Grothendieck's inequality, the work in [29] proves for the first time that SDP achieves a non-trivial error bound in the sparse regime $p \asymp \frac{1}{n}$ with bounded expected degrees. In the two-cluster case, it is further shown in [43] that SDP in fact achieves the optimal weak recovery threshold as long as the expected degree is large (but still bounded). Our single error bound implies exact and weak recovery in the logarithmic and bounded degree regimes, respectively. Our result in fact goes beyond these existing ones and applies to every setting in between the two extreme regimes, capturing the exponential decay of error rates from $O(1)$ to zero.

A very recent line of research aims to precisely characterize the fundamental limits and phase transition behaviors of SBM; in particular, what are the sharp SNR thresholds (including the leading constants) for achieving the different recovery types discussed above. When the number $k$ of clusters is bounded, many of these questions now have satisfactory answers.
Without exhausting this still growing line of remarkable work, we would like to refer to the papers [45, 41, 5, 46, 47] for weak recovery, [48, 4, 10, 26, 55] for almost exact recovery, and [48, 3, 4] for exact recovery. SDP has in fact been shown to achieve the optimal exact recovery threshold [31, 50, 11, 7]. Our results imply sufficient conditions for SDP achieving these various types of recovery, and moreover interpolate between them. As mentioned, we are mostly concerned with the non-asymptotic setting with a growing number of clusters, without attempting to optimize the values of the leading constants. Therefore, our results focus on somewhat different regimes from the work above.

Particularly relevant to us is the work in [21, 55, 56, 4, 26, 29, 40], which provides explicit bounds on the error rates of other algorithms for estimating the ground-truth clustering in SBM. The Censored Block Model is studied in the papers [2, 30, 32, 21, 37, 53]. Robustness issues in SBM are considered in the work in [15, 25, 24, 36, 50, 51, 22, 42, 40]. We discuss these results in more detail in Section 3.

1.3 Notations

Column vectors are denoted by lower-case bold letters such as $u$, where $u_i$ is its $i$-th entry. Matrices are denoted by bold capital letters such as $M$, with $M^\top$ denoting the transpose of $M$, $\mathrm{Tr}(M)$ its trace, $M_{ij}$ its $(i,j)$-th entry, and $\mathrm{diag}(M)$ the vector of its diagonal entries. For a matrix $M$, $\|M\|_1 := \sum_{i,j} |M_{ij}|$ is its entry-wise $\ell_1$ norm, $\|M\|_\infty := \max_{i,j} |M_{ij}|$ the entry-wise $\ell_\infty$ norm, and $\|M\|_{\mathrm{op}}$ the spectral norm (the maximum singular value). Denote by $M_{i\bullet}$ the $i$-th row of the matrix $M$ and $M_{\bullet j}$ its $j$-th column. We write $M \succeq 0$ if $M$ is symmetric and positive semidefinite.
With another matrix $G$ of the same dimension as $M$, we let $\langle M, G \rangle := \mathrm{Tr}(M^\top G)$ denote their trace inner product, and use $M \ge G$ to mean that $M_{ij} \ge G_{ij}$ for all $i, j \in [n]$. Let $I$ and $J$ be the $n \times n$ identity matrix and all-one matrix, respectively, and $\mathbf{1}$ the all-one column vector of length $n$. We use $\mathrm{Bern}(\mu)$ to denote the Bernoulli distribution with rate $\mu \in [0,1]$. For a positive integer $i$, let $[i] := \{1, 2, \ldots, i\}$. For a real number $x$, $\lceil x \rceil$ denotes its ceiling. Throughout, a universal constant $C$ means a fixed number that is independent of the model parameters ($n$, $k$, $p$, $q$, etc.) and the graph distribution. We use the following standard notations for order comparison of two non-negative sequences $\{a_n\}$ and $\{b_n\}$: We write $a_n = O(b_n)$, $b_n = \Omega(a_n)$ or $a_n \lesssim b_n$ if there exists a universal constant $C > 0$ such that $a_n \le C b_n$ for all $n$. We write $a_n = \Theta(b_n)$ or $a_n \asymp b_n$ if $a_n = O(b_n)$ and $a_n = \Omega(b_n)$. We write $a_n = o(b_n)$ or $b_n = \omega(a_n)$ if $\lim_{n\to\infty} a_n / b_n = 0$.

2 Problem setup

In this section, we formally set up the problem of cluster estimation under SBM and describe the SDP approach.

2.1 The Stochastic Block Model

Given $n$ nodes, we assume that each node belongs to exactly one of $k$ ground-truth clusters, where the clusters have equal size. This ground truth is encoded in the cluster matrix $Y^* \in \{0,1\}^{n \times n}$ as defined in Section 1. We do not know $Y^*$, but we observe the adjacency matrix $A$ of a graph generated from the following Stochastic Block Model (SBM).

Model 1 (Standard Stochastic Block Model). The graph adjacency matrix $A \in \{0,1\}^{n \times n}$ is symmetric with its entries $\{A_{ij}, i < j\}$ generated independently by
$$A_{ij} \sim \begin{cases} \mathrm{Bern}(p) & \text{if } Y^*_{ij} = 1,\\ \mathrm{Bern}(q) & \text{if } Y^*_{ij} = 0, \end{cases}$$
where $0 \le q < p \le 1$.
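Model 1 is straightforward to simulate. The sketch below (our illustration; the function name `sample_sbm` is ours) draws $(A, Y^*, \sigma^*)$ for $k$ equal-size clusters, with the diagonal of $A$ set to $\mathrm{Bern}(p)$ as in the convention adopted in the text:

```python
import numpy as np

def sample_sbm(n, k, p, q, rng=None):
    """Sample (A, Ystar, sigma_star) from Model 1 with k equal-size
    clusters. Requires k to divide n."""
    rng = np.random.default_rng(rng)
    assert n % k == 0, "equal-size clusters require k | n"
    sigma = np.repeat(np.arange(k), n // k)              # ground-truth labels
    Ystar = (sigma[:, None] == sigma[None, :]).astype(int)
    probs = np.where(Ystar == 1, p, q)                   # entry-wise edge rates
    A = (rng.random((n, n)) < probs).astype(int)
    A = np.triu(A, 1)                                    # keep the i < j draws
    A = A + A.T                                          # symmetrize
    np.fill_diagonal(A, (rng.random(n) < p).astype(int)) # diagonal ~ Bern(p)
    return A, Ystar, sigma
```

This convention for the diagonal is inconsequential for the SDP, as explained next.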
The values of the diagonal entries of $A$ are inconsequential for the SDP formulation (1) due to the constraint $Y_{ii} = 1, \forall i$. Therefore, we assume without loss of generality that $A_{ii} \sim \mathrm{Bern}(p)$ independently for all $i \in [n]$, which simplifies the presentation of the analysis.

Our goal is to estimate $Y^*$ given the observed graph $A$. Let $\sigma^* \in [k]^n$ be the vector of ground-truth cluster labels, where $\sigma^*_i$ is the index of the cluster that contains node $i$. (The cluster labels are unique only up to permutation; here $\sigma^*$ is defined with respect to an arbitrary permutation.) Playing a crucial role in our results is the quantity
$$s := \frac{(p-q)^2}{\frac{1}{k} p + (1 - \frac{1}{k}) q}, \qquad (5)$$
which is a measure of the SNR of the model. In particular, the numerator of $s$ is the squared expected difference between the in- and cross-cluster edge probabilities, and the denominator is essentially the average variance of the entries of $A$. The quantity $s$ has been shown to capture the hardness of SBM, and defines the celebrated Kesten-Stigum threshold [47]. To avoid cluttered notation, we assume throughout the paper that $n \ge 4$, $2 \le k < n$ and there exists a universal constant $0 < c < 1$ such that $q \ge c p$; this setting encompasses most interesting regimes of the problem, as clustering is more challenging when $q$ is large.

2.2 Semidefinite programming relaxation

We consider the SDP formulation in (1), whose optimal solution $\hat{Y}$ serves as an estimator of the ground-truth cluster matrix $Y^*$. This SDP can be interpreted as a convex relaxation of the maximum likelihood estimator, the modularity maximization problem, the optimal subgraph/cut problem, or a variant of the robust/sparse PCA problem; see [10, 13, 18, 20, 17] for such derivations. Our goal is to study the recovery error of $\hat{Y}$ in terms of the number of nodes $n$, the number of clusters $k$ and the SNR measure $s$ defined above.
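The SNR measure $s$ in (5) is simple to evaluate numerically. The sketch below (our illustration, not part of the paper) computes $s$ in the sparse bounded-degree regime and contrasts the order-wise shapes of the polynomial bound (2) and the exponential bound (3), with all unspecified constants set to 1:

```python
import math

def snr(p, q, k):
    """SNR measure s from equation (5): (p - q)^2 / (p/k + (1 - 1/k) q)."""
    return (p - q) ** 2 / (p / k + (1 - 1 / k) * q)

# Sparse regime with bounded expected degrees: p = a/n, q = b/n.
n, k = 10_000, 2
s = snr(10 / n, 2 / n, k)
# Order-wise comparison only; constants are set to 1 for illustration.
poly_rate = math.sqrt(k ** 2 / (s * n))   # shape of the bound (2)
exp_rate = math.exp(-s * n / k)           # shape of the bound (3)
```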
Note that there is nothing special about the particular formulation in (1). All our results apply to, for example, the alternative SDP formulation below:
$$\hat{Y} = \arg\max_{Y \in \mathbb{R}^{n \times n}} \langle Y, A \rangle \quad \text{s.t.} \quad Y \succeq 0,\quad 0 \le Y \le J,\quad \sum_{i,j=1}^n Y_{ij} = \sum_{i,j=1}^n Y^*_{ij},\quad Y_{ii} = 1,\ \forall i \in [n]. \qquad (6)$$
This formulation was previously considered in [20]. We may also replace the third constraint above with the row-wise constraints $\sum_{j=1}^n Y_{ij} = \sum_{j=1}^n Y^*_{ij}, \forall i \in [n]$, akin to the formulation in [10] motivated by the weak assortative SBM. Under the standard assumption of equal-sized clusters, the values $\sum_{i,j=1}^n Y^*_{ij} = n^2/k$ and $\sum_{j=1}^n Y^*_{ij} = n/k$ are known. Therefore, the formulation (6) has the advantage that it does not require knowledge of the edge probabilities $p$ and $q$, but instead the number of clusters $k$.

The optimization problems in (1) and (6) can be solved in polynomial time using any general-purpose SDP solvers or first-order algorithms. Moreover, this SDP approach continues to motivate, and benefit from, the rapid development of efficient algorithms for solving structured SDPs. For example, the algorithms considered in [34, 51] can solve a problem involving $n = 10^5$ nodes within seconds on a laptop. In addition to computational efficiency, the SDP approach also enjoys several other desired properties including robustness, conceptual simplicity and applicability to sparse graphs, making it an attractive option among other clustering and community detection algorithms. The empirical performance of SDP has been extensively studied, both under SBM and with real data; see for example the work in [10, 17, 19, 34, 51]. Here we focus on the theoretical guarantees of this SDP approach.

2.3 Explicit clustering by k-medians

After solving the SDP formulations (1) or (6), the cluster membership can be extracted from the solution $\hat{Y}$.
This can be done using many simple procedures. For example, when $\|\hat{Y} - Y^*\|_1 < \frac{1}{2}$, simply rounding the entries of $\hat{Y}$ will exactly recover $Y^*$, from which the true clusters can be extracted easily. In the case with $k = 2$ clusters, one may use the signs of the entries of the first eigenvector of $\hat{Y} - \frac{1}{2} J$, a procedure analyzed in [29, 43]. More generally, our theoretical results guarantee that the SDP solution $\hat{Y}$ is already close to the true cluster matrix $Y^*$; in this case, we expect that many local rounding/refinement procedures, such as Lloyd's-style greedy algorithms [39], will be able to extract a high-quality clustering.

For the sake of retaining focus on the SDP formulation, we choose not to separately analyze these possible extraction procedures, but instead consider a more unified approach. In particular, we view the rows of $\hat{Y}$ as $n$ points in $\mathbb{R}^n$, and apply $k$-medians clustering to them to extract the clusters. While exactly solving the $k$-medians problem is computationally hard, there exist polynomial-time constant-factor approximation schemes, such as the $6\frac{2}{3}$-approximation algorithm in [16], which suffices for our purpose. Note that this algorithm may not be the most efficient way to extract an explicit clustering from $\hat{Y}$; rather, it is intended as a simple venue for deriving a clustering error bound that can be readily compared with existing results.

Formally, we use $\rho\text{-kmed}(\hat{Y})$ to denote a $\rho$-approximate $k$-medians procedure applied to the rows of $\hat{Y}$; the details are provided in Section A. The output $\hat{\sigma} := \rho\text{-kmed}(\hat{Y})$ is a vector in $[k]^n$ such that node $i$ is assigned to the $\hat{\sigma}_i$-th cluster by the procedure. We are interested in bounding the clustering error of $\hat{\sigma}$ relative to the ground truth $\sigma^*$.
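The clustering error we bound, formalized in (7) below, minimizes the fraction of disagreements over all permutations of the $k$ labels. For small $k$ it can be computed by brute force, as in this sketch of ours (labels taken as $0, \ldots, k-1$ here):

```python
from itertools import permutations

def clustering_err(sigma_hat, sigma_star, k):
    """err(sigma_hat, sigma_star): fraction of misclassified nodes,
    minimized over all permutations pi of the k labels.
    Brute force over k! permutations; fine for small k."""
    n = len(sigma_hat)
    best = n
    for pi in permutations(range(k)):
        mismatches = sum(1 for i in range(n)
                         if sigma_hat[i] != pi[sigma_star[i]])
        best = min(best, mismatches)
    return best / n
```

For example, a labeling that differs from the truth only by a global swap of the two labels has error 0.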
Let $S_k$ denote the symmetric group consisting of all permutations of $[k]$; we consider the metric
$$\mathrm{err}(\hat{\sigma}, \sigma^*) := \min_{\pi \in S_k} \frac{1}{n} \Big| \big\{ i \in [n] : \hat{\sigma}_i \ne \pi(\sigma^*_i) \big\} \Big|, \qquad (7)$$
which is the proportion of nodes that are mis-classified, modulo permutations of the cluster labels.

Before proceeding, we briefly mention several possible extensions of the setting discussed above. The number $\frac{p+q}{2}$ in the SDP (1) can be replaced by a tuning parameter $\lambda$; as would become evident from the proof, our theoretical results in fact hold for an entire range of $\lambda$ values, for example $\lambda \in [\frac{1}{4} p + \frac{3}{4} q, \frac{3}{4} p + \frac{1}{4} q]$. Our theory also generalizes to the setting with unequal cluster sizes; in this case the same theoretical guarantees hold with $k$ replaced by $n/\ell$, where $\ell$ is any lower bound on the cluster sizes. (Note that the constraint $Y \le J$ in the formulations (1) and (6) is in fact redundant, as it is implied by the constraints $Y \succeq 0$ and $Y_{ii} = 1, \forall i$. We still keep this constraint, as the property $\hat{Y} \le J$ plays a crucial role in our analysis.)

3 Main results

We present in Section 3.1 our main theorems, which provide exponentially decaying error bounds for the SDP formulation under SBM. We also discuss the consequences of our results, including their implications for robustness in Section 3.2 and applications to the Censored Block Model in Section 3.3. In the sequel, $\hat{Y}$ denotes any optimal solution to the SDP formulation in either (1) or (6).

3.1 Error rates under standard SBM

In this section, we consider the standard SBM setting in Model 1. Recall that $n$ and $k$ are respectively the numbers of nodes and clusters, and $Y^*$ is the ground-truth cluster matrix defined in Section 1, with $\sigma^*$ being the corresponding vector of true cluster labels. Our results are stated in terms of the SNR measure $s$ given in equation (5).
The first theorem, proved in Section 4, shows that the SDP solution $\hat{Y}$ achieves an exponential error rate.

Theorem 1 (Exponential Error Rate). Under Model 1, there exist universal constants $C_s, C_g, C_e > 0$ for which the following holds. If $s \ge C_s k^2 / n$, then we have
$$\frac{\|\hat{Y} - Y^*\|_1}{\|Y^*\|_1} \le C_g \exp\Big[ -\frac{sn}{C_e k} \Big]$$
with probability at least $1 - 7n^{-7} - e^{-\Omega(\sqrt{n})}$.

Our next result concerns the explicit clustering $\hat{\sigma} := \rho\text{-kmed}(\hat{Y})$ extracted from $\hat{Y}$ using the approximate $k$-medians procedure given in Section 2.3, where $\rho = 6\frac{2}{3}$. As we show in the proof of the following theorem, the error rate of $\hat{\sigma}$ is always upper bounded by the error of $\hat{Y}$:
$$\mathrm{err}(\hat{\sigma}, \sigma^*) \le \frac{86}{3} \cdot \frac{\|\hat{Y} - Y^*\|_1}{\|Y^*\|_1};$$
cf. Proposition 3. Consequently, the number of misclassified nodes also exhibits an exponential decay.

Theorem 2 (Clustering Error). Under Model 1, there exist universal constants $C_s, C_m, C_e > 0$ for which the following holds. If $s \ge C_s k^2 / n$, then we have
$$\mathrm{err}(\hat{\sigma}, \sigma^*) \le C_m \exp\Big[ -\frac{sn}{C_e k} \Big]$$
with probability at least $1 - 7n^{-7} - e^{-\Omega(\sqrt{n})}$.

We prove this theorem in Section D. Theorems 1 and 2 are applicable in the sparse graph regime with bounded expected degrees. For example, suppose that $k = 2$, $p = \frac{a}{n}$ and $q = \frac{b}{n}$ for two constants $a, b$; the results above guarantee a non-trivial accuracy for SDP (i.e., $\|\hat{Y} - Y^*\|_1 < \frac{1}{2} \|Y^*\|_1$ or $\mathrm{err}(\hat{\sigma}, \sigma^*) < \frac{1}{2}$) as long as $(a - b)^2 \ge C(a + b)$ for some constant $C$. Another interesting regime that our results apply to is when there is a large number of clusters. For example, for any constant $\epsilon \in (0, \frac{1}{2})$, if $k = n^{1/2 - \epsilon}$ and $p = 2q$, then SDP achieves exact recovery ($\mathrm{err}(\hat{\sigma}, \sigma^*) < \frac{1}{n}$) provided that $p \gtrsim n^{-2\epsilon}$. Below we provide additional discussion of our results, and compare with existing ones.
3.1.1 Consequences and Optimality

Theorems 1 and 2 immediately imply sufficient conditions for the various recovery types discussed in Section 1.2.

• Exact recovery (strong consistency): When $s \gtrsim \frac{k^2 + k \log n}{n}$, Theorem 1 guarantees that $\|\hat{Y} - Y^*\|_1 < \frac{1}{2}$ with high probability, in which case element-wise rounding of $\hat{Y}$ exactly recovers the true cluster matrix $Y^*$. This result matches the best known exact recovery guarantees for SDP (and other polynomial-time algorithms) when $k$ is allowed to grow with $n$; see [20, 10] for a review of these results.

• Almost exact recovery (weak consistency): Under the condition $s = \omega(\frac{k^2}{n})$, Theorem 2 ensures that $\mathrm{err}(\hat{\sigma}, \sigma^*) = o(1)$ with high probability as $n \to \infty$, hence SDP achieves weak consistency. This condition is optimal (necessary and sufficient), as has been proved in [48, 4].

• Weak recovery: When $s \gtrsim \frac{k^2}{n}$, Theorem 2 ensures that $\mathrm{err}(\hat{\sigma}, \sigma^*) < 1 - \frac{1}{k}$ with high probability, hence SDP achieves weak recovery. In particular, in the setting with $k = 2$ clusters, SDP recovers a clustering that is positively correlated with the ground truth under the condition $s \gtrsim \frac{1}{n}$. This condition matches, up to constants, the so-called Kesten-Stigum (KS) threshold $s > \frac{1}{n}$, which is known to be optimal [45, 41, 5, 46, 47].

• Recovery with $\delta$ error: More generally, for any number $\delta \in (0, 1)$, Theorem 2 implies that if $s \gtrsim \max\big\{ \frac{k^2}{n}, \frac{k}{n} \log \frac{1}{\delta} \big\}$, then $\mathrm{err}(\hat{\sigma}, \sigma^*) < \delta$ with high probability. In the case with $k = 2$, the minimax rate result in [26] implies that $s \gtrsim \frac{1}{n} \log \frac{1}{\delta}$ is necessary for any algorithm to achieve a $\delta$ clustering error. Our results are thus optimal up to a multiplicative constant.

Our results therefore cover these different recovery regimes by a unified error bound, using a single algorithm.
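The order-wise SNR conditions above can be tabulated mechanically. The sketch below is our illustration only: the theorems leave the universal constants unspecified, so we set them all to 1, and the function name `snr_thresholds` is ours:

```python
import math

def snr_thresholds(n, k, delta=None):
    """Order-wise SNR thresholds from Section 3.1.1, with every
    unspecified universal constant set to 1 (illustration only)."""
    t = {
        "exact recovery": (k ** 2 + k * math.log(n)) / n,
        "weak recovery": k ** 2 / n,
    }
    if delta is not None:
        # delta-error condition: s >~ max{k^2/n, (k/n) log(1/delta)}
        t["delta error"] = max(k ** 2 / n, (k / n) * math.log(1 / delta))
    return t
```

As expected, the exact recovery threshold dominates the weak recovery one, and the $\delta$-error threshold interpolates between them as $\delta$ shrinks.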
This can be contrasted with the previous error bound (2) proved using the Grothendieck's-inequality approach, which fails to identify the exact recovery condition above. In particular, the bound (2) decays polynomially with the SNR measure $s$; since $s$ is at most $k$ and $\|Y^*\|_1 = n^2/k$, the smallest possible error that can be derived from this bound is $\|\hat{Y} - Y^*\|_1 = O(\sqrt{n^3/k})$.

Our results apply to general values of $k$, which is allowed to scale with $n$, hence the size of the clusters can be sublinear in $n$. We note that in this regime, a computational-barrier phenomenon seems to take place: there may exist instances of SBM in which cluster recovery is information-theoretically possible but cannot be achieved by computationally efficient algorithms. For example, the work in [20] proves that the intractable maximum likelihood estimator succeeds in exact recovery when $s \gtrsim \frac{k \log n}{n}$; it also provides evidence suggesting that all efficient algorithms fail unless $s \gtrsim \frac{k^2 + k \log n}{n}$. Note that the latter is consistent with the condition derived above from our theorems. (In fact, a simple modification of our analysis proves that $\hat{Y} = Y^*$ under the exact recovery condition above. We omit the details of such refinement for the sake of a more streamlined presentation.)

The above discussion has the following implications for the optimality of Theorems 1 and 2. On the one hand, the general minimax rate result in [26] suggests that all algorithms (regardless of their computational complexity) incur at least $\exp[-\Theta(sn/k)]$ error. Our exponential error rate matches this information-theoretic lower bound. On the other hand, in view of the computational barrier discussed in the last paragraph, our SNR condition $s \gtrsim k^2/n$ is likely to be unimprovable if efficient algorithms are considered.
3.1.2 Comparison with existing results

We discuss some prior work that also provides efficient algorithms attaining an exponentially decaying rate for the clustering error $\mathrm{err}(\hat{\sigma}, \sigma^*)$. To be clear, these algorithms are very different from ours, often involving a two-step procedure that first computes an accurate initial estimate (typically by spectral clustering), followed by a "clean-up" process to obtain the final solution. Some of them require additional steps of sample splitting and graph trimming/regularization. As we discuss in Section 3.2 below, many of these procedures rely on delicate properties of the standard SBM, and therefore are not robust against model deviations.

Most relevant to us is the work in [21], which develops a spectral algorithm with sample splitting. As stated in their main theorem, their algorithm achieves the error rate $\exp\big[-\Omega(sn/k^2)\big]$ when $s \gtrsim k^2/n$, as long as $k$ is a fixed constant when $n \to \infty$. The work in [55] and [56] also considers spectral algorithms, which attain exponential error rates assuming that $k$ is a constant and $pn \to \infty$. The algorithms in [26, 27] involve obtaining an initial clustering using spectral algorithms, which requires $s \gg k^3/n$; a post-processing step (e.g., using a Lloyd's-style algorithm [39]) then outputs a final solution that asymptotically achieves the minimax error rate $\exp[-I^* \cdot n/k]$, where $I^*$ is an appropriate form of Rényi divergence and satisfies $I^* \asymp s$. The work in [4] proposes an efficient algorithm called Sphere Comparison, which achieves an exponential error rate in the constant-degree regime $p = \Theta(1/n)$ when $s \ge k^2/n$.
The work [40] uses SDP to produce an initial clustering solution to be fed to another clustering algorithm; their analysis extends the techniques in [29] to the setting with corrupted observations, and their overall algorithm attains an exponential error rate assuming that $s \gtrsim k^4/n$.

3.2 Robustness

Compared to other clustering algorithms, one notable advantage of the SDP approach lies in its robustness under various challenging settings of SBM. For example, standard spectral clustering is known to be inconsistent in the sparse graph regime with $p, q = O(1/n)$ due to the existence of atypical node degrees, and alleviating this difficulty generally requires sophisticated algorithmic techniques. In contrast, as shown in Theorem 1 as well as other recent work [43, 29, 17], the SDP approach is applicable without change to this sparse regime. SDP is also robust against the existence of $o(n)$ outlier nodes and/or edge modifications, while standard spectral clustering is fairly fragile in these settings [15, 50, 42, 51, 43, 40]. Here we focus on another remarkable form of robustness enjoyed by SDP with respect to heterogeneous edge probabilities and monotone attack, captured in the following generalization of the standard SBM.

Model 2 (Heterogeneous Stochastic Block Model). Given the ground-truth clustering $\sigma^*$ (encoded in the cluster matrix $Y^*$), the entries $\{A_{ij}, i < j\}$ of the graph adjacency matrix $A$ are generated independently, where
$$A_{ij} \text{ is Bernoulli with rate at least } p \text{ if } Y^*_{ij} = 1, \qquad A_{ij} \text{ is Bernoulli with rate at most } q \text{ if } Y^*_{ij} = 0,$$
and $0 \le q < p \le 1$.

The above model imposes no constraint on the edge probabilities besides the upper/lower bounds, and in particular the probabilities can be non-uniform.
Model 2 encompasses a variant of the so-called monotone attack studied extensively in the computer science literature [24, 36, 22]: here an adversary can arbitrarily set some edge probabilities to 1 or 0, which is equivalent to adding edges to nodes in the same cluster and removing edges across clusters.⁴ Note that the adversary can make far more than $o(n)$ edge modifications ($O(n^2)$, to be precise), albeit in a restricted way that seems to strengthen the clustering structure (hence the name). Monotone attack, however, does not necessarily make the clustering problem easier. On the contrary, the adversary can significantly alter some predictable structures that arise in the standard SBM (such as the graph spectrum, node degrees, subgraph counts and the non-existence of dense spots [24]), and hence foil algorithms that over-exploit such structures. For example, some spectral algorithms provably fail in this setting [22, 51]. More generally, Model 2 allows for unpredictable, non-random deviations (not necessarily due to an adversary) from the standard SBM setting, whose statistical properties are rarely possessed by real-world graphs.

It is straightforward to show that when exact recovery is concerned, SDP is unaffected by the heterogeneity in Model 2; see [24, 31, 19]. The following theorem, proved in Section 5, shows that SDP in fact achieves the same exponential error rates in the presence of heterogeneity.

Theorem 3 (Robustness). The conclusions in Theorems 1 and 2 continue to hold under Model 2. Consequently, under the same conditions discussed in Section 3.1.1, the SDP approach achieves exact recovery, almost exact recovery, weak recovery and a $\delta$-error in the more general Model 2.
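A minimal sketch of such an attack, assuming the adversary flips a random subset of pairs; the fraction parameter and all names are ours:

```python
import numpy as np

# Sketch of a monotone attack: the adversary may add in-cluster edges and
# delete cross-cluster edges, which only appears to strengthen the clustering.
def monotone_attack(A, sigma, attack_fraction, rng):
    A = A.copy()
    n = len(sigma)
    same = (sigma[:, None] == sigma[None, :])
    iu = np.triu_indices(n, 1)
    mask = rng.random(len(iu[0])) < attack_fraction    # pairs the adversary touches
    for i, j, hit in zip(iu[0], iu[1], mask):
        if hit:
            A[i, j] = A[j, i] = 1 if same[i, j] else 0
    return A

rng = np.random.default_rng(1)
sigma = np.repeat(np.arange(2), 10)
A = (rng.random((20, 20)) < 0.3).astype(int)
A = np.triu(A, 1)
A = A + A.T
B = monotone_attack(A, sigma, attack_fraction=0.5, rng=rng)
same = (sigma[:, None] == sigma[None, :])
# Monotonicity: in-cluster edges only added, cross-cluster edges only removed.
print(((B - A)[same] >= 0).all(), ((B - A)[~same] <= 0).all())
```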
As a passing note, the results in [42] show that when exact constant values are concerned, the optimal weak recovery threshold changes in the presence of monotone attack, and there may exist a fundamental tradeoff between optimal recovery in the standard SBM and robustness against model deviation.

3.3 Censored block model

The Censored Block Model [2] is a variant of the standard SBM that represents the scenario with partially observed data, akin to the settings of matrix completion [35] and graph clustering with measurement budgets [20]. In this section, we show that Theorems 1 and 2 immediately yield recovery guarantees for the SDP formulation under this model. Concretely, again assume a ground-truth set of $k$ equal-size clusters over $n$ nodes, with the corresponding label vector $\sigma^* \in [k]^n$. These clusters can be encoded by the cluster matrix $Y^* \in \{0,1\}^{n\times n}$ as defined in Section 1, but it is more convenient to work with its $\pm 1$ version $2Y^* - J$.

⁴ We do note that here the addition/removal of edges is determined before the realization of the random edge connections, which is more restrictive than the standard monotone attack model. We believe this restriction is an artifact of the analysis, and leave further improvements to future work.

Under the Censored Block Model, one observes the entries of $2Y^* - J$ restricted to the edges of an Erdős-Rényi graph $G(n, \alpha)$, but with each entry flipped with probability $\epsilon < \frac12$. The model is described formally below.

Model 3 (Censored Block Model). The observed matrix $Z \in \{-1, 0, 1\}^{n\times n}$ is symmetric and has entries generated independently across all $i < j$ with
$$Z_{ij} = \begin{cases} 0 & \text{with probability } 1 - \alpha, \\ 2Y^*_{ij} - 1 & \text{with probability } \alpha(1-\epsilon), \\ -(2Y^*_{ij} - 1) & \text{with probability } \alpha\epsilon, \end{cases}$$
where $0 < \alpha \le 1$ and $0 < \epsilon < \frac12$. The goal is again to recover $Y^*$ (equivalently $2Y^* - J$), given the observed matrix $Z$.
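A sketch generating an observation matrix $Z$ according to Model 3; the function and variable names are ours:

```python
import numpy as np

# Sketch of Model 3: observe entries of the +/-1 matrix 2Y* - J on an
# Erdos-Renyi mask, with each observed entry flipped with probability eps.
def censored_block_model(Ystar, alpha, eps, rng):
    n = Ystar.shape[0]
    S = 2 * Ystar - 1                                   # the +/-1 matrix 2Y* - J
    observed = rng.random((n, n)) < alpha               # Erdos-Renyi observation mask
    flip = np.where(rng.random((n, n)) < eps, -1, 1)    # sign flip with probability eps
    Z = np.triu(observed * flip * S, 1)                 # generate entries for i < j
    return Z + Z.T

rng = np.random.default_rng(2)
sigma = np.repeat(np.arange(2), 10)
Ystar = (sigma[:, None] == sigma[None, :]).astype(int)
Z = censored_block_model(Ystar, alpha=0.5, eps=0.1, rng=rng)
print(sorted(np.unique(Z)))
```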
One may reduce this problem to the standard SBM by constructing an adjacency matrix $A \in \{0,1\}^{n\times n}$ with $A_{ij} = |Z_{ij}| \cdot (Z_{ij} + 1)/2$; that is, we zero out the unobserved entries in the binary representation $(Z + J)/2$ of $Z$. The upper-triangular entries of $A$ are independent Bernoulli variables with
$$\mathbb{P}\left\{A_{ij} = 1\right\} = \mathbb{P}\left\{Z_{ij} = 1\right\} = \begin{cases} \alpha(1-\epsilon) & \text{if } Y^*_{ij} = 1, \\ \alpha\epsilon & \text{if } Y^*_{ij} = 0. \end{cases}$$
Therefore, the matrix $A$ can be viewed as generated from the standard SBM (Model 1) with $p = \alpha(1-\epsilon)$ and $q = \alpha\epsilon$. We can then obtain an estimate $\hat Y$ of $Y^*$ by solving the SDP formulation (1) or (6) with $A$ as the input, possibly followed by the approximate $k$-medians procedure to get an explicit clustering $\hat\sigma$. The error rates of $\hat Y$ and $\hat\sigma$ can be derived as a corollary of Theorems 1 and 2.

Corollary 1 (Censored Block Model). Under Model 3, there exist universal constants $C_s, C_g, C_e > 0$ for which the following holds. If $\alpha(1-2\epsilon)^2 \ge C_s \frac{k^2}{n}$, then³
$$\mathrm{err}(\hat\sigma, \sigma^*) \le \frac{\|\hat Y - Y^*\|_1}{\|Y^*\|_1} \le C_g \exp\left[-\frac{\alpha(1-2\epsilon)^2 \cdot n}{C_e k}\right]$$
with probability at least $1 - \frac{7}{n} - 7e^{-\Omega(\sqrt n)}$.

Specializing this corollary to the different types of recovery defined in Section 1.2, we immediately obtain the following sufficient conditions for SDP under the Censored Block Model: (a) exact recovery is achieved when $\alpha \gtrsim \frac{\max\{k\log n,\, k^2\}}{n(1-2\epsilon)^2}$; (b) almost exact recovery is achieved when $\alpha = \omega\!\left(\frac{k^2}{n(1-2\epsilon)^2}\right)$; (c) weak recovery is achieved when $\alpha \gtrsim \frac{k^2}{n(1-2\epsilon)^2}$; (d) a $\delta$ clustering error is achieved when $\alpha \gtrsim \frac{\max\{k^2,\, k\log(1/\delta)\}}{n(1-2\epsilon)^2}$.

Several existing results focus on the Censored Block Model with $k = 2$ clusters in the asymptotic regime $n \to \infty$.
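The reduction above can be checked in a few lines; the helper name `reduce_to_sbm` is ours:

```python
import numpy as np

# The reduction to the standard SBM: A_ij = |Z_ij| (Z_ij + 1)/2 keeps exactly
# the observed +1 entries of Z, so A is a standard SBM adjacency matrix with
# p = alpha(1 - eps) and q = alpha * eps.
def reduce_to_sbm(Z):
    return (np.abs(Z) * (Z + 1) // 2).astype(int)

Z = np.array([[0, 1, -1],
              [1, 0, 0],
              [-1, 0, 0]])
A = reduce_to_sbm(Z)
print(A.tolist())   # prints [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Only the observed $+1$ entries survive; unobserved ($0$) and flipped-to-$-1$ entries both map to $0$.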
In this setting ($k = 2$, $n \to \infty$), the work in [2] proves that exact recovery is possible if and only if $\alpha > \frac{2\log n}{n(1-2\epsilon)^2}$ in the limit $\epsilon \to 1/2$, and provides an SDP-based algorithm that succeeds at twice the above threshold; a more precise threshold $\alpha > \frac{\log n}{n\left(\sqrt{1-\epsilon} - \sqrt{\epsilon}\right)^2}$ is given in [30]. For weak recovery under a sparse graph $\alpha = \Theta(1/n)$, it is conjectured in [32] that the problem is solvable if and only if $\alpha > \frac{1}{n(1-2\epsilon)^2}$. The converse and achievability parts of the conjecture are proved in [37] and [53], respectively. Corollary 1 shows that SDP achieves (up to constants) the above exact and weak recovery thresholds; moreover, our results apply to the more general setting with $k \ge 2$ clusters.

4 Proof of Theorem 1

In this section we prove our main theoretical results in Theorem 1 for Model 1 (standard SBM). While Model 1 is a special case of Model 2 (heterogeneous SBM), we choose not to deduce Theorem 1 as a corollary of Theorem 3, which concerns the more general model. Instead, to highlight the main ideas of the analysis and avoid technicalities, we provide a separate proof of Theorem 1. In Section 5 to follow, we show how to adapt the proof to Model 2.

Before going into the details, we make a few observations that simplify the proof. First note that it suffices to prove the theorem for the first SDP formulation (1). Indeed, the ground-truth matrix $Y^*$ is also feasible to the second formulation (6); moreover, thanks to the equality constraint $\sum_{i,j=1}^n Y_{ij} = \sum_{i,j=1}^n Y^*_{ij}$, subtracting the constant-valued term $\left\langle Y, \frac{p+q}{2} J\right\rangle$ from the objective of (6) does not affect its optimal solutions. The two formulations are therefore identical except for the above equality constraint, which is never used in the proof below.
Secondly, under the assumption $cp \le q \le p$ for a universal constant $c > 0$, we have $\frac1k p + \left(1 - \frac1k\right) q \ge c p$. Therefore, it suffices to prove the theorem with the SNR redefined as $s = \frac{(p-q)^2}{p}$, which only affects the universal constant $C_e$ in the exponent of the error bound. Thirdly, it is in fact sufficient to prove the bound
$$\|\hat Y - Y^*\|_1 \le C_g n^2 \exp\left[-\frac{sn}{C_e k}\right]. \quad (8)$$
Suppose that this bound holds; under the premise $s \ge C_s k^2/n$ of the theorem with the constant $C_s$ sufficiently large, we have $\exp\left[-\frac{sn}{2C_e k}\right] \le \exp\left[-\frac{C_s}{2C_e}\cdot k\right] \le \frac1k$, hence the RHS of the bound (8) is at most
$$C_g n^2 \cdot \frac1k \cdot \exp\left[-\frac{sn}{2C_e k}\right] = C_g \|Y^*\|_1 \exp\left[-\frac{sn}{2C_e k}\right],$$
which implies the error bound in the theorem statement, again up to a change in the universal constant $C_e$. Finally, we define the convenient shorthands $\gamma := \|\hat Y - Y^*\|_1$ for the error and $\ell := \frac{n}{k}$ for the cluster size, which will be used throughout the proof.

Our proof begins with a basic inequality using optimality. Since $Y^*$ is feasible to the SDP (1) and $\hat Y$ is optimal, we have
$$0 \le \left\langle \hat Y - Y^*,\ A - \tfrac{p+q}{2} J \right\rangle = \left\langle \hat Y - Y^*,\ \mathbb{E}A - \tfrac{p+q}{2} J \right\rangle + \left\langle \hat Y - Y^*,\ A - \mathbb{E}A \right\rangle. \quad (9)$$
A simple observation is that the entries of the matrix $Y^* - \hat Y$ have matching signs with those of $\mathbb{E}A - \frac{p+q}{2} J$. This observation implies the following relationship between the first term on the RHS of equation (9) and the error $\gamma := \|\hat Y - Y^*\|_1$.

Fact 1. We have the inequality
$$\left\langle Y^* - \hat Y,\ \mathbb{E}A - \tfrac{p+q}{2} J \right\rangle \ge \frac{p-q}{2}\gamma. \quad (10)$$

The proof of this fact is deferred to Section C.1. Taking this fact as given and combining with the inequality (9), we obtain that
$$\frac{p-q}{2}\gamma \le \left\langle \hat Y - Y^*,\ A - \mathbb{E}A \right\rangle. \quad (11)$$
To bound the error $\gamma$, it suffices to control the RHS of equation (11), and here we depart from existing analysis. The seminal work in [29] bounds the RHS by a direct application of Grothendieck's inequality.
As we discuss below, this argument fails to expose the fast, exponential decay of the error $\gamma$. Our analysis develops a more precise bound. To describe our approach, some additional notation is needed. Let $U \in \mathbb{R}^{n\times k}$ be the matrix of the left singular vectors of $Y^*$. Define the projection
$$\mathcal{P}_T(M) := UU^\top M + MUU^\top - UU^\top MUU^\top$$
and its orthogonal complement $\mathcal{P}_{T^\perp}(M) := M - \mathcal{P}_T(M)$ for any $M \in \mathbb{R}^{n\times n}$. Our crucial observation is that we should control $\langle \hat Y - Y^*, A - \mathbb{E}A\rangle$ by separating the contributions from the two projected components of $\hat Y - Y^*$ defined by $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$. In particular, we rewrite the inequality (11) as
$$\frac{p-q}{2}\gamma \le \underbrace{\left\langle \mathcal{P}_T(\hat Y - Y^*),\ A - \mathbb{E}A\right\rangle}_{S_1} + \underbrace{\left\langle \mathcal{P}_{T^\perp}(\hat Y - Y^*),\ A - \mathbb{E}A\right\rangle}_{S_2}. \quad (12)$$
The first term $S_1$ involves the component of $\hat Y - Y^*$ that is "aligned" with $Y^*$; in particular, $\mathcal{P}_T(\hat Y - Y^*)$ is the orthogonal projection onto the subspace spanned by matrices with the same column or row space as $Y^*$. The second term $S_2$ involves the orthogonal component $\mathcal{P}_{T^\perp}(\hat Y - Y^*)$, whose column and row spaces are orthogonal to those of $Y^*$. The main steps of our analysis consist of bounding $S_1$ and $S_2$ separately. The following proposition bounds the term $S_1$ and is proved in Section 4.2 to follow.

Proposition 1. Under the conditions of Theorem 1, with probability at least $1 - \frac{6}{nk} - 2\left(\frac e2\right)^{-2n}$, at least one of the following inequalities holds:
$$\gamma \le C_g n^2 e^{-sn/(2C_e k)}, \qquad S_1 \le D_1 \gamma \sqrt{\frac{p\log(n^2/\gamma)}{\ell}}, \quad (13)$$
where $D_1 = 12D = 12\sqrt7$.

Our next proposition, proved in Section 4.3 to follow, controls the term $S_2$.

Proposition 2. Under the conditions of Theorem 1, with probability at least $1 - \frac{1}{n^2} - e^{-(3-\ln 2)n} - 4e^{-c'\sqrt n}$, at least one of the following inequalities holds:
$$\gamma \le C_g n^2 e^{-sn/(2C_e k)}, \qquad S_2 \le D_2 \sqrt{pn}\,\frac{\gamma}{\ell} + \frac18(p-q)\gamma, \quad (14)$$
where $D_2 > 0$ is a universal constant.
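The decomposition in (12) rests on $\mathcal{P}_T$ being an orthogonal projection; this can be sanity-checked numerically (all names below are ours):

```python
import numpy as np

# U holds the left singular vectors of Y*: indicator columns scaled by 1/sqrt(ell).
n, k = 12, 3
ell = n // k
sigma = np.repeat(np.arange(k), ell)
U = np.zeros((n, k))
U[np.arange(n), sigma] = 1 / np.sqrt(ell)
Pi = U @ U.T                                           # UU^T: block diagonal, entries 1/ell

def P_T(M):
    return Pi @ M + M @ Pi - Pi @ M @ Pi

def P_Tperp(M):
    return M - P_T(M)

rng = np.random.default_rng(3)
M = rng.standard_normal((n, n))
# P_T is idempotent and its two components are orthogonal in the trace inner product.
print(np.allclose(P_T(P_T(M)), P_T(M)),
      np.isclose(np.sum(P_T(M) * P_Tperp(M)), 0.0))
```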
Equipped with these two propositions, the desired bound (8) follows easily. If the first inequality in the two propositions holds, then we are done. Otherwise, the inequalities (13) and (14) must hold, and they can be plugged into the RHS of equation (12) to get
$$\frac{p-q}{2}\gamma \le D_1 \gamma\sqrt{\frac{p\log(n^2/\gamma)}{\ell}} + D_2\sqrt{\frac{pk^2}{n}}\,\gamma + \frac18(p-q)\gamma.$$
Under the premise $s \ge C_s k^2/n$ of Theorem 1, we know that $D_2\sqrt{\frac{pk^2}{n}} \le \frac{p-q}{8}$, whence
$$\frac{p-q}{4}\gamma \le D_1\gamma\sqrt{\frac{p\log(n^2/\gamma)}{\ell}}.$$
Doing some algebra yields the inequality $\gamma \le n^2\exp\left[-sn/(16D_1^2 k)\right]$, so the desired bound (8) again holds. The rest of this section is devoted to establishing Propositions 1 and 2. Before proceeding to the proofs, we remark on the above arguments and contrast them with alternative approaches.

Comparison with the Grothendieck inequality approach: The arguments in the work [29] also begin with a version of the inequality (11), and proceed by observing that
$$\frac{p-q}{2}\gamma \le \left\langle \hat Y - Y^*,\ A - \mathbb{E}A\right\rangle \overset{(i)}{\le} 2\sup_{Y \succeq 0,\ \mathrm{diag}(Y) \le \mathbf{1}}\left|\langle Y,\ A - \mathbb{E}A\rangle\right|, \quad (15)$$
where step (i) follows from the triangle inequality and the feasibility of $\hat Y$ and $Y^*$. Therefore, this argument reduces the problem to bounding the RHS of (15), which can be done using the celebrated Grothendieck inequality. One can already see at this point that this approach yields sub-optimal bounds. For example, SDP is known to achieve exact recovery ($\gamma = 0$) under certain conditions, yet the inequality (15) can never guarantee a zero $\gamma$. Sub-optimality arises in step (i): the quantity $\langle \hat Y - Y^*, A - \mathbb{E}A\rangle$ diminishes when $\hat Y - Y^*$ is small, but the triangle inequality and the worst-case bound used in (i) are too crude to capture such behaviors. In comparison, our proof takes advantage of the structure of the error matrix $\hat Y - Y^*$ and its interplay with the noise matrix $A - \mathbb{E}A$.
Bounding the $S_1$ term: A common approach involves using the generalized Hölder inequality
$$S_1 = \left\langle \hat Y - Y^*,\ \mathcal{P}_T(A - \mathbb{E}A)\right\rangle \le \gamma\,\|\mathcal{P}_T(A - \mathbb{E}A)\|_\infty.$$
Under SBM, one can show that $\|\mathcal{P}_T(A - \mathbb{E}A)\|_\infty \lesssim \sqrt{\frac{p\log n^2}{\ell}}$ with high probability, hence yielding the bound $S_1 \lesssim \gamma\sqrt{\frac{p\log n^2}{\ell}}$. Variants of this approach are in fact common (sometimes implicitly) in the proofs of exact recovery for SDP [8, 18, 10, 31, 20]. However, when $\sqrt{\frac{p\log n^2}{\ell}} \ge p - q$ (where exact recovery is impossible), applying this bound for $S_1$ to the inequality (12) would yield a vacuous bound for $\gamma$. In comparison, Proposition 1 gives a strictly sharper bound (13), which correctly characterizes the behavior of $S_1$ beyond the exact recovery regime.

Bounding the $S_2$ term: Note that since $\mathcal{P}_{T^\perp}(Y^*) = 0$, we have the equality $S_2 = \left\langle \mathcal{P}_{T^\perp}(\hat Y),\ A - \mathbb{E}A\right\rangle$. It is easy to show that the matrix $\mathcal{P}_{T^\perp}(\hat Y)$ is positive semidefinite and has diagonal entries at most 4 (cf. Fact 3). Therefore, one may again attempt to control $S_2$ using Grothendieck's inequality, which would yield the bound $S_2 \le 4 \cdot g(A - \mathbb{E}A)$ for some function $g$ whose exact form is not important for now. The bound (14) in Proposition 2 is much stronger: it depends on $\gamma$, which is in turn proportional to the trace of the matrix $\mathcal{P}_{T^\perp}(\hat Y)$ (cf. Fact 2).

4.1 Preliminaries and additional notation

Recall that $U \in \mathbb{R}^{n\times k}$ is the matrix of the left singular vectors of $Y^*$. We observe that $U_{ia} = 1/\sqrt\ell$ if node $i$ is in cluster $a$ and $U_{ia} = 0$ otherwise. Therefore, $UU^\top$ is a block diagonal matrix with all entries inside each diagonal block equal to $1/\ell$. Define the "noise" matrix $W := A - \mathbb{E}A$. The matrix $W$ is symmetric, which introduces some minor dependency among its entries. To handle this, we let $\Psi$ be the matrix obtained from $W$ with its entries in the lower triangular part set to zero.
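This symmetry-breaking split can be sketched as follows; building $W$ directly from centered upper-triangular Bernoulli draws is our own shorthand:

```python
import numpy as np

# Keep only the strictly upper triangle of W in Psi, so that Psi has
# independent entries and W = Psi + Psi^T.
rng = np.random.default_rng(4)
n, p = 50, 0.2
A_up = np.triu((rng.random((n, n)) < p).astype(float), 1)  # upper triangle of A
Psi = A_up - np.triu(np.full((n, n), p), 1)                # Lambda - E Lambda
W = Psi + Psi.T                                            # the noise matrix A - E A
print((W == W.T).all(), np.allclose(np.tril(Psi), 0.0))
```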
Note that $W = \Psi + \Psi^\top$, and $\Psi$ has independent entries (with zero entries considered Bern(0)). Similarly, we define $\Lambda$ as the upper triangular part of the adjacency matrix $A$. In the proof we frequently use the inequalities $s = \frac{(p-q)^2}{p} < p - q \le p$. Consequently, the assumption $s \ge C_s k^2/n$ implies that $p \ge C_s k^2/n$. We also record an elementary inequality that is used multiple times.

Lemma 1. For any number $\alpha > 0$, there exists a number $C(\alpha) \ge 1$ such that if $s \ge C(\alpha)\frac kn$, then
$$p\,e^{-pn/(\alpha k)} \le (p - q)\,e^{-sn/(2\alpha k)}.$$

Proof. Note that $pn \ge (p-q)n \ge sn \ge C(\alpha)k$. As long as $C(\alpha)$ is sufficiently large, we have $\frac{pn}{k} \le e^{pn/(2\alpha k)}$. These inequalities imply that
$$\frac{p}{p-q} \le \frac{pn}{k} \le e^{pn/(2\alpha k)} \le e^{(2p - s)n/(2\alpha k)}.$$
Multiplying both sides by $(p-q)\,e^{-2pn/(2\alpha k)}$ yields the claimed inequality.

Finally, we need a simple pilot bound, which ensures that the SDP solution satisfies a non-trivial error bound $\gamma < n^2$.

Lemma 2. Under Model 1, if $p \ge \frac1n$, then we have
$$\gamma \le 45\sqrt{\frac{n^3}{s}}$$
with probability at least $1 - P_0$, where $P_0 := 2(e/2)^{-2n}$. In particular, if $s \ge C_s\frac1n$, we have $\gamma \le c_u n^2$ with high probability, with $c_u = 45/\sqrt{C_s}$.

This lemma is a variation of Theorem 1.3 in [29], and a special case of Theorem 1 in [17] applied to the non-degree-corrected setting. For completeness we provide the proof in Section B. The proof uses Grothendieck's inequality, an approach pioneered in [29].

4.2 Proof of Proposition 1

In this section we prove Proposition 1, which controls the quantity $S_1$.
Using the symmetry of $\hat Y - Y^*$ and the cyclic invariance of the trace, we obtain the identity
$$\begin{aligned} S_1 &= \left\langle \mathcal{P}_T(W),\ \hat Y - Y^*\right\rangle\\ &= \left\langle UU^\top W,\ \hat Y - Y^*\right\rangle + \left\langle WUU^\top,\ \hat Y - Y^*\right\rangle - \left\langle UU^\top WUU^\top,\ \hat Y - Y^*\right\rangle\\ &= 2\left\langle UU^\top W,\ \hat Y - Y^*\right\rangle - \left\langle UU^\top WUU^\top,\ \hat Y - Y^*\right\rangle\\ &= 2\left\langle UU^\top\Psi,\ \hat Y - Y^*\right\rangle + 2\left\langle UU^\top\Psi^\top,\ \hat Y - Y^*\right\rangle - \left\langle UU^\top(\Psi + \Psi^\top)UU^\top,\ \hat Y - Y^*\right\rangle\\ &= 2\left\langle UU^\top\Psi,\ \hat Y - Y^*\right\rangle + 2\left\langle UU^\top\Psi^\top,\ \hat Y - Y^*\right\rangle - 2\left\langle UU^\top\Psi UU^\top,\ \hat Y - Y^*\right\rangle\\ &= 2\left\langle UU^\top\Psi,\ \hat Y - Y^*\right\rangle + 2\left\langle UU^\top\Psi^\top,\ \hat Y - Y^*\right\rangle - 2\left\langle UU^\top\Psi,\ (\hat Y - Y^*)UU^\top\right\rangle. \end{aligned}$$
It follows that
$$S_1 \le 2\left\langle UU^\top\Psi,\ \hat Y - Y^*\right\rangle + 2\left\langle UU^\top\Psi^\top,\ \hat Y - Y^*\right\rangle + 2\left\langle UU^\top\Psi,\ (\hat Y - Y^*)UU^\top\right\rangle. \quad (16)$$
Note that $\|\hat Y - Y^*\|_\infty \le 1$, since the entries of $\hat Y$ and $Y^*$ lie in $[0,1]$. One can also check that $\|(\hat Y - Y^*)UU^\top\|_\infty \le 1$. This is due to the fact that each entry of the matrix inside the norm is the mean of $\ell$ entries of $\hat Y - Y^*$, given the block diagonal structure of $UU^\top$, and the absolute value of the mean does not exceed 1. With the same reasoning, we see that $\|(\hat Y - Y^*)UU^\top\|_1 \le \|\hat Y - Y^*\|_1 = \gamma$.

Key to our proof is a bound on the sum of order statistics. Intuitively, given $m$ i.i.d. random variables, the sum of the $\beta$ largest of them (in absolute value) scales as $O\!\left(\beta\sqrt{\log(m/\beta)}\right)$. The following lemma, proved in Section C.2, makes the above intuition precise and moreover establishes a uniform bound in $\beta$.

Lemma 3. Let $m \ge 8$ and $g \ge 1$ be positive integers, and let $c_u > 0$ be a sufficiently small constant. For each $j \in [m]$, define $X_j := \sum_{i=1}^g (B_{ij} - \mathbb{E}B_{ij})$, where the $B_{ij}$ are independent Bernoulli variables with variance at most $\rho$. Then for a constant $D = \sqrt7$, we have
$$\sum_{j=1}^{\lceil\beta\rceil} \left|X_{(j)}\right| \le D\lceil\beta\rceil\sqrt{g\rho\log(m/\beta)}, \qquad \forall\,\beta \in \left(e^{-g\rho/C_e}m,\ \lceil c_u m\rceil\right],$$
with probability at least $1 - P_1(m)$, where $P_1(m) \le \frac3m$.

We are ready to bound the RHS of equation (16) and hence prove Proposition 1.
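The order-statistics heuristic behind Lemma 3 can be checked empirically; the parameter values below are illustrative, and $D = \sqrt7$ is the constant from the lemma:

```python
import numpy as np

# Empirical check: the sum of the beta largest |X_j| among m centered binomial
# sums X_j stays below D * beta * sqrt(g * rho * log(m / beta)).
rng = np.random.default_rng(5)
m, g, rho = 2000, 40, 0.1
X = (rng.random((m, g)) < rho).sum(axis=1) - g * rho   # X_j = sum_i (B_ij - E B_ij)
abs_sorted = np.sort(np.abs(X))[::-1]
D = np.sqrt(7)
for beta in (10, 100, 500):
    top_sum = abs_sorted[:beta].sum()
    bound = D * beta * np.sqrt(g * rho * np.log(m / beta))
    print(beta, round(top_sum, 1), round(bound, 1))
```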
Consider the event
$$\mathcal{E}_1 := \left\{ \langle UU^\top\Psi,\ M\rangle \le 2Db\sqrt{\frac{p\log(n^2/b)}{\ell}},\ \ \forall M, b:\ \|M\|_\infty \le 1,\ \|M\|_1 \le b,\ n^2 e^{-sn/(2C_e k)} < b \le c_u n^2 \right\},$$
and let $\mathcal{E}_2$ be defined similarly with $\Psi$ replaced by $\Psi^\top$. We will use Lemma 3 to show that $\mathcal{E}_i$ holds with probability at least $1 - P_1 := 1 - P_1(nk)$ for each $i = 1, 2$. Taking this claim as given, we now condition on the intersection of the events $\mathcal{E}_1$, $\mathcal{E}_2$ and the conclusion of Lemma 2. Note that the matrix $\hat Y - Y^*$ satisfies $\|\hat Y - Y^*\|_\infty \le 1$, and Lemma 2 further ensures that $\gamma := \|\hat Y - Y^*\|_1 \le c_u n^2$ for $c_u$ sufficiently small. If $\gamma \le n^2 e^{-sn/(2C_e k)}$, then the first inequality in Proposition 1 holds and we are done. Otherwise, on the event $\mathcal{E}_1$, we are guaranteed that
$$\left\langle UU^\top\Psi,\ \hat Y - Y^*\right\rangle \le 2D\gamma\sqrt{\frac{p\log(n^2/\gamma)}{\ell}}.$$
Since the matrix $M := (\hat Y - Y^*)UU^\top$ satisfies $\|M\|_\infty \le 1$ and $\|M\|_1 \le \gamma$, we have the bound
$$\left\langle UU^\top\Psi,\ (\hat Y - Y^*)UU^\top\right\rangle \le 2D\gamma\sqrt{\frac{p\log(n^2/\gamma)}{\ell}}.$$
Similarly, on the event $\mathcal{E}_2$, we have the bound
$$\left\langle UU^\top\Psi^\top,\ \hat Y - Y^*\right\rangle \le 2D\gamma\sqrt{\frac{p\log(n^2/\gamma)}{\ell}}.$$
Applying these estimates to the RHS of equation (16), we arrive at the bound $S_1 \le 12D\gamma\sqrt{\frac{p\log(n^2/\gamma)}{\ell}}$, which is the second inequality in Proposition 1.

It remains to bound the probability of the event $\mathcal{E}_1$, for which we shall use Lemma 3; the same arguments apply to the event $\mathcal{E}_2$. Let us take a digression to inspect the structure of the random matrix $V := UU^\top\Psi$. We treat each zero entry in $\Psi$ as a Bern(0) random variable with its mean subtracted, independent of all other entries. Therefore, all entries of $\Psi$ are independent.
Since $UU^\top$ is a block diagonal matrix, $V$ can be partitioned into $k$ submatrices of size $\ell \times n$ stacked vertically, where rows within the same submatrix are identical, and rows from different submatrices are independent, with each of their entries equal to $1/\ell$ times the sum of $\ell$ independent centered Bernoulli random variables. To verify these observations, for $a \in [k]$ we use $\mathcal{R}_a := \{(a-1)\ell + 1, \ldots, a\ell\}$ to denote the set of row indices of the $a$-th submatrix of $V$. Consider any $i \in \mathcal{R}_a$, that is, any row index of the $a$-th submatrix of $V$. Then for all $j \in [n]$, we have $V_{ij} = \sum_{t=1}^n (UU^\top)_{it}\Psi_{tj} = \ell^{-1}\sum_{u\in\mathcal{R}_a}\Psi_{uj}$. We see that we get the same random variable by varying $i$ within $\mathcal{R}_a$ while fixing $j$, but independent random variables by fixing $i$ while varying $j$.

Fix an index $i_a := (a-1)\ell + 1 \in \mathcal{R}_a$ for each $a \in [k]$. Consider any $n\times n$ matrix $M$ and number $b$ such that $\|M\|_\infty \le 1$, $\|M\|_1 \le b$ and $n^2 e^{-sn/(2C_e k)} < b \le c_u n^2$. We can compute
$$|\langle V, M\rangle| \le \left|\sum_{i=1}^n\sum_{j=1}^n V_{ij}M_{ij}\right| = \left|\sum_{a=1}^k\sum_{i\in\mathcal{R}_a}\sum_{j=1}^n V_{ij}M_{ij}\right| = \left|\sum_{a\in[k],\, j\in[n]} \ell V_{i_a j}\sum_{i\in\mathcal{R}_a}\frac{M_{ij}}{\ell}\right|,$$
where the last step follows from the previously established fact that $V_{ij} = V_{i_a j}$ for all $i \in \mathcal{R}_a$. Recall that the $nk$ random variables $\{\ell V_{i_a j},\ a\in[k],\ j\in[n]\}$ are independent; moreover, each $\ell V_{i_a j}$ is the sum of $\ell$ independent Bernoulli variables with variance at most $p$. Let $X_{(t)}$ denote the element of these $nk$ random variables with the $t$-th largest absolute value. Define the quantity $w := \sum_{a\in[k],\, j\in[n]}\left|\sum_{i\in\mathcal{R}_a} M_{ij}/\ell\right|$, and note that $w \le \|M\|_1/\ell$. Since $\left|\sum_{i\in\mathcal{R}_a}\ell^{-1}M_{ij}\right| \le \|M\|_\infty \le 1$ and $w \le b/\ell$, we have
$$|\langle V, M\rangle| \le \begin{cases}\displaystyle\sum_{t=1}^{\lceil w\rceil}\left|X_{(t)}\right| \le \sum_{t=1}^{\lceil b/\ell\rceil}\left|X_{(t)}\right|, & b/\ell \ge 1,\\[4pt] \left|X_{(1)}\right|\,(b/\ell), & b/\ell < 1.\end{cases} \quad (17)$$
Here we use the fact that for any sequence of numbers $a_1, a_2, \ldots$ in $[0,1]$, $\sum_t |X_t|\, a_t \le \sum_{t=1}^{\lceil\sum_i a_i\rceil}\left|X_{(t)}\right|$.
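The structural claims about $V$ can be verified numerically (names below are ours):

```python
import numpy as np

# Verify: rows of V = UU^T Psi are identical within each block R_a, and
# ell * V[i_a, :] is the sum of the ell rows of Psi indexed by R_a.
rng = np.random.default_rng(6)
n, k = 20, 4
ell = n // k
sigma = np.repeat(np.arange(k), ell)
U = np.zeros((n, k))
U[np.arange(n), sigma] = 1 / np.sqrt(ell)
p = 0.3
Psi = np.triu((rng.random((n, n)) < p) - p, 1)         # independent centered entries, i < j
V = (U @ U.T) @ Psi
max_dev = max(np.abs(V[a * ell:(a + 1) * ell] - V[a * ell]).max() for a in range(k))
print(max_dev, np.allclose(ell * V[0], Psi[:ell].sum(axis=0)))
```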
Now note that $b/\ell \in \left(nk\, e^{-sn/(2C_e k)},\ c_u nk\right]$, which implies that $b/\ell \in \left(nk\, e^{-p\ell/C_e},\ c_u nk\right]$ by Lemma 1. Applying Lemma 3 with $m = nk \ge 8$, $g = \ell$, $\rho = p$ and $\beta = b/\ell$, we are guaranteed that with probability at least $1 - P_1(nk)$,
$$\sum_{t=1}^{\lceil b/\ell\rceil}\left|X_{(t)}\right| \le D\lceil b/\ell\rceil\sqrt{\ell p\log\!\left(\frac{nk}{b/\ell}\right)} \qquad\text{and}\qquad \left|X_{(1)}\right| \le D\sqrt{\ell p\log(nk)}$$
simultaneously for all relevant $b/\ell$. On this event, we can continue the inequality (17) to conclude that
$$|\langle V, M\rangle| \le \begin{cases} D\lceil b/\ell\rceil\sqrt{\ell p\log\frac{nk}{b/\ell}}, & b/\ell \ge 1,\\[4pt] D\,(b/\ell)\sqrt{\ell p\log(nk)}, & b/\ell < 1 \end{cases}$$
simultaneously for all relevant matrices $M$. It is easy to see that the last RHS is bounded by $2Db\sqrt{\frac{p\log(n^2/b)}{\ell}}$ in either case (note that $nk/(b/\ell) = n^2/b$), hence the event $\mathcal{E}_1$ holds.

We remark that the box constraints in the SDP (1) are crucial to the above arguments, which are ultimately applied to the matrix $M = \hat Y - Y^*$. In particular, the box constraints ensure that $\|M\|_\infty = \|\hat Y - Y^*\|_\infty \le 1$, which allows us to establish the inequality (17) and apply the order-statistics bound in Lemma 3.

4.3 Proof of Proposition 2

We begin our proof by rewriting $S_2$ as
$$S_2 = \left\langle \mathcal{P}_{T^\perp}(\hat Y),\ W\right\rangle, \quad (18)$$
which holds since $\mathcal{P}_{T^\perp}(Y^*) = 0$ by definition of the projection $\mathcal{P}_{T^\perp}$. We can relate the matrix $\mathcal{P}_{T^\perp}(\hat Y)$ appearing above to the quantity of interest, $\gamma = \|\hat Y - Y^*\|_1$. In particular, observe that
$$\mathrm{Tr}\left\{\mathcal{P}_{T^\perp}(\hat Y)\right\} = \mathrm{Tr}\left\{(I - UU^\top)(\hat Y - Y^*)(I - UU^\top)\right\} \overset{(i)}{=} \mathrm{Tr}\left\{(I - UU^\top)(\hat Y - Y^*)\right\} \overset{(ii)}{=} \mathrm{Tr}\left\{UU^\top(Y^* - \hat Y)\right\},$$
where step (i) holds since the trace is invariant under cyclic permutations and $I - UU^\top$ is a projection matrix, and step (ii) holds since $\mathrm{diag}(\hat Y) = \mathrm{diag}(Y^*) = \mathbf{1}$ by feasibility of $\hat Y$ and $Y^*$ to the SDP (1). Recall that all the non-zero entries of $UU^\top$ are in its diagonal blocks and equal to $1/\ell$.
Since the corresponding diagonal-block entries of the matrix $Y^* - \hat Y$ are non-negative, we have
$$\mathrm{Tr}\left\{UU^\top(Y^* - \hat Y)\right\} = \sum_{(i,j):\ Y^*_{ij} = 1} \frac1\ell\cdot\left(Y^* - \hat Y\right)_{ij} \le \frac\gamma\ell.$$
Combining these pieces gives:

Fact 2. $\mathrm{Tr}\left\{\mathcal{P}_{T^\perp}(\hat Y)\right\} \le \frac\gamma\ell$.

Equipped with this fact, we proceed to bound the quantity $S_2$ given by equation (18). We consider separately the dense case $p \ge c_d\frac{\log n}{n}$ and the sparse case $p \le c_d\frac{\log n}{n}$, where $c_d > 0$ is a constant given in the statement of Lemma 6.

4.3.1 The dense case

First assume that $p \ge c_d\frac{\log n}{n}$. In this case bounding $S_2$ is relatively straightforward, as the graph spectrum is well-behaved. We first recall that the nuclear norm $\|\mathcal{P}_{T^\perp}(\hat Y)\|_*$ is defined as the sum of the singular values of the matrix $\mathcal{P}_{T^\perp}(\hat Y)$. Since $\hat Y \succeq 0$ by feasibility, we have $\mathcal{P}_{T^\perp}(\hat Y) = (I - UU^\top)\hat Y(I - UU^\top) \succeq 0$, whence $\|\mathcal{P}_{T^\perp}(\hat Y)\|_* = \mathrm{Tr}\left\{\mathcal{P}_{T^\perp}(\hat Y)\right\}$. Revisiting the expression (18), we obtain that
$$S_2 = \left\langle\mathcal{P}_{T^\perp}(\hat Y),\ W\right\rangle \le \mathrm{Tr}\left\{\mathcal{P}_{T^\perp}(\hat Y)\right\}\cdot\|W\|_{\mathrm{op}} \le \frac\gamma\ell\cdot\|W\|_{\mathrm{op}},$$
where the first inequality follows from the duality between the nuclear and spectral norms, and the second inequality follows from Fact 2. It remains to control the spectral norm $\|W\|_{\mathrm{op}}$ of the centered adjacency matrix $W := A - \mathbb{E}A$. This is done in the following lemma, which is proved in Section C.3 using standard tools from random matrix theory.

Lemma 4. We have $\|A - \mathbb{E}A\|_{\mathrm{op}} \le 8\sqrt{pn} + 174\sqrt{\log n}$ with probability at least $1 - P_2$, where $P_2 := n^{-2}$.

Applying the lemma, we obtain that with probability at least $1 - P_2$,
$$S_2 \le \left(8\sqrt{pn} + 174\sqrt{\log n}\right)\cdot\frac\gamma\ell \le D_2\sqrt{pn}\cdot\frac\gamma\ell,$$
where the last step holds for some sufficiently large constant $D_2$ under the assumption $p \ge c_d\frac{\log n}{n}$. This completes the proof of Proposition 2 in the dense case.

4.3.2 The sparse case

Now suppose that $p \le c_d\frac{\log n}{n}$.
In this case we can no longer use the arguments above, because some nodes will have degrees far exceeding their expectation $O(pn)$, and $\|W\|_{\mathrm{op}}$ will be dominated by these nodes and become much larger than $\sqrt{pn}$. This issue is particularly severe when $p, q \asymp \frac1n$, in which case the expected degree is a constant, yet the maximum node degree diverges. Addressing this issue requires a new argument. In particular, we show that the matrix $W$ can be partitioned into two parts, where the first part has a spectral norm bounded as desired, and the second part involves only a small number of edges; the structure of the SDP solution allows us to control the impact of the second part.

As before, to avoid the minor dependency due to symmetry, we focus on the upper triangular parts $\Lambda$ and $\Psi$ of the matrices $A$ and $W := A - \mathbb{E}A$, and later make use of the relations $A = \Lambda + \Lambda^\top$ and $W = \Psi + \Psi^\top$. We define the sets
$$\mathcal{V}_{\mathrm{row}} := \left\{i\in[n]: \sum_{j\in[n]}\Lambda_{ij} > 40pn\right\}, \qquad \mathcal{V}_{\mathrm{col}} := \left\{j\in[n]: \sum_{i\in[n]}\Lambda_{ij} > 40pn\right\},$$
which contain the nodes whose degrees (more specifically, row/column sums w.r.t. the halved matrix $\Lambda$) are large compared to their expectation $O(pn)$. For any matrix $M \in \mathbb{R}^{n\times n}$, we define the matrix $\widetilde M$ such that
$$\widetilde M_{ij} = \begin{cases} 0, & \text{if } i\in\mathcal{V}_{\mathrm{row}} \text{ or } j\in\mathcal{V}_{\mathrm{col}},\\ M_{ij}, & \text{otherwise.}\end{cases}$$
In other words, $\widetilde M$ is obtained from $M$ by "trimming" the rows/columns corresponding to nodes with large degrees. With this notation, we can write $\Psi = \widetilde\Psi + (\Psi - \widetilde\Psi)$, which can be combined with the expression (18) to yield
$$S_2 = \left\langle\mathcal{P}_{T^\perp}(\hat Y),\ \Psi + \Psi^\top\right\rangle = 2\left\langle\mathcal{P}_{T^\perp}(\hat Y),\ \Psi\right\rangle = 2\left\langle\mathcal{P}_{T^\perp}(\hat Y),\ \widetilde\Psi\right\rangle + 2\left\langle\mathcal{P}_{T^\perp}(\hat Y),\ \Psi - \widetilde\Psi\right\rangle \le 2\,\mathrm{Tr}\left\{\mathcal{P}_{T^\perp}(\hat Y)\right\}\cdot\|\widetilde\Psi\|_{\mathrm{op}} + 2\left\langle\mathcal{P}_{T^\perp}(\hat Y),\ \Psi - \widetilde\Psi\right\rangle, \quad (19)$$
where in the last step we use the fact that $\hat Y \succeq 0$ and thus $\mathcal{P}_{T^\perp}(\hat Y) = (I - UU^\top)\hat Y(I - UU^\top) \succeq 0$.
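The trimming device, and its effect on the spectral norm in the sparse regime, can be sketched as follows; the planted high-degree node and all names are our own illustration:

```python
import numpy as np

# Trimming used only in the analysis (the algorithm itself never trims):
# rows/columns of the upper-triangular Lambda with sums above 40*p*n are
# zeroed. A planted high-degree node shows the effect on the spectral norm.
rng = np.random.default_rng(7)
n = 400
p = 2.0 / n                                            # constant expected degree
A_up = np.triu((rng.random((n, n)) < p).astype(float), 1)
A_up[0, 1:] = 1.0                                      # plant one high-degree node
Psi = A_up - np.triu(np.full((n, n), p), 1)            # centered upper triangle
row_big = A_up.sum(axis=1) > 40 * p * n                # V_row
col_big = A_up.sum(axis=0) > 40 * p * n                # V_col
Psi_trim = np.where(np.outer(~row_big, ~col_big), Psi, 0.0)
norm_before = np.linalg.norm(Psi + Psi.T, 2)           # spectral norm of W
norm_after = np.linalg.norm(Psi_trim + Psi_trim.T, 2)  # spectral norm after trimming
print(int(row_big.sum()), round(norm_before, 2), round(norm_after, 2))
```

The planted node alone forces `norm_before` to be on the order of $\sqrt n$, while the trimmed matrix has a much smaller spectral norm, consistent with the discussion above.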
The first term in (19) can be controlled using the fact that the spectrum of the trimmed matrix $\widetilde\Psi$ is well-behaved, for any $p$. In particular, we prove the following lemma in Section C.4 as a consequence of known results.

Lemma 5. For some absolute constant $C > 0$, we have $\|\widetilde\Psi\|_{\mathrm{op}} \le C\sqrt{pn}$ with probability at least $1 - P_3$, where $P_3 := e^{-(3-\ln 2)2n} + (2n)^{-3}$.

One shall compare the bound in Lemma 5 with that in Lemma 4. After trimming, the term $\sqrt{\log n}$ in Lemma 4 (which would dominate in the sparse regime) disappears, and the spectral norm of $\widetilde\Psi$ behaves similarly to that of a random matrix with Gaussian entries. Such a bound is in fact standard in recent work on spectral algorithms applied to sparse graphs with constant expected degrees [21, 35]. Typically these algorithms proceed by first trimming the graph, and then running standard spectral algorithms with the trimmed graph as the input. We emphasize that in our case trimming is used only in the analysis; our algorithm itself does not require any trimming, and the SDP (1) is applied to the original graph. As shown below, we are able to control the contribution from the trimmed rows and columns, namely the second term in (19). Such a bound is made possible by leveraging the structure of the solution $\hat Y$ induced by the box constraints of the SDP (1), a manifestation of the regularization effect of SDP.

Turning to the second term in equation (19), we first note that the Hölder-type inequality
$$\left\langle\mathcal{P}_{T^\perp}(\hat Y),\ \Psi - \widetilde\Psi\right\rangle \le \mathrm{Tr}\left\{\mathcal{P}_{T^\perp}(\hat Y)\right\}\,\|\Psi - \widetilde\Psi\|_{\mathrm{op}}$$
is no longer sufficient, as the residual matrix $\Psi - \widetilde\Psi$ may have large eigenvalues. Here it is crucial to use an important property of the matrix $\mathcal{P}_{T^\perp}(\hat Y)$, namely the fact that the magnitudes of its entries are $O(1)$, a consequence of the constraint $0 \le Y \le J$ in the SDP (1).
More precisely, we have the following bound, which is proved in Section C.5.

Fact 3. We have $\|\mathcal{P}_{T^\perp}(\hat Y)\|_\infty \le 4$.

Intuitively, this bound ensures that the mass of the matrix $\mathcal{P}_{T^\perp}(\hat Y)$ is spread across its entries, so its eigenspace is unlikely to be aligned with the top eigenvector of the random matrix $\Psi - \widetilde\Psi$, which by definition concentrates on a few columns/rows indexed by $\mathcal{V}_{\mathrm{col}}/\mathcal{V}_{\mathrm{row}}$. Consequently, the quantity $\left\langle\mathcal{P}_{T^\perp}(\hat Y),\ \Psi - \widetilde\Psi\right\rangle$ is likely to be small. To make this precise, we start with the inequality
$$\left\langle\mathcal{P}_{T^\perp}(\hat Y),\ \Psi - \widetilde\Psi\right\rangle \le \|\mathcal{P}_{T^\perp}(\hat Y)\|_\infty\cdot\|\Psi - \widetilde\Psi\|_1. \quad (20)$$
The following lemma bounds the $\ell_1$ norm of the residual matrix $\Psi - \widetilde\Psi$.

Lemma 6. Suppose $p \ge C_p/n$ for some sufficiently large constant $C_p > 0$, and $p \le c_d\log n/n$ for some sufficiently small constant $c_d > 0$. Then there exist constants $C, c' > 0$ such that
$$\|\Psi - \widetilde\Psi\|_1 \le C pn^2 e^{-pn/C_e}$$
with probability at least $1 - P_4$, where $P_4 := 4e^{-c'\sqrt n}$.

We prove this lemma in Section 4.4 to follow, using tail bounds for sub-exponential random variables. Equipped with the above bounds, we are ready to bound $S_2$. If $\gamma \le C_g n^2 e^{-sn/(2C_e k)}$, then the first inequality in Proposition 2 holds and we are done. It remains to consider the case $\gamma > C_g n^2 e^{-sn/(2C_e k)}$, which by Lemma 1 implies that $\gamma > C_g n^2\cdot\frac{p}{p-q}e^{-pn/C_e}$. The inequalities (19) and (20) together give
$$S_2 \le 2\,\mathrm{Tr}\left\{\mathcal{P}_{T^\perp}(\hat Y)\right\}\cdot\|\widetilde\Psi\|_{\mathrm{op}} + \|\mathcal{P}_{T^\perp}(\hat Y)\|_\infty\cdot\|\Psi - \widetilde\Psi\|_1 \le \frac{2\gamma}{\ell}\,\|\widetilde\Psi\|_{\mathrm{op}} + 4\|\Psi - \widetilde\Psi\|_1,$$
where we use Facts 2 and 3 in the second step. Applying Lemma 5 to the first RHS term above and Lemma 6 to the second, we obtain that
$$S_2 \le 2C\sqrt{pn}\,\frac\gamma\ell + 4Cpn^2 e^{-pn/C_e}$$
with probability at least $1 - P_3 - P_4$.
Since $\gamma > C_g n^2\cdot\frac{p}{p-q}e^{-pn/C_e}$ as shown above, we obtain
\[ S_2 \le \frac{2C\sqrt{pn}}{\ell}\gamma + \frac{4C}{C_g}(p-q)\gamma \le \frac{2C\sqrt{pn}}{\ell}\gamma + \frac18(p-q)\gamma, \]
where the last step holds provided that the constant $C_g$ is large enough. This proves the second inequality in Proposition 2.

4.4 Proof of Lemma 6

For any matrix $M$, let $\mathrm{pos}_{i\bullet}(M)$ and $\mathrm{pos}_{\bullet j}(M)$ denote the number of positive entries in the $i$-th row and the $j$-th column of $M$, respectively. Recall that the entries of $\Lambda$ are independent Bernoulli random variables with mean, and hence variance, at most $p$, where $C_p/n \le p \le c_d\log n/n$ for some sufficiently large constant $C_p > 0$ and sufficiently small constant $c_d > 0$. By definition we have $\Psi := \Lambda - \mathbb{E}\Lambda$, whence
\[ \|\Psi - \tilde\Psi\|_1 = \big\|(\Lambda - \mathbb{E}\Lambda) - \widetilde{(\Lambda - \mathbb{E}\Lambda)}\big\|_1 = \big\|(\Lambda - \mathbb{E}\Lambda) - \big(\tilde\Lambda - \widetilde{\mathbb{E}\Lambda}\big)\big\|_1 \le \big\|\mathbb{E}\Lambda - \widetilde{\mathbb{E}\Lambda}\big\|_1 + \big\|\Lambda - \tilde\Lambda\big\|_1. \]
The two terms on the last right-hand side are both random quantities depending on the sets $V_{\mathrm{row}} := \{i : \mathrm{pos}_{i\bullet}(\Lambda) \ge 40pn\}$ and $V_{\mathrm{col}} := \{j : \mathrm{pos}_{\bullet j}(\Lambda) \ge 40pn\}$. We bound these two terms separately.

4.4.1 Bounding $\|\mathbb{E}\Lambda - \widetilde{\mathbb{E}\Lambda}\|_1$

Let $\mathbb{I}(\cdot)$ denote the indicator function. We begin by investigating the indicator variable $G_i := \mathbb{I}(i\in V_{\mathrm{row}}) = \mathbb{I}(\mathrm{pos}_{i\bullet}(\Lambda) \ge 40pn)$; similar properties hold for $G'_j := \mathbb{I}(j\in V_{\mathrm{col}})$. Note that $\{G_i, i\in[n]\}$ are independent Bernoulli random variables. To bound their rates $\mathbb{E}G_i$, we observe that each $\mathrm{pos}_{i\bullet}(\Lambda) = \sum_j\Lambda_{ij}$ is a sum of $n$ independent Bernoulli random variables, with variance at most $p(1-p)n$ and $\mathbb{E}\,\mathrm{pos}_{i\bullet}(\Lambda) \le pn$. Bernstein's inequality ensures that for any number $z > pn$,
\[ \mathbb{P}\{\mathrm{pos}_{i\bullet}(\Lambda) \ge z\} \le \mathbb{P}\{\mathrm{pos}_{i\bullet}(\Lambda) - \mathbb{E}\,\mathrm{pos}_{i\bullet}(\Lambda) \ge z - pn\} \le \exp\left(-\frac{\frac12(z - pn)^2}{p(1-p)n + \frac13\cdot 1\cdot(z - pn)}\right) \le \exp\left(-\frac{(z - pn)^2}{2z}\right). \tag{21} \]
Plugging in $z = 40pn$ yields
\[ \mathbb{E}G_i = \mathbb{P}\{\mathrm{pos}_{i\bullet}(\Lambda) \ge 40pn\} \le \exp\left(-\frac{(40pn - pn)^2}{2\cdot 40pn}\right) \le e^{-8pn}. \]
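The final simplification in (21) relies on the elementary fact that $p(1-p)n + \frac13(z - pn) \le z$ whenever $z > pn$, so the simplified exponent is never larger than the Bernstein exponent; this can be sanity-checked numerically on a grid (the values of $p$ and $n$ below are illustrative):

```python
import numpy as np

p, n = 0.01, 1000
z = np.linspace(p * n * (1 + 1e-9), 100 * p * n, 5000)

# Bernstein exponent from (21) and its simplified lower bound:
bernstein = 0.5 * (z - p * n) ** 2 / (p * (1 - p) * n + (z - p * n) / 3.0)
simplified = (z - p * n) ** 2 / (2.0 * z)
print(bool(np.all(bernstein >= simplified)))
```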
We now turn to the quantity of interest, $\|\mathbb{E}\Lambda - \widetilde{\mathbb{E}\Lambda}\|_1$. Since $\mathbb{E}\Lambda_{ij} \le p$ for all $(i,j)$, we have
\[ \big\|\mathbb{E}\Lambda - \widetilde{\mathbb{E}\Lambda}\big\|_1 = \sum_i\sum_j \mathbb{E}\Lambda_{ij}\cdot\mathbb{I}\big(i\in V_{\mathrm{row}}\ \text{or}\ j\in V_{\mathrm{col}}\big) \le \sum_i\sum_j p\cdot\mathbb{I}(i\in V_{\mathrm{row}}) + \sum_i\sum_j p\cdot\mathbb{I}(j\in V_{\mathrm{col}}) = pn\sum_i G_i + pn\sum_j G'_j. \]
The quantity $\sum_i G_i$ is a sum of independent Bernoulli variables with rates bounded as above. Applying Bernstein's inequality to this sum, we obtain
\[ \mathbb{P}\Big\{\sum_i G_i > 2ne^{-pn/C_e}\Big\} \le \mathbb{P}\Big\{\sum_i G_i - \sum_i\mathbb{E}G_i > 2ne^{-pn/C_e} - ne^{-pn/C_e}\Big\} \le \exp\left(-\frac{\frac12\big(ne^{-pn/C_e}\big)^2}{ne^{-pn/C_e} + \frac13\cdot 1\cdot ne^{-pn/C_e}}\right) \le \exp\Big(-\frac38\,ne^{-pn/C_e}\Big) \le \exp\Big(-\frac38\sqrt n\Big), \]
where the first two steps hold because $C_e \ge 28$, and the last step holds by the assumption in Lemma 6 that $p \le c_d\log n/n$ for some sufficiently small $c_d > 0$. The same tail bound holds for the sum $\sum_j G'_j$. It follows that with probability at least $1 - 2e^{-\frac38\sqrt n}$, we have
\[ \big\|\mathbb{E}\Lambda - \widetilde{\mathbb{E}\Lambda}\big\|_1 \le 4pn^2 e^{-pn/C_e}. \]

4.4.2 Bounding $\|\Lambda - \tilde\Lambda\|_1$

Since the matrix $\Lambda - \tilde\Lambda$ has non-negative entries, we have the inequality
\[ \big\|\Lambda - \tilde\Lambda\big\|_1 = \sum_i\sum_j \Lambda_{ij}\,\mathbb{I}\big(i\in V_{\mathrm{row}}\ \text{or}\ j\in V_{\mathrm{col}}\big) \le \sum_i\underbrace{\sum_j\Lambda_{ij}\,\mathbb{I}(i\in V_{\mathrm{row}})}_{=:Z_i} + \sum_j\underbrace{\sum_i\Lambda_{ij}\,\mathbb{I}(j\in V_{\mathrm{col}})}_{=:Z'_j}. \tag{22} \]
The variables $\{Z_i, i\in[n]\}$ are independent. Below we show that they are sub-exponential, for which we recall that a variable $X$ is called sub-exponential with parameter $\lambda$ if
\[ \mathbb{E}\,e^{t(X - \mathbb{E}X)} \le e^{t^2\lambda^2/2}, \qquad \text{for all } t\in\mathbb{R}\text{ with } |t| \le \tfrac1\lambda. \tag{23} \]
It is a standard fact [54] that sub-exponential variables can be equivalently defined in terms of the tail condition
\[ \mathbb{P}\{|X| \ge z\} \le 2e^{-8z/\lambda}, \qquad \text{for all } z \ge 0. \tag{24} \]
For the sake of completeness and to track explicit constant values, we prove in Section C.6 the one-sided implication (24) $\Rightarrow$ (23), which is what we need below. To verify the condition (24) for each $Z_i$, we observe that by definition either $Z_i = 0$ or $Z_i \ge 40pn$.
Therefore, for each number $z \ge 40pn$, we have the tail bound
\[ \mathbb{P}\{Z_i \ge z\} = \mathbb{P}\{\mathrm{pos}_{i\bullet}(\Lambda) \ge z\} \overset{(i)}{\le} \exp\left(-\frac{(z - pn)^2}{2z}\right) \le \exp\left(-\frac{(z - z/40)^2}{2z}\right) \le e^{-z/4}, \]
where in step (i) we use the previously established bound (21). For each $0 < z < 40pn$, we use the bound (21) again to get
\[ \mathbb{P}\{Z_i \ge z\} = \mathbb{P}\{Z_i \ge 40pn\} = \mathbb{P}\{\mathrm{pos}_{i\bullet}(\Lambda) \ge 40pn\} \le \exp\left(-\frac{(40pn - pn)^2}{2\cdot 40pn}\right) \le e^{-10pn} \le e^{-z/4}. \]
We conclude that $\mathbb{P}\{Z_i \ge z\} \le e^{-z/4}$ for all $z \ge 0$. Since $Z_i$ is non-negative, it satisfies the tail condition (24) with $\lambda = 32$ and hence is sub-exponential as claimed. Moreover, the non-negative variable $Z_i$ satisfies the expectation bound
\[ \mathbb{E}Z_i = \int_0^{20pn}\mathbb{P}\{Z_i > z\}\,dz + \int_{20pn}^{\infty}\mathbb{P}\{Z_i > z\}\,dz \overset{(i)}{\le} \int_0^{20pn}e^{-8pn}\,dz + \int_{20pn}^{\infty}e^{-z/4}\,dz = 20pn\cdot e^{-8pn} + 4e^{-5pn} \overset{(ii)}{\le} 24pn\,e^{-5pn}, \]
where step (i) follows from the previously established tail bounds for $Z_i$, and step (ii) follows from the assumption that $p \ge C_p/n$ for some sufficiently large constant $C_p > 0$. It follows that $\mathbb{E}\sum_i Z_i \le 24pn^2 e^{-5pn}$.

To control the sum $\sum_i Z_i$ in equation (22), we use a Bernstein-type inequality for sums of sub-exponential random variables.

Lemma 7 (Theorem 1.13 of [52]). Suppose that $X_1,\dots,X_n$ are independent random variables, each of which is sub-exponential with parameter $\lambda$ in the sense of (23). Then for each $t \ge 0$, we have
\[ \mathbb{P}\Big\{\sum_{i=1}^n (X_i - \mathbb{E}X_i) \ge t\Big\} \le \exp\left(-\frac12\min\Big(\frac{t^2}{\lambda^2 n},\ \frac{t}{\lambda}\Big)\right). \]

For a sufficiently large constant $C_2 > 1$, we apply the lemma above to obtain the tail bound
\[ \mathbb{P}\Big\{\sum_i Z_i \ge 2C_2\,pn^2 e^{-5pn}\Big\} \overset{(i)}{\le} \mathbb{P}\Big\{\sum_i (Z_i - \mathbb{E}Z_i) \ge C_2\,pn^2 e^{-5pn}\Big\} \overset{(ii)}{\le} \exp\left(-\frac12\min\left(\frac{\big(C_2\,pn^2 e^{-5pn}\big)^2}{(32)^2\,n},\ \frac{C_2\,pn^2 e^{-5pn}}{32}\right)\right), \]
where step (i) follows from our previous bound on $\mathbb{E}\sum_i Z_i$.
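The split of the expectation integral for $\mathbb{E}Z_i$ can be sanity-checked numerically; the sketch below uses the illustrative value $pn = 2$ (any $pn \ge 1$ makes the final step valid) and compares the closed-form tail integral against a simple Riemann sum:

```python
import math

pn = 2.0   # illustrative value of p*n, chosen with pn >= 1

# Closed forms of the two pieces bounding E[Z_i]:
piece1 = 20 * pn * math.exp(-8 * pn)   # integral of e^{-8pn} over [0, 20pn]
piece2 = 4 * math.exp(-5 * pn)         # integral of e^{-z/4} over [20pn, inf)

# Left Riemann sum for the second integral (truncated far in the tail):
step = 1e-2
approx2 = sum(math.exp(-(20 * pn + k * step) / 4) * step
              for k in range(int(100 / step)))

total = piece1 + piece2
bound = 24 * pn * math.exp(-5 * pn)    # the final bound 24*pn*e^{-5pn}
print(total, bound)
```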
To proceed, we recall the assumption in Lemma 6 that $p$ satisfies $C_p/n \le p \le c_d\log n/n$ for $C_p$ sufficiently large and $c_d > 0$ sufficiently small, which implies that
\[ \log\big(C_2\,pn^2 e^{-5pn}\big) = \log C_2 + \log(pn) + \log n - 5pn \ge \tfrac34\log n, \]
whence $C_2\,pn^2 e^{-5pn} \ge n^{3/4}$. Plugging this into the above tail bound, we get
\[ \mathbb{P}\Big\{\sum_i Z_i \ge 2C_2\,pn^2 e^{-5pn}\Big\} \le \exp\left(-\frac12\min\Big(\frac{n^{1/2}}{(32)^2},\ \frac{n^{3/4}}{32}\Big)\right) \le e^{-c'\sqrt n} \]
for some absolute constant $c'$. The same tail bound holds for the sum $\sum_j Z'_j$. Combining these two bounds with the inequality (22), we obtain that
\[ \big\|\Lambda - \tilde\Lambda\big\|_1 \le 4C_2\,pn^2 e^{-5pn} \]
with probability at least $1 - 2e^{-c'\sqrt n}$.

Putting together the above bounds on $\|\mathbb{E}\Lambda - \widetilde{\mathbb{E}\Lambda}\|_1$ and $\|\Lambda - \tilde\Lambda\|_1$, we conclude that there exists an absolute constant $C > 0$ such that
\[ \big\|\Psi - \tilde\Psi\big\|_1 \le \big\|\mathbb{E}\Lambda - \widetilde{\mathbb{E}\Lambda}\big\|_1 + \big\|\Lambda - \tilde\Lambda\big\|_1 \le Cpn^2 e^{-pn/C_e} \]
with probability at least $1 - 4e^{-c'\sqrt n}$, as desired.

5 Proof of Theorem 3

We only need to show that the conclusion of Theorem 1 (the bound on $\|\hat Y - Y^*\|_1$) holds under the Heterogeneous SBM in Model 2. Once this is established, the conclusion in Theorem 2 follows immediately, as the clustering error is proportional to $\|\hat Y - Y^*\|_1$ (see Proposition 3).

To achieve the goal above, we first consider a model that bridges Model 1 and Model 2, and prove robustness under this intermediate model.

Model 4 (Semi-random Stochastic Block Model). Given an arbitrary set of node pairs $L \subseteq \{(i,j)\in[n]\times[n] : i < j\}$ and the ground-truth clustering $\sigma^*$ (encoded in the matrix $Y^*$), a graph is generated as follows. For each $(i,j)\in L$, $A_{ij} = Y^*_{ij}$ deterministically. For each $(i,j)\notin L$, $A_{ij}$ is generated according to the standard SBM in Model 1; that is, $A_{ij}\sim\mathrm{Bern}(p)$ if $Y^*_{ij} = 1$, and $A_{ij}\sim\mathrm{Bern}(q)$ if $Y^*_{ij} = 0$.
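For concreteness, a minimal sampler for Model 4 might look as follows (the helper `semi_random_sbm`, the sizes, and the probabilities are all illustrative, not taken from the paper): pairs in $L$ are fixed to the ground truth, and all remaining pairs follow the standard SBM of Model 1.

```python
import numpy as np

def semi_random_sbm(Y_star, L, p, q, rng):
    # Sample an adjacency matrix from the semi-random SBM (Model 4):
    # pairs in L are set deterministically to Y*_{ij}; all other pairs
    # are Bern(p) within clusters and Bern(q) across clusters.
    n = Y_star.shape[0]
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in L:
                A[i, j] = Y_star[i, j]
            else:
                prob = p if Y_star[i, j] == 1 else q
                A[i, j] = int(rng.random() < prob)
            A[j, i] = A[i, j]
    return A

# Two clusters of size 3; Y* is block-diagonal with all-ones blocks.
labels = np.array([0, 0, 0, 1, 1, 1])
Y_star = (labels[:, None] == labels[None, :]).astype(int)
L = {(0, 1), (3, 4)}
A = semi_random_sbm(Y_star, L, p=0.9, q=0.1, rng=np.random.default_rng(7))
print(A)
```

The entries indexed by $L$ agree with $Y^*$ by construction, while the rest are random.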
Note that the standard SBM in Model 1 is a special case of the Semi-random SBM in Model 4 with an empty set $L$. Model 4 is in turn a special case of the Heterogeneous SBM in Model 2, in which $A_{ij}\sim\mathrm{Bern}\big(\mathbb{I}(Y^*_{ij}=1)\big)$ for each $(i,j)\in L$, and $A_{ij}\sim\mathrm{Bern}\big(p\,\mathbb{I}(Y^*_{ij}=1) + q\,\mathbb{I}(Y^*_{ij}=0)\big)$ for each $(i,j)\notin L$.

We first show that the conclusion of Theorem 1 holds under Model 4. We do so by verifying the steps of the proof of Theorem 1 given in Section 4. The proof consists of Fact 1, which concerns the expected graph $\mathbb{E}A$, and Propositions 1 and 2, which bound the deviation terms $S_1$ and $S_2$. Under Model 4, Fact 1 remains valid: each entry of $A$ in the set $L$ is changed in the direction of the sign of the corresponding entry of $\hat Y - Y^*$, which only increases the left-hand side of equation (10). Both terms $S_1$ and $S_2$ involve the noise matrix $W := A - \mathbb{E}A$. Examining the proofs of Propositions 1 and 2, we see that they rely only on the independence of the entries of $\Psi := \Lambda - \mathbb{E}\Lambda$ (that is, the upper triangular entries of $W := A - \mathbb{E}A$), as well as on upper bounds on their expectations and variances. Under Model 4, the entries of $A$ indexed by $L$ are deterministically set to $1$ or $0$; the corresponding entries of $\Psi$ become $\mathrm{Bern}(0)$ variables, in which case they remain independent, with their expectations and variances decreased to zero.^5 Therefore, Propositions 1 and 2, and hence all the steps in the proof of Theorem 1, continue to hold under Model 4.

We now turn to the Heterogeneous SBM in Model 2, and show that this model can be reduced to Model 4 via a coupling argument. Suppose that under Model 2, $A_{ij}\sim\mathrm{Bern}(p_{ij})$ for each $(i,j)$ with $Y^*_{ij} = 1$, and $A_{ij}\sim\mathrm{Bern}(q_{ij})$ for each $(i,j)$ with $Y^*_{ij} = 0$, where by assumption $p_{ij}\ge p$ and $q_{ij}\le q$. Model 2 is equivalent to the following 3-step process:

^5 We illustrate this argument using Section 4.4 as an example.
There we would like to upper bound the quantity $\|\Psi - \tilde\Psi\|_1$. Suppose that under Model 4, the entry $(i,j)\in L$ is such that $A_{ij} = Y^*_{ij} = 1$ surely. In this case $\Psi_{ij}\sim\mathrm{Bern}(0)$, which can be written as $\Psi_{ij} = \Lambda_{ij} - \mathbb{E}\Lambda_{ij}$ with $\mathbb{E}\Lambda_{ij} = 0 \le p$ and $\mathrm{Var}(\Lambda_{ij}) = 0 \le p(1-p)$. This is all that is required when we proceed to bound $\|\mathbb{E}\Lambda - \widetilde{\mathbb{E}\Lambda}\|_1$ and $\|\Lambda - \tilde\Lambda\|_1$.

Step 1: We first generate a set of edge pairs $L$ as follows: independently for each $(i,j)$, $i<j$, if $Y^*_{ij} = 1$, then $(i,j)$ is included in $L$ with probability $1 - \frac{1 - p_{ij}}{1 - p}$; if $Y^*_{ij} = 0$, then $(i,j)$ is included in $L$ with probability $1 - \frac{q_{ij}}{q}$.

Step 2: Independently of the above, we sample a graph from Model 1; let $A^0$ denote its adjacency matrix.

Step 3: The final graph $A$ is constructed as follows: for each $(i,j)$, $i<j$ with $Y^*_{ij} = 1$, $A_{ij} = \mathbb{I}\{A^0_{ij} = 1 \text{ or } (i,j)\in L\}$; for each $(i,j)$, $i<j$ with $Y^*_{ij} = 0$, $A_{ij} = \mathbb{I}\{A^0_{ij} = 1 \text{ and } (i,j)\notin L\}$.

Note that the assumption $p_{ij}\ge p$ and $q_{ij}\le q$ ensures that the probabilities in Step 1 lie in $[0,1]$. One can verify that the distribution of $A$ is the same as in Model 2; in particular, for each $(i,j)$, $i<j$, we have
\[ \mathbb{P}(A_{ij} = 1) = \mathbb{P}\big\{A^0_{ij} = 1 \text{ or } (i,j)\in L\big\} = 1 - (1-p)\cdot\frac{1 - p_{ij}}{1 - p} = p_{ij}, \qquad \text{if } Y^*_{ij} = 1, \]
\[ \mathbb{P}(A_{ij} = 1) = \mathbb{P}\big\{A^0_{ij} = 1 \text{ and } (i,j)\notin L\big\} = q\cdot\frac{q_{ij}}{q} = q_{ij}, \qquad \text{if } Y^*_{ij} = 0. \]
Now, conditioned on the set $L$, the distribution of $A$ follows Model 4, since Steps 1 and 2 are independent. We have established above that under Model 4, the error bound in Theorem 1 holds with high probability. Integrating out the randomness of the set $L$ proves that the error bound holds with the same probability under Model 2.

Acknowledgement

Y. Fei and Y. Chen were partially supported by National Science Foundation CRII award 1657420.
Appendices

A Approximate k-medians algorithm

We describe the procedure for extracting an explicit clustering $\hat\sigma$ from the solution $\hat Y$ of the SDP (1). Our procedure builds on the idea in the work [17], and applies a version of the $k$-medians algorithm to the rows of $\hat Y$, viewed as $n$ points in $\mathbb{R}^n$. The $k$-medians algorithm seeks a partition of these $n$ points into $k$ clusters, and a center associated with each cluster, such that the sum of the $\ell_1$ distances of the points to their cluster centers is minimized. Note that this procedure differs from the standard $k$-means algorithm: the latter minimizes the sum of squared distances, and uses the $\ell_2$ distance rather than $\ell_1$.

To formally specify the algorithm, we need some additional notation. Let $\mathrm{Rows}(M)$ be the set of rows of the matrix $M$. Define $\mathcal{M}_{n,k}$ to be the set of membership matrices corresponding to all $k$-partitions of $[n]$; that is, $\mathcal{M}_{n,k}$ is the set of matrices in $\{0,1\}^{n\times k}$ such that each row has exactly one entry equal to $1$ and each column has at least one entry equal to $1$. By definition, each element of $\mathcal{M}_{n,k}$ has exactly $k$ distinct rows. The $k$-medians algorithm described above can be written as an optimization problem:
\[ \min_{\Psi, X}\ \|\Psi X - \hat Y\|_1 \]

Algorithm 1 Approximate k-medians ($\rho$-kmed)
Input: data matrix $\hat Y\in\mathbb{R}^{n\times n}$; approximation factor $\rho\ge 1$.
1. Use a $\rho$-approximation algorithm to solve the optimization problem (25). Denote the solution by $(\check\Psi, \check X)$.
2. For each $i\in[n]$, set $\hat\sigma_i\in[k]$ to be the index of the unique non-zero element of $\{\check\Psi_{i1}, \check\Psi_{i2},\dots,\check\Psi_{ik}\}$; assign node $i$ to cluster $\hat\sigma_i$.
Output: clustering assignment $\hat\sigma\in[k]^n$.

s.t.
\[ \Psi\in\mathcal{M}_{n,k}, \qquad X\in\mathbb{R}^{k\times n}, \qquad \mathrm{Rows}(X)\subseteq\mathrm{Rows}(\hat Y). \tag{25} \]
Here the optimization variable $\Psi$ encodes a partition of the $n$ points, assigning the $i$-th point to the cluster indexed by the unique non-zero element of the $i$-th row of $\Psi$; the rows of the variable $X$ represent the $k$ cluster centers. Note that the last constraint in (25) stipulates that the cluster centers be elements of the input data points represented by the rows of $\hat Y$, so this procedure is sometimes called $k$-medoids. Let $(\bar\Psi,\bar X)$ be the optimal solution of the problem (25). Finding the exact solution $(\bar\Psi,\bar X)$ is in general computationally intractable, but polynomial-time constant-factor approximation algorithms exist. In particular, we make use of the polynomial-time procedure in the paper [16], which produces an approximate solution $(\check\Psi,\check X)\in\mathcal{M}_{n,k}\times\mathbb{R}^{k\times n}$ that is guaranteed to satisfy $\mathrm{Rows}(\check X)\subseteq\mathrm{Rows}(\hat Y)$ and
\[ \|\check\Psi\check X - \hat Y\|_1 \le \rho\,\|\bar\Psi\bar X - \hat Y\|_1 \tag{26} \]
with approximation ratio $\rho = \frac{20}{3}$. The variable $\check\Psi$ encodes our final clustering assignment. The complete procedure $\rho$-kmed is given in Algorithm 1, which takes as input a data matrix $\hat Y$ (which in our case is the solution to the SDP (1)) and outputs an explicit clustering $\hat\sigma = \rho\text{-kmed}(\hat Y)$. We assume that the number of clusters $k$ is known.

B Proof of Lemma 2

Starting with the previously established inequality (11), we have the bound
\[ \gamma \le \frac{2}{p-q}\Big\langle \hat Y - Y^*,\ A - \mathbb{E}A\Big\rangle. \]
To control the right-hand side, we compute
\[ \Big\langle \hat Y - Y^*,\ A - \mathbb{E}A\Big\rangle = \Big\langle \hat Y,\ A - \mathbb{E}A\Big\rangle - \Big\langle Y^*,\ A - \mathbb{E}A\Big\rangle \le 2\sup_{Y\succeq 0,\ \mathrm{diag}(Y)\le\mathbf{1}} \big|\langle Y,\ A - \mathbb{E}A\rangle\big|. \]
Grothendieck's inequality [28, 38] guarantees that
\[ \sup_{Y\succeq 0,\ \mathrm{diag}(Y)\le\mathbf{1}} \big|\langle Y,\ A - \mathbb{E}A\rangle\big| \le K_G\,\|A - \mathbb{E}A\|_{\infty\to 1}, \]
where $K_G$ denotes the Grothendieck constant ($0 < K_G \le 1.783$) and $\|M\|_{\infty\to 1} := \sup_{x:\|x\|_\infty\le 1}\|Mx\|_1$ is the $\ell_\infty\to\ell_1$ operator norm of a matrix $M$.
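For small matrices the $\ell_\infty\to\ell_1$ norm just defined can be evaluated by brute force: since $x\mapsto\|Mx\|_1$ is convex, the supremum over the cube is attained at a sign vector, and the norm coincides with the bilinear form over sign vectors that appears next. A quick numerical check (random $5\times 5$ matrix, purely illustrative):

```python
import itertools
import numpy as np

def norm_inf_to_1(M):
    # ||M||_{inf->1} = sup_{||x||_inf <= 1} ||Mx||_1; by convexity the
    # supremum is attained at a vertex x of the cube {-1,+1}^n.
    n = M.shape[1]
    return max(np.abs(M @ np.array(x)).sum()
               for x in itertools.product([-1, 1], repeat=n))

def norm_bilinear(M):
    # Equivalent bilinear form: max over sign vectors y, z of y^T M z.
    m, n = M.shape
    return max(np.array(y) @ M @ np.array(z)
               for y in itertools.product([-1, 1], repeat=m)
               for z in itertools.product([-1, 1], repeat=n))

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
print(norm_inf_to_1(M), norm_bilinear(M))  # the two values coincide
```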
Furthermore, we have the identity
\[ \|A - \mathbb{E}A\|_{\infty\to 1} = \sup_{x:\|x\|_\infty\le 1}\big\|(A - \mathbb{E}A)x\big\|_1 = \sup_{y,z\in\{\pm1\}^n} y^\top(A - \mathbb{E}A)z. \]
Set $\sigma^2 := \sum_{1\le i<j\le n}\mathrm{Var}(A_{ij})$. For each pair of fixed vectors $y,z\in\{\pm1\}^n$, Bernstein's inequality ensures that for each number $t\ge 0$,
\[ \mathbb{P}\big\{ y^\top(A - \mathbb{E}A)z > t \big\} \le 2\exp\left(-\frac{t^2}{2\sigma^2 + 4t/3}\right). \]
Setting $t = \sqrt{16n\sigma^2} + \frac83 n$ gives
\[ \mathbb{P}\Big\{ y^\top(A - \mathbb{E}A)z > \sqrt{16n\sigma^2} + \tfrac83 n \Big\} \le 2e^{-2n}. \]
Applying the union bound and using the fact that $\sigma^2 \le p(n^2 - n)/2$, we obtain that with probability at most $2^{2n}\cdot 2e^{-2n} = 2(e/2)^{-2n}$,
\[ \|A - \mathbb{E}A\|_{\infty\to 1} > 2\sqrt{2p(n^3 - n^2)} + \tfrac83 n. \]
Combining pieces, we conclude that with probability at least $1 - 2(e/2)^{-2n}$,
\[ \Big\langle \hat Y - Y^*,\ A - \mathbb{E}A\Big\rangle \le 8\sqrt{2p(n^3 - n^2)} + \tfrac{32}{3}n, \]
whence
\[ \gamma \le \frac{2}{p-q}\left( 8\sqrt{2p(n^3 - n^2)} + \frac{32}{3}n \right) \overset{(i)}{\le} \frac{45\sqrt{pn^3}}{p-q} = 45\sqrt{\frac{n^3}{s}}, \]
where step (i) holds by our assumption $p \ge 1/n$.

C Technical lemmas

In this section we provide the proofs of the technical lemmas used in the main text.

C.1 Proof of Fact 1

For each pair $(i,j)$, if nodes $i$ and $j$ belong to the same cluster, then $Y^*_{ij} = 1 \ge \hat Y_{ij}$ and $\mathbb{E}A_{ij} - \frac{p+q}{2} = p - \frac{p+q}{2} = \frac{p-q}{2}$; if nodes $i$ and $j$ belong to different clusters, then $Y^*_{ij} = 0 \le \hat Y_{ij}$ and $\mathbb{E}A_{ij} - \frac{p+q}{2} = q - \frac{p+q}{2} = -\frac{p-q}{2}$. For $i = j$, we have $\hat Y_{ij} = Y^*_{ij}$. In each case, we have the expression
\[ \big(Y^*_{ij} - \hat Y_{ij}\big)\Big(\mathbb{E}A_{ij} - \frac{p+q}{2}\Big) = \frac{p-q}{2}\,\big|\hat Y_{ij} - Y^*_{ij}\big|. \]
Summing the above equation over all $i,j\in[n]$ gives the desired equation (10).

C.2 Proof of Lemma 3

To establish a uniform bound in $\beta$, we apply a discretization argument to the possible values of $\beta$. Define the shorthand $E := \big(e^{-g\rho/C_e}m,\ \lceil c_u m\rceil\big]$. We can cover $E$ by the sub-intervals $E_t := (t-1, t]$ for integers $t\in\big[\lceil e^{-g\rho/C_e}m\rceil,\ \lceil c_u m\rceil\big]$. By construction we have
\[ \frac{g\rho}{C_e} = \log\frac{m}{e^{-g\rho/C_e}\,m} \ge \log\frac{m}{t}. \tag{27} \]
Also note that our choice $D = \sqrt7$ satisfies $\frac12 D^2 \ge 3\big(1 + \frac{1}{3\sqrt{C_e}}D\big)$ since $C_e \ge 28$. For each $t\in[\lceil c_u m\rceil]$ we define the probabilities
\[ \alpha_t := \mathbb{P}\bigg\{\exists\,\beta\in E_t:\ \sum_{j=1}^{\lceil\beta\rceil} X_{(j)} > D\lceil\beta\rceil\sqrt{g\rho\log(m/\beta)}\bigg\}. \]
We bound each of these probabilities:
\begin{align*}
\alpha_t &\overset{(i)}{\le} \mathbb{P}\bigg\{\sum_{j=1}^{t} X_{(j)} > Dt\sqrt{g\rho\log(m/t)}\bigg\} = \mathbb{P}\bigg\{\bigcup_{i_1<\cdots<i_t}\bigg\{\sum_{j=1}^{t} X_{i_j} > Dt\sqrt{g\rho\log(m/t)}\bigg\}\bigg\} \\
&\le \sum_{i_1<\cdots<i_t}\mathbb{P}\bigg\{\sum_{j=1}^{t} X_{i_j} > Dt\sqrt{g\rho\log(m/t)}\bigg\} \le \sum_{i_1<\cdots<i_t}\mathbb{P}\bigg\{\max_{u\in\{\pm1\}^t}\sum_{j=1}^{t} X_{i_j}u_j > Dt\sqrt{g\rho\log(m/t)}\bigg\} \\
&\le \sum_{i_1<\cdots<i_t}\sum_{u\in\{\pm1\}^t}\mathbb{P}\bigg\{\sum_{j=1}^{t} X_{i_j}u_j > Dt\sqrt{g\rho\log(m/t)}\bigg\}, \tag{28}
\end{align*}
where step (i) holds since $\beta\in E_t$ implies $\beta \le \lceil\beta\rceil = t$. For each positive integer $t$, fix an index set $(i_j)_{j=1}^t$ and a sign pattern $u\in\{\pm1\}^t$. Note that $\sum_{j=1}^t X_{i_j}u_j$ is a sum of $tg$ centered Bernoulli random variables, with $\mathrm{Var}(B_{ij} - \mathbb{E}B_{ij}) \le \rho$ and $|B_{ij} - \mathbb{E}B_{ij}| \le 1$ for each $j\in[t]$ and $i\in[g]$. We apply Bernstein's inequality to bound the probability on the right-hand side of equation (28):
\[ \mathbb{P}\bigg\{\sum_{j=1}^{t} X_{i_j}u_j > Dt\sqrt{g\rho\log(m/t)}\bigg\} \le \exp\left(-\frac{\frac12 D^2 t^2 g\rho\log(m/t)}{tg\rho + \frac13 Dt\sqrt{g\rho\log(m/t)}}\right) \overset{(a)}{\le} \exp\left(-\frac{\frac12 D^2 t^2 g\rho\log(m/t)}{\big(1 + \frac{1}{3\sqrt{C_e}}D\big)\,tg\rho}\right) \overset{(b)}{\le} \exp\big(-3t\log(m/t)\big), \]
where step (a) holds by equation (27) and step (b) holds by our choice of the constant $D$. Plugging this back into equation (28), we get that for each $\lceil e^{-g\rho/C_e}m\rceil \le t \le \lceil c_u m\rceil$,
\[ \alpha_t \le \sum_{i_1<\cdots<i_t}\sum_{u\in\{\pm1\}^t}\exp\big(-3t\log(m/t)\big) = \binom{m}{t}\cdot 2^t\cdot\exp\big(-3t\log(m/t)\big) \le \Big(\frac{me}{t}\Big)^t e^t\exp\big(-3t\log(m/t)\big) = \exp\big(t\log(m/t) + 2t - 3t\log(m/t)\big) \le \exp\big(-t\log(m/t)\big) = \Big(\frac{t}{m}\Big)^t, \tag{29} \]
where the last inequality follows from $\log(m/t) \ge \log\big(m/(c_u m + 1)\big) \ge 2$, since $m \ge 8$ and $c_u$ is sufficiently small.
It follows that
\[ \mathbb{P}\bigg\{\exists\,\beta\in E:\ \sum_{j=1}^{\lceil\beta\rceil} X_{(j)} > D\lceil\beta\rceil\sqrt{g\rho\log(m/\beta)}\bigg\} \le \mathbb{P}\bigg\{\bigcup_{t=\lceil me^{-\rho g/C_e}\rceil}^{\lceil c_u m\rceil}\bigg\{\exists\,\beta\in E_t:\ \sum_{j=1}^{\lceil\beta\rceil} X_{(j)} > D\lceil\beta\rceil\sqrt{g\rho\log(m/\beta)}\bigg\}\bigg\} \le \sum_{t=\lceil me^{-\rho g/C_e}\rceil}^{\lceil c_u m\rceil}\alpha_t \le \sum_{t=1}^{\lceil c_u m\rceil}\Big(\frac{t}{m}\Big)^t =: P_1(m). \]
It remains to show that $P_1(m) \le \frac3m$. First suppose that $m \ge 16$. Since
\[ P_1(m) \le \frac1m + \frac{4}{m^2} + \frac{27}{m^3} + \sum_{t=4}^{\lceil c_u m\rceil}\Big(\frac tm\Big)^t \le \frac{1.5}{m} + m\cdot\max_{t=4,5,\dots,\lceil c_u m\rceil}\Big(\frac tm\Big)^t, \]
the proof is complete if for each integer $t = 4,5,\dots,\lceil c_u m\rceil$ we can show the bound $\big(\frac tm\big)^t \le \frac{1}{m^2}$, or equivalently $f(t) := t(\log m - \log t) \ge 2\log m$. Since $m \ge 16$, we must have $t \le \lceil c_u m\rceil \le c_u m + 1 \le \frac m3$, in which case $f(t)$ has derivative $f'(t) = \log m - \log t - 1 \ge \log 3 - 1 \ge 0$. Therefore, $f(t)$ is non-decreasing for $4 \le t \le \lceil c_u m\rceil$ and hence $f(t) \ge f(4) = 4\log m - 4\log 4 \ge 2\log m$. Now suppose that $m \le 16$. In this case, we have $\lceil c_u m\rceil = 1$, so $t$ can only be $1$ and hence $P_1(m) \le \frac1m \le \frac3m$.

C.3 Proof of Lemma 4

Our proof follows similar lines as that of Lemma 3 in [17]. We first bound $\mathbb{E}\|A - \mathbb{E}A\|_{\mathrm{op}}$ using a standard symmetrization argument. Let $A'$ be an independent copy of $A$, and let $R$ be an $n\times n$ symmetric matrix, independent of both $A$ and $A'$, with i.i.d. Rademacher entries on and above its diagonal. We can compute
\[ \mathbb{E}\|A - \mathbb{E}A\|_{\mathrm{op}} = \mathbb{E}\|A - \mathbb{E}A'\|_{\mathrm{op}} \overset{(i)}{\le} \mathbb{E}\|A - A'\|_{\mathrm{op}} \overset{(ii)}{=} \mathbb{E}\|(A - A')\circ R\|_{\mathrm{op}} \overset{(iii)}{\le} 2\,\mathbb{E}\|A\circ R\|_{\mathrm{op}}, \]
where step (i) follows from convexity of the spectral norm, step (ii) holds since the matrices $A - A'$ and $(A - A')\circ R$ are identically distributed, and step (iii) follows from the triangle inequality.

We proceed by bounding $\mathbb{E}\|A\circ R\|_{\mathrm{op}}$. Write $\|\zeta\|_a := (\mathbb{E}|\zeta|^a)^{1/a}$ for a random variable $\zeta$ and an integer $a \ge 0$. Define the scalars $(b_{ij})_{i\ge j}$ by $b_{ij} = \sqrt{\mathbb{E}A_{ij}}$.
Also define the independent random variables $(\xi_{ij})_{i\ge j}$ as follows: if $\mathbb{E}A_{ij} = 0$, then $\xi_{ij}$ is a Rademacher random variable; if $\mathbb{E}A_{ij} > 0$,
\[ \xi_{ij} = \begin{cases} \dfrac{1}{\sqrt{\mathbb{E}A_{ij}}}, & \text{with probability } \mathbb{E}A_{ij}/2,\\[2pt] -\dfrac{1}{\sqrt{\mathbb{E}A_{ij}}}, & \text{with probability } \mathbb{E}A_{ij}/2,\\[2pt] 0, & \text{with probability } 1 - \mathbb{E}A_{ij}. \end{cases} \]
It can be seen that the lower triangular entries of the symmetric matrix $A\circ R$ can be written as $A_{ij}R_{ij} = \xi_{ij}b_{ij}$. By definition, the random variables $(\xi_{ij})_{i\ge j}$ are symmetric and have zero mean and unit variance. We can therefore make use of the following known result.

Lemma 8 (Corollary 3.6 in [12]). Let $X$ be the $n\times n$ symmetric random matrix with $X_{ij} = \xi_{ij}b_{ij}$, where $\{\xi_{ij}, i\ge j\}$ are independent symmetric random variables with unit variance and $\{b_{ij} : i\ge j\}$ are given scalars. Then we have for any $\alpha \ge 3$,
\[ \mathbb{E}\|X\|_{\mathrm{op}} \le e^{2/\alpha}\Big( 2\sigma + 14\alpha\max_{ij}\|\xi_{ij}b_{ij}\|_{2\lceil\alpha\log n\rceil}\sqrt{\log n}\Big), \qquad \text{where } \sigma := \max_i\sqrt{\textstyle\sum_j b_{ij}^2}. \]

For every pair $i\ge j$, the random variable $\xi_{ij}b_{ij}$ is surely bounded by $1$ in absolute value and thus satisfies $\|\xi_{ij}b_{ij}\|_{2\lceil\alpha\log n\rceil} \le 1$ for any $\alpha\ge 3$. The scalars $(b_{ij})_{i\ge j}$ are all bounded by $\sqrt p$, so $\sigma \le \sqrt{pn}$. Applying the above lemma with $\alpha = 3$ gives
\[ \mathbb{E}\|A\circ R\|_{\mathrm{op}} \le e^{2/3}\big( 2\sqrt{pn} + 42\sqrt{\log n}\big) \le 4\sqrt{pn} + 84\sqrt{\log n}, \]
whence $\mathbb{E}\|A - \mathbb{E}A\|_{\mathrm{op}} \le 2\,\mathbb{E}\|A\circ R\|_{\mathrm{op}} \le 8\sqrt{pn} + 168\sqrt{\log n}$.

We complete the proof by bounding the deviation of $\|A - \mathbb{E}A\|_{\mathrm{op}}$ from its expectation. This can be done using a standard Lipschitz concentration inequality.

Lemma 9 (Theorem 6.10 in [14], generalized in Exercise 6.5 therein). Let $\mathcal{X}\subset\mathbb{R}^d$ be a convex compact set with diameter $B$. Let $X_1,\dots,X_N$ be independent random variables taking values in $\mathcal{X}$, and assume that $f:\mathcal{X}^N\to\mathbb{R}$ is separately convex and $1$-Lipschitz, that is, $|f(x) - f(y)| \le \|x - y\|$ for all $x,y\in\mathcal{X}^N\subset\mathbb{R}^{dN}$. Then $Z = f(X_1,\dots$
$,X_N)$ satisfies, for all $t > 0$,
\[ \mathbb{P}\{Z > \mathbb{E}Z + t\} \le e^{-t^2/(2B^2)}. \]

To use this result for our purpose, we note that $Z = \|A - \mathbb{E}A\|_{\mathrm{op}}/\sqrt2$ is a function of the $N = n(n-1)/2$ independent lower triangular entries of the symmetric matrix $A - \mathbb{E}A$. Moreover, this function is separately convex and $1$-Lipschitz. In our setting, each entry of $A - \mathbb{E}A$ takes values in the interval $\mathcal{X} = [-p, 1-q]$, which is convex and compact with diameter $B = 1 - q + p \le 2$. Applying the above lemma yields that for each $t \ge 0$,
\[ \mathbb{P}\big\{ \|A - \mathbb{E}A\|_{\mathrm{op}}/\sqrt2 > \mathbb{E}\|A - \mathbb{E}A\|_{\mathrm{op}}/\sqrt2 + t \big\} \le e^{-t^2/(2B^2)} \le e^{-t^2/8}. \]
Choosing $t = 3\sqrt{2\log n}$ and combining with the previous bound on $\mathbb{E}\|A - \mathbb{E}A\|_{\mathrm{op}}$ gives the desired inequality
\[ \mathbb{P}\big\{ \|A - \mathbb{E}A\|_{\mathrm{op}} > 8\sqrt{pn} + 174\sqrt{\log n} \big\} \le n^{-2}. \]

C.4 Proof of Lemma 5

For future reference, we state below a more general result; Lemma 5 is the special case with $\sigma^2 = p$.

Lemma 10. Suppose that $X\in\mathbb{R}^{n\times n}$ is a random matrix with independent entries of the following distributions:
\[ X_{ij} = \begin{cases} 1 - p_{ij}, & \text{with probability } p_{ij},\\ -p_{ij}, & \text{with probability } 1 - p_{ij}. \end{cases} \]
Let $\sigma$ be a number that satisfies $p_{ij} \le \sigma^2$ for all $(i,j)$, and let $\overleftrightarrow{X}$ be the matrix obtained from $X$ by zeroing out all the rows and columns having more than $40\sigma^2 n$ positive entries. Then with probability at least $1 - P_3 = 1 - e^{-(3-\ln2)2n} - (2n)^{-3}$, we have $\|\overleftrightarrow{X}\|_{\mathrm{op}} \le C\sigma\sqrt n$ for some absolute constant $C > 0$.

We now prove this lemma. Note that the matrix $X$ is not symmetric. To make use of a known result that requires symmetry, we employ a standard dilation argument. We construct the matrix
\[ D := \begin{bmatrix} O_1 & X\\ X^\top & O_2 \end{bmatrix} \in\mathbb{R}^{2n\times 2n}, \]
where $O_1, O_2\in\mathbb{R}^{n\times n}$ are all-zero matrices. The matrices $O_1$ and $O_2$ can be viewed as random symmetric matrices whose entries above the diagonal are independent centered Bernoulli random variables of rate zero.
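The dilation trick can be sanity-checked numerically: the symmetric dilation of $X$ has operator norm equal to the largest singular value of $X$, since its eigenvalues are exactly $\pm$ the singular values of $X$. A small illustrative example:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 5))            # an arbitrary (non-symmetric) matrix

# Symmetric dilation: D = [[0, X], [X^T, 0]]
O = np.zeros((5, 5))
D = np.block([[O, X], [X.T, O]])

norm_X = np.linalg.norm(X, 2)          # largest singular value of X
norm_D = np.linalg.norm(D, 2)          # operator norm of the dilation
print(norm_X, norm_D)                   # the two operator norms coincide
```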
The matrix $D$ is symmetric and satisfies the assumptions of the following known result with $N = 2n$.

Lemma 11 (Lemma 12 of [21]). Suppose that $D\in\mathbb{R}^{N\times N}$ is a random symmetric matrix with zeros on the diagonal, whose entries above the diagonal are independent with the following distributions:
\[ D_{ij} = \begin{cases} 1 - p_{ij}, & \text{with probability } p_{ij},\\ -p_{ij}, & \text{with probability } 1 - p_{ij}. \end{cases} \]
Let $\sigma$ be a quantity such that $p_{ij} \le \sigma^2$ for all $(i,j)\in[N]\times[N]$, and let $D_1$ be the matrix obtained from $D$ by zeroing out all the rows and columns having more than $20\sigma^2 N$ positive entries. Then with probability $1 - o(1)$, $\|D_1\|_{\mathrm{op}} \le C'\sigma\sqrt N$ for some absolute constant $C' > 0$.

Remark. The probability above is in fact $1 - o(1) = 1 - e^{-(3-\ln2)N} - N^{-3}$, as can be seen by inspecting the proof of the lemma; with $N = 2n$ this equals $1 - P_3$. Moreover, since $D$ is symmetric, the indices of the rows and columns being zeroed out are identical.

Lemma 11 ensures that $\|D_1\|_{\mathrm{op}} \le C'\sigma\sqrt{2n} = C\sigma\sqrt n$ with probability at least $1 - P_3$, where $C = \sqrt2\,C'$. The lemma at the beginning of this sub-section then follows from the claim that $\|\overleftrightarrow{X}\|_{\mathrm{op}} = \|D_1\|_{\mathrm{op}}$.

It remains to prove the above claim. Recall that $\mathrm{pos}_{i\bullet}(M)$ and $\mathrm{pos}_{\bullet j}(M)$ are the numbers of positive entries in the $i$-th row and $j$-th column of a matrix $M$, respectively. For each $(i,j)\in[n]\times[n]$, we observe that by construction,
\[ \mathrm{pos}_{i\bullet}(D) > 20\sigma^2 N \iff \mathrm{pos}_{i\bullet}(X) > 40\sigma^2 n, \qquad \mathrm{pos}_{\bullet(j+n)}(D) > 20\sigma^2 N \iff \mathrm{pos}_{\bullet j}(X) > 40\sigma^2 n, \]
and similarly
\[ \mathrm{pos}_{(i+n)\bullet}(D) > 20\sigma^2 N \iff \mathrm{pos}_{i\bullet}(X^\top) > 40\sigma^2 n, \qquad \mathrm{pos}_{\bullet j}(D) > 20\sigma^2 N \iff \mathrm{pos}_{\bullet j}(X^\top) > 40\sigma^2 n. \]
It then follows from the definitions of $D$ and $D_1$ that
\[ D_1 = \begin{bmatrix} O_1 & \overleftrightarrow{X}\\ \overleftrightarrow{X}^\top & O_2 \end{bmatrix}, \]
whence $\|D_1\|_{\mathrm{op}} = \|\overleftrightarrow{X}\|_{\mathrm{op}}$.

C.5 Proof of Fact 3

Because $\hat Y$ is feasible for the SDP (1), we know that $\|\hat Y\|_\infty \le 1$.
Let $c(i)$ be the index of the cluster that contains node $i$. For each $(i,j)\in[n]\times[n]$, we have the bound
\[ \big|(UU^\top\hat Y)_{ij}\big| = \bigg|\sum_{t:\,c(t)=c(i)} (UU^\top)_{it}\,\hat Y_{tj}\bigg| \le \ell\cdot\frac1\ell\cdot\|\hat Y\|_\infty \le 1, \]
thanks to the structure of the matrix $UU^\top$. The same bound holds for the matrices $\hat YUU^\top$ and $UU^\top\hat YUU^\top$ by similar arguments. It follows that
\[ \big\|\mathcal{P}_{T^\perp}(\hat Y)\big\|_\infty = \big\|\hat Y - UU^\top\hat Y - \hat YUU^\top + UU^\top\hat YUU^\top\big\|_\infty \le \|\hat Y\|_\infty + \|UU^\top\hat Y\|_\infty + \|\hat YUU^\top\|_\infty + \|UU^\top\hat YUU^\top\|_\infty \le 4. \]

C.6 Proof of the implication (24) $\Rightarrow$ (23)

We first show that the tail condition (24) for a random variable $X$ implies a bound on its moments. Note that we do not require $X$ to be centered.

Lemma 12. Suppose that for some $\lambda > 0$, the random variable $X$ satisfies $\mathbb{P}(|X| > z) \le 2e^{-8z/\lambda}$ for all $z \ge 0$. Then for each positive integer $m \ge 1$,
\[ \big(\mathbb{E}|X|^m\big)^{1/m} \le \tfrac14\lambda\,(m!)^{1/m}. \]

Proof. Let $\Gamma(\cdot)$ denote the Gamma function. We have the bound
\[ \mathbb{E}|X|^m = \int_0^\infty \mathbb{P}\{|X|^m > z\}\,dz = \int_0^\infty \mathbb{P}\{|X| > z^{1/m}\}\,dz \le \int_0^\infty 2\exp\Big(-\frac{8z^{1/m}}{\lambda}\Big)\,dz \overset{(i)}{=} 2\cdot(\lambda/8)^m\cdot m\int_0^\infty e^{-u}u^{m-1}\,du \le (\lambda/4)^m\cdot m\,\Gamma(m) = (\lambda/4)^m\,m!, \]
where step (i) follows from the change of variable $u = 8z^{1/m}/\lambda$. Taking the $m$-th root of both sides proves the result.

Using Minkowski's inequality, we can see that the above moment bounds are sub-additive:

Lemma 13. Suppose that for some $\lambda_1,\lambda_2 > 0$, the random variables $X_1$ and $X_2$ satisfy $\big(\mathbb{E}|X_1|^m\big)^{1/m} \le \lambda_1(m!)^{1/m}$ and $\big(\mathbb{E}|X_2|^m\big)^{1/m} \le \lambda_2(m!)^{1/m}$ for each positive integer $m$. Then for each positive integer $m$,
\[ \big(\mathbb{E}|X_1 + X_2|^m\big)^{1/m} \le (\lambda_1 + \lambda_2)(m!)^{1/m}. \]

Consequently, centering a random variable does not affect its sub-exponentiality, up to constant factors. In particular, suppose that $X$ satisfies the moment bounds in Lemma 12. Then for every positive integer $m$,
\[ \big(\mathbb{E}|\mathbb{E}X|^m\big)^{1/m} = \big(|\mathbb{E}X|^m\big)^{1/m} \overset{(i)}{\le} \big(\mathbb{E}|X|^m\big)^{1/m} \le \frac\lambda4(m!
)^{1/m}$, where step (i) uses Jensen's inequality. Applying Lemma 13 gives
\[ \big(\mathbb{E}|X - \mathbb{E}X|^m\big)^{1/m} \le \frac\lambda2(m!)^{1/m}. \]
The next lemma shows that this moment bound implies the bound (23) on the moment generating function, hence completing the proof of the implication (24) $\Rightarrow$ (23).

Lemma 14. Suppose that for some number $\lambda > 0$, the random variable $X$ satisfies $\big(\mathbb{E}|X - \mathbb{E}X|^m\big)^{1/m} \le \frac\lambda2\cdot(m!)^{1/m}$ for each positive integer $m$. Then we have
\[ \mathbb{E}\big[e^{t(X - \mathbb{E}X)}\big] \le e^{t^2\lambda^2/2}, \qquad \forall\,|t| \le \frac1\lambda. \]

Proof. For each $t$ such that $|t| \le 1/\lambda$, we have
\[ \mathbb{E}\big[e^{t(X - \mathbb{E}X)}\big] = 1 + t\,\mathbb{E}[X - \mathbb{E}X] + \sum_{m=2}^\infty\frac{t^m\,\mathbb{E}[(X - \mathbb{E}X)^m]}{m!} \overset{(i)}{\le} 1 + \sum_{m=2}^\infty\frac{|t|^m\,\mathbb{E}|X - \mathbb{E}X|^m}{m!} \le 1 + \sum_{m=2}^\infty\Big(\frac{|t|\lambda}{2}\Big)^m = 1 + \frac{t^2\lambda^2}{4}\sum_{m=0}^\infty\Big(\frac{|t|\lambda}{2}\Big)^m \le 1 + \frac{t^2\lambda^2}{2} \le e^{t^2\lambda^2/2}, \]
where step (i) holds since $\mathbb{E}[X - \mathbb{E}X] = 0$.

D Proof of Theorem 2

As mentioned in Appendix A, we use an approximate $k$-medians clustering algorithm with approximation ratio $\rho = \frac{20}{3}$. Theorem 2 follows immediately by combining Theorem 1 and the proposition below.

Proposition 3. The clustering assignment $\hat\sigma = \rho\text{-kmed}(\hat Y)$ produced by the $\rho$-approximate $k$-medians algorithm (Algorithm 1) satisfies the error bound
\[ \mathrm{err}(\hat\sigma,\sigma^*) \le 2(1 + 2\rho)\cdot\frac{\|\hat Y - Y^*\|_1}{\|Y^*\|_1}. \]

We note that a similar result appeared in the paper [17], specifically their Theorem 2 restricted to the non-degree-corrected setting. The proposition provides a moderate generalization, establishing a general relationship between the clustering error of the $\rho$-approximate $k$-medians procedure and the error of its input $\hat Y$.

The rest of this section is devoted to the proof of Proposition 3. Recall that $(\bar\Psi,\bar X)$ is the optimal solution of the $k$-medians problem (25), and $(\check\Psi,\check X)$ is a $\rho$-approximate solution. Set $\bar Y := \bar\Psi\bar X$ and $\check Y := \check\Psi\check X$.
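For small $k$, the error metric $\mathrm{err}(\hat\sigma,\sigma^*)$ appearing in Proposition 3 (the fraction of misclassified nodes, minimized over permutations of the cluster labels) can be computed by brute force; a minimal sketch with hypothetical label vectors:

```python
import itertools

def clustering_error(sigma_hat, sigma_star, k):
    # err(sigma_hat, sigma_star): fraction of misclassified nodes,
    # minimized over all permutations of the k cluster labels.
    n = len(sigma_star)
    best = n
    for perm in itertools.permutations(range(k)):
        relabeled = [perm[c] for c in sigma_hat]
        mistakes = sum(a != b for a, b in zip(relabeled, sigma_star))
        best = min(best, mistakes)
    return best / n

sigma_star = [0, 0, 0, 1, 1, 1]
sigma_hat  = [1, 1, 1, 0, 0, 2]   # same partition up to relabeling, one node off
print(clustering_error(sigma_hat, sigma_star, k=3))  # prints 1/6
```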
Note that the solution $(\check\Psi,\check X)$ corresponds to at most $k$ clusters. Without loss of generality we may assume that it actually contains exactly $k$ clusters, and thus the cluster membership matrix $\check\Psi$ is in $\mathcal{M}_{n,k}$ and has exactly $k$ distinct rows. If this is not true, we can always move an arbitrary point from the $n$ input points to a new cluster, without changing the $k$-medians objective value of the approximate solution $(\check\Psi,\check X)$.

We next rewrite the clustering error metric (7) in matrix form. Let $\Psi^*\in\mathcal{M}_{n,k}$ be the membership matrix corresponding to the ground-truth clusters; that is, for each $i\in[n]$, $\Psi^*_{i\sigma^*_i}$ is the only non-zero element of the $i$-th row of $\Psi^*$, and thus $\Psi^*(\Psi^*)^\top = Y^*$. Let $\mathcal{S}_k$ be the set of $k\times k$ permutation matrices. The set of misclassified nodes with respect to a permutation $\Pi\in\mathcal{S}_k$ is then given by
\[ E(\Pi) := \big\{ i\in[n] : (\check\Psi\Pi)_{i\bullet} \ne \Psi^*_{i\bullet} \big\}. \]
With this notation, the error metric (7) can be expressed as $\mathrm{err}(\hat\sigma,\sigma^*) = \min_{\Pi\in\mathcal{S}_k} n^{-1}|E(\Pi)|$, and it remains to bound the right-hand side.

To this end, we construct several useful sets. For each $a\in[k]$, let $C^*_a = \{i\in[n] : \sigma^*_i = a\}$ be the $a$-th cluster, and define the node index sets
\[ T_a := \big\{ i\in C^*_a : \|\check Y_{i\bullet} - Y^*_{i\bullet}\|_1 < \ell \big\} \qquad\text{and}\qquad S_a := C^*_a\setminus T_a. \]
Let $T := \bigcup_{a\in[k]} T_a$ and $S := \bigcup_{a\in[k]} S_a$. Note that $S_1,\dots,S_k,T_1,\dots,T_k$ are disjoint, with $T\cup S = [n]$. Further define the cluster index sets
\[ R_1 := \big\{a\in[k] : T_a = \emptyset\big\}, \]
\[ R_2 := \big\{a\in[k] : T_a \ne \emptyset;\ \check\Psi_{i\bullet} = \check\Psi_{j\bullet},\ \forall i,j\in T_a\big\}, \]
\[ R_3 := [k]\setminus(R_1\cup R_2) = \big\{a\in[k] : T_a \ne \emptyset;\ \check\Psi_{i\bullet} \ne \check\Psi_{j\bullet},\ \exists i,j\in T_a\big\}. \]
Note that $R_1$, $R_2$ and $R_3$ are disjoint sets. With the above notation, we have the following claims.

Claim 1. $\min_{\Pi\in\mathcal{S}_k}|E(\Pi)| \le |S| + |R_3|\,\ell$.

Claim 2. $|R_3| \le |R_1|$.

Claim 3. $|R_1|\,\ell \le |S| \le \frac1\ell\|\check Y - Y^*\|_1$.

Claim 4.
$\|\check Y - Y^*\|_1 \le (1+2\rho)\,\|\widehat Y - Y^*\|_1$.

Applying the above claims in order, we obtain
$$\min_{\Pi \in \mathcal{S}_k} n^{-1}|\mathcal{E}(\Pi)| \le \frac{2(1+2\rho)}{n\ell}\,\|\widehat Y - Y^*\|_1 = 2(1+2\rho)\,\frac{\|\widehat Y - Y^*\|_1}{\|Y^*\|_1},$$
where the last equality follows from $\|Y^*\|_1 = n\ell$. Combining with the aforementioned expression $\operatorname{err}(\widehat\sigma, \sigma^*) = \min_{\Pi \in \mathcal{S}_k} n^{-1}|\mathcal{E}(\Pi)|$ proves Proposition 3. We prove the above claims in the subsections to follow.

D.1 Proof of Claim 1

We record a property of the cluster membership matrix $\check\Psi$ of the approximate $k$-medians solution.

Lemma 15. For each cluster pair $a, b \in R_2 \cup R_3$ with $a \neq b$ and each node pair $i \in T_a$, $j \in T_b$, we have $\check\Psi_{i\bullet} \neq \check\Psi_{j\bullet}$.

Proof. For each pair $a, b \in R_2 \cup R_3$ with $a \neq b$, we have $T_a \neq \emptyset$ and $T_b \neq \emptyset$ by definition. For each pair $i \in T_a$, $j \in T_b$, we have the inequality
$$\|\check Y_{i\bullet} - \check Y_{j\bullet}\|_1 \ge \|Y^*_{i\bullet} - Y^*_{j\bullet}\|_1 - \|\check Y_{i\bullet} - Y^*_{i\bullet}\|_1 - \|\check Y_{j\bullet} - Y^*_{j\bullet}\|_1.$$
Since nodes $i$ and $j$ are in two different clusters, each of $Y^*_{i\bullet}$ and $Y^*_{j\bullet}$ is a binary vector with exactly $\ell$ ones, and they have disjoint supports; we therefore have $\|Y^*_{i\bullet} - Y^*_{j\bullet}\|_1 = 2\ell$. Moreover, note that $\|\check Y_{i\bullet} - Y^*_{i\bullet}\|_1 < \ell$ and $\|\check Y_{j\bullet} - Y^*_{j\bullet}\|_1 < \ell$ by definition of $T_a$ and $T_b$. It follows that $\|\check Y_{i\bullet} - \check Y_{j\bullet}\|_1 > 2\ell - \ell - \ell = 0$, and thus $\check Y_{i\bullet} \neq \check Y_{j\bullet}$. The latter implies that $\check\Psi_{i\bullet} \neq \check\Psi_{j\bullet}$; otherwise we would reach the contradiction $\check Y_{i\bullet} = \check\Psi_{i\bullet}\check X = \check\Psi_{j\bullet}\check X = \check Y_{j\bullet}$.

Proceeding to the proof of Claim 1, we observe that $S, T_1, T_2, \ldots, T_k$ are disjoint and satisfy
$$[n] = S \cup T = S \cup \bigcup_{a \in R_2} T_a \cup \bigcup_{a \in R_3} T_a,$$
where the last equality holds since $a \in R_1$ implies $T_a = \emptyset$. To prove the claim, it suffices to show that there exists some $\Pi \in \mathcal{S}_k$ such that $\mathcal{E}(\Pi) \subseteq S \cup \bigcup_{a \in R_3} T_a = [n] \setminus \bigcup_{a \in R_2} T_a$. Indeed, for each $a \in R_2$, any pair $i, j \in T_a$ satisfies $\check\Psi_{i\bullet} = \check\Psi_{j\bullet}$ by definition.
This fact, combined with Lemma 15, implies that there exists some $\Pi \in \mathcal{S}_k$ such that $(\check\Psi\Pi)_{i\bullet} = \Psi^*_{i\bullet}$ for all $i \in \bigcup_{a \in R_2} T_a$. By definition of $\mathcal{E}(\Pi)$, we have $\mathcal{E}(\Pi) \cap \bigcup_{a \in R_2} T_a = \emptyset$ and are therefore done.

D.2 Proof of Claim 2

The claim follows by rearranging the left- and right-hand sides of the relation
$$|R_2| + 2|R_3| \le k = |R_1| + |R_2| + |R_3|,$$
which we now prove. The equality follows from the definition of $R_1$, $R_2$ and $R_3$. For the inequality, note that each element of $R_2$ contributes at least one distinct row to $\check\Psi$ and each element of $R_3$ contributes at least two distinct rows to $\check\Psi$. The indices of these rows are all in $T$ by definition, and Lemma 15 guarantees that these rows are distinct across $R_2 \cup R_3$. The inequality then follows from the fact that $\check\Psi$ has $k$ distinct rows.

D.3 Proof of Claim 3

The first inequality in the claim holds because
$$|S| = \sum_{a \in [k]} |S_a| \ge \sum_{a \in R_1} |S_a| \overset{(i)}{=} \sum_{a \in R_1} |C^*_a| = |R_1|\,\ell,$$
where step (i) holds since $T_a = \emptyset$ for each $a \in R_1$ and thus $S_a = C^*_a$. On the other hand, we have
$$|S| \overset{(ii)}{\le} \sum_{i \in S} \frac{1}{\ell}\|\check Y_{i\bullet} - Y^*_{i\bullet}\|_1 \overset{(iii)}{\le} \frac{1}{\ell}\|\check Y - Y^*\|_1,$$
where step (ii) holds since $\frac{1}{\ell}\|\check Y_{i\bullet} - Y^*_{i\bullet}\|_1 \ge 1$ for each $i \in S$, and step (iii) holds since $S \subseteq [n]$. This proves the second inequality in the claim.

D.4 Proof of Claim 4

Recall that $Y^* = \Psi^*(\Psi^*)^\top$ and $\Psi^* \in \mathcal{M}_{n,k}$. Introducing an extra piece of notation $X^* \coloneqq (\Psi^*)^\top$, we can write $Y^* = \Psi^* X^*$. Let $\widetilde X \in \mathbb{R}^{k \times n}$ be the matrix whose $a$-th row is equal to the element of $\left\{ \widehat Y_{i\bullet} : i \in C^*_a \right\}$ that is closest to $X^*_{a\bullet}$ in $\ell_1$ norm; that is,
$$\widetilde X_{a\bullet} \coloneqq \operatorname*{arg\,min}_{x \in \{ \widehat Y_{i\bullet} : i \in C^*_a \}} \|x - X^*_{a\bullet}\|_1 \quad \text{for each } a \in [k].$$
Finally let $\widetilde Y \coloneqq \Psi^* \widetilde X$.
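The row-selection construction of $\widetilde X$ above can be illustrated on a tiny synthetic instance. The following sketch is our own illustration (the instance, noise level, and variable names are assumptions, not from the paper): it builds $Y^*$, perturbs it into a hypothetical estimate $\widehat Y$, selects each row of $\widetilde X$ as the closest row of $\widehat Y$ within the corresponding cluster, and checks the two inequalities $\|\widetilde Y - Y^*\|_1 \le \|\widehat Y - Y^*\|_1$ and $\|\widetilde Y - \widehat Y\|_1 \le 2\|\widehat Y - Y^*\|_1$ that this construction guarantees.

```python
import numpy as np

# Small synthetic instance (our own, illustrative): n = 6 nodes, k = 2
# equal-size clusters of size ell = 3.
n, k, ell = 6, 2, 3
sigma_star = np.array([0, 0, 0, 1, 1, 1])           # ground-truth labels
Psi_star = np.eye(k)[sigma_star]                    # membership matrix in M_{n,k}
Y_star = Psi_star @ Psi_star.T                      # ideal cluster matrix Y*
X_star = Psi_star.T                                 # X* := (Psi*)^T, so Y* = Psi* X*

rng = np.random.default_rng(0)
Y_hat = Y_star + 0.1 * rng.standard_normal((n, n))  # hypothetical noisy estimate

# Row selection: Xtilde_a is the row of Y_hat within cluster a that is
# closest in l1 norm to X*_a.
X_tilde = np.empty((k, n))
for a in range(k):
    idx = np.flatnonzero(sigma_star == a)
    dists = np.abs(Y_hat[idx] - X_star[a]).sum(axis=1)
    X_tilde[a] = Y_hat[idx[np.argmin(dists)]]
Y_tilde = Psi_star @ X_tilde

# The minimizing choice gives ||Ytilde - Y*||_1 <= ||Yhat - Y*||_1, and the
# triangle inequality then gives ||Ytilde - Yhat||_1 <= 2 ||Yhat - Y*||_1.
err_hat = np.abs(Y_hat - Y_star).sum()
assert np.abs(Y_tilde - Y_star).sum() <= err_hat + 1e-9
assert np.abs(Y_tilde - Y_hat).sum() <= 2 * err_hat + 1e-9
```

Both inequalities hold deterministically, for any perturbation, which is exactly why $(\Psi^*, \widetilde X)$ serves as a feasible comparison point in the proof that follows.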
We have the inequality
$$\|\widehat Y - Y^*\|_1 = \sum_{a \in [k]} \sum_{i \in C^*_a} \|\widehat Y_{i\bullet} - Y^*_{i\bullet}\|_1 \overset{(i)}{=} \sum_{a \in [k]} \sum_{i \in C^*_a} \|\widehat Y_{i\bullet} - X^*_{a\bullet}\|_1 \ge \sum_{a \in [k]} \sum_{i \in C^*_a} \|\widetilde X_{a\bullet} - X^*_{a\bullet}\|_1 \overset{(ii)}{=} \sum_{a \in [k]} \sum_{i \in C^*_a} \|\widetilde Y_{i\bullet} - Y^*_{i\bullet}\|_1 = \|\widetilde Y - Y^*\|_1,$$
where step (i) holds since for each $a \in [k]$, the $a$-th row of $X^*$ is the distinct row of $Y^*$ corresponding to the cluster $C^*_a$ and thus $X^*_{a\bullet} = Y^*_{i\bullet}$ for all $i \in C^*_a$; step (ii) can be justified by applying the same argument to $\widetilde X$ and $\widetilde Y$. It follows that
$$\|\widetilde Y - \widehat Y\|_1 \le \|\widehat Y - Y^*\|_1 + \|\widetilde Y - Y^*\|_1 \le 2\|\widehat Y - Y^*\|_1,$$
and hence
$$\|\check Y - \widehat Y\|_1 \overset{(i)}{\le} \rho\,\|Y - \widehat Y\|_1 \overset{(ii)}{\le} \rho\,\|\widetilde Y - \widehat Y\|_1 \le 2\rho\,\|\widehat Y - Y^*\|_1,$$
where step (i) holds by the approximation ratio guarantee (26), and step (ii) holds since $(\Psi, X)$ is optimal for the $k$-medians problem (25) (recall that $Y \coloneqq \Psi X$) while $(\Psi^*, \widetilde X)$ is feasible for the same problem (because $\Psi^* \in \mathcal{M}_{n,k}$ and by definition the rows of $\widetilde X$ are selected from $\widehat Y$). Combining pieces, we obtain
$$\|\check Y - Y^*\|_1 \le \|\widehat Y - Y^*\|_1 + \|\check Y - \widehat Y\|_1 \le (1+2\rho)\,\|\widehat Y - Y^*\|_1.$$

References

[1] Emmanuel Abbe. Community detection and the stochastic block model: recent developments. Journal of Machine Learning Research, to appear, 2017.

[2] Emmanuel Abbe, Afonso S. Bandeira, Annina Bracher, and Amit Singer. Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery. IEEE Transactions on Network Science and Engineering, 1(1):10–22, 2014.

[3] Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2016.

[4] Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery.
In IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 670–688. IEEE, 2015.

[5] Emmanuel Abbe and Colin Sandon. Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. arXiv preprint arXiv:1512.09080, 2015.

[6] Emmanuel Abbe and Colin Sandon. Recovering communities in the general stochastic block model without knowing the parameters. In Advances in Neural Information Processing Systems, pages 676–684, 2015.

[7] Naman Agarwal, Afonso S. Bandeira, Konstantinos Koiliaris, and Alexandra Kolla. Multisection in the stochastic block model using semidefinite programming. arXiv preprint arXiv:1507.02323, 2015.

[8] Brendan P. W. Ames. Guaranteed clustering and biclustering via semidefinite programming. Mathematical Programming, 147(1–2):429–465, 2014.

[9] Brendan P. W. Ames and Stephen A. Vavasis. Convex optimization for the planted k-disjoint-clique problem. Mathematical Programming, 143(1–2):299–337, 2014.

[10] Arash A. Amini and Elizaveta Levina. On semidefinite relaxations for the block model. arXiv preprint arXiv:1406.5647, 2014.

[11] Afonso S. Bandeira. Random Laplacian matrices and convex relaxations. Foundations of Computational Mathematics, pages 1–35, 2015.

[12] Afonso S. Bandeira and Ramon van Handel. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability, 44(4):2479–2506, 2016.

[13] Béla Bollobás and Alex D. Scott. Max cut for random graphs with a planted partition. Combinatorics, Probability and Computing, 13(4–5):451–474, 2004.

[14] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.

[15] T. Tony Cai and Xiaodong Li.
Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. Annals of Statistics, 43(3):1027–1059, 2015.

[16] Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 1–10. ACM, 1999.

[17] Yudong Chen, Xiaodong Li, and Jiaming Xu. Convexified modularity maximization for degree-corrected stochastic block models. arXiv preprint arXiv:1512.08425, December 2015.

[18] Yudong Chen, Sujay Sanghavi, and Huan Xu. Clustering sparse graphs. In Advances in Neural Information Processing Systems, pages 2204–2212, 2012.

[19] Yudong Chen, Sujay Sanghavi, and Huan Xu. Improved graph clustering. IEEE Transactions on Information Theory, 60(10):6440–6455, 2014.

[20] Yudong Chen and Jiaming Xu. Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. Journal of Machine Learning Research, 17(27):1–57, 2016.

[21] Peter Chin, Anup Rao, and Van Vu. Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery. In Proceedings of The 28th Conference on Learning Theory (COLT), pages 391–423, Paris, France, July 2015.

[22] Amin Coja-Oghlan. Coloring semirandom graphs optimally. Automata, Languages and Programming, pages 383–395, 2004.

[23] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84:066106, Dec 2011.

[24] Uriel Feige and Joe Kilian. Heuristics for semirandom graph problems. Journal of Computer and System Sciences, 63(4):639–671, 2001.

[25] Alan Frieze and Colin McDiarmid.
Algorithmic theory of random graphs. Random Structures and Algorithms, 10(1–2):5–42, 1997.

[26] Chao Gao, Zongming Ma, Anderson Y. Zhang, and Harrison H. Zhou. Achieving optimal misclassification proportion in stochastic block model. arXiv preprint arXiv:1505.03772, 2015.

[27] Chao Gao, Zongming Ma, Anderson Y. Zhang, and Harrison H. Zhou. Community detection in degree-corrected block models. arXiv preprint arXiv:1607.06993, 2016.

[28] Alexander Grothendieck. Résumé de la théorie métrique des produits tensoriels topologiques. Resenhas do Instituto de Matemática e Estatística da Universidade de São Paulo, 2(4):401–481, 1953.

[29] Olivier Guédon and Roman Vershynin. Community detection in sparse networks via Grothendieck's inequality. Probability Theory and Related Fields, 165(3–4):1025–1049, 2016.

[30] Bruce Hajek, Yihong Wu, and Jiaming Xu. Exact recovery threshold in the binary censored block model. In IEEE Information Theory Workshop–Fall (ITW), pages 99–103, 2015.

[31] Bruce Hajek, Yihong Wu, and Jiaming Xu. Achieving exact cluster recovery threshold via semidefinite programming. IEEE Transactions on Information Theory, 62(5):2788–2797, 2016.

[32] Simon Heimlicher, Marc Lelarge, and Laurent Massoulié. Community detection in the labelled stochastic block model. arXiv preprint arXiv:1209.2910, 2012.

[33] Paul W. Holland, Kathryn B. Laskey, and Samuel Leinhardt. Stochastic block models: Some first steps. Social Networks, 5:109–137, 1983.

[34] Adel Javanmard, Andrea Montanari, and Federico Ricci-Tersenghi. Phase transitions in semidefinite relaxations. Proceedings of the National Academy of Sciences, 113(16):E2218–E2223, 2016.

[35] Raghunandan H. Keshavan, Sewoong Oh, and Andrea Montanari. Matrix completion from a few entries. In 2009 IEEE International Symposium on Information Theory, pages 324–328. IEEE, 2009.
[36] Michael Krivelevich and Dan Vilenchik. Semirandom models as benchmarks for coloring algorithms. In Third Workshop on Analytic Algorithmics and Combinatorics (ANALCO), pages 211–221, 2006.

[37] Marc Lelarge, Laurent Massoulié, and Jiaming Xu. Reconstruction in the labelled stochastic block model. IEEE Transactions on Network Science and Engineering, 2(4):152–163, 2015.

[38] Joram Lindenstrauss and Aleksander Pełczyński. Absolutely summing operators in L_p-spaces and their applications. Studia Mathematica, 3(29):275–326, 1968.

[39] Yu Lu and Harrison H. Zhou. Statistical and computational guarantees of Lloyd's algorithm and its variants. arXiv preprint arXiv:1612.02099, 2016.

[40] Konstantin Makarychev, Yury Makarychev, and Aravindan Vijayaraghavan. Learning communities in the presence of errors. In 29th Annual Conference on Learning Theory, pages 1258–1291, 2016.

[41] Laurent Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 694–703. ACM, 2014.

[42] Ankur Moitra, William Perry, and Alexander S. Wein. How robust are reconstruction thresholds for community detection? In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 828–841. ACM, 2016.

[43] Andrea Montanari and Subhabrata Sen. Semidefinite programs on sparse random graphs and their application to community detection. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 814–827, Cambridge, MA, USA, June 2016.

[44] Cristopher Moore. The computer science and physics of community detection: Landscapes, phase transitions, and hardness. Bulletin of the European Association for Theoretical Computer Science (EATCS), (121), February 2017.

[45] Elchanan Mossel, Joe Neeman, and Allan Sly.
Stochastic block models and reconstruction. arXiv preprint arXiv:1202.1499, 2012.

[46] Elchanan Mossel, Joe Neeman, and Allan Sly. A proof of the block model threshold conjecture. arXiv preprint arXiv:1311.4115, 2013.

[47] Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3–4):431–461, 2015.

[48] Elchanan Mossel, Joe Neeman, and Allan Sly. Consistency thresholds for the planted bisection model. Electronic Journal of Probability, 21(21):1–24, 2016.

[49] Samet Oymak and Babak Hassibi. Finding dense clusters via "low rank + sparse" decomposition. arXiv preprint arXiv:1104.5186, 2011.

[50] William Perry and Alexander S. Wein. A semidefinite program for unbalanced multisection in the stochastic block model. arXiv preprint arXiv:1507.05605, 2015.

[51] Federico Ricci-Tersenghi, Adel Javanmard, and Andrea Montanari. Performance of a community detection algorithm based on semidefinite programming. Journal of Physics: Conference Series, 699(1):012015, 2016.

[52] Philippe Rigollet. 18.S997: High dimensional statistics. Lecture Notes, Cambridge, MA, USA: MIT OpenCourseWare, 2015.

[53] Alaa Saade, Marc Lelarge, Florent Krzakala, and Lenka Zdeborová. Spectral detection in the censored block model. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 1184–1188. IEEE, 2015.

[54] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Yonina C. Eldar and Gitta Kutyniok, editors, Compressed Sensing, pages 210–268. Cambridge University Press, 2012.

[55] Se-Young Yun and Alexandre Proutiere. Accurate community detection in the stochastic block model via spectral algorithms. arXiv preprint arXiv:1412.7335, 2014.

[56] Se-Young Yun and Alexandre Proutiere.
Optimal cluster recovery in the labeled stochastic block model. In Advances in Neural Information Processing Systems, pages 965–973, 2016.