How many asymmetric communities are there in multi-layer directed networks?

Huan Qing^{a,*}

^a School of Economics and Finance, Chongqing University of Technology, Chongqing, 400054, China
* Corresponding author. Email address: qinghuan@cqut.edu.cn & qinghuan@u.nus.edu (Huan Qing)

Abstract

Estimating the asymmetric numbers of communities in multi-layer directed networks is a challenging problem due to the multi-layer structure and the inherent directional asymmetry, leading to possibly different numbers of sender and receiver communities. This work addresses this issue under the multi-layer stochastic co-block model, a model for multi-layer directed networks with distinct community structures on the sending and receiving sides, by proposing a novel goodness-of-fit test. The test statistic relies on the deviation of the largest singular value of an aggregated normalized residual matrix from the constant 2. The test statistic exhibits a sharp dichotomy: under the null hypothesis of correct model specification, its upper bound converges to zero with high probability; under underfitting, the test statistic itself diverges to infinity. With this property, we develop a sequential testing procedure that searches through candidate pairs of sender and receiver community numbers in a lexicographic order. The process stops at the smallest such pair for which the test statistic drops below a decaying threshold. For robustness, we also propose a ratio-based variant algorithm, which detects sharp changes in the sequence of test statistics by comparing consecutive candidates. Both methods are proven to consistently determine the true numbers of sender and receiver communities under the multi-layer stochastic co-block model.

Keywords: Goodness-of-fit test, multi-layer stochastic co-block model, multi-layer directed networks, community detection

1. Introduction

Multi-layer directed networks have emerged as a fundamental representation for complex relational systems characterized by multiple, asymmetric interaction patterns. Such networks consist of a common set of nodes and multiple layers, each captured by a directed adjacency matrix. This structure naturally encodes two critical aspects of real-world relational data: the multiplicity of interaction contexts (layers) [30, 29, 7, 19] and the inherent directionality of relationships within each context [40, 50, 52, 49]. By preserving both the variety of interaction types and their directional nature, multi-layer directed networks offer a richer and more faithful representation of complex systems than single-layer or undirected abstractions. Representative examples span diverse fields: international trade networks, where layers correspond to different commodities and directed edges represent export flows [8]; brain connectivity studies, where layers reflect distinct cognitive tasks and edges model directed neural pathways [4, 2]; and social communication systems, where separate layers may capture email correspondence, co-authorship, or online social interactions [29, 22, 23, 19]. A central problem in analyzing such networks is community detection—the identification of groups of nodes that exhibit similar connectivity patterns [37, 12, 40, 13, 21, 45].
In directed networks, this task naturally decomposes into two complementary problems: identifying sender communities (nodes with similar outgoing connections) and receiver communities (nodes with similar incoming connections) [50, 60, 55, 59]. This distinction reflects the fundamental asymmetry of directed relationships, wherein a node's role in initiating interactions may be substantially different from its role in receiving them. The multi-layer stochastic co-block model (ML-ScBM) considered in [52] provides a principled probabilistic framework for this setting. It extends the stochastic co-block model (ScBM) introduced in [50] for single-layer directed networks—which itself generalizes the classical stochastic block model (SBM) [16] by allowing separate community assignments for senders and receivers—to multiple layers. Notably, the ML-ScBM can also be viewed as a generalization of the multi-layer stochastic block model (ML-SBM), which has been extensively studied in the network literature in recent years [15, 43, 44, 33, 58, 34, 36, 48], from the undirected to the directed setting. The ML-ScBM assumes that each node maintains consistent sender and receiver community memberships across all layers, while the connection probabilities between communities can vary from layer to layer. This formulation achieves an elegant balance between flexibility (accommodating layer-specific connectivity patterns) and parsimony (preserving a common community structure across layers), thereby naturally capturing asymmetric relational structures.

Under the ML-ScBM, the development of methods for estimating community memberships from observed data remains an active area. A notable and effective approach is the spectral co-clustering technique, which offers computational efficiency and theoretical guarantees. For instance, Su et al. [52] developed a debiased spectral co-clustering algorithm that aggregates information across layers via a bias-adjusted sum of Gram matrices, followed by k-means applied to the leading eigenvectors. The work in [52] extends the bias-adjusted spectral clustering framework introduced in Lei and Lin [34] from multi-layer undirected to multi-layer directed networks. This method provides estimation consistency for community recovery, as is commonly established for spectral clustering approaches in community detection (see, e.g., [46, 35, 24, 27, 60, 44, 55, 47], to name a few). A common requirement shared by this and any prospective method for detecting asymmetric communities in multi-layer directed networks is the prior knowledge of the number of sender communities $K_s$ and the number of receiver communities $K_r$. In practice, these quantities are rarely known in advance, making their estimation a fundamental prerequisite for any community detection procedure. This disconnect between methodological requirements and practical application underscores a pressing need for reliable, data-driven techniques to determine these key parameters.

The problem of estimating the number of communities has been extensively investigated for simpler network models.
For single-layer undirected SBMs, a variety of approaches exist, including likelihood-ratio tests [54, 39], Bayesian information criteria [42, 51, 17], the spectrum of the Bethe Hessian matrix [31, 20], network cross-validation [6, 38], and goodness-of-fit tests [11, 5, 32, 10, 18, 26, 57, 56]. However, a direct extension of these techniques to the multi-layer directed setting faces several inherent challenges. First, the sender and receiver community numbers may differ, requiring their joint estimation rather than independent treatment. Second, the presence of multiple layers induces complex dependencies and necessitates the integration of information across layers in a manner that separates common community signals from layer-specific noise. Third, the asymmetric nature of the network demands tools from non-symmetric matrix analysis, moving beyond the symmetric spectral theory used for methods in undirected networks. Consequently, existing single-layer methods are not directly applicable to the multi-layer directed case, necessitating the development of novel methodologies for this complex setting.

This paper addresses this gap by introducing the first comprehensive framework for the joint, consistent estimation of the asymmetric community numbers $K_s$ and $K_r$ in multi-layer directed networks. Our approach is built upon a novel goodness-of-fit test that exhibits a sharp theoretical dichotomy: under a correctly specified model, the upper bound of the test statistic converges to zero in probability, whereas under any underfitted model, the statistic itself diverges to infinity. This dichotomy provides a rigorous statistical foundation for model selection. Leveraging this property, we design two sequential algorithms that efficiently search the two-dimensional space of candidate pairs $(k_s, k_r)$. Both algorithms are computationally efficient and provably consistent under the ML-ScBM with mild regularity conditions. The main contributions of this work are as follows:

• A novel goodness-of-fit test for the ML-ScBM. We construct a test statistic based on the largest singular value of a normalized residual matrix and characterize its asymptotic behavior under both null (correctly specified) and alternative (underfitted) hypotheses. We establish that the upper bound of the statistic converges to zero under the null, while the statistic itself diverges to infinity under any form of underfitting, yielding a statistically principled criterion for model selection.

• Two sequential selection procedures for joint community number estimation. We develop two computationally efficient algorithms to search the two-dimensional space of candidate community numbers. The first algorithm employs a level-crossing rule with a data-driven threshold, examining candidate pairs in a lexicographic order and stopping when the test statistic first falls below the threshold. The second algorithm employs a ratio-based strategy: it computes the sequence of ratios of successive test statistics and selects the model at which this sequence exhibits a sharp rise, thereby identifying the transition from underfitted to adequate models.

• Rigorous theoretical guarantees. We prove the asymptotic properties of the test statistics and the consistency of both selection algorithms.
The analysis hinges on two main theoretical developments. First, we derive sharp concentration bounds for the largest singular value of the aggregated, normalized residual matrix under the null hypothesis, leveraging non-symmetric random matrix theory tailored to the multi-layer directed setting. Second, under underfitting, we identify and quantify a deterministic low-rank signal—arising from the merging of distinct communities—that dominates the stochastic noise asymptotically, which is the key driver behind the test statistic's divergence. The consistency proofs for the sequential algorithms then build on this dichotomy by carefully choosing the threshold sequences to separate these two regimes. Furthermore, our theory covers settings where the community numbers grow slowly with the network size, and it provides explicit conditions on the threshold sequences that ensure correct stopping.

The remainder of this paper is structured as follows. Section 2 formally introduces the multi-layer stochastic co-block model and states the community-number estimation problem. Section 3 develops the goodness-of-fit test and establishes its asymptotic properties. Section 4 presents the sequential selection approaches and proves their estimation consistency. Section 5 reports experimental studies that validate the theoretical findings and investigate the performance of the proposed methods. Section 6 concludes. All technical proofs are provided in the Appendix.

2. Model and problem setup

This section formally defines the multi-layer stochastic co-block model (ML-ScBM) and precisely states the core estimation problem. We specify the ML-ScBM's generative mechanism, which incorporates distinct sender and receiver community memberships across multiple layers to model asymmetric edge directionality. The central problem addressed is the joint estimation of the unknown sender and receiver community counts $(K_s, K_r)$ from observed multi-layer directed network data. Necessary assumptions for establishing theoretical guarantees are also presented in this section.

2.1. Multi-layer stochastic co-block model (ML-ScBM)

We study a multi-layer directed network with $L$ layers on $n$ vertices. Each layer $\ell$ is represented by an adjacency matrix $A^{(\ell)} \in \{0, 1\}^{n \times n}$, where $A^{(\ell)}_{ii} = 0$ and $A^{(\ell)}_{ij} = 1$ signifies a directed edge from node $i$ to node $j$ in layer $\ell$. The multi-layer stochastic co-block model (ML-ScBM) partitions the $n$ vertices into $K_s$ sender communities and $K_r$ receiver communities across all layers. Formally, the model is specified as follows.

Definition 1 (Multi-layer stochastic co-block model (ML-ScBM)). Consider a multi-layer directed network with $L$ layers and $n$ nodes per layer. Let $A^{(\ell)} \in \{0, 1\}^{n \times n}$ be the adjacency matrix for layer $\ell$, where $A^{(\ell)}_{ii} = 0$ and $A^{(\ell)}_{ij} = 1$ indicates a directed edge from node $i$ to node $j$ in layer $\ell$. The ML-ScBM is parameterized by:

• Sender community labels: $g_s \in \{1, \ldots, K_s\}^n$ (common across layers).

• Receiver community labels: $g_r \in \{1, \ldots, K_r\}^n$ (common across layers).

• Layer-specific block probability matrices: $B^{(\ell)} \in [0, 1]^{K_s \times K_r}$ for $\ell = 1, \ldots, L$.
Given $(g_s, g_r, \{B^{(\ell)}\}_{\ell=1}^{L})$, the entries of $A^{(\ell)}$ are independent with
\[
\mathbb{P}\big(A^{(\ell)}(i, j) = 1\big) = B^{(\ell)}(g_s(i), g_r(j)), \qquad \forall\, i \neq j,\ \ell = 1, \ldots, L.
\]
The expected adjacency matrix $\Omega^{(\ell)}$ for layer $\ell$ satisfies $\Omega^{(\ell)}(i, j) = B^{(\ell)}(g_s(i), g_r(j))$ for $i \neq j$ and $\Omega^{(\ell)}(i, i) = 0$.

The sender-receiver asymmetry inherent in the ML-ScBM—captured by the distinct membership vectors $g_s$ and $g_r$ and the possibly different numbers $K_s$ and $K_r$—provides a flexible framework for modeling asymmetric relational patterns in multi-layer directed networks. This generality allows the model to naturally incorporate direction-specific community structures that are often observed in real-world systems. The ML-ScBM contains several well-studied network models as special cases, thereby situating our work within a broader literature. When the sender and receiver memberships coincide ($g_s = g_r$) and every block probability matrix $B^{(\ell)}$ is symmetric, the model reduces to the multi-layer stochastic block model (ML-SBM) studied in [15, 43, 44, 33, 58, 34, 36, 48]. In the single-layer setting ($L = 1$), the ML-ScBM specializes to the stochastic co-block model (ScBM) introduced by [50]. Finally, when $L = 1$ and the network is undirected, the model degenerates to the classical stochastic block model (SBM) of [16]. Thus, the ML-ScBM offers a unified probabilistic framework that simultaneously generalizes the ML-SBM, the ScBM, and the SBM, while preserving the essential directional asymmetry required for analyzing multi-layer directed interactions.

2.2. Problem statement: joint estimation of asymmetric community numbers

In this paper, we address the fundamental problem of estimating the asymmetric community numbers in ML-ScBMs: given a multi-layer directed network, how can we jointly determine the number of sender communities $K_s$ and the number of receiver communities $K_r$? We formulate this as a sequential goodness-of-fit testing problem: for candidate pairs $(K_{s0}, K_{r0})$, we test
\[
H_0: (K_s, K_r) = (K_{s0}, K_{r0}) \quad \text{versus} \quad H_1: K_s > K_{s0} \ \text{or} \ K_r > K_{r0},
\]
where the alternative hypothesis $H_1$ explicitly encodes underfitting scenarios in which the hypothesized model lacks sufficient sender or receiver communities. This formulation is statistically challenging and practically significant for three reasons:

1. The bivariate nature of the community structure requires a combinatorial search over candidate pairs $(k_s, k_r)$. When an upper bound $K_{\mathrm{cand}}$ is set for the candidate community numbers, as specified in Section 4.1, the size of this search space is $K_{\mathrm{cand}}^2$, making efficient testing strategies essential.

2. Underfitting (failing to reject $H_0$ when $H_1$ holds) leads to a loss of structural resolution, while overfitting is mitigated algorithmically via the ordered search.

3. Existing goodness-of-fit tests for undirected networks rely on symmetric eigenvalue distributions that do not extend to the asymmetric singular value decompositions required for directed networks, and multi-layer extensions are non-trivial.
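As a concrete illustration of the generative mechanism in Definition 1, the following minimal Python/NumPy sketch builds the expected adjacency matrix from given labels and a block probability matrix and samples one layer. It is only an illustrative sketch under assumed toy parameters; the function name sample_layer, the random generator, and the 0-based label encoding are our own choices and not part of the model.

```python
import numpy as np

def sample_layer(B, g_s, g_r, rng):
    """One layer of Definition 1: A(i, j) ~ Bernoulli(B(g_s(i), g_r(j))), zero diagonal."""
    Omega = B[g_s][:, g_r]                       # Omega(i, j) = B(g_s(i), g_r(j))
    A = (rng.random(Omega.shape) < Omega).astype(int)
    np.fill_diagonal(A, 0)                       # A(i, i) = 0: no self-loops
    return A

# Toy example: n = 6 nodes, K_s = 2 sender and K_r = 3 receiver communities (0-based labels).
rng = np.random.default_rng(0)
g_s = np.array([0, 0, 0, 1, 1, 1])
g_r = np.array([0, 1, 2, 0, 1, 2])
B = np.array([[0.6, 0.2, 0.1],
              [0.1, 0.5, 0.3]])
A = sample_layer(B, g_s, g_r, rng)
```

Stacking $L$ independent draws of this kind, with layer-specific block matrices $B^{(\ell)}$, produces the multi-layer data $\{A^{(\ell)}\}_{\ell=1}^{L}$ analyzed throughout the paper.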
Given that few existing methods provide theoretically guaranteed joint estimation of $(K_s, K_r)$ under the ML-ScBM for multi-layer directed networks, this work bridges the gap by developing a testing framework based on singular value tail bounds of normalized residual matrices. Our approach leverages the asymptotic behavior of the largest singular value under $H_0$ versus $H_1$, enabling consistent community number estimation.

2.3. Technical assumptions

To establish theoretical guarantees, we require the following regularity conditions.

Assumption 1. For each layer $\ell$, the block probability matrix satisfies $\delta \le B^{(\ell)}(k, l) \le 1 - \delta$ for all $k, l$ and some $\delta \in (0, \tfrac{1}{2})$.

Assumption 1 ensures a well-defined variance in the residual matrix defined later and excludes degenerate cases where the normalization becomes unstable.

Assumption 2. The community sizes satisfy
\[
\min_{k = 1, \ldots, K_s} |\{ i : g_s(i) = k \}| \ge \frac{c_0 n}{K_s}, \qquad \min_{l = 1, \ldots, K_r} |\{ j : g_r(j) = l \}| \ge \frac{c_0 n}{K_r}
\]
for some $c_0 > 0$.

Assumption 2 prevents any community from being asymptotically negligible, guaranteeing that the sample size within every block grows linearly with $n$. Note that Assumptions 1 and 2 are standard in the community detection literature [32, 17, 18, 57, 56], representing conventional regularity conditions for estimating the number of communities.

Assumption 3. $K_{\max}$ and $L$ satisfy $\frac{K_{\max}^2 L \log n}{n} \to 0$ as $n \to \infty$, where $K_{\max} = \max(K_s, K_r)$.

Assumption 3 governs the asymptotic growth rates of the number of communities and layers relative to the network size. This condition, $K_{\max}^2 L \log n / n \to 0$, is fundamental for ensuring that the model complexity remains manageable as the sample size increases, which is essential for establishing the consistency of the estimation procedure. Its specific form arises directly from the convergence analysis of the normalized residual matrix $\hat{R}$: the spectral-norm error $\|\hat{R} - R\|$ is of order $O_P\big(K_{\max} \sqrt{L \log n / n}\big)$ by Lemma 5 in the Appendix, and squaring this bound yields the precise condition needed for the test statistic $\hat{T}_n$ to exhibit its required sharp dichotomy. Here, the term $K_{\max}^2$ captures the complexity from asymmetric sender and receiver communities, the factor $L$ accounts for the linear growth in variance when aggregating residuals across layers, and the $\log n$ factor originates from the concentration inequalities controlling maximum deviations. In practice, when the number of layers $L$ is fixed—a common scenario in multi-layer directed network studies—this condition simplifies to the standard single-layer scaling $K_{\max}^2 \log n / n \to 0$; when $L$ grows, it ensures that the additional variance from multiple layers does not obscure the community signal. Thus, Assumption 3 serves as a necessary scaling law that extends classical community detection theory to multi-layer directed networks, providing the foundation for the theoretical guarantees of the proposed estimation procedures.

Assumption 4. The community detection algorithm $\mathcal{M}$ used is consistent under $H_0$, i.e., $\mathbb{P}(\hat{g}_s = g_s) \to 1$ and $\mathbb{P}(\hat{g}_r = g_r) \to 1$ as $n \to \infty$, where $\hat{g}_s$ and $\hat{g}_r$ are the estimated sending and receiving community label vectors returned by $\mathcal{M}$ with $K_s$ sending communities and $K_r$ receiving communities.
Assumption 4 is standard in theoretical analyses of community detection methods, analogous to the consistency requirement in [32, 56], and is essential for establishing the asymptotic properties of the proposed goodness-of-fit test. In our context, this assumption ensures that the estimated community labels $\hat{g}_s, \hat{g}_r$ converge to the true labels $g_s, g_r$ with high probability as $n$ grows under $H_0$. The consistency guarantees that the plug-in estimates of the block probability matrices and the resulting residual matrix $\hat{R}$ are sufficiently close to their oracle counterparts, thereby enabling the derivation of the sharp dichotomy of the test statistic—convergence to zero under the null and divergence under underfitting—without being obscured by label estimation errors.

3. A spectral-based goodness-of-fit test

This section develops a theoretically grounded goodness-of-fit test for the ML-ScBM. We first introduce an ideal test statistic using oracle parameters, then derive its practical counterpart with estimated parameters, and finally establish its asymptotic behavior under both null and alternative hypotheses. The core innovation lies in leveraging singular value tail bounds of normalized residual matrices to detect community underfitting.

3.1. Oracle test statistic and its asymptotics

To formalize the test, we begin with the ideal residual matrix $R$ constructed using the true parameters:
\[
R(i, j) =
\begin{cases}
\dfrac{\sum_{\ell=1}^{L} \big( A^{(\ell)}(i, j) - \Omega^{(\ell)}(i, j) \big)}{\sqrt{(n - 1) \sum_{\ell=1}^{L} \Omega^{(\ell)}(i, j)\big(1 - \Omega^{(\ell)}(i, j)\big)}}, & i \neq j, \\[2ex]
0, & i = j,
\end{cases}
\]
where $\Omega^{(\ell)}(i, j) = B^{(\ell)}(g_s(i), g_r(j))$ is the true edge probability in layer $\ell$. This normalization ensures $\mathbb{E}[R(i, j)] = 0$ and $\mathrm{Var}(R(i, j)) = \frac{1}{n - 1}$ for $i \neq j$, transforming $R$ into a generalized random non-symmetric matrix with controlled variance. The ideal test statistic is defined as
\[
T_n = \sigma_1(R) - 2, \qquad (1)
\]
where $\sigma_1(\cdot)$ denotes the largest singular value. The shift by 2 accounts for the asymptotic behavior of $\sigma_1(R)$ under $H_0$, as established below.

Lemma 1 (Asymptotic behavior of $T_n$). When Assumptions 1 and 3 hold, for any $\epsilon > 0$, we have $\mathbb{P}(T_n < \epsilon) \to 1$ as $n \to \infty$.

Lemma 1 establishes the asymptotic spectral baseline for a correctly specified model. This result provides the essential theoretical anchor: it precisely quantifies the expected magnitude of the largest singular value of the ideal residual matrix under the null hypothesis, thereby forming the deterministic reference point required to later establish statistical power against underfitting.

3.2. Practical test statistic and its theoretical guarantees

In practice, model parameters are unknown. We approximate $T_n$ by estimating the block probability matrices and community labels from the multi-layer adjacency matrices. Let $\mathcal{M}$ be a community detection algorithm for multi-layer directed networks. Apply $\mathcal{M}$ to $\{A^{(\ell)}\}_{\ell=1}^{L}$ to partition the $n$ nodes into $K_{s0}$ sender communities and $K_{r0}$ receiver communities, yielding estimated sender labels $\hat{g}_s$ and receiver labels $\hat{g}_r$. Then, for each layer $\ell$, compute the plug-in estimator $\hat{B}^{(\ell)} \in [0, 1]^{K_{s0} \times K_{r0}}$ via
\[
\hat{B}^{(\ell)}(k, l) = \frac{\sum_{i : \hat{g}_s(i) = k} \sum_{j : \hat{g}_r(j) = l} A^{(\ell)}(i, j)}{|\{ i : \hat{g}_s(i) = k \}| \cdot |\{ j : \hat{g}_r(j) = l \}|}, \qquad k = 1, \ldots, K_{s0}, \quad l = 1, \ldots, K_{r0}.
\]
The estimated expected adjacency matrix $\hat{\Omega}^{(\ell)}$ for layer $\ell$ has entries $\hat{\Omega}^{(\ell)}(i, j) = \hat{B}^{(\ell)}(\hat{g}_s(i), \hat{g}_r(j))$ for $i \neq j$ and $\hat{\Omega}^{(\ell)}(i, i) = 0$. We then construct the normalized residual matrix $\hat{R} \in \mathbb{R}^{n \times n}$ as
\[
\hat{R}(i, j) =
\begin{cases}
\dfrac{\sum_{\ell=1}^{L} \big( A^{(\ell)}(i, j) - \hat{\Omega}^{(\ell)}(i, j) \big)}{\sqrt{(n - 1) \sum_{\ell=1}^{L} \hat{\Omega}^{(\ell)}(i, j)\big(1 - \hat{\Omega}^{(\ell)}(i, j)\big)}}, & i \neq j, \\[2ex]
0, & i = j.
\end{cases}
\qquad (2)
\]
The practical test statistic is defined as
\[
\hat{T}_n = \sigma_1(\hat{R}) - 2. \qquad (3)
\]
The normalization in $\hat{R}$ ensures $\mathbb{E}[\hat{R}(i, j)] \approx 0$ and $\mathrm{Var}[\hat{R}(i, j)] \approx (n - 1)^{-1}$ under $H_0$. The shift by 2 accounts for the asymptotic behavior of the upper bound of $\sigma_1(\hat{R})$ under a correctly specified model, where this upper bound concentrates near 2, as shown by Theorem 1 below. Under $H_1$, $\sigma_1(\hat{R})$ diverges due to unmodeled community structure, as guaranteed by Theorem 2 below.

While $\mathcal{M}$ can be any method with consistent community recovery for multi-layer directed networks under $H_0$, this paper employs the Debiased Sum of Gram matrices (DSoG) algorithm developed in [52], which extends the bias-adjusted spectral clustering idea first developed in [34] from multi-layer undirected networks to multi-layer directed networks. Algorithm 1 details this procedure.

Algorithm 1 Debiased Sum of Gram matrices (DSoG)
Require: Multi-layer adjacency matrices $\{A^{(\ell)}\}_{\ell=1}^{L}$, sender community number $K_{s0}$, receiver community number $K_{r0}$
Ensure: Estimated labels $\hat{g}_s$, $\hat{g}_r$
1: for $\ell = 1$ to $L$ do
2:   Compute the out-degree diagonal matrix $D^{\mathrm{out}}_{\ell}$ with $D^{\mathrm{out}}_{\ell}(i, i) = \sum_{j=1}^{n} A^{(\ell)}(i, j)$
3:   Compute the in-degree diagonal matrix $D^{\mathrm{in}}_{\ell}$ with $D^{\mathrm{in}}_{\ell}(i, i) = \sum_{j=1}^{n} A^{(\ell)}(j, i)$
4: end for
5: for side $\in$ {sender, receiver} do
6:   if side is sender then
7:     $S \leftarrow \sum_{\ell=1}^{L} \big( A^{(\ell)} (A^{(\ell)})^{\top} - D^{\mathrm{out}}_{\ell} \big)$
8:     $K \leftarrow K_{s0}$
9:   else
10:    $S \leftarrow \sum_{\ell=1}^{L} \big( (A^{(\ell)})^{\top} A^{(\ell)} - D^{\mathrm{in}}_{\ell} \big)$
11:    $K \leftarrow K_{r0}$
12:  end if
13:  Compute the top $\min(K_{s0}, K_{r0})$ eigenvectors of $S$ as $U \in \mathbb{R}^{n \times \min(K_{s0}, K_{r0})}$
14:  Apply $k$-means clustering to the rows of $U$ with $K$ clusters
15:  Assign the resulting labels to $\hat{g}_s$ (if sender) or $\hat{g}_r$ (if receiver)
16: end for
17: return $\hat{g}_s$, $\hat{g}_r$

Given that a consistent estimator such as Algorithm 1 provides accurate community labels under $H_0$, the plug-in residual matrix $\hat{R}$ is asymptotically well approximated by its oracle version $R$. This approximation ensures that the spectral behavior of $\hat{R}$ mirrors that of $R$, whose largest singular value concentrates near 2 under the correct model. We therefore obtain the following fundamental convergence result for the goodness-of-fit statistic $\hat{T}_n$.

Theorem 1 (Asymptotic behavior of $\hat{T}_n$ under $H_0$). Under $H_0$ and Assumptions 1-4, with $\hat{R}$ obtained using consistent community estimators $\hat{g}_s$ and $\hat{g}_r$, for any $\epsilon > 0$ we have $\mathbb{P}(\hat{T}_n < \epsilon) \to 1$ as $n \to \infty$.

Theorem 1 guarantees that the upper bound of $\hat{T}_n$ converges to 0 in probability under $H_0$. This provides the theoretical basis for decision rules: small values of $\hat{T}_n$ support $H_0$, while large values signal model inadequacy.
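To make Equations (2) and (3) concrete, the following minimal Python/NumPy sketch computes the plug-in block estimates $\hat{B}^{(\ell)}$, the normalized residual matrix $\hat{R}$, and the statistic $\hat{T}_n$ from the adjacency matrices and estimated labels. It is an illustrative sketch rather than the reference implementation: the function name test_statistic, the eps guard against zero variances, and the assumption that the labels come from some consistent estimator (such as Algorithm 1) are our own choices.

```python
import numpy as np

def test_statistic(A_layers, g_s_hat, g_r_hat, K_s0, K_r0, eps=1e-10):
    """Sketch of Equations (2)-(3): plug-in residual matrix R_hat and statistic T_hat_n.

    A_layers: list of (n, n) binary adjacency matrices, one per layer.
    g_s_hat, g_r_hat: estimated sender/receiver labels in {0, ..., K_s0-1} / {0, ..., K_r0-1}.
    """
    n = A_layers[0].shape[0]
    resid = np.zeros((n, n))   # accumulates sum_l (A^(l) - Omega_hat^(l))
    var = np.zeros((n, n))     # accumulates sum_l Omega_hat^(l) * (1 - Omega_hat^(l))
    for A in A_layers:
        # Plug-in block estimates B_hat^(l)(k, l): average of A^(l) over each estimated block.
        B_hat = np.zeros((K_s0, K_r0))
        for k in range(K_s0):
            for l in range(K_r0):
                block = A[np.ix_(g_s_hat == k, g_r_hat == l)]
                B_hat[k, l] = block.mean() if block.size else 0.0
        Omega_hat = B_hat[g_s_hat][:, g_r_hat]   # Omega_hat(i, j) = B_hat(g_s_hat(i), g_r_hat(j))
        np.fill_diagonal(Omega_hat, 0.0)
        resid += A - Omega_hat
        var += Omega_hat * (1.0 - Omega_hat)

    # Normalized residual matrix of Equation (2); eps guards against zero variances.
    R_hat = resid / np.sqrt((n - 1) * np.maximum(var, eps))
    np.fill_diagonal(R_hat, 0.0)

    # Equation (3): largest singular value of R_hat, shifted by 2.
    return np.linalg.svd(R_hat, compute_uv=False)[0] - 2.0
```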
The following theorem guarantees that the test statistic diverges for underfitted models where $K_s > K_{s0}$ or $K_r > K_{r0}$.

Theorem 2 (Asymptotic behavior of $\hat{T}_n$ under $H_1$). Under Assumptions 1-3, and with the following additional conditions:

(A1) There exists a constant $\eta > 0$ such that for any two distinct true sender communities $k, k'$, there exists a receiver community $l$ satisfying
\[
\Big| \frac{1}{L} \sum_{\ell=1}^{L} \big( B^{(\ell)}(k, l) - B^{(\ell)}(k', l) \big) \Big| \ge \eta.
\]
The symmetric version (with the roles of sender and receiver communities exchanged) should also hold.

(A2) The network size, number of layers, and maximum community number satisfy $\frac{nL}{K_{\max}^3} \to \infty$ as $n \to \infty$.

If either $K_s > K_{s0}$ or $K_r > K_{r0}$ (or both), then $\hat{T}_n \xrightarrow{P} \infty$.

This divergence property ensures that whenever the hypothesized model lacks sufficient sender or receiver communities, $\hat{T}_n$ will exceed any fixed threshold with probability approaching 1. Combined with Theorem 1, this guarantees asymptotic separation between correctly specified and underfitted models.

Remark 1 (Proof intuition for Theorem 2). The divergence of $\hat{T}_n$ under underfitting is driven by a structural bias that persists no matter how the parameters are estimated within the underspecified model. This bias arises when distinct true communities—say, in sender roles—are forced into a single estimated block. Condition (A1) guarantees the existence of a receiver community where the average connectivity of these merged groups differs by a fixed amount $\eta > 0$. This systematic gap creates a deterministic, low-rank signal in the residual matrix. The proof hinges on comparing the spectral norms of this signal and the random noise: the signal grows as $O_P(Ln / K_{\max}^{3/2})$, while the noise is only $O_P(\sqrt{nL})$. After the variance normalization in $\hat{R}$, the signal term remains of order $\sqrt{nL} / K_{\max}^{3/2}$, whereas the noise is $O_P(1)$. Condition (A2) ($nL / K_{\max}^3 \to \infty$) ensures that the signal dominates asymptotically, forcing $\sigma_1(\hat{R})$—and hence $\hat{T}_n$—to diverge. This sharp contrast between bounded fluctuations under a correct specification (Theorem 1) and divergent growth under underfitting provides the rigorous foundation for the model-selection procedures developed in Section 4.

4. Sequential testing algorithms for asymmetric community number selection

Building on the goodness-of-fit test developed in Section 3, we now address the core problem of joint community number estimation. The sequential testing framework leverages the asymptotic behavior of $\hat{T}_n$ established in Theorems 1 and 2 to systematically identify the true $(K_s, K_r)$ while maintaining computational efficiency. This approach transforms model selection into an ordered exploration of candidate pairs, where the test statistic's dichotomous behavior—convergence to zero under correct specification versus divergence under underfitting—provides reliable stopping criteria. We further propose a ratio-based variant of this approach to enhance robustness in practical settings.

4.1. The MLDiGoF algorithm and its estimation consistency

The estimation procedure evaluates candidate pairs $(k_s, k_r)$ in a lexicographic order that first compares the total number of communities $k_s + k_r$, with smaller totals prioritized. For pairs with an equal total, the pair with the smaller sender community number $k_s$ is examined first.
Formally, we define this search order as follows: $(k_s, k_r) < (k_s', k_r')$ if and only if
\[
k_s + k_r < k_s' + k_r', \quad \text{or} \quad k_s + k_r = k_s' + k_r' \ \text{and} \ k_s < k_s'.
\]
This strategy guides the search from simpler to more complex models. Let $\mathcal{P} = \{(k_s^{(m)}, k_r^{(m)})\}_{m=1}^{M}$ be the complete sequence of candidate pairs from $(1, 1)$ to $(K_{\mathrm{cand}}, K_{\mathrm{cand}})$, listed in the order defined above, where $K_{\mathrm{cand}}$ is the maximum candidate number of communities, $M = K_{\mathrm{cand}}^2$, and the index $m$ denotes the position of the candidate pair $(k_s, k_r)$ in the ordered sequence $\mathcal{P}$. The detailed mapping between the index $m$ and the candidate pair $(k_s, k_r)$ for $K_{\mathrm{cand}} = 10$ can be found in the following example.

Example 1. Consider searching over candidate sender community numbers $k_s$ from 1 to 10 and receiver community numbers $k_r$ from 1 to 10. Table 1 lists the candidate pairs $\mathcal{P}$ in the order defined in this paper (first by increasing $k_s + k_r$, then by increasing $k_s$), with the index $m$ running from 1 to 100.

The estimator $(\hat{K}_s, \hat{K}_r)$ is the first pair in the search sequence for which the goodness-of-fit test does not reject:
\[
(\hat{K}_s, \hat{K}_r) = (k_s^{(\hat{m})}, k_r^{(\hat{m})}), \quad \text{where} \quad \hat{m} = \min\{ m : \hat{T}_n(k_s^{(m)}, k_r^{(m)}) < t_n \}.
\]
Here, $t_n = n^{-\varepsilon}$ for some $\varepsilon \in (0, 0.5)$ is a threshold that decays to zero with the network size $n$. Algorithm 2 below summarizes this sequential testing procedure.

Table 1: Search order of candidate pairs $\mathcal{P}$ for $K_{\mathrm{cand}} = 10$.

 m (k_s,k_r)   m (k_s,k_r)   m (k_s,k_r)   m (k_s,k_r)   m (k_s,k_r)   m (k_s,k_r)   m (k_s,k_r)   m (k_s,k_r)   m (k_s,k_r)   m (k_s,k_r)
 1 (1,1)      11 (1,5)      21 (6,1)      31 (3,6)      41 (5,5)      51 (6,5)      61 (7,5)      71 (9,4)      81 (6,9)      91 (7,10)
 2 (1,2)      12 (2,4)      22 (1,7)      32 (4,5)      42 (6,4)      52 (7,4)      62 (8,4)      72 (10,3)     82 (7,8)      92 (8,9)
 3 (2,1)      13 (3,3)      23 (2,6)      33 (5,4)      43 (7,3)      53 (8,3)      63 (9,3)      73 (4,10)     83 (8,7)      93 (9,8)
 4 (1,3)      14 (4,2)      24 (3,5)      34 (6,3)      44 (8,2)      54 (9,2)      64 (10,2)     74 (5,9)      84 (9,6)      94 (10,7)
 5 (2,2)      15 (5,1)      25 (4,4)      35 (7,2)      45 (9,1)      55 (10,1)     65 (3,10)     75 (6,8)      85 (10,5)     95 (8,10)
 6 (3,1)      16 (1,6)      26 (5,3)      36 (8,1)      46 (1,10)     56 (2,10)     66 (4,9)      76 (7,7)      86 (6,10)     96 (9,9)
 7 (1,4)      17 (2,5)      27 (6,2)      37 (1,9)      47 (2,9)      57 (3,9)      67 (5,8)      77 (8,6)      87 (7,9)      97 (10,8)
 8 (2,3)      18 (3,4)      28 (7,1)      38 (2,8)      48 (3,8)      58 (4,8)      68 (6,7)      78 (9,5)      88 (8,8)      98 (9,10)
 9 (3,2)      19 (4,3)      29 (1,8)      39 (3,7)      49 (4,7)      59 (5,7)      69 (7,6)      79 (10,4)     89 (9,7)      99 (10,9)
10 (4,1)      20 (5,2)      30 (2,7)      40 (4,6)      50 (5,6)      60 (6,6)      70 (8,5)      80 (5,10)     90 (10,6)    100 (10,10)

Algorithm 2 MLDiGoF
Require: Multi-layer adjacency matrices $\{A^{(\ell)}\}_{\ell=1}^{L}$, significance threshold $t_n$ (default: $n^{-1/5}$), maximum candidate number $K_{\mathrm{cand}}$ (default: $\lfloor \sqrt{n / \log n} \rfloor$), where $n$ is the number of nodes
Ensure: Estimated community numbers $(\hat{K}_s, \hat{K}_r)$
1: Generate the candidate sequence $\mathcal{P} = \{(k_s, k_r)\}_{m=1}^{M}$ with $M = K_{\mathrm{cand}}^2$
2: for $m = 1$ to $M$ do
3:   Let $(k_s, k_r) \leftarrow \mathcal{P}(m)$
4:   Compute $\hat{T}_n(k_s, k_r)$ via Equation (3), using Algorithm 1 for community estimation
5:   if $\hat{T}_n(k_s, k_r) < t_n$ then
6:     return $(\hat{K}_s, \hat{K}_r) = (k_s, k_r)$
7:   end if
8: end for
9: return $(\hat{K}_s, \hat{K}_r) = \mathcal{P}(M)$   ▷ If no candidate satisfies $\hat{T}_n < t_n$, return the largest candidate
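For concreteness, the following minimal Python sketch enumerates the candidate pairs in the order of Table 1 and applies the level-crossing rule of Algorithm 2. It is a sketch, not the reference implementation: T_hat is an assumed callable returning $\hat{T}_n(k_s, k_r)$ via Equation (3) (for example, the test_statistic sketch above combined with labels from Algorithm 1).

```python
def candidate_order(K_cand):
    """Candidate pairs (k_s, k_r), ordered by k_s + k_r with ties broken by k_s (Table 1)."""
    pairs = [(ks, kr) for ks in range(1, K_cand + 1) for kr in range(1, K_cand + 1)]
    return sorted(pairs, key=lambda p: (p[0] + p[1], p[0]))

def mldigof(A_layers, T_hat, t_n, K_cand):
    """Sketch of Algorithm 2 (MLDiGoF): return the first pair whose statistic drops below t_n.

    T_hat(A_layers, k_s, k_r) is an assumed callable returning the statistic of Equation (3).
    """
    order = candidate_order(K_cand)
    for ks, kr in order:
        if T_hat(A_layers, ks, kr) < t_n:
            return ks, kr
    return order[-1]   # no candidate accepted: fall back to the largest pair (K_cand, K_cand)
```

With the paper's defaults, one would call this with $t_n = n^{-1/5}$ and $K_{\mathrm{cand}} = \lfloor \sqrt{n / \log n} \rfloor$.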
Remark 2. The specification of $K_{\mathrm{cand}}$, the maximum candidate number of communities, follows naturally from the asymptotic regime imposed by Assumption 3. This assumption, requiring $K_{\max}^2 L \log n / n \to 0$, provides an implicit upper bound for the true community numbers. A theoretically coherent and computationally feasible choice is therefore $K_{\mathrm{cand}} = \lfloor \sqrt{n / \log n} \rfloor$. This selection guarantees that, with probability tending to one, the true pair $(K_s, K_r)$ resides within the candidate set $\{(k_s, k_r) : 1 \le k_s, k_r \le K_{\mathrm{cand}}\}$ for all sufficiently large $n$. Consequently, the sequential testing procedure remains consistent. Furthermore, it bounds the total search space by $O(n / \log n)$ candidate pairs, ensuring the algorithm's practicality without compromising its theoretical foundations.

The following theorem guarantees that the sequential procedure achieves joint consistency in recovering the true community numbers under the ML-ScBM, provided the threshold sequence $t_n$ satisfies conditions that balance the convergence rate under the null and the divergence rate under underfitting.

Theorem 3 (Consistency of the MLDiGoF algorithm). Let the multi-layer directed network be generated from the ML-ScBM with true parameters $(K_s, K_r, g_s, g_r, \{B^{(\ell)}\}_{\ell=1}^{L})$. Assume:

1. Assumptions 1-4 hold.

2. Conditions (A1) and (A2) of Theorem 2 hold.

Define $\alpha_n := \sqrt{\frac{K_{\max}^2 L \log n}{n}}$ and $\beta_n := \frac{\sqrt{nL}}{K_{\max}^{3/2}}$. Let $\{t_n\}_{n \ge 1}$ be a sequence of positive thresholds satisfying:

(C1) $\alpha_n = o(t_n)$ as $n \to \infty$.

(C2) $t_n = o(\beta_n)$ as $n \to \infty$.

Then the output $(\hat{K}_s, \hat{K}_r)$ of Algorithm 2 satisfies $\lim_{n \to \infty} \mathbb{P}\big((\hat{K}_s, \hat{K}_r) = (K_s, K_r)\big) = 1$.

Remark 3 (Interpretation of conditions). Conditions (C1) and (C2) are necessary for MLDiGoF's estimation consistency.

• Condition (C1) requires that the threshold $t_n$ decay more slowly than the estimation error rate $\alpha_n = \sqrt{K_{\max}^2 L \log n / n}$. This ensures that the difference between the practical test statistic $\hat{T}_n$ and the oracle statistic $T_n$ (which is of order $O_P(\alpha_n)$) is asymptotically negligible compared to $t_n$ under the true model.

• Condition (C2) requires that $t_n$ grow more slowly than the divergence rate $\beta_n = \sqrt{nL} / K_{\max}^{3/2}$ under underfitting. This guarantees that $\hat{T}_n$ will eventually exceed $t_n$ for any underfitted candidate. Together, conditions (C1) and (C2) ensure that the threshold sequence asymptotically separates the true model (where $\hat{T}_n$ is small) from underfitted models (where $\hat{T}_n$ is large).

• In practical scenarios, the number of communities $K_{\max}$ and the number of layers $L$ are typically bounded or grow very slowly with $n$. In many applications, one may treat $K_{\max}$ and $L$ as constants (i.e., $O(1)$) for asymptotic analysis. Under this common setting, we have
\[
\alpha_n = O\Big(\sqrt{\tfrac{\log n}{n}}\Big) = o(1), \qquad \beta_n = O(\sqrt{n}) \to \infty.
\]
Condition (C1) then becomes $t_n \gg \sqrt{\log n / n}$, and condition (C2) becomes $t_n \ll \sqrt{n}$. A simple and convenient choice satisfying both is $t_n = n^{-\varepsilon}$ for any $\varepsilon \in (0, 1/2)$. Indeed, for any $\varepsilon \in (0, 1/2)$, we have $\sqrt{\log n / n} = o(n^{-\varepsilon})$ and $n^{-\varepsilon} = o(\sqrt{n})$. Therefore, under the typical assumption that $K_{\max}$ and $L$ are bounded, the default choice $t_n = n^{-1/5}$ (i.e., $\varepsilon = 1/5$) in Algorithm 2 satisfies both (C1) and (C2).
This choice provides a practical balance: it decays slowly enough to avoid premature stopping under the true model, yet shrinks to zero sufficiently fast to ensure that underfitted models are eventually rejected.

Theorem 3 provides the formal guarantee that our MLDiGoF procedure is consistent—meaning it recovers the true sender and receiver community counts with probability tending to one as the network size grows. This result is crucial because it transforms the conceptually appealing sequential testing idea into a rigorously justified estimation tool. The proof carefully balances two competing rates: the convergence speed of the test statistic under the true model and its divergence rate under underfitted models. By choosing a threshold sequence that lies between these rates, the algorithm avoids two key pitfalls—premature stopping at an underfitted model and failure to stop at the true one. In practice, this theorem assures users that, under the stated regularity conditions, the method will not be misled by finite-sample fluctuations and will asymptotically locate the correct pair $(K_s, K_r)$. Thus, the theorem not only establishes theoretical credibility but also provides clear guidance for implementing the algorithm in applications where the true community structure is unknown.

4.2. A ratio-based variant: the MLRDiGoF algorithm

Building on the established dichotomy of the test statistic, we now introduce a ratio-based estimator that identifies the transition point directly within the sequence of candidate models. By examining the ratio of successive test statistics, this method detects a clear peak corresponding to the point where the model first captures the true community structure. We refer to this alternative approach as MLRDiGoF, which provides a different operational perspective rooted in the same theoretical framework. Formally, we define the ratio statistic and present the complete algorithm as follows.

For each $m \in \{1, 2, \ldots, M\}$, let $\hat{T}_n(m)$ denote the statistic $\hat{T}_n$ computed for the $m$-th candidate pair in $\mathcal{P}$ using Equation (3). Define the ratio statistic for the $m$-th candidate pair as
\[
r_m = \Big| \frac{\hat{T}_n(m - 1)}{\hat{T}_n(m)} \Big|, \qquad m = 2, 3, \ldots, M, \qquad (4)
\]
where the absolute value addresses the potentially negative value of $\hat{T}_n$ at the true model $(K_s, K_r)$. Theorems 1 and 2 guarantee that under the true model $(K_s, K_r)$ (which corresponds to a specific position $m^*$ in the ordered candidate sequence $\mathcal{P}$), the upper bound of $\hat{T}_n(m^*)$ converges to zero with high probability, while $\hat{T}_n(m^* - 1)$ diverges to infinity under underfitting when $m^* > 1$. Consequently, we expect $r_{m^*} = |\hat{T}_n(m^* - 1) / \hat{T}_n(m^*)|$ to be the first significant peak (transition point) in the ratio sequence $\{r_m\}_{m=2}^{M}$. We refer to this method as multi-layer Ratio-DiGoF (MLRDiGoF for short), which identifies the first peak in the sequence $\{(m, r_m)\}_{m=2}^{M}$.
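A minimal sketch of this ratio rule, under the same assumptions as the earlier sketches (a hypothetical callable T_hat returning $\hat{T}_n$ for a candidate pair), is given below; it mirrors the structure of Algorithm 3 stated next.

```python
def mlrdigof(A_layers, T_hat, t_n, tau_n, K_cand):
    """Sketch of the ratio-based rule: stop at the first ratio r_m exceeding tau_n."""
    pairs = sorted(
        [(ks, kr) for ks in range(1, K_cand + 1) for kr in range(1, K_cand + 1)],
        key=lambda p: (p[0] + p[1], p[0]),           # same candidate order as Table 1
    )
    T_prev = T_hat(A_layers, *pairs[0])
    if T_prev < t_n:                                 # candidate (1, 1) already fits
        return pairs[0]
    for m in range(1, len(pairs)):
        T_curr = T_hat(A_layers, *pairs[m])
        r_m = abs(T_prev / T_curr)                   # ratio statistic of Equation (4)
        if r_m > tau_n:                              # first pronounced peak: transition point
            return pairs[m]
        T_prev = T_curr
    return pairs[-1]                                 # fallback: largest candidate
```

The default thresholds of the paper correspond to $t_n = n^{-1/5}$ and $\tau_n = 8 \log n$.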
The complete MLRDiGoF algorithm is summarized in Algorithm 3.

Algorithm 3 MLRDiGoF
Require: Multi-layer adjacency matrices $\{A^{(\ell)}\}_{\ell=1}^{L}$, thresholds $t_n > 0$ (default: $n^{-1/5}$) and $\tau_n > 0$ (default: $8 \log n$), maximum candidate number $K_{\mathrm{cand}}$ (default: $\lfloor \sqrt{n / \log n} \rfloor$), where $n$ is the number of nodes
Ensure: Estimated community numbers $(\hat{K}_s, \hat{K}_r)$
1: Generate the candidate sequence $\mathcal{P} = \{(k_s, k_r)\}_{m=1}^{M}$ with $M = K_{\mathrm{cand}}^2$
2: Compute $\hat{T}_n(1)$ for candidate $(1, 1)$ via Equation (3)
3: if $\hat{T}_n(1) < t_n$ then
4:   return $(\hat{K}_s, \hat{K}_r) = (1, 1)$
5: end if
6: for $m = 2$ to $M$ do
7:   Compute the ratio statistic $r_m$ via Equation (4)
8:   if $r_m > \tau_n$ then
9:     return $(\hat{K}_s, \hat{K}_r) = \mathcal{P}(m)$
10:  end if
11: end for
12: return $(\hat{K}_s, \hat{K}_r) = \mathcal{P}(M)$   ▷ If no candidate satisfies $r_m > \tau_n$, return the largest candidate

The following theoretical results establish the asymptotic properties and estimation consistency of MLRDiGoF. Theorem 4 characterizes the dichotomous behavior of the ratio statistic $r_m$: it diverges at the true model due to the sharp transition of the underlying test statistic, while remaining uniformly bounded for underfitted models. This result, analogous to the asymptotic behavior of the test statistic established in Theorems 1 and 2, provides the theoretical foundation for distinguishing the true community structure. Building upon this, Theorem 5 establishes the estimation consistency of the MLRDiGoF algorithm, showing that it correctly identifies the true pair $(K_s, K_r)$ with probability tending to one, analogous to Theorem 3.

Theorem 4 (Asymptotic behavior of $r_m$). Under Assumptions 1-4 and condition (A1) of Theorem 2, with $K_s$ and $K_r$ fixed (independent of $n$), let $\{\hat{T}_n(m)\}_{m=1}^{M}$ be the sequence of test statistics computed via Equation (3) for the lexicographically ordered candidate pairs $\mathcal{P} = \{(k_s^{(m)}, k_r^{(m)})\}_{m=1}^{M}$ with $M = K_{\mathrm{cand}}^2$ and $K_{\mathrm{cand}} \ge \max(K_s, K_r)$. Let $m^*$ be the index of the true pair $(K_s, K_r)$ in $\mathcal{P}$, and define the ratio statistic for $m = 2, \ldots, M$ as in Equation (4). Then:

1. (Divergence at the true model) For any fixed $M_0 > 0$, $\lim_{n \to \infty} \mathbb{P}(r_{m^*} > M_0) = 1$.

2. (Boundedness under underfitting) There exists a constant $C > 0$ (depending only on $\delta$, $c_0$, $\eta$ from Assumptions 1, 2 and condition (A1), but not on $n$, $L$, or the candidate index) such that for every $m < m^*$, $\lim_{n \to \infty} \mathbb{P}(r_m > C) = 0$.

Theorem 5 (Consistency of the MLRDiGoF algorithm). Assume the conditions of Theorem 4 hold, i.e., Assumptions 1-4 and condition (A1) of Theorem 2 hold, with $K_s$ and $K_r$ fixed (independent of $n$). Let $\tau_n$ be the threshold used in Algorithm 3, and suppose $\tau_n$ satisfies the following two conditions:

(D1) There exist a constant $C_0 > 0$ and $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, $\tau_n > C_0$, where $C_0$ is any constant greater than the constant $C$ from part 2 of Theorem 4 (i.e., $C_0 > C$).

(D2) $\tau_n = o\big(\sqrt{n / \log n}\big)$.

Then the output $(\hat{K}_s, \hat{K}_r)$ of Algorithm 3 satisfies $\lim_{n \to \infty} \mathbb{P}\big((\hat{K}_s, \hat{K}_r) = (K_s, K_r)\big) = 1$.

Remark 4 (Interpretation of conditions). Conditions (D1) and (D2) are minimal requirements that together guarantee that the threshold sequence $\tau_n$ separates underfitted models from the true model.

• Condition (D1) demands that $\tau_n$ eventually exceed the uniform bound $C$ established in Theorem 4 for underfitted ratios.
This ensures that, with high probability, no underfitted candidate produces a ratio $r_m$ larger than $\tau_n$, thereby preventing false early stops. The strict inequality $C_0 > C$ avoids the degenerate situation where $\tau_n$ exactly equals the bound, which could lead to unstable stopping behavior in finite samples.

• Condition (D2) requires that $\tau_n$ grow more slowly than $\sqrt{n / \log n}$. This is because the ratio at the true model satisfies $r_{m^*} \gtrsim n / \sqrt{\log n}$ asymptotically (see the proof of Theorem 5). If $\tau_n$ grew as fast as or faster than this rate, it could dominate $r_{m^*}$ and prevent the algorithm from stopping at the true pair. Condition (D2) therefore guarantees that the exploding signal at the true model eventually surpasses the threshold.

• The default choice $\tau_n = 8 \log n$ satisfies both conditions. Since $\log n \to \infty$, it eventually exceeds any fixed constant $C$ (satisfying condition (D1)). Moreover, $\log n = o\big(\sqrt{n / \log n}\big)$, so condition (D2) holds. This choice is simple, practical, and meets the theoretical requirements across a wide range of network sizes.

Theorem 5 provides the consistency guarantee for the ratio-based MLRDiGoF algorithm under the assumption of a fixed number of communities. It confirms that the sharp transition in the ratio statistic—bounded for underfitted models and divergent at the true model—serves as a reliable selection criterion. While analogous to Theorem 3 in establishing consistency, Theorem 5 validates a distinct, intuitive approach that detects a relative peak rather than an absolute threshold, offering a complementary perspective for model selection in the common scenario where the community counts are small relative to the network size.

Remark 5. The assumption that $K_{\max}$ is fixed (i.e., does not grow with $n$) in Theorems 4 and 5 is made to preserve the analytical sharpness of the ratio-based procedure. In the proofs of Theorems 4 and 5, the uniform bound $C$ for underfitted ratios and the separation rate governing the divergence of $r_{m^*}$ both depend on constants derived from the model parameters (such as $\delta$, $c_0$, and $\eta$), which are independent of $K_{\max}$ only when $K_{\max}$ is fixed. If $K_{\max}$ were allowed to increase with $n$, these key quantities would become functions of $n$, intricately coupling the growth rates of the signal and noise terms in the ratio statistic. Consequently, the clean phase transition at the true model—where $r_m$ stays bounded for all underfitted models and explodes only at the true pair—would be obscured. Moreover, choosing a single threshold sequence $\tau_n$ that robustly separates these two regimes across all candidate pairs would require balancing multiple orders of growth, making the theory unnecessarily complex. Fixing $K_{\max}$ isolates the core logic of the ratio test and delivers a transparent consistency guarantee, which matches the typical practical setting where the numbers of asymmetric communities are small relative to the network size.

Both the MLDiGoF and MLRDiGoF procedures provide consistent estimators for the asymmetric community numbers under their respective theoretical conditions, as established in Theorems 3 and 5. However, the ratio-based MLRDiGoF tends to be more robust in practice. This robustness stems from a fundamental difference in the underlying detection logic.
MLDiGoF relies on a level-crossing rule: it selects the model where the goodness-of-fit statistic $\hat{T}_n$ first falls below a vanishing threshold $t_n$. This decision is sensitive to the precise finite-sample value of $\hat{T}_n$, which is itself an estimate perturbed from its oracle counterpart by an error of order $\|\hat{R} - R\|$. Consequently, any systematic estimation bias can shift the crossing point. In contrast, MLRDiGoF implements a change-point detection strategy. It monitors the ratio statistic $r_m = |\hat{T}_n(m - 1) / \hat{T}_n(m)|$, seeking a pronounced peak. Theorem 4 justifies this approach by showing that $r_m$ remains stochastically bounded for every underfitted model, yet diverges at the true model. Thus, instead of judging a noisy statistic against a decaying benchmark, MLRDiGoF identifies a clear structural break in the sequence of fits—a criterion that is invariant to any common bias affecting all $\hat{T}_n(m)$ similarly. For the common setting where the numbers of asymmetric communities are small, detecting this structural break often provides a more stable empirical criterion than judging the absolute level of a noisy statistic against a vanishing threshold.

5. Numerical Experiments

In this section, we evaluate the performance of the MLDiGoF and MLRDiGoF algorithms through simulations under the multi-layer stochastic co-block model and one real data example. We generate multi-layer directed networks from the ML-ScBM defined in Definition 1 with the following specifications. For all simulations, the sender (and receiver) community assignments are generated by letting each node belong to each sender (receiver) community with equal probability. For each layer $\ell = 1, \ldots, L$, we generate a completely independent block probability matrix $B^{(\ell)} \in [0, 1]^{K_s \times K_r}$ as follows. First, we generate layer-specific parameters independently for each layer: a diagonal strength parameter $\alpha^{(\ell)} \sim \mathrm{Uniform}[0.6, 0.8]$; an off-diagonal base strength parameter $\beta^{(\ell)} \sim \mathrm{Uniform}[0.1, 0.3]$; and an additional medium strength parameter $\gamma^{(\ell)} \sim \mathrm{Uniform}[0.4, 0.6]$. Then, we define the base matrix $H_0^{(\ell)} \in \mathbb{R}^{K_s \times K_r}$ by
\[
H_0^{(\ell)}(k, l) =
\begin{cases}
\alpha^{(\ell)}, & \text{if } k = l, \\
\gamma^{(\ell)}, & \text{if } l = \big( (k + K_s - 1) \bmod K_r \big) + 1, \\
\beta^{(\ell)}, & \text{otherwise},
\end{cases}
\]
where $\bmod$ denotes the modulo operation. This construction guarantees that, besides the diagonal, each sender community has a designated receiver community (cyclically shifted by $K_s$) that receives edges with medium probability, thereby creating distinct connectivity profiles for different receiver communities and ensuring that both the sender and receiver versions of condition (A1) are satisfied with high probability. To preserve asymmetry even when $K_s = K_r$, we add an asymmetric perturbation. Define a matrix $H_1^{(\ell)} \in \mathbb{R}^{K_s \times K_r}$ whose entries are independent draws from $\mathrm{Uniform}[-0.1, 0.1]$. We then form $\widetilde{B}^{(\ell)}(k, l) = H_0^{(\ell)}(k, l) + H_1^{(\ell)}(k, l)$ and clip every entry of $\widetilde{B}^{(\ell)}$ to the interval $[0, 1]$. Finally, let $\rho \in (0, 1)$ be a global sparsity parameter that controls the overall edge density across all layers. We scale the matrices by $\rho$ to obtain the block probability matrices $B^{(\ell)}(k, l) = \rho \cdot \widetilde{B}^{(\ell)}(k, l)$ for $\ell = 1, \ldots, L$. We consider $\rho \in \{0.05, 0.1, 0.2, 0.3, 0.4\}$ to examine networks with varying edge densities, from very sparse ($\rho = 0.05$) to moderately dense ($\rho = 0.4$). Then, for each layer $\ell = 1, \ldots, L$, we generate the adjacency matrix $A^{(\ell)}$ with independent entries according to Definition 1.
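A minimal Python/NumPy sketch of this simulation design is given below. It mirrors the construction just described (base matrix $H_0^{(\ell)}$, asymmetric perturbation, clipping, sparsity scaling by $\rho$, and Bernoulli sampling with a zero diagonal); the function name simulate_mlscbm, the random seed, and the 0-based indexing are our own illustrative choices rather than the authors' code.

```python
import numpy as np

def simulate_mlscbm(n, L, K_s, K_r, rho, seed=0):
    """Sketch of the Section 5 simulation design; returns (A_layers, g_s, g_r)."""
    rng = np.random.default_rng(seed)
    g_s = rng.integers(0, K_s, size=n)           # sender labels, uniform over communities
    g_r = rng.integers(0, K_r, size=n)           # receiver labels, uniform over communities

    A_layers = []
    for _ in range(L):
        alpha = rng.uniform(0.6, 0.8)            # diagonal strength alpha^(l)
        beta = rng.uniform(0.1, 0.3)             # off-diagonal base strength beta^(l)
        gamma = rng.uniform(0.4, 0.6)            # medium strength gamma^(l)

        H0 = np.full((K_s, K_r), beta)
        for k in range(K_s):
            if k < K_r:
                H0[k, k] = alpha                 # diagonal entries of H_0^(l)
            H0[k, (k + K_s) % K_r] = gamma       # 0-based version of l = ((k + K_s - 1) mod K_r) + 1
        H1 = rng.uniform(-0.1, 0.1, size=(K_s, K_r))      # asymmetric perturbation H_1^(l)
        B = rho * np.clip(H0 + H1, 0.0, 1.0)     # clip to [0, 1], then scale by sparsity rho

        Omega = B[g_s][:, g_r]                   # Omega^(l)(i, j) = B^(l)(g_s(i), g_r(j))
        A = (rng.random((n, n)) < Omega).astype(int)
        np.fill_diagonal(A, 0)                   # no self-loops
        A_layers.append(A)
    return A_layers, g_s, g_r
```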
For all experiments, we use the following evaluation metric and default settings. The primary evaluation metric is the accuracy, defined as the proportion of Monte Carlo replications in which the algorithm correctly estimates both community numbers:
\[
\mathrm{Accuracy} = \frac{\text{Number of replications with } (\hat{K}_s, \hat{K}_r) = (K_s, K_r)}{\text{Total number of replications}}.
\]

5.1. Experiment 1: Behavior of the test statistic $\hat{T}_n$ under the null and alternative hypotheses

This experiment systematically verifies the theoretical properties of the test statistic $\hat{T}_n$ under a comprehensive set of hypothesis testing scenarios, as stated in Theorems 1 and 2. We fix the number of layers $L = 20$ and the global sparsity parameter $\rho = 0.2$. We consider the true asymmetric community structure $(K_s, K_r) = (3, 5)$, which provides a representative case of moderate asymmetry with $K_s < K_r$. To thoroughly validate the theoretical predictions, we examine the following four hypothesis testing scenarios, covering all underfitting cases implied by Theorem 2:

• Null hypothesis $H_0$ (correct specification): the candidate pair equals the true structure, $(K_{s0}, K_{r0}) = (3, 5)$.

• Alternative $H_1$, sender-only underfitting: the candidate has insufficient sender communities but the correct number of receiver communities, $(K_{s0}, K_{r0}) = (2, 5)$.

• Alternative $H_1$, receiver-only underfitting: the candidate has insufficient receiver communities but the correct number of sender communities, $(K_{s0}, K_{r0}) = (3, 4)$.

• Alternative $H_1$, both-side underfitting: the candidate has insufficient sender and receiver communities, $(K_{s0}, K_{r0}) = (2, 4)$.

We vary the network size $n \in \{200, 400, 600, 800, 1000\}$. For each $n$ and each hypothesis scenario, we generate 200 independent networks and report the mean and standard deviation of $\hat{T}_n$ over the 200 replications.

The results, detailed in Table 2, strongly support Theorems 1 and 2. Under the correctly specified null hypothesis $(K_s, K_r) = (3, 5)$, the absolute value of the test statistic $\hat{T}_n$ converges to zero as $n$ increases, with standard deviations also decreasing. Under all three underfitting alternatives, $\hat{T}_n$ diverges: the mean values increase monotonically with $n$. The divergence is most pronounced for both-side underfitting, followed by sender-only underfitting, while receiver-only underfitting exhibits a slower divergence. These results validate Theorem 2, demonstrating that $\hat{T}_n$ diverges under any form of underfitting, ensuring a clear separation from the null.

Table 2: Behavior of the test statistic $\hat{T}_n$ under various hypotheses for the true structure $(K_s, K_r) = (3, 5)$. Values are mean (standard deviation) over 200 replications.

    n    $H_0$: (3,5)      $H_1$: (2,5)     $H_1$: (3,4)     $H_1$: (2,4)
  200    -0.014 (0.024)    3.259 (0.229)    0.165 (0.112)    3.324 (0.277)
  400    -0.012 (0.014)    5.270 (0.339)    0.626 (0.357)    5.458 (0.408)
  600    -0.008 (0.010)    6.900 (0.437)    0.893 (0.262)    7.187 (0.489)
  800    -0.007 (0.010)    8.282 (0.516)    1.247 (0.322)    8.629 (0.546)
 1000    -0.007 (0.006)    9.438 (0.627)    1.493 (0.384)    9.769 (0.709)
5.2. Experiment 2: Statistical discrimination power and robustness analysis

This experiment evaluates the statistical discrimination power of the test statistic $\hat{T}_n$ across a comprehensive set of asymmetric community structures. The primary goal is to verify that this statistic discriminates perfectly between correctly specified models and underfitted models in practical settings. We fix the network size $n = 800$, the number of layers $L = 15$, and the global sparsity parameter $\rho = 0.2$. We consider eight true asymmetric community structures covering various asymmetry patterns:
\[
(K_s, K_r) \in \{(2, 3), (2, 4), (3, 2), (3, 4), (3, 5), (4, 3), (4, 5), (5, 4)\},
\]
which include the cases $K_s < K_r$, $K_s > K_r$, and $|K_s - K_r| > 1$. For each true structure, we evaluate $\hat{T}_n$ under four scenarios:

1. Correct specification ($H_0$): the candidate pair equals the true structure.

2. Sender-only underfitting ($H_1^{(s)}$): the candidate has one fewer sender community.

3. Receiver-only underfitting ($H_1^{(r)}$): the candidate has one fewer receiver community.

4. Both-side underfitting ($H_1^{(b)}$): the candidate has one fewer sender community and one fewer receiver community.

For each configuration (true community structure × hypothesis scenario), we generate 200 independent networks. The evaluation of the test statistic $\hat{T}_n$ follows the decision rule of the MLDiGoF algorithm:

• For the $\hat{T}_n$ statistic (used in MLDiGoF), we compute $\hat{T}_n$ for the specified candidate pair $(K_{s0}, K_{r0})$ via Equation (3), using Algorithm 1 for community estimation. The decision is correct if either: under $H_0$ (correct specification), $\hat{T}_n < t_n$, where $t_n = n^{-1/5} \approx 0.26$ for $n = 800$; or under $H_1$ (any underfitting), $\hat{T}_n \ge t_n$.

We record the empirical probability of correct decisions over the 200 replications, along with its standard error.

Table 3 presents the discrimination performance. Across all eight asymmetric community structures and all three underfitting types, $\hat{T}_n$ achieves perfect discrimination, with empirical probabilities of 1.00. This provides strong evidence that:

• Under correct specification, $\hat{T}_n$ consistently falls below the threshold $t_n = n^{-1/5}$.

• Under any form of underfitting (sender-only, receiver-only, or both), $\hat{T}_n$ consistently exceeds $t_n$.

• The discrimination power is robust to the direction and degree of asymmetry between the sender and receiver communities.

Thus, $\hat{T}_n$ perfectly distinguishes between correctly specified and underfitted models in these settings.

Table 3: Discrimination power of the $\hat{T}_n$ statistic. Values are empirical probabilities (standard errors) over 200 replications.
T rue ( K s , K r ) Hypothesized ( K s 0 , K r 0 ) Underfitting typ e P ( ˆ T n < t n ) for H 0 P ( ˆ T n ≥ t n ) fo r H 1 (2 , 3) (2 , 3) T rue m odel ( H 0 ) 1.00 (0.0 00) – (1 , 3) Sender only ( H ( s ) 1 ) – 1.00 ( 0 .000) (2 , 2) Receiv er only ( H ( r ) 1 ) – 1.00 ( 0 .000) (1 , 2) Both ( H ( b ) 1 ) – 1.00 ( 0 .000) (2 , 4) (2 , 4) T rue m odel ( H 0 ) 1.00 (0.0 00) – (1 , 4) Sender only ( H ( s ) 1 ) – 1.00 ( 0 .000) (2 , 3) Receiv er only ( H ( r ) 1 ) – 1.00 ( 0 .000) (1 , 3) Both ( H ( b ) 1 ) – 1.00 ( 0 .000) (3 , 2) (3 , 2) T rue m odel ( H 0 ) 1.00 (0.0 00) – (2 , 2) Sender only ( H ( s ) 1 ) – 1.00 ( 0 .000) (3 , 1) Receiv er only ( H ( r ) 1 ) – 1.00 ( 0 .000) (2 , 1) Both ( H ( b ) 1 ) – 1.00 ( 0 .000) (3 , 4) (3 , 4) T rue m odel ( H 0 ) 1.00 (0.0 00) – (2 , 4) Sender only ( H ( s ) 1 ) – 1.00 ( 0 .000) (3 , 3) Receiv er only ( H ( r ) 1 ) – 1.00 ( 0 .000) (2 , 3) Both ( H ( b ) 1 ) – 1.00 ( 0 .000) (3 , 5) (3 , 5) T rue m odel ( H 0 ) 1.00 (0.0 00) – (2 , 5) Sender only ( H ( s ) 1 ) – 1.00 ( 0 .000) (3 , 4) Receiv er only ( H ( r ) 1 ) – 1.00 ( 0 .000) (2 , 4) Both ( H ( b ) 1 ) – 1.00 ( 0 .000) (4 , 3) (4 , 3) T rue m odel ( H 0 ) 1.00 (0.0 00) – (3 , 3) Sender only ( H ( s ) 1 ) – 1.00 ( 0 .000) (4 , 2) Receiv er only ( H ( r ) 1 ) – 1.00 ( 0 .000) (3 , 2) Both ( H ( b ) 1 ) – 1.00 ( 0 .000) (4 , 5) (4 , 5) T rue m odel ( H 0 ) 1.00 (0.0 00) – (3 , 5) Sender only ( H ( s ) 1 ) – 1.00 ( 0 .000) (4 , 4) Receiv er only ( H ( r ) 1 ) – 1.00 ( 0 .000) (3 , 4) Both ( H ( b ) 1 ) – 1.00 ( 0 .000) (5 , 4) (5 , 4) T rue m odel ( H 0 ) 1.00 (0.0 00) – (4 , 4) Sender only ( H ( s ) 1 ) – 1.00 ( 0 .000) (5 , 3) Receiv er only ( H ( r ) 1 ) – 1.00 ( 0 .000) (4 , 3) Both ( H ( b ) 1 ) – 1.00 ( 0 .000) 5.3. E x periment 3: Estimation accuracy und er varied network sizes and spa rsity levels This experimen t ev aluates the estimation accu racy of MLDiGoF and MLRDiGoF across a wide ran ge of net- work sizes, glob al spar sity levels, an d asymmetric commun ity structures. W e set L = 15, network sizes n ∈ { 200 , 400 , 600 , 800 , 1000 } , global sparsity parameters ρ ∈ { 0 . 1 , 0 . 2 , 0 . 3 , 0 . 4 , 0 . 5 } , an d sev eral tru e asymmetric com- munity structures: ( K s , K r ) ∈ { (1 , 1) , (1 , 3) , (2 , 2) , (2 , 3) , (2 , 4) , (3 , 4) , (3 , 5) , (4 , 4) , (4 , 5) } . For eac h co mbinatio n of parameters, we generate 20 0 indep endent networks and run both MLDiGoF a n d MLRDiGoF with their d efault set- tings. For each rep lication, we record th e estimated co mmunity numb ers ( ˆ K s , ˆ K r ) an d compu te the accuracy as defined earlier . The accuracy results are sh own in T able 4 . Se veral key tren ds em erge: • F or fixed sparsity ρ and true commun ity structure ( K s , K r ), accur a cy improves mono tonically with network size n , c o nfirming the co nsistency r e sults in The o rems 3 an d 5 . 15 • F or fixed n and ( K s , K r ), accuracy improves with increasing global sparsity ρ . The improvement is particularly dramatic for co mplex asymmetric stru ctures. For instance, for ( K s , K r ) = (3 , 5) with n = 600, a c c uracy jumps from 0 .28 at ρ = 0 . 1 to 0 . 95 at ρ = 0 . 2 and 1.0 0 at ρ ≥ 0 . 3 for the MLDiGoF method. • Structures with larger total comm unities K s + K r are inh erently mo re challeng ing to estimate. F or example, at n = 600 and ρ = 0 . 2, the accu racy for the r elativ ely simple structure ( 2 , 3) is 1.00, while fo r the more comp lex (3 , 5) it is 0.95 for ML DiGoF an d 1 .00 fo r ML RDiGoF . 
• MLRDiGoF generally matches or slightly outperforms MLDiGoF, particularly in sparse and challenging settings. For (3, 5) with n = 600 and ρ = 0.2, MLRDiGoF achieves an accuracy of 1.00 compared with 0.95 for MLDiGoF. For (2, 4) with n = 200 and ρ = 0.3, the MLRDiGoF accuracy is 0.95 while that of MLDiGoF is 0.68, highlighting the robustness of the ratio-based approach.

Overall, both algorithms demonstrate consistent estimation performance across diverse asymmetric community structures, with accuracy approaching 1 as either n or ρ increases, confirming the consistency theorems.

5.4. Experiment 4: Sensitivity to threshold parameters

This experiment examines the sensitivity of MLDiGoF to the decay parameter ε in t_n = n^{−ε}, and of MLRDiGoF to the threshold τ_n. We fix the network size n = 800, the number of layers L = 15, the global sparsity parameter ρ = 0.2, and the true asymmetric community structure (K_s, K_r) = (3, 5). For MLDiGoF, we vary ε from 0.1 to 1.0 in steps of 0.1. For each ε, we generate 200 independent networks, run MLDiGoF with threshold t_n = n^{−ε}, and record the accuracy. For MLRDiGoF, we consider two types of threshold sequences: constant thresholds τ ∈ {2, 4, 6, 8, 10, 12, 14, 16, 18, 20} and growing thresholds τ_n = a log n with a ∈ {0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0}. For each threshold setting, we generate 200 independent networks, run MLRDiGoF, and record the accuracy.

The sensitivity analysis results are presented in Figure 1. MLDiGoF maintains high accuracy (≥ 0.9) for decay parameters ε in the range (0.05, 0.70]; its accuracy begins to decline for ε ≥ 0.70. This pattern is consistent with Remark 3, which requires 0 < ε < 0.5 for the threshold conditions (C1) and (C2) to hold. The default choice ε = 0.2 yields almost perfect accuracy in this experiment. For MLRDiGoF with constant thresholds τ, accuracy is poor for τ ≤ 6 but improves as τ increases. With growing thresholds τ_n = a log n, accuracy exceeds 0.9 for a ≥ 4. The algorithm's default τ_n = 8 log n (corresponding to a = 8) also performs almost perfectly, as expected. These results confirm Theorem 5, which requires τ_n to exceed a certain constant (Condition D1) and to grow more slowly than √(n / log n) (Condition D2); both constant thresholds τ ≥ 8 and growing thresholds with a ≥ 8 satisfy these conditions.

Table 4: Estimation accuracy of MLDiGoF and MLRDiGoF under selected challenging asymmetric settings. Values are accuracy over 100 replications.
(K_s, K_r) n MLDiGoF MLRDiGoF ρ = 0.1 ρ = 0.2 ρ = 0.3 ρ = 0.4 ρ = 0.5 ρ = 0.1 ρ = 0.2 ρ = 0.3 ρ = 0.4 ρ = 0.
5 (1,1) 200 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 400 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 600 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 800 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 (1,3) 200 0.68 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 400 0.98 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 600 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 800 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 (2,2) 200 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 400 1.00 1.00 1.00 1.00 0.98 1.00 1.00 1.00 1.00 1.00 600 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 800 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 (2,3) 200 0.90 1.00 1.00 1.00 1.00 0.96 1.00 1.00 1.00 1.00 400 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 600 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 800 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 (2,4) 200 0.00 0.36 0.68 0.95 1.00 0.15 0.85 0.95 1.00 1.00 400 0.32 0.94 1.00 0.99 1.00 0.50 1.00 1.00 1.00 1.00 600 0.85 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 800 0.96 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 (3,4) 200 0.85 1.00 1.00 1.00 1.00 0.79 1.00 1.00 1.00 1.00 400 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 600 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 800 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 (3,5) 200 0.02 0.02 0.27 0.68 0.90 0.20 0.40 0.85 0.95 1.00 400 0.01 0.63 0.91 0.990 1.00 0.45 1.00 1.00 1.00 1.00 600 0.28 0.95 1.00 1.00 1.00 0.75 1.00 1.00 1.00 1.00 800 0.62 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.87 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 (4,4) 200 0.89 1.00 1.00 1.00 1.00 0.13 0.82 0.92 0.96 0.98 400 1.00 1.00 1.00 1.00 1.00 0.92 1.00 1.00 1.00 1.00 600 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 800 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 (4,5) 200 0.21 0.98 1.00 1.00 1.00 0.00 0.71 0.97 1.00 1.00 400 0.98 1.00 1.00 1.00 1.00 0.56 1.00 1.00 1.00 1.00 600 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 800 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 17 1 2 3 4 5 6 7 8 9 10 Parameter index (1 to 10) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Accuracy Experiment 4: Sensitivity to Threshold Parameters MLDiGoF: MLRDiGoF: (constant) MLRDiGoF: a (a log n) Fig. 1. Sensiti vity analysis of ML D iGoF and MLRDiGoF to threshold paramete rs ( n = 800, L = 15, ρ = 0 . 2, ( K s , K r ) = (3 , 5)). For each m ethod, the x -axis index 1 to 10 correspon ds to an increa sing sequence of the associate d threshol d paramet er: ε for ML DiGoF (blue circle s), constant τ for ML RDiGoF (red squares), and scale factor a for MLRDiGoF with τ n = a log n (yello w triangles). Accuracy is computed ove r independent replic ations. 5.5. R e al d a ta example W e consider a multi-layer d irected network built from the Food an d Ag riculture O rgan ization (F A O) Multiplex T rade Network, av ailable at https: //man liode domenico.com/data.php . In this d ataset, nodes re p resent coun- tries an d each layer c o rrespon ds to a specific agricu ltural prod uct. 
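For readers who wish to reproduce this construction, the sketch below builds a multi-layer directed adjacency tensor from raw trade records. The column names, the pandas-based input format, and the default threshold are illustrative assumptions only; the exact filtering used for the FAO network (year, number of products, and edge-value cutoff) is described in the next paragraph.

```python
import numpy as np
import pandas as pd

def build_layers(records, countries, products, threshold=100.0):
    """Build an (L, n, n) directed adjacency tensor from trade-flow records.

    records  : pandas DataFrame with (assumed) columns 'exporter', 'importer',
               'product', and 'value', one row per reported trade flow
    countries: list of country codes shared by all layers (the nodes)
    products : list of product names (one layer per product)
    threshold: minimal trade value for a directed edge exporter -> importer
    """
    idx = {c: i for i, c in enumerate(countries)}
    A = np.zeros((len(products), len(countries), len(countries)), dtype=np.int8)
    for l, p in enumerate(products):
        flows = records[(records["product"] == p) & (records["value"] >= threshold)]
        for row in flows.itertuples(index=False):
            i, j = idx.get(row.exporter), idx.get(row.importer)
            if i is not None and j is not None and i != j:
                A[l, i, j] = 1        # directed edge: exporter -> importer
    return A
```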
The data are for the year 2010 and record annual import and export values among countries. We select the 30 products with the largest total trade volume. For each selected product, we construct a layer by placing a directed edge from the exporter to the importer whenever the trade value between the two countries reaches or exceeds 100. The resulting network consists of 213 countries and 30 layers, covering a range of major food and agricultural commodities. This dataset provides a realistic multi-layer directed network for evaluating our community number estimation methods.

Figure 2 displays the test statistics for the FAO network with K_cand = 10, which is chosen to be larger than the default ⌊√(n / log n)⌋ ≈ 6 (for n = 213) in order to expand the search space. The left panel of Figure 2 shows that T̂_n(m) remains above 10 for all m, far exceeding the MLDiGoF threshold t_n = n^{−0.2} ≈ 0.3422. Consequently, MLDiGoF never encounters a candidate with T̂_n(m) < t_n and returns the largest candidate pair (K_cand, K_cand) = (10, 10). This indicates that the FAO data lack a sufficiently strong community signal to drive T̂_n below the prescribed decay threshold.

The right panel plots the ratio statistic r_m. Its global maximum is below 1.8, which is much smaller than the default threshold τ_n = 8 log n used in MLRDiGoF. Hence, with the default threshold, MLRDiGoF would also terminate at (10, 10). However, by detecting the peak of r_m, one can select a smaller threshold that stops at the global maximum located at m = 42, corresponding to the candidate pair (k_s, k_r) = (6, 4) from Table 1. This yields an estimated community structure of (6, 4) for the FAO network.

The contrast between the two algorithms highlights a fundamental advantage of the ratio-based approach. MLDiGoF relies on an absolute threshold t_n; its stopping decision is sensitive to the precise finite-sample value of T̂_n and fails when the test statistic never drops below that threshold. In contrast, MLRDiGoF monitors the relative changes in consecutive test statistics. Even when all T̂_n(m) are large, a sharp transition in the ratio sequence, appearing as a clear peak, can signal the move from underfitted to adequately specified models. This relative measure makes MLRDiGoF more robust in real-world settings where clear community structure may be absent. By detecting this peak rather than an absolute level, MLRDiGoF provides a meaningful estimate even when the absolute goodness-of-fit measure remains uniformly high.

Fig. 2. Test statistic T̂_n(m) and ratio statistic r_m for ordered candidate pairs 1 ≤ m ≤ 100 (i.e., K_cand = 10) for the FAO network. The red circle in the right panel marks the global maximum of r_m at m = 42, corresponding to the candidate pair (k_s, k_r) = (6, 4).
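The ratio-based scan described above can be summarized in a short sketch. This is not the exact MLRDiGoF implementation: the lexicographic ordering of candidate pairs, the small guard against division by zero, and the optional peak-detection mode (used here for the FAO network) are illustrative, the initial t_n check for the (1, 1) candidate in Algorithm 3 is omitted, and fit_omega and test_statistic (from the earlier sketch) are assumed to be supplied by the user.

```python
import numpy as np

def ratio_scan(A, K_cand, fit_omega, test_statistic, tau_n, use_peak=False):
    """Ratio-based selection of (K_s, K_r) over ordered candidate pairs.

    A             : (L, n, n) adjacency tensor
    K_cand        : largest candidate community number per side
    fit_omega     : callable (A, k_s, k_r) -> Omega_hat (community detection plus
                    block-probability estimation for the candidate pair)
    test_statistic: callable (A, Omega_hat) -> T_hat_n
    tau_n         : threshold for the ratio r_m (default in the paper: 8 log n)
    use_peak      : if True, return the candidate at the global maximum of r_m
    """
    pairs = [(ks, kr) for ks in range(1, K_cand + 1) for kr in range(1, K_cand + 1)]
    T = [test_statistic(A, fit_omega(A, ks, kr)) for ks, kr in pairs]
    # r_m = |T_hat(m-1) / T_hat(m)| for m = 2, ..., K_cand^2
    ratios = [abs(T[m - 1]) / max(abs(T[m]), 1e-12) for m in range(1, len(pairs))]
    if use_peak:                       # FAO-style: report the sharpest jump
        return pairs[int(np.argmax(ratios)) + 1]
    for m, r in enumerate(ratios, start=1):
        if r > tau_n:                  # first sharp transition: stop here
            return pairs[m]
    return pairs[-1]                   # no transition detected within the grid
```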
6. Conclusion

This paper addresses the fundamental challenge of jointly estimating the numbers of sender and receiver communities in multi-layer directed networks under the multi-layer stochastic co-block model. By introducing a novel goodness-of-fit test based on the largest singular value of a normalized residual matrix, we establish a sharp dichotomy: under the null model, the test statistic's upper bound converges to 0 with high probability, whereas under underfitting the statistic itself diverges to infinity. This theoretical insight enables the design of two efficient sequential testing algorithms, MLDiGoF and its ratio-based variant MLRDiGoF, which search lexicographically through candidate pairs of community numbers and terminate at the smallest adequate model. Both methods are proven to consistently recover the true asymmetric community counts. Extensive numerical experiments validate the efficacy and accuracy of the proposed methods, demonstrating their robustness across varying network sparsity levels and community structures. Real data applications further confirm the practical utility of the ratio-based approach in recovering meaningful asymmetric community structures in complex multi-layer directed networks.

The proposed framework can be extended in several meaningful directions. Promising model-based extensions include developing analogous goodness-of-fit tests for multi-layer degree-corrected ScBMs to account for degree heterogeneity [28, 39], for multi-layer bipartite networks [60] to accommodate different types of nodes, and for multi-layer mixed-membership ScBMs to allow for overlapping asymmetric communities [1, 41, 25, 49]. Methodologically, relaxing conditions such as balanced community sizes or perfect label recovery would enhance robustness. Further adaptations could address networks with weighted edges or dynamic settings where community memberships evolve smoothly across layers. From a computational perspective, accelerating the methodology via the efficient randomized algorithms developed in [14, 9] for large-scale multi-layer directed networks remains an important challenge for theory and practice.

CRediT authorship contribution statement

Huan Qing is the sole author of this article.

Declaration of competing interest

The author declares no competing interests.

Data availability

Data will be made available on request.

Appendix A. Behaviors of the test statistics

Throughout the appendix, we use the notation introduced in the main text: ∥·∥ denotes the spectral norm, ∥·∥_F the Frobenius norm, and σ_1(·) the largest singular value. For two sequences a_n, b_n we write a_n ≪ b_n when a_n = o(b_n). The symbols C and c denote generic positive constants whose values may change from line to line.

Appendix A.1. Proof of Lemma 1

Proof. By Assumption 1, we have
Σ_{ℓ=1}^{L} Ω^{(ℓ)}(i, j)(1 − Ω^{(ℓ)}(i, j)) ≥ Lδ(1 − δ),
which gives
|R(i, j)| ≤ L / √((n − 1)Lδ(1 − δ)) = √( L / ((n − 1)δ(1 − δ)) ) =: M_n.
The entries {R(i, j) : i, j} are mutually independent (because edges are independent across layers and node pairs), and the diagonal entries are zero. The non-symmetric version of Corollary 3.12 in [3] is helpful for our proof; we state it below.

Lemma 2 (Rectangular version of Corollary 3.12 in [3]). Let X be an n_1 × n_2 random matrix with independent entries X_{ij} satisfying |X_{ij}| ≤ σ̃_*. Define
σ̃ = max{ max_i √(Σ_j E[X_{ij}^2]), max_j √(Σ_i E[X_{ij}^2]) }.
Then there exists a universal constant C > 0 such that for any 0 < η ≤ 1/2 and t ≥ 0,
P( ∥X∥ ≥ (1 + η) 2σ̃ + t ) ≤ (n_1 ∧ n_2) exp( −t^2 / (C σ̃_*^2) ).

We apply Lemma 2 to the ideal residual matrix R, which is n × n.
Given that X j E [ R ( i , j ) 2 ] = X j , i 1 n − 1 = 1 , X i E [ R ( i , j ) 2 ] = X i , j 1 n − 1 = 1 , we have ˜ σ = max { 1 , 1 } = 1 . Choose η = ǫ / 4 with ǫ ≤ 2, we have (1 + η )2 ˜ σ = 2(1 + ǫ / 4) = 2 + ǫ / 2 . Set t = ǫ / 2. Then (1 + η )2 ˜ σ + t = 2 + ǫ / 2 + ǫ / 2 = 2 + ǫ . The entries o f R are bou nded b y ˜ σ ∗ = M n . Applying Lem ma 2 o btains P ( k R k ≥ 2 + ǫ ) ≤ n exp − t 2 C M 2 n ! = n exp − ǫ 2 4 C M 2 n ! . Recall M 2 n = L ( n − 1 ) δ (1 − δ ) . Let C ′ = 4 C δ (1 − δ ) . W e h ave P ( k R k ≥ 2 + ǫ ) ≤ n exp − ǫ 2 δ (1 − δ )( n − 1) 4 C L ! = n exp − ǫ 2 ( n − 1 ) C ′ L ! . (A.1) 20 By Assumption 3 , we have K 2 max L log n n → 0 . Since K max ≥ 1, this implies L log n n → 0 , an d co nsequen tly , n − 1 L log n → ∞ . T aking the log arithm of the r ight-han d side in Equa tio n ( A.1 ) yield s: log " n exp − ǫ 2 ( n − 1 ) C ′ L !# = lo g n − ǫ 2 ( n − 1) C ′ L . W e can rewrite this as: log n − ǫ 2 ( n − 1 ) C ′ L = lo g n 1 − ǫ 2 C ′ · n − 1 L log n ! . Since n − 1 L log n → ∞ , the term inside the parenthe ses ten ds to −∞ , and thus: log n − ǫ 2 ( n − 1 ) C ′ L → −∞ . Expon entiating, we obtain: n exp − ǫ 2 ( n − 1 ) C ′ L ! → 0 . Therefo re, P ( k R k ≥ 2 + ǫ ) → 0 . Since σ 1 ( R ) = k R k , we have P ( σ 1 ( R ) < 2 + ǫ ) → 1 . Therefo re, we get P ( T n < ǫ ) = P ( σ 1 ( R ) − 2 < ǫ ) = P ( σ 1 ( R ) < 2 + ǫ ) → 1 . This completes th e pr oof of this lemma. Append ix A.2. Pr operties of estimated p arameters Lemma 3 ( Concentratio n of th e blo ck probab ility estimator) . F or a ny estimated sen der co mmunity s 0 and r eceiver community r 0 with sizes n s 0 = | { i : ˆ g s ( i ) = s 0 }| an d n r 0 = | { j : ˆ g r ( j ) = r 0 }| , we ha ve | ˆ B ( ℓ ) ( s 0 , r 0 ) − E [ ˆ B ( ℓ ) ( s 0 , r 0 )] | ≤ s 6 log n n s 0 n r 0 with pr obab ility at least 1 − O ( n − 3 ) , where E [ ˆ B ( ℓ ) ( s 0 , r 0 )] is the expectation con ditioned on ˆ g s and ˆ g r . Pr o of. Define the sets: ˆ C s s 0 = { i ∈ [ n ] : ˆ g s ( i ) = s 0 } , ˆ C r r 0 = { j ∈ [ n ] : ˆ g r ( j ) = r 0 } . Set m = n s 0 n r 0 . If m = 0, then the estimated commu nity is emp ty , an d by definition ˆ B ( ℓ ) ( s 0 , r 0 ) = 0 and E [ ˆ B ( ℓ ) ( s 0 , r 0 ) | ˆ g s , ˆ g r ] = 0, so the inequ ality holds trivially . W e thu s assume m ≥ 1 in the following. For each i ∈ ˆ C s s 0 and j ∈ ˆ C r r 0 , A ( ℓ ) ( i , j ) is a Berno ulli ra n dom variable with success probab ility p i j = B ( ℓ ) ( g s ( i ) , g r ( j )) , where g s ( i ) an d g r ( j ) are th e tr ue (un known) sender and receiver com m unity labels. Define the centered rand om variables X i j = A ( ℓ ) ( i , j ) − p i j , ∀ i ∈ ˆ C s s 0 , j ∈ ˆ C r r 0 . W e see that X i j are in depend ent and satisfy E [ X i j | ˆ g s , ˆ g r ] = 0 , | X i j | ≤ 1 , V ar( X i j | ˆ g s , ˆ g r ) = p i j (1 − p i j ) ≤ 1 4 . 21 Set S = P i ∈ ˆ C s s 0 P j ∈ ˆ C r r 0 X i j . W e have E [ S | ˆ g s , ˆ g r ] = 0 , σ 2 : = V a r ( S | ˆ g s , ˆ g r ) = X i ∈ ˆ C s s 0 X j ∈ ˆ C r r 0 p i j (1 − p i j ) ≤ m 4 . Note th at the block pr obability estimator is d efined as ˆ B ( ℓ ) ( s 0 , r 0 ) = 1 m P i ∈ ˆ C s s 0 P j ∈ ˆ C r r 0 A ( ℓ ) ( i , j ), and its conditio nal expectation is E [ ˆ B ( ℓ ) ( s 0 , r 0 ) | ˆ g s , ˆ g r ] = 1 m P i ∈ ˆ C s s 0 P j ∈ ˆ C r r 0 p i j . Th us, we have ˆ B ( ℓ ) ( s 0 , r 0 ) − E [ ˆ B ( ℓ ) ( s 0 , r 0 ) | ˆ g s , ˆ g r ] = S m . Our goal is to contro l the magnitud e of S / m . W e apply Ber n stein’ s ineq uality in [ 53 ]. For any v > 0 , P ( | S | ≥ v | ˆ g s , ˆ g r ) ≤ 2 exp − v 2 / 2 σ 2 + v / 3 ! 
≤ 2 exp − v 2 / 2 m / 4 + v / 3 ! . (A.2) W e ch oose v = p 6 m log n . Note that sin ce | X i j | ≤ 1, we hav e | S | ≤ m . Therefor e, if v > m , the event {| S | ≥ v } cannot o ccur . Now v > m if a nd only if p 6 m log n > m ⇐ ⇒ 6 log n > m ⇐ ⇒ m < 6 log n . W e d isting uish two cases. Case 1 : m < 6 log n . In this case, v > m , so P ( | S | ≥ v | ˆ g s , ˆ g r ) = 0 . Mor e over, we always have | ˆ B ( ℓ ) ( s 0 , r 0 ) − E [ ˆ B ( ℓ ) ( s 0 , r 0 ) | ˆ g s , ˆ g r ] | = | S | m ≤ 1 . Since m < 6 log n , we have r 6 log n m > 1 . Consequently , | ˆ B ( ℓ ) ( s 0 , r 0 ) − E [ ˆ B ( ℓ ) ( s 0 , r 0 ) | ˆ g s , ˆ g r ] | ≤ 1 < r 6 log n m holds with pr obability 1. In particular, the failure p robab ility is 0, which certainly satisfies th e O ( n − 3 ) requirem ent. Case 2 : m ≥ 6 log n . Now v ≤ m , so we can apply ineq uality ( A.2 ). Sub stituting v = p 6 m log n into inequality ( A.2 ), we examine the expo nent: v 2 / 2 m / 4 + v / 3 = 3 m log n m 4 + √ 6 m log n 3 . W e n e e d to show that this quantity is at least 3 log n . Th is is equ iv alent to 3 m log n ≥ 3 log n        m 4 + p 6 m log n 3        ⇐ ⇒ m ≥ m 4 + p 6 m log n 3 ⇐ ⇒ 3 4 m ≥ p 6 m log n 3 . Multiplying both sides by 3 gives 9 4 m ≥ p 6 m log n . Squaring both sides gives 81 16 m 2 ≥ 6 m log n ⇐ ⇒ 81 16 m ≥ 6 log n ⇐ ⇒ m ≥ 96 81 log n = 32 27 log n . 22 Since we ar e in the case m ≥ 6 log n and 6 > 32 / 2 7 ≈ 1 . 1 85, th e con dition is satisfied. Therefo re, we have 3 m log n m 4 + √ 6 m log n 3 ≥ 3 lo g n , and ineq uality ( A.2 ) yield s P ( | S | ≥ v | ˆ g s , ˆ g r ) ≤ 2 exp( − 3 log n ) = 2 n − 3 . Hence, in Case 2, with cond itional prob a bility at least 1 − 2 n − 3 , we have | S | < v . W e h av e shown that the cond itional pro bability P ( | S | ≥ v | ˆ g s , ˆ g r ) is e ith er 0 (when m < 6 log n ) or at mo st 2 n − 3 (when m ≥ 6 log n ). T hus, we have P ( | S | ≥ v | ˆ g s , ˆ g r ) ≤ 2 n − 3 . By the law o f to tal expectation, the uncon ditional p robability satisfies P ( | S | ≥ v ) = E  P ( | S | ≥ v | ˆ g s , ˆ g r )  ≤ 2 n − 3 . Whenever | S | < v , we o btain | ˆ B ( ℓ ) ( s 0 , r 0 ) − E [ ˆ B ( ℓ ) ( s 0 , r 0 ) | ˆ g s , ˆ g r ] | = | S | m < v m = r 6 log n m = s 6 log n n s 0 n r 0 . Therefo re, with p robab ility at least 1 − 2 n − 3 , the desired in e quality holds. This completes the p roof of this lemm a . Lemma 4 (Un iform bou ndedn ess of the estimated edge pr o babilities) . Under H 0 and Assump tions 1 - 4 , we ha v e ˆ Ω ( ℓ ) ( i , j ) ∈ [ δ/ 2 , 1 − δ/ 2] for all i , j , ℓ with pr obab ility at least 1 − o ( n − 2 ) . Pr o of. Let E n denote the event that the commu nity detection algorithm M recovers th e true co mmunities up to label permutatio n: E n : = { ˆ g s = g s and ˆ g r = g r } , where equ ality is un derstood up to a p ermutation o f co mmunity labels. By Assumption 4 , we have lim n →∞ P ( E n ) = 1 . For each estimated sender co mmunity k ∈ { 1 , . . . , K s 0 } a n d r eceiver commun ity l ∈ { 1 , . . . , K r 0 } , define n s k : = | { i : ˆ g s ( i ) = k }| , n r l : = | { j : ˆ g r ( j ) = l }| . Note that un d er H 0 , K s 0 = K s and K r 0 = K r . On the ev ent E n , e ach estimated commu nity correspo n ds exactly to one true c ommun ity (after appr opriate label perm utation). Th erefore , fo r any estimated sende r co mmunity k , there exists a true sender com munity k ′ such th at { i : ˆ g s ( i ) = k } = { i : g s ( i ) = k ′ } . By Assumption 2 , we have n s k = | { i : g s ( i ) = k ′ }| ≥ c 0 n K s . 
Similarly , for any estimated receiver com m unity l , there exists a true receiver commu nity l ′ such that n r l = | { j : g r ( j ) = l ′ }| ≥ c 0 n K r . 23 Consequently , the p roduct satisfies n s k n r l ≥ c 0 n K s ! c 0 n K r ! = c 2 0 n 2 K s K r ≥ c 2 0 n 2 K 2 max . Fix a laye r ℓ ∈ { 1 , . . . , L } and an estimated blo ck ( k , l ) ∈ [ K s ] × [ K r ]. By Le mma 3 , we h av e     ˆ B ( ℓ ) ( k , l ) − E h ˆ B ( ℓ ) ( k , l ) | ˆ g s , ˆ g r i     ≤ s 6 log n n s k n r l with p r obability at least 1 − O ( n − 3 ) given ˆ g s and ˆ g r . On E n , the con ditional expectation e quals the true block probability . More precisely , there exist per mutations π s : [ K s ] → [ K s ] and π r : [ K r ] → [ K r ] such that fo r all i , j , ˆ g s ( i ) = π s ( g s ( i )) , ˆ g r ( j ) = π r ( g r ( j )) . Then fo r any estimated b lock ( k , l ), we h ave E h ˆ B ( ℓ ) ( k , l ) | ˆ g s , ˆ g r i = B ( ℓ ) ( π − 1 s ( k ) , π − 1 r ( l )) . By Assumption 1 , we have B ( ℓ ) ( π − 1 s ( k ) , π − 1 r ( l )) ∈ [ δ , 1 − δ ] . Thus, on E n , we have E h ˆ B ( ℓ ) ( k , l ) | ˆ g s , ˆ g r i ∈ [ δ , 1 − δ ] for all k , l , ℓ. (A.3) and s 6 log n n s k n r l ≤ s 6 log n c 2 0 n 2 / K 2 max = K max p 6 log n c 0 n . By Assumption 3 an d th e fact that L ≥ 1, we have K 2 max log n n → 0 , hence K max p log n n → 0 . Therefo re, there exists N 1 ∈ N such that for all n > N 1 , K max p 6 log n c 0 n ≤ δ 2 . (A.4) Define th e fo llowing event F n : =            ˆ B ( ℓ ) ( k , l ) − E h ˆ B ( ℓ ) ( k , l ) | ˆ g s , ˆ g r i     ≤ K max p 6 log n c 0 n for all k , l , ℓ        . Using th e un ion bo und over all K s K r L ≤ K 2 max L b locks, an d ap plying Lemm a 3 , we have P  F c n | E n  ≤ K 2 max L · 2 n − 3 = 2 K 2 max Ln − 3 . Now , for n > N 1 , on the event E n ∩ F n , co m bining Equ ations ( A.3 ) and ( A.4 ) gives f or all k , l , ℓ , ˆ B ( ℓ ) ( k , l ) ∈  δ − δ 2 , 1 − δ + δ 2  = [ δ/ 2 , 1 − δ/ 2] . (A.5) 24 Recall that ˆ Ω ( ℓ ) ( i , j ) = ˆ B ( ℓ ) ( ˆ g s ( i ) , ˆ g r ( j )). Defin e the target e vent A n : = n ˆ Ω ( ℓ ) ( i , j ) ∈ [ δ/ 2 , 1 − δ / 2] for all i , j , ℓ o . From Equation ( A.5 ), we h ave E n ∩ F n ⊆ A n , he n ce A c n ⊆ E c n ∪ F c n . W e now boun d P ( A c n ): P ( A c n ) ≤ P ( E c n ) + P ( F c n ) . For the first term, b y Assumptio n 4 and th e exponen tial co n vergence rate s ty pical for spectr al clustering und er ou r assumptions (see e.g., [ 32 ]), we have P ( E c n ) = O ( n − 3 ). For the second term, using the law o f to tal pro bability obtain s P ( F c n ) = P ( F c n ∩ E n ) + P ( F c n ∩ E c n ) ≤ P ( F c n | E n ) + P ( E c n ) ≤ 2 K 2 max Ln − 3 + O ( n − 3 ) . By Assumption 3 , K 2 max L = o ( n / log n ). Theref o re, 2 K 2 max Ln − 3 = o n log n · n − 3 ! = o 1 n 2 log n ! = o ( n − 2 ) . Thus, we have P ( A c n ) = O ( n − 3 ) + o ( n − 2 ) = o ( n − 2 ) . Consequently , P ( A n ) = 1 − o ( n − 2 ) , which com pletes th e pro of. Lemma 5 (Conver genc e o f the estimated residu a l matrix) . Under H 0 and Assump tions 1 - 4 , we h ave k ˆ R − R k = o P (1) . Pr o of. Und er H 0 , by Assump tio n 4 , with proba b ility ten ding to 1, ˆ g s = g s and ˆ g r = g r . Cond ition on this ev ent. Then ˆ Ω ( ℓ ) ( i , j ) = ˆ B ( ℓ ) ( g s ( i ) , g r ( j )). By Le mma 3 an d Assum ptions 1 - 2 , we have | ˆ Ω ( ℓ ) ( i , j ) − Ω ( ℓ ) ( i , j ) | = O P        K max p log n n        . 
Define S i j = L X ℓ = 1 ( A ( ℓ ) ( i , j ) − Ω ( ℓ ) ( i , j )) , E i j = L X ℓ = 1 ( ˆ Ω ( ℓ ) ( i , j ) − Ω ( ℓ ) ( i , j )) , U i j = L X ℓ = 1 Ω ( ℓ ) ( i , j )(1 − Ω ( ℓ ) ( i , j )) , ˆ U i j = L X ℓ = 1 ˆ Ω ( ℓ ) ( i , j )(1 − ˆ Ω ( ℓ ) ( i , j )) , D i j = q ( n − 1 ) U i j , ˆ D i j = q ( n − 1 ) ˆ U i j . Then, we h av e R ( i , j ) = S i j D i j , ˆ R ( i , j ) = S i j − E i j ˆ D i j . By Assumption 1 , U i j ≥ L δ (1 − δ ), so D i j ≥ √ ( n − 1 ) L δ (1 − δ ) = O ( √ n L ). By Lemma 4 , with probability 1 − o ( n − 2 ), ˆ U i j ≥ L δ 2 (1 − δ 2 ), so ˆ D i j = O ( √ n L ). Now , w e d ecompo se ˆ R ( i , j ) − R ( i , j ) as: ˆ R ( i , j ) − R ( i , j ) = S i j − E i j ˆ D i j − S i j D i j = S i j       1 ˆ D i j − 1 D i j       − E i j ˆ D i j . First, we bo und | E i j | : | E i j | =        L X ℓ = 1 ( ˆ Ω ( ℓ ) ( i , j ) − Ω ( ℓ ) ( i , j ))        ≤ L max ℓ | ˆ Ω ( ℓ ) ( i , j ) − Ω ( ℓ ) ( i , j ) | = O P        K max L p log n n        . 25 Second, we bound     1 ˆ D i j − 1 D i j     . By Lemma 4 , there exist constants c 1 , c 2 > 0 such that ˆ D i j ≥ c 1 √ n L and D i j ≥ c 2 √ n L . Let c = m in( c 1 , c 2 ). Then ˆ D i j ≥ c √ n L and D i j ≥ c √ n L . Mo reover , n ote that U i j ≤ L / 4 and ˆ U i j ≤ L / 4, so D i j ≤ 1 2 √ ( n − 1 ) L and ˆ D i j ≤ 1 2 √ ( n − 1) L . Hence, th ere exists a con stan t C 0 > 0 such that for all large n , we hav e D i j ˆ D i j ≥ C 0 D 2 i j . Conseq uently ,       1 ˆ D i j − 1 D i j       = | D i j − ˆ D i j | D i j ˆ D i j ≤ 1 C 0 | D i j − ˆ D i j | D 2 i j . Now , we boun d | D i j − ˆ D i j | : | D i j − ˆ D i j | =      q ( n − 1 ) U i j − q ( n − 1 ) ˆ U i j      = √ n − 1      p U i j − q ˆ U i j      . The function f ( x ) = √ x is continu o usly di ff e rentiable o n (0 , ∞ ) . For any x , y ≥ a > 0, by the mean value theo rem, there exists ξ be twe en x and y such th a t | √ x − √ y | = 1 2 √ ξ | x − y | ≤ 1 2 √ a | x − y | . Thus, f ( x ) is Lipschitz on [ a , ∞ ) with constant 1 / (2 √ a ). By Assumption 1 , we have Ω ( ℓ ) ( i , j ) ∈ [ δ , 1 − δ ], so U i j = L X ℓ = 1 Ω ( ℓ ) ( i , j )(1 − Ω ( ℓ ) ( i , j )) ≥ L δ (1 − δ ) . By Lem m a 4 , with pro bability 1 − o ( n − 2 ), we have ˆ U i j ≥ L δ 2 (1 − δ 2 ). Sinc e δ ∈ (0 , 1 2 ), we have L δ 2 (1 − δ 2 ) ≤ L δ (1 − δ ). Let a = min  L δ (1 − δ ) , L δ 2  1 − δ 2  = L δ 2  1 − δ 2  . Then b o th U i j and ˆ U i j are at least a with high proba b ility . Therefo re, on this event,      p U i j − q ˆ U i j      ≤ 1 2 √ a | U i j − ˆ U i j | = 1 2 q L δ 2 (1 − δ 2 ) | U i j − ˆ U i j | . For simplicity , deno te C δ = 1 2 √ δ 2 (1 − δ 2 ) . Then, we h av e      p U i j − q ˆ U i j      ≤ C δ √ L | U i j − ˆ U i j | . W e also h av e | U i j − ˆ U i j | =        L X ℓ = 1  Ω ( ℓ ) ( i , j )(1 − Ω ( ℓ ) ( i , j )) − ˆ Ω ( ℓ ) ( i , j )(1 − ˆ Ω ( ℓ ) ( i , j ))         ≤ L X ℓ = 1    Ω ( ℓ ) ( i , j )(1 − Ω ( ℓ ) ( i , j )) − ˆ Ω ( ℓ ) ( i , j )(1 − ˆ Ω ( ℓ ) ( i , j ))    . The function g ( x ) = x (1 − x ) is Lipschitz on [0 , 1] with constan t 1, since | g ′ ( x ) | = | 1 − 2 x | ≤ 1 f or x ∈ [0 , 1] . Henc e , we have    Ω ( ℓ ) ( i , j )(1 − Ω ( ℓ ) ( i , j )) − ˆ Ω ( ℓ ) ( i , j )(1 − ˆ Ω ( ℓ ) ( i , j ))    ≤ | Ω ( ℓ ) ( i , j ) − ˆ Ω ( ℓ ) ( i , j ) | . Thus, | U i j − ˆ U i j | ≤ L X ℓ = 1 | Ω ( ℓ ) ( i , j ) − ˆ Ω ( ℓ ) ( i , j ) | . 26 Define S (abs) i j = P L ℓ = 1 | Ω ( ℓ ) ( i , j ) − ˆ Ω ( ℓ ) ( i , j ) | . 
Since each term | Ω ( ℓ ) ( i , j ) − ˆ Ω ( ℓ ) ( i , j ) | = O P K max √ log n n ! unifor m ly in ℓ , we have S (abs) i j = O P        K max L p log n n        . Therefo re, we get | D i j − ˆ D i j | ≤ √ n − 1 · C δ √ L S (abs) i j = O        √ n · 1 √ L · K max L p log n n        = O        K max p L log n √ n        . Hence, using the ab ove bou nd and the fact that D 2 i j = ( n − 1) U i j ≍ n L , we obtain       1 ˆ D i j − 1 D i j       ≤ 1 C 0 | D i j − ˆ D i j | D 2 i j = O        K max p L log n √ n · n L        = O        K max p log n L 1 / 2 n 3 / 2        . Now , we boun d S i j  1 ˆ D i j − 1 D i j  : | S i j | ≤ L (since | A ( ℓ ) ( i , j ) − Ω ( ℓ ) ( i , j ) | ≤ 1) , so, we have       S i j       1 ˆ D i j − 1 D i j             ≤ L · O        K max p log n L 1 / 2 n 3 / 2        = O        K max p L log n n 3 / 2        . Next, we bound E i j ˆ D i j . Note th at | E i j | ≤ S (abs) i j = O P K max L √ log n n ! , which gives       E i j ˆ D i j       ≤ O K max L √ log n n ! O ( √ n L ) = O        K max p L log n n 3 / 2        . Therefo re, we have | ˆ R ( i , j ) − R ( i , j ) | ≤ O        K max p L log n n 3 / 2        + O        K max p L log n n 3 / 2        = O        K max p L log n n 3 / 2        . The Frobenius n orm is k ˆ R − R k 2 F = X i , j ( ˆ R ( i , j ) − R ( i , j )) 2 = O P n 2 · K 2 max L log n n 3 !! = O P K 2 max L log n n ! . Thus, k ˆ R − R k ≤ k ˆ R − R k F = O P        K max r L log n n        . By Assumption 3 , we have k ˆ R − R k = o P (1), wh ich co mpletes the pr oof of this lemma. Append ix A.3. Pr oof of Th e o r em 1 Pr o of. By Lemma 1 , σ 1 ( R ) ≤ 2 + o P (1). By Lemma 5 , k ˆ R − R k = o P (1), so by W eyl’ s inequality , | σ 1 ( ˆ R ) − σ 1 ( R ) | ≤ k ˆ R − R k = o P (1). Th erefore , σ 1 ( ˆ R ) ≤ σ 1 ( R ) + o P (1) ≤ 2 + o P (1) , which imp lies ˆ T n = σ 1 ( ˆ R ) − 2 ≤ o P (1) . Thus, for any ǫ > 0, P ( ˆ T n < ǫ ) → 1, wh ich co mpletes the proo f of this theorem . 27 Remark 6 . (The polynomial conver gen ce rate of k ˆ R − R k ) In Lem m a 5 , we established k ˆ R − R k = o P (1) un der Assumptions 1 – 4 . F or the sequen tial testing pr ocedure in A lg orithm 2 with a d ecaying threshold t n = n − ε , it is useful to kn ow when the str o nger rate k ˆ R − R k = o P ( n − ε ) holds. F r o m the p r o of o f Lemma 5 , we ha ve k ˆ R − R k ≤ k ˆ R − R k F = O P        K max r L log n n        . Thus, k ˆ R − R k = o P ( n − ε ) is gua ranteed whenever K max r L log n n = o ( n − ε ) ⇐ ⇒ K 2 max L log n = o ( n 1 − 2 ε ) . (*) Condition ( ∗ ) is str onger than Assumptio n 3 (wh ich only requir es K 2 max L log n / n → 0 ). If ( ∗ ) h olds, then k ˆ R − R k = O P        r K 2 max L log n n        = o P  √ n − 2 ε  = o P ( n − ε ) , which ensures th at the estimation err or of th e res idua l ma trix d e cays faster than th e thr eshold t n . While Th eor em 1 does not r equir e such a polyno mial r ate, co ndition ( ∗ ) can be ad opted in finite-sample analyses to sharpen the performance g uarantees of th e seq uential p r o cedure . Append ix A.4. Pr oof of Th e o r em 2 Pr o of. W e pr ovide a detailed proo f f or the case K s > K s 0 (the case K r > K r 0 is symmetr ic). The pro of proceed s in the following steps. Step 1: Identify merged sender c o mmunities. 
Since K s 0 < K s , b y th e pigeo nhole prin ciple, there exists an estimated sender co mmunity s 0 ∈ { 1 , . . . , K s 0 } th at contains at least two distinct tru e send er c ommun ities. Denote two su c h tru e commun ities as k 1 , k 2 ∈ { 1 , . . . , K s } , k 1 , k 2 . Define th e no de sets: S 1 = { i : g s ( i ) = k 1 } ⊆ ˆ C s s 0 , S 2 = { i : g s ( i ) = k 2 } ⊆ ˆ C s s 0 , where ˆ C s s 0 = { i : ˆ g s ( i ) = s 0 } . Step 2 : Select a receiver community . By condition (A1), ther e exists a receiver com munity l ∗ ∈ { 1 , . . . , K r } such that        L X ℓ = 1  B ( ℓ ) ( k 1 , l ∗ ) − B ( ℓ ) ( k 2 , l ∗ )         ≥ η L . W ithout loss of gener a lity , assume L X ℓ = 1  B ( ℓ ) ( k 1 , l ∗ ) − B ( ℓ ) ( k 2 , l ∗ )  ≥ η L . (A.6) Step 3: Define node sets and their sizes. Let T = { j : g r ( j ) = l ∗ } . By Assump tion 2 , th ere exists a constant c 0 > 0 such that | S 1 | ≥ c 0 n K s , | S 2 | ≥ c 0 n K s , | T | ≥ c 0 n K r . 28 Set s 1 = | S 1 | , s 2 = | S 2 | and rec all that K max = m ax( K s , K r ), we h av e min( s 1 , s 2 ) ≥ c 0 n K max , | T | ≥ c 0 n K max . (A.7) Step 4: Identify a subset of t he estimated receiver c ommunity . Sin ce nodes in T ar e a ssign ed to K r 0 estimated receiver comm unities, by the pigeonh ole p rinciple, there e xists an estimated receiv er comm unity r 0 ∈ { 1 , . . . , K r 0 } such th at the subset T ′ = { j ∈ T : ˆ g r ( j ) = r 0 } satisfies | T ′ | ≥ | T | K r 0 ≥ c 0 n K r K r 0 ≥ c 0 n K 2 max , (A.8) where we used K r 0 ≤ K r ≤ K max . Step 5: Construct the deviation matrix and lower -bound its norm. Define th e aggregated deviation matrix as ∆ = L X ℓ = 1  Ω ( ℓ ) − ˆ Ω ( ℓ )  , and consider its sub matrix ∆ S , T ′ , wh e r e S = S 1 ∪ S 2 . For i ∈ S 1 and j ∈ T ′ , since i is in the estimated sender commun ity s 0 and j in the estimated receiver comm unity r 0 , we have ˆ Ω ( ℓ ) ( i , j ) = ˆ B ( ℓ ) ( s 0 , r 0 ) , while th e true pro b ability is Ω ( ℓ ) ( i , j ) = B ( ℓ ) ( k 1 , l ∗ ). Hen ce, w e have ∆ ( i , j ) = L X ℓ = 1  B ( ℓ ) ( k 1 , l ∗ ) − ˆ B ( ℓ ) ( s 0 , r 0 )  ≕ d 1 . Similarly , for i ∈ S 2 , j ∈ T ′ , we have ∆ ( i , j ) = L X ℓ = 1  B ( ℓ ) ( k 2 , l ∗ ) − ˆ B ( ℓ ) ( s 0 , r 0 )  ≕ d 2 . Observe that d 1 − d 2 = L X ℓ = 1  B ( ℓ ) ( k 1 , l ∗ ) − B ( ℓ ) ( k 2 , l ∗ )  ≥ η L , by Equ ation ( A.6 ). Con sequently , we get max( | d 1 | , | d 2 | ) ≥ | d 1 − d 2 | 2 ≥ η L 2 . (A.9) Now w e a n alyze the structur e and rank o f ∆ S , T ′ . W e d e fine th e following items: • s 1 = | S 1 | , s 2 = | S 2 | , an d t ′ = | T ′ | . • Define indicato r vectors 1 S 1 ∈ R s 1 + s 2 where ( 1 S 1 ) i = 1 if the i -th node in S belo ngs to S 1 , an d 0 otherw ise. • Define 1 S 2 ∈ R s 1 + s 2 similarly f or S 2 . • Define 1 T ′ ∈ R t ′ as th e all-o nes vector . 29 Then ∆ S , T ′ can b e expr e ssed as ∆ S , T ′ = d 1 · 1 S 1 1 ⊤ T ′ + d 2 · 1 S 2 1 ⊤ T ′ . T o see this, note th at for i ∈ S 1 and j ∈ T ′ , th e ( i , j )-entr y of 1 S 1 1 ⊤ T ′ is 1, wh ile the correspo n ding entry of 1 S 2 1 ⊤ T ′ is 0 , g iving ∆ ( i , j ) = d 1 . Similarly for i ∈ S 2 . The matrix 1 S 1 1 ⊤ T ′ is an outer p roduc t of two vectors, hen ce has rank 1 . The same h olds for 1 S 2 1 ⊤ T ′ . Since the ran k of a sum of two m atrices is at most the su m o f the ir ran k s, we have rank( ∆ S , T ′ ) ≤ ra n k( 1 S 1 1 ⊤ T ′ ) + ran k( 1 S 2 1 ⊤ T ′ ) = 1 + 1 = 2 . 
(A.10) The Frobenius n orm o f ∆ S , T ′ is k ∆ S , T ′ k 2 F = X i ∈ S 1 X j ∈ T ′ d 2 1 + X i ∈ S 2 X j ∈ T ′ d 2 2 = t ′ ( s 1 d 2 1 + s 2 d 2 2 ) . Using E q uation ( A.9 ), we have k ∆ S , T ′ k 2 F ≥ t ′ · min( s 1 , s 2 ) · max( d 2 1 , d 2 2 ) ≥ t ′ · m in( s 1 , s 2 ) ·  η L 2  2 . (A.11) Now , fro m Equa tio ns ( A.7 ) and ( A.8 ), we have the lower bo u nds min( s 1 , s 2 ) ≥ c 0 n K max , t ′ ≥ c 0 n K 2 max . Substituting th ese into ( A.11 ) obtain s k ∆ S , T ′ k F ≥ η L 2 p t ′ · m in( s 1 , s 2 ) ≥ η L 2 r c 0 n K 2 max · c 0 n K max = c 0 η 2 · Ln K 3 / 2 max . Since ∆ S , T ′ has ran k a t most 2 (fro m Equa tion ( A.10 )), its spe c tral no rm satisfies the f ollowing inequality k ∆ S , T ′ k ≥ k ∆ S , T ′ k F p rank( ∆ S , T ′ ) ≥ k ∆ S , T ′ k F √ 2 , which gives k ∆ S , T ′ k ≥ c 0 η 2 √ 2 · Ln K 3 / 2 max . (A.12) Step 6: Control the random part. Define W : = L X ℓ = 1  A ( ℓ ) − Ω ( ℓ )  , and co nsider its su bmatrix W S , T ′ obtained by restricting rows to S = S 1 ∪ S 2 and co lumns to T ′ , where W S , T ′ is the random fluctua tion matrix over the index sets S and T ′ . W e n ow e stab lish a h igh-pr obability upper bo und for the spectral n orm o f W S , T ′ using Lemma 2 (th e re c tangular version o f Cor ollary 3 .12 in [ 3 ]) . The matrix W S , T ′ is of size s × t ′ with s = | S | an d t ′ = | T ′ | . Its entries are independent becau se edg es are indepen d ent acro ss di ff ere n t no de-pair s and layers. For each ( i , j ) ∈ S × T ′ , we have W i j = L X ℓ = 1  A ( ℓ ) ( i , j ) − Ω ( ℓ ) ( i , j )  , 30 which satisfies E [ W i j ] = 0 and, by Assumption 1 , V ar( W i j ) = L X ℓ = 1 Ω ( ℓ ) ( i , j )  1 − Ω ( ℓ ) ( i , j )  ≤ L 4 . Moreover , | W i j | ≤ L beca use each term is bounde d by 1. Define th e matrix X = W S , T ′ . Th en X has in depend ent entries, | X i j | ≤ L , and X j ∈ T ′ E [ X 2 i j ] ≤ t ′ · L 4 , X i ∈ S E [ X 2 i j ] ≤ s · L 4 . Hence, ˜ ˜ σ : = max          max i ∈ S s X j ∈ T ′ E [ X 2 i j ] , max j ∈ T ′ s X i ∈ S E [ X 2 i j ]          ≤ r L max( s , t ′ ) 4 ≤ 1 2 √ Ln , and we set ˜ ˜ σ ∗ = L as the upper bound of all entries. Apply ing Lemma 2 to X , fo r any 0 < η ≤ 1 / 2 and t ≥ 0 , P  k X k ≥ (1 + η )2 ˜ ˜ σ + t  ≤ ( s ∧ t ′ )exp  − t 2 C ˜ ˜ σ 2 ∗  ≤ n exp  − t 2 C L 2  , where C > 0 is a universal con stant fr om the lemma. Choose η = 1 / 2 and set t = M √ n L with a co nstant M > 0 to b e de termined. W e get (1 + η )2 ˜ ˜ σ ≤ 3 √ Ln / 2 , which gives P  k W S , T ′ k ≥ 3 2 √ Ln + M √ n L  ≤ n exp  − M 2 n L C L 2  = n exp  − M 2 n C L  . (A.13) Now , we show that th e right-ha n d side of Equ ation ( A.13 ) ten ds to zero as n → ∞ . Let P n = n exp  − M 2 n C L  . T aking loga r ithms, we have log P n = lo g n − M 2 n C L . By Assumption 3 an d th e fact that K max ≥ 1 , we h av e L log n n → 0 . Re write log P n as log P n = lo g n 1 − M 2 C · n L log n ! . Since n L log n → ∞ , the factor in side th e pare n theses ten ds to −∞ , and thus lo g P n → −∞ . Hen c e, we get P n = exp(log P n ) → 0 . Therefo re, for any fixed M > 0, the probab ility in Equatio n ( A.13 ) tend s to ze r o. This im plies th a t with p robab ility tending to 1, we have k W S , T ′ k ≤ 3 2 √ Ln + M √ n L = O  √ n L  , or e quiv alently , k W S , T ′ k = O P  √ n L  . 31 Step 7 : Lower-bound the agg regated residual submatrix. Let Z = P L ℓ = 1 ( A ( ℓ ) − ˆ Ω ( ℓ ) ) = W + ∆ . The n Z S , T ′ = W S , T ′ + ∆ S , T ′ . 
By the triang le inequality a nd the lower b ound for k ∆ S , T ′ k f rom Eq uation ( A.12 ), we hav e k Z S , T ′ k ≥ k ∆ S , T ′ k − k W S , T ′ k ≥ c 0 η 2 √ 2 · Ln K 3 / 2 max − O P  √ n L  . (A.14) Step 8: Lower-bound t he spectral no rm of the normalized residual matrix. For i , j , th e nor malizing factor is ˆ D i j = v u t ( n − 1 ) L X ℓ = 1 ˆ Ω ( ℓ ) ( i , j )  1 − ˆ Ω ( ℓ ) ( i , j )  . Since ˆ Ω ( ℓ ) ( i , j ) ∈ [0 , 1], we have ˆ Ω ( ℓ ) ( i , j )(1 − ˆ Ω ( ℓ ) ( i , j )) ≤ 1 / 4 , and thus ˆ D i j ≤ r ( n − 1 ) · L 4 = 1 2 p ( n − 1 ) L . Consequently , for any i ∈ S , j ∈ T ′ , we have 1 ˆ D i j ≥ 2 √ ( n − 1 ) L . (A.15) Consider the submatrix ˆ R S , T ′ with entries ˆ R ( i , j ) = Z ( i , j ) / ˆ D i j . Sin c e all nodes in S share the same estimated sender co m munity s 0 and all nodes in T ′ share th e same estimated r eceiv er commu nity r 0 , the n ormalization factor ˆ D i j is constan t over S × T ′ . Den o te th is common value by ˆ d . Then ˆ R S , T ′ = ˆ d − 1 Z S , T ′ . Th us, u sing Equ ation ( A.15 ) g ets k ˆ R S , T ′ k ≥ 2 √ ( n − 1) L k Z S , T ′ k . By Equation ( A.14 ), we get k ˆ R S , T ′ k ≥ 2 √ ( n − 1 ) L c 0 η 2 √ 2 · Ln K 3 / 2 max − O P  √ n L  ! = c 0 η √ 2 · √ n L K 3 / 2 max · r n n − 1 − O P (1) . (A.16) Step 9 : P rove divergence. Because σ 1 ( ˆ R ) ≥ k ˆ R S , T ′ k , it su ffi ces to show that the right-hand side of Equation ( A.16 ) div erges in p robability . The leading deter ministic term is c 0 η √ 2 · √ n L K 3 / 2 max · r n n − 1 . Since p n n − 1 → 1 , we have ≍ √ nL K 3 / 2 max = q nL K 3 max . By conditio n ( A2), nL K 3 max → ∞ , which implies c 0 η √ 2 · √ nL K 3 / 2 max · p n n − 1 → ∞ . The remainder term O P (1) is stoc h astically bou nded. The r efore, we have k ˆ R S , T ′ k P → ∞ , which imp lies σ 1 ( ˆ R ) P → ∞ an d conseq uently ˆ T n = σ 1 ( ˆ R ) − 2 P → ∞ . Step 10 : The case K r 0 < K r . If K r 0 < K r , the ro les of sender and receiver are inter changed . T he sam e argumen t, selecting two distinct true receiver commun ities and a su itab le sender commu nity via the symmetric version of (A1), yields an ana lo gous divergence r esult. Th is completes the proo f of this theorem . 32 Ap pendix B. Proof s for the MLDiGoF Algorithm Append ix B.1. Pr oof of Th e o r em 3 Pr o of. W e a dopt all n otations from the main text an d Appen dix. Let K cand be any upper bo u nd satisfying K cand ≥ K max for all su ffi ciently large n . (Th e default ch oice K cand = ⌊ p n / log n ⌋ works becau se Assumption 3 imp lies K max = o ( p n / log n ).) For each candid a te pair ( k s , k r ), let ˆ T n ( k s , k r ) be the test statistic comp u ted via Eq uation ( 3 ) using Algor ithm 1 with input ( K s 0 , K r 0 ) = ( k s , k r ). Let P n = { ( k (1) s , k (1) r ) , . . . , ( k ( M n ) s , k ( M n ) r ) } be the lexicograph ically ordered sequence of candidate pairs fro m (1 , 1) to ( K cand , K cand ), where M n = K 2 cand . Since K s , K r ≤ K max ≤ K cand for all large n , the true p air ( K s , K r ) belong s to P n . Let m ∗ ( n ) denote its index in P n . N o te tha t because we allow K max to grow with n , m ∗ ( n ) may also gr ow with n . Define th e events ˜ A n ≔ n ˆ T n ( K s , K r ) < t n o , ˜ B n ≔ m ∗ ( n ) − 1 \ m = 1 n ˆ T n ( k ( m ) s , k ( m ) r ) ≥ t n o . Algorithm 2 re turns ( K s , K r ) exactly when ˜ A n ∩ ˜ B n occurs. W e will pr ove that P ( ˜ A n ) → 1 and P ( ˜ B n ) → 1. 
Then , by the un ion boun d, we can obtain P ( ˜ A n ∩ ˜ B n ) = 1 − P ( ˜ A c n ∪ ˜ B c n ) ≥ 1 − P ( ˜ A c n ) − P ( ˜ B c n ) → 1 . Part 1: Behavior under the tr ue model: P ( ˜ A n ) → 1 . Let ˆ T ∗ n ≔ ˆ T n ( K s , K r ). W e n e e d to show P ( ˆ T ∗ n ≥ t n ) → 0. Recall the o r acle statistic T n ≔ σ 1 ( R ) − 2 defin e d in Equation ( 1 ), wher e R is the id eal residual matr ix co nstructed with the tr ue p arameters. Fr om Equ ation ( A. 1 ), for any x > 0, we have P ( T n ≥ x ) ≤ n exp − x 2 ( n − 1 ) C 0 L ! , (B.1) where C 0 = 4 C / [ δ (1 − δ )] and C is the un iv ersal co nstant fr om L emma 2 . By W eyl’ s inequality for singular values, we have | σ 1 ( ˆ R ) − σ 1 ( R ) | ≤ k ˆ R − R k , where ˆ R is the n o rmalized residual matrix con structed with the estimated co mmunities from Algor ithm 1 using ( K s 0 , K r 0 ) = ( K s , K r ). Con sequently , we g et | ˆ T ∗ n − T n | ≤ k ˆ R − R k . From this inequality , we obtain the one-sided bo und: ˆ T ∗ n = σ 1 ( ˆ R ) − 2 ≤ σ 1 ( R ) − 2 + k ˆ R − R k = T n + k ˆ R − R k . (B.2) Now c onsider th e event { ˆ T ∗ n ≥ t n } . By Equ ation ( B.2 ), we have { ˆ T ∗ n ≥ t n } ⊆ { T n + k ˆ R − R k ≥ t n } . If T n + k ˆ R − R k ≥ t n , then at lea st one of T n ≥ t n / 3 or k ˆ R − R k ≥ t n / 3 must hold . Inde e d, if both T n < t n / 3 and k ˆ R − R k < t n / 3, then T n + k ˆ R − R k < 2 t n / 3 < t n (since t n > 0 ). Thu s, we have { T n + k ˆ R − R k ≥ t n } ⊆ { T n ≥ t n / 3 } ∪ {k ˆ R − R k ≥ t n / 3 } . Applying the u nion b ound , we obtain P  ˆ T ∗ n ≥ t n  ≤ P  T n + k ˆ R − R k ≥ t n  ≤ P  T n ≥ t n / 3  + P  k ˆ R − R k ≥ t n / 3  . (B.3) 33 Bound ing the first term in Eq uation ( B.3 ) . Applying Eq uation ( B.1 ) with x = t n / 3 giv es P ( T n ≥ t n / 3 ) ≤ n exp − t 2 n ( n − 1 ) 9 C 0 L ! . W e show the r ight-han d side tend s to zero. Con dition (C1 ) states α n = o ( t n ). Squarin g y ields K 2 max L log n n = o ( t 2 n ) ⇐ ⇒ t 2 n n K 2 max L log n → ∞ . Since K max ≥ 1 , th e above imp lies t 2 n n / ( L log n ) → ∞ . H e n ce, we hav e n exp − t 2 n ( n − 1 ) 9 C 0 L ! = n exp − t 2 n n 9 C 0 L · n − 1 n ! ≤ n exp − t 2 n n 10 C 0 L ! → 0 . for su ffi c ien tly large n (b ecause ( n − 1) / n → 1) . Th us, we get lim n →∞ P ( T n ≥ t n / 3 ) = 0 . (B.4) Bound ing the second term in Equation ( B.3 ) . Fro m Lemma 5 and its p roof, we have k ˆ R − R k = O P        K max r L log n n        = O P ( α n ) . More precisely , th ere exists a constant C est > 0 such that fo r all su ffi ciently large n , we h av e P  k ˆ R − R k ≥ C est α n  ≤ 2 n 2 . Condition (C1) giv es α n = o ( t n ). Hence, for any fixed ε > 0, we have C est α n ≤ ε t n for all large n . Choose ε = 1 / 3. Then fo r n large eno ugh, we have P  k ˆ R − R k ≥ t n / 3  ≤ P  k ˆ R − R k ≥ C est α n  ≤ 2 n 2 → 0 . Thus, we get lim n →∞ P  k ˆ R − R k ≥ t n / 3  = 0 . (B.5) Conclusion fo r ˜ A n . Insertin g Equa tio ns ( B.4 ) an d ( B.5 ) into Equatio n ( B.3 ) yields P ( ˆ T ∗ n ≥ t n ) → 0, i.e. P ( ˜ A n ) → 1. Part 2: Behavior under underfitted models: P ( ˜ B n ) → 1 . For any un derfitted candid ate ( k s , k r ) (i. e . , k s < K s or k r < K r ), we need a lower bound fo r ˆ T n ( k s , k r ) tha t hold s unifor m ly with high prob ability . 
From th e pr oof of The o rem 2 , we have tha t for any un derfitted cand id ate, there exist node sets S and T ′ with sizes satisfy ing | S | ≥ c 0 n / K max and | T ′ | ≥ c 0 n / K 2 max such th at ˆ T n ( k s , k r ) ≥ c 0 η √ 2 √ n L K 3 / 2 max r n n − 1 − 2 √ ( n − 1 ) L k W S , T ′ k − 2 , where W S , T ′ is th e restrictio n of W = P L ℓ = 1 ( A ( ℓ ) − Ω ( ℓ ) ) to S × T ′ . Define Z ( k s , k r ) n : = 2 √ ( n − 1 ) L k W S , T ′ k + 2 . Since √ n / ( n − 1) → 1, there exists a constan t c 2 > 0 (e.g., c 2 = c 0 η/ (2 √ 2)) su ch that for all su ffi ciently large n , we have ˆ T n ( k s , k r ) ≥ c 2 √ n L K 3 / 2 max − Z ( k s , k r ) n . (B.6) 34 Now w e c o ntrol th e ta il o f Z ( k s , k r ) n . Recall from Eq uation ( A.13 ) in the pr oof o f Th eorem 2 that for any M > 0, we have P  k W S , T ′ k ≥ 3 2 √ Ln + M √ n L  ≤ n exp  − M 2 n C L  , (B.7) where C > 0 is a un iv ersal constant. Thus, we hav e Z ( k s , k r ) n = O P (1), and the bo und in Equ a tion ( B.7 ) is unif orm over all un derfitted can didates becau se it depend s only on n , L , and the e ntrywise bound s of W S , T ′ , which are unif o rm. Define γ n : = c 2 √ nL K 3 / 2 max = c 2 β n . By condition (C2), we must ha ve t n ≤ γ n / 2 for all large n . W e now bound P  ˆ T n ( k s , k r ) < t n  . By Eq u ation ( B.6 ), we have P  ˆ T n ( k s , k r ) < t n  ≤ P  c 2 √ n L K 3 / 2 max − Z ( k s , k r ) n < t n  ≤ P  Z ( k s , k r ) n > γ n − t n  ≤ P  Z ( k s , k r ) n > γ n / 2  (since t n ≤ γ n / 2) . Now , Z ( k s , k r ) n > γ n / 2 implies 2 √ ( n − 1) L k W S , T ′ k > γ n / 2 − 2. For large n , γ n / 2 − 2 ≥ γ n / 4 becau se γ n → ∞ . Hence, we have P  Z ( k s , k r ) n > γ n / 2  ≤ P  k W S , T ′ k > γ n 4 · √ ( n − 1 ) L 2  ≤ P  k W S , T ′ k > c 2 8 n L K 3 / 2 max  , where we used that γ n 4 · √ ( n − 1) L 2 ∼ c 2 8 nL K 3 / 2 max . T o app ly Equatio n ( B.7 ), set M such th at 3 2 √ Ln + M √ n L = c 2 8 nL K 3 / 2 max , i.e. M = c 2 8 √ n L K 3 / 2 max − 3 2 . For large n , M ≥ c 2 16 √ nL K 3 / 2 max . Th en by Equa tio n ( B.7 ), we have P  k W S , T ′ k > c 2 8 n L K 3 / 2 max  ≤ n exp  − M 2 n C L  ≤ n exp  − c 2 2 256 C · n L K 3 max · n L  = n exp  − c 2 2 256 C · n 2 K 3 max  . First, r e call that Assumption 3 gives K 2 max L log n n → 0. Since L ≥ 1, we have K 2 max log n n → 0, i.e ., K 2 max = o  n log n  . Thus, ther e exists a co nstant a > 0 such that fo r all su ffi ciently large n , K 2 max ≤ a n log n . (B.8) Consequently , we get K 3 max = K 2 max · K max ≤ a n log n ! · r a n log n = a 3 / 2 n 3 / 2 (log n ) 3 / 2 . Hence, we have n 2 K 3 max ≥ n 2 a 3 / 2 n 3 / 2 / (log n ) 3 / 2 = 1 a 3 / 2 n 1 / 2 (log n ) 3 / 2 , which imp lies exp  − c 2 2 256 C · n 2 K 3 max  ≤ exp  − c 2 2 256 C a 3 / 2 n 1 / 2 (log n ) 3 / 2  . Multiplying both sides by n obtains n exp  − c 2 2 256 C · n 2 K 3 max  ≤ n exp  − c 2 2 256 C a 3 / 2 n 1 / 2 (log n ) 3 / 2  . (B.9) 35 Next, w e sh ow that the r ight-hand side of Equa tio n ( B.9 ) is o  1 / K 2 max  . Fro m Equa tion ( B.8 ), we have 1 K 2 max ≥ log n an . Th erefore, it su ffi ces to prove n exp  − c 2 2 256 Ca 3 / 2 n 1 / 2 (log n ) 3 / 2  (log n ) / ( an ) = a n 2 exp  − c 2 2 256 Ca 3 / 2 n 1 / 2 (log n ) 3 / 2  log n − − − − → n →∞ 0 . ( B.10) T aking loga r ithms gives log  an 2 / log n  − c 2 2 256 C a 3 / 2 n 1 / 2 (log n ) 3 / 2 . As n → ∞ , th e seco nd (negative) te r m d ominates because n 1 / 2 (log n ) 3 / 2 grows faster than log n . 
Hen ce the wh ole expression tends to −∞ , wh ich implies the ratio in E quation ( B.10 ) converges to zero. Thus, we get n exp  − c 2 2 256 C a 3 / 2 n 1 / 2 (log n ) 3 / 2  = o  1 K 2 max  . (B.11) Combining Equation s ( B.9 ) an d ( B.11 ) we ob tain the desired chain n exp  − c 2 2 256 C · n 2 K 3 max  ≤ n exp  − c 2 2 256 C a 3 / 2 n 1 / 2 (log n ) 3 / 2  = o 1 K 2 max ! . Therefo re, for any under fitted ca ndidate, w e h av e P  ˆ T n ( k s , k r ) < t n  = o 1 K 2 max ! . Now consider the number of u nderfitted candidates. The true pair ( K s , K r ) satisfies K s ≤ K max and K r ≤ K max . Since the cand idate p a ir s ar e o rdered lexico g raphically from (1 , 1) to ( K cand , K cand ), every u n derfitted candidate (i.e ., with k s < K s or k r < K r ) must satisfy k s ≤ K max and k r ≤ K max . Therefore, the numb er of unde r fitted candid ates, m ∗ ( n ) − 1 , is at most K 2 max . Applying the unio n bou nd gives P ( ˜ B c n ) ≤ m ∗ ( n ) − 1 X m = 1 P  ˆ T n ( k ( m ) s , k ( m ) r ) < t n  ≤ K 2 max · o 1 K 2 max ! = o (1) . Thus, we get P ( ˜ B n ) → 1. Part 3: Completion of t he proof. W e h ave shown P ( ˜ A n ) → 1 and P ( ˜ B n ) → 1. Ther efore, we have P  ( ˆ K s , ˆ K r ) = ( K s , K r )  = P ( ˜ A n ∩ ˜ B n ) ≥ 1 − P ( ˜ A c n ) − P ( ˜ B c n ) → 1 , which com pletes th e pro of of this theorem. Ap pendix C. Proofs for the MLRDiGoF algo rithm Append ix C.1. Preliminary lemmas W e first establish two lem mas that c h aracterize the asym p totic behavior of the test statistic ˆ T n under correctly specified and u nderfitted models. These lem m as ar e essential for p roving The o rems 4 and 5 . Through out, we allow K max = m ax( K s , K r ) to g row with n , subject to Assumption 3 and con dition (A2) of Theor e m 2 . Lemma 6 (Behavior o f the test statistic at the tru e model) . F or the true ca ndidate m ∗ (i.e., ( k ( m ∗ ) s , k ( m ∗ ) r ) = ( K s , K r ) ), we h ave | ˆ T n ( m ∗ ) | = o P (1) . 36 Pr o of. By Theo r em 1 , for a ny ǫ > 0, P ( ˆ T n ( m ∗ ) < ǫ ) → 1 when ( K s 0 , K r 0 ) = ( K s , K r ). More over, from Lemma 1 and Lemma 5 , we have | ˆ T n ( m ∗ ) − T n | ≤ k ˆ R − R k = o P (1) , and T n = o P (1). Since ˆ T n ( m ∗ ) = σ 1 ( ˆ R ) − 2 and σ 1 ( ˆ R ) ≥ 0, we also have ˆ T n ( m ∗ ) ≥ − 2. Th erefor e , | ˆ T n ( m ∗ ) | = o P (1). Lemma 7 (Uniform bou nds for u nderfitted models) . F or any u nderfitted candidate m < m ∗ (i.e., k ( m ) s < K s or k ( m ) r < K r ), th er e exis t p ositive con stants c 1 , c 2 (depen d ing o n ly o n δ , c 0 , η fr om Assumption s 1 , 2 , and conditio n (A1) ) such that with p r o bability a t least 1 − O ( K 2 max Ln − 3 ) as n → ∞ , we h ave c 1 √ n L K 3 / 2 max ≤ ˆ T n ( m ) ≤ √ 2 n L , wher e the constan t c 1 is u niform over all underfi tted cand idates. Pr o of. W e prove the two-sided boun d separately . Part 1: Lower bound. The lower bo und follows dir ectly f rom the pr oof of Theo rem 2 . For an un derfitted candidate m , Th eorem 2 establishes that ˆ T n ( m ) P − → ∞ . Mo re prec isely , revisiting the p roof of Theo rem 2 (specifically , Equa tio ns ( A.12 ) and ( A.16 )), there exist node sets S and T ′ with | S | ≥ c 0 n / K max and | T ′ | ≥ c 0 n / K 2 max such th at k ˆ R S , T ′ k ≥ c 0 η 2 √ 2 · √ n L K 3 / 2 max · r n n − 1 − O P (1) . The pr obability that this ineq uality fails is at most O ( K 2 max Ln − 3 ), a s shown via the concen tr ation re su lts in L e mmas 3 and 4 and the boun d on k W S , T ′ k in E q uation ( A.13 ). 
Since σ 1 ( ˆ R ) ≥ k ˆ R S , T ′ k , we have with probability at least 1 − O ( K 2 max Ln − 3 ) that ˆ T n ( m ) = σ 1 ( ˆ R ) − 2 ≥ c 0 η 4 √ 2 · √ n L K 3 / 2 max = : c L √ n L K 3 / 2 max . Part 2: Upper bound. W e n ow prove a deter ministic up p er boun d that holds fo r any ca ndidate pair ( k s , k r ) ( underfitted or not). Let ˆ R be the normalized residual m atrix defined in Equation ( 2 ). For each estimated block ( s 0 , r 0 ), let n s 0 = | { i : ˆ g s ( i ) = s 0 }| , n r 0 = | { j : ˆ g r ( j ) = r 0 }| , an d m = n s 0 n r 0 . Den o te ˆ p ( ℓ ) s 0 r 0 = ˆ B ( ℓ ) ( s 0 , r 0 ) an d d efine ¯ D s 0 r 0 = L X ℓ = 1 ˆ p ( ℓ ) s 0 r 0 (1 − ˆ p ( ℓ ) s 0 r 0 ) . For any ( i , j ) in th is blo ck (i.e., i ∈ ˆ C s s 0 , j ∈ ˆ C r r 0 ), we have ˆ Ω ( ℓ ) ( i , j ) = ˆ p ( ℓ ) s 0 r 0 and D i j = ¯ D s 0 r 0 . By the Cauchy –Schwarz inequality , we have        L X ℓ = 1  A ( ℓ ) ( i , j ) − ˆ p ( ℓ ) s 0 r 0         2 ≤ L L X ℓ = 1  A ( ℓ ) ( i , j ) − ˆ p ( ℓ ) s 0 r 0  2 . Summing over all pairs ( i , j ) inside the b lock yield s X i ∈ ˆ C s s 0 X j ∈ ˆ C r r 0        L X ℓ = 1  A ( ℓ ) ( i , j ) − ˆ p ( ℓ ) s 0 r 0         2 ≤ L L X ℓ = 1 X i ∈ ˆ C s s 0 X j ∈ ˆ C r r 0  A ( ℓ ) ( i , j ) − ˆ p ( ℓ ) s 0 r 0  2 . For each fixed layer ℓ , the sum of squared deviations inside th e b lo ck satisfies the algebraic identity X i ∈ ˆ C s s 0 X j ∈ ˆ C r r 0  A ( ℓ ) ( i , j ) − ˆ p ( ℓ ) s 0 r 0  2 = m ˆ p ( ℓ ) s 0 r 0  1 − ˆ p ( ℓ ) s 0 r 0  . 37 This identity holds determin istically for any binary matrix and its block-av erage . It is a direct conseq uence of the definition o f ˆ p ( ℓ ) s 0 r 0 . Consequen tly , we have X i , j        L X ℓ = 1  A ( ℓ ) ( i , j ) − ˆ p ( ℓ ) s 0 r 0         2 ≤ L L X ℓ = 1 m ˆ p ( ℓ ) s 0 r 0  1 − ˆ p ( ℓ ) s 0 r 0  = m L ¯ D s 0 r 0 . Now c onsider th e con tribution o f this blo ck to k ˆ R k 2 F . If ¯ D s 0 r 0 > 0 , th en X i ∈ ˆ C s s 0 X j ∈ ˆ C r r 0 ˆ R ( i , j ) 2 = 1 n − 1 1 ¯ D s 0 r 0 X i , j        L X ℓ = 1  A ( ℓ ) ( i , j ) − ˆ p ( ℓ ) s 0 r 0         2 ≤ 1 n − 1 mL ¯ D s 0 r 0 ¯ D s 0 r 0 = mL n − 1 . If ¯ D s 0 r 0 = 0, th en ˆ p ( ℓ ) s 0 r 0 ∈ { 0 , 1 } for every ℓ , and the d efinition of ˆ p ( ℓ ) s 0 r 0 implies th a t A ( ℓ ) ( i , j ) = ˆ p ( ℓ ) s 0 r 0 for all i , j in the block. Hence the numerato r is zero and the contribution is zero, so the same inequality mL n − 1 (which is zer o) hold s trivially . Summing over all block s and using P s 0 , r 0 m = P s 0 n s 0 P r 0 n r 0 = n · n = n 2 , we ob tain k ˆ R k 2 F ≤ L n − 1 X s 0 , r 0 m = n 2 L n − 1 ≤ 2 n L (for n ≥ 2) . Therefo re, we have σ 1 ( ˆ R ) ≤ k ˆ R k F ≤ √ 2 n L , and con sequently ˆ T n ( m ) = σ 1 ( ˆ R ) − 2 ≤ √ 2 n L . This boun d h olds determ inistically , so the requ ir ed pro bability statement is satisfied tr ivially . Combining the bounds. By the union bo und, both the lower bo und and the upp er bo und hold sim u ltaneously with probab ility at least 1 − O ( K 2 max Ln − 3 ). T h is completes the proo f of this lem ma. Append ix C.2. Pr oof of Theor em 4 Pr o of. Since K s and K r are fixed, K max = max( K s , K r ) is a co nstant indepen dent of n . T his simplification is cr ucial for the following analy sis. Part 1: Diver gence at the t r ue model. Let m ∗ be th e in dex of the tru e p a ir ( K s , K r ). For any fixed M 0 > 0, we need to show that lim n →∞ P ( r m ∗ > M 0 ) = 1 . By definition, r m ∗ =       ˆ T n ( m ∗ − 1) ˆ T n ( m ∗ )       . 
Appendix C.3. Proof of Theorem 5

Proof. Since $K_s$ and $K_r$ are fixed, $K_{\max}=\max(K_s,K_r)$ is a constant independent of $n$. Recall that $m_*$ denotes the index of $(K_s,K_r)$ in $\mathcal P$. Because $K_s$ and $K_r$ are fixed, $m_*$ is fixed (it does not depend on $n$). We consider two cases separately.

Case 1: $(K_s,K_r)=(1,1)$. In this case, $m_*=1$. The algorithm first computes $\hat T_n(1)$ for candidate $(1,1)$. By Theorem 1 (with $(K_{s0},K_{r0})=(1,1)=(K_s,K_r)$), for any $\epsilon>0$,
\[
\lim_{n\to\infty}\mathbb{P}\big(\hat T_n(1)<\epsilon\big)=1.
\]
The threshold used in Algorithm 3 for the first candidate is $t_n$, which satisfies $t_n\to0$. Therefore, taking $\epsilon=t_n$ (which is positive and tends to 0), we have
\[
\lim_{n\to\infty}\mathbb{P}\big(\hat T_n(1)<t_n\big)=1.
\]
Thus, with probability tending to 1, Algorithm 3 returns $(1,1)$ at the first step. Hence, we have $\lim_{n\to\infty}\mathbb{P}\big((\hat K_s,\hat K_r)=(1,1)\big)=1$.
Case 2: $(K_s,K_r)\neq(1,1)$. Then $m_*>1$. We define the following events:

• $\breve E_n:=\{\hat T_n(1)\ge t_n\}$. This is the event that the algorithm does not stop at the first candidate.

• $\breve A_n:=\{r_{m_*}>\tau_n\}$, where $r_{m_*}=\big|\hat T_n(m_*-1)/\hat T_n(m_*)\big|$.

• For each $m=2,\dots,m_*-1$, define $\breve B_n^{(m)}:=\{r_m\le\tau_n\}$, where $r_m=\big|\hat T_n(m-1)/\hat T_n(m)\big|$.

• $\breve B_n:=\bigcap_{m=2}^{m_*-1}\breve B_n^{(m)}$.

If all three events $\breve E_n$, $\breve A_n$, and $\breve B_n$ occur, then:

• Since $\breve E_n$ occurs, $\hat T_n(1)\ge t_n$, so the algorithm proceeds to the loop.

• For each $m=2,\dots,m_*-1$, since $\breve B_n^{(m)}$ occurs, we have $r_m\le\tau_n$, so the condition $r_m>\tau_n$ is not satisfied, and the algorithm does not stop at these underfitted candidates.

• At $m=m_*$, since $\breve A_n$ occurs, we have $r_{m_*}>\tau_n$, so the algorithm stops and returns $\mathcal P(m_*)=(K_s,K_r)$.

Therefore, we have $\{(\hat K_s,\hat K_r)=(K_s,K_r)\}\supseteq\breve E_n\cap\breve A_n\cap\breve B_n$, which gives
\[
\mathbb{P}\big((\hat K_s,\hat K_r)=(K_s,K_r)\big)\ge\mathbb{P}(\breve E_n\cap\breve A_n\cap\breve B_n)\ge1-\mathbb{P}(\breve E_n^c)-\mathbb{P}(\breve A_n^c)-\mathbb{P}(\breve B_n^c).
\]
We will show that $\mathbb{P}(\breve E_n^c)\to0$, $\mathbb{P}(\breve A_n^c)\to0$, and $\mathbb{P}(\breve B_n^c)\to0$ as $n\to\infty$.

Step 1: Behavior of $\breve E_n$. Since $(1,1)$ is an underfitted model (because $(K_s,K_r)\neq(1,1)$), Theorem 2 gives $\hat T_n(1)\xrightarrow{P}\infty$. For any $\epsilon>0$, there exists $N$ such that for all $n\ge N$, $t_n<\epsilon$. Then, for $n\ge N$, we have $\mathbb{P}(\hat T_n(1)<t_n)\le\mathbb{P}(\hat T_n(1)<\epsilon)$. Since $\hat T_n(1)\xrightarrow{P}\infty$, we have $\lim_{n\to\infty}\mathbb{P}(\hat T_n(1)<\epsilon)=0$. Hence, we get
\[
\lim_{n\to\infty}\mathbb{P}(\breve E_n^c)=\lim_{n\to\infty}\mathbb{P}\big(\hat T_n(1)<t_n\big)=0.
\]

Step 2: Behavior of $\breve A_n$. We need to show $\mathbb{P}(\breve A_n^c)=\mathbb{P}(r_{m_*}\le\tau_n)\to0$. Recall from Lemma 7 (with fixed $K_{\max}$) that there exist positive constants $c_L$ and $c_U$ (depending only on $\delta$, $c_0$, $\eta$ from Assumptions 1, 2, and condition (A1)) such that for any underfitted model (in particular for $m_*-1$),
\[
\lim_{n\to\infty}\mathbb{P}\big(\hat T_n(m_*-1)\ge c_L\sqrt{nL}\big)=1. \tag{C.4}
\]
More precisely, Lemma 7 gives $\hat T_n(m_*-1)\ge c_1\sqrt{nL}/K_{\max}^{3/2}$ with high probability. Since $K_{\max}$ is fixed, we set $c_L=c_1/K_{\max}^{3/2}$. For the true model $m_*$, by Theorem 1 and Lemma 5, we have $\hat T_n(m_*)=o_P(1)$.
Specifically, from Lemma 1 (Equation (A.1)) and Lemma 5, there exists a constant $M_0>0$ such that for all sufficiently large $n$,
\[
\mathbb{P}\Big(|\hat T_n(m_*)|\le M_0\sqrt{\tfrac{L\log n}{n}}\Big)\ge1-\upsilon_n, \tag{C.5}
\]
where $\upsilon_n=O(n^{-1})+O(K_{\max}^2Ln^{-3})=O(n^{-1})$ (since $K_{\max}$ and $L$ satisfy Assumption 3 and $K_{\max}$ is fixed, $L$ can grow but $L=o(n/\log n)$). Now define the event
\[
\breve F_n:=\Big\{\hat T_n(m_*-1)\ge c_L\sqrt{nL}\Big\}\cap\Big\{|\hat T_n(m_*)|\le M_0\sqrt{\tfrac{L\log n}{n}}\Big\}.
\]
On $\breve F_n$, we have
\[
r_{m_*}=\Big|\frac{\hat T_n(m_*-1)}{\hat T_n(m_*)}\Big|\ge\frac{c_L\sqrt{nL}}{M_0\sqrt{L\log n/n}}=\frac{c_L}{M_0}\cdot\frac{n}{\sqrt{\log n}}.
\]
Let $b_n:=\frac{c_L}{M_0}\cdot\frac{n}{\sqrt{\log n}}$. Note that $b_n\to\infty$ as $n\to\infty$. We now compare $b_n$ with $\tau_n$. Condition (D2) implies $\tau_n=o\big(\sqrt{n/\log n}\big)$. Now, we have
\[
\frac{b_n}{\sqrt{n/\log n}}=\frac{c_L}{M_0}\cdot\frac{n}{\sqrt{\log n}}\cdot\sqrt{\frac{\log n}{n}}=\frac{c_L}{M_0}\sqrt{n}\to\infty.
\]
Hence, $b_n$ grows faster than $\sqrt{n/\log n}$, and consequently faster than $\tau_n$ (since $\tau_n=o(\sqrt{n/\log n})$). Therefore, for sufficiently large $n$, we have $\tau_n<b_n$. Thus, on $\breve F_n$, we have $r_{m_*}\ge b_n>\tau_n$, implying that $\breve F_n\subseteq\breve A_n$. Hence, we have $\mathbb{P}(\breve A_n^c)\le\mathbb{P}(\breve F_n^c)$. By the union bound, we get
\[
\mathbb{P}(\breve F_n^c)\le\mathbb{P}\big(\hat T_n(m_*-1)<c_L\sqrt{nL}\big)+\mathbb{P}\Big(|\hat T_n(m_*)|>M_0\sqrt{\tfrac{L\log n}{n}}\Big).
\]
From Equation (C.4), the first term tends to 0. From Equation (C.5), the second term is at most $\upsilon_n=O(n^{-1})$. Therefore, $\mathbb{P}(\breve F_n^c)\to0$, and consequently $\mathbb{P}(\breve A_n^c)\to0$.

Step 3: Behavior of $\breve B_n$. We need to show $\mathbb{P}(\breve B_n^c)\to0$. Note that
\[
\breve B_n^c=\bigcup_{m=2}^{m_*-1}\big(\breve B_n^{(m)}\big)^c=\bigcup_{m=2}^{m_*-1}\{r_m>\tau_n\}.
\]
By the union bound, we have $\mathbb{P}(\breve B_n^c)\le\sum_{m=2}^{m_*-1}\mathbb{P}(r_m>\tau_n)$. For each $m=2,\dots,m_*-1$, both $m-1$ and $m$ correspond to underfitted models (since $m<m_*$). By Theorem 4, there exists a constant $C>0$ (depending only on $\delta$, $c_0$, $\eta$) such that for each such $m$, $\lim_{n\to\infty}\mathbb{P}(r_m>C)=0$. Condition (D1) ensures that there exist a constant $C_0>C$ and $n_0$ such that for all $n\ge n_0$, $\tau_n>C_0$. Since $C_0>C$, we have for $n\ge n_0$ that $\{r_m>\tau_n\}\subseteq\{r_m>C_0\}\subseteq\{r_m>C\}$. Therefore, for $n\ge n_0$, we have $\mathbb{P}(r_m>\tau_n)\le\mathbb{P}(r_m>C)$. Hence, we get
\[
\lim_{n\to\infty}\mathbb{P}(r_m>\tau_n)\le\lim_{n\to\infty}\mathbb{P}(r_m>C)=0.
\]
Since there are at most $m_*-2$ terms in the sum (a fixed number, because $m_*$ is fixed), we obtain
\[
\lim_{n\to\infty}\mathbb{P}(\breve B_n^c)\le\sum_{m=2}^{m_*-1}\lim_{n\to\infty}\mathbb{P}(r_m>\tau_n)=0.
\]

Step 4: Completion of Case 2. We have shown $\lim_{n\to\infty}\mathbb{P}(\breve E_n^c)=0$, $\lim_{n\to\infty}\mathbb{P}(\breve A_n^c)=0$, and $\lim_{n\to\infty}\mathbb{P}(\breve B_n^c)=0$. Therefore, we have
\[
\lim_{n\to\infty}\mathbb{P}\big((\hat K_s,\hat K_r)=(K_s,K_r)\big)\ge\lim_{n\to\infty}\big[1-\mathbb{P}(\breve E_n^c)-\mathbb{P}(\breve A_n^c)-\mathbb{P}(\breve B_n^c)\big]=1.
\]
Since the probability cannot exceed 1, we conclude $\lim_{n\to\infty}\mathbb{P}\big((\hat K_s,\hat K_r)=(K_s,K_r)\big)=1$.

Conclusion: Combining Case 1 and Case 2, we have shown that under the stated conditions,
\[
\lim_{n\to\infty}\mathbb{P}\big((\hat K_s,\hat K_r)=(K_s,K_r)\big)=1,
\]
which completes the proof.
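Remark (illustration only). To make explicit the control flow that the proof of Theorem 5 steps through, the Python sketch below mirrors the structure of Algorithm 3 as it is described above: accept $(1,1)$ immediately if $\hat T_n(1)<t_n$ (Case 1), and otherwise scan the lexicographic candidate list, stopping at the first $m$ with $r_m>\tau_n$ (Case 2). The fitting routine is passed in as a callable; the toy statistic, the concrete threshold values, and the fallback when no ratio ever exceeds $\tau_n$ are assumptions made for illustration, with the paper's Algorithm 3 and its threshold conditions (D1)–(D2) being the authoritative specification.

def select_pair_by_ratio(stat, k_cand, t_n, tau_n):
    """Ratio-based selection of (K_s, K_r), mirroring the control flow in the proof of Theorem 5.

    stat(ks, kr) -> float : goodness-of-fit statistic T_n for candidate (ks, kr),
                            e.g. sigma_1 of the normalized residual matrix minus 2.
    k_cand                : largest candidate value searched on each side.
    t_n                   : decaying threshold, used only at the first candidate (1, 1).
    tau_n                 : threshold for the ratios r_m = |T_n(m-1) / T_n(m)|.
    """
    pairs = [(ks, kr) for ks in range(1, k_cand + 1) for kr in range(1, k_cand + 1)]
    T_prev = stat(*pairs[0])
    if T_prev < t_n:                      # Case 1: one sender and one receiver community
        return pairs[0]                   # already fit well, so return (1, 1)
    for m in range(2, len(pairs) + 1):    # Case 2: scan candidates in lexicographic order
        T_curr = stat(*pairs[m - 1])
        if abs(T_prev / T_curr) > tau_n:  # first sharp drop in the statistic flags the true pair
            return pairs[m - 1]
        T_prev = T_curr
    return pairs[-1]                      # fallback; not specified in the excerpt

if __name__ == "__main__":
    # Toy statistic (illustration only): large for every underfitted candidate
    # (ks < 2 or kr < 3) and small at the true pair (2, 3), mimicking the dichotomy of
    # Theorems 1 and 2; candidates after the stopping point are never evaluated.
    def toy_stat(ks, kr):
        return 40.0 / min(ks / 2.0, kr / 3.0) if (ks < 2 or kr < 3) else 0.05
    print(select_pair_by_ratio(toy_stat, k_cand=4, t_n=0.5, tau_n=10.0))   # -> (2, 3)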
