Connection Probabilities Estimation in Multi-layer Networks via Iterative Neighborhood Smoothing

Statistica Sinica Connection Probabilities Estimation in Multi-la y er Net w orks via Iterativ e Neigh b orho o d Smo othing Dingzi Guo 1 , Diqing Li 2 , Jingyi W ang 1 , W en-Xin Zhou 3 1 Shandong University 2 Zhejiang Gongshang University 3 University of Il linois Chic ago Abstr act: Understanding the structural mechanisms of m ulti-la yer netw orks is essen tial for analyzing complex systems characterized by m ultiple in teracting lay ers. This w ork studies the problem of estimating connection probabilities in m ulti-lay er netw orks and introduces a new Multi-la yer Iterativ e Connection Probabilit y Estimation (MICE) method. The prop osed approac h emplo ys an iterativ e framew ork that join tly reﬁnes inter-la yer and intra-la yer similarit y sets by dynamically up dating distance metrics deriv ed from curren t probability estimates. By lev eraging b oth lay er-level and no de-level neigh b orhoo d information, MICE impro ves estimation accuracy while preserving computational eﬃciency . Theoretical analysis establishes the consistency of the estimator and shows that, under mild regularity conditions, the prop osed metho d achiev es an optimal conv ergence rate comparable to that of an oracle estimator. Extensive simulation studies across diverse graphon structures demonstrate the sup erior performance of MICE relativ e to existing methods. Empirical ev aluations using brain net work data from patients with Atten tion-Deﬁcit/Hyp eractivit y Disorder (ADHD) and global fo od and agricultural trade netw ork data further illustrate the robustness and eﬀectiveness of the metho d in link prediction tasks. Ov erall, this work provides a theoretically grounded and Diqing Li is the corresp onding author. All authors con tributed equally to this work. practically scalable framework for probabilistic mo deling and inference in multi-la yer netw ork systems. Key wor ds and phr ases: Multi-lay er netw orks, Graphon estimation, Connection probability , Neigh b orho od smo othing. 1. In tro duction Net work analysis has become a central paradigm for studying complex systems, and the past t wo decades ha ve seen rapid metho dological dev elopmen t with applications in so ciology (Eagle et al., 2009; Lesko vec et al., 2010), neurobiology (Rahiminejad et al., 2019; Calderer and Kuijjer, 2021), and statistical physics (Bo ccaletti et al., 2006; Golden b erg et al., 2010). Classical net w ork mo dels typically describe a system using a single-la yer represen tation, where nodes denote system comp onents and edges enco de pairwise in teractions. Ho wev er, man y real-world systems consist of m ultiple interdependent lay ers rather than a single homogeneous structure. F or example, individuals in so cial systems engage across m ultiple online platforms suc h as Twitter , Facebook , and Instagram ; cities in transp ortation net works are link ed through aviation, rail, and road infrastructures; and biological functions are join tly go verned by gene regulatory , protein-protein in teraction, and metabolic net works. T o capture suc h structural complexity , m ulti-lay er netw ork mo dels (Kivelä et al., 2014) ha ve b een prop osed. By explicitly mo deling intra-la y er interactions and in ter- la yer dependencies, these mo dels provide a principled and ﬂexible framew ork for represen ting multidimensional and heterogeneous in teraction structures. A fundamen tal problem in multi-la y er netw ork research is to elucidate the mec h- anisms that driv e the formation of topological structures. As in the single-la y er setting, generativ e mo deling oﬀers a principled statistical framework for studying suc h mechanisms. Within this framew ork, connection probabilities are central: they quan tify the lik eliho o d of edge formation b et w een no de pairs and thus enco de the un- derlying probabilistic structure of the net w ork. How ever, accurately estimating these probabilities in m ulti-la yer netw orks is challenging. Eﬀectiv e estimation requires an adaptiv e strategy that b orro ws strength across lay ers, capturing correlated intra-la yer connectivit y patterns and inter-la yer dep endencies, while sim ultaneously accounting for la yer-speciﬁc no de heterogeneit y and distinct generative mec hanisms. While link prediction has b een extensively studied in single-lay er net works, the corresp onding literature for m ulti-lay er netw orks remains relativ ely limited. Existing approac hes can b e broadly grouped into t wo categories. The ﬁrst category infers p oten tial links by computing similarity scores b et w een non-adjacen t no de pairs (Pham et al., 2022). Although often eﬀective in practice, such heuristic metho ds lack an explicit generativ e mechanism and, as a consequence, do not yield formal theoretical guaran tees. The second stream consists of probabilistic net work mo dels. Although often originally motiv ated by communit y detection, these mo dels naturally facilitate link prediction by explicitly characterizing edge formation through latent structures. Within this framew ork, Jing et al. (2021) and Xu et al. (2023) prop osed tensor- based approaches that lev erage T uc ker decomp osition to reco ver laten t comm unity structures by treating the multi-la yer adjacency matrices as a high-order tensor. While empirically eﬀectiv e, their theoretical guaran tees t ypically rely on strict lo w-rank assumptions that may imp ose excessiv e structural homogeneity across lay ers. More recen tly , Agterb erg et al. (2025) in tro duced DC-MASE, a joint sp ectral clustering metho d for multi-la yer comm unity detection that accommo dates lay er-sp eciﬁc degree- correction parameters and blo c k connectivity matrices, but enforces a strictly common comm unity mem b ership across all lay ers. This assumption may b e o verly restrictive in applications where communit y structures are misaligned or evolv e across la yers. Alternativ ely , MacDonald et al. (2022) prop osed the MultiNeSS framework, which decomp oses no de latent represen tations in to shared and lay er-sp eciﬁc comp onen ts, pro viding greater ﬂexibility . How ever, its p erformance ma y deteriorate in highly sparse regimes, and its computational cost can p ose challenges for large-scale netw orks. In summary , although existing metho ds for link prediction and related probabilistic mo deling in multi-la yer netw orks hav e made notable progress, they con tinue to rely on strong mo del assumptions, exhibit limited ﬂexibility in accommo dating heterogeneit y , and in some cases incur substan tial computational costs. These considerations motiv ate the developmen t of a connection probabilit y estimation framew ork that more ﬂexibly adapts to inter-la yer heterogeneity while maintaining principled theoretical guaran tees and computational eﬃciency . Nonparametric graphon mo dels provide a natural foundation for suc h a framew ork. In recen t y ears, graphon estimation has receiv ed increasing attention in netw ork analysis (W olfe and Olhede, 2013; Gao et al., 2015; Choi, 2017; Chandna et al., 2022) due to its capacit y to represent complex generative mechanisms in a ﬂexible and in terpretable manner. F or single-lay er netw orks, Zhang et al. (2017) and Qin et al. (2021) prop osed neighborho o d smo othing estimators for piecewise Lipsc hitz graphon functions, establishing the optimal error rate among all computationally feasible pro cedures. Building on these foun dations, we prop ose the Multi-lay er Iterative Connection Probabilit y Estimation (MICE) method, tailored to the m ulti-la yer graphon framework (He et al., 2026). Crucially , unlike the classic biv ariate single- la yer graphon, this m ulti-lay er framework incorp orates a lay er-sp eciﬁc latent v ariable η k , leading to a ternary graphon function f ( ξ i , ξ j , η k ) that provides a ﬂexible w a y for mo deling heterogeneous multi-la yer net w orks. This framew ork is remark ably versatile and encompasses many complex netw ork structures as sp ecial cases. F or example, if η k denotes the time p oin t t k , the mo del reduces to a dynamic graphon (Pensky, 2019; Chandna and Maugis, 2020); when η k represen ts observ ational cov ariates, each la yer corresp onds to a co v ariate netw ork (Chandna and Maugis, 2020); and when η k tak es only t wo v alues, the framew ork can b e adapted for c hange-p oin t detection in temp oral net w orks (W ang et al., 2021). F urthermore, when f is a piecewise constant function, the framework naturally recov ers the multi-la yer sto c hastic blo ck mo del (MSBM) prop osed by Paul and Chen (2016). The main con tributions of this work are as follo ws. • First, we develop a general framew ork for estimating connection probabilities in m ulti-lay er netw orks under mild regularit y conditions. In contrast to existing metho ds that t ypically imp ose restrictive structural assumptions, our framework is highly ﬂexible and accommo dates a broad sp ectrum of architectures, ranging from homogeneous to strongly heterogeneous systems. The prop osed MICE metho d employs a dual-level smo othing strategy that adaptiv ely in tegrates information from b oth lo cal no de neigh b orho o ds and global la yer similarities, allo wing it to balance in ter-lay er heterogeneity with shared structural patterns in a data-driv en and robust manner. • Second, we introduce an iterative reﬁnemen t algorithm based on a new pairwise distance metric. Unlik e static, one-step neighborho o d smo othing tec hniques, our approac h dynamically up dates no de- and la y er-level similarity measures at eac h iteration using the current probability estimates. This feedback mechanism progressiv ely sharp ens the identiﬁcation of structurally comparable p eers and eﬀectiv ely corrects initial estimation errors. Through adaptive neigh b orho o d up dates, MICE captures subtle and complex dep endencies that static metho ds frequen tly fail to detect. • Third, w e establish the theoretical consistency and con v ergence guarantees of the MICE estimator. Our analysis shows that incorp orating m ulti-lay er information leads to substan tial accuracy gains. In particular, as the n um b er of la yers increases, the estimator achiev es prov ably faster conv ergence rates than an y computationally feasible single-la yer metho d, underscoring the intrinsic statistical adv antages of aggregating information across la yers. Under mild regularit y conditions, we further show that the prop osed estimator attains the optimal con vergence rate, matc hing that of an oracle estimator. Finally , w e assess the ﬁnite-sample performance and practical utility through extensive sim ulations and applications to brain netw ork data from ADHD patients and to global fo od and agricultural trade netw orks. The remainder of this pap er is organized as follows. Section 2 presents the mo del form ulation, metho dological framework, and algorithmic comp onen ts. Section 3 dev elops the theoretical guaran tees of the proposed method. Sections 4 and 5 rep ort simulation studies and empirical applications that illustrate the ﬁnite-sample p erformance of MICE. Section 6 concludes with a brief summary and discussion. A dditional sim ulation results, complete pro ofs of the main theorems, and auxiliary lemmas are pro vided in the Supplementary Materials. 2. Metho dology 2.1 Notation and mo del Consider a m ulti-lay er netw ork with n no des and K la y ers, represented b y an adjacency tensor A ∈ { 0 , 1 } n × n × K , where A ij k = 1 indicates the presence of an edge b et w een no des i and j in la yer k , and A ij k = 0 otherwise. Let V = [ n ] and L = [ K ] denote the no de set and the la yer set, respectively . Throughout this pap er, w e fo cus on undirected net works without self-lo ops; accordingly , for eac h lay er k ∈ [ K ] , the k -th adjacency matrix slice, denoted by A k = A ·· k , is symmetric with zero diagonal en tries. Assume that eac h la y er is generated from an exchangeable random graph mo del. By the Aldous-Ho o ver represen tation theorem (Aldous, 1981; Ho o v er, 1979), there exist no de-sp eciﬁc latent v ariables ξ i and a lay er-sp eciﬁc graphon function f k suc h that the edge probability satisﬁes P ij k = f k ( ξ i , ξ j ) . T o capture dep endence across la yers in a coherent manner, w e further introduce a lay er-sp eciﬁc laten t v ariable η k represen ting the latent p osition of la yer k . This yields the m ulti-la yer graphon mo del A ij k ∼ Bernoulli ( P ij k ) , P ij k = f ( ξ i , ξ j , η k ) , where f ( · ) is the m ulti-lay er graphon function (Chandna and Maugis, 2020; He et al., 2026). This form ulation encompasses a wide range of multi-la yer architectures, from homogeneous la yers sharing a common structure to highly heterogeneous systems. Let P ∈ [0 , 1] n × n × K denote the full tensor 2.1 Notation and mo del of connection probabilities, and let P k = P ·· k b e the probability matrix for lay er k . In this pap er, w e aim to estimate the connection probabilit y tensor P (or equiv- alen tly , the probability matrices P k for each lay er k ∈ [ K ] ) based on the observ ed adjacency tensor A . Because each entry A ij k pro vides only a single Bernoulli realiza- tion of the underlying probability P ij k , direct likelihoo d-based estimation is ill-p osed without additional structural assumptions. T o ov ercome this diﬃculty , w e exploit the smo othness inherent in the m ulti-lay er graphon mo del, which p ermits the sharing of information across similar no des and similar lay ers and thus enables the construction of consisten t estimators. Sp eciﬁcally , when the latent position η k of la yer k is close to that of another la yer η k ′ , the smo othness of the graphon function f implies that the corresp onding connection probabilit y matrices P ·· k and P ·· k ′ m ust b e similar. This observ ation naturally motiv ates grouping structurally comparable lay ers into sets of neighb oring layers , denoted by S k = { k ′ : P ·· k ≈ P ·· k ′ } . Likewise, within a ﬁxed lay er k , if the laten t p osition ξ i ′ of no de i ′ is close to that of no de i , the smo othness of f in its ﬁrst argumen t ensures that f ( ξ i ′ , · , η k ) ≈ f ( ξ i , · , η k ) , and thus P i · k ≈ P i ′ · k . In this case, the observed edges A i ′ j k serv e as pro xy observ ations for P ij k . This leads to groups of no des exhibiting similar connectivity patterns, referred to as neighb oring no des . F or an y la yer k ′ ∈ S k , w e denote b y S k ′ i = { i ′ : P i · k ′ ≈ P i ′ · k ′ } the set of no des in lay er k ′ that are analogous to no de i . 2.2 Iterativ e connection probability estimation for m ulti-lay er netw orks Based on these notions, the connection probabilit y P ij k can b e estimated b y aggregating information from neighboring no des and neigh b oring lay ers, leading to the oracle estimator e P ij k = e P j ik = P k ′ ∈S k P i ′ ∈S k ′ i P j ′ ∈S k ′ j A i ′ j ′ k ′ P k ′ ∈S k |S k ′ i ||S k ′ j | = ( s ∗ i s ∗ j t ∗ k ) − 1 X k ′ ∈S k X i ′ ∈S k ′ i X j ′ ∈S k ′ j A i ′ j ′ k ′ . (2.1) Because the netw ork is symmetric (i.e., e P ij k = e P j ik ), we employ a uniform smo othing bandwidth. In particular, w e assume that neigh b orho od sizes are stable across structurally similar lay ers and denote s ∗ i = |S k ′ i | and t ∗ k = |S k | . The construction of these neigh b oring sets is describ ed in the next subsection. 2.2 Iterativ e connection probabilit y estimation for m ulti-la y er net w orks W e ﬁrst consider the ideal oracle setting, in whic h the true link probability tensor P is known. In this scenario, neighboring no des and lay ers can b e identiﬁed directly through population-level distance measures deﬁned on the probabilit y matrices. Sp eciﬁcally , the distance b et w een tw o lay ers k and k ′ is measured b y the normalized squared F rob enius distance d kk ′ = n − 2 ∥ P k − P k ′ ∥ 2 F . Lik ewise, within a given lay er k , the distance b etw een no des i and i ′ is quantiﬁed by the normalized squared Euclidean distance b et ween their corresp onding probabilit y v ectors, d k ii ′ = n − 1 ∥ P k i · − P k i ′ · ∥ 2 2 . These distance metrics provide a principled basis for deﬁning neigh b oring la yers and neigh b oring no des in the oracle regime. 2.2 Iterativ e connection probability estimation for m ulti-lay er netw orks Based on these distance metrics, the neighboring lay er set for lay er k is deﬁned as S k = { k ′ : 0 ≤ d kk ′ < d t ∗ k } , where d t ∗ k denotes the t ∗ k -th smallest v alue among { d kk ′ : k ′  = k } . Thus, S k con tains lay er k together with the t ∗ k − 1 lay ers whose probabilit y matrices are closest to P k . F or each k ′ ∈ S k , the neigh b oring no de set for no de i is then constructed as S k ′ i = { i ′ : 0 < d k ′ ii ′ ≤ d s ∗ i } , where d s ∗ i is the s ∗ i -th smallest elemen t of the intra-la yer distance set { d k ′ ii ′ : i ′  = i } . In this wa y , S k ′ i collects s ∗ i no des in lay er k ′ whose connectivit y proﬁles most closely resemble that of no de i . In practice, ho wev er, only the adjacency tensor A is observed, while the underlying connection probabilities P are unknown. Consequen tly , the p opulation-lev el distances and the oracle neighborho o ds deﬁned ab o v e cannot b e computed directly . T o address this c hallenge, the prop osed MICE metho d employs an iterative reﬁnement strategy . Starting from an initial estimate b P k , w e construct plug-in estimators of the inter-la yer and intra-la yer distances, whic h yield the estimated neighboring la yer sets b S k and the estimated neigh b oring no de sets b S k ′ i and b S k ′ j . Using these estimated neighborho o ds, the connection probabilit y estimates are up dated through the smo othing op eration b P ij k = b P j ik = P k ′ ∈ b S k P i ′ ∈ b S k ′ i P j ′ ∈ b S k ′ j A i ′ j ′ k ′ P k ′ ∈ b S k | b S k ′ i || b S k ′ j | = ( s i s j t k ) − 1 X k ′ ∈ b S k X i ′ ∈ b S k ′ i X j ′ ∈ b S k ′ j A i ′ j ′ k ′ , (2.2) where s i = | b S k ′ i | and t k = | b S k | denote the cardinalities of the estimated no de and la yer neigh b orho ods. Based on the up dated estimate, we recompute the pairwise distances b et ween no des and lay ers, which reﬁnes the neigh b orho o d construction and, 2.2 Iterativ e connection probability estimation for m ulti-lay er netw orks in turn, pro duces an impro ved estimator b P k . This iterative pro cedure, alternating b et w een reﬁnemen t and probabilit y smo othing, is rep eated un til conv ergence. The complete computational sc heme is summarized in Algorithm 1. T o fully implemen t the prop osed algorithm, tw o practical considerations require sp eciﬁcation: the c hoice of neigh b orho o d sizes and the initialization strategy . F or the neigh b orho od sizes, w e follow the optimal smo othing theory for single-la yer graphons (Zhang et al., 2017). A ccordingly , w e set the no de-lev el neighborho o d size as s i = D i ( n log n ) 1 / 2 . Extending this principle to the multi-la yer setting, we sp ecify the lay er-lev el neigh b orho o d size as t k = G k ( K log K ) 1 / 2 . The constants D i and G k are further examined through a comprehensiv e sensitivit y analysis in Section 4.2, whic h sho ws that the optimal conﬁguration v aries subtly with the netw ork’s struc- tural complexit y . Based on these ﬁndings, we rep ort the recommended v alues and pro vide their justiﬁcations. The second consideration concerns the initialization b P k (0) . While any computationally feasible estimator that consistently appro ximates the underlying probabilit y tensor is, in principle, acceptable, a high-quality initialization can substan tially accelerate conv ergence. In this w ork, we adopt the MNS estimator of He et al. (2026) as a w arm-start for the iterative pro cedure. 2.2 Iterativ e connection probability estimation for m ulti-lay er netw orks Algorithm 1: Multi-lay er iterative connection probability estimation. Input: observ ed adjacency matrices A k , k = 1 , · · · , K ; initial connection probabilit y estimators b P k (0) , k = 1 , · · · , K ; neigh b orho od sizes s i , s j and t k ; threshold δ 0 > 0 . Output: connection probability estimators b P k , k = 1 , · · · , K . Let δ P = + ∞ and m = 0 ; while δ P > δ 0 do F or eac h lay er pair ( k , k ′ ) and no de pair ( i, i ′ ) , compute the distances ˆ d kk ′ = ∥ b P k ( m ) − b P k ′ ( m ) ∥ 2 F /n 2 and ˆ d k ii ′ = ∥ b P k ( m ) i · − b P k ( m ) i ′ · ∥ 2 2 /n . F or eac h lay er k ∈ L and no de i ∈ V , construct the estimated neighboring la yers b S k = { k ′ : 0 ≤ ˆ d kk ′ < d t k } and neigh b oring no des b S k ′ i = { i ′ : 0 < ˆ d k ′ ii ′ ≤ d s i , k ′ ∈ b S k } . F or eac h no de pair i, j ∈ V within la yer k , up date the connection probabilit y estimator: b P ( m +1) ij k = b P ( m +1) j ik = ( s i s j t k ) − 1 P k ′ ∈ b S k P i ′ ∈ b S k ′ i P j ′ ∈ b S k ′ j A i ′ j ′ k ′ . Set b P k ( m +1) = b P ( m +1) ·· k for all k = 1 , . . . , K , and up date δ P = K P k =1 √ ∥ b P k ( m +1) − b P k ( m ) ∥ 2 F K P k =1 √ ∥ b P k ( m ) ∥ 2 F . Let m = m + 1 . return b P k = b P k ( m ) , k = 1 , · · · , K . 3. Theoretical Prop erties In this section, we establish the consistency of the MICE estimator. The analysis relies on a set of assumptions that are standard in the graphon literature. Assumption 1. Ther e exist p artitions of [0 , 1] , denote d by I = { I r } R r =1 and J = { J t } T t =1 , wher e I r = [ x r − 1 , x r ) and J t = [ z t − 1 , z t ) . Within this fr amework, the multi- layer gr aphon f : [0 , 1] 3 → [0 , 1] is pie c ewise Lipschitz on the grid c el ls I × I × J . Mor e pr e cisely, ther e exist p ositive c onstants L n and L K such that, for any grid c el l I r × I s × J t , the fol lowing Lipschitz c onditions hold: | f ( u 1 , v , w ) − f ( u 2 , v , w ) | ≤ L n | u 1 − u 2 | , | f ( u, v 1 , w ) − f ( u, v 2 , w ) | ≤ L n | v 1 − v 2 | , and | f ( u, v , w 1 ) − f ( u, v , w 2 ) | ≤ L K | w 1 − w 2 | , for al l u, u 1 , u 2 ∈ I r , v , v 1 , v 2 ∈ I s , and w , w 1 , w 2 ∈ J t . Assumption 1 formalizes a piecewise Lipsc hitz condition on the m ulti-la yer graphon, follo wing the framew ork introduced in He et al. (2026). This require- men t is considerably weak er than the structural assumptions typically imp osed in classical graphon estimation, such as monotonicit y (Bick el and Chen, 2009) or global Lipsc hitz con tinuit y (Chan and Airoldi, 2014). Those stronger conditions often ex- clude man y practically relev ant and interpretable mo dels, including sto chastic blo c k mo dels with piecewise constant connection probabilities. By contrast, Assumption 1 accommo dates a wide range of multi-la yer netw ork structures, including multi-la yer sto c hastic blo ck mo dels (Paul and Chen, 2016; Jing et al., 2021) and dynamic graphon mo dels (Zhao et al., 2019; Pensky, 2019). This ﬂexibilit y ensures that our theoretical guaran tees apply to a broad class of complex real-world net works. Assumption 2. The numb er of p artitions R incr e ases with n in such a way that min r | I r | / ( n − 1 log n ) 1 / 2 → ∞ , and the numb er of p artitions T incr e ases with K such that min t | J t | / ( K − 1 log K ) 1 / 2 → ∞ . Her e | I r | and | J t | denote the lengths of the intervals I r and J t , r esp e ctively. F or an y laten t p ositions ξ i , η k ∈ [0 , 1] , let I ( ξ i ) and J ( η k ) denote the partition inter- v als that con tain ξ i and η k , resp ectively . W e deﬁne th e restricted lo cal neighborho o ds b y N i (∆ n ) = [ ξ i − ∆ n , ξ i + ∆ n ] ∩ I ( ξ i ) and M k (∆ K ) = [ η k − ∆ K , η k + ∆ K ] ∩ J ( η k ) . By construction, the graphon f ( u, v , w ) satisﬁes a piecewise Lipsc hitz condition in ( u, v ) on N i (∆ n ) (for ﬁxed w ∈ J t ) and in w on M k (∆ K ) (for ﬁxed u, v ∈ I r × I s ). A dapting Lemma 1 of Zhang et al. (2017) to the multi-la yer setting, w e sp ecify the neighborho o d radii as ∆ n = h C + ( C 1 + 4) 1 / 2 i ( n − 1 log n ) 1 / 2 and ∆ K = h C ′ + ( C ′ 1 + 4) 1 / 2 i ( K − 1 log K ) 1 / 2 , for constants C, C 1 , C ′ , C ′ 1 > 0 . Under these c hoices, with probabilit y at least 1 − 2 n − C 1 / 4 − 2 K − C ′ 1 / 4 , we hav e sim ultaneously that min i ∈V   { i ′ : ξ i ′ ∈ N i (∆ n ) }   ≥ C ( n log n ) 1 / 2 and min k ∈L   { k ′ : η k ′ ∈ M k (∆ K ) }   ≥ C ( K log K ) 1 / 2 . This result ensures that, for suﬃciently large n and K , the num b er of similar nodes and la yers a v ailable for smo othing gro ws at least on the order of ( n log n ) 1 / 2 and ( K log K ) 1 / 2 with high probabilit y . If the true neighboring sets S k and S k ′ i w ere known in adv ance, the oracle estimator e P deﬁned in (2.1) could b e computed directly from the observe d adjacency tensor A . This oracle estimator provides a theoretical b enc hmark: it reﬂects the b est p erformance that any neighborho o d smo othing metho d can ac hieve under p erfect kno wledge of the underlying neighborho o ds. The next theorem establishes an upp er b ound on its estimation error. Theorem 1. L et C l , C m , C ′ l , C ′ m , ˜ C q > 0 b e some suﬃciently lar ge c onstants, wher e l ∈ { 2 , 3 } , m ∈ { 4 , 5 , 6 , 7 , 8 } , and q ∈ { 1 , 2 , 3 } . Deﬁne C ∗ = max { C l / 3 , C m } and C ′ ∗ = max { C ′ l / 3 , C ′ m } . Then, with pr ob ability at l e ast 1 − 2 n − C 1 / 4 − 2 K − C ′ 1 / 4 − 14 n − C ′ ∗ K − C ∗ , the fol lowing err or b ound holds for any layer k ∈ [ K ] : ∥ e P k − P k ∥ 2 F n 2 ≤ ˜ C 1 log n n + ˜ C 2 log K K + ˜ C 3 √ 2 log n + log K [ n (log n ) K (log K )] 1 2 . Theorem 1 establishes the theoretical b enc hmark for estimation error under oracle knowledge of the neigh b orho o ds. The result explicitly characterizes how the error decreases as n and K gro w. In particular, when K is suﬃcien tly large, the con vergence rate impro ves up on sev eral existing results: the O (( log n/n ) 1 / 2 ) rate for single-la yer neigh b orho o d smo othing (Zhang et al., 2017), the O (( n − 3 log n ) 1 / 4 ) rate for single-la yer iterative estimation (Qin et al., 2021), and the O (( log n/nK ) 1 / 3 ) rate for standard m ulti-la yer smo othing (He et al., 2026). Moreo v er, when K ≳ O ( n ) , the oracle rate approaches O ( log ( n ) /n ) , matc hing the minimax lo wer b ound for the estimation of an individual la yer deriv ed in Gao et al. (2015). These comparisons highligh t the statistical b eneﬁts of lev eraging information from structurally similar la yers to impro ve estimation accuracy . Ho wev er, the v alidity of the neighborho o d smo othing principle requires that the no de and la yer dimensions n and K b e suﬃcien tly large so that the nearest neighbors selected b y the quan tile thresholds are genuinely close in structure. When either n or K is to o small, the identiﬁed neighbors may in fact lie far from the target no de or lay er, p oten tially inducing non-negligible smo othing bias. In particular, a mo derate num b er of lay ers is needed to ensure the theoretical consistency of inter-la yer smo othing. Section 4 ev aluates MICE across a v ariety of conﬁgurations with diﬀerent v alues of K , and the results show that the metho d delivers strong p erformance in most settings. Notably , when K is limited, the framew ork naturally reduces to a single-la yer pro cedure by restricting the lay er neighborho o d to S k = { k } , in which case MICE b ecomes the standard ICE metho d (Qin et al., 2021) that relies solely on in tra-lay er information. In practical settings, exact recov ery of the true neighborho o d structures is rarely ac hiev able. Theorem 2 shows that our metho d is robust in this regard, demonstrating that the optimal oracle error rate is retained ev en when the estimated neighborho o ds include a small prop ortion of incorrectly selected members. Theorem 2. Assume that P ij k ∈ [ a, b ] for al l ( i, j, k ) ∈ V ×V ×L , wher e 0 < a < b < 1 , and let b S denote the estimate d neighb orho o d c onﬁgur ation. Supp ose that the numb er of inc orr e ctly sele cte d neighb ors is b ounde d by max ( i,j,k ) X k ′ ∈ b S k X i ′ ∈ b S k ′ i X j ′ ∈ b S k ′ j I (( i ′ , j ′ , k ′ ) / ∈ S k ′ i × S k ′ j × S k ) ≤ e ( n, K ) , wher e the toler anc e thr eshold e ( n, K ) is given by e ( n, K ) := √ C 9 b − a  max   n (log n ) 3 K log K  1 2 , n (log n )(log K ) , (2 log n + log K ) 1 4 ( n 3 (log n ) 3 K (log K )) 1 4  . Then, ther e exist p ositive c onstants ˜ C ′ 1 , ˜ C ′ 2 , and ˜ C ′ 3 such that, with pr ob ability at le ast 1 − 2 n − C 1 / 4 − 2 K − C ′ 1 / 4 − 14 n − C ′ ∗ K − C ∗ , the estimator b P k c onstructe d fr om b S satisﬁes the same err or b ound as the or acle estimator for every layer k : ∥ b P k − P k ∥ 2 F n 2 ≤ ˜ C ′ 1 log n n + ˜ C ′ 2 log K K + ˜ C ′ 3 √ 2 log n + log K [ n (log n ) K (log K )] 1 2 . A direct consequence of Theorem 2 concerns the allo w able tolerance for neigh b or- ho od selection errors. Since the cardinality of the estimated neigh b orho od pro duct set | b S k ′ i × b S k ′ j × b S k | is appro ximately s i s j t k , the p ermissible fraction of incorrectly selected neigh b ors is giv en by e ( n, K ) s i s j t k ≍ √ C 9 b − a · max h (log n/n ) 1 2 , (log K /K ) 1 2 , (2 log n + log K ) 1 4 ( n log nK log K ) 1 4 i . This result implies that as long as the prop ortion of misiden tiﬁed neighbors remains b elo w this threshold, the estimator contin ues to ac hieve the optimal con vergence rate. Building on this insigh t, the estimation accuracy can b e further improv ed when the algorithm is initialized with a w arm-start estimator that already satisﬁes a preliminary consistency b ound. F ormally , let b P k (0) b e an initial estimator satisfying the uniform error b ound max i ∈V , k ∈L 1 n n X j =1 ( b P (0) ij k − P ij k ) 2 ≤ C 10 E ( n, K ) , with probabilit y at least 1 − n − C 11 . The error rate E ( n, K ) is assumed to satisfy lim n,K →∞ E ( n, K ) . log n n + log K K + √ 2 log n + log K [ n (log n ) K (log K )] 1 2 ! → ∞ Under these conditions, the next theorem establishes that the prop osed iterative pro cedure yields a strictly improv ed estimator relative to the warm start. Theorem 3. Deﬁne the minimum signal sep ar ation gap κ ( n, K ) as the smal lest diﬀer enc e in c onne ction pr ob abilities b etwe en any true neighb or and any non-neighb or: κ ( n, K ) = min i,j ∈V ,i ′′ / ∈S k ′ i k ∈L ,k ′ ∈S k ,k ′′ / ∈S k |P ij k − P i ′′ j k ′′ | . Supp ose the initial estimator b P k (0) satisﬁes the err or b ound E ( n, K ) deﬁne d pr eviously. If the sep ar ation gap is suﬃciently lar ge, in p articular if κ 2 ( n, K ) ≥ 8 L 2 n  C + ( C 1 + 4) 1 / 2  2 log n n +8 L 2 K  C ′ + ( C ′ 1 + 4) 1 / 2  2 log K K +20 C 10 E ( n, K ) , then, using b P k (0) as the initialization, with pr ob ability at le ast 1 − 2 n − C 1 / 4 − 2 K − C ′ 1 / 4 − 14 n − C ′ ∗ K − C ∗ − n − C 11 , the up date d estimator b P k new attains the or acle c onver genc e r ate: ∥ b P k new − P k ∥ 2 F n 2 ≤ ˜ C 1 log n n + ˜ C 2 log K K + ˜ C 3 √ 2 log n + log K [ n (log n ) K (log K )] 1 2 . W e emphasize that the separation requirement on κ ( n, K ) is mild. Under the piecewise Lipschitz conditions imp osed along b oth the no de and lay er dimen- sions, the minimal distinguishable scale is determined b y local deviations in ei- ther index. In an ideal noiseless setting, the separation constan t therefore satisﬁes κ ( n, K ) ≍ max  L n ∆ n , L K ∆ K  , whic h reﬂects the intrinsic structural gap b et w een a true neigh b or and a non-neigh b or in the no de or lay er direction. In practice, how ev er, neighborho o d selection is carried out using the initial estimator b P k (0) rather than the true probability tensor P . Because the initialization error E ( n, K ) generally exceeds the in trinsic structural v ariation in ﬁnite samples, the distances based on b P k (0) tend to b e inﬂated relative to those deﬁned on P . As a result, the theoretical analysis requires a larger low er b ound on κ ( n, K ) to ensure stable neighborho o d reco v ery in this noisy regime. This condition is nevertheless non-restrictiv e: b oth ∆ n and ∆ K are allow ed to conv erge to zero as n, K → ∞ . In this asymptotic setting, the required separation constan t κ ( n, K ) approaches its ideal scale max( L n ∆ n , L K ∆ K ) , matc hing our intuition. In summary , once the metho d is initialized with a consistent warm-start estimator from an existing pro cedure, the iterativ e algorithm progressiv ely reﬁnes the lo cal neigh b orho od structures. Ultimately , the estimator ac hieves a conv ergence rate that coincides with the oracle b enc hmark, as if the true neigh b oring lay ers and no des were kno wn in adv ance. 4. Sim ulation Studies 4.1 Algorithm Eﬀectiv eness In this section, we compare the ﬁnite-sample p erformance of MICE with four b enc h- mark algorithms: the Iterative Connecting Probability Estimation (ICE) metho d (Qin et al., 2021), the Multiplex Netw orks with Shared Structure Appro ximation (MultiNeSS) mo del (MacDonald et al., 2022), the Degree-Corrected Multiple Adja- cency Sp ectral Em b edding (DC-MA SE) metho d (Agterb erg et al., 2025), and the Multi-La yer Neigh b orho o d Smo othing (MNS) algorithm (He et al., 2026). Among these comp etitors, ICE is designed for single-lay er net works, whereas the remaining three metho ds leverage multi-la yer information. An imp ortan t distinction is that MultiNeSS and DC-MASE were originally dev elop ed for latent p osition mo deling and comm unity detection, resp ectiv ely , rather than for direct estimation of connection probabilities. T o facilitate a fair comparison, we reconstruct the implied connection probabilities from their ﬁtted mo del parameters. F or MultiNeSS, the pro cedure returns lo w-rank approximation matrices b F and b G k , where b F captures the shared laten t structure across la y ers and b G k enco des la y er-sp eciﬁc deviations. F ollowing Remark 3 of Theorem 1 in MacDonald et al. (2022), we obtain the estimated connection probability b et w een no des i and j in lay er k via the additive form b P ij k = b F ij + b G k,ij . Similarly , DC- MASE is based on a m ulti-lay er Degree-Corrected Sto chastic Blo c k Mo del (DCSBM), 4.1 Algorithm Eﬀectiv eness under whic h the exp ected adjacency matrix for la y er k is P k = Θ k ZB k Z ⊤ Θ k , where Θ k denotes the lay er-sp eciﬁc degree correction matrix, Z is the shared communit y mem b ership matrix, and B k is the blo c k connectivity matrix. Once these parameters are estimated, w e compute the corresp onding connection probability matrix b P k directly from the DCSBM form ulation. T o assess the robustness and broad applicabilit y of our metho d, w e generate m ulti-lay er netw orks from ﬁve distinct graphon functions. Graphon 1 has a blo ck structure, Graphon 2 exhibits oscillatory b eha vior, and Graphon 3 is full-rank with degree monotonicit y . Graphons 4 and 5 are also full-rank but display more intricate structure at multiple scales. F or illustration, Figure 1 sho ws representativ e heatmaps of the true connection probabilit y matrices (the 50th lay er of a netw ork with K = 100 la yers). The explicit formulas for all ﬁve graphons are provided in Section S1 in the Supplemen tary Materials. W e ev aluate the p erformance of the prop osed estimator under tw o sim ulation conﬁgurations. • Scenario 1 (V arying No de Size): W e ﬁx the num b er of lay ers at K = 100 and v ary the no de size n ∈ { 100 , 200 , 500 , 1000 } to examine the estimator’s asymptotic b eha vior with resp ect to n . • Scenario 2 (V arying La yer Size): W e ﬁx the no de size at n = 200 and v ary 4.1 Algorithm Eﬀectiv eness the num b er of la yers K ∈ { 20 , 50 , 100 , 200 } to study how p erformance impro ves with increased cross-la yer information. The av erage Ro ot Mean Squared Error (RMSE) v alues, computed ov er 100 indep enden t replications, are rep orted in T ables 1 and 2. The Mean Absolute Error (MAE) results show similar patterns and are included in Section S2 of the Supplemen tary Materials. Figure 1: Connection probability matrices corresp onding to Graphons 1–5. As shown in T able 1, the estimation errors of all metho ds decrease monotoni- cally as the no de size n increases, which aligns with standard large-sample theory . The prop osed MICE metho d consis ten tly attains the low est RMSE across all ﬁv e graphon mo dels, indicating its strong ability to capture heterogeneous top ological structures. By adaptively aggregating information from lo cal neighborho o ds, MICE eﬀectiv ely leverages larger no de scales to furth er reduce estimation errors and reﬁne the probabilit y estimates. T able 2 highlights the distinct b eha viors of the comp eting metho ds as the la yer 4.2 P arameter Robustness size K gro ws. Because the ICE algorithm is designed for single-lay er net works, its error remains unchanged with resp ect to K ; the small ﬂuctuations observed are due to v ariabilit y in the h yp erparameters c hoices C it and C est across replications. MultiNeSS and DC-MASE show negligible impro vemen t as K increases. Although b oth mo dels incorp orate shared and lay er-sp eciﬁc structures, they rely hea vily on parameters that are estimated separately for eac h lay er, such as laten t p ositions in MultiNeSS or degree-correction and blo c k matrices in DC-MASE, so increasing K do es not pro vide additional information to reﬁne these la yer-speciﬁc comp onents. In con trast, b oth MNS and MICE exhibit substan tial gains as K increases. In particular, MICE consistently delivers the highest accuracy across all settings, demonstrating the eﬀectiv eness of its iterative strategy in b orro wing strength across la yers. 4.2 P arameter Robustness W e further examine the sensitivity of the MICE estimator to the neigh b orho od size scaling constants D i and G k , as deﬁned in Section 2.2. T o ev aluate robustness with resp ect to these tuning parameters, we conduct sim ulation exp erimen ts under three represen tative ( n, K ) settings: (100 , 100) , (200 , 100) , and (200 , 20) . F or each conﬁguration, w e v ary D i and G k o ver a geometric grid with log 2 D i , log 2 G k ∈ {− 3 , − 2 , − 1 , 0 , 1 , 2 } , corresp onding to the explicit v alues { 0 . 125 , 0 . 25 , 0 . 5 , 1 , 2 , 4 } . F or eac h choice, w e compute the RMSE, av eraged ov er 100 indep enden t replications, 4.2 P arameter Robustness across all ﬁv e graphon mo dels. The resulting p erformance patterns are summarized in Figure 2. Figure 2: A v erage RMSE with standard deviation error bars under v arying choices of D i and G k . (F or the setting ( n = 200 , K = 20) , the case log 2 G k = 2 is omitted b ecause the resulting neighborho o d size exceeds the la yer size.) As sho wn in Figure 2, the results exhibit mo derate v ariation in the optimal c hoice of tuning parameters. F or Graphons 1–3, which feature relativ ely smo oth or monotonic structures, the b est p erformance is obtained using larger neighborho o ds ( D i = G k = 1 ), as this reduces v ariance without in tro ducing substan tial bias. In con trast, for the more complex Graphons 4 and 5, a restricted no de-wise n eigh b orho o d ( D i = 0 . 5 ) ac hiev es the lo w est RMSE by excluding structurally dissimilar nodes. Because the p erformance loss from using this restricted no de neighborho o d in the smo other graphons is minimal, while it consistently improv es accuracy for the more irregular ones, we adopt D i = 0 . 5 and G k = 1 as the default conﬁguration in subsequen t analyses. Compared with the conv entional single-la yer bandwidth D i = 1 (Zhang et al., 2017), this choice yields a more conserv ative neighborho o d construction that enhances robustness to heterogeneit y an d irregular structures in multi-la yer net works, while main taining computational eﬃciency . 5. Real Data Analysis 5.1 Brain Net work Data in ADHD P atients In this section, we demonstrate the practical utility of the prop osed MICE metho d b y analyzing brain netw ork data from patients with Atten tion-Deﬁcit/Hyp eractivity Disorder (ADHD). ADHD is one of the most prev alent childhoo d-onset neuro dev el- opmen tal disorders, and c haracterizing individual-level heterogeneit y in functional brain connectivit y is crucial for accurate diagnosis and treatmen t. W e use data from the ADHD-200 Global Comp etition, whic h provides demographic information and functional MRI (fMRI) scans for b oth ADHD patients and t ypically developing con trols (TDC). The data were collected at eight participating institutions and are publicly a v ailable at https://www.nitrc.org/frs/?group_id=383 . 5.1 Brain Net work Data in ADHD P atients T o a void p oten tial site bias, our analysis fo cuses exclusively on the fMRI data collected at P eking Universit y . This site provides K = 24 diagnosed ADHD patients for constructing the multi-la yer netw ork. F or eac h sub ject, fMRI time series were recorded from n = 116 Regions of In terest (R OIs) across T = 172 time p oin ts. The net wo rk construction pro ceeds as follows. First, for each sub ject k ( k = 1 , . . . , K ), w e compute a sparse correlation matrix b et w een the time series of the n R OIs. Second, to remo ve weak or spurious correlations, w e apply a standard hard-thresholding pro cedure: an edge is set to 1 whenever the absolute pairwise correlation exceeds 0.7. The resulting adjacency tensor A con tains en tries A ij k ∈ { 0 , 1 } , indicating the presence or absence of a functional connection b et ween ROI i and R OI j for sub ject k . This pro cess yields a m ulti-lay er netw ork with K = 24 la yers, eac h of dimension n × n . The adjacency tensor A is then used as the input for estimating the underlying connection probabilit y tensor. Giv en that the true connection probabilit y tensor P is unobserv ed in this dataset, directly ev aluating estimation accuracy is not feasible. F ollo wing the strategy of Zhang et al. (2017), w e assess empirical p erformance through a link prediction task, whic h provides a practical proxy for ev aluating the estimated probability tensor. Sp eciﬁcally , we generate a partially observ ed adjacency tensor A obs = M ◦ A , where ◦ denotes the Hadamard pro duct. The mask tensor M has indep enden t en tries M ij k ∼ Bernoulli (1 − ρ ) , so that eac h entry is masked (set to missing) with probabilit y 5.2 W orldwide F o o d and Agricultural T rade Net work Data ρ . Based on the estimated probabilit y tensor b P and a classiﬁcation threshold τ ∈ (0 , 1) , w e then deﬁne the T rue Positiv e Rate (TPR) and F alse Positiv e Rate (FPR) as follo ws: TPR( τ ) = X i,j,k I  b P ij k > τ , A ij k = 1 , M ij k = 0  . X i,j,k I ( A ij k = 1 , M ij k = 0) , FPR( τ ) = X i,j,k I  b P ij k > τ , A ij k = 0 , M ij k = 0  . X i,j,k I ( A ij k = 0 , M ij k = 0) . (5.3) Based on these deﬁnitions, we plot the receiver op erating characteristic curv es for the ﬁve comparative metho ds in Figure 3a and rep ort their corresp onding AUC v alues in the ﬁrst row of T able 3. Overall, metho ds designed for m ulti-lay er net w orks consisten tly outp erform the single-lay er b enc hmark (ICE). Among the m ulti-lay er approac hes, the graphon-based estimators MNS and MICE achiev e substan tially higher predictive accuracy . In particular, the prop osed MICE metho d achiev es the highest AUC v alue of 0.9430, underscoring its robustness and practical utilit y for analyzing complex real-w orld datasets. 5.2 W orldwide F o o d and Agricultural T rade Net work Data The w orldwide fo o d and agricultural trade netw ork dataset is collected annually by the F o o d and Agriculture Organization (F A O) of the United Nations, co vering the p eriod from 1986 to 2023. The data are publicly av ailable at https://www.fao.org/ faostat/en/#data/TM . In this study , we fo cus on the most recen t trade data from 2023 and extract the rep orted imp ort quantities for our analysis. 5.2 W orldwide F o o d and Agricultural T rade Net work Data (a) ROC curves for link prediction on the ADHD brain netw ork under a missing rate of ρ = 0 . 1 . (b) ROC curves for link prediction on the 2023 w orldwide fo o d and agricultural trade net work under a missing rate of ρ = 0 . 2 . Figure 3: Receiv er op erating characteristic curves on tw o real-world datasets. T o iden tify the core no des of the net work, we compute eac h country’s frequency of app earance as b oth a rep orting and a partner nation. Applying a frequency threshold of 1,000 yields n = 137 k ey trading coun tries. This set includes ma jor global economies suc h as China, Russia, Canada, the United States, Japan, Australia, India, Brazil, and mem b er states of the Europ ean Union. T o construct a binary m ulti-lay er netw ork, w e randomly select K = 100 distinct item categories. The adjacency tensor is deﬁned using trade volumes: a strong connection b et ween coun try i and country j in item category k is recorded (that is, A ij k = 1 ) if the traded quantit y exceeds 1,000 units; 5.2 W orldwide F o o d and Agricultural T rade Net work Data otherwise, A ij k = 0 . W e adopt a v alidation strategy similar to that used in the ADHD brain netw ork analysis to ev aluate the p erformance of the MICE metho d. In this exp eriment, we set the missing rate of the mask tensor to ρ = 0 . 2 . Figure 3b displays the R OC curv es for eac h metho d, and the corresp onding AUC v alues are rep orted in the second ro w of T able 3. Consisten t with our earlier ﬁndings, MICE attains the highest A UC v alue of 0.8976, further demonstrating the robustness and practical eﬀectiv eness of our approac h in real-world applications. T o further assess the predictiv e capability of our mo del, we conduct a temp oral link prediction exp eriment that uses the 2022 trade netw ork to forecast the 2023 net work structure. Giv en the relative s tabilit y of ann ual trade v olumes, we adjust the edge-deﬁnition criterion: a connection is considered present if the trade v olume exceeds 100 units. W e randomly sample K = 200 netw ork la y ers and apply our metho d to the 2022 data. The v alidation task fo cuses on iden tifying emerging links. A prediction is coun ted as successful if the mo del assigns a high probabilit y to no de pairs that are unconnected in the 2022 netw ork (trade volume ≤ 100) but form a connection in the corresp onding 2023 netw ork (trade volume > 100). W e apply all ﬁv e metho ds to the 2022 data to generate the probability tensor b P (2022) . A p oten tial link is predicted whenev er the estimated probability exceeds 0.5. T o ev aluate p erformance sp eciﬁcally in identifying emerging connections, we restrict 5.2 W orldwide F o o d and Agricultural T rade Net work Data atten tion to no de-item triples ( i, j, k ) for whic h no link existed in 2022 ( A (2022) ij k = 0 ). A prediction is counted as correct if the mo del successfully forecasts the presence of a link in the 2023 net work ( A (2023) ij k = 1 ). The prediction precision is computed as: Precision = P i,j,k I  b P (2022) ij k > 0 . 5 , A (2023) ij k = 1 , A (2022) ij k = 0  P i,j,k I  b P (2022) ij k > 0 . 5 , A (2022) ij k = 0  . The prediction results for all ﬁv e mo dels are presen ted in the third row of T able 3. Among all ﬁv e metho ds, MICE achiev es the highest precision. How ever, the o verall precision lev els across metho ds remain limited. This suggests that relying solely on structural information from past trade netw orks, without incorp orating additional cov ariates, may b e insuﬃcien t for forecasting complex future trade patterns. Figure 4: T w o examples of v alidated link predictions. Emerging connections in 2023 are correctly forecasted from 2022 data with estimated probabilities of 0.712 and 0.626. T o illustrate the predictiv e capabilit y of the MICE metho d, w e highligh t tw o represen tative instances shown in Figure 4, where the mo del successfully identiﬁes laten t trade p otential. In b oth examples, the metho d assigns high probabilities to coun try-item pairs that are not y et connected in 2022 (v olume b elo w 100 tons) but b ecome connected in 2023. The ﬁrst case inv olves Item Co de 30 (Rice, paddy). In 2022, Romania imp orted 95.53 tons from German y , just b elo w the threshold ( A (2022) ij k = 0 ). Nevertheless, MICE assigns a probabilit y of 0.712 to this prosp ectiv e connection. This prediction is v alidated in 2023, when the trade volume rises to 123.92 tons and the link b ecomes active. A similar pattern is observ ed for Item Co de 1274 (Chemically mo diﬁed fats and oils). Poland imp orted 75.14 tons from Spain in 2022, which w as insuﬃcien t to form a link, y et the mo del predicts a probability of 0.626. By 2023, the volume increases sharply to 184.19 tons, conﬁrming a strong trade relationship. These examples illustrate the metho d’s ability to detect emerging links ahead of time. 6. Conclusion This study introduces an iterative estimation method for connection p robabilities in m ulti-lay er netw orks that progressively up dates neigh b oring sets and probabilit y estimates across b oth la yers and no des. The prop osed metho d is conceptually intuitiv e, straigh tforward to implement, and computationally eﬃcient. F o cusing on multi-la y er undirected binary netw orks, we establish the theoretical consistency of the estimator. Extensiv e n umerical exp erimen ts show that our approac h consisten tly outp erforms existing methods across div erse graphon functions and net work scales. Moreo ver, the metho d demonstrates strong p erformance in link prediction tasks on real-w orld datasets, underscoring its practical usefulness. Despite these contributions, sev eral directions remain op en for future research. Although we establish theoretical consistency , achieving the optimal conv ergence rate requires a large num b er of la yers, which may b e diﬃcult in practice. In addition, the current framework assumes a common no de set across all la y ers; relaxing this assumption to allo w for v arying or partially ov erlapping no de sets would substantially broaden the metho d’s applicability to heterogeneous net work settings. Finally , the curren t approac h employs a uniform quan tile threshold to determine neigh b orho o d sizes. Dev eloping data-driv en, adaptive mechanisms that assign neigh b orho od sizes based on lo cal structural density represents a promising av enue for further impro ving estimation accuracy . Supplemen tary Materials Supplemen tary material av ailable at Statistica Sinica online includes additional sim ulation results and detailed pro ofs of main theorems and supp orting lemmas. A c kno wledgements This research is supp orted b y NSF China (12401370), the Summit A dv ancement Disciplines of Zhejiang Pro vince (Zhejiang Gongshang Universit y - Statistics) and REFERENCES Collab orativ e Inno v ation Cen ter of Statistical Data Engineering T echnology & Appli- cation. References Agterb erg, J., Z. Lubb erts, and J. Arroy o (2025). Joint sp ectral clustering in multila yer degree-corrected sto c hastic blo c kmo dels. Journal of the Americ an Statistic al Asso ciation 120 (551), 1607–1620. Aldous, D. J. (1981). Represen tations for partially exchangeable arrays of random v ariables. Journal of Multivariate Analysis 11 (4), 581–598. Bic kel, P . J. and A. Chen (2009). A nonparametric view of netw ork mo dels and newman–girv an and other mo dularities. Pr o c e e dings of the National A c ademy of Scienc es 106 , 21068 – 21073. Bo ccaletti, S., V. Latora, Y. Moreno, M. Chav ez, and D.-U. Hwang (2006). Complex net works: Structure and dynamics. Physics r ep orts 424 (4-5), 175–308. Calderer, G. and M. L. Kuijjer (2021). Communit y detection in large-scale bipartite biological netw orks. F r ontiers in Genetics 12 , 649440. Chan, S. and E. Airoldi (2014). A consistent histogram estimator for exchangeable graph mo dels. In Pr oc e edings of the 31st International Confer enc e on Machine L e arning , V olume 32 of Pr o c e e dings of Machine Le arning Rese ar ch , pp. 208–216. Chandna, S. and P .-A. Maugis (2020). Nonparametric regression for multiple heterogeneous netw orks. arXiv pr eprint arXiv:2001.04938 . Chandna, S., S. C. Olhede, and P . J. W olfe (2022). Lo cal linear graphon estimation using cov ariates. REFERENCES Biometrika 109 (3), 721–734. Choi, D. (2017). Co-clustering of n onsmooth graphons. The Annals of Statistics 45 (4), 1488 – 1515. Eagle, N., A. Pen tland, and D. Lazer (2009). Inferring friendship netw ork structure by using mobile phone data. Pr o c e edings of the national ac ademy of scienc es 106 (36), 15274–15278. Gao, C., Y. Lu, and H. H. Zhou (2015). Rate-optimal graphon estimation. The Annals of Statistics 43 (6), 2624 – 2652. Golden b erg, A., A. X. Zheng, S. E. Fienberg, and E. M. Airoldi (2010). A survey of statistical netw ork mo dels. F oundations and T r ends ® in Machine L e arning 2 (2), 129–233. He, Y., Z. Huang, B. Jing, and D. Li (2026). Joint estimation of edge probabilities for multi-la yer netw orks via neighborho od smo othing. arXiv pr eprint arXiv:2601.20219 . Ho o v er, D. N. (1979). Relations on probability spaces and arrays of random v ariables. Pr eprint, Institute for Advanc e d Study . Jing, B., T. Li, Z. Lyu, and D. Xia (2021). Communit y detection on mixture multila yer netw orks via regularized tensor decomp osition. The Annals of Statistics 49 (6), 3181 – 3205. Kiv elä, M., A. Arenas, M. Barthelem y , J. P . Gleeson, Y. Moreno, and M. A. Porter (2014). Multilay er net works. Journal of c omplex networks 2 (3), 203–271. Lesk ov ec, J., K. J. Lang, and M. Mahoney (2010). Empirical comparison of algorithms for netw ork communit y detection. In Pr o c e e dings of the 19th international c onfer ence on W orld wide web , pp. 631–640. MacDonald, P . W., E. Levina, and J. Zhu (2022). Latent space mo dels for multiplex netw orks with shared REFERENCES structure. Biometrika 109 (3), 683–706. P aul, S. and Y. Chen (2016). Consistent comm unity detection in multi-relational data through restricted m ulti-lay er sto c hastic blo c kmo del. Ele ctr onic Journal of Statistics 10 (2), 3807 – 3870. P ensky , M. (2019). Dynamic netw ork mo dels and graphon estimation. The Annals of Statistics 47 (4), 2378 – 2403. Pham, P ., L. T. T. Nguyen, N. T. Nguyen, W. Pedrycz, U. Y un, and B. V o (2022). Comgcn: Communit y- driv en graph conv olutional netw ork for link prediction in dynamic netw orks. IEEE T ransactions on Systems, Man, and Cyb ernetics: Systems 52 (9), 5481–5493. Qin, Y., L. Y u, and Y. Li (2021). Iterative connecting probability estimation for netw orks. In A dvanc es in Neur al Information Pr o c essing Systems , V olume 34, pp. 1155–1166. Rahiminejad, S., M. R. Maury a, and S. Subramaniam (2019). T op ological and functional comparison of comm unity detection algorithms in biological netw orks. BMC bioinformatics 20 (1), 212. W ang, D., Y. Y u, and A. Rinaldo (2021). Optimal change point detection and lo calization in sparse dynamic net works. The Annals of Statistics 49 (1), 203–232. W olfe, P . J. and S. C. Olhede (2013). Nonparametric graphon estimation. arXiv pr eprint arXiv:1309.5936 . Xu, S., Y. Zhen, and J. W ang (2023). Cov ariate-assisted communit y detection in multi-la yer netw orks. Journal of Business & Ec onomic Statistics 41 (3), 915–926. Zhang, Y., E. Levina, and J. Zhu (2017). Estimating net work edge probabilities b y neigh b ourhoo d smo othing. Biometrika 104 (4), 771–783. REFERENCES Zhao, Z., L. Chen, and L. Lin (2019). Change-p oin t detection in dynamic netw orks via graphon estimation. arXiv preprint arXiv:1908.01823 . Dingzi Guo , Institute for Financial Studies, Shandong Universit y , Jinan, China E-mail: guo dingzi@mail.sdu.edu.cn Corresp onding author: Diqing Li , School of Statistics and Data Science, Zhejiang Gongshang Universit y , Hangzhou, China E-mail: dqli@mail.zjgsu.edu.cn Jingyi W ang , Institute for Financial Studies, Shandong Universit y , Jinan, China E-mail: 202432059@mail.sdu.edu.cn W en-Xin Zhou , Department of Information & Decision Sciences, College of Business Administration, Univ ersity of Illinois Chicago, Chicago, USA E-mail: wenxinz@uic.edu REFERENCES T able 1: RMSE ( × 100) across graphons and no de sizes for diﬀerent metho ds on a ﬁxed 100-lay er netw ork. Standard errors are rep orted in parentheses, and b oldface indicates the b est p erformance. Graphon No de Size( n ) ICE DC-MASE MultiNeSS MNS MICE Graphon 1 100 6.43 (0.03) 6.07 (0.11) 3.45 (0.28) 3.35 (0.13) 1.74 (0.22) 200 4.39 (0.01) 4.13 (0.07) 2.91 (0.19) 2.57 (0.08) 1.24 (0.14) 500 2.44 (0.00) 2.55 (0.03) 2.35 (0.17) 1.71 (0.04) 0.89 (0.03) 1000 1.57 (0.00) 1.80 (0.02) 1.99 (0.11) 1.19 (0.02) 0.80 (0.03) Graphon 2 100 6.90 (0.03) 7.40 (0.12) 5.30 (0.34) 4.01 (0.12) 2.78 (0.11) 200 5.53 (0.00) 5.58 (0.11) 4.57 (0.19) 3.26 (0.08) 2.13 (0.06) 500 4.03 (0.00) 3.99 (0.11) 3.50 (0.12) 2.54 (0.06) 1.59 (0.05) 1000 2.96 (0.04) 3.10 (0.07) 2.81 (0.09) 2.05 (0.04) 1.33 (0.05) Graphon 3 100 6.92 (0.01) 6.03 (0.15) 4.49 (0.23) 3.15 (0.08) 2.36 (0.14) 200 5.71 (0.01) 4.23 (0.07) 4.16 (0.21) 2.60 (0.05) 1.72 (0.09) 500 3.44 (0.01) 2.64 (0.03) 3.96 (0.21) 2.01 (0.02) 1.22 (0.06) 1000 2.34 (0.01) 1.86 (0.02) 3.85 (0.21) 1.62 (0.02) 1.02 (0.05) Graphon 4 100 10.62 (0.01) 10.01 (0.20) 9.62 (0.28) 8.02 (0.20) 6.53 (0.34) 200 9.47 (0.01) 8.37 (0.17) 9.13 (0.22) 7.51 (0.15) 5.79 (0.29) 500 8.60 (0.01) 7.43 (0.15) 7.98 (0.19) 6.80 (0.13) 5.18 (0.12) 1000 6.86 (0.01) 7.08 (0.14) 6.99 (0.16) 6.19 (0.12) 4.93 (0.20) Graphon 5 100 7.30 (0.01) 5.36 (0.23) 9.59 (0.52) 4.40 (0.15) 2.23 (0.12) 200 6.31 (0.01) 4.32 (0.24) 8.09 (0.38) 3.81 (0.13) 1.76 (0.08) 500 3.28 (0.00) 3.46 (0.25) 6.62 (0.36) 3.13 (0.09) 1.33 (0.06) 1000 3.23 (0.00) 3.26 (0.19) 5.72 (0.29) 2.62 (0.08) 1.12 (0.05) REFERENCES T able 2: RMSE ( × 100) across graphons and lay er sizes for diﬀeren t metho ds on a ﬁxed 200-no de netw ork. Standard errors are rep orted in paren theses, and b oldface denotes the b est p erformance. Graphon Lay er Size( K ) ICE DC-MASE MultiNeSS MNS MICE Graphon 1 20 4.37 (0.03) 4.33 (0.33) 3.13 (0.37) 4.05 (0.30) 1.94 (0.13) 50 4.41 (0.02) 4.12 (0.08) 2.98 (0.26) 3.05 (0.14) 1.49 (0.10) 100 4.39 (0.01) 4.13 (0.07) 2.91 (0.19) 2.57 (0.08) 1.24 (0.14) 200 4.35 (0.01) 4.12 (0.06) 2.88 (0.18) 2.24 (0.06) 1.06 (0.16) Graphon 2 20 5.43 (0.01) 5.89 (0.15) 4.82 (0.39) 4.11 (0.20) 3.35 (0.17) 50 5.51 (0.00) 5.66 (0.11) 4.62 (0.27) 3.67 (0.11) 2.57 (0.09) 100 5.53 (0.00) 5.58 (0.11) 4.57 (0.19) 3.26 (0.08) 2.13 (0.06) 200 5.61 (0.00) 5.51 (0.09) 4.51 (0.14) 2.97 (0.08) 1.82 (0.08) Graphon 3 20 5.30 (0.01) 4.37 (0.10) 5.22 (0.29) 3.35 (0.10) 2.65 (0.15) 50 5.77 (0.01) 4.31 (0.07) 4.47 (0.27) 2.90 (0.05) 2.05 (0.10) 100 5.71 (0.01) 4.23 (0.07) 4.16 (0.21) 2.60 (0.05) 1.72 (0.09) 200 5.27 (0.02) 4.18 (0.26) 3.99 (0.17) 2.36 (0.04) 1.49 (0.09) Graphon 4 20 9.61 (0.03) 8.61 (0.26) 9.19 (0.28) 8.43 (0.24) 7.32 (0.38) 50 9.54 (0.01) 8.39 (0.18) 9.13 (0.24) 7.88 (0.18) 6.57 (0.28) 100 9.47 (0.01) 8.37 (0.17) 9.13 (0.22) 7.51 (0.15) 5.79 (0.29) 200 10.10 (0.00) 8.35 (0.15) 9.11 (0.19) 7.14 (0.15) 5.29 (0.26) Graphon 5 20 5.51 (0.02) 4.36 (0.47) 7.94 (0.69) 4.77 (0.32) 2.73 (0.20) 50 5.43 (0.01) 4.33 (0.31) 8.05 (0.56) 4.21 (0.17) 2.07 (0.12) 100 6.31 (0.01) 4.32 (0.24) 8.09 (0.38) 3.81 (0.13) 1.76 (0.08) 200 6.35 (0.01) 4.24 (0.18) 8.16 (0.34) 3.45 (0.09) 1.50 (0.06) REFERENCES T able 3: P erformance on real-w orld datasets: A UC v alues for link prediction on ADHD and F AO (ﬁrst tw o rows), and prediction precision for forecasting the 2023 F A O net work based on 2022 data (last ro w). Dataset ICE DC-MASE MultiNeSS MNS MICE ADHD 0.7159 0.7422 0.7929 0.9183 0.9430 F A O 0.7720 0.7842 0.6123 0.8547 0.8976 F A O-prediction 20.13% 15.09% 19.50% 4.30% 23.16%

Connection Probabilities Estimation in Multi-layer Networks via Iterative Neighborhood Smoothing

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment