Contextual Graph Matching with Correlated Gaussian Features

We investigate contextual graph matching in the Gaussian setting, where both edge weights and node features are correlated across two networks. We derive precise information-theoretic thresholds for exact recovery, and identify conditions under which…

Authors: Mohammad Hassan Ahmad Yar, i, Luca Ganassali

Con textual Graph Matc hing with Correlated Gaussian F eatures Mohammad Hassan Ahmad Y arandi Sharif Univ ersity of T echnology , T ehran, Iran mh.ahmad.yarandi@gmail.com Luca Ganassali Univ ersit´ e P aris-Saclay , CNRS, Inria Lab oratoire de math ´ ematiques d’Orsa y , Orsay , F rance luca.ganassali@universite-paris-saclay.fr Abstract W e in vestigate con textual graph matc hing in the Gaussian setting, where b oth edge weigh ts and no de features are correlated across tw o netw orks. W e deriv e precise information-theoretic thresholds for exact recov ery , and iden tify conditions under which almost exact recov ery is p ossible or imp ossible, in terms of graph and feature correlation strengths, the num b er of no des, and feature dimension. Interestingly , whereas an all-or-nothing phase transition is observed in the standard graph-matching scenario, the additional con textual information introduces a ric her structure: thresholds for exact and almost exact recov ery no longer coincide. Our results provide the first rigorous characterization of ho w structural and contextual information interact in graph matc hing, and establish a b enchmark for designing efficient algorithms. 1 In tro duction and related w ork Database matc hing is a statistical inference problem which consists of iden tifying alignmen ts b et ween anon ymous databases. In the most basic model, there are tw o databases with corresp onding entities that are correlated through an underlying statistical mo del. The goal is to infer the corresp ondence using only observ ations drawn from the model. W e first review the tw o w ell-studied t yp es of database matching problems, and then in tro duce a third v arian t, whic h combines elemen ts of the first t wo and serves as the main fo cus of this pap er. 1.1 F eature-based database matching In feature-based database matc hing [ 6 , 8 , 33 ], a database consists of n users, eac h user having a feature vector in R d . Consider tw o databases X , Y ∈ R n × d referring to the same underlying set of users, where pairs of feature vectors of the form ( X i,j , Y π ∗ ( i ) ,j ) j =1 ,...,d , asso ciated with the same user, are correlated across databases. The one-to-one map π ∗ represen ts the corresp ondence b etw een users, which is unknown. The ob jective is to reco ver π ∗ based on the observ ation of ( X , Y ). F or example, a user’s features across tw o platforms – suc h as age, preferences, and ratings – are often correlated. By lev eraging the information av ailable in one of the databases, one ma y be able to de-anonymize the second one [23]. One of the standard statistical mo dels for this problem is the Gaussian mo del. In the Gaussian setting, all pairs of corresponding features across users, i.e., ( X i,j , Y π ∗ ( i ) ,j ), are dra wn i.i.d. from a join t Gaussian distribution with zero mean, unit v ariance, and correlation co efficient η ∈ [ − 1 , 1]. 1 Regarding the literature on feature-based database matc hing, [ 6 ] is among the earliest w orks to establish a sharp information-theoretic threshold for exact reco very in the case where features tak e v alues from a finite alphab et. In the Gaussian setting, sev eral works hav e in vestigated the fundamental limits of the problem for different criteria, including exact and almost exact reco very [ 7 , 9 ], as w ell as correlation detection [ 34 ]. 
Notably , for exact recov ery in the Gaussian setting, [7] establishes a sharp threshold for the correlation co efficient η at η c , defined b y log  1 1 − η 2 c  = 4 log( n ) d , b elo w which exact reco very of π ∗ is impossible and ab o ve which it is achiev able. In the limit as η c → 0, the threshold simplifies to η c = 2 q log( n ) d . 1.2 Graph matc hing Graph matching is a w ell-kno wn instance of database matc hing [ 12 , 11 , 15 ]. Consider tw o graphs G 1 and G 2 with adjacency matrices A and B , such that each pair ( A i,j , B π ∗ ( i ) ,π ∗ ( j ) ) represents the in teractions b et ween the same pair of entities across the t wo databases. Unlike the previous setting, where information w as based on individual-lev el attributes (no des), here the information is encoded in the in teractions b etw een users (edges). With the gro wing av ailabilit y of graph-structured data, the c hallenge of recov ering suc h cor- resp ondences has b ecome increasingly significant, with applications in pattern recognition [ 3 ], bioinformatics [20], and so cial netw ork analysis [21]. The tw o most widely studied statistical mo dels for graph matc hing are the Gaussian mo del and the Erd˝ os–R´ enyi mo del. In the Gaussian setting, it is assumed that corresp onding edges ( A i,j , B π ∗ ( i ) ,π ∗ ( j ) ) are drawn i.i.d. from a joint Gaussian distribution with standard Gaussian marginals and correlation co efficient ρ ∈ [ − 1 , 1]. In the Erd˝ os–R ´ enyi setting, G 1 and G 2 are mo deled as Erd˝ os–R´ en yi graphs, with edges follo wing an i.i.d. correlated Bernoulli distribution sp ecified b y P  A i,j = a, B π ∗ ( i ) ,π ∗ ( j ) = b  = p ab for a, b ∈ { 0 , 1 } , where the parameters p ab c haracterize the joint distribution of corresponding edges across the t wo graphs. The information-theoretic limits of graph matc hing for correlated Erd˝ os–R´ en yi graphs hav e b een inv estigated under v arious reco very regimes, including exact recov ery [ 24 , 4 , 28 ], almost exact and partial recov ery [ 5 , 17 , 18 ], and correlation detection [ 29 ]. Analogously , [ 16 , 28 ] examine the information-theoretic limits of graph matching under the Gaussian mo del. In particular, [ 28 ] establishes a sharp threshold for the correlation coefficient ρ at ρ c giv en b y ρ c = 2 r log( n ) n , b elo w which no algorithm can reco ver any p ositiv e fraction of the correct matc hes, and abov e which the maximum-a-posteriori estimator succeeds in exact recov ery . Another thread of research fo cuses on the algorithmic asp ects of the graph matching problem [ 14 , 12, 10, 2, 25, 22]. W e also mention recen t w orks on m ulti-graph alignment [1, 26], whic h considers the matching problem for m ≥ 2 correlated graphs. 1.3 Graph matc hing with feature information: con textual graph matc hing In many real-world applications, b oth no de features and graph structure are often av ailable, whic h motiv ates the mo deling of b oth graph structure and database information simultaneously [ 31 , 32 ]. 2 More precisely , consider tw o databases, ( X , G 1 ) and ( Y , G 2 ), which con tain information ab out the same set of users through the feature matrices X , Y ∈ R n × d and the interaction graphs G 1 , G 2 with asso ciated adjacency matrices A , B . Consequen tly , ( X i,j , Y π ∗ ( i ) ,j ) and ( A i,j , B π ∗ ( i ) ,π ∗ ( j ) ) represen t the alignmen ts in feature and graph information, resp ectively . 
Utilizing both sources of information can enhance the identification of the underlying alignment, from b oth computational and information-theoretic persp ectives. Similar to the previous matching problems, t w o statistical mo dels can b e considered for con textual graph matc hing. In the Gaussian model, the graphs G 1 and G 2 are assumed to be correlated Gaussian graphs with correlation parameter ρ , while the feature matrices X and Y consist of correlated Gaussian features with correlation parameter η . In the Erd˝ os–R ´ en yi mo del with Gaussian features, the graphs are assumed to be correlated Erd˝ os–R ´ enyi graphs with parameters p ab for a, b ∈ { 0 , 1 } , and the features are drawn from a join t Gaussian distribution with correlation co efficien t η . Although graph matc hing and feature-based matching problems hav e been extensiv ely studied in the literature, the join t mo deling of these tw o sources of information has receiv ed significantly less attention. T o the b est of our kno wledge, the only existing works in this direction are the recen t studies by [ 31 , 32 ]. In [ 31 ] they c haracterized the information-theoretic limits of exact recov ery in correlated Erd˝ os–R ´ enyi graphs with correlated Gaussian features, establishing a sharp threshold for exact recov ery of the form np 11 + d 4 log  1 1 − η 2  = log( n ) , under additional tec hnical conditions. In [ 32 ], the setting is extended to correlated sto chastic blo c k mo dels with tw o communities. They derive conditions for b oth exact matching and exact communit y detection recov ery in this mo del. Ho wev er, the setting inv olving Gaussian graph mo dels with correlated Gaussian features remains unexplored. This forms the fo cus of our inv estigation in this pap er. 2 Problem form ulation and main results 2.1 Notations Throughout this pap er, w e denote the set of all p ermutations of [ n ] = { 1 , . . . , n } b y S n . F or brevit y w e denote b y π ( i, j ) := { π ( i ) , π ( j ) } the non-oriented edge b et ween π ( i ) and π ( j ) for some π ∈ S n . F or each p erm utation π ∈ S n , D π and F π (resp ectiv ely , D E π and F E π ) denote the sets of unfixed and fixed p oints of the permutation π acting on the no des (respectively , on the edges). More precisely , D π : = { i ∈ [ n ] : π ( i )  = i } , F π : = [ n ] \ D π , D E π : =  { i, j } ∈ E : π ( i, j )  = { i, j }  , F E π : = E \ D E π . W e partition S n as follows: S n = { id } ∪ n [ t =2 S n,t , where S n,t is the set of p erm utations of S n that differ from id b y exactly t unfixed p oints, that is S n,t := { π ∈ S n , | D π | = t } . F or π , π ′ ∈ S n , define their o verlap: o verlap( π, π ′ ) := |{ i ∈ [ n ] , π ( i ) = π ′ ( i ) }| = | D π − 1 ◦ π ′ | . 3 W e say that a sequence of ev ents E n happ ens with high probability (w.h.p.) if P ( E n ) n →∞ − → 1 , or, equiv alently , P ( E n ) = 1 − o (1). A sequence of estimators ( ˆ π n ) n ≥ 1 of ( π ∗ n ) n ≥ 1 where for all n ≥ 1, π ∗ n ∈ S n , is said to achiev e • exact r e c overy if ˆ π n = π ∗ n w.h.p.; • almost exact r e c overy if for all δ ∈ (0 , 1), ov erlap( ˆ π n , π ∗ n ) > δ w.h.p.; • p artial r e c overy if there exists δ ∈ (0 , 1) such that o verlap( ˆ π n , π ∗ n ) > δ w.h.p. Throughout, we ma y omit the dep endence on n . 2.2 Problem form ulation In our setting, the statistical mo del is defined as follo ws. 
Let G 1 and f G 2 b e graphs with same node set [ n ], with w eigh ted adjacency matrices A and e B , where the edge weigh t pairs { ( A i,j , e B i,j ) : 1 ≤ i < j ≤ n } are i.i.d. standard Gaussian random v ariables with correlation ρ . Then, G 2 is obtained b y permuting the v ertices of e G 2 according to a uniform random p ermutation π ∗ in S n . That is, the join t distribution of edge w eights is given b y  A i,j , B π ∗ ( i ) ,π ∗ ( j )  ∼ N  0 0  ,  1 ρ ρ 1  , (1) where ρ ∈ [ − 1 , 1]. F or the feature information, we draw t wo matrices X , e Y ∈ R n × d are such that each pair of corresp onding entries ( X i,j , e Y i,j ) are also i.i.d. standard Gaussian random v ariables with correlation η . The matrix Y is then formed by p erm uting the rows of e Y using the same permutation π ∗ used for the graph structure. The feature-level data is mo deled as  X i,j , Y π ∗ ( i ) ,j  ∼ N  0 0  ,  1 η η 1  , (2) where η ∈ [ − 1 , 1]. Using the permutation matrix Π ∗ corresp onding to π ∗ , this probabilistic mo del can b e also expressed as: B = ρ Π ∗ T AΠ ∗ + p 1 − ρ 2 Z , Y = η Π ∗ T X + p 1 − η 2 Z ′ , (3) where Z and Z ′ are matrices with indep enden t standard Gaussian entries. In the abov e statistical mo del (3), the goal is to infer the underlying p ermutation π ∗ using an estimator ˆ π based on the observed data: ˆ π = ˆ π ( A , B , X , Y ). 2.3 Main results As explained in the in tro duction, we in v estigate the information-theoretic limits of contextual graph matc hing in the Gaussian setting. Our first main result is regarding establishing a sharp information-theoretic threshold for exact recov ery . Theorem 2.1 ( Exact Reco very ) . 4 ( i ) (A chievability R esult): if d = ω ( log n ) and for sufficiently lar ge n , the fol lowing c ondition holds: ρ 2 n 1 − ρ 2 + η 2 d 1 − η 2 ≥ 4(1 + ε ) log n, for some ε > 0 , then ther e exists an estimator (namely, the MAP estimator) ˆ π : ( A , B , X , Y ) → S n such that P ( ˆ π = π ∗ ) = 1 − o (1) . ( ii ) (Converse R esult): If d = ω ((log n ) 2 ) and ρ 2 n 1 − ρ 2 + η 2 d 1 − η 2 ≤ 4 log n − log log n − ω (1) , then, for any estimator ˆ π : ( A , B , X , Y ) → S n , P ( ˆ π = π ∗ ) = o (1) . W e then pro vide a sufficien t condition for p ossibility of almost exact recov ery . Theorem 2.2 ( Achiev ability of almost exact reco very ) . If d = ω ( log n ) and for sufficiently lar ge n , the fol lowing c ondition holds: ρ 2 n 1 − ρ 2 + 2 η 2 d 1 − η 2 ≥ 4(1 + ε ) log n, for some ε > 0 , then ther e exists an estimator ˆ π : ( A , B , X , Y ) → S n such that for al l δ ∈ (0 , 1) , P (ov erlap( ˆ π , π ∗ ) > δ ) = 1 − o (1) . Finally , we pro vide an information-theoretic lo wer b ound for almost exact recov ery when the signal is w eaker than in Theorem 2.2, and further show that no more than 50% of the no des can b e correctly matched under this bound. Theorem 2.3 (Imp ossibilit y of recov ering more than 50% of the nodes) . If d = ω  (log n ) 2  and ρ 2 n 1 − ρ 2 + 2 η 2 d 1 − η 2 ≤ 2(1 − ε ) log n, then, for any estimator ˆ π : ( A , B , X , Y ) → S n and for al l δ ∈ (0 . 5 , 1) , P (ov erlap( ˆ π , π ∗ ) > δ ) = o (1) . 5 ρ 2 n log n η 2 d log n 1 2 4 1 2 4 Figure 1: Recov ery phase diagram in the regime where ρ, η → 0. The blue line corresp onds to the exact reco v ery threshold (Theorem 2.1), the green line to the almost exact reco very threshold (Theorem 2.2), and the red region to the information-theoretic imp ossibilit y of ac hieving ov erlap greater than 50% (Theorem 2.3). 
2.4 Discussion No all-or-nothing phase transition o ccurs. An illustration of the ab o ve results is pro vided in the phase diagram of Figure 1, which highlights a clear separation b et ween the regimes of exact reco very and almost exact reco very . In particular, our results sho w that the thresholds for exact and almost exact reco v ery do not coincide. Indeed, there exists a non trivial region (the green region in Figure 1) where exact reco very is information-theoretically imp ossible, while almost exact reco very remains achiev able with high probabilit y . This phenomenon sharply con trasts with the all-or-nothing phase transition observed in the graph matc hing setting [ 28 ] where either exact reco v ery is p ossible or partial reco v ery is imp ossible, with no intermediate regime. In our setting, the presence of information on the users fundamentally alters the recov ery landscap e, leading to a richer phase structure (see the double p oint at ( x, y ) = (4 , 0) in Figure 1). On the d = ω ( log n ) assumption. The conditions d = ω ( log n ) or d = ω (( log n ) 2 ) app earing in our results are common in the literature of database matching [ 7 , 9 ]. They are primarily tec hnical and stem from the concen tration arguments used in the pro ofs. W e b eliev e that these assumptions are not fundamen tal, and that the same information-theoretic thresholds should hold under muc h milder conditions on d , but with different pro of techniques. Note ho wev er that in the regime where both ρ and η tend to zero and d = O ( log n ), the comparison of ρ 2 n + η 2 d to log n amoun ts to comparing the graph term ρ 2 n to log n . In this regime, the contribution of feature information is negligible at the level of recov ery thresholds, and the problem essen tially reduces to the graph-only setting. This suggests that, in the lo w correlation regime, the role of features b ecomes information-theoretically relev ant only when d gro ws faster than log n , making our assumptions natural. Conjecture regarding the threshold for partial reco very . W e conjecture the b ound in Theorem 2.2 to b e the sharp bound for almost exact and partial recov ery , which w e formalize in the 6 ρ 2 n log n η 2 d log n 1 2 4 1 2 4 Figure 2: Complete reco very phase diagram under Conjecture 2.1, in the regime where ρ, η → 0. The blue and green regions are unchanged from Figure 1. The red region corresp onds to infeasibilit y of partial reco v ery . The p oin t at ( x, y ) = (4 , 0) is now a triple point. follo wing conjecture. Conjecture 2.1. If ρ 2 n 1 − ρ 2 + 2 η 2 d 1 − η 2 ≤ 4(1 − ε ) log n , than for any estimator ˆ π : ( A , B , X , Y ) → S n and for al l δ ∈ (0 , 1) , P (o verlap( ˆ π , π ∗ ) > δ ) = o (1) . Conjecture 2.1 together with the prov ed results are summed up in the phase diagram of Figure 2. Op en questions. Beyond the information-theoretic characterization pro vided in this work, sev eral questions remain op en. A first direction is to complete the phase diagram by closing the remaining gaps b etw een ac hiev ability and imp ossibility , in particular for partial reco very . A second, and more c hallenging, direction concerns computational aspects. The estimators considered here are instances of the Quadratic Assignment Problem, whic h is NP-hard in the w orst case (see e.g. [ 12 ]), whereas the feature-only setting reduces to a linear assignment problem and admits efficient algorithms. 
Understanding whether and ho w no de features can mitigate the computational hardness of graph matching remains an in triguing open problem. 2.5 P ap er organization In Section 3, w e in tro duce the optimal estimators for all notions of reco very and establish some of their concen tration prop erties. These prop erties are then useful in Section 4, where we give the main ideas of the pro of of the exact reco very result (Theorem 2.1). Finally , in Section 5, we give the full proof our almost exact and partial recov ery results (Theorem 2.2 and Theorem 2.3). Additional pro ofs are deferred to the Appendix. 3 Optimal estimator As done b y [26], w e lo ok at the problem through the lens of Ba yesian inference where our optimal estimators are the minimizers of some exp ected loss. T o this end w e define our loss functions as 7 follo ws, for π , π ′ ∈ S n and r ∈ [0 , 1), d ( π , π ′ ) := 1 − o verlap( π, π ′ ) , l r ( π , π ′ ) := 1 { d ( π ,π ′ ) >r } . The exp ected loss of an estimator ˆ π ( D ) where D is the observ ed data D = ( A , B , X , Y ) is defined b y L r ( ˆ π ) = E [ l r ( ˆ π ( D ) , π ∗ )] = P ( d ( ˆ π ( D ) , π ∗ ) > r ) . Equalit y ( a ) shows the concrete corresp ondence b etw een the Bay esian point of view and our problem definition in Section 2. More precisely , r = 0 gives L 0 ( ˆ π ) = P ( ˆ π  = π ∗ ) , which is the exact recov ery criterion and if r ∈ (0 , 1) then L r ( ˆ π ) for al l (resp. some ) r ∈ (0 , 1) is the criterion for almost exact (resp. partial) recov ery . Next, w e denote by P post := P π ∗ | D = the p osterior distribution of π ∗ after the observ ation of data D . Based on this Bay esian expression of the recov ery problems, the optimal estimator (for par- tial/almost exact/exact reco very) is the estimator among all v alid estimators that minimizes L r ( ˆ π ) for the corresponding r . Note that min ˆ π L r ( ˆ π ) = min ˆ π E D  E π ∗ | D [ l r ( ˆ π ( D ) , π ∗ )]  ≥ E D  min π ∈S n E π ∗ | D [ l r ( π ( D ) , π ∗ )]  . (4) Since ˆ π ( D ) = arg min π ∈S n E π ∗ | D [ l r ( π ( D ) , π ∗ )] = arg min π ∈S n E post [ l r ( π ( D ) , π ∗ )] is a v alid estimator, (4) implies that this estimator is optimal. If we denote by B ( π , r ) the closed ball of radius r at π for metric d , based on the distance metric we defined earlier, one can rewrite the optimal estimator for L r as ˆ π opt = arg min π ∈S n P post ( B ( π , r ) c ) = arg max π ∈S n P post ( B ( π , r )) , where B ( π , r ) c is the complemen t of B ( π , r ) in S n . F urthermore, the optimal exp ected loss is L (opt) r = 1 − E P D  max π ∈S n P post ( B ( π , r ))  . These expression for optimal estimator and optimal expected loss help us to rewrite the recov ery criteria in the follo wing forms. Prop osition 3.1. (i) Exact r e c overy is p ossible iff arg max π ∈S n P p ost ( π ) , which is the Maximum A Posteriori (MAP) estimator, e quals π ∗ with pr ob ability at le ast 1 − o (1) and imp ossible iff π / ∈ arg max π ∈S n P p ost ( π ) with pr ob ability at le ast 1 − o (1) . (ii) A lmost exact r e c overy is p ossible iff ∀ r ∈ (0 , 1) : max π ∈S n P p ost ( B ( π , r )) P D − → 1 and imp ossible iff ∃ r ∈ (0 , 1) : max π ∈S n P p ost ( B ( π , r )) P D − → 0 . (iii) Partial r e c overy is p ossible iff ∃ r ∈ (0 , 1) : max π ∈S n P p ost ( B ( π , r )) P D − → 1 and imp ossible iff ∀ r ∈ (0 , 1) : max π ∈S n P p ost ( B ( π , r )) P D − → 0 . 8 These are the ultimate e quiv alen t reco v ery criteria that w e in vestigate to pro ve our main theorems. 
As the imp ortan t part of the criteria in prop osition 3.1 is the p osterior distribution we need to obtain it in a more explicit wa y . The p osterior distribution in this problem can b e written as: p ( π | A , B , X , Y ) = p ( A , B , X , Y | π ) p ( π ) p ( A , B , X , Y ) ( a ) = p ( A , B | π ) p ( X , Y | π ) p ( π ) p ( A , B , X , Y ) where the equality ( a ) holds since the only information that is shared betw een ( A , B ) and ( X , Y ) is π , and they are independent giv en π . No w, based on (1) and (2) we hav e p ( π | A , B , X , Y ) = C exp  − 1 2(1 − ρ 2 ) X 1 ≤ i 0 is a universal c onstant. Then the event H 1 o c curs with pr ob ability at le ast 1 − o (1) . R emark 3.3 . It is w orth noting that in [ 26 ] the authors establish a concentration b ound for V ∗ G of order O  n 3 / 2 ( log n ) 1 / 4  . The b ound in the presen t lemma is strictly tigh ter: for instance, if t = Θ(1), our b ound is O  √ n log n  , which is substantially smaller than O  n 3 / 2 ( log n ) 1 / 4  . The reason is that w e lev erage the fact that V ∗ G ( π ) and V ∗ F ( π ) only depend on π through the set D π . Therefore, for a lot of p erm utations that hav e same set of unfixed p oin ts, the v alue of these comp onents are similar. 3.2 Upp er Bound for V F and V G No w, w e b ound the remaining terms in the exp onen tial comp onent, namely V F and V G . Unlik e V ∗ F and V ∗ G , which are almost constan t with high probabilit y , the fluctuations of V F and V G are significan tly larger than their means. W e adopt a different approac h: instead of directly bounding V F ( π ) and V G ( π ), we control their Laplace transforms. The pro of of the follo wing lemma is given in App endix B. Lemma 3.4. Ther e exists an event H 2 , indep endent of ( A i,j ) 1 ≤ i 0 is a universal c onstant. R emark 3.5 . In [ 26 ], the authors bound E  e β V G ( π ) 1 H 2  for small β using a series expansion and retain only the leading terms. In con trast, our approach derives an upper b ound on a high-probabilit y ev ent; since the imp ossibilit y result holds with high probability , conditioning on H 2 do es not weak en the argument. An imp ortant adv antage of our metho d is that it applies to all ρ, η ∈ [0 , 1), whereas the approac h of [ 26 ] requires β (and thus ρ, η ) to be small. This is particularly crucial for the V F term, as when d = o (log n ) the quantit y η 1 − η 2 ma y div erge, making our approach necessary in this regime. 10 3.3 Lo w er Bound for Z A trivial lo w er bound on Z is given b y 1 = exp ( − V (id)) ≤ Z. In fact, this inequality is sufficien t for the pro of of p ossibility of almost exact recov ery , b ecause as we will show the numerator of P post ( B ( π , r ) c ) is bounded from ab ov e b y exp ( − Θ( n log n )) with high probability . Thus, Z ≥ 1 is enough to ensure that max π ∈S n P post ( B ( π , r )) P D − → 0. Ho wev er, the situation in the imp ossibility condition b ecomes more complicated. In this case, as w e will see, we need to pro ve that Z ≥ exp ( εn log n (1 − o (1))) in order to prov e imp ossibilit y of almost exact reco v ery . More precisely , we sho w that under the imp ossibilit y criterion ρ 2 1 − ρ 2 n + 2 η 2 1 − η 2 d ≤ 2(1 − ε ) log n, the v alue of Z is sufficien tly large. The pro of of this fact pro ceeds in t w o steps. In the first step, we sho w that log ( Z ) is sharply concen trated around its mean E [ log ( Z )]. 
This is a simple consequence of the fact that log ( Z ) is with high probabilit y a Lipschitz function of Gaussian random v ariables in our problem. W e obtain: log( Z ) ≥ E [log( Z )] − o ( n log n ) with probability at least 1 − o (1). In the second step, we establish a lo wer bound for E [ log ( Z )]. More precisely , w e show, using a conditional second momen t metho d, that E [log( Z )] ≥ 1 + ε 2 n log n (1 − o (1)) . Com bining this result with the first step completes the pro of, yielding a high-probability low er b ound for Z . W e obtain the following, whose pro of is giv en in App endix C. Lemma 3.6. If d = ω ((log( n )) 2 ) and ρ 2 1 − ρ 2 n + 2 η 2 1 − η 2 d ≤ 2(1 − ε ) log n, then Z ≥ exp  1 + ε 2 n log ( n )(1 + o (1))  , with pr ob ability at le ast 1 − o (1) . 4 Exact reco v ery In this section, w e giv e the main ideas in the pro of of our exact reco very results (Theorem 2.1). 4.1 Ac hiev abilit y Result Pr o of of The or em 2.1, ( i ) . As mentioned in Prop osition 3.1, the MAP estimator is the optimal estimator for exact reco very . Moreov er, we can assume without loss of generality that π ∗ = id , so that P ( ˆ π  = π ) ≥ P (MAP fails) = P ( ∃ π ∈ S n : V ( π ) < 0) . 11 T o analyze the MAP estimator, we decomp ose S n in to the orbits around the identit y p ermutation, denoted by S n,t for t ∈ { 2 , . . . , n } , and in vestigate the failure of the MAP estimator on eac h of the ev ents E t = {∃ π ∈ S n,t , V ( π ) < 0 } . W e show that P ( MAP fails ) = P ( S n t =2 E t ) is of order o (1). The simple first moment metho d enables to deal with the union of E t only when t is not too close to n , otherwise it fails due to correlations across even ts E t . Consequen tly , w e distinguish the cases 2 ≤ t ≤ α 0 n (first moment metho d works w ell), for a w ell-chosen α 0 ∈ (0 , 1) and t > α 0 n . In the second case, a tighter control of the Laplace transform of V ( π ) for each π ∈ S n,t tak es profit of these correlations and mak es a lay ered first momen t method w ork. W e sum up these ideas in the following Lemma pro v ed in Section D, which concludes the pro of. Lemma 4.1. Assume that c onditions of The or em 2.1, ( i ) hold. Then, P   [ 2 ≤ t ≤ n E t   = o (1) . 4.2 Con v erse Result Pr o of of The or em 2.1, ( ii ) . T o pro ve the con verse part, we show that b elo w the threshold, there exists, with high probability , π ∈ S n \ { id } suc h that V ( π ) < 0, demonstrating the failure of the MAP estimator, which is optimal for exact recov ery . More precisely , we sho w that a significant n umber of p ermutations in S n, 2 , i.e., transp ositions, satisfy V ( π ) < 0. W e sho w this via the second momen t method. Supp ose N is a random v ariable representing the num b er of transp ositions that lead to the failure of the MAP estimator on some high probabilit y ev en t H 2 (see Lemma B.1). N : = X π ∈S n, 2 1 { V ( π ) < 0 } 1 H 2 . Since N is a random v ariable with a p ositiv e mean and finite v ariance, it follows from the P aley-Zygmund inequalit y that for all 0 < c < 1, P ( N ≥ c E [ N ]) ≥ (1 − c ) 2 E [ N ] 2 E [ N 2 ] . Hence, if w e pro ve that E [ N ] → ∞ and E [ N 2 ] ≤ (1 + o (1)) E [ N ] 2 , it follo ws that, with high probabilit y , N = ω (1), th us completing the pro of. Lemma 4.2. Assume that c onditions of The or em 2.1, ( ii ) hold. Then, E [ N ] → ∞ and E [ N 2 ] ≤ (1 + o (1)) E [ N ] 2 . 5 Almost exact and partial reco v ery In this section, w e prov e the results of Theorems 2.2 and 2.3, regarding almost exact and partial reco very . 
12 5.1 Ac hiev abilit y Result Pr o of of The or em 2.2. Recall that we assume π ∗ = id . By Prop osition 3.1, to show that almost exact recov ery is reac hable, it is enough to sho w that with high probabilit y , P post ( B (id , r ) c ) = o (1) for all r ∈ (0 , 1). Based on equation (18), we hav e, for t ∈ [ r n, n ], for π ∈ S n,t E [exp( − V ( π )) 1 H 1 ∩H 2 ] ≤ exp  − 1 2  ρ 2 1 − ρ 2 t  n − t 2  + η 2 1 − η 2 td  (1 − o (1))  ≤ exp  − 1 4  ρ 2 1 − ρ 2 nt + 2 η 2 1 − η 2 dt  (1 − o (1))  ≤ exp ( − (1 + ε ) t log n (1 − o (1))) where the last inequalit y follo ws from the p ossibility condition ρ 2 1 − ρ 2 n + 2 η 2 1 − η 2 d ≥ 4(1 + ε ) log n . Let r in (0 , 1). W e use the shorthand P := P post ( B (id , r ) c ). E [ Z P 1 H 1 ∩H 2 ] = n X t = rn X π ∈S n,t E [exp( − V ( π )) 1 H 1 ∩H 2 ] ≤ n X t = rn |S n,t | exp ( − (1 + ε ) t log n (1 − o (1))) ≤ n X t = rn exp ( − εt log n (1 − o (1))) = o (1) . Observ e that log n × E [ Z P 1 H 1 ∩H 2 ] = o (1), and Mark o v inequalit y yields that probabilit y at least 1 − o (1) we ha ve Z P 1 H 1 ∩H 2 ≤ log n × E [ Z P 1 H 1 ∩H 2 ] = o (1) , and since H 1 ∩ H 2 is a high probability even t, this giv es Z P = o (1) with high probability . Since Z > 1 then with high probabilit y P = P post ( B (id , r ) c ) = o (1). This completes the pro of of ac hiev abilit y part. 5.2 Con v erse Result Pr o of of The or em 2.3. T o prov e the con verse part, by Prop osition 3.1, w e need to sho w that max π ∈ B (id ,r ) P post ( B ( π , r )) = o (1) with high probability for any r < 0 . 5. Since all sets B ( π , r ) for π ∈ B ( id , r ) are con tained in B ( id , 2 r ), it is sufficien t to show that for any r < 1, w.h.p., P post ( B (id , r )) = o (1). Assume, without loss of generality , that ρ 2 1 − ρ 2 n + 2 η 2 1 − η 2 d = 2(1 − ε ) log n . Similar to the ac hiev abilit y part, for each π in S n,t w e ha v e E [exp( − V ( π )) 1 H 1 ∩H 2 ] ≤ exp  − 1 4  ρ 2 1 − ρ 2 nt + 2 η 2 1 − η 2 dt  (1 − o (1))  ≤ exp  − 1 − ε 2 t log n (1 − o (1))  13 No w, w e hav e, using the same shorthand P := P post ( B (id , r ) c ) E [ Z P 1 H 1 ∩H 2 ] = rn X t =1 X π ∈S n,t E [exp( − V ( π )) 1 H 1 ∩H 2 ] ≤ rn X t =1 |S n,t | exp  − 1 − ε 2 t log n (1 − o (1))  ≤ rn X t =1 exp  1 + ε 2 t log n (1 − o (1))  ≤ exp  1 + ε 2 r n log n (1 − o (1))  Mark ov inequalit y yields that w.h.p., Z P 1 H 1 ∩H 2 ≤ log n × E [ Z P 1 H 1 ∩H 2 ] ≤ exp  1 + ε 2 r n log n (1 − o (1))  . On the other hand, Lemma 3.6 shows that under the imp ossibility criterion of almost exact reco very w e ha v e with probability 1 − o (1), Z ≥ exp  1 + ε 2 n log n (1 − o (1))  Com bining the last t wo equation it results that w.h.p. (again, H 1 ∩ H 2 is a high probabilit y ev ent), P = P post ( B (id , r )) ≤ exp  − (1 − r ) 1 + ε 2 n log n (1 − o (1))  = o (1) . Ac kno wledgmen ts The authors w ould like to thank Louis V assaux for useful discussions. References [1] T aha Ameen and Bruce Ha jek. Detecting correlation b et ween multiple unlab eled gaussian net works. arXiv pr eprint arXiv:2504.16279 , 2025. [2] Boaz Barak, Chi-Ning Chou, Zhixian Lei, Tselil Schramm, and Y ueqi Sheng. (nearly) efficient algorithms for the graph matc hing problem on correlated random graphs. A dvanc es in Neur al Information Pr o c essing Systems , 32, 2019. [3] Alexander C Berg, T amara L Berg, and Jitendra Malik. Shap e matching and ob ject recognition using low distortion correspondences. 
In 2005 IEEE c omputer so ciety c onfer enc e on c omputer vision and p attern r e c o gnition (CVPR’05) , volume 1, pages 26–33. IEEE, 2005. 14 [4] Daniel Cullina and Negar Kiy av ash. Exact alignmen t recov ery for correlated erd \ h { o } sr \ ’en yi graphs. arXiv pr eprint arXiv:1711.06783 , 2017. [5] Daniel Cullina, Negar Kiy av ash, Prateek Mittal, and H Vincent P o or. P artial recov ery of erd˝ os-r´ en yi graph alignmen t via k-core alignmen t. A CM SIGMETRICS Performanc e Evaluation R eview , 48(1):99–100, 2020. [6] Daniel Cullina, Prateek Mittal, and Negar Kiy av ash. F undamen tal limits of database alignmen t. In 2018 IEEE International Symp osium on Information The ory (ISIT) , pages 651–655. IEEE, 2018. [7] Osman E Dai, Daniel Cullina, and Negar Kiya v ash. Database alignment with gaussian features. In The 22nd International Confer enc e on A rtificial Intel ligenc e and Statistics , pages 3225–3233. PMLR, 2019. [8] Osman Emre Dai. FUNDAMENT AL LIMITS AND ALGORITHMS FOR D A T ABASE AND GRAPH ALIGNMENT . PhD thesis, Georgia Institute of T ec hnology , 2023. [9] Osman Emre Dai, Daniel Cullina, and Negar Kiya v ash. Gaussian database alignment and gaussian planted matc hing. arXiv pr eprint arXiv:2307.02459 , 2023. [10] Jian Ding, Zongming Ma, Yihong W u, and Jiaming Xu. Efficient random graph matc hing via degree profiles. arxiv e-prin ts, page. arXiv pr eprint arXiv:1811.07821 , 2018. [11] Jian Ding, Zongming Ma, Yihong W u, and Jiaming Xu. Efficient random graph matc hing via degree profiles. Pr ob ability The ory and R elate d Fields , 179(1):29–115, 2021. [12] Zhou F an, Cheng Mao, Yihong W u, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations i: The gaussian mo del. arXiv pr eprint arXiv:1907.08880 , 2019. [13] Da vid Gamarnik and Ilias Zadik. Sparse high-dimensional linear regression. estimating squared error and a phase transition. The Annals of Statistics , 50(2):880–903, 2022. [14] L Ganassali, M Lelarge, and L Massouli ´ e. Sp ectral alignment of correlated gaussian random matrices. arxiv e-prin ts, art. arXiv pr eprint arXiv:1912.00231 , 2019. [15] Luca Ganassali. The gr aph alignment pr oblem: fundamental limits and efficient algorithms . PhD thesis, PSL Researc h Univ ersity; Ecole normale sup ´ erieure, 2022. [16] Luca Ganassali. Sharp threshold for alignmen t of graph databases with gaussian w eights. In Mathematic al and Scientific Machine L e arning , pages 314–335. PMLR, 2022. [17] Luca Ganassali, Lauren t Massouli ´ e, and Marc Lelarge. Imp ossibility of partial reco very in the graph alignment problem. In Confer enc e on L e arning The ory , pages 2080–2102. PMLR, 2021. [18] Georgina Hall and Laurent Massouli´ e. P artial reco very in the graph alignmen t problem. Op er ations R ese ar ch , 71(1):259–272, 2023. [19] Da vid Lee Hanson and F arroll Tim W righ t. A b ound on tail probabilities for quadratic forms in indep enden t random v ariables. The Annals of Mathematic al Statistics , 42(3):1079–1083, 1971. [20] Ehsan Kazemi, Hamed Hassani, Matthias Grossglauser, and Hassan Pezeshgi Mo darres. Prop er: global protein in teraction netw ork alignmen t through p ercolation matching. BMC bioinformatics , 17:1–16, 2016. 15 [21] Nitish Korula and Silvio Lattanzi. An efficien t reconciliation algorithm for so cial netw orks. arXiv pr eprint arXiv:1307.1690 , 2013. [22] Cheng Mao, Yihong W u, Jiaming Xu, and Sophie H Y u. T esting net work correlation efficiently via counting trees. The A nnals of Statistics , 52(6):2483–2505, 2024. 
[23] Arvind Naray anan and Vitaly Shmatiko v. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symp osium on Se curity and Privacy (sp 2008) , pages 111–125. IEEE, 2008. [24] P edram P edarsani and Matthias Grossglauser. On the priv acy of anonymized netw orks. In Pr o c e e dings of the 17th A CM SIGKDD international c onfer enc e on Know le dge disc overy and data mining , pages 1235–1243, 2011. [25] Sushil Mahavir V arma, Ir` ene W aldspurger, and Laurent Massouli ´ e. Graph alignment via birkhoff relaxation. arXiv pr eprint arXiv:2503.05323 , 2025. [26] Louis V assaux and Laurent Massouli ´ e. The feasibility of multi-graph alignmen t: a ba yesian approac h. arXiv pr eprint arXiv:2502.17142 , 2025. [27] Martin J W ain wright. High-dimensional statistics: A non-asymptotic viewp oint , volume 48. Cam bridge univ ersit y press, 2019. [28] Yihong W u, Jiaming Xu, and H Y u Sophie. Settling the sharp reconstruction thresholds of random graph matc hing. IEEE T r ansactions on Information The ory , 2022. [29] Yihong W u, Jiaming Xu, and Sophie H Y u. T esting correlation of unlabeled random graphs. The A nnals of Applie d Pr ob ability , 33(4):2519–2558, 2023. [30] Jiaming Xu. Sp ectral graph matc hing and regularized quadratic relaxations. 2019. [31] Jo onh yuk Y ang and Hy e W on Chung. Exact graph matching in correlated gaussian-attributed erd˝ os-r´ enyi mo de. In 2024 IEEE International Symp osium on Information The ory (ISIT) , pages 3450–3455. IEEE, 2024. [32] Jo onh yuk Y ang and Hye W on Chung. Exact matc hing in correlated netw orks with no de attributes for impro v ed comm unity recov ery . arXiv pr eprint arXiv:2501.02851 , 2025. [33] K Zeynep. Datab ase Alignment: F undamental Limits and Multiple Datab ases Setting . PhD thesis, Boston Univ ersit y , 2024. [34] K Zeynep and Bobak Nazer. Detecting correlated gaussian databases. In 2022 IEEE Interna- tional Symp osium on Information The ory (ISIT) , pages 2064–2069. IEEE, 2022. A Pro of of Lemma 3.2 Pr o of. The first step in pro ving this lemma is to express V ∗ G ( π ) and V ∗ F ( π ) as standard quadratic forms. Accordingly , we construct the vector v G ∈ R n ( n − 1) b y em b edding the edge w eights so that eac h pair of corresponding entries A i,j and B π ∗ ( i,j ) are placed consecutively . Similarly , we define the v ector v F ∈ R 2 nd b y embedding the feature information in suc h a w a y that eac h pair of corresponding en tries X i,j and Y π ∗ ( i ) ,j app ear next to each other. The vectors v G and v F are indep endent Gaussian random vectors with zero mean and cov ariance matrices Σ G and Σ F , respectively , where both matrices are block diagonal with blocks given b y  1 ρ ρ 1  and  1 η η 1  . 16 No w, we can represen t the quadratic forms as V ∗ G ( π ) = v ⊤ G M π G v G and V ∗ F ( π ) = v ⊤ F M π F v F , where M π G (resp. M π F ) is a block-diagonal matrix. Sp ecifically , for eac h pair ( A i,j , B π ∗ ( i,j ) ) (resp. ( X i,j , Y π ∗ ( i ) ,j )), if π ∗ ( i, j )  = π ( i, j ) (resp. π ∗ ( i )  = π ( i )), then the corresponding blo ck is given by  0 0 . 5 0 . 5 0  , and otherwise the blo ck is the zero matrix. Ha ving established these quadratic forms for Gaussian random vectors, w e can apply the classical Hanson–W right inequality [19] to deriv e concen tration inequalities for V ∗ G and V ∗ F . Lemma A.1 (Hanson–W right Inequality) . L et z b e a r andom ve ctor with i.i.d. standar d Gaussian entries, and let A b e a deterministic squar e matrix. 
Then ther e exists a universal c onstant c > 0 such that P     z ⊤ A z − E h z ⊤ A z i    > ε  ≤ 2 exp  − c ε 2 ∥ A ∥ 2 F + ∥ A ∥ op ε  . Pr o of. A pro of can be found in [19]. T o apply the Hanson–W right inequalit y in our setting, w e ma y replace without loss of generalit y v G with Σ 1 / 2 G z , where z is a standard Gaussian v ector. Under this representation, for a fixed π w e obtain P    v T G M π G v G − E  v T G M π G v G    > ε  = P     z ⊤ Σ 1 / 2 G M π G Σ 1 / 2 G z − E h z ⊤ Σ 1 / 2 G M π G Σ 1 / 2 G z i    > ε  ≤ 2 exp − c ε 2 ∥ Σ 1 / 2 G M π G Σ 1 / 2 G ∥ 2 F + ∥ Σ 1 / 2 G M π G Σ 1 / 2 G ∥ op ε ! . (8) Applying the same argumen t to V ∗ F ( π ) yields P ( | V ∗ F ( π ) − E [ V ∗ F ( π )] | > ε ) ≤ 2 exp − c ε 2 ∥ Σ 1 / 2 F M π F Σ 1 / 2 F ∥ 2 F + ∥ Σ 1 / 2 F M π F Σ 1 / 2 F ∥ op ε ! . Note that, b y the definitions of M π F and Σ F , we can b ound the op erator norm as ∥ Σ 1 / 2 F M π F Σ 1 / 2 F ∥ op = ∥ Σ F M π F ∥ op ≤ ∥ Σ F ∥ op ∥ M π F ∥ op ≤ 1 2 (1 + η ) ≤ 1 . (9) Moreo ver, the F rob enius norm can b e b ounded as follo ws: ∥ Σ 1 / 2 F M π F Σ 1 / 2 F ∥ 2 F = ∥ Σ F M π F ∥ 2 F ( a ) = d | D π | 4 (1 + η 2 ) ≤ d | D π | , (10) where step ( a ) follows from the fact that Σ F M π F is blo ck diagonal with d | D π | nonzero blo cks of the form  0 . 5 η 0 . 5 0 . 5 0 . 5 η  , and all remaining blocks equal to zero. By the same reasoning, for the graph term we obtain ∥ Σ 1 / 2 G M π G Σ 1 / 2 G ∥ op ≤ 1 , ∥ Σ 1 / 2 G M π G Σ 1 / 2 G ∥ 2 F ≤ | D E π | . (11) W e pro ceed first b y proving the V ∗ F part of Theorem 3.2 and then the V ∗ G part. T o derive a tigh t b ound for V ∗ F , w e exploit the fact that for all p erm utations sharing the same set D π , the v alue of V ∗ F is identical, since it dep ends only on the set of no des con tained in D π , rather than on the specific mapping induced b y π . Therefore, we use the notation V ∗ F ( D ) to denote V ∗ F ( π ) for all π suc h that D π = D . Based on this prop ert y , w e obtain 17 P ( ∃ t ∈ [ n ] , ∃ π ∈ S n,t : | V ∗ F ( π ) − E [ V ∗ F ( π )] | > ε ) = P ( ∃ t ∈ [ n ] , ∃ D ⊆ [ n ] : | D | = t, | V ∗ F ( D ) − E [ V ∗ F ( D )] | > ε ) = n X t =1 X D ⊆ [ n ] | D | = t P ( | V ∗ F ( D ) − E [ V ∗ F ( D )] | > ε ) ( a ) ≤ n X t =1  n t  2 exp  − c ε 2 dt + ε  where in (a) we apply the Hanson–W righ t inequality (8) . No w, b y setting ε = C t q max  d, log  en t  log  en t  for some sufficien tly large constan t C > 0, we obtain n X t =1  n t  2 exp  − c ε 2 dt + ε  = n X t =1  n t  2 exp   − c C 2 t 2 max { d, log  en t  } log  en t  dt + C t q max { d, log  en t  } log  en t    ≤ n X t =1  n t  2 exp  − C ′ t log  en t  ( a ) ≤ n X t =1 exp  − C ′′ t log  en t  ( b ) ≤ exp  − ( C ′′ − 1) log ( n )  = o (1) , where C ′ , C ′′ > 1 are constants. Step ( a ) follows from the inequality  n t  ≤  en t  t , and step ( b ) follo ws from the fact that the minimum of t log  en t  o ver t ∈ [ n ] is attained at t = 1, yielding log( n ) + 1. W e no w turn to the pro of of the V ∗ G part. The idea b ehind obtaining a tigh t bound is similar to that of V ∗ F , namely , grouping together all p erm utations that yield identical v alues of V ∗ G . Ho wev er, the situation is more complicated than in the case of V ∗ F , since V ∗ G ( π ) is determined b y D E π rather than D π . Consequently , since p ermutations with the same D π do not necessarily share the same D E π , they may not pro duce the same v alue of V ∗ G . 
Nevertheless, w e know that for any π and π ′ suc h that D π = D π ′ , we hav e   D E π △ D E π ′   ≤   D π   since the only edges in D E π that are not in D E π can b e pairs of no des in D π whic h are at most | D π | / 2. Moreov er, b oth   D E π   and   D E π ′   are of order O ( n | D π | ). Therefore, the symmetric difference b etw een D E π and D E π ′ is negligible. This implies that for all p erm utations with the same D π , the v alues of V ∗ G ( π ) are appro ximately equal. W e formalize this observ ation in the follo wing lemma. Lemma A.2. F or D ⊆ [ n ] , define the set of al l p ermutations whose set of unfixe d p oints is e qual to D as P D : = { π ∈ S n : D π = D } , and supp ose that ˜ π is an arbitr ary p ermutation in P D . If ε > C | D | log ( | D | ) for some lar ge enough c onstant C > 0 , then we obtain P ( ∃ π ∈ P D : | V ∗ G ( π ) − E [ V ∗ G ( π )] | > ε ) ≤ P  | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | > ε 2  + exp  − C ′ ε  . Pr o of. F or π , ˜ π ∈ P D define V ∗ G ( π \ ˜ π ) : = X 1 ≤ i ε     | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | ≤ ε 2  = P  | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] + V ∗ G ( π \ ˜ π ) − V ∗ G ( ˜ π \ π ) − E [ V ∗ G ( π \ ˜ π ) − V ∗ G ( ˜ π \ π )] | > ε     | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | ≤ ε 2  ≤ P     V ∗ G ( π \ ˜ π ) − E h V ∗ G ( ˜ π \ ˜ π ) i    > ε 4  + P  | V ∗ G ( ˜ π \ π ) − E [ V ∗ G ( ˜ π \ π )] | > ε 4  ( a ) ≤ 2 exp  − c ε 2 8 | D | + 4 ε  + 2 exp  − c ε 2 8 | D | + 4 ε  ≤ exp  − C ′′ ε  , where in (a) we use the Hanson-W right inequality and the fact that   D E π \ D E ˜ π   ≤ | D | / 2 which enables us to improv e the denominator in the exponential term, replacing n | D | with | D | . No w, w e can prov e the lemma. P ( ∃ π ∈ P D : | V ∗ G ( π ) − E [ V ∗ G ( π )] | > ε ) ≤ P  | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | > ε 2  + P  ∃ π ∈ P D \ { ˜ π } : | V ∗ G ( π ) − E [ V ∗ G ( π )] | > ε     | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | ≤ ε 2  ≤ P  | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | > ε 2  + X π ∈P D \{ ˜ π } P  | V ∗ G ( π ) − E [ V ∗ G ( π )] | > ε     | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | ≤ ε 2  ( a ) ≤ P  | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | > ε 2  + e | D | log( | D | ) P  | V ∗ G ( π ) − E [ V ∗ G ( π )] | > ε     | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | ≤ ε 2  ( b ) ≤ P  | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | > ε 2  + exp  − C ′ ε  . Here, step ( a ) follows from the b ound |P D | ≤ | D | ! ≤ | D | | D | , while step (b) holds under the assumption that ε > C | D | log ( | D | ). Based on Theorem A.2, we now pro ve the first inequality of Theorem 3.2. Supp ose ε = C t q n log  en t  for some sufficien tly large constan t C > 0. P ( ∃ t ∈ [ n ] , ∃ π ∈ S n,t : | V ∗ G ( π ) − E [ V ∗ G ( π )] | > ε ) = n X t =1 X D ⊆ [ n ] | D | = t P ( ∃ π ∈ P D : | V ∗ G ( π ) − E [ V ∗ G ( π )] | > ε ) ( a ) ≤ n X t =1 X D ⊆ [ n ] | D | = t P  | V ∗ G ( ˜ π ) − E [ V ∗ G ( ˜ π )] | > ε 2  + exp  − C ′ ε  ≤ n X t =1  n t  2 exp   − c C 2 t 2 n log  en t  4 nt + 2 C t q n log  en t    + n X t =1  n t  exp  − C ′ C t r n log  en t   ( b ) ≤ n X t =1 exp  − C ′′ t log  en t  + n X t =1 exp  − C ′′ t √ n  = o (1) , 19 where (a) follows from the Theorem A.2, nothing that ε > t log n , and in ( b ) we used n ≥ p n log ( en/t ) in the first part of RHS, and p n log ( en/t ) ≥ √ n in the second part. Combining the b ounds for V ∗ F and V ∗ G completes the proof of Theorem 3.2. B Pro of of Lemma 3.4 Pr o of. 
The k ey idea in pro ving this lemma is to condition the exp onen tial terms on B and Y . Without loss of generalit y , we assumed π ∗ = id . Thus, we ha ve A i,j = ρB i,j + p 1 − ρ 2 Z i,j and X i,j = η Y i,j + p 1 − η 2 Z ′ i,j , where Z i,j and Z ′ i,j are indep endent standard Gaussian random v ariables. In this case, w e ha ve V G ( π ) | B ∼ ρ X 1 ≤ i 0 is a universal c onstant. Then the event H 2 o c curs with pr ob ability at le ast 1 − o (1) . 20 Pr o of. The pro of tec hnique is closely related to that of Theorem 3.2. W e define vectors b ∈ R n ( n − 1) 2 and y ∈ R nd b y v ectorizing the edge information B i,j and the feature information Y i,j , resp ectively . Therefore, b and y are indep endent standard Gaussian v ectors. Analogous to Theorem 3.2, we can express Cov B ( π , π ′ ) and Co v Y ( π , π ′ ) in quadratic form as Co v B ( π , π ′ ) = b ⊤  ( 1 N 1 ⊤ N − I N ) ⊙ Π B  ⊤ Π ′ B b and Co v Y ( π , π ′ ) = y ⊤  ( 1 N ′ 1 ⊤ N ′ − I N ′ ) ⊙ Π Y  ⊤ Π ′ Y y , where Π B and Π Y (resp. Π ′ B and Π ′ Y ) are p erm utation matrices asso ciated with the p erm utation π (resp. π ′ ) for b and y , resp ectively . Here, w e define N : = n ( n − 1) / 2 and N ′ : = nd , and ⊙ denotes the Hadamard (elemen t wise) product. Note that      ( 1 N 1 ⊤ N − I N ) ⊙ Π B  ⊤ Π ′ B     op ≤    ( 1 N 1 ⊤ N − I N ) ⊙ Π B    op   Π ′ B   op ≤ ∥ Π B ∥ op   Π ′ B   op ≤ 1      ( 1 N ′ 1 ⊤ N ′ − I N ′ ) ⊙ Π Y  ⊤ Π ′ Y     op ≤    ( 1 N ′ 1 ⊤ N ′ − I N ′ ) ⊙ Π Y    op   Π ′ Y   op ≤ ∥ Π Y ∥ op   Π ′ Y   op ≤ 1 . Moreo ver, w e ha v e      ( 1 N 1 ⊤ N − I N ) ⊙ Π B  ⊤ Π ′ B     F =    ( 1 N 1 ⊤ N − I N ) ⊙ Π B    F = q | D E π | ≤ p | D π | n      ( 1 N ′ 1 ⊤ N ′ − I N ′ ) ⊙ Π Y  ⊤ Π ′ Y     F =    ( 1 N ′ 1 ⊤ N ′ − I N ′ ) ⊙ Π Y    F = p | D π | d Applying Hanson-W righ t inequalit y (Theorem A.1) and setting ε = C t √ n log n for Co v B ( π , π ′ ) and ε = C t √ max d, log n log n for Cov Y ( π , π ′ ), for π ∈ S n,t , we obtain P ( H c 2 ) ≤ n exp  log( |S n,t | 2 ) − C ′ t log ( n )  + n exp  log( |S n,t | 2 ) − C ′′ t log ( n )  = o (1) . This completes the pro of. Based on Theorem B.1, we can now complete the pro of of Lemma 3.4. W e hav e E h e ρ 1 − ρ 2 V G ( π ) 1 H 2 i = E h E h e ρ 1 − ρ 2 V G ( π ) 1 H 2    B ii ( a ) = E     exp     ρ 2 1 − ρ 2 X 1 ≤ i 0 . Then, for any function f : R n → R that is L -Lipschitz with r esp e ct to the Euclide an norm, we have P ( | f ( X ) − E [ f ( X )] | ≥ t ) ≤ 2 exp  − γ t 2 4 L 2  . In our setting, if w e collect all random v ariables B , A , Y , X into a single vector p , this v ector is a Gaussian random v ector with zero mean and cov ariance matrix Σ p , whose largest eigen v alue is λ max (Σ p ) = max { 1 + ρ, 1 + η } ≤ 2. Therefore, w e ha v e γ ≥ 0 . 5. Next, we compute the gradient of log ( Z ) with resp ect to the vector p . Throughout the computa- tion, we abbreviate Z = X π ∈S n exp( − H ( π )) . First, consider the deriv atives with resp ect to the v ariables B i,j . W e hav e ∂ B i,j log( Z ) = ρ 1 − ρ 2 X π : π ( i,j )  =( i,j ) ( A π − 1 ( i,j ) − A i,j ) e − H ( π ) Z = − ρ 1 − ρ 2 A i,j + ρ 1 − ρ 2 X 1 ≤ i ′ 0 it holds that P  log( Z ) 1 H 1 ∩H 2 ≥ E [log ( Z )] − C n (log n ) 3 4  ≤ P  log( Z ) ≥ E [log( Z )] − C n (log n ) 3 4  ≤ exp  − Ω  n p log n  ≤ exp  − o  n p log n  ≤ P (log( Z ) 1 H 1 ∩H 2 ≥ εn log n (1 − o (1))) . This implies that E [log( Z )] ≥ 1+ ε 2 n log n (1 + o (1)), which completes the pro of of the lemma. 
W e now pro ceed to establish a lo wer b ound for N 1 H 1 ∩H 2 in order to complete the pro of. The first step is to derive a b ound on the conditional exp ectation E [ N 1 H 1 ∩H 2 | B , Y ]: E [ N 1 H 1 ∩H 2 | B , Y ] ( a ) = |S n,n | P  N  0 , ρ 2 σ 2 B ( π ) 1 − ρ 2 + η 2 σ 2 Y ( π ) 1 − η 2  ≥ (1 − ε ) n log n, H 1 ∩ H 2  ( b ) ≥ |S n,n | P ( N (0 , (1 − ε ) n log n (1 − o (1))) ≥ (1 − ε ) n log n (1 − o (1))) ( c ) ∼ |S n,n | 1 √ 2 π (2(1 − ε ) n log n (1 − o (1))) 3 / 2 exp( − (1 − ε ) n log n (1 − o (1))) ( d ) = exp  1 + ε 2 n log n (1 − o (1))  , where (a) follows from the definition of σ 2 B ( π ), σ 2 Y ( π ) and T ; (b) applies Lemma B.1 together with the fact that d = ω ( log ( n )); (c) follo ws from the asymptotic relation P ( N (0 , 1) ≥ t ) ∼ 1 √ 2 π t exp ( − t 2 / 2); and (d) uses the fact that |S n,n | = exp  n log n (1 − o (1))  . 25 F rom now on, we assume that ρ 2 1 − ρ 2 n + 2 η 2 1 − η 2 d = 2(1 − ε ) log n, since establishing the lemma under this equality suffices, as the result then extends to the case ρ 2 1 − ρ 2 n + 2 η 2 1 − η 2 d < 2(1 − ε ) log n. F or simplicity , we in tro duce new Gaussian v ariables. F or π ∈ S n,n , define W ( π ) : = ρ √ 1 − ρ 2 P 1 ≤ i p (1 − ε ) n log n, W ( π ′ ) > p (1 − ε ) n log n  = X π ,π ′ ∈S n,n | D π ∩ π ′ | = o  n √ log( n )  P  W ( π ) > p (1 − ε ) n log n, W ( π ′ ) > p (1 − ε ) n log n  + X π ,π ′ ∈S n,n | D π ∩ π ′ | =Ω  n √ log( n )  P  W ( π ) > p (1 − ε ) n log n, W ( π ′ ) > p (1 − ε ) n log n  . (15) The motiv ation for this decomp osition is as follo ws. Most pairs of p erm utations fall in to the first group, where the correlation is negligible, i.e., E [ W ( π ) W ( π ′ )] = o (1). In con trast, pairs in the second group may exhibit non-negligible correlation, but the n umber of such pairs is asymptotically negligible compared to the first group. Accordingly , we treat these t w o cases separately and apply the following lemma to bound eac h group. Lemma C.4. L et W 1 and W 2 b e jointly Gaussian r andom variables with zer o me an and varianc es 1 + b (1) n and 1 + b (2) n , r esp e ctively, wher e b (1) n = o (1) and b (2) n = o (1) . Assume their c orr elation c o efficient is α n ∈ [0 , 1] . Then, for any se quenc e t n → ∞ , the fol lowing statements hold: (i) Gener al ly, P ( W 1 > t n , W 2 > t n ) ≤ (1+ o (1)) 1 + b (1) n + b (2) n 2 + (1 + o (1)) α n √ 2 π t n exp   − t 2 n 1 + b (1) n + b (2) n 2 + (1 + o (1)) α n   . (ii) Particularly, if α n → 0 , then for sufficiently lar ge n , P ( W 1 > t n , W 2 > t n ) ≤ exp − 8 t 2 n 1 + b (1) n ! + exp  O  α n t 2 n  P ( W 1 > t n ) P ( W 2 > t n ) . Pr o of. P art (i): Since W 1 + W 2 is a Gaussian random v ariable with v ariance 2 + b (1) n + b (2) n + 2 q 1 + b (1) n q 1 + b (2) n α n w e can b ound the join t probability as follows: P ( W 1 > t n , W 2 > t n ) ≤ P ( W 1 + W 2 > 2 t n ) ( a ) ≤ (1 + o (1)) 1 + b (1) n + b (2) n 2 + (1 + o (1)) α n √ 2 π t n exp   − t 2 n 1 + b (1) n + b (2) n 2 + (1 + o (1)) α n   . where in ( a ) w e use this fact that q 1 + b (1) n q 1 + b (2) n = 1 + o (1) since b (1) n = o (1) and b (2) n = o (1). P art (ii): One can easily v erify that W 1 and W 2 can b e replaced b y random v ariables q 1 + b (1) n W and q 1 + b (2) n  α n W + p 1 − α 2 n W ′  where W and W ′ are tw o indep enden t standard Gaussian 27 random v ariables. Moreov er, for a standard Gaussian random v ariable lik e W w e ha v e P ( W > 4 t n ) ≤ exp  − 8 t 2 n  . 
Hence, P ( W 1 > t n , W 2 > t n ) ≤ P ( W 1 > 4 t n ) + P ( W 1 > t n ) P  W 2 > t n    t n < W 1 ≤ 4 t n  ≤ exp − 8 t 2 n 1 + b (1) n ! + P ( W 1 > t n ) P   q 1 + b (2) n W ′ > t n − 4 q 1 + b (2) n α n t n p 1 − α 2 n   ≤ exp − 8 t 2 n 1 + b (1) n ! + P ( W 1 > t n ) P ( W 2 > t n − O ( α n t n )) ≤ exp − 8 t 2 n 1 + b (1) n ! + exp  O  α n t 2 n  P ( W 1 > t n ) P ( W 2 > t n ) . Before applying Lemma C.4, we first establish b ounds on the num b er of pairs of p erm utations in each group, as stated in the follo wing lemma. Lemma C.5. (i) F or |S n,n | , the numb er of p ermutations satisfying D id ∩ π = ∅ , we have |S n,n | = n ! e  1 + O  1 ( n + 1)!  . (ii) F or e ach π ∈ S n,n , let R π : = { π ′ ∈ S n,n : D π ∩ π ′ = ∅ } . Then |R π | = n ! e 2  1 + O  1 ( n + 1)!  . Pr o of. Part (i): There is a w ell-kno wn problem in com binatorics, called the derangemen t problem, whic h concerns the n um b er of p ermutations of n ob jects such that none of them is mapp ed to itself. If we define, for i, j ∈ [ n ], the set J i → j : = { π ∈ S n : π ( i ) = j } , then S n,n =   [ i ∈ [ n ] J i → i   c . Therefore, by the inclusion–exclusion principle, we obtain |S n,n | = n ! −   X ∅  = I ⊆{ 1 ,...,n } ( − 1) | I | +1      \ i ∈ I J i → i        = n ! n X k =0 ( − 1) k k ! ! = n ! e  1 + O  1 ( n + 1)!  . 28 Part (ii): Similar to P art (i), for an arbitrary π ∈ S n,n w e can rewrite R π as R π =   [ i ∈ [ n ]  J i → i ∪ J i → π ( i )    c . Since π ∈ S n,n , we hav e that for all i ∈ [ n ], the sets J i → i and J i → π ( i ) are disjoint. Hence, for I ⊆ [ n ],      \ i ∈ I  J i → i ∪ J i → π ( i )       = 2 | I |  n | I |  ( n − | I | )! . Therefore, by applying the inclusion–exclusion principle, we obtain |R π | = n ! n X k =0 ( − 2) k k ! ! = n ! e 2  1 + O  1 ( n + 1)!  . No w, according to Lemma C.5, we can deriv e a low er b ound on the n um b er of pairs of permuta- tions such that | D π ∩ π ′ | ≤ m for m ∈ { 0 , . . . , n } :   { π , π ′ ∈ S n,n : | D π ∩ π ′ | ≤ m }   = X π ∈S n,n   m X t =0 X I ⊆ [ n ]: | I | = t   { π ′ ∈ S n,n : D π ∩ π ′ = I }     ≥ |S n,n | m X t =0  n t  ( n − t )! e 2  1 + O  1 ( n + 1)!  ≥ ( n !) 2 e 3  1 + O  1 ( n + 1)!  m X t =0 1 t ! ( a ) ≥ ( n !) 2 e 2  1 + c ( m + 1)!  , where in ( a ), 0 < c < 1 is constan t and w e used appro ximation of e b y its T aylor series. This, in turn, implies an upp er b ound for the num b er of pairs of permutations suc h that | D π ∩ π ′ | > m :   { π , π ′ ∈ S n,n : | D π ∩ π ′ | > m }   = |S n,n | 2 −   { π , π ′ ∈ S n,n : | D π ∩ π ′ | ≤ m }   ≤ ( n !) 2 e 2 c ( m + 1)! . (16) This helps us to b ound the num b er of permutations in the second group in (15) . On the other hand, the num b er of pairs of p ermutations in the first group in 15 can also b e simply upper bounded as     { π , π ′ ∈ S n,n : | D π ∩ π ′ | ≤ √ n log( n ) }     ≤ |S n,n | 2 = ( n !) 2 e 2  1 + O  1 ( n + 1)!  . No w we turn to Lemma C.4 and use it to b ound (15) . F or each pair π , π ′ ∈ S n,n suc h that | D π ∩ π ′ | = o  n √ log( n )  , from (14) w e ha ve E [ W ( π ) W ( π ′ )] ≤ o  n √ log( n )  n + o ( n p log( n )) (1 − ε ) n log ( n ) → 0 . 29 Moreo ver, E [ W ( π ) W ( π ′ )]  p (1 − ε ) n log ( n )  2 ≤ (1 + o (1))(1 − ε ) o  n p log( n )  + o  n p log( n )  = o  n p log( n )  . 
Hence, based on Lemma C.4 part ( ii ), w e hav e: X π ,π ′ ∈S n,n | D π ∩ π ′ | = o  n √ log( n )  P  W ( π ) > p (1 − ε ) n log n, W ( π ′ ) > p (1 − ε ) n log n  ≤ X π ,π ′ ∈S n,n | D π ∩ π ′ | = o  n √ log( n )  e − 8(1 − ε ) n log( n ) (1 − o (1)) + exp  o  n √ log( n )   P  W ( π ) > p (1 − ε ) n log n  P  W ( π ′ ) > p (1 − ε ) n log n  ≤ |S n,n | 2 e − 8(1 − ε ) n log( n ) (1 − o (1)) + exp  o  n √ log( n )   X π,π ′ ∈S n,n P  W ( π ) > p (1 − ε ) n log n  P  W ( π ′ ) > p (1 − ε ) n log ( n )  ( a ) = o (1) + exp  o  n p log( n )  E [ N 1 H 1 ∩H 2 | B , Y ] 2 = exp  o  n p log( n )  E [ N 1 H 1 ∩H 2 | B , Y ] 2 , where in ( a ) we used the fact that |S n,n | 2 = exp (2 n log ( n )(1 − o (1))) and E [ N 1 H 1 ∩H 2 | B , Y ] 2 = P π ,π ′ ∈S n,n P  W ( π ) > p (1 − ε ) n log ( n )  P  W ( π ′ ) > p (1 − ε ) n log ( n )  . The only remained part is to control the second group in (15) . As explained before, this group consists of pair of p ermutations with strong correlation but the size of the whole group is small whic h helps to control it and sho w that it is of order of o  E [ N 1 H 1 ∩H 2 | B , Y ] 2  . More precisely , 1 E [ N 1 H 1 ∩H 2 | B , Y ] 2 X π ,π ′ ∈S n,n | D π ∩ π ′ | =Ω  n √ log( n )  P  W ( π ) > p (1 − ε ) n log n, W ( π ′ ) > p (1 − ε ) n log n  ( a ) ≤ exp ( − (1 + ε ) n log n (1 − o (1))) X m =Ω  n √ log( n )  exp ((2 n log n − m log m )(1 − o (1))) exp   − (1 − ε ) n log n (1 − o (1)) 1 + b ( π ) n + b ( π ′ ) n 2 + (1 + o (1)) α n   ( b ) ≤ X m =Ω  n √ log( n )  exp  (2 n log n − m log m )(1 − o (1)) − (1 − ε ) n log n (1 − o (1)) 1 + (1 + o (1)) m n − (1 + ε ) n log n (1 − o (1))  ≤ X m =Ω  n √ log( n )  exp  (1 − ε )(1 + o (1)) m log n 1 + (1 + o (1)) m n − m log m  ( c ) ≤ X m =Ω  n √ log( n )  exp  (1 − ε )(1 + o (1)) m log n 1 + (1 + o (1)) m n − m log n (1 − o (1))  ≤ X m =Ω  n √ log( n )  exp  (1 − ε )(1 + o (1)) 1 + (1 + o (1)) m n − 1  m log n (1 + o (1))  ( d ) = o (1) , 30 where ( a ) follows from these facts that E [ N 1 H 1 ∩H 2 | B , Y ] ≥ exp  1+ ε 2 n log n (1 + o (1))  and for all pair of permutations with | D π ∩ π ′ | = m w e can use the same bound based on Lemma C.4 and the n umber of all them is based on equation (16) is of order of exp ((2 n log n − m log m )(1 − o (1))) . In ( b ) w e use this fact that for pair of p erm utations that | D π ∩ π ′ | = m the correlation coefficient equals α n = m/n , also since b ( π ) ) n = O (1 / √ n + 1 / √ d ) (similarly for b ( π ′ ) ) n ) and d = ω ( log n ) the term b ( π ) ) n + b ( π ′ ) ) n is of order of o ( m/n ) for m = Ω( n/ √ log n ). Inequality ( c ) holds since m = Ω( n/ √ log n ) and therefore m log m = m log n (1 − o (1)). ( d ) is true since the all p ossible v alue for m is at most n and the exp onen tial term for all m = Ω( n/ √ log n ) is of order of − θ ( m log n ). Based on the prop osed b ounds for the tw o groups in (15), w e can conclude that E [ N 2 1 H 1 ∩H 2 | B , Y ] ≤ (1 + o (1)) exp  o  n p log n  E [ N 1 H 1 ∩H 2 | B , Y ] . Therefore, the P aley-Zygm und inequalit y (13) prov es the Lemma C.3 Finally , based on Lemma C.1 and Lemma C.3, Lemma 3.6 is prov ed. D Pro ofs of Section 4 D.1 Pro of of Lemma 4.1 Pr o of of L emma 4.1. W e take α 0 = ε 1+ ε and first deal with the case 2 ≤ t ≤ α 0 n . Observ e that V ( π ) ( a ) = ρ 2 1 − ρ 2     X 1 ≤ i ρ 2 t  n − t 2  1 − ρ 2 + η 2 td 1 − η 2 ! (1 − o (1)) ! ≤ P N (0 , 1) > s 1 2  ρ 2 1 − ρ 2 t  n − t 2  + η 2 1 − η 2 td  (1 − o (1)) ! 
D Proofs of Section 4

D.1 Proof of Lemma 4.1

Proof of Lemma 4.1. We take $\alpha_0 = \frac{\varepsilon}{1+\varepsilon}$ and first deal with the case $2 \le t \le \alpha_0 n$. Observe that, for $\pi \in \mathcal{S}_{n,t}$ and conditionally on the event $\mathcal{H}_2$, the statistic $V(\pi)$ is Gaussian with mean $\big(\frac{\rho^2}{1-\rho^2}t(n-\frac{t}{2}) + \frac{\eta^2}{1-\eta^2}td\big)(1-o(1))$ and variance twice this mean, up to a $(1+o(1))$ factor. Hence
$$\mathbb{P}(V(\pi) < 0,\, \mathcal{H}_2) \le \mathbb{P}\left(\mathcal{N}(0,1) > \sqrt{\frac{1}{2}\left(\frac{\rho^2}{1-\rho^2}t\Big(n - \frac{t}{2}\Big) + \frac{\eta^2}{1-\eta^2}td\right)(1-o(1))}\right)$$
$$\overset{(a)}{\le} \exp\left(-\frac{1}{4}\left(\frac{\rho^2}{1-\rho^2}t\Big(n-\frac{t}{2}\Big) + \frac{\eta^2}{1-\eta^2}td\right)(1+o(1))\right) \qquad (17)$$
$$\overset{(b)}{\le} \exp\left(-\frac{1}{4}\left(\frac{\rho^2}{1-\rho^2}tn\Big(1-\frac{\alpha_0}{2}\Big) + \frac{\eta^2}{1-\eta^2}td\Big(1-\frac{\alpha_0}{2}\Big)\right)(1+o(1))\right)$$
$$\overset{(c)}{\le} \exp\left(-(1+\varepsilon)\Big(1-\frac{\alpha_0}{2}\Big)t\log n\,(1+o(1))\right),$$
where in (a) we used the bound $\mathbb{P}(\mathcal{N}(0,1) > x) \le \exp(-x^2/2)$, (b) follows from $t \le \alpha_0 n$, and (c) follows from $\frac{\rho^2 n}{1-\rho^2} + \frac{\eta^2 d}{1-\eta^2} \ge 4(1+\varepsilon)\log n$. Now, using the union bound, we obtain
$$\mathbb{P}\left(\bigcup_{2\le t\le\alpha_0 n} E_t\right) \le \mathbb{P}(\mathcal{H}_2^c) + \mathbb{P}\left(\bigcup_{2\le t\le\alpha_0 n} E_t \cap \mathcal{H}_2\right) \le o(1) + \sum_{t=2}^{\lfloor\alpha_0 n\rfloor}|\mathcal{S}_{n,t}|\exp\left(-(1+\varepsilon)\Big(1-\frac{\alpha_0}{2}\Big)t\log n\,(1+o(1))\right)$$
$$\overset{(a)}{\le} o(1) + \sum_{t=2}^{\lfloor\alpha_0 n\rfloor}\exp\left(t\log n - (1+\varepsilon)\Big(1-\frac{\alpha_0}{2}\Big)t\log n\,(1+o(1))\right) \overset{(b)}{=} o(1) + \sum_{t=2}^{\lfloor\alpha_0 n\rfloor}\exp\left(-\frac{\varepsilon}{2}t\log n\,(1+o(1))\right)$$
$$\le o(1) + \frac{\exp\big(-\frac{\varepsilon}{2}\log n\,(1+o(1))\big)}{1-o(1)} = o(1),$$
where in (a) we used the bound $|\mathcal{S}_{n,t}| \le n^t = \exp(t\log n)$, and (b) follows from $\alpha_0 = \frac{\varepsilon}{1+\varepsilon}$.

We now deal with the remaining case $t > \alpha_0 n$. Here we show that $Z\,P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c) = o(1)$ with high probability; since $Z > 1$, this yields $P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c) = o(1)$ and rules out $V(\pi) < 0$ for all $\pi \in \mathcal{S}_{n,t}$ with $t > \alpha_0 n$, with high probability. For each $\pi \in \mathcal{S}_{n,t}$ we have
$$\mathbb{E}[\exp(-V(\pi))\mathbf{1}_{\mathcal{H}_1\cap\mathcal{H}_2}] = \mathbb{E}\left[\exp\left(-\frac{\rho}{1-\rho^2}\big(V_G^*(\pi) - V_G(\pi)\big) - \frac{\eta}{1-\eta^2}\big(V_F^*(\pi) - V_F(\pi)\big)\right)\mathbf{1}_{\mathcal{H}_1\cap\mathcal{H}_2}\right]$$
$$\overset{(a)}{\le} \exp\left(-\left(\frac{\rho^2 t(n-\frac{t}{2})}{1-\rho^2} + \frac{\eta^2 td}{1-\eta^2}\right)(1-o(1))\right)\mathbb{E}\left[\exp\left(\frac{\rho}{1-\rho^2}V_G(\pi) + \frac{\eta}{1-\eta^2}V_F(\pi)\right)\mathbf{1}_{\mathcal{H}_2}\right]$$
$$\overset{(b)}{=} \exp\left(-\left(\frac{\rho^2 t(n-\frac{t}{2})}{1-\rho^2} + \frac{\eta^2 td}{1-\eta^2}\right)(1-o(1))\right)\mathbb{E}\left[\exp\left(\frac{\rho}{1-\rho^2}V_G(\pi)\right)\mathbf{1}_{\mathcal{H}_2}\right]\mathbb{E}\left[\exp\left(\frac{\eta}{1-\eta^2}V_F(\pi)\right)\mathbf{1}_{\mathcal{H}_2}\right]$$
$$\overset{(c)}{\le} \exp\left(-\frac{1}{2}\left(\frac{\rho^2}{1-\rho^2}t\Big(n-\frac{t}{2}\Big) + \frac{\eta^2}{1-\eta^2}td\right)(1-o(1))\right) \qquad (18)$$
$$\overset{(d)}{\le} \exp\left(-\frac{t}{4}\left(\frac{\rho^2}{1-\rho^2}n + \frac{\eta^2}{1-\eta^2}d\right)(1-o(1))\right) \overset{(e)}{\le} \exp\big(-(1+\varepsilon)t\log n\,(1-o(1))\big),$$
where in (a) we used the definition of the event $\mathcal{H}_1$, in (b) we used the independence between $V_G(\pi)$ and $V_F(\pi)$, (c) follows from the definition of the event $\mathcal{H}_2$, (d) is based on $n - t/2 \ge n/2$, and (e) follows from the fact that $\frac{\rho^2 n}{1-\rho^2} + \frac{\eta^2 d}{1-\eta^2} \ge 4(1+\varepsilon)\log n$.

Now we can control the posterior distribution. To this end, we first control its expectation:
$$\mathbb{E}[Z\,P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c)\mathbf{1}_{\mathcal{H}_1\cap\mathcal{H}_2}] = \sum_{t=\alpha_0 n}^{n}\sum_{\pi\in\mathcal{S}_{n,t}}\mathbb{E}[\exp(-V(\pi))\mathbf{1}_{\mathcal{H}_1\cap\mathcal{H}_2}] \le \sum_{t=\alpha_0 n}^{n}|\mathcal{S}_{n,t}|\exp\big(-(1+\varepsilon)t\log n\,(1-o(1))\big)$$
$$\le \sum_{t=\alpha_0 n}^{n}\exp\big(-\varepsilon t\log n\,(1-o(1))\big) = o(1).$$
Next, Markov's inequality yields
$$\mathbb{P}\big(Z\,P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c) > \log n\,\mathbb{E}[Z\,P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c)]\big)$$
$$\le \mathbb{P}\big((\mathcal{H}_1\cap\mathcal{H}_2)^c\big) + \mathbb{P}\big(\big\{Z\,P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c) > \log n\,\mathbb{E}[Z\,P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c)]\big\}\cap(\mathcal{H}_1\cap\mathcal{H}_2)\big) \le o(1) + \frac{1}{\log n} = o(1).$$
Therefore, with probability at least $1 - o(1)$, we have
$$Z\,P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c) \le \log n\,\mathbb{E}[Z\,P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c)] = o(1).$$
Since $Z > 1$, this means that $P_{\mathrm{post}}(B(\mathrm{id},\alpha_0)^c) = o(1)$ with probability at least $1 - o(1)$. This shows that $\mathbb{P}\big(\bigcup_{t>\alpha_0 n}E_t\big)$ is also of order $o(1)$, and completes the proof of the achievability of exact recovery.
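As a numerical illustration of the union-bound step in the proof of Lemma 4.1 (not part of the argument), one can evaluate the bound $\sum_{t=2}^{\lfloor\alpha_0 n\rfloor} n^t \exp\big(-(1+\varepsilon)(1-\frac{\alpha_0}{2})t\log n\big) = \sum_t \exp(-\frac{\varepsilon}{2}t\log n)$ directly and watch it vanish as $n$ grows. The Python sketch below is ours; the choice $\varepsilon = 0.5$ is arbitrary.

```python
import numpy as np

eps = 0.5                # arbitrary illustrative choice of epsilon
a0 = eps / (1 + eps)     # alpha_0 = epsilon / (1 + epsilon)
for n in [10**2, 10**3, 10**4]:
    t = np.arange(2, int(a0 * n) + 1)
    # each term: |S_{n,t}| * tail bound <= exp(t log n) * exp(-(1+eps)(1-a0/2) t log n)
    terms = np.exp(t * np.log(n) * (1 - (1 + eps) * (1 - a0 / 2)))
    print(f"n={n:>6}: union-bound sum <= {terms.sum():.3e}")
```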
D.2 Proof of Lemma 4.2

Proof of Lemma 4.2. To show these two statements, without loss of generality we assume that
$$\frac{\rho^2}{1-\rho^2}n + \frac{\eta^2}{1-\eta^2}d = 4\log n - \log\log n - a_n,$$
where $a_n = \omega(1)$ and $a_n = o(\log\log n)$.

For the first-moment part, similarly to equation (17) with $t = 2$, we have
$$\mathbb{E}[N] \ge \binom{n}{2}(1-o(1))\,\mathbb{E}\left[\mathbb{P}\left(\mathcal{N}(0,1) \ge \sqrt{\frac{\rho^2}{1-\rho^2}n + \frac{\eta^2}{1-\eta^2}d}\right)\right] = \binom{n}{2}(1-o(1))\,\mathbb{E}\left[\mathbb{P}\left(\mathcal{N}(0,1) \ge \sqrt{4\log n - \log\log n - a_n}\right)\right]$$
$$\sim \frac{n^2}{4\sqrt{2\pi}\sqrt{\log n}}\exp\left(-2\log n + \frac{\log\log n}{2} + \frac{a_n}{2}\right) = \frac{1}{4\sqrt{2\pi}}\exp\left(\frac{a_n}{2}\right) \to \infty.$$

For the second-moment analysis, we expand the second moment as follows:
$$\mathbb{E}[N^2] = \mathbb{E}[N] + \sum_{\substack{\pi,\pi'\in\mathcal{S}_{n,2} \\ \pi\cap\pi' = \emptyset}}\mathbb{P}\big(V(\pi) < 0,\, V(\pi') < 0,\, \mathcal{H}_2\big) + \sum_{\substack{\pi,\pi'\in\mathcal{S}_{n,2} \\ \pi\cap\pi' \ne \emptyset}}\mathbb{P}\big(V(\pi) < 0,\, V(\pi') < 0,\, \mathcal{H}_2\big). \qquad (19)$$
Moreover, each term in the above summations can be expressed as
$$\mathbb{P}\big(V(\pi) < 0,\, V(\pi') < 0,\, \mathcal{H}_2\big) = \mathbb{P}\big(G_\pi > r,\, G_{\pi'} > r,\, \mathcal{H}_2\big),$$
where $r = \sqrt{\frac{\rho^2}{1-\rho^2}n + \frac{\eta^2}{1-\eta^2}d} = (1+o(1))\sqrt{4\log n - \log\log n - a_n}$, and $G_\pi$ and $G_{\pi'}$ are two standard Gaussian random variables with covariance
$$c_n = \frac{\frac{\rho^2}{1-\rho^2}\sum_{(i,j)\in D^E_\pi\cap D^E_{\pi'}}\big(B_{\pi(i,j)} - B_{i,j}\big)\big(B_{\pi'(i,j)} - B_{i,j}\big) + \frac{\eta^2}{1-\eta^2}\sum_{j\in[d],\, i\in D_\pi\cap D_{\pi'}}\big(Y_{\pi(i),j} - Y_{i,j}\big)\big(Y_{\pi'(i),j} - Y_{i,j}\big)}{\prod_{\sigma\in\{\pi,\pi'\}}\left(\frac{\rho^2}{1-\rho^2}\sum_{1\le i<j\le n:\,(i,j)\in D^E_\sigma}\big(B_{\sigma(i,j)} - B_{i,j}\big)^2 + \frac{\eta^2}{1-\eta^2}\sum_{j\in[d],\, i\in D_\sigma}\big(Y_{\sigma(i),j} - Y_{i,j}\big)^2\right)^{1/2}}.$$
For the second summation in (19), i.e., pairs with $\pi\cap\pi' = \emptyset$, applying Lemma C.4 part (ii) we obtain
$$\sum_{\substack{\pi,\pi'\in\mathcal{S}_{n,2} \\ \pi\cap\pi' = \emptyset}}\mathbb{P}\big(G_\pi > r,\, G_{\pi'} > r,\, \mathcal{H}_2\big) \le (1+o(1))\binom{n}{2}\binom{n-2}{2}\left[C' e^{-8r^2} + (1+o(1))\,\mathbb{P}(G_\pi > r)\,\mathbb{P}(G_{\pi'} > r)\right] \le (1+o(1))\,\mathbb{E}[N]^2.$$
For the third summation in (19), i.e., pairs with $\pi\cap\pi' \ne \emptyset$, applying Lemma C.4 part (i) we obtain
$$\sum_{\substack{\pi,\pi'\in\mathcal{S}_{n,2} \\ \pi\cap\pi' \ne \emptyset}}\mathbb{P}\big(V(\pi) < 0,\, V(\pi') < 0,\, \mathcal{H}_2\big) = \sum_{\substack{\pi,\pi'\in\mathcal{S}_{n,2} \\ \pi\cap\pi' \ne \emptyset}}\mathbb{P}\big(G_\pi > r,\, G_{\pi'} > r\big) \le \frac{n(n-1)}{2}\times 2(n-2)\times(1+o(1))\,\frac{1+c_n}{\sqrt{2\pi}\,r}\exp\left(-\frac{r^2}{1+c_n}\right)$$
$$\le C'' n^3\log^{-1/2}(n)\exp\left(-\frac{16}{5}\log n + o(\log n)\right) = o(1) = o\big(\mathbb{E}[N]^2\big).$$
This completes the proof of the fact that $\mathbb{E}[N^2] \le (1+o(1))\,\mathbb{E}[N]^2$.
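As a sanity check of the first-moment computation (not part of the proof), one can numerically compare $\binom{n}{2}\,\mathbb{P}\big(\mathcal{N}(0,1) \ge \sqrt{4\log n - \log\log n - a_n}\big)$ with the predicted value $\frac{1}{4\sqrt{2\pi}}e^{a_n/2}$. The Python sketch below is ours; the choice $a_n = \log\log\log n$, which is $\omega(1)$ and $o(\log\log n)$, is one arbitrary admissible sequence.

```python
import math

def gaussian_tail(x):
    """P(N(0,1) > x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for n in [10**3, 10**5, 10**7]:
    a_n = math.log(math.log(math.log(n)))  # omega(1) and o(log log n)
    r = math.sqrt(4 * math.log(n) - math.log(math.log(n)) - a_n)
    first_moment = n * (n - 1) / 2 * gaussian_tail(r)
    prediction = math.exp(a_n / 2) / (4 * math.sqrt(2 * math.pi))
    print(f"n=1e{round(math.log10(n))}: first moment ~ {first_moment:.3f}, "
          f"prediction ~ {prediction:.3f}")
```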
