Maximum entropy based testing in network models: ERGMs and constrained optimization

Subhrosekhar Ghosh*†, Department of Mathematics, National University of Singapore, matghos@nus.edu.sg
Rathindra Nath Karmakar*‡, Department of Mathematics, Kyushu University, karmakar.rathindra.735@s.kyushu-u.ac.jp
Samriddha Lahiry*‡, Department of Statistics and Data Science, National University of Singapore, slahiry@nus.edu.sg

Abstract

Stochastic network models play a central role across a wide range of scientific disciplines, and questions of statistical inference arise naturally in this context. In this paper we investigate goodness-of-fit and two-sample testing procedures for statistical networks based on the principle of maximum entropy (MaxEnt). Our approach formulates a constrained entropy-maximization problem on the space of networks, subject to prescribed structural constraints. The resulting test statistics are defined through the Lagrange multipliers associated with the constrained optimization problem, which, to our knowledge, is novel in the statistical networks literature. We establish consistency in the classical regime where the number of vertices is fixed. We then consider asymptotic regimes in which the graph size grows with the sample size, developing tests for both dense and sparse settings. In the dense case, we analyze exponential random graph models (ERGMs), including the Erdős–Rényi model, while in the sparse regime our theory applies to Erdős–Rényi graphs. Our analysis leverages recent advances in nonlinear large deviation theory for random graphs.

* Authors are listed in alphabetical order of their surnames.
† Supported in part by Singapore MOE grants R-146-000-312-114, A-8002014-00-00, A-8003802-00-00, E-146-00-0037-01, A-8000051-00-00, A-0009806-01-00 and A-0004586-00-00.
‡ Corresponding authors.
We further show that the proposed Lagrange-multiplier framework connects naturally to classical score tests for constrained maximum likelihood estimation. The results provide a unified entropy-based framework for network model assessment across diverse growth regimes.

Keywords: Statistical Networks, Goodness-of-fit tests, Two-sample tests, Exponential Random Graph Models (ERGM), Maximum Entropy Principle (MaxEnt), Lagrange Multiplier Test, Nonlinear Large Deviations, Graph Limits

Contents

1 Introduction
  1.1 Testing in random network models and applications
    1.1.1 Goodness-of-fit testing in random networks
    1.1.2 Two-sample testing in random networks
  1.2 Testing based on maximum entropy
  1.3 Connection to Lagrange Multiplier tests
  1.4 Outline of the paper
  1.5 Notations
2 Background
  2.1 Hypothesis testing via MaxEnt principle
  2.2 Tests based on Lagrange multipliers
  2.3 Exponential random graph model
  2.4 Graph Limits
  2.5 Large deviation in random graphs
3 Main Results
  3.1 Generalities
  3.2 Networks of fixed size
    3.2.1 Consistency of the Lagrange Multiplier
    3.2.2 Asymptotic normality of the Lagrange Multiplier
    3.2.3 Goodness-of-fit test
  3.3 Networks of growing size: the sparse regime
    3.3.1 Consistency of the Lagrange Multiplier
    3.3.2 Asymptotic normality of the Lagrange Multiplier
    3.3.3 Goodness-of-fit test
    3.3.4 Two-sample test
  3.4 Networks of growing size: the dense regime
    3.4.1 Consistency of the Lagrange Multiplier
    3.4.2 Sharp rates
    3.4.3 Goodness-of-fit test
    3.4.4 Two-sample test
  3.5 A few key technical ingredients
    3.5.1 Uniqueness of roots
    3.5.2 One-sided vs two-sided tests
4 Proof Outline
  4.1 Proofs for a fixed number of vertices
  4.2 Proofs for the sparse regime
  4.3 Proofs for the dense regime
5 Discussion
A Proofs
B Proofs of Theorems 3.8 and 3.10
C Proof of technical lemmas
D Existence and Uniqueness of roots
  D.1 Conditions for existence and uniqueness of roots
  D.2 Existence and uniqueness of roots in network models
    D.2.1 Verification of Conditions
    D.2.2 Proof of Lemma 3.3

1 Introduction

Over the past two decades, the statistical modeling of network data using random graph models has received significant attention, with applications spanning diverse domains such as social networks, brain networks, and omics networks. From a statistical inference perspective, these networks have been extensively studied in terms of both estimation and hypothesis testing (cf. [42] and the references therein). Estimation focuses on inferring the underlying random network from observed data, whereas hypothesis testing aims to determine whether the given data conform to a specified random network model. In this article, we consider the setting where i.i.d. copies of a random graph of possibly growing size are provided, and the objective is to perform statistical testing under constraints based on expected motif counts of the underlying random graph model.

1.1 Testing in random network models and applications

In line with classical statistical testing setups, hypothesis testing in random networks can be formulated in two distinct ways.
The first approach pertains to goodness-of-fit testing, which seeks to determine whether a given network or a collection of networks originates from a specific random graph model with predefined parameters. In the context of random networks, goodness-of-fit testing has been widely explored in the literature; we refer the reader to [69, 22, 43, 12, 41, 45, 51, 31, 44, 21, 46] for a partial list, and the references therein. The second approach concerns two-sample testing, where two independent samples of random networks are provided, and the objective is to assess whether they are generated from the same underlying random graph model. This problem has been extensively studied for various random graph models; see for example [13, 33, 34, 3, 63, 1] among many others.

Both goodness-of-fit tests and two-sample tests are well-motivated by applications across a wide range of disciplines. In particular, goodness-of-fit testing is utilized to evaluate the suitability of protein-protein interaction (PPI) networks [49, 25] and to assess the fit of functional neuroimaging data [35]. Similarly, two-sample testing naturally arises in various applications. For example, it has been employed to analyze gene regulatory networks, where it is used to study topological changes under two different breast cancer treatments [73]; to investigate structural brain differences, where it helps compare anatomical variations between healthy individuals and schizophrenic patients [5]; and to advance computational biology, where it is applied in graph-based classification tasks [60].

1.1.1 Goodness-of-fit testing in random networks

In the context of goodness-of-fit testing, various approaches have been developed for different random graph models. For instance, [50] and [45] proposed motif-based tests for bipartite network models and kernel-based tests for general random graphs, respectively.
For inhomogeneous random graph models, a degree-based test has been introduced in [51], while [22] develops a test based on the adjacency matrix of the random graph. In the context of the stochastic block model, [43] also develops an adjacency matrix-based test. However, unlike [22], which considers multiple independent samples, [43] focuses on a test based on a single sample. Several other methods have also been explored. For example, [69] studies goodness-of-fit testing for exponential random graph models (ERGMs) and introduces a test based on kernel Stein discrepancies. Additionally, [12] presents a framework for testing the homogeneity of a random graph using graph functionals, which generalizes the subgraph count approach.

1.1.2 Two-sample testing in random networks

On the other hand, several two-sample tests have also been developed. A two-sample version of the adjacency matrix-based test for inhomogeneous random graphs is proposed in [22], while [33] explores similar questions using network summary statistics. Other two-sample tests include [63], which investigates whether two random dot product graphs share the same generating latent positions, using a test statistic based on the spectral decomposition of the adjacency matrix; [35], which constructs tests based on Fréchet means of the Laplacians; and [68], which examines comparisons of population means in network data, focusing on individual links and utilizing symmetric matrices. More recently, [19] proposed a general procedure for hypothesis testing in network data, aimed at distinguishing the distributions of two samples of networks.

1.2 Testing based on maximum entropy

In this article, we present a different test procedure based on subgraph counts.
In particular, rather than directly developing a test based on raw subgraph counts, we maximize the entropy of the empirical distribution of subgraph counts while imposing specific constraints on this distribution (see the next section for details). Maximizing entropy subject to constraints is a standard approach in statistical mechanics and information theory [39, 59], grounded in Jaynes' principle of maximum entropy [40]. In the context of network models, entropy maximization under constraints has been extensively explored in the statistical physics literature (see, for instance, [53]). In this paradigm, one considers probability distributions on the space of all graphs with a given set of nodes. Many random graph models arise out of such maximization, and the constraints represent the so-called topological properties of the network concerned (see [62] and the references therein).

Our method is inspired by a maximum entropy principle similar to the classical setting; however, we maximize the entropy of the empirical distribution from n samples of the random graph and construct our test statistics based on the Lagrange multiplier. On the other hand, the condition imposed by the null hypothesis translates to constraints on expected motif counts (which is analogous to moment constraints on distributions) and mirrors the topological constraints described in [62].

1.3 Connection to Lagrange Multiplier tests

Maximizing the entropy of an empirical distribution under constraints (moment constraints or otherwise) naturally leads to a Lagrange multiplier based formulation from an optimization point of view. Such an optimization procedure canonically yields an optimal Lagrange multiplier. For most statistical applications, modern-day practitioners often regard this object largely as a computational device, and it is typically not investigated for its broader statistical implications.
In this work, however, we demonstrate in our network-based setup that, under appropriate centering and scaling, this optimal Lagrange multiplier exhibits asymptotic normality under the null hypothesis. This asymptotic behavior forms the foundation of our proposed goodness-of-fit test. We further extend this framework to construct a two-sample test, utilizing the optimal Lagrange multipliers computed independently for each sample.

It turns out, in fact, that Lagrange multiplier-based tests have a long history in the classical statistical literature, particularly in the context of constrained likelihood maximization [2, 61]. Indeed, the well-known score test can be shown to be equivalent to the so-called Lagrange Multiplier test [61]. These methods are widely used in econometrics [11], where they underpin classical procedures such as the Breusch–Pagan test for heteroscedasticity [10]. However, the Lagrange Multiplier test has predominantly been applied to random variables in Euclidean spaces, and, to the best of our knowledge, hypothesis testing procedures based on Lagrange multipliers have not been developed for network models, where the growing size of the networks introduces additional analytical challenges. While the conceptual idea of using a Lagrange multiplier remains analogous to classical settings, the investigation of its limiting distribution in our framework relies on fundamentally different techniques. In particular, our analysis draws heavily from the state-of-the-art nonlinear large deviation theory.

A related direction was explored in [32] (also cf. [18]), where the authors studied semi-parametric maximum likelihood estimation (MLE) under constraints, with the optimal Lagrange multiplier appearing as a key analytical device.
While [32] focuses largely on distributions on Euclidean spaces, it includes suggestive empirical results for constrained likelihood maximization in a network-based setup, linking the optimal Lagrange multiplier to degeneracies in the exponential random graph model (ERGM). In contrast, our approach maximizes entropy rather than likelihood, which is more canonical from a statistical networks perspective, and we demonstrate that the resulting optimizer is highly informative from a hypothesis testing viewpoint.

Our Contributions: In this work, we develop a systematic procedure for hypothesis testing for random networks via an entropy maximization approach with constraints on motif counts, which serves as an analogue of moment constraints for network-structured data. In particular, we address the problems of goodness-of-fit testing and two-sample testing for random networks in the setting of Exponential Random Graph (abbrv. ERGM) models. Our method leverages the optimal Lagrange multiplier associated with the maximum entropy principle, whose asymptotic statistical properties are established as a cornerstone of our analysis. On a technical level, we demonstrate that this optimal Lagrange multiplier can be characterized as the root of an equation involving an exponentially tilted version of the relevant empirical motif count, which allows us to perform the asymptotic analysis necessary to derive our statistical results. We focus on three distinct setups:

Graphs with fixed number of vertices: In this setting, our analysis is quite general and encompasses the case where the underlying graph has a fixed number of vertices, i.e., the size of the graph does not grow with the sample size. In particular, we assume that samples are generated from an arbitrary distribution over graphs on a finite vertex set.
We establish the asymptotic normality of the Lagrange multiplier and construct a goodness-of-fit test based on expected motif counts. Specifically, we test whether the expected motif counts of the generating random graph model coincide with the expected motif counts of a fixed distribution G_0 on the same vertex set.

Dense graphs with increasing number of vertices: Here we consider the ERGM model where the number of vertices is allowed to grow with the sample size. Based on the principle of entropy maximization under constraints, we analyze the asymptotic behavior of the Lagrange multiplier, which then yields a goodness-of-fit test based on the expected motif counts. While the study of the optimization problem relies on techniques similar to those used in examining the free energy of ERGMs [16], understanding the behavior of the corresponding Lagrange multiplier requires more refined tools, drawing on the log-sum-exp approximation framework from the nonlinear large deviations literature [15]. Finally, since the dense Erdős–Rényi model G(N, p) with p = O(1) is a special case of our framework, we obtain a goodness-of-fit test for the edge probability p. In this setting our results further yield a natural two-sample testing procedure.

Sparse graphs with increasing number of vertices: In this case, we consider Erdős–Rényi models in the sparse regime, where the count of the relevant motif H converges to a Poisson distribution as the graph size grows. While a maximum-entropy-based test can be envisioned for more general sparse ERGM models, such as the framework in [20], these models are not readily amenable to the techniques developed for the dense setting and thus pose significant challenges. Accordingly, we focus on the sparse Erdős–Rényi setup and show that the exponentially tilted empirical H count satisfies a central limit theorem.
Leveraging the asymptotic theory for Z-estimators, we then establish the asymptotic normality of the centered and scaled Lagrange multiplier λ̂_n. As in the dense regime, this result immediately yields a goodness-of-fit test, as well as a two-sample test, for the edge probability p.

Although our analysis is centered on strictly balanced graph counts, which include important motifs such as cliques and cycles (see the notation section for a definition of strictly balanced graphs), the arguments can, in principle, be extended to general motif counts. However, we focus on strictly balanced graphs to maintain clarity of exposition. Moreover, while our study is restricted to the Erdős–Rényi model (both sparse and dense) and ERGMs (in the dense regime), the underlying principle of entropy maximization is quite general and may be adapted to other random graph models. However, even in these cases, the analysis of the Lagrange multiplier is technically involved, and we leave the extensions to more general models for future work.

1.4 Outline of the paper

In Section 2 we describe the ERGM model and the general hypothesis testing problem. We also describe our testing procedure based on the Lagrange multiplier, which is in turn obtained from empirical entropy maximization under constraints on motif counts. The analysis in the dense regime uses the theory of graphons, and hence we also include a brief introduction to graphons in Section 2. Section 3 contains the main results of our paper, while Section 4 is devoted to a short discussion of the key ideas used in the proofs. Finally, we conclude in Section 5 with a discussion of possible generalizations of our results. The proofs of the main theorems are given in Appendix A, while the technical lemmas underpinning our main arguments are provided in Appendix C.

1.5 Notations

We will use the following notation throughout the paper.
For a graph G we will denote its set of vertices as V(G), or simply V when there is no chance of confusion. Similarly, we will denote the edge set as E(G) or E. The numbers of vertices and edges are denoted as v(G) = |V(G)| and e(G) = |E(G)| respectively. We will denote the Erdős–Rényi random graph on N vertices with edge-connection probability p as G(N, p). Further, we define the density of G as d(G) := e(G)/v(G), and k(G) := max{ d(H) : H ⊆ G, e(H) ≥ 1 }. A graph is said to be balanced if k(G) = d(G), i.e., it is its own densest subgraph. It is said to be strictly balanced if it is strictly denser than all of its proper subgraphs. For example, cliques, cycles and trees are strictly balanced, whereas the subgraph formed by the union of two cliques is balanced but not strictly balanced. As mentioned before, we will restrict our motifs to strictly balanced subgraphs.

Let Aut(G) be the group of automorphisms of the graph G; we will denote the number of automorphisms as a(G) = |Aut(G)|. Next we rigorously define the number of copies of a subgraph H in a graph G. Let

hom(H, G) := |{ ϕ : V(H) → V(G) : {u, v} ∈ E(H) ⇒ {ϕ(u), ϕ(v)} ∈ E(G) }|,
inj(H, G) := |{ ϕ : V(H) → V(G) : ϕ injective and edge-preserving }|.

We define the unlabelled copy count (of the motif H in G) as

T(H, G) := inj(H, G) / |Aut(H)|.

We will also denote T(H, G) as H(G) and use the two notations interchangeably. Henceforth, we will refer to the number of unlabelled copies simply as the "number of copies" or "counts" for brevity. Now let N = v(G) and k = v(H). With these notations, we finally define the following notion of density of a motif, which is shown later to be asymptotically equal to the proportion of unlabelled copies of H among all possible k-vertex complete subgraphs of G:

t(H, G) := hom(H, G) / N^k.

We use O(·) and o(·) to denote "asymptotically bounded" and "asymptotically negligible" respectively; i.e., we say a_n = O(γ_n) if lim sup |a_n/γ_n| < ∞, and a_n = o(γ_n) if lim sup |a_n/γ_n| = 0. The notation a_n ≫ b_n will also be used to denote b_n = o(a_n). Similarly, O_p(·) and o_p(·) are reserved for the stochastic versions of the same quantities. We will use →_P to denote convergence in probability, while →_d is used to denote convergence in distribution. The ℓ1 and ℓ2 norms of vectors are denoted by ∥·∥₁ and ∥·∥ respectively, unless otherwise mentioned.

2 Background

In this section, we explain the main ideas behind the construction of our statistical tests, which are rooted in the Maximum Entropy principle, and provide a brief overview of the mathematical tools used in the analysis. We begin in Section 2.1 by describing the general testing framework. The null hypothesis naturally gives rise to a constrained optimization problem, in which the empirical entropy is maximized subject to structural constraints. The resulting Lagrange multipliers play a central role, and our test statistics are defined through their asymptotic behavior. In Section 2.2, we place our method in a broader statistical context by relating it to classical hypothesis testing procedures based on Lagrange multipliers, most notably the score test. Section 2.3 then introduces the exponential random graph model (ERGM) setting and outlines the basic assumptions of the model. Finally, we review the two main analytical tools required for our proofs: graph limit theory in Section 2.4 and large deviation principles for random graphs in Section 2.5.

2.1 Hypothesis testing via MaxEnt principle

Suppose X_1, . . . , X_n are i.i.d. P ∈ 𝒫, where 𝒫 is an arbitrary family of distributions. Let ϑ(·) be a statistical functional; that is, ϑ(P) is a real-valued function of P, defined for P ∈ 𝒫.
For example, if 𝒫 is the set of all distributions on ℝ with finite variance, ϑ(P) could be the mean of P or the variance of P. The general problem of testing functionals of distributions is as follows:

H_0 : ϑ(P) ∈ C   vs   H_1 : ϑ(P) ∉ C,   (2.1)

for an arbitrary set C ⊆ ℝ. This problem reduces to testing of parameters in parametric models if the model is identifiable, i.e., 𝒫 = {P_θ}_{θ∈Θ} and P_{θ_1} = P_{θ_2} implies θ_1 = θ_2. Indeed, we can define ϑ(P_θ) := θ and state the hypothesis testing problem at the parameter level.

Now let 𝒢_N be the set of graphs on N vertices (where N = m_n may grow with the sample size n) and Θ ⊆ ℝ^d. For θ ∈ Θ, we denote a parametrized family of probability measures on 𝒢_N by P_θ. Let G_1, G_2, . . . , G_n be i.i.d. realizations of random graphs on N vertices with common distribution P_θ. Fix θ_0 ∈ Θ and consider the function h : 𝒢_N → ℝ given by

G ↦ T(H, G) − E_{G∼P_{θ_0}}[T(H, G)].

Note that h denotes the centered H-counts in the graph G, and observe that

E_{G∼P_θ}[h(G)] = 0   (2.2)

if θ = θ_0. Now consider the following hypothesis testing problem:

H_0 : E_{G∼P_θ}[h(G)] = 0   vs   H_1 : E_{G∼P_θ}[h(G)] ≠ 0.   (2.3)

Indeed, defining ϑ(P_θ) = E_{G∼P_θ}[h(G)], this is of the form (2.1) with C = {0}. In other words, using the functional E_{G∼P_θ}[h(G)] we want to test whether the random graph is sampled from the distribution P_θ. We state the problem in this form since many random graph models like the ERGM are not identifiable through a single subgraph count. However, note that for the Erdős–Rényi model G(N, p) (in the general setup p = p_N can vary as a function of N, but for brevity we will use p when there is no chance of confusion), the distribution P_θ can be denoted as P_p, and our hypothesis testing problem reduces to the following:

H_0 : p = p_0   vs   H_1 : p ≠ p_0.
(2.4)

In other words, Erdős–Rényi models are identifiable with respect to the expected motif counts.

In the maximum-entropy-under-constraints paradigm one considers an arbitrary distribution {p(G)}_{G∈𝒢_N} and writes the following optimization problem:

max_{p(G)}  − Σ_{G∈𝒢_N} p(G) log p(G)   such that   Σ_{G∈𝒢_N} c(G) p(G) = C*,   (2.5)

where the maximum is taken over all probability distributions on 𝒢_N and c(G) is a real-valued function with domain 𝒢_N. The constraint Σ_{G∈𝒢_N} c(G) p(G) = C* is called a topological constraint in [62], and the authors discuss the properties of such maximizing distributions under different topological constraints. Our test is motivated by this maximum entropy principle; in particular, to design the test we will maximize the entropy of the empirical distribution subject to the condition that the empirical version of (2.2) holds.

Let w := (w_1, . . . , w_n) be the weight vector, where w_i is the weight assigned to the graph G_i. The empirical version of (2.5) is given by:

argmax_{w∈𝒲}  − Σ_{i=1}^n w_i log w_i   where   𝒲 := { w : Σ_{i=1}^n w_i h(G_i) = 0, w_i ≥ 0, Σ_{i=1}^n w_i = 1 }.

We will denote h(G_i) by h_{n,i} (and, whenever the number of vertices does not grow with n, by h_i). Note that this problem is feasible if min_i h_i < 0 < max_i h_i. Further, as a function of the weight vector, this is a strictly concave maximization problem on a compact convex set. Hence, the maximum exists and is attained in the interior. This allows us to locate the point of maximum using Lagrange multipliers. Take the Lagrangian to be

L(w, λ, α) = − Σ_{i=1}^n w_i log w_i − λ Σ_{i=1}^n w_i h_{n,i} − α ( Σ_{i=1}^n w_i − 1 ).
Then the Lagrangian equations yield the following relations between the optimal variables:

Σ_{i=1}^n h_{n,i} e^{−λ̂_n h_{n,i}} = 0,   (2.6)
1 + α̂ = log Σ_{i=1}^n e^{−λ̂_n h_{n,i}},   (2.7)
ŵ_i = e^{−(1 + α̂ + λ̂_n h_{n,i})}.   (2.8)

We will design our test based on the asymptotic behaviour of the Lagrange multiplier λ̂_n. In particular, we will demonstrate that, under the null hypothesis, the random variable λ̂_n converges to some constant λ∘ at a certain rate, which leads to a natural goodness-of-fit test.

2.2 Tests based on Lagrange multipliers

In this subsection we review the classical Lagrange multiplier test associated with constrained likelihood maximization, which is widely used in the econometrics literature. Consider a simple setup where X_1, . . . , X_n are i.i.d. samples from a model with parameter θ ∈ ℝ^p and log-likelihood ℓ_n(θ) = Σ_{i=1}^n ℓ(X_i; θ). Consider testing the equality constraints

H_0 : r(θ) = 0,   r : ℝ^p → ℝ^q,   q < p.   (2.9)

The three classical test families are Wald, Likelihood Ratio (LR), and Lagrange Multiplier (LM/score/Rao), with the LM tests being attractive when the unrestricted MLE is hard but the restricted MLE under H_0 is easy to compute. We briefly review the LM test below. Let S_n(θ) = ∇_θ ℓ_n(θ) be the score, I_n(θ) = −∇²_θ ℓ_n(θ) the observed information, and R(θ) = ∂r(θ)/∂θ the q × p Jacobian. Under H_0, write θ_0 for the true parameter and θ̂_0 for the restricted MLE. To compute the restricted MLE we consider the constrained maximization

max_{θ∈ℝ^p} ℓ_n(θ)   s.t.   r(θ) = 0,

and the Lagrangian can be written as

L_n(θ, λ) = ℓ_n(θ) + λ^⊤ r(θ),   λ ∈ ℝ^q.

At an optimum (θ̂_0, λ̂), the first-order conditions are

S_n(θ̂_0) + R(θ̂_0)^⊤ λ̂ = 0,   r(θ̂_0) = 0.
(2.10) The distinguishing feature here is that the optimal L agr ange multiplier ˆ λ at the restricted optimum carries all the information needed for the test; the test statistic can b e written as a (scaled) quadratic form in ˆ λ . If regularit y assumptions hold, [ 2 ] sho w ed that under H 0 , the statistic n − 1 / 2 ˆ λ is asymptotically normally distributed with v ariance-cov ariance matrix R ( θ 0 ) I ( θ 0 ) − 1 R ( θ 0 ) ⊤ − 1 , which is of rank q . Consequently 1 n ˆ λ ⊤ R ( θ 0 ) I ( θ 0 ) − 1 R ( θ 0 ) ⊤ − 1 ˆ λ is asymptotically distributed as χ 2 with q degrees of freedom, when h ( θ 0 ) = 0 , and [ 2 ] prop osed to choose as a region of acceptance of the hypothesis that h ( θ 0 ) = 0 the set of x for which 1 n ˆ λ ⊤ R ( θ 0 ) I ( θ 0 ) − 1 R ( θ 0 ) ⊤ − 1 ˆ λ < χ 2 q , 1 − α . Since the seminal work of [ 2 ], the Lagrange Multiplier (LM) test has been extensively explored and refined. In [ 61 ], the author extended the framework to accommo date degenerate cases and established the asymptotic equiv alence among the W ald, likelihoo d ratio (LR), and LM tests as the sample size n gro ws large. F ollo wing this developmen t, the LM test has b ecome a foundational to ol in econometric literature, forming the basis for a wide range of sp ecification and diagnostic tests. In particular, building on the Lagrange Multiplier principle, [ 10 ] prop osed a test for heteroskedasticit y in regression mo dels, [ 36 , 37 ] developed LM tests for serial correlation in dynamic regressions, [ 26 ] introduced LM-type pro cedures for detecting ARCH effects, and [ 6 ] constructed LM tests for assessing normalit y in limited dep endent v ariable models, among others. Our prop osed tests are grounded in a similar foundational principle, yet they differ from the classical LM constructions in several imp ortant conceptual and metho dological asp ects, whic h we summarize b elow. 1. 
The Lagrange multiplier test developed in this paper is formulated for statistical network models in which the number of nodes is allowed to grow with the sample size. The resulting asymptotic behavior in the sparse and dense regimes requires a completely different analytical approach from that of the fixed-dimensional parametric models commonly studied in econometrics.

2. Our test is based on maximizing entropy rather than the likelihood. Maximizing the empirical entropy corresponds to finding the root of the function
\[
\sum_{i=1}^{n} h_{n,i}\, e^{-\lambda h_{n,i}},
\]
whose behavior can be characterized using the analysis of the exponential random graph model (ERGM) free energy, as outlined in [16]. While a similar analysis could, in principle, be developed under the likelihood framework, its exact equivalence and comparative merit are not yet clear and are left for future investigation. Furthermore, since the maximum-entropy principle is natural in the analysis of network models (see [62]), our construction can be viewed as a direct extension of that principle.

3. Finally, our proposed tests are based on a single Lagrange multiplier. In contrast, the framework of [2] suggests the possibility of tests based on a vector of multipliers arising from hypotheses involving multiple parameters. While this generalization appears promising, it lies beyond the scope of the present work and will be explored in future research.

2.3 Exponential random graph model

An Exponential Random Graph Model (ERGM) defines a probability distribution on labeled simple graphs $G = (V, E)$ with $|V| = N$. Let $T_1$ be an edge and $T_2, \dots, T_K$ be graphs with at least two edges. Fix sufficient statistics $t(T_1, G), \dots, t(T_K, G)$ (homomorphism densities of the motifs, such as edges, triangles, etc.) and parameters $\beta = (\beta_1, \dots, \beta_K) \in \mathbb{R}^K$.
One considers the following probability distribution:
\[
\mathbb{P}_\beta(G) = \frac{1}{Z_N(\beta)} \exp\left\{ N^2 \sum_{k=1}^{K} \beta_k\, t(T_k, G) \right\}, \qquad (2.11)
\]
where $Z_N(\beta)$ is the normalizing constant. For $K = 1$, we recover the Erdős–Rényi model $G(N, p)$, where every edge is present with probability $p = p(\beta) = e^{2\beta_1}(1 + e^{2\beta_1})^{-1}$, independently of the others. If $K \geq 2$, the probability measure "encourages" the presence of the corresponding subgraphs when the corresponding coefficient $\beta_k$ is positive. A canonical example is the edge-triangle model, with $t(T_1, G)$ the edge density and $t(T_2, G)$ the triangle density; see [16] for a detailed discussion.

ERGMs grew from exponential-family models in social network analysis [29, 38, 65], and since then this model has been widely studied (cf. [52, 54, 16, 55, 56, 71, 58, 24, 48, 27, 28, 7, 57, 9, 30, 66, 67] for a partial list). While the literature on this topic is extensive (and the above list is by no means exhaustive), we note that hypothesis testing within the ERGM framework has been relatively underexplored. In particular, while a few recent works such as [69] and [57] develop tests based on Stein discrepancies and [70] proposes tests based on sums of degrees, the wider literature on the topic is rather sparse.

Our tests are based on the treatment of ERGMs in [16], where one uses graph limits to study the asymptotic behaviour of ERGMs. We note in passing that while this encapsulates the so-called dense regime, some recent work has also studied modifications which yield sparse graphs instead [72, 47, 20]. However, these sparse ERGMs are outside the scope of the current paper, and we study sparse models for Erdős–Rényi random graphs, where the analysis proceeds via a different route. While analyzing ERGMs, we consider the so-called ferromagnetic regime, where the parameters $\beta_k$ are positive for $k > 1$.
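To make the $K = 1$ reduction concrete, the following sketch (our illustration, not code from the paper) brute-forces the edge-only ERGM on $N = 3$ labelled vertices. Since the homomorphism density of an edge is $t(T_1, G) = 2|E(G)|/N^2$, the exponent $N^2 \beta_1 t(T_1, G)$ equals $2\beta_1 |E(G)|$, and the model factorises into independent edges with $p = e^{2\beta_1}(1 + e^{2\beta_1})^{-1}$:

```python
import itertools
import math

def ergm_edge_only(beta1, N=3):
    """Brute-force the K=1 (edge-only) ERGM on N labelled vertices.

    t(edge, G) = 2|E(G)|/N^2, so N^2 * beta1 * t(edge, G) = 2*beta1*|E(G)|,
    and the model should factorise into independent edges.
    """
    pairs = list(itertools.combinations(range(N), 2))
    # enumerate all graphs as 0/1 indicator tuples over vertex pairs
    weights = {}
    for edges in itertools.product([0, 1], repeat=len(pairs)):
        weights[edges] = math.exp(2 * beta1 * sum(edges))
    Z = sum(weights.values())  # normalizing constant Z_N(beta)
    return {g: w / Z for g, w in weights.items()}

beta1 = 0.7
p = math.exp(2 * beta1) / (1 + math.exp(2 * beta1))
dist = ergm_edge_only(beta1)
# each graph gets probability p^{|E|} (1-p)^{3-|E|}, the G(3, p) law
for g, prob in dist.items():
    m = sum(g)
    assert abs(prob - p**m * (1 - p)**(3 - m)) < 1e-12
```

The check confirms that with only the edge statistic the ERGM is exactly $G(N, p(\beta))$, which is why $K \geq 2$ is needed for the model to "encourage" nontrivial motifs.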
Let $e_k$ be the number of edges in $T_k$; we define the following functions:
\[
\Phi_\beta(a) := \sum_{k=1}^{K} \beta_k e_k a^{e_k - 1}, \qquad \phi_\beta(a) := \frac{e^{2\Phi_\beta(a)}}{1 + e^{2\Phi_\beta(a)}}. \qquad (2.12)
\]
The parameter $\beta$ is said to lie in the subcritical regime (see [8, 16]) if there is a unique solution $p(\beta) \in (0, 1)$ to
\[
\phi_\beta(p) = p, \qquad \phi'_\beta(p) \leq 1, \qquad (2.13)
\]
and the solution $p(\beta)$ satisfies $\phi'_\beta(p(\beta)) < 1$. It may be shown that (2.13) is equivalent to $p$ satisfying
\[
2\Phi_\beta(p) = \log\frac{p}{1-p}. \qquad (2.14)
\]
In this regime, ERGMs asymptotically resemble an Erdős–Rényi graph $G(N, p)$ and are amenable to analysis. On the other hand, if the parameter $\beta$ lies outside this region (in other words, it lies in the supercritical regime), the graphs exhibit clustering and metastability, and are much harder to analyze (see [66, 67] for details). We will restrict ourselves to the subcritical regime of a ferromagnetic ERGM and will use the tools from [16] to analyze the Lagrange multiplier and thus develop a test.

2.4 Graph Limits

To analyze ERGMs in the asymptotic limit, we need to define several key quantities that appear in the variational formulations of this limit. These definitions build on the theory of graph limits and graphons, which provide a natural framework for analyzing dense graph sequences. In this section, we provide a quick introduction to some essential ingredients from this theory that are necessary to understand this connection.

Let $h : [0,1]^2 \to [0,1]$ be a symmetric measurable function, which, up to an equivalence (to be defined shortly), we can interpret as a graphon or a graph limit. For any such function $h$, we define the following functionals:

• $H$-density:
\[
t(H, h) := \int_{[0,1]^{|V(H)|}} \prod_{(i,j) \in E(H)} h(x_i, x_j)\, dx_1 \dots dx_{|V(H)|}. \qquad (2.15)
\]

• Edge density:
\[
e(h) := \int_{[0,1]^2} h(x, y)\, dx\, dy. \qquad (2.16)
\]
• Entropy functional:
\[
I(h) := \int_{[0,1]^2} \Big[ h(x,y)\log h(x,y) + \big(1 - h(x,y)\big)\log\big(1 - h(x,y)\big) \Big]\, dx\, dy. \qquad (2.17)
\]
We denote by $\mathcal{W}$ the space of all pre-graphons:
\[
\mathcal{W} = \big\{ h : [0,1]^2 \to [0,1] \;\big|\; h \text{ is symmetric and measurable} \big\}.
\]
Two pre-graphons $h_1, h_2 \in \mathcal{W}$ are equivalent ($h_1 \sim h_2$) if there exists a measure-preserving bijection $\sigma : [0,1] \to [0,1]$ such that $h_2(x,y) = h_1(\sigma(x), \sigma(y))$ for almost all $(x,y)$. The space of graphons is the quotient space $\widetilde{\mathcal{W}} = \mathcal{W}/\!\sim$. Since the functionals $t(H, h)$, $e(h)$, and $I(h)$ are invariant under this equivalence, they are well-defined on $\widetilde{\mathcal{W}}$. For an element $h \in \mathcal{W}$, the corresponding equivalence class will be denoted by $\tilde h$. For a function $f$ on $\mathcal{W}$ invariant under the above equivalence, we will use the same notation $f$ to denote its lift to the space $\widetilde{\mathcal{W}}$, so that $f(\tilde h) = f(h)$. For $p \in (0,1)$, we also define, by abuse of notation, the constant function $p \in \widetilde{\mathcal{W}}$ as $p(x,y) = p$ for all $x, y \in [0,1]$.

Throughout the dense regime, we will work with variational problems involving these functionals, in particular
\[
\sup_{\tilde h \in \widetilde{\mathcal{W}}} \left( -\frac{\lambda}{|\mathrm{Aut}(H)|}\, t(H, \tilde h) + \sum_{k=1}^{K} \beta_k\, t(T_k, \tilde h) - \frac{1}{2} I(\tilde h) \right), \qquad (2.18)
\]
where $t(H, \tilde h)$ and $t(T_k, \tilde h)$ are the $H$-densities (see (2.15)) of the subgraphs $H$ and $T_k$, while $I(\tilde h)$ is the entropy functional defined in (2.17). In fact, (2.18) represents the free energy of the system (cf. [16] for an interpretation) and plays a central role in determining the limiting behavior of the empirical critical points. When specializing to Erdős–Rényi random graphs $G(N, p)$, we will consider the setup where $p = \Theta(1)$. In this case we will have $t(H, p) = p^{|E(H)|}$ and $e(p) = p$.
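Looking back at the subcritical condition (2.13) of Section 2.3, the fixed point $p(\beta)$ can be located numerically. The sketch below (our illustration, with hypothetical parameter values) uses plain fixed-point iteration for the edge-triangle model, where $e_1 = 1$ and $e_2 = 3$, so $\Phi_\beta(a) = \beta_1 + 3\beta_2 a^2$; since $\phi'_\beta(p(\beta)) < 1$ in the subcritical regime, the iteration contracts near the fixed point:

```python
import math

def phi(a, beta):
    """phi_beta(a) for the edge-triangle model: Phi_beta(a) = b1 + 3*b2*a^2."""
    b1, b2 = beta
    Phi = b1 + 3 * b2 * a**2
    return math.exp(2 * Phi) / (1 + math.exp(2 * Phi))

def solve_p(beta, tol=1e-12, max_iter=10000):
    """Fixed-point iteration for phi_beta(p) = p; converges when subcritical."""
    p = 0.5
    for _ in range(max_iter):
        q = phi(p, beta)
        if abs(q - p) < tol:
            return q
        p = q
    raise RuntimeError("iteration did not converge (possibly supercritical)")

beta = (-0.5, 0.1)   # hypothetical small positive triangle weight: subcritical
p = solve_p(beta)
assert abs(phi(p, beta) - p) < 1e-10
# numerical check of the stability condition phi'(p) < 1
h = 1e-6
assert (phi(p + h, beta) - phi(p - h, beta)) / (2 * h) < 1.0
```

For supercritical $\beta$ the iteration can cycle or converge to a different branch, which mirrors the metastability discussed above.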
2.5 Large deviations in random graphs

The large deviation principle for the Erdős–Rényi random graph was first formulated in the pioneering work of [17], where the authors extended Sanov's theorem to the Erdős–Rényi model. Using the same result, [16] showed the following result for graphons. Let $T : \widetilde{\mathcal{W}} \to \mathbb{R}$ be a bounded continuous function on the metric space $(\widetilde{\mathcal{W}}, \delta_\square)$ (see [16] for the exact definition of the metric). Fix $N$ and let $\mathcal{G}_N$ denote the set of simple graphs on $N$ vertices. Then $T$ induces a probability mass function $p_N$ on $\mathcal{G}_N$ defined as
\[
p_N(G) := e^{N^2 (T(\tilde G) - \psi_N)},
\]
where $\tilde G$ is the image of $G$ in the quotient space $\widetilde{\mathcal{W}}$. Noting that
\[
\psi_N = \frac{1}{N^2} \log \sum_{G \in \mathcal{G}_N} e^{N^2 T(\tilde G)}, \qquad (2.19)
\]
the following result was proved in [16]:
\[
\psi := \lim_{N \to \infty} \psi_N = \sup_{\tilde h \in \widetilde{\mathcal{W}}} \left( T(\tilde h) - \frac{1}{2} I(\tilde h) \right).
\]
Using the above result, the authors further proved that the normalizing constant in (2.11) behaves asymptotically as follows:
\[
\lim_{N \to \infty} \frac{1}{N^2} \log Z_N(\beta) = \sup_{0 \leq u \leq 1} \left( \sum_{i=1}^{K} \beta_i u^{e(T_i)} - \frac{1}{2} I(u) \right), \qquad (2.20)
\]
where $I(u) = u \log u + (1-u)\log(1-u)$ and $e(T_i)$ is the number of edges in $T_i$. Here $I(u)$ can be interpreted as the entropy functional (2.17) for the constant function $u(x,y) = u$.

Understanding the behavior of the log normalizing constant is a crucial aspect of our analysis of the Lagrange multiplier. Although details are given in Section 4, we note here that the solution of (2.6) can be seen as a critical point of the function $\sum_{i=1}^{n} e^{-\lambda h_{n,i}}$. It can be shown that the expected value of this function is exactly the normalizing constant $Z_N(\beta)$ for a certain ERGM, and hence (2.20) features heavily in the analysis of our solution $\hat\lambda_n$. However, the above argument is at the level of expectations; the same argument cannot be easily generalized to analyze the empirical version, and we need more nuanced tools.
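The scalar variational problem (2.20) is easy to evaluate numerically. The following sketch (ours, not the authors') grid-searches the supremum and sanity-checks it against the edge-only case, where $Z_N(\beta) = (1 + e^{2\beta_1})^{\binom{N}{2}}$ gives the closed-form limit $\frac{1}{2}\log(1 + e^{2\beta_1})$:

```python
import math

def I(u):
    """I(u) = u log u + (1-u) log(1-u), with the 0 log 0 = 0 convention."""
    if u in (0.0, 1.0):
        return 0.0
    return u * math.log(u) + (1 - u) * math.log(1 - u)

def free_energy(beta, edge_counts, grid=200001):
    """Grid-search sup_{0<=u<=1} sum_i beta_i u^{e(T_i)} - I(u)/2, as in (2.20)."""
    best = -float("inf")
    for j in range(grid):
        u = j / (grid - 1)
        val = sum(b * u**e for b, e in zip(beta, edge_counts)) - 0.5 * I(u)
        best = max(best, val)
    return best

# sanity check against the edge-only (K=1) closed form (1/2) log(1 + e^{2 b1})
b1 = 0.3
approx = free_energy([b1], [1])
exact = 0.5 * math.log(1 + math.exp(2 * b1))
assert abs(approx - exact) < 1e-6
```

The optimizer in the edge-only case is $u = e^{2\beta_1}/(1 + e^{2\beta_1})$, i.e. exactly the Erdős–Rényi edge probability $p(\beta)$ from Section 2.3, which is the scalar shadow of the graphon variational principle.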
To analyze the critical point of the function $\sum_{i=1}^{n} e^{-\lambda h_{n,i}}$, we borrow tools from the large deviation theory of ERGMs. While the large deviation results in the dense case follow from [17], the proof in that work involves the Szemerédi regularity lemma, which does not supply quantitative error bounds. To bypass this difficulty, [15] (see also [14] for a unified treatment of the subject) developed the theory of nonlinear large deviations with quantitative error estimates. The key result in [15] is Theorem 1.6, where the authors approximate the log normalizing constant by a variational problem (similar to (2.20), but using a different analysis). We apply this approximation to study the asymptotic behavior of the function $\sum_{i=1}^{n} e^{-\lambda h_{n,i}}$ in the large-$n$ limit. In particular, we characterize its critical point by analyzing the corresponding critical point of the limiting variational problem and exploiting the convexity properties of the approximating sequence of functions.

3 Main Results

3.1 Generalities

In this section, we present the main results of our paper. The main goal is to construct the goodness-of-fit and two-sample tests by analyzing the asymptotic behaviour of the Lagrange optimizer $\hat\lambda_n$. We consider two distinct setups. In the first case, the total number of vertices in the graph, $N$, is set to a fixed integer $m$, and the results hold for general random graph models. In the second case, we allow the number of vertices ($N = m_n$) to grow with the sample size. In the latter growing-size case we first analyze the sparse regime, and subsequently the dense regime.

In the sparse setting, we construct tests for the Erdős–Rényi model, the general sparse ERGM models being out of scope due to fundamental bottlenecks, as explained earlier. In particular, we consider samples from $G(m_n, p_n)$ with $m_n p_n^{k(H)} \to c$.
Next, we analyze the dense regime; here we formulate tests for the more general ferromagnetic ERGM model in the subcritical regime and derive the results for the dense Erdős–Rényi model (samples obtained from $G(m_n, p)$ with $p = \Theta(1)$) as a special case. Finally, we end this section by discussing a few key technical ingredients related to the results stated in this section.

Remarks:

1. The existence and uniqueness of the Lagrange multiplier are established in Lemma 3.3. As shown in the Appendix, this result holds across all regimes considered, namely the fixed-size, sparse, and dense regimes. Consequently, throughout this section we tacitly assume the existence and uniqueness of the Lagrange multiplier in all stated theorems.

2. Although the general goodness-of-fit test stated in 2.3 is a two-sided test, we will prove the results for one-sided tests due to technical reasons. A brief discussion in this regard is given in Section 3.5.

3. It is possible to extend the results to the full sparse regime ($p_n \to 0$ at any given rate) and also to an arbitrary fixed small graph $H$, but these would involve additional technical and notational difficulties without adding to our conceptual understanding of the problem. Therefore, for clarity of exposition, we will restrict ourselves to a strictly balanced motif $H$ at the threshold ($m_n p_n^{k(H)} \to c$), and refer to this as "the sparse" regime.

3.2 Networks of fixed size

In this case our result is general: we assume that the graphs are sampled from any random graph model $\mathcal{G}$ with $m$ vertices. We will denote by $\mathbb{P}_{\mathcal{G}}$ the distribution of $h$ under the random graph model $\mathcal{G}$. In this setup, independent samples $G_1, \dots, G_n$ are obtained from a random graph model $\mathcal{G}$ with a fixed number of vertices (say $m$). Let $\mathcal{H}$ be the set of values the $h_i$ can assume. Note that the set $\mathcal{H}$ is finite, as the number of vertices is finite.
The convergence of the empirical distribution of the $H$-count then leads to the root of (2.6) converging to a particular value $\lambda^\circ$, which is captured by the following theorem.

3.2.1 Consistency of the Lagrange Multiplier

To analyze the root of (2.6), we first rewrite the left-hand side as
\[
\frac{1}{n}\sum_{i=1}^{n} h_i e^{-\lambda h_i} = \sum_{a \in \mathcal{H}} a e^{-\lambda a}\, \frac{|\{i \mid h_i = a\}|}{n}. \qquad (3.1)
\]
Thus it is enough to analyze the root of the right-hand side of (3.1).

Theorem 3.1. Let $\hat\lambda_n$ be the unique real root of the function
\[
\sum_{a \in \mathcal{H}} a e^{-\lambda a}\, \frac{|\{i \mid h_i = a\}|}{n},
\]
and let $\lambda^\circ$ be the unique real root of the function
\[
\sum_{a \in \mathcal{H}} a e^{-\lambda a}\, \mathbb{P}_{\mathcal{G}}(h = a).
\]
Then $\hat\lambda_n \to \lambda^\circ$ almost surely.

3.2.2 Asymptotic normality of the Lagrange Multiplier

Next we consider the asymptotic normality of the Lagrange multiplier. For notational convenience we will redefine $h_i$, centering $T(H, G_i)$ with an arbitrary constant $h_0$, as follows: $h_i = T(H, G_i) - h_0$. Choosing $h_0$ as the expected motif count under a particular random graph model $\mathcal{G}_0$ then yields the results needed for the corresponding goodness-of-fit test.

Theorem 3.2. Let $G_1, \dots, G_n$ be sampled independently from a random graph model $\mathcal{G}$ with a fixed number of vertices. If $\hat\lambda_n$ is the unique real root of the equation
\[
\sum_{i=1}^{n} h_i e^{-\lambda h_i} = 0,
\]
then the following holds:
\[
\sqrt{n}\,(\hat\lambda_n - \lambda^\circ) \xrightarrow{d} N\!\left( 0,\; \frac{\mathrm{Var}\big[ (T(H,G) - h_0)\, e^{-\lambda^\circ (T(H,G) - h_0)} \big]}{\Big( \mathbb{E}\big[ (T(H,G) - h_0)^2\, e^{-\lambda^\circ (T(H,G) - h_0)} \big] \Big)^2} \right),
\]
where $\lambda^\circ$ is the unique real root of the equation
\[
\sum_{a \in \mathcal{H}} a e^{-\lambda a}\, \mathbb{P}_{\mathcal{G}}(h = a) = 0.
\]

3.2.3 Goodness-of-fit test

In this subsection we consider goodness-of-fit tests for graphs with a fixed number of vertices. Recall that $T(H, G)$ is the number of copies of $H$ in the graph $G$. Let $\mathcal{G}_0$ be a given random graph model on the same set of $m$ vertices and consider the statistic
\[
h(G) = T(H, G) - \mathbb{E}_{G \sim \mathbb{P}_{\mathcal{G}_0}}[T(H, G)],
\]
which denotes the centered $H$-count in the graph $G$.
Now consider the following hypothesis testing problem:
\[
H_0 : \mathbb{E}_{G \sim \mathbb{P}_{\mathcal{G}}}[h(G)] = 0 \quad \text{vs} \quad H_1 : \mathbb{E}_{G \sim \mathbb{P}_{\mathcal{G}}}[h(G)] > 0. \qquad (3.2)
\]
Before using Theorem 3.2 to construct a test, we note that setting $h_0 = \mathbb{E}_{G \sim \mathbb{P}_{\mathcal{G}_0}}[T(H, G)]$ results in $\lambda^\circ = 0$ under $H_0$. Indeed, defining
\[
b(\lambda) := \sum_{a \in \mathcal{H}} a e^{-\lambda a}\, \mathbb{P}_{\mathcal{G}}(h = a),
\]
we have
\[
b(0) = \sum_{a \in \mathcal{H}} a\, \mathbb{P}_{\mathcal{G}}(h = a) = \mathbb{E}_{G \sim \mathbb{P}_{\mathcal{G}}}[h(G)] = 0,
\]
where the last equality holds under $H_0$. Since $\lambda^\circ$ is defined as the unique root of $b(\lambda)$, it follows that $\lambda^\circ = 0$. Further defining
\[
\sigma_0^2 := \frac{\mathrm{Var}\big[ (T(H,G) - h_0)\, e^{-\lambda^\circ (T(H,G) - h_0)} \big]}{\Big( \mathbb{E}\big[ (T(H,G) - h_0)^2\, e^{-\lambda^\circ (T(H,G) - h_0)} \big] \Big)^2},
\]
we note that for $\lambda^\circ = 0$ we have
\[
\sigma_0^2 = \frac{\mathrm{Var}[T(H,G) - h_0]}{\big( \mathbb{E}[(T(H,G) - h_0)^2] \big)^2} = \big( \mathrm{Var}[T(H,G) - h_0] \big)^{-1},
\]
and it follows that
\[
\hat\sigma_0^2 = \left[ \frac{1}{n-1} \sum_{i=1}^{n} \left( T(H, G_i) - \frac{1}{n}\sum_{j=1}^{n} T(H, G_j) \right)^{\!2}\, \right]^{-1}
\]
is a consistent estimator of $\sigma_0^2$. We consider the following test:

Test 3.1 (Goodness-of-fit test for a fixed number of vertices). The test is given by
\[
\phi_n = \mathbb{I}\left\{ \hat\lambda_n > \frac{\hat\sigma_0}{\sqrt{n}}\, z_\alpha \right\},
\]
where $z_\alpha$ is the $1-\alpha$ quantile of the standard normal distribution.

Corollary 3.1. The goodness-of-fit test for a fixed number of vertices, i.e. Test 3.1, is a level-$\alpha$ consistent test.

Proof. From Theorem 3.2 it follows that the above is asymptotically a level-$\alpha$ test. Indeed, under $H_0$, $\lambda^\circ = 0$, and hence
\[
\mathbb{P}_{H_0}(\text{Reject Null}) = \mathbb{E}_{H_0}\left[ \mathbb{I}\left\{ \hat\lambda_n > \frac{\hat\sigma_0}{\sqrt{n}} z_\alpha \right\} \right] = \mathbb{P}_{H_0}\big[ \sqrt{n}(\hat\lambda_n - \lambda^\circ) > \hat\sigma_0 z_\alpha \big] \to \alpha.
\]
Further, we observe that
\[
\mathbb{P}_{H_1}(\text{Reject Null}) = \mathbb{E}_{H_1}[\phi_n] = \mathbb{P}\left[ \hat\lambda_n > \frac{\hat\sigma_0}{\sqrt{n}} z_\alpha \right] = \mathbb{P}\big[ \sqrt{n}(\hat\lambda_n - \lambda^\circ) > -\sqrt{n}\lambda^\circ + \hat\sigma_0 z_\alpha \big].
\]
The right-hand side converges to 1, since $\lambda^\circ > 0$ when the graphs are sampled from a distribution corresponding to the alternative hypothesis, so that the test is consistent.

3.3 Networks of growing size: the sparse regime

We now examine the problem where the graph size $N = m_n$ increases with the sample size $n$.
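Before developing the growing-size theory, the fixed-size procedure of Section 3.2.3 can be sketched end to end. The toy implementation below is our own illustration (with made-up motif counts): it solves (2.6) by bisection, which is valid because $\lambda \mapsto \sum_i h_i e^{-\lambda h_i}$ is strictly decreasing (its derivative is $-\sum_i h_i^2 e^{-\lambda h_i} < 0$), and then applies Test 3.1:

```python
import math
import statistics

def solve_root(h, lo=-50.0, hi=50.0, tol=1e-12):
    """Unique root of b(lam) = sum_i h_i exp(-lam h_i); b is strictly decreasing."""
    b = lambda lam: sum(x * math.exp(-lam * x) for x in h)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if b(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def gof_test_fixed_size(counts, h0, alpha=0.05):
    """Test 3.1: reject when lam_hat > (sigma_hat_0 / sqrt(n)) * z_alpha."""
    n = len(counts)
    h = [t - h0 for t in counts]          # centred motif counts h_i
    lam_hat = solve_root(h)
    sigma0_hat = statistics.variance(counts) ** -0.5  # (sample variance)^{-1/2}
    z_alpha = 1.6449                      # 95% standard normal quantile
    return lam_hat, lam_hat > sigma0_hat / math.sqrt(n) * z_alpha

# hypothetical triangle counts whose sample mean matches the null value h0 = 2
counts = [1, 2, 3, 2, 2, 1, 3, 2]
lam_hat, reject = gof_test_fixed_size(counts, h0=2.0)
print(lam_hat, reject)
```

With counts balanced around $h_0$ the estimated multiplier sits at essentially zero and the test does not reject, matching the fact that $\lambda^\circ = 0$ under $H_0$.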
In this section, we consider the setup where independent samples $G_{n,1}, \dots, G_{n,n}$ are obtained from the Erdős–Rényi model $G(m_n, p_n)$, with the edge probability $p_n$ (also denoted $p(m_n)$, or simply $p$ when there is no chance of confusion) vanishing with the graph size. We restrict ourselves to a strictly balanced subgraph $H$ and assume the scaling
\[
m_n p_n^{k(H)} \to c > 0, \qquad m_n p_{0,n}^{k(H)} \to c_0 > 0, \qquad (3.3)
\]
for constants $c$ and $c_0$ with $c \geq c_0$, where $p_{0,n}$ corresponds to the null hypothesis, i.e. the graphs are sampled from $G(m_n, p_{0,n})$. The above scaling ensures that the expected number of copies of $H$ remains of constant order, and we define
\[
\mu(m) := \frac{m^{v(H)} p^{e(H)}}{|\mathrm{Aut}(H)|}, \qquad (3.4)
\]
which is asymptotically equal to the expected number of copies of $H$ in $G(m, p)$ as $m \to \infty$.

Let $G_{n,i}$ be the $i$th of the $n$ i.i.d. copies sampled from $G(m_n, p_n)$. For notational convenience we will redefine $h_{n,i}$, centering $T(H, G_{n,i})$ with an arbitrary constant $h_0$, as follows: $h_{n,i} = T(H, G_{n,i}) - h_0$. Later, while constructing the goodness-of-fit test, we will choose the centering term to be the limiting value of the expected $H$-count under the null model, i.e. $h_0 = \lim_{n \to \infty} \mathbb{E}_{G \sim \mathbb{P}_{p_{0,n}}}[T(H, G_{n,i})]$. Since the underlying random graph model is Erdős–Rényi, it can easily be shown that under the null, $h_0 = \lim_{n \to \infty} \mu(m_n)$, where $\mu(m)$ is defined in (3.4). For general values of the parameter $p_n$ (not necessarily the null) we assume that
\[
\lim_{n \to \infty} \frac{\mu(m_n)}{h_0} = e^{\lambda^\circ}, \qquad (3.5)
\]
for some constant $\lambda^\circ \geq 0$ (the existence of the limit is guaranteed by (3.3); non-negativity follows from $c \geq c_0$). In this setting, we will show a central limit theorem for the exponentially tilted empirical $H$-count $\sum_{i=1}^{n} h_{n,i} e^{-\lambda h_{n,i}}$. The asymptotic normality of the centered and scaled Lagrange multiplier $\hat\lambda_n$ then follows by analyzing the root of the above function.
But first, we show the consistency of the root $\hat\lambda_n$.

3.3.1 Consistency of the Lagrange Multiplier

We show consistency in two steps. In the first step, we establish the convergence of the root of the population version of (2.6). Classical Poisson approximation results (see, for example, [4], Eq. (2.9) and the ensuing discussion) imply that the $H$-count (denoted $H_m$ in this setup) in the $G(m, p)$ graph converges in total variation to the distribution of a Poisson random variable $Z_\mu$ with mean $\mu = h_0 e^{\lambda^\circ}$. Since the function $x \mapsto x e^{-\lambda x}$ is bounded on any interval $[a, \infty)$, total variation convergence implies
\[
\mathbb{E}\big[ (H_m - h_0)\, e^{-\lambda (H_m - h_0)} \big] = \mathbb{E}\big[ (Z_\mu - h_0)\, e^{-\lambda (Z_\mu - h_0)} \big]\,(1 \pm o(1)).
\]
Monotonicity of the function $x \mapsto x e^{-\lambda x}$ can be used to show that the root of $\mathbb{E}[(H_m - h_0) e^{-\lambda (H_m - h_0)}]$ converges to that of $\mathbb{E}[(Z_\mu - h_0) e^{-\lambda (Z_\mu - h_0)}]$. This result is established in the following theorem.

Theorem 3.3. Let $H_m$ be the number of copies of a graph $H$ in a graph drawn from an Erdős–Rényi random graph $G(m, p(m))$, i.e. $H_m = T(H, G(m, p(m)))$, where $m\, p(m)^{k(H)} \to c$ as $m \to \infty$. Let $\lambda_m$ be the unique real root of the equation
\[
\mathbb{E}\big[ (H_m - h_0)\, e^{-\lambda (H_m - h_0)} \big] = 0.
\]
Further, let $Z_\mu$ be a Poisson($\mu$) random variable with $\mu = h_0 e^{\lambda^\circ}$, and let $\lambda^\circ_m$ be the unique real root of the equation
\[
\mathbb{E}\big[ (Z_\mu - h_0)\, e^{-\lambda (Z_\mu - h_0)} \big] = 0.
\]
Then, as $m \to \infty$, $|\lambda_m - \lambda^\circ_m| \to 0$.

In the second step we establish the empirical version of the above result.

Theorem 3.4. Let $G_{n,1}, \dots, G_{n,n}$ be sampled independently from $G\big(m_n, (\tfrac{c}{m_n})^{1/k(H)}\big)$, where $m_n \to \infty$. If $\hat\lambda_n$ is the unique real root of
\[
\sum_{i=1}^{n} h_{n,i}\, e^{-\lambda h_{n,i}} = 0,
\]
where $h_{n,i} = T(H, G_{n,i}) - h_0$, then $\hat\lambda_n \xrightarrow{P} \lambda^\circ$, where
\[
\lambda^\circ = \log \frac{c^{v(H)}/|\mathrm{Aut}(H)|}{h_0}.
\]
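The Poisson root $\lambda^\circ_m$ of Theorem 3.3 can actually be computed in closed form: for $Z \sim \mathrm{Poisson}(\mu)$ one has $\mathbb{E}[Z e^{-\lambda Z}] = \mu e^{-\lambda} e^{\mu(e^{-\lambda} - 1)}$ and $\mathbb{E}[e^{-\lambda Z}] = e^{\mu(e^{-\lambda} - 1)}$, so the root of $\mathbb{E}[(Z - h_0) e^{-\lambda(Z - h_0)}]$ solves $\mu e^{-\lambda} = h_0$, i.e. $\lambda = \log(\mu/h_0)$, consistent with Theorem 3.4. The sketch below (our numerical check, with hypothetical values of $\mu$ and $h_0$) verifies this by truncated summation and bisection:

```python
import math

def poisson_tilted_mean(lam, mu, h0, kmax=200):
    """E[(Z - h0) e^{-lam (Z - h0)}] for Z ~ Poisson(mu), by truncated sum."""
    total, logp = 0.0, -mu  # logp = log P(Z = 0)
    for k in range(kmax + 1):
        total += (k - h0) * math.exp(-lam * (k - h0) + logp)
        logp += math.log(mu) - math.log(k + 1)  # advance to log P(Z = k+1)
    return total

def poisson_root(mu, h0, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisect the unique root; the map is strictly decreasing in lam."""
    f = lambda lam: poisson_tilted_mean(lam, mu, h0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# closed form: the root is log(mu / h0), i.e. lambda_circ when mu = h0 e^{lambda_circ}
mu, h0 = 3.0, 2.0
assert abs(poisson_root(mu, h0) - math.log(mu / h0)) < 1e-6
```

The same bisection applies verbatim to the empirical root of Theorem 3.4, with the truncated Poisson expectation replaced by the observed centred counts.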
3.3.2 Asymptotic normality of the Lagrange Multiplier

Next we establish the asymptotic normality of the tilted empirical $H$-count $\sum_{i=1}^{n} h_{n,i} e^{-\lambda h_{n,i}}$ (properly scaled and centred). We will need the following lemma, which is a direct consequence of the Lindeberg-Feller Central Limit Theorem.

Lemma 3.1. In the setup of Theorem 3.4, we have
\[
\frac{\sum_{i=1}^{n} \Big( h_{n,i} e^{-\lambda h_{n,i}} - \mathbb{E}\big[ h_{n,1} e^{-\lambda h_{n,1}} \big] \Big)}{\sqrt{n\, \mathrm{Var}\big( h_{n,1} e^{-\lambda h_{n,1}} \big)}} \xrightarrow{d} N(0, 1)
\]
for every $\lambda > 0$.

Finally, we are ready to state the asymptotic normality of the root. Since the function $\sum_{i=1}^{n} h_{n,i} e^{-\lambda h_{n,i}}$ exhibits asymptotic normality, standard Z-estimator techniques establish the asymptotic normality of the root $\hat\lambda_n$.

Theorem 3.5. Let $G_{n,1}, \dots, G_{n,n}$ be sampled independently from $G\big(m_n, (\tfrac{c}{m_n})^{1/k(H)}\big)$, where $m_n \gg n^{\frac{1}{2(2-k(H))}}$ is a sequence of natural numbers. Let $\hat\lambda_n$ be the unique real root of
\[
\sum_{i=1}^{n} h_{n,i}\, e^{-\lambda h_{n,i}} = 0,
\]
where $h_{n,i} = T(H, G_{n,i}) - h_0$. Further, let $\lambda^\circ = \log\frac{\mu}{h_0}$ with $\mu = c^{v(H)}/|\mathrm{Aut}(H)|$. Then the following holds:
\[
\sqrt{n}\,(\hat\lambda_n - \lambda^\circ) \xrightarrow{d} N\!\left( 0,\; \frac{\mathrm{Var}\big[ (Z_\mu - h_0)\, e^{-\lambda^\circ (Z_\mu - h_0)} \big]}{\Big( \mathbb{E}\big[ (Z_\mu - h_0)^2\, e^{-\lambda^\circ (Z_\mu - h_0)} \big] \Big)^2} \right).
\]

3.3.3 Goodness-of-fit test

In this subsection we consider goodness-of-fit tests for the sparse Erdős–Rényi random graph. In particular, we assume that the sample $G_{n,1}, \dots, G_{n,n}$ is generated from $G\big(m_n, (\tfrac{c}{m_n})^{1/k(H)}\big)$, and we want to test between the following hypotheses:
\[
H_0 : c = c_0 \quad \text{vs} \quad H_1 : c > c_0. \qquad (3.6)
\]
Let
\[
\sigma_0^2 = \frac{\mathrm{Var}\big[ (Z_{\mu_0} - h_0)\, e^{-\lambda^\circ_{c_0} (Z_{\mu_0} - h_0)} \big]}{\Big( \mathbb{E}\big[ (Z_{\mu_0} - h_0)^2\, e^{-\lambda^\circ_{c_0} (Z_{\mu_0} - h_0)} \big] \Big)^2}. \qquad (3.7)
\]
Define $\mu = c^{v(H)}/|\mathrm{Aut}(H)|$, $\mu_0 = c_0^{v(H)}/|\mathrm{Aut}(H)|$, $\lambda^\circ_{c_0} = \log\frac{\mu_0}{h_0}$, and $\lambda^\circ_{c} = \log\frac{\mu}{h_0}$. Further, we note that $h_0$ can be chosen to be $\mu_0$, in which case $\lambda^\circ_{c_0} = 0$ and $\sigma_0^2 = 1/\mu_0$ (by plugging $\lambda^\circ_{c_0} = 0$ into (3.7)).
We are now ready to construct the test:

Test 3.2 (Goodness-of-fit test in the sparse regime). The test is given by
\[
\phi_n = \mathbb{I}\left\{ \hat\lambda_n > \frac{1}{\sqrt{\mu_0 n}}\, z_\alpha \right\},
\]
where $z_\alpha$ is the $1-\alpha$ quantile of the standard normal distribution.

Corollary 3.2. The goodness-of-fit test in the sparse regime, i.e. Test 3.2, is a level-$\alpha$ consistent test.

Proof. From Theorem 3.5 it follows that the above is a level-$\alpha$ test. Indeed, under $H_0$, $\lambda^\circ_{c_0} = 0$, and hence
\[
\mathbb{P}_{H_0}(\text{Reject Null}) = \mathbb{E}_{c_0}\left[ \mathbb{I}\left\{ \hat\lambda_n > \frac{1}{\sqrt{\mu_0 n}} z_\alpha \right\} \right] = \mathbb{P}_{c_0}\left[ \frac{\sqrt{n}\,(\hat\lambda_n - \lambda^\circ_{c_0})}{1/\sqrt{\mu_0}} > z_\alpha \right] \to \alpha.
\]
Further, we observe that
\[
\mathbb{P}_{H_1}(\text{Reject Null}) = \mathbb{E}_{c}[\phi_n] = \mathbb{P}\left[ \hat\lambda_n > \frac{1}{\sqrt{\mu_0 n}} z_\alpha \right] = \mathbb{P}\left[ \sqrt{n}\,(\hat\lambda_n - \lambda^\circ_{c}) > -\sqrt{n}\,\lambda^\circ_{c} + \frac{1}{\sqrt{\mu_0}} z_\alpha \right].
\]
The right-hand side converges to 1, as $\lambda^\circ_c > 0$ when $c > c_0$ (corresponding to the alternative hypothesis), so the test is consistent.

3.3.4 Two sample test

Next, we consider the two-sample setting in which we observe $n_1$ i.i.d. graphs drawn from $G\big(m_{n_1}, (\tfrac{c_1}{m_{n_1}})^{1/k(H)}\big)$ and $n_2$ i.i.d. graphs drawn from $G\big(m_{n_2}, (\tfrac{c_2}{m_{n_2}})^{1/k(H)}\big)$. Our goal is to test the null hypothesis $c_1 = c_2$, where $c_1$ and $c_2$ are unknown positive reals with $c_j > c_0$ for $j \in \{1, 2\}$ and some known $c_0 > 0$. When $m_{n_1} = m_{n_2}$, this reduces to testing equality of edge probabilities; in the general case, where $m_{n_1} \neq m_{n_2}$, it corresponds to testing whether the two graph sequences have asymptotically identical subgraph counts.

The two-sample test is obtained through a careful and indirect application of Theorem 3.5. Rather than reformulating the null hypothesis as a constrained optimization problem, we exploit the asymptotic theory developed earlier in a different manner. Specifically, we construct two auxiliary hypothesis testing problems, whose corresponding test statistics can be appropriately combined to yield the desired two-sample test statistic. Set
\[
h_0 = \frac{c_0^{v(H)}}{|\mathrm{Aut}(H)|}.
\]
For the first sample, we consider the auxiliary hypothesis testing problem
\[
H_0 : c = c_0 \quad \text{vs.} \quad H_1 : c = c_1.
\]
Define $h_{n_1,i} = T(H, G_{n_1,i}) - h_0$, and let $\hat\lambda_{n_1}$ denote the solution to
\[
\sum_{i=1}^{n_1} h_{n_1,i}\, e^{-\lambda h_{n_1,i}} = 0.
\]
Let $\mu_1 = c_1^{v(H)}/|\mathrm{Aut}(H)|$ and $\lambda^\circ_{c_1} = \log\frac{\mu_1}{h_0}$. Then, by Theorem 3.5, we have
\[
\sqrt{n_1}\,\big( \hat\lambda_{n_1} - \lambda^\circ_{c_1} \big) \xrightarrow{d} N\!\left( 0,\; \frac{\mathrm{Var}\big[ (Z_{\mu_1} - h_0)\, e^{-\lambda^\circ_{c_1} (Z_{\mu_1} - h_0)} \big]}{\Big( \mathbb{E}\big[ (Z_{\mu_1} - h_0)^2\, e^{-\lambda^\circ_{c_1} (Z_{\mu_1} - h_0)} \big] \Big)^2} \right). \qquad (3.8)
\]
Similarly, for the second sample we consider the auxiliary hypothesis testing problem
\[
H_0 : c = c_0 \quad \text{vs.} \quad H_1 : c = c_2,
\]
which yields
\[
\sqrt{n_2}\,\big( \hat\lambda_{n_2} - \lambda^\circ_{c_2} \big) \xrightarrow{d} N\!\left( 0,\; \frac{\mathrm{Var}\big[ (Z_{\mu_2} - h_0)\, e^{-\lambda^\circ_{c_2} (Z_{\mu_2} - h_0)} \big]}{\Big( \mathbb{E}\big[ (Z_{\mu_2} - h_0)^2\, e^{-\lambda^\circ_{c_2} (Z_{\mu_2} - h_0)} \big] \Big)^2} \right). \qquad (3.9)
\]
When $c_1 = c_2$, we have $\mu_1 = \mu_2$, and consequently $\lambda^\circ_{c_1} = \lambda^\circ_{c_2}$. It therefore follows from (3.8) and (3.9) that the difference $\hat\lambda_{n_1} - \hat\lambda_{n_2}$ serves as a natural test statistic for the null hypothesis $H_0 : c_1 = c_2$. To formally construct a two-sample test, one must establish the asymptotic normality of $\hat\lambda_{n_1} - \hat\lambda_{n_2}$ under appropriate centering and scaling. While it may appear that this follows directly from combining two instances of Theorem 3.5, the possibility of unequal sample sizes $n_1 \neq n_2$ necessitates additional technical arguments. The following theorem makes this precise.

Theorem 3.6 (Two sample asymptotics). Let the graphs $G_{n_1,1}, \dots, G_{n_1,n_1}$ be sampled independently from $G\big(m_{n_1}, (\tfrac{c_1}{m_{n_1}})^{1/k(H)}\big)$, and let $\tilde G_{n_2,1}, \dots, \tilde G_{n_2,n_2}$ be sampled independently from $G\big(m_{n_2}, (\tfrac{c_2}{m_{n_2}})^{1/k(H)}\big)$, where $m_{n_i} \gg n_i^{\frac{1}{2(2-k(H))}}$ are sequences of natural numbers for $i \in \{1, 2\}$ and $n_1/n_2 \to \rho \in (0, \infty)$.
Let $\hat\lambda_{n_1}$ be the unique real root of
\[
\sum_{i=1}^{n_1} h_{n_1,i}\, e^{-\lambda h_{n_1,i}} = 0,
\]
and $\hat\lambda_{n_2}$ be the unique real root of
\[
\sum_{i=1}^{n_2} h_{n_2,i}\, e^{-\lambda h_{n_2,i}} = 0,
\]
where $h_{n_1,i} = T(H, G_{n_1,i}) - h_0$ and $h_{n_2,i} = T(H, \tilde G_{n_2,i}) - h_0$. Now let
\[
\sigma^2_{n_1,n_2} = \frac{\mathrm{Var}\big[ (Z_{\mu_1} - h_0)\, e^{-\lambda^\circ_{c_1} (Z_{\mu_1} - h_0)} \big]}{n_1 \Big( \mathbb{E}\big[ (Z_{\mu_1} - h_0)^2\, e^{-\lambda^\circ_{c_1} (Z_{\mu_1} - h_0)} \big] \Big)^2} + \frac{\mathrm{Var}\big[ (Z_{\mu_2} - h_0)\, e^{-\lambda^\circ_{c_2} (Z_{\mu_2} - h_0)} \big]}{n_2 \Big( \mathbb{E}\big[ (Z_{\mu_2} - h_0)^2\, e^{-\lambda^\circ_{c_2} (Z_{\mu_2} - h_0)} \big] \Big)^2},
\]
where $\lambda^\circ_{c_j} = \log\frac{\mu_j}{h_0}$ and $\mu_j = c_j^{v(H)}/|\mathrm{Aut}(H)|$ for $j \in \{1, 2\}$. Then the following holds:
\[
\frac{(\hat\lambda_{n_1} - \hat\lambda_{n_2}) - (\lambda^\circ_{c_1} - \lambda^\circ_{c_2})}{\sigma_{n_1,n_2}} \xrightarrow{d} N(0, 1).
\]
Suppose we want to test between the following hypotheses:
\[
H_0 : c_1 = c_2 \quad \text{vs} \quad H_1 : c_1 \neq c_2.
\]
Under the null hypothesis, $\lambda^\circ_{c_1} = \lambda^\circ_{c_2} = \lambda^\circ$ (say). Further, the variance simplifies to
\[
\sigma^2_{n_1,n_2} = \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \frac{\mathrm{Var}\big[ (Z_\mu - h_0)\, e^{-\lambda^\circ (Z_\mu - h_0)} \big]}{\Big( \mathbb{E}\big[ (Z_\mu - h_0)^2\, e^{-\lambda^\circ (Z_\mu - h_0)} \big] \Big)^2}
\]
with $\mu = h_0 e^{\lambda^\circ}$. Since $\hat\lambda_{n_j} \xrightarrow{P} \lambda^\circ$ for $j \in \{1, 2\}$ (see Theorem 3.4),
\[
\hat\lambda = \frac{\hat\lambda_{n_1} + \hat\lambda_{n_2}}{2}
\]
is a consistent estimator of $\lambda^\circ$, and thus $\hat\mu = h_0 e^{\hat\lambda}$ is a consistent estimator of $\mu$. Let $Z_{\hat\mu}$ be a Poisson random variable with the random parameter $\hat\mu$. Further, we denote the conditional expectation and conditional variance with respect to the random variable $Z_{\hat\mu}$ by $\mathbb{E}[\,\cdot \mid \hat\lambda\,]$ and $\mathrm{Var}[\,\cdot \mid \hat\lambda\,]$, respectively. The following lemma shows that $\mathrm{Var}\big[ (Z_\mu - h_0)\, e^{-\lambda^\circ (Z_\mu - h_0)} \big]$ and $\mathbb{E}\big[ (Z_\mu - h_0)^2\, e^{-\lambda^\circ (Z_\mu - h_0)} \big]$ can be consistently estimated.

Lemma 3.2.
We have
\[
\mathbb{E}\big[ (Z_{\hat\mu} - h_0)^2\, e^{-\hat\lambda (Z_{\hat\mu} - h_0)} \,\big|\, \hat\lambda \big] \xrightarrow{P} \mathbb{E}\big[ (Z_\mu - h_0)^2\, e^{-\lambda^\circ (Z_\mu - h_0)} \big],
\]
\[
\mathrm{Var}\big[ (Z_{\hat\mu} - h_0)\, e^{-\hat\lambda (Z_{\hat\mu} - h_0)} \,\big|\, \hat\lambda \big] \xrightarrow{P} \mathrm{Var}\big[ (Z_\mu - h_0)\, e^{-\lambda^\circ (Z_\mu - h_0)} \big].
\]
Thus, defining $\hat\sigma^2_{n_1,n_2}$ by
\[
\hat\sigma^2_{n_1,n_2} := \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \frac{\mathrm{Var}\big[ (Z_{\hat\mu} - h_0)\, e^{-\hat\lambda (Z_{\hat\mu} - h_0)} \,\big|\, \hat\lambda \big]}{\Big( \mathbb{E}\big[ (Z_{\hat\mu} - h_0)^2\, e^{-\hat\lambda (Z_{\hat\mu} - h_0)} \,\big|\, \hat\lambda \big] \Big)^2},
\]
we observe that $\hat\sigma^2_{n_1,n_2}$ is a consistent estimator of $\sigma^2_{n_1,n_2}$, and consequently, by Slutsky's theorem,
\[
\frac{\hat\lambda_{n_1} - \hat\lambda_{n_2}}{\hat\sigma_{n_1,n_2}} \xrightarrow{d} N(0, 1)
\]
under $H_0$. Thus we can consider the following test.

Test 3.3 (Two sample test in the sparse regime). The test is given by
\[
\phi_n = \mathbb{I}\big\{ |\hat\lambda_{n_1} - \hat\lambda_{n_2}| > \hat\sigma_{n_1,n_2}\, z_{\alpha/2} \big\},
\]
where $z_{\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution.

From Theorem 3.6 it follows that the above is a level-$\alpha$ test. The consistency of this test can be shown as in the goodness-of-fit case, and we omit the details.

Corollary 3.3. The two sample test in the sparse regime, i.e. Test 3.3, is a level-$\alpha$ consistent test.

3.4 Networks of growing size: the dense regime

In this setup we consider $n$ i.i.d. copies of random graphs sampled from the Exponential Random Graph Model (cf. (2.11)) with probability distribution
\[
\mathbb{P}_\beta(G) = \frac{1}{Z_N(\beta)} \exp\left\{ N^2 \sum_{k=1}^{K} \beta_k\, t(T_k, G) \right\}. \qquad (3.10)
\]
We particularly focus on the ERGM in the ferromagnetic regime (where the parameters $\beta_k$ are positive for $k > 1$) and the subcritical regime (see (2.13)). In contrast to the sparse setting, the dense regime presents considerably greater analytical challenges. A key technical reason for this is that the asymptotic normality of the exponentially tilted empirical $H$-count is not available in this regime. Instead, we interpret $\hat\lambda_n$ as a critical point of the function $\frac{1}{n}\sum_{i=1}^{n} e^{-\lambda h_{n,i}/m_n^{v(H)-2}}$ (note the different scaling).
We consider a ferromagnetic ERGM in the subcritical regime and show that the logarithm of the optimization objective (described above), after appropriate scaling, converges to the optimum of a convex functional defined over the space of graphons. In particular, this log objective function can be interpreted as the free energy of a tilted exponential random graph model (ERGM), and the convergence follows from the results established in [16]. The study of the corresponding Lagrange multiplier $\hat\lambda_n$ requires a more delicate analysis, which relies on the log-sum-exp approximation techniques from the nonlinear large deviations literature [15]. We note that the results for dense Erdős–Rényi graphs emerge as a natural special case of the ERGM framework discussed here, and complement the results for sparse Erdős–Rényi graphs in Section 3.3.

Recall that the $i$th centred motif count is given by $h_{n,i} = T(H, G_{n,i}) - h_0$. We set the centering term to be the expected $H$-count under the null model, $\mathbb{E}_{G \sim \mathbb{P}_{\beta_0}}[T(H, G)] =: h_0$, where $\mathbb{P}_{\beta_0}$ is an ERGM with parameter vector $\beta_0 \in \mathbb{R}^K$. In this dense setting, $h_0$ is of order $m_n^{v(H)}$, while large deviations are governed by an $m_n^2$ scale. This motivates the exponential tilt
\[
F_n(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \exp\left\{ -\frac{\lambda}{m_n^{v(H)-2}}\, h_{n,i} \right\}, \qquad (3.11)
\]
so that $\frac{1}{m_n^2} \log F_n(\lambda)$ has a nondegenerate limit.

3.4.1 Consistency of the Lagrange Multiplier

As in the sparse Erdős–Rényi setup, we first consider the convergence of the critical point of a population version of (3.11).

Theorem 3.7. Let $H_m$ be the number of copies of a graph $H$ in a graph drawn from a ferromagnetic ERGM $\mathbb{P}_\beta$ on $m$ vertices, where $\beta$ is in the subcritical regime. Let $\lambda_m$ be the unique critical point of the function $\mathbb{E}\big[ e^{-\lambda (H_m - h_0)/m^{v(H)-2}} \big]$.
We define the function
$$g(\lambda) := \frac{\lambda h_0}{m^{v(H)}} + \sup_{h \in \widetilde{W}}\Big( -\frac{\lambda}{|\mathrm{Aut}(H)|}\, t(H, h) + \sum_{k=1}^{K} \beta_k\, t(T_k, h) - \frac{1}{2} I(h) \Big) - \sup_{h \in \widetilde{W}}\Big( \sum_{k=1}^{K} \beta_k\, t(T_k, h) - \frac{1}{2} I(h) \Big). \qquad (3.12)$$
Then the function $g(\lambda)$ is strictly convex and has a unique critical point. Further, if we let $\lambda^\circ$ be the unique critical point of this function, then as $m \to \infty$ we have $|\lambda_m - \lambda^\circ| \to 0$.

Next we establish the empirical version of the above theorem.

Theorem 3.8. Let $G_{n,1}, \ldots, G_{n,n}$ be sampled independently from a ferromagnetic ERGM $P_\beta$ (with $\beta$ in the subcritical regime) on $m_n$ vertices, where $m_n^2 = o(\log n)$, and let $\hat\lambda_n$ be the unique critical point of $\frac{1}{n}\sum_{i=1}^n e^{-\lambda m_n^{-(v(H)-2)} h_{n,i}}$. Let $\lambda^\circ$ be the unique critical point of the function $g(\lambda)$ defined in (3.12). Then as $n \to \infty$, $|\hat\lambda_n - \lambda^\circ| \xrightarrow{P} 0$.

3.4.2 Sharp rates

Next we establish sharper versions of the above convergence, which can be used to construct consistent tests. The analyses of the cases $\lambda^\circ = 0$ and $\lambda^\circ < 0$ are quite different, and hence we state the results in two separate theorems.

Theorem 3.9. Suppose $\lambda^\circ = 0$. Then, under the assumptions of Theorem 3.8, we have $m_n^2(\hat\lambda_n - \lambda^\circ) \xrightarrow{P} 0$.

Theorem 3.10. Suppose $\lambda^\circ < 0$ is such that the maximizer is unique. Further define $p_0 := \lim_{m_n \to \infty} \big( |\mathrm{Aut}(H)|\, h_0\, m_n^{-v(H)} \big)^{1/e(H)}$. Then, under the assumptions of Theorem 3.9, we have
$$m_n^2(\hat\lambda_n - \lambda^\circ) \xrightarrow{P} \frac{1}{\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}},$$
where $u^*$ is the unique maximizer of
$$\sup_{u \in [0,1]}\Big( -\frac{\lambda^\circ u^{e(H)}}{|\mathrm{Aut}(H)|} + \sum_{k=1}^{K} \beta_k\, u^{e(T_k)} - \frac{1}{2} I(u) \Big).$$

3.4.3 Goodness-of-fit test

In this section we consider goodness-of-fit tests for a ferromagnetic ERGM in the subcritical regime, analogous to our investigations in the sparse regime. In particular, we assume that the sample $G_{n,1}, \ldots$
, $G_{n,n}$ is generated from $P_\beta$ with $\beta$ in the subcritical regime (see Section 2.3 for details). Now consider the following hypothesis testing problem:
$$H_0 : E_{G\sim P_\beta}[h(G)] = 0 \quad \text{vs} \quad H_1 : E_{G\sim P_\beta}[h(G)] < 0, \qquad (3.13)$$
where $h(G) = T(H, G) - E_{G\sim P_{\beta_0}}[T(H, G)]$ denotes the centered $H$-count in the graph $G$. We note that if $\beta = \beta_0$, then $E_{G\sim P_\beta}[h(G)] = 0$, while the converse may not be true. As a consequence, this hypothesis testing problem cannot be reduced to the simpler problem of testing parameter values, as in the Erdős–Rényi case. We can construct the test as follows.

Test 3.4 (Goodness-of-fit test in the dense regime). Let $c_n \to \infty$ be such that $c_n = o(m_n^2)$. We consider the following test: $\phi_n = \mathbb{1}\{\hat\lambda_n < -c_n^{-1}\}$.

Corollary 3.4. The goodness-of-fit test in the dense regime, i.e. Test 3.4, is a consistent test.

Proof. By Theorem 3.9, under $H_0$ we have $\hat\lambda_n > -\frac{1}{2m_n^2}$ with probability converging to 1, and hence $E_{H_0}[\phi_n] \to 0$. On the other hand, for $\lambda^\circ < 0$ (under $H_1$),
$$P(\hat\lambda_n < -c_n^{-1}) \geq P\Bigg( \hat\lambda_n < \lambda^\circ + \frac{1}{m_n^2}\Bigg( \frac{1}{2} + \frac{1}{\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}} \Bigg) \Bigg).$$
The RHS converges to 1 by Theorem 3.10, so that the test is consistent.

For the Erdős–Rényi model $G(m_n, p)$, the distribution $P_\beta$ can be denoted by $P_p$, and our hypothesis testing problem reduces to testing whether $p = p_0$ (null hypothesis) or $p < p_0$.

3.4.4 Two-sample test

If we have two samples from two Erdős–Rényi models, then, analogous to the sparse regime, we can test whether they come from an Erdős–Rényi model with the same parameter, based on the Lagrange multipliers obtained from the two samples. While a theory of two-sample testing for motif density can be shown to hold for the ERGM model as well, this does not automatically translate into two-sample testing of parameters (see the discussion before Test 3.4), and is thus less interpretable.
Therefore, for two-sample testing, we focus here only on dense Erdős–Rényi graphs.

Theorem 3.11. Assume $p, \tilde p \in (0, 1-\epsilon)$, where $\epsilon \in (0,1)$ is fixed. Let $G_{n,1}, \ldots, G_{n,n}$ be sampled independently from $G(m_n, p)$, and let $\tilde G_{n,1}, \ldots, \tilde G_{n,n}$ be sampled independently from $G(m_n, \tilde p)$, and independently of the first sample, where $(m_n)$ is a sequence of natural numbers with $m_n^2 = o(\log n)$. Let $\hat\lambda_n$ be the unique critical point of
$$\frac{1}{n} \sum_{i=1}^{n} e^{-\lambda m_n^{-(v(H)-2)} \big( T(H, G_{n,i}) - \binom{m_n}{v(H)} (1-\epsilon)^{e(H)} \big)}$$
and let $\tilde\lambda_n$ be the unique critical point of
$$\frac{1}{n} \sum_{i=1}^{n} e^{-\lambda m_n^{-(v(H)-2)} \big( T(H, \tilde G_{n,i}) - \binom{m_n}{v(H)} (1-\epsilon)^{e(H)} \big)}.$$
Then, as $n \to \infty$,
$$m_n^2(\hat\lambda_n - \tilde\lambda_n) \xrightarrow{P} \begin{cases} 0 & \text{if } p = \tilde p, \\ +\infty & \text{if } p > \tilde p, \\ -\infty & \text{if } p < \tilde p. \end{cases}$$
We can easily design a two-sample test based on the above theorem.

Test 3.5 (Two-sample test in the dense regime). The two-sample test is given by $\phi = \mathbb{1}\{|\hat\lambda_n - \tilde\lambda_n| > c\, m_n^{-2}\}$.

Corollary 3.5. The two-sample test in the dense regime, i.e. Test 3.5, is a consistent test.

3.5 A few key technical ingredients

3.5.1 Uniqueness of roots

In this section we discuss the existence and uniqueness of the Lagrange multiplier $\hat\lambda_n$ that is crucially used in Theorems 3.1–3.11. We note that in the sparse and fixed-number-of-vertices cases the Lagrange multiplier is scaled differently than in the dense regime.

Lemma 3.3. Let $h_{n,i}$ denote the centered motif counts defined in Theorems 3.1–3.11. In the networks of fixed size and in the sparse regime, let $\hat\lambda_n$ be a real root of the estimating equation
$$\sum_{i=1}^{n} h_{n,i}\, e^{-\lambda h_{n,i}} = 0,$$
whenever such a root exists. In the dense regime, let $\hat\lambda_n$ be a critical point of the objective function
$$\frac{1}{n} \sum_{i=1}^{n} \exp\big( -\lambda\, m_n^{-(v(H)-2)}\, h_{n,i} \big),$$
whenever such a critical point exists. Then in both cases $\hat\lambda_n$ exists and is unique.
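The uniqueness in Lemma 3.3 ultimately rests on a monotonicity property: the estimating function $\psi(\lambda) = \sum_i h_{n,i}\, e^{-\lambda h_{n,i}}$ has derivative $\psi'(\lambda) = -\sum_i h_{n,i}^2\, e^{-\lambda h_{n,i}} < 0$, so $\psi$ is strictly decreasing and can cross zero at most once (and likewise for the rescaled dense-regime version). The toy check below illustrates this; the $h$ values are synthetic Poisson draws, not counts from a real graph sample.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.poisson(2.0, size=200) - 2.0   # centred toy "motif counts"

def psi(lam):
    # estimating function: sum_i h_i * exp(-lam * h_i)
    return np.sum(h * np.exp(-lam * h))

def psi_prime(lam):
    # derivative: -sum_i h_i^2 * exp(-lam * h_i), strictly negative
    return -np.sum(h ** 2 * np.exp(-lam * h))

grid = np.linspace(-3.0, 3.0, 121)
assert all(psi_prime(l) < 0.0 for l in grid)   # strictly decreasing on the grid
assert psi(-3.0) > 0.0 > psi(3.0)              # sign change: a root exists, and it is unique
```

Existence, by contrast, requires the centred counts to take both signs, so that $\psi$ changes sign; strict monotonicity then pins down a single root.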
3.5.2 One-sided vs two-sided tests

We note that the general goodness-of-fit testing problem discussed in Section 2.1, in particular the statement in (2.3), is a two-sided hypothesis testing problem, i.e. the alternative hypothesis considers a significant difference in either direction (greater than or less than) from the null hypothesis. On the other hand, the tests we consider in this article are one-sided hypothesis tests only:
• In the sparse regime: $H_0 : c = c_0$ vs $H_1 : c > c_0$.
• In the dense regime: $H_0 : E_{G\sim P_\beta}[h(G)] = 0$ vs $H_1 : E_{G\sim P_\beta}[h(G)] < 0$.
In the sparse regime, the condition $c \geq c_0$ is equivalent to $\lambda^\circ \geq 0$. For $\lambda^\circ < 0$, the function $x \mapsto (x - h_0)\, e^{-\lambda(x - h_0)}$ is unbounded, and consequently Poisson convergence of subgraph counts does not imply $L^1$ convergence of this function. In fact, one can show that in Theorem 3.3, $E\big[(H_m - h_0)\, e^{-\lambda(H_m - h_0)}\big] \to \infty$ as $m \to \infty$, so that the asymptotic behaviour of $\lambda_m$ and $\hat\lambda_n$ cannot be characterized using the techniques developed in this paper. For this reason, we restrict attention to a one-sided test in the sparse regime.

In the dense regime, the condition $E_{G\sim P_\beta}[h(G)] \leq 0$ corresponds to $\lambda^\circ \leq 0$ in Theorem 3.7. When $\lambda^\circ > 0$, the supremum in (3.12) need not be attained at a constant graphon. Since our derivation of the sharp rates (Theorems 3.9 and 3.10) relies crucially on the structure of this supremum, we cannot construct a valid test in this setting using the available techniques. Accordingly, the two-sided testing problem lies outside the scope of the present work and is left for future investigation.

4 Proof Outline

In this section we provide a brief overview of the proof techniques used to establish the main results of the paper. As discussed earlier, our analysis proceeds under three distinct regimes: a fixed number of vertices, the sparse regime, and the dense regime.
Accordingly, we divide the proof outline into three parts.

4.1 Proofs for a fixed number of vertices

In the fixed-number-of-vertices setting, the results concern consistency and asymptotic normality of the Lagrange multiplier $\hat\lambda_n$. Consistency is established in Theorem 3.1, while asymptotic normality is shown in Theorem 3.2.

• Theorem 3.1. Using standard empirical process arguments, we show that the empirical mean of the tilted motif count (see (3.1)) converges uniformly over compact sets to its population counterpart. Specifically, we have
$$\sup_{\lambda \in K} \Big| \sum_a a\, e^{-\lambda a}\, \frac{|\{i \mid h_i = a\}|}{n} - \sum_a a\, e^{-\lambda a}\, P_G(h = a) \Big| \to 0$$
almost surely, for any compact set $K$. Uniqueness of the root then yields consistency of the estimator.

• Theorem 3.2. We establish the asymptotic normality of the function given in (3.1) after appropriate centering and scaling. Standard $Z$-estimation theory then implies asymptotic normality of the corresponding root. The details are omitted, as the proof follows in the same vein as the proof of Theorem 3.5.

4.2 Proofs for the sparse regime

In the sparse regime, proving consistency is more nuanced and is carried out in two steps. As a first step, in Theorem 3.3 we show that the root of the population version of (2.6) converges to a constant $\lambda^\circ$. In the second step we establish Theorem 3.4: we combine a strong law of large numbers with Theorem 3.3 to identify the limit of the empirical root of (2.6). Next, in Theorem 3.5 we derive a central limit theorem for the exponentially tilted motif counts (the LHS of (2.6)) and invoke $Z$-estimation theory to obtain asymptotic normality of the root. Finally, in Theorem 3.6 we establish asymptotic normality for the difference of roots obtained from independent samples, which leads naturally to a two-sample test.

• Theorem 3.3. In this theorem we show that the root of the population version of (2.6) converges to the root of the Poisson limit.
Using classical Poisson convergence results for motif counts, we show that the expectation of the exponentially tilted motif count converges to the corresponding expectation under a Poisson limit:
$$E\big[(H_m - h_0)\, e^{-\lambda(H_m - h_0)}\big] = E\big[(Z_\mu - h_0)\, e^{-\lambda(Z_\mu - h_0)}\big]\, (1 \pm o(1)). \qquad (4.1)$$
Monotonicity of the function $x e^{-\lambda x}$ then ensures that $|\lambda_m - \lambda^\circ| \to 0$.

• Theorem 3.4. This theorem proves consistency of the root of (2.6). Using the strong law of large numbers, we have
$$\frac{\sum_{i=1}^{n} h_{n,i}\, e^{-\lambda h_{n,i}}}{n} - E\big[h_{n,1}\, e^{-\lambda h_{n,1}}\big] \xrightarrow{a.s.} 0.$$
Using standard arguments from empirical process theory and (4.1), we show that
$$\sup_{\lambda \in K} \Big| \frac{1}{n} \sum_{i=1}^{n} h_{n,i}\, e^{-\lambda h_{n,i}} - E\big[(Z_\mu - h_0)\, e^{-\lambda(Z_\mu - h_0)}\big] \Big| \to 0 \quad \text{a.s.}$$
for any compact set $K \subset [0, \infty)$. Uniqueness of the root then ensures $\hat\lambda_n \to \lambda^\circ$ in probability.

• Theorem 3.5. In this theorem we prove asymptotic normality of the root of (2.6). Let $\Psi_n(\lambda) = \frac{1}{n}\sum_{i=1}^{n} h_{n,i}\, e^{-\lambda h_{n,i}}$. Using a Taylor expansion about $\lambda^\circ$, one shows that
$$\sqrt{n}(\hat\lambda_n - \lambda^\circ) = -\frac{\sqrt{n}\big(\Psi_n(\lambda^\circ) - E[\Psi_n(\lambda^\circ)]\big)}{\dot\Psi_n(\lambda^\circ) + \frac{1}{2}(\hat\lambda_n - \lambda^\circ)\, \ddot\Psi_n(\tilde\lambda_n)} + o_p(1), \qquad (4.2)$$
where $\tilde\lambda_n$ is a point between $\hat\lambda_n$ and $\lambda^\circ$. Further, by arguments similar to those in Theorem 3.4, we have
$$\dot\Psi_n(\lambda^\circ) \to E\big[-(Z_\mu - h_0)^2 e^{-\lambda^\circ(Z_\mu - h_0)}\big], \qquad \ddot\Psi_n(\lambda^\circ) \to E\big[(Z_\mu - h_0)^3 e^{-\lambda^\circ(Z_\mu - h_0)}\big].$$
Then the consistency of $\hat\lambda_n$ ensures that the denominator on the RHS of (4.2) converges in probability:
$$\dot\Psi_n(\lambda^\circ) + \frac{1}{2}(\hat\lambda_n - \lambda^\circ)\, \ddot\Psi_n(\tilde\lambda_n) \xrightarrow{P} E\big[-(Z_\mu - h_0)^2 e^{-\lambda^\circ(Z_\mu - h_0)}\big].$$
Next, using Lemma 3.1, we show that
$$\sqrt{n}\big(\Psi_n(\lambda^\circ) - E[\Psi_n(\lambda^\circ)]\big) \xrightarrow{d} N\Big(0, \mathrm{Var}\big[(Z_\mu - h_0)\, e^{-\lambda^\circ(Z_\mu - h_0)}\big]\Big).$$
Finally, using Slutsky's theorem, we get
$$\sqrt{n}(\hat\lambda_n - \lambda^\circ) \xrightarrow{d} N\Bigg(0, \frac{\mathrm{Var}\big[(Z_\mu - h_0)\, e^{-\lambda^\circ(Z_\mu - h_0)}\big]}{E\big[(Z_\mu - h_0)^2 e^{-\lambda^\circ(Z_\mu - h_0)}\big]^2}\Bigg).$$

• Theorem 3.6.
Here, we establish asymptotic normality for the difference of roots obtained from independent samples. Using Theorem 3.5 we obtain
$$\sqrt{n_1}(\hat\lambda_{n_1} - \lambda^\circ_{c_1}) \xrightarrow{d} N(0, \sigma_1^2), \qquad \sqrt{n_2}(\hat\lambda_{n_2} - \lambda^\circ_{c_2}) \xrightarrow{d} N(0, \sigma_2^2),$$
with
$$\sigma_i^2 = \frac{\mathrm{Var}\big[(Z_{\mu_i} - h_0)\, e^{-\lambda^\circ_{c_i}(Z_{\mu_i} - h_0)}\big]}{E\big[(Z_{\mu_i} - h_0)^2 e^{-\lambda^\circ_{c_i}(Z_{\mu_i} - h_0)}\big]^2}, \qquad i \in \{1, 2\}.$$
Using the fact that the samples are independent, we show that
$$T_n := \frac{(\hat\lambda_{n_1} - \hat\lambda_{n_2}) - (\lambda^\circ_{c_1} - \lambda^\circ_{c_2})}{\sqrt{V^*_n}} \xrightarrow{d} N(0, 1), \qquad \text{with } V^*_n := \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
We note that the last statement does not follow immediately from Theorem 3.5, and a nuanced analysis is needed to prove this theorem because the sample sizes $n_1$ and $n_2$ are different.

4.3 Proofs for the dense regime

In the dense regime, the general theory corresponds to a dense exponential random graph model (ERGM), with the dense Erdős–Rényi model as a special case. The optimal Lagrange multiplier is interpreted as a critical point of the objective function (3.11), and our analysis is based on the asymptotic behavior of this critical point. In the spirit of Theorem 3.3, as a first step we show, in Theorem 3.7, that the critical point of the population version of (3.11) converges to the solution of the variational problem (3.12). In the second step, in Theorem 3.8, we use log-sum-exp approximation techniques from the nonlinear large deviations literature to show that the critical point of (3.11) converges to the same variational limit. Rates of convergence of the critical point $\hat\lambda_n$ are established in Theorem 3.9 and Theorem 3.10. In Theorem 3.9 we consider the case where the limiting parameter satisfies $\lambda^\circ = 0$; we use a Taylor expansion and concentration inequalities to show that the empirical critical point converges to zero at rate $o(m_n^{-2})$.
In Theorem 3.10 we consider the case $\lambda^\circ < 0$; a Taylor expansion around $\lambda^\circ$, combined with nonlinear large deviation techniques, shows that $m_n^2(\hat\lambda_n - \lambda^\circ)$ converges to a nondegenerate constant. Finally, in Theorem 3.11, we combine Theorems 3.9 and 3.10 to construct a two-sample test for dense Erdős–Rényi models.

• Theorem 3.7. Here, we show that the critical point of the population version of (3.11) converges to the solution of the variational problem (3.12). The proof is based on the large deviation principle for general ERGMs developed in [16]. The expectation $E\big[e^{-\lambda m^{-(v(H)-2)}(H_m - h_0)}\big]$ can be approximated as the partition function of a new ERGM whose Hamiltonian is the sum of the original ERGM Hamiltonian and a perturbation term related to the $H$-count. In particular, we have
$$E\big[e^{-\lambda m^{-(v(H)-2)}(H_m - h_0)}\big] = e^{\lambda h_0 m^{-(v(H)-2)}} \times \frac{\sum_G \exp\big\{ m^2 \big( -\lambda\, t(H, G)/|\mathrm{Aut}(H)| + O(\lambda/m) + \sum_k \beta_k\, t(T_k, G) \big) \big\}}{\sum_G \exp\big\{ m^2 \sum_k \beta_k\, t(T_k, G) \big\}}.$$
From Theorem 3.1 of [16], the normalized log-partition function of an ERGM converges to the supremum of a free energy functional over the space of graphons. Applying this result to our perturbed model, we obtain
$$\frac{1}{m^2} \ln E_\beta\big[e^{-\lambda m^{-(v(H)-2)}(H_m - h_0)}\big] \to \frac{\lambda h_0}{m^{v(H)}} + \sup_{h \in \widetilde{W}}\Big( -\frac{\lambda}{|\mathrm{Aut}(H)|}\, t(H, h) + \sum_{k=1}^{K} \beta_k\, t(T_k, h) - \frac{1}{2} I(h) \Big) - \sup_{h \in \widetilde{W}}\Big( \sum_{k=1}^{K} \beta_k\, t(T_k, h) - \frac{1}{2} I(h) \Big) = g(\lambda) \qquad (4.3)$$
for all $\lambda \in \mathbb{R}$. Since $x \mapsto \ln(x)$ is strictly increasing, it suffices to show the convergence of critical points in the display above. The functions in the sequence above are all strictly convex in $\lambda$, by direct differentiation. The limiting function is convex, being a sum of an affine part and a supremum over convex functions. Now the convergence of critical points will follow once we show that the limiting function is strictly convex.

• Theorem 3.8.
In this theorem our goal is to show the convergence in probability of the empirical critical point $\hat\lambda_n$ to the theoretical one, $\lambda^\circ$, in the dense ERGM regime. The proof hinges on the uniform convergence of the normalized log-partition function
$$\frac{1}{m_n^2} \log\Big( \frac{1}{n} \sum_{i=1}^{n} e^{-\lambda m_n^{-(v(H)-2)} h_{n,i}} \Big)$$
to its limit. The core idea is to apply the log-sum-exp approximation framework from [15]. Define $N(G)$ to be the count of a particular graph $G$ in the sample. Let $x$ be the adjacency vector of a graph and $G_x$ the graph corresponding to the adjacency vector $x$. We further define $N(x) = N(G_x)$ and $H(x) = T(H, G_x)$. The energy function $f(x)$ is defined by
$$f(x) = \log \frac{N(x)}{n} - \lambda\, m_n^{-(v(H)-2)}\, H(x).$$
We rewrite the empirical objective function as a log-partition function over the space of all graphs on $m_n$ vertices:
$$\frac{1}{m_n^2} \log\Big( \sum_G \frac{N(G)}{n}\, e^{-\lambda m_n^{-(v(H)-2)} (T(H,G) - h_0)} \Big) = \frac{\lambda h_0}{m_n^{v(H)}} + \frac{1}{m_n^2} \log \sum_{x \in \{0,1\}^{\binom{m_n}{2}}} e^{f(x)}, \qquad (4.4)$$
and then verify some smoothness conditions on the function $f$, under which we show that the following holds:
$$\frac{1}{m_n^2} \log \sum_{x \in \{0,1\}^{\binom{m_n}{2}}} e^{f(x)} = \sup_{h \in \widetilde{W}}\Big( -\frac{\lambda}{|\mathrm{Aut}(H)|}\, t(H, h) + \sum_{k=1}^{K} \beta_k\, t(T_k, h) - \frac{1}{2} I(h) \Big) - \sup_{h \in \widetilde{W}}\Big( \sum_{k=1}^{K} \beta_k\, t(T_k, h) - \frac{1}{2} I(h) \Big) + o(1). \qquad (4.5)$$
Combining (4.4) with (4.5), we obtain
$$\frac{1}{m_n^2} \log\Big( \frac{1}{n} \sum_{i=1}^{n} e^{-\lambda m_n^{-(v(H)-2)} h_{n,i}} \Big) \to g(\lambda).$$
Using convexity arguments as in Theorem 3.7, we obtain convergence of the critical point $\hat\lambda_n$ to $\lambda^\circ$.

• Theorem 3.9. In this theorem we consider the case where the limiting parameter satisfies $\lambda^\circ = 0$, and show that the empirical critical point converges to zero at rate $o(m_n^{-2})$.
We perform a Taylor expansion of the empirical critical point equation about $\lambda^\circ$ to obtain
$$m_n^2(\hat\lambda_n - \lambda^\circ) = \frac{\sum_{i=1}^{n} \frac{h_{n,i}}{m_n^{v(H)}}\, e^{-\lambda^\circ m_n^{-(v(H)-2)} h_{n,i}}}{\sum_{i=1}^{n} \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\tilde\lambda_n m_n^{-(v(H)-2)} h_{n,i}}}.$$
For $\lambda^\circ = 0$, we get
$$m_n^2 \hat\lambda_n = \frac{\frac{1}{n}\sum_{i=1}^{n} \frac{h_{n,i}}{m_n^{v(H)}}}{\frac{1}{n}\sum_{i=1}^{n} \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\tilde\lambda_n m_n^{-(v(H)-2)} h_{n,i}}} = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{h_{n,i}}{m_n^{v(H)-1}}}{\Big( \frac{1}{n}\sum_{i=1}^{n} \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\tilde\lambda_n m_n^{-(v(H)-2)} h_{n,i}} \Big)\, \sqrt{n}\, m_n}. \qquad (4.6)$$
We define
$$Z_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{h_{n,i}}{m_n^{v(H)-1}}$$
and show that
$$\frac{m_n}{\sqrt[4]{n}}\, Z_n \xrightarrow{P} 0, \qquad (4.7)$$
using concentration inequalities from [30] in the subcritical regime of the ferromagnetic ERGM. In the next step we show that
$$V_n := \sqrt[4]{n} \times m_n \times \frac{1}{n} \sum_{i=1}^{n} \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\tilde\lambda_n m_n^{-(v(H)-2)} h_{n,i}} = \Omega_P\Big( \frac{1}{m_n} \Big), \qquad (4.8)$$
using a central limit theorem for subgraph counts in the subcritical regime of a ferromagnetic ERGM from [67]. Finally, we plug (4.7) and (4.8) into (4.6) to conclude that $m_n^2 \hat\lambda_n \xrightarrow{P} 0$.

• Theorem 3.10. This theorem sharpens the consistency result of Theorem 3.8 by establishing the precise asymptotic behavior of $m_n^2(\hat\lambda_n - \lambda^\circ)$. When $\lambda^\circ < 0$, a Taylor expansion around $\lambda^\circ$, combined with nonlinear large deviation techniques, shows that $m_n^2(\hat\lambda_n - \lambda^\circ)$ converges to a nondegenerate constant. First, we use a Taylor expansion of the empirical critical point equation about $\lambda^\circ$ to show that
$$m_n^2(\hat\lambda_n - \lambda^\circ) = \frac{\sum_{i=1}^{n} \frac{h_{n,i}}{m_n^{v(H)}}\, e^{-\lambda^\circ m_n^{-(v(H)-2)} h_{n,i}}}{\sum_{i=1}^{n} \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\tilde\lambda_n m_n^{-(v(H)-2)} h_{n,i}}}.$$
To analyze the asymptotic behavior of the numerator and the denominator separately, we employ two distinct variational formulations based on the log-sum-exp approximation. For the numerator, we analyze the derivative of the normalized log-partition function.
By Danskin's theorem [23] and the uniform convergence established in the proof of Theorem 3.8, we have
$$\frac{\sum_{i=1}^{n} -\frac{h_{n,i}}{m_n^{v(H)}}\, e^{-\lambda^\circ m_n^{-(v(H)-2)} h_{n,i}}}{\sum_{i=1}^{n} e^{-\lambda^\circ m_n^{-(v(H)-2)} h_{n,i}}} \xrightarrow{P} \frac{d}{d\lambda} g(\lambda)\Big|_{\lambda = \lambda^\circ} = \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|}, \qquad (4.9)$$
where $u^*$ is the unique constant graphon that maximizes the variational problem defining $g(\lambda^\circ)$ from (3.12). For the denominator, we introduce a crucial auxiliary function by adding a squared $H$-count term:
$$\frac{1}{m_n^2} \log \sum_G \exp\Big( -\lambda\, m_n^{-(v(H)-2)}\, h(G) + \log \frac{N(G)}{n} + \alpha\, m_n^{-(2v(H)-2)}\, h(G)^2 \Big),$$
where $h(G) = T(H, G) - h_0$. Once again using the log-sum-exp approximation as in Theorem 3.8, we establish that the auxiliary function converges uniformly to its limiting variational form:
$$\sup_{h \in \widetilde{W}}\Bigg[ \Big( -\frac{\lambda^\circ}{|\mathrm{Aut}(H)|} - \frac{2\alpha\, p_0^{e(H)}}{|\mathrm{Aut}(H)|^2} \Big)\, t(H, h) + \sum_{k=1}^{K} \beta_k\, t(T_k, h) + \frac{\alpha}{|\mathrm{Aut}(H)|^2}\, t(H, h)^2 - \frac{1}{2} I(h) \Bigg] + \frac{\lambda^\circ p_0^{e(H)}}{|\mathrm{Aut}(H)|} + \frac{\alpha\, p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2} - \sup_{h \in \widetilde{W}}\Bigg[ \sum_{k=1}^{K} \beta_k\, t(T_k, h) - \frac{1}{2} I(h) \Bigg].$$
Using Danskin's theorem [23], we compute the derivative of this limiting function with respect to $\alpha$ at $\alpha = 0$, which captures the limiting behavior of the denominator. Since the supremum in the limit is uniquely attained at the constant graphon $u^*$, we obtain
$$\frac{\sum_{i=1}^{n} \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\lambda^\circ m_n^{-(v(H)-2)} h_{n,i}}}{\sum_{i=1}^{n} e^{-\lambda^\circ m_n^{-(v(H)-2)} h_{n,i}}} \xrightarrow{P} \Bigg( \frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|} \Bigg)^2. \qquad (4.10)$$
Combining the limits of the numerator and the denominator yields the stated convergence:
$$m_n^2(\hat\lambda_n - \lambda^\circ) \xrightarrow{P} \frac{\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}}{\Big( \frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|} \Big)^2} = \frac{1}{\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}}. \qquad (4.11)$$

• Theorem 3.11. We combine Theorems 3.9 and 3.10 to construct a two-sample test for dense Erdős–Rényi models.
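The two-sample construction behind Test 3.5 can be sketched numerically. The toy example below takes $H$ to be a triangle ($v(H) = 3$, $e(H) = 3$, tilt scaling $1/m$) and, as in Theorem 3.11, centres the counts at $\binom{m}{3}(1-\epsilon)^{e(H)}$. The graph size $m$ here is far too small for the asymptotic threshold $c\,m^{-2}$ to be meaningful, so we only check that the multipliers from two samples with equal edge probability stay close while those from different probabilities separate; all names and parameter values are ours, for illustration only.

```python
import math
import numpy as np
from scipy.optimize import brentq

def triangles(A):
    """Number of triangles in a simple graph with adjacency matrix A."""
    return np.trace(A @ A @ A) / 6.0

def er_adj(rng, m, p):
    """Adjacency matrix of an Erdos-Renyi graph G(m, p)."""
    U = np.triu(rng.random((m, m)) < p, 1)
    return (U + U.T).astype(float)

def lam_hat(counts, m, center):
    """Critical point of the tilted objective for H = triangle (scaling 1/m)."""
    h = counts - center
    psi = lambda lam: np.sum(h * np.exp(-lam * h / m))
    return brentq(psi, -30.0, 30.0)   # psi is strictly decreasing

rng = np.random.default_rng(3)
m, n, eps = 10, 500, 0.45
center = math.comb(m, 3) * (1.0 - eps) ** 3   # binom(m,3)(1-eps)^{e(H)}

cA = np.array([triangles(er_adj(rng, m, 0.50)) for _ in range(n)])
cB = np.array([triangles(er_adj(rng, m, 0.50)) for _ in range(n)])
cC = np.array([triangles(er_adj(rng, m, 0.45)) for _ in range(n)])
lA, lB, lC = (lam_hat(c, m, center) for c in (cA, cB, cC))
# Theorem 3.11 predicts m^2 (lA - lB) -> 0 for equal p, while
# m^2 |lA - lC| -> infinity when the edge probabilities differ.
```

A smaller edge probability pushes the centred counts further below zero and the fitted multiplier further into the negative, which is the separation the test statistic $|\hat\lambda_n - \tilde\lambda_n|$ picks up.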
5 Discussion

In this paper, we construct tests for statistical network models based on the principle of constrained entropy maximization. Although we analyze the distributions of the optimal Lagrange multipliers (LM), and the associated tests, for particular random graph families, the construction itself is general and, in principle, applies to any random graph model. Thus, this work provides a first step toward a general hypothesis-testing framework for random graphs, and we anticipate that LM-type tests can be used for many other models, much as LM tests are widely used in the econometrics literature.

We conclude with several open questions and extensions. In the sparse regime, we develop our tests for Erdős–Rényi random graphs. It would be interesting to extend these ideas to sparse ERGMs, such as the model in [20], where the Hamiltonian is a multivariate function of motif densities rather than the linear form considered here. A crucial ingredient in our analysis is the behavior of dense ferromagnetic ERGMs in the subcritical regime, where the free energy is approximated by a variational problem whose optimizer corresponds to Erdős–Rényi graphs. This picture ceases to hold for sparse ERGMs: typical samples need not resemble an Erdős–Rényi model (or a mixture), but rather an Erdős–Rényi model with planted substructure. Extending our results to accommodate such sparsity-induced structure is compelling but poses significant challenges.

Another important direction is to move beyond the subcritical regime of ferromagnetic ERGMs. The uniqueness of the variational optimizer is central to our analysis of the optimal Lagrange multiplier, and as per the state of the art in the random graph literature, this is so far available only in the subcritical regime. Recent developments in the supercritical regime have been explored in [67]; extending our testing results to that setting will require nontrivial technical advances.
We also note that, while under the null the Erdős–Rényi model is fully identified, in the sense that expected motif counts and the edge probability are in one-to-one correspondence, the same need not be true for general ERGMs. Consequently, our current tests determine whether the parameter vector $\beta$ lies in a specified subset, rather than testing individual coefficients for zero. We conjecture that formulating null hypotheses that constrain several motif counts simultaneously could yield tests for coefficient nullity in ERGMs. In that case, the Lagrange multiplier becomes a vector, and a natural test statistic would be a quadratic form in $\lambda$, in analogy with the classical Lagrange multiplier tests (cf. Section 2.2).

Finally, we note that in the sparse setting we obtain distributional limits for the centered and scaled Lagrange multiplier. In the dense case, such limit results are beyond the scope of this paper; instead, we establish a separation rate that distinguishes the competing hypotheses. We conjecture that asymptotic distributions can be derived in the dense setting as well, which would further strengthen our results.

Acknowledgements

The authors thank Persi Diaconis, Vilas Winstein and Clarence Chew for helpful discussions, and Daren Wei and Huanchen Bao for their support. SG was supported in part by the Singapore MOE grants R-146-000-312-114, A-8002014-00-00, A-8003802-00-00, E-146-00-0037-01 and A-8000051-00-00. Most of this work was completed when R.N.K. was a research assistant at the National University of Singapore, supported in part by the Singapore MOE grants A-8000051-00-00, R-146-000-312-114, A-0009806-01-00 and A-0004586-00-00.

References

[1] Joshua Agterberg, Minh Tang, and Carey Priebe. Nonparametric two-sample hypothesis testing for random graphs with negative and repeated eigenvalues. arXiv preprint arXiv:2012.09828, 2020.

[2] J. Aitchison and S. D. Silvey.
Maximum-likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics, 29(3):813–828, 1958.

[3] Avanti Athreya, Donniell E. Fishkind, Minh Tang, Carey E. Priebe, Youngser Park, Joshua T. Vogelstein, Keith Levin, Vince Lyzinski, Yichen Qin, and Daniel L. Sussman. Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226):1–92, 2018.

[4] A. D. Barbour. Poisson convergence and random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 92:349–359, 1982.

[5] Danielle S. Bassett, Edward Bullmore, Beth A. Verchinski, Venkata S. Mattay, Daniel R. Weinberger, and Andreas Meyer-Lindenberg. Hierarchical organization of human cortical networks in health and schizophrenia. Journal of Neuroscience, 28(37):9239–9248, 2008.

[6] Anil K. Bera, Carlos M. Jarque, and Lung-Fei Lee. Testing the normality assumption in limited dependent variable models. International Economic Review, pages 563–578, 1984.

[7] Shankar Bhamidi, Guy Bresler, and Allan Sly. Mixing time of exponential random graphs. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 803–812. IEEE, 2008.

[8] Shankar Bhamidi, Guy Bresler, and Allan Sly. Mixing time of exponential random graphs. The Annals of Applied Probability, 21(6):2146–2170, 2011.

[9] Guy Bresler, Dheeraj Nagaraj, and Eshaan Nichani. Metastable mixing of Markov chains: Efficiently sampling low temperature exponential random graphs. The Annals of Applied Probability, 34(1A):517–554, 2024.

[10] T. S. Breusch and A. R. Pagan. A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47(5):1287–1294, 1979.

[11] T. S. Breusch and A. R. Pagan. The Lagrange multiplier test and its applications to model specification in econometrics. The Review of Economic Studies, 47(1):239–253, 1980.
[12] Barbara Brune, Jonathan Flossdorf, and Carsten Jentsch. Goodness-of-fit testing based on graph functionals for homogeneous Erdős–Rényi graphs. Scandinavian Journal of Statistics, 52(1):332–380, 2025.

[13] Sayak Chatterjee, Dibyendu Saha, Soham Dan, and Bhaswar B. Bhattacharya. Two-sample tests for inhomogeneous random graphs in $L_r$ norm: Optimality and asymptotics. In International Conference on Artificial Intelligence and Statistics, pages 6903–6911. PMLR, 2023.

[14] Sourav Chatterjee. An introduction to large deviations for random graphs. Bulletin of the American Mathematical Society, 53(4):617–642, 2016.

[15] Sourav Chatterjee and Amir Dembo. Nonlinear large deviations. Advances in Mathematics, 299:396–450, 2016.

[16] Sourav Chatterjee and Persi Diaconis. Estimating and understanding exponential random graph models. The Annals of Statistics, 41(5):2428–2461, 2013.

[17] Sourav Chatterjee and S. R. Srinivasa Varadhan. The large deviation principle for the Erdős–Rényi random graph. European Journal of Combinatorics, 32(7):1000–1017, 2011.

[18] Sanjay Chaudhuri, Subhroshekhar Ghosh, and Kim Cuc Pham. On an empirical likelihood based solution to the approximate Bayesian computation problem. Statistical Analysis and Data Mining: The ASA Data Science Journal, 17(5):e11711, 2024.

[19] Li Chen, Jie Zhou, and Lizhen Lin. Hypothesis testing for populations of networks. Communications in Statistics - Theory and Methods, 52(11):3661–3684, 2023.

[20] Nicholas A. Cook and Amir Dembo. Typical structure of sparse exponential random graph models. The Annals of Applied Probability, 34(3):2885–2939, 2024.

[21] Villõ Csiszár, Péter Hussami, János Komlós, Tamás F. Móri, Lídia Rejtõ, and Gábor Tusnády. Testing goodness of fit of random graph models. Algorithms, 5(4):629–635, 2012.

[22] Soham Dan and Bhaswar B. Bhattacharya. Goodness-of-fit tests for inhomogeneous random graphs.
In International Conference on Machine Learning, pages 2335–2344. PMLR, 2020.

[23] John M. Danskin. The theory of max-min, with applications. SIAM Journal on Applied Mathematics, 14(4):641–664, 1966.

[24] Ronen Eldan and Renan Gross. Exponential random graphs behave like mixtures of stochastic block models. The Annals of Applied Probability, 28(6):3698–3735, 2018.

[25] Andrew Elliott, Elizabeth Leicht, Alan Whitmore, Gesine Reinert, and Felix Reed-Tsochas. A nonparametric significance test for sampled networks. Bioinformatics, 34(1):64–71, 2018.

[26] Robert F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, pages 987–1007, 1982.

[27] Xiao Fang, Song-Hao Liu, Qi-Man Shao, and Yi-Kun Zhao. Normal approximation for exponential random graphs. arXiv preprint arXiv:2404.01666, 2024.

[28] Xiao Fang, Song-Hao Liu, Zhonggen Su, and Xiaolin Wang. Conditional central limit theorems for exponential random graphs, 2025.

[29] Ove Frank and David Strauss. Markov graphs. Journal of the American Statistical Association, 81(395):832–842, 1986.

[30] Shirshendu Ganguly and Kyeongsik Nam. Sub-critical exponential random graphs: concentration of measure and some applications. Transactions of the American Mathematical Society, 377(04):2261–2296, 2024.

[31] Chao Gao and John Lafferty. Testing network structure using relations between small subgraph probabilities, 2017.

[32] Subhroshekhar Ghosh, Sanjay Chaudhuri, and Ujan Gangopadhyay. Maximum likelihood estimation under constraints: Singularities and random critical points. IEEE Transactions on Information Theory, 69(12):7976–7997, 2023.

[33] Debarghya Ghoshdastidar, Maurilio Gutzeit, Alexandra Carpentier, and Ulrike von Luxburg. Two-sample tests for large random graphs using network statistics.
In Conference on Learning Theory, pages 954–977. PMLR, 2017.

[34] Debarghya Ghoshdastidar, Maurilio Gutzeit, Alexandra Carpentier, and Ulrike von Luxburg. Two-sample hypothesis testing for inhomogeneous random graphs. The Annals of Statistics, 48(4):2208–2229, 2020.

[35] Cedric E. Ginestet, Jun Li, Prakash Balachandran, Steven Rosenberg, and Eric D. Kolaczyk. Hypothesis testing for network data in functional neuroimaging. The Annals of Applied Statistics, 11(2):725–750, 2017.

[36] Leslie G. Godfrey. Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica, pages 1293–1301, 1978.

[37] Leslie G. Godfrey. Testing for higher order serial correlation in regression equations when the regressors include lagged dependent variables. Econometrica, pages 1303–1310, 1978.

[38] Paul W. Holland and Samuel Leinhardt. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33–50, 1981.

[39] E. T. Jaynes. Information theory and statistical mechanics. Physical Review, 106:620–630, May 1957.

[40] Edwin T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE, 70(9):939–952, 1982.

[41] Jiashun Jin, Zheng Tracy Ke, Jiajun Tang, and Jingming Wang. Network goodness-of-fit for the block-model family. arXiv preprint arXiv:2502.08609, 2025.

[42] E. D. Kolaczyk. Statistical Analysis of Network Data: Methods and Models. Springer Series in Statistics. Springer New York, 2009.

[43] Jing Lei. A goodness-of-fit test for stochastic block models. The Annals of Statistics, 44(1):401–424, 2016.

[44] Yin Li and Keumhee Chough Carriere. Assessing goodness of fit of exponential random graph models. International Journal of Statistics and Probability, 2(4):64, 2013.
[45] P-AG Maugis, Sofia C Olhede, Carey E Priebe, and Patrick J Wolfe. Testing for equivalence of network distribution using subgraph counts. Journal of Computational and Graphical Statistics, 29(3):455–465, 2020.
[46] Benjamin A. Miller, Lauren H. Stephens, and Nadya T. Bliss. Goodness-of-fit statistics for anomaly detection in Chung-Lu random graphs. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3265–3268, 2012.
[47] Sumit Mukherjee. Degeneracy in sparse ERGMs with functions of degrees as sufficient statistics. Bernoulli, 26(2):1016–1043, 2020.
[48] Sumit Mukherjee and Yuanzhe Xu. Statistics of the two star ERGM. Bernoulli, 29(1):24–51, 2023.
[49] Luis Ospina-Forero, Charlotte M Deane, and Gesine Reinert. Assessment of model fit via network comparison methods based on subgraph counts. Journal of Complex Networks, 7(2):226–253, 2019.
[50] Sarah Ouadah, Pierre Latouche, and Stéphane Robin. Motif-based tests for bipartite networks. Electronic Journal of Statistics, 16(1):293–330, 2022.
[51] Sarah Ouadah, Stéphane Robin, and Pierre Latouche. Degree-based goodness-of-fit tests for heterogeneous random graph models: Independent and exchangeable cases. Scandinavian Journal of Statistics, 47(1):156–181, 2020.
[52] Juyong Park and Mark EJ Newman. Solution of the two-star model of a network. Physical Review E, 70(6):066146, 2004.
[53] Juyong Park and Mark EJ Newman. Statistical mechanics of networks. Physical Review E, 70(6):066117, 2004.
[54] Juyong Park and Mark EJ Newman. Solution for the properties of a clustered network. Physical Review E, 72(2):026136, 2005.
[55] Charles Radin and Lorenzo Sadun. Phase transitions in a complex network.
Journal of Physics A: Mathematical and Theoretical, 46(30):305002, 2013.
[56] Charles Radin and Mei Yin. Phase transitions in exponential random graphs. The Annals of Applied Probability, pages 2458–2471, 2013.
[57] G Reinert and N Ross. Approximating stationary distributions of fast mixing Glauber dynamics, with applications to exponential random graphs. Annals of Applied Probability, 29(5), 2019.
[58] Cosma Rohilla Shalizi and Alessandro Rinaldo. Consistency under sampling of exponential random graph models. The Annals of Statistics, 41(2):508, 2013.
[59] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948.
[60] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12(77):2539–2561, 2011.
[61] S. D. Silvey. The Lagrangian multiplier test. The Annals of Mathematical Statistics, 30(2):389–407, 1959.
[62] Tiziano Squartini and Diego Garlaschelli. Maximum-entropy networks: Pattern detection, network reconstruction and graph combinatorics. Springer, 2017.
[63] Minh Tang, Avanti Athreya, Daniel L. Sussman, Vince Lyzinski, and Carey E. Priebe. A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli, 23(3):1599–1630, 2017.
[64] A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, Cambridge, UK, 1998.
[65] Stanley Wasserman and Katherine Faust. Social network analysis: Methods and applications. Cambridge University Press, 1994.
[66] V. Winstein. Concentration via metastable mixing, with applications to the supercritical exponential random graph model, 2025.
[67] Vilas Winstein. Quantitative central limit theorems for exponential random graphs. arXiv preprint arXiv:2507.10531, 2025.
[68] Yin Xia and Lexin Li.
Hypothesis testing for network data with power enhancement. Statistica Sinica, 32:293–321, 2022.
[69] Wenkai Xu and Gesine Reinert. A Stein goodness-of-test for exponential random graph models. In International Conference on Artificial Intelligence and Statistics, pages 415–423. PMLR, 2021.
[70] Yuanzhe Xu and Sumit Mukherjee. Signal detection in degree corrected ERGMs. Bernoulli, 30(3):1746–1773, 2024.
[71] Mei Yin. Critical phenomena in exponential random graphs. Journal of Statistical Physics, 153:1008–1021, 2013.
[72] Mei Yin and Lingjiong Zhu. Asymptotics for sparse exponential random graph models. Brazilian Journal of Probability and Statistics, pages 394–412, 2017.
[73] Bai Zhang, Huai Li, Rebecca B Riggins, Ming Zhan, Jianhua Xuan, Zhen Zhang, Eric P Hoffman, Robert Clarke, and Yue Wang. Differential dependency network analysis to identify condition-specific topological changes in biological networks. Bioinformatics, 25(4):526–532, 2009.

A Proofs

Before we discuss the proofs of the main theorems, we recall some graph-theoretic concepts from the Notations subsection. For a graph $G$ we denote its vertex set by $V(G)$ (or simply $V$) and its edge set by $E(G)$ or $E$. The numbers of vertices and edges are denoted $v(G) = |V(G)|$ and $e(G) = |E(G)|$, respectively. Further, we define the density of $G$ as $d(G) := e(G)/v(G)$, and $k(G) := \max\{d(H) \mid H \subseteq G,\ e(H) \ge 1\}$. A graph is said to be balanced if $k(G) = d(G)$, i.e., if it is its own densest subgraph; it is said to be strictly balanced if it is strictly denser than all of its proper subgraphs. Let $\mathrm{Aut}(G)$ be the group of automorphisms of the graph $G$; we will denote the number of automorphisms by $a(G) = |\mathrm{Aut}(G)|$. We will denote the Erdős–Rényi random graph on $N$ vertices with edge-connection probability $p$ by $G(N, p)$. Here, we introduce a few other graph-theoretic concepts.
This includes:

Homomorphisms and injective embeddings:
$$\hom(H,G) := \#\{\phi : V(H) \to V(G) \;:\; \{u,v\} \in E(H) \Rightarrow \{\phi(u),\phi(v)\} \in E(G)\},$$
$$\mathrm{inj}(H,G) := \#\{\phi : V(H) \to V(G) \;:\; \phi \text{ injective and edge-preserving}\}.$$

Labeled and unlabeled copy counts (of the motif $H$ in $G$):
$$(\text{labeled copies}) = \mathrm{inj}(H,G), \qquad (\text{unlabeled copies}) = \frac{\mathrm{inj}(H,G)}{|\mathrm{Aut}(H)|}.$$
We will denote the number of unlabeled copies of $H$ in $G$ by $T(H,G)$ or $H(G)$ interchangeably. Now let $N = |V(G)|$ and $k = |V(H)|$. With these notations, we are ready to define the following notions of density:
$$t(H,G) := \frac{\hom(H,G)}{N^k}, \qquad t_{\mathrm{inj}}(H,G) := \frac{\mathrm{inj}(H,G)}{(N)_k},$$
where $(N)_k := N(N-1)\cdots(N-k+1)$. Every homomorphism is either injective or has a collision (two distinct vertices of $H$ mapped to the same vertex of $G$). The number of collision patterns depends only on $H$, and for each pattern the number of maps is $O(N^{k-1})$. Hence
$$\hom(H,G) = \mathrm{inj}(H,G) + O_H(N^{k-1}), \quad \text{so} \quad \frac{\hom(H,G)}{N^k} = \frac{\mathrm{inj}(H,G)}{N^k} + O_H\!\left(\frac{1}{N}\right).$$
Finally, $(N)_k = N^k\,(1 + O_k(1/N))$, so
$$\frac{\mathrm{inj}(H,G)}{N^k} - \frac{\mathrm{inj}(H,G)}{(N)_k} = \mathrm{inj}(H,G)\left(\frac{1}{N^k} - \frac{1}{(N)_k}\right) = O_H\!\left(\frac{1}{N}\right),$$
which yields $t(H,G) - t_{\mathrm{inj}}(H,G) = O_H(1/N)$. For later use, we will also record the following equivalent version of this statement:
$$\frac{t(H,G)}{|\mathrm{Aut}(H)|} = \frac{H(G)}{(N)_k} + O_H(1/N).$$
To apply the large deviation theory it is important to view $t(H,G)$ as a function on $[0,1]^{\binom{N}{2}}$. With that in mind, for every function $f$ of $G$, we will also consider its extension to $[0,1]^{\binom{N}{2}}$ by defining $f(x)$ to be the expectation of $f(G)$ where $G \sim G(N,x)$ (every edge $e$ is sampled independently of the others with probability $x_e$). For instance, $t(H,x) := \mathbb{E}_{G \sim G(N,x)}[t(H,G)]$.
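For small motifs, the counting quantities above can be checked directly by brute force. The following is a minimal sketch (the function names are ours, not from the paper) computing $\hom(H,G)$, $\mathrm{inj}(H,G)$, the densities $t(H,G)$ and $t_{\mathrm{inj}}(H,G)$, and the unlabeled copy count, for $H$ a triangle (so $|\mathrm{Aut}(H)| = 6$) and $G = K_4$:

```python
from itertools import product, permutations

def hom_count(H_edges, k, G_adj, N):
    """hom(H, G): maps V(H) -> V(G) sending every edge of H to an edge of G."""
    return sum(
        all(frozenset((phi[u], phi[v])) in G_adj for u, v in H_edges)
        for phi in product(range(N), repeat=k)
    )

def inj_count(H_edges, k, G_adj, N):
    """inj(H, G): injective edge-preserving maps (labeled copies of H in G)."""
    return sum(
        all(frozenset((phi[u], phi[v])) in G_adj for u, v in H_edges)
        for phi in permutations(range(N), k)
    )

# H = triangle (|Aut(H)| = 6); G = K4 represented by its edge set.
tri_edges = [(0, 1), (1, 2), (0, 2)]
k4_adj = {frozenset((i, j)) for i in range(4) for j in range(i + 1, 4)}

t_hom = hom_count(tri_edges, 3, k4_adj, 4) / 4**3         # t(H, G)
t_inj = inj_count(tri_edges, 3, k4_adj, 4) / (4 * 3 * 2)  # t_inj(H, G), (N)_k = 24
triangles = inj_count(tri_edges, 3, k4_adj, 4) // 6       # unlabeled copies T(H, G)
```

Since every homomorphism of a triangle into the loopless graph $K_4$ is automatically injective, here $\hom = \mathrm{inj} = 24$, and the unlabeled count recovers the four triangles of $K_4$.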
Note that this generalizes the notation $t(H,G)$: every graph $G$ on $m$ vertices can be represented as an element of $[0,1]^{\binom{m}{2}}$ by fixing a labelling of the edges of $K_m$ by $\{1, \ldots, \binom{m}{2}\}$ and defining the adjacency vector $x \in [0,1]^{\binom{m}{2}}$ by $x_i = 1$ if the edge labelled $i$ is present in $G$ (and $0$ otherwise), where $G$ is viewed as a subgraph of $K_m$. Then $t(H,G) = t(H,x)$.

Proof of Theorem 3.1. Recall that $\hat\lambda_n$ is the unique zero of
$$\sum_a a e^{-\lambda a}\, \frac{|\{i \mid h_i = a\}|}{n},$$
where the sum ranges over all possible values of the statistic $h$ (the number of such values is finite in this case, since $N$ is fixed). By the strong law of large numbers, the above display converges, for each $\lambda$, to $\sum_a a e^{-\lambda a}\, \mathbb{P}_G(h = a)$ almost surely. Observe that for any compact subset $K \subseteq [0,\infty)$, the collection of functions $\{a \mapsto a e^{-\lambda a}\}_{\lambda \in K}$ is trivially a pointwise compact class and hence $\mathbb{P}_G(h = \cdot)$-Glivenko–Cantelli ([64], Ch. 19). Hence,
$$\sup_{\lambda \in K} \left| \sum_a a e^{-\lambda a}\, \frac{|\{i \mid h_i = a\}|}{n} - \sum_a a e^{-\lambda a}\, \mathbb{P}_G(h = a) \right| \to 0 \quad \text{almost surely}.$$
Now $[0,\infty) = \cup_{M=1}^\infty [0,M]$ is a countable union. Hence,
$$\sup_{\lambda \in [0,M)} \left| \sum_a a e^{-\lambda a}\, \frac{|\{i \mid h_i = a\}|}{n} - \sum_a a e^{-\lambda a}\, \mathbb{P}_G(h = a) \right| \to 0 \quad \text{for every } M \in \mathbb{N}, \text{ almost surely}.$$
In other words, the convergence is uniform over compacts, almost surely. We note that a root of the equation $\sum_a a e^{-\lambda a}\, \mathbb{P}_G(h = a) = 0$ exists and is unique by Lemmas D.2 and D.1. Then, by Hurwitz's theorem, $\hat\lambda_n \to \lambda^\circ$ almost surely, where $\lambda^\circ$ is the unique zero of $\sum_a a e^{-\lambda a}\, \mathbb{P}_G(h = a)$.

Proof of Theorem 3.2. The proof follows the standard technique for proving a CLT for Z-estimators. In particular, we perform a Taylor expansion of the function $\frac{1}{n}\sum_{i=1}^n h_{n,i} e^{-\lambda h_{n,i}}$ around $\lambda^\circ$ and use the facts that $\hat\lambda_n$ is a root of this function and that $\hat\lambda_n$ converges to $\lambda^\circ$ almost surely. The steps of the proof are identical to those of Theorem 3.5 (the bounds are in fact much simpler for a finite graph), and hence we omit the details.
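In practice, the estimator $\hat\lambda_n$ of Theorem 3.1 can be computed by bisection, since $\lambda \mapsto \frac{1}{n}\sum_i h_i e^{-\lambda h_i}$ is strictly decreasing (its derivative is $-\frac{1}{n}\sum_i h_i^2 e^{-\lambda h_i} < 0$) and changes sign whenever both positive and negative values of $h_i$ occur. A minimal sketch with hypothetical data (the sample `h` and the helper names are ours, not from the paper):

```python
import math

def psi(lam, h):
    """Empirical estimating function: (1/n) * sum_i h_i * exp(-lam * h_i)."""
    return sum(x * math.exp(-lam * x) for x in h) / len(h)

def solve_lambda(h, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection for the unique zero of psi(., h); psi is strictly decreasing."""
    assert psi(lo, h) > 0 > psi(hi, h), "root not bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, h) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical sample taking values +1 and -1 with counts 4 and 1:
# psi(lam) = (4 e^{-lam} - e^{lam}) / 5, whose root is lam = (1/2) log 4 = log 2.
h = [1, 1, 1, 1, -1]
lam_hat = solve_lambda(h)
```

The two-valued sample is chosen so that the root is available in closed form, which makes the numerical solver easy to sanity-check.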
Proof of Theorem 3.3. Let $0 < \epsilon < \lambda^\circ/2$. Since $\lambda^\circ$ is a root of the equation
$$\mathbb{E}\left[(Z_\mu - h_0)\, e^{-\lambda(Z_\mu - h_0)}\right] = 0,$$
it follows that $\mathbb{E}[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}] > 0$ (resp. $< 0$) at $\lambda = \lambda^\circ - \epsilon$ (resp. $\lambda^\circ + \epsilon$). Therefore, it suffices to show that for every fixed $\lambda > 0$ we have
$$\mathbb{E}[(H_m - h_0) e^{-\lambda(H_m - h_0)}] = \mathbb{E}[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}]\,(1 \pm o(1)). \quad (A.1)$$
Indeed, if (A.1) holds, the two functions are of the same sign for large enough $m$, implying that the left-hand side is positive at $\lambda^\circ - \epsilon$ and negative at $\lambda^\circ + \epsilon$. This implies $|\lambda_m - \lambda^\circ| < \epsilon$. To show (A.1), first note that for $\lambda > 0$ the function $x \mapsto (x - h_0) e^{-\lambda(x - h_0)}$ is bounded. Hence, by Poisson convergence of the $H$-counts in the regime $m p^{k(H)} \to c > 0$, we have
$$\mathbb{E}[(H_m - h_0) e^{-\lambda(H_m - h_0)}] = \mathbb{E}[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}] \pm o(1).$$
Noting that in this regime we also have $\mathbb{E}[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}] = \Theta(1)$ for fixed $\lambda$, (A.1) holds.

Proof of Theorem 3.4. It is straightforward to see that, for all $\lambda > 0$, $\mathbb{E}(h_{n,1} e^{-\lambda h_{n,1}})^4$ is uniformly bounded in $n$. With this property, we get a strong law of large numbers for the triangular array $\{h_{n,i}\}$:
$$\frac{\sum_{i=1}^n h_{n,i} e^{-\lambda h_{n,i}}}{n} - \mathbb{E}\left[h_{n,1} e^{-\lambda h_{n,1}}\right] \xrightarrow{a.s.} 0.$$
Recall that $\mu = c^{v(H)}/|\mathrm{Aut}(H)|$. By repeating the same calculation as in Theorem 3.3 with $m$ replaced by $m_n$ everywhere,
$$\mathbb{E}\left[h_{n,1} e^{-\lambda h_{n,1}}\right] - \mathbb{E}\left[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}\right] \to 0 \implies \frac{\sum_{i=1}^n h_{n,i} e^{-\lambda h_{n,i}}}{n} - \mathbb{E}\left[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}\right] \to 0.$$
Now if $K \subset [0,\infty)$ is compact, then $\{a \mapsto a e^{-\lambda a} : \lambda \in K\}$ has finite bracketing number with brackets of the same form. Hence
$$\sup_{\lambda \in K} \left| \frac{1}{n} \sum_{i=1}^n h_{n,i} e^{-\lambda h_{n,i}} - \mathbb{E}\left[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}\right] \right| \to 0 \quad \text{a.s.}$$
Since $[0,\infty)$ is a countable union of compacts, we have
$$\frac{1}{n} \sum_{i=1}^n h_{n,i} e^{-\lambda h_{n,i}} \to \mathbb{E}\left[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}\right]$$
uniformly on compacts, almost surely.
This implies $\hat\lambda_n \to \lambda^\circ$ almost surely, and hence in probability.

Proof of Lemma 3.1. Pick a $\delta > 0$ (any $\delta$ will do, but one can choose $\delta = 2$ for concreteness). We verify the Lindeberg condition: it suffices to show that
$$\sum_{i=1}^n \mathbb{E}\left| \frac{h_{n,i} e^{-\lambda h_{n,i}} - \mathbb{E}[h_{n,i} e^{-\lambda h_{n,i}}]}{\sqrt{n\, \mathrm{Var}(h_{n,i} e^{-\lambda h_{n,i}})}} \right|^{2+\delta} = \frac{1}{n^{\delta/2}} \times \mathbb{E}\left| \frac{h_{n,1} e^{-\lambda h_{n,1}} - \mathbb{E}[h_{n,1} e^{-\lambda h_{n,1}}]}{\sqrt{\mathrm{Var}(h_{n,1} e^{-\lambda h_{n,1}})}} \right|^{2+\delta} \to 0.$$
Indeed, for every $\lambda > 0$, the function $x \mapsto x e^{-\lambda x}$ is bounded on the interval $[-h_0, \infty)$, and hence the expectation converges, by the Poisson convergence of $h_{n,1}$ in total variation norm. Hence the right-hand side goes to $0$.

Proof of Theorem 3.5. We will modify the standard proof of asymptotic normality of Z-estimators to suit our needs. To that end, define
$$\Psi_n(\lambda) = \frac{1}{n} \sum_{i=1}^n h_{n,i} e^{-\lambda h_{n,i}}, \qquad \Psi(\lambda) = \mathbb{E}\left[(Z_\mu - h_0) e^{-\lambda(Z_\mu - h_0)}\right].$$
Now, noting that $\hat\lambda_n$ is a zero of $\Psi_n(\cdot)$ and converges in probability to $\lambda^\circ = \log(\mu/h_0)$, we expand $\Psi_n(\hat\lambda_n)$ in a Taylor series about $\lambda^\circ$. Then
$$0 = \Psi_n(\hat\lambda_n) = \Psi_n(\lambda^\circ) + (\hat\lambda_n - \lambda^\circ)\, \dot\Psi_n(\lambda^\circ) + \tfrac{1}{2} (\hat\lambda_n - \lambda^\circ)^2\, \ddot\Psi_n(\tilde\lambda_n),$$
where $\tilde\lambda_n$ is a point between $\hat\lambda_n$ and $\lambda^\circ$. This can be rewritten as
$$\sqrt{n}(\hat\lambda_n - \lambda^\circ) = \frac{-\sqrt{n}\,(\Psi_n(\lambda^\circ) - \mathbb{E}[\Psi_n(\lambda^\circ)])}{\dot\Psi_n(\lambda^\circ) + \tfrac{1}{2}(\hat\lambda_n - \lambda^\circ)\ddot\Psi_n(\tilde\lambda_n)} + \frac{-\sqrt{n}\,\mathbb{E}[\Psi_n(\lambda^\circ)]}{\dot\Psi_n(\lambda^\circ) + \tfrac{1}{2}(\hat\lambda_n - \lambda^\circ)\ddot\Psi_n(\tilde\lambda_n)}. \quad (A.2)$$
By arguments similar to those in Theorem 3.4 (in particular, the property that for every positive $\lambda$ the fourth moment is bounded uniformly in $n$), the following hold almost surely:
$$\dot\Psi_n(\lambda^\circ) \to \mathbb{E}\left[-(Z_\mu - h_0)^2 e^{-\lambda^\circ(Z_\mu - h_0)}\right], \qquad \ddot\Psi_n(\lambda^\circ) \to \mathbb{E}\left[(Z_\mu - h_0)^3 e^{-\lambda^\circ(Z_\mu - h_0)}\right].$$
Then the consistency of $\hat\lambda_n$ ensures that the denominators in (A.2) converge almost surely to $\mathbb{E}\left[-(Z_\mu - h_0)^2 e^{-\lambda^\circ(Z_\mu - h_0)}\right]$.
For the numerators, observe that, arguing as we did for the expectation, the variance of $h_{n,1} e^{-\lambda^\circ h_{n,1}}$ converges to that of $(Z_\mu - h_0) e^{-\lambda^\circ(Z_\mu - h_0)}$. Hence the first numerator converges, by Lemma 3.1, to
$$N\!\left(0,\, \mathrm{Var}\left[(Z_\mu - h_0) e^{-\lambda^\circ(Z_\mu - h_0)}\right]\right).$$
For the second numerator, observe that
$$\mathbb{E}[\Psi_n(\lambda^\circ)] = \mathbb{E}[(Z_\mu - h_0) e^{-\lambda^\circ(Z_\mu - h_0)}] + O\!\left(C(\lambda^\circ)\, m_n^{v(H)-2}\, p_n^{e(H)-1}\right)$$
with $p_n = (c/m_n)^{1/k(H)}$ and some constant $C(\lambda^\circ)$. This follows from the standard bound on the total variation distance between the $H$-counts and a Poisson random variable with the same mean. Now the first term is $0$ by definition, and the error is $o(c(\lambda^\circ)/\sqrt{n})$ (for some constant $c(\lambda^\circ)$) by assumption. Hence the second numerator goes to $0$ in the limit. Combining the above using Slutsky's theorem, we get
$$\sqrt{n}(\hat\lambda_n - \lambda^\circ) \xrightarrow{d} N\!\left(0,\, \frac{\mathrm{Var}\left[(Z_\mu - h_0) e^{-\lambda^\circ(Z_\mu - h_0)}\right]}{\mathbb{E}\left[(Z_\mu - h_0)^2 e^{-\lambda^\circ(Z_\mu - h_0)}\right]^2}\right).$$

Proof of Theorem 3.6. The proof hinges on Theorem 3.5. However, since the sample sizes are different, we cannot apply that theorem directly. Instead we apply the following lemma, whose proof is given in the appendix.

Lemma A.1. Let $\hat\theta_1$ and $\hat\theta_2$ be estimators of real parameters $\theta_1$ and $\theta_2$ based on two independent samples of sizes $n_1$ and $n_2$, respectively. Suppose there exist scaling sequences $a_{n_1} \to \infty$, $b_{n_2} \to \infty$ and finite positive constants $\sigma_1^2, \sigma_2^2$ such that
$$a_{n_1}(\hat\theta_1 - \theta_1) \xrightarrow{d} N(0, \sigma_1^2), \qquad b_{n_2}(\hat\theta_2 - \theta_2) \xrightarrow{d} N(0, \sigma_2^2).$$
Assume that the rate ratio satisfies $r_n := a_{n_1}/b_{n_2} \to \rho \in (0, \infty)$ and define the following statistic:
$$T_n := \frac{(\hat\theta_1 - \hat\theta_2) - (\theta_1 - \theta_2)}{\sqrt{V_n^*}}, \qquad V_n^* := \frac{\sigma_1^2}{a_{n_1}^2} + \frac{\sigma_2^2}{b_{n_2}^2}.$$
Then it follows that $T_n \xrightarrow{d} N(0, 1)$.
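Lemma A.1 translates directly into a z-type two-sample test: standardize the difference of the two estimates by $\sqrt{V_n^*}$ and compare with normal quantiles. A minimal sketch with hypothetical inputs (in practice the variances would be plug-in estimates of the $\sigma_i^2$):

```python
import math

def two_sample_stat(theta1_hat, theta2_hat, sigma1_sq, sigma2_sq, n1, n2):
    """T_n from Lemma A.1 with a_{n1} = sqrt(n1), b_{n2} = sqrt(n2), evaluated
    under H0: theta1 = theta2, so V*_n = sigma1^2/n1 + sigma2^2/n2."""
    v_star = sigma1_sq / n1 + sigma2_sq / n2
    return (theta1_hat - theta2_hat) / math.sqrt(v_star)

def reject_5pct(t_n):
    """Two-sided comparison against the N(0, 1) limit at level 5%."""
    return abs(t_n) > 1.96

# Hypothetical numbers: two estimates with unit asymptotic variances.
t = two_sample_stat(0.40, 0.10, 1.0, 1.0, 200, 200)
```

With these inputs $V_n^* = 1/200 + 1/200 = 0.01$, so $T_n = 0.30/0.1 = 3.0$ and the null would be rejected at the 5% level.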
The proof of the theorem now follows by setting $a_{n_1} = \sqrt{n_1}$, $b_{n_2} = \sqrt{n_2}$ and observing that $\sqrt{n_i}(\hat\lambda_{n_i} - \lambda_i^\circ) \xrightarrow{d} N(0, \sigma_i^2)$ with
$$\sigma_i^2 = \frac{\mathrm{Var}\left[(Z_{\mu_i} - h_0) e^{-\lambda_i^\circ(Z_{\mu_i} - h_0)}\right]}{\mathbb{E}\left[(Z_{\mu_i} - h_0)^2 e^{-\lambda_i^\circ(Z_{\mu_i} - h_0)}\right]^2}$$
for $i \in \{1, 2\}$.

Proof of Theorem 3.7. The core of the argument relies on the large deviation principle for general ERGMs developed in [16]. The expectation $\mathbb{E}[e^{-\frac{\lambda}{m^{v(H)-2}}(H_m - h_0)}]$ can be viewed as the partition function of a new ERGM whose Hamiltonian is the sum of the original ERGM Hamiltonian and a perturbation term related to the $H$-count:
$$\mathbb{E}\left[e^{-\frac{\lambda}{m^{v(H)-2}}(H_m - h_0)}\right] = \frac{1}{Z_m(\beta)} \sum_G e^{-\frac{\lambda}{m^{v(H)-2}}(H_m(G) - h_0) + m^2 \sum_k \beta_k t(T_k, G)}$$
$$= e^{\frac{\lambda h_0}{m^{v(H)-2}}} \times \frac{\sum_G e^{-\frac{\lambda}{m^{v(H)-2}} H_m(G) + m^2 \sum_k \beta_k t(T_k, G)}}{\sum_G e^{m^2 \sum_k \beta_k t(T_k, G)}}$$
$$= e^{\frac{\lambda h_0}{m^{v(H)-2}}} \times \frac{\sum_G \exp\left\{m^2\left(-\lambda\, t(H,G)/|\mathrm{Aut}(H)| + O(\lambda/m) + \sum_k \beta_k t(T_k, G)\right)\right\}}{\sum_G \exp\left\{m^2 \sum_k \beta_k t(T_k, G)\right\}}.$$
From Theorem 3.1 of [16], the normalized log-partition function of an ERGM converges to the supremum of a free energy functional over the space of graphons. Applying this result to our perturbed model, we get
$$\frac{1}{m^2} \ln \mathbb{E}_\beta\left[e^{-\frac{\lambda}{m^{v(H)-2}}(H_m - h_0)}\right] \to \frac{\lambda h_0}{m^{v(H)}} + \sup_{\tilde h \in \widetilde{W}} \left( -\frac{\lambda}{|\mathrm{Aut}(H)|}\, t(H, h) + \sum_{k=1}^K \beta_k t(T_k, h) - \frac{1}{2} I(h) \right) - \sup_{\tilde h \in \widetilde{W}} \left( \sum_{k=1}^K \beta_k t(T_k, h) - \frac{1}{2} I(h) \right) =: g(\lambda) \quad (A.3)$$
for all $\lambda \in \mathbb{R}$. Since $x \mapsto \ln(x)$ is strictly increasing, we need to show the convergence of critical points in the display above. The functions in the sequence above are all strictly convex in $\lambda$, by direct differentiation. The limiting function is convex, being the sum of an affine part and a supremum of convex functions. Now the convergence of critical points will follow once we show that the limiting function is strictly convex.
This is equivalent to showing that the function
$$G(\lambda) := \sup_{\tilde h \in \widetilde{W}} \left( -\frac{\lambda}{|\mathrm{Aut}(H)|}\, t(H, h) + \sum_{k=1}^K \beta_k t(T_k, h) - \frac{1}{2} I(h) \right)$$
is strictly convex in $\lambda$. Suppose, for the sake of contradiction, that $G(\lambda)$ is not strictly convex. Then there exist $\lambda_1 < \lambda_2$ and $\mu \in (0,1)$ such that
$$\mu G(\lambda_1) + (1 - \mu) G(\lambda_2) = G(\mu \lambda_1 + (1 - \mu) \lambda_2).$$
This implies that there must exist a single graphon $\tilde w^* \in \widetilde{W}$ that simultaneously achieves the supremum for both $\lambda_1$ and $\lambda_2$. Let $w^*$ be any representative of the class $\tilde w^*$. According to the Euler–Lagrange equations derived in Theorem 6.3 of [16], any maximizing graphon for the ERGM with Hamiltonian defined by statistics $H, T_1, \ldots, T_K$ and parameters $\left(-\frac{\lambda}{|\mathrm{Aut}(H)|}, \beta_1, \ldots, \beta_K\right)$ must satisfy
$$w^*(x,y) = \frac{\exp\left(2\left[-\frac{\lambda}{|\mathrm{Aut}(H)|} \Delta_H w^*(x,y) + \sum_{k=1}^K \beta_k \Delta_{T_k} w^*(x,y)\right]\right)}{1 + \exp\left(2\left[-\frac{\lambda}{|\mathrm{Aut}(H)|} \Delta_H w^*(x,y) + \sum_{k=1}^K \beta_k \Delta_{T_k} w^*(x,y)\right]\right)},$$
where $\Delta_F w(x,y)$ is the operator giving the change in density of a graph $F$ when an edge is added at $(x,y)$:
$$\Delta_F w(x,y) := \sum_{(r,s) \in E(F)} \int_{[0,1]^{|V(F) \setminus \{r,s\}|}} \prod_{\substack{(r',s') \in E(F) \\ (r',s') \neq (r,s)}} w(z_{r'}, z_{s'}) \prod_{\substack{v \in V(F) \\ v \neq r,s}} dz_v.$$
This equation must hold for our fixed $w^*$ for all $\lambda \in [\lambda_1, \lambda_2]$. However, since $H$ is a non-empty graph, the term $\Delta_H w^*(x,y)$ is not identically zero (as established in [16]). This means we can rearrange the equation to solve for $\lambda$:
$$\lambda = \frac{|\mathrm{Aut}(H)|}{\Delta_H w^*(x,y)} \left( \sum_{k=1}^K \beta_k \Delta_{T_k} w^*(x,y) - \frac{1}{2}\, \mathrm{logit}(w^*(x,y)) \right).$$
Since the right-hand side is a fixed value determined by $w^*$ and the $\beta_k$, $\lambda$ must be uniquely determined. This contradicts the assumption that the same $w^*$ is a maximizer for two distinct values of $\lambda$. Therefore, the function $G(\lambda)$ must be strictly convex.

Sketch of proof of Theorem 3.8.
Our goal is to show convergence in probability of the empirical critical point $\hat\lambda_n$ to the theoretical one $\lambda^\circ$ in the dense ERGM regime. The proof hinges on the uniform convergence of the normalized log-partition function
$$\frac{1}{m_n^2} \log\left( \frac{1}{n} \sum_{i=1}^n e^{-\frac{\lambda}{m_n^{v(H)-2}} h_{n,i}} \right)$$
to its limit, which, from Theorem 3.7, is
$$\frac{\lambda h_0}{m_n^{v(H)}} + \sup_{\tilde h \in \widetilde{W}} \left( -\frac{\lambda}{|\mathrm{Aut}(H)|}\, t(H, h) + \sum_{k=1}^K \beta_k t(T_k, h) - \frac{1}{2} I(h) \right) - \sup_{\tilde h \in \widetilde{W}} \left( \sum_{k=1}^K \beta_k t(T_k, h) - \frac{1}{2} I(h) \right).$$
The core idea is to apply the log-sum-exp approximation framework from [15]. We rewrite the empirical objective function as a log-partition function over the space of all graphs on $m_n$ vertices:
$$\frac{1}{m_n^2} \log\left( \sum_G \frac{N(G)}{n}\, e^{-\frac{\lambda}{m_n^{v(H)-2}}(H(G) - h_0)} \right) = \frac{\lambda h_0}{m_n^{v(H)}} + \frac{1}{m_n^2} \log\left( \sum_{x \in \{0,1\}^{\binom{m_n}{2}}} e^{f(x)} \right),$$
where $x$ is the adjacency vector of a graph, $N(G)$ is its count in the sample, and the energy function is
$$f(x) = \log\frac{N(x)}{n} - \frac{\lambda}{m_n^{v(H)-2}} H(x).$$
We decompose this energy function into two parts:
$$f(x) = f_H(x) + \log\frac{N(x)}{n}, \qquad f_H(x) = -\frac{\lambda}{m_n^{v(H)-2}} H(x).$$

1. H-count part control: The term $f_H(x) = -\frac{\lambda}{m_n^{v(H)-2}} H(x)$ is the standard $H$-count interaction from our maximum entropy problem. As shown in [15], the function and its derivatives are well-behaved after appropriate scaling:
$$\|f_H\|_\infty = O(m_n^2 |\lambda|), \qquad \left\|\frac{\partial f_H}{\partial x_e}\right\|_\infty = O(|\lambda|), \qquad \left\|\frac{\partial^2 f_H}{\partial x_e \partial x_{e'}}\right\|_\infty = O\!\left(\frac{|\lambda|}{m_n}\right).$$

2. ERGM part control: The term $\log\frac{N(x)}{n}$ depends on the empirical sample drawn from the ERGM $P_\beta$. The key insight is that for a sufficiently large number of samples $n$ (specifically, when $\binom{m_n}{2} = o(\log n)$), the empirical distribution of graphs will be very close to the true probability distribution $P_\beta$.
Therefore, with high probability, $\log\frac{N(x)}{n}$ behaves like the log-likelihood of the ERGM itself:
$$\log\frac{N(x)}{n} \approx m_n^2 \sum_{k=1}^K \beta_k t(T_k, x) - \log Z_{m_n}(\beta),$$
where $t(T_k, x)$ is the homomorphism density of the $k$-th sufficient statistic. The term $\log Z_{m_n}(\beta)$ is a constant with respect to $x$ and does not affect the derivatives. The Hamiltonian part, $m_n^2 \sum_k \beta_k t(T_k, x)$, and its derivatives are well-controlled, so that we have
$$\|f_{ERGM}\|_\infty = O(m_n^2), \qquad \left\|\frac{\partial f_{ERGM}}{\partial x_e}\right\|_\infty = O(1), \qquad \left\|\frac{\partial^2 f_{ERGM}}{\partial x_e \partial x_{e'}}\right\|_\infty = O\!\left(\frac{1}{m_n}\right),$$
where $f_{ERGM} = m_n^2 \sum_{k=1}^K \beta_k t(T_k, x) - \log Z_{m_n}(\beta)$.

3. Application of the general result: The total energy function is $f(x) = f_H(x) + \log\frac{N(x)}{n}$. Combining the bounds, the overall energy function satisfies $\|f\|_\infty = O_P(m_n^2)$, $\|\partial f/\partial x_e\|_\infty = O_P(1)$ and $\|\partial^2 f/(\partial x_e \partial x_{e'})\|_\infty = O_P(1/m_n)$. These are precisely the conditions required by the general log-sum-exp approximation theorem (Theorem 1.6 in [15]). The theorem shows that the error terms in the approximation are negligible after normalization by $m_n^2$. This establishes the uniform convergence of the empirical objective function to the desired variational formula $g(\lambda)$. Since $g(\lambda)$ was shown to be strictly convex in the proof of Theorem 3.7, the convergence of the empirical critical point $\hat\lambda_n$ to the unique critical point $\lambda^\circ$ of $g(\lambda)$ is guaranteed.

Proof of Theorem 3.9.
We begin by performing a Taylor expansion of the empirical critical point equation about $\lambda^\circ$:
$$0 = \sum_{i=1}^n h_{n,i}\, e^{-\frac{\lambda^\circ}{m_n^{v(H)-2}} h_{n,i}} + (\hat\lambda_n - \lambda^\circ) \sum_{i=1}^n \left(-\frac{h_{n,i}^2}{m_n^{v(H)-2}}\right) e^{-\frac{\tilde\lambda_n}{m_n^{v(H)-2}} h_{n,i}}$$
$$\implies m_n^2 (\hat\lambda_n - \lambda^\circ) = \frac{\sum_{i=1}^n \frac{h_{n,i}}{m_n^{v(H)}}\, e^{-\frac{\lambda^\circ}{m_n^{v(H)-2}} h_{n,i}}}{\sum_{i=1}^n \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\frac{\tilde\lambda_n}{m_n^{v(H)-2}} h_{n,i}}}.$$
For $\lambda^\circ = 0$, we get
$$m_n^2 \hat\lambda_n = \frac{\frac{1}{n}\sum_{i=1}^n \frac{h_{n,i}}{m_n^{v(H)}}}{\frac{1}{n}\sum_{i=1}^n \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\frac{\tilde\lambda_n}{m_n^{v(H)-2}} h_{n,i}}} = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{h_{n,i}}{m_n^{v(H)-1}}}{\sqrt{n}\, m_n \cdot \frac{1}{n}\sum_{i=1}^n \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\frac{\tilde\lambda_n}{m_n^{v(H)-2}} h_{n,i}}}. \quad (A.4)$$
Define
$$Z_n := \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{h_{n,i}}{m_n^{v(H)-1}}.$$
We show that
$$\frac{m_n}{\sqrt[4]{n}}\, Z_n \xrightarrow{P} 0. \quad (A.5)$$
It suffices to show that the summands $h_{n,i}/m_n^{v(H)-1}$ are subgaussian with mean zero and a variance proxy bounded by a constant independent of $n$; equivalently, that there exists $\sigma^2 > 0$ such that for all $s \in \mathbb{R}$,
$$\mathbb{E}\left[e^{s h_{n,i}/m_n^{v(H)-1}}\right] \le e^{\sigma^2 s^2/2},$$
uniformly over $i, n$. This ensures that $Z_n$ itself is subgaussian with a variance proxy not depending on $n$, hence tight. Consequently, $\frac{m_n}{\sqrt[4]{n}} Z_n \xrightarrow{P} 0$. To this end, we apply Theorem 1 of [30] in the subcritical regime of the ferromagnetic ERGM. Recall from earlier that the partial derivatives of $H(x)/m_n^{v(H)-2}$ are bounded in the $\ell^\infty$ norm by a constant. Hence the Lipschitz vector of $H(x)$ is bounded in $\ell^\infty$ norm by $m_n^{v(H)-2}$. By Theorem 1 of [30], we get the bound
$$\mathbb{P}\left(|H(x) - \mathbb{E}[H(x)]| \ge t\right) \le 2 \exp\left( \frac{-c\, t^2}{\binom{m_n}{2}\, m_n^{v(H)-2} \cdot m_n^{v(H)-2}} \right)$$
for all $t \in \mathbb{R}$. Replacing $t$ by $m_n^{v(H)-1} t$, we get
$$\mathbb{P}\left( \left| \frac{H(x)}{m_n^{v(H)-1}} - \mathbb{E}\left[\frac{H(x)}{m_n^{v(H)-1}}\right] \right| \ge t \right) \le 2 e^{-c t^2},$$
which shows the constant variance proxy. In the next step we show that
$$V_n := \sqrt[4]{n} \times m_n \times \frac{1}{n} \sum_{i=1}^n \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\frac{\tilde\lambda_n}{m_n^{v(H)-2}} h_{n,i}} = \Omega_P\!\left(\frac{1}{m_n}\right). \quad (A.6)$$
We will need the empirical mean and second moment.
$$\overline{h}_n := \frac{1}{n} \sum_{i=1}^n \frac{h_{n,i}}{m_n^{v(H)}}, \qquad \overline{h^2}_n := \frac{1}{n} \sum_{i=1}^n \frac{h_{n,i}^2}{m_n^{2v(H)}}.$$
We will also use the empirical variance $\widehat{\mathrm{Var}}(h_n) := \overline{h^2}_n - \overline{h}_n^2$. For any $\delta > 0$, we have the following chain of inequalities:
$$\frac{1}{n}\sum_i \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\frac{\tilde\lambda_n}{m_n^{v(H)-2}} h_{n,i}} \ge e^{-\tilde\lambda_n m_n^2\left(\overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)}\right)}\, \delta^2\, \frac{\#\left\{i : \frac{h_{n,i}}{m_n^{v(H)}} \ge \delta,\ \frac{h_{n,i}}{m_n^{v(H)}} \ge \overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)}\right\}}{n}$$
$$\ge e^{-\tilde\lambda_n m_n^2\left(\overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)}\right)}\, \delta^2 \left(1 - \frac{\#\left\{i : \frac{h_{n,i}}{m_n^{v(H)}} < \delta\right\}}{n} - \frac{\#\left\{i : \frac{h_{n,i}}{m_n^{v(H)}} < \overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)}\right\}}{n}\right)$$
$$\ge e^{-\tilde\lambda_n m_n^2\left(\overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)}\right)}\, \delta^2 \left(1 - \left(\mathbb{P}\left(\frac{h_{n,i}}{m_n^{v(H)}} < \delta\right) + o_p(1) + \frac{1}{t^2}\right)\right)$$
$$= e^{-\tilde\lambda_n m_n^2\left(\overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)}\right)}\, \delta^2 \left(\mathbb{P}\left(\frac{h_{n,i}}{m_n^{v(H)}} \ge \delta\right) - \frac{1}{t^2} - o_p(1)\right),$$
where the first inequality follows since $\tilde\lambda_n < 0$, and the fraction of indices with $h_{n,i}/m_n^{v(H)} < \overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)}$ is bounded by $1/t^2$ via Chebyshev's inequality for the empirical distribution. Now, from the central limit theorem for subgraph counts in the subcritical regime of a ferromagnetic ERGM (Corollary 1.2 of [67]), we get that
$$\mathbb{P}\left(\frac{h_{n,i}}{m_n^{v(H)}} \ge \frac{1}{m_n}\right) \to \mathbb{P}(|N(0,c)| \ge 1).$$
The limit is positive; denote it by $l$. Choose $t > 1/\sqrt{l}$. Then for $\delta = 1/m_n$, the right-hand side is lower bounded, in probability, by
$$e^{-\tilde\lambda_n m_n^2\left(\overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)}\right)}\, \frac{1}{m_n^2}\left(l - \frac{1}{t^2} - o_p(1)\right).$$
Now each $h_{n,i}$ has moments at most polynomial in $m_n$ (by the same subgaussian tail argument). Using $\mathbb{E}[h_{n,i}] = 0$ and $\mathbb{E}[h_{n,i}^2] = O(m_n^{2v(H)-2})$, we get that $\overline{h}_n$ converges to $0$ in probability for $n \gg m_n^{2v(H)-2}$. So we have
$$\overline{h}_n - t\sqrt{\widehat{\mathrm{Var}}(h_n)} = O_p\!\left(\frac{t}{m_n}\right) = O_p\!\left(\frac{1}{m_n}\right)$$
for fixed $t$. Hence the right-hand side is lower bounded by
$$e^{-O(|\tilde\lambda_n| m_n)}\, \frac{l - 1/t^2}{m_n^2}$$
in probability. Now,
$$\sqrt[4]{n} \times m_n \times e^{-O(|\tilde\lambda_n| m_n)}\, \frac{l - 1/t^2}{m_n^2} = \Omega_P\!\left(\frac{1}{m_n}\right),$$
where we have used the facts that $\tilde\lambda_n \xrightarrow{P} 0$ and $n \gg e^{\binom{m_n}{2}}$. Finally, we plug (A.5) and (A.6) into (A.4), writing $m_n^2 \hat\lambda_n = \left(\frac{m_n}{\sqrt[4]{n}} Z_n\right)\big/(m_n V_n)$, to conclude that $m_n^2 \hat\lambda_n \xrightarrow{P} 0$.

Sketch of proof of Theorem 3.10. Before proving Theorem 3.10, we give a brief outline of the proof below.
This theorem sharpens the consistency result of Theorem 3.8 by establishing the precise asymptotic behavior of $m_n^2(\hat\lambda_n - \lambda^\circ)$. We begin by performing a Taylor expansion of the empirical critical point equation about $\lambda^\circ$:
$$0 = \sum_{i=1}^n h_{n,i}\, e^{-\frac{\lambda^\circ}{m_n^{v(H)-2}} h_{n,i}} + (\hat\lambda_n - \lambda^\circ) \sum_{i=1}^n \left(-\frac{h_{n,i}^2}{m_n^{v(H)-2}}\right) e^{-\frac{\tilde\lambda_n}{m_n^{v(H)-2}} h_{n,i}}$$
$$\implies m_n^2 (\hat\lambda_n - \lambda^\circ) = \frac{\sum_{i=1}^n \frac{h_{n,i}}{m_n^{v(H)}}\, e^{-\frac{\lambda^\circ}{m_n^{v(H)-2}} h_{n,i}}}{\sum_{i=1}^n \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\frac{\tilde\lambda_n}{m_n^{v(H)-2}} h_{n,i}}}.$$
To analyze the asymptotic behavior of the numerator and denominator separately, we employ two distinct variational formulations based on the log-sum-exp approximation. For the numerator, we analyze the derivative of the normalized log-partition function. By Danskin's theorem [23] and the uniform convergence established in the proof of Theorem 3.8, we have
$$\frac{\sum_{i=1}^n -\frac{h_{n,i}}{m_n^{v(H)}}\, e^{-\frac{\lambda^\circ}{m_n^{v(H)-2}} h_{n,i}}}{\sum_{i=1}^n e^{-\frac{\lambda^\circ}{m_n^{v(H)-2}} h_{n,i}}} \xrightarrow{P} \frac{d}{d\lambda} g(\lambda)\Big|_{\lambda = \lambda^\circ} = \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|}, \quad (A.7)$$
where $u^*$ is the unique constant graphon that maximizes the variational problem defining $g(\lambda^\circ)$ from (3.12). For the denominator, we introduce a crucial auxiliary function by adding a squared $H$-count term:
$$\frac{1}{m_n^2} \log\left( \sum_G \exp\left( -\frac{\lambda}{m_n^{v(H)-2}}\, h(G) + \log\frac{N(G)}{n} + \frac{\alpha}{m_n^{2v(H)-2}}\, h(G)^2 \right) \right),$$
where $h(G) = H(G) - h_0$. We analyze this function by:

1. Reparameterizing the exponent to isolate the terms involving the statistics $H(G)$ and $H(G)^2$, analogously to the proof of Theorem 3.7. The term $\log\frac{N(G)}{n}$ is again approximated by the ERGM Hamiltonian $\sum_k \beta_k T_k(G)$.

2. Showing this function is differentiable at $\alpha = 0$ by analyzing the behavior of the squared $H$-statistic $T_{2H}(x) = (H(x))^2$. As this is a combinatorial property, the derivative bounds remain valid:
$$\left\|\frac{\partial}{\partial x_{ij}}\left(\frac{\alpha}{m_n^{2v(H)-2}}\, T_{2H}(x)\right)\right\|_\infty \le O(|\alpha|), \qquad \left\|\frac{\partial^2}{\partial x_{ij} \partial x_{kl}}\left(\frac{\alpha}{m_n^{2v(H)-2}}\, T_{2H}(x)\right)\right\|_\infty \le \frac{O(|\alpha|)}{m_n}.$$

3. Establishing that the auxiliary function converges uniformly to its limiting variational form:
$$\sup_{\tilde h \in \widetilde{W}} \left[ \left( -\frac{\lambda^\circ}{|\mathrm{Aut}(H)|} - \frac{2\alpha\, p_0^{e(H)}}{|\mathrm{Aut}(H)|^2} \right) t(H,h) + \sum_{k=1}^K \beta_k t(T_k, h) + \frac{\alpha}{|\mathrm{Aut}(H)|^2}\, t(H,h)^2 - \frac{1}{2} I(h) \right] + \frac{\lambda^\circ p_0^{e(H)}}{|\mathrm{Aut}(H)|} + \frac{\alpha\, p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2} - \sup_{\tilde h \in \widetilde{W}} \left[ \sum_{k=1}^K \beta_k t(T_k, h) - \frac{1}{2} I(h) \right].$$
Using Danskin's theorem [23], we compute the derivative of this limiting function with respect to $\alpha$ at $\alpha = 0$, which captures the limiting behavior of the denominator. Since the supremum in the limit is uniquely attained at the constant graphon $u^*$, we obtain
$$\frac{\sum_{i=1}^n \frac{h_{n,i}^2}{m_n^{2v(H)}}\, e^{-\frac{\lambda^\circ}{m_n^{v(H)-2}} h_{n,i}}}{\sum_{i=1}^n e^{-\frac{\lambda^\circ}{m_n^{v(H)-2}} h_{n,i}}} \xrightarrow{P} \left( \frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|} \right)^2. \quad (A.8)$$
Combining the limits of the numerator and denominator yields the stated convergence:
$$m_n^2(\hat\lambda_n - \lambda^\circ) \xrightarrow{P} \frac{\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}}{\left( \frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|} \right)^2} = \frac{1}{\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}}. \quad (A.9)$$

Proof of Theorem 3.11. Take $p_0$ to be $1 - \epsilon$, so that $p, \tilde p < p_0$. We apply Theorem 3.10 to the pair $(p, p_0)$ to obtain
$$m_n^2(\hat\lambda_n - \lambda^\circ(p)) \xrightarrow{P} \frac{1}{\frac{(u^*(p))^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}} \quad (A.10)$$
and to the pair $(\tilde p, p_0)$ to obtain
$$m_n^2(\tilde\lambda_n - \lambda^\circ(\tilde p)) \xrightarrow{P} \frac{1}{\frac{(u^*(\tilde p))^{e(H)}}{|\mathrm{Aut}(H)|} - \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}}, \quad (A.11)$$
where $\lambda^\circ(p) = \lim \hat\lambda_n$ and $\lambda^\circ(\tilde p) = \lim \tilde\lambda_n$. It can be shown from the computations of Theorem 3.10 that
• $p = \tilde p \implies \lambda^\circ(p) = \lambda^\circ(\tilde p)$ and $u^*(p) = u^*(\tilde p)$;
• $p < \tilde p \implies \lambda^\circ(p) < \lambda^\circ(\tilde p)$ and $u^*(p) < u^*(\tilde p)$;
• $p > \tilde p \implies \lambda^\circ(p) > \lambda^\circ(\tilde p)$ and $u^*(p) > u^*(\tilde p)$.
Hence subtracting (A.11) from (A.10) gives the result.
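Before turning to the detailed proofs, the log-sum-exp approximation of [15] that drives the next section can be sanity-checked in the degenerate case of a linear energy, where the variational formula $\sup_x (f(x) - I(x))$ is exact rather than approximate: for $f(x) = \sum_i \theta_i x_i$ on $\{0,1\}^n$ and $I(x) = \sum_i [x_i \log x_i + (1-x_i)\log(1-x_i)]$, both sides equal $\sum_i \log(1 + e^{\theta_i})$. A minimal numerical sketch (the function names and the test vector `theta` are ours, not from the paper):

```python
import math

def lse_binary(theta):
    """log of sum over x in {0,1}^n of exp(sum_i theta_i x_i);
    the sum factorizes as prod_i (1 + e^{theta_i})."""
    return sum(math.log1p(math.exp(t)) for t in theta)

def entropy_term(x):
    """-I(x) for a single coordinate: -(x log x + (1-x) log(1-x))."""
    if x in (0.0, 1.0):
        return 0.0
    return -(x * math.log(x) + (1 - x) * math.log(1 - x))

def variational_value(theta):
    """sup over x in [0,1]^n of f(x) - I(x) for linear f; the supremum is
    attained coordinatewise at x_i = sigmoid(theta_i)."""
    total = 0.0
    for t in theta:
        x = 1.0 / (1.0 + math.exp(-t))  # maximizer of t*x + H(x)
        total += t * x + entropy_term(x)
    return total

theta = [0.3, -1.2, 2.0, 0.0]
```

For nonlinear energies such as the $H$-count interaction, the two sides differ, and Theorem 1.6 of [15] quantifies the gap via the derivative and complexity bounds recorded above.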
B Proofs of Theorems 3.8 and 3.10

Proof of Theorem 3.8. For a graph $G$ on $m$ vertices (with $m = m_n$), let $x = (x_{ij})_{1 \le i < j \le m}$ […] $M > |\lambda^\circ| + 1$. Then for any $\varepsilon > 0$, we have, for sufficiently large $n$,
$$\mathbb{P}\left(|\hat\lambda_n| > M\right) \le \mathbb{P}\left(|\hat\lambda_n - \lambda^\circ| > 1\right) < \varepsilon,$$
so $\{\hat\lambda_n\}_n$ is bounded in probability and, with probability tending to one, $\hat\lambda_n \in [-M, M]$. Consequently, in all preceding arguments, $|\lambda|$ can be bounded by $M$ with high probability. By Lemma C.2, the same $\epsilon$-net works for the image of $\nabla f$ if $\epsilon$ is not as small as $o_p(1/m_n)$.

Application of the log-sum-exp approximation and final error bounds. With the derivative and complexity bounds for the energy function $f(x)$ established, we can now apply the general log-sum-exp approximation theorem (Theorem 1.6 in [15]). The theorem implies that
$$\log \sum_{x \in \{0,1\}^{\binom{m}{2}}} \exp(f(x)) \le \sup_{x \in [0,1]^{\binom{m}{2}}} \left( f(x) - I(x) \right) + \frac{\epsilon}{4}\left( \sum_e b_e^2 \right)^{1/2} + 3\epsilon n_v + \log|\mathcal{D}(\epsilon)| + S, \quad (B.2)$$
and
$$\log \sum_{x \in \{0,1\}^{\binom{m}{2}}} \exp(f(x)) \ge \sup_{x \in [0,1]^{\binom{m}{2}}} \left( f(x) - I(x) \right) - \frac{1}{2} \sum_e c_{ee}, \quad (B.3)$$
where:
• $n_v = \binom{m}{2} \sim m^2/2$ is the number of variables;
• $b_e$ is the bound on the first derivative in coordinate $e$, so that $b_e \le B$, where $B$ is a constant depending on $\lambda$ and $\beta$;
• $c_{ee}$ is the bound on the diagonal second derivative, which for our combined energy function is bounded by $O(B/m)$ with high probability;
• $S$ is the smoothness term, which, after normalization by $n_v$, is $O(m^{-1/2})$ (see Lemma C.4).
Following the standard procedure for applying this theorem, we choose
$$\epsilon = B^3 \left( \frac{\log m_n}{m_n} \right)^{1/5}.$$
After dividing by $m^2$, the error terms in the upper bound (B.2) converge to zero. The lower bound error term, $\frac{1}{m^2} \sum_e c_{ee} = \frac{1}{m^2}\, O(n_v \cdot B/m) = O(B/m)$, also vanishes. This leads to the following convergence result, stated as a lemma.

Lemma B.1 (Convergence of partition function for the empirical ERGM). Assume that $m = m_n \to \infty$ and that $\binom{m_n}{2} = o(\log n)$.
Let the sample graphs be drawn from an ERGM with parameters $\beta=(\beta_1,\dots,\beta_K)$. Then there exist constants $c,C>0$ (depending on the choice of statistics $H,T_1,\dots,T_K$) such that, with probability tending to 1 as $n\to\infty$,
\[
-cBm^{-1}\ \le\ \frac{\log Z_m(\lambda)}{m^2}-L_m(\lambda)\ \le\ CB^{8/5}m^{-1/5}(\log m)^{1/5}\Big(1+\frac{\log B}{\log m}\Big)+CB^2m^{-1/2},
\]
where
\[
Z_m(\lambda)=\sum_G\exp\Big(-\frac{\lambda}{m^{v(H)-2}}H(G)+\log\frac{N(G)}{n}\Big),\qquad
L_m(\lambda)=\sup_{x\in[0,1]^{\binom m2}}\Big\{\frac{1}{m^2}\Big(-\frac{\lambda}{m^{v(H)-2}}H(x)+\log\frac{N(x)}{n}\Big)-\frac{I(x)}{m^2}\Big\},
\]
and the constant $B$ is given by
\[
B=1+\frac{|\lambda|}{|\mathrm{Aut}(H)|}+\sum_{k=1}^K|\beta_k|.
\]

The limit of $L_m$. To find the limit of $L_m$, we decompose the term $\frac{1}{m^2}\log\frac{N(x)}{n}$. For any graph $x\in\{0,1\}^{\binom m2}$, which represents an adjacency vector, we can write the log of the empirical frequency as the sum of the log-probability under the $P_\beta$ model and a deviation term:
\[
\frac{1}{m^2}\log\frac{N(x)}{n}=\frac{1}{m^2}\log p_\beta(x)+\Big(\frac{1}{m^2}\log\frac{N(x)}{n}-\frac{1}{m^2}\log p_\beta(x)\Big).
\]
Substituting this decomposition back into the expression for $L_m$, we obtain
\[
L_m=\sup_{x\in[0,1]^{\binom m2}}\Big\{-\frac{\lambda}{m^{v(H)}}H(x)-\frac{I(x)}{m^2}+\frac{1}{m^2}\log p_\beta(x)+\Big(\frac{1}{m^2}\log\frac{N(x)}{n}-\frac{1}{m^2}\log p_\beta(x)\Big)\Big\}
\le \sup_{x\in[0,1]^{\binom m2}}\Big\{-\frac{\lambda}{m^{v(H)}}H(x)-\frac{I(x)}{m^2}+\frac{1}{m^2}\log p_\beta(x)\Big\}+\sup_{x\in[0,1]^{\binom m2}}\Big|\frac{1}{m^2}\log\frac{N(x)}{n}-\frac{1}{m^2}\log p_\beta(x)\Big|.
\]
By the uniform weak law of large numbers (Lemma C.1), the ratio inside the logarithm converges to 1 in probability, uniformly over all graphs $x$. Therefore the error term vanishes as $n\to\infty$:
\[
\Big|\frac{1}{m^2}\log\frac{N(x)}{n}-\frac{1}{m^2}\log p_\beta(x)\Big|=o_P(1).
\]
A similar lower bound can be established. This means that the limit of $L_m$ is determined by the limit of the main variational term
\[
\sup_{x\in[0,1]^{\binom m2}}\Big\{-\frac{\lambda}{m^{v(H)}}H(x)-\frac{I(x)}{m^2}+\frac{1}{m^2}\log p_\beta(x)\Big\}.
\]
Now we take limits as $m\to\infty$:
\begin{align*}
&\lim_{m\to\infty}\ \sup_{x\in[0,1]^{\binom m2}}\Big\{-\frac{\lambda}{m^{v(H)}}H(x)+\frac{1}{m^2}\log p_\beta(x)-\frac{I(x)}{m^2}\Big\}\\
&=\lim_{m\to\infty}\Bigg[\sup_{x\in[0,1]^{\binom m2}}\Big\{-\frac{\lambda}{m^{v(H)}}H(x)+\sum_{k=1}^K\beta_k t(T_k,x)-\frac{I(x)}{m^2}\Big\}-\frac{1}{m^2}\log Z_m(\beta)\Bigg]\\
&=\sup_{x\in\cup_m[0,1]^{\binom m2}}\Big\{-\frac{\lambda}{m^{v(H)}}H(x)+\sum_{k=1}^K\beta_k t(T_k,x)-\frac{I(x)}{m^2}\Big\}-\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big)\\
&=\sup_{x\in\cup_m[0,1]^{\binom m2}}\Big\{-\frac{\lambda}{|\mathrm{Aut}(H)|}t(H,x)+\sum_{k=1}^K\beta_k t(T_k,x)-\frac{I(x)}{m^2}\Big\}-\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big)\\
&=\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(-\frac{\lambda}{|\mathrm{Aut}(H)|}t(H,h)+\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big)-\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big),
\end{align*}
where we have used the result $\frac{t(H,G)}{|\mathrm{Aut}(H)|}=\frac{H(G)}{(m)_{v(H)}}+O_H(1/m)$ to replace $\frac{H(x)}{m^{v(H)}}$ by $\frac{t(H,x)}{|\mathrm{Aut}(H)|}$, and the last step follows by the following lemma, whose proof is given in the appendix.

Lemma B.2. Fix finite simple graphs $T_1,\dots,T_K$ and $\beta_1,\dots,\beta_K\in\mathbb R$. Let
\[
F(h):=\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\qquad\text{for }h\in\mathcal W.
\]
Let $\mathcal S\subset\mathcal W$ be the class of step functions on $[0,1]^2$ taking values in $[0,1]$. Then
\[
\sup_{h\in\mathcal W}F(h)=\sup_{h\in\mathcal S}F(h).
\]
Note: the space $\mathcal S$ can be identified with the space $\cup_m[0,1]^{\binom m2}$ as follows: represent an element of $[0,1]^{\binom m2}$ as the adjacency vector of a weighted graph, and then consider the adjacency matrix. The adjacency matrix can be represented as a step function on $[0,1]^2$ in a natural way. The other direction is also straightforward.

This shows that $\lim_{m\to\infty}L_m$ exists in probability and is equal to the desired variational formula, which in turn implies that
\[
\frac{1}{m_n^2}\log\Big(\frac1n\sum_{i=1}^n e^{-\lambda h_{n,i}/m_n^{v(H)-2}}\Big)
\]
converges in probability to $g(\lambda)$. The convergence of the root follows from an argument similar to the proof of Theorem 3.7.

Proof of Theorem 3.10.
Expanding as before about $\lambda^\circ$, we get
\[
0=\sum_{i=1}^n h_{n,i}e^{-\lambda^\circ h_{n,i}/m_n^{v(H)-2}}+(\hat\lambda_n-\lambda^\circ)\sum_{i=1}^n\Big(-\frac{h_{n,i}^2}{m_n^{v(H)-2}}\Big)e^{-\tilde\lambda_n h_{n,i}/m_n^{v(H)-2}}
\]
\[
\implies\qquad m_n^2(\hat\lambda_n-\lambda^\circ)=\frac{\sum_{i=1}^n\frac{h_{n,i}}{m_n^{v(H)}}\,e^{-\lambda^\circ h_{n,i}/m_n^{v(H)-2}}}{\sum_{i=1}^n\frac{h_{n,i}^2}{m_n^{2v(H)}}\,e^{-\tilde\lambda_n h_{n,i}/m_n^{v(H)-2}}}.
\]
As mentioned in the proof outline, we deal with the numerator and the denominator of the right-hand side separately. For the numerator we prove (A.7), while for the denominator we establish (A.8). Since the statement of the theorem follows from combining (A.7) and (A.8), the rest of the proof is devoted to proving (A.7) and (A.8). From the proof of Theorem 3.8 we have
\[
\frac{1}{m_n^2}\log\Big(\frac1n\sum_{i=1}^n e^{-\lambda h_{n,i}/m_n^{v(H)-2}}\Big)\ \xrightarrow{P}\ g(\lambda), \tag{B.4}
\]
where
\[
g(\lambda)=\underbrace{\frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}}_{C_1}\lambda+\underbrace{\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(-\frac{\lambda}{|\mathrm{Aut}(H)|}t(H,h)+\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big)}_{G(\lambda)}-\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big).
\]
Define the following quantities:
\[
g(\lambda,\tilde h)=-\frac{\lambda}{|\mathrm{Aut}(H)|}t(H,h)+\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h),\qquad
G(\lambda)=\sup_{\tilde h\in\widetilde{\mathcal W}}g(\lambda,\tilde h),\qquad
H^*(\lambda)=\operatorname*{arg\,max}_{\tilde h\in\widetilde{\mathcal W}}g(\lambda,\tilde h).
\]
(Here, $\operatorname{arg\,max}$ is the set of all $\tilde h\in\widetilde{\mathcal W}$ where the supremum is attained; the set is non-empty by the compactness of $\widetilde{\mathcal W}$.) Let $\mathrm{co}$ denote the closed convex hull. Then, by Danskin's theorem [23], the subdifferential of $G$ at $\lambda^\circ$ is
\[
\partial G(\lambda^\circ)=\mathrm{co}\Big\{\nabla_\lambda g(\lambda^\circ,\tilde h)\ \Big|\ \tilde h\in H^*(\lambda^\circ)\Big\}=\mathrm{co}\Big\{-\frac{t(H,h)}{|\mathrm{Aut}(H)|}\ \Big|\ \tilde h\in H^*(\lambda^\circ)\Big\}.
\]
We are given that the supremum is attained at a unique point $u^*\in[0,1]\subseteq\widetilde{\mathcal W}$ at $\lambda^\circ$. Thus $H^*(\lambda^\circ)=\{u^*\}$ and
\[
\partial G(\lambda^\circ)=\mathrm{co}\Big\{-\frac{t(H,u^*)}{|\mathrm{Aut}(H)|}\Big\}=\Big\{-\frac{t(H,u^*)}{|\mathrm{Aut}(H)|}\Big\}.
\]
Since $\partial G(\lambda^\circ)$ is a singleton, $G$ is differentiable at $\lambda^\circ$ with $G'(\lambda^\circ)=-\frac{t(H,u^*)}{|\mathrm{Aut}(H)|}$. Thus we have that
\[
g(\lambda)=C_1\lambda+G(\lambda)+\mathrm{const}
\]
is differentiable at $\lambda^\circ$ with derivative
\[
g'(\lambda^\circ)=C_1+G'(\lambda^\circ)=\frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}-\frac{t(H,u^*)}{|\mathrm{Aut}(H)|}=\frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}-\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|}.
\]
Further, observe that
• the left-hand side of (B.4) is a sequence of convex functions;
• the right-hand side of (B.4), being a pointwise limit of convex functions, is convex;
• the left-hand side is a sequence of differentiable functions;
• we have shown that the right-hand side is differentiable at $\lambda^\circ$, as the supremum is attained at a unique point.
Take any subsequence of the left-hand side. It has a further subsequence that converges almost surely. Using the above conditions along the subsequence, the subsequential derivatives at $\lambda^\circ$ converge almost surely to the derivative of the right-hand side. Hence every subsequence of derivatives of the left-hand side has a further subsequence converging almost surely, and therefore the derivatives converge in probability at $\lambda^\circ$:
\[
\frac{\sum_{i=1}^n-\frac{h_{n,i}}{m_n^{v(H)}}\,e^{-\lambda^\circ h_{n,i}/m_n^{v(H)-2}}}{\sum_{i=1}^n e^{-\lambda^\circ h_{n,i}/m_n^{v(H)-2}}}\ \xrightarrow{P}\ \frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}-\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|}.
\]
Thus the convergence outlined in (A.7) is established.
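The subsequence step relies on a standard convex-analysis fact: if differentiable convex functions converge pointwise to a limit that is differentiable at a point, their derivatives converge at that point. A small stand-in check of this mechanism (ours; Poisson data replace the graph statistics, and `K_emp_deriv` mirrors the derivative of the left-hand side of (B.4) without the $m_n^2$ normalization):

```python
import math
import random

def K_emp_deriv(sample, lam):
    """Exact derivative of lam -> log((1/n) * sum exp(-lam*x)): minus the tilted mean."""
    w = [math.exp(-lam * x) for x in sample]
    return -sum(x * wi for x, wi in zip(sample, w)) / sum(w)

def poisson(mu, rng):
    """Poisson sampler by inversion of the CDF."""
    u, k, p = rng.random(), 0, math.exp(-mu)
    c = p
    while u > c:
        k += 1
        p *= mu / k
        c += p
    return k

rng = random.Random(1)
mu, lam0 = 2.0, 0.3
sample = [poisson(mu, rng) for _ in range(200_000)]
emp = K_emp_deriv(sample, lam0)        # derivative of the empirical convex function
limit = -mu * math.exp(-lam0)          # derivative of the limit mu*(exp(-lam)-1)
print(abs(emp - limit))                # small: the derivatives converge with n
```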
Next, to analyze the convergence in (A.8), we require the limit of
\[
\frac{\sum_{i=1}^n\frac{h_{n,i}^2}{m_n^{2v(H)}}\,e^{-\tilde\lambda_n h_{n,i}/m_n^{v(H)-2}}}{\sum_{i=1}^n e^{-\tilde\lambda_n h_{n,i}/m_n^{v(H)-2}}}.
\]
To that end, we consider
\begin{align*}
&\frac{1}{m_n^2}\log\sum_G \exp\Bigg(-\frac{\lambda}{m_n^{v(H)-2}}\Big(H(G)-\frac{m_n^{v(H)}p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Big)+\log\frac{N(G)}{n}+\frac{\alpha}{m_n^{2v(H)-2}}\Big(H(G)-\frac{m_n^{v(H)}p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Big)^2\Bigg)\\
&=\frac{1}{m_n^2}\log\sum_G \exp\Bigg(m_n^2\Bigg[\Big(-\frac{\lambda}{|\mathrm{Aut}(H)|}-\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}\Big)\frac{H(G)}{m_n^{v(H)}/|\mathrm{Aut}(H)|}+\frac{1}{m_n^2}\log\frac{N(G)}{n}+\frac{\alpha}{|\mathrm{Aut}(H)|^2}\Big(\frac{H(G)}{m_n^{v(H)}/|\mathrm{Aut}(H)|}\Big)^2\Bigg]\Bigg)+\frac{\lambda p_0^{e(H)}}{|\mathrm{Aut}(H)|}+\frac{\alpha p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}.
\end{align*}
As before, we denote this by
\[
\frac{1}{m_n^2}\log\sum_G e^{f(G)}+\frac{\lambda p_0^{e(H)}}{|\mathrm{Aut}(H)|}+\frac{\alpha p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}, \tag{B.5}
\]
where
\[
f(x)=m_n^2\Bigg[\Big(-\frac{\lambda}{|\mathrm{Aut}(H)|}-\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}\Big)\frac{H(x)}{m_n^{v(H)}/|\mathrm{Aut}(H)|}+\frac{1}{m_n^2}\log\frac{N(x)}{n}+\frac{\alpha}{|\mathrm{Aut}(H)|^2}\Big(\frac{H(x)}{m_n^{v(H)}/|\mathrm{Aut}(H)|}\Big)^2\Bigg]. \tag{B.6}
\]

The squared statistic $T_H(x)$. Consider $T_H(x)=H(x)^2$. Its first derivative is $\frac{\partial T_H(x)}{\partial x_e}=2H(x)\frac{\partial H(x)}{\partial x_e}$, and using $H(x)\le (m)_{v(H)}=O(m^{v(H)})$ and $\big|\frac{\partial H(x)}{\partial x_e}\big|\le Cm^{v(H)-2}$ we obtain
\[
\Big|\frac{\partial T_H(x)}{\partial x_e}\Big|\le 2C\,O(m^{2v(H)-2}).
\]
After multiplying by the normalization factor $\alpha/m^{2v(H)-2}$, we have
\[
\Big|\frac{\partial}{\partial x_e}\Big(\frac{\alpha}{m^{2v(H)-2}}T_H(x)\Big)\Big|\le O(|\alpha|).
\]
Similarly, the second derivative of $T_H(x)$ is bounded by $O(m^{2v(H)-3})$, so that after the factor $\alpha/m^{2v(H)-2}$ we obtain
\[
\Big|\frac{\partial^2}{\partial x_e\,\partial x_{e'}}\Big(\frac{\alpha}{m^{2v(H)-2}}T_H(x)\Big)\Big|\le \frac{C''|\alpha|}{m_n}
\]
for some constant $C''$.
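The derivative bounds above can be sanity-checked by finite differences, since $H$ is multilinear in the edge variables: for triangles ($v(H)=3$), $\partial H/\partial x_e$ counts weighted common neighbors and is at most $m-2$, matching $Cm^{v(H)-2}$ with $C=1$. A quick illustrative check (our toy code, not from the paper):

```python
import itertools
import random

def H_tri(x, m):
    """Multilinear triangle count on [0,1]^(m choose 2)."""
    return sum(x[(i, j)] * x[(j, k)] * x[(i, k)]
               for i, j, k in itertools.combinations(range(m), 3))

rng = random.Random(2)
m = 7
x = {e: rng.random() for e in itertools.combinations(range(m), 2)}
e = (0, 1)
xp, xm = dict(x), dict(x)
xp[e], xm[e] = 1.0, 0.0
deriv = H_tri(xp, m) - H_tri(xm, m)   # multilinearity: finite difference = exact partial
bound = m - 2                         # C * m^{v(H)-2} with v(H) = 3, C = 1
print(0.0 <= deriv <= bound)          # True
```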
Combination of the edge, $H$, and squared-$H$ bounds. The energy function in (B.6) now decomposes as
\[
f(x)=\log\frac{N(x)}{n}+f_H(x)+f_S(x),\qquad
f_H(x)=-\frac{\lambda+2\alpha p_0^{e(H)}/|\mathrm{Aut}(H)|}{m^{v(H)-2}}\,H(x),\qquad
f_S(x)=\frac{\alpha}{m^{2v(H)-2}}\,T_H(x),\qquad T_H(x)=H(x)^2.
\]
For the $H$ part $f_H(x)$, standard combinatorial arguments (as in [15]) yield the bounds
\[
\|H\|_\infty\le Cm^{v(H)},\qquad \Big|\frac{\partial H}{\partial x_e}\Big|\le Cm^{v(H)-2},\qquad \Big|\frac{\partial^2 H}{\partial x_e\,\partial x_{e'}}\Big|\le Cm^{v(H)-3}
\]
for edges $e,e'$. It follows that
\[
\|f_H\|_\infty\le Cm_n^2\Big|\lambda+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Big|,\qquad
\Big|\frac{\partial f_H}{\partial x_e}\Big|\le \Big|\lambda+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Big|\,m^{-(v(H)-2)}\cdot Cm^{v(H)-2}=C\Big|\lambda+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Big|,
\]
and
\[
\Big|\frac{\partial^2 f_H}{\partial x_e\,\partial x_{e'}}\Big|\le \Big|\lambda+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Big|\,m^{-(v(H)-2)}\cdot Cm^{v(H)-3}=\frac{C\big|\lambda+2\alpha p_0^{e(H)}/|\mathrm{Aut}(H)|\big|}{m_n}.
\]
For $\log\frac{N(x)}{n}$, as shown previously,
\[
\Big\|\log\frac{N(x)}{n}\Big\|_\infty\le O(m_n^2\|\beta\|_1)+o_P(1),\qquad
\Big\|\frac{\partial}{\partial x_e}\log\frac{N(x)}{n}\Big\|_\infty\le O(\|\beta\|_1)+o_P(1),\qquad
\Big\|\frac{\partial^2}{\partial x_e\,\partial x_{e'}}\log\frac{N(x)}{n}\Big\|_\infty\le \frac{O(\|\beta\|_1)+o_P(1)}{m_n}.
\]
Thus its contribution to the gradient is essentially constant and (up to an $o_P(1)$ error) adds a fixed vector whose covering number is one. Combining these with the bounds for the squared-$H$ term $f_S(x)$, we get
\[
\|f\|_\infty\le Cm_n^2\Big|\lambda+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Big|+O(m_n^2\|\beta\|_1)+m_n^2|\alpha|+o_P(1),
\]
the first derivative satisfies
\[
\Big|\frac{\partial f}{\partial x_e}\Big|\le O(\|\beta\|_1)+C\Big|\lambda+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Big|+O(|\alpha|)+o_P(1),
\]
and the second derivative is bounded by
\[
\Big|\frac{\partial^2 f}{\partial x_e\,\partial x_{e'}}\Big|\le \frac{O(\|\beta\|_1)+C\big|\lambda+2\alpha p_0^{e(H)}/|\mathrm{Aut}(H)|\big|+C''|\alpha|+o_P(1)}{m_n}.
\]
As before, the next step is to bound the complexity of the gradient $\nabla f$.
Following the previous line of argument, one obtains that there exists a constant $C'>0$ such that the $\epsilon$-net satisfies
\[
\log|\mathcal D(\epsilon)|\le \frac{C'B^4m}{\epsilon^4}\log\frac{C'B^4}{\epsilon^4},
\qquad\text{with}\qquad
B=1+\Big|\frac{\lambda}{|\mathrm{Aut}(H)|}+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}\Big|+\frac{|\alpha|}{|\mathrm{Aut}(H)|}+\sum_{k=1}^K|\beta_k|C_k.
\]
Applying the log-sum-exp approximation (Theorem 1.6 in [15]), we get the following result:
\[
\frac{1}{m_n^2}\log\Big(\frac1n\sum_{i=1}^n e^{-\lambda h_{n,i}/m_n^{v(H)-2}+\alpha h_{n,i}^2/m_n^{2v(H)-2}}\Big)\ \xrightarrow{P}\ 
\sup_{\tilde h\in\widetilde{\mathcal W}}\Bigg[-\Big(\frac{\lambda}{|\mathrm{Aut}(H)|}+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}\Big)t(H,h)+\sum_{k=1}^K\beta_k t(T_k,h)+\frac{\alpha}{|\mathrm{Aut}(H)|^2}t(H,h)^2-\frac12 I(h)\Bigg]+\frac{\lambda p_0^{e(H)}}{|\mathrm{Aut}(H)|}+\frac{\alpha p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}-\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big)
\]
uniformly in $\lambda$ (see the proof of Theorem 3.8 for details). Since we also have $\tilde\lambda_n\xrightarrow{P}\lambda^\circ$, we get
\[
\frac{1}{m_n^2}\log\Big(\frac1n\sum_{i=1}^n e^{-\tilde\lambda_n h_{n,i}/m_n^{v(H)-2}+\alpha h_{n,i}^2/m_n^{2v(H)-2}}\Big) \tag{B.7}
\]
\[
\xrightarrow{P}\ \sup_{\tilde h\in\widetilde{\mathcal W}}\Bigg[-\Big(\frac{\lambda^\circ}{|\mathrm{Aut}(H)|}+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}\Big)t(H,h)+\sum_{k=1}^K\beta_k t(T_k,h)+\frac{\alpha}{|\mathrm{Aut}(H)|^2}t(H,h)^2-\frac12 I(h)\Bigg]+\frac{\lambda^\circ p_0^{e(H)}}{|\mathrm{Aut}(H)|}+\frac{\alpha p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}-\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big). \tag{B.8}
\]
Let $F(\alpha)$ be defined as follows:
\[
F(\alpha)=\underbrace{-\sup_{\tilde h\in\widetilde{\mathcal W}}\Big(\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h)\Big)+\frac{\lambda^\circ p_0^{e(H)}}{|\mathrm{Aut}(H)|}}_{\text{const.\ w.r.t.\ }\alpha}+\frac{\alpha p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}+\sup_{\tilde h\in\widetilde{\mathcal W}}\underbrace{\Bigg[-\Big(\frac{\lambda^\circ}{|\mathrm{Aut}(H)|}+\frac{2\alpha p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}\Big)t(H,h)+\sum_{k=1}^K\beta_k t(T_k,h)+\frac{\alpha}{|\mathrm{Aut}(H)|^2}t(H,h)^2-\frac12 I(h)\Bigg]}_{g(\alpha,\tilde h)}.
\]
We note that $F$ is convex in $\alpha$, being the pointwise limit of convex functions. Alternatively, $g(\alpha,\tilde h)$ is linear in $\alpha$, hence $\sup_{\tilde h}g(\alpha,\tilde h)$ is convex; $F(\alpha)$ is the sum of this convex function and the linear term $\frac{\alpha p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}$ (plus constants), hence convex. We analyze its differentiability at $\alpha=0$.
Define
\[
G(\alpha):=\sup_{\tilde h\in\widetilde{\mathcal W}}g(\alpha,\tilde h),\qquad H^*(0)=\operatorname*{arg\,max}_{\tilde h\in\widetilde{\mathcal W}}g(0,\tilde h),
\]
and note that
\[
\nabla_\alpha g(\alpha,\tilde h)=-\frac{2p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}t(H,h)+\frac{1}{|\mathrm{Aut}(H)|^2}t(H,h)^2,\qquad
g(0,\tilde h)=-\frac{\lambda^\circ}{|\mathrm{Aut}(H)|}t(H,h)+\sum_{k=1}^K\beta_k t(T_k,h)-\frac12 I(h).
\]
Further, we are given that this supremum is attained at a unique point $u^*\in\widetilde{\mathcal W}$, which implies that $H^*(0)=\{u^*\}$. By Danskin's theorem [23], $G(\alpha)$ is differentiable at $\alpha=0$, and its derivative is
\[
G'(0)=\nabla_\alpha g(0,u^*)=-\frac{2p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}t(H,u^*)+\frac{1}{|\mathrm{Aut}(H)|^2}t(H,u^*)^2.
\]
Since $F(\alpha)=\mathrm{const}+\frac{\alpha p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}+G(\alpha)$, its derivative at $\alpha=0$ is given by (using $t(H,u^*)=(u^*)^{e(H)}$)
\[
F'(0)=\frac{p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}+G'(0)=\frac{p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}-\frac{2p_0^{e(H)}}{|\mathrm{Aut}(H)|^2}(u^*)^{e(H)}+\frac{1}{|\mathrm{Aut}(H)|^2}(u^*)^{2e(H)}.
\]
Further, observe that
• the left-hand side of (B.8) is a sequence of convex functions of $\alpha$;
• the right-hand side of (B.8), being a pointwise limit of convex functions, is convex;
• the left-hand side of (B.8) is a sequence of differentiable functions of $\alpha$;
• from [16], we know that the right-hand side is differentiable at $0$, as the supremum is attained at a unique point.
By passing to subsequences as before, we get
\[
\frac{\sum_{i=1}^n\frac{h_{n,i}^2}{m_n^{2v(H)}}\,e^{-\lambda h_{n,i}/m_n^{v(H)-2}+\alpha h_{n,i}^2/m_n^{2v(H)-2}}}{\sum_{i=1}^n e^{-\lambda h_{n,i}/m_n^{v(H)-2}+\alpha h_{n,i}^2/m_n^{2v(H)-2}}}\ \xrightarrow{P}\ \frac{p_0^{2e(H)}}{|\mathrm{Aut}(H)|^2}-\frac{2p_0^{e(H)}(u^*)^{e(H)}}{|\mathrm{Aut}(H)|^2}+\frac{(u^*)^{2e(H)}}{|\mathrm{Aut}(H)|^2}=\Bigg(\frac{(u^*)^{e(H)}}{|\mathrm{Aut}(H)|}-\frac{p_0^{e(H)}}{|\mathrm{Aut}(H)|}\Bigg)^2.
\]

C Proofs of technical lemmas

Lemma C.1 (Uniform Weak Law of Large Numbers). Let $G_{n,1},\dots,G_{n,n}$ be independent random graphs sampled from the $P_\beta$ model, where $m_n\to\infty$ as $n\to\infty$.
For any graph $G$ on $m_n$ vertices, we have $E[\mathbf 1(G_{n,i}=G)]=p_\beta(G)$. Define
\[
X_{i,G}:=\frac{\mathbf 1(G_{n,i}=G)}{p_\beta(G)}.
\]
Then, if $\binom{m_n}{2}=o(\log n)$, for every $\epsilon>0$ we have
\[
P\Bigg(\sup_{G\in\mathcal G_{m_n}}\Big|\frac1n\sum_{i=1}^n X_{i,G}-1\Big|>\frac{\epsilon}{m_n}\Bigg)\to 0.
\]
Proof. For each fixed graph $G$ on $m_n$ vertices, note that
\[
E[X_{i,G}]=E\Big[\frac{\mathbf 1(G_{n,i}=G)}{p_\beta(G)}\Big]=1.
\]
Thus, for each $G$, the random variables $\{X_{i,G}\}_{i=1}^n$ are i.i.d. with mean 1. Moreover, the variance of the indicator is $\mathrm{Var}(\mathbf 1(G_{n,i}=G))=p_\beta(G)(1-p_\beta(G))$, so that
\[
\mathrm{Var}(X_{i,G})=\frac{1}{p_\beta(G)}-1.
\]
By Chebyshev's inequality, for any fixed $G$ and $\epsilon>0$,
\[
P\Big(\Big|\frac1n\sum_{i=1}^n X_{i,G}-1\Big|>\frac{\epsilon}{m_n}\Big)\le \frac{m_n^2}{n\epsilon^2}\,\mathrm{Var}(X_{1,G}).
\]
A union bound over the (at most) $2^{\binom{m_n}{2}}$ graphs on $m_n$ vertices then yields
\[
P\Bigg(\sup_{G\in\mathcal G_{m_n}}\Big|\frac1n\sum_{i=1}^n X_{i,G}-1\Big|>\frac{\epsilon}{m_n}\Bigg)\le 2^{\binom{m_n}{2}}\cdot\frac{m_n^2}{n\epsilon^2}\Big(\frac{1}{\min_G p_\beta(G)}-1\Big).
\]
Under the assumption $\binom{m_n}{2}=o(\log n)$, the right-hand side converges to zero as $n\to\infty$. This completes the proof.

Lemma C.2 (Uniform Convergence of the Log-Derivative). Let $G_{n,1},\dots,G_{n,n}$ be independent random graphs sampled from the $P_\beta$ model, with $m_n\to\infty$ in such a way that $\binom{m_n}{2}=o(\log n)$. For each graph $G$ on $m_n$ vertices, let
\[
N(G)=\sum_{i=1}^n\mathbf 1(G_{n,i}=G)
\]
be the number of occurrences of $G$ in the sample. Extend the function $\log\frac{N(G)}{n}$ to $[0,1]^{\binom{m_n}{2}}$ multilinearly and denote the extension by $\log\frac{N(x)}{n}$. Then, for every $\epsilon>0$, we have
\[
P\Bigg(\sup_{x\in[0,1]^{\binom{m_n}{2}}}\ \sup_{1\le i<j\le m_n}\Big|\frac{\partial}{\partial x_{ij}}\log\frac{N(x)}{n}-\frac{\partial}{\partial x_{ij}}\log p_\beta(x)\Big|>\frac{\epsilon}{m_n}\Bigg)\to 0.
\]
Proof. By the uniform weak law (Lemma C.1), for every graph $G$ on $m_n$ vertices we have
\[
\frac{N(G)}{n}=p_\beta(G)\big(1+\delta_n(G)\big),
\]
with $m_n\delta_n(G)\to 0$ in probability uniformly over $G$. Taking logarithms, we write
\[
\log\frac{N(G)}{n}=\log p_\beta(G)+\eta_n(G),
\]
where $\eta_n(G)=\log(1+\delta_n(G))$ and $m_n\eta_n(G)\to 0$ uniformly in probability over all graphs $G$.
By multilinear extension, for any $x\in[0,1]^{\binom{m_n}{2}}$ the extended function $\log\frac{N(x)}{n}$ agrees with the above on the vertices (i.e., on the set of all graphs). For an edge $e=ij$, define the finite-difference derivative
\[
D_e(x):=\frac{\partial}{\partial x_{ij}}\log\frac{N(x)}{n}=\log\frac{N(x_{+e})}{n}-\log\frac{N(x_{-e})}{n},
\]
where $x_{+e}$ denotes that the $e$th coordinate is set to 1 and $x_{-e}$ that it is set to 0. Then we have
\[
\log\frac{N(x_{+e})}{n}-\log\frac{N(x_{-e})}{n}=\log p_\beta(x_{+e})-\log p_\beta(x_{-e})+\Delta_e(x),
\]
where we have defined $\Delta_e(x)=\eta_n(x_{+e})-\eta_n(x_{-e})$. Note that the extension of $\log p_\beta(x)$ is also multilinear, so that its derivative is likewise given by a finite difference. Since the set of vertices (i.e., graphs on $m_n$ vertices) is finite with cardinality at most $2^{\binom{m_n}{2}}$ and there are $\binom{m_n}{2}$ possible edges, a union bound shows that there exists a function $\delta(n)\to 0$ such that
\[
P\Big(\max_{x\in\{0,1\}^{\binom{m_n}{2}}}\max_e|\Delta_e(x)|>\frac{\epsilon}{m_n}\Big)\le 2^{\binom{m_n}{2}}\binom{m_n}{2}\delta(n).
\]
By the multilinearity of the extension, the same bound applies when taking the supremum over all $x\in[0,1]^{\binom{m_n}{2}}$. Hence,
\[
P\Bigg(\sup_{x\in[0,1]^{\binom{m_n}{2}}}\ \sup_{1\le i<j\le m_n}\Big|\frac{\partial}{\partial x_{ij}}\log\frac{N(x)}{n}-\frac{\partial}{\partial x_{ij}}\log p_\beta(x)\Big|>\frac{\epsilon}{m_n}\Bigg)\le 2^{\binom{m_n}{2}}\binom{m_n}{2}\delta(n).
\]
Under the assumption $\binom{m_n}{2}=o(\log n)$, the right-hand side converges to zero as $n\to\infty$. This completes the proof.

Lemma C.3 (Uniform Convergence of the Discrete Second Derivative). Let $G_{n,1},\dots,G_{n,n}$ be independent random graphs sampled from the $P_\beta$ model, with $m_n\to\infty$ in such a way that $\binom{m_n}{2}=o(\log n)$. For each graph $G$ on $m_n$ vertices, define
\[
N(G)=\sum_{i=1}^n\mathbf 1(G_{n,i}=G)
\]
to be the number of occurrences of $G$ in the sample, and extend the function $\log\frac{N(G)}{n}$ to $[0,1]^{\binom{m_n}{2}}$ by multilinear interpolation, denoting the extension by $\log\frac{N(x)}{n}$.
For any two distinct edges $e$ and $e'$ (viewed as coordinates), define the discrete second derivative by
\[
\frac{\partial^2}{\partial x_e\,\partial x_{e'}}\log\frac{N(x)}{n}:=\log\frac{N(x_{+e+e'})}{n}-\log\frac{N(x_{+e-e'})}{n}-\log\frac{N(x_{-e+e'})}{n}+\log\frac{N(x_{-e-e'})}{n},
\]
where, for a given $x\in[0,1]^{\binom{m_n}{2}}$, the notation $x_{\pm e\pm e'}$ indicates that the coordinates corresponding to edges $e$ and $e'$ are set to the indicated values (either 0 or 1), while the remaining coordinates remain as in $x$. Then, for every $\epsilon>0$, we have
\[
P\Bigg(\sup_{x\in[0,1]^{\binom{m_n}{2}}}\ \sup_{\substack{e,e'\\ e\ne e'}}\Big|\frac{\partial^2}{\partial x_e\,\partial x_{e'}}\log\frac{N(x)}{n}-\frac{\partial^2}{\partial x_e\,\partial x_{e'}}\log p_\beta(x)\Big|>\frac{\epsilon}{m_n}\Bigg)\to 0.
\]
Proof. For vertices of the hypercube, that is, for $x\in\{0,1\}^{\binom{m_n}{2}}$ corresponding to graphs, the uniform weak law (see the previous lemmas) shows that
\[
\frac{N(G)}{n}=p_\beta(G)\big(1+\delta_n(G)\big),
\]
with error terms satisfying $m_n\delta_n(G)\to 0$ uniformly in probability over all graphs $G$. Consequently, taking logarithms, we have $\log\frac{N(G)}{n}=\log p_\beta(G)+\eta_n(G)$, where $\eta_n(G)=\log(1+\delta_n(G))$ satisfies $m_n\eta_n(G)\to 0$ uniformly in probability. Therefore, for $x\in\{0,1\}^{\binom{m_n}{2}}$ we have
\[
\frac{\partial^2}{\partial x_e\,\partial x_{e'}}\log\frac{N(x)}{n}=\log\frac{p_\beta(x_{+e+e'})\,p_\beta(x_{-e-e'})}{p_\beta(x_{+e-e'})\,p_\beta(x_{-e+e'})}+\eta_n(x_{+e+e'})-\eta_n(x_{+e-e'})-\eta_n(x_{-e+e'})+\eta_n(x_{-e-e'}).
\]
By the uniform convergence of $m_n\eta_n(\cdot)$ to 0 in probability, for every $\epsilon>0$ there exists $\delta(n)\to 0$ such that
\[
P\Bigg(\max_{x\in\{0,1\}^{\binom{m_n}{2}}}\ \max_{\substack{e,e'\\ e\ne e'}}\big|\eta_n(x_{+e+e'})-\eta_n(x_{+e-e'})-\eta_n(x_{-e+e'})+\eta_n(x_{-e-e'})\big|>\frac{\epsilon}{m_n}\Bigg)\le 2^{\binom{m_n}{2}}\binom{m_n}{2}^2\delta(n).
\]
Since the set of vertices $\{0,1\}^{\binom{m_n}{2}}$ has cardinality at most $2^{\binom{m_n}{2}}$ and there are at most $\binom{m_n}{2}^2$ pairs $(e,e')$, a union bound over these finite sets yields the stated probability bound. Finally, by the multilinearity of the extension, the same bound applies when taking the supremum over all $x\in[0,1]^{\binom{m_n}{2}}$.
Under the assumption $\binom{m_n}{2}=o(\log n)$, the factor $2^{\binom{m_n}{2}}\binom{m_n}{2}^2$ grows slower than any positive power of $n$, ensuring that the right-hand side tends to zero as $n\to\infty$. This completes the proof.

Lemma C.4 (Controlling the smoothness term). The smoothness term $S$ in formula (B.2), after normalization by $n_v$, is $O(m^{-1/2})$.

Proof. We will invoke Theorem 1.6 from Chatterjee and Dembo [15] (henceforth NLLD). Let $f:[0,1]^{n_v}\to\mathbb R$ be a function that is twice continuously differentiable in $(0,1)^{n_v}$, such that $f$ and its first- and second-order derivatives extend continuously to the boundary. Define the following quantities:
• $a:=\|f\|_\infty$, the supremum norm of the function;
• $b_e:=\|\partial f/\partial x_e\|_\infty$ for any edge $e$;
• $c_{ee'}:=\|\partial^2 f/(\partial x_e\,\partial x_{e'})\|_\infty$ for any edges $e$ and $e'$;
• $I(x):=\sum_e\big(x_e\log x_e+(1-x_e)\log(1-x_e)\big)$, the entropy function for $x\in[0,1]^{\binom m2}$.
Theorem 1.6 of NLLD provides an upper bound for the free energy $F=\log\sum_{x\in\{0,1\}^{n_v}}e^{f(x)}$:
\[
F\le \sup_{x\in[0,1]^{n_v}}\big(f(x)-I(x)\big)+\text{complexity term}+\text{smoothness term}. \tag{C.1}
\]
The smoothness term, which we denote $S_{\mathrm{NLLD}}$, is given by
\[
S_{\mathrm{NLLD}}=\underbrace{\Bigg(4\sum_e\big(ac_{ee}+b_e^2\big)+\frac14\sum_{e,e'}\big(ac_{ee'}^2+b_eb_{e'}c_{ee'}+4b_ec_{ee}\big)\Bigg)^{1/2}}_{(S_1)}+\underbrace{\frac14\Big(\sum_e b_e^2\Big)^{1/2}\Big(\sum_e c_{ee}^2\Big)^{1/2}}_{(S_2)}+\underbrace{3\sum_e c_{ee}+\log 2}_{(S_3)}.
\]
Our term $S$ in formula (B.2) corresponds to this $S_{\mathrm{NLLD}}$. We apply the theorem to the following function from the main text:
\[
f(x)=-\frac{\lambda}{m^{v(H)-2}}H(x)+\log\frac{N(x)}{n}. \tag{C.2}
\]
Here $x=(x_e)_{e\in E_m}$ is the vector of edge indicators for a graph on $m$ vertices, and the number of variables is $n_v=\Theta(m^2)$. The derivative bounds are:
• $a=\|f\|_\infty=O(m^2B)$, where $B=1+C_H|\lambda|+\sum_{k=1}^K|\beta_k|C_k$ is an $O(1)$ constant;
• $b_e=\|\partial f/\partial x_e\|_\infty=O(B)$;
• $c_{ee'}=\|\partial^2 f/(\partial x_e\,\partial x_{e'})\|_\infty=O(B/m)$.
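The $O(m^3)$-versus-$O(m^4)$ split of edge pairs used repeatedly in these estimates can be verified exactly: the number of ordered pairs of distinct edges of $K_m$ sharing a vertex is $2\binom m2(m-2)=\Theta(m^3)$, against $\binom m2^2=\Theta(m^4)$ ordered pairs in total. A quick brute-force confirmation (illustrative only):

```python
import itertools
import math

def pair_counts(m):
    """(ordered pairs of distinct edges sharing a vertex, all ordered edge pairs)."""
    edges = list(itertools.combinations(range(m), 2))
    share = sum(1 for e, f in itertools.product(edges, edges)
                if e != f and set(e) & set(f))
    return share, len(edges) ** 2

for m in (6, 10, 14):
    share, total = pair_counts(m)
    assert share == 2 * math.comb(m, 2) * (m - 2)   # Theta(m^3)
    print(m, share, total)                          # total is Theta(m^4)
```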
The crucial property that generalizes is the sparsity of the second-derivative interactions: the term $c_{ee'}$ is $O(B/m)$ if the edges $e$ and $e'$ share a vertex, which happens for at most $O(m^3)$ pairs, and it is $O(B/m^2)$ if they do not share a vertex, which happens for at most $O(m^4)$ pairs. This sparsity is the key to the final scaling.

Calculation of the Smoothness-Term Components. We now calculate each component of $S_{\mathrm{NLLD}}$ using these scalings. The indices $e,e'$ run over the $n_v=O(m^2)$ edges.

Component $S_1$: this is the dominant term. The sum under the square root has several parts:
• $\sum_e(ac_{ee}+b_e^2)$: each term is $O(m^2B\cdot B/m)+O(B^2)=O(mB^2)$. The sum over $n_v$ terms is $O(n_v\cdot mB^2)=O(m^3B^2)$.
• $\sum_{e,e'}ac_{ee'}^2$: the vertex-sharing pairs $(e,e')$ contribute $O(m^3)$ terms, each of size $O(m^2B\cdot(B/m)^2)=O(B^3)$; the remaining pairs contribute $O(m^4)$ terms, each of size $O(m^2B\cdot(B/m^2)^2)=O(B^3/m^2)$. The sum is $O(m^3\cdot B^3)+O(m^4\cdot B^3/m^2)=O(m^3B^3)$.
• $\sum_{e,e'}b_eb_{e'}c_{ee'}$: the vertex-sharing pairs contribute $O(m^3)$ terms, each $O(B\cdot B\cdot B/m)=O(B^3/m)$; the remaining pairs contribute $O(m^4)$ terms, each $O(B\cdot B\cdot B/m^2)=O(B^3/m^2)$. The sum is $O(m^3\cdot B^3/m)+O(m^4\cdot B^3/m^2)=O(m^2B^3)$.
• $\sum_{e,e'}4b_ec_{ee}=n_v\sum_e 4b_ec_{ee}$: the inner sum is $O(n_v\cdot B\cdot B/m)=O(m^2\cdot B^2/m)=O(mB^2)$, so the total is $O(m^2\cdot mB^2)=O(m^3B^2)$.
The sum under the square root is therefore $O(m^3B^2+m^3B^3+m^2B^3)=O(m^3B^3)$, assuming $B\ge 1$. Thus the entire $S_1$ component is $O((m^3B^3)^{1/2})=O(m^{3/2}B^{3/2})$.

Component $S_2$: $\frac14\big(\sum_e b_e^2\big)^{1/2}\big(\sum_e c_{ee}^2\big)^{1/2}$.
• The first square root is $(O(n_vB^2))^{1/2}=(O(m^2B^2))^{1/2}=O(mB)$.
• The second square root is $(O(n_v(B/m)^2))^{1/2}=(O(m^2\cdot B^2/m^2))^{1/2}=O(B)$.
Thus component $S_2$ is $O(mB\cdot B)=O(mB^2)$.

Component $S_3$: $3\sum_e c_{ee}+\log 2$. The sum has $n_v$ terms, each of size $O(B/m)$, so it is $O(n_v\cdot B/m)=O(m^2\cdot B/m)=O(mB)$.

Total smoothness term and normalization. Combining the components gives
\[
S_{\mathrm{NLLD}}=O(m^{3/2}B^{3/2})+O(mB^2)+O(mB).
\]
For large $m$ and $B\ge 1$, the dominant term is $S_1=O(m^{3/2}B^{3/2})$. The main text normalizes the log-partition function by $m^2$, which corresponds to normalizing error terms by $m^2$; the relevant normalization factor for the smoothness term from NLLD is the number of variables, $n_v=\binom m2=O(m^2)$. Hence
\[
\frac{S_{\mathrm{NLLD}}}{n_v}=\frac{O(m^{3/2}B^{3/2})}{O(m^2)}=O(m^{-1/2}B^{3/2}).
\]
Since $B$ is an $O(1)$ constant with respect to $m$, we arrive at the final scaling $S_{\mathrm{NLLD}}/n_v=O(m^{-1/2})$.

Proof of Lemma 3.2. Let $h_0>0$ be a constant and define $\mu(\lambda)=h_0e^\lambda$. Let $Z_{\mu(\lambda)}\sim\mathrm{Poisson}(\mu(\lambda))$. Define the deterministic functions
\[
f(\lambda):=E\big[(Z_{\mu(\lambda)}-h_0)^2e^{-\lambda(Z_{\mu(\lambda)}-h_0)}\big],\qquad
g(\lambda):=\mathrm{Var}\big((Z_{\mu(\lambda)}-h_0)e^{-\lambda(Z_{\mu(\lambda)}-h_0)}\big).
\]
As a first step, we prove that the functions $f(\lambda)$ and $g(\lambda)$ are continuous.

Continuity of $f(\lambda)$. We may rewrite
\[
f(\lambda)=e^{\lambda h_0}E\big[(Z-h_0)^2e^{-\lambda Z}\big],\qquad Z\sim\mathrm{Poisson}(\mu(\lambda)).
\]
Using the Poisson moment generating function, we want to compute
\[
E[(Z-h_0)^2e^{-\lambda Z}]=E[Z^2e^{-\lambda Z}]-2h_0E[Ze^{-\lambda Z}]+h_0^2E[e^{-\lambda Z}].
\]
For a $\mathrm{Poisson}(\mu)$ random variable we have $E[e^{-\lambda Z}]=\exp\big(\mu(e^{-\lambda}-1)\big)$. Differentiating with respect to $\lambda$ yields
\[
E[Ze^{-\lambda Z}]=\mu e^{-\lambda}\exp\big(\mu(e^{-\lambda}-1)\big),
\]
and a second differentiation gives
\[
E[Z^2e^{-\lambda Z}]=\big(\mu e^{-\lambda}+\mu^2e^{-2\lambda}\big)\exp\big(\mu(e^{-\lambda}-1)\big).
\]
Substituting these expressions (with $\mu=\mu(\lambda)$) into the previous expansion, we obtain
\[
E[(Z-h_0)^2e^{-\lambda Z}]=\exp\big(\mu(\lambda)(e^{-\lambda}-1)\big)\Big[\mu(\lambda)e^{-\lambda}+\mu(\lambda)^2e^{-2\lambda}-2h_0\mu(\lambda)e^{-\lambda}+h_0^2\Big].
\]
Consequently,
\[
f(\lambda)=e^{\lambda h_0}\exp\big(\mu(\lambda)(e^{-\lambda}-1)\big)\Big[\mu(\lambda)e^{-\lambda}+\mu(\lambda)^2e^{-2\lambda}-2h_0\mu(\lambda)e^{-\lambda}+h_0^2\Big].
\]
Finally, since the map $\lambda\mapsto\mu(\lambda)=h_0e^\lambda$ is continuous, it is easy to see that $f(\lambda)$ is continuous in $\lambda$.

Continuity of $g(\lambda)$. Defining $Y_\lambda:=(Z_{\mu(\lambda)}-h_0)e^{-\lambda(Z_{\mu(\lambda)}-h_0)}$, we will show that the map $\lambda\mapsto\mathrm{Var}(Y_\lambda)$ is continuous. It suffices to prove continuity of $\lambda\mapsto E[Y_\lambda]$ and $\lambda\mapsto E[Y_\lambda^2]$. We have
\[
E[Y_\lambda]=E\big[(Z-h_0)e^{-\lambda(Z-h_0)}\big]=e^{\lambda h_0}E\big[(Z-h_0)e^{-\lambda Z}\big],\qquad Z\sim\mathrm{Poisson}(\mu(\lambda)).
\]
Expanding gives $E[(Z-h_0)e^{-\lambda Z}]=E[Ze^{-\lambda Z}]-h_0E[e^{-\lambda Z}]$. Using
\[
E[e^{-\lambda Z}]=\exp\big(\mu(e^{-\lambda}-1)\big),\qquad
E[Ze^{-\lambda Z}]=\mu e^{-\lambda}\exp\big(\mu(e^{-\lambda}-1)\big),
\]
one obtains
\[
E[Y_\lambda]=e^{\lambda h_0}\exp\big(\mu(\lambda)(e^{-\lambda}-1)\big)\big(\mu(\lambda)e^{-\lambda}-h_0\big).
\]
It is immediate that $\lambda\mapsto E[Y_\lambda]$ is continuous. Next, note that
\[
Y_\lambda^2=(Z-h_0)^2e^{-2\lambda(Z-h_0)}=e^{2\lambda h_0}(Z-h_0)^2e^{-2\lambda Z},
\]
so that
\[
E[Y_\lambda^2]=e^{2\lambda h_0}\big(E[Z^2e^{-2\lambda Z}]-2h_0E[Ze^{-2\lambda Z}]+h_0^2E[e^{-2\lambda Z}]\big).
\]
Using the Poisson moment generating function with $t=-2\lambda$ gives
\[
E[e^{-2\lambda Z}]=\exp\big(\mu(e^{-2\lambda}-1)\big),\qquad
E[Ze^{-2\lambda Z}]=\mu e^{-2\lambda}\exp\big(\mu(e^{-2\lambda}-1)\big),\qquad
E[Z^2e^{-2\lambda Z}]=\big(\mu e^{-2\lambda}+\mu^2e^{-4\lambda}\big)\exp\big(\mu(e^{-2\lambda}-1)\big).
\]
Therefore, with $\mu=\mu(\lambda)$,
\[
E[Y_\lambda^2]=e^{2\lambda h_0}\exp\big(\mu(\lambda)(e^{-2\lambda}-1)\big)\Big[\mu(\lambda)e^{-2\lambda}+\mu(\lambda)^2e^{-4\lambda}-2h_0\mu(\lambda)e^{-2\lambda}+h_0^2\Big].
\]
It follows that $\lambda\mapsto E[Y_\lambda^2]$ is continuous. Since $\lambda\mapsto E[Y_\lambda]$ and $\lambda\mapsto E[Y_\lambda^2]$ are continuous, $\lambda\mapsto\mathrm{Var}(Y_\lambda)=E[Y_\lambda^2]-E[Y_\lambda]^2$ is continuous.
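The three Poisson identities used above are easy to verify numerically against direct summation of the Poisson series; a minimal check (ours, with arbitrary illustrative values of $\mu$ and $\lambda$):

```python
import math

def poisson_mean(mu, g, kmax=200):
    """E[g(Z)] for Z ~ Poisson(mu), by direct summation of the series."""
    total, p = 0.0, math.exp(-mu)
    for k in range(kmax):
        total += g(k) * p
        p *= mu / (k + 1)
    return total

mu, lam = 1.7, 0.4
mgf = math.exp(mu * (math.exp(-lam) - 1.0))
pairs = [
    # (closed form, direct series) for E[e^{-lZ}], E[Z e^{-lZ}], E[Z^2 e^{-lZ}]
    (mgf, poisson_mean(mu, lambda k: math.exp(-lam * k))),
    (mu * math.exp(-lam) * mgf, poisson_mean(mu, lambda k: k * math.exp(-lam * k))),
    ((mu * math.exp(-lam) + (mu * math.exp(-lam)) ** 2) * mgf,
     poisson_mean(mu, lambda k: k * k * math.exp(-lam * k))),
]
for closed, direct in pairs:
    print(abs(closed - direct) < 1e-9)   # True for each identity
```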
Convergence of the estimators of $f(\lambda)$ and $g(\lambda)$. Note that, conditionally on $\hat\lambda$, the quantity $\hat\mu=\mu(\hat\lambda)$ is fixed, and $Z_{\hat\mu}\mid\hat\lambda\sim\mathrm{Poisson}(\hat\mu)$ by definition. Therefore,
\[
E\big[(Z_{\hat\mu}-h_0)^2e^{-\hat\lambda(Z_{\hat\mu}-h_0)}\mid\hat\lambda\big]=f(\hat\lambda),
\]
and similarly
\[
\mathrm{Var}\big[(Z_{\hat\mu}-h_0)e^{-\hat\lambda(Z_{\hat\mu}-h_0)}\mid\hat\lambda\big]=g(\hat\lambda).
\]
Since $\hat\lambda\xrightarrow{P}\lambda^\circ$ and $f$ is continuous, the continuous mapping theorem yields
\[
f(\hat\lambda)\xrightarrow{P}f(\lambda^\circ)=E\big[(Z_\mu-h_0)^2e^{-\lambda^\circ(Z_\mu-h_0)}\big],
\]
where $\mu=\mu(\lambda^\circ)$. Likewise, continuity of $g$ implies
\[
g(\hat\lambda)\xrightarrow{P}g(\lambda^\circ)=\mathrm{Var}\big((Z_\mu-h_0)e^{-\lambda^\circ(Z_\mu-h_0)}\big).
\]

Proof of Lemma A.1. Let $\Delta_n:=(\hat\theta_1-\hat\theta_2)-(\theta_1-\theta_2)$. Define
\[
U_n:=a_{n_1}(\hat\theta_1-\theta_1),\qquad V_n:=b_{n_2}(\hat\theta_2-\theta_2).
\]
By assumption, $U_n\xrightarrow{d}N(0,\sigma_1^2)$ and $V_n\xrightarrow{d}N(0,\sigma_2^2)$. Since the samples are independent, $U_n$ and $V_n$ are independent for each $n$. Consider the linear combination
\[
a_{n_1}\Delta_n=U_n-r_nV_n,\qquad r_n=\frac{a_{n_1}}{b_{n_2}}.
\]
Let $\phi_{U_n}(t)=E[e^{itU_n}]$ and $\phi_{V_n}(t)=E[e^{itV_n}]$ denote the characteristic functions. Independence gives
\[
\phi_{a_{n_1}\Delta_n}(t)=E[e^{it(U_n-r_nV_n)}]=\phi_{U_n}(t)\,\phi_{V_n}(-r_nt).
\]
By the marginal central limit theorems and the convergence $r_n\to\rho$,
\[
\phi_{U_n}(t)\to e^{-\frac12\sigma_1^2t^2},\qquad \phi_{V_n}(-r_nt)\to e^{-\frac12\sigma_2^2\rho^2t^2}.
\]
Hence
\[
\phi_{a_{n_1}\Delta_n}(t)\to\exp\Big(-\frac12\big(\sigma_1^2+\rho^2\sigma_2^2\big)t^2\Big),
\]
which is the characteristic function of $N(0,\sigma_1^2+\rho^2\sigma_2^2)$. By Lévy's continuity theorem,
\[
a_{n_1}\Delta_n\xrightarrow{d}N\big(0,\sigma_1^2+\rho^2\sigma_2^2\big).
\]
Equivalently, letting $V_n^*:=\frac{\sigma_1^2}{a_{n_1}^2}+\frac{\sigma_2^2}{b_{n_2}^2}$, we have $\frac{\Delta_n}{\sqrt{V_n^*}}\xrightarrow{d}N(0,1)$.

Proof of Lemma B.2. For a finite partition $\mathcal P$ of $[0,1]$, let $\mathcal A=\sigma(\mathcal P\times\mathcal P)$ be the sigma-algebra generated by $\mathcal P\times\mathcal P$. Given $h\in\mathcal W$, consider $h_{\mathcal A}=E[h\mid\mathcal A]$, where the randomness is with respect to the Lebesgue measure on $[0,1]^2$. Since $I$ is convex, Jensen's inequality yields
\[
E\big[I(h)\mid\mathcal A\big]\ge I\big(E[h\mid\mathcal A]\big)=I(h_{\mathcal A}).
\]
Integrating both sides gives $I(h)\ge I(h_{\mathcal A})$. For any finite simple graph $H$, the map $h\mapsto t(H,h)$ is continuous with respect to the cut metric. By the Frieze–Kannan weak regularity lemma, we can find a sequence of finite partitions $\mathcal P_n$ such that the corresponding sigma-algebras $\mathcal A_n$ have the property that $h_{\mathcal A_n}\to h$ in the cut metric. Putting these together, we get
\[
F(h_{\mathcal A_n})=\sum_{k=1}^K\beta_k t(T_k,h_{\mathcal A_n})-\frac12 I(h_{\mathcal A_n})\ \ge\ \sum_{k=1}^K\beta_k t(T_k,h_{\mathcal A_n})-\frac12 I(h).
\]
Hence $\limsup_n F(h_{\mathcal A_n})\ge F(h)$, so that $F(h)$ is upper bounded by the supremum over step functions; since $\mathcal S\subset\mathcal W$, the reverse inequality is immediate, and the two suprema coincide.

D Existence and Uniqueness of Roots

D.1 Conditions for existence and uniqueness of roots

We begin by stating the two lemmas required to establish the existence and uniqueness of the root for the moment-generating-function equations arising in the maximum entropy framework.

Lemma D.1. Let $X$ be a real-valued random variable such that $P(X\ne 0)>0$, and let its Laplace transform exist in an open interval $I$. Define the function $f(\lambda)=E[Xe^{-\lambda X}]$ on $I$. Then there exists at most one $\lambda^*\in I$ such that $f(\lambda^*)=0$. Furthermore, if such a $\lambda^*$ exists, it is the unique minimizer of the convex function $M(\lambda)=E[e^{-\lambda X}]$ on $I$.

Proof. Under the assumption that the Laplace transform exists on $I$, we can differentiate under the expectation:
\[
f'(\lambda)=\frac{d}{d\lambda}E[Xe^{-\lambda X}]=-E[X^2e^{-\lambda X}].
\]
Since $P(X\ne 0)>0$, the term $X^2e^{-\lambda X}$ is strictly positive with non-zero probability. Therefore $E[X^2e^{-\lambda X}]>0$, which implies $f'(\lambda)<0$ for all $\lambda\in I$. Hence $f$ is strictly decreasing and has at most one root. Since $M'(\lambda)=-f(\lambda)$, this means that if such a $\lambda^*$ exists, it must be the unique minimizer of the convex function $M$.

Lemma D.2. Let $X$ be a discrete random variable with finite support $\mathcal X=\{x_1,\dots,x_k\}$. Define $f(\lambda)=E[Xe^{-\lambda X}]$.
A solution $\lambda^*\in\mathbb R$ to $f(\lambda^*)=0$ exists if and only if
• $\min(\mathcal X)<0<\max(\mathcal X)$, or
• $P(X=0)=1$ (i.e., $X$ is the zero constant).

Proof. We first prove sufficiency.

Case 1: $P(X=0)=1$. If $X\equiv 0$, then $f(\lambda)=E[0\cdot e^{-\lambda\cdot 0}]=0$ for all $\lambda\in\mathbb R$. Thus any $\lambda^*\in\mathbb R$ is a root.

Case 2: $\min(\mathcal X)<0<\max(\mathcal X)$. Let $x_{\min}=\min(\mathcal X)$ and $x_{\max}=\max(\mathcal X)$; by assumption, $x_{\min}<0$ and $x_{\max}>0$. The function $f(\lambda)=\sum_{x\in\mathcal X}P(X=x)\,xe^{-\lambda x}$ is a finite sum of continuous functions, hence $f$ is continuous on $\mathbb R$. As $\lambda\to\infty$, we factor out the term with the smallest exponent:
\[
f(\lambda)=e^{-\lambda x_{\min}}\Big[P(X=x_{\min})x_{\min}+\sum_{x>x_{\min}}P(X=x)\,xe^{-\lambda(x-x_{\min})}\Big].
\]
Since $x-x_{\min}>0$ for all terms in the summation, $e^{-\lambda(x-x_{\min})}\to 0$ as $\lambda\to\infty$. Because $x_{\min}<0$ and $P(X=x_{\min})>0$, the leading term inside the bracket is strictly negative. Thus $\lim_{\lambda\to\infty}f(\lambda)=-\infty$. As $\lambda\to-\infty$, we factor out the term with the largest exponent:
\[
f(\lambda)=e^{-\lambda x_{\max}}\Big[P(X=x_{\max})x_{\max}+\sum_{x<x_{\max}}P(X=x)\,xe^{-\lambda(x-x_{\max})}\Big].
\]
Since $x-x_{\max}<0$ for all terms in the summation, $e^{-\lambda(x-x_{\max})}\to 0$ as $\lambda\to-\infty$. Because $x_{\max}>0$ and $P(X=x_{\max})>0$, the leading term is strictly positive. Thus $\lim_{\lambda\to-\infty}f(\lambda)=\infty$. By the intermediate value theorem, since $f$ is continuous and takes both positive and negative values on $\mathbb R$, there exists at least one $\lambda^*\in\mathbb R$ such that $f(\lambda^*)=0$.

We now prove necessity. Suppose $X$ is not almost surely zero. If $\min(\mathcal X)\ge 0$, then every $x\in\mathcal X$ is non-negative and at least one $x>0$. Then $xe^{-\lambda x}\ge 0$ for all $x,\lambda$, and $E[Xe^{-\lambda X}]>0$, so no root exists. Similarly, if $\max(\mathcal X)\le 0$, then $xe^{-\lambda x}\le 0$ for all $x,\lambda$, with at least one $x<0$; thus $E[Xe^{-\lambda X}]<0$, and no root exists. Therefore, the support must strictly straddle zero for a root to exist for non-constant $X$.

D.2 Existence and uniqueness of roots in network models

Let $G_{n,1},\dots,G_{n,n}$ be $n$ independent and identically distributed realizations of random graphs on $m_n$ vertices.
Let $h_{n,i} = T(H, G_{n,i}) - h_0$ denote the observed statistic, where $h_0$ is chosen such that $E[h_{n,i}] = 0$ under the null hypothesis. We define an auxiliary discrete random variable $Y_n$ representing the empirical distribution of the subgraph counts. $Y_n$ takes values in the multiset $\mathcal{H}_n = \{h_{n,1}, \dots, h_{n,n}\}$ with uniform probability:
$$P(Y_n = h_{n,i}) = \frac{1}{n}, \qquad i = 1, \dots, n.$$
The estimating equation for the Lagrange multiplier $\hat{\lambda}_n$ takes different forms depending on the regime's normalization, but all can be cast in the form of an expectation over $Y_n$.

D.2.1 Verification of Conditions

To invoke Lemma D.1 and Lemma D.2, we must verify that the support of $Y_n$ strictly straddles zero with high probability as $n \to \infty$. That is, we define the event
$$E_n := \left\{ \min_{1 \le i \le n} h_{n,i} < 0 < \max_{1 \le i \le n} h_{n,i} \right\}.$$
We justify that $\lim_{n \to \infty} P(E_n) = 1$ under any hypothesis where $h_0$ is in the interior of the support of $H(G)$.

Regime 1: Fixed Number of Vertices ($m$ fixed). In this regime, the sample space of graphs is independent of $n$. The probability mass on either side of zero is a strictly positive constant:
$$\pi_+ := P(h_{n,1} > 0) > 0, \qquad \pi_- := P(h_{n,1} < 0) > 0.$$
The probability that the sample fails to straddle zero is bounded by $(1 - \pi_-)^n + (1 - \pi_+)^n$, which vanishes as $n \to \infty$. Thus $P(E_n) \to 1$.

Regime 2: Sparse Regime ($m_n \to \infty$). Here, the uncentered subgraph count $H(G_{n,i})$ converges in distribution to a Poisson random variable $Z$ with mean $c^{v(H)} / |\mathrm{Aut}(H)|$, and $h_0 = c_0^{v(H)} / |\mathrm{Aut}(H)|$. The variable $h_{n,i}$ behaves asymptotically as $Z - h_0$. Since a Poisson distribution with parameter $c^{v(H)} / |\mathrm{Aut}(H)|$ is non-degenerate and supported on the integers, it assigns positive probability to values strictly less than $h_0$ and strictly greater than $h_0$:
$$\lim_{n \to \infty} P(h_{n,1} > 0) = P(Z > h_0) > 0, \qquad \lim_{n \to \infty} P(h_{n,1} < 0) = P(Z < h_0) > 0.$$
Following the same logic as the fixed case, $P(E_n) \to 1$.

Regime 3: Dense Regime ($m_n \to \infty$). In the dense regime (e.g., Erdős–Rényi with constant $p$) we restrict ourselves to the subcritical case of the ferromagnetic ERGM. The central limit theorem for subgraph counts states that the standardized variable converges to a normal distribution:
$$\frac{H(G_{n,i}) - E[H(G_{n,i})]}{\sqrt{\mathrm{Var}(H(G_{n,i}))}} \xrightarrow{d} N(0, 1).$$
The limiting normal distribution has support on the entire real line. Hence, the limiting probabilities of $h_{n,i} = H(G_{n,i}) - h_0$ being positive or negative are strictly positive:
$$\lim_{n \to \infty} P(h_{n,1} > 0) > 0, \qquad \lim_{n \to \infty} P(h_{n,1} < 0) > 0.$$
Therefore, the probability that the maximum is positive and the minimum is negative approaches 1 as $n \to \infty$.

D.2.2 Proof of Lemma 3.3

Proof. We now apply the lemmas to prove existence and uniqueness.

1. Estimating equations:
• Fixed/Sparse regime: The equation is $\sum_{i=1}^{n} h_{n,i} e^{-\hat{\lambda}_n h_{n,i}} = 0$. This is equivalent to $E[Y_n e^{-\hat{\lambda}_n Y_n}] = 0$.
• Dense regime: The estimator $\hat{\lambda}_n$ is the critical point of $\frac{1}{n} \sum_{i=1}^{n} \exp\left( -\lambda \frac{m^{v(H)-2}}{n} h_{n,i} \right)$. Differentiating with respect to $\lambda$, the defining equation is
$$\sum_{i=1}^{n} h_{n,i} \exp\left( -\hat{\lambda}_n \frac{m^{v(H)-2}}{n} h_{n,i} \right) = 0.$$
By defining a scaled parameter $\tilde{\lambda} = \hat{\lambda}_n \frac{m^{v(H)-2}}{n}$, this reduces to the form $\sum_i h_{n,i} e^{-\tilde{\lambda} h_{n,i}} = 0$, which is equivalent to $E[Y_n e^{-\tilde{\lambda} Y_n}] = 0$.

2. Existence of root: Assume the event $E_n$ holds. The support of the empirical variable $Y_n$ is $\mathcal{H}_n$. By definition of $E_n$, we have $\min(\mathcal{H}_n) < 0 < \max(\mathcal{H}_n)$. By Lemma D.2, this condition is necessary and sufficient for the existence of a real root of the equation $E[Y_n e^{-\lambda Y_n}] = 0$ (and consequently for the scaled equation in the dense regime). Since $\lim_{n \to \infty} P(E_n) = 1$, the root exists with probability approaching 1.

3. Uniqueness of root: Consider the function $f(\lambda) = \sum_{i=1}^{n} h_{n,i} e^{-\lambda h_{n,i}}$.
Differentiating with respect to $\lambda$:
$$f'(\lambda) = -\sum_{i=1}^{n} h_{n,i}^2 e^{-\lambda h_{n,i}}.$$
Conditioned on $E_n$, the values $h_{n,i}$ are not all zero. Therefore $h_{n,i}^2 > 0$ for at least some indices, implying $f'(\lambda) < 0$ for all $\lambda \in \mathbb{R}$; the function is strictly decreasing. By Lemma D.1, if a root exists, it must be unique.

Combining the asymptotic probability of $E_n$ with the deterministic results of Lemmas D.1 and D.2 conditioned on $E_n$: with probability approaching 1 as $n \to \infty$, the Lagrange multiplier $\hat{\lambda}_n$ exists and is the unique solution to the estimating equation. $\square$
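As a numerical sanity check on the argument above, the following Python sketch simulates i.i.d. centered statistics $h_{n,i}$, verifies the straddling event $E_n$, and solves the estimating equation $\sum_i h_{n,i} e^{-\lambda h_{n,i}} = 0$ by bisection, which is legitimate precisely because $f$ is continuous and strictly decreasing on $E_n$. This is an illustrative toy, not the paper's experimental code: the graph size $m$, edge probability $p$, sample size $n$, and the choice of $H$ as a single edge (so that $T(H, G)$ is the edge count) are arbitrary assumptions made here for concreteness.

```python
import math
import random

random.seed(0)
m, p, n = 8, 0.5, 500        # toy choices: fixed graph size, G(m, p) null, n samples
h0 = p * m * (m - 1) / 2     # E[edge count] under the null, so E[h_{n,i}] = 0

def sample_h():
    """One centered statistic h_{n,i}: edge count of G(m, p) minus its null mean."""
    edges = sum(1 for i in range(m) for j in range(i + 1, m) if random.random() < p)
    return edges - h0

h = [sample_h() for _ in range(n)]

# Event E_n: the empirical support straddles zero (Regime 1: holds w.p. -> 1).
assert min(h) < 0 < max(h)

def f(lam):
    """Estimating function f(lam) = sum_i h_i * exp(-lam * h_i)."""
    return sum(x * math.exp(-lam * x) for x in h)

# On E_n, f(-inf) = +inf and f(+inf) = -inf (Lemma D.2), and f' < 0 everywhere
# (Lemma D.1), so bisection on a sign-changing bracket finds the unique root.
lo, hi = -5.0, 5.0
assert f(lo) > 0 > f(hi)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
lam_hat = 0.5 * (lo + hi)    # fixed/sparse-regime Lagrange multiplier

# Dense regime: the bisection solves for the scaled root lam_tilde, and dividing
# by s = m**(v(H) - 2) / n recovers the estimator. For H an edge, v(H) = 2, s = 1/n.
s = 1.0 / n
lam_hat_dense = lam_hat / s
```

The design point is that no derivative information is needed: strict monotonicity of $f$ (Lemma D.1) means any sign-changing bracket isolates the root, and the dense-regime estimator differs from the fixed/sparse one only by the deterministic rescaling $m^{v(H)-2}/n$.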