Sparse Maximum-Entropy Random Graphs with a Given Power-Law Degree Distribution

Pim van der Hoorn^1, Gabor Lippner^2, and Dmitri Krioukov^{1,2,3}

^1 Northeastern University, Department of Physics
^2 Northeastern University, Department of Mathematics
^3 Northeastern University, Departments of Electrical & Computer Engineering

October 12, 2017

Abstract

Even though power-law or close-to-power-law degree distributions are ubiquitously observed in a great variety of large real networks, the mathematically satisfactory treatment of random power-law graphs satisfying basic statistical requirements of realism is still lacking. These requirements are: sparsity, exchangeability, projectivity, and unbiasedness. The last requirement states that entropy of the graph ensemble must be maximized under the degree distribution constraints. Here we prove that the hypersoft configuration model (HSCM), belonging to the class of random graphs with latent hyperparameters, also known as inhomogeneous random graphs or W-random graphs, is an ensemble of random power-law graphs that are sparse, unbiased, and either exchangeable or projective. The proof of their unbiasedness relies on generalized graphons, and on mapping the problem of maximization of the normalized Gibbs entropy of a random graph ensemble to the graphon entropy maximization problem, showing that the two entropies converge to each other in the large-graph limit.

Keywords: Sparse random graphs, Power-law degree distributions, Maximum-entropy graphs
PACS: 89.75.Hc, 89.75.Fb, 89.70.Cf
MSC: 05C80, 05C82, 54C70

Contents

1 Introduction
  1.1 Hypersoft configuration model (HSCM)
  1.2 Properties of the HSCM
  1.3 Unbiasedness and the maximum-entropy requirement
  1.4 Main results
  1.5 Exchangeability and projectivity
  1.6 Other remarks
  1.7 Paper organization
2 Background information and definitions
  2.1 Graph ensembles and their entropy
    2.1.1 Maximum-entropy graphs with a given degree sequence (CM)
    2.1.2 Maximum-entropy graphs with a given expected degree sequence (SCM)
  2.2 Sparse graphs with a given degree distribution
  2.3 Maximum-entropy graphs with hypersoft constraints
    2.3.1 Graphon-based graph ensembles
    2.3.2 Bernoulli and graphon entropies
    2.3.3 Dense maximum-entropy graphs with a given degree distribution
    2.3.4 Rescaled graphon entropy of sparse graphs
    2.3.5 Sparse maximum-entropy graphs with a given degree distribution
    2.3.6 Sparse power-law hypersoft configuration model (sparse HSCM)
3 Results
  3.1 Main result
  3.2 The limit of the degree distribution in the HSCM
  3.3 The limit of the expected average degree in the HSCM
  3.4 HSCM maximizes graphon entropy
  3.5 Graphon entropy scaling and convergence
  3.6 Gibbs entropy scaling and convergence
4 Proofs
  4.1 The classical limit approximation of the Fermi-Dirac graphon
  4.2 Proofs for node degrees in the HSCM
    4.2.1 Technical results on Poisson couplings and concentrations
    4.2.2 Proof of Theorem 3.2
    4.2.3 Proof of Theorem 3.3
  4.3 Proofs for graphon entropy
    4.3.1 Proof of Proposition 3.4
    4.3.2 Proof of Theorem 3.5
  4.4 Proof of Theorem 3.6
    4.4.1 Averaging W by a partition of A_n
    4.4.2 Constructing the partition

1 Introduction

Random graphs have been used extensively to model a variety of real networks. Many of these networks, ranging from the Internet and social networks to the brain and the universe, have broad degree distributions, often closely following power laws [1-3], which the simplest random graph model, the Erdős-Rényi random graph [4-6] with its Poisson degree distribution, does not reproduce. To resolve this disconnect, several alternative models have been proposed and studied.

The first one is the configuration model (CM), random graphs with a given degree sequence [7, 8]. This model is a microcanonical ensemble of random graphs. Every graph in the ensemble has the same fixed degree sequence, e.g., the one observed in a snapshot of a real network, and every such graph is equiprobable in the ensemble. The ensemble thus maximizes Gibbs entropy subject to the constraint that the degree sequence is fixed.
Yet given a real network snapshot, one cannot usually trust its degree sequence as some "ultimate truth" for a variety of reasons, including measurement imperfections, inaccuracies and incompleteness, noise and stochasticity, and, most importantly, the fact that most real networks are dynamic at both short and long time scales, often growing by orders of magnitude over years [9, 10, 1-3].

These factors partly motivated the development of the soft configuration model (SCM), random graphs with a given expected degree sequence, first considered in [11, 12], and later corrected in [13-16], where it was shown that this correction yields a canonical ensemble of random graphs that maximizes Gibbs entropy under the constraint that the expected degree sequence is fixed. In statistics, canonical ensembles of random graphs are known as exponential random graphs (ERGs) [17]. In [18, 19] it was shown that the sparse CM and SCM are not equivalent, although they are equivalent in the case of dense graphs [20].

Yet the SCM still treats a given degree sequence as a fixed constraint, albeit not as a sharp but as a soft constraint. This constraint is in stark contrast with the reality of many growing real networks, in which the degrees of all nodes constantly change, yet the shape of the degree distribution and the average degree do not, staying essentially constant in networks that grow in size even by orders of magnitude [9, 10, 1-3]. These observations motivated the development of the hypersoft configuration model [21-24].

1.1 Hypersoft configuration model (HSCM)

In the HSCM neither degrees nor even their expected values are fixed. Instead the fixed properties are the degree distribution and the average degree.
The HSCM with a given average degree and power-law degree distribution is defined by the exponential measure µ on the real line R,

  µ = e^{αx}, x ∈ R,

where α > 1 is a constant, and by the Fermi-Dirac graphon W : R² → [0, 1],

  W(x, y) = 1 / (e^{x+y} + 1).  (1)

The measure µ then defines the probability measures

  µ_n = µ|_{A_n} / µ(A_n) = α e^{α(x−R_n)}

on the intervals A_n = (−∞, R_n], where

  R_n = (1/2) log(n / (β²ν)), β = 1 − 1/α,  (2)

and ν > 0 is another constant. The constants α > 1 and ν > 0 are the only two parameters of the model. The HSCM random graphs of size n are defined by (W, A_n, µ_n) via sampling n i.i.d. points x_1, ..., x_n on A_n according to the measure µ_n, and then connecting each pair of points i and j at sampled locations x_i and x_j by an edge with probability W(x_i, x_j).

An alternative equivalent definition is obtained by mapping (W, A_n, µ_n) to (W_{I,n}, I, µ_I), where I = [0, 1], µ_I = 1, and

  W_{I,n}(x, y) = 1 / ( (n/(β²ν)) (xy)^{1/α} + 1 ), x, y ∈ I.  (3)

In this definition, the x_i are n i.i.d. random variables uniformly distributed on the unit interval [0, 1], and vertices i and j are connected with probability W_{I,n}(x_i, x_j).

Yet another equivalent definition, perhaps the most familiar and most frequently used one, is given by (W_{P,n}, P, µ_P), where the interval P = [βν, ∞), and the measure µ_P on P is the Pareto distribution

  µ_P = α (βν)^α x^{−γ}, x ∈ P, with γ = α + 1,
  W_{P,n}(x, y) = 1 / ( νn/(xy) + 1 ), x, y ∈ P.

In this Pareto representation, the expected degree of a vertex at coordinate x is proportional to x [15, 16].

Compared to the SCM, where only edges are random variables while the expected degrees are fixed, the HSCM introduces another source of randomness (and hence entropy) coming from expected degrees that are also random variables. One obtains a particular realization of an SCM from the HSCM by sampling the x_i from their fixed distribution and then freezing them.
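To make the equilibrium definition concrete, here is a minimal Python sketch of HSCM sampling (our illustration, not code from the paper; the function name, its parameters, and the seed are ours). It draws the coordinates x_i on A_n by inverse-CDF sampling and links pairs with the Fermi-Dirac probability (1):

```python
import math
import random

def sample_hscm(n, alpha, nu, seed=0):
    """Sample one HSCM graph of size n; returns (coordinates, edge set).

    Coordinates x_i are i.i.d. on A_n = (-inf, R_n] with density
    alpha * exp(alpha * (x - R_n)); each pair {i, j} is then linked
    independently with the Fermi-Dirac probability 1/(exp(x_i + x_j) + 1).
    """
    rnd = random.Random(seed)
    beta = 1.0 - 1.0 / alpha
    R = 0.5 * math.log(n / (beta * beta * nu))
    # Inverse-CDF sampling: the CDF on A_n is F(x) = exp(alpha * (x - R)),
    # so x = R + log(u) / alpha with u ~ Uniform(0, 1].
    xs = [R + math.log(rnd.random() or 1e-300) / alpha for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rnd.random() < 1.0 / (math.exp(xs[i] + xs[j]) + 1.0):
                edges.add((i, j))
    return xs, edges
```

With α = 2 and ν = 10, graphs of a few thousand nodes already have an empirical average degree close to ν, consistent with Fig. 1.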
Therefore the HSCM is a probabilistic mixture of canonical ensembles, SCM ERGs, so that one may call the HSCM a hypercanonical ensemble, given that the latent variables x in the HSCM are called hyperparameters in statistics [25].

1.2 Properties of the HSCM

We prove in Theorem 3.2 that the distribution of degrees D in the power-law HSCM ensemble defined above converges (henceforth, convergence always means the n → ∞ limit, unless mentioned otherwise) to

  P(D = k) = α (βν)^α Γ(k − α, βν) / k! = α (βν)^α [Γ(k − α) / Γ(k + 1)] (1 − P(k − α, βν)),  (4)

where Γ(a, x) is the upper incomplete Gamma function, and P(a, x) is the regularized lower incomplete Gamma function. Since P(a, x) ∼ (ex/a)^a for a ≫ ex, while Γ(k − α)/Γ(k + 1) ∼ k^{−(α+1)} for k ≫ α, we get

  P(D = k) ∼ α (βν)^α k^{−γ}, γ = α + 1.  (5)

Figure 1: Degree distribution in the HSCM, theory vs. simulations, and in the Internet. The theory curve in the left panel is Eq. (4) with α = 1.1 (γ = 2.1) and ν = 4.92. The simulation data shown by symbols is averaged over 100 random graphs for each graph size n.
All the graphs are generated according to the HSCM with the same α = 1.1 and ν = 4.92. The average degrees, averaged over 100 random graphs, in the graphs of size 10^4, 10^5, and 10^6, are 1.73, 2.16, and 2.51, respectively. The Internet data comes from CAIDA's Archipelago measurements of the Internet topology at the Autonomous System level [26]. The number of nodes and the average degree in the Internet graph are 23,752 and 4.92. The right panel shows the theoretical degree distribution curve in Eq. (4) with α = 2.0 and ν = 10.0 versus simulations of 100 random HSCM graphs of different sizes with the same α and ν. The average degrees in the graphs of size 10^4, 10^5, and 10^6, are 9.96, 9.98, and 10.0, respectively.

We also prove in Theorem 3.3 that the expected average degree in the ensemble converges to

  E[D] = ν.  (6)

That is, the degree distribution in the ensemble has a power tail with exponent γ, while the expected average degree is fixed to the constant ν that does not depend on n, Fig. 1.

1.3 Unbiasedness and the maximum-entropy requirement

While the average degree in the HSCM converges to a constant, and the degree distribution converges to a power law, the power-law HSCM is certainly just one of an infinite number of models that possess these two properties. One example is random hyperbolic graphs [27], which also have constant average degree and a power-law degree distribution, but have larger numbers of triangles, and non-zero clustering in the limit. Is the HSCM an unbiased model of random power-law graphs with constant average degree? That is, are the HSCM random graphs characterized by only these two properties and no others? Or, colloquially, is the HSCM the model of "maximally random" power-law graphs with a constant average degree?
This question can be formally answered by checking whether the HSCM satisfies the maximum-entropy requirement. A discrete distribution p_i, i = 1, 2, ..., is said to satisfy the maximum-entropy requirement subject to constraints

  Σ_i p_i f_{ir} = f̄_r, r = 1, 2, ...,

where the f_r are some real functions of the states i and the f̄_r are a collection of real numbers, if the Gibbs/Shannon entropy of the distribution,

  S = −Σ_i p_i log p_i,

is maximized subject to these constraints [28]. This entropy-maximizing distribution is known to be always unique, belonging to the exponential family of distributions, and it can be derived from the basic consistency axioms: uniqueness and invariance with respect to a change of coordinates, system independence, and subset independence [29-31].

Since entropy is the unique measure of information satisfying the basic requirements of continuity, monotonicity, and system/subset independence [32], the maximum-entropy requirement formalizes the notion of encoding into the probability distribution p_i describing a stochastic system all the available information about the system given to us in the form of the constraints above, and not encoding any other information not given to us. Since the entropy-maximizing distribution is unique, any other distribution necessarily, but possibly implicitly, introduces biases by encoding some additional ad-hoc information, constraining some other system properties, about which we are given no information, to some ad-hoc values. Clearly, such uncontrolled information injection into a model of a system may affect the predictions one may wish to make about the system using the model, and indeed it is known that, given all the available information about a system, the predictive power of a model that describes the system is maximized by the maximum-entropy model [29-31].
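As a toy illustration of this uniqueness (our example, in the spirit of Jaynes' dice problem, not taken from the paper): the maximum-entropy distribution on {1, ..., 6} with a fixed mean has the exponential-family form p_i ∝ exp(θ v_i), and the Lagrange multiplier θ can be found by bisection, since the mean is monotone in θ. Any other feasible distribution has strictly smaller entropy.

```python
import math

def entropy(p):
    """Gibbs/Shannon entropy -sum p_i log p_i (with 0 log 0 := 0)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def max_entropy_mean(values, target_mean):
    """Maximum-entropy distribution on `values` with a prescribed mean.

    The maximizer is the exponential-family tilt p_i ~ exp(theta * v_i);
    bisection on theta matches the (monotone) mean constraint.
    """
    def mean(theta):
        w = [math.exp(theta * v) for v in values]
        z = sum(w)
        return sum(v * wi for v, wi in zip(values, w)) / z
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    w = [math.exp(theta * v) for v in values]
    z = sum(w)
    return [wi / z for wi in w]
```

For a target mean of 4.5 the resulting distribution is tilted toward larger faces, yet it remains strictly more entropic than any ad-hoc alternative with the same mean, such as (0, 0, 0, 0.75, 0, 0.25).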
Perhaps the best illustration of this predictive power is that of equilibrium statistical mechanics, which can be formulated almost fully in terms of the maximum-entropy principle [28].

To illustrate the maximum-entropy requirement in application to random graphs, suppose we are to define a random graph ensemble, and the only available information is that these random graphs must have n nodes and m edges. From the purely probabilistic perspective, any random graph ensemble satisfying these constraints (random m-stars or m-cycles, for instance, if m < n) would be an equally good one. Yet there is only one ensemble that satisfies not only these constraints but also the maximum-entropy requirement. This ensemble is G_{n,m}, because in G_{n,m} any graph with n nodes and m edges is equally likely, so that the probability distribution on the set of all graphs with n nodes and m edges is uniform, and, without further constraints, the uniform distribution is the maximum-entropy distribution on the state space, which in this case consists of the i = 1, ..., ((n choose 2) choose m) such graphs. Random m-stars or m-cycles, while satisfying the constraints, inject into the model, in this case explicitly, additional information about the graph structure that was not given to us. Clearly, predictions based on random m-stars versus G_{n,m} may be very different, as the first model trivially predicts that m-stars occur with probability 1, while they appear with nearly zero probability in G_{n,m} if n and m are large.

A slightly less trivial example is the SCM. In this case the given information is that the expected degrees of nodes i = 1, ..., n must be k_i ∈ R_+, and the state space is the set of all 2^(n choose 2) graphs on n nodes.
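The SCM example can be made concrete numerically. As a sketch (our own parametrization t_i and a hypothetical small degree sequence, not data from the paper): the entropy-maximizing connection probabilities take the Fermi-Dirac form p_ij = t_i t_j / (1 + t_i t_j), so the constraints Σ_j p_ij = k_i become t_i = k_i / Σ_{j≠i} t_j / (1 + t_i t_j), which a damped fixed-point iteration can solve.

```python
import math

def solve_max_entropy_degrees(k, iters=20000, damp=0.5):
    """Maximum-entropy connection probabilities for expected degrees k.

    The entropy maximizer has the Fermi-Dirac form
    p_ij = t_i t_j / (1 + t_i t_j); the constraints sum_j p_ij = k_i
    are solved by a damped fixed-point iteration on the t_i, started
    from a Chung-Lu-like initial guess t_i = k_i / sqrt(sum k).
    """
    n = len(k)
    t = [ki / math.sqrt(sum(k)) for ki in k]
    for _ in range(iters):
        new = [k[i] / sum(t[j] / (1.0 + t[i] * t[j])
                          for j in range(n) if j != i)
               for i in range(n)]
        t = [damp * a + (1.0 - damp) * b for a, b in zip(t, new)]
    p = [[t[i] * t[j] / (1.0 + t[i] * t[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    return t, p
```

For a feasible sequence the iteration converges to probabilities strictly inside (0, 1) whose row sums reproduce the prescribed expected degrees.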
As shown in [13-16], the unique entropy-maximizing ensemble satisfying these constraints is given by random graphs in which nodes i and j are connected with probabilities

  p_ij = 1 / ( kn/(κ_i κ_j) + 1 ),

where kn = Σ_i k_i, and the κ_i are the unique solution of the system of n equations Σ_j p_ij = k_i. The popular Chung-Lu model [11, 12] is different in that the connection probability there is

  p^CL_ij = min( k_i k_j / (kn), 1 ),

which can be thought of as a classical-limit approximation of the entropy-maximizing Fermi-Dirac p_ij above. While the CL ensemble also satisfies the desired constraints Σ_j p^CL_ij = k_i (albeit only for sequences k_i such that k_i k_j / (kn) ≤ 1), it does not satisfy the maximum-entropy requirement, so that it injects, in this case implicitly, some additional information into the ensemble, constraining some undesired properties of graphs in the ensemble to some ad-hoc values. Since this undesired information injection is implicit, it may be quite difficult to detect and quantify all the biases introduced into the ensemble.

1.4 Main results

The main result of this paper is the proof in Theorem 3.1 that the HSCM is unbiased, that is, that the HSCM random graphs maximize the Gibbs entropy of random graphs whose degree distribution and average degree converge to (4, 6). The first difficulty that we face in proving this result is how to properly formulate the entropy-maximization problem under these constraints. Indeed, we are to show that the probability distribution P_n that the HSCM defines on the set of n-sized graphs G_n maximizes the graph entropy

  S[P'_n] = −Σ_{G_n} P'_n(G_n) log P'_n(G_n)

across all distributions P'_n that define random graph ensembles with degree distributions and average degrees converging to (4, 6).
These constraints are quite different from the SCM constraints because, for any fixed n, we do not have a fixed set of constraints or sufficient statistics. Instead of introducing such sufficient statistics, e.g., expected degrees converging to a desired Pareto distribution, and proceeding from there, we show in Section 2 that the problem of graph entropy maximization under these constraints is equivalent to a graphon entropy maximization problem [33], i.e., to the problem of finding a graphon W that maximizes the graphon entropy

  σ_n[W'] = ∫∫_{A_n × A_n} H(W'(x, y)) dµ_n(x) dµ_n(y),

where H(p) = −p log p − (1 − p) log(1 − p) is the entropy of a Bernoulli random variable with success probability p, across all graphons W' that satisfy the constraint

  (n − 1) ∫_{A_n} W'(x, y) dµ_n(y) = κ_n(x),

where κ_n(x) ≈ √(νn) e^{−x} is the expected degree of a node at coordinate x in the power-law HSCM. We then prove in Proposition 3.4 that the unique solution to this graphon entropy maximization problem is given by the Fermi-Dirac graphon W in (1). The fact that the Fermi-Dirac graphon is the unique solution is a reflection of the basic fact in statistical physics that the grand canonical ensemble of Fermi particles, which in our case are edges of energy x + y, is the unique maximum-entropy ensemble with fixed expected values of energy and number of particles [34], in which the probability to find a particle in a state with energy x + y is given by (1).

Yet the solutions to the graph and graphon entropy maximization problems yield equivalent random graph ensembles only if the rescaled graph entropy S*[P_n] = S[P_n] / (n choose 2) converges to the graphon entropy σ_n[W].
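Both the constraint function κ_n(x) and the graphon entropy σ_n[W] of the Fermi-Dirac graphon can be evaluated numerically (a sketch with our own substitution and quadrature choices, not the paper's derivation). Substituting y = R_n + log(u)/α maps µ_n to the uniform measure on (0, 1], and the further substitution u = w² smooths the integrands:

```python
import math

def bernoulli_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def expected_degree(x, n, alpha, nu, grid=20000):
    """kappa_n(x) = (n-1) * Int_{A_n} 1/(e^{x+y} + 1) dmu_n(y) by
    trapezoidal quadrature; after the substitutions above,
    e^{x+y} = c * w^{2/alpha} with c = e^{x+R_n}, and dmu_n(y) = 2 w dw."""
    beta = 1.0 - 1.0 / alpha
    R = 0.5 * math.log(n / (beta * beta * nu))
    c = math.exp(x + R)
    f = lambda w: 2.0 * w / (c * w ** (2.0 / alpha) + 1.0)
    h = 1.0 / grid
    s = 0.5 * (f(0.0) + f(1.0)) + sum(f(i * h) for i in range(1, grid))
    return (n - 1) * h * s

def graphon_entropy(n, alpha, nu, grid=400):
    """sigma_n[W] = Int Int H(W) dmu_n dmu_n for the Fermi-Dirac graphon,
    by 2-D trapezoidal quadrature; in the substituted variables
    W = 1/(C (w z)^{2/alpha} + 1), C = n/(beta^2 nu), measure 4 w z dw dz."""
    beta = 1.0 - 1.0 / alpha
    C = n / (beta * beta * nu)
    h = 1.0 / grid
    total = 0.0
    for i in range(grid + 1):
        w = i * h
        wt_i = 0.5 if i in (0, grid) else 1.0
        for j in range(grid + 1):
            z = j * h
            wt_j = 0.5 if j in (0, grid) else 1.0
            W = 1.0 / (C * (w * z) ** (2.0 / alpha) + 1.0)
            total += wt_i * wt_j * 4.0 * w * z * bernoulli_entropy(W)
    return total * h * h
```

For α = 2 and ν = 10, the computed κ_n(x) agrees with √(νn) e^{−x} to within about a percent, and σ_n tracks the ν log(n)/n scaling within tens of percent at n = 10^4 to 10^6, the slow approach to the asymptotic constant being expected at these sizes.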
Here we face another difficulty: since our ensembles are sparse, both S*[P_n] and σ_n[W] converge to zero, so that we actually have to prove that the two entropies converge to each other faster than either of them converges to zero. To this end we prove in Theorems 3.5 and 3.6 that both the graphon and graph entropies converge to zero as

  σ_n[W], S*[P_n] ∼ ν log(n) / n.

The key result then, also in Theorem 3.6, is the proof that, when divided by the scaling factor log(n)/n, the difference between the graphon and graph entropies vanishes in the limit,

  lim_{n→∞} (n / log n) |S*[P_n] − σ_n[W]| = 0,

meaning that the two entropies do indeed converge to each other faster than to zero. The combination of the graphon (1) being the entropy maximizer and the convergence of the rescaled graph entropy to the entropy of this graphon implies the main result in Theorem 3.1: the power-law HSCM is a graph entropy maximizer subject to the degree distribution and average degree constraints (4, 6).

1.5 Exchangeability and projectivity

In addition to the natural requirements, dictated by real-world network data, of a constant (i.e., independent of graph size n) average degree and a power-law degree distribution, as well as the maximum-entropy requirement, dictated by basic statistical considerations, a reasonable model of real networks must also satisfy two more requirements: exchangeability and projectivity. Exchangeability takes care of the fact that node labels in random graph models are usually meaningless. Even though node labels in real networks often have some network-specific meaning, such as autonomous system numbers in the Internet [9], node labels in random graph models can be, and usually are, random integer indices i = 1, 2, ....
A random graph model is exchangeable if for any permutation σ of the node indices i, the probabilities of any two graphs G and G^σ given by adjacency matrices G_{i,j} and G_{σ(i),σ(j)} are the same, P(G) = P(G^σ) [35, 36]. A random graph model is projective if there exists a map π_{n→n'} from graphs of size n to graphs of size n' < n such that the probability of graphs in the model satisfies P(G_{n'}) = P(π_{n→n'}(G_n)) [37, 38]. If this condition is satisfied, then it is easy to see that the same model admits a dual formulation, as an equilibrium model of graphs of a fixed size, or as a growing graph model [39]. If this requirement is not satisfied, then as soon as one node is added to a graph, e.g., due to the growth of the real network that this graph represents, the resulting bigger graph is effectively sampled from a different distribution corresponding to the model with different parameters, necessarily affecting the structure of its existing subgraphs, a clearly unrealistic scenario.

As the simplest examples, G_{n,p} is projective (the map π_{n→n'} simply selects any subset of the n nodes consisting of n' nodes), but G_{n,k/n} with constant k is not. In the first case, one can realize G_{n,p} by growing graphs, adding nodes one at a time and connecting each new node to each existing node with probability p, while in the second case such growth is impossible, since the existing edges in the growing graphs would have to be removed with probability 1/n for the resulting graphs to be samples from G_{n,k/n} for each n.

The HSCM random graphs are manifestly exchangeable, as is any graphon-based ensemble [40, 41]. Here we note that the fact that these graphs are both sparse and exchangeable is by no means in conflict with the Aldous-Hoover theorem [35, 42], which states that the limit graphon, mapped to the unit square, of any exchangeable sparse graph family is necessarily zero.
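The projectivity of G_{n,p} can be checked by brute force on the smallest nontrivial case (our illustration): marginalizing the G_{3,p} measure over the third node reproduces the G_{2,p} probabilities exactly.

```python
from itertools import product

def gnp_prob(edges, n, p):
    """Probability of a given labeled graph (set of edges) under G_{n,p}."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    pr = 1.0
    for e in pairs:
        pr *= p if e in edges else (1.0 - p)
    return pr

def marginal_edge_prob(present, p):
    """P(edge {0,1} has the given status in the subgraph of G_{3,p}
    induced by nodes {0,1}), summed over all 8 labeled graphs on 3 nodes."""
    pairs3 = [(0, 1), (0, 2), (1, 2)]
    total = 0.0
    for bits in product([0, 1], repeat=3):
        edges = {e for e, b in zip(pairs3, bits) if b}
        if ((0, 1) in edges) == present:
            total += gnp_prob(edges, 3, p)
    return total
```

The marginal is exactly p (edge present) and 1 − p (edge absent), i.e., the G_{2,p} measure, for every p; the same computation for G_{n,k/n} would give k/3 instead of k/2 and thus fail.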
Indeed, when mapped to the unit square, the limit HSCM graphon W_{I,n} (3) is zero as well. We also note that the convergence of W_{I,n} to zero does not mean that the ensemble converges to infinite empty graphs: the expected degree distribution and average degree in the ensemble converge to (4, 6) in the limit, as stated above.

If α = 2, the HSCM ensemble is also projective, but only with a specific labeling of nodes that breaks exchangeability. This can be seen by observing that the density of points on the intervals A_n,

  δ_n = n / µ(A_n) = α (β²ν)^{α/2} n^{1−α/2},

and consequently on the whole real line R in the limit, is the constant δ = ν/2 if α = 2. In this case, the HSCM can be equivalently defined as a model of growing labeled graphs as follows: for n = 1, 2, ..., the location x_n of the new node n belongs to A_n's increment, x_n ∈ B_n = A_n \ A_{n−1} (A_0 = ∅), and is sampled from µ restricted to this increment, i.e., from the probability measure

  µ̃_n = µ|_{B_n} / µ(B_n) = α e^{αx} / (e^{αR_n} − e^{αR_{n−1}}).

Having its location sampled, the new node n then connects to each existing node i = 1, ..., n−1 with probability given by (1). This growing model is equivalent to the original equilibrium HSCM definition in Section 1.1 only asymptotically. However, the exact equivalence, for each n, between the equilibrium HSCM with ordered x_i's, x_i < x_{i+1}, and its growing counterpart can also be achieved by ensuring that the joint distribution of the x_i is exactly the same in both cases, using basic properties of Poisson point processes [39]. Specifically, the equilibrium definition in Section 1.1 must be adjusted by making the right boundary R_n of the interval A_n not a fixed function of n (2), but a random variable R_n = (1/2) log(2V_n), where V_n is a random variable sampled from the Gamma distribution with shape n and rate δ = ν/2.
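This random-boundary adjustment can be sketched as follows (our code, not from the paper; for α = 2 the Gamma(n, ν/2) variable V_n makes R_n = (1/2) log(2V_n) concentrate around the fixed boundary (2)):

```python
import math
import random

def random_right_boundary(n, nu, seed=1):
    """R_n = (1/2) log(2 V_n) with V_n ~ Gamma(shape n, rate nu/2),
    realized as a sum of n i.i.d. exponentials of rate delta = nu/2
    (the alpha = 2 case, where the point density delta is constant)."""
    rnd = random.Random(seed)
    delta = nu / 2.0
    v = sum(rnd.expovariate(delta) for _ in range(n))
    return 0.5 * math.log(2.0 * v)
```

Since V_n/E[V_n] = 1 + O(n^{−1/2}), the random boundary deviates from the deterministic R_n = (1/2) log(n/(β²ν)) only by O(n^{−1/2}).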
Node n is then placed at random co ordinate x n = R n , while the co ordinates of the rest of n − 1 no des are sampled from probabilit y measure µ n —measure µ restricted to the random in terv al A n = ( −∞ , R n ]—and then lab eled in the increasing order of their co ordinates. The growing mo del definition must b e also adjusted: the co ordinate x n +1 of the n + 1’th node is determined b y v n +1 = v n + V , where v 0 = 0, v i = (1 / 2) e 2 x i , and V is a random v ariable sampled from the exp onential distribution with rate δ = ν / 2. One can sho w that co ordinates x i , b oth for finite and infinite n , in b oth the equilibrium and growing HSCM mo dels defined this wa y , are equiv alen t realizations of the same P oisson p oin t proce ss on R with measure µ and rate δ , con verging to the binomial sampling with R n fixed to ( 2 ) [ 39 ]. The pro jective map π n 7→ n 0 in the pro jectivity definition ab ov e, simply maps graphs G n to their subgraphs induced by no des i = 1 . . . n 0 . W e note that even though the growing HSCM is not exchangeable since it relies on lab eling of no des in the increasing order of their co ordinates, it is nevertheless equiv alent to the equilibrium HSCM with this ordered lab eling, b ecause the join t distribution of no de co ordinates, and the linking probabilit y as a function of these co ordinates are the same in b oth the equilibrium and growing HSCM definitions [ 39 ]. This observ ation suggests that there might exist a less trivial pro jective map such that the HSCM is b oth pro jective and exchangeable at the same time. 1.6 Other remarks W e note that thanks to its pro jectiv eness, the p ow er-la w HSCM was sho wn in [ 24 ] to b e equiv alen t to a soft version of preferential attachmen t, a mo del of gro wing graphs in which new no des connect to existing no des with probabilities prop ortional to the exp ected degrees of existing nodes. 
It is well known that, similarly to the HSCM, the degree distribution and average degree in graphs grown according to preferential attachment do not essentially change as the graphs grow either [43, 44]. If $\alpha = 2$, the equivalence between the HSCM and soft preferential attachment is exact. If $\alpha \neq 2$, the HSCM, even with ordered labeling, is not equivalent to soft preferential attachment, but it is equivalent to an adjusted version of it with a certain rate of (dis)appearance of edges between existing vertices [24].

We also note that the HSCM is the zero-clustering limit [45] of random hyperbolic graphs [27], where the $\alpha = 2$ case corresponds to the uniform density of points in the hyperbolic space $\mathbb{H}^d$, and where the $x_i$s are the radial coordinates of nodes $i$ in the spherical coordinate system of the hyperboloid model of $\mathbb{H}^d$. These coordinates can certainly not be negative, but the expected fraction of nodes with negative coordinates in the HSCM is negligible: $\mu_n(\mathbb{R}_-) = (\beta^2\nu/n)^{\alpha/2} \to 0$. In the zero-clustering limit, the angular coordinates of nodes in $\mathbb{H}^d$ are ignored in the hyperbolic graphon [27], which becomes equivalent to (1).

As a final introductory remark we note that, among the rigorous approaches to sparse exchangeable graphs, the HSCM definition is perhaps closest to the graphon processes and graphexes in [46-48]. In particular, in [48], where the focus is on graph convergence to well-defined limits, two ensembles are considered. One ensemble, also appearing in [47], is defined by any graphon $W: \mathbb{R}_+^2 \to [0,1]$ and any measure $\mu$ on $\mathbb{R}_+$ ($\mathbb{R}_+$ can be replaced with any measure space). Graphs of a certain expected size, which is a growing function of time $t > 0$, are defined by sampling points as the Poisson point process on $\mathbb{R}_+$ with intensity $t\mu$, then connecting pairs of points with the probability given by $W$, and finally removing isolated vertices.
The other ensemble is even more similar to the HSCM. It is still defined by $W$ and $\mu$ on $\mathbb{R}_+$, but the location of vertex $n$ on $\mathbb{R}_+$ is sampled from $\mu_n = \mu|_{A_n}/\mu(A_n)$, where the $A_n$s are finite-size intervals growing with $n$ whose infinite union covers the whole $\mathbb{R}_+$. The latter ensemble is not exchangeable, but both ensembles are shown to converge to properly stretched graphons defined by $W$, yet only if the expected average degree grows to infinity in the limit. The HSCM definition is different, in particular all $n$ vertices of $n$-sized graphs are sampled from the same $\mu_n$, ensuring exchangeability and allowing explicit control of the degree distribution and average degree, which can be constant, but making the problem of graph convergence difficult. We do not discuss graph convergence further here, leaving it, as well as the generalization of our results to arbitrary degree distributions, for future publications.

1.7 Paper organization

In the next Section 2 we first review in more detail the necessary background information and provide all the required definitions. In Section 3 we formally state all the results in the paper, while Section 4 contains the proofs of these results.

2 Background information and definitions

2.1 Graph ensembles and their entropy

A graph ensemble is a set of graphs $\mathcal{G}$ with a probability measure $P$ on $\mathcal{G}$. The Gibbs entropy of the ensemble is
$$S[P] = -\sum_{G \in \mathcal{G}} P(G) \log P(G). \quad (7)$$
Note that this is just the entropy of the random variable $G$ with respect to the probability measure $P$. When $G_n$ is a graph of size $n$, sampled from $\mathcal{G}$ according to measure $P$, we write $S[G_n]$ instead of $S[P]$. Given a set of constraints, e.g., in the form of graph properties fixed to given values, the maximum-entropy ensemble is given by the $P^*$ that maximizes $S[P]$ across all measures $P$ that satisfy the constraints.
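For intuition, the Gibbs entropy (7) can be evaluated by brute force for a tiny canonical ensemble. In $\mathcal{G}_{n,p}$ the edges are independent, so $S[P]$ must equal $\binom{n}{2}$ times the Bernoulli entropy of $p$; the sketch below (ours, with assumed illustration values $n = 4$, $p = 0.3$) confirms this:

```python
import math
from itertools import combinations, product

def gibbs_entropy_gnp(n, p):
    """Brute-force S[P] = -sum_G P(G) log P(G) over all 2^C(n,2) graphs
    on n labeled nodes, each edge present independently with probability p."""
    pairs = list(combinations(range(n), 2))
    S = 0.0
    for edges in product([0, 1], repeat=len(pairs)):
        m = sum(edges)
        P = p**m * (1.0 - p)**(len(pairs) - m)
        S -= P * math.log(P)
    return S

def H(p):
    """Bernoulli entropy of a biased coin with success probability p."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

n, p = 4, 0.3                           # assumed illustration values
print(gibbs_entropy_gnp(n, p), math.comb(n, 2) * H(p))
```

The two printed numbers agree to floating-point precision, since independence makes the graph entropy additive over edges.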
These constraints can be either sharp (microcanonical), satisfied exactly, or soft (canonical), satisfied on average. The simplest example of a constrained graph property is the number of edges, fixed to $m$, in graphs of size $n$. The corresponding microcanonical and canonical maximum-entropy ensembles are $\mathcal{G}_{n,m}$ and $\mathcal{G}_{n,p}$ with $p = m/\binom{n}{2}$, respectively. The $P^*$ is, respectively, the uniform distribution and the exponential Boltzmann distribution $P(G) = e^{-H(G)}/Z$ with Hamiltonian $H(G) = \lambda m(G)$, where $m(G)$ is the number of edges in graph $G$, and the Lagrange multiplier $\lambda$ is given by $p = 1/(e^{\lambda} + 1)$ [13]. When the constraints are given by the degrees of nodes instead of the number of edges, we have the following characterization of the microcanonical and canonical ensembles.

2.1.1 Maximum-entropy graphs with a given degree sequence (CM)

Given a degree sequence $d_n = d_1 \ldots d_n$, the microcanonical ensemble of graphs that have this degree sequence is the configuration model (CM) [7, 8]. The entropy-maximizing $P^*$ is uniform on the set of all graphs that have degree sequence $d_n$.

2.1.2 Maximum-entropy graphs with a given expected degree sequence (SCM)

If the sharp CM constraints are relaxed to soft constraints, the result is the canonical ensemble of the soft configuration model (SCM) [14, 15]. Given an expected degree sequence $k_n$, which, in contrast to CM's $d_n$, does not have to be a graphical sequence of non-negative integers but can be any sequence of non-negative real numbers, the SCM is defined by connecting nodes $i$ and $j$ with probabilities
$$p_{ij} = \frac{1}{e^{\lambda_i + \lambda_j} + 1}, \quad (8)$$
where the Lagrange multipliers $\lambda_i$ are the solution of
$$k_i = \sum_{j \neq i} p_{ij}, \quad i = 1 \ldots n.$$

$$\alpha > 1, \quad (22)$$
$$A_n = (-\infty, R_n], \quad (23)$$
$$R_n = \frac{1}{2}\log\frac{n}{\nu\beta^2}, \quad \nu > 0, \quad \beta = 1 - \frac{1}{\alpha}, \quad (24)$$
$$\mu_n = \frac{\mu|_{A_n}}{\mu(A_n)} = \alpha e^{\alpha(x - R_n)}. \quad (25)$$
The dense power-law HSCM is recovered from the above definition by setting $\nu = \tilde\nu n$, where $\tilde\nu$ is a constant, in which case $R_n = R = -(1/2)\log(\tilde\nu\beta^2)$, $A_n = A = (-\infty, R]$, and $\mu_n = \mu = \alpha e^{\alpha(x-R)}$.

3 Results

In this section we formally state our results and provide brief overviews of their proofs, which appear in subsequent sections. The main result is Theorem 3.1, stating that the HSCM($\alpha,\nu$) defined in Section 2.3.6 is a maximum-entropy model under hypersoft power-law degree distribution constraints, according to the definition in Section 2.3.5. This result follows from Theorems 3.2-3.6 and Proposition 3.4. Theorems 3.2 and 3.3 establish the limits of the degree distribution and expected average degree in the HSCM($\alpha,\nu$). Proposition 3.4 states that the HSCM's graphon uniquely maximizes the graphon entropy under the constraints imposed by the degree distribution. Theorem 3.5 establishes the proper graphon rescaling and the limit of the rescaled graphon. Finally, the most critical and involved Theorem 3.6 proves that the rescaled Gibbs entropy of the HSCM converges to its rescaled graphon entropy.

3.1 Main result

Let $Y$ be a Pareto random variable with shape $\alpha > 1$ and scale $\nu\beta > 0$, $\beta = 1 - 1/\alpha$, so that $Y$'s probability density function is
$$P_Y(y) = \alpha(\nu\beta)^{\alpha} y^{-\gamma}, \quad y \geq \nu\beta, \quad \gamma = \alpha + 1, \quad\text{and} \quad (26)$$
$$P(Y > y) = \begin{cases} (\nu\beta)^{\alpha} y^{-\alpha} & \text{if } y \geq \nu\beta, \\ 1 & \text{otherwise.} \end{cases} \quad (27)$$
Let $D$ be a discrete random variable with probability density function
$$P(D = k) = E\left[\frac{Y^k}{k!}e^{-Y}\right], \quad k = 0, 1, 2, \ldots, \quad (28)$$
which is the mixed Poisson distribution with mixing parameter $Y$ [63]. Then it follows that
$$E[D] = E[Y] = \nu, \quad (29)$$
and since $Y$ is a power law with exponent $\gamma$, the tail of $D$'s distribution is also a power law with the same exponent [63]. In particular, $P(D = k)$ is given by (4).
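The random variables $Y$ and $D$ in (26)-(29) are straightforward to sample; the following sketch (ours, with assumed illustration values $\alpha = 2.5$ and $\nu = 2$) checks $E[D] = E[Y] = \nu$ empirically:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, nu = 2.5, 2.0                    # assumed illustration values
beta = 1.0 - 1.0 / alpha

# Pareto with shape alpha and scale nu*beta, i.e. P(Y > y) = (nu*beta/y)^alpha:
# numpy's pareto() is the Lomax (shifted) form, so shift and scale it.
Y = nu * beta * (1.0 + rng.pareto(alpha, size=200_000))

# D is mixed Poisson with mixing parameter Y, eq. (28).
D = rng.poisson(Y)

print(Y.mean(), D.mean())               # both approach E[Y] = E[D] = nu
```

Since $\nu\beta\,\alpha/(\alpha-1) = \nu$, both sample means are close to $\nu = 2$, and $D$ inherits the power-law tail of $Y$.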
Therefore, if $D$ is the degree of a random node in a random graph ensemble, then the graphs in this ensemble are sparse and have a power-law degree distribution. Our main result is:

Theorem 3.1. For any $\alpha > 1$ and $\nu > 0$, HSCM($\alpha,\nu$) is a maximum-entropy ensemble of random graphs under the hypersoft constraints (C1, C2) with $P(D = k)$ and $\nu$ defined by (26-29).

3.2 The limit of the degree distribution in the HSCM

The degree $D_n$ of a random node $i$ in a random HSCM($\alpha,\nu$) graph of size $n$, conditioned on the node coordinates $x_n = x_1 \ldots x_n$, is the sum of $n-1$ independent Bernoulli random variables with success probabilities $W(x_i, x_j)$, $j \neq i$. The distribution of this sum can be approximated by the mixed Poisson distribution with mixing parameter $\sum_{j\neq i} W(x_i, x_j)$. Therefore, after first integrating over the $x_j$ with $j \neq i$ and then over $x_i$, the distribution of $D_n$ is approximately the mixed Poisson distribution
$$P(D_n = k) = E\left[\frac{(\kappa_n(X))^k}{k!}e^{-\kappa_n(X)}\right], \quad\text{so that}\quad E[D_n] = E[\kappa_n(X)],$$
where the random variable $X$ has density $\mu_n$ (25), and the mixing parameter $\kappa_n(x)$ is the expected degree of a node at coordinate $x$:
$$\kappa_n(x) = (n-1)\,w_n(x), \quad (30)$$
$$w_n(x) = \int_{A_n} W(x,y)\,d\mu_n(y), \quad (31)$$
where $A_n$ is given by (23).

In the $n \to \infty$ limit, the distribution of the expected degree $\kappa_n(X)$ of $X$ converges to the Pareto distribution with shape $\alpha$ and scale $\nu\beta$. To prove this, we use the observation that the mass of measure $\mu_n$ is concentrated towards the right end of the interval $A_n = (-\infty, R_n]$, where $R_n \gg 1$ for large $n$. Therefore, not only are the contributions coming from negative $x, y$ negligible, but we can also approximate the Fermi-Dirac graphon $W(x,y)$ in (21) with its classical limit approximation
$$\widehat{W}(x,y) = e^{-(x+y)} \quad (32)$$
on $\mathbb{R}_+^2$.
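The quality of the classical limit approximation (32) is easy to inspect directly: for $z = x + y \geq 0$ one has $\widehat{W}(z) - W(z) = e^{-z}/(e^z + 1) \leq e^{-2z}$, the bound used repeatedly in Section 4. A quick numeric confirmation (our sketch):

```python
import numpy as np

z = np.linspace(0.0, 20.0, 100_001)     # z = x + y >= 0
W = 1.0 / (np.exp(z) + 1.0)             # Fermi-Dirac graphon, eq. (21)
What = np.exp(-z)                       # classical limit approximation, eq. (32)

gap = What - W                          # identically e^{-z}/(e^z + 1) >= 0
print(gap.max())                        # worst case 1/2, attained at z = 0
```

The gap is largest at $z = 0$ and decays like $e^{-2z}$, so the approximation is excellent precisely where $\mu_n$ concentrates, near $x, y \approx R_n$.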
In addition, the expected degree function $w_n(x)$ can be approximated by $\widehat{w}_n(x)$ defined by
$$\widehat{w}_n(x) = \begin{cases} \omega_n e^{-x} & \text{if } 0 \leq x \leq R_n, \\ 0 & \text{otherwise,} \end{cases} \quad\text{where} \quad (33)$$
$$\omega_n = \int_0^{R_n} e^{-x}\,d\mu_n(x) = \frac{1 - e^{-(\alpha-1)R_n}}{\beta e^{R_n}} = \left(\frac{\nu}{n}\right)^{\frac12} + o\left(n^{-\frac12}\right), \quad (34)$$
so that the expected degree of a node at coordinate $x$ can be approximated by
$$\widehat\kappa_n(x) = n\,\widehat{w}_n(x) = e^{-x}\left((\nu n)^{\frac12} + o\left(n^{\frac12}\right)\right).$$
To see that $\widehat\kappa_n(X)$ converges to a Pareto random variable, note that since $X$ has density $\mu_n$, it follows that for all $t > \nu\beta$,
$$P(\widehat\kappa_n(X) > t) = P\left(X < \log\frac{n\omega_n}{t}\right) = e^{-\alpha R_n}\left(\frac{n\omega_n}{t}\right)^{\alpha} = (\nu\beta)^{\alpha} t^{-\alpha}(1 + o(1)) = P(Y > t)(1 + o(1)),$$
where $Y$ is the Pareto-distributed random variable (27). We therefore have the following result, the full proof of which can be found in Section 4.2.2:

Theorem 3.2 (HSCM($\alpha,\nu$) satisfies (C1)). Let $\alpha > 1$, $\nu > 0$, and $D_n$ be the degree of a uniformly chosen vertex in the HSCM($\alpha,\nu$) graphs of size $n$. Then, for each $k = 0, 1, 2, \ldots$,
$$\lim_{n\to\infty} P(D_n = k) = P(D = k),$$
where $P(D = k)$ is given by (28).

3.3 The limit of the expected average degree in the HSCM

The expected degree of a random node in $n$-sized HSCM graphs is, for any fixed $i$,
$$E[D_n] = \sum_{j\neq i} E[W(X_i, X_j)] = (n-1)\,E[W(X,Y)], \quad (35)$$
where $X$ and $Y$ are independent random variables with distribution $\mu_n$. Approximating $W(x,y)$ with $\widehat{W}(x,y)$ on $\mathbb{R}_+^2$, and using $e^{2R_n} = n/(\nu\beta^2)$, we have
$$n\int\!\!\int_0^{R_n} \widehat{W}(x,y)\,d\mu_n(x)\,d\mu_n(y) = n\left(\int_0^{R_n} e^{-x}\,d\mu_n(x)\right)^2 = \frac{n}{\beta^2 e^{2R_n}}\left(1 - e^{-(\alpha-1)R_n}\right)^2 = \nu + o(1).$$
All other contributions are shown to also vanish in the $n\to\infty$ limit in Section 4.2.3, where the following theorem is proved:

Theorem 3.3 (HSCM satisfies (C2)). Let $\alpha > 1$, $\nu > 0$, and $D_n$ be the degree of a uniformly chosen vertex in the HSCM($\alpha,\nu$) graphs of size $n$. Then
$$\lim_{n\to\infty} E[D_n] = \nu.$$
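Theorems 3.2 and 3.3 can be illustrated by direct simulation of the HSCM($\alpha,\nu$). The sketch below (ours, with assumed illustration values $\alpha = 2.5$, $\nu = 2$, $n = 2000$) samples coordinates from $\mu_n$ by inverse-CDF sampling, connects nodes via the Fermi-Dirac graphon (21), and checks that the average degree is close to $\nu$:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, nu, n = 2.5, 2.0, 2000           # assumed illustration values
beta = 1.0 - 1.0 / alpha
R = 0.5 * np.log(n / (nu * beta**2))    # eq. (24)

# mu_n has CDF F(x) = exp(alpha * (x - R)) on (-inf, R], so inverse-CDF
# sampling gives x = R + log(U) / alpha with U uniform on (0, 1).
x = R + np.log(rng.uniform(size=n)) / alpha

# Fermi-Dirac connection probabilities W(x_i, x_j) = 1/(exp(x_i + x_j) + 1).
W = 1.0 / (np.exp(x[:, None] + x[None, :]) + 1.0)
np.fill_diagonal(W, 0.0)

A = rng.uniform(size=(n, n)) < W
A = np.triu(A, 1)
A = A | A.T                             # symmetrize: one coin per node pair
deg = A.sum(axis=1)
print(deg.mean())                       # approaches nu for large n
```

The empirical mean degree is close to $\nu = 2$, and the degree sequence exhibits the heavy tail promised by Theorem 3.2.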
3.4 HSCM maximizes graphon entropy

Let $A \subseteq \mathbb{R}$ be some interval, $\mu$ a measure on $A$ with $\mu(A) < \infty$, and suppose some $\mu$-integrable function $w: A \to \mathbb{R}$ is given. Consider the graphon entropy maximization problem under the constraint
$$w(x) = \int_A W(x,y)\,d\mu(y). \quad (36)$$
That is, the problem is to find a symmetric function $W^*$ that maximizes the graphon entropy $\sigma[W, \mu, A]$ in (11) and satisfies the constraint above, for fixed $A$ and $\mu$.

We note that this problem is a "continuous version" of the Gibbs entropy maximization problem for the SCM ensemble in Section 2.1.2. The following proposition, which we prove in Section 4.3.1, states that the solution to this problem is a "continuous version" of the SCM solution (8):

Proposition 3.4 (HSCM Maximizes Graphon Entropy). Suppose there exists a solution $W^*$ to the graphon entropy maximization problem defined above. Then this solution has the following form:
$$W^*(x,y) = \frac{1}{e^{\lambda(x)+\lambda(y)} + 1}, \quad (37)$$
where the function $\lambda$ is $\mu$-almost-everywhere uniquely defined on $A$ by (36).

This proposition proves that, for each $n$, the HSCM($\alpha,\nu$)'s Fermi-Dirac graphon (21) maximizes the graphon entropy under constraint (31), because $A$ and $\mu$ in the HSCM($\alpha,\nu$) are chosen such that $\lambda(x) = x$. This is always possible as soon as $\lambda(x)$ is invertible, cf. Section 2.3.3. For each $n$, the interval $A_n$ (23) and measure $\mu_n$ (25) in the HSCM($\alpha,\nu$) can be mapped to $[0,1]$ and $1$, respectively, in which case $\lambda_n(x) = R_n + (1/\alpha)\log x$, leading to (3). In other words, node coordinates $x$ in the original HSCM($\alpha,\nu$) definition in Section 2.3.6 and their coordinates $\tilde{x}$ in its equivalent definition with $A_n = [0,1]$ and $\mu_n = 1$ are related by $\tilde{x} = e^{\alpha(x - R_n)}$.
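The coordinate change at the end of this subsection can be verified numerically: mapping $x$ to $\tilde{x} = e^{\alpha(x - R_n)}$ sends $\mu_n$ to the uniform measure on $[0,1]$, and $\lambda_n(\tilde{x}) = R_n + (1/\alpha)\log\tilde{x}$ recovers the same Fermi-Dirac graphon. A sketch (ours, with assumed parameter values):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, nu, n = 2.5, 2.0, 1000           # assumed illustration values
beta = 1.0 - 1.0 / alpha
R = 0.5 * np.log(n / (nu * beta**2))

x = R + np.log(rng.uniform(size=500)) / alpha   # sample from mu_n (inverse CDF)
xt = np.exp(alpha * (x - R))                     # mapped coordinates on [0, 1]
lam = R + np.log(xt) / alpha                     # lambda_n of Section 3.4

# The same Fermi-Dirac graphon in both coordinate systems:
W1 = 1.0 / (np.exp(x[:, None] + x[None, :]) + 1.0)
W2 = 1.0 / (np.exp(lam[:, None] + lam[None, :]) + 1.0)
print(np.abs(W1 - W2).max(), xt.min(), xt.max())
```

The two graphon evaluations coincide to floating-point precision, and the mapped coordinates $\tilde{x}$ are uniform on $[0,1]$, as the equivalence requires.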
3.5 Graphon entropy scaling and convergence

To derive the rate of convergence of the HSCM's graphon entropy to zero, it suffices to consider the classical limit approximation $\widehat{W}$ (32) to $W$ (21) on $\mathbb{R}_+^2$. Its Bernoulli entropy (10) is
$$H(\widehat{W}(x,y)) = (x+y)\,\widehat{W}(x,y) - \left(1 - \widehat{W}(x,y)\right)\log\left(1 - \widehat{W}(x,y)\right).$$
Since most of the mass of $\mu_n$ is concentrated near $R_n$ and $\widehat{W}(R_n, R_n) \to 0$, the second term is negligible. Integrating the first term over $[0, R_n]^2$, we get
$$\int\!\!\int_0^{R_n}(x+y)\,\widehat{W}(x,y)\,d\mu_n(x)\,d\mu_n(y) = \alpha^2 e^{-2\alpha R_n}\int\!\!\int_0^{R_n}(x+y)\,e^{(\alpha-1)(x+y)}\,dx\,dy = \frac{2R_n}{\beta^2 e^{2R_n}} + O(n^{-1}) = \frac{\nu}{n}\log\frac{n}{\nu\beta^2} + O(n^{-1}) = \frac{\nu\log n}{n} + O(n^{-1}), \quad (38)$$
from which we obtain the proper scaling $\log(n)/n$. All further details behind the proof of the following theorem are in Section 4.3.2.

Theorem 3.5 (Graphon Entropy Convergence). Let $\sigma[G_n]$ be the graphon entropy (18) in the HSCM($\alpha,\nu$) ensemble with any $\alpha > 1$ and $\nu > 0$. Then, as $n \to \infty$,
$$\frac{n\,\sigma[G_n]}{\log n} - \nu = O(1/\log n).$$
This theorem implies that $\sigma[G_n]$ goes to zero as $\log(n)/n$, while $n\,\sigma[G_n]/\log n$ goes to $\nu$ as $1/\log n$.

3.6 Gibbs entropy scaling and convergence

The last part of Theorem 3.1 is to prove that the rescaled Gibbs entropy (13) of the HSCM converges to its graphon entropy (18) faster than the latter converges to zero. The graphon entropy is a trivial lower bound for the rescaled Gibbs entropy, Section 2.3.2, so the problem is to find an appropriate upper bound for the latter that converges to the graphon entropy. To identify such an upper bound, we rely on an argument similar to [41]. Specifically, we first partition $A_n$ into $m$ intervals $I_t$ that induce a partition of $A_n^2$ into rectangles $I_{st} = I_s \times I_t$, $s, t = 1 \ldots m$. We then approximate the graphon by its average value on each rectangle.
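The scaling (38) can be checked numerically. Since the double integral factorizes into one-dimensional integrals against $\mu_n$, a simple quadrature suffices; the sketch below (ours, with assumed illustration values $\alpha = 2.5$, $\nu = 2$, $n = 10^8$) confirms that the rescaled leading term approaches $\nu$, up to the stated $O(1/\log n)$ correction:

```python
import numpy as np

alpha, nu, n = 2.5, 2.0, 10**8          # assumed illustration values
beta = 1.0 - 1.0 / alpha
R = 0.5 * np.log(n / (nu * beta**2))    # eq. (24)

x = np.linspace(0.0, R, 400_001)
dens = alpha * np.exp(alpha * (x - R))  # density of mu_n, eq. (25)

def against_mu(f):
    """Trapezoid rule for int_0^R f(x) dmu_n(x)."""
    g = f * dens
    return ((g[1:] + g[:-1]) / 2.0 * np.diff(x)).sum()

# The leading term of the graphon entropy (38) factorizes:
# int int (x+y) e^{-(x+y)} dmu dmu = 2 * int x e^{-x} dmu * int e^{-x} dmu.
sigma_lead = 2.0 * against_mu(x * np.exp(-x)) * against_mu(np.exp(-x))

print(n * sigma_lead / np.log(n))       # approaches nu = 2 as n grows
```

Even at this large $n$ the value is a few percent below $\nu$, consistent with the $O(1/\log n)$ convergence rate in Theorem 3.5.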
Such an approximation introduces an error term on each rectangle. We then show that the Gibbs entropy is upper-bounded by the entropy of the averaged graphon, plus the sum of the entropies of random variables $M_i$, which take value $t$ if the coordinate $x_i$ of node $i$ happens to fall within interval $I_t$. The smaller the number of intervals $m$, the smaller the total entropy of these random variables $M_i$, but the larger the sum of the error terms coming from graphon averaging, because the rectangles $I_{st}$ are large. The smaller they are, the smaller the total error term, but the larger the total entropy of the $M_i$s. The crux of the proof is to find a "sweet spot": the right number of intervals guaranteeing the proper balance between these two types of contributions to the upper bound, which we want to be tighter than the rate of convergence of the graphon entropy to zero. This program is executed in Section 4.4, where we prove the following theorem:

Theorem 3.6 (Gibbs Entropy Convergence). Let $\sigma[G_n]$ be the graphon entropy (18) and $S^*[G_n]$ be the rescaled Gibbs entropy (13) in the HSCM($\alpha,\nu$) ensemble with any $\alpha > 1$ and $\nu > 0$. Then
$$\lim_{n\to\infty}\frac{n}{\log n}\left|S^*[G_n] - \sigma[G_n]\right| = 0, \quad\text{and}\quad \lim_{n\to\infty}\frac{2\,S[G_n]}{n\log n} = \nu.$$
We remark that this theorem implies that
$$S[G_n] \sim \frac{\nu}{2}\,n\log n,$$
which is the leading term of the (S)CM Gibbs entropy obtained in [23]. It is also instructive to compare this scaling of the Gibbs entropy with its scaling in dense ensembles, where $\lim_{n\to\infty}\sigma[G_n] = \sigma[W,\mu,A] = \sigma \in (0,\infty)$ [41]:
$$S[G_n] \sim \frac{\sigma}{2}\,n^2.$$
Finally, it is worth mentioning that even though we use the Fermi-Dirac graphon $W$ (21) to define our $W$-random graphs, the same convergence results could be obtained for $W$-random graphs defined by any other graphon $W'$ such that
$$\lim_{n\to\infty} n\,E\left[\left|W(X,Y) - W'(X,Y)\right|\right] = 0,$$
with $X$ and $Y$ having density $\mu_n$.
In fact, to establish the required limits, we use the classical limit approximation graphon $\widehat{W}$ instead of $W$. Therefore, there exists a vast equivalence class of $W$-random graphs defined by graphons $W'$ that all have the same limit degree distribution (C1) and average degree (C2), and whose rescaled Gibbs entropy converges to the graphon entropy of the Fermi-Dirac $W$. However, it follows from Proposition 3.4 that among all these ensembles, only the $W$-random graph ensemble defined by the Fermi-Dirac graphon (21) uniquely maximizes the graphon entropy (11) for each $n$, which, by our definition of maximum-entropy ensembles under hypersoft constraints, is a necessary condition for graph entropy maximization.

4 Proofs

In this section we provide the proofs of all the results stated in the previous section. In Section 4.1 we begin with some preliminary results on the accuracy of the approximation of the Fermi-Dirac graphon $W$ (21) by the classical limit approximation $\widehat{W}$. In the same section we also establish results showing that the main contribution to the integration with respect to $\mu_n$ comes from the positive part $[0, R_n]$ of the interval $A_n$ defined by (23). In particular, we show that all contributions coming from the negative part of this interval, i.e., $\mathbb{R}_-$, are $o(n^{-1})$, which means that for all our results the negative part of the support of our measure $\mu_n$ is negligible. We then proceed with proving Theorems 3.2 and 3.3 in Section 4.2. The proofs of Proposition 3.4 and Theorem 3.5 can be found in Section 4.3. Finally, the convergence of the rescaled Gibbs entropy to the graphon entropy (Theorem 3.6) is given in Section 4.4.

4.1 The classical limit approximation of the Fermi-Dirac graphon

We will use $e^{-(x+y)}$ as an approximation to the graphon $W$ to compute all necessary limits.
To be precise, we define
$$\widehat{W}(x,y) = \min\left\{e^{-(x+y)},\, 1\right\} \quad (39)$$
and show that the differences between the integrals of $W$ and $\widehat{W}$ converge to zero as $n$ tends to infinity. Note that, instead of $\widehat{W}$, we could have also worked with the integral expressions involving $W$, which might have led to better bounds. However, these integrals tend to evaluate to combinations of hypergeometric functions, while the integrals of $\widehat{W}$ are much easier to evaluate and are sufficient for our purposes.

By the definition of $\widehat{W}(x,y)$ we need to consider the intervals $(-\infty, 0]$ and $(0, R_n]$ separately. Since graphons are symmetric functions, this leads to the following three different cases:
$$\text{I)}\ -\infty < x, y \leq 0, \qquad \text{II)}\ -\infty < y \leq 0 \text{ and } 0 < x \leq R_n, \qquad \text{III)}\ 0 < x, y \leq R_n. \quad (40)$$
For case I we note that $W, \widehat{W} \leq 1$ and
$$\int\!\!\int_{-\infty}^0 d\mu_n(y)\,d\mu_n(x) = O\left(n^{-\alpha}\right). \quad (41)$$
With this we obtain the following result, which shows that for both $W$ and $\widehat{W}$ only the integration over $(0, R_n]^2$, i.e., case III, matters.

Lemma 4.1.
$$\int\!\!\int_{-\infty}^{R_n} \widehat{W}(x,y)\,d\mu_n(y)\,d\mu_n(x) - \int\!\!\int_0^{R_n} \widehat{W}(x,y)\,d\mu_n(y)\,d\mu_n(x) = O\left(n^{-\frac{\alpha+1}{2}}\right),$$
and the same result holds if we replace $\widehat{W}$ with $W$.

Proof. First note that
$$\int_{-\infty}^{-R_n}\!\!\int_{-\infty}^{R_n} \widehat{W}(x,y)\,d\mu_n(y)\,d\mu_n(x) \leq \int_{-\infty}^{-R_n} d\mu_n(x) = O\left(n^{-\alpha}\right). \quad (42)$$
We show that
$$\int_{-R_n}^0\!\!\int_0^{R_n} \widehat{W}(x,y)\,d\mu_n(y)\,d\mu_n(x) = O\left(n^{-\frac{\alpha+1}{2}}\right), \quad (43)$$
which together with (41) and (42) implies the first result. The result for $W$ follows by noting that $W \leq \widehat{W}$. We split the integral (43) as follows:
$$\int_{-R_n}^0\!\!\int_0^{R_n} \widehat{W}(x,y)\,d\mu_n(y)\,d\mu_n(x) = \int_{-R_n}^0\!\!\int_0^{-x} d\mu_n(y)\,d\mu_n(x) + \int_{-R_n}^0\!\!\int_{-x}^{R_n} e^{-(x+y)}\,d\mu_n(y)\,d\mu_n(x).$$
For the first integral we compute
$$\int_{-R_n}^0\!\!\int_0^{-x} d\mu_n(y)\,d\mu_n(x) = \alpha e^{-2\alpha R_n}\int_{-R_n}^0\left(e^{-\alpha x} - 1\right)e^{\alpha x}\,dx = e^{-2\alpha R_n}\left(\alpha R_n + e^{-\alpha R_n} - 1\right) = O\left(\log(n)\,n^{-\alpha}\right).$$
Finally, the second integral evaluates to
$$\frac{\alpha}{\beta}e^{-2\alpha R_n}\int_{-R_n}^0\left(e^{(\alpha-1)R_n} - e^{-(\alpha-1)x}\right)e^{(\alpha-1)x}\,dx = \frac{\alpha}{\beta}e^{-2\alpha R_n}\left(\frac{e^{(\alpha-1)R_n} - 1}{\alpha-1} - R_n\right) = O\left(n^{-\frac{\alpha+1}{2}}\right),$$
from which the result follows, since $\alpha > (\alpha+1)/2$.

We can show a similar result for $H(W)$ and $H(\widehat{W})$.

Lemma 4.2.
$$\int\!\!\int_{-\infty}^{R_n} H\left(\widehat{W}(x,y)\right)d\mu_n(y)\,d\mu_n(x) - \int\!\!\int_0^{R_n} H\left(\widehat{W}(x,y)\right)d\mu_n(y)\,d\mu_n(x) = \epsilon_n,$$
where
$$\epsilon_n = \begin{cases} O\left(\log(n)\,n^{-\alpha}\right) & \text{if } 1 < \alpha < 2, \\ O\left(\log(n)^2\,n^{-2}\right) & \text{if } \alpha = 2, \\ O\left(n^{-\frac{2+\alpha}{2}}\right) & \text{if } \alpha > 2. \end{cases}$$
Moreover, the same result holds if we replace $\widehat{W}$ with $W$.

Proof. We first prove the result for $\widehat{W}$. For this we split the interval $A_n$ into three parts, $(-\infty, -R_n]$, $[-R_n, 0]$ and $(0, R_n]$, and show that the integrals over all ranges other than $[0, R_n]^2$ are bounded by a term that scales as $\epsilon_n$. Since $H(p) \leq \log(2)$ for all $0 \leq p \leq 1$, it follows from (41) and (42) that, for all $\alpha > 1$,
$$\int\!\!\int_{-\infty}^0 H\left(\widehat{W}(x,y)\right)d\mu_n(y)\,d\mu_n(x) = O\left(n^{-\alpha}\right) = o(\epsilon_n),$$
$$\int_{-\infty}^{R_n}\!\!\int_{-\infty}^{-R_n} H\left(\widehat{W}(x,y)\right)d\mu_n(y)\,d\mu_n(x) = O\left(n^{-\alpha}\right) = o(\epsilon_n).$$
Hence, using the symmetry of $\widehat{W}$, we only need to consider the integration over $(0, R_n] \times (-R_n, 0]$. First we compute
$$H\left(\widehat{W}(x,y)\right) = e^{-(x+y)}(x+y) - \left(1 - e^{-(x+y)}\right)\log\left(1 - e^{-(x+y)}\right)$$
and observe that
$$-\left(1 - e^{-z}\right)\log\left(1 - e^{-z}\right) \leq e^{-2z} \quad\text{for all large enough } z. \quad (44)$$
Now let $\delta > 0$ be such that (44) holds for all $z \geq \delta$. Then $\delta < R_n$ for sufficiently large $n$, and we split the integration as follows:
$$\int_0^{R_n}\!\!\int_{-R_n}^0 H\left(\widehat{W}\right)d\mu_n(y)\,d\mu_n(x) = \int_0^{\delta}\!\!\int_{-R_n}^0 H\left(\widehat{W}\right)d\mu_n(y)\,d\mu_n(x) + \int_{\delta}^{R_n}\!\!\int_{-R_n}^0 H\left(\widehat{W}\right)d\mu_n(y)\,d\mu_n(x).$$
The first integral is $O(n^{-\alpha})$.
For the second we note that $x + y > \delta$ for all $y > \delta - x$, and hence
$$\int_{\delta}^{R_n}\!\!\int_{-R_n}^0 H\left(\widehat{W}\right)d\mu_n(y)\,d\mu_n(x) \leq \int_{\delta}^{R_n}\!\!\int_{-R_n}^{\delta-x} \log(2)\,d\mu_n(y)\,d\mu_n(x) + \int_{\delta}^{R_n}\!\!\int_{\delta-x}^0 e^{-2(x+y)}\,d\mu_n(y)\,d\mu_n(x).$$
For the second integral we obtain
$$\int_{\delta}^{R_n}\!\!\int_{\delta-x}^0 e^{-2(x+y)}\,d\mu_n(y)\,d\mu_n(x) = \begin{cases} O\left(\log(n)\,n^{-\alpha}\right) & \text{if } 1 < \alpha < 2, \\ O\left(\log(n)^2\,n^{-2}\right) & \text{if } \alpha = 2, \\ O\left(n^{-\frac{2+\alpha}{2}}\right) & \text{if } \alpha > 2, \end{cases}$$
while for the first integral we have
$$\int_{\delta}^{R_n}\!\!\int_{-R_n}^{\delta-x}\log(2)\,d\mu_n(y)\,d\mu_n(x) \leq \alpha e^{-2\alpha R_n}\int_{\delta}^{R_n} e^{\alpha\delta}\,dx = O\left(\log(n)\,n^{-\alpha}\right).$$
Therefore we conclude that
$$\int_0^{R_n}\!\!\int_{-\infty}^0 H\left(\widehat{W}(x,y)\right)d\mu_n(y)\,d\mu_n(x) = \begin{cases} O\left(\log(n)\,n^{-\alpha}\right) & \text{if } 1 < \alpha < 2, \\ O\left(\log(n)^2\,n^{-2}\right) & \text{if } \alpha = 2, \\ O\left(n^{-\frac{2+\alpha}{2}}\right) & \text{if } \alpha > 2, \end{cases}$$
which yields the result for $\widehat{W}$. For $W$ we first compute
$$H(W(x,y)) = \log\left(1 + e^{x+y}\right) - \frac{(x+y)\,e^{x+y}}{1 + e^{x+y}} = \log\left(1 + e^{-(x+y)}\right) + \frac{x+y}{1 + e^{x+y}} \leq \log\left(1 + e^{-(x+y)}\right) + (x+y)\,e^{-(x+y)}.$$
Comparing this upper bound to $H(\widehat{W}(x,y))$, and noting that $\log(1 + e^{-z}) \leq e^{-2z}$ for large enough $z$, the result follows from the computation done for $\widehat{W}$.

With these two lemmas we now establish two important results on the approximation of $W$. The first shows that if $X$ and $Y$ are independent with distribution $\mu_n$, then $\widehat{W}(X,Y)$ converges in expectation to $W(X,Y)$ faster than $n^{-1}$.

Proposition 4.3. Let $X, Y$ be independent with density $\mu_n$ and $\alpha > 1$. Then, as $n \to \infty$,
$$E\left[\left|W(X,Y) - \widehat{W}(X,Y)\right|\right] = O\left(n^{-\frac{\alpha+1}{2}}\right).$$
Proof. Since
$$E\left[\left|W(X,Y) - \widehat{W}(X,Y)\right|\right] = \int\!\!\int_{-\infty}^{R_n}\left|W(x,y) - \widehat{W}(x,y)\right|d\mu_n(y)\,d\mu_n(x),$$
and $|W(x,y) - \widehat{W}(x,y)| \leq 1$, it follows from Lemma 4.1 that it is enough to consider the integral
$$\int\!\!\int_0^{R_n}\left|W(x,y) - \widehat{W}(x,y)\right|d\mu_n(y)\,d\mu_n(x).$$
For this we note that
$$\left|W(x,y) - \widehat{W}(x,y)\right| \leq e^{-2(x+y)}.$$
Hence we obtain
$$\int\!\!\int_0^{R_n}\left|W(x,y) - \widehat{W}(x,y)\right|d\mu_n(y)\,d\mu_n(x) \leq \left(\int_0^{R_n} e^{-2x}\,d\mu_n(x)\right)^2 = \begin{cases} O\left(n^{-\alpha}\right) & \text{if } 1 < \alpha < 2, \\ O\left(\log(n)^2\,n^{-2}\right) & \text{if } \alpha = 2, \\ O\left(n^{-2}\right) & \text{if } \alpha > 2. \end{cases}$$
Since all these terms are $o\left(n^{-(\alpha+1)/2}\right)$, the result follows.

Next we show that $H(\widehat{W}(X,Y))$ also converges in expectation to $H(W(X,Y))$ faster than $n^{-1}$.

Proposition 4.4. Let $X, Y$ be independent with density $\mu_n$ and $\alpha > 1$. Then, as $n\to\infty$,
$$E\left[\left|H(W(X,Y)) - H\left(\widehat{W}(X,Y)\right)\right|\right] = \begin{cases} O\left(\log(n)\,n^{-\alpha}\right) & \text{if } 1 < \alpha < 2, \\ O\left(\log(n)^3\,n^{-2}\right) & \text{if } \alpha = 2, \\ O\left(\log(n)\,n^{-2}\right) & \text{if } \alpha > 2. \end{cases}$$
Proof. Similarly to the previous proof, we use Lemma 4.2 to show that it is enough to consider the integral
$$\int\!\!\int_0^{R_n}\left|H(W(x,y)) - H\left(\widehat{W}(x,y)\right)\right|d\mu_n(y)\,d\mu_n(x).$$
Define
$$\delta W(x,y) = \frac{1}{e^{2(x+y)} + e^{x+y}},$$
and note that $\widehat{W}(x,y) = W(x,y) + \delta W(x,y)$. Now fix $x, y$. Then, by the Taylor-Lagrange theorem, there exists a $\widetilde{W}(x,y)$ with $W(x,y) \leq \widetilde{W}(x,y) \leq \widehat{W}(x,y)$ such that
$$\left|H(W(x,y)) - H\left(\widehat{W}(x,y)\right)\right| = \left|H'\left(\widetilde{W}(x,y)\right)\right|\delta W(x,y) \leq \left(\left|H'(W(x,y))\right| + \left|H'\left(\widehat{W}(x,y)\right)\right|\right)\delta W(x,y).$$
Next we compute
$$H'(W(x,y)) = x+y, \quad\text{and}\quad H'\left(\widehat{W}(x,y)\right) = \log\left(e^{x+y} - 1\right),$$
where $\log(e^{x+y} - 1) \leq x+y$ for all $x+y \geq 1$. We now split the integral and bound it as follows:
$$\int\!\!\int_0^{R_n}\left|H(W) - H\left(\widehat{W}\right)\right|d\mu_n(y)\,d\mu_n(x) = \int_0^1\!\!\int_0^{1-x}\left|H(W) - H\left(\widehat{W}\right)\right|d\mu_n(y)\,d\mu_n(x) + \int_0^1\!\!\int_{1-x}^{R_n}\left|H(W) - H\left(\widehat{W}\right)\right|d\mu_n(y)\,d\mu_n(x) + \int_1^{R_n}\!\!\int_0^{R_n}\left|H(W) - H\left(\widehat{W}\right)\right|d\mu_n(y)\,d\mu_n(x)$$
$$\leq \log(2)\int_0^1\!\!\int_0^{1-x}d\mu_n(y)\,d\mu_n(x) + 2\int_0^1\!\!\int_{1-x}^{R_n}(x+y)\,\delta W(x,y)\,d\mu_n(y)\,d\mu_n(x) + 2\int_1^{R_n}\!\!\int_0^{R_n}(x+y)\,\delta W(x,y)\,d\mu_n(y)\,d\mu_n(x)$$
$$\leq \log(2)\int_0^1\!\!\int_0^{1-x}d\mu_n(y)\,d\mu_n(x) + 4\int\!\!\int_0^{R_n}(x+y)\,\delta W(x,y)\,d\mu_n(y)\,d\mu_n(x).$$
The first integral is $O(\log(n)\,n^{-\alpha})$, while for the second we have
$$4\int\!\!\int_0^{R_n}(x+y)\,\delta W(x,y)\,d\mu_n(y)\,d\mu_n(x) \leq 8R_n\left(\int_0^{R_n} e^{-2x}\,d\mu_n(x)\right)^2 = \begin{cases} O\left(\log(n)\,n^{-\alpha}\right) & \text{if } 1 < \alpha < 2, \\ O\left(\log(n)^3\,n^{-2}\right) & \text{if } \alpha = 2, \\ O\left(\log(n)\,n^{-2}\right) & \text{if } \alpha > 2. \end{cases}$$
Comparing these scalings to the ones from Lemma 4.2, we see that the former dominate, which finishes the proof.

4.2 Proofs for node degrees in the HSCM

In this section we give the proofs of Theorem 3.2 and Theorem 3.3. Denote by $D_i$ the degree of node $i$, and recall that $D_n$ is the degree of a node sampled uniformly at random. Since the node labels are interchangeable, we can, without loss of generality, consider $D_1$ instead of $D_n$.

For Theorem 3.3 we use (35). We show that if $X$ and $Y$ are independent with distribution $\mu_n$, then $E[\widehat{W}(X,Y)] \to \nu$. The final result then follows from Proposition 4.3.

The proof of Theorem 3.2 is more involved. Given the coordinates $X_1, \ldots, X_n$, the degree $D_1$ is a sum of independent Bernoulli random variables with success probabilities $W(X_1, X_j)$, $j = 2 \ldots n$. We follow the strategy of [64, Theorem 6.7] and construct a coupling between $D_n$ and a mixed Poisson random variable $P_n$ with mixing parameter $n\widehat{w}_n(X)$, where $\widehat{w}_n(x)$ is given by (33) and $X$ has distribution $\mu_n$. In general, a coupling between two random variables $X$ and $Y$ consists of a pair of new random variables $(\widehat{X}, \widehat{Y})$, with some joint probability distribution, such that $\widehat{X}$ and $\widehat{Y}$ have the same marginal distributions as, respectively, $X$ and $Y$. The advantage of a coupling is that we can tune the joint distribution to fit our needs. For our proof we construct a coupling $(\widehat{D}_n, \widehat{P}_n)$ such that
$$\lim_{n\to\infty} P\left(\widehat{D}_n \neq \widehat{P}_n\right) = 0.$$
Hence, since $\widehat{D}_n$ and $\widehat{P}_n$ have the same distributions as, respectively, $D_n$ and $P_n$, we have
$$P(D_n = k) = P(P_n = k) + P\left(\widehat{D}_n = k,\ \widehat{D}_n \neq \widehat{P}_n\right) - P\left(\widehat{P}_n = k,\ \widehat{D}_n \neq \widehat{P}_n\right),$$
from which it follows that $|P(D_n = k) - P(P_n = k)| \to 0$.
Finally, we show that
$$\lim_{n\to\infty} P\left(n\widehat{w}_n(X) > t\right) = P(Y > t),$$
where $Y$ is a Pareto random variable with shape $\alpha$ and scale $\nu\beta$, i.e., with probability density $P_Y(y)$ given by (26). This implies that the mixed Poisson random variable $P_n$ with mixing parameter $n\widehat{w}_n(X)$ converges to the mixed Poisson $D$ with mixing parameter $Y$, which proves the result.

Before we give the proofs of the two theorems, we first establish some technical results needed to construct the coupling required for Theorem 3.2, the proof of which is given in Section 4.2.2.

4.2.1 Technical results on Poisson couplings and concentration

We first establish a general result for couplings between mixed Poisson random variables whose mixing parameters converge in expectation.

Lemma 4.5. Let $X_n$ and $Y_n$ be random variables such that
$$\lim_{n\to\infty} E\left[\left|X_n - Y_n\right|\right] = 0.$$
Then, if $P_n$ and $Q_n$ are mixed Poisson random variables with, respectively, parameters $X_n$ and $Y_n$,
$$P(P_n \neq Q_n) = O\left(\sqrt{E\left[\left|X_n - Y_n\right|\right]}\right),$$
and in particular $\lim_{n\to\infty} P(P_n \neq Q_n) = 0$.

Proof. Let $a_n = E[|X_n - Y_n|]$ and define the event $A_n = \{|X_n - Y_n| \leq \sqrt{a_n}\}$. Then, since by Markov's inequality
$$\lim_{n\to\infty} P\left(|X_n - Y_n| > \sqrt{a_n}\right) \leq \lim_{n\to\infty} \frac{E\left[|X_n - Y_n|\right]}{\sqrt{a_n}} = \lim_{n\to\infty}\sqrt{a_n} = 0,$$
it is enough to show that $\lim_{n\to\infty} P(P_n \neq Q_n, A_n) = 0$. Take $\widehat{P}_n$ to be mixed Poisson with parameter $X_n + \sqrt{a_n}$. In addition, let $V_n$ and $Z_n$ be mixed Poisson with parameters $\max\{X_n + \sqrt{a_n} - Y_n, 0\}$ and $\sqrt{a_n}$, respectively. Then, since on $A_n$ we have $X_n + \sqrt{a_n} \geq Y_n$, we get, using Markov's inequality,
$$P(P_n \neq Q_n, A_n) \leq P\left(\widehat{P}_n \neq Q_n, A_n\right) + P\left(\widehat{P}_n \neq P_n, A_n\right) = P(V_n \neq 0, A_n) + P(Z_n \neq 0, A_n) \leq P(V_n \geq 1) + P(Z_n \geq 1) \leq E\left[\left|X_n + \sqrt{a_n} - Y_n\right|\right] + \sqrt{a_n} \leq E\left[\left|X_n - Y_n\right|\right] + 2\sqrt{a_n} = O\left(\sqrt{a_n}\right).$$
Since $a_n \to 0$ by assumption, this finishes the proof.
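The superposition coupling underlying the proof of Lemma 4.5 can be demonstrated directly for plain Poisson variables: if $\widehat{Z} \sim \mathrm{Poisson}(\epsilon)$ is independent of $\widehat{P} \sim \mathrm{Poisson}(\lambda)$ and $\widehat{Q} = \widehat{P} + \widehat{Z}$, then $\widehat{Q} \sim \mathrm{Poisson}(\lambda+\epsilon)$ and $P(\widehat{P} \neq \widehat{Q}) = 1 - e^{-\epsilon} \leq \epsilon$. A simulation sketch (ours, with assumed values $\lambda = 3$, $\epsilon = 0.2$):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, eps, m = 3.0, 0.2, 100_000         # assumed illustration values

P = rng.poisson(lam, size=m)
Z = rng.poisson(eps, size=m)
Q = P + Z                               # superposition: Q ~ Poisson(lam + eps)

p_neq = (P != Q).mean()                 # equals P(Z != 0) = 1 - exp(-eps) <= eps
print(p_neq, 1.0 - np.exp(-eps), eps)
```

The empirical disagreement probability matches $1 - e^{-\epsilon}$ and stays below $\epsilon$, mirroring how the proof bounds $P(P_n \neq Q_n)$ by the gap between the mixing parameters.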
Next we show that $\widehat{W}(X,Y)$ converges in expectation to $\widehat{w}_n(X)$ when $X, Y$ have distribution $\mu_n$. We also establish an upper bound on the rate of convergence, showing that it converges faster than $n^{-1}$.

Lemma 4.6. Let $X, Y$ be independent random variables with density $\mu_n$. Then, as $n\to\infty$,
$$E\left[\widehat{W}(X,Y) - \widehat{w}_n(X)\right] = O\left(\log(n)\,n^{-\alpha} + n^{-\frac{1+\alpha}{2}}\right).$$
Proof. Recall that
$$\widehat{w}_n(x) = \begin{cases} \omega_n e^{-x} & \text{if } 0 \leq x \leq R_n, \\ 0 & \text{otherwise,} \end{cases} \quad\text{where}\quad \omega_n = \int_0^{R_n} e^{-x}\,d\mu_n(x) = \frac{1 - e^{-(\alpha-1)R_n}}{\beta e^{R_n}}.$$
Hence it follows that
$$E\left[\widehat{W}(X,Y) - \widehat{w}_n(X)\right] = \int_{-\infty}^0\!\!\int_{-\infty}^{R_n} \widehat{W}(x,y)\,d\mu_n(y)\,d\mu_n(x) + \int_0^{R_n}\!\!\int_{-\infty}^{R_n}\left(\widehat{W}(x,y) - \widehat{w}_n(x)\right)d\mu_n(y)\,d\mu_n(x),$$
where the first integral is $O\left(n^{-(\alpha+1)/2}\right)$ by Lemma 4.1. To deal with the second integral, we first compute
$$\int_0^{R_n}\!\!\int_{-\infty}^{-x} d\mu_n(y)\,d\mu_n(x) = \alpha R_n e^{-2\alpha R_n} = O\left(\log(n)\,n^{-\alpha}\right).$$
Therefore we have
$$\int_0^{R_n}\!\!\int_{-\infty}^{R_n}\left(\widehat{W}(x,y) - \widehat{w}_n(x)\right)d\mu_n(y)\,d\mu_n(x) = \int_0^{R_n}\!\!\int_{-\infty}^{-x} d\mu_n(y)\,d\mu_n(x) + \int_0^{R_n}\!\!\int_{-x}^{R_n}\left(e^{-(x+y)} - \omega_n e^{-x}\right)d\mu_n(y)\,d\mu_n(x) \leq O\left(\log(n)\,n^{-\alpha}\right) + \int_0^{R_n}\!\!\int_0^{R_n} e^{-x}\left(e^{-y} - \omega_n\right)d\mu_n(y)\,d\mu_n(x) = O\left(\log(n)\,n^{-\alpha}\right) + \omega_n\int_0^{R_n}\left(e^{-y} - \omega_n\right)d\mu_n(y).$$
We proceed to compute the last integral and show that it is $O\left(n^{-(\alpha+1)/2}\right)$. For this we split the integration at the point $y = -\log(\omega_n)$, where $e^{-y} = \omega_n$. For the integral over $[0, -\log(\omega_n)]$ we compute
$$\int_0^{-\log(\omega_n)}\left(\omega_n - e^{-y}\right)d\mu_n(y) = \alpha e^{-\alpha R_n}\left(\omega_n^{1-\alpha}\left(\frac{1}{\alpha} - \frac{1}{\alpha-1}\right) - \frac{\omega_n}{\alpha} + \frac{1}{\alpha-1}\right) = \frac{e^{-\alpha R_n}}{\beta} - \omega_n e^{-\alpha R_n} - \frac{\omega_n^{1-\alpha} e^{-\alpha R_n}}{\alpha-1}.$$
Similar calculations yield
\[ \int_{-\log(\omega_n)}^{R_n} \left(e^{-y} - \omega_n\right) d\mu_n(y) = \frac{e^{-R_n}}{\beta} - \omega_n - \frac{\omega_n^{1-\alpha} e^{-\alpha R_n}}{\alpha - 1}, \]
and hence
\[ \int_0^{R_n} \left(e^{-y} - \omega_n\right) d\mu_n(y) = \frac{1}{\beta}\left(e^{-R_n} - e^{-\alpha R_n}\right) - \omega_n\left(1 - e^{-\alpha R_n}\right) \le \frac{1}{\beta}\left(e^{-R_n} + e^{-\alpha R_n}\right) - \omega_n. \]
We now use this last upper bound, together with
\[ \omega_n = \frac{1 - e^{-(\alpha-1)R_n}}{\beta e^{R_n}} = \frac{1}{\beta}\left(e^{-R_n} - e^{-\alpha R_n}\right), \]
to obtain
\begin{align*}
\omega_n \int_0^{R_n} \left(e^{-y} - \omega_n\right) d\mu_n(y) &\le \frac{\omega_n}{\beta}\left(e^{-R_n} + e^{-\alpha R_n}\right) - \omega_n^2 = \frac{1}{\beta^2}\left(e^{-2R_n} - e^{-2\alpha R_n}\right) - \frac{1}{\beta^2}\left(e^{-R_n} - e^{-\alpha R_n}\right)^2 \\
&= \frac{2}{\beta^2}\left(e^{-(\alpha+1)R_n} - e^{-2\alpha R_n}\right) \le \frac{2}{\beta^2}\, e^{-(\alpha+1)R_n} = O\left(n^{-\frac{1+\alpha}{2}}\right),
\end{align*}
from which the result follows.

4.2.2 Proof of Theorem 3.2

We start with constructing a coupling $(\hat D_n, \hat P_n)$ between $D_n$ and $P_n$ such that
\[ \lim_{n\to\infty} \Pr\left(\hat D_n \ne \hat P_n\right) = 0. \tag{45} \]
First, let $I_j$ be the indicator that the edge $(1,j)$ is present in $G_n$, and let $\mathbf{X}_n = X_1, \ldots, X_n$ denote the coordinates of the nodes. Conditioned on these, the $I_j$ are independent Bernoulli random variables with success probabilities $W(X_1, X_j)$, while $D_1 = \sum_{j=2}^n I_j$. Now, let $Q_n$ be mixed Poisson with parameter $\sum_{j=2}^n W(X_1, X_j)$. Then (see for instance [64, Theorem 2.10]) there exists a coupling $(\hat D_1, \hat Q_n)$ of $D_1$ and $Q_n$ such that
\[ \Pr\left(\hat D_1 \ne \hat Q_n \,\middle|\, \mathbf{X}_n\right) \le \sum_{j=2}^n W(X_1, X_j)^2. \]
Therefore, we get that
\[ \Pr\left(\hat D_1 \ne \hat Q_n\right) \le (n-1)\int\int_{-\infty}^{R_n} W(x,y)^2\, d\mu_n(y)\, d\mu_n(x) \le n\left(\int_{-\infty}^{R_n} e^{-2x}\, d\mu_n(x)\right)^2 = \begin{cases} O\left(n^{-(\alpha-1)}\right) & \text{if } 1 < \alpha < 2 \\ O\left(\log(n)\, n^{-1}\right) & \text{if } \alpha = 2 \\ O\left(n^{-1}\right) & \text{if } \alpha > 2. \end{cases} \]
Next, since $X_1$ and $X_j$ are independent for all $2 \le j \le n$, we use Proposition 4.3 together with Lemma 4.6 to obtain
\begin{align*}
\mathbb{E}\left[\left|\sum_{j=2}^n W(X_1, X_j) - n\hat w_n(X_1)\right|\right] &\le (n-1)\,\mathbb{E}\left[\left|W(X_1, X_2) - \hat w_n(X_1)\right|\right] + \mathbb{E}[\hat w_n(X_1)] \\
&\le (n-1)\,\mathbb{E}\left[\left|W(X_1, X_2) - \widehat W(X_1, X_2)\right|\right] + (n-1)\,\mathbb{E}\left[\left|\widehat W(X_1, X_2) - \hat w_n(X_1)\right|\right] + \mathbb{E}[\hat w_n(X_1)] \\
&= O\left(n^{-\frac{\alpha-1}{2}}\right),
\end{align*}
from which it follows that
\[ \lim_{n\to\infty} \mathbb{E}\left[\left|\sum_{j=2}^n W(X_1, X_j) - n\hat w_n(X_1)\right|\right] = 0. \tag{46} \]
Now let $\hat P_n$ be a mixed Poisson random variable with mixing parameter $n\hat w_n(X_1)$. Then by (46) and Lemma 4.5,
\[ \lim_{n\to\infty} \Pr\left(\hat P_n \ne \hat Q_n\right) = 0, \]
and (45) follows from
\[ \Pr\left(\hat D_n \ne \hat P_n\right) \le \Pr\left(\hat D_n \ne \hat Q_n\right) + \Pr\left(\hat P_n \ne \hat Q_n\right). \]
As a result we have
\[ \lim_{n\to\infty} \left|\Pr(D_n = k) - \Pr(P_n = k)\right| = \lim_{n\to\infty} \left|\Pr\left(\hat D_n = k\right) - \Pr\left(\hat P_n = k\right)\right| \le \lim_{n\to\infty} \Pr\left(\hat D_n \ne \hat P_n\right) = 0. \tag{47} \]
We will now prove that, if $X$ has density $\mu_n$, then for any $t > 0$,
\[ \lim_{n\to\infty} \left|\Pr(n\hat w_n(X) > t) - \Pr(Y > t)\right| = 0. \tag{48} \]
A sequence $\{P_n\}_{n\ge1}$ of mixed Poisson random variables with mixing parameters $Z_n$ converges to a mixed Poisson random variable $P$ with mixing parameter $Z$ if $Z_n$ converges to $Z$ in distribution; see for instance [63]. Therefore, since $D$ is mixed Poisson with mixing parameter $Y$, (48) implies
\[ \lim_{n\to\infty} \Pr(P_n = k) = \Pr(D = k), \]
which, combined with (47), yields
\[ \lim_{n\to\infty} \Pr(D_n = k) = \Pr(D = k). \]
To establish (48), first define $\epsilon_n = e^{-(\alpha-1)R_n}$, so that
\[ n\hat w_n(x) = \frac{n}{\beta e^{R_n}}\left(1 - e^{-(\alpha-1)R_n}\right) e^{-x} = \sqrt{\nu n}\, e^{-x}\, (1 - \epsilon_n). \]
Next, for all $0 \le x \le R_n$ we have
\[ \nu\beta(1-\epsilon_n) \le n\hat w_n(x) \le \sqrt{\nu n}\,(1-\epsilon_n), \]
and hence
\[ \Pr(n\hat w_n(X) > t) = \begin{cases} \Pr\left(X < \log\frac{n\omega_n}{t}\right) & \text{if } \nu\beta(1-\epsilon_n) \le t \le \sqrt{\nu n}\,(1-\epsilon_n) \\ 1 & \text{if } t < \nu\beta(1-\epsilon_n) \\ 0 & \text{otherwise.} \end{cases} \]
Moreover, for any $\nu\beta(1-\epsilon_n) \le t \le \sqrt{\nu n}\,(1-\epsilon_n)$,
\[ \Pr(n\hat w_n(X) > t) = \Pr\left(X < \log\frac{n\omega_n}{t}\right) = \int_{-\infty}^{\log(n\omega_n/t)} d\mu_n(x) = e^{-\alpha R_n}\left(\frac{n\omega_n}{t}\right)^\alpha = \left(\frac{n\omega_n e^{-R_n}}{t}\right)^\alpha = \left(\frac{\nu\beta}{t}\right)^\alpha (1-\epsilon_n)^\alpha. \tag{49} \]
Now, fix $t < \nu\beta$. Then, for large enough $n$, it holds that $t < \nu\beta(1-\epsilon_n)$, in which case (48) holds trivially, since both probabilities are 1. Hence we can assume, without loss of generality, that $t \ge \nu\beta$.
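The limiting degree variable $D$ appearing here, a Poisson random variable whose parameter is itself Pareto-distributed, can be simulated directly. The sketch below (the samplers and all parameter values are our illustrative choices; it assumes the parameterization read off from (49), shape $\alpha$ and scale $\nu\beta$ with $\beta = (\alpha-1)/\alpha$) illustrates that $D$ then has mean $\nu$ and a tail falling off as $t^{-\alpha}$:

```python
import math
import random

alpha, nu = 2.5, 10.0                  # illustrative choices, not from the paper
beta = (alpha - 1) / alpha
scale = nu * beta                      # Pareto scale; mean = scale*alpha/(alpha-1) = nu

rng = random.Random(0)

def pareto(rng):
    """Inverse-CDF sample of Pareto(shape=alpha, scale=scale)."""
    return scale * (1.0 - rng.random()) ** (-1.0 / alpha)

def poisson(m, rng):
    """Poisson sampler: inversion for small means, normal approximation for large."""
    if m > 50:
        return max(0, round(rng.gauss(m, math.sqrt(m))))
    L, k, p = math.exp(-m), 0, 1.0
    while True:
        p *= rng.random()
        if p < L:
            return k
        k += 1

samples = [poisson(pareto(rng), rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
assert abs(mean - nu) < 0.2            # E[D] = E[Y] = nu

# Power-law tail: P(D > t) ~ (scale/t)^alpha, so doubling t divides the
# tail by roughly 2**alpha (about 5.66 here).
tail20 = sum(s > 20 for s in samples)
tail40 = sum(s > 40 for s in samples)
assert 4.0 < tail20 / tail40 < 7.5
```

That the sampled mean equals $\nu$ is the discrete reflection of Theorem 3.3, and the tail exponent $\alpha$ is the power-law exponent of the resulting degree distribution.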
Then, for $n$ large enough such that $t \le \sqrt{\nu n}\,(1-\epsilon_n)$, it follows from (49) and (27) that
\[ \left|\Pr(n\hat w_n(X) > t) - \Pr(Y > t)\right| = \left|\left(\frac{\nu\beta}{t}\right)^\alpha (1-\epsilon_n)^\alpha - \left(\frac{\nu\beta}{t}\right)^\alpha\right| = \left(\frac{\nu\beta}{t}\right)^\alpha \left(1 - (1-\epsilon_n)^\alpha\right) \le 1 - (1-\epsilon_n)^\alpha, \]
which converges to zero as $n\to\infty$ and hence proves (48).

4.2.3 Proof of Theorem 3.3

First, using Lemma 4.1, we compute
\begin{align*}
(n-1)\,\mathbb{E}\left[\widehat W(X,Y)\right] &= (n-1)\int\int_{-\infty}^{R_n} \widehat W(x,y)\, d\mu_n(y)\, d\mu_n(x) \\
&= (n-1)\int\int_0^{R_n} e^{-(x+y)}\, d\mu_n(y)\, d\mu_n(x) + O\left(n^{-\frac{\alpha-1}{2}}\right) \\
&= \frac{n-1}{\beta^2}\, e^{-2R_n}\left(1 - e^{-(\alpha-1)R_n}\right)^2 + O\left(n^{-\frac{\alpha-1}{2}}\right) \\
&= \nu\left(1 - \frac{1}{n}\right)\left(1 - e^{-(\alpha-1)R_n}\right)^2 + O\left(n^{-\frac{\alpha-1}{2}}\right) = \nu + O\left(n^{-\frac{\alpha-1}{2}}\right),
\end{align*}
where the last line follows since $-1 < -\alpha/2 < -(\alpha-1)/2$. Next, recall (35):
\[ \mathbb{E}[D_n] = \sum_{j=2}^n \mathbb{E}[W(X_1, X_j)] = (n-1)\,\mathbb{E}[W(X,Y)]. \]
Therefore, using Lemma 4.6, we have
\[ \left|\mathbb{E}[D_n] - \nu\right| = \left|(n-1)\,\mathbb{E}[W(X,Y)] - \nu\right| \le (n-1)\,\mathbb{E}\left[\left|W(X,Y) - \widehat W(X,Y)\right|\right] + \left|(n-1)\,\mathbb{E}\left[\widehat W(X,Y)\right] - \nu\right| \le O\left(\log(n)\, n^{-(\alpha-1)}\right) + O\left(n^{-\frac{\alpha-1}{2}}\right), \]
which yields the result.

4.3 Proofs for graphon entropy

Here we derive the properties of the graphon entropy $\sigma$ of our graphon $W$ given by (21). We first give the proof of Proposition 3.4, and then that of Theorem 3.5.

4.3.1 Proof of Proposition 3.4

Recall that, given a measure $\mu$ on some interval $A \subseteq \mathbb{R}$ with $\mu(A) < \infty$ and a $\mu$-integrable function $w(x)$, we consider the problem of maximizing $\sigma[W, \mu]$ under the constraint (36),
\[ w(x) = \int_A W(x,y)\, d\mu(y). \]
In particular, we need to show that if a solution exists, it is given by (37). Suppose, therefore, that there exists at least one graphon $W$ satisfying the constraint. We use the technique of Lagrange multipliers from the calculus of variations [65, 66]. To set up the framework, let $\mathcal{W}$ denote the space of symmetric functions $W: A \times A \to [0,1]$ which satisfy
\[ \int\int_{A^2} W(x,y)\, d\mu(y)\, d\mu(x) < \infty. \]
Observe that $\mathcal{W}$ is a convex subset of the Banach space $\mathcal{W}_{\mathbb{R}}$ of all symmetric, $\mu$-integrable functions $W: A \times A \to \mathbb{R}$, and that the function $w$ is an element of $L^1$ on $A$ with respect to the measure $\mu$, which is also a Banach space. Denote the latter space by $L^1_{A,\mu}$. We slightly abuse notation and write $\sigma$ for the functional
\[ W \mapsto \int\int_{A^2} H(W(x,y))\, d\mu(y)\, d\mu(x) = \sigma[W, \mu], \]
and define the functional $F: \mathcal{W}_{\mathbb{R}} \to L^1_{A,\mu}$ by
\[ F(W)(x) = w(x) - \int_A W(x,y)\, d\mu(y). \]
Then we need to solve the following Euler-Lagrange equation for some Lagrange multiplier functional $\Lambda: L^1_{A,\mu} \to \mathbb{R}$,
\[ \frac{\partial}{\partial W}\left(\sigma(W) - \Lambda \circ F(W)\right) = 0, \]
with respect to the Fréchet derivative. By the Riesz representation theorem, for any functional $\Lambda: L^1_{A,\mu} \to \mathbb{R}$ there exists a $\lambda \in L^\infty$, uniquely defined on $A$, such that
\[ \Lambda(f) = \int_A \lambda(x) f(x)\, d\mu(x). \]
Hence our Euler-Lagrange equation becomes
\[ \frac{\partial}{\partial W}\left(\sigma(W) - \int_A \lambda(x)\left(w(x) - \int_A W(x,y)\, d\mu(y)\right) d\mu(x)\right) = 0. \tag{50} \]
In addition, since $W$ is symmetric, we have
\[ \int\int_{A^2} \lambda(x) W(x,y)\, d\mu(y)\, d\mu(x) = \frac{1}{2} \int\int_{A^2} \left(\lambda(x) + \lambda(y)\right) W(x,y)\, d\mu(y)\, d\mu(x), \]
and hence, by absorbing the factor $1/2$ into $\lambda$, we can rewrite our Euler-Lagrange equation as
\[ \frac{\partial}{\partial W}\left(\sigma(W) - \int_A \lambda(x) w(x)\, d\mu(x) + \int\int_{A^2} \left(\lambda(x) + \lambda(y)\right) W(x,y)\, d\mu(y)\, d\mu(x)\right) = 0. \tag{51} \]
For the two derivatives we have
\begin{align*}
\frac{\partial \sigma(W)}{\partial W} &= \log(1 - W(x,y)) - \log(W(x,y)) = -\log\frac{W(x,y)}{1 - W(x,y)}, \\
\frac{\partial}{\partial W}\left(\int_A \lambda(x) w(x)\, d\mu(x) + \int\int_{A^2} \left(\lambda(x) + \lambda(y)\right) W(x,y)\, d\mu(y)\, d\mu(x)\right) &= \lambda(x) + \lambda(y). \tag{52}
\end{align*}
Hence, we need to solve the equation
\[ -\log\frac{W(x,y)}{1 - W(x,y)} = \lambda(x) + \lambda(y), \]
which gives (37). There is, however, a small technicality related to the computation of the derivative (52), caused by the fact that $H$ is only defined on $[0,1]$.
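The stationarity condition just derived can be probed numerically on a discretized toy problem (the grid size, the multipliers $\lambda$, and the perturbation below are our illustrative choices): a graphon of the form (37), $W = 1/(1 + e^{\lambda(x)+\lambda(y)})$, satisfies $-\log(W/(1-W)) = \lambda(x)+\lambda(y)$ exactly, and any symmetric perturbation that preserves the (discrete, uniform-measure) marginal constraint lowers the total entropy:

```python
import math
import random

H = lambda p: -p * math.log(p) - (1 - p) * math.log(1 - p)  # Bernoulli entropy

random.seed(1)
n = 5
lam = [random.uniform(-1.0, 1.0) for _ in range(n)]          # arbitrary multipliers
W = [[1.0 / (1.0 + math.exp(lam[i] + lam[j])) for j in range(n)] for i in range(n)]

# Stationarity: -log(W/(1-W)) equals lam_i + lam_j entry-wise.
for i in range(n):
    for j in range(n):
        assert abs(-math.log(W[i][j] / (1 - W[i][j])) - (lam[i] + lam[j])) < 1e-12

# A symmetric, constraint-preserving perturbation: P = eps * u u^T with
# sum(u) = 0 keeps every row sum of W fixed, yet lowers the entropy.
u = [1.0, -1.0, 0.0, 0.0, 0.0]
eps = 1e-3
Wp = [[W[i][j] + eps * u[i] * u[j] for j in range(n)] for i in range(n)]
total = lambda M: sum(H(M[i][j]) for i in range(n) for j in range(n))
assert total(Wp) < total(W)            # the form (37) is the entropy maximizer
```

The first-order entropy change vanishes because $\sum_{ij} (\lambda_i + \lambda_j) u_i u_j = 0$ when $\sum_j u_j = 0$, and the second-order change is negative by strict concavity of $H$, mirroring the variational argument above.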
In particular, for $W_1, W_2 \in \mathcal{W}$ it could be that $W_1 + W_2 > 1$ on some subset $C \subseteq A \times A$, so that $H(W_1 + W_2)$ is not well defined. To compute the Fréchet derivative we need $H(W_1 + \varepsilon W_2)$ to be well defined for any $W_1, W_2 \in \mathcal{W}$ and some small enough $\varepsilon > 0$. To this end we fix $0 < \delta < 1$ and define
\[ H_\delta(x) = H\left((1-\delta)x + \frac{\delta}{2}\right), \]
which is just $H$, stretched out to the interval $\left(-\frac{\delta}{2(1-\delta)},\; 1 + \frac{\delta}{2(1-\delta)}\right)$. Similarly, we define $\sigma_\delta$ using $H_\delta$ and consider the corresponding graphon entropy maximization problem. Then we can compute $\partial \sigma_\delta / \partial W$ by taking a $W' \in \mathcal{W}$ and $\varepsilon > 0$ such that $W + \varepsilon W' < 1 + \frac{\delta}{2(1-\delta)}$. Then $H_\delta(W + \varepsilon W')$ is well defined, and using the chain rule we obtain
\[ \frac{\partial H_\delta(W + \varepsilon W')}{\partial \varepsilon} = -(1-\delta)\, W' \log\frac{(1-\delta)(W + \varepsilon W') + \delta/2}{1 - (1-\delta)(W + \varepsilon W') - \delta/2}, \]
from which it follows that
\[ \frac{\partial \sigma_\delta(W)}{\partial W} = -(1-\delta)\log\frac{(1-\delta)W + \delta/2}{1 - (1-\delta)W - \delta/2}. \]
Therefore we have the equation
\[ -(1-\delta)\log\frac{(1-\delta)W + \delta/2}{1 - (1-\delta)W - \delta/2} = \lambda(x) + \lambda(y), \]
which leads to a solution $W_\delta$ of the form
\[ W_\delta(x,y) = \frac{1 - \frac{\delta}{2}\left(1 + e^{\frac{\lambda(x)+\lambda(y)}{1-\delta}}\right)}{(1-\delta)\left(1 + e^{\frac{\lambda(x)+\lambda(y)}{1-\delta}}\right)}. \]
Since $\delta < 1$ it follows, using elementary algebra, that $W_\delta(x,y) \in [0,1]$ for all $x, y \in A$. From this we conclude that $W_\delta \in \mathcal{W}$. Moreover, $W_\delta$ converges to (37) as $\delta \to 0$. Since $\sigma_\delta \to \sigma$ as $\delta \to 0$, we obtain the graphon that maximizes the entropy $\sigma$, where the function $\lambda$ is determined by the constraint (36).

For uniqueness, suppose there exist two solutions, $\lambda_1$ and $\lambda_2$ in $L^1_{A,\mu}$, of (36) for which the graphon entropy is maximized. Let $f(x) = \lambda_1(x) - \lambda_2(x) \in L^1_{A,\mu}$, so that $\lambda_1 = \lambda_2 + f$. Since $\lambda_1$ satisfies (50), it follows, by linearity of the derivative, that
\[ \frac{\partial}{\partial W} \int\int_{A^2} f(x)\, W(x,y)\, d\mu(y)\, d\mu(x) = 0. \]
Now, since
\[ \frac{\partial}{\partial W} \int\int_{A^2} f(x)\, W(x,y)\, d\mu(y)\, d\mu(x) = f(x), \]
it follows that $f = 0$ $\mu$-almost everywhere on $A$, and hence $\lambda_1 = \lambda_2$ $\mu$-almost everywhere on $A$.

4.3.2 Proof of Theorem 3.5

First note that Proposition 4.4 implies that the difference in expectation between $H(W)$ and $H(\widehat W)$ converges to zero faster than $\log(n)/n$, so that for the purpose of Theorem 3.5 we can approximate $W$ by $\widehat W$. Hence we are left to show that the rescaled entropy $n\sigma[\widehat W, \mu_n]/\log(n)$ converges to $\nu$. By Lemma 4.2, the integration over all regimes except $[0,R_n]^2$ goes to zero faster than $\log(n)/n$, and therefore we only need to consider integration over $[0,R_n]^2$. The main idea for the rest of the proof is that in this range
\[ H\left(\widehat W(x,y)\right) \approx (x+y)\,\widehat W(x,y). \]
Let us first compute the integral over $[0,R_n]^2$ of the right-hand side of this approximation:
\begin{align*}
\int\int_0^{R_n} (x+y)\,\widehat W(x,y)\, d\mu_n(y)\, d\mu_n(x) &= \alpha^2 e^{-2\alpha R_n} \int\int_0^{R_n} (x+y)\, e^{(\alpha-1)(x+y)}\, dy\, dx \\
&= \frac{2\, e^{-2(1+\alpha)R_n}}{\beta^2(\alpha-1)}\left((\alpha-1) R_n e^{2\alpha R_n} + e^{(\alpha+1)R_n} - e^{2R_n} - e^{2\alpha R_n}\right) \\
&= \frac{2R_n}{\beta^2}\, e^{-2R_n} + \frac{2\, e^{-(\alpha+1)R_n}}{\beta^2(\alpha-1)} - \frac{2}{\beta^2(\alpha-1)}\left(e^{-2\alpha R_n} + e^{-2R_n}\right) \\
&= \frac{2R_n\, e^{-2R_n}}{\beta^2} + O\left(n^{-1}\right) = \nu\, n^{-1}\log(n) + O\left(n^{-1}\right),
\end{align*}
which implies that
\[ \lim_{n\to\infty} \frac{n}{\log(n)} \int\int_0^{R_n} (x+y)\,\widehat W(x,y)\, d\mu_n(y)\, d\mu_n(x) = \nu. \tag{53} \]
Next we show that
\[ \frac{n}{\log(n)} \int\int_0^{R_n} \left(H\left(\widehat W(x,y)\right) - (x+y)\,\widehat W(x,y)\right) d\mu_n(y)\, d\mu_n(x) = O\left(\log(n)^{-1}\right), \tag{54} \]
which, together with (53), gives
\[ \lim_{n\to\infty} \frac{n}{\log(n)} \int\int_0^{R_n} H\left(\widehat W(x,y)\right) d\mu_n(y)\, d\mu_n(x) = \nu. \tag{55} \]
We compute that
\begin{align*}
H\left(\widehat W(x,y)\right) &= e^{-(x+y)}(x+y) - \left(1 - e^{-(x+y)}\right)\log\frac{e^{x+y} - 1}{e^{x+y}} \\
&= (x+y)\,\widehat W(x,y) - \left(1 - e^{-(x+y)}\right)\log\left(1 - e^{-(x+y)}\right). \tag{56}
\end{align*}
Note that $-(1 - e^{-z})\log(1 - e^{-z}) \le e^{-z}$ for all $z \ge 0$.
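Both the identity (56) and the bound just noted are elementary to verify; with $z = x+y$ and $\widehat W = e^{-z}$, a quick numerical check:

```python
import math

H = lambda p: -p * math.log(p) - (1 - p) * math.log(1 - p)  # Bernoulli entropy

for z in [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]:
    w = math.exp(-z)                   # value of the graphon approximation at x + y = z
    # Identity (56): H(e^{-z}) = z e^{-z} - (1 - e^{-z}) log(1 - e^{-z})
    assert abs(H(w) - (z * w - (1 - w) * math.log(1 - w))) < 1e-12
    # The correction term is at most e^{-z}, so z * W_hat dominates H(W_hat)
    # for large z = x + y, which is the regime that matters on [0, R_n]^2.
    assert -(1 - w) * math.log(1 - w) <= w + 1e-15
```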
Hence, it follows from (56) that, on $[0,R_n]^2$,
\[ H\left(\widehat W(x,y)\right) - (x+y)\,\widehat W(x,y) = -\left(1 - e^{-(x+y)}\right)\log\left(1 - e^{-(x+y)}\right) \le e^{-(x+y)} = \widehat W(x,y), \]
so that by Theorem 3.3,
\[ \frac{n}{\log(n)} \int\int_0^{R_n} \left(H\left(\widehat W(x,y)\right) - (x+y)\,\widehat W(x,y)\right) d\mu_n(y)\, d\mu_n(x) \le \frac{n}{\log(n)} \int\int_0^{R_n} \widehat W(x,y)\, d\mu_n(y)\, d\mu_n(x) = O\left(\log(n)^{-1}\right). \]
To conclude, we have
\[ \sigma[\widehat W, \mu_n] = \int\int_{-\infty}^{R_n} H\left(\widehat W(x,y)\right) d\mu_n(y)\, d\mu_n(x) = \int\int_0^{R_n} H\left(\widehat W(x,y)\right) d\mu_n(y)\, d\mu_n(x) + O\left(n^{-\alpha} + \log(n)\, n^{-\frac{\alpha+1}{2}}\right), \]
and hence, using (55),
\[ \lim_{n\to\infty} \frac{n\,\sigma[\widehat W, \mu_n]}{\log(n)} = \nu. \]

4.4 Proof of Theorem 3.6

In this section we first formalize the strategy behind the proof of Theorem 3.6, briefly discussed in Section 3.6. This strategy relies on partitioning the interval $A_n$ into non-overlapping subintervals. We then construct a specific partition satisfying certain requirements, and finish the proof of the theorem.

4.4.1 Averaging W by a partition of $A_n$

We follow the strategy from [41]. First recall that for a graph $G_n$ generated by the HSCM,
\[ S[G_n] \ge \mathbb{E}\left[S[G_n \mid \mathbf{x}_n]\right] = \binom{n}{2}\,\mathbb{E}[H(W(x_1, x_2))] = \binom{n}{2}\,\sigma[G_n], \]
and hence $S^*[G_n] \ge \sigma[G_n]$, where $S^*[G_n] = S[G_n]/\binom{n}{2}$ denotes the normalized Gibbs entropy. Therefore, the key ingredient is to find a matching upper bound. For this we partition the range $A_n = (-\infty, R_n]$ of our probability measure $\mu_n$ into intervals and approximate $W(x,y)$ by its average over the box in which $x$ and $y$ lie.

To be more precise, let $m_n$ be any increasing sequence of positive integers, let $\{\rho_t\}_{0 \le t \le m_n}$ be such that $\rho_0 = -\infty$ and $\rho_{m_n} = R_n$, and consider the partition of $(-\infty, R_n]$ given by $I_t = (\rho_{t-1}, \rho_t]$ for $1 \le t \le m_n$. Now define $J_n(x) = t \iff x \in I_t$, and let $M_i$ be the random variable $J_n(X_i)$, where $X_i$ has density $\mu_n$, for any vertex $i$.
The value of $M_i$ equal to $t$ indicates that vertex $i$ happens to lie within interval $I_t$. Denoting the vector of these random variables by $\mathbf{M}_n = M_1, \ldots, M_n$, and their entropy by $S[\mathbf{M}_n]$, we have
\[ S[G_n] \le S[\mathbf{M}_n] + S[G_n \mid \mathbf{M}_n] \le n\, S[M] + \binom{n}{2} \int\int_{-\infty}^{R_n} H\left(\widetilde W_n(x,y)\right) d\mu_n(y)\, d\mu_n(x), \]
where $\widetilde W_n(x,y)$ is the average of $W$ over the square $I_{J_n(x)} \times I_{J_n(y)}$. That is,
\[ \widetilde W_n(x,y) = \frac{1}{\mu_{x,y}} \int_{I_{J_n(x)}} \int_{I_{J_n(y)}} W(u,v)\, d\mu_n(v)\, d\mu_n(u), \tag{57} \]
with
\[ \mu_{x,y} = \int_{I_{J_n(x)}} \int_{I_{J_n(y)}} d\mu_n(v)\, d\mu_n(u) \]
the measure of the box to which $(x,y)$ belongs.

The first step in our proof is to investigate how well $\widetilde W_n$ approximates $W$. More specifically, we want to understand how the difference $|\sigma[W, \mu_n] - \sigma[\widetilde W_n, \mu_n]|$ scales, depending on the specific partition. Note that for any partition $\rho_t$ of $A_n$ into $m_n$ intervals we have
\[ S[M] = -\sum_{t=1}^{m_n} \Pr(M = t)\log(\Pr(M = t)) \le \log(m_n), \]
where the upper bound is achieved on the partition that is uniform according to the measure $\mu_n$. Since
\[ \frac{n}{\log(n)} \cdot \frac{n\, S[M]}{\binom{n}{2}} = \frac{2\, S[M]}{\log(n)(1 - 1/n)}, \]
it is enough to find a partition $\rho_t$, with $\log(m_n) = o(\log(n))$, such that
\[ \lim_{n\to\infty} \frac{n}{\log(n)} \left|\sigma[W, \mu_n] - \sigma[\widetilde W_n, \mu_n]\right| = 0. \tag{58} \]
This then proves Theorem 3.6, since
\begin{align*}
\lim_{n\to\infty} \frac{n}{\log(n)} \left|S^*[G_n] - \sigma[G_n]\right| &= \lim_{n\to\infty} \frac{n}{\log(n)} \left(S^*[G_n] - \sigma[G_n]\right) \\
&\le \lim_{n\to\infty} \left(\frac{n}{\log(n)} \cdot \frac{n\, S[M]}{\binom{n}{2}} + \frac{n}{\log(n)} \left|\sigma[W, \mu_n] - \sigma[\widetilde W_n, \mu_n]\right|\right) \\
&\le \lim_{n\to\infty} \left(\frac{2\log(m_n)}{\log(n)(1 - 1/n)} + \frac{n}{\log(n)} \left|\sigma[W, \mu_n] - \sigma[\widetilde W_n, \mu_n]\right|\right) = 0.
\end{align*}

4.4.2 Constructing the partition

We will take $I_1 = (-\infty, -R_n]$ and partition the remaining interval $[-R_n, R_n]$ into $\lceil \log(n)^2 \rceil$ equal parts.
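The averaging construction of Section 4.4.1 can be illustrated numerically. In the sketch below (which, for simplicity, uses the uniform measure on $[-R, R]$ in place of $\mu_n$; the value of $R$, the grid, and the partition sizes are our choices), block-averaging $W$ can only increase the graphon entropy, by Jensen's inequality and the concavity of $H$, and refining the partition shrinks the gap, which is the mechanism behind (58):

```python
import math

H = lambda p: -p * math.log(p) - (1 - p) * math.log(1 - p)
W = lambda x, y: 1.0 / (math.exp(x + y) + 1.0)              # the graphon (21)

def entropies(R, m, grid=200):
    """Entropy of W and of its block average over m equal intervals of
    [-R, R], both w.r.t. the uniform measure (a stand-in for mu_n)."""
    h = 2.0 * R / grid
    xs = [-R + (i + 0.5) * h for i in range(grid)]
    block = lambda x: min(int((x + R) / (2.0 * R / m)), m - 1)
    num = [[0.0] * m for _ in range(m)]
    cnt = [[0] * m for _ in range(m)]
    for x in xs:
        for y in xs:
            s, t = block(x), block(y)
            num[s][t] += W(x, y)
            cnt[s][t] += 1
    avg = [[num[s][t] / cnt[s][t] for t in range(m)] for s in range(m)]
    sW = sum(H(W(x, y)) for x in xs for y in xs) / grid**2
    sWa = sum(H(avg[block(x)][block(y)]) for x in xs for y in xs) / grid**2
    return sW, sWa

sW, coarse = entropies(3.0, 4)
_, fine = entropies(3.0, 16)
assert coarse >= sW and fine >= sW         # averaging cannot decrease the entropy
assert abs(fine - sW) < abs(coarse - sW)   # refining the partition shrinks the gap
```

In the proof itself, the partition is of size $m_n \approx \log(n)^2$, fine enough that the entropy gap is $o(\log(n)/n)$ while $\log(m_n)$ remains $o(\log(n))$.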
To this end, let $n$ be sufficiently large so that $n \ge \nu\beta^2$, take $m_n = \lceil \log(n)^2 \rceil + 1$, and define the partition $\rho_t$ by $\rho_0 = -\infty$, $\rho_1 = -R_n$, and
\[ \rho_t = \rho_{t-1} + \frac{2R_n}{m_n - 1} \quad \text{for all } t = 2, \ldots, m_n. \]
Note that $\log(m_n) = O(\log\log n) = o(\log(n))$, so that all that is left is to prove (58). In addition,
\[ \frac{n}{\log(n)} \int_{-\infty}^{-R_n} \int_{-\infty}^{R_n} H\left(\widetilde W_n(x,y)\right) d\mu_n(y)\, d\mu_n(x) = O\left(\log(n)^{-1}\, n^{1-\alpha}\right), \]
and the same holds if we replace $\widetilde W_n$ with $W$. Hence it follows that, in order to establish (58), we only need to consider the integral over the square $[-R_n, R_n] \times [-R_n, R_n]$. That is, we need to show
\[ \lim_{n\to\infty} \frac{n}{\log(n)} \int\int_{-R_n}^{R_n} \left|H(W(x,y)) - H\left(\widetilde W_n(x,y)\right)\right| d\mu_n(y)\, d\mu_n(x) = 0. \tag{59} \]
For this we compare $\sigma[W, \mu_n]$ and $\sigma[\widetilde W_n, \mu_n]$, based on the mean value theorem, which states that for any $a \le b$ there exists $c \in [a,b]$ such that
\[ H(b) - H(a) = H'(c)\,(b - a). \]
Here $|H'(c)| = |\log(c) - \log(1-c)|$, and due to the symmetry of $H$ we get, for any $0 < a \le c \le b < 1$,
\[ |H'(c)| \le \frac{\min\{H(a), H(b)\}}{\min\{a, b, 1-a, 1-b\}}. \tag{60} \]
Note that
\[ 0 < \min_{u \in I_{J_n(x)},\, v \in I_{J_n(y)}} W(u,v) \le \widetilde W_n(x,y) \le \max_{u \in I_{J_n(x)},\, v \in I_{J_n(y)}} W(u,v) < 1 \tag{61} \]
for all $x, y \in [-R_n, R_n]$. In addition, for all $x, y \in [-R_n, R_n]$ and $(u,v) \in I_{J_n(x)} \times I_{J_n(y)}$ we have $|x + y - u - v| \le 4R_n/(m_n - 1) \le 2/\log(n)$, and thus $|1 - e^{x+y-u-v}| \le 3|x+y-u-v|$ by the mean value theorem. Therefore we have
\[ |W(x,y) - W(u,v)| = \frac{\left|e^{u+v} - e^{x+y}\right|}{\left(1 + e^{x+y}\right)\left(1 + e^{u+v}\right)} \le 3\,|u+v-x-y| \min\{W(u,v),\, 1 - W(u,v)\}. \]
By symmetry we obtain a similar upper bound with $W(x,y)$ instead of $W(u,v)$, and hence we conclude
\[ |W(x,y) - W(u,v)| \le 3\,|u+v-x-y| \min\{W(x,y),\, 1 - W(x,y),\, W(u,v),\, 1 - W(u,v)\}. \tag{62} \]
Then, for any $x+y \le u+v$ there are $c, d$ with $x+y \le c+d \le u+v$ such that, by (60) and (62),
\begin{align*}
|H(W(x,y)) - H(W(u,v))| &= \left|H'(W(c,d))\right|\,|W(x,y) - W(u,v)| \\
&\le \frac{\min\{H(W(x,y)),\, H(W(u,v))\}\; |W(x,y) - W(u,v)|}{\min\{W(x,y),\, 1 - W(x,y),\, W(u,v),\, 1 - W(u,v)\}} \\
&\le 3\,|x+y-u-v| \cdot \min\{H(W(x,y)),\, H(W(u,v))\}. \tag{63}
\end{align*}
Next, for the partition $\rho_t$ we have, for $t \ge 2$,
\[ |I_t| = \frac{2R_n}{m_n - 1} \le \frac{2R_n}{\log(n)^2} = \frac{\log(n) - \log(\nu\beta^2)}{\log(n)^2} \le \frac{1}{\log(n)}. \]
In addition, (61) implies that for $(x,y) \in I_t \times I_s$, with $t, s \ge 2$,
\[ W(\rho_t, \rho_s) \le \widetilde W_n(x,y),\, W(x,y) \le W(\rho_{t-1}, \rho_{s-1}), \]
and thus there is a pair $(x_{s,t}, y_{s,t}) \in I_s \times I_t$ such that $\widetilde W_n(x,y) = W(x_{s,t}, y_{s,t})$ on $I_s \times I_t$. Therefore, using (63), we get
\begin{align*}
\left|H(W(x,y)) - H\left(\widetilde W_n(x,y)\right)\right| &= |H(W(x,y)) - H(W(x_{s,t}, y_{s,t}))| \le 3\,|x + y - x_{s,t} - y_{s,t}|\, H(W(x,y)) \\
&\le 3\left(|I_t| + |I_s|\right) H(W(x,y)) \le \frac{6}{\log(n)}\, H(W(x,y)).
\end{align*}
Finally, integrating this bound over the whole square $[-R_n, R_n] \times [-R_n, R_n]$, we obtain
\[ \int\int_{-R_n}^{R_n} \left|H(W(x,y)) - H\left(\widetilde W_n(x,y)\right)\right| d\mu_n(y)\, d\mu_n(x) \le \frac{6}{\log(n)}\, \sigma[W, \mu_n], \]
which proves (59), since
\[ \lim_{n\to\infty} \frac{n}{\log(n)} \left|\sigma[W, \mu_n] - \sigma[\widetilde W_n, \mu_n]\right| \le \lim_{n\to\infty} \frac{6n}{(\log n)^2}\, \sigma[W, \mu_n] = 0, \]
thanks to Theorem 3.5.

Acknowledgments

This work was supported by ARO grant No. W911NF-16-1-0391 and by NSF grant No. CNS-1442999.

References

[1] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex Networks: Structure and Dynamics. Phys Rep, 424:175–308, 2006. doi:10.1016/j.physrep.2005.10.009.
[2] M. E. J. Newman. Networks: An Introduction. Oxford University Press, Oxford, 2010.
[3] Albert-László Barabási. Network science. Cambridge University Press, Cambridge, UK, 2016.
[4] Ray Solomonoff and Anatol Rapoport. Connectivity of random nets.
B Math Biophys, 13(2):107–117, 1951. doi:10.1007/BF02478357.
[5] E. N. Gilbert. Random Graphs. Ann Math Stat, 30(4):1141–1144, 1959. doi:10.1214/aoms/1177706098.
[6] P. Erdős and A. Rényi. On Random Graphs. Publ Math, 6:290–297, 1959.
[7] Edward A. Bender and E. Rodney Canfield. The asymptotic number of labeled graphs with given degree sequences. J Comb Theory, Ser A, 24(3):296–307, 1978. doi:10.1016/0097-3165(78)90059-6.
[8] M. Molloy and B. Reed. A Critical Point for Random Graphs With a Given Degree Sequence. Random Struct Algor, 6:161–179, 1995.
[9] Amogh Dhamdhere and Constantine Dovrolis. Twelve Years in the Evolution of the Internet Ecosystem. IEEE/ACM Trans Netw, 19(5):1420–1433, 2011. doi:10.1109/TNET.2011.2119327.
[10] M. E. J. Newman. Clustering and preferential attachment in growing networks. Phys Rev E, 64(2):025102, 2001. doi:10.1103/PhysRevE.64.025102.
[11] Fan Chung and Linyuan Lu. Connected Components in Random Graphs with Given Expected Degree Sequences. Ann Comb, 6(2):125–145, 2002. doi:10.1007/PL00012580.
[12] Fan Chung and Linyuan Lu. The average distances in random graphs with given expected degrees. Proc Natl Acad Sci USA, 99(25):15879–82, 2002. doi:10.1073/pnas.252631999.
[13] J. Park and M. E. J. Newman. Statistical Mechanics of Networks. Phys Rev E, 70:66117, 2004. doi:10.1103/PhysRevE.70.066117.
[14] Ginestra Bianconi. The entropy of randomized network ensembles. EPL, 81(2):28005, 2008. doi:10.1209/0295-5075/81/28005.
[15] Diego Garlaschelli and Maria Loffredo. Maximum likelihood: Extracting unbiased information from complex networks. Phys Rev E, 78(1):015101, 2008. doi:10.1103/PhysRevE.78.015101.
[16] Tiziano Squartini and Diego Garlaschelli. Analytical maximum-likelihood method to detect patterns in real networks. New J Phys, 13(8):083001, 2011. doi:10.1088/1367-2630/13/8/083001.
[17] Paul W. Holland and Samuel Leinhardt.
An Exponential Family of Probability Distributions for Directed Graphs. J Am Stat Assoc, 76(373):33–50, 1981. doi:10.1080/01621459.1981.10477598.
[18] Kartik Anand and Ginestra Bianconi. Entropy Measures for Networks: Toward an Information Theory of Complex Topologies. Phys Rev E, 80:045102(R), 2009. doi:10.1103/PhysRevE.80.045102.
[19] Tiziano Squartini, Joey de Mol, Frank den Hollander, and Diego Garlaschelli. Breaking of Ensemble Equivalence in Networks. Phys Rev Lett, 115(26):268701, 2015. doi:10.1103/PhysRevLett.115.268701.
[20] Sourav Chatterjee, Persi Diaconis, and Allan Sly. Random graphs with a given degree sequence. Ann Appl Probab, 21(4):1400–1435, 2011. doi:10.1214/10-AAP728.
[21] Guido Caldarelli, Andrea Capocci, P. De Los Rios, and Miguel Angel Muñoz. Scale-Free Networks from Varying Vertex Intrinsic Fitness. Phys Rev Lett, 89(25):258702, 2002. doi:10.1103/PhysRevLett.89.258702.
[22] Marián Boguñá and Romualdo Pastor-Satorras. Class of Correlated Random Networks with Hidden Variables. Phys Rev E, 68:36112, 2003. doi:10.1103/PhysRevE.68.036112.
[23] Kartik Anand, Dmitri Krioukov, and Ginestra Bianconi. Entropy distribution and condensation in random networks with a given degree distribution. Phys Rev E, 89(6):062807, 2014. doi:10.1103/PhysRevE.89.062807.
[24] Konstantin Zuev, Fragkiskos Papadopoulos, and Dmitri Krioukov. Hamiltonian dynamics of preferential attachment. J Phys A Math Theor, 49(10):105001, 2016. doi:10.1088/1751-8113/49/10/105001.
[25] Michael Evans and Jeffrey S. Rosenthal. Probability and statistics: The science of uncertainty. W.H. Freeman and Co, New York, 2009.
[26] Kimberly Claffy, Young Hyun, Ken Keys, Marina Fomenkov, and Dmitri Krioukov. Internet Mapping: From Art to Science. In 2009 Cybersecurity Appl Technol Conf Homel Secur, 2009. doi:10.1109/CATCH.2009.38.
[27] Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguñá. Hyperbolic Geometry of Complex Networks. Phys Rev E, 82:36106, 2010. doi:10.1103/PhysRevE.82.036106.
[28] E. T. Jaynes. Information Theory and Statistical Mechanics. Phys Rev, 106(4):620–630, 1957. doi:10.1103/PhysRev.106.620.
[29] J. Shore and R. Johnson. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans Inf Theory, 26(1):26–37, 1980. doi:10.1109/TIT.1980.1056144.
[30] Y. Tikochinsky, N. Z. Tishby, and R. D. Levine. Consistent Inference of Probabilities for Reproducible Experiments. Phys Rev Lett, 52(16):1357–1360, 1984. doi:10.1103/PhysRevLett.52.1357.
[31] John Skilling. The Axioms of Maximum Entropy. In Maximum-Entropy and Bayesian Methods in Science and Engineering, pages 173–187. Springer Netherlands, Dordrecht, 1988. doi:10.1007/978-94-009-3049-0_8.
[32] Claude Edmund Shannon. A Mathematical Theory of Communication. Bell Syst Tech J, 27(3):379–423, 1948. doi:10.1002/j.1538-7305.1948.tb01338.x.
[33] Dmitri Krioukov. Clustering Implies Geometry in Networks. Phys Rev Lett, 116(20):208302, 2016. doi:10.1103/PhysRevLett.116.208302.
[34] Jagat Narain Kapur. Maximum-entropy models in science and engineering. Wiley, New Delhi, 1989.
[35] David J. Aldous. Representations for partially exchangeable arrays of random variables. J Multivar Anal, 11(4):581–598, 1981. doi:10.1016/0047-259X(81)90099-3.
[36] Persi Diaconis and Svante Janson. Graph Limits and Exchangeable Random Graphs. Rend di Matematica, 28:33–61, 2008.
[37] Olav Kallenberg. Foundations of modern probability. Springer, New York, 2002.
[38] Cosma Rohilla Shalizi and Alessandro Rinaldo. Consistency under sampling of exponential random graph models. Ann Stat, 41(2):508–535, 2013. doi:10.1214/12-AOS1044.
[39] Dmitri Krioukov and Massimo Ostilli.
Duality between equilibrium and growing networks. Phys Rev E, 88(2):022808, 2013. doi:10.1103/PhysRevE.88.022808.
[40] László Lovász and Balázs Szegedy. Limits of dense graph sequences. J Comb Theory, Ser B, 96(6):933–957, 2006. doi:10.1016/j.jctb.2006.05.002.
[41] Svante Janson. Graphons, cut norm and distance, couplings and rearrangements. NYJM Monogr, 4, 2013.
[42] Douglas N. Hoover. Relations on Probability Spaces and Arrays of Random Variables. Technical report, Institute for Advanced Study, Princeton, NJ, 1979.
[43] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin. Structure of Growing Networks with Preferential Linking. Phys Rev Lett, 85(21):4633–4636, 2000. doi:10.1103/PhysRevLett.85.4633.
[44] P. L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of Growing Random Networks. Phys Rev Lett, 85(21):4629–4632, 2000. doi:10.1103/PhysRevLett.85.4629.
[45] Rodrigo Aldecoa, Chiara Orsini, and Dmitri Krioukov. Hyperbolic graph generator. Comput Phys Commun, 196:492–496, 2015. doi:10.1016/j.cpc.2015.05.028.
[46] François Caron and Emily B. Fox. Sparse graphs using exchangeable random measures. 2014.
[47] Victor Veitch and Daniel M. Roy. The Class of Random Graphs Arising from Exchangeable Random Measures. 2015.
[48] Christian Borgs, Jennifer T. Chayes, Henry Cohn, and Nina Holden. Sparse exchangeable graphs and their limits via graphon processes. 2016.
[49] László Lovász. Large Networks and Graph Limits. American Mathematical Society, Providence, RI, 2012.
[50] David J. Aldous. Exchangeability and related topics. In École d'Été de Probabilités de Saint-Flour XIII, pages 1–198. Springer, Berlin, Heidelberg, 1983. doi:10.1007/BFb0099421.
[51] David D. McFarland and Daniel J. Brown. Social distance as a metric: A systematic introduction to smallest space analysis. In Bonds of Pluralism: The Form and Substance of Urban Social Networks, pages 213–252.
John Wiley, New York, 1973.
[52] Katherine Faust. Comparison of methods for positional analysis: Structural and general equivalences. Soc Networks, 10(4):313–341, 1988. doi:10.1016/0378-8733(88)90002-0.
[53] J. M. McPherson and J. R. Ranger-Moore. Evolution on a Dancing Landscape: Organizations and Networks in Dynamic Blau Space. Soc Forces, 70(1):19–42, 1991. doi:10.1093/sf/70.1.19.
[54] Peter D. Hoff, Adrian E. Raftery, and Mark S. Handcock. Latent Space Approaches to Social Network Analysis. J Am Stat Assoc, 97(460):1090–1098, 2002. doi:10.1198/016214502388618906.
[55] Béla Bollobás, Svante Janson, and Oliver Riordan. The phase transition in inhomogeneous random graphs. Random Struct Algor, 31(1):3–122, 2007. doi:10.1002/rsa.20168.
[56] Hamed Hatami, Svante Janson, and Balázs Szegedy. Graph properties, graph limits and entropy. 2013.
[57] Sourav Chatterjee and S. R. S. Varadhan. The large deviation principle for the Erdős–Rényi random graph. Eur J Comb, 32(7):1000–1017, 2011. doi:10.1016/j.ejc.2011.03.014.
[58] Sourav Chatterjee and Persi Diaconis. Estimating and understanding exponential random graph models. Ann Stat, 41(5):2428–2461, 2013. doi:10.1214/13-AOS1155.
[59] Charles Radin and Lorenzo Sadun. Singularities in the Entropy of Asymptotically Large Simple Graphs. J Stat Phys, 158(4):853–865, 2015. doi:10.1007/s10955-014-1151-3.
[60] Christian Borgs, Jennifer T. Chayes, László Lovász, Vera T. Sós, and Katalin Vesztergombi. Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing. Adv Math, 219(6):1801–1851, 2008. doi:10.1016/j.aim.2008.07.008.
[61] Ginestra Bianconi. Entropy of Network Ensembles. Phys Rev E, 79:36114, 2009.
[62] A. Barvinok and J. A. Hartigan. The number of graphs and a random graph with a given degree sequence. Random Struct Algor, 42(3):301–348, 2013. doi:10.1002/rsa.20409.
[63] Jan Grandell.
Mixed Poisson processes. Chapman & Hall/CRC, London, UK, 1997.
[64] Remco van der Hofstad. Random graphs and complex networks. Cambridge University Press, Cambridge, UK, 2016.
[65] Jean Pierre Aubin. Applied functional analysis. John Wiley & Sons, Inc, New York, 2000.
[66] I. M. Gelfand and S. V. Fomin. Calculus of variations. Dover Publications, New York, 2000.