Sparse graphs using exchangeable random measures
Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix, and considering the exchangeability of this array. In such cases, the Aldous-Hoover representation theorem (Aldous, 1981;Hoover, …
Authors: François Caron, Emily B. Fox
arXiv: 1401.1137

François Caron∗ and Emily B. Fox†

Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, United Kingdom
e-mail: caron@stats.ox.ac.uk

Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322
e-mail: ebfox@stat.washington.edu

Abstract: Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix, and considering the exchangeability of this array. In such cases, the Aldous-Hoover representation theorem (Aldous, 1981; Hoover, 1979) applies and informs us that the graph is necessarily either dense or empty. In this paper, we instead consider representing the graph as a measure on $\mathbb{R}_+^2$. For the associated definition of exchangeability in this continuous space, we rely on the Kallenberg representation theorem (Kallenberg, 2005). We show that for certain choices of such exchangeable random measures underlying our graph construction, our network process is sparse with power-law degree distribution. In particular, we build on the framework of completely random measures (CRMs) and use the theory associated with such processes to derive important network properties, such as an urn representation for our analysis and network simulation. Our theoretical results are explored empirically and compared to common network models. We then present a Hamiltonian Monte Carlo algorithm for efficient exploration of the posterior distribution and demonstrate that we are able to recover graphs ranging from dense to sparse, and perform associated tests, based on our flexible CRM-based formulation.
We explore network properties in a range of real datasets, including Facebook social circles, a political blogosphere, protein networks, citation networks, and world wide web networks, including networks with hundreds of thousands of nodes and millions of edges.

Primary 62F15, 05C80; secondary 60G09, 60G51, 60G55.
Keywords and phrases: random graphs, Lévy measure, point process, exchangeability, generalized gamma process.

∗ FC acknowledges the support of the European Commission under the Marie Curie Intra-European Fellowship Programme.
† EBF was supported in part by DARPA Grant FA9550-12-1-0406 negotiated by AFOSR and AFOSR Grant FA9550-12-1-0453.

F. Caron and E. Fox/Bayesian nonparametric random graphs

Contents

1 Introduction
2 Background
  2.1 Exchangeability and de Finetti-type representation theorems
  2.2 Completely Random Measures
3 Statistical network models
  3.1 Directed multigraphs
  3.2 Undirected graphs
  3.3 Bipartite graphs
4 General properties and simulation
  4.1 Exchangeability under the Kallenberg framework
  4.2 Sparsity
  4.3 Interactions between groups
  4.4 Simulation
5 Special cases
  5.1 Poisson process
  5.2 Compound Poisson process
  5.3 Generalized gamma process
6 Posterior characterization and inference
  6.1 Directed multigraph and undirected simple graph
  6.2 Bipartite graph
7 Experiments
  7.1 Simulated data
  7.2 Testing for sparsity of real-world graphs
8 Discussion
A Proofs of results on the sparsity
  A.1 Probability asymptotics notation
  A.2 Proof of Theorems 5, 6 and 7 in the finite-activity case
  A.3 Proof of Theorem 6 in the infinite-activity case
  A.4 Proof of Theorem 7 in the infinite-activity case
B Technical lemmas
C Proofs of results on the properties of the GGP graph
  C.1 Proof of Theorem 8
  C.2 Proof of Theorem 9
D Proofs of results on posterior characterization
  D.1 Proof of Theorem 12
  D.2 Proof of Theorem 13
E Details on the MCMC algorithms
  E.1 Simple graph
  E.2 Bipartite graph
References

1. Introduction

The rapid increase in the availability and importance of network data has been a driving force behind the significant recent attention on random graph models. This effort builds on a long history, with a popular early model being the Erdős-Rényi random graph (Erdős and Rényi, 1959). However, the Erdős-Rényi formulation has since been dismissed as overly simplistic since it fails to capture important real-world network properties. A plethora of other network models have been proposed in recent years, with some overviews of such models provided in (Newman, 2003, 2009; Bollobás, 2001; Durrett, 2007; Goldenberg et al., 2010; Fienberg, 2012).

In many scenarios, it is appealing conceptually to assume that the order in which nodes are observed is of no importance (Bickel and Chen, 2009; Hoff, 2009). In statistical network models, this equates with the notion of exchangeability. Classically, the graph has been represented by a discrete structure, or adjacency matrix, $Z$ where $Z_{ij}$ is a binary variable with $Z_{ij} = 1$ indicating an edge from node $i$ to node $j$. In the case of undirected graphs, we furthermore restrict $Z_{ij} = Z_{ji}$. For generic matrices $Z$ in some space $\mathcal{Z}$, an (infinite) exchangeable random array (Diaconis and Janson, 2008; Lauritzen, 2008) is one such that
$$ (Z_{ij}) \overset{d}{=} (Z_{\pi(i)\sigma(j)}) \quad \text{for } (i,j) \in \mathbb{N}^2 \tag{1} $$
for any permutations $\pi, \sigma$ of $\mathbb{N}$, with $\pi = \sigma$ in the jointly exchangeable case. The celebrated Aldous-Hoover theorem (Aldous, 1981; Hoover, 1979) states that infinite exchangeability implies a mixture model representation for the matrix involving transformations of uniform random variables (see Theorem 1). For undirected graphs, this transformation is specified by the graphon. The Aldous-Hoover constructive definition has motivated the development of Bayesian statistical models for arrays (Lloyd et al.
, 2012) and many popular network models can be recast in this framework (Hoff, Raftery and Handcock, 2002; Nowicki and Snijders, 2001; Airoldi et al., 2008; Kim and Leskovec, 2012; Miller, Griffiths and Jordan, 2009). Estimators of models in this class and their associated properties have been studied extensively in recent years (Bickel and Chen, 2009; Bickel, Chen and Levina, 2011; Rohe, Chatterjee and Yu, 2011; Zhao, Levina and Zhu, 2012; Airoldi, Costa and Chan, 2014; Wolfe and Choi, 2014).

However, one unpleasing consequence of the Aldous-Hoover theorem is that graphs represented by an exchangeable random array are either trivially empty or dense¹, i.e. the number of edges grows quadratically with the number of nodes $n$ (see Theorem 14). To quote the survey of Orbanz and Roy (2015), "the theory also clarifies the limitations of exchangeable models. It shows, for example, that most Bayesian models of network data are inherently misspecified." The conclusion is that we cannot have both exchangeability of the nodes (in the sense of (1)), a cornerstone of Bayesian modeling, and sparse graphs, which is what we observe in the real world (Newman, 2009), especially for large networks. Several models have been developed which give up exchangeability in order to obtain sparse graphs (Barabási and Albert, 1999).

¹ Note that we refer to graphs with $\Theta(n^2)$ edges as dense graphs and to graphs with $o(n^2)$ edges as sparse graphs, following the terminology of Bollobás and Riordan (2009).

Fig 1. Point process representation of a random graph. Each node $i$ is embedded in $\mathbb{R}_+$ at some location $\theta_i$ and is associated with a sociability parameter $w_i$. An edge between nodes $\theta_i$ and $\theta_j$ is represented by a point at locations $(\theta_i, \theta_j)$ and $(\theta_j, \theta_i)$ in $\mathbb{R}_+^2$.
Alternatively, there is a body of literature that examines rescaling graph properties with network size $n$, leading to sparse graph sequences where each graph is finitely exchangeable (Bollobás, Janson and Riordan, 2007; Bollobás and Riordan, 2009; Wolfe and Olhede, 2013; Borgs et al., 2014). However, any method building on a rescaling-based approach provides a graph distribution, $\pi_n$, that lacks projectivity: marginalizing node $n$ does not yield $\pi_{n-1}$, the distribution on graphs of size $n - 1$.

To leverage some of the benefits of generative exchangeable modeling while producing sparse graphs with power-law behavior, we set aside the discrete array structure of the adjacency matrix and instead consider a different notion of exchangeability of a continuous-space representation of networks based on a point process on $\mathbb{R}_+^2$ (see Figure 1)
$$ Z = \sum_{i,j} z_{ij}\, \delta_{(\theta_i,\theta_j)}, \tag{2} $$
where $z_{ij} = 1$ if there is a link between nodes $\theta_i$ and $\theta_j$ in $\mathbb{R}_+$, and is 0 otherwise. Our notion of exchangeability in this framework is as follows. Paralleling (1), the point process $Z$ on $\mathbb{R}_+^2$ is exchangeable if and only if, for any $h > 0$ and for any permutations $\pi, \sigma$ of $\mathbb{N}$,
$$ (Z(A_i \times A_j)) \overset{d}{=} (Z(A_{\pi(i)} \times A_{\sigma(j)})) \quad \text{for } (i,j) \in \mathbb{N}^2, \tag{3} $$
where here we consider intervals $A_i = [h(i-1), hi]$ with $i \in \mathbb{N}$. Considering arbitrarily small intervals $A_i$, such that two nodes $\theta_j$ and $\theta_k$ are unlikely to fall into the same interval, leads to a similar intuition and statistical implication of exchangeability as in the Aldous-Hoover framework. Note, however, that if we order nodes in (2) by the first time an edge appears for that node, and look at the associated adjacency matrix, then this array is not exchangeable in the sense of (1).
Importantly, though, our notion of exchangeability allows us to define a practical and efficient inference algorithm (described in Section 6) due to the invariance property in the continuous space specified in (3).

In place of the Aldous-Hoover theorem, we now appeal to the continuous-space counterpart (Kallenberg, 2005, Chapter 9) which provides a representation theorem for exchangeable point processes on $\mathbb{R}_+^2$: a point process is exchangeable if and only if it can be represented as a transformation of unit-rate Poisson processes and uniform random variables (see Theorem 2); this is in direct analogy to the graphon transformation of uniform random variables in the Aldous-Hoover representation. More precisely, within the Kallenberg framework, we consider that two nodes $i \neq j$ connect with probability
$$ \Pr(z_{ij} = 1 \mid w_i, w_j) = 1 - e^{-2 w_i w_j} \tag{4} $$
where the positive sociability parameters $(w_i)_{i=1,2,\ldots}$ are the points of a Poisson point process, or equivalently the jumps of a completely random measure (CRM) (Kingman, 1967, 1993; Lijoi and Prünster, 2010). We show that by carefully choosing the Lévy measure characterizing this CRM, we are able to construct graphs ranging from sparse to dense. In particular, any Lévy measure yielding an infinite activity CRM leads to sparse graphs; alternatively, finite activity CRMs, whose associated point processes are in the compound Poisson process family, yield dense graphs. When building on a specific class of infinite activity regularly varying CRMs, we can obtain graphs where the number of edges increases at a rate below $n^a$ for some constant $1 < a < 2$ that depends on the Lévy measure. The associated degree distribution has a power-law form.
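As a small illustration of the link probability in (4), the following Python sketch evaluates it for a pair of sociability parameters. The function name `link_probability` is hypothetical and not part of the paper; the use of `math.expm1` is only a numerical convenience for very small weights.

```python
import math


def link_probability(w_i: float, w_j: float) -> float:
    """Pr(z_ij = 1 | w_i, w_j) = 1 - exp(-2 * w_i * w_j) for two distinct nodes, as in (4).

    Hypothetical helper for illustration only.
    """
    if w_i < 0 or w_j < 0:
        raise ValueError("sociability parameters must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is close to zero.
    return -math.expm1(-2.0 * w_i * w_j)


if __name__ == "__main__":
    # Two very sociable nodes are almost surely linked; two weakly sociable
    # nodes link with probability roughly 2 * w_i * w_j.
    print(link_probability(3.0, 3.0))
    print(link_probability(0.01, 0.02))
```

For small weights the probability behaves like $2 w_i w_j$, which gives some intuition for why a large population of low-sociability nodes contributes few edges.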
By building on the framework of CRMs, we are able to harness the considerable theory and practicality of such processes to (1) derive important properties of our proposed model and (2) develop an efficient statistical estimation procedure. The CRM construction enables us to relate the sparsity properties of the graph to the properties of the Lévy measure. We also utilize the CRM-based formulation to develop a scalable Hamiltonian Monte Carlo sampler that can automatically handle a range of graphs from dense to sparse based on inferring a graph sparsity parameter. We show in Section 7 that our methods scale to graphs with hundreds of thousands of nodes and millions of edges. Thus, our generative specification enjoys both an analytic representation in the Kallenberg framework and a formulation in terms of CRMs. The former allows us to nicely connect with existing random graph models whereas the latter provides (1) connections to the Bayesian nonparametric modeling and inference literature and (2) interpretability and theoretical analysis of the formulation.

In summary, our proposed framework captures a number of desirable properties:

• Sparsity. We can obtain graphs where the number of edges increases sub-quadratically with the number of nodes.
• Power law. Our formulation yields a power-law form, which is useful in modeling many real-world graphs (Newman, 2009).
• Exchangeability in the sense of (3).
• Simplicity. Three hyperparameters tune the expected number of nodes, power-law properties, etc.
• Interpretability. The node-specific sociability parameters, $w_i$, lead to straightforward interpretability of the model.
• Scalable inference. Our CRM-based Hamiltonian Monte Carlo sampler efficiently scales to large, real-world graphs, allowing for rapid analysis of graph properties such as sparsity, power-law, etc.
A bipartite random graph formulation with power-law behavior building on CRMs was first proposed by Caron (2012). In this paper, we consider a more general CRM-based framework for bipartite graphs, directed multigraphs, and undirected graphs. More importantly, we prove that the resulting formulation yields sparse graphs under certain conditions, a notion not explored in (Caron, 2012), and cast exchangeability within the Kallenberg representation theorem. Both of these represent important and non-trivial extensions of this work. A number of other theoretical results are explored in Section 4 as well. Finally, we note that the sampler of Caron (2012) simply does not apply to our undirected graphs. Instead, we present new and efficient posterior computations with demonstrated scalability on a range of large, real-world networks.

Our paper is organized as follows. In Section 2, we provide background on exchangeability for sequences, arrays, and random measures on $\mathbb{R}_+^2$. The latter provides an important theoretical foundation for the graph structures we propose. We also present background on CRMs, which form the key building block of our graph construction. The generic formulation for directed multigraphs, undirected graphs, and bipartite graphs is presented in Section 3. Properties, such as exchangeability and sparsity, and methods for simulation are presented in Section 4. Specific cases of our formulation leading to dense and sparse graphs are considered in Section 5, including an empirical analysis of network properties of our proposed formulation relative to common network models. Our Markov chain Monte Carlo (MCMC) based posterior computations are in Section 6. Finally, Section 7 provides a simulated study and an extensive analysis of a variety of large, real-world graphs.

2. Background

2.1. Exchangeability and de Finetti-type representation theorems

Our focus is on exchangeable random structures that can represent networks. To build to such constructs, we first present a brief review of exchangeability for random sequences, continuous-time processes, and discrete network arrays. Thorough and accessible overviews of exchangeability of random structures are presented in the surveys of Aldous (1985) and Orbanz and Roy (2015). Here, we simply abstract away the notions relevant to placing our network formulation in context, as summarized in Table 1.

Table 1. Overview of representation theorems

                                  Discrete structure           Continuous time/space
  Exchangeability                 de Finetti (1931)            Bühlmann (1960)
  Joint/separate exchangeability  Aldous-Hoover (1979-1981)    Kallenberg (1990)

The classical representation theorem arising from a notion of exchangeability for discrete sequences of random variables is due to de Finetti (1931). The theorem states that a sequence $Z_1, Z_2, \ldots$ with $Z_i \in \mathcal{Z}$ is exchangeable if and only if there exists a random probability measure $\Theta$ on $\mathcal{Z}$ with law $\nu$ such that the $Z_i$ are conditionally i.i.d. given $\Theta$. That is, all exchangeable infinite sequences can be represented as a mixture with directing measure $\Theta$ and mixing measure $\nu$. If examining continuous-time processes instead of sequences, the representation associated with exchangeable increments is given by Bühlmann (1960) (see also Freedman (1996)) in terms of mixing Lévy processes.

The focus of our work, however, is on graph structures. Recall the definition of exchangeability of arrays in (1). A representation theorem for exchangeability of the classical discrete adjacency matrix, $Z$, follows in Theorem 1 by specializing the Aldous-Hoover theorem to 2-arrays.
We additionally focus here on joint exchangeability, that is, symmetric permutations of rows and columns, which is applicable to matrices $Z$ where both rows and columns index the same set of nodes. Separate exchangeability allows for different row and column permutations, making it applicable to scenarios where one has distinct node identities on rows and columns, such as in the bipartite graphs we consider in Section 3.3. Extensions of Theorem 1 to higher dimensional arrays are likewise straightforward (Orbanz and Roy, 2015).

Theorem 1 (Aldous-Hoover representation of jointly exchangeable matrices (Aldous, 1981; Hoover, 1979)). A random 2-array $(Z_{ij})_{i,j \in \mathbb{N}}$ is jointly exchangeable if and only if there exists a random measurable function $f : [0,1]^3 \to \mathcal{Z}$ such that
$$ (Z_{ij}) \overset{d}{=} (f(U_i, U_j, U_{ij})), \tag{5} $$
where $(U_i)_{i \in \mathbb{N}}$ and $(U_{ij})_{i,j>i \in \mathbb{N}}$ with $U_{ij} = U_{ji}$ are a sequence and matrix, respectively, of i.i.d. Uniform$[0,1]$ random variables.

For undirected graphs where $Z$ is a binary, symmetric adjacency matrix, the Aldous-Hoover representation can be expressed as the existence of a graphon $\omega : [0,1]^2 \to [0,1]$, symmetric in its arguments, where
$$ f(U_i, U_j, U_{ij}) = \begin{cases} 1 & U_{ij} < \omega(U_i, U_j) \\ 0 & \text{otherwise.} \end{cases} \tag{6} $$

Exchangeability is a fundamentally important concept in modeling. For example, an assumption of joint exchangeability in network models implies that the probability of a given graph depends on certain structural features, such as number of edges, triangles, and five-stars, but not on where these features occur in the network. Likewise, for separate exchangeability, the probability of the matrix is invariant to reordering of the rows and columns, e.g., users and items in a recommender system application.
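The graphon construction in (5) and (6) can be sketched as a short sampler. This is a minimal illustration, not code from the paper: the function name `sample_graphon_graph` and the example graphon are assumptions, and the sketch leaves the diagonal at zero (no self-loops) for simplicity.

```python
import random


def sample_graphon_graph(n, omega, seed=None):
    """Sample an n x n symmetric 0/1 adjacency matrix from a graphon omega.

    Follows (5)-(6): draw U_i ~ Uniform[0, 1] per node and U_ij ~ Uniform[0, 1]
    per unordered pair, then set z_ij = 1 when U_ij < omega(U_i, U_j).
    Hypothetical helper; self-loops are omitted as a simplifying assumption.
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < omega(u[i], u[j]):
                z[i][j] = z[j][i] = 1
    return z


if __name__ == "__main__":
    # Illustrative graphon (an assumption): omega(x, y) = x * y.
    adjacency = sample_graphon_graph(200, lambda x, y: x * y, seed=0)
    num_edges = sum(map(sum, adjacency)) // 2
    print(num_edges)
```

Because every pair is linked with a probability that does not shrink with `n`, the expected number of edges scales with the number of pairs, which is the dense behavior discussed in the text.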
However, based on the Aldous-Hoover representation theorem, one can derive the important consequence that if a random graph is exchangeable, it is either dense or empty. Note, crucially, that this result assumes the graph is modeled via a discrete adjacency matrix structure and exchangeability is considered in this framework.

Throughout this paper, we instead consider representing a graph as a point process $Z = \sum_{i,j} z_{ij}\, \delta_{(\theta_i,\theta_j)}$ with nodes $\theta_i$ embedded in $\mathbb{R}_+$, as in (2), and then examine notions of exchangeability in this context. Kallenberg (1990) derived de Finetti-style representation theorems for separately and jointly exchangeable random measures on $\mathbb{R}_+^2$, which we present for the jointly exchangeable case in Theorem 2. Recall the definition of joint exchangeability of a random measure on $\mathbb{R}_+^2$ in (3). In the following, $\lambda$ denotes the Lebesgue measure on $\mathbb{R}_+$, $\lambda_D$ the Lebesgue measure on the diagonal $D = \{(s,t) \in \mathbb{R}_+^2 \mid s = t\}$, and $\widetilde{\mathbb{N}}^2 = \{\{i,j\} \mid (i,j) \in \mathbb{N}^2\}$. We also define a U-array to be an array of independent uniform random variables.

Theorem 2 (Representation theorem for jointly exchangeable random measures on $\mathbb{R}_+^2$ (Kallenberg, 1990, 2005, Theorem 9.24)). A random measure $\xi$ on $\mathbb{R}_+^2$ is jointly exchangeable if and only if almost surely
$$
\begin{aligned}
\xi ={}& \sum_{i,j} f(\alpha_0, \vartheta_i, \vartheta_j, \zeta_{\{i,j\}})\, \delta_{\theta_i,\theta_j} + \beta_0 \lambda_D + \gamma_0 (\lambda \times \lambda) \\
&+ \sum_{j,k} \left( g(\alpha_0, \vartheta_j, \chi_{jk})\, \delta_{\theta_j,\sigma_{jk}} + g'(\alpha_0, \vartheta_j, \chi_{jk})\, \delta_{\sigma_{jk},\theta_j} \right) \\
&+ \sum_{j} \left( h(\alpha_0, \vartheta_j)(\delta_{\theta_j} \times \lambda) + h'(\alpha_0, \vartheta_j)(\lambda \times \delta_{\theta_j}) \right) \\
&+ \sum_{k} \left( l(\alpha_0, \eta_k)\, \delta_{\rho_k,\rho'_k} + l'(\alpha_0, \eta_k)\, \delta_{\rho'_k,\rho_k} \right)
\end{aligned}
\tag{7}
$$
for some measurable functions $f : \mathbb{R}_+^4 \to \mathbb{R}_+$, $g, g' : \mathbb{R}_+^3 \to \mathbb{R}_+$ and $h, h', l, l' : \mathbb{R}_+^2 \to \mathbb{R}_+$. Here, $(\zeta_{\{i,j\}})$ with $\{i,j\} \in \widetilde{\mathbb{N}}^2$ is a U-array. $\{(\theta_j, \vartheta_j)\}$ and $\{(\sigma_{ij}, \chi_{ij})\}$ on $\mathbb{R}_+^2$ and $\{(\rho_j, \rho'_j, \eta_j)\}$ on $\mathbb{R}_+^3$ are independent, unit-rate Poisson processes.
Furthermore, $\alpha_0, \beta_0, \gamma_0 \geq 0$ are an independent set of random variables.

We place our proposed network model of Section 3 within this Kallenberg representation in Section 4.1, yielding direct analogs to the classical graphon representation of graphs based on exchangeability of the adjacency matrix.

2.2. Completely Random Measures

Our models for graphs build on the completely random measure (CRM) (Kingman, 1967) framework. CRMs have been used extensively in the Bayesian nonparametric literature for proposing flexible classes of priors over functional spaces (cf. Regazzini, Lijoi and Prünster, 2003; Lijoi and Prünster, 2010). We recall in this section basic properties of CRMs; the reader can refer to the monograph of Kingman (1993) for an exhaustive coverage.

A CRM $W$ on $\mathbb{R}_+$ is a random measure such that for any countable number of disjoint measurable sets $A_1, A_2, \ldots$ of $\mathbb{R}_+$, the random variables $W(A_1), W(A_2), \ldots$ are independent and
$$ W(\cup_j A_j) = \sum_j W(A_j). \tag{8} $$
If one additionally assumes that the distribution of $W([t,s])$ only depends on $t - s$ (i.e. we have i.i.d. increments of fixed size) then the CRM takes the following form
$$ W = \sum_{i=1}^{\infty} w_i\, \delta_{\theta_i}, \tag{9} $$
where $(w_i, \theta_i)_{i \in \mathbb{N}}$ are the points of a Poisson point process on $\mathbb{R}_+^2$ with mean (or Lévy) measure $\nu(dw, d\theta) = \rho(dw)\lambda(d\theta)$; moreover, the Laplace transform of $W(A)$ for any measurable set $A$ admits the following representation:
$$ E[\exp(-tW(A))] = \exp\left( -\int_{\mathbb{R}_+ \times A} [1 - \exp(-tw)]\, \rho(dw)\lambda(d\theta) \right), \tag{10} $$
for any $t > 0$ and $\rho$ a measure on $\mathbb{R}_+$ such that
$$ \int_0^{\infty} (1 - e^{-w})\, \rho(dw) < \infty. \tag{11} $$
The measure $\rho$ is referred to as the jump part of the Lévy measure. For a CRM $W$ with i.i.d. increments, which are intimately connected to subordinators (Kingman, 1993, Chapter 8), $\rho$ characterizes these increments.
We denote this process as $W \sim \mathrm{CRM}(\rho, \lambda)$. Note that $W([0,T]) < \infty$ for any $T < \infty$, while $W(\mathbb{R}_+) = \infty$ if $\rho$ is not degenerate at 0.

The jump part $\rho$ of the Lévy measure is of particular interest for our construction for graphs. If $\rho$ satisfies the condition
$$ \int_0^{\infty} \rho(dw) = \infty, \tag{12} $$
then there will be an infinite number of jumps in any interval $[0,T]$, and we refer to the CRM as infinite activity. Otherwise, the number of jumps will be finite almost surely. In our models of Section 3, these jumps will map directly to the nodes in the graph. Finally, throughout we let $\psi(t)$ be the Laplace exponent, defined as
$$ \psi(t) = \int_0^{\infty} (1 - e^{-wt})\, \rho(w)\, dw \tag{13} $$
and $\bar{\rho}(x)$ the tail Lévy intensity
$$ \bar{\rho}(x) = \int_x^{\infty} \rho(w)\, dw. \tag{14} $$
In Section 5, we consider special cases including the (compound) Poisson process and generalized gamma process (Brix, 1999; Lijoi, Mena and Prünster, 2007).

3. Statistical network models

Our primary focus is on undirected network models, but implicit in our construction is the definition of a directed integer-weighted graph, or multigraph, which in some applications might be the direct quantity of interest. For example, in social networks, interactions are often not only directed ("person $i$ messages person $j$"), but also have an associated count. Additionally, interactions might be typed ("message", "SMS", "like", "tag"). Our proposed framework could be directly extended to model such data. Our undirected graph simply transforms the directed multigraph by forming an undirected edge if there is any directed edge between two nodes. Due to the straightforward relationship between the two graphs, much of the intuition gained from the directed case carries over to the undirected scenario.

3.1. Directed multigraphs

Let $V = (\theta_1, \theta_2, \ldots)$ be a countably infinite set of nodes with $\theta_i \in \mathbb{R}_+$.
We represent the directed multigraph of interest using an atomic measure on $\mathbb{R}_+^2$
$$ D = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} n_{ij}\, \delta_{(\theta_i,\theta_j)}, \tag{15} $$
where $n_{ij}$ counts the number of directed edges from node $\theta_i$ to node $\theta_j$. See Figure 2 for an illustration of the restriction of $D$ to $[0,1]^2$ and the corresponding directed graph.

Our generative approach for modeling $D$ associates with each node $\theta_i$ a sociability parameter $w_i > 0$ defined via the atomic random measure
$$ W = \sum_{i=1}^{\infty} w_i\, \delta_{\theta_i}, \tag{16} $$
which we take to be distributed according to a homogeneous CRM, $W \sim \mathrm{CRM}(\rho, \lambda)$. Given $W$, $D$ is simply generated from a Poisson process (PP) with intensity given by the product measure $\widetilde{W} = W \times W$ on $\mathbb{R}_+^2$:
$$ D \mid W \sim \mathrm{PP}(W \times W). \tag{17} $$
That is, informally, the individual counts $n_{ij}$ are generated as Poisson($w_i w_j$). By construction, for any $A, B \subset \mathbb{R}$, we have $\widetilde{W}(A \times B) = W(A) W(B)$. On any bounded interval $A$ of $\mathbb{R}_+$, $W(A) < \infty$, implying $\widetilde{W}(A \times A)$ has finite mass.

Fig 2. An example of (a) the restriction on $[0,1]^2$ of an atomic measure $D$, (b) the corresponding directed multigraph, and (c) corresponding undirected graph.

3.2. Undirected graphs

We now turn to the primary focus of modeling undirected graphs. Similarly to the directed case of Section 3.1, we represent an undirected graph using an atomic measure
$$ Z = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} z_{ij}\, \delta_{(\theta_i,\theta_j)}, $$
with the convention $z_{ij} = z_{ji} \in \{0,1\}$. Here, $z_{ij} = z_{ji} = 1$ indicates an undirected edge between nodes $\theta_i$ and $\theta_j$. We arrive at the undirected graph via a simple transformation of the directed graph: set $z_{ij} = z_{ji} = 1$ if $n_{ij} + n_{ji} > 0$ and $z_{ij} = z_{ji} = 0$ otherwise. That is, place an undirected edge between nodes $\theta_i$ and $\theta_j$ if and only if there is at least one directed interaction between the nodes.
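The two-step construction above (Poisson counts, then thresholding) can be sketched for a finite list of sociability parameters. This is only an illustration under stated assumptions: the weights are supplied by hand rather than drawn from a CRM, and the names `sample_poisson` and `sample_directed_and_undirected` are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count with Knuth's multiplication method.

    The rate is split into chunks of at most 500 (sums of independent Poisson
    draws are Poisson) so that exp(-chunk) never underflows.
    """
    count = 0
    remaining = rate
    while remaining > 0:
        chunk = min(remaining, 500.0)
        threshold = math.exp(-chunk)
        product = rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        remaining -= chunk
    return count


def sample_directed_and_undirected(weights, seed=None):
    """Given finitely many weights w_i, draw n_ij ~ Poisson(w_i * w_j) and
    set z_ij = min(n_ij + n_ji, 1), mirroring the transformation in the text."""
    rng = random.Random(seed)
    n = len(weights)
    counts = [[sample_poisson(weights[i] * weights[j], rng) for j in range(n)]
              for i in range(n)]
    links = [[min(counts[i][j] + counts[j][i], 1) for j in range(n)]
             for i in range(n)]
    return counts, links


if __name__ == "__main__":
    counts, links = sample_directed_and_undirected([0.3, 1.0, 2.0], seed=1)
    print(counts)
    print(links)
```

Note that the diagonal entry is set from `2 * n_ii`, which is positive exactly when `n_ii` is, so self-edges follow the same rule as in the text.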
Note that in this definition of an undirected graph, we allow self-edges. This could represent, for example, a person posting a message on his or her own profile page. The resulting hierarchical model is as follows:
$$
\begin{aligned}
W &= \sum_{i=1}^{\infty} w_i\, \delta_{\theta_i}, & W &\sim \mathrm{CRM}(\rho, \lambda) \\
D &= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} n_{ij}\, \delta_{(\theta_i,\theta_j)}, & D \mid W &\sim \mathrm{PP}(W \times W) \\
Z &= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \min(n_{ij} + n_{ji}, 1)\, \delta_{(\theta_i,\theta_j)}.
\end{aligned}
\tag{18}
$$
This process is depicted graphically in Figure 3.

Equivalently, given the sociability parameters $w = \{w_i\}$, we can directly specify the undirected graph model as
$$ \Pr(z_{ij} = 1 \mid w) = \begin{cases} 1 - \exp(-2 w_i w_j) & i \neq j \\ 1 - \exp(-w_i^2) & i = j. \end{cases} \tag{19} $$
To see the equivalence between this formulation and the one obtained from manipulating the directed multigraph, note that for $i \neq j$, $\Pr(z_{ij} = 1 \mid w) = \Pr(n_{ij} + n_{ji} > 0 \mid w)$. By properties of the Poisson process, $n_{ij}$ and $n_{ji}$ are independent random variables conditioned on $W$. The sum of two Poisson random variables, each with rate $w_i w_j$, is again Poisson with rate $2 w_i w_j$, and such a variable equals zero with probability $\exp(-2 w_i w_j)$. The result (19) then arises from the fact that $\Pr(n_{ij} + n_{ji} > 0 \mid w) = 1 - \Pr(n_{ij} + n_{ji} = 0 \mid w)$. Likewise, the $i = j$ case arises using a similar reasoning for $\Pr(z_{ii} = 1 \mid w) = \Pr(n_{ii} > 0 \mid w)$.

Fig 3. An example of (a) the product measure $\widetilde{W} = W \times W$ for CRM $W$, (b) a draw of the directed multigraph measure $D \mid W \sim \mathrm{PP}(W \times W)$, (c) corresponding undirected measure $Z = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \min(n_{ij} + n_{ji}, 1)\, \delta_{(\theta_i,\theta_j)}$.

Graph restrictions. Our general network process is defined on $\mathbb{R}_+^2$ and, due to the fact that $W(\mathbb{R}_+) = \infty$, yields an infinite number of edges.
In applications, we are typically interested in considering graphs with a finite number of edges, but without a bound on or prespecification of this finite number. We therefore consider restrictions $D_\alpha$ and $Z_\alpha$ of $D$ and $Z$, respectively, to the box $[0,\alpha]^2$ and in Section 6 examine methods for inferring $\alpha$. We also denote by $W_\alpha$ and $\lambda_\alpha$ the corresponding CRM and Lebesgue measure on $[0,\alpha]$. We write $Z^*_\alpha = Z_\alpha([0,\alpha]^2)$, the total mass on $[0,\alpha]^2$, and similarly for $D^*_\alpha$ and $W^*_\alpha$. By definition, $D_\alpha$ is drawn from a Poisson process with finite mean measure $W_\alpha \times W_\alpha$, so we have the following generative model for directly simulating $D_\alpha$ and $Z_\alpha$:
$$
\begin{aligned}
W_\alpha &\sim \mathrm{CRM}(\rho, \lambda_\alpha) \\
D^*_\alpha \mid W^*_\alpha &\sim \mathrm{Poisson}(W^{*2}_\alpha) \\
U_{kj} \mid W_\alpha &\overset{\text{iid}}{\sim} \frac{W_\alpha}{W^*_\alpha} \quad \text{for } k = 1, \ldots, D^*_\alpha \text{ and } j = 1, 2 \\
D_\alpha &= \sum_{k=1}^{D^*_\alpha} \delta_{(U_{k1}, U_{k2})}.
\end{aligned}
\tag{20}
$$
Here, the variables $U_{kj} \in \mathbb{R}_+$ correspond to nodes in the graph, and pairs of variables $(U_{k1}, U_{k2})$ correspond to a directed edge from node $U_{k1}$ to node $U_{k2}$. The number of directed edges, $D^*_\alpha$, depends on the total mass of the CRM, $W^*_\alpha$. For each such directed edge, the defining nodes $U_{kj}$ are drawn from a normalized CRM, $W_\alpha / W^*_\alpha$; since $W_\alpha / W^*_\alpha$ is discrete with probability 1, the $U_{kj}$ take a number $N_\alpha \leq 2 D^*_\alpha$ of distinct values. That is, $N_\alpha$ corresponds to the number of nodes with degree at least one in the network. Recall that the undirected network construction simply forms an undirected edge between a set of nodes if there exists at least one directed edge between them. If we consider unordered pairs $\{U_{k1}, U_{k2}\}$, the number of such unique pairs takes a number $N^{(e)}_\alpha \leq D^*_\alpha$ of distinct values, where $N^{(e)}_\alpha$ corresponds to the number of edges in the undirected network.
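The generative model (20) can be sketched directly when the CRM has finitely many atoms on $[0,\alpha]$. The sketch below assumes, purely for illustration, a finite-activity CRM whose number of atoms is Poisson(`jump_rate * alpha`) with exponentially distributed jump sizes; this choice, the parameter names and the function names are all hypothetical and not taken from the paper. Infinite-activity CRMs cannot be simulated this way.

```python
import math
import random


def sample_poisson(rate, rng):
    """Poisson(rate) via Knuth's method, applied in chunks of at most 500
    (independent Poisson draws add up to a Poisson draw)."""
    count = 0
    remaining = rate
    while remaining > 0:
        chunk = min(remaining, 500.0)
        threshold = math.exp(-chunk)
        product = rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        remaining -= chunk
    return count


def sample_restricted_graph(alpha, jump_rate, mean_weight, seed=None):
    """Simulate (20) for an assumed finite-activity CRM on [0, alpha]."""
    rng = random.Random(seed)
    # Atoms of W_alpha: uniform locations and exponential weights (assumption).
    num_atoms = sample_poisson(jump_rate * alpha, rng)
    thetas = [rng.uniform(0.0, alpha) for _ in range(num_atoms)]
    weights = [rng.expovariate(1.0 / mean_weight) for _ in range(num_atoms)]
    total_mass = sum(weights)  # W*_alpha
    # D*_alpha | W*_alpha ~ Poisson(W*_alpha ** 2).
    num_directed = sample_poisson(total_mass ** 2, rng)
    directed_edges = []
    if num_directed > 0:
        # Endpoints U_k1, U_k2 are i.i.d. from the normalized CRM W_alpha / W*_alpha.
        ends = rng.choices(range(num_atoms), weights=weights, k=2 * num_directed)
        directed_edges = list(zip(ends[0::2], ends[1::2]))
    # Unordered pairs give undirected edges; a self-loop becomes a singleton set.
    undirected_edges = {frozenset(edge) for edge in directed_edges}
    active_nodes = {node for edge in directed_edges for node in edge}
    return thetas, weights, directed_edges, undirected_edges, active_nodes


if __name__ == "__main__":
    _, _, directed, undirected, active = sample_restricted_graph(2.0, 5.0, 0.5, seed=1)
    print(len(directed), len(undirected), len(active))
```

The sizes of `active_nodes` and `undirected_edges` play the roles of $N_\alpha$ and $N^{(e)}_\alpha$, and they satisfy the bounds $N_\alpha \leq 2 D^*_\alpha$ and $N^{(e)}_\alpha \leq D^*_\alpha$ by construction.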
The construction (20) enables us to re-express our Cox process model in terms of normalized CRMs (Regazzini, Lijoi and Prünster, 2003). This is very attractive both practically and theoretically; as we show in Section 5, one can use this framework to build on the various results on urn processes and power-law properties of normalized CRMs in order to get exact samplers for our graph models as well as to show their sparsity.

Finite-dimensional generative process. We now describe the urn formulation that allows us to obtain a finite-dimensional generative process. Recall that in practice, we cannot sample $W_\alpha \sim \mathrm{CRM}(\rho, \lambda_\alpha)$ if the CRM is infinite-activity. Let $(U'_1, \ldots, U'_{2D^*_\alpha}) = (U_{11}, U_{12}, \ldots, U_{D^*_\alpha 1}, U_{D^*_\alpha 2})$. For some classes of Lévy measure $\rho$, it is possible to integrate out the normalized CRM $\mu_\alpha = W_\alpha / W^*_\alpha$ in (20) and derive the conditional distribution of $U'_{n+1}$ given $(W^*_\alpha, U'_1, \ldots, U'_n)$. We first recall some background on random partitions. As $\mu_\alpha$ is discrete with probability 1, the variables $U'_1, \ldots, U'_n$ take $k \le n$ distinct values $\widetilde{U}'_j$, with multiplicities $1 \le m_j \le n$. The distribution on the underlying partition is usually defined in terms of an exchangeable partition probability function (EPPF) (Pitman, 1995) $\Pi^{(k)}_n(m_1, \ldots, m_k \mid W^*_\alpha)$, which is symmetric in its arguments. The predictive distribution of $U'_{n+1}$ given $(W^*_\alpha, U'_1, \ldots, U'_n)$ is then given in terms of the EPPF:

$$
U'_{n+1} \mid (W^*_\alpha, U'_1, \ldots, U'_n) \sim
\frac{\Pi^{(k+1)}_{n+1}(m_1, \ldots, m_k, 1 \mid W^*_\alpha)}{\Pi^{(k)}_{n}(m_1, \ldots, m_k \mid W^*_\alpha)} \frac{1}{\alpha} \lambda_\alpha
+ \sum_{j=1}^{k} \frac{\Pi^{(k)}_{n+1}(m_1, \ldots, m_j + 1, \ldots, m_k \mid W^*_\alpha)}{\Pi^{(k)}_{n}(m_1, \ldots, m_k \mid W^*_\alpha)} \delta_{\widetilde{U}'_j}.
\tag{21}
$$

Using this urn representation, we can rewrite our generative process as

$$
\begin{aligned}
W^*_\alpha &\sim P_{W^*_\alpha} \\
D^*_\alpha \mid W^*_\alpha &\sim \mathrm{Poisson}\big(W_\alpha^{*2}\big) \\
(U_{kj})_{k=1,\ldots,D^*_\alpha;\ j=1,2} \mid W^*_\alpha &\sim \text{Urn process (21)} \\
D_\alpha &= \sum_{k=1}^{D^*_\alpha} \delta_{(U_{k1}, U_{k2})},
\end{aligned}
\tag{22}
$$

where $P_{W^*_\alpha}$ is the distribution of the CRM total mass, $W^*_\alpha$. The representation (22) can be used to sample exactly from our graph model, assuming we can sample from $P_{W^*_\alpha}$ and evaluate the EPPF. In Section 5 we show that this is indeed possible for specific CRMs of interest. When it is not possible, Section 4.4 presents alternative, though potentially more computationally complex, methods for simulation.

3.3. Bipartite graphs

The above construction can also be extended to bipartite graphs. Let $V = (\theta_1, \theta_2, \ldots)$ and $V' = (\theta'_1, \theta'_2, \ldots)$ be two countably infinite sets of nodes with $\theta_i, \theta'_i \in \mathbb{R}_+$. We assume that only connections between nodes of different sets are allowed. We represent the directed bipartite multigraph of interest using an atomic measure on $\mathbb{R}_+^2$

$$
D = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} n_{ij}\,\delta_{(\theta_i, \theta'_j)},
\tag{23}
$$

where $n_{ij}$ counts the number of directed edges from node $\theta_i$ to node $\theta'_j$. Similarly, the bipartite graph is represented by an atomic measure

$$
Z = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} z_{ij}\,\delta_{(\theta_i, \theta'_j)}.
$$

Our bipartite graph formulation introduces two CRMs, $W \sim \mathrm{CRM}(\rho, \lambda)$ and $W' \sim \mathrm{CRM}(\rho', \lambda)$, whose jumps correspond to sociability parameters for nodes in sets $V$ and $V'$, respectively. The generative model for the bipartite graph mimics that of the non-bipartite one:

$$
\begin{aligned}
W &= \sum_{i=1}^{\infty} w_i\,\delta_{\theta_i}, & W &\sim \mathrm{CRM}(\rho, \lambda) \\
W' &= \sum_{j=1}^{\infty} w'_j\,\delta_{\theta'_j}, & W' &\sim \mathrm{CRM}(\rho', \lambda) \\
D &= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} n_{ij}\,\delta_{(\theta_i, \theta'_j)}, & D \mid W, W' &\sim \mathrm{PP}(W \times W') \\
Z &= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \min(n_{ij}, 1)\,\delta_{(\theta_i, \theta'_j)}. & &
\end{aligned}
\tag{24}
$$

The model (24) has been proposed by Caron (2012) in a slightly different formulation.
Here, we recast this model within our general framework, making connections with an urn representation and the Kallenberg theory of exchangeability, both of which enable new theoretical and practical insights.

4. General properties and simulation

We provide here general properties of our network model depending on the properties of the Lévy intensity $\rho$. In the next section, we provide more refined properties, depending on specific choices of $\rho$.

4.1. Exchangeability under the Kallenberg framework

Proposition 3 (Joint exchangeability of the undirected graph measure). For any CRM $W \sim \mathrm{CRM}(\rho, \lambda)$, the point process $Z$ defined by (18), or equivalently by (19), is jointly exchangeable.

Proof. The proof follows from the properties of $W \sim \mathrm{CRM}(\rho, \lambda)$. Let $A_i = [h(i-1), hi]$ for $h > 0$ and $i \in \mathbb{N}$. We have

$$
(W(A_i)) \overset{d}{=} (W(A_{\pi(i)}))
\tag{25}
$$

for any permutation $\pi$ of $\mathbb{N}$. As $D(A_i \times A_j) \sim \mathrm{Poisson}(W(A_i) W(A_j))$, it follows that

$$
(D(A_i \times A_j)) \overset{d}{=} (D(A_{\pi(i)} \times A_{\pi(j)}))
\tag{26}
$$

for any permutation $\pi$ of $\mathbb{N}$. Joint exchangeability of $Z$ follows directly.

We now reformulate our network process in the Kallenberg representation of (7). Due to exchangeability, we know that such a representation exists. What we show here is that our CRM-based formulation has an analytic and interpretable representation. In particular, the CRM $W$ can be constructed from a two-dimensional unit-rate Poisson process on $\mathbb{R}_+^2$ using the inverse Lévy method (Khintchine, 1937; Ferguson and Klass, 1972). Let $(\theta_i, \vartheta_i)$ be a unit-rate Poisson process on $\mathbb{R}_+^2$. Let $\bar{\rho}(x)$ be the tail Lévy intensity defined in (14). Then the CRM $W = \sum_i w_i \delta_{\theta_i}$ with Lévy measure $\rho(dw)\,d\theta$ can be constructed from the bidimensional point process by taking $w_i = \bar{\rho}^{-1}(\vartheta_i)$.
$\bar{\rho}^{-1}$ is a monotone function, known as the inverse Lévy intensity. It follows that our undirected graph model can be formulated under the representation of (7) by selecting any $\alpha_0$, $\beta_0 = \gamma_0 = 0$, $g = g' = 0$, $h = h' = l = l' = 0$ and

$$
f(\alpha_0, \vartheta_i, \vartheta_j, \zeta_{\{i,j\}}) =
\begin{cases}
1 & \zeta_{\{i,j\}} \le M(\vartheta_i, \vartheta_j) \\
0 & \text{otherwise}
\end{cases}
\tag{27}
$$

where $M : \mathbb{R}_+^2 \to [0, 1]$ is defined by

$$
M(\vartheta_i, \vartheta_j) =
\begin{cases}
1 - \exp\!\big(-2\,\bar{\rho}^{-1}(\vartheta_i)\,\bar{\rho}^{-1}(\vartheta_j)\big) & \text{if } \vartheta_i \neq \vartheta_j \\
1 - \exp\!\big(-\bar{\rho}^{-1}(\vartheta_i)^2\big) & \text{if } \vartheta_i = \vartheta_j.
\end{cases}
$$

In Section 5, we provide explicit forms for $\bar{\rho}$ depending on our choice of Lévy intensity $\rho$. The expression (27) represents a direct analog to that of (6) arising from the Aldous-Hoover framework. In particular, $M$ here is akin to the graphon $\omega$, and thus allows us to connect our CRM-based formulation with the extensive literature on graphons. An illustration of the network construction from the Kallenberg representation, including the function $M$, is provided in Figure 4. Note that had we started from the Kallenberg representation and selected an $f$ (or $M$) arbitrarily, we would likely not have obtained a network model with the normalized CRM interpretation that enables both interpretability and analysis of network properties, such as those presented in Section 5.3.

For the bipartite graph, an application of Kallenberg's representation theorem for separate exchangeability can likewise be made.

Fig 4. Illustration of the model construction based on the Kallenberg representation. (left) A unit-rate Poisson process $(\theta_i, \vartheta_i)$, $i \in \mathbb{N}$, on $[0, \alpha] \times \mathbb{R}_+$. (right) For each pair $\{i, j\} \in \widetilde{\mathbb{N}}^2$, set $z_{ij} = z_{ji} = 1$ with probability $M(\vartheta_i, \vartheta_j)$. Here, $M$ is indicated by the blue shading (darker shading indicates higher value) for a stable process (generalized gamma process with $\tau = 0$).
In this case there is an analytic expression for $\bar{\rho}^{-1}$ and therefore $M$.

4.2. Sparsity

In this section we state the sparsity properties of our graph model, which relate to the properties of the Lévy intensity $\rho$. Of particular interest is the notion of a regularly varying Lévy intensity (Karlin, 1967; Gnedin, Pitman and Yor, 2006; Gnedin, Hansen and Pitman, 2007), defined as follows.

Definition 4 (Regular variation). Let $W \sim \mathrm{CRM}(\rho, \lambda)$. The CRM is said to be regularly varying if the tail Lévy intensity verifies

$$
\bar{\rho}(x) \overset{x \downarrow 0}{\sim} \ell(1/x)\, x^{-\sigma}
\tag{28}
$$

for $\sigma \in (0, 1)$, where $\ell$ is a slowly varying function satisfying $\lim_{t \to \infty} \ell(at)/\ell(t) = 1$ for any $a > 0$. For example, constant and logarithmic functions are slowly varying. The equivalence notation $f(x) \overset{x \downarrow 0}{\sim} g(x)$ is used for $\lim_{x \to 0} f(x)/g(x) = 1$ (not to be confused with the notation $\sim$ alone for 'distributed from').

As a trivial (and degenerate) example of obtaining sparse graphs, we note that if $\rho(dw) = 0$, then $W([0, \infty)) = 0$ almost surely and there are no edges, $N^{(e)}_\alpha = 0$, and thus no nodes of degree at least one, $N_\alpha = 0$, for all values of $\alpha$. We consider more general Lévy intensities in Theorem 5. In this theorem, we follow the notation of Janson (2011) for probability asymptotics (see Appendix A.1 for details).

Theorem 5. Consider the point process $Z$ with $\rho(w) \neq 0$. Let $\psi(t)$, defined in (13), be the Laplace exponent and $\psi'(t)$ its first derivative; here, $\lim_{t \to 0} \psi'(t) = E[W^*_1]$, the expected total mass for $\alpha = 1$. Let $N^{(e)}_\alpha$ be the number of edges in the undirected graph restriction $Z_\alpha$, and $N_\alpha$ be the number of nodes.
If the CRM $W$ is finite-activity (i.e., is obtained from a compound Poisson process):

$$
\int_0^\infty \rho(w)\,dw < \infty,
$$

then the number of edges scales quadratically with the number of nodes

$$
N^{(e)}_\alpha = \Theta(N_\alpha^2)
\tag{29}
$$

almost surely as $\alpha$ tends to infinity, and the graph is dense. If the CRM is infinite-activity, i.e.

$$
\int_0^\infty \rho(w)\,dw = \infty \quad \text{and} \quad \lim_{t \to 0} \psi'(t) < \infty,
\tag{30}
$$

then the number of edges scales sub-quadratically with the number of nodes

$$
N^{(e)}_\alpha = o(N_\alpha^2)
\tag{31}
$$

almost surely as $\alpha$ tends to infinity, and the graph is sparse. The sparsity regime is linked to the property of regular variation of the Lévy intensity (Definition 4). If the Lévy intensity $\rho$ is regularly varying, i.e. if there exists a slowly varying function $\ell$ such that $\bar{\rho}(x) \overset{x \downarrow 0}{\sim} \ell(1/x)\, x^{-\sigma}$ with $\sigma \in (0, 1)$, and if additionally $\lim_{t \to \infty} \ell(t) > 0$, then

$$
N^{(e)}_\alpha = O\!\left(N_\alpha^{\frac{2}{1+\sigma}}\right)
\tag{32}
$$

almost surely.

Theorem 5 is a direct consequence of two theorems that we state now and prove in Appendix A. The first theorem states that the number of edges grows quadratically with $\alpha$, while the second states that the number of nodes scales superlinearly with $\alpha$ for infinite-activity CRMs, and linearly otherwise.

Theorem 6. Consider the point process $Z$ with $\rho(w) \neq 0$. If $\lim_{t \to 0} \psi'(t) = E[W^*_1] < \infty$, then the number of edges in $Z_\alpha$ grows quadratically with $\alpha$:

$$
N^{(e)}_\alpha = \Theta(\alpha^2)
\tag{33}
$$

almost surely. Otherwise, $N^{(e)}_\alpha = \Omega(\alpha^2)$.

Theorem 7. Consider the point process $Z$ with $\rho(w) \neq 0$. Then

$$
N_\alpha =
\begin{cases}
\Theta(\alpha) & \text{if } W \text{ is a finite-activity CRM} \\
\omega(\alpha) & \text{if } W \text{ is an infinite-activity CRM}
\end{cases}
\tag{34}
$$

almost surely as $\alpha \to \infty$. In words, the number of nodes in $Z_\alpha$ scales linearly with $\alpha$ for finite-activity CRMs and superlinearly with $\alpha$ for infinite-activity CRMs.
In particular, for a regularly varying Lévy intensity with $\lim_{t \to \infty} \ell(t) > 0$, we have

$$
N_\alpha = \Omega(\alpha^{\sigma + 1})
\tag{35}
$$

almost surely as $\alpha \to \infty$.

4.3. Interactions between groups

For any disjoint sets of nodes $A, B \subset \mathbb{R}_+$, $A \cap B = \emptyset$, the probability that there is at least one connection between a node in $A$ and a node in $B$ is given by

$$
\Pr(Z(A \times B) > 0 \mid W) = 1 - \exp(-2 W(A) W(B)).
$$

That is, the probability of a between-group edge depends on the total sociability of each group, $W(A)$ and $W(B)$ respectively, through their product.

4.4. Simulation

To simulate an undirected graph, we harness the directed multigraph representation. That is, we first sample a directed multigraph and then transform it to an undirected graph as described in Section 3.2. One might imagine simulating a directed network by first sampling $W_\alpha$ and then sampling $D_\alpha$ given $W_\alpha$. However, recall that $W_\alpha$ may have an infinite number of jumps. One approximate approach to coping with this issue, which is possible for some Lévy intensities $\rho$, is to resort to adaptive thinning (Lewis and Shedler, 1979; Ogata, 1981; Favaro and Teh, 2013). A related alternative approximate approach, applicable to any Lévy intensity $\rho$ satisfying (12), is the inverse Lévy method. This method first defines a threshold $\varepsilon$ and then samples the weights $\Omega = \{w_i \mid w_i > \varepsilon\}$ using a Poisson measure on $[\varepsilon, +\infty)$. One then simulates $D_\alpha$ using these truncated weights $\Omega$.

A naive application of this truncated method that samples directed or undirected edges as in (18) or (19), respectively, can prove computationally problematic since a large number of possible edges must be considered (one Poisson/Bernoulli draw for each $(\theta_i, \theta_j)$ pair for the directed/undirected case).
Instead, we can harness the Cox process representation and resulting sampling procedure of (20) to first sample the total number of directed edges and then their specific instantiations. More specifically, to approximately simulate a point process on $[0, \alpha]^2$, we use the inverse Lévy method to sample

$$
\Pi_{\alpha, \varepsilon} = \{(w, \theta) \in \Pi,\ 0 < \theta \le \alpha,\ w > \varepsilon\}.
\tag{36}
$$

Let $W_{\alpha, \varepsilon} = \sum_{i=1}^{K} w_i \delta_{\theta_i}$ be the associated truncated CRM and $W^*_{\alpha, \varepsilon} = W_{\alpha, \varepsilon}([0, \alpha])$ its total mass. We then sample $D^*_{\alpha, \varepsilon}$ and $U_{k,j}$ as in (20) and set $D_{\alpha, \varepsilon} = \sum_{k=1}^{D^*_{\alpha, \varepsilon}} \delta_{(U_{k1}, U_{k2})}$. The undirected graph measure $Z_{\alpha, \varepsilon}$ is then obtained from $D_{\alpha, \varepsilon}$ as in (18).

In the next section, we show that it is possible to sample a graph exactly via an urn scheme when considering the special case of generalized gamma processes, which includes the standard gamma process.

5. Special cases

In this section, we examine the properties of various models and their link to classical random graph models depending on the Lévy measure $\rho$. We show that in the generalized gamma process case, the resulting graph can be either dense or sparse, with the sparsity tuned by a single hyperparameter. We focus on the undirected graph case, but similar results can be obtained for directed multigraphs and bipartite graphs.

5.1. Poisson process

Consider a Poisson process with fixed increments $w_0$ and

$$
\rho(dw) = \delta_{w_0}(dw),
$$

where $\delta_{w_0}$ is the Dirac delta mass at $w_0 > 0$. Recalling the definition $\bar{\rho}(x) = \int_x^\infty \rho(dw)$, in this case we have

$$
\bar{\rho}(x) =
\begin{cases}
1 & \text{if } x < w_0 \\
0 & \text{otherwise.}
\end{cases}
$$

Ignoring self-edges, the graph construction can be described as follows. To sample $W_\alpha \sim \mathrm{PP}(\rho, \lambda_\alpha)$, we generate $n \sim \mathrm{Poisson}(\alpha)$ and then sample $\theta_i \sim \mathrm{Uniform}([0, \alpha])$ for $i = 1, \ldots, n$. We then sample edges according to (19): for $1 \le i < j \le n$, set $z_{ij} = z_{ji} = 1$ with probability $1 - \exp(-2 w_0^2)$ and 0 otherwise.
The model is therefore equivalent to the Erdős-Rényi random graph model $G(n, p)$ with $n \sim \mathrm{Poisson}(\alpha)$ and $p = 1 - \exp(-2 w_0^2)$. Therefore, this choice of $\rho$ leads to a dense graph where the number of edges grows quadratically with the number of nodes $n$.

5.2. Compound Poisson process

A compound Poisson process is one where $\rho(dw) = h(w)\,dw$ and $h : \mathbb{R}_+ \to \mathbb{R}_+$ is such that $\int_0^\infty h(w)\,dw = 1$. In this case, we have

$$
\bar{\rho}(x) = 1 - H(x),
$$

where $H$ is the distribution function associated with $h$. Here, we arrive at a framework similar to the standard graphon. Leveraging the Kallenberg representation of (27), we first sample $n \sim \mathrm{Poisson}(\alpha)$. Then, for $1 \le i < j \le n$, we set $z_{ij} = z_{ji} = 1$ with probability $M(U_i, U_j)$, where the $U_i$ are uniform variables and $M$ is defined by

$$
M(U_i, U_j) = 1 - \exp\!\big(-2 H^{-1}(U_i) H^{-1}(U_j)\big).
$$

This representation is the same as that of the Aldous-Hoover theorem, but with a random number of nodes that follows a Poisson distribution. As such, the resulting random graph is either trivially empty or dense.

5.3. Generalized gamma process

The generalized gamma process (GGP) (Hougaard, 1986; Aalen, 1992; Lee and Whitmore, 1993; Brix, 1999) is a flexible two-parameter CRM, with interpretable parameters and remarkable conjugacy properties (Lijoi, Mena and Prünster, 2007; Caron, Teh and Murphy, 2014). The process is known as the Hougaard process (Hougaard, 1986) when $\lambda$ is the Lebesgue measure, as in this paper, but we will use the term GGP in the rest of this paper. The Lévy intensity of the GGP is given by

$$
\rho(dw) = \frac{1}{\Gamma(1 - \sigma)}\, w^{-1-\sigma} \exp(-\tau w)\,dw,
\tag{37}
$$

where the two parameters $(\sigma, \tau)$ verify

$$
(\sigma, \tau) \in (-\infty, 0] \times (0, +\infty) \quad \text{or} \quad (\sigma, \tau) \in (0, 1) \times [0, +\infty).
\tag{38}
$$

The GGP has different properties depending on whether $\sigma \ge 0$ or $\sigma < 0$.
When $\sigma < 0$, the GGP is a finite-activity CRM; more precisely, the number of jumps in $[0, \alpha]$ is finite w.p. 1 and drawn from a Poisson distribution with rate $-\frac{\alpha}{\sigma} \tau^{\sigma}$, while the jumps $w_i$ are i.i.d. $\mathrm{Gamma}(-\sigma, \tau)$. When $\sigma \ge 0$, the GGP has an infinite number of jumps over any interval $[s, t]$. It includes as special cases the gamma process ($\sigma = 0$, $\tau > 0$), the stable process ($\sigma \in (0, 1)$, $\tau = 0$) and the inverse-Gaussian process ($\sigma = \frac{1}{2}$, $\tau > 0$). The tail Lévy intensity of the GGP is given by

$$
\bar{\rho}(x) = \int_x^\infty \frac{1}{\Gamma(1 - \sigma)}\, w^{-1-\sigma} \exp(-\tau w)\,dw =
\begin{cases}
\dfrac{\tau^{\sigma}\, \Gamma(-\sigma, \tau x)}{\Gamma(1 - \sigma)} & \text{if } \tau > 0 \\[2ex]
\dfrac{x^{-\sigma}}{\Gamma(1 - \sigma)\, \sigma} & \text{if } \tau = 0,
\end{cases}
$$

where $\Gamma(a, x)$ is the incomplete gamma function. Example realizations of the process for various values of $\sigma \ge 0$ are displayed in Figure 5 alongside a realization of an Erdős-Rényi graph.

Fig 5. Sample graphs: (a) Erdős-Rényi graph $G(n, p)$ with $n = 1000$ and $p = 0.05$; (b-d) generalized gamma process graph $\mathrm{GGP}(\alpha, \tau, \sigma)$ with $\alpha = 100$, $\tau = 2$ and (b) $\sigma = 0$, (c) $\sigma = 0.5$, (d) $\sigma = 0.8$. The size of a node is proportional to its degree. Graphs have been generated with the software Gephi.

Exact sampling via an urn approach. In the case $\sigma > 0$, $W^*_\alpha$ is an exponentially tilted stable random variable, for which exact samplers exist (Devroye, 2009). As shown by Pitman (2003) (see also Lijoi, Prünster and Walker, 2008), the EPPF conditional on the total mass $W^*_\alpha = t$ depends only on the parameter $\sigma$ (and not on $\tau$, $\alpha$) and is given by

$$
\Pi^{(k)}_{n}(m_1, \ldots, m_k \mid t) = \frac{\sigma^k t^{-n}}{\Gamma(n - k\sigma)\, g_\sigma(t)} \int_0^t s^{n - k\sigma - 1} g_\sigma(t - s)\,ds \left( \prod_{i=1}^{k} \frac{\Gamma(m_i - \sigma)}{\Gamma(1 - \sigma)} \right),
\tag{39}
$$

where $g_\sigma$ is the pdf of the positive stable distribution.
Plugging the EPPF of (39) into (21) yields the urn process for sampling in the GGP case. In particular, one can use the generative process (22) in order to sample exactly from the model.

In the special case of the gamma process ($\sigma = 0$), $W^*_\alpha$ is a $\mathrm{Gamma}(\alpha, \tau)$ random variable and the resulting urn process is given by (Blackwell and MacQueen, 1973; Pitman, 1996):

$$
U'_{n+1} \mid (W^*_\alpha, U'_1, \ldots, U'_n) \sim \frac{1}{\alpha + n} \lambda_\alpha + \sum_{j=1}^{k} \frac{m_j}{\alpha + n} \delta_{\widetilde{U}'_j}.
\tag{40}
$$

When $\sigma < 0$, the GGP is a compound Poisson process and can thus be sampled exactly.

Expected number of nodes and edges. In Theorem 8, we consider bounds on the expected number of nodes in the gamma process case ($\sigma = 0$, $\tau > 0$), and the expected number of edges in the multigraph. The proof is in Appendix C.

Theorem 8. For any $\varepsilon \in (0, 1)$,

$$
\alpha \log\!\left(1 + \varepsilon\, \frac{2(\alpha + 1)}{\tau^2}\right) \left(1 - \frac{c_1(\alpha)}{(1 - \varepsilon)^2}\right) \le E[N_\alpha] \le \alpha \log\!\left(1 + \frac{2(\alpha + 1)}{\tau^2}\right) + \frac{2(\alpha + 1)}{\tau^2 + 2(\alpha + 1)},
\tag{41}
$$

where $c_1(\alpha) = \frac{\mathrm{Var}(D^*_\alpha)}{E[D^*_\alpha]^2} = \frac{\tau^2}{\alpha(\alpha + 1)} \left(1 + \frac{4\alpha + 6}{\tau^2}\right)$ is a decreasing function of $\alpha$ with $c_1(\alpha) \to 0$ as $\alpha \to \infty$. Consequently,

$$
E[N_\alpha] = \Theta(\alpha \log \alpha).
\tag{42}
$$

Let $D^*_\alpha$ be the number of edges in the directed multigraph. Then

$$
E[D^*_\alpha] = \frac{\alpha(\alpha + 1)}{\tau^2}, \qquad \mathrm{Var}[D^*_\alpha] = \frac{\alpha(\alpha + 1)}{\tau^2} \left(1 + \frac{4\alpha + 6}{\tau^2}\right).
$$

Power-law properties. In Theorem 9, we show that the GGP directed multigraph has a power-law degree distribution. A corresponding theorem in the undirected graph case is challenging to show and beyond the scope of this paper, but our empirical results of Figure 6 suggest that such a power-law property holds for the undirected case as well.

Theorem 9. Let $N_{\alpha, j}$, $j \ge 1$, be the number of nodes in the directed multigraph $D_\alpha$ with $j$ outgoing or incoming edges (a self-edge counts twice for a given node).
Then we have the following asymptotic results for the GGP:

$$
\frac{N_{\alpha, j}}{N_\alpha} \xrightarrow{\ \alpha \uparrow \infty\ } p_{\sigma, j} = \frac{\sigma\, \Gamma(j - \sigma)}{\Gamma(1 - \sigma)\, \Gamma(j + 1)},
\tag{43}
$$

almost surely, for fixed $j$. In particular, for large $j$, we have tail behavior

$$
p_{\sigma, j} \overset{j \uparrow \infty}{\sim} \frac{\sigma}{\Gamma(1 - \sigma)}\, j^{-1-\sigma}
\tag{44}
$$

corresponding to a power-law behavior.

The proof, which builds on the asymptotic properties of the normalized GGP (Lijoi, Mena and Prünster, 2007), is given in Appendix C.

Sparsity. The following theorem states that the GGP parameter $\sigma$ tunes the sparsity of the graph. When $\sigma < 0$, the graph is dense, whereas it is sparse when $\sigma \ge 0$.

Theorem 10. Let $N_\alpha$ be the number of nodes and $N^{(e)}_\alpha$ the number of edges in the undirected graph restriction, $Z_\alpha$. Then

$$
N^{(e)}_\alpha =
\begin{cases}
\Theta\!\left(N_\alpha^2\right) & \text{if } \sigma < 0 \\
o\!\left(N_\alpha^2\right) & \text{if } \sigma \in [0, 1),\ \tau > 0 \\
O\!\left(N_\alpha^{2/(1+\sigma)}\right) & \text{if } \sigma \in (0, 1),\ \tau > 0
\end{cases}
$$

almost surely as $\alpha \to \infty$. That is, the underlying graph is sparse if $\sigma \ge 0$ and dense otherwise.

Proof. For $\sigma < 0$, the CRM is finite-activity and thus Theorem 5 implies that the graph is dense. When $\sigma \ge 0$ the CRM is infinite-activity; moreover, for $\tau > 0$, $E[W^*_\alpha] < \infty$, and thus Theorem 5 implies that the graph is sparse. More precisely, for $\sigma > 0$, the tail Lévy intensity has the asymptotic behavior

$$
\bar{\rho}(x) \overset{x \downarrow 0}{\sim} \frac{1}{\sigma\, \Gamma(1 - \sigma)}\, x^{-\sigma},
$$

which matches the small-$x$ limit of the tail Lévy intensity given above, and so Theorem 10 follows directly from Theorem 5.

Remark 11. The proof technique requires a finite first moment for the total mass $W^*_\alpha$, and thus excludes the stable process ($\tau = 0$, $\sigma \in (0, 1)$), although we conjecture that the graph is also sparse in that case.

Empirical analysis of graph properties. For the GGP-based formulation, we provide an empirical analysis of our network properties in Figure 6 by simulating undirected graphs using the approach described in Section 4.4 for various values of $\sigma, \tau$.
We compare to an Erdős-Rényi random graph, preferential attachment (Barabási and Albert, 1999), and the Bayesian nonparametric network model of Lloyd et al. (2012). The particular features we explore are

• Degree distribution: Figure 6(a) demonstrates that the model can exhibit power-law behavior, providing a heavy-tailed degree distribution. As shown in Figure 6(b), the model can also handle an exponential cut-off in the tails of the degree distribution, which is an attractive property (Newman, 2009).
• Number of degree 1 nodes: Figure 6(c) examines the number of degree 1 nodes versus the number of nodes.
• Sparsity: Figure 6(d) plots the number of edges versus the number of nodes. The larger $\sigma$, the sparser the graph. In particular, for the GGP random graph model, we have network growth at a rate $O(n^a)$ for $1 < a < 2$, whereas the Erdős-Rényi (dense) graph grows as $\Theta(n^2)$.

Interpretation of hyperparameters. Based on the properties derived and explored empirically in this section, we see that our hyperparameters have the following interpretations:

• $\sigma$: From Figure 6(a) and (d), $\sigma$ relates to the slope of the degree distribution in its power-law regime and the overall network sparsity. Increasing $\sigma$ leads to a higher power-law exponent and sparser networks.
• $\alpha$: From Theorem 8, $\alpha$ provides an overall scale that affects the number of nodes and directed interactions, with larger $\alpha$ leading to larger networks.
• $\tau$: From Figure 6(b), $\tau$ determines the exponential decay of the tails of the power-law degree distribution, with small $\tau$ resembling a pure power law. This is intuitive from the form of $\rho(dw)$ in (37), where we see that $\tau$ affects large weights more than small ones.

[Figure 6: four log-log panels comparing ER, BA, Lloyd and GGP graphs. (a) Distribution versus degree, GGP with $\sigma = 0.2, 0.5, 0.8$. (b) Distribution versus degree, GGP with $\tau = 10^{-1}, 1, 5$. (c) Number of nodes of degree one versus number of nodes, GGP with $\sigma = 0, 0.5, 0.8$. (d) Number of edges versus number of nodes, GGP with $\sigma = 0, 0.5, 0.8$.]

Fig 6. Examination of the GGP undirected network properties (averaging over graphs with various $\alpha$) in comparison to an Erdős-Rényi $G(n, p)$ model with $p = 0.05$ (ER), the preferential attachment model of Barabási and Albert (1999) (BA), and the nonparametric formulation of Lloyd et al. (2012) (Lloyd). (a-b) Degree distribution on a log-log scale for (a) various values of $\sigma$ ($\tau = 10^{-2}$) and (b) various values of $\tau$ ($\sigma = 0.5$) for the GGP. (c) Number of nodes with degree one versus the number of nodes on a log-log scale. Note that the Lloyd method leads to dense graphs such that no node has only degree 1. (d) Number of edges versus the number of nodes. In (d) we note growth at a rate $o(n^2)$ for our GGP graph models, and $\Theta(n^2)$ for the Erdős-Rényi and Lloyd models (dense graphs).

6. Posterior characterization and inference

In this section we target inferring the posterior distribution of the sociability parameters, $w_i$, the restriction value $\alpha$, and the CRM hyperparameters. In the special case of GGPs, our hyperparameters of interest are then the set $(\alpha, \sigma, \tau)$.

6.1. Directed multigraph and undirected simple graph

We first characterize the conditional distribution of the restricted CRM $W_\alpha$ given the directed graph $D_\alpha$ (see (20) and surrounding text). In what follows, we utilize the fact that the conditional CRM $W_\alpha$ given $D_\alpha$ can be decomposed as a sum of (i) a measure with fixed locations $\theta_i$ and random weights $w_i$, corresponding to nodes for which we observed at least one connection, and (ii) a measure with random weights and random atoms, corresponding to the remaining set of nodes. We denote the total mass of this remaining measure as $w_*$.

Theorem 12. Let $(\theta_1, \ldots, \theta_{N_\alpha})$, $N_\alpha \ge 0$, be the set of support points of $D_\alpha$ such that $D_\alpha = \sum_{1 \le i, j \le N_\alpha} n_{ij}\,\delta_{(\theta_i, \theta_j)}$. Let $m_i = \sum_{j=1}^{N_\alpha} (n_{ij} + n_{ji}) > 0$ for $i = 1, \ldots, N_\alpha$. The conditional distribution of $W_\alpha$ given $D_\alpha$ is equivalent to the distribution of

$$
w_* \sum_{i=1}^{\infty} \widetilde{P}_i\,\delta_{\widetilde{\theta}_i} + \sum_{i=1}^{N_\alpha} w_i\,\delta_{\theta_i}
\tag{45}
$$

where $\widetilde{\theta}_i \sim \mathrm{Unif}([0, \alpha])$, and the weights $(\widetilde{P}_i)_{i=1,2,\ldots}$, with $\widetilde{P}_1 > \widetilde{P}_2 > \ldots$ and $\sum_{i=1}^{\infty} \widetilde{P}_i = 1$, are distributed from a Poisson-Kingman distribution (Pitman, 2003, Definition 3 p.6) with Lévy intensity $\rho$, conditional on $w_*$:

$$
(\widetilde{P}_i) \mid w_* \sim \mathrm{PK}(\rho \mid w_*).
$$

Finally, the weights $(w_1, \ldots, w_{N_\alpha}, w_*)$ are jointly dependent conditional on $D_\alpha$, with the following posterior distribution:

$$
p(w_1, \ldots, w_{N_\alpha}, w_* \mid D_\alpha) \propto \left[ \prod_{i=1}^{N_\alpha} w_i^{m_i} \right] e^{-\left( \sum_{i=1}^{N_\alpha} w_i + w_* \right)^2} \left[ \prod_{i=1}^{N_\alpha} \rho(w_i) \right] \times g^*_\alpha(w_*)
\tag{46}
$$

where $g^*_\alpha$ is the probability density function of the random variable $W^*_\alpha = W_\alpha([0, \alpha])$, with Laplace transform

$$
E\!\left[e^{-t W^*_\alpha}\right] = e^{-\alpha \psi(t)}.
\tag{47}
$$

Proof. The proof builds on the Palm formula for Poisson random measures (Prünster, 2002; James, 2002, 2005; James, Lijoi and Prünster, 2009) and is described in Appendix D.

Note that the normalized weights $(\widetilde{P}_i)_{i=1,2,\ldots}$ and locations $(\widetilde{\theta}_i)_{i=1,2,\ldots}$ are not likelihood identifiable, as the likelihood only brings information on the weights of the observed nodes, and on the total mass $w_*$ of the remaining nodes. Additionally, note that the conditional distribution of $(w_1, \ldots, w_{N_\alpha}, w_*)$ given $D_\alpha$ does not depend on the locations $(\theta_1, \ldots, \theta_{N_\alpha})$ because we considered a homogeneous CRM. This fact is important since the locations $(\theta_1, \ldots, \theta_{N_\alpha})$ are typically not observed, and our algorithm outlined below will not consider these terms in the inference.

We now specialize to the GGP, for which we derive an MCMC sampler for posterior inference. Let $\phi = (\alpha, \sigma, \tau)$ be the set of hyperparameters that we also want to estimate. We will assume improper priors on those parameters:

$$
p(\alpha) \propto \frac{1}{\alpha}, \qquad p(\sigma) \propto \frac{1}{1 - \sigma}, \qquad p(\tau) \propto \frac{1}{\tau}.
$$

To emphasize the dependence of the Lévy measure and of the pdf of the total mass $w_*$ on the hyperparameters, we write $\rho(w \mid \sigma, \tau)$ and $g^*_{\alpha, \sigma, \tau}(w_*)$.

We are interested in approximating the posterior $p(w_1, \ldots, w_{N_\alpha}, w_*, \phi \mid (n_{ij})_{1 \le i, j \le N_\alpha})$ for a directed multigraph or $p(w_1, \ldots, w_{N_\alpha}, w_*, \phi \mid (z_{ij})_{1 \le i, j \le N_\alpha})$ for a simple graph. In the case of a simple graph, we will simply impute the missing directed edges in the graph. For each $i \le j$ such that $z_{ij} = 1$, we introduce latent variables $\bar{n}_{ij} = n_{ij} + n_{ji}$ with conditional distribution

$$
\bar{n}_{ij} \mid z, w \sim
\begin{cases}
\delta_0 & \text{if } z_{ij} = 0 \\
\mathrm{tPoisson}(2 w_i w_j) & \text{if } z_{ij} = 1,\ i \neq j \\
\mathrm{tPoisson}(w_i^2) & \text{if } z_{ii} = 1,\ i = j,
\end{cases}
\tag{48}
$$

where $\mathrm{tPoisson}(\lambda)$ is the zero-truncated Poisson distribution with pdf

$$
\frac{\lambda^k \exp(-\lambda)}{(1 - \exp(-\lambda))\, k!}, \qquad k = 1, 2, \ldots
$$

By convention, we set $\bar{n}_{ij} = \bar{n}_{ji}$ for $j < i$ and $m_i = \sum_{j=1}^{N_\alpha} \bar{n}_{ij}$.
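To make the imputation step (48) concrete, the following Python sketch draws a latent count from the zero-truncated Poisson distribution by inverting its cumulative distribution function. It is a minimal sketch, not the authors' implementation: the function names `sample_truncated_poisson` and `impute_count` are hypothetical, the rate is assumed strictly positive, and plain inversion is only intended for moderate rates (for very large rates the term $\exp(-\lambda)$ underflows and another method would be needed).

```python
import math
import random


def sample_truncated_poisson(rate, rng):
    """Inverse-CDF draw from the zero-truncated Poisson distribution with the given rate > 0."""
    u = rng.random()
    k = 1
    # P(k = 1) = rate * exp(-rate) / (1 - exp(-rate)); expm1 keeps small rates accurate.
    pmf = rate * math.exp(-rate) / -math.expm1(-rate)
    cdf = pmf
    while u > cdf and pmf > 0.0:
        k += 1
        pmf *= rate / k          # P(k) = P(k - 1) * rate / k
        cdf += pmf
    return k


def impute_count(z_ij, w_i, w_j, same_node, rng):
    """Latent count for one node pair as in (48): 0 without an edge, truncated Poisson otherwise."""
    if z_ij == 0:
        return 0
    rate = w_i * w_i if same_node else 2.0 * w_i * w_j
    return sample_truncated_poisson(rate, rng)


rng = random.Random(0)
print(impute_count(1, 0.3, 0.7, False, rng), impute_count(0, 0.3, 0.7, False, rng))
```

Since each pair with $z_{ij} = 1$ is imputed independently given the weights, a loop over observed edges calling `impute_count` could be parallelized.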
For efficient exploration of the target posterior, we propose using a Hamiltonian Monte Carlo (HMC) algorithm (Duane et al., 1987; Neal, 2011) within Gibbs to update the weights $(w_1,\ldots,w_{N_\alpha})$. The HMC step requires computing the gradient of the log-posterior, which in our case, letting $\omega_i=\log w_i$, is given by
\[
\left[\nabla_{\omega_{1:N_\alpha}}\log p(\omega_{1:N_\alpha},w_*\mid D_\alpha)\right]_i
= m_i-\sigma-w_i\left(\tau+2\sum_{j=1}^{N_\alpha}w_j+2w_*\right).
\tag{49}
\]
For the update of the total mass $w_*$ and hyperparameters $\phi$, we use a Metropolis-Hastings step. Note that, except in some particular cases ($\sigma=0,\tfrac12$), the density $g^*_{\alpha,\sigma,\tau}(w_*)$ does not admit any analytical expression. We therefore use a specific proposal for $w_*$ based on exponential tilting of $g^*_{\alpha,\sigma,\tau}$ that alleviates the need to evaluate this pdf in the Metropolis-Hastings ratio (see details in Appendix E). To summarize, the MCMC sampler is defined as follows:

1. Update the weights $(w_1,\ldots,w_{N_\alpha})$ given the rest using an HMC update
2. Update the total mass $w_*$ and hyperparameters $\phi=(\alpha,\sigma,\tau)$ given the rest using a Metropolis-Hastings update
3. [Undirected graph] Update the latent counts $(\bar n_{ij})$ given the rest using the conditional distribution (48) or a Metropolis-Hastings update

Note that the computational bottlenecks lie in steps 1 and 3, which roughly scale linearly in the number of nodes and edges, respectively, although one can parallelize step 3 over edges. If $L$ is the number of leapfrog steps in the HMC algorithm and $n_{\mathrm{iter}}$ the number of MCMC iterations, the overall complexity is $O(n_{\mathrm{iter}}(LN_\alpha+N^{(e)}_\alpha))$. We show in Section 7 that the algorithm scales well to large networks with hundreds of thousands of nodes and edges. To efficiently scale HMC to even larger collections of nodes/edges, one can deploy the methods of Chen, Fox and Guestrin (2014).

6.2. Bipartite graph

For the bipartite graph case, the posterior characterization follows as proposed by Caron (2012). However, our proposed data augmentation is different and leads to a simpler form for the sampler.

Theorem 13. Let $(\theta_1,\ldots,\theta_{N_\alpha})$, $(\theta'_1,\ldots,\theta'_{N'_\alpha})$, with $N_\alpha,N'_\alpha\ge0$, be the set of support points of $D_\alpha$, so that $D_\alpha=\sum_{i=1}^{N_\alpha}\sum_{j=1}^{N'_\alpha}n_{ij}\,\delta_{(\theta_i,\theta'_j)}$. Let $m_i=\sum_{j=1}^{N'_\alpha}n_{ij}$ and $m'_j=\sum_{i=1}^{N_\alpha}n_{ij}$. The conditional distribution of $W_\alpha$ given $D_\alpha,W'_\alpha$ is equivalent to the distribution of
\[
\widetilde W+\sum_{i=1}^{N_\alpha}w_i\,\delta_{\theta_i}
\tag{50}
\]
where $(w_1,\ldots,w_{N_\alpha})$ are independent of $\widetilde W$ with
\[
p(w_i\mid D_\alpha,W'_\alpha)\propto w_i^{m_i}\,e^{-w_i\left(\sum_{j=1}^{N'_\alpha}w'_j+w'_*\right)}\rho(w_i)
\tag{51}
\]
and $\widetilde W\sim\mathrm{CRM}(\tilde\rho,\lambda_\alpha)$ is a CRM with exponentially tilted Lévy intensity
\[
\tilde\rho(w)=\rho(w)\,e^{-w\left(\sum_{j=1}^{N'_\alpha}w'_j+w'_*\right)}.
\tag{52}
\]
In particular, for the generalized gamma process, we have
\[
w_i\mid D_\alpha,W'_\alpha\sim\mathrm{Gamma}\left(m_i-\sigma,\ \tau+\sum_{j=1}^{N'_\alpha}w'_j+w'_*\right)
\tag{53}
\]
and the total mass $w_*$ of $\widetilde W$ is distributed from an exponentially tilted stable distribution with pdf
\[
p(w_*\mid\text{rest})=\frac{e^{-w_*\left(\sum_{j=1}^{N'_\alpha}w'_j+w'_*\right)}g_\alpha(w_*)}{e^{-\alpha\psi\left(\sum_{j=1}^{N'_\alpha}w'_j+w'_*\right)}},
\tag{54}
\]
from which one can sample exactly (Devroye, 2009; Hofert, 2011). Additionally, the marginal likelihood is expressed as
\[
\mathbb{E}_{W_\alpha}\left[p(D_\alpha\mid W'_\alpha)\right]
=e^{-\alpha\psi\left(\sum_{j=1}^{N'_\alpha}w'_j+w'_*\right)}\,\alpha^{N_\alpha}\prod_{i=1}^{N_\alpha}\kappa\left(m_i,\ \sum_{j=1}^{N'_\alpha}w'_j+w'_*\right)d\theta_i,
\tag{55}
\]
where $\kappa(n,z)=\int_0^\infty w^n\exp(-zw)\rho(w)\,dw$.

Proof. The proof is described by Caron (2012) and is given in Appendix D for completeness.

Let $\phi=(\alpha,\sigma,\tau)$ and $\phi'=(\alpha',\sigma')$ be, respectively, the parameters of the Lévy intensity of $W$ and $W'$. To preserve identifiability, we set the parameter $\tau'$ to 1.
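As a concrete illustration of the conjugate update (53), the sketch below draws the weights of the observed nodes on one side of the bipartite graph, given the weights on the other side. It is a minimal sketch under stated assumptions: the function name `update_weights` and the argument names are hypothetical, the Gamma distribution in (53) is taken to be in shape/rate form, and Python's `random.gammavariate` (which takes shape and scale) stands in for whatever sampler the authors used.

```python
import random


def update_weights(m, sigma, tau, w_prime, w_prime_rest, rng=random):
    """Gibbs draw of (w_1, ..., w_N) following Eq. (53).

    m            : degrees m_i of the observed nodes (each m_i >= 1)
    sigma, tau   : GGP hyperparameters, with sigma < 1
    w_prime      : weights w'_j of the observed nodes on the other side
    w_prime_rest : remaining total mass w'_* on the other side
    All names are illustrative assumptions.
    """
    # Rate parameter shared by all nodes: tau + sum_j w'_j + w'_*.
    rate = tau + sum(w_prime) + w_prime_rest
    # gammavariate expects (shape, scale); scale = 1 / rate.
    return [rng.gammavariate(m_i - sigma, 1.0 / rate) for m_i in m]
```

Because the rate is shared across nodes, the update costs one Gamma draw per observed node, which matches the simple form of the sampler claimed for this data augmentation.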
The MCMC sampler for approximating $p(w_{1:N_\alpha},w_*,w'_{1:N'_\alpha},w'_*,\phi,\phi'\mid Z_\alpha)$ iterates as follows:

1. Update $\alpha,\sigma,\tau$ given $w'_{1:N'_\alpha}$ using a Metropolis-Hastings step with acceptance ratio calculated with (55)
2. Update $w_{1:N_\alpha}$ given $(w'_{1:N'_\alpha},w'_*,\alpha,\sigma,\tau)$ using (53)
3. Update $w_*$ given $(w'_{1:N'_\alpha},w'_*,\alpha,\sigma,\tau)$ using (54)
4. Update the latent $n_{ij}$ given $w'_{1:N'_\alpha},w_{1:N_\alpha}$ as
\[
n_{ij}\mid z,w,w'\sim
\begin{cases}
\delta_0 & \text{if } z_{ij}=0\\
\mathrm{tPoisson}(w_iw'_j) & \text{if } z_{ij}=1
\end{cases}
\]

The model is symmetric in $(w,w')$, so the first three steps can be repeated for updating $(\alpha',\sigma',w'_{1:N'_\alpha},w'_*)$, with $\tau'$ held fixed at 1. Full algorithmic details are given in Appendix E.

7. Experiments

7.1. Simulated data

We first study the convergence of the MCMC algorithm on simulated data where the graph is simulated from our model. We simulate a GGP undirected graph with parameters $\alpha=300$, $\sigma=0.5$, $\tau=1$. Note that we are in the sparse regime. The sampled graph has 13,995 nodes and 76,605 edges. We run 3 MCMC chains, each with 40,000 iterations and with different initial values. We use $L=10$ leapfrog steps, and the step size of the leapfrog algorithm is adapted during the first 10,000 iterations so as to obtain an acceptance rate of 0.6. Standard deviations of the random walk Metropolis-Hastings for $\log\tau$ and $\log(1-\sigma)$ are set to 0.02. It takes 10 minutes with Matlab on a standard computer (CPU@3.10GHz, 4 cores) to run the 3 chains successively. Trace plots of the parameters $\alpha$, $\sigma$, $\tau$ and $w_*$ are given in Figure 7. The potential scale reduction factor (Brooks and Gelman, 1998; Gelman et al., 2014) is computed for all 13,999 parameters $(w_{1:N_\alpha},w_*,\alpha,\sigma,\tau)$ and has a maximum value of 1.01, indicating convergence of the algorithm. This is rather remarkable, as the MCMC sampler actually samples from a target distribution of dimension 13,995 + 76,605 + 4 = 90,604. Posterior credible intervals of the sociability parameters $w_i$ of the nodes with highest degrees and of the log-sociability parameters $\log w_i$ of the nodes with lowest degrees are displayed in Figure 8(a) and (b), respectively, showing the ability of the method to accurately recover sociability parameters of both low and high degree nodes.

To show the versatility of the GGP graph model, we now examine our approach when the observed graph is actually generated from an Erdős-Rényi model with $n=1{,}000$ and $p=0.01$. The generated graph has 1,000 nodes and 5,058 edges. We ran 3 MCMC chains with the same specifications as above. In this dense-graph regime, the following transformation of our parameters $\alpha$, $\sigma$ and $\tau$ is more informative:
\[
\varsigma_1=-\frac{\alpha}{\sigma}\tau^{\sigma},\qquad \varsigma_2=-\frac{\sigma}{\tau},\qquad \varsigma_3=-\frac{\sigma}{\tau^2}.
\]
When $\sigma<0$, $\varsigma_1$ corresponds to the expected number of nodes, $\varsigma_2$ to the mean of the sociability parameters and $\varsigma_3$ to their variance (see Section 5.3). In contrast, the parameters $\sigma$ and $\tau$ are only weakly identifiable in this case. The potential scale reduction factor is computed on $(w_{1:N_\alpha},w_*,\varsigma_1,\varsigma_2,\varsigma_3)$, and its maximum value is 1.01, indicating convergence. Trace plots are shown in Figure 9 for $\varsigma_1$, $\varsigma_2$, $\varsigma_3$ and $w_*$. The value of $\varsigma_1$ converges around the true number of nodes and $\varsigma_2$ to the true sociability parameter $\sqrt{-\tfrac12\log(1-p)}$ (constant across nodes for the Erdős-Rényi model), while $\varsigma_3$ is close to zero as the variance over the sociability parameters is very small. The total mass is very close to zero, indicating that there are no nodes with degree zero.

7.2. Testing for sparsity of real-world graphs

We now turn to using our methods to test whether a given graph is sparse or not. Such testing based on a single given graph is notoriously challenging, as sparsity relates to the asymptotic behavior of the graph.
Measures of sparsity from finite graphs exist, but can be costly to implement (Nešetřil and Ossona de Mendez, 2012). Based on our GGP-based formulation and associated theoretical results described in Section 5, we propose the following test:
\[
H_0:\ \sigma<0\quad\text{vs}\quad H_1:\ \sigma\ge0.
\]
In our experiments, we again consider a GGP-based graph model with improper priors on the unknown parameters $(\alpha,\sigma,\tau)$, as described in Section 6. We aim to report $\Pr(H_1\mid z)=\Pr(\sigma\ge0\mid z)$ based on a set of observed connections $(z)$, which can be directly approximated from the MCMC output. We consider 12 different datasets:

• facebook107: Social circles from Facebook (McAuley and Leskovec, 2012); https://snap.stanford.edu/data/egonets-Facebook.html
• polblogs: Political blogosphere (Feb. 2005) (Adamic and Glance, 2005); http://www.cise.ufl.edu/research/sparse/matrices/Newman/polblogs
• USairport: US airport connection network in 2010 (Colizza, Pastor-Satorras and Vespignani, 2007); http://toreopsahl.com/datasets/
• UCirvine: Social network of students at University of California, Irvine (Opsahl and Panzarasa, 2009); http://toreopsahl.com/datasets/
• yeast: Yeast protein interaction network (Bu et al., 2003); http://www.cise.ufl.edu/research/sparse/matrices/Pajek/yeast.html
• USpower: Network of high-voltage power grid in the Western States of the United States of America (Watts and Strogatz, 1998); http://toreopsahl.com/datasets/
• IMDB: Actor collaboration network based on acting in the same movie; http://www.cise.ufl.edu/research/sparse/matrices/Pajek/IMDB.html
• cond-mat1: Co-authorship network (Newman, 2001), based on preprints posted to Condensed Matter of Arxiv between 1995 and 1999; obtained from the bipartite preprints/authors network using a one-mode projection; http://toreopsahl.com/datasets/
• cond-mat2: As in cond-mat1, but using Newman's projection method
• Enron: Enron collaboration network from multigraph email network; https://snap.stanford.edu/data/email-Enron.html
• internet: Connectivity of internet routers; http://www.cise.ufl.edu/research/sparse/matrices/Pajek/internet.html
• www: Linked www pages in the nd.edu domain; http://lisgi1.engr.ccny.cuny.edu/~makse/soft_data.html

The sizes of the different datasets are given in Table 2 and range from about a thousand nodes to over a million edges. The adjacency matrices for these networks are plotted in Figure 11 and empirical degree distributions in Figure 14 (red). We ran 3 MCMC chains for 40,000 iterations with the same specifications as above and report the estimate of $\Pr(H_1\mid z)$ and 99% posterior credible intervals of $\sigma$ in Table 2; we additionally provide runtimes. Figure 12 and Figure 13 show MCMC traces and posterior histograms, respectively, for the sparsity parameter $\sigma$ for the different datasets.

Fig 7. MCMC trace plots of parameters (a) $\alpha$, (b) $\sigma$, (c) $\tau$ and (d) $w_*$ for a graph generated from a GGP model with parameters $\alpha=300$, $\sigma=0.5$, $\tau=1$.

Fig 8. 95% posterior intervals of (a) the sociability parameters $w_i$ of the 50 nodes with highest degree and (b) the log-sociability parameters $\log w_i$ of the 50 nodes with lowest degree, for a graph generated from a GGP model with parameters $\alpha=300$, $\sigma=0.5$, $\tau=1$. True values are represented by a green star.

Fig 9. MCMC trace plots of parameters (a) $\varsigma_1=-\frac{\alpha}{\sigma}\tau^\sigma$, (b) $\varsigma_2=-\frac{\sigma}{\tau}$, (c) $\varsigma_3=-\frac{\sigma}{\tau^2}$ and (d) $w_*$ for a graph generated from an Erdős-Rényi model with parameters $n=1000$, $p=0.01$.

Fig 10. 95% posterior intervals of (a) sociability parameters $w_i$ of the 50 nodes with highest degree and (b) log-sociability parameters $\log w_i$ of the 50 nodes with lowest degree, for a graph generated from an Erdős-Rényi model with parameters $n=1000$, $p=0.01$. In this case, all nodes have the same true sociability parameter $\sqrt{-\tfrac12\log(1-p)}$, represented by a green star.

Table 2. Size of real-world datasets and posterior probability of sparsity.

Name          Nb nodes   Nb edges    Time (min)   Pr(H1 | z)   99% CI for σ
facebook107      1,034     26,749        1          0.000      [−1.057, −0.819]
polblogs         1,224     16,715        1          0.000      [−0.348, −0.202]
USairport        1,574     17,215        1          1.000      [0.099, 0.181]
UCirvine         1,899     13,838        1          0.000      [−0.141, −0.017]
yeast            2,284      6,646        1          0.280      [−0.093, 0.054]
USpower          4,941      6,594        1          0.000      [−4.837, −3.185]
IMDB            14,752     38,369        2          0.000      [−0.244, −0.173]
cond-mat1       16,264     47,594        2          0.000      [−0.945, −0.837]
cond-mat2        7,883      8,586        1          0.000      [−0.176, −0.022]
Enron           36,692    183,831        7          1.000      [0.201, 0.221]
internet       124,651    193,620       15          0.000      [−0.201, −0.171]
www            325,729  1,090,108      132          1.000      [0.262, 0.298]

Many of the smaller networks fail to provide evidence of sparsity. These graphs may indeed be dense; for example, our facebook107 dataset represents a small social circle that is likely highly interconnected, and the polblogs dataset represents two tightly connected political parties. Three of the datasets (USairport, Enron, www) are clearly inferred as sparse; note that two of these datasets are among the three largest networks considered, where sparsity is more commonplace. In the remaining large, but inferred-dense network, internet, there is not enough evidence under our test that the network is not dense.
This may be due to the presence of dense subgraphs or spots (e.g., spatially proximate routers may be highly interconnected, but sparsely connected outside the group) (Borgs et al., 2014). This relates to the idea of community structure, though not every node need be associated with a community. As in many sparse network models that assume no dense spots (Bollobás and Riordan, 2009; Wolfe and Olhede, 2013), our approach does not explicitly model such effects. Capturing such structure remains a direction of future research likely feasible within our generative framework, though our current method has the benefit of simplicity, with three hyperparameters tuning the network properties. Finally, we note in Table 2 that our analyses finish in a remarkably short time despite the code base being implemented in Matlab on a standard desktop machine, without leveraging possible opportunities for parallelizing and otherwise scaling some components of the sampler (see Section 6 for a discussion).

To assess our fit to the empirical degree distributions, we use the methods described in Section 4.4 to simulate 5000 graphs from the posterior predictive and compare to the observed graph degrees in Figure 14. In all cases, we see a reasonably good fit. For the largest networks, Figure 14(j)-(l), we see a slight underestimate of the tail of the distribution; that is, we do not capture as many high-degree nodes as are truly present. This may be because these graphs exhibit a power-law behavior, but only after a certain cutoff (Clauset, Shalizi and Newman, 2009), which is not an effect explicitly modeled by our framework. Likewise, this cutoff might be due to the presence of dense spots. In contrast, we capture power-law behavior with possible exponential cutoff in the tail.
We see a similar trend for cond-mat1, but not cond-mat2. Based on the bipartite articles-authors graph, cond-mat1 uses the standard one-mode projection and sets a connection between two authors who have co-authored a paper; this projection clearly creates dense spots in the graph. In contrast, cond-mat2 uses Newman's projection method (Newman, Strogatz and Watts, 2001). This method constructs a weighted undirected graph by counting the number of papers co-authored by two scientists, where each count is normalized by the number of authors on the paper. To construct the undirected graph, we set an edge if the weight is greater than or equal to 1; cond-mat1 and cond-mat2 thus have a different number of edges and nodes, as only nodes with at least one connection are considered. It is interesting to note that the projection method used for the cond-mat dataset has a clear impact on the sparsity of the resulting graph, cond-mat2 being less dense than cond-mat1 (see Figure 14(h)-(i)). The degree distribution for cond-mat1 is similar to that of internet, thus inheriting the same issues previously discussed. Overall, it appears our model better captures homogeneous power-law behavior with possible exponential cutoff in the tails than it does a graph with perhaps structured dense spots or power-law-after-cutoff behavior.

Fig 11. Adjacency matrices for various real-world networks: (a) facebook107, (b) polblogs, (c) USairport, (d) UCirvine, (e) yeast, (f) USpower, (g) IMDB, (h) cond-mat1, (i) cond-mat2, (j) Enron, (k) internet, (l) www.

Fig 12. MCMC trace plot for the parameter $\sigma$ for various real-world networks (same panels (a)-(l) as in Fig 11).

Fig 13. Histograms of MCMC samples of the parameter $\sigma$ for various real-world networks (same panels (a)-(l) as in Fig 11).

Fig 14. Empirical degree distribution (red) and posterior predictive (blue) for various real-world networks (same panels (a)-(l) as in Fig 11).

8. Discussion

There has been extensive work over the past years on flexible Bayesian nonparametric models for networks, allowing complex latent structures of unknown dimension to be uncovered from real-world networks (Kemp et al., 2006; Miller, Griffiths and Jordan, 2009; Lloyd et al., 2012; Palla, Knowles and Ghahramani, 2012; Herlau, Schmidt and Mørup, 2014). However, as mentioned in the unifying overview of Orbanz and Roy (2015), these methods all fit in the Aldous-Hoover framework and as such produce dense graphs.

Norros and Reittu (2006) (see also van der Hofstad (2014) for a review and Britton, Deijfen and Martin-Löf (2006) for a similar model) proposed a conditionally Poissonian multigraph process that shares similarities with our multigraph process. They consider that each node has a given sociability parameter, and the number of edges between two nodes $i$ and $j$ is drawn from a Poisson distribution whose rate is the product of the sociability parameters, normalized by the sum of the sociability parameters of all the nodes.
The normalization makes this model similar to models based on rescaling of the graphon and, as such, does not define a projective model, as explained in Section 1. Another related model is the degree-corrected random graph model (Karrer and Newman, 2011), where edges of the multigraph are drawn from a Poisson distribution whose rate is the product of node-specific sociability parameters and a parameter tuning the interaction between the latent communities to which these nodes belong. When the sociability parameters are assumed to be i.i.d. from some distribution, this model yields an exchangeable matrix and thus a dense graph.

Additionally, there are similarities to be drawn with the extensive literature on latent space modeling (cf. Hoff, Raftery and Handcock, 2002; Penrose, 2003; Hoff, 2009). In such models, nodes are embedded in a low-dimensional, continuous latent space and the probability of an edge is determined by a distance or similarity metric of the node-specific latent factors. In our case, the node position, $\theta_i$, is of no importance in forming edge probabilities. It would, however, be possible to extend our approach to location-dependent connections by considering inhomogeneous CRMs.

Finally, the urn construction described in Section 3.2 highlights a connection with the configuration model (Bollobás, 1980; Newman, 2009), a popular model for generating simple graphs with a given degree sequence. The configuration model proceeds as follows. First, the degree $k_i$ of each node $i=1,\ldots,n$ is specified such that the sum of the $k_i$ is an even number, since every edge consumes two stubs. Each node $i$ is given a total of $k_i$ stubs, or demi-edges. Then, we repeatedly choose pairs of stubs uniformly at random, without replacement, and connect the selected pairs to form an edge.
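The stub-matching procedure just described can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name `configuration_multigraph` is hypothetical, and pairing consecutive entries of a uniformly shuffled stub list is used as an equivalent way of drawing pairs of stubs uniformly at random without replacement. Because stubs are consumed two at a time, the degrees must sum to an even number. The output is a multigraph edge list that may contain multiple edges and self-loops.

```python
import random


def configuration_multigraph(degrees, rng=random):
    """Pair stubs uniformly at random; returns a list of (u, v) edges.

    Hypothetical helper for illustration. Node i contributes degrees[i]
    stubs; the result may contain self-loops and repeated edges.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the degrees must sum to an even number")
    # One list entry per stub, labelled by the node that owns it.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    # A uniform shuffle followed by pairing neighbours gives a uniform
    # random perfect matching of the stubs.
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]
```

By construction, each node appears in the edge list exactly as many times as its prescribed degree, with a self-loop counting twice.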
The simple graph is obtained either by discarding the multiple edges and self-loops (an erased configuration model), or by repeating the above sampling until a simple graph is obtained.

The connections to this past work nicely place our proposed Bayesian nonparametric network model within the context of existing literature. Importantly, however, to the best of our knowledge this work represents the first fully generative and projective approach to sparse graph modeling, with a notion of exchangeability that is essential for devising our scalable statistical estimation procedure. For this, we devised a sampler that readily scales to large, real-world networks. The foundational modeling tools and theoretical results presented herein represent an important building block for future developments, including incorporating notions of community structure, node attributes, etc.

Acknowledgements. The authors thank Bernard Bercu for help in deriving the proof of Theorem 16, and Arnaud Doucet, Yee Whye Teh, Stefano Favaro and Dan Roy for helpful discussions and feedback on earlier versions of this paper.

Appendix A: Proofs of results on the sparsity

A.1. Probability asymptotics notation

We first describe the asymptotic notation used in the remainder of this section, which follows the notation of Janson (2011). All unspecified limits are as $\alpha\to\infty$. Let $(X_\alpha)_{\alpha\ge0}$ and $(Y_\alpha)_{\alpha\ge0}$ be two $[0,\infty)$-valued stochastic processes defined on the same probability space and such that $\lim_{\alpha\to\infty}X_\alpha=\lim_{\alpha\to\infty}Y_\alpha=\infty$ a.s. We have
\[
\begin{aligned}
X_\alpha=O(Y_\alpha)\ \text{a.s.} &\iff \limsup_{\alpha\to\infty}\frac{X_\alpha}{Y_\alpha}<\infty\ \text{a.s.}\\
X_\alpha=o(Y_\alpha)\ \text{a.s.} &\iff \lim_{\alpha\to\infty}\frac{X_\alpha}{Y_\alpha}=0\ \text{a.s.}\\
X_\alpha=\Omega(Y_\alpha)\ \text{a.s.} &\iff Y_\alpha=O(X_\alpha)\ \text{a.s.}\\
X_\alpha=\omega(Y_\alpha)\ \text{a.s.} &\iff Y_\alpha=o(X_\alpha)\ \text{a.s.}\\
X_\alpha=\Theta(Y_\alpha)\ \text{a.s.} &\iff X_\alpha=O(Y_\alpha)\ \text{and}\ X_\alpha=\Omega(Y_\alpha)\ \text{a.s.}
\end{aligned}
\]
The relations have the following interpretation:

$X_\alpha=O(Y_\alpha)$: "$X_\alpha$ does not grow at a faster rate than $Y_\alpha$" [$\le$]
$X_\alpha=o(Y_\alpha)$: "$X_\alpha$ grows at a (strictly) slower rate than $Y_\alpha$" [$<$]
$X_\alpha=\Omega(Y_\alpha)$: "$X_\alpha$ does not grow at a slower rate than $Y_\alpha$" [$\ge$]
$X_\alpha=\omega(Y_\alpha)$: "$X_\alpha$ grows at a (strictly) faster rate than $Y_\alpha$" [$>$]
$X_\alpha=\Theta(Y_\alpha)$: "$X_\alpha$ and $Y_\alpha$ grow at the same rate" [$=$]

A.2. Proof of Theorems 5, 6 and 7 in the finite-activity case

We first consider the case of a finite-activity CRM. Let $T=\int_0^\infty\rho(w)\,dw<\infty$ and $H(t)=\frac1T\int_0^t\rho(w)\,dw$. The point process $Z$ can be equivalently defined as follows. Let $\Pi=\{\theta_1,\theta_2,\ldots\}$ be a homogeneous Poisson process of rate $T$. For each $1\le i\le j$, sample
\[
z_{ij}\mid U_i,U_j\sim\mathrm{Ber}(W(U_i,U_j))
\tag{56}
\]
where $U_1,U_2,\ldots$ are uniform random variables and
\[
W(u,v)=
\begin{cases}
1-\exp\left(-2H^{-1}(u)H^{-1}(v)\right) & u\neq v\\
1-\exp\left(-H^{-1}(u)^2\right) & u=v.
\end{cases}
\]
Let $J_\alpha=|\Pi\cap[0,\alpha]|$ be the number of points of $\Pi$ in $[0,\alpha]$. As (i) $J_\alpha\to\infty$ almost surely as $\alpha\to\infty$ and (ii) $\int_0^1\int_0^1W(u,v)\,du\,dv<\infty$ and $\int_0^1\sqrt{W(u,u)}\,du<\infty$, the law of large numbers for V-statistics yields (cf. Theorem 15)
\[
\frac{2}{J_\alpha(J_\alpha+1)}\sum_{1\le i\le j\le J_\alpha}W(U_i,U_j)\to\int_0^1\int_0^1W(u,v)\,du\,dv
\tag{57}
\]
almost surely as $\alpha\to\infty$. Additionally, applying Theorem 16 to Equation (56) gives
\[
\frac{N^{(e)}_\alpha}{\sum_{1\le i\le j\le J_\alpha}W(U_i,U_j)}\to1\quad\text{a.s.},
\]
which, combined with Equation (57), yields $N^{(e)}_\alpha/J_\alpha^2=\Theta(1)$ almost surely. As $N_\alpha/J_\alpha=\Theta(1)$ almost surely, we determine that
\[
N^{(e)}_\alpha=\Theta(N_\alpha^2)\ \text{a.s.},\qquad N_\alpha=\Theta(\alpha)\ \text{a.s.},\qquad N^{(e)}_\alpha=\Theta(\alpha^2)\ \text{a.s.}
\]

A.3. Proof of Theorem 6 in the infinite-activity case

Consider now the infinite-activity case. Assume $\psi'(0)=\mathbb{E}[W^*_1]<\infty$.
Let e Z ij = 1 if Z ([ i − 1 , i ] , [ j − 1 , j ]) > 0 0 otherwise (58) then, for an y k ∈ N , X 1 ≤ i 0 Clearly , for all α ≥ 0 e N α ≤ N α (65) W e ha ve, for θ i ∈ S 1 α Pr D { θ i } × S 2 α > 0 W = 1 − exp − W ( { θ i } ) × W S 2 α Note the key fact that W ( { θ i } ) is indep endent of W S 2 α as θ i / ∈ S 2 α . Applying Campb ell’s theorem, w e hav e E h e N α | W S 2 α i = λ ( S 1 α ) × ψ ( W S 2 α ) where ψ ( t ) = R ∞ 0 (1 − exp( − w t )) ρ ( w ) dw is the Laplace exp onen t. And so, by complete randomness of the CRM ov er S 1 n , e N α | W S 2 α ∼ P oisson λ ( S 1 α ) × ψ ( W S 2 α ) (66) W e hav e λ ( S 1 α ) = Θ( α ) and λ ( S 2 α ) = Θ( α ). Moreo v er, as we are in the infinite-activit y case R ∞ 0 ρ ( w ) dw = ∞ , Lemma 18 implies that lim t →∞ ψ ( t ) = ∞ . (67) As W S 2 α → ∞ almost surely , we therefore hav e ψ ( W S 2 α ) → ∞ almost surely . Th us, ψ ( W S 2 α ) = ω (1) a.s. (68) and λ ( S 1 α ) × ψ ( W S 2 α ) = ω ( α ) a.s. (69) Com bining ( 69 ) with Theorem 17 and ( 65 ) yields e N α = ω ( α ) a.s. (70) N α = ω ( α ) a.s. (71) F. Caron and E. F ox/Bayesian nonp ar ametric r andom gr aphs 44 Consider now the case where ρ ( x ) x ↓ 0 ∼ ` (1 /x ) x − σ where ` ( t ) is a slowly v ary- ing function, i.e. a function v erifying ` ( ct ) ` ( t ) → 1 for any c > 0, and such that lim t →∞ ` ( t ) > 0 . Then Lemma 19 implies that ψ ( t ) = Ω( t σ ) as t → ∞ and thus λ ( S 1 α ) × ψ ( W S 2 α ) = Ω( α σ +1 ) a.s. whic h implies that N α = Ω( α σ +1 ) a.s. App endix B: T ec hnical lemmas Theorem 14 (Graphs constructed from exc hangeable arrays are dense) L et ( X ij ) i,j ∈ N , b e an infinitely exchange able binary symmetric arr ay. L et N n = P 1 ≤ i 0 almost sur ely, then N n = Θ( n 2 ) almost sur ely and in L 1 (72) Pr o of. F r om the Aldous-Ho over the or em, ther e is a r andom function W : [0 , 1] → [0 , 1] such that X ij | W , U i , U j ∼ Ber( W ( U i , U j )) (73) wher e ( U i ) i ∈ N ar e uniform r andom variables. 
Given W , the law of lar ge numb ers for U statistics (se e The or em 15 ) yields 2 n ( n − 1) X 1 ≤ i 0 almost sur ely, then W = R 1 0 R 1 0 W ( u, v ) dudv > 0 almost sur ely, thus X 1 ≤ i 0 and lim k →∞ f k ( w ) = ρ ( w ) Th us, by Leb esgue’s monotone con v ergence theorem lim k →∞ ψ ( k ) = Z ∞ 0 ρ ( w ) dw = ∞ as ψ ( t ) ≥ ψ ( b t c ), lim t →∞ ψ ( t ) = ∞ Lemma 19 (Relating tail L´ evy in tensity and Laplace exp onent) ( Gne din, Hansen and Pitman , 2007 , Pr op osition 17) L et ρ ( w ) b e the L´ evy intensity ρ ( x ) = R ∞ x ρ ( w ) dw b e the tail L´ evy intensity, and ψ ( t ) = R ∞ 0 (1 − exp( − w t )) ρ ( w ) dw its L aplac e exp onent . The fol lowing c onditions ar e e quivalent: ρ ( x ) x ↓ 0 ∼ ` (1 /x ) x − σ (79) ψ ( t ) t ↑∞ ∼ Γ(1 − σ ) t σ ` ( t ) (80) wher e 0 < σ < 1 and ` is a function slow ly varying at ∞ i.e. satisfying ` ( cy ) /` ( y ) → 1 as y → ∞ , for every c > 0 . Pro of. Applying in tegration by part, we ha v e ψ ( t ) = Z ∞ 0 (1 − exp( − wt )) ρ ( w ) dw = t Z ∞ 0 exp( − w t ) ρ ( w ) dw F. Caron and E. F ox/Bayesian nonp ar ametric r andom gr aphs 49 As ρ ( x ) is positive monotonic, application of Proposition 20 yields the following equiv alence ρ ( x ) x ↓ 0 ∼ ` (1 /x ) x − σ ψ ( t ) t t ↑∞ ∼ Γ(1 − σ ) t σ − 1 ` ( t ) Prop osition 20 (T aub erian theorem) ( F el ler , 1971 , Chapter XIII, Se ction 5, The or em 4 p. 446) L et U ( dw ) b e a me asur e on (0 , ∞ ) with ultimately monon- tone density u , i.e. monotone in some interval ( x 0 , ∞ ) . Assume that L ( t ) = Z ∞ 0 e − tw u ( w ) dw exists for t > 0 . 
If $\ell$ is slowly varying at infinity and $0\le a<\infty$, then the two relations are equivalent:
\[
L(\tau)\overset{\tau\downarrow0}{\sim}\tau^{-a}\ell(1/\tau)
\tag{81}
\]
\[
u(x)\overset{x\uparrow\infty}{\sim}\frac{1}{\Gamma(a)}x^{a-1}\ell(x)
\tag{82}
\]
Additionally (Feller, 1971, Chapter XIII, Section 5, Theorem 3), the result remains valid if we interchange the roles of infinity and 0, that is, of $x\to\infty$ and $\tau\to0$:
\[
L(\tau)\overset{\tau\uparrow\infty}{\sim}\tau^{-a}\ell(\tau)
\tag{83}
\]
\[
u(x)\overset{x\downarrow0}{\sim}\frac{1}{\Gamma(a)}x^{a-1}\ell(1/x)
\tag{84}
\]

Proposition 21 (Chebyshev-type inequality). Let $X$ be a random variable with $\mathbb{E}[X^2]<\infty$ and $\theta\in(0,1)$. Then
\[
P(X\ge\theta\,\mathbb{E}[X])\ge1-\frac{\mathrm{Var}(X)}{(1-\theta^2)\,\mathbb{E}[X]^2}
\tag{85}
\]

Appendix C: Proofs of results on the properties of the GGP graph

C.1. Proof of Theorem 8

The hierarchical model for the number of nodes $N_\alpha$ in the gamma process case is
\[
N_\alpha=\sum_{i=1}^{2D^*_\alpha}Y_i
\tag{86}
\]
\[
Y_i\overset{\text{ind}}{\sim}\mathrm{Ber}\left(\frac{\alpha}{\alpha+i-1}\right)
\tag{87}
\]
\[
D^*_\alpha\mid W^*_\alpha\sim\mathrm{Poisson}\left(W^{*2}_\alpha\right)
\tag{88}
\]
\[
W^*_\alpha\sim\mathrm{Gamma}(\alpha,\tau)
\tag{89}
\]
where $D^*_\alpha$ is the total number of directed edges in the directed graph, and $W^*_\alpha$ is the total mass. We have
\[
\mathbb{E}[D^*_\alpha]=\mathbb{E}[W^{*2}_\alpha]=\frac{\alpha(\alpha+1)}{\tau^2},\qquad
\mathrm{Var}(D^*_\alpha)=\frac{\alpha(\alpha+1)}{\tau^2}\left(1+\frac{4\alpha+6}{\tau^2}\right).
\]
From Equations (86) and (87), we have
\[
\mathbb{E}[N_\alpha\mid D^*_\alpha]=\alpha\sum_{i=1}^{2D^*_\alpha}\frac{1}{\alpha+i-1}.
\]
As the function $f:x\mapsto\frac{1}{\alpha+x}$ is decreasing on $[0,n]$, $n>0$, we have
\[
\sum_{i=2}^{n+1}\frac{1}{\alpha+i-1}\le\int_0^nf(x)\,dx\le\sum_{i=1}^n\frac{1}{\alpha+i-1},
\]
\[
\left(\sum_{i=1}^n\frac{1}{\alpha+i-1}\right)+\frac{1}{\alpha+n}-\frac{1}{\alpha}\le\log\left(1+\frac{n}{\alpha}\right)\le\sum_{i=1}^n\frac{1}{\alpha+i-1},
\]
hence
\[
\alpha\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)\le\alpha\sum_{i=1}^{2D^*_\alpha}\frac{1}{\alpha+i-1}\le\alpha\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)+1-\frac{\alpha}{\alpha+2D^*_\alpha}
\]
and so
\[
\alpha\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)\le\mathbb{E}[N_\alpha\mid D^*_\alpha]\le\alpha\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)+1-\frac{\alpha}{\alpha+2D^*_\alpha}.
\tag{90}
\]
Let us work on the upper bound of Eq. (90). The function $x\mapsto\alpha\log\left(1+\frac{2x}{\alpha}\right)+1-\frac{\alpha}{\alpha+2x}$ is concave, so by Jensen's inequality
\[
\begin{aligned}
\mathbb{E}\left[\alpha\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)+1-\frac{\alpha}{\alpha+2D^*_\alpha}\right]
&\le\alpha\log\left(1+\frac{2\,\mathbb{E}[D^*_\alpha]}{\alpha}\right)+1-\frac{\alpha}{\alpha+2\,\mathbb{E}[D^*_\alpha]}\\
&=\alpha\log\left(1+\frac{2(\alpha+1)}{\tau^2}\right)+1-\frac{\tau^2}{\tau^2+2(\alpha+1)}.
\end{aligned}
\]
Now let us work on the lower bound of Eq. (90). For $\theta\ge0$, Markov's inequality gives
\[
\Pr\left(\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)\ge\theta\right)\le\frac{\mathbb{E}\left[\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)\right]}{\theta}.
\]
Taking $\theta=\log\left(1+\varepsilon\frac{2(\alpha+1)}{\tau^2}\right)$, with $\varepsilon\in(0,1)$, we obtain
\[
\log\left(1+\varepsilon\frac{2(\alpha+1)}{\tau^2}\right)\Pr\left(\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)\ge\log\left(1+\varepsilon\frac{2(\alpha+1)}{\tau^2}\right)\right)\le\mathbb{E}\left[\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)\right],
\]
hence
\[
\log\left(1+\varepsilon\frac{2(\alpha+1)}{\tau^2}\right)\Pr\left(D^*_\alpha\ge\varepsilon\frac{\alpha(\alpha+1)}{\tau^2}\right)\le\mathbb{E}\left[\log\left(1+\frac{2D^*_\alpha}{\alpha}\right)\right].
\tag{91}
\]
Using the Chebyshev-type inequality (85), we obtain
\[
\Pr\left(D^*_\alpha\ge\varepsilon\frac{\alpha(\alpha+1)}{\tau^2}\right)\ge1-\frac{\mathrm{Var}(D^*_\alpha)}{(1-\varepsilon^2)\,\mathbb{E}[D^*_\alpha]^2}.
\tag{92}
\]
Let $c_1(\alpha)=\frac{\mathrm{Var}(D^*_\alpha)}{\mathbb{E}[D^*_\alpha]^2}=\frac{\tau^2}{\alpha(\alpha+1)}\left(1+\frac{4\alpha+6}{\tau^2}\right)$, which is a decreasing function of $\alpha$. Combining Inequalities (91) and (92) with (90), we have the following inequalities, for any $\varepsilon\in(0,1)$:
\[
\alpha\log\left(1+\varepsilon\frac{2(\alpha+1)}{\tau^2}\right)\left(1-\frac{c_1(\alpha)}{1-\varepsilon^2}\right)\le\mathbb{E}[N_\alpha]\le\alpha\log\left(1+\frac{2(\alpha+1)}{\tau^2}\right)+\frac{2(\alpha+1)}{\tau^2+2(\alpha+1)}
\]
where $c_1(\alpha)\to0$ as $\alpha\to\infty$, and so
\[
\mathbb{E}[N_\alpha]=\Theta(\alpha\log\alpha),\qquad\alpha\to\infty.
\]

C.2. Proof of Theorem 9

Consider the conditionally Poisson construction
\[
D^*_\alpha\mid W^*_\alpha\sim\mathrm{Poisson}(W^{*2}_\alpha),\qquad
(U'_1,\ldots,U'_{2D^*_\alpha})\mid D^*_\alpha,W_\alpha\sim\frac{W_\alpha}{W^*_\alpha}.
\]
The number of $U'_j$ in any interval $[a,b]$, $a<b\le\alpha$, is distributed from a Poisson distribution with rate $2W([a,b])W([0,\alpha])$ and therefore goes to infinity as $\alpha$ goes to infinity. We can therefore invoke asymptotic results on i.i.d. sampling from a normalized generalized gamma process, $W_\alpha/W^*_\alpha$.

Let $N_{\alpha,j}$ be the number of clusters of size $j$ in $(U'_1,\ldots,U'_{2D^*_\alpha})$. In the directed graph model, $N_{\alpha,j}$ corresponds to the number of nodes with $j$ incoming/outgoing edges (self-edges count twice for a given node).
As the $U'_j$ are drawn from a normalized generalized gamma process of parameters $(\alpha,\sigma,\tau)$, we have the following asymptotic result (Pitman, 2006; Lijoi, Mena and Prünster, 2007, Corollary 1):
\[
\frac{N_{\alpha,j}}{N_\alpha}\xrightarrow[\alpha\to\infty]{}p_{\sigma,j}=\frac{\sigma\,\Gamma(j-\sigma)}{\Gamma(1-\sigma)\,\Gamma(j+1)}
\]
almost surely, for $j=1,2,\ldots$.

Appendix D: Proofs of results on posterior characterization

D.1. Proof of Theorem 12

We first state a general Palm formula for Poisson random measures. This result is used by various authors in similar forms for characterization of conditionals in Bayesian nonparametric models (Prünster, 2002; James, 2002, 2005; James, Lijoi and Prünster, 2009; Caron, 2012; Caron, Teh and Murphy, 2014; Zhou, Madrid-Padilla and Scott, 2014; James, 2014).

Theorem 22. Let $\Pi$ denote a Poisson random measure on a Polish space $S$ with non-atomic mean measure $\nu$. Let $\mathcal{M}$ be the space of boundedly finite measures on $S$, with sigma-field $\mathcal{B}(\mathcal{M})$. Let $f_i$, $i=1,\ldots,K$, be functions from $S$ to $\mathbb{R}_+$ such that $f_i(s)f_j(s)=0$ for all $i\neq j$. Let $s_{1:K}=(s_1,\ldots,s_K)\in S^K$ and $G$ be a measurable function on $S^K\times\mathcal{M}$. Then we have the following generalized Palm formula:
\[
\mathbb{E}_\Pi\left[\int_{S^K}G(s_{1:K},\Pi)\prod_{i=1}^Kf_i(s_i)\,\Pi(ds_i)\right]
=\int_{S^K}\mathbb{E}_\Pi\left[G\left(s_{1:K},\Pi+\sum_{i=1}^K\delta_{s_i}\right)\right]\prod_{i=1}^Kf_i(s_i)\,\nu(ds_i)
\tag{93}
\]

Proof. The proof is obtained by induction from the classical Palm formula (Bertoin, 2006; Daley and Vere-Jones, 2008)
\[
\mathbb{E}_\Pi\left[\int_Sf(s)\,G(s,\Pi)\,\Pi(ds)\right]=\int_S\mathbb{E}_\Pi\left[G(s,\Pi+\delta_s)\right]f(s)\,\nu(ds).
\tag{94}
\]
Let $G_1(s_1,\Pi)=\int_{S^{K-1}}G(s_{1:K},\Pi)\prod_{i=2}^Kf_i(s_i)\,\Pi(ds_i)$.
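Then

$$
\begin{aligned}
\mathbb{E}_\Pi\left[\int_{S^K} G(s_{1:K}, \Pi) \prod_{i=1}^{K} f_i(s_i)\,\Pi(ds_i)\right]
&= \int_S \mathbb{E}_\Pi\left[G_1(s_1, \Pi + \delta_{s_1})\right] f_1(s_1)\,\nu(ds_1) \\
&= \int_S \mathbb{E}_\Pi\left[\int_{S^{K-1}} G(s_{1:K}, \Pi + \delta_{s_1}) \prod_{i=2}^{K} f_i(s_i)\left[\Pi(ds_i) + \delta_{s_1}(ds_i)\right]\right] f_1(s_1)\,\nu(ds_1) \\
&= \int_S \mathbb{E}_\Pi\left[\int_{S^{K-1}} G(s_{1:K}, \Pi + \delta_{s_1}) \prod_{i=2}^{K} f_i(s_i)\,\Pi(ds_i)\right] f_1(s_1)\,\nu(ds_1),
\end{aligned}
$$

as $f_1(s_1) f_i(s_1) = 0$ for all $i = 2, \ldots, K$. Applying the same strategy recursively gives (93).

We now prove Theorem 12. The conditional Laplace functional of $W_\alpha$ given $D_\alpha$ is $\mathbb{E}\left[e^{-W_\alpha(f)} \mid D_\alpha\right]$, for any nonnegative measurable function $f$ such that $W_\alpha(f) = \sum_i w_i f(\vartheta_i)\,\mathbf{1}_{\vartheta_i \in [0,\alpha]} < \infty$. We have $W_\alpha(f) = \Pi(\tilde f)$, where $\Pi = \sum_{i=1}^{\infty} \delta_{(w_i, \vartheta_i)}$ is a Poisson random measure on $S = (0, +\infty) \times [0, \alpha]$ with mean measure $\nu$, and $\tilde f(w, \vartheta) = w f(\vartheta)$. The Laplace functional can thus be expressed in terms of the Poisson random measure $\Pi$:

$$
\mathbb{E}\left[e^{-W_\alpha(f)} \mid D_\alpha\right] = \mathbb{E}_\Pi\left[e^{-\Pi(\tilde f)} \mid D_\alpha\right]
= \frac{\mathbb{E}_\Pi\left[\int_{S^{N_\alpha}} e^{-\Pi(\tilde f)} \exp\left(-\Pi(h)^2\right) \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\Pi(dw_i, d\vartheta_i)\right]}{\mathbb{E}_\Pi\left[\int_{S^{N_\alpha}} \exp\left(-\Pi(h)^2\right) \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\Pi(dw_i, d\vartheta_i)\right]} \tag{95}
$$

where $g_i(w, \vartheta) = w^{m_i}\,\mathbf{1}_{d\theta_i}(\vartheta)$ and $h(w, \vartheta) = w$, hence $\Pi(h) = \sum_{i=1}^{\infty} w_i = W_\alpha(1)$. Applying Theorem 22 to the numerator yields

$$
\begin{aligned}
&\mathbb{E}_\Pi\left[\int_{S^{N_\alpha}} e^{-\Pi(\tilde f)}\,e^{-\Pi(h)^2} \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\Pi(dw_i, d\vartheta_i)\right] \\
&\quad= \int_{S^{N_\alpha}} \mathbb{E}_\Pi\left[e^{-\Pi(\tilde f) - \sum_{i=1}^{N_\alpha} \tilde f(w_i, \vartheta_i)}\,e^{-\left(\Pi(h) + \sum_{i=1}^{N_\alpha} w_i\right)^2}\right] \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\nu(dw_i, d\vartheta_i) \\
&\quad= \int_{S^{N_\alpha}} \mathbb{E}_{W_\alpha}\left[e^{-W_\alpha(f) - \sum_{i=1}^{N_\alpha} w_i f(\vartheta_i)}\,e^{-\left(W_\alpha(1) + \sum_{i=1}^{N_\alpha} w_i\right)^2}\right] \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\nu(dw_i, d\vartheta_i) \\
&\quad= \int_{S^{N_\alpha}} \mathbb{E}_{W_\alpha(1)}\left[\mathbb{E}_{W_\alpha}\left[e^{-W_\alpha(f)} \mid W_\alpha(1)\right] e^{-\sum_{i=1}^{N_\alpha} w_i f(\vartheta_i)}\,e^{-\left(W_\alpha(1) + \sum_{i=1}^{N_\alpha} w_i\right)^2}\right] \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\nu(dw_i, d\vartheta_i).
\end{aligned}
$$

The denominator in (95) is obtained by taking $f = 0$.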
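Then, after simplification, we obtain

$$
\mathbb{E}\left[e^{-W_\alpha(f)} \mid D_\alpha\right] = \int_{\mathbb{R}_+^{N_\alpha + 1}} \mathbb{E}_{W_\alpha}\left[e^{-W_\alpha(f)} \mid W_\alpha(1) = w_*\right] e^{-\sum_{i=1}^{N_\alpha} w_i f(\theta_i)}\,p(w_1, \ldots, w_{N_\alpha}, w_* \mid D_\alpha)\,dw_{1:N_\alpha}\,dw_*
$$

where

$$
p(w_1, \ldots, w_{N_\alpha}, w_* \mid D_\alpha) = \frac{\left[\prod_{i=1}^{N_\alpha} w_i^{m_i}\,\rho(w_i)\right] e^{-\left(w_* + \sum_{i=1}^{N_\alpha} w_i\right)^2} g^*_\alpha(w_*)}{\int_{\mathbb{R}_+^{N_\alpha + 1}} \left[\prod_{i=1}^{N_\alpha} \tilde w_i^{m_i}\,\rho(\tilde w_i)\right] e^{-\left(\tilde w_* + \sum_{i=1}^{N_\alpha} \tilde w_i\right)^2} g^*_\alpha(\tilde w_*)\,d\tilde w_{1:N_\alpha}\,d\tilde w_*}. \tag{96}
$$

D.2. Proof of Theorem 13

The proof follows the same lines as in (Caron, 2012) and is included for completeness. The Laplace functional is expressed as

$$
\mathbb{E}\left[e^{-W_\alpha(f)} \mid D_\alpha, W'_\alpha\right] = \mathbb{E}_\Pi\left[e^{-\Pi(\tilde f)} \mid D_\alpha, W'_\alpha\right]
= \frac{\mathbb{E}_\Pi\left[\int_{S^{N_\alpha}} e^{-\Pi(\tilde f)}\,e^{-\Pi(h)\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)} \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\Pi(dw_i, d\vartheta_i)\right]}{\mathbb{E}_\Pi\left[\int_{S^{N_\alpha}} e^{-\Pi(h)\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)} \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\Pi(dw_i, d\vartheta_i)\right]} \tag{97}
$$

where $g_i(w, \vartheta) = w^{m_i}\,\mathbf{1}_{d\theta_i}(\vartheta)$ and $h(w, \vartheta) = w$, hence $\Pi(h) = \sum_{i=1}^{\infty} w_i = W_\alpha(1)$.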
Applying Theorem 22 to the numerator yields

$$
\begin{aligned}
&\mathbb{E}_\Pi\left[\int_{S^{N_\alpha}} e^{-\Pi(\tilde f)}\,e^{-\Pi(h)\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)} \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\Pi(dw_i, d\vartheta_i)\right] \\
&\quad= \int_{S^{N_\alpha}} \mathbb{E}_\Pi\left[e^{-\Pi(\tilde f) - \sum_{i=1}^{N_\alpha} \tilde f(w_i, \vartheta_i)}\,e^{-\left(\Pi(h) + \sum_{i=1}^{N_\alpha} w_i\right)\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}\right] \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,\nu(dw_i, d\vartheta_i) \\
&\quad= \int_{S^{N_\alpha}} \mathbb{E}_\Pi\left[e^{-\Pi(\tilde f) - \sum_{i=1}^{N_\alpha} \tilde f(w_i, \vartheta_i)}\,e^{-\Pi(h)\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}\right] \prod_{i=1}^{N_\alpha} g_i(w_i, \vartheta_i)\,e^{-w_i\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}\,\nu(dw_i, d\vartheta_i) \\
&\quad= \mathbb{E}_{W_\alpha}\left[e^{-W_\alpha(f)}\,e^{-W_\alpha(1)\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}\right] \prod_{i=1}^{N_\alpha} \int_S \left[e^{-w_i f(\vartheta_i)}\,w_i^{m_i}\,\mathbf{1}_{d\theta_i}(\vartheta_i)\,e^{-w_i\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}\,\nu(dw_i, d\vartheta_i)\right].
\end{aligned}
$$

The denominator in (97) is obtained by taking $f = 0$:

$$
\mathbb{E}_{W_\alpha}\left[e^{-W_\alpha(1)\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}\right] \prod_{i=1}^{N_\alpha} \int_S w_i^{m_i}\,\mathbf{1}_{d\theta_i}(\vartheta_i)\,e^{-w_i\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}\,\nu(dw_i, d\vartheta_i)
= e^{-\alpha\,\psi\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}\,\alpha^{N_\alpha} \prod_{i=1}^{N_\alpha} \kappa\left(m_i, \sum_{j=1}^{N'_\alpha} w'_j + w'_*\right) d\theta_i \tag{98}
$$

where

$$
\kappa(n, z) = \int_0^\infty w^n \exp(-z w)\,\rho(w)\,dw.
$$

Appendix E: Details on the MCMC algorithms

E.1. Simple graph

The undirected graph sampler outlined in Section 6.1 iterates as follows:

1. Update $w_{1:N_\alpha}$ given the rest with Hamiltonian Monte Carlo.
2. Update $(\alpha, \sigma, \tau, w_*)$ given the rest using a Metropolis-Hastings step.
3. Update the latent counts $n_{ij}$ given the rest using either the full conditional or a Metropolis-Hastings step.

Step 1: Update of $w_{1:N_\alpha}$

We use a Hamiltonian Monte Carlo update for $w_{1:N_\alpha}$ via an augmented system with momentum variables $p$. See (Neal, 2011) for an overview. Let $L \ge 1$ be the number of leapfrog steps and $\varepsilon > 0$ the stepsize. For conciseness, we write

$$
U'(w_{1:N_\alpha}, w_*, \phi) = \nabla_{\omega_{1:N_\alpha}} \log p(\omega_{1:N_\alpha}, w_*, \phi \mid D_\alpha)\Big|_{w_{1:N_\alpha},\,w_*,\,\phi}
$$

for the gradient of the log-posterior in (49).
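The algorithm proceeds by first sampling momentum variables as

$$
p \sim \mathcal{N}(0, I_{N_\alpha}). \tag{99}
$$

The Hamiltonian proposal $q(\tilde w_{1:N_\alpha}, \tilde p \mid w_{1:N_\alpha}, p)$ is obtained by the following leapfrog algorithm (for simplicity of exposition, we omit indices $1:N_\alpha$). Simulate $L$ steps of the discretized Hamiltonian via

$$
\tilde p^{(0)} = p + \frac{\varepsilon}{2}\,U'(w, w_*, \phi), \qquad \tilde w^{(0)} = w,
$$

and for $\ell = 1, \ldots, L-1$,

$$
\log \tilde w^{(\ell)} = \log \tilde w^{(\ell-1)} + \varepsilon\,\tilde p^{(\ell-1)}, \qquad
\tilde p^{(\ell)} = \tilde p^{(\ell-1)} + \varepsilon\,U'(\tilde w^{(\ell)}, w_*, \phi),
$$

and finally set

$$
\log \tilde w = \log \tilde w^{(L-1)} + \varepsilon\,\tilde p^{(L-1)}, \qquad
\tilde p = -\left[\tilde p^{(L-1)} + \frac{\varepsilon}{2}\,U'(\tilde w, w_*, \phi)\right], \qquad
\tilde w = \tilde w^{(L)}.
$$

Accept the proposal $(\tilde w, \tilde p)$ with probability $\min(1, r)$ with

$$
\begin{aligned}
r &= \frac{\left[\prod_{i=1}^{N_\alpha} \tilde w_i^{m_i}\right] \exp\left(-\left(\sum_{i=1}^{N_\alpha} \tilde w_i + w_*\right)^2\right) \prod_{i=1}^{N_\alpha} \tilde w_i\,\rho(\tilde w_i)}{\left[\prod_{i=1}^{N_\alpha} w_i^{m_i}\right] \exp\left(-\left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)^2\right) \prod_{i=1}^{N_\alpha} w_i\,\rho(w_i)}\;e^{-\frac{1}{2} \sum_{i=1}^{N_\alpha} \left(\tilde p_i^2 - p_i^2\right)} \\
&= \left[\prod_{i=1}^{N_\alpha} \left(\frac{\tilde w_i}{w_i}\right)^{m_i - \sigma}\right] e^{-\left(\sum_{i=1}^{N_\alpha} \tilde w_i + w_*\right)^2 + \left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)^2 - \tau\left(\sum_{i=1}^{N_\alpha} \tilde w_i - \sum_{i=1}^{N_\alpha} w_i\right)} \times e^{-\frac{1}{2} \sum_{i=1}^{N_\alpha} \left(\tilde p_i^2 - p_i^2\right)}.
\end{aligned}
$$

Step 2: Update of $w_*, \alpha, \sigma, \tau$

For our Metropolis-Hastings step, we propose $(\tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_*)$ from $q(\tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_* \mid \alpha, \sigma, \tau, w_*)$ and accept with probability $\min(1, r)$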
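where

$$
r = \frac{e^{-\left(\sum_{i=1}^{N_\alpha} w_i + \tilde w_*\right)^2}}{e^{-\left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)^2}} \left[\prod_{i=1}^{N_\alpha} \frac{\rho(w_i \mid \tilde\sigma, \tilde\tau)}{\rho(w_i \mid \sigma, \tau)}\right] \times \frac{g^*_{\tilde\alpha, \tilde\sigma, \tilde\tau}(\tilde w_*)}{g^*_{\alpha, \sigma, \tau}(w_*)} \times \frac{p(\tilde\alpha, \tilde\sigma, \tilde\tau)}{p(\alpha, \sigma, \tau)} \times \frac{q(\alpha, \sigma, \tau, w_* \mid \tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_*)}{q(\tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_* \mid \alpha, \sigma, \tau, w_*)}. \tag{100}
$$

We will use the following proposal:

$$
q(\tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_* \mid \alpha, \sigma, \tau, w_*) = q(\tilde\tau \mid \tau)\,q(\tilde\sigma \mid \sigma)\,q(\tilde\alpha \mid \tilde\sigma, \tilde\tau, w_*)\,q(\tilde w_* \mid \tilde\alpha, \tilde\sigma, \tilde\tau, w_*)
$$

where

$$
\begin{aligned}
q(\tilde\tau \mid \tau) &= \mathrm{lognormal}\left(\tilde\tau;\ \log(\tau),\ \sigma_\tau^2\right) \\
q(\tilde\sigma \mid \sigma) &= \mathrm{lognormal}\left(1 - \tilde\sigma;\ \log(1 - \sigma),\ \sigma_\tau^2\right) \\
q(\tilde\alpha \mid \tilde\sigma, \tilde\tau, w_*) &= \mathrm{Gamma}\left(\tilde\alpha;\ N_\alpha,\ \frac{\left(\tilde\tau + 2\sum_i w_i + w_*\right)^{\tilde\sigma} - \tilde\tau^{\tilde\sigma}}{\tilde\sigma}\right) \\
q(\tilde w_* \mid \tilde\alpha, \tilde\sigma, \tilde\tau, w_*) &= g^*_{\tilde\alpha, \tilde\sigma, \tilde\tau + 2\sum_i w_i + w_*}(\tilde w_*).
\end{aligned}
$$

The choice of the proposal for $\tilde w_*$ is motivated by the fact that it can be written as an exponential tilting of the pdf $g^*_{\tilde\alpha, \tilde\sigma, \tilde\tau}(\tilde w_*)$:

$$
g^*_{\tilde\alpha, \tilde\sigma, \tilde\tau + 2\sum_i w_i + w_*}(\tilde w_*) = \frac{\exp\left(-\tilde w_*\left(2\sum_i w_i + w_*\right)\right) g^*_{\tilde\alpha, \tilde\sigma, \tilde\tau}(\tilde w_*)}{\exp\left(-\psi_{\tilde\alpha, \tilde\sigma, \tilde\tau}\left(2\sum_i w_i + w_*\right)\right)},
$$

which allows the terms involving the intractable pdf $g^*$ to cancel in the Metropolis-Hastings ratio. The acceptance ratio reduces to

$$
\begin{aligned}
r = {} & \frac{e^{-\left(\sum_{i=1}^{N_\alpha} w_i + \tilde w_*\right)^2}}{e^{-\left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)^2}}\;\frac{\tilde\alpha}{\alpha}\;\frac{\Gamma(1-\sigma)^{N_\alpha}}{\Gamma(1-\tilde\sigma)^{N_\alpha}}\;e^{-(\tilde\tau - \tau)\sum_{i=1}^{N_\alpha} w_i} \left[\prod_{i=1}^{N_\alpha} w_i\right]^{-\tilde\sigma + \sigma} \times \frac{p(\tilde\alpha, \tilde\sigma, \tilde\tau)}{p(\alpha, \sigma, \tau)} \\
& \times \frac{\dfrac{1}{\tau}\,\dfrac{1}{1-\sigma} \left(\dfrac{1}{\sigma}\left(\left(\tau + 2\sum_i w_i + \tilde w_*\right)^{\sigma} - \tau^{\sigma}\right)\right)^{N_\alpha} e^{-w_*\left(2\sum_i w_i + \tilde w_*\right)}}{\dfrac{1}{\tilde\tau}\,\dfrac{1}{1-\tilde\sigma} \left(\dfrac{1}{\tilde\sigma}\left(\left(\tilde\tau + 2\sum_i w_i + w_*\right)^{\tilde\sigma} - \tilde\tau^{\tilde\sigma}\right)\right)^{N_\alpha} e^{-\tilde w_*\left(2\sum_i w_i + w_*\right)}}.
\end{aligned}
$$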
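As an illustration of the tractable parts of this proposal, the sketch below draws $(\tilde\alpha, \tilde\sigma, \tilde\tau)$ using only the Python standard library. Several things here are assumptions rather than the authors' implementation: the function name `propose_hyperparameters` and its arguments are hypothetical, the second argument of the Gamma proposal is read as a rate, and a single `step_sd` is used as the log-normal standard deviation for both $\tilde\tau$ and $\tilde\sigma$. The draw of $\tilde w_*$ from the exponentially tilted density $g^*$ is deliberately left out, since it needs a dedicated sampler such as the one in (Devroye, 2009).

```python
import math
import random


def propose_hyperparameters(sigma, tau, weights, w_star, step_sd, rng):
    """Draw (alpha_new, sigma_new, tau_new) from the proposals sketched above.

    Assumptions of this sketch: the gamma proposal is parameterized by
    shape and rate, and `step_sd` is the log-normal standard deviation.
    """
    n_nodes = len(weights)                       # N_alpha
    tilt = 2.0 * sum(weights) + w_star           # 2 * sum_i w_i + w_*
    # tau_new is log-normal around log(tau), so it stays positive
    tau_new = math.exp(rng.gauss(math.log(tau), step_sd))
    # 1 - sigma_new is log-normal around log(1 - sigma), so sigma_new < 1
    sigma_new = 1.0 - math.exp(rng.gauss(math.log(1.0 - sigma), step_sd))
    # Rate of the gamma proposal; positive for sigma_new > 0 and sigma_new < 0
    rate = ((tau_new + tilt) ** sigma_new - tau_new ** sigma_new) / sigma_new
    # random.gammavariate takes (shape, scale), hence scale = 1 / rate
    alpha_new = rng.gammavariate(n_nodes, 1.0 / rate)
    return alpha_new, sigma_new, tau_new


if __name__ == "__main__":
    rng = random.Random(1)
    print(propose_hyperparameters(0.5, 1.0, [0.3, 1.2, 0.7], 0.4, 0.1, rng))
```

Proposing $1 - \tilde\sigma$ on the log scale keeps $\tilde\sigma$ below 1 without any rejection step, and the rate expression is positive on both sides of $\tilde\sigma = 0$ because numerator and denominator change sign together.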
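Finally, if we assume improper priors on $\alpha, \sigma, \tau$,

$$
p(\alpha) \propto \frac{1}{\alpha}, \qquad p(\sigma) \propto \frac{1}{1-\sigma}, \qquad p(\tau) \propto \frac{1}{\tau},
$$

then

$$
r = e^{-\left(\sum_{i=1}^{N_\alpha} w_i + \tilde w_*\right)^2 + \left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)^2}\,e^{-\left(\tilde\tau - \tau + 2 w_* - 2 \tilde w_*\right)\sum_{i=1}^{N_\alpha} w_i} \times \left[\prod_{i=1}^{N_\alpha} w_i\right]^{-\tilde\sigma + \sigma} \left[\frac{\dfrac{\Gamma(1-\sigma)}{\sigma}\left(\left(\tau + 2\sum_i w_i + \tilde w_*\right)^{\sigma} - \tau^{\sigma}\right)}{\dfrac{\Gamma(1-\tilde\sigma)}{\tilde\sigma}\left(\left(\tilde\tau + 2\sum_i w_i + w_*\right)^{\tilde\sigma} - \tilde\tau^{\tilde\sigma}\right)}\right]^{N_\alpha}.
$$

Step 3: Update of the latent variables $n_{ij}$

Concerning the latent $n_{ij}$, the conditional distribution is a truncated Poisson distribution (48) from which we can sample directly. An alternative strategy, which may be more efficient for a large number of edges, is to use a Metropolis-Hastings proposal:

$$
q(\tilde n_{ij} \mid n_{ij}) =
\begin{cases}
\frac{1}{2} & \text{if } \tilde n_{ij} = n_{ij} + 1,\ n_{ij} > 1 \\
\frac{1}{2} & \text{if } \tilde n_{ij} = n_{ij} - 1,\ n_{ij} > 1 \\
1 & \text{if } \tilde n_{ij} = n_{ij} + 1,\ n_{ij} = 1 \\
0 & \text{otherwise}
\end{cases}
$$

and accept the proposal with probability

$$
\min\left(1,\ \frac{n_{ij}!}{\tilde n_{ij}!}\left((1 + \delta_{ij})\,w_i w_j\right)^{\tilde n_{ij} - n_{ij}}\,\frac{q(n_{ij} \mid \tilde n_{ij})}{q(\tilde n_{ij} \mid n_{ij})}\right).
$$

E.2. Bipartite graph

In the bipartite graph case, the sampler iterates as follows:

1. Propose $(\tilde\alpha, \tilde\sigma, \tilde\tau) \sim q(\tilde\alpha, \tilde\sigma, \tilde\tau \mid \alpha, \sigma, \tau)$ and accept with probability $\min(1, r)$ with

$$
r = \frac{\exp\left[-\tilde\alpha\,\psi_{\tilde\sigma, \tilde\tau}\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)\right] \tilde\alpha^{N_\alpha} \prod_{i=1}^{N_\alpha} \kappa_{\tilde\sigma, \tilde\tau}\left(m_i, \sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)}{\exp\left[-\alpha\,\psi_{\sigma, \tau}\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)\right] \alpha^{N_\alpha} \prod_{i=1}^{N_\alpha} \kappa_{\sigma, \tau}\left(m_i, \sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)} \times \frac{p(\tilde\alpha)\,p(\tilde\sigma)\,p(\tilde\tau)}{p(\alpha)\,p(\sigma)\,p(\tau)} \times \frac{q(\alpha, \sigma, \tau \mid \tilde\alpha, \tilde\sigma, \tilde\tau)}{q(\tilde\alpha, \tilde\sigma, \tilde\tau \mid \alpha, \sigma, \tau)}.
$$

2. For $i = 1, \ldots, N_\alpha$, sample

$$
w_i \mid \text{rest} \sim \mathrm{Gamma}\left(m_i - \sigma,\ \tau + \sum_{j=1}^{N'_\alpha} w'_j + w'_*\right).
$$

3. Sample

$$
w_* \mid \text{rest} \sim p(w_* \mid \text{rest}) = \frac{\exp\left(-w_*\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)\right) g_\alpha(w_*)}{\exp\left[-\psi\left(\sum_{j=1}^{N'_\alpha} w'_j + w'_*\right)\right]}
$$

using the algorithm of (Devroye, 2009).

4. Update the latent $n_{ij}$ given $w'_{1:N'_\alpha}, w_{1:N_\alpha}$ from a truncated Poisson distribution

$$
n_{ij} \mid z, w, w' \sim
\begin{cases}
\delta_0 & \text{if } z_{ij} = 0 \\
\mathrm{tPoisson}(w_i w'_j) & \text{if } z_{ij} = 1.
\end{cases}
$$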
5. Propose $(\tilde\alpha', \tilde\sigma') \sim q(\tilde\alpha', \tilde\sigma' \mid \alpha', \sigma')$ and accept with probability $\min(1, r)$ with

$$
r = \frac{\exp\left[-\tilde\alpha'\,\psi_{\tilde\sigma', 1}\left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)\right] \tilde\alpha'^{\,N'_\alpha} \prod_{j=1}^{N'_\alpha} \kappa_{\tilde\sigma', 1}\left(m'_j, \sum_{i=1}^{N_\alpha} w_i + w_*\right)}{\exp\left[-\alpha'\,\psi_{\sigma', 1}\left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)\right] \alpha'^{\,N'_\alpha} \prod_{j=1}^{N'_\alpha} \kappa_{\sigma', 1}\left(m'_j, \sum_{i=1}^{N_\alpha} w_i + w_*\right)} \times \frac{p(\tilde\alpha')\,p(\tilde\sigma')}{p(\alpha')\,p(\sigma')} \times \frac{q(\alpha', \sigma' \mid \tilde\alpha', \tilde\sigma')}{q(\tilde\alpha', \tilde\sigma' \mid \alpha', \sigma')}.
$$

6. For $j = 1, \ldots, N'_\alpha$, sample

$$
w'_j \mid \text{rest} \sim \mathrm{Gamma}\left(m'_j - \sigma',\ 1 + \sum_{i=1}^{N_\alpha} w_i + w_*\right).
$$

7. Sample

$$
w'_* \mid \text{rest} \sim p(w'_* \mid \text{rest}) = \frac{\exp\left(-w'_*\left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)\right) g_\alpha(w'_*)}{\exp\left[-\psi\left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)\right]}
$$

using the algorithm of (Devroye, 2009).

References

Aalen, O. (1992). Modelling heterogeneity in survival analysis by the compound Poisson distribution. The Annals of Applied Probability 951–972.
Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery 36–43. ACM.
Airoldi, E. M., Costa, T. B. and Chan, S. H. (2014). Stochastic blockmodel approximation of a graphon: Theory and consistent estimation. In Advances in Neural Information Processing Systems 26.
Airoldi, E. M., Blei, D., Fienberg, S. E. and Xing, E. (2008). Mixed membership stochastic blockmodels. The Journal of Machine Learning Research 9 1981–2014.
Aldous, D. J. (1981). Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis 11 581–598.
Aldous, D. (1985). Exchangeability and related topics. In Ecole d'été de Probabilités de Saint-Flour XIII - 1983 1–198. Springer.
Arcones, M. A. and Giné, E. (1992). On the bootstrap of U and V statistics. The Annals of Statistics 655–674.
Barabási, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509–512.
Bertoin, J. (2006). Random fragmentation and coagulation processes 102. Cambridge University Press.
Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences 106 21068–21073.
Bickel, P. J., Chen, A. and Levina, E. (2011). The method of moments and degree distributions for network models. The Annals of Statistics 39 2280–2301.
Blackwell, D. and MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. The Annals of Statistics 353–355.
Bollobás, B. (1980). A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics 1 311–316.
Bollobás, B. (2001). Random graphs 73. Cambridge University Press.
Bollobás, B., Janson, S. and Riordan, O. (2007). The phase transition in inhomogeneous random graphs. Random Structures & Algorithms 31 3–122.
Bollobás, B. and Riordan, O. (2009). Metrics for sparse graphs. In Surveys in Combinatorics (S. Huczynska, J. D. Mitchell and C. M. Roney-Dougal, eds.). London Mathematical Society Lecture Note Series 365 211–287. Cambridge University Press.
Borgs, C., Chayes, J. T., Cohn, H. and Zhao, Y. (2014). An $L^p$ theory of sparse graph convergence I: Limits, sparse random graph models, and power law distributions. arXiv preprint arXiv:1401.2906.
Britton, T., Deijfen, M. and Martin-Löf, A. (2006). Generating simple random graphs with prescribed degree distribution. Journal of Statistical Physics 124 1377–1397.
Brix, A. (1999). Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability 31 929–953.
Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7 434–455.
Bu, D., Zhao, Y., Cai, L., Xue, H., Zhu, X., Lu, H., Zhang, J., Sun, S., Ling, L. and Zhang, N. (2003). Topological structure analysis of the protein–protein interaction network in budding yeast. Nucleic Acids Research 31 2443–2450.
Bühlmann, H. (1960). Austauschbare stochastische Variablen und ihre Grenzwertsätze. PhD thesis, University of California, Berkeley.
Caron, F. (2012). Bayesian nonparametric models for bipartite graphs. In Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger, eds.) 2051–2059. Curran Associates, Inc.
Caron, F., Teh, Y. W. and Murphy, T. B. (2014). Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes. The Annals of Applied Statistics 8 1145–1181.
Chen, T., Fox, E. B. and Guestrin, C. (2014). Stochastic Gradient Hamiltonian Monte Carlo. In Proc. International Conference on Machine Learning 1683–1691.
Clauset, A., Shalizi, C. R. and Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review 51 661–703.
Colizza, V., Pastor-Satorras, R. and Vespignani, A. (2007). Reaction–diffusion processes and metapopulation models in heterogeneous networks. Nature Physics 3 276–282.
Daley, D. J. and Vere-Jones, D. (2008). An introduction to the theory of point processes. Springer Verlag.
de Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio. Atti della R. Academia Nazionale dei Lincei, Serie 6. Memorie, Classe di Scienze Fisiche, Mathematice e Naturale 4 251–299.
Devroye, L. (2009). Random variate generation for exponentially and polynomially tilted stable distributions. ACM Transactions on Modeling and Computer Simulation (TOMACS) 19 18.
Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rendiconti di Matematica e delle sue Applicazioni. Serie VII 33–61.
Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B 195 216–222.
Durrett, R. (2007). Random graph dynamics. Cambridge University Press.
Durrett, R. (2010). Probability: theory and examples. Cambridge University Press.
Erdös, P. and Rényi, A. (1959). On random graphs. Publicationes Mathematicae 6 290–297.
Favaro, S. and Teh, Y. W. (2013). MCMC for normalized random measure mixture models. Statistical Science 28 335–359.
Feller, W. (1971). An introduction to probability theory and its applications 2. John Wiley & Sons.
Ferguson, T. S. and Klass, M. J. (1972). A representation of independent increment processes without Gaussian components. The Annals of Mathematical Statistics 43 1634–1643.
Fienberg, S. E. (2012). A brief history of statistical models for network analysis and open challenges. Journal of Computational and Graphical Statistics 21 825–839.
Freedman, D. A. (1996). De Finetti's theorem in continuous time. Lecture Notes-Monograph Series 83–98.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2014). Bayesian Data Analysis. Chapman and Hall/CRC.
Giné, E. and Zinn, J. (1992). Marcinkiewicz type laws of large numbers and convergence of moments for U-statistics. In Probability in Banach Spaces, 8: Proceedings of the Eighth International Conference 273–291.
Gnedin, A., Hansen, B. and Pitman, J. (2007). Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv 4 88.
Gnedin, A., Pitman, J. and Yor, M. (2006). Asymptotic laws for compositions derived from transformed subordinators. The Annals of Probability 34 468–492.
Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning 2 129–233.
Herlau, T., Schmidt, M. N. and Mørup, M. (2014). Infinite-degree-corrected stochastic block model. Physical Review E 90 032819.
Hoeffding, W. (1961). The strong law of large numbers for U-statistics. Institute of Statistics, Mimeo series 302.
Hofert, M. (2011). Sampling exponentially tilted stable distributions. ACM Transactions on Modeling and Computer Simulation (TOMACS) 22 3.
Hoff, P. D. (2009). Multiplicative latent factor models for description and prediction of social networks. Computational and Mathematical Organization Theory 15 261–272.
Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association 97 1090–1098.
Hoover, D. N. (1979). Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ.
Hougaard, P. (1986). Survival models for heterogeneous populations derived from stable distributions. Biometrika 73 387–396.
James, L. F. (2002). Poisson process partition calculus with applications to exchangeable models and Bayesian nonparametrics. arXiv preprint math/0205093.
James, L. (2005). Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages. The Annals of Statistics 1771–1799.
James, L. (2014). Poisson Latent Feature Calculus for Generalized Indian Buffet Processes. Technical Report.
James, L. F., Lijoi, A. and Prünster, I. (2009). Posterior analysis for normalized random measures with independent increments. Scandinavian Journal of Statistics 36 76–97.
Janson, S. (2011). Probability asymptotics: notes on notation. Technical Report.
Kallenberg, O. (1990). Exchangeable random measures in the plane. Journal of Theoretical Probability 3 81–136.
Kallenberg, O. (2005). Probabilistic symmetries and invariance principles. Springer.
Karlin, S. (1967). Central limit theorems for certain infinite urn schemes. J. Math. Mech 17 373–401.
Karrer, B. and Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical Review E 83 016107.
Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. and Ueda, N. (2006). Learning systems of concepts with an infinite relational model. In AAAI 21 381.
Khintchine, A. (1937). Zur Theorie der unbeschrankt teilbaren Verteilungsgesetze. Mat. Sbornik 2 79–119.
Kim, M. and Leskovec, J. (2012). Multiplicative attribute graph model of real-world networks. Internet Mathematics 8 113–160.
Kingman, J. F. C. (1967). Completely random measures. Pacific Journal of Mathematics 21 59–78.
Kingman, J. F. C. (1993). Poisson processes 3. Oxford University Press, USA.
Lauritzen, S. (2008). Exchangeable Rasch matrices. Rendiconti di Matematica, Serie VII 28 83–95.
Lee, M. L. T. and Whitmore, G. A. (1993). Stochastic processes directed by randomized time. Journal of Applied Probability 302–314.
Lewis, P. A. and Shedler, G. S. (1979). Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics Quarterly 26 403–413.
Lijoi, A., Mena, R. H. and Prünster, I. (2007). Controlling the reinforcement in Bayesian non-parametric mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69 715–740.
Lijoi, A., Prünster, I. and Walker, S. G. (2008). Investigating nonparametric priors with Gibbs structure. Statistica Sinica 18 1653.
Lijoi, A. and Prünster, I. (2010). Models beyond the Dirichlet process. In Bayesian Nonparametrics (N. L. Hjort, C. Holmes, P. Müller and S. G. Walker, eds.) Cambridge University Press.
Lloyd, J., Orbanz, P., Ghahramani, Z. and Roy, D. (2012). Random function priors for exchangeable arrays with applications to graphs and relational data. In NIPS 25 1007–1015.
McAuley, J. and Leskovec, J. (2012). Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems 539–547.
Miller, K., Griffiths, T. and Jordan, M. (2009). Nonparametric latent feature models for link prediction. In NIPS.
Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. Jones and X. L. Meng, eds.) 2. Chapman & Hall / CRC Press.
Nešetřil, J. and Ossona de Mendez, P. (2012). Sparsity (Graphs, Structures, and Algorithms). Springer.
Newman, M. E. J. (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences 98 404–409.
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review 167–256.
Newman, M. (2009). Networks: an introduction. OUP Oxford.
Newman, M. E. J., Strogatz, S. H. and Watts, D. J. (2001). Random graphs with arbitrary degree distributions and their applications. Physical Review E 64 26118.
Norros, I. and Reittu, H. (2006). On a conditionally Poissonian graph process. Advances in Applied Probability 38 59–75.
Nowicki, K. and Snijders, T. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association 96 1077–1087.
Ogata, Y. (1981). On Lewis' simulation method for point processes. IEEE Transactions on Information Theory 27 23–31.
Opsahl, T. and Panzarasa, P. (2009). Clustering in weighted networks. Social Networks 31 155–163.
Orbanz, P. and Roy, D. M. (2015). Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures. IEEE Trans. Pattern Anal. Mach. Intelligence (PAMI) 37 437–461.
Palla, K., Knowles, D. A. and Ghahramani, Z. (2012). An Infinite Latent Attribute Model for Network Data. In ICML.
Penrose, M. (2003). Random geometric graphs 5. Oxford University Press.
Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probability Theory and Related Fields 102 145–158.
Pitman, J. (1996). Some developments of the Blackwell-MacQueen urn scheme. Lecture Notes-Monograph Series 245–267.
Pitman, J. (2003). Poisson-Kingman partitions. Lecture Notes-Monograph Series 1–34.
Pitman, J. (2006). Combinatorial Stochastic Processes. In Ecole d'Eté de Probabilités de Saint-Flour XXXII–2002. Lecture Notes in Mathematics. Springer.
Prünster, I. (2002). Random probability measures derived from increasing additive processes and their application to Bayesian statistics. PhD thesis, University of Pavia.
Regazzini, E., Lijoi, A. and Prünster, I. (2003). Distributional results for means of normalized random measures with independent increments. The Annals of Statistics 31 560–585.
Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics 39 1878–1915.
van der Hofstad, R. (2014). Random graphs and complex networks. Vol. I. Technical Report, Department of Mathematics and Computer Science, Eindhoven University of Technology.
Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature 393 440–442.
Wolfe, P. and Choi, D. S. (2014). Co-clustering separately exchangeable network data. Annals of Statistics 42 29–63.
Wolfe, P. J. and Olhede, S. C. (2013). Nonparametric graphon estimation. arXiv preprint arXiv:1309.5936.
Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics 40 2266–2292.
Zhou, M., Madrid-Padilla, O. H. and Scott, J. G. (2014). Priors for random count matrices derived from a family of negative binomial processes. arXiv preprint arXiv:1404.3331.