A unified view of generative models for networks: models, methods, opportunities, and challenges


Authors: Abigail Z. Jacobs, Aaron Clauset

Abigail Z. Jacobs, Dept. of Computer Science, University of Colorado Boulder, abigail.jacobs@colorado.edu
Aaron Clauset, Dept. of Computer Science and BioFrontiers Institute, University of Colorado Boulder; Santa Fe Institute, aaron.clauset@colorado.edu

Abstract

Research on probabilistic models of networks now spans a wide variety of fields, including physics, sociology, biology, statistics, and machine learning. These efforts have produced a diverse ecology of models and methods. Despite this diversity, many of these models share a common underlying structure: pairwise interactions (edges) are generated with probability conditional on latent vertex attributes. Differences between models generally stem from different philosophical choices about how to learn from data or different empirically-motivated goals. The highly interdisciplinary nature of work on these generative models, however, has inhibited the development of a unified view of their similarities and differences. For instance, novel theoretical models and optimization techniques developed in machine learning are largely unknown within the social and biological sciences, which have instead emphasized model interpretability. Here, we describe a unified view of generative models for networks that draws together many of these disparate threads and highlights the fundamental similarities and differences that span these fields. We then describe a number of opportunities and challenges for future work that are revealed by this view.

1 Introduction

The quantitative study of the structure, dynamics, and function of real-world networks is an emerging interdisciplinary field that spans nearly every domain of science. Generative models, in which we define a structured probability distribution over all graphs, are a powerful and increasingly popular approach for studying real-world networks.
Work on generative models now spans machine learning, statistics, the social sciences, ecology, and statistical physics. However, as a result of their different traditions and goals, these communities have developed a range of different generative models, including latent position models [1], block models [2], and feature models [3]. Furthermore, these communities use different methods to learn such models from data, including frequentist, Bayesian, and nonparametric methods. Despite this large and diverse ecology of approaches, formulating a coherent understanding of whether and how these models are related remains an open challenge (notwithstanding several thoughtful efforts [4, 5, 6, 7]). We focus on the general framework of modeling conditionally independent edges between vertices with latent vertex attributes. Here, we survey and organize this diversity of models and methods, and then introduce a unified view of the field. This view positions the three main classes of generative models for networks as special cases of a single coherent framework, positions the three major approaches to learning these models as different philosophical choices, and yields new insights from translation, such as theoretical properties, interpretations, and new inference algorithms. We close with a discussion of the challenges and opportunities this view presents for the field.

2 A walking tour of existing models

The following brief tour of generative models for networks illustrates the large diversity of approaches, as well as the different traditions used across disciplines. Despite the different points of origin and assumptions of many of these models, we emphasize their underlying similarity throughout this tour. We note that dynamic networks, multiplex networks, networks with metadata, and networks produced from physical processes [8] can often be modeled in a straightforward manner using the model classes described below.
The first class of models is the latent space models. The canonical latent space model was introduced in statistics and mathematical sociology by Hoff and colleagues [1]. In this model, vertices take latent positions z_i ∈ R^K in a K-dimensional space, and edges are generated with a probability that depends only on the Euclidean distance between vertex positions. Subsequently, Hoff introduced a more flexible construction called the "eigenmodel" that captures both the latent space model and latent block models [4]. In general, latent space models assume latent continuous vertex attributes with edge probabilities given by a distance function on those attributes. Moreover, by relaxing the continuous (or Euclidean [9]) space assumption and allowing edge probabilities to be determined by a generic function of those attributes, we can recover as special cases each of the two model classes discussed below. Thus, latent space models effectively subsume all generative models for networks.

In the canonical latent space model, connection probabilities are assumed to decrease with latent distance, which induces strong assortativity in the resulting networks (like vertices connect with like). This pattern implies that vertices can be clustered by their latent positions. It is then natural to define a hierarchical model that explicitly represents such clusters within the latent space model [10]. But homophily can be a misleading assumption for many real networks, which sometimes exhibit ordered (status-oriented) or disassortative patterns (unlike vertices connect with each other). Disassortativity in particular is common in many biological, technological, and economic networks. For example, the probabilistic niche model from ecology is a general latent space model that can naturally produce assortative, ordered, or disassortative latent clustering by dissociating the sending and receiving latent positions of each vertex [11].
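As a concrete illustration, the canonical latent distance construction can be sketched in a few lines. This is a minimal sketch, not any published implementation: the logistic link, the intercept `alpha`, and the Gaussian draw for positions are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def sample_latent_distance_graph(n, K=2, alpha=2.0, seed=0):
    """Sketch of a logistic latent distance model: each vertex i has a
    latent position z_i in R^K, and edge (i, j) appears with probability
    sigmoid(alpha - ||z_i - z_j||), so nearby vertices connect more often."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, K))                                  # latent positions z_i
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)   # pairwise Euclidean distances
    p = 1.0 / (1.0 + np.exp(-(alpha - d)))                       # edge probabilities
    upper = np.triu(rng.random((n, n)) < p, k=1)                 # sample upper triangle only
    A = (upper | upper.T).astype(int)                            # symmetrize; no self-loops
    return A, z

A, z = sample_latent_distance_graph(50)
```

Because connection probability decays with latent distance, samples from this sketch are assortative by construction, matching the clustering behavior described above.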
Finally, exponential random graph models in the sociological literature give the network likelihood an exponential family form, typically as a function of network statistics; whenever they employ latent variables, they also fall within the latent space model class [12, 6].

Our second class is the block models, which use a latent space defined by categorical variables rather than continuous ones: vertices take positions z_i in some latent space {0,1}^K for K blocks, where ||z_i||_1 = 1. These models are often used to cluster vertices into roughly homogeneous groups or classes, a task akin to "community detection" in network science [13, 8]. Block models are by far the most well-explored and broadly used class of generative models for networks. They originated in mathematical sociology [14, 15, 16] and represent active, albeit somewhat disconnected, areas of research in sociology, machine learning, and statistical physics [6, 7, 8]. In a block model, vertices can belong to either exactly one latent block (hard clustering) or have partial memberships in multiple blocks (overlapping or mixed membership). Vertices within a block are stochastically equivalent, meaning any pair of vertices in a given block have the same probabilities of connecting to all other vertices in the network, and block interactions describe a coarse-graining of the network interactions. If vertices can have mixed membership, they instead take latent positions in the simplex [2]. In nonparametric variations, K is itself sampled rather than fixed in advance [6, 17]. The simplest block model is the Erdős–Rényi random graph, in which K = 1. Larger models can exhibit a rich variety of block-level patterns, including assortative, disassortative, core-periphery, ordered, and bipartite structure [18], as well as overlapping groups [2].
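The hard-clustering block-model generative process just described can be sketched as follows. The helper name `sample_sbm` and the two-block parameter values are illustrative, not from any particular reference implementation.

```python
import numpy as np

def sample_sbm(block_sizes, P, seed=0):
    """Sketch of a stochastic block model: vertex i has block label g_i,
    and edge (i, j) appears with probability P[g_i, g_j]. All vertices in
    a block are stochastically equivalent."""
    rng = np.random.default_rng(seed)
    g = np.repeat(np.arange(len(block_sizes)), block_sizes)   # hard block labels
    p = P[g[:, None], g[None, :]]                             # per-pair edge probabilities
    upper = np.triu(rng.random((len(g), len(g))) < p, k=1)
    return (upper | upper.T).astype(int), g

# Two assortative blocks: dense within, sparse between (illustrative values).
P = np.array([[0.50, 0.05],
              [0.05, 0.50]])
A, g = sample_sbm([30, 30], P)
```

Choosing larger off-diagonal than diagonal entries in `P` instead yields disassortative structure; other choices give core-periphery or ordered patterns, as the text notes.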
Furthermore, block models can be extended to directly model other structural information, including degree heterogeneity [13], edge weights [19], social status [20], or a growing number of blocks [6, 17, 21]. It is worth noting that block models include topic models as a special case [2].

The third class is the latent feature models. This class is a natural progression from latent blocks and explicitly highlights the connection to the canonical latent space model. Instead of dividing total membership into one group or across a mixture of groups, latent feature models allow vertices to have arbitrarily many unique features, typically binary, so that vertex attributes have the form z_i ∈ {0,1}^K, where ||z_i||_1 ≤ K. Machine learning offers a rich literature on latent feature models that extends naturally to relational data, i.e., networks [3]. Edge probabilities are given as a function of a weighted sum of feature vectors: by allowing these weights to be positive or negative, latent feature models can allow vertex features to combine assortatively or disassortatively. Richer structure, including hierarchies on the features, can also be imposed [6, 22]. Alternatively, fixed-K (parametric) variations combine features with block models [23] or embed them in a regression framework [24].

A fourth class includes models defined on a latent hierarchical or tree-like space. Although such a latent structure may appear distinct from the previous model classes, these models of hierarchical structure are in fact a special subset of the other classes. For instance, we can directly impose a hierarchical organization on the latent block or feature models. Some models can be explicitly defined using an underlying tree structure, in which vertices in the network are leaves and common ancestors determine the probability of connection, as in the (nonparametric) hierarchical random graph model [25] and its Bayesian and nonparametric variations [26, 27, 6].
Each of these models uses a piecewise-constant underlying space [28], or a discrete latent space model, in which ultrametric distances between vertices determine the probability of connection [29]; the continuous version can be captured by the hyperbolic geometry of a graph [9, 30].

3 Philosophies, representations, and unifications

In order to learn a particular generative model from real network data, we must also choose a learning paradigm that defines the relationship between an observed network and the various parameters of the model (including complexity parameters like K). Here, we characterize the three major approaches for learning network models in terms of their different philosophical assumptions, and illustrate how moving a model between paradigms can yield insights into theory and interpretation, and shed new light on the practical and efficient application of these models to network data.

Generative models for networks can be defined under parametric paradigms, either frequentist or Bayesian, or under the Bayesian nonparametric paradigm. (Frequentist nonparametric methods are available as well, although relatively uncommon in this context; some recent papers have a bit of this flavor [20, 30, 31, 32].) However, different communities usually favor one paradigm over the others, which generally reflects differences in traditions and goals as much as any objective rationale. As popular as hierarchical Bayesian models are in machine learning, frequentist perspectives still dominate the ecology, economics, and statistical physics literatures, often with an emphasis on interpretability. Some network models have appeared in multiple communities; these can be independent rediscoveries under differing paradigms or intentional adaptations from one paradigm to another, although this is not always obvious. There is thus much to be gained by unifying various network models within a single paradigm of learning from data.
Past efforts at unification have generally focused on placing large classes of models within a common representation and a restricted set of philosophical choices. For instance, the eigenmodel [4] captures both latent space and block models, which can also be represented within a generalized linear model schema [5]. Another general approach is matrix estimation [31], which can be applied broadly to networks in their adjacency matrix representation, and is in the tradition of nonparametric graph limit (graphon) estimation [33, 34, 35, 36, 37].

Bayesian nonparametric models are also increasingly popular, particularly in the machine learning community, and many of the models previously described can be unified under this framework [6, 28, 37]. Placing generative models for networks under the Bayesian nonparametric umbrella implies a particular method of model selection (which is effectively subsumed within the inference step). Bayesian nonparametric models allow the structural parameters to change, sometimes dramatically so, as more data are observed, e.g., with an increasing number of blocks or features. In practice, this means we begin with an infinite-dimensional space and move to a finite representation. Practical application also requires a choice of priors that are tractable, though the consequences of this choice remain unclear, e.g., for consistency and practical performance. On the other hand, efforts under the Bayesian nonparametric framework have successfully led to novel machinery for modeling, inference, and theory [38, 28, 36].

To illustrate the more general theoretical perspective to be gained from this broad view of learning network models, we explore recent work defining the underlying theoretical basis for generative models for networks via the graphon. This approach is not without its limitations, however, and we then explore a related but distinct approach based on continuous processes.
3.1 Unification under the graphon

A reasonable and desirable property of generative models for networks is that they should not depend on the order in which we observe the data, i.e., our models should be exchangeable. Network data, presented as a (random) adjacency matrix, requires joint exchangeability, so that row and column identities in the matrix are jointly preserved under permutations of the data. Exchangeability as a requirement leads to representation theorems. For exchangeable sequences, de Finetti's theorem implies that they can be represented by an underlying i.i.d. mixture of random variables. For a jointly exchangeable adjacency matrix, the Aldous–Hoover theorem implies the existence of a random measurable function, given in the following manner: for each vertex i, draw U_i uniformly at random from the unit interval [0,1], and for each pair of vertices i, j, draw U_ij uniformly at random from [0,1]. There then exists a function ω : [0,1]^2 → [0,1] such that the adjacency matrix entries satisfy X_ij =_d 1{U_ij < ω(U_i, U_j)}, where =_d denotes equality in distribution. We call this function ω the graphon [28]. Equivalently, the graphon is the limit object of a sequence of graphs [39, 40, 41], and it can then in theory be directly estimated nonparametrically [33, 35, 36, 34, 37].

From this perspective, exchangeability for network data implies the existence of a latent variable generative model, and furthermore that in such a model, edges are conditionally independent. The latent variable models discussed above can be justified under this framework [4]. However, an important consequence of the graphon construction is that graphs will be almost surely dense (Θ(n^2) edges) or trivially empty [28, 41]. That is, exchangeability of a network model implies that we are in the regime of dense networks. This is problematic because most real-world networks are, in fact, sparse (vertices have constant or very slowly growing degree; graphs have O(n) edges).
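The Aldous–Hoover construction translates directly into code. This is a minimal sketch: the helper name is hypothetical, and the particular graphon ω(u, v) = uv is an arbitrary illustrative choice (any symmetric function [0,1]^2 → [0,1] works for an undirected graph).

```python
import numpy as np

def sample_from_graphon(omega, n, seed=0):
    """Aldous-Hoover sampling sketch: draw U_i ~ Uniform[0,1] per vertex
    and U_ij ~ Uniform[0,1] per pair, then set X_ij = 1 if U_ij < omega(U_i, U_j)."""
    rng = np.random.default_rng(seed)
    U = rng.random(n)                             # vertex-level uniforms U_i
    W = omega(U[:, None], U[None, :])             # graphon evaluated at all pairs
    upper = np.triu(rng.random((n, n)) < W, k=1)  # pair-level uniforms U_ij
    return (upper | upper.T).astype(int)          # symmetric, no self-loops

# An illustrative symmetric graphon: omega(u, v) = u * v.
A = sample_from_graphon(lambda u, v: u * v, 100)
```

Note that for any graphon with positive measure, the expected number of edges grows as Θ(n^2), which is exactly the density problem discussed above.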
Most of the models discussed in our walking tour fall within the jointly exchangeable framework of Aldous–Hoover. This would seem to imply that all generative models for networks are misspecified, despite their many successful practical applications. One solution to this fundamental problem is to abandon exchangeability in favor of alternative properties [28], or to attempt to escape the Aldous–Hoover representation entirely [38].

3.2 Unification under continuous space

To move beyond the graphon, Caron and Fox lift the representation of discrete adjacency matrices into a continuous space [38]. Under this formulation, graphs are generated purely as a point process. By representing a graph as a sample from a continuous-time process, the Caron–Fox formulation sidesteps the dense-graph implication of the Aldous–Hoover approach while still preserving joint exchangeability. Instead of a random measurable function on the unit square, this approach implies, due to Kallenberg, a representation as a mixture of random functions.

This alternative approach is promising, as it presents the possibility that generative models for networks are not all misspecified, so long as they can be reformulated as point process models for networks. To demonstrate this approach, Caron and Fox describe several choices of parametrization that correspond to the Erdős–Rényi G(n, p) model, a graphon model, and the configuration model for random graphs with a specified degree sequence. Each vertex has a latent weight parameter corresponding to sociability, which is analogous to the degree "propensity" in the popular degree-corrected stochastic block model [13]. This parameterization connects their model explicitly to the family of physical models with specified degree distribution [8].
Introducing higher-order structure, such as community, ordered, or hierarchical structure, has not yet been accomplished in this framework, and will be a natural and productive extension of these models. Hierarchical models based on the point process model seem likely to yield new theoretical insights and inference algorithms. On the other hand, more is currently known about the graphon and the pathologies of discrete graph models than about the continuous-space approach. Thus, there may be as-yet-unknown difficulties lurking within the point process model. (We note that bipartite networks and feature data require separable exchangeability, as row and column identities need not be related; joint exchangeability is a special case that notably requires symmetry of the graphon.) One immediate issue is that this model only yields graphs with superlinear density O(n^α), 1 < α < 2; that is, it does not model extremely sparse graphs (O(n) edges for O(n) vertices) [38]. It will be crucial to assess the practical implications of these models to understand if, when, and how they break down on real-world network data.

4 Challenges and opportunities

Moving models between learning paradigms has already been a productive source of new ways to model networks. However, without a broader interest in the application of these models or in the underlying theoretical questions about networks, such model development risks being primarily a fluency exercise in philosophical bases and notation. The broad view of generative models for networks described above reveals a number of challenges and opportunities for the field, many of which are directly related to moving between representations and translating insights across fields.

Interpretability. The interpretability of model parameters, model structure, and their relationship to network data is crucial for inference about real-world complex systems.
Identifying interpretable underlying structure is a strong motivation for many of the applications of network models in the sciences. For instance, the structure of social networks is generally believed to be driven by latent spaces, including geography [42], socioeconomic status [43], and popularity [20]. And models of biological networks generally seek structural "modules" that are meaningfully related to biological function, e.g., species with similar feeding patterns in food webs, the different disease phenotypes of the malaria parasite, or proteins with similar cellular functions [25, 18, 2]. Models motivated by specific applications can be useful for understanding how to interpret the different patterns of large-scale network organization encoded by different models [7]. Extremely general models often have limited interpretability, and thus also limited utility for scientific applications. That is, we tend to trade off model interpretability against model generality (not necessarily model complexity). For instance, approximating the graphon with Gaussian processes [37] is quite general, but such a model cannot easily produce useful insights about the underlying mechanisms that generated any particular network. Thus, an opportunity and a challenge is the adaptation of structural metaphors from applied domains to general models, allowing us to leverage the corresponding interpretations, e.g., social group structure for block models or ecological niche structure for latent space models [7, 11].

Homophily. Assortativity is the tendency, in observational data, for connected vertices to have similar attributes. Many generative models for networks assume that similarity between vertices is what generates edges (a process called homophily), and conversely, that we can infer vertex similarity directly from the structure of a network. That is, these models assume homophily is the underlying generative mechanism.
However, assortativity also arises when vertices become more similar as a result of an existing connection (a process sometimes called "social contagion," although that name has unfortunate connotations). In particular, influence can also produce assortativity, and few generative models allow for both mechanisms. Deciding which of the two governs a particular system is sometimes scientifically attractive, but it is also generally statistically impossible [44].

Observed or latent attributes can drive assortativity directly, as in homophily, but homophily and structural similarity are easily confounded as well. (There remains another idea not to conflate: detecting assortativity is not the same as predicting, given some vertex features, that similar vertices will be connected. Assortativity only guarantees that, given an edge, vertex features are likely to be similar.) Furthermore, vertex attributes and network structure may be unrelated. Fosdick and Hoff [45] designed a framework to first test for dependence between vertex attributes and network structure, and then model network structure and attributes jointly. Even then, dependence between attributes and latent network structure is not causality. Latent positions correlated with known metadata may be correlated with the true causal mechanism, e.g., jointly caused by a true but unknown mechanism [46]. Although latent space models do not solve the correlation-causation problem, they can be a useful way to instrumentalize these processes.

In the opposite direction, we can leverage patterns of assortativity to make useful predictions. Recommender systems are built on this notion, where we can infer profiles of interests given a user profile. In sociology, for instance, McCormick and Zheng use the latent space model [1] to model the distribution of latent social positions of partially unobserved populations [47].
They use these distributions to infer demographic profiles of underrepresented populations, i.e., in a setting where such assumptions about generalizability and assortativity may be justified.

Model selection. Practical model selection for models of network data remains an open challenge. This includes finding the dimensionality of the latent space or the number of blocks or features: different choices can easily introduce a linear number of new parameters. A number of methods have been applied, including AIC [11] and BIC [2] (both of which are known to fail in our context [48]); MDL [49, 50]; Bayes factors [19, 51, 10]; and likelihood ratios [48]. Bayesian nonparametric models move the model selection task inside the model, where dimensionality becomes a parameter to be inferred. In general, it is unclear when these methods fail. Potentially inappropriate assumptions made during inference (e.g., assortativity of latent blocks [52]) and a lack of effective validation [46] complicate this already non-trivial challenge.

Model checking. Even if our favorite model selection technique succeeds, how do we know if the latent variable representation we have inferred is reasonable? One answer is metadata: for example, if the recovered latent space corresponds to vertex information that was excluded from the model [45]. On the other hand, using unsupervised methods to find patterns that correlate with 'ground truth' can be problematic, depending on how we validate and interpret the inferred vertex distribution and latent space [46]. Simulation-based model checking can be used to further probe the reasonableness of our models, that is, the goodness of fit [53, 54]. Model checking can then expand beyond link prediction: e.g., by measuring the structural similarity of resampled graphs to the original data.
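A simulation-based check of this kind can be sketched as follows: resample graphs from a fitted model, compute a chosen network statistic on each, and compare the simulated values to the observed one. This is a minimal sketch, not any published procedure; the helper names and the Erdős–Rényi example with its parameters are illustrative.

```python
import numpy as np

def check_statistic(observed_stat, sampler, stat, n_rep=200, seed=0):
    """Simulation-based model check sketch: resample graphs via sampler(rng),
    compute stat on each, and return the fraction of simulated values at
    least as large as the observed one (a posterior-predictive-style tail
    probability)."""
    rng = np.random.default_rng(seed)
    sims = np.array([stat(sampler(rng)) for _ in range(n_rep)])
    return np.mean(sims >= observed_stat)

def er_sampler(rng, n=40, p=0.1):
    """Resample from a (hypothetically fitted) Erdos-Renyi G(n, p) model."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

def edge_count(A):
    return A.sum() // 2

# A tail probability near 0 or 1 would flag a statistic the model fails to reproduce.
pval = check_statistic(observed_stat=78.0, sampler=er_sampler, stat=edge_count)
```

Replacing `edge_count` with richer statistics (clustering, degree distribution distances) yields checks that go beyond link prediction, as the text suggests.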
This manner of model checking is also reminiscent of approximate Bayesian computation (ABC), which avoids intractable optimization problems by comparing other properties of the resampled model. Model checking in this manner can further be used to compare approximate inference methods. Finally, model checking provides a robust critical method to investigate hypotheses [54].

Model checking also points the way toward potential methods to compare networks, which is currently an open challenge for working with network data. However, we currently lack a sensible general framework with which to compare networks. For networks suspected to have come from the same generative model, one can resample networks and compare the networks of interest. (Even this is dependent on the model specification, however. Exponential random graph models, for example, lack consistency under subsampling [55].) If the graphon is used directly, one would be able to compare networks of different sizes, but this requires estimating the graphon itself such that samples could be compared. To sidestep this issue, Asta and Shalizi recently proposed a nonparametric graph comparison method based on hyperbolic latent space models [30].

Optimization. Models for networks suffer dependencies among data and model parameters, and tend to scale poorly or suffer from strong degeneracies. MCMC methods remain popular across domains, despite poor scalability [7, 12, 11, 8]. Variational methods and belief propagation (message passing) have become popular in both statistical physics and machine learning (e.g., [2, 56, 19, 8]), including a recent extension to stochastic variational inference [52]. Variational methods are much faster, but the costs to these models of their mean-field approximation (independence assumptions) are not well understood. In addition, a number of algorithms make assumptions that cannot be widely true, such as strictly assortative clustering [52].
Belief propagation, likewise, is very efficient but necessarily approximate and only suitable for (tree-like) sparse graphs. Pseudolikelihood methods have been developed for latent feature models [57], block models [58], and the latent space model [59]. Currently, Bayesian nonparametric models tend to suffer during optimization and generally scale poorly, but we expect this to be an active area moving forward [6].

Model use and accessibility. Making new models relevant to the wider communities using network analysis requires making them easily accessible in language, construction, and implementation. One challenge already troubling inference and analysis for these models comes from the inherent symmetries in the model construction. Translational and reflectional symmetries, as well as other forms of non-identifiability, are a practical issue for comparing and combining point estimates and posterior probabilities (i.e., for model selection and comparison). These symmetries contribute to the ruggedness of the likelihood space, which further challenges the validity of variational assumptions for model inference [60]. This, together with the challenges already discussed, will necessarily create difficulties for automatic, rapid model construction and evaluation, as in probabilistic programming [60]. Sometimes this can be resolved simply with a canonical (but arbitrary) setting (e.g., [25]), but even this can be theoretically non-trivial [33]. Even if model-fitting tools become widely available, applied network analysis still generally lacks best practices for coping with model non-identifiabilities. To meaningfully effect change, such tools will need to be developed jointly by theoreticians and practitioners.

5 Discussion

Generative models are a powerful and increasingly popular approach for understanding the structure of networks.
They are also studied and used by researchers spanning a surprisingly diverse set of fields, including both method-oriented fields like machine learning and question-oriented fields like ecology and physics. This diversity has produced a broad variety of models and a highly fragmented literature. It has also slowed the development of a coherent understanding of the theoretical basis and practical applications of these models. Most of the specific generative models developed across these literatures, however, can be viewed as special cases of a general latent space model, where the latent space may have certain restrictions or characteristics, combined with one of three particular learning paradigms for working with actual data. This view is not a grand unified mathematical theory, but it does provide a coherent understanding of how different models in different fields are related. It also highlights a number of interesting challenges and opportunities for future work on generative models for networks.

Even after connecting these perspectives, we still face the challenge of evaluating the theoretical and practical costs and benefits of different sets of modeling assumptions or choices over one another. For example, how can we understand what is being traded off when we use a latent block model under a Bayesian nonparametric learning paradigm versus a general latent space model under a frequentist learning paradigm? Even when pursuing the same basic goals, different communities of researchers, and even different individuals, can rationalize different modeling choices, e.g., which priors to use for regularization, how to manage model parameter uncertainty, or when to employ frequentist-style hypothesis testing. At their core, many of these choices reflect fundamental beliefs about the nature of data, e.g., the choice of distance function or the choice of representation for class or feature memberships.
Other choices reflect fundamental beliefs about the nature of models, such as believing that the number of parameters should grow with larger sample sizes or that domain knowledge should be incorporated into the expected distributions of and relationships between variables. The costs of these choices are often unknown, in part because the modeling goals are usually unstated, but the consequences can be quite real. For instance, what is the empirical cost of choosing mathematically tractable priors for Bayesian nonparametric models? How much more or less robust is low-level parametric modeling versus a general nonparametric approximation? One of the genuine advantages of generative models for networks is that they explicitly encode our beliefs about both the nature of the data and the nature of our models, which allows us to carefully quantify parameter and model uncertainty and to interpret the results with respect to the data-generating process. What we lack, however, is an effective, principled way to evaluate the costs and benefits of specific choices with respect to specific goals.

Making progress on this fundamental issue would shed considerable light on all aspects of generative models for networks. Connecting models to high-level representations, for discrete random graphs [28, 7] and continuous space representations [38], provides an illustrative example of exploring these boundaries. Translating modeling techniques such as block structure and hierarchical structure to the point process model should provide exciting new ground for model development, borrowing interpretations and developments already established for simpler, discrete spaces and for physical, ecological, and social processes. Finally, jointly building theoretical foundations for these models while representing more complex structure in network data will help unite and benefit network science, in both applied and theoretical domains.
7 Acknowledgments

This work was supported by the US AFOSR and DARPA grant number FA9550-12-1-0432 (AZJ, AC) and the NSF Graduate Research Fellowship award number DGE 1144083 (AZJ).

References

[1] P. D. Hoff, A. E. Raftery, and M. S. Handcock, "Latent space approaches to social network analysis," Journal of the American Statistical Association, vol. 97, no. 460, pp. 1090–1098, 2002.
[2] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, "Mixed Membership Stochastic Blockmodels," Journal of Machine Learning Research, vol. 9, pp. 1981–2014, 2008.
[3] K. Miller, T. L. Griffiths, and M. I. Jordan, "Nonparametric latent feature models for link prediction," NIPS, 2009.
[4] P. D. Hoff, "Modeling homophily and stochastic equivalence in symmetric relational data," NIPS, 2007.
[5] A. C. Thomas, Hierarchical Models for Relational Data. PhD thesis, Harvard University, 2009.
[6] M. N. Schmidt and M. Mørup, "Nonparametric Bayesian Modeling of Complex Networks," IEEE Signal Processing Magazine, pp. 110–128, 2013.
[7] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi, "A Survey of Statistical Network Models," Found. Trends Mach. Learn., vol. 2, no. 2, pp. 129–233, 2010.
[8] M. E. J. Newman, Networks: An Introduction. Oxford, UK: Oxford University Press, 2010.
[9] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá, "Hyperbolic geometry of complex networks," Phys. Rev. E, vol. 82, p. 036106, 2010.
[10] M. S. Handcock, A. E. Raftery, and J. M. Tantrum, "Model-based clustering for social networks," Journal of the Royal Statistical Society: Series A, vol. 170, no. 2, pp. 301–354, 2007.
[11] R. J. Williams and D. W. Purves, "The probabilistic niche model reveals substantial variation in the niche structure of empirical food webs," Ecology, vol. 92, no. 9, pp. 1849–57, 2011.
[12] G. Robins, P. Pattison, Y. Kalish, and D. Lusher, "An introduction to exponential random graph (p*) models for social networks," Social Networks, vol. 29, no. 2, pp. 173–191, 2007.
[13] B. Karrer and M. E. J. Newman, "Stochastic blockmodels and community structure in networks," Physical Review E, vol. 83, no. 1, p. 016017, 2011.
[14] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[15] Y. J. Wang and G. Y. Wong, "Stochastic Blockmodels for Directed Graphs," Journal of the American Statistical Association, vol. 82, no. 397, pp. 8–19, 1987.
[16] K. Nowicki and T. A. B. Snijders, "Estimation and Prediction for Stochastic Blockstructures," Journal of the American Statistical Association, vol. 96, no. 455, pp. 1077–1087, 2001.
[17] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda, "Learning systems of concepts with an infinite relational model," AAAI, 2006.
[18] D. B. Larremore, A. Clauset, and A. Z. Jacobs, "Efficiently inferring community structure in bipartite networks," Physical Review E, vol. 90, no. 1, p. 012805, 2014.
[19] C. Aicher, A. Z. Jacobs, and A. Clauset, "Learning Latent Block Structure in Weighted Networks," Journal of Complex Networks, to appear, 2014.
[20] B. Ball and M. E. J. Newman, "Friendship networks and social status," Network Science, vol. 1, no. 1, pp. 16–30, 2013.
[21] D. S. Choi, P. J. Wolfe, and E. M. Airoldi, "Stochastic blockmodels with a growing number of classes," Biometrika, pp. 273–284, 2012.
[22] K. Palla, D. Knowles, and Z. Ghahramani, "An infinite latent attribute model for network data," ICML, 2012.
[23] M. Kim and J. Leskovec, "Multiplicative Attribute Graph Model of Real-World Networks," Internet Mathematics, vol. 8, no. 1-2, pp. 113–160, 2012.
[24] P. D. Hoff, "Multiplicative latent factor models for description and prediction of social networks," Computational & Mathematical Organization Theory, vol. 15, no. 4, pp. 261–272, 2009.
[25] A. Clauset, C. Moore, and M. E. J. Newman, "Hierarchical structure and the prediction of missing links in networks," Nature, vol. 453, no. 7181, pp. 98–101, 2008.
[26] D. M. Roy, C. Kemp, V. K. Mansinghka, and J. B. Tenenbaum, "Learning annotated hierarchies from relational data," NIPS, 2007.
[27] D. M. Roy and Y. W. Teh, "The Mondrian Process," NIPS, 2009.
[28] P. Orbanz and D. M. Roy, "Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–25, 2014.
[29] T. A. B. Snijders, "Statistical Models for Social Networks," Annual Review of Sociology, vol. 37, no. 1, pp. 131–153, 2011.
[30] D. Asta and C. R. Shalizi, "Geometric network comparison," Preprint, 2014.
[31] S. Chatterjee, "Matrix estimation by Universal Singular Value Thresholding," Annals of Statistics, to appear.
[32] S. C. Olhede and P. J. Wolfe, "Network histograms and universality of blockmodel approximation," PNAS, vol. 111, pp. 14722–14727, 2014.
[33] J. J. Yang, Q. Han, and E. M. Airoldi, "Nonparametric estimation and testing of exchangeable graph models," AISTATS, vol. 33, pp. 1060–1067, 2014.
[34] P. J. Bickel, A. Chen, and E. Levina, "The method of moments and degree distributions for network models," Annals of Statistics, vol. 39, no. 5, pp. 2280–2301, 2011.
[35] P. J. Wolfe and S. C. Olhede, "Nonparametric graphon estimation," Preprint, arXiv:1309.5936, 2013.
[36] P. J. Bickel and A. Chen, "A nonparametric view of network models and Newman–Girvan and other modularities," Proc. Natl. Acad. Sci. USA, vol. 106, no. 50, pp. 21068–73, 2009.
[37] J. R. Lloyd, P. Orbanz, Z. Ghahramani, and D. M. Roy, "Random function priors for exchangeable arrays with applications to graphs and relational data," NIPS, 2012.
[38] F. Caron and E. B. Fox, "Bayesian nonparametric models of sparse and exchangeable random graphs," in NIPS Workshop on Frontiers in Network Analysis, 2014.
[39] P. Diaconis and S. Janson, "Graph limits and exchangeable random graphs," Rendiconti di Matematica, Serie VII, vol. 28, pp. 33–61, 2007.
[40] T. Austin, "On exchangeable random variables and the statistics of large graphs and hypergraphs," Probability Surveys, vol. 5, pp. 80–145, 2008.
[41] L. Lovász, Large Networks and Graph Limits. American Mathematical Society, 2012.
[42] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins, "Geographic routing in social networks," PNAS, vol. 102, no. 33, pp. 11623–11628, 2005.
[43] N. Eagle, M. Macy, and R. Claxton, "Network diversity and economic development," Science, vol. 328, no. 5981, pp. 1029–31, 2010.
[44] C. R. Shalizi and A. C. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies," Sociological Methods & Research, vol. 40, no. 2, pp. 211–239, 2011.
[45] B. K. Fosdick and P. D. Hoff, "Testing and Modeling Dependencies Between a Network and Nodal Attributes," Preprint, 2013.
[46] U. von Luxburg, R. C. Williamson, and I. Guyon, "Clustering: Science or art?," Journal of Machine Learning Research: W&CP, vol. 27, pp. 65–79, 2012.
[47] T. H. McCormick and T. Zheng, "Latent space models for networks using Aggregated Relational Data," tech. rep., University of Washington, 2013.
[48] X. Yan, J. E. Jensen, F. Krzakala, C. Moore, C. R. Shalizi, L. Zdeborová, P. Zhang, and Y. Zhu, "Model selection for degree-corrected block models," J. Stat. Mech.: Theory and Experiment, pp. 1–13, 2014.
[49] T. P. Peixoto, "Parsimonious module inference in large networks," Phys. Rev. Lett., p. 148701, 2013.
[50] T. P. Peixoto, "Model selection and hypothesis testing for large-scale network models with overlapping groups," Preprint, 2014.
[51] J. Hofman and C. Wiggins, "Bayesian Approach to Network Modularity," Phys. Rev. Lett., vol. 100, no. 25, p. 258701, 2008.
[52] P. Gopalan, D. Mimno, S. M. Gerrish, M. J. Freedman, and D. M. Blei, "Scalable inference of overlapping communities," NIPS, 2012.
[53] D. R. Hunter, S. M. Goodreau, and M. S. Handcock, "Goodness of Fit of Social Network Models," Journal of the American Statistical Association, vol. 103, no. 1, pp. 248–258, 2008.
[54] A. Gelman and C. R. Shalizi, "Philosophy and the practice of Bayesian statistics," British Journal of Mathematical and Statistical Psychology, vol. 66, p. 36, 2013.
[55] C. R. Shalizi and A. Rinaldo, "Consistency under sampling of exponential random graph models," Annals of Statistics, vol. 41, no. 2, pp. 508–535, 2013.
[56] M. Salter-Townshend and T. B. Murphy, "Variational Bayesian inference for the Latent Position Cluster Model for network data," Computational Statistics & Data Analysis, vol. 57, no. 1, pp. 661–671, 2013.
[57] C. Reed, Submodular MAP Inference for Scalable Latent Feature Models. PhD thesis, University of Cambridge, 2013.
[58] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, "Pseudo-likelihood methods for community detection in large sparse networks," Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[59] A. E. Raftery, X. Niu, P. D. Hoff, and K. Y. Yeung, "Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood," Journal of Computational and Graphical Statistics, vol. 21, no. 4, pp. 901–919, 2012.
[60] R. Nishihara, T. Minka, and D. Tarlow, "Detecting Parameter Symmetries in Probabilistic Models," Preprint, 2013.
