A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.
💡 Research Summary
The paper provides a comprehensive survey of statistical network models, tracing their development from early probabilistic graph theory in the late 1950s to modern machine‑learning‑driven approaches. It begins with a historical overview, highlighting how sociological experiments (e.g., Milgram’s small‑world studies) and the Erdős‑Rényi random graph laid the groundwork for a vibrant interdisciplinary community. The authors then present a diverse set of real‑world datasets—ranging from Sampson’s monastery data and the Enron email corpus to protein‑protein interaction networks and online co‑authorship graphs—to illustrate the breadth of applications and to motivate the need for both static and dynamic modeling frameworks.
The core of the survey is organized around two axes: static models that explain a single snapshot of a network, and dynamic models that capture how networks evolve over time. In the static section, the authors systematically discuss the Erdős‑Rényi‑Gilbert model, exchangeable graph models, the p1 and p2 exponential family formulations, and the broader class of Exponential Random Graph Models (ERGMs). They explain how p1 models capture node‑level expansiveness (the tendency to send ties), popularity (the tendency to receive them), and dyad‑level reciprocity via a log‑linear specification, while p2 models treat the node‑level effects as random and extend to Bayesian hierarchical settings, enabling multidimensional and block‑structured extensions. Fixed‑degree models, stochastic block models (SBMs), and latent space models are covered in depth, with attention to parameter interpretation, likelihood‑based and variational inference methods, and computational scalability.
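To make the static models above concrete, here is a minimal sketch of sampling an undirected graph from a stochastic block model, with the Erdős‑Rényi‑Gilbert model recovered as the one‑block special case. The function names `sample_sbm` and `edge_density`, the block labels, and the probability matrix are illustrative assumptions, not notation taken from the survey.

```python
import random
from itertools import combinations


def sample_sbm(block_of, edge_prob, seed=None):
    """Sample an undirected simple graph from a stochastic block model.

    block_of[i] is the block label of node i (hypothetical encoding).
    edge_prob[a][b] is the probability of an edge between a node in
    block a and a node in block b; it is assumed to be symmetric.
    Returns a set of edges (i, j) with i < j.
    """
    rng = random.Random(seed)
    edges = set()
    # Each dyad is an independent Bernoulli draw given the block labels.
    for i, j in combinations(range(len(block_of)), 2):
        p = edge_prob[block_of[i]][block_of[j]]
        if rng.random() < p:
            edges.add((i, j))
    return edges


def edge_density(n_nodes, edges):
    """Observed edges divided by possible dyads.

    Under the one-block (Erdős-Rényi-Gilbert) model this ratio is the
    maximum likelihood estimate of the single edge probability p.
    """
    possible = n_nodes * (n_nodes - 1) // 2
    return len(edges) / possible


# Two blocks of five nodes: dense within blocks, sparse between them.
blocks = [0] * 5 + [1] * 5
probs = [[0.9, 0.1], [0.1, 0.9]]
graph = sample_sbm(blocks, probs, seed=0)
density = edge_density(len(blocks), graph)
```

Passing a single block, for example `sample_sbm([0] * n, [[p]])`, reduces the sampler to independent edges with common probability `p`. Richer models such as p1 or general ERGMs introduce dependence between dyads and cannot be sampled this simply.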
The dynamic portion surveys preferential‑attachment models (the “rich‑get‑richer” mechanism), small‑world models (Watts‑Strogatz rewiring), duplication‑attachment models for biological networks, and continuous‑time and discrete‑time Markov chain formulations. Among the Markov chain approaches, the authors highlight dynamic ERGMs, dynamic latent space models, and the Dynamic Contextual Friendship Model (DCFM) as contemporary attempts to build temporal dependence directly into the model. Estimation techniques such as Markov chain Monte Carlo maximum likelihood (MCMC‑MLE), maximum pseudo‑likelihood estimation (MPLE), and particle filters (a form of sequential Monte Carlo) are described, together with their limitations for large‑scale data.
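The "rich‑get‑richer" mechanism can be sketched in a few lines. The version below is one common textbook variant, not necessarily the exact formulation discussed in the survey: it assumes a complete seed graph on `m + 1` nodes and attaches each new node to `m` distinct existing nodes chosen with probability proportional to their current degree. The function name `preferential_attachment` and its parameters are hypothetical.

```python
import random


def preferential_attachment(n_nodes, m, seed=None):
    """Grow an undirected graph by degree-proportional attachment.

    Assumption: the seed graph is complete on nodes 0..m, so every
    seed node starts with degree m. Returns a list of edges (a, b)
    with a < b.
    """
    if m < 1 or n_nodes <= m:
        raise ValueError("need 1 <= m < n_nodes")
    rng = random.Random(seed)

    # Complete seed graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]

    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` selects a node with probability proportional
    # to its degree.
    stubs = [v for edge in edges for v in edge]

    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((t, new))
            stubs.extend((t, new))
    return edges


edges = preferential_attachment(n_nodes=50, m=2, seed=3)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
```

Keeping a flat list of edge endpoints avoids recomputing a degree distribution at every step, at the cost of memory proportional to the number of edges.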
A dedicated “Issues in Network Modeling” chapter critiques current practice. The authors point out that many physics‑style models rely on summary statistics (e.g., degree distributions, clustering coefficients) without rigorous statistical validation, leading to over‑interpretation of power‑law fits—a problem illustrated by Stouffer et al.’s re‑analysis of email communication data. They discuss challenges in model selection (AIC/BIC, cross‑validation, Bayes factors), handling noisy or biased observations, and the computational burden of fitting complex exponential family models.
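As a small illustration of the information criteria mentioned above, the sketch below computes AIC and BIC from a maximized log‑likelihood, using the Erdős‑Rényi‑Gilbert model as the simplest case. The helper names are hypothetical, and treating the number of dyads as the sample size in BIC is an assumption: for dependent network data the effective sample size is itself an open question.

```python
import math


def er_loglik(n_nodes, n_edges):
    """Maximized log-likelihood of an undirected Erdős-Rényi-Gilbert model.

    The edge probability is set to its MLE, the observed edge density.
    """
    dyads = n_nodes * (n_nodes - 1) // 2
    p = n_edges / dyads
    if p in (0.0, 1.0):
        # Degenerate fit: the observed graph has probability one.
        return 0.0
    return n_edges * math.log(p) + (dyads - n_edges) * math.log(1.0 - p)


def aic(loglik, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n_obs):
    """Bayesian information criterion; n_obs is the assumed sample size."""
    return k * math.log(n_obs) - 2 * loglik


# Hypothetical example: 20 nodes, 57 observed edges, one free parameter.
n, e = 20, 57
ll = er_loglik(n, e)
dyads = n * (n - 1) // 2
scores = {"aic": aic(ll, 1), "bic": bic(ll, 1, dyads)}
```

Lower values favor a model under either criterion, but the comparison is only meaningful between models fitted by maximum likelihood to the same network.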
Finally, the paper outlines open problems and future directions: integrating mixed‑membership models with graph neural networks, developing online inference algorithms for streaming network data, improving causal inference in dynamic networks, and building unified frameworks that combine statistical rigor with the scalability of modern machine learning. The survey serves as both a roadmap for newcomers and a reference for seasoned researchers seeking a coherent synthesis of static and dynamic statistical network modeling.