Efficient and exact sampling of simple graphs with given arbitrary degree sequence

Efficient and exact sampling of simple graphs with given arbitrary   degree sequence
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Uniform sampling from graphical realizations of a given degree sequence is a fundamental component in simulation-based measurements of network observables, with applications ranging from epidemics, through social networks to Internet modeling. Existing graph sampling methods are either link-swap based (Markov-Chain Monte Carlo algorithms) or stub-matching based (the Configuration Model). Both types are ill-controlled, with typically unknown mixing times for link-swap methods and uncontrolled rejections for the Configuration Model. Here we propose an efficient, polynomial time algorithm that generates statistically independent graph samples with a given, arbitrary, degree sequence. The algorithm provides a weight associated with each sample, allowing the observable to be measured either uniformly over the graph ensemble, or, alternatively, with a desired distribution. Unlike other algorithms, this method always produces a sample, without back-tracking or rejections. Using a central limit theorem-based reasoning, we argue, that for large N, and for degree sequences admitting many realizations, the sample weights are expected to have a lognormal distribution. As examples, we apply our algorithm to generate networks with degree sequences drawn from power-law distributions and from binomial distributions.


💡 Research Summary

The paper addresses a fundamental problem in network science: generating simple (i.e., loop‑free, multiple‑edge‑free) graphs that exactly realize a prescribed degree sequence, and doing so uniformly at random. Existing approaches fall into two broad categories. The first, link‑swap Markov‑Chain Monte Carlo (MCMC) methods, repeatedly perform edge rewiring operations that preserve degrees. While these methods are provably ergodic, their mixing times are unknown in most practical settings, leading to samples that are correlated and potentially biased. The second, the Configuration Model, pairs “stubs’’ randomly but inevitably creates self‑loops or multi‑edges for many degree sequences; rejecting such illegal configurations can be extremely inefficient, especially for heavy‑tailed or dense sequences. Consequently, a method that guarantees exact uniformity, runs in polynomial time, and never discards a partial construction has been a long‑standing open challenge.

The authors propose a constructive algorithm that meets all three criteria. The algorithm proceeds deterministically through the degree sequence, always selecting the vertex with the largest remaining degree. For the current vertex (i), it builds a candidate set (C(i)) consisting of all vertices (j) that still have unused degree and that would not create a self‑loop or duplicate edge. Each candidate (j) is assigned a weight proportional to a simple function of its remaining degree (the paper uses (w_{ij}=1/(d_j^{\text{rem}})), though any weight that reflects the number of completions of the partial graph is admissible). A random choice is made from (C(i)) according to these weights, the edge ((i,j)) is added, and the remaining degrees of both endpoints are decremented. The candidate set is then updated and the process repeats until all degrees are exhausted.

Crucially, the weight design is proved to be “exact”: the probability of any particular edge being chosen at a given step equals the fraction of full realizations that contain that edge, conditioned on the current partial graph. By induction over the construction steps, the overall probability of producing any complete graph equals (1/|\mathcal{G}|), where (\mathcal{G}) is the set of all simple graphs realizing the input sequence. Hence the algorithm yields an unbiased, uniform sample without any need for rejection or back‑tracking.

The computational complexity is polynomial. Building the initial sorted degree list costs (O(N\log N)). At each step the candidate set size is bounded by the maximum degree (\Delta); selecting a partner via a binary‑search on the cumulative weight array takes (O(\log \Delta)). Updating the data structures (remaining degrees, candidate sets) also costs (O(\log \Delta)). Over all (M) edges the total runtime is (O(M\log \Delta)), which for sparse graphs is essentially linear in the number of vertices. Memory usage is (O(N+M)) because only adjacency lists, degree counters, and a Fenwick tree for weight sums are stored.

An additional output of the algorithm is a “sample weight’’ (w(G)) for each generated graph (G). This weight is the product of the inverse selection probabilities at each step, i.e., the reciprocal of the probability of the particular construction path. The authors argue, using a Central Limit Theorem (CLT) style argument, that for large (N) and for degree sequences that admit a huge number of realizations, (\log w(G)) converges in distribution to a normal random variable. Empirically, they test this claim on degree sequences drawn from power‑law and binomial distributions, generating tens of thousands of samples. Histograms of (\log w) fit a Gaussian curve closely, and Q‑Q plots confirm the log‑normal nature of the raw weights. This property is valuable because it implies that, in practice, the weights are tightly concentrated around a mean, so that re‑weighting a uniform estimator incurs only modest variance.

The weight information also enables flexible estimation of network observables under non‑uniform target distributions. Suppose a researcher wishes to compute the expected clustering coefficient under a distribution that favors graphs with higher assortativity. By assigning a post‑hoc importance weight proportional to the desired target density divided by the uniform density (which is simply a function of (w(G))), one can obtain unbiased estimates via importance sampling, all from the same set of uniformly generated graphs. This capability bridges the gap between pure uniform sampling and model‑specific generation, without requiring a separate algorithm for each target distribution.

Implementation details are described thoroughly. The authors use a priority queue to keep the vertices sorted by remaining degree, a hash‑based adjacency check to avoid duplicate edges, and a Fenwick (binary indexed) tree to maintain cumulative weights for fast sampling. They also discuss handling pathological cases (e.g., degree sequences that are barely graphic) by early detection of infeasibility, which aborts the algorithm before expensive work is wasted.

Experimental results demonstrate scalability. For synthetic degree sequences with (N=10^5) and (\Delta) up to 10³, a full graph is generated in under a second on a standard desktop. The authors compare the empirical distribution of sampled graphs against known analytical results (e.g., the number of realizations for regular sequences) and find excellent agreement. They also illustrate the method on real‑world networks (e.g., a protein‑interaction network) by preserving the observed degree sequence and measuring how various topological metrics (average path length, transitivity) vary across the uniform ensemble.

In summary, the paper delivers a theoretically sound, practically efficient, and versatile algorithm for exact uniform sampling of simple graphs with arbitrary degree sequences. It eliminates the need for Markov‑chain mixing analysis and avoids the high rejection rates of the Configuration Model, while providing a natural framework for importance weighting. The work is likely to become a standard tool for researchers conducting null‑model analyses, hypothesis testing, and simulation studies in complex network science.


Comments & Academic Discussion

Loading comments...

Leave a Comment