Boltzmann sampling and optimal exact-size sampling for directed acyclic graphs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We propose two efficient algorithms for generating uniform random directed acyclic graphs, including an asymptotically optimal exact-size sampler that performs $\frac{n^2}{2} + o(n^2)$ operations and requests to a random generator. This is achieved by extending the Boltzmann model to graphic generating functions and by using various decompositions of directed acyclic graphs. The presented samplers improve upon state-of-the-art algorithms in theoretical complexity and offer a significant speed-up in practice.


💡 Research Summary

The paper addresses the problem of uniformly generating random directed acyclic graphs (DAGs) and presents two efficient algorithms based on an extension of the Boltzmann sampling framework to graphic generating functions (GGFs). The authors first adapt the classical Boltzmann model, which traditionally works with ordinary or exponential generating functions, to the setting where combinatorial objects are labelled digraphs. In this graphic Boltzmann model, a DAG $G$ is sampled with probability proportional to $z^{v(G)} w^{e(G)}$, multiplied by a normalising factor derived from the GGF of DAGs. The parameters $z$ and $w$ control the expected numbers of vertices and edges, respectively, and when $w=1$ the distribution becomes uniform over all labelled DAGs of a given size.
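At $w=1$ the normalising factor involves the number $a_n$ of labelled DAGs on $n$ vertices, which satisfies Robinson's classical inclusion-exclusion recurrence $a_n = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k(n-k)} a_{n-k}$. A minimal sketch (not the paper's code) computing the first counts:

```python
from math import comb

def dag_counts(n_max):
    """Number of labelled DAGs on 0..n_max vertices via Robinson's recurrence:
    a_n = sum_{k=1}^{n} (-1)^{k+1} C(n,k) 2^{k(n-k)} a_{n-k},  a_0 = 1.
    The inner sum chooses the k sources and the 2^{k(n-k)} possible
    edge sets from sources to the rest, with inclusion-exclusion signs."""
    a = [1]
    for n in range(1, n_max + 1):
        a.append(sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * a[n - k]
                     for k in range(1, n + 1)))
    return a

print(dag_counts(5))  # [1, 1, 3, 25, 543, 29281]
```

These counts (OEIS A003024) are exactly the weights that enter the $w=1$ graphic Boltzmann distribution over sizes.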

Two distinct structural decompositions of DAGs are then exploited to build concrete samplers. The first is the root-layering introduced by Robinson. A root-layering is an ordered partition $(V_1,\dots,V_k)$ of the vertex set such that each $V_i$ consists of the sources of the subgraph obtained after removing all vertices in earlier layers. The authors prove (Theorem 4) that this layering is unique for any DAG and characterise it by two simple constraints: (1) no edge goes from a later layer to an earlier one, and (2) every vertex in layer $i+1$ has at least one incoming edge from layer $i$. Lemmas 5-8 give the exact probability distributions of the layer sizes and of the admissible edge sets conditioned on a given layering. Using these distributions, a Boltzmann sampler can be built by (i) sampling the sequence of layer sizes, (ii) independently deciding for each admissible inter-layer edge whether it appears, and (iii) finally assigning random labels to the vertices.
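The unique root-layering can be computed greedily by repeatedly peeling off the sources, which makes the uniqueness claim of Theorem 4 easy to check on small examples. A minimal sketch (the vertex/edge representation is my own choice, not the paper's):

```python
def root_layering(vertices, edges):
    """Compute the root-layering of a DAG: layer i is the set of sources of
    the subgraph induced by the vertices not placed in earlier layers.
    `edges` is a set of (u, v) pairs, each meaning an edge u -> v."""
    remaining = set(vertices)
    layers = []
    while remaining:
        # sources of the remaining subgraph: no incoming edge from `remaining`
        sources = {v for v in remaining
                   if not any(u in remaining and t == v for (u, t) in edges)}
        if not sources:  # only possible if the input is not acyclic
            raise ValueError("input graph contains a cycle")
        layers.append(sources)
        remaining -= sources
    return layers

# Example DAG with edges 1->3, 2->3, 3->4, 1->4
print(root_layering({1, 2, 3, 4}, {(1, 3), (2, 3), (3, 4), (1, 4)}))
# [{1, 2}, {3}, {4}]
```

Note that constraint (2) holds automatically for the layers this peeling produces: a vertex with no incoming edge from the previous layer would already have been a source one step earlier.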

The second decomposition relies on the arrow product. In the GGF world, the arrow product of two classes $\mathcal{A}$ and $\mathcal{B}$ corresponds to simply multiplying their generating functions: $(\mathcal{A} \rightarrow \mathcal{B})(z,w) = \mathcal{A}(z,w)\,\mathcal{B}(z,w)$. Since the DAG GGF satisfies $\mathrm{DAG}(z,w) = 1/\mathrm{Set}(-z,w)$, the authors rewrite it as an infinite product of "Set" components and obtain a recursive sampling scheme that repeatedly draws independent sub-DAGs and connects them with all possible forward edges. This approach avoids inclusion-exclusion and fits naturally into the standard Boltzmann sampling toolbox.
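At $w=1$ the identity specialises to $\mathrm{DAG}(z)\cdot\mathrm{Set}(-z) = 1$, where the $n$-th GGF coefficient of a class divides its count by $n!\,2^{\binom{n}{2}}$. A small sketch checking this with exact rational arithmetic on truncated series (the truncation order and coefficient convention are my reading of the GGF setup):

```python
from fractions import Fraction
from math import comb, factorial

def ggf_coeff(count, n):
    # n-th graphic-generating-function coefficient: count / (n! * 2^C(n,2))
    return Fraction(count, factorial(n) * 2 ** comb(n, 2))

# labelled DAG counts a_0..a_6 (OEIS A003024)
a = [1, 1, 3, 25, 543, 29281, 3781503]
N = len(a) - 1
dag = [ggf_coeff(a[n], n) for n in range(N + 1)]                # DAG(z) at w = 1
set_neg = [(-1) ** n * ggf_coeff(1, n) for n in range(N + 1)]   # Set(-z) at w = 1

# truncated Cauchy product: DAG(z) * Set(-z) should be 1 + O(z^{N+1})
prod = [sum(dag[k] * set_neg[n - k] for k in range(n + 1)) for n in range(N + 1)]
assert prod[0] == 1 and all(c == 0 for c in prod[1:])
```

Expanding this identity coefficient by coefficient is precisely Robinson's recurrence, which is why the arrow-product view can dispense with explicit inclusion-exclusion during sampling.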

The main contribution of the paper is an asymptotically optimal exact-size sampler for labelled DAGs. The algorithm proceeds as follows: (1) choose parameters $z$ and $w$ so that the expected size of a Boltzmann-sampled DAG matches the target size $n$; (2) generate a Boltzmann-sampled DAG using either the root-layering or the arrow-product method; (3) perform an $O(n)$-time rejection step that discards the sample if its vertex count differs from $n$; (4) if the size matches, make a single pass over the adjacency matrix to finalize the edge set, consuming exactly the remaining random bits. The authors prove that the expected number of elementary operations is $\frac{n^2}{2} + o(n^2)$ and that the average number of random bits used is also $\frac{n^2}{2} + o(n^2)$, which matches the Shannon entropy of the uniform distribution over all labelled DAGs of size $n$. Notably, the algorithm requires no preprocessing; the only overhead beyond the Boltzmann sampling is constant-time parameter tuning and the linear-time rejection phase.
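The tune-then-reject pipeline can be illustrated numerically. The sketch below assumes the $w=1$ size law $\mathbb{P}(N=n) \propto a_n z^n / (n!\,2^{\binom{n}{2}})$ (the GGF weights) and tunes $z$ by bisection so that the expected size hits a small target; the paper's parameter tuning is analytic, and `N_MAX`, the bisection bracket, and the target value are illustrative choices of mine:

```python
from fractions import Fraction
from math import comb, factorial

def dag_counts(n_max):
    # Robinson's recurrence for the number of labelled DAGs on n vertices
    a = [1]
    for n in range(1, n_max + 1):
        a.append(sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * a[n - k]
                     for k in range(1, n + 1)))
    return a

N_MAX = 60  # truncation order for the size distribution (illustrative)
A = dag_counts(N_MAX)
# GGF coefficients d_n = a_n / (n! 2^C(n,2)); exact, then converted to float
D = [float(Fraction(A[n], factorial(n) * 2 ** comb(n, 2))) for n in range(N_MAX + 1)]

def size_distribution(z):
    """Assumed w = 1 graphic Boltzmann size law: P(N=n) proportional to d_n z^n."""
    w = [d * z ** n for n, d in enumerate(D)]
    total = sum(w)
    return [x / total for x in w]

def expected_size(z):
    return sum(n * p for n, p in enumerate(size_distribution(z)))

# Step (1): tune z by bisection so that E[N] matches the target size
target = 8
lo, hi = 0.0, 1.4  # bracket below the singularity of DAG(z)
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if expected_size(mid) < target else (lo, mid)
z = (lo + hi) / 2

# Step (3): a Boltzmann sample is kept exactly when its size equals the target
p = size_distribution(z)
print(f"z = {z:.4f}, E[N] = {expected_size(z):.2f}, "
      f"acceptance probability P(N={target}) = {p[target]:.3f}")
```

The acceptance probability at the tuned $z$ is what makes the rejection phase cheap on average, so only the final adjacency-matrix pass contributes the dominant $\frac{n^2}{2}$ term.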

Experimental evaluation, performed with a C/C++ implementation, shows dramatic speedups over the previous state-of-the-art exact-size sampler (which is based on a recursive method with quadratic preprocessing). For $n=4096$ the older method needs roughly 3 seconds, whereas the new algorithm completes in about 20 milliseconds. Even for very large instances ($n=200{,}000$) the sampler runs in roughly 2 seconds on a consumer laptop, with memory accesses limited to $\frac{n^2}{2}+O(n)$. All source code and benchmark scripts are publicly released on OSF, enabling reproducibility.

In summary, the paper demonstrates that by extending Boltzmann sampling to graphic generating functions and by exploiting two complementary DAG decompositions, one can obtain a uniform random DAG generator that is both theoretically optimal (in time and random‑bit complexity) and practically extremely fast. The work opens the door to applying the same methodology to other families of directed graphs, partial orders, or more constrained network models, and suggests future research directions such as parallel GPU implementations, adaptive parameter selection, and extensions to weighted or coloured DAGs.

