Uniform random generation of large acyclic digraphs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but the reconstruction of the structure itself is of great interest. As well as for the assessment of different structure learning algorithms in simulation studies, a uniform sample from the space of directed acyclic graphs is required to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of recursive enumeration based sampling we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives while still being easy to adapt to various restrictions. Finally we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs.


💡 Research Summary

The paper addresses the fundamental problem of generating directed acyclic graphs (DAGs) uniformly at random, a task that underlies many applications such as Bayesian network structure learning and the reverse engineering of gene‑regulatory networks. Existing approaches rely almost exclusively on Markov chain Monte Carlo (MCMC) methods that perform edge‑addition, deletion, or reversal moves. While theoretically capable of reaching the uniform distribution, these chains suffer from extremely long mixing times, especially as the number of vertices grows, making it difficult to guarantee convergence in practical simulation studies.
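For reference, the kind of edge-move chain described above can be sketched as follows. This is a minimal illustration, assuming the simplest symmetric variant: pick an ordered pair of vertices uniformly, delete the edge if it is present, and otherwise add it unless that would close a cycle. Reversal moves are omitted, and the names `edge_flip_chain` and `is_ancestor` are illustrative rather than taken from the paper.

```python
import random


def is_ancestor(parents, anc, node):
    """Return True if `anc` can reach `node` along directed edges (or anc == node)."""
    stack, seen = [node], set()
    while stack:
        x = stack.pop()
        if x == anc:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])  # walk upwards through the parents
    return False


def edge_flip_chain(n, steps, rng=random):
    """Run `steps` moves of a symmetric edge-toggle chain on DAGs with n >= 2 nodes.

    Returns the final state as a set of edges (u, v), meaning u -> v.
    """
    parents = {v: set() for v in range(n)}  # start from the empty graph
    for _ in range(steps):
        u, v = rng.sample(range(n), 2)  # uniformly chosen ordered pair, u != v
        if u in parents[v]:
            parents[v].remove(u)  # deleting an edge never creates a cycle
        elif not is_ancestor(parents, v, u):
            parents[v].add(u)  # adding u -> v is safe: no path v ~> u exists
        # otherwise the move is rejected and the chain stays where it is
    return {(u, v) for v in parents for u in parents[v]}


dag = edge_flip_chain(6, 2000, random.Random(0))
```

Because each move and its inverse are proposed with the same probability, the uniform distribution over DAGs is stationary for this chain. How many steps are needed before the state is close to uniform is exactly the convergence question raised above, and this sketch does not answer it.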

The authors revisit an older combinatorial technique, the recursive enumeration of DAGs, and show that, contrary to common belief, it can be turned into an efficient exact sampler. The starting point is a dynamic‑programming recurrence that counts the DAGs on n labelled vertices: removing the vertices that have no incoming edges leaves a smaller DAG, so the count for n vertices can be assembled from the counts for fewer vertices. Pre‑computing these counts in a table (or using logarithmic approximations for very large n) yields the exact total Nₙ of DAGs.
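A standard way to carry out such a count is Robinson's inclusion-exclusion recurrence, Nₙ = Σₖ (−1)^(k+1) · C(n, k) · 2^(k(n−k)) · Nₙ₋ₖ with N₀ = 1. The sketch below uses that recurrence as an assumption about the counting step; it may differ in detail from the recurrence summarised above, and the function name `count_dags` is illustrative.

```python
from math import comb


def count_dags(n_max):
    """Return [N_0, ..., N_n_max], the numbers of labelled DAGs on 0..n_max vertices."""
    counts = [1]  # N_0 = 1: the empty graph on zero vertices
    for n in range(1, n_max + 1):
        total = 0
        for k in range(1, n + 1):
            # Choose k vertices with no incoming edges; each may point at any
            # of the other n - k vertices. The alternating sign corrects for
            # over-counting (inclusion-exclusion).
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(n, k) * 2 ** (k * (n - k)) * counts[n - k]
        counts.append(total)
    return counts


print(count_dags(5))  # [1, 1, 3, 25, 543, 29281]
```

Python integers are arbitrary precision, so the counts stay exact; their bit length grows roughly quadratically in n, which is the main cost of tabulating them.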

Sampling proceeds by a “reverse‑selection” process. Starting from an empty graph, the algorithm repeatedly enumerates all feasible edge‑addition candidates, computes for each candidate the number of DAGs that would remain if that edge were fixed, and selects a candidate with probability proportional to that count. After each selection the remaining count is updated, and the process continues until a complete DAG is formed. Because the selection probabilities are derived from exact counts, the final graph is guaranteed to be uniformly distributed over the entire DAG space. The per‑step cost is O(n) for evaluating candidates, leading to an overall time complexity of O(n·m) where m is the number of edges, and a memory footprint of O(n²). Empirical results show that for n up to 1,000 the method produces a uniform sample in a few seconds, whereas state‑of‑the‑art MCMC samplers require hours to achieve comparable convergence diagnostics.
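One way to realise count-proportional selection can be sketched as follows. Note that this sketch is an assumption about how to organise the sampling, not a transcription of the procedure described above: instead of fixing edges one at a time, it samples how many vertices have no incoming edges, then how many vertices become parentless once those are removed, and so on, and only afterwards fills in labels and edges. The helper names `a`, `layer_weight`, `pick`, and `sample_dag` are illustrative.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def a(n, k):
    """Number of labelled DAGs on n nodes in which exactly k nodes have no parents."""
    if k == n:
        return 1  # only the empty graph has every node parentless
    m = n - k
    return comb(n, k) * sum(layer_weight(k, m, s) for s in range(1, m + 1))


def layer_weight(k, m, s):
    """Ways to continue below a layer of size k when s of the m remaining nodes come next."""
    # Each of the s next-layer nodes needs at least one parent among the k;
    # each of the other m - s nodes may take any subset of the k as parents.
    return (2**k - 1) ** s * 2 ** (k * (m - s)) * a(m, s)


def pick(weights, rng):
    """Return index i with probability weights[i] / sum(weights), in exact integers."""
    r = rng.randrange(sum(weights))
    for i, w in enumerate(weights):
        if r < w:
            return i
        r -= w


def sample_dag(n, rng=random):
    """Return a uniformly random labelled DAG on nodes 0..n-1 (n >= 1) as edges (u, v)."""
    # 1. Sample the layer sizes with probability proportional to the exact counts.
    sizes = [1 + pick([a(n, j) for j in range(1, n + 1)], rng)]
    m = n - sizes[0]
    while m > 0:
        s = 1 + pick([layer_weight(sizes[-1], m, j) for j in range(1, m + 1)], rng)
        sizes.append(s)
        m -= s
    # 2. Distribute the labels over the layers uniformly at random.
    labels = list(range(n))
    rng.shuffle(labels)
    layers, start = [], 0
    for size in sizes:
        layers.append(labels[start:start + size])
        start += size
    # 3. Draw the edges: a non-empty parent set in the previous layer,
    #    plus an arbitrary parent set among all older layers.
    edges = set()
    for i in range(1, len(layers)):
        prev = layers[i - 1]
        older = [u for layer in layers[:i - 1] for u in layer]
        for v in layers[i]:
            mask = rng.randrange(1, 2 ** len(prev))  # non-empty subset of prev
            edges.update((u, v) for j, u in enumerate(prev) if mask >> j & 1)
            edges.update((u, v) for u in older if rng.getrandbits(1))
    return edges


print(sorted(sample_dag(5, random.Random(0))))
```

All weights are exact integers, so no floating-point rounding enters the selection probabilities. The sketch makes no attempt to match the time and memory bounds quoted above.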

Recognizing that many practical problems impose additional structural constraints (e.g., a bound on the indegree of each node, forbidden sub‑structures, or mandatory edges), the authors extend the enumeration framework to count constrained DAGs. By augmenting the recurrence with indicator functions that enforce the constraints, they obtain exact counts for the restricted family and can therefore sample uniformly from it using the same reverse‑selection scheme. However, the combinatorial explosion becomes more severe when constraints are complex, and the pre‑computation cost may become prohibitive.

To mitigate this, the paper introduces a novel hybrid Markov chain. The chain is initialized with a graph drawn from the constrained enumeration sampler, guaranteeing that the starting state already satisfies the restrictions. Transition proposals are standard edge‑addition, deletion, or reversal moves that respect the constraints; their proposal probabilities are calibrated using the pre‑computed enumeration counts, and a Metropolis–Hastings acceptance step ensures detailed balance with respect to the uniform distribution over the constrained set. Because the initial state is already “typical” and the proposal distribution is informed by exact combinatorial information, the hybrid chain mixes dramatically faster than naïve MCMC (empirically a 5–10× speed‑up) while retaining the flexibility to handle arbitrary constraints.

The authors also analyze the asymptotic behavior of the DAG count. The number of labelled DAGs is known to satisfy Nₙ ~ n!·2^(n(n−1)/2)/(M·qⁿ) for constants M ≈ 0.574 and q ≈ 1.488, so, after applying Stirling’s approximation to n!, log₂ Nₙ grows as ½n² + O(n log n). This insight allows them to replace exact counts with logarithmic approximations for extremely large n (thousands to tens of thousands of vertices) without materially affecting the uniformity of the sample. Simulations confirm that structural statistics (average indegree, depth distribution, etc.) of graphs generated with the approximation match the theoretical expectations.

In summary, the paper makes three major contributions: (1) it demonstrates that recursive enumeration can be turned into a practical exact uniform sampler for DAGs, outperforming traditional MCMC in both speed and reliability; (2) it shows how to incorporate a wide range of structural constraints into the enumeration and proposes a hybrid Markov chain that leverages these counts for rapid convergence; and (3) it provides asymptotic analysis that enables near‑exact sampling of very large DAGs. These advances have immediate implications for benchmarking structure‑learning algorithms, performing null‑model analyses in systems biology, and any domain where unbiased random DAGs are required.

