Advances in exact Bayesian structure discovery in Bayesian networks
We consider a Bayesian method for learning the Bayesian network structure from complete data. Recently, Koivisto and Sood (2004) presented an algorithm that, for any single edge, computes its marginal posterior probability in O(n 2^n) time, where n is the number of attributes; the number of parents per attribute is bounded by a constant. In this paper we show that the posterior probabilities for all n(n − 1) potential edges can be computed in O(n 2^n) total time. This result is achieved by a forward-backward technique and fast Moebius transform algorithms, which are of independent interest. The resulting speedup by a factor of about n^2 allows us to experimentally study the statistical power of learning moderate-size networks. We report results from a simulation study that covers data sets with 20 to 10,000 records over 5 to 25 discrete attributes.
💡 Research Summary
This paper addresses the long‑standing computational bottleneck in exact Bayesian network (BN) structure learning: the calculation of marginal posterior probabilities for all possible directed edges. Koivisto and Sood (2004) showed that the posterior of a single edge can be obtained in O(n 2ⁿ) time when the maximum indegree is bounded by a constant, but extending this to the full set of n(n‑1) edges would naïvely require O(n³ 2ⁿ) time, making the approach impractical for anything beyond very small networks.
The authors overcome this limitation by combining two algorithmic ideas: a forward‑backward (FB) dynamic‑programming scheme and fast Möbius (zeta) transforms. Rather than fixing a single node ordering, the FB scheme sweeps over subsets of the attribute set: the forward pass accumulates, for each subset, the summed scores over all orderings of its elements, while the backward pass propagates the complementary quantities in reverse, so that intermediate results are reused across many edges simultaneously. The remaining technical hurdle is summing local scores over all admissible parent sets within each candidate predecessor set; a naïve implementation would cost O(3ⁿ). By applying the fast Möbius transform, essentially a fast inclusion‑exclusion convolution over the subset lattice, the authors reduce this subset‑sum step to O(n 2ⁿ).
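The subset‑sum step can be illustrated with the standard fast zeta transform (the "up" direction of the Möbius machinery). This is a minimal sketch, not the authors' code: given a table f indexed by bitmask subsets of {0, …, n−1}, it computes g[S] = Σ_{T ⊆ S} f[T] in O(n 2ⁿ) additions rather than the naïve O(3ⁿ).

```python
def fast_zeta_transform(f, n):
    """In-place subset-sum transform over bitmasks:
    afterwards, f[S] = sum of the original f[T] over all T ⊆ S."""
    for i in range(n):                         # handle one bit position at a time
        for mask in range(1 << n):
            if mask & (1 << i):                # if bit i is set in S,
                f[mask] += f[mask ^ (1 << i)]  # add the entry with bit i cleared
    return f

# Toy usage: start from the indicator of singleton sets, so that after the
# transform g[S] simply counts how many singletons are contained in S.
n = 3
f = [0] * (1 << n)
for i in range(n):
    f[1 << i] = 1
g = fast_zeta_transform(f, n)
```

The inverse (Möbius) transform is obtained by replacing the addition with a subtraction, which is what allows the algorithm to move between a function on parent sets and its cumulative sums over predecessor sets.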
The resulting algorithm computes the marginal posterior probability of every possible directed edge in total O(n 2ⁿ) time and O(2ⁿ) memory, assuming a constant bound on the indegree. This represents an asymptotic speed‑up of roughly n² compared with the naïve extension of the Koivisto‑Sood method.
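The O(n 2ⁿ) subset dynamic program underlying this bound can be sketched as follows. This is a toy illustration with hypothetical, dummy local scores (the function a below is an assumption for demonstration, not the paper's scoring function): alpha[S] sums, over all linear orders of the nodes in S, the product of each node's score given its predecessors, and filling all 2ⁿ entries takes O(n 2ⁿ) score evaluations instead of enumerating n! orders.

```python
n = 4
FULL = (1 << n) - 1

def a(i, U):
    """Hypothetical local score of node i when its parents must come from the
    bitmask U: here, a dummy sum over parent sets of size at most 1."""
    total = 1.0                       # empty parent set
    for j in range(n):
        if U & (1 << j):
            total += 0.5              # dummy contribution of parent set {j}
    return total

# alpha[S] = sum over all orderings of S of the product of scores a(i, predecessors)
alpha = [0.0] * (1 << n)
alpha[0] = 1.0
for S in range(1, 1 << n):
    for i in range(n):
        if S & (1 << i):              # let i be the last node placed in S
            alpha[S] += alpha[S ^ (1 << i)] * a(i, S ^ (1 << i))
```

A backward pass computes the symmetric quantity for the complement sets; multiplying the two around each node is what lets a single sweep serve all n(n − 1) edges at once.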
To validate the theoretical gains, the paper presents an extensive simulation study. Synthetic data sets were generated with discrete variables ranging from 5 to 25 and record counts from 20 to 10,000. The new algorithm consistently outperformed the baseline in runtime—often completing the full edge‑wise posterior computation in seconds for networks with up to 20 variables and in a few minutes for 25 variables—while preserving exactness. Moreover, the authors examined statistical power: edges with high posterior probability were far more likely to correspond to true generating edges, especially as sample size increased. For a 20‑node network with 5,000 samples, the probability of correctly recovering the true structure rose from 0.78 (single‑edge method) to 0.92 using the full‑edge approach.
The discussion acknowledges several limitations. The method relies on (i) a hard indegree bound, (ii) complete data without missing values, and (iii) discrete variables so that local scores can be pre‑computed. Extending to continuous or mixed data would require alternative scoring functions, and handling missing data would necessitate either imputation or an EM‑style integration within the dynamic program. Memory consumption grows as 2ⁿ, so for n > 30 a distributed or GPU‑based implementation would be required.
In conclusion, the paper demonstrates that exact Bayesian structure discovery is feasible for moderate‑size networks when the forward‑backward dynamic programming is paired with fast Möbius transforms. This opens the door to rigorous, fully Bayesian model selection in domains where previously only heuristic or approximate methods were viable. Future work is suggested in three directions: (1) adapting the framework to continuous and hybrid data, (2) incorporating missing‑data mechanisms, and (3) scaling the algorithm via parallelism or cloud‑based resources, potentially enabling exact inference for networks with 30–40 variables.