Bayesian Model Averaging Using the k-best Bayesian Network Structures
We study the problem of learning Bayesian network structures from data. We develop an algorithm for finding the k-best Bayesian network structures. We propose to compute the posterior probabilities of hypotheses of interest by Bayesian model averaging over the k-best Bayesian networks. We present empirical results on structural discovery over several real and synthetic data sets and show that the method outperforms the model-selection method and state-of-the-art MCMC methods.
💡 Research Summary
The paper tackles the long‑standing challenge of learning Bayesian network (BN) structures from data while properly accounting for model uncertainty. Traditional approaches either select a single highest‑scoring network (model selection) or approximate Bayesian model averaging (BMA) using Markov chain Monte Carlo (MCMC) sampling over the space of directed acyclic graphs (DAGs). Model selection ignores the posterior distribution over structures, leading to over‑confident predictions, whereas MCMC suffers from slow convergence, high variance, and difficulty scaling to larger networks.
To address these issues, the authors propose a novel “k‑best” framework. They design an exact algorithm that efficiently enumerates the top‑k highest‑scoring BN structures according to a decomposable Bayesian score (e.g., BDeu). The algorithm proceeds in two phases. First, it pre‑computes local scores for every variable–parent‑set combination, exploiting the additive (decomposable) nature of the global score. Second, it builds full networks incrementally in a fixed topological order, maintaining a priority queue (implemented as a min‑heap) that stores the k best partial networks found so far. When a partial network is extended with a new variable and a candidate parent set, the total score of the extension is obtained by adding the corresponding local score; if it exceeds the smallest score in the heap, the heap is updated. After all variables are processed, the heap contains exactly the k best complete DAGs. The overall time complexity is O(k · n · 2^p), where n is the number of variables and p is the maximum allowed number of parents, a dramatic reduction compared with exhaustive enumeration over the super‑exponential space of DAGs.
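The two‑phase procedure can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes the local scores have already been computed for each variable over candidate parent sets drawn only from variables earlier in the fixed order (which guarantees acyclicity), and it prunes to the k best partial networks at each step.

```python
import heapq

def k_best_networks(order, local_scores, k):
    """Enumerate the k highest-scoring networks under a fixed variable order.

    local_scores[v] maps candidate parent sets (tuples of variables that
    precede v in `order`) to their precomputed local score, so the global
    score of a network is the sum of its local scores.
    """
    # Each partial network is (total_score, {variable: chosen_parent_set}).
    partials = [(0.0, {})]
    for var in order:
        extended = []
        for score, parent_map in partials:
            for pset, local in local_scores[var].items():
                extended.append((score + local, {**parent_map, var: pset}))
        # Prune to the k best partial networks before adding the next variable.
        partials = heapq.nlargest(k, extended, key=lambda t: t[0])
    return partials  # best-first: score and parent-set choice per variable
```

For instance, with two variables X and Y, where Y may take X as a parent, the top network is the one whose local scores sum highest; `heapq.nlargest` plays the role of the bounded priority queue described above.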
Having obtained the k‑best structures, the authors perform Bayesian model averaging restricted to this set. They approximate the posterior probability of each structure G_i by normalising its exponentiated score within the k‑set: p(G_i|D) ≈ exp(s_i)/Z_k, where Z_k = Σ_{j=1}^k exp(s_j). For any structural query of interest, such as the presence of a specific edge, the posterior is then estimated as the weighted sum over the k structures, Σ_{i=1}^k p(G_i|D)·I_i, where the indicator I_i is 1 if the query holds in G_i and 0 otherwise. This approach retains the essential benefit of BMA (averaging over plausible models) while keeping computation tractable.
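In code, this weighting step is a softmax over the k log scores. The sketch below is illustrative (the function names are not from the paper); the max-shift is the standard log-sum-exp trick to avoid overflow when exponentiating large log scores:

```python
import math

def posterior_weights(log_scores):
    """Approximate p(G_i|D) = exp(s_i)/Z_k over the k-best set."""
    m = max(log_scores)                       # shift for numerical stability
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)                             # Z_k up to the constant factor exp(m)
    return [e / z for e in exps]

def query_posterior(weights, indicators):
    """Posterior of a structural feature: sum_i p(G_i|D) * I_i,
    where indicators[i] is 1 if the feature (e.g. an edge) holds in G_i."""
    return sum(w * i for w, i in zip(weights, indicators))
```

With two structures whose scores differ by log 3, the better one receives weight 0.75, and an edge present only in that structure gets posterior probability 0.75.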
The experimental evaluation uses five benchmark real‑world networks (Alarm, Asia, Insurance, Hailfinder, Child) and three synthetic networks of varying size and sample count. The authors compare three methods: (1) single‑model selection (the highest‑scoring DAG), (2) state‑of‑the‑art MCMC‑based BMA (Order MCMC, Partition MCMC, Hybrid MCMC), and (3) the proposed k‑best BMA with k ranging from 10 to 30. Performance is measured by structural recovery metrics (edge precision, recall, F1, Hamming distance) and predictive log‑likelihood on held‑out test data.
Results show that k‑best BMA consistently outperforms single‑model selection, achieving an average increase of 5–12% in edge F1 scores, especially when the training data are limited. Predictive log‑likelihood improves by 0.15–0.25 nats on average for k = 20. Compared with MCMC‑based BMA, the k‑best approach reaches comparable or better accuracy while requiring an order of magnitude less runtime and exhibiting far lower variance across runs. The method also scales reasonably well for networks with up to 30 nodes and parent set size p = 4; however, the exponential dependence on p remains a bottleneck for very dense graphs.
The paper’s contributions are threefold: (i) an exact, efficient algorithm for enumerating the top‑k BN structures, (ii) a principled way to perform Bayesian model averaging over this limited yet high‑probability set, and (iii) extensive empirical evidence that the approach surpasses both pure model selection and leading MCMC techniques. Limitations include the need to fix a topological order in advance and sensitivity to the choice of k. Future work may explore dynamic parent‑set pruning, adaptive k selection based on posterior mass, and parallel implementations to handle larger networks.
In summary, the study provides a practical and theoretically sound solution for incorporating model uncertainty into Bayesian network learning, offering a compelling alternative to existing sampling‑based BMA methods and paving the way for more reliable probabilistic modeling in real‑world applications.