On Bayesian Network Approximation by Edge Deletion
We consider the problem of deleting edges from a Bayesian network for the purpose of simplifying models in probabilistic inference. In particular, we propose a new method for deleting network edges, which is based on the evidence at hand. We provide some interesting bounds on the KL-divergence between original and approximate networks, which highlight the impact of given evidence on the quality of approximation and shed some light on good and bad candidates for edge deletion. We finally demonstrate empirically the promise of the proposed edge deletion technique as a basis for approximate inference.
💡 Research Summary
Bayesian networks (BNs) are powerful graphical models for representing complex probabilistic dependencies, but exact inference quickly becomes intractable as the number of nodes and edges grows. A common strategy for scaling BNs is to simplify the network structure, for example by deleting edges, yet most existing edge‑deletion approaches decide which edges to remove based solely on prior statistics or global structural criteria, ignoring the influence of the currently observed evidence.
The paper introduces an evidence‑driven edge‑deletion framework that selects edges for removal according to the specific evidence set available at inference time. The authors first derive a bound on the Kullback‑Leibler (KL) divergence between the original distribution $P$ and the approximate distribution $Q$ after deleting an edge $X \rightarrow Y$. The bound is expressed in terms of the conditional probability table (CPT) of the edge and the posterior distribution of the parent given the evidence. Crucially, the bound shows that when evidence strongly concentrates on a subset of variables, the KL impact of deleting edges incident to those variables diminishes.
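For reference, the quantity being bounded is the KL divergence between the original and approximate distributions; one natural reading, conditioning both on the evidence $\mathbf{e}$, is the standard form below (this is just the definition, not the paper's bound itself):

$$
\mathrm{KL}\bigl(P \,\|\, Q\bigr) \;=\; \sum_{\mathbf{x}} P(\mathbf{x} \mid \mathbf{e}) \,\log \frac{P(\mathbf{x} \mid \mathbf{e})}{Q(\mathbf{x} \mid \mathbf{e})}
$$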
Two theoretical results are presented. The first provides a per‑edge KL‑divergence upper bound that depends on the joint posterior $P(x, y \mid \mathbf{e})$ and the original conditional $P(x \mid y)$. The second shows that the total KL divergence after a sequence of deletions is bounded by the sum of the individual bounds, thereby giving a principled way to rank edges by their expected contribution to approximation error.
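To make the ranking idea concrete, here is a minimal Python sketch (not the paper's algorithm): it assumes the per‑edge KL upper bounds have already been computed and simply selects the $k$ lowest‑impact edges, with the summed bound serving as a guarantee on total approximation error. The helper name `rank_and_bound` and the numeric values are hypothetical.

```python
# Sketch only: assumes per-edge KL upper bounds are already computed.
def rank_and_bound(edge_bounds, k):
    """edge_bounds: dict mapping (parent, child) -> per-edge KL upper bound.
    Returns the k edges with the smallest bounds, plus their summed bound,
    which (per the additivity result summarized above) upper-bounds the
    total KL divergence after deleting all k edges."""
    ranked = sorted(edge_bounds.items(), key=lambda item: item[1])
    chosen = ranked[:k]
    total_bound = sum(bound for _, bound in chosen)
    return [edge for edge, _ in chosen], total_bound

# Example with made-up bound values.
bounds = {("A", "C"): 0.02, ("B", "C"): 0.31, ("C", "D"): 0.005}
edges, total = rank_and_bound(bounds, k=2)
print(edges, total)  # [('C', 'D'), ('A', 'C')] 0.025
```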
Based on these insights, the authors define a “deletion score” for each candidate edge: \