Cutset Sampling with Likelihood Weighting

The paper analyzes, theoretically and empirically, the performance of likelihood weighting (LW) applied to a subset of the nodes in a Bayesian network. The proposed scheme requires fewer samples to converge because sampling variance is reduced. The method exploits the structure of the network to bound the complexity of the exact inference used to compute the sampling distributions, as in Gibbs cutset sampling. Yet, extending the previously proposed cutset-sampling principles to likelihood weighting is non-trivial due to differences between the sampling processes of the Gibbs sampler and LW. We demonstrate empirically that likelihood weighting on a cutset (LWLC) is time-efficient and has a lower rejection rate than LW when applied to networks with many deterministic probabilities. Finally, we show that the performance of likelihood weighting on a cutset can be improved further by caching the computed sampling distributions and, consequently, learning the ‘zeros’ of the target distribution.


💡 Research Summary

The paper introduces a novel variant of likelihood weighting (LW) for approximate inference in Bayesian networks, called Likelihood Weighting on a Cutset (LWLC). Traditional LW samples every unobserved variable from its conditional distribution given the values already assigned, and weights each sample by the product of the conditional probabilities of the evidence variables. While simple, this approach suffers from high sampling variance and, in networks containing many deterministic (0/1) conditional probability tables (CPTs), it generates a large proportion of zero-weight samples that must be discarded, leading to a high rejection rate and inefficient use of computational resources.
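The rejection problem can be seen on a toy example. The sketch below uses a hypothetical two-node network A → B with a deterministic CPT (not taken from the paper): samples inconsistent with the evidence receive weight zero and contribute nothing to the estimate.

```python
import random

# Hypothetical network A -> B with a deterministic CPT:
# P(A=1) = 0.3; P(B=1 | A=1) = 1.0 and P(B=1 | A=0) = 0.0.
# Evidence: B = 1.  Standard LW samples A from its prior and weights
# the sample by P(B=1 | A); samples with A=0 get weight zero and are
# effectively rejected.

def likelihood_weighting(n_samples, p_a=0.3, seed=0):
    rng = random.Random(seed)
    weighted_sum = 0.0   # accumulates w * a, i.e., weight mass where A=1
    total_weight = 0.0   # accumulates all sample weights
    zero_weight = 0      # number of rejected (zero-weight) samples
    for _ in range(n_samples):
        a = 1 if rng.random() < p_a else 0
        w = 1.0 if a == 1 else 0.0          # w = P(B=1 | A=a)
        total_weight += w
        weighted_sum += w * a
        zero_weight += (w == 0.0)
    estimate = weighted_sum / total_weight if total_weight else float("nan")
    return estimate, zero_weight / n_samples

est, rejection_rate = likelihood_weighting(10_000)
# est recovers P(A=1 | B=1) = 1 exactly, but roughly 70% of the
# samples carry weight zero and are wasted.
```

Here the deterministic CPT makes most samples worthless, which is precisely the regime where LW's rejection rate explodes.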

Cutset sampling, originally proposed for Gibbs samplers, mitigates variance by sampling only a relatively small subset of variables, the cutset; conditioning on a cutset assignment renders the rest of the network tractable, so the remaining variables are handled by exact inference. Extending this idea to LW is non-trivial because LW does not iteratively resample each variable; instead, it requires a sampling distribution for every sampled variable conditioned on the evidence and the values assigned so far. Computing these distributions exactly over the entire network would negate any efficiency gain.

LWLC resolves this tension by (1) selecting a cutset C that is small relative to the whole network but sufficient to break the graph into tractable components, (2) using exact inference (e.g., variable elimination or junction-tree propagation) over the tractable conditioned network to compute the sampling distribution P(Ci | c1, ..., ci−1, e) for each cutset variable Ci in turn, and (3) performing likelihood weighting only on the cutset variables. Each generated sample is still assigned an importance weight that corrects for the mismatch between the sampling distribution and the target, as in standard LW, but because the sampled space is dramatically reduced, the variance of the estimator drops substantially.
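The three steps above can be sketched generically. In this sketch, `exact_conditional` and `target_factor` are hypothetical callables standing in for the bounded exact-inference call and the target distribution's factors (their implementations are network-specific and abstracted away here); the weight update is the standard importance-sampling ratio, not the paper's exact formula.

```python
import random

def lw_on_cutset(cutset, evidence, exact_conditional, target_factor, rng):
    """Draw one importance-weighted sample over the cutset variables.

    exact_conditional(var, partial) -> {value: prob} stands in for a
    bounded exact-inference call (e.g., bucket elimination on the
    conditioned network); target_factor(var, value, partial) returns the
    matching factor of the target distribution.  Both are hypothetical
    callables supplied by the caller.
    """
    assignment = dict(evidence)          # evidence variables stay fixed
    weight = 1.0
    for var in cutset:
        dist = exact_conditional(var, assignment)
        values = list(dist)
        value = rng.choices(values, weights=[dist[v] for v in values])[0]
        assignment[var] = value
        # Standard importance weight: target factor / sampling factor.
        weight *= target_factor(var, value, assignment) / dist[value]
    return assignment, weight

# Demo with a trivial "network": two binary cutset variables whose exact
# conditional is uniform and whose target factors match it exactly, so
# every sample receives weight 1.0.
uniform = lambda var, partial: {0: 0.5, 1: 0.5}
match_target = lambda var, value, partial: 0.5
sample, weight = lw_on_cutset(["C1", "C2"], {"E": 1}, uniform,
                              match_target, random.Random(0))
```

When the sampling distribution equals the target, as in the demo, every weight is 1; in general the weights correct for whatever mismatch remains.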

A key engineering contribution is the introduction of a caching mechanism for the computed sampling distributions. Once the exact distribution for a particular configuration of evidence and cutset variables has been obtained, it is stored and reused whenever the same configuration recurs. This eliminates repeated exact-inference calls, which are the dominant cost in the algorithm. Moreover, the cache can be examined to identify “zero” regions of the target distribution: assignments that have probability zero due to deterministic CPTs. By learning these zeros, LWLC can avoid generating impossible samples altogether, further lowering the rejection rate.
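A minimal sketch of such a caching layer, assuming the same hypothetical `exact_conditional` interface as above (the class name and keying scheme are illustrative, not the paper's):

```python
class CachedConditionals:
    """Memoize exact-inference results per cutset/evidence configuration."""

    def __init__(self, exact_conditional):
        self._compute = exact_conditional
        self._cache = {}     # (var, frozen partial assignment) -> dist
        self.calls = 0       # number of actual exact-inference calls

    def __call__(self, var, partial_assignment):
        key = (var, frozenset(partial_assignment.items()))
        if key not in self._cache:
            self.calls += 1
            dist = self._compute(var, partial_assignment)
            # "Learning zeros": drop zero-probability values so that
            # impossible assignments are never sampled again under
            # this configuration.
            self._cache[key] = {v: p for v, p in dist.items() if p > 0.0}
        return self._cache[key]

# Demo: wrap a conditional containing a deterministic zero.
def slow_conditional(var, partial):
    return {0: 0.7, 1: 0.3, 2: 0.0}   # value 2 is impossible

cached = CachedConditionals(slow_conditional)
d1 = cached("X", {"E": 1})
d2 = cached("X", {"E": 1})   # served from cache; no second inference call
```

The second lookup hits the cache, and the impossible value has been pruned from the stored distribution.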

Theoretical analysis shows that the cost of generating each sample is O(|C|·exp(w)), where |C| is the size of the cutset and w is the induced width of the network once the cutset and evidence variables are instantiated. This matches the asymptotic bound of Gibbs-based cutset sampling while preserving the simplicity of LW's weighting step. The authors also show, by a Rao-Blackwellisation argument, that sampling fewer variables while computing the rest exactly yields a lower-variance estimator, so a well-chosen small cutset produces more accurate estimates from fewer samples.
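The variance claim is an instance of Rao-Blackwellisation. A standard sketch via the law of total variance (in generic notation, not the paper's exact derivation): for any function f of the full state X and any cutset C,

```latex
\operatorname{Var}\big(f(X)\big)
  \;=\; \mathbb{E}\big[\operatorname{Var}\big(f(X)\mid C\big)\big]
  \;+\; \operatorname{Var}\big(\mathbb{E}\big[f(X)\mid C\big]\big)
  \;\ge\; \operatorname{Var}\big(\mathbb{E}\big[f(X)\mid C\big]\big),
```

so an estimator that samples only C and computes the inner expectation exactly has variance no larger than one that samples all variables.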

Empirical evaluation is conducted on several benchmark networks (Alarm, Barley, Win95pts) and on synthetically generated large networks containing up to 2000 nodes. The experiments compare three methods: (i) standard likelihood weighting, (ii) Gibbs cutset sampling, and (iii) the proposed LWLC (both with and without caching). Results demonstrate that, for a fixed computational budget (e.g., 60 seconds), LWLC achieves a mean‑squared error (MSE) reduction of roughly 3–5× relative to standard LW and 1.1–1.4× relative to Gibbs cutset sampling. In networks where deterministic CPTs constitute more than 30 % of the parameters, the rejection rate of standard LW exceeds 70 %, whereas LWLC’s rejection rate falls below 15 %. The caching variant further reduces total runtime by about 25 % because repeated exact inference is avoided.

The paper also discusses limitations and future directions. The performance of LWLC depends heavily on the choice of cutset; the current heuristic based on tree-width works well but may not be optimal for all topologies. Dynamic evidence updates pose a challenge for cache invalidation, suggesting the need for incremental update strategies. Extending LWLC to hybrid networks with continuous variables and integrating it with other variance-reduction techniques, such as stratified sampling, are identified as promising avenues.

In summary, the authors present a compelling synthesis of exact inference and sampling: by restricting exact computation to a carefully chosen cutset and reusing those results through caching, likelihood weighting becomes both faster and more accurate, especially in domains with many deterministic relationships. The method retains the simplicity of LW’s weighting scheme while achieving variance reductions comparable to Gibbs‑based approaches, offering a practical tool for large‑scale probabilistic inference tasks.