Formula-Based Probabilistic Inference

Computing the probability of a formula given the probabilities or weights associated with other formulas is a natural extension of logical inference to the probabilistic setting. Surprisingly, this problem has received little attention in the literature to date, particularly considering that it includes many standard inference problems as special cases. In this paper, we propose two algorithms for this problem: formula decomposition and conditioning, which is an exact method, and formula importance sampling, which is an approximate method. The latter is, to our knowledge, the first application of model counting to approximate probabilistic inference. Unlike conventional variable-based algorithms, our algorithms work in the dual realm of logical formulas. Theoretically, we show that our algorithms can greatly improve efficiency by exploiting the structural information in the formulas. Empirically, we show that they are indeed quite powerful, often achieving substantial performance gains over state-of-the-art schemes.


💡 Research Summary

The paper introduces the problem of formula‑based probabilistic inference, where one seeks the probability of a logical formula given probabilities or weights assigned to other formulas. This formulation generalizes weighted model counting and subsumes many standard inference tasks, such as Bayesian network queries, Markov logic inference, and probabilistic database evaluation.
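To make the weighted‑model‑counting connection concrete, here is a toy sketch (not code from the paper): when each variable carries an independent weight, the probability of a query formula equals its weighted model count, i.e. the sum over satisfying assignments of the product of per‑variable weights. The function name and representation are illustrative assumptions.

```python
from itertools import product

# Toy illustration (not the paper's implementation): the probability of a
# query formula under independent per-variable weights is its weighted
# model count -- the sum, over satisfying assignments, of the product of
# the chosen per-variable weights.
def weighted_prob(variables, weights, query):
    """weights[v] = P(v = True); query maps an assignment dict to bool."""
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if query(assignment):
            w = 1.0
            for v in variables:
                w *= weights[v] if assignment[v] else 1.0 - weights[v]
            total += w
    return total
```

For example, `weighted_prob(["a", "b"], {"a": 0.5, "b": 0.5}, lambda m: m["a"] or m["b"])` evaluates P(a ∨ b) = 0.75. This brute‑force enumeration is exponential, which is precisely the cost the paper's structure‑exploiting algorithms aim to avoid.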
Two novel algorithms are proposed. The first, Formula Decomposition and Conditioning (FDC), is an exact method that operates at the level of logical formulas rather than individual variables. FDC recursively splits a target formula into sub‑formulas using structural cues (shared sub‑expressions, clause independence, etc.) and conditions on the truth of these sub‑formulas. By invoking modern SAT/SMT solvers to prune unsatisfiable branches early, the algorithm avoids redundant counting and dramatically reduces the depth of the search tree. Theoretical analysis shows that while the worst‑case complexity matches that of traditional variable‑based dynamic programming, the average‑case performance benefits from exploiting formula‑level structure, especially when many sub‑formulas are reused.
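The conditioning recursion can be sketched as follows. This is a hypothetical simplification: the paper's FDC conditions on whole sub‑formulas, whereas for brevity this sketch conditions on single variables, represents formulas in CNF (a list of clauses, each a frozenset of `(variable, polarity)` literals), and prunes via a trivial empty‑clause check rather than a SAT solver.

```python
# Hypothetical sketch of the decompose-and-condition recursion:
# P(F) = P(x) * P(F | x) + P(not x) * P(F | not x), pruning any branch
# whose residual formula is trivially unsatisfiable.

def condition(clauses, var, value):
    """Simplify the CNF formula given var = value; None signals a contradiction."""
    out = []
    for clause in clauses:
        if (var, value) in clause:            # clause already satisfied
            continue
        reduced = clause - {(var, not value)}
        if not reduced:                       # empty clause: unsatisfiable
            return None
        out.append(reduced)
    return out

def prob(clauses, weights):
    """P(formula) under independent variable weights, with branch pruning."""
    if clauses is None:
        return 0.0                            # pruned unsatisfiable branch
    if not clauses:
        return 1.0                            # no constraints remain
    var = next(iter(next(iter(clauses))))[0]  # branch on a variable in F
    return (weights[var] * prob(condition(clauses, var, True), weights)
            + (1 - weights[var]) * prob(condition(clauses, var, False), weights))
```

For instance, `prob([frozenset({("a", True), ("b", True)})], {"a": 0.5, "b": 0.5})` returns 0.75. The pruning step is where the paper's use of SAT/SMT solvers pays off: detecting unsatisfiability early cuts entire subtrees of the search.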
The second contribution, Formula Importance Sampling (FIS), is the first application of model counting to approximate probabilistic inference. FIS constructs an importance distribution directly from the weights of sub‑formulas, ensuring that models with higher weight are sampled more frequently. The estimator is unbiased, and the authors prove that its variance decreases as O(1/N) in the number of samples N. To further reduce variance, FIS adopts a divide‑and‑conquer strategy: the original formula is partitioned into independent components, each sampled separately, and the results are combined multiplicatively. This yields a substantial variance reduction compared with naïve importance sampling over the full model space.
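The core estimator can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the paper derives its proposal distribution from sub‑formula weights and partitions the formula into independent components, whereas here `proposal` is simply a caller‑supplied per‑variable bias and the divide‑and‑conquer refinement is omitted.

```python
import random

def satisfies(clauses, assignment):
    """True if every clause contains at least one satisfied literal."""
    return all(any(assignment[v] == val for v, val in clause)
               for clause in clauses)

# Hypothetical sketch of the importance-sampling estimator: draw complete
# assignments from a proposal distribution q and weight each satisfying
# sample by p/q, which makes the estimate of P(F) unbiased.
def importance_estimate(clauses, weights, proposal, n_samples, seed=0):
    rng = random.Random(seed)
    total = 0.0
    variables = list(weights)
    for _ in range(n_samples):
        assignment, q = {}, 1.0
        for v in variables:
            val = rng.random() < proposal[v]
            assignment[v] = val
            q *= proposal[v] if val else 1.0 - proposal[v]
        if satisfies(clauses, assignment):
            p = 1.0                           # weight of this model under p
            for v in variables:
                p *= weights[v] if assignment[v] else 1.0 - weights[v]
            total += p / q                    # unbiased importance weight
    return total / n_samples
```

A proposal biased toward satisfying models wastes fewer samples on the `satisfies` check failing; the multiplicative combination over independent components described above then shrinks the variance further, since each component is estimated over a much smaller model space.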
Empirical evaluation covers a diverse set of benchmarks, including Bayesian networks, Markov logic networks, and complex logical queries over probabilistic databases. Using state‑of‑the‑art SAT solvers and weighted model counters as back‑ends, the authors compare FDC and FIS against leading variable‑based approaches such as Variable Elimination, Belief Propagation, and existing weighted model counting techniques. Across most instances, the proposed methods achieve speed‑ups of 2×–10× and markedly lower memory consumption, with the greatest gains observed on problems featuring extensive sub‑formula sharing.
In summary, the work demonstrates that operating in the dual realm of logical formulas—instead of variables—allows inference algorithms to harness structural information that is invisible to traditional methods. The exact FDC algorithm provides a powerful alternative for problems where exact answers are required, while the approximate FIS algorithm offers a scalable, unbiased estimator with provable variance properties. The paper opens several promising directions, including more sophisticated importance distributions, integration with knowledge‑compilation techniques, and extensions to online or streaming inference scenarios.