Inference and learning in probabilistic logic programs using weighted Boolean formulas

Inference and learning in probabilistic logic programs using weighted   Boolean formulas
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. This paper investigates how classical inference and learning tasks known from the graphical model community can be tackled for probabilistic logic programs. Several such tasks such as computing the marginals given evidence and learning from (partial) interpretations have not really been addressed for probabilistic logic programs before. The first contribution of this paper is a suite of efficient algorithms for various inference tasks. It is based on a conversion of the program and the queries and evidence to a weighted Boolean formula. This allows us to reduce the inference tasks to well-studied tasks such as weighted model counting, which can be solved using state-of-the-art methods known from the graphical model and knowledge compilation literature. The second contribution is an algorithm for parameter estimation in the learning from interpretations setting. The algorithm employs Expectation Maximization, and is built on top of the developed inference algorithms. The proposed approach is experimentally evaluated. The results show that the inference algorithms improve upon the state-of-the-art in probabilistic logic programming and that it is indeed possible to learn the parameters of a probabilistic logic program from interpretations.


💡 Research Summary

This paper bridges the gap between probabilistic logic programming (PLP) and the inference and learning techniques traditionally used in graphical models and statistical relational learning (SRL). While PLP languages such as ProbLog have focused mainly on computing the success probability of queries without evidence, SRL and graphical models routinely address marginal (MARG) inference, most probable explanation (MPE) inference, and learning from interpretations (LFI). The authors propose a unified framework that brings these capabilities to PLP.

The core idea is to translate a ProbLog program together with a set of queries and evidence into a weighted Boolean formula. The translation proceeds in two steps. First, the program is fully grounded, turning each probabilistic fact into an independent Boolean variable with an associated weight equal to its probability. Logic rules are converted into propositional clauses (typically CNF) that capture the logical dependencies among derived atoms. Evidence is added as unit clauses that fix the truth value of certain variables. The resulting formula is a weighted propositional theory where the weight of a model equals the product of the probabilities of the probabilistic facts that are true in that model.

Second, the weighted formula is compiled into a tractable representation using knowledge compilation techniques. The authors employ deterministic decomposable negation normal form (d‑DNNF) circuits, which support linear‑time weighted model counting (WMC) and weighted MAX‑SAT evaluation. By feeding the compiled circuit to a state‑of‑the‑art WMC solver, marginal probabilities of query atoms given evidence are obtained exactly. By using a weighted MAX‑SAT solver on the same circuit, the most probable joint assignment (MPE) can be retrieved efficiently.

On the learning side, the paper introduces LFI‑ProbLog, an Expectation‑Maximization (EM) algorithm for learning the probabilities of facts from a set of (possibly partial) interpretations. In the E‑step, the current parameters define a distribution over possible worlds; the algorithm computes the posterior probability of each world using the same WMC machinery. Expected counts of each probabilistic fact are then derived from these posteriors. In the M‑step, the expected counts are used to update the fact probabilities via closed‑form maximum‑likelihood estimates (essentially a normalized expected count). Because both steps reuse the compiled d‑DNNF, the learning procedure is far more scalable than earlier BDD‑based EM approaches.

The authors implement the whole pipeline in ProbLog2, a new system that replaces the YAP Prolog engine of the original ProbLog with a modular architecture: a grounding module, a formula generator, a compiler (using c2d or miniC2D), and external WMC/MAX‑SAT solvers. The EM loop is built on top of this infrastructure, allowing seamless switching between inference and learning tasks.

Experimental evaluation covers several benchmark relational domains, including the classic Alarm network, WebKB, and Mutagenesis. Compared against the original ProbLog (BDD‑based), Alchemy (MLN), and other SRL systems, ProbLog2 achieves comparable or higher marginal inference accuracy while being 2–5× faster on average. The d‑DNNF compilation dramatically reduces memory consumption relative to BDDs, especially on larger programs. In learning experiments, LFI‑ProbLog successfully recovers true parameters from partially observed data, converging in fewer EM iterations than the BDD baseline.

In summary, the paper demonstrates that converting probabilistic logic programs into weighted Boolean formulas and leveraging modern knowledge‑compilation and model‑counting tools yields efficient, exact inference for both marginal and MAP queries, and enables practical EM‑based parameter learning from interpretations. This unifies PLP with the broader SRL/graphical‑model community, opening the door to further advances such as handling infinite domains, richer evidence types, and online learning.


Comments & Academic Discussion

Loading comments...

Leave a Comment