Well-Definedness and Efficient Inference for Probabilistic Logic Programming under the Distribution Semantics

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The distribution semantics is one of the most prominent approaches for the combination of logic programming and probability theory. Many languages follow this semantics, such as Independent Choice Logic, PRISM, pD, Logic Programs with Annotated Disjunctions (LPADs) and ProbLog. When a program contains function symbols, the distribution semantics is well-defined only if the set of explanations for a query is finite and so is each explanation. Well-definedness is usually either explicitly imposed or is achieved by severely limiting the class of allowed programs. In this paper we identify a larger class of programs for which the semantics is well-defined together with an efficient procedure for computing the probability of queries. Since LPADs offer the most general syntax, we present our results for them, but our results are applicable to all languages under the distribution semantics. We present the algorithm “Probabilistic Inference with Tabling and Answer subsumption” (PITA) that computes the probability of queries by transforming a probabilistic program into a normal program and then applying SLG resolution with answer subsumption. PITA has been implemented in XSB and tested on six domains: two with function symbols and four without. The execution times are compared with those of ProbLog, cplint and CVE. PITA was almost always able to solve larger problems in a shorter time, on domains with and without function symbols.

💡 Research Summary

The paper addresses a fundamental limitation of the distribution semantics that underlies many probabilistic logic programming languages such as Independent Choice Logic, PRISM, pD, LPADs and ProbLog. When function symbols are present, the set of possible worlds can become infinite and the probability of each individual world can collapse to zero, so the probability of a query can no longer be obtained by summing over worlds. Existing solutions either impose well-definedness explicitly, by requiring that every query admit a finite covering set of explanations, or severely limit the class of allowed programs, for example by enforcing acyclicity; both are too restrictive for realistic applications.

To overcome this, the authors introduce the notion of bounded‑term‑size programs and queries. Building on Przymusinski’s dynamic stratification and an iterated fix‑point construction, a program is bounded‑term‑size if, at each iteration of the fix‑point, the size of the true atoms does not grow without bound. A query is bounded‑term‑size when the fragment of the program relevant to the query satisfies the same property. Under this condition the authors prove that every query possesses a finite, mutually incompatible covering set of explanations, guaranteeing that the probability measure defined by Poole (2000) is well‑defined and unique.
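To illustrate why a finite, mutually incompatible covering set is enough to pin down a probability, the computation can be written as a sum of products. The notation here is an assumption made for illustration and is not taken from the paper: K is the covering set, κ ranges over its explanations, and p(c) is the probability of an independent choice c. The numbers are hypothetical.

```latex
P(q) \;=\; \sum_{\kappa \in K} \; \prod_{c \in \kappa} p(c)

% Hypothetical example with three independent Boolean choices:
%   p(e_1) = 0.3, \quad p(e_2) = 0.8, \quad p(e_3) = 0.5
% K = \{ \{e_1\}, \; \{\neg e_1, e_2, e_3\} \}, two explanations that cannot hold together.
P(q) \;=\; 0.3 \;+\; (1 - 0.3)\cdot 0.8 \cdot 0.5 \;=\; 0.3 + 0.28 \;=\; 0.58
```

Because the two explanations disagree on e_1, their probabilities can simply be added; without mutual incompatibility the overlap would be counted twice.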

With this theoretical foundation they develop the algorithm PITA (Probabilistic Inference with Tabling and Answer subsumption). The algorithm works in three stages:

1. Program transformation – each LPAD clause is turned into a normal Prolog clause augmented with an extra argument that carries a Binary Decision Diagram (BDD) representing the set of explanations for that subgoal.
2. Tabling – using XSB’s SLG‑resolution, subgoal answers (i.e., their BDDs) are stored and reused, eliminating redundant recomputation especially in recursive programs.
3. Answer subsumption – when the same subgoal is derived via different derivation paths, their BDDs are combined with a logical OR operation, yielding a compact representation of the union of explanations.
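The three stages can be imitated on a toy problem. The sketch below is not PITA's implementation: it is a Python stand-in under several assumptions. A bottom-up fixpoint replaces SLG resolution, a minimized DNF (a set of sets of edges) replaces the BDD, and the graph `EDGES` and all function names are hypothetical. It mirrors the two clauses `path(X,Y) :- edge(X,Y)` and `path(X,Z) :- edge(X,Y), path(Y,Z)`, where each edge is a probabilistic fact.

```python
# Hypothetical probabilistic graph; note the cycle a -> c -> a.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")]


def minimize(dnf):
    """Drop every explanation that strictly contains another one."""
    return frozenset(e for e in dnf if not any(o < e for o in dnf))


def and_(f, g):
    """Conjunction of two formulas (used when a clause body has two goals)."""
    return minimize(frozenset(x | y for x in f for y in g))


def or_(f, g):
    """Disjunction of two formulas (the answer-subsumption step)."""
    return minimize(f | g)


def path_explanations(edges):
    # The table keeps ONE formula per subgoal path(X, Y). The formula plays
    # the role of the extra argument added by the program transformation.
    table = {}
    changed = True
    while changed:
        changed = False
        for (x, y) in edges:
            edge_f = frozenset({frozenset({(x, y)})})
            # path(X, Y) :- edge(X, Y).
            candidates = [((x, y), edge_f)]
            # path(X, Z) :- edge(X, Y), path(Y, Z).
            for (start, z), f in list(table.items()):
                if start == y:
                    candidates.append(((x, z), and_(edge_f, f)))
            for key, f in candidates:
                old = table.get(key, frozenset())
                new = or_(old, f)  # merge with the stored answer
                if new != old:
                    table[key] = new
                    changed = True
    return table


table = path_explanations(EDGES)
print(sorted(map(sorted, table[("a", "c")])))
# [[('a', 'b'), ('b', 'c')], [('a', 'c')]]
```

The loop terminates despite the cycle because each stored formula can only grow as a Boolean function over a finite set of edges, and explanations that merely go around the cycle are absorbed by shorter ones. Storing a single merged formula per subgoal, rather than one answer per derivation, is the design choice this sketch is meant to show.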

The combination of tabling and answer subsumption ensures that the BDDs built by PITA exactly correspond to the finite covering sets guaranteed by the bounded‑term‑size property, and that the final probability is obtained by a single BDD evaluation. The authors prove correctness: for any bounded‑term‑size query, PITA terminates, produces a BDD that encodes all explanations, and the computed probability equals the distribution‑semantic probability.
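How a probability is read off a decision diagram in a single pass can be sketched with Shannon expansion. The code below is an illustration under assumptions, not the paper's procedure: a DNF over positive Boolean variables stands in for the BDD, the variables are assumed independent, and the names `probability` and `walk` as well as the example numbers are hypothetical. LPAD clauses with more than two heads would first need an encoding into Boolean variables, which is not shown.

```python
from functools import lru_cache


def probability(dnf, prob):
    """Probability that at least one explanation in `dnf` holds.

    dnf  : frozenset of frozensets of variable names (positive literals only)
    prob : dict mapping each variable name to its probability of being true
    """
    order = sorted({v for e in dnf for v in e})  # fixed variable order, as in a BDD

    @lru_cache(maxsize=None)  # shared sub-formulas are evaluated once
    def walk(f, i):
        if frozenset() in f:  # an explanation is fully satisfied: terminal "true"
            return 1.0
        if not f:             # no explanation can still hold: terminal "false"
            return 0.0
        v = order[i]
        high = frozenset(e - {v} for e in f)          # branch where v is true
        low = frozenset(e for e in f if v not in e)   # branch where v is false
        p = prob[v]
        return p * walk(high, i + 1) + (1 - p) * walk(low, i + 1)

    return walk(dnf, 0)


# Two overlapping explanations: {ac} and {ab, bc}.
dnf = frozenset({frozenset({"ac"}), frozenset({"ab", "bc"})})
prob = {"ab": 0.8, "bc": 0.5, "ac": 0.3}
print(round(probability(dnf, prob), 6))  # 0.58
```

Splitting on one variable at a time makes the two branches mutually exclusive, so their weighted results can be added even though the original explanations overlap. The result agrees with the direct calculation 1 - (1 - 0.3)(1 - 0.8 · 0.5) = 0.58.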

Empirical evaluation was performed on six benchmark domains: two with function symbols (e.g., a family‑relationship model and a natural‑language parsing task) and four function‑free domains (electronic circuit diagnosis, biological networks, robot path planning, and social‑network diffusion). PITA was compared against the state‑of‑the‑art systems ProbLog, cplint, and CVE. Results show that PITA was almost always able to solve larger instances in less time, particularly on the function‑symbol domains where the other systems either run out of memory or time out. This demonstrates that the bounded‑term‑size condition is not only theoretically sound but also practically applicable to realistic programs.

In the related‑work discussion the paper highlights that earlier BDD‑based ProbLog implementations could not handle function symbols because the underlying semantics was undefined, and that prior dynamic‑stratification work focused on normal (non‑probabilistic) programs. By integrating bounded‑term‑size semantics with tabling and answer subsumption, PITA offers a novel, scalable solution for probabilistic logic programming with function symbols.

The authors conclude by suggesting future research directions: automatic static analysis tools to detect bounded‑term‑size programs, extensions to richer function‑symbol languages (e.g., with infinite Herbrand bases), and integration with other probabilistic graphical models such as Bayesian networks or Markov Logic Networks. Overall, the paper makes a significant contribution by expanding the theoretical foundations of probabilistic logic programming and delivering an efficient implementation that bridges the gap between theory and practice for programs containing function symbols.
