DNF Sampling for ProbLog Inference


Inference in probabilistic logic languages such as ProbLog, an extension of Prolog with probabilistic facts, is often based on a reduction to a propositional formula in DNF. Calculating the probability of such a formula involves the disjoint-sum problem, which is computationally hard. In this work we introduce a new approximation method for ProbLog inference which exploits the DNF to focus sampling. While this DNF sampling technique has been applied to a variety of tasks before, to the best of our knowledge it has not been used for inference in probabilistic logic systems. The paper also presents an experimental comparison with another sampling-based inference method previously introduced for ProbLog.


💡 Research Summary

The paper addresses the fundamental inference problem in ProbLog, a probabilistic extension of Prolog where facts are annotated with probabilities. Traditional ProbLog inference reduces a query to a propositional formula in disjunctive normal form (DNF) and then computes the probability of that DNF. This computation requires solving the disjoint‑sum problem: the probabilities of overlapping proofs must be combined without double‑counting, a task known to be #P‑hard. Existing exact approaches rely on (reduced ordered) binary decision diagrams (BDDs), which quickly become memory‑intensive for large programs. Approximate alternatives typically employ plain Monte‑Carlo sampling over the space of possible worlds; however, when the DNF contains many clauses or the probability mass is concentrated in a few rare proofs, naïve sampling becomes inefficient and yields high‑variance estimates.
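To make the disjoint‑sum problem concrete, the toy sketch below (our own hypothetical example, not from the paper) computes the exact probability of a tiny DNF by enumerating all possible worlds, and contrasts it with the naïve sum of clause probabilities, which double‑counts the overlap:

```python
import itertools
import math

# Hypothetical toy DNF over three independent probabilistic facts.
# Both clauses require 'a', so summing clause probabilities counts the
# overlapping worlds twice -- the disjoint-sum problem in miniature.
probs = {"a": 0.4, "b": 0.3, "c": 0.5}
dnf = [{"a", "b"}, {"a", "c"}]  # (a AND b) OR (a AND c)

def world_prob(world):
    """Probability of one truth assignment under fact independence."""
    p = 1.0
    for fact, pr in probs.items():
        p *= pr if world[fact] else 1.0 - pr
    return p

def dnf_prob(dnf):
    """Exact P(DNF) by enumerating all 2^n possible worlds."""
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(probs)):
        world = dict(zip(probs, bits))
        if any(all(world[f] for f in clause) for clause in dnf):
            total += world_prob(world)
    return total

naive = sum(math.prod(probs[f] for f in c) for c in dnf)  # 0.12 + 0.20 = 0.32
exact = dnf_prob(dnf)  # 0.26: the overlap P(a,b,c) = 0.06 is counted only once
```

Enumeration is exponential in the number of facts, which is exactly why both BDD compilation and sampling are needed for realistic programs.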

The authors propose a novel approximation technique called DNF sampling that directly exploits the structure of the DNF rather than sampling the full possible‑world space. The method proceeds in two stages. First, each clause of the DNF is assigned a weight equal to the product of the probabilities of its literals (the probability that the clause is true under independent fact assumptions). These clause weights define a categorical distribution over the clauses. A clause is sampled according to this distribution, which biases the process toward clauses that contribute most to the overall probability. Second, the literals inside the selected clause are instantiated by independently sampling each underlying probabilistic fact according to its original probability. This yields a concrete possible world that satisfies the chosen clause.

Because the sampling process is biased toward high‑weight clauses, an importance‑weight correction is necessary to obtain an unbiased estimator of the DNF’s total probability. The correction factor is the ratio of the true clause probability to the sampling probability of that clause, multiplied by the product of the sampled literal probabilities. The authors prove that the resulting estimator is unbiased and derive a Chernoff‑type bound that links the number of samples N to the desired confidence‑interval width ε. Consequently, practitioners can pre‑compute the sample size required for a target error bound.
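One concrete way to realize such a corrected estimator is the classical Karp–Luby coverage estimator, sketched below on a toy DNF (our formulation; the paper's exact correction factor may differ). For (a ∧ b) ∨ (a ∧ c) with P(a)=0.4, P(b)=0.3, P(c)=0.5, inclusion–exclusion gives the exact answer 0.12 + 0.20 − 0.06 = 0.26:

```python
import math
import random

probs = {"a": 0.4, "b": 0.3, "c": 0.5}
dnf = [{"a", "b"}, {"a", "c"}]
weights = [math.prod(probs[f] for f in c) for c in dnf]
W = sum(weights)

def estimate(n_samples, rng):
    acc = 0.0
    for _ in range(n_samples):
        # Stage 1: pick a clause proportionally to its weight.
        r = rng.random() * W
        for clause, w in zip(dnf, weights):
            r -= w
            if r <= 0.0:
                break
        # Stage 2: a world satisfying that clause.
        world = {f: f in clause or rng.random() < probs[f] for f in probs}
        # Correction: dividing by the number of clauses the world
        # satisfies removes the double-counting bias, so the running
        # average is an unbiased estimate of P(DNF).
        cover = sum(all(world[f] for f in c) for c in dnf)
        acc += W / cover
    return acc / n_samples

# One standard route to a pre-computable sample size: each per-sample
# estimate lies in [0, W], so a Hoeffding bound gives
#   N >= W**2 * math.log(2 / delta) / (2 * eps**2)
# samples for an eps-accurate answer with confidence 1 - delta.
est = estimate(100_000, random.Random(1))  # close to the exact 0.26
```

The per‑sample value is bounded between W/m and W (for m clauses), which is what makes Chernoff/Hoeffding‑style sample‑size bounds applicable.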

The experimental evaluation compares three systems: (1) the exact ProbLog engine based on BDDs, (2) the previously introduced MC‑ProbLog, which performs naïve Monte‑Carlo sampling over possible worlds, and (3) the new DNF‑sampling approach. Benchmarks span three domains: (a) graph‑based path queries (e.g., the probability that a path exists between two nodes), (b) genetic network inference (the probability of a particular gene expression pattern), and (c) social‑network diffusion models (the probability that a rumor reaches a target set). In all cases the DNF contains thousands to tens of thousands of clauses, with clause lengths ranging from 10 to 100 literals.

Results show that DNF sampling achieves comparable accuracy to the exact BDD engine: 95 % confidence intervals consistently contain the exact probability, and the empirical error matches the theoretical bound. Compared with MC‑ProbLog, DNF sampling dramatically reduces variance, especially when the DNF is large and the probability mass is concentrated in a few long clauses. In such settings MC‑ProbLog often fails to converge even after millions of samples, whereas DNF sampling reaches a stable estimate within a few hundred thousand samples. Runtime measurements indicate a 30 %–50 % speed‑up over MC‑ProbLog for equivalent error levels, while memory consumption remains modest because only the DNF representation (not the full BDD) needs to be stored.

The paper concludes by outlining future research directions. First, the authors suggest extending DNF sampling to other probabilistic logic languages such as PRISM and LPAD, where similar DNF reductions are possible. Second, they propose incremental updates to the DNF when new facts are added or removed, enabling dynamic inference without recomputing the entire clause set. Third, they discuss exploiting GPU parallelism to generate massive numbers of clause‑wise samples, which could further lower latency for real‑time applications. Overall, the work demonstrates that leveraging the internal DNF structure for importance‑biased sampling provides a practical middle ground between exact, memory‑heavy inference and naïve, high‑variance Monte‑Carlo methods, opening a new avenue for scalable probabilistic logic reasoning.

