Approximating Queries on Probabilistic Graphs


Query evaluation over probabilistic databases is notoriously intractable – not only in combined complexity, but often in data complexity as well. This motivates the study of approximation algorithms, and particularly of combined FPRASes, with runtime polynomial in both the query and instance size. In this paper, we focus on tuple-independent probabilistic databases over binary signatures, i.e., probabilistic graphs, and study when we can devise combined FPRASes for probabilistic query evaluation. We settle the complexity of this problem for a variety of query and instance classes, by proving both approximability results and (conditional) inapproximability results together with (unconditional) DNNF provenance circuit size lower bounds. This allows us to deduce many corollaries of possible independent interest. For example, we show how the results of Arenas et al. [ACJR21a] on counting fixed-length strings accepted by an NFA imply the existence of an FPRAS for the two-terminal network reliability problem on directed acyclic graphs, a question asked by Zenklusen and Laumanns [ZL11]. We also show that one cannot extend a recent result of van Bremen and Meel [vBM23] giving a combined FPRAS for self-join-free conjunctive queries of bounded hypertree width on probabilistic databases: neither the bounded-hypertree-width condition nor the self-join-freeness hypothesis can be relaxed. We last show how our methods can give insights on the evaluation and approximability of regular path queries (RPQs) on probabilistic graphs in the data complexity perspective, showing in particular that some of them are (conditionally) inapproximable.


💡 Research Summary

The paper investigates the problem of approximating query probabilities on tuple‑independent probabilistic databases when the schema consists of binary relations only, i.e., on probabilistic graphs. The authors focus on the combined complexity setting, where both the query and the instance are part of the input, and ask for which classes of queries and instances a fully polynomial‑time randomized approximation scheme (FPRAS) exists.

Model and Problem Definition
A probabilistic graph is a pair (H,π) where H is a directed, edge‑labelled graph (the label set σ may be a singleton, yielding an unlabeled graph) and π assigns an independent existence probability to each edge. Given a deterministic query graph G, the probabilistic graph homomorphism problem (PHom) asks for the probability that a randomly sampled subgraph H′⊆H (according to π) admits a homomorphism from G. This probability is denoted Prπ(G ⇝ H).
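To make the definition concrete, the following is a minimal brute-force sketch of PHom, not an algorithm from the paper. It enumerates all possible worlds H′ of H, weights each by the product of the edge probabilities, and sums the weights of the worlds that admit a homomorphism from G. The function names and the encoding of edges as (source, label, target) triples are assumptions made for illustration; the running time is exponential and only suitable for tiny inputs.

```python
from itertools import product


def has_homomorphism(query_edges, world_edges):
    """Return True if the query graph maps homomorphically into the world.

    Both arguments are collections of (source, label, target) triples.
    """
    query_edges = list(query_edges)
    world = set(world_edges)
    q_nodes = list({n for s, _, t in query_edges for n in (s, t)})
    w_nodes = list({n for s, _, t in world for n in (s, t)})
    # Try every assignment of query vertices to world vertices (exponential).
    for image in product(w_nodes, repeat=len(q_nodes)):
        h = dict(zip(q_nodes, image))
        if all((h[s], lab, h[t]) in world for s, lab, t in query_edges):
            return True
    return False


def phom_probability(query_edges, prob_edges):
    """Exact Pr(G ~> H), summing over all 2^|E| possible worlds of H.

    prob_edges maps each (source, label, target) edge of H to its probability.
    """
    edges = list(prob_edges)
    total = 0.0
    for keep in product([False, True], repeat=len(edges)):
        weight = 1.0
        for e, kept in zip(edges, keep):
            weight *= prob_edges[e] if kept else 1.0 - prob_edges[e]
        world = [e for e, kept in zip(edges, keep) if kept]
        if has_homomorphism(query_edges, world):
            total += weight
    return total


# Hypothetical example: a one-way path query x -a-> y -b-> z.
query = [("x", "a", "y"), ("y", "b", "z")]
instance = {
    ("u", "a", "v"): 0.5,
    ("v", "b", "w"): 0.5,
    ("v", "b", "w2"): 0.5,
}
print(phom_probability(query, instance))
```

In this example the query matches exactly when the a-edge is present together with at least one of the two b-edges, so the probability is 0.5 · (1 − 0.25) = 0.375.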

Graph Classes Considered
The authors restrict both queries and instances to well‑studied graph families that are tractable in the non‑probabilistic setting:

Query graphs – one‑way paths (1WP), two‑way paths (2WP), downward trees (DWT, edges oriented from root to leaves), and poly‑trees (PT, arbitrary orientation).

Instance graphs – directed acyclic graphs (DAG) and arbitrary graphs (All).

Both labeled (|σ|>1) and unlabeled (|σ|=1) settings are examined separately, yielding a matrix of 4×2 cases per labeling regime.

Main Contributions

  1. Positive Approximation Results
    Proposition 3.1 shows that for one‑way path queries on DAGs there exists a combined FPRAS. The proof hinges on the observation that the Boolean provenance of such queries can be compiled into a nondeterministic ordered binary decision diagram (nOBDD). A recent result by Arenas et al. (2021) provides an FPRAS for counting satisfying assignments of an nOBDD, even in the weighted setting, which directly yields an FPRAS for PHom in this case.

    As a striking corollary, the authors obtain an FPRAS for the two‑terminal network reliability problem (ST‑CON) on DAGs (Theorem 6.3). This answers, for the acyclic case, a question asked by Zenklusen and Laumanns (2011): the reliability of a directed acyclic network can be approximated efficiently even though computing it exactly is #P‑hard.
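For intuition about the quantity being approximated, here is a minimal sketch of a naive Monte Carlo estimator for two-terminal reliability. It is explicitly not the FPRAS discussed above: plain sampling gives no relative-error guarantee in polynomial time when the reliability is very small, which is precisely the gap an FPRAS closes. The function names, the edge encoding as (source, target) pairs, and the default sample count are assumptions for illustration.

```python
import random


def reachable(edges, source, target):
    """Depth-first search for target from source over directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    stack, seen = [source], {source}
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def estimate_reliability(prob_edges, source, target, samples=20000, seed=0):
    """Estimate Pr(target reachable from source) by sampling possible worlds.

    prob_edges maps each directed edge (u, v) to its existence probability.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Keep each edge independently with its own probability.
        world = [e for e, p in prob_edges.items() if rng.random() < p]
        hits += reachable(world, source, target)
    return hits / samples


# Hypothetical diamond-shaped DAG with two disjoint s-t paths.
diamond = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "b"): 0.5, ("b", "t"): 0.5}
print(estimate_reliability(diamond, "s", "t"))
```

For the diamond above, each path survives with probability 0.25, so the exact reliability is 1 − 0.75² = 0.4375, and the estimate should land close to that value.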

  2. Negative Approximation Results
    For all other query/instance class combinations the paper proves, under the standard complexity assumption RP≠NP, that no combined FPRAS can exist. The hardness proofs are reductions from known #P‑hard problems (e.g., counting homomorphisms, #SAT, or ST‑CON) and often involve constructing gadgets that preserve the probabilistic semantics.

    A particularly strong negative result (Corollaries 6.1 and 6.2) demonstrates that the two conditions required by van Bremen and Meel’s earlier FPRAS (self‑join‑free and bounded hypertree width) cannot be relaxed, even in the severely restricted setting of a single binary relation. Specifically, for treewidth‑1 conjunctive queries on treewidth‑1 tuple‑independent instances, no combined FPRAS exists unless RP=NP.

  3. Lower Bounds on Provenance Circuits
    The authors complement the hardness results with unconditional lower bounds on the size of decomposable negation normal form (DNNF) circuits that represent Boolean provenance. For every query/instance class combination shown to be conditionally inapproximable, the provenance can require DNNF circuits of size at least 2^{Ω((|G|+|H|)^{1−ε})} for any ε>0 (Result 1.2). Moreover, for treewidth‑1 queries and instances, a tighter bound of 2^{Ω(|G|+|H|)} is shown (Result 1.3). These bounds illustrate that the inability to approximate is reflected in an intrinsic blow‑up of this kind of succinct logical representation of the provenance.

  4. Regular Path Queries (RPQs) in Data Complexity
    Moving beyond combined complexity, the paper studies RPQs, i.e., queries that ask whether there exists a walk whose edge labels form a word belonging to a regular language. While RPQs that describe finite languages reduce to UCQs and are approximable in data complexity, the authors show that “unbounded” RPQs are at least as hard as ST‑CON even when the query is fixed. Consequently, some RPQs are conditionally inapproximable in the data‑complexity setting.
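The RPQ semantics described above can be sketched for a single deterministic possible world as a search in the product of the graph with an automaton for the regular language. This only illustrates what it means for an RPQ to hold; it says nothing about the probabilistic hardness results. The DFA encoding (a start state, a set of accepting states, and a transition dictionary keyed by (state, label)) and the function name are assumptions for illustration.

```python
from collections import deque


def rpq_holds(edges, dfa_start, dfa_accepting, dfa_delta):
    """True if some walk in the graph spells a word accepted by the DFA.

    edges is a collection of (source, label, target) triples.
    dfa_delta maps (state, label) to the next state.
    """
    edges = list(edges)
    nodes = {n for s, _, t in edges for n in (s, t)}
    # A walk may start at any vertex, with the DFA in its start state.
    start = [(n, dfa_start) for n in nodes]
    seen = set(start)
    queue = deque(start)
    while queue:
        node, state = queue.popleft()
        if state in dfa_accepting:
            return True
        for s, label, t in edges:
            if s == node and (state, label) in dfa_delta:
                nxt = (t, dfa_delta[(state, label)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# Hypothetical DFA for the unbounded language a b* c.
delta = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
graph = [("u", "a", "v"), ("v", "b", "v2"), ("v2", "b", "v3"), ("v3", "c", "w")]
print(rpq_holds(graph, 0, {2}, delta))
```

Because the language a b* c contains words of every length from two upward, matching walks can be arbitrarily long, which is the sense in which such a query is unbounded.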

Methodological Highlights

  • nOBDD Compilation: The key technical device for the positive result is the translation of one‑way path provenance into an nOBDD, enabling the use of existing weighted counting algorithms.
  • Gadget Reductions: Negative results rely on carefully designed graph gadgets that simulate logical constraints while preserving independence of edge probabilities.
  • Knowledge Compilation Perspective: By proving exponential lower bounds for DNNF, the authors connect approximation hardness to the expressive limits of a highly succinct knowledge‑compilation target.
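As an illustration of the Boolean provenance that these techniques operate on, the sketch below enumerates the matches of a one-way path query in a labelled instance and returns the provenance as a monotone DNF: one clause (a set of instance edges) per match. This is not the nOBDD compilation from the paper. The explicit DNF can have exponentially many clauses in the query length, which is one reason a more compact representation is needed. The function name and edge encoding are assumptions for illustration.

```python
def path_query_provenance(labels, edges):
    """Provenance of a one-way path query as a set of clauses.

    labels is the sequence of edge labels along the query path.
    edges is a collection of (source, label, target) instance edges.
    Each clause is a frozenset of edges that must all be present; the
    query holds in a world iff at least one clause is fully present.
    """
    edges = list(edges)
    out = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)
    clauses = set()

    def extend(node, i, used):
        # All labels consumed: the edges used so far form one match.
        if i == len(labels):
            clauses.add(frozenset(used))
            return
        for e in out.get(node, []):
            if e[1] == labels[i]:
                extend(e[2], i + 1, used + [e])

    nodes = {n for s, _, t in edges for n in (s, t)}
    for n in nodes:
        extend(n, 0, [])
    return clauses


# Hypothetical instance: the query a-b has two matches sharing the a-edge.
instance = [("u", "a", "v"), ("v", "b", "w"), ("v", "b", "w2")]
for clause in path_query_provenance(["a", "b"], instance):
    print(sorted(clause))
```

Here the provenance is (e1 ∧ e2) ∨ (e1 ∧ e3), where e1 is the a-edge and e2, e3 are the two b-edges.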

Implications
The work provides a near‑complete map of which query/instance combinations on probabilistic graphs admit efficient approximation and which do not. It clarifies that the self‑join‑free and bounded hypertree‑width restrictions are essential for the combined FPRAS of van Bremen and Meel, and it establishes a positive combined‑complexity approximation result for a non‑trivial class (1WP on DAGs). The FPRAS for DAG‑restricted network reliability opens new algorithmic possibilities for reliability analysis in directed acyclic systems (e.g., workflow or supply‑chain networks). Finally, the RPQ analysis highlights that certain expressive navigational queries remain intractable even for approximation, guiding practitioners toward safer query design.

Overall, the paper both advances the theoretical understanding of probabilistic query evaluation and offers concrete algorithmic tools for a subset of practical scenarios.

