Algorithms and Complexity Results for Exact Bayesian Structure Learning

Bayesian structure learning is the NP-hard problem of discovering a Bayesian network that optimally represents a given set of training data. In this paper we study the computational worst-case complexity of exact Bayesian structure learning under graph theoretic restrictions on the super-structure. The super-structure (a concept introduced by Perrier, Imoto, and Miyano, JMLR 2008) is an undirected graph that contains as subgraphs the skeletons of solution networks. Our results apply to several variants of score-based Bayesian structure learning where the score of a network decomposes into local scores of its nodes. Results: We show that exact Bayesian structure learning can be carried out in non-uniform polynomial time if the super-structure has bounded treewidth and in linear time if in addition the super-structure has bounded maximum degree. We complement this with a number of hardness results. We show that both restrictions (treewidth and degree) are essential and cannot be dropped without losing uniform polynomial time tractability (subject to a complexity-theoretic assumption). Furthermore, we show that the restrictions remain essential if we do not search for a globally optimal network but instead aim to improve a given network by means of at most k arc additions, arc deletions, or arc reversals (k-neighborhood local search).

💡 Research Summary

Bayesian structure learning (BSL) asks for a directed acyclic graph (DAG) that maximizes a given scoring function on a data set. When the score decomposes into a sum of local scores, one for each node and its parent set, search algorithms can exploit this additive structure. The paper focuses on the exact version of BSL, i.e., finding a globally optimal DAG, and studies its worst‑case computational complexity under structural restrictions on the so‑called super‑structure. The super‑structure is an undirected graph that contains the skeleton (the undirected version) of every feasible solution DAG; consequently, any edge present in the super‑structure may appear, in either orientation, in a solution, while an edge absent from it can never appear. By limiting the shape of this graph, the authors obtain both algorithmic upper bounds and complementary hardness results.
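The two ingredients described above, a decomposable score and a super‑structure constraint, can be illustrated with a small Python sketch. It is not taken from the paper: the representation of a network as a dictionary mapping each node to its parent set, and the names `skeleton`, `respects_super_structure`, `is_acyclic`, `network_score`, and the `local_score` callback are assumptions made for illustration only.

```python
def skeleton(parents):
    """Undirected edges of a network given as {node: set_of_parents}."""
    return {frozenset((p, v)) for v, ps in parents.items() for p in ps}


def respects_super_structure(parents, super_edges):
    """True if every arc of the network lies on an edge of the super-structure."""
    allowed = {frozenset(e) for e in super_edges}
    return skeleton(parents) <= allowed


def is_acyclic(parents):
    """Check acyclicity by repeatedly removing nodes whose parents are all removed.

    Assumes every node, including parentless ones, appears as a key.
    """
    remaining = {v: set(ps) for v, ps in parents.items()}
    done = set()
    while remaining:
        ready = [v for v, ps in remaining.items() if ps <= done]
        if not ready:          # every remaining node waits on another: a cycle
            return False
        for v in ready:
            done.add(v)
            del remaining[v]
    return True


def network_score(parents, local_score):
    """Decomposable score: the sum of one local score per (node, parent set)."""
    return sum(local_score(v, frozenset(ps)) for v, ps in parents.items())
```

Because the total is a plain sum over nodes, changing the parent set of one node changes only that node's term, which is the property that dynamic programming and local search can exploit.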

The first major contribution is an algorithm that runs in non‑uniform polynomial time when the super‑structure has bounded treewidth. A tree decomposition of width k has bags of at most k+1 vertices. The algorithm performs dynamic programming over the decomposition: for each bag it enumerates the admissible parent sets of the vertices inside the bag, records the best local scores, and propagates consistent choices to neighboring bags. Without a degree bound, a single vertex may still have many candidate parent sets, so the number of configurations per bag is not bounded by a function of k alone. The overall running time is polynomial in the input size for every fixed k, but the degree of the polynomial depends on k, which is why the result is termed “non‑uniform”.
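To make the notion of a tree decomposition concrete, the following Python sketch checks the three standard conditions (every vertex is covered, every edge is covered, and the bags containing any given vertex form a connected subtree) and computes the width. It illustrates only the definition, not the paper's dynamic program; the function names and the input format (`bags` as a dictionary from bag identifiers to vertex sets, `tree_edges` as pairs of bag identifiers) are assumptions.

```python
from collections import deque


def decomposition_width(bags):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1


def _connected(nodes, adj):
    """True if `nodes` is non-empty and induces a connected subgraph of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen == nodes


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check that (bags, tree_edges) is a tree decomposition of (vertices, edges)."""
    adj = {b: set() for b in bags}
    for x, y in tree_edges:
        adj[x].add(y)
        adj[y].add(x)
    # The bag graph must itself be a tree: connected with |bags| - 1 edges.
    if len(tree_edges) != len(bags) - 1 or not _connected(bags, adj):
        return False
    # Every edge of the graph must lie inside at least one bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # Each vertex must occur in some bag, and its bags must form a subtree.
    return all(
        _connected([b for b, bag in bags.items() if v in bag], adj)
        for v in vertices
    )
```

For a path a, b, c, d, the bags {a, b}, {b, c}, {c, d} arranged in a path form a valid decomposition of width 1, so each bag holds at most k+1 = 2 vertices.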

When, in addition, the super‑structure’s maximum degree Δ is bounded by a constant, the enumeration of parent sets becomes independent of the total number of vertices. Each vertex can have at most Δ potential parents, so the number of parent subsets to consider is at most 2^{Δ}. Combined with the bounded‑treewidth dynamic program, this yields an exact algorithm for BSL whose running time is O(|V|), i.e., linear in the number of vertices, with a constant factor that depends on the treewidth and on Δ. Thus, the paper identifies a concrete pair of graph‑theoretic parameters, treewidth and maximum degree, whose simultaneous boundedness makes the otherwise NP‑hard problem tractable.
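The 2^{Δ} bound follows from restricting each vertex's potential parents to its neighbors in the super‑structure. A minimal Python sketch of that enumeration is shown here; the helper names `neighbours` and `candidate_parent_sets` and the edge-list input format are assumptions for illustration, not the paper's notation.

```python
from itertools import combinations


def neighbours(super_edges):
    """Adjacency sets of the undirected super-structure."""
    nbrs = {}
    for u, v in super_edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def candidate_parent_sets(v, nbrs):
    """Yield every subset of v's super-structure neighbours as a frozenset.

    A vertex of degree d yields exactly 2**d candidates, so a degree bound
    of Delta caps the count at 2**Delta regardless of the graph's size.
    """
    pool = sorted(nbrs.get(v, ()))
    for size in range(len(pool) + 1):
        for combo in combinations(pool, size):
            yield frozenset(combo)
```

On a star with center c and three leaves, the center has 2^3 = 8 candidate parent sets and each leaf has 2, however many further vertices the rest of the graph contains.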

To demonstrate that neither restriction can be dropped without sacrificing tractability, the authors provide a series of hardness proofs. If the treewidth bound is removed, they reduce from graph coloring, establishing NP‑hardness even when the scoring function is decomposable. If the degree bound is removed, they reduce from the k‑Clique problem, showing W