Bayesian network learning with cutting planes
The problem of learning the structure of Bayesian networks from complete discrete data with a limit on parent set size is considered. Learning is cast explicitly as an optimisation problem where the goal is to find a BN structure which maximises log marginal likelihood (BDe score). Integer programming, specifically the SCIP framework, is used to solve this optimisation problem. Acyclicity constraints are added to the integer program (IP) during solving in the form of cutting planes. Finding good cutting planes is the key to the success of the approach: the search for such cutting planes is effected using a sub-IP. Results show that this is a particularly fast method for exact BN learning.
💡 Research Summary
The paper addresses the exact learning of Bayesian network (BN) structures from complete discrete data under a hard limit on the size of parent sets. The authors formulate the problem as a combinatorial optimization task: the objective is to maximize the log marginal likelihood (the BDe score) of the network given the data. To this end, they pre‑compute the BDe score for every admissible parent set (subject to the user‑specified maximum number of parents) and introduce a binary decision variable $x_{i,\Pi}$ that indicates whether node $i$ selects parent set $\Pi$. The resulting integer programming (IP) model contains (i) a linear objective that is a weighted sum of the pre‑computed scores, (ii) "one‑parent‑set‑per‑node" constraints ensuring exactly one parent set is chosen for each node, and (iii) constraints limiting the cardinality of each parent set.
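The shape of this formulation can be illustrated with a toy brute-force search over the same decision variables. This is only a sketch: the scores, node names, and enumeration strategy below are invented for illustration, whereas the paper pre-computes real BDe scores and delegates the optimisation to SCIP rather than enumerating structures.

```python
from itertools import product

# Hypothetical local scores standing in for pre-computed BDe values:
# score[(node, parent_set)] = log marginal likelihood contribution.
score = {
    ("A", ()): -10.0, ("A", ("B",)): -8.0,
    ("B", ()): -9.0,  ("B", ("A",)): -7.5,
    ("C", ()): -12.0, ("C", ("A", "B")): -6.0,
}

def is_acyclic(assignment):
    """Check that the graph induced by the chosen parent sets has no directed cycle."""
    parents = dict(assignment)
    visited, stack = set(), set()
    def dfs(v):
        if v in stack:
            return False          # back edge => cycle
        if v in visited:
            return True
        stack.add(v)
        ok = all(dfs(p) for p in parents[v])
        stack.remove(v)
        visited.add(v)
        return ok
    return all(dfs(v) for v in parents)

def best_structure(score):
    """Enumerate one parent set per node (mirroring the IP constraint that
    exactly one x_{i,Pi} is 1 per node) and keep the acyclic maximiser."""
    nodes = sorted({i for i, _ in score})
    choices = {i: [ps for j, ps in score if j == i] for i in nodes}
    best, best_val = None, float("-inf")
    for combo in product(*(choices[i] for i in nodes)):
        assignment = list(zip(nodes, combo))
        val = sum(score[(i, ps)] for i, ps in assignment)
        if val > best_val and is_acyclic(assignment):
            best, best_val = assignment, val
    return dict(best), best_val
```

Note that the unconstrained maximiser here would pick A and B as each other's parents; the acyclicity check is exactly what the cutting planes enforce inside the IP.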
A crucial difficulty in BN learning is the acyclicity requirement. Directly encoding acyclicity as linear constraints leads to an exponential number of inequalities, which is impractical. Instead, the authors adopt a cutting‑plane approach within the SCIP framework. The initial IP contains no acyclicity constraints; SCIP first solves a linear relaxation of the model. If the relaxed solution corresponds to a cyclic graph, a new linear inequality (a cutting plane) is generated to cut off that cyclic solution. The generic form of such a cut for a detected cycle on a node set $C$ is $\sum_{i\in C}\sum_{\Pi:\,\Pi\cap C\neq\emptyset} x_{i,\Pi} \le |C|-1$, which forces at least one node in $C$ to choose a parent set disjoint from $C$, thereby breaking the cycle.
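A minimal sketch of checking such an acyclicity cut, under an assumed solution in which nodes A and B select each other as parents (a 2-cycle); the variable encoding mirrors the $x_{i,\Pi}$ variables but all names and values here are illustrative:

```python
# Hypothetical (integral) solution: A and B form a 2-cycle, C is a root.
solution = {("A", ("B",)): 1.0, ("B", ("A",)): 1.0, ("C", ()): 1.0}
cycle = {"A", "B"}  # the offending node set C detected in the solution

def cut_lhs(solution, cluster):
    """Left-hand side of the cut: sum of x_{i,Pi} over nodes i in C and
    parent sets Pi that intersect C. Acyclicity requires lhs <= |C| - 1."""
    return sum(v for (i, ps), v in solution.items()
               if i in cluster and cluster.intersection(ps))

lhs = cut_lhs(solution, cycle)
violated = lhs > len(cycle) - 1  # True here: the 2-cycle must be cut off
```

Adding this inequality to the IP makes the current cyclic solution infeasible while leaving every acyclic structure feasible.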
The effectiveness of the method hinges on finding strong cuts quickly. To this end, the authors devise a secondary integer program (sub‑IP) whose purpose is to locate the most violated cut for the current LP solution. The sub‑IP treats the fractional values $x^*_{i,\Pi}$ of the decision variables from the LP relaxation as weights and introduces binary variables $y_i$ indicating whether node $i$ belongs to the candidate node set $C$; its objective maximises the violation $\sum_{i\in C}\sum_{\Pi:\,\Pi\cap C\neq\emptyset} x^*_{i,\Pi} - (|C|-1)$. If the optimal value is positive, the corresponding inequality is violated by the current relaxation and is added as a cutting plane. Because the sub‑IP is solved exactly, the generated cut is guaranteed to be the most violated inequality of this family for the current relaxation.
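The separation problem the sub-IP solves can be mimicked with a brute-force scan over node sets. This is a stand-in under stated assumptions: the fractional LP values below are invented, and the paper solves this search as an IP rather than by enumeration (which is only viable for tiny node counts).

```python
from itertools import combinations

# Hypothetical fractional LP solution over x_{i,Pi}; values are invented.
lp = {
    ("A", ("B",)): 0.6, ("A", ()): 0.4,
    ("B", ("C",)): 0.7, ("B", ()): 0.3,
    ("C", ("A",)): 0.8, ("C", ()): 0.2,
}

def most_violated_cluster(lp):
    """Brute-force stand-in for the paper's sub-IP: scan every node set C
    and return the one whose inequality
        sum over i in C, Pi intersecting C, of x_{i,Pi}  <=  |C| - 1
    is most violated by the LP values (None if none is violated)."""
    nodes = sorted({i for i, _ in lp})
    best_c, best_viol = None, 0.0
    for k in range(2, len(nodes) + 1):
        for c in combinations(nodes, k):
            cset = set(c)
            lhs = sum(v for (i, ps), v in lp.items()
                      if i in cset and cset.intersection(ps))
            viol = lhs - (len(cset) - 1)
            if viol > best_viol + 1e-9:
                best_c, best_viol = cset, viol
    return best_c, best_viol
```

On this toy LP solution, no 2-node set yields a violated cut, but the fractional 3-cycle A→C→B→A makes the full set {A, B, C} violated, so that is the cut the separation routine returns.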
The authors evaluate their approach on a suite of standard BN benchmark datasets (Alarm, Barley, Insurance, Hailfinder, etc.) with node counts ranging from 10 to 50 and parent‑set limits of 3 or 4. They compare against dynamic programming, A* search, and other integer‑programming based methods (e.g., CPLEX formulations). Performance metrics include total runtime to reach the optimal solution, memory consumption, number of cuts generated, and time spent solving the sub‑IP. Results show that the SCIP + cutting‑plane method consistently outperforms the baselines, often achieving a 2–10× speed‑up while using comparable or less memory. Even for the larger networks (≈30–50 nodes), the algorithm finds the exact optimal structure without exhausting resources. Cut generation accounts for roughly 10–15% of the overall runtime, and each sub‑IP is typically solved in a few milliseconds, confirming that the overhead of generating strong cuts is modest.
Key contributions of the paper are threefold: (1) a clean integer‑programming formulation of BN structure learning that separates scoring from structural constraints, (2) a dynamic cutting‑plane scheme that enforces acyclicity without enumerating an exponential set of constraints, and (3) a dedicated sub‑IP that efficiently discovers the most violated cycles, guaranteeing high‑quality cuts. The authors argue that this framework is readily extensible to more complex settings, such as partially observed data, continuous variables, or other graphical models (e.g., Markov random fields). Future work may explore heuristic warm‑starts for the sub‑IP, integration with meta‑heuristic search, and adaptation to hybrid scoring functions.
In summary, the paper demonstrates that exact Bayesian network structure learning can be made practically fast by embedding the problem in a modern IP solver, using cutting planes to handle acyclicity, and employing a specialized sub‑IP to generate strong cuts. This combination yields a state‑of‑the‑art exact learning algorithm that bridges the gap between theoretical optimality and real‑world scalability.