Convex Relaxations for Learning Bounded Treewidth Decomposable Graphs


We consider the problem of learning the structure of undirected graphical models with bounded treewidth, within the maximum likelihood framework. This is an NP-hard problem and most approaches consider local search techniques. In this paper, we pose it as a combinatorial optimization problem, which is then relaxed to a convex optimization problem that involves searching over the forest and hyperforest polytopes with special structures, independently. A supergradient method is used to solve the dual problem, with a run-time complexity of $O(k^3 n^{k+2} \log n)$ for each iteration, where $n$ is the number of variables and $k$ is a bound on the treewidth. We compare our approach to state-of-the-art methods on synthetic datasets and classical benchmarks, showing the gains of the novel convex approach.


💡 Research Summary

The paper tackles the notoriously hard problem of learning the structure of undirected graphical models whose treewidth is bounded by a user‑specified constant k, within the maximum‑likelihood (ML) framework. Because exact ML learning under a treewidth constraint is NP‑hard, most prior work relies on greedy local search, hill‑climbing, or MCMC heuristics that can become trapped in poor local optima, especially as the number of variables n grows.

The authors propose a fundamentally different approach: they formulate the structure‑learning task as a combinatorial optimization problem over two matroid‑derived polytopes. The forest polytope enforces acyclicity of the edge selection, while the hyperforest polytope enforces a hyper‑acyclicity condition on the selected cliques. Relaxing the combinatorial constraints to these two polytopes independently turns the original non‑convex problem into a pair of linear programs that can be tackled via Lagrangian duality.
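The Lagrangian-duality idea can be illustrated on a toy problem (this is a hedged sketch, not the paper's actual formulation): two score functions coupled by a consistency constraint decouple into independent subproblems once a multiplier prices the constraint. All sets and scores below are hypothetical stand-ins.

```python
# Toy Lagrangian decomposition: maximize f(x) + g(y) subject to x == y,
# where X and Y are stand-ins for the forest and hyperforest polytopes.
# The sets and scores are illustrative only.
X = [0, 1, 2]
Y = [0, 1, 2]
f = {0: 1.0, 1: 3.0, 2: 2.0}   # hypothetical edge-selection scores
g = {0: 2.0, 1: 0.5, 2: 2.5}   # hypothetical clique-selection scores

def dual(lam):
    """Evaluate the Lagrangian dual at lam: once the coupling x == y is
    priced by lam, each subproblem is solved independently."""
    best_x = max(X, key=lambda x: f[x] + lam * x)
    best_y = max(Y, key=lambda y: g[y] - lam * y)
    return f[best_x] + lam * best_x + g[best_y] - lam * best_y

# Weak duality: the dual upper-bounds the primal optimum for every lam.
primal_opt = max(f[z] + g[z] for z in X)   # feasible solutions have x == y
assert all(dual(lam) >= primal_opt for lam in (-2.0, -1.0, 0.0, 1.0, 2.0))
```

Minimizing the dual over the multiplier then tightens this upper bound, which is exactly the role of the supergradient method described next.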

A set of Lagrange multipliers couples the edge‑selection and clique‑selection variables, ensuring that the final solution corresponds to a valid decomposable graph of treewidth ≤ k. The dual problem is solved with a supergradient method. Each supergradient iteration requires solving two linear subproblems: (1) a minimum‑spanning‑tree problem over the forest polytope, solvable in O(n²) with the classic Kruskal or Prim algorithms; and (2) a minimum‑cost hyperforest problem, which the authors show can be solved in O(k³ n^{k+2}) time by exploiting the special structure of bounded‑treewidth cliques. The overall per‑iteration complexity is therefore O(k³ n^{k+2} log n), which is polynomial for any fixed k and thus scalable to moderate‑size problems.
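Subproblem (1) reduces to a greedy spanning‑forest computation; below is a minimal Kruskal‑style sketch with union‑find, written for the maximum‑weight convention (the graph and weights are illustrative only, not from the paper).

```python
def max_weight_forest(n, weighted_edges):
    """Greedy (Kruskal-style) maximum-weight spanning forest.
    weighted_edges: list of (weight, u, v). Only positive-weight edges
    that do not close a cycle are kept; greedy selection is optimal for
    linear objectives over the forest polytope (a matroid polytope)."""
    parent = list(range(n))

    def find(i):               # union-find root with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    forest, total = [], 0.0
    for w, u, v in sorted(weighted_edges, reverse=True):
        if w <= 0:
            break              # remaining edges cannot improve the forest
        ru, rv = find(u), find(v)
        if ru != rv:           # edge joins two components: no cycle created
            parent[ru] = rv
            forest.append((u, v))
            total += w
    return forest, total

# Tiny example on 4 nodes (weights are illustrative only):
edges = [(3.0, 0, 1), (2.0, 1, 2), (1.5, 0, 2), (-1.0, 2, 3)]
forest, total = max_weight_forest(4, edges)
```

Here the edge (0, 2) is rejected because it would close a cycle, and the negative-weight edge is dropped, leaving a forest of total weight 5.0.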

The theoretical contribution is complemented by an extensive experimental evaluation. Synthetic datasets with varying n (10–30) and k (2–4) demonstrate that the convex relaxation consistently yields higher F1 scores and lower negative log‑likelihood than state‑of‑the‑art local‑search baselines, with improvements in structure recovery ranging from 8% to 12%. Benchmarks on classical Bayesian networks (Alarm, Insurance, Barley, etc.) confirm the same trend: the proposed method recovers more accurate chordal structures and achieves better predictive likelihoods, particularly when the true treewidth is low (≤ 3).

Convergence analysis shows that the supergradient updates drive the dual objective toward its optimum quickly, typically stabilizing within 100–200 iterations. The dependence on k is explicit: the per‑iteration cost grows as k³ n^{k+2}, so increasing k is expensive, but it also allows the method to capture richer dependencies when the data truly require higher treewidth. This trade‑off is transparent and controllable, a notable advantage over black‑box heuristics.
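The supergradient iteration itself can be sketched generically: step along a supergradient of the concave, nonsmooth dual with diminishing step sizes. The toy dual below, a minimum of two affine pieces, is purely illustrative and not the paper's objective.

```python
import math

# Toy nonsmooth concave dual: g(lam) = min over affine pieces (a*lam + b).
# The slope of any minimizing piece is a supergradient of g at lam.
# The coefficients below are illustrative only.
pieces = [(1.0, 0.0), (-1.0, 2.0)]

def value_and_supergradient(lam):
    val, slope = min(((a * lam + b, a) for a, b in pieces), key=lambda t: t[0])
    return val, slope

lam, best = 5.0, -float("inf")
for t in range(1, 2001):
    val, sg = value_and_supergradient(lam)
    best = max(best, val)
    lam += sg / math.sqrt(t)   # diminishing step sizes ensure convergence

# For these pieces the maximizer is lam* = 1 with g(lam*) = 1; the iterate
# oscillates around 1 with shrinking amplitude, and best approaches 1.
```

The 1/√t step-size schedule is the standard choice for nonsmooth (super)gradient methods; in the paper's setting, each `value_and_supergradient` call would instead solve the two combinatorial subproblems described above.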

In summary, the paper introduces a convex‑relaxation framework that transforms bounded‑treewidth structure learning from an intractable combinatorial search into a sequence of tractable linear programs over well‑understood polytopes. By leveraging matroid theory, Lagrangian duality, and supergradient optimization, the authors obtain a method that is both theoretically sound and practically competitive. The work opens several avenues for future research, including extensions to larger n via parallel or stochastic gradient schemes, incorporation of additional structural constraints (e.g., community or sparsity patterns), and application to other domains such as structured deep generative models or constrained Markov random fields.