A Greedy, Flexible Algorithm to Learn an Optimal Bayesian Network Structure

In this report paper we first present a report of the Advanced Machine Learning Course Project on the provided data set and then present a novel heuristic algorithm for exact Bayesian network (BN) structure discovery that uses decomposable scoring functions. Our algorithm follows a different approach to solve the problem of BN structure discovery than the previously used methods such as Dynamic Programming (DP) and Branch and Bound to reduce the search space and find the global optima space for the problem. The algorithm we propose has some degree of flexibility that can make it more or less greedy. The more the algorithm is set to be greedy, the more the speed of the algorithm will be, and the less optimal the final structure. Our algorithm runs in a much less time than the previously known methods and guarantees to have an optimality of close to 99%. Therefore, it sacrifices less than one percent of score of an optimal structure in order to gain a much lower running time and make the algorithm feasible for large data sets (we may note that we never used any toolbox except for result validation)

💡 Research Summary

The paper tackles the notoriously hard problem of learning the structure of a Bayesian Network (BN) from data, where the goal is to find a directed acyclic graph (DAG) that maximizes a decomposable scoring function such as BIC or BDeu. Exact approaches like Dynamic Programming (DP) and Branch‑and‑Bound can guarantee optimality but suffer from exponential time and memory requirements (DP needs O(n·2ⁿ) memory and O(n²·2ⁿ) time), making them infeasible for more than a few dozen variables. Heuristic methods such as greedy hill‑climbing, tabu search, or evolutionary algorithms are faster but often provide no quantitative guarantee on how far the obtained score is from the optimum.

The authors propose a novel “flexible greedy” algorithm that sits between these extremes. The method consists of two main stages. In the first stage, for each variable the algorithm enumerates all possible parent sets, computes the local score, and sorts the sets by decreasing score. A single tunable parameter γ∈