On the Construction of the Inclusion Boundary Neighbourhood for Markov Equivalence Classes of Bayesian Network Structures

The problem of learning Markov equivalence classes of Bayesian network structures may be solved by searching for the maximum of a scoring metric in a space of these classes. This paper deals with the definition and analysis of one such search space. We use a theoretically motivated neighbourhood, the inclusion boundary, and represent equivalence classes by essential graphs. We show that this search space is connected and that the score of the neighbours can be evaluated incrementally. We devise a practical way of building this neighbourhood for an essential graph that is purely graphical and does not explicitly refer to the underlying independences. We find that its size can be intractable, depending on the complexity of the essential graph of the equivalence class. The emphasis is put on the potential use of this space with greedy hill-climbing search.


💡 Research Summary

The paper addresses the problem of learning Bayesian network structures at the level of Markov equivalence classes (MECs) rather than individual directed acyclic graphs (DAGs). Because many DAGs encode the same set of conditional independencies, searching directly in the space of MECs can reduce redundancy and improve efficiency. The authors propose a novel neighbourhood definition called the inclusion boundary, which connects a given MEC to all other MECs that can be reached by a single elementary operation (adding, deleting, or reversing an edge) while preserving the inclusion relationship among the underlying independence models.

To make this concept operational, the authors represent each MEC by its essential graph (also known as a completed partially directed acyclic graph, CPDAG). An essential graph contains directed edges that are invariant across all DAGs in the class and undirected edges whose orientation is still ambiguous. The inclusion‑boundary neighbourhood is then constructed purely by graphical manipulations on the essential graph, without explicitly referring to the underlying independence statements. Three basic operations are defined:

  1. Orientation – converting an undirected edge into a directed one, provided that no new v‑structures are created and no directed cycle is introduced.
  2. De‑orientation – turning a directed edge into an undirected one, indicating that the edge’s direction can vary across neighbouring MECs.
  3. Flip – reversing the direction of a directed edge while checking that the resulting graph still satisfies the essential‑graph constraints (acyclicity, preservation of existing v‑structures).
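
The admissibility checks behind these operations can be sketched in Python. This is an illustrative sketch, not the paper's algorithm: it assumes the partially directed graph is stored as a set of ordered pairs `directed` and a set of frozensets `undirected` (hypothetical names), and it implements only the two necessary conditions named above (no directed cycle, no new v-structure); the paper's full validation additionally verifies that the result is a valid essential graph.

```python
def creates_cycle(directed, u, v):
    """Return True if adding the arc u -> v would close a directed cycle,
    i.e. if v already reaches u along directed edges."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in directed if a == node)
    return False

def creates_v_structure(directed, undirected, u, v):
    """Return True if orienting u -> v creates a new v-structure
    w -> v <- u with w and u non-adjacent."""
    for (w, x) in directed:
        if x == v and w != u:
            adjacent = ((w, u) in directed or (u, w) in directed
                        or frozenset((w, u)) in undirected)
            if not adjacent:
                return True
    return False

def can_orient(directed, undirected, u, v):
    """Necessary conditions for orienting the undirected edge u - v
    as u -> v: no directed cycle and no new v-structure."""
    return (not creates_cycle(directed, u, v)
            and not creates_v_structure(directed, undirected, u, v))
```

The same two checks reappear, with the roles of the edges adjusted, in the validation of de-orientation and flip moves.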

Each operation is accompanied by a polynomial‑time validation step that guarantees the resulting graph is still a valid essential graph. Consequently, every neighbour generated by these operations corresponds to a distinct MEC that lies on the inclusion boundary of the current class.

A major contribution of the paper is the incremental scoring scheme. Standard Bayesian network scores (BDeu, BIC, etc.) decompose over families of variables, i.e., each node’s local score depends only on its parent set. Because the inclusion‑boundary operations modify at most one edge, only the local scores of the two incident nodes need to be recomputed. The authors derive explicit formulas for the score difference, showing that the cost of evaluating the entire neighbourhood is essentially constant per neighbour. This dramatically reduces the computational burden compared to recomputing the global score for each candidate DAG.
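
The score difference under a decomposable score can be sketched as follows. This is a BIC-style illustration with hypothetical helper names (`local_score`, `score_delta_add_edge`), not the paper's formulas; the point is that adding an arc changes only the child's local term, so the global score difference reduces to one local recomputation.

```python
import math
from collections import Counter

def local_score(node, parents, data):
    """BIC-style local score for one node given its parent set.
    `data` is a list of dicts mapping variable name -> discrete value."""
    n = len(data)
    counts = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximised log-likelihood of the node's conditional distribution.
    loglik = sum(c * math.log(c / parent_counts[pc])
                 for (pc, _), c in counts.items())
    states = {row[node] for row in data}
    # Penalty: one free parameter per state (minus one) per parent config.
    n_params = (len(states) - 1) * max(len(parent_counts), 1)
    return loglik - 0.5 * n_params * math.log(n)

def score_delta_add_edge(child, old_parents, new_parent, data):
    """Global score difference for adding new_parent -> child:
    only the child's local score changes."""
    new_parents = sorted(old_parents + [new_parent])
    return (local_score(child, new_parents, data)
            - local_score(child, sorted(old_parents), data))
```

Deletions and reversals work the same way, touching at most the two incident nodes' local terms, which is what makes evaluating a whole neighbourhood cheap.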

The authors also prove that the inclusion‑boundary search space is connected: for any pair of MECs there exists a finite sequence of inclusion‑boundary moves transforming one into the other. This follows from the fact that the inclusion relation among independence models forms a lattice; moving up or down the lattice corresponds exactly to the elementary operations defined above. Hence a local search algorithm that repeatedly selects a higher‑scoring neighbour can, in principle, reach the global optimum (subject to the usual hill‑climbing limitations).

A careful analysis of neighbourhood size reveals a potential scalability issue. The exponential growth comes not from single-edge moves, which are linear in the number of edges, but from combinations of orientations: if the essential graph contains k undirected edges, their admissible joint orientations can yield up to O(2^k) distinct neighbouring classes in the worst case. The authors term such situations “intractable neighbourhoods.” Empirical observations on benchmark networks, however, indicate that typical essential graphs have relatively few undirected edges, making the neighbourhood manageable in practice. They suggest preprocessing steps (e.g., forced orientation based on domain knowledge) and heuristic limits (capping the number of undirected edges considered) to keep the neighbourhood size under control.
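
The worst-case growth can be illustrated by brute force. The sketch below (illustrative names, not the paper's construction) counts the acyclic orientations of a set of undirected edges; for a star-shaped undirected part no orientation can create a cycle, so all 2^k assignments survive.

```python
from itertools import product

def is_acyclic(nodes, directed):
    """Kahn-style check: repeatedly strip nodes with no incoming arcs."""
    remaining, edges = set(nodes), set(directed)
    while remaining:
        sources = [n for n in remaining
                   if not any(b == n and a in remaining for (a, b) in edges)]
        if not sources:
            return False  # every remaining node has an incoming arc: a cycle
        remaining -= set(sources)
    return True

def acyclic_orientations(nodes, undirected_edges):
    """Count acyclic orientations by trying all 2^k direction assignments."""
    count = 0
    for choice in product((0, 1), repeat=len(undirected_edges)):
        directed = {(u, v) if c == 0 else (v, u)
                    for (u, v), c in zip(undirected_edges, choice)}
        if is_acyclic(nodes, directed):
            count += 1
    return count
```

A star with 4 leaves admits all 16 orientations, while a triangle admits only 6 of its 8 (the two cyclic ones are rejected), showing both the exponential worst case and the pruning effect of acyclicity.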

To demonstrate practical utility, the inclusion‑boundary neighbourhood and incremental scoring are embedded into a standard greedy hill‑climbing framework. Experiments on classic Bayesian network benchmarks (Alarm, Insurance, Child, etc.) show that the proposed method converges faster (fewer iterations) and attains equal or higher final scores compared with traditional DAG‑level neighbourhoods. In particular, for networks with more than 30 variables, the speed‑up becomes pronounced because the cost of recomputing scores dominates the runtime of conventional approaches, while the incremental method scales linearly with the number of local changes.
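
A generic greedy loop over such a neighbourhood might look like the sketch below. The callbacks `neighbours`, `score_delta`, and `apply_move` are assumed names; in the paper's setting they would enumerate the inclusion-boundary moves of the current essential graph and evaluate them with the incremental score differences described above.

```python
def greedy_hill_climb(initial, neighbours, score_delta, apply_move):
    """Repeatedly take the best-scoring neighbouring move until no
    move improves the score (a local optimum)."""
    state = initial
    while True:
        moves = [(score_delta(state, m), m) for m in neighbours(state)]
        if not moves:
            return state
        best_delta, best_move = max(moves, key=lambda t: t[0])
        if best_delta <= 0:
            return state  # no improving neighbour: local optimum
        state = apply_move(state, best_move)
```

Because each `score_delta` call touches only the local scores affected by one move, a full pass over the neighbourhood stays cheap even when the network is large.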

The paper concludes by acknowledging the remaining challenge of exponential neighbourhood growth in pathological cases and proposes future directions: (i) integrating the inclusion‑boundary neighbourhood with meta‑heuristics such as tabu search or evolutionary algorithms to escape local optima; (ii) parallelising neighbour generation and scoring; and (iii) developing adaptive heuristics that selectively explore only the most promising orientations based on preliminary score estimates. Overall, the work provides a theoretically sound and practically viable framework for MEC‑level Bayesian network structure learning, opening the door to more efficient and scalable causal discovery in high‑dimensional domains.