A Bayesian Network Scoring Metric That Is Based On Globally Uniform Parameter Priors
We introduce a new Bayesian network (BN) scoring metric called the Global Uniform (GU) metric. This metric is based on a particular type of default parameter prior. Such priors may be useful when a BN developer is not willing or able to specify domain-specific parameter priors. The GU parameter prior specifies that every prior joint probability distribution P consistent with a BN structure S is considered to be equally likely. Distribution P is consistent with S if P includes just the set of independence relations defined by S. We show that the GU metric addresses some undesirable behavior of the BDeu and K2 Bayesian network scoring metrics, which also use particular forms of default parameter priors. A closed form formula for computing GU for special classes of BNs is derived. Efficiently computing GU for an arbitrary BN remains an open problem.
💡 Research Summary
The paper addresses a long‑standing issue in Bayesian network (BN) structure learning: the choice of a default parameter prior when domain‑specific knowledge is unavailable or impractical to encode. Existing default‑prior scores, most notably BDeu and K2, are both instances of the Bayesian‑Dirichlet family: BDeu spreads an equivalent sample size uniformly across the Dirichlet hyper‑parameters, while K2 sets every hyper‑parameter to one, i.e., a uniform prior over each conditional probability table taken independently. Although mathematically convenient, these priors can produce unintuitive behavior. BDeu’s equal‑strength Dirichlet prior leads to a bias toward complex structures when variables have differing cardinalities, and it can over‑penalize models when data are sparse. K2’s locally uniform prior, while simpler, ignores the fact that a BN structure already encodes a set of conditional independencies; it treats the parameters of each table as equally likely in isolation, without regard to the joint distribution that the structure as a whole induces.
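To make the contrast concrete, here is a sketch (not code from the paper; function and variable names are ours) of the Bayesian‑Dirichlet local score that both metrics share, instantiated with K2’s hyper‑parameters (every α set to 1) and BDeu’s (an equivalent sample size N′ divided evenly over the q·r parameter cells of the node):

```python
from math import lgamma

def bd_local_score(counts, alpha):
    """Log marginal likelihood of one node under a Bayesian-Dirichlet prior.

    counts[j][k] = N_ijk, the number of cases with the node in state k while
    its parents are in configuration j; alpha[j][k] is the Dirichlet
    hyper-parameter for that cell.
    """
    score = 0.0
    for n_j, a_j in zip(counts, alpha):
        a_sum, n_sum = sum(a_j), sum(n_j)
        score += lgamma(a_sum) - lgamma(a_sum + n_sum)
        for n, a in zip(n_j, a_j):
            score += lgamma(a + n) - lgamma(a)
    return score

# Toy node with r = 2 states and q = 2 parent configurations:
counts = [[3, 1], [0, 4]]
q, r = len(counts), len(counts[0])

# K2: every hyper-parameter equals 1 (uniform over each CPT row).
k2 = bd_local_score(counts, [[1.0] * r for _ in range(q)])

# BDeu: equivalent sample size N' spread evenly over the q*r cells.
ess = 1.0
bdeu = bd_local_score(counts, [[ess / (q * r)] * r for _ in range(q)])

print(f"K2 log-score:   {k2:.4f}")
print(f"BDeu log-score: {bdeu:.4f}")
```

The two metrics differ only in the `alpha` table, yet they assign different scores to the same counts; GU instead changes the prior at the level of the joint distribution rather than the individual tables.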
To overcome these shortcomings, the authors propose the Global Uniform (GU) metric, which is derived from a “globally uniform” parameter prior. The GU prior is defined as follows: for a given BN structure S, every joint probability distribution P that is consistent with S (i.e., P exhibits exactly the independence relations encoded by S and no additional ones) is assigned the same prior density. In other words, the prior is uniform over the entire space of joint distributions that satisfy the structural constraints, rather than uniform over each conditional probability table taken independently.
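A two‑variable example (worked here for illustration; it does not appear in the source) shows that these two notions of uniformity genuinely differ. For the binary structure A → B with parameters θₐ = P(A=1), θ_{b|a} = P(B=1 | A=1), θ_{b|ā} = P(B=1 | A=0):

```latex
% Joint probabilities induced by the CPT parameters:
p_1 = \theta_a\,\theta_{b|a}, \qquad
p_2 = \theta_a\,(1-\theta_{b|a}), \qquad
p_3 = (1-\theta_a)\,\theta_{b|\bar a}, \qquad
p_4 = 1 - p_1 - p_2 - p_3 .
% The Jacobian of (p_1, p_2, p_3) with respect to
% (\theta_a, \theta_{b|a}, \theta_{b|\bar a}) has absolute determinant
\left|\det J\right| \;=\; \theta_a\,(1-\theta_a) \;=\; (p_1+p_2)(p_3+p_4),
% so a prior that is uniform on each CPT (K2-style) induces the joint-space density
\pi(p_1,p_2,p_3,p_4) \;\propto\; \frac{1}{(p_1+p_2)(p_3+p_4)},
% which is not uniform on the simplex.
```

The induced density concentrates mass where A is nearly deterministic (p₁+p₂ near 0 or 1), so locally uniform and globally uniform priors can rank structures differently.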
The paper first formalizes this prior and shows how the marginal likelihood P(D | S) can be expressed as an integral over the space of admissible joint distributions. Because that space is carved out by the normalization constraints on the CPTs together with the independence constraints of S, the integral reduces to a product of Beta/Dirichlet integrals in special cases. The authors derive closed‑form expressions for several important families of networks:
- Tree‑structured BNs – every non‑root node has exactly one parent, allowing the marginal likelihood to factor into a product of Beta functions.
- Single‑parent networks – more generally, where every node has at most one parent, the same factorization holds.
- Uniform‑cardinality networks – when all variables share the same number of states, the integral simplifies to a known Dirichlet normalization constant.
These derivations demonstrate that, for these classes, the GU score can be computed efficiently and exactly, providing a practical alternative to BDeu and K2 in many common modeling scenarios.
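To make the Beta‑function machinery concrete, here is a minimal sketch for the simplest case, a binary chain A → B. The per‑family integrals are of the form ∫₀¹ θ^{n₁}(1−θ)^{n₀} dθ = B(n₁+1, n₀+1); note that the paper’s exact GU closed form, including its structure‑dependent normalization, is not reproduced here, and the function and variable names are ours:

```python
from math import lgamma

def log_beta_integral(n1, n0):
    """log of  ∫₀¹ θ^n1 (1-θ)^n0 dθ  =  log B(n1+1, n0+1)."""
    return lgamma(n1 + 1) + lgamma(n0 + 1) - lgamma(n1 + n0 + 2)

def tree_log_marginal(edge_counts):
    """Sum of per-family log Beta integrals for a binary tree-structured BN.

    edge_counts: list of (n1, n0) sufficient statistics, one pair per free
    parameter (the root marginal plus one per parent state of each child).
    Sketch only: the GU score's structure-dependent normalization is omitted.
    """
    return sum(log_beta_integral(n1, n0) for n1, n0 in edge_counts)

# Toy sufficient statistics for a 2-node chain A -> B (binary variables):
stats = [(6, 4),   # root A: 6 ones, 4 zeros
         (5, 1),   # B given A = 1
         (1, 3)]   # B given A = 0
print(tree_log_marginal(stats))
```

Each term is the standard uniform‑Dirichlet integral; the closed forms described above build on integrals of exactly this type.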
The authors then conduct empirical experiments to illustrate the qualitative differences between GU, BDeu, and K2. In synthetic data sets with variables of heterogeneous cardinalities, BDeu tends to favor structures that over‑represent high‑cardinality variables, because dividing its equivalent sample size across the many parameter cells of high‑cardinality families distorts the effective prior strength for those variables. GU, by contrast, treats all variables uniformly at the joint‑distribution level, eliminating this bias. In low‑sample regimes, BDeu often collapses to overly simple structures (e.g., empty graphs) because of the strong influence of its prior, whereas GU maintains a more balanced trade‑off between fit and complexity, reflecting the true structural constraints without imposing extra penalties.
Despite these advantages, the paper acknowledges that computing the GU marginal likelihood for arbitrary directed acyclic graphs (DAGs) remains computationally challenging. For networks with multiple parents per node, the admissible joint‑distribution space becomes high‑dimensional, and the integral does not admit a simple closed form. The authors identify this as an open problem and suggest possible directions: variational approximations, Monte‑Carlo integration, or exploiting graph‑decomposition techniques to break the integral into tractable sub‑problems.
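One of these directions can be sketched. For a complete (fully connected) structure, which encodes no independencies, essentially every joint distribution is consistent with S, so the GU prior reduces to the uniform distribution on the probability simplex; a naive Monte Carlo estimate of P(D | S) then simply averages the data likelihood over uniformly drawn joints. The sketch below is illustrative only (names are ours, and restricting the samples to distributions consistent with a sparser S is precisely the hard part the paper leaves open); in this fully connected case the estimate can be checked against the exact Dirichlet integral:

```python
import random
from math import log, exp, lgamma

def mc_log_marginal(counts, n_samples=50_000, seed=0):
    """Naive Monte Carlo estimate of log P(D | complete structure).

    counts: data summarized as counts over the cells of the joint state space.
    Draws joint distributions uniformly from the simplex (normalized
    Exponential(1) draws are Dirichlet(1,...,1)) and averages the
    likelihood prod_k p_k^counts[k], using log-sum-exp for stability.
    """
    rng = random.Random(seed)
    k = len(counts)
    log_liks = []
    for _ in range(n_samples):
        e = [rng.expovariate(1.0) for _ in range(k)]
        s = sum(e)
        log_liks.append(sum(n * log(x / s) for n, x in zip(counts, e)))
    m = max(log_liks)
    return m + log(sum(exp(ll - m) for ll in log_liks) / n_samples)

def exact_log_marginal(counts):
    """Exact answer for the uniform-simplex prior:
    P(D) = (k-1)! * prod_k counts[k]! / (N + k - 1)!."""
    k, n = len(counts), sum(counts)
    return lgamma(k) + sum(lgamma(c + 1) for c in counts) - lgamma(n + k)

counts = [5, 2, 1, 2]  # toy counts over a 4-cell joint state space
print(mc_log_marginal(counts), exact_log_marginal(counts))
```

For sparser structures the sampler would additionally have to stay inside the S‑consistent subset of the simplex, which is where the open problem lies.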
In conclusion, the Global Uniform metric introduces a principled, structure‑aware default prior that resolves several pathological behaviors of existing default‑prior scores. By uniformly weighting all joint distributions consistent with a given BN structure, GU aligns the prior more closely with the information already encoded in the graph, leading to more sensible model selection, especially when variable cardinalities differ or data are scarce. The closed‑form results for trees, single‑parent, and uniform‑cardinality networks make GU immediately applicable in many practical settings, while the open question of efficient computation for general DAGs offers a fertile avenue for future research.