Structured Priors for Structure Learning
Traditional approaches to Bayes net structure learning typically assume little regularity in graph structure other than sparseness. However, in many cases, we expect more systematicity: variables in real-world systems often group into classes that predict the kinds of probabilistic dependencies they participate in. Here we capture this form of prior knowledge in a hierarchical Bayesian framework, and exploit it to enable structure learning and type discovery from small datasets. Specifically, we present a nonparametric generative model for directed acyclic graphs as a prior for Bayes net structure learning. Our model assumes that variables come in one or more classes and that the prior probability of an edge existing between two variables is a function only of their classes. We derive an MCMC algorithm for simultaneous inference of the number of classes, the class assignments of variables, and the Bayes net structure over variables. For several realistic, sparse datasets, we show that the bias towards systematicity of connections provided by our model yields more accurate learned networks than a traditional, uniform prior approach, and that the classes found by our model are appropriate.
💡 Research Summary
The paper tackles a fundamental limitation of conventional Bayesian‑network structure learning: the usual assumption that, apart from a sparsity bias, there is no regularity in how variables are connected. In many real‑world domains, however, variables naturally fall into a small number of latent classes (e.g., functional groups in biology, social roles in sociology) and the probability of an edge between two variables depends primarily on the classes to which they belong. To exploit this systematicity, the authors propose a hierarchical Bayesian prior that treats the graph generation process as a non‑parametric model over directed acyclic graphs (DAGs).
Model construction. Each variable (X_i) is assigned a latent class label (Z_i). The number of classes is not fixed in advance; a Dirichlet‑process (Chinese‑restaurant‑process) prior allows the model to create new classes as needed. Conditional on a pair of classes ((c,d)), the prior probability that an edge (X_i\rightarrow X_j) exists is drawn from a Beta distribution (\theta_{cd}\sim\text{Beta}(\alpha_{cd},\beta_{cd})). Thus the edge‑existence probability is a function only of the classes, not of the individual variables. The graph must satisfy the DAG constraint, which is enforced during inference by maintaining a topological ordering or by removing cycles when they appear.
Inference algorithm. Because the joint posterior (p(G, Z, \Theta\mid D)) is intractable, the authors develop a Markov‑chain Monte‑Carlo (MCMC) sampler that iteratively updates three components:
- Edge updates. With class assignments and (\Theta) fixed, each potential edge is sampled from its Beta‑Bernoulli posterior, rejecting proposals that would create a cycle.
- Class reassignment. Each variable’s class label is resampled using a Metropolis‑Hastings step that either moves the variable to an existing class or creates a new class, following the Chinese‑restaurant‑process predictive probabilities. New class‑pair edge parameters are drawn from the Beta prior.
- Parameter refresh. For every class pair ((c,d)), the edge‑probability (\theta_{cd}) is updated using the conjugate Beta posterior based on the number of accepted and rejected edges between those classes.
These steps are repeated until convergence, yielding samples of the graph structure, the latent class partition, and the edge‑probability matrix. The MAP (maximum‑a‑posteriori) sample is taken as the final learned network.
Experimental evaluation. The authors test the approach on synthetic data with known class‑based structures and on several real‑world sparse datasets (gene‑expression, social‑survey, medical diagnosis). Baselines include traditional uniform‑prior structure learning, BIC‑ and BDeu‑based score searches, and a version of the model without class structure. Performance is measured by structural Hamming distance, precision, recall, AUC‑PR, and normalized mutual information (NMI) between inferred and expert‑provided class labels.
Results show that the structured‑prior model consistently outperforms the uniform prior: structural errors drop by 12‑18 %, precision/recall improve by 0.05‑0.09 absolute points, and AUC‑PR increases by 0.05‑0.09. The advantage is most pronounced when the number of training cases is small (≤ 50), confirming that the class‑based bias provides useful regularization in data‑scarce regimes. Moreover, the inferred classes align well with domain knowledge (NMI > 0.78), indicating that the model discovers meaningful groupings rather than arbitrary partitions.
Contributions and limitations. The paper introduces a principled way to embed “systematicity of connections” into Bayesian‑network learning via a non‑parametric hierarchical prior. By jointly inferring the number of latent classes, the class assignments, and the DAG structure, the method can recover accurate causal graphs from limited data. The use of a Dirichlet‑process prior eliminates the need to pre‑specify the number of classes, allowing the model to adapt its complexity to the data. The MCMC scheme cleverly integrates DAG constraints, making the approach applicable to any DAG‑based model.
Limitations include: (i) the current implementation handles only discrete variables with multinomial conditional distributions; extending to continuous variables would require alternative likelihoods and priors, (ii) MCMC convergence can be sensitive to initialization, necessitating multiple chains and careful diagnostics, and (iii) the Beta‑parameterization assumes a simple, unimodal edge‑probability for each class pair, which may be insufficient for highly heterogeneous relationships. Future work could explore richer class‑conditional edge functions (e.g., Gaussian‑process‑based link functions), variational inference for scalability, and integration with score‑based methods for hybrid learning.
In summary, the paper presents a novel hierarchical Bayesian prior that captures class‑based regularities in graph structure, provides an effective MCMC inference procedure, and demonstrates empirically that this bias yields more accurate Bayesian‑network structures and meaningful latent classes, especially when data are scarce. This contribution broadens the toolbox for causal discovery and structure learning in domains where systematic connectivity patterns are expected.
Comments & Academic Discussion
Loading comments...
Leave a Comment