Bayesian MAP Model Selection of Chain Event Graphs
The class of chain event graph models is a generalisation of the class of discrete Bayesian networks, retaining most of the structural advantages of the Bayesian network for model interrogation, propagation and learning, while more naturally encoding asymmetric state spaces and the order in which events happen. In this paper we demonstrate how with complete sampling, conjugate closed form model selection based on product Dirichlet priors is possible, and prove that suitable homogeneity assumptions characterise the product Dirichlet prior on this class of models. We demonstrate our techniques using two educational examples.
💡 Research Summary
The paper “Bayesian MAP Model Selection of Chain Event Graphs” addresses the challenging problem of selecting the most appropriate Chain Event Graph (CEG) model within a Bayesian framework. CEGs are a generalisation of discrete Bayesian networks (BNs) that explicitly encode the order of events and allow for asymmetric state spaces, making them especially suitable for domains such as medical diagnosis, educational pathways, reliability engineering, and any sequential decision‑making context where the underlying process does not conform to the symmetric product state space implicitly assumed by a BN.
The authors begin by reviewing the construction of a CEG from an event tree. An event tree enumerates every possible sequence of primitive events; CEGs then merge vertices (called “situations”) that share the same conditional probability structure, resulting in a more compact graph that still retains the full probabilistic semantics of the original tree. This merging creates a set of directed edges that represent conditional transitions, each of which can be parameterised by a probability vector.
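The merging step described above can be illustrated with a minimal Python sketch. The data structures and the toy probabilities here are hypothetical (they are not the paper's examples): each situation is identified by its path from the root and carries a conditional distribution over its outgoing edge labels, and situations with identical distributions are grouped together.

```python
from collections import defaultdict

# Hypothetical event tree: each situation (keyed by its root path)
# has a conditional distribution over the labels of its outgoing edges.
situations = {
    ():        {"pass": 0.7, "fail": 0.3},
    ("pass",): {"stay": 0.6, "leave": 0.4},
    ("fail",): {"stay": 0.6, "leave": 0.4},  # same distribution as ("pass",)
}

# Group situations whose conditional transition distributions coincide;
# each group of merged situations becomes a single node of the CEG.
stages = defaultdict(list)
for path, dist in situations.items():
    signature = tuple(sorted(dist.items()))  # hashable distribution key
    stages[signature].append(path)

for signature, members in stages.items():
    print(dict(signature), "<-", members)
```

Here `("pass",)` and `("fail",)` are merged because their conditional distributions agree, which is exactly the compression that makes a CEG smaller than its underlying event tree.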
A central obstacle to Bayesian model selection for CEGs has been the lack of a conjugate prior that respects the graph’s hierarchical, non‑regular structure. The paper proposes to adopt a product Dirichlet prior: for each situation \(s\) a Dirichlet distribution \(\text{Dir}(\alpha_{s1},\dots,\alpha_{sK_s})\) is placed on its outgoing transition probabilities, and the joint prior over the whole CEG is the product of these independent Dirichlet components. This prior is natural because (i) it is conjugate to the multinomial likelihood generated by counts of observed transitions, and (ii) it mirrors the factorisation of the CEG’s likelihood into situation‑specific terms.
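The independence of the situation-specific components means a draw from the product Dirichlet prior is just one independent Dirichlet draw per situation. A small sketch, using only the standard library (the situation names and hyper-parameter values are illustrative, not from the paper):

```python
import random

# Illustrative hyper-parameters: each situation s gets an independent
# Dir(alpha_s) prior over its K_s outgoing edges.
prior_alpha = {
    "s0": [1.0, 1.0],        # root situation with 2 outgoing edges
    "s1": [0.5, 0.5, 0.5],   # a later situation with 3 outgoing edges
}

def sample_dirichlet(alpha):
    """Draw from Dir(alpha) via normalised Gamma variates."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# One joint draw of all CEG transition probabilities from the prior:
theta = {s: sample_dirichlet(a) for s, a in prior_alpha.items()}
print(theta)
```

Each `theta[s]` is a probability vector over situation `s`'s outgoing edges, and the vectors for different situations are sampled independently, mirroring the factorised form of the prior.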
The authors introduce a “homogeneity assumption” that stipulates the Dirichlet hyper‑parameters are the same for all situations that are structurally equivalent (i.e., have the same number of outgoing edges and play the same role in the graph). Under this assumption, they prove that the product Dirichlet prior is the unique prior that satisfies a set of desirable invariance properties (exchangeability across symmetric situations, consistency under marginalisation, and factorisation according to the CEG topology). This theoretical result justifies the use of the product Dirichlet as a principled, non‑informative prior for CEGs.
With the prior in place, the paper shows that Bayesian model selection can be performed analytically when the data constitute a complete sample, that is, every root‑to‑leaf path of the underlying event tree is observed at least once. In this setting, the sufficient statistics for each situation are simply the observed transition counts. The posterior for each situation remains Dirichlet with parameters \(\alpha_{sj}+n_{sj}\), where \(n_{sj}\) is the count of transitions from situation \(s\) to child \(j\). The maximum a posteriori (MAP) estimate of the transition probabilities for a given CEG is then obtained by taking the mode of each posterior Dirichlet component, which has the closed‑form expression \((\alpha_{sj}+n_{sj}-1)\big/\bigl(\sum_{k}(\alpha_{sk}+n_{sk})-K_s\bigr)\).
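The conjugate update and the posterior mode for a single situation are simple enough to compute by hand; the sketch below uses illustrative hyper-parameters and counts (not the paper's data) for a situation with \(K_s = 3\) outgoing edges.

```python
# Illustrative values for one situation s with K_s = 3 outgoing edges.
alpha = [1.0, 1.0, 1.0]   # Dirichlet hyper-parameters alpha_{s1..sK_s}
counts = [12, 5, 3]       # observed transition counts n_{s1..sK_s}

# Conjugate update: posterior is Dir(alpha_sj + n_sj).
posterior = [a + n for a, n in zip(alpha, counts)]

# Mode of the posterior Dirichlet (valid when all posterior params > 1):
# (alpha_sj + n_sj - 1) / (sum_k (alpha_sk + n_sk) - K_s)
K = len(alpha)
total = sum(posterior)
map_probs = [(p - 1) / (total - K) for p in posterior]

print(map_probs)  # -> [0.6, 0.25, 0.15]
```

Note that the MAP probabilities sum to one by construction, since the numerators sum to exactly the denominator.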
Model comparison proceeds by evaluating the exact posterior probability of each candidate CEG. Because the Dirichlet–multinomial conjugacy yields a closed‑form marginal likelihood (the Dirichlet–multinomial integral), the posterior probability of a CEG \(G\) given data \(D\) is proportional to

\[
p(G)\,p(D \mid G) \;=\; p(G) \prod_{s \in G} \frac{\Gamma\!\bigl(\sum_{k}\alpha_{sk}\bigr)}{\Gamma\!\bigl(\sum_{k}(\alpha_{sk}+n_{sk})\bigr)} \prod_{j=1}^{K_s} \frac{\Gamma(\alpha_{sj}+n_{sj})}{\Gamma(\alpha_{sj})},
\]

where the product runs over the situations \(s\) of \(G\) and \(p(G)\) is the prior over candidate structures; the MAP model is the candidate maximising this score.
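Since the marginal likelihood factorises over situations, it can be evaluated stably in log space using log-gamma terms. A minimal sketch (the function name and toy counts are my own, not from the paper), assuming complete sampling so that the sufficient statistics are the per-situation transition counts:

```python
from math import lgamma

def log_marginal_likelihood(alpha_by_situation, counts_by_situation):
    """Log of the Dirichlet-multinomial marginal likelihood of a CEG,
    summed over its situations (complete-sampling setting)."""
    total = 0.0
    for alpha, counts in zip(alpha_by_situation, counts_by_situation):
        a_sum, n_sum = sum(alpha), sum(counts)
        # Normalising-constant ratio: Gamma(sum a) / Gamma(sum a + sum n)
        total += lgamma(a_sum) - lgamma(a_sum + n_sum)
        # Per-edge terms: Gamma(a_sj + n_sj) / Gamma(a_sj)
        total += sum(lgamma(a + n) - lgamma(a)
                     for a, n in zip(alpha, counts))
    return total

# Toy score for a one-situation CEG with uniform Dir(1, 1) prior
# and counts (7, 3); equals log(7! * 3! / 11!) = -log(1320).
score = log_marginal_likelihood([[1.0, 1.0]], [[7, 3]])
print(score)
```

Scoring each candidate CEG this way and adding a log-prior over structures gives the quantity whose maximiser is the MAP model.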