ABC likelihood-free methods for model choice in Gibbs random fields

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Gibbs random fields (GRF) are polymorphous statistical models that can be used to analyse different types of dependence, in particular for spatially correlated data. However, when those models are faced with the challenge of selecting a dependence structure from many, the use of standard model choice methods is hampered by the unavailability of the normalising constant in the Gibbs likelihood. In particular, from a Bayesian perspective, the computation of the posterior probabilities of the models under competition requires special likelihood-free simulation techniques like the Approximate Bayesian Computation (ABC) algorithm that is intensively used in population genetics. We show in this paper how to implement an ABC algorithm geared towards model choice in the general setting of Gibbs random fields, demonstrating in particular that there exists a sufficient statistic across models. The accuracy of the approximation to the posterior probabilities can be further improved by importance sampling on the distribution of the models. The practical aspects of the method are detailed through two applications, the test of an iid Bernoulli model versus a first-order Markov chain, and the choice of a folding structure for two proteins.


💡 Research Summary

Gibbs random fields (GRFs) are powerful probabilistic models for describing complex dependence structures, especially in spatial and network data. A major obstacle to their use in Bayesian model selection is the intractability of the normalising constant (partition function), which prevents direct evaluation of model likelihoods and consequently the computation of posterior model probabilities. The paper addresses this challenge by adapting Approximate Bayesian Computation (ABC), a likelihood‑free simulation technique, to the problem of model choice among competing GRFs.

The authors first establish a crucial theoretical result: there exists a single sufficient statistic that is valid across all candidate GRF models, regardless of the specific graph topology or energy function. This common sufficient statistic enables a unified ABC procedure in which the same summary of the data is used to compare every model, dramatically simplifying implementation and ensuring that the distance metric used for acceptance is comparable across models.
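A compact way to state this result (a sketch under the standard assumption that each candidate GRF is in exponential-family form, with model-specific statistic S_m and intractable normalising constant Z_m):

```latex
f_m(x \mid \theta_m) \;=\; \frac{\exp\{\theta_m^{\top} S_m(x)\}}{Z_m(\theta_m)},
\qquad m = 0, \ldots, M,
\qquad
S(x) \;=\; \big(S_0(x), \ldots, S_M(x)\big)
```

The concatenated vector S(x) of the model-specific statistics is then sufficient for the joint parameter (m, θ_m), which is why a single summary can serve every model in the comparison.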

The proposed ABC algorithm proceeds as follows. A prior distribution is placed on the model index and on the model‑specific parameters. For each iteration a model is drawn from its prior, parameters are sampled, and synthetic data are generated using Gibbs sampling (or any appropriate MCMC method) under the chosen model. The sufficient statistic is computed for both the observed data and the synthetic data, and a distance (typically Euclidean) between the two summaries is evaluated. If the distance is less than a tolerance ε, the pair (model, parameters) is retained; otherwise it is discarded. After many repetitions, the proportion of retained draws belonging to each model provides an approximation to the posterior model probabilities p(Mk | y).
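The loop just described can be sketched in a few lines of Python. This is an illustrative rejection-ABC skeleton, not the authors' code: the simulators, the parameter priors, and the (here scalar) summary function are placeholders supplied by the caller, and for a real GRF the simulator would be a Gibbs/MCMC sampler.

```python
import random

def abc_model_choice(y_obs, simulators, prior_samplers, summary, n_iter, eps):
    """One rejection-ABC pass with a uniform prior on the model index.

    Returns the approximate posterior model probabilities, i.e. the
    acceptance frequencies of each model among the retained draws.
    """
    s_obs = summary(y_obs)
    accepted = []
    for _ in range(n_iter):
        m = random.randrange(len(simulators))   # draw a model from its prior
        theta = prior_samplers[m]()             # draw model-specific parameters
        s_sim = summary(simulators[m](theta, len(y_obs)))
        if abs(s_obs - s_sim) <= eps:           # scalar summary for simplicity
            accepted.append(m)
    counts = [accepted.count(m) for m in range(len(simulators))]
    total = sum(counts) or 1                    # guard against zero acceptances
    return [c / total for c in counts]
```

With a vector-valued summary, the acceptance test would use a (possibly normalized) Euclidean distance instead of `abs`.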

Because the choice of ε governs a trade‑off between acceptance rate and approximation accuracy, the authors augment the basic ABC scheme with importance sampling. The empirical distribution of retained model indices from an initial ABC run is used as a proposal distribution q(Mk). Each retained draw is then re‑weighted by w = π(Mk)/q(Mk) (where π denotes the original model prior) to correct for the proposal bias. This importance‑sampling correction reduces variance in the posterior estimates, especially when the number of competing models is large or when prior model probabilities are highly non‑uniform.
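The re-weighting step can be sketched as follows (a hedged illustration: `prior` and `proposal` are probability tables over model indices, and the function names are ours, not the paper's):

```python
def reweighted_model_probs(accepted_models, prior, proposal):
    """Importance-sampling correction for ABC model choice.

    Each retained model index m, drawn under proposal q, carries weight
    w = prior(m) / proposal(m); the corrected posterior probability of m
    is its total weight divided by the sum of all weights.
    """
    weights = {}
    for m in accepted_models:
        weights[m] = weights.get(m, 0.0) + prior[m] / proposal[m]
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}
```

For example, draws accepted under a proposal heavily skewed toward one model are down-weighted so that the estimate again targets the posterior under the original prior.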

Two illustrative applications demonstrate the practicality of the method. The first compares an i.i.d. Bernoulli model with a first‑order Markov chain for binary sequences. In this simple setting the sufficient statistic reduces to the total number of successes and the number of transitions, making the ABC implementation straightforward. Results show that, even with modest sample sizes, the ABC posterior probabilities closely match those obtained by exact enumeration, and the importance‑sampling step further improves accuracy for larger ε values.
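For this first application the joint summary can be computed directly (an illustrative helper, not the authors' code):

```python
def binary_summary(x):
    """Joint sufficient statistic for the iid Bernoulli model and the
    first-order Markov chain on a binary sequence: the total number of
    ones and the number of transitions between consecutive symbols."""
    ones = sum(x)
    transitions = sum(a != b for a, b in zip(x, x[1:]))
    return ones, transitions
```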

The second application tackles a biologically motivated problem: selecting the most plausible folding structure for two proteins from a set of competing GRF specifications. Here the sufficient statistic is high‑dimensional, encompassing contact‑pair frequencies, angular distributions, and other structural descriptors. The authors design a graph‑based summary and a normalized distance metric to feed into the ABC algorithm. Despite the complexity of the models, the ABC‑IS (importance‑sampling) approach successfully identifies the folding model that aligns with experimental evidence, illustrating the method’s scalability to realistic, high‑dimensional problems.
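The paper's exact metric is not reproduced here, but one natural reading of a "normalized" distance for a high-dimensional summary is a component-wise standardized Euclidean distance, where each descriptor is scaled (e.g. by its standard deviation across simulated summaries) so that no single component dominates:

```python
def normalized_distance(s_obs, s_sim, scales):
    """Euclidean distance between two summary vectors, with each
    component divided by its scale before squaring."""
    return sum(((a - b) / c) ** 2
               for a, b, c in zip(s_obs, s_sim, scales)) ** 0.5
```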

Overall, the paper contributes three major advances: (1) a proof of the existence of a common sufficient statistic for a broad class of GRFs, (2) a concrete ABC framework tailored to model choice in the presence of intractable likelihoods, and (3) an importance‑sampling refinement that enhances posterior probability estimation without excessive computational burden. By marrying rigorous theory with practical algorithmic design, the work opens a viable pathway for Bayesian model selection in settings where traditional likelihood‑based methods are infeasible, thereby extending the applicability of GRFs to a wider array of scientific domains.

