A Bayes factor with reasonable model selection consistency for ANOVA model

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the [Original Paper Viewer] below or the original arXiv source.

For the ANOVA model, we propose a new g-prior-based Bayes factor without an integral representation, with reasonable model selection consistency in all asymptotic regimes (the number of levels of the factor and/or the number of replications per level going to infinity). An exact analytic calculation of the marginal density under a special choice of priors makes such a Bayes factor possible.


💡 Research Summary

The paper addresses a long‑standing difficulty in Bayesian model selection for one‑way ANOVA designs: existing Bayes factors based on Zellner’s g‑prior either require cumbersome numerical integration or suffer from inconsistency when the number of factor levels (K) and/or the number of replications per level (n_k) grow. To overcome these limitations, the authors introduce a novel, data‑dependent g‑prior that yields a closed‑form expression for the marginal likelihood, and consequently for the Bayes factor, without any integral representation.

Methodologically, the authors consider the standard ANOVA linear model Y = Xβ + ε, with ε ∼ N(0, σ²I). They place a multivariate normal prior on the regression coefficients β conditional on σ² and a hyperparameter g: β | σ², g ∼ N(0, gσ² (X’X)⁻¹). For the error variance σ² they adopt an inverse‑gamma prior IG(a, b). The crucial innovation lies in the specification of g as a deterministic function of the design dimensions, namely g = K·n̄/(K+1) (or equivalent forms), where n̄ is the average replication per level. This choice makes the prior automatically adapt to the amount of information in the data: as K or n̄ increase, g grows proportionally, preventing the prior from becoming overly diffuse or overly concentrated.
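The data-dependent choice of g described above can be sketched in a few lines. This is an illustrative snippet, not the authors' code; the helper name `g_hyperparameter` is hypothetical, and the formula g = K·n̄/(K+1) is taken as stated in this summary.

```python
import numpy as np

# Illustrative sketch (hypothetical helper name), assuming the
# data-dependent g described above: g = K * n_bar / (K + 1).
def g_hyperparameter(n_per_level):
    """Data-dependent g for a one-way ANOVA design.

    n_per_level: replication counts n_k for each of the K levels.
    n_bar is the average replication per level.
    """
    K = len(n_per_level)
    n_bar = np.mean(n_per_level)
    return K * n_bar / (K + 1)

# Example: 5 levels with 10 replications each gives g = 5 * 10 / 6.
n_per_level = [10, 10, 10, 10, 10]
g = g_hyperparameter(n_per_level)
```

Note how g scales with both K and n̄, so the prior concentration tracks the information content of the design, as the paragraph above explains.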

Because of this specific functional form, the integration over β and σ² can be carried out analytically. The resulting marginal likelihood for a candidate model M takes the compact form

  m(Y|M) = C·(1+g)^{-(K-1)/2}·(1+g·R²)^{-(N-1)/2},

where C is a constant that does not depend on the model, R² is the usual coefficient of determination for the full model, and N is the total sample size. The Bayes factor comparing any reduced model to the full ANOVA model is therefore a simple ratio of two such expressions, involving only g and the corresponding R² values. This eliminates the need for Monte‑Carlo or Laplace approximations, making the method computationally trivial even for large K.
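The closed form above is cheap to evaluate, and the constant C cancels in any Bayes factor. A minimal sketch, assuming the formula exactly as stated in this summary (the function names are hypothetical):

```python
import numpy as np

# log m(Y|M) - log C, following the closed form stated above:
#   m(Y|M) = C * (1+g)^{-(K-1)/2} * (1 + g*R^2)^{-(N-1)/2}
def log_marginal(g, K, N, r2):
    return -0.5 * (K - 1) * np.log1p(g) - 0.5 * (N - 1) * np.log1p(g * r2)

# log Bayes factor of model 1 vs model 2: the model-free constant C
# cancels, leaving a simple difference of the two log marginals.
def log_bayes_factor(g1, K1, r2_1, g2, K2, r2_2, N):
    return log_marginal(g1, K1, N, r2_1) - log_marginal(g2, K2, N, r2_2)
```

Because only g, K, N, and the R² values enter, the comparison involves no integration or simulation, which is the computational point made above.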

The theoretical contribution is a set of consistency theorems covering three asymptotic regimes: (i) K fixed, n_k → ∞; (ii) n_k fixed, K → ∞; and (iii) both K and n_k → ∞ simultaneously, possibly at different rates. In each case the authors prove that the proposed Bayes factor BF* converges in probability to infinity for the true model and to zero for any misspecified model, thereby guaranteeing model‑selection consistency. The proofs rely on asymptotic expansions of the log‑Bayes factor, showing that the dominant term of the log marginal likelihood is −(K−1)/2·log(1+g) − (N−1)/2·log(1+g·R²), which behaves appropriately under the chosen scaling of g.
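The divergence of the log Bayes factor in regime (i) can be illustrated numerically. This sketch uses stylized, fixed R² values as placeholders (not results from the paper) and the closed-form marginal stated earlier, up to the constant C:

```python
import numpy as np

# Stylized sketch of regime (i): K fixed, n_k -> infinity. The R^2
# values below are illustrative placeholders, not from the paper.
def log_marginal(g, K, N, r2):
    return -0.5 * (K - 1) * np.log1p(g) - 0.5 * (N - 1) * np.log1p(g * r2)

K = 5
# Under the stated closed form, a smaller g*R^2 term yields a larger marginal.
r2_better, r2_worse = 0.1, 0.4
gaps = []
for n_k in (10, 100, 1000):
    N = K * n_k
    g = K * n_k / (K + 1)  # data-dependent g grows with the replication count
    gaps.append(log_marginal(g, K, N, r2_better) - log_marginal(g, K, N, r2_worse))

# gaps grows without bound: the log Bayes factor favouring the
# better-fitting model diverges as n_k increases, mirroring the
# consistency result described above.
```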

Empirical validation is performed through extensive simulations and two real‑data applications. Simulations vary K (5, 10, 20, 50) and n_k (2, 5, 10, 20) across a wide range of signal‑to‑noise ratios. The proposed BF* consistently outperforms traditional information criteria (AIC, BIC) and other Bayesian approaches such as the Zellner‑Siow prior and the hyper‑g prior, especially in settings with many factor levels but limited replications. In the real‑data examples—an educational experiment with multiple classrooms and a genetics study involving many gene variants—the new Bayes factor selects parsimonious yet substantively meaningful models, aligning with domain‑expert expectations.

In summary, the paper makes three key contributions: (1) a data‑dependent g‑prior that yields a closed‑form marginal likelihood for ANOVA models; (2) rigorous proofs of model‑selection consistency under all plausible asymptotic configurations of K and n_k; and (3) comprehensive simulation and real‑data evidence that the resulting Bayes factor is both computationally efficient and statistically superior to existing Bayesian and frequentist model‑selection tools. These results provide a practical and theoretically sound solution for researchers dealing with high‑dimensional factor designs in experimental science.

