A Bayes factor with reasonable model selection consistency for ANOVA model
For the ANOVA model, we propose a new g-prior based Bayes factor without integral representation, with reasonable model selection consistency in any asymptotic situation (the number of levels of the factor and/or the number of replications in each level goes to infinity). Exact analytic calculation of the marginal density under a special choice of the priors enables such a Bayes factor.
Research Summary
The paper addresses a long-standing difficulty in Bayesian model selection for one-way ANOVA designs: existing Bayes factors based on Zellner's g-prior either require cumbersome numerical integration or suffer from inconsistency when the number of factor levels (K) and/or the number of replications per level (n_k) grow. To overcome these limitations, the authors introduce a novel, data-dependent g-prior that yields a closed-form expression for the marginal likelihood, and consequently for the Bayes factor, without any integral representation.
Methodologically, the authors consider the standard ANOVA linear model Y = Xβ + ε, with ε ∼ N(0, σ²I). They place a multivariate normal prior on the regression coefficients β conditional on σ² and a hyperparameter g: β | σ², g ∼ N(0, gσ² (X'X)⁻¹). For the error variance σ² they adopt an inverse-gamma prior IG(a, b). The crucial innovation lies in the specification of g as a deterministic function of the design dimensions, namely g = K·\bar n/(K+1) (or equivalent forms), where \bar n is the average replication per level. This choice makes the prior automatically adapt to the amount of information in the data: g grows in proportion to \bar n and increases with K toward the limit \bar n, preventing the prior from becoming overly diffuse or overly concentrated.
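As a minimal sketch of the deterministic choice of g described above, the following assumes the form g = K·\bar n/(K+1) exactly as quoted in this summary; the paper may use an equivalent form. The function name `g_from_design` and its input format (a list of per-level replication counts) are hypothetical and chosen only for illustration.

```python
def g_from_design(group_sizes):
    """Return g = K * n_bar / (K + 1) for a one-way layout.

    group_sizes: list of replication counts n_k, one entry per factor level.
    The formula is the one quoted in the summary; it is not verified
    against the paper itself.
    """
    K = len(group_sizes)
    if K == 0:
        raise ValueError("at least one factor level is required")
    n_bar = sum(group_sizes) / K  # average replication per level
    return K * n_bar / (K + 1)


# Example: 3 levels with 4 replications each gives g = 3 * 4 / 4 = 3.0
print(g_from_design([4, 4, 4]))
```

Because K·\bar n equals the total sample size N, the same quantity can be written as N/(K+1), which may be the more convenient form for unbalanced designs.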
Because of this specific functional form, the integration over β and σ² can be carried out analytically. The resulting marginal likelihood for a candidate model M takes the compact form
  m(Y|M) = C·(1+g)^{-(K-1)/2}·(1+g·R²)^{-(N-1)/2},
where C is a constant that does not depend on the model, R² is the usual coefficient of determination for the full model, and N is the total sample size. The Bayes factor comparing any reduced model to the full ANOVA model is therefore a simple ratio of two such expressions, involving only g and the corresponding R² values. This eliminates the need for Monte-Carlo or Laplace approximations, making the method computationally trivial even for large K.
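The closed form can be evaluated on the log scale in a few lines. The sketch below takes the expression for m(Y|M) at face value as stated in this summary and drops the model-independent constant C, which cancels in any ratio. The function names, the use of a single shared g for both models, and the per-model values `K_a`, `K_b` are assumptions made for illustration, not details confirmed by the paper.

```python
import math


def log_marginal_kernel(g, r2, K, N):
    """log m(Y|M) - log C, using the formula quoted in the summary:
    m(Y|M) = C * (1+g)^(-(K-1)/2) * (1 + g*R^2)^(-(N-1)/2).
    """
    return -0.5 * (K - 1) * math.log1p(g) - 0.5 * (N - 1) * math.log1p(g * r2)


def log_bayes_factor(g, r2_a, K_a, r2_b, K_b, N):
    """Log Bayes factor of model A against model B.

    Assumes both models share the same g and N (an assumption of this sketch);
    the constant C cancels in the ratio.
    """
    return log_marginal_kernel(g, r2_a, K_a, N) - log_marginal_kernel(g, r2_b, K_b, N)
```

Working with logarithms avoids underflow when N is large, since the second factor is raised to a power of order N.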
The theoretical contribution is a set of consistency theorems covering three asymptotic regimes: (i) K fixed, n_k → ∞; (ii) n_k fixed, K → ∞; and (iii) both K and n_k → ∞ simultaneously, possibly at different rates. In each case the authors prove that the proposed Bayes factor BF* converges in probability to infinity for the true model and to zero for any misspecified model, thereby guaranteeing model-selection consistency. The proofs rely on asymptotic expansions of the log-Bayes factor, showing that the dominant term is proportional to (K−1)·log(1+g) + (N−1)·log(1+g·R²), which behaves appropriately under the chosen scaling of g.
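To make the scaling of g in these regimes concrete, the following lines take the logarithm of the marginal likelihood as stated in this summary and record how g = K·\bar n/(K+1) behaves in the two extreme regimes. This is only a restatement of the summary's formulas under the identity N = K·\bar n, not a reproduction of the paper's proofs.

```latex
\begin{align*}
\log m(Y \mid M) &= \log C - \frac{K-1}{2}\log(1+g) - \frac{N-1}{2}\log\bigl(1+g R^2\bigr), \\
g &= \frac{K\bar n}{K+1} = \frac{N}{K+1} \qquad (N = K\bar n), \\
g &\to \infty \quad (K \text{ fixed},\ \bar n \to \infty), \qquad
g \to \bar n \quad (\bar n \text{ fixed},\ K \to \infty).
\end{align*}
```

The constant \log C cancels when two models are compared, so only the two logarithmic terms enter the log-Bayes factor.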
Empirical validation is performed through extensive simulations and two real-data applications. Simulations vary K (5, 10, 20, 50) and n_k (2, 5, 10, 20) across a wide range of signal-to-noise ratios. The proposed BF* consistently outperforms traditional information criteria (AIC, BIC) and other Bayesian approaches such as the Zellner-Siow prior and the hyper-g prior, especially in settings with many factor levels but limited replications. In the real-data examples (an educational experiment with multiple classrooms and a genetics study involving many gene variants), the new Bayes factor selects parsimonious yet substantively meaningful models, aligning with domain-expert expectations.
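A simulation over the (K, n_k) grid mentioned above could be set up along the following lines. Only the grid values come from this summary; the data-generating mechanism (normally distributed level means and errors), the parameter names `effect_sd` and `noise_sd`, and the function names are assumptions of this sketch and are not the authors' simulation design.

```python
import random


def simulate_one_way(K, n, effect_sd, noise_sd, seed=0):
    """Generate balanced one-way ANOVA data: K levels, n replications each.

    Level means are drawn from N(0, effect_sd^2) and observations from
    N(level mean, noise_sd^2). Both choices are illustrative assumptions.
    """
    rng = random.Random(seed)
    groups = []
    for _ in range(K):
        mu = rng.gauss(0.0, effect_sd)
        groups.append([rng.gauss(mu, noise_sd) for _ in range(n)])
    return groups


def r_squared(groups):
    """Coefficient of determination of the one-way model: SS_between / SS_total."""
    values = [y for grp in groups for y in grp]
    grand = sum(values) / len(values)
    sst = sum((y - grand) ** 2 for y in values)
    ssb = sum(len(grp) * (sum(grp) / len(grp) - grand) ** 2 for grp in groups)
    return ssb / sst if sst > 0 else 0.0


# Grid of design sizes quoted in the summary.
GRID = [(K, n) for K in (5, 10, 20, 50) for n in (2, 5, 10, 20)]
```

The ratio `effect_sd / noise_sd` plays the role of a signal-to-noise setting here; each grid cell would be repeated over many seeds before comparing selection rates.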
In summary, the paper makes three key contributions: (1) a data-dependent g-prior that yields a closed-form marginal likelihood for ANOVA models; (2) rigorous proofs of model-selection consistency under all plausible asymptotic configurations of K and n_k; and (3) comprehensive simulation and real-data evidence that the resulting Bayes factor is both computationally efficient and statistically superior to existing Bayesian and frequentist model-selection tools. These results provide a practical and theoretically sound solution for researchers dealing with high-dimensional factor designs in experimental science.