Bayesian model comparison and model averaging for small-area estimation
This paper considers small-area estimation with lung cancer mortality data, and discusses the choice of upper-level model for the variation over areas. Inference about the random effects for the areas may depend strongly on the choice of this model, but this choice is not a straightforward matter. We give a general methodology for both evaluating the data evidence for different models and averaging over plausible models to give robust area effect distributions. We reanalyze the data of Tsutakawa [Biometrics 41 (1985) 69–79] on lung cancer mortality rates in Missouri cities, and show the differences in conclusions about the city rates from this methodology.
💡 Research Summary
The paper tackles a fundamental issue in small‑area estimation: the choice of the upper‑level (hierarchical) model that governs the distribution of area‑specific random effects. Using lung‑cancer mortality data from cities in Missouri, the authors demonstrate that inference about city‑level rates can be highly sensitive to this modeling decision, yet selecting an appropriate model is far from trivial.
To address the problem, the authors propose a two‑step Bayesian framework. First, several plausible upper‑level models are specified: (i) a simple normal distribution for the random effects, (ii) a Student‑t distribution with unknown degrees of freedom, and (iii) a finite mixture of two normal components. Each model is embedded in a hierarchical Poisson (or binomial) likelihood for the observed death counts, and vague priors are placed on all hyper‑parameters.
Second, the framework employs Bayesian model comparison and model averaging. Posterior samples are generated for each candidate model using Markov chain Monte Carlo (MCMC). Model fit is assessed via Bayes factors and posterior model probabilities, which quantify the evidence the data provide for each specification. The analysis reveals that the normal model under‑estimates the tail behaviour of the mortality rates, whereas the t‑distribution and the mixture model capture the heavy‑tailed variability observed in high‑mortality cities.
Model averaging then combines the posterior distributions of the random effects across models, weighting each by its posterior model probability. This produces a “robust” posterior for each city’s mortality rate that incorporates model‑selection uncertainty. Compared with the single‑model approach of Tsutakawa (1985), the averaged estimates have similar point estimates but noticeably wider credible intervals for the most extreme cities, reflecting the additional uncertainty about the correct upper‑level distribution.
The re‑analysis shows that conclusions about which cities have unusually high lung‑cancer mortality can change when model uncertainty is accounted for. In particular, cities previously flagged as outliers under the normal model remain outliers, but the evidence is less decisive because the credible intervals are broader. This has practical implications for public‑health policy: resource allocation decisions should consider not only point estimates but also the uncertainty arising from model choice.
Overall, the study provides a general, implementable methodology for evaluating competing hierarchical models and for producing model‑averaged small‑area estimates. By applying it to a classic dataset, the authors illustrate how Bayesian model comparison and averaging can yield more reliable and transparent inference, a lesson that extends to any field where small‑area or hierarchical estimation is required.
Comments & Academic Discussion
Loading comments...
Leave a Comment