In this paper, we extend to generalized linear models (including logistic and other binary regression models, Poisson regression and gamma regression models) the robust model selection methodology developed by Müller and Welsh (2005, JASA) for linear regression models. As in Müller and Welsh (2005), we combine a robust penalized measure of fit to the sample with a robust measure of out-of-sample predictive ability which is estimated using a post-stratified m-out-of-n bootstrap. A key idea is that the method can be used to compare different estimators (robust and nonrobust) as well as different models. Even when specialized back to linear regression models, the methodology presented in this paper improves on that of Müller and Welsh (2005). In particular, we use a new bias-adjusted bootstrap estimator which avoids the need to centre the explanatory variables and to include an intercept in every model. We also use more sophisticated arguments than Müller and Welsh (2005) to establish an essential monotonicity condition.
Model selection is fundamental to the practical application of statistics and there is a substantial literature on the selection of linear regression models. A growing part of this literature is concerned with robust approaches to selecting linear regression models: see Müller and Welsh (2005) for references. The literature on the selection of generalized linear models (GLM; McCullagh and Nelder, 1989) and the related marginal models fitted by generalized estimating equations (GEE; Liang and Zeger, 1986), though both are widely used, is much smaller and has only recently incorporated robustness considerations. Hurvich and Tsai (1995) and Pan (2001) presented approaches based on the bootstrap and cross-validation, respectively. Our purpose in this paper is to generalize the robust bootstrap model selection criterion of Müller and Welsh (2005) to generalized linear models.
The extension of the methodology of Müller and Welsh (2005) from linear regression to generalized linear models is less straightforward than we expected and, as a result, the present paper differs from Müller and Welsh (2005) in two important respects. First, the bias-adjusted m-out-of-n bootstrap estimator $\hat{\beta}^{c*}_{\alpha,m} - E^*(\hat{\beta}^{c*}_{\alpha,m} - \hat{\beta}^{c}_{\alpha})$, rather than the m-out-of-n bootstrap estimator $\hat{\beta}^{c*}_{\alpha,m}$, is used in estimating the expected prediction loss $M_n(\alpha)$ (definitions are given in Section 2). As discussed in more detail in Section 3.2, this achieves the same purpose but avoids the centering of the explanatory variables and the requirement that an intercept be included in every model, both of which were used in Müller and Welsh (2005). Second, we present a simpler, more general method than that used in Müller and Welsh (2005) for showing that the consistency result applies to particular robust estimators of the regression parameter. As discussed in Section 3.3, we use generalized inverse matrices to decompose the asymptotic variance of the estimator into terms which are easier to handle, write the critical trace term as a simple sum, and then show that the terms in this sum have the required properties. Both of these changes were necessitated by the more complicated structure of generalized linear models, but they also apply to linear regression models, where they represent improvements to the methodology of Müller and Welsh (2005).
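To make the bias adjustment concrete, the following sketch shows a Monte Carlo approximation of the bias-adjusted bootstrap replicates $\hat{\beta}^{c*}_{\alpha,m} - E^*(\hat{\beta}^{c*}_{\alpha,m} - \hat{\beta}^{c}_{\alpha})$ for a generic fitting routine. It is a minimal illustration only: it resamples observations with replacement, whereas the paper uses a post-stratified m-out-of-n bootstrap, and the function and argument names (`bias_adjusted_bootstrap`, `estimator`, `B`) are ours rather than the paper's.

```python
import numpy as np

def bias_adjusted_bootstrap(y, X_alpha, estimator, m, B=200, seed=None):
    """Monte Carlo sketch of the bias-adjusted m-out-of-n bootstrap.

    estimator(y, X) is any fitting routine (robust or not) returning a
    coefficient vector.  Plain resampling with replacement is used here;
    the paper's post-stratified resampling scheme is not implemented.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_full = estimator(y, X_alpha)              # full-sample estimate beta^c_alpha
    boot = np.empty((B, beta_full.shape[0]))
    for b in range(B):
        idx = rng.choice(n, size=m, replace=True)  # m-out-of-n resample
        boot[b] = estimator(y[idx], X_alpha[idx])  # bootstrap estimate beta^{c*}_{alpha,m}
    bias = boot.mean(axis=0) - beta_full           # Monte Carlo estimate of E*(beta* - beta)
    return boot - bias                             # bias-adjusted bootstrap replicates
```

Because the bias is removed replicate by replicate, no centering of the explanatory variables and no intercept term is needed to make the bootstrap replicates comparable across candidate models.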
Suppose that we have $n$ independent observations $y = (y_1, \ldots, y_n)^T$ and an $n \times p$ matrix $X$ whose columns we index by $\{1, \ldots, p\}$. Let $\alpha$ denote any subset of $p_\alpha$ distinct elements from $\{1, \ldots, p\}$ and let $X_\alpha$ denote the $n \times p_\alpha$ matrix with columns given by the columns of $X$ whose indices appear in $\alpha$. Let $x_{\alpha i}^T$ denote the $i$th row of $X_\alpha$. Then a generalized linear regression model $\alpha$ for the relationship between the response variable $y$ and explanatory variables $X$ is specified by
$$
E\, y_i = h(\eta_i), \quad \operatorname{Var}\, y_i = \sigma^2 v^2(\eta_i) \quad \text{with } \eta_i = x_{\alpha i}^T \beta_\alpha, \quad i = 1, \ldots, n, \qquad (1)
$$
where $\beta_\alpha$ is an unknown $p_\alpha$-vector of regression parameters. Here $h$ is the inverse of the usual link function and, for simplicity, we have reduced notation by absorbing $h$ into the variance function $v$. Both $h$ and $v$ are assumed known. Let $A$ denote a set of generalized linear regression models for the relationship between $y$ and $X$. The purpose of model selection is to choose one or more models $\alpha$ from $A$ with specified desirable properties.
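For concreteness, the sketch below writes out $h$ and $v^2$ for two familiar instances of (1), the logistic (binary) and Poisson (log link) models, with the variance function expressed directly as a function of $\eta$ as in the text; the function names are ours and are purely illustrative.

```python
import numpy as np

# Logistic regression: E y_i = h(eta_i) with h the inverse logit link,
# and Var y_i = sigma^2 v^2(eta_i) with sigma^2 = 1.
def h_logistic(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def v2_logistic(eta):
    mu = h_logistic(eta)
    return mu * (1.0 - mu)

# Poisson regression with log link: the mean and variance functions coincide.
def h_poisson(eta):
    return np.exp(eta)

def v2_poisson(eta):
    return np.exp(eta)
```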
Our perspective on model selection is that a useful model should (i) parsimoniously describe the relationship between the sample data $y$ and $X$ and (ii) be able to predict independent new observations. The ability to parsimoniously describe the relationship between the sample data can be measured by applying a penalised loss function to the observed residuals, and we use the expected variance-weighted prediction loss to measure the ability to predict new observations.

We define a class of robust model selection criteria in Section 2, present our theoretical results in Section 3, report the results of a simulation study in Section 4, present a real data example in Section 5, and conclude with a short discussion and some brief remarks in Section 6.
Let $\hat{\beta}^{c}_{\alpha}$ denote an estimator of type $c \in C$ of $\beta_\alpha$ under (1), let $\sigma$ be a scale parameter, let $\rho$ be a nonnegative loss function, let $\delta$ be a specified function of the sample size $n$, let $\hat{\sigma}$ denote a measure of spread of the data, and let $y^{\dagger}$ be a vector of future observations at $X$ which are independent of $y$. Then, we choose models $\alpha$ from a set $A$ for which the criterion function
is small. In practice, we often supplement this criterion with graphical diagnostic methods which further explore the quality of the model in ways that are not amenable to simple mathematical description.
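Operationally, the selection step amounts to evaluating the criterion over the candidate column subsets $\alpha \in A$ (and, as discussed below, over estimator types $c \in C$) and retaining the models for which it is small. The sketch below shows only this outer loop and treats the criterion as a user-supplied black box; its actual form is the one defined in this section, and the signature `criterion(y, X_alpha, c)` is an assumption of the sketch.

```python
from itertools import combinations

def rank_models(y, X, criterion, estimator_types, max_size=None):
    """Evaluate a supplied selection criterion for every pair of estimator
    type c and column subset alpha, and return the pairs ordered so that
    models with small criterion values come first.
    """
    p = X.shape[1]
    max_size = p if max_size is None else max_size
    scores = {}
    for k in range(1, max_size + 1):
        for alpha in combinations(range(p), k):
            X_alpha = X[:, list(alpha)]
            for c in estimator_types:
                scores[(c, alpha)] = criterion(y, X_alpha, c)
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Exhaustive enumeration of subsets is of course only feasible for moderate $p$; in practice the set $A$ of candidate models would typically be restricted.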
As in Müller and Welsh (2005), we separate the estimators $\hat{\beta}^{c}_{\alpha}$ and the loss function $\rho$ because in practice we want to compare different estimators indexed by $c \in C$, and linking $\rho$ to any one of these estimators may excessively favour that estimator. We adopt the view that we are interested in fitting the core data and predicting core observations