Risk and resampling under model uncertainty

IMS Collections
Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh
Vol. 3 (2008) 155–169
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000129

Snigdhansu Chatterjee¹ and Nitai D. Mukhopadhyay²
University of Minnesota and Virginia Commonwealth University

¹ School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church Street SE, Minneapolis, MN 55455, USA, e-mail: chatterjee@stat.umn.edu
² Department of Biostatistics, Virginia Commonwealth University, 730 E. Broad Street, Richmond, Virginia 23298, USA, e-mail: ndmukhopadhy@vcu.edu

AMS 2000 subject classifications: Primary 60F12; secondary 60J05, 62C10, 62F40.
Keywords and phrases: bootstrap, bounded risk, linear regression, model averaging, model selection.

Abstract: In statistical exercises where there are several candidate models, the traditional approach is to select one model using some data-driven criterion and use that model for estimation, testing and other purposes, ignoring the variability of the model selection process. We discuss some problems associated with this approach. An alternative scheme is to use a model-averaged estimator, that is, a weighted average of estimators obtained under different models, as an estimator of a parameter. We show that the risk associated with a Bayesian model-averaged estimator is bounded as a function of the sample size, when parameter values are fixed. We establish conditions which ensure that a model-averaged estimator's distribution can be consistently approximated using the bootstrap. A new, data-adaptive, model averaging scheme is proposed that balances efficiency of estimation without compromising applicability of the bootstrap. This paper illustrates that certain desirable risk and resampling properties of model-averaged estimators are obtainable when parameters are fixed but unknown; this complements several studies on minimaxity and other properties of post-model-selected and model-averaged estimators, where parameters are allowed to vary.

Contents
1 Introduction
2 Issues with model selection or averaging
3 Risk profile of model-averaged estimators
4 Adaptive model-averaged estimators and the bootstrap
5 A simulation example
6 Discussion and conclusions
Acknowledgments
References

1. Introduction

In typical statistical applications, it is rare that a precise model is available to fit to the data. Selecting one model from several competing models is often the first step in the process. However, in the subsequent analysis, it is common to ignore the variability in the initial model selection. Two of the many consequences of ignoring modeling variability are (i) under-estimation of the variability of estimators and predictors, and (ii) erroneous inference and prediction, resulting from incorrectly computing the distributions of estimators and predictors. An alternative to selecting a model first and then computing an estimator under that model is to consider several models and appropriately average the estimators computed under these models.

Several studies have been published recently on the properties of post-model-selected and model-averaged estimators; see, for example, [8], [23] and [24].
These studies are discouraging, as they show that many nice properties associated with estimators under a known model vanish when there is model uncertainty. For example, Yang [23] shows that consistent model selection/averaging and minimax-rate optimality cannot be simultaneously obtained. The review of Leeb and Pötscher [8] contains a discussion of several other problems with inference after model selection.

In view of these negative results, it seems desirable to scale down our expectations while working under model uncertainty, and strive for positive, if weaker, results. This may be achieved in one of two ways: we may either impose less stringent conditions on our estimators, or we may relax the criterion by which an estimator is evaluated. The latter is the goal of the present study.

The computation of an estimator is generally one of the early steps in a statistical exercise. Estimators of parameters are used for various purposes, notably for quantifying evidence for or against scientific hypotheses, obtaining interval estimates for the parameter under consideration, for prediction and forecasting, and for quantifying the accuracy of predictions and forecasts. These applications require knowledge about the distribution of the estimator, and knowledge about the risk associated with the usage of such estimators. In this paper, we concentrate on the risk behavior of a model-averaged estimator, and on approximating the distribution of a model-averaged estimator using the bootstrap.

In the first part of our study we show that under the traditional frequentist assumption that the parameters are fixed but unknown constants, the mean squared error in regression estimation under consistent model selection/averaging is bounded as a function of sample size.
This complements Yang [23], where it was shown that a similar quantity cannot achieve minimax-rate optimality. Several of the negative results, including those of Yang [23], arise when a parameter is a known constant in a smaller model, while it is allowed to vary in a local neighborhood of that constant in a larger model. Recently, Hjort and Claeskens [5] studied model-averaged estimators under a local parameter framework. Local parameters are ideal for mathematical development, but they are not reflective of statistical reality; see [17]. Indeed, as Hjort and Claeskens themselves remark in the rejoinder to the discussion of their paper, "a too literal belief in sample-size-dependent parameters would clash with Kolmogorov consistency and other requirements of natural statistical models" [5]. In view of this, it is meaningful to verify that estimators have reasonable risk behavior under consistent model selection/averaging when parameters are fixed constants. Our result also implies that integrated risks under consistent model selection/averaging are bounded, when integrals are taken with respect to any probability measure on the parameter space that does not depend on sample size.

In the second part of our study, in addition to the assumption that the parameters are fixed but unknown constants, we also weaken the consistency requirement of the model averaging procedure. In the terminology of Yang [23], a model selection/averaging scheme is consistent if it is asymptotically degenerate at the true model, when the true model is one of the candidate models. When the models are nested and several of them can correctly describe the data generation process, the most parsimonious correct model is taken as the true model. We call this strong consistency.
We define a model selection/averaging scheme as weakly consistent if it selects or averages over all candidate models that correctly describe the data generation process. When only one model is correct, the strong and weak consistency requirements are identical; but if models are nested and several of them are correct, a weakly consistent scheme may distribute weights among all of them while a strongly consistent one is asymptotically degenerate at the smallest one. Recently, Leung and Barron [11] proposed a scheme of model averaging that results in nice risk behavior. Their scheme is an example of a weakly consistent procedure. We show that a particular choice of a weakly consistent model-averaged estimator has a distribution that can be approximated using the bootstrap.

In Section 2 we propose a simple linear regression model framework to study model uncertainty. We also discuss some of the properties of post-model-selection estimators that make them unsuitable for further applications, and also some properties of model-averaged estimators. This is followed in Section 3 with a discussion of the mean squared error of the Bayesian model-averaged estimator. In Section 4 we propose a new adaptive, model-averaged estimator whose distribution may be consistently approximated using the bootstrap. A simulation example is discussed in Section 5. Finally, in Section 6 we discuss some aspects of our results, and point to some open issues relating to model uncertainty.

2. Issues with model selection or averaging

We select a simple regression framework for our study, which is the same as that used by [8], and similar to that of [24]. The observed data $\{(Y_t, x_t = (x_{t1}, x_{t2})^T),\ t = 1, \ldots, n\}$ are modeled as

$$Y_t = \alpha x_{t1} + \beta x_{t2} + e_t, \tag{2.1}$$

where the $e_t$'s are independent, identically distributed $N(0, \sigma^2)$, with $\sigma^2$ known.
The design matrix $X$ with rows given by $x_t^T = (x_{t1}, x_{t2})$ is non-random. We denote the two columns of $X$ as $X_1$ and $X_2$, the vector of errors as $e$, and the vector of observations as $Y$. The inner products and norms used below are the usual Euclidean ones. The notation $D$ is used for the determinant of $X^T X$, thus $D = \|X_1\|^2 \|X_2\|^2 - \langle X_1, X_2 \rangle^2$.

The unknown parameters in this model are $(\alpha, \beta)$. Model uncertainty surrounds the issue of whether or not $\beta = 0$. In this paper, for ease of presentation, we consider the problem of estimation of $\alpha$.

We make the standard assumption that $n^{-1} X^T X \to Q$ for a positive definite matrix $Q$. This, in particular, implies the standard design conditions

$$\|X_1\|^2 = O(n), \qquad \|X_2\|^2 = O(n), \tag{2.2}$$

$$\langle X_1, X_2 \rangle = O(n), \qquad D = \|X_1\|^2 \|X_2\|^2 - \langle X_1, X_2 \rangle^2 = O(n^2). \tag{2.3}$$

We also assume that $n^{-1} \langle X_1, X_2 \rangle \not\to 0$ as $n \to \infty$, since without this restriction the effect of model uncertainty vanishes in this framework.

The true model, called $M_0$, may be described as

$$M_0 = \begin{cases} U \text{ (unrestricted)} & \text{if } \beta \neq 0; \\ R \text{ (restricted)} & \text{if } \beta = 0. \end{cases}$$

Under $U$, we adopt the ordinary least squares or maximum likelihood estimators $\widehat{(\alpha, \beta)} = (X^T X)^{-1} X^T Y$. Our notation for these is $(\hat\alpha(U), \hat\beta(U))$. Under $R$, $\hat\beta(R) \equiv 0$, and the ordinary least squares or maximum likelihood estimator for $\alpha$ is $\hat\alpha(R) = [\sum_t x_{t1}^2]^{-1} \sum_t x_{t1} Y_t$. Define $V_1 = \sigma^{-1} \|X_1\|^{-1} \langle X_1, e \rangle$ and

$$V_2 = \sigma^{-1} D^{-1/2} \|X_1\| \left[ \langle X_2, e \rangle - \|X_1\|^{-2} \langle X_1, X_2 \rangle \langle X_1, e \rangle \right],$$

thus $V = (V_1, V_2)^T \sim N(0, I_2)$. In terms of $V$, the estimators are

$$\begin{pmatrix} \hat\alpha(R) \\ \hat\alpha(U) \\ \hat\beta(U) \end{pmatrix} = \begin{pmatrix} \alpha + \beta \|X_1\|^{-2} \langle X_1, X_2 \rangle + \sigma \|X_1\|^{-1} V_1 \\ \alpha + \sigma \|X_1\|^{-1} V_1 - \sigma \|X_1\|^{-1} D^{-1/2} \langle X_1, X_2 \rangle V_2 \\ \beta + \sigma \|X_1\| D^{-1/2} V_2 \end{pmatrix}.$$
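As a minimal sketch of the quantities just defined, the following Python snippet computes $\hat\alpha(R)$, $\hat\alpha(U)$, $\hat\beta(U)$ and $D$ in closed form for the two-column design, and checks the identity $\hat\alpha(R) = \hat\alpha(U) + \hat\beta(U) \|X_1\|^{-2} \langle X_1, X_2 \rangle$ that follows from the display above. The function names, the random seed, and the particular values of $n$, $\alpha$, $\beta$ and $\sigma$ are illustrative assumptions, not part of the original text.

```python
import random

def inner(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def fit_both_models(x1, x2, y):
    """Return (alpha_R, alpha_U, beta_U, D) for Y = alpha*x1 + beta*x2 + e."""
    n11, n22, n12 = inner(x1, x1), inner(x2, x2), inner(x1, x2)
    D = n11 * n22 - n12 ** 2              # determinant of X^T X
    s1, s2 = inner(x1, y), inner(x2, y)
    alpha_R = s1 / n11                    # restricted model: beta fixed at 0
    alpha_U = (n22 * s1 - n12 * s2) / D   # unrestricted least squares
    beta_U = (n11 * s2 - n12 * s1) / D
    return alpha_R, alpha_U, beta_U, D

# Illustrative data (all numeric values here are assumptions).
rng = random.Random(0)
n, alpha, beta, sigma = 50, 1.0, 0.5, 1.0
x1 = [1.0] * n
x2 = [rng.uniform(0.0, 3.0) for _ in range(n)]
y = [alpha * a + beta * b + rng.gauss(0.0, sigma) for a, b in zip(x1, x2)]

alpha_R, alpha_U, beta_U, D = fit_both_models(x1, x2, y)
# alpha_R should equal alpha_U + beta_U * <X1, X2> / ||X1||^2 up to rounding.
gap = alpha_R - (alpha_U + beta_U * inner(x1, x2) / inner(x1, x1))
```

The closed form avoids any matrix library because $X^T X$ is only $2 \times 2$; for more regressors a general least squares routine would be the natural choice.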
The dichotomy between the bias of the restricted model $R$ and the variance of the unrestricted model $U$ can be clearly seen in the expressions for $\hat\alpha(R)$ and $\hat\alpha(U)$ above. The restricted model estimator $\hat\alpha(R)$ has a bias factor $\beta \|X_1\|^{-2} \langle X_1, X_2 \rangle$, which vanishes under $R$, while $\hat\alpha(U)$ has an extra factor of $\sigma \|X_1\|^{-1} D^{-1/2} \langle X_1, X_2 \rangle V_2$ that inflates its variance relative to $\hat\alpha(R)$. Hence, model selection or model averaging is essentially a process of balancing bias and variance; see [20].

Let $\sigma_\beta$ be the standard deviation of $\hat\beta(U)$. This is a non-random, known number depending on $\sigma^2$ and $X$. The following model selection criterion is used:

$$\hat M = \begin{cases} U & \text{if } |n^{-1/2} \sigma_\beta^{-1} \hat\beta(U)| > c; \\ R & \text{if } |n^{-1/2} \sigma_\beta^{-1} \hat\beta(U)| \le c. \end{cases}$$

The above criterion may be identified as representative of standard model selection tools in the simple regression model. In particular, the above criterion is the traditional pre-test procedure based on the likelihood ratio, coincides with the Akaike Information Criterion (AIC) if $c = \sqrt{2}$, and coincides with the Bayesian Information Criterion (BIC) if $c = \sqrt{\log n}$. The post-model-selection estimator of $\alpha$ is

$$\tilde\alpha = \hat\alpha(R)\, I_{\{\hat M = R\}} + \hat\alpha(U)\, I_{\{\hat M = U\}}. \tag{2.4}$$

Several nice properties are known about $\hat M$ and, consequently, it is generally believed that $\tilde\alpha$ will also have good properties. Some of the important properties include that for all $\beta$ and as $c \to \infty$, $n^{-1/2} c \to 0$, we have $P[\hat M = M_0] \to 1$, $\{\hat M = M_0\} \subseteq \{\tilde\alpha = \hat\alpha(M_0)\}$ and thus $P[\tilde\alpha = \hat\alpha(M_0)] \to 1$ (see [15]). Note that $\hat\alpha(M_0)$ is the "oracle's guess" about $\alpha$, and is not a statistic, since it is based on the knowledge of $\beta$. The above properties tend to give the impression that $\tilde\alpha$ is a very good estimator.
However, there are some major problems, since the above results are asymptotic in nature, and the asymptotics can take a long time to kick in, as well as be dependent on the value of $\beta$. Our primary reference for this model and its basic properties [8] identifies this as a problem of non-uniformity in $\beta$ of the convergence of $\hat M$ and $\tilde\alpha$.

It can be immediately seen that the estimator $\tilde\alpha$ is super-efficient when $c \to \infty$, $c/\sqrt{n} \to 0$, as with BIC. The major repercussions of the super-efficiency of $\tilde\alpha$ and the non-uniformity of its asymptotics are in its risk performance and in its finite sample behavior. The mean squared error of $\tilde\alpha$ is unbounded and depends on $\beta$, while that of $\hat\alpha(M_0)$ is a constant. As a consequence, the finite sample behavior of $\tilde\alpha$ is erratic and can be quite unlike its asymptotic approximation. Available simulations confirm this; see [8]. Several other studies conducted by Leeb, Pötscher, Yang and others reveal how and why the properties of $\tilde\alpha$ and $\hat\alpha(M_0)$ differ. For further information see, for example, [6, 7, 8, 9, 10, 22, 24, 25].

The super-efficiency of $\tilde\alpha$ results in most variations of the bootstrap being inapplicable. Only subsampling ([14]) and the $m$-out-of-$n$ bootstrap with $m/n \to 0$ would yield consistent approximations of the distribution of $\tilde\alpha$. Unfortunately, these methods have problems of their own, some details of which can be found in [18] and [1]. Specifically, although subsampling is asymptotically consistent, it can perform miserably in finite samples. For any $\alpha \in (0, 1)$ (here $\alpha$ denotes a nominal level, not the regression coefficient), the actual asymptotic coverage of a standard level $(1 - \alpha)$ subsampling confidence interval can be zero; see [1] for details. The finite sample properties of subsampling-based methods can sometimes be improved by considering hybrid techniques, calibrations and other modifications, as documented by [2].
However, the asymptotic zero coverage of subsampling intervals for $\tilde\alpha$ cannot be reversed by, for example, size correction, since technical conditions that allow for such correction to work are not satisfied by $\tilde\alpha$.

The above issues with post-model-selection estimators lead to model-averaged estimators. A model-averaged estimator of $\alpha$ is of the form

$$\check\alpha = \hat\alpha(R)\, p_R + \hat\alpha(U)\, p_U, \tag{2.5}$$

where $p_R$ and $p_U$ are two weights associated with the models $R$ and $U$. Yang and his co-authors have extensively studied aggregation across models for several statistical procedures like estimators and forecasts, in both their algorithmic and theoretical aspects (see [22, 23, 24, 25]). In particular, a result of [23] implies that when the model averaging technique is strongly consistent, the supremum of the mean squared error of $n^{1/2}(\check\alpha - \alpha)$ over values of $(\alpha, \beta)$ tends to infinity. Thus, strongly consistent model averaging does not attain the minimax rate. Our result in Section 3 shows that, up to constant terms, it is no worse than the post-model-selection estimator when $(\alpha, \beta)$ are held fixed.

Recently, [5] studied several forms of model averaging and showed that a typical model-averaged estimator converges weakly to a mixture of normal laws, when the parameters of the true model are in an $O(n^{-1/2})$ neighborhood of the simplest candidate in a nesting of models. Since subsampling does not seem to perform well in practice, it is important to study conditions on model weights under which bootstrap approximations of finite sample distributions hold, i.e., conditions under which the statistic under consideration is smooth and asymptotically normal (see [12], [13]). This is studied in Section 4.

3. Risk profile of model-averaged estimators

Several problems associated with the post-model-selection estimator can be attributed to its lack of uniformity, as discussed extensively by others [8]. One is the super-efficiency of $\tilde\alpha$, for example, when BIC is used for model selection. The core problem of lack of uniformity in the convergence pattern of $\tilde\alpha$ is unavoidable, even with model averaging, when a strongly consistent model averaging technique is used, as described by [23]. In this section we show that when parameter values are fixed, model averaging is no worse than model selection, up to constant terms.

Under the unrestricted model, $U$, we choose the prior on $(\alpha, \beta)$ to be a standard mean zero, identity covariance bivariate Normal distribution, $N(0, I)$. Under the restricted model, $R$, the prior on $\alpha$ is a standard univariate Normal distribution, $N(0, 1)$. We put equal prior weights, i.e., $1/2$, on the models, so the prior odds is 1. Our notation for the posterior probabilities of the two models is $\pi_{nU}$ and $\pi_{nR}$. Since $\sigma$ is known, without loss of generality we also assume $\sigma = 1$ in this section.

Thus the Bayesian model-averaged estimator of $\alpha$ is

$$\hat\alpha_{BMA} = \pi_{nU}\, \hat\alpha(U) + \pi_{nR}\, \hat\alpha(R). \tag{3.6}$$

We use the pre-selected, least squares estimators $\hat\alpha(U)$ and $\hat\alpha(R)$ as constituents of $\hat\alpha_{BMA}$, and consider the squared error loss function. The case where a general loss function is used, with $\hat\alpha(U)$ and $\hat\alpha(R)$ taken to be the Bayes estimators under models $U$ and $R$, is very similar. The following Proposition is our main result in this section.

Proposition 3.1. The normalized risk of $\hat\alpha_{BMA}$, $nR(\alpha) = nE(\hat\alpha_{BMA} - \alpha)^2$, satisfies

$$\sup_n nR(\alpha) < \infty,$$

for every fixed choice of $\alpha$ and $\beta$.
Hence, the integrated normalized risk satisfies

$$\sup_n \int_{\alpha, \beta} nR(\alpha)\, d\lambda(\alpha, \beta) < \infty$$

for any probability measure $\lambda(\cdot)$ that does not depend on $n$.

Proof. In the following, we use $C$ as a generic constant, not depending on the parameters $\alpha$ and $\beta$ or the sample size $n$. Note that $\hat\alpha(R) = \hat\alpha(U) + \hat\beta(U) \|X_1\|^{-2} \langle X_1, X_2 \rangle$. Therefore,

$$nR(\alpha) = nE\left[\pi_{nU}\, \hat\alpha(U) + \pi_{nR}\, \hat\alpha(R) - \alpha\right]^2 \le 2nE(\hat\alpha(U) - \alpha)^2 + 2n \|X_1\|^{-4} \langle X_1, X_2 \rangle^2 E\left\{\pi_{nR}^2 \hat\beta^2(U)\right\}. \tag{3.7}$$

Note that

$$E(\hat\alpha(U) - \alpha)^2 = \sigma^2 \|X_1\|^{-2} E\left[V_1 - \langle X_1, X_2 \rangle D^{-1/2} V_2\right]^2 = Cn^{-1}$$

and

$$E\left\{\pi_{nR}^2 \hat\beta^2(U)\right\} \le 2\beta^2 E\pi_{nR}^2 + Cn^{-1}.$$

Thus, we need suitable bounds for $\beta^2 E\pi_{nR}^2$. Writing $m_R(Y)$ and $m_U(Y)$ for the marginal likelihoods of $Y$ under models $R$ and $U$, and using the equal prior weights, we now have

$$\pi_{nR} = \frac{m_R(Y)}{m_R(Y) + m_U(Y)} = \frac{m_R(Y)}{m_U(Y)} \left[1 + \frac{m_R(Y)}{m_U(Y)}\right]^{-1} \le \frac{m_R(Y)}{m_U(Y)}.$$

Then, making use of the moment generating function of a $\chi^2$ random variable, we can deduce that

$$E\left[\frac{m_R(Y)}{m_U(Y)}\right]^2 = Cn^2 \exp\left\{-nC_0(\alpha^2 + \beta^2)\right\}$$

for a particular constant $C_0$. This yields, at (3.7), that

$$nR(\alpha) = Cn^{-1} + Cn^3 \beta^2 \exp\left\{-nC_0(\alpha^2 + \beta^2)\right\},$$

which is bounded for every fixed $(\alpha, \beta)$, as a function of $n$. The rest of the result follows.

Remark 3.1. A lower bound for $nR(\alpha)$ can also be established using arguments similar to those above. With slight modification, the above approach using the moment generating function of a non-central $\chi^2$ random variable can be used to provide an alternative proof of Theorem 2 of [23]. It can also be seen that even when $(\alpha, \beta)$ vary over a compact set, the supremum of $nR(\alpha)$ over $(\alpha, \beta)$ is unbounded.

4. Adaptive model-averaged estimators and the bootstrap

The results of Hjort and Claeskens [5] and Leeb and Pötscher [8] indicate that the post-model-selection estimator and many model-averaged estimators cannot be consistently bootstrapped.
The problems associated with the risk behavior, and those associated with bootstrap approximation, arise from two different sources. Undesirable behavior of the risk function arises from considering scenarios as parameters vary, while a major reason why the distribution of post-model-selection or model-averaged estimators cannot be approximated by bootstrap methods is lack of smoothness of the estimator, or lack of asymptotic normality.

In this section we study the conditions on the model weights which are required for consistent bootstrap approximation of the distribution of the resulting model-averaged estimator. Clearly, since the distribution of $\hat\alpha(U)$ can be approximated using the bootstrap, putting the entire weight on model $U$ is an option. However, balancing between $\hat\alpha(U)$ and $\hat\alpha(R)$ can lead to a more efficient estimator. We propose below a data-adaptive model weighting scheme that achieves the dual goals of reasonable efficiency and bootstrap consistency.

A model-averaged estimator of $\alpha$ is of the form

$$\check\alpha = \hat\alpha(R)\, p_{nR} + \hat\alpha(U)\, p_{nU}. \tag{4.8}$$

Notice that we have adopted a different notation ($p_{nR}$ and $p_{nU}$) for the model weights in this section from that ($\pi_{nR}$ and $\pi_{nU}$) used in Section 3. This is to emphasize that the nature of these weights may be different. We retain the condition that the parameters $(\alpha, \beta)$ are fixed but unknown.

A primary requirement for consistency is $p_{nR} + p_{nU} = 1$, as pointed out in [5]. In order to avoid pathologies, we also specify that $p_{nU} \in [0, 1]$. Note that the weights $p_{nR}$ and $p_{nU}$ may depend on the parameters $(\alpha, \beta)$ and the random component $V$, apart from the known constants $X$ and $\sigma^2$. Replacing $p_{nU}$ by $1 - p_{nR}$, we thus have

$$\check\alpha = \alpha + \sigma \|X_1\|^{-1} V_1 + \beta\, p_{nR} \|X_1\|^{-2} \langle X_1, X_2 \rangle - \sigma \|X_1\|^{-1} D^{-1/2} \langle X_1, X_2 \rangle (1 - p_{nR}) V_2.$$
A primary requirement on $\check\alpha$ is that it should be consistent, and the following proposition establishes a necessary and sufficient condition for this.

Proposition 4.1. The model-averaged estimator $\check\alpha$ converges in probability to $\alpha$ if and only if $\beta p_{nR}$ converges in probability to zero as $n \to \infty$.

Proof. The sufficiency part follows easily from the design conditions (2.2)–(2.3). For the necessity part, suppose that $\beta p_{nR} \xrightarrow{p} \tilde c \neq 0$ as $n \to \infty$. This is clearly equivalent to $p_{nR} \xrightarrow{p} c = \tilde c / \beta \neq 0$ as $n \to \infty$ and $\beta \neq 0$. Hence, we also have $(1 - p_{nR}) \left\{ \sigma \|X_1\|^{-1} D^{-1/2} \langle X_1, X_2 \rangle V_2 \right\} \xrightarrow{p} (1 - c) \cdot 0 = 0$. This implies $\check\alpha \xrightarrow{p} \alpha + \tilde c \gamma \neq \alpha$, where $\|X_1\|^{-2} \langle X_1, X_2 \rangle \to \gamma$ as $n \to \infty$. The case where $p_{nR}$ does not have a limit can be treated similarly with a little more algebra.

The next proposition is an extension of the previous one, and establishes sufficient conditions for asymptotic normality of $\check\alpha$.

Proposition 4.2. The scaled and centered model-averaged estimator $n^{1/2}(\check\alpha - \alpha)$ has an asymptotic normal distribution if
(i) $n^{1/2} \beta p_{nR}$ converges in probability to zero as $n \to \infty$, and
(ii) $p_{nR}$ converges in probability as $n \to \infty$ for all values of $(\alpha, \beta)$.

Proof. The first condition forces the bias component in $\check\alpha$ to be $o(n^{-1/2})$, while the second condition allows for use of Slutsky's theorem.

By requiring $n^{1/2} \beta p_{nR} \xrightarrow{p} 0$ as $n \to \infty$ we have ensured that, when $\beta \neq 0$, we have $n^{1/2} p_{nR} \xrightarrow{p} 0$. Thus the model-averaged estimator is close to the unrestricted model estimator $\hat\alpha(U)$, and has the same limiting distribution up to first-order terms. However, when $\beta = 0$, the asymptotic distribution of $n^{1/2}(\check\alpha - \alpha)$ depends on the limit of $p_{nR}$, which is between zero and one. Thus, when the restricted model holds, the asymptotic variance of $\check\alpha$ is between that of $\hat\alpha(R)$ and $\hat\alpha(U)$.
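A short worked calculation may help make the last statement concrete. It assumes, purely for illustration, that $\beta = 0$ and that $p_{nR}$ is replaced by a deterministic constant $p \in [0, 1]$ (the symbol $p$ is not in the original); it uses only the decomposition of $\check\alpha$ given above and the independence of $V_1$ and $V_2$.

```latex
\begin{align*}
\check\alpha - \alpha
  &= \sigma \|X_1\|^{-1}
     \left[ V_1 - (1 - p)\, D^{-1/2} \langle X_1, X_2 \rangle V_2 \right], \\
\operatorname{Var}(\check\alpha)
  &= \sigma^2 \|X_1\|^{-2}
     \left[ 1 + (1 - p)^2 D^{-1} \langle X_1, X_2 \rangle^2 \right]. \\
\intertext{At the two endpoints, using
  $D + \langle X_1, X_2 \rangle^2 = \|X_1\|^2 \|X_2\|^2$:}
p = 1:\quad \operatorname{Var}(\check\alpha)
  &= \sigma^2 \|X_1\|^{-2} = \operatorname{Var}(\hat\alpha(R)), \\
p = 0:\quad \operatorname{Var}(\check\alpha)
  &= \sigma^2 \|X_2\|^2 D^{-1} = \operatorname{Var}(\hat\alpha(U)).
\end{align*}
```

Under these assumptions the variance is monotone in $(1 - p)^2$, so any intermediate $p$ gives a value between the variances of the restricted and unrestricted estimators.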
The relative strengths of different candidates for the model weight $p_{nR}$ may be evaluated by their probability limits when $\beta = 0$.

We note that we consider $(\alpha, \beta)$ as fixed constants and do not allow them to vary with $n$. If, for example, we assumed $\beta = O(n^{-1/2})$, then the first condition of Proposition 4.2 would imply asymptotically zero weight on the restricted model.

In order to progress towards bootstrap consistency, apart from asymptotic normality of $\check\alpha$, we also need $p_{nR}$ to be a smooth function. This rules out the indicator function $p_{nR} = I_{\{|n^{-1/2} \sigma_\beta^{-1} \hat\beta(U)| \le c\}}$ used in $\tilde\alpha$.

Keeping in view the nice properties of $\tilde\alpha$, we now develop an adaptive, data-driven model weight function $p_{nR}$ that is a smooth version of $I_{\{|n^{-1/2} \sigma_\beta^{-1} \hat\beta(U)| \le c\}}$. For any $k_n$, we split the event $\{-k_n \le \hat\beta(U) \le k_n\}$ into two events, $\{\hat\beta(U) - k_n \le 0\}$ and $\{\hat\beta(U) + k_n \ge 0\}$, and approximate the indicators of these events separately. Our approximation for $I_{\{\hat\beta(U) - k_n \le 0\}}$ is

$$\xi_{1n} \equiv \xi_{1n}\left(\gamma_{1n}, \hat\beta(U), k_n\right) = \left[1 + \exp\left\{-\gamma_{1n}(\hat\beta(U) - k_n)\right\}\right]^{-1} \exp\left\{-\gamma_{1n}(\hat\beta(U) - k_n)\right\},$$

and for $I_{\{\hat\beta(U) + k_n \ge 0\}}$ is

$$\xi_{2n} \equiv \xi_{2n}\left(\gamma_{2n}, \hat\beta(U), k_n\right) = \left[1 + \exp\left\{\gamma_{2n}(\hat\beta(U) + k_n)\right\}\right]^{-1} \exp\left\{\gamma_{2n}(\hat\beta(U) + k_n)\right\}.$$

We take the two tuning values $\gamma_{1n}$ and $\gamma_{2n}$ to be always positive. However, they change with $n$; and in a major departure from traditional model weights, they are not equal to each other, and also depend on the data. Thus, $\gamma_{1n} \equiv \gamma_{1n}(\alpha, \beta, V)$ and $\gamma_{2n} \equiv \gamma_{2n}(\alpha, \beta, V)$ are unequal, random weights. Equipped with these functions, we define

$$p_{nR} = 0.5\, \xi_{1n} + 0.5\, \xi_{2n}.$$

We adopt the paired bootstrap as our resampling strategy. Thus, we draw a simple random sample with replacement of the data pairs $(Y_i^*, x_i^*)$, $i = 1, \ldots, n$, from the original data $(Y_i, x_i)$, $i = 1, \ldots, n$. The entire process of obtaining $\hat\alpha(R)$, $\hat\alpha(U)$, $\hat\beta(U)$, $p_{nR}$, and $\check\alpha$ is imitated with the resample $(Y_i^*, x_i^*)$, $i = 1, \ldots, n$, and we approximate the distribution of $n^{1/2}(\check\alpha - \alpha)$ with the distribution of $n^{1/2}(\check\alpha^* - \check\alpha)$, conditional on $(Y_i, x_i)$, $i = 1, \ldots, n$. A technical condition guarantees that the design matrix from the resampled data is non-singular with high probability; see condition (1.17) of [3].

The following Theorem is our main result in this section, and establishes consistency of the bootstrap for an adaptively weighted model-averaged estimator.

Theorem 4.1. Assume that the sequence of constants $k_n \downarrow 0$ as $n \to \infty$. Suppose the tuning constants are chosen as

$$\gamma_{1n} = a_n \hat\beta(U), \qquad \gamma_{2n} = -a_n \hat\beta(U),$$

where $\{a_n\}$ is a sequence of positive constants satisfying $a_n^{-1} \log(n) \downarrow 0$ as $n \to \infty$. Then $n^{1/2}(\check\alpha - \alpha)$ has an asymptotic Normal distribution and the paired bootstrap is consistent for it.

Proof. For the asymptotic normality we only need to check that the conditions of Proposition 4.2 are met. We illustrate the calculation for verifying $n^{1/2} \xi_{1n} \xrightarrow{p} 0$, when $\beta \neq 0$:

$$\begin{aligned}
P\left[|n^{1/2} \xi_{1n}| > \epsilon\right]
&= P\left[\left| \left[1 + \exp\left\{-\gamma_{1n}(\hat\beta(U) - k_n)\right\}\right]^{-1} \exp\left\{-\gamma_{1n}(\hat\beta(U) - k_n) + 0.5 \log(n)\right\} \right| > \epsilon\right] \\
&\le P\left[\exp\left\{-\gamma_{1n}(\hat\beta(U) - k_n) + 0.5 \log(n)\right\} > \epsilon\right] \\
&= P\left[\hat\beta(U) \text{ lies between the roots of } x^2 - k_n x - 0.5\, a_n^{-1} \log(n) + a_n^{-1} \log(\epsilon) = 0\right].
\end{aligned}$$

The roots of the equation $x^2 - k_n x - 0.5\, a_n^{-1} \log(n) + a_n^{-1} \log(\epsilon) = 0$ are always real when $\epsilon < 1$, since $k_n^2 + 2 a_n^{-1} \log(n) - 4 a_n^{-1} \log(\epsilon) > 0$ for all $n$. Note that the square of the distance between the roots is given by $\left[k_n^2 + 2 a_n^{-1} \log(n) - 4 a_n^{-1} \log(\epsilon)\right] / 4$.
When k n ↓ 0 , k 2 n + 2 a − 1 n log( n ) − 4 a − 1 n log( ǫ ) ↓ 0 , hence the Leb esgue measur e o f the interv al be tw een the ro ots g o es to zer o a s n → ∞ , thus ensuring P h ˆ β ( U ) lies b etw een the ro ots of x 2 − k n x − 0 . 5 a − 1 n log( n ) + a − 1 n log( ǫ ) = 0 i → 0 , as n → ∞ . Note that this r esult actua lly do es not dep end on the v alue of β , as long as it is no n- zero. Other par ts of the pro of fo r asymptotic Norma lit y may be v eriﬁed similarly . Since ˇ α is a smo o th function of α , β a nd V , and has an asymptotic Normal distr ibution, the consistency of the paired b o otstra p pr o cedure follows from [ 12 ] and [ 13 ]. Remark 4. 1. The c o ndition k n ↓ 0 as n → ∞ is a weak er restriction than typically found in literature . Since ˆ β ( U ) = O p ( n − 1 / 2 ), the AIC criterion uses k n = O ( n − 1 / 2 ), while the BIC uses k n = O ( n − 1 / 2 p log( n )). Remark 4.2. The a ssumptions of Pr o p osition 4.1 and Prop os ition 4.2 cannot b e weak ened in general. The exa mple of Section 10.6 of [ 5 ] provides a test case. It is a simpler version of the mo del des crib ed in Section 2 , and simply has Y 1 , . . . , Y n independent, identically distributed as N ( µ, 1) ra ndom v aria bles. Mo del uncertaint y is ab out whether µ = 0, and the natura l estimato r for µ is ¯ Y n = n − 1 P n i =1 Y i in the unrestricted mo del, and 0 in the restric ted model. A mo del- av eraged estima to r is ˆ µ = W ( n 1 / 2 ¯ Y n ) ¯ Y n , for s ome weigh t W ( · ) ∈ [0 , 1]. Note tha t under a mo del with contiguous alterna tives µ true = n − 1 / 2 δ , the r equirement that ˆ µ b e co nsistent for µ true actually places no restriction on the weigh t W ( · ), which may take an y v alue in [0 , 1 ]. Howev er, if we wan t consistency under arbitr ary µ , W ( n 1 / 2 ¯ Y n ) p → 1 is a requirement. F or asy mptotic normality , n 1 / 2 µ (1 − W ( n 1 / 2 ¯ Y n )) p → 0 and convergence in proba- bilit y of W ( n 1 / 2 ¯ Y n ), are re quirements. 
Under µ true , this implies that W ( n 1 / 2 ¯ Y n ) p → 164 S. Chatterjee and N . Mukhop adhyay 1 must hold, while for ge ner al µ , the stronge r condition n 1 / 2 (1 − W ( n 1 / 2 ¯ Y n )) p → 0 m ust b e satisﬁed. Under µ true , it is of interest to a pproximate the distribution of the standar dized statistic Λ n = n 1 / 2 ( ˆ µ − µ true ) = n 1 / 2 W ( n 1 / 2 ¯ Y n ) ¯ Y n − δ = W ( δ + Z n )( δ + Z n ) − δ, where Z n ∼ N (0 , 1). A natura l question is what should b e a b o otstrap eq uiv alent of Λ n . Supp ose Y ∗ 1 , . . . , Y ∗ n are a r a ndom sample from the data Y 1 , . . . , Y n . W e consider the bo o t- strap equiv alen t of n 1 / 2 ¯ Y n to b e n 1 / 2 ( ¯ Y ∗ n − ¯ Y n ), and not n 1 / 2 ¯ Y ∗ n . This is in keeping with [ 4 ], who put forth the guideline that for go o d p ow er p erfor ma nce, resa mpling m ust b e done to r eﬂect the null hypothesis . While mo del selection is no t in g eneral a hypothesis test, some of the same principles a re applicable. Hence, we hav e ˆ µ ∗ = W ( n 1 / 2 ( ¯ Y ∗ n − ¯ Y n )) ¯ Y ∗ n . When 1 − W ( n 1 / 2 ¯ Y n ) p → 0, it can b e readily s een that the distribution of Λ ∗ n = n 1 / 2 ( ˆ µ ∗ − ˆ µ ), co nditional on Y 1 , . . . , Y n , and that of Λ n conv erge to the same limit law. Remark 4.3. W e conjectur e that for the mo del-averaged estimato r pr o p osed in this s ection, a r esult similar to [ 16 ] would hold. In the fra mework of this pap er, the statement cor resp onding to the main result of [ 16 ] would be as follows: Let F n,α,β ( t ) = P  n 1 / 2 ( ˇ α − α ) ≤ t  , and let ˆ F n ( t ) b e a n estimator of F n,α,β ( t ) satisfy- ing fo r every δ > 0 P n,α,β [ | ˆ F n ( t ) − F n,α,β ( t ) | > δ ] → 0, as n → ∞ . Then ∃ δ 0 > 0 and ρ 0 > 0 such that sup ( ˜ α, ˜ β ) ∈ B (( α,β ); ρ 0 / √ n ) P n, ˜ α, ˜ β h | ˆ F n ( t ) − F n, ˜ α, ˜ β ( t ) | > δ 0 i → 1; (4.9) where B (( α, β ); a ) = { ( ˜ α, ˜ β ) : || ( ˜ α, ˜ β ) − ( α, β ) || < a is the op en ball of radius a a round ( α, β ). 
It can be seen that under standard conditions, if the supremum in (4.9) is taken over $B((\alpha,\beta); a_n)$ with $a_n = o(n^{-1/2})$ instead of $B((\alpha,\beta); \rho_0/\sqrt{n})$, the limit would be zero instead of 1. Thus the result of [16] may be improved to the case where the supremum is taken only over the set of parameter values that are of exact order $n^{-1/2}$ away from the $(\alpha, \beta)$ under which the estimator $\hat F_n(\cdot)$ is computed. This is easily verified, for example, when $\alpha = 0$, $\sigma = 1$ and $X_{t2} \equiv 1$.

Note that from a bootstrap approximation point of view, (4.9) is not a negative result, but a very positive one. The uses of bootstrap approximation are for constructing interval estimates, testing hypotheses and so on. Equation (4.9) and other related results from [16] imply that a bootstrap approximation $\hat F_n(\cdot)$ constructed under the "null" $(\alpha, \beta)$ has sup-norm distance of 1 from the true distributions under parameter values that are of exact order $n^{-1/2}$ away from $(\alpha, \beta)$. Thus $\hat F_n(\cdot)$ has power 1 in hypothesis testing under contiguous alternatives. This is a further confirmation of the tenet of [4], that a resampling procedure ought to reflect the null hypothesis.

Remark 4.4. It is of interest to know that the asymptotic variance of $\check\alpha$ depends on $\beta$, and is given by

$$\mathrm{Var}\big(n^{1/2}(\check\alpha - \alpha)\big) - \mathrm{Var}\big(n^{1/2}(\hat\alpha(U) - \alpha)\big) \to 0 \quad \text{if } \beta \neq 0,$$

while

$$\mathrm{Var}\big(n^{1/2}(\check\alpha - \alpha)\big) - \Big\{0.5\,\mathrm{Var}\big(n^{1/2}(\hat\alpha(U) - \alpha)\big) + 0.5\,\mathrm{Var}\big(n^{1/2}(\hat\alpha(R) - \alpha)\big)\Big\} \to 0 \quad \text{if } \beta = 0.$$

This is established by checking that both $\xi_{1n}$ and $\xi_{2n}$ tend to $1/2$ as $n \to \infty$ when $\beta = 0$. Thus $\check\alpha$ performs like the correct estimator $\hat\alpha(U)$ when model $U$ is valid, and balances between the correct and conservative choices when the restricted model $R$ is true.

5. A simulation example

We performed a small simulation experiment to illustrate some of the features of inference under model uncertainty that have been discussed in the previous sections. We took $n = 50$, $x_{i1} \equiv 1$, and generated 50 numbers from the Uniform distribution supported between zero and three and fixed these as the $x_{i2}$ values. We fixed $\alpha = 1$, and varied the $\beta$ values.

For different values of $\beta \in [-1, 1]$, we obtained sampling distribution approximations of (i) the post-model-selected estimator $\hat\alpha_{MS}$, (ii) a version of the Bayesian model-averaged estimator $\hat\alpha_{BMA}$, and (iii) an adaptive model-averaged estimator $\hat\alpha_{AMA}$, by 5000 replications for each value of $\beta$. For the Bayesian model-averaged estimator, model $R$ was assigned weight

$$q_{nR} = \frac{\exp(-\mathrm{BIC}_R/2)}{\exp(-\mathrm{BIC}_R/2) + \exp(-\mathrm{BIC}_U/2)},$$

while model $U$ was assigned weight $1 - q_{nR}$. We define

$$\mathrm{BIC}_R = \sum_i \big[Y_i - \hat\alpha_R x_{i1}\big]^2 + \log(n), \qquad \mathrm{BIC}_U = \sum_i \big[Y_i - \hat\alpha_U x_{i1} - \hat\beta_U x_{i2}\big]^2 + 2\log(n).$$

For the adaptive model-averaged estimator, we took $a_n = (\log(n))^2$. The requirement that $a_n^{-1}\log(n) \downarrow 0$ suggests that $a_n$ should be an increasing sequence, growing faster than $\log(n)$. Several choices of $a_n$ were used initially, and it turned out that very slowly increasing sequences like $a_n = (\log(n))^2$ or very quickly increasing sequences like $a_n = n^{0.499}$ performed better than others. This is a reflection on our way of constructing the functions $\xi_{1n}$ and $\xi_{2n}$ using $\gamma_{1n}$ and $\gamma_{2n}$. Alternative choices, like $\gamma_{1n} = a_n |\hat\beta(U)| \{\hat\beta(U)\}^{-1}$, are a subject for further research.

The first object of our study is the mean squared error of the three estimators of $\alpha$, namely, $\hat\alpha_{MS}$, $\hat\alpha_{BMA}$, and $\hat\alpha_{AMA}$. Panel (a) in Figure 1 contains the graphs of the mean squared error (MSE) as $\beta$ varies over $[-1, 1]$.
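
As a rough illustration of the design just described, the following Python sketch simulates the MSE of the post-model-selection and Bayesian model-averaged estimators alongside $\hat\alpha(U)$. Several details are assumptions that the text does not fix: standard Normal errors, selection of the model with the smaller of the two BIC-type criteria, the random seed, and the helper names `fit_models`, `bma_weight_r` and `simulate_mse`. The adaptive estimator $\hat\alpha_{AMA}$ is left out because its weights depend on $\xi_{1n}$ and $\xi_{2n}$ from Section 4.

```python
import numpy as np


def fit_models(y, x1, x2):
    """Least-squares fits of model R (x1 only) and model U (x1 and x2).

    Returns (alpha_R, alpha_U, RSS_R, RSS_U).
    """
    alpha_r = (x1 @ y) / (x1 @ x1)
    rss_r = np.sum((y - alpha_r * x1) ** 2)
    design = np.column_stack([x1, x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss_u = np.sum((y - design @ coef) ** 2)
    return alpha_r, coef[0], rss_r, rss_u


def bma_weight_r(rss_r, rss_u, n):
    """Weight q_nR of model R built from the BIC-type criteria in the text."""
    bic_r = rss_r + np.log(n)
    bic_u = rss_u + 2.0 * np.log(n)
    # exp(-bic_r/2) / (exp(-bic_r/2) + exp(-bic_u/2)), rewritten as
    # 1 / (1 + exp((bic_r - bic_u)/2)); the exponent is capped to avoid overflow.
    return 1.0 / (1.0 + np.exp(min((bic_r - bic_u) / 2.0, 700.0)))


def simulate_mse(beta, n=50, alpha=1.0, reps=5000, seed=0):
    """Monte Carlo MSE of alpha-hat for MS, BMA and the unrestricted fit."""
    rng = np.random.default_rng(seed)
    x1 = np.ones(n)
    x2 = rng.uniform(0.0, 3.0, size=n)  # drawn once, then held fixed
    estimates = np.empty((reps, 3))
    for r in range(reps):
        # Assumption: N(0, 1) errors.
        y = alpha * x1 + beta * x2 + rng.standard_normal(n)
        a_r, a_u, rss_r, rss_u = fit_models(y, x1, x2)
        q = bma_weight_r(rss_r, rss_u, n)
        a_ms = a_r if q > 0.5 else a_u      # assumption: pick the smaller BIC
        a_bma = q * a_r + (1.0 - q) * a_u   # weighted average of the two fits
        estimates[r] = (a_ms, a_bma, a_u)
    mse = np.mean((estimates - alpha) ** 2, axis=0)
    return dict(zip(("MS", "BMA", "U"), mse))
```

Evaluating `simulate_mse(beta)` over a grid of $\beta$ values in $[-1, 1]$ would give curves comparable in spirit to panel (a) of Figure 1, although not the authors' exact numbers, since the design points and error draws differ.
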
In this and all subsequent figures, the solid line corresponds to $\hat\alpha_{BMA}$, the broken line to $\hat\alpha_{MS}$, and the dotted line to $\hat\alpha_{AMA}$. In this figure, we have also added the graph for the MSE of $\hat\alpha(U)$, which is the nearly horizontal dot-and-dash line. First, using model selection or averaging is clearly better than using $\hat\alpha(U)$ only in the region $0 \pm 2/\sqrt{n} \approx (-0.3, 0.3)$, where $MS$, $BMA$ and $AMA$ all perform better than $\hat\alpha(U)$. However, in the neighboring regions $|\beta| \in (0.3, 0.8)$, $\hat\alpha(U)$ has smaller MSE than the three estimators. For high values of $|\beta|$, using model selection/averaging or the unrestricted model makes little difference. Thus whether model averaging/selection is useful or not depends considerably on the value of $\beta$. Also note that $BMA$ has a lower MSE compared to $MS$ for low values of $|\beta|$ and only marginally higher MSE otherwise, with a much lower maximum MSE value. The graph for $AMA$ tends to stay closest to the graph for $\hat\alpha(U)$, and thus does better than $BMA$ or $MS$ in the region $|\beta| \in (0.05, 0.75)$, but is marginally poorer otherwise.

In order to study how the three estimators balance between $\hat\alpha(R)$ and $\hat\alpha(U)$, we computed the Kolmogorov–Smirnov distances $KS_{jR}$ and $KS_{jU}$ between the distribution of $n^{1/2}(\hat\alpha_j - \alpha)$ and the distributions of $n^{1/2}(\hat\alpha_R - \alpha)$ and $n^{1/2}(\hat\alpha_U - \alpha)$, respectively, where $j = MS, BMA, AMA$ (MS: model-selected, BMA: Bayesian model-averaged, AMA: adaptive model-averaged). We then computed the ratios

$$KSRatio_j = 100\, \frac{KS_{jR}}{KS_{jR} + KS_{jU}}, \quad j = MS, BMA, AMA.$$

Under ideal circumstances, this ratio ought to be zero at $\beta = 0$, and 100 for $\beta \neq 0$. Panel (b) in Figure 1 displays the $KSRatio_j$ values for the three estimators $j = MS, BMA, AMA$. When $\beta = 0$, $MS$ is closest to $\hat\alpha_R$, while, as predicted, $AMA$ balances between $\hat\alpha_R$ and $\hat\alpha_U$. The Bayesian model-averaged estimator $BMA$ lies between $MS$ and $AMA$, and is quite close to $MS$. In the region $0 \pm 2/\sqrt{n} \approx (-0.3, 0.3)$ both $MS$ and $BMA$ are much closer to $\hat\alpha_R$ than $\hat\alpha_U$.

Fig 1. Panel (a) is the mean squared error of $\hat\alpha_{BMA}$ (solid line), $\hat\alpha_{MS}$ (broken line), $\hat\alpha_{AMA}$ (dotted line), and $\hat\alpha(U)$ (dot-and-dash line). Panel (b) is the ratio of Kolmogorov–Smirnov distances $KSRatio_j = KS_{jR}/(KS_{jR} + KS_{jU})$, $j = MS, BMA, AMA$, scaled by 100; between distributions of centered and scaled estimators and $\hat\alpha(R)$ (for $KS_{jR}$) and $\hat\alpha(U)$ (for $KS_{jU}$).

Next, we studied resampling for the three estimators. Subsampling with subsample size $m = 20 = 0.4n$ and the bootstrap were studied. Note that subsampling is consistent for all three estimators, but the bootstrap is consistent only for $AMA$. Panels (a) and (b) of Figure 2 present the Kolmogorov–Smirnov distance, scaled by 100, between the distribution of $n^{1/2}(\hat\alpha_j - \alpha)$ and its subsampling and bootstrap versions, respectively, for $j = MS, BMA, AMA$. We present the graphs for $|\beta| \le 0.4 \approx 3/\sqrt{n}$, since there is not much difference between the three graphs for other values of $\beta$. It can be seen that the distances between the actual distribution and its subsampling/bootstrap versions are much smaller for $AMA$, while the resampling approximations for $MS$ and $BMA$ are particularly bad in the regions $\{|\beta| \in (0.1, 0.3)\}$. Also, there is little visual difference between the accuracies of the subsampling and the bootstrap approximations despite their different asymptotic behavior, which confirms some of the observations made in [1], [2] and [18].

Fig 2. Panel (a) is the subsampling approximation (subsample size 20) for the distribution of centered and scaled $\hat\alpha_{BMA}$ (solid line), $\hat\alpha_{MS}$ (broken line), $\hat\alpha_{AMA}$ (dotted line). Panel (b) is the corresponding bootstrap approximation.

6. Discussion and conclusions

The problems associated with post-model-selection estimation have been discussed by several researchers. In current statistical practice, the process of selecting a model has similarities with hypothesis testing. On the other hand, estimation of parameters, some of which may be known constants in some of the models, is generally entirely separated from model selection. Estimation and testing/selection are two different paradigms of statistical analysis that are hard to integrate. The lack of uniformity across models that parameter estimators generally display, and the issues that arise subsequently, are products of the less than successful attempt to combine the two processes of estimation and selection.

In the Bayesian paradigm, model averaging seems to be a good integration of the two, since the selection step here is also an estimation exercise in spirit. The statement about integrated risks in Proposition 3.1 implies that Bayes' risks of model-averaged estimators are bounded. Thus, while minimaxity seems to be an elusive goal under model uncertainty, a fully Bayesian approach to analyzing risk behavior may be more successful.

In the context of bootstrapping model-averaged estimators, an alternative to $\check\alpha$ is to estimate the bias in $\hat\alpha$ in all the models, and define a bias-corrected average of these. As the bias of $\hat\alpha(R)$ is $\beta \|X_1\|^{-1} \langle X_1, X_2 \rangle$, if we estimate this by $\hat\beta \|X_1\|^{-1} \langle X_1, X_2 \rangle$, we get back $\hat\alpha(U)$.
Nevertheless, in more complex problems the "bias-corrected model-averaged" estimator may be an interesting object to study.

In Theorem 4.1 we established the consistency of the paired bootstrap for a data-adaptive model-averaged estimator. Two other kinds of bootstrap are available in the linear regression context; namely, the parametric bootstrap and the residual-based bootstrap. When only one model is in use, the parametric bootstrap generates data from it using estimated values for the unknown parameters, while the residual bootstrap obtains residuals after fitting the model. The equivalents of these are not obvious under model uncertainty.

In Section 4 we remarked that the data-adaptive weights $p_{nR}$ and $p_{nU}$ may not share the same properties as the posterior model probabilities $\pi_{nR}$ and $\pi_{nU}$ of Section 3. It would be interesting to study when $p_{nR}$ and $p_{nU}$ can be interpreted as posterior probabilities, and also under what conditions the frequentist properties of a Bayesian model-averaged estimator may be elicited using the bootstrap.

Acknowledgments. Professor Chatterjee's research was partially supported by a grant from the University of Minnesota. We thank the referee and the editors of this monograph for some excellent comments and suggestions. Also, we would like to thank Professor Yuhong Yang, who carefully read an earlier draft of this paper and made several comments; which, along with several illuminating discussions, greatly enhanced our understanding of the scope and issues relating to model selection/averaging.

References

[1] Andrews, D. W. K. and Guggenberger, P. (2005). Hybrid and size-corrected subsample methods. Cowles Foundation discussion paper #1605.
[2] Andrews, D. W. K. and Guggenberger, P. (2005). The limit of finite sample size and a problem with subsampling. Cowles Foundation discussion paper #1606.
[3] Chatterjee, S. and Bose, A. (2000). Variance estimation in high dimensional regression models. Statist. Sinica 10 497–515. MR1769754
[4] Hall, P. and Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics 47 757–762. MR1132543
[5] Hjort, N. L. and Claeskens, G. (2003). Frequentist model average estimators. J. Amer. Statist. Assoc. 98 879–899. MR2041481
[6] Leeb, H. (2006). The distribution of a linear predictor after model selection: unconditional finite sample distributions and asymptotic approximations. IMS Lecture Notes Monograph Series 49 291–311. MR2338549
[7] Leeb, H. and Pötscher, B. M. (2003). The finite sample distribution of post-model-selection estimators and uniform versus non-uniform approximations. Econometric Theory 19 100–142. MR1965844
[8] Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: facts and fiction. Econometric Theory 21 21–59. MR2153856
[9] Leeb, H. and Pötscher, B. M. (2006). Performance limits for the estimators of the risk or distribution of shrinkage type estimators, and some general lower risk bound results. Econometric Theory 22 69–97. MR2212693
[10] Leeb, H. and Pötscher, B. M. (2006). Can one estimate the conditional distribution of post-model-selection estimators? Ann. Statist. 34 2554–2591. MR2291510
[11] Leung, G. and Barron, A. R. (2006). Information theory and mixing least squares regressions. IEEE Trans. Inform. Theory 52 3396–3410. MR2242356
[12] Mammen, E. (1992a). Bootstrap, wild bootstrap and asymptotic normality. Probab. Theory Related Fields 93 439–455. MR1183886
[13] Mammen, E. (1992b). When Does Bootstrap Work: Asymptotic Results and Simulations. Springer, Berlin.
[14] Politis, D. N., Romano, J. P. and Wolf, M. (1999). Subsampling. Springer, New York. MR1707286
[15] Pötscher, B. M. (1991). Effects of model selection on inference. Econometric Theory 7 163–185. MR1128410
[16] Pötscher, B. M. (2006). The distribution of model averaging estimators and an impossibility result regarding its estimation. MPRA paper #73.
[17] Raftery, A. E. and Zheng, Y. (2003). Comment on "Frequentist model average estimators," by N. L. Hjort and G. Claeskens. J. Amer. Statist. Assoc. 98 931–938. MR2041481
[18] Samworth, R. (2003). A note on methods of restoring consistency to the bootstrap. Biometrika 90 985–990. MR2024773
[19] Sethuraman, J. (2004). Are super-efficient estimators super-powerful? Comm. Statist. Theory and Methods 33 2003–2013. MR2103058
[20] Shen, X. and Dougherty, D. P. (2003). Discussion of "Frequentist model average estimators," by N. L. Hjort and G. Claeskens. J. Amer. Statist. Assoc. 98 917–919. MR2041481
[21] Yang, Y. (2003). Regression with multiple candidate models: selecting or mixing? Statist. Sinica 13 783–809. MR1997174
[22] Yang, Y. (2004). Aggregating regression procedures to improve performance. Bernoulli 10 25–47. MR2044592
[23] Yang, Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92 937–950. MR2234196
[24] Yang, Y. (2007). Prediction/estimation with simple linear model: is it really that simple? Econometric Theory 23 1–36. MR2338950
[25] Yuan, Z. and Yang, Y. (2005). Combining linear regression models: when and how? J. Amer. Statist. Assoc. 100 1202–1214. MR2236435
