J. K. Ghosh's contribution to statistics: A brief outline
IMS Collections: Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, Vol. 3 (2008) 1–18
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000011

Bertrand Clarke and Subhashis Ghosal
University of British Columbia and North Carolina State University

Abstract: Professor Jayanta Kumar Ghosh has contributed massively to various areas of Statistics over the last five decades. Here, we survey some of his most important contributions. In roughly chronological order, we discuss his major results in the areas of sequential analysis, foundations, asymptotics, and Bayesian inference. It is seen that he progressed from thinking about data points, to thinking about data summarization, to the limiting cases of data summarization as they relate to parameter estimation, and then to more general aspects of modeling including prior and model selection.

Contents
1. Introduction
2. Sequential analysis
3. Foundations of statistics
4. Asymptotics
  4.1. Bahadur–Ghosh–Kiefer representation
  4.2. Edgeworth expansions
  4.3. Second order efficiency
  4.4. Bartlett correction
  4.5. Comparison of the likelihood ratio, Wald's and Rao's statistics
  4.6. Bahadur–Cochran deficiency
  4.7. Neyman–Scott problem and semiparametric inference
5. Bayesian inference
  5.1. Matching and other objective priors
  5.2. Limits of posterior distributions
  5.3. Bayesian nonparametrics
  5.4. Model selection and Bayesian hypothesis testing
6. Concluding remarks
References

Department of Statistics, University of British Columbia, 6356 Agricultural Road, Vancouver, BC V6T 1Z2, Canada, e-mail: riffraff@stat.ubc.edu
Department of Statistics, North Carolina State University, Patterson Hall, 2501 Founders Drive, Raleigh, NC 27695, USA, e-mail: ghoshal@stat.ncsu.edu

AMS 2000 subject classifications: Primary 62; secondary 62.
Keywords and phrases: Bartlett corrections, Bayesian nonparametrics, Edgeworth expansions, foundations of statistics, model selection, noninformative prior, posterior convergence, second order efficiency, semiparametric inference, sequential analysis.

1. Introduction

Professor Jayanta Kumar Ghosh, or J. K. Ghosh, as he is commonly known, has been a prominent contributor to the discipline of statistics for five decades.
The spectrum of his contributions encompasses sequential analysis, the foundations of statistics, finite populations, Edgeworth expansions, second order efficiency, Bartlett corrections, noninformative, and especially matching, priors, semiparametric inference, posterior limit theorems, Bayesian nonparametrics, model selection, Bayesian hypothesis testing and high dimensional data analysis, as well as some applied work in reliability theory, statistical quality control, modeling hydrocarbon discoveries, geological mapping and DNA fingerprinting. By itself, covering such diverse topics in depth is a major career achievement. He has authored over 130 publications including three monographs and several edited volumes. His books, one entitled Higher Order Asymptotics and published as an IMS monograph, and another entitled Bayesian Nonparametrics, co-authored by R. V. Ramamoorthi and published by Springer-Verlag, continue to hold respected positions for researchers in these areas. His recently published third book [34] is a fine graduate text on Bayesian inference. In addition, his service to the profession, especially as the editor of Sankhyā, has been invaluable.

The variety of his work notwithstanding, asymptotics has been central to his thinking across a wide range of problems. Accordingly, in what follows, we outline some of his work, in roughly chronological order, focussing on those contributions which are intimately connected to asymptotics. In the course of reviewing his work, we try to characterize the progression of thinking that naturally connects the topics that J. K. Ghosh has done so much to develop.

2. Sequential analysis

J. K. Ghosh started his research career in sequential analysis in the early sixties as a graduate student in the Department of Statistics at Calcutta University. Wald had recently introduced his sequential probability ratio test (SPRT), but its properties were not well understood in the composite case. This was the first topic to which Ghosh turned his attention. Through his work, many of the properties of the SPRT and related procedures were established and better understood.

For instance, in the testing context, double minimaxity essentially means simultaneous minimization of average type I and type II error probabilities. In his first published work [26], Ghosh clarified a result of Wald on the double minimaxity of the SPRT for a normal two-sided alternative hypothesis (with unknown scale) separated from the null by δ. It is well known that the power function is monotonic in many common families for fixed sample sizes. Ghosh established an analog of this result in [27], namely that the operating characteristic function of the (generalized) SPRT continues to be monotonic. Also in the sequential context, [28] considered the admissibility of sequential tests based on a simple identity which later became known as the Ghosh–Pratt identity. Ghosh compared the SPRT not just with the class of all tests with finite expected sample size but also within other classes, for instance, the class which requires at least one observation or which requires no more than a predetermined number of observations to reach a conclusion.
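To fix ideas, the following is a minimal sketch of Wald's SPRT in its simplest form, testing two fully specified normal means. The illustration is ours, not taken from the papers discussed, and the stopping boundaries use Wald's classical approximations log(β/(1−α)) and log((1−β)/α).

```python
# A minimal sketch (ours) of Wald's SPRT for H0: mu = 0 vs H1: mu = 1
# with unit-variance normal observations.
import numpy as np

def sprt(xs, mu0=0.0, mu1=1.0, alpha=0.05, beta=0.05):
    lower = np.log(beta / (1 - alpha))   # accept-H0 boundary
    upper = np.log((1 - beta) / alpha)   # accept-H1 boundary
    llr = 0.0
    for n, x in enumerate(xs, start=1):
        # log-likelihood ratio increment for N(mu1, 1) against N(mu0, 1)
        llr += (mu1 - mu0) * x - 0.5 * (mu1**2 - mu0**2)
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "no decision", len(xs)

rng = np.random.default_rng(0)
print(sprt(rng.normal(1.0, 1.0, size=1000)))  # data from H1: rejects quickly
```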
Following this, Ghosh continued to elucidate more properties of the SPRT, and its variants, which could be seen as analogs of the corresponding properties of Neyman–Pearson or Bayes tests for fixed sample size. In [29], he proved that for exponential families, truncated or untruncated Bayesian sequential decision rules' terminal decisions describe regions in terms of sufficient statistics, and also showed that for testing problems, truncated generalized SPRTs form a complete class.

About two decades later, Ghosh returned to sequential problems, along with various co-authors. In [33], he studied an invariant SPRT to identify two normal populations with equal variance and obtained bounds for error probabilities. Most recently, similar bounds for an invariant SPRT with respect to an improper prior have also been obtained in [50].

Two-stage procedures are closely related to sequential procedures. Recall Stein's famous problem of finding a bounded length confidence interval for the normal mean with unknown variance. Stein proposed a two-stage procedure for doing this: In the first stage, the sample variance determines how many samples are to be taken in the second stage. An obvious shortcoming of the procedure is that the second stage sample variance is not used in the construction of the interval. So, it is natural to ask whether one can improve Stein's procedure by using the second stage sample variance. Surprisingly, it is impossible to better Stein's procedure, as shown in [38].

However, the procedure can be improved in a different, and perhaps more appropriate, sense. The confidence coefficient does not in general properly reflect the true sense of confidence about a parameter after observing data. For instance, if two observations are obtained from a U(θ, θ + 1) family, then the assessment of θ is very precise when the two observations differ in magnitude by nearly 1, while the assessment is much less precise if the two observations are close to each other. This means that classical confidence intervals fail to indicate the true difference in the level of confidence after observing the sample. Motivated by this, Kiefer suggested letting the confidence coefficient depend on the data. After all, in reality, for a given random interval I, we often want to predict the indicator function 1{θ ∈ I}. Since this object is unknown, it is traditionally estimated by a constant, the best constant being the expectation P_θ(θ ∈ I), which becomes fixed (or asymptotically fixed) for many classical intervals. However, from a prediction theory point of view, it makes more sense to let the predictor of 1{θ ∈ I} depend on the observed data. The predictor considered in this way is called the random confidence coefficient associated with the confidence interval I. It is shown in [39] that the second stage sample variance can be used to boost the random confidence coefficient of a bounded length confidence interval.
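The following toy simulation (ours, for illustration only) makes the U(θ, θ + 1) example concrete: given two observations, θ must lie in (max(x) − 1, min(x)), so the spread of the data directly measures how precisely θ is known.

```python
# Sketch of the U(theta, theta+1) example: theta < x_i < theta + 1 for both
# observations forces theta into (max(x) - 1, min(x)).
import numpy as np

rng = np.random.default_rng(1)
theta = 2.3
for _ in range(3):
    x = rng.uniform(theta, theta + 1, size=2)
    lo, hi = x.max() - 1, x.min()
    print(f"spread = {x.max() - x.min():.3f} -> theta in ({lo:.3f}, {hi:.3f}), "
          f"length = {hi - lo:.3f}")
# The interval length is 1 - spread: a spread near 1 pins theta down almost
# exactly, while a small spread leaves almost a full unit of uncertainty.
```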
3. Foundations of statistics

From the examination of individual data points as they relate to the testing problem, Ghosh shifted his attention to data summarization, focussing on the relationship between sufficiency and invariance. Sufficiency isolates features of the collection of observations from those of the individual ones which are independent of the features of the collection. Invariance, on the other hand, summarizes data by imposing symmetry constraints. In practice, both sufficiency and invariance restrictions are applied, but their order of application is an issue of interest.

Consider a statistical model (X, A, P) where a group of transformations G is acting on the sample space and attention is limited to invariant procedures. To find a sufficiency reduction, one needs to find a sufficient sub-σ-field C of the invariant σ-field I. However, in practice, it is typically easier to invoke invariance on the data after it has been reduced by sufficiency. Let S be a sufficient σ-field. To justify the application of the invariance restriction after a sufficiency reduction, it is enough to establish that S ∩ I is sufficient for I. This problem was addressed by W. J. Hall, R. A. Wijsman and J. K. Ghosh, independently and roughly simultaneously. Once they realized they had compatible results, they published a combined paper [65]. Their main result can be described briefly as follows. A statistic T is called almost invariant if, for every g ∈ G, T(x) = T(gx) a.s. Under conditions that imply that every almost invariant set is equivalent, up to null sets, to an invariant set, it follows that S and I are conditionally independent given S ∩ I, and hence S ∩ I is sufficient for I.

Another notion which relates two sequences of σ-fields in sequential experiments is that of transitivity, introduced by Bahadur. Two sequences of σ-fields B_n ⊂ A_n are said to be transitive if for every B_{n+1}-measurable function f, E(f | A_n) is B_n-measurable. In the usual sequential setting, S_n ∩ I_n is transitive for I_n, where the extra index n indicates the sample size. Several implications of this result were discussed in [65].

In many application areas, sample surveys for instance, discrete models arise, where the probability is concentrated on a countable set but the models do not have common support, i.e., the support set is different for different parameter values. Clearly, such a family is not dominated and the Halmos–Savage theorem on sufficiency does not hold. Nevertheless, as shown in [2], minimal sufficient σ-fields exist and the Neyman factorization theorem holds good. These results were extended to pairwise sufficient σ-fields, and a condition for the existence of a minimal pairwise sufficient σ-field was found in [37].

Another basic question is whether a fixed-dimensional sufficient statistic independent of sample size actually exists. In exponential families, it is well known that fixed-dimensional sufficient statistics exist. Outside of exponential families, however, sufficient statistics are hard to find. Some distinguished nonregular cases like U(−θ, θ) provide additional examples. In [54], it is shown that if the support (a(θ), b(θ)) is shrinking or expanding, as in the support of U(0, θ) for example, then the density must be of the form g(θ)h(x) to have a real-valued sufficient statistic. If a(θ) and b(θ) are both increasing, or both decreasing, as in U(θ, θ + 1), then an R²-valued sufficient statistic can exist only in special cases.
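As a minimal illustration of the dichotomy in [54] (a standard factorization computation, supplied here for concreteness rather than taken from the paper), compare the two uniform families:

```latex
% U(0, theta): the likelihood factorizes through a one-dimensional statistic,
f(x_1,\dots,x_n \mid \theta) = \theta^{-n}\,\mathbf{1}\{x_{(n)} \le \theta\}\,\mathbf{1}\{x_{(1)} \ge 0\},
% so X_{(n)} is sufficient. For U(theta, theta + 1), by contrast,
f(x_1,\dots,x_n \mid \theta) = \mathbf{1}\{\theta \le x_{(1)}\}\,\mathbf{1}\{x_{(n)} \le \theta + 1\},
% and the minimal sufficient statistic is the two-dimensional pair (X_{(1)}, X_{(n)}).
```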
4. Asymptotics

The asymptotic point of view undergirded Ghosh's thinking, even in problems that were not primarily focussed on asymptotic properties. In a sense, much of his work on sequential analysis, Bayesian analysis and Bayesian nonparametrics is also, at least implicitly, work on asymptotics. In fact, many of the most important asymptotic ideas, such as higher order asymptotics and Edgeworth expansions, were pioneered by him. Moreover, in terms of how his thinking progressed, asymptotics can be regarded as the next natural conceptual step after thinking about data points in sequential analysis, and sufficiency or invariance as a data summarization technique. That is, once we have gathered and summarized our data, we want to see where it seems to be leading us.

Ghosh's work on asymptotics can be broadly grouped into seven categories. He worked on the Bahadur–Ghosh–Kiefer representation for a quantile. He made foundational contributions to establishing the existence of Edgeworth expansions. In higher order asymptotics, Ghosh examined second order efficiency and Bartlett correction, and contributed to our understanding of how Wald, Rao and likelihood ratio tests compare. Then he turned his attention to Bahadur efficiency and the vexing Neyman–Scott problem.

4.1. Bahadur–Ghosh–Kiefer representation

Bahadur represented a sample quantile approximately as an average of i.i.d. random variables. To get this representation, Bahadur assumed the existence of two derivatives of the c.d.f.; the second derivative is bounded and the first derivative is positive on a neighborhood of the p-th population quantile ξ_p. Then Bahadur showed that the error in the representation is O(n^{−3/4}(log n)^{3/4}). The order of error in Bahadur's representation is nearly sharp, cf. the exact order n^{−3/4}(log n)^{1/2}(log log n)^{1/4} obtained by Kiefer. One of the reasons this is important is that the order of error is small enough to obtain asymptotic normality for the sample quantiles.

However, assuming the existence of two derivatives is somewhat strong. For instance, it rules out the location family from the double exponential density. On the other hand, for most statistical purposes, where only the asymptotic distribution is important, having an error term of order o_p(n^{−1/2}) is enough. Therefore it is of interest to weaken Bahadur's assumptions at the expense of weakening the conclusion to o_p(n^{−1/2}). This is possible, even for a variable point p_n depending on n, as shown in [30], using only the assumption of a positive first derivative at ξ_p.

Actually, the idea of representing a quantile approximately as an average of i.i.d. observations occurred to Ghosh independently in the mid sixties, at the same time as Bahadur was working on the problem. Ghosh was looking at the problem in the more general multivariate multisample framework in connection with asymptotic normality of multivariate rank tests. He did not record his proof then since it did not extend to the multivariate setup at that time.
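For reference, the representation itself, in its standard form (the discussion above describes it only in words; notation here is ours), reads

```latex
\hat{\xi}_{p,n} \;=\; \xi_p \;+\; \frac{p - F_n(\xi_p)}{f(\xi_p)} \;+\; R_n,
\qquad R_n = O\!\bigl(n^{-3/4}(\log n)^{3/4}\bigr) \ \text{a.s.},
```

where F_n is the empirical c.d.f. and f the population density; the middle term is the average of the i.i.d. variables (p − 1{X_i ≤ ξ_p})/f(ξ_p), which yields asymptotic normality of the sample quantile at once.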
4.2. Edgeworth expansions

Edgeworth expansions are natural refinements of asymptotic normality results in that they give error terms of asymptotically smaller order by including more terms in addition to the leading normal term. However, for a long time, Edgeworth expansions were only heuristically justified. In the pioneering paper [7], it is shown that, under conditions of finiteness of certain moments and a condition on the characteristic function known as Cramér's condition in the literature, the r-th order Edgeworth expansion of a smooth function of sample averages admits error O(n^{−(r+1)/2}). In particular, it follows that for the sample average, finiteness of the 2r-th moments is required to justify an Edgeworth expansion of order r. The follow-up paper [8] relaxes some moment conditions. A thorough and lucid treatment of Edgeworth expansions and higher order asymptotics is given in Ghosh's IMS monograph [32].

Another angle on Edgeworth expansions comes from Fisher consistency. Consider an exponential family with density proportional to exp[Σ_{j=1}^k w_j(θ) t_j(x)]. In this context, an estimator T_n, which is a function of the k-dimensional sufficient statistic (Σ_{i=1}^n t_1(X_i), ..., Σ_{i=1}^n t_k(X_i)), is Fisher consistent if T_n(w(θ)) = θ. Assuming sufficient smoothness conditions and linear independence of the component functions w_1(θ), ..., w_k(θ), a Fisher consistent estimator can be written as a smooth function of sample averages, and hence has an Edgeworth expansion. In [62], this Edgeworth expansion is compared with that of the MLE, which is another Fisher consistent estimator. Interestingly, for any bowl shaped loss function, the MLE has better second order risk properties than any other Fisher consistent estimator. Consequently, this gives a way to discriminate among estimators which are first order asymptotically equivalent. This property is called second order efficiency and will be discussed in the next subsection.

Edgeworth-type expansions need not be restricted to asymptotically normal estimators. Other limiting distributions can appear naturally. Recall that log-likelihood ratio type statistics are among the most common statistics converging to non-normal limits such as a chi-square distribution. For locally quadratic functions of sample averages, such as the log-likelihood ratio, asymptotic expansions have been obtained in [13]. They have a leading chi-square term. Subsequent terms appear as coefficients of powers of n^{−1/2} and are finite linear combinations of chi-square distributions with degrees of freedom p, p + 2, p + 4, etc., where p is the degrees of freedom of the leading term. Similar expansions hold even under contiguous alternatives, with non-central chi-squares replacing the chi-square leading term, as shown in [14]. The subsequent terms are finite linear combinations of non-central chi-square distributions with degrees of freedom p, p + 2, p + 4, and so forth.
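To make the shape of such expansions concrete, recall the classical one-term expansion for the standardized sample mean (a textbook instance stated for orientation, not a result quoted from [7]): under Cramér's condition and suitable moment assumptions,

```latex
P\!\left(\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x\right)
\;=\; \Phi(x) \;-\; \phi(x)\,\frac{\gamma\,(x^2 - 1)}{6\sqrt{n}} \;+\; O(n^{-1}),
```

where γ = E(X − μ)³/σ³ is the skewness; the n^{−1/2} term corrects the normal approximation for asymmetry.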
4.3. Second order efficiency

Second order efficiency (also called third order efficiency by some authors) is the natural way to compare two asymptotically efficient estimators since they are first order equivalent. In particular, it was widely believed that the MLE, or some suitable variant of it, had, asymptotically, the smallest possible risk up to the second order. Ghosh, among others like Efron, Chibisov, Pfanzagl, Akahira and Takeuchi, made pioneering contributions towards rigorous justification of this assertion in [64]. His main result may be roughly described as follows: Let T_n be an efficient estimator and consider a modification T'_n = T_n + m(T_n)/n. Then T'_n can be beaten by θ̂'_n = θ̂_n + g(θ̂_n)/n, a modification of the MLE θ̂_n, where the function g depends on T_n and m. Here, by a better estimator, we mean that

lim_{n→∞} n² [E_θ{W(T'_n, θ)} − E_θ{W(θ̂'_n, θ)}] ≥ 0, for all θ ∈ Θ,

for a truncated squared error loss W. This paper also contains other impressive results such as Bhattacharya-type bounds, a Bayesian connection with second order efficiency and a notion of second order asymptotic sufficiency. Similar results about second order efficiency of the MLE for Pitman closeness and any bounded bowl shaped loss function are given in [63].

In addition to second order efficiency, there is a notion of second order admissibility. An estimator is second order admissible if there is no estimator which has uniformly smaller second order risk with strict inequality for at least one point. In [59], for estimators of the form θ̂_n + g(θ̂_n)/n, a necessary and sufficient condition for second order admissibility under squared error loss is obtained.

These second order optimality properties of modified versions of the MLE raise the issue whether the MLE has optimality properties beyond the second order. A nice counterexample in [60], however, concludes negatively. On the other hand, questions on second order admissibility go beyond the MLE to any BAN estimator θ̂_n modified to θ̂_n + g(θ̂_n)/n + o_p(n^{−1}). The condition for second order Pitman admissibility is obtained in [58], and its multiparameter version in [49].

Another natural question in the context of second order admissibility is the following. If two or more statistics are separately second order admissible, for two different components of a parameter with bias o(n^{−1}), then is it true that they are jointly second order admissible? The question has a curious answer, given in [16]. For two dimensions, they are jointly second order admissible, but for three or more dimensions, they are not. This result is reminiscent of Stein's phenomenon on ordinary admissibility with respect to the squared error loss for estimating the normal mean. Intuitively, asymptotically, all regular experiments are normal experiments and thus a phenomenon under normality continues to hold asymptotically under any regular model. The interesting part of the result is that the phenomenon shows up in the second order.
4.4. Bartlett correction

Bartlett introduced a remarkable technique, which bears his name, to improve the chi-square approximation to the distribution of a log-likelihood ratio statistic. The idea is embarrassingly simple: rescale the chi-square distribution with the second order expansion of the mean of the statistic. It is surprising that such a simple strategy improves the approximation so much.

In the seminal paper [9], a variant on Wilks' theorem tuned to the goal of understanding the Bartlett correction was presented. Recall that Wilks' theorem is the statement that the log-likelihood ratio is asymptotically chi-square. However, the chi-square is the result of squaring normals. To see how this might apply to the log-likelihood ratio statistic, let X_1, X_2, ... be i.i.d. observations from a parametric family governed by θ = (θ_1, ..., θ_p) and let L(θ) be the log likelihood. For j = 1, ..., p, let θ̂_j be the MLE of θ under the null hypothesis θ_1 = θ_{10}, ..., θ_j = θ_{j0}, and let T_j = {2n(L(θ̂_{j−1}) − L(θ̂_j))}^{1/2} sign(θ̂_{j−1} − θ̂_j), where θ̂_0 stands for the unrestricted MLE. Note that squaring T_j gives the usual object in Wilks' theorem with limiting chi-square behavior. However, now, without the square, (T_1, ..., T_p) is asymptotically normal with error O_p(n^{−3/2}) under the grand null hypothesis. This property of T_j gives rise to the Bartlett correction in the multidimensional setting.

Another result developed in that paper is a Bayesian version of the Bartlett correction. This is a Bartlett correction to the posterior distribution, conditional on the data, obtained by letting the prior tend to the degenerate distribution at the true parameter value. The relation between the Bartlett correction and the Bayesian correction gives a deeper understanding of the Bartlett correction phenomenon and leads to a variety of generalizations.

Following this path, [41] studied the asymptotic equivalence of the frequentist and Bayesian Bartlett corrections for the likelihood ratio and the conditional likelihood ratio (CLR) statistic introduced by Cox and Reid. In particular, the conditions for equivalence are instrumental for giving a simple proof of the existence of the frequentist Bartlett correction for the CLR statistic. This was extended to the multivariate case in [40]. A variant on the likelihood ratio called the adjusted likelihood ratio (ALR) was introduced by McCullagh and Tibshirani. In [45], it was shown that the ALR statistic has behavior similar to that of the CLR statistic, in that it admits a Bartlett correction and its power under contiguous alternatives is equivalent to that of the CLR up to the order o(n^{−1/2}). In terms of average power, the agreement continues up to o(n^{−1}).
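A Monte Carlo sketch of the Bartlett idea follows (our illustration, not from [9]; it uses the empirical mean of the statistic in place of the analytic second order expansion). For X_i exponential with rate θ and H_0: θ = 1, the LR statistic is T = 2n(X̄ − 1 − log X̄), and rescaling T by its mean brings the rejection rate closer to nominal at small n.

```python
# Monte Carlo sketch of the Bartlett correction (ours, not from [9]).
# Model: X_1,...,X_n i.i.d. Exponential(theta), testing H0: theta = 1 (p = 1).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 200000
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
T = 2 * n * (xbar - 1 - np.log(xbar))   # exact LR statistic for this model
Tc = T / T.mean()                       # rescale so the mean matches E[chi2_1] = 1

q95 = 3.8415                            # 0.95 quantile of chi-square with 1 df
print("raw rejection rate:      ", (T > q95).mean())   # drifts from nominal 0.05
print("corrected rejection rate:", (Tc > q95).mean())  # closer to 0.05
```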
4.5. Comparison of the likelihood ratio, Wald's and Rao's statistics

The problem of comparing the likelihood ratio (LR), Wald's and Rao's tests with regard to power has received significant attention in the statistics and econometrics literature. It is well known that, up to the first order of approximation and under contiguous alternatives, these three tests have the same local power as dictated by the noncentral chi-square distribution. Discrimination among them, therefore, calls for comparison via higher order power. However, while the LR test is locally unbiased up to a higher order of approximation, the same does not hold in general for the other two tests. From this perspective, to make them really comparable, Ghosh suggested considering locally unbiased versions of Wald's and Rao's tests. This work, done under his supervision, eventually led to an optimum property of Rao's test in terms of third order local power. A review of these developments is available in [31]. In addition, the power properties of the three tests, as well as their Bartlett adjustability, when they are developed on the basis of a quasi-likelihood rather than a true density-based likelihood, were discussed in [48].

4.6. Bahadur–Cochran deficiency

To compare the asymptotic performance of two tests, one may look at their Bahadur–Cochran relative efficiency, which is the limit, as δ → 0, of the ratio of the smallest integers which make their levels less than δ. For many pairs of reasonable tests, the ratio turns out to be 1. To compare them at a finer level, it is sensible to look at their difference, which may be called the Bahadur–Cochran deficiency. The limit inferior (or superior) of the difference, reflecting the relative advantage of one test over the other, was calculated in [12] for some common test statistics.

4.7. Neyman–Scott problem and semiparametric inference

Ghosh also made notable contributions to the Neyman–Scott problem. In the Neyman–Scott problem, a new observation X_i is governed by a common parameter θ and an additional parameter ξ_i, depending on i, but only the parameter θ is of interest. The problem is notoriously difficult in that common estimators, such as the MLE, are usually inconsistent. For instance, if the X_ij are independently normally distributed with mean μ_i and variance σ², i = 1, ..., n, j = 1, ..., k, then the MLE of σ², (nk)^{−1} Σ_{i=1}^n Σ_{j=1}^k (X_ij − X̄_i)², where X̄_i = k^{−1} Σ_{j=1}^k X_ij, converges to the wrong value (1 − k^{−1})σ². Although it is easy to correct the MLE in this particular situation, in general identifying the correction is a hard problem.

Ghosh proposed constructing an asymptotically efficient estimator for θ by viewing the ξ_i as random variables arising from an unknown distribution G. The semiparametric model resulting from this can then be explored to find efficient estimators for θ. In addition, efficiency in the Neyman–Scott model can be defined in terms of the semiparametric model, so the two models have many interesting links between them. These links may be exploited and are studied in detail in [4], [5] and [6].
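A quick simulation (ours) of the displayed inconsistency: with k = 2 replicates per stratum, the MLE of σ² converges to (1 − 1/k)σ² = σ²/2 no matter how many strata are observed.

```python
# Neyman-Scott inconsistency: the MLE of sigma^2 targets (1 - 1/k) * sigma^2.
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma2 = 100000, 2, 4.0
mu = rng.normal(0.0, 10.0, size=n)                     # stratum nuisance means
X = rng.normal(mu[:, None], np.sqrt(sigma2), size=(n, k))
mle = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean()
print(mle, "vs", (1 - 1 / k) * sigma2)                 # approx 2.0, not 4.0
```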
5. Bayesian inference

Ghosh has always been very fond of Bayesian ideas and, later in his career, he became more convinced that the Bayesian approach to statistics is more natural and fruitful. Over the course of his investigations, Ghosh examined each aspect of the Bayesian formulation, from construction of a prior to model selection, to asymptotic properties. And again, this can be seen as a continuation from his asymptotic work. After all, once the asymptotics are developed, we want to see how they can be used in a complete inference problem, and the Bayesian setting provides a unified context. Indeed, Ghosh's contributions have helped speed the development of several branches of Bayesian analysis because of his asymptotic orientation.

Ghosh had always been pragmatic and thought that a good statistical method should have good frequentist properties as well as sensible conditional properties. Moreover, as in the frequentist case, asymptotics often play a vital role in Bayesian inference, and one of the recurring themes in Ghosh's work has been the quest for frequentist properties of posterior distributions. As one of the leaders in developing objective Bayesian methods, he regularly worked to reconcile the two schools of thought. The paper [57] elaborately reviews issues and developments in objective Bayesian methodology.

Ghosh's Bayesian work can be broadly grouped into four categories. He has worked on frequentist matching and other objective priors. He has worked on determining the limiting behavior of posterior distributions in the parametric context. Then, he has turned his attention to richer model classes, examining Bayesian nonparametrics and model selection.

5.1. Matching and other objective priors

Ghosh had never been very keen on the term noninformative to describe priors that are constructed through some automatic mechanism rather than through a subjective assessment of odds. His preference was to use these priors as objective or default priors in the absence of genuine subjective information. To him, such priors can be obtained by any one of various techniques including matching what a frequentist might use, invariance, entropy-type maximizations (reference priors), approximation, or anything that seemed reasonable.

The idea of a Bayesian choosing a prior so as to match frequentist inferences was originally introduced by Peers and Welch, but the term "probability matching prior" was first used by Ghosh and Mukerjee [42], and the approach became popular after Ghosh's presentation at the 1990 Valencia meeting. The basic idea is quite simple: choose a prior so that Bayesian notions like credibility approximately agree with the corresponding frequentist notions like confidence level. However, when asymptotic normality of the posterior distribution holds (discussed in the next subsection), it means that the variability according to the posterior distribution of a parameter is asymptotically equivalent to the sampling fluctuations of the MLE in the frequentist sense. This implies that Bayes-frequentist matching occurs for any prior under minimal restrictions. Consequently, to identify a prior uniquely, first order matching of limits is not enough. Satisfyingly, agreement continues to the next order, but only if the prior is of a certain form. Thus matching can be used to characterize a prior, which may then be thought of as objective, at least in the sense that it was not chosen according to the personal views of the experimenter.

Of course, neither Bayesian credibility sets nor frequentist confidence sets are unique, so when θ is a scalar, it is natural to look at one sided intervals. If W is a properly centered and normalized version of the parameter, then equating the posterior probability P^π(W ≤ t | X_1, ..., X_n) with the frequentist probability P_θ(W ≤ t) for t, and ensuring both sides are 1 − α up to o(n^{−1/2}) for each α, leads to a differential equation. The Jeffreys prior is the solution to this equation.
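In one dimension this computation is classical (the Welch–Peers result, stated here in standard notation for orientation): the o(n^{−1/2}) matching requirement reduces to

```latex
\frac{d}{d\theta}\Bigl\{\pi(\theta)\, I(\theta)^{-1/2}\Bigr\} = 0
\quad\Longrightarrow\quad
\pi(\theta) \propto I(\theta)^{1/2},
```

with I(θ) the Fisher information, which is exactly Jeffreys' prior.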
A multiparameter version of this frequentist-Bayesian matching was used in [43]. In higher dimensions, the components of W may be defined by successively computing the regression residual of the current component over the earlier components. Naturally, this depends on the ordering of the parameters, but the dependence is not present when the parameters are orthogonal. In these cases, the matching criterion leads to partial differential equations. Curiously, Jeffreys' prior is a solution in some but not all cases: the location-scale problem is an important exception. In fact, it is well known that Jeffreys' prior, which is also the left Haar measure, may be an inappropriate choice in this case, so the matching criterion genuinely leads to sensible solutions even in high dimensional cases.

The matching prior is closely related to other important objective priors such as the reference prior. Reference priors often depend on the role of the parameter; nuisance parameters are treated differently from parameters of interest. Interestingly, in the two parameter case, an acute observation of Ghosh is that the reverse reference prior, rather than the reference prior itself, is probability matching. Here, by reverse reference prior, we mean the reference prior computed by reversing the roles of the parameter of interest and the nuisance parameter. More details and discussion of other properties, such as weak minimaxity, may be found in [42].

Although matching posterior probabilities does yield useful insight, highest posterior density (HPD) regions are more efficient credible sets from a Bayesian standpoint. Accordingly, matching the coverage probability of HPD regions with the credibility is an alternative that might be more appealing to some Bayesians. When this matching is done to o(n^{−1}), it leads to differential equations characterizing prior distributions. These were derived in [44]. In some cases, Jeffreys' prior solves these equations and so is a matching prior in the sense of coverage probability as well. A related paper is [46]. Matching the coverage of one-sided posterior credibility intervals for parametric functions up to O(n^{−1}) was studied in [17].

Alternatively, instead of characterizing a prior through matching, one might ask if there is some adjustment to make matching work for any prior satisfying mild general conditions. Indeed, in [47], it is shown, with examples, that if the center of the (1 − α)-HPD ellipsoid is appropriately shifted by an o(n^{−1/2}) amount, where the correction is obtained by solving an equation depending on the prior, then the resulting perturbed HPD ellipsoid's coverage is 1 − α + o(n^{−1}).

Of course, there are many sensible notions of objectivity for a prior other than matching. Invariance is often the driving force in group models, where a group of transformations is acting on the parameter space and the parameter of interest is the maximal invariant parametric function. In [18], a detailed study of various priors, such as the Chang–Eaves prior, for group models is given in the light of matching and the marginalization paradox.

5.2. Limits of posterior distributions

One of the most intriguing results in statistics is the Bernstein–von Mises theorem, which states that the posterior distribution of the parameter, centered at the MLE and scaled by √n times the square root of the Fisher information, converges to the standard normal distribution almost surely, as the sample size increases to infinity. This parallels the frequentist result that √n(θ̂ − θ_true) is asymptotically normal with variance given by the inverse Fisher information. In essence, posterior normality implies that in an asymptotic sense, at least to first order, any sensible Bayesian must agree eventually with frequentist notions of variability.
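In standard notation (conditions omitted; this display is for orientation), the classical parametric statement is

```latex
\Bigl\| \Pi\bigl(\sqrt{n}\,(\theta - \hat{\theta}_n) \in \cdot \,\bigm|\, X_1,\dots,X_n\bigr)
- N\bigl(0,\, I(\theta_0)^{-1}\bigr) \Bigr\|_{\mathrm{TV}} \;\longrightarrow\; 0,
```

where θ̂_n is the MLE, I(θ_0) is the Fisher information at the true value, and the convergence holds almost surely (or in probability, depending on the version).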
Ghosh worked to extend posterior normality in a variety of directions. One natural idea is to look at higher order properties so that the usual normal limit is viewed as merely the first term in an asymptotic expansion. This parallels the sense in which an Edgeworth expansion is an improvement over the standard central limit theorem. Johnson pioneered such expansions, but the probability statements in his expansions are in terms of the true distribution of the sample. Often, a Bayesian is more interested in bounds that are uniform on sets with high probability in the marginal distribution of the sample. In [61], precise conditions were given so that Johnson's expansion of the posterior distribution holds on a set with marginal probability 1 − O(n^{−r}), where r is the extra number of terms in the expansion, i.e., not counting the leading normal. It was also shown, by counterexamples, that some of the earlier published results in the field are incorrect.

Sometimes it is meaningful to condition on a statistic rather than the full data to obtain the posterior distribution. In particular, since the sample mean is a widely used summary measure, it is natural to ask if a version of the Bernstein–von Mises theorem holds when the posterior is computed given only the mean. Provided that the expectation and variance are smooth functions, and the eigenvalues of the covariance matrix are uniformly bounded and bounded away from zero, it is shown in [15] that a normal limit for the posterior distribution is obtained. The variance of the limiting distribution can equal the variance of an observation, but in general, the normal limit can differ from that in the usual Bernstein–von Mises theorem, unless the sample mean is asymptotically sufficient. The proof is based on an Edgeworth expansion for the sample mean and a local limit theorem. The idea extends to independent but non-identically distributed observations.

More broadly, the Bernstein–von Mises phenomenon in a parametric family may be seen as the convergence of the posterior density of the standardized parameter to a non-degenerate distribution. In general, the centering need not be at the MLE, the scaling need not be √n and the limit distribution need not be normal. Indeed, in some nonregular families such as the uniform distribution on [0, θ] or the location family of the exponential distribution, centering by the Bayes estimator and scaling by n yields an exponential limit. This leads to the following question: For which families will a limit of the posterior distribution exist? When it does exist, what is the correct centering, scaling and limiting distribution? This problem is germane to approximating posterior distributions numerically when n is large. Under the general setting of a parametric family considered by Ibragimov and Has'minskii in their book, a very elegant characterization was given in [35] and [23] in terms of the behavior of the limiting (local) likelihood ratio process of the model, Z_n(u) = p(X^n; θ + r_n u)/p(X^n; θ), where X^n is the observation at stage n and r_n is the appropriate normalizer for the problem. Usually X^n = (X_1, ..., X_n) and n is the sample size. Let Z(u) stand for the weak limit of Z_n(u) and ξ(u) = Z(u)/∫Z(v) dv, a random probability density.
Under the natural scaling in the family, the posterior distribution converges to a limit, after appropriate centering, if and only if ξ(u) = g(u + W) for some fixed probability density g and a random variable W, i.e., as a random element in L_1, ξ(·) is a random location shift of a fixed probability density g. When this holds, g is the limit of the posterior density. Clearly, this is a stringent representation, so in many nonregular cases the posterior distribution will not have a limit. Interestingly, in the regular cases, local asymptotic normality implies that ξ(u) = g(u + W), in which g is normal and W is a random normal shift. Thus this yields a Bernstein–von Mises theorem under an extremely general condition. A similar limit theorem holds with an exponential limit whenever densities are positively supported on an interval [a(θ), b(θ)], where the support is either expanding or contracting.

While it is disappointing to find that posterior limits exist only in relatively rare cases, this does not rule out the possibility of finding useful approximations to the posterior distributions depending on the sample size n. For change-point problems, where the density jumps from a positive value to another positive value at an unknown location but is otherwise smooth, a useful approximation was obtained in [24] by normalizing an approximation to the likelihood ratio process. It turns out that a certain mixture of n many truncated and shifted exponential densities is a good approximation.

5.3. Bayesian nonparametrics

Ghosh's involvement with Bayesian nonparametrics started in the mid 90's with the paper [51], attempting to determine whether the priors used for survival analysis lead to consistency under censoring. This paper showed that for the Dirichlet process, the posterior under censoring can be represented as a Pólya tree process whose partition depends on the data, and then consistency can be obtained from the tail-free property of Pólya tree processes. The question is followed up in subsequent papers [53] and [36]. Since then, Ghosh has continued to be one of the most important contributors to understanding the asymptotics of Bayesian nonparametrics.

For instance, the search for a noninformative prior for infinite dimensional models, as an extension to the finite dimensional case, is ongoing. One approach is to generalize the notion of a uniform distribution. This was proposed in [19] using uniform distributions on discrete approximations to a space found by maximal ε-dispersed sets. Even in the parametric setting this approach is fundamental and leads to Jeffreys' prior. The approach gives consistency in infinite-dimensional cases.

More typically, Ghosh was strongly motivated by the examples of inconsistency of posterior distributions in infinite dimensional models. While he appreciated those illuminating examples, he was always hopeful that Bayes' methods would work if priors were constructed properly. He was particularly fond of the Kullback–Leibler property, which requires that the true distribution be in the support of the prior when distances are measured in terms of Kullback–Leibler divergence. That is, the prior should assign strictly positive probability to every Kullback–Leibler neighborhood around the true distribution.
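Formally, in standard notation, a prior Π has the Kullback–Leibler property at the true density f_0 if

```latex
\Pi\Bigl(\Bigl\{ f : \int f_0 \log \tfrac{f_0}{f} < \varepsilon \Bigr\}\Bigr) > 0
\qquad \text{for every } \varepsilon > 0.
```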
Because of this, Ghosh thought the Dirichlet process was inappropriate in many contexts, despite its evident utility, since its discreteness means it fails to have anything in its Kullback–Leibler support. In [20], it was shown that a prior with the Kullback–Leibler property, such as a suitable Pólya tree or a Dirichlet mixture process, can overcome the inconsistency property of Dirichlet processes for estimating a location parameter. Essentially the same phenomenon appears in linear regression models, as shown in [1]. In that paper, the first extension of a general posterior consistency theorem to independent non-identically distributed variables is also developed.

In Bayesian nonparametrics, consistency often combines testing concepts with sieves. A celebrated result of Schwartz emphasizes the role of tests for the true density f_0 versus the complement of a neighborhood, say V, around it. The basic idea is to construct tests, by covering V^c with many small balls, say B_i's, and testing f_0 versus B_i for each i using powerful tests. One can then simply look at the maximum of all tests against each small ball, whose type II error probability is clearly under control and whose type I error probability is bounded by the common exponential bound for error probability multiplied by the number of small balls required to cover V^c. Thus the concept of metric entropy, which is the logarithm of the number of balls required to cover a set, comes into the picture. Generally V^c is not compact, and it is not possible to cover it by finitely many small balls. The difficulty can be overcome by using a sieve, which is a sequence of increasing subsets of a parameter space that gradually fill out the whole parameter space. One may ignore the portion of the parameter space outside the sieve as long as that part has exponentially small prior probability. Now one must control the metric entropy of the sieve to ensure that it does not grow faster than a small multiple of n. This style of proof gives consistency for density estimation with Dirichlet mixtures of normal kernels, as shown in [21], providing a large sample justification for the most widely used Bayesian density estimator. This approach works for density estimation with other priors in place of the Dirichlet mixtures. In [68], consistency is obtained for the logistic Gaussian prior for a density, that is, a prior on densities obtained by exponentiating and then normalizing a Gaussian process.
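A toy rendering (ours) of the logistic Gaussian construction: exponentiate a Gaussian process path on a grid and normalize, producing one draw from a random density on [0, 1]. The squared-exponential kernel and its length-scale are arbitrary choices for the sketch.

```python
# One draw from a (discretized) logistic Gaussian prior on densities on [0, 1].
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(0, 1, 200)
# squared-exponential covariance kernel; length-scale 0.2 chosen arbitrarily
K = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.2) ** 2)
path = rng.multivariate_normal(np.zeros(len(grid)), K + 1e-8 * np.eye(len(grid)))
density = np.exp(path)
density /= np.trapz(density, grid)      # normalize to integrate to 1
print(np.trapz(density, grid))          # ~1.0: a valid random density
```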
The importance of entropy for posterior consistency appeared in [21]. There it is seen that in the nonparametric setting prior positivity at the true density must be satisfied, but in terms of special neighborhoods given by the Kullback–Leibler number. Moreover, it must be possible to choose a sieve whose entropy grows no faster than the rate O(n), while ensuring that the prior probability of the complement of the sieve is exponentially small as n increases. This observation led to the derivation of the results on posterior convergence rates in [25], in the sense that the conditions for rates can be viewed as quantitative analogs of the conditions for consistency. For instance, instead of just requiring that a fixed ε neighborhood in the Kullback–Leibler sense has positive prior probability, one now needs to show that the prior probability of the Kullback–Leibler neighborhood of radius ε_n is at least e^{−nε_n²}, where ε_n is the intended rate of posterior convergence. In a similar manner, requiring that the ε_n-entropy of the sieve be bounded by a multiple of nε_n² is reminiscent of the condition that the ε-entropy of the sieve should be bounded by a small multiple of n. Thus, for fixed ε_n, this reduces to the condition for consistency.

The paper also constructs a prior achieving optimal rates of convergence by bracketing densities above and below by two functions: choosing a finite collection of brackets to provide upper and lower bounds for any probability density in the given class. This ensures good approximation of any function within the bracket together with control over likelihood ratios. This can be viewed as a refinement of the construction proposed in [19]. Other approaches to optimal rates are also discussed, most notably through exponential families generated using a B-spline basis.

Many aspects of Bayesian asymptotics for infinite dimensional models are neatly summarized in the review [22], and thoroughly discussed in [52], which to date is the only book dealing with asymptotic results in Bayesian nonparametrics.

5.4. Model selection and Bayesian hypothesis testing

Testing hypotheses is a major area where frequentist and Bayesian procedures often differ substantially. There is a tendency for frequentist methods to over-reject, just as there is a tendency for Bayes' methods to under-reject, as in the Lindley paradox. Results such as the consistency of the Bayesian information criterion (BIC) bridge the gap somewhat because the BIC approximates Bayes factors and is frequentist consistent for model selection under appropriate conditions, in the sense that the BIC selects the correct model with probability tending to one.

These properties of the BIC are valid only if the dimension p of the model remains fixed. However, for many applications, especially for complex data containing numerous variables commonly arising nowadays, the BIC fails to approximate the Bayes factor adequately and is no longer guaranteed to be consistent. The main reason for the failure is ignoring certain terms in an expansion of the Bayes factor which are not negligible when p → ∞. The difficulty can be avoided by paying proper attention to these terms. In [3], a correction is proposed by introducing two more terms, one proportional to p and the other to log p, as well as changing the meaning of the sample size to the number of replications. The resulting "generalized BIC" then selects the correct model with increasingly high probability. Another generalization of BIC is developed in [11], which works in a general exponential family. These generalizations of BIC are powerful tools to overcome the challenges posed by high-dimensional data problems of contemporary statistics.
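A minimal sketch (ours) of the fixed-dimension case that the generalized BIC extends: for nested polynomial regression models, BIC = n log(RSS/n) + p log n selects the true degree with probability tending to one as n grows.

```python
# BIC-based model selection among nested polynomial regression models.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.5, size=n)   # true degree 2

def bic(degree):
    X = np.vander(x, degree + 1)                  # design matrix up to x^degree
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    p = degree + 2                                # coefficients plus error variance
    return n * np.log(rss / n) + p * np.log(n)

print(min(range(6), key=bic))                     # selects degree 2
```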
In model selection problems, the definition of optimality is often tricky. An appealing approach is comparison with the oracle, that is, with the best procedure (for a given loss function) which uses the knowledge of the correct model in making decisions. A parametric empirical Bayes (PEB) approach approximates the Bayes factor by deriving the rule in a parametric model but estimating the parameters in the penalty function by a penalized likelihood criterion with data dependent penalty. In [66], the relative performance of a PEB, the AIC and the BIC was thoroughly studied through asymptotics and simulations under both 0-1 and prediction loss. The conclusion is that the BIC performs badly, but PEB rules perform quite satisfactorily, and so does the AIC. If Bayes estimates are used in making predictions, instead of least squares estimators, a PEB performs better than the AIC.

One particular difficulty with the Bayes factor is that it is undefined when improper priors are used in individual models. Various remedies are proposed in the literature, based on the idea of using a part of the information contained in the data (the training portion) to make priors proper, and using the remaining portion in Bayesian analysis with the obtained "proper prior". Since this typically depends on the ordering of the data, some kind of averaging, through bootstrap or cross validation, over different choices of the training portion is desired. A particularly popular candidate among these Bayes factors is obtained by taking a geometric average. In [67], such Bayes factors are studied through asymptotics as the proportion of the training sample varies, and conditions for consistency are obtained as the total sample size goes to infinity. It turns out that the predictive optimality often claimed for the "geometric Bayes factor" is not entirely correct.

There are many other significant papers on model selection authored by Ghosh. In [10], optimality of the AIC in inference about Brownian motion is shown. The reviews [56] and [55] contain a wealth of information on Bayesian model selection.

6. Concluding remarks

Overall, Ghosh's work in statistics reveals a progression. He began with individual data points, proceeded to data summarization, and then to the asymptotics of inference. Ghosh's results there were a successful attempt to map out where the accumulation of data tends to point. In a sense, asymptotic limits are the ultimate data summarization. Then, putting it all together, Ghosh turned to the Bayesian formulation, examining each of its components, prior, model, posterior, in turn, to permit a comprehensive and unified study of the statistical problem. Indeed, his recent work on Bayesian nonparametrics is a further generalization, again a logical step because it builds on his earlier work by using ever richer model classes.

In fact, Ghosh has worked in many more areas of statistics, apart from those outlined above, as well as working on a variety of applications. These topics include distribution theory, decision theory, robustness, finite population sampling, reliability, quality control, modeling hydrocarbon discovery, geological mapping and DNA fingerprinting.

Finally, every great researcher has a strategy, a method or a drive, often summarized in a maxim, that guides or motivates their intellectual endeavors. One of Ghosh's maxims was the injunction: "Settle the question!" By this he meant formulate a question so that answering it gives you something definite for the formulation of another question.
As can be inferred from the progression of his work, his questioning led him to an ever broader view of the statistical problem, culminating in a Bayesian treatment of high-dimensional models, nonparametric or not. Ghosh's injunction to settle questions has helped, and will continue to help, researchers all over the world to think deeply about the most important issues.

References

[1] Amewou-Atisso, M., Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (2003). Posterior consistency for semi-parametric regression problems. Bernoulli 9 291–312. MR1997031
[2] Basu, D. and Ghosh, J. K. (1969). Sufficient statistics in sampling from a finite universe. Bull. Inst. Internat. Statist. 42 850–858. MR0286197
[3] Berger, J. O., Ghosh, J. K. and Mukhopadhyay, N. (2003). Approximations and consistency of Bayes factors as model dimension grows. J. Statist. Plann. Inference 112 241–258. MR1961733
[4] Bhanja, J. and Ghosh, J. K. (1992). Efficient estimation with many nuisance parameters. I. Sankhyā Ser. A 54 1–39. MR1189781
[5] Bhanja, J. and Ghosh, J. K. (1992). Efficient estimation with many nuisance parameters. II. Sankhyā Ser. A 54 135–156. MR1192091
[6] Bhanja, J. and Ghosh, J. K. (1992). Efficient estimation with many nuisance parameters. III. Sankhyā Ser. A 54 297–308. MR1216288
[7] Bhattacharya, R. N. and Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6 434–451. MR0471142
[8] Bhattacharya, R. N. and Ghosh, J. K. (1988). On moment conditions for valid formal Edgeworth expansions. J. Multivariate Anal. 27 68–79. MR0971173
[9] Bickel, P. J. and Ghosh, J. K. (1990). A decomposition for the likelihood ratio statistic and the Bartlett correction – a Bayesian argument. Ann. Statist. 18 1070–1090. MR1062699
[10] Chakrabarti, A. and Ghosh, J. K. (2006). Optimality of AIC in inference about Brownian motion. Ann. Inst. Statist. Math. 58 1–20. MR2281204
[11] Chakrabarti, A. and Ghosh, J. K. (2006). A generalization of BIC for the general exponential family. J. Statist. Plann. Inference 136 2847–2872. MR2281234
[12] Chandra, T. K. and Ghosh, J. K. (1978). Comparison of tests with same Bahadur efficiency. Sankhyā Ser. A 40 253–277. MR0589281
[13] Chandra, T. K. and Ghosh, J. K. (1979). Valid asymptotic expansions for the likelihood ratio statistic and other perturbed chi-square variables. Sankhyā Ser. A 41 22–47. MR0615038
[14] Chandra, T. K. and Ghosh, J. K. (1980). Valid asymptotic expansions for the likelihood ratio and other statistics under contiguous alternatives. Sankhyā Ser. A 42 170–184. MR0656254
[15] Clarke, B. and Ghosh, J. K. (1995). Posterior convergence given the mean. Ann. Statist. 23 2116–2144. MR1389868
[16] DasGupta, A. and Ghosh, J. K. (1983). Some remarks on second-order admissibility in the multiparameter case. Sankhyā Ser. A 45 181–190. MR0748457
[17] Datta, G. S. and Ghosh, J. K. (1995). On priors providing frequentist validity for Bayesian inference. Biometrika 82 37–45. MR1332838
[18] Datta, G. S. and Ghosh, J. K. (1995). Noninformative priors for maximal invariant parameter in group models. Test 4 95–114. MR1365042
[19] Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (1997). Noninformative priors via sieves and packing numbers.
There are many other significant papers on model selection authored by Ghosh. In [10], optimality of the AIC in inference about Brownian motion is shown. The reviews [56] and [55] contain a wealth of information on Bayesian model selection.

6. Concluding remarks

Overall, Ghosh's work in statistics reveals a progression. He began with individual data points, proceeded to data summarization, and then to the asymptotics of inference. His results there were a successful attempt to map out where the accumulation of data tends to point; in a sense, asymptotic limits are the ultimate data summarization. Then, putting it all together, Ghosh turned to the Bayesian formulation, examining each of its components, prior, model and posterior, in turn, to permit a comprehensive and unified study of the statistical problem. Indeed, his recent work on Bayesian nonparametrics is a further generalization, again a logical step because it builds on his earlier work by using ever richer model classes.

In fact, Ghosh has worked in many more areas of statistics than those outlined above, as well as on a variety of applications. These topics include distribution theory, decision theory, robustness, finite population sampling, reliability, quality control, modeling hydrocarbon discovery, geological mapping and DNA fingerprinting.

Finally, every great researcher has a strategy, a method or a drive, often summarized in a maxim, that guides or motivates their intellectual endeavors. One of Ghosh's maxims was the injunction: "Settle the question!" By this he meant: formulate a question so that answering it gives you something definite for the formulation of another question. As can be inferred from the progression of his work, his questioning led him to an ever broader view of the statistical problem, culminating in a Bayesian treatment of high-dimensional models, nonparametric or not. Ghosh's injunction to settle questions has helped, and will continue to help, researchers all over the world to think deeply about the most important issues.

References

[1] Amewou-Atisso, M., Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (2003). Posterior consistency for semi-parametric regression problems. Bernoulli 9 291–312. MR1997031
[2] Basu, D. and Ghosh, J. K. (1969). Sufficient statistics in sampling from a finite universe. Bull. Inst. Internat. Statist. 42 850–858. MR0286197
[3] Berger, J. O., Ghosh, J. K. and Mukhopadhyay, N. (2003). Approximations and consistency of Bayes factors as model dimension grows. J. Statist. Plann. Inference 112 241–258. MR1961733
[4] Bhanja, J. and Ghosh, J. K. (1992). Efficient estimation with many nuisance parameters. I. Sankhyā Ser. A 54 1–39. MR1189781
[5] Bhanja, J. and Ghosh, J. K. (1992). Efficient estimation with many nuisance parameters. II. Sankhyā Ser. A 54 135–156. MR1192091
[6] Bhanja, J. and Ghosh, J. K. (1992). Efficient estimation with many nuisance parameters. III. Sankhyā Ser. A 54 297–308. MR1216288
[7] Bhattacharya, R. N. and Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6 434–451. MR0471142
[8] Bhattacharya, R. N. and Ghosh, J. K. (1988). On moment conditions for valid formal Edgeworth expansions. J. Multivariate Anal. 27 68–79. MR0971173
[9] Bickel, P. J. and Ghosh, J. K. (1990). A decomposition for the likelihood ratio statistic and the Bartlett correction – a Bayesian argument. Ann. Statist. 18 1070–1090. MR1062699
[10] Chakrabarti, A. and Ghosh, J. K. (2006). Optimality of AIC in inference about Brownian motion. Ann. Inst. Statist. Math. 58 1–20. MR2281204
[11] Chakrabarti, A. and Ghosh, J. K. (2006). A generalization of BIC for the general exponential family. J. Statist. Plann. Inference 136 2847–2872. MR2281234
[12] Chandra, T. K. and Ghosh, J. K. (1978). Comparison of tests with same Bahadur efficiency. Sankhyā Ser. A 40 253–277. MR0589281
[13] Chandra, T. K. and Ghosh, J. K. (1979). Valid asymptotic expansions for the likelihood ratio statistic and other perturbed chi-square variables. Sankhyā Ser. A 41 22–47. MR0615038
[14] Chandra, T. K. and Ghosh, J. K. (1980). Valid asymptotic expansions for the likelihood ratio and other statistics under contiguous alternatives. Sankhyā Ser. A 42 170–184. MR0656254
[15] Clarke, B. and Ghosh, J. K. (1995). Posterior convergence given the mean. Ann. Statist. 23 2116–2144. MR1389868
[16] DasGupta, A. and Ghosh, J. K. (1983). Some remarks on second-order admissibility in the multiparameter case. Sankhyā Ser. A 45 181–190. MR0748457
[17] Datta, G. S. and Ghosh, J. K. (1995). On priors providing frequentist validity for Bayesian inference. Biometrika 82 37–45. MR1332838
[18] Datta, G. S. and Ghosh, J. K. (1995). Noninformative priors for maximal invariant parameter in group models. Test 4 95–114. MR1365042
[19] Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (1997). Noninformative priors via sieves and packing numbers. In Advances in Statistical Decision Theory and Applications 119–132. Stat. Ind. Technol. Birkhäuser, Boston. MR1479180
[20] Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (1999). Consistent semiparametric Bayesian inference about a location parameter. J. Statist. Plann. Inference 77 181–193. MR1687955
[21] Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (1999). Posterior consistency of Dirichlet mixtures in density estimation. Ann. Statist. 27 143–158. MR1701105
[22] Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (1999). Consistency issues in Bayesian nonparametrics. In Asymptotics, Nonparametrics, and Time Series 639–667. Statist. Textbooks Monogr. 158. Dekker, New York. MR1724711
[23] Ghosal, S., Ghosh, J. K. and Samanta, T. (1995). On convergence of posterior distributions. Ann. Statist. 23 2145–2152. MR1389869
[24] Ghosal, S., Ghosh, J. K. and Samanta, T. (1999). Approximation of the posterior distribution in a change-point problem. Ann. Inst. Statist. Math. 51 479–497. MR1722841
[25] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531. MR1790007
[26] Ghosh, J. K. (1960). On some properties of sequential t-test. Calcutta Statist. Assoc. Bull. 9 77–86. MR0114277
[27] Ghosh, J. K. (1960). On the monotonicity of the OC of a class of sequential probability ratio tests. Calcutta Statist. Assoc. Bull. 9 139–144. MR0117849
[28] Ghosh, J. K. (1961). On the optimality of probability ratio tests in sequential and multiple sampling. Calcutta Statist. Assoc. Bull. 10 73–92. MR0130774
[29] Ghosh, J. K. (1964). Bayes solutions in sequential problems for two or more terminal decisions and related results. Calcutta Statist. Assoc. Bull. 13 101–122. MR0172422
[30] Ghosh, J. K. (1971). A new proof of the Bahadur representation of quantiles and an application. Ann. Math. Statist. 42 1957–1961. MR0297071
[31] Ghosh, J. K. (1991). Higher order asymptotics for the likelihood ratio, Rao's and Wald's tests. Statist. Probab. Lett. 12 505–509. MR1143747
[32] Ghosh, J. K. (1994). Higher Order Asymptotics. NSF-CBMS Regional Conference Series in Probability and Statistics 4. IMS, Hayward, CA.
[33] Ghosh, J. K. and Chaudhuri, A. R. (1984). An invariant SPRT for identification. Sequential Anal. 3 99–120. MR0767249
[34] Ghosh, J. K., Delampady, M. and Samanta, T. (2007). An Introduction to Bayesian Analysis, Theory and Methods. Springer, New York. MR2247439
[35] Ghosh, J. K., Ghosal, S. and Samanta, T. (1994). Stability and convergence of the posterior in non-regular problems. In Statistical Decision Theory and Related Topics V 183–199. Springer, New York. MR1286304
[36] Ghosh, J. K., Hjort, N. L., Messan, C. and Ramamoorthi, R. V. (2006). Bayesian bivariate survival estimation. J. Statist. Plann. Inference 136 2297–2308. MR2235060
[37] Ghosh, J. K., Morimoto, H. and Yamada, S. (1981). Neyman factorization and minimality of pairwise sufficient subfields. Ann. Statist. 9 514–530. MR0615428
[38] Ghosh, J. K. and Mukerjee, R. (1989). Some optimality results on Stein's two-stage sampling. In Statistical Data Analysis and Inference 251–256. North-Holland, Amsterdam. MR1089640
[39] Ghosh, J. K. and Mukerjee, R. (1990). Improvement in Stein's procedure using a random confidence coefficient. Calcutta Statist. Assoc. Bull. 40 145–152. MR1172640
[40] Ghosh, J. K. and Mukerjee, R. (1991). Characterization of priors under which Bayesian and frequentist Bartlett corrections are equivalent in the multiparameter case. J. Multivariate Anal. 38 385–393. MR1131727
[41] Ghosh, J. K. and Mukerjee, R. (1992). Bayesian and frequentist Bartlett corrections for likelihood ratio and conditional likelihood ratio tests. J. Roy. Statist. Soc. Ser. B 54 867–875. MR1185228
[42] Ghosh, J. K. and Mukerjee, R. (1992). Non-informative priors. In Bayesian Statistics 4 195–210. Oxford Univ. Press, New York. MR1380277
[43] Ghosh, J. K. and Mukerjee, R. (1993). On priors that match posterior and frequentist distribution functions. Canad. J. Statist. 21 89–96. MR1221860
[44] Ghosh, J. K. and Mukerjee, R. (1993). Frequentist validity of highest posterior density regions in the multiparameter case. Ann. Inst. Statist. Math. 45 293–302. MR1232496
[45] Ghosh, J. K. and Mukerjee, R. (1994). Adjusted versus conditional likelihood: power properties and Bartlett-type adjustment. J. Roy. Statist. Soc. Ser. B 56 185–188. MR1257806
[46] Ghosh, J. K. and Mukerjee, R. (1995). Frequentist validity of highest posterior density regions in the presence of nuisance parameters. Statist. Decisions 13 131–139. MR1342734
[47] Ghosh, J. K. and Mukerjee, R. (1995). On perturbed ellipsoidal and highest posterior density regions with approximate frequentist validity. J. Roy. Statist. Soc. Ser. B 57 761–769. MR1354080
[48] Ghosh, J. K. and Mukerjee, R. (2001). Test statistics arising from quasi-likelihood: Bartlett adjustment and higher-order power. J. Statist. Plann. Inference 97 45–55. MR1851373
[49] Ghosh, J. K., Mukerjee, R. and Sen, P. K. (1996). Second-order Pitman admissibility and Pitman closeness: the multiparameter case and Stein-rule estimators. J. Multivariate Anal. 57 52–68. MR1392577
[50] Ghosh, J. K., Purkayastha, S. and Samanta, T. (2004). Sequential probability ratio tests based on improper priors. Sequential Anal. 23 585–602. MR2103910
[51] Ghosh, J. K. and Ramamoorthi, R. V. (1995). Consistency of Bayesian inference for survival analysis with or without censoring. In Analysis of Censored Data. IMS Lecture Notes Monogr. Ser. 27. IMS, Hayward, CA. MR1483342
[52] Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. Springer, New York. MR1992245
[53] Ghosh, J. K., Ramamoorthi, R. V. and Srikanth, K. R. (1999). Bayesian analysis of censored data. Statist. Probab. Lett. 41 255–265. MR1672393
[54] Ghosh, J. K. and Roy, K. K. (1972). Families of densities with non-constant carriers which have finite dimensional sufficient statistics. Sankhyā Ser. A 34 205–226. MR0378158
[55] Ghosh, J. K. and Samanta, T. (2001). Model selection – an overview. Current Science 80 (9) 1135–1144.
[56] Ghosh, J. K. and Samanta, T. (2002). Nonsubjective Bayes testing – an overview. J. Statist. Plann. Inference 103 205–223. MR1896993
[57] Ghosh, J. K. and Samanta, T. (2002). Towards a nonsubjective Bayesian paradigm. In Uncertainty and Optimality 1–69. World Sci. Publ., River Edge, NJ. MR1955963
[58] Ghosh, J. K., Sen, P. K. and Mukerjee, R. (1994). Second-order Pitman closeness and Pitman admissibility. Ann. Statist. 22 1133–1141. MR1311968
[59] Ghosh, J. K. and Sinha, B. K. (1981). A necessary and sufficient condition for second order admissibility with applications to Berkson's bioassay problem. Ann. Statist. 9 1334–1338. MR0630116
[60] Ghosh, J. K. and Sinha, B. K. (1982). Third order efficiency of the MLE – a counterexample. Calcutta Statist. Assoc. Bull. 31 151–158. MR0702402
[61] Ghosh, J. K., Sinha, B. K. and Joshi, S. N. (1982). Expansions for posterior probability and integrated Bayes risk. In Statistical Decision Theory and Related Topics III 1 403–456. Academic Press, New York. MR0705299
[62] Ghosh, J. K., Sinha, B. K. and Subramanyam, K. (1979). Edgeworth expansions for Fisher-consistent estimators and second order efficiency. Calcutta Statist. Assoc. Bull. 28 1–18. MR0586079
[63] Ghosh, J. K., Sinha, B. K. and Wieand, H. S. (1980). Second order efficiency of the MLE with respect to any bounded bowl-shaped loss function. Ann. Statist. 8 506–521. MR0568717
[64] Ghosh, J. K. and Subramanyam, K. (1974). Second order efficiency of maximum likelihood estimators. Sankhyā Ser. A 36 325–358. MR0428572
[65] Hall, W. J., Wijsman, R. A. and Ghosh, J. K. (1965). The relationship between sufficiency and invariance with applications in sequential analysis. Ann. Math. Statist. 36 575–614. MR0178552
[66] Mukhopadhyay, N. and Ghosh, J. K. (2003). Parametric empirical Bayes model selection – some theory, methods and simulation. In Probability, Statistics and Their Applications: Papers in Honor of Rabi Bhattacharya 229–245. IMS Lecture Notes Monogr. Ser. 41. IMS, Beachwood, OH. MR1999424
[67] Mukhopadhyay, N., Ghosh, J. K. and Berger, J. O. (2005). Some Bayesian predictive approaches to model selection. Statist. Probab. Lett. 73 369–379. MR2187852
[68] Tokdar, S. T. and Ghosh, J. K. (2007). Posterior consistency of logistic Gaussian process priors in density estimation. J. Statist. Plann. Inference 137 34–42. MR2292838