We consider inference procedures, conditional on an observed ancillary statistic, for regression coefficients under a linear regression setup where the unknown error distribution is specified nonparametrically. We establish conditional asymptotic normality of the regression coefficient estimators under regularity conditions, and formally justify the approach of plugging in kernel-type density estimators in conditional inference procedures. Simulation results show that the approach yields accurate conditional coverage probabilities when used to construct confidence intervals. The plug-in approach can be applied in conjunction with configural polysampling to derive robust conditional estimators adaptive to a confrontation of contrasting scenarios. We demonstrate this by investigating, in a simulation study, the conditional mean squared error of location estimators under various confrontations, thereby extending configural polysampling to a nonparametric context.
The classical conditionality principle (Fisher (1934, 1935); Cox and Hinkley (1974)) demands that statistical inference be made relevant to the data at hand by conditioning on ancillary statistics. Arguments for this are best seen from the examples in Cox and Hinkley (1974, Ch. 2). Further discussion can be found in Barndorff-Nielsen (1978) and in Lehmann (1981). Under regression models, the ancillary statistic takes the form of studentized residuals. Conditional inference about regression coefficients has been discussed by Fraser (1979), Hinkley (1978), DiCiccio (1988), DiCiccio, Field and Fraser (1990), and Severini (1996), among others. When the error density is completely specified, approximate conditional inference can be made by Monte Carlo simulation or by numerical integration techniques. The procedure nevertheless becomes computationally intensive if the parameter is of high dimension, in which case large-sample approximations such as those proposed by DiCiccio (1988) and DiCiccio, Field and Fraser (1990) may be necessary. In a nonparametric context where the error density is unspecified, conditional inference has not received much attention despite its clear practical relevance. Fraser (1976) and Severini (1994) tackle the special case of location models. Both suggest plugging in kernel density estimates, but provide neither theoretical justification for the approach nor any formal guidance on the choice of bandwidth. The need for sophisticated Monte Carlo or numerical integration techniques endures, and the computational cost is even higher than in the parametric case. Details of the computational procedures can be found in Severini (1994) and Seifu, Severini and Tanner (1999). In the present paper we prove asymptotic consistency, conditional on the ancillary statistic, of plugging in the kernel density estimator, and derive orders of bandwidths sufficient for ensuring such consistency. Our proof also suggests a normal approximation to the plug-in approach which is computationally much more efficient for high-dimensional regression estimators.
Consideration of conditionality has motivated different notions of robustness for regression models: see Fraser (1979), Barnard (1981, 1983), Hinkley (1983) and Severini (1992, 1996). Morgenthaler and Tukey (1991) propose a configural polysampling technique for robust conditional inference, which compromises results obtained separately from a confrontation of contrasting error distributions and provides a global perspective for robustness. Our plug-in approach extends configural polysampling to a nonparametric context, substantially broadens the scope of confrontation, and enhances the global nature of the robustness attributed to the resulting inference procedure. Section 2.1 describes the problem setting. Section 2.2 reviews a bootstrap approach to unconditional inference for regression coefficients. The case of conditional inference is treated in Section 2.3. Section 3 investigates the asymptotics underlying the plug-in approach. Section 4 reviews configural polysampling and extends it to nonparametric confrontations by the plug-in approach. Empirical results are given in Section 5. Section 6 concludes our findings. All proofs are given in the Appendix.
Consider a linear regression model $Y_i = x_i^{\mathrm T}\beta + \epsilon_i$, for $i = 1, \ldots, n$, where $x_i$, $\beta$ and $\epsilon_i$ denote the covariate vector, the vector of regression coefficients and the random error, respectively. When $f_0$, and hence $f$, is unspecified, exact conditional inference is not possible as the conditional likelihood of $\beta$ depends in general on $f_0$. Adopting Jørgensen's (1993) notion of I-sufficiency, we see that $A$ is I-sufficient for $f_0$, so that any relevant information about $f_0$ is contained in $A$. The same applies to $\tilde A$ and $f$. Such ancillary-informed knowledge about $f$ and $f_0$ forms the basis for nonparametric estimation of the conditional likelihood and facilitates nonparametric conditional inference in an approximate sense.
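For concreteness, one standard specification consistent with this setup is as follows; we state it only as a working assumption, since the exact definitions of $f$, $f_0$, $A$ and $\tilde A$ may differ in detail. Write $\epsilon_i = \sigma e_i$, where the $e_i$ are independent with common unit-scale density $f_0$, so that the error density is $f(u) = f_0(u/\sigma)/\sigma$. The ancillary $A$ is then the configuration of studentized residuals
$$ A = (a_1, \ldots, a_n), \qquad a_i = \frac{Y_i - x_i^{\mathrm T}\hat\beta}{\hat\sigma}, $$
computed from equivariant estimators $\hat\beta$ and $\hat\sigma$, while $\tilde A$ plays the analogous role for the unstandardized residuals $Y_i - x_i^{\mathrm T}\hat\beta$ under the regression model.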
Under the regression-scale model, the distribution
Conditional inference about $\beta$ replaces $G_T$ and $G_U$ used in the unconditional approach by their conditional counterparts given the observed ancillary; in particular, $G_T$ is replaced by the conditional distribution $G_{T|A}(\cdot\,|\,a)$ of $T$ given $A = a$.
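To illustrate how such a conditional distribution is used, suppose, as a working assumption for this example only, that $T = (\hat\beta - \beta)/\hat\sigma$ is the studentized pivot. An equal-tailed conditional confidence interval of level $1-\alpha$ for a single coefficient $\beta_j$ is then
$$ \bigl[\, \hat\beta_j - \hat\sigma\, q_{1-\alpha/2}(a),\ \ \hat\beta_j - \hat\sigma\, q_{\alpha/2}(a) \,\bigr], $$
where $q_\gamma(a)$ denotes the $\gamma$-quantile of the conditional distribution of $T_j$ given $A = a$; replacing the conditional quantiles by their unconditional counterparts based on $G_T$ recovers the usual unconditional interval.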
Consider first the regression-scale model. Define $S = \hat\sigma/\sigma$. The conditional joint density of $(S, T)$ given $A = a$ admits a closed-form expression depending on $f_0$.
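A standard version of this expression, stated here under the working assumptions that $T = (\hat\beta - \beta)/\hat\sigma$ and that $a_i = (Y_i - x_i^{\mathrm T}\hat\beta)/\hat\sigma$ are the studentized residuals (the exact normalization used in the sequel may differ), is
$$ g_{S,T|A}(s, t \mid a) \;\propto\; s^{\,n-1} \prod_{i=1}^{n} f_0\bigl\{ s\bigl(a_i + x_i^{\mathrm T} t\bigr)\bigr\}, \qquad s > 0, $$
with the proportionality constant determined by integration over $(s, t)$ for the given $a$. Conditional inference about $\beta$ then rests on the marginal conditional density $g_{T|A}(t \mid a)$, obtained by integrating out $s$.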
Here $k$ is a kernel function and $h > 0$ is the bandwidth of a kernel-type density estimate $\hat f_h(\cdot\,|\,a)$ constructed from the studentized residuals and substituted for $f_0$. This leads to nonparametric estimates $\hat G_{T|A}$ and $\hat g_{T|A}$ of $G_{T|A}$ and $g_{T|A}$ respectively, which can again be approximated by either Monte Carlo or numerical integration methods. We term this the "plug-in" (PI) approach to distinguish it from the "residual bootstrap" (RB) approach introduced earlier for unconditional inference. The use of the studentized residuals $a$ in its derivation guarantees that $\hat f_h(z\,|\,a)$ has unit scale asymptotically. Under symmetry of $f_0$, it might be beneficial in practice to use in place of $\hat f_h$ its symmetrized version, $\{\hat f_h(z\,|\,a) + \hat f_h(-z\,|\,a)\}/2$.
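For concreteness, a standard Rosenblatt-Parzen form of such an estimate, built from the studentized residuals $a = (a_1, \ldots, a_n)$ (the definition adopted in the sequel may differ, for instance in its rescaling), is
$$ \hat f_h(z \mid a) \;=\; \frac{1}{nh} \sum_{i=1}^{n} k\!\left( \frac{z - a_i}{h} \right). $$
It integrates to one, and for small $h$ its scale is close to that of the studentized residuals, consistent with the unit-scale property noted above.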
Under the regression model, the distribution and de