We study the distributions of the LASSO, SCAD, and thresholding estimators, both in finite samples and in the large-sample limit. The asymptotic distributions are derived both for the case where the estimators are tuned to perform consistent model selection and for the case where they are tuned to perform conservative model selection. Our findings complement those of Knight and Fu (2000) and Fan and Li (2001). We show that the distributions are typically highly non-normal regardless of how the estimator is tuned, and that this property persists in large samples. The uniform convergence rate of these estimators is also obtained, and is shown to be slower than $n^{-1/2}$ when the estimator is tuned to perform consistent model selection. An impossibility result regarding estimation of the estimators' distribution function is also provided.
Penalized maximum likelihood estimators have been studied intensively in the last few years. A prominent example is the least absolute shrinkage and selection operator (LASSO) of Tibshirani (1996). Related variants of the LASSO include the Bridge estimators studied by Frank and Friedman (1993), least angle regression (LARS) of Efron, Hastie, Johnstone, and Tibshirani (2004), and the smoothly clipped absolute deviation (SCAD) estimator of Fan and Li (2001). Other estimators that fit into this framework are hard- and soft-thresholding estimators. While many properties of penalized maximum likelihood estimators are now well understood, the understanding of their distributional properties, such as finite-sample and large-sample limit distributions, is still incomplete. Probably the most important contribution in this respect is Knight and Fu (2000), who study the asymptotic distribution of the LASSO estimator (and of Bridge estimators more generally) when the tuning parameter governing the influence of the penalty term is chosen so that the LASSO acts as a conservative model selection procedure (that is, a procedure that does not select underparameterized models asymptotically, but selects overparameterized models with positive probability asymptotically); see also Knight (2008). In Knight and Fu (2000), the asymptotic distribution is obtained in a fixed-parameter as well as in a standard local alternatives setup. This is complemented by a result in Zou (2006), who considers the fixed-parameter asymptotic distribution of the LASSO when tuned to act as a consistent model selection procedure. Another contribution is Fan and Li (2001), who derive the asymptotic distribution of the SCAD estimator when the tuning parameter is chosen so that the SCAD estimator performs consistent model selection; in particular, they establish the so-called 'oracle' property for this estimator. The results in that latter paper are also fixed-parameter asymptotic results.
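The distinction between conservative and consistent tuning can be made concrete in a small sketch. It assumes a Gaussian location model that this section does not itself specify: the estimator thresholds the mean of n i.i.d. N(theta, 1) observations at a level eta_n, so the zero model is selected exactly when the absolute sample mean does not exceed eta_n. The function names and the two tuning sequences (eta_n = e / sqrt(n) for conservative, eta_n = n^(-1/4) for consistent selection) are illustrative assumptions, not taken from the text.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def prob_estimate_is_zero(theta: float, n: int, eta: float) -> float:
    """P(|ybar| <= eta) when ybar ~ N(theta, 1/n) (assumed location model)."""
    root_n = n ** 0.5
    return PHI(root_n * (eta - theta)) - PHI(root_n * (-eta - theta))


def conservative_eta(n: int, e: float = 1.0) -> float:
    """Hypothetical conservative tuning: sqrt(n) * eta_n stays equal to e."""
    return e / n ** 0.5


def consistent_eta(n: int) -> float:
    """Hypothetical consistent tuning: eta_n -> 0 but sqrt(n) * eta_n -> infinity."""
    return n ** -0.25


if __name__ == "__main__":
    # True parameter is zero, so selecting the zero model is the correct choice.
    for n in (100, 10_000, 1_000_000):
        p_conservative = prob_estimate_is_zero(0.0, n, conservative_eta(n))
        p_consistent = prob_estimate_is_zero(0.0, n, consistent_eta(n))
        print(f"n={n:>9}: conservative {p_conservative:.4f}, "
              f"consistent {p_consistent:.4f}")
```

Under these assumptions the conservative tuning selects the correct zero model with a probability that stays bounded away from one for every n, so the overparameterized model is chosen with positive probability in the limit, whereas under the consistent tuning that probability tends to one.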
It is well known that fixed-parameter (i.e., pointwise) asymptotic results can give a misleading picture of the estimators' actual behavior, especially when the estimator performs model selection; see, e.g., Kabaila (1995), or Leeb and Pötscher (2005, 2008a). Therefore, it is of interest to take a closer look at the actual distributional properties of such estimators.
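A minimal numerical sketch of why pointwise asymptotics can mislead is given next. It again assumes a thresholded sample mean in a Gaussian location model with unit variance and the hypothetical consistent tuning eta_n = n^(-1/4); none of these specifics are stated in this section. Along a fixed nonzero theta the probability of wrongly estimating zero vanishes quickly, but along the moving sequence theta_n = eta_n / 2 it tends to one, even though theta_n is never zero.

```python
from statistics import NormalDist


def prob_zero(theta: float, n: int, eta: float) -> float:
    """P(|ybar| <= eta) for ybar ~ N(theta, 1/n) (assumed model)."""
    cdf = NormalDist(mu=theta, sigma=n ** -0.5).cdf
    return cdf(eta) - cdf(-eta)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        eta = n ** -0.25                      # hypothetical consistent tuning
        fixed = prob_zero(0.5, n, eta)        # fixed parameter theta = 0.5
        moving = prob_zero(eta / 2, n, eta)   # moving parameter theta_n = eta_n / 2
        # When the estimate is zero, the scaled error is sqrt(n) * |theta_n|.
        scaled_error = n ** 0.5 * eta / 2
        print(f"n={n:>8}: P(zero | fixed)={fixed:.4f}, "
              f"P(zero | moving)={moving:.4f}, "
              f"sqrt(n)*|error| on that event={scaled_error:.2f}")
```

On the event that the estimate is zero, the error scaled by sqrt(n) equals sqrt(n) * eta_n / 2, which diverges under this tuning. This is consistent with a uniform convergence rate slower than n^(-1/2), although the sketch is only an illustration and not a derivation of that rate.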
In the present paper we study the finite-sample as well as the asymptotic distributions of the hard-thresholding, the LASSO (which coincides with soft-thresholding in our context), and the SCAD estimator. We choose a model that is simple enough to facilitate an explicit finite-sample analysis that showcases the strengths and weaknesses of these estimators in a readily accessible framework. Yet, the model considered here is rich enough to demonstrate a variety of phenomena that will also occur in more complex models. We study both the case where the estimators are tuned to perform conservative model selection and the case where they are tuned to perform consistent model selection. We find that the finite-sample distributions can be decidedly non-normal (e.g., multimodal). Moreover, we find that a fixed-parameter asymptotic analysis gives highly misleading results. In particular, the 'oracle' property, which is based on a fixed-parameter asymptotic analysis, is shown not to provide a reliable assessment of the estimators' actual performance. For these reasons, we also obtain the asymptotic distributions of the estimators mentioned before in a general 'moving parameter' asymptotic framework, which better captures essential features of the finite-sample distribution. [Interestingly, it turns out that in the consistent model selection case a 'moving parameter' asymptotic framework more general than the usual $n^{-1/2}$-local asymptotic framework is necessary to exhibit the full range of possible limiting distributions.] Furthermore, we derive the uniform convergence rate of the estimators and show that it is slower than $n^{-1/2}$ in the case where the estimators are tuned to perform consistent model selection. This again exposes the misleading character of the 'oracle' property.
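For orientation, the three estimators can be sketched as thresholding rules applied to a sample mean. The sketch assumes a Gaussian location model with unit variance, which is an assumption made here for illustration only; the SCAD rule is written in the form familiar from Fan and Li (2001), with the customary default a = 3.7, and all function names are hypothetical.

```python
import random


def hard_threshold(ybar: float, eta: float) -> float:
    """Keep the sample mean if it exceeds the threshold in absolute value."""
    return ybar if abs(ybar) > eta else 0.0


def soft_threshold(ybar: float, eta: float) -> float:
    """Shrink the sample mean toward zero by eta (LASSO in this simple model)."""
    shrunk = max(abs(ybar) - eta, 0.0)
    return shrunk if ybar >= 0 else -shrunk


def scad(ybar: float, eta: float, a: float = 3.7) -> float:
    """SCAD rule: soft-thresholding near zero, no shrinkage for large values."""
    size = abs(ybar)
    if size <= 2 * eta:
        return soft_threshold(ybar, eta)
    if size <= a * eta:
        sign = 1.0 if ybar > 0 else -1.0
        return ((a - 1) * ybar - sign * a * eta) / (a - 2)
    return ybar


def simulate(estimator, theta, n, eta, reps=20_000, seed=0):
    """Draw the estimator repeatedly, sampling ybar ~ N(theta, 1/n) directly."""
    rng = random.Random(seed)
    sd = n ** -0.5
    return [estimator(rng.gauss(theta, sd), eta) for _ in range(reps)]


# Illustrative setting: the true parameter sits exactly at the threshold.
n, eta, theta = 100, 0.2, 0.2
draws = simulate(hard_threshold, theta, n, eta)
share_zero = sum(d == 0.0 for d in draws) / len(draws)

if __name__ == "__main__":
    print(f"share of hard-thresholding draws equal to zero: {share_zero:.3f}")
```

In this illustrative setting roughly half of the hard-thresholding draws are exactly zero and the remainder lie outside the interval [-eta, eta], so the simulated distribution mixes a point mass with a truncated continuous part and cannot be close to a normal distribution.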
We also show that the finite-sample distribution of these estimators cannot be estimated in any reasonable sense, complementing results of this sort in the literature (Leeb and Pötscher (2006a,b, 2008b), Pötscher (2006)). In a subsequent paper, Pötscher and Schneider (2009), analogous results are obtained for the adaptive LASSO estimator.
We note that penalized maximum likelihood estimators are intimately related to more classical post-model-selection estimators. The distributional properties of the latter estimators have been studied by Sen (1979), Pötscher (1991), and Leeb and Pötscher (2003, 2005, 2006a,b, 2008b).
The paper is organized as follows: The model and the estimators are introduced in Section 2, and the model selection probabilities are discussed in Section 3. Consistency, uniform consistency, and uniform convergence ra