Beyond the Oracle Property: Adaptive LASSO in Cointegrating Regressions with Local-to-Unity Regressors


Authors: Karsten Reichold, Ulrike Schneider

Karsten Reichold* and Ulrike Schneider

Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria

March 13, 2026

Abstract

This paper derives new asymptotic results for the adaptive LASSO estimator in cointegrating regressions, allowing for uncertainty about whether the regressors are exact unit root processes. We study model selection probabilities, estimator consistency, and limiting distributions under standard and moving-parameter asymptotics. We further derive uniform convergence rates and the fastest local-to-zero rates detectable by the estimator under conservative and consistent tuning. For consistent tuning, we construct confidence regions that are easy to implement, uniformly valid over the parameter space, and achieve sure asymptotic coverage without requiring knowledge or estimation of local-to-unity or long-run covariance parameters. Simulation results reveal that the finite-sample distribution of the adaptive LASSO estimator can deviate substantially from the oracle property, whereas moving-parameter asymptotics provide much more accurate approximations. Consequently, in addition to being infeasible in applications due to their dependence on non-estimable nuisance parameters, oracle-based confidence regions are often too small to achieve adequate coverage in empirically relevant scenarios with small but non-zero coefficients. In contrast, the proposed confidence regions are always feasible and deliver reliable coverage across the parameter space. An empirical application to predicting the U.S. unemployment rate illustrates their practical usefulness for quantifying uncertainty around adaptive LASSO estimates.
Keywords: Adaptive LASSO, Confidence regions, Local-to-unity regressors, Moving-parameter asymptotics, Shrinkage estimation, Variable selection

JEL classification: C22, C51, C52, C61

*Correspondence to: Karsten Reichold, Institute of Statistics and Mathematical Methods in Economics, TU Wien, Wiedner Hauptstr. 8–10, A–1040 Vienna, Austria. E-mail addresses: karsten.reichold@tuwien.ac.at, ulrike.schneider@tuwien.ac.at

1 Introduction

In recent years, the availability of large datasets comprising numerous economic and financial variables has become the rule rather than the exception. Consequently, practitioners using traditional methods are frequently confronted with the challenge of selecting a small number of relevant variables from an extensive pool of potential covariates. In this context, statistical methods that simultaneously perform estimation and variable selection, such as variants of the least absolute shrinkage and selection operator (LASSO) introduced in Tibshirani (1996), are becoming increasingly popular in econometrics.

While contributions such as Wang et al. (2007), Ren and Zhang (2010), Medeiros and Mendes (2017), Adamek et al. (2023), and Chen et al. (2025) examine the use of LASSO-type estimators in models with (locally) stationary time series, a growing body of recent research considers models with highly persistent and endogenous regressors exhibiting unit root or local-to-unit root behavior. For example, Liao and Phillips (2015) propose adaptive shrinkage methods to estimate vector error correction models. Koo et al. (2020) and Mei and Shi (2024) consider so-called predictive regressions with high-dimensional stationary and unit root regressors and derive certain asymptotic properties of LASSO-type procedures in this context.
Smeekes and Wijler (2021) consider asymptotic properties of a LASSO-type estimator in a high-dimensional error correction model. Schweikert (2022) proposes an adaptive group LASSO method to estimate structural breaks in cointegrating regressions, and Tu and Xie (2023) follow a similar agenda in predictive regressions with a fixed number of highly persistent regressors. Lee et al. (2022) derive asymptotic properties of LASSO-type estimators in regressions with a fixed number of stationary and (potentially cointegrated) local-to-unity regressors. Gonzalo and Pitarakis (2025) propose a test for cointegration in a regression model with a fixed number of regressors based on the residuals of the adaptive LASSO estimator.

Typically, these papers consider regression models where the regressors can be split into a set of relevant regressors, i.e., those with a non-zero regression coefficient, and a set of irrelevant regressors, i.e., those with a regression coefficient being exactly equal to zero. These articles then, among other things, often derive model selection probabilities and sometimes also the limiting distribution of the LASSO-type estimator for the set of non-zero coefficients. If the procedure identifies zero coefficients with probability approaching one and if the limiting distribution for the non-zero coefficients coincides with that of the ordinary least squares (OLS) estimator applied to the true model, the estimator is said to possess the "oracle property" (Fan and Li, 2001). While the oracle property certainly appears convenient, it has to be interpreted with extreme caution. The oracle property primarily characterizes the asymptotic behavior of the penalized estimation method under the assumption that coefficients are either exactly zero or sufficiently large (in absolute value, relative to the sample size).
It thus offers very limited guidance in empirically relevant situations where some coefficients are small (in absolute value, relative to sample size), but not exactly zero. To study the asymptotic properties when coefficients are allowed to be small but unequal to zero, one has to let the true coefficients depend on the sample size.

The contributions of Lee et al. (2022) and Tu and Xie (2023) take a step forward by providing such moving-parameter asymptotic properties of the LASSO-type procedures under consideration. Their results, however, are restricted to specific sequences, as they focus on particular rates at which the true coefficients go to zero, and also couple these rates to the choice of the tuning parameter. We discuss the implications of these restrictions at several points in this paper. As a first illustration that these restrictions are not innocuous, we refer to Kock (2016), who analyzes the asymptotic properties of the adaptive LASSO estimator in stationary and non-stationary autoregressions. For example, in the AR(1) case

∆y_t = ρ_T y_{t−1} + ε_t,

with i.i.d. errors ε_t, the estimator's behavior critically depends on both the value of ρ_T and the choice of the tuning parameter. In particular, when ρ_T approaches zero at rate 1/T, the procedure fails to detect the coefficient as non-zero if the tuning parameter diverges, whereas it succeeds with positive probability if the tuning parameter remains bounded. Moreover, if the tuning parameter converges to zero as well, the procedure detects the coefficient as non-zero with probability approaching one. What remains unresolved, however, is the cut-off rate for the local-to-zero coefficient that can still be detected by the estimator when the tuning parameter diverges. These so-called local-to-zero rates are motivated not only by theoretical considerations but also by their relevance in empirical applications.
For example, they provide a natural framework for modeling weak signal-to-noise ratios, which play an important role in, e.g., the analysis of stock return predictability (see, e.g., Campbell and Yogo, 2006; Campbell, 2008; Phillips, 2015; Demetrescu et al., 2022). In this context, a detailed analysis of the properties of LASSO-type estimators under general sequences at which the true coefficients are allowed to go to zero reveals the fastest local-to-zero rates that can still be detected by the estimator.

In this paper, we provide a comprehensive analysis of the asymptotic properties of the adaptive LASSO estimator (Zou, 2006) applied to cointegrating regressions with potentially local-to-unity regressors. Allowing for deviations from exact unit roots is important, as macroeconomic variables often display high persistence without being unit-root processes; see, e.g., Jensen (2009) for evidence on inflation, and Hwang and Valdés (2024) for further discussion. In particular, we derive model selection probabilities, estimator consistency, and limiting distributions, while allowing the true coefficients to move freely through the parameter space along arbitrary sequences. In addition, we derive uniform convergence rates and the local-to-zero rates of the true coefficients that can still be detected by the estimator. These findings shed light on, e.g., the signal-to-noise ratios practitioners can accept when employing the adaptive LASSO estimator in empirical applications. We complete our theoretical analysis by providing uniformly valid confidence regions in the typical regime where the tuning parameter diverges.
Before we discuss our results in more detail, note that in the context of classical linear regression models with non-stochastic regressors, Pötscher and Schneider (2009) provide a comprehensive analysis of the adaptive LASSO estimator, examining both its asymptotic behavior and finite-sample properties. In particular, they derive uniform convergence rates and highlight how the choice of tuning parameters influences the performance of the estimation procedure. In contrast to the analysis of Pötscher and Schneider (2009), we have to overcome several difficulties to derive the results of this paper. First, we are dealing with different convergence rates of the estimators, and second, we have to account for the stochastic and non-stationary nature of the regressors, which leads to the occurrence of stochastic integrals in the limit. Moreover, the second-order bias terms in the limiting distribution of the OLS estimator also affect the limiting distribution of the adaptive LASSO estimator. Finally, we also provide results for the multivariate case, which is not addressed in the aforementioned article.

Based on the asymptotic study of model selection probabilities, we distinguish between two regimes determined by the large-sample behavior of the tuning parameter: consistent model selection (or "consistent tuning"), where zero coefficients are found with asymptotic probability equal to one, and conservative model selection ("conservative tuning"), where zero coefficients are detected with asymptotic probability less than one. The asymptotic properties of the adaptive LASSO estimator differ substantially between these two cases, with the main messages as follows: In the conservatively tuned case, the estimator is uniformly T-consistent for parameter estimation, and the cut-off rate for local-to-zero coefficients that can be detected by the procedure is 1/T.
In the consistently tuned case, the uniform convergence rate depends on the tuning parameter and is slower than 1/T. Deviations of the true parameter from zero of rate 1/T cannot be discovered by the estimator. The fastest local-to-zero rate that is still detectable with positive probability again depends on the tuning parameter and is slower than 1/T. Moreover, in the consistently tuned case, the detailed theoretical analysis of the adaptive LASSO estimator allows us to construct confidence regions that have coverage probability approaching one uniformly over the parameter space, without requiring any knowledge or estimation of local-to-unity or long-run covariance parameters. Although inspired by the non-stochastic regressor case considered in Amann and Schneider (2023), extending the construction of such regions to the stochastic regressors present in the unit-root or local-to-unity setting substantially changes the nature of the problem.

The theoretical analysis is complemented by an extensive simulation study. The results show that the finite-sample distribution of the adaptive LASSO estimator often deviates substantially from what is suggested by the oracle property, whereas the limiting distributions derived under moving-parameter asymptotics capture the finite-sample properties of the procedure more closely. Moreover, the poor approximation quality of the oracle property to the finite-sample distribution of the adaptive LASSO estimator is also reflected in the performance of the confidence regions based on the oracle property. In particular, we find that the oracle-based confidence regions are often too small to achieve adequate coverage in empirically relevant scenarios with small but non-zero coefficients, whereas the confidence regions proposed in this paper achieve adequate coverage across the entire parameter space.
Taken together, the theoretical and simulation results indicate that the oracle property provides an incomplete characterization of both the asymptotic and finite-sample properties of the adaptive LASSO estimator. A full understanding instead requires a moving-parameter framework of the type considered in this paper. Moving beyond the oracle property in this way enables the construction of uniformly valid confidence regions under consistent tuning. Finally, an empirical application to predicting the U.S. unemployment rate illustrates the usefulness of the proposed confidence regions for quantifying uncertainty around adaptive LASSO estimates.

The paper is organized as follows. Section 2 introduces the model and states the assumptions. Section 3 contains our theoretical contributions: Section 3.1 derives the large-sample properties of the adaptive LASSO estimator in a fixed-parameter asymptotic framework in the univariate regressor case, Section 3.2 considers a moving-parameter asymptotic framework in the univariate regressor case, and Section 3.3 extends the results to the multivariate case. Section 3.4 then derives the uniform confidence region based on the adaptive LASSO estimator. Section 4 presents the simulation results and Section 5 contains the empirical application. Section 6 summarizes and concludes. All proofs are provided in the appendix, which also contains additional simulation and empirical results.

We use the following notation: ⌊x⌋ denotes the integer part of x ∈ R, L is the backward-shift operator, diag(·) denotes a diagonal matrix with elements specified throughout, and R̄ := R ∪ {−∞, ∞}. With ⇒ and →_p we denote weak convergence and convergence in probability, respectively, and all limits apply as the sample size T tends to infinity. We denote a normal distribution with mean µ and covariance matrix Σ as N(µ, Σ). The symbol I_k denotes the k-dimensional identity matrix.
For any event E, the indicator function 1{E} equals one if E occurs and zero otherwise. If a sequence a_T is identical to a ∈ R for all T, we write a_T ≡ a. By ω we denote an element of the sample space of the underlying probability space, and (ω) attached to a random variable denotes its realization for this particular ω.

2 Setting and Assumptions

As motivated in the introduction, we consider a cointegrating regression model with local-to-unity regressors of the form

y_t = x_t'β_T + u_t,                              (1)
x_t = (I_k − T^{−1}c) x_{t−1} + v_t,              (2)

for t = 1, ..., T, where c := diag(c_1, ..., c_k) with c_j ≥ 0, and x_0 = O_P(1). For c = 0, the model encompasses classical cointegrating regressions with unit root regressors. Following Lee et al. (2022), we treat the number of regressors k as fixed. For notational brevity, we exclude deterministic components from (1). For {w_t}_{t∈Z} := {[u_t, v_t']'}_{t∈Z} we impose the following assumption.

Assumption 1. Let w_t = Ψ(L)ε_t = Σ_{j=0}^∞ Ψ_j ε_{t−j}, with Σ_{j=0}^∞ j‖Ψ_j‖ < ∞ and det(Ψ(1)) ≠ 0, where {ε_t}_{t∈Z} is a (1+k)-dimensional strictly stationary ergodic martingale difference sequence with natural filtration F_t := σ({ε_s}_{s≤t}), conditional covariance matrix Σ := E(ε_t ε_t' | F_{t−1}) > 0, and sup_{t≥1} E(‖ε_t‖^r | F_{t−1}) < ∞ a.s. for some r > 4.

Conditions similar to Assumption 1 are common in the cointegrating regression literature; see, e.g., Wagner and Hong (2016) for a detailed discussion. In particular, Assumption 1 allows for regressor endogeneity and error serial correlation, but excludes cointegration among the elements of x_t.¹
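For concreteness, the data-generating process (1)–(2) can be simulated as in the following sketch. This is our own illustration, not code from the paper: i.i.d. Gaussian innovations are used as a simple special case of Assumption 1, and the correlation parameter `rho_uv` (a hypothetical choice) induces regressor endogeneity.

```python
import numpy as np

def simulate_dgp(T, beta, c, rho_uv=0.5, seed=0):
    """Simulate y_t = x_t'beta + u_t with local-to-unity regressors
    x_t = (I_k - c/T) x_{t-1} + v_t, cf. (1)-(2).  Innovations are i.i.d.
    Gaussian (a special case of Assumption 1); corr(u_t, v_{t,j}) = rho_uv
    makes the regressors endogenous.  x_0 = 0 plays the role of x_0 = O_P(1)."""
    rng = np.random.default_rng(seed)
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = beta.size
    # joint innovation covariance of [u_t, v_t']'
    Sigma = np.eye(1 + k)
    Sigma[0, 1:] = Sigma[1:, 0] = rho_uv
    eps = rng.multivariate_normal(np.zeros(1 + k), Sigma, size=T)
    u, v = eps[:, 0], eps[:, 1:]
    A = np.eye(k) - np.diag(c) / T          # autoregressive matrix I_k - c/T
    x = np.zeros((T, k))
    for t in range(1, T):
        x[t] = A @ x[t - 1] + v[t]
    y = x @ beta + u
    return y, x
```

With c_j = 0 the j-th regressor is an exact unit root process; c_j > 0 yields local-to-unity dynamics.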
Under Assumption 1, the process {w_t}_{t∈Z} fulfills a functional central limit theorem of the form

T^{−1/2} Σ_{t=1}^{⌊rT⌋} w_t ⇒ B(r) = [B_u(r), B_v(r)']' = Ω^{1/2} W(r),   0 ≤ r ≤ 1,   (3)

where W(r) = [W_{u·v}(r), W_v(r)']' is a (1+k)-dimensional vector of independent standard Brownian motions and Ω := Σ_{h=−∞}^∞ E(w_0 w_h') > 0 denotes the long-run covariance matrix of {w_t}_{t∈Z}. Note that the results in this paper also hold under alternative sets of assumptions as long as they imply the functional central limit theorem for {w_t}_{t∈Z} given in (3); see, e.g., Ibragimov and Phillips (2008) and de Jong (2003) for other possible conditions.

The target of our investigation, the adaptive LASSO estimator (Zou, 2006) of β_T in (1), is defined as

β̂_AL := argmin_{b∈R^k} { Σ_{t=1}^T (y_t − x_t'b)² + λ_T Σ_{j=1}^k |β̂_{0,j}|^{−γ} |b_j| },   (4)

where λ_T > 0 and γ ≥ 1 are tuning parameters. In contrast to the classical LASSO estimator of Tibshirani (1996), the penalty term for the j-th coefficient in (4) contains the reciprocal of the absolute value of a preliminary estimator β̂_{0,j} of β_{T,j}, where β_{T,j} denotes the j-th element of β_T. Its aim is to increase the penalty term if β_{T,j} seems small, to encourage shrinking, and to penalize less if β_{T,j} appears to be large, in order to reduce the bias. In practice, γ is often chosen as 1 or 2, and λ_T is typically selected based on cross-validation or information criteria. In line with the recommendations in Lee et al. (2022), we set γ = 1 and β̂_0 = β̂, where β̂ denotes the OLS estimator of β_T in (1).

¹ As shown by Lee et al. (2022), allowing for cointegration among the regressors requires a different estimation strategy, termed the twin adaptive LASSO. Extending our analysis to this estimator in the more general setting is left for future research.
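Since for k > 1 the minimization problem in (4) has no closed-form solution, it has to be solved numerically. The following sketch (our own illustration, not the paper's implementation) does so by cyclic coordinate descent with soft-thresholding, which handles the separable ℓ₁-type penalty, using γ = 1 and the OLS estimator as the preliminary estimator as adopted above.

```python
import numpy as np

def adaptive_lasso(y, X, lam, gamma=1.0, n_iter=200):
    """Minimize sum_t (y_t - x_t'b)^2 + lam * sum_j w_j |b_j| with adaptive
    weights w_j = |beta_ols_j|^(-gamma), cf. (4), by cyclic coordinate descent.
    Each coordinate update is a soft-thresholding step."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # preliminary estimator
    w = np.abs(beta_ols) ** (-gamma)                  # adaptive penalty weights
    b = beta_ols.copy()
    col_ss = (X ** 2).sum(axis=0)                     # sum_t x_{t,j}^2
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]          # partial residual
            z = X[:, j] @ r_j
            # soft-threshold at lam * w_j / 2 (the 1/2 comes from the
            # unscaled squared loss in (4))
            b[j] = np.sign(z) * max(abs(z) - 0.5 * lam * w[j], 0.0) / col_ss[j]
    return b, beta_ols
```

In the univariate case this reduces to a single soft-thresholding step applied to the OLS estimator.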
Under Assumption 1, the limiting distribution of β̂ is given by

T(β̂ − β_T) ⇒ Z_c := ( ∫_0^1 J_v^c(r) J_v^c(r)' dr )^{−1} ( ∫_0^1 J_v^c(r) dB_u(r) + Δ_{vu} ),   (5)

where J_v^c(r) := ∫_0^r e^{(r−s)c} dB_v(s) and Δ_{vu} := Σ_{h=0}^∞ E(v_0 u_h); see Phillips (1988). To simplify notation, we define ζ_{vv}^c := ∫_0^1 J_v^c(r) J_v^c(r)' dr.

If at least one regressor is endogenous, the limiting distribution of the OLS estimator is contaminated by second-order bias terms. In contrast to classical cointegrating regressions with unit root regressors, where such bias terms can be addressed, for example, using the fully modified approach of Phillips and Hansen (1990), in the local-to-unity regressor case these bias terms additionally depend on the unknown local-to-unity parameters c_j. Since these parameters are not consistently estimable, the resulting bias is difficult to correct; see, e.g., Phillips (2023) and the references therein for a detailed discussion. Consequently, constructing asymptotically valid confidence intervals or hypothesis tests in cointegrating regressions with local-to-unity regressors is non-trivial; see, e.g., Magdalinos and Phillips (2009) for an instrumental variables approach and Hwang and Valdés (2024) for a modified low-frequency transformed and augmented OLS method. In Section 3.4, we show that asymptotically valid uniform confidence regions can nevertheless be constructed from the consistently tuned adaptive LASSO estimator; these regions are straightforward to implement and do not require any knowledge of local-to-unity parameters.

Remark 1. Our results also extend to the predictive regression setting (see, e.g., Koo et al., 2020; Mei and Shi, 2024), where the regressor at time t is x_{t−1} rather than x_t.
In this setting, the limiting distribution of the OLS estimator coincides with Z_c, except that Δ_{vu} needs to be replaced by Σ_{h=1}^∞ E(v_0 u_h).

Before deriving the asymptotic properties of the adaptive LASSO estimator in the multivariate case, where no closed-form solution of the minimization problem in (4) is available, we consider the univariate case, where the minimization problem has an explicit solution of the form

β̂_AL = β̂ − λ̃_T β̂^{−1}   if |β̂| > (λ̃_T)^{1/2},   and   β̂_AL = 0   otherwise,   (6)

with λ̃_T := 0.5 λ_T (Σ_{t=1}^T x_t²)^{−1} and (Σ_{t=1}^T x_t²)^{−1} = O_P(T^{−2}); compare Pötscher and Schneider (2009). Analyzing the univariate case in detail facilitates a transparent derivation of the estimator's asymptotic properties and provides insights into the underlying mechanisms, helping to clarify the multivariate results. Equation (6) reveals that in the univariate case the adaptive LASSO estimator can be represented solely in terms of the OLS estimator, and that the tuning parameter λ_T affects the estimator only through its "standardized" version λ̃_T, where the term Σ_{t=1}^T x_t² can be viewed as a measure of variation in the regressor. This explains why, for the asymptotic study in the subsequent section, the large-sample behavior of λ_T relative to T² is important, T² being the rate at which Σ_{t=1}^T x_t² grows, with T^{−2} Σ_{t=1}^T x_t² stabilizing asymptotically. Figure 1 illustrates the relationship between β̂_AL and β̂ for different values of λ̃_T.

[Figure 1: Relationship between β̂_AL and β̂ for different values of λ̃_T.]

3 Asymptotic Theory

In this section, we investigate the large-sample behavior of the adaptive LASSO estimator under two different asymptotic regimes regarding the model selection properties of the procedure.
We speak of consistent model selection (or "consistent tuning") if all zero coefficients are detected with asymptotic probability equal to one, whereas the case where at least one zero coefficient is set to zero by the estimator with limiting probability strictly less than one is referred to as conservative model selection ("conservative tuning"). Formally, this definition only relates to zero coefficients and poses no requirement on the non-zero coefficients in the model. Which regime applies depends on the limiting behavior of the tuning parameter sequence λ_T, as will be clarified below.

We first consider the univariate regressor case. In Section 3.1, we set β_T ≡ β and study how the behavior of the tuning parameter sequence λ_T affects both model selection and parameter estimation. In addition, we derive the asymptotic distribution of the estimator when β_T ≡ β is fixed under both model selection regimes. The insights from Section 3.1 into which limiting behavior of λ_T leads to what type of model selection regime then serve as a starting point for the detailed analysis of the large-sample behavior of the adaptive LASSO estimator in Section 3.2. In that section, we adopt a moving-parameter framework in which the true parameter β_T may vary with the sample size T. This allows us to determine local-to-zero and uniform convergence rates, and to derive asymptotic distributions that, as will become apparent in the simulation study in Section 4.1, more accurately capture the estimator's finite-sample properties. The analysis is again conducted under both conservative and consistent model selection.

After utilizing the explicit expression of the adaptive LASSO estimator in the univariate case, we turn to the multivariate case in Section 3.3. In this case, the absence of a closed-form solution necessitates different techniques for deriving the asymptotic properties.
Unlike in the univariate case, we do not separately present results under fixed-parameter asymptotics, since these are encompassed by the moving-parameter framework and do not provide additional insights beyond those already established in the univariate analysis. As before, we study the asymptotic behavior of the estimator under both conservative and consistent model selection. Finally, in Section 3.4, we use the insights from Section 3.3 to construct asymptotically valid uniform confidence regions for the consistently tuned adaptive LASSO estimator.

3.1 Fixed-Parameter Asymptotics in the Univariate Case

As outlined above, we begin by deriving asymptotic results for the adaptive LASSO estimator in the univariate regressor case under a fixed-parameter framework, i.e., by setting β_T ≡ β in (1) with β ∈ R fixed. We first examine the large-sample properties of the estimator with respect to model selection.

Proposition 1 (Model selection). Let {y_t}_{t∈Z} and {x_t}_{t∈Z} be generated by (1) and (2) with k = 1 and β_T ≡ β, and let {w_t}_{t∈Z} satisfy Assumption 1.

(a) Let β ≠ 0. If T^{−2} λ_T → 0, then P(β̂_AL = 0) → 0.

(b) Let β = 0.

(b1) If λ_T → λ_0, 0 ≤ λ_0 < ∞, then P(β̂_AL = 0) → P( (ζ_{vv}^c)^{1/2} |Z_c| ≤ (λ_0/2)^{1/2} ) < 1.

(b2) If λ_T → ∞, then P(β̂_AL = 0) → 1.

Proposition 1 reveals the role of the tuning parameter sequence λ_T for model selection of the adaptive LASSO: In case λ_T → λ_0 with 0 ≤ λ_0 < ∞, the estimator detects zero coefficients with probability smaller than one asymptotically, resulting in conservative model selection. In contrast, when λ_T → ∞, the estimator sets zero coefficients equal to zero with probability approaching one and consequently leads to consistent model selection. In the following, we therefore refer to the case λ_T → λ_0, 0 ≤ λ_0 < ∞, as conservative tuning, whereas the case λ_T → ∞ is termed consistent tuning.²
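The dichotomy in Proposition 1 can be visualized with a small Monte Carlo sketch. This is our own illustration with arbitrary design choices (c = 0, i.i.d. Gaussian errors, an exogenous regressor, particular values of λ_T), not the paper's simulation study. It exploits the univariate thresholding rule (6), under which β̂_AL = 0 exactly when β̂² Σ_t x_t² ≤ λ_T/2, so that with common random draws the zero event is monotone in λ_T replication by replication.

```python
import numpy as np

def zero_frequency(lam, T=200, reps=500, seed=42):
    """Monte Carlo frequency of {beta_AL = 0} when the true beta is 0.
    By the closed form (6), beta_AL = 0 iff beta_ols^2 * sum_t x_t^2 <= lam/2."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = np.cumsum(rng.standard_normal(T))   # exact unit root regressor (c = 0)
        y = rng.standard_normal(T)              # true beta = 0, exogenous errors
        beta_ols = (x @ y) / (x @ x)
        if beta_ols ** 2 * (x @ x) <= 0.5 * lam:
            hits += 1
    return hits / reps

freq_conservative = zero_frequency(lam=1.0)    # bounded tuning parameter
freq_consistent = zero_frequency(lam=200.0)    # diverging (order-T) tuning parameter
```

With a bounded tuning parameter the zero frequency settles strictly below one (conservative selection), whereas a diverging tuning parameter drives it towards one (consistent selection), in line with Proposition 1(b1) and (b2).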
Moreover, the condition T^{−2} λ_T → 0 is a basic requirement for the tuning parameter, as it ensures that the probability of the adaptive LASSO estimator incorrectly setting the coefficient to zero vanishes asymptotically. While this condition is automatically fulfilled under conservative tuning, it controls the rate at which λ_T may diverge under consistent tuning. We will assume this condition in all subsequent statements in this section.

We now derive the asymptotic properties of the adaptive LASSO estimator with respect to parameter estimation.

Proposition 2 (Parameter estimation). Let {y_t}_{t∈Z} and {x_t}_{t∈Z} be generated by (1) and (2) with k = 1 and β_T ≡ β, let {w_t}_{t∈Z} satisfy Assumption 1, and assume T^{−2} λ_T → 0.

(a) β̂_AL − β = o_P(1).

(b) If T^{−1} λ_T → λ̃_0, 0 ≤ λ̃_0 < ∞, then T(β̂_AL − β) = O_P(1).

(c) If T^{−1} λ_T → ∞, then λ_T^{−1} T² (β̂_AL − β) = O_P(1).

From Proposition 2(a), we learn that the basic condition T^{−2} λ_T → 0, which ensures that non-zero coefficients are not falsely set to zero as shown in Proposition 1(a), also guarantees that the procedure is consistent for β with respect to parameter estimation. Proposition 2(b) shows that in case of conservative tuning, or for consistent tuning with a slowly diverging tuning parameter sequence (in the sense that T^{−1} λ_T stays bounded), the convergence rate of the adaptive LASSO estimator is T^{−1} and coincides with the rate of OLS. However, when the estimator is tuned consistently and λ_T tends to infinity fast enough that T^{−1} λ_T diverges as well, Proposition 2(c) reveals that the convergence rate of the adaptive LASSO estimator is only T^{−2} λ_T, which is slower than T^{−1} in this case.

² Under conservative tuning with λ_0 = 0, the adaptive LASSO estimator is equivalent to OLS in the sense that T(β̂_AL − β̂) = o_P(1). This follows directly from the proof of Theorem 5(b) in Section 3.3.
The result also holds in a moving-parameter framework and extends to the multivariate case.

We now derive the limiting distribution of the adaptive LASSO estimator.

Proposition 3 (Limiting distribution). Let {y_t}_{t∈Z} and {x_t}_{t∈Z} be generated by (1) and (2) with k = 1 and β_T ≡ β, let {w_t}_{t∈Z} satisfy Assumption 1, and assume T^{−2} λ_T → 0.

(a) Let λ_T → λ_0, 0 ≤ λ_0 < ∞.

(a1) If β ≠ 0, then T(β̂_AL − β) ⇒ Z_c.

(a2) If β = 0, then T(β̂_AL − β) ⇒ 1{ (ζ_{vv}^c)^{1/2} |Z_c| > (λ_0/2)^{1/2} } ( Z_c − (λ_0/2) (ζ_{vv}^c)^{−1} (Z_c)^{−1} ).

(b) Let λ_T → ∞.

(b1) If T^{−1} λ_T → λ̃_0, 0 ≤ λ̃_0 < ∞, then T(β̂_AL − β) ⇒ 1{β ≠ 0} ( Z_c − (ζ_{vv}^c)^{−1} λ̃_0/(2β) ).

(b2) If T^{−1} λ_T → ∞, then λ_T^{−1} T² (β̂_AL − β) ⇒ −1{β ≠ 0} (ζ_{vv}^c)^{−1} / (2β).

Under conservative tuning, Proposition 3(a) reveals that the asymptotic distribution of the adaptive LASSO estimator coincides with that of OLS if β ≠ 0. In case β = 0, however, the limiting distribution consists of an atomic part at zero, incurred by the positive probability of the estimator being equal to zero, and an absolutely continuous part for when the estimator is not equal to zero. Proposition 3(b) further shows that under consistent tuning, the limiting distribution of the adaptive LASSO estimator fully collapses to point mass at zero whenever β = 0.³ In case β ≠ 0, both the convergence rate of the adaptive LASSO estimator and its limiting distribution depend on how λ_T diverges in relation to T. If λ_T tends to infinity slower than T, the limiting distribution of T(β̂_AL − β) coincides with that of OLS. If λ_T diverges at rate T, rate-T consistency is still maintained, as already shown in Proposition 2(b), but the asymptotic distribution now deviates from that of OLS by a random shift that is inversely proportional to, and has the opposite sign of, β.
This random shift in the limiting distribution of the adaptive LASSO estimator is not detected in Lee et al. (2022), as their assumptions imply that λ̃_0 = 0. Moreover, in contrast to the second-order bias term in the limiting distribution of the OLS estimator, the random shift in the limiting distribution of the adaptive LASSO estimator does not vanish if the regressor is exogenous. Lastly, if λ_T diverges faster than T, the convergence rate becomes slower than T, as shown in Proposition 2(c). In this case, for the appropriately scaled estimator λ_T^{−1} T² (β̂_AL − β), the term "corresponding to OLS" no longer appears in the limit. Finally, Proposition 3 confirms that the rates established in Proposition 2 are indeed sharp.

³ For completeness, note that the proofs of (b1) and (b2) in Proposition 3 reveal that δ_T (β̂_AL − β) ⇒ 0 for any sequence δ_T → ∞ if β = 0 and λ_T → ∞.

The results in this section can be used to explicitly formulate a condition for the so-called "oracle property" of the adaptive LASSO estimator, a term coined by Fan and Li (2001) and established for the adaptive LASSO estimator in a classical linear regression model in Zou (2006).

Corollary 1 ("Oracle property"). Let {y_t}_{t∈Z} and {x_t}_{t∈Z} be generated by (1) and (2) with k = 1 and β_T ≡ β, let {w_t}_{t∈Z} satisfy Assumption 1, and assume T^{−1} λ_T + λ_T^{−1} → 0. Then

P(β̂_AL = 0) → 1{β = 0}   and   T(β̂_AL − β) ⇒ 1{β ≠ 0} Z_c.

Corollary 1 states that, under consistent tuning with λ_T diverging more slowly than T, the adaptive LASSO estimator identifies non-zero coefficients and sets null coefficients to zero with probability approaching one. Moreover, its limiting distribution coincides with that of the OLS estimator whenever β ≠ 0.
Results similar to Corollary 1 often constitute the main asymptotic findings in the literature on LASSO-type estimators across various contexts; see, e.g., Medeiros and Mendes (2017, Theorem 1), Smeekes and Wijler (2021, Corollary 1), Schweikert (2022, Theorem 4), Lee et al. (2022, Theorems 1 and 3), Tu and Xie (2023, Theorem 3.2), and Chen et al. (2025, Theorem 4.2). However, although such results may seem convenient, they have to be interpreted with extreme caution. While the "oracle property" represents the large-sample performance of the estimator in situations where regression coefficients are either equal to zero or "relatively large" (in absolute value and in relation to sample size), it does not shed light on the empirically relevant case where some coefficients are "relatively small" rather than exactly equal to zero. In Section 3.2, we analyze the large-sample properties of the adaptive LASSO estimator within an asymptotic framework that also accommodates this case.

3.2 Moving-Parameter Asymptotics in the Univariate Case

In this section, we study the asymptotic behavior of the adaptive LASSO estimator in the univariate regressor case within a moving-parameter framework, where the unknown coefficient $\beta_T$ may vary with $T$. This framework overcomes the limitations of the fixed-parameter setting, in which the true coefficient is restricted to being either exactly zero or asymptotically large relative to sample size. Such a dichotomy is unsatisfactory, since in finite samples the coefficient may be non-zero yet small, especially when the signal-to-noise ratio is low. The smaller the non-zero coefficient that an estimator can still reliably detect, the better its performance. A key advantage of the moving-parameter framework is that it reveals the local-to-zero rate at which the estimator can detect non-zero coefficients.
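This detection question can also be explored numerically. The following sketch simulates selection frequencies for a driftless random-walk regressor under coefficients that shrink with $T$; the DGP, the OLS-based weight, and the penalty normalization are illustrative assumptions, not the paper's exact design.

```python
import random

def simulate_selection_freq(beta_fn, T=200, lam_fn=lambda T: T, reps=300, seed=1):
    """Fraction of replications in which the adaptive LASSO estimate is non-zero.

    Illustrative DGP: x_t is a driftless random walk, y_t = beta_T * x_t + u_t
    with independent standard normal shocks.
    """
    rng = random.Random(seed)
    beta_T, lam = beta_fn(T), lam_fn(T)
    nonzero = 0
    for _ in range(reps):
        x, level = [], 0.0
        for _ in range(T):
            level += rng.gauss(0.0, 1.0)
            x.append(level)
        y = [beta_T * xt + rng.gauss(0.0, 1.0) for xt in x]
        sxx = sum(xt * xt for xt in x)
        b_ols = sum(xt * yt for xt, yt in zip(x, y)) / sxx
        # the soft-thresholded estimate is non-zero iff |b_ols|^2 > lam / (2 * sxx)
        if b_ols != 0.0 and abs(b_ols) > lam / (2.0 * sxx * abs(b_ols)):
            nonzero += 1
    return nonzero / reps

# with lambda_T = T (consistent tuning), a coefficient shrinking at rate 1/T is
# missed much more often than one bounded away from zero
f_fast = simulate_selection_freq(lambda T: 1.0 / T)
f_slow = simulate_selection_freq(lambda T: 0.5)
```

Comparing `f_fast` and `f_slow` illustrates the point made above: how small a non-zero coefficient can be while still being detected is precisely what the moving-parameter framework quantifies.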
In the following theorem, we derive the model selection probabilities of the adaptive LASSO under both conservative and consistent tuning.

Theorem 1 (Model selection). Let $\{y_t\}_{t\in\mathbb{Z}}$ and $\{x_t\}_{t\in\mathbb{Z}}$ be generated by (1) and (2) with $k = 1$, and let $\{w_t\}_{t\in\mathbb{Z}}$ satisfy Assumption 1.

(a) If $\lambda_T \to \lambda_0$, $0 \le \lambda_0 < \infty$, and $T\beta_T \to \beta_0 \in \overline{\mathbb{R}}$, then
\[
P\left( \hat\beta_{AL} = 0 \right) \to P\left( (\zeta^c_{vv})^{1/2} |\mathcal{Z}^c + \beta_0| \le \sqrt{\lambda_0/2} \right) < 1.
\]

(b) If $\lambda_T \to \infty$ and $\lambda_T^{-1/2} T \beta_T \to \tilde\beta_0 \in \overline{\mathbb{R}}$, then
\[
P\left( \hat\beta_{AL} = 0 \right) \to P\left( (\zeta^c_{vv})^{1/2} \le \frac{1}{\sqrt{2}} |\tilde\beta_0|^{-1} \right).
\]

Remark 2. Theorem 1 describes the asymptotic behavior of the model selection probabilities for arbitrary sequences of $\beta_T$ in the sense that all accumulation points of the selection probabilities can be obtained in the following way: apply the result to subsequences and observe that for every such subsequence, we can select a further subsequence such that the relevant quantities, i.e., $T\beta_T$ or $\lambda_T^{-1/2}T\beta_T$, converge to a limit in $\overline{\mathbb{R}}$. A similar comment also applies to Theorems 2–3 below.

Part (a) of Theorem 1 shows that under conservative tuning, if $\beta_T$ is bounded away from zero or converges to zero at a rate slower than $T^{-1}$, i.e., $|\beta_0| = \infty$, the estimator detects the coefficient as non-zero with asymptotic probability equal to one. If $\beta_T \equiv 0$ or $\beta_T$ converges to zero at rate $T^{-1}$ or faster, i.e., $\beta_0 \in \mathbb{R}$, the estimator sets the coefficient to zero with positive probability less than one, even asymptotically. To interpret the results in part (b) of Theorem 1 in a meaningful way, we suppose that the basic condition $T^{-2}\lambda_T \to 0$ from Section 3.1 holds, which we also assume to hold in all subsequent statements in this section.
Part (b) of Theorem 1 then reveals that if $\beta_T$ is bounded away from zero or converges to zero at a rate slower than $T^{-1}\lambda_T^{1/2}$, i.e., $|\tilde\beta_0| = \infty$, the estimator detects the coefficient as non-zero with asymptotic probability equal to one. If $\beta_T$ converges to zero exactly at rate $T^{-1}\lambda_T^{1/2}$, i.e., $\tilde\beta_0 \in \mathbb{R}$, $\tilde\beta_0 \neq 0$, the estimator sets the coefficient to zero with positive probability less than one asymptotically. Finally, if $\beta_T \equiv 0$ or $\beta_T$ converges to zero at a rate faster than $T^{-1}\lambda_T^{1/2}$, the estimator sets the coefficient to zero with asymptotic probability equal to one.

Remark 3. As discussed above, Theorem 1 shows that in the consistently tuned case, the relevant local-to-zero rate is $T^{-1}\lambda_T^{1/2}$, in the sense that coefficients that converge to zero more slowly than that will be detected as non-zero with asymptotic probability equal to one, and coefficients that converge to zero faster will be detected as non-zero with asymptotic probability equal to zero. In Lee et al. (2022), it appears that coefficients converging to zero at rate $T^{-\delta}$ for any $\delta \in (0,1)$ pose no difficulty for the consistently tuned adaptive LASSO, in the sense that they will be detected as non-zero with asymptotic probability equal to one. This, however, is made possible by an assumption that links the true coefficient to the tuning parameter (through the parameter $\delta$), thereby masking the dependence of the local-to-zero rate on the tuning parameter.

Next, we analyze estimation consistency of the adaptive LASSO estimator for the parameter $\beta_T$.

Theorem 2 (Parameter estimation). Let $\{y_t\}_{t\in\mathbb{Z}}$ and $\{x_t\}_{t\in\mathbb{Z}}$ be generated by (1) and (2) with $k = 1$, let $\{w_t\}_{t\in\mathbb{Z}}$ satisfy Assumption 1, and assume $T^{-2}\lambda_T \to 0$.

(a) It holds that $\hat\beta_{AL} - \beta_T = o_P(1)$.

(b) If $\lambda_T \to \lambda_0$, $0 \le \lambda_0 < \infty$, then $T(\hat\beta_{AL} - \beta_T) = O_P(1)$.
(c) If $\lambda_T \to \infty$, then $\lambda_T^{-1/2} T (\hat\beta_{AL} - \beta_T) = O_P(1)$.

Theorem 2(a) shows that if $T^{-2}\lambda_T \to 0$, the adaptive LASSO estimator is not only consistent (cf. Proposition 2), but also uniformly consistent. Parts (b) and (c) reveal that the uniform convergence rate depends on the tuning regime. Under conservative tuning, the estimator is rate-$T$ consistent, whereas under consistent tuning, it is only rate-$T\lambda_T^{-1/2}$ consistent.

We now derive the limiting distribution of the adaptive LASSO estimator under arbitrary sequences of $\beta_T$.

Theorem 3 (Limiting distribution). Let $\{y_t\}_{t\in\mathbb{Z}}$ and $\{x_t\}_{t\in\mathbb{Z}}$ be generated by (1) and (2) with $k = 1$, let $\{w_t\}_{t\in\mathbb{Z}}$ satisfy Assumption 1, and assume $T^{-2}\lambda_T \to 0$.

(a) If $\lambda_T \to \lambda_0$, $0 \le \lambda_0 < \infty$, and $T\beta_T \to \beta_0 \in \overline{\mathbb{R}}$, then
\[
T(\hat\beta_{AL} - \beta_T) \Rightarrow \mathbb{1}\left\{ (\zeta^c_{vv})^{1/2} |\mathcal{Z}^c + \beta_0| > \sqrt{\lambda_0/2} \right\} \left( \mathcal{Z}^c - \frac{\lambda_0}{2\zeta^c_{vv}} (\mathcal{Z}^c + \beta_0)^{-1} \right) - \mathbb{1}\left\{ (\zeta^c_{vv})^{1/2} |\mathcal{Z}^c + \beta_0| \le \sqrt{\lambda_0/2} \right\} \beta_0.
\]

(b) If $\lambda_T \to \infty$ and $\lambda_T^{-1/2} T \beta_T \to \tilde\beta_0 \in \overline{\mathbb{R}}$, then
\[
\lambda_T^{-1/2} T (\hat\beta_{AL} - \beta_T) \Rightarrow
\begin{cases}
-\mathbb{1}\left\{ (\zeta^c_{vv})^{1/2} > a_0 \right\} (2\tilde\beta_0 \zeta^c_{vv})^{-1} - \mathbb{1}\left\{ (\zeta^c_{vv})^{1/2} \le a_0 \right\} \tilde\beta_0 & \text{if } 0 < |\tilde\beta_0| < \infty, \\
0 & \text{otherwise,}
\end{cases}
\]
where $a_0 := 1/(\sqrt{2}\,|\tilde\beta_0|)$.

From Theorem 3(a), we learn that under conservative tuning, if $\beta_T$ is bounded away from zero or converges to zero at a rate slower than $T^{-1}$, i.e., $|\beta_0| = \infty$, the limiting distribution of $T(\hat\beta_{AL} - \beta_T)$ is $\mathcal{Z}^c$ and thus coincides with the limiting distribution of OLS. If $\beta_T \equiv 0$ or $\beta_T$ converges to zero at rate $T^{-1}$ or faster, i.e., $\beta_0 \in \mathbb{R}$, the limiting distribution consists of an atomic as well as an absolutely continuous part. Part (b) of the above theorem shows that under consistent tuning and using the correct scaling factor, the limit of $\lambda_T^{-1/2} T (\hat\beta_{AL} - \beta_T)$ collapses to zero if $\beta_T \equiv 0$ or $\beta_T$ converges to zero faster than $T^{-1}\lambda_T^{1/2}$, i.e., $\tilde\beta_0 = 0$, or if $\beta_T$ is bounded away from zero or converges to zero more slowly than $T^{-1}\lambda_T^{1/2}$, i.e., $|\tilde\beta_0| = \infty$. However, if $\beta_T$ converges to zero exactly at rate $T^{-1}\lambda_T^{1/2}$, i.e., $0 < |\tilde\beta_0| < \infty$, the limit is random and contains an atomic as well as an absolutely continuous part. Interestingly, all remaining randomness originates from the regressor $x_t$, but not from the errors $u_t$. The dependence on $u_t$ disappears because the influence of the rate-$T$ consistent OLS estimator vanishes asymptotically if $\hat\beta - \beta_T$ is scaled by $\lambda_T^{-1/2}T$ rather than $T$, whereas the dependence on $x_t$ appears in the limit through $\zeta^c_{vv}$.⁴

⁴ For more details, we refer to Lemma A.1(c) in Appendix A.

Remark 4. Theorem 3(b) shows that the uniform convergence rate for the adaptive LASSO estimator under consistent tuning is indeed $T^{-1}\lambda_T^{1/2}$ and that scaling the estimation error by the larger factor $T$ will result in a stochastically unbounded sequence if $\beta_T$ converges to zero at rate $T^{-1}\lambda_T^{1/2}$. For completeness, we also list the limiting distribution of $T(\hat\beta_{AL} - \beta_T)$ for arbitrary sequences of $\beta_T$: if $\lambda_T \to \infty$ such that $T^{-2}\lambda_T \to 0$, and $\lambda_T^{-1/2} T \beta_T \to \tilde\beta_0 \in \overline{\mathbb{R}}$, then
\[
T(\hat\beta_{AL} - \beta_T) \Rightarrow
\begin{cases}
-\beta_0 & \text{if } \tilde\beta_0 = 0, \\
-\operatorname{sign}(\tilde\beta_0)\,\infty & \text{if } 0 < |\tilde\beta_0| < \infty, \\
\mathcal{Z}^c - 0.5\,(\zeta^c_{vv}\bar\beta_0)^{-1} & \text{if } |\tilde\beta_0| = \infty,
\end{cases}
\]
where $T\beta_T \to \beta_0 \in \overline{\mathbb{R}}$ and $\lambda_T^{-1} T \beta_T \to \bar\beta_0 \in \overline{\mathbb{R}}$. Hence, $T(\hat\beta_{AL} - \beta_T)$ collapses to pointmass at $-\beta_0$ whenever $\beta_T \equiv 0$ or $\beta_T$ converges to zero at rate $T^{-1}$ or faster, i.e., $\beta_0 \in \mathbb{R}$ and $\tilde\beta_0 = 0$. The limiting distribution of $T(\hat\beta_{AL} - \beta_T)$ is random if $\beta_T$ is bounded away from zero or converges to zero at rate $T^{-1}\lambda_T$ or slower, i.e., $\bar\beta_0 \neq 0$ and $|\tilde\beta_0| = \infty$. In this case, the limiting distribution coincides with that of OLS if $\beta_T$ converges to zero more slowly than $T^{-1}\lambda_T$, i.e., $|\bar\beta_0| = |\tilde\beta_0| = \infty$.
However, if $|\bar\beta_0| < \infty$, the limiting distribution of the adaptive LASSO estimator deviates from that of OLS by a random shift that is inversely proportional to, and has the opposite sign of, $\bar\beta_0$, analogously to what we have seen under fixed-parameter asymptotics in Proposition 3(b1). Again, the random shift is not detected in Lee et al. (2022), and, in contrast to the second-order bias term in the limiting distribution of the OLS estimator, it does not vanish if the regressor is exogenous. In all other cases, the total mass of $T(\hat\beta_{AL} - \beta_T)$ escapes to $-\infty$ or $\infty$.

Remark 5. Remark 4 illustrates that in the consistently tuned case, while the adaptive LASSO estimator can detect local-to-zero rates of any order greater than $T^{-1}\lambda_T^{1/2}$, in order to obtain the same limiting distribution as OLS, the true coefficient must be of an even larger order of magnitude, i.e., greater than $T^{-1}\lambda_T$. In the setting of Lee et al. (2022) with $\beta_T = \beta T^{-\delta}$ for $\delta \in (0,1)$, $\beta \neq 0$, and $T^{-(1-\delta)}\lambda_T \to 0$, it automatically holds that $\lambda_T^{-1} T \beta_T \to \infty$.

3.3 The Multivariate Case

We now turn to the multivariate regressor case and investigate the asymptotic properties of the adaptive LASSO estimator within a moving-parameter framework. Since detailed results for the fixed-parameter framework have already been presented and discussed in the univariate case, our focus here remains on the more general setting, where the true coefficients are allowed to vary with sample size; see also the discussion at the beginning of Section 3. With respect to notation, note that the subscript $j$ continues to denote the $j$-th element of the vector to which it is attached, e.g., $\hat\beta_{AL,j}$ denotes the $j$-th component of $\hat\beta_{AL}$.
We start by deriving model selection probabilities for the adaptive LASSO estimator under conservative as well as consistent tuning for certain relevant sequences of $\beta_T$.

Theorem 4 (Model selection). Let $\{y_t\}_{t\in\mathbb{Z}}$ and $\{x_t\}_{t\in\mathbb{Z}}$ be generated by (1) and (2), and let $\{w_t\}_{t\in\mathbb{Z}}$ satisfy Assumption 1.

(a) If $\lambda_T \to \lambda_0$, $0 \le \lambda_0 < \infty$, and $T\beta_T \to \beta_0 \in \overline{\mathbb{R}}^k$, then $P( \hat\beta_{AL,j} = 0 ) \to 0$ if $|\beta_{0,j}| = \infty$.

(b) If $\lambda_T \to \infty$ and $\lambda_T^{-1/2} T \beta_T \to \tilde\beta_0 \in \overline{\mathbb{R}}^k$, then
\[
P\left( \hat\beta_{AL,j} = 0 \right) \to
\begin{cases}
1 & \text{if } \tilde\beta_{0,j} = 0, \\
0 & \text{if } |\tilde\beta_{0,j}| = \infty.
\end{cases}
\]

Before we discuss the results in Theorem 4 in detail, we extend the statement in Theorem 4(a) in the following remark to obtain a more comprehensive picture of the model selection properties in the conservatively tuned case.

Remark 6. We point out two additional special cases for the model selection properties in the conservatively tuned case. Let $\lambda_T \to \lambda_0$, $0 \le \lambda_0 < \infty$. Then:

(a) For $T\beta_T \to \beta_0 \in \overline{\mathbb{R}}^k$ with $\beta_{0,j} = 0$, we have $\liminf_{T\to\infty} P( \hat\beta_{AL,j} = 0 ) > 0$.

(b) For $\beta_T \equiv \beta \in \mathbb{R}^k$, $A := \{j : \beta_j \neq 0\}$, and $\hat A := \{j : \hat\beta_{AL,j} \neq 0\}$, we have $\limsup_{T\to\infty} P( \hat A = A ) < 1$.

In line with the results from the univariate case, Theorem 4(a) shows that under conservative tuning, the estimator detects coefficients with local-to-zero rates of order greater than $T^{-1}$ as non-zero with asymptotic probability equal to one. Importantly, in the multivariate case, this property depends solely on the rate of the coefficient under consideration and is unaffected by the behavior of the other components of $\beta_T$. Moreover, coefficients converging to zero at a rate faster than $T^{-1}$ are set to zero with positive asymptotic probability, as can be seen from Remark 6(a). The smallest detectable local-to-zero rate under conservative tuning therefore remains $T^{-1}$.
Remark 6(b) illustrates that this tuning regime is indeed conservative, also in the multivariate case. For the consistently tuned case, a meaningful interpretation again requires the basic condition $T^{-2}\lambda_T \to 0$, which we assume to hold throughout this section. As in the univariate case, Theorem 4(b) then reveals that the estimator detects coefficients with local-to-zero rates of order greater than $T^{-1}\lambda_T^{1/2}$ as non-zero with asymptotic probability equal to one, while coefficients converging to zero at a rate faster than $T^{-1}\lambda_T^{1/2}$ will always be set to zero with asymptotic probability equal to one. As under conservative tuning, these properties depend only on the rate of the coefficient under consideration and are not affected by the behavior of the remaining components of $\beta_T$. Hence, in the multivariate setting the smallest detectable local-to-zero rate under consistent tuning continues to be $T^{-1}\lambda_T^{1/2}$.

We turn to parameter estimation consistency in the following theorem.

Theorem 5 (Parameter estimation). Let $\{y_t\}_{t\in\mathbb{Z}}$ and $\{x_t\}_{t\in\mathbb{Z}}$ be generated by (1) and (2), let $\{w_t\}_{t\in\mathbb{Z}}$ satisfy Assumption 1, and assume $T^{-2}\lambda_T \to 0$.

(a) It holds that $\hat\beta_{AL} - \beta_T = o_P(1)$.

(b) If $\lambda_T \to \lambda_0$, $0 \le \lambda_0 < \infty$, then $T(\hat\beta_{AL} - \beta_T) = O_P(1)$.

(c) If $\lambda_T \to \infty$, then $\lambda_T^{-1/2} T (\hat\beta_{AL} - \beta_T) = O_P(1)$.

Theorem 5 shows that if $T^{-2}\lambda_T \to 0$, the adaptive LASSO estimator is uniformly consistent also in the multivariate case, and its uniform convergence rate depends on the tuning regime. Under conservative tuning, the estimator is rate-$T$ consistent, whereas under consistent tuning, the rate decreases to $T\lambda_T^{-1/2}$, just as in the univariate case.

We now derive the limiting distribution of the adaptive LASSO estimator under arbitrary sequences of $\beta_T$.

Theorem 6 (Limiting distribution). Let $\{y_t\}_{t\in\mathbb{Z}}$ and $\{x_t\}_{t\in\mathbb{Z}}$ be generated by (1) and (2), let $\{w_t\}_{t\in\mathbb{Z}}$ satisfy Assumption 1, and assume $T^{-2}\lambda_T \to 0$.

(a) If $\lambda_T \to \lambda_0$, $0 \le \lambda_0 < \infty$, and $T\beta_T \to \beta_0 \in \overline{\mathbb{R}}^k$, then $T(\hat\beta_{AL} - \beta_T) \Rightarrow \operatorname*{argmin}_{z \in \mathbb{R}^k} V^c_{\beta_0}(z)$, where
\[
V^c_{\beta_0}(z) := z'\zeta^c_{vv}z - 2z'\left( \int_0^1 J^c_v(r)\,dB_u(r) + \Delta_{vu} \right) + \lambda_0 \sum_{j=1}^k A_j(z_j, \beta_{0,j})
\]
and
\[
A_j(z_j, \beta_{0,j}) :=
\begin{cases}
0 & \text{if } |\beta_{0,j}| = \infty \text{ or } z_j = 0, \\
\dfrac{|z_j|}{|\mathcal{Z}^c_j|} & \text{if } \beta_{0,j} = 0 \text{ and } z_j \neq 0, \\
\dfrac{|\beta_{0,j} + z_j| - |\beta_{0,j}|}{|\beta_{0,j} + \mathcal{Z}^c_j|} & \text{otherwise.}
\end{cases}
\]

(b) If $\lambda_T \to \infty$ and $\lambda_T^{-1/2} T \beta_T \to \tilde\beta_0 \in \overline{\mathbb{R}}^k$, then $\lambda_T^{-1/2} T (\hat\beta_{AL} - \beta_T) \Rightarrow \operatorname*{argmin}_{z \in \mathbb{R}^k} \tilde V^c_{\tilde\beta_0}(z)$, where
\[
\tilde V^c_{\tilde\beta_0}(z) := z'\zeta^c_{vv}z + \sum_{j=1}^k \tilde A_j(z_j, \tilde\beta_{0,j})
\]
and
\[
\tilde A_j(z_j, \tilde\beta_{0,j}) :=
\begin{cases}
0 & \text{if } |\tilde\beta_{0,j}| = \infty \text{ or } z_j = 0, \\
\infty & \text{if } \tilde\beta_{0,j} = 0 \text{ and } z_j \neq 0, \\
\dfrac{|z_j + \tilde\beta_{0,j}|}{|\tilde\beta_{0,j}|} - 1 & \text{otherwise.}
\end{cases}
\]

(c) If $\lambda_T \to \infty$, $T\beta_T \to \beta_0 \in \overline{\mathbb{R}}^k$, and $\lambda_T^{-1} T \beta_T \to \bar\beta_0 \in \overline{\mathbb{R}}^k$, then $T(\hat\beta_{AL} - \beta_T) \Rightarrow \operatorname*{argmin}_{z \in \mathbb{R}^k} \bar V^c_{\bar\beta_0}(z)$, where
\[
\bar V^c_{\bar\beta_0}(z) := z'\zeta^c_{vv}z - 2z'\left( \int_0^1 J^c_v(r)\,dB_u(r) + \Delta_{vu} \right) + \sum_{j=1}^k \bar A_j(z_j, \beta_{0,j}, \bar\beta_{0,j})
\]
and
\[
\bar A_j(z_j, \beta_{0,j}, \bar\beta_{0,j}) :=
\begin{cases}
0 & \text{if } |\bar\beta_{0,j}| = \infty \text{ or } z_j = 0, \\
\infty & \text{if } \bar\beta_{0,j} = \beta_{0,j} = 0 \text{ and } z_j \neq 0, \\
\operatorname{sign}(z_j + 2\beta_{0,j})\operatorname{sign}(z_j)\,\infty & \text{if } \bar\beta_{0,j} = 0,\ 0 < |\beta_{0,j}| < \infty, \text{ and } z_j \neq 0, \\
\operatorname{sign}(z_j)\operatorname{sign}(\beta_{0,j})\,\infty & \text{if } \bar\beta_{0,j} = 0,\ |\beta_{0,j}| = \infty, \text{ and } z_j \neq 0, \\
\operatorname{sign}(\bar\beta_{0,j})\,\dfrac{z_j}{|\bar\beta_{0,j}|} & \text{otherwise.}
\end{cases}
\]

The limiting distributions presented in Theorem 6 are defined implicitly. While we cannot explicitly minimize $V^c_{\beta_0}(z)$, $\tilde V^c_{\tilde\beta_0}(z)$, and $\bar V^c_{\bar\beta_0}(z)$ for fixed $\beta_0$, $\tilde\beta_0$, and $\bar\beta_0$ in general, there are a number of special cases worth pointing out.
First, in line with Theorem 3(a), Theorem 6(a) shows that under conservative tuning with either $\lambda_0 = 0$ or $|\beta_{0,j}| = \infty$ for all $j = 1,\dots,k$, the limiting distribution of $T(\hat\beta_{AL} - \beta_T)$ is $\mathcal{Z}^c$ and thus coincides with the limiting distribution of OLS. Second, in line with Theorem 3(b), part (b) shows that under consistent tuning, the limit of $\lambda_T^{-1/2} T (\hat\beta_{AL} - \beta_T)$ collapses to zero whenever $\tilde\beta_{0,j} = 0$ or $|\tilde\beta_{0,j}| = \infty$ for all $j = 1,\dots,k$. Moreover, whenever the limit is stochastic, all randomness originates from the regressors $x_t$, but not from the errors $u_t$ (see also the discussion in the univariate case). Finally, part (c) reveals that under consistent tuning, the limiting distribution of $T(\hat\beta_{AL} - \beta_T)$ coincides with the limiting distribution of OLS if $|\bar\beta_{0,j}| = \infty$ for all $j = 1,\dots,k$. Conversely, the limit of $T(\hat\beta_{AL} - \beta_T)$ collapses to zero whenever $\bar\beta_{0,j} = 0$ for all $j = 1,\dots,k$. Both results are consistent with Remark 4.

Remark 7. A similar comment as in Remark 2 also applies to Theorems 4–6.

The following proposition offers additional insights into the limiting distribution of $\lambda_T^{-1/2} T (\hat\beta_{AL} - \beta_T)$ under consistent tuning, as derived in Theorem 6(b).

Proposition 4. For a fixed $\omega$ in the sample space of the underlying probability space, the point $m = m(\omega) \in \mathbb{R}^k$ is a minimizer of $\tilde V^c_{\tilde\beta_0}(z)(\omega)$ if and only if
\[
\begin{cases}
m_j = 0 & \text{if } \tilde\beta_{0,j} = 0, \\
(\zeta^c_{vv}(\omega)m)_j = 0 & \text{if } |\tilde\beta_{0,j}| = \infty, \\
(\zeta^c_{vv}(\omega)m)_j = -\dfrac{\operatorname{sign}(m_j + \tilde\beta_{0,j})}{2|\tilde\beta_{0,j}|} & \text{if } 0 < |\tilde\beta_{0,j}| < \infty \text{ and } m_j \neq -\tilde\beta_{0,j}, \\
|(\zeta^c_{vv}(\omega)m)_j| \le \dfrac{1}{2|\tilde\beta_{0,j}|} & \text{if } 0 < |\tilde\beta_{0,j}| < \infty \text{ and } m_j = -\tilde\beta_{0,j},
\end{cases}
\]
where $\zeta^c_{vv}$ is the same random matrix as in the definition of $\tilde V^c_{\tilde\beta_0}(z)$ in Theorem 6(b).
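In the univariate case, the characterization in Proposition 4 can be cross-checked against the explicit limit in Theorem 3(b). The sketch below minimizes the univariate version of $\tilde V^c_{\tilde\beta_0}$ by grid search for a fixed, hypothetical realization of $\zeta^c_{vv}$ and compares the result with the closed-form minimizer; the numerical values are illustrative assumptions only.

```python
import math

def V_tilde(z, zeta, b0):
    """Univariate version of the limiting objective in Theorem 6(b),
    for a fixed realization zeta of zeta_vv^c and 0 < |b0| < infinity."""
    return zeta * z * z + abs(z + b0) / abs(b0) - 1.0

def closed_form_minimizer(zeta, b0):
    """Explicit limit from Theorem 3(b), with a0 = 1/(sqrt(2)*|b0|)."""
    a0 = 1.0 / (math.sqrt(2.0) * abs(b0))
    return -1.0 / (2.0 * b0 * zeta) if math.sqrt(zeta) > a0 else -b0

def grid_minimizer(zeta, b0, lo=-3.0, hi=3.0, n=60001):
    """Brute-force minimizer of V_tilde on a fine grid."""
    step = (hi - lo) / (n - 1)
    return min((lo + i * step for i in range(n)),
               key=lambda z: V_tilde(z, zeta, b0))

# zeta = 1.0 > a0^2 = 0.5: interior minimizer -1/(2*b0*zeta), where the
# stationarity condition of Proposition 4 holds with equality;
# zeta = 0.3 < 0.5: the minimizer sits at the kink z = -b0
for zeta, b0 in [(1.0, 1.0), (0.3, 1.0)]:
    assert abs(grid_minimizer(zeta, b0) - closed_form_minimizer(zeta, b0)) < 1e-3
```

The two regimes correspond exactly to the last two cases of Proposition 4: an interior solution satisfying the first-order condition, and the kink point $m = -\tilde\beta_0$ where only the inequality condition binds.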
Building on Proposition 4, the following theorem shows that the set of minimizers of $\tilde V^c_{\tilde\beta_0}(z)$, taken over all $\tilde\beta_0 \in \overline{\mathbb{R}}^k$, is contained in a random set that does not depend on $\tilde\beta_0$.

Theorem 7. Define the random set
\[
M^c := \left\{ m \in \mathbb{R}^k : m_j (\zeta^c_{vv} m)_j \le \tfrac{1}{2},\ j = 1,\dots,k \right\},
\]
where $\zeta^c_{vv}$ is the same as in the definition of $\tilde V^c_{\tilde\beta_0}(z)$ in Theorem 6(b). Then
\[
\bigcup_{\tilde\beta_0 \in \overline{\mathbb{R}}^k} \operatorname*{argmin}_{z \in \mathbb{R}^k} \tilde V^c_{\tilde\beta_0}(z) \subseteq M^c,
\]
where the set inclusion holds surely, i.e., for all $\omega$ in the sample space of the underlying probability space.

Remark 8. For later use, we will assume that the random matrix $\zeta^c_{vv}$ in Theorem 6(b) satisfies $\lim_{T\to\infty} T^{-2}\sum_{t=1}^T x_t(\omega)x_t'(\omega) = \zeta^c_{vv}(\omega)$ for all $\omega$. This can be achieved using Skorokhod's representation theorem.

Theorem 7 together with Remark 8 shows that the union of limits of $\lambda_T^{-1/2} T (\hat\beta_{AL} - \beta_T)$ over all possible parameter sequences is contained in the set $M^c$, which is a compact set for each realization of $\zeta^c_{vv}$. This observation allows us to construct uniformly valid confidence regions centered at the adaptive LASSO estimator, as developed in the following subsection. Before doing so, let us examine the set $M^c$ in more detail. Clearly, all randomness in $M^c$ stems from the regressors $x_t$; no randomness arises from the regression errors $u_t$. For expositional convenience, we focus on the pure unit root case ($c = 0$): there, $\zeta^c_{vv}$ has expectation $0.5\,\Omega_{vv}$, where $\Omega_{vv}$, given by the $k \times k$ bottom-right block of $\Omega$, is the long-run covariance matrix of $\{v_t\}_{t\in\mathbb{Z}}$. Hence, on average, $M^c$ is given by $\{ m \in \mathbb{R}^k : m_j(\Omega_{vv}m)_j \le 1,\ j = 1,\dots,k \}$. Thus, on average, the set $M^c$ becomes smaller as the variability of $\{v_t\}_{t\in\mathbb{Z}}$ increases. In the univariate regressor case, $\Omega_{vv}$ reduces to the long-run variance of $v_t$.
Normalizing this variance to one implies that, on average, $M^c$ coincides with the interval $[-1, 1]$. Consequently, in one dimension and on average, we recover the same interval as Pötscher and Schneider (2009) and Amann and Schneider (2023). Note, however, that the corresponding sets in these two papers are non-random, as the regressors are treated as deterministic.

3.4 A Universal Confidence Region Under Consistent Tuning

We now use the observation from Theorem 7 to construct a confidence region that has asymptotic coverage probability equal to one. To this end, we define a "slightly larger" finite-sample analogue
\[
\widehat{M}_T(\varepsilon) := \left\{ m \in \mathbb{R}^k : m_j\left( \left( T^{-2}\sum_{t=1}^T x_t x_t' \right) m \right)_j \le \tfrac{1}{2} + \varepsilon,\ j = 1,\dots,k \right\}
\]
of $M^c$, where $\varepsilon > 0$ but arbitrarily small. The following theorem shows that this set can be used to construct a confidence region based on the adaptive LASSO estimator that asymptotically holds any prescribed coverage level.

Theorem 8 (Confidence regions). Let $\{y_t\}_{t\in\mathbb{Z}}$ and $\{x_t\}_{t\in\mathbb{Z}}$ be generated by (1) and (2), let $\{w_t\}_{t\in\mathbb{Z}}$ satisfy Assumption 1, and let $T^{-2}\lambda_T \to 0$ and $\lambda_T \to \infty$. Then
\[
\lim_{T\to\infty} \inf_{\beta \in \mathbb{R}^k} P_\beta\left( \beta \in \hat\beta_{AL} - T^{-1}\lambda_T^{1/2}\,\widehat{M}_T(\varepsilon) \right) = 1
\]
for any $\varepsilon > 0$.

Theorem 8 delivers valid confidence regions, as the coverage probability holds uniformly over the parameter space. Technically, this is achieved by taking the infimum over the parameter space prior to letting $T$ tend to infinity. The key underlying idea is that $\beta \in \{ \hat\beta_{AL} - \lambda_T^{1/2}T^{-1}m : m \in \widehat{M}_T(\varepsilon) \}$ holds if and only if $\lambda_T^{-1/2}T(\hat\beta_{AL} - \beta) \in \widehat{M}_T(\varepsilon)$. The latter event can then be approximated by $\operatorname*{argmin}_{z \in \mathbb{R}^k} \tilde V^c_{\tilde\beta_0}(z) \in M^c$ for some $\tilde\beta_0 \in \overline{\mathbb{R}}^k$, and this inclusion holds surely for any $\tilde\beta_0$ by Theorem 7. In practice, we propose to construct the confidence region for $\beta$ based on $\widehat{M}_T(0)$.
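Membership in $\widehat{M}_T(\varepsilon)$ is inexpensive to check, so the set can be approximated by sampling. A minimal sketch for $k = 2$, with a hypothetical scaled Gram matrix standing in for $T^{-2}\sum_{t=1}^T x_t x_t'$:

```python
import random

def in_M_hat(m, gram, eps=0.0):
    """Check m in M_hat_T(eps): m_j * (G m)_j <= 1/2 + eps for all j,
    where G stands for the scaled Gram matrix T^{-2} * sum_t x_t x_t'."""
    k = len(m)
    Gm = [sum(gram[j][i] * m[i] for i in range(k)) for j in range(k)]
    return all(m[j] * Gm[j] <= 0.5 + eps for j in range(k))

def sample_M_hat(gram, box, n=2000, seed=0):
    """Approximate M_hat_T(0) by rejection sampling from a bounding box."""
    rng = random.Random(seed)
    return [m for _ in range(n)
            if in_M_hat(m := [rng.uniform(lo, hi) for lo, hi in box], gram)]

# hypothetical scaled Gram matrix; the origin always belongs to the set
G = [[1.0, 0.2], [0.2, 0.5]]
pts = sample_M_hat(G, box=[(-2.0, 2.0), (-2.0, 2.0)])
```

Every retained point, rescaled by $\lambda_T^{1/2}T^{-1}$ and subtracted from $\hat\beta_{AL}$, yields a point of the confidence region.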
Since no closed-form solution is available, $\widehat{M}_T(0)$ can be computed numerically as described below, and the confidence region is then given by $\{ \hat\beta_{AL} - \lambda_T^{1/2}T^{-1}m : m \in \widehat{M}_T(0) \}$. If confidence intervals for individual components $\beta_j$ are of interest, these can be obtained as $[\hat\beta_{AL,j} - \lambda_T^{1/2}T^{-1}\overline{m}_j,\ \hat\beta_{AL,j} - \lambda_T^{1/2}T^{-1}\underline{m}_j]$, where $\overline{m}_j := \max\{ m_j : m \in \widehat{M}_T(0) \}$ and $\underline{m}_j := \min\{ m_j : m \in \widehat{M}_T(0) \} = -\overline{m}_j$. The quantity $\overline{m}_j$ can be computed numerically using sequential quadratic programming without explicitly constructing the set $\widehat{M}_T(0)$. Specifically, we solve the constrained maximization problem defining $\overline{m}_j$ directly. To reduce the risk of convergence to local optima, the algorithm can be initialized from multiple random starting values, retaining the largest value of $m_j$ obtained across these runs. The coordinate-wise intervals can also be used to construct the set $\widehat{M}_T(0)$. Since $\widehat{M}_T(0)$ is contained in the Cartesian product of these intervals, i.e., $\widehat{M}_T(0) \subseteq B := \prod_{j=1}^k [\underline{m}_j, \overline{m}_j]$, one can either construct a grid within $B$ or, to speed up computation for large $k$, randomly sample vectors from $B$, retaining only those that satisfy the constraints defining $\widehat{M}_T(0)$.⁵

⁵ MATLAB code for computing $\underline{m}_j$ and $\overline{m}_j$ for all $j = 1,\dots,k$, as well as $\widehat{M}_T(0)$ for general $k \times k$ symmetric positive definite matrices, is available on the first author's personal website.

It may appear curious to consider confidence regions whose asymptotic coverage probability equals one. To explain this, recall that for the consistently tuned adaptive LASSO estimator only randomness stemming from the regressors $x_t$ can persist asymptotically. This occurs because the scaling induced by the uniform convergence rate is not sufficiently large for stochastic variation from the error terms $u_t$ to survive in the limit. When the regressors are treated as non-random, this even manifests in entirely non-random limits, which are contained in a compact set. Amann and Schneider (2023) show that this non-random set can be utilized in a similar way to construct confidence regions with uniform asymptotic coverage probability equal to one, but that any slightly smaller region has asymptotic coverage probability equal to zero. When the regressors $x_t$ are random, a slightly different picture arises. All possible limits of $\lambda_T^{-1/2}T(\hat\beta_{AL} - \beta)$ are contained in a set that is compact for a fixed realization of the limiting regressor matrix $\zeta^c_{vv}$. As a result, confidence regions can be constructed that still achieve uniform asymptotic coverage equal to one. However, it can no longer be shown that this probability drops to zero when the regions are made smaller, an effect of $x_t$ being stochastic. It may be possible to exploit the remaining randomness to construct confidence regions with asymptotic coverage strictly less than one, but we leave a formal investigation of this question for future research. Nevertheless, the regions proposed here possess a key advantage: they do not rely on the asymptotic distribution of either the adaptive LASSO estimator or the rescaled regressors. As a consequence, their construction avoids the need for any knowledge or estimation of local-to-unity and long-run covariance parameters, as well as for accounting for second-order bias terms in the limiting distribution of the adaptive LASSO estimator. This universality distinguishes our approach from existing methods and, to the best of our knowledge, represents the first construction of LASSO-based uniformly valid confidence regions in regressions with unit root or local-to-unity regressors.

4 Simulation Results

This section presents simulation results.
Section 4.1 analyzes how well the asymptotic results approximate the finite-sample distribution of the adaptive LASSO estimator, whereas Section 4.2 focuses on the empirical coverage probabilities of the uniform confidence regions.

4.1 Finite-Sample Distributions

We investigate how well our theoretical results approximate the finite-sample distribution of the adaptive LASSO estimator under both conservative and consistent tuning for various sequences $\beta_T$ and different sample sizes. We also compare the finite-sample distribution of the adaptive LASSO to that of OLS to analyze how much it deviates from what is suggested by the oracle property in empirically relevant scenarios where some coefficients are small rather than exactly equal to zero. We generate data according to (1) and (2) for the univariate unit root case with $[u_t, v_t]' \sim N(0, I_2)$ i.i.d. across $t$, and present results for $\beta_T \in \{ 0.1\beta,\ \beta/T^{1/2},\ \beta/T,\ \lambda_T^{1/2}\beta/T \}$, with $\beta = 1$, $\lambda_T \in \{ 1, T^{1/4}, T^{1/2}, T \}$, and $T \in \{ 25, 50, 100, 250, 1000 \}$. All results are based on 10,000 Monte Carlo replications.⁶ The choices for $\beta_T$ cover the cases where $\beta_T$ is bounded away from zero, converges to zero at rate $T^{-1/2}$ (the same rate as used in the simulations in Lee et al., 2022), converges to zero at rate $T^{-1}$ (the cut-off rate under conservative tuning), and converges to zero at the slower rate $T^{-1}\lambda_T^{1/2}$ (the cut-off rate under consistent tuning). With respect to the tuning parameter $\lambda_T$, the choice $\lambda_T \equiv 1$ leads to conservative tuning, while the other three choices lead to consistent tuning. Importantly, in case $\beta_T = \beta/T^{1/2}$, only $\lambda_T = T^{1/4}$ fulfills the condition in Lee et al. (2022) that $T^{-1/2}\lambda_T + \lambda_T^{-1} \to 0$.
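The Monte Carlo design just described can be sketched as follows for the case $\beta_T = \lambda_T^{1/2}\beta/T$ with $\lambda_T = T$; the closed-form soft-threshold solution used for the univariate adaptive LASSO, and the reduced replication count, are implementation assumptions rather than the authors' exact code.

```python
import random

def mc_scaled_errors(T=100, beta=1.0, reps=500, seed=42):
    """Monte Carlo sketch of the Section 4.1 design for beta_T = sqrt(lam)*beta/T
    with lam = T (consistent tuning). Returns the atom frequency
    p = freq(b_AL == 0) and the non-zero scaled errors lam^{-1/2}*T*(b_AL - beta_T)."""
    rng = random.Random(seed)
    lam = float(T)                      # lambda_T = T
    beta_T = lam ** 0.5 * beta / T      # cut-off rate under consistent tuning
    atoms, errors = 0, []
    for _ in range(reps):
        x, level = [], 0.0
        for _ in range(T):
            level += rng.gauss(0.0, 1.0)          # v_t shock: unit root regressor
            x.append(level)
        y = [beta_T * xt + rng.gauss(0.0, 1.0) for xt in x]  # u_t shock
        sxx = sum(xt * xt for xt in x)
        b_ols = sum(xt * yt for xt, yt in zip(x, y)) / sxx
        # soft-threshold form of the univariate adaptive LASSO (assumed)
        if b_ols == 0.0:
            b_al = 0.0
        else:
            mag = abs(b_ols) - lam / (2.0 * sxx * abs(b_ols))
            b_al = 0.0 if mag <= 0.0 else (1 if b_ols > 0 else -1) * mag
        if b_al == 0.0:
            atoms += 1
        else:
            errors.append(lam ** -0.5 * T * (b_al - beta_T))
    return atoms / reps, errors

p, errs = mc_scaled_errors()
```

The returned pair mirrors the two components plotted in the figures below: an atom at zero with relative frequency `p` and a continuous part formed by the non-zero scaled errors.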
Separately for the four choices of $\beta_T$, Figures 2–5 display the finite-sample distributions of $T(\hat\beta_{AL} - \beta_T)$ (under conservative tuning) and $\lambda_T^{-1/2}T(\hat\beta_{AL} - \beta_T)$ (under consistent tuning). The distributions consist of an atomic mass, drawn at the height corresponding to the relative frequency $p$ of the event $\hat\beta_{AL} = 0$, and a continuous component (rescaled to integrate to $1 - p$), representing the density of the non-zero estimates.⁷ The figures also display the finite-sample distribution of the OLS estimator, $T(\hat\beta - \beta_T)$, as well as the case-specific limiting distribution of the adaptive LASSO estimator from Theorem 3, evaluated at $\beta_{0,T} := T\beta_T$ (under conservative tuning) and $\tilde\beta_{0,T} := \lambda_T^{-1/2}T\beta_T$ (under consistent tuning).⁸ Replacing the limiting parameters $\beta_0$ and $\tilde\beta_0$ with their finite-sample counterparts allows us to account for the size of $\beta_T$ relative to the sample size when evaluating how well the limiting distributions derived in Theorem 3 approximate the finite-sample distributions.

Figure 2 presents the results for $\beta_T \equiv 0.1\beta$. In general, the adaptive LASSO estimator identifies the true coefficient as non-zero with probability approaching one as the sample size $T$ increases. However, by construction, the empirical probability of incorrectly setting the non-zero coefficient to zero increases with the order at which $\lambda_T$ diverges. Under conservative tuning, the finite-sample distribution of the estimator approaches that of the OLS estimator, with the two distributions becoming virtually indistinguishable already for $T = 100$. Notably, the limiting distribution derived in Theorem 3(a), evaluated at $\beta_{0,T}$, already provides a good approximation to the finite-sample distribution of the adaptive LASSO estimator for small $T$, e.g., $T = 25$.
In the context of Lemma A.1 in Appendix A, this indicates that $\mathcal{Z}^c_T$ and $\zeta_{vv,T}$ converge quickly to their asymptotic counterparts, so that the finite-sample distribution of the procedure is effectively governed by $\beta_{0,T}$. Under consistent tuning, the figure shows that the scaling factor implied by the uniform rate causes all mass of the distribution of the adaptive LASSO estimator to collapse at zero as $T$ increases. Moreover, the limiting distribution derived in Theorem 3(b), evaluated at $\tilde\beta_{0,T}$, approximates the finite-sample distribution more accurately the larger the order of $\lambda_T$. For $\lambda_T = T$, the approximation is already quite accurate for small $T$.

We now turn to Figure 3, which presents the results for $\beta_T = \beta/T^{1/2}$. Under conservative tuning, the adaptive LASSO estimator still identifies the smaller coefficient as non-zero with probability approaching one as $T$ increases, and its distribution quickly approaches that of the OLS estimator. Under consistent tuning, however, its properties now depend on the order of $\lambda_T$.

⁶ To focus on the main effects, we omit error serial correlation and regressor endogeneity from the model. While our empirical findings remain qualitatively similar when these features are included, as well as under changes in the variances of $u_t$ and $v_t$ or deviations from normality, the approximation quality of the limiting distributions derived in Theorem 3 and Remark 4 may be reduced, particularly in small- to medium-sized samples.

⁷ All displayed densities are smoothed using a Gaussian kernel.

⁸ Densities of limiting distributions are obtained by simulation, where Brownian motions are approximated by normalized sums of 10,000 i.i.d. standard normal random variables and stochastic integrals are approximated accordingly.
As before, for λ_T ∈ {T^{1/4}, T^{1/2}}, the procedure correctly identifies the true coefficient as non-zero with probability approaching one, and the scaling factor implied by the uniform rate causes all mass to collapse at zero as T increases. For λ_T = T, however, the estimator incorrectly sets the true coefficient to zero with probability approaching 0.68. As a result, even for large T, the distribution of the estimator consists of both an atomic and a continuous part. Nevertheless, the limiting distribution derived in Theorem 3(b), evaluated at β̃_{0,T}, provides a good approximation to the finite-sample distribution even for small T.

Figure 4 presents the results for β_T = β/T, which is the cut-off rate for the detection of non-zero coefficients under conservative tuning. As a result, under conservative tuning, the adaptive LASSO estimator incorrectly sets the true coefficient to zero with probability approaching 0.43, and its distribution, which is approximated very well by the limiting distribution derived in Theorem 3(a), evaluated at β_{0,T}, already for small T, consists of both an atomic and a continuous part. Under consistent tuning, the estimator incorrectly sets the true coefficient to zero with probability approaching one, such that its distribution collapses to an atomic part at −β̃_{0,T}.

Finally, Figure 5 presents the results for β_T = √λ_T β/T, which is the cut-off rate for the detection of non-zero coefficients under consistent tuning. As a result, under consistent tuning, the adaptive LASSO estimator incorrectly sets the true coefficient to zero with probability approaching 0.68, and its distribution consists of both an atomic and a continuous part. The limiting distribution derived in Theorem 3(b), evaluated at β̃_{0,T}, approximates the finite-sample distribution of the estimator more accurately the larger the order of λ_T.
For λ_T = T, the approximation is already quite accurate for small T.

We now analyze the finite-sample distribution of T(β̂_AL − β_T) under consistent tuning, which is presented in Figures F.1–F.4 in Appendix F separately for the four choices of β_T, alongside the corresponding limits from Remark 4 and the finite-sample distribution of the OLS estimator.

Footnote 9: Since λ_T ≡ 1 implies β_T = β/T, the results under conservative tuning coincide with those shown in Figure 4.

Figure F.1 presents the results for β_T ≡ 0.1β. As seen previously in Figure 2, the adaptive LASSO estimator identifies the true coefficient as non-zero with probability approaching one as T increases. However, the order of λ_T has a substantial impact on the estimator's distribution. For λ_T ∈ {T^{1/4}, T^{1/2}}, its distribution approaches that of the OLS estimator. In contrast, for λ_T = T, the distribution approaches that of the OLS estimator shifted to the left. This stochastically bounded shift significantly distorts the distribution even for small T. As a result, when λ_T = T, the adaptive LASSO estimator fails to exhibit the oracle property, despite correctly identifying the true coefficient as non-zero with probability approaching one.

We now turn to Figure F.2, which presents the results for β_T = β/T^{1/2}. For λ_T ∈ {T^{1/4}, T^{1/2}}, the adaptive LASSO estimator identifies the true coefficient as non-zero with probability approaching one as T increases (cf. Figure 3), but only for λ_T = T^{1/4} does its distribution approach that of OLS. For λ_T = T^{1/2}, on the other hand, the distribution of the procedure approaches that of OLS shifted to the left, and it thus loses its oracle property. For λ_T = T, the adaptive LASSO estimator incorrectly sets the true coefficient to zero with probability approaching 0.68, and its distribution consists of both an atomic part and a continuous part.
As T increases, both the location of the atomic part and the region where the continuous part of the distribution has mass shift toward −∞. While the behavior of the estimator in case λ_T = T^{1/4} is already described in Lee et al. (2022), the cases λ_T ∈ {T^{1/2}, T} are not covered by their results, but are in line with our asymptotic results in Remark 4.

Figure F.3 presents the results for β_T = β/T. In this case, the adaptive LASSO estimator incorrectly sets the true coefficient to zero with probability approaching one (see also Figure 4), causing its distribution to collapse to an atomic part at −β. The collapse occurs more rapidly the larger the order at which λ_T diverges.

Finally, Figure F.4 presents the results for β_T = √λ_T β/T. As this is the cut-off rate for the detection of non-zero coefficients under consistent tuning, the adaptive LASSO estimator incorrectly sets the true coefficient to zero with probability approaching 0.68 (see also Figure 5), and its distribution consists of both an atomic and a continuous part. As T increases, both the location of the atomic part and the region where the continuous part of the distribution has mass shift toward −∞.

Overall, the simulation results reveal that the finite-sample distribution of the adaptive LASSO estimator can deviate substantially from what is suggested by the oracle property, especially under consistent tuning. By contrast, the limiting distributions derived under moving-parameter asymptotics capture the finite-sample properties much more accurately. They not only provide reasonable approximations for the absolutely continuous part of the estimator's distribution but also convey information on the relative frequency with which the coefficient is set to zero.
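To illustrate, the simulation exercise can be replicated in a few lines of code. The sketch below assumes the univariate model y_t = β_T x_t + u_t with i.i.d. standard normal errors and an exact unit root regressor, and uses the closed-form thresholding representation of the adaptive LASSO implied by Equation (6) and Appendix A; the function names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_lasso_uni(y, x, lam):
    """Closed-form univariate adaptive LASSO with OLS-based weight:
    the OLS estimate is set to zero if b_ols^2 <= lam / (2 * sum x_t^2)
    and shrunk by lam / (2 * sum x_t^2 * b_ols) otherwise."""
    sxx = np.sum(x**2)
    b_ols = np.sum(x * y) / sxx
    if b_ols**2 <= lam / (2.0 * sxx):
        return 0.0, b_ols
    return b_ols - lam / (2.0 * sxx * b_ols), b_ols

# Monte Carlo: exact unit root regressor, beta_T = 0.1, consistent tuning lam_T = T
T, reps, beta_T = 100, 2000, 0.1
lam = float(T)
est = np.empty(reps)
for r in range(reps):
    x = np.cumsum(rng.standard_normal(T))       # random walk regressor (c = 0)
    y = beta_T * x + rng.standard_normal(T)
    est[r], _ = adaptive_lasso_uni(y, x, lam)

p_zero = np.mean(est == 0.0)                    # height of the atomic part
scaled = lam**-0.5 * T * (est - beta_T)         # lam_T^{-1/2} T (b_AL - beta_T)
print(f"relative frequency of b_AL = 0: {p_zero:.3f}")
```

The relative frequency `p_zero` corresponds to the atomic mass in the figures, and a kernel density of the non-zero entries of `scaled`, rescaled to integrate to 1 − `p_zero`, reproduces the continuous component.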
4.2 Coverage Probabilities

We now study the coverage probabilities of the confidence regions introduced in Section 3.4 and benchmark them against confidence regions based on the oracle property. Data are generated as in the previous subsection. In the univariate case, the confidence region from Theorem 8 simplifies to the interval β̂_AL ∓ √(λ_T / (2 Σ_{t=1}^T x_t²)), which we label the "Uniform CI". The oracle-based interval ("Oracle CI") is given by [β̂_AL − q_{1−α/2}/T, β̂_AL − q_{α/2}/T], where q_α denotes the α-quantile of Z^c defined in Equation (5) in Section 2. The quantiles are obtained by simulation as described in the previous subsection. We report results for α = 0.05 and α = 0.01, corresponding to nominal 95% and 99% oracle confidence intervals, respectively. In addition, we consider the interval β̂_AL ∓ √λ_T/T, labeled "asymptotic Uniform CI", which replaces T^{−2} Σ_{t=1}^T x_t² in the definition of M̂_T(0) by the expectation of its limit ζ_{vv}^c, which is equal to 1 in this case.

For each interval, we compute the empirical coverage probability for the true parameter value β ∈ [−0.6, 0.6] using an equidistant grid with step size 0.01. As the results are symmetric in β, we report them only for |β|. Practitioners relying on the oracle property are typically interested in confidence intervals only when β̂_AL ≠ 0, as they assume β = 0 otherwise. Accordingly, when β̂_AL = 0 we collapse the Oracle CI to the singleton {0}, which covers the true parameter β only if β = 0. The Uniform CI is constructed in the same way for all values of β̂_AL. As before, all results reported in this subsection are based on 10,000 Monte Carlo replications.

Figure 6 reports coverage probabilities for those choices of tuning parameters λ_T from the previous subsection that lead to consistent tuning, i.e., λ_T ∈ {T^{1/4}, T^{1/2}, T}.
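The interval construction and coverage computation described above can be sketched as follows, again assuming the univariate model with an exact unit root regressor and using the closed-form thresholding representation of the adaptive LASSO from Appendix A; all function names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_ci(b_al, x, lam):
    """Uniform CI of Theorem 8 in the univariate case:
    b_AL -/+ sqrt(lam / (2 * sum x_t^2))."""
    half = np.sqrt(lam / (2.0 * np.sum(x**2)))
    return b_al - half, b_al + half

def coverage(beta, T=100, reps=1000, lam=None):
    """Empirical coverage of the Uniform CI at one value of beta."""
    lam = float(T) if lam is None else lam
    hits = 0
    for _ in range(reps):
        x = np.cumsum(rng.standard_normal(T))   # unit root regressor
        y = beta * x + rng.standard_normal(T)
        sxx = np.sum(x**2)
        b_ols = np.sum(x * y) / sxx
        if b_ols**2 <= lam / (2.0 * sxx):       # thresholding representation
            b_al = 0.0
        else:
            b_al = b_ols - lam / (2.0 * sxx * b_ols)
        lo, hi = uniform_ci(b_al, x, lam)
        hits += (lo <= beta <= hi)
    return hits / reps

print(f"empirical coverage at beta = 0.05: {coverage(0.05):.3f}")
```

Looping `coverage` over the grid of β values and the tuning choices reproduces the coverage curves in the figures.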
Since the oracle property requires T^{−1} λ_T + λ_T^{−1} → 0 (see, e.g., Corollary 1), it does not hold for λ_T = T. Consequently, the Oracle CI is asymptotically invalid in this case, whereas the Uniform CI remains asymptotically valid. The figure shows that the coverage probabilities of all confidence intervals are close to one when β = 0. However, for small deviations of β away from zero, the coverage probability of the Oracle CI drops sharply, often to values below 0.5, while the coverage probability of the Uniform CI remains relatively stable. As β moves further away from zero, the coverage probability of the Oracle CI eventually recovers. Nevertheless, it performs poorly in the region of primary interest, namely when β is small but non-zero. More generally, we find that the higher the rate at which λ_T diverges, or the larger the sample size T, the more stable the coverage probability of the Uniform CI across values of β. When λ_T = T, the asymptotically invalid Oracle CI can exhibit coverage probabilities close to zero even in large samples, whereas the Uniform CI remains asymptotically valid and performs well even in small samples. Finally, the coverage probabilities of the asymptotic Uniform CI are generally much lower than those of the Uniform CI. This reflects the point already emphasized in Section 3.4 that the confidence sets must be constructed using the realized regressors x_t, rather than quantities based on (limiting) distributional properties only.

Footnote 10: Results are qualitatively similar without this restriction.

We next repeat the analysis after scaling λ_T by a factor of four in each case, leaving its rate of divergence unchanged. Figure 7 presents the results. Scaling λ_T in this way further improves the coverage probability of the Uniform CI across all sample sizes and divergence rates, while the coverage probability of the Oracle CI deteriorates further.
This outcome reflects two opposing effects of increasing λ_T. First, more estimates β̂_AL are shrunk to zero, which worsens the performance of the Oracle CI. Second, the width of the Uniform CI increases, which improves its coverage.

Finally, we relax the assumption of i.i.d. standard normal regression errors and regressor innovations and examine the performance of the confidence intervals under error serial correlation and regressor endogeneity. Specifically, we generate u_t = ρ_1 u_{t−1} + e_t + ρ_2 ν_t and v_t = ν_t + 0.5 ν_{t−1}, where [e_t, ν_t]′ ~ N(0, (4/9) I_2) i.i.d. across t. The parameters ρ_1 and ρ_2 govern the degree of error serial correlation and regressor endogeneity, respectively. Figure 8 reports results for ρ_1 = ρ_2 = 0.6 and the scaled tuning parameters. The figure shows that the previous findings remain intact in the presence of serial correlation and regressor endogeneity.

To complete the analysis, Tables F.1 and F.2 in Appendix F report the lengths of the confidence intervals underlying Figures 6–8. As expected, the Uniform CI is typically longer than the Oracle CI, but the difference is usually moderate and diminishes further as the sample size increases. In some instances, however, the Uniform CI can be substantially longer than the Oracle CI, but this occurs either for unfavorable realizations of x_t or in settings where the Oracle CI is asymptotically invalid and exhibits coverage probabilities close to zero. We therefore conclude that the uniform confidence region proposed in Section 3.4 constitutes a useful tool for quantifying uncertainty around adaptive LASSO estimates under consistent tuning in empirical applications.

Footnote 11: Rescaling the variance ensures that Ω_{vv} = 1 and thus simplifies the comparison with the previous results.
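The error processes of this robustness check can be generated as follows; this is a minimal sketch, and the burn-in length and function name are our own choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def generate_errors(T, rho1=0.6, rho2=0.6, burn=50):
    """u_t = rho1 * u_{t-1} + e_t + rho2 * nu_t and
    v_t = nu_t + 0.5 * nu_{t-1}, with [e_t, nu_t]' ~ N(0, (4/9) I_2)
    i.i.d. across t; the (4/9) scaling makes the long-run variance of
    v_t equal to (1 + 0.5)^2 * (4/9) = 1."""
    n = T + burn
    e = rng.normal(0.0, np.sqrt(4.0 / 9.0), n)
    nu = rng.normal(0.0, np.sqrt(4.0 / 9.0), n)
    v = nu.copy()
    v[1:] += 0.5 * nu[:-1]                  # MA(1) regressor innovations
    u = np.zeros(n)
    for t in range(1, n):                   # AR(1) error, endogenous via nu_t
        u[t] = rho1 * u[t - 1] + e[t] + rho2 * nu[t]
    return u[burn:], v[burn:]

u, v = generate_errors(500)
x = np.cumsum(v)    # regressor with exact unit root (local-to-unity c = 0)
```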
Footnote 12: When simulating the quantiles of Z^c for the oracle intervals, we use the true long-run covariance parameters to capture the dependence structure in the data. In applications, these quantities are unknown and must be estimated. In the presence of local-to-unity regressors this approach becomes infeasible, as Z^c also depends on the local-to-unity parameters, which are not consistently estimable. In contrast, the uniform confidence intervals are unaffected by these issues.

[Figure 2: density plots for λ_T ≡ 1, T^{1/4}, T^{1/2}, T (rows) and T = 25, 50, 100, 250, 1000 (columns).]

Figure 2: Finite-sample distributions of T(β̂_AL − β_T) (under conservative tuning, in the first row) and λ_T^{−1/2} T(β̂_AL − β_T) (under consistent tuning, in the remaining rows) in case β_T ≡ 0.1β (labeled "AL"), and case-specific limiting distribution from Theorem 3, evaluated at sample counterparts of limiting parameters (labeled "Thm. 3"). Notes: "OLS" denotes the finite-sample distribution of the OLS estimator. If the correct location of the atomic part of a density is smaller than −4, it is plotted at −4 with an arrow pointing to the left.
[Figure 3: density plots for λ_T ≡ 1, T^{1/4}, T^{1/2}, T (rows) and T = 25, 50, 100, 250, 1000 (columns).]

Figure 3: Finite-sample distributions of T(β̂_AL − β_T) (under conservative tuning, in the first row) and λ_T^{−1/2} T(β̂_AL − β_T) (under consistent tuning, in the remaining rows) in case β_T = β/T^{1/2} (labeled "AL"), and case-specific limiting distribution from Theorem 3, evaluated at sample counterparts of limiting parameters (labeled "Thm. 3"). Notes: See notes to Figure 2.
[Figure 4: density plots for λ_T ≡ 1, T^{1/4}, T^{1/2}, T (rows) and T = 25, 50, 100, 250, 1000 (columns).]

Figure 4: Finite-sample distributions of T(β̂_AL − β_T) (under conservative tuning, in the first row) and λ_T^{−1/2} T(β̂_AL − β_T) (under consistent tuning, in the remaining rows) in case β_T = β/T (labeled "AL"), and case-specific limiting distribution from Theorem 3, evaluated at sample counterparts of limiting parameters (labeled "Thm. 3"). Notes: See notes to Figure 2.
[Figure 5: density plots for λ_T ≡ 1, T^{1/4}, T^{1/2}, T (rows) and T = 25, 50, 100, 250, 1000 (columns).]

Figure 5: Finite-sample distributions of T(β̂_AL − β_T) (under conservative tuning, in the first row) and λ_T^{−1/2} T(β̂_AL − β_T) (under consistent tuning, in the remaining rows) in case β_T = √λ_T β/T (labeled "AL"), and case-specific limiting distribution from Theorem 3, evaluated at sample counterparts of limiting parameters (labeled "Thm. 3"). Notes: See notes to Figure 2.
[Figures 6–8: coverage plots for three divergence rates of λ_T (rows) and T = 25, 50, 100, 250, 1000 (columns); horizontal axis |β| ∈ [0, 0.6], vertical axis coverage probability.]

Figure 6: Coverage probabilities of confidence intervals.

Figure 7: Coverage probabilities of confidence intervals for scaled λ_T.

Figure 8: Coverage probabilities of confidence intervals for scaled λ_T in the presence of error serial correlation and regressor endogeneity.

5 Empirical Illustration

We apply the adaptive LASSO estimator within a predictive regression framework to forecast the U.S. monthly unemployment rate (UNRATE). As potential predictors, we include the variables considered by Buckmann and Joseph (2023, Table 1): the 3-month Treasury bill (TB3MS), real personal income (RPI), industrial production (INDPRO), consumption (DPCERA3M086SBEA), the S&P 500 price index (S&P 500), business loans (BUSLOANS), the consumer price index (CPIAUCSL), the oil price (OILPRICEx), and the M2 money stock (M2SL). In addition, we include the four variables most frequently selected by the standardized LASSO in Mei and Shi (2024, Table 4) when predicting the U.S. unemployment rate one month ahead using a 20-year rolling window: initial jobless claims (CLAIMSx), the number of unemployed less than 5 weeks (UEMPLT5), the number of unemployed 5 to 14 weeks (UEMP5TO14), and the number of unemployed 15 weeks and over (UEMP15OV). All series are obtained from the FRED-MD macroeconomic database (McCracken and Ng, 2016), with their respective FRED-MD codes indicated in parentheses. Throughout, we use the raw data without applying any transformations.

Mei and Shi (2024) demonstrate the usefulness of LASSO-type methods for predicting the U.S. unemployment rate using the full set of FRED-MD variables, considering multiple horizons (1, 2, and 3 months) and rolling windows of 10, 20, and 30 years, and benchmarking their results against a random walk with drift and an autoregressive model. In contrast, our focus is not on relative predictive performance, but on quantifying uncertainty around adaptive LASSO estimates using the uniformly valid confidence intervals proposed in Section 3.4.
Although we compare the magnitudes of adaptive LASSO coefficients to their OLS counterparts, oracle-based confidence intervals for OLS are infeasible because the limiting distribution of the OLS estimator is distorted by nuisance parameters arising from endogeneity, serial correlation, and local-to-unity predictors.

We report results for one-month-ahead out-of-sample forecasts based on a 20-year rolling window using data from January 1959 to December 2024. The sample from January 1959 to December 1979 is used for initial estimation, while forecasts are evaluated over the period January 1980 to December 2024.

Within each rolling window, the penalization parameter λ_T is selected via time-series cross-validation following Hyndman and Athanasopoulos (2018, Chapter 5.10). For each candidate value of λ_T, the adaptive LASSO is calculated using the first 60% of observations in the window and then used to generate a one-month-ahead forecast. The estimation sample is then expanded recursively by one observation at a time until the end of the window, producing a sequence of forecasts. The value of λ_T is chosen to minimize the resulting root mean squared forecasting error across those forecasts. The candidate set for λ_T consists of a grid ranging from zero to the smallest value that shrinks all coefficients to zero when estimated on the full window.

The left panel of Figure G.5 in Appendix G shows the adaptive LASSO and OLS forecasts alongside the observed unemployment rate, while the right panel reports the corresponding forecast errors. Over the full evaluation period, OLS attains a root mean squared forecasting error of 0.82, whereas the adaptive LASSO reduces this by 11% to 0.73. This improvement largely reflects the adaptive LASSO's ability to accommodate structural changes during and in the aftermath of the COVID-19 pandemic.
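The expanding-window selection of λ_T described above can be sketched as follows. The estimator is passed in as a callable; for runnability we use a ridge-type stand-in rather than the paper's adaptive LASSO, and all function names and the synthetic data are ours:

```python
import numpy as np

rng = np.random.default_rng(4)

def ts_cross_validate(y, X, lam_grid, fit, init_frac=0.6):
    """Time-series cross-validation in the spirit of Hyndman and
    Athanasopoulos (2018, Ch. 5.10): fit on the first 60% of the
    window, forecast one step ahead, expand the estimation sample by
    one observation, and pick the lam with the smallest RMSE over the
    resulting one-step forecasts."""
    start = int(init_frac * len(y))
    best_lam, best_rmse = None, np.inf
    for lam in lam_grid:
        errs = []
        for s in range(start, len(y) - 1):
            beta = fit(y[: s + 1], X[: s + 1], lam)
            errs.append(y[s + 1] - X[s + 1] @ beta)
        rmse = np.sqrt(np.mean(np.square(errs)))
        if rmse < best_rmse:
            best_lam, best_rmse = lam, rmse
    return best_lam, best_rmse

def ridge_fit(y, X, lam):
    """Ridge stand-in for the adaptive LASSO fitter used in the paper."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

X = rng.standard_normal((80, 3))
y = X @ np.array([1.0, 0.0, 2.0]) + 0.1 * rng.standard_normal(80)
lam, rmse = ts_cross_validate(y, X, [0.01, 0.1, 1.0, 10.0], ridge_fit)
print(f"selected lam: {lam}, cv rmse: {rmse:.3f}")
```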
The spike in forecast errors for both OLS and the adaptive LASSO at the beginning of COVID-19 is driven by the abrupt surge in CLAIMSx. While the adaptive LASSO adjusts rapidly in subsequent months, the OLS estimator fails to do so, as illustrated in Figure 9.

In line with the motivation of this paper, we find that the coefficients of the four labor-market variables (CLAIMSx, UEMPLT5, UEMP5TO14, UEMP15OV) are frequently estimated to be small but non-zero, whereas the remaining coefficients are often shrunk to zero by the adaptive LASSO. Figure 9 reports the rolling-window adaptive LASSO estimates for the coefficients corresponding to the labor-market variables together with their uniformly valid confidence intervals, benchmarked against the corresponding OLS estimates. The resulting confidence intervals appear plausible, often widening during and in the aftermath of crisis episodes. This behavior can be partly attributed to increases in the penalization parameter λ_T (see Figure G.6 in Appendix G) and partly to changes in the underlying variables. Importantly, a larger λ_T does not necessarily imply wider confidence intervals. For example, during and in the aftermath of the COVID-19 crisis, λ_T is elevated, yet the confidence intervals for the coefficients on CLAIMSx and UEMP5TO14 become noticeably narrower. Figure G.7 in Appendix G shows the results for the remaining variables. Although the adaptive LASSO often sets their coefficients to zero, the associated uncertainty can still be relatively large, particularly during crises.

Overall, the application highlights the usefulness of the adaptive LASSO for estimating relationships among economic variables in the presence of structural changes or shocks. The confidence intervals proposed in this paper are plausible and allow one to quantify uncertainty around adaptive LASSO estimates.
Although the confidence intervals can occasionally be wide, they are robust to endogeneity, serial correlation, and local-to-unity parameters, making them a valuable tool for empirical applications.

Figure 9: 20-year rolling window coefficient estimates and adaptive LASSO-based confidence intervals for labor-market variables (CLAIMSx, UEMPLT5, UEMP5TO14, UEMP15OV).

6 Summary and Conclusions

This paper analyzes the asymptotic behavior of the adaptive LASSO estimator in cointegrating regressions with local-to-unity regressors under moving-parameter asymptotics. We establish model selection probabilities, estimation consistency, limiting distributions, and uniform convergence rates, as well as the fastest local-to-zero rates that remain detectable by the estimator. As these rates depend critically on the tuning regime, the results characterize the smallest signal-to-noise ratios that can reliably be detected under both conservative and consistent tuning. In addition, under consistent tuning, we construct uniformly valid confidence regions for the regression coefficients that are straightforward to compute and do not require any knowledge or estimation of nuisance parameters associated with long-run covariance matrices or local-to-unity parameters.

Our simulation study demonstrates that the finite-sample distribution of the adaptive LASSO estimator often differs substantially from what is implied by the oracle property. In contrast, the limiting distributions derived under moving-parameter asymptotics provide accurate approximations and also successfully capture the empirical frequency with which coefficients are set to zero.
As a result, the proposed uniform confidence regions exhibit stable coverage probabilities across the parameter space, whereas confidence regions based on the oracle property perform poorly when the true coefficients are close to zero. The empirical application complements these findings by illustrating the usefulness of the proposed confidence regions for quantifying uncertainty around adaptive LASSO estimates in practice.

Several promising directions for future research emerge. First, an extension of the analysis to the twin adaptive LASSO proposed in Lee et al. (2022), to settings that also allow for stationary regressors and cointegration among local-to-unity regressors, would extend the framework to a broader class of econometric models. Second, further work should focus on the properties of the proposed confidence regions, including their potential usefulness for hypothesis testing. Third, theoretical guidance on choosing the penalization parameter that balances the performance of the adaptive LASSO estimator against the size of the confidence regions would further enhance the empirical applicability of the proposed methods. Finally, extending the results to high-dimensional regressions represents a natural next step toward data-rich applications.

Acknowledgements

We thank the participants of a research seminar at the Vienna University of Economics and Business in 2025 and of the 2025 Econometrics Workshop at TU Dortmund University for their valuable comments and suggestions.

Declaration of Interest

The authors have no conflicts of interest to declare.

References

Adamek, R., Smeekes, S. and Wilms, I. (2023). Lasso inference for high-dimensional time series. Journal of Econometrics 235, 1114–1143.

Amann, N. and Schneider, U. (2023). Uniform asymptotics and confidence regions based on the adaptive lasso with partially consistent tuning. Econometric Theory 39, 1097–1122.

Buckmann, M. and Joseph, A.
(2023). An interpretable machine learning workflow with an application to economic forecasting. International Journal of Central Banking 19, 449–522.

Campbell, J. Y. (2008). Viewpoint: Estimating the equity premium. Canadian Journal of Economics 41, 1–21.

Campbell, J. Y. and Yogo, M. (2006). Efficient tests of stock return predictability. Journal of Financial Economics 81, 27–60.

Chen, J., Li, D., Li, Y.-N. and Linton, O. (2025). Estimating time-varying networks for high-dimensional time series. Journal of Econometrics 249, 105941.

de Jong, R. (2003). Nonlinear estimators with integrated regressors but without exogeneity. Mimeo.

Demetrescu, M., Georgiev, I., Rodrigues, P. M. M. and Taylor, A. M. R. (2022). Testing for episodic predictability in stock returns. Journal of Econometrics 227, 85–113.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.

Geyer, C. (1996). On the asymptotics of convex stochastic optimization. Unpublished manuscript.

Gonzalo, J. and Pitarakis, J.-Y. (2025). Detecting sparse cointegration. Preprint 2501.13839, arXiv.

Hwang, J. and Valdés, G. (2024). Low frequency cointegrating regression with local to unity regressors and unknown form of serial dependence. Journal of Business and Economic Statistics 42, 160–173.

Hyndman, R. J. and Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts.

Ibragimov, R. and Phillips, P. C. B. (2008). Regression asymptotics using martingale convergence methods. Econometric Theory 24, 888–947.

Jensen, M. J. (2009). The long-run Fisher effect: Can it be tested? Journal of Money, Credit and Banking 41, 221–231.

Kock, A. B. (2016). Consistent and conservative model selection with the adaptive LASSO in stationary and nonstationary autoregressions. Econometric Theory 32, 243–259.
K oo, B. , Anderson, H. M. , Seo, M. H. and Y ao, W. (2020). High-dimensional predictiv e regression in the presence of cointegration. Journal of Ec onometrics 219 , 456–477. Lee, J. H. , Shi, Z. and Gao, Z. (2022). On LASSO for predictive regression. Journal of Ec onometrics 229 , 322–349. Lia o, Z. and Phillips, P. C. B. (2015). Automated estimation of vector error correction mo dels. Ec onometric The ory 31 , 581–646. Ma gd alinos, T. and Phillips, P. C. B. (2009). Econometric inference in the vicinit y of unit y . Preprint, CoFie W orking Paper 7, Singap ore Managemen t Univ ersit y . 39 McCra cken, M. W. and Ng, S. (2016). FRED-MD: A monthly data-base for macro e- conomic researc h. Journal of Business and Ec onomic Statistics 34 , 574–589. Medeir os, M. C. and Mendes, E. F. (2017). ℓ 1 -regularization of high-dimensional time-series models with non-gaussian and heterosk edastic errors. Journal of Ec ono- metrics 191 , 255–271. Mei, Z. and Shi, Z. (2024). On lasso for high dimensional predictive regression. Journal of Ec onometrics 242 , 105809. P ark, J. Y. and Phillips, P. C. B. (1988). Statistical inference in regressions with in tegrated pro cesses: Part 1. Ec onometric The ory 4 , 468–497. Phillips, P. C. B. (1988). Regression theory for near-in tegrated time series. Ec ono- metric a 56 , 1021–1043. Phillips, P. C. B. (2015). Pitfalls and p ossibilities in predictive regression. Journal of Financial Ec onometrics 13 , 521–555. Phillips, P. C. B. (2023). Estimation and inference with near unit ro ots. Ec onometric The ory 39 , 221–263. Phillips, P. C. B. and Hansen, B. E. (1990). Statistical inference in instrumental v ariables regression with i(1) pro cesses. R eview of Ec onomic Studies 57 , 99–125. Pötscher, B. M. and Schneider, U. (2009). On the distribution of the adaptiv e LASSO estimator. Journal of Statistic al Planning and Infer enc e 139 , 2775–2790. Ren, Y. and Zhang, X. (2010). 
Subset selection for vector autoregressive processes via adaptive Lasso. Statistics and Probability Letters 80, 1705–1712.
Saikkonen, P. and Choi, I. (2004). Cointegrating smooth transition regressions. Econometric Theory 20, 301–340.
Schweikert, K. (2022). Oracle efficient estimation of structural breaks in cointegrating regressions. Journal of Time Series Analysis 43, 83–104.
Smeekes, S. and Wijler, E. (2021). An automated approach towards sparse single-equation cointegration modelling. Journal of Econometrics 221, 247–276.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B 58, 267–288.
Tu, Y. and Xie, X. (2023). Penetrating sporadic return predictability. Journal of Econometrics 237, 105509.
Wagner, M. and Hong, S. H. (2016). Cointegrating polynomial regressions: Fully modified OLS estimation and inference. Econometric Theory 32, 1289–1315.
Wang, H., Li, G. and Tsai, C.-L. (2007). Regression coefficient and autoregressive order shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B 69, 63–78.
Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.

Appendices

A Preparation

In preparation for some of the asymptotic derivations in the univariate case, we provide finite-sample expressions of model selection probabilities and appropriately scaled estimation errors in the following lemma.

Lemma A.1 (Finite-sample results). With $Z_T^c := T(\hat{\beta} - \beta_T)$, $\zeta_{vv,T} := T^{-2}\sum_{t=1}^{T} x_t^2$, $\beta_{0,T} := T\beta_T$, $\tilde{\beta}_{0,T} := \lambda_T^{-1/2}T\beta_T$, and $\bar{\beta}_{0,T} := \lambda_T^{-1}T\beta_T$, we obtain the following expressions for the model selection probabilities and the distributions of the adaptive LASSO estimator in finite samples.
(a) $P(\hat{\beta}_{AL} = 0)$ can be written as
$$P\left( \zeta_{vv,T}^{1/2}\,|Z_T^c + \beta_{0,T}| \le \sqrt{\lambda_T/2} \right) \qquad (A.1)$$
$$= P\left( \zeta_{vv,T}^{1/2} \le \frac{1}{\sqrt{2}}\left|\lambda_T^{-1/2} Z_T^c + \tilde{\beta}_{0,T}\right|^{-1} \right). \qquad (A.2)$$

(b) $T(\hat{\beta}_{AL} - \beta_T)$ can be decomposed into
$$\mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( Z_T^c - \frac{\lambda_T}{2\zeta_{vv,T}}(Z_T^c + \beta_{0,T})^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\beta_{0,T} \qquad (A.3)$$
$$= \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( Z_T^c - \frac{1}{2\zeta_{vv,T}}\left(\lambda_T^{-1} Z_T^c + \bar{\beta}_{0,T}\right)^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\beta_{0,T}. \qquad (A.4)$$

(c) $\lambda_T^{-1/2}T(\hat{\beta}_{AL} - \beta_T)$ can be decomposed into
$$\mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( \lambda_T^{-1/2} Z_T^c - \frac{1}{2\zeta_{vv,T}}\left(\lambda_T^{-1/2} Z_T^c + \tilde{\beta}_{0,T}\right)^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\tilde{\beta}_{0,T}. \qquad (A.5)$$

Proof. To show (a), note that from (6), we get
$$P(\hat{\beta}_{AL} = 0) = P\left( |\hat{\beta}| \le \sqrt{\tilde{\lambda}_T} \right) = P\left( \left|T(\hat{\beta} - \beta_T) + T\beta_T\right| \le \sqrt{\frac{\lambda_T}{2\,T^{-2}\sum_{t=1}^{T} x_t^2}} \right) = P\left( |Z_T^c + \beta_{0,T}| \le \sqrt{\frac{\lambda_T}{2\zeta_{vv,T}}} \right) = P\left( \left|\lambda_T^{-1/2} Z_T^c + \tilde{\beta}_{0,T}\right| \le \sqrt{\frac{1}{2\zeta_{vv,T}}} \right),$$
with the last two expressions immediately yielding (A.1) and (A.2), respectively.

To show (b), note that again from (6), it follows that
$$T(\hat{\beta}_{AL} - \beta_T) = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( T(\hat{\beta} - \beta_T) - T\tilde{\lambda}_T\hat{\beta}^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,T\beta_T = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( Z_T^c - \frac{\lambda_T}{2\zeta_{vv,T}}(Z_T^c + \beta_{0,T})^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\beta_{0,T} = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( Z_T^c - \frac{1}{2\zeta_{vv,T}}\left(\lambda_T^{-1} Z_T^c + \bar{\beta}_{0,T}\right)^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\beta_{0,T},$$
where the last two expressions prove (A.3) and (A.4), respectively. Finally, (c) can be shown by appropriately scaling (A.4).

For the proofs using Lemma A.1, note that $Z_T^c \Rightarrow Z^c$ and $\zeta_{vv,T} \Rightarrow \zeta_{vv}^c$, as well as $\beta_{0,T} \to \beta_0$, $\tilde{\beta}_{0,T} \to \tilde{\beta}_0$, and $\bar{\beta}_{0,T} \to \bar{\beta}_0$ in $\overline{\mathbb{R}}$, the latter three in the notation of Theorems 1–3.
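The closed-form events in Lemma A.1 are straightforward to verify numerically. The following sketch (with an illustrative local-to-unity DGP and tuning rate, not the paper's simulation design) implements the univariate adaptive LASSO with weight exponent $\gamma = 1$: the estimator soft-thresholds the OLS slope at $\sqrt{\tilde{\lambda}_T}$ with $\tilde{\lambda}_T = \lambda_T/(2\sum_t x_t^2)$, which is exactly the zero event in (A.1), and the closed form is cross-checked against a grid search on the penalized objective.

```python
import numpy as np

def adaptive_lasso_1d(y, x, lam):
    """Closed-form univariate adaptive LASSO (weight exponent gamma = 1).

    Minimizes sum_t (y_t - b x_t)^2 + lam * |b| / |beta_ols|.  With
    lam_tilde = lam / (2 * sum_t x_t^2), the solution soft-thresholds the
    OLS slope: 0 if beta_ols^2 <= lam_tilde, else beta_ols - lam_tilde/beta_ols.
    """
    sxx = float(x @ x)
    beta_ols = float(x @ y) / sxx
    lam_tilde = lam / (2.0 * sxx)
    if beta_ols ** 2 <= lam_tilde:
        return 0.0, beta_ols
    return beta_ols - lam_tilde / beta_ols, beta_ols

# illustrative DGP: local-to-unity regressor x_t = (1 + c/T) x_{t-1} + v_t
rng = np.random.default_rng(0)
T, c, beta = 200, -5.0, 0.05
x = np.zeros(T)
for t in range(1, T):
    x[t] = (1.0 + c / T) * x[t - 1] + rng.standard_normal()
y = beta * x + rng.standard_normal(T)

lam = T ** 0.5                      # one tuning rate used for illustration only
b_al, b_ols = adaptive_lasso_1d(y, x, lam)

# cross-check against a fine grid search on the penalized objective
sxx, sxy = float(x @ x), float(x @ y)
grid = np.linspace(b_ols - 1.0, b_ols + 1.0, 400001)
obj = sxx * grid ** 2 - 2.0 * sxy * grid + lam * np.abs(grid) / abs(b_ols)
b_grid = float(grid[np.argmin(obj)])
print(b_al, b_grid)
```

Since the objective is strictly convex in $b$, the grid minimizer must agree with the closed form up to the grid resolution.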
For the proofs in both the univariate and the multivariate case, we repeatedly use the joint convergence
$$\left( T^{-2}\sum_{t=1}^{T} x_t x_t',\; T^{-1}\sum_{t=1}^{T} x_t u_t \right) \Rightarrow \left( \zeta_{vv}^c,\; \int_0^1 J_v^c(r)\,dB_u(r) + \Delta_{vu} \right),$$
see Phillips (1988, Lemma 3.1), together with the continuous mapping theorem, without stating it explicitly. For the unit root case, see Park and Phillips (1988, Lemma 2.1).

B Proofs for Section 3.1

Proof of Proposition 1. To prove (a), note that, just like in the proof of Lemma A.1, we have
$$P(\hat{\beta}_{AL} = 0) = P\left( |\hat{\beta}| \le \sqrt{\tilde{\lambda}_T} \right) = P\left( \left|(\hat{\beta} - \beta) + \beta\right| \le \sqrt{\frac{T^{-2}\lambda_T}{2\,T^{-2}\sum_{t=1}^{T} x_t^2}} \right).$$
The result therefore follows from observing that $\hat{\beta} - \beta = o_P(1)$, $\beta \ne 0$, $\left(T^{-2}\sum_{t=1}^{T} x_t^2\right)^{-1} = O_P(1)$, and $T^{-2}\lambda_T \to 0$.

To show (b), use (A.1) in Lemma A.1 with $\beta_T \equiv 0$ to see that
$$P(\hat{\beta}_{AL} = 0) \to P\left( (\zeta_{vv}^c)^{1/2}|Z^c| \le \sqrt{\lim_{T\to\infty}\lambda_T/2} \right).$$
For (b1), observe that the limiting probability in the above display is strictly less than 1, since $\lambda_T \to \lambda_0 < \infty$ in this case and the distribution of $Z^c$ has unbounded support. For (b2), note that $\lambda_T \to \infty$ in this case and that $(\zeta_{vv}^c)^{1/2}|Z^c|$ is bounded in probability.

Proof of Proposition 2. Part (a) is a special case of Theorem 2(a). Part (b) follows directly from Proposition 3((a) and (b1)), and (c) is a direct consequence of Proposition 3(b2).

Proof of Proposition 3. Parts (a1) and (a2) are special cases of Theorem 3(a), with $|\beta_0| = \infty$ and $\beta_0 = 0$, respectively. To prove (b1), note that from (6), we can deduce that
$$T(\hat{\beta}_{AL} - \beta) = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( T(\hat{\beta} - \beta) - T\tilde{\lambda}_T\hat{\beta}^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,T\beta. \qquad (B.6)$$
In case $\beta \ne 0$, we get from Proposition 1(a) that $\mathbb{1}\{\hat{\beta}_{AL} \ne 0\} \xrightarrow{p} 1$ as well as $\mathbb{1}\{\hat{\beta}_{AL} = 0\}\,T\beta = o_P(1)$, where the latter holds since
$$P\left( \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,|T\beta| > \varepsilon \right) = \mathbb{1}\{|T\beta| > \varepsilon\}\,P(\hat{\beta}_{AL} = 0) \to 0.$$
Thus, using estimation consistency of $\hat{\beta}$, $T(\hat{\beta} - \beta) \Rightarrow Z^c$, and
$$T\tilde{\lambda}_T = \frac{T^{-1}\lambda_T}{2\,T^{-2}\sum_{t=1}^{T} x_t^2} \Rightarrow \frac{\tilde{\lambda}_0}{2\zeta_{vv}^c},$$
it follows that
$$T(\hat{\beta}_{AL} - \beta) \Rightarrow Z^c - (\zeta_{vv}^c)^{-1}\frac{\tilde{\lambda}_0}{2}\beta^{-1}.$$
In case $\beta = 0$, (B.6) reduces to
$$T(\hat{\beta}_{AL} - \beta) = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( T(\hat{\beta} - \beta) - T\tilde{\lambda}_T\hat{\beta}^{-1} \right),$$
which is $o_P(1)$ since
$$P\left( \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left|T(\hat{\beta} - \beta) - T\tilde{\lambda}_T\hat{\beta}^{-1}\right| > \varepsilon \right) \le P\left( \mathbb{1}\{\hat{\beta}_{AL} \ne 0\} = 1 \right) = P(\hat{\beta}_{AL} \ne 0) \to 0$$
by Proposition 1(b2).

The proof of (b2) is similar. First, again from (6), we get that
$$\lambda_T^{-1}T^2(\hat{\beta}_{AL} - \beta) = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( \lambda_T^{-1}T^2(\hat{\beta} - \beta) - \lambda_T^{-1}T^2\tilde{\lambda}_T\hat{\beta}^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\lambda_T^{-1}T^2\beta. \qquad (B.7)$$
As before, in case $\beta \ne 0$, we get that $\mathbb{1}\{\hat{\beta}_{AL} \ne 0\} \xrightarrow{p} 1$ and $\mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\lambda_T^{-1}T^2\beta = o_P(1)$. Thus, from estimation consistency of $\hat{\beta}$, $T(\hat{\beta} - \beta) = O_P(1)$, $\lambda_T^{-1}T \to 0$, and
$$\lambda_T^{-1}T^2\tilde{\lambda}_T = \frac{1}{2\,T^{-2}\sum_{t=1}^{T} x_t^2} \Rightarrow \frac{1}{2\zeta_{vv}^c},$$
it follows that
$$\lambda_T^{-1}T^2(\hat{\beta}_{AL} - \beta) \Rightarrow -(\zeta_{vv}^c)^{-1}\frac{1}{2}\beta^{-1}.$$
In case $\beta = 0$, (B.7) reduces to
$$\lambda_T^{-1}T^2(\hat{\beta}_{AL} - \beta) = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( \lambda_T^{-1}T^2(\hat{\beta} - \beta) - \lambda_T^{-1}T^2\tilde{\lambda}_T\hat{\beta}^{-1} \right),$$
which is $o_P(1)$ by the same arguments as above.

Proof of Corollary 1. The corollary follows directly from (a) and (b2) in Proposition 1 and from Proposition 3(b1).

C Proofs for Section 3.2

Proof of Theorem 1. Part (a) immediately follows from (A.1) in Lemma A.1(a) after letting $Z_T^c$, $\zeta_{vv,T}$, $\beta_{0,T}$, and $\lambda_T$ settle at their (weak) limits $Z^c$, $\zeta_{vv}^c$, $\beta_0$, and $\lambda_0$, respectively. Also note that since $Z^c$ has a distribution with unbounded support, the limiting probability is smaller than one for all $0 \le \lambda_0 < \infty$ and $\beta_0 \in \mathbb{R}$.
Similarly, (b) can be deduced from (A.2) in Lemma A.1(a) in a straightforward manner after letting $\zeta_{vv,T}$ and $\tilde{\beta}_{0,T}$ settle at their (weak) limits $\zeta_{vv}^c$ and $\tilde{\beta}_0$, respectively, and noting that $\lambda_T^{-1/2} Z_T^c = o_P(1)$.

Proof of Theorem 2. To show part (a), observe that (6) implies
$$\left|\hat{\beta}_{AL} - \beta_T\right| \le \mathbb{1}\left\{|\hat{\beta}| > \sqrt{\tilde{\lambda}_T}\right\}\left|\hat{\beta} - \tilde{\lambda}_T\hat{\beta}^{-1} - \beta_T\right| + \mathbb{1}\left\{|\hat{\beta}| \le \sqrt{\tilde{\lambda}_T}\right\}|\beta_T|.$$
The first term on the right-hand side of the above display is bounded by
$$\mathbb{1}\left\{|\hat{\beta}| > \sqrt{\tilde{\lambda}_T}\right\}\left( \left|\hat{\beta} - \beta_T\right| + \left|\tilde{\lambda}_T\hat{\beta}^{-1}\right| \right) \le o_P(1) + \sqrt{\tilde{\lambda}_T} = o_P(1),$$
where the last equality follows from the fact that $\tilde{\lambda}_T = 0.5\,T^{-2}\lambda_T\left(T^{-2}\sum_{t=1}^{T} x_t^2\right)^{-1} = o_P(1)$, since $\left(T^{-2}\sum_{t=1}^{T} x_t^2\right)^{-1} = O_P(1)$ and $T^{-2}\lambda_T \to 0$. For the second term, we have
$$P\left( \mathbb{1}\left\{|\hat{\beta}| \le \sqrt{\tilde{\lambda}_T}\right\}|\beta_T| > \varepsilon \right) = \mathbb{1}\{|\beta_T| > \varepsilon\}\,P\left( \left|\hat{\beta} - \beta_T + \beta_T\right| \le \sqrt{\tilde{\lambda}_T} \right) \to 0,$$
as $\hat{\beta} - \beta_T = o_P(1)$, $|\beta_T| > \varepsilon$ if $\mathbb{1}\{|\beta_T| > \varepsilon\} = 1$, and $\tilde{\lambda}_T = o_P(1)$. Hence, $\mathbb{1}\{|\hat{\beta}| \le \sqrt{\tilde{\lambda}_T}\}|\beta_T| = o_P(1)$ as well, and we can conclude that $|\hat{\beta}_{AL} - \beta_T| = o_P(1)$.

Part (b) follows from Theorem 3(a), since the limiting distribution is stochastically bounded. For this, note that
$$\mathbb{1}\left\{ (\zeta_{vv}^c)^{1/2}|Z^c + \beta_0| \le \sqrt{\lambda_0/2} \right\} = 0$$
for $|\beta_0| = \infty$, since $\lambda_0 < \infty$. Part (c) follows directly from Theorem 3(b).

Proof of Theorem 3. To show (a), note that (A.3) in Lemma A.1(b) states
$$T(\hat{\beta}_{AL} - \beta_T) = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( Z_T^c - \frac{\lambda_T}{2\zeta_{vv,T}}(Z_T^c + \beta_{0,T})^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\beta_{0,T}.$$
Observing that
$$\mathbb{1}\{\hat{\beta}_{AL} = 0\} = \mathbb{1}\left\{ \zeta_{vv,T}^{1/2}|Z_T^c + \beta_{0,T}| \le \sqrt{\lambda_T/2} \right\}$$
by (A.1) in Lemma A.1(a), and letting $\zeta_{vv,T}$, $Z_T^c$, $\beta_{0,T}$, and $\lambda_T$ settle at their respective (weak) limits yields the desired result.
To show (b), note that (A.5) in Lemma A.1(c) states
$$\lambda_T^{-1/2}T(\hat{\beta}_{AL} - \beta_T) = \mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( \lambda_T^{-1/2}Z_T^c - \frac{1}{2\zeta_{vv,T}}\left(\lambda_T^{-1/2}Z_T^c + \tilde{\beta}_{0,T}\right)^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\tilde{\beta}_{0,T}.$$
Observing that
$$\mathbb{1}\{\hat{\beta}_{AL} = 0\} = \mathbb{1}\left\{ \zeta_{vv,T}^{1/2} \le \frac{1}{\sqrt{2}}\left|\lambda_T^{-1/2}Z_T^c + \tilde{\beta}_{0,T}\right|^{-1} \right\}$$
by (A.2) in Lemma A.1(a), letting $\zeta_{vv,T}$ and $\tilde{\beta}_{0,T}$ settle at their respective (weak) limits, and recalling that $\lambda_T^{-1/2}Z_T^c = o_P(1)$ immediately yields the desired result when $\tilde{\beta}_0 \in \mathbb{R}$. If $|\tilde{\beta}_0| = \infty$, the fact that $P(\hat{\beta}_{AL} = 0) \to 0$ ensures $\mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\tilde{\beta}_{0,T} = o_P(1)$, which proves the claim for this case in a straightforward manner as well.

Proof of Remark 4. By (A.4) in Lemma A.1(b), $T(\hat{\beta}_{AL} - \beta_T)$ is given by
$$\mathbb{1}\{\hat{\beta}_{AL} \ne 0\}\left( Z_T^c - \frac{1}{2\zeta_{vv,T}}\left(\lambda_T^{-1}Z_T^c + \bar{\beta}_{0,T}\right)^{-1} \right) - \mathbb{1}\{\hat{\beta}_{AL} = 0\}\,\beta_{0,T}.$$
By Theorem 1(b), $P(\hat{\beta}_{AL} = 0)$ converges to one for $\tilde{\beta}_0 = 0$, to $0 < p < 1$ for $0 < |\tilde{\beta}_0| < \infty$, and to zero for $|\tilde{\beta}_0| = \infty$. Hence, for $\tilde{\beta}_0 = 0$, the first summand in the above display is $o_P(1)$. Consequently, $T(\hat{\beta}_{AL} - \beta_T) \Rightarrow -\beta_0$. In case $0 < |\tilde{\beta}_0| < \infty$, note that necessarily $\beta_0 = \operatorname{sign}(\tilde{\beta}_0)\infty$ and $\bar{\beta}_0 = 0$ hold. Since $\lambda_T^{-1}Z_T^c = o_P(1)$, the terms next to both indicator functions in the above display tend to $-\operatorname{sign}(\tilde{\beta}_0)\infty$, allowing us to deduce that altogether $T(\hat{\beta}_{AL} - \beta_T) \Rightarrow -\operatorname{sign}(\tilde{\beta}_0)\infty$. Finally, if $|\tilde{\beta}_0| = \infty$, the second summand in the above display is $o_P(1)$, whereas in the first summand, again, $\lambda_T^{-1}Z_T^c = o_P(1)$, so that $T(\hat{\beta}_{AL} - \beta_T) \Rightarrow Z^c - \frac{1}{2}(\zeta_{vv}^c\bar{\beta}_0)^{-1}$.

D Proofs for Section 3.3

For the proofs in the multivariate case, we introduce some additional notation. Let $y := [y_1, \ldots, y_T]' \in \mathbb{R}^T$, $u := [u_1, \ldots, u_T]' \in \mathbb{R}^T$, $X_j := [x_{1,j}, \ldots, x_{T,j}]' \in \mathbb{R}^T$, and $X := [X_1, \ldots, X_k] \in \mathbb{R}^{T\times k}$, with $x_{t,j}$ denoting the $j$-th element of $x_t$.
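The multivariate objective can be made concrete with a small numerical sketch. The coordinate-descent solver below is a generic implementation with an illustrative DGP and an arbitrary tuning choice (not the authors' code); it fits the adaptive LASSO with OLS-based weights and evaluates the quadratic form $(\hat{\beta}_{AL} - \hat{\beta})'X'X(\hat{\beta}_{AL} - \hat{\beta})$, which Lemma D.2 below bounds by $k\lambda_T/2$ surely.

```python
import numpy as np

def adaptive_lasso_cd(y, X, lam, n_sweeps=1000):
    """Coordinate-descent sketch of the adaptive LASSO (gamma = 1).

    Minimizes ||y - X b||^2 + lam * sum_j |b_j| / |b_ols_j| with OLS-based
    weights; illustrative only, assumes X'X is invertible.
    """
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    thresh = lam / (2.0 * np.abs(b_ols))     # soft threshold per coordinate
    col_ss = (X ** 2).sum(axis=0)
    b = b_ols.copy()
    r = y - X @ b
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r += X[:, j] * b[j]              # partial residual excluding x_j
            z = float(X[:, j] @ r)
            b[j] = np.sign(z) * max(abs(z) - thresh[j], 0.0) / col_ss[j]
            r -= X[:, j] * b[j]
    return b, b_ols

# illustrative DGP: k = 3 independent near-unit-root regressors
rng = np.random.default_rng(1)
T, k = 300, 3
cs = np.array([0.0, -2.0, -10.0])            # local-to-unity parameters (made up)
X = np.zeros((T, k))
for t in range(1, T):
    X[t] = (1.0 + cs / T) * X[t - 1] + rng.standard_normal(k)
beta = np.array([0.1, 0.0, 0.02])
y = X @ beta + rng.standard_normal(T)

lam = T ** 0.5                               # arbitrary tuning choice
b_al, b_ols = adaptive_lasso_cd(y, X, lam)
d = b_al - b_ols
terms = d * (X.T @ X @ d)                    # summands appearing in Lemma D.2
print(terms.sum(), k * lam / 2.0)
```

At the exact minimizer, each summand is at most $\lambda_T/2$, so the total stays below $k\lambda_T/2$ up to the solver's numerical tolerance.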
Proof of Theorem 4. To prove (a), note that, clearly, the event $\{\hat{\beta}_{AL,j} = 0\}$ is a subset of the event $\{2|\hat{\beta}_{AL,j} - \beta_{T,j}| > |\beta_{T,j}|\}$. Hence,
$$P(\hat{\beta}_{AL,j} = 0) \le P\left( 2\left|T(\hat{\beta}_{AL,j} - \beta_{T,j})\right| > |\beta_{0,T,j}| \right) \to 0,$$
since $T(\hat{\beta}_{AL,j} - \beta_{T,j}) = O_P(1)$ by Theorem 5(b) and $|\beta_{0,T,j}| \to |\beta_{0,j}| = \infty$.

For (b), we first show $P(\hat{\beta}_{AL,j} = 0) \to 1$ if $\tilde{\beta}_{0,j} = 0$. It follows from the Karush-Kuhn-Tucker optimality conditions that when $\hat{\beta}_{AL,j} \ne 0$, we have
$$2X_j'\left( y - X\hat{\beta}_{AL} \right) = \lambda_T\frac{1}{|\hat{\beta}_j|}\operatorname{sign}(\hat{\beta}_{AL,j}).$$
Multiplying both sides of the above display by $\lambda_T^{-1/2}T^{-1}$ yields
$$2\lambda_T^{-1/2}T^{-1}X_j'\left( y - X\hat{\beta}_{AL} \right) = \frac{1}{\lambda_T^{-1/2}T|\hat{\beta}_j|}\operatorname{sign}(\hat{\beta}_{AL,j}). \qquad (D.8)$$
Using Assumption 1, we can rewrite the left-hand side of (D.8) as
$$2\left[ \lambda_T^{-1/2}T^{-1}X_j'u - (T^{-2}X_j'X)\,\lambda_T^{-1/2}T(\hat{\beta}_{AL} - \beta_T) \right] = O_P(1),$$
since $T^{-1}X_j'u = O_P(1)$, $T^{-2}X_j'X = O_P(1)$, and $\lambda_T^{-1/2}T(\hat{\beta}_{AL} - \beta_T) = O_P(1)$ by Theorem 5(c). However, for the absolute value of the right-hand side of (D.8), we get that
$$\frac{1}{\lambda_T^{-1/2}T|\hat{\beta}_j|} = \frac{1}{\left|\lambda_T^{-1/2}Z_{T,j}^c + \tilde{\beta}_{0,T,j}\right|} \Rightarrow \frac{1}{|\tilde{\beta}_{0,j}|},$$
which is unbounded if $\tilde{\beta}_{0,j} = 0$. Hence, $P(\hat{\beta}_{AL,j} \ne 0) \to 0$ if $\tilde{\beta}_{0,j} = 0$.

To complete the proof, it remains to show that $P(\hat{\beta}_{AL,j} = 0) \to 0$ if $|\tilde{\beta}_{0,j}| = \infty$. Similar arguments as used to prove (a) yield
$$P(\hat{\beta}_{AL,j} = 0) \le P\left( 2\left|\lambda_T^{-1/2}T(\hat{\beta}_{AL,j} - \beta_{T,j})\right| > |\tilde{\beta}_{0,T,j}| \right) \to 0,$$
since $\lambda_T^{-1/2}T(\hat{\beta}_{AL,j} - \beta_{T,j}) = O_P(1)$ by Theorem 5(c) and $|\tilde{\beta}_{0,T,j}| \to |\tilde{\beta}_{0,j}| = \infty$.

Proof of Remark 6. To prove (a), define $\mathcal{A}_0 := \{j : \beta_{0,j} \ne 0\}$ and $\mathcal{A}_0^c := \{j : \beta_{0,j} = 0\}$. We assume, without loss of generality, that the elements of $\beta_0$ are ordered such that the non-zero elements come first.
We can therefore partition $\beta_0 = [\beta_{0,\mathcal{A}_0}', \beta_{0,\mathcal{A}_0^c}']'$, $\hat{\beta} = [\hat{\beta}_{\mathcal{A}_0}', \hat{\beta}_{\mathcal{A}_0^c}']'$, $\hat{\beta}_{AL} = [\hat{\beta}_{AL,\mathcal{A}_0}', \hat{\beta}_{AL,\mathcal{A}_0^c}']'$, and $X = [X_{\mathcal{A}_0}, X_{\mathcal{A}_0^c}]$. In generic notation, for index sets $\mathcal{I}$ and $\mathcal{J}$, a vector $V$, and a matrix $M$, we define $D_{\mathcal{I}}(V) := \operatorname{diag}(|V_j| : j \in \mathcal{I})$ and let $M[\mathcal{I},\mathcal{J}]$ be the $|\mathcal{I}|\times|\mathcal{J}|$ matrix containing only the elements of $M$ with row indices in $\mathcal{I}$ and column indices in $\mathcal{J}$.

For $j \in \mathcal{A}_0^c$, we clearly have that
$$P(\hat{\beta}_{AL,j} = 0) \ge P\left( \hat{\beta}_{AL,\mathcal{A}_0^c} = 0,\; \hat{\beta}_{AL,i} \ne 0 \text{ for all } i \in \mathcal{A}_0 \right).$$
The corresponding Karush-Kuhn-Tucker optimality conditions can be written as
$$X_{\mathcal{A}_0}'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) = -\frac{\lambda_T}{2}D_{\mathcal{A}_0}^{-1}(\hat{\beta})\operatorname{sign}(\hat{\beta}_{AL,\mathcal{A}_0}) \qquad (D.9)$$
$$\left\| D_{\mathcal{A}_0^c}(\hat{\beta})\,X_{\mathcal{A}_0^c}'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \right\|_\infty \le \frac{\lambda_T}{2}. \qquad (D.10)$$
Since $\hat{\beta}_{AL,\mathcal{A}_0^c} = 0$, we can rewrite (D.9) and (D.10) as
$$\hat{\beta}_{AL,\mathcal{A}_0} = \hat{\beta}_{\mathcal{A}_0} + \left( X_{\mathcal{A}_0}'X_{\mathcal{A}_0} \right)^{-1}\left[ X_{\mathcal{A}_0}'X_{\mathcal{A}_0^c}\hat{\beta}_{\mathcal{A}_0^c} - \frac{\lambda_T}{2}D_{\mathcal{A}_0}^{-1}(\hat{\beta})\operatorname{sign}(\hat{\beta}_{AL,\mathcal{A}_0}) \right] \qquad (D.11)$$
$$\left\| D_{\mathcal{A}_0^c}(\hat{\beta})\,X_{\mathcal{A}_0^c}'\left[ X_{\mathcal{A}_0}\hat{\beta}_{\mathcal{A}_0} + X_{\mathcal{A}_0^c}\hat{\beta}_{\mathcal{A}_0^c} - X_{\mathcal{A}_0}\hat{\beta}_{AL,\mathcal{A}_0} \right] \right\|_\infty \le \frac{\lambda_T}{2}. \qquad (D.12)$$
Plugging (D.11) into (D.12) and rearranging leads to
$$\left\| D_{\mathcal{A}_0^c}(\hat{\beta})\left[ \left( (X'X)^{-1}[\mathcal{A}_0^c,\mathcal{A}_0^c] \right)^{-1}\hat{\beta}_{\mathcal{A}_0^c} + \frac{\lambda_T}{2}X_{\mathcal{A}_0^c}'X_{\mathcal{A}_0}\left( X_{\mathcal{A}_0}'X_{\mathcal{A}_0} \right)^{-1}D_{\mathcal{A}_0}^{-1}(\hat{\beta})\operatorname{sign}(\hat{\beta}_{AL,\mathcal{A}_0}) \right] \right\|_\infty \le \frac{\lambda_T}{2},$$
where $(X'X)^{-1}[\mathcal{A}_0^c,\mathcal{A}_0^c]$ denotes the bottom-right block of $(X'X)^{-1}$. Hence,
$$\liminf_{T\to\infty} P(\hat{\beta}_{AL,j} = 0) \ge P\left( \left\| D_{\mathcal{A}_0^c}(Z^c)\left[ \left( (\zeta_{vv}^c)^{-1}[\mathcal{A}_0^c,\mathcal{A}_0^c] \right)^{-1}Z_{\mathcal{A}_0^c}^c + \frac{\lambda_0}{2}\zeta_{vv}^c[\mathcal{A}_0^c,\mathcal{A}_0]\left( \zeta_{vv}^c[\mathcal{A}_0,\mathcal{A}_0] \right)^{-1}D_{\mathcal{A}_0}^{-1}(Z^c + \beta_0)\operatorname{sign}(\beta_{0,\mathcal{A}_0}) \right] \right\|_\infty \le \frac{\lambda_0}{2} \right) > 0,$$
as the random variable on the left-hand side of the above display has support $(0,\infty)$.
To prove (b), note that $P(\hat{\mathcal{A}} = \mathcal{A}) = P\left( \hat{\beta}_{AL,\mathcal{A}^c} = 0,\; \hat{\beta}_{AL,i} \ne 0 \text{ for all } i \in \mathcal{A} \right)$ and, for $T\beta_T = T\beta \to \beta_0 \in \overline{\mathbb{R}}^k$, we have $\beta_{0,\mathcal{A}} = \operatorname{sign}(\beta_{\mathcal{A}})\infty$ and $\beta_{0,\mathcal{A}^c} = 0$, with $\mathcal{A}^c := \{j : \beta_j = 0\}$. Therefore, similar calculations as above yield
$$\limsup_{T\to\infty} P(\hat{\mathcal{A}} = \mathcal{A}) \le P\left( \left\| D_{\mathcal{A}^c}(Z^c)\left( (\zeta_{vv}^c)^{-1}[\mathcal{A}^c,\mathcal{A}^c] \right)^{-1}Z_{\mathcal{A}^c}^c \right\|_\infty \le \frac{\lambda_0}{2} \right) < 1,$$
as the random variable on the left-hand side of the above display has support $(0,\infty)$.

To prove Theorem 5, we first need the following lemma.

Lemma D.2. We have
$$\left( \hat{\beta}_{AL} - \hat{\beta} \right)'(X'X)\left( \hat{\beta}_{AL} - \hat{\beta} \right) = \sum_{j=1}^{k}\left( \hat{\beta}_{AL} - \hat{\beta} \right)_j\left( X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \right)_j \le \frac{k\lambda_T}{2},$$
where the inequality holds surely, i.e., for all $\omega$ in the sample space of the underlying probability space.

Proof. The proof is similar to that of Lemma 1 in Amann and Schneider (2023). The starting point are the Karush-Kuhn-Tucker optimality conditions, which can be written as
$$\left( X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \right)_j = -\frac{\lambda_T}{2|\hat{\beta}_j|}\operatorname{sign}(\hat{\beta}_{AL,j}) \quad \text{if } \hat{\beta}_{AL,j} \ne 0, \qquad (D.13)$$
$$\left| \left( X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \right)_j \right| \le \frac{\lambda_T}{2|\hat{\beta}_j|} \quad \text{if } \hat{\beta}_{AL,j} = 0, \qquad (D.14)$$
using $X'\hat{u} = 0$, where $\hat{u} := y - X\hat{\beta}$ denotes the vector of OLS residuals. When $\hat{\beta}_{AL,j} = 0$, (D.14) yields
$$\left| \left( \hat{\beta}_{AL} - \hat{\beta} \right)_j\left( X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \right)_j \right| \le \frac{\lambda_T}{2}.$$
We now consider the case $\hat{\beta}_{AL,j} \ne 0$. If $|\hat{\beta}_{AL,j} - \hat{\beta}_j| \le |\hat{\beta}_j|$, (D.13) implies that
$$\left| \left( \hat{\beta}_{AL} - \hat{\beta} \right)_j\left( X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \right)_j \right| \le |\hat{\beta}_j|\left| \left( X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \right)_j \right| = \frac{\lambda_T}{2}.$$
On the other hand, if $|\hat{\beta}_{AL,j} - \hat{\beta}_j| > |\hat{\beta}_j|$, we have $\operatorname{sign}(\hat{\beta}_{AL,j} - \hat{\beta}_j) = \operatorname{sign}(\hat{\beta}_{AL,j}) \ne 0$.
Therefore, since $\lambda_T > 0$,
$$\left( \hat{\beta}_{AL} - \hat{\beta} \right)_j\left( X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \right)_j = -\frac{\lambda_T}{2|\hat{\beta}_j|}\operatorname{sign}(\hat{\beta}_{AL,j})(\hat{\beta}_{AL,j} - \hat{\beta}_j) = -\frac{\lambda_T}{2|\hat{\beta}_j|}\operatorname{sign}(\hat{\beta}_{AL,j} - \hat{\beta}_j)(\hat{\beta}_{AL,j} - \hat{\beta}_j) = -\frac{\lambda_T|\hat{\beta}_{AL,j} - \hat{\beta}_j|}{2|\hat{\beta}_j|} \le 0 \le \frac{\lambda_T}{2},$$
which completes the proof.

Proof of Theorem 5. Part (a) follows directly from (b) and (c). To prove these parts, let $\mu_{\min,T}$ denote the smallest eigenvalue of $T^{-2}X'X = T^{-2}\sum_{t=1}^{T} x_t x_t'$. The continuity of eigenvalues (Saikkonen and Choi, 2004, Proof of Lemma 5) and the fact that the limit of $T^{-2}\sum_{t=1}^{T} x_t x_t'$ is positive definite a.s. by Assumption 1 ensure that $\mu_{\min,T}^{-1} = O_P(1)$.

For (b), first note that
$$T\left( \hat{\beta}_{AL} - \beta_T \right) = T\left( \hat{\beta}_{AL} - \hat{\beta} \right) + T\left( \hat{\beta} - \beta_T \right) = T\left( \hat{\beta}_{AL} - \hat{\beta} \right) + O_P(1).$$
It thus remains to show that $T(\hat{\beta}_{AL} - \hat{\beta}) = O_P(1)$, which follows from $\mu_{\min,T}^{-1} = O_P(1)$ since
$$T^2\left\| \hat{\beta}_{AL} - \hat{\beta} \right\|^2 = T^2\left( \hat{\beta}_{AL} - \hat{\beta} \right)'\left( \hat{\beta}_{AL} - \hat{\beta} \right) \le \frac{1}{\mu_{\min,T}}\left( \hat{\beta}_{AL} - \hat{\beta} \right)'X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \le \frac{k\lambda_T}{2\mu_{\min,T}} = O_P(1),$$
where the last inequality follows from Lemma D.2.

For (c), analogously to the proof of (b), we have
$$\lambda_T^{-1/2}T\left( \hat{\beta}_{AL} - \beta_T \right) = \lambda_T^{-1/2}T\left( \hat{\beta}_{AL} - \hat{\beta} \right) + \lambda_T^{-1/2}T\left( \hat{\beta} - \beta_T \right) = \lambda_T^{-1/2}T\left( \hat{\beta}_{AL} - \hat{\beta} \right) + o_P(1).$$
It therefore remains to show that $\lambda_T^{-1/2}T(\hat{\beta}_{AL} - \hat{\beta}) = O_P(1)$, which again follows from $\mu_{\min,T}^{-1} = O_P(1)$ since
$$\lambda_T^{-1}T^2\left\| \hat{\beta}_{AL} - \hat{\beta} \right\|^2 = \lambda_T^{-1}T^2\left( \hat{\beta}_{AL} - \hat{\beta} \right)'\left( \hat{\beta}_{AL} - \hat{\beta} \right) \le \frac{\lambda_T^{-1}}{\mu_{\min,T}}\left( \hat{\beta}_{AL} - \hat{\beta} \right)'X'X\left( \hat{\beta}_{AL} - \hat{\beta} \right) \le \frac{k}{2\mu_{\min,T}} = O_P(1),$$
where the last inequality again follows from Lemma D.2.

Proof of Theorem 6.
For (a), note the following: since $\hat{\beta}_{AL}$ is the solution to the minimization problem in (4) (with $\gamma = 1$ and $\hat{\beta}_0 = \hat{\beta}$), $T(\hat{\beta}_{AL} - \beta_T)$ is the minimizer of
$$\Psi_T(z) := \sum_{t=1}^{T}\left( y_t - x_t'(\beta_T + T^{-1}z) \right)^2 + \lambda_T\sum_{j=1}^{k}\frac{|T^{-1}z_j + \beta_{T,j}|}{|\hat{\beta}_j|}.$$
Therefore, $T(\hat{\beta}_{AL} - \beta_T)$ also minimizes
$$V_T(z) := \Psi_T(z) - \Psi_T(0) = z'\left( T^{-2}\sum_{t=1}^{T} x_t x_t' \right)z - 2z'\left( T^{-1}\sum_{t=1}^{T} x_t u_t \right) + \lambda_T\sum_{j=1}^{k}\frac{|T^{-1}z_j + \beta_{T,j}| - |\beta_{T,j}|}{|\hat{\beta}_j|} = z'\left( T^{-2}\sum_{t=1}^{T} x_t x_t' \right)z - 2z'\left( T^{-1}\sum_{t=1}^{T} x_t u_t \right) + \lambda_T\sum_{j=1}^{k}\frac{|z_j + \beta_{0,T,j}| - |\beta_{0,T,j}|}{|Z_{T,j}^c + \beta_{0,T,j}|}.$$
Under Assumption 1, we get that $V_T(z) \Rightarrow V_{\beta_0}^c(z)$ for all $z \in \mathbb{R}^k$. Using Proposition 2.2 and Theorem 3.2 in Geyer (1996), we may deduce that the minimizer of $V_T(z)$ also converges weakly to the minimizer of $V_{\beta_0}^c(z)$, i.e., $T(\hat{\beta}_{AL} - \beta_T) \Rightarrow \operatorname{argmin}_{z\in\mathbb{R}^k} V_{\beta_0}^c(z)$.

For (b), analogously to the proof of (a), we get that $\lambda_T^{-1/2}T(\hat{\beta}_{AL} - \beta_T)$ minimizes
$$\tilde{\Psi}_T(z) := \sum_{t=1}^{T}\left( y_t - x_t'(\beta_T + \lambda_T^{1/2}T^{-1}z) \right)^2 + \lambda_T\sum_{j=1}^{k}\frac{|\lambda_T^{1/2}T^{-1}z_j + \beta_{T,j}|}{|\hat{\beta}_j|}.$$
Therefore, $\lambda_T^{-1/2}T(\hat{\beta}_{AL} - \beta_T)$ also minimizes
$$\tilde{V}_T(z) := \lambda_T^{-1}\left( \tilde{\Psi}_T(z) - \tilde{\Psi}_T(0) \right) = z'\left( T^{-2}\sum_{t=1}^{T} x_t x_t' \right)z - 2z'\left( \lambda_T^{-1/2}T^{-1}\sum_{t=1}^{T} x_t u_t \right) + \sum_{j=1}^{k}\frac{|\lambda_T^{1/2}T^{-1}z_j + \beta_{T,j}| - |\beta_{T,j}|}{|\hat{\beta}_j|} = z'\left( T^{-2}\sum_{t=1}^{T} x_t x_t' \right)z - 2z'\left( \lambda_T^{-1/2}T^{-1}\sum_{t=1}^{T} x_t u_t \right) + \sum_{j=1}^{k}\frac{|z_j + \tilde{\beta}_{0,T,j}| - |\tilde{\beta}_{0,T,j}|}{|\lambda_T^{-1/2}Z_{T,j}^c + \tilde{\beta}_{0,T,j}|}.$$
Under Assumption 1, we get that $\tilde{V}_T(z) \Rightarrow \tilde{V}_{\tilde{\beta}_0}^c(z)$ for all $z \in \mathbb{R}^k$. However, since $\tilde{V}_{\tilde{\beta}_0}^c(z)$ is not finite on an open subset of $\mathbb{R}^k$, we cannot use the same arguments as in (a) to deduce that the minimizer of $\tilde{V}_T(z)$ also converges weakly to the minimizer of $\tilde{V}_{\tilde{\beta}_0}^c(z)$.
Nevertheless, the result follows from similar arguments as used in Amann and Schneider (2023, Proof of Theorem 7).

For (c), note that we already know from (a) that $T(\hat{\beta}_{AL} - \beta_T)$ minimizes $V_T(z)$, which can also be written as
$$V_T(z) = z'\left( T^{-2}\sum_{t=1}^{T} x_t x_t' \right)z - 2z'\left( T^{-1}\sum_{t=1}^{T} x_t u_t \right) + \sum_{j=1}^{k}\frac{|z_j + \beta_{0,T,j}| - |\beta_{0,T,j}|}{|\lambda_T^{-1}Z_{T,j}^c + \bar{\beta}_{0,T,j}|}.$$
Under Assumption 1, we get that $V_T(z) \Rightarrow \bar{V}_{\bar{\beta}_0}^c(z)$ for all $z \in \mathbb{R}^k$. To derive the limits $\bar{A}_j(z_j, \beta_{0,j}, \bar{\beta}_{0,j})$, a tedious case-by-case analysis is necessary when $\bar{\beta}_{0,j} = 0$ and $z_j \ne 0$. When $0 < |\bar{\beta}_{0,j}| < \infty$ and $z_j \ne 0$ ("otherwise"), note that $\beta_{0,j} = \operatorname{sign}(\bar{\beta}_{0,j})\infty$ must hold. The result then follows from arguments analogous to those used in (b).

Proof of Proposition 4. Clearly, $m_j = 0$ must hold if $\tilde{\beta}_{0,j} = 0$, as the objective function would be infinite otherwise. If $|\tilde{\beta}_{0,j}| = \infty$, or if $0 < |\tilde{\beta}_{0,j}| < \infty$ and $m_j \ne -\tilde{\beta}_{0,j}$, the partial derivative of $\tilde{V}_{\tilde{\beta}_0}$ with respect to $z_j$ exists, and regular first-order conditions yield the required result for these cases. For $0 < |\tilde{\beta}_{0,j}| < \infty$ and $m_j = -\tilde{\beta}_{0,j}$, we make use of the fact that $m$ is a minimizer if and only if zero is a subgradient of $\tilde{V}_{\tilde{\beta}_0}$ at $m$.

Proof of Theorem 7. Let $m = \operatorname{argmin}_{z\in\mathbb{R}^k}\tilde{V}_{\tilde{\beta}_0}^c(z)$. We need to show that $m \in M^c$ as well, i.e., that $m(\omega)_j(\zeta_{vv}^c(\omega)m(\omega))_j \le 1/2$ is satisfied for all $\omega$ and all $j = 1, \ldots, k$. Using Proposition 4, we get $m_j \equiv 0$ if $\tilde{\beta}_{0,j} = 0$, so that the required inequality surely holds. If $|\tilde{\beta}_{0,j}| = \infty$, the same proposition yields $(\zeta_{vv}^c m)_j \equiv 0$, again ensuring that the inequality holds for all $\omega$.
When $0 < |\tilde{\beta}_{0,j}| < \infty$, we consider the following cases. If $m_j(\omega) = -\tilde{\beta}_{0,j}$, Proposition 4 shows that $|(\zeta_{vv}^c(\omega)m(\omega))_j| \le \frac{1}{2|\tilde{\beta}_{0,j}|}$ and therefore
$$m(\omega)_j(\zeta_{vv}^c(\omega)m(\omega))_j \le |m(\omega)_j(\zeta_{vv}^c(\omega)m(\omega))_j| \le \frac{|\tilde{\beta}_{0,j}|}{2|\tilde{\beta}_{0,j}|} = \frac{1}{2}.$$
If $m_j(\omega) \ne -\tilde{\beta}_{0,j}$, again from Proposition 4, we get
$$(\zeta_{vv}^c(\omega)m(\omega))_j = -\frac{\operatorname{sign}(m(\omega)_j + \tilde{\beta}_{0,j})}{2|\tilde{\beta}_{0,j}|}.$$
If $|m(\omega)_j| > |\tilde{\beta}_{0,j}|$, then $\operatorname{sign}(m(\omega)_j + \tilde{\beta}_{0,j}) = \operatorname{sign}(m(\omega)_j)$ and
$$m(\omega)_j(\zeta_{vv}^c(\omega)m(\omega))_j = -\frac{|m(\omega)_j|}{2|\tilde{\beta}_{0,j}|} \le 0 < \frac{1}{2},$$
whereas for $|m(\omega)_j| \le |\tilde{\beta}_{0,j}|$, we have
$$m(\omega)_j(\zeta_{vv}^c(\omega)m(\omega))_j \le |m(\omega)_j(\zeta_{vv}^c(\omega)m(\omega))_j| = \frac{|m(\omega)_j|}{2|\tilde{\beta}_{0,j}|} \le \frac{1}{2}.$$

E Proofs for Section 3.4

Proof of Theorem 8. Let $g_T(\beta) := P_\beta\left( \beta \in \hat{\beta}_{AL} - T^{-1}\lambda_T^{1/2}\widehat{M}_T(\varepsilon) \right)$ and $c_T := \inf_{\beta\in\mathbb{R}^k} g_T(\beta)$. We need to show that $c_T \to 1$ as $T \to \infty$. Since the $c_T$ are the infima of $g_T$, we can choose sequences $(\beta_{T,n})_{n\in\mathbb{N}} \subseteq \mathbb{R}^k$ such that $|c_T - g_T(\beta_{T,n})| \le \frac{1}{n}$ for all $T, n \in \mathbb{N}$. Now define $\breve{\beta}_T := \beta_{T,T}$. Since $|c_T - g_T(\breve{\beta}_T)| \le 1/T = o(1)$ as $T \to \infty$, we can study the limiting behavior of $g_T(\breve{\beta}_T)$ instead of $c_T$. Now let $\tilde{\beta}_0 \in \overline{\mathbb{R}}^k$ be such that $T\lambda_T^{-1/2}\breve{\beta}_T \to \tilde{\beta}_0$.¹³ Define $M^c(\varepsilon) := \{m : m_j(\zeta_{vv}^c m)_j < \frac{1}{2} + \varepsilon\}$ and note that for large enough $T$, we have $M^c(\varepsilon/2) \subseteq \widehat{M}_T(\varepsilon)$ for all $\omega$. We then get
$$1 \ge \limsup_{T\to\infty} g_T(\breve{\beta}_T) \ge \liminf_{T\to\infty} g_T(\breve{\beta}_T) = \liminf_{T\to\infty} P_{\breve{\beta}_T}\left( T\lambda_T^{-1/2}(\hat{\beta}_{AL} - \breve{\beta}_T) \in \widehat{M}_T(\varepsilon) \right) \ge \liminf_{T\to\infty} P_{\breve{\beta}_T}\left( T\lambda_T^{-1/2}(\hat{\beta}_{AL} - \breve{\beta}_T) \in M^c(\varepsilon/2) \right) \ge P_{\tilde{\beta}_0}\left( \operatorname{argmin}_{z\in\mathbb{R}^k}\tilde{V}_{\tilde{\beta}_0}^c(z) \in M^c(\varepsilon/2) \right) \ge P_{\tilde{\beta}_0}\left( \operatorname{argmin}_{z\in\mathbb{R}^k}\tilde{V}_{\tilde{\beta}_0}^c(z) \in M^c \right) = 1,$$
where the second-to-last inequality holds by Theorem 6(b) and the Portmanteau theorem, and the final equality by Theorem 7 together with Remark 8.
We therefore get $\lim_{T\to\infty} c_T = \lim_{T\to\infty} g_T(\breve{\beta}_T) = 1$.

¹³ If this quantity does not converge, simply revert to a convergent subsequence.

F Additional Finite-Sample Results

Figure F.1: Finite-sample distribution of $T(\hat{\beta}_{AL} - \beta_T)$ under consistent tuning (in all rows; $\lambda_T = T^{1/4}$, $T^{1/2}$, $T$ across rows, $T = 25, 50, 100, 250, 1000$ across columns) in case $\beta_T \equiv 0.1\beta$ (labeled "AL"), and limiting distribution from Remark 4 (labeled "Rem.5"). Notes: See notes to Figure 2.

Figure F.2: Finite-sample distribution of $T(\hat{\beta}_{AL} - \beta_T)$ under consistent tuning (in all rows; $\lambda_T = T^{1/4}$, $T^{1/2}$, $T$ across rows, $T = 25, 50, 100, 250, 1000$ across columns) in case $\beta_T = \beta/T^{1/2}$ (labeled "AL"), and limiting distribution from Remark 4 (labeled "Rem.5"). Notes: See notes to Figure 2.
Figure F.3: Finite-sample distribution of $T(\hat{\beta}_{AL} - \beta_T)$ under consistent tuning (in all rows; $\lambda_T = T^{1/4}$, $T^{1/2}$, $T$ across rows, $T = 25, 50, 100, 250, 1000$ across columns) in case $\beta_T = \beta/T$ (labeled "AL"), and limiting distribution from Remark 4 (labeled "Rem.5"). Notes: See notes to Figure 2.

Figure F.4: Finite-sample distribution of $T(\hat{\beta}_{AL} - \beta_T)$ under consistent tuning (in all rows; $\lambda_T = T^{1/4}$, $T^{1/2}$, $T$ across rows, $T = 25, 50, 100, 250, 1000$ across columns) in case $\beta_T = \sqrt{\lambda_T}\,\beta/T$ (labeled "AL"), and limiting distribution from Remark 4 (labeled "Rem.5"). Notes: See notes to Figure 2.

Table F.1: Length of confidence intervals corresponding to Figures 6–7

                            Uniform CI                      Oracle CI
T     λ_T           min.   median  mean   max.          95%     99%
25    T^{1/4}       0.032  0.153   0.172  0.764         0.400   0.627
      T^{1/2}       0.047  0.228   0.257  1.143         0.400   0.627
      T             0.106  0.510   0.574  2.555         0.400   0.627
50    T^{1/4}       0.017  0.084   0.095  0.391         0.200   0.313
      T^{1/2}       0.027  0.138   0.156  0.637         0.200   0.313
      T             0.073  0.366   0.414  1.694         0.200   0.313
100   T^{1/4}       0.010  0.046   0.052  0.188         0.100   0.157
      T^{1/2}       0.018  0.082   0.092  0.334         0.100   0.157
      T             0.057  0.258   0.292  1.055         0.100   0.157
250   T^{1/4}       0.005  0.021   0.023  0.109         0.040   0.063
      T^{1/2}       0.009  0.041   0.047  0.217         0.040   0.063
      T             0.037  0.165   0.186  0.864         0.040   0.063
1000  T^{1/4}       0.001  0.006   0.007  0.028         0.010   0.016
      T^{1/2}       0.003  0.015   0.017  0.066         0.010   0.016
      T             0.017  0.082   0.093  0.371         0.010   0.016
25    4×T^{1/4}     0.063  0.305   0.343  1.528         0.400   0.627
      4×T^{1/2}     0.095  0.457   0.513  2.285         0.400   0.627
      4×T           0.212  1.021   1.148  5.110         0.400   0.627
50    4×T^{1/4}     0.034  0.169   0.191  0.781         0.200   0.313
      4×T^{1/2}     0.055  0.275   0.311  1.274         0.200   0.313
      4×T           0.146  0.732   0.827  3.388         0.200   0.313
100   4×T^{1/4}     0.020  0.092   0.104  0.375         0.100   0.157
      4×T^{1/2}     0.036  0.163   0.185  0.667         0.100   0.157
      4×T           0.114  0.516   0.584  2.110         0.100   0.157
250   4×T^{1/4}     0.009  0.042   0.047  0.218         0.040   0.063
      4×T^{1/2}     0.019  0.083   0.093  0.434         0.040   0.063
      4×T           0.074  0.330   0.371  1.727         0.040   0.063
1000  4×T^{1/4}     0.003  0.012   0.014  0.056         0.010   0.016
      4×T^{1/2}     0.006  0.029   0.033  0.132         0.010   0.016
      4×T           0.035  0.165   0.186  0.742         0.010   0.016

Notes: The length of the Uniform CI depends on $x_t$, $t = 1, \ldots, T$. The table presents the minimum, median, mean, and maximum length of the Uniform CI across 10,000 Monte Carlo replications for different values of $T$ and $\lambda_T$. The length of the Oracle CI is constant across Monte Carlo replications as it only depends on the nominal size and $T$.

Table F.2: Length of confidence intervals corresponding to Figure 8

                            Uniform CI                      Oracle CI
T     λ_T           min.   median  mean   max.          95%     99%
25    4×T^{1/4}     0.070  0.314   0.361  1.978         0.737   1.142
      4×T^{1/2}     0.105  0.469   0.539  2.958         0.737   1.142
      4×T           0.234  1.048   1.205  6.614         0.737   1.142
50    4×T^{1/4}     0.034  0.171   0.195  0.839         0.369   0.571
      4×T^{1/2}     0.056  0.279   0.318  1.368         0.369   0.571
      4×T           0.149  0.742   0.845  3.639         0.369   0.571
100   4×T^{1/4}     0.020  0.093   0.105  0.382         0.184   0.285
      4×T^{1/2}     0.036  0.165   0.187  0.679         0.184   0.285
      4×T           0.114  0.523   0.591  2.147         0.184   0.285
250   4×T^{1/4}     0.009  0.042   0.047  0.216         0.074   0.114
      4×T^{1/2}     0.018  0.083   0.094  0.432         0.074   0.114
      4×T           0.073  0.330   0.372  1.716         0.074   0.114
1000  4×T^{1/4}     0.003  0.012   0.014  0.053         0.018   0.029
      4×T^{1/2}     0.006  0.029   0.033  0.126         0.018   0.029
      4×T           0.035  0.165   0.187  0.709         0.018   0.029

Notes: See notes to Table F.1.

G Additional Empirical Results

Figure G.5: One-month-ahead forecasts (left) and corresponding forecast errors (right) based on a 20-year rolling window.

Figure G.6: Values of the adaptive LASSO penalty parameter $\lambda_T$ selected via time-series cross-validation, plotted on a logarithmic scale.

Figure G.7: 20-year rolling window coefficient estimates and adaptive LASSO-based confidence intervals for all remaining variables (panels: TB3MS, RPI, INDPRO, DPCERA3M086SBEA, S&P 500, BUSLOANS, CPIAUCSL, OILPRICEx, M2SL).
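Figure G.6 reports values of $\lambda_T$ selected via time-series cross-validation. A generic rolling-origin scheme of this kind can be sketched as follows (a hedged illustration: the univariate solver, candidate grid, minimum training window, and DGP are assumptions for the example, not the paper's implementation):

```python
import numpy as np

def al_fit_1d(y, x, lam):
    # univariate adaptive LASSO (gamma = 1): soft-threshold the OLS slope
    sxx = float(x @ x)
    b_ols = float(x @ y) / sxx
    lam_tilde = lam / (2.0 * sxx)
    return 0.0 if b_ols ** 2 <= lam_tilde else b_ols - lam_tilde / b_ols

def ts_cv_lambda(y, x, lambdas, min_train=60):
    """Rolling-origin (time-series) cross-validation for lambda_T.

    For each candidate lambda, fit on observations 1..t and score the
    one-step-ahead squared forecast error; pick the lambda with the
    smallest average error.  Generic sketch, not the paper's exact scheme.
    """
    T = len(y)
    errs = []
    for lam in lambdas:
        se = 0.0
        for t in range(min_train, T - 1):
            b = al_fit_1d(y[:t], x[:t], lam)
            se += (y[t] - b * x[t]) ** 2
        errs.append(se / (T - 1 - min_train))
    errs = np.array(errs)
    return float(lambdas[int(np.argmin(errs))]), errs

# illustrative data: unit-root regressor, small nonzero coefficient
rng = np.random.default_rng(2)
T = 200
x = np.cumsum(rng.standard_normal(T))
y = 0.05 * x + rng.standard_normal(T)
candidates = np.array([T ** 0.25, T ** 0.5, float(T), 4.0 * T])
best, errs = ts_cv_lambda(y, x, candidates)
print(best, errs)
```

Respecting the temporal ordering of the data in the train/validation split is what distinguishes this scheme from ordinary K-fold cross-validation.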
