GLS under Monotone Heteroskedasticity
Authors: Yoichi Arai, Taisuke Otsu, Mengshan Xu
Abstract. The generalized least square (GLS) is one of the most basic tools in regression analyses. A major issue in implementing the GLS is estimation of the conditional variance function of the error term, which typically requires a restrictive functional form assumption for parametric estimation or smoothing parameters for nonparametric estimation. In this paper, we propose an alternative approach to estimate the conditional variance function under nonparametric monotonicity constraints by utilizing the isotonic regression method. Our GLS estimator is shown to be asymptotically equivalent to the infeasible GLS estimator with knowledge of the conditional error variance, and involves only some tuning to trim boundary observations, not only for point estimation but also for interval estimation or hypothesis testing. Our analysis extends the scope of the isotonic regression method by showing that the isotonic estimates, possibly with generated variables, can be employed as first stage estimates to be plugged in for semiparametric objects. Simulation studies illustrate excellent finite sample performances of the proposed method. As an empirical example, we revisit Acemoglu and Restrepo's (2017) study on the relationship between an aging population and economic growth to illustrate how our GLS estimator effectively reduces estimation errors.

1. Introduction

The generalized least square (GLS) is one of the most basic tools in regression analyses. It yields the best linear unbiased estimator in the classical linear regression model, and has been studied extensively in the econometrics and statistics literature; see, e.g., Wooldridge (2010, Chapter 7) for a review.
A major issue in implementing the GLS is that the optimal weights given by the conditional error variance function (say, $\sigma^2(\cdot)$) are typically unknown to researchers and need to be estimated. One way to estimate $\sigma^2(\cdot)$ is to specify its parametric functional form and estimate it by a parametric regression of the squared OLS residuals of the original regression on the specified covariates. However, economic theory rarely provides exact functional forms of $\sigma^2(\cdot)$, and the feasible GLS using a misspecified $\sigma^2(\cdot)$ is no longer asymptotically efficient (Cragg, 1983). To address this issue, Carroll (1982) and Robinson (1987) proposed to estimate $\sigma^2(\cdot)$ nonparametrically and established the asymptotic equivalence of the resulting feasible GLS estimator with the infeasible one under certain regularity conditions. This is a remarkable result, but it requires theoretically and practically judicious choices of smoothing parameters, such as bandwidths, series lengths, or numbers of neighbors. It should be noted that such smoothing parameters appear not only in the point estimator but also in its standard error for inference, and their choices typically require some assumption or knowledge of the smoothness of the conditional variance and associated density functions, such as their differentiability orders.

In this paper, we propose an alternative approach to estimate the conditional error variance function to implement the GLS by exploiting a shape constraint on $\sigma^2(\cdot)$ instead of its smoothness as in Robinson (1987). As argued by Matzkin (1994), economic theory often provides shape constraints for functions of economic variables, such as monotonicity, concavity, or symmetry.
In particular, we focus on situations where $\sigma^2(\cdot)$ is known to be monotone in its argument even though its exact functional form is unspecified, and propose to estimate $\sigma^2(\cdot)$ by utilizing the method of isotonic regression (see the review by Groeneboom and Jongbloed, 2014). It is known that the conventional isotonic regression estimator typically yields piecewise constant function estimates and does not involve any tuning parameters. Although the limiting behavior of the isotonic regression estimator is less tractable (such as the $n^{1/3}$-consistency and complicated limiting distribution), we show that our feasible GLS estimator, using the optimal weights by the isotonic estimator with some trimming of boundary observations, is asymptotically equivalent to the infeasible GLS estimator. Furthermore, we can plug in this isotonic estimator to estimate the asymptotic variance of the GLS estimator for statistical inference.

For the linear model $Y = X'\beta + U$ in the presence of heteroskedasticity $\sigma^2(X) = E[U^2|X]$, using feasible GLS to improve the estimation efficiency has a long history. On the one hand, several parametric models have been proposed to estimate the conditional error variance function $\sigma^2(\cdot)$; see Remark 5 below. On the other hand, Carroll (1982) and Robinson (1987) estimated $\sigma^2(\cdot)$ with kernel and nearest neighbor estimators, respectively, and they showed that their semiparametric GLS estimators are asymptotically equivalent to the infeasible GLS estimator and thus efficient. Compared to existing parametric methods, our proposed method imposes monotonicity, a feature implied by many parametric models, but it is nonparametric and does not rely on any specific parametric functional form.^1
Compared to existing nonparametric methods, our proposed method involves only some tuning to trim boundary observations, which does not require knowledge of the smoothness of the conditional variance and associated density functions. In the Monte Carlo simulations, we show that our proposed method outperforms the above-mentioned nonparametric methods at almost every choice of smoothing parameters, while it performs as well as parametric feasible GLS estimators with a correctly specified conditional error variance function.

^1 Monotone heteroskedasticity is often observed in the economics literature. For example, Mincer (1974) argued that the variance of wages, when conditioned on education, should increase with the level of education because individuals with higher education have a broader array of job choices. Ruud (2000) cited this argument and provided empirical evidence in his Figure 18.1 based on the CPS data from March 1995. Another example can be found in Example 8.6 of Wooldridge (2013, pp. 283-284), where he employed a univariate conditional variance function of log income to explain the heteroskedasticity observed in the net total financial wealth of people in the United States.

The isotonic estimator dates back to the middle of the last century. Earlier work includes Ayer et al. (1955), Grenander (1956), Rao (1969, 1970), and Barlow and Brunk (1972), among others. The isotonic estimator of a regression function can be formulated as a least squares estimation with monotonicity constraints. Suppose that the conditional expectation $E[Y|X] = m(X)$ is monotone increasing. For an iid random sample $\{Y_i, X_i\}_{i=1}^n$, the isotonic estimator is the minimizer of the sum of squared errors, $\min_{m \in \mathcal{M}} \sum_{i=1}^n \{Y_i - m(X_i)\}^2$, where $\mathcal{M}$ is the class of monotone increasing functions. The minimizer can be calculated with the pool adjacent violators algorithm (Barlow and Brunk, 1972), or equivalently by solving for the greatest convex minorant of the cumulative sum diagram $\{(0,0), (i, \sum_{j=1}^i Y_j), i = 1, \ldots, n\}$, where the corresponding $\{X_i\}_{i=1}^n$ are an ordered sequence; see Groeneboom and Jongbloed (2014) for a comprehensive discussion of different aspects of isotonic regression.

Moreover, recent developments in the monotone single index model provide convenient and flexible tools for combining monotonicity and multi-dimensional covariates. In a monotone single index model, the conditional mean of $Y$ is modeled as $E[Y|X] = m(X'\alpha)$, and the monotone link function $m(\cdot)$ is solved with isotonic regression. Balabdaoui, Durot and Jankowski (2019) studied the monotone single index model with the monotone least squares method. Groeneboom and Hendrickx (2018), Balabdaoui, Groeneboom and Hendrickx (2019), and Balabdaoui and Groeneboom (2021) developed a score-type approach for the monotone single index model. Their approach can estimate the single index parameter $\alpha$ and the link function $m(\cdot)$ at the $n^{-1/2}$-rate and $n^{-1/3}$-rate, respectively. We employ their approach for the estimation of the conditional variance function in the multivariate case.

Recently, Babii and Kumar (2023) applied the isotonic regression to their analysis of regression discontinuity designs. To this end, Babii and Kumar (2023) extended existing results concerning the boundary properties of Grenander's estimator (e.g., those from Woodroofe and Sun, 1993, and Kulikov and Lopuhaä, 2006) to derive the asymptotic distribution of their trimmed isotonic regression discontinuity estimator.
To regularize the isotonic estimator in the weights of our proposed GLS estimator, we employ a similar trimming strategy while adapting the theory of Babii and Kumar (2023) to our context of conditional variance estimation. We contribute to this literature on isotonic regression by showing that the isotonic estimates can be employed as first stage estimates to be plugged in for semiparametric objects. Furthermore, we note that our isotonic estimator involves generated variables (i.e., OLS residuals), which make the theoretical developments substantially different from the existing ones.

This paper is organized as follows. In Section 2, we consider the case where $\sigma^2(\cdot)$ is monotone in one covariate, present our GLS estimator, and study its asymptotic properties. Section 3 extends our GLS approach to the case where $\sigma^2(\cdot)$ is specified by a monotone single index function. Section 4 illustrates the proposed method by a simulation study and an empirical example.

2. Heteroskedasticity by univariate covariate

We first consider the case where monotone heteroskedasticity is caused by a single covariate. In particular, consider the following multiple linear regression model
$$Y = \alpha + \beta X + Z'\gamma + U, \qquad E[U|X,Z] = 0, \qquad (2.1)$$
where $X \in \mathcal{X} = [x_L, x_U]$ is a scalar covariate with compact support and $Z$ is a vector of other covariates. In this section, we focus on the case where heteroskedasticity is caused by the covariate $X$, i.e.,
$$E[U^2|X,Z] = E[U^2|X] =: \sigma^2(X), \qquad (2.2)$$
and $\sigma^2(\cdot)$ is a monotone increasing function. The case of monotone decreasing $\sigma^2(\cdot)$ is analyzed analogously (by setting $U^2$ as $-U^2$). In the setup (2.2), we assume that the researcher knows which covariate should be included in $\sigma^2(\cdot)$ based on economic theory or other prior information.
This setup should be considered as a useful benchmark to provide a clear exposition of the main concept and the asymptotic properties of the proposed monotone GLS estimator. Without the covariates $Z$, the above model covers a bivariate regression model, and our approach is new even in such a fundamental setup. Furthermore, this setup covers the case where the $X$ contained in (2.2) does not enter the regression model (2.1), by setting $\beta = 0$ (such a situation is considered in our empirical illustration in Section 4.2). Extensions relaxing the assumption in (2.2) will be discussed in Remark 1 and Section 3.

Let $\theta = (\alpha, \beta, \gamma')'$ be the vector of slope parameters and $W := (1, X, Z')'$, so that the model in (2.1) can be written as $Y = W'\theta + U$. Based on an iid sample $\{Y_i, X_i, Z_i\}_{i=1}^n$, the infeasible GLS estimator for $\theta$ is written as
$$\hat{\theta}_{IGLS} = \left( \sum_{i=1}^n \sigma_i^{-2} W_i W_i' \right)^{-1} \left( \sum_{i=1}^n \sigma_i^{-2} W_i Y_i \right), \qquad (2.3)$$
where $\sigma_i^2 = \sigma^2(X_i)$. In order to make this estimator feasible, various approaches have been proposed in the literature.

In this paper, we are concerned with the situation where the researcher knows that $\sigma^2(\cdot)$ is monotone in a particular regressor $X$ but its exact functional form is unspecified. In particular, by utilizing knowledge of the monotonicity of $\sigma^2(\cdot)$, we propose to estimate $\sigma^2(\cdot)$ by the isotonic regression of the squared OLS residuals on the regressor $X$. More precisely, let $\hat{\theta}_{OLS} = (\sum_{i=1}^n W_i W_i')^{-1} (\sum_{i=1}^n W_i Y_i)$ be the OLS estimator for (2.1), and $\hat{U}_j = Y_j - W_j'\hat{\theta}_{OLS}$ be its residual. Then we estimate $\sigma^2(\cdot)$ by
$$\hat{\sigma}^2(\cdot) = \text{the isotonic regression function from } \{\hat{U}_j^2\}_{j=1}^n \text{ on } \{X_j\}_{j=1}^n. \qquad (2.4)$$
Although this estimator is shown to be consistent for $\sigma^2(\cdot)$ in the interior of the support $[x_L, x_U]$ of $X$, it is generally biased at the lower boundary $x_L$, which may cause inconsistency of the resulting GLS estimator. Therefore, we propose to trim observations whose $X_i$'s are too close to $x_L$, and develop the following feasible GLS estimator
$$\hat{\theta} = \left( \sum_{i=1}^n I\{X_i \ge q_n\} \hat{\sigma}_i^{-2} W_i W_i' \right)^{-1} \left( \sum_{i=1}^n I\{X_i \ge q_n\} \hat{\sigma}_i^{-2} W_i Y_i \right), \qquad (2.5)$$
where $I\{\cdot\}$ is the indicator function, and the trimming term $q_n$ is set as the $(n^{-1/3})$-th sample quantile of $\{X_i\}_{i=1}^n$.

Let $B(a, R)$ be a ball around $a$ with radius $R$; for $\varepsilon = U^2 - \sigma^2(X)$, define $\sigma^2_\varepsilon(x) = E[\varepsilon^2 | X = x]$. To study the asymptotic properties of the proposed estimator $\hat{\theta}$, we impose the following assumptions.

Assumption.
A1: $\{Y_i, X_i, Z_i\}_{i=1}^n$ is an iid sample of $(Y, X, Z)$. The support of $(X, Z)$ is convex with non-empty interior and is a subset of $B(0, R)$ for some $R > 0$. The support of $X$ is a compact interval $\mathcal{X} = [x_L, x_U]$.
A2: $\sigma^2 : \mathcal{X} \to \mathbb{R}$ is a monotone increasing function defined on $\mathcal{X}$, and $0 < \sigma^2(x_L) < \sigma^2(x_U) < \infty$. There exist positive constants $a_0$ and $M$ such that $E[|U|^{2s} | X = x] \le a_0 s! M^{s-2}$ for all integers $s \ge 2$ and $x \in \mathcal{X}$. For some positive constant $\delta$, $\sigma^2(\cdot)$ is continuously differentiable on $(x_L, x_L + \delta)$, and $\sigma^2_\varepsilon(\cdot)$ is continuous on $(x_L, x_L + \delta)$.
A3: $X$ has a continuous density function $f_X(\cdot)$ on $\mathcal{X}$, and there exists a positive constant $b$ such that $b < f_X(x) < \infty$ for all $x \in \mathcal{X}$.

Assumption A1 is standard. As pointed out in Balabdaoui, Groeneboom and Hendrickx (2019, p. 13), the compact support assumption can be relaxed when $X$ follows a sub-Gaussian distribution. In this case, the $L_2$-convergence rate of the isotonic estimator will decrease from $O_p(n^{-1/3} \log n)$ to $O_p(n^{-1/3} (\log n)^{5/4})$.
Another impact of relaxing the distribution of $X$ (and $Z$) to a sub-Gaussian one is on the concentration rate of $\max_j |\hat{U}_j^2 - U_j^2|$ (see Appendix A for more details). This rate, used in proving Lemma 1 and in controlling the concentration of $T_1$ and $T_2$ in Appendix A.2, will inflate by a factor of $\log n$. However, even with this change, we still have $\max_j |\hat{U}_j^2 - U_j^2| = o_p(n^{-1/3})$, which is the key to showing that the impact of substituting the infeasible $U^2$ with the estimated $\hat{U}^2$ on the isotonic estimators is asymptotically negligible. Considering that the convergence rates of these aforementioned terms are slowed down by a factor of $\log n$ at most, the validity of the main results in this paper is preserved with sub-Gaussian covariates, but the analytical derivation would become more cumbersome. For a clearer and more concise exposition, we maintain the compact support assumption on $X$.

Assumption A2 is on the error term. The monotonicity of $\sigma^2(\cdot)$ is the main assumption. The assumption on arbitrary higher moments, which rules out some fat-tailed distributions, is commonly used to obtain some maximal inequalities (cf. van der Vaart and Wellner, 1996, Lemma 2.2.11, for a similar assumption). Assumption A3 contains additional mild conditions on the density of $X$.

We first present asymptotic properties of the conditional error variance estimator $\hat{\sigma}^2(\cdot)$ in (2.4). Let $q_n^*$ be the $(n^{-1/3})$-th population quantile of $X$, $D^L_A[f](a)$ be the left derivative of the greatest convex minorant of a function $f(\cdot)$ evaluated at $a \in A$, and $\{\mathbb{W}_t\}$ be the standard Brownian motion. Also define $c^* = \lim_{n \to \infty} n^{1/3}(q_n^* - x_L)$. Assumption A3 guarantees $0 < c^* < \infty$.
Then we obtain the following lemma for the behavior of $\hat{\sigma}^2(\cdot)$ around the boundary $x_L$, which extends the result by Babii and Kumar (2023, Theorem 2.1(ii)) by allowing the generated variable $\hat{U}_i^2$ as a regressand for $\hat{\sigma}^2(\cdot)$.

Lemma 1. Under Assumptions A1-A3 and $\lim_{x \downarrow x_L} \frac{d\sigma^2(x)}{dx} > 0$, it holds
$$n^{1/3} \{\hat{\sigma}^2(q_n) - \sigma^2(q_n)\} \overset{d}{\to} D^L_{[0,\infty)} \left[ \sqrt{\frac{\sigma^2_\varepsilon(x_L)}{c^* f_X(x_L)}} \mathbb{W}_t + \lim_{x \downarrow x_L} \frac{d\sigma^2(x)}{dx} c^* \left( \frac{1}{2} t^2 - t \right) \right] (1). \qquad (2.6)$$

Based on this lemma, the asymptotic distribution of our feasible GLS estimator $\hat{\theta}$ is obtained as follows.

Theorem 1. Under Assumptions A1-A3, it holds $\sqrt{n}(\hat{\theta} - \theta) \overset{d}{\to} N(0, E[\sigma^{-2}(X) W W']^{-1})$, and the asymptotic variance matrix is consistently estimated by $\left( \frac{1}{n} \sum_{i=1}^n \hat{\sigma}_i^{-2} W_i W_i' \right)^{-1}$.

This theorem implies that our estimator $\hat{\theta}$ has the same limiting distribution as the infeasible GLS estimator $\hat{\theta}_{IGLS}$ and thus achieves the semiparametric efficiency bound. This result extends the scope of the isotonic regression method by showing that the isotonic estimates, possibly with generated variables, can be employed as first stage estimates to be plugged in for semiparametric objects. We re-emphasize that $\hat{\theta}$ involves only a trimming term $q_n$, the $(n^{-1/3})$-th sample quantile of $\{X_i\}_{i=1}^n$.^2

Remark 1. [Extensions of (2.2)] The benchmark setup $E[U^2|X,Z] = \sigma^2(X)$ considered in this section can be extended in various ways. First, an extension to a single index model (say, $E[U^2|X,Z] = \sigma^2(X\eta_x + Z'\eta_z)$) will be discussed in the next section. Second, the model in (2.1)-(2.2) can be extended to the case where the conditional variance varies with discrete covariates $Z$ (or its subvector), say $E[U^2|X, Z = z] = \sigma^2_z(X)$ with monotone functions $\sigma^2_z(\cdot)$ for $z \in \{z^{(1)}, \ldots, z^{(D)}\}$.
In this case, we can implement the isotonic regression for each group categorized by $z$, and construct the feasible GLS estimator in an analogous way as (2.5). Third, our approach may be extended to additive monotone heteroskedasticity, say $E[U^2|X,Z] = \sigma^2_x(X) + \sigma^2_z(Z)$ with monotone functions $\sigma^2_x(\cdot)$ and $\sigma^2_z(\cdot)$. Although a formal analysis is beyond the scope of this paper, the results in Mammen and Yu (2007) suggest that the isotonic estimators for additive functions converge at similar rates as in the univariate case, and we conjecture that a result similar to Theorem 1 can be obtained. Finally, when the conditional error variance function is multiplicative, say $E[U^2|X,Z] = \sigma^2_x(X)\sigma^2_z(Z)$, and the researcher knows the form of $\sigma^2_z(\cdot)$ (e.g., $Z$ is household size and $\sigma^2_z(Z) = Z^2$), then our feasible GLS estimator can be applied to observations reweighted by $1/\sigma_z(Z)$.

Remark 2. [Monotonicity testing] Monotonicity is an assumption that can be tested. For observable random variables $(Y, X)$, several methods have been developed to test whether $E[Y|X]$ is monotone increasing in $X$; see, e.g., Ghosal, Sen and van der Vaart (2000), Hall and Heckman (2000), Dümbgen and Spokoiny (2001), Chetverikov (2019), and Hsu, Liu and Shi (2019), among others. All these tests can be adapted to our case, testing the monotonicity of $\sigma^2(\cdot)$ with generated $\{\hat{U}_j^2\}_{j=1}^n$ and observed $\{X_j\}_{j=1}^n$. Since Assumptions A1-A2 and $\hat{\theta}_{OLS} - \theta = O_p(n^{-1/2})$ imply $\hat{U}_j^2 - U_j^2 = O_p(n^{-1/2} \log n)$ uniformly over $j = 1, \ldots, n$, the critical values of these tests can be adjusted accordingly to maintain a proper asymptotic size.

Remark 3.
[Misspecification of $E[U^2|X,Z]$] We want to note that even if the assumption in (2.2) is violated (e.g., $E[U^2|X,Z]$ varies with $Z$, or $E[U^2|X,Z] = \sigma^2(X)$ with non-monotone $\sigma^2(\cdot)$), our feasible GLS estimator $\hat{\theta}$ in (2.5) is still consistent for $\theta$ due to $E[U|X,Z] = 0$, and asymptotically normal at the $\sqrt{n}$-rate with the limiting distribution
$$\sqrt{n}(\hat{\theta} - \theta) \overset{d}{\to} N\left(0, E[\rho(X)^{-1} W W']^{-1} E[\rho(X)^{-2} E[U^2|X,Z] W W'] E[\rho(X)^{-1} W W']^{-1}\right),$$
where $\rho(\cdot) = \arg\min_{m \in \mathcal{M}} E[\{U^2 - m(X)\}^2]$ for the class of monotone increasing functions $\mathcal{M}$. Since $\hat{\sigma}^2(\cdot)$ can estimate $\rho(\cdot)$, the asymptotic variance matrix can be consistently estimated by
$$\left( \frac{1}{n} \sum_{i=1}^n \hat{\sigma}^{-2}(X_i) W_i W_i' \right)^{-1} \left( \frac{1}{n} \sum_{i=1}^n \hat{\sigma}^{-4}(X_i) \hat{U}_i^2 W_i W_i' \right) \left( \frac{1}{n} \sum_{i=1}^n \hat{\sigma}^{-2}(X_i) W_i W_i' \right)^{-1}. \qquad (2.7)$$
This misspecification robust variance estimator is analogous to the one proposed by Cragg (1992) for the feasible GLS estimator with parametrically specified models for the conditional error variance $E[U^2|X,Z]$.^3

^2 Although our estimator $\hat{\theta}$ in (2.5) does not involve any tuning constant, the trimming term $q_n$ should be understood as the $c \cdot (n^{-1/3})$-th sample quantile of $\{X_i\}_{i=1}^n$, where the tuning constant is set as $c = 1$. Indeed, Theorem 1 holds true with any $c > 0$. If we compare with other nonparametric methods, smoothing parameters, such as bandwidths, series lengths, and numbers of neighbors, typically require two constants to implement. For example, for the bandwidth parameter $b = c_1 n^{-c_2}$, researchers need to choose $c_1$ and $c_2$. The constant $c_1$, which is analogous to $c$ above, can be any positive number. However, they also need to choose a positive constant $c_2$ whose upper bound typically depends on the (unknown) smoothness of the underlying functions.

Remark 4.
[Endogenous regressor] The result of Theorem 1 can also be extended to some linear instrumental variable (IV) regression models. For notational simplicity, consider the following univariate IV regression:
$$Y = \alpha + \beta X + U, \qquad E[U|Z] = 0,$$
where $X$ is a scalar endogenous regressor and $Z$ is a scalar IV, and we further assume $E[X|Z] = \eta + \gamma Z$ for some parameters $(\eta, \gamma)$. This linearity assumption on $E[X|Z]$ is not essential, and may be relaxed by some nonparametric estimator of $E[X|Z]$. In this setup, the optimal instrument for estimating $(\alpha, \beta)'$ is given by (see, e.g., Newey, 1993)
$$E\left[\left. \frac{\partial (Y - \alpha - \beta X)}{\partial (\alpha, \beta)'} \right| Z \right]' E[U^2|Z]^{-1} = -\begin{pmatrix} 1 & 0 \\ \eta & \gamma \end{pmatrix} \begin{pmatrix} 1 \\ Z \end{pmatrix} v^{-2}(Z),$$
where $v^2(\cdot) = E[U^2|Z = \cdot]$. Under the assumption that $\gamma \neq 0$ (i.e., the IV is relevant), the optimal IV estimator is obtained as the method of moments estimator of the following moment condition:
$$E\left[ \begin{pmatrix} 1 \\ Z \end{pmatrix} v^{-2}(Z)(Y - \alpha - \beta X) \right] = 0. \qquad (2.8)$$
Under the monotonicity assumption on $v^2(\cdot)$, we can obtain the isotonic estimator $\hat{v}^2(\cdot)$ of $v^2(\cdot)$ by regressing the squared residuals $\hat{e}^2 = (Y - \tilde{\alpha} - \tilde{\beta} X)^2$, for an initial estimator $(\tilde{\alpha}, \tilde{\beta})$ (e.g., the two-stage least squares estimator), on $Z$. The resulting estimator $\hat{v}^2(\cdot)$ should have the same properties as those of $\hat{\sigma}^2(\cdot)$ presented in Lemma 1, where $q_n$ is replaced with the $(n^{-1/3})$-th sample quantile of $\{Z_i\}_{i=1}^n$. Based on this isotonic estimator, a feasible optimal IV estimator $\hat{\theta}_{IV} = (\hat{\alpha}_{IV}, \hat{\beta}_{IV})'$ is given by
$$\hat{\theta}_{IV} = \left( \sum_{i=1}^n I\{Z_i \ge q_n\} \hat{v}^{-2}(Z_i) \begin{pmatrix} 1 \\ Z_i \end{pmatrix} (1, X_i) \right)^{-1} \left( \sum_{i=1}^n I\{Z_i \ge q_n\} \hat{v}^{-2}(Z_i) \begin{pmatrix} 1 \\ Z_i \end{pmatrix} Y_i \right).$$
By applying the same arguments as for Theorem 1, we can show that $\hat{\theta}_{IV}$ is asymptotically equivalent to the infeasible optimal IV estimator based on (2.8) with known $v^2(\cdot)$.
^3 Based on simulation studies, Cragg (1992) recommended using his misspecification robust variance estimator even when the parametric form of heteroskedasticity is correctly specified. Although a similar analysis is beyond the scope of this paper, we also recommend employing the variance estimator (2.7) in practice due to its consistency regardless of the assumption in (2.2).

3. Heteroskedasticity by multivariate covariates

We now consider the model
$$Y = \alpha + X'\beta + Z'\gamma + U, \qquad E[U|X,Z] = 0, \qquad (3.1)$$
where $X$ is a vector of covariates. This section focuses on the case where heteroskedasticity takes the form of a monotone single index function of $X$ with unknown parameters $\eta_0$, i.e., $E[U^2|X,Z] = E[U^2|X] = \sigma^2(X'\eta_0)$ for a monotone increasing function $\sigma^2(\cdot)$. Single index models are known to be more flexible than parametric models and achieve dimension reduction relative to nonparametric models.

Remark 5. First, the monotone index model $\sigma^2(X'\eta_0)$ covers several existing parametric models. Popular examples include $\sigma^2(X) = C(X'\eta_0)^{2-2\lambda}$ (Box and Hill, 1974), $\sigma^2(X) = C\exp(\lambda(X'\eta_0))$ (Bickel, 1978), and $\sigma^2(X) = C\{1 + \lambda(X'\eta_0)^2\}$ (Fuller, 1980) for some constants $C > 0$ and $\lambda$; interestingly, all these parametric functions are monotone increasing (or decreasing) in the index of $X$. Second, although the setup $E[U^2|X,Z] = \sigma^2(X'\eta_0)$ assumes that the researcher knows which (sub-)vector of covariates should be included in $\sigma^2(\cdot)$, researchers do not have to select those covariates in the case where such prior information is unavailable. They can simply re-define the model in (3.1) without covariates $Z$ (or equivalently specify $E[U^2|X,Z] = \sigma^2(X'\eta_0 + Z'\eta_{z0})$). Our asymptotic theory below applies even if some covariates are irrelevant for $E[U^2|X,Z]$.
For identification, $\eta_0$ is normalized as $\|\eta_0\| = 1$. Define
$$\sigma^2_\eta(a) = E[\sigma^2(X'\eta_0) | X'\eta = a]. \qquad (3.2)$$
We show in Lemma 4 that $\sigma^2(\cdot)$ and $\eta_0$ can be consistently estimated by extending the method proposed in Balabdaoui, Groeneboom and Hendrickx (2019) (BGH hereafter) and Balabdaoui and Groeneboom (2021) to allow generated variables. In particular, for a given $\eta$, define the isotonic regression of $\{\hat{U}_i^2\}_{i=1}^n$ on $\{X_i'\eta\}_{i=1}^n$ as
$$\hat{\sigma}^2_\eta = \arg\min_{m \in \mathcal{M}} \frac{1}{n} \sum_{i=1}^n \{\hat{U}_i^2 - m(X_i'\eta)\}^2, \qquad (3.3)$$
where $\mathcal{M}$ is the set of monotone increasing functions defined on $\mathbb{R}$. Based on this, $\eta_0$ can be estimated by minimizing the squared norm of a score function. For example, the simple score estimator in the spirit of BGH and Balabdaoui and Groeneboom (2021) is given by
$$\hat{\eta} = \arg\min_\eta \left\| \frac{1}{n} \sum_{i=1}^n X_i \{\hat{U}_i^2 - \hat{\sigma}^2_\eta(X_i'\eta)\} \right\|^2, \qquad (3.4)$$
where $\|\cdot\|$ is the Euclidean norm: $\|a\| = \sqrt{\sum_{j=1}^k a_j^2}$ for $a = (a_1, \ldots, a_k)' \in \mathbb{R}^k$. Letting $\hat{\sigma}_i^2 = \hat{\sigma}^2_{\hat{\eta}}(X_i'\hat{\eta})$ and $W = (1, X', Z')'$, we propose the following GLS estimator
$$\hat{\theta} = \left( \sum_{i=1}^n I\{X_i'\hat{\eta} \ge q_n\} \hat{\sigma}_i^{-2} W_i W_i' \right)^{-1} \left( \sum_{i=1}^n I\{X_i'\hat{\eta} \ge q_n\} \hat{\sigma}_i^{-2} W_i Y_i \right), \qquad (3.5)$$
where $q_n$ is the $(n^{-1/3})$-th sample quantile of $\{X_i'\hat{\eta}\}_{i=1}^n$.

To avoid unnecessarily heavy notation in the multivariate case, we redefine some notations, which have similar meanings to those used in Section 2. Define $\varepsilon = U^2 - \sigma^2(X'\eta_0)$, $\sigma^2_\varepsilon(\cdot) = E[\varepsilon^2 | X'\eta_0 = \cdot]$, $x_L = \inf_{x \in \mathcal{X}}(x'\eta_0)$, and $x_U = \sup_{x \in \mathcal{X}}(x'\eta_0)$. Let $f_X(\cdot)$ be the density function of the random variable $X'\eta_0$.
Let $q_n^*$ be the $(n^{-1/3})$-th population quantile of $X'\eta_0$, $q_n$ be the $(n^{-1/3})$-th sample quantile of $\{X_i'\hat{\eta}\}_{i=1}^n$, $c^* = \lim_{n \to \infty} n^{1/3}(q_n^* - x_L)$, and $D^L_A[f](a)$ be the left derivative of the greatest convex minorant of a function $f(\cdot)$ evaluated at $a \in A$. Let $\dim(w)$ be the dimension of a vector $w$.

Assumption.
M1: $\{Y_i, X_i, Z_i\}_{i=1}^n$ is an iid sample of $(Y, X, Z)$. The support of $(X, Z)$, $\mathcal{X} \times \mathcal{Z}$, is convex with non-empty interior and is a subset of $B(0, R)$ for some $R > 0$.
M2: (i) There exists $\delta_0 > 0$ such that the function $a \mapsto \sigma^2_\eta(a)$ defined in (3.2) is monotone increasing on $I_\eta = \{x'\eta, x \in \mathcal{X}\}$ for each $\eta \in B(\eta_0, \delta_0)$. (ii) $0 < \inf_{a \in I_\eta} \sigma^2_\eta(a) < \sup_{a \in I_\eta} \sigma^2_\eta(a) < \infty$ for each $\eta \in B(\eta_0, \delta_0)$. (iii) There exist positive constants $a_0$ and $M$ such that $E[|U|^{2s} | X = x] \le a_0 s! M^{s-2}$ for all integers $s \ge 2$ and $x \in \mathcal{X}$. (iv) $\sigma^2_\eta(\cdot)$ is continuously differentiable on $I_\eta$ for each $\eta \in B(\eta_0, \delta_0)$. (v) $\sigma^2_\varepsilon(\cdot)$ is continuous on $(x_L, x_L + \delta_1)$ for some $\delta_1 > 0$.
M3: The random variable $X'\eta_0$ has a density function $f_X(\cdot)$ that is continuous on $I_{\eta_0}$. There exist some positive real numbers $\underline{b}$ and $\bar{b}$ such that $0 < \underline{b} < f_X(a) < \bar{b} < \infty$ holds for all $a \in I_{\eta_0}$.
M4: For each $\eta \in B(\eta_0, \delta_0)$, the mapping $a \mapsto E[X | X'\eta = a]$ defined on $I_\eta$ is bounded and has finite total variation.
M5: $\mathrm{Cov}[X'(\eta_0 - \eta), \sigma^2(X'\eta_0) | X'\eta] \neq 0$ almost surely for each $\eta \neq \eta_0$.
M6: $B := \int (x - E[X|x'\eta_0])(x - E[X|x'\eta_0])' \left. \frac{d\sigma^2(a)}{da} \right|_{a = x'\eta_0} dP(x)$ has rank $\dim(\eta_0) - 1$.

Assumptions M1-M3 are analogs of Assumptions A1-A3, respectively. The main assumption is the monotonicity of $\sigma^2_\eta(\cdot)$. Assumptions M4-M6 are additional regularity conditions for the monotone index model. By Assumption M1, we have $-\infty < x_L < x_U < \infty$.
Then, similar to Lemma 1, we obtain the following lemma for the behavior of $\hat{\sigma}^2_{\hat{\eta}}(\cdot)$ around $x_L$.

Lemma 2. Under Assumptions M1-M6 and $\lim_{a \downarrow x_L} \frac{d\sigma^2(a)}{da} > 0$, it holds
$$n^{1/3} \{\hat{\sigma}^2_{\hat{\eta}}(q_n) - \sigma^2(q_n)\} \overset{d}{\to} D^L_{[0,\infty)} \left[ \sqrt{\frac{\sigma^2_\varepsilon(x_L)}{c^* f_X(x_L)}} \mathbb{W}_t + \lim_{a \downarrow x_L} \frac{d\sigma^2(a)}{da} c^* \left( \frac{1}{2} t^2 - t \right) \right] (1).$$

Based on this lemma, the asymptotic distribution of the GLS estimator $\hat{\theta}$ in (3.5) is obtained as follows. Let $\sigma_i^2 = \sigma^2(X_i'\eta_0)$.

Theorem 2. Under Assumptions M1-M6, it holds $\sqrt{n}(\hat{\theta} - \theta) \overset{d}{\to} N(0, E[\sigma^{-2}(X'\eta_0) W W']^{-1})$, and the asymptotic variance matrix is consistently estimated by $\left( \frac{1}{n} \sum_{i=1}^n \hat{\sigma}_i^{-2} W_i W_i' \right)^{-1}$.

Similar comments to Theorem 1 apply here. Our estimator $\hat{\theta}$ is asymptotically equivalent to the infeasible GLS estimator $\hat{\theta}_{IGLS}$. In terms of technical contribution, our theoretical analysis generalizes existing ones in, e.g., Babii and Kumar (2023), BGH, and Balabdaoui and Groeneboom (2021) to accommodate generated variables. Similar to Remark 3, even when the monotonicity assumption on $\sigma^2_\eta(\cdot)$ is violated, $\hat{\theta}$ is still consistent for $\theta$ and asymptotically normal at the $\sqrt{n}$-rate with a certain robust asymptotic variance. Furthermore, endogenous regressors can be accommodated as in Remark 4.

Remark 6. We can suggest two informal robustness checks for the monotone index assumption in (3.2). One is to compute the standard errors robust to possible misspecification, obtained in the same manner as Remark 3, and compare them to those in Theorem 2. This can serve as a robustness check for the monotone specification given the variables of the conditional error variance function. Another is to report the results for the specification where all exogenous variables are included in $\sigma^2(\cdot)$, in addition to those for the chosen specifications.
A large difference between these results can be a sign of misspecification of the chosen ones. See Section 4.2 for an illustration.

Remark 7. In this section, we employ the monotone single index structure to model the multivariate conditional variance function. This strategy allows us to strike a balance between robustness and mitigating the curse of dimensionality. Indeed, the current specification can be extended to the multiple index model $E[U^2 \mid X = x] = x_0'\eta_0 + \sum_{i=1}^M G_i(x_i'\eta_i)$ for $X = (X_0', X_1', \ldots, X_M')'$, where $\{G_i(\cdot)\}_{i=1}^M$ are unknown monotone increasing functions. For the case of $M = 1$, this model simplifies to a monotone partially linear single index model whose properties have been studied by Xu and Otsu (2020). We are optimistic that, under certain regularity conditions, similar results as in this section can be obtained. To the best of our knowledge, we have not come across any works that discuss the multiple monotone index model with $M > 1$, even for the conventional regression setup for $E[Y \mid X = x]$. A possible solution could be derived by combining the existing literature on the monotone single index model (as cited in Section 1) with the literature on the monotone additive model (for instance, Mammen and Yu, 2007). Another potential extension involves employing the nonparametric framework of Fang, Guntuboyina and Sen (2021) to model the multivariate conditional variance function. This framework is free of parametric structure, but it requires the true conditional variance to be entirely monotone increasing in its arguments, i.e., $\sigma^2(x_1, z_1) \le \sigma^2(x_2, z_2)$ whenever $x_1 \le x_2$ and $z_1 \le z_2$. Explorations of these extensions exceed the scope of this paper, and we leave them for future research.

4. Numerical illustrations

4.1. Simulation.
We now investigate the finite sample properties of the proposed GLS estimator by a Monte Carlo experiment. We follow the simulation designs of Cragg (1983) and Newey (1993). The first data generating process, denoted by DGP1, is the heteroskedastic linear model with a univariate covariate and normally distributed disturbance:⁴
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad u_i = \sigma_i \varepsilon_i, \quad \varepsilon_i \sim N(0, 1), \quad \beta_0 = \beta_1 = 1,$$
$$\log(X_i) \sim N(0, 1), \quad X_i \text{ and } \varepsilon_i \text{ are independent}, \quad \sigma^2_i = .1 + .2 X_i + .3 X_i^2. \quad (4.1)$$
We consider three sample sizes, $n = 50$, 100, and 500. The number of replications is set to 1,000.

In addition to the feasible GLS estimator with monotone heteroskedasticity (MGLS), we consider the ordinary least squares (OLS), infeasible generalized least squares (GLS), feasible GLS (FGLS), and nearest neighbor (k-NN) estimators. GLS requires knowledge of the conditional error variance function (4.1), including the values of the coefficients. In contrast, FGLS proceeds with the known functional form, but the coefficients are estimated. The "k-NN automatic" chooses the number of neighbors by a cross-validation procedure suggested by Newey (1993). All the estimators except OLS are weighted least squares estimators, and their differences come from how the weights are calculated. Following Newey (1993), we calculate the weights for each method by taking a ratio of the predicted squared residual to the estimated variance of the disturbance, censoring the result below 0.04.

Table 4.1 presents the simulation results for estimation. The first column shows the estimation methods, and the following two columns show the root mean-squared error (RMSE) and mean absolute error (MAE) for DGP1 with $n = 50$. The results for GLS report the levels of the RMSE and MAE, and those for the others are their ratios relative to GLS.
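The role of the variance weights in design (4.1) can be seen in a small sketch: simulate DGP1 and compare OLS with the infeasible GLS that weights by the true inverse variance. This is an illustrative numpy sketch of the design, not the authors' simulation code; the censoring floor 0.04 from the weighting rule above is applied to the variance before inversion.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.exp(rng.normal(size=n))                 # log(X_i) ~ N(0,1)
sigma2 = 0.1 + 0.2 * X + 0.3 * X**2            # DGP1 conditional variance
y = 1.0 + 1.0 * X + np.sqrt(sigma2) * rng.normal(size=n)   # beta0 = beta1 = 1

W = np.column_stack([np.ones(n), X])
beta_ols = np.linalg.lstsq(W, y, rcond=None)[0]

# infeasible GLS: weight by the inverse of the (floor-censored) true variance
wt = 1.0 / np.maximum(sigma2, 0.04)
A = (W * wt[:, None]).T @ W
beta_gls = np.linalg.solve(A, (W * wt[:, None]).T @ y)
```

Averaging the squared errors of `beta_ols` and `beta_gls` over many replications reproduces the kind of RMSE comparison reported in Table 4.1.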
The next two columns give the corresponding results with $n = 100$, and the last two columns with $n = 500$. The two rows for each estimator show the results for $\beta_0$ and $\beta_1$, respectively. The inefficiency and inaccuracy of OLS are apparent. FGLS performs quite well, and this is natural when the conditional error variance functions are correctly specified. The performance of k-NN varies with the choice of $k$ and is in between OLS and FGLS. We observe that the performance of MGLS is better than k-NN for every choice of smoothing parameters. The result of MGLS is comparable to that of FGLS, if not better. MGLS's independence of a smoothing parameter is clearly desirable. We also note that MGLS performs well even for $n = 50$.

The remaining columns of Table 4.1 present the results for DGP2 with a homoskedastic error:
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad u_i \sim N(0, 1), \quad \beta_0 = \beta_1 = 1, \quad \log(X_i) \sim N(0, 1), \quad X_i \text{ and } u_i \text{ are independent}.$$
For DGP2, all estimators work reasonably well, although the performance of k-NN with $k = 6$ is worse than the others.

⁴ Normal random variables are not compactly supported, and hence this violates Assumption A1. However, as discussed in the remark on Assumption A1, this assumption can be relaxed.

Table 4.1.
Simulation: Estimation with univariate covariate

                             DGP1                                            DGP2
                   n = 50        n = 100       n = 500        n = 50        n = 100       n = 500
Estimator          RMSE   MAE    RMSE   MAE    RMSE   MAE     RMSE   MAE    RMSE   MAE    RMSE   MAE
GLS (infeasible)   0.132  0.085  0.093  0.059  0.041  0.028   0.194  0.122  0.133  0.088  0.057  0.039
                   0.157  0.100  0.108  0.073  0.048  0.032   0.083  0.046  0.055  0.034  0.021  0.014
OLS                3.103  2.856  3.831  3.479  5.574  4.495   1.000  1.000  1.000  1.000  1.000  1.000
                   2.072  2.098  2.543  2.370  3.377  2.971   1.000  1.000  1.000  1.000  1.000  1.000
FGLS               1.279  1.210  1.245  1.233  1.598  1.152   1.032  1.041  1.026  1.033  1.024  1.067
                   1.427  1.268  1.406  1.280  1.271  1.242   1.090  1.092  1.075  1.036  1.088  1.090
k-NN (Automatic)   1.630  1.373  1.633  1.511  1.355  1.167   1.123  1.081  1.130  1.081  1.181  1.138
                   1.535  1.427  1.606  1.431  1.424  1.267   1.092  1.074  1.065  1.006  1.197  1.097
k-NN (k = 6)       1.554  1.361  1.525  1.498  1.474  1.417   1.274  1.243  1.253  1.155  1.359  1.276
                   1.466  1.421  1.472  1.462  1.454  1.459   1.178  1.143  1.177  1.114  1.350  1.344
k-NN (k = 15)      1.600  1.386  1.566  1.365  1.251  1.108   1.037  1.076  1.079  1.059  1.081  1.140
                   1.520  1.398  1.546  1.408  1.247  1.197   1.003  1.046  1.037  1.012  1.066  1.053
k-NN (k = 24)      1.781  1.568  1.685  1.457  1.291  1.160   1.011  1.039  1.039  0.980  1.044  1.098
                   1.630  1.560  1.673  1.471  1.312  1.246   1.002  1.026  1.015  0.994  1.038  1.025
MGLS               1.379  1.285  1.326  1.279  1.113  1.129   1.039  1.091  1.049  1.075  1.027  1.075
                   1.327  1.214  1.332  1.249  1.113  1.144   1.043  1.051  1.051  1.058  1.055  1.066

Note: "RMSE" and "MAE" stand for the root mean squared error and mean absolute error, respectively. For each estimator, the two rows report the results for $\beta_0$ and $\beta_1$. The results for GLS report the levels of the RMSE and MAE, and those for the others are their ratios relative to GLS.

Next, we consider the heteroskedastic linear model with multivariate covariates, denoted by DGP3:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad u_i = \sigma_i \varepsilon_i, \quad \varepsilon_i \sim N(0, 1), \quad \beta_0 = \beta_1 = \beta_2 = 1,$$
$$\log(X_{1i}), \log(X_{2i}) \sim N(0, 1), \quad X_{1i}, X_{2i} \text{ and } \varepsilon_i \text{ are independent}, \quad \sigma^2_i = .2 (X_{1i} + X_{2i})^2. \quad (4.2)$$
The conditional error variance function of DGP3 is of a monotone single index structure. Using the notation in (3.2), DGP3 corresponds to the structure with $\sigma^2(a) = a^2$, $X' = (X_1, X_2)$, and $\eta_0 = (\sqrt{.2}, \sqrt{.2})'$. The left panel of Table 4.2 shows the results of DGP3 in the same manner as Table 4.1. For each estimation method, two rows show the results for $\beta_0$ and $\beta_1$, and those for $\beta_2$ are omitted to avoid redundancy. The k-NNs and MGLS perform better than FGLS, and this is in contrast to the performance in DGP1. In general, MGLS works better than the k-NNs except for a few cases.

To see the potential applicability of MGLS to a non-single index structure, we consider another heteroskedastic linear model, denoted by DGP4:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad u_i = \sigma_i \varepsilon_i, \quad \varepsilon_i \sim N(0, 1), \quad \beta_0 = \beta_1 = \beta_2 = 1,$$
$$\log(X_{1i}), \log(X_{2i}) \sim N(0, 1), \quad X_{1i}, X_{2i} \text{ and } \varepsilon_i \text{ are independent},$$
$$\sigma^2_i = .1 + .2 \tilde{X}_i + .3 \tilde{X}_i^2, \quad \log(\tilde{X}_i) = \frac{\log(X_{1i}) + \log(X_{2i})}{\sqrt{2}}. \quad (4.3)$$
The right panel of Table 4.2 shows the results. The results for DGP4 are overall similar to those of DGP3. An exception is FGLS, which performs poorly for DGP3. MGLS works remarkably well for the heteroskedasticity of a non-single index structure.

Table 4.2.
Simulation: Estimation with multivariate covariates

                             DGP3                                            DGP4
                   n = 50        n = 100       n = 500        n = 50        n = 100       n = 500
Estimator          RMSE   MAE    RMSE   MAE    RMSE   MAE     RMSE   MAE    RMSE   MAE    RMSE   MAE
GLS (infeasible)   0.162  0.103  0.110  0.071  0.045  0.028   0.165  0.108  0.115  0.076  0.049  0.033
                   0.163  0.107  0.109  0.072  0.048  0.033   0.108  0.067  0.071  0.046  0.029  0.020
OLS                3.401  3.589  4.255  4.168  6.650  6.695   3.069  2.653  3.792  2.980  4.897  4.186
                   1.942  1.950  2.317  2.260  3.051  2.610   2.318  2.170  2.914  2.338  3.809  3.198
FGLS               2.531  2.239  2.516  2.141  2.731  2.037   1.381  1.189  1.427  1.233  1.699  1.219
                   1.606  1.441  1.709  1.486  1.732  1.358   1.359  1.227  1.344  1.281  1.326  1.271
k-NN (Automatic)   1.952  1.925  2.108  1.709  1.786  1.537   1.868  1.638  1.778  1.488  1.709  1.390
                   1.546  1.429  1.680  1.489  1.516  1.318   1.766  1.771  1.865  1.763  2.009  1.626
k-NN (k = 6)       1.827  1.766  1.787  1.587  1.670  1.666   1.719  1.541  1.674  1.521  1.594  1.537
                   1.458  1.397  1.514  1.486  1.497  1.362   1.717  1.704  1.769  1.764  1.813  1.654
k-NN (k = 15)      1.914  1.957  1.850  1.669  1.490  1.385   1.769  1.611  1.669  1.491  1.373  1.246
                   1.468  1.401  1.511  1.428  1.313  1.248   1.712  1.727  1.769  1.639  1.588  1.517
k-NN (k = 24)      2.182  2.203  2.008  1.816  1.570  1.510   1.952  1.888  1.799  1.581  1.392  1.254
                   1.562  1.455  1.611  1.571  1.371  1.267   1.825  1.807  1.890  1.729  1.626  1.511
MGLS               2.144  1.977  1.993  1.659  1.667  1.481   1.839  1.549  1.647  1.422  1.320  1.251
                   1.486  1.467  1.477  1.401  1.238  1.186   1.670  1.533  1.604  1.451  1.448  1.360

Note: "RMSE" and "MAE" stand for the root mean squared error and mean absolute error, respectively. For each estimator, the two rows report the results for $\beta_0$ and $\beta_1$. The results for GLS report the levels of the RMSE and MAE, and those for the others are their ratios relative to GLS.

Next, we turn to the simulation results on inference. Tables 4.3 and 4.4 show empirical coverages (EC) and average lengths (AL) of the 95% confidence intervals under DGPs 1-4. Again we consider GLS, OLS, FGLS, k-NN, and MGLS. For OLS, three types of confidence intervals are considered.
They are based on the usual OLS standard error (OLS-U), the heteroskedasticity-robust standard error (OLS-R), and the wild bootstrap (OLS-Boot). For MGLS, we also present the results for its robust version. We observe that the empirical coverages are smaller than the nominal coverage 0.95 for all DGPs and all methods except GLS. It is natural that OLS-U performs poorly since it is invalid except for DGP2. The performance of k-NN is worse than the others for all DGPs in terms of empirical coverage. OLS-R, OLS-Boot, FGLS, and MGLS work similarly in terms of empirical coverage; however, we note that the average length of OLS-R is much larger than those of FGLS and MGLS except for DGP2. While the empirical coverages of OLS-Boot are similar to those of OLS-R, the average lengths of OLS-Boot are smaller than those of OLS-R but still larger than those of MGLS. MGLS works quite well for all DGPs, especially for $n = 500$. The results of MGLS (Robust) are similar to those of MGLS, especially when $n = 100$ and 500. Finally, we note that the empirical coverages tend to be lower when $n = 50$ than when $n = 100$ and 500. Careful interpretation of results is recommended when the sample size is small.

Table 4.3.
Simulation: Inference with univariate covariate

                             DGP1                                            DGP2
                   n = 50        n = 100       n = 500        n = 50        n = 100       n = 500
Estimator          EC     AL     EC     AL     EC     AL      EC     AL     EC     AL     EC     AL
GLS (infeasible)   0.956  0.528  0.955  0.370  0.956  0.164   0.939  0.749  0.947  0.516  0.955  0.224
                   0.946  0.627  0.962  0.441  0.960  0.196   0.948  0.316  0.952  0.207  0.970  0.085
OLS-U              0.798  1.008  0.742  0.740  0.636  0.349   0.939  0.749  0.947  0.516  0.955  0.224
                   0.492  0.409  0.421  0.290  0.348  0.131   0.948  0.316  0.952  0.207  0.970  0.085
OLS-R              0.766  0.962  0.805  0.862  0.884  0.648   0.933  0.733  0.941  0.507  0.949  0.222
                   0.730  0.761  0.772  0.689  0.880  0.488   0.874  0.272  0.881  0.185  0.935  0.081
OLS-Boot           0.740  0.885  0.845  0.517  0.907  0.356   0.917  0.718  0.947  0.516  0.955  0.224
                   0.690  0.681  0.856  0.527  0.894  0.345   0.846  0.270  0.952  0.207  0.970  0.085
FGLS               0.800  0.451  0.847  0.328  0.872  0.162   0.916  0.709  0.925  0.493  0.935  0.216
                   0.737  0.504  0.812  0.395  0.885  0.195   0.761  0.231  0.758  0.159  0.844  0.072
k-NN (Automatic)   0.708  0.410  0.659  0.258  0.701  0.102   0.902  0.711  0.884  0.483  0.883  0.205
                   0.576  0.351  0.574  0.251  0.650  0.115   0.927  0.306  0.917  0.197  0.881  0.079
k-NN (k = 6)       0.732  0.410  0.666  0.258  0.621  0.102   0.845  0.711  0.845  0.483  0.819  0.205
                   0.576  0.351  0.574  0.251  0.650  0.115   0.927  0.306  0.917  0.197  0.881  0.079
k-NN (k = 15)      0.735  0.418  0.704  0.266  0.717  0.105   0.929  0.725  0.907  0.492  0.914  0.210
                   0.582  0.353  0.592  0.258  0.677  0.118   0.944  0.310  0.931  0.200  0.919  0.081
k-NN (k = 24)      0.711  0.440  0.688  0.269  0.721  0.107   0.945  0.744  0.921  0.504  0.935  0.216
                   0.512  0.324  0.537  0.244  0.668  0.118   0.953  0.316  0.942  0.204  0.939  0.083
MGLS               0.779  0.499  0.812  0.363  0.905  0.165   0.885  0.640  0.907  0.468  0.937  0.219
                   0.725  0.523  0.744  0.392  0.888  0.188   0.951  0.333  0.968  0.222  0.972  0.092
MGLS (Robust)      0.762  0.483  0.791  0.354  0.903  0.163   0.879  0.635  0.902  0.463  0.933  0.216
                   0.725  0.465  0.744  0.359  0.888  0.181   0.951  0.258  0.968  0.177  0.972  0.079

Note: "EC" and "AL" stand for the empirical coverage probability and average length, respectively. "OLS-U", "OLS-R", and "OLS-Boot" use the normal approximation with the usual OLS standard error, the heteroskedasticity robust standard error, and the percentile bootstrap interval, respectively. "MGLS (Robust)" is based on the variance formula presented in Remark 2.

Table 4.4. Simulation: Inference with multivariate covariates

                             DGP3                                            DGP4
                   n = 50        n = 100       n = 500        n = 50        n = 100       n = 500
Estimator          EC     AL     EC     AL     EC     AL      EC     AL     EC     AL     EC     AL
GLS (infeasible)   0.944  0.611  0.951  0.413  0.946  0.175   0.944  0.636  0.943  0.440  0.956  0.192
                   0.951  0.632  0.961  0.439  0.960  0.194   0.949  0.414  0.950  0.282  0.968  0.123
OLS-U              0.824  1.535  0.786  1.108  0.632  0.511   0.819  1.222  0.780  0.893  0.675  0.412
                   0.589  0.526  0.549  0.369  0.491  0.164   0.639  0.420  0.611  0.298  0.521  0.133
OLS-R              0.787  1.411  0.815  1.232  0.869  0.873   0.797  1.197  0.843  1.068  0.906  0.718
                   0.729  0.767  0.782  0.660  0.891  0.441   0.762  0.596  0.810  0.517  0.914  0.334
OLS-Boot           0.756  1.319  0.781  1.133  0.839  0.797   0.752  1.115  0.785  0.951  0.860  0.654
                   0.688  0.708  0.749  0.593  0.826  0.400   0.719  0.569  0.780  0.460  0.866  0.301
FGLS               0.831  1.069  0.845  0.759  0.897  0.336   0.801  0.596  0.823  0.424  0.834  0.191
                   0.658  0.517  0.722  0.395  0.826  0.198   0.797  0.382  0.862  0.262  0.855  0.112
k-NN (Automatic)   0.571  0.481  0.557  0.289  0.587  0.105   0.672  0.526  0.646  0.323  0.609  0.122
                   0.471  0.296  0.472  0.205  0.534  0.091   0.596  0.298  0.599  0.200  0.605  0.082
k-NN (k = 6)       0.597  0.481  0.574  0.289  0.549  0.105   0.670  0.526  0.639  0.323  0.571  0.122
                   0.471  0.296  0.472  0.205  0.534  0.091   0.596  0.298  0.599  0.200  0.605  0.082
k-NN (k = 15)      0.592  0.504  0.590  0.299  0.639  0.108   0.690  0.551  0.675  0.338  0.689  0.129
                   0.491  0.301  0.484  0.212  0.570  0.096   0.607  0.309  0.629  0.210  0.668  0.088
k-NN (k = 24)      0.582  0.557  0.561  0.313  0.618  0.111   0.681  0.590  0.664  0.347  0.702  0.132
                   0.459  0.287  0.450  0.204  0.562  0.096   0.598  0.297  0.608  0.207  0.662  0.091
MGLS               0.803  0.956  0.863  0.623  0.938  0.248   0.801  0.805  0.844  0.526  0.902  0.216
                   0.687  0.524  0.756  0.401  0.897  0.198   0.707  0.404  0.776  0.305  0.908  0.154
MGLS (Robust)      0.755  0.855  0.833  0.573  0.920  0.234   0.762  0.750  0.833  0.511  0.902  0.219
                   0.687  0.548  0.756  0.415  0.897  0.199   0.707  0.422  0.776  0.308  0.908  0.148

Note: "EC" and "AL" stand for the empirical coverage probability and average length, respectively. "OLS-U", "OLS-R", and "OLS-Boot" use the normal approximation with the usual OLS standard error, the heteroskedasticity robust standard error, and the percentile bootstrap interval, respectively. "MGLS (Robust)" is based on the variance formula presented in Remark 2.

4.2. Empirical example. We illustrate how the proposed method in this paper can improve the precision of the traditional OLS approach. In doing so, we revisit Acemoglu and Restrepo (2017), who investigate the relationship between an aging population and economic growth. After Hansen (1939), a popular perspective is that countries undergoing faster aging suffer more economically, partly because of excessive savings by an aging population. In contrast to this perspective, Acemoglu and Restrepo (2017) find no evidence of a negative relationship between aging and GDP per capita after controlling for initial GDP per capita, initial demographic composition, and trends by region. Acemoglu and Restrepo (2017) estimated eight specifications for the regression of the change in (log) GDP per capita from 1990 to 2015 (denoted by GDP) on population aging, measured by the change in the ratio of the population above 50 to those between the ages of 20 and 49 (denoted by Aging). The results are reproduced in Panel A of Table 4.5.
Those in columns 1-5 are based on the sample including 169 countries. Column 1 shows the result of the simple regression. Standard errors robust to heteroskedasticity are reported in parentheses. Column 2 shows the result with an additional regressor, the initial log GDP per worker in 1990. Column 3 in addition includes the initial demographic information, the ratio of the population above 50 to those between 20 and 49 in 1990 (denoted by Initial Ratio), and the population in 1990. Column 4 additionally uses dummies for seven regions: Latin America, East Asia, South Asia, Africa, North Africa and the Middle East, Eastern Europe and Central Asia, and Developed Countries. Column 5 estimates the same specification as Column 4 with instruments of birthrates for the 1960, 1965, 1970, 1975, and 1980 cohorts. Columns 6 to 8 report the results for OECD countries using the specifications of Columns 1, 3, and 5, respectively. The number of observations for the first five columns is 169, and that for the last three columns is 35. Seven out of eight OLS estimates indicate positive relationships, and five of them are statistically significant at the 5 percent level. Acemoglu and Restrepo (2017) discuss that these findings can be explained by the adoption of automation technologies, based on a theoretical model.

We estimate the same specifications by the MGLS proposed in this paper. Acemoglu and Restrepo (2017) show that the negative effect of aging can be mitigated or reversed by adopting new automation technologies given abundant capital. This also implies that the effect of aging can be negative without sufficient capital. Hence it would be reasonable to consider Aging as a source of heteroskedasticity. The upper panel of Figure 4.1 shows the relationship between Aging and the residual from the simple regression of Column 1 in Panel A.
Heteroskedasticity due to Aging is not easily confirmed visually. We consider Initial Ratio as another source of heteroskedasticity since a low ratio of old to young in 1990 is likely correlated with more aging by 2015, leading to larger variability in GDP per capita by the same reasoning discussed above. The lower panel of Figure 4.1 presents the relationship between Initial Ratio and the residual from the simple regression of Column 1 in Panel A, and we see that the variability decreases as the ratio grows.

Panels B, C, and D of Table 4.5 show the results of MGLS. Panels B and C present the results for cases where the conditional error variance functions depend on Aging and Initial Ratio, respectively. Panel D reports the results where the conditional error variance functions depend on all exogenous regressors except the regional dummies. Standard errors based on Theorems 1-2 and their analogous versions for IV estimators are reported in parentheses, while robust standard errors are reported in square brackets. First, we observe reductions in standard errors for all MGLSs relative to OLS. The differences stand out when $n = 169$. Second, the two standard errors are similar for the MGLS estimates under exogeneity, while they differ for the IV estimates. This is supporting evidence for the monotone specification of the conditional error variance function for the MGLS method with exogenous regressors, but not for the IV method. Third, the results given in Columns 2, 3, and 4 are stable, while the results for the IV estimates and OECD countries contain a lot of variation. Those variations can be due to non-monotone conditional error variance functions and/or small sample sizes, and further investigation will be required.
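The comparison of the two standard errors used above, model-based versus misspecification-robust, can be sketched for a generic weighted least squares fit as follows. This is an illustrative numpy sketch; the helper name `wls_se` and the sandwich form are our illustration of the idea in Remark 6, not the paper's exact formulas.

```python
import numpy as np

def wls_se(W, y, sigma2_hat):
    """Model-based and sandwich (misspecification-robust) SEs for weighted LS."""
    w = 1.0 / np.asarray(sigma2_hat)
    A = (W * w[:, None]).T @ W                        # sum_i w_i W_i W_i'
    theta = np.linalg.solve(A, (W * w[:, None]).T @ y)
    e = y - W @ theta
    Ainv = np.linalg.inv(A)
    model_se = np.sqrt(np.diag(Ainv))                 # valid if sigma2_hat is correct
    meat = (W * ((w * e) ** 2)[:, None]).T @ W
    robust_se = np.sqrt(np.diag(Ainv @ meat @ Ainv))  # valid more generally
    return model_se, robust_se
```

When the variance specification is correct, the two sets of standard errors should be of similar magnitude; a large gap flags possible misspecification, which is the informal check applied to Table 4.5.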
Overall, the standard errors of MGLS tend to be smaller than, or no larger than, those of OLS, which demonstrates the increased precision of MGLS.

[Figure 4.1 about here]

Figure 4.1. Plots for residual and aging (upper) and residual and the ratio of old to young workers in 1990 (lower)
Note: For both panels, the residuals are obtained from the regression of the change in GDP per capita from 1990 to 2015 (GDP) on population aging measured by the ratio of the population above 50 to those between the ages of 20 and 49 (Aging). For the upper panel, the variable on the X-axis represents the change in the ratio of old to young workers from 1990 to 2015. For the lower panel, it represents the ratio of old to young workers in 1990.

Table 4.5. Effects of Aging on GDP by OLS and MGLS

                    Sample of all countries (n = 169)                 OECD countries (n = 35)
Specification       (1)      (2)      (3)      (4)      (5)           (6)      (7)      (8)

Panel A: OLS
Aging               0.335    1.036    1.162    0.773    1.703         -0.262   0.042    1.186
                   (0.210)  (0.257)  (0.276)  (0.322)  (0.411)       (0.352)  (0.346)  (0.458)
Initial GDP                 -0.153   -0.138   -0.156   -0.190                 -0.205   -0.260
                            (0.039)  (0.042)  (0.046)  (0.045)                (0.072)  (0.092)

Panel B: MGLS (covariate of $\sigma^2(\cdot)$ = Aging)
Aging               0.387    1.098    1.191    0.751    0.414         -0.391   -0.029   -0.458
                   (0.189)  (0.187)  (0.205)  (0.267)  (0.101)       (0.247)  (0.284)  (0.092)
                   [0.150]  [0.179]  [0.198]  [0.310]  [0.472]       [0.190]  [0.340]  [0.451]
Initial GDP                 -0.164   -0.155   -0.168   -0.079                 -0.190   -0.297
                            (0.027)  (0.029)  (0.030)  (0.009)                (0.069)  (0.025)
                            [0.031]  [0.032]  [0.029]  [0.046]                [0.069]  [0.141]

Panel C: MGLS (covariate of $\sigma^2(\cdot)$ = Initial Ratio)
Aging               0.065    0.771    0.894    0.574    0.483         -0.501   -0.344   -0.585
                   (0.196)  (0.223)  (0.231)  (0.235)  (0.142)       (0.270)  (0.219)  (0.226)
                   [0.196]  [0.249]  [0.262]  [0.272]  [0.603]       [0.231]  [0.213]  [0.758]
Initial GDP                 -0.164   -0.141   -0.159   -0.080                 -0.148   -0.379
                            (0.031)  (0.035)  (0.041)  (0.012)                (0.056)  (0.096)
                            [0.035]  [0.037]  [0.046]  [0.055]                [0.065]  [0.288]

Panel D: MGLS (covariates of $\sigma^2(\cdot)$ = All)
Aging               0.285    1.064    1.188    0.810    0.494         -0.391   0.062    -0.268
                   (0.221)  (0.265)  (0.281)  (0.289)  (0.100)       (0.247)  (0.274)  (0.186)
                   [0.206]  [0.249]  [0.271]  [0.323]  [0.442]       [0.190]  [0.340]  [0.972]
Initial GDP                 -0.152   -0.136   -0.146   -0.079                 -0.203   -0.250
                            (0.030)  (0.033)  (0.041)  (0.009)                (0.072)  (0.049)
                            [0.033]  [0.036]  [0.044]  [0.051]                [0.072]  [0.197]

Note: For all specifications from (1) to (8), GDP is the dependent variable. Column 1 shows the result of the simple regression of GDP on Aging. Column 2 shows the result with an additional regressor, the initial log GDP per worker in 1990. Column 3, in addition, includes the initial demographic information, the ratio of the population above 50 to those between 20 and 49 in 1990 (denoted by Initial Ratio), and the population in 1990. Column 4 additionally uses dummies for seven regions: Latin America, East Asia, South Asia, Africa, North Africa and the Middle East, Eastern Europe and Central Asia, and Developed Countries. Columns (6), (7) and (8) report the results for OECD countries using specifications (1), (3) and (5), respectively. Panel A reproduces the results by Acemoglu and Restrepo (2017). For Panel A, heteroskedasticity robust standard errors are presented in parentheses. Panels B, C, and D present the results by MGLS. Panels B and C show the results where the conditional error variance functions depend on Aging and Initial Ratio, respectively. Panel D reports the results where the conditional error variance functions depend on all exogenous variables except the regional dummies.
For Columns (1)-(4) and (6)-(7) of Panels B and C, standard errors based on the formula in Theorem 1 are presented in parentheses, while those based on the formula in Remark 3 are presented in square brackets. For Columns (1)-(4) and (6)-(7) of Panel D, standard errors based on the formula in Theorem 2 are presented in parentheses, while those based on the formula analogous to Remark 3 are presented in square brackets. For Columns (5) and (8) of Panels B, C, and D, standard errors are based on the formulae analogous to Remark 3.

Appendix A. Proof of lemma and theorem in Section 2

Notation. In this section, we use the following notation. For a function $f(\cdot)$, we let $\|f\|_\infty = \sup_{x \in \mathcal{X}} |f(x)|$ be the sup-norm and $\|f\|_{2,P} = \sqrt{\int |f(x)|^2\, dP}$ be the $L_2(P)$ norm; when there is no confusion in the context, we use the same notation for a vector $a = (a_1, \ldots, a_k)'$: we let $\|a\|_\infty = \max_{j \in \{1, \ldots, k\}} |a_j|$ be the sup-norm and $\|a\| = \sqrt{\sum_{j=1}^k a_j^2}$ be the Euclidean norm. Let $D_L^A[f](a)$ be the left derivative of the greatest convex minorant of a function $f$ evaluated at $a \in A$, $\mathbb{P}_n$ be the empirical measure of $\{Y_i, X_i, Z_i\}_{i=1}^n$, $\mathbb{G}_n$ be the empirical process, i.e., $\mathbb{G}_n f = \frac{1}{\sqrt{n}} \sum_{i=1}^n \{f(X_i) - E[f(X_i)]\}$, $\|\mathbb{G}_n\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |\mathbb{G}_n f|$, and $I_A(x) = I\{x \in A\}$. Let $\tau_0(x) = \sigma^2(x)$, $\tau_0'(x_L)$ be the right derivative of $\tau_0$ at $x_L$, $\hat\tau(x) = \hat\sigma^2(x)$, $\mathcal{W}$ be the support of $W := (1, X, Z')'$, $F(x)$ be the distribution function of $X$, $F_n(x) = \frac{1}{n}\sum_{i=1}^n I\{X_i \le x\}$, and $M_n(x) = \frac{1}{n}\sum_{i=1}^n \hat U_i^2 I\{X_i \le x\}$. For $a, b \in \mathbb{R}$, let $a \wedge b$ denote $\min\{a, b\}$, and $a \lesssim b$ denote that there exists a positive constant $C$ such that $a \le C \cdot b$. Let $\dim(w)$ be the dimension of a vector $w$.

A.1. Proof of Lemma 1.
Since $\hat U_j = Y_j - W_j'\hat\theta_{OLS}$ is the OLS residual, Assumptions A1-A2 and $\hat\theta_{OLS} - \theta = O_p(n^{-1/2})$ imply $\hat U_j^2 - U_j^2 = O_p(n^{-1/2}\log n) = o_p(n^{-1/3})$ uniformly over $j = 1, \ldots, n$. To see this, decompose
$$\hat U_j^2 - U_j^2 = (Y_j - W_j'\hat\theta_{OLS})^2 - (Y_j - W_j'\theta)^2 = W_j'(\hat\theta_{OLS} + \theta) \cdot W_j'(\hat\theta_{OLS} - \theta) - 2 W_j'\theta \cdot W_j'(\hat\theta_{OLS} - \theta) - 2 U_j W_j'(\hat\theta_{OLS} - \theta) =: I_j + II_j + III_j.$$
For $I_j$, note that
$$I_j = [W_j'(\hat\theta_{OLS} - \theta)]^2 + 2 W_j'\theta \cdot W_j'(\hat\theta_{OLS} - \theta) \le \|W_j\|^2 \|\hat\theta_{OLS} - \theta\|^2 + 2 \|W_j\| \cdot \|\theta\| \cdot \|W_j\| \cdot \|\hat\theta_{OLS} - \theta\| \le R^2 \|\hat\theta_{OLS} - \theta\|^2 + 2 R^2 \|\theta\| \cdot \|\hat\theta_{OLS} - \theta\| = O_p(n^{-1/2}), \quad (A.1)$$
where $R$ is the constant defined in Assumption A1. The first inequality follows from the Cauchy-Schwarz inequality, the second inequality follows from $\|W_j\| \le R$ (by Assumption A1), and the last equality follows from $\hat\theta_{OLS} - \theta = O_p(n^{-1/2})$. Note that in the second inequality, the upper bound no longer depends on the index $j$, so we have $\max_j |I_j| = O_p(n^{-1/2})$. For $II_j$, using the same reasoning as for the first inequality in (A.1), we have $\max_j |II_j| = O_p(n^{-1/2})$; note that here we only consider the second term following the first inequality of (A.1). For $III_j$, the same argument as above yields $\max_j |W_j'(\hat\theta_{OLS} - \theta)| = O_p(n^{-1/2})$. Furthermore, by Assumption A2 and a similar argument after equation (7.11) on p. 3297 of Balabdaoui, Durot and Jankowski (2019) (BDJ hereafter), we have $\max_{1 \le j \le n} |U_j^2| = O_p(\log n)$. By the fact that $\max_{1 \le j \le n} |U_j| \le \max_{1 \le j \le n} |U_j^2|$ if $\max_{1 \le j \le n} |U_j| \ge 1$, we have
$$\max_{1 \le j \le n} |U_j| = O_p(\log n). \quad (A.2)$$
In the case of $\max_{1 \le j \le n} |U_j| < 1$, $\max_{1 \le j \le n} |U_j| = O_p(\log n)$ holds trivially.
Combining (A.2) and $\max_j |W_j'(\hat\theta_{OLS} - \theta)| = O_p(n^{-1/2})$, we have $\max_j |III_j| = O_p(n^{-1/2}\log n)$. Consequently, we have
$$\max_j |\hat U_j^2 - U_j^2| \le \max_j |I_j| + \max_j |II_j| + \max_j |III_j| = O_p(n^{-1/2}\log n) = o_p(n^{-1/3}). \quad (A.3)$$
Furthermore, Assumption A3 guarantees $q_n^* - x_L = O(n^{-1/3})$ (by an expansion of $q_n^* = F^{-1}(n^{-1/3})$ for the quantile function $F^{-1}(\cdot)$ of $X$), and we can define $c^* = \lim_{n\to\infty} n^{1/3}(q_n^* - x_L) = \left.\frac{dF^{-1}(q)}{dq}\right|_{q \downarrow 0} \in (0, \infty)$.

Now, we analyze $n^{1/3}\{\hat\tau(q_n^*) - \tau_0(x_L)\}$. The term $n^{1/3}\{\hat\tau(q_n) - \tau_0(q_n)\}$ will be addressed in the final step of this subsection. Pick any $m > 0$. Let
$$Z_{n1}(t) = n^{2/3}\left[\{n^{-1/3} m + \tau_0(x_L)\} F_n(x_L + t(q_n^* - x_L)) - M_n(x_L + t(q_n^* - x_L))\right].$$
Observe that
$$P\left(n^{1/3}\{\hat\tau(q_n^*) - \tau_0(x_L)\} \le m\right) = P\left(\operatorname*{arg\,max}_{s \in [x_L, x_U]} \left[\{n^{-1/3} m + \tau_0(x_L)\} F_n(s) - M_n(s)\right] \ge q_n^*\right) = P\left(\operatorname*{arg\,max}_{t \in [0, (x_U - x_L)/(q_n^* - x_L)]} n^{-2/3} Z_{n1}(t) \ge 1\right), \quad (A.4)$$
where the first equality follows from the switch relation (see the review by Groeneboom and Jongbloed, 2014), and the second equality follows from the change of variables $s = x_L + t(q_n^* - x_L)$ and its implication $s \ge q_n^* \Leftrightarrow t \ge 1$. Let $\hat U(y, w) = y - w'\hat\theta_{OLS}$ and
$$g_{n,t}(y, w) = n^{1/6}\{\tau_0(x_L) - \hat U(y, w)^2\}\, I_{[x_L, x_L + t(q_n^* - x_L)]}(x).$$
We decompose
$$Z_{n1}(t) = \sqrt{n}(\mathbb{P}_n - P) g_{n,t} + n^{2/3} E[\{\tau_0(x_L) - \hat U(Y, W)^2\} I_{[x_L, x_L + t(q_n^* - x_L)]}(X)] + n^{1/3} m \{F_n(x_L + t(q_n^* - x_L)) - F(x_L + t(q_n^* - x_L))\} + n^{1/3} m F(x_L + t(q_n^* - x_L)) =: Z_{n1}^a(t) + Z_{n1}^b(t) + Z_{n1}^c(t) + Z_{n1}^d(t).$$

Analysis of $Z_{n1}^a(t)$. We verify the conditions of van der Vaart (2000, Theorem 19.28).
Define the class of random functions (depending on $\hat\theta_{OLS}$):
$$\mathcal{G}_{n1} = \left\{g_{n,t}(y, w) = n^{1/6}(\tau_0(x_L) - \hat U(y, w)^2)\, I_{[x_L, x_L + t(q_n^* - x_L)]}(x) : t \in [0, K]\right\},$$
for $K \in (0, \infty)$, where the $n$ in the subscript indicates the dependence on both the scaling parameter $n^{1/6}$ and $\hat\theta_{OLS}$. By van der Vaart (2000, Example 19.6) we know that for a bracket size $\epsilon$, $\mathcal{G}_{n1}$ has entropy with bracketing of order $\log(1/\epsilon)$. Thus, $\mathcal{G}_{n1}$ satisfies the entropy condition of van der Vaart (2000, Theorem 19.28).

For each $t, s \in [0, K]$, note that
$$\mathrm{Cov}(g_{n,t}, g_{n,s}) = n^{1/3} E[\{\hat U(Y, W)^2 - \tau_0(x_L)\}^2 I_{[x_L, x_L + (t \wedge s)(q_n^* - x_L)]}(X)] + o_p(1) = n^{1/3} E[\{U^2 - \tau_0(x_L)\}^2 I_{[x_L, x_L + (t \wedge s)(q_n^* - x_L)]}(X)] + o_p(1) = n^{1/3} E\left[[\varepsilon^2 + \{\tau_0(X) - \tau_0(x_L)\}^2]\, I_{[x_L, x_L + (t \wedge s)(q_n^* - x_L)]}(X)\right] + o_p(1) = n^{1/3} \int_{x_L}^{x_L + (t \wedge s)(q_n^* - x_L)} [\sigma^2_\varepsilon(x) + \{\tau_0(x) - \tau_0(x_L)\}^2] f_X(x)\, dx + o_p(1) = [\sigma^2_\varepsilon(\xi_n) + \{\tau_0(\xi_n) - \tau_0(x_L)\}^2] f_X(\xi_n)\, c^* (t \wedge s) + o_p(1) = \sigma^2_\varepsilon(x_L) f_X(x_L)\, c^* (t \wedge s) + o_p(1), \quad (A.5)$$
for $\xi_n \in (x_L, x_L + (t \wedge s) q_n^*)$. The first equality follows from $q_n^* - x_L = O(n^{-1/3})$. In the second equality, we replace the estimated $\hat U^2$ with the unobservable $U^2$. By (A.3), the discrepancy between $\hat U^2$ and $U^2$ converges more rapidly than $n^{-1/3}$, and the factor $I_{[x_L, x_L + (t \wedge s)(q_n^* - x_L)]}(X)$ further refines this rate. Consequently, under Assumptions A1 and A2, the impact of substituting $U^2$ for $\hat U^2$ in the second line is $o_p(1)$.
The third equality follows from the definition $\varepsilon = U^2 - \tau_0(X)$ and $E[\varepsilon | X] = 0$, the fourth equality follows from the law of iterated expectations, the fifth equality follows from a Taylor expansion, and the last equality follows from $c^* = \lim_{n\to\infty} n^{1/3}(q_n^* - x_L)$ and the right-continuity of $\sigma_\varepsilon^2(\cdot)$ and $\tau_0(\cdot)$ at $x_L$. Similarly, we have $\operatorname{Var}(g_{n,t}) = \sigma_\varepsilon^2(x_L) f_X(x_L) c^* t + o_p(1)$.

We next consider the envelope function of the class $\mathcal{G}_{n1}$, that is,
$$G_{n1}(y,w) = n^{1/6} |\tau_0(x_L) - \hat U(y,w)^2| \cdot I_{[x_L,\, x_L + K(q_n^*-x_L)]}(x).$$
We can see that $G_{n1}$ is square integrable, since arguments similar to (A.5) yield
$$\begin{aligned} E[G_{n1}^2(Y,W)] &= n^{1/3} E[|\tau_0(x_L) - \hat U(Y,W)^2|^2 \cdot I_{[x_L,\, x_L+K(q_n^*-x_L)]}(X)] \\ &= n^{1/3} E[|\tau_0(x_L) - U^2|^2 \cdot I_{[x_L,\, x_L+K(q_n^*-x_L)]}(X)] + o_p(1) \\ &= n^{1/3} E[[\varepsilon^2 + \{\tau_0(X) - \tau_0(x_L)\}^2] \cdot I_{[x_L,\, x_L+K(q_n^*-x_L)]}(X)] + o_p(1) \\ &= n^{1/3} \int_{x_L}^{x_L+K(q_n^*-x_L)} [\sigma_\varepsilon^2(x) + \{\tau_0(x) - \tau_0(x_L)\}^2] f_X(x)\, dx + o_p(1) = O_p(1), \end{aligned} \tag{A.6}$$
and thus the Lindeberg condition can be verified by Assumption A2: for any $\zeta > 0$ and some $\delta > 0$,
$$\begin{aligned} E[G_{n1}^2\, I\{G_{n1} > \zeta\sqrt n\}] &\le \frac{n^{(2+\delta)/6}}{\zeta^\delta n^{\delta/2}} E[|\tau_0(x_L) - \hat U(Y,W)^2|^{2+\delta} \cdot I_{[x_L,\, x_L+K(q_n^*-x_L)]}(X)] \\ &= \frac{n^{(2+\delta)/6}}{\zeta^\delta n^{\delta/2}} E[|\tau_0(x_L) - U^2|^{2+\delta} \cdot I_{[x_L,\, x_L+K(q_n^*-x_L)]}(X)] + o_p(1) \\ &= O(n^{-\delta/3}) + o_p(1) = o_p(1), \end{aligned} \tag{A.7}$$
where the inequality follows from the same argument as in the proof of Markov's inequality, the first equality follows from $\hat\theta_{OLS} - \theta = O_p(n^{-1/2})$ and Assumptions A1-A2, and the second equality follows from an argument similar to (A.6).
Furthermore, as $\delta_n \to 0$, we obtain
$$\begin{aligned} \sup_{|t-s|\le\delta_n} E|g_{n,t} - g_{n,s}|^2 &= n^{1/3} \sup_{|t-s|\le\delta_n} E[\{\hat U(Y,W)^2 - \tau_0(x_L)\}^2 I_{[x_L,\, x_L+|t-s| q_n^*]}(X)] \\ &= n^{1/3} \sup_{|t-s|\le\delta_n} E[[\varepsilon^2 + \{\tau_0(X) - \tau_0(x_L)\}^2] \cdot I_{[x_L,\, x_L+|t-s| q_n^*]}(X)] + o_p(\delta_n) \\ &= O_p(\delta_n) = o_p(1). \end{aligned} \tag{A.8}$$
By (A.5)-(A.8), we can apply van der Vaart (2000, Theorem 19.28), which implies, for each $K \in (0,\infty)$,
$$Z_{n1}^a(t) \stackrel{d}{\to} \sqrt{\sigma_\varepsilon^2(x_L) f_X(x_L) c^*}\, \mathbb{W}_t \quad \text{in } l^\infty[0,K]. \tag{A.9}$$

Analysis of $Z_{n1}^b(t)$. Observe that
$$\begin{aligned} Z_{n1}^b(t) &= n^{2/3} E[\{\tau_0(x_L) - U^2\} I_{[x_L,\, x_L+t(q_n^*-x_L)]}(X)] + n^{2/3} E[(U^2 - \hat U(Y,W)^2) I_{[x_L,\, x_L+t(q_n^*-x_L)]}(X)] \\ &= n^{2/3} \int_{x_L}^{x_L+t(q_n^*-x_L)} \{\tau_0(x_L) - \tau_0(F^{-1}(F(x)))\}\, dF(x) + o_p(1) \\ &= n^{2/3} \int_{F(x_L)}^{F(x_L+t(q_n^*-x_L))} \{\tau_0(x_L) - \tau_0(F^{-1}(v))\}\, dv + o_p(1) \\ &= -n^{2/3} \int_{F(x_L)}^{F(x_L+t(q_n^*-x_L))} \tau_0'(x_L)\{F^{-1}(v) - F^{-1}(F(x_L))\}\, dv + o_p(1) \\ &= -n^{2/3} \int_{F(x_L)}^{F(x_L+t(q_n^*-x_L))} \tau_0'(x_L)\, \frac{v - F(x_L)}{f_X(x_L)}\, dv + o_p(1) \\ &= -n^{2/3}\, \tau_0'(x_L)\, \frac{\{F(x_L+t(q_n^*-x_L)) - F(x_L)\}^2}{2 f_X(x_L)} + o_p(1) \\ &= -\frac{\tau_0'(x_L)\, t^2 (c^*)^2 f_X(x_L)}{2} + o_p(1) \end{aligned} \tag{A.10}$$
holds uniformly over $t \in [0,K]$, where the second equality follows from $E[\{U^2 - \hat U(Y,W)^2\} \cdot I_{[x_L,\, x_L+t(q_n^*-x_L)]}(X)] = o_p(n^{-2/3})$, the third equality follows from the change of variables $v = F(x)$, the fourth equality follows from a Taylor expansion, the fifth equality follows from $F^{-1}(v) - x_L = \frac{1}{f_X(x_L)}(v - F(x_L)) + o(v - F(x_L))$, the sixth equality follows from evaluating the integral, and the last equality follows from a Taylor expansion and $c^* = \lim_{n\to\infty} n^{1/3}(q_n^* - x_L)$.

Analysis of $Z_{n1}^c(t)$.
By Kim and Pollard (1990, Maximal Inequality 3.1),
$$E\left[\sup_{t\in[0,K]} |\mathbb{F}_n(x_L + t(q_n^*-x_L)) - F(x_L + t(q_n^*-x_L))|\right] \le n^{-1/2} J \sqrt{P G_n^2}$$
holds for some constant $J \in (0,\infty)$. Here $G_n$ is the envelope of the set of indicator functions, so $P G_n^2 \le 1$. As a result,
$$Z_{n1}^c(t) \le n^{1/3} n^{-1/2}\, m J \sqrt{P G_n^2} = o(1), \tag{A.11}$$
uniformly over $t \in [0,K]$.

Analysis of $Z_{n1}^d(t)$. A Taylor expansion yields
$$Z_{n1}^d(t) = n^{1/3}\, m\, F(x_L + t(q_n^*-x_L)) = m\, t\, f_X(x_L)\, c^* + o(1), \tag{A.12}$$
uniformly over $t \in [0,K]$, for every $K < \infty$. Combining (A.9)-(A.12), it holds that for each $0 < K < \infty$,
$$Z_{n1}(t) \stackrel{d}{\to} Z_1(t) := \sqrt{\sigma_\varepsilon^2(x_L) f_X(x_L) c^*}\, \mathbb{W}_t - \frac{\tau_0'(x_L)\, t^2 (c^*)^2 f_X(x_L)}{2} + m\, t\, f_X(x_L)\, c^* \quad \text{in } l^\infty[0,K]. \tag{A.13}$$
We now verify the conditions of the argmax continuous mapping theorem (Kim and Pollard, 1990). Note that for each $t \ne s$,
$$\operatorname{Var}(Z_1(s) - Z_1(t)) = \sigma_\varepsilon^2(x_L) f_X(x_L)\, c^* |t-s| \ne 0.$$
By Kim and Pollard (1990), the process $t \mapsto Z_1(t)$ achieves its maximum a.s. at a unique point. Consider extended versions of $Z_{n1}$ and $Z_1$ on the real line:
$$\tilde Z_{n1}(t) = \begin{cases} Z_{n1}(t), & t \ge 0, \\ t, & t < 0, \end{cases} \qquad \tilde Z_1(t) = \begin{cases} Z_1(t), & t \ge 0, \\ t, & t < 0. \end{cases}$$
It holds that $\tilde Z_{n1}(t) \stackrel{d}{\to} \tilde Z_1(t)$, and an argument similar to Lemma SM.2.1 (ii) in Babii and Kumar (2023) yields that the maximizer of $\tilde Z_{n1}(t)$ is uniformly tight. Therefore, by Kim and Pollard (1990, Theorem 2.7),
$$\begin{aligned} P\left(n^{1/3}\{\hat\tau(q_n^*) - \tau_0(x_L)\} \le m\right) &\to P\left(\arg\max_{t\ge 0} Z_1(t) \ge 1\right) \\ &= P\left(\arg\max_{t\ge 0}\left[\sqrt{\frac{\sigma_\varepsilon^2(x_L)}{c^* f_X(x_L)}}\, \mathbb{W}_t - \frac{\tau_0'(x_L)\, t^2 c^*}{2} + mt\right] \ge 1\right) \\ &= P\left(D_{L[0,\infty)}\left(\sqrt{\frac{\sigma_\varepsilon^2(x_L)}{c^* f_X(x_L)}}\, \mathbb{W}_t + \frac{\tau_0'(x_L)\, t^2 c^*}{2}\right)(1) \le m\right), \end{aligned}$$
where the second equality follows from the switch relation and the symmetry of the process $\mathbb{W}_t$.
Thus, we have
$$n^{1/3}\{\hat\tau(q_n^*) - \tau_0(x_L)\} \stackrel{d}{\to} D_{L[0,\infty)}\left(\sqrt{\frac{\sigma_\varepsilon^2(x_L)}{c^* f_X(x_L)}}\, \mathbb{W}_t + \frac{\tau_0'(x_L)\, t^2 c^*}{2}\right)(1), \tag{A.14}$$
which also implies
$$\begin{aligned} n^{1/3}\{\hat\tau(q_n^*) - \tau_0(q_n^*)\} &\stackrel{d}{\to} D_{L[0,\infty)}\left(\sqrt{\frac{\sigma_\varepsilon^2(x_L)}{c^* f_X(x_L)}}\, \mathbb{W}_t + \frac{\tau_0'(x_L)\, t^2 c^*}{2}\right)(1) - \lim_{n\to\infty} n^{1/3}\{\tau_0(q_n^*) - \tau_0(x_L)\} \\ &\stackrel{d}{\sim} D_{L[0,\infty)}\left(\sqrt{\frac{\sigma_\varepsilon^2(x_L)}{c^* f_X(x_L)}}\, \mathbb{W}_t + \frac{\tau_0'(x_L)\, t^2 c^*}{2} - \tau_0'(x_L)\, c^* t\right)(1), \end{aligned} \tag{A.15}$$
where the distributional relation follows from the fact that $D_{L[0,\infty)}$ is a linear operator with respect to a linear function of $t$.

Finally, we analyze $n^{1/3}\{\hat\tau(q_n) - \tau_0(q_n)\}$. Recall that $q_n$ is the $(n^{-1/3})$-th sample quantile of $X$. Assumption A3 guarantees $q_n - q_n^* = O_p(n^{-1/2}) = o_p(n^{-1/3})$, which also implies $\operatorname{plim}_{n\to\infty} n^{1/3}(q_n - x_L) = \lim_{n\to\infty} n^{1/3}(q_n^* - x_L) = c^*$. Thus, the same argument as for (A.14) can be applied to show that the result in (A.14) holds true even if we replace $q_n^*$ with $q_n$. Therefore, the conclusion follows.

A.2. Proof of Theorem 1. By the definitions of the estimators, it holds that
$$\sqrt n(\hat\theta - \theta) = \left(\frac{1}{n}\sum_{i: x_i > q_n} \hat\sigma_i^{-2} W_i W_i'\right)^{-1} \frac{1}{\sqrt n}\sum_{i: x_i > q_n} \hat\sigma_i^{-2} W_i U_i, \qquad \sqrt n(\hat\theta_{IGLS} - \theta) = \left(\frac{1}{n}\sum_{i=1}^n \sigma_i^{-2} W_i W_i'\right)^{-1} \left(\frac{1}{\sqrt n}\sum_{i=1}^n \sigma_i^{-2} W_i U_i\right).$$
Thus it is sufficient for the conclusion to show
$$T_1 := \frac{1}{\sqrt n}\sum_{i: x_i > q_n} \hat\sigma_i^{-2} W_i U_i - \frac{1}{\sqrt n}\sum_{i=1}^n \sigma_i^{-2} W_i U_i \stackrel{p}{\to} 0, \qquad T_2 := \frac{1}{n}\sum_{i: x_i > q_n} \hat\sigma_i^{-2} W_i W_i' - \frac{1}{n}\sum_{i=1}^n \sigma_i^{-2} W_i W_i' \stackrel{p}{\to} 0.$$

A.2.1. The concentration of $T_1$. Decompose
$$T_1 = \frac{1}{\sqrt n}\sum_{i: x_i > q_n} (\hat\sigma_i^{-2} - \sigma_i^{-2}) W_i U_i - \frac{1}{\sqrt n}\sum_{i: x_i \le q_n} \sigma_i^{-2} W_i U_i =: T_{11} - T_{12}.$$
We first consider $T_{12}$. For any $h \in \{1 : \dim(W)\}$, let $W_i^h$ and $T_{12}^h$ be the $h$-th elements of $W_i$ and $T_{12}$, respectively. Note that $E[T_{12}^h | q_n] = 0$ by $E[U|W] = 0$. Also, we have $\operatorname{Var}(T_{12}^h | q_n) \stackrel{p}{\to} 0$.
To see this, decompose $\operatorname{Var}(T_{12}^h | q_n) = I^h - n\,(II^h)^2$, where $I^h = \frac{1}{n} E\big[\big(\sum_{i=1}^n I\{X_i \le q_n\}\sigma_i^{-2} W_i^h U_i\big)^2 \,\big|\, q_n\big]$ and $II^h = E[I\{X_i \le q_n\}\sigma_i^{-2} W_i^h U_i | q_n]$. For $I^h$, note that
$$\begin{aligned} I^h &= \frac{1}{n} E\left[E\left[\left(\sum_{i=1}^n I\{X_i \le q_n\}\sigma_i^{-2} W_i^h U_i\right)^2 \,\middle|\, \mathbf{W}\right] \middle|\, q_n\right] = E\left[E[(I\{X_i \le q_n\}\sigma_i^{-2} W_i^h U_i)^2 | \mathbf{W}] \mid q_n\right] \\ &= E[I\{X \le q_n\}\sigma^{-2}(X)(W^h)^2 | q_n] \le R^2 \sigma^{-2}(x_L)\, E[I\{X \le q_n\} | q_n] \stackrel{p}{\to} 0, \end{aligned}$$
where $\mathbf{W} = (W_1,\ldots,W_n)'$. The first equality follows from the law of iterated expectations and the fact that $q_n$ is a function of $\mathbf{W}$, the second equality follows from $E[U|W] = 0$ and $\{U_i\}_{i=1}^n$ being i.i.d., the third equality follows because, conditional on $\mathbf{W}$, $I\{X_i \le q_n\}(\sigma_i^{-2} W_i^h)^2$ is treated as fixed, the inequality follows from Assumptions A1 and A2, and the convergence follows from $q_n \stackrel{p}{\to} x_L$. For $II^h$, note that
$$II^h = E[I\{X_i \le q_n\}\sigma_i^{-2} W_i^h E[U_i | \mathbf{W}] \mid q_n] = 0,$$
where the first equality follows from the law of iterated expectations and the fact that $q_n$ is a function of $\mathbf{W}$, and the second equality follows from $E[U_i | \mathbf{W}] = E[U_i | W_i] = 0$. Since $E[T_{12}^h | q_n] = 0$ and $\operatorname{Var}(T_{12}^h | q_n) \stackrel{p}{\to} 0$ hold for every $h$, we conclude that $T_{12} \stackrel{p}{\to} 0$.

To proceed, we will utilize Lemma 3 below; its proof can be found at the end of Appendix A.2. Recall that earlier in this appendix we relabeled $\sigma^2(\cdot)$ as $\tau_0(\cdot)$, and $\hat\tau$ denotes the isotonic estimator of $\sigma^2(\cdot)$. Additionally, with some abuse of notation, we use $w^h$ to denote the $h$-th element of a vector $w$.

Lemma 3.
Under Assumptions A1-A3, it holds that
(i): $\|\hat\tau\|_\infty = O_p(\log n)$,
(ii): $\|\hat\tau - \tau_0\|_{2,P}^2 = O_p((\log n)^2 n^{-2/3})$,
(iii): $E[\|\mathbb{G}_n\|_{\mathcal{F}_n}] \le A\nu^2$ holds for any constants $A > 0$ and $\nu > 0$ and all sufficiently large $n$, where $\mathcal{F}_n$ is the function class defined as
$$\mathcal{F}_n = \left\{ f_n(w,u) = I\{x > q_n\}\left(\frac{1}{\tau(x)} - \frac{1}{\tau_0(x)}\right) w^h u : \begin{array}{l} \tau \ge 0 \text{ is monotone increasing on } \mathcal{X},\ \|\tau\|_\infty \le C\log n, \\ \|\tau - \tau_0\|_{2,P}^2 \le C r_n,\ I\{x > q_n\}/\tau(x) \le 1/K_0,\ h \in \{1:\dim(w)\} \end{array} \right\}, \tag{A.16}$$
with $C$ and $K_0$ being some positive constants, and $r_n = (\log n)^2 n^{-2/3}$.

Now we focus on $T_{11}$. Since the proofs are similar across components, we only present the proof for the $h$-th element of $T_{11}$, i.e., that for any constant $A > 0$,
$$P\{|\mathbb{G}_n \hat f| \ge A\} \to 0, \tag{A.17}$$
where $\hat f(w,u) = I\{x > q_n\}\big(\frac{1}{\hat\tau(x)} - \frac{1}{\tau_0(x)}\big) w^h u$. To this end, we set $\tau_0(x_L) = C_0 = 2K_0 > 0$. It holds that for any $A > 0$ and $\nu > 0$, there exists a positive constant $C$ such that
$$P\{|\mathbb{G}_n \hat f| \ge A\} \le P\left\{|\mathbb{G}_n \hat f| \ge A,\ \|\hat\tau\|_\infty \le C\log n,\ \|\hat\tau - \tau_0\|_{2,P}^2 \le C r_n,\ \frac{I\{x > q_n\}}{\hat\tau(x)} \le \frac{1}{K_0}\right\} + \frac{\nu}{2} \le \frac{E\|\mathbb{G}_n\|_{\mathcal{F}_n}}{A} + \frac{\nu}{2} \le \nu, \tag{A.18}$$
for all sufficiently large $n$, where the first inequality follows from Lemma 1, Lemma 3 (i)-(ii), and the fact that $\hat\tau$ is monotone increasing (so that the lower bound at the truncation point is the uniform lower bound). Specifically, for any $\nu > 0$ we can find $C > 0$ and a positive integer $n_0$ such that for any integer $n > n_0$, it holds that (a) $P\{\|\hat\tau\|_\infty > C\log n\} < \nu/6$, (b) $P\{\|\hat\tau - \tau_0\|_{2,P}^2 > C r_n\} < \nu/6$, and (c) $P\{I\{x > q_n\}/\hat\tau(x) > 1/K_0\} < \nu/6$. Parts (a) and (b) are ensured by Lemma 3 (i) and (ii), respectively; part (c) is guaranteed by Lemma 1. As a result,
$$P\left(\{\|\hat\tau\|_\infty > C\log n\} \text{ or } \{\|\hat\tau - \tau_0\|_{2,P}^2 > C r_n\} \text{ or } \{I\{x > q_n\}/\hat\tau(x) > 1/K_0\}\right) < \frac{\nu}{2}.$$
In the case of $\lim_{x\downarrow x_L} \frac{d\sigma^2(x)}{dx} = 0$, part (c) remains valid since $\hat\tau(x)$ converges to $\tau_0$ at a faster (the $\sqrt n$-) rate, and then the first inequality of (A.18) holds without invoking Lemma 1. The second inequality of (A.18) follows from Markov's inequality and the definition of $\mathcal{F}_n$ given in (A.16). The last inequality follows from Lemma 3 (iii). Since $\nu$ can be arbitrarily small, we obtain (A.17) and the conclusion follows.

A.2.2. Proof of $T_2 = o_p(1)$. Note that
$$T_2 = \frac{1}{n}\sum_{i: x_i > q_n} \hat\sigma_i^{-2} W_i W_i' - \frac{1}{n}\sum_{i=1}^n \sigma_i^{-2} W_i W_i' = \frac{1}{n}\sum_{i: x_i > q_n} (\hat\sigma_i^{-2} - \sigma_i^{-2}) W_i W_i' - \frac{1}{n}\sum_{i: x_i \le q_n} \sigma_i^{-2} W_i W_i' =: T_{21} - T_{22}.$$
First, we have $T_{22} \stackrel{p}{\to} 0$ since $q_n \stackrel{p}{\to} x_L$. For $T_{21}$, let $s_n$ be the $(1 - n^{-1/3})$-th sample quantile of $\{X_i\}_{i=1}^n$. By employing arguments similar to those in the proof of Lemma 1, we have $\hat\sigma^2(s_n) - \sigma^2(s_n) = O_p(n^{-1/3})$. Using reasoning akin to, yet simpler than, that in the proof of Lemma 1, we can establish that for any $x \in (q_n, s_n)$, it holds that $\hat\sigma^2(x) - \sigma^2(x) = O_p(n^{-1/3})$. Combining these results with the monotonicity of both $\hat\sigma^2(\cdot)$ and $\sigma^2(\cdot)$, we can conclude that $\sup_{x\in[q_n,s_n]} |\hat\sigma^2(x) - \sigma^2(x)| = O_p(n^{-1/3})$, i.e., $\hat\sigma^2(x)$ is uniformly consistent within the trimmed domain $[q_n, s_n]$ (the proof here resembles the one given for the Glivenko-Cantelli theorem on the uniform consistency of the empirical distribution function; see, for example, the proof of Theorem 19.1 in van der Vaart, 2000).
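As a numerical aside, the object analyzed in this proof — the feasible GLS estimator that weights by an isotonic estimate $\hat\sigma^2$ of the conditional variance, computed from squared OLS residuals and trimmed at the $(n^{-1/3})$-th sample quantile $q_n$ — can be sketched as follows. This is an illustrative reimplementation under assumed names (`pava`, `isotonic_gls`), not the authors' code; the pool-adjacent-violators routine is the textbook algorithm for isotonic least squares.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: isotonic (non-decreasing) least-squares fit."""
    blocks = []  # list of [block mean, block size]
    for val in np.asarray(y, dtype=float):
        blocks.append([val, 1.0])
        # merge backwards while the non-decreasing constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return np.concatenate([np.full(int(w), m) for m, w in blocks])

def isotonic_gls(y, W, x):
    """Feasible GLS sketch: isotonic fit of squared OLS residuals on x gives
    sigma2_hat; weighted LS is then run on the trimmed sample {i : x_i > q_n},
    with q_n the (n^{-1/3})-th sample quantile of x."""
    n = len(y)
    theta_ols, *_ = np.linalg.lstsq(W, y, rcond=None)
    u2 = (y - W @ theta_ols) ** 2                 # squared OLS residuals
    order = np.argsort(x)
    sigma2 = np.empty(n)
    sigma2[order] = pava(u2[order])               # monotone variance estimate
    q_n = np.quantile(x, n ** (-1 / 3))           # boundary trimming point
    keep = x > q_n
    Wk, yk = W[keep], y[keep]
    s2 = np.maximum(sigma2[keep], 1e-8)           # guard against tiny weights
    A = (Wk / s2[:, None]).T @ Wk
    b = (Wk / s2[:, None]).T @ yk
    return np.linalg.solve(A, b)
```

On data with variance increasing in $x$, the trimmed isotonic-weighted estimator recovers the regression coefficients without any smoothing parameter beyond the trimming rule itself.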
Therefore, we have $T_{21} = \frac{1}{n}\sum_{i: x_i > q_n} (\hat\sigma_i^{-2} - \sigma_i^{-2}) W_i W_i' = o_p(1)$, and hence $T_2 = o_p(1)$.

A.2.4. Proof of Lemma 3 (ii). For any constant $C > 0$ and a shrinking sequence $\epsilon_n$, set inclusion relationships yield
$$\begin{aligned} P(d_2^2(\hat\tau, \tau_0) \ge \epsilon_n^2) &= P\left(d_2(\hat\tau, \tau_0) \ge \epsilon_n,\ \int g_{\hat\tau}(u,x)\, d(\mathbb{P}_n - P)(u,x) + R_n \ge d_2^2(\hat\tau, \tau_0)\right) \\ &\le P\left(d_2(\hat\tau, \tau_0) \ge \epsilon_n,\ |R_n| \le C n^{-1}(\log n)^2,\ \|\hat\tau\|_\infty \le K\log n,\ \int g_{\hat\tau}(u,x)\, d(\mathbb{P}_n - P)(u,x) + R_n - d_2^2(\hat\tau, \tau_0) \ge 0\right) \\ &\quad + P(|R_n| > C n^{-1}(\log n)^2) + P(\|\hat\tau\|_\infty > K\log n) =: P_1 + P_2 + P_3, \end{aligned}$$
where the first equality follows from (A.25). For $P_2$ and $P_3$, (A.26) and Lemma 3 (i) imply that we can choose $C$ and $K$ to make these terms arbitrarily small. Thus, we focus on the first term $P_1$. Now let
$$\begin{aligned} \mathcal{T} &= \{\tau : \tau \text{ is positive and monotone increasing on } \mathcal{X},\ \|\tau\|_\infty \le K\log n\}, \\ \mathcal{G} &= \{g_\tau(u,x) = \{2u^2\tau(x) - \tau(x)^2\} - \{2u^2\tau_0(x) - \tau_0(x)^2\} : \tau \in \mathcal{T}\}, \\ \mathcal{G}_v &= \{g \in \mathcal{G} : d_2(\tau, \tau_0) \le v\}. \end{aligned}$$
Set inclusion relationships and Markov's inequality yield
$$\begin{aligned} P_1 &\le P\left(\sup_{\tau\in\mathcal{T},\, d_2(\tau,\tau_0)\ge\epsilon_n}\left\{\int g_\tau(u,x)\, d(\mathbb{P}_n - P)(u,x) - d_2^2(\tau, \tau_0)\right\} \ge -C n^{-1}(\log n)^2\right) \\ &\le \sum_{s=0}^\infty P\left(\sup_{\tau\in\mathcal{T},\, 2^s\epsilon_n \le d_2(\tau,\tau_0)\le 2^{s+1}\epsilon_n} \sqrt n\left\{\int g_\tau(u,x)\, d(\mathbb{P}_n - P)(u,x)\right\} \ge \sqrt n\, 2^{2s}\epsilon_n^2 - C n^{-1}(\log n)^2\right) \\ &\le \sum_{s=0}^\infty P\left(\|\mathbb{G}_n g\|_{\mathcal{G}_{2^{s+1}\epsilon_n}} \ge \sqrt n\, 2^{2s}\epsilon_n^2 - C n^{-1}(\log n)^2\right) \le \sum_{s=0}^\infty E\left[\|\mathbb{G}_n g\|_{\mathcal{G}_{2^{s+1}\epsilon_n}}\right] \big/ \left\{\sqrt n\, 2^{2s}\epsilon_n^2 - C n^{-1}(\log n)^2\right\}. \end{aligned}$$
For a sufficiently large constant $\tilde C > 0$, the sequence $\epsilon_n^2 := \tilde C(\log n)^2 n^{-2/3}$ dominates $C n^{-1}(\log n)^2$, so it holds that $\sqrt n\, 2^{2s}\epsilon_n^2 - C n^{-1}(\log n)^2 = \sqrt n\, 2^{2s}\epsilon_n^2(1 + o(1))$. Therefore, the standard result for the $L_2$-convergence of the isotonic estimator under Assumption A2 (e.g., pp. 8-11 in BGH-supp) implies that the last term can be made arbitrarily small by appropriately selecting $\tilde C$. Thus the proof is concluded.

A.2.5. Proof of Lemma 3 (iii).
We show $E[\|\mathbb{G}_n\|_{\mathcal{F}_n}] \le A\nu^2$ by using van der Vaart and Wellner (1996, Lemma 3.4.3). First we introduce some notation for this part. Let $N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|)$ be the $\varepsilon$-bracketing number of the function class $\mathcal{F}$ under the norm $\|\cdot\|$, $H_B(\varepsilon, \mathcal{F}, \|\cdot\|) = \log N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|)$ be the entropy, $J_n(\delta, \mathcal{F}, \|\cdot\|) = \int_0^\delta \sqrt{1 + H_B(\varepsilon, \mathcal{F}, \|\cdot\|)}\, d\varepsilon$, and $\|f\|_{B,P} = (2E[e^{|f|} - |f| - 1])^{1/2}$ be the Bernstein norm.

Lemma 3.4.3 in van der Vaart and Wellner (1996): Let $\mathcal{F}$ be a class of measurable functions such that $\|f\|_{B,P}^2 \le \delta$ for every $f$ in $\mathcal{F}$. Then
$$E[\|\mathbb{G}_n\|_{\mathcal{F}}] \lesssim J_n(\delta, \mathcal{F}, \|\cdot\|_{B,P})\left\{1 + J_n(\delta, \mathcal{F}, \|\cdot\|_{B,P}) \big/ (\sqrt n\, \delta^2)\right\}.$$

To apply this lemma, we need to compute $H_B(\epsilon, \tilde{\mathcal{F}}_n, \|\cdot\|_{B,P})$ and $\|\tilde f\|_{B,P}^2$, where $\tilde{\mathcal{F}}_n = \{\tilde f = D^{-1} f : f \in \mathcal{F}_n\}$, the function class $\mathcal{F}_n$ is defined below in (A.27), and the constant $D > 0$ will be chosen later to guarantee that the Bernstein norm of $\tilde f$ is finite. Moreover, define the function class
$$\mathcal{T}_{I,K_1} = \{\tau \text{ monotone non-decreasing on the interval } I \text{ and } 0 < \tau < K_1\}.$$
Assumption A2 implies that there exist positive constants $\underline{C}$ and $\bar{C}$ such that $0 < \underline{C} < \tau_0 < \bar{C} < \infty$. Also let
$$\mathcal{F}_n = \left\{ f_n(w,u) = I\{x > q_n\}\left(\frac{1}{\tau(x)} - \frac{1}{\tau_0(x)}\right) w^h u : \begin{array}{l} \tau \in \mathcal{T}_{\mathcal{X},K_1},\ \|\tau - \tau_0\|_{2,P}^2 \le v^2, \\ I\{x > q_n\}/\tau(x) \le 1/K_0,\ h \in \{1:\dim(w)\} \end{array} \right\}, \tag{A.27}$$
where $w^h$ is the $h$-th component of the vector $w$. We set $2K_0 = \underline{C}$, $K_1 = K_2\log n$, and $v = K_3(\log n) n^{-1/3}$ for some constants $K_2, K_3 > 0$. Consider $\epsilon$-brackets $(\tau_L, \tau_U)$ under the $L_2(P)$-norm for the functions in $\mathcal{T}_{I,K_1}$. According to van der Vaart and Wellner (1996, Theorem 2.7.5), there exists some constant $C' > 0$ such that
$$H_B(\epsilon, \mathcal{T}_{\mathcal{X},K_1}, \|\cdot\|_{2,P}) \le \frac{C' K_1}{\epsilon}, \quad \text{for each } \epsilon \in (0, K_1).$$
(A.28)

Without loss of generality, we can choose bracket functions that satisfy $I\{x > q_n\}/\tau_L(x) \le 1/K_0$.$^5$ Define
$$f_L(w,u) = \begin{cases} I\{x > q_n\}\left(\frac{1}{\tau_U(x)} - \frac{1}{\tau_0(x)}\right) w^h u & \text{if } w^h u \ge 0, \\ I\{x > q_n\}\left(\frac{1}{\tau_L(x)} - \frac{1}{\tau_0(x)}\right) w^h u & \text{if } w^h u < 0, \end{cases} \qquad f_U(w,u) = \begin{cases} I\{x > q_n\}\left(\frac{1}{\tau_L(x)} - \frac{1}{\tau_0(x)}\right) w^h u & \text{if } w^h u \ge 0, \\ I\{x > q_n\}\left(\frac{1}{\tau_U(x)} - \frac{1}{\tau_0(x)}\right) w^h u & \text{if } w^h u < 0. \end{cases}$$
Note that $(f_L, f_U)$ is a bracket of $f \in \mathcal{F}_n$ for every $q_n \in [x_L, x_U]$. Now we compute the bracket size of $(\tilde f_L, \tilde f_U) := (D^{-1} f_L, D^{-1} f_U)$ with respect to the Bernstein norm. Note that
$$\begin{aligned} \|\tilde f_U - \tilde f_L\|_{B,P}^2 &= \|D^{-1} f_U - D^{-1} f_L\|_{B,P}^2 \le 2\sum_{k=2}^\infty \frac{1}{k!\, D^k}\int_{\mathcal{W}\times\mathbb{R}} \left|\frac{\tau_U(x) - \tau_L(x)}{\tau_L(x)\tau_U(x)}\, w^h u\right|^k dP(w,u) \\ &\le 2\sum_{k=2}^\infty \frac{1}{k!\, D^k}\left\{R^k\, k!\, M_0^{k-2}\, a_0\, \frac{(2K_1)^{k-2}}{K_0^{2k}}\, \|\tau_U - \tau_L\|_{2,P}^2\right\} \le 2 a_0\left(\frac{R}{D K_0^2}\right)^2 \sum_{k=0}^\infty \left(\frac{2R M_0 K_1}{D K_0^2}\right)^k \epsilon^2, \end{aligned}$$
where the first inequality follows from the definition of $\|\cdot\|_{B,P}^2$ and $I\{x > q_n\} \le 1$, and the second inequality follows from Assumption A2 (where we can choose $a_0, M_0 > 1$) and $\frac{I\{x > q_n\}}{\tau_L(x)} \le \frac{1}{K_0}$.

$^5$ By the definition (A.27), the $\tau(\cdot)$ associated with $\mathcal{F}_n$ must satisfy $I\{x > q_n\}/\tau(x) \le 1/K_0$. Since $\mathcal{T}_{\mathcal{X},K_1}$ is a class of monotone increasing functions, any $\epsilon$-brackets of $\mathcal{T}_{\mathcal{X},K_1}$ can be modified to be $\epsilon$-brackets of the "$\mathcal{F}_n$-subset" of $\mathcal{T}_{\mathcal{X},K_1}$ satisfying $I\{x > q_n\}/\tau(x) \le 1/K_0$ by leveling up certain parts of the lower-bound functions $\tau_L$, without changing the bracketing numbers, and the size of each modified bracket can only be smaller.

Thus, by setting $D = 4M_0 R K_1 / K_0^2$, we obtain $\|\tilde f_U - \tilde f_L\|_{B,P}^2 \le \frac{a_0}{4M_0^2 K_1^2}\,\epsilon^2$, which implies
$$\|\tilde f_U - \tilde f_L\|_{B,P} \le \tilde K\epsilon, \quad \text{for } \tilde K = \frac{a_0^{1/2}}{2M_0 K_1}.$$
Note that $(\tilde f_L, \tilde f_U)$ is: (a) a set of brackets in $\tilde{\mathcal{F}}_n$; (b) one-to-one induced by $(\tau_L, \tau_U)$, an $\epsilon$-bracket in $\mathcal{T}_{\mathcal{X},K_1}$ with entropy $H_B(\epsilon, \mathcal{T}_{\mathcal{X},K_1}, \|\cdot\|_{2,P})$; and (c) such that $\|\tilde f_U - \tilde f_L\|_{B,P} \le \tilde K\epsilon$. Based on these facts, (A.28) yields
$$H_B(\tilde K\epsilon, \tilde{\mathcal{F}}_n, \|\cdot\|_{B,P}) \le H_B(\epsilon, \mathcal{T}_{\mathcal{X},K_1}, \|\cdot\|_{2,P}) \le \frac{C' K_1}{\epsilon},$$
which implies (by a change-of-variable argument)
$$H_B(\epsilon, \tilde{\mathcal{F}}_n, \|\cdot\|_{B,P}) \le \frac{\tilde B}{\epsilon}, \quad \text{for } \tilde B = \frac{C' a_0^{1/2}}{2M_0}. \tag{A.29}$$
We now characterize the Bernstein norm of $\tilde f$:
$$\|\tilde f\|_{B,P}^2 \le 2\sum_{k=2}^\infty \frac{1}{k!\, D^k}\int_{\mathcal{W}\times\mathbb{R}} \left|\frac{\tau(x) - \tau_0(x)}{\tau(x)\tau_0(x)}\, w^h u\right|^k dP(w,u) \le 2\sum_{k=2}^\infty \frac{1}{k!\, D^k}\left\{R^k\, k!\, M_0^{k-2}\, a_0\, \frac{(2K_1)^{k-2}}{K_0^{2k}}\, \|\tau - \tau_0\|_{2,P}^2\right\} \le 2 a_0\left(\frac{R}{D K_0^2}\right)^2 \sum_{k=0}^\infty \left(\frac{2R M_0 K_1}{D K_0^2}\right)^k v^2 \le \frac{a_0}{4M_0^2}\frac{v^2}{K_1^2},$$
where the second inequality follows from $\frac{I\{x>q_n\}}{\tau(x)} \le \frac{1}{K_0}$, and the third inequality follows from (A.27) and some rearrangement. Then we have
$$\|\tilde f\|_{B,P} \le \frac{B v}{K_1}, \quad \text{for } B = \frac{a_0^{1/2}}{2M_0}. \tag{A.30}$$
Combining (A.29) and (A.30), van der Vaart and Wellner (1996, Lemma 3.4.3) implies
$$E[\|\mathbb{G}_n\|_{\tilde{\mathcal{F}}_n}] \lesssim J_n(B K_1^{-1} v)\left(1 + \frac{J_n(B K_1^{-1} v)}{\sqrt n\, B^2 v^2 / K_1^2}\right),$$
where $J_n(\cdot)$ abbreviates $J_n(\cdot, \tilde{\mathcal{F}}_n, \|\cdot\|_{B,P})$. By the arguments used in the proof of Proposition 7.9 of BDJ, it holds that
$$J_n(B K_1^{-1} v) \le B K_1^{-1} v + 2\tilde B^{1/2} B^{1/2} K_1^{-1/2} v^{1/2} \lesssim B_1 K_1^{-1/2} v^{1/2},$$
for some $B_1 > 0$ and sufficiently small $v$. This implies
$$E[\|\mathbb{G}_n\|_{\tilde{\mathcal{F}}_n}] \lesssim B_1 K_1^{-1/2} v^{1/2}\left(1 + \frac{K_1^2\, B_1 K_1^{-1/2} v^{1/2}}{\sqrt n\, B^2 v^2}\right) \lesssim B_1 K_1^{-1/2} v^{1/2}\left(1 + \frac{B_2 K_1^{3/2}}{\sqrt n\, v^{3/2}}\right),$$
for some $B_2 > 0$. By the definition of the class $\tilde{\mathcal{F}}_n = \{\tilde f = D^{-1} f : f \in \mathcal{F}_n\}$, it follows that
$$E[\|\mathbb{G}_n\|_{\mathcal{F}_n}] = D\cdot E[\|\mathbb{G}_n\|_{\tilde{\mathcal{F}}_n}] \lesssim D\, B_1 K_1^{-1/2} v^{1/2}\left(1 + \frac{B_2 K_1^{3/2}}{\sqrt n\, v^{3/2}}\right) \lesssim B_3 K_0^{-2} K_1^{1/2} v^{1/2}\left(1 + \frac{B_2 K_1^{3/2}}{\sqrt n\, v^{3/2}}\right),$$
for some $B_3 > 0$.
The conclusion follows by observing that, with $v = K_3(\log n) n^{-1/3}$, $K_1 = K_2\log n$, and all sufficiently large $n$, we have
$$E[\|\mathbb{G}_n\|_{\mathcal{F}_n}] \lesssim C_3(\log n) n^{-1/6}(1 + C_4) \lesssim A\nu^2,$$
where $C_3 = B_3 K_0^{-2} K_2^{1/2} K_3^{1/2}$ and $C_4 = B_2(K_2/K_3)^{3/2}$.

Appendix B. Proofs of lemma and theorem in Section 3

Notation. To avoid heavy notation, some symbols used in Appendix A are redefined here. Define $\tau_\eta(a) = E[\sigma^2(X'\eta_0) | X'\eta = a]$ and $\tau_{\eta_0}(a) = \tau_0(a)$ (note that $\tau_0(x'\eta_0) = \sigma^2(x'\eta_0)$). Let $\hat\tau_\eta = \hat\tau_\eta(x'\eta)$ be the isotonic estimator obtained by (3.3) for a given $\eta$, $\mathcal{W}$ be the support of $W := (1, X', Z')'$, $\mathbb{F}_n(t) = \frac{1}{n}\sum_{i=1}^n I\{X_i'\hat\eta \le t\}$, and $M_n(t) = \frac{1}{n}\sum_{i=1}^n \hat U_i^2 I\{X_i'\hat\eta \le t\}$.

B.1. Proof of Lemma 2. The main part of the proof is similar to that of Lemma 1. Recall that $q_n^*$ is the $(n^{-1/3})$-th population quantile of $X'\eta_0$ and $q_n$ is the $(n^{-1/3})$-th sample quantile of $\{X_i'\hat\eta\}_{i=1}^n$, with $\hat\eta$ estimated by (3.4). To proceed, we use the following lemma:

Lemma 4. Under Assumptions M1-M6, it holds that
(i): $\hat\eta - \eta_0 = O_p(n^{-1/2})$,
(ii): $\tau_{\hat\eta}(a) - \tau_0(a) = O_p(n^{-1/2})$ for each $a$, and $\|\tau_{\hat\eta} - \tau_0\|_{2,P} = O_p(n^{-1/2})$.

The proof of this lemma is in Appendix B.3. Based on Lemma 4 (i), Assumptions M2-M3, and properties of the sample quantile, we obtain $q_n - q_n^* = O_p(n^{-1/2}) = o_p(n^{-1/3})$, which implies $c^* = \lim_{n\to\infty} n^{1/3}(q_n^* - x_L) = \operatorname{plim}_{n\to\infty} n^{1/3}(q_n - x_L) < \infty$. By Assumption M2, Lemma 4 (ii), and arguments similar to Appendix A.1, we have
$$\begin{aligned} n^{1/3}\{\hat\tau_{\hat\eta}(q_n) - \tau_0(q_n)\} &= n^{1/3}\{\hat\tau_{\hat\eta}(q_n) - \tau_{\hat\eta}(q_n)\} + o_p(1) \\ &= n^{1/3}\left[\{\hat\tau_{\hat\eta}(q_n) - \tau_{\hat\eta}(x_L)\} - \{\tau_{\hat\eta}(q_n) - \tau_0(x_L)\}\right] + o_p(1) \\ &\stackrel{d}{\to} D_{L[0,\infty)}\left(\sqrt{\frac{\sigma_\varepsilon^2(x_L)}{c^* f_X(x_L)}}\, \mathbb{W}_t + \frac{\tau_0'(x_L)\, t^2 c^*}{2}\right)(1) - \operatorname{plim}_{n\to\infty} n^{1/3}\{\tau_0(q_n) - \tau_0(x_L)\} \\ &\stackrel{d}{\sim} D_{L[0,\infty)}\left(\sqrt{\frac{\sigma_\varepsilon^2(x_L)}{c^* f_X(x_L)}}\, \mathbb{W}_t + \frac{\tau_0'(x_L)\, t^2 c^*}{2}\right)(1) - \lim_{n\to\infty} n^{1/3}\{\tau_0(q_n^*) - \tau_0(x_L)\} \\ &\stackrel{d}{\sim} D_{L[0,\infty)}\left(\sqrt{\frac{\sigma_\varepsilon^2(x_L)}{c^* f_X(x_L)}}\, \mathbb{W}_t + \frac{\tau_0'(x_L)\, t^2 c^*}{2} - \tau_0'(x_L)\, c^* t\right)(1), \end{aligned}$$
where the first and second equalities follow from Lemma 4 (ii), the convergence follows from an argument similar to (A.15), the first distributional relation follows from Lemma 4 (ii), Assumption M2 (iv), and $q_n^* - q_n = o_p(n^{-1/3})$, and the second distributional relation follows from the fact that $D_{L[0,\infty)}$ is a linear operator with respect to a linear function of $t$.

B.2. Proof of Theorem 2. Similar to Theorem 1, it is sufficient for the conclusion to prove the following lemma.

Lemma 5. Under Assumptions M1-M6, it holds that
(i): $\|\hat\tau_\eta\|_\infty = O_p(\log n)$ uniformly over $\eta \in \mathcal{B}(\eta_0, \delta_0)$,
(ii): $\|\hat\tau_{\hat\eta} - \tau_0\|_{2,P}^2 = O_p((\log n)^2 n^{-2/3})$,
(iii): $E\|\mathbb{G}_n\|_{\mathcal{F}_n} \le A\nu^2$ holds for any constants $A > 0$ and $\nu > 0$ and all sufficiently large $n$, where $\mathcal{F}_n$ is the function class defined as
$$\mathcal{F}_n = \left\{ f_n(w,u) = I\{x'\eta > q_n\}\left(\frac{1}{\tau(x'\eta)} - \frac{1}{\tau_\eta(x'\eta)}\right) w^h u : \begin{array}{l} \tau \ge 0 \text{ is monotone increasing on } I_\eta,\ \|\tau\|_\infty \le C\log n, \\ \|\tau - \tau_\eta\|_{2,P}^2 \le C r_n,\ I(x'\eta > q_n)/\tau(x'\eta) \le 1/K_0,\ h \in \{1:\dim(w)\} \end{array} \right\},$$
with $C$ and $K_0$ being some positive constants, and $r_n = (\log n)^2 n^{-2/3}$.

B.2.1. Proof of Lemma 5 (i). The proof is adapted from BDJ (2019, eq. (7.11) on p. 3297). For fixed $\eta$, let $\{\hat U_{\eta,i}^2\}_{i=1}^n$ be a permutation of $\{\hat U_j^2\}_{j=1}^n$, arranged according to the monotonically ordered series $\{X_i'\eta\}_{i=1}^n$.
The min-max formula of isotonic regression says
$$\min_{1\le k\le n} \frac{\sum_{i=1}^k \hat U_{\eta,i}^2}{k} \le \hat\tau_\eta(x'\eta) \le \max_{1\le k\le n} \frac{\sum_{i=k}^n \hat U_{\eta,i}^2}{n - k + 1},$$
for each $x \in \mathcal{X}$ and $\eta \in \mathcal{B}(\eta_0, \delta_0)$, which implies $\min_{1\le j\le n} \hat U_j^2 \le \hat\tau_\eta(x'\eta) \le \max_{1\le j\le n} \hat U_j^2$ for each $x \in \mathcal{X}$. Thus, it is sufficient for the conclusion to show that
$$\max_{1\le j\le n} \hat U_j^2 = O_p(\log n). \tag{B.1}$$
Observe that
$$\max_{1\le j\le n} \hat U_j^2 \le \max_{1\le j\le n} U_j^2 + 2Rk\|\hat\theta_{OLS} - \theta\|_\infty \max_{1\le j\le n} |U_j| + R^2 k^2 \|\hat\theta_{OLS} - \theta\|_\infty^2,$$
where $k$ is the dimension of $\theta$. From Lemma 7.1 of BDJ, Assumption M2 guarantees $\max_{1\le j\le n} U_j^2 = O_p(\log n)$. By the same reasoning as in the proof of Lemma 3, we have $\max_{1\le j\le n} |U_j| = O_p(\log n)$ and $\|\hat\theta_{OLS} - \theta\|_\infty = O_p(n^{-1/2})$. Thus we have $\|\hat\tau_\eta\|_\infty = O_p(\log n)$. Since a different $\eta$ only changes the permutation $\{\hat U_{\eta,i}^2\}_{i=1}^n$ but not $\max_{1\le j\le n} \hat U_j^2$, we have $\|\hat\tau_\eta\|_\infty = O_p(\log n)$ uniformly over $\eta \in \mathcal{B}(\eta_0, \delta_0)$.

B.2.2. Proof of Lemma 5 (ii). The main part of the proof is similar to those of Lemma 3 (ii) and Proposition 4 of BGH-supp. Define
$$\begin{aligned} g_{\eta,\tau}(u,x) &= \{2u^2\tau(x'\eta) - \tau(x'\eta)^2\} - \{2u^2\tau_\eta(x'\eta) - \tau_\eta(x'\eta)^2\}, \\ R_{n,\eta} &= \frac{2}{n}\sum_{j=1}^n\left[-2U_j W_j(\hat\theta_{OLS} - \theta) + \{W_j(\hat\theta_{OLS} - \theta)\}^2\right]\{\hat\tau_\eta(X_j'\eta) - \tau_\eta(X_j'\eta)\}, \\ d_2^2(\tau_1, \tau_2) &= -E[2\tau_1\tau_2 - \tau_1^2 - \tau_2^2]. \end{aligned}$$
Following reasoning similar to that presented for (A.21)-(A.26), we have, for some $C$ and $K$,
$$\begin{aligned} P\left(\sup_{\eta\in\mathcal{B}(\eta_0,\delta_0)} d_2^2(\hat\tau_\eta, \tau_\eta) \ge \epsilon_n^2\right) &\le P\left(\begin{array}{l}\sup_{\eta\in\mathcal{B}(\eta_0,\delta_0)} d_2(\hat\tau_\eta, \tau_\eta) \ge \epsilon_n,\ \ \sup_{\eta\in\mathcal{B}(\eta_0,\delta_0)}\|\hat\tau_\eta\|_\infty \le K\log n,\\[2pt] \sup_{\eta\in\mathcal{B}(\eta_0,\delta_0)}\left[\int g_{\eta,\hat\tau}(u,x)\, d(\mathbb{P}_n - P)(u,x) + R_{n,\eta} - d_2^2(\hat\tau_\eta, \tau_\eta)\right] \ge 0,\ \ \sup_{\eta\in\mathcal{B}(\eta_0,\delta_0)} |R_{n,\eta}| \le C n^{-1}(\log n)^2\end{array}\right) \\ &\quad + P(|R_{n,\eta}| > C n^{-1}(\log n)^2) + P\left(\sup_{\eta\in\mathcal{B}(\eta_0,\delta_0)}\|\hat\tau_\eta\|_\infty > K\log n\right) =: P_1 + P_2 + P_3. \end{aligned}$$
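As an aside, the min-max representation used in the proof of Lemma 5 (i) pins every fitted value of the isotonic regression between running averages of the responses, so the fit is bounded by $\min_j \hat U_j^2$ and $\max_j \hat U_j^2$; that is what reduces the uniform bound to the tail bound (B.1). A quick numerical check of this envelope property and of monotonicity, using synthetic data and scikit-learn's `IsotonicRegression` (an illustrative sketch, not tied to the paper's implementation):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)
n = 500
idx = np.sort(rng.uniform(0, 1, n))              # ordered index values X'eta
u2 = (0.5 + idx) * rng.standard_normal(n) ** 2   # stand-in for squared residuals

fit = IsotonicRegression(increasing=True).fit_transform(idx, u2)

# min-max formula: each fitted value is an average of a block of responses,
# hence min_j u2_j <= fit <= max_j u2_j, and the fit is non-decreasing
assert u2.min() - 1e-9 <= fit.min() and fit.max() <= u2.max() + 1e-9
assert np.all(np.diff(fit) >= -1e-9)
```

The same envelope holds for every permutation of the responses, which is exactly why the bound in the proof is uniform over $\eta$.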
Lemma 5 (i) implies $P_3 \to 0$, and $P_2 \to 0$ follows from arguments similar to (A.26). For $P_1$, we define
$$\begin{aligned} \mathcal{T} &= \{\tau : \tau \text{ is a positive and monotone increasing function on } I_\eta,\ \|\tau\|_\infty \le K\log n\}, \\ \mathcal{G} &= \{g(x,u) = \{2u^2\tau(x'\eta) - \tau(x'\eta)^2\} - \{2u^2\tau_\eta(x'\eta) - \tau_\eta(x'\eta)^2\} : \tau \in \mathcal{T}\}, \\ \mathcal{G}_v &= \{g \in \mathcal{G} : d_2(\tau, \tau_\eta) \le v\}, \end{aligned}$$
for each $\eta \in \mathcal{B}(\eta_0, \delta_0)$. By arguments similar to Lemma 3 (ii) and Proposition 4 of BGH-supp, we can obtain
$$P_1 \le \sum_{s=0}^\infty E\left[\|\mathbb{G}_n g\|_{\mathcal{G}_{2^{s+1}\epsilon_n}}\right] \big/ \left\{\sqrt n\, 2^{2s}\epsilon_n^2 - C n^{-1/2}(\log n)^2\right\}, \quad \text{and} \quad \sup_{\eta\in\mathcal{B}(\eta_0,\delta_0)}\int \{\hat\tau_\eta(x'\eta) - \tau_\eta(x'\eta)\}^2\, dF(x) = O_p((\log n)^2 n^{-2/3}). \tag{B.2}$$
By combining (B.2), Lemma 4, and the triangle inequality, we obtain $\|\hat\tau_{\hat\eta} - \tau_0\|_{2,P}^2 = O_p((\log n)^2 n^{-2/3})$.

Proof of Lemma 5 (iii). To avoid heavy notation, we use the same notation as in the proof of Lemma 3 (iii), with some symbols redefined here. Let $\mathcal{T}_{I,K_1} = \{\tau \text{ monotone non-decreasing on some interval } I \text{ and } 0 < \tau < K_1\}$. Assumption M2 guarantees $0 < \underline{C} < \tau_0 < \bar{C} < \infty$. Similar to the proof of Lemma 3 (iii), we calculate $H_B(\epsilon, \tilde{\mathcal{F}}, \|\cdot\|_{B,P})$ and $\|\tilde f\|_{B,P}^2$, with $\tilde{\mathcal{F}} = \{\tilde f = D^{-1} f : f \in \mathcal{F}\}$, where the constant $D > 0$ is determined later. Define $I_\eta^* = (a_L, a_U)$ with $a_L = \inf_{x\in\mathcal{X},\,\eta\in\mathcal{B}(\eta_0,\delta_0)} x'\eta$ and $a_U = \sup_{x\in\mathcal{X},\,\eta\in\mathcal{B}(\eta_0,\delta_0)} x'\eta$. Define
$$\mathcal{F}_n = \left\{ f_n(w,u) = I\{x'\eta > q_n\}\left(\frac{1}{\tau(x'\eta)} - \frac{1}{\tau_\eta(x'\eta)}\right) w^h u : \begin{array}{l} \tau \in \mathcal{T}_{I_\eta^*,K_1},\ \eta \in \mathcal{B}(\eta_0, \delta_0),\ \|\tau - \tau_0\|_{2,P}^2 \le v^2, \\ h \in \{1:\dim(w)\},\ I(x'\eta > q_n)/\tau(x'\eta) \le 1/K_0 \end{array} \right\},$$
where $w^h$ is the $h$-th component of $w$. We set $2K_0 = \underline{C}$, $K_1 = K_2\log n$, and $v = K_3(\log n) n^{-1/3}$ for some positive constants $K_2$ and $K_3$. By van der Vaart and Wellner (1996, Theorem 2.7.5), it holds for each $\epsilon \in (0, K_1)$ that
$$H_B(\epsilon, \mathcal{T}_{I_\eta^*,K_1}, \|\cdot\|_P) \le \frac{C' K_1}{\epsilon}.$$
Similarly to the univariate case, we can choose bracket functions $(\tau_L, \tau_U)$ that satisfy $I\{x'\eta > q_n\}/\tau_L(x'\eta) \le 1/K_0$. Then we define
$$f_L(w,u) = \begin{cases} I\{x'\eta > q_n\}\left(\frac{1}{\tau_U(x'\eta)} - \frac{1}{\tau_\eta(x'\eta)}\right) w^h u & \text{if } w^h u \ge 0, \\ I\{x'\eta > q_n\}\left(\frac{1}{\tau_L(x'\eta)} - \frac{1}{\tau_\eta(x'\eta)}\right) w^h u & \text{if } w^h u < 0, \end{cases} \qquad f_U(w,u) = \begin{cases} I\{x'\eta > q_n\}\left(\frac{1}{\tau_L(x'\eta)} - \frac{1}{\tau_\eta(x'\eta)}\right) w^h u & \text{if } w^h u \ge 0, \\ I\{x'\eta > q_n\}\left(\frac{1}{\tau_U(x'\eta)} - \frac{1}{\tau_\eta(x'\eta)}\right) w^h u & \text{if } w^h u < 0. \end{cases}$$
Note that $(f_L, f_U)$ is a bracket for $f \in \mathcal{F}_n$. The bracket size is
$$\begin{aligned} \|\tilde f_U - \tilde f_L\|_{B,P}^2 &= \|D^{-1} f_U - D^{-1} f_L\|_{B,P}^2 = 2\sum_{k=2}^\infty \frac{1}{k!\, D^k}\int_{\mathcal{W}\times\mathbb{R}} \left|I\{x'\eta > q_n\}\left(\frac{1}{\tau_L(x'\eta)} - \frac{1}{\tau_U(x'\eta)}\right) w^h u\right|^k dP(w,u) \\ &\le 2\sum_{k=2}^\infty \frac{1}{k!\, D^k}\left\{R^k\, k!\, M_0^{k-2}\, a_0\, \frac{(2K_1)^{k-2}}{K_0^{2k}}\, \|\tau_U - \tau_L\|_P^2\right\} \le 2 a_0\left(\frac{R}{D K_0^2}\right)^2 \sum_{k=0}^\infty \left(\frac{2R M_0 K_1}{D K_0^2}\right)^k \epsilon^2, \end{aligned}$$
where the first inequality follows from Assumption M2 (where we can choose $a_0, M_0 > 1$) and $\frac{I\{x'\eta > q_n\}}{\tau_L(x'\eta)} \le \frac{1}{K_0}$. Setting $D = 4M_0 R K_1 / K_0^2$ yields $\|\tilde f_U - \tilde f_L\|_{B,P} \le \tilde K\epsilon$ for $\tilde K = \frac{a_0^{1/2}}{2M_0 K_1}$, and thus
$$H_B(\epsilon, \tilde{\mathcal{F}}, \|\cdot\|_{B,P}) \le \frac{\tilde B}{\epsilon}, \quad \text{for } \tilde B = \frac{C_2 a_0^{1/2}}{2M_0}. \tag{B.3}$$
Now we compute the Bernstein norm of $\tilde f$:
$$\|\tilde f\|_{B,P}^2 = 2\sum_{k=2}^\infty \frac{1}{k!\, D^k}\int_{\mathcal{W}\times\mathbb{R}} \left|I\{x'\eta > q_n\}\left(\frac{1}{\tau(x'\eta)} - \frac{1}{\tau_\eta(x'\eta)}\right) w^h u\right|^k dP(w,u) \le 2\sum_{k=2}^\infty \frac{1}{k!\, D^k}\left\{R^k\, k!\, M_0^{k-2}\, a_0\, \frac{(2K_1)^{k-2}}{K_0^{2k}}\, \|\tau - \tau_0\|_P^2\right\} \le 2 a_0\left(\frac{R}{D K_0^2}\right)^2 \sum_{k=0}^\infty \left(\frac{2R M_0 K_1}{D K_0^2}\right)^k v^2 \le \frac{a_0}{4M_0^2}\frac{v^2}{K_1^2},$$
where the first inequality follows from $\frac{I\{x'\eta > q_n\}}{\tau(x'\eta)} \le \frac{1}{K_0}$. This implies
$$\|\tilde f\|_{B,P} \le \frac{B v}{K_1}, \quad \text{for } B = \frac{a_0^{1/2}}{2M_0}. \tag{B.4}$$
Combining (B.3) and (B.4), the remaining steps are the same as those in the proof of Lemma 3 (iii).

B.3. Proof of Lemma 4.
Recall that for fixed $\eta$, we first obtain $\hat\tau_\eta = \arg\min_{\tau\in\mathcal{M}} \frac{1}{n}\sum_{i=1}^n \{\hat U_i^2 - \tau(X_i'\eta)\}^2$ and then obtain $\hat\eta$ by $\hat\eta = \arg\min_\eta \big\|\frac{1}{n}\sum_{i=1}^n X_i'\{\hat U_i^2 - \hat\tau_\eta(X_i'\eta)\}\big\|^2$. We denote $E[X | X'\eta = x'\eta]$ by $E[X | x'\eta]$. The proof is similar to the ones in BGH and Balabdaoui and Groeneboom (2021), except that we need to handle the influence of the estimated dependent variables $\hat U_i^2$. The proof of consistency of $\hat\eta$ is similar to pp. 16-17 of BGH-supp. By an argument similar to Balabdaoui and Groeneboom (2021, Lemma 3.2), under Assumptions M1-M3 we have
$$\frac{1}{n}\sum_{i=1}^n X_i'\{\hat U_i^2 - \hat\tau_\eta(X_i'\eta)\} = \frac{1}{n}\sum_{i=1}^n (X_i - E[X | X_i'\eta])\{\hat U_i^2 - \tau_\eta(X_i'\eta)\} + o_p(n^{-1/2}),$$
for each $\eta$, where we also use (B.2). Thus, it holds that
$$\left\|\frac{1}{n}\sum_{i=1}^n X_i\{\hat U_i^2 - \hat\tau_{\hat\eta}(X_i'\hat\eta)\}\right\| = \min_\eta \left\|\frac{1}{n}\sum_{i=1}^n X_i\{\hat U_i^2 - \hat\tau_\eta(X_i'\eta)\}\right\| \le \min_\eta \left\|\frac{1}{n}\sum_{i=1}^n (X_i - E[X | X_i'\eta])\{\hat U_i^2 - \tau_\eta(X_i'\eta)\}\right\| + o_p(n^{-1/2}).$$
The leading term inside the norm $\|\cdot\|$ of the last expression does not depend on the potentially non-smooth $\hat\tau_\eta$; it is a smooth function of $\eta$. Thus, under standard conditions for the method of moments, we have $\min_\eta \big\|\frac{1}{n}\sum_{i=1}^n (X_i - E[X_i | X_i'\eta])\{\hat U_i^2 - \tau_\eta(X_i'\eta)\}\big\| = 0$, and
$$\begin{aligned} o_p(n^{-1/2}) &= \frac{1}{n}\sum_{i=1}^n X_i\{\hat U_i^2 - \hat\tau_{\hat\eta}(X_i'\hat\eta)\} = \frac{1}{n}\sum_{i=1}^n (X_i - E[X | X_i'\hat\eta])\{\hat U_i^2 - \hat\tau_{\hat\eta}(X_i'\hat\eta)\} + o_p(n^{-1/2} + (\hat\eta - \eta)) \\ &= \int (x - E[X | x'\hat\eta])\{\hat u^2 - \tau_0(x'\eta_0)\}\, d(\mathbb{P}_n - P)(x, \hat u) + \int (x - E[X | x'\hat\eta])\{\hat u^2 - \tau_{\hat\eta}(x'\hat\eta)\}\, dP(x, \hat u) + o_p(n^{-1/2} + (\hat\eta - \eta)) \\ &=: I + II + o_p(n^{-1/2} + (\hat\eta - \eta)), \end{aligned} \tag{B.5}$$
where the second equality follows from arguments similar to pp. 18-20 of BGH-supp and (B.2), and the third equality follows from an argument similar to pp. 21-23 of BGH-supp.
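The two-step construction just recalled — an inner isotonic fit $\hat\tau_\eta$ for each candidate index $\eta$, then an outer minimization of $\|\frac{1}{n}\sum_i X_i\{\hat U_i^2 - \hat\tau_\eta(X_i'\eta)\}\|^2$ — can be sketched for two covariates as a grid search over unit-norm directions. This is a toy illustration with synthetic "squared residuals", not the estimator (3.4) itself: the grid, the data-generating design, and the name `profile_eta` are all assumptions, and scikit-learn's `IsotonicRegression` stands in for the paper's isotonic step.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def profile_eta(u2, X, angles):
    """Profile search: for each unit direction eta = (cos a, sin a), fit tau
    isotonically on the index X'eta (inner step), then pick the eta that
    minimizes ||mean_i X_i {u2_i - tau(X_i'eta)}||^2 (outer step)."""
    best_crit, best_eta = np.inf, None
    iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
    for a in angles:
        eta = np.array([np.cos(a), np.sin(a)])
        tau = iso.fit_transform(X @ eta, u2)        # inner isotonic fit
        m = (X * (u2 - tau)[:, None]).mean(axis=0)  # moment vector
        crit = m @ m
        if crit < best_crit:
            best_crit, best_eta = crit, eta
    return best_eta

# synthetic design: "variance" increases in the index X'eta0, angle 0.6
rng = np.random.default_rng(2)
n = 3000
X = rng.uniform(0, 1, (n, 2))
eta0 = np.array([np.cos(0.6), np.sin(0.6)])
u2 = 1.0 + 2.0 * (X @ eta0) + 0.3 * rng.standard_normal(n)  # toy U_i^2
eta_hat = profile_eta(u2, X, np.linspace(0.1, 1.5, 57))
```

At the true direction the residuals $u^2 - \tau_\eta(X'\eta)$ are uncorrelated with $X$, so the moment criterion is minimized near $\eta_0$, mirroring the moment condition used in the proof.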
Let $\hat U(w,u) = u - w'(\hat\theta_{OLS} - \theta)$ and
$$\hat e(w,u) := \hat U(w,u)^2 - u^2 = -2w'(\hat\theta_{OLS} - \theta) u + \{w'(\hat\theta_{OLS} - \theta)\}^2. \tag{B.6}$$
For $I$, we have
$$\begin{aligned} I &= \int (x - E[X | x'\hat\eta])\{u^2 + \hat e(w,u) - \tau_0(x'\eta_0)\}\, d(\mathbb{P}_n - P)(w,u) \\ &= \int (x - E[X | x'\eta_0])\{u^2 - \tau_0(x'\eta_0)\}\, d(\mathbb{P}_n - P)(x,u) + \int (x - E[X | x'\hat\eta])\, \hat e(w,u)\, d(\mathbb{P}_n - P)(w,u) + o_p(n^{-1/2}) \\ &= \int (x - E[X | x'\eta_0])\{u^2 - \tau_0(x'\eta_0)\}\, d(\mathbb{P}_n - P)(x,u) + o_p(n^{-1/2}), \end{aligned} \tag{B.7}$$
where the second equality follows from p. 21 of BGH-supp, and the third equality follows from the facts that (a) $\hat\theta_{OLS} - \theta = O_p(n^{-1/2})$, (b) $\hat e(w,u)$ is a parametric function of $w$ and $u$ in a changing class indexed by $\hat\theta_{OLS}$ (see (B.6)), so its $\epsilon$-entropy is of order $\log(1/\epsilon) \le 1/\epsilon$ (see, e.g., Example 19.7 of van der Vaart and Wellner, 2000), and (c) arguments similar to pp. 22-23 of BGH-supp. By Lemma 17 of BGH-supp, we have
$$\tau_\eta(x'\eta) = \tau_0(x'\eta_0) + (\eta - \eta_0)(x - E[X | X'\eta_0 = x'\eta_0])\,\tau_0'(x'\eta_0) + o_p(\eta - \eta_0). \tag{B.8}$$
For $II$, observe that
$$\begin{aligned} II &= \int (x - E[X | x'\hat\eta])\{u^2 - \tau_{\hat\eta}(x'\hat\eta)\}\, dP(x,u) + \int (x - E[X | x'\hat\eta])\, \hat e(w,u)\, dP(w,u) \\ &= \left[\int (x - E[X | x'\eta_0])(x - E[X | X'\eta_0 = x'\eta_0])\,\tau_0'(x'\eta_0)\, dP(x)\right](\hat\eta - \eta_0) + \int (x - E[X | x'\hat\eta])\, \hat e(w,u)\, dP(w,u) + o_p(\hat\eta - \eta_0) \\ &= \left[\int (x - E[X | x'\eta_0])(x - E[X | x'\eta_0])\,\tau_0'(x'\eta_0)\, dP(x)\right](\hat\eta - \eta_0) + O_p(n^{-1/2}) + o_p(\hat\eta - \eta_0) \\ &= B(\hat\eta - \eta_0) + O_p(n^{-1/2}) + o_p(\hat\eta - \eta_0), \end{aligned} \tag{B.9}$$
where the second equality follows from (B.8), the third equality follows from $(E[X | x'\hat\eta] - E[X | x'\eta_0])(\hat\eta - \eta_0) = o_p(\hat\eta - \eta_0)$ and $\hat\theta_{OLS} - \theta = O_p(n^{-1/2})$, and the fourth equality follows from the definition of $B$ in Assumption M6. Combining (B.5), (B.7), and (B.9), we have
$$\hat\eta - \eta_0 = B^{-}\!\int (x - E[X | x'\eta_0])\{u^2 - \tau_0(x'\eta_0)\}\, d(\mathbb{P}_n - P)(x,u) + O_p(n^{-1/2}) + o_p(n^{-1/2} + (\hat\eta - \eta)),$$
where $B^{-}$ is the Moore-Penrose inverse of $B$ (see p. 17 of BGH for more details). Therefore, we have $\hat\eta - \eta_0 = O_p(n^{-1/2})$. This result, combined with (B.8) and Assumptions M1 and M2, implies $\tau_{\hat\eta}(a) - \tau_0(a) = O_p(n^{-1/2})$ and $\|\tau_{\hat\eta} - \tau_0\|_{2,P} = O_p(n^{-1/2})$.

References

[1] Acemoglu, D. and P. Restrepo (2017) Secular stagnation? The effect of aging on economic growth in the age of automation, American Economic Review, 107, 174-179.
[2] Ayer, M., Brunk, H. D., Ewing, G. M., Reid, W. T. and E. Silverman (1955) An empirical distribution function for sampling with incomplete information, Annals of Mathematical Statistics, 26, 641-647.
[3] Babii, A. and R. Kumar (2023) Isotonic regression discontinuity designs, Journal of Econometrics, 234(2), 371-393.
[4] Balabdaoui, F., Durot, C. and H. Jankowski (2019) Least squares estimation in the monotone single index model, Bernoulli, 25, 3276-3310.
[5] Balabdaoui, F., Groeneboom, P. and K. Hendrickx (2019) Score estimation in the monotone single-index model, Scandinavian Journal of Statistics, 46, 517-544.
[6] Balabdaoui, F. and P. Groeneboom (2021) Profile least squares estimators in the monotone single index model, in Advances in Contemporary Statistics and Econometrics, pp. 3-22, Springer.
[7] Barlow, R. and H. Brunk (1972) The isotonic regression problem and its dual, Journal of the American Statistical Association, 67, 140-147.
[8] Bickel, P. J. (1978) Using residuals robustly I: Tests for heteroscedasticity, nonlinearity, Annals of Statistics, 6, 266-291.
[9] Box, G. E. and W. J. Hill (1974) Correcting inhomogeneity of variance with power transformation weighting, Technometrics, 16, 385-389.
[10] Carroll, R. J.
(1982) Adapt ing for heteroscedasticity in linear mo dels, An nals of Statistics , 10, 1224-123 3. [11] Chetverik ov, D. (2019) T esting regression monotonicity in econometric models, Ec onometric The ory , 35, 729-776. [12] Cragg, J. G. (1983) More efficien t estimation in the presence of heteroscedasticit y of unknown form, Ec ono- metric a , 51, 751-764. [13] Cragg, J. G. (1992) Quasi-Aitken estimatio n for h eteroskedasticit y of unknown form, Journal of Ec onomet- rics , 54, 179-201 . [14] D ¨ um bgen, L. and V . G. Sp okoiny (2001) Multiscale testing of qu alitativ e hypotheses, A nnals of Statistics , 29, 124-152. [15] F ang, B., Gu ntuboyina, A., and Sen, B. (2021) Multiv ariate exten sions of isotonic regression and total v ariation denoising via enti re monotonicit y and H ardy–Krause v ariation. An n. Statist. 49(2): 769-792. [16] Ghosal, S., Sen, A. and A. v an der V aart (2000) T esting monotonicit y of regressio n, Annals of Stat istics , 28, 1054-1082 . [17] Grenander, U. (1956) On th e theory of mortalit y measurement. II., Skand. Aktuarietidskr , 39, 125-153. [18] Groeneb o om, P . and G. Jo ngbloed (2014) Nonp ar ametric Estimation under Shap e Constr aints , Cam bridge Universit y Press. [19] Hall, P . an d N. E. Heckman (2000). T esting for monotonicity of a regression mean by calibrating for linear functions, Anna ls of Statistics , 28, 20-39. [20] Hansen, A. H. (1939) Economic progress and declining p opulation growth, Americ an Ec onomi c R eview , 29, 1-15. [21] Hsu, Y. C., Liu , C. A. and X . Shi (2019) T esting generalized regression monotonicit y , Ec onometric The ory , 35, 1146-120 0. [22] Jobson, J. D. and W. A. F uller (1980) Least squares estimation when th e co v ariance matrix and parameter vector are functionally related, Journal of the Americ an Statistic al Asso ciation , 75, 176-181 . [23] Kim, J. and D. Polla rd (1990) Cube ro ot asymptotics, Annals of Statistics , 18, 191-219 . [24] Kuliko v, V. N. and Lopuha¨ a, H. P . 
(2006) The b ehavior of the NPMLE of a decreasing density near the b oundaries of the sup p ort. A nnals of Statistics , 34(2), 742-768 . [25] Matzkin, R. L. (1994) Restrictions of economic theory in nonparametric metho ds, in Engle R. F. and D. L. McF adden (eds.), Handb o ok of Ec onometrics , vol. I V , pp. 252 3-2558, Elsevier. [26] Mincer, J. (197 4) Schooling, Exp erience, and Earnings. Human Behavior & So cial Institutions, No. 2. [27] Newe y , W. K. (1993) Efficient estimation of mod els with conditional moment restrictions, in Maddala, G. S., Rao, C. R. an d H. D. Vinod (eds.), Handb o ok of Statistics , vol. 11, p p. 419-454, Els evier. [28] Rao, B. P . (1969) Estimatio n of a unimodal density , Sankhy ¯ a , A 31, 23-36. [29] Rao, B. P . (1970) Estimation for d istributions with monotone failure rate, Annals of Mathematic al Statistics , 41, 507-519. [30] Robinson, P . M. (1987) A symptotically efficient estimation in t h e presence of heteroskedasticit y of u nknown form, Ec onometric a , 55, 875-891. [31] Ruud, P . A. (2000) A n intr o duction to classic al e c onometric the ory. OUP Catalogue. [32] v an de Geer, S. ( 2000) Empirical Pro cesses in M − estimatio n. Cambridge Univ ersit y Press. [33] v an der V aart, A. W. (200 0) Asymptotic Statistics , Cambridge U niversit y press. [34] v an der V aart, A. W. and J. A. W ellner (1996) We ak Conver genc e and Empiric al Pr o c esses , Springer. [35] W o o dro ofe, M. and Sun, J. (1993) A p enalized maximum lik elihoo d estimate of f (0+) when f is non- increasing. Statistic a Sinic a , 501-515. [36] W o oldridge, J. M. (2010) Ec onometric Analysis of Cr oss Se ction and Panel Data , 2nd ed., MIT Press. [37] W o oldridge, J. M. (2013) Intr o ductory e c onometrics: A mo dern appr o ach, 5th Ed. Cengage learning. 40 School of Social Sciences, W aseda Unive rsity, 1-6-1 Nishiw aseda, Shi njuku-ku, Tokyo 169-8050, Jap an. Email addr ess : yarai@waseda. 
jp Dep ar tmen t of Economics, London School of Economics, Houghton Street, London, WC2A 2AE, UK. Email addr ess : t.otsu@lse.ac .uk Dep ar tmen t of Economics, University of Mannheim , L7 3-5, 68161, Mannh eim, Germa ny. Email addr ess : mengshan.xu@u ni-mannhei m.de 41