Generalized Sliced Inverse Regression (GSIR) is one of the most important methods for nonlinear sufficient dimension reduction. As shown in Li & Song (2017), it enjoys a convergence rate that is independent of the dimension of the predictor, thus avoiding the curse of dimensionality. In this paper we establish an improved convergence rate for GSIR under additional mild conditions on the eigenvalue decay rate and the degree of smoothness. Our convergence rate can be made arbitrarily close to $n^{-1/3}$ under appropriate decay-rate and smoothness parameters. By comparison, the rate of Li & Song (2017) is $n^{-1/4}$ under the best conditions. This improvement is significant because, for example, in a semiparametric estimation problem involving an infinite-dimensional nuisance parameter, the estimator of the nuisance parameter is often required to converge faster than $n^{-1/4}$ to guarantee desired semiparametric properties such as asymptotic efficiency. This requirement is met by the improved rate, but not by the original one. The sharpened convergence rate can also be established for GSIR in more general settings, such as functional sufficient dimension reduction.
For regression problems with high-dimensional predictors, sufficient dimension reduction (SDR) provides a powerful framework for finding a low-dimensional representation of the predictor that preserves all the information useful for predicting the response. The theoretical foundation of SDR builds on the concept of sufficiency, which posits that certain functions of the predictors capture all the information about the response. Consequently, the remaining predictors can be ignored without any loss of information. SDR facilitates data visualization via low-dimensional representations of the predictors, performs data summarization without losing information, and enhances prediction accuracy by alleviating the curse of dimensionality.
Classic linear SDR assumes the existence of a $p \times d$ matrix $B$, with $d < p$, such that $Y$ is independent of $X$ conditional on $B^{\top}X$. In symbols,
$$Y \perp\!\!\!\perp X \mid B^{\top}X. \qquad (1)$$
If this relation holds, the low-dimensional representation $B^{\top}X$ serves as a sufficient predictor for $Y$, since the conditional distribution of $Y$ given $X$ is fully characterized by $B^{\top}X$.
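For instance, the single-index model $Y = g(\beta^{\top}X) + \varepsilon$ with $\varepsilon \perp\!\!\!\perp X$ satisfies (1) with $d = 1$ and $B = \beta$; more generally, any model of the form $Y = g(B^{\top}X, \varepsilon)$ with $\varepsilon \perp\!\!\!\perp X$ satisfies (1).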
Note that the matrix $B$ in (1) is only identifiable up to an invertible right transformation. Thus, the identifiable parameter to estimate is the column space of $B$, denoted by $\mathrm{span}(B)$. The central space, denoted by $\mathcal{S}_{Y|X}$, is defined as the intersection of all subspaces $\mathrm{span}(B)$ with $B$ satisfying (1). It is the target of estimation in linear SDR, which was first proposed and studied by Li (1991). See Li (2018b) and Ma & Zhu (2013) for details. Many methods have been proposed to estimate $\mathcal{S}_{Y|X}$, such as sliced inverse regression (SIR, Li (1991)), sliced average variance estimation (SAVE, Cook & Weisberg (1991)), contour regression (CR, Li et al. (2005)), and directional regression (DR, Li & Wang (2007)).
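To fix ideas, the following is a minimal sketch of how SIR estimates a basis of $\mathcal{S}_{Y|X}$: standardize $X$, slice the sample on $Y$, form the between-slice covariance of the slice means, and take its leading eigenvectors. The number of slices, the equal-frequency slicing, and the function name sir_directions are illustrative choices, not part of any particular published implementation.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, d=2):
    """Toy sliced inverse regression (SIR): estimate d basis directions of span(B)."""
    n, p = X.shape
    # Standardize the predictor: Z = Sigma^{-1/2}(X - mean), assuming Sigma is nonsingular
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ Sigma_inv_half
    # Partition the sample into equal-frequency slices along the response
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        zbar = Z[idx].mean(axis=0)          # slice mean of standardized predictors
        M += (len(idx) / n) * np.outer(zbar, zbar)
    # Leading eigenvectors of the between-slice covariance, mapped back to the X scale
    w, V = np.linalg.eigh(M)
    top = np.argsort(w)[::-1][:d]
    return Sigma_inv_half @ V[:, top]       # columns estimate a basis of span(B)
```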
A closely related problem, called SDR for the conditional mean, assumes the existence of a $p \times d$ matrix $B$, with $d < p$, such that
$$Y \perp\!\!\!\perp E(Y \mid X) \mid B^{\top}X, \qquad (2)$$
which was proposed in Cook & Li (2002) and Cook & Li (2004). Clearly, (2) is a weaker condition than (1), which is useful in many regression settings. The target of estimation in this problem is the central mean space, denoted by $\mathcal{S}_{E(Y|X)}$, which is the intersection of all the subspaces spanned by the columns of $B$ satisfying (2). Methods that target the central mean space include, among others, ordinary least squares (OLS, Li & Duan (1989)), principal Hessian directions (PHD, Li (1992)), iterative Hessian transformation (IHT, Cook & Li (2002, 2004)), outer product of gradients (OPG, Xia et al. (2002)), and minimum average variance estimation (MAVE, Xia et al. (2002)).
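As a concrete example of a central mean space estimator, the sketch below implements the response-based version of PHD: after standardizing $X$, it forms the moment matrix $\frac{1}{n}\sum_i (y_i - \bar{y}) z_i z_i^{\top}$ and ranks its eigenvectors by absolute eigenvalue. The response-based (rather than residual-based) moment and the function name phd_directions are simplifications chosen for illustration.

```python
import numpy as np

def phd_directions(X, y, d=2):
    """Toy response-based principal Hessian directions (PHD) for the central mean space."""
    n, p = X.shape
    # Standardize the predictor, assuming a nonsingular covariance matrix
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ Sigma_inv_half
    # Average Hessian surrogate: (1/n) * sum_i (y_i - ybar) * z_i z_i^T
    yc = y - y.mean()
    H = (Z * yc[:, None]).T @ Z / n
    # Directions with the largest absolute eigenvalues, mapped back to the X scale
    w, V = np.linalg.eigh(H)
    top = np.argsort(np.abs(w))[::-1][:d]
    return Sigma_inv_half @ V[:, top]
```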
The methodology of sufficient dimension reduction was extended to a nonlinear setting by several authors, where $B^{\top}X$ is replaced by a set of nonlinear functions. See Wu (2008), Wang (2008), Yeh et al. (2009), Li et al. (2011), Lee et al. (2013), and Li & Song (2017). In the following we adopt the reproducing kernel Hilbert space (RKHS) framework articulated in Li (2018b). Suppose there exist functions $f_1, \ldots, f_d: \mathbb{R}^p \to \mathbb{R}$, with $d < p$, such that
$$Y \perp\!\!\!\perp X \mid f_1(X), \ldots, f_d(X). \qquad (3)$$
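For example, if $Y = g(\|X\|) + \varepsilon$ with $\varepsilon \perp\!\!\!\perp X$, then (3) holds with $d = 1$ and $f_1(x) = \|x\|$, a reduction that no linear projection $B^{\top}X$ with $d < p$ can achieve in general.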
In the above relation, the functions $f_1, \ldots, f_d$ are not identifiable, because any one-to-one transformation of $(f_1(X), \ldots, f_d(X))$ would satisfy the same relation. The identifiable object is the $\sigma$-field generated by $f_1(X), \ldots, f_d(X)$, denoted by $\sigma\{f_1(X), \ldots, f_d(X)\}$. The goal of nonlinear SDR is to recover this $\sigma$-field, or any set of functions generating it. Two main classes of approaches to the nonlinear SDR problem (3) have been developed: RKHS-based methods, proposed by Li et al. (2011), Lee et al. (2013), and Li & Song (2017), and deep-learning-based methods built on various neural network structures, including Liang et al. (2022), Sun & Liang (2022), Chen et al. (2024), Tang & Li (2025), and Xu et al. (2025). Among the RKHS-based methods, the most commonly used is Generalized Sliced Inverse Regression (GSIR), first proposed by Lee et al. (2013). By leveraging nonlinear transformations of the predictor, GSIR can achieve better dimension reduction than linear SDR methods. Consequently, it has been applied in various fields, such as graphical models (Li & Kim, 2024), reliability analysis (Yin & Du, 2022), and distributional data regression (Zhang et al., 2024). Furthermore, Li & Song (2017) extend GSIR to f-GSIR, a functional variant of GSIR in which both $X$ and $Y$ are random functions lying in Hilbert spaces rather than Euclidean spaces.
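To make the kernel construction concrete, the following Python sketch illustrates the type of computation underlying GSIR: centered Gram matrices for $X$ and $Y$, Tikhonov-regularized inverses, and the leading eigenfunctions of a regularized inverse-regression operator. The Gaussian kernel, the regularization scheme $G + n\epsilon I$, the specific operator composition, and the function names gaussian_gram and gsir_sketch are illustrative assumptions; the exact estimator and tuning-parameter choices of Lee et al. (2013) and Li & Song (2017) differ in their details.

```python
import numpy as np

def gaussian_gram(A, gamma=1.0):
    """Gaussian kernel Gram matrix K_ij = exp(-gamma * ||a_i - a_j||^2)."""
    sq = np.sum(A ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * A @ A.T))

def gsir_sketch(X, Y, d=2, eps=1e-2, gamma_x=1.0, gamma_y=1.0):
    """Illustrative GSIR-style estimator: returns the values of d estimated
    nonlinear sufficient predictors at the n sample points (an n x d matrix)."""
    n = X.shape[0]
    Y = np.asarray(Y).reshape(n, -1)
    Q = np.eye(n) - np.ones((n, n)) / n          # centering projection
    Gx = Q @ gaussian_gram(X, gamma_x) @ Q       # centered Gram matrix for X
    Gy = Q @ gaussian_gram(Y, gamma_y) @ Q       # centered Gram matrix for Y
    # Coordinate matrix (restricted to the sample span) of the regularized operator
    # (Sigma_XX + eps I)^{-1} Sigma_XY (Sigma_YY + eps I)^{-1} Sigma_YX,
    # an assumed simplification of the GSIR candidate operator.
    M = np.linalg.solve(Gx + n * eps * np.eye(n),
                        Gy @ np.linalg.solve(Gy + n * eps * np.eye(n), Gx))
    vals, vecs = np.linalg.eig(M)                # M is not symmetric; use a general solver
    top = np.argsort(-np.real(vals))[:d]
    C = np.real(vecs[:, top])                    # coefficient vectors of the eigenfunctions
    return Gx @ C                                # f_j(x_i) = sum_k C[k, j] * Gx[i, k]
```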
A critically important property of GSIR is its convergence rate, as it is often used in conjunction with downstream nonparametric regression, conditional density estimation, and graphical model estimation. The convergence rate of GSIR directly affects the accuracy of these downstream analyses. So far, the only published convergence rate we know of is that given in Li & Song (2017), which is
where β > 0 is a constant representing the degree of smoothness between the