Minimax Rates of Estimation for Sparse PCA in High Dimensions
We study sparse principal components analysis in the high-dimensional setting, where $p$ (the number of variables) can be much larger than $n$ (the number of observations). We prove optimal, non-asymptotic lower and upper bounds on the minimax estimation error for the leading eigenvector when it belongs to an $\ell_q$ ball for $q \in [0,1]$. Our bounds are sharp in $p$ and $n$ for all $q \in [0, 1]$ over a wide class of distributions. The upper bound is obtained by analyzing the performance of $\ell_q$-constrained PCA. In particular, our results provide convergence rates for $\ell_1$-constrained PCA.
💡 Research Summary
The paper tackles the fundamental problem of estimating the leading eigenvector of a covariance matrix when the true eigenvector is sparse, in a regime where the ambient dimension p can far exceed the sample size n. Assuming the data are i.i.d. mean‑zero sub‑Gaussian vectors with covariance Σ, the authors focus on eigenvectors that belong to an ℓ_q ball (0 ≤ q ≤ 1) of radius R, which captures a wide range of sparsity patterns from exact s‑sparsity (q = 0) to ℓ_1‑regularized sparsity (q = 1).
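For concreteness, the ℓ_q sparsity classes referenced above can be written in standard notation (this display is a paraphrase of the usual ℓ_q-ball definition, not lifted verbatim from the paper):

```latex
\mathbb{B}_q(R) = \Bigl\{ v \in \mathbb{R}^p : \textstyle\sum_{j=1}^{p} |v_j|^q \le R \Bigr\}, \quad 0 < q \le 1,
\qquad
\mathbb{B}_0(s) = \bigl\{ v \in \mathbb{R}^p : \#\{ j : v_j \ne 0 \} \le s \bigr\}.
```

Smaller q imposes a more stringent notion of sparsity, with the q = 0 ball controlling the support size directly.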
The first major contribution is a non‑asymptotic minimax lower bound on the mean‑squared error (MSE) of any estimator $\hat v$. Using Fano's inequality together with metric‑entropy calculations for the ℓ_q ball, they show that no estimator can achieve an MSE of smaller order than $R\,(\log p / n)^{1-q/2}$, up to constants and factors depending on the eigenvalue gap of Σ; this rate is matched by the ℓ_q‑constrained PCA estimator analyzed in the upper bound.