OEM for least squares problems
We propose an algorithm, called OEM (a.k.a. orthogonalizing EM), intended for various least squares problems. The first step, named active orthogonalization, orthogonalizes an arbitrary regression matrix by elaborately adding more rows. The second step imputes the responses of the new rows. The third step solves the least squares problem of interest for the complete orthogonal design. The second and third steps have simple closed forms and are iterated until convergence. The algorithm works for ordinary least squares and regularized least squares with the lasso, SCAD, MCP, and other penalties. It has several attractive theoretical properties. For ordinary least squares with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For SCAD and MCP, an OEM sequence can achieve the oracle property after sufficient iterations for a fixed or diverging number of variables. For ordinary and regularized least squares with various penalties, an OEM sequence converges to a point having grouping coherence for fully aliased regression matrices. The convergence and convergence rate of the algorithm are examined. These convergence rate results show that, for the same data set, OEM converges faster for regularized least squares than for ordinary least squares, providing a new theoretical comparison between these methods. Numerical examples are provided to illustrate the proposed algorithm.
💡 Research Summary
The paper introduces a novel algorithm called OEM (Orthogonalizing EM) for solving ordinary and regularized least-squares problems. The key idea is to "actively orthogonalize" any design matrix \(X\) by appending a small number of carefully constructed rows, thereby creating an enlarged matrix \(X_c\) whose columns are exactly orthogonal. This is achieved by scaling \(X\) with a diagonal matrix \(S\), performing an eigendecomposition \(Z^\top Z = V \Gamma V^\top\) where \(Z = X S^{-1}\), and forming the augmentation matrix \(\Delta = B^{1/2} V^\top S\) with \(B = d I - \Gamma\). The resulting \(X_c = \begin{pmatrix} X \\ \Delta \end{pmatrix}\) satisfies \(X_c^\top X_c = d S^2\) for a scalar \(d\) no smaller than the largest eigenvalue of \(Z^\top Z\) (equivalently, of \(X^\top X\) when \(S = I\)).
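The augmentation step above can be sketched numerically. The snippet below is a minimal illustration, assuming \(S = I\) (the paper allows a general diagonal scaling) and taking \(d = \gamma_1\), the largest eigenvalue; `active_orthogonalize` is an illustrative name, not code from the paper.

```python
import numpy as np

def active_orthogonalize(X):
    """Sketch of active orthogonalization with S = I (an assumption;
    the paper allows a general diagonal S).  Returns Delta and d such
    that X_c = [X; Delta] satisfies X_c' X_c = d * I."""
    gamma, V = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
    d = gamma[-1]                        # take d = gamma_1, the largest eigenvalue
    B = np.maximum(d - gamma, 0.0)       # B = dI - Gamma, clipped against round-off
    Delta = np.sqrt(B)[:, None] * V.T    # Delta = B^{1/2} V', so Delta'Delta = dI - X'X
    return Delta, d
```

Stacking `Delta` under `X` then gives a complete design whose Gram matrix is diagonal, which is what makes the subsequent OEM updates decouple coordinate by coordinate.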
Once orthogonalization is complete, OEM iterates two simple steps. In the "E-step" the missing responses for the added rows are imputed as \(y_{\text{miss}} = \Delta \beta^{(k)}\). In the "M-step" the complete data \((X_c, y_c)\) are used to update the coefficient vector. Because \(X_c\) has orthogonal columns, the update decouples across coordinates: for ordinary least squares, \(\beta^{(k+1)}_j = u_j / d_j\), where \(u = X^\top y + \Delta^\top \Delta \beta^{(k)}\) and the \(d_j\) are the diagonal entries of \(X_c^\top X_c\). For regularized problems the update becomes a one-dimensional penalized quadratic, which also has a closed-form solution for many popular penalties (lasso, SCAD, MCP, ridge, non-negative garrote).
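The two steps can be sketched for the lasso, where the one-dimensional update is soft-thresholding. This is a minimal sketch assuming \(S = I\) and \(d = \gamma_1\); note that \(\Delta^\top \Delta = d I - X^\top X\), so \(\Delta\) never needs to be formed explicitly. The function name and the penalty convention (`lam` weights \(\|\beta\|_1\) against \(\tfrac12\|y - X\beta\|^2\)) are assumptions for illustration.

```python
import numpy as np

def soft_threshold(u, lam):
    """Closed-form minimizer of (1/2)(b - u)^2 + lam*|b| up to scaling."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def oem_lasso(X, y, lam, n_iter=500, tol=1e-8):
    """OEM sketch for the lasso: E-step imputation folded into u,
    M-step decoupled across coordinates (S = I, d = gamma_1 assumed)."""
    n, p = X.shape
    XtX = X.T @ X
    Xty = X.T @ y
    d = np.linalg.eigvalsh(XtX)[-1]   # d = gamma_1, largest eigenvalue
    DtD = d * np.eye(p) - XtX         # Delta' Delta, without forming Delta
    beta = np.zeros(p)
    for _ in range(n_iter):
        u = Xty + DtD @ beta                      # E-step: uses imputed responses
        beta_new = soft_threshold(u, lam) / d     # M-step: coordinatewise closed form
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With `lam = 0` the update reduces to the plain OLS iteration, so the same loop covers both the ordinary and the penalized case.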
The authors provide a thorough theoretical analysis. For OLS with a singular \(X\) (rank \(r < p\)), if the orthogonalization is performed with \(d = \gamma_1\) (the largest eigenvalue of \(X^\top X\)) and \(S = I\), OEM converges to the Moore-Penrose generalized inverse solution \(\hat\beta^* = (X^\top X)^+ X^\top y\). This limit has minimal Euclidean norm among all least-squares solutions, a simple risk expression \(E\|X(\hat\beta^* - \beta)\|^2 = r\sigma^2\), and respects linear dependencies among columns (i.e., \(X\alpha = 0 \Rightarrow \alpha^\top \hat\beta^* = 0\)). The paper shows that OEM's iterative scheme is numerically more stable than direct eigendecomposition, especially when some eigenvalues are near zero, and has computational complexity \(O(np^2)\) compared with \(O(np^2 + p^3)\) for the eigendecomposition approach.
For regularized problems, the authors prove that OEM converges to a stationary point (or a local minimum) under mild conditions for any penalty that is separable across coordinates. For the non-convex penalties SCAD and MCP, they show that after a sufficient number of iterations OEM attains the oracle property: it selects the true active set with probability tending to one and estimates the non-zero coefficients as efficiently as if the true model were known. Moreover, OEM yields "grouping coherence": when columns of \(X\) are fully aliased, the corresponding coefficients converge to identical values, guaranteeing that such variables enter or leave the model together, a property not shared by standard coordinate-descent or gradient-descent algorithms.
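Both the Moore-Penrose limit and grouping coherence can be checked numerically for the OLS case. The sketch below (again assuming \(S = I\) and \(d = \gamma_1\)) duplicates a column to create a fully aliased, singular design; starting from \(\beta^{(0)} = 0\), the iterates stay in the row space of \(X\) and approach the minimum-norm solution, with the aliased coefficients equal.

```python
import numpy as np

# Build a singular design: duplicate the first column (full aliasing).
rng = np.random.default_rng(3)
G = rng.standard_normal((30, 4))
X = np.hstack([G, G[:, :1]])          # rank 4, but p = 5 columns
y = rng.standard_normal(30)

# OEM update for OLS with S = I, d = gamma_1 (an assumption of this sketch):
# beta <- beta + X'(y - X beta) / gamma_1, starting from beta = 0.
gamma1 = np.linalg.eigvalsh(X.T @ X)[-1]
beta = np.zeros(5)
for _ in range(5000):
    beta = beta + X.T @ (y - X @ beta) / gamma1

# Reference: the Moore-Penrose (minimum-norm) least squares solution.
beta_mp = np.linalg.pinv(X) @ y
```

Because columns 0 and 4 are identical, `beta[0]` and `beta[4]` converge to the same value, illustrating grouping coherence on a fully aliased design.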
Convergence-rate analysis reveals that the spectral radius of the iteration matrix \(D = I - \gamma_1^{-1} X^\top X\) governs the speed. Because regularization effectively reduces the magnitude of the eigenvalues of \(X^\top X\), the radius becomes smaller and OEM converges faster for penalized problems than for plain OLS on the same data. Empirical studies confirm these theoretical findings. In tall-data settings (\(n > p\)), OEM outperforms the traditional eigendecomposition method by a factor of 2–5 in runtime and matches or exceeds the speed of state-of-the-art solvers for the lasso, SCAD, and MCP. In wide settings (\(p > n\)), OEM can be slower, but a two-stage strategy (screening followed by OEM) mitigates this drawback.
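The role of the spectral radius is easy to verify for the OLS iteration: the error \(\beta^{(k)} - \hat\beta\) is multiplied by the symmetric matrix \(D\) at each step, so its norm shrinks by at least the factor \(\rho(D) = 1 - \gamma_{\min}/\gamma_1\) per iteration. A small check, under the same \(S = I\), \(d = \gamma_1\) assumptions as above:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6))
y = rng.standard_normal(100)

gam = np.linalg.eigvalsh(X.T @ X)     # eigenvalues in ascending order
rho = 1.0 - gam[0] / gam[-1]          # spectral radius of D for full-rank X

beta_star = np.linalg.lstsq(X, y, rcond=None)[0]
beta = np.zeros(6)
errors = []
for _ in range(25):
    # OEM update for OLS: error e satisfies e_{k+1} = D e_k
    beta = beta + X.T @ (y - X @ beta) / gam[-1]
    errors.append(np.linalg.norm(beta - beta_star))
```

Each recorded error is at most `rho` times the previous one, matching the linear convergence rate the paper derives; a smaller effective radius under regularization is what yields the faster penalized convergence.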
Overall, OEM offers a unified, theoretically sound, and computationally efficient framework for a broad class of least‑squares problems. Its ability to orthogonalize any design matrix, coupled with closed‑form coordinate updates, bridges the gap between EM‑type missing‑data algorithms and modern regularization techniques. Future work suggested includes parallel implementations for massive data, extensions to generalized linear models (e.g., logistic regression), and online versions suitable for streaming data environments.