Self-concordant analysis for logistic regression
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the $\ell_2$-norm and regularization by the $\ell_1$-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
💡 Research Summary
The paper addresses a long‑standing gap in non‑asymptotic statistical theory: most sharp results have been derived for the squared loss because the corresponding estimators admit closed‑form expressions, while the logistic loss, which is central to binary classification, does not. The authors import a concept from convex optimization—self‑concordant functions—and adapt it to the logistic setting. Classical self‑concordance requires the third derivative to be bounded by the 3/2‑power of the second derivative, a condition violated by the logistic loss. Instead, they define a broader class of functions for which, along any line w+tv, the third derivative of the restriction g(t) is bounded by the second derivative times the norm of the direction vector, i.e., |g‴(t)| ≤ R‖v‖ g″(t) for some constant R.
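As a quick sanity check (an illustration, not code from the paper), the key pointwise inequality behind this condition can be verified numerically for the logistic loss, whose second and third derivatives have closed forms in terms of the sigmoid:

```python
import numpy as np

def sigma(u):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-u))

# Closed-form derivatives of the logistic loss l(u) = log(1 + exp(-u)):
#   l''(u)  = sigma(u) * (1 - sigma(u))
#   l'''(u) = sigma(u) * (1 - sigma(u)) * (1 - 2 * sigma(u))
u = np.linspace(-30.0, 30.0, 10001)
d2 = sigma(u) * (1.0 - sigma(u))
d3 = d2 * (1.0 - 2.0 * sigma(u))

# |l'''(u)| <= l''(u) for all u, since |1 - 2*sigma(u)| <= 1.  Along a line,
# g(t) = l((w + t*v)^T x) has g''' = l'''*(v^T x)^3 and g'' = l''*(v^T x)^2,
# so |g'''(t)| <= |v^T x| * g''(t) <= ||x|| * ||v|| * g''(t).
assert np.all(np.abs(d3) <= d2 + 1e-15)
print("|l'''| <= l'' verified on the grid")
```

By Cauchy–Schwarz this yields the stated condition with R equal to the largest covariate norm, which is why no bound on the linear predictor itself is needed.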
Two technical propositions are proved for this class. Proposition 1 provides global lower and upper second‑order Taylor expansions whose remainder terms involve e^{R‖v‖}, guaranteeing control even when ‖v‖ is large. Proposition 2 analyzes Newton’s method under the same assumptions, showing that if the Newton decrement ν(F,w) is sufficiently small relative to the smallest eigenvalue of the Hessian and to R, then the function has a unique global minimizer w* and the error measured in the Hessian norm satisfies (w−w*)ᵀ∇²F(w)(w−w*) ≤ 16 ν(F,w)². Moreover, a single Newton step reduces the decrement quadratically: ν(F,w+Δ_N(w)) ≤ ν(F,w)².
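The Newton decrement ν(F,w) = (∇F(w)ᵀ∇²F(w)⁻¹∇F(w))^{1/2} and the quadratic shrinkage of the decrement are easy to observe numerically. The following minimal sketch (synthetic data and the ridge weight lam = 0.1 are illustrative choices, not from the paper) runs full Newton steps on an ℓ₂‑regularized empirical logistic risk and records the decrement at each iterate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
# Labels in {-1, +1} drawn from the logistic model.
y = np.where(rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true)), 1.0, -1.0)

lam = 0.1  # illustrative ridge weight; makes the objective strongly convex

def grad_hess(w):
    """Gradient and Hessian of (1/n) sum_i log(1+exp(-y_i x_i^T w)) + lam/2 ||w||^2."""
    m = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(m))          # sigma(-m) = -l'(m)
    g = -(X * (y * s)[:, None]).mean(axis=0) + lam * w
    W = s * (1.0 - s)                    # l''(m)
    H = (X * W[:, None]).T @ X / n + lam * np.eye(d)
    return g, H

w, nus = np.zeros(d), []
for _ in range(6):
    g, H = grad_hess(w)
    step = np.linalg.solve(H, g)
    nus.append(np.sqrt(g @ step))        # Newton decrement nu(F, w)
    w = w - step                         # full (undamped) Newton step

print("decrements:", [f"{nu:.2e}" for nu in nus])
```

Once an iterate enters the region where Proposition 2 applies, each printed decrement is roughly the square of the previous one, which is the quadratic convergence the proposition formalizes.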
These results are then applied to logistic regression. The logistic loss ℓ(u)=log(1+e^{−u}) satisfies the new third‑derivative bound with R equal to the maximal ℓ₂‑norm of the covariates. Consequently, the empirical logistic risk Ĵ₀(w) fits the framework, and a one‑step Newton update started from a population reference parameter w₀ (e.g., the true parameter in a well‑specified model) approximates the empirical minimizer with the same guarantees as in the quadratic case.
For ℓ₂‑regularized logistic regression, the authors consider Ĵ_λ(w) = Ĵ₀(w) + (λ/2)‖w‖². With λ>0 the objective is strongly convex, guaranteeing a unique minimizer ŵ_λ. Under minimal assumptions (bounded inputs, independent binary outputs) they derive a high‑probability bound
J₀(ŵ_λ) ≤ J₀(w₀) + C·(1+‖w₀‖²)·(log(1/δ)/n)
for any reference vector w₀, where C is an absolute constant. When the model is correctly specified, they obtain a finer 1/n expansion and explicit bounds on the remainder terms, matching the precision of classical results for least‑squares regression. The analysis also extends to reproducing‑kernel Hilbert spaces and spline smoothing by exploiting the representer theorem, showing that the same ℓ₂‑regularized bounds hold for non‑parametric estimators.
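The strong-convexity claim underlying the unique minimizer ŵ_λ is elementary to check: the data term of the Hessian is positive semidefinite (since ℓ″ ≥ 0), so the ridge term λI lower-bounds every eigenvalue. A minimal numeric sketch (synthetic data; not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 3
X = rng.normal(size=(n, d))
lam = 0.5                      # illustrative regularization weight
w = rng.normal(size=d)         # arbitrary evaluation point

# Hessian of J_lambda at w: (1/n) sum_i l''(m_i) x_i x_i^T + lam * I.
# Note l''(m) = sigma(m)(1 - sigma(m)) is even in m, so labels y in {-1,+1}
# do not affect the Hessian and are omitted here.
s = 1.0 / (1.0 + np.exp(X @ w))
W = s * (1.0 - s)              # l'' >= 0 => data term is PSD
H = X.T @ (W[:, None] * X) / n + lam * np.eye(d)

min_eig = np.linalg.eigvalsh(H).min()
# Every eigenvalue is at least lam, so J_lambda is lam-strongly convex
# and therefore has a unique global minimizer.
assert min_eig >= lam - 1e-10
print("smallest Hessian eigenvalue:", min_eig)
```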
For ℓ₁‑regularized logistic regression (logistic Lasso), the paper shows that the same Taylor expansions and Newton‑step analysis allow one to transplant recent ℓ₁ results from the square‑loss setting—model consistency and prediction efficiency—directly to the logistic loss. The key is to view the one‑step Newton iterate as solving a weighted least‑squares problem, to which existing ℓ₁ theory applies.
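The weighted-least-squares viewpoint is the classical IRLS identity: one Newton step on the logistic objective coincides exactly with solving a least-squares problem with weights ℓ″(mᵢ) and working responses zᵢ. A minimal sketch verifying the identity on synthetic data (an illustration under these assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 4
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
w = 0.1 * rng.normal(size=d)        # current iterate (small, so weights stay well-conditioned)

m = y * (X @ w)
s = 1.0 / (1.0 + np.exp(m))         # sigma(-m) = -l'(m)
g = X.T @ (-s * y)                  # gradient of sum_i l(y_i x_i^T w)
W = s * (1.0 - s)                   # l''(m): the Newton / IRLS weights
H = X.T @ (W[:, None] * X)

# Plain Newton step.
w_newton = w - np.linalg.solve(H, g)

# Same step as weighted least squares on working responses
# z_i = x_i^T w - l'(m_i) * y_i / l''(m_i).
z = X @ w + (s * y) / W
A = X * np.sqrt(W)[:, None]         # reweight rows: min ||sqrt(W)(z - Xw)||^2
b = np.sqrt(W) * z
w_wls, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(w_newton, w_wls)
print("Newton step == weighted least-squares solution")
```

Because the iterate is a weighted least-squares solution, sparsity and prediction results developed for ℓ₁-penalized least squares can be invoked with these weights, which is the transfer mechanism the summary describes.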
Finally, the authors develop new Bernstein‑type concentration inequalities for quadratic forms of bounded random variables (derived from general U‑statistics results) to support the high‑probability bounds throughout the paper.
In summary, the contribution is a clean, unified framework that replaces the cumbersome third‑derivative assumptions traditionally required for non‑asymptotic analysis of logistic regression with a simple R‖v‖‑controlled condition. This enables the direct transfer of a wealth of non‑asymptotic results from least‑squares regression to logistic regression, covering both ℓ₂ and ℓ₁ regularization, and even non‑parametric kernel methods, without imposing restrictive bounds on the linear predictor. The work bridges statistical learning theory and convex optimization, offering a versatile toolset for future analyses of generalized linear models.