Tractable Gaussian Phase Retrieval with Heavy Tails and Adversarial Corruption with Near-Linear Sample Complexity
Phase retrieval is the classical problem of recovering a signal $x^* \in \mathbb{R}^n$ from its noisy phaseless measurements $y_i = \langle a_i, x^* \rangle^2 + \zeta_i$ (where $\zeta_i$ denotes noise, and $a_i$ is the sensing vector) for $i \in [m]$. The problem of phase retrieval has a rich history, with a variety of applications such as optics, crystallography, heteroscedastic regression, astrophysics, etc. A major consideration in algorithms for phase retrieval is robustness against measurement errors. In recent breakthroughs in algorithmic robust statistics, efficient algorithms have been developed for several parameter estimation tasks such as mean estimation, covariance estimation, robust principal component analysis (PCA), etc. in the presence of heavy-tailed noise and adversarial corruptions. In this paper, we study efficient algorithms for robust phase retrieval with heavy-tailed noise when a constant fraction of both the measurements $y_i$ and the sensing vectors $a_i$ may be arbitrarily adversarially corrupted. For this problem, Buna and Rebeschini (AISTATS 2025) very recently gave an exponential time algorithm with sample complexity $O(n \log n)$. Their algorithm needs a robust spectral initialization, specifically, a robust estimate of the top eigenvector of a covariance matrix, which they deemed to be beyond known efficient algorithmic techniques (similar spectral initializations are a key ingredient of a large family of phase retrieval algorithms). In this work, we make a connection between robust spectral initialization and recent algorithmic advances in robust PCA, yielding the first polynomial-time algorithms for robust phase retrieval with both heavy-tailed noise and adversarial corruptions, in fact with near-linear (in $n$) sample complexity.
💡 Research Summary
Phase retrieval seeks to reconstruct an unknown signal x* ∈ ℝⁿ from magnitude‑only measurements yᵢ = ⟨aᵢ, x*⟩² + ζᵢ, where the sensing vectors aᵢ are drawn from a standard Gaussian distribution. In many practical scenarios the noise ζᵢ is heavy‑tailed (zero‑mean, bounded fourth moment) and a constant fraction ε of the (aᵢ, yᵢ) pairs may be arbitrarily corrupted by an adversary. This “strong adversarial corruption” model captures, for example, compromised sensors in large‑scale IoT deployments. Prior work (Buna & Rebeschini, AISTATS 2025) gave an exponential‑time algorithm with O(n log n) samples, relying on a robust spectral initializer that computes the top eigenvector of Cov(y a). Dong et al. later claimed a near‑linear‑time method for the noiseless case (ζ = 0), but their reduction contained a bug. Consequently, no polynomial‑time algorithm with near‑linear sample complexity was known for the full heavy‑tailed, adversarial setting.
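To make the measurement model concrete, the following is a minimal sketch of how such data might be simulated. It assumes Student-t noise with 5 degrees of freedom as one example of a zero-mean distribution with a bounded fourth moment, and a deliberately simple toy adversary that overwrites an ε-fraction of the pairs with a fixed outlier. The function names, the noise scale, and the adversary's strategy are illustrative assumptions, not taken from the paper.

```python
import math
import random

def student_t(rng, df=5):
    """Zero-mean Student-t draw; the fourth moment is finite when df > 4."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)

def sample_measurements(x_star, m, eps, rng, noise_scale=0.1):
    """Draw m pairs (a_i, y_i) with y_i = <a_i, x*>^2 + noise, then corrupt an eps-fraction."""
    n = len(x_star)
    A, y = [], []
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # standard Gaussian sensing vector
        inner = sum(ai * xi for ai, xi in zip(a, x_star))
        A.append(a)
        y.append(inner ** 2 + noise_scale * student_t(rng))
    # Toy adversary (an assumption): replace both a_i and y_i with a fixed outlier.
    # A real adversary may inspect all samples and choose corruptions adaptively.
    bad = rng.sample(range(m), int(eps * m))
    for i in bad:
        A[i] = [10.0] + [0.0] * (n - 1)
        y[i] = -1e3
    return A, y, set(bad)

rng = random.Random(0)
A, y, bad = sample_measurements([1.0, 0.0, 0.0], m=200, eps=0.25, rng=rng)
```

Here `bad` records which indices were overwritten, which is useful for testing but is of course hidden from any estimator in the actual problem.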
The present paper bridges this gap by combining two recent advances in robust statistics. First, it leverages the robust PCA algorithm of Cherapanamjeri et al. (2022) which, under a bounded‑fourth‑moment assumption, can approximate the leading eigenvector of a covariance matrix in polynomial time with only Õ(n) samples. To make this applicable to phase retrieval, the authors introduce a carefully chosen truncation of the product y a. The truncation threshold is set based on the ratio r = K₄/‖x*‖² (where K₄ bounds the fourth moment of the noise). This ensures that the truncated variables still satisfy the bounded‑fourth‑moment condition, allowing the robust PCA routine to recover an accurate estimate of the top eigenvector of Cov(y a).
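The idea of a truncated spectral initializer can be illustrated with a rough sketch on uncorrupted data. This is not the paper's algorithm: the robust PCA routine is swapped for plain power iteration, which offers no protection against adversarial corruption, and the uncentered second moment of yᵢaᵢ stands in for Cov(y a) on the assumption that y a has zero mean under the clean Gaussian model. The hard-truncation rule, the threshold `tau` (picked by hand here rather than from r = K₄/‖x*‖²), and all function names are hypothetical.

```python
import math
import random

def truncated_spectral_init(A, y, tau, iters=100):
    """Top eigenvector of the empirical second moment of truncated y_i * a_i."""
    n, m = len(A[0]), len(A)
    M = [[0.0] * n for _ in range(n)]
    for a, yi in zip(A, y):
        if abs(yi) > tau:      # hard truncation: discard samples with large |y_i|
            continue
        w = yi * yi            # (y_i a_i)(y_i a_i)^T = y_i^2 a_i a_i^T
        for j in range(n):
            for k in range(n):
                M[j][k] += w * a[j] * a[k] / m
    # Plain power iteration; a stand-in that is NOT robust to corrupted samples.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        v = [sum(M[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v

def t_noise(rng, df=5):
    """Student-t noise with a finite fourth moment (df > 4)."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)

rng = random.Random(0)
n, m = 5, 3000
x_star = [1.0, 0.0, 0.0, 0.0, 0.0]
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sum(ai * xi for ai, xi in zip(a, x_star)) ** 2 + 0.1 * t_noise(rng) for a in A]

v = truncated_spectral_init(A, y, tau=9.0)
# |<v, x*>| near 1 means v recovers the direction of x* up to sign.
alignment = abs(sum(vi * xi for vi, xi in zip(v, x_star)))
```

Phase retrieval only identifies x* up to a global sign, so the absolute value of the inner product is the natural alignment measure. The estimate gives a direction only; recovering the scale ‖x*‖ would need a separate step.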
Second, the paper adopts the robust gradient descent framework of Buna and Rebeschini (AISTATS 2025). After obtaining an initial iterate x₀ from the truncated‑PCA step, the algorithm repeatedly computes a robustified gradient of the population loss r(x) = ½ E