Iterative Row Sampling
There has been significant recent interest and progress in algorithms that solve regression problems involving tall and thin matrices in input sparsity time. These algorithms find a shorter equivalent of an n × d matrix where n ≫ d, which allows one to solve a poly(d)-sized problem instead. In practice, the best performance is often obtained by invoking these routines in an iterative fashion. We show that these iterative methods can be adapted to give theoretical guarantees comparable to, and better than, the current state of the art. Our approaches are based on computing the importance of the rows, known as leverage scores, in an iterative manner. We show that alternating between computing a short matrix estimate and finding more accurate approximate leverage scores leads to a series of geometrically smaller instances. This gives an algorithm that runs in $O(nnz(A) + d^{\omega + \theta} \epsilon^{-2})$ time for any $\theta > 0$, where the $d^{\omega + \theta}$ term is comparable to the cost of solving a regression problem on the small approximation. Our results build upon the close connection between randomized matrix algorithms, iterative methods, and graph sparsification.
💡 Research Summary
The paper addresses the problem of solving over‑determined linear regression Ax ≈ b when the data matrix A ∈ ℝ^{n×d} is tall and thin (n ≫ d). Recent “input‑sparsity‑time” algorithms achieve O(nnz(A)) preprocessing by sampling rows according to leverage scores, thereby reducing the problem to a poly(d)‑size sketch. In practice, however, these methods are repeatedly invoked: a coarse sketch is built, a more accurate estimate of the leverage scores is computed on that sketch, and a finer sketch is drawn. The authors formalize this intuition and prove that such an iterative scheme yields running‑time and accuracy guarantees that match, and in some regimes improve on, the current state of the art.
The core algorithm proceeds in four phases. (1) A cheap, one‑shot leverage‑score estimator (e.g., based on a CountSketch or SRHT) is applied to A, producing a rough probability distribution over rows. Using this distribution, O(d ε^{‑2}) rows are sampled to form a short matrix \tilde A₁. (2) The exact leverage scores of \tilde A₁ are computed; because \tilde A₁ already captures the dominant spectral structure of A, these scores are far more accurate than the initial estimates. (3) The refined scores define a new sampling distribution, and a second, smaller set of rows \tilde A₂ is drawn. The size of \tilde A₂ is reduced by a constant factor (geometric decay) relative to \tilde A₁. (4) After a logarithmic number of such refinements, the remaining sketch is of size O(d ε^{‑2}) and can be solved directly with any dense regression routine.
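The four phases above can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's exact algorithm: the oversampling constant, the uniform phase‑1 estimates, and the fixed round count are placeholders standing in for the fast one‑shot estimator and the geometric‑decay schedule.

```python
import numpy as np

def refine_scores(A, S):
    # Leverage scores of A's rows measured against the sketch S:
    # tau_i ~= a_i (S^T S)^+ a_i^T.
    G = np.linalg.pinv(S.T @ S)
    return np.einsum('ij,jk,ik->i', A, G, A)

def iterative_row_sample(A, eps=0.5, rounds=3, seed=0):
    # Alternate between drawing a rescaled row sample (a short matrix
    # estimate) and refining the leverage-score estimates on it.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    scores = np.full(n, d / n)                 # phase 1: crude uniform estimates
    S = A
    for _ in range(rounds):
        # Keep row i with probability proportional to its current score,
        # oversampled by an (illustrative) factor of 4 log(d+1) / eps^2.
        probs = np.minimum(1.0, 4 * scores * np.log(d + 1) / eps ** 2)
        keep = rng.random(n) < probs
        S = A[keep] / np.sqrt(probs[keep])[:, None]  # unbiased rescaling of rows
        scores = refine_scores(A, S)           # phases 2-3: refine on the sketch
    return S                                   # phase 4: solve regression on S
```

The returned sketch S satisfies xᵀSᵀSx ≈ xᵀAᵀAx for all x, so any dense solver applied to S approximately solves the original regression.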
The theoretical contribution lies in a novel analysis linking leverage‑score approximation error to the spectral properties of the current sketch. By bounding both over‑ and under‑estimation of scores using eigenvalue perturbation arguments, the authors show that each refinement step reduces the required sample size by a constant factor while preserving the (1 ± ε) spectral approximation guarantee. Consequently, the total number of rows ever sampled is O(d ε^{‑2} log n), but because the later stages dominate the computational cost, the overall runtime collapses to
O(nnz(A) + d^{ω+θ} ε^{‑2})
for any arbitrarily small constant θ > 0, where ω ≈ 2.373 is the matrix‑multiplication exponent. The first term is the cost of a single pass over the input (the “input‑sparsity” component); the second term is the cost of solving the final dense regression on the tiny sketch, which matches the cost of the best known dense solvers up to the negligible d^{θ} factor.
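A quick numerical sanity check (ours, not the paper's) of the fact underlying the O(d ε^{‑2}) sketch size: exact leverage scores lie in [0, 1] and sum to rank(A) ≤ d, so sampling rows in proportion to them keeps only about d/ε² rows in expectation.

```python
import numpy as np

# Leverage score of row i: squared norm of the i-th row of the
# orthonormal factor Q in the reduced QR decomposition A = QR.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 8))
Q, _ = np.linalg.qr(A)                 # Q is 2000 x 8 with orthonormal columns
tau = np.sum(Q * Q, axis=1)            # tau_i = ||Q[i, :]||^2, each in [0, 1]
assert abs(tau.sum() - 8) < 1e-8       # scores sum to rank(A) = d = 8
```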
A particularly insightful aspect of the work is its connection to graph sparsification. When A is the edge‑vertex incidence matrix of a graph, AᵀA is the graph Laplacian, and the leverage score of each row coincides with the effective resistance of the corresponding edge. The iterative sampling process mirrors resistance‑based sparsification algorithms that repeatedly refine edge probabilities, thereby unifying three research strands: randomized numerical linear algebra, iterative linear‑system solvers, and spectral graph theory.
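This equivalence is easy to verify numerically. The toy example below (ours, not from the paper) builds the incidence matrix of a four‑vertex graph, a triangle with one pendant edge, and checks that row leverage scores equal edge effective resistances r(u, v) = (e_u − e_v)ᵀL⁺(e_u − e_v).

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]    # triangle plus a pendant edge
n = 4
B = np.zeros((len(edges), n))               # edge-vertex incidence matrix
for k, (u, v) in enumerate(edges):
    B[k, u], B[k, v] = 1.0, -1.0

L = B.T @ B                                 # graph Laplacian
L_pinv = np.linalg.pinv(L)
leverage = np.einsum('ij,jk,ik->i', B, L_pinv, B)  # b_k^T L^+ b_k per edge
# Each triangle edge: two parallel paths of resistance 1 and 2, so
# effective resistance (1*2)/(1+2) = 2/3; the bridge edge has resistance 1.
```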
Empirical evaluation on synthetic and real‑world datasets confirms the theory. Compared with state‑of‑the‑art one‑shot sketching methods (e.g., SRHT, CountSketch), the iterative approach attains the same relative error ε while reducing wall‑clock time by a factor of 2–5, especially when d is in the thousands and A is highly sparse. The experiments also demonstrate that only a few refinement rounds (typically 2–3) are sufficient to reach the asymptotic regime predicted by the analysis.
In summary, the paper introduces a clean, theoretically grounded iterative row‑sampling framework that improves both the asymptotic complexity and practical performance of input‑sparsity regression algorithms. By alternating between short‑matrix construction and increasingly accurate leverage‑score estimation, it achieves a running time of O(nnz(A)+d^{ω+θ}ε^{‑2}), which is competitive with, and often superior to, the current state of the art. The work also deepens our understanding of the interplay between random matrix sketching, iterative refinement, and graph sparsification.