Detecting when one probe vector is enough for preconditioned log-determinant approximation
We present randomized algorithms for estimating the log-determinant of regularized symmetric positive semi-definite matrices. The algorithms access the matrix only through matrix-vector products and are based on the introduction of a preconditioner and a stochastic trace estimator. We claim that preconditioning as much as the budget allows, and then making a rough estimate of the residual part with the small remaining budget, achieves a small error in most cases. We choose a Nyström preconditioner and estimate the residual using only one sample of stochastic Lanczos quadrature (SLQ). We analyze the performance of this strategy from both a theoretical and a practical viewpoint. We also present an algorithm that, at almost no additional cost, detects when the proposed strategy is not the most effective, in which case it uses more samples for the SLQ part. Numerical examples on several test matrices show that our proposed methods are competitive with existing algorithms.
💡 Research Summary
The paper addresses the problem of estimating the regularized log‑determinant log det(A + I) of a large symmetric positive‑semidefinite (SPSD) matrix A, a quantity that appears in many machine‑learning and statistical applications such as Gaussian‑process marginal likelihoods and determinantal point processes. Classical approaches either compute the determinant directly (O(n³) cost) or rely on low‑rank approximations of A, but these can be inefficient when the spectrum decays slowly.
The authors propose a two‑stage randomized algorithm that combines a Nyström preconditioner with stochastic Lanczos quadrature (SLQ). First, a Nyström approximation A_ℓ = AΩ(ΩᵀAΩ)†ΩᵀA is built using ℓ = k + p matrix‑vector products, where k is a target rank and p an oversampling parameter. The preconditioner is defined as P_ℓ = A_ℓ + I. By the multiplicative property of determinants,
log det(A + I) = log det(P_ℓ) + trace log(P_ℓ^{-½}(A + I)P_ℓ^{-½}).
The first term is cheap to compute exactly because P_ℓ is low‑rank plus identity. The second term involves the matrix M_ℓ = P_ℓ^{-½}(A + I)P_ℓ^{-½}, which is well‑conditioned when the Nyström approximation captures the dominant spectral mass.
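The two ingredients above can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's implementation: the function names are ours, and numerically robust Nyström codes typically add a small spectral shift before inverting the core matrix, which is omitted here.

```python
import numpy as np

def nystrom_eig(matvec, n, ell, rng=None):
    """Nystrom approximation A_ell = A Omega (Omega^T A Omega)^+ Omega^T A,
    returned as an eigenfactorization A_ell = U diag(s) U^T with U orthonormal.
    Costs ell matrix-vector products with A."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((n, ell))
    Y = np.column_stack([matvec(Omega[:, j]) for j in range(ell)])  # Y = A @ Omega
    core = (Omega.T @ Y + Y.T @ Omega) / 2          # Omega^T A Omega, symmetrized
    Q, R = np.linalg.qr(Y)                          # thin QR of the range sketch
    small = R @ np.linalg.pinv(core) @ R.T          # ell x ell compression of A_ell
    s, V = np.linalg.eigh((small + small.T) / 2)
    return Q @ V, np.clip(s, 0.0, None)             # clip round-off negatives

def logdet_P(s):
    """log det(P_ell) = log det(A_ell + I): the ell nontrivial eigenvalues
    contribute log(1 + s_i); the remaining n - ell eigenvalues of P_ell are 1."""
    return float(np.sum(np.log1p(s)))
```

Because `P_ℓ` differs from the identity only on an ℓ-dimensional subspace, `logdet_P` needs just the ℓ computed eigenvalues, independent of n.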
Instead of using many probe vectors for the trace estimator, the authors show that a single Gaussian vector w can already give a low‑variance estimate if M_ℓ is sufficiently well‑conditioned. The estimator is
trace log(M_ℓ) ≈ wᵀ log(M_ℓ) w,
and its expected squared error equals 2‖log(M_ℓ)‖_F². The paper provides a theoretical analysis of this “one‑sample” strategy under two settings: (i) an idealized case where the quadratic form wᵀ log(M_ℓ) w is computed exactly (at a cost of m matrix‑vector products, m being the number of Lanczos iterations per probe), and (ii) the realistic case where M_ℓ is obtained from the Nyström preconditioner. Lemma 3.1 derives an upper bound on E‖log(M_ℓ)‖_F² in terms of the trailing eigenvalues of A and the parameters k, p, ℓ, showing exponential decay of the bound when the spectrum of A has moderate decay.
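In practice the quadratic form wᵀ log(M_ℓ) w is itself approximated by Lanczos quadrature. A minimal sketch (function name ours; production SLQ codes would add reorthogonalization of the Lanczos basis):

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def slq_quadform(matvec, w, m):
    """Estimate w^T log(M) w for SPD M with m Lanczos steps, i.e. m
    matrix-vector products with M.  Plain three-term Lanczos recurrence,
    then Gauss quadrature on the resulting tridiagonal matrix."""
    beta0 = np.linalg.norm(w)
    q, q_prev, beta = w / beta0, np.zeros_like(w), 0.0
    alphas, betas = [], []
    for _ in range(m):
        v = matvec(q)
        alpha = q @ v
        v = v - alpha * q - beta * q_prev      # three-term recurrence
        alphas.append(alpha)
        beta_new = np.linalg.norm(v)
        if beta_new < 1e-12 * beta0:           # invariant subspace found
            break
        betas.append(beta_new)
        q_prev, q, beta = q, v / beta_new, beta_new
    theta, V = eigh_tridiagonal(np.array(alphas), np.array(betas[:len(alphas) - 1]))
    # Gauss quadrature: nodes are Ritz values, weights are squared first
    # components of the Ritz vectors
    return beta0**2 * float(np.sum(V[0, :] ** 2 * np.log(theta)))
```

When M_ℓ is well conditioned, as the preconditioning is meant to ensure, a small m already gives quadrature errors far below the stochastic error of the single probe.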
Recognizing that the one‑sample approach may fail when the spectrum decays very slowly (so ‖log(M_ℓ)‖_F remains large), the authors introduce an adaptive algorithm called log‑det‑ective (Algorithm 4.1). After constructing the Nyström preconditioner, the algorithm evaluates the variance proxy of the one‑sample estimator; if it exceeds a user‑defined threshold, additional probe vectors are drawn and the trace is estimated with the usual SLQ averaging. This adaptation incurs virtually no extra cost because the extra matrix‑vector products are a small fraction of the total budget.
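The adaptive idea can be caricatured as follows. This sketch is not Algorithm 4.1: the paper's detector thresholds a cheap estimate of the variance proxy 2‖log(M_ℓ)‖_F², whereas here we substitute a simple empirical spread over a pilot pair of probes; the function name and stopping rule are ours.

```python
import numpy as np

def adaptive_trace_log(quadform, n, tol, max_probes=32, rng=None):
    """Sketch of the adaptive step: draw a pilot pair of Gaussian probes;
    if the empirical standard error of their mean exceeds tol, keep
    averaging further probes as in plain multi-sample SLQ.
    `quadform(w)` should return (an approximation of) w^T log(M_ell) w."""
    rng = np.random.default_rng(rng)
    samples = [quadform(rng.standard_normal(n)) for _ in range(2)]
    # Standard error of the running mean, used as a stand-in variance proxy
    while (np.std(samples, ddof=1) / np.sqrt(len(samples)) > tol
           and len(samples) < max_probes):
        samples.append(quadform(rng.standard_normal(n)))
    return float(np.mean(samples))
```

When the one-sample variance is small, the loop never fires and the cost stays at the pilot probes; when it is large, the estimator degrades gracefully into standard SLQ averaging.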
Extensive numerical experiments on synthetic and real‑world matrices with varying spectral profiles (rapid decay, slow decay, spectral gaps, radial basis function kernels) confirm the theory. For matrices with rapid decay, allocating almost all budget to the Nyström preconditioner (ℓ ≈ k + p) and using a single probe vector yields relative errors below 10⁻⁴ with far fewer matrix‑vector products than competing methods. For slowly decaying spectra, log‑det‑ective automatically switches to a multi‑sample regime, achieving errors around 10⁻³ while still outperforming Hutch++, A‑Hutch++, and other low‑rank‑plus‑SLQ hybrids in terms of error‑to‑cost ratio. The authors also demonstrate that the computational cost of building the Nyström preconditioner is essentially independent of the matrix dimension n, making the approach scalable to very large problems (hundreds of thousands of rows).
In summary, the paper contributes:
- A rigorous error analysis of a one‑sample trace estimator applied to a Nyström‑preconditioned matrix, including explicit Frobenius‑norm bounds that depend only on the spectrum of A.
- An adaptive scheme (log‑det‑ective) that automatically balances preconditioning effort and the number of probe vectors, ensuring robust performance across a wide range of spectral decays.
- Empirical evidence that the proposed method consistently achieves higher accuracy per matrix‑vector product than state‑of‑the‑art randomized log‑determinant estimators.
The work thus offers a practical, theoretically grounded tool for log‑determinant approximation in large‑scale machine‑learning and statistical applications, where matrix‑vector products are the only feasible operation.