Computing adjoint mismatch of linear maps
This paper considers the problem of detecting adjoint mismatch for two linear maps. To clarify, this means that we aim to calculate the operator norm for the difference of two linear maps, where for one we only have a black-box implementation for the evaluation of the map, and for the other we only have a black-box for the evaluation of the adjoint map. We give a stochastic algorithm for which we prove the almost sure convergence to the operator norm. The algorithm is a random search method for a generalization of the Rayleigh quotient and uses optimal step sizes. Additionally, a convergence analysis is done for the corresponding singular vector and the respective eigenvalue equation.
💡 Research Summary
The paper tackles the practical problem of quantifying the “adjoint mismatch” between two linear operators A : ℝ^d → ℝ^m and V : ℝ^d → ℝ^m when only black‑box access to the forward evaluation of A and the adjoint evaluation of V is available. This situation arises, for example, in computed tomography, where the forward projection and the back‑projection are discretized independently, often yielding non‑adjoint pairs and making the operator norm ‖A − V‖ difficult to compute. Traditional approaches such as power iteration on (A − V)*(A − V), Krylov subspace methods, or the Lanczos method require explicit access to the adjoint of the operator under investigation, and they also need to store many intermediate vectors, which is prohibitive for large‑scale problems.
The authors propose a stochastic, memory‑light algorithm that converges almost surely to the exact operator norm ‖A − V‖ while using only O(max{m,d}) storage (four vectors in total). The key observation is the variational characterization
‖A − V‖ = max_{‖u‖=‖v‖=1} ⟨u,(A − V)v⟩,
which can be interpreted as a generalized Rayleigh quotient involving a left vector u and a right vector v. Starting from random unit vectors u₀ and v₀ (with a sign adjustment to guarantee non‑negativity of the objective), the algorithm iteratively samples random directions w_k ∈ T_{u_k}S^{m‑1} and x_k ∈ T_{v_k}S^{d‑1} by projecting Gaussian vectors onto the respective tangent spaces. For each iteration the scalar function
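The variational characterization and the tangent‑space sampling step can be checked numerically on small dense matrices, assuming full access to both maps for verification purposes (all names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 4
A = rng.standard_normal((m, d))
V = rng.standard_normal((m, d))

# The operator norm of the mismatch is the largest singular value of A - V,
# and the top singular pair attains max <u, (A - V) v> over unit vectors.
U, s, Vt = np.linalg.svd(A - V)
u_star, v_star = U[:, 0], Vt[0, :]
assert np.isclose(s[0], u_star @ (A - V) @ v_star)

def tangent_direction(x, rng):
    """Unit direction in the tangent space T_x S^{n-1}: project a
    Gaussian sample orthogonally to x, then normalize."""
    g = rng.standard_normal(x.shape)
    g -= (g @ x) * x
    return g / np.linalg.norm(g)

u = rng.standard_normal(m)
u /= np.linalg.norm(u)
w = tangent_direction(u, rng)
assert np.isclose(w @ u, 0.0)  # tangent means orthogonal to u
```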
q_k(τ,ξ) = ⟨u_k + τ w_k, (A − V)(v_k + ξ x_k)⟩ / (‖u_k + τ w_k‖‖v_k + ξ x_k‖)
is maximized over the two step sizes τ and ξ. By expanding the numerator and denominator, q_k can be expressed in terms of four inner‑product coefficients
a_k = ⟨u_k,Av_k⟩ − ⟨V* u_k, v_k⟩, b_k = ⟨w_k,Av_k⟩ − ⟨V* w_k, v_k⟩, c_k = ⟨u_k,Ax_k⟩ − ⟨V* u_k, x_k⟩, d_k = ⟨w_k,Ax_k⟩ − ⟨V* w_k, x_k⟩,
which are all computable with the available black‑boxes. The maximization problem reduces to a rational function of τ and ξ, and the authors derive closed‑form optimal step sizes (τ_k, ξ_k) by solving the first‑order optimality conditions. When a_k b_k + c_k d_k ≠ 0, the optimal τ_k is given by a sign‑adjusted expression involving a square root, and the corresponding ξ_k follows from a simple linear relation. The degenerate case a_k b_k + c_k d_k = 0 is treated separately: either τ = 0 is optimal or the supremum is not attained, but this event occurs with probability zero under the random sampling scheme.
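To make the reduction explicit: assuming the tangent directions are normalized, i.e. ⟨u_k, w_k⟩ = 0 and ‖w_k‖ = 1 (and analogously for x_k), expanding the bilinear numerator and the norms in the denominator gives

```latex
q_k(\tau,\xi)
  \;=\; \frac{a_k \,+\, \tau\, b_k \,+\, \xi\, c_k \,+\, \tau\xi\, d_k}
             {\sqrt{1+\tau^{2}}\,\sqrt{1+\xi^{2}}} .
```

One way to read this (an observation, not necessarily how the paper phrases it): with M_k = [[a_k, c_k], [b_k, d_k]], the right‑hand side is the bilinear form ⟨(1,τ), M_k (1,ξ)⟩ divided by the lengths of (1,τ) and (1,ξ), so the supremum over (τ, ξ) equals the largest singular value of the 2×2 matrix M_k whenever the corresponding singular vectors have nonzero first components; the excluded directions are precisely the degenerate cases in which the supremum is not attained.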
The algorithm proceeds as follows:
- Initialize u₀ and v₀ uniformly on the unit spheres, flipping the sign of u₀ if necessary so that the initial objective value is non‑negative.
- For each iteration k:
  a. Sample Gaussian vectors y_k ∈ ℝ^d and z_k ∈ ℝ^m and project them onto the tangent spaces to obtain x_k and w_k.
  b. Compute a_k, b_k, c_k, d_k from the black‑box calls Av_k, Ax_k, V* u_k, V* w_k.
  c. Compute τ_k, ξ_k via the closed‑form formulas (Propositions 2.3 and 2.6).
  d. Update u_{k+1} = (u_k + τ_k w_k)/‖u_k + τ_k w_k‖ and v_{k+1} = (v_k + ξ_k x_k)/‖v_k + ξ_k x_k‖.
  e. Report the current norm estimate |⟨u_k, Av_k⟩ − ⟨V* u_k, v_k⟩|.
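The loop above can be sketched as follows. This is a minimal sketch, not the paper's exact method: in place of the closed‑form joint step sizes of Propositions 2.3 and 2.6, it performs two exact one‑dimensional line searches (first in τ with ξ = 0, then in ξ given τ), which likewise never decreases the objective. All function and variable names are illustrative.

```python
import numpy as np

def unit(x):
    return x / np.linalg.norm(x)

def tangent_direction(x, rng):
    """Unit vector in the tangent sphere at x: project a Gaussian
    sample orthogonally to x, then normalize."""
    g = rng.standard_normal(x.shape)
    g -= (g @ x) * x
    return unit(g)

def mismatch_norm(apply_A, apply_Vt, m, d, iters=2000, seed=1):
    """Estimate ||A - V|| given only black boxes v -> A v and u -> V* u."""
    rng = np.random.default_rng(seed)
    u, v = unit(rng.standard_normal(m)), unit(rng.standard_normal(d))
    if u @ apply_A(v) - apply_Vt(u) @ v < 0:
        u = -u  # sign flip so the initial objective value is non-negative
    for _ in range(iters):
        w = tangent_direction(u, rng)
        x = tangent_direction(v, rng)
        Av, Ax = apply_A(v), apply_A(x)      # two forward calls
        Vtu, Vtw = apply_Vt(u), apply_Vt(w)  # two adjoint calls
        a = u @ Av - Vtu @ v
        b = w @ Av - Vtw @ v
        c = u @ Ax - Vtu @ x
        dd = w @ Ax - Vtw @ x
        if a <= 0:                 # degenerate; occurs with probability zero
            continue
        tau = b / a                          # exact max of (a + t*b)/sqrt(1 + t^2)
        xi = (c + tau * dd) / (a + tau * b)  # then exact max over xi given tau
        u = unit(u + tau * w)
        v = unit(v + xi * x)
    return u @ apply_A(v) - apply_Vt(u) @ v

# Usage on explicit matrices standing in for the black boxes:
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
V = rng.standard_normal((6, 4))
est = mismatch_norm(lambda z: A @ z, lambda z: V.T @ z, m=6, d=4)
# est approximates the largest singular value of A - V
```

Only four vectors (u, v, w, x) plus a few scalars are kept at any time, matching the O(max{m, d}) storage claim.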
Mathematically, the authors prove that the sequence of objective values is non‑decreasing, bounded above by ‖A − V‖, and converges almost surely to the supremum. They also show that the iterates (u_k, v_k) converge to a pair of left and right singular vectors associated with the largest singular value of A − V. The convergence rate depends on a “spectral gap” expressed through the quantities a_k² + c_k² versus b_k² + d_k²; when the gap is large, the algorithm exhibits rapid linear convergence, whereas a small gap leads to slower progress, mirroring behavior of classical power methods.
Experimental results validate the theory. In the special case V = 0, where the method estimates ‖A‖ itself, the proposed method matches or outperforms an earlier stochastic norm‑estimation algorithm.