Improved Sparse Recovery for Approximate Matrix Multiplication


We present a simple randomized algorithm for approximate matrix multiplication (AMM) whose error scales with the output norm $|AB|_F$. Given any $n\times n$ matrices $A,B$ and a runtime parameter $r\leq n$, the algorithm produces in $O(n^2(r+\log n))$ time a matrix $C$ with total squared error $\mathbb{E}[|C-AB|_F^2]\le (1-\frac{r}{n})|AB|_F^2$, per-entry variance $|AB|_F^2/n^2$, and bias $\mathbb{E}[C]=\frac{r}{n}AB$. Alternatively, the algorithm can compute an unbiased estimate with expected total squared error $\frac{n}{r}|AB|_F^2$, recovering the state-of-the-art AMM error obtained by Pagh's TensorSketch algorithm (Pagh, 2013) while being a logarithmic factor faster. The key insight in the algorithm is a new variation of pseudo-random rotation of the input matrices (a Fast Hadamard Transform with asymmetric diagonal scaling), which redistributes the Frobenius norm of the output $AB$ uniformly across its entries.


💡 Research Summary

The paper introduces a simple randomized algorithm for approximate matrix multiplication (AMM) whose error scales with the Frobenius norm of the output matrix AB. The authors observe that existing AMM techniques compress the input matrices (via low‑rank projections or sampling) before multiplying, which can be problematic in data‑intensive settings such as large‑scale deep learning where preserving the full parameter space is desirable. To avoid compression, they propose a “pseudo‑random rotation” of the input matrices that uniformly spreads the energy of the product across all entries, enabling a sparse reconstruction of the result.

The core transformation is defined as W_{α,β}(X) = H Diag(α) X Diag(β) H, where H is the Walsh‑Hadamard matrix (implemented via the Fast Hadamard Transform) and α, β ∈ {±1}ⁿ are independent random sign vectors. This operator is unitary, preserving the Frobenius norm, and satisfies the distributive property W_{α,β}(AB) = W_{α,γ}(A)·W_{γ,β}(B) for any third sign vector γ. The algorithm proceeds as follows:
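The two stated properties of W can be checked numerically. The sketch below (an illustration, not the paper's code) builds a normalized Walsh‑Hadamard matrix directly, so that H is orthogonal and H² = I; the paper evaluates the same products with the Fast Hadamard Transform instead of a dense matrix:

```python
import numpy as np

def hadamard(n):
    """Normalized Walsh-Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def W(X, alpha, beta):
    """Pseudo-random rotation W_{alpha,beta}(X) = H Diag(alpha) X Diag(beta) H."""
    H = hadamard(X.shape[0])
    # Diag(alpha) X Diag(beta) is just row- and column-wise sign flipping.
    return H @ (alpha[:, None] * X * beta[None, :]) @ H
```

Since H and the sign diagonals are orthogonal, W preserves the Frobenius norm, and because H² = I and Diag(γ)² = I, the inner factors cancel in W_{α,γ}(A)·W_{γ,β}(B), giving the distributive property.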

  1. Sample three independent random sign vectors α, β, γ.
  2. Compute the rotated matrices A′ = W_{α,γ}(A) and B′ = W_{γ,β}(B). Each rotation costs O(n² log n) time because vec(H X H) = (H ⊗ H) vec(X) can be evaluated with the Fast Hadamard Transform, and diagonal scaling costs O(n²).
  3. Choose a set F of r·n indices uniformly at random from the n² possible output positions. For each (i,j) ∈ F, compute the exact entry (A′ B′)_{i,j} and store it in a sparse matrix C′; all other entries remain zero.
  4. Apply the inverse rotation C = W^{-1}_{α,β}(C′) to obtain the final approximation.

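The four steps above can be sketched as follows. This is an illustrative implementation (function and variable names are my own): it uses a dense Hadamard matrix for clarity where the paper uses the Fast Hadamard Transform, so the rotations here cost O(n³) rather than O(n² log n):

```python
import numpy as np

def hadamard(n):
    """Normalized Walsh-Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def approx_matmul(A, B, r, rng):
    """Biased AMM estimate with E[C] = (r/n) * A @ B."""
    n = A.shape[0]
    H = hadamard(n)
    # Step 1: three independent random sign vectors.
    alpha, beta, gamma = (rng.choice([-1.0, 1.0], size=n) for _ in range(3))
    # Step 2: rotate the inputs, A' = W_{alpha,gamma}(A), B' = W_{gamma,beta}(B).
    A_rot = H @ (alpha[:, None] * A * gamma[None, :]) @ H
    B_rot = H @ (gamma[:, None] * B * beta[None, :]) @ H
    # Step 3: choose r*n of the n^2 output positions uniformly at random and
    # compute those entries of A' B' exactly, O(n) per entry => O(r n^2) total.
    chosen = rng.choice(n * n, size=r * n, replace=False)
    C_sparse = np.zeros((n, n))
    for idx in chosen:
        i, j = divmod(idx, n)
        C_sparse[i, j] = A_rot[i, :] @ B_rot[:, j]
    # Step 4: invert the rotation: W^{-1}_{alpha,beta}(Y) = Diag(alpha) H Y H Diag(beta).
    return alpha[:, None] * (H @ C_sparse @ H) * beta[None, :]
```

A sanity check: with r = n every output position is sampled, so the sparse matrix equals A′B′ exactly and, by the distributive property, the inverse rotation recovers AB.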
The total runtime is O(n² (r + log n)), a logarithmic factor faster than Pagh's TensorSketch algorithm (Pagh 2013), the previous state of the art. The authors analyze two variants:

  • Biased estimator (as described above): each entry has expectation E[C_{i,j}] = (r/n)(AB)_{i,j}, per‑entry variance |AB|_F²/n², and total squared error E[|C − AB|_F²] ≤ (1 − r/n)|AB|_F².
  • Unbiased estimator: rescaling the sampled entries by n/r yields E[C] = AB, with expected total squared error (n/r)|AB|_F², matching the TensorSketch error bound.
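In code, the unbiased variant is just a rescaling of the biased estimate (a sketch with hypothetical names; `C_biased` stands for the output of the sampling procedure described above):

```python
import numpy as np

def debias(C_biased, n, r):
    """Rescale a biased AMM estimate with E[C] = (r/n) * AB so its expectation is AB."""
    return (n / r) * C_biased
```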
