Exploiting the Structure in Tensor Decompositions for Matrix Multiplication
We present a new algorithm for fast matrix multiplication based on tensor decompositions that have special structural features. Thanks to these features we obtain exponents lower than what the rank of the tensor decomposition suggests. In particular, for $6\times 6$ matrix multiplication we reduce the exponent of the recent algorithm by Moosbauer and Poole from $2.8075$ to $2.8016$, while retaining a reasonable leading coefficient.
💡 Research Summary
The paper introduces a novel recursive matrix‑multiplication algorithm that exploits structural properties of tensor decompositions to obtain a lower exponent than the rank of the underlying decomposition would suggest. The authors focus on exact algorithms rather than approximate ones, aiming for practical relevance while still improving the asymptotic exponent ω.
The starting point is the well‑known connection between a tensor restriction ⟨n,m,p⟩ ≤ ⟨r⟩ (or a rank‑r decomposition) and a recursive algorithm that multiplies N×N matrices in O(N^ω) time with ω = log_n(r). Classical approaches, such as Strassen’s algorithm, treat each recursive call independently, counting only the number of scalar multiplications. The key insight of this work is that many recursive calls often share one of the input blocks or produce an output block that is used in several positions. When such sharing occurs, the calls can be merged into a single multiplication of larger sub‑matrices, effectively increasing the depth of recursion without increasing the total number of scalar multiplications. This “common input/output sharing” reduces the leading constant and, more importantly, the exponent.
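The rank–exponent relation above can be made concrete with a small Python sketch (the function name is ours): a rank-r decomposition of the n×n multiplication tensor yields a recursive algorithm with exponent log_n(r).

```python
import math

def implied_exponent(n: int, r: int) -> float:
    """Exponent of the recursive algorithm obtained from a rank-r
    decomposition of <n,n,n>: N x N matrices are multiplied in
    O(N^omega) time with omega = log_n(r)."""
    return math.log(r, n)

# Strassen: <2,2,2> with 7 scalar multiplications
print(f"{implied_exponent(2, 7):.4f}")    # 2.8074
# Moosbauer-Poole: <6,6,6> with 153 scalar multiplications
print(f"{implied_exponent(6, 153):.4f}")  # 2.8075
```

Note that the two base cases give nearly identical exponents, which is why the structural savings described next, rather than the raw rank, drive the improvement.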
To demonstrate the technique, the authors revisit the 6×6 matrix‑multiplication tensor ⟨6,6,6⟩ studied by Moosbauer and Poole, which admits a decomposition into 153 rank‑one tensors. By analysing the pattern of these tensors they derive a restriction of the form
⟨6,6,6⟩ ≤ 137·⟨1⟩ ⊕ 8·⟨1,1,2⟩.
The term ⟨1,1,2⟩ can be computed with two scalar multiplications, while ⟨2,2,2⟩ (which arises when the cyclic permutations of ⟨1,1,2⟩ are combined under the Kronecker product) can be handled by Strassen’s 7‑multiplication algorithm. Taking the third Kronecker power of the restriction yields
⟨216,216,216⟩ ≤ ⟨3 581 065⟩,
which, via the standard Asymptotic Sum Inequality (ASI), gives an exponent ω₀ ≈ 2.80751—only a slight improvement over the original 2.80754. The breakthrough comes from applying the same structural idea recursively: instead of using Strassen’s algorithm for the ⟨2,2,2⟩ sub‑problems, the authors replace it with the very same 6×6 algorithm they are constructing. This self‑similar recursion deepens the call tree, leading to a further reduced exponent ω₀ ≈ 2.80496. By taking higher tensor powers and repeatedly merging shared sub‑computations, they finally achieve ω₀ = 2.8016 for the 6×6 base case, surpassing Strassen’s 2.8073 and the previous best 2.8075.
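The ⟨2,2,2⟩ building block invoked above is Strassen's classical seven-multiplication scheme, which can be sketched as follows (one recursion level, checked against the naive definition):

```python
def strassen_2x2(A, B):
    """One level of Strassen's <2,2,2> scheme: 7 multiplications
    instead of 8. Entries may be scalars or, recursively, blocks."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4, m1 - m2 + m3 + m6))

# Sanity check against the schoolbook product
A = ((1, 2), (3, 4))
B = ((5, 6), (7, 8))
assert strassen_2x2(A, B) == ((1*5 + 2*7, 1*6 + 2*8),
                              (3*5 + 4*7, 3*6 + 4*8))
```

The self-similar recursion described in the paper replaces calls to this scheme with the 6×6 algorithm under construction; the scheme itself is unchanged.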
Beyond the exponent, the paper addresses the leading constant (the number of additions and scalar multiplications). After obtaining a decomposition that works modulo 2, the authors lift it to the integers using Hensel lifting. They then minimize the addition count through a three‑stage pipeline: (1) a flip‑graph search to find minimal‑multiplication decompositions, (2) application of DeGroote actions to optimize linear combination coefficients, and (3) common‑subexpression elimination via the Mårtensson‑Wagner method. This pipeline yields an addition count A(φ) that remains modest, ensuring that the overall operation count is competitive.
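The Mårtensson–Wagner pass itself is not reproduced here, but the underlying idea of common-subexpression elimination can be illustrated with a minimal greedy sketch (all names and the toy expressions are ours): a pair of terms shared by several linear combinations is computed once and reused, lowering the addition count.

```python
from collections import Counter
from itertools import combinations

def addition_count(exprs):
    """Additions needed to evaluate each sum term by term."""
    return sum(len(e) - 1 for e in exprs)

def eliminate_best_pair(exprs):
    """Greedily factor out the most frequently shared pair of terms,
    replacing it with a fresh intermediate symbol."""
    pair_counts = Counter()
    for e in exprs:
        for p in combinations(sorted(e), 2):
            pair_counts[p] += 1
    if not pair_counts:
        return exprs, None
    (x, y), freq = pair_counts.most_common(1)[0]
    if freq < 2:
        return exprs, None  # nothing shared, no saving possible
    t = f"({x}+{y})"
    new_exprs = []
    for e in exprs:
        if x in e and y in e:
            new_exprs.append([t] + [s for s in e if s not in (x, y)])
        else:
            new_exprs.append(list(e))
    return new_exprs, t

# Toy left-hand linear combinations sharing matrix entries
exprs = [["a11", "a22"], ["a11", "a12"],
         ["a21", "a22"], ["a11", "a22", "a12"]]
before = addition_count(exprs)                       # 5 additions
new_exprs, t = eliminate_best_pair(exprs)
after = addition_count(new_exprs) + 1                # +1 to compute t once
print(before, after)                                 # 5 4
```

Real CSE passes iterate this kind of step and search over many candidate rewrites; the one-shot greedy version only shows where the savings come from.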
The authors also apply the same methodology to smaller base cases. For 3×3 multiplication they improve Laderman’s exponent from 2.854 to 2.836, and similar gains are reported for other small dimensions. These results illustrate that the structural exploitation is not limited to large tensors; it can be beneficial even for modest matrix sizes.
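For reference, the 2.854 baseline corresponds to Laderman's classical 3×3 algorithm, which uses 23 scalar multiplications:

```python
import math

# Laderman (1976): 3x3 matrix multiplication with 23 scalar
# multiplications, giving the exponent log_3(23)
print(f"{math.log(23, 3):.3f}")  # 2.854
```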
Importantly, the work bridges a gap that has existed in the literature: while the ASI has been a powerful tool for proving ever‑smaller asymptotic exponents, it has rarely been used to construct concrete, exact algorithms with reasonable constants. This paper provides an explicit construction, showing that the ASI can be turned into a practical algorithmic framework without resorting to approximation or excessively large base cases.
Experimental evaluation indicates that for matrix dimensions n ≥ 1000 the new algorithm requires fewer total operations than Strassen’s method. Although the constant factors mean that a speed‑up may not be observable immediately in real implementations, the algorithm lies well within the “practical” regime defined by Pan (who suggested that algorithms become practically relevant for n up to about 10²⁰). Consequently, the paper offers a significant step toward algorithms that are both theoretically superior and potentially implementable for large‑scale scientific computing.
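The kind of operation-count comparison reported here can be reproduced for Strassen's algorithm itself. The sketch below (our own, not the paper's count) assumes full recursion down to scalars and 18 block additions per step; under those assumptions, Strassen's total count dips below the schoolbook count around n = 1024, the same order of magnitude as the n ≥ 1000 threshold quoted above.

```python
def classical_ops(n: int) -> int:
    """Multiplications plus additions of the schoolbook method."""
    return 2 * n**3 - n**2

def strassen_ops(n: int) -> int:
    """Total operations of fully recursive Strassen for n a power of
    two: 7 recursive products plus 18 additions of (n/2) x (n/2) blocks."""
    if n == 1:
        return 1
    return 7 * strassen_ops(n // 2) + 18 * (n // 2) ** 2

# Crossover of the raw operation counts
assert strassen_ops(512) > classical_ops(512)
assert strassen_ops(1024) < classical_ops(1024)
```

In practice the crossover of wall-clock time sits elsewhere, since memory traffic and the cost of the extra additions per level dominate at small sizes; the counts above only compare arithmetic operations.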