Generating and Searching Families of FFT Algorithms


A fundamental question of longstanding theoretical interest is to prove the lowest exact count of real additions and multiplications required to compute a power-of-two discrete Fourier transform (DFT). For 35 years the split-radix algorithm held the record by requiring just 4n log n - 6n + 8 arithmetic operations on real numbers for a size-n DFT, and was widely believed to be the best possible. Recent work by Van Buskirk et al. demonstrated improvements to the split-radix operation count by using multiplier coefficients or “twiddle factors” that are not n-th roots of unity for a size-n DFT. This paper presents a Boolean Satisfiability-based proof of the lowest operation count for certain classes of DFT algorithms. First, we present a novel way to choose new yet valid twiddle factors for the nodes in flowgraphs generated by common power-of-two fast Fourier transform algorithms, FFTs. With this new technique, we can generate a large family of FFTs realizable by a fixed flowgraph. This solution space of FFTs is cast as a Boolean Satisfiability problem, and a modern Satisfiability Modulo Theory solver is applied to search for FFTs requiring the fewest arithmetic operations. Surprisingly, we find that there are FFTs requiring fewer operations than the split-radix even when all twiddle factors are n-th roots of unity.

💡 Research Summary

The paper addresses the long‑standing problem of determining the minimal number of real additions and multiplications required to compute a power‑of‑two discrete Fourier transform (DFT). For more than three decades the split‑radix FFT held the record with a cost of 4 n log₂ n − 6 n + 8 real FLOPs, and it was widely believed to be optimal. Recent work by Van Buskirk and Lundy showed that by allowing twiddle factors that are not n‑th roots of unity one can shave about 5.6 % off this count (the so‑called tangent FFT).
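As a quick check of the split-radix formula quoted above, the following Python sketch evaluates 4 n log₂ n − 6 n + 8 for a few power-of-two sizes. The function name `split_radix_flops` is chosen here for illustration and does not come from the paper.

```python
def split_radix_flops(n: int) -> int:
    """Real additions plus multiplications for a size-n split-radix FFT."""
    # The closed form is only meaningful for power-of-two sizes n >= 2.
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    log2_n = n.bit_length() - 1
    return 4 * n * log2_n - 6 * n + 8


for n in (8, 64, 256, 512):
    print(n, split_radix_flops(n))
```

For n = 256 and n = 512 this gives 6664 and 15368, the split-radix baselines that the search results are compared against.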

The authors take a different angle: they keep the restriction that all twiddle factors must be n-th roots of unity, but they explore a much larger family of FFT algorithms that share a common flow-graph structure. Any radix-2, radix-4, split-radix, conjugate-split-radix, or twisted FFT can be represented by the same directed acyclic graph; the algorithms differ only in the twiddle-factor exponents assigned to the graph's nodes. By labeling each node with a triple (stride, base, Wₛ), whose entries remain invariant across the family, they obtain a compact parametric description of the entire solution space.

The key contribution is to cast the search for the lowest-cost algorithm as a Boolean Satisfiability (SAT) problem, extended with the theory of fixed-size bit-vectors so that a Satisfiability Modulo Theories (SMT) solver can handle it. Each twiddle-factor exponent is represented as a bit-vector variable; constraints enforce that the exponent lies in the range 0…n-1 and that the overall arithmetic cost does not exceed a target C. The cost model distinguishes three kinds of multiplication: free (by 1, -1, i, -i; 0 FLOPs), cheap (by √i-type constants, i.e. the 8th roots of unity (±1 ± i)/√2; 4 FLOPs), and generic (by any other n-th root of unity; 6 FLOPs). An additional optimization applies when the left and right twiddle factors of a node are complex conjugates: the pair then costs 6 + 2 = 8 FLOPs instead of 12. Additions are always counted as 2 FLOPs per node.
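The cost model described above can be sketched in plain Python. This is an illustrative reading of the summary, not the paper's encoding: the names `twiddle_cost` and `node_cost` are hypothetical, twiddle factors are written as exponents k of w = exp(-2πi/n), and the conjugate discount is assumed to apply only when both factors are generic 6-FLOP multiplications. In the SMT formulation the same case distinction would be stated as constraints over bit-vector variables rather than evaluated directly.

```python
def twiddle_cost(k: int, n: int) -> int:
    """Real-FLOP cost of multiplying by the n-th root of unity with exponent k."""
    k %= n
    if (4 * k) % n == 0:   # factor is 1, -1, i or -i: no arithmetic needed
        return 0
    if (8 * k) % n == 0:   # factor is (+-1 +- i)/sqrt(2): 2 mults + 2 adds
        return 4
    return 6               # generic complex multiplication


def node_cost(left: int, right: int, n: int) -> int:
    """Cost of one node with the given left and right twiddle exponents."""
    mult = twiddle_cost(left, n) + twiddle_cost(right, n)
    # Assumption: the 6 + 2 discount is taken only for a generic conjugate pair.
    if mult == 12 and (left + right) % n == 0:
        mult = 6 + 2
    return mult + 2        # plus the 2 FLOPs of additions counted per node


print(node_cost(1, 15, 16))  # conjugate pair w^1 and w^15
print(node_cost(1, 3, 16))   # two unrelated generic factors
```

A search procedure would sum `node_cost` over all nodes of the flow-graph and ask whether some valid assignment of exponents keeps the total at or below the target C.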

To make the SMT problem tractable, the authors apply symmetry reduction and graph partitioning. Nodes that share the same (stride, base) are grouped, forcing them to use identical twiddle choices and thereby collapsing many symmetric solutions. The flow‑graph is also split into independent sub‑graphs that can be solved in parallel, dramatically shrinking the search space.
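The grouping step can be illustrated with a small sketch. The node list below is a made-up toy labelling, not a flow-graph from the paper, and `group_by_symmetry` is a hypothetical helper; the point is only that nodes sharing a (stride, base) pair collapse into a single decision variable.

```python
from collections import defaultdict


def group_by_symmetry(nodes):
    """Map each (stride, base) pair to the ids of the nodes that share it."""
    groups = defaultdict(list)
    for node_id, stride, base in nodes:
        groups[(stride, base)].append(node_id)
    return dict(groups)


# Hypothetical toy labelling: six nodes, three distinct (stride, base) pairs.
toy_nodes = [
    (0, 1, 0), (1, 1, 0),
    (2, 2, 0), (3, 2, 0),
    (4, 2, 1), (5, 2, 1),
]
groups = group_by_symmetry(toy_nodes)
print(len(toy_nodes), "nodes ->", len(groups), "twiddle variables")
```

With one twiddle choice per group instead of one per node, the solver has fewer variables to assign, which is the effect the symmetry reduction aims for.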

Using a modern SMT solver (the paper mentions Z3, CVC4, etc.) they exhaustively search the size-256 and size-512 flow-graphs. The minimal FLOP counts found under the imposed constraints are 6616 for n = 256 and 15128 for n = 512. These counts are 48 FLOPs (≈0.7 %) lower than the classic split-radix for n = 256 (6664 → 6616) and 240 FLOPs (≈1.6 %) lower for n = 512 (15368 → 15128). They remain higher than the tangent FFT (6552 and 15048 FLOPs respectively); the gap is attributed to the restriction to unit-modulus twiddle factors, and relaxing this restriction would be expected to recover the previously known lower counts.
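The savings can be re-derived from the operation counts quoted in this summary. The sketch below only does that arithmetic; the dictionary and function names are illustrative.

```python
# FLOP counts as quoted in the summary, keyed by transform size n.
split_radix = {256: 6664, 512: 15368}
sat_search = {256: 6616, 512: 15128}
tangent_fft = {256: 6552, 512: 15048}


def saving(n: int):
    """Absolute and percentage saving of the searched FFT over split-radix."""
    saved = split_radix[n] - sat_search[n]
    return saved, 100.0 * saved / split_radix[n]


for n in (256, 512):
    saved, percent = saving(n)
    gap = sat_search[n] - tangent_fft[n]
    print(f"n={n}: saves {saved} FLOPs ({percent:.1f} %), "
          f"{gap} FLOPs above the tangent FFT")
```

The remaining distance to the tangent FFT is 64 FLOPs at n = 256 and 80 FLOPs at n = 512.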

The authors stress that the primary goal is not to produce a new practical FFT implementation—modern hardware often makes FLOP count less critical—but to demonstrate that a SAT/SMT formulation can rigorously prove lower bounds for a well‑defined class of algorithms. They argue that the same framework can be extended to incorporate non‑unit‑modulus twiddles, hardware‑specific cost models (e.g., accounting for pipeline stalls, memory bandwidth, or fixed‑point quantization), and multi‑objective optimization (balancing FLOPs, accuracy, and energy).

In conclusion, the paper shows that Boolean satisfiability techniques, when combined with a careful algebraic representation of FFT flow‑graphs, provide a powerful tool for exploring vast algorithmic families and establishing provable optimality results. This approach opens a new avenue for FFT research, suggesting that many long‑standing “optimal” algorithms may be superseded not by clever hand‑derived formulas but by systematic, computer‑assisted search within mathematically constrained design spaces.
