Matrix approximation methods have successfully produced efficient, low-complexity approximate transforms for the discrete cosine transform and the discrete Fourier transform (DFT). For the DFT, the literature reports approximations operating at small power-of-two blocklengths, such as {8, 16, 32}, or at large blocklengths, such as 1024, which are obtained by means of Cooley-Tukey-based approximations relying on the small-blocklength approximate transforms. Cooley-Tukey-based approximations inherit the intermediate multiplications by twiddle factors, which are usually not approximated; otherwise, the resulting error propagation would degrade the overall performance of the approximation. In this context, the prime factor algorithm can furnish the necessary framework for deriving fully multiplierless DFT approximations. We introduce an approximation method based on small prime-sized DFT approximations which entirely eliminates intermediate multiplication steps and prevents internal error propagation. To demonstrate the proposed method, we design a fully multiplierless 1023-point DFT approximation based on 3-, 11-, and 31-point DFT approximations. The performance evaluation according to popular metrics showed that the proposed approximations not only presented a significantly lower arithmetic complexity but also resulted in smaller approximation error measurements when compared to competing methods.
The discrete Fourier transform (DFT) is a central tool in signal processing [9], finding applications in a very large number of contexts, such as spectral analysis [74], filtering [64], data compression [69], and fast convolution [11], to cite a few. The widespread usage of the DFT is due to its rich physical interpretation [10] and the existence of efficient methods for its computation [61]. Although the direct computation of the N-point DFT is an operation in O(N^2), which is prohibitively expensive [35], efficient algorithms [10,15,61] collectively known as fast Fourier transforms (FFTs) [6] are capable of evaluating the DFT with far fewer numerical operations, placing the resulting complexity in O(N log N) [15].
Despite such a substantial reduction in complexity, the remaining operations can still be significant in contexts where severe restrictions in computational power and/or energy autonomy are present [48]. Such restrictive conditions arise in the framework of wireless communication [54,85], embedded systems [44,49], and the Internet of Things (IoT) [50,73].
Inspired by the successful methods for approximating the discrete cosine transform [5,7,8,12,19,21,22,33], in [75], a suite of multiplierless DFT approximations was derived for N = 8, 16, and 32 [2,20,46,52,53,76]. These DFT approximations were demonstrated to provide spectral estimates close to the exact DFT computation, while requiring only 26, 54, and 144 additions for real-valued input, respectively [20,75]. Broadly, finding approximate transforms that closely match the performance of the exact ones is a hard task, because it is often posed as an integer non-linear matrix optimization problem with a large number of variables [27]. Thus, as N increases, obtaining good approximations becomes an exceedingly demanding problem to be solved [65]. As a consequence, designers of DFT approximations make use of indirect methods such as (i) mathematical relationships between small-sized and large-sized DFT matrices [15], (ii) matrix functional recursions [62], and (iii) matrix decompositions [70]. The systematic derivation of good DFT approximations for large block sizes is still an open problem and technical advances occur in a case-by-case fashion due to the inherent numerical difficulties of finding integer matrices that ensure competitive performance.
Following such an indirect approach, the 32-point DFT approximation discussed in [20,75] was employed as the fundamental block of the 1024-point DFT approximation introduced in [53]. When employed as a fundamental block to obtain larger transforms, the DFT is referred to as a ground transformation. The methodology described in [53] revisits the Cooley-Tukey algorithm and effectively extends a given 32-point DFT approximation, resulting in a 32^2-point DFT approximation. This extension stems from the fact that the Cooley-Tukey algorithm can be formulated according to a two-dimensional mapping such that the computation of the 1024-point DFT is performed by 2 × 32 instantiations of the 32-point DFT [25]. However, even considering multiplierless 32-point DFT approximations, the resulting 1024-point DFT approximations proposed in [53] are not multiplication-free. Indeed, the Cooley-Tukey-based approximations inherit the twiddle factors present in the exact formulation of the traditional Cooley-Tukey algorithm [61]. Thus, the final arithmetic complexity of the best Cooley-Tukey-based 1024-point DFT approximation in [53] is 2883 real multiplications and 25155 additions, which represents approximately a 72% reduction in real multiplications and an 18% reduction in additions when compared to the exact Cooley-Tukey algorithm [6].
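The two-dimensional Cooley-Tukey mapping described above can be illustrated with a minimal sketch. It is not the construction of [53]: it assumes a toy blocklength of 16 = 4 × 4 instead of 1024 = 32 × 32, and it uses an exact direct DFT as the ground transformation rather than an approximate one. The names `dft`, `cooley_tukey_2d`, and `use_twiddles` are hypothetical and chosen only for illustration. The sketch shows that the transform is assembled from 2 × 4 small DFTs and that the intermediate twiddle-factor multiplications cannot simply be dropped.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT; plays the role of the small ground transformation."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def cooley_tukey_2d(x, n1, n2, use_twiddles=True):
    """Length n1*n2 DFT via the two-dimensional Cooley-Tukey mapping.

    Input index:  n = n2*a + b,   a in [0, n1), b in [0, n2)
    Output index: k = k1 + n1*k2, k1 in [0, n1), k2 in [0, n2)
    """
    n = n1 * n2
    assert len(x) == n

    # Step 1: n2 DFTs of length n1, one per residue b of the input index.
    inner = [dft([x[n2 * a + b] for a in range(n1)]) for b in range(n2)]

    # Step 2: intermediate multiplications by the twiddle factors W_N^(b*k1).
    # Setting use_twiddles=False skips them (assumption: for illustration only).
    if use_twiddles:
        inner = [
            [inner[b][k1] * cmath.exp(-2j * cmath.pi * b * k1 / n)
             for k1 in range(n1)]
            for b in range(n2)
        ]

    # Step 3: n1 DFTs of length n2, one per output residue k1.
    out = [0j] * n
    for k1 in range(n1):
        outer = dft([inner[b][k1] for b in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = outer[k2]
    return out


def max_abs_error(u, v):
    """Largest entrywise magnitude of the difference between two sequences."""
    return max(abs(p - q) for p, q in zip(u, v))


# Arbitrary deterministic complex test signal of length 16.
x = [complex((3 * i) % 7, (5 * i) % 4) for i in range(16)]
reference = dft(x)

err_with = max_abs_error(cooley_tukey_2d(x, 4, 4), reference)
err_without = max_abs_error(cooley_tukey_2d(x, 4, 4, use_twiddles=False), reference)
```

With the twiddle factors in place, `err_with` stays at the level of floating-point rounding; without them, `err_without` is of the order of the signal itself. This is the reason a Cooley-Tukey-based approximation built from multiplierless ground transformations still carries multiplications in the intermediate step.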
The goal of the present paper is to propose a framework for deriving large DFT approximations that are fully multiplierless. Although, in fixed-point arithmetic, any multiplication can theoretically be expressed as a sum of dyadic terms, we adopt the term multiplierless in a more restrictive and practical sense, consistent with [12], where the minimum number of adders is sought. In this work, matrix elements assume values in {0, ±1, ±1/2}, and any scaling constants have their dyadic representation limited to at most two additions. This multiplierlessness criterion emphasizes that the proposed approximations rely only on additions and bit-shifting operations, aiming at the minimum number of adders, so that future implementations can achieve reductions in chip area, power consumption, and delay [2]. To this end, we aim at exploiting the prime factor algorithm (PFA) [31,78], also known as the Good-Thomas algorithm. The PFA has distinct number-theoretical properties capable of performing the DFT computation without intermediate computations such as twiddle factors, on which the traditional radix-2 algorithm heavily relies. This approach allows the construction of scalable, multiplierless DFT approximations, significantly reducing arithmetic complexity whi