Multiplierless 16-point DCT Approximation for Low-complexity Image and Video Coding

T. L. T. Silveira∗, R. S. Oliveira†, F. M. Bayer‡, R. J. Cintra§, A. Madanayake¶

Abstract

An orthogonal 16-point approximate discrete cosine transform (DCT) is introduced. The proposed transform requires neither multiplications nor bit-shifting operations. A fast algorithm based on matrix factorization is introduced, requiring only 44 additions, the lowest arithmetic cost in the literature. To assess the introduced transform, computational complexity, similarity with the exact DCT, and coding performance measures are computed. Classical and state-of-the-art 16-point low-complexity transforms were used in a comparative analysis. In the context of image compression, the proposed approximation was evaluated via PSNR and SSIM measurements, attaining the best cost-benefit ratio among the competitors. For video encoding, the proposed approximation was embedded into a HEVC reference software for direct comparison with the original HEVC standard. Physically realized and tested using FPGA hardware, the proposed transform showed 35% and 37% improvements of area-time and area-time-squared VLSI metrics when compared to the best competing transform in the literature.

Keywords: DCT approximation, Fast algorithm, Low cost algorithms, Image compression, Video coding

1 Introduction

The discrete cosine transform (DCT) [1, 2] is a fundamental building block for several image and video processing applications. In fact, the DCT closely approximates the Karhunen-Loève transform (KLT) [1], which is capable of optimal data decorrelation and energy compaction of first-order stationary Markov signals [1]. This class of signals is particularly appropriate for the modeling of natural images [1, 3].
Thus, the DCT finds applications in several contemporary image and video compression standards, such as JPEG [4] and the H.26x family of codecs [5–7]. Several fast algorithms for computing the exact DCT have been proposed [8–15]. However, these methods require arithmetic multipliers [16, 17], which are demanding in time, power, and hardware when compared to additions or bit-shifting operations [18]. This may hinder the application of the DCT in contexts of very low power consumption [19, 20]. To overcome this problem, several approximate DCT methods have been proposed in recent years. Such approximations do not compute the exact DCT, but are capable of providing energy compaction [21, 22] at a very low computational cost.

∗ T. L. T. Silveira is with the Programa de Pós-Graduação em Computação, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil.
† R. S. Oliveira is with the Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco (UFPE); Programa de Graduação em Estatística (UFPE), Brazil; and the Department of Electrical and Computer Engineering, University of Akron, OH.
‡ F. M. Bayer is with the Departamento de Estatística, UFSM, and LACESM, Santa Maria, RS, Brazil. E-mail: bayer@ufsm.br
§ R. J. Cintra is with the Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco. E-mail: rjdsc@stat.ufpe.org
¶ A. Madanayake is with the Department of Electrical and Computer Engineering, University of Akron, OH. E-mail: arjuna@uakron.edu
In particular, the 8-point DCT has been given a number of approximations: the signed DCT [17], the level 1 approximation [16], the Bouguezel-Ahmad-Swamy (BAS) transforms [21, 23–26], the rounded DCT (RDCT) [27], the modified RDCT [28], the approximation in [29], and the improved DCT approximation introduced in [30]. These methods furnish meaningful DCT approximations using only addition and bit-shifting operations, while offering sufficient computational accuracy for image and video processing [31]. Recently, with the growing need for higher compression rates [30], the high efficiency video coding (HEVC) standard was proposed [32, 33]. Unlike several earlier image and video compression standards, HEVC employs 4-, 8-, 16-, and 32-point integer DCT-based transformations [30, 32]. In contrast to the 8-point case, for which dozens of approximations are available [21, 25, 27, 28, 30, 34], 16-point DCT approximation methods are much less explored in the literature. To the best of our knowledge, only the following orthogonal methods are available: the traditional Walsh–Hadamard transform (WHT) [35], the BAS-2010 [24] and BAS-2013 [26] approximations, and the transformations proposed in [31], [22], and [36]. In this work, we aim at proposing a low-complexity orthogonal 16-point DCT approximation capable of outperforming all competing methods in terms of arithmetic complexity while exhibiting coding performance very close to that of state-of-the-art methods. To this end, we propose a transformation matrix that combines instantiations of a low-complexity 8-point approximation according to a divide-and-conquer approach.

The remainder of this paper is organized as follows. Section 2 introduces the new DCT approximation, a fast algorithm based on matrix factorization, and a comprehensive assessment in terms of computational complexity and several performance metrics.
In Section 3, the proposed approximation is submitted to computational simulations consisting of a JPEG-like scheme for still image compression and the embedding of the proposed approximation into the HEVC reference software. Section 4 assesses the proposed transform in a hardware realization based on a field-programmable gate array (FPGA). Conclusions are drawn in Section 5.

2 16-point DCT approximation

2.1 Definition

It is well known that several fast algorithm structures compute the $N$-point DCT through recursive computations of the $(N/2)$-point DCT [1, 2, 13, 31, 36]. Following an approach similar to that adopted in [31, 36], we propose a new 16-point approximate DCT by combining two instantiations of the 8-point DCT approximation introduced in [28] with tailored sign changes and permutations. This procedure is described by the signal-flow graph in Fig. 1. This particular 8-point DCT approximation, denoted $\mathbf{T}_8$ in Fig. 1, was selected because (i) it presents the lowest computational cost among the approximations archived in the literature (zero multiplications, 14 additions, and zero bit-shifting operations) [28] and (ii) it offers good energy compaction properties [37].

[Figure 1: signal-flow graph in which the inputs $x_0, \dots, x_{15}$ feed two $\mathbf{T}_8$ blocks whose outputs, after sign changes and permutation, give $X_0, \dots, X_{15}$.]

Figure 1: Signal-flow graph of the fast algorithm for $\mathbf{T}$. The input data $x_i$, $i = 0, 1, \dots, 15$, relate to the output data $X_j$, $j = 0, 1, \dots, 15$, according to $\mathbf{X} = \mathbf{T} \cdot \mathbf{x}$. Dashed arrows represent multiplications by $-1$.
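To make the flow of Fig. 1 concrete, the Python sketch below evaluates $\mathbf{T}\cdot\mathbf{x}$ with an input butterfly followed by the same 14-addition 8-point structure on each half, then applies sign changes and a final reordering. This is our own reading of the graph and of the factorization in Section 2.2, not the authors' code: the helper names (`AddCounter`, `butterfly`, `half_core`, `fast_T`) and the ordering table `OUT_ORDER` are hypothetical, and the stage layout was checked only against the matrix $\mathbf{T}$ given in this section. Every two-input addition or subtraction is counted, while sign changes are treated as free.

```python
import random

# Rows of T (Section 2.1) encoded as strings: '+' = 1, '-' = -1, '0' = 0.
T_ROWS = [
    "++++++++++++++++", "++++++++--------", "+000000--000000+", "++0000--++0000--",
    "+00--00++00--00+", "++----++--++++--", "00-00+0000+00-00", "000000-+-+000000",
    "+--++--++--++--+", "00-+00000000-+00", "0-0000+00+0000-0", "00++--0000++--00",
    "0-+00+-00-+00+-0", "+-000000000000+-", "000-+000000+-000", "0000-+0000-+0000",
]
T = [[{"+": 1, "-": -1, "0": 0}[c] for c in row] for row in T_ROWS]

# Final reordering (role of P2): internal slot k holds output X[OUT_ORDER[k]].
OUT_ORDER = [0, 8, 12, 4, 14, 6, 10, 2, 1, 5, 11, 3, 13, 9, 15, 7]


class AddCounter:
    """Performs two-input additions/subtractions and counts them; negation is free."""

    def __init__(self):
        self.count = 0

    def add(self, a, b):
        self.count += 1
        return a + b

    def sub(self, a, b):
        self.count += 1
        return a - b


def butterfly(s, ac):
    """Apply [[I, J], [J, -I]] (J = counter-identity) to a list of even length n: n additions."""
    h = len(s) // 2
    top = [ac.add(s[i], s[-1 - i]) for i in range(h)]
    bottom = [ac.sub(s[h - 1 - i], s[h + i]) for i in range(h)]
    return top + bottom


def half_core(s, ac):
    """8-point stage shared by both halves: 8 + 4 additions plus a free sign change."""
    ab = butterfly(s, ac)        # 8-point butterfly
    c = butterfly(ab[:4], ac)    # 4-point butterfly on the sums
    d = [-t for t in ab[4:]]     # -I_4 on the differences
    return c, d


def fast_T(x):
    """Return (T @ x, number of additions) using a 44-addition flow."""
    ac = AddCounter()
    m1 = butterfly(list(x), ac)                            # input butterfly: 16 additions
    u, v = m1[:8], m1[8:]
    o = [v[0], v[3], v[4], v[7], v[6], v[5], v[2], v[1]]   # reorder odd half (role of P1)
    c, d = half_core(u, ac)                                # even half: 12 additions
    f, g = half_core(o, ac)                                # odd half: 12 additions
    z = [ac.add(c[0], c[1]), ac.sub(c[0], c[1]), -c[2], c[3], d[0], d[1], d[2], -d[3],
         ac.add(f[0], f[1]), ac.sub(f[0], f[1]), -f[2], -f[3], -g[0], -g[1], g[2], -g[3]]
    X = [0] * 16
    for k, row in enumerate(OUT_ORDER):
        X[row] = z[k]
    return X, ac.count


def matvec(M, x):
    """Direct matrix-vector product, used only as a reference."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


x = [random.randint(-128, 127) for _ in range(16)]
X, adds = fast_T(x)
assert X == matvec(T, x) and adds == 44
```

The last lines compare the fast path with the direct product. Computed directly, each row of $\mathbf{T}$ costs one addition fewer than its number of nonzero entries, which sums to the 112 additions mentioned in Section 2.2.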
As a result, the proposed transformation matrix is given by:

$$
\mathbf{T} = \left[\begin{array}{rrrrrrrrrrrrrrrr}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & -1 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & -1 & -1 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 & 1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 \\
0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 & 0 & 0 \\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0 & 0 & -1 & 1 & 0 & 0 & 1 & -1 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0
\end{array}\right].
$$

The entries of the resulting transformation matrix are defined over $\{0, \pm 1\}$; therefore, it is completely multiplierless. The above transformation can be orthogonalized according to the procedure described in [3, 27, 38]. Thus, the associated orthogonal DCT approximation is furnished by $\hat{\mathbf{C}} = \mathbf{S} \cdot \mathbf{T}$, where $\mathbf{S} = \sqrt{(\mathbf{T} \cdot \mathbf{T}^\top)^{-1}}$ and the superscript $\top$ denotes matrix transposition. Because $\mathbf{T} \cdot \mathbf{T}^\top$ is diagonal, $\mathbf{S}$ is also diagonal, with entries equal to the reciprocals of the Euclidean norms of the rows of $\mathbf{T}$. In particular, we have:

$$
\mathbf{S} = \frac{1}{4} \cdot \operatorname{diag}\left(1, 1, 2, \sqrt{2}, \sqrt{2}, 1, 2, 2, 1, 2, 2, \sqrt{2}, \sqrt{2}, 2, 2, 2\right).
$$

In the context of image and video coding, the diagonal matrix $\mathbf{S}$ does not contribute to the computational cost of $\hat{\mathbf{C}}$, because it can be merged into the codec quantization step [22, 25, 27, 31]. Therefore, the actual computational cost of the approximation is fully confined to the low-complexity matrix $\mathbf{T}$.

Table 1: Comparison of computational complexities

Transform            Mult   Add   Shifts   Total
Chen DCT               44    74        0     118
WHT                     0    64        0      64
BAS-2010                0    64        8      72
BAS-2013                0    64        0      64
Transform in [22]       0    72        0      72
Transform in [31]       0    60        0      60
Transform in [36]       0    60        0      60
Proposed approx.        0    44        0      44

2.2 Fast algorithm and computational complexity

The transformation $\mathbf{T}$ requires 112 additions if computed directly. However, it admits the following sparse matrix factorization:

$$
\mathbf{T} = \mathbf{P}_2 \cdot \mathbf{M}_4 \cdot \mathbf{M}_3 \cdot \mathbf{M}_2 \cdot \mathbf{P}_1 \cdot \mathbf{M}_1,
$$

where

$$
\mathbf{M}_1 = \begin{bmatrix} \mathbf{I}_8 & \bar{\mathbf{I}}_8 \\ \bar{\mathbf{I}}_8 & -\mathbf{I}_8 \end{bmatrix}, \qquad
\mathbf{M}_2 = \operatorname{diag}\left( \begin{bmatrix} \mathbf{I}_4 & \bar{\mathbf{I}}_4 \\ \bar{\mathbf{I}}_4 & -\mathbf{I}_4 \end{bmatrix}, \begin{bmatrix} \mathbf{I}_4 & \bar{\mathbf{I}}_4 \\ \bar{\mathbf{I}}_4 & -\mathbf{I}_4 \end{bmatrix} \right),
$$

$$
\mathbf{M}_3 = \operatorname{diag}\left( \begin{bmatrix} \mathbf{I}_2 & \bar{\mathbf{I}}_2 \\ \bar{\mathbf{I}}_2 & -\mathbf{I}_2 \end{bmatrix}, -\mathbf{I}_4, \begin{bmatrix} \mathbf{I}_2 & \bar{\mathbf{I}}_2 \\ \bar{\mathbf{I}}_2 & -\mathbf{I}_2 \end{bmatrix}, -\mathbf{I}_4 \right),
$$

$$
\mathbf{M}_4 = \operatorname{diag}\left( \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \mathbf{I}_4, \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & -1 \end{bmatrix}, -\mathbf{I}_4, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \right),
$$

matrices $\mathbf{P}_1$ and $\mathbf{P}_2$ correspond to the permutations (1)(2)(3)(4)(5)(6)(7)(8)(9)(10 12 16 10)(11 13 15 11)(14) and (1)(2 9)(3 8 16 15 5 4 12 11 7 6 10 14 13 3) in cyclic notation [39], respectively; and $\mathbf{I}_N$ and $\bar{\mathbf{I}}_N$ denote the identity and counter-identity matrices of order $N$, respectively.

The above factorization reduces the computational cost of $\mathbf{T}$ to only 44 additions: $\mathbf{M}_1$ and $\mathbf{M}_2$ require 16 additions each, $\mathbf{M}_3$ requires 8, and $\mathbf{M}_4$ requires 4, while the sign changes and the permutations $\mathbf{P}_1$ and $\mathbf{P}_2$ cost nothing. Fig. 1 depicts the signal-flow graph of the fast algorithm for $\mathbf{T}$; the blocks labeled $\mathbf{T}_8$ denote the selected 8-point approximate DCT [28].

A computational complexity comparison of the considered orthogonal 16-point DCT approximations is summarized in Table 1. For contrast, we also include the computational cost of the Chen DCT fast algorithm [8]. The proposed approximation requires neither multiplications nor bit-shifting operations. Furthermore, when compared to the methods in [31, 36], the WHT or BAS-2013, and the transformation in [22], the proposed approximation requires 26.67%, 31.25%, and 38.89% fewer arithmetic operations, respectively.

2.3 Performance assessment

We separate similarity and coding performance measures to assess the proposed transformation. For similarity measures, we considered the DCT distortion ($d_2$) [40], the total error energy ($\epsilon$) [27], and the mean square error (MSE) [1, 2].
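The orthogonalization step and the similarity measures can be checked numerically. The sketch below builds $\hat{\mathbf{C}} = \mathbf{S}\cdot\mathbf{T}$ and evaluates $d_2$, $\epsilon$, and MSE against the exact orthonormal DCT-II. The formulas are assumptions on our part, written in the forms commonly used with [40], [27], and [1] (the MSE uses a first-order Markov covariance with $\rho = 0.95$); the function names are hypothetical, and we have not confirmed that they reproduce the values of Table 2 digit for digit.

```python
import math

# Rows of T (Section 2.1): '+' = 1, '-' = -1, '0' = 0.
T_ROWS = [
    "++++++++++++++++", "++++++++--------", "+000000--000000+", "++0000--++0000--",
    "+00--00++00--00+", "++----++--++++--", "00-00+0000+00-00", "000000-+-+000000",
    "+--++--++--++--+", "00-+00000000-+00", "0-0000+00+0000-0", "00++--0000++--00",
    "0-+00+-00-+00+-0", "+-000000000000+-", "000-+000000+-000", "0000-+0000-+0000",
]
T = [[{"+": 1, "-": -1, "0": 0}[c] for c in row] for row in T_ROWS]
N = len(T)


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


# Exact orthonormal DCT-II matrix C.
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
     for k in range(N)]

# T T^T is diagonal, so S = sqrt((T T^T)^-1) simply rescales each row of T.
TTt = matmul(T, transpose(T))
S_diag = [1 / math.sqrt(TTt[k][k]) for k in range(N)]
C_hat = [[S_diag[k] * t for t in T[k]] for k in range(N)]


def dct_distortion(A):
    """Assumed form of d2 [40]: 1 - ||diag(C A^T)||^2 / ||C A^T||_F^2."""
    M = matmul(C, transpose(A))
    diag = sum(M[k][k] ** 2 for k in range(N))
    total = sum(v ** 2 for row in M for v in row)
    return 1 - diag / total


def total_error_energy(A):
    """Assumed form of epsilon [27]: pi * ||C - A||_F^2 (Parseval form)."""
    return math.pi * sum((c - a) ** 2 for rc, ra in zip(C, A) for c, a in zip(rc, ra))


def mse(A, rho=0.95):
    """Assumed form of MSE [1]: tr((C - A) R (C - A)^T) / N with R_ij = rho^|i-j|."""
    R = [[rho ** abs(i - j) for j in range(N)] for i in range(N)]
    D = [[c - a for c, a in zip(rc, ra)] for rc, ra in zip(C, A)]
    P = matmul(matmul(D, R), transpose(D))
    return sum(P[k][k] for k in range(N)) / N


print(dct_distortion(C_hat), total_error_energy(C_hat), mse(C_hat))
```

Because $\mathbf{T}\cdot\mathbf{T}^\top$ is diagonal, the code never needs a general matrix inverse or square root: $\mathbf{S}$ reduces to dividing each row of $\mathbf{T}$ by its norm, which is also why $\mathbf{S}$ can be folded into a quantization step.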
For coding performance evaluation, we selected the transform coding gain ($C_g$) [1] and the transform efficiency ($\eta$) [1]. Table 2 compares the performance measure values for the discussed transforms.

Table 2: Coding and similarity performance assessment

Transform            d2      ε        MSE     Cg (dB)   η (%)
Chen DCT             0.000   0.000    0.000   9.455     88.452
WHT                  0.878   92.563   0.428   8.194     70.646
BAS-2010             0.667   64.749   0.187   8.521     73.634
BAS-2013             0.511   54.621   0.132   8.194     70.646
Transform in [22]    0.152   8.081    0.046   7.840     65.279
Transform in [31]    0.340   30.323   0.064   8.295     70.831
Transform in [36]    0.256   14.740   0.051   8.428     72.230
Proposed approx.     0.493   41.000   0.095   7.857     67.608

The proposed approximation furnishes performance measures comparable to the average results of the state-of-the-art approximations. At the same time, its computational cost is 26.67% lower than that of the lowest-complexity methods in the literature [31, 36].

3 Image and video coding

In the following subsections, we describe two computational experiments in the context of image and video encoding. Our goal is to demonstrate in real-life scenarios that the introduced approximation is capable of performing very closely to state-of-the-art approximations at a much lower computational cost. For the still image experiment, we employ a fixed-rate encoding scheme which avoids quantization. This isolates the role of the transform and emphasizes the energy compaction properties of the approximate transforms. For the video experiment, on the other hand, we include variable-rate encoding equipped with the quantization step, as required by the actual HEVC standard. Together, the two experiments are intended to highlight the capabilities of the introduced approximation.
3.1 Image compression experiments

We adopted a JPEG-like procedure as detailed in the methodology presented in [17] and reproduced in [21, 24, 25, 31, 36]. A total of 45 8-bit grayscale images of size 512 × 512, obtained from a standard public image bank [41], was considered. This set of images was selected to be representative of the imagery commonly found in real-life applications. Color images could be treated similarly by processing each channel separately. Each input image $\mathbf{A}$ was split into 1024 disjoint 16 × 16 blocks ($\mathbf{A}_k$, $k = 1, 2, \dots, 1024$), which were submitted to the forward bidimensional (2-D) transformation given by:

$$
\mathbf{B}_k = \tilde{\mathbf{C}} \cdot \mathbf{A}_k \cdot \tilde{\mathbf{C}}^\top,
$$

where $\tilde{\mathbf{C}}$ is a selected 16-point transformation. Following the zig-zag sequence [42], only the first $r$ elements of $\mathbf{B}_k$, $1 \le r \le 150$, were retained; the remaining ones were zeroed, resulting in $\tilde{\mathbf{B}}_k$. The inverse 2-D transformation is then applied according to:

$$
\tilde{\mathbf{A}}_k = \tilde{\mathbf{C}}^\top \cdot \tilde{\mathbf{B}}_k \cdot \tilde{\mathbf{C}}.
$$

The resulting matrix $\tilde{\mathbf{A}}_k$ is the lossy reconstruction of $\mathbf{A}_k$. The correct rearrangement of all blocks results in the reconstructed image $\tilde{\mathbf{A}}$. This procedure was performed for each of the 45 images in the selected data set.

To assess the approximations in a fair manner, we consider the ratio between performance measures and arithmetic cost. Such a ratio furnishes the performance gain per unit of arithmetic computation. Fig. 2 shows the average PSNR and structural similarity index (SSIM) [43] measurements per unit of additive cost. The proposed approximation outperforms all competing approximate DCTs for any value of $r$ in both metrics. The introduced 16-point transform presents the best cost-benefit ratio among all competing methods.

[Figure 2: (a) average PSNR/additions (dB) versus $r$; (b) average SSIM/additions versus $r$. Curves: WHT, BAS-2010, BAS-2013, Transform in [22], Transform in [31], Transform in [36], Proposed approx.]

Figure 2: Average (a) PSNR and (b) SSIM measurements per additive cost at various compression ratios.

Fig. 3 displays a qualitative and quantitative comparison considering the standard Lena image. The PSNR measurements for the Lena image were only 4.75% and 5.69% below the results furnished by the transformations in [31] and [36], respectively. Similarly, considering the SSIM, the proposed transform performed only 0.62%, 6.42%, and 7.43% below the performance offered by the transformations in [22], [31], and [36], respectively. On the other hand, the proposed approximate DCT requires 38.8% and 26.6% fewer arithmetic operations when compared to [22] and [31, 36], respectively. The proposed approximation outperformed the WHT, BAS-2010, and BAS-2013 according to both figures of merit. Indeed, the small losses in PSNR and SSIM compared to the exact DCT are not sufficient to cause significant image degradation as perceived by the human visual system, as shown in Fig. 3.

[Figure 3 panels:
(a) Original image
(b) PSNR = 28.55 dB, SSIM = 0.7915
(c) PSNR = 21.20 dB, SSIM = 0.2076
(d) PSNR = 25.27 dB, SSIM = 0.6735
(e) PSNR = 25.79 dB, SSIM = 0.6921
(f) PSNR = 25.75 dB, SSIM = 0.7067
(g) PSNR = 27.13 dB, SSIM = 0.7505
(h) PSNR = 27.40 dB, SSIM = 0.7587
(i) PSNR = 25.84 dB, SSIM = 0.7023]

Figure 3: Original (a) Lena image and compressed versions with $r = 16$ according to (b) the DCT, (c) WHT, (d) BAS-2010, (e) BAS-2013, (f) transform in [22], (g) transform in [31], (h) transform in [36], and (i) proposed 16-point approximation.
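The JPEG-like procedure of Section 3.1 can be sketched as follows, assuming that the orthonormal $\hat{\mathbf{C}} = \mathbf{S}\cdot\mathbf{T}$ plays the role of $\tilde{\mathbf{C}}$ and that the zig-zag sequence is the usual JPEG-style scan. The function names and the small synthetic image are hypothetical stand-ins for the 512 × 512 test images, and SSIM is omitted.

```python
import math

# Rows of T (Section 2.1): '+' = 1, '-' = -1, '0' = 0.
T_ROWS = [
    "++++++++++++++++", "++++++++--------", "+000000--000000+", "++0000--++0000--",
    "+00--00++00--00+", "++----++--++++--", "00-00+0000+00-00", "000000-+-+000000",
    "+--++--++--++--+", "00-+00000000-+00", "0-0000+00+0000-0", "00++--0000++--00",
    "0-+00+-00-+00+-0", "+-000000000000+-", "000-+000000+-000", "0000-+0000-+0000",
]
T = [[{"+": 1, "-": -1, "0": 0}[c] for c in row] for row in T_ROWS]
N = 16
# C_hat = S T: since T T^T is diagonal, S divides each row of T by its norm.
C_HAT = [[t / math.sqrt(sum(v * v for v in row)) for t in row] for row in T]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def zigzag(n=N):
    """JPEG-style zig-zag scan of an n x n block, as (row, col) pairs."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))


def compress_block(A, r, C=C_HAT):
    """B = C A C^T, keep the first r zig-zag coefficients, return C^T B~ C."""
    B = matmul(matmul(C, A), transpose(C))
    keep = set(zigzag()[:r])
    B_kept = [[B[i][j] if (i, j) in keep else 0.0 for j in range(N)] for i in range(N)]
    return matmul(matmul(transpose(C), B_kept), C)


def compress_image(img, r):
    """Process disjoint N x N blocks; image sides must be multiples of N."""
    out = [[0.0] * len(img[0]) for _ in img]
    for bi in range(0, len(img), N):
        for bj in range(0, len(img[0]), N):
            block = [row[bj:bj + N] for row in img[bi:bi + N]]
            for i, rec_row in enumerate(compress_block(block, r)):
                out[bi + i][bj:bj + N] = rec_row
    return out


def psnr(ref, rec, peak=255.0):
    err = [(a - b) ** 2 for ra, rb in zip(ref, rec) for a, b in zip(ra, rb)]
    m = sum(err) / len(err)
    return math.inf if m == 0 else 10 * math.log10(peak ** 2 / m)


# Hypothetical smooth 32 x 32 test image (two by two blocks).
img = [[128 + 60 * math.sin(i / 5) * math.cos(j / 7) for j in range(32)] for i in range(32)]
for r in (16, 64, 150):
    print(r, round(psnr(img, compress_image(img, r)), 2))
```

Because $\hat{\mathbf{C}}$ is orthogonal, the reconstruction error of each block equals the energy of the discarded coefficients, so the PSNR can only stay the same or grow as $r$ increases; with $r = 256$ the block is recovered up to floating-point error.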
[Figure 4: average PSNR (dB) versus QP for the HEVC standard, Scenario (i), and Scenario (ii).]

Figure 4: Performance of the proposed DCT approximation in the HEVC standard for several QP values.

3.2 Video compression experiments

The proposed approximation was embedded into the HM-16.3 HEVC reference software [44], i.e., the proposed approximation is considered as a replacement for the original integer transform in the HEVC standard. Because the HEVC standard employs 4-, 8-, 16-, and 32-point transformations, we performed simulations in two scenarios: (i) substitution of the 16-point transformation only and (ii) replacement of the 8- and 16-point transformations. We adopted the approximation described in [28] and the proposed approximation for the 8- and 16-point substitutions, respectively. The original 8- and 16-point transforms employed in the HEVC standard require 22 multiplications and 28 additions, and 86 multiplications and 100 additions, respectively [45]. In contrast, the selected DCT approximations are multiplierless and require 50% and 56% fewer additions, respectively. The diagonal matrices associated with the 8- and 16-point approximations are fully embedded into the quantization step by judicious scaling of the standard HEVC quantization tables [45].

In both scenarios, we considered 11 CIF videos of 300 frames obtained from a public video database [46]. The default HEVC coding configuration for the Main profile was adopted, which includes both intra- and inter-frame coding modes at 8-bit depth. We varied the quantization parameter (QP) from 5 to 50 in steps of 5. We adopted the PSNR as the figure of merit because it is readily available in the reference software. Measurements were taken for each color channel and frame. The overall video PSNR value was computed according to [47]. Average PSNR measurements are shown in Fig. 4.
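The addition savings quoted above and the combination of per-channel PSNR values can be written out as below. The 6:1:1 luma-weighted average is an assumption about how [47] combines the Y, U, and V channels, and averaging the weighted PSNR over frames is likewise assumed; all function names are hypothetical.

```python
import math

# Additions per 1-D transform, as quoted in this section (HEVC figures from [45]).
HEVC_ADDITIONS = {8: 28, 16: 100}
APPROX_ADDITIONS = {8: 14, 16: 44}   # approximation from [28] and the proposed one


def addition_savings(n):
    """Percentage of additions saved when the n-point HEVC transform is replaced."""
    return 100.0 * (HEVC_ADDITIONS[n] - APPROX_ADDITIONS[n]) / HEVC_ADDITIONS[n]


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for 8-bit samples."""
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_yuv(psnr_y, psnr_u, psnr_v):
    """6:1:1 luma-weighted combination; assumed here to be the weighting of [47]."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0


def video_psnr(frame_mses):
    """Average of per-frame weighted PSNR; frame_mses holds (mse_y, mse_u, mse_v) tuples."""
    scores = [psnr_yuv(*(psnr_from_mse(m) for m in mses)) for mses in frame_mses]
    return sum(scores) / len(scores)


print(addition_savings(8), addition_savings(16))  # 50.0 56.0
```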
The proposed approximation is multiplierless and effected 66% and 53.12% savings in the number of additions in Scenarios (i) and (ii), respectively. At the same time, the resulting image quality measures showed average errors of less than 0.28% and 0.71% for Scenarios (i) and (ii), respectively. Fig. 5 displays the first frame of the Foreman video encoded with the unmodified codec and with the modified codec in Scenarios (i) and (ii). The approximate transform yields images that are essentially identical to those produced by the actual codec at a much lower computational complexity.

[Figure 5 panels: (a) HEVC standard; (b) Scenario (i); (c) Scenario (ii).]

Figure 5: First frame from the 'Foreman' video in the HEVC experiment with QP = 35.

Table 3: Hardware resource and power consumption using the Xilinx Virtex-6 XC6VLX240T 1FFG1156 device

Method               CLB   FF     T_cpd (ns)   F_max (MHz)   D_p (mW/GHz)   Q_p (mW)   AT     AT^2
Transform in [36]    499   1588   3.0          333.33        7.4            3.500      1497   4491
Proposed approx.     303   936    2.9          344.83        7.9            3.509      879    2548

4 Hardware implementation

In order to evaluate the hardware resource consumption of the proposed approximation, it was modeled and tested in Matlab Simulink and then physically realized on an FPGA. The employed FPGA was a Xilinx Virtex-6 XC6VLX240T installed on a Xilinx ML605 prototyping board. The FPGA realization was tested with 10,000 random 16-point input test vectors using hardware co-simulation. Test vectors were generated within the Matlab environment and routed to the physical FPGA device using JTAG-based hardware co-simulation. The data measured from the FPGA were then routed back to the Matlab memory space. The associated FPGA implementation was evaluated for hardware complexity and real-time performance using metrics such as configurable logic block (CLB) and flip-flop (FF) counts, critical path delay ($T_{\mathrm{cpd}}$) in ns, and maximum operating frequency ($F_{\max}$) in MHz.
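The derived columns of Table 3 follow from the raw measurements: $F_{\max} = 1/T_{\mathrm{cpd}}$ and, taking the CLB count as area $A$ and $T_{\mathrm{cpd}}$ as time $T$, the products $AT$ and $AT^2$. The short sketch below recomputes them; the names are hypothetical, and rounding $AT$ and $AT^2$ to integers mirrors the table rather than any procedure stated in the text.

```python
# Raw measurements from Table 3: CLB count (area proxy) and critical path delay T_cpd (ns).
DESIGNS = {
    "Transform in [36]": (499, 3.0),
    "Proposed approx.": (303, 2.9),
}


def derived_metrics(clb, t_cpd_ns):
    """F_max in MHz, plus AT = A*T and AT^2 = A*T^2 rounded to integers as in Table 3."""
    f_max_mhz = 1000.0 / t_cpd_ns  # 1 / T_cpd, with T_cpd in ns
    return f_max_mhz, round(clb * t_cpd_ns), round(clb * t_cpd_ns ** 2)


def improvement(reference, candidate):
    """Relative reduction (%) of a cost metric with respect to the reference design."""
    return 100.0 * (reference - candidate) / reference


ref = derived_metrics(*DESIGNS["Transform in [36]"])
new = derived_metrics(*DESIGNS["Proposed approx."])
print([round(v, 2) for v in ref], [round(v, 2) for v in new])  # [333.33, 1497, 4491] [344.83, 879, 2548]
print(round(improvement(ref[1], new[1]), 2), round(improvement(ref[2], new[2]), 2))  # 41.28 43.26
```

The relative improvements on the last line agree with the area-time and area-time-squared figures reported for the comparison with [36] in this section.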
Values were obtained from the Xilinx FPGA synthesis and place-and-route tools by accessing the xflow.results report file. In addition, the dynamic power ($D_p$) in mW/GHz and the static power consumption ($Q_p$) in mW were estimated using the Xilinx XPower Analyzer. Using the CLB count as a metric to estimate the circuit area ($A$) and deriving time ($T$) from $T_{\mathrm{cpd}}$, we also report the area-time complexity ($AT$) and the area-time-squared complexity ($AT^2$). Because the transformation in [36] possesses a very low arithmetic complexity (cf. Table 1) and presents good performance (cf. Table 2), it was chosen for a direct comparison with the proposed approximation. The obtained results are displayed in Table 3. The proposed approximation presents improvements of 41.28% and 43.26% in the area-time and area-time-squared measures, respectively, when compared to [36].

5 Conclusion

This paper introduced an orthogonal 16-point DCT approximation which requires only 44 additions for its computation. To the best of our knowledge, the proposed transformation has the lowest computational cost among the meaningful 16-point DCT approximations archived in the literature. The introduced method requires from 26.67% to 38.89% fewer arithmetic operations than the best competitors. In the context of image compression, the proposed tool attained the best performance vs. computational cost ratio for both PSNR and SSIM metrics. When embedded into the H.265/HEVC standard, the resulting video frames exhibited almost imperceptible degradation, while demanding no multiplications and 56% fewer additions than the standard unmodified codec. The hardware realization of the proposed transform presented an improvement of more than 30% in the area-time and area-time-squared measures when compared to the lowest-complexity competitor [36].
Potentially, the present approach can be extended to derive 32- and 64-point approximations by means of the scaled approach introduced in [36].

Acknowledgments

The authors acknowledge CAPES, CNPq, FACEPE, and FAPERGS for partial support.

References

[1] V. Britanak, P. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms. Academic Press, 2007.
[2] K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. San Diego, CA: Academic Press, 1990.
[3] R. J. Cintra, F. M. Bayer, and C. J. Tablada, "Low-complexity 8-point DCT approximations based on integer functions," Signal Process., vol. 99, pp. 201–214, 2014.
[4] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard. New York, NY: Van Nostrand Reinhold, 1992.
[5] International Telecommunication Union, "ITU-T recommendation H.261 version 1: Video codec for audiovisual services at p × 64 kbits," ITU-T, Tech. Rep., 1990.
[6] International Telecommunication Union, "ITU-T recommendation H.263 version 1: Video coding for low bit rate communication," ITU-T, Tech. Rep., 1995.
[7] A. Luthra, G. J. Sullivan, and T. Wiegand, "Introduction to the special issue on the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 557–559, Jul. 2003.
[8] W. H. Chen, C. Smith, and S. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Trans. Commun., vol. 25, no. 9, pp. 1004–1009, Sep. 1977.
[9] Y. Arai, T. Agui, and M. Nakajima, "A fast DCT-SQ scheme for images," IEICE Trans., vol. E-71, no. 11, pp. 1095–1097, Nov. 1988.
[10] E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Trans. Signal Processing, vol. 40, no. 9, pp. 2174–2193, 1992.
[11] H. S. Hou, "A fast recursive algorithm for computing the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. 6, no. 10, pp. 1455–1461, 1987.
[12] B. G. Lee, "A new algorithm for computing the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1243–1245, Dec. 1984.
[13] C. Loeffler, A. Ligtenberg, and G. Moschytz, "Practical fast 1D DCT algorithms with 11 multiplications," in Proc. Int. Conf. on Acoustics, Speech, and Signal Process., 1989, pp. 988–991.
[14] M. Vetterli and H. Nussbaumer, "Simple FFT and DCT algorithms with reduced number of operations," Signal Process., vol. 6, pp. 267–278, Aug. 1984.
[15] Z. Wang, "Fast algorithms for the discrete W transform and for the discrete Fourier transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 803–816, Aug. 1984.
[16] K. Lengwehasatit and A. Ortega, "Scalable variable complexity approximate forward DCT," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 11, pp. 1236–1248, Nov. 2004.
[17] T. I. Haweel, "A new square wave transform based on the DCT," Signal Process., vol. 82, pp. 2309–2319, 2001.
[18] R. E. Blahut, Fast Algorithms for Signal Processing. Cambridge University Press, 2010.
[19] T. D. Tran, "The binDCT: Fast multiplierless approximation of the DCT," IEEE Signal Processing Lett., vol. 6, no. 7, pp. 141–144, 2000.
[20] M. C. Lin, L. R. Dung, and P. K. Weng, "An ultra-low-power image compressor for capsule endoscope," Biomed. Eng. Online, vol. 5, no. 1, pp. 1–8, Feb. 2006.
[21] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, "Low-complexity 8 × 8 transform for image compression," Electron. Lett., vol. 44, no. 21, pp. 1249–1250, Sep. 2008.
[22] F. M. Bayer, R. J. Cintra, A. Edirisuriya, and A. Madanayake, "A digital hardware fast algorithm and FPGA-based prototype for a novel 16-point approximate DCT for image compression applications," Meas. Sci. Technol., vol. 23, no. 8, pp. 114010–114019, 2012.
[23] S. Bouguezel, M. O. Ahmad, and M. Swamy, "A fast 8 × 8 transform for image compression," in Int. Conf. on Microelectronics, Dec. 2009, pp. 74–77.
[24] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, "A novel transform for image compression," in Proc. 53rd IEEE Int. Midwest Symp. on Circuits and Systems, Aug. 2010, pp. 509–512.
[25] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, "A low-complexity parametric transform for image compression," in Proc. IEEE Int. Symp. on Circuits and Systems, 2011.
[26] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, "Binary discrete cosine and Hartley transforms," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 4, pp. 989–1002, 2013.
[27] R. J. Cintra and F. M. Bayer, "A DCT approximation for image compression," IEEE Signal Processing Lett., vol. 18, no. 10, pp. 579–582, Oct. 2011.
[28] F. M. Bayer and R. J. Cintra, "DCT-like transform for image compression requires 14 additions only," Electron. Lett., vol. 48, no. 15, pp. 919–921, 2012.
[29] U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, and N. Rajapaksha, "Multiplier-free DCT approximations for RF multi-beam digital aperture-array space imaging and directional sensing," Meas. Sci. Technol., vol. 23, no. 11, p. 114003, 2012.
[30] U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, S. Kulasekera, and A. Edirisuriya, "Improved 8-point approximate DCT for image and video compression requiring only 14 additions," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. PP, no. 99, pp. 1–14, 2014.
[31] T. L. T. da Silveira, F. M. Bayer, R. J. Cintra, S. Kulasekera, A. Madanayake, and A. J. Kozakevicius, "An orthogonal 16-point approximate DCT for image and video compression," Multidim. Syst. Sign. P., pp. 1–18, 2014.
[32] M. T. Pourazad, C. Doutre, M. Azimi, and P. Nasiopoulos, "HEVC: The new gold standard for video compression: How does HEVC compare with H.264/AVC?" IEEE Trans. Consumer Electron., vol. 1, no. 3, pp. 36–46, Jul. 2012.
[33] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[34] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, "A multiplication-free transform for image compression," in Proc. 2nd Int. Conf. on Signals, Circuits and Systems, Nov. 2008, pp. 1–4.
[35] R. K. Yarlagadda and J. E. Hershey, Hadamard Matrix Analysis and Synthesis With Applications to Communications and Signal/Image Processing. Kluwer Academic Publishers, 1997.
[36] M. Jridi, A. Alfalou, and P. K. Meher, "A generalized algorithm and reconfigurable architecture for efficient and scalable orthogonal approximation of DCT," IEEE Trans. Circuits Syst. I, vol. 62, no. 2, pp. 449–457, 2015.
[37] C. J. Tablada, F. M. Bayer, and R. J. Cintra, "A class of DCT approximations based on the Feig-Winograd algorithm," Signal Process., vol. 113, pp. 38–51, 2015.
[38] R. J. Cintra, "An integer approximation method for discrete sinusoidal transforms," Circ. Syst. Signal Pr., vol. 30, no. 6, pp. 1481–1501, 2011.
[39] I. N. Herstein, Topics in Algebra, 2nd ed. John Wiley & Sons, 1975.
[40] C.-K. Fong and W.-K. Cham, "LLM integer cosine transform and its fast algorithm," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 6, pp. 844–854, 2012.
[41] USC-SIPI, "The USC-SIPI image database," http://sipi.usc.edu/database/, 2011, University of Southern California, Signal and Image Processing Institute.
[42] I.-M. Pao and M.-T. Sun, "Approximation of calculations for forward discrete cosine transform," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 3, pp. 264–268, Jun. 1998.
[43] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[44] Joint Collaborative Team on Video Coding (JCT-VC), "HEVC reference software documentation," 2013, Fraunhofer Heinrich Hertz Institute. [Online]. Available: https://hevc.hhi.fraunhofer.de/
[45] M. Budagavi, A. Fuldseth, G. Bjontegaard, V. Sze, and M. Sadafale, "Core transform design in the high efficiency video coding (HEVC) standard," IEEE J. Sel. Top. Sign. Proces., vol. 7, no. 6, pp. 1029–1041, Dec. 2013.
[46] Xiph.Org Foundation, "Xiph.org video test media," 2014, https://media.xiph.org/video/derf/.
[47] J.-R. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, and T. Wiegand, "Comparison of the coding efficiency of video coding standards - including high efficiency video coding (HEVC)," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1669–1684, Dec. 2012.
