Low-complexity Image and Video Coding Based on an Approximate Discrete Tchebichef Transform
Authors: P. A. M. Oliveira, R. J. Cintra, F. M. Bayer
Paulo A. M. Oliveira∗, Renato J. Cintra†, Fábio M. Bayer‡, Sunera Kulasekera§, Arjuna Madanayake§, Vítor A. Coutinho¶

∗ Paulo A. M. Oliveira was with the Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco, Recife, PE, Brazil; and Multimedia Communications and Signal Processing, University of Erlangen–Nuremberg, Erlangen, BY, Germany.
† Renato J. Cintra is with the Signal Processing Group, Universidade Federal de Pernambuco, Caruaru, PE, Brazil. He was with Équipe Cairn, INRIA-IRISA, Université de Rennes, Rennes, France; and LIRIS, Institut National des Sciences Appliquées, Lyon, France (e-mail: rjdsc@de.ufpe.br).
‡ Fábio M. Bayer is with the Departamento de Estatística and LACESM, Universidade Federal de Santa Maria, Santa Maria, RS, Brazil (e-mail: bayer@ufsm.br).
§ Sunera Kulasekera and Arjuna Madanayake were with the Department of Electrical and Computer Engineering at the University of Akron, OH.
¶ Vítor A. Coutinho was with the Signal Processing Group, Departamento de Estatística, UFPE, Recife, Brazil.

Abstract
The usage of linear transformations has great relevance for data decorrelation applications, like image and video compression. In that sense, the discrete Tchebichef transform (DTT) possesses useful coding and decorrelation properties. The DTT transform kernel does not depend on the input data, and fast algorithms can be developed for real-time applications. However, the DTT fast algorithm presented in the literature possesses high computational complexity. In this work, we introduce a new low-complexity approximation for the DTT. The fast algorithm of the proposed transform is multiplication-free and requires a reduced number of additions and bit-shifting operations. Image and video compression simulations in popular standards show good performance of the proposed transform. Regarding hardware resource consumption, the FPGA realization shows a 43.1% reduction in configurable logic blocks, and the ASIC place-and-route realization shows a 57.7% reduction in the area-time figure when compared with the 2-D version of the exact DTT.

Keywords: Approximate transforms, discrete Tchebichef transform, fast algorithms, image and video coding

1 Introduction
Discrete variable orthogonal polynomials emerge as solutions of several hypergeometric difference equations [1]. Classic applications of this class of orthogonal polynomials include functional analysis [2] and graphs [3]. Additionally, such polynomials are employed in the computation of moment functions [4], which are widely used in image processing [5–7]. For instance, the discrete Tchebichef moments [8], which are derived from the discrete Tchebichef polynomials, form a set of orthogonal moment functions. Such functions are not discrete approximations of continuous functions; they are naturally orthogonal over the discrete domain. The Tchebichef moments have been used for quantifying image block artifacts [9], image recognition [10–12], blind integrity verification [13], and image compression [14–18]. In the data compression context, bi-dimensional (2-D) moments are computed by means of the 2-D discrete Tchebichef transform (DTT). In fact, the 8-point DTT can achieve better performance than the discrete cosine transform (DCT) [19] in terms of average bit length, as reported in [16, 20, 21]. Moreover, the 8-point DTT-based embedded encoder proposed in [18] shows improved image quality and reduced encoding/decoding time in comparison with state-of-the-art DCT-based embedded coders. The 8-point DTT has also been employed in blind forensics as a tool to determine the integrity of medical imagery subject to filtering and compression [13].

However, the exact DTT possesses high arithmetic complexity, due to its significant number of additions and floating-point multiplications. Such multiplications are known to be more demanding computational structures than additions or fixed-point multiplications, both in software and hardware. Thus, the higher computational complexity of the DTT precludes its application in low-power systems [22, 23] and/or real-time processing, such as video streaming [24, 25]. Therefore, fast algorithms for the DTT could improve its computational efficiency.

A comprehensive literature search reveals only two fast algorithms for the 4-point DTT [14, 26] and one for the 8-point DTT [15]. Although these fast algorithms possess lower arithmetic complexity than the direct DTT calculation, they still require a significant number of additions and bit-shifting operations.

In a comparable scenario, the computation of DCT-based transforms, which have been employed in several popular coding schemes such as JPEG [27], MPEG-2 [28], H.261 [29], H.263 [30], H.264 [31], HEVC [32, 33], and VP9 [34], has profited from matrix approximation theory [35–41]. In this context, discrete transforms are not computed exactly; instead, an approximate, low-cost computation is performed. The approximations are designed to offer spectral and coding characteristics similar to those of the exact transform at a lower arithmetic complexity. Usually, approximations are multiplierless, requiring only addition and bit-shifting operations for their computation. In [42], a multiplierless approximation for the 8-point DTT is proposed. To the best of our knowledge, this is the only DTT approximation archived in the literature.
The aim of this work is to introduce an efficient low-complexity approximation for the 8-point DTT capable of outperforming [42]. To derive a multiplierless approximate DTT matrix, we pose a multicriteria optimization problem combining two coding metrics: coding gain and transform efficiency. Additionally, a fast algorithm for the efficient computation of the sought approximation is also pursued. For coding performance evaluation, we propose two computational experiments: (i) a JPEG image compression simulation and (ii) a video coding experiment, which consists of embedding the sought approximation into the H.264/AVC standard.

The paper unfolds as follows. Section 2 reviews the mathematical background of the DTT. Section 3 introduces a parametrization of the DTT to derive a family of DTT approximations and sets up an optimization problem to identify optimal approximations. In Section 4, we assess the obtained approximation in terms of coding performance, proximity to the exact transform, and computational cost. Moreover, a fast algorithm for the proposed approximate DTT is introduced. Section 5 shows the results of the image and video compression simulations. Section 6 presents a hardware resource consumption comparison with the exact DTT for both FPGA and ASIC realizations. A discussion and final remarks are given in Section 7.

2 Discrete Tchebichef Transform

2.1 Discrete Tchebichef Polynomials
The discrete Tchebichef polynomials are a set of discrete variable orthogonal polynomials [43]. The $k$th-order discrete Tchebichef polynomial is given by the following closed-form expression [14]:
$$t_k[n] = (1-N)_k \cdot {}_3F_2(-k, -n, 1+k;\, 1, 1-N;\, 1),$$
where $n = 0, 1, \ldots, N-1$,
$${}_3F_2(a_1, a_2, a_3;\, b_1, b_2;\, z) = \sum_{j=0}^{\infty} \frac{(a_1)_j\,(a_2)_j\,(a_3)_j}{(b_1)_j\,(b_2)_j}\,\frac{z^j}{j!}$$
is the generalized hypergeometric function, and $(a)_j = a(a+1)\cdots(a+j-1)$ is the rising factorial (Pochhammer symbol). The Tchebichef polynomials can also be obtained from the following recursion [14]:
$$t_k[n] = \frac{2k-1}{k}\, t_1[n]\, t_{k-1}[n] - \frac{k-1}{k}\left(N^2 - (k-1)^2\right) t_{k-2}[n], \qquad k = 2, 3, \ldots, N-1,$$
with $t_0[n] = 1$ and $t_1[n] = 2n - N + 1$. Indeed, the set $\{t_k[n]\}$, $k = 0, 1, \ldots, N-1$, is an orthogonal basis with respect to the unit weight. Consequently, the discrete Tchebichef polynomials satisfy the following relation:
$$\sum_{n=0}^{N-1} t_i[n]\, t_j[n] = \rho(i, N)\cdot \delta_{i,j},$$
where $\rho(k, N) = \dfrac{(N+k)!}{(2k+1)\,(N-k-1)!}$ and $\delta_{i,j}$ is the Kronecker delta, which yields $\delta_{i,j} = 1$ if $i = j$, and $\delta_{i,j} = 0$ otherwise.

2.2 2-D Discrete Tchebichef Transform
Let $f[m,n]$, $m, n = 0, 1, \ldots, N-1$, be the intensity distribution of a discrete image of size $N \times N$ pixels. The 2-D DTT of $f[m,n]$, denoted by $M[p,q]$, $p, q = 0, 1, \ldots, N-1$, is given by [8, 14]:
$$M[p,q] = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} \tilde t_p[m]\cdot \tilde t_q[n]\cdot f[m,n], \qquad (1)$$
where $\tilde t_k[n] = t_k[n]/\sqrt{\rho(k,N)}$, $k = 0, 1, \ldots, N-1$, are the orthonormalized discrete Tchebichef polynomials. Note that the transform kernel in (1) is separable. Hence, the following relation holds:
$$M[p,q] = \sum_{m=0}^{N-1} \tilde t_p[m] \sum_{n=0}^{N-1} \tilde t_q[n]\cdot f[m,n],$$
for $p, q = 0, 1, \ldots, N-1$. Therefore, the transform-domain coefficients of $\mathbf f$ can be calculated by the following matrix operation:
$$\mathbf M = \mathbf T \cdot \mathbf f \cdot \mathbf T^\top, \qquad (2)$$
where $\mathbf T$ is the $N$-point one-dimensional DTT matrix given by
$$\mathbf T = \begin{bmatrix}
\tilde t_0[0] & \tilde t_0[1] & \cdots & \tilde t_0[N-1]\\
\tilde t_1[0] & \tilde t_1[1] & \cdots & \tilde t_1[N-1]\\
\vdots & \vdots & \ddots & \vdots\\
\tilde t_{N-1}[0] & \tilde t_{N-1}[1] & \cdots & \tilde t_{N-1}[N-1]
\end{bmatrix}.$$
The matrix operations in (2) represent the 2-D DTT.
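The closed form, the recursion, and the orthogonality constant $\rho(k, N)$ above can be cross-checked numerically. The following Python sketch (standard library only; all function names are our own, hypothetical helpers) evaluates $t_k[n]$ both ways with exact rational arithmetic and verifies the orthogonality relation for $N = 8$.

```python
from fractions import Fraction
from math import factorial


def pochhammer(a, j):
    """Rising factorial (a)_j = a (a + 1) ... (a + j - 1), with (a)_0 = 1."""
    out = Fraction(1)
    for i in range(j):
        out *= a + i
    return out


def tcheb_closed_form(k, n, N):
    """t_k[n] = (1 - N)_k * 3F2(-k, -n, 1 + k; 1, 1 - N; 1).

    The series terminates after j = k because (-k)_j = 0 for j > k.
    """
    series = Fraction(0)
    for j in range(k + 1):
        num = pochhammer(-k, j) * pochhammer(-n, j) * pochhammer(1 + k, j)
        den = pochhammer(1, j) * pochhammer(1 - N, j) * factorial(j)
        series += num / den
    return pochhammer(1 - N, k) * series


def tcheb_recursive(N):
    """Rows t_0, ..., t_{N-1} (N >= 2) from the three-term recursion."""
    t = [[Fraction(1)] * N, [Fraction(2 * n - N + 1) for n in range(N)]]
    for k in range(2, N):
        a = Fraction(2 * k - 1, k)
        b = Fraction(k - 1, k) * (N * N - (k - 1) ** 2)
        t.append([a * t[1][n] * t[k - 1][n] - b * t[k - 2][n] for n in range(N)])
    return t


def rho(k, N):
    """Squared norm rho(k, N) = (N + k)! / ((2k + 1) (N - k - 1)!)."""
    return Fraction(factorial(N + k), (2 * k + 1) * factorial(N - k - 1))


N = 8
t = tcheb_recursive(N)
closed_ok = all(tcheb_closed_form(k, n, N) == t[k][n] for k in range(N) for n in range(N))
ortho_ok = all(
    sum(t[i][n] * t[j][n] for n in range(N)) == (rho(i, N) if i == j else 0)
    for i in range(N)
    for j in range(N)
)
print(closed_ok, ortho_ok)        # True True
print([int(v) for v in t[2]])     # [42, 6, -18, -30, -30, -18, 6, 42]
```

Exact fractions are used so that the agreement between the two definitions is an identity check rather than a floating-point tolerance check.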
Because of the kernel separability, the 2-D DTT can be calculated by successive applications of the 1-D DTT to the rows of $\mathbf f$ and then to the columns of the resulting intermediate matrix. The original intensity distribution $\mathbf f$ can be recovered by the inverse procedure:
$$\mathbf f = \mathbf T^{-1}\cdot \mathbf M \cdot (\mathbf T^{-1})^\top = \mathbf T^\top \cdot \mathbf M \cdot \mathbf T.$$
The last equality stems from the orthogonality of the DTT: $\mathbf T^\top = \mathbf T^{-1}$ [14]. Therefore, the same structure can be used for the forward as well as the inverse transform.

For $N = 4$ and $N = 8$, we have the particular cases of interest in the context of image and video coding. The 4- and 8-point DTT matrices are, respectively, given by:
$$\mathbf T_4 = \mathbf F_4 \cdot \begin{bmatrix}
1 & 1 & 1 & 1\\
-3 & -1 & 1 & 3\\
1 & -1 & -1 & 1\\
-1 & 3 & -3 & 1
\end{bmatrix}
\quad\text{and}\quad
\mathbf T_8 = \frac{1}{2}\,\mathbf F_8 \cdot \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
-7 & -5 & -3 & -1 & 1 & 3 & 5 & 7\\
7 & 1 & -3 & -5 & -5 & -3 & 1 & 7\\
-7 & 5 & 7 & 3 & -3 & -7 & -5 & 7\\
7 & -13 & -3 & 9 & 9 & -3 & -13 & 7\\
-7 & 23 & -17 & -15 & 15 & 17 & -23 & 7\\
1 & -5 & 9 & -5 & -5 & 9 & -5 & 1\\
-1 & 7 & -21 & 35 & -35 & 21 & -7 & 1
\end{bmatrix},$$
where $\mathbf F_4 = \operatorname{diag}\left(\frac{1}{2}, \frac{1}{\sqrt{20}}, \frac{1}{2}, \frac{1}{\sqrt{20}}\right)$ and $\mathbf F_8 = \operatorname{diag}\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{42}}, \frac{1}{\sqrt{42}}, \frac{1}{\sqrt{66}}, \frac{1}{\sqrt{154}}, \frac{1}{\sqrt{546}}, \frac{1}{\sqrt{66}}, \frac{1}{\sqrt{858}}\right)$. We observe that $\mathbf T_4$ and $\mathbf T_8$ are the product of an integer matrix and a diagonal matrix, which requires floating-point representation.

3 DTT Approximations and Coding Optimality
In this section, we aim at proposing an extremely low-complexity DTT approximation. Our methodology consists of generating a class of parametric approximate matrices and then identifying the optimal class member in terms of coding performance.

3.1 Related Work
To the best of our knowledge, the only DTT approximation archived in the literature was proposed in [42]. That approximation was obtained by means of a parameterization of integer functions combined with a normalization of the transformation matrix columns.
The approximation derived in [42] furnishes good coding capabilities, but it lacks orthogonality or near-orthogonality properties. As a consequence, its forward and inverse transformations are quite distinct and possess unbalanced computational complexities.

3.2 Parametric Low-complexity Matrices
In [35, 36, 38, 44], DCT approximations were proposed according to the following operation:
$$\operatorname{int}(\alpha \cdot \mathbf C),$$
where $\operatorname{int}(\cdot)$ is an integer function, $\alpha$ is a real scaling factor, and $\mathbf C$ is the exact DCT matrix. Usual integer functions include the floor, ceiling, sign, and rounding functions [38]. In this work, these functions operate element-wise when applied to a matrix argument.

A similar approach is sought for the proposed DTT approximation. However, in contrast with the DCT, the rows of the DTT matrix (basis vectors) have widely varying dynamic ranges. Thus, the integer function may excessively penalize the rows with small dynamic range. To compensate for this phenomenon, we normalize the rows of $\mathbf T_4$ and $\mathbf T_8$ by left multiplication with $\mathbf D_4 = \operatorname{diag}\left(2, \frac{\sqrt{20}}{3}, 2, \frac{\sqrt{20}}{3}\right)$ and $\mathbf D_8 = \operatorname{diag}\left(\sqrt{8}, \frac{\sqrt{168}}{7}, \frac{\sqrt{168}}{7}, \frac{\sqrt{264}}{7}, \frac{\sqrt{616}}{13}, \frac{\sqrt{2184}}{23}, \frac{\sqrt{264}}{9}, \frac{\sqrt{3432}}{35}\right)$, respectively, so that the largest magnitude in each row of $\mathbf D_N \cdot \mathbf T_N$ equals 1.

The sought approximations are required to possess extremely low complexity. One way of ensuring this property is to adopt an integer function whose codomain is a set of low-complexity integers. In the DCT literature, common sets are $\mathcal P_0 = \{\pm 1\}$ [35], $\mathcal P_1 = \{0, \pm 1\}$ [36], and $\mathcal P_2 = \{0, \pm 1, \pm 2\}$ [38]. Elements from these sets have very simple hardware realizations, implying multiplierless designs with only addition and bit-shifting operations [45].
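As a numerical companion to $\mathbf T_8$, $\mathbf F_8$, and $\mathbf D_8$, the Python sketch below (standard library only; helper names are hypothetical) rebuilds the orthonormal 8-point DTT from its integer rows, recovers the diagonal of $\mathbf F_8$ from the row energies, checks $\mathbf T_8\mathbf T_8^\top = \mathbf I$ so that the 2-D inverse is $\mathbf T^\top\mathbf M\mathbf T$, and confirms that $\mathbf D_8$ rescales every row to unit peak magnitude.

```python
import math
import random

# Integer rows of the 8-point DTT (the integer matrix in T_8 = (1/2) F_8 [...]).
T8_INT = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [-7, -5, -3, -1, 1, 3, 5, 7],
    [7, 1, -3, -5, -5, -3, 1, 7],
    [-7, 5, 7, 3, -3, -7, -5, 7],
    [7, -13, -3, 9, 9, -3, -13, 7],
    [-7, 23, -17, -15, 15, 17, -23, 7],
    [1, -5, 9, -5, -5, 9, -5, 1],
    [-1, 7, -21, 35, -35, 21, -7, 1],
]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Row energies e_k; since (1/2) F_8 must scale row k by 1/sqrt(e_k), F_8 = diag(1/sqrt(e_k / 4)).
energies = [sum(v * v for v in row) for row in T8_INT]
F8_denominators = [e // 4 for e in energies]
T8 = [[v / math.sqrt(e) for v in row] for row, e in zip(T8_INT, energies)]

# D_8 = diag(sqrt(e_k) / max_n |row_k[n]|): every row of D_8 T_8 peaks at magnitude 1.
D8 = [math.sqrt(e) / max(abs(v) for v in row) for row, e in zip(T8_INT, energies)]
peaks = [max(abs(d * v) for v in row) for d, row in zip(D8, T8)]

# Orthonormality: T_8 T_8^T = I, hence T_8^{-1} = T_8^T.
G = matmul(T8, transpose(T8))
ortho_err = max(abs(G[i][j] - (i == j)) for i in range(8) for j in range(8))

# 2-D DTT of a random 8x8 block via (2), and its inverse f = T^T M T.
random.seed(0)
f = [[random.randint(0, 255) for _ in range(8)] for _ in range(8)]
M = matmul(matmul(T8, f), transpose(T8))
f_rec = matmul(matmul(transpose(T8), M), T8)
recon_err = max(abs(f_rec[i][j] - f[i][j]) for i in range(8) for j in range(8))

print(F8_denominators)                      # [2, 42, 42, 66, 154, 546, 66, 858]
print(ortho_err < 1e-12, recon_err < 1e-9)  # True True
```

The row energies make the normalization explicit: for example, the fifth integer row has energy $616 = 4 \cdot 154$, which fixes the corresponding entries of $\mathbf F_8$ and $\mathbf D_8$.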
Adopting $\mathcal P_2$, a suitable integer function is given by:
$$\operatorname{round}\colon [-1, 1] \to \mathcal P_2 = \{0, \pm 1, \pm 2\}, \qquad x \mapsto \operatorname{round}(\alpha \cdot x), \qquad 0 < \alpha < 5/2,$$
where $\operatorname{round}(x) = \operatorname{sign}(x)\cdot\lfloor |x| + \tfrac{1}{2}\rfloor$ is the rounding function as implemented in the MATLAB [46], Octave [47], and Python [48] programming languages. Following the methodology described in [38], we obtain the following parametric class of matrices:
$$\mathbf T_N(\alpha) = \operatorname{round}(\alpha \cdot \mathbf D_N \cdot \mathbf T_N), \qquad N \in \{4, 8\}. \qquad (3)$$

3.3 DTT Approximation
A given low-complexity matrix $\mathbf T_N(\alpha)$ can be used to approximate the DTT matrix by means of orthogonalization or quasi-orthogonalization, as described in [36–38]. As a result, an approximation for $\mathbf T_N$, referred to as $\hat{\mathbf T}_N(\alpha)$, can be obtained by:
$$\hat{\mathbf T}_N(\alpha) = \mathbf S_N(\alpha)\cdot \mathbf T_N(\alpha), \qquad (4)$$
where $\mathbf S_N(\alpha) = \sqrt{\left\{\operatorname{ediag}\left(\mathbf T_N(\alpha)\cdot \mathbf T_N(\alpha)^\top\right)\right\}^{-1}}$ is a diagonal matrix, $\operatorname{ediag}(\cdot)$ returns a diagonal matrix with the diagonal entries of its argument, and $\sqrt{\cdot}$ is the element-wise square root operator [38]. If
$$\mathbf T_N(\alpha)\cdot \mathbf T_N(\alpha)^\top = [\text{diagonal matrix}] \qquad (5)$$
holds true, then $\hat{\mathbf T}_N(\alpha)$ is an orthogonal matrix [49]. Otherwise, it is possibly a near-orthogonal matrix [38]. An approximation is said to be quasi-orthogonal when the deviation from diagonality of $\mathbf T_N(\alpha)\cdot\mathbf T_N(\alpha)^\top$ is small. Let $\mathbf A$ be a square real matrix. The deviation from diagonality $\delta(\mathbf A)$ is given by [50]:
$$\delta(\mathbf A) = 1 - \frac{\|\operatorname{ediag}(\mathbf A)\|_F}{\|\mathbf A\|_F},$$
where $\|\cdot\|_F$ is the Frobenius norm for matrices [49]. In the context of image compression, a deviation from diagonality below $1 - 2/\sqrt{5} \approx 0.1056$ indicates quasi-orthogonality [35, 38].

3.4 Optimization Problem
Now our goal is to identify, in the family $\mathbf T_N(\alpha)$, the matrix that furnishes the best approximation. We adopted two metrics as figures of merit to guide the optimal choice: (i) the unified coding gain $C_g$ [51, 52] and (ii) the transform efficiency $\eta$ [53].
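Both figures of merit can be sketched in Python for a first-order Markov input model. We assume a correlation coefficient $\rho = 0.95$ and $N = 8$, the setting under which the exact DCT attains the reference values of 8.83 dB and 93.99 quoted in Section 4.3; the modeling assumptions are ours, since they are not restated here. The sketch uses the usual unified coding gain form $C_g = 10\log_{10}\prod_k (A_k B_k)^{-1/N}$, with $A_k = \mathbf h_k^\top \mathbf R\, \mathbf h_k$ for the $k$th forward row $\mathbf h_k$ and $B_k$ the squared norm of the $k$th column of the inverse transform, and $\eta = 100\sum_k |r_{kk}| / \sum_{k,l} |r_{kl}|$ for $\mathbf R_y = \mathbf T\mathbf R\mathbf T^\top$. All helper names are hypothetical.

```python
import math


def markov_cov(N, r=0.95):
    """Covariance of a unit-variance first-order Markov process: R[i][j] = r^|i-j|."""
    return [[r ** abs(i - j) for j in range(N)] for i in range(N)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (adequate for small dense matrices)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def transform_efficiency(T, R):
    """eta = 100 * sum_k |r_kk| / sum_{k,l} |r_kl|, with R_y = T R T^T."""
    Ry = matmul(matmul(T, R), transpose(T))
    diag = sum(abs(Ry[k][k]) for k in range(len(Ry)))
    return 100.0 * diag / sum(abs(v) for row in Ry for v in row)


def unified_coding_gain(T, R):
    """C_g = 10 log10 prod_k (A_k B_k)^(-1/N), in dB."""
    N = len(T)
    G = inverse(T)  # synthesis matrix: its columns are the basis functions
    log_sum = 0.0
    for k in range(N):
        h = T[k]
        A_k = sum(h[i] * R[i][j] * h[j] for i in range(N) for j in range(N))
        B_k = sum(G[i][k] ** 2 for i in range(N))
        log_sum += math.log10(A_k * B_k)
    return -10.0 * log_sum / N


def dct_matrix(N):
    """Orthonormal DCT-II matrix."""
    return [[math.sqrt((1 if k == 0 else 2) / N) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
             for n in range(N)] for k in range(N)]


# Proposed approximation: rows of T*_8 scaled by S_8 = diag(1 / sqrt(row energy)).
T_STAR_8 = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [-2, -1, -1, 0, 0, 1, 1, 2],
    [2, 0, -1, -1, -1, -1, 0, 2],
    [-2, 1, 2, 1, -1, -2, -1, 2],
    [1, -2, 0, 1, 1, 0, -2, 1],
    [-1, 2, -1, -1, 1, 1, -2, 1],
    [0, -1, 2, -1, -1, 2, -1, 0],
    [0, 0, -1, 2, -2, 1, 0, 0],
]
T_HAT_8 = [[v / math.sqrt(sum(w * w for w in row)) for v in row] for row in T_STAR_8]

R = markov_cov(8)
for name, T in [("DCT", dct_matrix(8)), ("proposed", T_HAT_8)]:
    print(name, round(unified_coding_gain(T, R), 2), round(transform_efficiency(T, R), 2))
```

Because $\hat{\mathbf T}^*_8$ is not orthogonal, the coding gain must use the true inverse; this is why the sketch inverts the matrix instead of transposing it.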
These metrics are relevant because they quantify the capacity of a transform to remove signal redundancy, as well as its data compression and decorrelation capabilities [53].

Hence, following the methodology in [54], we propose the following multicriteria optimization problem:
$$\alpha^* = \arg\max_{0 < \alpha < 5/2} \left\{ C_g\big(\hat{\mathbf T}_N(\alpha)\big),\ \eta\big(\hat{\mathbf T}_N(\alpha)\big) \right\}, \qquad N \in \{4, 8\},$$
where $\alpha^*$ is the scaling parameter that results in the optimal low-complexity matrix $\mathbf T_N^* \triangleq \mathbf T_N(\alpha^*)$ according to (3).

The above optimization problem is not analytically tractable. Thus, we resort to an exhaustive numerical search to obtain $\alpha^*$. We consider linearly spaced values of $\alpha$ with a step of $10^{-3}$ in the interval $0 < \alpha < 5/2$. For $N = 4$ and $N = 8$, optimality is found in the intervals $(3/2, 5/2)$ and $(23/14, 69/34)$, respectively. Because (3) is piecewise constant in $\alpha$, any value of $\alpha$ in these intervals yields the same approximation. For operational reasons, we selected $\alpha = 2$. The resulting low-complexity matrices are:
$$\mathbf T_4^* = \begin{bmatrix}
1 & 1 & 1 & 1\\
-2 & -1 & 1 & 2\\
1 & -1 & -1 & 1\\
-1 & 2 & -2 & 1
\end{bmatrix}
\quad\text{and}\quad
\mathbf T_8^* = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
-2 & -1 & -1 & 0 & 0 & 1 & 1 & 2\\
2 & 0 & -1 & -1 & -1 & -1 & 0 & 2\\
-2 & 1 & 2 & 1 & -1 & -2 & -1 & 2\\
1 & -2 & 0 & 1 & 1 & 0 & -2 & 1\\
-1 & 2 & -1 & -1 & 1 & 1 & -2 & 1\\
0 & -1 & 2 & -1 & -1 & 2 & -1 & 0\\
0 & 0 & -1 & 2 & -2 & 1 & 0 & 0
\end{bmatrix}.$$
The associated optimal approximations are denoted by $\hat{\mathbf T}_N^* \triangleq \hat{\mathbf T}_N(\alpha^*)$ and can be computed according to (4). Hence, we obtain:
$$\hat{\mathbf T}_4^* = \operatorname{diag}\left(\tfrac{1}{2}, \tfrac{1}{\sqrt{10}}, \tfrac{1}{2}, \tfrac{1}{\sqrt{10}}\right)\cdot \mathbf T_4^*
\quad\text{and}\quad
\hat{\mathbf T}_8^* = \operatorname{diag}\left(\tfrac{1}{\sqrt{8}}, \tfrac{1}{\sqrt{12}}, \tfrac{1}{\sqrt{12}}, \tfrac{1}{\sqrt{20}}, \tfrac{1}{\sqrt{12}}, \tfrac{1}{\sqrt{14}}, \tfrac{1}{\sqrt{12}}, \tfrac{1}{\sqrt{10}}\right)\cdot \mathbf T_8^*.$$

4 Evaluation and Computational Complexity

4.1 Discussion
The obtained matrix $\mathbf T_4^*$ satisfies (5); therefore, $\hat{\mathbf T}_4^*$ is orthogonal. In fact, up to the signs of its second and fourth rows, the proposed matrix is identical to the 4-point integer transform for H.264 encoding introduced by Malvar et al. [44]. Thus, the H.264 4-point integer transform is also an optimal DTT approximation within the considered class. Because the 4-point DCT approximation matrix by Malvar et al. was submitted to in-depth analyses in the context of video coding [31], such results also apply to $\mathbf T_4^*$. Therefore, hereafter we focus the discussion on the proposed 8-point approximation $\mathbf T_8^*$.

4.2 Orthogonality and Invertibility
The matrix $\mathbf T_8^*$ does not satisfy (5). Therefore, the associated approximation $\hat{\mathbf T}_8^*$ is not orthogonal, i.e., $(\hat{\mathbf T}_8^*)^{-1} \neq (\hat{\mathbf T}_8^*)^\top$. As a consequence, the inverse does not inherit the low-complexity properties of $\mathbf T_8^*$. In other words, the entries of $(\mathbf T_8^*)^{-1}$ are not in $\mathcal P_2$. Hence, the proposed transform possesses asymmetrical computational costs for the direct and inverse operations [55]. The inverse transformation is an essential tool, especially for reconstructing encoded images back to the spatial domain [27, 56].

However, $\hat{\mathbf T}_8^*\cdot(\hat{\mathbf T}_8^*)^\top$ has a low deviation from diagonality [38, 50]: only 0.024, roughly 4.4 times less than the deviation implied by the SDCT [35], which is taken as the standard reference. Therefore, the following approximation is valid:
$$(\hat{\mathbf T}_8^*)^{-1} \approx (\mathbf T_8^*)^\top \cdot \operatorname{diag}\left(\tfrac{1}{\sqrt{8}}, \tfrac{1}{\sqrt{12}}, \tfrac{1}{\sqrt{12}}, \tfrac{1}{\sqrt{20}}, \tfrac{1}{\sqrt{12}}, \tfrac{1}{\sqrt{14}}, \tfrac{1}{\sqrt{12}}, \tfrac{1}{\sqrt{10}}\right).$$
Moreover, since diagonal matrices can be absorbed into other computational steps [38, 40, 42], $(\mathbf T_8^*)^{-1}$ can be replaced with the low-complexity matrix $(\mathbf T_8^*)^\top$. This has the advantage of using the same algorithm for both the forward and inverse approximations.

4.3 Performance Assessment
The proposed approximations were compared with their corresponding exact DTT in terms of coding performance, as measured by the coding gain $C_g$ and the transform efficiency $\eta$. For $N = 8$, we also include in our comparisons the DTT approximation proposed in [42]. Table 1 displays the results.
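The construction of $\mathbf T_8^*$ through (3)–(4) and the near-orthogonality figure of Section 4.2 can be reproduced with the Python sketch below (standard library only; helper names are hypothetical). It divides each integer DTT row by its peak magnitude (the effect of $\mathbf D_N$), applies $\operatorname{round}(x) = \operatorname{sign}(x)\lfloor|x| + 1/2\rfloor$ with exact fractions (Python 3's built-in `round()` rounds halves to even, so it is not used), and checks that $\alpha = 2$ gives the printed matrices and that values of $\alpha$ inside $(23/14, 69/34)$ give the same $\mathbf T_8^*$. At $\alpha = 2$ the first row comes out with every entry equal to 2 (and, for $N = 4$, also the third row); the printed matrices list those rows divided by 2, a factor that the diagonal matrix absorbs, so the sketch divides each row by the gcd of its entries. It also computes $\delta$ for $\hat{\mathbf T}_8^*(\hat{\mathbf T}_8^*)^\top$ and for the SDCT reference.

```python
import math
from fractions import Fraction
from functools import reduce

# Integer rows of the exact DTT matrices (Section 2.2).
T4_INT = [[1, 1, 1, 1], [-3, -1, 1, 3], [1, -1, -1, 1], [-1, 3, -3, 1]]
T8_INT = [
    [1, 1, 1, 1, 1, 1, 1, 1], [-7, -5, -3, -1, 1, 3, 5, 7],
    [7, 1, -3, -5, -5, -3, 1, 7], [-7, 5, 7, 3, -3, -7, -5, 7],
    [7, -13, -3, 9, 9, -3, -13, 7], [-7, 23, -17, -15, 15, 17, -23, 7],
    [1, -5, 9, -5, -5, 9, -5, 1], [-1, 7, -21, 35, -35, 21, -7, 1],
]
# T*_8 as printed in Section 3.4.
T_STAR_8 = [
    [1, 1, 1, 1, 1, 1, 1, 1], [-2, -1, -1, 0, 0, 1, 1, 2],
    [2, 0, -1, -1, -1, -1, 0, 2], [-2, 1, 2, 1, -1, -2, -1, 2],
    [1, -2, 0, 1, 1, 0, -2, 1], [-1, 2, -1, -1, 1, 1, -2, 1],
    [0, -1, 2, -1, -1, 2, -1, 0], [0, 0, -1, 2, -2, 1, 0, 0],
]


def round_half_away(x):
    """round(x) = sign(x) * floor(|x| + 1/2)."""
    sign = (x > 0) - (x < 0)
    return sign * math.floor(abs(x) + Fraction(1, 2))


def low_complexity_matrix(int_rows, alpha):
    """T_N(alpha) = round(alpha * D_N * T_N); D_N * T_N = each row divided by its peak."""
    alpha = Fraction(alpha)
    return [[round_half_away(alpha * Fraction(v, max(abs(w) for w in row))) for v in row]
            for row in int_rows]


def reduce_rows(A):
    """Divide each row by the gcd of its entries (the factor moves into S_N)."""
    out = []
    for row in A:
        g = reduce(math.gcd, row)
        out.append([v // g for v in row])
    return out


def gram(A):
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]


def deviation_from_diagonality(A):
    """delta(A) = 1 - ||ediag(A)||_F / ||A||_F."""
    diag = math.sqrt(sum(A[i][i] ** 2 for i in range(len(A))))
    full = math.sqrt(sum(v ** 2 for row in A for v in row))
    return 1.0 - diag / full


T4_star = reduce_rows(low_complexity_matrix(T4_INT, 2))
T8_star = reduce_rows(low_complexity_matrix(T8_INT, 2))
same_in_interval = all(reduce_rows(low_complexity_matrix(T8_INT, Fraction(a))) == T8_star
                       for a in ("1.65", "1.9", "2.02"))
changes_outside = all(reduce_rows(low_complexity_matrix(T8_INT, Fraction(a))) != T8_star
                      for a in ("1.64", "2.03"))

G = gram(T8_star)                                # T*_8 (T*_8)^T, integer entries
s = [1 / math.sqrt(G[k][k]) for k in range(8)]   # diagonal of S_8
G_hat = [[G[i][j] * s[i] * s[j] for j in range(8)] for i in range(8)]

# SDCT reference: signs of the 8-point DCT entries (none is zero for N = 8).
sdct = [[1 if math.cos(math.pi * (2 * n + 1) * k / 16) > 0 else -1 for n in range(8)]
        for k in range(8)]

print(T8_star == T_STAR_8, same_in_interval, changes_outside)   # True True True
print([G[k][k] for k in range(8)])                  # [8, 12, 12, 20, 12, 14, 12, 10]
print(round(deviation_from_diagonality(G_hat), 4))             # 0.0241
print(round(deviation_from_diagonality(gram(sdct)), 4))        # 0.1056
```

The ratio $0.1056/0.0241 \approx 4.4$ matches the comparison with the SDCT made in Section 4.2.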
The proposed approximations furnish coding measures very close to those of the exact transforms. For comparison purposes, the exact DCT has a coding gain of 8.83 dB and a transform efficiency of 93.99.

Although our goal is to derive good approximations for coding, we also analyzed the resulting approximations in terms of proximity metrics. We selected the mean square error (MSE) [57], the total energy error $\epsilon$ [36], and the transform distortion $d$ [58]. All these measures quantify the distance between the exact transformations and their respective approximations. Analytic expressions for the MSE and $\epsilon$ are detailed in [54]. We also evaluated the proximity of the proposed approximations to the exact DTT according to the transform distortion suggested in [58]. This metric was originally proposed as the DCT distortion in the context of DCT approximations and quantifies, in percentage, a distance between the exact and approximate DCT. Adapting it for the DTT, we obtain the transform distortion as follows:
$$d(\hat{\mathbf T}_N^*) = \left(1 - \left\| \frac{1}{N}\cdot \operatorname{ediag}\left\{\mathbf T_N\cdot(\hat{\mathbf T}_N^*)^\top\right\}\right\|_2^2\right) \times 100\%,$$
where $\|\cdot\|_2$ is the Euclidean norm for matrices [49]. Low values of distortion indicate proximity to the DTT. As a comparison for $N = 4$, the 4-point DCT approximation proposed in [59] has a distortion of 7.32%. Proximity results are also shown in Table 1.

4.4 Fast Algorithm and Arithmetic Complexity
Now we aim at deriving fast algorithms for the obtained DTT approximations. Since $\mathbf T_4^*$ coincides, up to row signs, with the H.264 4-point DCT approximation, fast algorithms for it are already available in [44]. Although $\mathbf T_8^*$ is multiplication-free, its direct implementation without a fast algorithm requires 44 additions and 24 bit-shifting operations. Thus, we focus our efforts on the efficient computation of $\mathbf T_8^*$.
To this end, we sought a sparse matrix factorization in which the number of additions and bit-shifting operations can be significantly reduced [45, 60]. The factorization was derived from scratch based on usual butterfly structures [45]. We obtained the following decomposition:
$$\mathbf T_8^* = \mathbf P\cdot \mathbf A_2\cdot \mathbf A_1\cdot \mathbf B_8,$$
where $\mathbf B_8$ and $\mathbf P$ are $8\times 8$, $\mathbf A_1$ is $10\times 8$, and $\mathbf A_2$ is $8\times 10$:
$$\mathbf B_8 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1
\end{bmatrix},
\qquad
\mathbf A_1 = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & -1 & 2 & -1 & 0 & 0 & 0 & 0\\
2 & 0 & -1 & -1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 2 & -1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 2 & -1\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -2
\end{bmatrix},$$
$$\mathbf A_2 = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & -2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 1 & 0
\end{bmatrix},
\qquad
\mathbf P = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}.$$
Matrix $\mathbf B_8$ represents a layer of butterfly structures, $\mathbf A_1$ and $\mathbf A_2$ denote additive matrices with bit-shifting operations, and $\mathbf P$ represents a final permutation, which is cost-free. The resulting algorithm preserves all algebraic and coding properties of the direct computation while requiring fewer arithmetic operations. Moreover, the factor of 2 in the first row of the matrix obtained from (3) can be absorbed into the diagonal matrix. The obtained factorization is represented by the signal flow graph shown in Figure 1. This algorithm reduces the arithmetic cost of the proposed approximation to only 24 additions and six bit-shifting operations.

Table 2 compares the arithmetic complexity of the discussed methods, each evaluated according to its respective fast algorithm. The additive and total arithmetic complexities of the proposed approximation are 45.5% and 58.9% lower than those of the exact transform, respectively.
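The factorization can be read as straight-line code. The Python sketch below (helper names are hypothetical; integer inputs are assumed so that doubling can be written as a left shift) evaluates $\mathbf P\mathbf A_2\mathbf A_1\mathbf B_8$ stage by stage, checks it against direct multiplication by $\mathbf T_8^*$, and counts the cost of a naive row-by-row evaluation. For the printed $\mathbf T_8^*$ that count is 44 additions and 16 shifts; keeping the first row as produced by (3) at $\alpha = 2$ (all entries equal to 2, before that factor is absorbed) gives the 44 additions and 24 shifts quoted above, which is our reading of how that figure arises.

```python
import random

T_STAR_8 = [
    [1, 1, 1, 1, 1, 1, 1, 1], [-2, -1, -1, 0, 0, 1, 1, 2],
    [2, 0, -1, -1, -1, -1, 0, 2], [-2, 1, 2, 1, -1, -2, -1, 2],
    [1, -2, 0, 1, 1, 0, -2, 1], [-1, 2, -1, -1, 1, 1, -2, 1],
    [0, -1, 2, -1, -1, 2, -1, 0], [0, 0, -1, 2, -2, 1, 0, 0],
]


def fast_t8(x):
    """T*_8 x via P A_2 A_1 B_8: 24 additions and 6 shifts for integer input."""
    x0, x1, x2, x3, x4, x5, x6, x7 = x
    # B_8: butterflies (8 additions).
    a0, a1, a2, a3 = x0 + x7, x1 + x6, x2 + x5, x3 + x4
    b0, b1, b2, b3 = x3 - x4, x2 - x5, x1 - x6, x0 - x7
    # A_1: 10 outputs, 9 additions, 5 shifts.
    s03 = a0 + a3
    X6 = (a2 << 1) - a1 - a3
    X2 = (a0 << 1) - a2 - a3
    X7 = (b0 << 1) - b1
    c01 = b0 + b1
    c12 = b1 + b2
    c5 = (b2 << 1) - b3
    c3 = -(b3 << 1)  # the sign can be folded into the additions below
    # A_2: 7 additions, 1 shift.
    X0 = s03 + a1 + a2
    X4 = s03 - (a1 << 1)
    X3 = c01 + c12 + c3
    X1 = c3 - c12
    X5 = c5 - c01
    # P: output reordering (no arithmetic).
    return [X0, X1, X2, X3, X4, X5, X6, X7]


def direct(T, x):
    return [sum(t * v for t, v in zip(row, x)) for row in T]


def direct_cost(T):
    """(additions, shifts) of a naive evaluation: nnz - 1 adds per row, one shift per +-2."""
    adds = sum(sum(1 for v in row if v != 0) - 1 for row in T)
    shifts = sum(1 for row in T for v in row if abs(v) == 2)
    return adds, shifts


# First row as produced by (3) at alpha = 2, before its factor 2 is absorbed.
T_RAW = [[2] * 8] + T_STAR_8[1:]

random.seed(1)
ok = all(
    fast_t8(x) == direct(T_STAR_8, x)
    for x in ([random.randint(-255, 255) for _ in range(8)] for _ in range(1000))
)
print(ok, direct_cost(T_STAR_8), direct_cost(T_RAW))   # True (44, 16) (44, 24)
```

Each output of the fast version is built only from the butterfly sums $a_i$ or differences $b_i$, which is why the symmetric rows (outputs 0, 2, 4, 6) and the antisymmetric rows (outputs 1, 3, 5, 7) are computed in separate branches.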
Although the computational cost of the proposed approximation is slightly higher than that of the forward DTT approximation in [42], it is important to notice that the inverse transformation in [42] is relatively more complex. Considering the combination of forward and inverse transformations, the design in [42] requires 49 additions, whereas the proposed design requires 48 additions. Bit-shifting costs are virtually null, because in a hardware implementation they represent only wiring. For comparison, the popular Loeffler DCT algorithm [61] requires 11 floating-point multiplications and 29 additions. Although the actual approximation consists of the product of the low-complexity matrix and a diagonal matrix, as shown in (4), the multiplications introduced by the diagonal matrix represent no additional arithmetic complexity in image compression applications, because they can be absorbed into the quantization step [27, 56] of JPEG-like image compression [36–41]. Furthermore, the new approximation achieves better coding performance and possesses lower proximity measures (for instance, an MSE of 0.002 versus 0.015), as shown in Table 1. Therefore, both performance and arithmetic complexity measures are favorable to the proposed approximation.

Table 1: Performance assessment
N | Method           | C_g (dB) | η     | MSE   | ε    | d      | δ
4 | Exact DTT [26]   | 7.55     | 97.25 | –     | –    | –      | 0
4 | Proposed         | 7.55     | 97.33 | 0.001 | 0.13 | 0.29%  | 0
8 | Exact DTT [15]   | 8.68     | 92.86 | –     | –    | –      | 0
8 | DTT Approx. [42] | 5.51     | 83.51 | 0.015 | 3.32 | 12.61% | 0.09
8 | Proposed         | 9.25     | 92.71 | 0.002 | 0.77 | 3.03%  | 0.024

Figure 1: Signal flow graph for $\mathbf T_8^*$. Input data $x_n$, $n = 0, 1, \ldots, 7$, relate to output $X_m$, $m = 0, 1, \ldots, 7$. Dashed arrows and black nodes represent multiplications by $-1$ and $2$, respectively.

Table 2: Fast algorithm arithmetic complexity comparison
Method                   | Mult. | Add. | Shifts | Total
Exact DTT [15]           | 0     | 44   | 29     | 73
Forward DTT approx. [42] | 0     | 20   | 0      | 20
Inverse DTT approx. [42] | 0     | 29   | 8      | 37
Proposed                 | 0     | 24   | 6      | 30

5 Image and Video Compression Experiments
In this section, we perform two computational experiments. The first one consists of still image compression considering a JPEG procedure. The second simulation assesses the effectiveness of the proposed approximations under realistic video encoding conditions.

5.1 Image Compression
We adopted the image compression method according to the JPEG standard [27, 56]. A set of 45 8-bit images of size $512 \times 512$ was obtained from a public image bank [62] and submitted to processing. The selected images encompass a wide range of categories, including 13 textures, 12 satellite images, three human faces, and several other miscellaneous scenes. For each image, the luminance component was extracted and subdivided into $8 \times 8$ blocks $\mathbf I_{i,j}$, $i, j = 1, 2, \ldots, 64$. After preprocessing, each block was submitted to the following operation:
$$\mathbf M_{i,j} = \mathbf P\cdot \mathbf I_{i,j}\cdot \mathbf P^\top,$$
where $\mathbf M_{i,j}$ is the 2-D transform-domain data and $\mathbf P$ is a given 1-D transformation matrix, such as the exact or approximate DTT. Then, each sub-block $\mathbf M_{i,j}$ was element-wise divided by a quantization matrix, yielding the quantized JPEG coefficients $\mathbf J_{i,j}$, as follows:
$$\mathbf J_{i,j} = \operatorname{round}\left(\mathbf M_{i,j} \oslash \mathbf Q\right),$$
where $\mathbf Q = \lfloor (S\cdot \mathbf Q_0 + 50)/100 \rfloor$ is the quantization matrix, $\lfloor\cdot\rfloor$ denotes the floor function, the default quantization table is
$$\mathbf Q_0 = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61\\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55\\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56\\
14 & 17 & 22 & 29 & 51 & 84 & 80 & 62\\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77\\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92\\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101\\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix},$$
$S = 5000/\mathrm{QF}$ if $\mathrm{QF} < 50$ and $S = 200 - 2\cdot\mathrm{QF}$ otherwise, and QF is the quality factor [56]. If $\mathrm{QF} = 50$, then $\mathbf Q = \mathbf Q_0$.
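A minimal sketch of the per-block encode/decode path, in Python with the standard library only. Names are hypothetical; the clamp of $\mathbf Q$ to at least 1 is our own safeguard (common in JPEG codecs) and is not part of the text, and level shifting and entropy coding are omitted. Following Section 4.2, the decoder uses the transpose of $\hat{\mathbf T}_8^*$ as its approximate inverse.

```python
import math

Q0 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 84, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

T_STAR_8 = [
    [1, 1, 1, 1, 1, 1, 1, 1], [-2, -1, -1, 0, 0, 1, 1, 2],
    [2, 0, -1, -1, -1, -1, 0, 2], [-2, 1, 2, 1, -1, -2, -1, 2],
    [1, -2, 0, 1, 1, 0, -2, 1], [-1, 2, -1, -1, 1, 1, -2, 1],
    [0, -1, 2, -1, -1, 2, -1, 0], [0, 0, -1, 2, -2, 1, 0, 0],
]
# hat T*_8: rows of T*_8 scaled to unit norm, i.e. S_8 T*_8.
T_HAT_8 = [[v / math.sqrt(sum(w * w for w in row)) for v in row] for row in T_STAR_8]


def quant_matrix(qf):
    """Q = floor((S * Q0 + 50) / 100), S = 5000/QF if QF < 50 else 200 - 2 QF."""
    s = 5000 / qf if qf < 50 else 200 - 2 * qf
    # max(1, ...) is an added safeguard against zero steps, not part of the text.
    return [[max(1, math.floor((s * q + 50) / 100)) for q in row] for row in Q0]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def round_half_away(x):
    return math.copysign(math.floor(abs(x) + 0.5), x)


def code_block(block, T, Q):
    """Encoder: M = T I T^T, J = round(M ./ Q). Decoder: M' = J .* Q, I' = T^T M' T."""
    M = matmul(matmul(T, block), transpose(T))
    J = [[round_half_away(m / q) for m, q in zip(mr, qr)] for mr, qr in zip(M, Q)]
    M_hat = [[j * q for j, q in zip(jr, qr)] for jr, qr in zip(J, Q)]
    return matmul(matmul(transpose(T), M_hat), T)


flat = [[100] * 8 for _ in range(8)]
rec = code_block(flat, T_HAT_8, quant_matrix(90))
print(quant_matrix(90)[0][0], round(rec[0][0], 3))   # 3 100.125
```

For a constant block only the DC coefficient (800) survives; at $\mathrm{QF} = 90$ its step is 3, so it is dequantized to 801 and every reconstructed pixel equals $801/8 = 100.125$.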
Decreasing values of QF lead to higher compression ratios (approaching total destruction of the image as QF tends to 0), whereas increasing values lead to lower compression ratios (with the best possible quality at QF = 100). In our experiments, we adopted QF varying from 10 to 90 in steps of 5. In the JPEG decoder, each sub-block is initially arithmetic decoded and dequantized according to:
$$\hat{\mathbf M}_{i,j} = \mathbf J_{i,j} \odot \mathbf Q.$$
Then, the sub-blocks are inverse transformed:
$$\hat{\mathbf I}_{i,j} = \mathbf P^{-1}\cdot \hat{\mathbf M}_{i,j}\cdot \mathbf P^{-\top}.$$

Original and compressed images were compared for image degradation. The structural similarity index (SSIM) [63] and the spectral residual based similarity (SR-SIM) [64] were selected as image quality measures. The SSIM takes into account luminance, contrast, and image structure to quantify image degradation, and it is consistent with subjective quality measurements [65]. In turn, the SR-SIM is based on the hypothesis that the visual saliency maps of natural images are closely related to their perceived quality. This measure has been reported to outperform several state-of-the-art figures of merit in experiments with standardized datasets [64]. We did not consider the peak signal-to-noise ratio (PSNR) as a quality measure, because it is not a suitable metric to capture the human perception of image fidelity and quality [57]. For each value of QF, we considered average measure values instead of values from particular images. This approach is less prone to variance effects and fortuitous data [36, 66].

For direct comparison, we selected the exact DTT [15], the approximation proposed in [42], and the proposed approximation. As an extra reference, we also included the results from the standard JPEG, which is based on the exact DCT. Figure 2 displays the results. For both selected measures, the proposed approximation performed very closely to the exact DTT, especially at high compression ratios.
It outperformed the DTT approximation in [42] in terms of SSIM and SR-SIM for QF < 80 and QF < 55, respectively. This shows that the proposed approximation is more efficient in high and moderate compression scenarios, which are very common cases [67] and suitable for low-power devices. The approximation in [42] attains better performance at low compression ratios because it satisfies the perfect reconstruction property, albeit at the expense of an inverse transformation with higher arithmetic complexity. On the other hand, the proposed approximation exploits near-orthogonality, which can excel in moderate to high compression scenarios, which are often the more relevant ones [67].

For qualitative evaluation purposes, Figure 3 shows the compressed Lena image according to the exact DTT and the proposed approximation. We adopted high and moderate compression scenarios with QF = 15 and QF = 50, respectively. In both cases, the approximate transform produced results comparable to those of the exact DTT, with visually similar images.

5.2 Video Compression Simulation
To evaluate the performance of the proposed transform in video coding, we embedded the DTT approximation in the widely used x264 software library [68] for encoding video streams into the H.264/AVC standard [31]. The default 8-point transform employed in H.264 is the following integer DCT approximation [69]:
$$\hat{\mathbf C} = \frac{1}{8}\cdot\begin{bmatrix}
8 & 8 & 8 & 8 & 8 & 8 & 8 & 8\\
12 & 10 & 6 & 3 & -3 & -6 & -10 & -12\\
8 & 4 & -4 & -8 & -8 & -4 & 4 & 8\\
10 & -3 & -12 & -6 & 6 & 12 & 3 & -10\\
8 & -8 & -8 & 8 & 8 & -8 & -8 & 8\\
6 & -12 & 3 & 10 & -10 & -3 & 12 & -6\\
4 & -8 & 8 & -4 & -4 & 8 & -8 & 4\\
3 & -6 & 10 & -12 & 12 & -10 & 6 & -3
\end{bmatrix}.$$
The fast algorithm for the above transformation requires 32 additions and 14 bit-shifting operations [69]. Therefore, the proposed 8-point transform requires 25% fewer additions and 57% fewer bit-shifting operations than the fast algorithm for $\hat{\mathbf C}$.
Eleven 300-frame common intermediate format (CIF) videos obtained from an online test video database [70] were encoded with the standard software and then with the modified software. We employed the default software settings and conducted the simulation under two scenarios: (i) target bitrate varying from 50 to 500 kbps in steps of 50 kbps and (ii) quantization parameter (QP) varying from 5 to 50 in steps of 5. Psychovisual optimization was disabled in order to obtain valid SSIM values. Besides PSNR evaluation, the discussed software library [68] natively offers SSIM measurements for video quality assessment. The average SSIM of the luma component was computed over all reconstructed frames. The results are shown in Figure 4 in terms of the absolute percentage error (APE) [38] of the SSIM with respect to the standard DCT-based transformation in the original H.264/AVC codec. This measure is given by:
$$\operatorname{APE}(\mathrm{SSIM}) = \frac{\left|\mathrm{SSIM}_{\mathrm{H.264}} - \mathrm{SSIM}_{P}\right|}{\mathrm{SSIM}_{\mathrm{H.264}}},$$
where $\mathrm{SSIM}_{\mathrm{H.264}}$ denotes the SSIM obtained with the standard H.264 transform and $\mathrm{SSIM}_P$ represents the SSIM when the exact DTT, the approximation in [42], or the proposed approximation is considered. SSIM curves for the DCT are absent, because it was employed as the performance reference. The use of the proposed transform causes only a minor degradation in video quality. It also performed better than the previous approximation in all cases.

Figure 2: Average SSIM (top) and SR-SIM (bottom) measurements for image compression for the considered transforms at several values of QF. Panels: (a) average Y-SSIM versus quality factor; (b) average Y-SR-SIM versus quality factor; curves for JPEG, exact DTT [15], approx. [42], and proposed.
[Figure 3: Compressed 'Lena' image for QF = 15 and QF = 50. Panels: (a) exact DTT [15], QF = 15; (b) exact DTT [15], QF = 50; (c) proposed, QF = 15; (d) proposed, QF = 50.]

[Figure 4: Video quality assessment in terms of target bitrate and QP. Panels: (a) average Y-SSIM APE versus target bitrate (kbps); (b) average Y-SSIM APE versus QP. Curves: exact DTT [15], approximation in [42], and proposed transform.]

Figure 5 displays the first encoded frame of two standard video sequences at low bitrate (200 kbps). The compressed frames resulting from the original and modified codecs are visually indistinguishable.

6 Hardware Section

To compare the hardware resource consumption of the proposed approximate DTT against the exact DTT fast algorithm proposed in [15], the algorithm proposed in [42], and the Loeffler DCT [61], the 2-D versions of all algorithms were first modeled and tested in Matlab Simulink and then physically realized on a Xilinx Virtex-6 XC6VSX475T-1FF1759 FPGA hosted on a Reconfigurable Open Architecture Computing Hardware-2 (ROACH2) board [71]. The ROACH2 board consists of a Xilinx Virtex-6 FPGA, 16 complex analog-to-digital converters (ADC), multi-gigabit transceivers, and a 72-bit DDR3 RAM. The 1-D versions were modeled first, and each 2-D version was generated from two 1-D designs combined with a transpose buffer. The designs were verified using more than 10000 test vectors, with complete agreement with the theoretical values. Results are shown in Table 3. The reported metrics include the configurable logic block (CLB) and flip-flop (FF) counts, critical path delay (T_cpd, in ns), and maximum operating frequency (F_max, in MHz). Compared with the exact DTT fast algorithm proposed in [15], the number of CLBs and FFs was reduced by 43.2% and 25.0%, respectively.
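The row-column structure described above (two 1-D stages separated by a transpose buffer) can be modeled in software and checked against the direct matrix form Y = T X Tᵀ, in the spirit of the test-vector verification mentioned in the text. The sketch below is a behavioral model only, not the Simulink design. The 8×8 integer matrix `T` is a random placeholder rather than the proposed DTT approximation, whose entries are not reproduced in this section. The sketch also recomputes the CLB and FF reductions from the Table 3 counts.

```python
import numpy as np


def stage_1d(block, t):
    """One 1-D transform stage: apply the 8-point transform t to every row of block."""
    return block @ t.T


def transform_2d_row_column(block, t):
    """2-D transform built from two 1-D stages and a transpose buffer."""
    first = stage_1d(block, t)      # 1-D stage on the rows
    buffered = first.T              # transpose buffer
    second = stage_1d(buffered, t)  # 1-D stage on the former columns
    return second.T                 # equals t @ block @ t.T


rng = np.random.default_rng(2016)
T = rng.integers(-2, 3, size=(8, 8))  # placeholder integer kernel, NOT the proposed DTT

# Compare the row-column model with the direct matrix product on random 8x8 test blocks.
mismatches = 0
for _ in range(10000):
    x = rng.integers(-128, 128, size=(8, 8))
    if not np.array_equal(transform_2d_row_column(x, T), T @ x @ T.T):
        mismatches += 1


def pct_reduction(reference, proposed):
    """Relative decrease of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


clb_reduction = pct_reduction(2941, 1671)  # Table 3: exact DTT [15] vs. proposed, CLBs
ff_reduction = pct_reduction(7271, 5455)   # Table 3: exact DTT [15] vs. proposed, FFs

print(mismatches, f"{clb_reduction:.1f}%", f"{ff_reduction:.1f}%")
```

Because all arithmetic is on integers, the model agrees exactly with the reference (zero mismatches), and the reductions evaluate to 43.2% for CLBs and 25.0% for FFs.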
It is important to emphasize that the approximation in [42] is asymmetric: the forward and inverse transforms possess different structures, with the inverse operation being more complex (cf. Table 2). For comparison, we adopt the average of the measurements for the forward and inverse realizations. The proposed approximation provides a higher maximum operating frequency, with improvements of 85.9%, 43.5%, and 9.7% when compared to the Loeffler DCT [61], the exact DTT [15], and the design in [42], respectively.

The ASIC realization was obtained by porting the hardware description language code to 0.18 µm CMOS technology, followed by synthesis and place-and-route in Cadence Encounter Digital Implementation (EDI) with AMS libraries. Best-case libraries were employed to obtain the place-and-route results, with a gate voltage of 1.8 V. The adopted figures of merit for the ASIC synthesis were: area (A) in mm², area-time complexity (AT) in mm²·ns, area-time-squared complexity (AT²) in mm²·ns², dynamic power consumption (D_p) in mW/MHz, critical path delay (T_cpd) in ns, and maximum operating frequency (F_max) in MHz. Results are displayed in Table 4. Compared with the exact DTT, AT and AT² were reduced by 57.7% and 57.4%, respectively. Moreover, the proposed design attains reductions of 17.3%, 20.6%, and 82.1% in AT², T_cpd, and dynamic power consumption, respectively, when compared to the average of the forward and inverse realizations in [42].

7 Discussion and Conclusion

In this work, a low-complexity near-orthogonal 8-point DTT approximation suitable for image and video coding was proposed. A fast algorithm for the proposed DTT approximation, requiring only 24 additions and six bit-shifting operations, was also introduced. This fast algorithm can be used for both the forward and the near-inverse transformations.
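The percentages quoted above can be reproduced from Tables 3 and 4. The sketch below follows the stated convention of averaging the forward and inverse realizations of [42]; the dictionary layout and helper names are our own. The same calculation shows that the 17.3%, 20.6%, and 82.1% figures correspond to AT², T_cpd, and D_p, while the averaged area of [42] (0.280 mm²) is smaller than that of the proposed design (0.366 mm²).

```python
def pct_gain(reference, proposed):
    """Relative increase of `proposed` over `reference`, in percent."""
    return 100.0 * (proposed - reference) / reference


def pct_reduction(reference, proposed):
    """Relative decrease of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


def avg(a, b):
    return (a + b) / 2.0


# Table 3 (FPGA): maximum operating frequency in MHz.
fmax_fpga = {"loeffler": 100.44, "exact": 130.07, "a42_fwd": 178.69,
             "a42_inv": 161.71, "proposed": 186.70}
fmax_a42 = avg(fmax_fpga["a42_fwd"], fmax_fpga["a42_inv"])  # averaged [42] design

fmax_gains = {
    "vs Loeffler DCT [61]": pct_gain(fmax_fpga["loeffler"], fmax_fpga["proposed"]),
    "vs exact DTT [15]": pct_gain(fmax_fpga["exact"], fmax_fpga["proposed"]),
    "vs [42] (averaged)": pct_gain(fmax_a42, fmax_fpga["proposed"]),
}

# Table 4 (ASIC): (forward [42], inverse [42], proposed) for each figure of merit.
asic = {
    "area": (0.237, 0.323, 0.366),
    "AT2": (7.52, 9.89, 7.20),
    "Tcpd": (5.635, 5.536, 4.434),
    "Dp": (0.171, 0.724, 0.080),
}
asic_vs_a42 = {k: pct_reduction(avg(f, i), p) for k, (f, i, p) in asic.items()}
at2_vs_exact = pct_reduction(16.92, 7.20)  # Table 4: exact DTT [15] vs. proposed

for name, value in fmax_gains.items():
    print(f"F_max gain {name}: {value:.1f}%")
for name, value in asic_vs_a42.items():
    # A negative value means the proposed design is larger than the averaged [42] design.
    print(f"{name} reduction vs [42]: {value:.1f}%")
print(f"AT2 reduction vs exact DTT: {at2_vs_exact:.1f}%")
```

Rounded to one decimal, the F_max gains are 85.9%, 43.5%, and 9.7%. The reductions with respect to the averaged [42] design are 17.3% (AT²), 20.6% (T_cpd), and 82.1% (D_p), with a negative value for area. The AT² reduction with respect to the exact DTT is 57.4%.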
The additive arithmetic cost of the proposed approximation is 45.5% and 2.05% lower than that of the exact DTT fast algorithm and the DTT approximation in [42], respectively. Moreover, the proposed transform exhibited coding performance similar to that of the exact DTT and outperformed the previous approximation [42] in computational experiments with popular image compression standards. In terms of video coding, the results from the proposed tool were virtually indistinguishable from those furnished by the approximation in [42]. Thus, the new tool outperforms the competing methods in both computational cost and coding performance. The proposed method was embedded into the JPEG standard and into the standard software library for H.264/AVC video coding. The results showed negligible degradation compared to the standard DCT-based compression methods in both cases. The 2-D versions were realized in FPGA using the ROACH2 hardware platform, and ASIC place and route was performed using Cadence Encounter with AMS standard cells. The results show a 43.1% reduction in the number of CLBs for the FPGA realization and a 57.7% reduction in the area-time figure for the ASIC place-and-route realization when compared with the exact DTT. The proposed design excels in providing high operating frequency and very low power consumption. Therefore, the proposed approximation offers low computational complexity while maintaining good coding performance. Systems that operate under tight processing constraints and require video streaming can benefit from the proposed low-complexity codecs and low-power hardware.
In particular, applications in the following contexts have such low-complexity requirements [72]: environmental monitoring, habitat monitoring, surveillance, structural monitoring, equipment diagnostics, disaster management, and emergency response [73].

[Figure 5: First frame of the compressed 'Foreman' sequence, with a target bitrate of 200 kbps. Panels: (a) H.264; (b) approximation in [42]; (c) proposed.]

Table 3: Hardware resource consumption using the Xilinx Virtex-6 XC6VSX475T-1FF1759 device

| Method | CLB | FF | T_cpd (ns) | F_max (MHz) |
|---|---|---|---|---|
| Exact DTT [15] | 2941 | 7271 | 7.688 | 130.07 |
| Approximation in [42] | 1515 | 6058 | 5.596 | 178.69 |
| Inverse approximation in [42] | 1713 | 4834 | 6.184 | 161.71 |
| Loeffler DCT [61] | 3250 | 4413 | 9.956 | 100.44 |
| Proposed DTT | 1671 | 5455 | 5.356 | 186.70 |

Table 4: Hardware resource consumption for CMOS 0.18 µm ASIC place and route

| Method | Area (mm²) | AT (mm²·ns) | AT² (mm²·ns²) | T_cpd (ns) | F_max (MHz) | D_p (mW/MHz) |
|---|---|---|---|---|---|---|
| Exact DTT [15] | 0.872 | 3.84 | 16.92 | 4.405 | 227.01 | 0.182 |
| Approximation in [42] | 0.237 | 1.34 | 7.52 | 5.635 | 177.46 | 0.171 |
| Inverse approximation in [42] | 0.323 | 1.79 | 9.89 | 5.536 | 180.64 | 0.724 |
| Loeffler DCT [61] | 0.684 | 4.20 | 25.85 | 6.148 | 162.65 | 1.961 |
| Proposed | 0.366 | 1.62 | 7.20 | 4.434 | 225.53 | 0.080 |

Acknowledgment

Arjuna Madanayake thanks the Xilinx University Program (XUP) for the Xilinx Virtex-6 Sx475 FPGA device installed on the ROACH2 board.

References

[1] A. F. Nikiforov, S. K. Suslov, and V. B. Uvarov, Classical Orthogonal Polynomials of a Discrete Variable, ser. Springer Series in Computational Physics. Springer Berlin Heidelberg, 1991.
[2] P. D. Dragnev and E. B. Saff, "Constrained energy problems with applications to orthogonal polynomials of a discrete variable," Journal d'Analyse Mathematique, vol. 72, pp. 223–259, 1997. [Online]. Available: http://dx.doi.org/10.1007/BF02843160
[3] M. Câmara, J. Fábrega, M. A. Fiol, and E. Garriga, "Some families of orthogonal polynomials of a discrete variable and their applications to graphs and codes," The Electronic Journal of Combinatorics, vol. 16, pp. 1–30, 2009.
[4] H. Zhu, M. Liu, H. Shu, H. Zhang, and L. Luo, "General form for obtaining discrete orthogonal moments," IET Image Processing, vol. 4, pp. 335–352, Oct 2010.
[5] A. Goshtasby, "Template matching in rotated images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 7, no. 3, pp. 338–344, May 1985.
[6] M. I. Heywood and P. D. Noakes, "Fractional central moment method for movement-invariant object classification," IEE Proceedings–Vision, Image and Signal Processing, vol. 142, no. 4, pp. 213–219, Aug 1995.
[7] V. Markandey and R. I. P. de Figueiredo, "Robot sensing techniques based on high-dimensional moment invariants and tensors," IEEE Transactions on Robotics and Automation, vol. 8, no. 2, pp. 186–195, Apr 1992.
[8] R. Mukundan, S. H. Ong, and R. A. Lee, "Image analysis by Tchebichef moments," IEEE Transactions on Image Processing, vol. 10, pp. 1357–1364, 2001.
[9] L. Leida, Z. Hancheng, Y. Gaobo, and Q. Jiansheng, "Referenceless measure of blocking artifacts by Tchebichef kernel analysis," IEEE Signal Processing Letters, vol. 21, no. 1, pp. 122–125, Jan 2014.
[10] J.-L. Rose, C. Revol-Muller, D. Charpigny, and C. Odet, "Shape prior criterion based on Tchebichef moments in variational region growing," in 2009 16th IEEE International Conference on Image Processing (ICIP), Nov 2009, pp. 1081–1084.
[11] H. Zhang, X. Dai, P. Sun, H. Zhu, and H. Shu, "Symmetric image recognition by Tchebichef moment invariants," in 2010 17th IEEE International Conference on Image Processing (ICIP), Sept 2010, pp. 2273–2276.
[12] Q. Li, H. Zhu, and Q. Liu, "Image recognition by combined affine and blur Tchebichef moment invariants," in 2011 4th International Congress on Image and Signal Processing (CISP), vol. 3, Oct 2011, pp. 1517–1521.
[13] H. Huang, G. Coatrieux, H. Shu, L. Luo, and C. Roux, "Blind integrity verification of medical images," IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 6, pp. 1122–1126, Nov 2012.
[14] S. Ishwar, P. K. Meher, and M. N. S. Swamy, "Discrete Tchebichef transform–a fast 4 × 4 algorithm and its application in image/video compression," in 2008 IEEE International Symposium on Circuits and Systems (ISCAS), 2008, pp. 260–263.
[15] S. Prattipati, S. Ishwar, P. K. Meher, and M. N. S. Swamy, "A fast 8 × 8 integer Tchebichef transform and comparison with integer cosine transform for image compression," in 2013 IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS), 2013, pp. 1294–1297.
[16] N. A. Abu, S. L. Wong, N. Herman, and R. Mukundan, "An efficient compact Tchebichef moment for image compression," in 2010 10th International Conference on Information Sciences Signal Processing and their Applications (ISSPA), May 2010, pp. 448–451.
[17] Q. Li and H. Zhu, "Block-based compressed sensing of image using directional Tchebichef transforms," in 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct 2012, pp. 2207–2212.
[18] R. K. Senapati, U. C. Pati, and K. K. Mahapatra, "Reduced memory, low complexity embedded image compression algorithm using hierarchical listless discrete Tchebichef transform," IET Image Processing, vol. 8, no. 4, pp. 213–238, Apr 2014.
[19] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90–93, Jan. 1974.
[20] F. Ernawan, N. A. Abu, and N. Suryana, "TMT quantization table generation based on psychovisual threshold for image compression," in 2013 International Conference of Information and Communication Technology (ICoICT), Mar 2013, pp. 202–207.
[21] R. K. Senapati, U. C. Pati, and K. K. Mahapatra, "A low complexity embedded image coding algorithm using hierarchical listless DTT," in 2011 8th International Conference on Information, Communications and Signal Processing (ICICS), Dec 2011, pp. 1–5.
[22] F. Ernawan, E. Noersasongko, and N. A. Abu, "An efficient 2 × 2 Tchebichef moments for mobile image compression," in 2011 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), Dec 2011, pp. 1–5.
[23] L. W. Chew, L.-M. Ang, and K. P. Seng, "Survey of image compression algorithms in wireless sensor networks," in 2008 International Symposium on Information Technology (ITSim), vol. 4, Aug 2008, pp. 1–9.
[24] M. Guo, M. H. Ammar, and E. W. Zegura, "V3: a vehicle-to-vehicle live video streaming architecture," in 2005 3rd IEEE International Conference on Pervasive Computing and Communication (PerCom), Mar 2005, pp. 171–180.
[25] D. H. Friedman, "Streaming implementation of video algorithms on a low-power parallel architecture," in 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec 2013, pp. 650–653.
[26] K. Nakagaki and R. Mukundan, "A fast 4 × 4 forward discrete Tchebichef transform algorithm," IEEE Signal Processing Letters, vol. 14, pp. 684–687, 2007.
[27] G. K. Wallace, "The JPEG still picture compression standard," IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii–xxxiv, Feb 1992.
[28] International Organisation for Standardisation, "Generic coding of moving pictures and associated audio information – part 2: Video, ISO/IEC JTC1/SC29/WG11 – coding of moving pictures and audio," 1994.
[29] International Telecommunication Union, "ITU-T recommendation H.261 version 1: Video codec for audio visual services at p × 64 kbits," Technical Report, ITU-T, 1990.
[30] ——, "ITU-T recommendation H.263 version 1: Video coding for low bit rate communication," Technical Report, ITU-T, 1995.
[31] I. Richardson, The H.264 Advanced Video Compression Standard, 2nd ed. John Wiley and Sons, 2010.
[32] G. J. Sullivan, J. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, pp. 1649–1668, 2012.
[33] F. Bossen, B. Bross, K. Suhring, and D. Flynn, "HEVC complexity and implementation analysis," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1685–1696, Dec 2012.
[34] Google Inc., "VP9," The WebM Project, http://www.webmproject.org/vp9/, 2015.
[35] T. I. Haweel, "A new square wave transform based on the DCT," Signal Processing, vol. 81, pp. 2309–2319, 2001. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0165168401001062
[36] R. J. Cintra and F. M. Bayer, "A DCT approximation for image compression," IEEE Signal Processing Letters, vol. 18, no. 10, pp. 579–582, Oct 2011.
[37] F. M. Bayer and R. J. Cintra, "DCT-like transform for image compression requires 14 additions only," Electronics Letters, vol. 48, no. 15, pp. 919–921, July 2012.
[38] R. J. Cintra, F. M. Bayer, and C. J. Tablada, "Low-complexity 8-point DCT approximations based on integer functions," Signal Processing, vol. 99, pp. 201–214, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0165168413005161
[39] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, "Low-complexity 8 × 8 transform for image compression," Electronics Letters, vol. 44, no. 21, pp. 1249–1250, Oct 2008.
[40] ——, "A low-complexity parametric transform for image compression," in 2011 IEEE International Symposium on Circuits and Systems (ISCAS), May 2011, pp. 2145–2148.
[41] ——, "Binary discrete cosine and Hartley transforms," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, no. 4, pp. 989–1002, Apr 2013.
[42] P. A. M. Oliveira, R. J. Cintra, F. M. Bayer, S. Kulasekera, and A. Madanayake, "A discrete Tchebichef transform approximation for image and video coding," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1137–1141, Aug 2015.
[43] H. Bateman and A. Erdélyi, Higher transcendental functions. McGraw-Hill, 1953, vol. 2. [Online]. Available: http://books.google.com.br/books?id=plQAAAAMAAJ
[44] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-complexity transform and quantization in H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598–603, Jul 2003.
[45] R. Blahut, Fast Algorithms for Signal Processing. Cambridge University Press, 2010.
[46] MATLAB, "version 8.1 (R2013a) documentation," Natick, MA, 2013.
[47] J. W. Eaton, D. Bateman, S. Hauberg, and R. Wehbring, GNU Octave version 3.8.0 Documentation, 3rd ed. Free Software Foundation, Inc., Feb 2011.
[48] Python, "version 2.7.6 documentation," Delaware, US, 2015.
[49] G. A. F. Seber, A Matrix Handbook for Statisticians, ser. Wiley Series in Probability and Mathematical Statistics. Hoboken, NJ: John Wiley and Sons, Inc., 2008.
[50] B. N. Flury and W. Gautschi, "An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form," SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 1, pp. 169–184, Jan. 1986. [Online]. Available: http://dx.doi.org/10.1137/0907013
[51] V. K. Goyal, "Theoretical foundations of transform coding," IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 9–21, Sept 2001.
[52] J. Katto and Y. Yasuda, "Performance evaluation of subband coding and optimization of its filter coefficients," Journal of Visual Communication and Image Representation, vol. 2, pp. 303–313, 1991. [Online]. Available: http://www.sciencedirect.com/science/article/pii/1047320391900114
[53] V. Britanak, P. C. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms. Academic Press, 2007. [Online]. Available: http://books.google.com.br/books?id=iRlQHcK-rkC
[54] C. J. Tablada, F. M. Bayer, and R. J. Cintra, "A class of DCT approximations based on the Feig–Winograd algorithm," Signal Processing, vol. 113, pp. 38–51, 2015.
[55] R. J. Cintra, H. M. Oliveira, and C. O. Cintra, "The rounded Hartley transform," in Proceedings of the IEEE International Telecommunications Symposium–ITS'2002, Sept 2002, pp. 1357–1364.
[56] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, ser. Chapman & Hall digital multimedia standards series. Springer, 1993.
[57] Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, Jan 2009.
[58] C.-K. Fong and W.-K. Cham, "LLM integer cosine transform and its fast algorithm," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 844–854, Jun 2012.
[59] F. M. Bayer, R. J. Cintra, A. Madanayake, and U. S. Potluri, "Multiplierless approximate 4-point DCT VLSI architectures for transform block coding," Electronics Letters, vol. 49, no. 24, pp. 1532–1534, Nov 2013.
[60] A. V. Oppenheim and R. W. Schafer, Discrete-time signal processing, 3rd ed., ser. Prentice-Hall signal processing series. Prentice Hall, 2010.
[61] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, "Practical fast 1-D DCT algorithms with 11 multiplications," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, May 1989, pp. 988–991.
[62] University of Southern California, Signal and Image Processing Institute, "The USC-SIPI image database," http://sipi.usc.edu/database/, 2015.
[63] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr 2004.
[64] L. Zhang and H. Li, "SR-SIM: A fast and high performance IQA index based on spectral residual," in 2012 19th IEEE International Conference on Image Processing (ICIP), Sep 2012, pp. 1473–1476.
[65] Z. Wang and A. C. Bovik, "Reduced- and no-reference image quality assessment," IEEE Signal Processing Magazine, vol. 28, no. 6, pp. 29–40, Nov 2011.
[66] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, ser. Prentice Hall Signal Processing Series. Upper Saddle River, NJ: Prentice-Hall, 1993, vol. 1.
[67] R. Pandit, N. Khosla, G. Singh, and H. Sharma, "Image compression and quality factor in case of JPEG image format," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, pp. 2578–2581, Jul 2013.
[68] x264 team, "x264," http://www.videolan.org/developers/x264.html, 2015.
[69] S. Gordon, D. Marpe, and T. Wiegand, "Simplified use of 8 × 8 transform–updated proposal and results," Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, doc. JVT-K028, Munich, Germany, Mar 2004.
[70] "Xiph.org Video Test Media," https://media.xiph.org/video/derf/, 2015.
[71] (2015) ROACH2. https://casper.berkeley.edu.
[72] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury, "A survey on wireless multimedia sensor networks," Computer Networks, vol. 51, pp. 921–960, 2007.
[73] N. Kimura and S. Latifi, "A survey on data compression in wireless sensor networks," in 2005 International Conference on Information Technology: Coding and Computing (ITCC), vol. 2, Apr 2005, pp. 8–13.