Low-complexity Image and Video Coding Based on an Approximate Discrete Tchebichef Transform

Paulo A. M. Oliveira*, Renato J. Cintra†, Fábio M. Bayer‡, Sunera Kulasekera§, Arjuna Madanayake§, Vítor A. Coutinho¶

* Paulo A. M. Oliveira was with the Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco, Recife, PE, Brazil; Multimedia Communications and Signal Processing, University of Erlangen–Nuremberg, Erlangen, BY, Germany.
† Renato J. Cintra is with the Signal Processing Group, Universidade Federal de Pernambuco, Caruaru, PE, Brazil. He was with Équipe Cairn, INRIA-IRISA, Université de Rennes, Rennes, France; and LIRIS, Institut National des Sciences Appliquées, Lyon, France (e-mail: rjdsc@de.ufpe.br).
‡ Fábio M. Bayer is with the Departamento de Estatística and LACESM, Universidade Federal de Santa Maria, Santa Maria, RS, Brazil (e-mail: bayer@ufsm.br).
§ Sunera Kulasekera and Arjuna Madanayake were with the Department of Electrical and Computer Engineering at the University of Akron, OH.
¶ Vítor A. Coutinho was with the Signal Processing Group, Departamento de Estatística, UFPE, Recife, Brazil.

Abstract

The usage of linear transformations has great relevance for data decorrelation applications, like image and video compression. In that sense, the discrete Tchebichef transform (DTT) possesses useful coding and decorrelation properties. The DTT transform kernel does not depend on the input data, and fast algorithms can be developed for real-time applications. However, the DTT fast algorithms presented in the literature possess high computational complexity. In this work, we introduce a new low-complexity approximation for the DTT. The fast algorithm of the proposed transform is multiplication-free and requires a reduced number of additions and bit-shifting operations. Image and video compression simulations in popular standards show good performance of the proposed transform. Regarding hardware resource consumption, the FPGA realization shows a 43.1% reduction of configurable logic blocks and the ASIC place-and-route realization shows a 57.7% reduction in the area-time figure when compared with the 2-D version of the exact DTT.

Keywords: Approximate transforms, discrete Tchebichef transform, fast algorithms, image and video coding

1 Introduction

Discrete variable orthogonal polynomials emerge as solutions of several hypergeometric difference equations [1]. Classic applications of this class of orthogonal polynomials include functional analysis [2] and graphs [3]. Additionally, such polynomials are employed in the computation of moment functions [4], which are largely used in image processing [5–7]. For instance, the discrete Tchebichef moments [8], which are derived from the discrete Tchebichef polynomials, form a set of orthogonal moment functions. Such functions are not discrete approximations based on continuous functions; they are naturally orthogonal over the discrete domain.

The Tchebichef moments have been used for quantifying image block artifacts [9], image recognition [10–12], blind integrity verification [13], and image compression [14–18]. In the data compression context, bi-dimensional (2-D) moments are computed by means of the 2-D discrete Tchebichef transform (DTT). In fact, the 8-point DTT can achieve better performance when compared with the discrete cosine transform (DCT) [19], in terms of average bit length, as reported in [16, 20, 21]. Moreover, the 8-point DTT-based embedded encoder proposed in [18] shows improved image quality and reduced encoding/decoding time in comparison with state-of-the-art DCT-based embedded coders. The 8-point DTT has also been employed in blind forensics, as a tool to determine the integrity of medical imagery subject to filtering and compression [13].

However, the exact DTT possesses high arithmetic complexity, due to its significant amount of additions and floating-point multiplications. Such multiplications are known to be more demanding computational structures than additions or fixed-point multiplications, both in software and hardware. Thus, the higher computational complexity of the DTT precludes its application in low power consumption systems [22, 23] and/or real-time processing, such as video streaming [24, 25]. Therefore, fast algorithms for the DTT could improve its computational efficiency. A comprehensive literature search reveals only two fast algorithms for the 4-point DTT [14, 26] and one for the 8-point DTT [15]. Although these fast algorithms possess lower arithmetic complexities when compared with the direct DTT calculation, they still possess high arithmetic complexity, requiring a significant amount of additions and bit-shifting operations.

In a comparable scenario, the computation of DCT-based transforms, which have been employed in several popular coding schemes such as JPEG [27], MPEG-2 [28], H.261 [29], H.263 [30], H.264 [31], HEVC [32, 33], and VP9 [34], has profited from matrix approximation theory [35–41]. In this context, discrete transforms are not exactly calculated; instead, an approximate, low-cost computation is performed. The approximations are designed in such a way as to allow similar spectral and coding characteristics as well as lower arithmetic complexity. Usually, approximations are multiplierless, requiring only addition and bit-shifting operations for their computation. In [42], a multiplierless approximation for the 8-point DTT is proposed. To the best of our knowledge, this is the only DTT approximation archived in the literature.
The aim of this work is to introduce an efficient low-complexity approximation for the 8-point DTT capable of outperforming [42]. To derive a multiplierless approximate DTT matrix, a multicriteria optimization problem is posed, combining different coding metrics: coding gain and transform efficiency. Additionally, a fast algorithm for efficient computation of the sought approximation is also pursued. For coding performance evaluation, we propose two computational experiments: (i) a JPEG image compression simulation and (ii) a video coding experiment which consists of embedding the sought approximation into the H.264/AVC standard.

The paper unfolds as follows. Section 2 reviews the mathematical background of the DTT. Section 3 introduces a parametrization of the DTT to derive a family of DTT approximations and sets up an optimization problem to identify optimal approximations. In Section 4, we assess the obtained approximation in terms of coding performance, proximity with the exact transform, and computation cost. Moreover, a fast algorithm for the proposed approximate DTT is introduced. Section 5 shows the results of the image and video compression simulations. Section 6 shows a hardware resource consumption comparison with the exact DTT for both FPGA and ASIC realizations. A discussion and final remarks are given in Section 7.

2 Discrete Tchebichef Transform

2.1 Discrete Tchebichef Polynomials

The discrete Tchebichef polynomials are a set of discrete variable orthogonal polynomials [43]. The $k$th order discrete Tchebichef polynomials are given by the following closed-form expression [14]:

\[ t_k[n] = (1-N)_k \cdot {}_3F_2(-k, -n, 1+k;\, 1, 1-N;\, 1), \]

where $n = 0, 1, \ldots, N-1$,

\[ {}_3F_2(a_1, a_2, a_3;\, b_1, b_2;\, z) = \sum_{k=0}^{\infty} \frac{(a_1)_k (a_2)_k (a_3)_k}{(b_1)_k (b_2)_k} \frac{z^k}{k!} \]

is the generalized hypergeometric function, and $(a)_k = a(a+1)\cdots(a+k-1)$ is the ascending factorial (Pochhammer symbol). Tchebichef polynomials can be obtained according to the following recursion [14]:

\[ t_k[n] = \frac{2k-1}{k}\, t_1[n]\, t_{k-1}[n] - \frac{k-1}{k}\left(N^2 - (k-1)^2\right) t_{k-2}[n], \]

for $t_0[n] = 1$ and $t_1[n] = 2n - N + 1$. Indeed, the set $\{t_k[n]\}$, $k = 0, 1, \ldots, N-1$, is an orthogonal basis with respect to the unit weight. Consequently, the discrete Tchebichef polynomials satisfy the following mathematical relation:

\[ \sum_{n=0}^{N-1} t_i[n]\, t_j[n] = \rho(j, N) \cdot \delta_{i,j}, \]

where $\rho(k, N) = \frac{(N+k)!}{(2k+1)\cdot(N-k-1)!}$ and $\delta_{i,j}$ is the Kronecker delta function, which yields $\delta_{i,j} = 1$ if $i = j$, and $\delta_{i,j} = 0$ otherwise.

2.2 2-D Discrete Tchebichef Transform

Let $f[m, n]$, $m, n = 0, 1, \ldots, N-1$, be an intensity distribution from a discrete image of size $N \times N$ pixels. The 2-D DTT of $f[m, n]$, denoted by $M[p, q]$, $p, q = 0, 1, \ldots, N-1$, is given by [8, 14]:

\[ M[p, q] = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \tilde{t}_p[m] \cdot \tilde{t}_q[n] \cdot f[m, n], \qquad (1) \]

where $\tilde{t}_k[n]$, $k = 0, 1, \ldots, N-1$, are the orthonormalized discrete Tchebichef polynomials given by $\tilde{t}_k[n] = t_k[n] / \sqrt{\rho(k, N)}$. Note that the transform kernel described in (1) is separable. Hence, the following relation holds true:

\[ M[p, q] = \sum_{m=0}^{N-1} \tilde{t}_p[m] \sum_{n=0}^{N-1} \tilde{t}_q[n] \cdot f[m, n], \]

for $p, q = 0, 1, \ldots, N-1$. Therefore, the transform-domain coefficients of $f$ can be calculated by the following matrix operation:

\[ M = T \cdot f \cdot T^\top, \qquad (2) \]

where $T$ is the $N$-point unidimensional DTT matrix given by

\[ T = \begin{bmatrix}
\tilde{t}_0[0] & \tilde{t}_0[1] & \cdots & \tilde{t}_0[N-1] \\
\tilde{t}_1[0] & \tilde{t}_1[1] & \cdots & \tilde{t}_1[N-1] \\
\vdots & \vdots & \ddots & \vdots \\
\tilde{t}_{N-1}[0] & \tilde{t}_{N-1}[1] & \cdots & \tilde{t}_{N-1}[N-1]
\end{bmatrix}. \]

The matrix operation in (2) represents the 2-D DTT.
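As an illustration of Sections 2.1 and 2.2, the following Python sketch builds the Tchebichef polynomials from the three-term recursion, normalizes them into the DTT matrix, and applies (2) to a small block. It is a minimal example under stated assumptions, not a reference implementation: the function names (`tchebichef_polys`, `dtt_matrix`, `dtt2`, `idtt2`) are hypothetical, plain nested lists stand in for matrices, and each row is normalized by its own computed energy, which should agree with $\rho(k, N)$.

```python
from math import factorial, sqrt


def tchebichef_polys(N):
    """Unnormalized t_k[n], k = 0..N-1, from the three-term recursion (N >= 2)."""
    t = [[1.0] * N, [float(2 * n - N + 1) for n in range(N)]]
    for k in range(2, N):
        a = (2 * k - 1) / k
        b = (k - 1) / k * (N * N - (k - 1) ** 2)
        t.append([a * t[1][n] * t[k - 1][n] - b * t[k - 2][n]
                  for n in range(N)])
    return t


def rho(k, N):
    """Squared norm of t_k as given in Section 2.1."""
    return factorial(N + k) / ((2 * k + 1) * factorial(N - k - 1))


def dtt_matrix(N):
    """Orthonormal N-point DTT matrix: each row is t_k divided by its norm."""
    rows = []
    for row in tchebichef_polys(N):
        energy = sum(v * v for v in row)  # expected to match rho(k, N)
        rows.append([v / sqrt(energy) for v in row])
    return rows


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dtt2(f, T):
    """Forward 2-D DTT, M = T f T^T, as in (2)."""
    return matmul(matmul(T, f), transpose(T))


def idtt2(M, T):
    """Inverse 2-D DTT, f = T^T M T, valid because T is orthogonal."""
    return matmul(matmul(transpose(T), M), T)
```

For instance, `dtt2([[1.0] * 4 for _ in range(4)], dtt_matrix(4))` concentrates a constant 4×4 block into the single coefficient $M[0, 0]$, and `idtt2` recovers the block up to floating-point error.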
Because of the kernel separation property, the 2-D DTT can be calculated by means of successive applications of the 1-D DTT to the rows of $f$ and then to the columns of the resulting intermediate matrix. The original intensity distribution $f$ can be recovered by the inverse procedure:

\[ f = T^{-1} \cdot M \cdot (T^{-1})^\top = T^\top \cdot M \cdot T. \]

The last equality above stems from the DTT orthogonality property: $T^\top = T^{-1}$ [14]. Therefore, the same structure can be used for the forward transform as well as for the inverse.

For $N = 4$ and $N = 8$, we have the particular cases of interest in the context of image and video coding. Thus, the 4- and 8-point DTT matrices are, respectively, furnished by:

\[ T_4 = F_4 \cdot \begin{bmatrix}
1 & 1 & 1 & 1 \\
-3 & -1 & 1 & 3 \\
1 & -1 & -1 & 1 \\
-1 & 3 & -3 & 1
\end{bmatrix} \]

and

\[ T_8 = \frac{1}{2} \cdot F_8 \cdot \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-7 & -5 & -3 & -1 & 1 & 3 & 5 & 7 \\
7 & 1 & -3 & -5 & -5 & -3 & 1 & 7 \\
-7 & 5 & 7 & 3 & -3 & -7 & -5 & 7 \\
7 & -13 & -3 & 9 & 9 & -3 & -13 & 7 \\
-7 & 23 & -17 & -15 & 15 & 17 & -23 & 7 \\
1 & -5 & 9 & -5 & -5 & 9 & -5 & 1 \\
-1 & 7 & -21 & 35 & -35 & 21 & -7 & 1
\end{bmatrix}, \]

where $F_4 = \operatorname{diag}\left(\frac{1}{2}, \frac{1}{\sqrt{20}}, \frac{1}{2}, \frac{1}{\sqrt{20}}\right)$ and $F_8 = \operatorname{diag}\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{42}}, \frac{1}{\sqrt{42}}, \frac{1}{\sqrt{66}}, \frac{1}{\sqrt{154}}, \frac{1}{\sqrt{546}}, \frac{1}{\sqrt{66}}, \frac{1}{\sqrt{858}}\right)$. We observe that $T_4$ and $T_8$ are written as the product of an integer matrix and a diagonal matrix which requires floating-point representation.

3 DTT Approximations and Coding Optimality

In this section, we aim at proposing an extremely low-complexity DTT approximation. Our methodology consists of generating a class of parametric approximate matrices and then identifying the optimal class member in terms of coding performance.

3.1 Related Work

To the best of our knowledge, the only DTT approximation archived in the literature was proposed in [42]. That approximation was obtained by means of a parameterization of integer functions combined with a normalization of the transformation matrix columns.
The approximation derived in [42] furnishes good coding capabilities, but it lacks orthogonality or near-orthogonality properties. As a consequence, the forward and inverse transformations are quite distinct and possess unbalanced computational complexities.

3.2 Parametric Low-complexity Matrices

In [35, 36, 38, 44], DCT approximations were proposed according to the following operation:

\[ \operatorname{int}(\alpha \cdot C), \]

where $\operatorname{int}(\cdot)$ is an integer function, $\alpha$ is a real scaling factor, and $C$ is the exact DCT matrix. Usual integer functions include the floor, ceiling, sign, and rounding functions [38]. In this work, these functions operate element-wise when applied to a matrix argument.

A similar approach is sought for the proposed DTT approximation. However, in contrast with the DCT, the rows of the DTT matrix (basis vectors) have a widely varying dynamic range. Thus, the integer function may excessively penalize the rows with small dynamic range. To compensate for this phenomenon, we normalize the rows of $T_4$ and $T_8$ by left multiplication by

\[ D_4 = \operatorname{diag}\left(2, \frac{\sqrt{20}}{3}, 2, \frac{\sqrt{20}}{3}\right) \]

and

\[ D_8 = \operatorname{diag}\left(\sqrt{8}, \frac{\sqrt{168}}{7}, \frac{\sqrt{168}}{7}, \frac{\sqrt{264}}{7}, \frac{\sqrt{616}}{13}, \frac{\sqrt{2184}}{23}, \frac{\sqrt{264}}{9}, \frac{\sqrt{3432}}{35}\right), \]

respectively. After this normalization, the largest absolute entry of every row equals one.

The sought approximations are required to possess extremely low complexity. One way of ensuring this property is to adopt an integer function whose co-domain is a set of low-complexity integers. In the DCT literature, common sets are: $P_0 = \{\pm 1\}$ [35], $P_1 = \{0, \pm 1\}$ [36], and $P_2 = \{0, \pm 1, \pm 2\}$ [38]. Note that elements from these sets have very simple realizations in hardware, implying multiplierless designs with only addition and bit-shifting operations [45].
Adopting $P_2$, we have that a suitable integer function is given by:

\[ \operatorname{round}: [-1, 1] \to P_2 = \{0, \pm 1, \pm 2\}, \quad x \mapsto \operatorname{round}(\alpha \cdot x), \quad 0 < \alpha < 5/2, \]

where $\operatorname{round}(x) = \operatorname{sign}(x) \cdot \lfloor |x| + \frac{1}{2} \rfloor$ is the rounding function as implemented in the MATLAB [46], Octave [47], and Python [48] programming languages. Following the methodology described in [38], we obtain the following parametric class of matrices:

\[ T_N(\alpha) = \operatorname{round}(\alpha \cdot D_N \cdot T_N), \quad N \in \{4, 8\}. \qquad (3) \]

3.3 DTT Approximation

A given low-complexity matrix $T_N(\alpha)$ can be used to approximate the DTT matrix by means of orthogonalization or quasi-orthogonalization as described in [36–38]. As a result, an approximation for $T_N$, referred to as $\hat{T}_N(\alpha)$, can be obtained by:

\[ \hat{T}_N(\alpha) = S_N(\alpha) \cdot T_N(\alpha), \qquad (4) \]

where $S_N(\alpha) = \sqrt{\left\{\operatorname{ediag}\left(T_N(\alpha) \cdot T_N(\alpha)^\top\right)\right\}^{-1}}$ is a diagonal matrix, $\operatorname{ediag}(\cdot)$ returns a diagonal matrix with the diagonal entries of its argument, and $\sqrt{\cdot}$ is the matrix element-wise square root operator [38]. If

\[ T_N(\alpha) \cdot T_N(\alpha)^\top = [\text{diagonal matrix}] \qquad (5) \]

holds true, then $\hat{T}_N(\alpha)$ is an orthogonal matrix [49]. Otherwise, it is possibly a near-orthogonal matrix [38]. An approximation is said to be quasi-orthogonal when the deviation from diagonality of $T_N(\alpha) \cdot T_N(\alpha)^\top$ is considered small. Let $A$ be a square real matrix. The deviation from diagonality $\delta(A)$ is given by [50]:

\[ \delta(A) = 1 - \frac{\| \operatorname{ediag}(A) \|_F}{\| A \|_F}, \]

where $\| \cdot \|_F$ is the Frobenius norm for matrices [49]. In the context of image compression, a deviation from diagonality value below $1 - \frac{2}{\sqrt{5}} \approx 0.1056$ indicates quasi-orthogonality [35, 38].

3.4 Optimization Problem

Now our goal is to identify in the family $T_N(\alpha)$ the matrix that furnishes the best approximation. We adopted two metrics as figures of merit to guide the optimal choice: (i) the unified coding gain $C_g$ [51, 52] and (ii) the transform efficiency $\eta$ [53].
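The construction in (3) and the deviation from diagonality can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names are hypothetical, the rounding rule is written out explicitly because Python 3's built-in `round` rounds halves to even, and the row normalization by $D_N$ is implemented as division by the largest absolute entry of each integer row. One further assumption is labeled in the code: rows whose entries share a common integer factor (for example a row of all 2s) are divided by that factor, on the grounds that such a factor can be absorbed into the diagonal matrix $S_N(\alpha)$ of (4).

```python
from functools import reduce
from math import floor, gcd, sqrt

T4_INT = [[1, 1, 1, 1], [-3, -1, 1, 3], [1, -1, -1, 1], [-1, 3, -3, 1]]
T8_INT = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [-7, -5, -3, -1, 1, 3, 5, 7],
    [7, 1, -3, -5, -5, -3, 1, 7],
    [-7, 5, 7, 3, -3, -7, -5, 7],
    [7, -13, -3, 9, 9, -3, -13, 7],
    [-7, 23, -17, -15, 15, 17, -23, 7],
    [1, -5, 9, -5, -5, 9, -5, 1],
    [-1, 7, -21, 35, -35, 21, -7, 1],
]


def round_half_away(x):
    """round(x) = sign(x) * floor(|x| + 1/2)."""
    sign = 1 if x > 0 else -1 if x < 0 else 0
    return sign * int(floor(abs(x) + 0.5))


def low_complexity(int_rows, alpha):
    """Entries of round(alpha * D_N * T_N), one integer row at a time."""
    out = []
    for row in int_rows:
        peak = max(abs(v) for v in row)  # D_N * T_N scales this entry to 1
        rounded = [round_half_away(alpha * v / peak) for v in row]
        # Assumption: drop a common integer factor of the row, since it can
        # be absorbed into the diagonal scaling matrix.
        common = reduce(gcd, rounded) or 1
        out.append([v // common for v in rounded])
    return out


def deviation_from_diagonality(rows):
    """delta(A) for A = rows * rows^T, using Frobenius norms."""
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]
    diag_sq = sum(gram[i][i] ** 2 for i in range(n))
    total_sq = sum(v * v for r in gram for v in r)
    return 1.0 - sqrt(diag_sq / total_sq)
```

With `alpha = 2`, `low_complexity(T4_INT, 2)` and `low_complexity(T8_INT, 2)` give matrices whose entries lie in $P_2$; for the 4-point case the deviation from diagonality is zero, and for the 8-point case it is roughly 0.023, well under the 0.1056 threshold.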
These metrics are relevant because they quantify the transform capacity of removing signal redundancy, as well as data compression and decorrelation [53].

Hence, following the methodology in [54], we propose the following multicriteria optimization problem:

\[ \alpha^* = \arg\max_{0 < \alpha < 5/2} \left\{ C_g\left(\hat{T}_N(\alpha)\right), \eta\left(\hat{T}_N(\alpha)\right) \right\}, \quad N \in \{4, 8\}, \]

where $\alpha^*$ is the scaling parameter that results in the optimal low-complexity matrix $T_N^* \triangleq T_N(\alpha^*)$ according to (3).

The above optimization problem is not analytically tractable. Thus, we resort to exhaustive numerical search to obtain $\alpha^*$. We consider linearly spaced values of $\alpha$ with a step of $10^{-3}$ in the interval $0 < \alpha < 5/2$. For $N = 4$ and $N = 8$, we obtain that optimality is found in the intervals $\left(\frac{3}{2}, \frac{5}{2}\right)$ and $\left(\frac{23}{14}, \frac{69}{34}\right)$, respectively. Therefore, any value of $\alpha$ in the aforementioned intervals effects the same approximations. For operational reasons, we selected $\alpha = 2$. Thus, the resulting low-complexity matrices are given below:

\[ T_4^* = \begin{bmatrix}
1 & 1 & 1 & 1 \\
-2 & -1 & 1 & 2 \\
1 & -1 & -1 & 1 \\
-1 & 2 & -2 & 1
\end{bmatrix} \]

and

\[ T_8^* = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-2 & -1 & -1 & 0 & 0 & 1 & 1 & 2 \\
2 & 0 & -1 & -1 & -1 & -1 & 0 & 2 \\
-2 & 1 & 2 & 1 & -1 & -2 & -1 & 2 \\
1 & -2 & 0 & 1 & 1 & 0 & -2 & 1 \\
-1 & 2 & -1 & -1 & 1 & 1 & -2 & 1 \\
0 & -1 & 2 & -1 & -1 & 2 & -1 & 0 \\
0 & 0 & -1 & 2 & -2 & 1 & 0 & 0
\end{bmatrix}. \]

The associated optimal approximations are denoted by $\hat{T}_N^* \triangleq \hat{T}_N(\alpha^*)$ and can be computed according to (4). Hence, we obtain:

\[ \hat{T}_4^* = \operatorname{diag}\left(\frac{1}{2}, \frac{1}{\sqrt{10}}, \frac{1}{2}, \frac{1}{\sqrt{10}}\right) \cdot T_4^* \]

and

\[ \hat{T}_8^* = \operatorname{diag}\left(\frac{1}{\sqrt{8}}, \frac{1}{\sqrt{12}}, \frac{1}{\sqrt{12}}, \frac{1}{\sqrt{20}}, \frac{1}{\sqrt{12}}, \frac{1}{\sqrt{14}}, \frac{1}{\sqrt{12}}, \frac{1}{\sqrt{10}}\right) \cdot T_8^*. \]

4 Evaluation and Computational Complexity

4.1 Discussion

The obtained matrix $T_4^*$ satisfies (5) and therefore $\hat{T}_4^*$ is orthogonal. In fact, the proposed matrix is identical to the 4-point integer transform for H.264 encoding introduced by Malvar et al. [44]. Therefore, it is also an optimal approximate DTT. Because the 4-point DCT approximation matrix by Malvar et al. was submitted to in-depth analyses in the context of video coding [31], such results also apply to $T_4^*$. Therefore, hereafter we focus the forthcoming discussions on the proposed 8-point approximation $T_8^*$.

4.2 Orthogonality and Invertibility

The matrix $T_8^*$ does not satisfy (5). Therefore, the associated approximation $\hat{T}_8^*$ is not orthogonal, i.e., $(\hat{T}_8^*)^{-1} \neq (\hat{T}_8^*)^\top$. As a consequence, the inverse does not inherit the low-complexity properties of $T_8^*$. In other words, the entries of $(T_8^*)^{-1}$ are not in $P_2$. Hence, the proposed transform possesses asymmetrical computational costs when comparing the direct and inverse operations [55]. The inverse transformation is a much required tool, especially for reconstructing encoded images back to the spatial domain [27, 56].

However, $T_8^* \cdot (T_8^*)^\top$ has a low deviation from diagonality [38, 50]: only 0.024, roughly 4.4 times less than the deviation implied by the SDCT [35], which is taken as the standard reference. Therefore, the following approximation is valid:

\[ (\hat{T}_8^*)^{-1} \approx (T_8^*)^\top \cdot \operatorname{diag}\left(\frac{1}{\sqrt{8}}, \frac{1}{\sqrt{12}}, \frac{1}{\sqrt{12}}, \frac{1}{\sqrt{20}}, \frac{1}{\sqrt{12}}, \frac{1}{\sqrt{14}}, \frac{1}{\sqrt{12}}, \frac{1}{\sqrt{10}}\right). \]

Moreover, since diagonal matrices can be absorbed into other computational steps [38, 40, 42], $(T_8^*)^{-1}$ can be replaced with the low-complexity matrix $(T_8^*)^\top$. This has the advantage of using the same algorithm for both forward and inverse approximations.

4.3 Performance Assessment

The proposed approximations were compared with their corresponding exact DTT in terms of coding performance as measured according to the coding gain and the transform efficiency. For $N = 8$, we also include in our comparisons the DTT approximation proposed in [42]. The coding performance of the proposed approximations was evaluated according to the figures of merit $C_g$ and $\eta$. Table 1 displays the results.
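To give a concrete feel for the two figures of merit, the sketch below evaluates them for an orthogonal transform. Several assumptions are made here that are not stated in this section: the input is modeled as a first-order Markov process with correlation coefficient 0.95 (a common convention in the transform coding literature), the coding gain is the classical ratio of arithmetic to geometric mean of the transform-domain variances (which applies to orthogonal transforms only and is not the unified coding gain needed for non-orthogonal approximations), and the transform efficiency is the percentage of absolute covariance mass on the diagonal. Function names are hypothetical.

```python
from math import cos, log10, pi, sqrt


def ar1_covariance(N, rho):
    """Covariance of a unit-variance first-order Markov signal."""
    return [[rho ** abs(i - j) for j in range(N)] for i in range(N)]


def dct_matrix(N):
    """Orthonormal DCT-II matrix, used here only as a test transform."""
    rows = []
    for k in range(N):
        scale = sqrt(1.0 / N) if k == 0 else sqrt(2.0 / N)
        rows.append([scale * cos((2 * n + 1) * k * pi / (2 * N))
                     for n in range(N)])
    return rows


def transformed_covariance(T, R):
    """R_y = T R T^T."""
    N = len(T)
    TR = [[sum(T[i][k] * R[k][j] for k in range(N)) for j in range(N)]
          for i in range(N)]
    return [[sum(TR[i][k] * T[j][k] for k in range(N)) for j in range(N)]
            for i in range(N)]


def coding_gain_db(T, rho=0.95):
    """Classical coding gain (orthogonal T): arithmetic over geometric mean."""
    N = len(T)
    Ry = transformed_covariance(T, ar1_covariance(N, rho))
    variances = [Ry[i][i] for i in range(N)]
    arithmetic = sum(variances) / N
    log_geometric = sum(log10(v) for v in variances) / N
    return 10.0 * (log10(arithmetic) - log_geometric)


def transform_efficiency(T, rho=0.95):
    """Percentage of absolute covariance mass lying on the diagonal of R_y."""
    N = len(T)
    Ry = transformed_covariance(T, ar1_covariance(N, rho))
    on_diagonal = sum(abs(Ry[i][i]) for i in range(N))
    total = sum(abs(v) for row in Ry for v in row)
    return 100.0 * on_diagonal / total
```

Under these assumptions, `coding_gain_db(dct_matrix(8))` and `transform_efficiency(dct_matrix(8))` land on the commonly cited 8-point DCT figures of about 8.83 dB and 93.99. Any other orthogonal matrix, such as the exact DTT, can be passed in the same way.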
The proposed approximations are capable of furnishing coding measures very close to the exact transformations. For comparison purposes, the exact DCT has coding gain and transform efficiency of 8.83 dB and 93.99, respectively.

Although our goal is to derive good approximations for coding, we also analyzed the resulting approximations in terms of proximity metrics. We selected the mean square error (MSE) [57], the total energy error $\epsilon$ [36], and the transform distortion $d$ [58]. All these measures aim at quantifying the distance between the exact transformations and their respective approximations. Analytic expressions for the MSE and $\epsilon$ are detailed in [54]. We also evaluated the proximity of the proposed approximations with respect to the exact DTT according to the transform distortion measure suggested in [58]. This metric was originally proposed as the DCT distortion in the context of DCT approximations and quantifies, as a percentage, a distance between the exact and approximate DCT. Adapting it for the DTT, we obtain the transform distortion as follows:

\[ d(\hat{T}_N^*) = \left( 1 - \frac{1}{N} \cdot \left\| \operatorname{ediag}\left\{ T_N \cdot (\hat{T}_N^*)^\top \right\} \right\|_2^2 \right) \times 100\%, \]

where $\| \cdot \|_2$ is the Euclidean norm for matrices [49]. Low values of distortion indicate proximity with the DTT. As a comparison for $N = 4$, the 4-point DCT approximation proposed in [59] has a distortion of 7.32%. Proximity results are also shown in Table 1.

4.4 Fast Algorithm and Arithmetic Complexity

Now we aim at deriving fast algorithms for the obtained DTT approximations. Being identical to the H.264 4-point DCT approximation, the derived matrix $T_4^*$ was given fast algorithms in [44]. Although $T_8^*$ is multiplication-free, without a fast algorithm its direct implementation requires 44 additions and 24 bit-shifting operations. Thus we focus our efforts on the efficient computation of $T_8^*$.
To such end, a sparse matrix factorization is sought, where the number of additions and bit-shifting operations can be significantly reduced [45, 60]. The sparse matrix factorization proposed in the manuscript was derived from scratch based on usual butterfly structures [45]. We obtained the following decomposition:

\[ T_8^* = P \cdot A_2 \cdot A_1 \cdot B_8, \]

where

\[ B_8 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1
\end{bmatrix}, \quad
A_1 = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 & 0 & 0 & 0 \\
2 & 0 & -1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -2
\end{bmatrix}, \]

\[ A_2 = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & -2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 1 & 0
\end{bmatrix}, \quad
P = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}. \]

Matrix $B_8$ represents a layer of butterfly structures, $A_1$ and $A_2$ denote additive matrices with bit-shifting operations, and $P$ represents a final permutation, which is cost-free. The resulting algorithm preserves all algebraic and coding properties of the direct computation, while requiring fewer arithmetic operations. Moreover, the factor of 2 of the first matrix row can be absorbed into the diagonal matrix. The obtained factorization is represented by the signal flow graph shown in Figure 1. Such algorithm reduces the arithmetic cost of the proposed approximation to only 24 additions and six bit-shifting operations.

Table 2 compares the arithmetic complexity of the discussed methods, evaluated according to their respective fast algorithms.
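The factorization above can be read off stage by stage. The following Python sketch is one possible transcription of $P \cdot A_2 \cdot A_1 \cdot B_8$ for integer inputs, with multiplications by 2 written as left shifts; the function name `fast_t8` and the intermediate variable names are hypothetical, and the operation counts in the comments follow from counting the nonzero entries of each factor.

```python
T8_STAR = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [-2, -1, -1, 0, 0, 1, 1, 2],
    [2, 0, -1, -1, -1, -1, 0, 2],
    [-2, 1, 2, 1, -1, -2, -1, 2],
    [1, -2, 0, 1, 1, 0, -2, 1],
    [-1, 2, -1, -1, 1, 1, -2, 1],
    [0, -1, 2, -1, -1, 2, -1, 0],
    [0, 0, -1, 2, -2, 1, 0, 0],
]


def direct_t8(x):
    """Reference result: plain matrix-vector product with T8*."""
    return [sum(c * v for c, v in zip(row, x)) for row in T8_STAR]


def fast_t8(x):
    """Multiplierless evaluation of T8* x for a list of 8 integers."""
    x0, x1, x2, x3, x4, x5, x6, x7 = x

    # Stage B8: butterflies (8 additions).
    b0, b1, b2, b3 = x0 + x7, x1 + x6, x2 + x5, x3 + x4
    b4, b5, b6, b7 = x3 - x4, x2 - x5, x1 - x6, x0 - x7

    # Stage A1 (9 additions, 5 shifts); rows 0 and 2 only pass b2 and b1 on.
    a1 = b0 + b3
    a3 = (b2 << 1) - b1 - b3
    a4 = (b0 << 1) - b2 - b3
    a5 = (b4 << 1) - b5
    a6 = b4 + b5
    a7 = b5 + b6
    a8 = (b6 << 1) - b7
    a9 = -(b7 << 1)

    # Stage A2 (7 additions, 1 shift); rows 2 to 4 only pass a3, a4, a5 on.
    c0 = b2 + a1 + b1
    c1 = a1 - (b1 << 1)
    c5 = a6 + a7 + a9
    c6 = a9 - a7
    c7 = a8 - a6

    # Stage P: output reordering, no arithmetic.
    return [c0, c6, a4, c5, c1, c7, a3, a5]
```

As a quick check, `fast_t8([1, 0, 0, 0, 0, 0, 0, 0])` returns the first column of $T_8^*$, namely `[1, -2, 2, -2, 1, -1, 0, 0]`. The three stages add up to 24 additions and six shifts.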
The additive and total arithmetic complexity of the proposed approximation are 45.5% and 58.9% lower than those of the exact transform, respectively. Although the computational cost of the proposed approximation is slightly higher than that of the forward DTT approximation in [42], it is important to notice that the inverse transformation in [42] is relatively more complex. Considering the combination of forward and inverse transformations, the design in [42] requires 49 additions, whereas the proposed design requires 48 additions. Bit-shifting costs are virtually null, because in a hardware implementation they represent only wiring. For comparison, the popular Loeffler DCT algorithm [61] requires 11 floating-point multiplications and 29 additions. Although the actual approximation consists of the multiplication of the low-complexity matrix and a diagonal matrix as shown in (4), the multiplications introduced by diagonal matrices represent no additional arithmetic complexity in image compression applications. This is because they can be absorbed into the image quantization step [27, 56] of JPEG-like image compression [36–41]. Furthermore, the new approximation is capable of a better coding performance and possesses a proximity measure one order of magnitude lower, as shown in Table 1. Therefore, both performance and arithmetic complexity measures are favorable to the proposed approximation.

Table 1: Performance assessment

N   Method               Cg (dB)   η       MSE     ǫ      d        δ
4   Exact DTT [26]       7.55      97.25   -       -      -        0
4   Proposed             7.55      97.33   0.001   0.13   0.29%    0
8   Exact DTT [15]       8.68      92.86   -       -      -        0
8   DTT Approx. [42]     5.51      83.51   0.015   3.32   12.61%   0.09
8   Proposed             9.25      92.71   0.002   0.77   3.03%    0.024

Figure 1: Signal flow graph for $T_8^*$. Input data $x_n$, $n = 0, 1, \ldots, 7$, relates to output $X_m$, $m = 0, 1, \ldots, 7$. Dashed arrows and black nodes represent multiplications by −1 and 2, respectively.

Table 2: Fast algorithm arithmetic complexity comparison

Method                      Mult.   Add.   Shifts   Total
Exact DTT [15]              0       44     29       73
Forward DTT approx. [42]    0       20     0        20
Inverse DTT approx. [42]    0       29     8        37
Proposed                    0       24     6        30

5 Image and Video Compression Experiments

In this section, we perform two computational experiments. The first one consists of still image compression considering a JPEG procedure. The second simulation assesses the effectiveness of the proposed approximations under realistic video encoding conditions.

5.1 Image Compression

We adopted the image compression method according to the JPEG standard [27, 56]. A set of 45 512 × 512 8-bit images was obtained from a public image bank [62] and submitted to processing. The selected images encompass a wide range of categories, including 13 textures, 12 satellite images, three human faces, and several other miscellaneous scenarios. For each image, the luminance component was extracted and subdivided into 8 × 8 blocks, $I_{i,j}$, $i, j = 1, 2, \ldots, 64$. After preprocessing, each block was submitted to the following operation:

\[ M_{i,j} = P \cdot I_{i,j} \cdot P^\top, \]

where $M_{i,j}$ is the 2-D transform-domain data and $P$ is a given 1-D transformation matrix, such as the exact or approximate DTT.
Then each sub-block $M_{i,j}$ was element-wise divided by a quantization matrix, yielding the quantized JPEG coefficients $J_{i,j}$, as follows:

\[ J_{i,j} = \operatorname{round}(M_{i,j} \oslash Q), \]

where $Q = \lfloor (S \cdot Q_0 + 50) / 100 \rfloor$ is the quantization matrix, $\lfloor \cdot \rfloor$ denotes the floor function, the default quantization table is

\[ Q_0 = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}, \]

$S = 5000 / QF$ if $QF < 50$, and $S = 200 - 2 \cdot QF$ otherwise, and $QF$ is the quality factor [56]. If $QF = 50$, then $Q = Q_0$. Decreasing values of $QF$ lead to higher compression ratios (with total image destruction at $QF = 0$), whereas increasing values lead to lower compression ratios (with best possible quality at $QF = 100$). In our experiments, we adopted $QF$ varying from 10 to 90 in steps of 5.

In the JPEG decoder, each sub-block is initially arithmetic decoded and dequantized according to:

\[ \hat{M}_{i,j} = J_{i,j} \odot Q. \]

Then, the sub-blocks are inverse transformed:

\[ \hat{I}_{i,j} = P^{-1} \cdot \hat{M}_{i,j} \cdot P^{-\top}. \]

Original and compressed images were compared for image degradation. The structural similarity index (SSIM) [63] and the spectral residual based similarity (SR-SIM) [64] were selected as image quality measures. The SSIM takes into account luminance, contrast, and the image structure to quantify the image degradation, being consistent with subjective quality measurements [65]. In turn, the SR-SIM is based on the hypothesis that the visual saliency maps of natural images are closely related to their perceived quality. This measure could outperform several state-of-the-art figures of merit in experiments with standardized datasets [64].
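The quality-factor scaling and the quantize/dequantize pair described above can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the function names are hypothetical, the rounding is half away from zero (written out because Python 3's built-in `round` rounds halves to even), and the clamp of each table entry to a minimum of 1 is an added assumption to avoid division by zero near $QF = 100$, which lies outside the range used in the experiments.

```python
from math import floor


def scale_factor(qf):
    """S as a function of the quality factor QF (QF > 0)."""
    return 5000 / qf if qf < 50 else 200 - 2 * qf


def scaled_table(q0, qf):
    """Q = floor((S * Q0 + 50) / 100), applied entry by entry."""
    s = scale_factor(qf)
    # Assumption: clamp to at least 1 so that quantization never divides by 0.
    return [[max(1, int((s * q + 50) // 100)) for q in row] for row in q0]


def round_half_away(x):
    sign = 1 if x > 0 else -1 if x < 0 else 0
    return sign * int(floor(abs(x) + 0.5))


def quantize(block, table):
    """J = round(M ./ Q), element-wise."""
    return [[round_half_away(m / q) for m, q in zip(mrow, qrow)]
            for mrow, qrow in zip(block, table)]


def dequantize(levels, table):
    """M_hat = J .* Q, element-wise."""
    return [[j * q for j, q in zip(jrow, qrow)]
            for jrow, qrow in zip(levels, table)]
```

Using only the first row of $Q_0$ as input, `scaled_table([[16, 11, 10, 16, 24, 40, 51, 61]], 10)` gives `[[80, 55, 50, 80, 120, 200, 255, 305]]`, so a low quality factor coarsens the quantization steps by a factor of five, while `qf=50` returns the row unchanged.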
We did not consider the peak signal-to-noise ratio (PSNR) as a quality measure, because it is not a suitable metric to capture the human perception of image fidelity and quality [57]. For each value of $QF$, we considered average measure values instead of values from particular images. This approach is less prone to variance effects and fortuitous data [36, 66]. For direct comparison, we selected the exact DTT [15], the approximation proposed in [42], and the proposed approximation. As an extra reference, we also included the results from the standard JPEG, which is based on the exact DCT. Figure 2 displays the results. For both selected measures, the proposed approximation performed very closely to the exact DTT, especially at high compression ratios. It could outperform the DTT approximation in [42] in terms of SSIM and SR-SIM for $QF < 80$ and $QF < 55$, respectively. This shows that the proposed approximation is more efficient in the scenario of high and moderate compression, which are the very common cases [67], suitable for low-power devices. The approximation in [42] could attain a better performance at low compression ratios because it satisfies the perfect reconstruction property, albeit at the expense of an inverse transformation with higher arithmetic complexity. On the other hand, the proposed approximation explores the near-orthogonality property, which could excel in moderate to high compression scenarios, which are often more relevant [67].

For qualitative evaluation purposes, Figure 3 shows the compressed Lena image according to the exact DTT and the proposed approximation. We adopted the scenario of high/moderate compression with $QF = 15$ and $QF = 50$, respectively. In both cases, the approximate transform was capable of producing comparable results to the exact DTT with visually similar images.
5.2 Video Compression Simulation

To evaluate the proposed transform performance in video coding, we have embedded the DTT approximation in the widely used x264 software library [68] for encoding video streams into the H.264/AVC standard [31]. The default 8-point transform employed in H.264 is the following integer DCT approximation [69]:

\[ \hat{C} = \frac{1}{8} \cdot \begin{bmatrix}
8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\
12 & 10 & 6 & 3 & -3 & -6 & -10 & -12 \\
8 & 4 & -4 & -8 & -8 & -4 & 4 & 8 \\
10 & -3 & -12 & -6 & 6 & 12 & 3 & -10 \\
8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\
6 & -12 & 3 & 10 & -10 & -3 & 12 & -6 \\
4 & -8 & 8 & -4 & -4 & 8 & -8 & 4 \\
3 & -6 & 10 & -12 & 12 & -10 & 6 & -3
\end{bmatrix}. \]

The fast algorithm for the above transformation requires 32 additions and 14 bit-shifting operations [69]. Therefore, the proposed 8-point transform requires 25% fewer additions and 57% fewer bit-shifting operations than the fast algorithm for $\hat{C}$.

Eleven 300-frame common intermediate format (CIF) videos obtained from an online test video database [70] were encoded with the standard software and then with the modified software. We employed the software default settings and conducted the simulation under two scenarios: (i) target bitrate varying from 50 to 500 kbps with steps of 50 kbps and (ii) quantization parameter (QP) varying from 5 to 50 with steps of 5. Psychovisual optimization was disabled in order to obtain valid SSIM values. Besides PSNR evaluation, the discussed software library [68] natively offers SSIM measurements for video quality assessment. The average SSIM of the luma component was computed for all reconstructed frames. The results are shown in Figure 4 in terms of the absolute percentage error (APE) [38] of the SSIM with respect to the standard DCT-based transformation in the original H.264/AVC codec.
This measure is given by:

\[
\operatorname{APE}(\mathrm{SSIM}) = \left| \frac{\mathrm{SSIM}_{\text{H.264}} - \mathrm{SSIM}_{P}}{\mathrm{SSIM}_{\text{H.264}}} \right|,
\]

where $\mathrm{SSIM}_{\text{H.264}}$ returns the SSIM figures as computed according to the H.264 standard and $\mathrm{SSIM}_{P}$ represents the SSIM when the exact DTT, the approximation in [42], or the proposed approximation is considered. SSIM curves for the DCT are absent, because they were employed as performance references. The use of the proposed transform effects a minor degradation in video quality. It also performed better than the previous approximation in all cases.

Figure 2: Average SSIM (top) and SR-SIM (bottom) measurements for image compression for the considered transforms at several values of QF. Panels: (a) average Y-SSIM versus quality factor; (b) average Y-SR-SIM versus quality factor. Curves: JPEG, exact DTT [15], approximation in [42], proposed.

Figure 3: Compressed ‘Lena’ image for QF = 15 and QF = 50. Panels: (a) exact DTT [15], QF = 15; (b) exact DTT [15], QF = 50; (c) proposed, QF = 15; (d) proposed, QF = 50.

Figure 4: Video quality assessment in terms of target bitrate and QP. Panels: (a) average Y-SSIM APE versus target bitrate (kbps); (b) average Y-SSIM APE versus QP. Curves: exact DTT [15], approximation in [42], proposed.

Figure 5 displays the first encoded frame of two standard video sequences at low bitrate (200 kbps). The compressed frames resulting from the original and modified codecs are visually indistinguishable.
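As a quick sanity check on this subsection, the following Python sketch (not part of the original experiments) rebuilds the integer kernel of the H.264 transform given above, confirms that its rows are mutually orthogonal, reproduces the operation-count savings of the proposed transform (24 additions and six bit-shifts against 32 and 14), and implements the APE formula. The names `KERNEL`, `gram`, `percent_saving`, and `ape` are illustrative choices, not identifiers from x264 or from the authors' code.

```python
# Integer kernel of the 8-point H.264 transform; C_hat = KERNEL / 8.
KERNEL = [
    [ 8,   8,   8,   8,   8,   8,   8,   8],
    [12,  10,   6,   3,  -3,  -6, -10, -12],
    [ 8,   4,  -4,  -8,  -8,  -4,   4,   8],
    [10,  -3, -12,  -6,   6,  12,   3, -10],
    [ 8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [ 6, -12,   3,  10, -10,  -3,  12,  -6],
    [ 4,  -8,   8,  -4,  -4,   8,  -8,   4],
    [ 3,  -6,  10, -12,  12, -10,   6,  -3],
]


def gram(rows):
    """Return rows * rows^T, i.e. all pairwise inner products of the rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def percent_saving(reference, proposed):
    """Percentage by which `proposed` undercuts `reference`."""
    return 100.0 * (reference - proposed) / reference


def ape(ssim_h264, ssim_p):
    """Absolute percentage error of an SSIM value relative to the H.264 one."""
    return abs((ssim_h264 - ssim_p) / ssim_h264)


G = gram(KERNEL)

# Off-diagonal entries of the Gram matrix vanish when rows are orthogonal.
rows_orthogonal = all(
    G[i][j] == 0 for i in range(8) for j in range(8) if i != j
)
# Diagonal entries are the (unnormalized) row energies.
row_energies = [G[i][i] for i in range(8)]

# 32 additions / 14 shifts for C_hat [69] versus 24 / 6 for the proposed one.
addition_saving = percent_saving(32, 24)
shift_saving = percent_saving(14, 6)

print(rows_orthogonal, row_energies)
print(addition_saving, round(shift_saving, 1))
```

The rows come out orthogonal but with unequal energies (512, 578, and 320 before the 1/8 factor), so the kernel is orthogonal without being orthonormal. The `ape` helper expects SSIM values measured on the same sequence with the two codecs; no measured values are included here.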
6 Hardware Section

To compare the hardware resource consumption of the proposed approximate DTT against the exact DTT fast algorithm proposed in [15], the algorithm proposed in [42], and the Loeffler DCT [61], the 2-D versions of all algorithms were initially modeled and tested in Matlab Simulink and then physically realized on a Xilinx Virtex-6 XC6VSX475T-1FF1759 Reconfigurable Open Architecture Computing Hardware-2 (ROACH2) board [71]. The ROACH2 board consists of a Xilinx Virtex-6 FPGA, 16 complex analog-to-digital converters (ADC), multi-gigabit transceivers, and a 72-bit DDR3 RAM. The 1-D versions were modeled first, and the 2-D versions were generated using two 1-D designs along with a transpose buffer. Designs were verified using more than 10000 test vectors, with complete agreement with theoretical values. Results are shown in Table 3. Metrics including configurable logic block (CLB) and flip-flop (FF) counts, critical path delay ($T_{\mathrm{cpd}}$, in ns), and maximum operating frequency ($F_{\max}$, in MHz) are provided. The percentage reductions in the number of CLBs and FFs were 43.2% and 25.0%, respectively, compared with the exact DTT fast algorithm proposed in [15]. It is important to emphasize that the approximation in [42] is asymmetric: the forward and inverse transforms possess different structures, the inverse operation being more complex (cf. Table 2). For comparisons, we adopt the average measurement between the forward and inverse realizations. The proposed approximation provides a higher maximum operating frequency, with improvements of 85.9%, 43.5%, and 9.7% when compared with the Loeffler DCT [61], the exact DTT [15], and the design in [42], respectively.
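The percentages quoted above follow directly from the entries of Table 3. The short Python sketch below recomputes them; it assumes, as stated in the text, that the figure for [42] is the average of its forward and inverse realizations. The dictionary keys and helper names are illustrative only.

```python
# (CLB count, FF count, F_max in MHz) copied from Table 3.
TABLE3 = {
    "exact_dtt":        (2941, 7271, 130.07),
    "approx42_forward": (1515, 6058, 178.69),
    "approx42_inverse": (1713, 4834, 161.71),
    "loeffler_dct":     (3250, 4413, 100.44),
    "proposed":         (1671, 5455, 186.70),
}


def reduction(reference, value):
    """Percentage by which `value` is smaller than `reference`."""
    return 100.0 * (reference - value) / reference


def improvement(reference, value):
    """Percentage by which `value` is larger than `reference`."""
    return 100.0 * (value - reference) / reference


proposed = TABLE3["proposed"]
exact = TABLE3["exact_dtt"]

# Resource savings relative to the exact DTT [15].
clb_reduction = reduction(exact[0], proposed[0])
ff_reduction = reduction(exact[1], proposed[1])

# The design in [42] is asymmetric, so its forward and inverse
# maximum frequencies are averaged before comparing.
fmax_42_avg = (
    TABLE3["approx42_forward"][2] + TABLE3["approx42_inverse"][2]
) / 2.0

fmax_gain = {
    "loeffler_dct": improvement(TABLE3["loeffler_dct"][2], proposed[2]),
    "exact_dtt": improvement(exact[2], proposed[2]),
    "approx42_avg": improvement(fmax_42_avg, proposed[2]),
}

print(round(clb_reduction, 1), round(ff_reduction, 1))
print({name: round(gain, 1) for name, gain in fmax_gain.items()})
```

The CLB reduction evaluates to roughly 43.18%, which presumably explains why it is reported as 43.2% in this section and as 43.1% elsewhere in the paper, depending on whether the value is rounded or truncated.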
The ASIC realization was done by porting the hardware description language code to 0.18 µm CMOS technology, followed by synthesis and place-and-route using Cadence Encounter Digital Implementation (EDI) with AMS libraries. Libraries for the best-case scenario were employed in obtaining the place-and-route results, with a gate voltage of 1.8 V. The adopted figures of merit for the ASIC synthesis were: area ($A$) in mm², area-time complexity ($AT$) in mm²·ns, area-time-squared complexity ($AT^2$) in mm²·ns², dynamic power consumption ($D_p$) in mW/MHz, critical path delay ($T_{\mathrm{cpd}}$) in ns, and maximum operating frequency ($F_{\max}$) in MHz. Results are displayed in Table 4. The figures of merit $AT$ and $AT^2$ had percentage reductions of 57.7% and 57.4% when compared with the exact DTT. Moreover, the proposed design attained reductions of 17.3%, 20.6%, and 82.1% for $AT^2$, $T_{\mathrm{cpd}}$, and dynamic power consumption, respectively, when compared with [42] (taking the average of its forward and inverse realizations).

7 Discussion and Conclusion

In this work, a low-complexity near-orthogonal 8-point DTT approximation suitable for image and video coding was proposed. A fast algorithm for the proposed DTT approximation, which requires only 24 additions and six bit-shifting operations, was also introduced. This fast algorithm can be used for both forward and near-inverse transformations. The additive arithmetic cost of the proposed approximation is 45.5% and 2.05% lower when compared with the exact DTT fast algorithm and the DTT approximation in [42], respectively. Moreover, the proposed transform exhibited coding performance similar to that of the exact DTT and outperformed the previous approximation [42] according to computational experiments with popular image compression standards.
In terms of video coding, the results from the proposed tool were virtually indistinguishable from the ones furnished by the approximation in [42]. Thus, the new tool outperforms the competing methods both in computational cost and coding performance. The proposed method was embedded into the JPEG standard and the standard software library for H.264/AVC video coding. The obtained results showed negligible degradation when compared with the standard DCT-based compression methods in both cases. The 2-D versions were realized in FPGA using the ROACH2 hardware platform, and ASIC place and route was realized using Cadence Encounter with AMS standard cells. The results show a 43.1% reduction in the number of CLBs for the FPGA realization and a 57.7% reduction in the area-time figure for the ASIC place-and-route realization when compared with the exact DTT. The proposed design excels in providing high operating frequency and very low power consumption. Therefore, the proposed approximation offers low computational complexity while maintaining good coding performance. Systems that operate with limited processing resources and require video streaming can benefit from the proposed low-complexity codecs and low-power hardware. In particular, applications in the following contexts have such low-complexity requirements [72]: environmental monitoring, habitat monitoring, surveillance, structural monitoring, equipment diagnostics, disaster management, and emergency response [73].

Acknowledgment

Arjuna Madanayake thanks the Xilinx University Program (XUP) for the Xilinx Virtex-6 SX475 FPGA device installed on the ROACH2 board.

References

[1] A. F. Nikiforov, S. K. Suslov, and V. B. Uvarov, Classical Orthogonal Polynomials of a Discrete Variable, ser. Springer Series in Computational Physics. Springer Berlin Heidelberg, 1991.
[2] P. D. Dragnev and E. B.
Saff, “Constrained energy problems with applications to orthogonal polynomials of a discrete variable,” Journal d’Analyse Mathematique, vol. 72, pp. 223–259, 1997. [Online]. Available: http://dx.doi.org/10.1007/BF02843160
[3] M. Câmara, J. Fábrega, M. A. Fiol, and E. Garriga, “Some families of orthogonal polynomials of a discrete variable and their applications to graphs and codes,” The Electronic Journal of Combinatorics, vol. 16, pp. 1–30, 2009.
[4] H. Zhu, M. Liu, H. Shu, H. Zhang, and L. Luo, “General form for obtaining discrete orthogonal moments,” IET Image Processing, vol. 4, pp. 335–352, Oct 2010.
[5] A. Goshtasby, “Template matching in rotated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 7, no. 3, pp. 338–344, May 1985.
[6] M. I. Heywood and P. D. Noakes, “Fractional central moment method for movement-invariant object classification,” IEE Proceedings–Vision, Image and Signal Processing, vol. 142, no. 4, pp. 213–219, Aug 1995.
[7] V. Markandey and R. I. P. de Figueiredo, “Robot sensing techniques based on high-dimensional moment invariants and tensors,” IEEE Transactions on Robotics and Automation, vol. 8, no. 2, pp. 186–195, Apr 1992.
[8] R. Mukundan, S. H. Ong, and R. A. Lee, “Image analysis by Tchebichef moments,” IEEE Transactions on Image Processing, vol. 10, pp. 1357–1364, 2001.
[9] L. Leida, Z. Hancheng, Y. Gaobo, and Q. Jiansheng, “Referenceless measure of blocking artifacts by Tchebichef kernel analysis,” IEEE Signal Processing Letters, vol. 21, no. 1, pp. 122–125, Jan 2014.

Figure 5: First frame of the compressed ‘Foreman’ sequence, with a target bitrate of 200 kbps. Panels: (a) Foreman, H.264; (b) Foreman, approximation in [42]; (c) Foreman, proposed.
Table 3: Hardware resource consumption using Xilinx Virtex-6 XC6VSX475T 1FF1759 device

| Method | CLB | FF | T_cpd (ns) | F_max (MHz) |
|---|---|---|---|---|
| Exact DTT [15] | 2941 | 7271 | 7.688 | 130.07 |
| Approximation in [42] | 1515 | 6058 | 5.596 | 178.69 |
| Inverse approximation in [42] | 1713 | 4834 | 6.184 | 161.71 |
| Loeffler DCT [61] | 3250 | 4413 | 9.956 | 100.44 |
| Proposed DTT | 1671 | 5455 | 5.356 | 186.70 |

Table 4: Hardware resource consumption for CMOS 0.18 µm ASIC place and route

| Method | Area (mm²) | AT (mm²·ns) | AT² (mm²·ns²) | T_cpd (ns) | F_max (MHz) | D_p (mW/MHz) |
|---|---|---|---|---|---|---|
| Exact DTT [15] | 0.872 | 3.84 | 16.92 | 4.405 | 227.01 | 0.182 |
| Approximation in [42] | 0.237 | 1.34 | 7.52 | 5.635 | 177.46 | 0.171 |
| Inverse approximation in [42] | 0.323 | 1.79 | 9.89 | 5.536 | 180.64 | 0.724 |
| Loeffler DCT [61] | 0.684 | 4.20 | 25.85 | 6.148 | 162.65 | 1.961 |
| Proposed | 0.366 | 1.62 | 7.20 | 4.434 | 225.53 | 0.080 |

[10] J.-L. Rose, C. Revol-Muller, D. Charpigny, and C. Odet, “Shape prior criterion based on Tchebichef moments in variational region growing,” in 2009 16th IEEE International Conference on Image Processing (ICIP), Nov 2009, pp. 1081–1084.
[11] H. Zhang, X. Dai, P. Sun, H. Zhu, and H. Shu, “Symmetric image recognition by Tchebichef moment invariants,” in 2010 17th IEEE International Conference on Image Processing (ICIP), Sept 2010, pp. 2273–2276.
[12] Q. Li, H. Zhu, and Q. Liu, “Image recognition by combined affine and blur Tchebichef moment invariants,” in 2011 4th International Congress on Image and Signal Processing (CISP), vol. 3, Oct 2011, pp. 1517–1521.
[13] H. Huang, G. Coatrieux, H. Shu, L. Luo, and C. Roux, “Blind integrity verification of medical images,” IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 6, pp. 1122–1126, Nov 2012.
[14] S. Ishwar, P. K. Meher, and M. N. S. Swamy, “Discrete Tchebichef transform–a fast 4 × 4 algorithm and its application in image/video compression,” in 2008 IEEE International Symposium on Circuits and Systems (ISCAS), 2008, pp. 260–263.
[15] S. Prattipati, S. Ishwar, P. K.
Meher, and M. N. S. Swamy, “A fast 8 × 8 integer Tchebichef transform and comparison with integer cosine transform for image compression,” in 2013 IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS), 2013, pp. 1294–1297.
[16] N. A. Abu, S. L. Wong, N. Herman, and R. Mukundan, “An efficient compact Tchebichef moment for image compression,” in 2010 10th International Conference on Information Sciences Signal Processing and their Applications (ISSPA), May 2010, pp. 448–451.
[17] Q. Li and H. Zhu, “Block-based compressed sensing of image using directional Tchebichef transforms,” in 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct 2012, pp. 2207–2212.
[18] R. K. Senapati, U. C. Pati, and K. K. Mahapatra, “Reduced memory, low complexity embedded image compression algorithm using hierarchical listless discrete Tchebichef transform,” IET Image Processing, vol. 8, no. 4, pp. 213–238, Apr 2014.
[19] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90–93, Jan. 1974.
[20] F. Ernawan, N. A. Abu, and N. Suryana, “TMT quantization table generation based on psychovisual threshold for image compression,” in 2013 International Conference of Information and Communication Technology (ICoICT), Mar 2013, pp. 202–207.
[21] R. K. Senapati, U. C. Pati, and K. K. Mahapatra, “A low complexity embedded image coding algorithm using hierarchical listless DTT,” in 2011 8th International Conference on Information, Communications and Signal Processing (ICICS), Dec 2011, pp. 1–5.
[22] F. Ernawan, E. Noersasongko, and N. A. Abu, “An efficient 2 × 2 Tchebichef moments for mobile image compression,” in 2011 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), Dec 2011, pp. 1–5.
[23] L. W. Chew, L.-M.
Ang, and K. P. Seng, “Survey of image compression algorithms in wireless sensor networks,” in 2008 International Symposium on Information Technology (ITSim), vol. 4, Aug 2008, pp. 1–9.
[24] M. Guo, M. H. Ammar, and E. W. Zegura, “V3: a vehicle-to-vehicle live video streaming architecture,” in 2005 3rd IEEE International Conference on Pervasive Computing and Communication (PerCom), Mar 2005, pp. 171–180.
[25] D. H. Friedman, “Streaming implementation of video algorithms on a low-power parallel architecture,” in 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec 2013, pp. 650–653.
[26] K. Nakagaki and R. Mukundan, “A fast 4 × 4 forward discrete Tchebichef transform algorithm,” IEEE Signal Processing Letters, vol. 14, pp. 684–687, 2007.
[27] G. K. Wallace, “The JPEG still picture compression standard,” IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii–xxxiv, Feb 1992.
[28] International Organisation for Standardisation, “Generic coding of moving pictures and associated audio information – part 2: Video, ISO/IEC JTC1/SC29/WG11 – coding of moving pictures and audio,” 1994.
[29] International Telecommunication Union, “ITU-T recommendation H.261 version 1: Video codec for audiovisual services at p × 64 kbits,” Technical Report, ITU-T, 1990.
[30] International Telecommunication Union, “ITU-T recommendation H.263 version 1: Video coding for low bit rate communication,” Technical Report, ITU-T, 1995.
[31] I. Richardson, The H.264 Advanced Video Compression Standard, 2nd ed. John Wiley and Sons, 2010.
[32] G. J. Sullivan, J. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, pp. 1649–1668, 2012.
[33] F. Bossen, B. Bross, K. Suhring, and D.
Flynn, “HEVC complexity and implementation analysis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1685–1696, Dec 2012.
[34] Google Inc., “VP9,” The WebM Project, http://www.webmproject.org/vp9/, 2015.
[35] T. I. Haweel, “A new square wave transform based on the DCT,” Signal Processing, vol. 81, pp. 2309–2319, 2001. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0165168401001062
[36] R. J. Cintra and F. M. Bayer, “A DCT approximation for image compression,” IEEE Signal Processing Letters, vol. 18, no. 10, pp. 579–582, Oct 2011.
[37] F. M. Bayer and R. J. Cintra, “DCT-like transform for image compression requires 14 additions only,” Electronics Letters, vol. 48, no. 15, pp. 919–921, July 2012.
[38] R. J. Cintra, F. M. Bayer, and C. J. Tablada, “Low-complexity 8-point DCT approximations based on integer functions,” Signal Processing, vol. 99, pp. 201–214, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0165168413005161
[39] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “Low-complexity 8 × 8 transform for image compression,” Electronics Letters, vol. 44, no. 21, pp. 1249–1250, Oct 2008.
[40] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A low-complexity parametric transform for image compression,” in 2011 IEEE International Symposium on Circuits and Systems (ISCAS), May 2011, pp. 2145–2148.
[41] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “Binary discrete cosine and Hartley transforms,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, no. 4, pp. 989–1002, Apr 2013.
[42] P. A. M. Oliveira, R. J. Cintra, F. M. Bayer, S. Kulasekera, and A. Madanayake, “A discrete Tchebichef transform approximation for image and video coding,” IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1137–1141, Aug 2015.
[43] H. Bateman and A. Erdélyi, Higher Transcendental Functions. McGraw-Hill, 1953, vol. 2. [Online].
Available: http://books.google.com.br/books?id=p_lQAAAAMAAJ
[44] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-complexity transform and quantization in H.264/AVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598–603, Jul 2003.
[45] R. Blahut, Fast Algorithms for Signal Processing. Cambridge University Press, 2010.
[46] MATLAB, “version 8.1 (R2013a) documentation,” Natick, MA, 2013.
[47] J. W. Eaton, D. Bateman, S. Hauberg, and R. Wehbring, GNU Octave version 3.8.0 Documentation, 3rd ed. Free Software Foundation, Inc., Feb 2011.
[48] Python, “version 2.7.6 documentation,” Delaware, US, 2015.
[49] G. A. F. Seber, A Matrix Handbook for Statisticians, ser. Wiley Series in Probability and Mathematical Statistics. Hoboken, NJ: John Wiley and Sons, Inc., 2008.
[50] B. N. Flury and W. Gautschi, “An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form,” SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 1, pp. 169–184, Jan. 1986. [Online]. Available: http://dx.doi.org/10.1137/0907013
[51] V. K. Goyal, “Theoretical foundations of transform coding,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 9–21, Sept 2001.
[52] J. Katto and Y. Yasuda, “Performance evaluation of subband coding and optimization of its filter coefficients,” Journal of Visual Communication and Image Representation, vol. 2, pp. 303–313, 1991. [Online]. Available: http://www.sciencedirect.com/science/article/pii/1047320391900114
[53] V. Britanak, P. C. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms. Academic Press, 2007. [Online]. Available: http://books.google.com.br/books?id=iRlQHcK-r_kC
[54] C. J. Tablada, F. M. Bayer, and R. J.
Cintra, “A class of DCT approximations based on the Feig–Winograd algorithm,” Signal Processing, vol. 113, pp. 38–51, 2015.
[55] R. J. Cintra, H. M. Oliveira, and C. O. Cintra, “The rounded Hartley transform,” in Proceedings of the IEEE International Telecommunications Symposium–ITS’2002, Sept 2002, pp. 1357–1364.
[56] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, ser. Chapman & Hall digital multimedia standards series. Springer, 1993.
[57] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? A new look at signal fidelity measures,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, Jan 2009.
[58] C.-K. Fong and W.-K. Cham, “LLM integer cosine transform and its fast algorithm,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 844–854, Jun 2012.
[59] F. M. Bayer, R. J. Cintra, A. Madanayake, and U. S. Potluri, “Multiplierless approximate 4-point DCT VLSI architectures for transform block coding,” Electronics Letters, vol. 49, no. 24, pp. 1532–1534, Nov 2013.
[60] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed., ser. Prentice-Hall signal processing series. Prentice Hall, 2010.
[61] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, “A practical fast 1-D DCT algorithms with 11 multiplications,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, May 1989, pp. 988–991.
[62] University of Southern California, Signal and Image Processing Institute, “The USC-SIPI image database,” http://sipi.usc.edu/database/, 2015.
[63] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr 2004.
[64] L. Zhang and H.
Li, “SR-SIM: A fast and high performance IQA index based on spectral residual,” in 2012 19th IEEE International Conference on Image Processing (ICIP), Sep 2012, pp. 1473–1476.
[65] Z. Wang and A. C. Bovik, “Reduced- and no-reference image quality assessment,” IEEE Signal Processing Magazine, vol. 28, no. 6, pp. 29–40, Nov 2011.
[66] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, ser. Prentice Hall Signal Processing Series. Upper Saddle River, NJ: Prentice-Hall, 1993, vol. 1.
[67] R. Pandit, N. Khosla, G. Singh, and H. Sharma, “Image compression and quality factor in case of JPEG image format,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, pp. 2578–2581, Jul 2013.
[68] x264 team, “x264,” http://www.videolan.org/developers/x264.html, 2015.
[69] S. Gordon, D. Marpe, and T. Wiegand, “Simplified use of 8 × 8 transform–updated proposal and results,” Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, doc. JVT–K028, Munich, Germany, Mar 2004.
[70] “Xiph.org Video Test Media,” https://media.xiph.org/video/derf/, 2015.
[71] (2015) ROACH2. https://casper.berkeley.edu.
[72] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury, “A survey on wireless multimedia sensor networks,” Computer Networks, vol. 51, pp. 921–960, 2007.
[73] N. Kimura and S. Latifi, “A survey on data compression in wireless sensor networks,” in 2005 International Conference on Information Technology: Coding and Computing (ITCC), vol. 2, Apr 2005, pp. 8–13.
