Gradient-based Filter Design for the Dual-tree Wavelet Transform

Gradien t-based Filter Design for the Dual-tree W a v elet T ransform Daniel Recoskie ∗ and Ric hard Mann Univ ersity of W aterlo o Cheriton Sc ho ol of Computer Science W aterlo o, Canada { dprecosk, mannr } @u waterloo.ca Abstract The w av elet transform has seen success when incorp orated in to neu- ral netw ork architectures, such as in w av elet scattering netw orks. More recen tly , it has b een shown that the dual-tree complex w av elet transform can provide b etter representations than the standard transform. With this in mind, we extend our previous method for learning ﬁlters for the 1D and 2D wa velet transforms in to the dual-tree domain. W e sho w that with few mo diﬁcations to our original model, we can learn directional ﬁlters that leverage the properties of the dual-tree w av elet transform. 1 In tro duction In this work we explore the task of learning ﬁlters for the dual-tree complex w av elet transform [9, 17]. This transform was introduced to address several shortcomings of the separable, real-v alued w av elet transform algorithm. How- ev er, the dual-tree transform requires greater care when designing ﬁlters. The added complexit y mak es the transform a go od candidate to replace the tradi- tional ﬁlter deriv ations with learning. W e demonstrate that it is p ossible to learn ﬁlters for the dual-tree c ompl ex wa v elet transform in a similar fashion to [16, 15]. W e sho w that very few changes to the original auto encoder framew ork are necessary to learn ﬁlters that o vercome the limitations of the separable 2D w av elet transform. W av elet represen tations ha ve been sho wn to perform w ell on a v ariet y of ma- c hine learning tasks. Speciﬁcally , wa velet scattering net w orks hav e shown state- of-the-art results despite the fact they use a ﬁxed representation (in contrast to the learned representations of con volutional neural netw orks) [2, 11, 3, 12]. ∗ Thanks to NSER C for funding. 1 Similar work has b een done using oriented bandpass ﬁlters in the SOE-Net [6]. More recently , it has b een sho wn that extending this work using the dual-tree complex w av elet transform can lead to impro ved results [18, 19]. In all of these examples, the ﬁlters used in the net works are ﬁxed. Building upon our work in [16, 15], the goal of this work is to demonstrate that it is possible to learn v alid ﬁlters for the dual-tree complex wa velet transform for use in neural netw ork arc hitectures. 2 Problems with the W a velet T ransform The discrete real wa velet transform has man y desirable prop erties, such as a linear time algorithm and basis functions (w av elets) that are not ﬁxed. Ho w- ev er, the transform do es ha ve some dra wbacks. The four ma jor limitations that we will consider are: shift v ariance, oscillations, lac k of directionality , and aliasing. In order to ov ercome these problems, w e will need to make use of com- plex wa velets. W e will restrict ourselves to a single formulation of the complex w av elet transform kno wn as the dual-tree complex wa v elet transform (DTCWT) [9]. The DTCWT ov ercomes the limitations of the standard wa v elet trans- form (with some computational o verhead). Before going into the details of the DTCWT, we will discuss the issues with the standard wa velet transform. A more detailed discussion can b e found in [17]. 2.1 Shift V ariance A ma jor problem with the real-v alued wa velet transform is shift v ariance. A desirable prop ert y for any representation is that small p erturbations of the input should result only in small p erturbations in the feature representation. In the case of the wa velet transform, translating the input (ev en b y a single sample) can result in large c hanges to the w av elet co eﬃcien ts. Figure 1 demonstrates the issue of shift v ariance. W e plot the third scale w a velet coeﬃcients of a signal comp osed of a single step edge. The signal is shifted by a single sample, and the co eﬃcien ts are recomputed. Note that the coeﬃcient v alues c hange signiﬁcan tly after the shift. 2.2 Oscillations Figure 1 also demonstrates that w av elet co eﬃcien ts are not stable near signal singularities such as step edges or impulses. Note how the wa velet co eﬃcien ts c hange in v alue and sign near the edge. W e can see this problem is more detail b y lo oking at the non-decimated w av elet co eﬃcien ts of the step edge in Figure 2. The co eﬃcien ts necessarily oscillate b ecause the wa velet ﬁlter is highpass. 2.3 Lac k of Directionality The standard multidimensional wa v elet transform algorithm makes use of sepa- rable ﬁlters. In other w ords, a single one-dimensional ﬁlter is applied along each 2 (a) (b) (c) Figure 1: (a) A signal containing a single step edge. (b) The third lev el Daub ec hies 6 wa velet coeﬃcients of (a). (c) Same as (b) but with the signal in (a) shifted by a single sample. (a) (b) Figure 2: (a) A signal containing a single step edge. (b) The (undecimated) third level Daubechies 6 wa velet coeﬁcients corresp onding to the edge. dimension of the input. Though this method lends itself to an eﬃcien t imple- men tation of the algorithm, there are some problems in the impulse resp onses of the ﬁlters. Namely , we can only properly achiev e horizontal and v ertical directions of the ﬁlters, while the diagonal direction suﬀers from c heck erb oard artifacts. Figure 3 shows impulse responses of a t ypical wa v elet ﬁlter in the 2D transform. The impulse resp onses demonstrate these problems. The problem of not having directionalit y can b e seen b y reconstructing sim- ple images from w av elet co eﬃcien ts at a single scale. Figure 4 sho ws tw o images reconstructed from their fourth scale wa v elet co eﬃcien ts. Note that the hori- zon tal and vertical edges in the ﬁrst image appear without an y artifacts. The diagonal edges, on the other hand, hav e signiﬁcan t irregularities. The curved image illustrates this further, as almost all edges are oﬀ-axis. 2.4 Aliasing The ﬁnal problem we w i ll discuss is aliasing. The standard decimating discrete w av elet transform algorithm downsamples the w av elet co eﬃcien ts b y a factor of t wo after the wa velet ﬁlters are applied. Normally we must apply a lowpass 3 Figure 3: T ypical impulse resp onse of ﬁlters used in the 2D w av elet transform. (a) (b) Figure 4: (a) Left: An image with three edge orientations. Righ t: Reconstruc- tion of the image using only the fourth band Daub ec hies 2 w av elet co eﬃcien ts. Note the irregularities on the edges that are not axis-aligned. (b) Same as (a), but using an image with a curved edge. (an tialiasing) ﬁlter prior to downsampling a signal to preven t aliasing. Aliasing o ccurs when downsampling is p erformed on a signal that contains frequencies ab o ve the Nyquist rate. The energy of these high frequencies is reﬂected onto the lo w frequency comp onen ts, causing artifacts. The wa v elet transform av oids aliasing by careful construction of the wa velet ﬁlter. Thus, the original signal can b e p erfectly reconstructed from the do wnsampled w av elet coeﬃcients. Ho wev er, an y c hanges made to the wa velet co eﬃcien ts prior to p erforming the in verse transform (suc h as quantization) can cause aliasing in the reconstructed signal. Figure 5 shows the eﬀects of quantization of a simple 1D signal. In the ﬁrst case, quan tization is p erformed in the sample domain, leading to step edges in the signal. In the second case, a one lev el wa velet transform is ﬁrst applied to the signal. Quantization is then performed on the wa v elet coeﬃcients prior to reconstruction. This second metho d of quantization introduces artifacts that did not o ccur when quantizing in the sample domain. 4 (a) (b) (c) Figure 5: (a) Original signal. (b) The signal from (a) quantized to nine levels. (c) A one level w av elet transform is ﬁrst applied to (a), the wa v elet co eﬃcien ts are quantized to nine levels, and an inv erse transform is applied. Note the artifacts near the step edges. 3 The W a v elet T ransform 3.1 1D W a velet T ransform The wa velet transform is a linear time-frequency transform that was introduced b y Haar [5]. It mak es use of a dictionary of functions that are lo calized in time and frequency . These functions, called wa velets, are dilated and shifted versions of a single mother w av elet function. Our work will b e restricted to discrete w av elets which are deﬁned as ψ j [ n ] = 1 2 j ψ  n 2 j  (1) for n, j ∈ Z . W e restrict the wa velets to hav e unit norm and zero mean. Since the wa velet functions are bandpass, w e require the introduction of a scaling func- tion, φ , so that we can can cov er all frequencies down to zero. The magnitude of the F ourier transform of the scaling function is set so that [10] | ˆ φ ( ω ) | 2 = Z + ∞ 1 | ˆ ψ ( sω ) | 2 s ds. (2) Supp ose x is a discrete, uniformly sampled signal of length N . The w av elet co eﬃcien ts are computed by con volving eac h of the w av elet functions with x . The discrete wa velet transform is deﬁned as W x [ n, 2 j ] = N − 1 X m =0 x [ m ] ψ j [ m − n ] . (3) The discrete wa velet transform is computed by wa y of an eﬃcient iterative algorithm. The algorithm mak es use of only tw o ﬁlters in order to compute all the wa v elet co eﬃcien ts from Equation 3. The ﬁrst ﬁlter, h [ n ] =  1 √ 2 φ  t 2  , φ ( t − n )  (4) 5 is called the scaling (lowpass) ﬁlter. The second ﬁlter, g [ n ] =  1 √ 2 ψ  t 2  , φ ( t − n )  (5) is called the wa velet (highpass) ﬁlter. Eac h iteration of the algorithm computes the following, a j +1 [ p ] = + ∞ X n = −∞ h [ n − 2 p ] a j [ n ] (6) d j +1 [ p ] = + ∞ X n = −∞ g [ n − 2 p ] a j [ n ] (7) with a 0 = x . The detail co eﬃcien ts, d j , corresp ond exactly to the wa velet co eﬃcien ts in Equation 3. The appro ximation co eﬃcien ts, a j , are successiv ely blurred v ersions of the original signal. The co eﬃcien ts are downsampled by a factor of tw o after each iteration. If the wa velet and scaling functions are orthogonal, we can reconstruct the signal according to a j [ p ] = + ∞ X n = −∞ h [ p − 2 n ] a j +1 [ n ] + + ∞ X n = −∞ g [ p − 2 n ] d j +1 [ n ] (8) The co eﬃcien ts must b e upsampled at eac h iteration b y inserting zeros at even indices. The reconstruction is called the inv erse discrete w av elet transform. 3.2 2D W a velet T ransform In order to process 2D signals, such as images, we must use a mo diﬁed version of the w av elet transform that can b e applied to multidimensional signals. The simplest modiﬁcation is to apply the ﬁlters separately along eac h dimension [14]. Figure 6a sho ws the coeﬃcients computed for a single iteration of the algorithm, where the input is an image. Let us deﬁne the following four co eﬃcien t comp onen ts appro ximation: LR ( LC ( x )) (9) detail horizontal: LR ( H C ( x )) (10) detail vertical: H R ( LC ( x )) (11) detail diagonal: H R ( H C ( x )) (12) where LR and H R corresp ond to conv olving the scaling and wa v elet ﬁlters re- sp ectiv ely along the rows. W e can deﬁne LC and H C similarly for the columns. 6 lowpass rows lowpass columns highpass columns lowpass columns highpass rows highpass columns LR LC LR HC HR LC HR HC - approximation co e ﬃ cien ts - detail coe ﬃ cients (a) LR LC HR HC LR HC LR HC LR HC HR LC HR HC HR LC HR LC HR HC (b) Figure 6: (a) One iteration of the 2D discrete wa velet transform. (b) Typical arrangemen t of the wa velet coeﬃcients after three iterations of the algorithm. A t eac h iteration of the algorithm, we compute the appro ximation co eﬃcien ts from Equation 9 and the three detail coeﬃcient comp onen ts from Equations 10 – 12. Like in the 1D case, the co eﬃcien ts are subsampled after eac h conv o- lution. The co eﬃcien ts are t ypically arranged as in Figure 6b. Performing the 2D wa v elet transform in this manner is used in the JPEG2000 standard [4]. 3.3 Dual-T ree Complex W av elet T ransform The problems with the standard w av elet transform discussed in the previous section can b e ov ercome with the DTCWT [17] (see Figures 7, 8, and 9). W e will restrict our discussion to the 2D version of the transform. As its name suggests, the dual-tree w av elet transform mak es use of multiple computational trees. The trees each use a diﬀerent wa velet and scaling ﬁlter pair. W e will b egin our discussion with the real v ersion of the dual-tree transform. Let us denote the ﬁlters h i and g i , where i = 1 for the ﬁrst tree and i = 2 for the second tree. The dual-tree algorithm pro ceeds similarly to the 2D version. Lik e in the 2D case, w e compute four comp onen ts of co eﬃcien ts for eac h tree: LR i ( LC i ( x )), LR i ( H C i ( x )), H R i ( LC i ( x )), and H R i ( H C i ( x )). Each tree com- putes its own co eﬃcien t matrix W i . The ﬁnal co eﬃcien t matrices are computed as follows: W 1 ← W 1 ( x ) + W 2 ( x ) √ 2 (13) W 2 ← W 1 ( x ) − W 2 ( x ) √ 2 (14) W e therefore ha ve six bands of detail co eﬃcien ts at each level of the transform, as opp osed to three in the 2D case. Example impulse resp onses for real ﬁlters 7 is shown in Figure 13a. Note that they are oriented along six diﬀeren t direc- tions and there is no chec kerboard eﬀect. The drawbac k of the real dual-tree transform is that it is not approximately shift inv ariant [17]. The complex transform is similarly computed. The main diﬀerence is that there are a total of four trees instead of t wo (for a total of tw elv e detail bands). The tw elv e bands are of the form LR i ( H C j ( x )) , H R i ( LC j ( x )) , H R i ( H C j ( x )) (15) for i, j ∈ { 1 , 2 } . Let W i,j represen t the three bands of detail coeﬃcients using ﬁlter i for the rows and ﬁlter j for the columns. The ﬁnal complex detail co eﬃcien ts are computed as: W 1 , 1 ← W 1 , 1 ( x ) + W 2 , 2 ( x ) √ 2 (16) W 2 , 2 ← W 1 , 1 ( x ) − W 2 , 2 ( x ) √ 2 (17) W 1 , 2 ← W 1 , 2 ( x ) + W 2 , 1 ( x ) √ 2 (18) W 2 , 1 ← W 1 , 2 ( x ) − W 2 , 1 ( x ) √ 2 (19) In this work we will fo cus on q -shift ﬁlters [17]. That is, we restrict h 2 [ n ] = h 1 [ − n ] . (20) In other w ords, the ﬁlters used in the second tree are the reverse of the ﬁlters used in the ﬁrst tree. F urthermore, the ﬁlters used in the ﬁrst iteration of the algorithm m ust b e diﬀeren t than the ﬁlters used for the remaining iterations [17]. The initial ﬁlters must also ob ey the q -shift prop ert y , but b e oﬀset by one sample. W e will denote these ﬁlters using a prime symbol (e.g., h 0 i is the initial ﬁlter in the i th tree). Finally , we restrict ourselves to quadrature mirror ﬁlters, i.e., g [ n ] = ( − 1) n h [ − n ] . (21) Th us, we can deriv e the w av elet ﬁlters directly from the scaling ﬁlter b y re- v ersing it and negating alternating indices. Equations 20 and 21 mean w e can completely deﬁne the transform with the ﬁlters h 1 and h 0 1 . 4 The W a v elet T ransform as a Neural Net w ork The dual-tree w av elet transform is computed using tw o main op erations: con- v olution and do wnsampling. These tw o op erations are the same as those used in traditional con volutional neural netw orks (CNNs). With this observ ation in mind, we prop ose framing the dual-tree wa v elet transform as a mo diﬁed CNN 8 (a) (b) (c) Figure 7: (a) A signal containing a single step edge. (b) The magnitude of the third level DTCWT coeﬃcients of (a). (c) Same as (b) but with the signal in (a) shifted by a single sample. (a) (b) Figure 8: (a) Left: An image with three edge orientations. Righ t: Reconstruc- tion of the image using only the fourth band DTCWT co eﬃcien ts. (a) (b) (c) Figure 9: (a) Original signal. (b) The signal from (a) quantized to nine levels. (c) A one lev el DTCWT is ﬁrst applied to (a), the wa v elet co eﬃcien ts are quan tized to nine lev els, and an in v erse transform is applied. Note how the artifacts are muc h le ss pronounced. arc hitecture. See Figures 10 and 11 for illustrations of the real and complex v er- sions of the net work resp ectiv ely . This net work directly implements the wa velet transform algorithm, with eac h la yer represen ting a single iteration. The output of the netw ork are the wa velet co eﬃcien ts of the input image. T o demonstrate how this mo del b eha v es, w e construct an autoenco der frame- w ork [7] consisting of a dual-tree wa velet netw ork follow ed by an inv erse net work. 9 W 1 ( x ) W 2 ( x ) W 1 ( x )+ W 2 ( x ) p 2 W 1 ( x )  W 2 ( x ) p 2 initial ﬁ lter pair second ﬁ lter pair Figure 10: The real dual-tree w av elet transform net work. Eac h W 1 and W 2 are computed as in Equations 13 and 14. W i;j ( x ) initial ﬁ lter pair second ﬁ lter pair ﬁ lter i along rows ﬁ lter j along columns W i;j ( x ) initial ﬁ lter pair second ﬁ lter pair ﬁ lter i along rows ﬁ lter j along columns W 1 ; 1 ( x ) W 1 ; 2 ( x ) W 2 ; 1 ( x ) W 2 ; 2 ( x ) ± Figure 11: The complex dual-tree wa velet transform net work. Each W i,j is computed as in Equations 16 – 19. The in termediate representation in the auto encoder are exactly the wa velet co- eﬃcien ts of the input image. W e will imp ose a sparsit y constrain t on the wa velet co eﬃcien ts so that the mo del will learn ﬁlters that can summarize the structure of the training set. In order for our learned ﬁlters to compute a v alid wa velet transform, we use the following constrain ts on the ﬁlters [16]: L w ( h ) = ( || h || 2 − 1) 2 + ( µ h − √ 2 /k ) 2 + µ 2 g (22) where µ h and µ g are the means of the scaling and w av elet ﬁlters resp ectiv ely . The ﬁrst t wo terms are necessary so that the scaling function will hav e unit L 2 norm and ﬁnite L 1 norm [13, 10]. The third term requires that the w av elet ﬁlter has zero mean. In order to av oid degenerate ﬁlters, w e will require an additional loss term. W e desire ﬁlters that will giv e lo calized and cen tered impulse resp onses. There- fore, w e prop ose a loss that prefers impulse resp onses that are close to Gaussian. Let M k b e the matrix corresp onding to the magnitude of the impulse resp onse for the k th band (i.e. the sum of the squares of the real and imaginary impulse 10 resp onses). And let G b e a circular Gaussian matrix of the form: G i,j = αe − [( i − i o ) 2 +( j − j 0 ) 2 ] / (2 σ 2 ) (23) with i 0 and j 0 c hosen to cen ter the Gaussian in the matrix. The new loss term is of the form L g = 6 X k =1 || G − M k || 2 2 (24) W e will learn the netw ork parameters using gradient descent, and so must deﬁne a loss function ov er a dataset of images X = { x 1 , x 2 , . . . x M } : L ( X ; h 1 , h 0 1 ) = 1 M M X k =1 || x k − ˆ x k || 2 2 + λ 1 1 M M X k =1 X i,j || W i,j ( x k ) || 1 + λ 2 [ L w ( h 1 ) + L w ( h 0 1 )] + λ 3 L g (25) where ˆ x k is the reconstruction of the auto encoder. W e use mean squared error for our reconstruction loss and the L 1 norm for the sparsity p enalt y . The λ parameters control the trade-oﬀ b et ween the loss terms. F or the real v alued net work, the index j is remov ed from the sparsit y summation. 5 Exp erimen ts W e make use of synthetic images to demonstrate that our mo del is able to learn ﬁlters with localized directional structure. The generation pro cess is the same one used in [15], but with a restriction to sine wa ves. The pro cess extends the syn thetic generation pro cess for 1D data from [16]: x ( t ) = K − 1 X k =0 a k · sin (2 k t + φ k ) (26) where φ k ∈ [0 , 2 π ] and a k ∈ { 0 , 1 } are chosen uniformly at random. In Equation 26, φ k is a phase oﬀset and a k is an indicator v ariable that determines whether a particular harmonic is presen t. W e ﬁx the size of the images to b e 128 × 128. Figure 12 shows some example syn thetic images. W e set λ 1 = . 1, λ 2 = 1, λ 3 = 4e-5 (real netw ork), and λ 3 = 4e-4 (complex net work). F or the L g cost, we compute the impulse resp onse at the fourth scale and set α = . 02 and σ = 10 (found empirically). W e train using sto c hastic gradien t descent using the Adam algorithm [8]. Our implementation is done using T ensorﬂo w [1]. Figure 13 shows the impulse resp onses of the ﬁlters learned by our real and complex mo dels. In this case, all ﬁlters w ere of length ten. W e can see that 11 Figure 12: Examples of synthetic images. (a) Real transform (b) Complex transform Figure 13: Example impulse responses of ﬁlters learned using our (a) real and (b) complex mo d el. The top and b ottom ro ws of (b) corresp ond to the real and complex parts of the ﬁlter responses resp ectiv ely . the mo del is able to learn lo calized, directional ﬁlters from the data. Figure 14 sho ws a comparison of the learned h ﬁlter from Figure 13b to Kingsbury’s Q-shift ﬁlters. Note that h has a v ery similar structure. The shifted cosine distance measure used in [16] is sho wn ab o ve each plot. See the app endix for more ﬁlters and plots. The Gaussian loss term is important for learning directional ﬁlters. Figure 15 sho w a selection of ﬁlters learned without the impulse response loss term (i.e. λ 3 = 0 in Equation 25). W e can see that the ﬁlters are no longer directional, and share similar structure to the impulse resp onses from Figure 3. 6 Conclusion W e presen ted a model based on the dual-tree complex w av elet transform that is capable of learning directional ﬁlters that ov ercome the limitations of the standard 2D w a velet transform. W e made use of an autoenco der framework, and constrained our loss function to fav our orthogonal w av elet ﬁlters that produced sparse representations. This w ork builds upon our previous w ork on the 1D and 2D wa velet transform. W e prop ose this metho d as an alternative to the traditional ﬁlter design metho ds used in wa velet theory . 12 2 4 6 8 10 -0.2 0 0.2 0.4 0.6 0.8 Q-shift learned distance: 2.135335e-03 (a) 2 4 6 8 10 -0.2 0 0.2 0.4 0.6 0.8 Q-shift learned distance: 4.239467e-03 (b) Figure 14: Comparison of Kingsbury’s Q-shift (a) 6 tap, and (b) 10 tap ﬁlters with the learned h ﬁlter from Figure 13b. Note that h has b een reversed. (a) (b) (c) (d) Figure 15: Example impulse responses of ﬁlters learned using our (a,c) real and (b,d) complex mo del without the impulse resp onse loss term (i.e. λ 3 = 0). References [1] Mart ´ ın Abadi et al. T ensorFlow: Large-scale machine learning on hetero- geneous systems, 2015. Softw are a v ailable from tensorﬂo w.org. [2] Joan Bruna and St ´ ephane Mallat. Classiﬁcation with scattering op era- tors. In Computer Vision and Pattern R e c o gnition (CVPR), 2011 IEEE Confer enc e on , pages 1561–1566. IEEE, 2011. 13 [3] Joan Bruna and St´ ephane Mallat. In v arian t scattering con volution net- w orks. IEEE tr ansactions on p attern analysis and machine intel ligenc e , 35(8):1872–1886, 2013. [4] Charilaos Christop oulos, A thanassios Sk o dras, and T ouradj Ebrahimi. The jp eg2000 still image coding system: an ov erview. IEEE tr ansactions on c onsumer ele ctr onics , 46(4):1103–1127, 2000. [5] Alfred Haar. Zur theorie der orthogonalen funktionensysteme. Mathema- tische Annalen , 69(3):331–371, 1910. [6] Isma Hadji and Richard P Wildes. A spatiotemp oral oriented energy net- w ork for dynamic texture recognition. In Pr o c e e dings of the IEEE Con- fer enc e on Computer Vision and Pattern R e c o gnition , pages 3066–3074, 2017. [7] Geoﬀrey E Hinton and Ruslan R Salakhutdino v. Reducing the dimension- alit y of data with neural netw orks. scienc e , 313(5786):504–507, 2006. [8] Diederik Kingma and Jimmy Ba. Adam: A metho d for sto c hastic opti- mization. arXiv pr eprint arXiv:1412.6980 , 2014. [9] Nick Kingsbury . The dual-tree complex wa velet transform: a new eﬃcient to ol for image restoration and enhancement. In Signal Pr o c essing Confer- enc e (EUSIPCO 1998), 9th Eur op e an , pages 1–4. IEEE, 1998. [10] St´ ephane Mallat. A Wavelet T our of Signal Pr o c essing, Thir d Edition: The Sp arse Way . Academic Press, 3rd edition, 2008. [11] St´ ephane Mallat. Group inv ariant scattering. Communic ations on Pur e and Applie d Mathematics , 65(10):1331–1398, 2012. [12] St´ ephane Mallat. Understanding deep conv olutional netw orks. Phil. T r ans. R. So c. A , 374(2065):20150203, 2016. [13] Stephane G Mallat. Multiresolution appro ximations and wa velet orthonor- mal bases of L 2 ( R ). T r ansactions of the A meric an mathematic al so ciety , 315(1):69–87, 1989. [14] Stephane G Mallat. A theory for multiresolution signal decomp osition: the w av elet represen tation. IEEE tr ansactions on p attern analysis and machine intel ligenc e , 11(7):674–693, 1989. [15] Daniel Recoskie and Ric hard Mann. Learning ﬁlters for the 2D wa velet transform. In Computer and R ob ot Vision (CR V), 2018 15th Confer enc e on , pages 1–7. IEEE, 2018. [16] Daniel Recoskie and Ric hard Mann. Learning sparse w av elet representa- tions. arXiv pr eprint arXiv:1802.02961 , 2018. 14 [17] Iv an W Selesnick, Richard G Baraniuk, and Nic k C Kingsbury . The dual-tree complex w av elet transform. IEEE signal pr o c essing magazine , 22(6):123–151, 2005. [18] Amarjot Singh and Nic k Kingsbury . Dual-tree w av elet scattering net work with parametric log transformation for ob ject classiﬁcation. In A c oustics, Sp e e ch and Signal Pr o c essing (ICASSP), 2017 IEEE International Confer- enc e on , pages 2622–2626. IEEE, 2017. [19] Amarjot Singh and Nic k Kingsbury . Eﬃcient conv olutional netw ork learn- ing using parametric log based dual-tree wa velet scatternet. In Pr o c e e d- ings of the IEEE Confer enc e on Computer Vision and Pattern R e c o gnition , pages 1140–1147, 2017. 15 App endix A Learned Filter V alues W e include the learned ﬁlter v alues from Figures 15 and 13. A.1 Real dual-tree ﬁlters h 0 1 h 1 +7 . 538e-04 +3 . 207e-02 − 6 . 960e-02 − 4 . 052e-03 − 4 . 274e-02 − 5 . 705e-02 +4 . 233e-01 +2 . 732e-01 +7 . 917e-01 +7 . 357e-01 +4 . 234e-01 +5 . 615e-01 − 4 . 291e-02 +5 . 348e-03 − 6 . 972e-02 − 1 . 456e-01 +2 . 573e-04 − 5 . 864e-03 +4 . 025e-04 +2 . 406e-02 Filters from Figure 13a 2. h 0 1 h 1 − 1 . 010e-02 +1 . 970e-02 +3 . 534e-02 − 3 . 822e-02 +2 . 407e-02 − 1 . 078e-01 − 8 . 123e-02 +2 . 241e-01 +2 . 722e-01 +7 . 332e-01 +7 . 981e-01 +6 . 144e-01 +5 . 143e-01 +5 . 249e-02 − 4 . 624e-02 − 1 . 302e-01 − 9 . 701e-02 +9 . 322e-03 − 5 . 738e-04 +3 . 693e-02 Filters from Figure 15a. h 0 1 h 1 +1 . 956e-02 +1 . 970e-02 − 5 . 031e-02 − 3 . 822e-02 − 7 . 082e-02 − 1 . 078e-01 +4 . 035e-01 +2 . 241e-01 +8 . 096e-01 +7 . 332e-01 +4 . 028e-01 +6 . 144e-01 − 7 . 107e-02 +5 . 249e-02 − 4 . 922e-02 − 1 . 302e-01 +1 . 958e-02 +9 . 322e-03 − 1 . 094e-05 +3 . 693e-02 Filters from Figure 15c. 2 4 6 8 10 -0.2 0 0.2 0.4 0.6 0.8 Q-shift learned distance: 2.548161e-03 (a) 2 4 6 8 10 -0.2 0 0.2 0.4 0.6 0.8 Q-shift learned distance: 3.496355e-03 (b) Figure 16: Comparison of Kingsbury’s Q-shift (a) 6 tap, and (b) 10 tap ﬁlters with the learned h ﬁlter from Figure 13a. 16 A.2 Complex dual-tree ﬁlters h 0 1 h 1 +1 . 017e-02 +1 . 914e-02 +3 . 677e-02 − 1 . 947e-02 − 3 . 689e-02 − 1 . 469e-01 − 5 . 913e-02 +3 . 960e-02 +3 . 982e-01 +6 . 020e-01 +8 . 018e-01 +7 . 357e-01 +4 . 223e-01 +2 . 393e-01 − 7 . 171e-02 − 8 . 599e-02 − 8 . 616e-02 − 5 . 614e-03 +4 . 248e-05 +3 . 850e-02 Filters from Figure 13b. h 0 1 h 1 +1 . 523e-02 +2 . 826e-02 − 5 . 440e-02 − 1 . 266e-02 − 6 . 360e-02 − 1 . 225e-01 +4 . 075e-01 +1 . 364e-01 +8 . 054e-01 +6 . 800e-01 +4 . 075e-01 +6 . 807e-01 − 6 . 346e-02 +1 . 358e-01 − 5 . 442e-02 − 1 . 220e-01 +1 . 547e-02 − 1 . 340e-02 +1 . 666e-04 +2 . 921e-02 Filters from Figure 15b. h 0 1 h 1 +1 . 903e-02 − 8 . 475e-02 − 4 . 904e-02 − 8 . 946e-02 − 7 . 057e-02 +3 . 345e-01 +4 . 024e-01 +7 . 659e-01 +8 . 090e-01 +5 . 145e-01 +4 . 020e-01 − 1 . 701e-02 − 7 . 052e-02 − 9 . 557e-02 − 4 . 888e-02 +5 . 711e-02 +1 . 931e-02 +3 . 971e-02 +5 . 240e-04 − 9 . 468e-03 Filters from Figure 15d. 2 4 6 8 10 -0.2 0 0.2 0.4 0.6 0.8 Q-shift learned distance: 2.135335e-03 (a) 2 4 6 8 10 -0.2 0 0.2 0.4 0.6 0.8 Q-shift learned distance: 4.239467e-03 (b) Figure 17: Comparison of Kingsbury’s Q-shift (a) 6 tap, and (b) 10 tap ﬁlters with the learned h ﬁlter from Figure 13b. Note that h has b een reversed. 17 h 0 1 h 1 +1 . 640e-02 − 1 . 571e-03 − 5 . 179e-02 − 2 . 007e-03 − 6 . 775e-02 +2 . 725e-02 +4 . 049e-01 − 2 . 220e-03 +8 . 068e-01 − 1 . 437e-01 +4 . 050e-01 +2 . 654e-03 − 1 . 189e-01 +5 . 598e-01 +1 . 654e-02 +7 . 526e-01 +3 . 910e-04 +2 . 865e-01 − 7 . 620e-02 − 2 . 374e-02 +3 . 958e-02 +2 . 277e-03 − 7 . 738e-03 Learned length 10 h 0 and length 14 h ﬁlters. 2 4 6 8 10 12 14 -0.2 0 0.2 0.4 0.6 0.8 Q-shift learned distance: 2.285723e-03 Figure 18: Comparison of Kingsbury’s Q-shift 14 tap ﬁlter with the learned length 14 h . Note that h has b een rev ersed. h 0 1 h 1 +1 . 952e-02 +4 . 335e-04 − 4 . 506e-02 +3 . 752e-04 − 6 . 062e-02 − 6 . 027e-03 +4 . 069e-01 +3 . 851e-03 +8 . 066e-01 +3 . 955e-02 +4 . 043e-01 − 2 . 870e-02 − 7 . 323e-02 − 8 . 351e-02 − 6 . 027e-02 +2 . 887e-01 +1 . 452e-02 +7 . 560e-01 − 9 . 868e-05 +5 . 569e-01 +3 . 984e-03 − 1 . 344e-01 − 1 . 460e-03 +2 . 056e-02 − 2 . 688e-03 − 5 . 462e-04 − 1 . 798e-03 +5 . 302e-05 Learned length 10 h 0 and length 18 h ﬁlters ( λ 3 w as increased to 8e-4). 5 10 15 -0.2 0 0.2 0.4 0.6 0.8 Q-shift learned distance: 1.437532e-03 Figure 19: Comparison of Kingsbury’s Q-shift 18 tap ﬁlter with the learned length 18 h . 18 App endix B F ull Dual-tree Complex W a velet T rans- form Net w ork W 2 ; 1 ( x ) W 2 ; 2 ( x ) W 1 ; 2 ( x ) − W 2 ; 1 ( x ) p 2 W 1 ; 1 ( x ) − W 2 ; 2 ( x ) p 2 initial ﬁ lter pair second ﬁ lter pair W 1 ; 1 ( x ) W 1 ; 2 ( x ) W 1 ; 1 ( x )+ W 2 ; 2 ( x ) p 2 W 1 ; 2 ( x )+ W 2 ; 1 ( x ) p 2 (1 ; 1) (1 ; 2) (2 ; 1) (2 ; 2) ( i; j ) - ﬁ lter i along rows, ﬁ lter j along columns Figure 20: The full complex dual-tree wa v elet transform net work. Each W i,j is computed as in Equations 16 – 19. 19

Gradient-based Filter Design for the Dual-tree Wavelet Transform

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment