Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks


Authors: Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Qiuqiang Kong¹, Yong Xu², Wenwu Wang¹, Philip J. B. Jackson¹ and Mark D. Plumbley¹
¹ University of Surrey, Guildford, UK
² Tencent AI Lab, Bellevue, USA
{q.kong, w.wang, p.jackson, m.plumbley}@surrey.ac.uk, lucayongxu@tencent.com

Abstract

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available. Both individual sources and mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise which is unseen in the training set. We propose a synthesizing-decomposition (S-D) approach to solve the single-channel separation and deconvolution problem. In synthesizing, a generative model for sources is built using a generative adversarial network (GAN). In decomposition, both mixing filters and sources are optimized to minimize the reconstruction error of the mixture. The proposed S-D approach achieves a peak signal-to-noise ratio (PSNR) of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming a baseline convolutional neural network PSNR of 15.3 dB and 12.2 dB, respectively, and achieves a PSNR of 13.2 dB in source separation together with deconvolution, outperforming a convolutive non-negative matrix factorization (NMF) baseline of 10.1 dB.

1 Introduction

Single-channel signal separation and deconvolution aims to separate and deconvolve sources from a single-channel mixture. One challenging aspect of this problem is that only a single-channel mixture is available, so the problem is underdetermined. Second, there is no prior knowledge of the mixing filters; both the individual sources and the mixing filters are unknown and need to be estimated.
Third, there is no prior knowledge of the noise, which can be non-stationary and may not have been seen in the training data. These difficulties make single-channel signal separation and deconvolution a very challenging problem. It has many applications in image, speech and audio denoising [Xie et al., 2012], inpainting [Yeh et al., 2016], and deconvolution and separation [Cichocki et al., 2009; Mijovic et al., 2010]. For example, an audio sensor usually receives signals from multiple sources convolved with channel distortion.

Much previous work focuses on source separation [Cichocki et al., 2009; Grais et al., 2014] or deconvolution [Levin et al., 2009; Campisi and Egiazarian, 2017] independently, but not together. We categorize previous source separation and deconvolution methods into decomposition based approaches and regression based approaches. Decomposition methods usually learn a set of bases for sources and use these bases to decompose a mixture. Decomposition methods such as non-negative matrix factorization (NMF) [Lee and Seung, 1999; Cichocki et al., 2009; Kitamura et al., 2013] assume that a source can be represented by a linear combination of a set of bases. NMF has been used in source representation and separation [Cichocki et al., 2006; Kitamura et al., 2013]. In contrast to the decomposition based approaches, regression based approaches learn a mapping from a mixture to an individual source. Such mappings can be modeled by neural networks, for example fully connected neural networks [Grais et al., 2014] and convolutional neural networks (CNNs) [Jain and Seung, 2009; Zhang et al., 2017]. In [Xie et al., 2012], a stacked denoising auto-encoder (DAE) is proposed to recover sources from a mixture. CNNs are used for source deconvolution in [Xu et al., 2014].
However, many decomposition methods such as NMF and ICA are shallow models, typically linear combinations of bases. These shallow models do not have enough capacity to represent a broad range of sources compared with neural networks [Jain and Seung, 2009]. On the other hand, regression based approaches such as deep neural networks are able to model complicated mappings but require both mixture and target sources for training. Regression based methods may not generalize well if the mixing filter and noise in the testing data have a different distribution from the training data, which results in poor separation when the mixing filter and noise are unseen in the training data [Yosinski et al., 2014]. Recently, generative adversarial networks (GANs) have been proposed for solving the source separation problem [Fan et al., 2018; Subakan and Smaragdis, 2017; Stoller et al., 2017]. So far these methods assume that the mixing filters in the single-channel signal separation problem are known.

This paper proposes a novel synthesizing-decomposition (S-D) approach to solve the single-channel source separation and deconvolution problem. Compared to conventional regression approaches, the S-D approach applies generative adversarial networks (GANs) to solve this problem in a generative way. The S-D approach can estimate both the sources and the convolutive mixing filters, while conventional regression methods do not estimate convolutive mixing filters. In addition, we formulate the single-channel signal separation and deconvolution problem as Bayesian maximum a posteriori (MAP) estimation, which is a constrained non-convex optimization problem. In synthesizing, a generative model is built for sources using a generative adversarial network (GAN). In decomposition, both sources and mixing filters are obtained by minimizing the reconstruction error of a mixture.
To tackle the non-convex optimization problem, repeating the decomposition with different initializations can significantly increase the underdetermined single-channel signal separation and deconvolution performance. We carry out underdetermined single-channel signal separation and deconvolution experiments on the MNIST dataset as initial research to show the effectiveness of the proposed S-D approach with GANs.

This paper is organized as follows: Section 2 formulates the underdetermined single-channel signal separation and deconvolution problem. Section 3 proposes the synthesizing-decomposition (S-D) approach for this problem. Section 4 shows experimental results. Section 5 concludes and forecasts future work.

2 Single-Channel Signal Separation and Deconvolution

In underdetermined single-channel signal separation and deconvolution, a single-channel mixture x(u) \in L^2(\Omega), u \in \Omega is composed of individual sources s_k(u) \in L^2(\Omega), u \in \Omega, k = 1, ..., K convolved with unknown filters \alpha_k(u) \in L^2(\Omega), u \in \Omega, k = 1, ..., K, followed by unknown additive noise n(u) \in L^2(\Omega), u \in \Omega. The space \Omega can be a Euclidean space \mathbb{R}^d, where K and d denote the number and the dimension of sources, respectively:

    x(u) = \sum_{k=1}^{K} (\alpha_k * s_k)(u) + n(u).   (1)

The symbol * represents the convolution operation:

    (\alpha_k * s_k)(u) = \int_{\mathbb{R}^d} \alpha_k(u - v) s_k(v) \, dv.   (2)

For the simple case of source separation without deconvolution, \alpha_k(u - v) in (2) simplifies to \alpha_k(u - v) = \alpha_k \delta(u - v), where \delta(u) is the Dirac delta function. The general single-channel signal separation and deconvolution problem concerns both separating and deconvolving the individual sources s_k(u), k = 1, ..., K from a single-channel mixture x(u), while the mixing filters \alpha_k(u), k = 1, ..., K and the noise signal n(u) in (1) are unknown.
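As a concrete illustration of the discrete analogue of the mixture model in (1), the following sketch (our own toy example, not from the paper; the function and variable names are made up) mixes 1-D sources with short filters and optional additive Gaussian noise:

```python
import numpy as np

def mix(sources, filters, noise_std=0.0, rng=None):
    """Discrete 1-D analogue of Eq. (1): x = sum_k (alpha_k * s_k) + n."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = sum(np.convolve(s, a, mode="same") for s, a in zip(sources, filters))
    return x + noise_std * rng.standard_normal(x.shape)

# Separation-only special case of Eq. (2): each alpha_k reduces to a scalar gain
s1 = np.array([1.0, 2.0, 3.0])
s2 = np.array([0.5, 0.5, 0.5])
x = mix([s1, s2], [np.array([2.0]), np.array([1.0])])  # x = 2*s1 + s2
```

With length-1 filters the convolution collapses to the scalar-gain case described under (2); longer filter arrays give the general convolutive mixture.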
In the rest of the paper, we simplify the notation of x(u), s_k(u), \alpha_k(u) to x, s_k and \alpha_k, respectively. In the regression based approaches [Jain and Seung, 2009; Grais et al., 2014], a mapping from a mixture to a source signal, f_k : x \mapsto s_k, is modeled by deep neural networks and learned to separate the k-th source. In separation, separated sources are obtained by forwarding a mixture through the model: \hat{s}_{test} = f_k(x_{test}). However, there are several problems associated with the regression based approaches:

Problem 1. In regression based supervised learning, the training data x_{train} and testing data x_{test} should have the same distribution, otherwise the trained model will be biased [Yosinski et al., 2014]. However, in single-channel signal separation and deconvolution, no prior knowledge of the test noise n is available. A model trained with training noise may not generalize well to sources with unseen non-stationary noise.

Problem 2. In single-channel signal separation and deconvolution, both the sources s_k and the mixing filters \alpha_k are unknown and need to be estimated.

Problem 3. Previous regression and decomposition based approaches do not constrain the distribution of the separated sources \hat{s} to be the same as the distribution of real sources p_{real}(s). Ideally, the separated sources \hat{s} should be regularized to lie in regions where p_{real}(\hat{s}) has larger value.

Decomposition approaches such as NMF can be trained on individual sources instead of on a mixture, so Problem 1 can be mitigated. Recently, GANs [Fan et al., 2018; Subakan and Smaragdis, 2017; Stoller et al., 2017] have been applied to source separation to solve Problem 3 by constraining the separated sources to lie in the natural source space.
However, those methods are based on the assumption that the mixing filters \alpha_k are constants, so they solve only the separation but not the deconvolution problem in (1).

3 Proposed Synthesizing-Decomposition (S-D) Approach

3.1 Maximum a Posteriori (MAP) Estimation

In this section, we first formulate the single-channel signal separation and deconvolution problem in (1) as a Bayesian parameter estimation problem. We denote \theta = \{s_1, ..., s_K, \alpha_1, ..., \alpha_K\} as the set of parameters to be estimated, including sources and mixing filters. The estimate \hat{\theta} can be obtained by maximum a posteriori (MAP) estimation:

    \hat{\theta} = \arg\max_{\theta} p(\theta | x) = \arg\max_{\theta} p(x | \theta) p(\theta).   (3)

The first term p(x | \theta) in (3) is a likelihood function. The reconstructed signal can be written as \hat{x} = \sum_{k=1}^{K} \alpha_k * s_k. Assuming n is a Gaussian process, the likelihood of the observed signal given the estimated signal can be written as:

    p(x | \theta) = p(x | \hat{x}) = \prod_{u \in \Omega} p(x(u) | \hat{x}(u)) = \prod_{u \in \Omega} \mathcal{N}\Big(x(u) - \sum_{k=1}^{K} (\alpha_k * s_k)(u),\, \sigma_n\Big)   (4)

where \mathcal{N}(\cdot, \cdot) is the probability density of a Gaussian distribution. The second term p(\theta) in (3) is the prior probability of \theta.

Algorithm 1: Training of a GAN [Goodfellow et al., 2014].
1: Inputs: real data s_n, n = 1, ..., N.
2: Outputs: parameters \theta_d of the discriminator and \theta_g of the generator of the GAN.
3: for number of iterations do
   - Sample a minibatch of m noise samples \{z^{(1)}, ..., z^{(m)}\} from a Gaussian distribution N(0, 1).
   - Sample a minibatch of m examples \{s^{(1)}, ..., s^{(m)}\} from the real data.
   - Update the discriminator by ascending its stochastic gradient:
       \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \big[\log D(s^{(i)}) + \log(1 - D(G(z^{(i)})))\big]
   - Sample a minibatch of m noise samples \{z^{(1)}, ..., z^{(m)}\} from a Gaussian distribution N(0, 1).
   - Update the generator by descending its stochastic gradient:
       \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(G(z^{(i)})))
4: end for
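A minimal runnable sketch of Algorithm 1 in PyTorch, assuming toy fully connected networks and synthetic data (the paper's released code uses a DCGAN on MNIST; all sizes and data here are illustrative only):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-ins for the generator G and discriminator D
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
D = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
real = 1.0 + 0.5 * torch.randn(256, 4)  # stand-in "real" source data
eps = 1e-8                              # avoids log(0)

for _ in range(100):
    s = real[torch.randint(0, 256, (32,))]  # minibatch of real examples
    z = torch.randn(32, 8)                  # minibatch of seeds z ~ N(0, 1)
    # Ascend log D(s) + log(1 - D(G(z))) w.r.t. theta_d (minimize the negative)
    d_loss = -(torch.log(D(s) + eps) + torch.log(1 - D(G(z).detach()) + eps)).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Descend log(1 - D(G(z))) w.r.t. theta_g
    z = torch.randn(32, 8)
    g_loss = torch.log(1 - D(G(z)) + eps).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The `detach()` on `G(z)` keeps the discriminator update from propagating gradients into the generator, matching the alternating updates of Algorithm 1.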
Assuming the sources and filters are independent of each other, we can write p(\theta) as:

    p(\theta) = \prod_{k=1}^{K} p(\alpha_k) \prod_{k=1}^{K} p(s_k).   (5)

We assume s_k, k = 1, ..., K to have a compact support V \subset \Omega. Substituting (4) and (5) into (3), the estimate of the sources and filters can be obtained by solving the following optimization problem:

    \hat{s}_1, ..., \hat{s}_K, \hat{\alpha}_1, ..., \hat{\alpha}_K = \arg\max_{s_1 \in V, ..., s_K \in V,\ \alpha_1, ..., \alpha_K} \prod_{u \in \Omega} \mathcal{N}\Big(x(u) - \sum_{k=1}^{K} (\alpha_k * s_k)(u),\, \sigma_n\Big) \prod_{k=1}^{K} p(\alpha_k) \prod_{k=1}^{K} p(s_k).   (6)

3.2 Optimization with the S-D Approach

Optimizing (6) is difficult because of the constraint s_k \in V. The source prior p(s_k) is unknown, so V cannot be written in closed form. Our solution is to convert (6) to an unconstrained optimization problem. In the proposed S-D approach, we first build a generative model for s_k with a GAN [Goodfellow et al., 2014; Subakan and Smaragdis, 2017]. A GAN consists of a generator G and a discriminator D. The generator G is a mapping from any distribution p_z, such as a Gaussian distribution N(0, \sigma I), to the real distribution of sources. We call p_z a seed distribution and samples z \sim p_z seeds. The generator G is trained to generate samples to fool the discriminator D.

Algorithm 2: Decomposition of a mixture. Hyperparameter: K, the number of individual sources.
1: Inputs: a mixture x; a generator G trained using Algorithm 1.
2: Outputs: separated and deconvolved sources s_k, k = 1, ..., K and mixing filters \alpha_k, k = 1, ..., K.
3: Sample K seeds \{z_1, ..., z_K\} and K mixing filters \{\alpha_1, ..., \alpha_K\} from a Gaussian distribution N(0, 1).
4: for number of iterations do
   - Calculate the reconstructed signal \hat{x} = \sum_{k=1}^{K} \alpha_k * G(z_k).
   - Calculate the gradient \nabla_{\varphi} from (11), where \varphi = \{z_1, ..., z_K, \alpha_1, ..., \alpha_K\}.
   - Update \varphi = \{z_1, ..., z_K, \alpha_1, ..., \alpha_K\} using the Adam optimizer [Kingma and Ba, 2015].
5: end for
The discriminator D is trained to discriminate fake sources from real sources. In other words, the generator G and the discriminator D play the following two-player minimax game with value function V(D, G) [Goodfellow et al., 2014]:

    \min_G \max_D V(D, G) = \mathbb{E}_{s \sim p_{data}(s)}[\log D(s)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]   (7)

where p_{data} is the real data probability density. The training of the GAN is shown in Algorithm 1. The generator G and discriminator D are trained iteratively. If both G and D have enough capacity, the generated source distribution converges to p_{data} [Subakan and Smaragdis, 2017]. Once the GAN is successfully trained, G(z) \in V for all z. To solve the optimization problem in (6), we substitute s_k = G(z_k) and optimize over z_k instead of s_k, so that the constraint s_k \in V is eliminated. Now the variables to be optimized are z_k and the mixing filters \alpha_k. In addition, the GAN does not predict the probability density p(s_k) of s_k, so the optimization of (6) is intractable. To solve this problem, we approximate p(s_k) with:

    p(s_k) = \begin{cases} 0, & s_k \notin V \\ 1/|V|, & s_k \in V. \end{cases}   (8)

Equation (8) assumes the probability density p(s_k) is zero outside V. It is not required to know the value of |V|, as it is eliminated when optimizing (6):

    \hat{z}_1, ..., \hat{z}_K, \hat{\alpha}_1, ..., \hat{\alpha}_K = \arg\max_{z_1, ..., z_K,\ \alpha_1, ..., \alpha_K} \prod_{u \in \Omega} \mathcal{N}\Big(x(u) - \sum_{k=1}^{K} (\alpha_k * G(z_k))(u),\, \sigma_n\Big) \prod_{k=1}^{K} p(\alpha_k).   (9)

We assume the coefficients in \alpha_k to be Gaussian, \alpha_k \sim N(0, \sigma_\alpha). Taking the negative logarithm of (9), the optimization can be written as:

    \hat{z}_1, ..., \hat{z}_K, \hat{\alpha}_1, ..., \hat{\alpha}_K = \arg\min_{z_1, ..., z_K,\ \alpha_1, ..., \alpha_K} \Big\| x - \sum_{k=1}^{K} \alpha_k * G(z_k) \Big\|_2^2 + \beta \sum_{k=1}^{K} \| \alpha_k \|_2^2   (10)

where \beta = \sigma_n / \sigma_\alpha is a regularization coefficient for (10).

3.3 Optimization

To solve (10), we apply a gradient based iterative approach.
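The objective in (10) and the updates of Algorithm 2 can be sketched as follows, under simplifying assumptions: a frozen toy fully connected "generator" stands in for the trained GAN generator, and the filters are scalars (the separation-only special case of (2)). All names and sizes are illustrative, not the paper's implementation:

```python
import torch

torch.manual_seed(0)
# Frozen toy "generator" standing in for the trained generator G
G = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 16))
for p in G.parameters():
    p.requires_grad_(False)

K, beta = 2, 1e-3
# Build a toy "mixture" from K generated sources with scalar mixing filters
true_a = torch.tensor([[1.5], [0.7]])
x = (true_a * G(torch.randn(K, 8))).sum(0)

def decompose(n_iters=300):
    """One run of Algorithm 2: minimize Eq. (10) over seeds z_k and filters alpha_k."""
    z = torch.randn(K, 8, requires_grad=True)
    a = torch.randn(K, 1, requires_grad=True)  # scalar filters (separation-only case)
    opt = torch.optim.Adam([z, a], lr=0.01)
    for _ in range(n_iters):
        recon = (a * G(z)).sum(0)
        loss = ((x - recon) ** 2).sum() + beta * (a ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item(), z.detach(), a.detach()

# Repeat with several initializations; keep the run with the lowest reconstruction error
best_loss, z_hat, a_hat = min((decompose() for _ in range(4)), key=lambda t: t[0])
```

Only `z` and `a` carry gradients; the generator stays fixed, so autograd optimizes exactly the variables \varphi of Algorithm 2.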
We denote by \varphi = \{z_1, ..., z_K, \alpha_1, ..., \alpha_K\} the variables to be optimized. First we randomly initialize \varphi, then the gradient with respect to \varphi is calculated as:

    \nabla_{\varphi} = \frac{\partial}{\partial \varphi} \bigg( \Big\| x - \sum_{k=1}^{K} \alpha_k * G(z_k) \Big\|_2^2 + \beta \sum_{k=1}^{K} \| \alpha_k \|_2^2 \bigg).   (11)

The parameters \varphi are optimized using Algorithm 2. Because G is a non-linear mapping, (10) is a non-convex function of \varphi. Gradient based methods may reach a local minimum depending on the initialization of the seeds. To mitigate this problem, we repeat Algorithm 2 L times and choose the solution with the smallest reconstruction error.

4 Experiments

In this section, we apply the proposed S-D method to the underdetermined image single-channel signal separation and deconvolution problem. We carry out experiments on the MNIST 10-digit dataset [LeCun et al., 1998] as initial research for this challenging problem and show the effectiveness of the proposed S-D method. With different types of unknown mixing filters \alpha_k and unknown interference noise n, the problem in (1) can be categorized as image denoising, inpainting, completion, deconvolution and separation, as shown in Table 1. The symbol '-' represents any type of noise. Previous works usually focus on one of these problems, such as denoising [Jain and Seung, 2009], inpainting [Xie et al., 2012], deconvolution [Xu et al., 2014] or separation [Subakan and Smaragdis, 2017]. In this paper we solve these problems together with the proposed S-D method. The PyTorch implementation of this paper is released¹.

4.1 Model Configuration

In the proposed S-D approach, we model the synthesizing procedure with a deep convolutional generative adversarial network (DCGAN) [Radford et al., 2015], which stabilizes the training of a GAN and can generate high quality images, as shown in [Radford et al., 2015]. A DCGAN consists of a generator G and a discriminator D.
The input to G is a seed sampled from a Gaussian distribution N(0, \sigma I). The seed has a dimension of 100, following [Radford et al., 2015]. The generator G has 4 transposed convolutional layers with 512, 256, 128 and 1 feature maps, respectively. Following [Radford et al., 2015], batch normalization [Ioffe and Szegedy, 2015] and ReLU non-linearities are applied after each transposed convolutional layer. The output of G is an image of the same size as the images in the training data. The discriminator D takes a fake or a real image as input. It consists of 4 convolutional layers, with a sigmoid output representing the probability that the input is from the real data rather than generated data. Following [Radford et al., 2015], we use the Adam optimizer [Kingma and Ba, 2015] with a learning rate of 0.0002, \beta_1 of 0.5 and \beta_2 of 0.999 to train the generator.

In decomposition, we freeze the trained generator G. We approximate p(x | \hat{x}) with a Gaussian distribution, which works well in our experiments. We set \beta to 0.001 to regularize the mixing filters \alpha_k being searched. The filters \alpha_k and seeds z_k are randomly initialized and optimized with the Adam optimizer with a learning rate of 0.01, \beta_1 of 0.9 and \beta_2 of 0.999 (Algorithm 2).

For comparison with regression based approaches, we apply a CNN [Xie et al., 2012] consisting of 4 layers with batch normalization [Ioffe and Szegedy, 2015] and ReLU non-linearities. The number of layers and parameters are set to be the same as in the discriminator D of the DCGAN. The CNN is trained to regress from a noisy individual source s + n to the clean individual source s. For comparison with decomposition based approaches, we train a dictionary for each of the 10 digits using NMF [Cichocki et al., 2009] with Euclidean distance.

¹ https://github.com/qiuqiangkong/gan_separation_deconvolution
Each dictionary consists of 20 bases, which performs well in our experiments. In decomposition, the trained dictionaries are concatenated to form a dictionary of 200 bases, which is then used to decompose the mixtures.

4.2 Evaluation

Following [Xie et al., 2012; Xu et al., 2014; Jain and Seung, 2009], we use peak signal-to-noise ratio (PSNR) to evaluate single-channel signal separation and deconvolution quality. A higher PSNR indicates better reconstruction quality. PSNR is defined as:

    PSNR = 20 \log_{10} \Big( \frac{MAX_I}{\sqrt{MSE}} \Big)   (12)

where MAX_I is the maximum value of a noise-free image and MSE is the mean squared error between two images I and J of size m \times n:

    MSE(I, J) = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (I(i, j) - J(i, j))^2.   (13)

4.3 Denoising, Inpainting and Completion

Denoising, inpainting and completion are special cases of the single-channel signal separation and deconvolution problem where \alpha_k is an unknown constant and n is unknown noise such as Gaussian noise, non-stationary noise or corruption of an image.

Table 1: Categories of the single-channel signal separation and deconvolution problem with different noise and mixing filters.

                               Noise n     Mixing filters \alpha_k, k = 1, ..., K
    Denoising                  Gaussian    K = 1, \alpha_k is a constant
    Inpainting, Completion     Unknown     K = 1, \alpha_k is a constant
    Deconvolution              -           K = 1, \alpha_k is a tensor
    Separation                 -           K > 1, \alpha_k are constants
    Separation + deconvolution -           K > 1, \alpha_k are tensors

Figure 1: Image denoising, inpainting and completion with the CNN, NMF and S-D approaches.

The first and second rows of Fig. 1 show the clean and noisy images. The third to fifth rows show the denoised images with the CNN, NMF and the proposed S-D approach. In the first column, the testing noise and training noise have the same distribution, so the CNN performs well. However, CNN based denoising methods do not generalize well to unseen noise such as non-stationary noise or image corruption, shown in the second and third columns of Fig. 1. NMF performs better than the CNN under unseen noise but sometimes produces unnatural separation results, due to Problem 3 stated in Section 2. The S-D approach performs well in all of image denoising, inpainting and completion.

Table 2: PSNR of image denoising, inpainting and completion with different approaches.

                       denoising   inpainting   completion
    CNN                26.0 dB     15.3 dB      12.2 dB
    NMF                17.4 dB     13.4 dB      12.9 dB
    convolutive NMF    18.3 dB     13.4 dB      13.0 dB
    S-D with 1 init.   23.1 dB     15.2 dB      13.6 dB
    S-D with 8 init.   25.1 dB     18.2 dB      15.4 dB
    S-D with 32 init.  25.1 dB     18.9 dB      15.4 dB

Table 3: PSNR of image separation and deconvolution with different approaches.

                       deconv.     sep.         sep. + deconv.
    NMF                15.3 dB     9.4 dB       8.7 dB
    convolutive NMF    18.3 dB     14.2 dB      10.1 dB
    S-D with 1 init.   17.3 dB     13.7 dB      9.3 dB
    S-D with 8 init.   21.9 dB     16.8 dB      11.5 dB
    S-D with 32 init.  23.2 dB     18.5 dB      13.2 dB

Table 2 shows the PSNR of the CNN, NMF, convolutive NMF and S-D approaches. The S-D approach achieves a PSNR of 25.1 dB in image denoising, which is comparable to the CNN. NMF and convolutive NMF achieve similar PSNRs of 17.4 dB and 18.3 dB, respectively. In image inpainting, S-D achieves a PSNR of 18.9 dB, outperforming the NMF and CNN methods at 13.4 dB and 15.3 dB, respectively. This result shows that source separation with S-D generalizes better to unseen noise than NMF and the CNN. In image completion, the S-D approach achieves a PSNR of 15.4 dB, outperforming the CNN at 12.2 dB and NMF at 12.9 dB, respectively. Table 2 also shows the performance of the decomposition in the S-D approach with respect to the number of initializations. With 8 or 32 initializations, the performance is about 2 dB better than with only 1 initialization. This may result from the fact that the optimization problem in (10) is non-convex: Algorithm 2 is a gradient based method, which may end in a local minimum.
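The PSNR metric of Eqs. (12)-(13) used throughout these tables translates directly into a few lines; a sketch of our own helper, assuming MAX_I = 1.0 for images scaled to [0, 1]:

```python
import numpy as np

def psnr(ref, est, max_i=1.0):
    """PSNR = 20 log10(MAX_I / sqrt(MSE)), Eqs. (12)-(13)."""
    ref, est = np.asarray(ref, float), np.asarray(est, float)
    mse = np.mean((ref - est) ** 2)
    return 20.0 * np.log10(max_i / np.sqrt(mse))

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01 and PSNR = 20 dB
val = psnr(np.ones((28, 28)), np.full((28, 28), 0.9))
```

Note the metric diverges to infinity for identical images (MSE = 0), so it is only applied to imperfect reconstructions.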
Repeating Algorithm 2 several times with different initializations and choosing the solution with the least reconstruction error gives better performance.

4.4 Separation and Deconvolution

We evaluate single-channel signal separation and deconvolution with the mixing filters \alpha_k, k = 1, ..., K as unknown tensors, which is a very challenging task. In this case both the mixing tensors \alpha_k and the individual sources s_k need to be estimated. Fig. 2 shows a mixture obtained by convolving clean sources with mixing filters followed by summation. In our experiment we set K = 2, and each mixing filter has a size of 5 \times 5. In practical application scenarios the size of the mixing filter depends on the task. Fig. 2 shows that NMF based separation often leads to unnatural images. The S-D based approach can separate images with high quality, and both the sources s_k and the mixing filters \alpha_k can be estimated. Fig. 2 shows that both the estimated sources and mixing filters are learned correctly compared with the ground truth sources and mixing filters.

Figure 2: Image separation and deconvolution with the NMF and S-D approaches.

The first column of Table 3 shows the results of image deconvolution without separation, where K = 1 and \alpha is an unknown tensor. S-D achieves a PSNR of 23.2 dB, outperforming NMF and convolutive NMF at 15.3 dB and 18.3 dB, respectively. The second column of Table 3 shows the results of image separation, where the \alpha_k are unknown constants and K = 2. S-D achieves a PSNR of 18.5 dB, outperforming NMF and convolutive NMF at 9.4 dB and 14.2 dB, respectively. The third column of Table 3 shows source separation together with deconvolution, where the \alpha_k are unknown tensors and K = 2. S-D achieves a PSNR of 13.2 dB, outperforming NMF and convolutive NMF at 8.7 dB and 10.1 dB, respectively.
S-D with 32 initializations achieves higher PSNR than with 8 initializations, which in turn beats 1 initialization, showing the effectiveness of repeating Algorithm 2 several times to solve the non-convex optimization problem in (10).

5 Conclusion

In this paper, we propose a synthesizing-decomposition (S-D) approach to solve the single-channel signal separation and deconvolution problem. In synthesizing, a generative model for source signals is trained using a generative adversarial network (GAN). In decomposition, both sources and filters are optimized to minimize the reconstruction error. Instead of optimizing the sources directly, we optimize over the seeds of the GAN. The proposed S-D approach achieves PSNRs of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming the regression based CNN and the decomposition based NMF. The S-D approach achieves a PSNR of 13.2 dB in image source separation with deconvolution, outperforming NMF at 8.7 dB. Repeating the decomposition in S-D several times can significantly improve PSNR. In future work, we will apply the S-D approach to more source separation and deconvolution problems.

Acknowledgements

This research was supported by EPSRC grant EP/N014111/1 "Making Sense of Sounds" and a Research Scholarship from the China Scholarship Council (CSC) No. 201406150082.

References

[Campisi and Egiazarian, 2017] Patrizio Campisi and Karen Egiazarian. Blind Image Deconvolution: Theory and Applications. CRC Press, 2017.

[Cichocki et al., 2006] A. Cichocki, R. Zdunek, and S. Amari. New algorithms for non-negative matrix factorization in applications to blind source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.

[Cichocki et al., 2009] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, 2009.

[Fan et al., 2018] Z. Fan, Y. Lai, and J. Jang. SVSGAN: Singing voice separation via generative adversarial network. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.

[Goodfellow et al., 2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.

[Grais et al., 2014] E. M. Grais, M. Sen, and H. Erdogan. Deep neural networks for single channel source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3734–3738, 2014.

[Ioffe and Szegedy, 2015] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.

[Jain and Seung, 2009] V. Jain and S. Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems (NIPS), pages 769–776, 2009.

[Kingma and Ba, 2015] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.

[Kitamura et al., 2013] D. Kitamura, H. Saruwatari, K. Shikano, K. Kondo, and Y. Takahashi. Music signal separation by supervised nonnegative matrix factorization with basis deformation. In International Conference on Digital Signal Processing (DSP), 2013.

[LeCun et al., 1998] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[Lee and Seung, 1999] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999.

[Levin et al., 2009] Anat Levin, Yair Weiss, Fredo Durand, and William T. Freeman. Understanding and evaluating blind deconvolution algorithms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

[Mijovic et al., 2010] B. Mijovic, M. De Vos, I. Gligorijevic, J. Taelman, and S. Van Huffel. Source separation from single-channel recordings by combining empirical-mode decomposition and independent component analysis. IEEE Transactions on Biomedical Engineering, 2010.

[Radford et al., 2015] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

[Stoller et al., 2017] D. Stoller, S. Ewert, and S. Dixon. Adversarial semi-supervised audio source separation applied to singing voice extraction. arXiv preprint arXiv:1711.00048, 2017.

[Subakan and Smaragdis, 2017] Cem Subakan and Paris Smaragdis. Generative adversarial source separation. arXiv preprint arXiv:1710.10779, 2017.

[Xie et al., 2012] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 341–349, 2012.

[Xu et al., 2014] L. Xu, J. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems (NIPS), pages 1790–1798, 2014.

[Yeh et al., 2016] R. Yeh, C. Chen, T. Y. Lim, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539, 2016.

[Yosinski et al., 2014] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems (NIPS), pages 3320–3328, 2014.

[Zhang et al., 2017] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
