Phased Microphone Array for Sound Source Localization with Deep Learning



Wei MA a,∗, Xun LIU b,∗∗

a School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, PR China
b Shanghai KeyGo Technology Company Limited, Shanghai, PR China

Abstract

For phased microphone arrays used in sound source localization, an algorithm with both high computational efficiency and high precision is a persistent pursuit. In this paper the convolutional neural network (CNN), a kind of deep learning, is preliminarily applied as a new algorithm. At high frequency, CNN can reconstruct the sound source localizations with excellent spatial resolution, as good as DAMAS, within a very short time, as short as conventional beamforming. This exciting result means that CNN almost perfectly finds the source distribution directly from the cross-spectral matrix without a propagation function given in advance, and thus CNN deserves to be further explored as a new algorithm.

Keywords: microphone arrays, beamforming, deep learning

1. Introduction

In recent years, with the development of society, awareness of the impact of noise on health has increased significantly, environmental comfort has become more and more important, and consequently acoustic source localization has become increasingly critical in noise diagnosis. Nowadays the phased microphone array has become a standard technique for acoustic source localization. In post-processing, the two main categories of traditional algorithms are beamforming and deconvolution algorithms. Beamforming algorithms construct a dirty map of source distributions from the array microphone pressure signals [1]. Conventional beamforming is simple and robust; however, its main disadvantages include poor spatial resolution, particularly at low frequencies, and poor dynamic range due to side-lobe effects [2].
Seeking algorithms with better performance, many researchers have proposed advanced beamforming algorithms, such as orthogonal beamforming [3], robust adaptive beamforming [4], and functional beamforming [5]. Concerning spatial resolution, these advanced beamforming algorithms have an obvious superiority over conventional beamforming; however, they are not as good as deconvolution algorithms. Deconvolution algorithms reconstruct a clean map of source distributions from a dirty map via iterative deconvolution, and thus can significantly improve the spatial resolution. The most famous deconvolution algorithms are DAMAS [6, 7], NNLS [8] and CLEAN-SC [9]. However, deconvolution algorithms require a relatively high computational effort compared to conventional beamforming, due to the inevitable iterations used in the deconvolution. The spectral procedure [10] and compression computational grids [11, 12, 13] have been used to improve the efficiency of deconvolution algorithms. There are still two big challenges for the phased microphone array. One is that an algorithm with both high computational efficiency and high precision remains a persistent pursuit, to improve the ability of real-time display and online analysis. The other is that when the phased microphone array is used in a complex flow environment with an unknown propagation function, it loses its accuracy with traditional algorithms, due to the uncertainty in the propagation function those algorithms rely on.
∗ Corresponding author
∗∗ Two authors contributed equally to this work
Email addresses: mawei@sjtu.edu.cn (Wei MA), ae1905kaka@gmail.com (Xun LIU)
Preprint submitted to the Journal of the Acoustical Society of America Express Letters, February 14, 2018

At this time, deep learning - deep neural networks - is without any doubt the most attractive data mining tool. Deep learning is a specific kind of machine learning [14]. Machine learning is able to learn from data and find the relationship between input and output data. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change the internal parameters that are used to compute the representation in each layer from the representation in the previous layer [15]. Deep learning has recently achieved spectacular success in many domains such as speech recognition [16], visual object recognition [17], astronomy [18], as well as the game of Go [19]. In traditional disciplines, deep learning has also attracted widespread attention and is expected to further solve long-standing problems. For example, deep learning has been applied to turbulence modelling in fluid mechanics [20, 21]. In these traditional disciplines, deep learning is still strongly challenging the deep-rooted consensus that innovations are inspired by expert-in-the-loop intuition and physically interpretable models, by providing competing predictions without a clear physical interpretation. Inspired by the success of deep learning, in this paper the convolutional neural network (CNN) [14], a kind of deep learning, is applied to the phased microphone array for sound source localization as a new algorithm. A CNN uses the mathematical operation of convolution in at least one of its layers.
Convolution leverages three important ideas that can help improve a machine learning system: sparse interactions, parameter sharing and equivariant representations [14]. This attempt mainly looks forward to making full use of three features of deep learning to overcome the big challenges of the phased microphone array introduced above. The first is its excellent data learning capability. The second is its computational speed once trained. The last is its potential application with an unknown propagation function.

The rest of this paper is organized as follows. Algorithms are presented in Section 2. An application is examined in Section 3. Finally, discussion and conclusion are presented in Section 4.

2. Algorithms

Figure 1: Methods used for source localization: beamforming algorithms (e.g. conventional beamforming, functional beamforming), deconvolution algorithms (e.g. DAMAS, NNLS, CLEAN-SC), and deep learning (e.g. CNN).

Fig. 1 of Ref. [12] illustrates a setup with a planar microphone array that contains M microphones and has a diameter of D, as well as a two-dimensional region of interest. Stationary noise sources are located in an x-y plane at a distance of z_0 from the centre of the microphone array. The length of the scanning plane is L = 2 z_0 tan(α/2), where α is the opening angle. The region of interest is divided into S = N × N equidistant points. In each test case, data from the microphone array are simultaneously acquired. The cross-spectral matrix (CSM) is then calculated from these simultaneously acquired data. The acquired data of each microphone are divided into I frames. Each frame is then converted into frequency bins by the Fast Fourier Transform (FFT).
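As a sketch of this frame-averaging procedure, the CSM at one frequency bin can be estimated as below; the frame count, microphone count, and samples per frame are illustrative and the helper name is ours:

```python
import numpy as np

def estimate_csm(frames, bin_index):
    """Estimate the cross-spectral matrix (CSM) at one frequency bin.

    frames: array of shape (I, M, n_samples) -- I time frames from M microphones.
    Returns an (M, M) complex Hermitian matrix, the outer products p_i p_i^H
    averaged over the I frames, as in Eq. (1).
    """
    I, M, _ = frames.shape
    spectra = np.fft.rfft(frames, axis=-1)   # frequency bins per frame and microphone
    p = spectra[:, :, bin_index]             # (I, M) complex pressures at the chosen bin
    # Average the outer products p_i p_i^H over all frames.
    return np.einsum('im,in->mn', p, p.conj()) / I

# Illustrative usage: 8 frames, 30 microphones, 256 samples per frame.
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 30, 256))
C = estimate_csm(frames, bin_index=10)
```

By construction the result is Hermitian, matching the averaged outer-product form of Eq. (1).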
For a given angular frequency ω, the CSM is averaged over I blocks:

C(ω) = ⟨p(ω) p(ω)^H⟩ = (1/I) Σ_{i=1}^{I} p_i(ω) p_i(ω)^H,   (1)

where p(ω) = [p_1(ω), p_2(ω), ..., p_M(ω)]^T and (·)^H denotes the complex conjugate transpose. For the sake of brevity, ω is omitted in the following. The problem of phased microphone arrays for source localization can be expressed as

f(C) = x,   (2)

where x is the source distribution of power descriptors,

x = [q_1^2, ..., q_s^2, ..., q_S^2]^T,   (3)

where q_s is the source amplitude in terms of the pressure produced at source point s. Fig. 1 shows the algorithms that deal with Eq. (2), including beamforming algorithms, deconvolution algorithms, and deep learning.

2.1. Beamforming algorithms

The conventional beamforming output is

b(r) = e(r)^H C e(r) / ||e(r)||^4,   (4)

where e(r) ∈ C^{M×1} is the steering vector at r,

e(r) = [e_1(r), ..., e_m(r), ..., e_M(r)]^T.   (5)

The steering vector under the monopole point source assumption, in a medium with a uniform flow, is [7]

e_m(r) = (||r − r_m|| / ||r||) exp{−j 2π f ||r − r_m|| / c_0},   (6)

where ||r|| is the distance from the beamformer focus position to the centre of the array, ||r − r_m|| is the distance from the beamformer focus position to the m-th microphone, f is the frequency, and c_0 is the speed of sound.

2.2. Deconvolution algorithms

The sound pressure contribution at the microphones can be written as

p = Σ_{s=1}^{S} e(r_s) q_s.   (7)

For incoherent acoustic sources, the CSM thus becomes

C = Σ_{s=1}^{S} |q_s|^2 e(r_s) e(r_s)^H.   (8)

The conventional delay-and-sum beamforming output can then be written as

b(r) = Σ_{s=1}^{S} |q_s|^2 · e(r)^H [e(r_s) e(r_s)^H] e(r) / ||e(r)||^4 = Σ_{s=1}^{S} |q_s|^2 · |e(r)^H e(r_s)|^2 / ||e(r)||^4.   (9)

For a single unit-power point source, Eq. (9) is known as the point-spread function (PSF) of the array, defined as

PSF(r|r_s) = e(r)^H [e(r_s) e(r_s)^H] e(r) / ||e(r)||^4 = |e(r)^H e(r_s)|^2 / ||e(r)||^4,   (10)

and Eq. (9) can then be written as

b(r) = Σ_{s=1}^{S} |q_s|^2 · PSF(r|r_s).   (11)

By computing PSF(r|r_s) for all combinations of (r, r_s) on the discrete grid and arranging each resulting PSF map column-wise in a matrix A, Eq. (11) can be reformulated in matrix notation as

A x = b,   (12)

where b contains the beamformer map. Eq. (12) is a system of linear equations, with A ∈ R^{S×S}, x ∈ R^{S×1} and b ∈ R^{S×1}. The deconvolution task is to find a source distribution x for a given dirty map b and a known matrix A. The constraint is that each component of the vector x must be larger than or equal to zero. In most applications the matrix A is singular while b is in the range of A, which means there is a very large number of solutions x that fulfil Eq. (12).

The DAMAS algorithm [7] is an iterative algebraic deconvolution method. In this algorithm, the source distribution is calculated by solving Eq. (12) using a Gauss-Seidel-type relaxation. In each step the constraint is applied that the source strength remains positive.

2.3. Deep learning

Deep learning is used to reconstruct the source distribution from the CSM directly. Thus the input tensor is C ∈ C^{M×M}, while the output tensor is x ∈ R^{S×1}. The Keras framework [22] with a TensorFlow backend is used here.

2.3.1. Networks architecture

A CNN, a kind of deep learning, is used here. The parameters and structure of the CNN are displayed in Table 1. This CNN model consists of four two-dimensional convolutional layers (Conv2D), two two-dimensional pooling layers (MaxPooling2D), a flatten layer (Flatten) and a regular densely-connected neural network layer (Dense).
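As a baseline for the network just described, the conventional beamforming and DAMAS steps of Sections 2.1 and 2.2 can be sketched in NumPy. This is a minimal 1-D sketch: the geometry, source position, and helper names are illustrative, the steering vectors follow Eq. (6), and the update follows the Gauss-Seidel relaxation with the positivity constraint:

```python
import numpy as np

c0, f = 340.0, 8000.0                          # speed of sound and frequency (illustrative)
mics = np.linspace(-0.175, 0.175, 30)          # 30 microphone positions, aperture 0.35 m
grid = np.linspace(-0.8, 0.8, 15)              # 1-D scanning grid at distance z0
z0 = 2.0

def steering(x):
    """Steering vector of Eq. (6) for a focus point at (x, z0)."""
    r = np.hypot(x, z0)                        # focus point to array centre
    rm = np.hypot(x - mics, z0)                # focus point to each microphone
    return (rm / r) * np.exp(-2j * np.pi * f * rm / c0)

E = np.stack([steering(x) for x in grid])      # (S, M) steering vectors

# Simulate the CSM of one unit-power source at grid point 5, per Eq. (8).
e_src = E[5]
C = np.outer(e_src, e_src.conj())

# Dirty map via conventional beamforming, Eq. (4).
norms4 = np.sum(np.abs(E) ** 2, axis=1) ** 2   # ||e(r)||^4 for every grid point
b = np.real(np.einsum('sm,mn,sn->s', E.conj(), C, E)) / norms4

# PSF matrix A of Eq. (10), then DAMAS Gauss-Seidel sweeps on A x = b, Eq. (12).
A = np.abs(E.conj() @ E.T) ** 2 / norms4[:, None]
x = np.zeros(len(grid))
for _ in range(1000):
    for s in range(len(grid)):
        resid = b[s] - A[s] @ x + A[s, s] * x[s]
        x[s] = max(resid / A[s, s], 0.0)       # positivity constraint of Section 2.2

print(int(np.argmax(x)))                       # index of the strongest reconstructed source
```

With noise-free data the dirty map is exactly one column of A, so the iteration can recover a sharp peak at the true source index.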
The convolutional layers perform discrete convolution operations on their input. In each convolutional layer, zero-padding is applied so that the output has the same size as the original input; meanwhile a bias vector is created and added to the outputs. The output of each convolutional layer is passed through a rectified linear unit (ReLU) activation. The pooling layers perform a max operation over sub-regions of the extracted feature maps, resulting in downsampling by a factor of two. The flatten layer just flattens the input and does not affect the batch size. The regular densely-connected neural network layer gives an S-dimensional output space using a matrix multiplication and bias addition.

Table 1: Parameters and structure of the convolutional neural network.

Layer No.  Layer Type    Kernel Number  Kernel Size            Stride Size  Activation  Padding  Output
1          Conv2D        64             3 × 3                  1 × 1        ReLU        Yes      M × M × 64
2          Conv2D        64             3 × 3                  1 × 1        ReLU        Yes      M × M × 64
3          MaxPooling2D  -              2 × 2                  2 × 2        -           No       M/2 × M/2 × 64
4          Conv2D        128            3 × 3                  1 × 1        ReLU        Yes      M/2 × M/2 × 128
5          Conv2D        128            3 × 3                  1 × 1        ReLU        Yes      M/2 × M/2 × 128
6          MaxPooling2D  -              2 × 2                  2 × 2        -           No       M/4 × M/4 × 128
7          Flatten       -              -                      -            -           -        (M/4 · M/4 · 128) × 1
8          Dense         1              S × (M/4 · M/4 · 128)  -            -           -        S × 1

S, grid number; M, microphone number.

2.3.2. Training strategy

To train the CNN, a variant of stochastic gradient descent called ADAM is used. The learning rate is set to α = 0.001 and the other hyper-parameters of the ADAM optimizer to β_1 = 0.9, β_2 = 0.999 and ε = 10^−8, as recommended. The loss function used to train the weights of the network is the mean squared error. The metric function is set to the mean of the squared errors between assigned and predicted values,

Metric Function = (1/S) Σ_{i=1}^{S} (Y_i − Ŷ_i)²,   (13)

where Y_i and Ŷ_i are the assigned value and the predicted value at the i-th grid point, respectively.

2.3.3. Training data

The data used to train the network are obtained by simulating a CSM for a given sound source distribution according to Eq. (8).

3. Application

In this section synthetic applications are carried out to check the spatial resolution of CNN. The planar array contains 30 simulated microphones and has a diameter D of 0.35 m, as shown in Fig. 2 of Ref. [12]. In the geometrical setup, the observation plane is parallel to the array plane, and the region of interest is right in front of the array. The distance between the array plane and the observation plane is z_0 = 2.0 m. The opening angle is α = 45°. The computational grid is 15 × 15 with 225 grid points. Gaussian white noise is added with a signal-to-noise ratio of 15 dB at the microphone array.

For the traditional algorithms, diagonal removal is applied on the CSM used for conventional beamforming, while no diagonal removal is applied on the PSF used for DAMAS. DAMAS is run with 1000 iterations, which appeared to be more than enough for convergence.

For the new algorithm, the CNN described in the previous section has approximately 1.62 × 10^6 trainable parameters according to the microphone number M = 30 and grid number S = 225. The data used to train the network are obtained by simulating a CSM with three uniform sound sources randomly distributed on the grid as the given sound distributions. In this application, 4 × 10^4 numerical data are generated, of which 80% are used as training data, 10% as validation data, and the remaining 10% as test data. The test data are made sure not to appear in the training and validation data. The number of samples per gradient update is specified as 32, and the number of epochs to train the model is specified as 10, which appeared to be more than enough for convergence. The network training takes around 4 hours on a MacBook Pro with a 2.9 GHz Intel Core i5 processor.

CNN test accuracy as a function of frequency is listed in Table 2. The CNN test accuracy at f = 8 kHz is up to 98%. After checking the 2% incorrect examples, in most cases at least one sound source point is located at the edge of the grid. Results for two given sound distributions at f = 8 kHz are shown in Fig. 2. In the first given sound distribution, the distances between the three sound sources are quite large. Conventional beamforming can separate these three sources, although some side-lobes exist. Both CNN and DAMAS can reconstruct the source distribution accurately. In the second given sound distribution, two sources are located on adjacent grid points. Conventional beamforming cannot separate these two adjacent sources. Both CNN and DAMAS can still reconstruct the source distribution accurately. This exciting result means that CNN almost perfectly finds the source distribution x from C without a propagation function given in advance.

Unfortunately, the CNN test accuracy decreases as the frequency decreases. The CNN test accuracies are only 83% and 60% at f = 5 kHz and 3 kHz, respectively. After checking the incorrect examples at f = 3 kHz, in many cases at least one sound source point is located at the edge of the grid; these incorrect examples are the same as those at f = 8 kHz. However, there are also many incorrect examples where the sound sources are adjacent. Fig. 3 shows the reconstruction results at f = 3 kHz for the two given sound distributions of Fig. 2. In the first given sound distribution with three dispersed sources far apart, conventional beamforming cannot separate these three sources, because the resolution of conventional beamforming is inversely proportional to frequency according to the relationship [23] R = 1.22 z c / (D f cos³(α/2)), where R is the resolution of conventional beamforming and c is the speed of sound.
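This resolution relationship can be checked numerically against Table 2. The snippet below assumes c = 340 m/s (the paper does not state the value used, but this assumption reproduces the tabulated numbers); the function name is ours:

```python
import math

def beamforming_resolution(f, z=2.0, D=0.35, alpha_deg=45.0, c=340.0):
    """Resolution R = 1.22 * z * c / (D * f * cos^3(alpha/2)), per Ref. [23]."""
    half = math.radians(alpha_deg) / 2.0
    return 1.22 * z * c / (D * f * math.cos(half) ** 3)

for f in (8000.0, 5000.0, 3000.0):
    print(f, round(beamforming_resolution(f), 4))

# Grid spacing of the 15 x 15 grid: L = 2 * z * tan(alpha/2), split into 14 intervals.
dx = 2 * 2.0 * math.tan(math.radians(45.0) / 2) / 14
print(round(dx, 4))
```

At 8 kHz this gives R ≈ 0.376 m against a grid spacing of about 0.118 m, consistent with Table 2.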
R and the grid spacing Δx are also listed in Table 2. For this sound distribution, both CNN and DAMAS can reconstruct the source distribution accurately. In the second given sound distribution with two adjacent sources, conventional beamforming cannot separate the sources, as expected, while DAMAS can reconstruct the source distribution accurately. However, CNN loses its accuracy for this given sound distribution. With regard to computing speed in applications, CNN is as fast as conventional beamforming, and is significantly faster than DAMAS.

Table 2: Parameters with frequency.

f                  8 kHz    5 kHz    3 kHz
CNN test accuracy  98%      83%      60%
R (m)              0.3757   0.6012   1.0019
Δx (m)             0.1183   0.1183   0.1183

Figure 2: f = 8 kHz. Black cross symbols, positions of synthetic point sources. The first line, three dispersed sources far apart; the second line, two of three sources are adjacent. The first column, conventional beamforming; the second column, DAMAS; the third column, CNN. (Panels (a)-(f): SPL maps in dB over x (m) and y (m), 10 dB dynamic range.)
Figure 3: f = 3 kHz. Black cross symbols, positions of synthetic point sources. The first line, three dispersed sources far apart; the second line, two of three sources are adjacent. The first column, conventional beamforming; the second column, DAMAS; the third column, CNN. (Panels (a)-(f): SPL maps in dB over x (m) and y (m).)

4. Discussion and Conclusion

In this paper, CNN, a kind of deep learning, is preliminarily applied as an alternative algorithm to phased microphone arrays for sound source localization. To the best knowledge of the authors, this paper is the first work so far that applies deep learning to the phased microphone array for sound source localization. Through this preliminary investigation, at high frequency CNN can reconstruct the sound source localizations with excellent spatial resolution, as good as DAMAS, within a very short time, as short as conventional beamforming. This exciting result means that CNN almost perfectly finds the source distribution from the cross-spectral matrix without a propagation function given in advance. This preliminary investigation gives CNN encouraging prospects for applications with an unknown propagation function, and thus CNN deserves to be further explored as a new algorithm.
One point the authors want to emphasize here is that CNN is definitely not data fitting. For three uniform sound sources randomly distributed over 225 grid points, there are C(225, 3) = 1.873 × 10^6 possibilities. In the applications, the number of training data is 3.2 × 10^4, only 1.71% of all the possibilities.

Concerning the investigation and optimization of CNN, many questions are still open and need to be investigated in the future, such as: (i) What is the dynamic range of CNN? (ii) How many layers are most suitable for a given data set? (iii) What kernel number and size are needed? (iv) How large must the training data set be? (v) What is the uncertainty of CNN predictions? (vi) How can the accuracy at low frequency be improved? (vii) How can the accuracy be improved when sound sources are located at the edge of the grid?

The main challenge of CNN is that a large amount of training data is required. Especially in applications with an unknown propagation function, reliable training data can only be accumulated through a large number of experiments, a process that takes a lot of time and money.

References

[1] D. H. Johnson, D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Prentice Hall, New Jersey, 1993.
[2] U. Michel, History of acoustic beamforming, 1st Berlin Beamforming Conference (2006) 1-17.
[3] E. Sarradj, A fast signal subspace approach for the determination of absolute levels from phased microphone array measurements, Journal of Sound and Vibration (2010).
[4] X. Huang, B. Long, I. Vinogradov, E. Peers, Adaptive beamforming for array signal processing in aeroacoustic measurements, Journal of the Acoustical Society of America 131 (2012) 2152-2161.
[5] R. P. Dougherty, Functional beamforming, 5th Berlin Beamforming Conference, BeBeC-2014-01 (2014).
[6] T. F. Brooks, W. M. Humphreys, A deconvolution approach for the mapping of acoustic sources (DAMAS) determined from phased microphone arrays, AIAA-2004-2954 (2004).
[7] T. F. Brooks, W. M. Humphreys, A deconvolution approach for the mapping of acoustic sources (DAMAS) determined from phased microphone arrays, Journal of Sound and Vibration 294 (2006) 856-879.
[8] C. L. Lawson, R. J. Hanson, Solving Least Squares Problems (Chapter 23), SIAM, 1995.
[9] P. Sijtsma, CLEAN based on spatial source coherence, International Journal of Aeroacoustics 6 (2007) 357-374.
[10] R. P. Dougherty, Extensions of DAMAS and benefits and limitations of deconvolution in beamforming, AIAA 2005-2961 (2005).
[11] W. Ma, X. Liu, Improving the efficiency of DAMAS for sound source localization via wavelet compression computational grid, Journal of Sound and Vibration 395 (2017) 341-353.
[12] W. Ma, X. Liu, DAMAS with compression computational grid for acoustic source mapping, Journal of Sound and Vibration 410 (2017) 473-484.
[13] W. Ma, X. Liu, Compression computational grid based on functional beamforming for acoustic source localization, Applied Acoustics 134 (2018) 75-87.
[14] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, www.deeplearningbook.org, 2017.
[15] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436-444.
[16] G. E. Dahl, D. Yu, L. Deng, A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing 20 (2012) 30-42.
[17] A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet classification with deep convolutional neural networks, Communications of the ACM 60 (2012).
[18] Y. D. Hezaveh, L. P. Levasseur, P. J. Marshall, Fast automated analysis of strong gravitational lenses with convolutional neural networks, Nature 548 (2017) 555-557.
[19] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, D. Hassabis, Mastering the game of Go without human knowledge, Nature 550 (2017) 354-359.
[20] J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, Journal of Fluid Mechanics 807 (2016) 155-166.
[21] J. N. Kutz, Deep learning in fluid dynamics, Journal of Fluid Mechanics 814 (2017) 1-4.
[22] F. Chollet, Keras, GitHub Repository (2015).
[23] J. J. Christensen, J. Hald, Technical review: Beamforming, Brüel & Kjær, Denmark (2004).
