A Signal Subspace Rotation Method for Localization of Multiple Wideband Sound Sources
In this paper, the problem of extending narrowband multichannel sound source localization algorithms to the wideband case is addressed. The DOA estimation of narrowband algorithms is based on the estimate of inter-channel phase differences (IPD) betw…
Authors: Kainan Chen, Wenyu Jin, Bharadwaj Desikan
Kainan Chen (1), Wenyu Jin (1, 2), Bharadwaj Desikan (1)
(1) Huawei German Research Center, Munich, Germany, {kainan.chen, bharadwaj.desikan}@huawei.com
(2) Department of Algorithm, Starkey Hearing Technologies, Eden Prairie, United States, wenyu.jin@ieee.org

ABSTRACT

In this paper, the problem of extending narrowband multichannel sound source localization algorithms to the wideband case is addressed. The DOA estimation of narrowband algorithms is based on the estimate of inter-channel phase differences (IPD) between microphones of the sound sources. A new method for wideband sound source DOA estimation based on signal subspace rotation is presented. The proposed algorithm normalizes the narrowband signal statistics by rotating the estimated signal subspace to the wideband counterpart in the eigenvector domain. Then the wideband DOA estimate can be obtained by estimating the normalized IPD from these wideband signal statistics. In addition to requiring less computational complexity compared to repeating the narrowband algorithms for all relevant frequencies of wideband signals, the proposed method also does not require any additional prior knowledge. The experimental results demonstrate the efficacy and the robustness of the proposed method.

Index Terms: Sound source localization, wideband, signal subspace rotation

1. INTRODUCTION

Sound source localization is an important component in many multichannel signal processing systems aiming, e.g., at source tracking, signal separation, enhancement and noise suppression [1, 2]. Traditional sound source localization algorithms such as GCC-PHAT [3] estimate the time delay of arrival (TDOA) between a pair of microphones to localize a single source. SRP-PHAT [4], as an extension, can further localize multiple sources simultaneously.
Another group of algorithms that resolves the simultaneous multiple source localization problem is based on high-resolution subspace techniques, such as MUSIC [5] and ESPRIT [6]. For highly reverberant scenarios, Independent Component Analysis (ICA) based algorithms have been proposed [7, 8].

ESPRIT [6] exploits the algebraic properties of the spatial covariance matrix. This method features good performance for narrowband signals. However, it is not directly applicable to wideband signals. Hence, it processes wideband signals by estimating the individual narrowband results in the Short Time Fourier Transform (STFT) domain. Post-processing schemes such as the histogram method only exploit the narrowband estimation results, while the mutual information between different frequency bins is not considered. Especially for known types of sound sources, such as speech, processing based on clustering STFT bins, such as [9, 10], can further improve the accuracy.

Many studies focus on extensions of narrowband localization algorithms to wideband signals. The methods of [11, 12] transform signals to cylindrical/spherical harmonics for circular/spherical microphone arrays. Then the rotational invariances are normalized through frequencies and the further estimation via ESPRIT is based on wideband signals. A Coherent Signal Subspace (CSS) method that derives the so-called focusing matrices based on steering vectors was introduced in [13]. The focusing matrices aim to adapt each narrowband signal covariance matrix to generalize to the wideband case. Theoretical analysis and experiments show that the focusing matrices lead to a wideband MUSIC algorithm [13], and in [14, 15, 16], the focusing matrices extend ESPRIT to wideband signals.
As pointed out in [17, 18], the focusing matrices method adapts the covariance matrices via linear transformations that are dependent on frequency and DOAs. Therefore, these extensions require a priori knowledge of DOA estimates and of the array manifold for initializing the transformation matrices. To avoid this requirement, the algorithm in [17] estimates an AR model for the transmission channel of the wideband signals as a priori knowledge for wideband localization. The performance of this algorithm crucially depends on how well the AR model is estimated. However, the model proposed there and the conventional models are not suitable for non-stationary scenarios.

A novel method that adapts the narrowband ESPRIT to wideband signals is introduced in this work. The key idea is to rotate the eigenvectors that span the signal subspace by the corresponding frequency, and to reconstruct the covariance matrices for each frequency bin based on the rotated eigenvectors. In comparison with the focusing matrices approach, the proposed method utilizes the wideband signal second-order statistics while not requiring a priori knowledge of the DOA, which is typically obtained by repeating narrowband MUSIC [16]. Experiments with simulated and real recordings show the advantages of the new method regarding performance and computational complexity.

2. SIGNAL MODEL

We assume a microphone array which fits the requirements for using ESPRIT, i.e., to localize $Q$ sources. We use $P$ microphones ($P > Q$) that form a linear array with a uniform spacing $\Delta d$ and with mutually uncorrelated zero-mean sensor noise. The environment is assumed to satisfy free-field and far-field conditions.
The source-microphone model in the STFT domain is defined as

$$X_p(f_i) = \sum_{q=1}^{Q} A_{p,q}(f_i)\, S_q(f_i) + N_p(f_i), \tag{1}$$

where $X_p(f_i)$ and $N_p(f_i)$ denote the observation and the noise at the $p$-th microphone in the STFT domain at frequency bin $f_i$, respectively, $S_q(f_i)$ denotes the $q$-th source signal, and $A_{p,q}(f_i)$ denotes the frequency response of the propagation path from the $q$-th source to the $p$-th microphone, i.e., the element in the $p$-th row and $q$-th column of $\mathbf{A}(f_i)$.

The power spectral density matrix for all $Q$ source signal components at the $i$-th frequency bin $f_i$ is denoted as $\mathbf{R}_s(f_i)$ and the power spectral density matrix of the observed noise is denoted as $\mathbf{R}_n(f_i)$. The narrowband covariance matrix $\mathbf{R}(f_i)$ of the microphone signals can be expressed as

$$\mathbf{R}(f_i) = \mathbf{A}(f_i)\, \mathbf{R}_s(f_i)\, \mathbf{A}^H(f_i) + \mathbf{R}_n(f_i), \tag{2}$$

where $(\cdot)^H$ denotes the Hermitian transpose and $\mathbf{A}(f_i)$ denotes a $P \times Q$ matrix that captures the steering vectors at the frequency bin $f_i$. The element of $\mathbf{A}(f_i)$ at the $p$-th row and $q$-th column, $A(f_i)_{p,q}$, can be expressed as

$$A(f_i)_{p,q} = e^{-j 2 \pi f_i \Delta t_{p,q}}, \qquad \Delta t_{p,q} \overset{\text{def}}{=} (p-1)\, \Delta d\, c^{-1} \sin\theta_q, \tag{3}$$

where $c$ is the speed of sound and $\theta_q$ is the DOA of the $q$-th source. Therefore, the steering vectors are frequency-dependent and the relationship between different frequency bins $f_1$, $f_2$ can be expressed as

$$\mathbf{A}(f_1)^{\circ f_1^{-1}} = \mathbf{A}(f_2)^{\circ f_2^{-1}}, \tag{4}$$

where $\{\cdot\}^{\circ f^{-1}}$ describes element-wise exponentiation by $f^{-1}$ and $f_1$, $f_2$ denote any two frequencies below the spatial aliasing frequency $f_a$,

$$f_a = \left\lfloor \frac{c}{2\, \Delta d \sin\theta} \right\rfloor, \tag{5}$$

where $\lfloor \cdot \rfloor$ denotes a function that returns the next lower frequency bin.

For our proposed evolution of ESPRIT to wideband source localization, the first step is to estimate the narrowband steering vector. Then the IPDs of each source are obtained from the estimation result.
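The relationship in (4) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the speed of sound (343 m/s), the helper names `steering_vector` and `rotate`, and the two test frequencies are assumptions made for illustration, while the spacing and microphone count are taken from Section 5.1. The frequencies are deliberately chosen low enough that the phase at every microphone stays within ±π, so that the principal branch used by Python's complex power reproduces the intended rotation.

```python
import cmath
import math

C = 343.0          # assumed speed of sound in m/s (not stated in the paper)
DELTA_D = 0.044    # microphone spacing in m (Section 5.1)
P = 5              # number of microphones (Section 5.1)


def steering_vector(freq_hz, theta_deg):
    """Single-source steering vector following Eq. (3)."""
    sin_theta = math.sin(math.radians(theta_deg))
    return [
        cmath.exp(-2j * math.pi * freq_hz * p * DELTA_D * sin_theta / C)
        for p in range(P)  # p = 0 .. P-1 plays the role of (p - 1) in Eq. (3)
    ]


def rotate(vec, freq_hz):
    """Element-wise exponentiation by 1/f, as in Eq. (4) (principal branch)."""
    return [z ** (1.0 / freq_hz) for z in vec]


# Two frequencies for which all phases stay inside (-pi, pi] at 45 degrees.
a1 = rotate(steering_vector(500.0, 45.0), 500.0)
a2 = rotate(steering_vector(1000.0, 45.0), 1000.0)

# Largest element-wise deviation between the two rotated steering vectors.
max_diff = max(abs(x - y) for x, y in zip(a1, a2))
print(max_diff)
```

Under these assumptions the two rotated vectors agree up to floating-point rounding. If a phase exceeds ±π at any microphone, the principal branch wraps and the equality no longer holds in this naive form.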
Because the steering vectors are frequency-dependent, the IPDs also depend on frequency, and ESPRIT has to be repeated for each narrowband component to localize wideband sources, as is also the case for other narrowband localization algorithms [5, 19]. A solution to estimate a unique IPD for each source across the frequency bins is to rotate the estimated steering vector based on (4), as shown in the following.

3. PROPOSED APPROACH

The proposed approach considers a novel way of adapting ESPRIT to single or multiple wideband sources. In the single-source scenario, for each narrowband component, the proposed approach rotates the eigenvectors that span the signal subspaces by normalizing the IPD. In the case of multiple sources, it reconstructs covariance matrices from the rotated eigenvectors.

To obtain an estimate of the signal subspace, the least-squares (LS) criterion is conventionally employed to find $Q$ vectors that describe the signal subspace and $P - Q$ noise vectors to represent the noise subspace in an LS sense (see, e.g., [5, 6]). For $Q$ independent target sources, the $P \times P$ matrix $\mathbf{A}(f_i)\, \mathbf{R}_s(f_i)\, \mathbf{A}^H(f_i)$ is of rank $Q$ and positive semidefinite by construction. Therefore, by taking the eigenvalue decomposition of $\mathbf{R}(f_i)$, the eigenvectors corresponding to the largest $Q$ eigenvalues are assumed to be optimum to span the signal subspace at the frequency bin $f_i$ (see, e.g., [6]).

The eigenvalue decomposition of $\mathbf{R}(f_i)$ can be expressed as

$$\mathbf{R}(f_i)\, \mathbf{U}(f_i) = \mathbf{U}(f_i)\, \boldsymbol{\Lambda}(f_i), \tag{6}$$

where $\mathbf{U}$ denotes the eigenvector matrix and $\boldsymbol{\Lambda}$ is a diagonal matrix that contains the corresponding eigenvalues. The eigenvectors that span the signal subspace are denoted as $\mathbf{U}_s$, and the eigenvectors of the noise subspace are denoted as $\mathbf{U}_n$,

$$\mathbf{U}(f_i) = [\, \mathbf{U}_s(f_i) \mid \mathbf{U}_n(f_i) \,]. \tag{7}$$

By the relationship defined in (4), to rotate the estimated subspaces such that they become frequency-independent for all frequency subbands, the estimated signal subspace rotation is defined as

$$\mathbf{U}'(f_i) = \mathbf{U}(f_i)^{\circ f_i^{-1}}. \tag{8}$$

The frequency dependence of the IPDs is then assumed to be cancelled.

ESPRIT is based on source subspace analysis [6]. For single-source ESPRIT, the estimated source subspace is described by the vector $\mathbf{U}'_s(f_i)$. By weighting and summing the rotated eigenvectors that span the source subspace $\mathbf{U}'_s(f_i)$, the estimated wideband source subspace $\mathbf{U}'_{ss}$ can be obtained as

$$\mathbf{U}'_{ss} = \sum_i \beta(f_i)\, \mathbf{U}'_s(f_i), \tag{9}$$

where $\beta(f_i)$ denotes a frequency-dependent weighting factor. The eigenvalues of the sources are related to the signal power at a certain frequency and can be assumed to reflect the reliability of the signal subspace estimates. Therefore, the weighting function is chosen as

$$\beta(f_i) = \operatorname{trace}\{ \boldsymbol{\Lambda}_s(f_i) \}, \tag{10}$$

where $\operatorname{trace}\{\cdot\}$ denotes a function which returns the trace of the matrix and $\boldsymbol{\Lambda}_s(f_i)$ denotes the eigenvalue matrix of the signal subspace.

Following ESPRIT [6], the submatrices satisfy the invariance relation

$$\mathbf{U}'_{ss2} = \mathbf{U}'_{ss1}\, \boldsymbol{\Phi}, \tag{11}$$

with

$$\boldsymbol{\Phi} = \operatorname{diag}\{ e^{-j 2 \pi \Delta t_1}, \ldots, e^{-j 2 \pi \Delta t_Q} \}, \tag{12}$$

where $\Delta t_q = \Delta d\, c^{-1} \sin\theta_q$ is the delay of the $q$-th source between adjacent microphones, and $\mathbf{U}'_{ss1}$ and $\mathbf{U}'_{ss2}$ denote the vectors containing the first and last $P - 1$ elements of $\mathbf{U}'_{ss}$, respectively. Since the combined vector $\mathbf{U}'_{ss}$ is assumed to span the wideband signal subspace $\mathbf{E}_s$, it holds that

$$\mathbf{E}_s = \mathbf{U}'_{ss}\, \mathbf{T}, \tag{13}$$

where $\mathbf{T}$ is a non-singular matrix. Therefore, the subspaces of the two subarrays can be defined as

$$\mathbf{E}_{s1} = \mathbf{U}'_{ss1}\, \mathbf{T}, \qquad \mathbf{E}_{s2} = \mathbf{U}'_{ss2}\, \mathbf{T} = \mathbf{U}'_{ss1}\, \boldsymbol{\Phi}\, \mathbf{T}, \qquad \mathbf{E}_{s2} = \mathbf{E}_{s1}\, \boldsymbol{\Psi}, \tag{14}$$

where $\boldsymbol{\Psi} = \mathbf{T}^{-1} \boldsymbol{\Phi}\, \mathbf{T}$. $\boldsymbol{\Psi}$ can then be obtained from (14) by applying a standard least-squares or total least-squares solver.
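The single-source steps (8) to (14) can be sketched as follows. This is one possible reading of the equations rather than the authors' code: NumPy, the speed of sound (343 m/s), the function names, the synthetic noise-free-plus-white-noise covariance matrices, and the chosen frequencies are all assumptions. In addition, each principal eigenvector is phase-referenced to its first element before the rotation. This step is not described in the paper and is added here only so that the principal branch of the complex power is applied consistently across bins.

```python
import numpy as np

C = 343.0        # assumed speed of sound in m/s (not stated in the paper)
DELTA_D = 0.044  # microphone spacing in m (Section 5.1)
P = 5            # number of microphones (Section 5.1)


def steering_vector(freq_hz, theta_rad):
    """Single-source steering vector following Eq. (3)."""
    delays = np.arange(P) * DELTA_D * np.sin(theta_rad) / C
    return np.exp(-2j * np.pi * freq_hz * delays)


def wideband_doa_single_source(covariances, freqs_hz):
    """Estimate one DOA (radians) from per-bin covariances, Eqs. (8)-(14)."""
    u_wide = np.zeros(P, dtype=complex)
    for cov, f in zip(covariances, freqs_hz):
        eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
        u_s = eigvecs[:, -1]                        # principal eigenvector (Q = 1)
        u_s = u_s * np.exp(-1j * np.angle(u_s[0]))  # assumed phase reference
        u_rot = u_s ** (1.0 / f)                    # rotation, Eq. (8)
        beta = eigvals[-1]                          # weight, Eq. (10) with Q = 1
        u_wide += beta * u_rot                      # accumulation, Eq. (9)
    u1, u2 = u_wide[:-1], u_wide[1:]                # first and last P-1 elements
    psi = np.vdot(u1, u2) / np.vdot(u1, u1)         # least-squares fit of u2 = u1 * psi
    delay = -np.angle(psi) / (2.0 * np.pi)          # delay between adjacent microphones
    return np.arcsin(np.clip(delay * C / DELTA_D, -1.0, 1.0))


# Synthetic test: unit-power source at 45 degrees plus white sensor noise.
theta_true = np.deg2rad(45.0)
freqs = np.array([250.0, 500.0, 750.0, 1000.0, 1250.0])
covs = []
for f in freqs:
    a = steering_vector(f, theta_true)
    covs.append(np.outer(a, a.conj()) + 0.1 * np.eye(P))

theta_hat = np.rad2deg(wideband_doa_single_source(covs, freqs))
print(theta_hat)
```

With exact (not sample-estimated) covariance matrices, the estimate should match the true angle closely. The frequencies are kept low enough that the phase across the whole array stays within ±π; real recordings would use estimated covariances and would not be this clean.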
Since the eigenvalues of $\boldsymbol{\Psi}$ are the diagonal elements of $\boldsymbol{\Phi}$, the locations of the sources can be estimated from them.

For multi-source ESPRIT, each estimated narrowband source subspace is described by the columns of the matrix $\mathbf{U}'_s(f_i)$. The order of the columns of the matrix is generally not known and may differ between frequency bins. Therefore, the estimated signal subspaces cannot be combined using (9).

A simpler and more robust solution than source subspace identification is to reconstruct the estimated signal subspaces back into the form of a $P \times P$ covariance matrix $\mathbf{R}'(f_i)$. This can be achieved by the inverse process of the eigenvalue decomposition (6) using the frequency-component-cancelled matrix $\mathbf{U}'(f_i)$,

$$\mathbf{R}'(f_i) = \mathbf{U}'(f_i)\, \boldsymbol{\Lambda}(f_i)\, \mathbf{U}'(f_i)^{-1}. \tag{15}$$

With the same weighting factor definition as in (10), the reconstructed covariance matrices are accumulated as

$$\mathbf{R}'' = \sum_i \beta(f_i)\, \mathbf{R}'(f_i). \tag{16}$$

Therefore, the eigenvectors that span the estimated signal subspaces are remixed and accumulated across frequencies in $\mathbf{R}''$. By applying the conventional narrowband ESPRIT to $\mathbf{R}''$, the wideband signal subspace can be separated again and the wideband DOAs can then be estimated.

4. IMPLEMENTATION

In the estimated signal subspace rotation step (8), as $f$ gets larger, a finer quantization is required to limit the effect of quantization errors on the DOA estimation. An iterative accumulation method is proposed to solve this numerical sensitivity problem. In each iteration, the estimated signal subspace from the $i$-th ($i \in \mathbb{N}^+$) frequency bin is rotated to adapt to the next frequency bin ($i + 1$) to reconstruct the covariance matrix $\mathbf{R}''_{i+1}$. After weighting and summing with $\mathbf{R}''_i$, the new accumulated covariance matrix $\mathbf{R}'''_{i+1}$ is rotated to the next frequency bin ($\mathbf{R}''_{i+1}$) for the next iteration.
The rotation step is then as small as the power of $f_{i+2} f_{i+1}^{-1}$. The iteration process can be expressed as follows.

Initialization:

$$\mathbf{R}''_1 = \mathbf{U}'_s(f_0)^{\circ f_1 f_0^{-1}}\, \boldsymbol{\Lambda}(f_0)\, \left[ \mathbf{U}'_s(f_0)^{\circ f_1 f_0^{-1}} \right]^{-1} \tag{17}$$

In each iteration:

$$\begin{aligned}
\mathbf{R}'''_{i+1} &= \beta(f_i)\, \mathbf{U}'_s(f_i)\, \boldsymbol{\Lambda}(f_i)\, \mathbf{U}'_s(f_i)^{-1} + \mathbf{R}''_i \\
\mathbf{R}'''_{i+1} &\overset{\text{def}}{=} \mathbf{U}'''_s\, \boldsymbol{\Lambda}'''\, \mathbf{U}'''^{-1}_s \\
\mathbf{R}''_{i+1} &= \mathbf{U}'''^{\circ f_{i+2} f_{i+1}^{-1}}_s\, \boldsymbol{\Lambda}'''\, \left[ \mathbf{U}'''^{\circ f_{i+2} f_{i+1}^{-1}}_s \right]^{-1}
\end{aligned} \tag{18}$$

Because of the spatial aliasing problem, the iteration continues until the lowest aliasing frequency, denoted by $f_{a0}$,

$$f_{a0} \overset{\text{def}}{=} \left\lfloor \min_{\theta} f_a \right\rfloor = \left\lfloor \frac{c}{2\, \Delta d} \right\rfloor. \tag{19}$$

To utilize higher frequencies, the replication method [20] can potentially be useful for detecting the aliasing frequency for each source. Similar to multi-source localization, the DOA can be obtained using the matrix $\mathbf{R}''(f_a)$.

5. EVALUATION

A set of experiments was performed in order to evaluate the performance of the algorithm using real-world recordings. The evaluation includes comparisons to the narrowband ESPRIT with the histogram method (hist-ESPRIT) [21] and to the CSS method [13].

5.1. Experimental setup

The recordings were captured by a uniform linear array (ULA) with five microphones in a low-reverberation lab (T60 ≈ 0.2 s) of size 9 m × 8 m × 3 m. The microphone model was AKG C562CM. The spacing between microphones $\Delta d$ was 0.044 m. The background noise was white noise, and it was played back via 22 surround speakers to emulate diffuse background noise. The set of sources contains white noise sources (independent from the background noise) and a set of speech recordings selected from the GRID Corpus [22].

Table 1. Single white noise source

| Algorithm | MAE (SNR = 10 dB) | SDE (SNR = 10 dB) | MAE (SNR = 0 dB) | SDE (SNR = 0 dB) |
|---|---|---|---|---|
| hist-ESPRIT | 2.10° | 2.47° | 3.44° | 2.47° |
| CSS | 3.81° | 5.58° | 5.52° | 7.34° |
| Proposed | 1.41° | 1.62° | 1.68° | 1.75° |
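As a worked example of (19), the lowest aliasing frequency can be evaluated for the array spacing of this setup. The sketch below is illustrative only: the speed of sound (343 m/s) is an assumption, since the paper does not state it, and the function name is hypothetical. The sampling rate and block length are the values reported in this section, and "next lower frequency bin" is interpreted as rounding down to a multiple of the STFT bin spacing.

```python
import math

C = 343.0             # assumed speed of sound in m/s (not stated in the paper)
DELTA_D = 0.044       # microphone spacing in m (Section 5.1)
SAMPLE_RATE = 16000   # samples per second (Section 5.1)
BLOCK_LENGTH = 1024   # STFT block length in samples (Section 5.1)


def lowest_aliasing_frequency(c, delta_d, bin_spacing_hz):
    """Eq. (19): c / (2 * delta_d), rounded down to the next lower STFT bin."""
    f_exact = c / (2.0 * delta_d)
    bin_index = math.floor(f_exact / bin_spacing_hz)
    return bin_index * bin_spacing_hz


bin_spacing = SAMPLE_RATE / BLOCK_LENGTH  # 15.625 Hz per bin
f_a0 = lowest_aliasing_frequency(C, DELTA_D, bin_spacing)
print(f_a0)  # 3890.625
```

Under the assumed speed of sound, the result lies slightly above the 3800 Hz upper band limit used in the experiments, which is consistent with that limit being described as below the aliasing frequency.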
There were two sources located at angles of 45° and −45°, at a distance of 3 m from the center of the microphone array. In order to evaluate the robustness of the proposed localization method, three test conditions were used: (1) a single white noise source, (2) two competing white noise sources, (3) two competing talkers. The total length of the recordings is 800 s. The sampling rate was 16000 samples per second. The proposed method, hist-ESPRIT [21] and CSS [13] were processed block-wise with 50% overlap. The block length was 1024 samples. The frequency band used for these algorithms was up to 3800 Hz, below the aliasing frequency.

5.2. Results

The first experiment consisted of a single white noise source in diffuse white background noise at SNRs of 10 dB and 0 dB. The source signal was played in an alternating fashion from two speakers located at 45° and −45°. The result is shown in Table 1. The performance of the algorithms is evaluated by the mean absolute error (MAE) and the standard deviation of the error (SDE). It can be seen from the results that the proposed algorithm is more accurate and more stable under this strong background noise scenario.

With the same background noise conditions, the second experiment features two white noise sources being played by both speakers in a competing fashion. The result is shown in Table 2. At an SNR of 10 dB, the MAE of hist-ESPRIT is slightly lower than that of the proposed algorithm, as the influence of estimation errors at low frequencies is slightly lower, while the SDE of the proposed algorithm is clearly lower at both SNRs.

The final experiment used the same scenario as the second experiment, but two simultaneously active speech sources were to be localized. The level of the speech signals changed over time, and the estimated SNR was in the range of [−5, 10] dB. The result is shown in Table 3.
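The two error metrics used in the tables can be computed as in the following sketch. The paper does not spell out the exact definitions, so this is an assumption: MAE is taken as the mean of the absolute DOA errors and SDE as the population standard deviation of the signed errors. The function name and the sample values are hypothetical and are not results from the paper.

```python
import statistics


def mae_and_sde(estimates_deg, truth_deg):
    """Return (MAE, SDE) in degrees for paired DOA estimates and ground truth."""
    errors = [est - ref for est, ref in zip(estimates_deg, truth_deg)]
    mae = statistics.fmean([abs(err) for err in errors])  # mean absolute error
    sde = statistics.pstdev(errors)                       # std. dev. of signed error
    return mae, sde


# Hypothetical per-block estimates for sources at +45 and -45 degrees.
estimates = [44.0, 46.0, -44.0, -47.0]
truth = [45.0, 45.0, -45.0, -45.0]
mae, sde = mae_and_sde(estimates, truth)
print(mae, sde)
```

Other conventions, such as the sample standard deviation or the standard deviation of the absolute error, would give slightly different SDE values.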
Table 2. Two simultaneous white noise sources

| Algorithm | MAE (SNR = 10 dB) | SDE (SNR = 10 dB) | MAE (SNR = 0 dB) | SDE (SNR = 0 dB) |
|---|---|---|---|---|
| hist-ESPRIT | 2.57° | 5.64° | 3.28° | 7.47° |
| CSS | 6.8° | 9.67° | 6.5° | 12.22° |
| Proposed | 2.82° | 3.55° | 2.75° | 5.62° |

Table 3. Two simultaneous talkers, SNR ∈ [−5, 10] dB

| Algorithm | MAE | SDE |
|---|---|---|
| hist-ESPRIT | 4.28° | 6.47° |
| CSS | 6.85° | 15.37° |
| Proposed | 5.75° | 10.62° |

In the two-talker experiment (Table 3), the proposed algorithm performs better than the CSS algorithm, but worse than the hist-ESPRIT algorithm, especially in terms of SDE. The speech signal is spectrally sparse and its energy is concentrated in the fundamental frequency and its harmonics. In contrast, under the white noise background, the intervals between harmonics in the spectrum are noisier (lower SNR). Inspection of the narrowband processing showed that the estimated DOA results from these intervals had large errors (up to 90°). The reason behind the reduction in performance of the proposed algorithm is the inferior robustness of the least-squares criterion in comparison to the histogram method. In the white noise source experiments above, the SNR is constant over the entire frequency range. Therefore, those experiments had better results.

Across the above experiments, compared to the computing time of the hist-ESPRIT algorithm, the CSS algorithm (with 0.1° resolution of the spatial spectrum) was 13% faster, and the proposed algorithm was 22.6% faster.

6. CONCLUSION

A wideband signal subspace DOA estimation approach is presented. The proposed signal subspace rotation method and the narrowband signal covariance matrix reconstruction method outperform the existing conventional approaches in computational complexity. Additionally, the proposed approach avoids the necessity of additional prior knowledge for extending the narrowband ESPRIT to a wideband scheme.
Experiments based on real recordings validate the effectiveness and low computational complexity of the proposed method. As part of the future work, the proposed algorithm should be analyzed in the context of solving the spatial aliasing problem, in order to improve the localization accuracy of the spectrally sparse sources in environments with background noise.

7. REFERENCES

[1] W. Jin, M. J. Taghizadeh, K. Chen, and W. Xiao, "Multichannel noise reduction for hands-free voice communication on mobile phones," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp. 506–510.
[2] W. Jin, B. Desikan, A. Kumar, and K. Chen, "Multichannel noise reduction with interference suppression on mobile phones," in 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2018, pp. 201–205.
[3] C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," Acoustics, Speech and Signal Processing, IEEE Transactions, vol. 24(4), pp. 320–327, 1976.
[4] J. H. DiBiase, A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays, Ph.D. thesis, Brown University, 2000.
[5] R. Schmidt, "Multiple emitter location and signal parameter estimation," Antennas and Propagation, IEEE Transactions, vol. 34(3), pp. 276–280, 1986.
[6] R. Roy and T. Kailath, "ESPRIT-estimation of signal parameters via rotational invariance techniques," Acoustics, Speech and Signal Processing, IEEE Transactions, vol. 37(7), pp. 984–995, 1989.
[7] A. Lombard, Y. Zheng, H. Buchner, and W. Kellermann, "TDOA estimation for multiple sound sources in noisy and reverberant environments using broadband independent component analysis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19(6), pp. 1490–1503, 2011.
[8] F. Nesta, P. Svaizer, and M. Omologo, "Convolutive BSS of short mixtures by ICA recursively regularized across frequencies," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 3, pp. 624–639, 2011.
[9] A. Brendel, C. Huang, and W. Kellermann, "STFT bin selection for localization algorithms based on the sparsity of speech signal spectra," in European Congress and Exposition on Noise Control Engineering. IEEE, 2018, pp. 2561–2568.
[10] S. Araki, H. Sawada, R. Mukai, and S. Makino, "DOA estimation for multiple sparse sources with normalized observation vector clustering," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2006, vol. 5.
[11] H. Teutsch and W. Kellermann, "EB-ESPRIT: 2D localization of multiple wideband acoustic sources using eigenbeams," IEEE International Conference, vol. iii/89 - iii/92 Vol. 3 (3), pp. 89–92, 2005.
[12] D. Khaykin and B. Rafaely, "Coherent signals direction-of-arrival estimation using a spherical microphone array: Frequency smoothing approach," in Applications of Signal Processing to Audio and Acoustics, 2009. WASPAA '09. IEEE Workshop on, 2009, pp. 221–224.
[13] H. Wang and M. Kaveh, "Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources," IEEE Transactions on Acoustics, Speech, and Signal Processing, 1985.
[14] A. Shaw and R. Kumaresan, "Estimation of angles of arrivals of broadband signals," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1987.
[15] Y.-H. Chen and R.-H. Chen, "Directions-of-arrival estimations of multiple coherent broadband signals," IEEE Transactions on Aerospace and Electronic Systems, vol. 29, no. 3, pp. 1035–1043, 1993.
[16] H. Hung and M. Kaveh, "Focussing matrices for coherent signal-subspace processing," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 8, pp. 1272–1281, 1988.
[17] B. Ottersten and T. Kailath, "Direction-of-arrival estimation for wide-band signals using the ESPRIT algorithm," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 2, pp. 317–327, 1990.
[18] F. Raimondi, P. Comon, and O. Michel, "Wideband multilinear array processing through tensor decomposition," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
[19] D. B. Ward, Z. Ding, and R. A. Kennedy, "Broadband DOA estimation using frequency invariant beamforming," IEEE Transactions on Signal Processing, 1998.
[20] K. Chen, J. T. Geiger, W. Jin, M. Taghizadeh, and W. Kellermann, "Robust phase replication method for spatial aliasing problem in multiple sound sources localization," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017.
[21] R. Roy, A. Paulraj, and T. Kailath, "Comparative performance of ESPRIT and MUSIC for direction-of-arrival estimation," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1987.
[22] M. Cooke, J. Barker, S. Cunningham, and X. Shao, "An audio-visual corpus for speech perception and automatic speech recognition," The Journal of the Acoustical Society of America, vol. 120(5), pp. 2421–2424, 2006.