Evaluation of an open-source implementation of the SRP-PHAT algorithm within the 2018 LOCATA challenge
This short paper presents an efficient, flexible implementation of the SRP-PHAT multichannel sound source localization method. The method is evaluated on the single-source tasks of the LOCATA 2018 development dataset, and an associated Matlab toolbox…
Authors: Romain Lebarbenchon, Ewen Camberlein, Diego di Carlo
LOCATA Challenge Workshop, a satellite event of IWAENC 2018
September 17-20, 2018, Tokyo, Japan

EVALUATION OF AN OPEN-SOURCE IMPLEMENTATION OF THE SRP-PHAT ALGORITHM WITHIN THE 2018 LOCATA CHALLENGE

Romain Lebarbenchon 1, Ewen Camberlein 1, Diego di Carlo 1, Clément Gaultier 1, Antoine Deleforge 2, Nancy Bertin 1

1 Univ Rennes 1, Inria, CNRS, IRISA, F-35000 Rennes, France
2 Université de Lorraine, CNRS, Inria, Loria, F-54000 Nancy, France

ABSTRACT

This short paper presents an efficient, flexible implementation of the SRP-PHAT multichannel sound source localization method. The method is evaluated on the single-source tasks of the LOCATA 2018 development dataset, and an associated Matlab toolbox is made available online.

1. INTRODUCTION

Many source localization methods are based on the estimation of the time-difference-of-arrival (TDOA) between microphones. In the two-microphone case, the thorough evaluation carried out in [1] highlighted the good performance of the Generalized Cross-Correlation with PHAse Transform (GCC-PHAT) algorithm [2] among several other methods. A general principle for extending this method to the multichannel setting, named SRP-PHAT, was proposed in [3]. However, to the best of the authors' knowledge, no reference implementation of this family of algorithms (including all its possible variants) has been available to date. In particular, the choices of parameters (pooling in time and frequency, grid search resolution) and the management of the multiple coordinate systems between pairs of microphones and the whole antenna have not been systematically documented.

We present here our participation in the 2018 LOCATA challenge [4]. It aims at evaluating our implementation of SRP-PHAT, whose good performance was already observed in realistic environments in [5], but which had not so far been compared with other methods on a common, realistic dataset.
It is applied here to task 1 (single, static source and static array), task 3 (single moving source and static array) and task 5 (single moving source and moving array) of the challenge. Evaluation of the method on the development dataset shows encouraging results, confirming the legitimate place of SRP-PHAT among state-of-the-art methods for multichannel source localization in reverberant environments.

2. PRINCIPLES

2.1. TDOA and angular spectrum

The general principle of SRP-PHAT is to compute a function $\Phi(\theta, \varphi)$ termed "angular spectrum", where $\theta$ and $\varphi$ are the azimuth and elevation variables, which is expected to exhibit local maxima in the directions of active sources. More precisely, the following steps are followed:

• build in each time-frequency bin $(t, f)$ a local angular spectrum function $\phi(t, f, \theta, \varphi)$ that is large for directions $(\theta, \varphi)$ which are compatible with the observed signal at $(t, f)$ and small otherwise,
• integrate (or pool) this function over the time-frequency plane, leading to a global angular spectrum,
• find the peaks of this spectrum that lie above a certain threshold and are separated from each other by at least a certain minimum angle.

2.2. GCC-PHAT

The GCC algorithm [2] generalizes cross-correlation by multiplying the cross-spectral density (CSD) of two signals by complex weights in the frequency domain. The PHAT weighting consists of normalizing the CSD to have unit amplitude at all frequencies, thus considering phase differences only and minimizing the influence of the source's power spectral density.
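To make the PHAT weighting concrete, the following minimal Python sketch normalizes the CSD of a single time-frequency bin. The function name `phat_weighted_csd`, the `eps` guard against division by zero and the numerical values are illustrative assumptions; they are not taken from the toolbox described in this paper.

```python
import cmath


def phat_weighted_csd(x1, x2, eps=1e-12):
    """Return the CSD of two STFT coefficients normalized to unit amplitude.

    x1, x2: complex STFT coefficients of the two microphones in one (t, f) bin.
    eps: hypothetical guard so that an empty bin does not divide by zero.
    """
    csd = x1 * x2.conjugate()
    return csd / max(abs(csd), eps)


# Same inter-microphone phase difference (0.9 - 0.2 = 0.7 rad),
# but very different source power in the two cases.
loud = phat_weighted_csd(10.0 * cmath.exp(0.9j), 10.0 * cmath.exp(0.2j))
quiet = phat_weighted_csd(0.01 * cmath.exp(0.9j), 0.01 * cmath.exp(0.2j))

print(abs(loud), cmath.phase(loud))
print(abs(quiet), cmath.phase(quiet))
```

Both weighted values have unit magnitude and carry the same phase difference, which illustrates why the weighting reduces the influence of the source's power spectral density.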

In the case of a single broadband source and two microphones, and under far-field (source-array distance much larger than the array aperture $h$), free-field (only the direct source-to-microphone sound paths are considered) and noiseless assumptions, the asymptotic GCC-PHAT of the microphone signals in the discrete frequency domain is $e^{-2i\pi f \tau \cos(\alpha)}$, where $f$ is the frequency index, $\alpha$ is the angle of arrival (AOA) of the source signal at the array and $\tau$ is the maximum observable delay in samples. We have $\tau = h F_s / C$, where $C$ denotes the speed of sound and $F_s$ is the sampling frequency. This leads to the following natural local angular spectrum for a microphone pair:

$$\phi^{\text{GCC-PHAT}}(t, f, \alpha) = \Re\left( \frac{x_1(t, f)\, x_2^*(t, f)}{\left| x_1(t, f)\, x_2^*(t, f) \right|}\, e^{-2i\pi f \tau \cos(\alpha)} \right) \qquad (1)$$

where $x_1$, $x_2$ denote the microphone signals in the short-time Fourier transform (STFT) domain and $\alpha$ is a local azimuth in the coordinate system defined by the microphone pair. Note that this spectrum does not depend on elevation.

2.3. Extension to multichannel

SRP-PHAT's principle is to first compute local angular spectra for each microphone pair, then to aggregate them across pairs (bringing them back into the global coordinate system), before pooling and maximization. The computation consists of the following steps:

1. Define the search space, i.e. a grid of possible DOAs $(\theta_j, \varphi_k)$ for which we want to evaluate $\Phi$ in the global coordinate system;
2. For each microphone pair $n$:
   (a) Compute the corresponding AOAs $\{\alpha_{jk}^{(n)}\}_{jk}$ with respect to the microphone pair;
   (b) Resample $\{\alpha_{jk}^{(n)}\}_{jk}$ into a smaller set $\{\alpha_i^{(n)}\}_i$ in order to reduce computational time;
   (c) Compute the GCC-PHAT local angular spectrum $\phi_n$ at angles $\{\alpha_i^{(n)}\}_i$ for microphone pair $n$ according to formula (1);
   (d) Linearly interpolate $\phi_n$ back to the original angle resolution and global coordinate system;
3. Compute the global spectrum $\Phi(\theta_j, \varphi_k)$ by pooling the local angular spectra $\phi_n$ over all time-frequency bins $(t, f)$ and across all microphone pairs $n$. Pooling methods (such as maximum or sum) and their order over each of the $f$, $t$, $n$ indexes must be chosen for this purpose;
4. Find the indexes $j$ and $k$ of the largest peak (single-source case) or peaks (multiple-source case) of $\Phi(\theta_j, \varphi_k)$, yielding the estimated source azimuth(s) $\theta_j$ and elevation(s) $\varphi_k$.

A Matlab toolbox allowing easy and flexible implementations of SRP-PHAT, as well as 7 other angular spectrum-based methods, was made freely available online under the name Multichannel BSS Locate¹.

3. EXPERIMENTS AND RESULTS

We now evaluate our implementation of SRP-PHAT on tasks 1, 3 and 5 of the LOCATA challenge [4] with the robot head and the Eigenmike arrays, for azimuth and elevation estimation. The other two antennas of the challenge are discarded since they are near-linear (preventing elevation estimation). Moreover, DICIT is incompatible with our far-field assumption, while the dummy head is incompatible with our free-field assumption. All microphones (12) and all microphone pairs (66) are used with the robot head.
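
The pair counts used in this section can be checked by enumerating unordered microphone pairs. The short Python sketch below assumes a 32-capsule Eigenmike (a figure not stated in the text, but consistent with the 496 available pairs quoted for that array); the helper name `microphone_pairs` is hypothetical.

```python
from itertools import combinations


def microphone_pairs(n_mics):
    """List all unordered pairs (i, j), i < j, among n_mics microphones."""
    return list(combinations(range(n_mics), 2))


robot_head_pairs = microphone_pairs(12)   # robot head: 12 microphones
eigenmike_pairs = microphone_pairs(32)    # assumed 32 capsules

print(len(robot_head_pairs), len(eigenmike_pairs))  # 66 496
```

Since each pair requires its own local angular spectrum, the cost of the method grows roughly quadratically with the number of microphones, which motivates discarding part of the pairs for larger arrays.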

All microphones are also used with the Eigenmike, but microphone pairs with a curvilinear distance smaller than 90° are discarded in order to reduce the overall algorithm complexity, resulting in 240 microphone pairs being used instead of the 496 available pairs. A sphere sampled with 1° resolution in both azimuth and elevation is used as the search space (see Sec. 2.3, step 1). For each microphone pair, the evaluated AOAs (see Sec. 2.3, step 2(b)) are computed with 5° resolution. SRP-PHAT is applied every 256 ms to 512 ms signal segments (downsampled to 16 kHz) using an overlapping sliding analysis window. The STFT is applied to each channel of the signals using 64 ms Fourier frames (1024 samples) with 50% overlap and sine windows. This results in 15 Fourier frames per analysis window. The pooling methods used are first summation over microphone pairs and frequencies, then maximum over time frames. Only the largest value over the global angular search space is returned as a DOA estimate, as only single-source localization is considered. DOA estimates are linearly interpolated over time in order to match the one-estimate-per-time-stamp requirement of LOCATA.

¹ http://bass-db.gforge.inria.fr/bss_locate/

Table 1 shows the localization errors obtained with the proposed method. Encouragingly, mean azimuth errors (first row) are 2 to 6 times smaller than those reported for the baseline MUSIC method [4] for the same arrays and tasks. To reduce the effect of gross errors on the reported mean values, average errors over successful localizations are also reported for different success thresholds. The proposed method localizes the target source with less than 20° error in at least 92% of the tests for all tasks and arrays. Nearly 100% correct localization is achieved in the static scenario (task 1). While the robot head array enables better performance in all tasks, average errors never exceed 10° regardless of the task and array.

4. CONCLUSION

In this paper, we reported competitive single sound source localization results using the SRP-PHAT implementation of the Multichannel BSS Locate toolbox on the 2018 LOCATA challenge dataset. The toolbox's flexibility allows one to easily adapt the method to arbitrary array geometries and to various types of emitted signals and environments, and to easily generalize any two-channel localization method to more channels. It can also be tuned to find a trade-off between computational time and accuracy, although we focused on accuracy for this challenge. While the toolbox can output multiple source direction estimates for a given input signal, it does not yet incorporate source counting or source tracking solutions, preventing its use in multiple-source scenarios. Such extensions will be considered for future releases of the toolbox.

5. REFERENCES

[1] C. Blandin, A. Ozerov, and E. Vincent, "Multi-source TDOA estimation in reverberant audio using angular spectra and clustering," Signal Processing, vol. 92, no. 8, pp. 1950–1960, 2012.
[2] C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320–327, August 1976.
[3] J. Dibiase, H. Silverman, and M. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays: Signal Processing Techniques and Applications. Springer, 2001, pp. 157–180.
[4] H. W. Löllmann, C. Evers, A. Schmidt, H. Mellmann, H. Barfuss, P. A. Naylor, and W. Kellermann, "The LOCATA challenge data corpus for acoustic source localization and tracking," in IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), (Sheffield, UK), 2018.
[5] N. Bertin, E. Camberlein, R. Lebarbenchon, E. Vincent, S. Sivasankaran, I. Illina, and F. Bimbot, "Voicehome-2, an extended corpus for multichannel speech processing in real homes," to appear in Speech Communication, Elsevier, 2018.

Table 1: Sound localization results obtained on the LOCATA development dataset using the proposed SRP-PHAT method. For each task and array, different success thresholds (in degrees) are considered. A localization is considered successful when both azimuth and elevation estimation errors are below these thresholds. The percentage of successful localizations (suc.) and their average azimuth (az.) and elevation (el.) errors in degrees are shown for each threshold.

                   Task 1                                  Task 3                                  Task 5
Success threshold  Robot head          Eigenmike           Robot head          Eigenmike           Robot head          Eigenmike
                   az. | el. | suc.    az. | el. | suc.    az. | el. | suc.    az. | el. | suc.    az. | el. | suc.    az. | el. | suc.
No thresh.         1.51 | 1.71 | -     7.04 | 4.68 | -     4.43 | 2.66 | -     8.79 | 4.41 | -     6.19 | 3.16 | -     9.31 | 4.37 | -
20°                1.43 | 1.66 | 99.9  6.95 | 4.64 | 99.9  2.48 | 1.75 | 95.8  7.82 | 3.12 | 92.5  1.76 | 1.83 | 94.5  5.84 | 2.94 | 94.8
10°                1.43 | 1.66 | 99.9  6.95 | 4.64 | 99.9  2.35 | 1.56 | 93.7  6.27 | 2.40 | 71.0  1.64 | 1.77 | 93.3  5.43 | 2.80 | 89.5
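
As an illustration of formula (1) and of steps 2(b) to 4 of Sec. 2.3, the following Python sketch localizes a simulated source with a single microphone pair. It is a simplified sketch under stated assumptions, not the Multichannel BSS Locate implementation. The aperture h = 0.1 m, the speed of sound C = 343 m/s, the synthetic signals and all function names are hypothetical. The frequency f is taken as the normalized frequency k/N in cycles per sample, and the sign of the simulated inter-microphone phase is chosen so that formula (1) peaks at the true AOA; both are assumptions about conventions that the text does not spell out. The 1024-sample frames at 16 kHz, the 5° evaluation grid, the 1° search grid and the pooling order (sum over frequencies, then maximum over time frames) follow Sec. 3.

```python
import cmath
import math
import random


def gcc_phat_spectrum(x1, x2, freqs, tau, alphas_deg):
    """Formula (1) for one time frame, summed over frequency bins."""
    spectrum = []
    for alpha in alphas_deg:
        cos_a = math.cos(math.radians(alpha))
        total = 0.0
        for c1, c2, f in zip(x1, x2, freqs):
            csd = c1 * c2.conjugate()
            mag = abs(csd)
            if mag == 0.0:
                continue  # an empty bin carries no phase information
            steer = cmath.exp(-2j * math.pi * f * tau * cos_a)
            total += (csd / mag * steer).real
        spectrum.append(total)
    return spectrum


def interp_linear(xs, ys, x):
    """Piecewise-linear interpolation of (xs, ys) at x, xs increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1.0 - w) * ys[i] + w * ys[i + 1]
    raise ValueError("x outside interpolation grid")


def simulate_pair(alpha0_deg, tau, freqs, n_frames, rng):
    """Far-field, free-field, noiseless pair signals (assumed sign convention)."""
    cos0 = math.cos(math.radians(alpha0_deg))
    frames = []
    for _ in range(n_frames):
        x2 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in freqs]
        x1 = [s * cmath.exp(2j * math.pi * f * tau * cos0)
              for s, f in zip(x2, freqs)]
        frames.append((x1, x2))
    return frames


def localize(frames, freqs, tau, coarse_step=5, fine_step=1):
    """Evaluate on a coarse AOA grid, interpolate, pool by max over frames."""
    coarse = list(range(0, 181, coarse_step))
    fine = list(range(0, 181, fine_step))
    pooled = [-math.inf] * len(fine)
    for x1, x2 in frames:
        local = gcc_phat_spectrum(x1, x2, freqs, tau, coarse)
        for i, alpha in enumerate(fine):
            pooled[i] = max(pooled[i], interp_linear(coarse, local, alpha))
    best = max(range(len(fine)), key=pooled.__getitem__)
    return fine[best], pooled


H, FS, C = 0.1, 16000.0, 343.0             # hypothetical aperture, rate, sound speed
TAU = H * FS / C                           # maximum observable delay in samples
FREQS = [k / 1024 for k in range(1, 513)]  # normalized frequencies, 1024-sample frames

frames = simulate_pair(60.0, TAU, FREQS, n_frames=3, rng=random.Random(0))
estimate, spectrum = localize(frames, FREQS, TAU)
print(estimate)
```

With these synthetic signals the pooled spectrum peaks at the simulated AOA of 60°. A single pair only recovers the local angle α; in the multichannel case the interpolated spectra of all pairs would be mapped to the global (azimuth, elevation) grid and summed before the maximum over time frames is taken.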