A Steered Response Power Method for Sound Source Localization With Generic Acoustic Models
The steered response power (SRP) method is one of the most popular approaches for acoustic source localization with microphone arrays. It is often based on simplifying acoustic assumptions, such as an omnidirectional sound source in the far field of the microphone array(s), free-field propagation, and spatially uncorrelated noise. In reality, however, there are many acoustic scenarios where such assumptions are violated. This paper proposes a generalization of the conventional SRP method that allows generic acoustic models to be applied for localization with arbitrary microphone constellations. These models may consider, for instance, level differences in distributed microphones, the directivity of sources and receivers, or acoustic shadowing effects. Measured acoustic transfer functions may also be used as the acoustic model. We show that the delay-and-sum beamforming of the conventional SRP is not optimal for localization with generic acoustic models. We therefore propose a generalized SRP beamforming criterion that considers generic acoustic models and spatially correlated noise, and derive an optimal SRP beamformer. Furthermore, we propose and analyze appropriate frequency weightings. Unlike the conventional SRP, the proposed method can jointly exploit observed level and time differences between the microphone signals to infer the source location. Realistic simulations of three different microphone setups with speech under various noise conditions indicate that the proposed method can significantly reduce the mean localization error compared to the conventional SRP; in particular, a reduction of more than 60% can be achieved in noisy conditions.
💡 Research Summary
The paper addresses a fundamental limitation of the conventional steered response power (SRP) method, particularly the SRP‑PHAT variant, which assumes a free‑field, far‑field, omnidirectional source and spatially uncorrelated noise. In many realistic scenarios—distributed microphone networks, near‑field conditions, directional sources or receivers, and acoustic shadowing—these assumptions are violated, causing traditional SRP to ignore valuable level‑difference cues (ILD) and to perform poorly.
To overcome this, the authors propose a generalized SRP (GSRP) framework that can incorporate arbitrary acoustic models, including near‑field distance attenuation, source/receiver directivity, head‑related transfer functions, and measured transfer functions. They first demonstrate that simply plugging a generic model into the conventional delay‑and‑sum (DS) beamformer leads to pathological behavior: the beamformer output power diverges when the steering point approaches a microphone and collapses for distant points. This motivates the use of a distortionless‑response beamformer (MVDR or MPDR) that normalizes the output power with respect to the source power.
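The behaviour described above can be illustrated with a minimal sketch. It assumes a simple free-field point-source model, h_m = exp(-jωd_m/c)/d_m, which is only one possible generic acoustic model and is not taken from the paper. The function names, the microphone layout, and the choice to use the model vector directly as delay-and-sum style weights (w = h) are all hypothetical choices made for illustration.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in m/s


def steering_vector(mics, point, omega):
    """Free-field point-source model (an assumption): delay plus 1/d attenuation."""
    h = []
    for mic in mics:
        d = math.dist(mic, point)
        h.append(cmath.exp(-1j * omega * d / C) / d)
    return h


def ds_gain(h):
    """Power gain |w^H h|^2 when the model itself is used as weights, w = h."""
    return abs(sum(x.conjugate() * x for x in h)) ** 2


def distortionless_gain(h):
    """Power gain |w^H h|^2 for w = h / (h^H h), which enforces w^H h = 1."""
    norm = sum(abs(x) ** 2 for x in h)
    w = [x / norm for x in h]
    return abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2


mics = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical layout, metres
omega = 2 * math.pi * 1000.0                  # 1 kHz

# Steering points: very close to a microphone, inside the array, far away.
for point in [(0.01, 0.01), (0.5, 0.5), (20.0, 20.0)]:
    h = steering_vector(mics, point, omega)
    print(point, ds_gain(h), distortionless_gain(h))
```

Under these assumptions, the unnormalized gain grows without bound as the steering point approaches a microphone and shrinks toward zero for distant points, while the normalized weights keep the gain toward the steering point at one regardless of position.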
The optimal beamformer weights are derived as
w_opt(ω,p)=Φ_vv⁻¹(ω) h(ω,p) /
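The formula above is cut off in this summary. As a hedged illustration, the sketch below uses the textbook MVDR solution, w = Φ_vv⁻¹ h / (hᴴ Φ_vv⁻¹ h), which is an assumption about the missing denominator rather than a quotation from the paper. The function name `mvdr_weights` and all example values are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def mvdr_weights(phi_vv, h):
    """Textbook MVDR weights for one frequency bin and one candidate position.

    phi_vv : (M, M) Hermitian positive-definite noise covariance matrix
    h      : (M,) acoustic transfer vector from the candidate position to the mics
    """
    # Solve phi_vv @ x = h instead of forming the explicit inverse.
    x = np.linalg.solve(phi_vv, h)
    # Normalize so that w^H h = 1 (distortionless constraint).
    return x / (h.conj() @ x)


# Hypothetical example: 3 microphones with spatially correlated noise.
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
phi_vv = a @ a.conj().T + 0.1 * np.eye(3)  # Hermitian, positive definite
h = np.array([1.0, 0.5 * np.exp(-0.3j), 0.2 * np.exp(-1.1j)])

w = mvdr_weights(phi_vv, h)
response = w.conj() @ h                        # gain toward the candidate position
noise_power = np.real(w.conj() @ phi_vv @ w)   # residual noise power at the output

# Reference: normalized weights that ignore the noise correlation.
w_ref = h / (h.conj() @ h)
noise_power_ref = np.real(w_ref.conj() @ phi_vv @ w_ref)
```

Both weight vectors pass the modeled source with unit gain, but the MVDR weights minimize the output noise power under that constraint, so `noise_power` should not exceed `noise_power_ref`.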