Localization and Tracking of an Acoustic Source using a Diagonal Unloading Beamforming and a Kalman Filter

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We present the signal processing framework and some results for the IEEE AASP challenge on acoustic source localization and tracking (LOCATA). The system is designed for direction of arrival (DOA) estimation in single-source scenarios. The proposed framework consists of four main building blocks: pre-processing, voice activity detection (VAD), localization, and tracking. The signal pre-processing pipeline includes the short-time Fourier transform (STFT) of the multichannel input captured by the array and the estimation of the cross power spectral density (CPSD) matrices. The VAD is calculated with a trace-based threshold of the CPSD matrices. The localization is then computed using our recently proposed diagonal unloading (DU) beamforming, which has low complexity and high resolution. The DOA estimate is finally smoothed with a Kalman filter (KF). Experimental results on the LOCATA development dataset are reported in terms of the root mean square error (RMSE) for a 7-microphone linear array, the 12-microphone pseudo-spherical array integrated in a prototype head for a humanoid robot, and the 32-microphone spherical array.


💡 Research Summary

This paper presents a comprehensive signal processing framework designed for the IEEE AASP Challenge on Acoustic Source Localization and Tracking (LOCATA). The system is tailored for Direction of Arrival (DOA) estimation and tracking in single-source scenarios. The proposed architecture is structured around four core modules: pre-processing, voice activity detection (VAD), localization, and tracking.

The process begins with pre-processing, where the multichannel audio signals from the microphone array are transformed into the time-frequency domain using the Short-Time Fourier Transform (STFT). Subsequently, Cross-Power Spectral Density (CPSD) matrices are estimated for a specified frequency range, forming the foundational data for subsequent stages.
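The pre-processing stage can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `stft_multichannel` and `estimate_cpsd`, the Hann window, the frame length, the hop size, and the block-averaging length `n_avg` are all assumptions made for the example. The restriction to a specified frequency range is left out for brevity.

```python
import numpy as np

def stft_multichannel(x, n_fft=512, hop=256):
    """STFT of a multichannel signal (illustrative parameters).

    x: real array of shape (n_mics, n_samples).
    Returns a complex array of shape (n_frames, n_bins, n_mics).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    # Slice overlapping frames and apply the window: (n_frames, n_mics, n_fft)
    frames = np.stack(
        [x[:, i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.fft.rfft(frames, axis=-1)  # (n_frames, n_mics, n_bins)
    return spec.transpose(0, 2, 1)

def estimate_cpsd(X, n_avg=8):
    """Estimate CPSD matrices by averaging outer products over n_avg frames.

    X: STFT of shape (n_frames, n_bins, n_mics).
    Returns an array of shape (n_blocks, n_bins, n_mics, n_mics).
    """
    n_blocks = X.shape[0] // n_avg
    Xb = X[:n_blocks * n_avg].reshape(n_blocks, n_avg, X.shape[1], X.shape[2])
    # R[b, f] = mean over t of x[t, f] x[t, f]^H
    return np.einsum('btfm,btfn->bfmn', Xb, Xb.conj()) / n_avg
```

Each resulting matrix is Hermitian, and its trace is the sum of the per-microphone power estimates at that frequency, which is the quantity the later stages rely on.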

The VAD module operates by calculating the trace of the CPSD matrices, which represents the total power received by the array. A simple thresholding mechanism is applied to this trace value to determine the presence of an active acoustic source, providing a computationally efficient gate for the localization process.
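A trace-based gate of this kind might look like the sketch below. The summary does not say how the trace is combined across frequency bins or how the threshold is chosen, so summing over bins and using a fixed `threshold` are assumptions, as is the function name `trace_vad`.

```python
import numpy as np

def trace_vad(R, threshold):
    """Trace-based activity decision per time block (illustrative).

    R: CPSD matrices of shape (n_blocks, n_bins, n_mics, n_mics).
    threshold: power threshold (assumed fixed and tuned by hand).
    Returns a boolean array of shape (n_blocks,).
    """
    # The trace of each CPSD matrix is the total power across microphones;
    # summing over frequency bins is an assumption of this sketch.
    power = np.trace(R, axis1=-2, axis2=-1).real.sum(axis=-1)
    return power > threshold
```

Blocks flagged as inactive can then be skipped by the localization stage, which is where the computational saving comes from.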

The heart of the system is the localization module, which employs a Diagonal Unloading (DU) beamforming technique. This method enhances spatial resolution by subtracting a scaled identity matrix (where the scale factor is the trace of the CPSD matrix) from the estimated CPSD matrix. This transformation attenuates the signal subspace relative to the noise subspace, leading to a sharper spatial spectrum. A broadband steered response power (SRP) map is constructed by incoherently fusing the narrowband DU responses across frequencies. The direction corresponding to the peak of this SRP map is selected as the DOA estimate for each time frame.
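To make the DU step concrete, here is a hedged sketch for a linear array under a far-field model. The unloading step R − tr(R)·I follows the description above. Taking the response as the inverse of the negated quadratic form, normalizing each narrowband response by its maximum before summing, the speed of sound `C`, and the helper names `steering_ula`, `du_response`, and `du_srp` are assumptions of this example and may differ from the paper.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def steering_ula(mic_pos, freq, angles):
    """Far-field steering vectors for a linear array.

    mic_pos: (n_mics,) positions along the array axis in metres.
    angles: (n_angles,) directions in radians, measured from the array axis.
    Returns a complex array of shape (n_angles, n_mics).
    """
    delays = np.outer(np.cos(angles), mic_pos) / C
    return np.exp(-2j * np.pi * freq * delays)

def du_response(R, A):
    """Narrowband DU pseudo-spectrum for one CPSD matrix R and steering set A."""
    # Diagonal unloading: subtract tr(R) * I from the CPSD matrix.
    R_du = R - np.trace(R).real * np.eye(R.shape[0])
    # a^H R_du a for every candidate direction; non-positive by construction.
    quad = np.einsum('am,mn,an->a', A.conj(), R_du, A).real
    # Assumed response form: inverse of the negated quadratic form.
    return 1.0 / np.maximum(-quad, 1e-12)

def du_srp(R_f, freqs, mic_pos, angles):
    """Broadband map: incoherent sum of normalized narrowband DU responses."""
    srp = np.zeros(len(angles))
    for R, f in zip(R_f, freqs):
        p = du_response(R, steering_ula(mic_pos, f, angles))
        srp += p / p.max()  # per-frequency normalization is an assumption
    return srp
```

The DOA estimate for a frame is then `angles[np.argmax(srp)]`. Because every eigenvalue of R − tr(R)·I is non-positive, the quadratic form never changes sign, and it is closest to zero when the steering vector aligns with the dominant signal component, which is what produces the sharp peak.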

To smooth the potentially noisy and discontinuous frame-wise DOA estimates and produce a coherent trajectory, a Kalman Filter (KF) is used for tracking. The KF models the source motion with a state vector containing azimuth, elevation, and their respective velocities. It recursively predicts and corrects the state based on the new DOA measurements from the localization module, effectively filtering out jitter and providing a stable track.
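A tracker with this state vector could be sketched as below. The constant-velocity transition model, the noise levels `q` and `r`, the azimuth wrap on the innovation, and the choice to run a prediction-only step when no measurement is available (for example when the VAD is inactive) are assumptions for illustration. The class name `DoaKalman` is hypothetical.

```python
import numpy as np

class DoaKalman:
    """Kalman filter over [azimuth, elevation, az_velocity, el_velocity] in radians."""

    def __init__(self, dt, q=1e-3, r=1e-2):
        # Constant-velocity transition (assumed motion model).
        self.F = np.eye(4)
        self.F[0, 2] = dt
        self.F[1, 3] = dt
        # Only the two angles are observed.
        self.H = np.zeros((2, 4))
        self.H[0, 0] = 1.0
        self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)  # process noise covariance (assumed)
        self.R = r * np.eye(2)  # measurement noise covariance (assumed)
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def step(self, z=None):
        """Predict, then correct with measurement z = (az, el) if one is given."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        if z is not None:
            innov = np.asarray(z, dtype=float) - self.H @ self.x
            # Wrap the azimuth innovation to [-pi, pi) to avoid jumps at the seam.
            innov[0] = (innov[0] + np.pi) % (2 * np.pi) - np.pi
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ innov
            self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()
```

Calling `step(z)` once per frame with the peak direction from the localization stage yields the smoothed azimuth and elevation. Calling `step()` with no argument coasts on the predicted motion.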

The experimental evaluation was conducted on the LOCATA development dataset. The system’s performance was tested using three distinct microphone arrays: a 7-channel linear subarray, a 12-channel pseudo-spherical array mounted on a humanoid robot head, and a 32-channel spherical Eigenmike array. The evaluation covered tasks of varying difficulty: static source with a static array (Task 1), moving source with a static array (Task 3), and moving source with a moving array (Task 5). Performance was quantified using the Root Mean Square Error (RMSE) between the estimated and ground-truth DOA angles.
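For reference, an RMSE over angles can be computed as in the sketch below. Wrapping each error to the interval [-180°, 180°) before squaring is an assumption of this example. The summary does not state whether the paper wraps errors this way, and the function name `angular_rmse_deg` is hypothetical.

```python
import numpy as np

def angular_rmse_deg(est_deg, ref_deg):
    """RMSE between estimated and reference angles, both in degrees."""
    diff = np.asarray(est_deg, dtype=float) - np.asarray(ref_deg, dtype=float)
    # Wrap so that, e.g., 359 deg versus 1 deg counts as a 2 deg error.
    err = (diff + 180.0) % 360.0 - 180.0
    return float(np.sqrt(np.mean(err ** 2)))
```

Without the wrap, an azimuth track that crosses the 0°/360° seam would be penalized by nearly a full turn even when the estimate is almost correct.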

The results, reported as a table of RMSE values, show that performance varies considerably with array geometry and task complexity. The same algorithm runs on all three arrays, but the absolute RMSE values differ, which reflects the influence of array characteristics and of source and array motion. The paper also includes figures showing example waveforms, VAD decisions, and the resulting azimuth and elevation tracks for specific tasks and recordings. It concludes that combining low-complexity, high-resolution DU beamforming with a Kalman filter tracker is a viable framework for acoustic source localization and tracking in single-source scenarios.

