DNN-Based Online Source Counting Based on Spatial Generalized Magnitude Squared Coherence
The number of active sound sources is a key parameter in many acoustic signal processing tasks, such as source localization, source separation, and multi-microphone speech enhancement. This paper proposes a novel method for online source counting by detecting changes in the number of active sources based on spatial coherence. The proposed method exploits the fact that a single coherent source in spatially white background noise yields high spatial coherence, whereas noise alone results in low spatial coherence. By applying a spatial whitening operation, the source counting problem is reformulated as a change detection task, aiming to identify the time frames in which the number of active sources changes. The method uses the generalized magnitude-squared coherence to quantify spatial coherence, providing features for a compact neural network trained to detect source count changes framewise. Simulation results with binaural hearing aids in reverberant acoustic scenes with up to 4 speakers and background noise demonstrate the effectiveness of the proposed method for online source counting.
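Because counting is reformulated as change detection, the source count itself follows from accumulating the framewise change decisions. The sketch below illustrates only that bookkeeping step, under stated assumptions: the function name `accumulate_source_count` and the label convention (+1 for a source becoming active, -1 for one becoming inactive, 0 for no change) are hypothetical, and the neural network that would produce these labels is not modeled.

```python
def accumulate_source_count(change_labels, initial_count=0, max_sources=None):
    """Turn framewise change decisions into an online source-count estimate.

    change_labels: iterable of ints; hypothetical convention where +1 means a
    source became active, -1 means a source became inactive, 0 means no change.
    Returns the estimated number of active sources after each frame.
    """
    counts = []
    count = initial_count
    for label in change_labels:
        count += int(label)
        count = max(count, 0)  # a source count can never be negative
        if max_sources is not None:
            count = min(count, max_sources)  # optional cap, e.g. 4 speakers
        counts.append(count)
    return counts


# Frames: silence, speaker 1 starts, speaker 2 starts, speaker 1 stops.
print(accumulate_source_count([0, 1, 0, 1, 0, -1, 0]))  # [0, 1, 1, 2, 2, 1, 1]
```

Since each frame only updates a running integer, this step adds no latency beyond that of the change detector itself.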
💡 Research Summary
The paper addresses the fundamental problem of estimating the number of simultaneously active sound sources in real time, a prerequisite for many downstream acoustic‑signal‑processing tasks such as source localization, separation, and multi‑microphone speech enhancement. Existing approaches fall into three categories: (i) single‑microphone deep neural networks that directly regress the source count from the mixture, (ii) multi‑microphone methods that rely on direction‑of‑arrival (DOA) estimation or clustering and therefore require precise knowledge of the array geometry, and (iii) conventional change‑detection schemes that apply fixed thresholds to spectral or coherence measures. Each of these suffers from at least one of the following: high computational latency, strong dependence on array calibration, or poor robustness to reverberation and noise.
The authors propose a novel framework that exploits spatial coherence as a physical cue. In spatially white background noise, the covariance matrix of the microphone signals is diagonal. Adding a single coherent source introduces a rank‑1 component, which sharply increases the spatial coherence. Through “time‑reversed whitening”, i.e., subtracting the covariance of a past reference frame from that of the current frame, the contribution of sources that were already active is cancelled, leaving only the rank‑1 update caused by a source that has just become active (or inactive).
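The rank‑1 argument above can be checked numerically with idealized, exactly known covariance matrices. This is a minimal sketch, not the authors' implementation: the array size, noise power, source powers, and the random unit‑modulus vectors standing in for acoustic transfer functions are all assumptions, and real covariance estimates would be noisy rather than exact.

```python
import numpy as np

def unit_modulus_vector(num_mics, rng):
    """Hypothetical stand-in for a source's acoustic transfer function (|a_m| = 1)."""
    return np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, num_mics))

M = 4              # number of microphones (assumed)
noise_power = 0.1  # power of the spatially white noise (assumed)
rng = np.random.default_rng(0)
a1 = unit_modulus_vector(M, rng)
a2 = unit_modulus_vector(M, rng)

# Spatially white noise only: the covariance matrix is a scaled identity (diagonal).
phi_noise = noise_power * np.eye(M)

# Reference (past) frame: one coherent source with power 1.0 adds a rank-1 term a1 a1^H.
phi_ref = phi_noise + 1.0 * np.outer(a1, a1.conj())

# Current frame: a second source with power 0.5 has become active.
phi_cur = phi_ref + 0.5 * np.outer(a2, a2.conj())

# Eigenvalues of the reference covariance (ascending): the noise power three times,
# plus one dominant value equal to noise power + source power * ||a1||^2 = 0.1 + 4.
eig_ref = np.linalg.eigvalsh(phi_ref)

# Subtracting the past reference covariance cancels the noise and the source
# that was already active, leaving only the rank-1 update of the new source.
delta = phi_cur - phi_ref
rank_delta = np.linalg.matrix_rank(delta, tol=1e-9, hermitian=True)

print(np.round(eig_ref, 3))
print(rank_delta)
```

A single dominant eigenvalue over a flat noise floor is exactly the situation that a coherence measure based on the largest eigenvalue is designed to detect.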
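From the whitened covariance matrix, the Generalized Magnitude‑Squared Coherence (GMSC) is computed for each time frame t and frequency bin f as $\gamma_{t,f} = \frac{\lambda_{\max}(\boldsymbol{\Gamma}_{t,f}) - 1}{M - 1}$, where $\boldsymbol{\Gamma}_{t,f}$ is the whitened covariance matrix normalized by its diagonal (so that all its diagonal entries equal 1), $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue, and M is the number of microphones. GMSC values lie in [0, 1]: because $\boldsymbol{\Gamma}_{t,f}$ has trace M, its largest eigenvalue ranges from 1 (spatially white, incoherent signals, GMSC = 0) to M (a single fully coherent rank‑1 component, GMSC = 1).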
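The GMSC formula above can be sketched per frequency bin with NumPy. The function name `gmsc` and the (F, M, M) array layout are assumptions made for illustration; the sketch also assumes the input matrices are Hermitian with strictly positive diagonal entries, and it does not reproduce the paper's covariance estimation or whitening.

```python
import numpy as np

def gmsc(phi):
    """Generalized magnitude-squared coherence for a stack of covariance matrices.

    phi: complex array of shape (F, M, M), one Hermitian covariance matrix per
         frequency bin (assumed already whitened, with positive diagonal).
    Returns a real array of shape (F,) with values in [0, 1].
    """
    num_mics = phi.shape[-1]
    # Normalize by the diagonal: Gamma = D^{-1/2} Phi D^{-1/2} has unit diagonal.
    d = np.sqrt(np.real(np.diagonal(phi, axis1=-2, axis2=-1)))  # shape (F, M)
    gamma = phi / (d[..., :, None] * d[..., None, :])
    # eigvalsh returns eigenvalues in ascending order, so the last one is lambda_max.
    lam_max = np.linalg.eigvalsh(gamma)[..., -1]
    # trace(Gamma) = M, so lambda_max lies in [1, M]; map it linearly to [0, 1].
    return (lam_max - 1.0) / (num_mics - 1)


# Usage: white noise only, a single coherent source, and that source plus white noise.
M = 4
a = np.exp(1j * np.array([0.0, 0.7, 1.9, 2.5]))  # hypothetical unit-modulus ATF
white = np.eye(M, dtype=complex)
coherent = np.outer(a, a.conj())
mixed = coherent + 0.1 * np.eye(M)
scores = gmsc(np.stack([white, coherent, mixed]))
print(np.round(scores, 3))
```

The three cases land at 0, 1, and 10/11 respectively: adding white noise pulls the score below 1, which is what makes the GMSC a graded coherence feature rather than a binary rank test.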