Frequency domain TRINICON-based blind source separation method with multi-source activity detection for sparsely mixed signals
The TRINICON (‘Triple-N ICA for convolutive mixtures’) framework is an effective blind signal separation (BSS) method for separating sound sources from convolutive mixtures. It makes full use of the non-whiteness, non-stationarity and non-Gaussianity properties of the source signals and can be implemented either in the time domain or in the frequency domain, avoiding the notorious internal permutation problem. It usually performs best when the sources are continuously mixed. In this paper, the offline dual-channel frequency-domain TRINICON implementation for sparsely mixed signals is investigated, and a multi-source activity detection scheme is proposed to locate the active period of each source, based on which the filter updating strategy is regularized to improve the separation performance. The objective metric provided by the BSSEVAL toolkit is used to evaluate the performance of the proposed scheme.
💡 Research Summary
The paper investigates a blind source separation (BSS) technique based on the TRINICON framework (Triple‑N ICA for convolutive mixtures) when the observed mixtures are sparsely mixed, i.e., the sources are not continuously active. TRINICON is known for exploiting three statistical properties of source signals—non‑whiteness, non‑stationarity, and non‑Gaussianity—and can be implemented either in the time domain or the frequency domain. The frequency‑domain implementation avoids the internal permutation problem that plagues many frequency‑domain BSS methods, but its performance degrades when the sources are only intermittently present because the adaptive filters are updated on every time‑frequency bin, including those where a source is silent.
To address this limitation, the authors propose a Multi‑Source Activity Detection (MSAD) module that identifies the active periods of each source directly from the short‑time Fourier transform (STFT) representation of the mixtures. The MSAD operates by monitoring the energy of each frequency bin and applying a dynamically adjusted threshold that reflects the non‑stationary nature of speech. When the energy exceeds the threshold, the corresponding source is marked as active; otherwise, it is considered silent. The result is a binary activity mask \(A_k(t,f)\) for source \(k\) at time frame \(t\) and frequency bin \(f\).
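The thresholding idea can be illustrated with a minimal sketch. It is not the authors' detector: the recursive-average threshold, the function name `activity_mask`, and the parameters `alpha` and `margin` are all assumptions made for illustration, and the sketch works on a single power spectrogram rather than attributing activity to individual sources.

```python
import numpy as np

def activity_mask(power, alpha=0.95, margin=2.0):
    """Binary activity mask from a power spectrogram of shape (frames, bins).

    Assumed rule (hypothetical, not taken from the paper): a bin is active
    when its energy exceeds `margin` times a recursive average of the
    energy seen in earlier frames of the same bin.
    """
    power = np.asarray(power, dtype=float)
    mask = np.zeros(power.shape, dtype=bool)
    avg = power[0].copy()  # running energy estimate, one value per bin
    for t in range(power.shape[0]):
        # Compare the current frame against the threshold built from the past.
        mask[t] = power[t] > margin * avg
        # Update the running estimate so the threshold tracks slow changes.
        avg = alpha * avg + (1.0 - alpha) * power[t]
    return mask

# Toy input: ten quiet frames followed by one loud frame, four bins each.
power = np.ones((11, 4))
power[10] = 100.0
mask = activity_mask(power)
```

In this toy input only the last frame is flagged as active. A real detector would also need a rule for deciding which source each active bin belongs to, which this sketch leaves out.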
The core contribution lies in regularizing the filter‑update rule of TRINICON with the activity mask. In the standard frequency‑domain TRINICON, the natural‑gradient update of the demixing filters \(\mathbf{w}\) is driven by the gradient of a cost function that combines three terms: a non‑whiteness term (autocorrelation), a non‑stationarity term (temporal variation), and a non‑Gaussianity term (kurtosis). The proposed modification multiplies this gradient by the activity mask, so that time‑frequency bins in which a source is marked silent contribute nothing to the update of its filter.
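The effect of multiplying the gradient by a binary activity mask can be sketched as below. This is an illustration under stated assumptions, not the paper's update equation: the per-frame gradient array `grad`, the averaging over active frames, the step size `step`, and the function name `masked_update` are all hypothetical.

```python
import numpy as np

def masked_update(w, grad, mask, step=0.1):
    """One gradient step in which silent bins contribute nothing.

    w    : (bins,) filter coefficients for one source (assumed layout)
    grad : (frames, bins) per-frame gradient contributions (assumed given)
    mask : (frames, bins) binary activity mask for the same source
    """
    mask = np.asarray(mask, dtype=float)
    # Zero out the gradient wherever the source is marked silent.
    masked = mask * grad
    # Average over active frames only; bins with no active frame get a
    # zero update (the max(., 1) guard avoids division by zero).
    active = np.maximum(mask.sum(axis=0), 1.0)
    return w - step * masked.sum(axis=0) / active

# Toy example: bin 0 is active in two of three frames, bin 1 is never active.
w = np.zeros(2)
grad = np.ones((3, 2))
mask = np.array([[1, 0], [1, 0], [0, 0]])
w_new = masked_update(w, grad, mask)
```

Here the coefficient for bin 0 moves by one step against the gradient, while the coefficient for bin 1 is left unchanged because the source was never active there.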