Non-negative matrix factorization-based subband decomposition for acoustic source localization

February 23, 2026


📝 Abstract

A novel non-negative matrix factorization (NMF) based subband decomposition in the frequency-spatial domain for acoustic source localization using a microphone array is introduced. The proposed method decomposes the source and noise subbands and emphasises source-dominant frequency bins for a more accurate source representation. By employing NMF, delay basis vectors and their subband information in the frequency-spatial domain are extracted for each frame. The proposed algorithm is evaluated in both simulated and real noise with a speech corpus database. Experimental results indicate that the algorithm is more accurate than conventional algorithms in both reverberant and noisy acoustic environments.


📄 Content

Electronics Letters, 22nd October 2015, Vol. 51, No. 22, pp. 1723-1724

Non-negative matrix factorization based subband decomposition for acoustic source localization

S. Shon, S. Mun, D. Han and H. Ko✉

A novel Non-negative Matrix Factorization (NMF) based subband decomposition in the frequency-spatial domain for acoustic source localization using a microphone array is introduced. The proposed method decomposes the source and noise subbands and emphasizes source-dominant frequency bins for a more accurate source representation. By employing NMF, we extract Delay Basis Vectors (DBV) and their subband information in the frequency-spatial domain for each frame. The proposed algorithm is evaluated in both simulated and real noise with a speech corpus database. Experimental results indicate that the algorithm is more accurate than conventional algorithms in both reverberant and noisy acoustic environments.

Introduction: Acoustic source localization has been an active research area and has become an important topic in acoustic-based applications across a variety of fields. Time Delay Estimation (TDE) between two or more microphone signals can be used as a means of source localization. Generalized Cross-Correlation (GCC) is the most commonly used TDE approach.
In this paper, we propose a decomposition of signal and noise subbands based on Non-negative Matrix Factorization (NMF) and GCC. Using the decomposed signal subband information, the source-dominant frequency bins can be emphasized by spectral weighting. A TDE algorithm based on the proposed subband decomposition outperforms conventional GCC algorithms, other TDE algorithms such as Adaptive Eigenvalue Decomposition (AED) [1], and other spectral weighting methods such as Cross-Power Spectrum (CPS) [2] and Local-Peak-Weighted (LPW) [3] in reverberant and noisy environments. The proposed approach is conceptually similar to the Multiple Signal Classification (MUSIC) algorithm [4]. MUSIC decomposes the cross-correlation matrix of the multichannel signals into signal and noise subspaces using eigenvalue decomposition. It was originally developed as a direction-of-arrival (DOA) estimation technique for narrowband signals, and there are many variants. Although some subspace techniques, such as variants of MUSIC, are applicable to wideband signals, in theory they cannot be used to localize coherent sources, as arise in acoustic environments with reverberation.

Proposed subband decomposition: Let the mth-channel microphone input signal be $x_m(t)$ and its Short-Time Fourier Transform (STFT) be $X_m(\omega)$. The GCC with PHAse Transform (PHAT) of the lth and the qth microphone signals is

$$R_{lq}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi_{lq}(\omega)\, X_l(\omega)\, X_q^{*}(\omega)\, e^{j\omega\tau}\, d\omega \qquad (1)$$

where $\Psi_{lq}(\omega) = 1 / |X_l(\omega)\, X_q^{*}(\omega)|$ denotes the PHAT weight function.
Note that $\theta$ is the azimuth when the Time Delay Of Arrival between the lth and the qth microphones is $\tau$, with $\theta = \sin^{-1}(\tau\gamma / d)$, where $d$ is the distance between the lth and the qth microphones and $\gamma$ is the speed of sound. Since the STFT is defined for a discrete signal, the frequency $\omega$ takes discrete values, i.e., $\omega_k = 2\pi(k/N)$, where $N$ is the length of the frame and $k$ denotes the frequency bin index. Therefore, to calculate the GCC-PHAT for each frequency bin, (1) can be rewritten as
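The mapping between delay and azimuth can be sketched in a few lines, assuming the relation reads θ = arcsin(τγ/d) as the surrounding definitions suggest. The function names and the value 343 m/s for γ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for gamma (not given in the text)

def tdoa_to_azimuth(tau, d, gamma=SPEED_OF_SOUND):
    """Azimuth in radians for a delay tau (s) and microphone spacing d (m)."""
    # Clip so that small numerical overshoots do not produce NaN from arcsin.
    ratio = np.clip(tau * gamma / d, -1.0, 1.0)
    return np.arcsin(ratio)

def azimuth_to_tdoa(theta, d, gamma=SPEED_OF_SOUND):
    """Inverse mapping: delay in seconds for an azimuth theta (radians)."""
    return d * np.sin(theta) / gamma

def bin_frequencies(n_frame):
    """Discrete frequencies omega_k = 2*pi*k/N for k = 0 .. N-1."""
    return 2.0 * np.pi * np.arange(n_frame) / n_frame
```

The clipping step is a design choice for robustness: delays estimated from noisy data can slightly exceed the physical maximum d/γ.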

$$R_{lq}(\omega_k, \tau) = \frac{1}{K}\, \Psi_{lq}(\omega_k)\, X_l(\omega_k)\, X_q^{*}(\omega_k)\, e^{j\omega_k\tau} \qquad (2)$$

Using (2), we show some examples of source localization in a single frame. For clean signals, large amplitudes are clearly visible in the source direction in all frequency bins, as in Fig. 1 (a). However, when there is noise, the source signal is corrupted, as shown in Fig. 1 (b). For more accurate and robust source localization, we use NMF to decompose the source and noise subbands and accentuate the source-dominant frequency bins.
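A minimal NumPy sketch of the per-bin GCC-PHAT in (2) follows. It assumes a single pair of equal-length frames, candidate delays expressed in samples, and normalization by the number of one-sided frequency bins. The source does not define K, so that normalization, like the function name `gcc_phat_map` and the `eps` regularizer, is an assumption.

```python
import numpy as np

def gcc_phat_map(frame_l, frame_q, taus, eps=1e-12):
    """Per-bin GCC-PHAT map with shape (num_bins, num_delays).

    frame_l, frame_q: real-valued frames of equal length N.
    taus: candidate delays in samples (delay of channel l relative to q).
    """
    n = len(frame_l)
    X_l = np.fft.rfft(frame_l)
    X_q = np.fft.rfft(frame_q)
    cross = X_l * np.conj(X_q)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    phat = cross / (np.abs(cross) + eps)
    # omega_k = 2*pi*k/N for the one-sided spectrum.
    omega = 2.0 * np.pi * np.arange(len(cross)) / n
    steering = np.exp(1j * np.outer(omega, taus))
    return np.real(phat[:, None] * steering) / len(cross)
```

Each row of the returned map corresponds to one frequency bin. Summing over rows gives a conventional full-band GCC-PHAT, whose peak over `taus` is the delay estimate.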

Fig. 1 Example of GCC-PHAT amplitude by frequency and the proposed subband decomposition when the source is at 30°.
a GCC-PHAT when noise is absent: clear large amplitude in the source direction. Spatial aliasing cannot be avoided when d is larger than λ/2 = γ/(2f), where λ is the wavelength of the signal frequency f
b GCC-PHAT when SNR = -5 dB: the source signal is corrupted
c DBV W: the columns of W can be interpreted as DBV
d Subband information H: the rows of H are spectral weights corresponding to each DBV
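The aliasing condition in the caption (d larger than λ/2 = γ/(2f)) can be turned around to give the highest frequency that a given spacing handles without spatial aliasing. The snippet below is a small worked example under the assumption γ = 343 m/s. The function name and the 0.1 m spacing are hypothetical.

```python
def max_alias_free_frequency(d, gamma=343.0):
    """Highest frequency (Hz) with d <= lambda/2, i.e. f <= gamma / (2 d)."""
    return gamma / (2.0 * d)

# Hypothetical spacing of 0.1 m between the two microphones.
f_max = max_alias_free_frequency(0.1)
```

For a 0.1 m spacing this gives roughly 1.7 kHz, so bins above that frequency show the repeated lobes mentioned in panel a.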

NMF is a matrix factorization algorithm that decomposes a non-negative matrix V into a product of a non-negative basis matrix W and a non-negative gains matrix H as follows

$$V \approx WH \qquad (3)$$

where $V \in \mathbb{R}_{+}^{A \times B}$, $W \in \mathbb{R}_{+}^{A \times C}$, $H \in \mathbb{R}_{+}^{C \times B}$ and $C < A, B$. For the factorization, Lee's approach [5] was adopted in our method. The most common usage of NMF is decomposing a spectrogram into spectra
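The factorization in (3) can be illustrated with the widely used multiplicative update rules for the Euclidean cost. The text only says that Lee's approach [5] was adopted, so the choice of cost function, the random initialization, and all names below are assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(V, C, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative A x B matrix V into W (A x C) and H (C x B).

    Uses multiplicative updates for the squared Euclidean error ||V - WH||^2.
    """
    rng = np.random.default_rng(seed)
    A, B = V.shape
    # Strictly positive random start; the updates keep entries non-negative.
    W = rng.random((A, C)) + eps
    H = rng.random((C, B)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In the setting described above, V would hold a non-negative frequency-spatial map for one frame, so that the columns of W play the role of DBVs and the rows of H act as spectral weights. How exactly the paper builds V is not stated in this excerpt.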
