Speech Enhancement Based on Non-stationary Noise-driven Geometric Spectral Subtraction and Phase Spectrum Compensation


In this paper, a speech enhancement method based on noise compensation performed on the short-time magnitude as well as phase spectra is presented. Unlike the conventional geometric approach (GA) to spectral subtraction (SS), the noise estimate to be subtracted from the noisy speech spectrum is determined by exploiting the low-frequency regions of the current frame of noisy speech rather than depending only on the initial silence frames. This gives the method the capability of tracking non-stationary noise, resulting in a non-stationary noise-driven geometric approach to spectral subtraction for speech enhancement. The noise-compensated magnitude spectrum from the GA step is then recombined with the unchanged phase of the noisy speech spectrum and used in phase compensation to obtain an enhanced complex spectrum, which is used to produce an enhanced speech frame. Extensive simulations carried out using speech files from the NOIZEUS database show that the proposed method consistently outperforms some recent speech enhancement methods on noisy speech corrupted by street or babble noise at different SNR levels, in terms of objective measures, spectrogram analysis, and formal subjective listening tests.

💡 Research Summary

The paper introduces a novel speech‑enhancement algorithm that jointly operates on the magnitude and phase spectra to tackle non‑stationary noise. Traditional geometric‑approach (GA) spectral subtraction relies on a noise estimate derived from initial silence frames, which becomes ineffective when the noise statistics change over time. To overcome this, the authors propose a dynamic noise‑estimation scheme that exploits the low‑frequency band (approximately 0–300 Hz) of the current noisy frame. Because human speech contributes little energy in this band, the measured power there is dominated by noise, allowing an accurate, frame‑wise estimate of the noise spectrum. This estimate is combined with the conventional initial‑silence estimate through a weighted average, yielding a non‑stationary‑noise‑driven noise model that can track rapid variations in street, babble, or other environmental noises.
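The frame-wise noise estimate described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `weight` parameter, and the choice to spread the low-band level as a flat value across all bins are assumptions, since the summary does not say how the band-limited measurement is extended to the full spectrum.

```python
def low_band_power(power_spectrum, sample_rate, n_fft, cutoff_hz=300.0):
    """Mean power of the one-sided bins at or below cutoff_hz.

    power_spectrum holds n_fft // 2 + 1 values, bin k centred at
    k * sample_rate / n_fft Hz.
    """
    bin_hz = sample_rate / n_fft
    last_bin = int(cutoff_hz // bin_hz)      # highest bin still inside the band
    band = power_spectrum[: last_bin + 1]    # always contains at least bin 0
    return sum(band) / len(band)


def blended_noise_estimate(frame_power, init_noise_power, sample_rate, n_fft,
                           weight=0.5, cutoff_hz=300.0):
    """Weighted average of the initial-silence estimate and the current
    frame's low-band level.

    ASSUMPTION: the low-band level is applied as a flat value to every bin;
    weight = 1.0 falls back to the initial-silence estimate alone.
    """
    level = low_band_power(frame_power, sample_rate, n_fft, cutoff_hz)
    return [weight * n0 + (1.0 - weight) * level for n0 in init_noise_power]
```

Because the low-band level is recomputed for every frame, the blended estimate follows changes in the noise floor that an estimate frozen at the initial silence frames would miss.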

With the updated noise spectrum \(\hat N(k,m)\) in hand, the GA step computes a corrected magnitude \(|\hat X(k,m)|\) using the geometric relationship between the noisy complex spectrum \(Y(k,m)\) and the noise estimate. The phase is left unchanged at this stage, and the corrected magnitude is recombined with the original noisy phase to form an intermediate complex spectrum \(Z(k,m)\).
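The recombination step can be illustrated with a short sketch. The GA gain itself is not given in this summary, so `subtract_magnitude` below is a stand-in that uses plain magnitude subtraction with a spectral floor; it is not the geometric gain. Only `recombine_with_noisy_phase` corresponds directly to the text: it attaches the noisy phase to a corrected magnitude. Both function names and the `floor` parameter are hypothetical.

```python
import cmath


def subtract_magnitude(noisy_spectrum, noise_mag, floor=0.01):
    """STAND-IN for the GA magnitude correction: conventional magnitude
    subtraction, floored at a fraction of the noisy magnitude."""
    return [max(abs(y) - n, floor * abs(y))
            for y, n in zip(noisy_spectrum, noise_mag)]


def recombine_with_noisy_phase(corrected_mag, noisy_spectrum):
    """Form the intermediate spectrum: corrected magnitude, noisy phase."""
    return [mag * cmath.exp(1j * cmath.phase(y))
            for mag, y in zip(corrected_mag, noisy_spectrum)]


# Usage on a two-bin toy frame
noisy = [3 + 4j, 1 - 1j]
noise = [1.0, 0.5]
intermediate = recombine_with_noisy_phase(subtract_magnitude(noisy, noise), noisy)
```

Each bin of `intermediate` keeps the angle of the corresponding noisy bin, so any phase error in the noisy spectrum carries over unchanged to this stage.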

The second stage, Phase Spectrum Compensation (PSC), addresses the well‑known fact that phase errors can severely degrade speech intelligibility, especially under low‑SNR conditions. The authors derive a compensation term based on the imaginary part of the product between \(Z(k,m)\) and the conjugate of the noise estimate, normalized by \(|Z(k,m)|^2\). A scaling factor \(\alpha\) (tuned empirically) controls the strength of the correction.
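As a rough illustration of the quantity described in words above, the sketch below computes a per-bin value equal to the scaling factor times the imaginary part of the intermediate spectrum multiplied by the conjugate of the noise estimate, divided by the squared magnitude of the intermediate spectrum. It follows the verbal description only: the function name, the `eps` guard against division by zero, and the default `alpha` are assumptions. How this term is then applied to the spectrum is not recoverable from this summary, so the sketch stops at the term itself.

```python
def compensation_term(z_spectrum, noise_estimate, alpha=1.0, eps=1e-12):
    """Per-bin value alpha * Im(Z * conj(N)) / |Z|^2.

    ASSUMPTION: eps is added to the denominator only to avoid division
    by zero in empty bins; it is not part of the description.
    """
    terms = []
    for z, n in zip(z_spectrum, noise_estimate):
        cross = z * complex(n).conjugate()   # Z(k, m) times conj(N(k, m))
        terms.append(alpha * cross.imag / (abs(z) ** 2 + eps))
    return terms
```

When the noise estimate is real-valued (a magnitude), the cross term reduces to the noise magnitude times the imaginary part of the intermediate spectrum.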
