Enhancement of Noisy Speech exploiting a Gaussian Modeling based Threshold and a PDF Dependent Thresholding Function
This paper presents a speech enhancement method in which an adaptive threshold is statistically determined based on Gaussian modeling of Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of noisy speech. To obtain enhanced speech, the threshold thus derived is applied to the PWP coefficients by employing a Gaussian pdf dependent custom thresholding function, which is designed based on a combination of modified hard and semisoft thresholding functions. The effectiveness of the proposed method is evaluated for car and multi-talker babble noise corrupted speech signals through extensive simulations using the NOIZEUS database. The proposed method is found to outperform some of the state-of-the-art speech enhancement methods at both high and low SNR levels in terms of standard objective measures and subjective evaluations, including formal listening tests.
💡 Research Summary
The paper proposes a novel speech‑enhancement framework that integrates two key innovations: (1) the application of the Teager‑Energy (TE) operator to perceptual wavelet‑packet (PWP) coefficients, and (2) a Gaussian‑model‑based adaptive threshold combined with a probability‑density‑function (PDF)‑dependent custom thresholding function.
First, the noisy speech signal is decomposed using a perceptual wavelet‑packet transform, which yields a set of sub‑band coefficients whose band layout approximates the frequency resolution of the human auditory system. The TE operator is then applied to the coefficient sequence of each sub‑band, emphasizing instantaneous energy fluctuations that are characteristic of speech while attenuating stationary background noise. This non‑linear preprocessing makes the statistical separation between speech and noise more pronounced.
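A minimal sketch of these two preprocessing steps in plain Python follows. It is a stand‑in, not the authors' implementation: the paper's perceptual tree (a non‑uniform decomposition) and its wavelet filters are not specified in this summary, so the sketch assumes a uniform Haar wavelet‑packet tree instead. The helpers `haar_split`, `wavelet_packet`, and `teager_energy` are hypothetical names, and replicating edge samples at sub‑band boundaries is an assumption. The Teager energy uses the standard discrete form Ψ[c(n)] = c(n)² − c(n−1)·c(n+1).

```python
import math

def haar_split(x):
    """One orthonormal Haar analysis step on an even-length sequence.
    Returns (approximation, detail), each half the input length."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [s * (x[2 * i] + x[2 * i + 1]) for i in range(half)]
    detail = [s * (x[2 * i] - x[2 * i + 1]) for i in range(half)]
    return approx, detail

def wavelet_packet(x, depth):
    """Uniform (full-tree) Haar wavelet packet decomposition, a hypothetical
    stand-in for the paper's perceptual tree. Returns 2**depth sub-bands in
    natural (Paley) order, not sorted by frequency. len(x) must be divisible
    by 2**depth."""
    nodes = [list(x)]
    for _ in range(depth):
        children = []
        for node in nodes:
            approx, detail = haar_split(node)
            children.extend([approx, detail])
        nodes = children
    return nodes

def teager_energy(c):
    """Discrete Teager energy psi[n] = c[n]^2 - c[n-1] * c[n+1] of one
    sub-band sequence; edge samples are replicated at the boundaries."""
    n = len(c)
    out = []
    for i in range(n):
        prev = c[i - 1] if i > 0 else c[0]
        nxt = c[i + 1] if i < n - 1 else c[-1]
        out.append(c[i] * c[i] - prev * nxt)
    return out

# Usage: decompose a short test signal, then apply TE to every sub-band.
signal = [math.sin(0.3 * n) for n in range(16)]
bands = wavelet_packet(signal, depth=2)        # 4 sub-bands of 4 coefficients
te_bands = [teager_energy(b) for b in bands]
```

For a sampled sinusoid A·cos(Ωn), the interior Teager values equal A²·sin²(Ω) exactly, which is why the operator tracks amplitude and frequency jointly rather than amplitude alone.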
Next, the TE‑processed coefficients in each sub‑band are modeled as samples drawn from a Gaussian distribution. By estimating the mean (μ) and standard deviation (σ) of this distribution, an adaptive threshold T is derived for each sub‑band as a simple function of μ and σ (typically T = μ + k·σ, where k is a constant tuned empirically). Because σ grows when noise dominates a sub‑band, the threshold automatically becomes larger, suppressing noise more aggressively; conversely, in speech‑dominant sub‑bands the threshold remains low, preserving speech detail.
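The per‑sub‑band threshold can be sketched as follows, assuming the form T = μ + k·σ quoted above. The function name `gaussian_threshold` and the value k = 1.5 are hypothetical; the summary states only that k is tuned empirically, and it does not say whether μ and σ are estimated from raw or magnitude TE values.

```python
import statistics

def gaussian_threshold(coeffs, k):
    """Fit a Gaussian to one sub-band (sample mean, population standard
    deviation) and return (mu, sigma, T) with T = mu + k * sigma."""
    mu = statistics.fmean(coeffs)
    sigma = statistics.pstdev(coeffs, mu)
    return mu, sigma, mu + k * sigma

# Usage: a low-variance band receives a lower threshold than a high-variance one.
K = 1.5  # hypothetical value; the paper tunes k empirically
low_var_band = [0.1, -0.2, 0.15, -0.05, 0.0, 0.1]
high_var_band = [1.2, -0.9, 1.5, -1.1, 0.8, -1.4]
T_low = gaussian_threshold(low_var_band, K)[2]
T_high = gaussian_threshold(high_var_band, K)[2]
```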
The core of the denoising stage is a custom thresholding function driven by the Gaussian cumulative distribution function (CDF) Φ. For a given coefficient x, the weight P = Φ((x − μ)/σ), i.e. the fraction of the fitted Gaussian lying below x, controls how strongly the coefficient is modified. The output y is computed as
y = (1 − P)·x + P·sign(x)·max(|x| − T, 0).
As P approaches 0, y approaches x and the coefficient passes through unchanged; as P approaches 1, y approaches sign(x)·max(|x| − T, 0), which zeroes coefficients with |x| ≤ T and shrinks larger ones by T. Intermediate values of P give a smooth transition between these two behaviors. This PDF‑dependent blending is intended to combine the noise rejection of hard thresholding with the smoothness of semi‑soft thresholding, yielding a more perceptually natural spectrum, reducing musical noise, and preserving speech intelligibility.
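The sketch below evaluates the thresholding expression exactly as written above, with the standard normal CDF computed from `math.erf` via Φ(z) = ½[1 + erf(z/√2)]. The function names are hypothetical, and μ, σ, and T are assumed to come from the per‑sub‑band Gaussian fit; it illustrates the stated formula, not the authors' code.

```python
import math

def gaussian_cdf(z):
    """Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pdf_dependent_threshold(x, mu, sigma, T):
    """Evaluate y = (1 - P) * x + P * sign(x) * max(|x| - T, 0),
    using the blending weight P = Phi((x - mu) / sigma)."""
    P = gaussian_cdf((x - mu) / sigma)
    shrunk = math.copysign(max(abs(x) - T, 0.0), x)  # soft-shrinkage term
    return (1.0 - P) * x + P * shrunk

# Usage with hypothetical sub-band statistics.
mu, sigma, T = 0.0, 1.0, 1.0
outputs = [pdf_dependent_threshold(x, mu, sigma, T) for x in (-2.0, 0.5, 2.0)]
```

Because P is computed from the signed value x, the expression as written treats a large negative coefficient (P ≈ 0, passed through) differently from a large positive one (P ≈ 1, shrunk); evaluating P on |x|, or on non‑negative TE values, would make it symmetric. The summary does not say which variant the paper uses.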
The authors evaluate the method on the NOIZEUS database, using clean utterances corrupted by two realistic noise types: car noise and multi‑talker babble. Experiments span SNRs from 0 dB to 15 dB. Objective metrics include segmental SNR (SNRseg), PESQ, and STOI, while subjective quality is assessed through ITU‑T P.835 listening tests with 20 participants. Results show consistent improvements over several state‑of‑the‑art baselines (spectral subtraction, Wiener filtering, MMSE‑STSA). At low SNRs (0–5 dB), PESQ scores increase by 0.3–0.5 points, STOI improves by more than 5 %, and MOS ratings for noise reduction and speech naturalness are higher by a statistically significant margin.
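Of the objective measures listed, segmental SNR is simple enough to sketch directly; PESQ and STOI have their own reference implementations and are not reproduced here. The summary does not give the exact SNRseg definition used, so the sketch assumes a common one: non‑overlapping frames, per‑frame SNR clamped to [−10, 35] dB, and silent reference frames skipped. The frame length of 256 samples and the function name are hypothetical choices.

```python
import math

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Frame-averaged SNR in dB over non-overlapping frames.
    Per-frame values are clamped to [lo, hi]; frame_len, the clamp range,
    and skipping silent frames are assumptions, not the paper's settings."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        sig = sum(v * v for v in s)
        err = sum((a - b) ** 2 for a, b in zip(s, e))
        if sig == 0.0:
            continue  # skip silent reference frames
        snr = hi if err == 0.0 else 10.0 * math.log10(sig / err)
        values.append(min(max(snr, lo), hi))
    return sum(values) / len(values) if values else float("nan")

# Usage: an "enhanced" signal at half the clean amplitude.
clean = [((n % 7) - 3) * 0.1 for n in range(512)]
scaled = [0.5 * v for v in clean]
score = segmental_snr(clean, scaled)  # 10*log10(4), about 6.02 dB
```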
From a computational standpoint, the PWP transform costs O(N log N), the TE operator O(N), and the Gaussian parameter estimation plus PDF‑dependent thresholding are linear in the number of coefficients. Since the overall cost is dominated by the transform, the algorithm should be light enough for real‑time processing on modern DSP platforms.
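The O(N log N) figure for the transform follows from a standard operation count, sketched here for a full wavelet‑packet tree of depth J built from analysis filters of length L (generic symbols, not values from the paper). A perceptual tree is a pruned version of the full tree, so its cost is bounded by the same expression.

```latex
% Level j has 2^{j-1} parent nodes of length N/2^{j-1}; each parent is split
% into two decimated children of length N/2^{j}, at L multiply-adds per output.
C_{\mathrm{WP}} = \sum_{j=1}^{J} 2^{j-1} \cdot 2 \cdot L \cdot \frac{N}{2^{j}}
               = \sum_{j=1}^{J} L N = L N J ,
\qquad J \le \log_2 N \;\Longrightarrow\; C_{\mathrm{WP}} = O(N \log N).
% Teager energy: two products and one subtraction per coefficient.
C_{\mathrm{TE}} = 3N = O(N).
```

When the tree depth is fixed in advance, as with a fixed perceptual tree, J is a constant and the transform cost grows only linearly in N.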
The paper also discusses limitations. The Gaussian assumption may not hold for highly non‑stationary or impulsive noises, potentially leading to sub‑optimal thresholds. The constant k governing the adaptive threshold must be tuned for each noise environment, suggesting a need for an automatic parameter‑selection scheme in future work.
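The sensitivity to non‑Gaussian noise can be illustrated with the μ + k·σ threshold form quoted earlier: because the sample mean and standard deviation are not robust to outliers, a single impulsive sample can lift the threshold above every ordinary coefficient in the band. The data, the value k = 1.5, and the helper `mu_plus_k_sigma` are hypothetical illustrations, not the authors' estimator.

```python
import statistics

def mu_plus_k_sigma(band, k):
    """Threshold T = mu + k * sigma from the sample mean and the
    population standard deviation (not robust to outliers)."""
    mu = statistics.fmean(band)
    return mu + k * statistics.pstdev(band, mu)

K = 1.5  # hypothetical tuning constant
# 32 small, roughly symmetric coefficients standing in for one sub-band.
base = [0.2, -0.1, 0.15, -0.2, 0.1, -0.15, 0.05, -0.05] * 4
# Replace the last sample with a single impulsive outlier.
with_impulse = base[:-1] + [8.0]

T_base = mu_plus_k_sigma(base, K)
T_impulse = mu_plus_k_sigma(with_impulse, K)  # roughly ten times larger
```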
In summary, by coupling TE‑enhanced PWP coefficients with a statistically grounded, PDF‑aware thresholding strategy, the proposed method achieves superior noise suppression and speech preservation across a wide SNR range, outperforming the compared baseline techniques in both objective and subjective evaluations. The approach offers a promising direction for robust speech enhancement in real‑world applications such as mobile communications, hearing aids, and voice‑controlled interfaces.