Modeling of Teager Energy Operated Perceptual Wavelet Packet Coefficients with an Erlang-2 PDF for Real Time Enhancement of Noisy Speech
This paper presents a threshold determination method for real-time enhancement of noisy speech, based on modeling the Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of the noisy speech and noise by an Erlang-2 PDF. The proposed method is computationally much faster than existing wavelet packet based thresholding methods. A custom thresholding function that combines mu-law and semisoft thresholding functions is designed to apply the statistically derived threshold to the PWP coefficients. It acts as a mu-law thresholding function, a semisoft thresholding function, or a combination of the two, depending on the probability of speech presence and absence in a subband of the PWP transformed noisy speech. Using the speech files available in the NOIZEUS database, a number of simulations are performed to evaluate the proposed method on speech signals corrupted by white Gaussian and street noise. The proposed method outperforms some state-of-the-art speech enhancement methods at both high and low SNR levels in terms of standard objective measures and subjective evaluations, including formal listening tests.
💡 Research Summary
The paper proposes a real‑time speech enhancement algorithm that combines three key ideas: (1) the application of the Teager Energy (TE) operator to perceptual wavelet packet (PWP) coefficients, (2) statistical modeling of the resulting TE‑PWP coefficients with an Erlang‑2 probability density function (PDF), and (3) a custom thresholding function that adaptively blends mu‑law compression and semisoft shrinkage based on the estimated probability of speech presence in each subband.
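The blending idea in point (3) can be illustrated with a minimal sketch. This is not the authors' thresholding function, whose exact form is not given here. It assumes a standard semisoft (firm) shrinkage rule, a mu-law-style attenuation below the threshold, and a simple convex combination weighted by a speech-presence probability. The function names, the choice of 2λ as the upper semisoft threshold, the value μ = 255, and the direction of the weighting are all assumptions made for illustration.

```python
import numpy as np

def semisoft(x, lam1, lam2):
    """Semisoft (firm) shrinkage with lower threshold lam1 < upper threshold lam2."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    # Linear ramp between the two thresholds, continuous at both ends.
    mid = np.sign(x) * lam2 * (a - lam1) / (lam2 - lam1)
    return np.where(a <= lam1, 0.0, np.where(a <= lam2, mid, x))

def mu_law_shrink(x, lam, mu=255.0):
    """Mu-law-style attenuation below lam; coefficients above lam pass unchanged.

    Assumed form, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    r = np.minimum(a / lam, 1.0)  # clip so the power below cannot overflow
    low = np.sign(x) * lam * ((1.0 + mu) ** r - 1.0) / mu
    return np.where(a <= lam, low, x)

def blended_threshold(x, lam, p_speech, mu=255.0):
    """Hypothetical blend: weight mu-law by p_speech and semisoft by 1 - p_speech."""
    p = float(np.clip(p_speech, 0.0, 1.0))
    return p * mu_law_shrink(x, lam, mu) + (1.0 - p) * semisoft(x, lam, 2.0 * lam)
```

With `p_speech = 0` the sketch reduces to pure semisoft shrinkage, which zeroes small coefficients. With `p_speech = 1` it reduces to the mu-law-style rule, which attenuates small coefficients without discarding them.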
Signal processing pipeline
The noisy speech signal is first decomposed using a perceptual wavelet packet transform, yielding a set of subband coefficients that reflect the human auditory system's frequency resolution. The TE operator is then applied to the coefficients of each subband; it emphasizes instantaneous energy fluctuations and makes speech transients more distinguishable from background noise. Because the TE‑operated coefficients are predominantly nonnegative, each subband's coefficient distribution is highly skewed with a long tail. The authors demonstrate that an Erlang‑2 distribution (a special case of the Gamma distribution with shape parameter k = 2) fits these coefficients well. Only the scale parameter θ needs to be estimated per subband; since the Erlang‑2 mean is 2θ, θ follows analytically from the sample mean, which makes the modeling step computationally cheap.
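The two computational steps in this paragraph can be sketched as follows. The sketch assumes the standard discrete Teager energy operator, ψ[n] = x[n]² − x[n−1]·x[n+1], and a moment-based estimate of θ from the sample mean. The PWP decomposition itself is not shown: the input is assumed to be a 1-D array holding the coefficients of one subband. The function names and the edge handling are illustrative choices, not taken from the paper.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1] (needs len(x) >= 3)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    # Copy the nearest interior value to the two edges (one possible boundary choice).
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi

def erlang2_scale(te_coeffs):
    """Moment estimate of theta: the Erlang-2 mean equals 2 * theta."""
    return float(np.mean(te_coeffs)) / 2.0

def erlang2_pdf(t, theta):
    """Erlang-2 density f(t) = t * exp(-t / theta) / theta**2 for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, t * np.exp(-t / theta) / theta ** 2, 0.0)
```

In use, one would compute `teager_energy` on each subband's coefficients, estimate `theta` with `erlang2_scale`, and compare a histogram of the TE values against `erlang2_pdf` to check the fit.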
Derivation of the adaptive threshold
From the Erlang‑2 model, the likelihood ratio of speech versus noise is obtained in closed form, leading to a simple expression for the optimal threshold λ.