The Combination of Several Decorrelation Methods to Improve Acoustic Feedback Cancellation

Notice: This research summary and analysis were automatically generated using AI technology. For exact details, refer to the original arXiv paper.

This paper extends an acoustic feedback cancellation system by incorporating multiple decorrelation methods. The baseline system is based on a frequency-domain Kalman filter implemented in a multi-delay structure. The proposed extensions include a variable time delay line, prediction, distortion compensation, and a simplified reverberation model. Each extension is analyzed, and a practical parameter range is defined. While existing literature often focuses on a single extension, such as prediction, to describe an optimal system, this work demonstrates that each individual extension contributes to performance improvements. Furthermore, the combination of all proposed extensions results in a superior system. The evaluation is conducted using publicly available datasets, with performance assessed through system distance metrics and the objective speech quality measure PESQ.


💡 Research Summary

The paper presents an enhanced acoustic feedback cancellation (AFC) system that builds upon a frequency‑domain Kalman filter embedded in a multi‑delay (MD‑FLMS) structure. The baseline system partitions a long room impulse response (RIR) into four 256‑sample blocks (N = 512 with 50 % overlap) and processes each block in the frequency domain using FFT/IFFT, allowing real‑time operation at a 16 kHz sampling rate. While the Kalman filter provides robustness against time‑varying room acoustics, two fundamental issues remain: bias in the estimated filter coefficients and slow convergence when the input signal is colored.
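The partition figures above can be checked with a short sketch. It is illustrative only and not the authors' implementation: it works in the time domain with toy sizes, whereas the system described here evaluates each partition with a 512-point FFT. All function and variable names are assumptions introduced for this example.

```python
import random

def convolve(x, h):
    """Direct linear convolution, truncated to len(x) output samples."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n + 1, len(h))):
            acc += h[k] * x[n - k]
        y[n] = acc
    return y

def partitioned_convolve(x, h, block):
    """Multi-delay view: split h into partitions of length `block`,
    filter x with each partition, delay the result by the partition
    offset, and sum. (A frequency-domain system would compute each
    partial product with an FFT instead.)"""
    y = [0.0] * len(x)
    for p in range(0, len(h), block):
        partial = convolve(x, h[p:p + block])
        for n in range(p, len(x)):
            y[n] += partial[n - p]  # delay by p samples
    return y

# Figures quoted in the summary.
FS = 16000                 # sampling rate in Hz
BLOCK = 256                # partition length / block shift in samples
PARTITIONS = 4
FFT_SIZE = 2 * BLOCK       # N = 512 with 50 % overlap

filter_len = BLOCK * PARTITIONS              # modelled taps
block_delay_ms = 1000.0 * BLOCK / FS         # delay of one block
filter_span_ms = 1000.0 * filter_len / FS    # modelled response length

# Toy-sized check that partitioning does not change the filter output.
rng = random.Random(0)
h_demo = [rng.uniform(-1.0, 1.0) for _ in range(16)]
x_demo = [rng.uniform(-1.0, 1.0) for _ in range(64)]
y_full = convolve(x_demo, h_demo)
y_part = partitioned_convolve(x_demo, h_demo, block=4)
max_err = max(abs(a - b) for a, b in zip(y_full, y_part))

print(filter_len, block_delay_ms, filter_span_ms, max_err)
```

With these numbers the four partitions cover 1024 taps (64 ms at 16 kHz), and one block corresponds to the 16 ms delay mentioned for the fixed time delay.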

To address these issues, the authors introduce four decorrelation extensions, each analyzed theoretically and validated experimentally:

  1. Fixed time delay – Inherent to the MD‑FLMS architecture, a 256‑sample (≈16 ms) delay reduces the autocorrelation of the loudspeaker signal x, thereby partially mitigating bias without additional hardware.

  2. Variable time‑delay (vibrato) line – Implemented in the time domain as a tapped‑delay line whose tap index is modulated by a low‑frequency sinusoid (max delay ±2 ms, modulation frequency 1–2 Hz). This creates a controlled frequency shift/phase‑modulation effect that further decorrelates x and the microphone signal y. Experiments show that, at a high loop gain of 30 dB, the system becomes unstable after ~6 s with only the fixed delay, but adding a 1 Hz vibrato restores convergence, while a 2 Hz vibrato improves convergence speed at the cost of a modest MOS reduction.

  3. Non‑linear distortion compensation – Four nonlinear functions are examined: half‑wave rectification, signed‑square, a mixture of the two, and a smoothed half‑wave rectifier whose knee parameter c adapts to the signal variance σ²ₓ. Total harmonic distortion (THD) is measured on a 400 Hz sine wave; the smoothed rectifier yields a nearly constant THD across amplitudes, while the mixed function produces higher THD at low amplitudes. By mixing the distorted signal with the clean input using a scaling factor α (set to achieve target THD of 5 % or 10 %) and a power‑preserving factor sc, the authors obtain a controllable amount of distortion that reduces cross‑correlation without audibly degrading speech.

  4. Simplified reverberation model – Instead of a full convolutional reverberation filter, the RIR is approximated by 2–3 delay taps incorporated into the Kalman state transition matrix. This drastically reduces computational load while still providing enough decorrelation to speed up convergence.
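Two of the extensions listed above, the vibrato delay line and the distortion mixing, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the delay is centred at 2 ms so that a ±2 ms sweep stays causal, linear interpolation is used for fractional delays, a plain half-wave rectifier stands in for the nonlinear functions, the mixing form sc · (x + α · f(x)) is assumed, and α is found by bisection. All names are hypothetical.

```python
import math

def vibrato(x, fs, max_dev_ms=2.0, mod_hz=1.0):
    """Delay line whose delay is swept sinusoidally (assumed form)."""
    depth = max_dev_ms * 1e-3 * fs  # peak deviation in samples
    y = []
    for n in range(len(x)):
        # Delay moves within [0, 2 * depth] samples, so it is never negative.
        delay = depth * (1.0 + math.sin(2.0 * math.pi * mod_hz * n / fs))
        pos = n - delay
        i = math.floor(pos)
        frac = pos - i
        a = x[i] if 0 <= i < len(x) else 0.0
        b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        y.append((1.0 - frac) * a + frac * b)  # linear interpolation
    return y

def half_wave(v):
    return v if v > 0.0 else 0.0

def mix_distortion(x, alpha):
    """sc * (x + alpha * f(x)); sc restores the input power (assumed form)."""
    mixed = [v + alpha * half_wave(v) for v in x]
    p_in = sum(v * v for v in x)
    p_mix = sum(v * v for v in mixed)
    sc = math.sqrt(p_in / p_mix) if p_mix > 0.0 else 1.0
    return [sc * v for v in mixed]

def tone_amplitude(x, fs, freq):
    """Amplitude of one sinusoidal component via a single-bin DFT."""
    re = sum(v * math.cos(2.0 * math.pi * freq * n / fs) for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq * n / fs) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

def thd(x, fs, f0):
    """Harmonic amplitude (RMS sum, below Nyquist) over the fundamental."""
    fund = tone_amplitude(x, fs, f0)
    harm = 0.0
    k = 2
    while k * f0 < fs / 2:
        harm += tone_amplitude(x, fs, k * f0) ** 2
        k += 1
    return math.sqrt(harm) / fund

def alpha_for_thd(target, sine, fs, f0, lo=0.0, hi=2.0):
    """Bisection on alpha; assumes THD grows with alpha on [lo, hi]."""
    for _ in range(30):
        mid = 0.5 * (lo + hi)
        if thd(mix_distortion(sine, mid), fs, f0) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

FS = 16000
F0 = 400.0  # test tone from the summary
sine = [math.sin(2.0 * math.pi * F0 * n / FS) for n in range(400)]  # 10 periods

alpha_5 = alpha_for_thd(0.05, sine, FS, F0)
thd_5 = thd(mix_distortion(sine, alpha_5), FS, F0)
print(alpha_5, thd_5)
```

The smoothed rectifier with the variance-dependent knee parameter c is not reproduced here because the summary does not give its formula; substituting it for `half_wave` would leave the rest of the sketch unchanged.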

Performance is quantified with two metrics: (i) system distance sd(l) = ‖h – ĥₗ‖₂, evaluated early (sd₅, average over 4–6 s) and late (sd₂₀⁺, after 20 s), and (ii) PESQ, an objective speech quality measure mapped to MOS. The loop gain g is varied (0, 6, 12, 30 dB); for high gains a linear ramp (0–g over 0–10 s) prevents abrupt instability. Results show that each extension alone reduces sd₅ and sd₂₀⁺ by roughly 15–25 % and improves PESQ by 0.2–0.4 points. When all four extensions are combined, sd₅ drops by ~30 %, sd₂₀⁺ by >40 %, and PESQ gains 0.4–0.6 points. Notably, the combined system remains stable at 30 dB gain, where the baseline would fail.
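As an illustration of the system distance metric, the sketch below computes sd(l) per frame and averages it over the early (4–6 s) and late (after 20 s) windows. The frame shift of 256 samples, the synthetic impulse response, and the exponential convergence model are assumptions made only for this example; they are not results from the paper.

```python
import math

def system_distance(h, h_hat):
    """sd = ||h - h_hat||_2, zero-padding the shorter vector."""
    n = max(len(h), len(h_hat))
    a = list(h) + [0.0] * (n - len(h))
    b = list(h_hat) + [0.0] * (n - len(h_hat))
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def window_mean(values, times, t_start, t_end=None):
    """Mean of values whose time stamp lies in [t_start, t_end]."""
    sel = [v for v, t in zip(values, times)
           if t >= t_start and (t_end is None or t <= t_end)]
    return sum(sel) / len(sel)

FS = 16000
FRAME_SHIFT = 256  # assumed: one estimate per 256-sample block

# Synthetic "true" response and a made-up convergence curve.
h_true = [0.5, -0.3, 0.2, 0.1, -0.05, 0.02, 0.01, -0.01]
times = [l * FRAME_SHIFT / FS for l in range(int(25.0 * FS / FRAME_SHIFT))]
sd = []
for t in times:
    progress = 1.0 - math.exp(-t / 5.0)  # hypothetical adaptation model
    h_hat = [progress * v for v in h_true]
    sd.append(system_distance(h_true, h_hat))

sd_early = window_mean(sd, times, 4.0, 6.0)   # corresponds to sd5
sd_late = window_mean(sd, times, 20.0)        # corresponds to sd20+
print(sd_early, sd_late)
```

In practice the per-frame estimates would come from the adaptive filter itself; only the two averaging windows are taken from the text.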

The authors also report overflow statistics (clipping > +6 dB) only for the challenging 30 dB case; overflow percentages remain negligible for lower gains, confirming that the added processing does not introduce harmful amplitude excursions.

In summary, the paper demonstrates that (a) a multi‑delay Kalman‑filter AFC framework can be substantially improved by modest, low‑latency decorrelation techniques, (b) each technique contributes measurable bias reduction or convergence acceleration, and (c) their combination yields a robust AFC solution suitable for demanding applications such as in‑car hands‑free communication. The work provides concrete parameter ranges (maximum vibrato delay 2 ms, modulation frequency 1–2 Hz, THD 5–10 %) and validates them on publicly available speech corpora, offering a practical blueprint for engineers seeking to implement high‑gain, low‑latency acoustic feedback cancellers. Future directions suggested include adaptive control of the nonlinear distortion level and extension to multi‑channel (stereo or microphone‑array) scenarios.

