Differentiable Grouped Feedback Delay Networks for Learning Coupled Volume Acoustics

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Rendering dynamic reverberation for moving sources and listeners in complex acoustic spaces is challenging but crucial for user immersion in extended-reality (XR) applications. Capturing spatially varying room impulse responses (RIRs) is costly and often impractical. Moreover, dynamic convolution with measured RIRs is computationally expensive and memory-intensive, typically beyond the capabilities of wearable computing devices. Grouped Feedback Delay Networks (GFDNs), on the other hand, allow efficient rendering of coupled-room acoustics. However, their parameters must be tuned to match the reverberation profile of a coupled space. In this work, we propose the concept of Differentiable GFDNs (DiffGFDNs), whose tunable parameters are optimised to match the late reverberation profile of a set of RIRs captured in a space that exhibits multi-slope decay. Once trained on a finite set of measurements, the DiffGFDN interpolates to unmeasured locations in the space. We propose a parallel processing pipeline in which multiple DiffGFDNs with frequency-independent parameters each process one octave band. The DiffGFDN parameters can be updated rapidly during inference as sources and listeners move. We evaluate the proposed architecture against the Common Slopes (CS) model on a dataset of RIRs from three coupled rooms. The proposed architecture generates multi-slope late reverberation with low memory and computational requirements, achieving lower energy decay relief (EDR) error and only slightly higher octave-band energy decay curve (EDC) errors than the CS model. Furthermore, the DiffGFDN requires an order of magnitude fewer floating-point operations per sample than the CS renderer.


💡 Research Summary

The paper addresses the challenge of rendering dynamic reverberation for moving sources and listeners in complex acoustic environments, a critical requirement for immersive extended‑reality (XR) applications. Traditional approaches that rely on measured room impulse responses (RIRs) suffer from prohibitive memory consumption and computational cost, especially on wearable devices. While feedback delay networks (FDNs) provide a low‑complexity artificial reverberator, they cannot capture the multi‑slope decay patterns typical of coupled rooms with non‑uniform absorption.

To overcome these limitations, the authors extend their previously introduced Grouped Feedback Delay Network (GFDN) by making it differentiable, yielding the DiffGFDN architecture. A GFDN consists of several groups of delay lines, each group associated with its own absorption filter (i.e., decay time). The groups can be interpreted as individual sub‑rooms or wall sections with distinct acoustic properties. Inter‑group coupling is controlled by a unitary feedback matrix; a block‑diagonal matrix represents weakly coupled or decoupled sub‑spaces, while a dense matrix would model strong acoustic coupling.
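To make the feedback-matrix structure concrete, the following NumPy sketch builds a block-diagonal unitary feedback matrix for two decoupled groups of delay lines. The construction (random orthogonal blocks via QR) is a generic illustration of the structure described above, not the paper's specific parameterisation; a matrix with non-zero off-diagonal blocks would model stronger inter-room coupling.

```python
import numpy as np

def random_orthogonal(n: int, rng: np.random.Generator) -> np.ndarray:
    """Random orthogonal matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs so the result is uniformly distributed (Haar measure).
    return q * np.sign(np.diag(r))

def block_diagonal_feedback(group_sizes, seed=0) -> np.ndarray:
    """Block-diagonal unitary feedback matrix: one orthogonal block per
    group of delay lines, i.e. fully decoupled sub-rooms."""
    rng = np.random.default_rng(seed)
    total = sum(group_sizes)
    A = np.zeros((total, total))
    i = 0
    for n in group_sizes:
        A[i:i + n, i:i + n] = random_orthogonal(n, rng)
        i += n
    return A

A = block_diagonal_feedback([4, 4])      # two groups of four delay lines
assert np.allclose(A @ A.T, np.eye(8))   # unitary: the loop is lossless
```

Because each block is orthogonal and the blocks do not overlap, the full matrix is unitary, so all energy loss in the network comes from the absorption filters rather than the feedback mixing.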

DiffGFDN separates parameters into two categories. Position‑invariant parameters—delay line lengths, absorption filters, scalar input/output gains, and the feedback matrix—are shared across the entire space and are learned globally. Position‑dependent parameters—source filters g_i and receiver filters g_o—are functions of the 3‑D coordinates of the source and listener. These spatially varying filters are generated by a multilayer perceptron (MLP) that maps (x, y, z) to filter gains for each group. Because the MLP is differentiable, the whole system can be trained end‑to‑end using gradient descent.
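The position-to-gain mapping can be illustrated with a toy MLP forward pass. The layer sizes, tanh activations, and softplus output head below are assumptions for illustration, not the paper's exact architecture; the point is only that a small differentiable network maps a 3-D coordinate to one non-negative gain per delay-line group.

```python
import numpy as np

def mlp_gains(pos, weights, biases):
    """Toy forward pass of a position-to-gain MLP: maps a 3-D source or
    listener coordinate to one scalar gain per delay-line group.
    Architecture details are illustrative assumptions."""
    h = np.asarray(pos, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(W @ h + b)                # hidden layers
    out = weights[-1] @ h + biases[-1]
    return np.log1p(np.exp(out))              # softplus keeps gains >= 0

rng = np.random.default_rng(1)
sizes = [3, 16, 16, 4]                        # (x, y, z) -> 4 group gains
weights = [rng.standard_normal((m, n)) * 0.5
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
g = mlp_gains([1.0, 2.0, 0.5], weights, biases)
assert g.shape == (4,) and np.all(g >= 0)
```

In training, this forward pass would be implemented in an autodiff framework so that gradients of the reverberation loss flow back through the network weights.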

The forward model follows the standard FDN transfer function. With delay matrix \(\mathbf{D}(z) = \mathrm{diag}(z^{-m_1}, \ldots, z^{-m_N})\), absorption filters \(\boldsymbol{\Gamma}(z)\), unitary feedback matrix \(\mathbf{A}\), and position-dependent input and output gain vectors \(\mathbf{g}_i\) and \(\mathbf{g}_o\), it can be written as

\[
H(z) = \mathbf{g}_o^{\top} \left[ \mathbf{D}(z)^{-1} - \mathbf{A}\,\boldsymbol{\Gamma}(z) \right]^{-1} \mathbf{g}_i .
\]
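The recursion behind this forward model can be sketched in the time domain. The NumPy loop below is a minimal FDN impulse-response renderer in which frequency-independent per-line gains stand in for the absorption filters; the delay lengths and gain values are illustrative, not taken from the paper.

```python
import numpy as np

def fdn_impulse_response(A, delays, g_in, g_out, gamma, n_samples):
    """Minimal time-domain FDN loop: delay lines, per-line absorption
    gains gamma (a frequency-independent stand-in for the absorption
    filters), unitary feedback matrix A, and input/output gains."""
    N = len(delays)
    buffers = [np.zeros(d) for d in delays]   # circular delay-line buffers
    ptrs = [0] * N
    out = np.zeros(n_samples)
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0            # unit impulse input
        taps = np.array([buffers[i][ptrs[i]] for i in range(N)])
        out[n] = g_out @ taps
        fb = A @ (gamma * taps)               # absorb, then mix through A
        for i in range(N):
            buffers[i][ptrs[i]] = g_in[i] * x + fb[i]
            ptrs[i] = (ptrs[i] + 1) % delays[i]
    return out

rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random unitary feedback
h = fdn_impulse_response(A, [211, 293, 401, 509],
                         g_in=np.ones(4), g_out=np.ones(4) / 4,
                         gamma=np.full(4, 0.97), n_samples=2000)
assert h.shape == (2000,) and np.isfinite(h).all()
```

With the absorption gains below unity and a unitary feedback matrix, the loop is stable and the response decays; per-group absorption filters and the learned position-dependent gains are what turn this generic loop into a GFDN.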

