The Stationary Phase Approximation, Time-Frequency Decomposition and Auditory Processing

The principle of stationary phase (PSP) is re-examined in the context of linear time-frequency (TF) decomposition using Gaussian, gammatone and gammachirp filters at uniform, logarithmic and cochlear spacings in frequency. This necessitates consideration of the use of the PSP on non-asymptotic integrals and leads to the introduction of a test for phase rate dominance. Regions of the TF plane that pass the test and do not contain stationary phase points contribute little or nothing to the final output. Analysis values that lie in these regions can thus be set to zero, which yields a sparse representation. In regions of the TF plane that fail the test or are in the vicinity of stationary phase points, synthesis is performed in the usual way. A new interpretation of the location parameters associated with the synthesis filters leads to: (i) a new method for locating stationary phase points in the TF plane; (ii) a test for phase rate dominance in that plane. Together, these constitute a TF stationary phase approximation (TFSFA) for both analysis and synthesis. The stationary phase regions of several elementary signals are identified theoretically, and examples of reconstruction are given. An analysis of the TF phase rate characteristics for the case of two simultaneous tones predicts and quantifies a form of simultaneous masking similar to that which characterizes the auditory system.
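Why non-asymptotic integrals need care can be seen in a toy case that is not taken from the paper. The sketch below assumes a Gaussian amplitude of width s multiplying a quadratic phase λt²/2, so that the exact integral has a closed form. The leading-order stationary phase estimate ignores the amplitude's variation, and its accuracy is governed by λs² (phase curvature times squared window width), a quantity that a finite analysis window need not make large. The function names are illustrative only.

```python
import numpy as np

def exact_integral(lam, s):
    """Closed form of I = ∫ exp(-t**2 / (2 s**2)) exp(i lam t**2 / 2) dt over the real line."""
    a = 1.0 / (2.0 * s**2) - 0.5j * lam
    return np.sqrt(np.pi / a)  # principal branch is correct because Re(a) > 0

def spa_integral(lam):
    """Leading-order stationary phase estimate: t0 = 0, amplitude 1 at t0, phi''(t0) = lam > 0.

    The Gaussian width s does not appear: the approximation only sees the
    amplitude at the stationary point, not how quickly it varies.
    """
    return np.sqrt(2.0 * np.pi / lam) * np.exp(1j * np.pi / 4)

s = 1.0
errors = []
for lam in (1.0, 10.0, 100.0):
    err = abs(spa_integral(lam) / exact_integral(lam, s) - 1.0)
    errors.append(err)
    print(f"lam * s**2 = {lam * s**2:6.1f}   relative SPA error = {err:.4f}")
```

The relative errors come out at roughly 0.42, 0.05 and 0.005: the stationary phase estimate is only trustworthy once the phase rotates many times across the amplitude envelope.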

💡 Research Summary

The paper revisits the Principle of Stationary Phase (PSP) within the framework of linear time‑frequency (TF) decomposition, focusing on three families of analysis filters (Gaussian, gammatone, and gammachirp) arranged with uniform, logarithmic, and cochlear‑like spacings. Because the PSP is an asymptotic result, its direct application to the non‑asymptotic integrals that arise in TF analysis is problematic. To address this, the authors introduce a “Phase‑Rate Dominance Test” (PRDT). For each point (t, ω) in the TF plane, the test compares the magnitudes of the phase gradients ∂Φ/∂t and ∂Φ/∂ω with the corresponding amplitude gradients. If the phase gradients dominate and no stationary‑phase point lies nearby, the point contributes little or nothing to the synthesized output, so its analysis coefficient can be set to zero without appreciable loss of information. This yields a sparse representation: large regions of the TF plane are eliminated, and only points that fail the PRDT or lie in the vicinity of stationary‑phase points are retained for synthesis.
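The sparsification step can be illustrated with a minimal sketch. The criterion here is an assumption, not the paper's formulation: along the time axis τ, a coefficient is zeroed when its phase rotates by more than κ radians per window standard deviation σ_t and its phase rate exceeds κ times its log‑amplitude rate; coefficients below a small magnitude floor are also zeroed, since they contribute nothing either way. The helpers `gaussian_tf` and `keep_mask`, and all parameter values, are hypothetical.

```python
import numpy as np

def gaussian_tf(s, fs, taus, omegas, sigma_t):
    """Gaussian-window TF analysis X(tau, w) = ∫ s(t) g(t - tau) exp(-i w t) dt.

    Absolute-time phase convention: for a tone exp(i w0 t) the phase of X
    rotates in tau at rate (w0 - w), so it is stationary only where w = w0.
    """
    t = np.arange(len(s)) / fs
    g = np.exp(-0.5 * ((t[None, :] - taus[:, None]) / sigma_t) ** 2)  # (n_tau, n_t)
    kernel = np.exp(-1j * np.outer(omegas, t))                         # (n_w, n_t)
    return (g * s[None, :]) @ kernel.T / fs                            # (n_tau, n_w)

def keep_mask(X, dtau, sigma_t, kappa=3.0, floor=1e-6):
    """Hypothetical phase-rate dominance test along tau; True means keep for synthesis."""
    # Phase rate from the conjugate product of neighbours (no explicit unwrapping).
    dphi = np.angle(X[1:] * np.conj(X[:-1])) / dtau
    dloga = np.diff(np.log(np.abs(X) + 1e-300), axis=0) / dtau
    dominated = (np.abs(dphi) * sigma_t > kappa) & (np.abs(dphi) > kappa * np.abs(dloga))
    dominated = np.vstack([dominated, dominated[-1:]])     # pad back to n_tau rows
    negligible = np.abs(X) < floor * np.abs(X).max()       # too small to matter
    return ~(dominated | negligible)

fs, sigma_t, dtau = 1000.0, 0.02, 0.002
t = np.arange(1000) / fs
s = np.exp(1j * 2 * np.pi * 100.0 * t)           # complex 100 Hz tone
taus = np.arange(150, 850, 2) / fs               # interior frames, 2 ms hop
freqs = np.linspace(0.0, 200.0, 81)              # 2.5 Hz channel spacing
X = gaussian_tf(s, fs, taus, 2 * np.pi * freqs, sigma_t)
keep = keep_mask(X, dtau, sigma_t)
print(f"fraction of coefficients kept: {keep.mean():.3f}")
```

For a steady tone, only a band of channels around 100 Hz survives (roughly a quarter of the 81 channels here), because elsewhere the coefficient phase rotates in τ at rate ω0 − ω while the magnitude stays constant. Only the τ direction is tested, for brevity; a test along ω would be analogous.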

A second contribution is a new interpretation of the synthesis‑filter location parameters, which leads both to a method for locating stationary‑phase points in the TF plane and to a TF version of the phase‑rate dominance test. By adjusting the centre frequency and time delay of each synthesis filter to minimise the local phase gradient, the method identifies the coordinates at which both ∂Φ/∂t = 0 and ∂Φ/∂ω = 0. The combined approach (PRDT‑based sparsification together with stationary‑phase localisation) is termed the Time‑Frequency Stationary Phase Approximation (TFSFA). TFSFA applies to both analysis and synthesis, offering a unified and potentially inexpensive method for TF processing.
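The stationary‑point idea can be checked on a linear chirp. This sketch does not reproduce the paper's synthesis‑filter procedure; it solves the textbook condition directly: the analysis integrand s(t) g(t − τ) e^{−iωt} has stationary phase where the signal's instantaneous frequency φ′(t*) equals the channel frequency ω. It then cross‑checks t* against the magnitude ridge of a Gaussian‑window analysis. The chirp parameters, window width and the helper `stationary_time` are hypothetical.

```python
import numpy as np

fs = 2000.0
t = np.arange(2000) / fs                                # 1 s of signal
w0, alpha = 2 * np.pi * 100.0, 2 * np.pi * 400.0        # starts at 100 Hz, sweeps 400 Hz/s
s = np.exp(1j * (w0 * t + 0.5 * alpha * t**2))           # linear chirp

# Stationary point of the integrand phase phi(t) - w t:  phi'(t*) = w.
inst_freq = np.gradient(np.unwrap(np.angle(s)), t)       # numerical phi'(t) in rad/s

def stationary_time(w):
    """Time t* at which the instantaneous frequency equals channel frequency w."""
    return np.interp(w, inst_freq, t)                    # inst_freq increases monotonically

# Cross-check: ridge of |X(tau, w)| for a Gaussian window, channel at 300 Hz.
sigma_t, w = 0.01, 2 * np.pi * 300.0
taus = np.arange(200, 1800, 2) / fs
g = np.exp(-0.5 * ((t[None, :] - taus[:, None]) / sigma_t) ** 2)
X = (g * s[None, :]) @ np.exp(-1j * w * t) / fs
tau_ridge = taus[np.argmax(np.abs(X))]
print(stationary_time(w), tau_ridge)   # both near (300 - 100) / 400 = 0.5 s
```

For a linear chirp the two estimates agree because |X(τ, ω)| is symmetric about ω = ω0 + ατ; for signals with curved instantaneous frequency, the ridge and the stationary point can separate when the window is wide.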

The authors validate the theory on several elementary signals. For a single sinusoid, a Gaussian pulse, and a chirp, the stationary‑phase regions are derived analytically, and example reconstructions obtained with the TFSFA are presented. The most striking result concerns two simultaneous tones. By examining the TF phase‑rate field, the authors show that the stronger tone imposes a region of dominant phase rate that suppresses the weaker tone’s stationary‑phase points. Consequently, the weaker tone’s coefficients are set to zero, reproducing a form of simultaneous masking similar to that observed psychoacoustically. The paper quantifies this masking effect in terms of the relative phase‑rate magnitudes and the spacing of the tones, providing a mathematically grounded model of auditory masking.
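A toy version of the two‑tone effect follows from an elementary fact: the output of a channel driven by two tones is y(t) = a1 e^{iω1 t} + a2 e^{iω2 t}, and when |a1| > |a2| its phase advances on average at exactly ω1, because the factor 1 + (a2/a1) e^{i(ω2−ω1)t} never winds around the origin. The sketch below assumes a hypothetical Gaussian channel gain with a 100 Hz standard deviation, tuned to the weaker tone; it is an illustration of phase‑rate dominance, not the paper's quantitative masking model.

```python
import numpy as np

def mean_phase_rate_hz(a1, f1, a2, f2, T=2.0, fs=8000.0):
    """Average phase rate (Hz) of y = a1 exp(i 2 pi f1 t) + a2 exp(i 2 pi f2 t)."""
    t = np.arange(0.0, T, 1.0 / fs)
    y = a1 * np.exp(2j * np.pi * f1 * t) + a2 * np.exp(2j * np.pi * f2 * t)
    phase = np.unwrap(np.angle(y))
    return (phase[-1] - phase[0]) / (2 * np.pi * (t[-1] - t[0]))

def gaussian_gain(f, fc, bw=100.0):
    """Hypothetical Gaussian channel magnitude response, standard deviation bw in Hz."""
    return np.exp(-0.5 * ((f - fc) / bw) ** 2)

f1, f2 = 1000.0, 1100.0          # masker f1 (amplitude 1), probe f2
fc = f2                          # channel tuned to the probe tone
rates = {}
for A2 in (0.3, 1.0):            # probe amplitude relative to the masker
    a1 = 1.0 * gaussian_gain(f1, fc)   # masker leaks in at exp(-0.5) ~ 0.61
    a2 = A2 * gaussian_gain(f2, fc)
    rates[A2] = mean_phase_rate_hz(a1, f1, a2, f2)
    print(f"A2 = {A2}: channel at {fc:.0f} Hz follows {rates[A2]:.1f} Hz")
```

With the weak probe, the probe's own channel follows the masker's 1000 Hz phase rate, so no stationary phase appears at 1100 Hz; at equal amplitudes the channel follows 1100 Hz. In this toy model the switch occurs when the probe is about 4.3 dB below the masker (the channel's 20·log10(e^{−0.5}) attenuation), and the threshold shifts with tone spacing and bandwidth.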

Beyond the theoretical contributions, the paper discusses practical implications. The sparsity introduced by the PRDT reduces the number of TF coefficients that must be stored or processed, which could benefit real‑time audio applications such as speech recognition, hearing‑aid signal processing, and auditory scene analysis. Moreover, because the masking model emerges naturally from the TF phase‑rate analysis, filter banks could be designed to exploit this property, yielding perceptually motivated representations that align with human auditory processing.

In summary, the work presents a TF stationary‑phase framework that unifies analysis, synthesis, and simultaneous masking. By combining the PSP with a phase‑rate dominance criterion, it offers both computational savings (through sparsity) and perceptual relevance (through a quantitative model of simultaneous masking), opening new avenues for efficient, biologically inspired audio signal processing.