Spatiotemporal Gabor filters: a new method for dynamic texture recognition
This paper presents a new method for dynamic texture recognition based on spatiotemporal Gabor filters. Dynamic textures have emerged as a new field of investigation that extends the concept of self-similarity in texture images to the spatiotemporal domain. To model a dynamic texture, we convolve the sequence of images with a bank of spatiotemporal Gabor filters. For each response, a feature vector is built by computing the energy statistic. As far as the authors know, this paper is the first to report an effective method for dynamic texture recognition using spatiotemporal Gabor filters. We evaluate the proposed method on two challenging databases, and the experimental results indicate that it is a robust approach for dynamic texture recognition.
💡 Research Summary
The paper introduces a novel approach for dynamic‑texture recognition that leverages a bank of spatiotemporal Gabor filters. Traditional texture analysis deals with static, two‑dimensional patterns, but dynamic textures evolve over time (e.g., water ripples, fire, smoke). To capture both spatial structure and temporal dynamics, the authors extend the classic 2‑D Gabor filter into three dimensions by combining a Gaussian spatial envelope with a sinusoidal carrier that is modulated in both space and time. Each filter is parameterized by orientation (θ), spatial wavelength (λ), Gaussian scale (σ), and a temporal velocity (v) that controls the rate of phase progression along the time axis. By constructing a filter bank that samples several orientations and velocities, the method can respond selectively to motion direction, speed, and spatial frequency content.
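The filter construction described above can be sketched in code. The snippet below is a minimal, illustrative kernel builder, not the paper's exact parameterization: it assumes a Gaussian envelope over x, y, and t multiplied by a cosine carrier whose phase advances along the time axis at velocity v in the direction θ. The function name and the separate temporal scale `sigma_t` are assumptions introduced for this sketch.

```python
import numpy as np

def st_gabor_kernel(size=15, frames=9, theta=0.0, lam=4.0,
                    sigma=2.0, v=1.0, sigma_t=2.0):
    """Illustrative spatiotemporal Gabor kernel (hypothetical helper):
    a 3-D Gaussian envelope times a cosine carrier whose phase drifts
    along t with speed v in the spatial direction theta."""
    half, ht = size // 2, frames // 2
    y, x, t = np.mgrid[-half:half + 1, -half:half + 1, -ht:ht + 1]
    # rotate spatial coordinates so the carrier oscillates along theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2)
                      - t**2 / (2 * sigma_t**2))
    carrier = np.cos(2 * np.pi * (xr + v * t) / lam)
    g = envelope * carrier
    return g - g.mean()  # zero-mean, so constant regions give no response

kernel = st_gabor_kernel(theta=np.pi / 4, v=2.0)
print(kernel.shape)  # → (15, 15, 9)
```

A filter bank in the spirit of the paper would be built by calling this constructor over a grid of orientations θ and velocities v, so that each kernel responds selectively to one motion direction and speed.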
For a given video sequence I(x, y, t), the authors convolve it with every filter in the bank, producing a set of response volumes R_i(x, y, t). From each response they compute a single scalar, the energy statistic E_i = Σ_{x,y,t} R_i(x, y, t)^2, which aggregates the squared magnitude over the entire spatiotemporal support. This energy captures how strongly the video exhibits the particular spatiotemporal frequency–orientation pattern encoded by the filter, while being robust to phase variations and moderate noise. Stacking the energies from all filters yields a compact feature vector V = (E_1, E_2, …, E_N), one component per filter in the bank, which serves as the dynamic-texture descriptor.
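The feature-extraction step just described can be sketched directly. This is a minimal sketch, not the authors' implementation: it assumes the video and the filters are plain 3-D NumPy arrays, uses `scipy.ndimage.convolve` for the spatiotemporal convolution, and stands in random kernels for the real filter bank purely to demonstrate the energy computation.

```python
import numpy as np
from scipy.ndimage import convolve

def energy_features(video, kernels):
    """Feature vector V = (E_1, ..., E_N), where each
    E_i = sum_{x,y,t} R_i(x, y, t)^2 and R_i is the response of
    the video to the i-th filter (hypothetical helper)."""
    feats = np.empty(len(kernels))
    for i, g in enumerate(kernels):
        r = convolve(video, g, mode='reflect')  # response volume R_i
        feats[i] = np.sum(r ** 2)               # energy statistic E_i
    return feats

# Toy demonstration: random kernels stand in for the Gabor bank.
rng = np.random.default_rng(0)
video = rng.random((24, 24, 12))                 # I(x, y, t)
bank = [rng.standard_normal((5, 5, 3)) for _ in range(4)]
print(energy_features(video, bank).shape)  # → (4,)
```

Because each E_i is a sum of squares, the descriptor is non-negative and insensitive to the sign (phase) of the response, which is what gives the method its robustness to phase variations noted above.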