Fire on Motion: Optimizing Video Pass-bands for Efficient Spiking Action Recognition

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Spiking neural networks (SNNs) have gained traction in vision due to their energy efficiency, bio-plausibility, and inherent temporal processing. Yet, despite this temporal capacity, most progress concentrates on static image benchmarks, and SNNs still underperform artificial neural networks (ANNs) on dynamic video tasks. In this work, we diagnose a fundamental pass-band mismatch: standard spiking dynamics behave as a temporal low-pass filter that emphasizes static content while attenuating the motion-bearing bands where task-relevant information concentrates in dynamic tasks. This phenomenon explains why SNNs can approach ANNs on static tasks yet fall behind on tasks that demand richer temporal understanding. To remedy this, we propose the Pass-Bands Optimizer (PBO), a plug-and-play module that shifts the temporal pass-band toward task-relevant motion bands. PBO introduces only two learnable parameters and a lightweight consistency constraint that preserves semantics and boundaries, incurring negligible computational overhead and requiring no architectural changes. PBO deliberately suppresses static components that contribute little to discrimination, effectively high-passing the stream so that spiking activity concentrates on motion-bearing content. On UCF101, PBO yields an improvement of over ten percentage points. On the more complex tasks of multi-modal action recognition and weakly supervised video anomaly detection, PBO delivers consistent and significant gains, offering a new perspective on SNN-based video processing and understanding.
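The paper does not spell out PBO's parametrization in this summary, but the intent of "high-passing the stream" can be sketched generically. The snippet below is a minimal illustration only, not PBO itself: a first-order temporal high-pass that subtracts a slowly updated background estimate from each frame, so constant content is attenuated while motion passes through. The mixing weight `beta` is a hypothetical stand-in for a learnable parameter.

```python
import numpy as np

def temporal_high_pass(frames, beta=0.9):
    """Illustrative temporal high-pass over a (T, H, W) frame stack.

    beta (hypothetical parameter): how strongly the running static
    estimate is subtracted and how slowly it is updated.
    """
    frames = frames.astype(float)
    out = np.empty_like(frames)
    background = frames[0].copy()  # initial static estimate
    for t, f in enumerate(frames):
        out[t] = f - beta * background                    # suppress static content
        background = beta * background + (1 - beta) * f   # slow background update
    return out

# A purely static video is attenuated to a small residual (gain 1 - beta),
# while any frame-to-frame change passes through nearly unchanged.
static_video = np.ones((5, 2, 2))
print(temporal_high_pass(static_video)[-1])
```

On a constant input the filter converges to a residual of `(1 - beta)` times the input, i.e. the DC component is attenuated by 10x for `beta = 0.9`, which is the qualitative behavior the abstract describes for suppressing static components.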


💡 Research Summary

This paper investigates why spiking neural networks (SNNs), despite their temporal processing capabilities, lag behind artificial neural networks (ANNs) on video‑based tasks such as action recognition and anomaly detection. The authors identify a fundamental “pass‑band mismatch”: the leaky integrate‑and‑fire (LIF) neuron, which is the core of most SNNs, acts as a first‑order low‑pass filter in the temporal domain. By deriving the discrete‑time Fourier transform of the LIF update equation, they show that the frequency response |H_LIF(e^{jω})|² equals (1‑α)²/(1+α²‑2αcos ω), where α∈(0,1) is the leakage factor. This response preserves the DC component (ω = 0) but attenuates all higher frequencies, especially the mid‑to‑high bands where motion information resides. Consequently, static background (B) and low‑frequency noise dominate the spiking budget, while the motion component (M
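The derivation above can be checked numerically. The sketch below (an independent verification, not code from the paper) compares the closed-form squared magnitude response |H_LIF(e^{jω})|² = (1−α)²/(1+α²−2α cos ω) against a direct simulation of the membrane recurrence u[t] = α·u[t−1] + (1−α)·x[t] driven by a complex exponential; the leakage value α = 0.9 is illustrative.

```python
import numpy as np

def lif_gain_sq(omega, alpha):
    """Closed-form squared gain from the paper's DTFT derivation."""
    return (1 - alpha) ** 2 / (1 + alpha ** 2 - 2 * alpha * np.cos(omega))

def simulated_gain_sq(omega, alpha, steps=2000):
    """Drive u[t] = alpha*u[t-1] + (1-alpha)*x[t] with x[t] = e^{j*omega*t}
    and read off the steady-state squared amplitude (transient ~ alpha^steps)."""
    u = 0.0 + 0.0j
    for t in range(steps):
        u = alpha * u + (1 - alpha) * np.exp(1j * omega * t)
    return abs(u) ** 2

alpha = 0.9  # leakage factor, illustrative value in (0, 1)
for omega in (0.0, np.pi / 3, np.pi):
    print(f"omega={omega:.3f}  closed={lif_gain_sq(omega, alpha):.6f}  "
          f"simulated={simulated_gain_sq(omega, alpha):.6f}")
```

The gain is exactly 1 at ω = 0 (the DC component survives) and decreases monotonically toward ω = π, confirming the first-order low-pass behavior that the summary identifies as the root of the pass-band mismatch.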

