SpikingGamma: Surrogate-Gradient Free and Temporally Precise Online Training of Spiking Neural Networks with Smoothed Delays

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original ArXiv source.

Neuromorphic hardware implementations of Spiking Neural Networks (SNNs) promise energy-efficient, low-latency AI through sparse, event-driven computation. Yet, training SNNs under fine temporal discretization remains a major challenge, hindering both low-latency responsiveness and the mapping of software-trained SNNs to efficient hardware. In current approaches, spiking neurons are modeled as self-recurrent units, embedded into recurrent networks to maintain state over time, and trained with BPTT or RTRL variants based on surrogate gradients. These methods scale poorly with temporal resolution, while online approximations often exhibit instability for long sequences and tend to fail at capturing temporal patterns precisely. To address these limitations, we develop spiking neurons with internal recursive memory structures that we combine with sigma-delta spike-coding. We show that this SpikingGamma model supports direct error backpropagation without surrogate gradients, can learn fine temporal patterns with minimal spiking in an online manner, and scale feedforward SNNs to complex tasks and benchmarks with competitive accuracy, all while being insensitive to the temporal resolution of the model. Our approach offers both an alternative to current recurrent SNNs trained with surrogate gradients, and a direct route for mapping SNNs to neuromorphic hardware.


💡 Research Summary

The paper introduces SpikingGamma, a novel spiking neural network (SNN) architecture that eliminates the need for surrogate gradients (SG) and back‑propagation through time (BPTT) while preserving precise temporal learning capabilities. Traditional SNN training treats neurons as self‑recurrent units and relies on SG‑based BPTT or RTRL to propagate errors through the discontinuous spike function. This approach suffers from exploding/vanishing gradients, high memory and computational cost at fine temporal resolutions, and instability in online approximations for long sequences.

SpikingGamma tackles these issues by embedding an adaptive recursive memory—implemented as a cascade of leaky “buckets”—inside each neuron. Incoming spikes are filtered through multiple temporal kernels (κ_k) with distinct decay rates (α_k), producing a set of delayed internal states (\hat y_{k i}(t)). These states are linearly combined using learned bucket weights (v_{k j}) (or per‑synapse weights (v_{k ij})) and synaptic weights (w_{ij}) to form a continuous pre‑activation signal (x_j(t)). After a ReLU non‑linearity, the neuron’s analog output (y_j(t)) is obtained.
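The bucket cascade can be sketched as a classic gamma-memory recursion; this is an assumption for illustration, and the paper's exact kernels (κ_k) and decay rates (α_k) may differ. Each stage is a leaky low-pass copy of the previous one, so deeper stages peak later and act as smoothed delays:

```python
import numpy as np

def gamma_memory(x, alpha, K):
    """Cascade of K leaky 'buckets' (gamma memory): stage k holds a
    low-pass filtered, increasingly delayed copy of the input.
    Returns the internal states hat_y[k, t], shape (K, T)."""
    T = len(x)
    g = np.zeros(K + 1)            # g[0] is the raw input tap
    out = np.zeros((K, T))
    for t in range(T):
        g[0] = x[t]
        for k in range(1, K + 1):  # each bucket leaks into the next
            g[k] += alpha * (g[k - 1] - g[k])
        out[:, t] = g[1:]
    return out
```

A single input spike then produces a family of traces whose peaks arrive progressively later for deeper buckets, which is what lets a linear combination of bucket states act as a learned, smoothed delay line.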

Temporal discretization is achieved via sigma‑delta coding: the neuron maintains a running estimate (\hat y_j(t)) of its own analog signal. When the mismatch (z_j(t)=y_j(t)-\hat y_j(t-1)) exceeds a dynamic threshold (\theta_j(t)), a spike is emitted and the estimate is corrected by adding a refractory response. Crucially, the estimate (\hat y_j(t)) is expressed in the same bucket basis as the inputs, meaning downstream synapses can directly use (\hat y_{k j}(t)) without decoding spikes.
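A minimal sigma-delta encoder along these lines can be sketched as follows; a fixed threshold stands in for the paper's dynamic (\theta_j(t)) and a simple step correction replaces the refractory response (both simplifications are assumptions):

```python
import numpy as np

def sigma_delta_encode(y, theta=0.05):
    """Emit a signed spike only when the running estimate hat_y
    drifts from the analog signal y by more than theta.
    Returns (spikes, estimate) as arrays over time."""
    hat_y = 0.0
    spikes, estimate = [], []
    for yt in y:
        z = yt - hat_y             # mismatch z_j(t)
        if abs(z) > theta:
            s = theta * np.sign(z) # quantized correction spike
            hat_y += s
        else:
            s = 0.0
        spikes.append(s)
        estimate.append(hat_y)
    return np.array(spikes), np.array(estimate)
```

On a slowly varying signal the estimate tracks within the threshold while only a small fraction of timesteps carry a spike, which is the sparsity mechanism the summary describes.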

Because (\hat y_j) is an explicit linear approximation of (y_j), the derivative (\partial \hat y_j / \partial y_j = 1). Consequently, gradients of a loss (cross‑entropy for classification or MSE for precise spike‑timing) with respect to both synaptic weights and bucket weights can be computed by ordinary chain‑rule differentiation on the forward‑pass equations. No surrogate gradient is required, and the error signal bypasses the discrete spike events entirely. This yields a true online learning rule: at each timestep the loss is evaluated, gradients are computed, and parameters are updated immediately.
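A single-neuron, single-timestep update under this scheme can be sketched as below; the shapes, MSE loss, and plain SGD step are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def online_step(w, v, Y, target, lr=0.01):
    """One online timestep for one neuron: x = v . (Y @ w),
    y = relu(x), L = 0.5 * (y - target)**2. Since hat_y is a linear
    estimate of y (d hat_y / d y = 1), the gradient is plain chain
    rule on the forward pass -- no surrogate gradient is needed.
    Y: presynaptic bucket states hat_y_{ki}, shape (K, N_in)."""
    u = Y @ w                      # per-bucket drive, shape (K,)
    x = float(v @ u)               # pre-activation x_j(t)
    y = max(x, 0.0)                # ReLU output y_j(t)
    dL_dx = (y - target) * (1.0 if x > 0 else 0.0)
    w = w - lr * dL_dx * (v @ Y)   # dL/dw_i = dL/dx * sum_k v_k Y_ki
    v = v - lr * dL_dx * u         # dL/dv_k = dL/dx * (Y @ w)_k
    return w, v, y
```

Calling this once per timestep, with the loss evaluated on the current output, is the "evaluate, compute gradients, update immediately" loop the summary attributes to the method.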

Key advantages of the SpikingGamma framework are:

  1. Temporal‑resolution invariance – The bucket memory compresses the entire past into a fixed‑size representation, so refining the temporal resolution (smaller, more numerous timesteps) does not grow the memory footprint and increases compute only linearly with the number of steps.
  2. Online stability – Since gradients are computed locally at each timestep, error accumulation over long sequences is avoided, mitigating the instability seen in approximate BPTT methods.
  3. Sparsity‑driven efficiency – Sigma‑delta coding fires spikes only when the approximation error exceeds the threshold, naturally limiting spike counts and preserving the energy‑saving promise of SNNs.
  4. Feed‑forward long‑range dependency modeling – Multiple buckets act as learned delays, allowing a purely feed‑forward network to capture temporal relationships that would otherwise require recurrent cells such as LSTMs.

The authors validate the approach on five benchmarks: (a) a simple delay‑detection task, (b) an echo‑location task, (c) DVS Gesture (event‑based vision), (d) SHD (Spiking Heidelberg Digits, spoken‑digit audio), and (e) SSC (Spiking Speech Commands). On DVS Gesture and SHD, SpikingGamma surpasses state‑of‑the‑art online SNN methods (e.g., FPTT, ES‑D‑RTRL) by 2–5 % absolute accuracy while maintaining an average firing rate below 5 %. Moreover, performance remains stable as the simulation timestep is varied from 0.1 ms to 1 ms, confirming the claimed temporal‑resolution insensitivity.

Limitations noted include the use of globally shared bucket weights per neuron (or per synapse) which may restrict the expressiveness needed for highly non‑linear temporal transformations, and the lack of a concrete hardware mapping strategy for the bucket dynamics. Future work could explore dynamic bucket‑weight learning, hierarchical multi‑scale bucket architectures, and ASIC/FPGA implementations that exploit the analog nature of leaky‑bucket circuits.

In summary, SpikingGamma presents a compelling alternative to surrogate‑gradient‑based recurrent SNNs. By integrating adaptive delayed memory and sigma‑delta coding, it enables direct error back‑propagation without BPTT, achieves precise temporal learning at arbitrary fine resolutions, and aligns closely with the constraints of neuromorphic hardware. This work opens a practical pathway for deploying high‑performance, low‑latency spiking networks in real‑world energy‑constrained applications.

