UltraLIF: Fully Differentiable Spiking Neural Networks via Ultradiscretization and Max-Plus Algebra

Reading time: 5 minutes
...

📝 Original Info

  • Title: UltraLIF: Fully Differentiable Spiking Neural Networks via Ultradiscretization and Max-Plus Algebra
  • ArXiv ID: 2602.11206
  • Date: 2026-02-10
  • Authors: Not listed in the source text (the original did not include an author list).

📝 Abstract

Spiking Neural Networks (SNNs) offer energy-efficient, biologically plausible computation but suffer from non-differentiable spike generation, necessitating reliance on heuristic surrogate gradients. This paper introduces UltraLIF, a principled framework that replaces surrogate gradients with ultradiscretization, a mathematical formalism from tropical geometry providing continuous relaxations of discrete dynamics. The central insight is that the max-plus semiring underlying ultradiscretization naturally models neural threshold dynamics: the log-sum-exp function serves as a differentiable soft-maximum that converges to hard thresholding as a learnable temperature parameter $\varepsilon \to 0$. Two neuron models are derived from distinct dynamical systems: UltraLIF from the LIF ordinary differential equation (temporal dynamics) and UltraDLIF from the diffusion equation modeling gap junction coupling across neuronal populations (spatial dynamics). Both yield fully differentiable SNNs trainable via standard backpropagation with no forward-backward mismatch. Theoretical analysis establishes pointwise convergence to classical LIF dynamics with quantitative error bounds and bounded non-vanishing gradients. Experiments on six benchmarks spanning static images, neuromorphic vision, and audio demonstrate improvements over surrogate gradient baselines, with gains most pronounced in single-timestep ($T{=}1$) settings on neuromorphic and temporal datasets. An optional sparsity penalty enables significant energy reduction while maintaining competitive accuracy.

💡 Deep Analysis

📄 Full Content

Spiking Neural Networks (SNNs) represent a promising paradigm for energy-efficient machine learning, with significant potential for neuromorphic hardware deployment (Maass, 1997; Roy et al., 2019). In contrast to artificial neural networks (ANNs) communicating via continuous activations, SNNs process information through discrete spike events, emulating biological neural computation. This event-driven nature enables substantial energy savings; Intel's Loihi chip demonstrates up to 1000× energy reduction compared to GPUs on certain tasks (Davies et al., 2018).

However, SNN training remains challenging due to the nondifferentiability of spike generation. The standard Leaky Integrate-and-Fire (LIF) neuron follows the dynamics:

$$\tau_m \frac{du(t)}{dt} = -\big(u(t) - u_{\text{rest}}\big) + R\,I(t), \qquad s(t) = H\big(u(t) - u_{\text{th}}\big), \tag{1}$$

where the Heaviside step function H(•) has gradient zero almost everywhere. The dominant approach employs surrogate gradients, replacing the true gradient with a smooth approximation during backpropagation (Neftci et al., 2019; Zenke & Vogels, 2021). While empirically effective, surrogate gradients introduce a fundamental mismatch between forward (discrete) and backward (continuous) passes (Figure 1a), with limited theoretical understanding of convergence properties (Li et al., 2021; Gygax & Zenke, 2025).
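For reference, here is a minimal PyTorch sketch of the surrogate-gradient pattern just described: a hard Heaviside spike in the forward pass and a hand-picked smooth derivative in the backward pass. The sigmoid surrogate, its steepness beta, and the reset rule are illustrative choices, not taken from the paper or any specific baseline.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside forward, sigmoid-derivative backward: the forward/backward
    mismatch described above (hard spikes forward, a different smooth
    function backward)."""

    @staticmethod
    def forward(ctx, v_minus_thresh, beta):
        ctx.save_for_backward(v_minus_thresh)
        ctx.beta = beta
        return (v_minus_thresh > 0).float()          # hard spike H(v - theta)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(ctx.beta * x)
        # Surrogate derivative replaces the true (almost-everywhere-zero) gradient.
        return grad_output * ctx.beta * sig * (1.0 - sig), None


def lif_step(v, x, tau=2.0, threshold=1.0, beta=10.0):
    """One discrete-time LIF update: leak, integrate, spike, reset-by-subtraction."""
    v = v + (x - v) / tau                            # leaky integration toward the input
    s = SurrogateSpike.apply(v - threshold, beta)    # non-differentiable spike generation
    v = v - threshold * s                            # reset by subtracting the threshold
    return v, s
```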

This paper proposes UltraLIF, a theoretically grounded alternative based on ultradiscretization, a limiting procedure from tropical geometry transforming continuous dynamical systems into discrete max-plus systems while preserving structural properties (Tokihiro et al., 1996; Grammaticos et al., 2004). The key contributions are:

  1. Principled differentiability: The log-sum-exp (LSE) function provides a natural soft relaxation of the max operation underlying spike generation, with explicit convergence bounds as ε → 0 (Lemma 3.2); a minimal illustrative sketch follows this list.

  2. Forward-backward consistency: Unlike surrogate methods, UltraLIF employs identical dynamics in forward and backward passes, eliminating gradient mismatch (Remark 5.6).

  3. Bounded gradients: For any ε > 0, gradients remain bounded and non-vanishing, enabling stable optimization (Proposition 5.4).

  4. Consistent low-timestep improvements: On six benchmarks spanning static, neuromorphic, and audio modalities, ultradiscretized models improve over surrogate gradient baselines at T = 1, with the largest gains on temporal and event-driven data (+11.22% SHD, +7.96% DVS-Gesture, +3.91% N-MNIST).
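The following PyTorch-style sketch illustrates the idea behind contributions 1-3: the hard spike is replaced by a log-sum-exp-derived soft threshold, so the same smooth dynamics are used forward and backward. This is illustrative only; the function names, the soft reset rule, and the default values of eps, tau, and threshold are assumptions, not the paper's exact UltraLIF formulation.

```python
import torch
import torch.nn.functional as F

def lse_soft_max0(x, eps=0.1):
    """eps-scaled log-sum-exp over {0, x}: a smooth max(0, x).

    Equals eps * log(1 + exp(x / eps)) and converges to max(0, x)
    pointwise as eps -> 0, with error at most eps * log(2)."""
    return eps * F.softplus(x / eps)

def soft_spike(v, threshold=1.0, eps=0.1):
    """Soft spike in (0, 1): the derivative of lse_soft_max0(v - threshold)
    with respect to v, i.e. sigmoid((v - threshold) / eps). It approaches
    the Heaviside step as eps -> 0 while keeping bounded, non-vanishing
    gradients for every eps > 0."""
    return torch.sigmoid((v - threshold) / eps)

def ultra_lif_step(v, x, tau=2.0, threshold=1.0, eps=0.1):
    """One fully differentiable LIF-style update. The same smooth functions
    are used in the forward and backward passes, so standard backpropagation
    applies with no forward/backward mismatch."""
    v = v + (x - v) / tau                 # leaky integration toward the input
    s = soft_spike(v, threshold, eps)     # differentiable spike surrogate
    v = v - threshold * s                 # soft reset scaled by the soft spike
    return v, s
```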

Surrogate Gradient Methods. The dominant paradigm for direct SNN training replaces non-differentiable spike gradients with smooth surrogates (Neftci et al., 2019). Common choices include piecewise linear (Bellec et al., 2018), sigmoid (Zenke & Ganguli, 2018), and arctangent (Fang et al., 2021) functions. Recent work introduces learnable surrogate parameters (Lian et al., 2023) and adaptive shapes (Li et al., 2021). Despite empirical success, the forward-backward mismatch remains theoretically problematic. Gygax & Zenke (2025) provide partial justification via stochastic neurons, showing surrogate gradients match escape noise derivatives in expectation.

Spike Timing Approaches. SpikeProp (Bohte et al., 2002) and variants (Mostafa, 2018; Kheradpisheh & Masquelier, 2020) compute exact gradients with respect to spike times. These methods require careful initialization and struggle with silent neurons. Recent work on exact smooth gradients through spike timing (Göltz et al., 2021) addresses some limitations but remains computationally intensive.

ANN-to-SNN Conversion. An alternative approach trains conventional ANNs and then converts them to SNNs (Cao et al., 2015; Rueckauer et al., 2017; Bu et al., 2022). While avoiding direct SNN training, conversion methods typically require many timesteps to achieve ANN accuracy and sacrifice temporal dynamics.

Tropical Geometry and Neural Networks. Tropical geometry studies algebraic structures where addition becomes max (or min) and multiplication becomes addition (Maclagan & Sturmfels, 2015). Zhang et al. (2018) establish that ReLU networks compute tropical rational functions. Recent work extends this to graph neural networks (Pham & Garg, 2024) and neural network compression (Fotopoulos et al., 2024). The connection to spiking networks via ultradiscretization appears novel.

Ultradiscretization. Originating in integrable systems (Tokihiro et al., 1996), ultradiscretization transforms difference equations into cellular automata preserving solution structure. Applications include soliton systems (Takahashi & Satsuma, 1990) and limit cycle analysis (Yamazaki & Ohmori, 2023; 2024). Its application to neural networks has not been previously explored. The max-plus semiring, in which addition is the maximum and multiplication is ordinary addition, underlies tropical geometry and provides the limit structure for ultradiscretization.
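Concretely, the ultradiscretization limit maps ordinary arithmetic onto the max-plus operations; the following are standard identities rather than results specific to this paper:

$$\lim_{\varepsilon \to 0^+} \varepsilon \log\!\big(e^{a/\varepsilon} + e^{b/\varepsilon}\big) = \max(a, b), \qquad \varepsilon \log\!\big(e^{a/\varepsilon} \cdot e^{b/\varepsilon}\big) = a + b,$$

so ordinary addition ultradiscretizes to max while multiplication becomes ordinary addition, recovering exactly the (max, +) operations of the tropical semiring described above.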

The log-sum-exp function with temperature ε > 0 is defined as:

$$\mathrm{LSE}_\varepsilon(x_1, \dots, x_n) = \varepsilon \log \sum_{i=1}^{n} e^{x_i/\varepsilon}.$$

The following lemma establishes its role as a smooth approximation to the maximum.

Lemma 3.2. For $x \in \mathbb{R}^n$ with $M = \max_i x_i$,

$$M \le \mathrm{LSE}_\varepsilon(x) \le M + \varepsilon \log n, \qquad \text{and hence} \quad \lim_{\varepsilon \to 0^+} \mathrm{LSE}_\varepsilon(x) = M.$$

Proof. For the lower bound, the sum includes $e^{M/\varepsilon}$, so $\mathrm{LSE}_\varepsilon(x) \ge \varepsilon \log e^{M/\varepsilon} = M$. For the upper bound, every term satisfies $e^{x_i/\varepsilon} \le e^{M/\varepsilon}$, so $\mathrm{LSE}_\varepsilon(x) \le \varepsilon \log\big(n\, e^{M/\varepsilon}\big) = M + \varepsilon \log n$. The limit follows by the squeeze theorem.

The gradient formula follows from direct differentiation:

$$\frac{\partial\,\mathrm{LSE}_\varepsilon(x)}{\partial x_i} = \frac{e^{x_i/\varepsilon}}{\sum_{j=1}^{n} e^{x_j/\varepsilon}} = \mathrm{softmax}(x/\varepsilon)_i.$$
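A small numerical check of this identity with automatic differentiation; this is a hedged sketch, and PyTorch as well as the particular values of eps and x are arbitrary choices, not from the paper:

```python
import torch

eps = 0.1
x = torch.tensor([0.3, 1.2, -0.5], requires_grad=True)

# LSE_eps(x) = eps * logsumexp(x / eps); its gradient w.r.t. x should be softmax(x / eps).
lse = eps * torch.logsumexp(x / eps, dim=0)
lse.backward()

expected = torch.softmax(x.detach() / eps, dim=0)
print(torch.allclose(x.grad, expected))  # True: the gradient is the softmax
print(x.grad.sum().item())               # ~1.0, with strictly positive entries:
                                         # bounded and non-vanishing for any eps > 0
```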

Remark 3.3. When

Reference

This content is AI-processed based on open access ArXiv data.
