SynSacc: A Blender-to-V2E Pipeline for Synthetic Neuromorphic Eye-Movement Data and Sim-to-Real Spiking Model Training

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The study of eye movements, particularly saccades and fixations, is fundamental to understanding the mechanisms of human cognition and perception. Accurate classification of these movements requires sensing technologies capable of capturing rapid dynamics without distortion. Event cameras, also known as Dynamic Vision Sensors (DVS), record changes in light intensity asynchronously, eliminating the motion blur inherent in conventional frame-based cameras and offering superior temporal resolution and data efficiency. In this study, we introduce a synthetic dataset generated with Blender to simulate saccades and fixations under controlled conditions. Leveraging Spiking Neural Networks (SNNs), we evaluate its robustness by training two architectures and fine-tuning on real event data. The proposed models achieve up to 0.83 accuracy and maintain consistent performance across varying temporal resolutions, demonstrating stability in eye-movement classification. Moreover, the use of SNNs with synthetic event streams yields substantial computational efficiency gains over artificial neural network (ANN) counterparts, underscoring the utility of synthetic data augmentation in advancing event-based vision. All code and datasets associated with this work are available at https://github.com/Ikhadija-5/SynSacc-Dataset.


💡 Research Summary

The paper addresses a critical bottleneck in eye‑movement research: the scarcity of finely annotated, high‑temporal‑resolution event‑camera data for classifying saccades and fixations. To overcome this, the authors introduce a fully synthetic pipeline that starts with Blender, a 3‑D creation suite, to generate realistic eye‑movement sequences. By manipulating the armature bones of a left and right eye model, they programmatically produce saccadic rotations within user‑defined angular limits and frame intervals. The right‑eye motion is derived by mirroring the left‑eye rotation on the Y‑axis, ensuring biologically plausible binocular coordination. Each rotation is key‑framed, yielding precise ground‑truth trajectories for both eyes.
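The mirrored binocular rotation described above can be sketched in plain Python. This is a minimal sketch, not the authors' released code: the function names, angular limits, and axis convention are assumptions, and the Blender `bpy` keyframing calls are shown only as comments since they require a running Blender session.

```python
import random

def random_saccade(max_pitch_deg=10.0, max_yaw_deg=15.0):
    """Draw a left-eye rotation (pitch, yaw, roll) in degrees within
    user-defined angular limits, one draw per simulated saccade.
    The limit values here are illustrative assumptions."""
    return (random.uniform(-max_pitch_deg, max_pitch_deg),  # pitch
            random.uniform(-max_yaw_deg, max_yaw_deg),      # yaw
            0.0)                                            # roll held fixed

def mirror_on_y(rotation):
    """Derive the right-eye rotation by mirroring the left eye on the
    Y axis: pitch is kept, the other components are negated.
    (Assumed convention; the actual armature rig may differ.)"""
    pitch, yaw, roll = rotation
    return (pitch, -yaw, -roll)

# In Blender, each rotation pair would then be key-framed on the
# armature bones, e.g. (hypothetical bone names):
#   left_bone.rotation_euler = left_rot
#   left_bone.keyframe_insert("rotation_euler", frame=f)
```

Keyframing every rotation is what yields the exact per-frame ground-truth trajectories for both eyes.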

The rendered RGB frames are then fed into the V2E (Video‑to‑Event) simulator. Prior to event generation, the authors up‑sample the video eight‑fold using Super‑SloMo, dramatically increasing temporal granularity. V2E converts each pixel’s logarithmic intensity change into an event when it exceeds a threshold of 0.2 (both ON and OFF), and it applies the “Noisy” preset to inject realistic sensor artifacts such as shot noise, leak events, and background activity. The resulting event streams match the 346 × 260 resolution of a DVS346 sensor and retain the fine‑grained timing needed for saccade analysis.
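The core contrast-threshold rule that V2E applies can be illustrated with a small NumPy sketch. This omits the Super-SloMo interpolation, the "Noisy" preset's sensor artifacts, and V2E's finer timing model; the function name and reference-update scheme are simplified assumptions, not the simulator's actual implementation.

```python
import numpy as np

def frames_to_events(frames, timestamps, thresh=0.2, eps=1e-3):
    """Emit (t, x, y, polarity) events wherever a pixel's log intensity
    has moved by at least `thresh` since its last event. ON and OFF
    share the same 0.2 threshold, as in the paper's V2E configuration."""
    ref = np.log(frames[0].astype(np.float64) + eps)  # per-pixel reference
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_i = np.log(frame.astype(np.float64) + eps)
        diff = log_i - ref
        # number of whole threshold crossings per pixel; each crossing
        # produces one event of the corresponding polarity
        n = np.floor(np.abs(diff) / thresh).astype(int)
        for y, x in zip(*np.nonzero(n)):
            pol = 1 if diff[y, x] > 0 else -1
            events.extend((t, int(x), int(y), pol) for _ in range(n[y, x]))
            ref[y, x] += pol * n[y, x] * thresh  # move reference toward frame
    return events
```

For example, a pixel whose intensity doubles between frames changes by log 2 ≈ 0.69 in log intensity, which clears the 0.2 threshold three times and so yields three ON events at that pixel.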

For neural‑network training, the authors adopt a binary spike representation: each event is directly mapped to a spike tensor S(x, y, k, p), where k denotes a discrete time bin. They choose rate coding—counting spikes within a fixed temporal window—to improve robustness against timing jitter and to simplify gradient‑based learning. Two lightweight spiking neural network (SNN) architectures are evaluated. Both employ the current‑based leaky integrate‑and‑fire (CUBA‑LIF) neuron model, which separates synaptic current accumulation from membrane voltage updates, yielding more stable surrogate‑gradient training. The first architecture is a three‑layer convolution‑pool‑global‑pool network that extracts spatio‑temporal features; the second is a fully‑connected network that directly aggregates spike counts. Experiments span multiple temporal resolutions (10 ms, 20 ms, 40 ms), and both models achieve classification accuracies between 0.78 and 0.83, showing little sensitivity to bin size.
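The two ingredients above can be sketched compactly: binning events into the binary spike tensor S(x, y, k, p), and a single CUBA-LIF update in which synaptic current and membrane voltage are kept as separate state variables. The tensor layout, decay constants, and threshold below are illustrative assumptions; the paper does not specify these hyperparameters here.

```python
import numpy as np

def events_to_spike_tensor(events, height, width, n_bins, t_end):
    """Binary spike tensor S of shape (height, width, n_bins, 2):
    k indexes the discrete time bin, the last axis the ON/OFF polarity."""
    S = np.zeros((height, width, n_bins, 2), dtype=np.float32)
    for t, x, y, p in events:
        k = min(int(t / t_end * n_bins), n_bins - 1)
        S[y, x, k, 0 if p > 0 else 1] = 1.0  # binary: at most one spike per bin
    return S

def cuba_lif_step(i, v, x, alpha=0.9, beta=0.85, v_th=1.0):
    """One CUBA-LIF update: the synaptic current i accumulates input
    separately from the membrane voltage v, which is what stabilizes
    surrogate-gradient training. Decay constants are assumed values."""
    i = alpha * i + x                 # current-based synapse
    v = beta * v + i                  # membrane integrates the current
    s = (v >= v_th).astype(np.float32)
    v = v * (1.0 - s)                 # hard reset where a spike occurred
    return i, v, s
```

With a 10 ms bin over a 100 ms stream, `n_bins=10` reproduces the coarsest temporal resolution the paper evaluates; the rate-coded binary format means multiple events falling in the same bin collapse to a single spike.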

To validate real‑world relevance, the best‑performing SNN is fine‑tuned on the EV‑Eye dataset, a large multimodal event‑camera collection that includes saccades, fixations, and smooth pursuits but lacks fine‑grained labels. Transfer learning demonstrates that pre‑training on the synthetic SynSacc data accelerates convergence and improves final performance. Moreover, the SNN requires roughly one‑fifth the synaptic operations and consumes about one‑third the energy of comparable artificial neural networks (ANNs) trained on the same task, confirming the computational efficiency of event‑driven spiking computation.
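The efficiency argument can be made concrete with a back-of-the-envelope sketch. The per-operation energies below are the commonly cited 45 nm CMOS estimates (≈4.6 pJ per multiply-accumulate, ≈0.9 pJ per accumulate), not figures from the paper, and the operation counts are illustrative assumptions rather than the authors' measured workloads.

```python
E_MAC = 4.6e-12  # J per multiply-accumulate (ANN), 45 nm CMOS estimate
E_AC = 0.9e-12   # J per accumulate (SNN synaptic op), 45 nm CMOS estimate

def ann_energy(dense_macs):
    """A frame-based ANN pays for every connection on every input."""
    return dense_macs * E_MAC

def snn_energy(dense_macs, spike_rate):
    """An event-driven SNN only pays for active inputs: synaptic
    operations scale with the average spike rate, and each is a
    cheap accumulate rather than a multiply-accumulate."""
    synops = dense_macs * spike_rate
    return synops * E_AC, synops
```

Under this accounting, sparsity reduces the operation count and the cheaper accumulate reduces the per-operation cost, which is the mechanism behind the roughly one-fifth SynOps and one-third energy reported in the paper; the exact ratios depend on the measured spike activity of the trained networks.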

The paper’s contributions are fourfold: (1) a reproducible Blender‑V2E pipeline that automatically generates annotated event streams for eye‑movement research; (2) a binary spike, rate‑coded data format that preserves temporal fidelity while remaining sparse; (3) two compact CUBA‑LIF SNN models that achieve state‑of‑the‑art accuracy with markedly lower computational cost; and (4) empirical evidence that synthetic event data can serve as effective pre‑training material for real‑world neuromorphic vision tasks, all while keeping data generation GDPR‑compliant. The work opens avenues for large‑scale, privacy‑preserving eye‑movement datasets and demonstrates the practical advantages of spiking architectures in neuromorphic perception applications.

