Wandering around: A bioinspired approach to visual attention through object motion sensitivity


Active vision enables dynamic visual perception, offering an alternative to static feedforward architectures in computer vision, which rely on large datasets and high computational resources. Biological selective attention mechanisms allow agents to focus on salient Regions of Interest (ROIs), reducing computational demand while maintaining real-time responsiveness. Event-based cameras, inspired by the mammalian retina, enhance this capability by capturing asynchronous scene changes, enabling efficient, low-latency processing. To distinguish moving objects while the event-based camera is itself in motion, the agent requires an object motion segmentation mechanism to accurately detect targets and center them in the visual field (fovea). Integrating event-based sensors with neuromorphic algorithms represents a paradigm shift, using Spiking Neural Networks to parallelize computation and adapt to dynamic environments. This work presents a bioinspired attention system, built on a Spiking Convolutional Neural Network, that achieves selective attention through object motion sensitivity. The system generates events via fixational eye movements using a Dynamic Vision Sensor integrated into the Speck neuromorphic hardware, mounted on a Pan-Tilt unit, identifies the ROI, and saccades toward it. The system, characterized using ideal gratings and benchmarked against the Event Camera Motion Segmentation Dataset, reaches a mean IoU of 82.2% and a mean SSIM of 96% in multi-object motion segmentation. Detection of salient objects reaches 88.8% accuracy in office scenarios and 89.8% in low-light conditions on the Event-Assisted Low-Light Video Object Segmentation Dataset. A real-time demonstrator shows the system’s 0.12 s response to dynamic scenes. Its learning-free design ensures robustness across perceptual scenes, making it a reliable foundation for real-time robotic applications and a basis for more complex architectures.


💡 Research Summary

The paper presents a fully bio‑inspired visual attention system that operates in real time on a neuromorphic platform, combining an event‑driven Dynamic Vision Sensor (DVS) with the Speck 2 spiking hardware. The core idea is to emulate the retinal object‑motion‑sensitivity (OMS) found in mammals, which suppresses global motion caused by self‑movement while amplifying relative motion of objects in the scene. This is realized through a spiking Object Motion Sensitivity (sOMS) module that processes asynchronous events with spiking convolutional filters, effectively filtering out egomotion‑induced noise.
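The center-surround idea behind OMS can be illustrated with a simple comparison over a per-pixel event-count map: local activity (center) is tested against the average activity in a wider neighborhood (surround), which approximates the uniform, egomotion-induced background signal. The sketch below is a minimal non-spiking approximation, not the paper's spiking implementation; the kernel size and threshold are illustrative, not the published parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def oms_response(event_counts, surround_size=9, threshold=1.5):
    """Center-surround suppression on an event-count map.

    Pixels whose activity exceeds the surround mean by `threshold` are
    flagged as object motion; uniform (egomotion-like) activity is
    suppressed because center and surround are roughly equal there.
    """
    surround = uniform_filter(event_counts.astype(float),
                              size=surround_size, mode="nearest")
    return event_counts > threshold * surround

# Uniform background activity (camera self-motion) plus one moving object.
counts = np.ones((32, 32))
counts[10:14, 10:14] = 8.0   # locally elevated event rate
mask = oms_response(counts)  # True only over the 4x4 object region
```

In a spiking realization, the same comparison would be carried by excitatory center and inhibitory surround synapses rather than an explicit box filter, but the suppression of globally coherent activity is the same principle.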

The sOMS output feeds a Spiking Neural Network (SNN) Proto‑Object module that implements Gestalt principles (continuity, proximity, figure‑ground) in the firing patterns of spiking neurons, grouping pixels into candidate object regions (proto‑objects). A subsequent Spiking Attention Control (sAC) module selects the most salient proto‑object, generates pan‑tilt commands for a mounted PTU, and performs rapid saccadic movements toward the target. Small fixational eye movements are interleaved to prevent static background events from fading, thereby continuously refreshing the saliency map.
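The final control step amounts to mapping the winning saliency peak to pan-tilt angles. A minimal sketch, assuming a simple linear pixel-to-angle mapping over the sensor's field of view (the function name, field-of-view values, and sign conventions are illustrative; the paper's actual PTU calibration is not reproduced here):

```python
import numpy as np

def saccade_command(saliency, fov_deg=(70.0, 60.0)):
    """Convert the most salient pixel into pan/tilt angles in degrees."""
    h, w = saliency.shape
    y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
    # Offset of the target from the image centre, normalised to [-0.5, 0.5].
    dx = (x - (w - 1) / 2) / w
    dy = (y - (h - 1) / 2) / h
    pan = dx * fov_deg[0]     # positive pan: target right of centre
    tilt = -dy * fov_deg[1]   # positive tilt: target above centre
    return pan, tilt

sal = np.zeros((128, 128))
sal[32, 96] = 1.0             # salient peak right of and above centre
pan, tilt = saccade_command(sal)
```

After the saccade the target sits near the image centre (the fovea), and the interleaved fixational micro-movements keep generating events so the saliency map does not decay.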

The architecture is entirely learning‑free: all parameters are set analytically based on biological studies and prior algorithmic work, allowing the system to operate out‑of‑the‑box in new environments without retraining. Performance is evaluated on two public benchmarks. On the EVIMO motion‑segmentation dataset the system achieves a mean Intersection‑over‑Union of 82.2 % and a Structural Similarity Index of 96 %, surpassing conventional frame‑based and non‑spiking approaches. On the low‑light LLE‑VOS dataset it reaches 88.8 % accuracy in office scenarios and 89.8 % in low‑light conditions, demonstrating robustness to illumination changes. A live demonstration shows the complete pipeline processing a dynamic scene in 0.124 seconds, enabling the robot to “wander” and repeatedly saccade toward the most salient moving object.
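The segmentation scores are standard binary-mask metrics. For reference, mean IoU over a sequence can be sketched as below (a generic definition, not the benchmark's exact evaluation script; the empty-union convention is an assumption):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-Union of two boolean segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:          # both masks empty: count as a perfect match
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def mean_iou(preds, gts):
    """Average per-frame IoU over a sequence of mask pairs."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))

gt = np.zeros((8, 8), bool);   gt[2:6, 2:6] = True     # 16-px object
pred = np.zeros((8, 8), bool); pred[3:6, 3:6] = True   # 9 px, fully inside
# intersection = 9, union = 16, so IoU = 0.5625
```

SSIM, the second reported metric, instead compares local luminance, contrast, and structure between the predicted and ground-truth maps rather than raw pixel overlap.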

Key contributions include: (1) a hardware‑realized spiking OMS model that mimics retinal centre‑surround inhibition, (2) a spiking convolutional neural network (sCNN) that performs object motion segmentation with orders‑of‑magnitude reduction in data bandwidth, (3) integration of a spiking proto‑object detector that provides reliable multi‑object grouping without learning, and (4) a closed‑loop attention system that translates saliency into motor commands on a neuromorphic platform, achieving low power consumption and sub‑150 ms latency.

The work demonstrates that combining event‑based sensing with neuromorphic computation can deliver biologically plausible, energy‑efficient visual attention suitable for autonomous robots, drones, and edge AI devices where computational resources and power are limited. It opens avenues for more complex hierarchical perception‑action loops that retain the advantages of spike‑based processing while remaining adaptable to diverse, real‑world conditions.

