AceleradorSNN: A Neuromorphic Cognitive System Integrating Spiking Neural Networks and Dynamic Image Signal Processing on FPGA

Daniel Gutierrez, Ruben Martinez, Leyre Arnedo, Antonio Cuesta, Soukaina El Hamry
Intigia R&D Department, Alicante, Spain

Abstract—The demand for high-speed, low-latency, and energy-efficient object detection in autonomous systems—such as advanced driver-assistance systems (ADAS), unmanned aerial vehicles (UAVs), and Industry 4.0 robotics—has exposed the limitations of traditional Convolutional Neural Networks (CNNs). To address these challenges, Intigia has developed AceleradorSNN, a third-generation artificial intelligence cognitive system. The architecture integrates a Neuromorphic Processing Unit (NPU) based on Spiking Neural Networks (SNNs) to process asynchronous data from Dynamic Vision Sensors (DVS), alongside a dynamically reconfigurable Cognitive Image Signal Processor (ISP) for RGB cameras. This paper details the hardware-oriented design of both IP cores, the evaluation of surrogate-gradient-trained SNN backbones, and the real-time streaming ISP architecture implemented on Field-Programmable Gate Arrays (FPGAs).

Index Terms—Spiking Neural Networks, FPGA, Dynamic Vision Sensor, Image Signal Processor, Neuromorphic Computing, Artificial Intelligence.

I. INTRODUCTION

The evolution of autonomous navigation and industrial inspection requires vision systems capable of operating in real time under highly variable lighting conditions while maintaining strict power constraints. Conventional Deep Neural Networks (DNNs) and CNNs process continuous frame data, leading to high computational overhead and significant energy consumption. Furthermore, standard RGB sensors struggle with high-speed motion and wide dynamic range.

To overcome these limitations, event-based cameras (Dynamic Vision Sensors, DVS) have emerged as a powerful alternative.
DVS pixels respond asynchronously to changes in illumination, offering microsecond latency and high dynamic range. However, event data lacks the static, high-resolution texture and color information critical for certain classification tasks.

The AceleradorSNN project introduces a synergistic architecture that merges the advantages of DVS and RGB sensors. By employing a Neuromorphic Processing Unit (NPU) with Spiking Neural Networks (SNNs) to process DVS events, the system achieves ultra-fast object detection. Crucially, this NPU acts as a cognitive controller, generating real-time adjustment instructions for a Cognitive Image Signal Processor (ISP), which dynamically reconfigures the RGB camera parameters to adapt to environmental changes. Designed for FPGA and ASIC implementation, the system achieves a high Technology Readiness Level (TRL 6/7) tailored for critical embedded applications.

II. RELATED WORK

The integration of event-based vision and neuromorphic computing has been extensively explored in recent literature. Gallego et al. highlighted the advantages of event cameras for high-speed tracking, while noting the challenges of their sparse, asynchronous nature [1]. To process this data, SNNs offer a biologically inspired approach, communicating via discrete spikes rather than continuous values and thereby significantly reducing energy consumption.

Cordone et al. demonstrated the viability of adapting traditional CNN architectures (such as VGG and DenseNet) to the spiking domain for automotive event data [2]. Fan et al. advanced this further with the Spiking Fusion Object Detector (SFOD), showing that SNNs can achieve robust multiscale object detection [3].

In parallel, hardware-efficient ISP designs are critical for embedded vision.
Algorithms like Malvar-He-Cutler linear image demosaicing [5] and Non-Local Means (NLM) denoising [7], specifically adapted for FPGA architectures by Koizumi and Maruyama [6], provide the foundation for low-latency spatial image processing without relying on external memory buffers. Additionally, dynamic defective pixel correction techniques, such as those proposed by Yongji and Xiaojun [8], are essential for maintaining sensor fidelity in harsh environments.

III. SYSTEM ARCHITECTURE OVERVIEW

The AceleradorSNN framework is divided into two primary Intellectual Property (IP) cores, designed to operate synchronously on an FPGA platform:

1) Neuromorphic Processing Unit (NPU): The central AI component. It implements an SNN to process DVS events in real time, detecting dynamic objects and generating parameter-adjustment instructions based on the scene's lighting and motion profile.

2) Cognitive ISP: A fully pipelined hardware module that receives the control instructions from the NPU and dynamically adjusts its internal algorithms (e.g., exposure, white balance) to optimize the high-resolution output of a standard RGB sensor.

By combining these modules, the system maximizes perception precision, versatility, and adaptability to changing conditions, all while minimizing energy consumption.

IV. NEUROMORPHIC PROCESSING UNIT (NPU) DESIGN

The NPU is engineered to extract spatial and temporal features from DVS data using spiking neurons.

A. Event Encoding

Raw DVS events are represented as tuples containing the timestamp, spatial coordinates, and polarity: e = (t, x, y, p). Because SNNs operate in discrete time steps, the continuous asynchronous stream is segmented into fixed temporal windows [4]. Within each window, events are aggregated into temporal bins and encoded using a one-hot spatio-temporal voxel grid.
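As an illustration of this window-and-bin encoding, the following Python sketch accumulates (t, x, y, p) tuples into a one-hot voxel grid. The bin count, tensor layout, and timestamp normalization here are assumptions chosen for clarity, not the exact AceleradorSNN format:

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate (t, x, y, p) events into a (num_bins, 2, H, W) tensor.

    Illustrative sketch: layout and normalization are assumptions,
    not the NPU's actual hardware encoding.
    """
    grid = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    t = events[:, 0].astype(np.float64)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3].astype(int)  # polarity: 0 (OFF) or 1 (ON)

    # Normalize timestamps into [0, 1] and assign each event to a temporal bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    bins = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)

    # One-hot accumulation: mark each (bin, polarity, y, x) cell as active.
    grid[bins, p, y, x] = 1.0
    return grid

# Example: four synthetic events over a 4x4 sensor, two temporal bins.
ev = np.array([[0.0, 0, 0, 1],
               [0.3, 1, 2, 0],
               [0.6, 3, 3, 1],
               [0.9, 2, 1, 0]])
vox = events_to_voxel_grid(ev, num_bins=2, height=4, width=4)
print(vox.shape)  # (2, 2, 4, 4)
```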
This transformation yields a multidimensional tensor spanning time steps, polarity channels, and spatial dimensions, which serves as the input to the spiking convolutional layers.

B. Neuron Model and Training

The architecture uses the Leaky Integrate-and-Fire (LIF) neuron model, balancing biological realism with computational efficiency. The LIF neuron simulates the natural loss of charge via a decay term. Its membrane potential u(t) is governed by the differential equation:

\tau_m \frac{du(t)}{dt} = u_{\mathrm{rest}} - u(t) + R\,I(t) \qquad (1)

where \tau_m is the membrane time constant, R is the membrane resistance, and I(t) is the input current. A spike is emitted when the membrane potential reaches a defined threshold, after which the potential is reset. Because the discrete firing event is non-differentiable, training is conducted using surrogate gradients, which approximate the derivative of the spike function. This allows the use of Backpropagation Through Time (BPTT) and standard optimizers such as AdamW to update network weights effectively.

C. Backbone Evaluation

To optimize the NPU, four adapted spiking architectures were evaluated on the Prophesee GEN1 Automotive Detection Dataset:

• Spiking-VGG: A deep, uniform sequence of convolutional layers suited to hierarchical feature extraction.
• Spiking-DenseNet: Features dense blocks in which the output of each layer feeds into all subsequent layers, preventing vanishing gradients and promoting feature reuse.
• Spiking-MobileNet: Uses depthwise-separable convolutions to drastically reduce parameter count and computational cost.
• Spiking-YOLO: Converts the standard YOLO architecture into an SNN, using temporal spikes rather than continuous values to achieve high performance on neuromorphic hardware.
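In practice, Eq. (1) is discretized (e.g., with a forward-Euler step) both for hardware implementation and for BPTT training. The sketch below shows one such discrete LIF update with a hard threshold and reset-to-rest; the parameter values are illustrative assumptions, and the surrogate derivative used in the backward pass is omitted since only the forward dynamics are shown:

```python
import numpy as np

def lif_step(u, i_in, tau_m=10.0, r=1.0, u_rest=0.0, theta=1.0, dt=1.0):
    """One Euler step of the LIF dynamics in Eq. (1), plus threshold/reset.

    Parameter values are illustrative placeholders, not the NPU's
    actual fixed-point configuration.
    """
    u = u + (dt / tau_m) * (u_rest - u + r * i_in)   # leaky integration
    spikes = (u >= theta).astype(np.float32)         # non-differentiable firing
    u = np.where(spikes > 0, u_rest, u)              # hard reset after a spike
    return u, spikes

# Drive one neuron with a constant input current and collect its spike train.
u = np.zeros(1, dtype=np.float32)
train = []
for _ in range(50):
    u, s = lif_step(u, i_in=2.0)
    train.append(int(s[0]))
print(sum(train))  # 7 regularly spaced spikes over 50 steps
```

During training, the step function `(u >= theta)` would be replaced in the backward pass by a smooth surrogate (e.g., a fast-sigmoid derivative) so that BPTT can propagate gradients through it.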
Experimental results with quantized models indicated that Spiking-YOLO achieved the highest overall precision (Average Precision of 0.4726 at IoU 0.50), presenting the best balance between accuracy and computational cost. Conversely, Spiking-MobileNet exhibited the highest network sparsity (48.08%), meaning a large proportion of neurons remained inactive, which is highly desirable for maximizing energy efficiency in low-power edge devices.

V. COGNITIVE IMAGE SIGNAL PROCESSOR (ISP) DESIGN

The Cognitive ISP is designed entirely in HDL as a modular, parameterizable pipeline. It operates on a continuous data stream, processing pixels individually as they traverse the pipeline without storing full image frames, thereby drastically reducing hardware area.

A. AXI4-Stream Integration

Data transfer between ISP modules relies on the AXI4-Stream protocol, a point-to-point standard well suited to embedded systems. Handshaking is governed by tvalid (the master indicating valid data) and tready (the slave indicating readiness), ensuring seamless data flow and pipeline stalling when necessary.

B. Pipeline Stages and Algorithms

The ISP implements a sequence of real-time correction and enhancement stages:

1) Dynamic Defective Pixel Correction (DPC): Based on the algorithm by Yongji and Xiaojun [8], this module identifies dead pixels by analyzing a 5 × 5 spatial window. Line buffers cache incoming rows. A pixel is marked defective if its intensity deviates significantly from its neighbors across multiple directional gradients.

2) Auto White Balance (AWB) and White Balance (WB): A state machine analyzes the image to calculate RGB gains, discarding overexposed and underexposed pixels before applying the corrective gains dynamically based on NPU feedback.

3) Demosaicing: Converts Bayer-pattern data to full RGB.
The hardware implementation uses the Malvar-He-Cutler linear interpolation method [5], which estimates missing color channels from surrounding gradients to preserve sharp edges.

4) Non-Local Means (NLM) Denoising: To remove Gaussian noise without blurring edges, the system implements an FPGA-adapted NLM algorithm [6]. Unlike local filters, NLM exploits structural redundancy, computing the Euclidean distance between a reference patch and neighboring patches within a search window.

5) Gamma Correction and Color Space Conversion: Custom look-up tables (LUTs) apply non-linear gamma curves, followed by a configurable fixed-point arithmetic module that converts the RGB signal to the YCbCr color space for independent luminance sharpening.

VI. SYSTEM INTEGRATION AND HARDWARE DEPLOYMENT

The complete AceleradorSNN system is unified through a top-level integration module. The architecture allows the NPU and ISP to operate in a closed cognitive loop. The NPU interfaces directly with the DVS sensor logic and processes event tensors through the spiking neural network. Once the NPU detects an object and identifies localized lighting anomalies, it transmits configuration parameters to the ISP via a control interface.

The ISP's synchronization controller aligns the DVS and RGB data streams. It interprets the NPU's parameter updates, such as modified AWB gains, tweaked gamma LUTs, or adjusted NLM denoising strength, and applies these changes on the fly to the RGB feed. This enables the extraction of high-resolution, context-rich images of detected objects, overcoming the traditional trade-offs between speed, dynamic range, and image fidelity. The entire framework has been synthesized and validated on FPGA, with tooling that maps the design for future ASIC fabrication.

VII.
CONCLUSION

Intigia's AceleradorSNN establishes a new paradigm in embedded artificial intelligence for aerospace, automotive, and industrial applications. By successfully marrying the ultra-low latency and energy efficiency of event-driven Spiking Neural Networks with the high-resolution output of a dynamically controlled Cognitive ISP, the system transcends the limitations of conventional CNNs. The rigorous FPGA implementation of local streaming algorithms and surrogate-gradient-trained LIF networks demonstrates a viable, high-TRL pathway toward fully autonomous, highly adaptable, and environmentally sustainable edge AI perception.

ACKNOWLEDGMENT

This work has been supported by the program "Proyectos de desarrollo experimental en el área de la inteligencia artificial y para el impulso al desarrollo de espacios de datos sectoriales", within the framework of the Recovery, Transformation and Resilience Plan, funded by the European Union – NextGenerationEU, under project reference number INREIA/2024/62. The grant was awarded to Intigia S.L. to support the experimental development of artificial intelligence.

REFERENCES

[1] G. Gallego et al., "Event-based vision: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 154-180, 2020.
[2] L. Cordone, B. Miramond, and P. Thierion, "Object detection with spiking neural networks on automotive event data," arXiv preprint arXiv:2205.04339, 2022.
[3] Y. Fan et al., "SFOD: Spiking fusion object detector," arXiv preprint arXiv:2403.15192, 2024.
[4] P. de Tournemire et al., "A large scale event-based detection dataset," arXiv preprint arXiv:2001.08499, 2020.
[5] P. Getreuer, "Malvar-He-Cutler linear image demosaicking," Image Processing On Line, vol. 1, pp. 16, 2011.
[6] H. Koizumi and T.
Maruyama, "An implementation of Non-Local Means algorithm on FPGA," Advances in Parallel Computing, vol. 36, pp. 681-690, 2020.
[7] L. Feng and J. Wang, "Research on image denoising algorithm based on improved wavelet threshold and non-local mean filtering," in IEEE 6th International Conference on Signal and Image Processing, 2021.
[8] L. Yongji and Y. Xiaojun, "A design of dynamic defective pixel correction for image sensor," in ICAIIS, 2020.
[9] J. E. Pedersen et al., "Neuromorphic intermediate representation: A unified instruction set for interoperable brain-inspired computing," Nature Communications, vol. 15, no. 1, p. 8122, 2024.