A neuromorphic model of the insect visual system for natural image processing
Insect vision supports complex behaviors including associative learning, navigation, and object detection, and has long motivated computational models for understanding biological visual processing. However, many contemporary models prioritize task performance while neglecting biologically grounded processing pathways. Here, we introduce a bio-inspired vision model that captures principles of the insect visual system to transform dense visual input into sparse, discriminative codes. The model is trained using a fully self-supervised contrastive objective, enabling representation learning without labeled data and supporting reuse across tasks without reliance on domain-specific classifiers. We evaluated the resulting representations on flower recognition tasks and natural image benchmarks. The model consistently produced reliable sparse codes that distinguish visually similar inputs. To support different modelling and deployment uses, we have implemented the model as both an artificial neural network and a spiking neural network. In a simulated localization setting, our approach outperformed a simple image-downsampling baseline, highlighting the functional benefit of incorporating neuromorphic visual processing pathways. Collectively, these results advance insect computational modelling by providing a generalizable bio-inspired vision model capable of sparse computation across diverse tasks.
💡 Research Summary
The paper presents a biologically inspired vision model that emulates the hierarchical processing pipeline of the insect visual system and translates dense visual input into a highly sparse, discriminative code. The authors construct a series of processing layers—retina, lamina, medulla, lobula, visual projection neurons (VPN), and finally a Kenyon cell (KC) layer—each implemented as a convolutional neural network (CNN) block followed by a custom activation‑normalization (BAN) module. The BAN combines a LeakyReLU activation with local response normalization (to mimic lateral inhibition) and group normalization (to emulate global homeostatic regulation).
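The BAN composition described above can be sketched in plain Python on a toy 1-D channel vector. This is an illustrative sketch, not the authors' implementation: the window size, `k`, `alpha`, `beta`, and group count are assumed constants, and the real module operates on convolutional feature maps rather than flat vectors.

```python
import math

def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

def local_response_norm(a, size=3, k=1.0, alpha=1e-2, beta=0.75):
    # lateral inhibition: each channel is damped by the squared
    # activity of neighbouring channels in a window of `size`
    out = []
    for c in range(len(a)):
        lo, hi = max(0, c - size // 2), min(len(a), c + size // 2 + 1)
        denom = (k + alpha * sum(v * v for v in a[lo:hi])) ** beta
        out.append(a[c] / denom)
    return out

def group_norm(a, groups=2, eps=1e-5):
    # global homeostatic regulation: zero mean, unit variance
    # within each group of channels
    out, g = [], len(a) // groups
    for i in range(groups):
        chunk = a[i * g:(i + 1) * g]
        mu = sum(chunk) / len(chunk)
        var = sum((v - mu) ** 2 for v in chunk) / len(chunk)
        out.extend((v - mu) / math.sqrt(var + eps) for v in chunk)
    return out

def ban(a):
    # BAN: LeakyReLU -> local response norm -> group norm
    return group_norm(local_response_norm([leaky_relu(v) for v in a]))
```

Each stage preserves the channel count, so BAN can be dropped in after any CNN block.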
In the retina stage the image is filtered by a CNN; its output is concatenated with a sign‑inverted copy to reproduce the parallel positive‑negative contrast pathways found in compound eyes. The lamina processes this concatenated tensor, enhancing spatial contrast without hand‑crafted filters. The medulla splits the signal into chromatic (blue‑green) and achromatic streams, each processed in parallel to extract color‑specific and intensity‑specific features. The lobula further refines these features before they are fed into three distinct VPN pathways—anterior superior optic tract (ASOT), anterior inferior optic tract (AIOT), and lateral optic tract (LOT)—capturing diverse spatial information analogous to insect optic tracts.
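The retina-stage channel doubling is simple enough to state directly. A minimal sketch of the sign-inverted concatenation on a flat response vector (the real model concatenates along the channel axis of a feature map):

```python
def opponent_channels(x):
    # concatenate the retina response with its sign-inverted copy,
    # mimicking parallel positive/negative contrast pathways;
    # the downstream LeakyReLU then rectifies each stream
    return x + [-v for v in x]
```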
Because the CNN backbone produces 128 feature channels, an average‑pooling operation reduces them to a single channel before the sparsification stage. Sparsity is achieved through two mechanisms. First, a fixed binary connectivity mask M imposes a sparse wiring pattern on a linear transformation with learnable weights W and bias b. Second, an adaptive k‑Winner‑Take‑All (a‑kWTA) algorithm enforces a target activity level (≈5 % active neurons). The a‑kWTA tracks each neuron’s activation frequency µ using an exponential moving average, adjusts an individual threshold θ based on the deviation from a desired sparsity ρ, and finally selects the top‑k neurons. This process mirrors the competition among Kenyon cells in the insect mushroom body and yields a low‑dimensional (1024‑dim) KC code that is both sparse and highly discriminative.
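The two sparsification mechanisms can be sketched as follows. The masked projection and one a-kWTA update step are shown in pure Python; the EMA rate, threshold learning rate, and the exact form of the threshold update are illustrative assumptions, and only the target sparsity ρ ≈ 5 % is taken from the paper.

```python
def masked_projection(x, W, M, b):
    # sparse wiring: a fixed binary mask M gates the learnable
    # weights W of the linear transform W(x) + b
    return [sum(W[i][j] * M[i][j] * x[j] for j in range(len(x))) + b[i]
            for i in range(len(W))]

def adaptive_kwta(a, mu, theta, rho=0.05, ema=0.99, lr=0.01):
    """One a-kWTA step (a sketch; `ema` and `lr` are illustrative)."""
    n = len(a)
    # 1. which neurons exceed their individual threshold
    active = [1.0 if a[i] > theta[i] else 0.0 for i in range(n)]
    # 2. exponential moving average of each neuron's firing frequency
    mu = [ema * mu[i] + (1 - ema) * active[i] for i in range(n)]
    # 3. raise thresholds of neurons firing above the target rate rho,
    #    lower those firing below it
    theta = [theta[i] + lr * (mu[i] - rho) for i in range(n)]
    # 4. keep only the top-k neurons (k = rho * n): the binary KC code
    k = max(1, int(rho * n))
    topk = sorted(range(n), key=lambda i: a[i] - theta[i], reverse=True)[:k]
    code = [1.0 if i in topk else 0.0 for i in range(n)]
    return code, mu, theta
```

Over many steps the per-neuron thresholds equalize activation frequencies, so no Kenyon-cell unit dominates the code.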
Training is performed with a fully self‑supervised contrastive objective: pairs of augmented views of the same image are pulled together in representation space while views of different images are pushed apart. No labeled data are required, allowing the model to learn generic visual features from large, unlabeled datasets.
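The pull-together/push-apart objective is the standard InfoNCE family of losses; a single-anchor sketch (the temperature value and the exact batch construction are assumptions, not taken from the paper):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(z1, z2, negatives, tau=0.1):
    # InfoNCE-style loss for one anchor z1: pull the other augmented
    # view z2 toward it, push representations of other images away
    pos = math.exp(cosine(z1, z2) / tau)
    neg = sum(math.exp(cosine(z1, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

The loss is near zero when the two views align and the negatives are dissimilar, and grows as a negative becomes more similar to the anchor than its own positive view.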
Two implementations are provided. The artificial neural network (ANN) version runs on conventional GPUs and serves as a rapid prototyping platform. The spiking neural network (SNN) version replaces the ReLU‑based activations with leaky integrate‑and‑fire (LIF) neurons and uses spike‑based communication, making the architecture compatible with low‑power neuromorphic hardware.
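A discrete-time LIF update of the kind used in the SNN variant can be sketched in a few lines; the decay constant, threshold, and reset value here are illustrative, not the paper's parameters.

```python
def lif_step(v, i_in, v_th=1.0, tau=0.9, v_reset=0.0):
    # leaky integrate-and-fire: decay the membrane potential,
    # integrate the input current, spike and reset at threshold
    v = tau * v + i_in
    if v >= v_th:
        return v_reset, 1
    return v, 0

def run_lif(inputs, v0=0.0):
    v, spikes = v0, []
    for i_in in inputs:
        v, s = lif_step(v, i_in)
        spikes.append(s)
    return spikes
```

Because information travels as binary spike events rather than dense floating-point activations, this formulation maps naturally onto event-driven neuromorphic hardware.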
The authors evaluate the learned representations on flower‑recognition tasks, standard natural‑image benchmarks, and a simulated localization scenario. In the localization setting the model outperforms a simple down‑sampling baseline, and across the recognition benchmarks the sparse KC code retains sufficient information for downstream tasks while being far more compact than dense CNN features. Visualizations show that similar images produce overlapping sparse patterns while dissimilar images do not, confirming the discriminative power of the code.
In summary, the work bridges insect neurobiology and modern machine learning by (1) faithfully reproducing the sequential processing stages of the insect optic lobes, (2) incorporating biologically plausible normalization and competition mechanisms, (3) employing contrastive self‑supervision to obtain task‑agnostic features, and (4) delivering both ANN and SNN realizations for flexible deployment. The model advances neuromorphic vision by showing that insect‑inspired sparse coding can be harnessed for general‑purpose image understanding, opening avenues for energy‑efficient vision systems on neuromorphic chips and for integrating additional sensory modalities in future bio‑inspired architectures.