AVM: Towards Structure-Preserving Neural Response Modeling in the Visual Cortex Across Stimuli and Individuals
While deep learning models have shown strong performance in simulating neural responses, they often fail to clearly separate stable visual encoding from condition-specific adaptation, which limits their ability to generalize across stimuli and individuals. We introduce the Adaptive Visual Model (AVM), a structure-preserving framework that enables condition-aware adaptation through modular subnetworks, without modifying the core representation. AVM keeps a Vision Transformer-based encoder frozen to capture consistent visual features, while independently trained modulation paths account for neural response variations driven by stimulus content and subject identity. We evaluate AVM in three experimental settings, including stimulus-level variation, cross-subject generalization, and cross-dataset adaptation, all of which involve structured changes in inputs and individuals. Across two large-scale mouse V1 datasets, AVM outperforms the state-of-the-art V1T model by approximately 2% in predictive correlation, demonstrating robust generalization, interpretable condition-wise modulation, and high architectural efficiency. Specifically, AVM achieves a 9.1% improvement in explained variance (FEVE) under the cross-dataset adaptation setting. These results suggest that AVM provides a unified framework for adaptive neural modeling across biological and experimental conditions, offering a scalable solution under structural constraints. Its design may inform future approaches to cortical modeling in both neuroscience and biologically inspired AI systems.
💡 Research Summary
The paper introduces the Adaptive Visual Model (AVM), a novel framework for predicting neural responses in primary visual cortex (V1) that explicitly separates stable visual encoding from condition‑specific adaptation. AVM builds on a Vision Transformer (ViT) backbone originally used in the V1T model, but unlike V1T the backbone is frozen after an initial pre‑training phase, ensuring that the core visual representation remains invariant across all experimental conditions. To capture variability due to stimulus changes, subject identity, or experimental context, AVM inserts lightweight Condition‑aware Modulation Units (CAMUs) in parallel to the frozen backbone. Each CAMU is a bottleneck feed‑forward module (down‑projection, ReLU, up‑projection) that adds a residual correction to the activations after attention, after the MLP, and after the block output. This dual‑path architecture allows localized, interpretable adjustments without altering the shared encoder.
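The bottleneck structure described above (down-projection, ReLU, up-projection, applied as a residual correction) can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation; the hidden width `d`, bottleneck width `r`, and the zero-initialization of the up-projection (a common adapter trick so the module starts as the identity) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class CAMU:
    """Condition-aware Modulation Unit (sketch): a bottleneck MLP
    (down-project -> ReLU -> up-project) added as a residual correction
    to the frozen backbone's activations."""

    def __init__(self, d, r):
        # d: hidden width of the backbone; r: bottleneck width (r << d)
        self.W_down = rng.normal(0.0, 0.02, (d, r))
        # Zero-init the up-projection so the module is initially the identity
        # (an assumption here, common in adapter-style fine-tuning).
        self.W_up = np.zeros((r, d))

    def __call__(self, h):
        # h: (tokens, d) activations after attention / MLP / block output
        return h + np.maximum(h @ self.W_down, 0.0) @ self.W_up

d, r = 64, 8
camu = CAMU(d, r)
h = rng.normal(size=(10, d))
out = camu(h)
assert out.shape == h.shape
# With W_up zero-initialized, the residual path leaves h unchanged:
assert np.allclose(out, h)
```

Because the correction is purely additive, the frozen backbone's representation is preserved exactly when the CAMU contributes nothing, which is what makes the adaptation "structure-preserving."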
Three architectural variants are explored: (i) AVM – a distinct CAMU for each Transformer block, providing the finest granularity of adaptation; (ii) AVM‑S – a single shared CAMU across all blocks, dramatically reducing trainable parameters; and (iii) AVM‑B – which adds cross‑block modulation pathways to model higher‑level interactions. All variants preserve the biological principle that cortical circuitry maintains a stable structural scaffold while flexibly modulating responses.
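The parameter-efficiency difference between the per-block and shared variants follows directly from this design. A back-of-envelope accounting, with hypothetical dimensions (the block count, hidden width `d`, and bottleneck width `r` below are illustrative, not the paper's configuration):

```python
def camu_params(d, r):
    """Trainable weights in one bottleneck CAMU:
    down-projection (d*r) + up-projection (r*d); biases omitted."""
    return 2 * d * r

n_blocks, d, r = 4, 128, 8      # hypothetical configuration
per_block = n_blocks * camu_params(d, r)  # AVM: one CAMU per block
shared = camu_params(d, r)                # AVM-S: a single CAMU reused by all blocks
assert per_block == n_blocks * shared     # AVM-S is n_blocks times smaller
```

AVM-B sits between these extremes: its cross-block pathways add parameters on top of the shared unit, trading some of AVM-S's compactness for higher-level interactions.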
Training proceeds in two stages. Stage 1 jointly trains the ViT backbone and a Gaussian readout on a large dataset (e.g., the Sensorium dataset) using Poisson loss, learning a universal visual representation. Stage 2 freezes the backbone and fine‑tunes only the CAMU parameters on a new condition (different mouse, different stimulus set, or different dataset). This mirrors the biological scenario where the visual cortex’s anatomical wiring is fixed but contextual factors modulate activity.
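The Poisson objective used in Stage 1 has a simple form: for a predicted firing rate and observed spike counts, the negative log-likelihood (dropping the data-dependent `log(spikes!)` constant) is `rate - spikes * log(rate)`. A minimal NumPy sketch, with the Stage-2 freezing step indicated in comments (function names there are illustrative, not the paper's code):

```python
import numpy as np

def poisson_loss(rate, spikes, eps=1e-8):
    """Poisson negative log-likelihood, up to the constant log(spikes!),
    as commonly used for fitting neural response models."""
    rate = np.maximum(rate, eps)  # keep log() well-defined
    return np.mean(rate - spikes * np.log(rate))

# Stage 2 sketch: the backbone is frozen and only CAMU parameters are
# updated. In an autodiff framework this is typically expressed as:
#   for p in backbone.parameters(): p.requires_grad = False
#   optimizer = Optimizer(camu.parameters())

rng = np.random.default_rng(1)
spikes = rng.poisson(2.0, size=100).astype(float)
good = poisson_loss(np.full(100, spikes.mean()), spikes)  # matched rate
bad = poisson_loss(np.full(100, 10.0), spikes)            # mismatched rate
assert good < bad  # the loss prefers rates that match the observed counts
```

Because only the small CAMU parameter set receives gradients in Stage 2, adapting to a new mouse or dataset is cheap relative to retraining the full ViT backbone.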
The authors evaluate AVM on two extensive mouse V1 datasets: the Sensorium dataset (≈7 000 neurons from five mice, 25 100 natural images) and the Franke dataset (≈1 000 neurons from ten mice, both grayscale and color images). Three experimental challenges are addressed: (a) stimulus‑level variation within the same animal, (b) cross‑subject generalization, and (c) cross‑dataset adaptation. Performance is measured with single‑trial correlation (ρ_trial), trial‑averaged correlation (ρ_avg), and Fraction of Explained Variance (FEVE).
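Of the three metrics, FEVE is the least standard to readers outside the Sensorium literature. One common definition normalizes the model's error by the *explainable* variance, i.e. the stimulus-driven variance after discounting trial-to-trial noise; the paper's exact implementation may differ in detail. A NumPy sketch for a single neuron with repeated stimulus presentations:

```python
import numpy as np

def feve(responses, pred):
    """Fraction of Explainable Variance Explained (one common definition).
    responses: (trials, stimuli) repeated presentations for one neuron
    pred:      (stimuli,) model prediction per stimulus
    FEVE = 1 - (MSE - noise_var) / (total_var - noise_var)"""
    noise_var = responses.var(axis=0).mean()   # trial-to-trial variability
    total_var = responses.var()                # variance over all trials/stimuli
    mse = np.mean((responses - pred[None, :]) ** 2)
    return 1.0 - (mse - noise_var) / (total_var - noise_var)

rng = np.random.default_rng(2)
signal = rng.normal(size=50)                              # per-stimulus mean response
responses = signal[None, :] + 0.1 * rng.normal(size=(20, 50))
assert feve(responses, signal) > 0.9                      # near-perfect prediction
assert feve(responses, np.zeros(50)) < feve(responses, signal)
```

A FEVE of 1 means the model captures all variance that is in principle predictable from the stimulus, which is why it is a stricter yardstick than raw correlation for cross-dataset comparisons.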
Across all settings, AVM consistently outperforms the state-of-the-art V1T model: predictive correlations are roughly 2% higher, and in the cross-dataset adaptation scenario FEVE improves by 9.1% relative to V1T. Parameter efficiency is striking: while V1T requires over 2.46 M trainable weights, AVM-S reduces this to 0.03 M and AVM-B to 0.11 M, yet still delivers superior accuracy. Robustness checks with five random seeds show standard deviations below 0.001 for all key metrics, confirming stability.
The study demonstrates that decoupling representation and modulation yields a model that is both biologically plausible and practically advantageous: it preserves a shared, interpretable visual code while allowing rapid, low‑cost adaptation to new conditions without full retraining. Limitations include the focus on mouse V1 and relatively simple behavioral covariates; future work could extend AVM to other cortical areas, incorporate richer behavioral states, and test on human neuroimaging data. Overall, AVM offers a scalable, interpretable solution for adaptive cortical modeling and may inspire new biologically‑inspired AI architectures.