Cross-Modal Knowledge Distillation Linking Field Potentials and Spikes: Enhancing Multi-Session LFP Transformer Models
📝 Abstract
Local field potentials (LFPs) can be routinely recorded alongside spiking activity in intracortical neural experiments, measure a larger complementary spatiotemporal scale of brain activity for scientific inquiry, and can offer practical advantages over spikes, including greater long-term stability, robustness to electrode degradation, and lower power requirements. Despite these advantages, recent neural modeling frameworks have largely focused on spiking activity since LFP signals pose inherent modeling challenges due to their aggregate, population-level nature, often leading to lower predictive power for downstream task variables such as motor behavior. To address this challenge, we introduce a cross-modal knowledge distillation framework that transfers high-fidelity representational knowledge from pretrained multi-session spike transformer models to LFP transformer models. Specifically, we first train a teacher spike model across multiple recording sessions using a masked autoencoding objective with a session-specific neural tokenization strategy. We then align the latent representations of the student LFP model to those of the teacher spike model. Our results show that the Distilled LFP models consistently outperform single- and multi-session LFP baselines in both fully unsupervised and supervised settings, and can generalize to other sessions without additional distillation while maintaining superior performance. These findings demonstrate that cross-modal knowledge distillation is a powerful and scalable approach for leveraging high-performing spike models to develop more accurate LFP models.
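The first stage of the framework described above, pretraining the spike teacher with a masked autoencoding objective, can be sketched minimally. Everything below is illustrative: the Poisson stand-in data, array shapes, mask ratio, and the mean-based "decoder" are hypothetical choices for demonstration, not the paper's transformer architecture or tokenization scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical spike-count tensor: (time_bins, channels). Poisson counts
# are a common stand-in for binned spiking activity.
spikes = rng.poisson(lam=2.0, size=(100, 64)).astype(float)

# Masked autoencoding: hide a random subset of time-bin tokens and score
# reconstruction only on the hidden positions.
mask_ratio = 0.5
mask = rng.random(100) < mask_ratio

visible = spikes.copy()
visible[mask] = 0.0  # masked tokens are zeroed before entering the encoder

# Stand-in "reconstruction": a real transformer would predict the masked
# bins; here the per-channel mean of the visible tokens plays the decoder.
recon = np.tile(visible[~mask].mean(axis=0), (100, 1))

# The loss is evaluated on masked positions only, as in masked autoencoding.
mae_loss = float(np.mean((recon[mask] - spikes[mask]) ** 2))
print(f"masked-reconstruction MSE: {mae_loss:.3f}")
```

The key property the sketch preserves is that the objective is computed only where tokens were hidden, which forces the encoder to infer missing activity from context rather than copy its input.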
📄 Content
Cross-Modal Representational Knowledge Distillation for Enhanced Spike-Informed LFP Modeling

Eray Erturk1, Saba Hashemi2, Maryam M. Shanechi1,2,3,4∗
1Ming Hsieh Department of Electrical and Computer Engineering, 2Thomas Lord Department of Computer Science, 3Alfred E. Mann Department of Biomedical Engineering, 4Neuroscience Graduate Program, University of Southern California, Los Angeles, CA
{eerturk, saba.hashemi, shanechi}@usc.edu
∗Corresponding author: shanechi@usc.edu
39th Conference on Neural Information Processing Systems (NeurIPS 2025). arXiv:2512.12461v1 [cs.LG] 13 Dec 2025

1 Introduction

Recent advances in neural recording technologies have enabled the collection of large-scale neural datasets across multiple subjects and recording sessions and have led to many advanced models of neural activity trained on single recording sessions [1–12]. Moreover, access to such large-scale datasets has motivated training multi-subject and multi-session models of neural activity that can generalize across experimental conditions and tasks [13–22]. To date, the development of multi-session models has focused on spiking activity, given the widespread availability of spike datasets compared to other neural modalities such as field potentials. Indeed, local field potentials (LFPs) remain underutilized in recent modeling efforts, despite being routinely recorded alongside spikes. Yet, LFP signals can offer a complementary modality for neuroscience investigations by measuring the brain at larger spatiotemporal scales compared with neuronal spikes [23, 24]. Furthermore, LFPs are potentially advantageous for brain-computer interfaces (BCIs) [25–27] for several reasons. First, they often exhibit greater long-term stability and robustness: because they reflect larger-scale population activity across many neurons, they are less sensitive to small shifts in electrode positions or to neuronal loss [23, 24, 28–30]. Second, they are more consistently available than spikes, particularly in chronic long-term settings where spike recordings often degrade or become unavailable over time [28, 31–34]. Third, they have lower power requirements than spike signals, making them more suitable for real-time applications such as BCIs [29, 35].
Despite these advantages, models trained on LFP signals tend to underperform spike-based models in decoding tasks, especially under unsupervised and self-supervised training regimes [9, 24, 34, 36, 37], which are critical for modeling unlabeled neural data and for building neural representations that generalize across tasks. This gap arises primarily from inherent challenges in modeling LFP signals. Specifically, LFP signals reflect population-level aggregate activity from complex neural circuits, making it difficult to isolate individual sources of task-relevant neural variability within the aggregate signal [38, 39] and leading to redundant, highly correlated spatial patterns [40]. In addition, high noise correlations in the low-frequency bands of LFP [41] can further obscure task-relevant information. To overcome these limitations and enable training of more accurate LFP models, we propose a cross-modal representational knowledge distillation framework that leverages spike-based transformer models pretrained on an abundant amount of public spike datasets a
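The second stage of the proposed framework, aligning the student LFP model's latents to those of a frozen spike teacher, amounts to minimizing an alignment loss between the two representations. The sketch below uses a hypothetical linear student encoder, random stand-in data, and a plain MSE alignment objective; the paper's student is a transformer and its exact loss may differ, so this is only a minimal illustration of the idea.

```python
import numpy as np

rng = np.random.default_rng(1)

T, D_LATENT, D_LFP = 100, 32, 16  # illustrative sizes, not from the paper

# Frozen latents from the pretrained multi-session spike teacher (random
# stand-ins here; a real pipeline would run the spike encoder).
teacher_latents = rng.normal(size=(T, D_LATENT))

# LFP features for the same time tokens, and a hypothetical linear student
# encoder: z_student = lfp @ W.
lfp = rng.normal(size=(T, D_LFP))
W_init = rng.normal(scale=0.1, size=(D_LFP, D_LATENT))

def distill_loss(W):
    """MSE between student latents (lfp @ W) and the frozen teacher latents."""
    return float(np.mean((lfp @ W - teacher_latents) ** 2))

before = distill_loss(W_init)

# For a linear student the alignment objective has a closed-form optimum,
# so a single least-squares solve stands in for gradient-based distillation.
W_star, *_ = np.linalg.lstsq(lfp, teacher_latents, rcond=None)
after = distill_loss(W_star)

print(after < before)  # alignment to the teacher reduces the loss
```

The teacher's parameters never change during this stage; only the student is updated, which is what lets representational knowledge flow from the spike modality to the LFP modality.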