Cross-subject generalization in EEG-based brain-computer interfaces (BCIs) remains challenging due to individual variability in neural signals. We investigate whether spectral representations offer more stable features for cross-subject transfer than temporal waveforms. Through correlation analyses across three EEG paradigms (SSVEP, P300, and Motor Imagery), we find that spectral features exhibit consistently higher cross-subject similarity than temporal signals. Motivated by this observation, we introduce ASPEN, a hybrid architecture that combines spectral and temporal feature streams via multiplicative fusion, requiring cross-modal agreement for features to propagate. Experiments across six benchmark datasets show that ASPEN dynamically adapts the spectral-temporal balance to each paradigm. ASPEN achieves the best unseen-subject accuracy on three of six datasets and competitive performance on the others, demonstrating that multiplicative multimodal fusion enables effective cross-subject generalization.
Cross-subject generalization remains a fundamental bottleneck in EEG-based brain-computer interfaces (BCIs). Models trained on multi-subject data often degrade substantially when deployed to new users, requiring lengthy subject-specific calibration that undermines the goal of plug-and-play systems (Wan et al., 2021; Liang et al., 2024b). This stems from inherent differences between individuals, such as skull thickness, cortical folding, and electrode placement, that can produce substantial variation in signal amplitude, timing, and spatial distribution (Lu et al., 2024; Roy et al., 2019).
A growing body of work has addressed this limitation through increasingly expressive temporal modeling, progressing from compact CNN-based decoders (Lawhern et al., 2018) to Transformer architectures that capture global dependencies (Song et al., 2022). However, temporal waveforms are highly sensitive to phase shifts, latency jitter, and amplitude scaling across subjects. The hypothesis we investigate here is that spectral representations provide a more stable basis for cross-subject transfer. Frequency-domain features abstract away precise timing information while preserving the oscillatory signatures, such as µ (8-12 Hz) and β (13-30 Hz) rhythms, that serve as primary biomarkers for BCI paradigms (Ang et al., 2008; Mane et al., 2020).
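To illustrate this abstraction, the sketch below extracts µ- and β-band power from a multichannel epoch with a Welch periodogram, discarding phase and latency while retaining oscillatory content. This is not ASPEN's spectral stream; the sampling rate, band edges, and epoch size are placeholder assumptions chosen only for the example.

```python
# Illustrative band-power extraction (not the paper's pipeline): timing details
# are discarded, oscillatory signatures such as mu/beta power are retained.
import numpy as np
from scipy.signal import welch

def band_power(eeg, fs, band):
    """Mean power of `eeg` (channels x samples) within `band` = (low, high) Hz."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)   # per-channel power spectral density
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[:, mask].mean(axis=1)                 # one value per channel

fs = 250                                   # assumed sampling rate (Hz)
eeg = np.random.randn(22, fs * 4)          # placeholder 4-s, 22-channel epoch
mu_power = band_power(eeg, fs, (8, 12))    # mu rhythm
beta_power = band_power(eeg, fs, (13, 30)) # beta rhythm
features = np.concatenate([mu_power, beta_power])
```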
To test this hypothesis, we first conduct a systematic correlation analysis comparing temporal and spectral representations across SSVEP, P300, and Motor Imagery paradigms. Our analysis reveals that spectral features exhibit substantially higher cross-subject similarity than temporal signals, suggesting that frequency-domain representations offer a more robust foundation for generalization. Motivated by this finding, we introduce ASPEN (Adaptive Spectral Encoder Network, Figure 1), a hybrid framework that processes EEG signals through parallel temporal and spectral streams and combines them via multiplicative fusion. Unlike prior approaches that concatenate or average multimodal features (Li et al., 2021; 2025), multiplicative fusion computes element-wise products of projected stream representations, requiring both streams to agree for a feature to propagate. This cross-modal gating naturally suppresses artifacts and noise that appear prominently in only one view, while amplifying genuine neural patterns that manifest consistently across both temporal and spectral domains.
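A minimal sketch of this fusion rule is given below, assuming each stream has already been encoded into a fixed-size vector. The module names, dimensions, and tanh squashing are illustrative choices for the example and do not reproduce ASPEN's actual layers.

```python
# Minimal PyTorch sketch of multiplicative cross-modal fusion; names and sizes
# are illustrative, not ASPEN's implementation.
import torch
import torch.nn as nn

class MultiplicativeFusion(nn.Module):
    def __init__(self, temporal_dim, spectral_dim, fused_dim):
        super().__init__()
        self.proj_t = nn.Linear(temporal_dim, fused_dim)  # project temporal stream
        self.proj_s = nn.Linear(spectral_dim, fused_dim)  # project spectral stream

    def forward(self, h_temporal, h_spectral):
        z_t = torch.tanh(self.proj_t(h_temporal))
        z_s = torch.tanh(self.proj_s(h_spectral))
        # Element-wise product: a feature propagates only if both streams activate it,
        # gating out artifacts that appear prominently in a single view.
        return z_t * z_s

fusion = MultiplicativeFusion(temporal_dim=128, spectral_dim=64, fused_dim=96)
fused = fusion(torch.randn(8, 128), torch.randn(8, 64))  # batch of 8 epochs
```

In contrast to concatenation or averaging, where a strong activation in one stream can dominate the fused representation on its own, the product acts as a soft logical AND over the two views.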
We evaluate ASPEN across six benchmark datasets spanning three paradigms. Our experiments reveal that the optimal spectral-temporal balance varies by task: P300 decoding benefits strongly from spectral emphasis, while Motor Imagery requires a greater temporal contribution. ASPEN achieves the best unseen-subject accuracy on three datasets (Lee2019 SSVEP, BNCI2014 P300, and Lee2019 MI), outperforming both specialized temporal models and recent multimodal transformers. These results demonstrate that ASPEN generalizes to unseen subjects across different BCI tasks while remaining robust to diverse neural signatures.
Temporal modeling: Deep learning for EEG signals has evolved from high-capacity architectures like DeepConvNet (Schirrmeister et al., 2017) toward compact, neurophysiologically informed models. EEGNet (Lawhern et al., 2018) introduced depthwise and separable convolutions that mirror traditional spatial filtering, achieving strong performance with minimal parameters. Transformer-based models such as EEG Conformer (Song et al., 2022) and hybrid CNN-Transformer architectures like CTNet (Zhao et al., 2024) capture long-range temporal dependencies. Temporal convolutional networks (TCNs) offer improved sequential modeling with training stability advantages over recurrent approaches (Ingolfsson et al., 2020; Musallam et al., 2021).
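For concreteness, the sketch below shows the depthwise/separable convolution pattern that EEGNet popularized: a temporal convolution, a depthwise spatial filter across electrodes, then a separable temporal convolution. Kernel sizes and channel counts here are illustrative and do not match the published configuration.

```python
# Sketch of the EEGNet-style depthwise/separable convolution pattern;
# hyperparameters are illustrative, not the published EEGNet settings.
import torch
import torch.nn as nn

n_channels, n_samples, F1, D = 22, 500, 8, 2

block = nn.Sequential(
    nn.Conv2d(1, F1, (1, 64), padding=(0, 32), bias=False),         # temporal convolution
    nn.BatchNorm2d(F1),
    nn.Conv2d(F1, F1 * D, (n_channels, 1), groups=F1, bias=False),  # depthwise spatial filter
    nn.BatchNorm2d(F1 * D),
    nn.ELU(),
    nn.AvgPool2d((1, 4)),
    nn.Conv2d(F1 * D, F1 * D, (1, 16), padding=(0, 8),
              groups=F1 * D, bias=False),                           # separable conv: depthwise part
    nn.Conv2d(F1 * D, F1 * D, 1, bias=False),                       # separable conv: pointwise part
    nn.ELU(),
)

out = block(torch.randn(4, 1, n_channels, n_samples))  # input: (batch, 1, channels, samples)
```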
Spectral and filter-bank approaches: Filter-bank methods decompose EEG into frequency sub-bands before learning spatial filters. The foundational FBCSP algorithm (Ang et al., 2008) demonstrated that isolating discriminative frequency bands improves motor imagery classification. Deep learning extensions apply this principle with learnable filters (Mane et al., 2020; Liu et al., 2022), while IFNet (Wang et al., 2023) models cross-frequency interactions. Time-frequency representations via wavelets have also shown promise for capturing non-stationary dynamics (Morales & Bowers, 2022).
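The following sketch shows a filter-bank front end in the spirit of FBCSP: each epoch is band-pass filtered into sub-bands before any spatial filtering or learning. The band edges and filter order are illustrative assumptions rather than those of a specific published pipeline.

```python
# FBCSP-style filter-bank decomposition sketch; band edges and filter order
# are illustrative choices, not a specific published configuration.
import numpy as np
from scipy.signal import butter, filtfilt

def filter_bank(eeg, fs, bands):
    """Return a (n_bands, channels, samples) array of band-pass filtered copies."""
    out = []
    for low, high in bands:
        b, a = butter(4, [low, high], btype="bandpass", fs=fs)
        out.append(filtfilt(b, a, eeg, axis=-1))
    return np.stack(out)

fs = 250
eeg = np.random.randn(22, fs * 4)                                  # placeholder epoch
bands = [(4, 8), (8, 12), (12, 16), (16, 20), (20, 24), (24, 28), (28, 32)]
sub_band_epochs = filter_bank(eeg, fs, bands)                      # fed to CSP or learnable filters
```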
Temporal-spectral fusion: Recent work has begun combining temporal and spectral features. Li et al. (2021) proposed a temporal-spectral squeeze-and-excitation network for motor imagery. TSformer-SA (Li et al., 2025) integrates temporal signals with wavelet spectrograms through cross-view attention for RSVP decoding. Dual-branch architectures have also been explored for emotion recognition (Luo et al., 2023). However, these approaches typically employ additive fusion strategies, such as concatenation, averaging, or learned weighted sums, that allow each stream to contribute independently. Our multiplicative approach instead requires agreement between streams for a feature to propagate, suppressing patterns that appear in only one view.