EEG Foundation Models: Progresses, Benchmarking, and Open Problems


Electroencephalography (EEG) foundation models have recently emerged as a promising paradigm for brain-computer interfaces (BCIs), aiming to learn transferable neural representations from large-scale heterogeneous recordings. Despite rapid progress, fair and comprehensive comparisons of existing EEG foundation models are still lacking, owing to inconsistent pre-training objectives, preprocessing choices, and downstream evaluation protocols. This paper fills that gap. We first review 50 representative models and organize their design choices into a unified taxonomic framework covering data standardization, model architectures, and self-supervised pre-training strategies. We then evaluate 12 open-source foundation models and competitive specialist baselines across 13 EEG datasets spanning nine BCI paradigms. Emphasizing real-world deployment, we consider both cross-subject generalization under a leave-one-subject-out protocol and rapid calibration under a within-subject few-shot setting. We further compare full-parameter fine-tuning with linear probing to assess the transferability of pre-trained representations, and examine the relationship between model scale and downstream performance. Our results indicate that: 1) linear probing is frequently insufficient; 2) specialist models trained from scratch remain competitive across many tasks; and 3) larger foundation models do not necessarily yield better generalization under current data regimes and training practices.


💡 Research Summary

The paper provides a comprehensive survey and benchmark of EEG foundation models, a newly emerging class of deep-learning systems designed to learn transferable neural representations from massive, heterogeneous EEG recordings. First, the authors catalog 50 representative models published between 2021 and 2026, extracting key design dimensions: data standardization and preprocessing, backbone architecture, and self-supervised pre-training objective. They observe a rapid surge of activity (over 80% of papers appearing in the last two years) and a clear dominance of Transformer-based encoders, while pre-training strategies fall into five families: masked time-domain reconstruction, token-level reconstruction (including codebook discretization), frequency-domain reconstruction, contrastive learning, and causal/autoregressive prediction.
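The most common of these families, masked time-domain reconstruction, hides random patches of the raw signal and trains the encoder to reconstruct them. The sketch below (not from any surveyed model's released code; a minimal NumPy illustration with hypothetical function names and parameters) shows the two ingredients: patch-wise masking of a multi-channel segment, and a reconstruction loss computed only on the hidden time steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(x, patch_len=50, mask_ratio=0.5, rng=rng):
    """Zero out a random subset of fixed-length time patches.

    x: (channels, time) EEG segment. Returns the masked copy and a
    boolean time mask (True where the signal was hidden).
    """
    n_patches = x.shape[1] // patch_len
    n_masked = int(mask_ratio * n_patches)
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(x.shape[1], dtype=bool)
    for i in masked_idx:
        mask[i * patch_len:(i + 1) * patch_len] = True
    x_masked = x.copy()
    x_masked[:, mask] = 0.0
    return x_masked, mask

def masked_mse(recon, target, mask):
    """MSE evaluated only on the masked (hidden) time steps."""
    return float(np.mean((recon[:, mask] - target[:, mask]) ** 2))

# 8-channel, 500-sample segment; half of the ten 50-sample patches are hidden.
x = rng.standard_normal((8, 500))
x_masked, mask = mask_patches(x)
# A real model would predict the hidden patches from the visible context;
# here a trivial "predict zeros" baseline stands in for the encoder output.
loss = masked_mse(np.zeros_like(x), x, mask)
```

In the actual pre-training loop, `np.zeros_like(x)` would be replaced by the encoder-decoder's reconstruction, and the loss would be backpropagated through the network.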

To move beyond fragmented evaluations, the authors construct a unified benchmark that tests 12 open‑source foundation models against strong specialist baselines across 13 publicly available EEG datasets covering nine BCI paradigms (motor imagery, SSVEP, ERP, emotion, workload, imagined speech, sleep, epilepsy, etc.). Two realistic downstream protocols are employed: (i) leave‑one‑subject‑out (LOSO) cross‑subject generalization, and (ii) within‑subject few‑shot adaptation where fine‑tuning data amount to only 1/20–1/100 of the usual training set. For each downstream task they compare full‑parameter fine‑tuning with linear probing (fixed encoder + linear classifier).
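The two evaluation protocols above are easy to state precisely in code. The sketch below (a minimal illustration, not the paper's released evaluation suite; the helper names are hypothetical) generates the LOSO folds, where each subject serves as the test set exactly once, and draws a small within-subject calibration subset for the few-shot setting.

```python
import numpy as np

def loso_splits(subject_ids):
    """Yield (held_out_subject, train_subjects) pairs: each subject is
    the test set exactly once, all others form the training pool."""
    unique = sorted(set(subject_ids))
    for held_out in unique:
        yield held_out, [s for s in unique if s != held_out]

def few_shot_subset(n_trials, fraction, seed=0):
    """Sample a small calibration subset of trial indices, e.g.
    fraction=1/20 of a subject's recorded trials."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(n_trials * fraction)))
    return rng.choice(n_trials, size=k, replace=False)

# Three subjects -> three LOSO folds.
splits = list(loso_splits(["s1", "s2", "s1", "s3"]))
# Few-shot calibration: 1/20 of a 200-trial session.
calib = few_shot_subset(200, 1 / 20)
```

Linear probing then fits only a linear classifier on the frozen encoder's features within each fold, while full fine-tuning updates all encoder parameters as well.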

The experimental results reveal three major findings. First, linear probing alone is generally insufficient; full‑parameter fine‑tuning consistently yields 4–7 percentage‑point gains, especially on high‑frequency or temporally complex paradigms such as ERP and SSVEP. This suggests that EEG signals retain substantial non‑linear structure that must be adapted during downstream training. Second, specialist models trained from scratch (e.g., EEGNet, DeepConvNet) remain competitive; foundation models only marginally outperform them on a subset of tasks, indicating that the generic representations learned from heterogeneous data do not always capture paradigm‑specific cues. Third, scaling up model size or pre‑training data volume does not guarantee better performance. Models ranging from <1 M to >1 B parameters show no monotonic improvement; the largest models sometimes suffer from over‑fitting or negligible gains, likely because publicly available EEG corpora are still limited in diversity and scale.

Beyond performance, the paper highlights methodological inconsistencies across the literature—varying channel‑selection schemes, normalization (z‑score, CAR, EMA), masking ratios, and evaluation splits—making fair comparison difficult. To address this, the authors release a standardized preprocessing pipeline, unified LOSO and few‑shot protocols, and an open‑source evaluation suite. They also propose future directions: multimodal pre‑training (EEG + ECG/EMG/MEG), parameter‑efficient architectures (e.g., Mamba, pruning, LoRA), and meta‑learning or prompting techniques tailored to BCI constraints.
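Two of the normalization schemes named above, common average referencing (CAR) and per-channel z-scoring, can be stated in a few lines. The sketch below is a minimal NumPy illustration of these standard operations, not the authors' released pipeline; the function names are hypothetical.

```python
import numpy as np

def common_average_reference(x):
    """CAR: subtract the instantaneous mean across channels,
    removing signal components common to all electrodes."""
    return x - x.mean(axis=0, keepdims=True)

def zscore_per_channel(x, eps=1e-8):
    """Standardize each channel over time to zero mean and
    (approximately) unit variance; eps guards against flat channels."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / (sd + eps)

# 4 channels x 100 samples of synthetic EEG.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 100))
x_ref = common_average_reference(x)
x_norm = zscore_per_channel(x_ref)
```

Small choices like applying CAR before or after z-scoring, or using an EMA instead of a global z-score, are exactly the kind of unreported variation the authors' standardized pipeline is meant to eliminate.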

In conclusion, EEG foundation models hold promise for reducing label dependence and improving cross‑paradigm generalization in BCIs, but current data regimes, training practices, and evaluation standards limit their practical advantage. Systematic benchmarking, larger and more diverse EEG corpora, and efficient transfer‑learning strategies are essential next steps for the field to realize the full potential of foundation models in real‑world neurotechnology applications.

