Brain4FMs: A Benchmark of Foundation Models for Electrical Brain Signal

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Brain Foundation Models (BFMs) are transforming neuroscience by enabling scalable and transferable learning from neural signals, advancing both clinical diagnostics and cutting-edge neuroscience exploration. Their emergence is powered by large-scale clinical recordings, particularly electroencephalography (EEG) and intracranial EEG, which provide rich temporal and spatial representations of brain dynamics. However, despite their rapid proliferation, the field lacks a unified understanding of existing methodologies and a standardized evaluation framework. To fill this gap, we map the benchmark design space along two axes: (i) from the model perspective, we organize BFMs under a self-supervised learning (SSL) taxonomy; and (ii) from the dataset perspective, we summarize common downstream tasks and curate representative public datasets across clinical and human-centric neurotechnology applications. Building on this consolidation, we introduce Brain4FMs, an open evaluation platform with plug-and-play interfaces that integrates 15 representative BFMs and 18 public datasets. It enables standardized comparisons and analysis of how pretraining data, SSL strategies, and architectures affect generalization and downstream performance, guiding more accurate and transferable BFMs. The code is available at https://anonymous.4open.science/r/Brain4FMs-85B8.


💡 Research Summary

Brain4FMs presents a comprehensive benchmark for Brain Foundation Models (BFMs) that learn from large‑scale electroencephalography (EEG) and intracranial EEG (iEEG) recordings. The authors first organize the rapidly growing landscape of BFMs along two orthogonal axes. From the model side, they propose a self‑supervised learning (SSL) taxonomy that groups methods into three paradigms: (1) contrastive‑based approaches, which create positive pairs through data augmentations, temporal prediction, or cross‑modal alignment and contrast them against a set of negatives; (2) generative‑based approaches, which reconstruct masked inputs or predict future tokens using autoencoders, VQ‑VAEs, or autoregressive decoders; and (3) hybrid/explicit‑predictive approaches that combine multiple SSL objectives or predict predefined neurophysiological attributes (e.g., sleep stage, channel configuration). Each paradigm is further broken down into patch‑level (treating short temporal windows as exchangeable tokens) and sequence‑level formulations, providing a unified mathematical description of the training objective.
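The contrastive and generative paradigms above reduce to a small number of standard training losses. As an illustrative NumPy sketch (not the paper's implementation), an InfoNCE contrastive objective over two augmented views and a masked-reconstruction objective for patch-level generative pretraining might look like:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) loss between two views of the same batch.

    z1, z2: (batch, dim) embeddings of two augmented views; row i of z1
    and row i of z2 form the positive pair, all other rows serve as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # positives on the diagonal

def masked_mse(x, x_hat, mask):
    """Generative objective: reconstruction error on the masked patches only."""
    return ((x_hat - x) ** 2)[mask].mean()
```

Aligned views (identical embeddings) yield a much lower contrastive loss than mismatched ones, which is exactly the pressure that shapes the representation during pretraining.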

From the data side, the benchmark curates 18 publicly available datasets covering 11 downstream tasks grouped into disease diagnosis, sleep staging, brain‑computer communication, and affective computing. These datasets span a wide range of channel counts, sampling rates, and label types, and most are disjoint from the pretraining corpora of the evaluated models, ensuring a true test of generalization.

The benchmark pipeline consists of two stages. First, a standardized preprocessing routine (band‑pass and notch filtering, down‑sampling, event‑aligned segmentation, channel selection, and per‑channel z‑score normalization) is applied to all recordings. Second, each pretrained BFM serves as a feature extractor; its latent representation is fed to a task‑specific classifier that is fine‑tuned under a leave‑subjects‑out cross‑validation protocol (approximately 3:1:1 train‑validation‑test split). This protocol guarantees that no subject appears in more than one split, mimicking real‑world clinical deployment where models must generalize to unseen patients.
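Two of these steps are easy to pin down concretely: per-channel z-score normalization and a subject-disjoint ~3:1:1 split. Below is a minimal NumPy sketch under the assumption that each recording carries a subject ID; filtering, down-sampling, and segmentation are omitted, and this is not the benchmark's own code:

```python
import numpy as np

def zscore_per_channel(x, eps=1e-8):
    """Per-channel z-score normalization of a (channels, time) recording."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / (sd + eps)

def leave_subjects_out_split(subject_ids, ratios=(3, 1, 1), seed=0):
    """Subject-disjoint ~3:1:1 train/val/test split over unique subject IDs.

    No subject appears in more than one split, mirroring the benchmark's
    leave-subjects-out protocol for unseen-patient generalization.
    """
    rng = np.random.default_rng(seed)
    subjects = np.array(sorted(set(subject_ids)))
    rng.shuffle(subjects)
    total = sum(ratios)
    n = len(subjects)
    n_train = round(n * ratios[0] / total)
    n_val = round(n * ratios[1] / total)
    train = subjects[:n_train].tolist()
    val = subjects[n_train:n_train + n_val].tolist()
    test = subjects[n_train + n_val:].tolist()
    return train, val, test
```

Splitting at the subject level rather than the epoch level is what prevents the classifier from exploiting subject-specific signal statistics shared between train and test epochs.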

Fifteen representative BFMs are integrated into Brain4FMs, covering a spectrum of architectures (CNNs, graph neural networks, transformers) and parameter scales from 138 K to 708 M. The models employ diverse SSL strategies: augmentation‑based contrastive learning (e.g., SPP‑EEGNet), contrastive predictive coding (e.g., BENDR, MBrain), cross‑modal contrast, masked autoencoding (e.g., Brant, BrainBERT), VQ‑VAE tokenization (e.g., BFM), autoregressive, language‑model‑style training (NeuroGPT), and hybrid generative‑contrastive designs (EEGPT‑1, NeuroLM).

Empirical results reveal several key findings. (i) Model size correlates positively with downstream performance, but the alignment between pretraining data modality and target task is a stronger predictor of success. iEEG‑pretrained models excel on iEEG tasks, while EEG‑only models dominate EEG‑centric benchmarks. (ii) Contrastive methods are highly sensitive to the choice of augmentations; physiologically plausible transformations (e.g., jitter, scaling, frequency‑domain masking) are essential, otherwise performance degrades sharply. (iii) Generative methods show more stable learning across augmentation schemes, with mask ratio and reconstruction target design being the primary hyper‑parameters. (iv) Hybrid models that jointly optimize contrastive and reconstruction objectives achieve the best results on multimodal communication tasks, suggesting that combining cross‑view consistency with signal‑structure preservation yields richer representations.
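The physiologically plausible transformations named in finding (ii) amount to simple array operations on a (channels, time) epoch. The NumPy versions below are illustrative sketches, not the benchmark's augmentation code:

```python
import numpy as np

def jitter(x, sigma=0.05, rng=None):
    """Add small Gaussian noise (sensor jitter) to a (channels, time) epoch."""
    if rng is None:
        rng = np.random.default_rng()
    return x + rng.normal(scale=sigma, size=x.shape)

def scale(x, low=0.8, high=1.2, rng=None):
    """Rescale each channel's amplitude by a random factor."""
    if rng is None:
        rng = np.random.default_rng()
    factors = rng.uniform(low, high, size=(x.shape[0], 1))
    return x * factors

def freq_mask(x, max_width=5, rng=None):
    """Zero out a random contiguous band of frequency bins across all channels."""
    if rng is None:
        rng = np.random.default_rng()
    spec = np.fft.rfft(x, axis=1)
    n_bins = spec.shape[1]
    start = rng.integers(0, max(1, n_bins - max_width))
    spec[:, start:start + max_width] = 0
    return np.fft.irfft(spec, n=x.shape[1], axis=1)
```

Each transform preserves the epoch's shape and overall structure, which is what makes it "physiologically plausible" in the sense of the finding: the augmented view still looks like a brain recording rather than an arbitrary signal.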

The analysis also highlights the importance of domain‑matched pretraining data and consistent preprocessing. Models trained on heterogeneous, multi‑subject corpora generalize better across the diverse benchmark datasets, underscoring the need for large, diverse EEG/iEEG repositories. The authors discuss practical implications: researchers should prioritize physiologically grounded augmentations, consider mixed SSL objectives for tasks requiring both temporal dynamics and cross‑modal alignment, and leverage larger, modality‑balanced pretraining sets.

Brain4FMs is released as open‑source code with plug‑and‑play interfaces, allowing the community to add new models or datasets easily. This facilitates reproducible research, fair comparison, and rapid iteration toward more accurate, transferable BFMs. The authors envision future extensions to incorporate other biosignals (EMG, fNIRS) and real‑time streaming scenarios, further broadening the benchmark’s applicability to clinical neurotechnology and human‑centric AI.
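A plug-and-play interface of this kind is commonly built around registries plus decorators. The sketch below illustrates the pattern only; the names (`register_model`, `MODEL_REGISTRY`, `ToyBFM`) are hypothetical and not the actual Brain4FMs API:

```python
from typing import Callable, Dict

# Hypothetical registries in the spirit of a plug-and-play benchmark platform.
MODEL_REGISTRY: Dict[str, Callable] = {}
DATASET_REGISTRY: Dict[str, Callable] = {}

def register_model(name: str):
    """Decorator that registers a model class under `name` for later lookup."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register_model("toy_bfm")
class ToyBFM:
    """Placeholder model; a real entry would wrap a pretrained encoder."""
    def encode(self, batch):
        # Dummy features: one scalar stub per input item
        return [0.0] * len(batch)
```

With this pattern, adding a new model or dataset is a single decorated class, and the benchmark driver can iterate over the registries to run every model on every dataset.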

