BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding
Decoding linguistic information from electroencephalography (EEG) remains challenging due to the brain’s distributed and nonlinear organization. We present BrainStack, a functionally guided neuro-mixture-of-experts (Neuro-MoE) framework that models the brain’s modular functional architecture through anatomically partitioned expert networks. Each functional region is represented by a specialized expert that learns localized neural dynamics, while a transformer-based global expert captures cross-regional dependencies. A learnable routing gate adaptively aggregates these heterogeneous experts, enabling context-dependent expert coordination and selective fusion. To promote coherent representation across the hierarchy, we introduce cross-regional distillation, where the global expert provides top-down regularization to the regional experts. We further release SilentSpeech-EEG (SS-EEG), a large-scale benchmark comprising over 120 hours of EEG recordings from 12 subjects silently articulating 24 words, the largest dataset of its kind. Experiments demonstrate that BrainStack consistently outperforms state-of-the-art models, achieving superior accuracy and generalization across subjects. Our results establish BrainStack as a functionally modular, neuro-inspired MoE paradigm that unifies neuroscientific priors with adaptive expert routing, paving the way for scalable and interpretable brain-language decoding.
💡 Research Summary
BrainStack introduces a neuroscience‑inspired mixture‑of‑experts (MoE) architecture for decoding silent speech from electroencephalography (EEG). Recognizing that the brain operates as a set of interacting functional modules, the authors partition the EEG signal into seven anatomically defined cortical regions (pre‑frontal, frontal, central, left‑temporal, right‑temporal, parietal, occipital) plus a global view that encompasses all channels. Each region is processed by a lightweight convolutional expert (CNet) that captures localized temporal dynamics and intra‑regional spatial relationships, while a global expert (CTNet) combines a convolutional patch‑embedding stage with a Transformer encoder to model long‑range spatio‑temporal dependencies across the whole scalp.
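The anatomical partitioning described above can be sketched as a simple channel-grouping step. The region-to-channel mapping below is purely illustrative (the actual indices depend on the 128-channel montage, which the summary does not specify); only the seven region names and the all-channel global view come from the paper.

```python
import numpy as np

# Hypothetical mapping of 128 EEG channels onto the seven cortical
# regions named in the paper; the concrete indices are assumptions
# for illustration, not the dataset's real montage.
REGIONS = {
    "prefrontal":     list(range(0, 14)),
    "frontal":        list(range(14, 36)),
    "central":        list(range(36, 56)),
    "left_temporal":  list(range(56, 74)),
    "right_temporal": list(range(74, 92)),
    "parietal":       list(range(92, 112)),
    "occipital":      list(range(112, 128)),
}

def partition_trial(eeg: np.ndarray) -> dict:
    """Split one (channels, time) trial into per-region views plus a
    global view over all channels, mirroring BrainStack's expert inputs."""
    views = {name: eeg[idx, :] for name, idx in REGIONS.items()}
    views["global"] = eeg  # the global expert (CTNet) sees every channel
    return views

trial = np.random.randn(128, 500)  # 128 channels, 500 time samples
views = partition_trial(trial)
```

Each regional view would then be fed to its own convolutional expert (CNet), while the `"global"` view goes to the Transformer-based global expert.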
A learnable routing gate computes a relevance score for every expert using a small projection network, normalizes these scores with a softmax, and produces a weighted sum of expert representations. This adaptive routing enables the model to emphasize the most informative regions for each trial and suppress noisy or irrelevant channels in a data‑driven manner. To further align local and global representations, the authors employ hierarchical cross‑regional distillation: the global expert’s logits serve as soft targets for each regional expert, encouraging the latter to incorporate the global semantic structure while preserving region‑specific cues.
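The routing step above amounts to scoring each expert's representation, softmax-normalizing the scores, and taking a weighted sum. A minimal NumPy sketch, assuming pooled per-expert feature vectors and a single linear scoring projection (the paper's actual projection network may be deeper):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def route_and_fuse(expert_feats, W, b=0.0):
    """Hypothetical routing gate: a linear projection scores each
    expert's pooled feature, softmax normalizes the scores, and the
    fused representation is the score-weighted sum of expert features.

    expert_feats: (n_experts, d) pooled representation per expert
    W, b:         projection weights (d,) and scalar bias (assumed form)
    """
    scores = expert_feats @ W + b   # (n_experts,) relevance scores
    weights = softmax(scores)       # normalize into a distribution
    fused = weights @ expert_feats  # (d,) weighted sum of experts
    return fused, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 64))   # 7 regional experts + 1 global expert
fused, weights = route_and_fuse(feats, rng.normal(size=64))
```

Because the weights form a distribution over experts, they can be inspected per trial, which is how the paper reads off region importance (e.g., temporal/frontal dominance during language decoding).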
Training follows a multi‑objective loss that balances fused prediction, global modeling, local specialization, and distillation terms. A dynamic schedule gradually shifts emphasis from the global warm‑up phase to full multi‑expert supervision, ensuring stable convergence and effective coordination among experts.
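The distillation term and the dynamic schedule can be sketched as follows. The temperature `T`, the warm-up length, and the specific weight values are assumptions for illustration; the source only states that the schedule shifts from global warm-up toward full multi-expert supervision.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def kl_soft_targets(global_logits, regional_logits, T=2.0):
    """Distillation term: KL divergence between temperature-softened
    global and regional output distributions, with the global expert's
    logits acting as the soft targets (T=2.0 is an assumed value)."""
    p = softmax(global_logits / T)   # teacher: global expert
    q = softmax(regional_logits / T) # student: regional expert
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

def loss_weights(epoch, warmup=10):
    """Hypothetical schedule: keep the global term on throughout, and
    ramp the fused, local, and distillation terms in after warm-up."""
    a = min(epoch / warmup, 1.0)  # 0 -> 1 over the warm-up phase
    return {"fused": a, "global": 1.0, "local": 0.5 * a, "distill": 0.3 * a}

# The total loss would be the weighted sum of the four terms, e.g.:
# L = w["fused"]*L_fused + w["global"]*L_global \
#     + w["local"]*L_local + w["distill"]*L_distill
w_early, w_late = loss_weights(0), loss_weights(20)
```

Matching distributions give zero distillation loss, so the term only penalizes regional experts whose predictions diverge from the global expert's.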
The paper also releases SilentSpeech‑EEG (SS‑EEG), a new benchmark comprising over 120 hours of 128‑channel EEG plus 8 auxiliary EXG channels from 12 participants silently articulating 24 English words. With 60,000 trials, SS‑EEG is substantially larger than prior imagined‑speech datasets (e.g., KaraOne, Thinking Out Loud) in both duration and vocabulary size.
Extensive experiments on SS‑EEG demonstrate that BrainStack consistently outperforms state‑of‑the‑art baselines—including CNN, RNN, pure Transformer, and graph‑based LGGNet—across accuracy, F1‑score, and cross‑subject generalization metrics. Notably, the adaptive routing weights reveal neuro‑physiologically plausible patterns: temporal and frontal regions receive higher importance during language decoding, while central and parietal regions dominate in motor‑imagery tasks. The distillation process also reduces the divergence between regional and global output distributions, confirming effective top‑down guidance.
In summary, BrainStack advances EEG‑based language decoding by (1) embedding anatomical modularity into the model architecture, (2) providing a flexible expert‑routing mechanism that dynamically selects relevant cortical modules, (3) leveraging cross‑regional distillation to harmonize local and global representations, and (4) offering a large, publicly available silent‑speech dataset. The framework not only raises decoding performance but also enhances interpretability, paving the way for scalable, real‑time brain‑computer interfaces and future extensions to multimodal or broader cognitive decoding tasks.