Scaling Next-Brain-Token Prediction for MEG

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

We present a large autoregressive model for source-space MEG that scales next-token prediction to long contexts across datasets and scanners, handling a corpus of over 500 hours and thousands of sessions drawn from the three largest public MEG datasets. A modified SEANet-style vector quantizer reduces multichannel MEG into a flattened token stream, on which we train a Qwen2.5-VL backbone from scratch to predict the next brain token and to recursively generate minutes of MEG from up to a minute of context. To evaluate long-horizon generation, we introduce task-matched tests: (i) on-manifold stability, via the drift of generated-only rollouts relative to the time-resolved distribution of real sliding windows; and (ii) conditional specificity, via correct-context versus prompt-swap controls under a neurophysiologically grounded metric set. We train on CamCAN and OMEGA and run all analyses on the held-out MOUS dataset, establishing cross-dataset generalization. Across metrics, generations remain relatively stable over long rollouts and stay closer to the correct continuation than to swapped controls. Code available at: https://github.com/ricsinaruto/brain-gen


💡 Research Summary

The paper introduces a large-scale autoregressive model for source‑space magnetoencephalography (MEG) that can predict the next brain token over long contexts and generate minutes‑long recordings from a one‑minute prompt. The authors first develop a modified SEANet‑style vector‑quantizer called BrainTokMix, which compresses multichannel MEG into a flattened token stream by early channel mixing and residual vector quantization (RVQ) across four code levels. This tokenizer achieves a 17‑fold compression (≈400 tokens per second) while preserving reconstruction fidelity through a loss that combines L1 error, Pearson correlation, commitment penalties, and spectral magnitude/phase terms.
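As an illustration, the residual quantization idea behind the four RVQ code levels can be sketched as follows. This is a minimal sketch, not BrainTokMix itself: the `rvq_encode` helper, codebook sizes, embedding dimension, and scale schedule are all illustrative assumptions.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization: each level quantizes the residual
    left over by the levels before it, so coarse structure is captured
    first and later codebooks refine the reconstruction."""
    residual = x.copy()
    codes = []
    recon = np.zeros_like(x)
    for cb in codebooks:
        # pick the codebook entry nearest to the current residual
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        recon = recon + cb[idx]
        residual = residual - cb[idx]
    return codes, recon

rng = np.random.default_rng(0)
# 4 RVQ levels with shrinking scale, mimicking ever-finer residuals
codebooks = [rng.normal(scale=0.5 ** level, size=(256, 32))
             for level in range(4)]
frame = rng.normal(size=32)  # one frame embedding
codes, recon = rvq_encode(frame, codebooks)
```

Each frame thus yields one code index per level; stacking these across frames gives the token stream described above.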

Using the tokenized data, they train FlatGPT, a decoder‑only transformer built on the Qwen2.5‑VL architecture. Tokens are serialized across time, a small spatial axis (four “neuro‑streams”), and RVQ level, and the model receives multi‑axis rotary positional embeddings (MRoPE) so that each axis retains its positional semantics. Training is purely next‑token cross‑entropy, with no auxiliary embeddings or metadata, so the model must rely solely on the prompt for conditional generation.
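The serialization order can be sketched concretely. This is a minimal illustration of time‑, stream‑, then level‑major flattening under assumed grid sizes; the `flatten_tokens` name and the toy token ids are not from the paper.

```python
def flatten_tokens(grid):
    """Serialize a tokens[t][s][q] grid (time x neuro-stream x RVQ level)
    into one 1-D stream, keeping the (t, s, q) index of every token --
    the multi-axis positions an MRoPE-style embedding would consume."""
    stream, positions = [], []
    for t, frame in enumerate(grid):
        for s, levels in enumerate(frame):
            for q, token in enumerate(levels):
                stream.append(token)
                positions.append((t, s, q))
    return stream, positions

# Toy grid: 2 time steps, 4 neuro-streams, 4 RVQ levels -> 32 tokens
grid = [[[t * 100 + s * 10 + q for q in range(4)]
         for s in range(4)]
        for t in range(2)]
stream, positions = flatten_tokens(grid)
```

Keeping the per-token `(t, s, q)` triples is what lets a multi-axis rotary embedding treat time, stream, and code level as distinct positional dimensions even after flattening.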

The training corpus comprises over 500 hours of MEG from three public datasets—CamCAN, OMEGA, and MOUS—covering resting‑state and diverse tasks. The model is trained on CamCAN and OMEGA and evaluated exclusively on the held‑out MOUS dataset to test out‑of‑distribution generalization. Evaluation introduces two task‑matched tests: (i) on‑manifold stability, measured by the drift of generated‑only token sequences relative to the distribution of real sliding windows; and (ii) conditional specificity, assessed by comparing generations conditioned on the correct context versus swapped‑prompt controls using neurophysiologically grounded metrics such as channel‑wise power spectra, Pearson correlations, and time‑frequency synchrony.
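The on‑manifold stability test can be illustrated as a z‑score of a window‑level feature against the real sliding‑window distribution. This is a simplified stand‑in for the paper's neurophysiologically grounded metric set: the `window_drift` helper, the mean feature, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def window_drift(generated_windows, real_windows, feature=np.mean):
    """Z-score a feature of each generated window against the distribution
    of that feature over real sliding windows; an off-manifold rollout
    shows |z| growing with window index."""
    ref = np.array([feature(w) for w in real_windows])
    mu, sd = ref.mean(), ref.std()
    return [(feature(w) - mu) / sd for w in generated_windows]

rng = np.random.default_rng(0)
real = rng.normal(size=(100, 50))  # 100 real sliding windows of 50 samples
# Synthetic rollout whose mean drifts further off-manifold each window
drifting = [rng.normal(loc=0.5 * k, size=50) for k in range(5)]
zs = window_drift(drifting, real)
```

The conditional-specificity test is analogous in spirit: compute the same features for generations from the correct prompt and from a swapped prompt, and check which lies closer to the true continuation.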

Results show that FlatGPT maintains relatively low drift over multi‑minute rollouts and that generations are significantly closer to the true continuation than to swapped prompts, indicating genuine context dependence. The authors also provide extensive ablations showing that the SEANet‑based tokenization and the flattened token ordering are crucial for scalability and performance.

Contributions are summarized as: (1) BrainTokMix, a causal channel‑mixing RVQ tokenizer for source‑space MEG; (2) FlatGPT, a large decoder‑only transformer trained from scratch on these tokens; (3) a cross‑dataset evaluation protocol that stress‑tests long‑horizon stability and prompt dependence; and (4) open‑source code and pretrained models. Limitations include the fixed 100 Hz sampling rate, potential loss of high‑frequency information during quantization, and the need for downstream probing to interpret generated signals. Future work is suggested to increase temporal resolution, integrate multimodal tokens (language, vision, action), and explore downstream applications such as data augmentation, brain‑in‑the‑loop AI training, and neuroscientific hypothesis testing.

