Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
Foundation models are reshaping EEG analysis, yet EEG tokenization remains a key open challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens. We propose a dual-path architecture with time-frequency masking to capture robust motif representations; the framework is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks. Our study demonstrates three key benefits. Accuracy: experiments on four diverse EEG benchmarks show consistent performance gains across both single- and multi-dataset pretraining settings, achieving up to 11% improvement in Cohen’s Kappa over strong baselines. Generalization: as a plug-and-play component, it consistently boosts the performance of diverse foundation models, including BIOT and LaBraM. Scalability: by operating at the single-channel level rather than relying on the strict 10-20 EEG system, our method has the potential to be device-agnostic. Experiments on ear-EEG sleep staging, which differs from the pretraining data in signal format, channel configuration, recording device, and task, show that our tokenizer outperforms baselines by 14%. A comprehensive token analysis reveals strongly class-discriminative, frequency-aware, and consistent token structure, enabling improved representation quality and interpretability. Code is available at https://github.com/Jathurshan0330/TFM-Tokenizer.
💡 Research Summary
The paper introduces TFM‑Tokenizer, a novel tokenization framework designed specifically for electroencephalogram (EEG) data and intended to serve as a plug‑and‑play front‑end for a wide range of EEG foundation models. The authors identify three core challenges that have been largely ignored in prior work: (1) Tokenization target – EEG recordings come from heterogeneous devices, channel configurations, and sampling rates, making a channel‑agnostic tokenization essential; (2) Token resolution – EEG contains both oscillatory (e.g., alpha, beta) and transient (e.g., spikes) patterns that require a representation capable of capturing short, recurring motifs; (3) Learning objective – Purely temporal tokenization neglects the rich spectral information that co‑exists with temporal dynamics. To address these, TFM‑Tokenizer operates on single‑channel signals, learns a vocabulary of time‑frequency motifs, and employs a dual‑path architecture that processes both frequency and time dimensions.
Architecture Overview
- Pre‑processing – Each channel is segmented into overlapping patches of length L with hop H, producing N patches. For each patch, a short‑time Fourier transform yields a spectral window Sᵢ.
- Localized Spectral Window Encoder – The spectral window is split along the frequency axis into P non‑overlapping patches. Each patch is linearly projected (GroupNorm → GeLU → Linear) to a D‑dimensional embedding eᵢ,ₚ. A Frequency Transformer then operates across the P embeddings, modeling intra‑window cross‑frequency dependencies. A gated patchwise aggregation (σ(W₁e)·W₂e) produces a compact frequency embedding E_Fᵢ.
- Temporal Encoder – Raw time‑domain patches are similarly projected to embeddings E_Tᵢ. E_Fᵢ and E_Tᵢ are concatenated and fed into a Temporal Transformer that captures long‑range temporal relationships across the N patches.
- VQ‑VAE Codebook – The output embeddings Zᵢ are quantized against a learnable codebook V, yielding discrete token IDs. This codebook constitutes the motif vocabulary. Positional encodings are omitted to avoid imposing artificial stationarity on the inherently non‑stationary EEG signal.
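The quantization step above is standard VQ-VAE nearest-neighbour lookup: each encoder output is snapped to its closest codebook entry, and the entry's index becomes the discrete token. A minimal sketch in numpy, with all sizes (N patches, D dimensions, K codebook entries) chosen for illustration rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: N patches, D-dim embeddings, K codebook entries.
N, D, K = 8, 64, 256

Z = rng.normal(size=(N, D))  # encoder outputs Z_i, one per patch
V = rng.normal(size=(K, D))  # learnable codebook V (the motif vocabulary)

# Squared L2 distance from every patch embedding to every codebook entry.
d2 = ((Z[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)  # (N, K)

token_ids = d2.argmin(axis=1)  # (N,) discrete token IDs
quantized = V[token_ids]       # (N, D) quantized motif embeddings

print(token_ids.shape, quantized.shape)  # (8,) (8, 64)
```

In training, the quantized vectors replace Z on the forward pass while gradients flow around the non-differentiable argmin (the usual straight-through estimator); the sketch shows only the lookup itself.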
Training Objective
A time‑frequency masking scheme is applied: random temporal segments and random frequency bands are masked, and the model must reconstruct the masked tokens. The loss is a cross‑entropy between the predicted token distribution and the ground‑truth token from the codebook. This forces the encoder to learn robust, frequency‑specific motifs that are recoverable even under heavy noise or amplitude scaling.
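The masking scheme can be pictured on the spectral representation itself: zero out a random contiguous run of temporal patches and a random contiguous frequency band, then ask the model to recover the tokens underneath. A small sketch, with the 25% mask ratios and array sizes being illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative spectrogram: N temporal patches x F frequency bins.
N, F = 16, 40
S = rng.normal(size=(N, F))

mask = np.ones((N, F), dtype=bool)  # True = visible, False = masked

# Mask a random contiguous temporal segment (here 25% of patches)...
t0 = rng.integers(0, N - N // 4 + 1)
mask[t0:t0 + N // 4, :] = False

# ...and a random contiguous frequency band (here 25% of bins).
f0 = rng.integers(0, F - F // 4 + 1)
mask[:, f0:f0 + F // 4] = False

# Zero out masked regions before encoding; reconstruction targets are the
# token IDs of the unmasked signal.
S_masked = np.where(mask, S, 0.0)
```

Because both axes are masked, the encoder cannot rely on either temporal context or spectral context alone, which is what pushes it toward frequency-specific motifs.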
Experimental Protocol
- Pre‑training: The tokenizer is trained in an unsupervised manner on large, heterogeneous EEG corpora (four public datasets: TUEV, SEED, BCI‑IV, Sleep‑EDF).
- Downstream: After freezing the tokenizer, a lightweight transformer (≈0.7 M parameters, linear attention) is pretrained on the token sequences using a masked‑token prediction objective, then fine‑tuned on specific classification tasks (e.g., seizure detection, motor imagery, sleep staging).
- Plug‑and‑Play Evaluation: The token sequences are also fed into existing foundation models (BIOT, LaBraM). In each case, the models receive discrete tokens instead of raw continuous embeddings.
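Mechanically, the plug-and-play step is simple: the frozen tokenizer emits integer token IDs, and the downstream model looks them up in its own embedding table before its usual transformer layers. A minimal sketch of that interface, with the table and sizes being illustrative stand-ins (not the actual BIOT or LaBraM internals):

```python
import numpy as np

rng = np.random.default_rng(2)

K, D = 256, 64  # codebook size, downstream embedding dimension

# The downstream model's learnable token-embedding table (random stand-in).
embed_table = rng.normal(size=(K, D))

def tokens_to_inputs(token_ids):
    """Map discrete token IDs from the frozen tokenizer to dense inputs."""
    return embed_table[token_ids]

# Batch of 4 single-channel sequences, 32 tokens each.
token_ids = rng.integers(0, K, size=(4, 32))
x = tokens_to_inputs(token_ids)  # (4, 32, 64), fed to the backbone

print(x.shape)
```

Swapping in discrete tokens this way leaves the downstream architecture untouched, which is what makes the tokenizer model-agnostic.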
Results
- Across the four benchmarks, TFM‑Tokenizer consistently improves Cohen’s Kappa by 4–11% over strong baselines that use raw segment embeddings.
- When combined with BIOT or LaBraM, performance gains of ≈4% are observed on the TUEV seizure detection task, demonstrating the tokenizer’s model‑agnostic benefit.
- Cross‑device scalability is validated on an ear‑EEG sleep‑staging dataset, which differs in electrode placement, sampling rate, and task. Here the tokenizer outperforms baselines by 14%, confirming that single‑channel tokenization indeed yields device‑agnostic representations.
Token Analysis
- Class‑specific uniqueness – Certain tokens appear predominantly in one class (e.g., seizure vs. non‑seizure), indicating that the vocabulary captures discriminative neural events.
- Frequency awareness – Visualization of token‑frequency activation maps shows clear alignment with known EEG bands (alpha, beta, gamma), confirming that the dual‑path encoder successfully isolates spectral information.
- Structural consistency – The same token recurs across different subjects and sessions with similar spectral shapes, suggesting robustness to inter‑subject variability.
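The class-specific uniqueness analysis above amounts to comparing class-conditional token histograms. One plausible way to compute such a score (the exact metric used in the paper is not specified here, so this is an illustrative sketch on random data):

```python
import numpy as np

rng = np.random.default_rng(3)

K, C = 256, 2  # vocabulary size, number of classes (e.g. seizure vs. non-seizure)
labels = rng.integers(0, C, size=500)        # one label per EEG segment
tokens = rng.integers(0, K, size=(500, 32))  # token sequence per segment

# Class-conditional token frequencies: how often each token occurs per class.
counts = np.zeros((C, K))
for c in range(C):
    counts[c] = np.bincount(tokens[labels == c].ravel(), minlength=K)
freq = counts / counts.sum(axis=1, keepdims=True)

# Simple "uniqueness" score: fraction of a token's usage owned by its
# dominant class (1.0 = token appears in only one class).
uniqueness = freq.max(axis=0) / (freq.sum(axis=0) + 1e-12)
top_tokens = uniqueness.argsort()[::-1][:5]  # most class-specific tokens
```

On real tokenized EEG, tokens with uniqueness near 1.0 would correspond to the class-discriminative motifs the paper reports; the same histograms, binned by dominant STFT frequency, would give the frequency-activation view.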
Limitations & Future Work
- The current pipeline freezes the tokenizer after pre‑training; joint fine‑tuning of tokenizer and downstream model could further boost performance.
- Codebook size and token length are hyper‑parameters that strongly influence results; systematic exploration is needed.
- Real‑time deployment on low‑power wearables has not been demonstrated; model compression and on‑device inference are promising directions.
- Extending the framework to multimodal settings (EEG‑text, EEG‑video) could leverage the discrete token space for cross‑modal foundation models.
Conclusion
TFM‑Tokenizer provides a principled solution to EEG tokenization by learning a discrete, time‑frequency motif vocabulary from single‑channel data. Its dual‑path encoder, time‑frequency masking objective, and model‑agnostic design enable substantial accuracy gains, improved generalization across devices, and enhanced interpretability. By decoupling token generation from downstream modeling, the framework opens the door for a new generation of EEG foundation models that can readily incorporate diverse datasets, devices, and tasks, much like tokenization has propelled progress in natural language processing.