Geometry- and Relation-Aware Diffusion for EEG Super-Resolution
Recent electroencephalography (EEG) spatial super-resolution (SR) methods, while showing improved quality by either directly predicting missing signals from visible channels or adapting latent diffusion-based generative modeling to temporal data, often lack awareness of physiological spatial structure, thereby constraining spatial generation performance. To address this issue, we introduce TopoDiff, a geometry- and relation-aware diffusion model for EEG spatial super-resolution. Inspired by how human experts interpret spatial EEG patterns, TopoDiff incorporates topology-aware image embeddings derived from EEG topographic representations to provide global geometric context for spatial generation, together with a dynamic channel-relation graph that encodes inter-electrode relationships and evolves with temporal dynamics. This design yields a spatially grounded SR framework with consistent performance improvements. Across multiple EEG datasets spanning diverse applications, including SEED/SEED-IV for emotion recognition, PhysioNet motor imagery (MI/MM), and TUSZ for seizure detection, our method achieves substantial gains in generation fidelity and leads to notable improvements in downstream EEG task performance.
💡 Research Summary
The paper introduces TopoDiff, a geometry‑ and relation‑aware diffusion framework designed to perform spatial super‑resolution (SR) on electroencephalography (EEG) recordings. Traditional EEG‑SR approaches either directly predict missing channels from sparse recordings or adapt latent diffusion models to time‑series data, but they typically treat electrode positions as passive metadata, ignoring the rich scalp geometry and structured inter‑electrode relationships that are crucial for faithful signal reconstruction.
TopoDiff addresses these gaps by integrating two complementary inductive biases: (1) global topological conditioning and (2) dynamic channel‑relation graph modeling. For the former, low‑density EEG segments are first partitioned into non‑overlapping temporal patches; each patch is averaged over time within every channel and the resulting per‑channel values are visualized as a color‑coded topographic map (a “topoplot”). These images are fed into a frozen, pretrained DINO‑v3 vision encoder, yielding high‑level spatial feature maps that capture the overall distribution of activity over the scalp. The resulting feature vectors are flattened, projected to the same latent dimension as the EEG token embeddings, and prepended to the diffusion Transformer’s input sequence, providing explicit global geometry guidance throughout the denoising process.
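The conditioning pipeline above can be sketched in a few lines. This is a minimal shape-level illustration, not the paper's implementation: the shapes are hypothetical, and a fixed random projection stands in for the "render topoplot → frozen DINO‑v3" step, which would require image rendering and a pretrained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes (not taken from the paper).
C, T = 16, 512          # low-density channels, time samples
T_p = 64                # temporal patch length
D = 32                  # latent / token dimension
num_patches = T // T_p  # 8 non-overlapping patches

eeg = rng.standard_normal((C, T))

# 1) Partition into non-overlapping temporal patches and average over time,
#    giving one scalar per channel per patch -- the values a topoplot renders.
patches = eeg.reshape(C, num_patches, T_p)
topo_values = patches.mean(axis=2).T              # (num_patches, C)

# 2) Stand-in for "render topoplot -> frozen DINO-v3 -> feature vector":
#    a fixed random map plays the role of the frozen image encoder here.
feat_dim = 48
frozen_encoder = rng.standard_normal((C, feat_dim))
img_features = topo_values @ frozen_encoder       # (num_patches, feat_dim)

# 3) Project to the EEG token dimension and prepend to the token sequence,
#    so the Transformer attends to global geometry at every denoising step.
proj = rng.standard_normal((feat_dim, D))
topo_tokens = img_features @ proj                 # (num_patches, D)

eeg_tokens = rng.standard_normal((C * num_patches, D))  # placeholder EEG tokens
sequence = np.concatenate([topo_tokens, eeg_tokens], axis=0)
print(sequence.shape)
```

With these placeholder sizes, the Transformer input grows from 128 EEG tokens to 136 tokens, the first 8 carrying global scalp-geometry context.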
For the latter, at each diffusion timestep a graph is constructed whose nodes correspond to EEG channels and whose edges encode both physical electrode distances and instantaneous signal correlations. A graph neural network (GNN) performs message passing on this graph, enriching each channel token with time‑varying relational information. The edge weights are updated dynamically, allowing the model to capture evolving functional connectivity patterns that are essential for realistic EEG synthesis.
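The graph construction just described can be sketched as follows. This is a hedged toy version under assumed conventions: a Gaussian kernel over (hypothetical) 3‑D electrode positions for the distance term, absolute Pearson correlation for the instantaneous-signal term, and one degree-normalized, GCN-style message-passing update; the paper's GNN may differ in each of these choices.

```python
import numpy as np

rng = np.random.default_rng(1)

C, T = 8, 256
pos = rng.standard_normal((C, 3))        # hypothetical 3-D electrode positions
x = rng.standard_normal((C, T))          # current channel signals at this step

# Physical-distance term: closer electrodes get larger edge weights.
d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
w_dist = np.exp(-d**2 / (2 * d.mean()**2))

# Instantaneous-correlation term, recomputed at every diffusion timestep so
# the graph tracks evolving functional connectivity.
w_corr = np.abs(np.corrcoef(x))

A = w_dist * w_corr                      # combined edge weights
np.fill_diagonal(A, 0.0)                 # no self-loops

# One round of degree-normalized message passing over channel tokens,
# enriching each channel with time-varying relational information.
tokens = rng.standard_normal((C, 32))
deg = A.sum(axis=1, keepdims=True)
messages = (A / np.clip(deg, 1e-8, None)) @ tokens
tokens = tokens + messages               # residual update
```

Because both edge terms are symmetric, the adjacency is symmetric; recomputing `w_corr` from the partially denoised signals at each timestep is what makes the graph "dynamic".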
The core generative engine is a patch‑based Transformer denoiser. Low‑resolution EEG tokens (C_LR × T_p) and noisy high‑resolution “unseen” channel tokens (C_Unseen × T_p) are merged into a single token sequence. Within each Transformer block, a token‑wise self‑attention layer captures global interactions across all spatial‑temporal tokens, while a temporal‑wise self‑attention layer (operating on a transposed representation) models local dynamics within each temporal patch. The model is trained using an x‑prediction objective: at each noise level t, the network predicts the clean signal for the unseen channels, and an ℓ₂ reconstruction loss is applied only to those channels. During inference, the observed low‑density channels are kept as conditioning inputs while the reverse‑time ordinary differential equation (ODE) is solved, starting from Gaussian noise, to generate the full high‑density EEG.
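The masked x-prediction objective can be made concrete with a small sketch. Everything below is illustrative: the cosine noise schedule, the channel counts, and the identity-like "denoiser" are placeholders (in TopoDiff the denoiser is the patch-based Transformer); only the structure, noising just the unseen channels and applying the ℓ₂ loss only to them, follows the description above.

```python
import numpy as np

rng = np.random.default_rng(2)

C_lr, C_unseen, T_p = 8, 24, 64          # hypothetical channel/patch sizes
C = C_lr + C_unseen

x0 = rng.standard_normal((C, T_p))       # clean high-density segment
unseen = np.zeros(C, dtype=bool)
unseen[C_lr:] = True                     # channels the model must generate

# Forward noising on the unseen channels only; the observed low-density
# channels stay clean and act as conditioning (assumed cosine schedule).
t = 0.7
alpha, sigma = np.cos(t * np.pi / 2), np.sin(t * np.pi / 2)
noise = rng.standard_normal(x0.shape)
x_t = x0.copy()
x_t[unseen] = alpha * x0[unseen] + sigma * noise[unseen]

# Placeholder for the Transformer denoiser: a crude rescaling guess so the
# pipeline runs end to end. The real network predicts x0 from (x_t, t).
x0_pred = x_t / max(alpha, 1e-8)

# x-prediction loss: l2 reconstruction error on the unseen channels only.
loss = np.mean((x0_pred[unseen] - x0[unseen]) ** 2)
```

At inference, the same masking logic keeps the observed channels fixed while the reverse-time ODE is integrated from Gaussian noise on the unseen channels.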
Extensive experiments were conducted on four public datasets covering affective computing (SEED, SEED‑IV), motor‑imagery BCI (PhysioNet MI/MM), and clinical seizure detection (TUSZ). Quantitatively, TopoDiff outperforms state‑of‑the‑art baselines—including ST‑AD, ESTFormer, and SR‑GDiff—on standard reconstruction metrics such as PSNR (average gain > 2.8 dB), SSIM, and MSE. More importantly, when the reconstructed high‑density signals are fed into downstream classifiers, TopoDiff yields consistent performance boosts: emotion recognition accuracy improves by ~4 %, motor‑imagery classification F1‑score by ~3.5 %, and seizure detection AUC by ~5 % relative to the best competing method. Ablation studies confirm that both components are essential: removing the topographic conditioning degrades spatial consistency, while omitting the dynamic graph reduces the model’s ability to capture temporally varying connectivity, leading to lower fidelity reconstructions.
In summary, TopoDiff is the first diffusion‑based EEG‑SR approach that explicitly incorporates global scalp geometry via image embeddings and time‑dependent inter‑electrode relations via a dynamic graph. This dual‑conditioning strategy yields superior signal fidelity and downstream task performance, thereby enhancing the practicality of low‑density wearable EEG systems for both research and clinical applications. Future directions include real‑time inference, multimodal extensions (e.g., EEG‑fMRI fusion), and personalized graph construction for subject‑specific electrode layouts.