Synthesizing Epileptic Seizures: Gaussian Processes for EEG Generation
Reliable seizure detection from electroencephalography (EEG) time series is a high-priority clinical goal, yet the acquisition cost and scarcity of labeled EEG data limit the performance of machine learning methods. This challenge is exacerbated by the long-range, high-dimensional, and non-stationary nature of epileptic EEG recordings, which makes realistic data generation particularly difficult. In this work, we revisit Gaussian processes as a principled and interpretable foundation for modeling EEG dynamics, and propose a novel hierarchical framework, \textit{GP-EEG}, for generating synthetic epileptic EEG recordings. At its core, our approach decomposes EEG signals into temporal segments modeled via Gaussian process regression, and integrates a domain-adaptation variational autoencoder. We validate the proposed method on two real-world, open-source epileptic EEG datasets. The synthetic EEG recordings generated by our model match real-world epileptic EEG both quantitatively and qualitatively, and can be used to augment training sets.
💡 Research Summary
The paper tackles the pressing problem of scarce, heterogeneous labeled EEG data for epileptic seizure detection. Existing deep generative approaches (GANs, VAEs, diffusion models) typically produce short, low‑dimensional segments and lack interpretability, making them unsuitable for clinical‑grade, long‑horizon, multichannel EEG synthesis. To overcome these limitations, the authors propose GP‑EEG, a hierarchical pipeline that combines principled probabilistic modeling with modern deep learning for domain adaptation.
First, all recordings of a single patient are concatenated and subjected to singular value decomposition (SVD). This yields a low‑dimensional temporal score matrix U (size T × d) and a spatial loading matrix Y (size C × d). The dimensionality d is chosen to retain the dominant variance while drastically reducing computational load.
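The SVD step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the array `X` is a random stand-in for a concatenated patient recording, and the sizes `T`, `C`, `d` are chosen arbitrarily.

```python
# Sketch of the dimensionality-reduction stage, assuming a concatenated
# recording X of shape (T, C) (time samples x channels).
import numpy as np

rng = np.random.default_rng(0)
T, C, d = 1000, 18, 4            # illustrative sizes; d = retained rank

X = rng.standard_normal((T, C))  # stand-in for a real EEG recording

# Thin SVD: X = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, :d] * S[:d]        # temporal score matrix, shape (T, d)
loadings = Vt[:d].T              # spatial loading matrix, shape (C, d)

# Rank-d reconstruction retaining the dominant variance
X_hat = scores @ loadings.T
print(X_hat.shape)               # (1000, 18)
```

Here the singular values are folded into the temporal scores; whether the paper scales `U` this way or keeps it orthonormal is not specified in the summary.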
Next, the authors detect quasi‑stationary regimes in each temporal component using a hybrid of KPSS and Augmented Dickey‑Fuller tests. Within each regime a Gaussian Process (GP) regression is fitted with a quasi‑periodic kernel that combines a periodic term (to capture rhythmic EEG activity) and a Matérn‑3/2 term (to model roughness). Hyper‑parameters are optimized per regime via maximum likelihood. Synthetic temporal scores are generated by sampling from the fitted GPs, ensuring continuity across regime boundaries by initializing each new segment with the previous segment’s endpoint.
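A minimal sketch of the per-regime GP fit, using scikit-learn. The summary does not state whether the periodic and Matérn-3/2 terms are added or multiplied, so a sum (plus a white-noise term for the likelihood) is assumed here; the stationarity tests are taken to have already produced the regime boundaries, and a single regime on synthetic data is fitted for illustration.

```python
# Hedged sketch: GP regression with a quasi-periodic kernel
# (periodic + Matérn-3/2) on one quasi-stationary regime.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, Matern, WhiteKernel

rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0, 200)[:, None]       # time axis for one regime
y = np.sin(2 * np.pi * 2 * t).ravel() + 0.1 * rng.standard_normal(200)

# Assumed kernel form: rhythmic term + rough Matérn-3/2 term + noise
kernel = (ExpSineSquared(length_scale=1.0, periodicity=0.5)
          + Matern(length_scale=0.5, nu=1.5)
          + WhiteKernel(noise_level=0.01))

gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t, y)                                   # ML-II hyper-parameter optimization

# Draw one synthetic temporal-score trajectory from the fitted GP
sample = gp.sample_y(t, n_samples=1, random_state=0).ravel()
print(sample.shape)                            # (200,)
```

Continuity across regime boundaries, as described above, would then amount to conditioning each new segment's sample on the previous segment's endpoint.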
The synthetic temporal scores are projected back to the original 18‑channel space by multiplying with the patient‑specific spatial loadings Yᵀ, producing a “raw” synthetic EEG that preserves inter‑channel correlation. However, SVD compression and the fixed kernel can introduce spectral distortions. To correct this, the authors train a convolutional‑LSTM variational auto‑encoder (Conv‑LSTM VAE) as a domain‑adaptation module. The VAE receives the raw synthetic EEG as input and is trained to reconstruct the original real EEG, thereby learning to map the surrogate distribution onto the true EEG distribution while retaining the long‑range temporal structure.
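The back-projection step amounts to a single matrix product with the patient-specific loadings. In this sketch the GP samples and loadings are random stand-ins; only the shapes and the multiplication by Yᵀ reflect the pipeline described above.

```python
# Sketch of projecting synthetic temporal scores back to channel space,
# assuming C = 18 channels and d = 4 retained components.
import numpy as np

rng = np.random.default_rng(2)
T, C, d = 500, 18, 4
scores_synth = rng.standard_normal((T, d))      # stand-in for GP-sampled scores
Y = rng.standard_normal((C, d))                 # stand-in for spatial loadings

raw_synthetic_eeg = scores_synth @ Y.T          # "raw" synthetic EEG, shape (T, C)
print(raw_synthetic_eeg.shape)                  # (500, 18)
```

Because every channel is a fixed linear combination of the same d temporal components, the inter-channel correlation structure of the original decomposition is preserved by construction.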
The complete pipeline consists of six stages: (1) dimensionality reduction, (2) changepoint detection and GP regression, (3) raw synthesis, (4) kernel‑state discretisation and Markov‑chain modelling of regime dynamics, (5) Poisson‑process modelling of changepoint times, and (6) domain adaptation via Conv‑LSTM VAE.
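Stages (4) and (5) can be sketched jointly: a Markov chain over discretised kernel states chooses which regime comes next, and exponential inter-arrival times (the Poisson-process model) place the changepoints. The state names, transition matrix, and rate below are invented for illustration; the paper's actual values are not given in the summary.

```python
# Illustrative sketch of stages (4)-(5): Markov-chain regime dynamics
# plus Poisson-process changepoint times (hypothetical parameters).
import numpy as np

rng = np.random.default_rng(3)

states = ["low_freq", "rhythmic", "seizure"]   # hypothetical kernel states
P = np.array([[0.80, 0.15, 0.05],              # row-stochastic transition matrix
              [0.20, 0.70, 0.10],
              [0.30, 0.30, 0.40]])
rate = 0.1                                     # changepoints per second (assumed)

state, t_now = 0, 0.0
timeline = []                                  # (start time, kernel state) pairs
for _ in range(10):
    timeline.append((t_now, states[state]))
    t_now += rng.exponential(1.0 / rate)       # exponential gap => Poisson process
    state = rng.choice(3, p=P[state])          # sample the next regime's state

print(len(timeline))                           # 10 sampled regimes
```

Each sampled (time, state) pair would then select which fitted GP kernel generates the next segment of temporal scores.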
Evaluation is performed on two public datasets: CHB‑MIT and Siena, both recorded at 256 Hz with 18 channels. Quantitative metrics include power spectral density (PSD) similarity, inter‑channel correlation matrices, autocorrelation functions, and downstream seizure‑detection performance. GP‑EEG outperforms recent GAN, VAE, and diffusion baselines, achieving closer PSD matches (≈12 % improvement) and higher correlation fidelity. Visual inspection confirms that generated seizures are indistinguishable from real ones over windows exceeding 10 seconds. Moreover, augmenting training data with GP‑EEG‑generated recordings improves seizure‑detection AUC by 3–5 %.
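The PSD-similarity evaluation can be sketched with Welch's method at the datasets' 256 Hz sampling rate. The signals and the particular similarity score (mean absolute log-PSD error) are illustrative stand-ins; the summary does not specify which PSD distance the authors use.

```python
# Sketch of a PSD-similarity check between a "real" and a "synthetic"
# signal, assuming 256 Hz sampling as in CHB-MIT and Siena.
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(4)
fs = 256
real = rng.standard_normal(fs * 10)      # 10 s stand-in for a real channel
synth = rng.standard_normal(fs * 10)     # 10 s stand-in for a synthetic channel

f, p_real = welch(real, fs=fs, nperseg=512)
_, p_synth = welch(synth, fs=fs, nperseg=512)

# One simple similarity score (illustrative): mean absolute log-PSD error
psd_err = float(np.mean(np.abs(np.log10(p_real) - np.log10(p_synth))))
print(psd_err >= 0.0)                    # True; lower means closer spectra
```

The same pattern extends channel-wise, with inter-channel correlation matrices and autocorrelation functions compared analogously.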
The authors acknowledge limitations: the need for patient‑specific model training, sensitivity to the choice of SVD rank d and changepoint detection thresholds, and the current focus on 256 Hz data. Future work is suggested on meta‑learning for cross‑patient generalisation, automated rank selection, robustness to high‑frequency noise, and real‑time synthesis for clinical decision support.
In summary, GP‑EEG presents a novel, interpretable, and scalable solution for generating long‑horizon, multivariate epileptic EEG. By marrying Gaussian‑process based regime modeling with a deep domain‑adaptation network, it delivers high‑fidelity synthetic data that can alleviate data scarcity, enhance seizure‑detection models, and potentially serve broader neuro‑engineering applications.