Combining Residual U-Net and Data Augmentation for Dense Temporal Segmentation of Spike Wave Discharges in Single-Channel EEG

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Manual annotation of spike-wave discharges (SWDs), the electrographic hallmark of absence seizures, is labor-intensive for long-term electroencephalography (EEG) monitoring studies. While machine learning approaches show promise for automated detection, they often struggle with cross-subject generalization due to high inter-individual variability in seizure morphology and signal characteristics. In this study, we compare the performance of 15 machine learning classifiers on our own manually annotated dataset of 961 hours of EEG recordings from C3H/HeJ mice, containing 22,637 labeled SWDs, and find that a 1D U-Net performs best. We then improve its performance by adding residual connections and by applying data augmentation strategies during training that combine amplitude scaling, Gaussian noise injection, and signal inversion to enhance cross-subject generalization. We also compare our method, named AugUNet1D, to a recently published time- and frequency-based algorithmic approach called “Twin Peaks” and show that AugUNet1D performs better on our dataset. AugUNet1D, both pretrained on our manually annotated data and untrained, is made publicly available.


💡 Research Summary

This paper addresses the challenging problem of automatically detecting spike‑wave discharges (SWDs), the hallmark electrographic pattern of absence seizures, in long‑term single‑channel EEG recordings. The authors collected an extensive dataset from the C3H/HeJ mouse model of absence epilepsy, comprising 961 hours of continuous EEG and 22,637 manually annotated SWD events. Manual labeling was performed by three trained researchers using clear criteria (a minimum of five rhythmic spike‑wave complexes with inter‑complex intervals of at least 50 ms), and the labeling effort was quantified as roughly one hour of human work per 8.9 hours of EEG data.

To standardize the recordings, all signals were resampled to 100 Hz using torchaudio’s “sinc_interp_hann” resampling method, preserving the temporal and spectral characteristics essential for seizure detection. The authors also defined noise and sleep epochs: noise epochs were 5‑second windows in which the absolute amplitude exceeded 20 standard deviations above the global mean, while sleep epochs were detected from the Hilbert envelope of a 0.1–4 Hz band‑pass‑filtered signal using a bimodal amplitude distribution with dual thresholds. These definitions allowed the model to be evaluated under realistic conditions, where false positives often arise.
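The noise-epoch rule described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code; in particular, interpreting "20 standard deviations above the global mean" as a threshold of mean(|x|) + 20·std(x), and flagging a window when its peak absolute amplitude crosses it, are our assumptions about the exact formulation.

```python
import numpy as np

def find_noise_epochs(sig, fs=100, win_s=5, k=20):
    """Flag 5-second windows whose peak absolute amplitude exceeds
    k standard deviations above the global mean (noise criterion).
    Returns one boolean per non-overlapping window.

    The exact threshold formula is an assumption; the paper only states
    'absolute amplitude exceeded 20 standard deviations above the global mean'.
    """
    thresh = np.abs(sig).mean() + k * sig.std()
    n = fs * win_s                       # samples per window
    n_win = len(sig) // n
    windows = sig[: n_win * n].reshape(n_win, n)
    return np.abs(windows).max(axis=1) > thresh
```

A recording segmented this way can then have its flagged windows excluded from (or retained for) evaluation, depending on the experiment.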

The core detection model is a 1‑dimensional residual U‑Net (AugUNet1D). The architecture mirrors the classic U‑Net encoder‑decoder design, with down‑sampling layers capturing hierarchical temporal features and up‑sampling layers restoring fine‑grained time resolution. Crucially, each convolutional block also incorporates ResNet‑style residual connections (in addition to the U‑Net’s encoder‑to‑decoder skip connections), which facilitate gradient flow, mitigate vanishing gradients, and allow deeper networks to be trained effectively on long EEG sequences. The network outputs a binary label for every time point, and training minimizes a Dice loss using the Adam optimizer.
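The two key ingredients of this design can be sketched compactly: a 1‑D convolutional block with a ResNet‑style identity shortcut, and a soft Dice loss over dense per‑time‑point labels. This PyTorch sketch is illustrative only; channel counts, kernel size, and normalization choices are our assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    """One U-Net stage: two 1-D convolutions plus an identity shortcut.
    Kernel size and channel count are illustrative assumptions."""
    def __init__(self, ch, k=7):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, k, padding=k // 2)
        self.conv2 = nn.Conv1d(ch, ch, k, padding=k // 2)
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.act(self.conv1(x))
        h = self.conv2(h)
        return self.act(h + x)          # residual shortcut eases gradient flow

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss over dense per-time-point binary labels."""
    p = torch.sigmoid(logits)
    inter = (p * target).sum()
    return 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)
```

Stacking such blocks with pooling on the encoder path and transposed convolutions on the decoder path, and optimizing `dice_loss` with Adam, yields the general shape of the model described above.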

Recognizing that SWD morphology varies substantially across subjects, the authors introduced a comprehensive data‑augmentation pipeline applied on‑the‑fly during training. Three augmentations were used: (1) amplitude scaling (probability 0.5) to simulate inter‑subject voltage differences, (2) Gaussian noise injection with a maximum signal‑to‑noise ratio of 0.005 to emulate recording artifacts, and (3) signal inversion (probability 0.2) to account for polarity reversals that can occur with different electrode placements. This stochastic augmentation strategy forces the network to learn invariant features, dramatically improving cross‑subject generalization.
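An on‑the‑fly version of this three‑transform pipeline might look as follows. The probabilities (0.5 for scaling, 0.2 for inversion) come from the text; the scaling range and the reading of "maximum signal‑to‑noise ratio of 0.005" as a noise standard deviation of up to 0.005× the signal's standard deviation are our assumptions.

```python
import numpy as np

def augment(sig, rng, p_scale=0.5, scale_range=(0.5, 2.0),
            noise_level=0.005, p_invert=0.2):
    """Stochastic training-time augmentation sketch (three transforms).
    scale_range and the noise formulation are assumptions, not the
    paper's exact parameters."""
    sig = sig.astype(float).copy()
    if rng.random() < p_scale:                        # (1) amplitude scaling
        sig *= rng.uniform(*scale_range)
    # (2) Gaussian noise injection with randomly drawn strength
    sig += rng.normal(0.0, noise_level * sig.std() * rng.random(), sig.shape)
    if rng.random() < p_invert:                       # (3) polarity inversion
        sig = -sig
    return sig
```

Applied freshly to each training window, the transforms present the network with a different rendition of every SWD on every epoch, which is what drives the invariance the authors report.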

Performance was benchmarked against 14 alternative machine‑learning classifiers (including traditional methods, CNN‑LSTM hybrids, EEGNet, InceptionTime, and transformer‑based models) and against the recently published “Twin Peaks” algorithm, which relies on handcrafted time‑frequency features. In a leave‑one‑mouse‑out cross‑validation across ten mice, AugUNet1D achieved an average Dice coefficient of 0.87, sensitivity of 0.91, and specificity of 0.94. In contrast, Twin Peaks yielded a Dice of approximately 0.78 under the same conditions. Notably, AugUNet1D maintained high performance during noisy and sleep periods, where many algorithms suffer increased false‑positive rates.
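For reference, the Dice coefficient reported above is computed over dense per‑time‑point binary labels; a minimal implementation (our sketch, not the authors' evaluation code) is:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice coefficient between two binary per-time-point label vectors:
    2|A & B| / (|A| + |B|). Equals 1 for perfect overlap, 0 for none."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)
```

Because it balances false positives and false negatives in a single score, it is a natural headline metric for dense segmentation alongside sensitivity and specificity.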

Both a pretrained version (trained on the authors’ annotated dataset) and an untrained version (with random initialization) are released publicly, enabling other researchers to fine‑tune the model on new datasets or to use it as a generic event‑detection backbone.

The study’s limitations include its focus on a single‑channel mouse EEG, which may not directly translate to multi‑channel human recordings. The 100 Hz sampling rate, while sufficient for the 3‑Hz SWD rhythm, could miss finer high‑frequency components. Future work should test AugUNet1D on multi‑channel and human clinical data, explore higher sampling rates, and compare against emerging transformer‑based temporal detection models. Nonetheless, the paper demonstrates that a residual U‑Net combined with targeted data augmentation provides a robust, scalable solution for dense temporal segmentation of SWDs, substantially outperforming existing feature‑based methods and offering a valuable tool for large‑scale epilepsy research.

