AnySleep: a channel-agnostic deep learning system for high-resolution sleep staging in multi-center cohorts
Sleep is essential for good health throughout our lives, yet studying its dynamics requires manual sleep staging, a labor-intensive step in sleep research and clinical care. Across centers, polysomnography (PSG) recordings are traditionally scored in 30-s epochs for pragmatic, not physiological, reasons and can vary considerably in electrode count, montage, and subject characteristics. These constraints present challenges in conducting harmonized multi-center sleep studies and discovering novel, robust biomarkers on shorter timescales. Here, we present AnySleep, a deep neural network model that uses any electroencephalography (EEG) or electrooculography (EOG) data to score sleep at adjustable temporal resolutions. We trained and validated the model on over 19,000 overnight recordings from 21 datasets collected across multiple clinics, spanning nearly 200,000 hours of EEG and EOG data, to promote robust generalization across sites. The model attains state-of-the-art performance and surpasses or equals established baselines at 30-s epochs. Performance improves as more channels are provided, yet remains strong when EOG is absent or when only EOG or single EEG derivations (frontal, central, or occipital) are available. On sub-30-s timescales, the model captures short wake intrusions consistent with arousals and improves prediction of physiological characteristics (age, sex) and pathophysiological conditions (sleep apnea), relative to standard 30-s scoring. We make the model publicly available to facilitate large-scale studies with heterogeneous electrode setups and to accelerate the discovery of novel biomarkers in sleep.
💡 Research Summary
Sleep plays a central role in health, yet conventional sleep staging relies on manual scoring of 30‑second epochs, a convention driven by practical constraints rather than physiology. This limits the temporal granularity at which sleep dynamics can be examined and makes large‑scale, multi‑center studies cumbersome because electrode montages differ across sites. In response, the authors present AnySleep, a deep neural network that can ingest any combination of EEG and EOG channels and produce sleep stage predictions at adjustable temporal resolutions ranging from the standard 30 s down to 0.008 s (128 Hz).
The architecture combines a U‑Net‑style encoder‑decoder with channel‑attention modules. Each input channel is processed independently through encoder blocks to generate channel‑specific feature maps; learnable attention weights then fuse these maps into a cross‑channel representation that the decoder uses to output stage probabilities at the desired frequency. During training, the number and type of input channels are randomly varied, encouraging the model to become robust to heterogeneous montages.
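The fusion step described above can be sketched in a few lines: per-channel feature maps are combined into one cross-channel representation by softmax-normalized attention weights. This is a simplified NumPy illustration under assumed shapes, not the paper's actual learned module (function names and the toy dimensions are hypothetical).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_channels(feature_maps, attention_logits):
    """Fuse per-channel feature maps (channels, time, features) into a single
    (time, features) representation via attention weights over channels.
    Simplified sketch; in the model these weights are learned."""
    weights = softmax(attention_logits)            # one weight per channel
    # Weighted sum over the channel axis.
    return np.tensordot(weights, feature_maps, axes=1)

# Toy example: 3 input channels, 10 time steps, 8 features per step.
rng = np.random.default_rng(0)
maps = rng.standard_normal((3, 10, 8))
fused = fuse_channels(maps, np.array([0.2, 1.5, -0.3]))
```

Because the attention weights are computed per recording, the same decoder can consume a fused representation regardless of how many channels were present at the input, which is what makes the random channel dropping during training effective.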
Training and validation were performed on an unprecedented collection of 19,909 overnight polysomnographies from 21 datasets, encompassing roughly 200,000 hours of EEG/EOG data collected at many clinics and spanning diverse patient populations. The data were split into an in‑distribution set (13 datasets) for model development and a hold‑out set (8 datasets) for unbiased generalization testing.
At the conventional 30‑second epoch level, AnySleep achieved macro‑averaged F1 scores between 0.71 and 0.76 across all tested channel configurations, including single‑channel scenarios and cases without any EOG. Performance improved monotonically with the number of EEG channels, reaching 0.771 when six EEG channels plus one EOG were supplied. By contrast, the previously published U‑Sleep model requires exactly one EEG and one EOG channel; when forced to handle other configurations its performance dropped markedly (average macro‑F1 reductions of 0.16–0.20). AnySleep therefore demonstrates superior flexibility and consistently higher accuracy across a wide range of realistic recording setups.
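The macro-averaged F1 reported above is the unweighted mean of per-stage F1 scores, so rare stages count as much as common ones. A minimal toy implementation (not the authors' evaluation code) makes the metric concrete:

```python
import numpy as np

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores.
    Toy implementation for illustration only."""
    f1s = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))
```

Averaging over classes rather than epochs is the standard choice in sleep staging because stages such as N1 are heavily underrepresented in a night of sleep.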
The high‑frequency output (up to 128 Hz) enables the detection of brief events that are invisible to 30‑second staging. When evaluated on the MASS C1 and C3 datasets, AnySleep’s 2‑second resolution predictions overlapped 57 % of expert‑annotated arousals, compared with only 7.7 % overlap for standard staging. Intersection‑over‑Union analysis showed peak precision (0.475), recall (0.530) and F1 (0.442) for arousal detection at 2–8 second windows, confirming that the model learns to represent arousals as short wake segments.
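An intersection-over-union analysis of this kind scores each predicted wake segment against the annotated arousal intervals. The sketch below shows one plausible version of such a comparison; the IoU threshold and matching rule are illustrative assumptions, not the paper's exact protocol.

```python
def interval_iou(a, b):
    """IoU of two [start, end) intervals given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def event_scores(predicted, annotated, iou_thresh=0.3):
    """Match predicted wake segments to annotated arousals at an IoU
    threshold and return (precision, recall, f1).
    Threshold and one-to-many matching are simplifying assumptions."""
    tp = sum(1 for p in predicted
             if any(interval_iou(p, a) >= iou_thresh for a in annotated))
    hits = sum(1 for a in annotated
               if any(interval_iou(p, a) >= iou_thresh for p in predicted))
    prec = tp / len(predicted) if predicted else 0.0
    rec = hits / len(annotated) if annotated else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

At coarse 30-second resolution a several-second arousal rarely dominates its epoch, which is why standard staging overlaps so few arousals while the 2-second predictions capture most of them.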
Beyond event detection, the authors explored whether the fine‑grained stage dynamics carry information about subject characteristics. Using “triplet features” (counts of consecutive three‑stage sequences) derived from high‑resolution predictions, they were able to discriminate age groups, sex, and the presence of obstructive sleep apnea (OSA) with statistically significant differences, supporting the hypothesis that fragmented sleep patterns reflect underlying physiological or pathological states.
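Counting consecutive three-stage sequences can be done with a sliding window over the high-resolution hypnogram. The snippet below is a hypothetical reconstruction of such "triplet features" from the summary's description, using invented stage labels:

```python
from collections import Counter

def triplet_features(hypnogram):
    """Count every consecutive three-stage sequence in a hypnogram.
    Sketch of the 'triplet features' described in the summary; the
    exact feature definition may differ in the paper."""
    return Counter(tuple(hypnogram[i:i + 3])
                   for i in range(len(hypnogram) - 2))

# Toy hypnogram: brief wake intrusions into N2 sleep.
stages = ["N2", "N2", "W", "N2", "N2", "W", "N2"]
feats = triplet_features(stages)
# feats[("N2", "W", "N2")] counts isolated wake intrusions -> 2 here.
```

Sequences such as ("N2", "W", "N2") directly encode sleep fragmentation, which is plausibly why these counts separate OSA patients and age groups.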
AnySleep’s code and pretrained weights are publicly released on GitHub (https://github.com/dslaborg/anysleep), allowing immediate deployment on heterogeneous datasets without the need for manual channel mapping or re‑training. This openness, combined with the model’s channel‑agnostic design and sub‑second temporal resolution, positions AnySleep as a practical tool for harmonizing sleep staging across sites, reducing data loss due to montage incompatibility, and accelerating the discovery of novel sleep biomarkers in large‑scale, multi‑center studies.