SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model
Current foundation models for photoplethysmography (PPG) signals are challenged by the intrinsic redundancy and noise of the signal. Standard masked modeling often yields trivial solutions, while contrastive methods lack morphological precision. To address these limitations, we propose a Statistical-prior Informed Generative Masking Architecture (SIGMA-PPG), a generative foundation model featuring a Prior-Guided Adversarial Masking mechanism, in which a reinforcement learning-driven teacher leverages statistical priors to create challenging learning paths that prevent overfitting to noise. We also incorporate a semantic consistency constraint via vector quantization to ensure that physiologically identical waveforms (even those altered by recording artifacts or minor perturbations) map to shared indices. This enhances codebook semantic density and eliminates redundant feature structures. Pre-trained on over 120,000 hours of data, SIGMA-PPG achieves superior average performance compared to five state-of-the-art baselines across 12 diverse downstream tasks. The code is available at https://github.com/ZonghengGuo/SigmaPPG.
💡 Research Summary
The paper introduces SIGMA‑PPG, a generative foundation model for photoplethysmography (PPG) signals that explicitly tackles the intrinsic redundancy and low signal‑to‑noise ratio of PPG recordings. The architecture consists of two cascaded stages. In Stage 1, a spectrum‑aware vector‑quantized variational auto‑encoder (VQ‑VAE) tokenizes raw one‑second patches into discrete semantic tokens. Unlike conventional time‑domain reconstruction losses, the decoder is trained to reconstruct the power spectral density (PSD) of each patch, encouraging the codebook to capture physiologically relevant frequency components (e.g., heart‑rate peaks) while ignoring high‑frequency noise. A semantic consistency loss further forces the encoder to produce identical latent vectors for an original patch and its stochastic augmentations (scaling, Gaussian noise), ensuring that minor morphological variations map to the same codebook index and thus improving codebook density and robustness.
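The PSD reconstruction target and the augmentation-invariance idea behind the semantic consistency loss can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sampling rate (125 Hz), windowing choice, PSD normalization, and augmentation parameters are all assumptions, and the real model trains an encoder/codebook rather than comparing spectra directly.

```python
import numpy as np

def psd_target(patch):
    """Normalized power spectral density of a 1-second patch
    (simple windowed periodogram; the paper's exact estimator
    and normalization are not specified here)."""
    spec = np.abs(np.fft.rfft(patch * np.hanning(len(patch)))) ** 2
    return spec / spec.sum()  # decoder would predict this distribution

def augment(patch, rng, scale_range=(0.8, 1.2), noise_std=0.01):
    """Stochastic augmentation of the kind the semantic consistency
    loss uses: random amplitude scaling plus Gaussian noise
    (parameter values are illustrative assumptions)."""
    s = rng.uniform(*scale_range)
    return s * patch + rng.normal(0.0, noise_std, size=patch.shape)

# The consistency constraint asks the encoder to map a patch and its
# augmentation to the same codebook index; the PSD target itself is
# already largely invariant to such perturbations, which is what makes
# it a noise-robust reconstruction objective.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 125, endpoint=False)   # 1 s at an assumed 125 Hz
patch = np.sin(2 * np.pi * 1.2 * t)              # ~72 bpm fundamental
p1 = psd_target(patch)
p2 = psd_target(augment(patch, rng))
```

Because the PSD is normalized, amplitude scaling cancels out entirely, and small additive noise barely moves the dominant (heart-rate) peak, so `p1` and `p2` share the same peak frequency bin.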
Stage 2 replaces random masking with a Prior‑Guided Adversarial Masking mechanism implemented as a reinforcement‑learning (RL) teacher‑student game. The teacher network generates binary masks for the token sequence, but its logits are biased by statistical priors derived from each patch: an amplitude‑stability score (based on relative median‑absolute‑deviation and absolute validity gates) and a skewness score (capturing the asymmetry between systolic and diastolic phases). These priors are combined into a single prior score S_prior, which is added to the teacher’s raw logits with a scaling factor α. The teacher’s objective is to maximize the reconstruction loss (negative log‑likelihood) of the student, while the student—a bidirectional Transformer encoder—tries to predict the masked tokens. This adversarial interaction forces the student to go beyond trivial interpolation of adjacent cardiac cycles and to learn global morphological dependencies, especially around physiologically informative regions such as systolic peaks.
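The prior-biasing step described above can be sketched as follows. This is a hedged reading of the mechanism, not the paper's code: the relative-MAD-to-score mapping, the moment-based skewness estimator, and the β-weighted combination `S_prior = β·stability + (1−β)·skewness` are assumed forms, and the paper's "absolute validity gates" are not reproduced.

```python
import numpy as np

def amplitude_stability(patch, eps=1e-8):
    """Amplitude-stability score from relative median absolute
    deviation (lower relative MAD -> score closer to 1).
    The mapping 1/(1 + rel_mad) is an illustrative assumption."""
    med = np.median(patch)
    mad = np.median(np.abs(patch - med))
    rel_mad = mad / (np.abs(med) + eps)
    return 1.0 / (1.0 + rel_mad)

def skewness_score(patch, eps=1e-8):
    """Magnitude of the sample skewness, capturing the asymmetry
    between systolic and diastolic phases of the pulse."""
    x = patch - patch.mean()
    m2 = np.mean(x ** 2)
    m3 = np.mean(x ** 3)
    return np.abs(m3) / (m2 ** 1.5 + eps)

def biased_logits(logits, patches, alpha=1.0, beta=0.5):
    """Add the prior score to the teacher's raw mask logits with
    scaling factor alpha; beta balances the two priors
    (combination form assumed)."""
    s_prior = np.array([
        beta * amplitude_stability(p) + (1.0 - beta) * skewness_score(p)
        for p in patches
    ])
    return logits + alpha * s_prior

# Toy usage: three random 1-second patches, zero-initialized logits.
rng = np.random.default_rng(1)
patches = [rng.normal(size=125) for _ in range(3)]
biased = biased_logits(np.zeros(3), patches)
```

The teacher then samples its binary mask from these biased logits, so tokens with informative statistics (stable amplitude, pronounced skew) are masked more often, which is what pushes the student beyond interpolating adjacent cardiac cycles.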
The model is pre‑trained on an unprecedented 120,000 hours of publicly available PPG data and subsequently fine‑tuned on twelve downstream tasks spanning six datasets, including heart‑rate estimation, SpO₂ prediction, cuff‑less blood‑pressure inference, emotion recognition, and stress detection. Compared against five state‑of‑the‑art PPG foundation models (e.g., PaPaGei, Pulse‑PPG, AnyPPG), SIGMA‑PPG achieves superior average performance, with improvements of 3–7 percentage points in classification accuracy and notable reductions in mean absolute error on regression tasks. Ablation studies demonstrate that both the PSD‑based reconstruction and the semantic consistency loss substantially increase codebook semantic density, while the prior‑guided masking yields higher robustness under motion‑induced artifacts.
The authors acknowledge limitations: the RL teacher‑student loop incurs considerable computational overhead and is sensitive to hyper‑parameters (β controlling the balance between amplitude and skewness, α scaling the prior bias). The current implementation focuses on single‑channel PPG; extending the framework to multi‑channel or multimodal signals (e.g., ECG, accelerometer) remains future work. Nonetheless, SIGMA‑PPG provides a compelling blueprint for leveraging statistical priors and adversarial curriculum learning to build robust, generative PPG representations that can serve as a unified backbone for next‑generation medical AI and wearable health applications.