Weighted Temporal Decay Loss for Learning Wearable PPG Data with Sparse Clinical Labels

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Advances in wearable computing and AI have increased interest in leveraging PPG for health monitoring over the past decade. One of the biggest challenges in developing health algorithms based on such biosignals is the sparsity of clinical labels, which makes biosignals temporally distant from lab draws less reliable for supervision. To address this problem, we introduce a simple training strategy that learns a biomarker-specific decay of sample weight over the time gap between a segment and its ground truth label and uses this weight in the loss with a regularizer to prevent trivial solutions. On smartwatch PPG from 450 participants across 10 biomarkers, the approach improves over baselines. In the subject-wise setting, the proposed approach averages 0.715 AUPRC, compared to 0.674 for a fine-tuned self-supervised baseline and 0.626 for a feature-based Random Forest. A comparison of four decay families shows that a simple linear decay function is most robust on average. Beyond accuracy, the learned decay rates summarize how quickly each biomarker’s PPG evidence becomes stale, providing an interpretable view of temporal sensitivity.


💡 Research Summary

The paper tackles a fundamental obstacle in wearable health monitoring: the scarcity and temporal misalignment of clinical labels for photoplethysmography (PPG) data. While modern smartwatches can continuously record high‑frequency PPG signals, most biomarkers (cholesterol, HbA1c, electrolytes, blood counts, etc.) are only measured in the clinic at discrete time points, often weeks apart from the recorded signals. Traditional approaches either discard all PPG segments that are not temporally close to a lab draw or treat every segment within a fixed window as equally reliable. Both strategies ignore the fact that the physiological relevance of a PPG segment decays as the time gap to the ground‑truth measurement grows, introducing label noise that degrades model performance.

Proposed solution – Weighted Temporal Decay Loss
The authors introduce a loss function that explicitly incorporates the absolute time difference Δt (in days) between a 10‑second PPG segment and its nearest clinical label. For each biomarker b, a learnable decay rate α̂_b (constrained to be non‑negative via a softplus) modulates a monotonically decreasing decay function g(·). The weight assigned to segment i is

 w_i = g(α̂_b·Δt_i).

Four families of g(·) are examined: linear, exponential, inverse, and cosine‑annealing. The per‑segment binary cross‑entropy (BCE) loss is multiplied by w_i, and a regularization term λ·(1/N)∑w_i is subtracted to penalize the trivial solution α̂_b → ∞, which would otherwise drive all w_i to zero. λ is fixed at 0.5 for all experiments, eliminating per‑biomarker hyper‑parameter tuning. Importantly, the weighting is applied only during training; inference uses the underlying neural network without any additional computation, preserving real‑time deployment efficiency.
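A minimal PyTorch sketch of this loss is shown below. It follows the description above (softplus‑constrained α̂_b, w_i = g(α̂_b·Δt_i), subtracted λ·mean(w) regularizer), but the class and parameter names are illustrative, not the authors' code, and the cosine‑annealing variant is omitted since its exact parameterization is not given here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedTemporalDecayLoss(nn.Module):
    """Sketch of the weighted temporal decay BCE for one biomarker.

    `decay` selects the g(.) family; exact forms are assumptions based on
    the summary above, not the authors' implementation.
    """

    def __init__(self, decay="linear", lam=0.5):
        super().__init__()
        # Raw scalar parameter; softplus keeps the effective rate non-negative.
        self.alpha_raw = nn.Parameter(torch.zeros(1))
        self.decay = decay
        self.lam = lam  # fixed at 0.5 in the paper

    def g(self, x):
        if self.decay == "linear":
            return torch.clamp(1.0 - x, min=0.0)
        if self.decay == "exponential":
            return torch.exp(-x)
        if self.decay == "inverse":
            return 1.0 / (1.0 + x)
        raise ValueError(f"unknown decay family: {self.decay}")

    def forward(self, logits, targets, dt_days):
        alpha = F.softplus(self.alpha_raw)   # α̂_b ≥ 0
        w = self.g(alpha * dt_days)          # w_i = g(α̂_b·Δt_i)
        bce = F.binary_cross_entropy_with_logits(
            logits, targets, reduction="none")
        # Subtracting λ·mean(w) penalizes the trivial solution where the
        # learned rate inflates and all weights collapse to zero.
        return (w * bce).mean() - self.lam * w.mean()
```

Because the weighting only rescales the training loss, swapping this module in for a plain BCE leaves the forward pass at inference time unchanged.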

Dataset and preprocessing
Data were collected from 450 participants (236 F, 214 M, age 24‑92) wearing Samsung Galaxy Watch 6 over a ten‑month period (Oct 2024–Aug 2025). Green‑light PPG was sampled at 25 Hz, segmented into non‑overlapping 10‑second windows, and filtered (0.5‑5 Hz band‑pass) after discarding low‑quality segments using a signal‑quality index (SQI). Z‑score normalization was applied per segment. Clinical records provided measurements for ten biomarkers: LDL, triglycerides, HbA1c, hemoglobin, CO₂, chloride, potassium, sodium, white‑blood‑cell count, and platelets. For each biomarker, the top 25 % of values were labeled positive and the bottom 25 % negative; the middle 50 % were excluded. Segments whose nearest lab draw was more than 30 days away were also excluded, yielding a final subject count per biomarker ranging from 33 (triglycerides) to 86 (chloride).
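The quartile binarization and 30‑day exclusion described above can be sketched as follows; the DataFrame column names and helper function are hypothetical, chosen only to illustrate the filtering logic:

```python
import numpy as np
import pandas as pd

def make_labels(df, value_col="lab_value", dt_col="dt_days", max_gap=30):
    """Binarize one biomarker: top 25% of values -> 1, bottom 25% -> 0.

    Segments in the middle 50%, or whose nearest lab draw is more than
    `max_gap` days away, are dropped (as in the paper's preprocessing).
    """
    # Exclude segments too far from any lab draw.
    df = df[df[dt_col] <= max_gap].copy()
    # Quartile thresholds on the remaining biomarker values.
    lo, hi = df[value_col].quantile([0.25, 0.75])
    df["label"] = np.where(df[value_col] >= hi, 1,
                  np.where(df[value_col] <= lo, 0, -1))
    # Keep only the top- and bottom-quartile segments.
    return df[df["label"] >= 0]
```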

Model architecture and baselines
The backbone is the same encoder used in the state‑of‑the‑art self‑supervised PPG model PAPAGEI, pretrained on a large PPG corpus. A simple classification head is added, and the model is fine‑tuned using the proposed weighted loss. Two baselines are compared: (1) a Random Forest trained on 34 handcrafted features (morphology descriptors, HRV metrics, average heart rate) and (2) PAPAGEI fine‑tuned with the standard BCE loss (referred to as PAPAGEI‑FT). Five‑fold subject‑wise cross‑validation ensures that no subject appears in both training and test folds.
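The subject‑wise protocol can be illustrated with scikit‑learn's GroupKFold on synthetic data (this is a sketch of the evaluation split, not the authors' pipeline):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Placeholder data: 20 subjects, 5 PPG segments each.
segments = np.arange(100).reshape(-1, 1)   # stand-in features
labels = np.tile([0, 1], 50)               # stand-in binary labels
subjects = np.repeat(np.arange(20), 5)     # subject ID per segment

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(segments, labels, groups=subjects):
    # Every segment from a given participant lands on one side of the split,
    # so the model is never tested on a subject it has seen in training.
    assert set(subjects[train_idx]).isdisjoint(set(subjects[test_idx]))
```

Grouping by subject is what makes the reported numbers a generalization estimate to unseen people rather than to unseen segments of known people.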

Results
Across the ten biomarkers, the weighted‑decay model achieves an average AUROC of 0.712 and AUPRC of 0.715, outperforming PAPAGEI‑FT (0.660/0.674) and Random Forest (0.599/0.626). The most pronounced gains are observed for biomarkers with rapid physiological dynamics: potassium (AUROC 0.724) and white‑blood‑cell count (AUROC 0.843). The linear decay function yields the best overall performance (0.712/0.715), while inverse and exponential decays are close, and cosine‑annealing lags behind. The authors hypothesize that linear decay provides a clear cut‑off (≈ 1/α) that eliminates gradients from very distant samples while still preserving a smooth decrease, avoiding early saturation (exponential) or long tails (inverse).
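The cut‑off hypothesis can be checked numerically: with a linear decay, weights reach exactly zero at Δt = 1/α, while exponential and inverse decays keep assigning (small) nonzero weight to arbitrarily distant segments. The rate below is a hypothetical value, and the cosine‑annealing form is one plausible parameterization (an assumption, since the paper's exact form is not given here):

```python
import numpy as np

alpha = 0.1                                  # hypothetical rate, in 1/days
dt = np.array([0.0, 5.0, 10.0, 20.0, 30.0])  # days from segment to lab draw
x = alpha * dt

linear = np.clip(1.0 - x, 0.0, None)          # zero for all dt >= 1/alpha
exponential = np.exp(-x)                      # decays fast, never reaches 0
inverse = 1.0 / (1.0 + x)                     # long tail, never reaches 0
cosine = 0.5 * (1.0 + np.cos(np.pi * np.clip(x, 0.0, 1.0)))  # assumed form
```

At Δt = 10 days (= 1/α) the linear weight is exactly zero, so more distant segments contribute no gradient; at Δt = 30 days the inverse weight (0.25) is still five times the exponential weight (≈ 0.05), which matches the long‑tail versus early‑saturation contrast the authors describe.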

Ablation studies confirm the importance of each component: removing the learnable α̂_b (fixing decay) drops performance to 0.676/0.694; removing the time‑aware loss altogether reduces it further to the baseline level of 0.660/0.674. Thus, the temporal weighting accounts for the bulk of the improvement, and learning a biomarker‑specific decay rate adds a modest but consistent boost.

Discussion and limitations
The method excels when the biomarker’s true value changes quickly relative to the labeling interval, as the model can down‑weight stale segments that would otherwise inject noise. Triglycerides are an outlier where the baseline PAPAGEI outperforms the proposed approach, likely due to the small cohort (N = 33) and the strong influence of recent diet, which makes a single lab draw a poor anchor for a 30‑day window. The authors acknowledge that a fixed 30‑day window is suboptimal; different biomarkers may require longer or shorter horizons. Future work could learn the window length jointly with the decay rate, perhaps via a differentiable mask over Δt. Moreover, the study is limited to a single device family and health system, so external validation on other wearables, populations, and clinical workflows is essential before deployment.

Conclusion
The paper introduces a principled, easy‑to‑implement training strategy that incorporates temporal proximity into the loss function, enabling deep learning models to exploit all available PPG data even when clinical labels are sparse and temporally misaligned. By learning biomarker‑specific decay rates, the method not only improves predictive performance across a diverse set of ten biomarkers—including electrolytes and lipids that are traditionally difficult to infer from PPG—but also yields interpretable decay parameters that reflect how quickly PPG evidence becomes stale for each biomarker. The approach adds no inference overhead, making it attractive for real‑world continuous health monitoring. Future directions include external validation, multi‑modal sensor fusion, and adaptive window learning to further enhance robustness and generalizability.

