PPG-Based Heart Rate Accuracy in Diverse Populations: Investigating Inequities Across Body Composition and Skin Tones


Wearable devices are widely used for heart rate (HR) monitoring, yet their accuracy across diverse body compositions and skin tones remains uncertain. This study evaluated four wrist-worn devices (Apple, Fitbit, Samsung, Garmin) in 58 Hispanic adults with Fitzpatrick skin types III to V during a cycling protocol alternating moderate (64 to 76% of HRmax) and vigorous (77 to 95% of HRmax) intensities. Criterion HR was obtained using a Polar H10 ECG, and accuracy was assessed using mean absolute error (MAE), mean absolute percentage error (MAPE), bias, and intraclass correlation coefficients (ICC). All devices deviated significantly from the criterion measure. Apple and Garmin demonstrated the lowest error, whereas Fitbit and Samsung exhibited greater inaccuracies. Higher BMI and darker skin tones were associated with increased MAPE. These biases disproportionately affect higher-risk populations, underscoring the need for improved algorithms to ensure equitable health monitoring.


💡 Research Summary

This study examined the heart‑rate (HR) accuracy of four commercially available wrist‑worn wearables—Apple Watch Series 8, Garmin Forerunner 955, Fitbit Sense 2, and Samsung Galaxy Watch 5—among 58 Hispanic adults (31 women, 27 men; mean age 23 years) with Fitzpatrick skin types III–V. Participants performed a structured cycling protocol on a recumbent ergometer that alternated 2‑minute intervals of moderate (64‑76 % HRmax) and vigorous (77‑95 % HRmax) intensity, preceded by a 5‑minute rest and followed by a 5‑minute recovery. A Polar H10 chest‑strap ECG served as the criterion reference.
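The intensity bands in the protocol can be expressed as a small classifier. The following is a minimal sketch, not the authors' code: the function name `classify_intensity` and the labels for values outside the two bands are hypothetical, and because the summary does not say how values between 76 % and 77 % were handled, the sketch assumes the moderate band extends up to (but not including) 77 %. HRmax is taken as an input, since the summary does not state how it was determined.

```python
def classify_intensity(hr_bpm: float, hr_max_bpm: float) -> str:
    """Label a heart rate by its fraction of HRmax.

    Bands follow the protocol described above (moderate 64-76 % HRmax,
    vigorous 77-95 % HRmax). Treating 76-77 % as moderate and naming the
    out-of-band labels are assumptions made for this illustration.
    """
    if hr_max_bpm <= 0:
        raise ValueError("hr_max_bpm must be positive")

    fraction = hr_bpm / hr_max_bpm
    if fraction < 0.64:
        return "below moderate"
    if fraction < 0.77:
        return "moderate"
    if fraction <= 0.95:
        return "vigorous"
    return "above vigorous"
```

For example, with an HRmax of 200 bpm, a reading of 140 bpm (70 % of HRmax) falls in the moderate band and 160 bpm (80 %) in the vigorous band.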

Error metrics (Mean Absolute Error, Mean Absolute Percentage Error, signed bias, and Intraclass Correlation Coefficient) were calculated for each device. Non‑parametric analyses (Wilcoxon signed‑rank, Kruskal‑Wallis, Scheirer‑Ray‑Hare, and Spearman correlations) were employed due to skewed distributions and heteroscedasticity. All devices deviated significantly from the ECG (p < .001). Apple and Garmin showed the lowest MAE (≈ 3–4 bpm) and MAPE (≈ 3 %), with ICCs > 0.90, indicating high reliability. Fitbit and Samsung exhibited higher MAE (≈ 7 bpm) and MAPE (≈ 6 %), with ICCs around 0.75–0.78.
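To make the three simpler error metrics concrete, here is a minimal sketch using their standard definitions, assuming time-aligned device and criterion readings in bpm. The function name `hr_error_metrics` and the sign convention for bias (device minus criterion) are assumptions for illustration; the summary does not describe the authors' implementation, and the ICC is omitted because it depends on a choice of ICC model that the summary does not specify.

```python
from statistics import mean


def hr_error_metrics(device_bpm, criterion_bpm):
    """Return MAE (bpm), MAPE (%), and signed bias (bpm).

    Assumes both sequences are time-aligned and equally long.
    Bias is defined here as device minus criterion (an assumption),
    so a positive value means the device reads high on average.
    """
    if len(device_bpm) != len(criterion_bpm) or len(device_bpm) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed error for each paired reading.
    diffs = [d - c for d, c in zip(device_bpm, criterion_bpm)]

    mae = mean([abs(e) for e in diffs])
    # Percentage error is taken relative to the criterion reading.
    mape = 100 * mean([abs(e) / c for e, c in zip(diffs, criterion_bpm)])
    bias = mean(diffs)

    return {"mae": mae, "mape": mape, "bias": bias}
```

Reporting MAE and bias together is useful because errors of opposite sign cancel in the bias but not in the MAE: a device can show near-zero bias while still having a large absolute error.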

BMI and skin tone both moderated error. Participants with BMI ≥ 30 kg/m² had larger MAPE, rising from ~4.5 % to ~7.2 %. Darker skin (Fitzpatrick V) further amplified error, with MAPE approaching 8 % overall and reaching up to 12 % for the Fitbit device in the high‑BMI/dark‑skin subgroup. Spearman correlations confirmed that both BMI (ρ = 0.42) and skin tone (ρ = 0.35) were positively associated with MAPE. Age, sex, and body‑fat percentage did not significantly affect accuracy.
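Spearman's ρ is the Pearson correlation computed on ranks, which is why it suits ordinal variables such as Fitzpatrick type and skewed variables such as MAPE. The sketch below is a plain-Python illustration of that definition with average ranks for ties; the helper names `average_ranks` and `spearman_rho` are hypothetical, and this is not the authors' analysis code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")

    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5
```

Here `x` could hold per-participant BMI or Fitzpatrick type and `y` the per-participant MAPE for a given device. Because Fitzpatrick type takes only a few distinct values, ties are common, which is why the average-rank handling matters.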

The authors note several limitations: the sample is limited to Hispanic adults, the protocol restricts movement to a stationary bike, and data from the Empatica E4 were excluded. Nonetheless, the findings reveal systematic bias of current PPG‑based wearables against darker‑skinned and higher‑BMI individuals, raising concerns about health equity. The paper calls for algorithmic adjustments, sensor redesign, and broader validation studies that include diverse phenotypes to ensure equitable wearable health monitoring.

