Investigating the Generalizability of ECG Noise Detection Across Diverse Data Sources and Noise Types

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

Electrocardiograms (ECGs) are vital for monitoring cardiac health, enabling the assessment of heart rate variability (HRV), detection of arrhythmias, and diagnosis of cardiovascular conditions. However, ECG signals recorded from wearable devices are frequently corrupted by noise artifacts, particularly those arising from motion and large muscle activity, which distort R-peaks and the QRS complex. These distortions hinder reliable HRV analysis and increase the risk of clinical misinterpretation. Existing studies on ECG noise detection typically evaluate performance on a single dataset, limiting insight into the generalizability of such methods across diverse sensors and recording conditions. In this work, we propose an HRV-based machine learning approach to detect noisy ECG segments and evaluate its generalizability using cross-dataset experiments on four datasets collected in both controlled and uncontrolled settings. Our method achieves over 90% average accuracy and an AUPRC exceeding 90%, even on previously unseen datasets, demonstrating robust performance across heterogeneous data sources. To support reproducibility and further research, we also release a curated and labeled ECG dataset annotated for noise artifacts.

💡 Research Summary

The paper addresses a critical challenge in wearable electrocardiogram (ECG) monitoring: the frequent corruption of signals by motion‑related and large muscle artifacts that obscure R‑peaks and the QRS complex, thereby compromising heart‑rate‑variability (HRV) analysis and clinical interpretation. While many prior works on ECG noise detection evaluate performance on a single dataset, this study explicitly investigates the generalizability of a noise‑detection method across heterogeneous data sources and noise types.

The authors propose a machine‑learning pipeline that relies exclusively on time‑domain HRV features extracted from short (20‑second) ECG windows. After segmenting the raw recordings with 50 % overlap, they apply a band‑pass filter (3–45 Hz) and a moving‑average smoother (implemented via BioSPPy) to suppress baseline wander, power‑line interference, and minor muscle noise. R‑peak detection is performed using the Hamilton segmenter in BioSPPy, followed by NeuroKit2 to compute RR intervals. From each window, 22 HRV descriptors are derived, including MeanNN, SDNN, RMSSD, SDSD, CVNN, and pNN50, providing a compact yet informative representation of beat‑to‑beat variability.
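The windowing and time-domain HRV steps above can be sketched in plain NumPy. This is a minimal illustration, not the authors' code. It assumes R-peak sample indices are already available (in the paper they come from BioSPPy's Hamilton segmenter), and it covers only six of the 22 descriptors. It also uses common textbook definitions, such as the sample standard deviation with `ddof=1`, which may differ in detail from NeuroKit2's implementation. The function names `sliding_windows` and `time_domain_hrv` are hypothetical.

```python
import numpy as np

def sliding_windows(n_samples, fs, win_s=20.0, overlap=0.5):
    """Return (start, stop) sample indices of fixed-length windows with fractional overlap."""
    win = int(round(win_s * fs))              # window length in samples
    step = int(round(win * (1.0 - overlap)))  # hop size; 50 % overlap -> half a window
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]

def time_domain_hrv(rpeaks, fs):
    """Compute six common time-domain HRV features from R-peak sample indices."""
    names = ["MeanNN", "SDNN", "RMSSD", "SDSD", "CVNN", "pNN50"]
    rr = np.diff(np.asarray(rpeaks, dtype=float)) * (1000.0 / fs)  # RR intervals in ms
    if len(rr) < 3:  # too few beats for SDSD with ddof=1; return NaNs instead
        return dict.fromkeys(names, float("nan"))
    drr = np.diff(rr)  # successive RR differences in ms
    mean_nn = rr.mean()
    sdnn = rr.std(ddof=1)
    values = [
        mean_nn,                              # MeanNN
        sdnn,                                 # SDNN
        np.sqrt(np.mean(drr ** 2)),           # RMSSD
        drr.std(ddof=1),                      # SDSD
        sdnn / mean_nn,                       # CVNN
        100.0 * np.mean(np.abs(drr) > 50.0),  # pNN50: % of successive differences > 50 ms
    ]
    return dict(zip(names, (float(v) for v in values)))
```

To build one feature row per window, select the R-peaks satisfying `start <= peak < stop` for each window and pass them to `time_domain_hrv`. Only differences between peaks enter the features, so the indices need no re-offsetting.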

Four datasets are used for evaluation: (1) a newly collected Speech Performance Dataset (D1) recorded with three‑lead Shimmer sensors at 1024 Hz in a controlled lab setting; (2) the publicly available BUTQDB (D2), comprising long‑term single‑lead recordings from everyday activities; (3) an extended version of MIT‑BIH‑NST (D3) where realistic muscle and motion artifacts are synthetically added at 0 dB and –6 dB SNR; and (4) a newly released Activity Dataset (D4) captured during sitting, standing, and walking with Shimmer sensors. Two experienced annotators manually labeled noisy segments based on the visibility of R‑peaks, and a window was marked noisy if at least half of its samples were annotated as noisy.
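The "at least half of its samples" labeling rule fits in a few lines. This minimal sketch assumes the annotations are stored as hypothetical `(start, end)` sample intervals (end exclusive) and that windows are given as `(start, stop)` sample-index pairs. The summary does not describe the paper's actual annotation format, and the helper names are illustrative.

```python
import numpy as np

def intervals_to_mask(n_samples, noisy_intervals):
    """Convert annotated noisy (start, end) sample intervals (end exclusive) into a boolean mask."""
    mask = np.zeros(n_samples, dtype=bool)
    for start, end in noisy_intervals:
        mask[max(start, 0):min(end, n_samples)] = True
    return mask

def label_windows(mask, windows, threshold=0.5):
    """Label each (non-empty) window noisy (1) if at least `threshold` of its samples are noisy."""
    return np.array([int(mask[s:e].mean() >= threshold) for s, e in windows])
```

With `threshold=0.5` a window that is exactly half noisy counts as noisy, matching the "at least half" wording.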

Several classifiers are trained and compared: Logistic Regression, Support Vector Machine, Decision Tree, Gradient Boosting, and Random Forest, as well as a shallow ANN, a 1‑D CNN, and VGG‑style deep networks. Stratified 5‑fold cross‑validation is used for within‑dataset assessment, while cross‑dataset experiments involve training on one dataset and testing on the remaining three. Random Forest consistently outperforms the other models, achieving an average accuracy of 91.1 % and an area under the precision‑recall curve (AUPRC) exceeding 90 % in cross‑dataset tests. When all datasets are combined for training, accuracy rises to 93.6 %, demonstrating strong robustness to variations in sensor hardware, sampling rates, and noise characteristics.
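The cross-dataset protocol can be sketched as a loop that trains once per dataset and scores every other dataset. This is an illustrative sketch, not the authors' evaluation code, and it rests on several assumptions:

- `make_model` is a hypothetical factory for any classifier with a scikit-learn-style `fit`/`predict_proba` interface, such as `sklearn.ensemble.RandomForestClassifier`.
- Accuracy uses a 0.5 probability threshold.
- Each held-out dataset is scored separately rather than pooled; the summary does not say which.
- AUPRC is approximated by average precision, with tied scores broken by input order.

```python
import numpy as np

def average_precision(y_true, scores):
    """AUPRC summarized as average precision: mean precision at the rank of each positive."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="mergesort")  # stable: ties keep input order
    hits = y_true[order] == 1
    if not hits.any():
        return float("nan")  # undefined without positives
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())

def evaluate_cross_dataset(datasets, make_model):
    """datasets: dict name -> (X, y) with labels 0 (clean) / 1 (noisy).

    Trains one model per dataset and tests it on every other dataset.
    """
    results = {}
    for train_name, (X_tr, y_tr) in datasets.items():
        model = make_model().fit(X_tr, y_tr)  # sklearn estimators return self from fit
        for test_name, (X_te, y_te) in datasets.items():
            if test_name == train_name:
                continue
            p = model.predict_proba(X_te)[:, 1]  # column 1 = class 1 when classes_ == [0, 1]
            acc = float(np.mean((p >= 0.5) == (np.asarray(y_te) == 1)))
            results[(train_name, test_name)] = {"accuracy": acc, "auprc": average_precision(y_te, p)}
    return results
```

With four datasets this yields 4 × 3 = 12 train/test pairs. A call might look like `evaluate_cross_dataset(datasets, lambda: RandomForestClassifier(n_estimators=200))`, where `datasets` maps D1 to D4 to their feature matrices and labels.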

The contributions are threefold: (i) introducing an HRV‑based, feature‑light approach that attains high noise‑detection performance without deep‑learning complexity; (ii) providing the first systematic cross‑dataset evaluation of ECG noise detection, thereby highlighting the importance of generalizability; and (iii) releasing a curated, annotated dataset (D4) to facilitate reproducibility and future research.

Limitations include reliance on 20‑second windows, which may miss very brief artifacts, and the binary definition of noise that could overlook subtle degradations relevant in clinical practice. Future work should explore multi‑scale HRV features, incorporate frequency‑domain and nonlinear descriptors, develop lightweight real‑time models, and examine interactions with pathological rhythms such as atrial fibrillation. Overall, the study demonstrates that HRV‑derived time‑domain features are sufficiently discriminative to enable robust, generalizable ECG noise detection across diverse wearable recording scenarios.
