Densely Connected Convolutional Networks and Signal Quality Analysis to Detect Atrial Fibrillation Using Short Single-Lead ECG Recordings

Jonathan Rubin 1*, Saman Parvaneh 1*, Asif Rahman 1, Bryan Conroy 1, Saeed Babaeizadeh 2

1 Philips Research North America, Cambridge, MA, USA
2 Advanced Algorithm Research Center, Philips Healthcare, Andover, MA, USA

Abstract

The development of new technology such as wearables that record high-quality single-channel ECG provides an opportunity for ECG screening in a larger population, especially for atrial fibrillation screening. The main goal of this study is to develop an automatic classification algorithm for normal sinus rhythm (NSR), atrial fibrillation (AF), other rhythms (O), and noise from a short single-channel ECG segment (9-60 seconds). For this purpose, a signal quality index (SQI) along with dense convolutional neural networks was used. Two convolutional neural network (CNN) models (a main model that accepts 15-second ECG segments and a secondary model that processes shorter 9-second ECG segments) were trained using the training data set. If the recording is determined to be of low quality by the SQI, it is immediately classified as noisy. Otherwise, it is transformed to a time-frequency representation and classified with the CNN as NSR, AF, O, or noise. At the final step, a feature-based post-processing algorithm classifies the rhythm as either NSR or O in case the CNN model's discrimination between the two is indeterminate. The best result achieved at the official phase of the PhysioNet/CinC challenge on the blind test set was 0.80 (F1 for NSR, AF, and O were 0.90, 0.80, and 0.70, respectively).

1. Introduction

Atrial fibrillation (AF) is the most common heart arrhythmia, and its incidence in the United States alone is estimated to be 2.7-6.1 million people [1]. As such, AF screening using handheld, easy-to-use devices has received a lot of attention in recent years.
The goal of the 2017 PhysioNet/CinC Challenge is the development of algorithms to classify normal sinus rhythm (NSR), AF, other rhythm (O), and noisy recordings from a short single-channel ECG recording (9-60 seconds). In light of the successful use of deep neural networks for classification of biomedical time series (e.g., heart sounds) [2-5], a signal quality index (SQI) technique along with dense convolutional neural networks (CNNs) trained on spectrogram representations was used for classification of ECG recordings.

* Authors contributed equally

2. Method and Material

A block diagram of our proposed method is shown in Figure 1. Given an ECG recording, QRS detection takes place first, followed by signal quality analysis. If the recording is judged to be of low quality (further details in Section 2.2), it is immediately classified as noisy (noise detected by SQI in Figure 1). If the recording quality is determined to be reasonable (SQI > 0.5), the ECG is transformed from a one-dimensional time series to a time-frequency representation and subsequently evaluated using one of two CNNs, depending on the recording length. The first model accepts 15-second ECG segments as input. However, if the input recording is shorter than 15 seconds, a secondary model that processes 9-second ECG segments is used.

Both models employ the Densely Connected Convolutional Network (DenseNet) architecture [6]. In contrast to a standard CNN architecture, each layer within a DenseNet architecture receives the concatenated feature-maps of all preceding layers as input. Figure 2 illustrates this concept, where arrows indicate reused feature-maps from previous layers in a five-layer dense block. Each DenseNet model accepts as input a spectrogram segment computed by consecutive Fourier transforms (details can be found in Section 2.4).
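The feature-map reuse described above can be made concrete with a little channel bookkeeping. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the single input channel is an assumption, and each "layer" only tracks feature-map names rather than computing convolutions.

```python
"""Channel bookkeeping for one dense block (illustrative sketch only)."""
from typing import List


def dense_block_channels(in_channels: int, num_layers: int, growth_rate: int) -> List[int]:
    """Return the number of input feature-maps seen by each layer.

    Layer i (0-based) receives the block input plus the growth_rate
    feature-maps produced by each of the i layers before it.
    """
    return [in_channels + i * growth_rate for i in range(num_layers)]


def dense_block_forward(inputs: List[str], num_layers: int, growth_rate: int) -> List[str]:
    """Toy forward pass that tracks feature-maps by name instead of by value."""
    features = list(inputs)
    for layer in range(1, num_layers + 1):
        seen = len(features)  # this layer reads every map produced so far
        new_maps = [f"L{layer}_m{m}_from{seen}" for m in range(growth_rate)]
        features.extend(new_maps)  # concatenation along the channel axis
    return features


if __name__ == "__main__":
    # Five layers with a growth rate of 4, matching the Figure 2 caption;
    # one input channel is an assumption made for this example.
    print(dense_block_channels(in_channels=1, num_layers=5, growth_rate=4))
    print(len(dense_block_forward(["input"], num_layers=5, growth_rate=4)))
```

Under these assumptions the per-layer input widths grow linearly (1, 5, 9, 13, 17), and the block ends with 21 feature-maps available to whatever follows it.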
The original DenseNet architecture was modified so that batch normalization [7] could be performed row-wise (i.e., normalizing over frequency bins per batch). This modification resulted in networks that outperformed standard channel-wide batch normalization. If the input ECG was labelled as NSR or O by the DenseNet model, an additional check was performed by a post-processing unit (further details in Section 2.5).

2.1. Data Splitting and Augmentation

The training set for the challenge included 8,528 single-channel ECG recordings (NSR: 5,050; AF: 738; other rhythm: 2,456; noisy: 284). Details about the challenge dataset can be found in [8]. A 5-fold stratified split was applied to the 8,528 ECG recordings made available by the challenge organizers. Stratified splitting was used to maintain class prevalence between the data splits. Recordings from four of the splits were used to construct a training/validation set (6,821 ECG recordings) made up of the QRS-aligned spectrogram segments. The training set included 80% of these recordings; the other 20% were used as a validation set during model training. The remaining stratified split, consisting of 1,707 ECG recordings, was kept as an in-house test set for assessing algorithm performance, independent of the blind challenge test dataset.

A further 6,312 30-second ECG segments representing atrial fibrillation were collected from various sources (including ambulatory recordings from Holter monitors) and used to augment the training and validation sets. Baseline wander was removed from each AF segment, and the segment was up-sampled from 250 to 300 samples per second to match the sampling rate of the challenge dataset.

2.2. QRS Detection and Signal Quality Assessment

After removing baseline wander using a moving average filter, QRS complexes were detected using the gqrs algorithm, publicly available in the WFDB toolbox [9].
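As a rough illustration of moving-average baseline removal, the sketch below estimates the baseline with a centered moving average and subtracts it from the signal. The paper does not state the filter length, so the 0.6-second window, the edge handling, and the function names are assumptions chosen only for this example.

```python
from typing import List, Sequence


def moving_average(signal: Sequence[float], window: int) -> List[float]:
    """Centered moving average; the window shrinks near the edges.

    The effective interior window is 2 * (window // 2) + 1 samples.
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    half = window // 2
    n = len(signal)
    # Prefix sums make each window mean a constant-time operation.
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


def remove_baseline_wander(
    signal: Sequence[float], fs: float = 300.0, window_seconds: float = 0.6
) -> List[float]:
    """Subtract a moving-average estimate of the baseline from the signal.

    window_seconds is a hypothetical value, not taken from the paper.
    """
    window = max(1, int(round(window_seconds * fs)))
    baseline = moving_average(signal, window)
    return [s - b for s, b in zip(signal, baseline)]
```

A constant offset or a slow linear drift is removed almost entirely away from the edges, while components much faster than the window length pass through largely unchanged. In practice the window would need tuning so that QRS complexes are not noticeably attenuated.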
After aligning by the detected QRS peaks, the average template matching correlation coefficient [10], with a threshold of 0.5, was used as the SQI to identify noisy data. In [11], this measure had the highest area under the receiver operating characteristic (ROC) curve for discriminating between artefacts and arrhythmic ECG.

2.3. Spectrogram

For each recording, a spectrogram was constructed using an FFT applied on a moving window with a length of 75 samples and an overlap of 50%. Segments with lengths of 15 and 9 seconds were extracted from the spectrogram beginning at each of the detected QRS peaks.

2.4. Dense Convolutional Neural Networks

If the ECG recording was judged by the SQI module to be of reasonable quality (SQI > 0.5), rhythm classification took place using a dense convolutional neural network. Recordings processed by the CNNs were classified as NSR, AF, O, or noisy. Recall that an attempt is first made to use a CNN model that processes 15-second segments. However, if the input recording is not long enough, a secondary model that processes 9-second segments is used.

2.4.1. Main and Secondary Models

Main model: The 15-second model is made up of 3 dense blocks consisting of a total of 40 layers. Each layer involves applying a convolutional filter, followed by ReLU activation and row-wise batch normalization. A growth rate of 6 feature-maps was used for each layer. Model input dimensions were a single channel of 20 frequency bins by 375 time segments. The first 20 frequency bins of the computed spectrogram capture a frequency range of up to approximately 50 Hz. In total, the model consisted of 262,344 trainable parameters.

Secondary model: The architecture used for the secondary model was similar to that of the main model; however, a smaller growth rate of 4 feature-maps was used per layer. Model input dimensions were a single channel of 20 frequency bins by 225 time segments (height by width).
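The row-wise batch normalization applied in each layer is described only briefly, so the following sketch shows one plausible reading rather than the authors' implementation: every frequency row gets its own mean and variance, pooled over all examples in the batch and all time steps. Channel-wide batch normalization would instead pool a single mean and variance over batch, frequency, and time. The learnable scale and shift and the running statistics used at inference are omitted, and the function name is hypothetical.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # indexed as [frequency_bin][time_step]


def row_wise_batch_norm(batch: List[Spectrogram], eps: float = 1e-5) -> List[Spectrogram]:
    """Normalize each frequency row with statistics pooled over batch and time.

    Assumption: a single channel, with every example having the same shape.
    """
    n_rows = len(batch[0])
    normalized: List[Spectrogram] = [[[] for _ in range(n_rows)] for _ in batch]
    for r in range(n_rows):
        # Pool this frequency row across every example and time step.
        values = [v for example in batch for v in example[r]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        for b, example in enumerate(batch):
            normalized[b][r] = [(v - mean) * scale for v in example[r]]
    return normalized
```

One motivation for this reading is that spectrogram rows can differ greatly in energy (low-frequency bins typically dominate an ECG), so per-row statistics keep weak high-frequency bins from being swamped by a shared mean and variance.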
The secondary model's input is narrower than the main model's (225 versus 375 time segments) because of the shorter 9-second segment size. In total, the secondary model consisted of 119,458 trainable parameters.

Figure 1. Block diagram of the proposed algorithm.

Figure 2. Five layers of a DenseNet block with a growth rate of 4 feature-maps per layer (source: [6]).

2.4.2. Model Training

Both the main and secondary models were trained as four-class classification models using standard softmax cross-entropy loss. Models were typically trained for no more than 15 epochs. Once a model was sufficiently trained, in-house testing was performed on the left-out stratified split, as previously described. Models that achieved desirable performance were further trained before submission to the challenge server. In particular, the full five splits of challenge data were used to train a final model, where 95% of the data was used for training and the remaining 5% for validation. Final model training did not start from scratch; rather, weights from the previously learned model were used to initialize the dense CNN for continued training on the updated, full dataset.

2.5. Post Processing

If the ECG is labelled as NSR or O by the CNN and the probabilities of NSR and O are close to each other (absolute difference between the probabilities of NSR and O < 0.4), a feature-based post-processing step is performed to make the final decision. For NSR/O post-processing, an AdaBoost-abstain classifier [12] was trained using the NSR and O recordings in the in-house training set. Its performance was tested on the in-house test set. A total of 437 features were extracted from five different categories to train the model:
- Signal quality (2 features): average template matching correlation coefficient [10] and bSQI [13] based on the output of the gqrs and Pan-Tompkins [14] QRS detection algorithms.
- Frequency content (10 features): median power across nine frequency bands (1-15, 15-30, 30-45, 45-60, 60-75, 75-90, 90-150, 5-14, and 5-50 Hz) as well as the ratio of power in the 5-14 Hz band to power in the 5-50 Hz band. The power spectrum of the ECG record was estimated using the discrete-time Fourier transform.
- Beat-to-beat interval (11 features): number, minimum, maximum, and median of RR intervals, SDNN, RMSSD, average heart rate, and different heart rate asymmetry measures (PI, GI, SI).
- ECG-based reconstructed phase space (401 features): the normalized ECG reconstructed phase space (RPS) was created with dimension 2 and a delay of 4 samples [15]. The RPS was then divided into small square areas (a 20×20 grid). The normalized number of points in each square was considered a feature. In addition, the spatial filling index was calculated [16].
- Poincare section from ECG (13 features): using the RPS reconstructed from the ECG, as described above, 13 different features were extracted from the Poincare section with the unity line. More details about the method and features can be found in [15, 17].

2.6. Algorithm Evaluation

Performance of the algorithm was evaluated using the average of three F1 values for classification of NSR, AF, and O (F1n, F1a, and F1o, respectively). The in-house test set was used for algorithm evaluation independent of the blind challenge test dataset. In addition, performance was tested on a random subset of the blind hidden test set during the official phase, and the final score was computed using the whole blind test set.

3. Results

The area under the ROC curve for the AdaBoost-abstain classifier in the NSR/O post-processing step was 0.86 on the in-house test set. Only 58 features were selected by the classifier; the top 10 were from the beat-to-beat interval (n=5), ECG-based reconstructed phase space (n=2), and Poincare section from ECG (n=3) categories.
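The reconstructed phase space grid features of Section 2.5, two of which rank among the top 10 selected features, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: min-max scaling to [0, 1] stands in for the unspecified normalization, the function name is hypothetical, and the spatial filling index [16] is not computed. A 20×20 grid yields 400 occupancy features, which together with the spatial filling index would presumably account for the 401 features listed.

```python
from typing import List, Sequence


def rps_grid_features(signal: Sequence[float], delay: int = 4, grid: int = 20) -> List[float]:
    """Occupancy of a grid laid over the 2-D delay embedding (x[t], x[t + delay]).

    Returns grid * grid values (row-major) that sum to 1.
    """
    lo, hi = min(signal), max(signal)
    span = hi - lo
    if span == 0:
        raise ValueError("signal is constant; cannot normalize")
    n_points = len(signal) - delay
    if n_points <= 0:
        raise ValueError("signal is too short for the requested delay")
    # Min-max scaling to [0, 1] (an assumption; the paper only says "normalized").
    norm = [(v - lo) / span for v in signal]
    counts = [0] * (grid * grid)
    for t in range(n_points):
        x, y = norm[t], norm[t + delay]
        # A value of exactly 1.0 is clamped into the last cell.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        counts[row * grid + col] += 1
    return [c / n_points for c in counts]
```

With the defaults from the text (dimension 2, delay of 4 samples, 20×20 grid), a regular rhythm tends to concentrate points in a small number of cells, whereas irregular morphology spreads them across the grid.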
The best result achieved by the proposed algorithm during the official phase of the challenge on the in-house test set was 0.82 (F1 for NSR, AF, and other rhythm were 0.91, 0.80, and 0.76, respectively). The final result on the whole challenge blind test dataset was 0.80 (F1 for NSR, AF, and other rhythm were 0.90, 0.80, and 0.70, respectively).

Table 1. Algorithm performance on the in-house test set and the whole blind test set. F1n, F1a, and F1o are F1 values for classification of NSR, AF, and O.

Dataset                 F1n    F1a    F1o    F1
In-house test set       0.91   0.80   0.76   0.82
Whole blind test set    0.90   0.80   0.70   0.80

4. Discussion

This work led to the development and evaluation of several model types, not all of which are fully described in this paper. Here we discuss some of the findings of this effort, as well as alternative approaches investigated during the CinC challenge.

- One of the findings of this work was that for CNNs that process spectrograms as input, row-wise batch normalization (i.e., normalizing over frequency bins per batch) is preferable to a typical channel-wide application of batch normalization. This modification to our CNN models consistently resulted in considerable performance gains.
- Significant experimentation was performed using so-called wide and deep networks [18], where activations from later convolutional layers (deep features) are combined with variables that capture information using domain knowledge (wide features). The wide features that were considered included well-known HRV measurements (e.g., SDNN, RMSSD, pNN50), entropy measures (e.g., SampleEn), and morphological features (e.g., P-wave duration, PR interval, QT interval). However, the addition of wide features typically resulted in approximately a 2% drop in overall performance. It is possible that additional wide features, not presently included within our experimentation, would result in performance improvement.
- Lastly, the model presented here achieved its current performance using only time-frequency inputs encoded as spectrograms. A further model type was tested that accepted time-frequency inputs (as described), as well as a parallel CNN architecture that accepted the raw ECG waveform as input to automatically capture morphological information. These two parallel models were combined to make a final classification. This dual network, which captured both frequency and morphology information, showed promise on our in-house test set, producing F1 scores that outperformed all other network architectures that were evaluated. However, these networks had computational requirements beyond the restrictions imposed by the challenge server; hence we were not able to assess their overall performance on the hidden challenge test dataset.

5. Conclusion

In this article, an SQI technique was combined with dense convolutional neural networks followed by a post-processing feature-based classifier to find the best method for distinguishing atrial fibrillation from normal sinus rhythm, other rhythms, and noise. The promising performance of the algorithm makes us hopeful that with further enhancement this technique may be suitable for practical clinical use.

References

[1] C. T. January, L. S. Wann, J. S. Alpert, H. Calkins, J. C. Cleveland, J. E. Cigarroa, et al., "2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation," Circulation, p. CIR.0000000000000041, 2014.
[2] C. Potes, S. Parvaneh, A. Rahman, and B. Conroy, "Classifier ensemble for detection of abnormal heart sounds," 2016.
[3] J. Rubin, R. Abreu, A. Ganguli, S. Nelaturi, I. Matei, and K. Sricharan, "Recognizing Abnormal Heart Sounds Using Deep Learning," 2017.
[4] J. Rubin, R. Abreu, A. Ganguli, S. Nelaturi, I. Matei, and K. Sricharan, "Classifying heart sound recordings using deep convolutional neural networks and mel-frequency cepstral coefficients," in Computing in Cardiology Conference (CinC), 2016, 2016, pp. 813-816.
[5] C. Potes, S. Parvaneh, A. Rahman, and B. Conroy, "Ensemble of feature-based and deep learning-based classifiers for detection of abnormal heart sounds," in Computing in Cardiology Conference (CinC), 2016, 2016, pp. 621-624.
[6] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," arXiv preprint arXiv:1608.06993, 2016.
[7] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning, 2015, pp. 448-456.
[8] G. Clifford, C. Liu, B. Moody, I. Silva, Q. Li, A. Johnson, et al., "AF Classification from a Short Single Lead ECG Recording: the PhysioNet Computing in Cardiology Challenge 2017," presented at Computing in Cardiology, Rennes, France, 2017.
[9] I. Silva and G. B. Moody, "An open-source toolbox for analysing and processing PhysioNet databases in MATLAB and Octave," Journal of Open Research Software, vol. 2, 2014.
[10] C. Orphanidou, T. Bonnici, P. Charlton, D. Clifton, D. Vallance, and L. Tarassenko, "Signal-quality indices for the electrocardiogram and photoplethysmogram: derivation and applications to wireless monitoring," IEEE Journal of Biomedical and Health Informatics, vol. 19, pp. 832-838, 2015.
[11] C. Daluwatte, L. Johannesen, L. Galeotti, J. Vicente, D. Strauss, and C. Scully, "Assessing ECG signal quality indices to discriminate ECGs with artefacts from pathologically different arrhythmic ECGs," Physiological Measurement, vol. 37, p. 1370, 2016.
[12] B. Conroy, L. Eshelman, C. Potes, and M. Xu-Wilson, "A dynamic ensemble approach to robust classification in the presence of missing data," Machine Learning, vol. 102, pp. 443-463, 2016.
[13] J. Behar, J. Oster, Q. Li, and G. D. Clifford, "ECG signal quality during arrhythmia and its application to false alarm reduction," IEEE Transactions on Biomedical Engineering, vol. 60, pp. 1660-1666, 2013.
[14] J. Pan and W. J. Tompkins, "A real-time QRS detection algorithm," IEEE Transactions on Biomedical Engineering, pp. 230-236, 1985.
[15] S. Parvaneh, M. R. Hashemi Golpayegani, M. Firoozabadi, and M. Haghjoo, "Predicting the spontaneous termination of atrial fibrillation based on Poincare section in the electrocardiogram phase space," Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, vol. 226, pp. 3-20, 2012.
[16] O. Faust, R. Acharya, S. Krishnan, and L. C. Min, "Analysis of cardiac signals using spatial filling index and time-frequency domain," BioMedical Engineering OnLine, vol. 3, p. 30, 2004.
[17] S. Parvaneh, M. R. H. Golpaygani, M. Firoozabadi, and M. Haghjoo, "Analysis of ECG in Phase Space for the Prediction of Spontaneous Atrial Fibrillation Termination," Journal of Electrocardiology, vol. 49, pp. 936-937, 2016.
[18] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016, pp. 7-10.

Address for correspondence.
Jonathan Rubin / Saman Parvaneh
2 Canal Park, 3rd floor, Cambridge, MA 02141.
jonathan.rubin@philips.com / saman.parvaneh@philips.com
