Combining Support Vector Machine and Elephant Herding Optimization for Cardiac Arrhythmias

Aboul Ella Hassanien 1,*, Moataz Kilany 2,*, and Essam H. Houssein 2

2 Faculty of Computers and Information, Information Technology Department, Cairo University, Egypt
3 Faculty of Computers and Information, Computer Science Department, Minia University, Egypt
4 Scientific Research Group in Egypt (SRGE) http://www.egyptscience.net
* aboitcairo@gmail.com

ABSTRACT

Many people currently suffer from heart diseases that can lead to untimely death. The most common heart abnormality is arrhythmia, which is simply irregular beating of the heart. A prediction system for the early intervention and prevention of heart diseases, including cardiovascular diseases (CVDs) and arrhythmia, is important. This paper addresses the classification of electrocardiogram (ECG) heartbeats as normal or abnormal. The approach is based on the combination of swarm optimization algorithms with a modified Pan–Tompkins algorithm (MPTA) and support vector machines (SVMs). The MPTA was implemented to remove ECG noise, followed by the application of the extended features extraction algorithm (EFEA) for ECG feature extraction. Then, elephant herding optimization (EHO) was used to find a subset of ECG features from a larger feature pool that provided better classification performance than that achieved using the whole set. Finally, SVMs were used for classification. The results show that the EHO-SVM approach achieved good classification results in terms of five statistical indices: accuracy, 93.31%; sensitivity, 45.49%; precision, 46.45%; F-measure, 45.48%; and specificity, 45.48%. Furthermore, the results demonstrate a clear improvement in accuracy compared to that of other methods when applied to the MIT-BIH arrhythmia database.

Introduction

The World Health Organization (WHO) refers to CVDs as the main cause of death around the world.
An estimated 17.5 million people died from CVDs in 2012, representing 31% of all global deaths [1]. Accordingly, cardiac health research has received substantial attention from researchers, especially those targeting preventive, medical and technological advances. The main interest of researchers in this field is the improvement of traditional cardiovascular-diagnosis technologies. ECG is a common and vital diagnosis tool for many cardiac disorders and breathing disorders, such as obstructive sleep apnea syndrome, and for monitoring other functional or structural cardiac abnormalities [2]. The availability, reasonable cost, simplicity and low risk of ECG have made it a popular technique that has been applied in many research fields during the past two decades. ECG is a non-invasive tool that measures the electrophysiological activity of the heart and the cardiovascular system [3] and supports the analysis of heart function. A heartbeat signal has three main characteristic features: the P wave, QRS complex, and T wave. Each feature appears as a distinguishable peak that is repeated in each beat signal. Cardiac arrhythmia detection requires analysis of the morphology, amplitude, and duration of the P, QRS and T peaks.

The automation of ECG signal analysis based on the main characteristic P, QRS and T waves is an important research field for several reasons. Physicians depend on these signals to diagnose many cardiac diseases, such as autonomic malfunction, and other vascular, respiratory or even psychological dysfunctions. The automation process involves numerous fields. This paper employs ECG analysis techniques to produce a model that efficiently and accurately detects heartbeats belonging to a set of categories known by cardiologists. To obtain good results, we combined novel optimization techniques with classifier methods to perform heartbeat classification [4].

Several recent studies on ECG classification and modeling have been presented.
For example, in [5], temporal features and the Hermite function coefficients are extracted from ECG signals as an input vector of a block-based neural network. In [6], bacterial foraging optimization (BFO) and particle swarm optimization (PSO) are combined with neural networks (NNs) for the detection of left and right bundle branch block ECG patterns. In [7], the cutoff frequency of ECG was investigated, and the spectrum of the ECG signal was extracted from four classes. In [8], the proposed algorithm required approximately 15 min to filter a training set composed of 250 labeled samples. A five-level ECG signal quality classification algorithm using SVM was outlined in [9]. In [10], the continuous wavelet transform and the Daubechies wavelet were applied to the benchmark MIT-BIH Arrhythmia database. In addition, hybrid firefly and PSO (FFPSO) were combined with NNs to detect bundle branch block in [11]. Additionally, PSO with a random asynchronous approach was introduced in [12]. Finally, in [13], a fuzzy classifier with a genetic algorithm (GA) was proposed to classify ECG signals more precisely based on a dynamic model of the ECG signal.

The aim of this paper is to present an automatic classification approach for cardiac arrhythmias. The results reported in [14] show that ECG classification of arrhythmias can be highly accurate. These past results serve as motivation to focus on the classification of ECG heartbeat signals as normal or abnormal. Feature extraction and selection techniques play a major role in the domain of signal processing, and the performance of identification systems depends strongly on these techniques [15]. This paper introduces a hybrid optimization and classification approach that uses EHO [16] to select relevant features and optimize the SVM classifier parameters for ECG heartbeat signals.
The introduced classification approach differs from alternative approaches in a number of aspects:

1) We applied a recent optimization algorithm that employs a simple and relatively quick search pattern.
2) Our validation relied on a stable benchmark dataset acquired by the MIT-BIH Laboratory and employed a relatively large number of records (10 patients).
3) Validation was conducted using 3-fold leave-one-out cross-validation for generalized performance.
4) The classification model relied on a large number of features compared with previous ECG classification problems.
5) Staged optimization was employed for ECG classification optimization, i.e., feature selection and parameter optimization were performed in separate stages, in contrast to many former studies. Staged optimization prevents the loss of opportunities that arises when search agents change correlated parameters (features and classification penalty) simultaneously.

The structure of this paper is as follows. The next section introduces the techniques and materials employed in this paper. The methodology is explained in detail in terms of the applied dataset, feature extraction and selection, and the optimization process. Then, the experimental results and a discussion of these results are presented. Finally, concluding remarks and future work are provided in the last section.

Materials and Methods

Electrocardiogram (ECG) Signals

Six known heartbeat types can be identified. Each heartbeat can be accurately described by an ECG waveform consisting of five peaks (features). The detection and evaluation of each peak and its variance, distance and other mathematical characteristics lead to a powerful identification of heartbeat properties [17]. Table 1 shows a description of each waveform. All these characteristic points should be detected.

Table 1. ECG waves.

| Wave | Description |
|------|-------------|
| P | Atrial depolarization. |
| Q | Point before R, with slope < 0. |
| R | Distance between two peaks of QRS. |
| S | Point after R, with slope > 0. |
| T | Ventricular re-polarization. |

As a part of the ECG automatic detection process, additional features are extracted from the P, Q, R, S and T waveforms as feature vectors [17]. These five basic components (P, Q, R, S, and T) are used to interpret the ECG, as shown in Table 2.

Table 2. ECG parameters description.

| Amplitude | Duration |
|-----------|----------|
| P wave: 0.25 mV | PR interval: 0.12 s to 0.20 s |
| Q wave: 25% of R wave | ST interval: 0.05 s to 0.15 s |
| R wave: 1.60 mV | QT interval: 0.35 s to 0.44 s |
| T wave: 0.1 to 0.5 mV | QRS interval: 0.09 s |

Elephant Herding Optimization

Elephant Herding Optimization (EHO) is an algorithm introduced by Gai-Ge Wang et al. in 2012 [16]. EHO is intended for global optimization problems, and the herding behavior of the elephants is modeled as follows: (1) the population is composed of a number of clans, and each clan has a fixed number of elephants; (2) at each generation, a fixed number of male elephants leave their family group and live far away; (3) in each clan, the elephants live together under a leader called the matriarch. Exploration and exploitation in EHO are achieved by the clan updating operator and the separating operator. Algorithm 1 provides the algorithmic framework of EHO. For more details about EHO, see [16].

Algorithm 1 Pseudo code of EHO.

    Initialization: set the generation counter g = 1, the maximum generation MaxGen, and the population
    while g < MaxGen do
        Sort all the elephants according to their fitness (objective function)
        Perform the clan updating operator
        Perform the separating operator
        Assess the population at the newly updated positions
        g = g + 1
    end while

Support Vector Machines (SVMs)

Several classifiers have been proposed in the signal processing domain, including artificial neural networks [18], SVMs, and fuzzy logic systems [19].
Many researchers have focused on SVM for CVD classification of ECG signals [20]. The classification of ECG signals for CVDs using SVM is regarded as the main objective of this paper. Previous research illustrated the strong performance of SVM, in which data are represented as a P-dimensional vector [21]. Classification is performed by means of optimal separating hyperplanes, which ensure the greatest margin between the closest data points that belong to separate classes. SVMs depend on kernels in the classification process, and kernel selection is a challenging task that strongly affects the classification performance. The SVM algorithm aims to find the hyperplane with the greatest margin that separates a positive class from a negative class, as illustrated in the following equations.

$$f(x) = (w \cdot \varphi(x)) + b \tag{1}$$

$$R_{SVM}(C) = C \, \frac{1}{N} \sum_{i=1}^{N} L_{\varepsilon}(y_i, \hat{y}_i) + \frac{1}{2} w^{T} w \tag{2}$$

$$L_{\varepsilon}(y_i, \hat{y}_i) = \begin{cases} |y_i - \hat{y}_i| - \varepsilon, & |y_i - \hat{y}_i| \geq \varepsilon \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

$$\hat{y} = f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) \, K(x_i, x) + b \tag{4}$$

$$K(x_i, x) = \exp\left(-\frac{1}{\delta^{2}} \, \lVert x_i - x \rVert^{2}\right) \tag{5}$$

Here, $\varphi(x)$ maps the input $x$ into a non-linear high-dimensional feature space. $w$ and $b$ are the adjustable weight vector and threshold, respectively, estimated by minimizing the regularized risk in Equation 2. $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers. $K(x_i, x)$ is the Gaussian kernel, and $\delta^{2}$ is the width of the kernel function. $C$ is a positive real constant, and $\varepsilon$ is the tolerance parameter of the loss in Equation 3.

Proposed Approach for ECG Heartbeat Classification

Five distinct points (P, Q, R, S and T waves) are included in each ECG signal. Fig. 1 shows the four phases of the SVM feature optimization process of the proposed approach: (1) preprocessing, (2) ECG feature extraction, (3) feature selection and optimization, and (4) classification and validation. Later, we provide a detailed model for phases (3) and (4), which are shown in Fig. 2.
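Since phase (3) is driven by EHO, it may help to see the loop of Algorithm 1 in executable form. The following is a minimal sketch, not the authors' implementation: the text only names the clan updating and separating operators, so the update formulas used here (members move toward the matriarch, the matriarch moves relative to the clan center, the worst elephant is re-initialized at random) are an assumption based on how EHO is commonly described in [16]. The function name `eho_maximize` and all default parameter values are illustrative.

```python
import random


def eho_maximize(fitness, dim, lower, upper, n_clans=5, clan_size=6,
                 alpha=0.5, beta=0.1, max_gen=50, seed=0):
    """Maximize `fitness` over the box [lower, upper]^dim with a basic EHO loop.

    Assumed operator forms (not given in the text):
      member:    x <- x + alpha * (matriarch - x) * r,  r ~ U(0, 1)
      matriarch: x <- beta * clan_center
      separate:  worst elephant <- random position in the box
    """
    rng = random.Random(seed)

    def clip(value):
        return min(max(value, lower), upper)

    def new_elephant():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    clans = [[new_elephant() for _ in range(clan_size)] for _ in range(n_clans)]
    best = max((e for clan in clans for e in clan), key=fitness)

    for _ in range(max_gen):
        for clan in clans:
            # Sort so that the matriarch (fittest elephant) comes first.
            clan.sort(key=fitness, reverse=True)
            matriarch = clan[0]
            center = [sum(e[d] for e in clan) / clan_size for d in range(dim)]
            # Clan updating operator.
            updated = [[clip(beta * c) for c in center]]
            for elephant in clan[1:]:
                updated.append([clip(x + alpha * (m - x) * rng.random())
                                for x, m in zip(elephant, matriarch)])
            # Separating operator: the worst elephant leaves the clan.
            updated.sort(key=fitness, reverse=True)
            updated[-1] = new_elephant()
            clan[:] = updated
        # Assess the population and keep the best position seen so far.
        candidate = max((e for clan in clans for e in clan), key=fitness)
        if fitness(candidate) > fitness(best):
            best = list(candidate)
    return best, fitness(best)
```

In the setting of this paper, `fitness` would be the cross-validated classification accuracy of an SVM configured from the elephant's position; any callable that scores a list of floats can be substituted for experimentation.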
In this paper, the EHO algorithm was modified for the purpose of classification optimization. Elephant locations encode the SVM parameters and the selected feature set, while elephant fitness is the average classification accuracy over all cross-validation folds. For the fitness calculation, the SVM is trained with three training sets and validated against three validation sets. Algorithm 2 provides the algorithmic framework of the EHO-SVM classifier presented in Fig. 2. Algorithm 3 shows the fitness calculation process using the SVM classifier.

Figure 1. The proposed approach for ECG heartbeat classification.

Algorithm 2 EHO-SVM approach.

    Input: Training sets (folds) (T_1, T_2, ..., T_n)
    Input: Validation sets (folds) (V_1, V_2, ..., V_n)
    Output: Classification accuracy
    Initialization:
        Generation counter g ← 1
        Initialize population locations (SVM and kernel parameters / selected features)
        Evaluate population fitness (g) (Alg. 3, Eq. 6)
    while g < MaxGen do
        Sort all the elephants according to their fitness
        Apply the clan updating operator
        Apply the elephant separating operator
        Evaluate population fitness (g) (Alg. 3, Eq. 6)
        Find the best elephant with the highest fitness (classification accuracy)
        g = g + 1
    end while

Fitness Function

An optimization algorithm generally depends on a fitness function to find the best solution. The fitness function provides the algorithm with a value that quantifies the fitness of each solution found in the search space. In this paper, we selected classification accuracy as the solution qualifier throughout the search process. Classification accuracy is in the range [0, 1], and each elephant (search agent) is characterized by a number of accuracies that depends on the cross-validation strategy. In this paper, each elephant has three accuracy values, one for each fold in the 3-fold cross-validation strategy.
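The per-fold evaluation described here can be pictured with a short sketch. This is an illustration under stated assumptions rather than the authors' code: the location encoding in `decode_location` (two leading SVM parameters followed by one entry per feature, thresholded at 0.5) is hypothetical, and `train_and_score` is a placeholder for whatever routine trains the SVM on a training fold and returns its accuracy on the matching validation fold.

```python
from statistics import mean


def decode_location(location, n_features):
    """Split a location into SVM parameters and a boolean feature mask.

    Hypothetical encoding: the first two entries are (penalty, gamma); each of
    the next `n_features` entries selects a feature when it exceeds 0.5.
    """
    penalty, gamma = location[0], location[1]
    mask = [value > 0.5 for value in location[2:2 + n_features]]
    return (penalty, gamma), mask


def elephant_fitness(location, folds, n_features, train_and_score):
    """Average the validation accuracy of one elephant over all folds."""
    params, mask = decode_location(location, n_features)
    accuracies = [train_and_score(train, valid, params, mask)
                  for train, valid in folds]
    return mean(accuracies)
```

Keeping the classifier behind a callable makes the averaging logic easy to check in isolation, for example with a stub that returns fixed accuracies per fold.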
The accuracy values for all folds are averaged to obtain the fitness value for the search algorithm, as shown in Equation 6.

$$f(i, j) = \frac{1}{n} \sum_{k=1}^{n} Acc_{i,j,k} \tag{6}$$

where $f(i, j)$ is the fitness value for elephant $i$ in iteration $j$, $n$ represents the number of folds selected for cross-validation, and $Acc_{i,j,k}$ is the accuracy of the evaluation for elephant $i$ in iteration $j$ for the data fold $k$. Algorithm 3 shows the fitness calculation process using the SVM classifier.

Figure 2. The general approach for ECG heartbeat classification based on EHO.

Algorithm 3 Evaluate elephant population fitness.

    Input: Training sets (folds) (T_1, T_2, ..., T_n)
    Input: Validation sets (folds) (V_1, V_2, ..., V_n)
    Input: Population (iteration) number (j)
    Output: Fitness (total accuracy) of each elephant
    for each elephant E_{i,j} in population j do
        Get elephant location Loc_{i,j} ← Location(E_{i,j})
        SVM parameters P ← GetParameters(Loc_{i,j})
        Selected features F ← GetFeatures(Loc_{i,j})
        Acc ← 0
        for each fold k = 1, ..., n do
            Train SVM on T_k using P, F
            Validation accuracy V ← Validate on V_k
            Acc ← Acc + V
        end for
        TotalAccuracy ← Acc / n
        Fitness f(i, j) ← TotalAccuracy
    end for

Pre-processing Phase Using MPTA

Power-line interference and baseline wandering are regarded as the most prominent types of noise that strongly affect ECG signals. Patient respiration, with a frequency in the range of 0.15 to 0.3 Hz, is the main source of baseline wandering. Power-line interference is categorized as narrow-band noise centered at approximately 60 Hz that occupies a bandwidth of less than 1 Hz. The other sources of noise are wide-band and also affect ECG signals. The hardware used to acquire ECG signals has the ability to suppress power-line interference; however, wide-band noise and baseline wandering cannot be suppressed by hardware alone.
Therefore, software algorithms are used to remove baseline wandering and other wide-band noise [22]. In this paper, MPTA [23] is used to remove different types of artifacts and noise. First, a bandpass filter, composed of a low-pass filter and a high-pass filter, is used to reduce noise. Then, a derivative filter is used to obtain the slope information. Amplitude squaring is performed, and the signal is passed to a moving window integrator. Finally, a thresholding technique is applied, and the peaks are detected.

ECG Feature Extraction

A wave analysis technique is required to perform feature extraction. Wave analysis techniques decompose a given wave into its wavelet building blocks. In this paper, two feature extraction techniques are applied to extract features for classification, such as the RR interval.

Feature Extraction Using MPTA: Nine heartbeat wave features are extracted from the ST segment and QRS complex based on MPTA, and the ECG signals are decomposed into low-frequency signals. The low-frequency band is then utilized to detect the P, QRS, and T waves.

Feature Extraction Using Improved Feature Extraction Algorithm (IFEA): We apply an IFEA [24] to obtain more features. The algorithm takes the output of MPTA, pinpoints the wave components from the results, and calculates new features. The MPTA output describes different types of locations (points in time) on the ECG waveform in terms of descriptive letters (annotations). Numerous letters are employed for this purpose, such as P, N, and T, representing the P-type wave, the R-peak of a normal beat, and the T-type wave, respectively. There are also auxiliary symbols, such as the opening and closing brackets, which represent the beginning and end of a wave type with the wave peak enclosed in between.
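The bracketed annotation convention just described lends itself to a small parser. The sketch below is illustrative only: the function name `beat_types` is hypothetical, and it assumes that every beat is complete and ordered as P wave, R-type peak, T wave, which the IFEA itself does not require since it also repairs inconsistent or incomplete patterns.

```python
import re


def beat_types(annotation):
    """Return the R-type letter of each beat in a bracketed annotation string.

    Assumes each beat appears as three bracketed peaks in the order
    P wave, R-type peak, T wave, e.g. "(P)(N)(t)".
    """
    peaks = re.findall(r"\(([^()]+)\)", annotation)
    if len(peaks) % 3 != 0:
        raise ValueError("incomplete beat in annotation stream")
    # The middle peak of every triple carries the beat type.
    return [peaks[i + 1] for i in range(0, len(peaks), 3)]
```

A production extractor would need to handle missing or out-of-order waves instead of raising an error; the strict check here simply makes the assumption visible.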
The following is an example of three consecutive heartbeats from the data set: (P)(N)(t)(P)(V)(t)(P)(N)(t), which are two normal beats with a ventricular arrhythmia beat in the middle. The R-type wave peak takes one of a series of letters that annotates the type of the heartbeat as a whole. For example, N is a normal beat, as in the "(P) (N) (t)" waveform. A "(P) (V) (t)" waveform indicates a beat with a ventricular arrhythmia. The algorithm also resolves some defects in the P-QRS-T extractor when patterns of the waveforms are not consistent, not complete or do not exist for the corresponding beats.

MPTA is employed to extract the nine heartbeat wave characteristic features (1 through 9) shown in Table 3, which represent detailed features of the previously described P, Q, R, S, and T waveforms. The additional ten features (10 through 19) are extracted by IFEA. All nineteen features extracted with MPTA and IFEA are listed in Table 3.

Feature Selection and Optimization Based on EHO

Swarm intelligence (SI) is a new branch of artificial intelligence that mimics the collective behavior of social swarms in nature, such as elephant herds, social spiders, gray wolves, and ant colonies. A swarm is composed of a set of agents that interact among themselves and with the environment without central control. Recent research introduced swarm-based algorithms that can rapidly solve search-based problems at low cost. The types of swarms include nature-inspired and population-based. Classification and feature optimization (feature selection and parameter optimization) are two of many application domains that successfully employ SI. Other domains include machine learning, bioinformatics, medical informatics, dynamical systems and operations research [25].

The proposed approach utilizes EHO for feature selection and parameter optimization to improve the classification accuracy. The feature optimization framework is illustrated in Fig. 2, which depicts the last two phases of the classification approach: Phase 3 is feature selection and optimization, and Phase 4 is classification and validation. Fig. 2 also shows how EHO employs the SVM classifier to evaluate the fitness of each search agent in each optimization iteration.

ECG Classification and Optimization Parameters

Research efforts have shown the dependency between feature optimization and SVM parameter optimization. A known approach is to perform optimization via multiple stages of feature optimization followed by SVM parameter optimization rather than simultaneously optimizing features and parameters in the same run. Fig. 3 illustrates the multi-stage feature and parameter optimization model. In this work, we established a four-stage optimization process in which parameter optimization and feature optimization alternate from stage to stage.

Table 3. Heartbeats features extracted by MPTA and EFEA.

| N | Feature | Meaning |
|----|---------|---------|
| 1 | PS | Beginning location of P waveform. |
| 2 | P | Peak location of P waveform. |
| 3 | PE | End location of P waveform. |
| 4 | Q | Beginning of QRS complex. |
| 5 | R | R peak of QRS complex. |
| 6 | S | End of QRS complex. |
| 7 | TS | Beginning of T waveform. |
| 8 | T | Peak of T waveform. |
| 9 | TE | End of T waveform. |
| 10 | QRS | QRS = S − Q. |
| 11 | P-R | PRSeg = Q − PE. |
| 12 | P-R | PRInt = Q − PS. |
| 13 | S-T | STSeg = TS − S. |
| 14 | Q-T | TE − Q. |
| 15 | R-R | RNext − R. |
| 16 | P-P | PNext − P. |
| 17 | R-R / P-P | RR-PPSim = ABS(R-R − P-P). |
| 18 | R-R variance | Var(R-R). |
| 19 | Heartbeat | 60 / R-R. |

Figure 3. The staged feature and parameter optimization approach.

Results

The simulation results are obtained using MATLAB R2014a. The experimental setup of the dataset, training data and testing data is presented in the following section.

ECG Dataset Description

Researchers use standard databases for analysis purposes. The PhysioNet website is dedicated to medical data corresponding to various diseases [26]. PhysioNet databases are composed of hundreds of digitized medical records of ECG, EEG and other types of physiologic signals.
Each ECG record is annotated and revised by a number of cardiologists. Many research efforts depend on the MIT-BIH Arrhythmia database provided by PhysioNet and obtained by the MIT-BIH Laboratory, which consists of several ECG signal records for patients with different types of abnormalities and diseases that affect heart rhythms [27]. The MIT-BIH Arrhythmia database comprises 48 half-hour recordings from 25 male and 22 female subjects. The signals were collected at 360 samples/sec/channel over a 10 mV range with 11-bit resolution. Additionally, each record is annotated by two or more cardiologists independently, and approximately 110,000 annotations are included in the database [27]. We applied the proposed classification approach to a subset of the dataset that includes 10 patients with 16 heartbeat types.

The data were processed into 10 feature vectors (one for each patient), which combined represent 24,474 records of 10 features each. These data are considered sufficiently large to cover the great variability of patients while maintaining a reasonable level of computational overhead. Table 4 shows the datasets employed in our experiment.

Table 4. ECG dataset description.

| N | Patient No. | Gender | Age | PhysioNet Standard Beat Types |
|----|-------------|--------|-----|-------------------------------|
| 1 | 202 | Male | 68 | N-A-a-V-F |
| 2 | 203 | Male | 43 | N-a-V-F |
| 3 | 205 | Male | 59 | N-A-V-F |
| 4 | 207 | Female | 89 | L-R-A-V-E |
| 5 | 214 | Male | 53 | L-V-F |
| 6 | 215 | Male | 81 | N-A-V-F |
| 7 | 217 | Male | 65 | N-V-/-f |
| 8 | 219 | Male | Unknown | N-A-V-F |
| 9 | 221 | Male | 83 | N-V |
| 10 | 223 | Male | 73 | N-A-a-V-F-e |

The types of heartbeat are represented by symbols defined by PhysioNet, as shown in Table 5. This set of beat types is translated from 16 classes into two classes, normal and abnormal (N and A), with type N considered to be normal and all other types considered to be abnormal (A). We selected ten patients with a sufficient number of beat types to ensure the validity of the classification results and to describe several types of heartbeat.
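The translation from 16 beat symbols to two classes is a one-line rule, sketched below. The function names are hypothetical. Note that the output label "A" (abnormal) is distinct from the PhysioNet input symbol "A" (atrial premature beat), which also maps to abnormal.

```python
def to_binary_label(symbol):
    """Map a PhysioNet beat symbol to 'N' (normal) or 'A' (abnormal)."""
    # Only the normal-beat symbol stays normal; every other symbol is abnormal.
    return "N" if symbol == "N" else "A"


def binarize(symbols):
    """Apply the two-class mapping to a sequence of beat symbols."""
    return [to_binary_label(s) for s in symbols]
```

For example, `binarize(["N", "V", "/", "N"])` yields `["N", "A", "A", "N"]`.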
A total of 25,210 ECG beats of different types were used for classification.

Table 5. Heartbeat descriptions.

| Beats | Description | Total number |
|-------|-------------|--------------|
| N | Normal beat | 16742 |
| L | Left bundle | 3460 |
| V | Premature ventricular contraction | 2154 |
| / | Paced beat | 1542 |
| ! | Ventricular flutter wave | 472 |
| A | Atrial premature | 228 |
| f | Fusion of paced and normal beat | 260 |
| x | Non-conducted P-wave (blocked APB) | 133 |
| R | Right bundle | 86 |
| \| | Isolated QRS-like artifact | 37 |
| F | Fusion of ventricular | 30 |
| a | Atrial premature | 22 |
| E | Ventricular escape | 16 |
| e | Atrial escape | 16 |
| [ | Start of ventricular flutter/fibrillation | 6 |
| ] | End of ventricular flutter/fibrillation | 6 |

Parameters Settings

The cross-validation, SVM parameter settings, and EHO parameter settings are included in the experiments. We performed 3-fold leave-one-out cross-validation on all datasets. Table 6 summarizes the selected settings for SVM and EHO. A subset of the settings is determined based on the recommendations of the algorithm designers, and the others are set via comprehensive testing.

Table 6. Parameter settings for SVMs and EHO.

| SVM Parameter | Value | EHO Parameter | Value |
|---------------|-------|---------------|-------|
| Kernel | Radial Basis | Alpha Factor | 5 |
| Penalty | [1, 1000] | Beta Factor | 0.0005 |
| Gamma | [0, 1000] | Elephant Keep | 2 |
| Scaling | [-1, 1] | Clans Count | 5 |
| | | Elephants | 30 |

Performance Measurements

Five standard criteria are used to evaluate the proposed approach: (1) accuracy (Acc), (2) precision (Prec), (3) specificity (Sp), (4) F-measure (F), and (5) sensitivity (Se). Performance measures generally depend on the four main counts of a binary classification result: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Mathematically, the performance measures are defined by the following equations.

• Accuracy (Acc):

$$Acc = \frac{TP + TN}{TP + FP + FN + TN} \times 100 \tag{7}$$

• Precision (Prec):

$$Prec = \frac{TP}{TP + FP} \times 100 \tag{8}$$

• Sensitivity (Se):

$$Se = \frac{TP}{TP + FN} \times 100 \tag{9}$$

• F-measure (F):

$$F = \frac{2 \times Prec \times Se}{Prec + Se} \tag{10}$$

• Specificity (Sp):

$$Sp = \frac{TN}{TN + FP} \times 100 \tag{11}$$

Discussion

The following section presents the classification results for the 10 selected patients in terms of the performance measures for each patient. Tables 7 and 8 summarize the classification accuracy, specificity, sensitivity, precision, and F-measure for each patient record. Each patient has four result sets, one for each stage of the optimization, as discussed in the previous sections. The stage in which the best accuracy is obtained for each patient is indicated by boldface font. The problem considered in this paper is not a binary classification problem, so we extract the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) measures by means of a confusion matrix constructed for the classification test.

Table 8 summarizes the best results for each classifier in terms of classification accuracy, precision, sensitivity, F-measure, and specificity, and compares the values obtained by SVM and EHO-SVM. The SVM values were acquired from the early stage of optimization (ST1), where the SVM model was assigned random parameters for all patients; the results were then averaged over all patients for each metric. The EHO-SVM values are calculated as the average of the best values over all patients for each metric stated in Table 7. Fig. 4 shows the results for each classifier and a visual comparison of the best results obtained by SVM and EHO.
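For readers who want to reproduce the measures in Equations 7 to 11 from confusion-matrix counts, a minimal sketch follows. The function name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for illustration; the paper does not say how that case is handled.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute Acc, Prec, Se, F and Sp (all in percent) from confusion counts."""

    def pct(numerator, denominator):
        # Assumption: an undefined ratio is reported as 0.0.
        return 100.0 * numerator / denominator if denominator else 0.0

    acc = pct(tp + tn, tp + fp + fn + tn)
    prec = pct(tp, tp + fp)
    se = pct(tp, tp + fn)
    sp = pct(tn, tn + fp)
    # F-measure is the harmonic mean of precision and sensitivity.
    f = 2 * prec * se / (prec + se) if (prec + se) else 0.0
    return {"Acc": acc, "Prec": prec, "Se": se, "F": f, "Sp": sp}
```

With more than two classes, the same function can be applied per class in a one-versus-rest fashion and the results averaged, which is one common way to obtain TP, FP, TN and FN from a multi-class confusion matrix.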
The proposed approach achieves the best classification performance with the highest number of features. MPTA was employed to extract nine heartbeat wave characteristic features (indices 1 through 9), which represent detailed features for the previously described P, Q, R, S, and T waveforms. Additionally, ten features are extracted by means of the proposed IFEA (indices 10 through 19) [24]. The behavior of EHO during the search process is depicted in Fig. 5, which illustrates the evolution of the fitness function value (averaged value) associated with the best global swarm parameter for all patients and stages, also known as the convergence of the fitness function, based on EHO.

Table 7. Summary of EHO-SVM approach results.

| Patient No. | Stage No. | Acc | Prec | Se | F | Sp |
|-------------|-----------|-----|------|----|---|----|
| 202 | Stage 1 | 97.09% | 44.51% | 33.02% | 36.19% | 85.74% |
| | Stage 2 | 97.10% | 19.42% | 20.00% | 19.71% | 80.00% |
| | Stage 3 | 97.10% | 19.42% | 20.00% | 19.71% | 80.00% |
| | Stage 4 | 97.28% | 49.52% | 34.03% | 38.59% | 86.49% |
| 203 | Stage 1 | 85.95% | 35.36% | 21.43% | 21.13% | 81.41% |
| | Stage 2 | 89.68% | 17.94% | 20.00% | 18.91% | 80.00% |
| | Stage 3 | 89.69% | 24.63% | 20.34% | 19.61% | 80.31% |
| | Stage 4 | 89.69% | 24.63% | 20.34% | 19.61% | 80.31% |
| 205 | Stage 1 | 98.79% | 47.77% | 45.12% | 45.92% | 92.31% |
| | Stage 2 | 98.79% | 47.77% | 45.12% | 45.92% | 92.31% |
| | Stage 3 | 98.72% | 47.48% | 44.71% | 45.42% | 91.93% |
| | Stage 4 | 98.76% | 47.92% | 44.45% | 45.80% | 91.74% |
| 207 | Stage 1 | 81.21% | 32.92% | 31.04% | 28.41% | 83.89% |
| | Stage 2 | 82.07% | 44.10% | 33.65% | 32.83% | 83.75% |
| | Stage 3 | 78.35% | 15.69% | 19.97% | 17.57% | 79.98% |
| | Stage 4 | 80.91% | 20.77% | 20.32% | 18.88% | 80.88% |
| 214 | Stage 1 | 95.66% | 46.27% | 43.91% | 44.50% | 93.93% |
| | Stage 2 | 97.21% | 48.60% | 50.00% | 49.29% | 50.00% |
| | Stage 3 | 97.21% | 48.60% | 50.00% | 49.29% | 50.00% |
| | Stage 4 | 97.70% | 48.20% | 45.98% | 46.96% | 96.00% |
| 215 | Stage 1 | 98.75% | 47.13% | 46.06% | 46.57% | 95.83% |
| | Stage 2 | 98.81% | 47.41% | 46.08% | 46.71% | 95.85% |
| | Stage 3 | 98.81% | 47.41% | 46.08% | 46.71% | 95.85% |
| | Stage 4 | 98.81% | 47.41% | 46.08% | 46.71% | 95.85% |
| 217 | Stage 1 | 84.95% | 71.33% | 69.30% | 67.63% | 94.44% |
| | Stage 2 | 86.06% | 74.70% | 70.72% | 69.19% | 94.83% |
| | Stage 3 | 86.11% | 74.74% | 71.13% | 69.42% | 94.92% |
| | Stage 4 | 86.11% | 74.74% | 71.13% | 69.42% | 94.92% |
| 219 | Stage 1 | 98.14% | 44.62% | 42.79% | 43.47% | 90.85% |
| | Stage 2 | 98.70% | 49.12% | 42.64% | 45.30% | 90.60% |
| | Stage 3 | 99.26% | 48.57% | 48.36% | 48.37% | 95.84% |
| | Stage 4 | 99.26% | 48.57% | 48.36% | 48.37% | 95.84% |
| 221 | Stage 1 | 99.54% | 99.13% | 99.23% | 99.18% | 99.23% |
| | Stage 2 | 99.67% | 99.11% | 99.70% | 99.41% | 99.70% |
| | Stage 3 | 99.71% | 99.14% | 99.83% | 99.48% | 99.83% |
| | Stage 4 | 99.71% | 99.14% | 99.83% | 99.48% | 99.83% |
| 223 | Stage 1 | 88.82% | 28.14% | 28.11% | 27.99% | 93.59% |
| | Stage 2 | 90.55% | 30.16% | 28.68% | 29.20% | 93.68% |
| | Stage 3 | 90.93% | 29.70% | 29.59% | 29.55% | 94.50% |
| | Stage 4 | 91.81% | 29.26% | 30.92% | 30.05% | 95.91% |

Table 8. Summary of the experimental results.

| Measures | SVM | EHO-SVM | Improvement |
|----------|-----|---------|-------------|
| Accuracy | 80.31% | 94.07% | 13.76% |
| Precision | 40.45% | 52.32% | 11.87% |
| Sensitivity | 40.49% | 47.85% | 7.36% |
| F-measure | 38.48% | 47.58% | 9.10% |
| Specificity | 40.48% | 47.58% | 7.10% |

Figure 4. Classification performance for SVM and EHO-SVM.

As shown by Fig. 5, each record reaches the maximum classification accuracy at an arbitrary stage. Some records reach the maximum in stage 1, for example, patient number 205 under EHO optimization, while others reach the maximum in stages 2, 3 or 4. Overall, multi-stage optimization is important and can produce better results than those of single-stage optimization. The convergence curves for EHO show the accuracy of the algorithm and how fast it reaches the final accuracy. For the test conducted in this paper with the defined parameters (Table 6), EHO reaches the maximum accuracy in relatively few iterations. This conclusion is valid both for each stage individually and for the overall convergence curve across all stages.

Figure 5. Average convergence curves for EHO.

Accuracy Analysis

To avoid possible bias in the selection of the test and training sets, 3-fold cross-validation is utilized in this paper; hence, the ECG dataset was divided into three parts. For comparison, we consider some previous studies based on the same dataset.
In 6 , the MIT -BIH Arrhythmia database was tested using PSO, GA, BFO, and bacterial foraging–particle swarm optimization (BFPSO) with SVM. In 10 , CWT and the histogram representation were applied to determine the QRS, T and P wa ves. Furthermore, in 28 , the optimal number of Hermite functions to represent the QRS wav e was studied. In 29 , SVM was utilized to cluster heartbeats based on only two types of features, in contrast to our work with nineteen features. Additionally , in 30 , wa velet time frequency (WTF) was applied to detect sudden amplitude and frequency jumps, b ut the ECG signals were recorded under hypnosis to obtain heart rate variability . Ho wev er , this work did not focus on the heart rate classiﬁcation accuracy . These comparisons are shown in T able 9 , where it is clear that the proposed approach outperforms the compared studies. The proposed classiﬁcation approach was applied to 10 patients, 16 types of heartbeat and 24,474 records. The proposed classiﬁcation approach was validated and e valuated for ef ﬁciency based on the suf ﬁciently large data co vering a large v ariety of patients. It is important to note that this approach requires a number of future improv ements. The proposed model currently tar gets two classes of heartbeat arrhythmias, normal and abnormal, which are considered to be relati vely general classes. Howe ver , we are improving this w ork to accurately separate more precise classes of heartbeat, such as PVC, F , A, R, and F . The proposed model also applies a relatively traditional classiﬁcation technique (SVM); we plan to employ deep learning techniques to achiev e better classiﬁcation performance. Finally , a more advanced and more popular feature extraction technique, such as wa velet transform, is required in future work. 11/ 13 T able 9. Comparison of the results and methods of studies that used the same MIT -BIH Arrhythmia database. 
Studies   Year  Approach            Accuracy
[28]      2015  Hermite functions   90%
[6]       2015  BFPSO-SVM           76.74%
[10]      2016  Delineation method  92.44%
[29]      2017  SVM                 93.1%
[30]      2017  WTF                 NA
Proposed  2017  EHO-SVM             93.31%

Conclusion and Future Work

ECG analysis helps cardiologists make decisions about cardiac arrhythmias more accurately and easily, which can save the lives of thousands of people. An ECG records the electrical activity of the heart over a specific time; hence, it is considered an important diagnostic tool for assessing heart function. In this paper, we have developed a hybrid approach for automatic ECG signal classification by means of EHO and SVMs. The proposed approach includes three modules: an efficient preprocessing module, a feature extraction module, and a feature optimization and classification module. In the first two modules, the MPTA and EFEA are applied to remove noise and extract nineteen heartbeat features. In the last module, EHO is utilized to optimize the features and the SVM parameters, and SVMs classify the heartbeats using the selected features. The experiments showed that the proposed approach classifies heartbeats accurately. Moreover, the proposed approach shows promise for use by medical experts who wish to diagnose heart and cardiac disorders based on ECG signals.

In future work, we will propose an automated cardiac arrhythmia classification approach using hybrid SVMs and spiking neural networks with recent meta-heuristic optimization algorithms to focus on common disorders, such as congestive heart failure, and other cases of biomedical time series. Additional important goals are to analyze ECG signals in the time domain and to detect the optimal representation of the P, QRS and T patterns.

References

1. Cardiovascular diseases (CVDs). http://www.who.int/mediacentre/factsheets/fs317/en
2. Bortolan, G. & Willems, J. Diagnostic ECG classification based on neural networks. J. Electrocardiol. 26, 75–79 (1992).
3. Hasan, M. & Mamun, M. Hardware approach of R-peak detection for the measurement of fetal and maternal heart rates. J. Appl. Res. Technol. 10, 835–844 (2012).
4. Houssein, E. H., Kilany, M. & Hassanien, A. E. ECG signals classification: A review. Int. J. Intell. Eng. Informatics 5, 376–396 (2017).
5. Shadmand, S. & Mashoufi, B. A new personalized ECG signal classification algorithm using block-based neural network and particle swarm optimization. Biomed. Signal Process. Control 25, 12–23 (2016).
6. Kora, P. & Kalva, S. R. Hybrid bacterial foraging and particle swarm optimization for detecting bundle branch block. SpringerPlus 4, 1–19 (2015).
7. Moein, S. & Logeswaran, R. Intelligent ECG signal noise removal using PSONN. Int. J. Comput. Appl. 45, 9–17 (2012).
8. Pasolli, E. & Melgani, F. Genetic algorithm-based method for mitigating label noise issue in ECG signal classification. Biomed. Signal Process. Control 19, 130–136 (2015).
9. Li, Q., Rajagopalan, C. & Clifford, G. D. A machine learning approach to multi-level ECG signal quality classification. Comput. Methods Prog. Biomed. 117, 435–447 (2014).
10. Yochum, M., Renaud, C. & Jacquir, S. Automatic detection of P, QRS and T patterns in 12 leads ECG signal based on CWT. Biomed. Signal Process. Control 25, 46–52 (2016).
11. Kora, P. & Krishna, K. R. Hybrid firefly and particle swarm optimization algorithm for the detection of bundle branch block. Int. J. Cardiovasc. Acad. 2, 44–48 (2016).
12. Adam, A., Shapiai, M. I., Mohd Tumari, M. Z., Mohamad, M. S. & Mubin, M. Feature selection and classifier parameters estimation for EEG signals peak detection using particle swarm optimization. The Sci. World J. 2014, 13 (2014).
13. Vafaie, M., Ataei, M. & Koofigar, H. Heart diseases prediction based on ECG signals' classification using a genetic-fuzzy system and dynamical model of ECG signals. Biomed. Signal Process. Control 14, 291–296 (2014).
14. Jovic, A. & Jovic, F. Classification of cardiac arrhythmias based on alphabet entropy of heart rate variability time series. Biomed. Signal Process. Control 31, 217–230 (2017).
15. Mihandoost, S. & Amirani, M. C. Cyclic spectral analysis of electrocardiogram signals based on GARCH model. Biomed. Signal Process. Control 31, 79–88 (2017).
16. Wang, G. G., Deb, S. & d. S. Coelho, L. Elephant herding optimization. In Computational and Business Intelligence (ISCBI), 2015 3rd International Symposium on, 1–5 (IEEE, 2015).
17. Kutlu, Y. & Kuntalp, D. A multi-stage automatic arrhythmia recognition and classification system. Comput. Biol. Medicine 41, 37–45 (2011).
18. Gaetano, A. D., Panunzi, S., Rinaldi, F., Risi, A. & Sciandrone, M. A patient adaptable ECG beat classifier based on neural networks. Appl. Math. Comput. 213, 243–249 (2009).
19. Tsipouras, M. G., Voglis, C. & Fotiadis, D. I. A framework for fuzzy expert system creation application to cardiovascular diseases. IEEE Transactions on Biomed. Eng. 54, 2089–2105 (2007).
20. Ghosh, D., Midya, B. L., Koley, C. & Purkait, P. Wavelet aided SVM analysis of ECG signals for cardiac abnormality detection. In 2005 Annual IEEE India Conference - Indicon, 9–13 (IEEE, 2005).
21. Karpagachelvi, S., Arthanari, M. & Sivakumar, M. Classification of electrocardiogram signals with support vector machines and extreme learning machine. Neural Comput. Appl. 21, 1331–1339 (2012).
22. Rocha, T. et al. A lead dependent ischemic episodes detection strategy using Hermite functions. Biomed. Signal Process. Control 5, 271–281 (2010).
23. Kritika Bawa, P. S. R-peak detection by modified Pan-Tompkins algorithm. Int. J. Adv. Res. & Technol. 3 (2014).
24. Houssein, E. H., Kilany, M., Hassanien, A. E. & Snasel, V. A two-stage feature extraction approach for ECG signals. In International Afro-European Conference for Industrial Advancement, 299–310 (Springer, 2016).
25. Hassanien, A. E. & Emary, E. Swarm Intelligence: Principles, Advances, and Applications (CRC Press, 2016).
26. Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet. Circ. 101, e215–e220 (2000).
27. Korürek, M. & Nizam, A. Clustering MIT–BIH arrhythmias with ant colony optimization using time domain and PCA compressed wavelet coefficients. Digit. Signal Process. 20, 1050–1060 (2010).
28. Márquez, D. G., Otero, A., García, C. A. & Presedo, J. A study on the representation of QRS complexes with the optimum number of Hermite functions. Biomed. Signal Process. Control 22, 11–18 (2015).
29. Chen, S., Hua, W., Li, Z., Li, J. & Gao, X. Heartbeat classification using projected and dynamic features of ECG signal. Biomed. Signal Process. Control 31, 165–173 (2017).
30. Chen, X., Yang, R., Ge, L., Zhang, L. & Lv, R. Heart rate variability analysis during hypnosis using wavelet transformation. Biomed. Signal Process. Control 31, 1–5 (2017).

Author contributions statement

Moataz Kilany performed the experiments, discussed the data and wrote the paper. Aboul Ella Hassanien conceived and supervised the research, discussed the experiments and polished the paper. Essam H. Houssein participated in writing parts of the paper and compiled the references. All authors read and approved the final paper.

Additional information

The authors declare no competing interests.
