Extract fetal ECG from single-lead abdominal ECG by de-shape short time Fourier transform and nonlocal median

The multiple fundamental frequency detection problem and the source separation problem from a single-channel signal containing multiple oscillatory components and a nonstationary noise are both challenging tasks. To extract the fetal electrocardiogra…

Authors: Su Li, Hau-tieng Wu

Extract fetal ECG from single-lead abdominal ECG by de-shape short time   Fourier transform and nonlocal median
EXTRA CT FET AL ECG FR OM SINGLE-LEAD ABDOMINAL ECG BY DE-SHAPE SHOR T TIME FOURIER TRANSFORM AND NONLOCAL MEDIAN LI SU AND HA U-TIENG WU A B S T R AC T . The multiple fundamental frequency detection problem and the source sep- aration problem from a single-channel signal containing multiple oscillatory components and a nonstationary noise are both challenging tasks. T o extract the fetal electrocardiogram (ECG) from a single-lead maternal abdominal ECG, we f ace both challenges. In this paper , we propose a novel method to e xtract the fetal ECG signal from the single channel maternal abdominal ECG signal, without any additional measurement. The algorithm is composed of three main ingredients. First, the maternal and fetal heart rates are estimated by the de- shape short time Fourier transform, which is a recently proposed nonlinear time-frequency analysis technique; second, the beat tracking technique is applied to accurately obtain the maternal and fetal R peaks; third, the maternal and fetal ECG wav eforms are established by the nonlocal median. The algorithm is e valuated on a simulated fetal ECG signal data- base ( fecgsyn database), and tested on two real databases with the annotation provided by experts ( adfecgdb database and CinC2013 database). In general, the algorithm could be applied to solve other detection and source separation problems, and reconstruct the time-varying w av e-shape function of each oscillatory component. 1. I N T R O D U C T I O N Electrocardiograph (ECG) is inarguably the most widely applied measurement to non- in vasi vely study the cardiac activity , since its appearance in 1901 [3]. Its wav eform pro- vides a significant amount of clinical information. In addition, the time-varying speed of heart beating, widely understood as the heart rate variability (HR V), has prov ed to be a por- tal to our physiological dynamical status. While it has been that widely applied in dif ferent scenarios, its application to the intra-uterus fetus is still limited, mainly due to the lack of a direct contact measurement of the fetal ECG (fECG) signal. Like the adult ECG signal processing, there are two main purposes in the fECG signal processing. First, we want to non-in vasi vely obtain the fetal heart rate, which is intimately related to the fetal distress [35]; second, we would like to analyze the fECG morphology for the sake of diagnosing cardiac problems. Howe ver , the fECG morphological analysis is less performed in clinics, except for the ST analysis (ST AN) monitor , which detects and alerts the potential risk for fetal hypoxia (see, e.g. [9] and the citations therein). There are mainly two types of fECG signals. The first kind of signal is directly recorded through an electrode attached to the fetal skin. For example, the electrode could be attached to the scalp while the cervix dilates during deliv ery , which is considered in vasi ve. While the recorded signal is of high quality , it can only be recorded during a specific and short period and the instrument is not designed for the long-term monitoring purpose. Also, the infection risk is not negligible. So it is not routinely used in clinics. W e call it the dir ect fECG signal , and we mention that the ST AN monitor depends on the direct fECG signal. The second kind of signal is recorded from the mother’ s abdomen, where the sensor is close to the fetus so that the fECG signal is big enough compared with the maternal 1 2 L. SU AND H.-T . WU ECG. The recorded signal is called the abdominal ECG (aECG), which is composed of the maternal cardiac activity , called the maternal abdominal ECG (maECG) , and the fetal cardiac acti vity , called the indir ect fECG signal (or nonin vasi ve fECG signal). When there is no danger of confusion, we call the indirect fECG signal simply the fECG signal in this paper . An excellent summary of the av ailable measurement techniques and fECG history (as well as sev eral other topics) is provided in [35, 56]. While the aECG signal is non-in v asi ve, easy-to-obtain, and suitable for the long-time monitoring purpose, ho we ver , from the signal processing vie wpoint, it is challenging to obtain the indirect fECG signal from the aECG signal. For example, the fECG signal is al- ways “contaminated” by or mixed with the maECG, and the signal-to-noise ratio (SNR) is in general lo w . These issues challenge the estimation of the fECG and hence the HR V anal- ysis from the aECG signal. Furthermore, ev en if the maECG signal could be successfully decoupled from the fECG signal and perfectly denoised, interpreting the morphology of fECG signal is still challenging. This issue originates from the indi vidual variation among subjects, for example, the uterus position and shape, and the fetal size and presentation. So, e ven if we could standardize the lead system on the mother’ s abdomen, the application of the fECG wa veform is still limited. The challenge has attracted a lot of attention in the past decades, and se veral algorithms hav e been proposed. As is summarized in [20], most methods take the following fi ve steps to study the aECG: first, pre-process the aECG; second, estimate the maECG; third, remov e the maECG from the aECG; fourth, post-process the remainder to obtain the fECG and/or estimate the R peaks and hence the fIHR. In short, the maECG is remov ed first so that the fECG could be analyzed from the remainder . While available algorithms could be classified in different ways based on different criteria, here, for the work presented in this paper , we summarize the existing algorithms based on the number of needed leads, and classify them into two categories – one depends on more than one ECG channel and one depends on only one aECG signal. Most algorithms need multiple aECG channels and/or one maternal thoracic ECG (mtECG) signal, or at least one aECG channel and one mtECG; for example, blind source separation (BSS) [23, 2, 24, 65], semi-BSS like periodic component analysis ( π CA), or π T ucker de- composition, which tak es the pseudo-periodic structure into account [58, 32, 1], echo state neural network [7], least mean square (LMS) [66], recursive least square (RLS) [7], and blind adapti ve filtering [30], Kalman filter [55, 50, 6], channel selection approach based on features extracted by different methods, like discrete w avelet transform [28], time-adapti ve W iener-filter like filtering [54], principal component regression [46], phase space embed- ding [37], to name but a fe w . On the other hand, fewer algorithms depend on the single-lead aECG signal; for exam- ple, template subtraction (TS) [64, 13, 49, 63, 7], and its variation based on singular value decomposition (SVD) or principal component analysis [36, 18], the time-frequency anal- ysis, like wavelet transform, pseudo-smooth W igner-V ille distribution [39, 38, 12, 4] (in practice, three aECG channels are av eraged in [38]), and S-transform [42], sequential to- tal variation [43], adaptiv e neuro-fuzzy inference system and extended Kalman filter [52], particle swarm optimization and extended Kalman smoother [51] state space reconstruc- tion via lag map [53, 40], etc. W e refer the reader with interest to, for example, [56, 5] for a more detailed revie w of av ailable methods. The above-mentioned algorithms all have their own merits and disadvantages; for ex- ample, algorithms depending on multiple leads usually provide a more accurate result, but SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 3 the dependence on multiple leads render it less applicable for the screening and monitor- ing purpose; on the other hand, the algorithms depending on the single-lead aECG signal usually ha ve lo wer accuracy , while the y could be applied to a wider range of situations. T o simultaneously fulfill the practical need and the accurac y , in this paper , we propose a no vel algorithm to extract fetal instantaneous heart rate (fIHR) and the fECG signal from the single-lead aECG signal from a dif ferent viewpoint. The proposed algorithm combines a recently developed nonlinear time-frequency (TF) analysis called the de-shape short time Fourier transform (de-shape STFT), and the nonlocal median; the de-shape STFT extracts the maternal instantaneous heart rate (mIHR) from the single-lead aECG, which provides the maternal R peak information. The maECG is then extracted from the aECG by the non- local median algorithm. The difference between the aECG and estimated maECG serves as a rough fECG estimate, and the fIHR could be estimated from the rough estimate of fECG by de-shape STFT , and hence the fetal R peaks. The fECG is then extracted by the nonlocal median algorithm. The R peak information could be accurately estimated by the beat tracking algorithm based on the dynamic programming. While not explicitly used in the algorithm, we mention that our method has the ability to simultaneously obtain the fIHR and mIHR, and hence simultaneously the fECG and mECG. The novelty and the main difference between our proposed method and the other al- gorithms based on the single-lead aECG signal are two folds. First, we use more infor- mation hidden in the single-lead aECG signal. Note that the traditional R peak detection algorithms mainly count on the morphological landmarks (fiducial points), like the maxi- mal points representing the R peaks, or the maximal “energy” pattern driv en by the QRS complex in the TF domain determined by , for example, the wav elet transform. Then the mIHR and fIHR are obtained by interpolating the estimated R peak locations. On the other hand, de-shape STFT allows us to dir ectly extract the mIHR and fIHR from the single- lead aECG signal, and the fIHR from the rough fECG estimate. W e could then utilize the mIHR and fIHR to guide an accurate R peak detection. Unlike the traditional approach, we simultaneously use the frequenc y information (the mIHR and fIHR), which reflects the time-varying and nonlinear beat-to-beat relationship, and the morphological landmark in- formation. Second, based on the nonlinear manifold model, we apply the nonlocal median algorithm [15, 45] to extract the maECG and fECG signals – for each cardiac acti vity can- didate, we only consider those aECG segments with a similar pattern, and use the median to estimate the underlying cardiac activity . Compared with the traditional TS methods [36], where the mean, or the mean together with the first few principal components, of consec- utiv e aECG segments containing cardiac acti vities is considered to be the template of the cardiac activity , the nonlocal median algorithm takes care of the following commonly en- countered issues. The fact that the QRST complex morphology is time-v arying [48] might be ov erlooked in the traditional TS procedure; the mean of consecutiv e aECG segments containing cardiac activities is well kno wn to be sensitive to outliers; the TS algorithm is sensitiv e to the number of principal components and an empirical optimization is needed [18]. In sum, the de-shape STFT is applied to get a better mIHR and fIHR and hence maternal and fetal R peaks, and the nonlocal median is applied to get a better mECG and fECG. The paper is organized in the follo wing. In Section 2, we discuss a phenomenological model for the aECG and the mathematical background for the de-shape STFT and nonlo- cal median. In Section 3, the single-lead fECG extraction algorithm is introduced. The material and results are reported in Section 4. In Section 5, the paper is summarized by a discussion, including limitations and future works. 4 L. SU AND H.-T . WU 2. M AT H E M ATI C A L B A C K G R O U N D The aECG signal contains at least two components of interest: the component associ- ated with the maternal cardiac acti vity and the component associated with the fetal cardiac activity , which have different time-v arying frequencies and different non-sinusoidal oscil- lations. These lead to a wide spectrum, so the usual linear signal processing techniques do not work. While it is challenging enough to separate these two components, the prob- lem becomes more challenging considering the influence of different kinds of noise in the measurement. Furthermore, due to the physiological nature of the ECG signal, the non- sinusoidal oscillation is not just impulse-like but also time-varying (see, for example, the nonlinear relationship between QT and RR intervals [48]); that is, we will run into the time-varying wav e-shape function issue. In this section, we provide a phenomenological model suitable for the aECG signal and an algorithm suitable for analyzing this kind of sig- nal. Second, we provide a lo w dimensional and nonlinear geometric model to describe the maternal and fetal cardiac acti vities. Based on this nonlinear model, the nonlocal median algorithm is introduced to reconstruct the time-varying wav e-shape function, and hence extract the mECG and fECG from the aECG. 2.1. Adaptive non-harmonic model and de-shape short time Fourier transform. T o extract the fECG information from the aECG signal, we propose to apply the adaptive non-harmonic model to model the aECG signal, and apply the de-shape STFT to study the aECG. 2.1.1. Adaptive non-harmonic model. W e start from introducing the adaptive non-harmonic model. T ake a small enough 0 ≤ ε < 1, a non-neg ati ve sequence c = { c ( ` ) } ∞ ` = 0 , 0 < C < ∞ and N ∈ N . The set of functions D c , C , N ε in the space of bounded and continuous functions with continuous first order deriv ativ es, denoted as C 1 ( R ) ∩ L ∞ ( R ) , consists of functions x ( t ) = 1 2 B 0 ( t ) + ∞ ∑ ` = 1 B ` ( t ) cos ( 2 π φ ` ( t )) (1) satisfying the following three conditions. First, the re gularity condition says that B ` ∈ C 1 ( R ) ∩ L ∞ ( R ) , φ ` ∈ C 2 ( R ) for each ` = 0 , . . . ∞ . For all t ∈ R , B 1 ( t ) > 0, B ` ( t ) ≥ 0 for all ` = 0 , 2 , 3 , . . . , ∞ and φ 0 ` ( t ) > 0 for all ` = 1 , . . . , ∞ . Second, the time-varying wave-shape condition says that for all t ∈ R , (2)     φ 0 ` ( t ) φ 0 1 ( t ) − `     ≤ ε for all ` = 0 , 1 , . . . , ∞ , (3) ε < B ` ( t ) B 1 ( t ) ≤ c ( ` ) for all ` = 1 , . . . , N , ∑ ∞ ` = N + 1 B ` ( t ) ≤ ε B 1 ( t ) , and ∑ ∞ ` = 0 ` c ( ` ) ≤ C . Third, the slowly varying condition says that for all t ∈ R , | B 0 ` ( t ) | ≤ ε c ( ` ) φ 0 1 ( t ) , | φ 00 ` ( t ) | ≤ ε ` φ 0 1 ( t ) , (4) for each ` = 0 , . . . ∞ , and k φ 0 1 ( t ) k L ∞ < ∞ . W e call a function x in D c , C , N ε an adaptive non-harmonic (ANH) function , where the adjectiv e non-harmonic indicates the possibly non-sinusoidal nature of the oscillation, and the adjectiv e adaptive indicates the time-varying nature of the frequency , amplitude, and SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 5 the non-sinusoidal oscillatory pattern. W e call B 1 ( t ) cos ( 2 π φ 1 ( t )) the fundamental com- ponent , B 1 ( t ) the fundamental amplitude , φ 1 the phase function , and φ 0 1 the fundamental instantaneous frequency (IF) of the signal x . By a slight abuse of terminology , for ` > 1, we call B ` cos ( 2 π φ ` ( t )) the ` -th multiple of the fundamental component, which we simply call the ` -th multiple if no danger of confusion is possible, B ` ( t ) the amplitude of the ` -th multiple , and φ 0 ` the IF of the ` -th multiple , although φ ` might not be an e xact integral mul- tiple of φ 1 . Thus, we could view an ANH function as an oscillatory component with the time-varying amplitude, fr equency , and wave-shape function . A special case deserves a discussion. When β ` : = B ` ( t ) B 1 ( t ) are constants for all ` = 0 , 1 , . . . , ∞ and φ 0 ` ( t ) = ` φ 0 1 ( t ) + α ` for some α ` ∈ R for all ` = 1 , . . . , ∞ , (1) is reduced to the x ( t ) = B 1 ( t ) s ( φ 1 ( t )) , (5) where s is a 1-periodic function with the Fourier coefficients determined by β ` and α ` . In this case, clearly the phase function φ 1 is linearly related to the deformation of the non- sinusoidal oscillation, and we say that the wav e-shape function is not time-varying. On the contrary , for an ANH function, the phase function φ 1 might be nonlinearly related to the deformation of the non-sinusoidal oscillation. If we further assume that ε = 0, then we obtain the well known harmonic function. T o motiv ate this model, take the relationship between the RR and QT intervals of an ECG signal as an example. The nonlinear relationship between the QT interval and the RR interval has been well studied – for example, the Fridericia’ s formula (QT interval is proportional to the cubic root of RR interval) or a fully nonlinear depiction [48]. Thus, the QRS complex representing the ventricular response is not linearly related to the in- stantaneous heart rate, and we need a time-v arying w av e-shape function to model this physiological fact. For more detailed discussion, we refer the reader with interest to [44]. In general, a signal might be composed of more than one oscillatory component; for example, the aECG signal is composed of the fECG and maECG. T ake a small enough 0 < ε < 1 and d > 0. W e consider the set D ε , d ⊂ C 1 ( R ) ∩ L ∞ ( R ) consisting of superposition of ANH functions; that is, (6) f ( t ) = K ∑ k = 1 f k ( t ) for some finite K ∈ N and f k ( t ) = ∞ ∑ ` = 0 B k ,` ( t ) cos ( 2 π φ k ,` ( t )) ∈ D c k , C k , N k ε k for some 0 ≤ ε k ≤ ε , non-negativ e sequence c k = { c k ( ` ) } ∞ ` = 0 , 0 < C k < ∞ and N k ∈ N , where the fundamental IF’ s of all ANH functions satisfy the following two conditions if K > 1. First, the fr equency separation condition says that (7) φ 0 k , 1 ( t ) − φ 0 k − 1 , 1 ( t ) ≥ d for k = 2 , . . . , K . Second, the non-multiple condition says that for each k = 2 , . . . , K , φ 0 k , 1 ( t ) / φ 0 `, 1 ( t ) is not an integer for ` = 1 , . . . , k − 1. W e say that a signal in D ε , d satis- fies the ANH model . 2.1.2. Model the maternal abdominal ECG signal by the ANH model. W e now model a recorded aECG signal, x ( t ) , by the ANH model, whichs satisfies (8) x ( t ) = x m ( t ) + x f ( t ) + n ( t ) , 6 L. SU AND H.-T . WU where x m ( t ) ∈ D c m , C m , N m ε for some c m , C m , N m is the maECG signal, x f ( t ) ∈ D c f , C f , N f ε for some c f , C f , N f is the fECG signal, and n ( t ) is noise. Here, the fundamental IF of x m (re- spectiv ely x f ) is the mIHR of maECG (respecti vely fECG). n ( t ) includes dif ferent kinds of noise, ranging from the baseline wandering, po wer line interference, maternal electromyo- graphic signal, to uterine contraction, so in general it is not stationary . W e consider either smooth varying non-stationary noise model with a smooth and slowly varying covariance function for n ( t ) to capture the possible heteroscedasticity and autocorrelation inside the noise [17, Equation (6)], or a more general piece wise locally stationary model [67, Defini- tion 1] to further capture the abrupt change inside the noise structure. In sum, both x m ( t ) and x f ( t ) are signals with time-varying IF due to the inevitable HR V with time-v arying amplitude and wav e-shape function representing the cardiac activity , and the recorded signal might be contaminated by different kinds of ine vitable noise. 2.1.3. de-shape short-time F ourier transform. There are sev eral challenges of analyzing the aECG signal, including the time-varying amplitude, time-varying frequency , and the time-varying non-sinusoidal oscillation. T o deal with this kind of signal, under the ANH model (8), we could apply the currently proposed algorithm, the de-shape STFT . The de- shape STFT is a nonlinear TF analysis technique combining the well known STFT and the cepstrum technique commonly applied in the signal processing field, and it is composed of the follo wing four steps. First, note that x ( t ) is a tempered distribution, so with a chosen window function h ∈ S , where S is the Schwartz space, we ha ve (9) V ( h ) x ( t , ξ ) = Z x ( τ ) h ( τ − t ) e − i 2 π ξ ( τ − t ) d τ , where t ∈ R indicates time and ξ ∈ R indicates frequency . Clearly , V ( h ) f ( t , ξ ) ∈ C ∞ is smooth and slowly increasing on both time and frequency axes. Precisely , the frequency function V ( h ) f ( t , · ) for any t (as well as the temporal function V ( h ) f ( · , ξ ) for any ξ ) and all its deriv atives hav e at most polynomial growth at infinity . Second, ev aluate the short time cepstral tr ansform (STCT) in order to obtain the fundamental period and its multiples: (10) C ( h , γ ) x ( t , q ) : = Z | V ( h ) x ( t , ξ ) | γ e − i 2 π q ξ d ξ , where γ > 0 is sufficiently small and q ∈ R is called the quefency (its unit is second or any feasible unit in the time domain). Since V ( h ) f ( t , ξ ) ∈ C ∞ is smooth and slowly increas- ing, | V ( h ) f ( t , · ) | γ is continuous and slowly increasing. Hence its Fourier transform is well- defined in the distribution sense. Third, ev aluate the inver se short time cepstral transform (iSTCT) , which is defined on R × R + as (11) U ( h , γ ) x ( t , ξ ) : = C ( h , γ ) x ( t , 1 / ξ ) , where ξ > 0 (its unit is Hz or any feasible unit in the frequency domain) and U ( h , γ ) x ( t , · ) is in general a distribution. Precisely , the map ξ → 1 / ξ from ( 0 , ∞ ) to ( 0 , ∞ ) is open and its differentiation is surjective on ( 0 , ∞ ) , so for a distribution C ( h , γ ) x ( t , · ) , for any t , defined on ( 0 , ∞ ) , C ( h , γ ) x ( t , 1 / ξ ) is well-defined [33, Theorem 6.1.2]. Fourth, we remove all the multiples by ev aluating the de-shape STFT , which is defined on R × R + as (12) W ( h , γ ) x ( t , ξ ) : = V ( h ) x ( t , ξ ) U ( h , γ ) x ( t , ξ ) , where ξ > 0 is interpreted as frequency . Note that the de-shape STFT is well-defined as a distribution since V ( h ) f ( t , ξ ) is a C ∞ function and U ( h , γ ) f ( t , ξ ) is a distribution. SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 7 The main motiv ation of the de-shape STFT is to decouple the IF and the non-sinusoidal wa ve-shape function. Due to the non-sinusoidal oscillation, at each time t , in STFT we could see not only the fundamental frequency b ut also its multiples. The existence of multiples, when there are more than one component, interfere with each other and mask the true information we have interest. Thus, the notion of cepstrum is applied to obtain the fundamental period information of the signal. Note that the fundamental period and its multiples, after the inv ersion, become the fundamental frequency and its divisions, and hence the common ingredient between iSTCT and STFT is the fundamental frequency . Thus, after a direct element-wise product, only the fundamental frequency is preserved in the de-shape STFT . This approach could be viewed as a “nonlinear filter” technique, which uses the “dual information” of the spectrum (the cepstrum), as a mask to remov e the irrelev ant information (the wa ve-shape function) and keep the relev ant information (the IF). It has been shown in [44, Theorem 3.6] that under the ANH model (6), the de-shape STFT could extract the fundamental IF of each component, where all multiple IF’ s are sup- pressed. In other w ords, the time-v arying wa ve-shape information of maECG and fECG is decoupled from the IF and AM of maECG and fECG in the TF representation. The imple- mentation of the de-shape STFT in the discrete-time domain will be discussed in Section 3. 2.2. Nonlocal median. Nonlocal median algorithm [15, 45] is a variation of the well- known nonlocal mean algorithm in the image processing field, mainly for the purpose of image denoising [11, 60]. After obtaining the fundamental IF of each component, we apply the nonlocal median algorithm to recover the time-varying wav e-shape function. In this study , it allows us to extract the maECG and fECG from the aECG signal. W ithout loss of generality , we assume that maECG and fECG are both with the Rs pattern in the aECG so that we could discuss the R peaks. The discussion holds for the S peaks if either maECG or fECG has the rS pattern. T ake the maECG into account. Suppose the k -th cardiac activity , which could be normal or ectopic, without any other pathological arrhythmia, in the maECG starts at s ( k ) m ∈ R and ends at e ( k ) m ∈ R , and suppose the k -th R peak is located at r ( k ) m ∈ ( s k , e k ) . Similarly , we could define s ( k ) f , e ( k ) f and r ( k ) f for the k -th cardiac acti vity in the fECG. By the physiological property of the cardiac activity , we kno w that e ( k ) m < s ( k + 1 ) m and e ( k ) f < s ( k + 1 ) f for all k , where ( e ( k ) m , s ( k + 1 ) m ) and ( e ( k ) f , s ( k + 1 ) f ) are periods where the maECG and fECG are isoelectric respectiv ely (kno wn as the TP interval). Thus, the ANH function x m in (8) could be written as (13) x m ( t ) = ∑ k ∈ Z x m ( t ) χ [ s ( k ) m , e ( k ) m ] , where χ [ s ( k ) m , e ( k ) m ) is an indicator function defined on [ s ( k ) m , e ( k ) m ] . Note that due to the time- varying nature of the non-sinusoidal oscillation, the amplitude of the fundamental compo- nent and each multiple in the ANH function is nonlinearly related to the wa veform mor- phology . It is well known that electrophysiologically , the fECG and maECG are similar , except the heart rate [56], so the above discussion could be carried over to the fECG, x f . Consider l m = max k { r ( k ) m − s ( k ) m } > 0 and r m = max k { e ( k ) m − r ( k ) m } > 0, and denote the k -th abdominal cardiac activity mixture as (14) x ( k ) m : [ 0 , l m + r m ] → R , 8 L. SU AND H.-T . WU where x ( k ) m ( t ) = x ( r ( k ) m + t − l m ) when t ∈ [ l m − ( r ( k ) m − s ( k ) m ) , l m + ( e ( k ) m − r ( k ) m )] and x ( k ) m ( t ) = 0 otherwise. Clearly , x ( k ) m is the k -th aECG segment containing the k -th maternal cardiac activities, denoted as (15) x ( k ) m , m : [ 0 , l m + r m ] → R , where x ( k ) m , m ( t ) = x m ( r ( k ) m + t − l m ) when t ∈ [ l m − ( r ( k ) m − s ( k ) m ) , l m + ( e ( k ) m − r ( k ) m )] and x ( k ) m , m ( t ) = 0 otherwise, and several fetal heart beats, since normally the fetal heart rate is higher . Denote (16) X m : = { x ( k ) m } k ∈ N ⊂ C 1 ([ 0 , l m + r m ]) to be collection of aECG segments from x ( t ) and (17) X m , m : = { x ( k ) m , m } k ∈ N ⊂ C 1 ([ 0 , l m + r m ]) to be collection of maECG segments from the underlying maternal cardiac activities. W e could thus view X m as a “noisy” collection of maECG segments X m , m , and the mission is to recov er X m , m from X m . Similarly , we could define X f = { x ( k ) f } k ∈ N ⊂ C 1 ([ 0 , l f + r f ]) and X f , f = { x ( k ) f , f } k ∈ N ⊂ C 1 ([ 0 , l f + r f ]) , where l f : = max k { r ( k ) f − s ( k ) f } > 0, r f : = max k { e ( k ) f − r ( k ) f } > 0, x ( k ) f ( t ) = x ( r ( k ) f + t − l f ) when t ∈ [ l f − ( r ( k ) f − s ( k ) f ) , l f + ( e ( k ) f − r ( k ) f )] and x ( k ) f ( t ) = 0 otherwise, and x ( k ) f , f ( t ) = x f ( r ( k ) f + t − l f ) when t ∈ [ l f − ( r ( k ) f − s ( k ) f ) , l f + ( e ( k ) f − r ( k ) f )] and x ( k ) f , f ( t ) = 0 otherwise. Physiologically , it is well known that while the underlying mechanism leading to the cardiac activities might be complicated [5], phenomenologically they are similar from beat to beat. There are two dominant parameters that quantify the similarity between cardiac activities – the scaling and the dilation, where the scaling reflects the respiratory activity and the dilation reflects the nonlinear relationship between the RR interval and QT inter- val. Also, the wav eform representing the cardiac activity should be bounded and with a bounded differentiation. This fact could be summarized in the following: Assumption 2.1. The dataset X m , m is sampled fr om a random vector V m , where V m has the r ange supported on a bounded set inside C 1 ([ 0 , l m + r m ]) with a low dimensional struc- tur e. T o simplify the qualitative description “low dimensional structur e”, we assume a low dimensional smooth and compact manifold to quantify the range of V m . Note that due to the fact that the maECG amplitude and frequency are both time-varying, under this model, two consecutiv e maternal cardiac acti vities might be far away in the manifold. Similarly , we have the same assumption for the fetal cardiac activity; that is, the set X f , f is sampled from a random vector V f , with the range supported on a bounded set inside C 1 ([ 0 , l f + r f ]) with a low dimensional structure. A critical assumption we need to apply the nonlocal median algorithm is the following Assumption 2.2. The random vectors V m and V f ar e independent. This assumption essentially says that for different x ( k ) , while the maternal cardiac ac- tivities are similar , the fetal cardiac activities sit in random positions. W ith the above setup and assumptions, we could now introduce the nonlocal median algorithm. For each x ( k ) m ∈ X m , find its K nearest neighbors with the L 2 norm, where K ∈ N is chosen by the user . Precisely , by ranking d k , l : = k x ( k ) m − x ( l ) m k L 2 in the ascending order , SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 9 we have the set N ( k ) m containing the first K neighbors of x ( k ) m in X m with the smallest L 2 norm. Note that we could also consider the correlation or other more sophisticated metrics, but to keep the discussion simple, we focus on the L 2 norm in this paper . Then, the k -th maternal cardiac activity is estimated by (18) ˜ x ( k ) m ( t ) : = median { x ( j ) m ( t ) | x ( j ) m ∈ N ( k ) m } for all t ∈ [ 0 , l m + r m ] . Based on Assumption 2.1, the neighbors we find ha ve similar mater - nal cardiac activities while the y are not neighbors in the temporal axis. On the other hand, it is well known that the median is robust to outliers. Note that under Assumption 2.2, for different x ( k ) , the fetal cardiac activities sit in dif ferent positions in the support of x ( k ) , and note that the ECG morphology associated with the ventricular acti vity is spiky , so the fetal cardiac acti vity occupies a small portion of the interval [ 0 , l m + r m ] . As a result, the median value will faithfully reflect the maternal cardiac acti vity at t , and hence the recov ery of the maECG. The above procedure could be applied to extract the fECG signal from the aECG signal, if we rev erse the role of the maECG and fECG, and consider X f from X f , f . 3. M E T H O D O L O G Y In this section, we introduce our single-lead fECG extraction algorithm, and provide a summary of the used statistics for the ev aluation. 3.1. Single-channel fetal ECG extraction algorithm. W e now introduce our algorithm to extract the fECG signal from the aECG signal. The aECG signal x ( t ) is sampled with the sampling rate f s ∈ N during the interv al from the 0-th second to the T -th second, where T > 0. Denote (19) x 0 : = [ x ( 0 ) , x ( 1 / f s ) , . . . , x ( b T / f s c )] T ∈ R N , where N = b T / f s c + 1, to be the collected aECG signal. Thus we hav e (20) x 0 = x m + x f + n , where x m , x f , and n are the discretized maECG signal, fECG signal and noise. The algo- rithm is summarized in the flowchart in Figure 1. Below we detail the algorithm step by step. 3.1.1. Step 0: Pr eprocessing . The first step of the proposed algorithm, as most other al- gorithms, is signal preprocessing. Among all the analyses, we apply the following steps. First, we remove the baseline wandering by the median filter with the window size 100 ms. If needed, the power -line interference is suppressed by a zero-phase notch filter at 50 or 60 Hz. T o preserve the non-stationary nature of the signal, we do not apply any other linear filtering technique to the signal. If the signal is sampled at the frequency lower than 1000 Hz, to enhance the R peak alignment needed in the nonlocal median step, the signal is upsampled to 1000 Hz [41]. T o simplify the notation, we use the same notation f s and N to denote the resulting sampling rate and the resulting number of sampling points, and denote the resulting signal as x ∈ R N . 10 L. SU AND H.-T . WU Input the signal lead aECG x 0 preprocessing get pr epr ocessed aECG x de-shape STFT get TFR of x Curve e xtraction get mIHR (and optionally fIHR) Beat tracking get maternal R peaks nonlocal median get maECG ˜ x m and r ough fECG ˜ x f , 0 : = x − ˜ x m de-shape STFT get TFR of ˜ x f , 0 Curve e xtraction get fIHR Beat tracking get fetal R peaks nonlocal median get fECG ˜ x f Output fIHR , fetal R peaks , and fECG F I G U R E 1 . The flow chart of the proposed algorithm for extracting fECG from the signal lead aECG signal. The shown aECG signal is of 6 seconds long. SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 11 3.1.2. Step 1: Run de-shape STFT on x to estimate the maternal instantaneous heart rate. W e apply the de-shape STFT algorithm to extract the IF information of the maternal and fetal cardiac activities from the preprocessed aECG signal x . Fix the frequency resolution of the STFT by f s 2 M , where M ∈ N is the number of discretization points in the frequency axis, and the quefency resolution by M f s M 0 , where M 0 ∈ N is the number of discretization points in the quefency axis. The numerical implementation of the de-shape STFT algorithm is summarized in Algorithm 1. Denote W x ∈ C N , M to be the de-shape STFT of x . Algorithm 1 de-shape STFT algorithm. [INPUT] the aECG signal x ∈ R N ; a discretized window function h ∈ R 2 k + 1 , k ∈ N ; a sufficiently small γ > 0; an upsample factor α ∈ N . [STEP 1-1] Evaluate the STFT of x , denoted as V x ∈ C N , 2 M , where M ∈ N is related to the chosen frequency resolution, by V x ( n , m ) = 1 f s k ∑ l = − k x ( n + l ) h ( l + k + 1 ) e − i π n ( m − M ) / M , where m = 1 , . . . , 2 M and we pad x by 0 so that x ( l ) = 0 when l < 1 and l > N . [STEP 1-2] Evaluate the STCT of x , denoted as C x ∈ C N , 2 M by: C x ( n , m 0 ) : = f s 2 M 2 M ∑ m = 1 | V x ( n , m ) | γ e − i π m 0 ( m − M ) / M , where m 0 = 1 , . . . , M . [STEP 1-3] Upsample the positive quefency axis of C x to be α -time finer; that is, con- struct a new matrix ˜ C x ∈ C N , α M , where ˜ C x ( n , m 00 ) : = C x  n , M + m 00 α  for all n = 1 , . . . , N and m 00 = 1 , . . . , 2 α M . [STEP 1-4] Evaluate the iSTCT of x , denoted as U x ∈ C N , M , by: U x ( n , m ) : = d m + 1 / 2 e ∑ 1 / m 00 = d m − 1 / 2 e ˜ C x  n , 1 m 00  , where n = 1 , . . . , N and m = 1 , . . . , M . [STEP 1-5] Evaluate the de-shape STFT of x , denoted as W x ∈ C N , M , by: W x ( n , m ) : = V x ( n , m ) U x ( n , m ) , where m = 1 , . . . , N and n = 1 , . . . , M . [OUTPUT] the de-shape STFT of x , W x , for the postprocessing. It has been systematically reported in [44] that the fundamental frequencies of the maECG and fECG are represented as dominant curves in the TFR. This fact allo ws us to extract the salient fundamental IF information of each oscillatory component [44]. While the feature inside TFR determined by the de-shape STFT is suitable for sev eral IF tracking methods, including dynamic programming (DP), dynamic Bayesian networks, adapti ve filters, and others, to simplify the discussion, we apply the simple DP curve extraction algorithm [17] to track the peaks and estimate the IFs based on one assumption that the maECG is strong than the fECG. The detailed procedure is as follows: 12 L. SU AND H.-T . WU (1) Performing DP on W x to extract the most dominant component. The extracted curve, when adjusted with the frequency resolution, denoted as η m ∈ R N , repre- sents the mIHR. Precisely , suppose at time n , the most dominant component is located at W x ( n , j n ) , then η m ( n ) = j n f s 2 M . (2) (Optional step 1: simultaneously estimate the fIHR) For ev ery time n , multi- ply W ( n , j ) by θ m , where j ∈ { η m ( n ) − N ∆ , η m ( n ) − N ∆ + 1 , . . . , η m ( n ) + N ∆ − 1 , η m ( n ) + N ∆ } and N ∆ ∈ N and 0 ≤ θ m < 1 are chosen by the user , to suppress the maternal cardiac acti vity . This procedure makes the fECG be the predominant component in W x . Then, performing the curve extraction on W x again to extract the fIHR, denoted as η f ∈ R N . In practice, we could choose N ∆ so that N ∆ f s 2 M = 0 . 1 Hz and θ m = 10 − 4 . This optional step is not carried out in this paper . The temporal complexity of ev aluating the de-shape STFT is O ( N M log M ) . Indeed, the ev aluation of STFT is local in nature, and it depends on the chosen window and the frequency resolution. 3.1.3. Step 2: Obtain the maternal R peaks by beat tracking and dynamic pr ogr amming. Note that although theoretically the IHR is related to the R peak to R peak interval (RRI) time series, in practice there is a discrepancy due to the time-varying nature of the wav e- shape function. T o obtain the exact R peak location, we apply the beat tracking technique, which has been well studied in music signal analysis 1 . Define the estimated maternal RRI as δ m ( n ) : = 1 / η m ( n ) , which is the in verse of the mIHR. Our goal is to find an strictly increasing sequence B m = { b i } M m i = 1 of length M m , where M m ∈ N and 0 < b i ≤ N so that b i is the index of i -th maternal R peak position and ( b i − b i − 1 ) / f s is close to the estimated maternal RRI at time b i / f s , δ m ( b i ) . W e call B m the maternal beat sequence . This problem is understood as the beat tracking problem, and it is formulated as the following optimization problem [26]: (21) ˜ B m = argmax B m h M m ∑ i = 1 x ( b i ) + λ BT M m ∑ i = 2 P ( b i , b i − 1 ) i , where P ( b i , b i − 1 ) : = −  log 2  ( b i − b i − 1 ) / f s δ m ( b i )  2 and λ BT ≥ 0 is the penalty term determined by the user, which balances the influence of R peak amplitude and the estimated maternal RRI. Notice that [26] assumes δ m ( n ) (the inter-beat interval in the music clip; in our case, the IHR) being a constant function, which is clearly not suitable for an oscillatory signal with time-varying frequency . T o address this issue, we modify the original formulation in [26] by allo wing a time-v arying function 1 / η m ( n ) . Note that the first term in the objectiv e function in (21) reaches its maximal value when b i matches the R-peak location, and the second term penalizes the discrepanc y between the estimated IHR and the IHR determined by the true RR interval, with the maximal value zero. The resulting beat sequence ˜ B m = { ˜ b i } M m i = 1 provides an estimate of R peaks location; that is, ˜ b i is an estimate of the i -th R peak location. In our experiments, setting λ BT between 20 and 50 turns out to yield a suitable tradeoff, and the result is not sensiti ve to λ BT . While the number of possible beat sequences in x grows exponentially as N grows, the optimization problem (21) can be solved effecti vely by the DP . The main idea leading to the DP is that the objecti ve function in (21) is an accumulation in time. Thus, we could divide the problem into a set of optimization subproblems, each of which optimizes the objecti ve 1 The beat tracking problem in music refers to finding the instants of beats in music by analyzing the accents of music signals (e.g., drum hits). SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 13 function up to a prescribed time step, and the solution is the combination of the optima at ev ery time step. This is implemented by introducing two additional vectors C ∈ R N and D ∈ R N , where C ( n ) records the maximum of the objective function accumulated from 1 to n , where n = 1 , . . . , N , and D ( n ) records the estimated beat position yielding this maximum at the time step of n − 1. D allows us to trace previous beat positions, and this step is called the backtr acking . The whole procedure is sketched in Algorithm 2. More details of this method can be found in [26], and the source code is av ailable in http://labrosa.ee.columbia.edu/projects/beattrack/ . Algorithm 2 Beat tracking by dynamic programming. [INPUT] the aECG signal x ∈ R N ; an estimated sequence of RR intervals δ m ; an weighted parameter λ BT ; [STEP 2-1] Initialize C ( 0 ) = 0, D ( 0 ) = 0 [STEP 2-2] Accumulate the objectiv e function and store the result for i = 1 to N do C ( n ) = x ( n ) + max m = 1 ,..., n − 1 [ C ( m ) + λ BT P ( n , m )] ; D ( n ) = argmax m = 1 ,..., n − 1 [ C ( m ) + λ BT P ( n , m )] ; end for [STEP 2-3] Backtracking a 1 = argmax n C ( n ) ; while D ( a l ) > 0 do l ← l + 1; a l = D ( a l − 1 ) ; end while [OUTPUT] the optimal beat sequence ( b 1 , · · · , b M m ) , where b j = a M m − j + 1 , j = 1 , . . . , M m . 3.1.4. Step 3: Estimate the maECG morphology by the nonlocal median. W ith the mater- nal R peak location, we could extract x m from the aECG signal. Based on the physiologi- cal knowledge, we choose L m , R m ∈ N so that [ ˜ b i − L m f s , ˜ b i + R m f s ] , where i = 1 , . . . , M m , is long enough to cov er the i -th maternal cardiac acti vity . Define the aECG se gment x i ∈ R L m + R m + 1 associated with the i -th maternal cardiac activity , where i = 1 , . . . , M m and x ( i ) : =  x ( ˜ b i − L m ) . . . x ( ˜ b i − 1 ) x ( ˜ b i ) x ( ˜ b i + 1 ) . . . x ( ˜ b i + R m )  t , (22) where the superscript t means taking the transpose. Note that the i -th R peak is located on the ( L m + 1 ) -th entry of all x i . Denote N ( i ) m = { x ( i 1 ) , . . . , x ( i K m ) } to be the first K m nearest neighbors of x ( i ) with respect to the L 2 norm, where K m ∈ N is chosen by the user . The i -th maternal cardiac activity is thus estimated by ˜ x ( i ) m ( l ) : = median { x ( i j ) ( l ) } K m j = 1 , (23) where l = 1 , . . . , L m + R m + 1. Before estimating the maECG from { ˜ x ( i ) m } M m i = 1 , we need to take care of the possible ov er- lapping issue. If ˜ x ( i ) m ov erlap with its neighboring segments ˜ x ( i + 1 ) m , we need to taper the ov erlapping regions of both segments. See Algorithm 3 for the implementation of the ta- pering step. T o simplify the notation, we use the same notation ˜ x ( i ) m to denote the tapered 14 L. SU AND H.-T . WU segment. Finally , the maECG could be estimated by ˜ x m ∈ R N , where (24) ˜ x m ( ˆ b i + j ) = M m ∑ i = 1 ˜ x ( i ) m ( j + L m + 1 ) for all i = 1 , . . . , M m and j = − L m , . . . , R m , and zero otherwise. In practice, we found that the result is stable when K m ranges from 20 to 60. The whole procedure is sketched in Algorithm 3. Algorithm 3 Non-local median for maECG extraction. [INPUT] the aECG signal x ∈ R N ; an estimated R-peak sequence ˜ B m ; L m and R m ; the number of nearest neighbors K m ; [STEP 3-0] Initialize y ∈ R N , y ( i ) = 0 where i = 1 , . . . , N [STEP 3-1] Segmenting the aECG signal according to the R peaks for i = 1 to M m do I m , i : = { ˜ b i − L m , ˜ b i − L m + 1 , . . . , ˜ b i + R m } x ( i ) : =  x ( ˜ b i − L m ) . . . x ( ˜ b i − 1 ) x ( ˜ b i ) x ( ˜ b i + 1 ) . . . x ( ˜ b i + R m )  t ; end for X : = { x ( i ) } M m i = 1 ⊂ R L m + R m + 1 for i = 1 to M m do [STEP 3-2] Select X K m ⊆ X , the K m -th nearest neighbors of x ( i ) in the L 2 distance. [STEP 3-3] T ake non-local median: ˜ x ( i ) m ( l ) = median { x ( k ) ( l ) } x ( k ) ∈ X K m , where l = 1 , . . . , L m + R m + 1. [STEP 3-4] T aper the (possible) overlap region between ˜ x ( i ) m and ˜ x ( i − 1 ) m , and between ˜ x ( i ) m and ˜ x ( i + 1 ) m o 1 : = | I m , i − 1 ∩ I m , i | , o 2 : = | I m , i ∩ I m , i + 1 | for l = 1 to L m + R m + 1 do if l ≤ o 1 then ˜ x ( i ) m ( l ) ← ˜ x ( i ) m ( l ) sin 2 ( π l 2 o i ) end if if l ≥ ( L m + R m + 1 ) − o 2 + 1 then ˜ x ( i ) m ( l ) ← ˜ x ( i ) m ( l ) cos 2 ( π ( l + o 2 − 1 − ( L m + R m + 1 )) 2 o i ) end if end for [STEP 3-5] y | I m , i ← y | I m , i + ˜ x ( i ) m end for [OUTPUT] the maECG signal y ∈ R N . 3.1.5. F inal step: get the fIHR and obtain the fECG signal. Denote ˜ x f , 0 : = x − ˜ x m to be the r ough fECG estimate. The fIHR, the fetal R peaks, and fECG could be obtained by repeating Step 1, Step 2, and Step 3 on the rough fECG estimate. Precisely , the fIHR could be obtained by running the de-shape STFT on ˜ x f , 0 and another DP curve extraction on W ˜ x f , 0 in Step 1. [Optional step 2] Before running the DP curve extraction, we could re-weight W ˜ x f , 0 around the band associated with the mIHR η m by a small constant θ f > 0 to reduce to the possible impact of the remaining maECG component. Precisely , set W ˜ x f , 0 ( n , j n ) to be θ f W ˜ x f , 0 ( n , j n ) , where n = 1 , . . . , N and j n ∈ { η m ( n ) − N ∆ , η m ( n ) − N ∆ + 1 , . . . , η m ( n ) + SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 15 N ∆ − 1 , η m ( n ) + N ∆ } . In practice, choosing θ f = 1 / 10 could slightly improve the result, and we recommend to take it into account. Denote η f ∈ R N to be the final estimated fIHR. The fetal R peak location could be further refined by running the beat tracking technique by the DP . Denote the final estimated fetal R peaks by a strictly increasing sequence ˜ B f . The fECG signal could be reconstructed by the nonlocal median algorithm, with L f , R f ∈ N chosen based on the physiological fact and K f ∈ N nearest neighbors chosen by the user . Denote the final reconstructed fECG as ˜ x f ∈ R N . In practice, we found that the result is stable when K f ranges from 20 to 60. [Optional step 3] This optional step takes the physiological constraint into account. If the a verage η f determined from the rough fECG signal is smaller than η m , the final recon- structed maECG and fECG could be exchanged to respect the ph ysiological constraint that normally the mIHR is slower then fIHR. When both the fetus and mother are healthy , this step could help when the fECG is strong than the maECG, a case which is not commonly seen. 3.2. Evaluation statistics. The R peak detection result is ev aluated by beat-to-beat com- parisons between the detected beats and the annotation provided by the experts. W e follo w the suggestion [31] and choose a matching window of 50 ms. Denote T P , F P , T N , and F N to be true positiv e, f alse positi ve, true neg ati ve, and false negativ e. W e report the sensiti vity (SE) SE = T P / ( T P + F N ) , the positiv e predictive v alue (PPV) PP V = T P / ( T P + F P ) , and the F 1 score, which is the harmonic mean of PPV and SE, F 1 = 2 T P / ( 2 T P + F N + F P ) . In addition, the mean absolute error (MAE) of estimated R peak locations is also report. T o make the ev aluation independent of the detection accuracy , we follo w the suggestion in [5] to calculate the MAE only on TP annotations. T o ev aluate the fECG morphology recovery , we consider the correlation between the estimated fECG signal and the true fECG signal, ov er the period ranging from 80 msec before the annotated R peak and 120 msec after the annotated R peak. Again, we ev aluate the correlation only on TP annotations. 4. M AT E R I A L A N D R E S U LT For a fair comparison, the parameters for the algorithm are set to be the same for all signals throughout the paper, and the optional step 2 and 3 are implemented, unless other- wise stated. The window h is chosen to be the Hamming windo w of length 5 second, the up-sampling rate α = 10, the frequency resolution in STFT and de-shape STFT is set to 0 . 02 Hz, the quefrency resolution is set to 10 msec, γ = 0 . 3 for the STCT , and υ = 10 − 4 % of the root mean square energy of the signal under analysis for the de-shape STFT . In the beat tracking, λ BT is set to 50. In the nonlocal median, we choose K m = K f = 40. 4.1. Material. W e e v aluate the proposed algorithm on three databases of aECG signals. The first database is the simulated fECG signal database ( fecgsym ) [5]. The publicly a vail- able simulator generates simultaneously the maECG and fECG signals in 34 channels, which model a number of realistic non-stationary physiological phenomena that affect the morphology and dynamics of the aECG, including different kinds of noise. A total of se ven physiological e vents are introduced in the simulator: 16 L. SU AND H.-T . WU (1) Baseline: the baseline abdominal mixture with a constant fIHR and mIHR without noise or ev ents; (2) Case 0: baseline signal contaminated by noise; (3) Case 1: Case 0 is complicated by the fetal movement noise; (4) Case 2: the fIHR and mIHR are time-varying with noise; (5) Case 3: Case 2 is complicated by the uterine contraction noise; (6) Case 4: Case 2 is complicated by ectopic beats in both fECG and maECG; (7) Case 5: twin pregnancy contaminated by noise. The simulator also generates five dif ferent levels of additiv e noise, ranging from 0, 3, 6, 9, to 12 dB. Each physiological ev ent and noise le vel were simulated independently fiv e rounds to mimic the realistic situation. A generated benchmark is av ailable in https: //physionet.org/physiobank/database/fecgsyndb/ [29], which contains ten sub- jects, fi ve rounds, and fiv e SNR’ s for each case. Each simulation is of 5-minute long and is discretized at the sampling rate 250Hz. For each subject, case, round, and SNR, there are one maECG, fECG, two noise realizations, and one extra uterine contraction noise in Case 3 in the database. W e sum all these time series together to get the simulated signal for analysis. W e test our algorithm on all Cases, all lev els of noise, and all five simulations, except Case 5. W e consider the twin pregnanc y case as an independent project, and the result will be reported in the other work. The second database is the PhysioNet non-in vasi ve fECG database ( adfecgdb ), where the aECG signals with the annotation provided by experts is publicly available https:// www.physionet.org/physiobank/database/adfecgdb/ [29]. The annotation is de- termined from the direct fECG recorded from the fetal scalp lead. There are five pregnant women between 38 to 40 weeks of pregnancy in this database, each has 4 aECG channels and one direct fECG signal. The signal is of 5-minute long, and is digitalized at 1000Hz with 16bit resolution. More details could be found in https://www.physionet.org/ physiobank/database/adfecgdb/ . The third database is the 2013 PhysioNet/Computing in Cardiology Challenge ( https: //physionet.org/challenge/2013/#data- sets ), abbre viated as CinC2013, which is composed of three sets, learning (training) set A, open test set B, and hidden test set C, where only set A comes with the annotation. W e thus used set A for an assessment of our proposed algorithm. There are 75 subjects in set A, and for each subject four aECG channels of length 1-minute long sampled at 1000 Hz are pro vided. More details about this database could be found in https://physionet.org/challenge/2013/#data- sets . 4.2. Results of the simulation: fecgsym database. In this simulated database, since we a priori know that the noise could be as big as 0 dB, so we choose a longer window of length 10 s. The other parameters are fixed for all the cases. Follo wing the recommended report method in [5], T ables 1 and 2 giv e an ov ervie w of the performance of the proposed algorithm. In addition to reporting the 1min result, we also report the whole 5min result. For the 1min (respectively 5min) result, for each subject, noise lev el, case, and round, we find the channel and the 1-minute subset out of the 5-minute signal (respectively the channel and the whole 5-minute signal) that giv es the highest F 1 . For each noise level and case, the median and interquartile (IQR) of those highest F 1 values, and the associated MAE of all subjects and rounds are sho wn in T ables 1. T o better understand the performance of the proposed algorithm, the whole gross statistics with the tenth highest F 1 is reported in T able 2. From these tw o tables, we could see that as SNR decreases, the performance decreases, which is consistent with the results of different TS techniques reported in [5, Figure 4]. It SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 17 is also clear that the overall accuracy of the proposed algorithm is higher than those of TS techniques and adaptive filtering techniques reported in [5, T able 3]. W e also notice that the IQR in our report is higher than those of different TS techniques reported in [5, T able 3]. This is caused by the fact that we do not use the ground truth maternal R peak information but estimate it from the aECG signal. 2 W e should notice that when SNR decreases, the lost information could not be recovered by only one channel signal. Clearly , in Case 3 where the signal is contaminated by the uterine contraction noise, the results on the 5-minute long signal is much lower . This comes from the fact that the energy of the uterine contraction is much larger than both the maECG and fECG, and almost no information could be extracted when the uterine contraction happens. Note that except Case 4, the MAE outperforms the reported results in [5, T able 3]. This indicates the strength of beat tracking and nonlocal median. In Case 4, although the MAE is larger than the other cases, it is still on the same lev el of that reported in [5, T able 3]. For the computational time o ver the 5-minute signal, it takes about 12 seconds to finish a round in MacBook Pro (Retina, 15-inch, Mid 2014) with Processor 2.5GHz Intel Core i7, Memory 16 GB 1600MHz DDR3, OS X Y osemite (V ersion 10.10.5), and Matlab R2014b without implementing the parallel computation. The fECG morphology estimation results are shown in T able 3 and Figure 2. The high correlation between the estimated fECG and the ground truth indicates that the nonlocal median algorithm leads to an estimated fECG morphology with low distortion. Note that in this ev aluation, we only ev aluate the correlation on the correctly detected fECG beats (true positive fECG beats). This suggests that if we could accurately estimate the fetal R peaks (the higher F 1 score), then we could hav e an accurate fECG reconstruction. T o look deeper into the algorithm and its performance, we take subject 1, round 1, case 4, and channel 21 into account, and show the result without noise in Figure 2. In this example, over the 5-minute, the F 1 is 1 and the MAE is 0.78 msec. In addition to the de-shape STFT of the maECG signal, the estimated R peaks by the beat tracking, the decomposed maternal ECG, the rough fECG estimate x f , 0 and its TFR, and the final fECG estimate are shown. W e see that by the de-shape STFT , the information of non-sinusoidal oscillation; that is, the time-v arying wav e-shape function, and the IHR information are decoupled, and only the IHR information is shown in the TF representation. Note that we could see both the mIHR and fIHR in the TFR, and as expected, fIHR has a weaker intensity than mIHR, meaning that the energy of fECG is smaller . The intensity level could be seen in the colorbar . For a comparison, we could see that in the STFT of aECG, both the mIHR and fIHR could be seen, while the fIHR is much weaker compared with that in the de-shape STFT . Furthermore, the multiples of the maternal fundamental component could mask both the mIHR and fIHR information and even interfere with each other . T o show how the fECG could be reconstructed, the estimated fECG and the ground truth fECG signal are put side by side for a comparison. On the other hand, we could see that the nonlocal nature of the nonlocal median algorithm does help us to recover ECG morphology , ev en the ectopic beats, and the median and IQR of the correlation of all TP beats are 0 . 997 and 0 . 004. T o show the result when noise exists, we take another signal, subject 3, round 1, case 4, and channel 23 into account, and with SNR 6dB, as an example, and show the result in Figure 3. W e choose this signal as an example since it has a smaller fECG amplitude. As 2 Note that the purpose of [5] is comparing different methods, ranging from different BSS algorithms to single-lead TS algorithms, instead of evaluati ng simply a specific TS algorithm, but in this paper we only focus on ev aluating our proposed algorithm. 18 L. SU AND H.-T . WU can be clearly seen, e ven when the noise is 6dB, the de-shape STFT still giv es a reasonable TFR with the IHR information, although there are sev eral “speckles” in the background, which come from the noise. In this example, since fECG is smaller compared with the maECG, the fIHR could not be clearly seen in the de-shape STFT of the aECG. Howe ver , it could be seen clearly in the de-shape STFT of the rough fECG estimate. The intensity lev el could be seen in the colorbar . Also note that since the fECG is smaller and the noise is big, and no denoise technique is combined into the algorithm, the de-shape STFT of the rough fECG estimate has a “scattered” background. For the morphology reconstruction, due to the large noise, the nonlocal median failed to recover two ectopic beats, as are indicated in the red arrows. Note that although the nonlocal nature of the nonlocal median has the power to handle ectopic beats, it is not designed for this purpose. T o have a better recov ery of the ectopic beats, more features specifically designed for ectopic beats should be taken into account. In this case, the F 1 is 0.996, the MAE is 3.476 msec, and the median and IQR of correlations of all TP beats are 0 . 968 and 0 . 034. W e could see that the nonlocal median provides a con vincing potential in recov ering the fECG morphology , specially when noise exists. 4.3. Results of the first r eal database: adfecgdb database. T able 4 sho ws the F 1 , MAE, PPV and SE of all channels and all subjects. Notice that in the r10 record, the direct fECG measurement was lost between 187 and 191 s, and between 203 and 211 s, so these two segments were neglected in the ev aluation. T o a void the boundary ef fect introduced by the window function, the first and last 0.5 second in ev ery recording are not ev aluated neither . In terms of F 1 , compared with the state-of-art result reported in, for example, [12], our result is overall better . The MAE, which is less reported in the literature, is as small as 10 msec, which indicates the potential to carry out the fetal HR V analysis from the single- lead aECG, although this topic is out of the scope of this paper . Howe ver , as compared with the simulated database, the MAE is larger . Note that this larger MAE is reasonable, since in this database, we use the R peaks of the direct fECG as our ground truth, but the direct fECG has a different projection direction compared with the fECG recorded from the maternal abdomen. This difference leads to the slightly lar ger MAE. T o further e xplore the proposed algorithm, in Figure 4, we sho w the result of STFT and de-shape STFT of the third channel of the case r07, denoted as x 11 , as well as the the result of STFT and de-shape STFT of the rough fECG estimate from x 11 , denoted as x 11 , f 0 (the signal used in the flowchart in Figure 1 is the channel 1 of the case r01). Clearly , in the STFT of x 11 , we not only see the fundamental IF of maECG around 1 . 2 Hz, but also its multiples, due to the nature of non-sinusoidal oscillation of the ECG signal. W e could also see a relati vely v ague component around 2 Hz in STFT , which turns out to be the fIHR. On the other hand, after the de-shape process, in the de-shape STFT of x 11 , only two dominant curves associated with the mIHR and fIHR are left. An example of the estimated fECG wav eform, x 11 , f 0 , is shown in Figure 5. The R peaks are clearly well reconstructed and match those in the direct fECG signal recorded from the fetal scalp. By comparing the rough fECG estimate and the final fECG estimate, we could see the ef fectiveness of nonlocal median. Furthermore, some fiducial points could be observed, as is shown in Figure 6, which is the zoomed in of Figure 5. Although not all critical fiducial points could be extracted, this result indicates the potential of studying the morphology of fECG, particularly taking into account the fact that we only count on a single-lead aECG signal. W e mention that in some cases the P wa ve and T wav e could be reconstructed, but in general they are buried in the noise, so we do not consider it SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 19 as an achievement of the proposed algorithm, and more work is needed to recover these morphological features. 4.4. Results of the second real database: CinC2013 database. In this database, since the signal is of only 1-minute long, we take K m = K f = 10 to enhance the local nature of the nonlocal median algorithm. In practice, the result is slightly worse if K m = K f = 40. Here we follow the report suggested in [8] to report the mean in addition to the median of all subjects. For each subject, we choose the channel with the highest F 1 score as the channel for this subject, and report the gross statistics of the obtained F 1 , PPV , SE, and MAE over 75 subjects. Over 75 subjects, the mean and standard deviation of the F 1 score (respectiv ely PPV and SE) are 86 . 37 and 22 . 9% (respectiv ely 85.77% and 23.42%, and 87.5% and 22.05%), and the mean and standard deviation of MAE are 6 . 12 msec and 5 . 56 msec. 3 Our F 1 result is compatible with the reported results in [28, 43, 24, 8, 6], while the MAE is better . On the other hand, ov er 75 subjects, the median and IQR of the F 1 score (respectiv ely PPV and SE) are 98.28% and 14% (respectively 98.25% and 14.72%, and 98.59% and 22.05%), and the median and IQR of MAE are 3 . 81 msec and 5 . 74 msec. The discrepancy between the mean and median indicates the existence of outliers in the database. W e thus take a closer look into the database. W e found that the fECG signal could be hardly visualized in 10 subjects in the database, including a27, a32, a43, a50, a59, a60, a63, a64, a68, and a75, even when the signal is clean, and these cases are considered difficult for our algorithm. If we remov e these cases, the mean and standard de viation of the F 1 score (respecti vely PPV and SE) become 94 . 54% and 11 . 63% (respecti vely 93.87% and 13.05%, and 95.78% and 8.79%), and the mean and standard deviation of MAE be- come 4 . 35 msec and 3 . 04 msec. On the other hand, the median and IQR of the F 1 score (respectiv ely PPV and SE) become 98 . 84% and 4 . 91% (respectively 98.53% and 6%, and 99.21% and 4.16%), and the mean and standard deviation of MAE become 3 . 35 msec and 4 . 33 msec. T o hav e a deeper look into the difficulty , we sho w one example, a59, which is considered difficult case for our algorithm, in Figure 7. In Figure 7, the second channel of a59 is shown for an illustration. It is clear that the signal is quite clean, and it is not easy to identify if the fECG exists, even with the help of the provided annotation. Not surprisingly , the fetal R peaks are all detected incorrectly . By a direct visual inspection, we could see that those locations that are erratically identified as fetal R peaks are nothing but the residue coming from the incomplete maECG remov al. This fact could also be observed in the de-shape STFT of the rough fECG estimate – the only “dominant component” in the de-shape STFT coincides with that of the aECG, as is indicated by the red arrow . W e mention that the other three channels all share the same result – the fECG is too small to be e ven sensed from this relativ ely clean aECG signal. Howe ver , if we take the VCG notion into account, then it is possible to enhance the result. Precisely , by linearly combining different channels, we hav e a chance to obtain a stronger fECG signal relativ e to the maECG signal, and hence the result could be better . This idea is shown in 8, where the dif ference between channel 2 and channel 3 is analyzed. Clearly , we could see that no w the fetal R peaks could be almost perfectly recov ered. In this specific case, the F 1 is 99 . 3% and the MAE is 0 . 993 msec. W e mention that if we do the same linear combination trick between channel 1 and channel 3 (respectiv ely channel 2 and channel 3, channel 2 and channel 3, channel 1 and channel 2, 3 If we follow [8] and remove 7 cases, including a33, a38, a47, a52, a54, a71, and a74, then the mean and standard deviation of F 1 score (respectiv ely PPV and SE) are 88 . 73% and 20 . 96% (respectiv ely 88 . 39% and 21 . 08%, and 89 . 12% and 20 . 77%), and the mean and standard deviation of MAE o ver 75 subjects are 5 . 88 msec and 5 . 14 msec. 20 L. SU AND H.-T . WU channel 1 and channel 2, channel 2 and channel 3, channel 2 and channel 3, channel 2 and channel 3, and channel 1 and channel 2), the F 1 ’ s of a27 (respectively a32, a43, a50, a60, a63, a64, a68, and a75) become 79.3% (respectively 100%, 100%, 97.9%, 84.7%, 91%, 98.5%, 91.4%, and 86.2%). While this is a nai ve way to handle the problem and it works well in this preliminary analysis, howe ver , we need at least two channels. Since we focus on the single-lead aECG analysis in this paper, this direction will be explored in the other work. W e mention that this idea is also considered in [6]. 5. D I S C U S S I O N A N D F U T U R E W O R K In this section, we discuss the dif ficulty of the fECG e xtraction problem from the signal processing viewpoint, different findings of the proposed algorithm, and sev eral possible applications and future works. 5.1. General technical difficulty . In the analysis of multi-component oscillatory signals, there are two chicken-and-egg problems of fundamental importance. The first is the de- tection pr oblem ; that is, ho w to determine the number of components and ho w to find the fundamental frequency or equiv alently the fundamental period of each component? The second one is the separation pr oblem ; that is, how to separate all components from a recorded signal? These two problems coexist in many kinds of data, ranging from physio- logical signals to polyphonic music signals, where each component in the mixture contains information different from others. Previously , these tw o problems are usually discussed separately , probably because only discussing one of them is challenging enough. Howe ver , recently some research works start to consider these two problems as a single one by viewing the separation problem as an e xtension of the detection problem. For e xample, we could simultaneously estimate the IF of each component, and then extract the wa ve-shape function as well as each component. The methods could be classified into two classes, iterative or joint . The iterativ e ap- proach extracts the most prominent IF/component in each iteration, until no additional IF/component can be found. Although iterativ e models are usually computationally inex- pensiv e, they have a main drawback: iterativ e models tend to accumulate errors at each iteration step if the feature representation is not robust enough. T o handle this limitation, we could consider the “joint” approach. V ast majority of recent approaches in multi-pitch estimation and source separation in music now falls within the “joint” category [10], and more and more studies on source separation started to consider pitch information to im- prov e the source separation algorithms [14, 34]. Note that although it is intuitive to utilize the IF information to handle the detection and separation problem, this kind of approach has been less studied until recent years, probably because the task of finding IF is by no means easy , especially when there are multiple components. Compared with the iterative method, the joint methods lead to more accurate estimates but with more in volv ed mathe- matical tool and increased computational cost. Our proposed algorithm falls in the “iterative” category – estimate and remove the ma- ternal component first, and get the fIHR and fECG from the left. W e do have the accumu- lated error issue when we estimate the fECG, and we count on the median filter to alle viate this error . One natural question is to ask if it is possible to generate a single-lead fECG al- gorithm in the joint category , in order to alleviate this problem, and the answer is positiv e. Precisely , the mIHR and fIHR could be estimated simultaneously by the de-shape STFT , as is shown in Figure 4. The estimated R peaks of each component could then be applied to estimate the maECG and fECG by the nonlocal median algorithm. In brief, we could consider an algorithm falling in the joint category , which is composed of two steps: (1) SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 21 jointly estimate the mIHR and the fIHR by the de-shape STFT ; that is, carry out Optional step 1 in the algorithm; (2) jointly approximate the maECG and fECG signal by the non- local mean. Howe ver , a preliminary study showed that without a more sophisticated curve extraction algorithm, this approach does not outperform what we propose in the current paper . Its potential will be explored in the future work. 5.2. Several theoretical and algorithmic topics. W e should discuss more about each step in the proposed algorithm. First of all, notice that the proposed algorithm is specifically designed to handle a nonstationary signal, which is commonly encountered in physiology . The de-shape STFT respects the local information hidden in a nonstationary time series, and decouple the non-sinusoidal oscillatory pattern from the IF; the nonlocal mean, while it is not local, takes the nonlinear relationship between the phase and the oscillation (the nonlinear relationship between the RR interval and QT interval) into account. Thus, the proposed algorithm could be applied to analyze a long time series without modification, and the hard truncation of the long time series into pieces is not needed. W e hav e seen the potential of e xtracting more information hidden inside the single-lead aECG signal by the ANH model and the de-shape STFT . In the past few years, different ideas in the TF analysis field have been proposed, regarding the model and the algorithm, and it is possible to incorporate them into the current algorithm to achieve a better practi- cal result. For example, the Blaschke decomposition and unwinding based on the analytic signal analysis technique [21], the synchrosqueezing transform, or the concentration of frequency and time (ConceFT) [22]. Howev er, to simplify the discussion, we do not carry out this combination in this paper . While de-shape STFT works well, we cannot ignore a major limitation in the curve extraction step for the IHR estimation. Precisely , when the energy of the fECG signal is greater then that of the maECG, then the curve extraction al- gorithm could easily go astray; that is, when the fECG has an almost equal or lar ger ener gy than the maECG, we could see two dominant curv es, which may or may not intersect, and the curve extraction algorithm could get confused. While we could apply the ph ysiological constraint to distinguish which curve belongs to the maECG, for example, the fIHR is in general faster than mIHR (Optional step 3), it is not univ ersal, and will limit the method to normal subjects. W e thus need more tools to handle this issue. Next, we mention that the ability to obtain an accurate IHR estimation is the key step to an accurate R peak location estimation by the beat tracking algorithm. The technical limitation in obtaining a good IF information might explain why the intuiti ve and po werful beat tracking technique is not widely applied up to now , although it was introduced in the music signal analysis field long time ago [26]. In sum, the combination of the de-shape STFT , or any other technique providing an accurate IF estimation, is essential. Also note that this combination could of fer an alternati ve way to estimate the R peaks or other fiducial points, and ev en to other signal processing problems. Furthermore, to the best of our knowledge, this work is the first one combining the nonlocal median and the nonlinear manifold model to analyze the biomedical time-series analysis, and particularly reconstructing the time-varying wave-shape function. T aking the nonlinear manifold structure to analyze a time series is certainly not a ne w idea [53, 37, 40]. Howe ver , in the past, the focus was on decoupling maECG and fECG by the locally linear projection on the sequential beats. In this work, the nonlinear geometric structure is further explored to apply the nonlocal median. In general, we could apply other manifold learning techniques to further improve the algorithm. From a data analysis viewpoint, it is well known that a manifold structure might still be too restrictiv e to be the best model for this dataset, while it could be low dimensional and nonlinear with nontri vial 22 L. SU AND H.-T . WU topological structures, and could serve as a good mathematical framework to understand different algorithms. Thus, finding a more flexible model to include more physiological facts is a critical issue in the future study . While the L 1 norm is associated with the nonlocal median algorithm, to enhance the sparsity feature, it is possible to consider the L p norm, where 0 < p < 1, which has been applied in the recently proposed nonlocal patch re gression for image denoising [16]. W e have extensi vely extracted av ailable information hidden in the single-lead aECG signal. When there are more than one channel, it could be beneficial to combine results from different channels, or to inte grate the proposed algorithm with existing multiple- lead algorithms to further improv e the accuracy . The benefit of doing so has been shown with the case a59 in the CinC2013 database in Figure 8. Another direct benefit of using multiple-lead algorithms is guiding the mIHR estimation by the curve extraction, when the fECG has a stronger energy than the maECG. In practice, we could also e xpect to combine other kinds of signals, like the contact photoplethysmogram signal commonly equipped in modern wearable de vices, to guide the mIHR estimation. The result of this kind of combination and the analysis of multiple channel signals will be reported in the future work. 5.3. Several practical topics. In this study , we do not carry out an y optimization to choose the best parameters, but just choose them in an ad hoc fashion. For example, due to the inevitable Heisenberg uncertainty principal, in general the window should be long enough to capture enough spectrum information, b ut it could not be too long, otherwise the local information could be missed. In practice we found that a window that could cover about 5-10 oscillations is a good choice. On the other hand, when the signal is noisy , we could get more stable results if the window is longer . Based on the ph ysiological constraint and the discussion, we thus choose a 10 sec long windo w for the very noisy simulated sig- nal, and a 5 sec window for the less noisy real databases. W e mention that the algorithm in general is stable to the choice of parameters, and the performance of the algorithm could be further impro ved by running a systematic parameter optimization for a specific mission. The proposed fECG extraction algorithm is just one among man y successful algorithms. Each algorithm has its own strength and weakness, and it is generally believed that there might not exist a single algorithm that could handle all different situations and dif ferent signals. Thus, it would be of great interest to consider the possibility of combining the proposed algorithm with other methods, like the combination proposed in [8]. More prac- tically , while in the real life the noise could be highly nonstationary , we may want to combine the notion of signal quality index (SQI) to help guide us in selecting the “good” signal for analysis [7, 47, 27]. In general, if the noise is really too big and bypass the infor- mation limitation, then there is no hope to recover anything. Howe ver , if the noise is not that big, then some denoising techniques could certainly help. In this work, to simplify the discussion, we do not consider any denoising technique before running de-shape STFT or beat tracking. Ho wev er , it could be of great help if we could incorporate a good de-noise scheme into the algorithm. All these interesting practical issues will be explored in the near future. 5.4. Several clinical topics. In clinics, the fECG signal could provide at least two differ - ent kinds of physiological information – the HR V and the electrophysiological dynamics. For the HR V analysis, the most important ingredient is ha ving an accurate R peak detection algorithm. One of the main strength of the propose algorithm is providing the state-of-the- art MAE in the field. For example, in the real database, the av eraged MAE is about 5 SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 23 msec, which is roughly equi valent to the R peak information from an fECG signal sampled at about 200Hz, and its clinical value does deserv e a further e valuation. Howe ver , note that when we e valuate the MAE, we focus only on the TP beats, which means that we could only get a good HR V analysis when F 1 is high enough. See Figure 9 for an illustration of the relationship between the HR V analysis and the F 1 . This figure is generated in the following w ay . For each case in CinC2013 (75 in total), we ev aluate the root mean square error (RMSE) to quantify how accurate we could recover the fIHR for the HR V analysis. Based on the T ask Force [61], we get the fIHR’ s from the estimated fetal R peaks and from the annotated fetal R peaks, and RMSE between these two fIHR is e valuated. Note that we take all estimated fetal R peaks to simulate the real scenario. W e could see that when F 1 is slightly worse than 95%, the RMSE is as high as 0.1 beats/sec. This preliminary result emphasizes the importance of getting an accurate F 1 . An extensiv e study of this important topic will be reported in the future work. For the fECG morphology reconstruction, although the result is con vincing, note that we could not infer too much electrophysiological information from the single-lead fECG signal, ev en if the reconstruction is perfect. Indeed, due to the variation of relati ve cardiac axes between the mother and fetus, the projection direction of fECG is in general unknown, ev en if the abdominal lead system on the abdominal surface is standardized. T o explore this direction, we need to design or choose the optimal single abdominal lead location under the complicated time-varying dynamics, which is adaptive to the mother , or we may take multiple leads and proceed with an adaptiv e way to reconstruct the VCG signal. Uterine contraction is in general a dif ficult problem and big challenge in the fECG signal analysis, since it behaves like a huge noise in the aECG signal. In general, to handle such a huge noise, some a priori knowledge of the noise is needed. While the current method could provide a reasonable result in Case 3 of the simulated database, it does not hav e the ability to handle the uterine contraction, and that is why we always get an F 1 less then 80% in the 5 min result. While it is out of the scope of this paper, we mention that one possible approach is to incorporate the physiological behaviors underlying cardiac signals, a realistic model of an individual fetus’ ECG, or the statistical behavior of the noise associated with the uterine contraction to improv e the result [19, 57, 59]. It is not a reasonable assumption that the mother or the fetus are both healthy in the real life. Sometimes we might encounter subjects with a problematic heart. In Case 4 of the simulated database, the situation when the fetus and mother both hav e frequent ec- topic beats, lik e ventricular pre-contraction (VPC), is considered. W e see that the nonlocal median algorithm gives us a reasonable cardiac activity estimation. Indeed, the distance between an ectopic beat and a normal beat is in general larger than the distance between two ectopic beats and two normal betas. Therefore, the nonlocal nature of the nonlocal me- dian algorithm could accurately estimate the cardiac activity , since VPC and normal beats are clustered into two groups. Note that if VPC is polymorphic and/or the VPC number is small, the problem becomes more challenging and a different approach is needed. The twin pre gnancy problem was relativ ely rarely discussed, except in [62, 50]. In Case 5 of the simulated database, although we do not specifically study in this paper , we mention that the proposed algorithm has the potential to handle ev en multiple pregnancy problem. Indeed, the mIHR and fIHR’ s of different fetuses could be simultaneously revealed in the de-shape STFT . Ho wever , it is still challenging to utilize this single channel information at this stage, since there is no a priori information about the relationship between the IHR’ s and signal strengths of those fetuses. This limits the curve extraction step in the de-shape STFT , as the IHR’ s and ener gy of two fetus might be similar . Thus, although we could see 24 L. SU AND H.-T . WU all the fIHR’ s, to extract the fIHR from the TF representation based on the current curve extraction algorithm is challenging, and we need a more sophisticated curve extraction algorithm to do it. The study of the above-mentioned clinical topics will be explored in the future work. 5.5. Limitation. The discussion could not be complete without mentioning the limitation. While the strength of the proposed algorithm is confirmed on one simulation and two real databases, a larger scale clinical study is needed to further confirm the usefulness of the proposed algorithm since the size of the publicly av ailable databases, adfecgdb and CinC2013 , is small. Second, although fecgsyn database provides a univ ersal comparison platform with many interesting examples, the model at this stage is still oversimplified, and that might explain the high accuracy of our proposed algorithm. Many other limitations hav e been well discussed in [5]. A better “ground truth” should be considered for the ev aluation purpose. For example, a well established sheep model [25] could provide a gold standard test bed for the algorithm. 6. C O N C L U S I O N In this paper, we propose a novel fECG extraction algorithm depending mainly on a single-lead aECG based on a nonlinear TF analysis technique, de-shape STFT , and the nonlocal median. In addition to providing the theoretical model and mathematical guar- antee, the algorithm is ev aluated on a simulated fECG database and tested on two real databases with experts’ annotation. The main novelty in our algorithm is an extensi ve utilization of hidden information inside the aECG signal, that is, both the frequency and energy information and the nonlinear relationship between consecutiv e cardiac activities. In addition to solving clinical problems, we mention that the combination of the de-shape STFT and nonlocal median has a potential to deal with more general multi-component oscillatory signals in other fields. 7. A C K N O W L E D G E M E N T Hau-tieng W u’ s research is partially supported by Sloan Research Fellow FR-2015- 65363. Part of this work was done during Hau-tieng W u’ s visit to National Center for Theoretical Sciences, T aiwan, and he would like to thank NCTS for its hospitality . Hau- tieng W u acknowledges Professor Ingrid Daubechies for the continuous discussion about various topics and Professor Martin Frasch for the discussion about clinical needs. R E F E R E N C E S 1. H. Akbari, M. B. Shamsollahi, and R. Phlypo, F etal ECG Extraction using π T uck er Decomposition , Interna- tional Conference on Systems, Signals and Image Processing, IEEE, 2015. 2. M. Akhbari, M. Niknazar, C. Jutten, M. B. Shamsollahi, and B. Rivet, F etal Electrocar diogram R-peak Detection using Robust T ensor Decomposition and Extended Kalman F iltering , Computing in Cardiology (2013), 189–192. 3. M. AlGhatrif and J. Lindsay , A brief r eview: history to understand fundamentals of electrocar diography , J Community Hosp Intern Med Perspect. 2 (2012), no. 1, 10.3402. 4. R. Almeida, H. Gonc ¸ alves, J. Bernardes, and A. P . Rocha, F etal QRS detection and heart rate estimation: a wavelet-based appr oach. , Physiol. Meas. 35 (2014), no. 8, 1723–35. 5. F . Andreotti, J. Behar , S. Zaunseder, J. Oster , and G. D. Clifford, An open-sour ce framework for str ess-testing non-in vasive foetal ECG extraction algorithms. , Physiol. Meas. 37 (2016), no. 5, 627–648. 6. F . Andreotti, M. Riedl, T . Himmelsbach, D. W edekind, N. W essel, H. Stepan, C. Schmieder, A. Jank, H. Mal- berg, and S. Zaunseder , Rob ust fetal ECG e xtraction and detection fr om abdominal leads. , Physiol. Meas. 35 (2014), no. 8, 1551–67. SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 25 7. J. Behar , A. Johnson, G. D. Clifford, and J. Oster, A comparison of single channel fetal ECG extr action methods , Annals of Biomedical Engineering 42 (2014), no. 6, 1340–1353. 8. J. Behar , J. Oster , and G. D. Clifford, Combining and Benchmarking Methods of F oetal ECG Extraction W ithout Maternal or Scalp Electrode Data , Ph ysiol. Meas. 35 (2014), no. 8, 1569–1589. 9. M. A. Belfort, G. R. Saade, and et. al., A randomized trial of intr apartum fetal ecg st-segment analysis , Ne w England Journal of Medicine 373 (2015), no. 7, 632–641. 10. E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff, and A. Klapuri, Automatic music transcription: chal- lenges and futur e directions , J. Intelligent Information Systems 41 (2013), no. 3, 407–434. 11. A. Buades, B. Coll, and J. M. Morel, A r eview of image denoising algorithms, with a new one , Multiscale Model. Simul. (2005), no. 4, 490–530. 12. E. Castillo, D. P . Morales, G. Botella, A. Garcia, L. Parrilla, and A. J. Palma, Efficient wavelet-based ECG pr ocessing for single-lead FHR extraction , Digital Signal Processing 23 (2013), no. 6, 1897–1909. 13. S. Cerutti, G. B. Baselli, S. Civ ardi, E. Ferrazzi, A. M. Marconi, M. Pagani, and G. Pardi, V ariability analysis of fetal heart rate signals as obtained from abdominal electr ocardiogr aphic recor dings , Journal of Perinatal Medicine 14 (1986), no. 6, 445–452. 14. T .-S. Chan, T .-C. Y eh, Z.-C. Fan, H.-W . Chen, L. Su, Y .-H. Y ang, and R. Jang, V ocal activity informed singing voice separation with the ikala dataset , IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, pp. 718–722. 15. K. N. Chaudhury and A. Singer, Non-local euclidean medians , IEEE Signal Processing Letters 19 (2012), no. 11, 745–748. 16. , Non-local patch re gression: Robust image denoising in patch space , ICASSP , IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (2013), 1345–1349. 17. Y .-C. Chen, M.-Y . Cheng, and H.-T . W u, Nonparametric and adaptive modeling of dynamic seasonality and tr end with heter oscedastic and dependent error s , J. Roy . Stat. Soc. B 76 (2014), 651–682. 18. I. Christov , I. Simov a, and R. Ab ¨ acherli, Extraction of the fetal ECG in nonin vasive r ecordings by signal decompositions. , Physiol. Meas. 35 (2014), 1713–21. 19. G. D. Clifford, A. Shoeb, P . E. McSharry , and B. A. Janz, Model-based filtering, compression and classifi- cation of the ECG , Proceedings of Bioelectromagnetism and 5th International Symposium on Noninv asive Functional Source Imaging within the Human Brain and Heart (BEM & NFSI) (2005), 158–161. 20. G. D. Clifford, I. Silva, J. Behar, and G. B. Moody , Non-invasive fetal ECG analysis , Physiol. Meas. 35 (2014), no. 8, 1521. 21. R. R. Coifman and S. Steinerberger , Nonlinear phase unwinding of functions , submitted (2015). 22. I. Daubechies, Y . W ang, and H.-T . W u, ConceFT : Concentration of fr equency and time via a multitaper ed synchr osqueezing transform , Philosophical T ransactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 374 (2016), no. 2065. 23. L. De Lathauwer, B. De Moor, and J. V andewalle, F etal Electrocar diogram Extraction by Blind Source Subspace Separation , IEEE T rans. Biomed. Eng. 47 (2000), no. 5, 567–572. 24. C. Di Maria, C. Liu, D. Zheng, A. Murray , and P . Langley , Extracting fetal heart beats from maternal abdominal recor dings: selection of the optimal principal components. , Physiol. Meas. 35 (2014), no. 8, 1649–64. 25. L. D. Durosier , G. Green, I. Batkin, A. J. Seely , M. G. Ross, B. S. Richardson, and M. G. Frasch, Sampling rate of heart rate variability impacts the ability to detect acidemia in ovine fetuses near-term , Frontiers in Pediatrics 2 (2014), 38. 26. D. P . W . Ellis, Beat trac king by dynamic progr amming , Journal of New Music Research 36 (2007), no. 1, 51–60. 27. N. Gambarotta, F . Aletti, G. Baselli, and M. Ferrario, A revie w of methods for the signal quality assessment to impr ove r eliability of heart rate and blood pressur es derived parameters , Med. Biol. Eng. Comput. (2016). 28. A. Ghaffari, M. J. Mollakazemi, S. A. Atyabi, and M. Niknazar, Robust fetal QRS detection fr om noninva- sive abdominal electrocar diogram based on channel selection and simultaneous multichannel pr ocessing , Australasian Physical and Engineering Sciences in Medicine 38 (2015), no. 4, 581–592. 29. A.L. Goldberger , L.A.N. Amaral, L. Glass, J.M. Hausdorff, P .Ch. Ivanov , R.G. Mark, J.E. Mietus, G.B. Moody , C.-K. Peng, and H.E. Stanley , Physiobank, physiotoolkit, and physionet: Components of a new r esear ch resour ce for complex physiologic signals. , Circulation 101 (2000), no. 23, e215–e220. 30. D. Graupe, M. H. Graupe, Y . Zhong, and R. K. Jackson, Blind adaptive filtering for non-in vasive extr ac- tion of the fetal electrocar diogram and its non-stationarities , Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine 222 (2008), no. 8, 1221–1234. 26 L. SU AND H.-T . WU 31. J. F . Guerrero-Martinez, M. Martinez-Sober, M. Bataller -Mompean, and J. R. Magdalena-Benedito, New al- gorithm for fetal qr s detection in surface abdominal r ecords , 2006 Computers in Cardiology , 2006, pp. 441– 444. 32. M. Haghpanahi and D. A. Borkholder , F etal ECG Extraction F rom Abdominal Recordings using Array Signal Pr ocessing , Computing in Cardiology 40 (2013), 173–176. 33. L. Hormander, The analysis of linear partial dif ferential operator s I , Springer , 1990. 34. Y . Ikemiya, K. Y oshii, and K. Itoyama, Mir ex2014: Singing voice separation , The 10th Annual Music Infor - mation Retriev al Evaluation eXchange (MIREX), 2014. 35. H. M. Jenkins, Thirty years of electronic intrapartum fetal heart rate monitoring: discussion paper , Journal of the Royal Society of Medicine 82 (1989), no. 4, 210–4. 36. P . P . Kanjilal, S. Palit, and G. Saha, F etal ECG extraction fr om single-channel maternal ECG using singular value decomposition , IEEE T rans. Biomed. Eng. 44 (1997), no. 1, 51–59. 37. E. C. Karvounis and M. G. Tsipouras, Detection of F etal Heart Rate Through 3-D Phase Recordings , IEEE T rans. Biomed. Eng. 56 (2009), no. 5, 1394–1406. 38. E. C. Karvounis, M. G. Tsipouras, D. I. F otiadis, and K. K. Naka, An automated methodology for fetal heart rate extraction from the abdominal electrocar diogram , IEEE Transactions on Information T echnology in Biomedicine 11 (2007), no. 6, 628–638. 39. A. H. Khamene and S. Negahdaripoure, A new method for the extraction of fetal ECG from the composite abdominal signal , IEEE T rans. Biomed. Eng. 47 (2000), no. 4, 507–516. 40. M. Kotas, J. Jezewski, A. Matonia, and T . Kupka, T owar ds noise immune detection of fetal QRS complexes , Computer Methods and Programs in Biomedicine 97 (2010), no. 3, 241–256. 41. P . Laguna and L. S ¨ ornmo, Sampling rate and the estimation of ensemble variability for r epetitive signals. , Medical & biological engineering & computing 38 (2000), no. 5, 540–6. 42. G. Lamesgin, Y . Kassaw , and D. Assefa, Extraction of F etal ECG fr om Abdominal ECG and Heart Rate V ariability Analysis , Advances in Intelligent Systems and Computing 334 (2015), 147–161. 43. K. Lee and B. Lee, Sequential T otal V ariation Denoising for the Extraction of F etal ECG from Single-Channel Maternal Abdominal ECG , Sensors 16 (2016), no. 7, 1020. 44. C.-Y . Lin, S. Li, and H.-T . W u, W ave-shape function analysis–when cepstrum meets time-fr equency analysis , arXiv preprint arXi v:1605.01805 (2016). 45. C.-Y . Lin, A. Minasian, and H.-T . Wu, V ector non-local euclidean median and diffusion on the orbiford , submitted (2016). 46. J. A. Lipponen and M. P . T arvainen, Advanced Maternal ECG Removal and Noise Reduction for Application of F etal QRS Detection , Computing in Cardiology 40 (2013), 161–164. 47. C. Liu, P . Li, C. Di Maria, L. Zhao, H. Zhang, and Z. Chen, A Multi-step Method with Signal Quality Assessment and F ine-tuning Pr ocedur e to Locate Maternal and F etal QRS Comple xes fr om Abdominal ECG Recor dings. , Physiological measurement 35 (2014), 1665–83. 48. M. Malik, P . Farbom, V . Batchvaro v , K. Hnatkova, and A.J. Camm, Relation between QT and RR intervals is highly individual among healthy subjects: implications for heart rate correction of the QT interval , Heart 87 (2002), no. 3, 220–228. 49. S. M. M. Martens, C. Rabotti, M. Mischi, and R. J. Sluijter, A Robust F etal ECG Detection Method for Abdominal Recor dings. , Physiol. Meas. 28 (2007), 373–88. 50. M. Niknazar , B. Riv et, and C. Jutten, F etal ECG extraction by extended state Kalman filtering based on single-channel r ecordings , IEEE T rans. Biomed. Eng. 60 (2013), no. 5, 1345–52. 51. D. Panigrahy , M. Rakshit, and P . K. Sahu, An efficient method for fetal ECG extr action from single chan- nel abdominal ECG , 2015 International Conference on Industrial Instrumentation and Control, ICIC 2015 (2015), no. Icic, 1083–1088. 52. D. Panigraphy and P . K. Sahu, Extraction of fetal ECG by extended state Kalman filtering and adaptive neur o-fuzzy inference system based on single channel , Sadhana 40 (2015), no. June, 1091–1104. 53. M. Richter, T . Schreiber , and D.T . Kaplan, F etal ECG extr action with nonlinear state-space projections , IEEE T rans. Biomed. Eng. 45 (1998), no. 1, 133–137. 54. R. Rodrigues, F etal beat detection in abdominal ECG recor dings: global and time adaptive appr oaches. , Physiol. Meas. 35 (2014), no. 8, 1699–711. 55. R. Sameni, Extr action of fetal cardiac signals fr om an arr ay of maternal abdominal r ecordings , Ph.D. thesis, Sharif Univ ersity of T echnology – Institut National Polytechnique de Grenoble, 2008. 56. R. Sameni and G. D. Clifford, A r eview of fetal ECG signal pr ocessing; issues and promising directions , Open Pacing Electrophysiol Ther J. 3 (2010), 4–20. SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 27 57. R. Sameni, G. D. Clif ford, C. Jutten, and M. B. Shamsollahi, Multichannel ECG and noise modeling: Appli- cation to maternal and fetal ECG signals , Eurasip Journal on Advances in Signal Processing 2007 (2007), 1–14. 58. R. Sameni, C. Jutten, and M.B. Shamsollahi, Multichannel electr ocardiogram decomposition using periodic component analysis , IEEE T rans. Biomed. Eng. 55 (2008), no. 8, 1935–1940. 59. R. Sameni, M. B. Shamsollahi, C. Jutten, and G. D. Clifford, A nonlinear Bayesian filtering framework for ECG denoising , IEEE T rans. Biomed. Eng. 54 (2007), no. 12, 2172–2185. 60. A. Singer , Y . Shk olnisky , and B. Nadler, Diffusion Interpretation of Nonlocal Neighborhood F ilters for Signal Denoising , SIAM Journal on Imaging Sciences 2 (2009), no. 1, 118. 61. T ask Force, Heart Rate V ariability : Standards of Measurement, Physiological Interpretation, and Clinical Use , Circulation 93 (1996), no. 5, 1043–1065. 62. M. J.O. T aylor , M. J. Smith, M. Thomas, A. R. Green, F . Cheng, S. Oseku-Af ful, L. Y . W ee, N. M. Fisk, and H. M. Gardiner, Non-in vasive fetal electr ocar diography in singleton and multiple pr e gnancies , BJOG: An International Journal of Obstetrics and Gynaecology 110 (2003), no. 7, 668–678. 63. M. Ungureanu, J. W . M. Ber gmans, S. G. Oei, and R. Strungaru, F etal ECG e xtraction during labor using an adaptive maternal beat subtraction technique , Biomedizinische T echnik 52 (2007), no. 1, 56–60. 64. J. H. van Bemmel and H. van der W eide, Detection procedur e to repr esent the foetal heart rate and electro- car diogram. , IEEE T rans. Biomed. Eng. 13 (1966), no. 4, 175–182. 65. M. V aranini, G. T artarisco, L. Billeci, A. Macerata, G. Pioggia, and R. Balocchi, An efficient unsupervised fetal QRS complex detection fr om abdominal maternal ECG , Physiol. Meas. 35 (2014), no. 8, 1607–19. 66. B. Widro w , C. S. Williams, J. R. Glover , J. M. McCool, R. H. Hearn, J. R. Zeidler, J. Kaunitz, E. Dong, and R. C. Goodlin, Adaptive Noise Cancelling: Principles and Applications , Proceedings of the IEEE 63 (1975), no. 12, 1692–1716. 67. Z. Zhou, Heter oscedasticity and Autocorr elation Robust Structural Change Detection , J. Am. Stat. Assoc. 108 (2013), 726–740. 28 L. SU AND H.-T . WU T A B L E 1 . Results of fetal R peaks estimation – median F 1 and MAE and their IQRs for different cases and noise levels among 10 subjects and 5 rounds. The 1min result is reported in the following way . For each subject, noise lev el, case, and round, we take the channel and the 1-minute subset out of the 5-minute signal that leads to the highest F 1 to report the result. Then, for each noise lev el and each case, the median and IQR of all subjects and rounds are shown as the final result, as is suggested in [5]. For the 5min result, for each subject, noise level, case, and round, we take the channel that leads to the highest F 1 , and then for each noise level and each case, we report the median and IQR of all subjects and rounds. Case 0 Case 1 Case 2 Case 3 Case 4 F 1 (%) 12 dB 1min 100 (0) 100 (0) 100 (0) 100 (0) 100 (0) 5min 99.93 (0.38) 99.92 (0.21) 100 (0.87) 72.96 (10.32) 99.85 (0.47) 9 dB 1min 100 (0) 100 (0) 100 (0) 100 (0) 100 (0) 5min 99.9 (0.61) 99.91 (0.29) 99.93 (0.27) 73.6 (9.44) 99.58 (3.75) 6 dB 1min 100 (0.73) 100 (0.88) 100 (0.67) 100 (0.43) 100 (7.83) 5min 99.89 (1.38) 99.82 (11.01) 99.91 (9.56) 73.72 (13.58) 99.31 (22.52) 3 dB 1min 100 (71.53) 100 (0.88) 100 (69.14) 100 (6.11) 100 (19.82) 5min 99.12 (76.49) 98.72 (6.17) 99.44 (78.63) 72.11 (28.53) 98.6 (45.75) 0 dB 1min 95.63 (56.14) 97.99 (19.31) 97.9 (64.75) 99.61 (37) 93.76 (64.04) 5min 88.24 (74.69) 83.46 (38.1) 81.17 (73.67) 64.25 (38.3) 85.84 (74.18) MAE (msec) 12 dB 1min 0.85 (1.54) 0.5 (0.77) 0.68 (1.2) 0.53 (0.52) 2.15 (1.46) 5min 0.82 (1.82) 0.58 (0.78) 0.63 (1.59) 3.07 (1.54) 2.32 (1.63) 9 dB 1min 1.01 (2.83) 0.75 (0.65) 0.7 (0.49) 0.51 (0.58) 3.21 (1.77) 5min 0.96 (2.96) 0.71 (0.7) 0.77 (0.37) 3.15 (1.62) 3.2 (1.77) 6 dB 1min 1.21 (3.04) 0.92 (2.7) 1.06 (2.16) 0.85 (1.01) 4.03 (5.56) 5min 1.15 (3.17) 0.93 (2.48) 1 (3.11) 3.43 (1.99) 4.01 (5.52) 3 dB 1min 2.23 (10.64) 1.43 (3.18) 1.41 (4.15) 1.11 (6.67) 5.13 (11.54) 5min 1.9 (12.18) 1.76 (4.16) 1.38 (4.13) 3.77 (8.61) 5.07 (11.89) 0 dB 1min 5.65 (8.06) 4.36 (8.4) 3.5 (13.46) 2.13 (11.66) 6.6 (13.47) 5min 5.27 (7.85) 5.59 (9.47) 4.56 (10.93) 5.53 (11.47) 6.26 (13.34) SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 29 T A B L E 2 . Results of fetal R peaks estimation for the tenth best channel – median F 1 and MAE and their IQRs for dif ferent cases and noise levels among 10 subjects and 5 rounds. The 1min result is reported in the following way . For each subject, noise lev el, case, and round, we take the channel and the 1-minute subset out of the 5-minute signal that leads to the tenth highest F 1 to report the result. Then, for each noise level and each case, the median and IQR of all subjects and rounds are sho wn as the final result, as is suggested in [5]. For the 5min result, for each subject, noise lev el, case, and round, we take the channel that leads to the tenth highest F 1 , and then for each noise lev el and each case, we report the median and IQR of all subjects and rounds. Case 0 Case 1 Case 2 Case 3 Case 4 F 1 (%) 12 dB 1min 100 (0) 100 (0) 100 (0.36) 100 (0.4) 100 (1.74) 5min 99.82 (9.28) 99.88 (0.35) 99.39 (14.41) 64.17 (19.45) 99.02 (16.1) 9 dB 1min 100 (1.25) 100 (74.63) 100 (4.62) 100 (2.08) 99.2 (3.13) 5min 99.48 (5.48) 99.53 (78.69) 99.48 (27.05) 64.61 (17.64) 97.71 (10.31) 6 dB 1min 100 (10.37) 100 (1.64) 99.23 (64.58) 99.6 (14.44) 98.94 (12.78) 5min 96.02 (43.21) 98.06 (6.49) 94.48 (74.91) 60.57 (21.72) 93.71 (28.81) 3 dB 1min 97.89 (59.78) 99.18 (9.54) 99.8 (80.33) 99.7 (66.55) 94.79 (29.41) 5min 87.54 (74.19) 95.27 (21.42) 78.4 (90.88) 60.31 (47.41) 87.48 (48.56) 0 dB 1min 51.55 (71.07) 80.53 (80.79) 46.34 (67.88) 83.73 (68.65) 83.69 (38.35) 5min 43.7 (69.08) 49.29 (90.35) 32.07 (49.34) 45.65 (41.85) 66.02 (57.91) MAE (msec) 12 dB 1min 1.09 (1.8) 0.84 (1.23) 1.29 (2.66) 1.03 (2.26) 3.87 (2.97) 5min 1.16 (2.14) 0.75 (1.07) 1.17 (2.61) 4.41 (3.72) 3.86 (2.94) 9 dB 1min 1.36 (5.53) 0.72 (1.5) 1.37 (5.99) 1.51 (2.31) 4.74 (3.76) 5min 1.34 (6.02) 0.64 (1.71) 1.48 (5.58) 5.18 (3.51) 4.74 (4.15) 6 dB 1min 2.87 (7.36) 1.36 (3.68) 4.4 (13.72) 1.36 (5.73) 6.75 (5.03) 5min 3.17 (7.5) 2.52 (4.39) 4.33 (17.84) 5.47 (5.92) 6.74 (6.27) 3 dB 1min 4.44 (11.48) 2.98 (4.53) 2.18 (13.44) 2.09 (19.65) 8.13 (8.8) 5min 6 (11.63) 3.54 (5.34) 3.14 (14.91) 5.49 (15.82) 8.8 (8.56) 0 dB 1min 11.46 (12.85) 6.18 (16.45) 14.41 (12.98) 8.01 (19.81) 9.06 (9.89) 5min 13.29 (13.67) 6.22 (17.23) 15.57 (13.19) 12.11 (16.83) 9.04 (10.82) 30 L. SU AND H.-T . WU T A B L E 3 . Results of fECG wa veform estimation over the 5-minute sig- nal – median correlation and its IQR for different cases and noise levels among 10 subjects and 5 rounds. For each subject, noise lev el, case, and round, we take the channel that leads to the highest F 1 (and the thenth highest F 1 ), and then for each noise lev el and each case, we report the median and IQR of all subjects and rounds. Case 0 Case 1 Case 2 Case 3 Case 4 Correlation 12 dB highest 0.962 (0.067) 0.977 (0.044) 0.984 (0.052) 0.983 (0.017) 0.97 (0.032) 10th highest 0.953 (0.097) 0.96 (0.047) 0.959 (0.062) 0.973 (0.052) 0.963 (0.108) 9 dB highest 0.962 (0.076) 0.968 (0.05) 0.977 (0.021) 0.978 (0.034) 0.962 (0.039) 10th highest 0.93 (0.222) 0.943 (0.753) 0.953 (0.12) 0.97 (0.075) 0.942 (0.164) 6 dB highest 0.954 (0.076) 0.958 (0.083) 0.949 (0.069) 0.977 (0.053) 0.957 (0.141) 10th highest 0.912 (0.256) 0.919 (0.108) 0.897 (0.374) 0.948 (0.118) 0.926 (0.191) 3 dB highest 0.919 (0.48) 0.94 (0.103) 0.939 (0.276) 0.965 (0.133) 0.95 (0.216) 10th highest 0.859 (0.348) 0.925 (0.29) 0.923 (0.428) 0.945 (0.47) 0.863 (0.37) 0 dB highest 0.892 (0.404) 0.894 (0.289) 0.9 (0.371) 0.942 (0.215) 0.875 (0.386) 10th highest 0.656 (0.378) 0.846 (0.518) 0.768 (0.3) 0.886 (0.345) 0.778 (0.375) T A B L E 4 . F 1 score, MAE, PPV , and SE of all channels and all subjects ov er the whole 5 minute signals in the adfecgdb database. For each sub- ject, the best result among 4 channels is marked in the boldface. Subject Channel F 1 (%) MAE (msec) PPV (%) SE (%) r01 1 99.53 1.5227 99.38 99.69 2 75.21 4.4004 72.33 78.32 3 87.35 4.4046 86.41 88.30 4 78.83 4.9358 77.53 80.19 r04 1 82.15 9.0635 81.63 82.67 2 98.81 7.6447 98.73 98.89 3 97.54 7.2622 97.46 97.62 4 98.41 6.9065 98.26 98.57 r07 1 84.81 10.5816 84.20 85.42 2 98.48 8.3182 98.25 98.72 3 99.44 7.6329 99.36 99.52 4 99.28 7.3048 99.20 99.36 r08 1 97.85 2.0219 97.26 98.46 2 84.44 3.7191 82.69 86.27 3 71.42 5.3655 69.49 73.46 4 69.69 5.2397 71.92 67.59 r10 1 98.65 2.8119 98.26 99.04 2 98.49 3.5919 98.26 98.73 3 78.52 8.5177 76.28 80.89 4 95.25 4.1681 94.79 95.70 SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 31 R E S E A R C H C E N T E R F O R I N F O R M A T I O N T E C H N O L O G Y I N N OV ATI O N , A C A D E M I A S I N I C A E-mail addr ess : li.sowaterking@gmail.com D E PART M E N T O F M A T H E M ATI C S , U N I V E R S I T Y O F T O RO N T O E-mail addr ess : hauwu@math.toronto.edu 32 L. SU AND H.-T . WU F I G U R E 2 . The result of subject 1, round 1, case 4, and channel 21 with- out noise. In the top subplot, all relev ant signals are sho wn together for a visual comparison. The clean aECG signal is shown in black (shifted up by 8 units) with the detected R peaks in round red circles on the top row; the clean maECG is sho wn in light gray (shifted up by 5 units) superim- posed with the estimated maECG signal in blue on the second row; the rough fetal ECG is shown in dark gray (shifted up by 2.5 units) with the detected R peaks in round red circles on the third row; the clean fECG is sho wn in light gray superimposed with the estimated fECG signal in red on the bottom row . In the middle subplot, the de-shape STFT of the maECG is shown on the left and the de-shape STFT of the rough fECG is shown on the right. In the bottom subplot, the STFT of the maECG is shown on the left and the STFT of the rough fECG is shown on the right. Note that the discrepancy between the clean maECG and the esti- mated maECG comes from the median filter , which in general is needed to remov e the baseline wandering. SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 33 F I G U R E 3 . The result of subject 3, round 1, case 4, and channel 23 with the SNR 6dB. In the top subplot, all relev ant signals are shown together for a visual comparison. The clean maECG signal is shown in black (shifted up by 8 units) with the detected R peaks in round red circles on the top row; the clean maECG is shown in light gray superimposed with the estimated maECG signal in blue (shifted up by 5 units) on the second ro w; the rough fetal ECG is shown in dark gray (shifted up by 2.5 units) with the detected R peaks in round red circles on the third ro w; the clean fECG is sho wn in light gray superimposed with the estimated fECG signal in red on the bottom row . The two red arrows indicate two ectopic beats that are not recov ered by the nonlocal median algorithm. In the middle subplot, the de-shape STFT of the maECG is sho wn on the left and the de-shape STFT of the rough fECG is shown on the right. In the bottom subplot, the STFT of the maECG is sho wn on the left and the STFT of the rough fECG is sho wn on the right. Note that the discrepanc y between the clean maECG and the estimated maECG comes from the median filter , which in general is needed to remo ve the baseline w ander- ing. 34 L. SU AND H.-T . WU F I G U R E 4 . Upper left and upper right: the STFT and de-shape STFT of the first channel of the case r01; bottom left and bottom right: the STFT and de-shape STFT of the rough fECG estimate. The red arrow indicates the fIHR, and the blue arrow indicates the mIHR. SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 35 F I G U R E 5 . The aECG signal of the third channel of the case r07 is plot- ted in black on the top row . The rough fECG signal estimate (shifted down by 90 units) is plotted in gray on the second ro w . The final fECG estimation (shifted down by 120 units) is plotted in red superimposed on the rough fECG signal (shifted down by 120 units) plotted in gray for the comparison purpose on the third row . The direct fECG recorded from the fetal scalp (shifted down by 240 units) is shown in blue on the bottom row . F I G U R E 6 . The small segment fECG estimation is plotted in red with the rough fECG signal plotted in gray for the comparison purpose. The data is the third channel of the case r07. 36 L. SU AND H.-T . WU F I G U R E 7 . The result of channel 2 of a59 in the CinC2013 database. In the top subplot, all relev ant signals are shown together for a visual comparison. The clean maECG signal is shown in black (shifted up by 1600 units) with the detected maternal R peaks in round red circles and the annotated fetal R peaks in blue diamonds on the top row; the estimated maECG signal is shown in blue (shifted up by 600 units) on the second ro w; the rough fetal ECG is shown in dark gray (shifted up by 300 units) with the detected R peaks in red upper triangles and the annotated R peaks in blue diamonds on the third row; the fECG estimation is shown in red on the bottom ro w . It is clear that the fetal R peaks are all detected incorrectly . In the bottom subplot, the de-shape STFT of the maECG is shown on the left and the de-shape STFT of the rough fECG is sho wn on the right. The red arrow indicated the “dominant curve” in the de-shape STFT , which comes from the incomplete removal of the maECG signal. SINGLE CHANNEL FECG BY DE-SHAPE STFT AND NONLOCAL MEDIAN 37 F I G U R E 8 . The result of the dif ference between channel 2 and channel 3 of a59 in the CinC2013 database. In the top subplot, all relev ant signals are shown together for a visual comparison. The clean maECG signal is shown in black (shifted up by 55 units) with the detected maternal R peaks in round red circles and the annotated fetal R peaks in blue diamonds on the top row; the estimated maECG signal is shown in blue (shifted up by 35 units) on the second ro w; the rough fetal ECG is shown in dark gray (shifted up by 20 units) with the detected fetal R peaks in round red circles and the annotated fetal R peaks on red upper triangles on the third row; the fECG estimation is shown in red on the bottom row . In the bottom subplot, the de-shape STFT of the maECG is shown on the left and the de-shape STFT of the rough fECG is shown on the right. Clearly , the fetal R peaks is perfectly recov ered. 38 L. SU AND H.-T . WU F I G U R E 9 . The RMSE and F 1 of the 75 subjects in the CinC2013 data- base is plotted.

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment