Topological Data Analysis for True Step Detection in Piecewise Constant Signals

T op ological Data Analysis for T rue Step Detection in Piecewise Constan t Signals Firas A. Khasa wneh ∗ Elizab eth Munc h † Ma y 17, 2018 Abstract This pap er in tro duces a simple y et pow erful approach based on topological data analysis (TDA) for detecting the true steps in a piecewise constan t (PW C) signal. The signal is a t w o-state square wa v e with randomly v arying in-b et w een-pulse spacing, and sub ject to spurious steps at the rising or falling edges whic h w e refer to as digital ringing. W e use persistence homology to deriv e mathematical guarantees for the resulting change detection which enables accurate iden tiﬁcation and counting of the true pulses. The approac h is describ ed and tested using b oth synthetic and exp erimen tal data obtained using an engine lathe instrumented with a laser tac hometer. The describ ed algorithm enables the accurate calculation of the spindle speed with the appropriate error bounds. The results of the described approac h are compared to the frequency domain approach via F ourier transform. It is found that b oth our approach and the F ourier analysis yield comparable results for numerical and exp erimental pulses with regular spacing and digital ringing. Ho wev er, the describ ed approac h signiﬁcantly outp erforms F ourier analysis when the spacing b etw een the p eaks is v aried. W e also generalize the approach to higher dimensional PW C signals, although utilizing this extension remains an interesting question for future research. 1 In tro duction Piecewise constant (PWC) signals are an imp ortant sub class of piecewise contin uous data which o ccur in a v ariety of applications such as bioinformatics, astroph ysics, geoph ysics, molecular biosciences and digital imagery [ 1 , 2 , 3 , 4 , 5 ]. Figure 1 a–c shows sev eral forms of PW C signals b oth deterministic and sto chastic. The PWC wa veform also occurs in the output of several sensors, which include tachometers. Sensors with PW C wa v eform are widely used and implemented in applications that in volv e rotating mac hinery such as in vestigating pressure ﬂuctuations in b earings [ 6 ], ﬂat plate motions [ 7 ], analyzing the p ow ertrain vibrations of vehicles [ 8 ], and for the exp erimen tal inv estigations of the dynamic performance of wind turbines [ 9 ]. One imp ortan t example of PW C wa v eform sensors are laser tac hometers which are often used in machining pro cess applications, including monitoring rotation velocity in the presence of spindle sp eed mo dulation [ 10 ] or vibration-assisted cutting in turning [ 11 ], defect detection for friction stir welding [ 12 ], measuring the feed error in tapping pro cesses [ 13 ], and for strob oscopic sampling of vibrations in milling [ 14 ]. The output of laser tachometers assumes tw o logical lev els low/high or on/oﬀ based on the reﬂectivit y or con trast of the target material when sub jected to a laser source. These tachometers often output transistor- transistor-logic (TTL) pulses triggered by the change in the reﬂected laser b eam. The resulting t wo-state tra jectory can b e collected using digital or analog c hannels. If analog c hannels are used to collect the laser tac hometer signal, then noise will b e sup erimp osed on the pulse train. Generally , utilizing linear ﬁltering for noise remov al from PWC signals is ineﬃcient b ecause b oth the noise and the PW C signal hav e a broad ∗ Dept. of Mec hanical Engineering, Michigan State Univ ersit y . khasawn3@egr.msu.edu † Dept. of Computational Mathematics Science and Engineering; and Dept. of Mathematics, Michigan State Univ ersity . muncheli@egr.msu.edu This material is based up on work supp orted by the National Science F oundation under Gran t Nos. CMMI-1759823 and DMS-1759824 with PI F AK, and CMMI-1800466 and DMS-1800446 with PI EM. 1 Figure 1: An ov erview of the categorization (left column) and analysis (right column) of the t yp e of signals considered in this study . The pap er in vestigates piecewise constant signals with tw o-state tra jectories and random digital ringing (panel b in the left ﬁgure). band frequency sp ectrum. How ev er, since for laser tachometers the signal to noise ratio is t ypically large the underlying digital signal can b e reco v ered by hard-thresholding an adequately sampled time series. Ho wev er, hard-thresholding is not suitable in general for sto c hastic PW C signals suc h as the case sho wn in Fig. 1 c. Assuming that a representativ e, noise-free sample of the laser tac hometer signal has b een recov ered, another challenge is the phenomenon where spurious pulses app ear in the signal near the transitions of reﬂectiv e/non-reﬂective target, as shown in Fig. 1 b. W e call these peaks digital ringing . W e observed these p eaks while collecting laser tachometer signals during a turning exp eriment. These spurious p eaks can o ccur due to (1) chips or co oling ﬂuid interrupting the laser b eam in manufacturing applications, (2) vibrations of the laser tachometer holder near regions on the target with diﬀeren t con trast, (3) vibration of the target a wa y or to wards the laser tachometer, or (4) unin tended reﬂections of the laser beam on non-target surfaces due to the motion of the machine. These spurious p eaks can o ccur at the rising or falling edges but they do not occur as a rep etitiv e pattern, and they ma y not o ccur at some p eaks or at all combinations of cutting parameters. Hence, they are considered a random v ariable and are grouped under the umbrella of stochastic systems in Fig. 1 . The challenge in obtaining the once-p er-rev olution pulse or in counting the n umber of pulses p er unit time is exacerbated b y these spurious peaks. Moreo ver, signiﬁcant complications are in tro duced if the spacing betw een the pulses is further randomly modulated either due to external disturbances or due to some noise sup erimposed on inten tional, regular v ariations of the rotational sp eed. In b oth these scenarios the in terest is in ﬁnding either (a) the rising edge, (b) the p eak, or (c) the falling edge where a true transition in the reﬂectivity of the target occurs. In tachometer signals it is common to detect falling or rising edges, so in this study we choose to in vestigate the rising edges of true transitions. While ﬁnding true p eaks from a signal con taminated with noise may seem like an easy problem, the exis- tence of spurious p eaks precludes using traditional, step detection metho ds for p eak detection and counting. F or example, p eak-ﬁnding algorithms are not useful here because true p eaks can b e incorrectly coun ted m ultiple times thus giving false counts and consequently yielding exaggerated spindle sp eeds. In addition, setting width thresholds in the p eaks algorithm requires the user to ha ve a priori kno wledge of the existing noise, whic h ma y not b e feasible or practical. The p eak ﬁnding algorithm also catastrophically fails if the spacing b etw een p eaks randomly v aries due, for example, to randomness in the spindle drive or v ariations in the cutting load of mac hine to ols. F urther, since the signal is corrupted b y digital ringing, typical ﬁltering tec hniques for PWC signals [ 3 , 4 ] such as total v ariation denoising, hidden Marko v chains, and wa v elets are ineﬀective. T raditional clustering metho ds suc h as k -means clustering require knowing the num b er of clusters desired in adv ance, and thus are not useful for our need to automatically coun t the n umber of pulses. It is worth men tioning that iden tifying true peaks in the presence of digital ringing as described in this 2 pap er is a task where humans outp erform traditional computer algorithms. Ho wev er, while humans can often identify true p eaks, coun ting them is prone to error particularly when the num b er of pulses is large p er unit time. Moreo ver, it is impractical to rely on h uman interpretation of laser tachometer signals esp ecially in high-sp eed or real-time op erations. Therefore, the ob jective of this pap er is to in tro duce a simple yet pow erful approac h for automatically detecting the true steps in a PWC tw o-state square w av e sub ject to digital ringing. The sp eciﬁc application is obtaining the spindle speed of a machine to ol with a laser tac hometer whose signal is corrupted by spurious on/oﬀ pulses. W e also extend the approac h to the case where the spacing b et ween the pulses v aries randomly , and we discuss and e xtension of the theory to higher dimensional analogs, and we deriv e mathematical guaran tees for the resulting c hange detection whic h enables accurate identiﬁcation and counting of the true pulses. The approac h w e utilize is based on T op ological Data Analysis (TD A) [ 15 , 16 , 17 , 18 , 19 , 20 ] and it is outlined in Section 3 . The prev alence of T op ological Data Analysis (TD A) has explo ded in recent years due to its use in many disparate domains. Arguably the most prev alent application is that of TDA to time series analysis and signal pro cessing, having spa wned some of the earliest results in the ﬁeld [ 21 , 22 , 23 ]. This subﬁeld of TDA is often referred to as T opological Signal Pro cessing (TSP) [ 24 ]. F rom the insigh ts gained from the b eginnings of TSP came one of the most prominent tools in TDA, namely p ersisten t homology [ 25 , 26 ], which quantiﬁes shap e and structure in data. No w, TSP has b ecome a mature ﬁeld in its own right. Much of the w ork stems from utilizing persistent homology in conjunction with delay coordinate embeddings to give top ological quan tiﬁcation of attractors of dynamical systems [ 27 , 28 , 29 , 30 , 31 ]. Recent work has come in the form of classiﬁcation and quantiﬁcation of p eriodicity and quasi-perio dicity [ 32 , 33 , 34 , 35 , 36 ]. This work has app eared in a diverse arra y of applications including wheeze detection [ 37 ], computer performance [ 38 ], market prices [ 39 , 40 ], sonar [ 41 ], image processing [ 42 ], ice core analysis [ 43 ], mac hining dynamics [ 44 , 45 , 46 , 47 ], and gene expression [ 48 , 49 ]. In this paper, w e will utilize one of the simplest examples of p ersistent homology , namely 0-dimensional p ersistence deﬁned on points in R . Viewing our data in this w ay gives access to the p o werful theory built up for understanding p ersisten t homology , particularly with resp ect to noise. In particular, the existence of a metric on the space of p ersistence diagrams leads to the p ow erful notion of stability [ 50 , 51 ]. This knowledge in conjunction with our assumptions on noise is the theoretical basis for the algorithm w e develop in Section 3 . While there is a growing collection of ever faster co de for computation of persistent homology [ 52 ], our restricted setting also gives rise to simpliﬁed algorithms, th us making the analysis quite fast. An additional p erk of this viewp oin t is that it also allows for generalization to higher dimensional problems as outlined in Section 66.1 . The approach is describ ed and tested using b oth synthetic and exp erimen tal data. The exp erimen tal apparatus includes an engine lathe instrumented with a laser tachometer which detects the change in re- ﬂectivit y of a tap e adhered to the circumference of the spindle. The resulting mean spindle speed is then rep orted with appropriate error bounds. W e also compare the results to the output of F ourier analysis. It is found that the describ ed approach pro vides comparable results to F ourier for regular pulse wa v es with digital ringing; how ev er, the numerical calculations sho w that the describ ed approach outperforms F ourier analysis when the spacing b et ween the pulses is v aried. 2 Bac kground This section provides the necessary background for motiv ating and presenting the new approach for pulse coun ting using p ersistence diagrams. Section 2.1 discusses existing metho ds for pulse counting and spindle sp eed calculations. Sections 2.2 – 2.4 present the background theory on p ersisten t homology and describ es ho w it relates to the current work. 2.1 Curren t metho ds for counting and rotational sp eed calculation Sensors for detecting rotary motion include proximit y sensors, photo electric sensors, and encoders. These sensors output pulses that can b e counted and used to ﬁnd the sp eed of the shaft, t ypically , in units of rev olutions p er minute (RPM). The pulse signals are piecewise con tinuous (PWC) functions with tw o logic 3 lev els: high or on, and low or oﬀ. The quality of the output data dep ends on the num b er of pulses per rev olution whic h aﬀects the data resolution, as well as the symmetry of the pulses, which inﬂuences the accuracy and consistency of the data. Once the pulses are obtained, there are generally tw o tec hniques for determining the corresp onding RPM: 1) frequency measuremen t approac h (calculate RPM from pulse count and pulse frequency), and 2) p erio d measuremen t approac h (calculate RPM from pulse count and pulse perio d). The frequency approac h in v olves transforming the signal into the frequency domain using a F ourier transform; how ever, the F ourier transform of a PW C signals can hav e slo w con v ergence [ 53 ]. Other counting algorithms include lo cal maxima or peak detection with w av elet transforms or other metho ds [ 3 ]. Ho wev er, the interest in this study is in detecting true p eaks, not all peaks. Therefore, utilizing con ven tional metho ds for p eak detection must b e combined with a threshold for rejecting false p eaks or retaining true ones. It may b e tempting to utilize a statistical measure such as the v ariance or standard deviation of the pulse duration. How ev er, since the data can v ary from bimo dal to uniform dep ending on the amount of noise introduced, statistical disp ersion measures are generally not eﬀectiv e. When there is noise sup erimp osed on the signal (for example due to collecting the signal on an analog c hannel) this noise can b e remo ved by hard thresholding. If the noise component is so large that the pulse structure can no longer b e distinguished as sho wn in Fig. 1 c, the denoising becomes more diﬃcult b ecause the traditional approach of lo w-pass ﬁltering typically in tro duces large spurious oscillations in PW C signals [ 53 ]. In this case, either the viewp oin t of piecewise constant smo othing, or that of level-set recov ery can b e used for determining the lo cation of the jumps [ 3 ]. In this pap er, we fo cus on signals where the small amoun t of noise superimp osed on the pulse amplitude can b e remov ed by thresholding, but where spurious random pulses o ccur at the rising or falling edges and where the spacing b et w een p eaks can randomly v ary . This type of noise cannot b e remov ed by thresholding, or by traditional ﬁltering techniques. V arying the p eriod length b et ween tw o consecutive pulses signiﬁcantly complicates the analysis, and without a prop er metho d for change detection can lead to p o or RPM calculations. Therefore, there is still a need for new, robust to ols for pulse detection in PWC signals and this pap er presents a metho d for reliable pulse-counting using 0-dimensional persistent homology . 2.2 0-dimensional P ersisten t homology P ersistent homology [ 21 , 22 , 25 , 26 ], a to ol arising from T op ological Data Analysis [ 20 , 19 ], seeks to quantify shap e and structure in data sets. Algebraic top ology [ 54 , 55 ] is a ﬁeld of mathematics which quantiﬁes qualitativ e similarities in the structure of spaces. One suc h metho d for quantiﬁcation is homology , whic h, giv en a top ological space, provides a vector space 1 for each dimensional structure b eing studied. This pap er will only focus on dimension 0, whic h quantiﬁes connected comp onents. The in tuition b ehind persistent homology for a p oin t cloud data set is to increase a connectivity parameter, and quantify ho w the top ological structure c hanges. This p o werful metho d can ﬁnd in teresting, higher dimensional structure, using eac h dimension of homology , how ever, w e will look at the simplest version for the purp oses of our problem. Zero-dimensional p ersisten t homology quan tiﬁes how the clusters change when view ed at diﬀerent scales. In fact, 0-dimensional p ersistent homology is closely related to classical clustering metho ds such as single-link age hierarchical clustering, dendrograms, and minimal spanning trees [ 56 ]. Here, w e adopt the view of the pro cedure as a restricted case of p ersisten t homology in order to gain understanding and predictive p o w er in our analysis, th us resulting in Thm. 3.1 . In the general setting, assume w e are given a p oin t cloud χ ⊂ R D with | χ | = n . W e can deﬁne a function f χ : R D → R b y f χ ( x ) = 2 k x − χ k where k x − A k = inf y ∈ A k x − a k for any set A ⊂ R D . The set of p oin ts for whic h f χ ( x ) ≤ r is the union of D -dimensional balls of radius 1 2 r centered at the points of χ ; we write this as f − 1 χ ( −∞ , r ]. If w e allo w r to increase from 0, initially , f − 1 χ ( −∞ , r ] has n distinct connected comp onen ts. Ho wev er, as r is increased, these comp onents will merge together un til we are ﬁnally left with only one connected component. In particular, these mergings happen at the instant tw o disks touch, and thus when r is equal to the distance betw een the associated p oin ts. W e sa y that a connected component dies when it merges with another connected comp onen t; that is, a death o ccurs any time t wo clusters merge. W e can k eep trac k of the function v alues at whic h these deaths o ccur in the following manner. 1 W e work with ﬁeld co eﬃcien ts, most typically Z 2 , so the homology group is, in fact, a vector space. 4 Figure 2: Left, an example p oin t cloud with its 0-dimensional p ersistence diagram drawn as a histogram at center. Right, an example of the pairing for tw o p ersistence diagrams used to compute the b ottlenec k distance. A function v alue r is a homological critical v alue if the num b er of connected comp onen ts decreases at function v alue r . The m ultiplicity of a critical v alue is the net decrease in the n umber of connected comp onen ts. W e deﬁne dgm( χ ) ⊂ R to b e the collection of homological critical v alues, with num b er of copies equal to the m ultiplicity . The resulting set of v alues is called a 0-dimensional p ersistence diagram, notated dgm = { d 1 ≤ d 2 ≤ · · · ≤ d k } . 2 One observ ation that will b e useful for interpretation is that the n umber of connected comp onen ts of f − 1 ( −∞ , r ] is one more than the num b er of d i ∈ dgm with d i > r . W e will also use the slightly more general formulation of p ersistence for any subset A ⊂ R D . If f A : R D → R is given b y f A ( x ) = 2 k x − A k , dgm( A ) is the collection of function v alues for which the num b er of connected comp onen ts of f − 1 ( −∞ , r ] changes, again with m ultiplicity . In order to ensure that the p ersistence diagrams are ﬁnite, w e assume that A has ﬁnitely many connected components. 2.3 Stabilit y of P ersistence Diagrams The main reason for thinking of this information as a p ersistence diagram is that these structures come with a metric kno wn as the b ottlenec k distance. Here we describ e the metric in the restricted 0-dimensional p ersistence diagram setting; see e.g. [ 16 ] for the full deﬁnition. Giv en t wo 0-dim persistence diagrams, dgm 1 = { d 1 ≤ d 2 ≤ · · · ≤ d k } and dgm 2 = { e 1 ≤ e 2 ≤ · · · ≤ e ` } , a partial matching η is a bijection b et ween subsets of the t wo diagrams η : A → B , A ⊂ dgm 1 , B ⊂ dgm 2 . The cost of a matching is deﬁned to be c ( η ) = max  | a − η ( a ) |  a ∈ A ∪  a 2  a ∈ dgm 1 \ A ∪  b 2  b ∈ dgm 2 \ B ! and the b ottlenec k distance d B (dgm 1 , dgm 2 ) = min η c ( η ) is the minimum cost of the p ossible matchings. Notice that because w e are restricting our diagrams to ﬁnitely man y points, the set of matchings is ﬁnite, thus this minim um is alwa ys ac hieved. In addition, it is p ossible that multiple matchings achiev e the minimum, so we will reference suc h a matc hing as a min-cost matc hing. Consider, for example, a diagram consisting of a single p oint dgm 1 = { d 1 } vs a diagram with tw o copies of the same p oint dgm 2 = { e 1 , e 2 } . WLOG, assume e 1 ≥ e 2 . The only p ossible matchings are where η 1 ( d 1 ) = e 1 , η 2 ( d 1 ) = e 2 , or η 3 whic h matches nothing. The scores of these are c ( η 0 ) = max  d 1 2 , e 1 2 , e 2 2  , c ( η 1 ) = max  | d 1 − e 1 | , e 2 2  , and c ( η 2 ) = max  | d 1 − e 2 | , e 1 2  . In the example of the t w o persistence diagrams in the righ t of Figure 2 , the score is lo west using η 1 , so the distance b et ween the diagrams is max  | d 1 − e 1 | , e 2 2  . The b ottlenec k distance is particularly useful due to the stability theorem [ 50 ]. Recall that the Hausdorﬀ distance b et ween sets A, B ⊂ R D is d H ( A, B ) = max  sup a ∈ A k a − B k , sup b ∈ B k b − B k  . 2 Experts will notice that, in general, persistence diagrams are giv en as collections of points in R 2 . Since for 0-dimensional p er- sistence on point clouds, all components are born at 0, the true persistence diagram w ould be the collection { (0 , d 1 ) , · · · , (0 , d k ) } . W e elect to drop the rep eated ﬁrst co ordinate in order to simplify the description. 5 Then the stabilit y theorem is as follows. Theorem 2.1 ([ 50 ]) . Under mild assumptions on the sets A, B ⊂ R D 3 , d B (dgm( A ) , dgm( B )) ≤ d H ( A, B ) . This theorem is particularly useful when w e tak e the view, as we will need in Section 3 , that B is a noisy p oin t cloud appro ximation of A . In this case, d H ( A, B ) is small, so the resulting p ersistence diagrams will b e close as w ell. Let us no w explore what it means for a diagram to b e close to another diagram. Assume w e again ha ve a diagram with a single p oin t dgm 1 = { d 1 } . F or ε small relative to d 1 , w e can think of the set of diagrams within distance ε of dgm 1 = { d 1 } . Say the second diagram is dgm 2 = { e 1 , · · · , e n } , so the options for matc hings are η i matc hing d 1 to e i for i = 1 , · · · , n , and η 0 matc hing nothing. Assume d B (dgm 1 , dgm 2 ) ≤ ε and let ε < d 1 / 3. If a minimum cost matc hing is η i , this implies | d 1 − e i | ≤ ε , and e j ≤ 2 ε for all j 6 = i . This means that there m ust b e exactly one point, e j , within distance ε of d 1 , and all remaining p oin ts are in [0 , 2 ε ]. As shown in the righ tmost p ortion of Figure 2 , this means there is exactly one p oint in the top green region, and an y remaining p oin ts are in the b ottom green region. What will b e useful in Section 33.2 is the set of diagrams close to the diagram which has n copies of the p oin t d 1 , e.g. dgm n = { d 1 , d 1 , · · · , d 1 } , . In this case, any diagram within bottleneck distance ε (again suﬃcien tly small ε < d 1 / 3), will hav e exactly n p oin ts in [ d 1 − ε, d 1 + ε ], with all remaining p oin ts less than 2 ε . 2.4 Computation of the p ersistence diagram using MSTs In order to do computations, w e con vert this information in to combinatorial structures. F ollowing [ 21 , 22 ], w e compute p ersistence using a minimal spanning tree. A gr aph G = ( V ( G ) , E ( G )) consists of a ﬁnite list of v ertices V ( G ) and a set of edges, E ( G ), betw een them. A graph is c omplete if ev ery pair of v ertices has an edge b etw een them. A sub gr aph A ⊆ G is a subset of G whic h is itself a graph. That is, A = ( V ( A ) , E ( A )) where V ( A ) ⊆ V ( G ), E ( A ) ⊆ E ( G ). A p ath in G is a sequence of v ertices v 0 , · · · , v n suc h that there is an edge betw een ev ery adjacen t pair: ( v i , v i +1 ) ∈ E ( G ) ∀ i . A path is a cycle if v 0 = v n . A graph is c onne cte d if there is a path b etw een every pair of v ertices. A graph is a tr e e if it is connected and there are no cycles. A subgraph T ⊆ G whic h is a tree and with V ( T ) = V ( G ) is called a sp anning tr e e . A w eighted graph G = ( V ( G ) , E ( G ) , ω ) is a graph with a real v alue, called a weigh t, asso ciated to eac h edge: ω : E ( G ) → R . The total weight of a spanning tree of a weigh ted graph is the sum of the weigh ts on the edges. A spanning tree is called a minimal sp anning tr e e (MST) if it has minimal total weigh t amongst the set of spanning trees. If G is not connected, no MST exists as there is no connected subgraph using all the v ertices. A minimal sp anning for est (MSF) is an acyclic subgraph such that the restriction to eac h connected comp onen t of G is a MST. Giv en a ﬁnite set of p oin ts χ ⊂ R D , the ˇ Cec h graph for parameter r > 0 is the complete graph ˇ C( χ, r ) with vertex set in 1-1 corresp ondence to the p oin ts in χ , with all edges ( x, y ) for which k x − y k ≤ r , and with edge w eigh t equal to the distance betw een the associated p oin ts. First, notice that if r ≤ s , ˇ C( χ, r ) ⊆ ˇ C( χ, s ). Second, if r is larger than the diameter of χ (whic h is ﬁnite since χ is ﬁnite), ˇ C( χ, r ) is a complete graph and ˇ C( χ, r ) = ˇ C( χ, s ) for all s ≥ r . Thus, w e denote this graph as ˇ C( χ, ∞ ). Our ﬁrst intuition of understanding the connected components of the union of disks as describ ed ab o ve is to watc h the connected components of ˇ C( χ, r ) c hange as r increases. It is a consequence of the celebrated Nerv e Lemma [ 54 ] that the connected comp onen ts of ˇ C( χ, r ) match up with the connected comp onen ts of the union of disks of radius r 2 ; equiv alently with the connected comp onen ts of f χ ( −∞ , r ] in the notation of the previous section. So, w e just need to determine the list of function v alues for which the connected comp onen ts of ˇ C( χ, r ) change. Let T b e a MST for ˇ C( χ, ∞ ) 4 . It is an immediate consequence of the results [ 22 ] that the list of weigh ts on the edges of T is exactly dgm( χ ). Restated in our notation, this is the follo wing. 3 The distance functions f A and f B must b e what is known as tame in the TDA literature. As all sets considered in this paper hav e ﬁnitely many connected comp onen ts in R , w e will alwa ys satisfy this assumption. 4 This is sometimes called a “Euclidean spanning tree”. 6 Lemma 2.2 ([ 22 , Lem. 3]) . L et T b e a MST for ˇ C ( χ, ∞ ) , and let T r b e the r estriction to the set of e dges of weight at most r so that T r ⊂ ˇ C ( χ, r ] . Then T r is a MSF for ˇ C ( χ, r ] . Since a MSF of a graph has the same connected comp onents as the graph itself, this means that the c hanges in the connected comp onen ts happ en exactly at the v alues of the weigh ts in T . Putting this all together, we ha ve the following theorem. Theorem 2.3. F or a ﬁnite set of p oints χ , dgm( χ ) is the set of weights on the e dges of a MST of ˇ C ( χ, ∞ ) . The ﬁnal thing to note is that w e will largely b e in terested in p oin ts in R (so D = 1). Say χ = { a 0 , · · · , a k } . In this case, it is obvious that the MST is exactly the graph with edges { ( a i , a i +1 ) } k − 1 i =0 , and so the diagram dgm( χ ) = { ( a i +1 − a i ) } k − 1 i =0 . 3 Metho d 3.1 Basic mo del assumptions The initial data is a time series X deﬁned on { a = t 0 < t 1 < · · · < t N = b } given by { X ( t 0 ) , X ( t 1 ) , · · · , X ( t N ) } . W e assume that this data approximates the pulse w av e P τ T giv en b y P τ T ( t ) = ( 1 if 0 ≤ ( t mo d T ) ≤ τ 0 else and shown in Figure 3 . W e will assume that the duty , δ = τ /T , is less than 0 . 5 since otherwise we could sw ap our analysis to c heck for the pulses at 0 instead of the pulses at 1. W e ﬁrst assume that our noisy data is given b y X ( t ) = P τ T ( t + δ x ) + δ y (1) where δ x ∼ unif ( − α · τ , α · τ ) and δ y ∼ unif ( − β , β ) with α ∈ [0 , 1 / 2] and β ∈ [0 , 1]. F or simplicity , w e assume the noise in y , δ y , separates the high and low v alues, so β  1 2 . The noise in x , δ x , serv es to give incorrect pulse v alues when w e near the b eginning or end of a pulse b y lo oking forward and backw ard in time to reach an output. Since α is giv en as p ercen tage of τ , and we will not b e able to see anything if α is large enough to entirely ero de the oﬀ-cycle with digital ringing, we assume α ≤ T − τ 3 τ . F or practical applications with a small τ , this is satisﬁed for any α ∈ [0 , 1 / 2]. It will b e important later to note that with these assumptions on β and α , the Hausdorﬀ distance betw een M and χ = { t | X ( t ) ≥ 1 2 } for an y realization of { X ( t ) } dense enough in t will b e b ounded by ατ . 3.2 Coun ting pulses and RPM calculation using p ersistence Coun ting the num b er of pulses in a range [ A, B ] can b e thought of as counting the n umber of connected comp onen ts of M = ( P τ T ) − 1 ( { 1 } ) ∩ [ A, B ]. With this in mind, w e use a sp ecial case of 0-dimensional p ersistence to determine this from the point cloud appro ximation of M , namely χ = { t i | X ( t i ) > 1 2 } . See Figure 4 for an example. In order to determine the p ersistence-based RPM Ω P , w e need to ﬁnd the range on the t -axis con taining the full p erio ds, and count the num b er of full p eriods there. In some sense, this pro cedure is measuring the v oids b et ween the components of M , rather than measuring the comp onen ts of M themselves. The following theorem essentially outlines an algorithm, which, given con trolled enough noise, dense enough sampling, and enough visible pulses, calculates the num b er of pulses in that region using the p ersistence diagram dgm( χ ). Theorem 3.1. Assume a time series X (0) , X ( t 1 ) , · · · , X ( t N ) is data dr awn fr om the mo del of Eq. 1 with α < 1 5 ( T τ − 1) , and β < . 5 , and with times { A = t 0 < t 1 < · · · < t N = B } evenly sp ac e d such that t i − t i − 1 < τ (1 − 2 α ) / 2 and ( B − A ) /T > 3 . Set χ = { t i | X ( t i ) > 1 2 } = { a 1 ≤ · · · ≤ a m } and denote the r esulting p ersistenc e diagr am dgm( χ ) = { b 1 ≤ · · · ≤ b ` } . Cho ose µ ∈ ( d j , d j +1 ) for j = argmax k { d k +1 − d k } . Then |{ d ∈ dgm( χ ) | d > µ }| − 1 is the numb er of pulses in the r ange A < A low :=  A − τ T  + 1  T <  B T  T =: A high ≤ B . (2) 7 Figure 3: Left, an ideal pulse train P τ T . Center and righ t, noisy instances of the pulse train, b oth with T = 2, τ = . 5, α = . 05, and β = . 05. The center ﬁgure has ε = 0 and the right ﬁgure has ε = . 5. Note the uneven, random pulse spacing for  6 = 0. In particular, since A low and A high can b e approximated within ατ by a low = min { a i ∈ χ | a i − a i − 1 > µ } , and a high = max { a i ∈ χ | a i − a i − 1 > µ } , this theorem allo ws us to approximate the RPM in the follo wing sections as Ω P = # { d ∈ dgm( χ ) | d ≥ µ } − 1 a high − a low . (3) . Pr o of. First, note that if M has m -connected comp onen ts, there will b e m − 1 mergings of components that occur exactly at distance T − τ . Thus, the p ersistence diagram of M , dgm( M ), has m − 1 copies of d 1 = T − τ . By deﬁnition of the noise parameters α and β , we kno w that the Hausdorﬀ distance b et ween M and χ is at most ατ . So, by the stabilit y theorem, Thm. 2.1 , we hav e that d B (dgm( M ) , dgm( χ )) < ατ . Th us, w e turn our attention to computation of the 0-dimensional p ersistence of χ . As discussed previously , the MST for p oin ts in R has edges { ( a i , a i − 1 ) } k − 1 i =0 . Hence, the set of pairwise distances { a 1 − a 0 , a 2 − a 1 , · · · , a n − a n − 1 } is exactly the 0-dimensional persistence diagram dgm( χ ). The set of p ossible 0-dimensional diagrams within b ottlenec k distance ατ of dgm( M ) hav e exactly m − 1 p oin ts in [( T − τ ) − ατ , ( T − τ ) + ατ ], and all remaining p oints are in [0 , 2 α τ ]. Requiring α < 1 5 ( T τ − 1) means that there will be a distinct split in dgm( χ ); in particular that the largest diﬀerence betw een p oin ts in the diagram will o ccur b etw een a p oin t in [0 , 2 ατ ] and a p oint in [( T − τ ) − α τ , ( T − τ ) + ατ ]. These p oin ts are then notated as d j and d j +1 where j = argmax k { d k +1 − d k } . W e choose a threshold µ betw een the collections, and coun t # { d ∈ dgm( χ ) | d ≥ µ } + 1; that is to sa y , the num b er of p oin ts in the diagram in the interv al [( T − τ ) − ατ , ( T − τ ) + ατ ], so this is m − 1. While it is an annoying case study in num b er theory to determine exactly how man y pulses app ear in the in terv al [ A, B ], what w e can sa y is that if we cut oﬀ the ﬁrst (p ossibly partial) pulse seen, and the last (p ossibly partial) pulse seen b y lo oking in the interv al [ A low , A high ] deﬁned in Eq. 2 then the num b er of pulses is m − 2. Th us, the theorem follows. 3.3 Coun ting pulses and RPM calculation using F ourier transform The pulse counting and RPM calculation can be obtained by transforming the signal to the frequency domain. Sp eciﬁcally , the one-sided F ourier transform of the signal is obtained after subtracting the mean v alue from it to remo ve the DC comp onen t from the sp ectrum. Then, the frequency f in Hertz corresp onding to the 8 Figure 4: A pulse train, with T = 2, τ = . 8, α = . 2, β = . 1, is shown at left. The corresp onding sets χ and M are drawn ab ov e. The c hoices of domain for the RPM computation is shown by the green lines. The histogram for the p oin ts in the 0-dimensional diagram for the example of Figure 4 . Note that there is a large split b et ween the non-empty bins, so w e can c ho ose any threshold µ ∈ [0 . 2 , 1 . 0]. ﬁrst amplitude A that satisﬁes A > A max /w is found, where w is a scalar, and the F ourier-based RPM Ω F is calculated according to Ω F = f 1 × 60 . (4) In this study we chose w = 3 since it provided accurate results for the studied cases with constant p eak spacing. How ever, other v alues of w did not improv e the accuracy of the m ethod for the cases with non- constan t p eak spacing, and therefore the deterioration in the p erformance of the F ourier-based approach for these case w as not related to our c hoice of w = 3. 3.4 Accordion mo del assumptions W e no w create a generalized mo del for testing purposes where w e hav e data whic h do es not main tain regular pulses. This model will generate data on a window [0 , W ]. Assume Q 1 , · · · , Q K are dra wn iid from unif ((1 − ε ) T , (1 + ε ) T ). W e assume ε ∈ [0 , 1] and K > W (1 − ε ) T is large enough to ensure that P K i =1 Q i ≥ W . W e will generate pulses so that the length of the i th p erio d is Q i . F or the sake of notation, let Q 0 = 0. F or any s ∈ [0 , W ], deﬁne σ ( s ) = max { j ∈ [0 , · · · , K ] | P j i =0 Q i ≤ s } . The reparameterization function is given by ϕ ( s ) = ( T · σ ( s )) + T Q σ ( s )+1   s − σ ( s ) X i =0 Q i   . Then the noisy time series is given as X ( s ) = P τ T ( ϕ ( s ) + δ x ) + δ y (5) where δ x and δ y are noise as deﬁned in Sec. 3.1 . See Fig. 3 for an example. Notice that this mo del simpliﬁes to Eq. 1 when ε = 0. 4 Numerical Sim ulations and Robustness Analysis This section describ es the numerical simulations and shows the results of the robustness analysis. The basic idea is to take a standard pulse train with a small duty cycle and introduce spurious p eaks at b oth the rising 9 P arameter Description V alue(s) studied δ nominal duty cycle for the pulse 5% Ω 0 nominal RPM [30 , 24000] T nominal p erio d of the pulse in seconds 60 / Ω 0 τ time in on cycle δ · T α uniform noise in x as % of τ [0 , 0 . 5] β uniform noise in y as % of amplitude 0  p eaks’ spacing uniform noise amplitude/ T [0 . 02 , 0 . 65] n n umber of p eriods to sim ulate 32 m o versampling factor 32 N num b er of simulation p oin ts N = m (Ω 0 / 60) nT T able 1: The parameters used in the numerical sim ulations. and the falling edge of the data using a uniformly distributed random v ariable as deﬁned in Eq. 1 . This noise is found to mimic the digital ringing noticed in the laser tac hometer experimental data discussed in Sec. 5 . F urther, we simulate data from a signal where the spacing b etw een the p eaks is non-constant as deﬁned in Eq. 5 . The signal with mo dulated p eak separation simulates, for example, a heavy cutting pro cess where the spindle sp eed v aries during the cut. Alternativ ely , this spindle sp eed v ariation migh t b e inten tionally in tro duced to mitigate c hatter and improv e the cutting process [ 57 , 58 , 59 , 60 ]. The basic parameters used in the sim ulation are sho wn in T able 1 . In order to test the robustness of the p ersistence-based algorithm, the describ ed numerical sim ulations w ere utilized in a n umber of analyses which include: 1. Inv estigating the describ ed approach as a function of digital ringing. This is controlled by the parameter α which speciﬁes, as a percent of τ , the interv al around the rising and falling edges that will be aﬀected b y digital ringing. The digital ringing is in tro duced using a uniformly distributed v ariable deﬁned on the interv al describ ed by α . Therefore, as α gets larger, the digital ringing gets more severe. 2. Studying the performance of the metho d when v arying the spacing b etw een the peaks (the accordion eﬀect). This is controlled by the parameter  which represents, as a p ercen t of T , the width of the in terv al that will be mo dulated b y uniform noise. The larger  , the more lik ely it is for the p eaks to get closer. 3. Analyze the results of our approach versus the well-established F ourier transform method. With F ourier the frequency is found and the RPM or the count of pulses per unit time are then deduced from the prominen t frequency comp onen ts. 4. Benchmark the computational time of the p ersistence-based approach versus the heavily optimized F ourier transform approach. All the simulations and the related analysis were p erformed using Python’s scipy stack. The n umerical sim ulations consisted of a 100 × 100 uniform grid each in the (Ω 0 , α ) plane, and the (Ω 0 ,  ) plane. The sim ulation grids w ere deﬁned using Ω 0 ∈ [30 , 24000], α ∈ [0 , 0 . 5], and  ∈ [0 . 02 , 0 . 65]. F or b oth sets of sim ulations, 100 replicates were generated at each grid point. The starting seed for the random n umber generator used to pro duce the random noise w as 48824, and it was incremented b y one for each subsequen t iteration. F or eac h grid point, the RPM was calculated using both p ersistence (Eq. 3 ) and F ourier (Eq. 4 ). The computation time for each method was also recorded using the performance counter from Python’s time pac k age. The sample mean was used as the p oin t estimator on each grid p oin t. The relative error b et ween the output of eac h algorithm (F ourier, and p ersistence) and the nominal RPM Ω 0 w as computed. The results that show a comparison b et ween F ourier and p ersistence are summarized in Fig. 5 . The left and right columns corresp ond to p ersistence-based and F ourier-based results, resp ectiv ely . The ﬁrst ro w is for v arying the accordion parameter  , while the second ro w corresp onds to v arying the digital ringing parameter α . The same color scale is used for b oth heat maps in the same ro w, and higher v alues on the color scale indicate larger relative error. 10 Figure 5: Heat maps showing a comparison of the relative errors b et ween the calculated RPM using p ersis- tence (left column) and F ourier (right column). The ﬁrst row shows the comparison in the (Ω 0 ,  ) plane, while the second row compares the tw o algorithms in the (Ω 0 , α ) plane. Note that the color scale is ﬁxed for each ro w to facilitate the comparisons. 11 Figure 6: T op left: A ﬁgure showing more details of the heat map that depicts the relative error in the p ersistence-based RPM as a function of Ω 0 and α . T op righ t, b ottom left, and b ottom right: The calculated RPM using p ersistence and F ourier algorithms for α = 10% and the RPM v alues 3178, 12136, and 20126, resp ectiv ely . The nominal RPM v alues are marked using horizontal black lines. The bands around the line represen t the 68% conﬁdence bands which w ere computed using an empirical b ootstrap distribution of the sample mean. 4.1 α v ersus Ω 0 The left heat map in the b ottom row of Fig. 5 shows that p ersistence p erformed signiﬁcantly b etter than F ourier. Speciﬁcally , the relative error of the p ersistence algorithm is b elow 7% for α < 30%. In contrast, the error in F ourier is as high as 24% for α < 30% and go es to at least 40% for α > 35%. Note that in b oth ro ws of Fig. 3 the color scale is the same. A closer look at the heat map of p ersistence in Fig. 6 sho ws ho w the error in the persistence algorithm is b et w een 15% and 25% but only for α > 30%. These are large v alues that corresp ond to a signal with very strong v ariations b etw een the peaks. Note that even though we show the results for α ∈ [0 , 0 . 5], w e could deal with the case of α > 0 . 5 by rev ersing the p ersistence algorithm from lo oking at the gaps b et w een p eaks (logic zero) to lo oking at logic one (high v alues in the tw o-lev el digital signal). Therefore, the user has the ﬂexibility of adjusting the algorithm to recov er go od p erformance, as needed. Figure 7 sho ws the plot of the a verage RPM versus α calculated using b oth p ersistence and F ourier for  = 25% at three nominal rpms: 3178 (top righ t), 12136 (bottom left), and 20126 (b ottom righ t). The nominal RPM is indicated on the ﬁgure using a solid, horizontal black line. The curv es for the calculated p ersistence and F ourier RPM v alues also include the 68% conﬁdence band that can b e seen in the inset panels of the top right ﬁgure. These conﬁdence bands were computed using an empirical b o otstrap distribution of the sample mean. It can b e seen that for α < 20%, F ourier consistently underestimates all three nominal RPM v alues. Moreov er, as α is increased b ey ond 20%, the F ourier estimate starts to decrease further b elo w 12 the nominal RPM. In contrast, p ersistence giv es close estimates of the nominal RPM for α < 25%, and only tends to consistently undersho ot the nominal RPM for α > 25%. Therefore, the b ottom ro ws in Figs. 5 and 6 show that the persistence algorithm is robust to the width of the digital ringing. 4.2  v ersus Ω 0 The top row of Fig. 5 shows that for  < 7% the tw o algorithms pro duce comparable results with less than 10% relative error. How ever, as  is increased, the p erformance of F ourier progressively deteriorates from 10% to greater than 40% as evidenced by the increase in the relative error across appro ximately four horizon tal bands with a width of 10% each. These bands are roughly given b y  ≤ 15%, 18% <  ≤ 30%, 30% <  ≤ 40%, and  > 40%. In contrast, the left heat map in the ﬁrst ro w shows that persistence consisten tly maintains less than 10% error for almost all v alues of  ; ho wev er, errors start to increase for  > 42%. The deterioration of the p erformance of the describ ed approac h is b etter observed from the heat map in Fig. 7 where higher relative errors start o ccurring tow ards the upper limit of the  v alues. In addition to the heat map that sho ws the relativ e error as a function of  and Ω 0 , Fig. 7 sho ws the plot of the av erage RPM v ersus  calculated using b oth p ersistence and F ourier for α = 10% at three nominal RPM v alues: 3178 (top righ t), 12136 (b ottom left), and 20126 (b ottom right). The nominal RPM is indicated on the ﬁgure using a solid, horizontal black line. The curves for the calculated p ersistence and F ourier rpms also includes the 68% conﬁdence band that can b e seen in the inset panels of the top right ﬁgure. These conﬁdence bands were computed using an empirical b o otstrap distribution of the sample mean. W e can see that for both algorithms the conﬁdence bands are narrow. F urther, it can b e seen that for  < 5% b oth algorithms are close to the nominal v alue for all three RPM choices. How ever, for  > 5%, the F ourier algorithm quic kly div erges a w ay from the nominal RPM while persistence remains close to the nominal v alue. As  is increased to 45%, we start to see p ersistence undershooting the nominal v alue whic h indicates that at high v alues of  , p ersistence undercoun ts the true peaks. Therefore, the top ro ws in Figs. 5 and 7 show that the persistence algorithm is robust to the v ariation in the spacing b etw een the peaks. 4.3 Run time comparison In terms of theoretical runtime, the algorithm for computing Ω P is simple, requiring only a sequence of pairwise subtractions and sortings. Since these take O ( n ) and O ( n log n ) resp ectiv ely , the theoretical worst case runtime for the p ersistence based algorithm is O ( n log n ). Meanwhile, the FFT algorithm computes the F ourier transform in O ( n log n ) as w ell, so the worst case analysis of the tw o methods is the same. Run time b enc hmarks for F ourier and p ersistence algorithms w ere measured using the p erformance counter from Python’s time pack age. The machine used for the sim ulations is a standard desktop running Ubun tu 16.04 with 32GB ram and a 3.6GHz In tel i7 pro cessor. A total of 200 runs for each algorithm was p erformed at each v alue of the nominal RPM. The nominal RPM is kept as a v ariable of runtime because it controls the length of the sim ulated pulse (see T able 1 ), and consequently , the runtime for each algorithm. The diﬀerent runs were then av eraged for eac h algorithm, and the results are plotted in Fig. 8 . The line in the ﬁgure describ es the av erage while the bands represent the 68% conﬁdence interv al obtained by b ootstrap sampling of the replicates. Notice the small y -axis scale in Fig. 8 indicating that b oth algorithms run relativ ely fast. Ho wev er, the ﬁgure shows that p ersistence runs sligh tly faster than F ourier, and that the runtime is appro ximately constant across the RPM range. In contrast, the runtime for F ourier v aries more than its p ersistence coun terpart for the diﬀerent v alues of the nominal RPM. 5 Exp erimen tal v eriﬁcation The exp erimen tal apparatus is shown in Fig. 9 . It consists of a T erhahertz T echnologies L T-880 Laser T achometer moun ted against the spindle of a Clausing-Gamet 33 Centimeters (13 inc h) engine lathe. The laser b eam is emitted on to a point along the circumference of the spindle. Half of the spindle is cov ered with white tap e, while the other half is cov ered with black tap e. When the emitted light b eam is reﬂected back to the tachometer’s receiving lens, a 5 v olt TTL pulse is registered. These pulses are captured by an NI USB- 6366 data acquisition b ox using Matlab. The data is collected using an analog channel to emulate a common practice in industry where sensory inputs in mac hining processes (including those of laser tachometers) are 13 Figure 7: T op left: A ﬁgure showing more details of the heat map that depicts the relative error in the p ersistence-based RPM as a function of the nominal RPM and  . T op righ t, b ottom left, and bottom right: The calculated RPM using p ersistence and F ourier algorithms for  = 25% and the RPM v alues 3178, 12136, and 20126, resp ectiv ely . The nominal RPM v alues are marked using horizontal blac k lines. The bands around the line represent the 68% conﬁdence bands whic h were computed using an empirical b ootstrap distribution of the sample mean. all collected using analog channels of the same data acquisition device. The signal to noise ratio for laser tac hometers is large and it is easy to reco ver a digital signal b y hard-thresholding. In this study , additive normal noise was remo ved b y setting the threshold at 2.5 v olts. Thus, any signal ab o v e the threshold was set to 1, while signals below the threshold w ere set to zero. The exp eriment was p erformed using several nominal spindle sp eeds, which are shown in T able 2 . Three trials were performed for each speed and data collection started after the initial spindle ramp up ended. The data was sampled at 80 kHz and the duration of the data collection was c hosen suc h that for each trial the recorded time series captured 30–32 true p eaks in the tac hometer’s pulse train; again, see T able 2 . The reason for choosing a sampling rate of 80 kHz is t wofold. First, it is now b ecoming standard to use digital signal pro cessing (DSP) for low/high/band-pass ﬁltering instead of in-line analog ﬁlters. A conserv ative rule of thum b in DSP is to ov ersample by a factor of 16, digitally ﬁlter the signal, and then do wnsample to the frequency range of in terest. Since many signals in cutting pro cesses are collected to study or preven t chatter, which can o ccur at frequencies of a few thousand hertz, we wan ted to mak e sure that our results apply to cases where by the time all the signals b eing collected are do wnsampled b y a factor of 16 we can recov er a noise-free signal within the range of 5 kHz; so 16 × 5 = 80 kHz. The second reason is that the frequency roll-oﬀ of the laser tac hometer we are using is 40 kHz. Therefore, using 80 kHz ensures that w e are utilizing the full dynamic range of the sensor. This eliminates the p ossibility of attributing the digital ringing w e are observing in the signal to insuﬃcien t sampling. Nevertheless, although w e are reporting the 14 Figure 8: Average runtime for all trials as a function of the nominal RPM, i.e., the signal length. Figure 9: The exp erimen tal apparatus consists of a laser tachometer mounted against a rotating spindle. The left picture shows the bac k view while the right picture sho ws the side view. Half of the spindle is ﬁtted with white tape whereas the other half is cov ered with blac k tape. 15 Figure 10: Comparison of Ω P , Ω F , and the nominal spindle speeds using experimental data. The inset panels shows that b oth Ω P and Ω F are close to each other and to the nominal sp eed ev en at the sections where they deviate the most. The ﬁgure shows that b oth Ω P and Ω F giv e a sp eed that is low er than its nominal v alue for Ω 0 > 800 RPM. This is attributed to the c huc k slip eﬀects at the higher range of speeds. 16 results when sampling at 80 kHz, w e tried other, lo wer ov ersampling ratios, e.g., by a factor of 4, and the observ ed behavior was the same. Figure 10 shows the results of calculating Ω P and Ω F using the collected signals. It can b e seen that F ourier and p ersistence results are almost indistinguishable. F urther, close examination of the the tw o inset panels in Fig. 10 show that the t wo curves can cross at several p oin ts and trade places. Nevertheless, they remain close and parallel for all practical reasons. A grid is sup erimp osed on the ﬁgure to show the exp ected p oin ts on a curv e that represen ts p erfect RPM calculation v ersus Ω 0 , the nominal RPM set on the mac hine. It can b e seen from the ﬁgure that both Ω P and Ω F matc h Ω 0 for Ω 0 < 800 RPM. Ho wev er, for Ω 0 ≥ 800, b oth Ω P and Ω F undersho ot Ω 0 . This was expected since the machine for these spindle sp eeds w as op erating at its high range and therefore, spindle slipping and inaccuracies in the Ω 0 w ere exp ected. The registered Ω P and Ω F , which are considered more accurate than the nominal v alue set on the machine, conﬁrm this exp ectation. Nominal RPM 30 40 54 72 98 130 175 235 320 425 570 770 1030 62 50 35 30 20 15 12 9 7 5 4 3 3 Duration of data record (seconds) T able 2: The nominal spindle speeds (top blo c k) and the corresp onding duration of the data collection for eac h trial (b ottom blo c k). Three records were collected at eac h nominal RPM. 6 Conclusions and Discussion This pap er describ ed a new approach for step detection using to ols from top ological data analysis, speciﬁcally , 0-dimensional persistent h omology . This viewp oin t allo ws us to provide guaran tees via Thm. 3.1 for coun ting pulses even in the presence of digital ringing and sto chasticit y in p eriod length. Sp eciﬁcally , the approach in volv es computing the p ersistence diagram for the set of times when the time series is ab o ve a threshold, and using the widest split in this diagram to determine how many true pulses were seen. The metho d is further extended to b e able to giv e an estimation of RPM. The describ ed approach was v eriﬁed using numerical and exp erimental studies. Sp eciﬁcally , simulated pulse trains with small duty cycles w ere generated using tw o indep enden t uniform noise comp onents: one that v aried the spacing b et ween the pulses  , and one that introduced spurious peaks near the rising and falling edges of the signal α . The α noise comp onen t simulates the digital ringing observed in the exp erimen tal data. The top row of Fig. 5 shows that while b oth F ourier and persistence methods are resilien t to  < 7%, p ersistence is b etter suited for higher  v alues. Sp eciﬁcally , for the F ourier-based approach the relative error progressiv ely grows to ab out 40% across four horizontal bands of  interv als while the relativ e e rror of the p ersistence-based approac h remains b elo w 10%. The b ottom ro w of Fig. 5 shows that the p ersistence based approach gives results with a relativ e error b elo w 7% for α < 30%. The F ourier-based approach gives errors as high as 24% for the same α range. F urther, the ﬁgure sho ws that while the p ersistence-based approach giv es a relative error b etw een 15% and 25% for α > 30%, the error in the F ourier-based approach is at least 40%. Bey ond 42%, the describ ed approac h can regain its accuracy b y detecting high logic b etw een v alleys as opp osed to detecting low logic b et w een peaks. A close examination of sample tra jectories of  versus Ω P and Ω F in Fig. 7 shows that both metho ds hav e tigh t error b ounds. How ev er, the p ersistence-based metho d remains close to Ω 0 while Ω F deviates rapidly from Ω 0 for  > 5%. Similarly , Fig. 6 sho ws that persistence remains close to Ω 0 with noticeable deviations from Ω 0 starting at α > 30%. The same ﬁgure sho ws that F ourier-based approach deviates from Ω 0 o ver the whole 0 . 02 < α < 0 . 5 range. Computationally , p ersistence and F ourier ha ve the same theoretical runtime as sho wn in Section 4.3 . This is conﬁrmed b y the b enc hmark shown in Fig. 8 where it is seen that the av erage run time of b oth algorithms is small although p ersistence is consisten tly smaller. F urther, the ﬁgure shows larger v ariations 17 Figure 11: An example of a higher dimensional analogue of the giv en metho d for pulse counting. The image is a pro duct of tw o realizations of Eqn. 1 and the histogram shows the resulting 0-dimensional p ersistence diagram. The fact that there is a distinct split b et w een the p oin ts in the diagram means that we can still use the high-persistence points to count the num b er of clusters seen at left. in the runtime of F ourier in comparison to persistence. The 68% conﬁdence band around the tw o curv es is extremely small, whic h indicates high conﬁdence in the mean as an estimator of the true run time. The experimental inv estigations depicted in Fig. 9 further established the reliability of the describ ed approac h. Speciﬁcally , Fig. 10 shows that b oth p ersistence-based and F ourier-based calculated RPMs are v ery close. F or the sp eciﬁc lathe used in the exp erimen t, b oth metho ds gav e spindle sp eeds close to the nominal v alue for Ω 0 < 800 RPM. How ev er, for Ω 0 > 800 RPM, b oth Ω P and Ω F yielded RPM v alues smaller than the nominal. This discrepancy at higher sp eeds is attributed to op erating the machine at its high range where slipping in the spindle drive can cause the actual RPM to b e smaller than the set RPM. One cav eat w e mention here is that it is imp ortant to pick a span of time that contains a reasonable n umber of true p eaks. If the pick ed time span is to o long, then the resulting distribution of the gaps (or the p eaks if the algorithm is switc hed around) may ﬁll up the histogram of the p oin ts in the persistence diagram (see Fig. 2 ) whic h can make distinguishing the true peaks diﬃcult or imp ossible. 6.1 Higher dimensional extensions In essence, w e ha ve taken a diﬃcult problem — clustering p oin ts in R d — to a rather simpliﬁed setting — d = 1. This restriction allows for very fast algorithms, particularly owing to the fact that one knows the minimal spanning tree in adv ance. Still, algorithms for computing 0-dimensional p ersistent homology for p oin ts in higher dimensions are still quite fast: an implementation using a mo diﬁed union ﬁnd algorithm runs in O ( nα ( n )) where n is the num b er of p oints and α is the notoriously slow growing inv erse Ack ermann function. One could, ho wev er, imagine higher dimensional signals (i.e., images) where there are suﬃcient guarantees on the pulse rate and noise inputs so as to create higher dimensional analogues of Thm. 3.1 . Consider, for example, the image of Fig. 11 generated by Y ( s, t ) = X 1 ( s ) · X 2 ( t ) for tw o realizations of Eqn. 1 . The 0-dimensional persistence diagram histogram is shown to the right, and, as with the previous data sho wn in this pap er, there is a clear distinction b et ween the p oin ts. Since there are 35 p oints abov e a threshold of 8 in the histogram, w e can conclude that there are 36 clusters in the image, which can b e conﬁrmed b y inspection. Th us, w e h yp othesize that this metho d is quite useful in higher dimensional applications, although suc h an application is still an op en question for future research. The data ﬁles, code, and exp erimental parameters can b e found at the gitlab rep ository https://gitlab.msu.edu/ TSAwithTDA/TDA- for- true- step- detection- in- PWC- signals . The authors ackno wledge the help of David Petrushenk o in setting up the lathe for the exp erimen ts. 18 References [1] Y. Sow a, A. D. Row e, M. C. Leake, T. Y akushi, M. Homma, A. Ishijima, and R. M. Berry , “Direct observ ation of steps in rotation of the bacterial ﬂagellar motor,” Natur e , v ol. 437, pp. 916–919, 2005. [2] M. A. Little and N. S. Jones, “Sparse Bay esian step-ﬁltering for high-throughput analysis of molecular mac hine dynamics,” in 2010 IEEE International Confer enc e on A c oustics, Sp e e ch and Signal Pr o c essing , pp. 4162–4165, Marc h 2010. [3] M. A. Little and N. S. Jones, “Generalized metho ds and solvers for noise remov al from piecewise constan t signals. I. Background theory ,” Pr o c e e dings of the R oyal So ciety of L ondon A: Mathematic al, Physic al and Engine ering Scienc es , v ol. 467, no. 2135, pp. 3088–3114, 2011. [4] M. A. Little and N. S. Jones, “Generalized metho ds and solvers for noise remov al from piecewise constan t signals. I I. New methods,” Pr o c e e dings of the R oyal So ciety of L ondon A: Mathematic al, Physic al and Engine ering Scienc es , v ol. 467, no. 2135, pp. 3115–3140, 2011. [5] J. A. Niro dy , Y.-R. Sun, and C.-J. Lo, “The biophysicist’s guide to the bacterial ﬂagellar motor,” A dvanc es in Physics: X , v ol. 2, no. 2, pp. 324–343, 2017. [6] A. Y oussef, D. Matthews, A. Guzzomi, and J. P an, “Measuremen t of pressure ﬂuctuations inside a mo del thrust b earing using PVDF sensors,” Sensors , v ol. 17, no. 4, p. 878, 2017. [7] Y. Jin, S. Ji, B. Liu, and L. Chamorro, “On the role of thic kness ratio and location of axis of rotation in the ﬂat plate motions,” Journal of Fluids and Structur es , v ol. 64, pp. 127 – 137, 2016. [8] A. Alb ers, F. Schille, and M. Behrendt, “Metho d for measuring and analyzing transient p o wertrain vibrations of h ybrid electric vehicles on an acoustic roller test b enc h,” in SAE T e chnic al Pap er , SAE In ternational, 06 2016. [9] M. Rahman, T. Saly ers, M. Ahmed, A. ElShahat, V. Soloiu, and E. Maroha, “Inv estigation of aero- dynamic p erformance of helical shape v ertical-axis wind turbine mo dels with v arious num b er of blades using wind tunnel testing and computational ﬂuid dynamics,” in ASME International Me chanic al En- gine ering Congr ess and Exp osition , vol. 7: Fluids Engineering, 2016. [10] G. Urbik ain, D. Olvera, L. L. de Lacalle, and A. El ´ ıas-Z ´ u ˜ niga, “Spindle sp eed v ariation technique in turning op erations: Mo deling and real implementation,” Journal of Sound and Vibr ation , vol. 383, pp. 384 – 396, 2016. [11] S. Amini, M. Aghaei, M. Lotﬁ, and E. Hakimi, “Analysis of linear vibration in rotary turning of AISI 4140 steel,” International Journal of A dvanc e d Manufacturing T e chnolo gy , 2017. [12] B. Das, S. Bag, and S. P al, “Defect detection in friction stir welding process through characterization of signals b y fractal dimension,” Manufacturing L etters , vol. 7, pp. 6 – 10, 2016. [13] M. W an, Y.-C. Ma, J. F eng, and W.-H. Zhang, “Mec hanics of tapping pro cess with emphasis on measure- men t of feed error and estimation of its induced inden tation forces,” International Journal of Machine T o ols and Manufactur e , vol. 114, pp. 8 – 20, 2017. [14] A. Honeycutt and T. L. Schmitz, “Surface location error and surface roughness for p erio d-n milling bifurcations,” Journal of Manufacturing Scienc e and Engine ering , v ol. 139, pp. 061010 –061018, 2017. [15] R. Ghrist, Elementary Applie d T op olo gy . 2014. [16] H. Edelsbrunner and J. Harer, Computational T op olo gy: A n Intr o duction . American Mathematical So ciet y , 2010. [17] T. Kaczynski, K. Mischaik ow, and M. Mrozek, Computational Homolo gy . Springer, 2004. [18] A. Zomoro dian, T op olo gy for Computing , vol. 16 of Cambridge Mono gr aphs on Applie d and Computa- tional Mathematics . Cambridge Universit y Press, 2005. 19 [19] R. Ghrist, “Barco des: The p ersistent top ology of data,” Buil letin of the Americ an Mathematic al So ciety , v ol. 45, pp. 61–75, 2008. [20] G. Carlsson, “T op ology and data,” Bul letin of the A meric an Mathematic al So ciety , vol. 46, pp. 255–308, Jan. 2009. [21] V. Robins, J. D. Meiss, and E. Bradley , “Computing connectedness: An exercise in computational top ology ,” Nonline arity , v ol. 11, no. 4, p. 913, 1998. [22] V. Robins, J. Meiss, and E. Bradley , “Computing connectedness: disconnectedness and discretenes s,” Physic a D: Nonline ar Phenomena , vol. 139, no. 3, pp. 276 – 300, 2000. [23] V. Robins, N. Rooney , and E. Bradley , “T op ology-based signal separation,” Chaos: An Inter disciplinary Journal of Nonline ar Scienc e , vol. 14, no. 2, pp. 305–316, 2004. [24] M. Robinson, T op olo gic al Signal Pr o c essing . Springer, 2014. [25] Edelsbrunner, Letsc her, and Zomoro dian, “T op ological p ersistence and simpliﬁcation,” Discr ete & Com- putational Ge ometry , v ol. 28, no. 4, pp. 511–533, 2002. [26] A. Zomorodian and G. Carlsson, “Computing p ersisten t homology ,” Discr ete & Computational Ge om- etry , vol. 33, pp. 249–274, Nov. 2004. [27] J. Garland, E. Bradley , and J. D. Meiss, “Exploring the topology of dynamical reconstructions,” Physic a D: Nonline ar Phenomena , v ol. 334, pp. 49–59, nov 2016. [28] R. MacPherson and B. Sch w einhart, “Measuring shap e with top ology ,” Journal of Mathematic al Physics , v ol. 53, no. 7, 2012. [29] Z. Alexander, E. Bradley , J. D. Meiss, and N. F. Sanderson, “Simplicial multiv alued maps and the witness complex for dynamical analysis of time series,” SIAM Journal on Applie d Dynamic al Systems , v ol. 14, pp. 1278–1307, jan 2015. [30] S. Maleti´ c, Y. Zhao, and M. Ra jk ovi ´ c, “P ersistent topological features of dynamical systems,” Chaos: A n Inter disciplinary Journal of Nonline ar Scienc e , v ol. 26, p. 053105, may 2016. [31] C. M. Pereira and R. F. de Mello, “Persisten t homology for time series and spatial data clustering,” Exp ert Systems with Applic ations , no. 0, 2015. [32] J. A. Perea, “Persisten t homology of toroidal sliding window embeddings,” in 2016 IEEE International Confer enc e on A c oustics, Sp e e ch and Signal Pr o c essing (ICASSP) , IEEE, mar 2016. [33] J. A. P erea and J. Harer, “Sliding windo ws and persistence: An application of top ological methods to signal analysis,” F oundations of Computational Mathematics , pp. 1–40, 2015. [34] C. J. T ralie and J. A. P erea, “(Quasi)Periodicity quantiﬁcation in video data, using topology ,” [35] M. Robinson, “Universal factorizations of quasip erio dic functions,” in 2015 International Confer enc e on Sampling The ory and Applic ations (SampT A) , pp. 588–592, May 2015. [36] V. de Silv a, P . Skraba, and M. V ejdemo-Johansson, “T op ological analysis of recurrent systems,” in NIPS 2012 Workshop on A lgebr aic T op olo gy and Machine L e arning , 2012. [37] S. Emrani, T. Gentimis, and H. Krim, “Persisten t homology of delay embeddings and its application to wheeze detection,” Signal Pr o c essing L etters, IEEE , v ol. 21, pp. 459–463, April 2014. [38] Z. Alexander, J. D. Meiss, E. Bradley , and J. Garland, “Iterated function system mo dels in data analysis: Detection and separation,” Chaos: An Inter disciplinary Journal of Nonline ar Scienc e , vol. 22, no. 2, 2012. [39] M. Gidea and Y. Katz, “T op ological data analysis of ﬁnancial time series: Landscap es of crashes,” 2017. 20 [40] M. Gidea, T op olo gic al Data Analysis of Critic al T r ansitions in Financial Networks , pp. 47–59. Cham: Springer International Publishing, 2017. [41] M. Robinson, “Multipath-dominant, pulsed doppler analysis of rotating blades,” IET R adar Sonar and Navigation , vol. 7, pp. 217–224, Mar. 2013. [42] M. V ejdemo-Johansson, F. Pok orn y , P . Skraba, and D. Kragic, “Cohomological learning of p erio dic motion,” Applic able Algebr a in Engine ering, Communic ation and Computing , pp. 1–22, 2015. [43] J. J. Berwald, M. Gidea, and M. V ejdemo-Johansson, “Automatic recognition and tagging of topologi- cally diﬀeren t regimes in dynamical systems,” Disc ontinuity, Nonline arity, and Complexity , v ol. 3, no. 4, pp. 413–426, 2014. [44] F. A. Khasawneh and E. Munch, “Chatter detection in turning using p ersisten t homology ,” Me chanic al Systems and Signal Pr o c essing , vol. 70-71, pp. 527–541, 2016. [45] F. A. Khasa wneh and E. Munch, “Exploring equilibria in sto c hastic delay diﬀeren tial equations using p ersisten t homology ,” in Pr o c e e dings of the ASME 2014 International Design Engine ering T e chnic al Confer enc es & Computers and Information in Engine ering Confer enc e, August 17-20 , 2014, Buﬀalo, NY, USA , 2014. P ap er no. DETC2014/VIB-35655. [46] F. A. Khasawneh and E. Munc h, “Stabilit y determination in turning using p ersisten t homology and time series analysis,” in Pr o c e e dings of the ASME 2014 International Me chanic al Engine ering Congr ess & Exp osition, Novemb er 14-20, 2014, Montr e al, Canada , 2014. Paper no. IMECE2014-40221. [47] F. A. Khasa wneh and E. Munc h, Utilizing T op olo gic al Data A nalysis for Studying Signals of Time-Delay Systems , pp. 93–106. Cham: Springer International Publishing, 2017. [48] A. Deck ard, R. C. Anaﬁ, J. B. Hogenesch, S. B. Haase, and J. Harer, “Design and analysis of large-scale biological rhythm studies: a comparison of algorithms for detecting p eriodic signals in biological data,” Bioinformatics , vol. 29, no. 24, pp. 3174–3180, 2013. [49] J. A. Perea, A. Dec k ard, S. B. Haase, and J. Harer, “SW1PerS: Sliding windows and 1-p ersistence scoring; disco vering p erio dicit y in gene expression time series data,” BMC Bioinformatics , vol. 16, Aug 2015. [50] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, “Stability of persistence diagrams,” Discr ete Comput. Ge om. , v ol. 37, pp. 103–120, Jan. 2007. [51] D. Cohen-Steiner, H. Edelsbrunner, J. Harer, and Y. Mileyko, “Lipsc hitz functions ha ve l p -stable per- sistence,” F ound. Comput. Math. , vol. 10, pp. 127–139, F eb. 2010. [52] N. Otter, M. A. Porter, U. Tillmann, P . Grindrod, and H. A. Harrington, “A roadmap for the compu- tation of persistent homology ,” EPJ Data Scienc e , v ol. 6, aug 2017. [53] M. St ´ ephane, “Chapter 2 - the fourier kingdom,” in A Wavelet T our of Signal Pr o c essing (Thir d Edition) (M. St´ ephane, ed.), pp. 33 – 57, Boston: Academic Press, third edition ed., 2009. [54] A. Hatc her, Algebr aic T op olo gy . Cam bridge Universit y Press, 2002. [55] J. Munkres, T op olo gy (2nd Edition) . P earson, 2000. [56] F. Murtagh and P . Con treras, “Algorithms for hierarchical clustering: an o verview,” Wiley Inter disci- plinary R eviews: Data Mining and Know le dge Disc overy , vol. 2, pp. 86–97, dec 2011. [57] B. Sexton, J.S.and Stone, “The stability of machining with con tinuously v arying spindle sp eed,” Annals of the CIRP , vol. 27, pp. 321–326, 1978. [58] A. Yilmaz, E. AL-Regib, and J. Ni, “Machine to ol chatter suppression by multi-lev el random spindle sp eed v ariation,” Journal of Manufacturing Scienc e and Engine ering , vol. 124, no. 2, pp. 208–216, 2002. 21 [59] S. Lin, R. DeV or, and S. Kap o or, “The eﬀects of v ariable sp eed cutting on vibration control in face milling,” Journal of Engine ering for Industry , vol. 112, pp. 1–11, 1990. [60] R. Radulescu, S. G. Kapo or, and R. E. DeV or, “An in vestigation of v ariable spindle speed face milling for to ol-w ork structures with complex dynamics, part 1: Simulation results,” Journal of Manufacturing Scienc e and Engine ering , v ol. 119, no. 3, pp. 266–272, 1997. 22

Topological Data Analysis for True Step Detection in Piecewise Constant Signals

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment