Bearing fault diagnosis based on domain adaptation using transferable features under different working conditions

Zhe Tong 1, Wei Li 1, Bo Zhang 2,*, Meng Zhang 1

1 School of Mechanical Engineering, China University of Mining and Technology, Xuzhou, 221116, P.R. China.
2 School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, P.R. China.
Email: zbcumt@163.com

Abstract

Bearing failure is the most common failure mode in rotating machinery and can result in large financial losses or even casualties. However, the complex structures around a bearing and the variable working conditions met in practice can lead to a large distribution difference in vibration signals between a training set and a test set, which causes the accuracy of fault diagnosis to drop. Thus, efficiently improving the performance of bearing fault diagnosis under different working conditions remains a primary challenge. In this paper, a novel method for bearing fault diagnosis under different working conditions is proposed based on domain adaptation using transferable features (DATF). The datasets of normal and faulty bearings are obtained through the fast Fourier transform (FFT) of raw vibration signals under different motor speeds and load conditions. We then reduce the marginal and conditional distribution differences across domains simultaneously, based on the maximum mean discrepancy (MMD) in feature space, by refining pseudo test labels, which are obtained from a nearest-neighbor (NN) classifier built on the training data. A robust transferable feature representation for the training and test domains is achieved after several iterations. Finally, with the help of the NN classifier trained on the transferable features, bearing fault categories are identified accurately.
Extensive experimental results show that the proposed method identifies bearing faults accurately under different working conditions and clearly outperforms competitive approaches.

Keywords: Fault diagnosis; Vibration signal; Domain adaptation; Transferable features

1 Introduction

Bearings are among the most critical and widely used components in rotating machinery. Their health conditions, for example the fault degree at different locations under different motor speeds and loads, may have a huge effect on the performance, reliability and residual life of the equipment [1], or can even lead to heavy casualties [2–4]. Hence, it is important to diagnose bearings under different working conditions. Cracks or spalls on the surfaces of the roller, outer race or inner race are common failure modes in bearings [5]. The vibration signal is the most intuitive description of the operating state of a bearing. With vibration signals under different conditions collected by sensors [6], many intelligent fault diagnosis methods have achieved significant success. In [7], a genetic algorithm-based SVM (GA-SVM) model was presented, which achieved high accuracy and generalization ability by optimizing the parameters of the SVM. N. Saravanan et al. [8] proposed a fault diagnosis method based on DWT and ANN, and showed that such an approach has the potential to diagnose various faults of a gearbox.

Common intelligent fault diagnosis technologies rest on two key points, namely feature extraction and classification. The raw vibration signal collected by sensors abounds in redundant information, so obtaining effective features is important for fault diagnosis [9]. Many signal processing approaches are applied to feature extraction from vibration signals, such as time-domain statistical analysis, frequency-domain analysis [10] and time-frequency domain analysis [2].
Dimension reduction is then conducted for the sake of computational efficiency, using methods such as principal component analysis (PCA) [11], locally linear embedding (LLE) [12] and linear discriminant analysis (LDA) [13]. Finally, with the help of a suitable classifier, such as nearest-neighbor (NN), support vector machine (SVM) or artificial neural networks (ANN), the features acquired from the above process are used for defect classification.

In fact, most intelligent fault diagnosis methods work well only under a general assumption: the training and test data are drawn from the same distribution. However, in the operation of rotating machinery, because of complicated working conditions and complex sensor signals, the distribution of fault data is not consistent. Vibration signals sampled under different working conditions violate the above assumption and show large distribution differences between domains [9, 14], which leads to a dramatic drop in performance. More specifically, taking the roller bearing fault diagnosis problem as an example, a classifier is trained on data sampled under a certain motor speed and load, whereas the actual application is to recognize test data collected under another motor speed and load. Although the fault diameters and categories are unchanged, the distribution difference between the training data (training domain) and the test data (test domain) changes as the working condition varies. As a direct result, the classifier can achieve high accuracy on the training domain while performing poorly on the test domain [14]. This is caused by the distribution differences between the two domains, since features extracted from one domain cannot represent the other. Of course, we can spend a lot of time and effort recollecting data to build a new classifier for effective fault diagnosis on the test domain.
However, we cannot always replace the classifier by repeatedly recollecting data. Worse, it is often expensive or even impossible to rebuild the fault diagnosis model from scratch using newly recollected training data for the actual task. Therefore, there is still plenty of room for improvement. In order to avoid such recalibration effort, we might want to refine a fault diagnosis model trained in one condition (training domain) for a new working condition (test domain), or to refine the model trained on one rolling bearing (training domain) for a new rolling bearing (test domain). This leads to the research of domain adaptation (DA) [15, 16]. DA can be considered a particular setting of transfer learning [17, 18], which aims to leverage the knowledge learnt from a training domain in a different but related test domain by reducing distribution differences [18, 19]. In the field of DA, maximum mean discrepancy (MMD) [20–22] can be applied to evaluate distribution divergences.

In this paper, considering actual fault diagnosis applications, we propose a novel method for bearing fault diagnosis under different working conditions based on domain adaptation using transferable features (DATF). The datasets of normal and faulty bearings are obtained through the fast Fourier transform (FFT) of raw vibration signals under different motor speeds and load conditions. A fault diagnosis model is built using a nearest-neighbor (NN) classifier in the training domain. We then use the pseudo outputs of the NN classifier in the test domain to refine this model by iteratively reducing the distribution difference between domains, so that a transferable feature representation can be learnt from the training and test domains. Finally, an NN classifier is built with the extracted transferable features and bearing faults are identified accurately.

The rest of this paper is organized as follows.
Section 2 sketches previous works and preliminaries, including domain adaptation and maximum mean discrepancy. Section 3 introduces fault diagnosis using transferable features, including feature space generation and transferable feature extraction and diagnosis. Section 4 presents the experimental evaluations. The conclusions are given in Section 5.

2 Previous works and preliminaries

2.1 Domain adaptation

DA, as one branch of transfer learning, aims at making full use of information from both the training domain and the test domain during the learning process in order to adapt automatically [18, 19, 23]. Generally, a domain is considered to consist of a feature space of inputs $\mathcal{X}$ and a probability distribution of inputs $P(X)$, where $X = \{x_1, \cdots, x_n\} \in \mathcal{X}$ is a series of learning samples. Note that the distributions of two domains differ when the source domain and target domain are different, that is, $X_S \neq X_T$ and $P(X_S) \neq P(X_T)$ [20, 24].

In our work, the objective of domain adaptation is to extract transferable features between two domains in order to realize bearing fault diagnosis under different working conditions. We denote the labeled training domain by $X_{tr} = \{(x^{tr}_1, y^{tr}_1), \ldots, (x^{tr}_{n_1}, y^{tr}_{n_1})\}$, where $x^{tr}_i \in \mathcal{X}$ is the input and $y^{tr}_i \in \mathcal{Y}$ is the related class label. Similarly, let the unlabeled test domain be $X_{te} = \{x^{te}_1, \ldots, x^{te}_{n_2}\}$, where the input $x^{te}_i \in \mathcal{X}$. In terms of distributions, let $P(X_{tr})$ and $Q(X_{te})$ be the marginal distributions of $X_{tr} = \{x^{tr}_i\}$ and $X_{te} = \{x^{te}_i\}$ from the training and test domains, respectively. Similarly, let $P(Y_{tr}|X_{tr})$ and $Q(Y_{te}|X_{te})$ be the conditional distributions from the training domain and test domain, respectively [20, 25, 26].
In this paper, we focus on the following settings: 1) one training domain and one test domain share the same fault types and feature space; 2) domain adaptation is unsupervised, that is, the training domain $X_{tr}$ is labeled while the test domain $X_{te}$ is fully unlabeled; 3) the marginal distributions differ, $P(X_{tr}) \neq Q(X_{te})$, and the conditional distributions differ, $P(Y_{tr}|X_{tr}) \neq Q(Y_{te}|X_{te})$. These settings are well suited to real-world fault diagnosis under variable working conditions. Our task is to predict the fault types of bearings accurately in the unlabeled test domain, which has an entirely different distribution, by using the model built in the training domain.

2.2 Maximum mean discrepancy

The typical procedure of domain adaptation is to reduce the marginal distribution difference across domains. In our work, domain adaptation reduces both the marginal and conditional distribution differences simultaneously by explicitly minimizing an empirical distance measure, which is more suitable for bearing fault diagnosis under different working conditions. In order to avoid the expensive distribution calculation required by parametric criteria, a nonparametric distance metric, known as MMD, is employed for domain adaptation in our work. Taking data from the source domain $X_S$ and the target domain $X_T$, the MMD computes the empirical estimate of the distance across domains in the $k$-dimensional embedding [20, 24]:

$$D_m(X_S, X_T) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} A^T x_i - \frac{1}{n_t} \sum_{j=n_s+1}^{n_s+n_t} A^T x_j \right\|^2 \tag{1}$$

where $D_m$ is the distance between the marginal distributions across domains, $A$ is the adaptation matrix, and $n_s$ and $n_t$ denote the number of source instances and target instances, respectively.

3 Fault diagnosis using transferable features

As mentioned in Section 1, a large distribution difference between the training domain and the test domain under different working conditions directly leads to poor performance of bearing fault diagnosis.
In order to solve this problem, we need to learn the shift between the two domains and extract more robust transferable features for both. In this section, we present our novel bearing fault diagnosis method under variable working conditions. The framework of our method is illustrated in Figure 1. As shown in Figure 1, the fault diagnosis model built on labeled training data is revised iteratively according to the pseudo labels, and the final diagnostic results are obtained through the revised model. Details of each part are elaborated in the following subsections.

[Figure 1 The framework of DATF for variable working condition fault diagnosis. Both the labeled training data (training domain) and the unlabeled test data (test domain) pass through data acquisition, raw vibration signals, FFT to the frequency domain, and PCA. The trained model predicts pseudo labels for the test data, and the model is revised over iterations #1 to #T (transferable feature extraction). Predicted classes: normal condition (NO), inner race fault (IF), ball fault (BF) and outer race fault (OF).]

3.1 Feature space generation

Raw time series vibration signals are readily available and abound in bearing information. Owing to the rotating nature of a defective bearing, periodic impulses appear in the measured signals once a fault occurs. Thus, these fault impacts can generally be detected in the frequency domain. In our work, we directly take the FFT amplitudes of the raw time series vibration signals as samples, where all samples have the same dimension, and these samples are generated under different motor speeds and load conditions, as described in Figure 2.

Start: vibration signal x(n) sampled with fs Hz.

    N = 12000;                           % Length of x(n)
    NFFT = 2^nextpow2(N);                % Next power of 2 from N
    f = fs/2*linspace(0,1,NFFT/2+1);     % Frequency axis of the single-sided spectrum
    fft_amplitude = abs(fft(x,NFFT))/N;  % Fast Fourier transform of x(n)

The single-sided FFT spectrum amplitude is acquired through 2*fft_amplitude(1:NFFT/2+1) in MATLAB. End.

Figure 2 Flowchart of FFT spectrum amplitude creation in MATLAB

The samples are divided into two parts: labeled training data ($D_{tr}$) and unlabeled test data ($D_{te}$). We then use principal component analysis (PCA) to generate the feature space. The main steps of feature space generation are as follows:

• Step 1: Take the FFT amplitudes of the raw time series vibration signals collected under different working conditions as samples $D_{data}$.
• Step 2: Take one of the conditions, with its different fault types, from $D_{data}$ as training samples $X_{tr} \in R^{n_{tr} \times d}$ with labels $Y_{tr} \in R^{n_{tr} \times 1}$, and take another condition, with its different fault types, from $D_{data}$ as unlabeled test samples $X_{te} \in R^{n_{te} \times d}$.
• Step 3: Denote $X_D = \{X_{tr}, X_{te}\} \in R^{d \times (n_{tr}+n_{te})}$, with samples stored as columns, and $H = I - \frac{1}{n_{tr}+n_{te}} l l^T$, where $I$ denotes the identity matrix and $l$ is the vector of ones. The $k$-dimensional representation is then found by solving the optimization problem $\max_{A^T A = I} tr(A^T X_D H X_D^T A)$, and the feature space is created by $V = A^T X_D$.

3.2 Transferable feature extraction and diagnosis

In order to reduce the marginal distribution difference and extract robust features for the two domains, we use MMD as the distance measure between $x_i^{tr}$ and $x_j^{te}$ to compare the different distributions:

$$\left\| \frac{1}{n_{tr}} \sum_{i=1}^{n_{tr}} A^T x_i - \frac{1}{n_{te}} \sum_{j=n_{tr}+1}^{n_{tr}+n_{te}} A^T x_j \right\|^2 = tr(A^T X_D M_m X_D^T A) \tag{2}$$

where $M_m = \begin{bmatrix} (M_m)_{tr,tr} & (M_m)_{tr,te} \\ (M_m)_{te,tr} & (M_m)_{te,te} \end{bmatrix}$ is the MMD matrix, computed as follows [24, 26]:

$$(M_m)_{ij} = \begin{cases} \dfrac{1}{n_{tr} n_{tr}}, & x_i, x_j \in X_{tr} \\ \dfrac{1}{n_{te} n_{te}}, & x_i, x_j \in X_{te} \\ -\dfrac{1}{n_{tr} n_{te}}, & \text{otherwise} \end{cases} \tag{3}$$

The marginal distributions of the training domain and the test domain are brought closer under the new representation $V = A^T X_D$ by minimizing Eq. (2).
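The identity in Eq. (2) can be checked numerically. The following is a minimal sketch in plain Python, assuming the training samples come first in $X_D$ and taking $A$ as the identity so that the projected features equal the inputs. The helper names (`mmd_matrix`, `mean_gap_sq`, `trace_form`) and the toy data are illustrative and do not come from the paper.

```python
def mmd_matrix(n_tr, n_te):
    """Marginal MMD matrix M_m of Eq. (3), as a list of lists."""
    # e holds 1/n_tr for training entries and -1/n_te for test entries,
    # so the outer product e e^T reproduces the three cases of Eq. (3).
    e = [1.0 / n_tr] * n_tr + [-1.0 / n_te] * n_te
    return [[ei * ej for ej in e] for ei in e]


def mean_gap_sq(samples, n_tr):
    """Left side of Eq. (2) with A = I: squared distance between domain means."""
    d = len(samples[0])
    tr, te = samples[:n_tr], samples[n_tr:]
    mu_tr = [sum(x[k] for x in tr) / len(tr) for k in range(d)]
    mu_te = [sum(x[k] for x in te) / len(te) for k in range(d)]
    return sum((a - b) ** 2 for a, b in zip(mu_tr, mu_te))


def trace_form(samples, M):
    """Right side of Eq. (2) with A = I: tr(X M X^T) = sum_ij M_ij <x_i, x_j>."""
    n = len(samples)
    return sum(M[i][j] * sum(a * b for a, b in zip(samples[i], samples[j]))
               for i in range(n) for j in range(n))


# Toy data (hypothetical): 3 training and 2 test samples in 2 dimensions.
samples = [[0.0, 1.0], [1.0, 1.0], [2.0, 4.0], [4.0, 0.0], [6.0, 2.0]]
M_m = mmd_matrix(3, 2)
lhs = mean_gap_sq(samples, 3)
rhs = trace_form(samples, M_m)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

Writing $M_m$ as an outer product is a design choice of this sketch. It makes explicit that the trace form measures the gap between the two domain means.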
In theory, training and test data collected from sensors under different working conditions should follow the same marginal and conditional distributions, but the reality is very different. To improve the performance of bearing fault diagnosis under different working conditions, our work also reduces the difference in conditional distributions between domains by matching the class-conditional distributions. Formally, for each class $c$ the class-conditional distribution difference can be measured by a modified MMD:

$$\left\| \frac{1}{n^c_{tr}} \sum_{x_i \in X^c_{tr}} A^T x_i - \frac{1}{n^c_{te}} \sum_{x_j \in X^c_{te}} A^T x_j \right\|^2 = tr(A^T X_D M_c X_D^T A) \tag{4}$$

where $X^c_{tr}$ and $X^c_{te}$ are the training and test samples of class $c$, $n^c_{tr}$ and $n^c_{te}$ are their numbers, and $M_c = \begin{bmatrix} (M_c)_{tr,tr} & (M_c)_{tr,te} \\ (M_c)_{te,tr} & (M_c)_{te,te} \end{bmatrix}$ is the MMD coefficient matrix that involves the class label $c$. It can be calculated according to [24, 26]:

$$(M_c)_{ij} = \begin{cases} \dfrac{1}{n^c_{tr} n^c_{tr}}, & x_i, x_j \in X^c_{tr} \\ \dfrac{1}{n^c_{te} n^c_{te}}, & x_i, x_j \in X^c_{te} \\ -\dfrac{1}{n^c_{tr} n^c_{te}}, & x_i \in X^c_{tr}, x_j \in X^c_{te} \ \text{or} \ x_j \in X^c_{tr}, x_i \in X^c_{te} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

The conditional distributions of the training and test domains are brought closer under the new representation $V = A^T X_D$ by minimizing Eq. (4).

In order to obtain an effective and robust transferable feature representation and improve the quality of fault diagnosis, our work aims to reduce the discrepancies in both the marginal and conditional distributions between the training and test domains by using pseudo labels of the test data [26]. These pseudo labels are obtained by applying a base classifier (an NN classifier) built on the labeled training data to the fully unlabeled test data. Thus, the final optimization problem, Eq. (6), combines Eq. (2) and Eq. (4).
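The way pseudo labels feed into the class-conditional matrices can be sketched as follows. This is a plain-Python illustration under stated assumptions: a 1-nearest-neighbor rule with squared Euclidean distance stands in for the NN classifier, and classes missing from either domain contribute a zero matrix. The function names and toy data are hypothetical and not taken from the paper.

```python
def nearest_neighbor_labels(train_x, train_y, test_x):
    """Give each test sample the label of its closest training sample (1-NN)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    labels = []
    for x in test_x:
        best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
        labels.append(train_y[best])
    return labels


def conditional_mmd_matrix(train_y, pseudo_y, c):
    """Class-conditional MMD matrix M_c for class c, training samples first."""
    n_c_tr = sum(1 for y in train_y if y == c)
    n_c_te = sum(1 for y in pseudo_y if y == c)
    n = len(train_y) + len(pseudo_y)
    if n_c_tr == 0 or n_c_te == 0:
        # Assumption of this sketch: a class absent from one domain is skipped.
        return [[0.0] * n for _ in range(n)]
    # Nonzero only on class-c samples: 1/n_c_tr (training), -1/n_c_te (test).
    e = [1.0 / n_c_tr if y == c else 0.0 for y in train_y]
    e += [-1.0 / n_c_te if y == c else 0.0 for y in pseudo_y]
    return [[ei * ej for ej in e] for ei in e]


# Toy data (hypothetical): two classes, "NO" and "IF", in 2 dimensions.
train_x = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]]
train_y = ["NO", "NO", "IF", "IF"]
test_x = [[0.5, 0.4], [4.6, 5.2], [5.3, 5.1]]

pseudo_y = nearest_neighbor_labels(train_x, train_y, test_x)
M_no = conditional_mmd_matrix(train_y, pseudo_y, "NO")
M_if = conditional_mmd_matrix(train_y, pseudo_y, "IF")
print(pseudo_y)
```

Because the pseudo labels may be wrong at first, the matrices are rebuilt after every re-labeling pass in the iterative procedure.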
The resulting objective is

$$\min_{A^T X_D H X_D^T A = I} \; (1-\lambda) \sum_{c=0}^{C} tr(A^T X_D M_c X_D^T A) + \lambda \|A\|_F^2 \tag{6}$$

where the term $c = 0$ corresponds to the marginal MMD matrix, $M_0 = M_m$, and $c = 1, \ldots, C$ index the fault classes. $\|\cdot\|_F$ is the Frobenius norm, which guarantees that the optimization problem is well defined, and $\lambda$ is the regularization parameter [24] that trades off the impact of the regularization term on the transformation matrix $A$. The goal is to find the latent feature space created by a transformation matrix $A \in R^{d \times k}$ in which the discrepancies in both the marginal and conditional distributions between domains are significantly reduced. The Lagrange function for Eq. (6) is constructed as follows, where $\Lambda = diag(\Lambda_1, \cdots, \Lambda_k) \in R^{k \times k}$ is the Lagrange multiplier:

$$L = (1-\lambda)\, tr\left(A^T \left(X_D \sum_{c=0}^{C} M_c X_D^T\right) A\right) + \lambda\, tr(A^T A) + tr\left((I - A^T X_D H X_D^T A)\Lambda\right) \tag{7}$$

Setting $\frac{dL}{dA} = 0$, the optimal solution of Eq. (6) can be acquired through the generalized eigendecomposition

$$\left((1-\lambda) X_D \sum_{c=0}^{C} M_c X_D^T + \lambda I\right) A = X_D H X_D^T A \Lambda \tag{8}$$

Finally, the adaptation matrix $A$ is obtained by solving Eq. (8) for the eigenvectors corresponding to the $k$ smallest eigenvalues.

The procedure of fault diagnosis using DATF is as follows:

• Step 1: Take the given training data $X_{tr} \in R^{n_{tr} \times d}$ with labels $Y_{tr} \in R^{n_{tr} \times 1}$ and the unlabeled test data $X_{te} \in R^{n_{te} \times d}$ in the feature space.
• Step 2: Construct the MMD matrix $M_m$ by Eq. (3). The adaptation matrix $A$, formed by the eigenvectors of the $k$ smallest eigenvalues, is acquired by solving Eq. (8), which follows from the Lagrange multiplier method. The robust representation for the two domains is then obtained as $V = A^T X_D$.
• Step 3: Train the NN classifier on the projected training data $\{A^T X_{tr}, Y_{tr}\}$, and then obtain the pseudo test labels $Y_{te}$, which stand in for the conditional probability $Q(Y_{te}|X_{te})$, by using the trained NN classifier.
• Step 4: Update the MMD matrices $\{M_c\}_{c=1}^{C}$ by Eq. (5) according to $P(Y_{tr}|X_{tr}) = Q(Y_{te}|X_{te})$, and then obtain the updated adaptation matrix $A$ by solving Eq. (8). The updated robust representation for the two domains is obtained as $V = A^T X_D$; then return to Step 3 until the iterations end.
• Step 5: Finally, the test data labels $Y_{te}$ are predicted by the adaptive NN classifier.

4 Experimental evaluations

In order to demonstrate the effectiveness of the proposed fault diagnosis method, extensive bearing vibration signals collected from a bearing test rig are used. The dataset is acquired from the bearing data center of Case Western Reserve University (CWRU) [27]. DATF is compared with a baseline approach and several successful methods:

a. Baseline: an NN classifier with no projection and no adaptation. That is, the original input is directly used for diagnosis.
b. NN NA: an NN classifier with no adaptation. Specifically, we use a new representation extracted from the original input by PCA without domain adaptation.
c. NN SA: an NN classifier with projection and domain adaptation using subspace alignment, which only reduces the marginal distribution difference [28].

Method a is a baseline without projection and domain adaptation techniques, which is widely used in the field of fault diagnosis. Method b is a classical method without domain adaptation, which has achieved success in many fault diagnosis applications. Method c is a novel and efficient approach in domain adaptation.

4.1 Experimental setup and dataset preparation

The test bed illustrated in Figure 3 consists of a driving motor, a 2 hp motor for loading, a torque sensor/encoder, a power meter, accelerometers and an electronic control unit [27, 29]. The test bearings are located on the motor shaft. By means of electro-discharge machining, inner-race faults (IF), outer-race faults (OF) and ball faults (BF) of different sizes (0.007 in, 0.014 in and 0.021 in) are introduced into the drive-end bearing of the motor [30]. The vibration signals are sampled by accelerometers attached to the rack with magnetic bases.
Figure 3 Bearing test rig of the Case Western Reserve University Data Center

The working conditions of rotating machinery are usually complex in the real world. In order to simulate the actual application and make the experimental results more persuasive, the dataset in our experiment, taken from the Drive End Bearing Fault Data and sampled at a frequency of 12 kHz, is obtained under different working conditions. The dataset includes three fault degrees (0.007 in, 0.014 in and 0.021 in). Each fault degree contains four bearing conditions: NO, IF, OF and BF. The vibration data of each fault type are collected under four working conditions, i.e., L0 = 0 hp/1797 rpm, L1 = 1 hp/1772 rpm, L2 = 2 hp/1750 rpm and L3 = 3 hp/1730 rpm. Each sample contains 2049 Fourier coefficients transformed from the raw vibration signals using FFT. Each domain in the dataset contains four fault types and each fault type contains 200 samples. Under our experimental setup, it is impossible to find the optimal $k$ and $\lambda$ via cross validation, since the labeled training data and unlabeled test data are sampled from different working conditions. Thus, an empirical search of the parameter space is used to find suitable parameter settings, and details are described in Section 4.3. Finally, $\lambda = 0.1$ and $k = 100$ are used in our work.

In order to verify the benefits of DATF, the comparison methods a to c are also carried out. In all experiments, the model is trained on labeled training data under one single load (training domain) to diagnose unlabeled test data under another load (test domain). In all, 48 different transfer tests are conducted, and a detailed description of the experimental setup is shown in Table 1.
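The count of 48 follows from pairing each of the four loads as training domain with each of the four loads as test domain, for each of the three fault sizes (4 × 4 × 3). A short Python sketch of that enumeration is given here as a consistency check. The variable names are illustrative, and the split into same-load and cross-load cases is an observation of this sketch rather than a figure quoted in the paper.

```python
from itertools import product

# Working conditions and fault sizes as listed in the text.
loads = {"L0": "0 hp/1797 rpm", "L1": "1 hp/1772 rpm",
         "L2": "2 hp/1750 rpm", "L3": "3 hp/1730 rpm"}
fault_sizes_in = [0.007, 0.014, 0.021]

# One transfer test per (fault size, training load, test load) triple.
tests = [(size, train, test)
         for size, test, train in product(fault_sizes_in, loads, loads)]

# Tests in which the training and test loads actually differ.
cross_load = [t for t in tests if t[1] != t[2]]

print(len(tests), len(cross_load))
```

Of the 48 tests, 36 involve two different loads; the remaining 12 train and test on the same load and serve as a reference point.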
Table 1 Description of the experimental setup
Task: diagnose unlabeled test samples in the test domain

# of test   Labeled training     Unlabeled test   Fault type        Fault size
            (training domain)    (test domain)
1           L0, L1, L2, L3       L0               NO, IF, BF, OF    0.007 in
2           L0, L1, L2, L3       L1               NO, IF, BF, OF    0.007 in
3           L0, L1, L2, L3       L2               NO, IF, BF, OF    0.007 in
4           L0, L1, L2, L3       L3               NO, IF, BF, OF    0.007 in
5           L0, L1, L2, L3       L0               NO, IF, BF, OF    0.014 in
6           L0, L1, L2, L3       L1               NO, IF, BF, OF    0.014 in
7           L0, L1, L2, L3       L2               NO, IF, BF, OF    0.014 in
8           L0, L1, L2, L3       L3               NO, IF, BF, OF    0.014 in
9           L0, L1, L2, L3       L0               NO, IF, BF, OF    0.021 in
10          L0, L1, L2, L3       L1               NO, IF, BF, OF    0.021 in
11          L0, L1, L2, L3       L2               NO, IF, BF, OF    0.021 in
12          L0, L1, L2, L3       L3               NO, IF, BF, OF    0.021 in

4.2 Diagnosis results of the proposed method

The diagnostic results for fault sizes of 0.007 in, 0.014 in and 0.021 in are shown in Figure 4, Figure 5 and Figure 6, respectively. The average classification accuracies of the four methods are shown in Figure 7.

[Figure 4 The results with fault size being 0.007in. Bar charts of accuracy (%) for Baseline, NN NA, NN SA and DATF; panels (a) to (d) show test domains L0, L1, L2 and L3, each with training domains L0 to L3.]

[Figure 5 The results with fault size being 0.014in. Same layout as Figure 4.]

[Figure 6 The results with fault size being 0.021in. Same layout as Figure 4.]

[Figure 7 The average classification accuracies. Bar chart of accuracy (%) against fault diameter (0.007 in, 0.014 in, 0.021 in) for Baseline, NN NA, NN SA and DATF.]

Each of Figures 4 to 6 is composed of four subfigures, and the test domains in every figure are ordered clockwise from the top left: L0, L1, L2 and L3. The left of the symbol "->" in each subfigure represents the training domain and the right represents the test domain.
For each set of bars in Figures 4, 5 and 6, the performance corresponds to transferring from a training domain to a test domain, which simulates fault diagnosis under different working conditions. The load and speed differ considerably between domains. For example, in Figure 4(a), the test domain is L0 (motor load 0 hp, speed 1797 rpm), and the training domains are L1 (motor load 1 hp, speed 1772 rpm), L2 (motor load 2 hp, speed 1750 rpm) and L3 (motor load 3 hp, speed 1730 rpm). In Figures 4, 5 and 6, the highest accuracy is always achieved when the training domain and the test domain are the same, which is to be expected theoretically. The performance of the baseline method and NN NA is clearly poor in several transfer cases. For example, in Figure 6(a), (b) and (c), the accuracies are only about 75% when we transfer L3 to L0, L1 and L2, respectively. In Figure 4 in particular, the accuracies of the baseline method and NN NA cannot reach 70% when we transfer L1 to L2. These results illustrate that traditional methods without domain adaptation cannot be applied reliably to fault diagnosis under variable working conditions.

NN SA performs better than the first two methods. In Figures 5 and 6, the accuracies of NN SA for bearing fault diagnosis under variable working conditions are very high. However, in Figure 4(c), the accuracy when transferring from L1 to L2 is only about 90%, and the accuracy is about 94% when we transfer L3 to L2. Similar phenomena also appear in Figure 4(a). These results indicate that NN SA also cannot be applied reliably to bearing fault diagnosis under complex and variable working conditions. Encouragingly, the proposed method outperforms or matches the other three methods in all cases, whatever the training domain and test domain are.
Note that DATF achieves 100% accuracy in all cases in Figures 4, 5 and 6. Even in Figure 4(c), when transferring from L1 to L2, DATF still achieves an accuracy of 100%, while the baseline method and NN NA reach only about 60% and NN SA only about 90%. Compared with the other three methods, the average classification accuracy of DATF (100%) is markedly improved. These results are all obtained from benchmark datasets of fault diagnosis research under relatively fair experimental conditions. From the above analysis, we can conclude that the proposed method is very promising for solving bearing fault diagnosis problems under different working conditions.

To further illustrate the influence of the extracted transferable features on the results, receiver operating characteristic (ROC) analysis is applied for evaluation [31]. An ROC curve is generated by plotting the true positive rate against the false positive rate as the threshold level is varied. In this paper, ROC curves are obtained from different models based on the NN classifier, which are built on different extracted features. We only report ROC results for the transfer test from L1 to L2 with a fault size of 0.007 in, shown in Figure 8; similar trends are observed on all other tests. Before the iteration begins (Figure 8(a)), the performance of the model built on the extracted features is unsatisfactory. After one iteration (Figure 8(b)), the performance of the model built on the extracted transferable features improves dramatically, and after 10 iterations (Figure 8(c)) it ultimately achieves perfect detection results.
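How an ROC curve and its area under the curve (AUC) arise from per-sample scores can be sketched in a few lines of plain Python. The paper does not state how continuous scores are derived from the NN classifier, so the scores below are hypothetical, and the one-class-against-the-rest labeling is an assumption of this sketch.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists as the decision threshold sweeps from high to low.

    labels are 1 for the class of interest (for example one fault type against
    the rest) and 0 otherwise; a higher score means "more likely positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        is_last = rank + 1 == len(order)
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
               for k in range(1, len(fpr)))


# Hypothetical scores for six test samples, three of which are positives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
fpr, tpr = roc_curve(scores, labels)
area = auc(fpr, tpr)
print(area)
```

An AUC of 1 means every positive sample is scored above every negative one, which is the sense in which the final iteration is described as perfect detection.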
[Figure 8 ROC curves of fault detection based on DATF (true positive rate against false positive rate for NO, IF, OF and BF). (a) Before the iteration begins: NO AUC = 1, IF AUC = 0.58, OF AUC = 0.54, BF AUC = 0.91. (b) Iteration 1 time: NO AUC = 1, IF AUC = 1, OF AUC = 0.93, BF AUC = 0.79. (c) Iteration 10 times: NO AUC = 1, IF AUC = 1, OF AUC = 1, BF AUC = 1.]

4.3 Parameter sensitivity

In this section, we investigate the influence of the regularization parameter $\lambda$ used during transferable feature extraction. Theoretically, larger values of $\lambda$ make the shrinkage regularization more important in our work. When $\lambda \to 0$ or $\lambda \to 1$, the optimization problem is ill-defined. Different values of $\lambda$ have different effects on classification accuracy, as reported in Figure 9. From Figure 9, it is clear that $\lambda$ has a great influence on the diagnostic results for fault sizes of 0.007 in and 0.021 in, and little overall effect on the results for a fault size of 0.014 in. Notably, the results are little affected by $\lambda$ when the training domain and test domain are the same, and $\lambda \in [0.05, 0.5]$ gives near-optimal parameter values, which indicates that the proposed method can achieve stable and excellent performance over a wide range of parameter values.
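One way to read the two limiting cases, offered here as an interpretation rather than as the authors' own argument, is to substitute them into the generalized eigenproblem of Eq. (8). Writing $M = \sum_{c=0}^{C} M_c$:

```latex
\lambda \to 0: \quad X_D M X_D^T A = X_D H X_D^T A \Lambda ,
\qquad
\lambda \to 1: \quad A = X_D H X_D^T A \Lambda .
```

In the standard construction, each $M_c$ is the outer product of a single vector with itself, so $M$ has rank at most $C + 1$. For $\lambda \to 0$ the left-hand matrix then has rank at most $C + 1$. If the feature dimension $d$ is much larger than $C + 1$, zero is a heavily repeated eigenvalue, and any $A$ whose columns lie in the corresponding null space gives a zero objective, so the eigenvectors of the $k$ smallest eigenvalues are not uniquely determined. For $\lambda \to 1$ the MMD terms vanish and the solution no longer depends on the distribution discrepancy at all.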
[Figure 9 Accuracy (%) for different values of λ (axis ticks 0.0001, 0.001, 0.01, 0.1 and 0.99) for all 16 training-to-test pairs from L0->L0 to L3->L3. (a) Effects on fault size being 0.007in. (b) Effects on fault size being 0.014in. (c) Effects on fault size being 0.021in.]

4.4 Empirical analysis of the domain discrepancy effect

In many actual fault diagnosis and classification scenarios, the distribution of the training data domain differs from that of the test data domain, which leads to a drop in diagnostic accuracy. In fact, the data distribution differences between domains (training data domain and test data domain) reflect differences in the data structures, which contain plenty of fault information. Extracting fault features from these data structures is a key point for fault diagnosis. In order to understand the effect of the distribution differences between the two domains more deeply and explain why the proposed method works, we use the t-SNE technique [32] to visualize the high-dimensional representations of the methods in our experiments in a two-dimensional map. Among all the cases mentioned above, the transfer test from L1 to L2 with a fault size of 0.007 in is taken as an example in Figure 10.
[Figure 10: Feature visualization via t-SNE [32] over a fault diagnosis task from training domain L1 (blue) to test domain L2 (red) under different working conditions, for the classes NO, IF, BF and OF. Panels: (a) Baseline method; (b) NN NA; (c) NN SA; (d) DATF.]

Figure 10 shows that the distribution discrepancy between the training and test domains is much smaller for the transferable features extracted via DATF than for the compared methods, and that the transferable features are much more separable than the others. These results verify that DATF finds a robust feature representation for the training and test domains, and that test samples can be discriminated well by the NN classifier built in the training domain on the extracted transferable features.

4.5 Discussion

The proposed method provides a domain adaptation approach to extracting robust fault features and classifying fault types under different working conditions. Several remarks are still needed.

(1) This work presents a new point of view that uses domain adaptation to realize bearing fault diagnosis under different working conditions. Li et al. [30] used spectrum images as features for bearing fault diagnosis, applying two-dimensional principal component analysis (2DPCA) to the dimension reduction and feature extraction of the spectrum images of vibration signals; most accuracies were very high. Unfortunately, several cases still had lower accuracies. To solve this problem, we apply domain adaptation to this field and extract transferable features for the training and test domains to classify fault types. As a result, all accuracies reach 100%.
In this paper, our work also considers more bearing conditions (fault size 0.007 in). Compared with the method of [30] in this situation, the advantages of our method are highlighted.

(2) The extensive results indicate that the proposed method is suitable for effectively classifying mechanical health conditions under different working conditions. In [9], Deep Convolutional Neural Networks with Wide First-layer Kernel (WDCNN) and AdaBN are applied to three datasets, each containing 10 kinds of health conditions (BF, IF and OF with fault sizes 0.007 in, 0.014 in and 0.021 in) under three load conditions (Load1, Load2 and Load3), respectively, which are similar to L1, L2 and L3 in this paper. The average accuracy of the method in [9] is 95.9%, whereas the average accuracy of DATF is 100%. The main reason is that the transferable features extracted through domain adaptation take full advantage of the structure information of the training and test domains, and the distributions of the transferable features extracted from the two domains are very close after applying our method, as shown in Figure 10.

(3) Our method is unsupervised and focuses on fault transfer diagnosis for the same fault diameter under different working conditions. In [14], a neural-network-based method that transfers parameters is proposed and successfully diagnoses two datasets containing 6 kinds of health conditions sampled from different fault diameters (BF, IF and OF with fault sizes 0.007 in and 0.021 in) under the same motor load and speed (L0); it focuses on fault diagnosis between two fault diameters under the same working conditions. In addition, unlike our method, it needs a small amount of labeled data in the test domain when training the modified neural networks, whereas our method does not need labeled test data during training.
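The unsupervised setting in remark (3) rests on the pseudo test labels described in the abstract: an NN classifier built on the training data labels the test samples, the features are adapted using those labels, and the labels are refined over several iterations. The following is a minimal sketch of that loop under a strong simplification: the MMD-based feature transformation is replaced by a per-class mean shift, which is an assumption made only to keep the example short and is not the transformation used by DATF. All function names and the 1-D toy data are hypothetical; the labels "NO" and "IF" reuse the class abbreviations from the figures.

```python
def nearest_label(x, train_x, train_y):
    """1-NN prediction: label of the closest training sample (squared Euclidean)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_x[i])))
    return train_y[best]


def class_means(xs, ys):
    """Mean feature vector of each class present in `ys`."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(g) for col in zip(*g)] for y, g in groups.items()}


def align_to_source(test_x, pseudo_y, source_means):
    """Simplified stand-in for adaptation: shift each pseudo-class of the
    test data so that its mean matches the corresponding source class mean."""
    test_means = class_means(test_x, pseudo_y)
    aligned = []
    for x, y in zip(test_x, pseudo_y):
        shift = [s - t for s, t in zip(source_means[y], test_means[y])]
        aligned.append([a + d for a, d in zip(x, shift)])
    return aligned


def refine_pseudo_labels(train_x, train_y, test_x, n_iter=10):
    """Alternate between 1-NN pseudo-labelling and class-wise alignment."""
    source_means = class_means(train_x, train_y)
    pseudo = [nearest_label(x, train_x, train_y) for x in test_x]  # no adaptation yet
    history = [pseudo]
    for _ in range(n_iter):
        aligned = align_to_source(test_x, pseudo, source_means)
        new_pseudo = [nearest_label(x, train_x, train_y) for x in aligned]
        history.append(new_pseudo)
        if new_pseudo == pseudo:  # labels stopped changing
            break
        pseudo = new_pseudo
    return history


def accuracy(pred, true):
    """Fraction of matching labels; `true` is used for scoring only."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


# Hypothetical 1-D toy data: the test domain is shifted and class-imbalanced.
train_x = [[0.0], [0.4], [0.8], [4.0], [4.4], [4.8]]
train_y = ["NO", "NO", "NO", "IF", "IF", "IF"]
test_x = [[2.2], [2.6], [6.0], [6.3], [6.6], [6.9]]
test_y = ["NO", "NO", "IF", "IF", "IF", "IF"]  # never seen by the loop

history = refine_pseudo_labels(train_x, train_y, test_x)
for step, labels in enumerate(history):
    print(step, labels, f"accuracy={accuracy(labels, test_y):.2f}")
```

On this toy data the first 1-NN pass mislabels one of the six test samples, and the loop corrects it after one alignment step. The test labels are used only to report accuracy, mirroring the claim that no labeled test data are needed during training.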
5 Conclusion

This paper presents a new approach to bearing fault diagnosis under different working conditions. Although the baseline approaches and several successful methods are all capable of detecting bearing defects, the distribution difference between datasets sampled under different working conditions has a large impact on these methods, and their shallow representations are not sensitive enough to distinguish different patterns under different working conditions. To tackle this problem, DATF extracts a transferable feature representation for the training and test domains by reducing the discrepancy between domains and strengthening the recognizable information in the raw vibration signal. To evaluate the proposed DATF method, bearing fault diagnosis experiments were carried out. Extensive experimental results show that DATF improves the performance of bearing fault diagnosis under different working conditions compared with peer methods.

Acknowledgements

This research is supported by the National Key R&D Program of China (2016YFC0802900), the National Natural Science Foundation of China (No. 51475455), and the Natural Science Foundation of Jiangsu Province (No. BK20160276).

References

[1] W. Jacobs, B. V. Hooreweder, R. Boonen, P. Sas, D. Moens, The influence of external dynamic loads on the lifetime of rolling element bearings: Experimental analysis of the lubricant film and surface wear, Mechanical Systems and Signal Processing 74 (1) (2016) 144–164.
[2] G. Li, G. L. McDonald, Q. Zhao, Sinusoidal synthesis based adaptive tracking for rotating machinery fault detection, Mechanical Systems and Signal Processing 83 (2017) 356–370.
[3] K. A. Loparo, M. L. Adams, W. Lin, M. F. Abdel-Magied, N. Afshari, Fault detection and diagnosis of rotating machinery, IEEE Transactions on Industrial Electronics 47 (5) (2000) 1005–1014.
[4] A. K. Jalan, A. Mohanty, Model based fault diagnosis of a rotor–bearing system for misalignment and unbalance under steady-state condition, Journal of Sound and Vibration 327 (3) (2009) 604–622.
[5] B. Qiao, X. Zhang, J. Gao, X. Chen, Impact-force sparse reconstruction from highly incomplete and inaccurate measurements, Journal of Sound and Vibration 376 (Supplement C) (2016) 72–94.
[6] Y. Li, M. Xu, Y. Wei, W. Huang, A new rolling bearing fault diagnosis method based on multiscale permutation entropy and improved support vector machine based binary tree, Measurement 77 (Supplement C) (2016) 80–94.
[7] J. Huang, X. Hu, F. Yang, Support vector machine with genetic algorithm for machinery fault diagnosis of high voltage circuit breaker, Measurement 44 (6) (2011) 1018–1027.
[8] N. Saravanan, K. Ramachandran, Incipient gear box fault diagnosis using discrete wavelet transform (DWT) for feature extraction and classification using artificial neural network (ANN), Expert Systems with Applications 37 (6) (2010) 4168–4181.
[9] W. Zhang, G. Peng, C. Li, Y. Chen, Z. Zhang, A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals, Sensors 17 (2).
[10] C. Joannin, B. Chouvion, F. Thouverez, J.-P. Ousty, M. Mbaye, A nonlinear component mode synthesis method for the computation of steady-state vibrations in non-conservative systems, Mechanical Systems and Signal Processing 83 (Supplement C) (2017) 75–92.
[11] M. Misra, H. Yue, S. Qin, C. Ling, Multivariate process monitoring and fault diagnosis by multi scale PCA, Computers and Chemical Engineering 26 (9) (2002) 1281–1293.
[12] S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.
[13] X. Jin, M. Zhao, T. W. S. Chow, M. Pecht, Motor bearing fault diagnosis using trace ratio linear discriminant analysis, IEEE Transactions on Industrial Electronics 61 (5) (2014) 2441–2451.
[14] R. Zhang, H. Tao, L. Wu, Y. Guan, Transfer learning with neural networks for bearing fault diagnosis in changing working conditions, IEEE Access 5 (2017) 14347–14357.
[15] H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function, Journal of Statistical Planning and Inference 90 (2) (2000) 227–244.
[16] D. Tuia, C. Persello, L. Bruzzone, Domain adaptation for the classification of remote sensing data: An overview of recent advances, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 41–57.
[17] S. J. Pan, I. W. Tsang, J. T. Kwok, Q. Yang, Domain adaptation via transfer component analysis, IEEE Transactions on Neural Networks 22 (2) (2011) 199–210.
[18] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (10) (2010) 1345–1359.
[19] M. Long, J. Wang, G. Ding, S. J. Pan, P. S. Yu, Adaptation regularization: A general framework for transfer learning, IEEE Transactions on Knowledge and Data Engineering 26 (5) (2014) 1076–1089.
[20] V. M. Patel, R. Gopalan, R. Li, R. Chellappa, Visual domain adaptation: A survey of recent advances, IEEE Signal Processing Magazine 32 (3) (2015) 53–69.
[21] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, T. Zhang, Deep model based domain adaptation for fault diagnosis, IEEE Transactions on Industrial Electronics 64 (3) (2017) 2296–2305.
[22] K. Nigam, A. K. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Mach. Learn. 39 (2-3) (2000) 103–134.
[23] R. Gopalan, R. Li, R. Chellappa, Domain adaptation for object recognition: An unsupervised approach, 2011 International Conference on Computer Vision (2011) 999–1006.
[24] J. Tahmoresnezhad, S. Hashemi, Visual domain adaptation via transfer feature learning, Knowledge and Information Systems 50 (2) (2017) 585–605.
[25] G. Csurka, Domain adaptation for visual applications: A comprehensive survey, CoRR abs/1702.05374.
[26] M. Long, J. Wang, G. Ding, J. Sun, P. S. Yu, Transfer feature learning with joint distribution adaptation, 2013 IEEE International Conference on Computer Vision (2013) 2200–2207.
[27] Case Western Reserve University bearings vibration dataset, available: http://csegroups.case.edu/bearingdatacenter/home (accessed October 2015).
[28] B. Fernando, A. Habrard, M. Sebban, T. Tuytelaars, Unsupervised visual domain adaptation using subspace alignment, 2013 IEEE International Conference on Computer Vision (2013) 2960–2967.
[29] H. Al-Bugharbee, I. Trendafilova, A fault diagnosis methodology for rolling element bearings based on advanced signal pretreatment and autoregressive modelling, Journal of Sound and Vibration 369 (2016) 246–265.
[30] W. Li, M. Qiu, Z. Zhu, B. Wu, G. Zhou, Bearing fault diagnosis based on spectrum images of vibration signals, Measurement Science and Technology 27 (3) (2016) 035005.
[31] S. Lee, P. White, Higher-order time-frequency analysis and its application to fault detection in rotating machinery, Mechanical Systems and Signal Processing 11 (4) (1997) 637–650.
[32] L. Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2605) (2008) 2579–2605.
