Detecting Unseen Falls from Wearable Devices using Channel-wise Ensemble of Autoencoders


Authors: Shehroz S. Khan, Babak Taati

Shehroz S. Khan (b,a,*), Babak Taati (a)
a Toronto Rehabilitation Institute, 550 University Ave, Toronto, ON, M5G 2A2, Canada
b University of Toronto, Canada

Abstract

A fall is an abnormal activity that occurs rarely, so it is hard to collect real data for falls. It is, therefore, difficult to use supervised learning methods to automatically detect falls. Another challenge in using machine learning methods to automatically detect falls is the choice of engineered features. In this paper, we propose to use an ensemble of autoencoders to extract features from different channels of wearable sensor data trained only on normal activities. We show that the traditional approach of choosing a threshold as the maximum of the reconstruction error on the training normal data is not the right way to identify unseen falls. We propose two methods for automatic tightening of the reconstruction error threshold, derived only from the normal activities, for better identification of unseen falls. We present our results on two activity recognition datasets and show the efficacy of our proposed method against traditional autoencoder models and two standard one-class classification methods.

Keywords: fall detection, one-class classification, autoencoders, anomaly detection

1. Introduction

Falls are a major cause of both fatal and non-fatal injury and a hindrance to living independently. Each year an estimated 424,000 individuals die from falls globally and 37.3 million falls require medical attention [23]. Experiencing a fall may lead to a fear of falling [6], which in turn can result in lack of mobility, less productivity and reduced quality of life. There exist several commercial wearable devices to detect falls [24]; most of them use accelerometers to capture motion information.
They normally come with an alarm button to manually contact a caregiver if a fall is not detected by the device. However, most of the devices for detecting falls produce many false alarms [3]. Automatic detection of falls has long been sought; hence, machine learning techniques are needed to automatically detect falls based on sensor data.

* Corresponding author. Tel.: +1 416-597-3422; Fax: +1 416-597-6201. Email addresses: shehroz.khan@utoronto.ca (Shehroz S. Khan), babak.taati@uhn.ca (Babak Taati).

Preprint submitted to Expert Systems with Applications, March 24, 2017.

However, a fall is a rare event that does not happen frequently [30, 12]; therefore, during the training phase, there may be very few or no fall samples. Standard supervised classification techniques may not be suitable in this type of skewed data scenario. Another issue regarding the use of machine learning methods in fall detection is the choice of features. Traditional activity recognition and fall detection methods extract a variety of domain-specific features from raw sensor readings to build classification models [26, 11]. It is very difficult to ascertain the number or types of features, especially in the absence of fall-specific training data, to build generalizable models. To handle the problems of lack of training data from real falls and the difficulty in engineering appropriate features, we explore the use of Autoencoders (AE) that are trained only on normal activities. AEs can learn generic features from the raw sensor readings and can be used to identify unseen falls as abnormal activities during testing, based on a threshold on the reconstruction error. We present two ensemble approaches of AEs that train on the raw data of the normal activities from different channels of an accelerometer and gyroscope separately; the results of each AE are combined to arrive at a final decision.
Typically, while using an AE, the maximum of the reconstruction error on the training set is considered as the threshold to identify an activity as abnormal. However, we experimentally show that such a threshold may not be appropriate for detecting falls due to noisy sensor data. We present two threshold tightening techniques that remove a few outliers from the normal data. Then, a new threshold is derived either using the inter-quartile range or by training a new AE on the training data with the outliers removed. We show results on two activity recognition datasets that contain different normal activities along with falls from wearable sensors.

The rest of the paper is organized as follows. In the next section, we present a brief introduction to autoencoders. Section 3 reviews the literature on fall detection using AEs and on the use of AEs in general outlier detection tasks. We present the proposed channel-wise ensemble of autoencoders and two threshold tightening approaches using the reconstruction error in Sections 4 and 5. Experimental analysis and results are discussed in Section 6, followed by conclusions and future work in Section 7.

2. Brief Introduction to Autoencoders

An AE is an unsupervised multi-layer neural network that learns a compact representation of the input data [29]. An AE tries to learn an identity function such that its outputs are similar to its inputs. However, by putting constraints on the network, such as limiting the number of hidden neurons, it can discover compact representations of the data that can be used as features for other supervised or unsupervised learning tasks. An AE is often trained using the backpropagation algorithm and consists of an encoder and a decoder part. If there is one hidden layer, an AE takes the input x ∈ R^d and maps it onto h ∈ R^p, s.t.

    h = f(Wx + b)    (1)

where W is a weight matrix, b is a bias term and f(.) is a mapping function.
This step is referred to as encoding or learning the latent representation, after which h is mapped back to reconstruct y of the same shape as x, i.e.

    y = g(W'h + b')    (2)

This step is referred to as decoding or reconstructing the input back from the latent representation. An AE can be trained to minimize the squared reconstruction error L, i.e.,

    L(x, y) = ||x − y||^2    (3)

An AE can learn compact and useful features if p < d; however, it can still discover interesting structure if p > d. This can be achieved by imposing a sparsity constraint on the hidden units, s.t. the neurons are inactive most of the time or the average activation of each hidden neuron is close to zero. To achieve sparsity, an additional sparsity term is added to the objective function. Multiple layers of AEs can be stacked on top of each other to learn hierarchical features from the raw data; the result is called a Stacked AE (SAE). During encoding of an SAE, the output of the first hidden layer serves as the input to the second layer, which learns second-level hierarchical features, and so on. For decoding, the output of the last hidden layer is reconstructed at the second-last hidden layer, and so on until the original input is reconstructed.

3. Related Work

AEs can be used in both supervised and unsupervised modes for identifying falls. In a supervised classification setting, an AE is used to learn representative features from both the normal and fall activities. This step can be followed by a standard machine learning classifier trained on these compressed features [17] or by a deep network [10]. In the unsupervised mode or One-Class Classification (OCC) [14] setting, only data for normal activities is present when training the AE. In these situations, an AE is used to learn representative features from the raw sensor data of normal activities.
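For illustration, Eqs. (1)-(3) can be sketched in NumPy as follows. This is a minimal sketch, not the trained model used in the experiments: the dimensions, random weights and sigmoid activations are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, W, b, W_prime, b_prime):
    """One encode/decode pass: h = f(Wx + b), y = g(W'h + b')."""
    h = sigmoid(W @ x + b)               # latent representation, Eq. (1)
    y = sigmoid(W_prime @ h + b_prime)   # reconstruction, Eq. (2)
    return h, y

def reconstruction_error(x, y):
    """Squared reconstruction error L(x, y) = ||x - y||^2, Eq. (3)."""
    return float(np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
d, p = 8, 3                              # input and hidden sizes (p < d)
x = rng.random(d)
W = rng.normal(scale=0.1, size=(p, d))
b = np.zeros(p)
W_prime = rng.normal(scale=0.1, size=(d, p))
b_prime = np.zeros(d)

h, y = autoencoder_forward(x, W, b, W_prime, b_prime)
err = reconstruction_error(x, y)
```

Training would adjust W, b, W' and b' by backpropagation to reduce this error over the normal-activity samples.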
This step is followed by either (i) a discriminative model using one-class classifiers or (ii) a generative model with an appropriate threshold based on the reconstruction error, to detect falls and normal activities. The present paper follows the unsupervised AE approach with a generative model and finds an appropriate threshold to identify unseen falls.

A lot of work has been done in evaluating the feasibility of learning generic representations through AEs for general activity recognition and fall detection tasks. Plötz et al. [25] explore the potential of discovering universal features for context-aware applications using wearable sensors. They present several feature learning approaches using PCA and AEs and show their superior performance in comparison to standard features across a range of activity recognition applications. Budiman et al. [1] use SAEs and marginalized SAEs to infer generic features, in conjunction with neural networks and Extreme Learning Machines as the supervised classifiers, to perform pose-based action recognition. Li et al. [16] compare SAE, Denoising AE and PCA for unsupervised feature learning in activity recognition using smartphone sensors. They show that traditional features perform worse than the generic features inferred through autoencoders. Jokanovic et al. [10] use an SAE to learn generic lower-dimensional features and use a softmax regression classifier to identify falls using radar signals. Other researchers [8, 31] have used AEs to reduce the dimensionality of domain-specific features prior to applying traditional supervised classification models or deep belief networks.

AEs have also been used extensively in anomaly detection. Japkowicz et al. [9] present the use of an AE for novelty detection. For noiseless data, they propose to use a reduced percentage of the maximum of the reconstruction error as a threshold to identify outliers.
For noisy data, they propose to identify both the intermediate positive and negative regions and subsequently optimize the threshold until a desired accuracy is achieved. Manevitz and Yousef [18] present an AE approach to filter documents and report better performance than traditional classifiers. They carry out a certain type of uniform transformation before training the network to improve performance. They discuss that choosing an appropriate threshold to identify normal documents is challenging and present several variants. The method that worked best in their application is to tighten the threshold sufficiently to disallow the classification of the highest 25-percentile error cases from the training set. Erfani et al. [4] present a hybrid approach that combines AEs and one-class SVM (OSVM) for anomaly detection in high-dimensional and large-scale applications. They first extract generic features using an SAE and train an OSVM with a linear kernel on the features learned by the SAE. They also use the SAE as a one-class classifier by setting the threshold to be 3 standard deviations away from the mean. Their results are comparable to those of an AE-based anomaly classifier, but the training and testing times are greatly reduced. Sakurada and Yairi [28] show the use of AEs in an anomaly detection task and compare them with PCA and Kernel PCA. They demonstrate that the AE can detect subtle anomalies that PCA could not, and is less complex than Kernel PCA.

Ensembles of AEs have been used to learn diverse feature representations, mainly in supervised settings. Ithapu et al. [7] present an ensemble of SAEs built by presenting it with randomized inputs and randomized sample sets of hyper-parameters from a given hyper-parameter space. They show that their approach relates more accurately to different stages of Alzheimer's disease and leads to efficient clinical trials with much smaller sample estimates.
Reeve and Gavin [27] present a modular AE approach that consists of M AE modules trained separately on different data representations; the combined result is defined by taking an average over all the modules. Their results on several benchmark datasets show improved performance in comparison to a baseline bootstrap version of the AE. Dong and Japkowicz [2] present supervised and unsupervised ensemble approaches for stream learning that use multi-layer neural networks and AEs. They train their models from multiple threads which evolve with the data streams; the ensemble of AEs is trained using only the data from the positive class and is accurate when anomalous training data are rare. Their method performs better than the state-of-the-art in terms of detection accuracy and training time on their datasets.

The research on using AEs shows that they can successfully learn generic features from raw sensor data for activity and fall recognition tasks. We observe that AEs can be effectively used for anomaly detection tasks and that their ensembles can perform better than a single AE. In this paper, the fall detection problem is formulated as OCC or anomaly detection, where abundant data for normal activities is available during training and none for falls. We investigate the utility of features learned through AEs and their ensembles for the task of fall detection.

4. Autoencoder Ensemble for Detecting Unseen Falls

In the absence of training data for falls, a fall can be detected by training an AE/SAE on only the normal activities to learn generic features from a wearable device. These features can be fed to standard OCC algorithms to detect whether a test sequence is a normal activity or not (a fall in our case).
Alternatively, based on the training data, a threshold can be set on the reconstruction error of the AE/SAE to identify a test sequence as an abnormal activity (a fall in our case) if its reconstruction error is higher than the given threshold. Intuitively, this would mean that the test sequence is very different from the training data comprising normal activities. Below, we discuss the two types of AE approaches used in the paper.

4.1. Monolithic Autoencoders

Figure 1 shows the AE/SAE for training on normal activities using raw sensor data from a three-channel accelerometer and gyroscope. The raw sensor readings coming from each of the channels of the accelerometer (a_x, a_y, a_z) and gyroscope (ω_x, ω_y, ω_z) are combined and presented as input to the AE/SAE. For a sliding window of a fixed length (n samples), a_x = [a_x^1, a_x^2, ..., a_x^n], a_y = [a_y^1, a_y^2, ..., a_y^n], and so on. The feature vector for a time window is constructed by concatenating these sensor readings as f = [a_x, a_y, a_z, ω_x, ω_y, ω_z]^T. We call this feature learning approach monolithic because it combines the raw sensor data from different channels into one input to an AE.

Figure 1: Monolithic AE for detecting unseen falls.

Figure 2: Channel-wise AE for detecting unseen falls. (a) Ensemble of 6 separate channels of accelerometer and gyroscope. (b) Ensemble of 2 channels of the magnitudes of the accelerometer and gyroscope.

4.2. Channel-wise Autoencoders

Li et al. [16] present the use of an ensemble of SAEs by extracting generic features for each of the three accelerometer channels and an additional channel for the magnitude of the accelerometer vector in 3 dimensions. They extract a fixed number of features for each of these 4 channels and concatenate them. Supervised classification methods are then used on these extracted features. This setting can work for supervised classification but not in the OCC scenario.
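The construction of the monolithic input f can be sketched as follows. The (n, 6) array layout and the random window content are illustrative assumptions; n = 128 matches the DLR window length used later in Section 6.3.

```python
import numpy as np

def monolithic_feature_vector(window):
    """window: (n, 6) array whose columns are [a_x, a_y, a_z, w_x, w_y, w_z].
    Concatenates the n samples of each channel in turn, giving the
    6n-dimensional input f = [a_x, a_y, a_z, w_x, w_y, w_z]^T."""
    return window.T.reshape(-1)

n = 128  # 1.28 s at 100 Hz (DLR)
window = np.random.default_rng(1).random((n, 6))
f = monolithic_feature_vector(window)
```

For the DLR dataset this yields the 768 (= 128 x 6) input-layer neurons reported in the experimental setup.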
In our case, we deal with an OCC scenario with only normal data available during training. Therefore, separate AEs are trained on the raw data from the different channels of the sensors. Each AE can detect a test sample as an unseen fall or not based on the reconstruction error, and their overall result is combined to take a final decision. We propose two types of channel-wise ensemble strategies for detecting unseen falls:

• Six Channel Ensemble (6CE): For each of the 6 channels of an accelerometer and a gyroscope (i.e., a_x, a_y, a_z, ω_x, ω_y, ω_z), 6 separate AE/SAEs are trained to learn a compact representation per channel. A decision threshold can be employed on each of these 6 AEs to decide whether a test sample is normal or a fall.

• Two Channel Ensemble (2CE): Alternatively, we can compute the magnitude of the 3 accelerometer channels and that of the 3 gyroscope channels. The magnitude vector gives direction-invariant information. We train two separate AE/SAEs to learn a compact representation for each of the two magnitude channels. Thresholding the reconstruction error on these two channels can be used to decide whether a test sample is a normal activity or a fall.

For a given test sample, the 6CE gives 6 different decisions and the 2CE gives 2 decisions. These decisions are combined by majority voting to arrive at a final decision; as a convention, ties are considered as falls. For simplicity, we keep the hyper-parameters for each AE/SAE corresponding to a channel the same. The ensemble approach can be faster than the monolithic approach because each per-channel AE/SAE uses less data than a single AE/SAE fed with the combined 6-channel data. Figure 2 shows the graphical representation of the 6CE and 2CE approaches.
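The majority-voting rule with ties counted as falls can be sketched as follows (the vote values are illustrative; each vote would come from thresholding one channel's reconstruction error):

```python
def ensemble_decision(channel_votes):
    """channel_votes: per-channel booleans, True meaning that channel's
    AE flags the sample as a fall (reconstruction error above its
    threshold). Majority vote; ties are counted as falls."""
    n_fall = sum(channel_votes)
    return 2 * n_fall >= len(channel_votes)

# 6CE example: a 3-3 split is a tie, so the sample is declared a fall
tie = ensemble_decision([True, True, True, False, False, False])
```

Note that in the 2CE a single dissenting channel already produces a tie, which is why the 2CE is more sensitive to falls (at the cost of more false alarms), as discussed in the results.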
5. Optimizing the Threshold on the Reconstruction Error

For the fall detection problem, we assume that fall data is rare and is not present during the training phase [13]. Therefore, we train monolithic and channel-wise AEs and SAEs on the raw sensor data to learn a compact representation of the normal activities. The next step is to identify a test sample as normal or a fall based on the trained AE/SAEs. The typical approach to identify a fall as an anomaly is to set a threshold on the reconstruction error. This threshold is generally set as the maximum of the reconstruction error on the full training data. We call this threshold MaxRE. During testing, any sample that has a reconstruction error greater than this value can be identified as a fall. However, sensor readings are not perfect and may contain spurious data [12], which can affect this threshold. Due to the presence of a few outliers in the training data, MaxRE is often too large, which can result in many of the falls being missed at testing time. To handle this situation, tightening of the threshold is often required (as discussed in Section 3). We use the approach of Erfani et al. [4] that sets the threshold 3 standard deviations away from the mean of the training data reconstruction error. We call this threshold method StdRE. The StdRE threshold can result in identifying more falls during testing in comparison to MaxRE, at the cost of a few false alarms, because the threshold in this case is smaller than MaxRE. A problem with StdRE is that it is chosen in an ad hoc manner and may not be an appropriate choice for a given dataset.

We now present two new approaches to tighten the threshold on the reconstruction error. These approaches derive the threshold from the training data such that it can better identify unseen falls. These methods are similar to finding an optimal operating point on an ROC curve by reducing false negatives at the cost of false alarms.
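The two baseline thresholds just described, MaxRE and StdRE, can be sketched as follows; the toy error vector with one spurious spike is an illustrative assumption:

```python
import numpy as np

def max_re(train_errors):
    """MaxRE: maximum reconstruction error on the full training set."""
    return float(np.max(train_errors))

def std_re(train_errors):
    """StdRE: mean plus 3 standard deviations of the training
    reconstruction errors (the rule of Erfani et al. [4])."""
    e = np.asarray(train_errors, dtype=float)
    return float(e.mean() + 3.0 * e.std())

# 99 well-reconstructed samples and one spurious reading
errors = np.concatenate([np.full(99, 0.1), [10.0]])
hi = max_re(errors)   # 10.0 -- inflated by the single outlier
lo = std_re(errors)   # ~3.15 -- tighter, so more unseen falls are caught
```

The example illustrates the text's point: a single spurious training sample drags MaxRE far above the bulk of the errors, while StdRE sits closer to the normal data.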
However, in an OCC framework, it is difficult to adopt a traditional ROC approach because of the unavailability of validation data for the negative class. The proposed methods overcome this difficulty by removing outliers from the training data prior to setting a threshold based only on the training data.

5.1. Reduced Reconstruction Error

As discussed earlier, the raw sensor data may not be perfect and may contain spurious or incorrectly labeled readings [12]. If an AE/SAE is trained on normal activities on such data, the reconstruction error for some of the samples of the training set may be very large. In this case, choosing the maximum of the reconstruction error as the threshold to identify falls may lead to accepting most of the falls as normal activities. Khan et al. [13] propose to use the concept of quartiles from descriptive statistics to remove the few outliers present in the normal activities class. We use a similar idea but adapt it to AEs to tighten the threshold on the reconstruction error. We first train an AE on normal activities, then find the reconstruction error of each training sample. Given the reconstruction error on the training data comprising only instances of normal activities, the lower quartile (Q1), the upper quartile (Q3) and the inter-quartile range (IQR = Q3 − Q1), a point P is qualified as an outlier of the normal class if

    P > Q3 + Ω × IQR  or  P < Q1 − Ω × IQR    (4)

where Ω is the rejection rate that represents the percentage of data points that are within the non-extreme limits. Based on Ω, the extreme values of reconstruction error that represent spurious training data can be removed and a threshold can be chosen as the maximum of the remaining reconstruction errors. We call this method Reduced Reconstruction Error (RRE). The value of Ω can be found experimentally or set to remove a small fraction of the normal activities data.
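The RRE rule of Eq. (4) can be sketched as follows; the toy error values are illustrative:

```python
import numpy as np

def rre_threshold(train_errors, omega):
    """RRE: remove reconstruction errors outside the fences
    [Q1 - omega*IQR, Q3 + omega*IQR] of Eq. (4), then return the
    maximum of the remaining (non-extreme) errors as the threshold."""
    e = np.asarray(train_errors, dtype=float)
    q1, q3 = np.percentile(e, [25, 75])
    iqr = q3 - q1
    keep = (e >= q1 - omega * iqr) & (e <= q3 + omega * iqr)
    return float(e[keep].max())

errors = [0.1, 0.12, 0.15, 0.2, 0.22, 0.25, 5.0]   # 5.0 is spurious
rre = rre_threshold(errors, omega=1.5)             # 0.25, not 5.0
```

With Ω = 1.5, the spurious error of 5.0 lies outside the upper fence and the threshold falls back to the largest non-extreme error.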
We describe a cross-validation technique in Section 5.3 to find RRE from only the normal activities.

5.2. Inlier Reconstruction Error

In this method, we first train an AE/SAE on the full normal data and then remove the anomalous training instances based on Ω from the training set (as discussed in the previous section). After this step, we are left with training data without the outlier instances. Then, we train a new AE on this reduced data comprising just the inliers. The idea is that the variance of the reconstruction error for such inlier data will not be too high and its maximum can serve as the new threshold. We call this method Inlier Reconstruction Error (IRE).

For the channel-wise ensemble approach, each AE/SAE is trained only on the raw sensor data from a specific channel of normal activities; the various thresholds, i.e., MaxRE, StdRE, RRE and IRE, are then computed for each channel separately. During testing, for a given threshold method, the final decision is taken as the majority-voting outcome of all the AE/SAEs. The intuition behind RRE and IRE is that they should provide a better trade-off between the false positive and true positive rates in comparison to MaxRE and StdRE. The threshold MaxRE may work better in one direction, whereas StdRE is an ad hoc approach to minimize errors. Both RRE and IRE are derived from the data and not arbitrarily set to a fixed number. The proposed threshold tightening methods, i.e., RRE and IRE, attempt to find a threshold after removing spurious sensor data from the normal activities; this may lead to improved sensitivity in detecting falls.

5.3. Cross Validation

The parameter Ω used to tighten the threshold for RRE and IRE cannot be directly optimized because there is no validation set, due to the absence of fall data during training. Khan et al. [13] propose to remove some outliers from the normal data and consider them as a proxy for unseen falls.
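The two-stage IRE procedure of Section 5.2 (fit on all normal data, drop the IQR outliers of the reconstruction error, refit on the inliers) can be sketched as follows. A linear PCA reconstruction stands in for the AE purely to keep the sketch runnable; it is not the paper's model, and the data are random for illustration.

```python
import numpy as np

def pca_reconstruction_errors(X_train, X_eval, p):
    """Stand-in for an AE: reconstruction error of a p-component
    linear (PCA) reconstruction fitted on X_train."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    V = Vt[:p].T
    R = (X_eval - mu) @ V @ V.T + mu
    return np.sum((X_eval - R) ** 2, axis=1)

def ire_threshold(X_train, omega, p=2):
    """IRE: (1) fit on all normal data, (2) drop IQR outliers of the
    reconstruction error (Eq. (4)), (3) refit on the inliers and take
    the maximum error on that reduced set as the threshold."""
    e = pca_reconstruction_errors(X_train, X_train, p)
    q1, q3 = np.percentile(e, [25, 75])
    iqr = q3 - q1
    keep = (e >= q1 - omega * iqr) & (e <= q3 + omega * iqr)
    X_in = X_train[keep]
    return float(pca_reconstruction_errors(X_in, X_in, p).max())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # illustrative "normal activity" windows
```

With a very large Ω nothing is removed and IRE coincides with the maximum training error; smaller values of Ω tighten the threshold.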
They show that rejected outliers from the normal activities can be used to create a validation set and tune the parameters of a learning algorithm in the absence of fall data. They use hidden Markov models and show that some of the proxies for falls bear resemblance to actual falls. We modify this idea with respect to autoencoders and present a cross-validation method to optimize Ω in our setting.

Firstly, we train an AE on the full normal data and compute the reconstruction error of each training example. Then we reject some instances from the normal activities, based on a parameter ρ, using their reconstruction error. The parameter ρ also uses the IQR technique (as discussed in Section 5.1); however, it is very different from the parameter Ω. The parameter ρ represents the amount of outlier data removed from the normal activities to generate proxy-fall samples for a validation set. The remaining normal activities are called non-falls. The parameter Ω represents the amount of reconstruction error removed to set a 'threshold' to identify unseen falls during testing. For a given value of ρ, several values of Ω can be tested and the best is used for further analysis. Therefore, ρ is considered a hyper-parameter and Ω a parameter to find RRE and IRE. Then the data from both classes (non-falls and proxy falls) is divided into K folds. The non-fall data from (K − 1) folds is combined and an AE is trained on it. The data from the Kth fold for non-falls and proxy falls is used for testing and tuning the parameters. The process is repeated K times for different values of Ω, and the value with the best average performance over the K folds is chosen for further analysis. The performance metric is discussed in Section 6.1. Lastly, for a given ρ, we retrain on the non-fall data. The maximum of the reconstruction error corresponding to the best Ω (obtained in the step discussed above for a given ρ) is taken as RRE.
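The grid search over Ω can be sketched as follows. This is a simplified sketch: the paper retrains an AE per fold, whereas here the per-fold reconstruction errors (for non-falls and proxy falls) are assumed to be given, and the toy fold values are illustrative.

```python
import numpy as np

def g_mean(tpr, fpr):
    """Geometric mean of TPR and TNR, with TNR = 1 - FPR."""
    return float(np.sqrt(tpr * (1.0 - fpr)))

def pick_omega(nonfall_folds, proxy_fall_folds, omegas):
    """For each candidate Omega: fit the IQR fences on the other
    folds' non-fall errors, threshold the held-out fold, score by
    g_mean, and return the Omega with the best average score."""
    K = len(nonfall_folds)
    best_omega, best_score = None, -1.0
    for omega in omegas:
        scores = []
        for k in range(K):
            train = np.concatenate([nonfall_folds[j]
                                    for j in range(K) if j != k])
            q1, q3 = np.percentile(train, [25, 75])
            iqr = q3 - q1
            keep = (train >= q1 - omega * iqr) & (train <= q3 + omega * iqr)
            thr = train[keep].max()
            tpr = float(np.mean(proxy_fall_folds[k] > thr))
            fpr = float(np.mean(nonfall_folds[k] > thr))
            scores.append(g_mean(tpr, fpr))
        if np.mean(scores) > best_score:
            best_omega, best_score = omega, float(np.mean(scores))
    return best_omega

# K = 2 toy folds: one spurious non-fall error (50.0), proxy falls near 5-6
nonfall_folds = [np.array([0.1, 0.12, 0.2, 50.0]),
                 np.array([0.1, 0.15, 0.2, 0.3])]
proxy_fall_folds = [np.array([5.0, 6.0]), np.array([5.0, 6.0])]
best = pick_omega(nonfall_folds, proxy_fall_folds, [0.5, 1000.0])
```

A tight Ω discards the spurious 50.0 and separates the proxy falls cleanly, while a very loose Ω keeps it and misses every proxy fall in one fold.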
To compute IRE, we remove the outliers from the non-falls corresponding to Ω, then retrain on the reduced training set and take the maximum of the reconstruction error as IRE.

The value of the hyper-parameter ρ can be varied to observe the overall effect on the performance of the proposed threshold tightening methods, RRE and IRE. Intuitively, a large value of ρ means fewer instances are removed from the normal activities as proxies for falls, which may lead to classifying a lot of test samples as normal activities and missing some falls. A small value of ρ means more instances from the normal activities may be rejected as proxies for falls; thus, the normal class will be smaller, which may result in identifying most of the falls but at the cost of more false alarms. In summary, we expect that with an increase in ρ, both the true positive rate and the false positive rate should reduce (fall is the positive class). By varying ρ, we can find an optimum range of operation with a good balance of true positives and false positives. It is to be noted that in this cross-validation method no actual falls from the training set are used, because the training set comprises only normal activities, and all the parameters are tuned in the absence of actual falls.

6. Experimental Analysis

6.1. Performance Metrics

We consider a case for detecting falls where they are not available during training and occur only during testing. Therefore, during the testing phase, we expect a skewed distribution of falls. Hence, standard performance metrics such as accuracy may not be appropriate, because they may give an over-estimated view of the performance of the classifier. To deal with such a case, we use the geometric mean (gmean) [15, 12] as the performance metric to present the test results and to optimize the parameters during cross-validation.
gmean is defined as the square root of the product of the true positive and true negative rates, i.e.

    gmean = sqrt(TPR × TNR) = sqrt(TPR × (1 − FPR))    (5)

where TPR is the true positive rate, TNR is the true negative rate and FPR is the false positive rate. The value of gmean varies from 0 to 1, where 1 means a perfect classification between falls and normal activities and 0 the worst outcome. We also use the TPR and FPR as additional performance metrics to further elaborate our results.

To evaluate the performance of the proposed approaches for fall detection, we perform leave-one-subject-out cross-validation (LOOCV) [5], where only normal activities from (N − 1) subjects are used to train the classifiers and the Nth subject's normal activities and fall events are used for testing. This process is repeated N times and the average performance metrics are reported. This evaluation is person-independent and demonstrates generalization capability, as the subject being tested is not included in training the classifiers.

6.2. Datasets

We show our results on two activity recognition datasets that include different normal activities and fall events collected via wearable devices.

6.2.1. German Aerospace Center (DLR) [21]

This dataset is collected using an Inertial Measurement Unit with a sampling frequency of 100 Hz. The dataset contains samples from 19 people of both genders and of different age groups. The data is recorded in indoor and outdoor environments under semi-natural conditions. The sensor is placed on a belt, either on the right or the left side of the body, or in the right pocket, in different orientations. The dataset contains labelled data for the following 7 activities: Standing, Sitting, Lying, Walking (up/downstairs, horizontal), Running/Jogging, Jumping and Falling.
One of the subjects did not perform the fall activity; therefore, their data is omitted from the analysis.

6.2.2. Coventry Dataset (COV) [22]

This dataset is collected using two SHIMMER sensor nodes strapped to the chest and thighs of subjects, with a sampling frequency of 100 Hz. Two protocols were followed to collect data. In Protocol 1, data for six types of fall scenarios were captured (forward, backward, right, left, real fall-backward and real fall-forward), along with a set of ADL (standing, lying, sitting on a chair or bed, walking, crouching, near falls and lying). Protocol 2 involved ascending and descending stairs. 42 young, healthy individuals simulated various ADL and fall scenarios (32 in Protocol 1 and 10 in Protocol 2). The data from the different types of falls are joined together to make one separate class for falls. The subjects in Protocol 2 did not record corresponding fall data; therefore, that data is not used. In our analysis, we used the accelerometer and gyroscope data from the sensor node strapped to the chest.

6.3. Experimental Setup

For both datasets, all the normal activities are joined together to form a normal class. For the COV dataset, the different types of falls are joined to make a fall class. The raw sensor data is processed using a 50% overlapping sliding window. The time window size is set to 1.28 seconds for the DLR dataset and 2.56 seconds for the COV dataset (as in Khan et al. [13]). After pre-processing, the DLR dataset has 26576 normal activity segments and 84 fall segments, and the COV dataset has 12392 normal activity segments and 908 fall segments. We test two types of AEs in the analysis: one with a single hidden layer and the other a three-layered SAE.
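The 50% overlapping sliding-window pre-processing can be sketched as follows; the signal length and zero content are illustrative, while the 128-sample window matches the DLR setting (1.28 s at 100 Hz).

```python
import numpy as np

def sliding_windows(signal, win_len, overlap=0.5):
    """Cut a (T, channels) signal into fixed-length windows that
    overlap by the given fraction (50% in the paper)."""
    step = int(win_len * (1.0 - overlap))
    starts = range(0, len(signal) - win_len + 1, step)
    return np.stack([signal[i:i + win_len] for i in starts])

# DLR: 1.28 s at 100 Hz -> 128-sample windows, 64-sample step
x = np.zeros((1000, 6))
w = sliding_windows(x, 128)
```

Each resulting window is then either concatenated into one monolithic input or split per channel for the channel-wise ensemble.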
For the monolithic AE, the raw data within a time window for each of the 3 channels of the accelerometer and gyroscope is concatenated, which leads to 768 (= 128 × 6) input-layer neurons for the DLR dataset and 1536 (= 256 × 6) input-layer neurons for the COV dataset. The number of hidden neurons (i.e., the number of generic features learned) is set to 31 (as suggested for the engineered features case in the work of Khan et al. [13]). For the monolithic SAE, the number of hidden neurons in the first layer is chosen to be half the number of input neurons, i.e., 384 for the DLR dataset and 768 for the COV dataset, and the second layer has 31 features. For the channel-wise ensemble method, each channel is fed to an AE/SAE separately. Therefore, the number of neurons in the input layer per AE is set to 128 for the DLR dataset and 256 for the COV dataset, and the hidden layer has 31 neurons. For the channel-wise SAE, the number of hidden neurons in the first layer is half the number in the input layer, i.e., 64 for the DLR dataset and 128 for the COV dataset. The second hidden layer for both datasets has 31 neurons. The number of training epochs is fixed to 10 for all the different autoencoders. The rest of the parameters, such as the sparsity parameter, activation function, etc., are kept at their default values [19]. Compressed features learned through the monolithic AE and SAE are further used to train OSVM and One-Class Nearest Neighbour (OCNN) classifiers [14] for comparison.

6.3.1. Internal Cross-Validation

For OCNN, the number of nearest neighbours used to identify an outlier is kept as 1. OSVM has a parameter ν (the outlier fraction), which is the expected proportion of outliers in the training data. The value of this parameter is tuned similarly to the parameter optimization discussed in Section 5.3; that is, reject a small portion of the normal class data as a proxy for unseen falls for a given ρ and create a validation set.
Then a K-fold cross-validation is performed for different values of ν, and the value with the largest average gmean over all K folds is chosen. The 'KernelScale' parameter is set to 'auto' and 'Standardize' to 'true'; the other parameters are kept at their default values [20]. An internal K = 3-fold cross-validation is employed to optimize the parameter ν for the OSVM and Ω for the RRE and IRE thresholding methods. The parameter Ω is varied over [0.001, 0.01, 0.1, 0.5, 1, 1.5, 1.7239, 2, 2.5, 3] and ν over [0.1, 0.3, 0.5, 0.7, 0.9]. The best parameter is chosen based on the average gmean over the K folds. To understand the effect of removing outlier data from the normal activities when building classification models for unseen falls, the hyper-parameter ρ is varied over [0.001, 0.01, 0.1, 0.5, 1, 1.5, 1.7239, 2, 2.5, 3]. Along with the 6-channel raw data, we also use 2-channel magnitude data from each dataset to train the different classifiers. Therefore, in the experiment we compare the following classifiers for the two types of channel data (i.e., 6 and 2 channels):

• Two types of AE, i.e., a single-layer AE and a three-layered SAE.
• Four types of thresholding methods, i.e., MaxRE, StdRE, RRE and IRE.
• Two types of feature-learning techniques: (i) monolithic and (ii) channel-wise ensemble.
• Two one-class classifiers (OCNN and OSVM) trained on features learned from the AE and SAE (not for the channel-wise case).

This results in 20 different classifiers trained per 6/2-channel input raw sensor data; we compare their performance in the next section.

6.4. Results and Discussion

Tables 1 and 2 show the results for the DLR dataset for 6- and 2-channel input raw data. The results correspond to ρ = 1.5.
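The gmean criterion used in the internal cross-validation above, and reported as the performance metric in the tables, is the geometric mean of the true positive and true negative rates. A minimal sketch of threshold selection by gmean follows; the reconstruction errors, proxy labels, and threshold grid are all invented for illustration:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of TPR and TNR; falls (label 1) are the positive class."""
    tpr = np.mean(y_pred[y_true == 1] == 1)   # sensitivity on proxy falls
    tnr = np.mean(y_pred[y_true == 0] == 0)   # specificity on normal activities
    return np.sqrt(tpr * tnr)

# Hypothetical validation set: reconstruction errors with proxy fall labels
errors = np.array([0.20, 0.30, 0.90, 1.10, 0.25, 1.40])
labels = np.array([0,    0,    1,    1,    0,    1])

# Choose the threshold with the largest gmean, mimicking the grid search over Omega
grid = [0.1, 0.5, 1.0, 1.5, 2.0]
best = max(grid, key=lambda t: g_mean(labels, (errors > t).astype(int)))
print(best)  # 0.5 separates the two groups perfectly here (gmean = 1.0)
```

The same metric rewards a balance of sensitivity and specificity, which is why it is preferred here over accuracy for the highly imbalanced fall class.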
Tables 1d and 2d show the results when the features learned using the AE and SAE are fed to OSVM and OCNN. We observe that for the 6-channel case, the best gmean is obtained by the channel-wise AE with RRE, followed by the IRE method (see Table 1a). The traditional thresholding methods, MaxRE and StdRE, do not perform well. We observe similar results for the DLR dataset with 2-channel input; however, the gmean values using the 6-channel input data are higher. Both the RRE and IRE methods with the channel-wise AE give a good trade-off between TPR (see Tables 1b and 2b) and FPR (see Tables 1c and 2c). Results on the COV dataset for the 6- and 2-channel input data are shown in Tables 3 and 4. For the 6-channel case, the channel-wise ensemble method with the AE gives equivalent values of gmean for RRE and IRE, which are higher than those of the other thresholding methods. Both of the best methods give a good trade-off between TPR and FPR (see Tables 3b and 3c). For the 2-channel case, the IRE threshold method gives equivalent performance for both the monolithic and channel-wise approaches with AE and SAE, along with the monolithic SAE with RRE. The channel-wise approach gives more false alarms but detects more falls than the monolithic approach. For both the DLR and COV datasets, the OCNN classifier performs worse than the proposed methods because it gives a large number of false alarms, whereas OSVM classifies all the test samples as falls (see Tables 1d, 2d, 3d and 4d). By convention, we classify a test sample as a fall in case of a tie in the channel-wise approach. The probability of a tie occurring is higher in the 2-channel ensemble than in the 6-channel ensemble; therefore, its sensitivity to detect falls is higher than in the 6-channel case, with an increase in the false-alarm rate. We observe this behavior for both the DLR and COV datasets (see the Channel-wise rows in Tables 1c, 2c, 3c and 4c).
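The tie-breaking convention described above can be made concrete with a small sketch (our own illustration; in the ensemble each vote would come from one channel's autoencoder threshold decision):

```python
def ensemble_decision(channel_votes):
    """Majority vote over per-channel decisions (1 = fall, 0 = normal).

    A tie is classified as a fall, matching the convention in the text.
    """
    falls = sum(channel_votes)
    return 1 if falls >= len(channel_votes) - falls else 0

print(ensemble_decision([1, 0, 1, 0, 1, 0]))  # 3-3 tie over 6 channels -> 1 (fall)
print(ensemble_decision([0, 0, 1, 0, 0, 0]))  # clear normal majority   -> 0
```

With only 2 channels, a single fall vote already produces a tie, which is consistent with the higher sensitivity and false-alarm rate reported for the 2-channel ensembles.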
From this experiment, we infer that the traditional thresholding methods, i.e., MaxRE and StdRE, are not suitable for the task of fall detection. MaxRE may not work properly because of the presence of noise in the sensor data, which can significantly increase the reconstruction error of an AE/SAE, leading to most of the test samples being classified as normal activity. StdRE is an ad-hoc approach that chooses a threshold arbitrarily rather than deriving it from the given dataset; however, it can perform better than MaxRE in terms of identifying more falls. Both of these methods attempt to find a discriminating threshold from the training dataset to get a good trade-off between TPR and FPR. Our experiments suggest that, for both datasets, the proposed threshold-tightening methods RRE and IRE with the channel-wise ensemble approach perform equivalently and consistently better than the traditional thresholding methods.

We vary the hyper-parameter ρ to understand its impact on the performance of the different thresholding techniques. Figures 3 and 4 show the variation of TPR, FPR and gmean with increasing ρ. We observe that as the value of ρ increases, both TPR and FPR decrease. The reason is that at smaller values of ρ, a large portion of the normal data is rejected as outliers and used for parameter tuning; thus, the number of instances in the non-fall class is small. This means that the AE/SAE will learn from a smaller dataset and will reject most of the variations from this small subset of normal activities as potential falls. Consequently, many falls will also be identified correctly. The reverse behavior occurs when ρ is large: less normal data is rejected as outliers and the class of normal activities is large. This reduces the number of false alarms but can also lead to missing some falls. The experimental observation for each of the 6- and 2-channel datasets is consistent with the intuition discussed in Section 5.3. A similar observation can be made for the COV dataset from Figures 5 and 6. For both datasets, we notice that at large values of ρ, the performance of the best thresholding approaches drops more slowly. This experimental observation suggests that a smaller amount of data (corresponding to ρ ≥ 1.5) may be removed from the normal-activities class as outliers and used as a validation set to optimize the parameters of the AE/SAE, and better performance can be achieved for identifying unseen falls. We also infer that the channel-wise approach outperforms the monolithic approach in all the 6- and 2-channel variants of both datasets.

(a) gmean values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        0        0.106    0.825    0.757
Monolithic      SAE       0        0.234    0.840    0.837
Channel-wise    AE        0        0.547    0.860    0.849
Channel-wise    SAE       0        0.334    0.818    0.811

(b) TPR values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        0        0.056    0.856    0.762
Monolithic      SAE       0        0.138    0.893    0.893
Channel-wise    AE        0        0.428    0.902    0.840
Channel-wise    SAE       0        0.226    0.774    0.750

(c) FPR values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        5.9e-6   0.025    0.189    0.169
Monolithic      SAE       5.9e-6   0.025    0.199    0.204
Channel-wise    AE        0        0.010    0.169    0.122
Channel-wise    SAE       0        0.008    0.088    0.079

(d) Performance of the OSVM and OCNN methods
Classifier   AE type   gmean    TPR      FPR
OSVM         AE        0        1        1
OSVM         SAE       0        1        1
OCNN         AE        0.460    0.911    0.761
OCNN         SAE       0.423    0.681    0.701

Table 1: Performance of different fall detection methods on the DLR dataset (6 channels) for ρ = 1.5.

(a) gmean values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        0        0        0.504    0.774
Monolithic      SAE       0        0        0.630    0.776
Channel-wise    AE        0.013    0.487    0.839    0.822
Channel-wise    SAE       0        0.446    0.678    0.655

(b) TPR values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        0        0        0.966    0.949
Monolithic      SAE       0        0        0.959    0.941
Channel-wise    AE        0.003    0.323    0.941    0.926
Channel-wise    SAE       0        0.329    0.629    0.579

(c) FPR values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        7.4e-5   0.037    0.705    0.363
Monolithic      SAE       3.7e-5   0.039    0.544    0.353
Channel-wise    AE        1.0e-4   0.032    0.245    0.264
Channel-wise    SAE       7.7e-5   0.033    0.099    0.094

(d) Performance of the OSVM and OCNN methods
Classifier   AE type   gmean    TPR      FPR
OSVM         AE        0        1        1
OSVM         SAE       0        1        1
OCNN         AE        0.459    0.815    0.719
OCNN         SAE       0.318    0.317    0.449

Table 2: Performance of different fall detection methods on the DLR dataset (2 channels) for ρ = 1.5.

(a) gmean values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        0.015    0.744    0.774    0.771
Monolithic      SAE       0.019    0.743    0.772    0.771
Channel-wise    AE        0.014    0.463    0.795    0.795
Channel-wise    SAE       0        0.226    0.737    0.707

(b) TPR values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        0.004    0.589    0.744    0.740
Monolithic      SAE       0.007    0.588    0.738    0.738
Channel-wise    AE        0.003    0.248    0.7      0.7
Channel-wise    SAE       0        0.082    0.665    0.573

(c) FPR values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        1.1e-4   0.017    0.169    0.169
Monolithic      SAE       1.1e-4   0.017    0.166    0.167
Channel-wise    AE        0        0.002    0.067    0.072
Channel-wise    SAE       0        5.9e-5   0.128    0.078

(d) Performance of the OSVM and OCNN methods
Classifier   AE type   gmean    TPR      FPR
OSVM         AE        0        1        1
OSVM         SAE       0        1        1
OCNN         AE        0.432    0.977    0.805
OCNN         SAE       0.484    0.949    0.751

Table 3: Performance of different fall detection methods on the COV dataset (6 channels) for ρ = 1.5.

(a) gmean values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        0.041    0.743    0.668    0.785
Monolithic      SAE       0.019    0.724    0.784    0.784
Channel-wise    AE        0.337    0.767    0.726    0.788
Channel-wise    SAE       0.331    0.757    0.739    0.786

(b) TPR values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        0.012    0.587    0.729    0.698
Monolithic      SAE       0.007    0.557    0.677    0.674
Channel-wise    AE        0.147    0.621    0.805    0.779
Channel-wise    SAE       0.142    0.606    0.781    0.779

(c) FPR values
Features type   AE type   MaxRE    StdRE    RRE      IRE
Monolithic      AE        1.7e-4   0.013    0.287    0.094
Monolithic      SAE       1.7e-4   0.012    0.06     0.056
Channel-wise    AE        1.1e-4   0.015    0.298    0.186
Channel-wise    SAE       1.1e-4   0.016    0.255    0.182

(d) Performance of the OSVM and OCNN methods
Classifier   AE type   gmean    TPR      FPR
OSVM         AE        0        1        1
OSVM         SAE       0        1        1
OCNN         AE        0.594    0.905    0.606
OCNN         SAE       0.580    0.606    0.432

Table 4: Performance of different fall detection methods on the COV dataset (2 channels) for ρ = 1.5.

7. Conclusions and Future Work

A fall is a rare event; therefore, it is difficult to build classification models using traditional supervised algorithms in the absence of training data. An associated challenge for the fall detection problem is to extract discriminative features, in the absence of fall data, for training generalizable classifiers. In this paper, we presented solutions to these issues. Firstly, we formulated fall detection as a one-class classification or outlier detection problem. Secondly, we presented the use of AEs, more specifically a novel way to train a separate AE for each channel of the wearable sensor, to learn generic features and create their ensemble. We proposed threshold-tightening methods to identify unseen falls accurately. This work provides the useful insight that an ensemble based on the channels of a wearable device, with an optimized threshold, is a useful technique to identify unseen falls.
In future work, we are exploring extreme value theory and combining it with the proposed approaches to identify unseen falls.

Acknowledgments

This work was partially supported by the AGE-WELL NCE Trainee Award Program and by the Canadian Consortium on Neurodegeneration in Aging (CCNA).

References

[1] Budiman, A., Fanany, M. I., Basaruddin, C., Oct 2014. Stacked denoising autoencoder for feature representation learning in pose-based action recognition. In: 2014 IEEE 3rd Global Conference on Consumer Electronics (GCCE). pp. 684-688.
[2] Dong, Y., Japkowicz, N., 2016. Threaded ensembles of supervised and unsupervised neural networks for stream learning. In: Advances in Artificial Intelligence: 29th Canadian Conference on Artificial Intelligence, Canadian AI 2016, Victoria, BC, Canada, May 31 - June 3, 2016. Proceedings. Springer International Publishing, Cham, pp. 304-315.
[3] El-Bendary, N., Tan, Q., Pivot, F. C., Lam, A., 2013. Fall detection and prevention for the elderly: A review of trends and challenges. International Journal on Smart Sensing and Intelligent Systems 6 (3), 1230-1266.
[4] Erfani, S. M., Rajasegarar, S., Karunasekera, S., Leckie, C., 2016. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition 58, 121-134.
[5] He, Z., Jin, L., 2009. Activity recognition from acceleration data based on discrete cosine transform and SVM. In: SMC. IEEE, pp. 5041-5044.
[6] Igual, R., Medrano, C., Plaza, I., 2013. Challenges, issues and trends in fall detection systems. BioMedical Engineering OnLine 12 (1), 1-24.
[7] Ithapu, V. K., Singh, V., Okonkwo, O., Johnson, S. C., 2014. Randomized denoising autoencoders for smaller and efficient imaging based AD clinical trials. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2014. Springer, pp. 470-478.
[8] Jankowski, S., Szymanski, Z., Dziomin, U., Mazurek, P., Wagner, J., 2015. Deep learning classifier for fall detection based on IR distance sensor data. In: Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), 2015 IEEE 8th International Conference on. Vol. 2. IEEE, pp. 723-727.
[9] Japkowicz, N., Myers, C., Gluck, M., 1995. A novelty detection approach to classification. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 1. Morgan Kaufmann Publishers Inc., pp. 518-523.
[10] Jokanovic, B., Amin, M., Ahmad, F., 2016. Radar fall motion detection using deep learning. In: 2016 IEEE Radar Conference (RadarConf). pp. 1-6.
[11] Khan, S., 2016. Classification and decision-theoretic framework for detecting and reporting unseen falls. Ph.D. thesis, University of Waterloo.
[12] Khan, S. S., Karg, M. E., Kulić, D., Hoey, J., Dec 2014. X-factor HMMs for detecting falls in the absence of fall-specific training data. In: Proceedings of the 6th International Work-conference on Ambient Assisted Living (IWAAL 2014). Vol. 8868. Springer International Publishing Switzerland, Belfast, U.K., pp. 1-9.
[13] Khan, S. S., Karg, M. E., Kulić, D., Hoey, J., 2017. Detecting falls with X-factor hidden Markov models. Applied Soft Computing 55, 168-177.
[14] Khan, S. S., Madden, M. G., 2014. One-class classification: taxonomy of study and review of techniques. The Knowledge Engineering Review 29, 345-374.
[15] Kubat, M., Matwin, S., 1997. Addressing the curse of imbalanced training sets: One-sided selection. In: Proceedings of the Fourteenth International Conference on Machine Learning.
[16] Li, Y., Shi, D., Ding, B., Liu, D., 2014. Unsupervised feature learning for human activity recognition using smartphone sensors. In: Mining Intelligence and Knowledge Exploration: Second International Conference, MIKE 2014, Cork, Ireland, December 10-12, 2014. Proceedings. Springer International Publishing, Cham, pp. 99-107.
[17] Li, Y., Shi, D., Ding, B., Liu, D., 2014. Unsupervised feature learning for human activity recognition using smartphone sensors. In: Mining Intelligence and Knowledge Exploration. Springer, pp. 99-107.
[18] Manevitz, L., Yousef, M., 2007. One-class document classification via neural networks. Neurocomputing 70 (7-9), 1466-1481.
[19] MATLAB, 2017. Train autoencoder. http://www.mathworks.com/help/nnet/ref/trainautoencoder.html, accessed on 23rd February, 2017.
[20] MATLAB, 2017. Train binary support vector machine classifier. https://www.mathworks.com/help/stats/fitcsvm.html, accessed on 23rd February, 2017.
[21] Nadales, M. J. V., 2010. Recognition of human motion related activities from sensors. Master's thesis, University of Malaga and German Aerospace Center.
[22] Ojetola, O., Gaura, E., Brusey, J., 2015. Data set for fall events and daily activities from inertial sensors. In: Proceedings of the 6th ACM Multimedia Systems Conference. MMSys '15. ACM, New York, NY, USA, pp. 243-248.
[23] World Health Organization, 2016. Falls fact sheet, reviewed September 2016. http://www.who.int/mediacentre/factsheets/fs344/en/, accessed on 7th March 2017.
[24] Pannurat, N., Thiemjarus, S., Nantajeewarawat, E., 2014. Automatic fall monitoring: a review. Sensors 14 (7), 12900-12936.
[25] Plötz, T., Hammerla, N. Y., Olivier, P., 2011. Feature learning for activity recognition in ubiquitous computing. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two. IJCAI'11. AAAI Press, pp. 1729-1734.
[26] Ravi, N., Dandekar, N., Mysore, P., Littman, M. L., 2005. Activity recognition from accelerometer data. In: Proceedings of the 17th Conference on Innovative Applications of Artificial Intelligence - Volume 3. IAAI'05. AAAI Press, pp. 1541-1546.
[27] Reeve, H. W. J., Brown, G., 2015. Modular autoencoders for ensemble feature extraction. In: NIPS 2015 Workshop on Feature Extraction: Modern Questions and Challenges. pp. 242-259.
[28] Sakurada, M., Yairi, T., 2014. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. MLSDA'14. ACM, New York, NY, USA, pp. 4:4-4:11.
[29] Scholz, M., Vigário, R., 2002. Nonlinear PCA: a new hierarchical approach. In: ESANN. pp. 439-444.
[30] Stone, E., Skubic, M., Jan 2015. Fall detection in homes of older adults using the Microsoft Kinect. IEEE Journal of Biomedical and Health Informatics 19 (1), 290-301.
[31] Wang, L., 2016. Recognition of human activities using continuous autoencoders with wearable sensors. Sensors 16 (2), 189.

[Figure 3: Performance of the top 5 fall detection methods by varying ρ on the DLR dataset. Panels: (a) true positive rate, (b) false positive rate, (c) gmean. AE - single-layer autoencoder, SAE - 3-layer stacked autoencoder, RRE - reduced reconstruction error, IRE - inlier reconstruction error, CW - channel-wise ensemble.]
[Figure 4: Performance of the top 5 fall detection methods by varying ρ on the DLR-norm dataset (panels and legend as in Figure 3).]
[Figure 5: Performance of the top 5 fall detection methods by varying ρ on the COV dataset (panels and legend as in Figure 3).]
[Figure 6: Performance of the top 5 fall detection methods by varying ρ on the COV-norm dataset (panels and legend as in Figure 3).]
