User Environment Detection with Acoustic Sensors Embedded on Mobile Devices for the Recognition of Activities of Daily Living
Ivan Miguel Pires 1,2,3, Nuno M. Garcia 1,3,4, Nuno Pombo 1,3,4, and Francisco Flórez-Revuelta 5

1 Instituto de Telecomunicações, Universidade da Beira Interior, Covilhã, Portugal
2 Altranportugal, Lisbon, Portugal
3 ALLab - Assisted Living Computing and Telecommunications Laboratory, Computer Science Department, Universidade da Beira Interior, Covilhã, Portugal
4 Universidade Lusófona de Humanidades e Tecnologias, Lisbon, Portugal
5 Department of Computer Technology, Universidad de Alicante, Spain

impires@it.ubi.pt, ngarcia@di.ubi.pt, ngpombo@di.ubi.pt, francisco.florez@ua.es

Abstract

The detection of the environment where the user is located is of extreme use for the identification of Activities of Daily Living (ADL). ADL can be identified using the sensors available in many off-the-shelf mobile devices, including magnetic and motion sensors, and the environment can also be identified using acoustic sensors. The study presented in this paper is divided in two parts: firstly, we discuss the recognition of the environment using acoustic sensors (i.e., the microphone), and secondly, we fuse this information with the data from motion and magnetic sensors for the recognition of standing activities of daily living. The recognition of the environments and the ADL is performed using pattern recognition techniques, in order to develop a system that includes data acquisition, data processing, data fusion, and artificial intelligence methods. The artificial intelligence methods explored in this study are composed of different types of Artificial Neural Networks (ANN), comparing the different types of ANN and selecting the best methods to implement in the different stages of the system developed.
Conclusions point to the use of Deep Neural Networks (DNN) with normalized data for the identification of ADL with 85.89% accuracy, the use of Feedforward neural networks with non-normalized data for the identification of the environments with 86.50% accuracy, and the use of DNN with normalized data for the identification of standing activities with 100% accuracy.

Keywords: Activities of Daily Living (ADL); sensors; mobile devices; accelerometer; gyroscope; magnetometer; microphone; data acquisition; data processing; data cleaning; data fusion; feature extraction; pattern recognition; machine learning.

1. Introduction

The acquisition of data related to the Activities of Daily Living (ADL) [1] may be performed with the sensors available in off-the-shelf mobile devices, e.g., the accelerometer, the gyroscope, the magnetometer, the microphone, and the Global Positioning System (GPS) receiver. The data acquired from these sensors are related to the movement performed during the activities and to the environment where the activities are performed [2], and are used to develop a method for the automatic recognition of the ADL as part of the development of a personal digital life coach [3]. This study proposes the use of the microphone for the recognition of the environment, whose output is fused with the data acquired from the accelerometer, gyroscope, and magnetometer sensors for the recognition of the activities with movement. In continuation of our previous study [4], the main goal of fusing the recognized environment with the other sensors' data is to increase the number of ADL recognized using data fusion and artificial intelligence techniques.
This study proposes the recognition of ADL, including running, walking, walking on stairs, standing, and sleeping, and the recognition of environments, including bar, classroom, gym, kitchen, library, street, hall, watching TV, and bedroom. These methods are included in the development of a framework for the recognition of ADL and their environments, proposed in [5-7], composed of several modules, such as data acquisition, data processing, data fusion, and artificial intelligence methods. The data processing module is composed of several steps, such as data cleaning and feature extraction, and the data fusion and artificial intelligence techniques are applied at the same time to achieve the final purpose of recognizing the ADL and their environments. The advantages of recognizing the environments are not limited to increasing the number of ADL recognized; recognition also allows the framework to combine the environments with the recognized ADL, returning richer results, e.g., that the user is walking on the street. Several studies on the recognition of ADL are available in the literature [8-13], but there are no studies that use all the sensors available in off-the-shelf mobile devices; among these studies, Artificial Neural Networks (ANN) are one of the most used methods. Based on our previous studies using motion and magnetic sensors for the development of the framework for the recognition of ADL and their environments [4, 14], this study proposes the creation of several methods to adapt the framework to the number of sensors available in off-the-shelf mobile devices. Methods using different combinations of sensors are presented in previous studies [4, 14], such as the method using accelerometer data, the method using accelerometer and magnetometer data, and the method using accelerometer, magnetometer, and gyroscope data.
Thus, this study proposes the creation of a method using acoustic data for the recognition of the environments, as well as different methods fusing the recognized environment with other data sources: the method using accelerometer and environment, the method using accelerometer, magnetometer, and environment, and the method using accelerometer, magnetometer, gyroscope, and environment. For the implementation and testing of these methods, we propose the use of ANN, exploring three types: the Multilayer Perceptron (MLP) with Backpropagation implemented with Neuroph [15], the Feedforward neural network with Backpropagation implemented with Encog [16], and Deep Learning implemented with DeepLearning4j [17]. The acquisition of data was performed by people aged between 16 and 60 years old with different lifestyles, with the mobile device correctly positioned in the pocket, for the creation of the datasets composed of the sensors' data. This research included the definition of the correct set of features needed and of the best ANN method for the recognition of ADL and environments, verifying that, for the recognition of environments, the best results are achieved with the Feedforward neural network with Backpropagation, and, for the recognition of ADL, the best results are achieved with Deep Learning techniques. The remaining sections are organized as follows: Section 2 presents the literature review focused on the use of acoustic sensors for the recognition of ADL and their environments. The methods developed for implementation in the framework for the recognition of ADL and their environments are presented in Section 3. Section 4 presents the results obtained with the implementation of the different methods.
Finally, the discussion of the results and of their implementation in the framework is presented in Section 5, and the conclusions are presented in Section 6.

2. Related Work

There are no studies related to the fusion of the data acquired from all sensors available in off-the-shelf mobile devices, including accelerometer, gyroscope, magnetometer, and microphone, for the recognition of Activities of Daily Living (ADL) and their environments [1], but there are a few studies using subsets of these sensors. The authors of [18] used the Global Positioning System (GPS) receiver, accelerometer, and microphone sensors for the recognition of sleeping, walking, standing, running, and social interaction activities, applying linear and logistic regression methods with several features, including the mean and variance of the accelerometer data and the spectral roll-off of the acoustic data, reporting an accuracy around 90%. In [19], the authors extracted the minimum, difference between axes, mean, standard deviation, variance, correlation between axes, sum of coefficients, spectral energy, and spectral entropy from the accelerometer sensor, and the zero-crossing rate, total spectrum power, sub-band powers, spectral centroid, spectral spread, spectral flux, spectral roll-off, and Mel-Frequency Cepstral Coefficients (MFCC) from the microphone, applied to Support Vector Machine (SVM) and Gradient Boosting Decision Tree methods, in order to recognize sitting on a chair, lying, standing, walking, going upstairs, going downstairs, jogging, running, and drinking activities, obtaining results with a reported accuracy between 89.12% and 91.5%.
The authors of [20] recognized several activities, including cycling, cleaning the table, shopping, travelling by car, going to the toilet, cooking, watching television, eating, driving, working on a computer, reading, and sleeping, using data acquired from the microphone and accelerometer sensors and applying the Gaussian Mixture Model (GMM) with log power and MFCC as features, reporting an accuracy of 77.9%. In [21], the accelerometer and microphone sensors are also used for the recognition of shopping, waiting in a queue, driving, travelling by car, cleaning with a vacuum cleaner, cooking, washing dishes, working at a computer, sleeping, watching television, being in a bar, sitting, walking, standing, and lying activities, using the J48 decision tree, FT decision tree, LMT decision tree, and IBk lazy algorithm with mean, standard deviation, range, angular degree, and MFCC as features. The reported accuracies are around 90%: the LMT decision tree reports 90.4126%, the J48 decision tree reports 90.6553%, the IBk lazy algorithm reports 90.7767%, and the FT decision tree reports 90.6553% [21]. The remaining studies available in the literature using acoustic sensors do not use data fusion techniques, because they only use the microphone signal. Based on the acoustic signal acquired from the microphone, the authors of [22] used the SVM method with spectral roll-off, slope, minimum, median, coefficient of variation, inverse coefficient of variation, trimmed mean, skewness, kurtosis, and the 1st, 57th, 95th, and 99th percentiles as features, reporting an accuracy higher than 90% for the recognition of some environments, such as restaurant, casino, playground, street traffic, street with ambulance, train, nature at daytime, nature at nighttime, ocean, and river.
In [23], the Linear Discriminant Classifier (LDC) was used with microphone data in order to recognize several ADL, including eating, drinking, clearing the throat, relaxing, laughing, coughing, sniffling, and talking, using several features, including log power, total Root-Mean-Square (RMS) energy, spectral centroid, spectral flux, spectral variance, spectral skewness, spectral kurtosis, spectral slope, spectral roll-off, zero-crossing rate, MFCC, minimum, maximum, mean, RMS, median, 1st and 3rd quartiles, interquartile range, standard deviation, skewness, kurtosis, number of peaks, mean distance of peaks, mean amplitude of peaks, mean crossing rate, and linear regression slope; the best reported accuracy was achieved using the total RMS energy, spectral centroid, spectral flux, spectral variance, spectral skewness, spectral kurtosis, spectral slope, spectral roll-off, and MFCC as features, and the average reported accuracy was 66.5%. Artificial Neural Networks (ANN) are among the most used methods for the recognition of ADL and their environments using acoustic signals. In [24], the authors implemented an ANN method (i.e., the Multi-Layer Perceptron (MLP)) with MFCC as features for the recognition of acoustic warning signals of emergency vehicles (police, fire department, and ambulance), reporting a highest accuracy of 96.7%. Another study [25] uses ANN for the recognition of ball impact, metal impact, wood impact, plastic impact, door opening/closing, typing, knocking, telephone ringing, grains falling, spray, and whistle sounds, using time-variance and frequency-variance patterns as features, reporting an average accuracy of 92%.
In [26], an ANN was used for the recognition of sneezing, dog barking, clock ticking, baby crying, a crowing rooster, raining, the sound of sea waves, fire crackling, the sound of a helicopter, and the sound of a chainsaw with some features, such as zero-crossing rate, MFCC, spectral flatness, and spectral centroid, reporting an accuracy around 74.5%. Another type of ANN, the Feedforward neural network, was used in [27] for the recognition of sirens from emergency vehicles, car horns, and normal street sounds with MFCC and zero-crossing rate as features, reporting an accuracy between 70% and 90%. Deep Neural Networks (DNN) are another type of neural network, used for the recognition of laughing, singing, crying, arguing, and sighing with MFCC as features, reporting results with reliable accuracy [28]. The authors of [29] also used DNN for ambient scene analysis (i.e., voice, music, water, and traffic), stress detection, emotion recognition, and speaker identification with MFCC as features, reporting an accuracy between 60% and 90%. The SVM is another method used for the recognition of ADL and their environments using acoustic signals. In [30], the authors implemented the SVM method for the recognition of keystrokes with MFCC as features, reporting an accuracy of 78.4%. The authors of [31] used the SVM method for the recognition of several sounds, including beach, football crowd, shaver, birds, dishwasher, sink, brushing teeth, dog, speech, bus, forest, street, car, phone ringing, chair, train station, vacuum cleaner, coffee machine, raining, washing machine, computer keyboard, and restaurant, using MFCC as features and reporting an accuracy around 80%. The SVM method is also used for the recognition of sleeping using MFCC and sound pressure level (SPL) as features, reporting accuracies between 75% and 81% [32, 33].
Hidden Markov Models (HMM) are another method used for the recognition of ADL and their environments using acoustic signals. In [34], the authors used HMM for the recognition of several sounds, such as car, truck, moped, aircraft, and train, using the computation and storage of noise levels, one-third-octave spectra, statistical indices, and the detection of noise events based on thresholds as features, reporting an accuracy higher than 95%. In [35], the authors recognized the idle state and cicada singing sounds with HMM, based on frequency bands and ratios, reporting results with reliable accuracy. The Gaussian Mixture Model (GMM) is another method used for the recognition of ADL and their environments using acoustic signals. In [36], the authors used GMM with MFCC as features for the recognition of calls during driving, reporting an accuracy around 86%. On the other hand, the authors of [37] used GMM with zero-crossing rate, RMS, MFCC, and low energy frame rate as features for the recognition of emotional states, reporting an accuracy between 65% and 100%. The authors of [38] used Random Forests and SVM methods for the recognition of air conditioner, car horn, children playing, dog bark, drilling, idling, gun shot, jackhammer, siren, and street music sounds, using MFCC and motif features, reporting an accuracy between 26.45% and 55.68% with SVM, and between 70.55% and 85% with Random Forests.
In [39], the authors used decision tree and HMM methods for the recognition of several ADL and environments, including reading, meeting, chatting, attending conference talks, lectures, music, driving, elevator, walking, airplane, fan, vacuuming, shower, clapping, raining, climbing stairs, and wind, using zero-crossing rate, low energy frame rate, spectral flux, spectral roll-off, bandwidth, normalized weighted phase deviation, and Relative Spectral Entropy (RSE), reporting an accuracy higher than 78%. The authors of [40] implemented GMM, Feedforward DNN, Recurrent Neural Networks (RNN), and SVM for the recognition of baby crying and a smoke alarm, using MFCC, spectral centroid, spectral flatness, spectral roll-off, spectral kurtosis, and zero-crossing rate, reporting accuracies between 2% and 24%. The SVM, diverse density (DD), and expected maximization (EM) methods were implemented in [41] for the recognition of several sounds, including cutlery, water, voice, ambient, and music, using MFCC, spectral flux, spectral centroid, bandwidth, Normalized Mel-Frequency Bands, zero-crossing rate, and low energy frame rate as features, reporting an average accuracy of 87%. In [42], several sounds were identified, including coffee machine brewing, hand washing, walking, elevator, door opening/closing, and silence, using k-Nearest Neighbour (k-NN), SVM, and GMM methods with some features, such as zero-crossing rate, short-time energy, temporal centroid, energy entropy, autocorrelation, RMS, spectral centroid, spectral spread, spectral roll-off point, spectral flux, spectral entropy, and MFCC. The highest accuracies achieved with the different methods are 97.9% with k-NN, 90% with GMM, and 100% with SVM [42].
The authors of [43] implemented Random Forest, HMM, GMM, SVM, ANN, k-NN, and deep belief network methods in order to recognize babble, driving, machinery, crowded restaurant, street, air conditioner, washer, dryer, and vacuum cleaner, with MFCC, band periodicity, and band entropy as features, reporting results with reliable accuracy. In [44], the authors implemented Naïve Bayes, k-NN, Random Forest, and Bayesian Networks methods for the recognition of several nursing activities, including measurement of height, patient sitting, assisting a doctor, attaching/measuring/removing electrocardiography (ECG), changing a bandage, cleaning the body, examining edema, and washing hands, using several features, including mean of intensity, mean, variance of intensity, variance, mean of Fast Fourier Transform (FFT)-domain energy, and covariance between intensities. The reported results are 56.10% with k-NN and Naïve Bayes, 73.18% with k-NN and Bayesian Networks, 55.15% with Naïve Bayes only, 80.96% with Naïve Bayes and Bayesian Networks, 59.03% with Random Forest and Naïve Bayes, and 67.83% with Random Forest and Bayesian Networks [44]. The authors of [45] recognized several sounds, including alarms, birds, clapping, dogs, footsteps, motorcycles, raining, rivers, sea waves, and wind, using k-NN, Naïve Bayes, SVM, C4.5 decision tree, logistic regression, and ANN, inputting several features, including zero-crossing rate, skewness, kurtosis, spectral centroid, spectral spread, spectral flux, spectral slope, spectral roll-off, spectral skewness, spectral kurtosis, spectral flatness measure, spectral crest factor, spectral sharpness, Chroma vectors, spectral smoothness, spectral variability, and MFCC. The highest reported accuracies are 45% with k-NN, 45% with Naïve Bayes, 54% with SVM, 45% with the C4.5 decision tree, 44% with logistic regression, and 54% with ANN [45].
In [46], a fall detection method was developed with k-NN, SVM, least squares method (LSM), and ANN methods with spectrogram, MFCC, linear predictive coding (LPC), and matching pursuit (MP) as features, reporting 98% accuracy. Following the research studies available in the literature, Table 1 shows the ADL and environments recognized with the use of the microphone, verifying that the standing activities are well differentiated with acoustic data.

Table 1 - Distribution of the ADL and environments extracted in the studies analyzed

ADL/Environment: Number of Studies:
street with emergency vehicles (police, fire department and ambulance): 6
sleeping: 5
walking: 5
standing: 5
street traffic: 5
ocean: 5
driving: 4
river: 4
sitting: 3
cleaning with a vacuum cleaner: 3
train: 3
nature: 3
typing: 3
dog barking: 3
baby crying: 3
raining: 3
music: 3
running: 2
lying: 2
going upstairs: 2
going downstairs: 2
drinking: 2
shopping: 2
travelling by car: 2
cooking: 2
watching television: 2
eating: 2
working on a computer: 2
reading: 2
washing dishes: 2
restaurant: 2
laughing: 2
door opening/closing: 2
telephone ringing: 2
helicopter: 2
speech: 2
coffee machine: 2
elevator: 2
social interaction activities: 1
jogging: 1
cycling: 1
cleaning the table: 1
going to the toilet: 1
waiting in a queue: 1
being in a bar: 1
casino: 1
playground: 1
clearing the throat: 1
relaxing: 1
coughing: 1
sniffling: 1
talking: 1
grains falling: 1
whistle: 1
sneezing: 1
clock ticking: 1
arguing: 1
football: 1
shaver: 1
bird: 1
dishwasher: 1
brushing teeth: 1
bus: 1
calling: 1
air conditioner: 1
car horn: 1
children playing: 1
drilling: 1
meeting: 1
chatting: 1
shower: 1
clapping: 1
smoke alarm: 1
hand washing: 1

Several features, presented in Table 2, have been used for the recognition of ADL and environments based on acoustic data, showing that the MFCC, zero-crossing rate, spectral roll-off, spectral centroid, spectral flux, total RMS energy, mean, standard deviation, minimum, median, and low energy frame rate are the most used features, with the most relevance for MFCC.

Table 2 - Distribution of the features extracted in the studies analyzed

Features: Number of Studies:
Mel-Frequency Cepstral Coefficients (MFCC): 18
zero-crossing rate: 8
spectral roll-off: 6
spectral centroid: 5
spectral flux: 5
total Root-Mean-Square (RMS) energy: 4
mean: 3
standard deviation: 3
minimum: 3
median: 3
low energy frame rate: 3
spectral spread: 2
log power: 2
skewness: 2
kurtosis: 2
sound pressure level (SPL): 2
bandwidth: 2
Relative Spectral Entropy (RSE): 2
total spectrum power: 1
sub-band powers: 1
range: 1
angular degree: 1
slope: 1
coefficient of variation: 1
inverse coefficient of variation: 1
trimmed mean: 1
percentiles (1st, 57th, 95th, and 99th): 1
spectral variance: 1
spectral skewness: 1
spectral kurtosis: 1
spectral slope: 1
maximum: 1
quartiles (1st and 3rd): 1
interquartile range: 1
number of peaks: 1
mean distance of peaks: 1
mean amplitude of peaks: 1
mean crossing rate: 1
linear regression slope: 1
spectral flatness: 1
threshold: 1
noise level: 1
one-third-octave spectra: 1
statistical indices: 1
motif: 1
normalized weighted phase deviation: 1
Normalized Mel-Frequency Bands: 1
short-time energy: 1
temporal centroid: 1
energy entropy: 1
autocorrelation: 1
spectral entropy: 1

Finally, the recognition of ADL and environments may be performed with the several methods presented in Table 3, concluding that the most used methods are the SVM, MLP, GMM, and DNN methods. Among the most used methods for the recognition of ADL and environments using the acoustic signal, implemented in more than 3 of the studies analyzed, the method that reports the best average accuracy in the recognition of ADL and environments is the MLP, with an average accuracy of 88%.
Table 3 - Distribution of the classification methods used in the studies analyzed

Methods (Number of Studies, Average of Reported Accuracy):
Support Vector Machine (SVM): 10 studies, 77%
Gaussian Mixture Model (GMM): 5 studies, 76%
Artificial Neural Networks (ANN) / Multi-Layer Perceptron (MLP): 3 studies, 88%
Deep Neural Networks (DNN): 3 studies, 68%
Hidden Markov Models (HMM): 2 studies, 87%
J48 decision tree: 2 studies, 84%
FT decision tree: 2 studies, 84%
LMT decision tree: 2 studies, 84%
k-Nearest Neighbour (k-NN): 1 study, 98%
Gradient Boosting Decision Tree: 1 study, 92%
IBk lazy algorithm: 1 study, 91%
logistic regression: 1 study, 90%
linear regression: 1 study, 90%
Feedforward neural networks: 1 study, 90%
diverse density (DD): 1 study, 87%
expected maximization (EM): 1 study, 87%
Random Forests: 1 study, 85%
Linear Discriminant Classifier (LDC): 1 study, 67%
Recurrent Neural Networks (RNN): 1 study, 24%

3. Methods

In coherence with the methods defined in the previous studies [4, 14] for the development of the framework for the recognition of ADL and their environments [5-7], the methods developed in this study are separated into several modules, such as data acquisition, data processing, data fusion, and artificial intelligence methods, where the fusion of the data and the application of artificial intelligence methods are performed at the same time. Firstly, Section 3.1 presents the data acquisition methods. Secondly, the data processing methods are presented in Section 3.2. The section finishes with the presentation of the data fusion and artificial intelligence methods in Section 3.3.
3.1. Data Acquisition

The data acquisition was performed with a mobile application installed on a BQ Aquaris device [47] running the Android operating system [48, 49], which allows the capture of the sensors' data: at the first stage, it saves the data acquired from the microphone in a raw format into text files, and, at a second stage, the data captured from the accelerometer, magnetometer, and gyroscope sensors are also saved in text files. The sensors' data is captured in the background in slots of 5 seconds every 5 minutes, and the sampling period of the accelerometer, magnetometer, and gyroscope sensors is around 10 ms. Before the experiments, the user selected the ADL to be performed and/or the environments where the ADL would be performed. The experiments were performed with the mobile device in the pocket by people aged between 16 and 60 years old with different lifestyles. Following the most common environments and the most identified ADL in the literature, the ADL allowed in the mobile application are running, walking, going upstairs, sleeping, going downstairs, and standing, and the allowed environments are bar, classroom, gym, kitchen, library, street, hall, watching TV, and bedroom. A minimum of 2000 experiments for each ADL and environment have been acquired and stored in the ALLab MediaWiki [50].

3.2. Data Processing

Another module of the framework for the recognition of ADL and their environments is composed of data processing methods. This module is composed of data cleaning methods, presented in Section 3.2.1, and feature extraction methods, presented in Section 3.2.2.

3.2.1. Data Cleaning

Data cleaning methods are different for each type of sensor, removing the noise and the invalid data present in the acquired data.
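As a quick sanity check on the acquisition parameters described above, the arithmetic of the slot scheme can be sketched in a few lines (the constant and function names are illustrative only, not part of the mobile application):

```python
# Sketch of the slot-based acquisition schedule: 5-second capture
# slots every 5 minutes, motion sensors sampled roughly every 10 ms.
# Integer milliseconds avoid floating-point division pitfalls.
SLOT_MS = 5_000            # duration of each capture slot (5 s)
SLOT_EVERY_MS = 300_000    # one slot every 5 minutes
SAMPLE_PERIOD_MS = 10      # ~10 ms between motion-sensor samples

def samples_per_slot():
    """Motion-sensor samples captured per axis in one 5-second slot."""
    return SLOT_MS // SAMPLE_PERIOD_MS

def slots_per_hour():
    """Capture slots triggered in one hour of background operation."""
    return 3_600_000 // SLOT_EVERY_MS

print(samples_per_slot())  # 500
print(slots_per_hour())    # 12
```

Each 5-second slot therefore yields roughly 500 samples per motion-sensor axis, and an hour of background operation produces 12 slots.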
For the data captured with the microphone available in the mobile device, the Fast Fourier Transform (FFT) [51] is the best method to apply for the extraction of the frequencies of the audio signal, handling the reduction of the environmental noise. For the data captured with the accelerometer, magnetometer, and gyroscope sensors, the low-pass filter [52] is the best method for the reduction of the noise captured during the ADL with movement.

3.2.2. Feature Extraction

In coherence with our previous studies [4, 14], and based on the filtered data and on the features most often extracted in the studies available in the literature, the features extracted for the methods using acoustic data for the recognition of environments were the 26 MFCC coefficients, the Standard Deviation of the raw signal, the Average of the raw signal, the Maximum value of the raw signal, the Minimum value of the raw signal, the Variance of the raw signal, and the Median of the raw signal. On the other hand, the features extracted from the accelerometer, gyroscope, and magnetometer sensors were the 5 greatest distances between the maximum peaks, the Average of the maximum peaks, the Standard Deviation of the maximum peaks, the Variance of the maximum peaks, the Median of the maximum peaks, the Standard Deviation of the raw signal, the Average of the raw signal, the Maximum value of the raw signal, the Minimum value of the raw signal, the Variance of the raw signal, the Median of the raw signal, and the environment recognized.
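The cleaning and statistical feature steps can be sketched as follows. This is a minimal NumPy illustration: the moving-average low-pass filter and all function names are assumptions for illustration (the filter design is not specified here), and the extraction of the 26 MFCC coefficients is omitted.

```python
import numpy as np

def lowpass_moving_average(signal, window=5):
    """Simple low-pass filter: moving average over `window` samples."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def spectrum_magnitudes(signal):
    """FFT magnitudes of the audio signal (positive frequencies only)."""
    return np.abs(np.fft.rfft(signal))

def statistical_features(signal):
    """Std, mean, max, min, variance, and median of the raw signal."""
    return np.array([
        np.std(signal), np.mean(signal), np.max(signal),
        np.min(signal), np.var(signal), np.median(signal),
    ])

# Example: a noisy 5-second slot sampled every 10 ms (500 samples).
t = np.linspace(0, 5, 500)
raw = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(500)
filtered = lowpass_moving_average(raw)
features = statistical_features(filtered)
print(features.shape)  # (6,)
```

The same six statistics are reused across the dataset definitions in Section 3.3, so a single helper covers both the acoustic and the motion-sensor pipelines.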
3.3. Identification of Activities of Daily Living and their Environments

In continuation of our previous studies [4, 14] using the accelerometer, gyroscope, and magnetometer sensors, this study creates datasets with the features extracted from the acoustic data for the recognition of the environment (Section 3.3.1), the features extracted from the fusion of the accelerometer data and the environment recognized (Section 3.3.2), the features extracted from the fusion of the accelerometer and magnetometer data and the environment recognized (Section 3.3.3), and the features extracted from the fusion of the accelerometer, magnetometer, and gyroscope data and the environment recognized (Section 3.3.4). At the end of this section, the artificial intelligence methods for the recognition of ADL and their environments are presented in Section 3.3.5.

3.3.1. Identification of Environments of Activities of Daily Living using the Microphone

Regarding the features extracted for each environment, four datasets have been constructed with features extracted from the microphone data acquired in the defined environments, having 2000 records from each environment.
The datasets defined are:
• Dataset 1: Composed of the 26 MFCC coefficients, Standard Deviation of the raw signal, Average of the raw signal, Maximum value of the raw signal, Minimum value of the raw signal, Variance of the raw signal, and Median of the raw signal, extracted from the microphone data;
• Dataset 2: Composed of the Standard Deviation of the raw signal, Average of the raw signal, Maximum value of the raw signal, Minimum value of the raw signal, Variance of the raw signal, and Median of the raw signal, extracted from the microphone data;
• Dataset 3: Composed of the Standard Deviation of the raw signal, Average of the raw signal, Variance of the raw signal, and Median of the raw signal, extracted from the microphone data;
• Dataset 4: Composed of the Standard Deviation of the raw signal, and Average of the raw signal, extracted from the microphone data.

3.3.2. Data Fusion of the Environment Recognized with the Accelerometer Data for the Recognition of Standing Activities

Regarding the features extracted for each standing activity, five datasets have been constructed with features extracted from the accelerometer data acquired during the performance of the two standing activities, having 2000 records from each activity. This method allows the distinction between sleeping and watching TV.
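The four environment datasets above are nested subsets of one full feature vector. A minimal sketch of how such subsets could be selected (the index layout and function names are assumptions for illustration, following the feature order given in Section 3.2.2):

```python
import numpy as np

# Full environment feature vector: 26 MFCC coefficients followed by
# [std, mean, max, min, variance, median] of the raw audio signal.
# This index layout is an assumption for illustration.
N_MFCC = 26

def dataset_view(full_vector, dataset):
    """Select the feature subset used by each of the four datasets."""
    stats = full_vector[N_MFCC:]          # the six raw-signal statistics
    subsets = {
        1: full_vector,                   # MFCC + all 6 statistics
        2: stats,                         # all 6 statistics
        3: stats[[0, 1, 4, 5]],           # std, mean, variance, median
        4: stats[[0, 1]],                 # std, mean
    }
    return subsets[dataset]

full = np.arange(N_MFCC + 6, dtype=float)   # stand-in feature vector
print([len(dataset_view(full, d)) for d in (1, 2, 3, 4)])  # [32, 6, 4, 2]
```

Comparing classifiers over these shrinking subsets is what lets the study trade recognition accuracy against the number of features that must be computed on the device.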
The datasets defined are:
• Dataset 1: Composed of the 5 greatest distances between the maximum peaks, Average of the maximum peaks, Standard Deviation of the maximum peaks, Variance of the maximum peaks, Median of the maximum peaks, Standard Deviation of the raw signal, Average of the raw signal, Maximum value of the raw signal, Minimum value of the raw signal, Variance of the raw signal, and Median of the raw signal, extracted from the accelerometer sensor, and the environment recognized with the features defined in Section 3.3.1;
• Dataset 2: Composed of the Average of the maximum peaks, Standard Deviation of the maximum peaks, Variance of the maximum peaks, Median of the maximum peaks, Standard Deviation of the raw signal, Average of the raw signal, Maximum value of the raw signal, Minimum value of the raw signal, Variance of the raw signal, and Median of the raw signal, extracted from the accelerometer sensor, and the environment recognized with the features defined in Section 3.3.1;
• Dataset 3: Composed of the Standard Deviation of the raw signal, Average of the raw signal, Maximum value of the raw signal, Minimum value of the raw signal, Variance of the raw signal, and Median of the raw signal, extracted from the accelerometer sensor, and the environment recognized with the features defined in Section 3.3.1;
• Dataset 4: Composed of the Standard Deviation of the raw signal, Average of the raw signal, Variance of the raw signal, and Median of the raw signal, extracted from the accelerometer sensor, and the environment recognized with the features defined in Section 3.3.1;
• Dataset 5: Composed of the Standard Deviation of the raw signal, and Average of the raw signal, extracted from the accelerometer sensor, and the environment recognized with the features defined in Section 3.3.1.
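The peak-based accelerometer features used in the datasets above can be sketched as follows. This is a minimal NumPy illustration; the local-maximum definition and the function names are assumptions, not the framework's exact algorithm.

```python
import numpy as np

def peak_indices(signal):
    """Indices of local maxima (samples greater than both neighbours)."""
    s = np.asarray(signal, dtype=float)
    return np.where((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:]))[0] + 1

def peak_features(signal, n_distances=5):
    """5 greatest gaps between consecutive peaks, plus peak statistics."""
    idx = peak_indices(signal)
    peaks = np.asarray(signal, dtype=float)[idx]
    gaps = np.sort(np.diff(idx))[::-1][:n_distances]   # largest gaps first
    gaps = np.pad(gaps, (0, n_distances - len(gaps)))  # pad if too few peaks
    stats = [np.mean(peaks), np.std(peaks), np.var(peaks), np.median(peaks)]
    return np.concatenate([gaps, stats])

# Example: a periodic signal with a peak roughly every 20 samples.
t = np.arange(200)
sig = np.sin(2 * np.pi * t / 20)
print(peak_features(sig).shape)  # (9,)
```

The nine resulting values (five gap distances plus the mean, standard deviation, variance, and median of the peaks) correspond to the peak features listed in Dataset 1 above.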
Data fusion of the environment recognized with the Accelerometer and Magnetometer data for the recognition of standing activities

Regarding the features extracted from each standing activity, five datasets were constructed with features extracted from the accelerometer and magnetometer sensors' data acquired during the performance of the two standing activities, with 2000 records per activity. This method allows the distinction between sleeping and watching TV. The datasets defined are:
• Dataset 1: composed of the 5 greatest distances between the maximum peaks; the Average, Standard Deviation, Variance, and Median of the maximum peaks; and the Standard Deviation, Average, Maximum value, Minimum value, Variance, and Median of the raw signal, extracted from the accelerometer and magnetometer sensors, plus the environment recognized with the features defined in section 3.3.1;
• Dataset 2: composed of the Average, Standard Deviation, Variance, and Median of the maximum peaks, and the Standard Deviation, Average, Maximum value, Minimum value, Variance, and Median of the raw signal, extracted from the accelerometer and magnetometer sensors, plus the environment recognized with the features defined in section 3.3.1;
• Dataset 3: composed of the Standard Deviation, Average, Maximum value, Minimum value, Variance, and Median of the raw signal, extracted from the accelerometer and magnetometer sensors, plus the environment recognized with the features defined in section 3.3.1;
• Dataset 4: composed of the Standard Deviation, Average, Variance, and Median of the raw signal, extracted from the accelerometer and magnetometer sensors, plus the environment recognized with the features defined in section 3.3.1;
• Dataset 5: composed of the Standard Deviation and Average of the raw signal, extracted from the accelerometer and magnetometer sensors, plus the environment recognized with the features defined in section 3.3.1.

3.3.4. Data fusion of the environment recognized with the Accelerometer, Magnetometer and Gyroscope data for the recognition of standing activities

Regarding the features extracted from each standing activity, five datasets were constructed with features extracted from the accelerometer, magnetometer and gyroscope sensors' data acquired during the performance of the two standing activities, with 2000 records per activity. This method allows the distinction between sleeping and watching TV.
The datasets defined are:
• Dataset 1: composed of the 5 greatest distances between the maximum peaks; the Average, Standard Deviation, Variance, and Median of the maximum peaks; and the Standard Deviation, Average, Maximum value, Minimum value, Variance, and Median of the raw signal, extracted from the accelerometer, magnetometer and gyroscope sensors, plus the environment recognized with the features defined in section 3.3.1;
• Dataset 2: composed of the Average, Standard Deviation, Variance, and Median of the maximum peaks, and the Standard Deviation, Average, Maximum value, Minimum value, Variance, and Median of the raw signal, extracted from the accelerometer, magnetometer and gyroscope sensors, plus the environment recognized with the features defined in section 3.3.1;
• Dataset 3: composed of the Standard Deviation, Average, Maximum value, Minimum value, Variance, and Median of the raw signal, extracted from the accelerometer, magnetometer and gyroscope sensors, plus the environment recognized with the features defined in section 3.3.1;
• Dataset 4: composed of the Standard Deviation, Average, Variance, and Median of the raw signal, extracted from the accelerometer, magnetometer and gyroscope sensors, plus the environment recognized with the features defined in section 3.3.1;
• Dataset 5: composed of the Standard Deviation and Average of the raw signal, extracted from the accelerometer, magnetometer and gyroscope sensors, plus the environment recognized with the features defined in section 3.3.1.

3.3.5. Artificial Intelligence

Based on the literature review related to the use of acoustic data for the recognition of environments, presented in section 2, one of the most used methods for the recognition of environments is the ANN, reporting better accuracy than other commonly used methods, such as SVM, GMM, and DNN. In addition, based on the literature reviews about the recognition of ADL using accelerometer, magnetometer and gyroscope sensors, presented in our previous studies [4, 14], one of the most used methods for the recognition of ADL is also the ANN, reporting better accuracy than SVM, KNN, Random Forest, and Naïve Bayes; however, the results obtained in these studies [4, 14] showed that DNN reports better accuracy. For the identification of the best methods for the recognition of environments and standing activities proposed in sections 3.3.1, 3.3.2, 3.3.3 and 3.3.4, this study explores three types of neural networks, implemented with different frameworks:
• MLP with Backpropagation, applied with the Neuroph framework [15];
• Feedforward Neural Network with Backpropagation, applied with the Encog framework [16];
• Deep Neural Networks (DNN), applied with the DeepLearning4j framework [17].
Before the implementation of the MLP with Backpropagation and the Feedforward Neural Network with Backpropagation, the datasets should be normalized with the MIN/MAX normalizer [53]; these methods were implemented with both non-normalized and normalized data to verify whether normalization increases the accuracy of the recognition of ADL and environments.
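The MIN/MAX normalizer mentioned above rescales each feature column to the [0, 1] interval before training. A minimal sketch under the usual definition (the frameworks' own normalizers may differ in details such as the target range):

```python
def min_max_normalize(column):
    """Rescale one feature column to [0, 1]: the minimum maps to 0,
    the maximum maps to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: no spread to rescale, map everything to 0.
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]

# Example: one feature column drawn from four records.
scaled = min_max_normalize([2.0, 4.0, 6.0, 10.0])
```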
On the other hand, before the implementation of the DNN method, the datasets should be normalized with the mean and standard deviation [54] and L2 regularization should be applied [55]; this method was also implemented with non-normalized and normalized data to verify whether normalization increases the accuracy of the recognition of ADL and environments. The number of training iterations is another factor that may affect the accuracy of the methods, so we defined three limits for the verification of the best number of iterations for the recognition of ADL and environments: 1M, 2M and 4M. Based on the datasets defined in sections 3.3.1, 3.3.2, 3.3.3 and 3.3.4, the created methods should be implemented in a framework for the recognition of ADL and environments defined in [5-7]. For the recognition of common ADL, as concluded in the previous studies [4, 14], the method that should be implemented is DNN with normalized data and L2 regularization; however, this research will identify the best method for the recognition of the environments, based on the datasets defined in section 3.3.1, and the best methods for the distinction between standing activities, based on the datasets defined in sections 3.3.2, 3.3.3 and 3.3.4.

4. Results

The results of this paper are focused on the creation of one method for the recognition of the environments using the microphone data, and three methods for the recognition of standing activities with different numbers of sensors. Firstly, the results of the creation of a method for the recognition of the environment are presented in section 4.1. Secondly, the results of the creation of a method with the accelerometer sensor are presented in section 4.2. Thirdly, the results of the creation of a method with the accelerometer and magnetometer sensors are presented in section 4.3.
Finally, the results of the creation of a method with the accelerometer, magnetometer, and gyroscope sensors are presented in section 4.4.

4.1. Identification of the environment of the Activities of Daily Living with Microphone

Based on the datasets defined in section 3.3.1, the three types of neural networks proposed in section 3.3.5 were implemented: MLP with Backpropagation, Feedforward Neural Network with Backpropagation, and DNN. The datasets defined for the training and testing phases are composed of 16000 records, where each environment has 2000 records. Firstly, the results of the implementation of the MLP with Backpropagation using the Neuroph framework are presented in figure 1, verifying that the results have very low accuracy with all datasets. With non-normalized data (figure 1-a), the results achieved are between 10% and 15%. And, with normalized data (figure 1-b), the results obtained are between 10% and 20%, where the best results are achieved with dataset 1.

Figure 1 – Results obtained with the Neuroph framework for the different datasets of microphone data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.

Secondly, the results of the implementation of the Feedforward Neural Network with Backpropagation using the Encog framework are presented in figure 2. In general, this type of neural network reports better results with non-normalized data.
With non-normalized data (figure 2-a), the neural network reports results higher than 70% with dataset 1 for all maximum numbers of training iterations, with dataset 2 with 1M training iterations, and with dataset 4 with 4M training iterations. With normalized data (figure 2-b), the neural network reports results below 60%, except with dataset 4 trained with 1M and 2M iterations, where the results achieved are higher than 60%.

Figure 2 – Results obtained with the Encog framework for the different datasets of microphone data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.

Finally, the results of the implementation of DNN with the DeepLearning4j framework are presented in figure 3. With non-normalized data (figure 3-a), the results obtained are below 20% with datasets 1 and 2, and higher than 40% with datasets 3 and 4. On the other hand, with normalized data (figure 3-b), the results reported are around 50% with all datasets.

Figure 3 – Results obtained with the DeepLearning4j framework for the different datasets of microphone data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.
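The evaluation behind figures 1-3 amounts to a grid over the four microphone datasets, the three iteration limits and the two normalization settings, recording the accuracy of each run. A hypothetical sketch of that loop; `train_and_classify` is a stand-in for the actual Neuroph/Encog/DeepLearning4j training runs, not the real API:

```python
def accuracy(predicted, actual):
    """Classification accuracy in percent, as reported on the
    vertical axes of figures 1-3."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * hits / len(actual)

def train_and_classify(dataset, iterations, normalized, labels):
    """Hypothetical stand-in for a real training run: here it just
    predicts the first class for every record."""
    return [labels[0]] * len(labels)

# Experimental grid: 4 datasets x 3 iteration limits x raw/normalized,
# keeping the best accuracy seen for each (dataset, normalized) pair.
test_labels = ["bar", "gym", "bar", "street"]
best = {}
for dataset in (1, 2, 3, 4):
    for iterations in (1_000_000, 2_000_000, 4_000_000):
        for normalized in (False, True):
            predicted = train_and_classify(dataset, iterations,
                                           normalized, test_labels)
            key = (dataset, normalized)
            best[key] = max(best.get(key, 0.0),
                            accuracy(predicted, test_labels))
```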
In table 4, the maximum accuracies achieved with the different types of neural networks are related with the different datasets used for the microphone data and the maximum number of training iterations, verifying that the best results are achieved with the Feedforward Neural Network with Backpropagation with non-normalized data.

Table 4 - Best accuracies obtained with the different frameworks, datasets and numbers of iterations for the recognition of environments using microphone data.

Data            Framework      Dataset  Iterations needed  Best accuracy (%)
Non-normalized  Neuroph        2        1M                 12.86
Non-normalized  Encog          1        2M                 86.50
Non-normalized  Deep Learning  4        4M                 48.11
Normalized      Neuroph        1        1M                 19.43
Normalized      Encog          4        1M                 82.75
Normalized      Deep Learning  4        4M                 48.74

In conclusion, the method for the recognition of the environment that should be implemented in the framework for the recognition of ADL and their environments is the Feedforward Neural Network with Backpropagation using non-normalized data, because it achieves an accuracy of 86.50% with dataset 1.

4.2. Identification of the standing activities with the environment recognized and the Accelerometer sensor

Based on the datasets defined in section 3.3.2, the three types of neural networks proposed in section 3.3.5 were implemented: MLP with Backpropagation, Feedforward Neural Network with Backpropagation, and DNN. The datasets defined for the training and testing phases are composed of 4000 records, where each ADL has 2000 records.
Firstly, the results of the implementation of the MLP with Backpropagation using the Neuroph framework are presented in figure 4, verifying that the results have reliable accuracy with all datasets. With non-normalized data (figure 4-a), the results achieved are between 50% and 100%, where the best accuracy was achieved with datasets 1 and 4. And, with normalized data (figure 4-b), the results obtained are always around 100% with all datasets.

Figure 4 – Results obtained with the Neuroph framework for the different datasets of environment and accelerometer data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.

Secondly, the results of the implementation of the Feedforward Neural Network with Backpropagation using the Encog framework are presented in figure 5, verifying that the results have reliable accuracy with all datasets. With non-normalized data (figure 5-a), the results achieved are around 100%, except with dataset 1, which achieves an accuracy around 50%. And, with normalized data (figure 5-b), the results obtained are always around 100% with all datasets.

Figure 5 – Results obtained with the Encog framework for the different datasets of environment and accelerometer data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.
Finally, the results of the implementation of DNN with the DeepLearning4j framework are presented in figure 6, verifying that the results have reliable accuracy with all datasets. With non-normalized data (figure 6-a), the results obtained are around 100% with datasets 2, 4 and 5 for all training iterations, and with dataset 3 with 4M iterations, but the results obtained with the other datasets are below the expectations. On the other hand, with normalized data (figure 6-b), the results obtained are always around 100% with all datasets.

Figure 6 – Results obtained with the DeepLearning4j framework for the different datasets of environment and accelerometer data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.

In table 5, the maximum accuracies achieved with the different types of neural networks are presented in relation to the different datasets used for the environment recognized and the accelerometer data, and the maximum number of iterations, verifying that the use of all neural networks achieves reliable results.

Table 5 - Best accuracies obtained with the different frameworks, datasets and numbers of iterations for the recognition of standing activities with the accelerometer data and the environments recognized.
Data            Framework      Dataset  Iterations needed  Best accuracy (%)
Non-normalized  Neuroph        1        1M                 100.00
Non-normalized  Encog          2        1M                 100.00
Non-normalized  Deep Learning  2        1M                 100.00
Normalized      Neuroph        1        1M                 100.00
Normalized      Encog          1        1M                 100.00
Normalized      Deep Learning  1        1M                 100.00

Regarding the results obtained, in the case of the use of the environment recognized and the accelerometer data in the module for the recognition of standing activities in the framework for the identification of ADL and their environments, the type of neural network that should be used is a DNN with normalized data, because the results obtained are always 100%.

4.3. Identification of the standing activities with the environment recognized and the Accelerometer and Magnetometer sensors

Based on the datasets defined in section 3.3.3, the three types of neural networks proposed in section 3.3.5 were implemented: MLP with Backpropagation, Feedforward Neural Network with Backpropagation, and DNN. The datasets defined for the training and testing phases are composed of 4000 records, where each ADL has 2000 records. Firstly, the results of the implementation of the MLP with Backpropagation using the Neuroph framework are presented in figure 7, verifying that the results have reliable accuracy with all datasets. With non-normalized data (figure 7-a), the results achieved are around 100%, except with datasets 1 and 5, which achieve an accuracy around 50%. And, with normalized data (figure 7-b), the results obtained are always around 100% with all datasets.

Figure 7 – Results obtained with the Neuroph framework for the different datasets of environment, and accelerometer and magnetometer sensors' data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization.
The figure b) shows the results with normalized data.

Secondly, the results of the implementation of the Feedforward Neural Network with Backpropagation using the Encog framework are presented in figure 8, verifying that the results have reliable accuracy with all datasets. With non-normalized data (figure 8-a), the results achieved are always around 100%. And, with normalized data (figure 8-b), the results obtained are always around 100% with all datasets.

Figure 8 – Results obtained with the Encog framework for the different datasets of environment, and accelerometer and magnetometer sensors' data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.

Finally, the results of the implementation of DNN with the DeepLearning4j framework are presented in figure 9. With non-normalized data (figure 9-a), the results obtained are around 100% with dataset 5 for all training iterations, and with dataset 4 with 1M training iterations, but the results obtained with the other datasets are below the expectations. On the other hand, with normalized data (figure 9-b), the results obtained are always around 100% with all datasets.

Figure 9 – Results obtained with the DeepLearning4j framework for the different datasets of environment, and accelerometer and magnetometer sensors' data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.
In table 6, the maximum accuracies achieved with the different types of neural networks are presented in relation to the different datasets used for the environment recognized, and the accelerometer and magnetometer sensors' data, and the maximum number of iterations, verifying that the use of all neural networks achieves reliable results.

Table 6 - Best accuracies obtained with the different frameworks, datasets and numbers of iterations for the recognition of standing activities with the accelerometer and magnetometer data, and the environments recognized.

Data            Framework      Dataset  Iterations needed  Best accuracy (%)
Non-normalized  Neuroph        4        1M                 99.05
Non-normalized  Encog          2        1M                 100.00
Non-normalized  Deep Learning  3        1M                 89.55
Normalized      Neuroph        1        1M                 100.00
Normalized      Encog          1        1M                 100.00
Normalized      Deep Learning  1        1M                 100.00

Regarding the results obtained, in the case of the use of the environment recognized, and the accelerometer and magnetometer sensors' data in the module for the recognition of standing activities in the framework for the identification of ADL and their environments, the type of neural network that should be used is a DNN with normalized data, because the results obtained are always 100%.

4.4.
Identification of the standing activities with the environment recognized and the Accelerometer, Magnetometer and Gyroscope sensors

Based on the datasets defined in section 3.3.4, the three types of neural networks proposed in section 3.3.5 were implemented: MLP with Backpropagation, Feedforward Neural Network with Backpropagation, and DNN. The datasets defined for the training and testing phases are composed of 4000 records, where each ADL has 2000 records. Firstly, the results of the implementation of the MLP with Backpropagation using the Neuroph framework are presented in figure 10, verifying that the results have reliable accuracy with all datasets. With non-normalized data (figure 10-a), the results achieved are around 100%, except with dataset 1, which achieves an accuracy around 50%. And, with normalized data (figure 10-b), the results obtained are always around 100% with all datasets.

Figure 10 – Results obtained with the Neuroph framework for the different datasets of environment, and accelerometer, magnetometer and gyroscope sensors' data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.

Secondly, the results of the implementation of the Feedforward Neural Network with Backpropagation using the Encog framework are presented in figure 11, verifying that the results have reliable accuracy with all datasets. With non-normalized data (figure 11-a), the results achieved are always around 100%. And, with normalized data (figure 11-b), the results obtained are always around 100% with all datasets.
Figure 11 – Results obtained with the Encog framework for the different datasets of environment, and accelerometer, magnetometer and gyroscope sensors' data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.

Finally, the results of the implementation of DNN with the DeepLearning4j framework are presented in figure 12. With non-normalized data (figure 12-a), the results obtained are around 90% with dataset 5 for all training iterations, but the results obtained with the other datasets are below the expectations. On the other hand, with normalized data (figure 12-b), the results obtained are always around 100% with all datasets.

Figure 12 – Results obtained with the DeepLearning4j framework for the different datasets of environment, and accelerometer, magnetometer and gyroscope sensors' data (horizontal axis) and different maximum numbers of iterations (series), obtaining the accuracy in percentage (vertical axis). The figure a) shows the results with data without normalization. The figure b) shows the results with normalized data.

In table 7, the maximum accuracies achieved with the different types of neural networks are presented in relation to the different datasets used for the environment recognized, and the accelerometer, magnetometer and gyroscope sensors' data, and the maximum number of iterations, verifying that the use of all neural networks achieves reliable results.
Table 7 - Best accuracies obtained with the different frameworks, datasets and numbers of iterations for the recognition of standing activities with the accelerometer, gyroscope and magnetometer data, and the environments recognized.

Data            Framework      Dataset  Iterations needed  Best accuracy (%)
Non-normalized  Neuroph        2        1M                 100.00
Non-normalized  Encog          3        1M                 100.00
Non-normalized  Deep Learning  5        1M                 89.55
Normalized      Neuroph        1        1M                 100.00
Normalized      Encog          1        1M                 100.00
Normalized      Deep Learning  1        1M                 100.00

Regarding the results obtained, in the case of the use of the environment recognized and the accelerometer, magnetometer and gyroscope sensors' data in the module for the recognition of standing activities in the framework for the identification of ADL and their environments, the type of neural network that should be used is a DNN with normalized data, because the results obtained are always 100%.

5. Discussion

This research is included in the development of the framework for the recognition of ADL and their environments, presented in [5-7], composed of several modules, including data acquisition, data processing, data fusion, and artificial intelligence methods. The definition of the method for the identification started in the previous studies [4, 14], where several ADL were recognized using the accelerometer, gyroscope and magnetometer sensors, these being going downstairs, going upstairs, running, walking and standing, with the DNN method, with the normalization of the data and the application of L2 regularization.
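The DNN configuration used throughout combines two ingredients from section 3.3.5: normalization of each feature column with its mean and standard deviation, and an L2 penalty added to the training loss. A minimal numeric sketch of both, assuming the standard definitions (the DeepLearning4j internals may differ):

```python
import statistics

def z_score_normalize(column):
    """Mean/standard-deviation normalization applied before DNN
    training: each value becomes (x - mean) / std."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)
    return [(x - mu) / sigma for x in column]

def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum(w^2)
    (bias terms are typically excluded from the penalty)."""
    return lam * sum(w * w for w in weights)

z = z_score_normalize([1.0, 2.0, 3.0])
# Total objective = data loss plus the weight penalty.
loss = 0.35 + l2_penalty([0.5, -1.0, 2.0], lam=0.01)
```

The penalty discourages large weights, which is one common way to limit overfitting when the training sets are as small as the 2000-records-per-class datasets used here.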
Based on the datasets defined in section 3.3.1, the environments recognized are bar, classroom, gym, kitchen, library, street, hall, watching TV and bedroom, using Feedforward neural networks with non-normalized data. Fusing the environment recognized with the accelerometer, gyroscope and magnetometer sensors' data allowed the recognition of more standing activities (i.e., watching TV and sleeping), increasing the number of ADL recognized at this stage of the development of the framework for the recognition of ADL and environments, as presented in figure 13.

Figure 13 – ADL and environments recognized by the framework for the recognition of ADL and environments.

The choice of the methods for the data fusion and artificial intelligence modules depends on the number of sensors available on the mobile device, using the maximum number of sensors available in order to increase the reliability of the method. In figure 14, a simplified schema for the development of a framework for the identification of ADL is presented.

Figure 14 - Simplified diagram for the framework for the identification of ADL.

Firstly, based on the results obtained in section 4.1, the best results achieved for each type of neural network are presented in table 4, verifying that the best method for the recognition of the environments is the Feedforward neural network with non-normalized data, reporting an accuracy of 86.50%.
Secondly, based on the results obtained with the use of the environment recognized and the accelerometer data, presented in section 4.2, the recognition of standing activities is allowed, and the best results achieved for each type of neural network are presented in table 5, verifying that the best method for the recognition of the standing activities is the DNN method with normalization of the data and the application of L2 regularization, reporting an accuracy of 100%.

Figure 15 - Simplified diagram for the framework for the identification of ADL with indication of the methods for each stage.

Thirdly, based on the results obtained with the use of the environment recognized and the accelerometer and magnetometer sensors' data, presented in section 4.3, the recognition of standing activities is allowed, and the best results achieved for each type of neural network are presented in table 6, verifying that the best method for the recognition of the standing activities is the DNN method with normalization of the data and the application of L2 regularization, reporting an accuracy of 100%.

Finally, based on the results obtained with the use of the environment recognized and the accelerometer, magnetometer and gyroscope sensors' data, presented in section 4.4, the recognition of standing activities is allowed, and the best results achieved for each type of neural network are presented in table 7, verifying that the best method for the recognition of the standing activities is the DNN method with normalization of the data and the application of L2 regularization, reporting an accuracy of 100%.

In conclusion, when the activity was recognized as standing and the environment is correctly identified, the accuracy for the recognition of standing activities is 100%.
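The selection logic described in this discussion (use the richest sensor combination the device offers, and fall back otherwise) can be sketched as a simple dispatch; the sensor and stage names are illustrative, not taken from the framework's actual code:

```python
def choose_standing_activity_method(available_sensors):
    """Pick the recognition stage to run from the sensors present on
    the device, preferring the richest combination (sections 4.2-4.4)."""
    s = set(available_sensors)
    if {"accelerometer", "magnetometer", "gyroscope"} <= s:
        return "DNN: environment + accelerometer + magnetometer + gyroscope"
    if {"accelerometer", "magnetometer"} <= s:
        return "DNN: environment + accelerometer + magnetometer"
    if "accelerometer" in s:
        return "DNN: environment + accelerometer"
    # Without motion sensors only the environment can be recognized.
    return "Feedforward network: environment from microphone only"

method = choose_standing_activity_method(["accelerometer", "magnetometer"])
```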
As presented in Figure 15, at this stage of the development of the framework for the recognition of ADL and their environments, two different artificial intelligence methods are defined, applied in three configurations:
• DNN with normalized data for the general identification of ADL;
• Feedforward neural networks with non-normalized data for the general identification of the environments;
• DNN with normalized data for the identification of standing activities.

6. Conclusions

The development of a framework for the recognition of ADL [1] and their environments using the sensors available in off-the-shelf mobile devices, including the accelerometer, gyroscope, magnetometer, and microphone, with the architecture presented in [5-7], has several modules, such as data acquisition, data processing, data fusion, and artificial intelligence methods. At this stage of the development, the proposed ADL for recognition are running, walking, standing, going upstairs, sleeping, and going downstairs, and the proposed environments for recognition are bar, classroom, gym, kitchen, library, street, hall, watching TV, and bedroom. Depending on the types of sensors, several features were extracted from the sensors' data for further processing. The features extracted from the microphone are 26 MFCC coefficients, and the Standard Deviation, Average, Maximum value, Minimum value, Variance, and Median of the raw signal.
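The six statistical features extracted from the raw microphone signal can be sketched as below. This is a minimal NumPy illustration; the 26 MFCC coefficients would be computed separately with an MFCC implementation (e.g., librosa.feature.mfcc), which is not shown here, and the synthetic frame is an assumption.

```python
import numpy as np

def raw_signal_features(frame: np.ndarray) -> dict:
    """Statistical features extracted from a raw microphone frame,
    as listed in the paper. The 26 MFCC coefficients would be appended
    to this vector by a separate MFCC routine (not shown)."""
    return {
        "std": float(np.std(frame)),
        "average": float(np.mean(frame)),
        "maximum": float(np.max(frame)),
        "minimum": float(np.min(frame)),
        "variance": float(np.var(frame)),
        "median": float(np.median(frame)),
    }

# Synthetic audio frame: four full periods of a sine wave.
frame = np.sin(np.linspace(0, 8 * np.pi, 4000))
feats = raw_signal_features(frame)
print(feats["maximum"], feats["minimum"])
```

For a pure sine over whole periods, the average and median are near zero and the variance is near 0.5, which is a quick sanity check for the extraction code.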
The features extracted from the accelerometer, magnetometer, and gyroscope sensors are the 5 greatest distances between the maximum peaks; the Average, Standard Deviation, Variance, and Median of the maximum peaks; and the Standard Deviation, Average, Maximum value, Minimum value, Variance, and Median of the raw signal. The method developed should be a function of the number of sensors available in off-the-shelf mobile devices, and adapted to the limited resources of these devices. In coherence with the previous studies [4, 14], this research includes the comparison of three different types of neural networks: MLP with Backpropagation using the Neuroph framework [15], the Feedforward Neural Network with Backpropagation using the Encog framework [16], and the DNN using the DeepLearning4j framework [17], verifying that the DNN is the best method for the recognition of general ADL and standing activities, but the Feedforward Neural Network with Backpropagation is the best method for the recognition of environments. The accuracies of the recognition of ADL and their environments differ depending on the stage of the framework for the recognition of ADL and environments. Firstly, the best accuracy for the recognition of the general ADL, presented in previous studies [4, 14], is 85.89%, implementing DNN using L2 regularization and normalized data. Secondly, the best accuracy for the recognition of the environments is 86.50%, implementing Feedforward neural networks with Backpropagation using non-normalized data.
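The peak-based features listed above for the motion sensors can be sketched as follows. This is an illustrative NumPy version: the three-point local-maximum rule used for peak detection is an assumption, not necessarily the exact detector used in this study, and the synthetic accelerometer signal is for demonstration only.

```python
import numpy as np

def motion_features(signal: np.ndarray) -> np.ndarray:
    """Sketch of the peak-based motion-sensor features listed in the
    paper: 5 greatest distances between maximum peaks, statistics of
    the maximum peaks, and statistics of the raw signal."""
    # Peak detection rule (assumption): a sample larger than both neighbours.
    peaks = np.where((signal[1:-1] > signal[:-2]) &
                     (signal[1:-1] > signal[2:]))[0] + 1
    peak_vals = signal[peaks]
    gaps = np.diff(peaks)                  # distances between maximum peaks
    five_greatest = np.sort(gaps)[-5:]     # the 5 greatest distances
    peak_stats = [np.mean(peak_vals), np.std(peak_vals),
                  np.var(peak_vals), np.median(peak_vals)]
    raw_stats = [np.std(signal), np.mean(signal), np.max(signal),
                 np.min(signal), np.var(signal), np.median(signal)]
    # 5 + 4 + 6 = 15 features in total.
    return np.concatenate([five_greatest, peak_stats, raw_stats])

# Synthetic accelerometer-like signal: a sine wave with mild noise.
t = np.linspace(0, 10, 1000)
accel = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(2).normal(size=t.size)
feats = motion_features(accel)
print(feats.shape)
```

A production implementation would typically smooth the signal before peak detection so that noise-induced local maxima do not dominate the peak statistics.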
Finally, the recognition of standing activities is always around 100% with all types of neural networks, but, due to the performance, the best method for implementation in the framework is DNN using L2 regularization and normalized data. As future work, the methods for the recognition of ADL presented in this study should be implemented during the development of the framework for the identification of ADL and their environments, adapting the method to the number of sensors available on the mobile device. The recognition of the environments allows the framework to identify the indoor/outdoor environments where the ADL were performed. The recognition of the environment can also improve the recognition of ADL, increasing the number of ADL recognized. The data related to this research are available in a free repository [50].

Acknowledgements

This work was supported by FCT project UID/EEA/50008/2013. The authors would also like to acknowledge the contribution of the COST Action IC1303 – AAPELE – Architectures, Algorithms and Protocols for Enhanced Living Environments.

References

[1] D. Foti and J. S. Koketsu, "Activities of daily living," Pedretti's Occupational Therapy: Practical Skills for Physical Dysfunction, vol. 7, pp. 157-232, 2013.
[2] L. H. A. Salazar, T. Lacerda, J. V. Nunes, and C. Gresse von Wangenheim, "A Systematic Literature Review on Usability Heuristics for Mobile Phones," International Journal of Mobile Human Computer Interaction, vol. 5, pp. 50-61, 2013. doi: 10.4018/jmhci.2013040103
[3] N. M. Garcia, "A Roadmap to the Design of a Personal Digital Life Coach," in ICT Innovations 2015, ed: Springer, 2016.
[4] I. M. Pires, N. M. Garcia, N. Pombo, and F.
Flórez-Revuelta, "Data Fusion on Motion and Magnetic Sensors embedded on Mobile Devices for the Identification of Activities of Daily Living," In Review.
[5] I. Pires, N. Garcia, N. Pombo, and F. Flórez-Revuelta, "From Data Acquisition to Data Fusion: A Comprehensive Review and a Roadmap for the Identification of Activities of Daily Living Using Mobile Devices," Sensors, vol. 16, p. 184, 2016.
[6] I. M. Pires, N. M. Garcia, and F. Flórez-Revuelta, "Multi-sensor data fusion techniques for the identification of activities of daily living using mobile devices," in Proceedings of the ECMLPKDD 2015 Doctoral Consortium, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, 2015.
[7] I. M. Pires, N. M. Garcia, N. Pombo, and F. Flórez-Revuelta, "Identification of Activities of Daily Living Using Sensors Available in off-the-shelf Mobile Devices: Research and Hypothesis," in Ambient Intelligence - Software and Applications – 7th International Symposium on Ambient Intelligence (ISAmI 2016), 2016, pp. 121-130.
[8] O. Banos, M. Damas, H. Pomares, and I. Rojas, "On the use of sensor fusion to reduce the impact of rotational and additive noise in human activity recognition," Sensors (Basel), vol. 12, pp. 8039-54, 2012. doi: 10.3390/s120608039
[9] M. A. A. Akhoundi and E. Valavi, "Multi-Sensor Fuzzy Data Fusion Using Sensors with Different Characteristics," arXiv preprint arXiv:1010.6096, 2010.
[10] P. Paul and T. George, "An Effective Approach for Human Activity Recognition on Smartphone," 2015 IEEE International Conference on Engineering and Technology (ICETECH), pp. 45-47, 2015. doi: 10.1109/icetech.2015.7275024
[11] Y.-W. Hsu, K.-H. Chen, J.-J. Yang, and F.-S.
Jaw, "Smartphone-based fall detection algorithm using feature extraction," in 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Datong, China, 2016, pp. 1535-1540.
[12] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and Complex Activity Recognition through Smart Phones," in 2012 8th International Conference on Intelligent Environments (IE), Guanajuato, Mexico, 2012, pp. 214-221.
[13] C. Shen, Y. F. Chen, and G. S. Yang, "On Motion-Sensor Behavior Analysis for Human-Activity Recognition via Smartphones," in 2016 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), Sendai, Japan, 2016, pp. 1-6.
[14] I. M. Pires, N. M. Garcia, N. Pombo, and F. Flórez-Revuelta, "Pattern Recognition Techniques for the Identification of Activities of Daily Living using Mobile Device Accelerometer," In Review.
[15] Neuroph. (2017, 2 Sep. 2017). Java Neural Network Framework Neuroph. Available: http://neuroph.sourceforge.net/
[16] Heaton Research. (2017, 2 Sep. 2017). Encog Machine Learning Framework. Available: http://www.heatonresearch.com/encog/
[17] A. Chris Nicholson. (2017, 2 Sep. 2017). Deeplearning4j: Open-source, Distributed Deep Learning for the JVM. Available: https://deeplearning4j.org/
[18] N. D. Lane, M. Mohammod, M. Lin, X. Yang, H. Lu, S. Ali, et al., "BeWell: A smartphone application to monitor, model and promote wellbeing," in 5th International ICST Conference on Pervasive Computing Technologies for Healthcare, 2011, pp. 23-26.
[19] Y. Mengistu, M. Pham, H. M. Do, and W. Sheng, "AutoHydrate: A Wearable Hydration Monitoring System," 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016), pp. 1857-1862, 2016. doi: 10.1109/iros.2016.7759295
[20] M. Nishida, N. Kitaoka, and K.
Takeda, "Daily activity recognition based on acoustic signals and acceleration signals estimated with Gaussian process," in 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015, pp. 279-282.
[21] G. Filios, S. Nikoletseas, C. Pavlopoulou, M. Rapti, and S. Ziegler, "Hierarchical Algorithm for Daily Activity Recognition via Smartphone Sensors," 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT), pp. 381-386, 2015. doi: 10.1109/WF-IoT.2015.7389084
[22] J. R. Delgado-Contreras, J. P. García-Vázquez, R. F. Brena, C. E. Galván-Tejada, and J. I. Galván-Tejada, "Feature Selection for Place Classification through Environmental Sounds," Procedia Computer Science, vol. 37, pp. 40-47, 2014. doi: 10.1016/j.procs.2014.08.010
[23] T. Rahman, A. T. Adams, M. Zhang, E. Cherry, B. Zhou, H. Peng, et al., "BodyBeat: a mobile system for sensing non-speech body sounds," presented at the Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, Bretton Woods, New Hampshire, USA, 2014.
[24] M. Mielke and R. Brück, "Smartphone application for automatic classification of environmental sound," in Proceedings of the 20th International Conference Mixed Design of Integrated Circuits and Systems - MIXDES 2013, 2013, pp. 512-515.
[25] X. Guo, Y. Toyoda, H. Li, J. Huang, S. Ding, and Y. Liu, "Environmental sound recognition using time-frequency intersection patterns," in 2011 3rd International Conference on Awareness Science and Technology (iCAST), 2011, pp. 243-246.
[26] A. Pillos, K. Alghamidi, N. Alzamel, V. Pavlov, and S. Machanavajhala, "A real-time environmental sound recognition system for the Android OS," Proceedings of Detection and Classification of Acoustic Scenes and Events, 2016.
[27] M. Mielke and R.
Brueck, "Design and evaluation of a smartphone application for non-speech sound awareness for people with hearing loss," in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015, pp. 5008-5011.
[28] H. Dubey, M. R. Mehl, and K. Mankodiya, "BigEAR: Inferring the Ambient and Emotional Correlates from Smartphone-Based Acoustic Big Data," in 2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2016, pp. 78-83.
[29] N. D. Lane, P. Georgiev, and L. Qendro, "DeepEar: robust smartphone audio sensing in unconstrained acoustic environments using deep learning," presented at the Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, 2015.
[30] J. Wang, R. Ruby, L. Wang, and K. Wu, "Accurate Combined Keystrokes Detection Using Acoustic Signals," in 2016 12th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), 2016, pp. 9-14.
[31] M. Rossi, S. Feese, O. Amft, N. Braune, S. Martis, and G. Tröster, "AmbientSense: A real-time ambient sound recognition system for smartphones," in 2013 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), 2013, pp. 230-235.
[32] K. Nishijima, S. Uenohara, and K. Furuya, "A Study on the Optimum Number of Training Data in Snore Activity Detection Using SVM," in 2016 10th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS), 2016, pp. 582-584.
[33] K. Nishijima, S. Uenohara, and K. Furuya, "Snore activity detection using smartphone sensors," in 2015 IEEE International Conference on Consumer Electronics - Taiwan, 2015, pp. 128-129.
[34] P. Gaunard, C. G. Mubikangiey, C. Couvreur, and V.
Fontaine, "Automatic classification of environmental noise events by hidden Markov models," in Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, 1998, pp. 3609-3612.
[35] D. Zilli, O. Parson, G. V. Merrett, and A. Rogers, "A hidden Markov model-based acoustic cicada detector for crowdsourced smartphone biodiversity monitoring," J. Artif. Int. Res., vol. 51, pp. 805-827, 2014.
[36] T. Song, X. Cheng, H. Li, J. Yu, S. Wang, and R. Bie, "Detecting driver phone calls in a moving vehicle based on voice features," in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, 2016, pp. 1-9.
[37] Y. A. Chen, J. Chen, and Y. C. Tseng, "Inference of Conversation Partners by Cooperative Acoustic Sensing in Smartphone Networks," IEEE Transactions on Mobile Computing, vol. 15, pp. 1387-1400, 2016. doi: 10.1109/TMC.2015.2465376
[38] E. F. Gomes, F. Batista, and A. M. Jorge, "Using Smartphones to Classify Urban Sounds," presented at the Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, Porto, Portugal, 2016.
[39] H. Lu, W. Pan, N. D. Lane, T. Choudhury, and A. T. Campbell, "SoundSense: scalable sound sensing for people-centric applications on mobile phones," presented at the Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services, Kraków, Poland, 2009.
[40] S. Sigtia, A. M. Stark, S. Krstulovic, and M. D. Plumbley, "Automatic Environmental Sound Recognition: Performance Versus Computational Cost," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, pp. 2096-2107, 2016. doi: 10.1109/taslp.2016.2592698
[41] D. Kelly and B. Caulfield, "Pervasive Sound Sensing: A Weakly Supervised Training Approach," IEEE Trans Cybern, vol. 46, pp. 123-35, Jan 2016. doi: 10.1109/TCYB.2015.2396291
[42] G. T.
Abreha, "An environmental audio-based context recognition system using smartphones," ed, 2014.
[43] F. Saki, A. Sehgal, I. Panahi, and N. Kehtarnavaz, "Smartphone-based real-time classification of noise signals using subband features and random forest classifier," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 2204-2208.
[44] S. Inoue, N. Ueda, Y. Nohara, and N. Nakashima, "Mobile activity recognition for a whole day: recognizing real nursing activities with big dataset," presented at the Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, 2015.
[45] V. Bountourakis, L. Vrysis, and G. Papanikolaou, "Machine Learning Algorithms for Environmental Sound Recognition: Towards Soundscape Semantics," presented at the Proceedings of the Audio Mostly 2015 on Interaction With Sound, Thessaloniki, Greece, 2015.
[46] M. Cheffena, "Fall Detection Using Smartphone Audio Features," IEEE J Biomed Health Inform, vol. 20, pp. 1073-80, Jul 2016. doi: 10.1109/JBHI.2015.2425932
[47] Bq.com. (2017, 2 Sep. 2017). Smartphones BQ Aquaris | BQ Portugal. Available: https://www.bq.com/pt/smartphones
[48] H. Bojinov, Y. Michalevsky, G. Nakibly, and D. Boneh, "Mobile device identification via sensor fingerprinting," arXiv preprint arXiv:1408.1416, 2014.
[49] K. Katevas, H. Haddadi, and L. Tokarchuk, "SensingKit: Evaluating the sensor power consumption in iOS devices," in Intelligent Environments (IE), 2016 12th International Conference on, 2016, pp. 222-225.
[50] ALLab. (2017, September 2nd). August 2017 - Multi-sensor data fusion in mobile devices for the identification of activities of daily living - ALLab Signals. Available: https://allab.di.ubi.pt/mediawiki/index.php/August_2017-_Multi-sensor_data_fusion_in_mobile_devices_for_the_identification_of_activities_of_daily_living
[51] C.
Rader and N. Brenner, "A new principle for fast Fourier transformation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, pp. 264-266, 1976. doi: 10.1109/tassp.1976.1162805
[52] V. Graizer, "Effect of low-pass filtering and re-sampling on spectral and peak ground acceleration in strong-motion records," in Proc. 15th World Conference of Earthquake Engineering, Lisbon, Portugal, 2012, pp. 24-28.
[53] A. Jain, K. Nandakumar, and A. Ross, "Score normalization in multimodal biometric systems," Pattern Recognition, vol. 38, pp. 2270-2285, Dec 2005. doi: 10.1016/j.patcog.2005.01.012
[54] L. Brocca, F. Melone, T. Moramarco, W. Wagner, V. Naeimi, Z. Bartalis, et al., "Improving runoff prediction through the assimilation of the ASCAT soil moisture product," Hydrology and Earth System Sciences, vol. 14, pp. 1881-1893, 2010. doi: 10.5194/hess-14-1881-2010
[55] A. Y. Ng, "Feature selection, L1 vs. L2 regularization, and rotational invariance," in Proceedings of the Twenty-first International Conference on Machine Learning, 2004, p. 78.