A multi-layered energy consumption model for smart wireless acoustic sensor networks

Gert Dekkers (a,b,∗), Fernando Rosas (c,d), Steven Lauwereins (b), Sreeraj Rajendran (b), Sofie Pollin (b), Bart Vanrumste (b), Toon van Waterschoot (b), Marian Verhelst (b), and Peter Karsmakers (a)

(a) Department of Computer Science, KU Leuven, Belgium
(b) Department of Electrical Engineering, KU Leuven, Belgium
(c) Department of Mathematics, Imperial College London, UK
(d) Department of Electrical and Electronic Engineering, Imperial College London, UK

December 18, 2018

Abstract

Smart sensing is expected to become a pervasive technology in smart cities and environments of the near future. These services are improving their capabilities due to integrated devices shrinking in size while maintaining their computational power, which can run diverse Machine Learning algorithms and achieve high performance in various data-processing tasks. One attractive sensor modality for smart sensing is the acoustic sensor, which can convey highly informative data while keeping a moderate energy consumption. Unfortunately, the energy budget of current wireless sensor networks is usually not enough to support the requirements of standard microphones. Therefore, energy efficiency needs to be increased at all layers (sensing, signal processing and communication) in order to bring wireless smart acoustic sensors into the market. To help attain this goal, this paper introduces WASN-EM: an energy consumption model for wireless acoustic sensor networks (WASN), whose aim is to aid in the development of novel techniques to increase the energy efficiency of smart wireless acoustic sensors. This model provides a first step of exploration prior to custom design of a smart wireless acoustic sensor, and can also be used to compare the energy consumption of different protocols.
Keywords: wireless sensor networks, smart acoustic sensing, energy consumption model.

1 Introduction

Recent advances in hardware miniaturization are enabling integrated devices containing wireless radios, processing and sensing capabilities to shrink in size, while their computational power is maintained or even increased [1]. This, along with a recent surge of powerful Machine Learning algorithms that can accomplish various data-driven tasks, has caused a rising interest in smart cities and environments. Such a smart environment typically uses a network of wireless sensors for acquiring information to offer smart functionality [2]. Scenarios where this has been taking place include security, smart cities, health monitoring and entertainment [2-6].

An attractive type of sensor to use in these applications is the acoustic sensor (i.e. the microphone). Compared to other sensors, acoustic sensors can convey highly informative data, including sounds with semantic content (e.g. speech), noises that represent a warning (e.g. screams), sounds with intrinsic meaning in a particular environment (e.g. a water faucet running within a kitchen), etc. Detecting these informative acoustic events can be beneficial for numerous tasks, including speech recognition, surveillance and monitoring, and many others [7].

In order to allow a wireless acoustic sensor network (WASN) to be easily installed, wireless battery-powered architectures are preferred to avoid extensive use of wiring [8]. Unfortunately, this brings additional challenges, as the lifetime of the devices can be compromised by the energy consumption of acoustic sensors and wireless transmission, which usually goes beyond what common current sensor network architectures can provide [9].

∗ For questions related to the model: gert.dekkers@kuleuven.be
Increasing the energy efficiency of sensors can be tackled at the different layers of the processing chain, including the sensing, signal processing and communication modules [10]. In effect, the total amount of consumed energy depends strongly on hardware-dependent parameters. Ideally, one would optimize these parameters for each particular hardware design, but in practice such an approach would require tedious measurement campaigns. From a signal processing point of view, energy is often reduced by limiting the amount of arithmetic operations. Such an approach might not always be valid due to costly memory accesses. Additionally, if the goal is to design a smart wireless acoustic sensor, it is important to know the energy contribution of each layer to motivate an optimization.

The literature provides a number of modeling efforts about various aspects of wireless sensor networks (see [11] and references therein). Substantial effort has been devoted to increasing the energy efficiency of the wireless communication module, ranging from the physical layer [12] and multihop and routing [13, 14] to network layer protocols [15, 16]. Regarding the processing of audio to retrieve relevant information, deep learning has recently become popular [17-19]. Yang et al. have introduced an energy estimation model to estimate the energy consumption of Deep Neural Networks (DNN) [20]. The model is based on power numbers from their Eyeriss DNN accelerator chip and provides an estimate of the energy consumption given an optimized dataflow [21]. The disadvantage of the model is that it is not flexible, as it is limited to the bounds of that particular chip. To the best of our knowledge, no open-source model is available that covers all layers of a smart acoustic sensor.
In order to aid the design of novel configurations that increase the energy efficiency of a range of sensing devices, in this report we introduce WASN-EM: an Energy Model for Wireless Acoustic Sensor Networks. The goal of the model is threefold: (a) to bridge the gap between the machine learning and hardware communities regarding (energy-efficient) design of smart wireless acoustic sensors, (b) to be flexible enough to adjust to various hardware configurations, and (c) to provide simple and open-source software such that the community can contribute. The model can act as a first step prior to custom design of a smart wireless acoustic sensor and provide a common ground for researchers to compare energy consumption, computational complexity and memory storage.

In the sequel, an overview of the proposed model is provided in Section 2, as it is composed of three separate models: sensing, processing and communication. Section 3 elaborates on the modelling of the sensing layer, which includes the microphone, low-noise amplifier and analog-to-digital converter. Section 4 covers the processing layer, where a hardware architecture model is introduced that provides an energy consumption estimate of the arithmetic operations and the accompanying memory accesses. Additionally, some common algorithms for processing audio information are introduced, for which an energy consumption estimate can be obtained using the proposed hardware architecture model. Section 5 elaborates on the model of the communication layer. The model covers the power amplifier and other electronic components based on a hypothetical hardware architecture, along with the effects of re-transmission. The final section provides some guidelines on interesting parameters to experiment with.

2 System model

Let us consider a scenario where the goal is to monitor a particular environment to acquire information about the activities that are taking place.
This could correspond to an apartment, where an automated system, by determining the activities that are taking place, could optimize a range of services including lighting, heating, etc. One way to harvest information about an environment is to deploy a WASN consisting of multiple acoustic sensor nodes with wireless communication capabilities, and a central connection/processing device that can gather and process the sensed data (see Figure 1). Each node consists of an acoustic sensing, a processing and a communication module, and is capable of: (1) capturing and digitizing acoustic information, (2) processing the resulting acoustic data to provide a meaningful output and/or to reduce the amount of bits to communicate, (3) transmitting the processed information to a central connection/processing point, and (4) receiving data from a central connection/processing point.

Figure 1: Scenario description (a smart wireless acoustic sensor, comprising a sensing, a processing and a communication layer, connected to a central connection/processing point).

Given the aforementioned scenario, let us consider a single duty cycle where a single node measures the environment during $\Delta$ seconds and subsequently does some processing on the data. Consequently, this generates $N_T$ bits of information, spending $E_S$ and $E_P$ joules in the sensing and data processing steps respectively. The processed information is divided into $N_T / (r_u L_u)$ forward frames, where $L_u$ is the number of payload bits per frame in the uplink direction and $r_u$ is its code rate, which are transmitted directly to a central connection point (sink) using designated time-slots. After each frame transmission trial, the sink sends back a feedback frame which acknowledges correct reception or requests a re-transmission.
Similarly, the communication module can receive $N_R / (r_d L_d)$ frames, where $N_R$ is the total amount of received informative bits and $L_d$ and $r_d$ are the number of payload bits and the code rate in the downlink direction. Hence, the total energy consumption of the audio sensor node can be modeled as follows:

$$\bar{E}_{node} = E_S + E_P + N_T \bar{E}_T + N_R \bar{E}_R . \tag{1}$$

Above, $\bar{E}_T$ and $\bar{E}_R$ are the average total energy consumption per information bit that is correctly transmitted and received, respectively.

Let us assume that the node has to be sensing the environment a fraction $\delta$ of the time (i.e. its duty cycle). Let us also assume that the node carries $n_b$ batteries with an energy capacity of $B$ joules each. Then, by neglecting the energy consumption of the node when it is in sleep mode, the lifetime of the node can be estimated as

$$L = \delta^{-1} \frac{n_b B}{\bar{E}_{node}} \Delta . \tag{2}$$

In the following, we will develop expressions for $E_S$, $E_P$ and $\bar{E}_T$.

3 Energy consumption of sensing

The energy expended in acoustic sensing can be expressed as follows:

$$E_S = E_{mic} + E_{LNA,mic} + E_{ADC,mic} . \tag{3}$$

Above, the energy consumed by the analog front-end $E_S$ consists of the energy consumed by the microphone $E_{mic}$, the consumption of the low-noise amplifier (LNA) $E_{LNA,mic}$ and the consumption of the analog-to-digital converter (ADC) in digitizing the signal $E_{ADC,mic}$.

Figure 2: Microphone analog front-end (LNA and ADC, marked in grey) along with the processing layer (DSP unit).

The power consumption of the microphone can be expressed as follows:

$$P_{mic} = \begin{cases} 0 & \text{if passive mic or switched off,} \\ P_{mic,act} & \text{if active and powered on,} \end{cases} \tag{4}$$

so the energy consumption is given by

$$E_{mic} = \begin{cases} 0 & \text{if passive mic,} \\ \Delta P_{mic,act} & \text{if active.} \end{cases} \tag{5}$$

The energy consumption of the LNA can be calculated as

$$E_{LNA,mic} = I_{LNA,mic} V_{dd}^{LNA,mic} \Delta . \tag{6}$$

Above, $V_{dd}^{LNA,mic}$ is the voltage supply level and $I_{LNA,mic}$ is the average current drawn by the LNA, which can be calculated as

$$I_{LNA,mic} = \frac{\pi\, u_T\, 4 k T\, W_{ADC}}{2} \left( \frac{NEF}{v^{rms}_{n,in}} \right)^2 , \tag{7}$$

where $k$ is the Boltzmann constant, $T$ is the temperature in Kelvin, $u_T = kT/q_e$ with $q_e$ equal to the charge of the electron, $W_{ADC}$ is the ADC bandwidth, $v^{rms}_{n,in}$ is the RMS voltage of the noise at the input of the LNA, and $NEF$ is the noise efficiency factor, which was proposed in [22] and whose value in average designs is between 5 and 10. Typical values for $v^{rms}_{n,in}$ are

$$v^{rms}_{n,in} = \begin{cases} 10\ \mu\text{V} & \text{for passive microphones,} \\ 100\ \mu\text{V} & \text{for active microphones.} \end{cases} \tag{8}$$

The energy consumption of the ADC can be computed as

$$E_{ADC,mic} = P_{ADC,mic} \Delta . \tag{9}$$

Above, $P_{ADC,mic}$ is the power consumption of the ADC, which can be calculated as

$$P_{ADC,mic} = 2^{n_{mic}} f_{s,mic}\, FOM , \tag{10}$$

where $n_{mic}$ is the resolution of the ADC, $f_{s,mic}$ is the sampling frequency and $FOM$ is the figure of merit of the ADC. An overview of other relevant hardware-related parameters with the used values is given in Appendix A, Table 2.

4 Energy consumption of processing

The goal of the (optional) local processing is to translate the raw audio information to a lower dimension in order to reduce the amount of communicated bits. The processed information could already be the final required output (e.g. a classification output) or the output of a feature extraction stage. From a signal processing point of view, energy consumption is often reduced by limiting the amount of arithmetic operations. Such an approach might not always be valid due to costly memory accesses. Here, the energy consumption $E_P$ due to processing of the acquired information is defined as

$$E_P = \overbrace{E_{cc} \sum_{j=1}^{J_{ALU}} c_j\, n_j^{DSP}}^{E_{op}} + \overbrace{\Delta \sum_{k=1}^{J_{MEM}} \left( E_{ma,k} M_{a,k} + E_{ms,k} M_{s,k} \right)}^{E_m} , \tag{11}$$

which consists of the consumption due to arithmetic operations $E_{op}$ and due to memory $E_m$.
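As a numerical aside on the sensing layer of Section 3, the following minimal sketch evaluates Eqs. (3) to (10). The function names, the default temperature of 300 K and every numeric value in the usage note are illustrative assumptions, not values taken from the model or its appendix; the noise voltage is assumed to be given in volts.

```python
import math

K_BOLTZMANN = 1.380649e-23      # Boltzmann constant k, in J/K
Q_ELECTRON = 1.602176634e-19    # electron charge q_e, in C


def lna_current(nef, v_rms_noise, bandwidth_hz, temperature_k=300.0):
    """Average LNA current following Eq. (7); v_rms_noise is in volts."""
    u_t = K_BOLTZMANN * temperature_k / Q_ELECTRON  # thermal voltage u_T
    thermal = math.pi * u_t * 4.0 * K_BOLTZMANN * temperature_k * bandwidth_hz / 2.0
    return thermal * (nef / v_rms_noise) ** 2


def adc_power(n_bits, f_s_hz, fom):
    """ADC power following Eq. (10): 2^n * f_s * FOM."""
    return (2 ** n_bits) * f_s_hz * fom


def sensing_energy(delta_s, p_mic_active_w, i_lna_a, v_dd_v, p_adc_w):
    """Sensing energy following Eqs. (3), (5), (6) and (9).

    Pass p_mic_active_w = 0.0 for a passive microphone.
    """
    e_mic = delta_s * p_mic_active_w
    e_lna = i_lna_a * v_dd_v * delta_s
    e_adc = p_adc_w * delta_s
    return e_mic + e_lna + e_adc
```

For instance, `sensing_energy(1.0, 0.0, lna_current(5, 10e-6, 8000.0), 1.8, adc_power(12, 16000.0, 1e-12))` would give the sensing energy for one second of a passive microphone under these hypothetical settings.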
Regarding the energy consumed due to the arithmetic operations $E_{op}$ in Eq. (11), $E_{cc}$ is the energy consumption per clock cycle, $c_j$ is the number of clock cycles required by the $j$-th arithmetic operation, which is performed $n_j^{DSP}$ times during the digital signal processing, and $J_{ALU}$ is the number of different arithmetic operations the microprocessor performs. In the model we distinguish the following arithmetic operations: 1) multiply-accumulate (MAC), 2) addition and subtraction, 3) multiplication, 4) division, 5) comparison (including maximum and minimum), 6) natural exponentiation and 7) logarithm. Depending on the hypothetical hardware architecture (e.g. CPU, ASIC, ...), each of these operations takes a different amount of clock cycles and has a different energy cost per clock cycle. This model focuses on a microcontroller-based wireless acoustic sensor without any hardware acceleration. A model for a custom chip is not provided, but in general such chips could provide energy gains of a factor of 500 to 1000 for the processing layer [24].

Figure 3: The hardware architecture model (ALUs with on-chip and off-chip memory).

The energy consumed by the memory, $E_m$, is decomposed into the energy required for accessing and storing, which depends on the amount of bits accessed $M_{a,k}$ or stored $M_{s,k}$ in a particular memory $k$. The energy consumed by accessing and storing one bit is defined by $E_{ma,k}$ and $E_{ms,k}$ respectively, for $k = 1, \ldots, J_{MEM}$ with $J_{MEM}$ the amount of available memories. Typical hardware architectures have multiple memory types available, each having a different energy consumption for storage and access. The least consuming memory is typically close to the processor unit but limited in size. When the needed memory is not sufficient, data movement is needed to and from more consuming memories. It is therefore important to maximize data reuse on the least consuming memories to limit data movement [25].
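The structure of Eq. (11) can be sketched in a few lines. This is only an illustration that follows the equation literally, including the factor $\Delta$ in front of the memory sum; the function name and the tuple layout of the arguments are assumptions made for this example.

```python
def processing_energy(e_cc, ops, delta, memories):
    """Evaluate Eq. (11) as written.

    e_cc     -- energy per clock cycle (E_cc)
    ops      -- iterable of (c_j, n_j): cycles per operation and number of executions
    delta    -- duration Delta of the duty cycle
    memories -- iterable of (E_ma_k, M_a_k, E_ms_k, M_s_k), one tuple per memory k
    Returns the pair (E_op, E_m); their sum is E_P.
    """
    e_op = e_cc * sum(c_j * n_j for c_j, n_j in ops)
    e_m = delta * sum(e_ma * m_a + e_ms * m_s
                      for e_ma, m_a, e_ms, m_s in memories)
    return e_op, e_m
```

Keeping `E_op` and `E_m` separate makes it easy to see whether arithmetic or memory dominates for a given algorithm, which is the comparison the model is meant to support.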
In this model we assume: a) an architecture with on- and off-chip memory as shown in Figure 3, b) equal energy cost for each operation per clock cycle, c) equal cost for memory reads and writes, and d) in-place computation, such that memory accesses can easily be derived from a particular arithmetic operation. Additionally, it is explicitly defined where the information should be stored/accessed for each building block in the processing chain. An overview of the parameters of the hardware architecture model is given in Appendix A, Table 3.

In the following subsections, an explanation and an energy model are provided for some typical algorithms used in the field of automatic sound recognition. The depicted problem is that of classifying an input audio stream into one of a set of classes. A typical system to solve such a problem consists of Mel-Frequency Cepstral Coefficients as feature extraction followed by a (Deep) Neural Network based architecture as classifier [17-19]. Both consist of several modular building blocks, which will be described in the two following subsections.

Figure 4: Mel-Frequency Cepstral Coefficients feature extraction process. The raw acoustic data is transformed to the feature domain by applying (1) framing and windowing, (2) Discrete Fourier Transform, (3) Mel-filterbank, (4) logarithm and (5) Discrete Cosine Transform.

4.1 Feature extraction: Mel-Frequency Cepstral Coefficients

The Mel-Frequency Cepstral Coefficients (MFCC) feature extraction algorithm originates from the domain of automatic speech recognition and is based on the perception of sound by the human auditory system [26]. Although MFCC was developed for that task, it has been shown to be usable for automatic sound recognition as well, due to its ability to represent the amplitude spectrum in a compact form [17-19].
Figure 4 gives an overview of the MFCC feature extraction process, which involves the following main components: (1) framing and windowing, (2) Discrete Fourier Transform (DFT), (3) Mel-frequency filterbank, (4) logarithmic operation and (5) Discrete Cosine Transform (DCT). In recent years, related to the popularity of deep learning, researchers tend to use an intermediate output of the MFCC algorithm (the Mel-frequency filterbank) or even the raw audio waveform, since the neural network is able to learn a feature representation from the provided data. To date, the building blocks of MFCC are still among the dominant signal processing algorithms used for speech and audio classification tasks. In the following subsections these building blocks are explained.

4.1.1 Framing and windowing

The framing and windowing operation of the feature extraction process splits the raw acoustic waveform into short overlapping segments. These segments, further called frames, are typically 30 ms long with an overlap of 10 ms. Each frame $f$ is then typically windowed with a Hamming window $h$ to reduce spectral leakage. The windowing operation is defined as $s_n = h_n f_n$ for $n = 0, 1, \ldots, N_t - 1$, with $N_t$ the amount of samples in one frame. The amount of operations for this stage consists of $N_t$ multiply-accumulates for one frame. As it can be computed in-place, the total needed storage is $2 \cdot N_t \cdot S$ bits, where $S$ represents the word size. As a multiply-accumulate needs four memory accesses, this results in a total of $4 \cdot N_t$ memory accesses.

4.1.2 Discrete Fourier transform

The framing and windowing operation is followed by the DFT, which transforms the frame $s$ in the time domain into a frame $z$ in the frequency domain. Typically, the Fast Fourier Transform (FFT) can be used, which is a computationally efficient variant for computing the DFT [27]. Here, a frame is zero-padded to the next power of two.
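To make the framing parameters and the zero-padding concrete, the sketch below derives the frame length $N_t$, the hop size and the zero-padded length from a sampling rate, and tallies the per-frame windowing cost stated in Section 4.1.1. The function names, the 16 kHz sampling rate and the 16-bit word size in the usage note are assumptions for illustration only.

```python
def frame_sizes(fs_hz, frame_ms=30, overlap_ms=10):
    """Return (N_t, hop, N_f): frame length, hop size and zero-padded length in samples."""
    n_t = fs_hz * frame_ms // 1000
    hop = fs_hz * (frame_ms - overlap_ms) // 1000
    n_f = 2 ** (n_t - 1).bit_length()   # smallest power of two not below N_t
    return n_t, hop, n_f


def windowing_cost(n_t, word_bits):
    """Per-frame cost of windowing as counted in Section 4.1.1."""
    return {
        "mac": n_t,                          # N_t multiply-accumulates
        "storage_bits": 2 * n_t * word_bits, # 2 * N_t * S bits
        "mem_accesses": 4 * n_t,             # four accesses per MAC
    }
```

With a hypothetical 16 kHz input, `frame_sizes(16000)` yields frames of 480 samples, a hop of 320 samples and a padded length of 512 samples.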
In case of a radix-2 FFT implementation, the amount of operations is $N_f/2 \cdot \log_2(N_f)$ complex multiplications and $N_f \cdot \log_2(N_f)$ complex additions, with $N_f$ the length of the (zero-padded) input frame. A complex multiplication is assumed to consist of 4 multiplications and 2 additions. By assuming an in-place algorithm, the total needed storage is $N_t \cdot S$ bits along with $5 \cdot N_f \cdot \log_2(N_f)$ memory accesses.

4.1.3 log Mel-frequency transform

As a first step, only the power spectrum $|z|^2$ up to $N_f/2$ samples is retained, since studies have shown that the amplitude of the spectrum is of more importance than the phase information. Then, the Mel-frequency filterbank smooths the high-dimensional magnitude spectrum such that it reflects the sensitivity of the human auditory system to frequency shifts, where the lower frequencies are perceptually of more importance than the higher ones. The filterbank is defined by overlapping triangular frequency response bandpass filters with a constant spacing and bandwidth in the Mel-frequency domain. Typically, the number of bands $N_m$ is set in the range between 20 and 60 [17, 18]. The log Mel features can be computed using

$$m_b = \log \left( \sum_{k=0}^{N_f/2 - 1} W_{bk} |z_k|^2 \right) , \tag{12}$$

with $b = 0, 1, \ldots, N_m - 1$ and $W_{bk}$ the weight of the Mel filterbank in band $b$ at frequency $k$. In a final step, a logarithm operation is performed on the Mel features, which is also motivated by human perception of sound, as humans hear loudness on a logarithmic scale. The amount of operations sums to $N_f/2 \cdot N_m$ multiply-accumulates and $N_m$ logarithms. In total, $(N_f/2 \cdot N_m + N_m) \cdot S$ bits need to be stored along with $2 \cdot N_f \cdot N_m + 2 \cdot N_m$ memory accesses.

4.1.4 Discrete Cosine Transform

The Discrete Cosine Transform (DCT) expresses the Mel features in terms of a sum of cosine functions. These cosine functions describe the amplitude envelope of the Mel features.
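The operation counts given in Sections 4.1.2 and 4.1.3 can be tallied with a short helper. This is a sketch under the counting conventions stated in the text; the function names and the dictionary keys are hypothetical, and the example values in the usage note (a 512-point FFT, 40 Mel bands, 16-bit words) are assumptions.

```python
def fft_cost(n_f):
    """Radix-2 FFT counts for a zero-padded frame of length n_f (a power of two)."""
    if n_f == 0 or n_f & (n_f - 1):
        raise ValueError("n_f must be a power of two")
    stages = n_f.bit_length() - 1            # log2(n_f)
    complex_mult = n_f // 2 * stages
    return {
        "complex_mult": complex_mult,
        "complex_add": n_f * stages,
        "real_mult": 4 * complex_mult,       # 4 real multiplications per complex one
        "mem_accesses": 5 * n_f * stages,
    }


def log_mel_cost(n_f, n_m, word_bits):
    """Counts for the log Mel-frequency transform of Section 4.1.3."""
    macs = n_f // 2 * n_m
    return {
        "mac": macs,
        "log": n_m,
        "storage_bits": (macs + n_m) * word_bits,
        "mem_accesses": 2 * n_f * n_m + 2 * n_m,
    }
```

For example, `fft_cost(512)` and `log_mel_cost(512, 40, 16)` give the per-frame counts that would then be fed, together with cycles per operation, into the processing-energy expression of Eq. (11).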
A limited set of DCT coefficients (typically 14) is retained, as these contain the elementary aspects of the shape. The DCT on the log Mel features is defined as

$$c_d = \sum_{b=0}^{N_m - 1} m_b \cos \left[ d \left( b + \frac{1}{2} \right) \frac{\pi}{N_m} \right] , \tag{13}$$

with $d = 0, 1, \ldots, N_c - 1$ and $[c_0, c_1, \ldots, c_{N_c - 1}]$ the MFCC feature vector with length $N_c$. The amount of operations consists of $N_m \cdot N_c$ multiply-accumulates. In total, $(N_m + 1) \cdot N_c$ bits need to be stored along with $4 \cdot N_m \cdot N_c$ memory accesses.

4.2 Classifier: Artificial Neural Network

Artificial Neural Networks, inspired by biological neural networks, automatically learn tasks based on provided data (and desired output) [25]. A (Deep) Neural Network architecture is highly customizable and constructed using multiple layers. The following subsections briefly introduce several common layers used in (Deep) Neural Network architectures, along with the amount of operations.

4.2.1 Fully-Connected layer

The main building block of the Fully-Connected (FC) layer is an artificial neuron, which is modelled using a modified version of a perceptron. A perceptron is a linear classifier able to discriminate two classes [28]. The formal definition of the modified perceptron is $f(x) = \sigma(w^T x)$, with $x$ the input vector, $w$ the learned weight vector and $\sigma$ an activation function to (non-linearly) transform the output value. The input vector $x$ is augmented with a scalar of value 1 to allow for a shift in the discriminating hyperplane. Different from the original perceptron, the activation function is not restricted to a threshold. A perceptron can be extended to multi-class classification by stacking multiple perceptrons (one for each class), which is referred to as a multi-class perceptron. When used in Neural Networks, this is denoted as an FC layer.
To allow for non-linear classification, multiple FC layers can be concatenated, where each output of the previous layer is connected to all inputs of that particular layer. In between those layers, a non-linear activation function should be used to create the non-linearity. At the output layer, the activation function typically consists of a Softmax operation to provide a probabilistic output. These activations are defined in Section 4.2.2. The amount of operations for the FC layer is $L_n (L_i + 1)$ multiply-accumulates, with $L_n$ the amount of neurons and $L_i$ the size of the input vector. In total, $L_n \cdot (L_i + 2) \cdot S$ bits need to be stored along with $4 \cdot L_n \cdot (L_i + 1)$ memory accesses.

4.2.2 Activation function

Activation functions introduce non-linear properties to the Neural Network. An activation function performs a non-linear mapping on the output of an artificial neuron, either to provide input to a following layer or to provide a probabilistic output at the final layer. Numerous activations have been proposed over the past decades. Currently, the most commonly used are the Rectified Linear Unit (ReLU), Logistic and Tanh [19]. ReLU is described as $\sigma(z_k) = \max(0, z_k)$, with $z$ the output of the previous layer and $k = 0, 1, \ldots, L_n - 1$. As it does not provide a probabilistic output, it is only used in between layers. Logistic and Tanh are defined as $\sigma(z_k) = 1 / (1 + e^{-z_k})$ and $\sigma(z_k) = 2 / (1 + e^{-2 z_k}) - 1$ respectively. At the output layer, a Softmax function is typically used to compute the probabilities for each class. The Softmax is defined as $\sigma(z_k) = e^{z_k} / \sum_{j=0}^{K-1} e^{z_j}$. For ReLU, the total amount of operations devoted to the activation function is $L_n$ comparisons along with $3 \cdot L_n$ memory accesses. The Tanh activation function consists of $2 \cdot L_n$ additions, and $L_n$ divisions and exponentials, along with $12 \cdot L_n$ memory accesses.
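The counts stated above for an FC layer and for the ReLU and Tanh activations can be collected in small helper functions, as sketched below. The function names and dictionary keys are hypothetical, and the layer sizes in the usage note are arbitrary example values.

```python
def fc_cost(l_i, l_n, word_bits):
    """Fully-connected layer with l_i inputs and l_n neurons (Section 4.2.1)."""
    macs = l_n * (l_i + 1)                   # +1 accounts for the bias input
    return {
        "mac": macs,
        "storage_bits": l_n * (l_i + 2) * word_bits,
        "mem_accesses": 4 * macs,
    }


def relu_cost(l_n):
    """ReLU over l_n neuron outputs (Section 4.2.2)."""
    return {"compare": l_n, "mem_accesses": 3 * l_n}


def tanh_cost(l_n):
    """Tanh over l_n neuron outputs (Section 4.2.2)."""
    return {"add": 2 * l_n, "div": l_n, "exp": l_n, "mem_accesses": 12 * l_n}
```

As an example, `fc_cost(14, 64, 16)` describes a layer of 64 neurons fed with a 14-dimensional MFCC vector and 16-bit words; these sizes are assumptions chosen only to exercise the formulas.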
In case of Softmax and Logistic, the total amount of operations for the activation function sums to $L_n$ additions, divisions and exponentials, along with $9 \cdot L_n$ memory accesses.

4.2.3 Convolutional layer

In case of input data with more than one dimension, connecting all inputs to an FC layer might lead to an unreasonable amount of weights. A convolutional layer is similar to an FC layer, as it is also made up of artificial neurons that have learnable weights. A convolutional layer, however, convolves the input data with multiple so-called templates. It is assumed that a particular template, smaller than the input data, can be reused at multiple positions in the input data. At each convolution index these templates are locally connected to the input data and output one activation. Due to the weight sharing, the amount of weights is reduced compared to directly using an FC layer. The hyperparameters that define a convolutional layer are the number of templates $T_n$, the template dimensions $T_{d,k}$, the convolution strides $T_{s,k}$ and the amount of zero padding $T_{p,k}$ on the input data for a particular dimension index $k$. The output size of a convolutional layer for a particular template is defined as $L_{o,k} = (L_{i,k} - T_{d,k} + 2 T_{p,k}) / T_{s,k} + 1$, with $L_{i,k}$ and $L_{o,k}$ the length of the input and output data respectively at dimension $k$. This leads to a total amount of operations of $T_n \cdot \prod_{k=0}^{L_d} \left( (L_{i,k} - T_{d,k} + 2 T_{p,k}) / T_{s,k} + 1 \right) \cdot \left( \prod_{k=0}^{L_d} T_{d,k} + 1 \right)$ multiply-accumulates. The amount of memory accesses is the amount of operations multiplied by 4. The total needed storage consists of $T_n \cdot \left( \prod_{k=0}^{L_d} T_{d,k} + 1 \right) \cdot S$ bits for the weights and $\left( \prod_{k=0}^{L_d} L_{o,k} \right) \cdot S$ bits for the output.

4.2.4 Pooling layer

It is common practice to introduce a pooling layer in between successive convolutional layers. Such a layer performs undersampling on the previous output to reduce overfitting and the amount of parameters and computation in the network.
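The convolutional-layer expressions of Section 4.2.3 can be evaluated with the following sketch. It assumes that the division in the output-size formula is an integer (floor) division and that the per-dimension hyperparameters are passed as equal-length sequences; both the function names and these conventions are assumptions of this example.

```python
from math import prod


def conv_output_shape(in_shape, template, stride, pad):
    """Output length per dimension: (L_i - T_d + 2*T_p) / T_s + 1, using floor division."""
    return [(l_i - t_d + 2 * t_p) // t_s + 1
            for l_i, t_d, t_s, t_p in zip(in_shape, template, stride, pad)]


def conv_cost(in_shape, n_templates, template, stride, pad, word_bits):
    """Operation, memory-access and storage counts of Section 4.2.3."""
    out_shape = conv_output_shape(in_shape, template, stride, pad)
    weights_per_template = prod(template) + 1        # +1 for the bias
    macs = n_templates * prod(out_shape) * weights_per_template
    return {
        "mac": macs,
        "mem_accesses": 4 * macs,
        "weight_bits": n_templates * weights_per_template * word_bits,
        "output_bits": prod(out_shape) * word_bits,  # as stated in the text
    }
```

For a hypothetical 40 x 100 log Mel input with eight 3 x 3 templates, unit strides and one sample of zero padding per side, `conv_output_shape((40, 100), (3, 3), (1, 1), (1, 1))` returns the unchanged size `[40, 100]`.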
Similar to a convolutional layer, the pooling layer iterates over the entire input data. Different from a convolutional layer, instead of a matrix product of the locally connected data and a template, it calculates a summary of the current locally connected part of the data. This summary could be either an average or a maximum operation. The pooling layer therefore consists of $\prod_{k=0}^{L_d} \left( (L_{i,k} - T_{d,k} + 2 T_{p,k}) / T_{s,k} + 1 \right) \cdot \left( \prod_{k=0}^{L_d} T_{d,k} - 1 \right)$ operations. The amount of memory accesses is the amount of operations multiplied by 3. The needed output storage sums to $\left( \prod_{k=0}^{L_d} L_{o,k} \right) \cdot S$ bits.

4.2.5 Batch Normalization

Batch Normalization (BN) was introduced to compensate for the so-called internal covariate shift that slows down the training of the network [29]. During training, BN performs a standard normalization on each mini-batch, on the output of the activations of each layer separately. During the test phase, this adds an additional shift and scale to each activation output. For a particular layer, the amount of operations is $L_n$ additions and multiplications, which results in $6 \cdot L_n$ memory accesses. The total stored information is $2 \cdot L_n \cdot S$ bits for the shift and scale.

5 Energy consumption of communications

This section focuses on describing and modeling the energy consumption of the communication module of the sensor node. We assume that the node is equipped with $N_t$ transmitting and $N_r$ receiving antennas and corresponding transceiver branches [33]. If a node has only one antenna, then $N_t = N_r = 1$ is used. By default, the node is assumed to be in a low power consumption (sleep) mode [10]. At its designated time the node wakes up and engages in the transmission and reception of frames with the central connection point.
In case of transmission, if $x$ attempts are decoded with errors, the transmitter will declare an outage and go to sleep mode for one coherence time of the channel. Let us denote as $\tau_{out}$ the number of outages and $\tau_x$ the number of transmission trials required to achieve a decoding without errors ($\tau_x \in \{1, \ldots, x\}$). These are random variables with mean values given by $\bar{\tau}_{out}$ and $\bar{\tau}_x$, whose values depend on the modulation, coding scheme and fading statistics [34]. The energy consumption per correctly transmitted information bit, $\bar{E}_T$, can be modeled as [35]

$$\bar{E}_T = (1 + \bar{\tau}_{out}) \frac{E_{st}}{N_T} + E_{enc} + (E_{etx,b} + E_{PA,b} + E_{erx,fb}) (x \bar{\tau}_{out} + \bar{\tau}_x) . \tag{14}$$

Above, $E_{st}$ is the startup energy required to awake the node from the low power mode, $E_{enc}$ is the energy required to encode the forward message, $E_{etx,b}$ and $E_{erx,fb}$ are the energy consumption of the baseband and radio-frequency electronic components that perform the forward transmission and the reception of the feedback frame respectively, and $E_{PA,b}$ is the energy consumption of the power amplifier (which is responsible for the electromagnetic irradiation) for sending an information bit. By analogy, the total energy used per correctly received bit, which involves demodulating forward frames and transmitting the feedback frames, can be modeled as [35]

$$\bar{E}_R = (1 + \bar{\tau}_{out}) \frac{E_{st}}{N_R} + (E_{dec} + E_{erx,b} + E_{etx,fb} + E_{PA,fb}) (x \bar{\tau}_{out} + \bar{\tau}_x) . \tag{15}$$

Above, $E_{erx,b}$ and $E_{etx,fb}$ are the energy consumption of the baseband and radio-frequency electronic components that perform the forward reception and the transmission of the feedback frame respectively, and $E_{PA,fb}$ is the energy consumption of the power amplifier for transmitting feedback frames.
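Equations (14) and (15) translate directly into two small functions, sketched below. The function and argument names are hypothetical; all energies are per information bit except the startup energy, and the mean retransmission statistics are taken as given inputs rather than derived from a channel model.

```python
def energy_per_tx_bit(e_st, n_t_bits, e_enc, e_etx_b, e_pa_b, e_erx_fb,
                      x, tau_out, tau_x):
    """Average energy per correctly transmitted bit, following Eq. (14)."""
    trials = x * tau_out + tau_x             # mean number of transmission trials
    return ((1 + tau_out) * e_st / n_t_bits
            + e_enc
            + (e_etx_b + e_pa_b + e_erx_fb) * trials)


def energy_per_rx_bit(e_st, n_r_bits, e_dec, e_erx_b, e_etx_fb, e_pa_fb,
                      x, tau_out, tau_x):
    """Average energy per correctly received bit, following Eq. (15)."""
    trials = x * tau_out + tau_x
    return ((1 + tau_out) * e_st / n_r_bits
            + (e_dec + e_erx_b + e_etx_fb + e_pa_fb) * trials)
```

With an error-free link (`tau_out = 0`, `tau_x = 1`) both expressions reduce to the startup energy per bit plus the per-bit component energies, which is a convenient sanity check when plugging in measured values.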
5.1 Modeling the energy consumption of the PA

Let us express $E_{PA}$, the total consumption due to the irradiated power, as

$$E_{PA} = E_{PA,b} + E_{PA,fb} = \sum_{j=1}^{N_t} P_{PA}^{(j)} T_b + \sum_{j=1}^{N_t} P_{PA}^{(j)} T_{fb}, \tag{16}$$

where $P_{PA}^{(j)}$ is the power consumption of the PA of the $j$-th transceiver branch. The total time per information bit in the forward direction, $T_b$, is calculated as [33]

$$T_b = \frac{1}{r R_s} \left( \frac{1}{\omega b} + \frac{H}{\omega L} + \frac{N_t O_a + O_b}{L} \right), \tag{17}$$

where $R_s$ is the physical layer symbol rate, $r$ is the code rate of the coding scheme (the fraction of payload bits that carry data), $\omega$ is the multiplexing gain of the MIMO modulation, $b = \log_2(M)$ is the number of bits per complex symbol, $H$ and $L$ are the number of bits in the header and payload of the frame, $O_a$ is the acquisition overhead per transceiver branch and $O_b$ is the remaining overhead, which is approximately independent of the antenna array size (both $O_a$ and $O_b$ are measured in bits [36]). Similarly, the total time per feedback frame, $T_{fb}$, is given by

$$T_{fb} = \frac{F}{r \omega R_s L}, \tag{18}$$

where $F$ is the number of bits of the feedback frame.

Let us relate the power consumption of the PAs with the signal-to-noise ratio (SNR). The $j$-th transmit antenna radiates $P_{tx}^{(j)}$ watts, which are provided by the corresponding power amplifier (PA). The PA's power consumption is modeled by [36]

$$P_{PA}^{(j)} = \frac{1}{\eta} P_{tx}^{(j)}, \tag{19}$$

where $\eta$ is the average efficiency of the PA. In general, the average PA efficiency can be modeled more precisely using the distribution of the output power of the underlying signal. If we limit the analysis to linear PAs, such as Class A and Class B PAs (as many mobile and wireless communication devices require linear PAs), then we can approximate $\eta$ with

$$\eta = \left( \frac{\bar{P}_{tx}}{P_{\max}} \right)^{\beta} \eta_{\max}, \tag{20}$$

where $\bar{P}_{tx}$ is the average radiated power (which we assume is the same for all transmitter antennas), $P_{\max}$ is the maximal PA output and

$$\eta_{\max,\text{class A}} = 0.5 \quad \text{and} \quad \beta_{\text{class A}} = 1, \tag{21}$$

$$\eta_{\max,\text{class B}} = 0.785 \quad \text{and} \quad \beta_{\text{class B}} = 0.5. \tag{22}$$

In these equations, $P_{\text{back-off}} = P_{\max} / \bar{P}_{tx}$ is the back-off of the PA. The highest efficiency is achieved by constant-envelope signals, for which $P_{\text{back-off}} = 1$. In general, one can calculate the back-off coefficient as $P_{\text{back-off}} = \xi / S$, where $\xi$ is the peak-to-average power ratio of the modulation (which is usually calculated as $\xi = 3(\sqrt{M} - 1)/(\sqrt{M} + 1)$) and $S$ accounts for any additional back-off that may be taken when the wireless link has excess link budget and the transmit power can be decreased further. Finally, the relationship between the PA consumption $P_{PA}$ and the radiated power $P_{tx}$ is calculated as

$$P_{tx}^{(j)} = \left( \frac{S}{\xi} \right)^{\beta} \eta_{\max} P_{PA}^{(j)}. \tag{23}$$

The transmission power attenuates over the air with path loss and arrives at the receiver with a mean power given by

$$P_{rx}^{(j)} = \frac{P_{tx}^{(j)}}{A_0 d^{\alpha}}, \tag{24}$$

where $d$ is the distance between transmitter and receiver and $\alpha$ is the path loss exponent. Above, $A_0$ is a parameter that is defined by the free-space Friis equation

$$A_0 = \frac{1}{G_t G_r} \left( \frac{4\pi}{\lambda} \right)^2, \tag{25}$$

where $G_t$ and $G_r$ are the transmitter and receiver antenna gains and $\lambda$ is the carrier wavelength. Finally, if $\sigma_s^2$ is the average received power per symbol at the input point of the decision stage of the receiver (which is located after the MIMO decoder), the total received signal power is given by

$$\sum_{j=1}^{N_t} \bar{P}_{rx}^{(j)} = \omega \sigma_s^2 = \omega \sigma_n^2 \bar{\gamma}, \tag{26}$$

where $\sigma_n^2$ is the thermal noise power and $\bar{\gamma}$ is the average SNR. In general, $\sigma_n^2 = N_0 W N_f M_L$, where $N_0$ is the power spectral density of the baseband-equivalent additive white Gaussian noise (AWGN), $W$ is the transmission bandwidth, $N_f$ is the noise figure of the receiver's front end and $M_L$ is a link margin term which represents any other additive noise or interference [37].
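As a rough numerical illustration of equations (17) to (26), the following Python sketch chains the timing, efficiency and link-budget expressions. It is not the authors' MATLAB implementation: the function names are hypothetical, and the values chosen for $r$, $\omega$, $\xi$, $d$ and the identification $N_0 = kT$ are assumptions made only for this example. The remaining numbers are taken from Table 4, with byte quantities converted to bits.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, used for the assumed N_0 = k * T


def db_to_linear(value_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def time_per_bit(r, R_s, omega, M, H, L, N_t, O_a, O_b):
    """Forward time per information bit T_b, eq. (17); H, L, O_a, O_b in bits."""
    b = math.log2(M)
    return (1.0 / (r * R_s)) * (
        1.0 / (omega * b) + H / (omega * L) + (N_t * O_a + O_b) / L
    )


def feedback_time(F, r, omega, R_s, L):
    """Feedback time T_fb, eq. (18); F and L in bits."""
    return F / (r * omega * R_s * L)


def pa_efficiency(xi, S, beta, eta_max):
    """Average PA efficiency, eq. (20), with back-off P_max / P_tx = xi / S."""
    return (S / xi) ** beta * eta_max


def friis_constant(G_t, G_r, wavelength):
    """Free-space constant A_0, eq. (25)."""
    return (4.0 * math.pi / wavelength) ** 2 / (G_t * G_r)


def total_pa_power(snr, omega, d, alpha, A_0, sigma_n2, eta):
    """Total PA power for a target average SNR, chaining (26), (24) and (19)."""
    p_rx = omega * sigma_n2 * snr     # total received power, eq. (26)
    p_tx = p_rx * A_0 * d ** alpha    # undo the path loss, eq. (24)
    return p_tx / eta                 # PA consumption, eq. (19)


if __name__ == "__main__":
    # Table 4 defaults; r, omega, xi, d and N_0 = k * T are assumptions.
    T_b = time_per_bit(r=0.5, R_s=0.125e6, omega=1, M=2,
                       H=16, L=1016, N_t=1, O_a=32, O_b=8)
    T_fb = feedback_time(F=40, r=0.5, omega=1, R_s=0.125e6, L=1016)
    sigma_n2 = BOLTZMANN * 290.0 * 1e6 * db_to_linear(16) * db_to_linear(20)
    A_0 = friis_constant(G_t=1.8, G_r=1.8, wavelength=3e8 / 2.4e9)
    eta = pa_efficiency(xi=1.0, S=1.0, beta=0.5, eta_max=0.785)
    p_pa = total_pa_power(db_to_linear(25), 1, 10.0, 3.2, A_0, sigma_n2, eta)
    print(f"T_b = {T_b:.3e} s, T_fb = {T_fb:.3e} s, PA power = {p_pa:.3e} W")
```

The example uses $\xi = 1$ (a constant-envelope signal) instead of evaluating the peak-to-average formula for $M = 2$, because that formula yields a ratio below one for BPSK, which would push the efficiency in (20) above $\eta_{\max}$.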
With all this, one finds that

\begin{align}
\sum_{j=1}^{N_t} P_{PA}^{(j)} &= \left( \frac{\xi}{S} \right)^{\beta} \frac{1}{\eta_{\max}} \sum_{j=1}^{N_t} P_{tx}^{(j)} \tag{27} \\
&= \left( \frac{\xi}{S} \right)^{\beta} \frac{A_0 d^{\alpha}}{\eta_{\max}} \sum_{j=1}^{N_t} P_{rx}^{(j)} \tag{28} \\
&= \left( \frac{\xi}{S} \right)^{\beta} \frac{N_0 W N_f M_L A_0}{\eta_{\max}} \omega d^{\alpha} \bar{\gamma} \tag{29} \\
&= A \omega d^{\alpha} \bar{\gamma}, \tag{30}
\end{align}

with $A = (\xi/S)^{\beta} N_0 W N_f M_L A_0 / \eta_{\max}$ a constant.

5.2 Modeling the energy consumption of the other electronic components

Let us assume that the device is equipped with $N_t$ antennas, using an architecture as shown in Figure 5. Then, one can express the electronic consumption of the transmitter per information bit as [33]

$$E_{etx,b} = N_t P_{etx} T_b = N_t (P_{DAC} + 2 P_{filter} + P_{LO} + P_{mixer}) T_b, \tag{31}$$

where $P_{etx}$ is the power consumption of the electronic components (filters, mixer, DAC and local oscillator) that perform the transmission per transceiver branch. A similar equation for $E_{etx,fb}$ can be obtained by replacing $T_b$ with $T_{fb}$, which are defined in (17) and (18) respectively.

[Figure 5: MIMO architecture considered in this work. Transmission branches: MIMO encoder, DAC, filters, mixer and local oscillator (LO). Reception branches: filters, LNA, mixer, LO, VGA, ADC and MIMO decoder.]

Analogously, one can show that

$$E_{erx,b} = N_r P_{erx} T_b = N_r (3 P_{filter} + P_{LNA} + P_{LO} + P_{mixer} + P_{VGA} + P_{ADC}) T_b, \tag{32}$$

where $P_{erx}$ is the power consumption of the electronic components (filters, LNA, mixer, ADC, VGA and local oscillator) that perform the forward and feedback frame reception per branch. A similar equation can be obtained for $E_{erx,fb}$.

Following [36], we will model the energy consumption of DACs as $P_{DAC} = \beta (P_{DAC}^{static} + P_{DAC}^{dyn})$, where $P_{DAC}^{static}$ (resp. $P_{DAC}^{dyn}$) is the static (resp. dynamic) power consumption and $\beta$ is a correcting factor to incorporate some second-order effects.
If a binary-weighted current-steering DAC is considered [38], then

$$P_{DAC}^{static} = V_{dd} I_{unit} \, \mathbb{E}\left\{ \sum_{i=0}^{n_1 - 1} 2^i b_i \right\} = \frac{1}{2} V_{dd} I_{unit} (2^{n_1} - 1), \tag{33}$$

where $n_1$ is the resolution, $b_i$ are independent Bernoulli random variables with parameter $1/2$, $V_{dd}$ is the power supply voltage and $I_{unit}$ is the unit current source corresponding to the least significant bit. The dynamic consumption can be approximated as $P_{DAC}^{dyn} = (1/2) n_1 C_p f_s^{DAC} V_{dd}^2$, where $C_p$ is the parasitic capacitance of each switch, the factor $1/2$ is the switching probability and $f_s^{DAC}$ is the sampling frequency. Hence, the total consumption of the DAC is expressed as

$$P_{DAC} = \frac{\beta}{2} \left[ V_{dd} I_{unit} (2^{n_1} - 1) + n_1 C_p f_s^{DAC} V_{dd}^2 \right]. \tag{34}$$

In turn, the ADC consumption can be computed using (10).

5.3 Modeling the energy consumption of encoding and decoding forward frames

The computations required for encoding and decoding data frames can be demanding, depending on the choice of coding scheme [35]. Therefore, it makes sense to include the energy costs of these operations in the energy budget. However, for simplicity we neglect the coding and decoding costs of headers and feedback frames, which usually are either uncoded or use lightweight codes whose processing can safely be neglected.

Since the encoding has to be done for each frame, its cost is shared among the $L_u$ payload bits. Similarly to (11), the energy consumption for encoding one frame, normalized per payload bit, is given by

$$E_{enc} = \frac{1}{N_T} E_{cc} \sum_{j=1}^{J_{ALU}} c_j n_j^{enc}, \tag{35}$$

with $n_j^{enc}$ the number of times the $j$-th arithmetic operation is performed. Note that it is straightforward to write an equation for the decoding cost equivalent to (35). More information on the computational complexity of the error correction code used (a BCH code) can be found in [35].
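The DAC expressions (33) and (34) and the encoding cost (35) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the correcting factor $\beta$ defaults to 1 because no value is given in Table 4, and the operation counts passed to the encoding function are placeholders rather than figures for any real BCH encoder.

```python
def mean_code_value(n1):
    """E{sum_i 2^i b_i} for i.i.d. Bernoulli(1/2) bits, by enumeration.

    Every n1-bit code is equally likely, so the expectation in eq. (33)
    is the plain average of 0 .. 2^n1 - 1.
    """
    return sum(range(2 ** n1)) / 2 ** n1


def dac_power(n1, V_dd, I_unit, C_p, f_s, beta=1.0):
    """Total DAC power in watts, eq. (34); beta = 1 is an assumption."""
    static = 0.5 * V_dd * I_unit * (2 ** n1 - 1)   # eq. (33)
    dynamic = 0.5 * n1 * C_p * f_s * V_dd ** 2     # switching term
    return beta * (static + dynamic)


def encoding_energy_per_bit(N_T, E_cc, costs, counts):
    """Encoding energy normalized per bit, eq. (35).

    costs[op]  : cost c_j of operation op (e.g. from Table 3)
    counts[op] : hypothetical number of times n_j the operation is executed
    """
    total_ops = sum(costs[op] * counts[op] for op in counts)
    return E_cc * total_ops / N_T


if __name__ == "__main__":
    # Table 4 defaults for the transmit DAC.
    p_dac = dac_power(n1=10, V_dd=3.0, I_unit=10e-6, C_p=1e-12, f_s=4e6)
    print(f"DAC power: {p_dac * 1e3:.3f} mW")
    # The enumeration agrees with the closed form (2^n1 - 1) / 2 used in (33).
    print(mean_code_value(10), (2 ** 10 - 1) / 2)
```

With the Table 4 defaults the static term dominates the dynamic one by roughly two orders of magnitude, so in this sketch the DAC resolution $n_1$ matters far more than the sampling frequency.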
Finally, following [35], our modeling does not include the cost of memory storage and access, which is left for future work.

5.4 Re-transmission Statistics

For computing the statistics of retransmissions due to decoding errors, let us derive expressions for $\bar{\tau}_{out}$ and $\bar{\tau}_x$ following [35]. As the outage declarations are independent events, $\tau_{out}$ is a geometric random variable with probability mass function (p.m.f.) $P\{\tau_{out} = j\} = (1 - q_x) q_x^j$, where $q_x = 1 - P\{\tau \leq x\}$ is the outage probability. Then, a direct calculation shows that its mean value is given by

$$\bar{\tau}_{out} = \frac{q_x}{1 - q_x}. \tag{36}$$

The p.m.f. of $\tau_x$ is given by

$$P\{\tau_x = t\} = P\{\tau = t \mid \tau \leq x\} = \frac{P\{\tau = t\}}{1 - q_x} \tag{37}$$

and, hence, its mean value is calculated as

$$\bar{\tau}_x = \sum_{t=1}^{x} t \cdot P\{\tau_x = t\} = \frac{1}{1 - q_x} \sum_{t=1}^{x} t \cdot P\{\tau = t\}. \tag{38}$$

Finally, one can find that

$$x \bar{\tau}_{out} + \bar{\tau}_x = \frac{1}{1 - q_x} \left( x q_x + \sum_{t=1}^{x} t \, P\{\tau = t\} \right). \tag{39}$$

The values of $\bar{\tau}_{out}$ and $\bar{\tau}_x$ depend strongly on the correlations of the wireless channels. Let us consider two extreme examples: fast fading and static fading channels. In fast fading channels the frame error rates of the transmission trials are i.i.d. random variables, while in static channels they are the same random variable. Then, one has that $q_x^{ff} = \bar{P}_f^x$ and $q_x^{bf} = \mathbb{E}\{P_f^x\}$, and a direct application of the Jensen inequality shows that $q_x^{ff} \leq q_x^{bf}$. Also, by defining $\Phi(x)$ as

$$\Phi(x) = \frac{1}{1 - \mathbb{E}\{P_f^x\}} \, \mathbb{E}\left\{ \frac{1 - P_f^x}{1 - P_f} \right\}, \tag{40}$$

it can be shown that $x \bar{\tau}_{out}^{bf} + \bar{\tau}_x^{bf} = \Phi(x)$ and $x \bar{\tau}_{out}^{ff} + \bar{\tau}_x^{ff} = \Phi(1)$. It can also be shown that $\Phi(x)$ is an increasing function, so that $\Phi(x) \geq \Phi(1)$ and static fading scenarios usually require more transmission trials than fast fading ones.

Particular functional forms can be given for $P_f$ depending on the error correcting code scheme in use. For the sake of concreteness, in the sequel we present a derivation valid for BCH codes, following the derivation shown in [39].
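The retransmission statistics (36) to (40) are easy to check numerically. The sketch below is a minimal illustration with hypothetical function names; the two-point distribution used for $P_f$ in the static fading case is an arbitrary assumption chosen only to show the ordering between $\Phi(x)$ and $\Phi(1)$.

```python
def geometric_pmf(p_f, x):
    """P{tau = t} for t = 1..x when every trial fails independently with p_f."""
    return [p_f ** (t - 1) * (1.0 - p_f) for t in range(1, x + 1)]


def retransmission_stats(pmf, x):
    """Return (q_x, mean tau_out, mean tau_x) from pmf[t-1] = P{tau = t}."""
    success = sum(pmf[:x])                 # P{tau <= x} = 1 - q_x
    q_x = 1.0 - success
    tau_out = q_x / success                # eq. (36)
    tau_x = sum(t * p for t, p in enumerate(pmf[:x], start=1)) / success  # (38)
    return q_x, tau_out, tau_x


def phi(x, pf_values, weights):
    """Eq. (40), with the expectation taken over a discrete law for P_f."""
    e_pf_x = sum(w * p ** x for p, w in zip(pf_values, weights))
    e_ratio = sum(w * (1.0 - p ** x) / (1.0 - p)
                  for p, w in zip(pf_values, weights))
    return e_ratio / (1.0 - e_pf_x)


if __name__ == "__main__":
    x = 3
    # Fast fading: i.i.d. trials with mean frame error rate 0.3.
    q_x, tau_out, tau_x = retransmission_stats(geometric_pmf(0.3, x), x)
    print("fast fading :", x * tau_out + tau_x, "vs Phi(1) =",
          phi(1, [0.3], [1.0]))
    # Static fading: P_f is 0.1 or 0.5 with equal probability (same mean 0.3).
    print("static fading: Phi(x) =", phi(x, [0.1, 0.5], [0.5, 0.5]))
```

In the fast fading case the left-hand side of (39) collapses to $1/(1 - \bar{P}_f)$ regardless of $x$, which is exactly $\Phi(1)$; the static fading value is larger for the same mean frame error rate.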
Let us denote as $n$ the length of each codeword and, assuming that $n < L$, let us define $n_c = L/n$ ($n_c \in \mathbb{N}$) as the number of codewords per payload. In order to decode a frame correctly, one needs to correctly obtain $H$ header symbols and $n_c$ codewords with at least $(n - t) = \lambda$ correct symbols, where $t$ is the maximum number of bits that the FEC block code is able to correct in each codeword. Therefore, by taking into consideration the various possible permutations, $\bar{P}_f$ can be expressed in terms of the bit error rate of the $M$-ary modulation $P_b(\gamma)$ and the binary modulation symbol error rate $P_{bin}(\gamma)$ as

$$\bar{P}_f(\bar{\gamma}) = 1 - \left[ 1 - \bar{P}_{bin}(\bar{\gamma}) \right]^{H} \left[ \sum_{j=0}^{t} \binom{n}{j} \left( 1 - \bar{P}_b(\bar{\gamma}) \right)^{n-j} \bar{P}_b(\bar{\gamma})^{j} \right]^{n_c}. \tag{41}$$

Above, $\gamma$ corresponds to the signal-to-noise ratio and $\bar{\gamma} = \mathbb{E}\{\gamma\}$. Also, please note that we are using the shorthand notation $\bar{P}_{bin}(\bar{\gamma}) = \mathbb{E}\{P_{bin}(\gamma)\}$ and $\bar{P}_b(\bar{\gamma}) = \mathbb{E}\{P_b(\gamma)\}$, and that (41) is only valid for scenarios that experience fast-fading conditions. Finally, let us remark that simple methods to approximate the error rates of MIMO channels are available in [40].

5.5 Final model

Using the material presented in the above subsections, one can finally express (14) as

$$\bar{E}_T = \frac{1}{N_T} \left( \frac{E_{st}}{1 - q_x} + E_{cc} \sum_{j=1}^{J_{ALU}} c_j n_j^{enc} \right) + \left[ (N_t P_{etx} + A d^{\alpha} \omega \bar{\gamma}) T_b + N_t P_{erx} T_{fb} \right] \Phi(x). \tag{42}$$

This equation allows one to express the average energy consumption per successfully transferred bit in terms of a number of design parameters. Similarly, the average energy consumption per successfully received bit can be expanded to

$$\bar{E}_R = \frac{E_{st}}{(1 - q_x) N_R} + \left[ \frac{1}{N_R} E_{cc} \sum_{j=1}^{J_{ALU}} c_j n_j^{dec} + N_r P_{erx} T_b + (N_r P_{etx} + A d^{\alpha} \omega \bar{\gamma}) T_{fb} \right] \Phi(x). \tag{43}$$

An overview of the relevant hardware-related parameters along with the values used is given in Appendix A, Table 4.
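To make (41) and (42) concrete, the following Python sketch evaluates the BCH frame error rate and the transmit-side energy per bit. It is a hedged illustration, not the authors' MATLAB code: the function names are hypothetical, the codeword length $n = 127$ (so that $n_c = 8$ for a 1016-bit payload) and the bit error rates in the example are assumptions, and `E_enc_total` stands for the product $E_{cc} \sum_j c_j n_j^{enc}$.

```python
from math import comb


def frame_error_rate(p_bin, p_b, H, n, t, n_c):
    """Average frame error rate for a t-error-correcting block code, eq. (41)."""
    header_ok = (1.0 - p_bin) ** H
    # Probability that a codeword contains at most t bit errors.
    codeword_ok = sum(
        comb(n, j) * (1.0 - p_b) ** (n - j) * p_b ** j for j in range(t + 1)
    )
    return 1.0 - header_ok * codeword_ok ** n_c


def energy_per_tx_bit(N_T, E_st, q_x, E_enc_total, N_t, P_etx, P_erx,
                      A, d, alpha, omega, snr, T_b, T_fb, phi_x):
    """Average energy per successfully transferred bit, eq. (42)."""
    fixed = (E_st / (1.0 - q_x) + E_enc_total) / N_T
    per_trial = (N_t * P_etx + A * d ** alpha * omega * snr) * T_b \
        + N_t * P_erx * T_fb
    return fixed + per_trial * phi_x


if __name__ == "__main__":
    # H = 16 bits and t = 4 follow Table 4; n, n_c and the error rates are
    # illustrative assumptions.
    p_f = frame_error_rate(p_bin=1e-3, p_b=1e-2, H=16, n=127, t=4, n_c=8)
    print(f"frame error rate: {p_f:.4f}")
    # Fast fading with x = 3 trials: q_x = p_f^x and Phi(1) = 1 / (1 - p_f).
    print(f"outage probability: {p_f ** 3:.3e}, Phi(1) = {1 / (1 - p_f):.4f}")
```

The per-trial term is multiplied by $\Phi(x)$ while the start-up and encoding costs are not, which is why a stronger code (larger $t$) can pay off even though it adds overhead to every frame.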
6 Model analysis

The proposed model contains separate blocks for each of the three considered layers (sensing, processing and communication), and each of them is populated by a considerable number of parameters. However, some of these parameters are more interesting to explore than others. Although we do not develop an exhaustive exploration of all these parameters, the remainder of this section provides some guidelines on aspects of interest that can be explored.

Table 1 provides a summary of some parameters from the sensing and communication layers that have a strong influence on the energy efficiency of the sensor node. ∗ In the case of the sensing layer, $f_{s,mic}$ and $n_{mic}$ are interesting parameters, as they have a big impact on the consumed energy because they regulate the amount of information to be processed and communicated. Naturally, more collected information means a higher energy consumption.

In the case of the communication layer, one parameter to explore is the amount of communicated bits $N_T$, which depends on both the processing and sensing layers. An interesting trade-off is the energy spent in processing versus the energy spent in communication. Another parameter of interest is the number of bits $t$ the FEC can correct. A higher number leads to more communication frame overhead but fewer re-transmissions due to errors, as shown in Refs. [35, 39]. Another parameter of interest is the transmission bandwidth $W$, which is usually determined by the communication standard that the network uses (Zigbee, Bluetooth, etc.). In general a higher bandwidth is beneficial, as transmission times (and hence the baseline electronic consumption of the transceiver) are reduced. Please note that the effect of interference is not included in this model and is left for future work. Finally, one can explore the impact of the communication channel via the path-loss coefficient $\alpha$ and the SNR $e$.
In busy scenarios with many obstacles the path loss is higher and the received signal strength is smaller, which usually causes more re-transmissions and, in turn, a lower energy efficiency.

| Parameter | Description | Value |
|---|---|---|
| $f_{s,mic}$ | Sampling frequency of the microphone | 16 kHz |
| $n_{mic}$ | ADC resolution | 12 bit |
| $N_T$ | Amount of bits to communicate | - |
| $t$ | Number of bits the FEC can correct | 4 |
| $W$ | Transmission bandwidth | 1 MHz |
| $\alpha$ | Path-loss coefficient | 3.2 |
| $e$ | SNR of communication | 25 dB |

Table 1: Experimental parameters along with the default value in the source code

∗ The processing layer is not covered in the table; one can compare various feature extraction and classifier architectures, or no processing at all.

A Default parameters used in the MATLAB implementation

| Parameter | Description | Value |
|---|---|---|
| $T$ | Room temperature | 290 K |
| $P_{mic,act}$ | Active microphone power consumption | 10 mW |
| $V_{dd}^{LNA,mic}$ | Voltage supply of the LNA-mic | 1.5 V |
| NEF | Noise efficiency factor | 6 |
| $FOM_{ADC}$ | Figure of merit of the ADC | 500 fJ/conv. [23] |
| $f_{s,mic}$ | Sampling frequency of the microphone | 16 kHz |
| $n_{mic}$ | ADC resolution | 12 bit |

Table 2: Overview of the default parameters used in the MATLAB implementation for the sensing layer

| Parameter | Description | Value |
|---|---|---|
| $E_{op}$ | Energy per operation (GP proc.) | 500 pJ [24, 30] |
| $E_{op}$ | Energy per operation (GP DSP) | 100 pJ [24] |
| $E_{ma}$ | Energy per memory access (on-chip SRAM) | 100 fJ/bit [31] |
| $E_{ma}$ | Energy per memory access (off-chip SRAM) | 100 pJ/bit [32] |
| $E_{ma}$ | Energy per memory access (off-chip DRAM) | 100 pJ/bit [24] |
| $E_{ms}$ | Energy memory leakage (on-chip SRAM) | 50 pW/bit [31] |
| $E_{ms}$ | Energy memory leakage (off-chip SRAM) | 10 pW/bit [32] |
| $E_{ms}$ | Energy memory leakage (off-chip DRAM) | 75 pW/bit [24] |
| $S$ | Word size | 32 bit |
| $c_{mac}$ | Multiply-accumulate cost | 2 ops. [30] |
| $c_{add}$ | Addition cost | 1 op. [30] |
| $c_{mul}$ | Multiplication cost | 1 op. [30] |
| $c_{div}$ | Division cost | 8 ops. [30] |
| $c_{cmp}$ | Comparator cost | 1 op. [30] |
| $c_{exp}$ | Natural exponential cost | 2 ops. [30] |
| $c_{log}$ | Logarithmic cost | 25 ops. [30] |

Table 3: Overview of the default parameters used in the MATLAB implementation for the processing layer

| Parameter | Description | Value |
|---|---|---|
| $E_{st}$ | Start-up energy | 94 µJ [41] |
| $P_{filter}$ | Filter power consumption | 1 mW [42] |
| $P_{mixer}$ | Mixer power consumption | 1 mW [42] |
| $P_{LNA,rx}$ | LNA power consumption | 3 mW [42] |
| $P_{VGA}$ | VGA power consumption | 5 mW [42] |
| $P_{LO}$ | Local oscillator consumption | 22.5 mW [42] |
| $n_1$ | Resolution of the Tx DAC | 10 levels [36] |
| $f_s^{DAC}$ | DAC sampling frequency | 4 MHz |
| $V_{dd}^{DAC}$ | Voltage supply of the DAC | 3 V [36] |
| $I_{unit}$ | DAC unit current source | 10 µA [36] |
| $C_p$ | DAC parasitic capacitance | 1 pF [36] |
| $n_2$ | Resolution of the Rx ADC | 10 levels [36] |
| $f_s^{ADC}$ | ADC sampling frequency | 4 MHz |
| $\eta_{\max}$ | PA efficiency (Class B) | 0.785 |
| $\beta$ | Exponent for Class B PA | 0.5 |
| $S$ | Additional back-off coefficient | 0 dB |
| $G_t$ | Transmitter antenna gain | 1.8 |
| $G_r$ | Receiver antenna gain | 1.8 |
| $f_c$ | Carrier frequency | 2.4 GHz [43] |
| $W$ | Bandwidth | 1 MHz [42] |
| $R_s$ | Symbol rate | 0.125 MBaud [42] |
| $M$ | M-ary number | 2 (BPSK) |
| $N_f$ | Receiver noise figure | 16 dB [42] |
| $M_L$ | Link margin | 20 dB |
| $\alpha$ | Path-loss coefficient | 3.2 |
| $t$ | Number of bits the FEC can correct | 4 |
| $H$ | Frame header | 2 bytes [43] |
| $L$ | Payload | 127 bytes [43] |
| $O_a$ | Acquisition overhead | 4 bytes [43] |
| $O_b$ | Estimation and synchronization overhead | 1 byte [43] |
| $F$ | Feedback frame length | 5 bytes |

Table 4: Overview of the default parameters used in the MATLAB implementation for the communication layer

References

[1] S. Borkar and A. A. Chien, “The future of microprocessors,” Commun. ACM, vol. 54, no. 5, pp. 67–77, May 2011.
[2] P. Rawat, K. D. Singh, H. Chaouchi, and J. M. Bonnin, “Wireless sensor networks: a survey on recent developments and potential synergies,” The Journal of Supercomputing, vol. 68, no. 1, pp. 1–48, Apr 2014.
[3] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, “Wireless sensor networks for habitat monitoring,” in Proceedings of the 1st ACM international workshop on Wireless sensor networks and applications. ACM, 2002, pp. 88–97.
[4] I. Butun, S. D. Morgera, and R. Sankar, “A survey of intrusion detection systems in wireless sensor networks,” IEEE Communications Surveys Tutorials, vol. 16, no. 1, pp. 266–282, First 2014.
[5] F. Erden, S. Velipasalar, A. Z. Alkar, and A. E. Cetin, “Sensors in assisted living: A survey of signal and image processing methods,” IEEE Signal Processing Magazine, vol. 33, no. 2, pp. 36–44, March 2016.
[6] I. A. T. Hashem, V. Chang, N. B. Anuar, K. Adewole, I. Yaqoob, A. Gani, E. Ahmed, and H. Chiroma, “The role of big data in smart city,” International Journal of Information Management, vol. 36, no. 5, pp. 748–758, 2016.
[7] M. Vacher, F. Portet, A. Fleury, and N. Noury, “Development of audio sensing technology for ambient assisted living: Applications and challenges,” International Journal of E-Health and Medical Communications (IJEHMC), vol. 2, no. 1, pp. 35–54, January 2011.
[8] A. Bertrand, “Applications and trends in wireless acoustic sensor networks: A signal processing perspective,” in 2011 18th IEEE Symposium on Communications and Vehicular Technology in the Benelux (SCVT), Nov 2011, pp. 1–6.
[9] S. Lauwereins, “Cross-layer self-adaptivity for ultra-low power responsive IoT devices,” 2018.
[10] H. Karl and A. Willig, Protocols and architectures for wireless sensor networks. John Wiley & Sons, 2007.
[11] G. Anastasi, M. Conti, M. Di Francesco, and A. Passarella, “Energy conservation in wireless sensor networks: A survey,” Ad hoc networks, vol. 7, no. 3, pp. 537–568, 2009.
[12] E. Shih, S.-H. Cho, N. Ickes, R. Min, A. Sinha, A. Wang, and A. Chandrakasan, “Physical layer driven protocol and algorithm design for energy-efficient wireless sensor networks,” in Proceedings of the 7th annual international conference on Mobile computing and networking. ACM, 2001, pp. 272–287.
[13] D. Ganesan, R. Govindan, S. Shenker, and D. Estrin, “Highly-resilient, energy-efficient multipath routing in wireless sensor networks,” ACM SIGMOBILE Mobile Computing and Communications Review, vol. 5, no. 4, pp. 11–25, 2001.
[14] F. Rosas, R. D. Souza, M. Verhelst, and S. Pollin, “Energy-efficient MIMO multihop communications using the antenna selection scheme,” in Wireless Communication Systems (ISWCS), 2015 International Symposium on. IEEE, 2015, pp. 686–690.
[15] W. Ye, J. Heidemann, and D. Estrin, “An energy-efficient MAC protocol for wireless sensor networks,” in INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, vol. 3. IEEE, 2002, pp. 1567–1576.
[16] T. Van Dam and K. Langendoen, “An adaptive energy-efficient MAC protocol for wireless sensor networks,” in Proceedings of the 1st international conference on Embedded networked sensor systems. ACM, 2003, pp. 171–180.
[17] D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, “Detection and classification of acoustic scenes and events,” IEEE Transactions on Multimedia, vol. 17, no. 10, pp. 1733–1746, Oct 2015.
[18] A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley, “Detection and classification of acoustic scenes and events: Outcome of the DCASE 2016 challenge,” IEEE/ACM Trans. Audio, Speech and Lang. Proc., vol. 26, no. 2, pp. 379–393, Feb. 2018.
[19] A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen, “DCASE 2017 Challenge setup: Tasks, datasets and baseline system,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), Munich, Germany, November 2017.
[20] T. Yang, Y. Chen, J. Emer, and V. Sze, “A method to estimate the energy consumption of deep neural networks,” in 2017 51st Asilomar Conference on Signals, Systems, and Computers, Oct 2017, pp. 1916–1920.
[21] Y. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), June 2016, pp. 367–379.
[22] M. Steyaert and W. Sansen, “A micropower low-noise monolithic instrumentation amplifier for medical purposes,” Solid-State Circuits, IEEE Journal of, vol. 22, no. 6, pp. 1163–1168, Dec 1987.
[23] B. Murmann. ADC performance survey 1997-2014. [Online]. Available: http://www.stanford.edu/~murmann/adcsurvey.html
[24] M. Horowitz, “1.1 Computing's energy problem (and what we can do about it),” in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), Feb 2014, pp. 10–14.
[25] V. Sze, Y. Chen, T. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295–2329, Dec 2017.
[26] S. B. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” Acoustics, Speech and Signal Processing, IEEE Transactions on, pp. 357–366, 1980.
[27] J. Cooley and J. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol. 19, no. 90, pp. 297–301, 1965.
[28] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychological Review, pp. 65–386, 1958.
[29] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” 2015, pp. 448–456.
[30] Cortex-M4: Technical Reference Manual, ARM Limited, March 2010.
[31] T. Haine, Q. Nguyen, F. Stas, L. Moreau, D. Flandre, and D. Bol, “An 80-MHz 0.4V ULV SRAM macro in 28nm FDSOI achieving 28-fJ/bit access energy with a ULP bitcell and on-chip adaptive back bias generation,” in ESSCIRC 2017 - 43rd IEEE European Solid State Circuits Conference, Sept 2017, pp. 312–315.
[32] CY62126EV30 MoBL: 1-Mbit (64K x 16) Static RAM, Cypress Semiconductor Corporation, 2017, rev. *P.
[33] F. Rosas and C. Oberli, “Impact of the channel state information on the energy-efficiency of MIMO communications,” IEEE Transactions on Wireless Communications, vol. 14, no. 8, pp. 4156–4169, 2015.
[34] F. Rosas and C. Oberli, “Modulation and SNR optimization for achieving energy-efficient communications over short-range fading channels,” IEEE Transactions on Wireless Communications, vol. 11, no. 12, pp. 4286–4295, 2012.
[35] F. Rosas, R. D. Souza, M. E. Pellenz, C. Oberli, G. Brante, M. Verhelst, and S. Pollin, “Optimizing the code rate of energy-constrained wireless communications with HARQ,” IEEE Transactions on Wireless Communications, vol. 15, no. 1, pp. 191–205, 2016.
[36] S. Cui, A. J. Goldsmith, and A. Bahai, “Energy-constrained modulation optimization,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2349–2360, Sept. 2005.
[37] S. Cui, A. J. Goldsmith, and A. Bahai, “Energy-efficiency of MIMO and cooperative MIMO techniques in sensor networks,” IEEE Journal on Selected Areas in Communications, vol. 22, no. 6, pp. 1089–1098, Aug. 2004.
[38] J. J. W. M. Gustavsson and N. N. Tan, CMOS Data Converters for Communications. Boston, MA: Kluwer, 2000.
[39] F. Rosas, G. Brante, R. D. Souza, and C. Oberli, “Optimizing the code rate for achieving energy-efficient wireless communications,” in Wireless Communications and Networking Conference (WCNC), 2014 IEEE. IEEE, 2014, pp. 775–780.
[40] F. Rosas and C. Oberli, “Nakagami-m approximations for multiple-input multiple-output singular value decomposition transmissions,” IET Communications, vol. 7, no. 6, pp. 554–561, 2013.
[41] M. Siekkinen, M. Hiienkari, J. Nurminen, and J. Nieminen, “How low energy is Bluetooth low energy? Comparative measurements with ZigBee/802.15.4,” in Wireless Communications and Networking Conference Workshops (WCNCW), 2012 IEEE, 2012, pp. 232–237.
[42] A. Balankutty, S.-A. Yu, Y. Feng, and P. Kinget, “A 0.6-V Zero-IF/Low-IF Receiver With Integrated Fractional-N Synthesizer for 2.4-GHz ISM-Band Applications,” IEEE Journal of Solid-State Circuits, vol. 45, no. 3, pp. 538–553, March 2010.
[43] Specifications for Local and Metropolitan Area Networks - Specific Requirements Part 15.4, IEEE Std. 802.15.4, 2006.
