Distributed Principal Component Analysis for Wireless Sensor Networks
Sensors 2008, 8, 4821-4850; DOI: 10.3390/sensors. ISSN 1424-8220, www.mdpi.org/sensors. Open Access Article.

Yann-Aël Le Borgne 1,*, Sylvain Raybaud 2, and Gianluca Bontempi 1

1 Machine Learning Group, Département d'Informatique, Faculté des Sciences, Université Libre de Bruxelles, Boulevard du Triomphe, 1050 Brussels, Belgium
2 École Normale Supérieure de Cachan, 61, Avenue du Président Wilson, 94235 Cachan Cedex, France

E-mails: yleborgn@ulb.ac.be; sraybaud@dptmaths.ens-cachan.fr; gbonte@ulb.ac.be

* Author to whom correspondence should be addressed.

Received: 27 May 2008; in revised form: 29 July 2008 / Accepted: 4 August 2008 / Published: 11 August 2008

Abstract: Principal Component Analysis (PCA) is a data dimensionality reduction technique well suited for processing data from sensor networks. It can be applied to tasks like compression, event detection, and event recognition. The technique is based on a linear transform in which the sensor measurements are projected on a set of principal components. When sensor measurements are correlated, a small set of principal components can explain most of the measurement variability, which makes it possible to significantly decrease the amount of radio communication and energy consumption. In this paper, we show that the power iteration method can be distributed in a sensor network in order to compute an approximation of the principal components. The proposed implementation relies on an aggregation service, which has recently been shown to provide a suitable framework for distributing the computation of a linear transform within a sensor network. We also extend this previous work by providing a detailed analysis of the computational, memory, and communication costs involved.
A compression experiment involving real data validates the algorithm and illustrates the tradeoffs between accuracy and communication costs.

Keywords: wireless sensor networks, distributed principal component analysis, in-network aggregation, power iteration method.

1 Introduction

Efficient in-network data processing is a key factor for enabling wireless sensor networks (WSN) to extract useful information, and an increasing amount of research has been devoted to the development of data processing techniques [?]. Wireless sensors have limited resources in terms of energy, network throughput, and computational power. In particular, radio communication is an energy-consuming task, and is identified in many deployments as the primary cause of sensor node battery exhaustion [?]. Emitting or receiving a packet is indeed orders of magnitude more energy consuming than elementary computational operations. Reducing the amount of data transmissions has therefore been recognized as a central issue in the design of data gathering schemes for wireless sensor networks [?]. Data compression is often acceptable in real settings, since raw data collected by sensors typically contain a high degree of spatio-temporal redundancy [?]. In fact, most applications only require approximate or high-level information, such as the average temperature in a room, the humidity levels in a field with a ±10% accuracy, or the detection and position of a fire in a forest. An attractive framework for processing data within a sensor network is provided by data aggregation services such as those developed at UC Berkeley (TinyDB and TAG projects) [?], Cornell University (Cougar) [?], or EPFL (Dozer) [?]. These services aim at aggregating data within a network in a time- and energy-efficient manner.
They are suitable when the network is connected to a base station from which queries on sensor measurements are issued. In TAG or TinyDB, for example, queries are entered by means of an SQL-like syntax which tasks the network to send raw data or aggregates at regular time intervals. These services make it possible to compute common operators like average, min, max, or count "within the network", thereby greatly decreasing the amount of data to be transmitted. Services typically rely on synchronized routing trees along which data is processed and aggregated on the way from the leaves to the root [?].

Recently, we have shown that a data aggregation service can be used to represent sensor measurements in a different space [?]. We suggested that the space defined by the principal component basis, which makes data samples uncorrelated, is of particular interest for sensor networks. This basis is returned by Principal Component Analysis (PCA) [?], a well-known technique in multivariate data analysis. The design of an aggregation scheme which distributes the computation of the principal component scores (i.e., the transformed data in the PCA space) has three major benefits. First, the PCA provides varying levels of compression accuracy, ranging from constant approximations to full recovery of the original data. Second, simple adaptive protocols can leverage this flexibility by trading network resources for data accuracy. Third, principal component scores contain sufficient information for a variety of WSN applications like approximate monitoring [?], feature prediction [?], and event detection [?]. The approach we proposed in [?] exclusively addresses the distribution of the computation of the principal component scores, and requires the component basis to be computed beforehand in a centralized manner.
This limits the applicability of the PCA to small networks, as the centralized computation of the principal component basis does not scale with the network size.

The main contribution of this article is to provide a distributed implementation of the principal component basis computation. The algorithm is based on the power iteration method [?], an iterative technique for computing the principal component basis, which can be implemented in a distributed manner by means of an aggregation service. In particular, we show that this algorithm can properly compute the principal component basis under the mild hypothesis that the measurements collected by distant sensors are not significantly correlated. We also extend the previous work of [?] with an in-depth discussion of the network tradeoffs and scalability issues.

The article is structured as follows. Section 2 reviews previous work in the domain by describing the principles of data aggregation services in WSN and detailing how principal component scores can be computed by an aggregation service. This section also provides a brief overview of PCA together with potential WSN applications and related tradeoffs. Section 3 presents the main issues in the distribution of the PC basis computation and proposes an implementation based on a data aggregation service. Experimental results illustrating the tradeoffs between accuracy and communication costs are discussed in Section 4. Additional related work and future research tracks are addressed in Section 5.

2 Principal component aggregation in wireless sensor networks

Let us consider a static network of size p whose task is to collect sensor measurements, or aggregates thereof, at regular time instants, and to route them to a destination node. This scheme is standard for many real-time WSN applications like surveillance, actuator control, or event detection.
The destination node is commonly referred to as the sink or the base station, and is often assumed to benefit from higher resources (e.g., a desktop computer). Let t ∈ N refer to a discretized time domain and x_i[t] be the measurement collected by sensor i, 1 ≤ i ≤ p, at time t. At each time instant t, the p resulting measurements can be seen as the components of a vector x[t] ∈ R^p. The sampling period is also referred to as an epoch. Since the communication range of a single node is limited, sensors which are not in communication range of the sink have their measurements relayed by intermediate sensors. A standard approach consists in setting up a multi-hop network by means of a routing tree whose root is connected to the sink (Figure 1). Different metrics such as hop count, latency, energy consumption, and network load may be taken into account during the design of the routing tree [?].

2.1 Data aggregation

2.1.1 Aggregation service

An aggregation service makes it possible to aggregate data within a network in a time- and energy-efficient manner. A well-known example of an aggregation service is the TAG system, developed at the University of California, Berkeley [?]. TAG stands for Tiny AGgregation and is an aggregation service for sensor networks which has been implemented in TinyOS, an operating system with a low memory footprint specifically designed for wireless sensors [?]. In TAG, an epoch is divided into time slots so that sensors' activities are synchronized according to their depth in the routing tree. Any algorithm can be relied on to create the routing tree, as long as it allows data to flow in both directions of the tree and does not send duplicates [?]. The TAG service focuses on low-rate data collection tasks, which permits loose synchronization of the sensor nodes. The overhead implied by the synchronization is therefore assumed to be low.
Figure 1. A multi-hop network where all the sensor nodes are connected to the base station by means of a routing tree. Note that as the radio range usually allows a node to communicate with several others, different routing trees can be obtained. The black node represents the root node.

The goal of synchronization is to minimize the amount of time sensors spend powering their different components, and to maximize the time spent in the idle mode, in which all electronic components are off except the clock. Since the energy consumption is several orders of magnitude lower in the idle mode than when the CPU or the radio is active, synchronization significantly extends the wireless sensors' lifetime. An illustration of the sensors' activities during an epoch is given in Figure 2 for a network of four nodes with a routing tree of depth three. Note that the synchronization is maintained at the transport layer of the network stack, and does not require precise synchronization constraints as in TDMA. Rather, nodes are synchronized by timing information included in data packets, and a CSMA-like MAC protocol with a random backoff scheme is used at the link layer to resolve multiple access collisions.

2.1.2 Aggregation primitives

Once the routing tree is set up and the nodes synchronized, data can be aggregated from the leaves to the root. Each node adds its contribution to a partial state record X which is propagated along the routing tree. Partial state records are merged when two (or more) of them arrive at the same node. When the partial state record is eventually delivered by the root node to the base station, the desired result is returned by means of an evaluator function. An aggregation service thus requires the definition of three primitives [?]
:

• an initializer init which transforms a sensor measurement into a partial state record,
• an aggregation operator f which merges partial state records, and
• an evaluator e which returns, on the basis of the root partial state record, the result required by the application.

Figure 2. Activities carried out by sensors in the routing tree (adapted from [13]). Transmissions are synchronized to optimize energy savings. The last stage involves all sensors and allows unsynchronized operations (e.g., for sensor discovery).

Note that when partial state records are scalars or vectors, the three operators defined above may be seen as functions. Partial state records may however be any data structure which, following the notation of [?], is represented using the symbols ⟨·⟩. We illustrate the aggregation principle with the following example. Suppose that we are interested in computing the Euclidean norm of the vector containing the WSN measurements at a given epoch. Rather than directly sending all p measurements to the base station for computation, an aggregation service (Figure 3) can obtain the same result in an online manner once the following primitives are implemented:

init(x) = ⟨x²⟩
f(⟨X⟩, ⟨Y⟩) = ⟨X + Y⟩
e(⟨X⟩) = √X

In this example the partial state records X and Y are scalars of the form X = \sum_{i ∈ I} x_i², where I ⊂ {1, ..., p}. It is worth noting that in-network aggregation does not only reduce the amount of packets transmitted, but also increases the network lifetime. The energy consumption of the processing unit of a sensor node is at least one order of magnitude lower than that of the wireless radio [?]. According to the analysis [?]
of the main wireless sensor platforms available today on the market, sending one bit of data is equivalent to 2000 CPU cycles in terms of energy. For a packet length of 30 bytes (the average TinyOS packet length), this corresponds to 480,000 CPU cycles. It follows that, with respect to transmission costs, the energy consumption of simple aggregation operators is negligible. There is therefore a large variety of in-network processing algorithms that can lead to energy savings by reducing the number of packet transmissions.

Figure 3. Aggregation service at work for computing the norm of the measured vector. Nodes forward partial sums S_i of squared measurements; the base station evaluates norm = e(S_16) = √(Σ_{i=1}^{16} x_i²).

2.1.3 Communication costs

This section details the communication costs caused by three types of network operation. The first is the default data collection operation, in which all measurements are routed to the sink without any aggregation. This is referred to as the D operation. The second is the aggregation operation, referred to as the A operation, which consists in tasking the network to retrieve an aggregate by means of the aggregation service. Finally, we denote by F the feedback operation, which consists in flooding the aggregate obtained at the sink back to the whole set of sensors. Let q be the size of a partial state record, C_i be the number of direct children of node i in the routing tree, RT_i be the size of the subtree whose root is the i-th node, and i*_C = arg max_i C_i be the node whose number of children is the highest. The following analysis compares the orders of magnitude of the communication costs caused by the D, A, and F operations, respectively.
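As a concrete illustration, the norm-computation primitives above (init, f, e) can be simulated in a few lines of Python. The routing tree and measurement values below are hypothetical, and the recursion simply mimics the leaves-to-root merging of partial state records:

```python
import math

# Hypothetical routing tree: node -> list of children (node 0 is the root).
tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
x = {0: 1.0, 1: 2.0, 2: 2.0, 3: 4.0, 4: 2.0}  # one measurement per node

def init(xi):
    # Initializer: a measurement becomes a partial state record <x^2>.
    return xi ** 2

def f(X, Y):
    # Aggregation operator: merge two partial state records.
    return X + Y

def e(X):
    # Evaluator: turn the root's partial state record into the result.
    return math.sqrt(X)

def aggregate(node):
    # Each node merges its own record with those of its children, leaves first.
    record = init(x[node])
    for child in tree[node]:
        record = f(record, aggregate(child))
    return record

norm = e(aggregate(0))
# Identical to the centralized computation of the Euclidean norm:
assert abs(norm - math.sqrt(sum(v ** 2 for v in x.values()))) < 1e-12
```

Only partial sums travel along the tree; the root never sees the individual measurements, which is precisely what reduces the packet count.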
To this end, we consider the number of packets processed by each node in an ideal case where overhearing, collisions, and retransmissions are ignored.

D operation. Without aggregation, all the measurements are routed to the base station by means of the routing tree. As mentioned before, the network load at the sensor nodes, i.e., the sum of received and transmitted packets, is ill-balanced. The load is lowest at leaf nodes, which only send one packet per epoch, while the load is highest at the root node, which processes 2p − 1 packets (p − 1 receptions and p transmissions) per epoch. The load at a generic i-th sensor node depends on the routing tree, and amounts to 2·RT_i − 1 packets per epoch.

A operation. During the aggregation, the i-th node sends q packets and receives a number of packets which depends on its number C_i of children. The total number of packets processed is therefore q(C_i + 1) per epoch. The load is lowest at leaf nodes, which only have q packets to send, while the load is highest at the node with the largest number of children.

F operation. The feedback operation consists in propagating the aggregated value from the root back down to all the leaves of the tree. This operation can be used, for instance, to make all sensor nodes aware of the overall norm of their measurements, or of the approximation evaluated at the sink. The feedback of a packet containing the result of the evaluation generates a network load of two packets for all non-leaf nodes (one reception and one transmission for forwarding the packet to the children) and of one packet for the leaves (one reception only).

2.2 Principal component analysis

This section describes Principal Component Analysis (PCA), a well-known dimensionality reduction technique in statistical data analysis, data compression, and image processing [?].
The technique is based on a change of basis which reduces the number of coordinates used to represent the data while maximizing the variance retained. Its use is particularly well adapted to correlated data, where the dimensionality can be strongly reduced while retaining most of the measurement variations.

Given a set of N centered multivariate measurements x[t] ∈ R^p used as training samples, 1 ≤ t ≤ N, the PCA basis is obtained by minimizing the following cost function:

J_q(x[t], w_k) = \frac{1}{N} \sum_{t=1}^{N} \left\| x[t] - \sum_{k=1}^{q} w_k w_k^T x[t] \right\|^2 = \frac{1}{N} \sum_{t=1}^{N} \| x[t] - \hat{x}[t] \|^2    (1)

where the set of q ≤ p vectors {w_k}_{1≤k≤q} of R^p forms the PCA basis. The linear combinations \hat{x}[t] = \sum_{k=1}^{q} w_k w_k^T x[t] represent the projections of x[t] on the PCA basis, and Eq. (1) therefore quantifies the mean squared error between the original measurements x[t] and their projections. Note that the measurements are centered so that the origin of the coordinate system coincides with the centroid of the set of measurements. This translation is desirable to avoid biasing the estimation of the basis {w_k}_{1≤k≤q} towards the centroid of the set of measurements. Minimizing Eq. (1) equivalently maximizes the variance retained by the family of vectors {w_k}_{1≤k≤q}. Setting to zero the derivative of J_q(x[t], w_k) with respect to w_k gives

C w_k = λ_k w_k    (2)

where C = \frac{1}{N} \sum_{t=1}^{N} x[t] x[t]^T is the sample covariance matrix of the sensor measurements x[t]. The minimizer of Equation (1) is given by the set of the first q eigenvectors {w_k} of the sample covariance matrix C [?]. These eigenvectors are called the principal components and form the PCA basis. The corresponding eigenvalues λ_k quantify the amount of variance captured by the eigenvectors.
Indeed, left-multiplying Equation (2) by w_k^T gives

w_k^T C w_k = λ_k w_k^T w_k    (3)

and since w_k^T C w_k = w_k^T \left( \frac{1}{N} \sum_{t=1}^{N} x[t] x[t]^T \right) w_k = \frac{1}{N} \sum_{t=1}^{N} (w_k^T x[t])^2 and w_k^T w_k = 1, each eigenvalue λ_k quantifies the variance of the projections of the measurements x[t] on the corresponding k-th eigenvector. The sum of the eigenvalues therefore equals the total variance of the original set of observations, i.e.:

\sum_{k=1}^{p} λ_k = \frac{1}{N} \sum_{t=1}^{N} \| x[t] \|^2

It is convenient to order the vectors w_k by decreasing eigenvalue, so that the proportion P(q) of variance retained by the first q principal components can be expressed as:

P(q) = \frac{\sum_{k=1}^{q} λ_k}{\sum_{k=1}^{p} λ_k}    (4)

Storing the set of vectors {w_k}_{1≤k≤q} columnwise in a p × q matrix W, approximations \hat{x}[t] of x[t] in R^p are obtained by:

\hat{x}[t] = W W^T x[t] = W z[t]    (5)

where

z[t] = W^T x[t] = \left( \sum_{i=1}^{p} w_{i1} x_i, \; \ldots, \; \sum_{i=1}^{p} w_{iq} x_i \right)^T = \sum_{i=1}^{p} ( w_{i1} x_i, \ldots, w_{iq} x_i )^T    (6)

denotes the column vector of coordinates of \hat{x}[t] in {w_k}_{1≤k≤q}, also referred to as the principal component scores.

2.3 Principal component aggregation

The computation of the principal component scores z[t] can be performed within the network if each node i is initialized with the q scalar components w_{i1}, ..., w_{iq} of the principal component basis [?]. As proposed in [?], this initialization can be done during a training stage in which a set of measurements is first collected from the network and the related covariance matrix is estimated. This set of measurements is assumed to be large enough to provide an accurate estimate of the covariance matrix. Once the first q principal components are computed, they are sent back to the network and stored in the respective nodes (e.g., the i-th node stores only the elements w_{i1}, ..., w_{iq}).
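The PCA relations of Equations (2) and (4)-(6) can be checked numerically with a short NumPy sketch. The data below are synthetic (a toy example of ours, not taken from the paper's experiments): p = 6 sensors driven by two latent signals, so two components should retain nearly all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 400, 6, 2
# Synthetic correlated measurements: 6 sensors driven by 2 latent signals.
latent = rng.normal(size=(N, 2))
X = latent @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(N, p))
X -= X.mean(axis=0)  # center the measurements

# Sample covariance and its eigendecomposition (Equation (2): C w_k = lambda_k w_k).
C = (X.T @ X) / N
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]            # decreasing eigenvalues
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
assert np.allclose(C @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0])

# Proportion of variance retained by the first q components (Equation (4)).
P_q = eigvals[:q].sum() / eigvals.sum()

# Scores and approximations (Equations (5)-(6)).
W = eigvecs[:, :q]     # p x q basis matrix
Z = X @ W              # rows are the score vectors z[t] = W^T x[t]
X_hat = Z @ W.T        # rows are the approximations x_hat[t] = W z[t]

# The mean squared error of Equation (1) equals the variance left out.
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
total_var = np.mean(np.sum(X ** 2, axis=1))
assert np.isclose(mse / total_var, 1 - P_q)
print(f"P({q}) = {P_q:.3f}")
```

The final assertion is the variance-error duality used throughout this section: the fraction of variance discarded by the first q components is exactly the normalized reconstruction error.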
While our proposition for distributing the computation of the principal components will be introduced in Section 3, here we limit ourselves to reviewing the aggregation mechanism that computes the PC scores in a distributed manner. The principal component scores can be computed by summing along the routing tree the vectors (w_{i1} x_i[t], ..., w_{iq} x_i[t]) available at each node. The aggregation primitives then take the following form:

init(x_i[t]) = ⟨w_{i1} x_i[t]; ...; w_{iq} x_i[t]⟩
f(⟨x_1; ...; x_q⟩, ⟨y_1; ...; y_q⟩) = ⟨x_1 + y_1; ...; x_q + y_q⟩

where partial state records are vectors of size q.

2.4 Applications

Once the transformed data z[t] reaches the base station, it can be used for multiple purposes and applications, such as approximate monitoring, feature extraction, or event detection.

2.4.1 Approximate monitoring

In approximate monitoring, the state of the process sensed by the WSN at time t is summarized by the vector

\hat{x}[t] = e(z_1[t], ..., z_q[t]) = (\hat{x}_1[t], ..., \hat{x}_p[t]) = W z[t]

returned by the evaluator function at the base station. If the number of components is p, the exact set of measurements collected by the sensors is retrieved at the base station. Otherwise, for a number of components less than p, the vector \hat{x}[t] provides an optimal approximation of the real measurement vector in the mean square sense (Equation (1)). Note that the transformation into a principal component basis allows the design of simple policies to manage congestion issues, like discarding the scores associated with the components of lowest variance. It is also interesting to note that a simple additional procedure can be set up to assess the accuracy of the approximations against a user-defined threshold.
We know from Equation (5) that the approximation \hat{x}_i[t] for the i-th sensor, 1 ≤ i ≤ p, is given by:

\hat{x}_i[t] = \sum_{k=1}^{q} z_k[t] \, w_{ik}

Since the elements {w_{ik}} are already available at each node, sending the q principal component scores back down the routing tree enables two additional functionalities: (i) each sensor is able to compute the approximation retrieved at the sink, and (ii) each sensor is able to send a notification if the approximation error is greater than some user-defined threshold ε. This scheme, referred to as supervised compression in [?], guarantees that all data eventually obtained at the sink are within ±ε of the actual measurements.

2.4.2 Dimensionality reduction

An important class of WSN applications consists in inferring, on the basis of sensor measurements, some high-level information such as the type or number of events that occurred. For instance, a WSN monitoring the vibrations of a bridge could be used to answer queries concerning the type or number of vehicles that pass over the bridge. This class of problems is typically tackled by supervised learning techniques such as naive Bayes classifiers, decision trees, or support vector machines [?]. PCA is in this context an effective preprocessing technique which simplifies the learning process by reducing the dimensionality of the problem [?].

2.4.3 Event detection

The use of PCA for identifying, at the network scale, unexpected events which are not detectable at the node level has been discussed in the literature [?]. Detection of such events can be achieved by focusing on the low-variance components. In fact, such components are expected to bear coordinates close to zero, as they typically account for sensor equipment noise.
In this context, the principal component aggregation scheme can be used to deliver the values of the low-variance components, and the evaluator function takes the form of a statistical test checking whether the coordinates on one or more low-variance components differ from zero.

2.5 Tradeoffs

In this section, we use the term default to denote the scheme that forwards all the measurements to the sink at each epoch, and the term PCAg to denote the scheme based on principal component aggregation. Let us first consider the distribution of the network load among sensor nodes, with a particular focus on its upper bound. This upper bound, henceforth referred to as the highest network load, determines the network capacity, that is, the amount of traffic that the network can handle at one time [?]. The highest network load is also related to the network lifetime in terms of time to first failure (i.e., the time at which the first node in the network runs out of energy), as the radio is known to be the most energy-consuming component of a wireless sensor [?].

In the default scheme, the root of the routing tree is the node that sustains the highest network load, as it forwards to the base station the measurements from all other sensors. Its radio throughput therefore causes the root node to have its energy exhausted first, which may leave the rest of the network disconnected from the base station. In the PCAg scheme (Section 2.1.3), the network load is distributed in a more homogeneous manner. The node i*_C = arg max_i C_i with the largest number of children becomes the limiting node in terms of highest network load. The network traffic at this node amounts to q(C_{i*_C} + 1) packets per epoch, where q is the number of components retained. The PCAg scheme therefore reduces the highest network load in any configuration where

q(C_{i*_C} + 1) ≤ 2p − 1.    (7)

We illustrate the tradeoff by considering two extreme configurations.
If only the first principal component is retained, the PCAg reduces the highest network load, since C_{i*_C} < p. On the contrary, if all the components are retained, i.e., q = p, Equation (7) simplifies to p·C_{i*_C} ≤ p − 1, which cannot hold, and the PCAg incurs a higher network load than the default scheme. The main parameter of the approach is therefore the number of principal components to retain, which trades network load for sensing task accuracy. The amount of information retained by a set of q principal components depends on the degree of correlation among the data sources. Whenever a set of sensors collects correlated measurements, a small proportion of the principal components is likely to capture most of the variations observed by the network. The number of principal components to retain can be specified by the application, for example by means of a cross-validation procedure on the accuracy of the sensing task at the base station [?]. It can also be driven by network constraints or congestion issues. In the latter case, a policy prioritizing packets containing the scores of the first principal components would allow a maximum amount of information to be retained under network load constraints. Finally, given a number of principal components, note that the highest network load may be further reduced if the algorithm that builds the routing tree uses C_{i*_C} as a parameter. The network load indeed depends on the number of children a node has. Although we do not address this aspect here, routing strategies that aim at lowering C_{i*_C} can therefore further improve the efficiency of the PCAg scheme.

3 Computation of the principal components

3.1 Outline

This section addresses the initialization procedure whereby each node i is provided with the elements w_{i1}, ..., w_{iq} of the principal components w_1, ..., w_q. In [?], we proposed a centralized approach for performing this initialization stage.
This approach is first briefly recalled in Section 3.2. We then extend this work by showing that a fully distributed implementation can perform this initialization process. The proposed approach relies on the power iteration method (PIM), a classic iterative technique for computing the eigenvectors of a matrix [?], and on a simplifying assumption on the covariance structure which allows the covariance matrix to be estimated in a distributed way. Section 3.3 presents and discusses the distribution of the covariance matrix computation and storage. We then show in Section 3.4 that, by means of the distributed covariance matrix computation, the PIM can be implemented within an aggregation service in a distributed manner. Each of these sections is concluded by a complexity analysis of the communication, computational, and memory costs. We finally discuss in Section 3.5 three methods for performing the initialization stage.

3.2 Centralized approach

Let us suppose that t vectors of sensor measurements x[τ] ∈ R^p, τ ∈ {1, ..., t}, are retrieved at the base station. Let X[t] denote the p × t matrix whose τ-th column is the vector x[τ] ∈ R^p. An estimate of the covariance matrix C[t] = (c_{ij}[t])_{1≤i,j≤p} at time t is obtained by computing

C[t] = \frac{1}{t} \sum_{τ=1}^{t} (x[τ] - \bar{x}[t])(x[τ] - \bar{x}[t])^T = \frac{1}{t} X[t] X[t]^T - \bar{x}[t] \bar{x}[t]^T    (8)

where \bar{x}[t] = \frac{1}{t} \sum_{τ=1}^{t} x[τ] is the average vector. The covariance matrix can be updated recursively as new vectors of observations x[t+1] are made available at the base station.
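Such a recursive update can be sketched with running sums of the measurements and of their pairwise products (a minimal, centralized illustration on synthetic data; the class and variable names are ours):

```python
import numpy as np

class RunningCovariance:
    """Recursive estimate of the covariance matrix of Equation (8)."""

    def __init__(self, p):
        self.t = 0
        self.S = np.zeros(p)        # running sums of measurements
        self.SS = np.zeros((p, p))  # running sums of pairwise products

    def update(self, x):
        # Fold one measurement vector into the running sums.
        self.t += 1
        self.S += x
        self.SS += np.outer(x, x)

    def covariance(self):
        # c_ij = S_ij / t - S_i * S_j / t^2
        return self.SS / self.t - np.outer(self.S, self.S) / self.t ** 2

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
est = RunningCovariance(p=3)
for x in X:
    est.update(x)

# Matches the batch estimate of Equation (8):
Xc = X - X.mean(axis=0)
assert np.allclose(est.covariance(), (Xc.T @ Xc) / len(X))
```

Only the counter t, the vector of sums, and the matrix of product sums need to be stored; no past measurement is kept.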
From Equation (8) it follows that

c_{ij}[t] = \frac{1}{t} S_{ij}[t] - \frac{1}{t^2} S_i[t] S_j[t]    (9)

where

S_i[t] = \sum_{τ=1}^{t} x_i[τ] = S_i[t-1] + x_i[t]
S_{ij}[t] = \sum_{τ=1}^{t} x_i[τ] x_j[τ] = S_{ij}[t-1] + x_i[t] x_j[t]    (10)

This means that storing the quantities t, S_i, and S_{ij}, 1 ≤ i, j ≤ p, at the base station makes it possible to update the covariance matrix estimate C[t] at time t. Once the covariance matrix is estimated, the principal components can be computed using a standard eigendecomposition method [?] and their elements communicated to the sensor nodes.

3.2.1 Scalability analysis

Highest network load: The centralized estimation of the covariance matrix from a set of t vectors of observations requires t operations of type D as defined in Section 2.1.3. The root node supports the highest network load, which is therefore in O(tp). Once the eigenvector decomposition is performed at the base station, the root node transmits the principal components to the sensor nodes, which requires qp operations of type F. Overall, the communication cost amounts to O(tp + qp).

Computational and memory costs: From Equation (10), the computational cost related to the covariance matrix at the base station is O(tp²). The memory cost for storing the matrix of observations and the covariances is O(tp + p²), or O(p²) if recursive updates are relied on. As far as the estimation of the principal components is concerned, the cost of a standard eigendecomposition algorithm is O(p³) in terms of computation and O(p²) in terms of memory [?].

3.3 Distributed estimation of the covariance matrix

This section presents an alternative methodology to distribute the computation and storage of the covariance matrix within the network. The methodology relies on the observation that sensors that are geographically distant often present low correlation values.
This hypothesis often holds in real-world situations, as shown in ??. This leads us to make the simplifying hypothesis that the covariance between sensors which are out of radio reach of one another is equal to zero. Note that the radio range can usually be tuned on wireless sensor platforms by increasing or reducing the transmission power level ?. If available, this feature could be used to set the radio range such that nearby correlated sensors are within radio range.

Let N_i be the neighborhood of sensor i, i.e., the set of sensors which are in the communication range of the i-th node. An instance of a neighborhood is illustrated in Figure 1. Our simplifying hypothesis assumes that c_ij[t] = 0 for all j ∉ N_i. This makes the computation of the covariance matrix scalable to large networks: the covariances between sensors i and j can indeed be computed recursively by Equation (10), provided that sensor i keeps over time a record of t, S_j and S_ij for j ∈ N_i. This hypothesis will be referred to in the following as the local covariance hypothesis.

3.3.1 Positive semi-definiteness criterion

If the local covariance hypothesis is not verified, the sparse matrix it yields may not be globally positive semi-definite (i.e., it may have negative eigenvalues). Positive semi-definiteness is however a necessary (and sufficient) condition for a symmetric matrix to be a covariance matrix. To face this issue, the approach adopted in this article is to discard the eigenvectors whose eigenvalues are found to be negative. This approach, according to ?, is the simplest way to transform a non positive semi-definite matrix into a positive semi-definite one. The detection of a negative eigenvalue will be detailed in Section 3.4.2, and is used as a stopping criterion in the last line of Algorithm 2.
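Under the local covariance hypothesis, the state kept by a single node reduces to the counters t, S_j and S_ij for its own measurements and those of its neighbors. A minimal Python sketch of this per-node bookkeeping (names are illustrative, not taken from the paper's R code):

```python
import numpy as np

class NodeLocalCovariance:
    """Per-node covariance bookkeeping under the local covariance
    hypothesis: node i stores t, S_j and S_ij only for j in N_i (plus
    itself), and reports a zero covariance with out-of-range nodes."""

    def __init__(self, node_id, neighbors):
        self.i = node_id
        self.known = [node_id] + list(neighbors)   # own variance + neighbors
        self.t = 0
        self.S = {j: 0.0 for j in self.known}      # running sums S_j
        self.Sij = {j: 0.0 for j in self.known}    # running sums S_ij

    def update(self, readings):
        """readings maps j -> x_j[t] for this node and its neighbors."""
        self.t += 1
        xi = readings[self.i]
        for j in self.known:
            self.S[j] += readings[j]
            self.Sij[j] += xi * readings[j]

    def cov_with(self, j):
        """Local estimate of c_ij[t], following Equation (9)."""
        if j not in self.known:
            return 0.0   # local covariance hypothesis: out of range => 0
        t = self.t
        return self.Sij[j] / t - self.S[self.i] * self.S[j] / t**2
```

For neighboring nodes the estimate coincides with the centralized one; covariances with out-of-range nodes are simply reported as zero.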
A good review of existing methods for transforming the approximated covariance matrix can be found in ?. It seems however difficult to implement these methods in a distributed manner. Note that, despite its simplicity, the method that consists in discarding negative eigenvectors was observed by the authors of ? to provide good approximations to the covariance matrix.

3.3.2 Scalability analysis

Highest network load: Each update requires a node i to send one packet (its measurement) and to receive |N_i| packets (the other sensors' measurements). Let i*_N be the node that has the largest number of neighbors, i.e., i*_N = arg max_i |N_i|. The highest network load is therefore O(t |N_{i*_N}|).

Computational and memory costs: From Equation (10), the updates of S_j and S_ij demand 3|N_i| floating-point additions/subtractions, 2|N_i| floating-point multiplications and 2|N_i| floating-point divisions. The highest computational cost therefore scales in O(|N_{i*_N}|). The number of variables to maintain at a sensor node is one integer and 2|N_i| floating-point numbers, so the highest memory cost amounts to O(|N_{i*_N}|).

3.4 Distributed estimation of the principal components

The distribution of eigendecomposition algorithms is a fairly well-known research area ?. However, the distribution sought in our problem must comply with the specificity that each node only has access to a single dimension of the problem and needs a specific entry of each principal eigenvector. We propose in the following an approach that leverages the aggregation service to implement, in a rather simple way, the power iteration method (PIM), a well-known solution to the eigenvalue decomposition problem ?. The resulting technique complies with the specific features mentioned above, and lets each node i locally identify the subset {w_ik} required by the principal component aggregation.
After describing the PIM in Sections 3.4.1 and 3.4.2, we detail in Sections 3.4.3 and 3.4.4 its implementation in the aggregation service. The complexity analysis of the method is provided in Section 3.4.5.

3.4.1 Power iteration method

Let C be a covariance matrix. The power iteration method starts with a random vector v_0. At the t-th iteration, the vector v_{t+1} is obtained by multiplying C by v_t and normalizing the result. It can easily be shown ? that v_t converges to the principal eigenvector w_1 of C, under the mild condition that v_0 is not strictly orthogonal to the principal eigenvector. The convergence rate is exponential, its base being the squared ratio of the two principal eigenvalues. The convergence criterion can be defined either as a minimum variation δ for v_t, or as a maximum number of iterations t_max ??. Algorithm 1 outlines the different steps.

Algorithm 1 Standard power iteration algorithm
  v_0 ← arbitrary initialization
  t ← 0
  repeat
    v ← C v_t
    v_{t+1} ← v / ||v||
    t ← t + 1
  until t > t_max and/or ||v_{t+1} − v_t|| ≤ δ
  return v_t

Note that, as the method converges to the principal eigenvector w_1, the normalizing factor ||v|| = ||C v_t|| converges to the associated eigenvalue λ_1, since by definition:

\[
C w_1 = \lambda_1 w_1 \tag{11}
\]

In practical settings, the power method quickly converges to a linear combination of eigenvectors whose eigenvalues are close, or to the eigenvector whose eigenvalue is the highest if the eigenvalues are well separated. As our purpose here is to find the subspace that contains the signal, eigenvectors with close eigenvalues can be thought of as generating a subspace in which similar amounts of information are retained along any vector of the subspace. Convergence to a linear combination of eigenvectors with close eigenvalues is therefore deemed acceptable as far as principal component aggregation is concerned.
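Algorithm 1 can be sketched in a few lines of Python with NumPy. The random initialization, the seed, and the default tolerances below are illustrative choices, not values from the paper.

```python
import numpy as np

def power_iteration(C, t_max=100, delta=1e-9, seed=0):
    """Standard power iteration (Algorithm 1): repeatedly multiply by C
    and normalize.  Returns the estimated principal eigenvector and the
    normalizing factor, which converges to lambda_1 by Equation (11)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=C.shape[0])     # random start, a.s. not orthogonal to w_1
    v /= np.linalg.norm(v)
    for _ in range(t_max):
        w = C @ v
        norm = np.linalg.norm(w)        # converges to the eigenvalue lambda_1
        w /= norm
        if np.linalg.norm(w - v) <= delta:
            return w, norm              # minimum-variation criterion met
        v = w
    return v, norm                      # t_max iterations reached
```

As stated in the text, the error shrinks geometrically with base (λ_2/λ_1)², so well-separated eigenvalues give fast convergence.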
3.4.2 Computation of subsequent eigenvectors

The standard way to employ the PIM in order to find the other eigenvectors (up to the number q) is the deflation method, which consists in applying the PIM to the covariance matrix from which we have removed the contribution of the k principal eigenvectors already computed ?. We obtain

\[
v_{t+1} = \Big(C - \sum_{l=1}^{k} w_l \lambda_l w_l^T\Big) v_t = C v_t - \sum_{l=1}^{k} w_l \lambda_l w_l^T v_t
\]

which boils down to orthogonalizing v_t with respect to all previously identified eigenvectors {w_l}_{1≤l≤k}. Algorithm 2 outlines the resulting procedure.

Algorithm 2 Power iteration algorithm with deflation
  k ← 0
  repeat
    k ← k + 1
    v_0 ← arbitrary initialization
    t ← 0
    repeat
      v ← C v_t − Σ_{l=1}^{k−1} w_l λ_l w_l^T v_t
      v_{t+1} ← v / ||v||
      t ← t + 1
    until t > t_max or ||v_{t+1} − v_t|| ≤ δ
    λ_k ← ±||v||
    w_k ← v_{t+1}
  until k = q or λ_k < 0

3.4.3 Implementation in the aggregation service

A fully distributed implementation of the algorithm is possible if the covariance matrix has been estimated as proposed in Section 3.3. One iteration then consists of two stages:

Computation of Cv. For all i ∈ {1, .., p},

\[
(Cv)[i] = \sum_{j=1}^{p} c_{ij} v[j]
\]

Since we made the assumption that c_ij = 0 for all i ∈ {1, .., p} and j ∉ N_i, the sum simplifies to

\[
(Cv)[i] = \sum_{j \in N_i} c_{ij} v[j]
\]

Recall that each node i has available the set of covariances (c_ij), j ∈ N_i. At each step t, node i therefore only has to broadcast v_t[i] and to receive the v_t[j], j ∈ N_i, from its neighbors in order to update locally its entry (Cv_t)[i] = Σ_{j∈N_i} c_ij v_t[j]. The power iteration method can thus be implemented in a fully distributed manner.

Normalization and orthogonalization. The norm can be computed using the primitives detailed in Section 2.1.2, while the orthogonalization is obtained by means of the following primitives:

  init(v_t[i]) = ⟨(v_t[i] w_l[i])_{1≤l≤k−1}⟩
  f(⟨X⟩, ⟨Y⟩) = ⟨X + Y⟩
  e(⟨X⟩) = X.
The resulting k − 1 scalar products {⟨w_l, v_t⟩}_{1≤l≤k−1} are then communicated to the sensors by means of an F operation, so that each sensor i can locally orthogonalize: v_{t+1}[i] ← v_t[i] − Σ_{l=1}^{k−1} ⟨v_t, w_l⟩ w_l[i].

3.4.4 Synchronization

The nodes have to be synchronized so that they all work on the same copy of v at the same time. The scheduling policy is presented in Algorithm 3. The implementation is straightforward since the aggregation service is synchronized.

Algorithm 3 Network synchronization scheme for the distributed power iteration algorithm
  k ← 0
  repeat
    k ← k + 1
    Each node i is initialized with C[i, i]
    repeat
      The nodes get their neighbors' values (v_t[j])_{j∈N_i}
      The computation of C v_t is performed in parallel
      ||v_t|| and {⟨w_l, v_t⟩}_{1≤l≤k−1} are computed by the aggregation service
      ||v_t|| and {⟨w_l, v_t⟩}_{1≤l≤k−1} are fed back into the network
      For all i, v_{t+1}[i] ← ((C v_t)[i] − Σ_{l=1}^{k−1} ⟨v_t, w_l⟩ w_l[i]) / ||v_t|| (in parallel on all nodes)
    until convergence is obtained
    w_k ← v
  until k = q or λ_k ≤ 0

3.4.5 Scalability analysis

Highest network load: At each iteration, the computation of the product C v_t requires a node i to send one packet and to receive |N_i| packets. The highest network load for this operation is therefore O(|N_{i*_N}|). The normalization step implies one A and one F operation (as defined in Section 2.1.3), and the orthogonalization step implies k − 1 operations of types A and F respectively, where k is the index of the principal component computed. The highest network load for the computation of the first q principal components amounts to O(q|N_{i*_N}| + q^2|C_{i*_C}|). The highest network load related to the power iteration method is therefore quadratic in the number of principal components computed.
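To make the data flow of Algorithm 3 concrete, one synchronized iteration can be simulated centrally as sketched below. Each node's covariance row is restricted to its neighborhood, and the global reductions stand in for the aggregation service. Note one deliberate simplification, flagged in the comments: we orthogonalize and normalize the new iterate itself, a numerically safe variant of the update step; all names are illustrative.

```python
import numpy as np

def distributed_pim_step(cov_rows, v, prev_eigvecs=()):
    """One synchronized iteration of the distributed power iteration.

    cov_rows[i] maps each neighbor j of node i to c_ij, so node i can
    compute its own entry of Cv from broadcast values only.  The norm
    and the scalar products with previous eigenvectors would be computed
    by the aggregation service (A and F operations); here they are plain
    reductions over all nodes."""
    p = len(v)
    # Stage 1: local sparse products (Cv)[i] = sum_{j in N_i} c_ij v[j]
    Cv = np.array([sum(c * v[j] for j, c in cov_rows[i].items())
                   for i in range(p)])
    # Stage 2: deflation -- project out previously identified eigenvectors
    # (variant: applied to the new iterate, which is numerically safer)
    for w in prev_eigvecs:
        Cv -= (w @ Cv) * w
    # Stage 3: normalization of the new iterate
    return Cv / np.linalg.norm(Cv)
```

Iterating this step reproduces the centralized PIM whenever the sparse rows coincide with the true covariance matrix, and yields subsequent eigenvectors when the previously found ones are passed in.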
Depending on the constraints of the network, it may be preferable to retrieve the approximated covariance matrix from the network and to perform a centralized computation of the eigenvectors at the base station. The tradeoff depends on the communication and computational resources available in the network, and will be discussed in more detail in Section 3.5.

Computational and memory costs: For a node i, the cost of the computation of C v_t is O(|N_i|). The cost of the normalization step is O(|C_i|) for node i, and that of the orthogonalization step is O(k|C_i|). The overall highest computational cost therefore amounts to O(q(|N_{i*_N}| + |C_{i*_C}|)). Regarding memory costs, each node i needs to maintain variables for storing its local w_ik and its neighbors' parameters w_jk, j ∈ N_i. The highest memory cost is therefore O(q + |N_{i*_N}|).

3.5 Summary

The benefits of the different approaches for computing the covariance matrix and its principal components depend on the communication, computational and memory constraints at the network nodes and at the base station. A summary of the complexities of these approaches is given in Table 1.

  Operation                      Communication         Computation                      Memory
  Covariance     Centralized     O(pT)                 O(p^2 T)                         O(p^2)
                 Distributed     O(|N_{i*_N}| T)       O(|N_{i*_N}| T)                  O(|N_{i*_N}|)
  Eigenvectors   Centralized     O(qp)                 O(p^3)                           O(p^2)
                 Distributed     O(q^2 |N_{i*_N}|)     O(q(|N_{i*_N}| + |C_{i*_C}|))    O(q + |N_{i*_N}|)

Table 1. Scalability of the centralized, hybrid and distributed schemes.

4 Experimental results

This section illustrates, by means of a compression task, the ability of the proposed approach to properly identify the principal component subspace, and discusses the tradeoffs involved between compression accuracy and communication costs.
The experimental study is based on a set of real-world temperature measurements involving a network of 52 sensors (Section 4.1), and on a network simulation used to vary the structure of the routing trees in order to study the impact of the network topology on the network load (Section 4.2). Results on the ability of a few components to retain most of the information are first illustrated in Section 4.3, followed in Section 4.4 by an analysis of the network loads as the number of retained components increases. The ability of the distributed PCA to properly identify the principal components is then studied in Sections 4.5 and 4.6, which focus on the covariance matrix approximation and eigenvector extraction stages, respectively. Section 4.7 concludes with a discussion of the main results.

4.1 Dataset description

Experiments were carried out using a set of five days of temperature readings obtained from a 54-node Mica2Dot sensor deployment at the Intel research laboratory at Berkeley ?. Sensors 5 and 15 were removed as they did not provide any measurements. The readings were originally sampled every thirty-one seconds. A preprocessing stage in which the data was discretized into thirty-second intervals was applied to the dataset. After preprocessing, the dataset contained a trace of 14400 readings from 52 different sensors. The code associated with the preprocessing and the network simulation was developed in R, an open-source statistical language, and is available from the authors' web site ?.

Examples of temperature profiles and dependencies between measurements are reported in Fig. 4 and 5, respectively. Sensors 21 and 49 were the least correlated ones over that time period, with a correlation coefficient of 0.59. They were situated on opposite sides of the laboratory. Temperature over the whole set of data ranged from about 15°C to 35°C.

Figure 4. Temperature measurements collected by sensor 21 over a five-day period (2880 measurements per day).

Figure 5. Example of the dependencies between the measurements of sensors 21 and 49 (correlation coefficient: 0.59).

4.2 Network simulation

The positions of the sensors are provided in ?, and the distribution of the sensors in the laboratory can be seen in Fig. 6. We analyzed the communication costs for different routing trees, which were generated in the following way. The root node was always assumed to be the top-right sensor node in Fig. 6 (node 16 in ?). The routing trees were generated on the basis of the sensor positions, and the radio range was varied from 6 meters (the minimum threshold such that all sensors could find a parent) to 50 meters (all sensors in radio range of the root node).
Starting from the root node, sensors were assigned to their parent in the routing tree using a shortest-path metric, until all sensors were connected. An illustration of the routing tree obtained for a maximum communication range of 10 m is reported in Fig. 6.

Figure 6. Map of the sensors and the routing tree obtained for a connectivity range of 10 meters. The root node connecting the sink is the top-right sensor. The routing tree has a depth of seven.

4.3 Principal component aggregation

We study in this section the amount of measurement variance that the PCA can retain for different training periods and numbers of components. We rely on a 10-fold cross-validation technique to ensure that the measurements used to compute the principal components are not used for assessing the accuracy of the approach. More precisely, the dataset is split into ten consecutive blocks (1440 observations each, i.e., half a day of measurements). Each of the ten blocks is used in turn as a training set to compute the covariance matrix and its eigenvectors, while the remaining observations, referred to as the test set, are used to estimate the percentage of retained variance. This provides ten estimates of the percentage of retained variance on measurements not used for computing the eigenvectors, which we average to obtain an overall estimate for the method.

Fig. 7 provides the average retained variance on the 10 test sets for the first 25 principal components. The upper line gives the average amount of variance retained when the principal components are computed on the test sets themselves, and provides an upper bound on the compression efficiency that the PCA can achieve on this dataset. The lower curve gives the average amount of variance retained on the test set when the components are computed on the training set.
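The retained-variance estimate used in this cross-validation can be sketched as follows: fit the top-q eigenvectors on a training block and measure the fraction of the test measurements' variance captured by the projection onto the corresponding subspace. A Python sketch (the paper's companion code is in R; names are ours):

```python
import numpy as np

def retained_variance(train, test, q):
    """Percentage of the test set's variance retained by projecting the
    (centered) test measurements onto the top-q principal components
    computed from the training set."""
    C = np.cov(train.T, bias=True)                  # training covariance
    eigvals, eigvecs = np.linalg.eigh(C)            # ascending eigenvalues
    W = eigvecs[:, np.argsort(eigvals)[::-1][:q]]   # top-q eigenvectors
    Xc = test - test.mean(axis=0)
    proj = Xc @ W @ W.T                             # reconstruction in the subspace
    return 100.0 * np.sum(proj ** 2) / np.sum(Xc ** 2)
```

With q equal to the number of sensors the projection is lossless and the function returns 100%; strongly correlated measurements give values close to 100% already for small q.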
This figure shows that the first principal component accounts on average for almost 80% of the variance, while 90% and 95% of the variance are retained with 4 and 10 components, respectively. The confidence level of these estimates was about ±5%. Additional experiments were run using K-fold cross-validation with K ranging from 2 to 30. The percentages of retained variance on the test data blocks tended to decrease with K. Losses of a few percent were observed for K higher than 15 (less than nine hours of training data).

The amount of retained variance increases very fast with the first principal components, and becomes almost linear after about ten components. A linear increase of retained variance with the number of principal components reflects the fact that the components obtained by the PCA are no better than random components ?. From Fig. 7, it therefore seems that from 10 or 15 components onwards, the remaining variations can be considered as white noise.

Figure 7. Capacity of the principal components to retain the measurement variance.

Fig. 8 illustrates the approximations obtained during the first round of the cross-validation (i.e., the principal components are computed from the first 12 hours of measurements) for sensor 49, using one, five and ten principal components. A single principal component provides rough approximations, which cannot account for fine-grained details of some of the sensor measurements. For example, the stabilization of the temperature around 20°C around noon during the second, third and fourth days (probably due to the activation of an air-conditioning system at a location close to sensor 49) is not captured by the approximations.
Increasing the number of principal components allows the local variations to be better approximated; passing to five components, for example, provides a much better approximation of the sensor measurements.

Figure 8. Approximations obtained on the test set for sensor 49, using one, five and ten principal components. The covariance matrix was computed from the first twelve hours of measurements (before the vertical dashed line).

4.4 Communication costs

We compare in the following the communication costs entailed by the default and PCAg schemes for different types of routing trees. Fig. 9 reports the total number of packets processed by the nodes (received and transmitted) during an epoch, for the default and PCAg schemes, as the radio communication range increases. The subfigure on the left reports the overall sensor network load, while the subfigures in the middle and on the right detail with boxplots the distributions of the network load per node, for the default and PCAg schemes, respectively.

Given that sensors choose as their parent the sensor within radio range that is closest to the base station, increasing the radio communication range typically leads the routing tree to have a smaller depth, and its nodes to have a higher number of children. As was pointed out in Section 2.5, this is detrimental to the PCAg scheme, as it may lead the number of packets received by a node to be higher than in the default scheme. For the default scheme, increasing the radio range reduces the overall sensor network load (Fig. 9, left), and eventually leads all the nodes but the root to transmit only one packet (Fig. 9, middle, radio range of 50 m). In this extreme case, the routing tree has depth one, and all nodes but the root are leaf nodes.
Note that in the default scheme the highest network load does not depend on the tree topology. This highest load is sustained by the root node which, whatever the depth of the tree, is required to forward the other nodes' packets (i.e., 51 measurements to receive and to send) and to send its own measurement. Its network load is therefore 103 packets per epoch.

For the PCAg scheme, we first report results for the extraction of one component. A nice feature of aggregation is that the overall network load (Fig. 9, left) does not depend on the topology, thanks to the fact that forwarding does not increase the number of transmissions. Looking at the details of the distribution of the network load per node (Fig. 9, right), it is interesting to see that increasing the radio range has the reverse effect for the PCAg scheme, as it tends to increase the network load. This is a direct consequence of the increased number of children induced by routing trees with smaller depths. Eventually, for a fully interconnected network, we observe the same effect as for the default scheme: all the sensors send only one packet, while the root node sustains the highest network load due to the forwarding task. Note however that, thanks to the aggregation process, the root only sends one packet, which reduces its network load to 52 packets per epoch (one packet transmitted and 51 packets received). The extraction of one component therefore decreases the network load supported by the root node.

The network load incurred by k components is obtained by multiplying the values found in Fig. 9 by k. The PCAg scheme may therefore be less efficient than the default scheme if many components are extracted.
Figure 9. Communication costs entailed by D and A operations. The overall network load (left) is in all cases higher for a D operation than for an A operation. Details of the distribution of the network load per node, reported as boxplots (middle and right), reveal that the load distributions are more balanced using aggregation.

Aggregation also reduces the highest network load. This is illustrated in Fig. 10, where the number of packets processed (received and sent) is reported as a function of the number of principal components extracted, for a radio range of 10 meters (illustrated in Fig. 6). In this routing tree, the highest number of children is 6. For the extraction of one PC, the highest network load is therefore 7, i.e., 6 receptions and one transmission, to be compared with the highest network load of 103 for the root node in the default scheme. This results in a reduction of about 85% of the network load. Extracting more than 15 components however leads the highest network load to be higher than in the default scheme, as the sensor node aggregating the packets from its 6 children then sustains a network load of 105 packets per epoch.
The overall network load generated by the aggregation of k components is, in the same manner, k times the load generated by the aggregation of one component. For a communication radius of 10 meters, this overall load was 103 packets, as reported in Fig. 9 (left), to be compared with an overall load of 466 packets for the default scheme. The overall load generated by the aggregation process is therefore higher than the load of the default scheme from 5 components onwards. This is due to the fact that aggregation, while decreasing the load of the most solicited node, evenly increases the load of all other nodes.

Figure 10. Comparison of the communication costs entailed by a D operation, and by A operations with 1, 5 and 15 principal components. The radio range is 10 meters.

4.5 Distributed covariance matrix

We study in this section the ability of the local covariance hypothesis to properly identify the principal component subspace, and discuss how the accuracy loss is counterbalanced by gains in energy consumption and network load. In Fig. 11, the upper curve gives the amount of variance retained if all covariances are computed, and is the same as in Fig. 7. The lower curves correspond to the percentage of variance retained as the radio range of the sensors is decreased, and illustrate the fact that the local covariance hypothesis may have as a negative consequence a loss of accuracy in identifying the principal components. If the local covariance hypothesis does not hold, the loss can be high, as illustrated by the radio range of 6 meters. This loss is however attenuated when the number of principal components increases.
In any case, the subspace obtained by computing the principal components from the approximate covariance matrix is significantly better at retaining information than a random subspace. We note that reducing the radio range also decreases the energy spent in communication, as a lower transmission power leads to a lower energy consumption of the radio.

Figure 11. Capacity of the principal components to retain the measurement variance, for different covariance approximations (all covariances, radio ranges of 20, 10 and 6 meters, and a random subspace).

Figure 12. Network load incurred by the local covariance updates.

We illustrate in Fig. 12 the potential savings in terms of network load obtained by decreasing the radio range. This figure reports the distribution of the network loads related to the computation of the covariance matrix for varying radio ranges. The average network load increases with the radio range, as the neighborhoods of the sensors get larger. Figs. 11 and 12 together provide an illustration of the tradeoff between the accuracy and the communication costs entailed by the local covariance hypothesis. A radio range larger than 20 meters does not bring much gain in terms of accuracy, while it strongly increases the network load. It is also interesting to compare the results reported in Fig. 12 and Fig. 9.
The highest network load caused by an update is in all cases lower using the distributed covariance matrix scheme (52 packets processed, against 103 at the root for the default data-gathering operation). The average network load can however be higher as the radio range increases, due to the fact that nodes close to the leaves process more packets in the distributed scheme than in the centralized scheme.

4.6 Distributed principal component computation

We finally discuss the ability of the power iteration method to properly identify the principal components. The main parameter for the estimation of the eigenvectors is the convergence criterion used. We illustrate in Fig. 13 the difference in accuracy obtained on the test set, for different convergence criteria, between the set of exact eigenvectors (computed in a centralized manner with the QR method) and the set of approximated eigenvectors obtained by means of the power iteration method. The convergence threshold δ was set to 10^−3, and we tested the accuracy obtained for 5, 10, 20, 30, 40 and 50 iterations. The results reported are averages of ten-fold cross-validations, and the confidence level of the results was about 2%.

This figure allows us to experimentally illustrate a set of possible behaviors, which can be understood by keeping in mind that the rate of convergence of the power method depends on the ratio of the two dominant eigenvalues. Typically, in correlated data, the ratio of subsequent eigenvalues decreases exponentially (which can be seen qualitatively in Fig. 7), making the convergence slower as the number of principal components computed increases. As a result, few iterations are usually enough to converge to the first eigenvectors. This is seen in Fig. 13 for the first PC, for which as few as five iterations are enough to converge nicely.
In the computation of the subsequent PCs, we observe that the number of iterations required to properly converge is higher (about 20 iterations led to an accuracy similar to that of the centralized approach). If the number of iterations is not high enough, the power iteration leads to a PC that may not represent the data as well as the centralized approach, as is observed for the second PC with a maximum of five iterations. Note that, since the PCs are estimated on the basis of past measurements whose distribution is not exactly the same as that of upcoming measurements, it may also happen that the approximations provide better accuracies than the centralized approach. This explains the gains in accuracy obtained around the 8-th component, where the difference between subsequent eigenvalues gets very low.

Figure 13. Comparison of the accuracy obtained using the exact and the approximated eigenvectors, for different numbers of iterations (5, 10, 20, 30, 40, 50) and principal components.

Figure 14. The network load is quadratic in the number of principal components (1, 5, 10 and 15 PCs).
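Subsequent components can be obtained by removing each converged component from the covariance matrix before the next power iteration (Hotelling deflation). The sketch below is a centralized stand-in for the distributed procedure, with illustrative names and defaults; it includes an early stop when a deflated matrix yields a non-positive dominant eigenvalue, mirroring the early stopping observed with non-positive-semi-definite covariance approximations.

```python
import numpy as np

def top_k_components(C, k, delta=1e-3, max_iter=50):
    """Extract the k leading eigenvectors of C by power iteration
    with Hotelling deflation (illustrative sketch).

    Returns early if a deflated matrix yields a non-positive dominant
    eigenvalue, so fewer than k components may be returned.
    """
    C = C.copy()
    n = C.shape[0]
    vecs = []
    for _ in range(k):
        v = np.ones(n) / np.sqrt(n)
        for _ in range(max_iter):
            w = C @ v
            norm = np.linalg.norm(w)
            if norm == 0.0:
                return np.array(vecs)
            w = w / norm
            if np.linalg.norm(w - v) < delta:
                v = w
                break
            v = w
        lam = v @ C @ v                  # Rayleigh quotient (eigenvalue)
        if lam <= 0.0:                   # early stop on non-positive eigenvalue
            break
        vecs.append(v)
        C -= lam * np.outer(v, v)        # deflate the found component
    return np.array(vecs)
```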
Radio range is 10 meters.

Additional experiments were also run to observe the consequences of the local covariance hypothesis on the positive semi-definiteness of the matrix. We observed that negative eigenvalues could lead the algorithm to stop as early as between 5 and 10 components (this was observed for radio ranges of 30 and 40 meters). It is however important to note that, despite the early stopping of the algorithm, the set of principal components identified was enough to retain more than 90% of the variance (cf. Fig. 11).

Finally, we report in Fig. 14 estimates of the communication costs entailed by the computation of the principal components. The radio range was 10 meters, and the costs were computed following the analysis of Sections 2.1.3 and 3.5. While about two hundred packets per node suffice for the computation of the first eigenvector, this quantity can reach a significant network load of 6000 packets on average for the computation of 15 components. As discussed in Section 3.5, the main advantage of the distributed computation of the principal components lies in the distribution of the computational and memory costs, and the approach does not scale well to the computation of a large number of principal components.

4.7 Summary

The main parameter of the principal component aggregation, which determines the tradeoff between the compression accuracy and the communication costs, is the number of principal components to extract. As illustrated in Figs. 10 and 14, the network load incurred by the computation of the principal components quickly increases with their number. The PCAg scheme therefore proves useful only if a few components are required. In such a case, we emphasize that the gains can be dramatic, as reported in Figs. 10 and 14. The amount of information retained by a set of principal components depends on the correlation existing between the sensor measurements.
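The compression scheme summarized above can be sketched end-to-end on synthetic data. The dataset below is a hypothetical stand-in for the temperature measurements (sizes, signal model, and noise level are ours): each epoch of 50 readings is summarized by 5 principal component coefficients, from which the base station reconstructs the signals.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the temperature dataset: 100 epochs of
# 50 strongly correlated sensor signals (not the actual data).
t = np.linspace(0.0, 4.0 * np.pi, 100)[:, None]
phase = rng.uniform(0.0, 0.5, size=(1, 50))
X = np.sin(t + phase) + 0.05 * rng.normal(size=(100, 50))
X = X - X.mean(axis=0)

# Principal subspace of the (centralized) covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
W = eigvecs[:, ::-1][:, :5]   # the 5 leading principal components

# Compression: each epoch is summarized by 5 coefficients instead of
# 50 raw measurements; the base station reconstructs from them.
Z = X @ W          # the quantities aggregated along the routing tree
X_hat = Z @ W.T    # reconstruction at the base station

err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error with 5 PCs: {err:.3f}")
```

With strongly correlated signals such as these, a handful of components retains well over 90% of the variance, and the reconstruction error is dominated by the measurement noise.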
We illustrated the method for a compression task using a real-world temperature dataset where the measurements had a degree of correlation that can be assumed to be representative of a typical sensor network scenario. An interesting result to recall is that with 5 principal components, 90% of the variance could be retained. This was shown to provide a good approximation of the original signals (Fig. 8), and to reduce the highest network load from about 100 packets per epoch to about 40 packets per epoch (Fig. 10).

Regarding scalability, an interesting property of the PCAg is that the network load is better distributed in the network. This property first prevents the congestion issues observed at the root node with the default scheme. Second, as radio communication is a primary source of energy depletion for a wireless node, it better balances the lifetimes of the nodes in the network. The PCAg therefore provides an interesting framework for dealing with data in multi-hop networks.

5 Related work

The distribution of the PCA has been addressed in the signal processing and information theory communities ????. These approaches, based on distributed source coding, involve compression techniques related to Slepian-Wolf and Wyner-Ziv coding, together with the Karhunen-Loève transform, for compressing data in distributed systems. Each component of the system is however assumed to observe several dimensions of the problem, and these approaches require the base station to support the task of defining the coding scheme. The network architectures and communication cost analyses are also left as open research areas. Although these approaches may lead to attractive applications for WSNs, they remain in that respect at an early research stage. In the fields of machine learning and data mining, distributed PCA schemes have also been addressed in ?? and ?.
The former approaches aimed at combining the PCA results obtained from different sources that share the same variables (vertically distributed databases). The latter work provided an architecture aimed at network anomaly detection, where the PCA is first computed in a centralized manner, and a subsequent distributed and adaptive feedback mechanism is proposed. Despite the similarity in the choice of method name, the corresponding work is however clearly different from the approach proposed in this article. The distributed computation of eigenvectors of large and sparse matrices has been tackled from several angles in the literature, and good reviews of existing state-of-the-art techniques are for example given in ??. The approach proposed in this article is however, to the best of the authors' knowledge, innovative, as it leverages two specific architectural properties of sensor networks. First, a wireless sensor network is an intrinsically distributed system where each component only captures one dimension of the system variations. Second, the communication constraints can be coupled with the local covariance hypothesis.

Recent work in the domain of link analysis and recommendation systems has led the authors of ? to propose a distributed implementation of the power iteration method which closely matches the approach proposed in this article. Their algorithm aims at computing the principal eigenvector of the adjacency matrix of a peer-to-peer network, which leads to a ranking of the network components, in a way similar to the PageRank algorithm. The underlying hypotheses of the distributed system are interestingly close to ours (each component has access to only one dimension of the problem, and can communicate with the components whose data is related). The network structure is however different, as no coordinating entity is assumed.
This leads the authors to rely on the harmonic mean to achieve the normalization step of the power iteration method. The computation of subsequent eigenvectors is also not addressed.

Related work on the use of basis changes for sensor networks has been addressed in ?, where the authors took the radical position of using random bases to project the sensor measurements. That work is analyzed in the context of compressed sensing, an emerging field in signal processing (see also ?). It has however mainly focused on the theoretical ability of random bases to retain the sensor measurement variations.

Conclusion

This article extended previous work related to the distribution of the principal component analysis by presenting an implementation suitable for wireless sensor networks. The approach relies on the hypothesis that sensor measurements collected by distant sensors are uncorrelated, and was shown to provide significant gains in terms of radio communication. Additionally, we showed that the distributed principal component analysis balances the network load among the sensors, making the method particularly suitable for multi-hop sensor networks. We plan to extend this work by showing that spatiotemporal aggregation and independent component analysis can also be formulated in the same framework.

Acknowledgments

This work was supported by the COMP2SYS project, sponsored by the Human Resources and Mobility program of the European Community (MEST-CT-2004-505079), and by the PIMAN project, supported by the Institute for the Encouragement of Scientific Research and Innovation of Brussels, Belgium.

© 2008 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).