Sensors 2008, 8, 4821-4850; DOI: 10.3390/sensors
OPEN ACCESS
sensors
ISSN 1424-8220
www.mdpi.org/sensors
Article
Distributed Principal Component Analysis for Wireless Sensor Networks
Yann-Aël Le Borgne 1,⋆, Sylvain Raybaud 2, and Gianluca Bontempi 1
1 Machine Learning Group, Département d'Informatique, Faculté des Sciences, Université Libre de
Bruxelles, Boulevard du Triomphe, 1050 Brussels, Belgium
2 École Normale Supérieure de Cachan, 61, Avenue du Président Wilson, 94235 Cachan Cedex, France
E-mails: yleborgn@ulb.ac.be; sraybaud@dptmaths.ens-cachan.fr; gbonte@ulb.ac.be
⋆Author to whom correspondence should be addressed.
Received: 27 May 2008; in revised form: 29 July 2008 / Accepted: 4 August 2008 /
Published: 11 August 2008
Abstract:
Principal Component Analysis (PCA) is a dimensionality reduction technique well suited to
processing data from sensor networks. It can be applied to tasks such as compression, event
detection, and event recognition. The technique is based on a linear transform in which the
sensor measurements are projected onto a set of principal components. When sensor measurements
are correlated, a small set of principal components can explain most of the variability in the
measurements. This makes it possible to significantly reduce the amount of radio communication
and the energy consumption. In this paper, we show that the power iteration method can be
distributed in a sensor network in order to compute an approximation of the principal
components. The proposed implementation relies on an aggregation service, which has recently been
shown to provide a suitable framework for distributing the computation of a linear transform
within a sensor network. We also extend this previous work by providing a detailed analysis of
the computational, memory, and communication costs involved. A compression experiment
involving real data validates the algorithm and illustrates the tradeoffs between accuracy and
communication costs.
Keywords: Wireless sensor networks, distributed principal component analysis, in-network
aggregation, power iteration method.
arXiv:1003.1967v1 [cs.NI] 9 Mar 2010
1. Introduction
Efficient in-network data processing is a key factor in enabling wireless sensor networks (WSN) to
extract useful information, and an increasing amount of research has been devoted to the development of
data processing techniques ????. Wireless sensors have limited resources in terms of energy,
network data throughput and computational power. In particular, radio communication is an
energy-consuming task and is identified in many deployments as the primary cause of sensor node battery
exhaustion ?. Emitting or receiving a packet is indeed orders of magnitude more energy-consuming than
elementary computational operations. Reducing the amount of data transmitted has therefore
been recognized as a central issue in the design of data gathering schemes for wireless sensor networks
?. Data compression is often acceptable in real settings, since raw data collected by sensors typically
contain a high degree of spatio-temporal redundancy ????. In fact, most applications only require
approximate or high-level information, such as the average temperature in a room, the humidity levels
in a field with a ±10% accuracy, or the detection and position of a fire in a forest.
An attractive framework for processing data within a sensor network is provided by data aggregation
services such as those developed at UC Berkeley (TinyDB and TAG projects) ??, Cornell University
(Cougar) ?, or EPFL (Dozer) ?. These services aim at aggregating data within a network in a time- and
energy-efficient manner. They are suitable when the network is connected to a base station from which
queries on sensor measurements are issued. In TAG or TinyDB, for example, queries are entered by
means of an SQL-like syntax that tasks the network with sending raw data or aggregates at regular time
intervals. These services make it possible to compute common operators such as average, min, max, or
count “within the network”, thereby greatly decreasing the amount of data to be transmitted. They
typically rely on synchronized routing trees, along which data is processed and aggregated on its way
from the leaves to the root ??.
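The way an average can be computed "within the network" along a routing tree can be illustrated with a minimal Python sketch. It is not the code of TinyDB, TAG, Cougar, or Dozer: the `Node` class, the function names `init`, `merge`, `evaluate`, and `aggregate`, and the sample readings are all assumptions made for illustration. The idea shown is that each node forwards a single fixed-size partial state (here a sum and a count) to its parent, instead of relaying every raw reading from its subtree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A sensor node in a routing tree (hypothetical structure, for illustration)."""
    reading: float
    children: list = field(default_factory=list)


def init(reading):
    # Partial state for AVERAGE built from a single reading: (sum, count).
    return (reading, 1)


def merge(a, b):
    # Combine two partial states; the operation is associative and commutative.
    return (a[0] + b[0], a[1] + b[1])


def evaluate(state):
    # Turn the final partial state into the aggregate value (done at the root).
    total, count = state
    return total / count


def aggregate(node):
    """Return the partial state for the subtree rooted at `node`.

    Each node merges its own reading with the states received from its
    children, so it sends one (sum, count) pair upward whatever the size
    of its subtree.
    """
    state = init(node.reading)
    for child in node.children:
        state = merge(state, aggregate(child))
    return state


# Made-up readings: the root has two children, one of which has two leaves.
root = Node(20.0, [Node(22.0, [Node(21.0), Node(25.0)]), Node(19.0)])
average = evaluate(aggregate(root))
print(average)
```

Operators such as min, max, or count fit the same pattern with a different partial state and merge function. The recursion here only simulates the leaves-to-root flow on one machine; in a deployed network each node would run the merge step locally and transmit the result by radio.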
We recently showed that a data aggregation service can be used to represent sensor measurements
in a different space ?. We suggested that the space defined by the principal component basis, which
makes data samples uncorrelated, is of particular interest for sensor networks. This basis is obtained by
Principal Component Analysis (PCA) ?, a well-known technique in multivariate data analysis. The
design of an aggregation scheme which distributes the computation of the principal component scores
(i.e., the transformed data in the PCA space) has three major benefits. First, PCA provides varying
levels of compression accuracy, ranging from constant approximations to full recovery
…(Full text truncated)…