Model-Based Event Detection in Wireless Sensor Networks


Authors: Jayant Gupchup, Randal Burns, Andreas Terzis, Alex Szalay

Jayant Gupchup, Randal Burns, Andreas Terzis
Department of Computer Science, Johns Hopkins University, 3400 N. Charles St, Baltimore, MD 21218
{gupchup,randal,terzis}@jhu.edu
Alex Szalay
Department of Physics and Astronomy, Johns Hopkins University, 3400 N. Charles St, Baltimore, MD 21218
szalay@jhu.edu
Abstract: In this paper we present an application of techniques from statistical signal processing to the problem of event detection in wireless sensor networks used for environmental monitoring. The proposed approach uses the well-established Principal Component Analysis (PCA) technique to build a compact model of the observed phenomena that is able to capture daily and seasonal trends in the collected measurements. We then use the divergence between actual measurements and model predictions to detect the existence of discrete events within the collected data streams. Our preliminary results show that this event detection mechanism is sensitive enough to detect the onset of rain events using the temperature modality of a wireless sensor network.
I. INTRODUCTION
A number of testbeds (e.g., [1–3]) have shown the potential of wireless sensor networks (WSNs) to collect environmental data at previously unimaginable spatial and temporal densities. These developments present many data management challenges. First, our experience from the deployments has made clear the shortcomings of the static behavior of current sensor networks. For example, scientists would like to sample the environment at a high frequency to capture detailed information about "interesting" events, but doing so would create an inordinate amount of data. On the other hand, sampling at a lower frequency generates less data but misses important temporal transients.
Second, the large amount of data that these networks generate complicates the querying and post-processing stages. Rather than manually traversing the collected data, scientists would prefer to query for measurements related to certain events (e.g., significant rainfall). To address these issues, we need WSNs that can reason about the phenomena they observe and change their behavior based on the events they detect. Possible adaptation strategies include changing the sampling rate as well as waking up other nodes in the network to increase spatial coverage of the detected event [4, 5].
The readings of sensors are superpositions of several processes. They are often dominated by predictable foregrounds, which can be much larger than the subtle trends and variations that we are trying to measure or the small events that we try to detect. In order to interpret the readings, it is important to separate these different signals into independent components. In environmental monitoring, most sensors witness daily variations of all quantities as well as seasonal trends. In addition, there are discrete natural events (storms, rainfall, strong winds) that have a separable effect on our data. We present an approach that uses techniques of statistical signal processing to decompose the sensor readings into various physically meaningful components. In our approach, we perform a step-by-step identification of the various foregrounds. We identify the diurnal cycle present in both the box and soil temperature sensor data, and we account for the effect of seasonal drift. We make use of all these priors (daily cycle, seasonal drift) to detect events by identifying when measurements diverge from those expected from the foregrounds.
Specifically, we explore variants of Principal Components Analysis (PCA) [6] that we use to extract features from the data collected by the network and to discover the multiple underlying physical processes that generate the observed data. This produces a model of "normal behavior." Observations that diverge from the model correspond well with events. We note that one can build the PCA model offline using historical data and that a small number of parameters summarize the phenomena that the motes sense. Such a compact representation of the model makes it possible to build a lightweight event detection mechanism that runs in real time on the network's motes.
We evaluate the performance of the proposed mechanism using data from the Life Under Your Feet environmental sensing network [1]. We execute the event detection algorithm to detect rain events within the deployment area over ten months of the network's lifetime. We compare the list of detected events with precipitation data recorded by a weather station at BWI airport.
This specific application reveals another aspect of the proposed approach: while the motes in our network have soil moisture sensors, these sensors cannot detect the onset of a rain event, because soil moisture rises only after the water seeps through the soil. Instead, we use a combination of air and soil temperature measurements to detect when rain starts to fall. Figure 1 shows that temperature varies immediately with the onset of an event, but that soil moisture lags by several hours. The model allows us to detect the rain event rapidly, based on indirect evidence, prior to the rain's direct effect on soil moisture.
[Fig. 1. Air temperature is a better indicator of the onset of a rain event compared to soil moisture. (Value vs. hour; series: air temperature, soil moisture, typical air temperature profile, event period.)]
This better describes system behavior, capturing much more information about the dynamics of soil moisture in response to rain.
A. Environmental Sensing
While our solution is generally applicable to WSNs that collect large amounts of data using multiple sensing modalities, we present our design through an environmental monitoring application that we developed and that has been deployed for over 18 months at an urban forest in Baltimore, MD. The purpose of the Life Under Your Feet network is soil monitoring: each of the network's ten motes periodically collects measurements, including soil temperature and soil humidity, as well as ambient temperature and light. The key difference between this application and previous environmental monitoring networks (e.g., [2, 3]) is that all raw measurements are reliably retrieved at the network's base station, which subsequently inserts them into an SQL database. This stringent reliability requirement is dictated by our scientific collaborators and the research mission of the monitored site.
Each mote takes measurements at one-minute intervals and records them temporarily to its integrated flash memory. The MicaZ motes we use have a total capacity of 512 KB of flash storage [7]. In general, each mote stores 23 KB of measurement data per day, which means that measurement data will be lost if not collected within 20 days. In practice, we download data from each of the network's motes at least once a week, using an automatic repeat request (ARQ) protocol to ensure reliable delivery in the presence of packet losses. We also extract weather information (air temperature and rain events) from a weather station at BWI airport, located 25 miles away from our deployment site.
The data scraping program we use inserts this data into the same database, allowing meteorological information, such as rain duration and amount of rainfall, to be correlated with the data collected by the sensor network.
II. RELATED WORK
PCA event detection constructs a model of system behavior. We consider two applications of model-based event detection in describing related work. The first is an offline variant in which event detection happens at the database that stores the measurements collected by the network and is used to automatically identify "interesting" regions within the swaths of data acquired by the sensor network. The other is online, in that motes in the network use models to detect events and alter their behavior.
Offline event detection provides a model suitable for querying events from noisy and imprecise data. Both database systems [8, 9] and sensor networks [10–12] have explored model-based queries as a method for dealing with irregular or unreliable data. Models in these systems include Gaussian processes [10], interpolation [13, 14], regression [10, 15] and dynamic probabilistic models [9, 11]. We give another, PCA-based model specifically suited to event detection. MauveDB [9] provides a user-view interface to model-based queries, which greatly extends the utility and usability of models. We intend to implement our offline PCA model within the MauveDB framework.
In the online case, sensor networks reduce the bandwidth requirements of data collection by suppressing results that conform to the model or by compressing the data stream through a model representation. This has coincident benefits for resource and energy usage within the network. If sensors measure spatially correlated values, values collected from a subset of nodes can be used to materialize the uncollected values from other nodes [16, 17].
Similarly, temporally correlated values may be collected infrequently and missing values interpolated [11, 18]. By placing models in the mote itself, the mote may transmit model parameters in lieu of the data, compressing or entirely suppressing the data stream [19–21]. Our PCA model may be used for suppression and compression and may also be used to alter the behavior and configuration of the network, e.g., collecting data only when events occur and turning off large portions of the network at other times.
Most research on "event detection" describes data fusion and in-network event processing, rather than the detection of an event based on the data. REED provides in-network joins to report event conditions that are programmed declaratively [22]. Other systems make sure that multiple sensors detect an event prior to reporting it [23, 24]. Our work focuses on using PCA models to rapidly and accurately report an event at a single mote. This single-mote report serves as an input to fusion and event query evaluation. Other ecological monitoring systems use simple rising-edge or trigger/threshold-based event detectors at each mote [25].
We use PCA to determine that a reading or time series is dissimilar to the normal behavior of the system, as characterized by the principal components. Similar uses of PCA include anomaly and intrusion detection in computer networks [26, 27] and leakage detection in gas sensor arrays [28]. Recently, PCA has been applied to event detection in the Internet, specifically to identifying correlated throughput and loss events on multiple Internet paths [29]. However, the authors provide no details of their approach.
[Fig. 2. Mean-subtracted profile of air and soil temperature (the latter scaled up by a factor of 20) for a typical 24-hour cycle. (Mean-subtracted temperature in °C vs. hour of day.)]
III. METHODOLOGY
Principal component analysis (PCA) [6], or the Karhunen-Loève transform (KLT), is a powerful statistical tool for simplifying data by reducing high-dimensional datasets into low-dimensional datasets that approximate the original data. It does so through the singular value decomposition (SVD), an orthogonal linear transform of a matrix (the original data) into an equivalent diagonalized matrix. The diagonal entries are singular values; their squares are proportional to the eigenvalues of the data's correlation matrix, whose eigenvectors are called basis vectors. The eigenvectors with the largest eigenvalues represent the "most important" dimensions, in that these dimensions have the maximum variance and strongest correlation in the dataset. Thus, the dataset may be reduced to just those dimensions (eigenvectors) with large eigenvalues. Data analysis may be performed in the lower-dimensional representation with good fidelity to results on the original data. The lower-dimensional space offers benefits not only in data size, computational complexity, and ease of visualization; these vectors also represent the "typical" patterns of the data, whereas the residuals correspond to "atypical" behavior. PCA has seen a wide range of applications, including clustering, correlation detection, pattern matching, and data compression.
A. Applying PCA to sensor measurements
We apply our PCA model to air temperature and soil temperature sensor readings. Sensor readings exhibit typical diurnal cycles, which dominate every other signal present. Fig. 2 shows the mean-subtracted profile of a typical day for air temperature and soil temperature. For air temperature, we note the rise in temperature as the sun comes out in the morning and the fall in temperature as the sun goes down in the evening.
We also observe that soil temperature changes lag air temperature changes by several hours, owing to the inertia of the soil; there is a noticeable phase shift between air temperature and soil temperature. This pattern (the AC component) is exhibited by all normal (non-event) days of all seasons around the average value (the DC component) for that day.
LUYF sensors record measurements once every minute. We aggregate and smooth multiple readings, which produces a data series with a reading every 10 minutes. We find empirically that a 10-minute average reveals useful information from the data: it smooths transients, yet samples at a relatively high frequency. This data series is then converted into an array of vectors such that each vector represents a day's readings from midnight to midnight. In a given day, we have 144 ten-minute intervals.
[Fig. 3. Daily temperature eigenvectors in decreasing order of eigenvalues. The top panel shows the basis vectors for air temperature, while the lower panel displays the basis vectors for soil temperature. (Loading vs. loading number for eigenvectors 1 to 4.)]
We clean the data prior to building the model in order to best characterize the "normal" behavior of the system. We subtract the mean temperature of the given day (calculated separately for each sensor) from each of these vectors and normalize the readings in the RMS sense. Using normalized vectors ensures that the diagonal elements of the correlation matrix are unity. Thus, each vector contributes equally to the PCA basis. This balances the contribution of summer and winter to the model, even though summer days have higher variance.
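The preprocessing steps just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a gap-free stream of one-minute readings that starts at midnight, it uses a plain 10-minute mean for the "aggregate and smooth" step, and the function names (`ten_minute_averages`, `day_vectors`, `normalize_day`) are hypothetical.

```python
import math

MINUTES_PER_SLOT = 10                        # one aggregated reading every 10 minutes
SLOTS_PER_DAY = 24 * 60 // MINUTES_PER_SLOT  # 144 slots from midnight to midnight


def ten_minute_averages(minute_readings):
    """Average consecutive 1-minute readings into 10-minute bins.

    Assumes the stream starts at midnight and has no gaps; an incomplete
    trailing bin is dropped.
    """
    n_slots = len(minute_readings) // MINUTES_PER_SLOT
    return [
        sum(minute_readings[s * MINUTES_PER_SLOT:(s + 1) * MINUTES_PER_SLOT]) / MINUTES_PER_SLOT
        for s in range(n_slots)
    ]


def day_vectors(slots):
    """Split the 10-minute series into complete 144-element daily vectors."""
    n_days = len(slots) // SLOTS_PER_DAY
    return [slots[d * SLOTS_PER_DAY:(d + 1) * SLOTS_PER_DAY] for d in range(n_days)]


def normalize_day(day):
    """Subtract the day's mean and scale the result to unit RMS.

    Returns (mean, normalized_vector); the mean is kept because it can serve
    as the extra coefficient e_i0 mentioned in the text.
    """
    mean = sum(day) / len(day)
    centered = [v - mean for v in day]
    rms = math.sqrt(sum(v * v for v in centered) / len(centered))
    if rms == 0.0:
        raise ValueError("flat day: no daily shape to normalize")
    return mean, [v / rms for v in centered]
```

For one sensor, `[normalize_day(d) for d in day_vectors(ten_minute_averages(readings))]` yields a zero-mean, unit-RMS vector per complete day; running it separately for each sensor matches the per-sensor mean subtraction described above.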
In order to obtain a well-behaved basis, we censor from our training set the days that have a lot of inherent noise and jitter. We apply a simple median filter to get rid of these "bad" days. After cleaning the data, we perform an SVD on the data to produce our orthogonal eigenvectors (basis vectors) and order these vectors by decreasing eigenvalue. Fig. 3 shows the basis obtained for air temperature and soil temperature for the LUYF deployment for the period from September 2005 to July 2006. We find that the first 4 eigenvectors cover 90.95% of the total variation in the air temperature data and 98.89% in the soil temperature data (as defined by the sum of the first four eigenvalues of the diagonal matrix divided by the trace). The first eigenvector accounts for 55% of the total variation in the air temperature data.
The physical meanings of the different eigenvectors are apparent. The first component of the air temperature is a bell-shaped curve, corresponding to the slow rise of the temperature around 7 am, then cooling after 3 pm. The second eigenvector rises monotonically throughout the day, describing a warming/cooling trend from one day to another. The third vector causes the bell-shaped curve of the temperature to slide forward or backward, representing the effect of seasonal warming and cooling. Finally, the fourth eigenvector represents the broadening and shortening of the daily temperature cycle, again a seasonal effect. The soil has a large inertia in responding to changes in the external temperature; its characteristic timescale is longer than a day. This manifests itself in the fact that the most significant eigenvector is the cooling/warming trend, while all others (daily cycle, shift and broadening) are substantially suppressed in amplitude and have a significant phase shift.
B. Expansion on the Basis and Long-Term Trends
To complete the model, we factor in the contributions of all sensors over all time. We expand all the daily vectors over the basis vectors. This gives us four coefficients ($e_{i1}, \dots, e_{i4}$) to describe the daily behavior of the temperature for each sensor $i$ (five, if we add the mean temperature as $e_{i0}$). In order to identify long-term trends, we iteratively run a low-pass filter with a fixed width of one week over the different series, resulting in the smooth series $s_{i0}, \dots, s_{i4}$. For each of those coefficients we average over all sensors to get the smooth mean ($S_0, \dots, S_4$). Hereafter, we use capitals to denote a time series averaged over all the sensors. The smoothed series exhibit strong correlations. $S_3$ and $S_4$ describe the beginning and the length of daytime, whereas $S_2$ describes the slow warming and cooling trends associated with the changes of seasons. These smooth trends serve as the background to all the other variations.
C. Event detection
Our general approach to event detection looks at the coefficient of the first eigenvector. We began by looking at the projections of each day's mean-subtracted air temperature on the first few eigenvectors. Although the first 4 eigenvectors for air temperature represent 90.95% of the total variation in the data, we realized that most of the information is carried by the coefficient of the first eigenvector. Thus, we were able to analyze an entire day's data by looking at the series $e_{i1}$, thereby achieving massive compression. We created the data series $E_1$, the daily first-eigenvector coefficient averaged over all sensors. We applied a threshold on the $E_1$ series to detect events: low values correspond to behavior that differs from the model. We refer to this method as the BASIC method.
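The basis construction and expansion can be sketched without external libraries. This is an illustrative stand-in, not the authors' implementation: where the text performs an SVD, this sketch uses power iteration with deflation on the slot-by-slot correlation matrix of the normalized day vectors, which recovers the same leading eigenvectors when consecutive eigenvalues are well separated. All function names are hypothetical. `explained_fraction` mirrors the text's definition (sum of the retained eigenvalues divided by the trace), `expand` gives the coefficients $e_{i1}, \dots, e_{ik}$ of one day, and `abs_residuals` gives the per-slot gap between a day and its rank-$k$ reconstruction, analogous to the residual curve in Fig. 4.

```python
import math
import random


def correlation_matrix(days):
    """Mean of x x^T over normalized day vectors: an m x m matrix (m = 144 in the text)."""
    m = len(days[0])
    c = [[0.0] * m for _ in range(m)]
    for x in days:
        for j in range(m):
            row, xj = c[j], x[j]
            for k in range(m):
                row[k] += xj * x[k]
    n = len(days)
    return [[v / n for v in row] for row in c]


def _mat_vec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def top_eigenpairs(c, k, iters=500, seed=0):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix.

    Power iteration with deflation, standing in for the SVD used in the text;
    convergence slows when consecutive eigenvalues are close.
    """
    rng = random.Random(seed)
    m = len(c)
    a = [row[:] for row in c]
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # v lies in the null space: its eigenvalue is 0
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, _mat_vec(a, v)))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next eigenvector.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return pairs


def explained_fraction(c, pairs):
    """Sum of the retained eigenvalues divided by the trace of c."""
    trace = sum(c[i][i] for i in range(len(c)))
    return sum(lam for lam, _ in pairs) / trace


def expand(day, basis):
    """Expansion coefficients e_1..e_k of one normalized day vector."""
    return [sum(x * b for x, b in zip(day, vec)) for vec in basis]


def abs_residuals(day, basis):
    """Per-slot |day - reconstruction| using the given basis vectors."""
    coeffs = expand(day, basis)
    recon = [sum(e * vec[j] for e, vec in zip(coeffs, basis)) for j in range(len(day))]
    return [abs(x - r) for x, r in zip(day, recon)]
```

Eigenvector signs are arbitrary in any PCA routine, so a detector that thresholds $e_{i1}$ should fix the sign convention once (for example, so that the bell-shaped first eigenvector has a positive peak) before comparing coefficients across runs.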
Although this approach gave us satisfactory results, it does not take into account the seasonal drift. We improve on the BASIC detector by removing the seasonal drift, i.e., by high-pass filtering the $e_{i1}$ data series. We implement the high-pass filter as the difference $D_1 = E_1 - S_1$ between the data series $E_1$ and the smoothed series $S_1$. We refer to this method as the HIGHPASS method. It significantly improves the number of events detected and reduces the number of false negatives.
[Fig. 4. Difference between air temperature measurement and model projection for the rain event on 2006-01-18. (Temperature in °C vs. hour of day; series: mean-subtracted air temperature, model temperature using basis vectors 1 to 4, absolute residuals.)]
The last approach we present makes use of the inertia exhibited by the soil temperature. Since soil temperature changes much more slowly than air temperature, we looked at the difference between the high-pass filtered series $D_1$ for air temperature and the high-pass filtered series $D_1$ for soil temperature, and then set a suitable threshold for detecting events. We refer to this approach as the DELTA method. It significantly outperforms the BASIC and HIGHPASS methods. We find that, because of the inertia shown by soil temperature, the eigen-coefficients $E_1$ for soil temperature show sharp changes on the day(s) after the event. This made the event days easier to identify.
IV. EVALUATION
We use our model to detect events in the deployment data for the period between September 2005 and August 2006 and compare the results with the actual known events recorded by a weather station at Baltimore-Washington International (BWI) airport [30]. We assume that rain at BWI implies rain at Johns Hopkins University, Baltimore, which is located 25 miles away.
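The three detectors of Section III-C can be sketched on top of the daily first-coefficient series. Several details here are assumptions rather than statements from the paper: the one-week low-pass filter is modeled as a centred 7-day moving average, truncated at the series ends and applied `passes` times (the text says the filter is run iteratively but does not give the number of passes); DELTA is taken as the air series minus the soil series; and a day is flagged when its score falls below a threshold, consistent with the text's observation that event days show low or negative values. Because a moving average is linear, smoothing the sensor-averaged $E_1$ gives the same $S_1$ as smoothing each $e_{i1}$ and then averaging, provided all sensors cover the same days.

```python
def sensor_average(per_sensor):
    """Average aligned daily series over sensors, e.g. the e_i1 series -> E_1."""
    return [sum(vals) / len(vals) for vals in zip(*per_sensor)]


def moving_average(series, width=7, passes=1):
    """Centred moving average over `width` days, truncated at the series ends.

    `passes` repeats the filter; how many passes the paper used is not stated.
    """
    half = width // 2
    out = list(series)
    for _ in range(passes):
        prev = out
        out = []
        for t in range(len(prev)):
            window = prev[max(0, t - half):t + half + 1]
            out.append(sum(window) / len(window))
    return out


def high_pass(e1, width=7, passes=1):
    """D_1 = E_1 - S_1: subtract the smoothed seasonal drift."""
    s1 = moving_average(e1, width, passes)
    return [e - s for e, s in zip(e1, s1)]


def delta_scores(e1_air, e1_soil, width=7, passes=1):
    """DELTA score: high-passed air series minus high-passed soil series."""
    d_air = high_pass(e1_air, width, passes)
    d_soil = high_pass(e1_soil, width, passes)
    return [a - s for a, s in zip(d_air, d_soil)]


def flag_days(score, threshold):
    """Day indices whose score falls below the threshold (candidate events)."""
    return [t for t, v in enumerate(score) if v < threshold]
```

With these helpers, BASIC is `flag_days(E1, th)`, HIGHPASS is `flag_days(high_pass(E1), th)`, and DELTA is `flag_days(delta_scores(E1_air, E1_soil), th)`; the threshold `th` is chosen empirically for each method.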
In our evaluation, we only consider rain events that are prominent: we define event days as days with precipitation greater than 3 mm. We considered 225 days in the period from September 17, 2005 to July 20, 2006, and found that 48 events fit this criterion. Many other types of events also occurred during the days of our sampling: faulty sensors, motes running out of power, etc. Particularly interesting was a period of about 45 days, from mid-March 2006 to the end of April 2006, in which there were many anomalies in the $e_1$ values. This was the result of sporadic direct sunlight heating up the motes. After April, there was enough foliage cover that the motes (located at ground level) were not exposed to direct heating by the sun.
We focus on the efficiency of detecting rain events from temperature data alone. There is a good physical basis for this: during rainfall the temperature suddenly drops, but once the rain is over it recovers. This produces a large transient in the shape of the 24-hour cycle for that particular day, resulting in a smaller $e_1$ coefficient and a larger residual. Fig. 4 illustrates this: we observed a major event on 2006-01-18, with heavy rain between 9:00 AM and 11:00 AM, and we can clearly see large residuals for this period.
[Fig. 5. Precision-recall curve for the DELTA method. (Precision and recall vs. threshold value.)]
TABLE I. PERFORMANCE OF DIFFERENT METHODS FOR DETECTING EVENTS
Method      Precision   Recall    False negatives
BASIC       52.459%     64%       18
HIGHPASS    51.28%      80%       10
DELTA       54.795%     85.106%   7
We evaluate the performance of the three methods, i.e., BASIC, HIGHPASS and DELTA. In our evaluation, we use the standard information retrieval measures of precision and recall.
In this case, precision is the fraction of reported events that were actually rain events, and recall is the fraction of rain events that the PCA model reported correctly. We also report false negatives, which affect recall but not precision. We attempt to strike a balance between precision and recall. Our criterion is to detect as many events as possible with a precision of at least 50%, i.e., at least half of the reported events must be true positives. Higher precision is difficult to achieve given that our system also detects other (non-rain) events. Recall may also be affected by the assumption that rain at BWI implies rain at JHU and vice versa, which is not always the case.
Table I shows the results for the different methods. Using high-pass filtering and including soil temperature increases recall without substantially affecting precision. Figure 6 shows the projection values of the different methods for the period between 12/13/2005 and 01/02/2006. The rain events are indicated by a triangular marker at the bottom. We can see that the DELTA method shows sharper negative peaks than the other methods on event days and lower peaks on non-event days. Notice that the large downward spike on day 4 (12/16/2005) corresponds to a large event. We are able to detect most event days, missing only 7 with the DELTA method. Again, we focus on recall, given that non-rain events occur and pollute our precision statistics. The precision-recall curves for different threshold values (Figure 5) show that good recall can be achieved at better than 50% precision. The converse is not true.
[Fig. 6. Projection values for different techniques on event and non-event days. The marker at the bottom indicates an event. (Projection value vs. day; series: DELTA, HIGHPASS, BASIC, events.)]
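The evaluation measures defined above can be sketched as follows, assuming days are identified by integer indices. `event_days` applies the ground-truth rule stated earlier (precipitation strictly greater than 3 mm), `precision_recall` follows the definitions of precision and recall given above, and `sweep` traces a precision-recall curve like Fig. 5 by varying the detection threshold. The function names and the below-threshold flagging convention are illustrative assumptions.

```python
def event_days(precip_mm, min_mm=3.0):
    """Ground truth: indices of days with more than `min_mm` of precipitation."""
    return {t for t, p in enumerate(precip_mm) if p > min_mm}


def precision_recall(detected, actual):
    """Return (precision, recall, false_negatives) for sets of day indices.

    Precision is reported as 0.0 when nothing is detected (it is undefined there).
    """
    detected, actual = set(detected), set(actual)
    tp = len(detected & actual)
    fn = len(actual - detected)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall, fn


def sweep(score, actual, thresholds):
    """Precision/recall per threshold, flagging days whose score is below it."""
    rows = []
    for th in thresholds:
        detected = [t for t, v in enumerate(score) if v < th]
        p, r, fn = precision_recall(detected, actual)
        rows.append((th, p, r, fn))
    return rows
```

Raising the threshold flags more days, so recall can only grow while precision typically falls; picking the operating point where precision stays at or above 50% corresponds to the criterion stated above.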
High recall matches our application needs well; reporting events when they occur supports network adaptation and identifies interesting regions of data for scientists. In all likelihood, precision and recall would be much improved with more accurate and local weather monitoring (a better "ground truth") and by considering multiple types of events.
V. DISCUSSION AND FUTURE WORK
In this paper we present an application of techniques from statistical signal processing to detect the presence of events (e.g., rain events) that deviate from the regular physical patterns witnessed by a sensor network. We do this by using a variant of the Principal Component Analysis (PCA) technique to generate a compact profile of "normal" measurements. We can then compare actual mote measurements with model predictions and classify the instances in which the two diverge significantly as events of interest. We evaluate the performance of the proposed mechanisms using temperature measurements, collected over a year by a small environmental monitoring network, to detect the onset of rain events. Our preliminary results show that this technique is able to detect most rain events, with a small number of false positives, even in the presence of large foreground variations and substantial seasonal drift.
This is only the beginning; one can carry this approach much further. While we present event detection in its offline setting, the observation that only a small number of components can accurately describe the collected data suggests that the same mechanism can be implemented on the network's motes. This in turn can result in a lightweight adaptive sampling algorithm that will enable real-life WSN deployments to cope with slowly varying environments as well as sudden, discrete events.
Efficient event detection is at the core of any adaptive observing strategy, and we demonstrate how this can be done on today's WSN platforms.
At this point the method is able to detect global events, i.e., events that all the sensors experience. However, one would like to detect localized events. While it seems possible to apply the same PCA technique to detect events experienced by a single mote, it becomes harder to differentiate between an actual event and a malfunctioning sensor. The question is then how much additional information is necessary to separate faults from actual events. The sensors are expected to have variations due to their local environment (located near or far from a stream, sitting on a hillside with a steep gradient, etc.), which will cause small but consistent, correlated changes. The task is then to find groups of sensors with correlated measurements. We can do so by removing the obvious daily foregrounds and the long seasonal trends, at which point we expect to see these small correlated differences in the behavior of sensors in the same group. Once such groups are created, we can compare the projected measurements of a mote with the measurements of other group members. If those agree, then a localized event is most likely occurring; otherwise one (or more) of the sensors is faulty.
So far, we have completely excluded from the training set days with partial data, in which, due to hardware errors, we did not get a reading for every one of the 144 sampling periods. However, it is easy to apply a "gappy" Karhunen-Loève transformation [31], in which the expansion coefficients can still be computed over a partial support. Doing so will enable the creation of a more representative compressed model of the measurement data and will hopefully lead to higher detection accuracy.
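A minimal sketch of the gappy expansion idea from [31]: instead of taking dot products, solve a small least-squares problem for the $k$ coefficients using only the observed slots, i.e. minimize $\sum_{j \in \mathrm{obs}} (x_j - \sum_k a_k v_{kj})^2$ through the normal equations $M a = F$, with $M_{kl} = \sum_{j \in \mathrm{obs}} v_{kj} v_{lj}$ and $F_k = \sum_{j \in \mathrm{obs}} x_j v_{kj}$. This covers only the weighted least-squares core of that method; it assumes the partial day has already been mean-subtracted and scaled using its observed slots (itself an approximation), and the function names are hypothetical.

```python
def solve_linear(m, f):
    """Solve m a = f by Gaussian elimination with partial pivoting."""
    n = len(f)
    aug = [row[:] + [fi] for row, fi in zip(m, f)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("basis is degenerate on the observed slots")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (aug[r][n] - sum(aug[r][c] * a[c] for c in range(r + 1, n))) / aug[r][r]
    return a


def gappy_coefficients(day, basis):
    """Expansion coefficients fitted on observed slots only (None marks a gap)."""
    obs = [j for j, x in enumerate(day) if x is not None]
    k = len(basis)
    # Gram matrix of the basis restricted to the observed slots.
    m = [[sum(basis[p][j] * basis[q][j] for j in obs) for q in range(k)] for p in range(k)]
    # Projections of the observed data onto each basis vector.
    f = [sum(day[j] * basis[p][j] for j in obs) for p in range(k)]
    return solve_linear(m, f)
```

For example, with the orthonormal 4-slot basis v1 = (0.5, 0.5, 0.5, 0.5), v2 = (0.5, -0.5, 0.5, -0.5) and the day 2·v1 + v2 = (1.5, 0.5, 1.5, 0.5) with its second slot missing, `gappy_coefficients` recovers (2, 1), whereas filling the gap with zero and taking dot products gives 1.75 for the first coefficient.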
ACKNOWLEDGMENTS
We would like to thank Ching-Wa Yip (JHU, Department of Physics and Astronomy) for making her PCA C# library available to us and for her valuable time in discussions. The data presented here was collected in collaboration with Katalin Szlavecz (JHU, Department of Earth and Planetary Science) and Razvan Musaloiu-E. (JHU, Department of Computer Science). The online database was built in collaboration with Jim Gray and Stuart Ozer (Microsoft Research). Their help and contributions are gratefully acknowledged.
REFERENCES
[1] R. Musăloiu-E., A. Terzis, K. Szlavecz, A. Szalay, J. Cogan, and J. Gray, "Life Under Your Feet: A Wireless Soil Ecology Sensor Network," in Proceedings of the Third Workshop on Embedded Networked Sensors (EmNets 2006), May 2006.
[2] G. Tolle, J. Polastre, R. Szewczyk, N. Turner, K. Tu, P. Buonadonna, S. Burgess, D. Gay, W. Hong, T. Dawson, and D. Culler, "A Macroscope in the Redwoods," in Proceedings of the Third ACM Conference on Embedded Networked Sensor Systems (SenSys), Nov. 2005.
[3] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson, "Wireless sensor networks for habitat monitoring," in Proceedings of the 2002 ACM International Workshop on Wireless Sensor Networks and Applications, Sept. 2002.
[4] P. Dutta, M. Grimmer, A. Arora, S. Bibyk, and D. Culler, "Design of a wireless sensor network platform for detecting rare, random, and ephemeral events," in Proceedings of IPSN, 2005.
[5] L. Gu and J. Stankovic, "Radio-triggered wake-up capability for sensor networks," in Real-Time Applications Symposium, 2004.
[6] R. Duda, P. Hart, and D. Stork, Pattern Classification. Wiley, 2001.
[7] C. Corporation, "MICAz Specifications," Available at http://www.xbow.com/Support/Support_pdf_files/MPR-MIB_Series_Users_Manual.pdf.
[8] IBM, "DB2 Intelligent Miner," 2007, available at http://www-306.ibm.com/software/data/iminer/.
[9] A. Deshpande and S. Madden, "MauveDB: supporting model-based user views in database systems," in Proceedings of ACM SIGMOD, 2006.
[10] A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong, "Model-driven data acquisition in sensor networks," in Proceedings of VLDB, 2004.
[11] A. Jain, E. Chang, and Y. Wang, "Adaptive stream resource management using Kalman filters," in Proceedings of ACM SIGMOD, 2004.
[12] M. Chu, H. Haussecker, and F. Zhao, "Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks," International Journal of High-Performance Computing Applications, vol. 16, no. 3, 2002.
[13] S. Grumbach, P. Rigaux, and L. Segoufin, "Manipulating interpolated data is easier than you thought," in Proceedings of VLDB, 2000.
[14] L. Neugebauer, "Optimization and evaluation of database queries including embedded interpolation procedures," in Proceedings of SIGMOD, 1991.
[15] C. Guestrin, P. Bodik, R. Thibaux, M. Paskin, and S. Madden, "Distributed Regression: an Efficient Framework for Modeling Sensor Network Data," in Proceedings of IPSN 2004, Apr. 2004.
[16] H. Gupta, V. Navda, S. Das, and V. Chowdhary, "Energy-efficient gathering of correlated data in sensor networks," in Proceedings of MobiHoc, 2005.
[17] Y. Kotidis, "Snapshot queries: towards data-centric sensor networks," in Proceedings of ICDE, 2005.
[18] A. Deligiannakis, Y. Kotidis, and N. Roussopoulos, "Compressing historical information in sensor networks," in Proceedings of SIGMOD, 2004.
[19] D. Chu, A. Deshpande, J. Hellerstein, and W. Hong, "Approximate data collection in sensor networks using probabilistic models," in Proceedings of ICDE, 2006.
[20] A. Silberstein, R. Braynard, G. Filpus, G. Puggioni, A. Gelfand, K. Munagala, and J. Yang, "Data-driven processing in sensor networks," in Proceedings of the Conference on Innovative Data Systems Research, 2007.
[21] D. Tulone and S. Madden, "PAQ: Time series forecasting for approximate query answering in sensor networks," in Proceedings of the European Conference on Wireless Sensor Networks, 2006.
[22] D. J. Abadi, S. Madden, and W. Lindner, "REED: Robust, efficient filtering and event detection in sensor networks," in Proceedings of VLDB, 2005.
[23] S. Li, Y. L., S. H. Son, J. A. Stankovic, and Y. Wei, "Event detection services using data service middleware in distributed sensor networks," in Proceedings of IPSN, 2003.
[24] A. Herbold, T. Lamarre, N. Bulusu, and S. Jha, "Resilient event detection in wireless sensor networks," in Proceedings of Intelligent Sensors, Sensor Networks and Information Processing, 2004.
[25] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, "Habitat monitoring with sensor networks," CACM, vol. 47, no. 6, 2004.
[26] W. Wang, X. Guan, and X. Zhang, "A novel intrusion detection method based on principle component analysis in computer security," in Proceedings of Advances in Neural Networks, 2004.
[27] A. Lakhina, M. Crovella, and C. Diot, "Mining anomalies using traffic feature distributions," in Proceedings of ACM SIGCOMM 2005, Aug. 2005, pp. 217–228. [Online]. Available: http://www.cs.bu.edu/faculty/crovella/paper-archive/sigc05-mining-anomalies.pdf
[28] A. Perera, N. Papamichail, N. Bârsan, U. Weimar, and S. Marco, "On-line event detection by recursive dynamic principal component analysis and gas sensor arrays under drift conditions," in Proceedings of IEEE Sensors, 2003.
[29] A. Iqbal, "Multi-route event detection using PCA," 2006, available at http://confluence.slac.stanford.edu/display/IEPM/Multi-route+Event+Detection+Using+PCA.
[30] "Baltimore-Washington International airport, weather station," Available at: http://weather.marylandweather.com/cgi-bin/findweather/getForecast?query=BWI.
[31] A. J. Connolly and A. S. Szalay, "A Robust Classification of Galaxy Spectra: Dealing with Noisy and Incomplete Data," The Astronomical Journal, vol. 117, pp. 2052–2062, May 1999.
