Universal Behavior in Large-scale Aggregation of Independent Noisy Observations
Tatsuto Murayama∗ and Peter Davis†
NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika, Kyoto 619-0237, Japan
(Dated: November 9, 2018)

Abstract

Aggregation of noisy observations involves a difficult tradeoff between observation quality, which can be increased by increasing the number of observations, and aggregation quality, which decreases if the number of observations is too large. We clarify this behavior for a prototypical system in which arbitrarily large numbers of observations exceeding the system capacity can be aggregated using lossy data compression. We show the existence of a scaling relation between the collective error and the system capacity, and show that large-scale lossy aggregation can outperform lossless aggregation above a critical level of observation noise. Further, we show that universal results for the scaling and critical value of noise which are independent of system capacity can be obtained by considering asymptotic behavior when the system capacity increases toward infinity.

PACS numbers: 89.70.-a, 64.60.-i, 02.50.Cw, 75.10.Nr

∗ Electronic address: murayama@cslab.kecl.ntt.co.jp
† Electronic address: davis@cslab.kecl.ntt.co.jp

This letter presents results which give a new perspective on the growing field of sensory data aggregation by clarifying fundamental principles of large-scale aggregation. Examples of large-scale aggregation of observations include astronomical observations [1], biological sensing [2], early detection of natural disasters such as earthquakes, tidal waves and floods [3], and wireless sensor networks [4]. Errors in observations can be reduced by collecting observation data from more sensors.
However, collecting data from many sensors usually involves some cost in terms of network resources, resulting in fundamental tradeoffs [5]. The theoretical understanding of these tradeoffs in natural and engineered systems is now a high priority. An important fundamental problem in this field is the problem of aggregating independent observations of the same phenomenon with a resource constraint. Previous works have analyzed the tradeoff behavior between aggregate data rate and sensing error from the fundamental view of information theory. The analysis has been extended to include the situation where arbitrarily large numbers of samples can be collected by reducing the data aggregated from each sample using lossy data compression. However, so far results have only been obtained for the fundamental information-theoretic bounds with infinitely many sensors [6, 7], or for specific situations in which the number of sensors is fixed [8]. The previous works do not include the situation where the number of observations can be varied, and thus the results are not sufficient to support our understanding and design of real-world systems. In this paper we introduce a modification of the common basic model for data aggregation with compression which makes it more tractable and amenable to analysis when the number of sensors can vary. Specifically, we consider independent decompression of each observation in a discrete version of the CEO problem [6]. We show that this model reveals a new property, the existence of a noise threshold beyond which large-scale aggregation is superior to lossless aggregation with no compression. This can be seen as a manifestation of "more is different" in sensor networks [9].
Moreover, we show that universal results for the scaling behavior of the collective estimation error can be obtained by considering asymptotic behavior when the system capacity diverges to infinity.

Suppose that we have L independent sensors which each independently observe an M-bit state X, with bits X_µ for µ = 1, ..., M, of a common, uniform binary source, and obtain an M-bit observation Y(a) (a = 1, ..., L), where each bit Y_µ(a) has common probability p of error, i.e. of differing from the corresponding source bit X_µ. The value of p specifies the level of observation noise. Now the sensors independently compress their M-bit observations into shorter N-bit codewords, Z(a), and send them to the aggregator. The condition 'independent' excludes the possibility of mutual communications between sensors. We assume the rate R = N/M is common to all the sensors. In addition, we suppose that the sum total of the rate, the system capacity λ, is fixed, with

λ = LR.    (1)

The aggregator then decodes every N-bit codeword independently to obtain L separate M-bit reproductions Ŷ(a) (a = 1, ..., L). Finally, the Ŷ(a) are used to obtain a single collective estimator X̂. We analyze the behavior of the bit error probability, denoted p_e(p, R; λ), in the collective estimate. The theoretical lower bound of average distortion for a given rate R is given by the distortion-rate function, or simply the Shannon bound [10]. Though we know that the bound could be achieved asymptotically by using Shannon's random codes, the exponential encoding complexity prohibits us from using them in practice. For uniform binary sources, however, an alternative approach has recently been developed based on linear codes with iterative, or message-passing, encoding, achieving close to the theoretical limit [11, 12, 13].
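The model above is easy to sketch in code. The following minimal Monte Carlo illustration (our own sketch, not from the paper; the function name and parameters are illustrative) covers the lossless case R = 1, where each reproduction equals the raw observation: L sensors see each source bit through independent binary symmetric channels with flip probability p, and the aggregator takes a per-bit majority vote.

```python
import random

def majority_vote_error(p, L, M=1000, seed=0):
    """Monte Carlo estimate of the collective bit error rate when L
    independent sensors (L odd) observe M source bits through binary
    symmetric channels with flip probability p, and the aggregator
    takes a per-bit majority vote (lossless case, R = 1)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(M):
        x = rng.randint(0, 1)  # uniform binary source bit
        # each sensor's observation flips x independently with probability p
        ones = sum(x ^ (rng.random() < p) for _ in range(L))
        x_hat = 1 if 2 * ones > L else 0  # majority vote (L odd)
        errors += (x_hat != x)
    return errors / M
```

Increasing L at fixed p drives the collective error down, which is one side of the tradeoff; the opposing compression penalty enters once the capacity λ = LR is held fixed.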
Applying these new results allows us to obtain numerical results for arbitrary data reduction. FIG. 1 shows typical results from a numerical experiment for the average values of per-bit error probability, p_e(p, R; λ), obtained using a linear code with an iterative encoder [11]. The linear codes are defined by a class of sparse matrices having K ones ('1') per row and C ones per column, respectively, where K/C = N/M. Therefore we may write R = K/C [14]. For ease of comparison, the values of error probability p_e(p, R; λ) for noise p and rate R = N/M are divided by a reference level p_e(p, 1; λ) for R = 1 under the same system capacity λ [15].

FIG. 1: Semilog plots for average error probability in noisy data aggregation using linear codes with K = 2. The values of error probability p_e(p, R; λ) for noise p and rate R = K/C are divided by a reference level p_e(p, 1; λ) under the same system capacity λ. Here parameters are chosen to be λ = 500 with C = 3 (pluses, right scale) and λ = 1000 with C = 3 and 12 (circles and squares, left scale), respectively.

The example in FIG. 1 demonstrates the following two points: (1) There exists a threshold value of noise where lossy large-scale aggregation becomes superior to lossless aggregation. Lossless aggregation with R = 1 outperforms the lossy aggregation with R smaller than 1 at lower noise levels. However, at higher noise levels the alternative strategy with lossy data compression becomes superior. (2) There exists a scaling relation with respect to system capacity. The error curves have a universal shape in the sense that plots for different λ overlap with appropriate re-scaling, as shown by the example for λ = 500 using the scale on the right side. This observation implies a scaling law for the data aggregation with respect to λ. Introducing the coefficient β, we can write the empirical scaling relation as follows:

log[ p_e(p, R; βλ) / p_e(p, 1; βλ) ] = β log[ p_e(p, R; λ) / p_e(p, 1; λ) ].    (2)

Using the base-10 logarithm, the scaling in FIG. 1 is well defined by the scaling factor β = 2. In this letter, we present a theoretical analysis which explains these empirical results, and presents them in a universal form. First, we assume that the error due to lossy compression is independent of µ and a, and denoted by D; that is, ⟨δ(Y_µ(a), −Ŷ_µ(a))⟩ = D. Here we used Kronecker's delta δ, and the brackets denote averaging over random variables. This includes the standard exchangeable sensor ansatz for our model [6, 7], which means that all sensors have the same rate R and distortion D. The possible value of the distortion D depends on R, so we explicitly denote D as D(R). The combined error probability for Ŷ_µ(a), independent of µ and a, is obtained as

ρ = (1 − 2p) D(R) + p.    (3)

The combined error probability ρ is a function of both p and R. In particular, equation (3) implies that ρ is a decreasing function of R, since D(R) should be a decreasing function of R. Since we assume Bernoulli statistics, the best estimate from the set of aggregated values can be obtained by the simple majority-vote operation:

X̂_µ = sgn( Σ_{a=1}^{L} Ŷ_µ(a) ).

Then, the error probability for the final estimate is given, in terms of ρ and L, by

p_e(p, R; λ) = Σ_{l=(L+1)/2}^{L} Q_ρ(l; L),

which is just the probability of getting more than L/2 errors out of L Bernoulli trials. We assume for simplicity that only odd values of L are taken. Here Q_ρ(l; L) = C(L, l) ρ^l (1 − ρ)^(L−l) is the binomial distribution.
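Equation (3) and the majority-vote error can be evaluated exactly for finite L. The sketch below (helper names are ours, not the paper's) computes the combined per-reproduction error ρ = (1 − 2p)D(R) + p and the binomial upper tail that gives p_e:

```python
from math import comb

def combined_error(p, D):
    """Eq. (3): per-bit error of one reproduction, combining channel
    noise p with compression distortion D (two errors can cancel)."""
    return (1.0 - 2.0 * p) * D + p

def collective_error(rho, L):
    """Probability that a majority of the L reproductions (L odd) are
    wrong: the binomial upper tail from l = (L + 1) / 2 to L."""
    return sum(comb(L, l) * rho**l * (1.0 - rho)**(L - l)
               for l in range((L + 1) // 2, L + 1))
```

For fixed ρ < 1/2 the tail shrinks as L grows, while the constraint λ = LR pushes ρ up through D(R); the competition between these two effects is exactly what the asymptotic analysis quantifies.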
It is obvious that p_e(p, R; λ) is a decreasing function of L if ρ is fixed. However, due to the constraint (1) and the decrease of the distortion D(R) with increasing R, ρ actually increases with an increase of L, resulting in contrary effects on p_e(p, R; λ). Therefore the challenge here is to incorporate consideration of the distortion D in a way which clarifies the interplay between the contrary effects induced by the constraint (1). In the following, we consider the asymptotic analysis in the limit of large λ, for which we can obtain explicit results. For sufficiently large L, the binomial distribution Q_ρ(l; L) is well approximated by the Gaussian distribution N(Lρ, Lρ(1 − ρ)) with mean Lρ and variance Lρ(1 − ρ). Now we examine the asymptotic behavior for large λ. Write α(p, R) = (1 − 2p)(1 − 2D(R)) and define, for simplicity,

ν = α(p, R) √λ / √( R (1 − α(p, R)) (1 + α(p, R)) ).

Then, in the limit λ → ∞, the asymptotic expansion of the cumulative Gaussian distribution gives

p_e(p, R; λ) ∼ (1/2) erfc( ν / √2 ),

where erfc(x) is the complementary error function [16]. By analogy with large deviation theory [17], we can define and calculate the exponential rate of decay as follows:

I_p(R) = − lim_{λ→∞} (1/λ) ln p_e(p, R; λ) = α(p, R)² / ( 2R (1 − α(p, R)) (1 + α(p, R)) )    (0 < R ≤ 1).    (4)

Notice that the above formula holds for any function D(R). Indeed this universal property well describes the exponential scaling (2). In particular, the smallest average distortion D(R) is obtained in the limit of M → ∞, and is called the distortion-rate function [10]. In our model, its inverse function, the rate-distortion function [10], can be analytically given by

R(D) = 1 + D log₂ D + (1 − D) log₂(1 − D).    (5)

We may use either the distortion-rate function or the rate-distortion function to describe the optimal boundary, since the two descriptions are equivalent in the large M limit. Now assume hereafter that the distortion-rate function D(R) is the specific case implicitly given by the inverse formula (5) for R(D). Then the asymptotics of R(D) enable us to obtain the large-scale decay rate as

I_p(0) = − lim_{λ→∞} (1/λ) lim_{R→0} ln p_e(p, R; λ) = (1 − 2p)² ln 2.

Now we can see that if we compare just the two aggregation strategies, R = 1 or R → 0, the threshold value of noise p₁ corresponding to the switch of the superior aggregation can be determined by solving the equation

(1 − 2p₁)² ln 2 = (1 − 2p₁)² / ( 2 (1 − (1 − 2p₁)²) ).

The analytical solution p₁ = 0.236 gives the threshold beyond which the large-scale aggregation with R → 0 outperforms the R = 1 strategy. Next we numerically examine the value of R which maximizes I_p(R) for a given p. The optimal value R∗ is plotted in FIG. 2 as a function of p. We find that the optimal rate vanishes, i.e. R∗ = 0, for noise levels larger than a critical point p₀ = 0.295. In contrast, we can always find non-zero optimal values of R below this point. In particular, if the noise level is near zero, then R = 1 is optimal. The change in the value of the optimal R∗ with respect to the noise level p is continuous at p₀, as in a second-order phase transition. We note that the analytical results presented here using (4) and (5) are consistent with the results of the numerical simulations with linear codes. That is, the exponential rate of decay (4) well describes the scaling law (2). Moreover, they add more specific and fundamental conditions to our first observation on FIG. 1 that aggregation with R smaller than 1 is superior for larger noise.
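The quantities in (4) and (5) are easy to reproduce numerically. The sketch below (our own code, with illustrative names) inverts the rate-distortion function (5) by bisection, evaluates the decay rate (4) at the Shannon bound, and recovers the two thresholds quoted above: the closed-form crossing point p₁ ≈ 0.236 of the two pure strategies, and the critical point p₀ ≈ 0.295 beyond which a grid search finds R∗ = 0.

```python
import math

def rate_distortion(D):
    """Eq. (5): R(D) = 1 + D log2 D + (1 - D) log2 (1 - D)."""
    if D <= 0.0:
        return 1.0
    if D >= 0.5:
        return 0.0
    return 1.0 + D * math.log2(D) + (1.0 - D) * math.log2(1.0 - D)

def distortion_rate(R, tol=1e-12):
    """Invert R(D) on [0, 1/2] by bisection (R is decreasing in D)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_distortion(mid) > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def decay_rate(p, R):
    """Eq. (4) with the Shannon-bound distortion D(R):
    I_p(R) = alpha^2 / (2 R (1 - alpha^2)), alpha = (1-2p)(1-2D(R))."""
    alpha = (1.0 - 2.0 * p) * (1.0 - 2.0 * distortion_rate(R))
    return alpha**2 / (2.0 * R * (1.0 - alpha**2))

def optimal_rate(p, grid=1000):
    """Grid search for R* maximizing I_p(R); the R -> 0 limit of the
    decay rate is (1 - 2p)^2 ln 2."""
    best_R, best_I = 0.0, (1.0 - 2.0 * p)**2 * math.log(2.0)
    for k in range(1, grid + 1):
        R = k / grid
        I = decay_rate(p, R)
        if I > best_I:
            best_R, best_I = R, I
    return best_R

# Closed-form crossing of the two pure strategies (R = 1 vs R -> 0),
# from (1 - 2 p1)^2 ln 2 = (1 - 2 p1)^2 / (2 (1 - (1 - 2 p1)^2)):
p1 = 0.5 * (1.0 - math.sqrt(1.0 - 1.0 / (2.0 * math.log(2.0))))
```

Here p1 evaluates to ≈ 0.236, and optimal_rate returns 1.0 at low noise and 0.0 above roughly p ≈ 0.295, matching the continuous vanishing of R∗ described in the text.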
The critical point beyond which the strategy with R = 1 is not optimal in FIG. 2 indicates the lowest bound for such a threshold, and is obviously consistent with the numerical simulations.

FIG. 2: Optimal rate R∗ for lossy aggregation of observations from independent sensors in a noisy environment with noise level p. R∗ is the optimal value of R, the aggregation rate per sensor, which maximizes the asymptotic decay rate I_p(R) of error probability with increase of system capacity λ. I_p(R) is defined in (4). Distortion due to lossy compression is given implicitly by (5). For comparison, R† is the pessimistic value of R which minimizes I_p(R).

Now let us consider the value of R which minimizes I_p(R), say R†. In contrast with the continuous change in the behavior of the optimal R∗, the pessimistic R† shows an abrupt change with respect to the noise p. Our numerical analysis indicates that there are only two cases for the worst solution: R† = 0 and R† = 1, so the threshold value of noise p₁ corresponds to the switch of the R†. We note that in the intermediate range of p the optimal R∗ is a finite value between R = 1 and R = 0. It is natural to ask how much the estimates obtained with these intermediate values of R∗ differ from the estimates obtained using the extreme values of R = 1 or R = 0. FIG. 3 shows the noise dependence of the decay rates I_p(R) with R = 0, 1, and R∗, respectively. The size of the differences I_p(R∗) − I_p(1) and I_p(0) − I_p(1) is shown in the inset of FIG. 3. For comparison with these results, which were obtained using the Shannon limit, the rate-distortion function in (5), we also show the result obtained for the linear code with K = 2, corresponding to FIG. 1. This result for K = 2 was obtained using the replica method for diluted spin systems [18, 19].
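The two pure strategies can be compared directly, since both decay rates have closed forms: I_p(1) = (1 − 2p)²/(2(1 − (1 − 2p)²)) (no compression, D = 0) and I_p(0) = (1 − 2p)² ln 2. A small sketch (our code, not the paper's) makes the abrupt switch of the better pure strategy explicit:

```python
import math

def decay_lossless(p):
    """I_p(1): lossless aggregation, D = 0, so alpha = 1 - 2p."""
    a2 = (1.0 - 2.0 * p) ** 2
    return a2 / (2.0 * (1.0 - a2))

def decay_large_scale(p):
    """I_p(0): the R -> 0 limit under the Shannon bound, (1-2p)^2 ln 2."""
    return (1.0 - 2.0 * p) ** 2 * math.log(2.0)

def better_pure_strategy(p):
    """Which pure strategy has the faster error decay at noise level p?"""
    return "R = 1" if decay_lossless(p) > decay_large_scale(p) else "R -> 0"
```

The returned label flips from "R = 1" to "R -> 0" as p crosses p₁ ≈ 0.236, the same point at which the pessimistic rate R† jumps between its two extreme values.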
First we note that in the case of compression using R(D), expression (5), the combination strategy of using only either R = 1 or R → 0, switching at the threshold point p₁, well approximates the optimal performance given by R∗. Next, we focus on the behavior of the difference I_p(R∗) − I_p(1) with respect to the noise p (solid line in inset). The largest gain is achieved at p∗ = 0.305 (indicated in the figure by a vertical dotted line), which differs slightly from the value for p₀, which was p₀ = 0.295. Finally, we consider the result for the linear code with K = 2. It shows a similar threshold behavior: the value of I_p(R) for R = 0 becomes greater than the value for R = 1 when the noise p exceeds a threshold value [20]. However, the gain is less than that obtained for the rate-distortion function, which shows that there is still room for improvement by using alternative techniques [21, 22].

Our results show that the optimal aggregation for a system of sensors with constrained system capacity exhibits a kind of threshold behavior with respect to the observation noise level. If we imagine the system autonomously switching to the optimal aggregation method, then it would appear to be a phase transition behavior. This result is significant for understanding the principles of large-scale aggregation in sensing systems, natural or engineered. We described the behavior of the optimal aggregation rate per sensor R = λ/L, the ratio of the system capacity λ to the number of sensors L. The analysis shows that in the high-noise region beyond a critical value of noise, the rate R should approach zero in order to reduce the collective estimation error. This means that very many sensors with L ≫ λ should be used. In contrast, if the noise level is lower than the critical point, the ratio R should take a positive value.
In this case, the number of sensors scales as L = O(λ).

This work has been supported in part by a Grant-in-Aid for Scientific Research on Priority Areas, Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan, No. 18079015.

FIG. 3: Error decay rates. I_p(R∗) is the error decay rate with lossy data compression at the optimal aggregation rate R∗, while I_p(1) is the error decay rate without data compression. I_p(0) corresponds to the error decay rate for the large system limit when R → 0. Inset: Information gain. R(D) corresponds to the Shannon limit, while K = 2 indicates performance of the linear codes when C → ∞.

[1] M. Ryle and A. Hewish, Monthly Notices of the Royal Astronomical Society 120, 220 (1960).
[2] N. Franceschini, Photoreceptor Optics, pp. 98–125 (1975).
[3] J. Zschau and A. Küppers, Early Warning Systems for Natural Disaster Reduction (Springer, 2003).
[4] J. Kahn, R. Katz, and K. Pister, in Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking (ACM Press, New York, NY, USA, 1999), pp. 271–278.
[5] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, IEEE Communications Magazine 40, 102 (2002).
[6] T. Berger, Z. Zhang, and H. Viswanathan, IEEE Transactions on Information Theory 42, 887 (1996).
[7] Y. Oohama, IEEE Transactions on Information Theory 44, 1057 (1998).
[8] M. Gastpar, IEEE Transactions on Information Theory 54, 5247 (2008).
[9] P. Anderson, Science 177, 393 (1972).
[10] T. Cover and J. Thomas, Elements of Information Theory (Wiley, New York, 1991).
[11] T. Murayama, Physical Review E 69, 35105 (2004).
[12] M. Wainwright and E. Maneva, in Proceedings of the IEEE International Symposium on Information Theory (2005), pp. 1493–1497.
[13] S. Ciliberti, M. Mézard, and R. Zecchina, Physical Review Letters 95, 38701 (2005).
[14] R. Gallager, IEEE Transactions on Information Theory 8, 21 (1962).
[15] T. Murayama and P. Davis, Advances in Neural Information Processing Systems 18 (NIPS'05), pp. 931–938 (2006).
[16] E. Copson, Asymptotic Expansions (Cambridge University Press, 2004).
[17] R. Ellis, Entropy, Large Deviations and Statistical Mechanics (Springer, 1985).
[18] K. Wong and D. Sherrington, Journal of Physics A: Mathematical and General 20, L793 (1987).
[19] Y. Kabashima and D. Saad, Europhysics Letters 45, 97 (1999).
[20] T. Murayama and M. Okada, Advances in Neural Information Processing Systems 15 (NIPS'02), pp. 423–430 (2003).
[21] T. Hosaka, Y. Kabashima, and H. Nishimori, Physical Review E 66, 66126 (2002).
[22] M. Opper and O. Winther, Physical Review Letters 86, 3695 (2001).