Bayesian Compressive Sensing via Belief Propagation

Dror Baron,¹ Shriram Sarvotham,² and Richard G. Baraniuk³

¹ Department of Electrical Engineering, Technion – Israel Institute of Technology; Haifa, Israel
² Halliburton; Houston, TX
³ Department of Electrical and Computer Engineering, Rice University; Houston, TX

October 22, 2018

Abstract

Compressive sensing (CS) is an emerging field based on the revelation that a small collection of linear projections of a sparse signal contains enough information for stable, sub-Nyquist signal acquisition. When a statistical characterization of the signal is available, Bayesian inference can complement conventional CS methods based on linear programming or greedy algorithms. We perform approximate Bayesian inference using belief propagation (BP) decoding, which represents the CS encoding matrix as a graphical model. Fast computation is obtained by reducing the size of the graphical model with sparse encoding matrices. To decode a length-$N$ signal containing $K$ large coefficients, our CS-BP decoding algorithm uses $O(K \log(N))$ measurements and $O(N \log^2(N))$ computation. Finally, although we focus on a two-state mixture Gaussian model, CS-BP is easily adapted to other signal models.

(This work was supported by the grants NSF CCF-0431150 and CCF-0728867, DARPA/ONR N66001-08-1-2065, ONR N00014-07-1-0936 and N00014-08-1-1112, AFOSR FA9550-07-1-0301, ARO MURI W311NF-07-1-0185, and the Texas Instruments Leadership University Program. A preliminary version of this work appeared in the technical report [1]. E-mail: drorb@ee.technion.ac.il, {shri, richb}@rice.edu; Web: dsp.rice.edu/cs)

1 Introduction

Many signal processing applications require the identification and estimation of a few significant coefficients from a high-dimensional vector. The wisdom behind this is the ubiquitous compressibility of signals: in an appropriate basis, most of the information contained in a signal often resides in just a few large coefficients. Traditional sensing and processing first acquires the entire data, only to later throw away most coefficients and retain the few significant ones [2]. Interestingly, the information contained in the few large coefficients can be captured (encoded) by a small number of random linear projections [3]. The ground-breaking work in compressive sensing (CS) [4–6] has proved for a variety of settings that the signal can then be decoded in a computationally feasible manner from these random projections.

1.1 Compressive sensing

Sparsity and random encoding: In a typical compressive sensing (CS) setup, a signal vector $x \in \mathbb{R}^N$ has the form $x = \Psi\theta$, where $\Psi \in \mathbb{R}^{N \times N}$ is an orthonormal basis, and $\theta \in \mathbb{R}^N$ satisfies $\|\theta\|_0 = K \ll N$, where $\|\cdot\|_0$ denotes the $\ell_0$ "norm" that counts the number of non-zero elements. Owing to the sparsity of $x$ relative to the basis $\Psi$, there is no need to sample all $N$ values of $x$. Instead, the CS theory establishes that $x$ can be decoded from a small number of projections onto an incoherent set of measurement vectors [4, 5]. To measure (encode) $x$, we compute $M \ll N$ linear projections of $x$ via the matrix-vector multiplication $y = \Phi x$, where $\Phi \in \mathbb{R}^{M \times N}$ is the encoding matrix.
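To make the encoding setup concrete, the following minimal sketch (Python/NumPy; the sizes, the Gaussian choice of $\Phi$, and the $1/\sqrt{M}$ normalization are illustrative assumptions, not choices made in this paper) draws a strictly sparse signal and computes its $M \ll N$ linear projections.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 256, 8, 64          # illustrative sizes: K-sparse length-N signal, M measurements

# Strictly sparse coefficient vector theta with ||theta||_0 = K
theta = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
theta[support] = rng.normal(0.0, 10.0, size=K)

Psi = np.eye(N)               # canonical sparsifying basis (Psi = I), so x = theta
x = Psi @ theta

# Dense sub-Gaussian (here iid Gaussian) encoding matrix and measurements y = Phi x
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
y = Phi @ x
print(y.shape)                # (M,) -- far fewer numbers than the N samples of x
```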
In addition to strictly sparse signals where $\|\theta\|_0 \le K$, other signal models are possible. Approximately sparse signals have $K \ll N$ large coefficients, while the remaining coefficients are small but not necessarily zero. Compressible signals have coefficients that, when sorted, decay quickly according to a power law. Similarly, both noiseless and noisy signals and measurements may be considered. We emphasize noiseless measurement of approximately sparse signals in this paper.

Decoding via sparsity: Our goal is to decode $x$ given $y$ and $\Phi$. Although decoding $x$ from $y = \Phi x$ appears to be an ill-posed inverse problem, the prior knowledge of sparsity in $x$ enables us to decode $x$ from $M \ll N$ measurements. Decoding often relies on an optimization, which searches for the sparsest coefficients $\theta$ that agree with the measurements $y$. If $M$ is sufficiently large and $\theta$ is strictly sparse, then $\theta$ is the solution to the $\ell_0$ minimization:

$$\hat{\theta} = \arg\min \|\theta\|_0 \quad \text{s.t.} \quad y = \Phi\Psi\theta.$$

Unfortunately, solving this $\ell_0$ optimization is NP-complete [7].

The revelation that supports the CS theory is that a computationally tractable optimization problem yields an equivalent solution. We need only solve for the $\ell_1$-sparsest coefficients that agree with the measurements $y$ [4, 5]:

$$\hat{\theta} = \arg\min \|\theta\|_1 \quad \text{s.t.} \quad y = \Phi\Psi\theta, \qquad (1)$$

as long as $\Phi\Psi$ satisfies some technical conditions, which are satisfied with overwhelming probability when the entries of $\Phi$ are independent and identically distributed (iid) sub-Gaussian random variables [4]. This $\ell_1$ optimization problem (1), also known as Basis Pursuit [8], can be solved with linear programming methods. The $\ell_1$ decoder requires only $M = O(K \log(N/K))$ projections [9, 10]. However, encoding by a dense Gaussian $\Phi$ is slow, and $\ell_1$ decoding requires cubic computation in general [11].

1.2 Fast CS decoding

While $\ell_1$ decoders figure prominently in the CS literature, their cubic complexity still renders them impractical for many applications. For example, current digital cameras acquire images with $N = 10^6$ pixels or more, and fast decoding is critical. The slowness of $\ell_1$ decoding has motivated a flurry of research into faster algorithms.

One line of research involves iterative greedy algorithms. The Matching Pursuit (MP) [12] algorithm, for example, iteratively selects the vectors from the matrix $\Phi\Psi$ that contain most of the energy of the measurement vector $y$. MP has been proven to successfully decode the acquired signal with high probability [12, 13]. Algorithms inspired by MP, including OMP [12], tree matching pursuit [14], stagewise OMP [15], CoSaMP [16], IHT [17], and Subspace Pursuit [18], have been shown to attain similar guarantees to those of their optimization-based counterparts [19–21].

While the CS algorithms discussed above typically use a dense $\Phi$ matrix, a class of methods has emerged that employs structured $\Phi$. For example, subsampling an orthogonal basis that admits a fast implicit algorithm also leads to fast decoding [4]. Encoding matrices that are themselves sparse can also be used. Cormode and Muthukrishnan proposed fast streaming algorithms based on group testing [22, 23], which considers subsets of signal coefficients in which we expect at most one "heavy hitter" coefficient to lie. Gilbert et al. [24] propose the Chaining Pursuit algorithm, which works best for extremely sparse signals.
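For concreteness, Basis Pursuit (1) can be prototyped as a linear program by splitting $\theta$ into non-negative parts, $\theta = u - w$. A minimal sketch, assuming the canonical basis $\Psi = I$ and using scipy.optimize.linprog (the problem sizes are illustrative; the LP experiments later in the paper use an interior point solver):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, K, M = 60, 4, 30
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.normal(0, 10, K)
A = rng.normal(size=(M, N)) / np.sqrt(M)        # A = Phi Psi with Psi = I
y = A @ theta

# min ||theta||_1 s.t. A theta = y, via theta = u - w with u, w >= 0:
# minimize sum(u) + sum(w) subject to [A, -A][u; w] = y
c = np.ones(2 * N)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
theta_hat = res.x[:N] - res.x[N:]
print(np.max(np.abs(theta_hat - theta)))        # near zero when M is large enough
```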
1.3 Bayesian CS

CS decoding algorithms rely on the sparsity of the signal $x$. In some applications, a statistical characterization of the signal is available, and Bayesian inference offers the potential for more precise estimation of $x$ or a reduction in the number of CS measurements. Ji et al. [25] have proposed a Bayesian CS framework where relevance vector machines are used for signal estimation. For certain types of hierarchical priors, their method can approximate the posterior density of $x$ and is somewhat faster than $\ell_1$ decoding. Seeger and Nickisch [26] extend these ideas to experimental design, where the encoding matrix is designed sequentially based on previous measurements. Another Bayesian approach by Schniter et al. [27] approximates conditional expectation by extending the maximum likelihood approach to a weighted mixture of the most likely models. There are also many related results on the application of Bayesian methods to sparse inverse problems (c.f. [28] and references therein).

Bayesian approaches have also been used for multiuser decoding (MUD) in communications. In MUD, users modulate their symbols with different spreading sequences, and the received signals are superpositions of sequences. Because most users are inactive, MUD algorithms extract information from a sparse superposition in a manner analogous to CS decoding. Guo and Wang [29] perform MUD using sparse spreading sequences and decode via belief propagation (BP) [30–35]; our paper also uses sparse encoding matrices and BP decoding. A related algorithm for decoding low density lattice codes (LDLC) by Sommer et al. [36] uses BP on a factor graph whose self and edge potentials are Gaussian mixtures. Convergence results for the LDLC decoding algorithm have been derived for Gaussian noise [36].

1.4 Contributions

In this paper, we develop a sparse encoding matrix $\Phi$ and a belief propagation (BP) decoder to accelerate CS encoding and decoding under the Bayesian framework. We call our algorithm CS-BP. Although we emphasize a two-state mixture Gaussian model as a prior for sparse signals, CS-BP is flexible to variations in the signal and measurement models.

Encoding by sparse CS matrix: The dense sub-Gaussian CS encoding matrices [4, 5] are reminiscent of Shannon's random code constructions. However, although dense matrices capture the information content of sparse signals, they may not be amenable to fast encoding and decoding. Low density parity check (LDPC) codes [37, 38] offer an important insight: encoding and decoding are fast, because multiplication by a sparse matrix is fast; nonetheless, LDPC codes achieve rates close to the Shannon limit. Indeed, in a previous paper [39], we used an LDPC-like sparse $\Phi$ for the special case of noiseless measurement of strictly sparse signals; similar matrices were also proposed for CS by Berinde and Indyk [40]. Although LDPC decoding algorithms may not have provable convergence, the recent extension of LDPC to LDLC codes [36] offers provable convergence, which may lead to similar future results for CS decoding.

We encode (measure) the signal using sparse Rademacher ($\{0, 1, -1\}$) LDPC-like $\Phi$ matrices. Because entries of $\Phi$ are restricted to $\{0, 1, -1\}$, encoding only requires sums and differences of small subsets of coefficient values of $x$.
The design of $\Phi$, including characteristics such as column and row weights, is based on the relevant signal and measurement models, as well as the accompanying decoding algorithm.

Decoding by BP: We represent the sparse $\Phi$ as a sparse bipartite graph. In addition to accelerating the algorithm, the sparse structure reduces the number of loops in the graph and thus assists the convergence of a message passing method that solves a Bayesian inference problem. Our estimate for $x$ explains the measurements while offering the best match to the prior. We employ BP in a manner similar to LDPC channel decoding [34, 37, 38]. To decode a length-$N$ signal containing $K$ large coefficients, our CS-BP decoding algorithm uses $M = O(K \log(N))$ measurements and $O(N \log^2(N))$ computation. Although CS-BP is not guaranteed to converge, numerical results are quite favorable.

The remainder of the paper is organized as follows. Section 2 defines our signal model, and Section 3 describes our sparse CS-LDPC encoding matrix. The CS-BP decoding algorithm is described in Section 4, and its performance is demonstrated numerically in Section 5. Variations and applications are discussed in Section 6, and Section 7 concludes.

2 Mixture Gaussian signal model

We focus on a two-state mixture Gaussian model [41–43] as a prior that succinctly captures our prior knowledge about the approximate sparsity of the signal. Bayesian inference using a two-state mixture model had been studied well before the advent of CS, for example by George and McCulloch [44] and Geweke [45]; the model was proposed for CS in [1] and also used by He and Carin [46]. More formally, let $X = [X(1), \ldots, X(N)]$ be a random vector in $\mathbb{R}^N$, and consider the signal $x = [x(1), \ldots, x(N)]$ as an outcome of $X$. Because our approximately sparse signal consists of a small number of large coefficients and a large number of small coefficients, we associate each probability density function (pdf) $f(X(i))$ with a state variable $Q(i)$ that can take on two values. Large and small magnitudes correspond to zero mean Gaussian distributions with high and low variances, which are implied by $Q(i) = 1$ and $Q(i) = 0$, respectively:

$$f(X(i) \mid Q(i) = 1) \sim \mathcal{N}(0, \sigma_1^2) \quad \text{and} \quad f(X(i) \mid Q(i) = 0) \sim \mathcal{N}(0, \sigma_0^2),$$

with $\sigma_1^2 > \sigma_0^2$. Let $Q = [Q(1), \ldots, Q(N)]$ be the state random vector associated with the signal; the actual configuration $q = [q(1), \ldots, q(N)] \in \{0, 1\}^N$ is one of $2^N$ possible outcomes. We assume that the $Q(i)$'s are iid. (The model can be extended to capture dependencies between coefficients, as suggested by Ji et al. [25].) To ensure that we have approximately $K$ large coefficients, we choose the probability mass function (pmf) of the state variable $Q(i)$ to be Bernoulli with $\Pr(Q(i) = 1) = S$ and $\Pr(Q(i) = 0) = 1 - S$, where $S = K/N$ is the sparsity rate.

The resulting model for signal coefficients is a two-state mixture Gaussian distribution, as illustrated in Figure 1. This mixture model is completely characterized by three parameters: the sparsity rate $S$ and the variances $\sigma_0^2$ and $\sigma_1^2$ of the Gaussian pdf's corresponding to each state.

[Figure 1: Mixture Gaussian model for signal coefficients. The distribution of $X$ conditioned on the two state variables, $Q = 0$ and $Q = 1$, is depicted. Also shown is the overall distribution for $X$.]
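The two-state mixture prior is simple to simulate. A minimal sketch (parameter values are illustrative) that draws the Bernoulli state vector $q$ and then the conditionally Gaussian coefficients:

```python
import numpy as np

def sample_mixture_signal(N, S, sigma0, sigma1, rng):
    """Draw x from the two-state mixture Gaussian prior with sparsity rate S."""
    q = rng.random(N) < S                      # Q(i) ~ Bernoulli(S), iid
    sigma = np.where(q, sigma1, sigma0)        # high variance if Q(i)=1, low if Q(i)=0
    x = rng.normal(0.0, 1.0, N) * sigma
    return x, q

rng = np.random.default_rng(2)
x, q = sample_mixture_signal(N=1000, S=0.1, sigma0=1.0, sigma1=10.0, rng=rng)
print(q.sum())   # roughly K = S*N = 100 large coefficients
```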
Mixture Gaussian models have been successfully employed in image processing and inference problems, because they are simple yet effective in modeling real-world signals [41–43]. Theoretical connections have also been made between wavelet coefficient mixture models and the fundamental parameters of Besov spaces, which have proved invaluable for characterizing real-world images. Moreover, arbitrary densities with a finite number of discontinuities can be approximated arbitrarily closely by increasing the number of states and allowing non-zero means [47]. We leave these extensions for future work, and focus on two-state mixture Gaussian distributions for modeling the signal coefficients.

3 Sparse encoding

Sparse CS encoding matrix: We use a sparse $\Phi$ matrix to accelerate both CS encoding and decoding. Our CS encoding matrices are dominated by zero entries, with a small number of non-zeros in each row and each column. We focus on CS-LDPC matrices whose non-zero entries are $\{-1, 1\}$; each measurement involves only sums and differences of a small subset of coefficients of $x$. (CS-LDPC matrices are slightly different from LDPC parity check matrices, which only contain the binary entries 0 and 1. We have observed numerically that allowing negative entries offers improved performance. At the expense of additional computation, further minor improvement can be attained using sparse matrices with Gaussian non-zero entries.) Although the coherence between a sparse $\Phi$ and $\Psi$, which is the maximal inner product between rows of $\Phi$ and $\Psi$, may be higher than the coherence using a dense $\Phi$ matrix [48], as long as $\Phi$ is not too sparse (see Theorem 1 below) the measurements capture enough information about $x$ to decode the signal.

A CS-LDPC $\Phi$ can be represented as a bipartite graph $G$, which is also sparse. Each edge of $G$ connects a coefficient node $x(i)$ to an encoding node $y(j)$ and corresponds to a non-zero entry of $\Phi$ (Figure 2).

[Figure 2: Factor graph depicting the relationship between variable nodes (black) and constraint nodes (white) in CS-BP. Variable nodes: states $Q$, coefficients $X$, measurements $Y$; constraint nodes: prior, mixing, and encoding constraints.]

In addition to the core structure of $\Phi$, we can introduce other constraints to tailor the measurement process to the signal model. The constant row weight constraint makes sure that each row of $\Phi$ contains exactly $L$ non-zero entries. The row weight $L$ can be chosen based on signal properties such as sparsity, possible measurement noise, and details of the decoding process. Another option is to use a constant column weight constraint, which fixes the number of non-zero entries in each column of $\Phi$ to be a constant $R$.

Although our emphasis is on noiseless measurement of approximately sparse signals, we briefly discuss noisy measurement of a strictly sparse signal, and show that a constant row weight $L$ ensures that the measurements are approximated by two-state mixture Gaussians. To see this, consider a strictly sparse $x$ with sparsity rate $S$ and Gaussian variance $\sigma_1^2$. We now have $y = \Phi x + z$, where $z \sim \mathcal{N}(0, \sigma_Z^2)$ is additive white Gaussian noise (AWGN) with variance $\sigma_Z^2$. In our approximately sparse setting, each row of $\Phi$ picks up approximately $L(1-S)$ small magnitude coefficients. If $L(1-S)\sigma_0^2 \approx \sigma_Z^2$, then the few large coefficients will be obscured by similar noise artifacts.
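As an illustration of the constant row weight constraint described above, the following sketch builds a CS-LDPC-style $\{0, +1, -1\}$ matrix with exactly $L$ non-zeros per row (the scipy.sparse representation and the sampling scheme are our own; no constant column weight $R$ is enforced here):

```python
import numpy as np
from scipy.sparse import csr_matrix

def cs_ldpc_matrix(M, N, L, rng):
    """Sparse {0, +1, -1} encoding matrix with exactly L non-zeros per row."""
    rows, cols, vals = [], [], []
    for j in range(M):
        support = rng.choice(N, size=L, replace=False)   # L coefficients per measurement
        signs = rng.choice([-1.0, 1.0], size=L)          # random +/-1 entries
        rows.extend([j] * L); cols.extend(support); vals.extend(signs)
    return csr_matrix((vals, (rows, cols)), shape=(M, N))

rng = np.random.default_rng(3)
N, S = 1000, 0.1
L = int(round(1 / S))              # rule of thumb L ~ N/K (see (3) below)
M = 400
Phi = cs_ldpc_matrix(M, N, L, rng)
x = rng.normal(size=N)
y = Phi @ x                        # each y(j) is a sum/difference of L coefficients
```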
Our definition of $\Phi$ relies on the implicit assumption that $x$ is sparse in the canonical sparsifying basis, i.e., $\Psi = I$. In contrast, if $x$ is sparse in some other basis $\Psi$, then more complicated encoding matrices may be necessary. We defer the discussion of these issues to Section 6, but emphasize that in many practical situations our methods can be extended to support the sparsifying basis $\Psi$ in a computationally tractable manner.

Information content of sparsely encoded measurements: The sparsity of our CS-LDPC matrix may yield measurements $y$ that contain less information about the signal $x$ than a dense Gaussian $\Phi$. The following theorem, whose proof appears in the Appendix, verifies that $y$ retains enough information to decode $x$ well. As long as $S = K/N = \Omega\left(\left(\frac{\sigma_0}{\sigma_1}\right)^2\right)$, then $M = O(K \log(N))$ measurements are sufficient.

Theorem 1. Let $x$ be a two-state mixture Gaussian signal with sparsity rate $S = K/N$ and variances $\sigma_0^2$ and $\sigma_1^2$, and let $\Phi$ be a CS-LDPC matrix with constant row weight $L = \frac{\eta \ln(SN^{1+\gamma})}{S}$, where $\eta, \gamma > 0$. If

$$M = O\left( \frac{(1 + 2\eta^{-1})(1+\gamma)}{\mu^2} \left[ 2K + (N-K)\left(\frac{\sigma_0}{\sigma_1}\right)^2 \right] \log(N) \right), \qquad (2)$$

then $x$ can be decoded to $\hat{x}$ such that $\|x - \hat{x}\|_\infty < \mu\sigma_1$ with probability $1 - 2N^{-\gamma}$.

The proof of Theorem 1 relies on a result by Wang et al. [49, Theorem 1]. Their proof partitions $\Phi$ into $M_2$ sub-matrices of $M_1$ rows each, and estimates each $\hat{x}_i$ as a median of inner products with sub-matrices. The $\ell_\infty$ performance guarantee relies on the union bound; a less stringent guarantee yields a reduction in $M_2$. Moreover, $L$ can be reduced if we increase the number of measurements accordingly. Based on numerical results, we propose the following modified values as rules of thumb:

$$L \approx S^{-1} = N/K, \quad M = O(K \log(N)), \quad \text{and} \quad R = LM/N = O(\log(N)). \qquad (3)$$

Noting that each measurement requires $O(L)$ additions and subtractions, and using our rules of thumb for $L$ and $M$ (3), the computation required for encoding is $O(LM) = O(N \log(N))$, which is significantly lower than the $O(MN) = O(KN \log(N))$ required for a dense Gaussian $\Phi$.

4 CS-BP decoding of approximately sparse signals

Decoding approximately sparse random signals can be treated as a Bayesian inference problem. We observe the measurements $y = \Phi x$, where $x$ is a mixture Gaussian signal. Our goal is to estimate $x$ given $\Phi$ and $y$. Because the set of equations $y = \Phi x$ is under-determined, there are infinitely many solutions. All solutions lie along a hyperplane of dimension $N - M$. We locate the solution within this hyperplane that best matches our prior signal model.

Consider the minimum mean square error (MMSE) and maximum a posteriori (MAP) estimates,

$$\hat{x}_{\mathrm{MMSE}} = \arg\min_{x'} E\|X - x'\|_2^2 \quad \text{s.t.} \quad y = \Phi x',$$
$$\hat{x}_{\mathrm{MAP}} = \arg\max_{x'} f(X = x') \quad \text{s.t.} \quad y = \Phi x',$$

where the expectation is taken over the prior distribution for $X$. The MMSE estimate can be expressed as the conditional mean, $\hat{x}_{\mathrm{MMSE}} = E[X \mid Y = y]$, where $Y \in \mathbb{R}^M$ is the random vector that corresponds to the measurements. Although the precise computation of $\hat{x}_{\mathrm{MMSE}}$ may require the evaluation of $2^N$ terms, a close approximation to the MMSE estimate can be obtained using the (usually small) set of state configuration vectors $q$ with dominant posterior probability [27].
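To make the conditional-mean estimate concrete, the sketch below computes $\hat{x}_{\mathrm{MMSE}}$ exactly for a toy problem by enumerating all $2^N$ state configurations and averaging the per-configuration Gaussian estimates (an illustrative brute-force demonstration, feasible only for tiny $N$; it is not the decoding algorithm developed in this paper):

```python
import numpy as np
from itertools import product
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
N, M, S, s0, s1 = 8, 5, 0.2, 1.0, 10.0
Phi = rng.normal(size=(M, N))
q_true = rng.random(N) < S
x = rng.normal(0, 1, N) * np.where(q_true, s1, s0)
y = Phi @ x                                         # noiseless measurements

num, den = np.zeros(N), 0.0
for q in product([0, 1], repeat=N):                 # all 2^N state configurations
    d = np.where(np.array(q) == 1, s1**2, s0**2)    # prior variance of each x(i) given q
    C = (Phi * d) @ Phi.T                           # covariance of y given q
    w = (S**sum(q)) * ((1 - S)**(N - sum(q))) \
        * multivariate_normal.pdf(y, mean=np.zeros(M), cov=C)
    num += w * (d[:, None] * Phi.T @ np.linalg.solve(C, y))   # w * E[x | y, q]
    den += w
x_mmse = num / den
print(np.linalg.norm(x - x_mmse))
```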
Exact inference in graphical models is NP-hard [50], because of loops in the graph induced by $\Phi$. However, the sparse structure of $\Phi$ reduces the number of loops and enables us to use low-complexity message-passing methods to estimate $x$ approximately.

4.1 Decoding algorithm

We now employ belief propagation (BP), an efficient method for solving inference problems by iteratively passing messages over graphical models [30–35]. Although BP has not been proved to converge, for graphs with few loops it often offers a good approximation to the solution to the MAP inference problem. BP relies on factor graphs, which enable fast computation of global multivariate functions by exploiting the way in which the global function factors into a product of simpler local functions, each of which depends on a subset of variables [51].

Factor graph for CS-BP: The factor graph shown in Figure 2 captures the relationship between the states $q$, the signal coefficients $x$, and the observed CS measurements $y$. The graph is bipartite and contains two types of vertices; all edges connect variable nodes (black) and constraint nodes (white). There are three types of variable nodes corresponding to state variables $Q(i)$, coefficient variables $X(i)$, and measurement variables $Y(j)$. The factor graph also has three types of constraint nodes, which encapsulate the dependencies that their neighbors in the graph (variable nodes) are subjected to. First, prior constraint nodes impose the Bernoulli prior on state variables. Second, mixing constraint nodes impose the conditional distribution on coefficient variables given the state variables. Third, encoding constraint nodes impose the encoding matrix structure on measurement variables.

Message passing: CS-BP approximates the marginal distributions of all coefficient and state variables in the factor graph, conditioned on the observed measurements $Y$, by passing messages between variable nodes and constraint nodes. Each message encodes the marginal distribution of a variable associated with one of the edges. Given the distributions $\Pr(Q(i) \mid Y = y)$ and $f(X(i) \mid Y = y)$, one can extract MAP and MMSE estimates for each coefficient.

Denote the message sent from a variable node $v$ to one of its neighbors in the bipartite graph, a constraint node $c$, by $\mu_{v \to c}(v)$; a message from $c$ to $v$ is denoted by $\mu_{c \to v}(v)$. The message $\mu_{v \to c}(v)$ is updated by taking the product of all messages received by $v$ on all other edges. The message $\mu_{c \to v}(v)$ is computed in a similar manner, but the constraint associated with $c$ is applied to the product and the result is marginalized. More formally,

$$\mu_{v \to c}(v) = \prod_{u \in n(v) \setminus \{c\}} \mu_{u \to v}(v), \qquad (4)$$

$$\mu_{c \to v}(v) = \sum_{\sim\{v\}} \left( \mathrm{con}(n(c)) \prod_{w \in n(c) \setminus \{v\}} \mu_{w \to c}(w) \right), \qquad (5)$$

where $n(v)$ and $n(c)$ are the sets of neighbors of $v$ and $c$, respectively, $\mathrm{con}(n(c))$ is the constraint on the set of variable nodes $n(c)$, and $\sim\{v\}$ is the set of neighbors of $c$ excluding $v$. We interpret these two types of message processing as multiplication of beliefs at variable nodes (4) and convolution at constraint nodes (5).
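When messages are represented as sampled pdfs on a common grid (Section 4.2), the two updates reduce to pointwise products and convolutions. The sketch below is a simplified illustration for a single encoding constraint $y(j) = \sum_i \Phi_{ji} x(i)$ with $\pm 1$ entries; the symmetric-grid handling, the "same"-mode convolution, and the normalization are our own assumptions:

```python
import numpy as np

def variable_node_update(incoming):
    """(4): product of the incoming sampled pdfs (those on all other edges)."""
    out = np.prod(np.vstack(incoming), axis=0)
    return out / out.sum()

def encoding_node_update(incoming, signs, y_j, grid):
    """(5) for an encoding constraint: the outgoing message for edge v is the
    pdf of y_j minus the sum of the other (sign-adjusted) coefficients,
    obtained by convolving the incoming pdfs and evaluating on the shifted grid."""
    dx = grid[1] - grid[0]
    conv = None
    for p, s in zip(incoming, signs):
        p = p[::-1] if s < 0 else p          # pdf of s*x on a symmetric grid
        conv = p if conv is None else np.convolve(conv, p, mode="same") * dx
    out = np.interp(y_j - grid, grid, conv, left=0.0, right=0.0)
    return out / out.sum()
```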
Finally, the marginal distribution $f(v)$ for a given variable node is obtained from the product of all the most recent incoming messages along the edges connecting to that node,

$$f(v) = \prod_{u \in n(v)} \mu_{u \to v}(v). \qquad (6)$$

Based on the marginal distribution, various statistical characterizations can be computed, including MMSE, MAP, error bars, and so on.

We also need a method to encode beliefs. One method is to sample the relevant pdf's uniformly and then use the samples as messages. Another encoding method is to approximate the pdf by a mixture Gaussian with a given number of components, where the mixture parameters are used as messages. These two methods offer different trade-offs between modeling flexibility and computational requirements; details appear in Sections 4.2 and 4.3. We leave alternative methods such as particle filters and importance sampling for future research.

Protecting against loopy graphs and message quantization errors: BP converges to the exact conditional distribution in the ideal situation where the following conditions are met: (i) the factor graph is cycle-free; and (ii) messages are processed and propagated without errors. In CS-BP decoding, both conditions are violated. First, the factor graph is loopy (it contains cycles). Second, message encoding methods introduce errors. These non-idealities may lead CS-BP to converge to imprecise conditional distributions, or more critically, lead CS-BP to diverge [52–54]. To some extent these problems can be reduced by (i) using CS-LDPC matrices, which have a relatively modest number of loops; and (ii) carefully designing our message encoding methods (Sections 4.2 and 4.3). We stabilize CS-BP against these non-idealities using message damped belief propagation (MDBP) [55], where messages are weighted averages between old and new estimates. Despite the damping, CS-BP is not guaranteed to converge, and yet the numerical results of Section 5 demonstrate that its performance is quite promising. We conclude with a prototype algorithm; Matlab code is available at http://dsp.rice.edu/CSBP.

CS-BP Decoding Algorithm

1. Initialization: Initialize the iteration counter $i = 1$. Set up data structures for factor graph messages $\mu_{v \to c}(v)$ and $\mu_{c \to v}(v)$. Initialize the messages $\mu_{v \to c}(v)$ from variable to constraint nodes with the signal prior.

2. Convolution: For each measurement $c = 1, \ldots, M$, which corresponds to constraint node $c$, compute $\mu_{c \to v}(v)$ via convolution (5) for all neighboring variable nodes $n(c)$. If measurement noise is present, then convolve further with a noise prior. Apply damping methods such as MDBP [55] by weighting the new estimates from iteration $i$ with estimates from previous iterations.

3. Multiplication: For each coefficient $v = 1, \ldots, N$, which corresponds to variable node $v$, compute $\mu_{v \to c}(v)$ via multiplication (4) for all neighboring constraint nodes $n(v)$. Apply damping methods as needed. If the iteration counter has yet to reach its maximal value, then go to Step 2.

4. Output: For each coefficient $v = 1, \ldots, N$, compute MMSE or MAP estimates (or alternative statistical characterizations) based on the marginal distribution $f(v)$ (6). Output the requisite statistics.
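The damping applied in Steps 2 and 3 is a convex combination of successive message iterates. A minimal sketch (the weight beta and the renormalization are illustrative choices; MDBP [55] is the reference scheme):

```python
import numpy as np

def damp(msg_old, msg_new, beta=0.5):
    """MDBP-style damping: weighted average of old and new message estimates,
    renormalized so the sampled pdf still sums to one."""
    msg = (1.0 - beta) * msg_old + beta * msg_new
    return msg / msg.sum()
```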
4.2 Samples of the pdf as messages

Having described the main aspects of the CS-BP decoding algorithm, we now focus on the two message encoding methods, starting with samples. In this method, we sample the pdf and send the samples as messages. Multiplication of pdf's (4) corresponds to point-wise multiplication of messages; convolution (5) is computed efficiently in the frequency domain. (Fast convolution via FFT has been used in LDPC decoding over $GF(2^q)$ using BP [34].) The main advantage of using samples is flexibility to different prior distributions for the coefficients; for example, mixture Gaussian priors are easily supported. Additionally, both multiplication and convolution are computed efficiently. However, sampling has large memory requirements and introduces quantization errors that reduce precision and hamper the convergence of CS-BP [52]. Precise decoding also requires sufficiently fine sampling; we propose to sample the pdf's with a spacing less than $\sigma_0$.

We analyze the computational requirements of this method. Let each message be a vector of $p$ samples. Each iteration performs multiplication at coefficient nodes (4) and convolution at constraint nodes (5). Outgoing messages are modified:

$$\mu_{v \to c}(v) = \frac{\prod_{u \in n(v)} \mu_{u \to v}(v)}{\mu_{c \to v}(v)} \quad \text{and} \quad \mu_{c \to v}(v) = \sum_{\sim\{v\}} \left( \frac{\mathrm{con}(n(c)) \prod_{w \in n(c)} \mu_{w \to c}(w)}{\mu_{v \to c}(v)} \right), \qquad (7)$$

where the denominators are non-zero, because mixture Gaussian pdf's are strictly positive. The modifications (7) reduce computation, because the numerators are computed once and then reused for all messages leaving the node being processed.

Assuming that the column weight $R$ is fixed (Section 3), the computation required for message processing at a variable node is $O(Rp)$ per iteration, because we multiply $R + 1$ vectors of length $p$. With $O(N)$ variable nodes, each iteration requires $O(NRp)$ computation. For constraint nodes, we perform convolution in the frequency domain, and so the computational cost per node is $O(Lp \log(p))$. With $O(M)$ constraint nodes, each iteration is $O(LMp \log(p))$. Accounting for both variable and constraint nodes, each iteration is $O(NRp + LMp \log(p)) = O(p \log(p) N \log(N))$, where we employ our rules of thumb for $L$, $M$, and $R$ (3). To complete the computational analysis, we note first that we use $O(\log(N))$ CS-BP iterations, which is proportional to the diameter of the graph [56]. Second, sampling the pdf's with a spacing less than $\sigma_0$, we choose $p = O(\sigma_1/\sigma_0)$ to support a maximal amplitude on the order of $\sigma_1$. Therefore, our overall computation is $O\left(\frac{\sigma_1}{\sigma_0}\log\left(\frac{\sigma_1}{\sigma_0}\right) N \log^2(N)\right)$, which scales as $O(N \log^2(N))$ when $\sigma_0$ and $\sigma_1$ are constant.
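The reuse trick in (7) is easy to see in code: at a variable node, the product over all incoming messages is formed once, and each outgoing message divides out the one incoming message on its own edge. A sketch for sampled-pdf messages (strictly positive, as in the mixture Gaussian case):

```python
import numpy as np

def variable_node_all_edges(incoming):
    """Compute all outgoing messages of one variable node via (7):
    one shared product over the incoming pdfs, divided per edge."""
    total = np.prod(np.vstack(incoming), axis=0)     # numerator, computed once
    outs = []
    for p in incoming:                               # one outgoing message per edge
        out = total / p                              # safe: mixture Gaussian pdfs > 0
        outs.append(out / out.sum())
    return outs
```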
4.3 Mixture Gaussian parameters as messages

In this method, we approximate the pdf by a mixture Gaussian with a maximum number of components, and then send the mixture parameters as messages. For both multiplication (4) and convolution (5), the resulting number of components in the mixture is multiplicative in the number of constituent components. To keep the message representation tractable, we perform model reduction using the Iterative Pairwise Replacement Algorithm (IPRA) [57], where a sequence of mixture models is computed iteratively. The advantage of using mixture Gaussians to encode pdf's is that the messages are short and hence consume little memory. This method works well for mixture Gaussian priors, but could be difficult to adapt to other priors. Model order reduction algorithms such as IPRA can be computationally expensive [57], and introduce errors in the messages, which impair the quality of the solution as well as the convergence of CS-BP [52].

Again, we analyze the computational requirements. Because it is impossible to undo the multiplication in (4) and (5), we cannot use the modified form (7). Let $m$ be the maximum model order. Model order reduction using IPRA [57] requires $O(m^2 R^2)$ computation per coefficient node per iteration. With $O(N)$ coefficient nodes, each iteration is $O(m^2 R^2 N)$. Similarly, with $O(M)$ constraint nodes, each iteration is $O(m^2 L^2 M)$. Accounting for $O(\log(N))$ CS-BP iterations, the overall computation is $O(m^2 [L^2 M + R^2 N] \log(N)) = O\left(\frac{m^2 N}{S} \log^2(N)\right)$.

4.4 Properties of CS-BP decoding

We briefly describe several properties of CS-BP decoding. The computational characteristics of the two methods for encoding beliefs about conditional distributions were evaluated in Sections 4.2 and 4.3. The storage requirements are mainly for message representation of the $LM = O(N \log(N))$ edges. For encoding with pdf samples, the message length is $p$, and so the storage requirement is $O(pN \log(N))$. For encoding with mixture Gaussian parameters, the message length is $m$, and so the storage requirement is $O(mN \log(N))$. Computational and storage requirements are summarized in Table 1.

Table 1: Computational and storage requirements of CS-BP decoding

| Messages          | Parameter                          | Computation                                                                                     | Storage          |
|-------------------|------------------------------------|--------------------------------------------------------------------------------------------------|------------------|
| Samples of pdf    | $p = O(\sigma_1/\sigma_0)$ samples | $O\left(\frac{\sigma_1}{\sigma_0}\log\left(\frac{\sigma_1}{\sigma_0}\right) N \log^2(N)\right)$ | $O(pN \log(N))$  |
| Mixture Gaussians | $m$ components                     | $O\left(\frac{m^2 N}{S} \log^2(N)\right)$                                                       | $O(mN \log(N))$  |

Several additional properties are now featured. First, we have progressive decoding; more measurements will improve the precision of the estimated posterior probabilities. Second, if we are only interested in an estimate of the state configuration vector $q$ but not in the coefficient values, then less information must be extracted from the measurements. Consequently, the number of measurements can be reduced. Third, we have robustness to noise, because noisy measurements can be incorporated into our model by convolving the noiseless version of the estimated pdf (5) at each encoding node with the pdf of the noise.
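Returning briefly to the mixture-parameter representation of Section 4.3: the core operation inside pairwise-replacement model reduction is merging two weighted Gaussian components by moment matching. A minimal sketch of that single step (IPRA [57] additionally specifies which pair to merge and when to stop):

```python
def merge_components(w1, mu1, var1, w2, mu2, var2):
    """Moment-matched merge of two weighted Gaussian components:
    the result preserves the pair's total weight, mean, and variance."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + mu1**2) + w2 * (var2 + mu2**2)) / w - mu**2
    return w, mu, var
```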
5 Numerical results

To demonstrate the efficacy of CS-BP, we simulated several different settings. In our first setting, we considered decoding problems where $N = 1000$, $S = 0.1$, $\sigma_1 = 10$, $\sigma_0 = 1$, and the measurements are noiseless. We used samples of the pdf as messages, where each message consisted of $p = 525 = 3 \cdot 5^2 \cdot 7$ samples; this choice of $p$ provided fast FFT computation.

Figure 3 plots the MMSE decoding error as a function of $M$ for a variety of row weights $L$. The figure emphasizes with dashed lines the average $\ell_2$ norm of $x$ (top) and of the small coefficients (bottom); increasing $M$ reduces the decoding error, until it reaches the energy level of the small coefficients. A small row weight, e.g., $L = 5$, may miss some of the large coefficients and is thus bad for decoding; as we increase $L$, fewer measurements are needed to obtain the same precision. However, there is an optimal $L_{\mathrm{opt}} \approx 2/S = 20$ beyond which any performance gains are marginal. Furthermore, values of $L > L_{\mathrm{opt}}$ give rise to divergence in CS-BP, even with damping.

[Figure 3: MMSE as a function of the number of measurements $M$ using different matrix row weights $L \in \{5, 10, 20\}$. The dashed lines show the $\ell_2$ norms of $x$ (top) and the small coefficients (bottom). ($N = 1000$, $S = 0.1$, $\sigma_1 = 10$, $\sigma_0 = 1$, and noiseless measurements.)]

An example of the output of the CS-BP decoder and how it compares to the signal $x$ appears in Figure 4, where we used $L = 20$ and $M = 400$. Although $N = 1000$, we only plotted the first 100 signal values $x(i)$ for ease of visualization.

[Figure 4: Original signal $x$ and version decoded by CS-BP, first 100 coefficients. ($N = 1000$, $S = 0.1$, $L = 20$, $M = 400$, $\sigma_1 = 10$, $\sigma_0 = 1$, and noiseless measurements.)]

To compare the performance of CS-BP with other CS decoding algorithms, we also simulated: (i) $\ell_1$ decoding (1) via linear programming; (ii) GPSR [20], an optimization method that minimizes $\|\theta\|_1 + \mu\|y - \Phi\Psi\theta\|_2^2$; (iii) CoSaMP [16], a fast greedy solver; and (iv) IHT [17], an iterative thresholding algorithm. We simulated all five methods where $N = 1000$, $S = 0.1$, $L = 20$, $\sigma_1 = 10$, $\sigma_0 = 1$, $p = 525$, and the measurements are noiseless. Throughout the experiment we ran the different methods using the same CS-LDPC encoding matrix $\Phi$, the same signal $x$, and therefore the same measurements $y$. Figure 5 plots the MMSE decoding error as a function of $M$ for the five methods. For small to moderate $M$, CS-BP exploits its knowledge about the approximately sparse structure of $x$, and has a smaller decoding error. CS-BP requires 20–30% fewer measurements than the optimization methods LP and GPSR to obtain the same MMSE decoding error; the advantage over the greedy solvers IHT and CoSaMP is even greater. However, as $M$ increases, the advantage of CS-BP over LP and GPSR becomes less pronounced.

[Figure 5: MMSE as a function of the number of measurements $M$ using CS-BP, linear programming (LP), GPSR, CoSaMP, and IHT. The dashed lines show the $\ell_2$ norms of $x$ (top) and the small coefficients (bottom). ($N = 1000$, $S = 0.1$, $L = 20$, $\sigma_1 = 10$, $\sigma_0 = 1$, and noiseless measurements.)]

To compare the speed of CS-BP to other methods, we ran the same five methods as before. In this experiment, we varied the signal length $N$ from 100 to 10000, where $S = 0.1$, $L = 20$, $\sigma_1 = 10$, $\sigma_0 = 1$, $p = 525$, and the measurements are noiseless. We mention in passing that some of the algorithms that were evaluated can be accelerated using linear algebra routines optimized for sparse matrices; the improvement is quite modest, and the run-times presented here do not reflect this optimization. Figure 6 plots the run-times of the five methods in seconds as a function of $N$. It can be seen that LP scales more poorly than the other algorithms, and so we did not simulate it for $N > 3000$.
(Our LP solver is based on interior point methods.) CoSaMP also seems to scale relatively poorly, although it is possible that our conjugate gradient implementation can be improved using the pseudo-inverse approach instead [16]. The run-times of CS-BP seem to scale somewhat better than those of IHT and GPSR. Although the asymptotic computational complexity of CS-BP is good, for signals of length $N = 10000$ it is still slower than IHT and GPSR; whereas IHT and GPSR essentially perform matrix-vector multiplications, CS-BP is slowed by the FFT computations performed in each iteration for all nodes in the factor graph. Additionally, whereas the choice $p = O(\sigma_1/\sigma_0)$ yields $O\left(\frac{\sigma_1}{\sigma_0}\log\left(\frac{\sigma_1}{\sigma_0}\right) N \log^2(N)\right)$ complexity, FFT computation with $p = 525$ samples is somewhat slow. That said, our main contribution is a computationally feasible Bayesian approach, which allows us to reduce the number of measurements (Figure 5); a comparison between CS-BP and previous Bayesian approaches to CS [25, 26] would be favorable.

To demonstrate that CS-BP deals well with measurement noise, recall the noisy measurement setting $y = \Phi x + z$ of Section 3, where $z \sim \mathcal{N}(0, \sigma_Z^2)$ is AWGN with variance $\sigma_Z^2$. Our algorithm deals with noise by convolving the noiseless version of the estimated pdf (5) with the noise pdf. We simulated decoding problems where $N = 1000$, $S = 0.1$, $L = 20$, $\sigma_1 = 10$, $\sigma_0 = 1$, $p = 525$, and $\sigma_Z^2 \in \{0, 2, 5, 10\}$. Figure 7 plots the MMSE decoding error as a function of $M$ and $\sigma_Z^2$. To put things in perspective, the average measurement picks up a Gaussian term of variance $L(1-S)\sigma_0^2 = 18$ from the signal. Although the decoding error increases with $\sigma_Z^2$, as long as $\sigma_Z^2 \ll 18$ the noise has little impact on the decoding error; CS-BP offers a graceful degradation to measurement noise.

Our final experiment considers model mismatch, where CS-BP has an imprecise statistical characterization of the signal. Instead of a two-state mixture Gaussian signal model as before, where large coefficients have variance $\sigma_1^2$ and occur with probability $S$, we defined a $C$-component mixture model. In our definition, $\sigma_0^2$ is interpreted as a background signal level, which appears in all coefficients. Whereas the two-state model adds a "true signal" component of variance $\sigma_1^2 - \sigma_0^2$ to the background signal, the $C - 1$ large components each occur with probability $S$ and the amplitudes of the true signals are $\sigma^2, 2\sigma^2, \ldots, (C-1)\sigma^2$, where $\sigma^2$ is chosen to preserve the total signal energy. At the same time, we did not change the signal priors in CS-BP, and used the same two-state mixture model as before. We simulated decoding problems where $N = 1000$, $S = 0.1$, $L = 20$, $\sigma_1 = 10$, $\sigma_0 = 1$, $p = 525$, the measurements are noiseless, and $C \in \{2, 3, 5\}$. Figure 8 plots the MMSE decoding error as a function of $M$ and $C$. The figure also shows how IHT and GPSR perform, in order to evaluate whether they are more robust than the Bayesian approach of CS-BP.
We did not simulate CoSaMP and $\ell_1$ decoding, since their MMSE performance is comparable to that of IHT and GPSR. As the number of mixture components $C$ increases, the MMSE provided by CS-BP increases. However, even for $C = 3$ the sparsity rate effectively doubles from $S$ to $2S$, and an increase in the required number of measurements $M$ is expected. Interestingly, the greedy IHT method also degrades significantly, perhaps because it implicitly makes an assumption regarding the number of large mixture components. GPSR, on the other hand, degrades more gracefully.

[Figure 6: Run-time in seconds as a function of the signal length $N$ using CS-BP, linear programming (LP) $\ell_1$ decoding, GPSR, CoSaMP, and IHT. ($S = 0.1$, $L = 20$, $M = 0.4N$, $\sigma_1 = 10$, $\sigma_0 = 1$, and noiseless measurements.)]

6 Variations and enhancements

Supporting an arbitrary sparsifying basis $\Psi$: Until now, we have assumed that the canonical sparsifying basis is used, i.e., $\Psi = I$. In this case, $x$ itself is sparse. We now explain how CS-BP can be modified to support the case where $x$ is sparse in an arbitrary basis $\Psi$. In the encoder, we multiply the CS-LDPC matrix $\Phi$ by $\Psi^T$ and encode $x$ as $y = (\Phi\Psi^T)x = (\Phi\Psi^T)(\Psi\theta) = \Phi\theta$, where $(\cdot)^T$ denotes the transpose operator. In the decoder, we use BP to form the approximation $\hat{\theta}$, and then transform via $\Psi$ to $\hat{x} = \Psi\hat{\theta}$. In order to construct the modified encoding matrix $\Phi\Psi^T$ and later transform $\hat{\theta}$ to $\hat{x}$, extra computation is needed; this extra cost is $O(N^2)$ in general. Fortunately, in many practical situations $\Psi$ is structured (e.g., Fourier or wavelet bases) and amenable to fast computation. Therefore, extending our methods to such bases is feasible; a sketch appears below.
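A minimal sketch of this change of basis (the random orthonormal $\Psi$, the iid sampling of the sparse $\Phi$, and the sizes are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 64, 32

# An arbitrary orthonormal sparsifying basis (here a random one via QR)
Psi, _ = np.linalg.qr(rng.normal(size=(N, N)))

theta = np.zeros(N)
theta[rng.choice(N, 4, replace=False)] = 10.0
x = Psi @ theta                      # x is sparse in Psi, not in the canonical basis

Phi = rng.choice([-1.0, 0.0, 1.0], size=(M, N), p=[0.05, 0.9, 0.05])  # sparse-ish Phi
y = (Phi @ Psi.T) @ x                # encode with Phi Psi^T ...
assert np.allclose(y, Phi @ theta)   # ... which equals Phi theta, so BP sees a sparse theta

# After BP produces theta_hat, the signal estimate is x_hat = Psi @ theta_hat.
```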
Exploiting statistical dependencies: In many signal representations, the coefficients are not iid. For example, wavelet representations of natural images often contain correlations between magnitudes of parent and child coefficients [2, 43]. Consequently, it is possible to decode signals from fewer measurements using an algorithm that allocates different distributions to different coefficients [46, 58]. By modifying the dependencies imposed by the prior constraint nodes (Section 4.1), CS-BP decoding supports different signal models.

Feedback: Feedback from the decoder to the encoder can be used in applications where measurements may be lost because of transmissions over faulty channels. In an analogous manner to a digital fountain [59], the marginal distributions (6) enable us to identify when sufficient information for signal decoding has been received. At that stage, the decoder notifies the encoder that decoding is complete, and the stream of measurements is stopped.

Irregular CS-LDPC matrices: In channel coding, LDPC matrices that have irregular row and column weights come closer to the Shannon limit, because a small number of rows or columns with large weights require only modest additional computation yet greatly reduce the block error rate [38]. In an analogous manner, we expect irregular CS-LDPC matrices to enable a further reduction in the number of measurements required.

[Figure 7: MMSE as a function of $M$ using different noise levels $\sigma_Z^2 \in \{0, 2, 5, 10\}$. The dashed lines show the $\ell_2$ norms of $x$ (top) and the small coefficients (bottom). ($N = 1000$, $S = 0.1$, $L = 20$, $\sigma_1 = 10$, and $\sigma_0 = 1$.)]

7 Discussion

This paper has developed a sparse encoding matrix and belief propagation decoding algorithm to accelerate CS encoding and decoding under the Bayesian framework. Although we focus on decoding approximately sparse signals, CS-BP can be extended to signals that are sparse in other bases, is flexible to modifications in the signal model, and can address measurement noise. Despite the significant benefits, CS-BP is not universal in the sense that the encoding matrix and decoding methods must be modified in order to apply our framework to arbitrary bases. Nonetheless, the necessary modifications only require multiplication by the sparsifying basis $\Psi$ or its transpose $\Psi^T$.

Our method resembles low density parity check (LDPC) codes [37, 38], which use a sparse Bernoulli parity check matrix. Although any linear code can be represented as a bipartite graph, for LDPC codes the sparsity of the graph accelerates the encoding and decoding processes. LDPC codes are celebrated for achieving rates close to the Shannon limit. A similar comparison of the MMSE performance of CS-BP with information theoretic bounds on CS performance is left for future research. Additionally, although CS-BP is not guaranteed to converge, the recent convergence proofs for LDLC codes [36] suggest that future work on extensions of CS-BP may also yield convergence proofs.

In comparison to previous work on Bayesian aspects of CS [25, 26], our method is much faster, requiring only $O(N \log^2(N))$ computation. At the same time, CS-BP offers significant flexibility, and should not be viewed as merely another fast CS decoding algorithm. However, CS-BP relies on the sparsity of CS-LDPC matrices, and future research can consider the applicability of such matrices in different applications.

[Figure 8: MMSE as a function of the number of measurements $M$ and the number of components $C$ in the mixture Gaussian signal model. Plots for CS-BP (x), GPSR (circle), and IHT (asterisk) appear for $C = 2$ (dotted), $C = 3$ (dashed), and $C = 5$ (solid). The horizontal dashed lines show the $\ell_2$ norms of $x$ (top) and the small coefficients (bottom). ($N = 1000$, $S = 0.1$, $L = 20$, $\sigma_1 = 10$, $\sigma_0 = 1$, and noiseless measurements.)]

Appendix: Outline of proof of Theorem 1

The proof begins with a derivation of probabilistic bounds on $\|x\|_2$ and $\|x\|_\infty$. Next, we review a result by Wang et al. [49, Theorem 1]. The proof is completed by combining the bounds with the result by Wang et al.

Upper bound on $\|x\|_2^2$: Consider $\|x\|_2^2 = \sum_{i=1}^{N} x_i^2$, where the random variable (RV) $X_i$ has a mixture distribution

$$X_i^2 \sim \begin{cases} \chi^2 \sigma_1^2 & \text{w.p. } S \\ \chi^2 \sigma_0^2 & \text{w.p. } 1 - S. \end{cases}$$

Recall the moment generating function (MGF), $M_X(t) = E[e^{tX}]$. The MGF of a chi-squared RV satisfies $M_{\chi^2}(t) = (1-2t)^{-1/2}$. For the mixture RV $X_i^2$,

$$M_{X_i^2}(t) = \frac{S}{\sqrt{1 - 2t\sigma_1^2}} + \frac{1-S}{\sqrt{1 - 2t\sigma_0^2}}.$$

Additionally, because the $X_i$ are iid, $M_{\|x\|_2^2}(t) = \left[ M_{X_i^2}(t) \right]^N$. Invoking the Chernoff bound, we have

$$\Pr\left( \|x\|_2^2 < SN\sigma_1^2 \right) < e^{-tSN\sigma_1^2} \left[ \frac{S}{\sqrt{1 - 2t\sigma_1^2}} + \frac{1-S}{\sqrt{1 - 2t\sigma_0^2}} \right]^N$$

for $t < 0$.
We aim to show that $\Pr(\|x\|_2^2 < SN\sigma_1^2)$ decays faster than $N^{-\gamma}$ as $N$ is increased. To do so, let $t = -\frac{\alpha}{\sigma_1^2}$, where $\alpha > 0$. It suffices to prove that there exists some $\alpha$ for which

$$f_1(\alpha) = e^{\alpha S} \left[ \frac{S}{\sqrt{1 + 2\alpha}} + \frac{1-S}{\sqrt{1 + 2\alpha\left(\frac{\sigma_0}{\sigma_1}\right)^2}} \right] < 1.$$

Let $f_2(\alpha) = \frac{1}{\sqrt{1+2\alpha}}$ and $f_3(\alpha) = e^{\alpha}$. It is easily seen via Taylor series that $f_2(\alpha) = 1 - \alpha + O(\alpha^2)$ and $f_3(\alpha) = 1 + \alpha + O(\alpha^2)$, and so

$$f_1(\alpha) = e^{\alpha S} \left[ S\left(1 - \alpha + O(\alpha^2)\right) + (1-S)\left(1 - \alpha\left(\frac{\sigma_0}{\sigma_1}\right)^2 + O\left(\alpha^2\left(\frac{\sigma_0}{\sigma_1}\right)^4\right)\right) \right]$$
$$= \left(1 + \alpha S + O(\alpha^2 S^2)\right) \left[ 1 - \alpha\left(S + (1-S)\left(\frac{\sigma_0}{\sigma_1}\right)^2\right) + O(\alpha^2) \right].$$

Because of the negative term $-\alpha(1-S)\left(\frac{\sigma_0}{\sigma_1}\right)^2 < 0$, which dominates the higher order term $O(\alpha^2)$ for small $\alpha$, there exists $\alpha > 0$, which is independent of $N$, for which $f_1(\alpha) < 1$. Using this $\alpha$, the Chernoff bound provides an upper bound on $\Pr(\|x\|_2^2 < SN\sigma_1^2)$ that decays exponentially with $N$. In summary,

$$\Pr\left( \|x\|_2^2 < SN\sigma_1^2 \right) = o(N^{-\gamma}). \qquad (8)$$

Lower bound on $\|x\|_2^2$: In a similar manner, MGF's and the Chernoff bound can be used to offer a probabilistic bound on the number of large Gaussian mixture components,

$$\Pr\left( \sum_{i=1}^{N} Q(i) > \tfrac{3}{2}SN \right) = o(N^{-\gamma}). \qquad (9)$$

Taking into account the limited number of large components and the expected squared $\ell_2$ norm, $E[\|x\|_2^2] = N[S\sigma_1^2 + (1-S)\sigma_0^2]$, we have

$$\Pr\left( \|x\|_2^2 > N[2S\sigma_1^2 + (1-S)\sigma_0^2] \right) = o(N^{-\gamma}). \qquad (10)$$

We omit the (similar) details for brevity.

Bound on $\|x\|_\infty$: The upper bound on $\|x\|_\infty$ is obtained by first considering large mixture components and then small components. First, we consider the large Gaussian mixture components, and denote $x_L = \{x(i) : Q(i) = 1\}$. Then

$$\Pr\left( \|x_L\|_\infty < \sqrt{2\ln(SN^{1+\gamma})}\,\sigma_1 \,\Big|\, \sum_{i=1}^{N} Q(i) \le \tfrac{3}{2}SN \right) \ge \left[ f_4\left( \sqrt{2\ln(SN^{1+\gamma})} \right) \right]^{\frac{3}{2}SN} \qquad (11)$$
$$> \left[ 1 - \frac{f_5\left(\sqrt{2\ln(SN^{1+\gamma})}\right)}{\sqrt{2\ln(SN^{1+\gamma})}} \right]^{\frac{3}{2}SN} \qquad (12)$$
$$> 1 - \frac{3}{2}SN\,\frac{f_5\left(\sqrt{2\ln(SN^{1+\gamma})}\right)}{\sqrt{2\ln(SN^{1+\gamma})}} \qquad (13)$$
$$= 1 - \frac{3SN}{2\sqrt{2\ln(SN^{1+\gamma})}} \cdot \frac{e^{-\frac{1}{2} \cdot 2\ln(SN^{1+\gamma})}}{\sqrt{2\pi}} = 1 - \frac{3N^{-\gamma}}{4\sqrt{\pi\ln(SN^{1+\gamma})}},$$

where $f_4(\alpha) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\alpha} e^{-u^2/2}\,du$ is the cumulative distribution function of the standard normal distribution, the inequality (11) relies on $f_4(\cdot) < 1$ and the possibility that $\sum_{i=1}^{N} Q(i)$ is strictly smaller than $\frac{3}{2}SN$, $f_5(\alpha) = \frac{1}{\sqrt{2\pi}} e^{-\alpha^2/2}$ is the pdf of the standard normal distribution, (12) relies on the bound $f_4(\alpha) > 1 - f_5(\alpha)/\alpha$, and the inequality (13) is motivated by $(1-\alpha)^\beta > 1 - \alpha\beta$ for $\alpha, \beta > 0$. Noting that $\ln(SN^{1+\gamma})$ increases with $N$, for large $N$ we have

$$\Pr\left( \|x_L\|_\infty < \sqrt{2\ln(SN^{1+\gamma})}\,\sigma_1 \,\Big|\, \sum_{i=1}^{N} Q(i) \le \tfrac{3}{2}SN \right) > 1 - \frac{N^{-\gamma}}{5}. \qquad (14)$$

Now consider the small Gaussian mixture components, and denote $x_S = \{x(i) : Q(i) = 0\}$. As before,

$$\Pr\left( \|x_S\|_\infty < \sqrt{2\ln(SN^{1+\gamma})}\,\sigma_1 \right) \ge \left[ f_4\left( \sqrt{2\ln(SN^{1+\gamma})}\,\frac{\sigma_1}{\sigma_0} \right) \right]^N \qquad (15)$$
$$> 1 - \frac{N}{\sqrt{2\ln(SN^{1+\gamma})}\,\frac{\sigma_1}{\sigma_0}} \cdot \frac{e^{-\frac{1}{2} \cdot 2\ln(SN^{1+\gamma})\left(\frac{\sigma_1}{\sigma_0}\right)^2}}{\sqrt{2\pi}},$$

where (15) uses the fact that the number of small mixture components is at most $N$. Because $\sigma_1 > \sigma_0$, for large $N$ we have

$$\Pr\left( \|x_S\|_\infty < \sqrt{2\ln(SN^{1+\gamma})}\,\sigma_1 \right) > 1 - \frac{N^{-\gamma}}{5}. \qquad (16)$$

Combining (9), (14), and (16), for large $N$ we have

$$\Pr\left( \|x\|_\infty < \sqrt{2\ln(SN^{1+\gamma})}\,\sigma_1 \right) > 1 - \frac{N^{-\gamma}}{2}. \qquad (17)$$

Result by Wang et al. [49, Theorem 1]:

Theorem 2 ([49]). Consider $x \in \mathbb{R}^N$ that satisfies the condition
$$\frac{\|x\|_\infty}{\|x\|_2} \le Q. \qquad (18)$$

In addition, let $V$ be any set of $N$ vectors $\{v_1, \ldots, v_N\} \subset \mathbb{R}^N$. Suppose a sparse random matrix $\Phi \in \mathbb{R}^{M \times N}$ satisfies

$$E[\Phi_{ij}] = 0, \quad E[\Phi_{ij}^2] = 1, \quad E[\Phi_{ij}^4] = s,$$

where $\frac{1}{s} = \frac{L}{N}$ is the fraction of non-zero entries in $\Phi$. Let

$$M = \begin{cases} O\left( \frac{1+\gamma}{\epsilon^2}\, sQ^2 \log(N) \right) & \text{if } sQ^2 \ge \Omega(1) \\ O\left( \frac{1+\gamma}{\epsilon^2} \log(N) \right) & \text{if } sQ^2 \le O(1). \end{cases} \qquad (19)$$

Then with probability at least $1 - N^{-\gamma}$, the random projections $\frac{1}{M}\Phi x$ and $\frac{1}{M}\Phi v_i$ can produce an estimate $\hat{a}_i$ for $x^T v_i$ satisfying

$$|\hat{a}_i - x^T v_i| \le \epsilon \|x\|_2 \|v_i\|_2, \quad \forall i \in \{1, \ldots, N\}.$$

Application of Theorem 2 to the proof of Theorem 1: Combining (8), (10), and (17), the union bound demonstrates that with probability lower bounded by $1 - N^{-\gamma}$ we have $\|x\|_\infty < \sqrt{2\ln(SN^{1+\gamma})}\,\sigma_1$ and $\|x\|_2^2 \in \left( NS\sigma_1^2,\; N[2S\sigma_1^2 + (1-S)\sigma_0^2] \right)$. (The $o(\cdot)$ terms in (8) and (10) demonstrate that there exists some $N_0$ such that for all $N > N_0$ the upper and lower bounds on $\|x\|_2^2$ each hold with probability lower bounded by $1 - \frac{1}{4}N^{-\gamma}$, resulting in a probability lower bounded by $1 - N^{-\gamma}$ via the union bound. Because the expression (2) for the number of measurements $M$ is an order term, the case where $N \le N_0$ is inconsequential.)

When these $\ell_2$ and $\ell_\infty$ bounds hold, we can apply Theorem 2. To do so, we must specify (i) $Q$ (18); (ii) the test vectors $(v_i)_{i=1}^{N}$; (iii) the matrix sparsity $s$; and (iv) the $\epsilon$ parameter. First, the bounds on $\|x\|_2$ and $\|x\|_\infty$ indicate that

$$\frac{\|x\|_\infty}{\|x\|_2} \le Q = \sqrt{\frac{2\ln(SN^{1+\gamma})}{SN}}.$$

Second, we choose $(v_i)_{i=1}^{N}$ to be the $N$ canonical vectors of the identity matrix $I_N$, providing $x^T v_i = x_i$. Third, our choice of $L$ offers $s = \frac{N}{L} = \frac{NS}{\eta \ln(SN^{1+\gamma})}$. Fourth, we set

$$\epsilon = \frac{\mu\sigma_1}{\sqrt{N[2S\sigma_1^2 + (1-S)\sigma_0^2]}}.$$

Using these parameters, Theorem 2 demonstrates that all $N$ approximations $\hat{a}_i$ satisfy

$$|\hat{a}_i - x_i| = |\hat{a}_i - x^T v_i| \le \epsilon \|x\|_2 \|v_i\|_2 < \mu\sigma_1$$

with probability lower bounded by $1 - N^{-\gamma}$. Combining the probability that the $\ell_2$ and $\ell_\infty$ bounds hold and the decoding probability offered by Theorem 2, we have

$$\|\hat{a} - x\|_\infty < \mu\sigma_1 \qquad (20)$$

with probability lower bounded by $1 - 2N^{-\gamma}$. We complete the proof by computing the number of measurements $M$ required (19). Because

$$sQ^2 = \frac{K}{\eta\ln(SN^{1+\gamma})} \cdot \frac{2\ln(SN^{1+\gamma})}{SN} = \frac{2}{\eta},$$

we need

$$M = O\left( (1 + 2\eta^{-1})\,\frac{1+\gamma}{\epsilon^2} \log(N) \right) = O\left( \frac{N(1 + 2\eta^{-1})(1+\gamma)}{\mu^2} \left[ 2S + (1-S)\left(\frac{\sigma_0}{\sigma_1}\right)^2 \right] \log(N) \right)$$

measurements. $\Box$

Acknowledgments

Thanks to David Scott, Danny Sorensen, Yin Zhang, Marco Duarte, Michael Wakin, Mark Davenport, Jason Laska, Matthew Moravec, Elaine Hale, Christine Kelley, and Ingmar Land for informative and inspiring conversations. Thanks to Phil Schniter for bringing his related work [27] to our attention. Special thanks to Ramesh Neelamani, Alexandre de Baynast, and Predrag Radosavljevic for providing helpful suggestions for implementing BP; to Danny Bickson and Harel Avissar for improving our implementation; and to Marco Duarte for wizardry with the figures. Additionally, the first author thanks the Department of Electrical Engineering at the Technion for generous hospitality while parts of the work were being performed, and in particular the support of Yitzhak Birk and Tsachy Weissman. Final thanks to the anonymous reviewers, whose superb comments helped to greatly improve the quality of the paper.

References
[1] S. Sarvotham, D. Baron, and R. G. Baraniuk, "Compressed sensing reconstruction via belief propagation," Tech. Rep. TREE0601, Rice University, Houston, TX, July 2006.

[2] R. A. DeVore, B. Jawerth, and B. J. Lucier, "Image compression through wavelet transform coding," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 719–746, Mar. 1992.

[3] I. F. Gorodnitsky and B. D. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, Mar. 1997.

[4] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[5] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[6] R. G. Baraniuk, "A lecture on compressive sensing," IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–121, 2007.

[7] E. Candès, M. Rudelson, T. Tao, and R. Vershynin, "Error correction via linear programming," Found. Comp. Math., pp. 295–308, 2005.

[8] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, pp. 33–61, 1998.

[9] D. Donoho and J. Tanner, "Neighborliness of randomly projected simplices in high dimensions," Proc. Nat. Academy Sciences, vol. 102, no. 27, pp. 9452–9457, 2005.

[10] D. Donoho, "High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension," Discrete Comput. Geometry, vol. 35, no. 4, pp. 617–652, Mar. 2006.

[11] P. M. Vaidya, "An algorithm for linear programming which requires $O(((m+n)n^2 + (m+n)^{1.5}n)L)$ arithmetic operations," in STOC '87: Proc. 19th ACM Symp. Theory of Computing, New York, NY, USA, 1987, pp. 29–38, ACM.

[12] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[13] A. Cohen, W. Dahmen, and R. A. DeVore, "Near optimal approximation of arbitrary vectors from highly incomplete measurements," 2007, Preprint.

[14] M. F. Duarte, M. B. Wakin, and R. G. Baraniuk, "Fast reconstruction of piecewise smooth signals from random projections," in Proc. SPARS05, Rennes, France, Nov. 2005.

[15] D. L. Donoho, Y. Tsaig, I. Drori, and J.-C. Starck, "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit," Mar. 2006, Preprint.

[16] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2008.

[17] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," to appear in Appl. Comput. Harmonic Analysis, 2008.

[18] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing: Closing the gap between performance and complexity," IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.

[19] E. Hale, W. Yin, and Y. Zhang, "Fixed-point continuation for $\ell_1$-minimization: Methodology and convergence," 2007, Submitted.

[20] M. Figueiredo, R. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE J. Sel. Top. Sign. Proces., Dec. 2007.
[21] E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," Tech. Rep. TR-2008-01, Department of Computer Science, University of British Columbia, Jan. 2008, to appear in SIAM J. Sci. Comp.
[22] G. Cormode and S. Muthukrishnan, "Towards an algorithmic theory of compressed sensing," DIMACS Technical Report TR 2005-25, 2005.
[23] G. Cormode and S. Muthukrishnan, "Combinatorial algorithms for compressed sensing," DIMACS Technical Report TR 2005-40, 2005.
[24] A. C. Gilbert, M. J. Strauss, J. Tropp, and R. Vershynin, "Algorithmic linear dimension reduction in the ℓ1 norm for sparse vectors," Apr. 2006, Submitted.
[25] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, June 2008.
[26] M. W. Seeger and H. Nickisch, "Compressed sensing and Bayesian experimental design," in ICML '08: Proc. 25th Int. Conf. Machine Learning, 2008, pp. 912–919.
[27] P. Schniter, L. C. Potter, and J. Ziniel, "Fast Bayesian matching pursuit: Model uncertainty and parameter estimation for sparse linear models," IEEE Trans. Signal Process., Mar. 2009.
[28] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, Aug. 2001.
[29] G. Guo and C.-C. Wang, "Multiuser detection of sparsely spread CDMA," IEEE J. Sel. Areas Commun., vol. 26, no. 3, pp. 421–431, 2008.
[30] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
[31] F. V. Jensen, An Introduction to Bayesian Networks, Springer-Verlag, 1996.
[32] B. J. Frey, Graphical Models for Machine Learning and Digital Communication, MIT Press, 1998.
[33] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Understanding belief propagation and its generalizations," Mitsubishi Tech. Rep. TR2001-022, Jan. 2002.
[34] D. J. C. MacKay, Information Theory, Inference and Learning Algorithms, Cambridge University Press, 2002.
[35] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems, Springer-Verlag, 2003.
[36] N. Sommer, M. Feder, and O. Shalvi, "Low-density lattice codes," IEEE Trans. Inf. Theory, vol. 54, no. 4, pp. 1561–1585, 2008.
[37] R. G. Gallager, "Low-density parity-check codes," IEEE Trans. Inf. Theory, vol. 8, pp. 21–28, Jan. 1962.
[38] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Trans. Inf. Theory, vol. 47, pp. 619–637, Feb. 2001.
[39] S. Sarvotham, D. Baron, and R. G. Baraniuk, "Sudocodes – Fast measurement and reconstruction of sparse signals," in Proc. Int. Symp. Inf. Theory (ISIT2006), Seattle, WA, July 2006.
[40] R. Berinde and P. Indyk, "Sparse recovery using sparse random matrices," Tech. Rep. MIT-CSAIL-TR-2008-001, 2008.
[41] J.-C. Pesquet, H. Krim, and E. Hamman, "Bayesian approach to best basis selection," in IEEE 1996 Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), 1996, pp. 2634–2637.
[42] H. Chipman, E. Kolaczyk, and R. McCulloch, "Adaptive Bayesian wavelet shrinkage," J. Amer. Stat. Assoc., vol. 92, 1997.
[43] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based signal processing using hidden Markov models," IEEE Trans. Signal Process., vol. 46, pp. 886–902, Apr. 1998.
[44] E. I. George and R. E. McCulloch, "Variable selection via Gibbs sampling," J. Am. Stat. Assoc., vol. 88, pp. 881–889, 1993.
[45] J. Geweke, "Variable selection and model comparison in regression," in Bayesian Statistics 5, 1996, pp. 609–620.
[46] L. He and L. Carin, "Exploiting structure in wavelet-based Bayesian compressed sensing," to appear in IEEE Trans. Signal Process., 2008.
[47] H. W. Sorenson and D. L. Alspach, "Recursive Bayesian estimation using Gaussian sums," Automatica, vol. 7, pp. 465–479, 1971.
[48] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inf. Theory, vol. 50, pp. 2231–2242, 2004.
[49] W. Wang, M. Garofalakis, and K. Ramchandran, "Distributed sparse random projections for refinable approximation," in Proc. Inf. Process. Sensor Networks (IPSN2007), 2007, pp. 331–339.
[50] G. Cooper, "The computational complexity of probabilistic inference using Bayesian belief networks," Artificial Intelligence, vol. 42, pp. 393–405, 1990.
[51] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[52] E. Sudderth, A. Ihler, W. Freeman, and A. S. Willsky, "Nonparametric belief propagation," MIT LIDS Tech. Rep. 2551, Oct. 2002.
[53] B. J. Frey and D. J. C. MacKay, "A revolution: Belief propagation in graphs with cycles," in Adv. Neural Inf. Process. Systems, M. Jordan, M. S. Kearns, and S. A. Solla (Eds.), vol. 10, 1998.
[54] A. Ihler, J. Fisher, and A. S. Willsky, "Loopy belief propagation: Convergence and effects of message errors," J. Machine Learning Res., vol. 6, pp. 905–936, May 2005.
[55] M. Pretti, "A message-passing algorithm with damping," J. Stat. Mech., Nov. 2005.
[56] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inf. Theory, vol. 45, pp. 399–431, Mar. 1999.
[57] D. W. Scott and W. F. Szewczyk, "From kernels to mixtures," Technometrics, vol. 43, pp. 323–335, Aug. 2001.
[58] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-based compressive sensing," 2008, Preprint.
[59] J. W. Byers, M. Luby, and M. Mitzenmacher, "A digital fountain approach to asynchronous reliable multicast," IEEE J. Sel. Areas Commun., vol. 20, no. 8, pp. 1528–1540, Oct. 2002.
