Sparse Estimation with the Swept Approximated Message-Passing Algorithm


Authors: Andre Manoel, Florent Krzakala, Eric W. Tramel, Lenka Zdeborová

Abstract

Approximate Message Passing (AMP) has been shown to be a superior method for inference problems, such as the recovery of signals from sets of noisy, lower-dimensionality measurements, both in terms of reconstruction accuracy and in computational efficiency. However, AMP suffers from serious convergence issues in contexts that do not exactly match its assumptions. We propose a new approach to stabilizing AMP in these contexts by applying AMP updates to individual coefficients rather than in parallel. Our results show that this change to the AMP iteration can provide theoretically expected, but hitherto unobtainable, performance for problems on which the standard AMP iteration diverges. Additionally, we find that the computational cost of this swept coefficient update scheme is not unduly burdensome, allowing it to be applied efficiently to signals of large dimensionality.

I. INTRODUCTION

Belief Propagation (BP) is a powerful iterative message-passing algorithm for graphical models [1–3]. However, it presents two main drawbacks when applied to highly connected continuous-variable problems: first, the need to work with continuous probability distributions; and second, the necessity to iterate over one such probability distribution for each pair of variables. The first problem can be addressed by projecting the distributions onto a finite number of moments [4], and the second by utilizing the Thouless-Anderson-Palmer (TAP) approach [2, 3], where only single-variable marginals are required. Approximate message passing (AMP), first introduced in [5], is one relaxation of BP that utilizes both of the aforementioned approximations in order to solve sparse estimation problems.
In AMP's more general setting, as considered in Generalized AMP (GAMP) [6], the goal of the algorithm is the reconstruction of an N-dimensional sparse vector x given knowledge of an M-dimensional vector y obtained via a possibly non-linear and/or probabilistic output function h(z) applied to a set of linear projections.¹ Specifically,

    y_\mu = h(z_\mu), \quad \text{where} \quad z_\mu = \sum_{i=1}^{N} \Phi_{\mu i} x_i.   (1)

For example, if h(z) = z + ξ, where ξ is a zero-mean i.i.d. Gaussian random variable, then h(z) represents an additive white Gaussian noise (AWGN) channel. With this output function, in the setting M ≪ N, (1) is simply the application of Compressed Sensing (CS) [7] under noise. AMP is currently acknowledged as one of the foremost algorithms for such problems in terms of both its computational efficiency and the number of measurements required for exact reconstruction of x. In fact, with properly chosen measurement matrices [8–10], one can achieve information-theoretically optimal reconstruction performance for CS, a hitherto unachievable bound with standard convex optimization approaches. Just as with any iterative algorithm, the convergence properties of AMP are of chief analytical concern.

A. Manoel is with the Institute of Physics, University of São Paulo, R. do Matão 187, 05508-090 São Paulo, Brazil. F. Krzakala is with Université Pierre et Marie Curie and École Normale Supérieure, 24 rue Lhomond, 75005 Paris, France. E. W. Tramel is with École Normale Supérieure, 24 rue Lhomond, 75005 Paris, France. L. Zdeborová is with Institut de Physique Théorique, CEA Saclay, and CNRS URA 2306, 91191 Gif-sur-Yvette, France.

¹ In the present work, we use subscript notation to denote the individual coefficients of vectors, i.e., y_µ refers to the µth coefficient of y, where µ ∈ {1, 2, ..., M}, and double-subscript notation to refer to individual matrix elements in row-column order.

October 17, 2018 DRAFT
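As a concrete illustration of the measurement model (1) with an AWGN output channel, the sketch below generates a sparse signal, i.i.d. zero-mean projections of magnitude O(1/√N), and noisy linear measurements. This is a self-contained toy in Python (the paper's released demos are in Matlab); all names and parameter values are illustrative, not taken from the paper's code.

```python
import math
import random

def awgn_measurements(phi, x, delta, rng):
    """Apply Eq. (1) with an AWGN output channel: y_mu = z_mu + noise."""
    y = []
    for row in phi:
        z = sum(p * xi for p, xi in zip(row, x))        # z_mu = sum_i Phi_mu_i x_i
        y.append(z + rng.gauss(0.0, math.sqrt(delta)))  # h(z) = z + N(0, delta)
    return y

rng = random.Random(0)
N, M, rho = 8, 4, 0.25
# Sparse signal: each entry non-zero with probability rho
x = [rng.gauss(0, 1) if rng.random() < rho else 0.0 for _ in range(N)]
# i.i.d. zero-mean projections with entries of magnitude O(1 / sqrt(N))
phi = [[rng.gauss(0, 1) / math.sqrt(N) for _ in range(N)] for _ in range(M)]
y = awgn_measurements(phi, x, delta=1e-8, rng=rng)
```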
Many rigorous results have been obtained on the performance of AMP in the case of i.i.d. and block-i.i.d. matrices [10, 11]. Unfortunately, while AMP performs well for zero-mean i.i.d. projections, performance tends to drastically decline if one moves away from these simple scenarios. In fact, even for i.i.d. matrices with a small positive mean, the algorithm may violently diverge, leading to poor reconstruction results [12]. This instability to slight variations from these strict assumptions on the projections is a serious problem for many practical applications of AMP.

The main theoretical reason for these convergence issues has been identified in [12]: namely, AMP's use of a parallel update, instead of a sequential one, on the BP variables at each iteration. Three strategies have been proposed in recent literature to avoid this problem. First, one can heavily damp the AMP iterations, as in [8, 13]. However, this often requires a damping factor so large that the cost, in terms of the number of iterations until convergence, is prohibitive. Additionally, it is not entirely clear how to determine an optimal damping factor to ensure convergence in general. Second, one can modify the problem a posteriori in order to return to a more favorable situation. For instance, one might remove the mean of the matrix and of the measurements [12], or one might modify the algorithm according to the theoretical spectrum of the operator Φ [14, 15], if it is known. This knowledge about the operator may be prohibitive to obtain and could therefore present a strong limitation in practice. Third, one might take one step backward in approximation from AMP to a BP-style iteration [12]. This incurs a huge cost in terms of both memory and computational efficiency, as there are O(N²) variables to update with BP as opposed to the O(N) utilized in AMP.
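The trade-off of the first strategy, damping, can be seen already in a scalar caricature: damping a fixed-point update shrinks the effective slope of the iteration map, buying stability at the price of slower progress. The map f(x) = −1.5x + 3 and the damping factor β below are hypothetical choices for illustration only; this is not AMP itself.

```python
def iterate(beta, steps=60, x0=0.0):
    """Damped fixed-point iteration x <- (1 - beta) * x + beta * f(x)
    for the toy map f(x) = -1.5 x + 3 with fixed point x* = 1.2.
    Undamped (beta = 1) the slope is -1.5, so the iteration diverges;
    damping gives effective slope 1 - 2.5 * beta, stable for 0 < beta < 0.8."""
    f = lambda x: -1.5 * x + 3.0
    x = x0
    for _ in range(steps):
        x = (1.0 - beta) * x + beta * f(x)
    return x

undamped = iterate(beta=1.0)  # runs away from the fixed point
damped = iterate(beta=0.5)    # converges to x* = 1.2, but more slowly per step
```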
In this contribution, we solve these problems by deriving a modified and efficient AMP algorithm with greatly improved convergence properties while preserving the O(N) iteration and memory cost of AMP. We accomplish this by a careful analysis of the relaxation leading from BP to AMP, in which we preserve the sequential, or swept, variable-update pattern of BP in our AMP approach. This leads to a slightly modified set of update rules for the AMP and GAMP algorithms without affecting the fixed point in any way. The resulting algorithm, which we denote Swept AMP (SwAMP), possesses impressive empirical convergence properties. The derivation of SwAMP is explained next in Sec. II. We then report, in Sec. III, numerical results for basic and 1-bit CS, as well as for group testing. In all of these cases, huge improvements over the state-of-the-art can be obtained while remaining robust to projections with troublesome properties.

II. FROM BELIEF PROPAGATION TO SWAMP FOR SIGNAL RECOVERY

A. Signal Recovery as Statistical Estimation

To describe AMP, we focus on the CS signal-recovery problem with real-valued signals in terms of statistical inference. Given an unknown signal x ∈ R^N, a linear projection Φ ∈ R^{M×N}, and a set of observations y ∈ R^M generated from x and Φ, we write the posterior distribution for the unknown signal according to Bayes' rule,

    P(x | \Phi, y) \propto P(y | \Phi, x)\, P_0(x),   (2)

where we write ∝ as we neglect the normalization constant. The likelihood P(y | Φ, x) is determined according to the constraints one wishes to enforce, which we consider to be of the form y = h(Φx), with h being, in general, any stochastic function. Here, we consider h to be an AWGN channel²,

    y_\mu = h(\Phi_\mu x) = \Phi_\mu x + \mathcal{N}(0, \Delta),   (3)

where ∆ is the variance of the AWGN and Φ_µ is the µth row-vector of Φ.
Hence,

    P(y | \Phi, x) = \prod_{\mu=1}^{M} \frac{1}{\sqrt{2\pi\Delta}} \exp\!\left[ -\frac{\left( y_\mu - \sum_i \Phi_{\mu i} x_i \right)^2}{2\Delta} \right].   (4)

The prior P_0(x) is determined from the information we have on the structure of x. For CS, we are concerned with the recovery of sparse signals, i.e., ones with few non-zero values. Unstructured sparse signals can be modeled well by an i.i.d. Bernoulli sparse prior,

    P_0(x) \propto \prod_{i=1}^{N} P_0(x_i), \quad \text{where} \quad P_0(x_i) = \rho\, \psi(x_i) + (1 - \rho)\, \delta(x_i),   (5)

where ψ(x_i) can be any distribution, e.g. ψ(x_i) = N(x_i; x̄, σ²), and the degree of sparsity is controlled by the value ρ ∈ [0, 1].

Notice that, in this usual setting, both distributions are factorized: the likelihood factorizes into M terms, one per constraint on each y_µ, and the prior into N terms, one per expectation on each x_i. Factorized distributions such as these are well represented by graphical models [16], specifically, bipartite graphs in which the M + N factors are represented by one type of node and the N variables x_i by another.

Once the posterior distribution is written down, the estimate x̂ may be assigned in different ways, according to which loss function one wishes to minimize. In this work, we are chiefly concerned with the minimum mean-squared error (MMSE) estimate, which can be shown to be the average of x_i with respect to the posterior P(x | Φ, y); if one were able to compute the posterior's marginals, the MMSE estimate would read

    \hat{x}^{\rm MMSE}_i = \int dx_i\, x_i\, P(x_i | \Phi, y), \quad \forall i.   (6)

The strategy employed by AMP is to infer the marginals of the posterior by using a relaxed version of the BP algorithm [1, 2], and thus to arrive at the MMSE estimate of the unknown signal x.

² One can generalize h to be a more complicated output function. This generalization constitutes the change from AMP to GAMP [6]. For example, we examine the case of 1-bit CS in Sec. III-C, where h is a non-linear sign function.

B. Relaxed Belief Propagation
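For intuition about the Bernoulli sparse prior (5), the sketch below draws samples from P_0 with a Gaussian ψ; the fraction of non-zero entries concentrates around ρ. This is an illustrative snippet, not part of the paper's pipeline.

```python
import random

def sample_prior(n, rho, xbar, sigma, rng):
    """Draw x_i from P0(x_i) = rho * N(xbar, sigma^2) + (1 - rho) * delta(x_i), Eq. (5)."""
    return [rng.gauss(xbar, sigma) if rng.random() < rho else 0.0 for _ in range(n)]

rng = random.Random(42)
x = sample_prior(100000, rho=0.2, xbar=0.0, sigma=1.0, rng=rng)
# Empirical sparsity level: concentrates near rho = 0.2 for large n
frac_nonzero = sum(1 for xi in x if xi != 0.0) / len(x)
```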
BP implements a message-passing scheme between nodes in a graphical model, ultimately allowing one to compute approximations of the posterior marginals. Messages m_{i→µ} are sent from the variable nodes to the factor nodes, and subsequent messages m_{µ→i} are sent from the factor nodes back to the variable nodes; these correspond to the algorithm's current "beliefs" about the probability distributions of the variables x_i. Since these distributions are continuous, the first relaxation step is to move to a projected version of these distributions, as described in [6, 9]. Here, we shall follow the notation of reference [9] and use the following parametrization:

    a_{i \to \mu} \triangleq \int dx_i\, x_i\, m_{i \to \mu}(x_i), \qquad v_{i \to \mu} \triangleq \int dx_i\, x_i^2\, m_{i \to \mu}(x_i) - a_{i \to \mu}^2,   (7)

    m_{\mu \to i}(x_i) \propto e^{-\frac{x_i^2}{2} A_{\mu \to i} + B_{\mu \to i} x_i}.   (8)

This leads (see [9]) to the following closed recursion, sometimes called relaxed BP (r-BP):

    A_{\mu \to i} = \frac{\Phi_{\mu i}^2}{\Delta + \sum_{j \neq i} \Phi_{\mu j}^2 v_{j \to \mu}}, \qquad a_{i \to \mu} = f_1\!\left( \frac{1}{\sum_{\gamma \neq \mu} A_{\gamma \to i}}, \frac{\sum_{\gamma \neq \mu} B_{\gamma \to i}}{\sum_{\gamma \neq \mu} A_{\gamma \to i}} \right),   (9)

    B_{\mu \to i} = \frac{\Phi_{\mu i} \left( y_\mu - \sum_{j \neq i} \Phi_{\mu j} a_{j \to \mu} \right)}{\Delta + \sum_{j \neq i} \Phi_{\mu j}^2 v_{j \to \mu}}, \qquad v_{i \to \mu} = f_2\!\left( \frac{1}{\sum_{\gamma \neq \mu} A_{\gamma \to i}}, \frac{\sum_{\gamma \neq \mu} B_{\gamma \to i}}{\sum_{\gamma \neq \mu} A_{\gamma \to i}} \right),   (10)

where the functions f_1 and f_2 are the mean and variance under the measure P_0(x) e^{-(x-R)^2/2\Sigma^2} / Z(\Sigma^2, R), with Z(\Sigma^2, R) \triangleq \int dx\, P_0(x)\, e^{-(x-R)^2/2\Sigma^2} / (\sqrt{2\pi}\,\Sigma) the normalization; that is, the prior-dependent integrals

    f_1(\Sigma^2, R) \triangleq \frac{1}{Z(\Sigma^2, R)} \int dx\, x\, P_0(x)\, \frac{e^{-\frac{(x-R)^2}{2\Sigma^2}}}{\sqrt{2\pi}\,\Sigma},   (11)

    f_2(\Sigma^2, R) \triangleq \frac{1}{Z(\Sigma^2, R)} \int dx\, x^2\, P_0(x)\, \frac{e^{-\frac{(x-R)^2}{2\Sigma^2}}}{\sqrt{2\pi}\,\Sigma} - f_1^2(\Sigma^2, R) = \Sigma^2 \frac{d f_1}{d R}(\Sigma^2, R).   (12)

After convergence, the single-point marginals are given by

    a_i = f_1\!\left( \frac{1}{\sum_\gamma A_{\gamma \to i}}, \frac{\sum_\gamma B_{\gamma \to i}}{\sum_\gamma A_{\gamma \to i}} \right), \qquad v_i = f_2\!\left( \frac{1}{\sum_\gamma A_{\gamma \to i}}, \frac{\sum_\gamma B_{\gamma \to i}}{\sum_\gamma A_{\gamma \to i}} \right).   (13)

We intentionally write r-BP without specifying time indices, since the updates can be performed in one of two ways. The first approach is to update in parallel, where all variables are updated at time t given the state at time t − 1.
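For the Bernoulli-Gaussian prior (5), the integrals (11)-(12) are available in closed form, and the identity f_2 = Σ² df_1/dR can be checked numerically. The implementation below is a sketch under that prior; the function names and parameter defaults are ours.

```python
import math

def gauss(u, mean, var):
    """Density of N(mean, var) evaluated at u."""
    return math.exp(-(u - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def f1_f2(sigma2, R, rho=0.2, xbar=0.0, s2_prior=1.0):
    """Closed-form posterior mean f1 and variance f2 of Eqs. (11)-(12) for the
    Bernoulli-Gaussian prior P0(x) = rho * N(x; xbar, s2_prior) + (1 - rho) * delta(x)."""
    # Evidence of each mixture component under the Gaussian window N(x; R, sigma2)
    z_gauss = rho * gauss(R, xbar, s2_prior + sigma2)
    z_spike = (1.0 - rho) * gauss(R, 0.0, sigma2)
    pi = z_gauss / (z_gauss + z_spike)  # posterior weight of the non-zero component
    m = (R * s2_prior + xbar * sigma2) / (s2_prior + sigma2)  # component posterior mean
    v = s2_prior * sigma2 / (s2_prior + sigma2)               # component posterior variance
    f1 = pi * m
    f2 = pi * (v + m * m) - f1 * f1
    return f1, f2

# Check the identity f2 = Sigma^2 * d f1 / d R of Eq. (12) via central differences
sigma2, R, h = 0.5, 0.3, 1e-5
f1p, _ = f1_f2(sigma2, R + h)
f1m, _ = f1_f2(sigma2, R - h)
_, f2 = f1_f2(sigma2, R)
deriv = sigma2 * (f1p - f1m) / (2 * h)
```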
The second is the random sequential update, where one picks a single index i and updates all messages corresponding to it. A time step is completed once all indices have been visited and updated once. As shown in [12], the sequential, or swept, iteration is much more stable for r-BP. We now turn our attention to AMP and to our proposed modification.

C. Swept Approximate Message Passing

In the message passing described in the previous section, 2(M × N) messages are sent, one between each variable component and each measurement at each iteration. This creates a very large computational and memory burden for applications with large N, M. It is possible to rewrite the BP equations in terms of only N + M messages by making the assumption that Φ is dense and that its elements are of magnitude O(1/√N). In statistical physics, this assumption leads to the TAP equations [17] used in the study of spin glasses. For graphical models, such strategies have been discussed in [3]. The use of TAP with r-BP provides the standard AMP iteration, as we now show. First we define

    \omega_\mu \triangleq \sum_i \Phi_{\mu i} a_{i \to \mu}, \qquad V_\mu \triangleq \sum_i \Phi_{\mu i}^2 v_{i \to \mu},   (14)

    \Sigma_i^2 \triangleq \frac{1}{\sum_\mu A_{\mu \to i}}, \qquad R_i \triangleq \frac{\sum_\mu B_{\mu \to i}}{\sum_\mu A_{\mu \to i}}.   (15)

Next we expand around the marginals and disregard any O(1) terms (see [12] for details) to find

    V_\mu \approx \sum_i \Phi_{\mu i}^2 v_i,

    \Sigma_i^2 = \left[ \sum_\mu \frac{\Phi_{\mu i}^2}{\Delta_\mu + V_\mu - \Phi_{\mu i}^2 v_{i \to \mu}} \right]^{-1} \approx \left[ \sum_\mu \frac{\Phi_{\mu i}^2}{\Delta_\mu + V_\mu} \right]^{-1},

    R_i = \left[ \sum_\mu \frac{\Phi_{\mu i} (y_\mu - \omega_\mu + \Phi_{\mu i} a_{i \to \mu})}{\Delta_\mu + V_\mu - \Phi_{\mu i}^2 v_{i \to \mu}} \right] \left[ \sum_\mu \frac{\Phi_{\mu i}^2}{\Delta_\mu + V_\mu - \Phi_{\mu i}^2 v_{i \to \mu}} \right]^{-1} \approx a_i + \frac{\sum_\mu \Phi_{\mu i} (y_\mu - \omega_\mu) / (\Delta_\mu + V_\mu)}{\sum_\mu \Phi_{\mu i}^2 / (\Delta_\mu + V_\mu)}.   (16)

Now let us investigate the expansion of the factor ω_µ as we include the time, or iteration, indices t. First one has

    a_{i \to \mu}^{t+1} = f_1\!\left( \frac{1}{\sum_\nu A_{\nu \to i}^t - A_{\mu \to i}^t}, \frac{\sum_\nu B_{\nu \to i}^t - B_{\mu \to i}^t}{\sum_\nu A_{\nu \to i}^t - A_{\mu \to i}^t} \right) = a_i^{t+1} - B_{\mu \to i}^t (\Sigma_i^2)^t \frac{\partial f_1}{\partial R}\!\left( (\Sigma_i^2)^t, R_i^t \right) = a_i^{t+1} - B_{\mu \to i}^t v_i^{t+1},   (17)

making the expansion for ω_µ

    \omega_\mu^{t+1} = \sum_i \Phi_{\mu i} a_i^{t+1} - \frac{y_\mu - \omega_\mu^t}{\Delta_\mu + V_\mu^t} \sum_i \Phi_{\mu i}^2 v_i^{t+1} = \sum_i \Phi_{\mu i} a_i^{t+1} - \frac{y_\mu - \omega_\mu^t}{\Delta_\mu + V_\mu^t} V_\mu^{t+1},   (18)

which allows us to close the equations on the set of a, v, R, Σ, V, and ω. Iterating all relations in parallel (i.e., updating all the R, Σ's, then the a, v's, and then the ω, V's) provides the AMP iteration.

The implementation of the sequential update is not a straightforward task, as many otherwise intuitive attempts lead to non-convergent algorithms. The key observation in the derivation of SwAMP is that (18) mixes different time indices: while the "a" and "V" are the new values, the expression in the fraction is the old one, i.e., the one before the last iteration. The implication of this is that while Σ_i Φ_{µi} a_i and V_µ should be recalculated as the updates sweep over i within a single time step, the term (y_µ − ω_µ)/(∆_µ + V_µ) (which we denote g_µ later on) should not. A corresponding bookkeeping then leads to the SwAMP algorithm for the evolution of ω_µ, Σ²_i, V_µ, and R_i described in Alg. 1. At this point, the difference between AMP and SwAMP appears minimal but, as we shall see, the differences in convergence properties turn out to be spectacular.

Algorithm 1 Swept AMP
 1: procedure SwAMP(y, Φ, {∆, θ_prior, t_max, ε})
 2:   t ← 0
 3:   initialize (a^(0), v^(0)), {ω^(0;N+1), V^(0;N+1)}
 4:   while t < t_max and ‖a^(t+1) − a^(t)‖ > ε do
 5:     for µ = 1, ..., M do
 6:       g^(t)_µ ← (y_µ − ω^(t;N+1)_µ) / (∆ + V^(t;N+1)_µ)
 7:       V^(t+1;1)_µ ← Σ_i Φ²_µi v^(t)_i
 8:       ω^(t+1;1)_µ ← Σ_i Φ_µi a^(t)_i − V^(t+1;1)_µ g^(t)_µ
 9:     S ← Permute([1, 2, ..., N])
10:     for k = 1, ..., N do
11:       i ← S_k
12:       (Σ²_i)^(t+1) ← [ Σ_µ Φ²_µi / (∆ + V^(t+1;k)_µ) ]^(−1)
13:       R^(t+1)_i ← a^(t)_i + (Σ²_i)^(t+1) Σ_µ Φ_µi (y_µ − ω^(t+1;k)_µ) / (∆ + V^(t+1;k)_µ)
14:       a^(t+1)_i ← f_1(R^(t+1)_i, (Σ²_i)^(t+1); θ_prior)
15:       v^(t+1)_i ← f_2(R^(t+1)_i, (Σ²_i)^(t+1); θ_prior)
16:       for µ = 1, ..., M do
17:         V^(t+1;k+1)_µ ← V^(t+1;k)_µ + Φ²_µi (v^(t+1)_i − v^(t)_i)
18:         ω^(t+1;k+1)_µ ← ω^(t+1;k)_µ + Φ_µi (a^(t+1)_i − a^(t)_i) − g^(t)_µ (V^(t+1;k+1)_µ − V^(t+1;k)_µ)
19:     t ← t + 1
20:   return (a^(t+1), v^(t+1))

Finally, we note that this procedure can also be generalized, à la GAMP, for output channels other than the AWGN. The required change is minimal [6]: one should replace the term (y_µ − ω_µ)/(∆_µ + V_µ) in the R_i and ω_µ updates with g_out(ω_µ, V_µ), a generic function which depends on the channel. Specifically,

    g_{\rm out}(\omega, V) = \frac{\int dz\, P(y|z)\, e^{-\frac{(z-\omega)^2}{2V}} \left( \frac{z - \omega}{V} \right)}{\int dz\, P(y|z)\, e^{-\frac{(z-\omega)^2}{2V}}}.

Additionally, the 1/(∆_µ + V_µ) term in the Σ²_i update should be replaced by −∂g_out/∂ω. Notice that all AWGN-specific terms are recovered for P(y|z) ∝ e^{−(y−z)²/2∆}.

III. NUMERICAL RESULTS

Here, we present a range of numerical results demonstrating the effectiveness of the SwAMP algorithm on problems for which both standard AMP and ℓ1 minimization via convex optimization fail to provide desirable reconstruction performance. All experiments were conducted on a computer with an i7-3930K processor and run via Matlab. We have provided demonstrations of the SwAMP code on-line³. For calculating ℓ1 recoveries, we utilize an implementation of the SPGL1 [18] algorithm⁴.

A. Compressed Sensing with Troublesome Projections

As discussed earlier, using projections of non-zero mean to sample x is one of the simplest cases for which AMP can fail to converge.
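A minimal, self-contained sketch of Alg. 1 for the AWGN channel and a Bernoulli-Gaussian prior follows, written in Python rather than the Matlab/C of the paper's released demo. It follows the pseudocode directly (g_µ frozen within a sweep; ω_µ and V_µ refreshed after each coefficient) but, for brevity, runs a fixed number of sweeps and omits the stopping test on ‖a^(t+1) − a^(t)‖; all names and problem sizes are illustrative.

```python
import math
import random

def gauss(u, mean, var):
    return math.exp(-(u - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def denoise(sigma2, R, rho=0.2):
    """f1, f2 of Eqs. (11)-(12) for the prior rho * N(0, 1) + (1 - rho) * delta."""
    z_g = rho * gauss(R, 0.0, 1.0 + sigma2)
    z_s = (1.0 - rho) * gauss(R, 0.0, sigma2)
    pi = z_g / (z_g + z_s)
    m = R / (1.0 + sigma2)
    v = sigma2 / (1.0 + sigma2)
    return pi * m, pi * (v + m * m) - (pi * m) ** 2

def swamp(y, phi, delta, rho=0.2, t_max=50, rng=None):
    """Sketch of Alg. 1: sweep the coefficient updates in random order, refreshing
    omega and V after each coefficient while keeping g fixed within the sweep."""
    rng = rng or random.Random(0)
    M, N = len(phi), len(phi[0])
    a, v = [0.0] * N, [rho] * N
    V = [sum(phi[mu][i] ** 2 * v[i] for i in range(N)) for mu in range(M)]
    omega = [0.0] * M
    for _ in range(t_max):
        g = [(y[mu] - omega[mu]) / (delta + V[mu]) for mu in range(M)]  # frozen this sweep
        V = [sum(phi[mu][i] ** 2 * v[i] for i in range(N)) for mu in range(M)]
        omega = [sum(phi[mu][i] * a[i] for i in range(N)) - V[mu] * g[mu] for mu in range(M)]
        order = list(range(N))
        rng.shuffle(order)  # S <- Permute([1, ..., N])
        for i in order:
            sig2 = 1.0 / sum(phi[mu][i] ** 2 / (delta + V[mu]) for mu in range(M))
            R = a[i] + sig2 * sum(phi[mu][i] * (y[mu] - omega[mu]) / (delta + V[mu])
                                  for mu in range(M))
            a_new, v_new = denoise(sig2, R, rho)
            for mu in range(M):  # refresh omega, V with the freshly updated coefficient
                dV = phi[mu][i] ** 2 * (v_new - v[i])
                V[mu] += dV
                omega[mu] += phi[mu][i] * (a_new - a[i]) - g[mu] * dV
            a[i], v[i] = a_new, v_new
    return a

rng = random.Random(1)
N, M, rho, delta = 64, 48, 0.2, 1e-4
x = [rng.gauss(0, 1) if rng.random() < rho else 0.0 for _ in range(N)]
phi = [[rng.gauss(0, 1) / math.sqrt(N) for _ in range(N)] for _ in range(M)]
y = [sum(phi[mu][i] * x[i] for i in range(N)) + rng.gauss(0, math.sqrt(delta))
     for mu in range(M)]
a = swamp(y, phi, delta, rho)
mse = sum((ai - xi) ** 2 for ai, xi in zip(a, x)) / N
```

On a zero-mean i.i.d. instance such as this one, the sweep behaves much like standard AMP; the point of SwAMP is that the same update remains stable on the troublesome ensembles of Sec. III.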
However, by using the proposed SwAMP approach, accurate estimates of x can be obtained even when the mean of the projections is non-negligible. While it may be possible to use mean subtraction, our proposed approach does not require such preprocessing. Additionally, as we will show later, not all problems are amenable to such mean subtraction.

To evaluate the effectiveness of SwAMP as compared to the standard parallel-update AMP iteration, we draw i.i.d. projections according to

    \Phi_{\mu i} \sim \mathcal{N}\!\left( \frac{\gamma}{N}, \frac{1}{N} \right),   (19)

where the magnitude of the projector mean is controlled by the term γ. For a given signal x and noise variance ∆, as γ increases from 0, we expect to see AMP failing to converge. This behavior can be observed in the numerical experiments presented in Fig. 1a. Here, we observe that SwAMP is robust to values of γ over an order of magnitude larger than the standard AMP, converging to a low-MSE solution even for γ ≈ 140, while AMP fails already at γ = 2. Additionally, for the tested parameters, ℓ1 minimization fails to provide a meaningful reconstruction for any value of γ.

We also considered an even more troublesome case for projections, namely, a set of projections which are strongly correlated. For these tests, we draw

    \Phi = \frac{1}{N} P Q, \quad \text{where} \quad P_{\mu k}, Q_{k i} \sim \mathcal{N}(0, 1),   (20)

with P ∈ R^{M×R}, Q ∈ R^{R×N}, and R ≜ ηN. That is, Φ is low-rank for η < α, where α = M/N. In our experiments, we use η to denote the level of independence of the rows of Φ, with lower values of η representing a more difficult problem. We observe that the elements of Φ are neither normal nor i.i.d. for these experiments. In Fig. 1b, we see that SwAMP is robust to even these extremely troublesome projections, while AMP fails to converge and ℓ1 minimization does not provide the same level of accuracy as SwAMP.

³ https://github.com/eric-tramel/SwAMP-Demo
⁴ http://www.cs.ubc.ca/~mpf/spgl1/
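The two troublesome ensembles (19) and (20) are straightforward to generate; the sketch below builds both and checks the empirical mean of the first. The sizes and the values of γ and η are arbitrary choices for illustration.

```python
import math
import random

rng = random.Random(0)
N, M = 120, 60

# Non-zero-mean i.i.d. projections, Eq. (19): Phi_mu_i ~ N(gamma / N, 1 / N)
gamma = 20.0
phi_mean = [[rng.gauss(gamma / N, 1.0 / math.sqrt(N)) for _ in range(N)] for _ in range(M)]
emp_mean = sum(sum(row) for row in phi_mean) / (M * N)  # concentrates near gamma / N

# Correlated projections, Eq. (20): Phi = (1 / N) * P * Q with inner dimension R = eta * N
eta = 0.3
R = int(eta * N)  # R = 36 < M = 60, so Phi has rank at most R (low-rank regime)
P = [[rng.gauss(0, 1) for _ in range(R)] for _ in range(M)]
Q = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(R)]
phi_lowrank = [[sum(P[mu][k] * Q[k][i] for k in range(R)) / N for i in range(N)]
               for mu in range(M)]
```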
These two experiments demonstrate how the proposed SwAMP iteration allows for AMP-like performance while remaining robust to conditions outside of the TAP assumptions about the projector.

B. Group Testing

Group testing, also known as pooling in molecular biology, is an approach to designing experiments so as to reduce the number of tests required to identify rare events or faulty items. In the most naive approach to this problem, the number of tests is equal to the number of items, as each item is tested individually. However, since only a small fraction of the items may be faulty, the number of tests can be significantly reduced via pooling, i.e., testing many items simultaneously and allowing items to be included within multiple different tests. The nature of this linear combination of tests allows for a CS-type approach to faulty-item detection, but with a few important caveats. First, the operator is extremely sparse, since the number of pools, and the number of items in them, may be limited due to physical testing constraints. Second, the elements of this operator are commonly 0/1. Group testing is therefore a very challenging application for AMP, since the properties of the group-testing operator do not match AMP's assumptions.

In one recent work [19], the authors used both BP and AMP for group testing and found that while basic AMP would not converge, very good results, optimal ones, in fact, could be obtained by using a BP approach. This came at a large computational cost, however.

Fig. 1: AMP, SwAMP, and ℓ1 solvers compared for CS signal reconstruction for sensing matrices with positive mean (left, a) and of low rank (right, b) on sparse signals of size N = 10⁴ and sparsity ρ = 0.2 with noise variance ∆ = 10⁻⁸. The projections for (a) have been created following (19) using M = αN measurements with α = 0.5. The projectors for (b) have been created according to (20) and are low-rank for η < α = 0.6. Finally, a comparison between the reconstruction error obtained by SwAMP and ℓ1 minimization is given at the bottom of both (a) and (b) for the same experimental settings.

Here, we have repeated the experiment of [19] using the SwAMP approach instead of AMP and BP. In fact, for SwAMP, a sparse operator is a very advantageous situation in terms of computational efficiency. Since the projector is extremely sparse by construction, we may explicitly ignore operations involving null elements, thus considerably improving the algorithm's speed, as seen in Fig. 2b. Here, we also see that SwAMP's computational complexity is on the order of O(N²), as is AMP's.

Group-testing experiments are shown in Fig. 2a, where we use random 0/1 projections, under the constraint that each projection should sum to 7, to sample sparse 0/1 signals with K ≪ N non-zero elements, where N is the signal dimensionality. While AMP diverges when attempting to recover these signals, SwAMP converges to the correct solution in few iterations. Additionally, SwAMP very closely matches the BP transition, thus providing recovery performance better than convex optimization, just as BP does, but with much less computational complexity.

C. 1-bit Compressed Sensing

One of the confounding factors regarding the practical implementation of CS in hardware devices is the treatment of measurement quantization.
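A group-testing instance in the spirit of Sec. III-B, with 0/1 pools each summing to 7 and a sparse 0/1 signal, can be sketched as follows. The construction (sampling 7 items per test uniformly at random) is our illustrative choice; [19] studies specific non-adaptive designs.

```python
import random

def pooling_matrix(m, n, per_pool=7, rng=None):
    """Random 0/1 pooling design: each test (row) pools exactly `per_pool` of the n items."""
    rng = rng or random.Random(0)
    rows = []
    for _ in range(m):
        chosen = set(rng.sample(range(n), per_pool))
        rows.append([1 if i in chosen else 0 for i in range(n)])
    return rows

rng = random.Random(3)
n_items, m_tests, k_faulty = 100, 40, 5
phi = pooling_matrix(m_tests, n_items, per_pool=7, rng=rng)
x = [0] * n_items
for i in rng.sample(range(n_items), k_faulty):  # K << N faulty items
    x[i] = 1
# Noiseless pooled tests: each outcome counts the faulty items in that pool
y = [sum(row[i] * x[i] for i in range(n_items)) for row in phi]
```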
The original CS analysis provides recovery bounds based upon the assumption of real-valued measurements. However, in practice, hardware devices cannot capture such values with infinite precision, and so some kind of quantization of the measurements must be implemented. Specifically, if Q(·) is a uniform scalar quantizer, then y = Q(Φx, B), where B is the number of bits used to represent the measurement. If signal recoverability is significantly impacted by small B, then the dimensionality reduction provided by CS may be lost to the requirement of many bits to encode each measurement. Thankfully, recent works have shown CS recovery to be robust to quantization and the non-linear error it introduces. In fact, CS has been shown [20, 21] to be robust even in the extreme case B = 1, known as 1-bit CS. In this case, the quantized measurements are given by

    y = \mathrm{sign}(\Phi x).   (21)

The non-linearity and severity of 1-bit CS require special treatment from the CS recovery procedure.

Fig. 2: (a) Group-testing phase-transition diagram between successful and unsuccessful signal recovery over M, the number of pools, and K, the number of non-zero signal elements. Successful recovery means the correct identification of all signal elements. The top-left of the diagram represents the easiest problems, while the bottom-right the most difficult. The transition lines are drawn along the contour of 50% of recoveries succeeding over many trials. (b) Execution times for both SwAMP and AMP using a sparse matrix with 25% of its elements having non-zero value. The reported times are measured for 500 iterations of the algorithms for each value of N, for the parameters ρ = 0.25 and α = 0.75.
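The 1-bit measurement model (21) can be sketched directly; the tie-breaking convention sign(0) = +1 below is our arbitrary choice.

```python
import math
import random

def one_bit_measurements(phi, x):
    """Quantize to a single bit per measurement, Eq. (21): y = sign(Phi x)."""
    y = []
    for row in phi:
        z = sum(p * xi for p, xi in zip(row, x))
        y.append(1 if z >= 0 else -1)  # sign(0) mapped to +1 by convention
    return y

rng = random.Random(7)
N, M, rho = 32, 96, 0.25  # 1-bit CS typically uses alpha = M / N > 1
x = [rng.gauss(0, 1) if rng.random() < rho else 0.0 for _ in range(N)]
phi = [[rng.gauss(0, 1) / math.sqrt(N) for _ in range(N)] for _ in range(M)]
y = one_bit_measurements(phi, x)
```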
In [20], a renormalized fixed-point continuation (RFPC) algorithm was proposed. Later, [21] analyzed the sensitivity of 1-bit CS to sign flips and proposed a noise-robust recovery algorithm, binary iterative hard thresholding (BIHT). Recognizing the capability of GAMP to handle non-linear output channels, [22] proposed the use of GAMP for signal recovery from quantized CS measurements. Further analysis of message-passing approaches to the 1-bit CS problem from the perspective of statistical mechanics was given in [23], where a modified fixed-point iteration was derived via the cavity method, providing both improved recovery accuracy and reconstruction time as compared to the RFPC. Additionally, the authors used replica analysis to estimate the optimal MSE performance of ℓ1-minimization-based 1-bit CS reconstruction. Finally, this analysis was extended in [24] to include the theoretical Bayesian optimal performance, which we will use as a baseline of comparison in Fig. 3a.

Both methods [22] and [23] show the effectiveness of algorithms grounded in statistical mechanics for quantized CS reconstruction. However, both assume an amenable set of projections. Even projections possessing small mean can cause large degradations in performance. While mean removal is occasionally effective in the usual CS setting, it cannot be used for 1-bit CS due to the nature of the sign operation in (21). An algorithm that can handle troublesome projectors can therefore be of great use. In Sec. II-C, we showed how SwAMP can be modified to the general-channel setting, as was done in GAMP. This generalization allows for 1-bit CS recovery with SwAMP under much more relaxed requirements on Φ.

In Fig. 3a, we see Generalized SwAMP (G-SwAMP) results for Φ_{µi} ∼ N(20/N, 1/N). We observe that G-SwAMP performs admirably even for this non-negligible mean on the projectors.
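The general-channel function g_out of Sec. II-C can be evaluated for the sign channel in closed form via the Gaussian cdf, and sanity-checked against brute-force quadrature; for the AWGN channel, the quadrature should reproduce the closed form (y − ω)/(∆ + V). The quadrature grid and tolerances below are illustrative.

```python
import math

def g_out_numeric(likelihood, omega, V, lo=-30.0, hi=30.0, n=20001):
    """Generic GAMP output function by quadrature:
    g_out = (E[z] - omega) / V under P(z) propto P(y|z) * N(z; omega, V)."""
    h = (hi - lo) / (n - 1)
    num = den = 0.0
    for j in range(n):
        z = lo + j * h
        w = likelihood(z) * math.exp(-(z - omega) ** 2 / (2 * V))
        num += w * (z - omega)
        den += w
    return num / (V * den)

def g_out_sign(y, omega, V):
    """Closed form for the noiseless sign channel P(y|z) = 1{sign(z) = y}, y in {-1, +1}."""
    u = y * omega / math.sqrt(V)
    pdf = math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(u / math.sqrt(2)))
    return y * pdf / (math.sqrt(V) * cdf)

omega, V = 0.2, 0.4

# AWGN channel: quadrature vs. the closed form (y - omega) / (delta + V)
y_obs, delta = 0.7, 0.1
g_num = g_out_numeric(lambda z: math.exp(-(y_obs - z) ** 2 / (2 * delta)), omega, V)
g_closed = (y_obs - omega) / (delta + V)

# Sign channel, y = +1: quadrature (coarse near the discontinuity) vs. closed form
g1 = g_out_numeric(lambda z: 1.0 if z > 0 else 0.0, omega, V)
g2 = g_out_sign(+1, omega, V)
```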
In terms of recovery performance, it does not quite meet the theoretical Bayes optimal performance [24]; however, this is expected, as the Bayes optimal performance is calculated for γ = 0. Additionally, we see that even for this non-zero mean, G-SwAMP outperforms both BIHT's empirical performance for the same mean and the best-case theoretical ℓ1 performance for zero mean [23]. Finally, in Fig. 3b, we see that GAMP fails to provide any meaningful signal recovery even for γ small, while G-SwAMP continues to converge to low MSE even for large values of γ.

Fig. 3: Results for 1-bit CS. (a) Top: comparison between the Bayes optimal MSE for zero-mean projectors [24] (dashed lines) and that obtained by SwAMP for projectors with γ = 20 (markers) for three different levels of signal sparsity. The reported empirical results were obtained by averaging over 200 instances of size N = 512. Bottom: comparison of SwAMP and BIHT for ρ = 1/8 for experimental conditions identical to the figure above; theoretical results for zero-mean projectors are also presented for completeness, including theoretical ℓ1 performance [23]. (b) Single-instance comparison between GAMP and G-SwAMP for 1-bit CS with N = 2048, ρ = 1/8, and α = 3.

IV. CONCLUSION

While the AMP algorithm has been shown to be a very desirable approach for signal recovery and statistical inference problems in terms of both computational efficiency and accuracy, it is also very sensitive to problems which deviate from its fundamental assumptions.
In this work, we propose the SwAMP algorithm, which matches AMP's accuracy while remaining robust to such variations, all without unduly increasing computation or memory requirements. We also demonstrate how SwAMP can be used to solve practical problems to which AMP and GAMP cannot be applied, namely, group testing and 1-bit CS with troublesome projections. In all cases, SwAMP provides accuracy superior to ℓ1 minimization, convergence properties superior to AMP and GAMP, and a lower computational and memory burden than BP or r-BP. Exact analysis of the asymptotic state evolution of SwAMP, as well as a thorough analytical proof of its convergence, remains a challenging open problem for future work.

V. ACKNOWLEDGMENTS

This work has been supported in part by the ERC under the European Union's 7th Framework Programme Grant Agreement 307087-SPARCS, by the Grant DySpaN of Triangle de la Physique, and by FAPESP under grant 13/01213-8.

REFERENCES

[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[2] M. Mézard and A. Montanari, Information, Physics, and Computation. OUP, 2009.
[3] M. Opper and D. Saad, Advanced Mean Field Methods: Theory and Practice. MIT Press, 2001, NIPS workshop series.
[4] E. B. Sudderth, A. T. Ihler, M. Isard, W. T. Freeman, and A. S. Willsky, "Nonparametric belief propagation," Communications of the ACM, vol. 53, no. 10, p. 95, 2010.
[5] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. National Academy of Sciences of the United States of America, vol. 106, no. 45, p. 18914, 2009.
[6] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in Information Theory Proceedings (ISIT), IEEE International Symposium on, 2011, p. 2168.
[7] E. J. Candès and J. Romberg, "Signal recovery from random projections," in Computational Imaging III. San Jose, CA: Proc. SPIE 5674, 2005, pp. 76–86.
[8] F. Krzakala, M. Mézard, F. Sausset, Y. F. Sun, and L. Zdeborová, "Statistical-physics-based reconstruction in compressed sensing," Physical Review X, vol. 2, no. 2, p. 021005, 2012.
[9] ——, "Probabilistic reconstruction in compressed sensing: Algorithms, phase diagrams, and threshold achieving matrices," J. Stat. Mech.: Th. and Exp., no. 8, p. P08009, 2012.
[10] D. L. Donoho, A. Javanmard, and A. Montanari, "Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing," in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on. IEEE, 2012, p. 1231.
[11] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Transactions on Information Theory, vol. 57, no. 2, p. 764, 2011.
[12] F. Caltagirone, F. Krzakala, and L. Zdeborová, "On convergence of approximate message passing," in Information Theory Proceedings (ISIT), 2014 IEEE International Symposium on, 2014.
[13] J. P. Vila and P. Schniter, "Expectation-maximization Gaussian-mixture approximate message passing," in Proc. 46th Annual Conference on Information Sciences and Systems, 2012, p. 1.
[14] S. Rangan, P. Schniter, and A. K. Fletcher, "On the convergence of approximate message passing with arbitrary matrices," arXiv preprint 1402.3210, 2014.
[15] B. Çakmak, O. Winther, and B. H. Fleury, "S-AMP: Approximate message passing for general matrix ensembles," arXiv preprint 1405.2767, 2014.
[16] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, 2008.
[17] D. J. Thouless, P. W. Anderson, and R. G. Palmer, "Solution of 'solvable model of a spin glass'," Philosophical Magazine, vol. 35, no. 3, p. 593, 1977.
[18] E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.
[19] P. Zhang, F. Krzakala, M. Mézard, and L. Zdeborová, "Non-adaptive pooling strategies for detection of rare faulty items," in Communications Workshops, Proc. IEEE International Conference on, Budapest, Hungary, 2013, p. 1409.
[20] P. T. Boufounos and R. G. Baraniuk, "1-bit compressive sensing," in Proceedings of the 42nd Annual Conference on Information Sciences and Systems, Princeton, NJ, 2008, pp. 16–21.
[21] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk, "Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors," arXiv preprint 1104.3160v3, 2012.
[22] U. S. Kamilov, V. K. Goyal, and S. Rangan, "Message-passing de-quantization with applications to compressed sensing," IEEE Transactions on Signal Processing, vol. 60, no. 12, p. 6270, 2012.
[23] Y. Xu and Y. Kabashima, "Statistical mechanics approach to 1-bit compressed sensing," Journal of Statistical Mechanics: Theory and Experiment, no. 2, p. P02041, 2013.
[24] Y. Xu, Y. Kabashima, and L. Zdeborová, "Bayesian signal reconstruction for 1-bit compressed sensing," arXiv preprint 1406.3782, 2014.
