The Physics of Compressive Sensing and the Gradient-Based Recovery Algorithms
Authors: Qi Dai, Wei Sha
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China. Email: daiqi@hku.hk (Qi Dai); wsha@eee.hku.hk (Wei Sha)

Research Report, compiled October 22, 2018

Abstract: The physics of compressive sensing (CS) and the gradient-based recovery algorithms are presented. First, the different forms for CS are summarized. Second, the physical meanings of coherence and measurement are given. Third, the gradient-based recovery algorithms and their geometric explanations are provided. Finally, we conclude the report and give some suggestions for future work.

Keywords: Compressive Sensing; Coherence; Measurement; Gradient-Based Recovery Algorithms.

© 2018 Optical Society of America

1. Introduction

The well-known Nyquist/Shannon sampling theorem, which states that the sampling rate must be at least twice the maximum frequency of the signal, is a golden rule used in visual and audio electronics, medical imaging devices, radio receivers, and so on. However, can we recover a signal from only a small number of linear measurements? Yes, we can, as answered firmly by Emmanuel J. Candès, Justin Romberg, and Terence Tao [1][2][3]. Several years ago they brought us the tool called compressive sensing (CS) [4][5][6], which avoids large digital data sets and builds data compression directly into the acquisition. The mathematical theory underlying CS is deep and beautiful and draws from diverse fields, but we do not focus on the mathematical proofs here. Instead, we give some physical explanations and discuss the relevant recovery algorithms.

2. Exact Recovery of Sparse Signals

Given a time-domain signal f ∈ R^{N×1}, there are four different forms for CS. (a) If f is sparse in the time domain and the measurements are also acquired in the time domain, the optimization problem is

\min \| f \|_1 \quad \text{s.t.} \quad M_0 f = y, \tag{1}

where M_0 ∈ R^{M×N} is the observation matrix and y ∈ R^{M×1} are the measurements. (b) If f is sparse in the time domain and the measurements are acquired in a transform domain (Fourier transform, discrete cosine transform, wavelet transform, X-let transform, etc.), the optimization problem is

\min \| \Psi^{\dagger} \tilde{f} \|_1 \quad \text{s.t.} \quad M_0 \tilde{f} = \tilde{y}, \tag{2}

where Ψ† is the inverse transform matrix and satisfies Ψ†Ψ = ΨΨ† = I. (c) If f is sparse in the transform domain and the measurements are acquired in the time domain, the optimization problem is

\min \| \Psi f \|_1 \quad \text{s.t.} \quad M_0 f = y. \tag{3}

(d) If f is sparse in the transform domain and the measurements are also acquired in the transform domain, the optimization problem is

\min \| \tilde{f} \|_1 \quad \text{s.t.} \quad M_0 \tilde{f} = \tilde{y}. \tag{4}
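To make form (1) concrete, here is a minimal Python sketch (our own illustration, not the authors' code; the report's experiments use the gradient-based algorithms of Section 4). It recasts the ℓ_1 problem as a linear program by splitting f = u − v with u, v ≥ 0; the sizes N, M, K and the Gaussian observation matrix are illustrative assumptions.

```python
# Basis-pursuit recovery of a time-domain-sparse signal, problem (1),
# solved as a linear program: min 1^T(u+v)  s.t.  M0(u - v) = y,  u, v >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, K = 128, 40, 5                                 # signal length, measurements, sparsity

f_true = np.zeros(N)
f_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

M0 = rng.standard_normal((M, N)) / np.sqrt(M)        # random observation matrix
y = M0 @ f_true                                      # measurements

c = np.ones(2 * N)                                   # objective: sum of u and v
A_eq = np.hstack([M0, -M0])                          # equality constraint M0(u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
f_rec = res.x[:N] - res.x[N:]

print("recovery error:", np.linalg.norm(f_rec - f_true))
```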
From the above equations, the meaning of sparsity can be generalized. If the number of non-zero elements is very small compared with the length of the time-domain signal, the signal is sparse in the time domain. If the most important K components in the transform domain represent the signal accurately, we say the signal is sparse in the transform domain: the other, unimportant components can be set to zero and the inverse transform applied, and the time-domain signal is then reconstructed with very small numerical error. The sparsity property is also what makes lossy data compression possible. In image processing, the derivatives of an image (especially a geometric image) along the horizontal and vertical directions are sparse. In physics, we can say a wave function is sparse under a specific basis representation. Before entering the CS world, one must know what is sparse in what domain.

The second question is how large the set of measurements y must be in order to perfectly recover a K-sparse signal. Usually, M ≳ K log₂(N), or M ≈ 4K for a general signal or image. Further, if the signal f is sparse in the transform domain Ψ and the measurements are acquired in the time domain, then M ≳ χ(Ψ, M_0) K log₂(N), where χ(Ψ, M_0) is the coherence index between the basis system Ψ and the measurement system M_0 [3]. Incoherence leads to a small χ and therefore fewer measurements are required. The coherence index χ is easy to identify if we rewrite (3) as

\min \| \tilde{f} \|_1 \quad \text{s.t.} \quad M_0 \Psi^{\dagger} \tilde{f} = y. \tag{5}

Similarly, (2) can be rewritten as

\min \| f \|_1 \quad \text{s.t.} \quad M_0 \Psi f = \tilde{y}. \tag{6}

Third, what are the inherent properties of the observation matrix? The observation matrix obeys what is known as a uniform uncertainty principle (UUP):

C_1 \frac{M}{N} \le \frac{\| M_0 f \|_2^2}{\| f \|_2^2} \le C_2 \frac{M}{N}, \tag{7}

where C_1 ≲ 1 ≲ C_2. An alternative condition, called the restricted isometry property (RIP), can be given by

1 - \delta_K \le \frac{\| M_0 f \|_2^2}{\| f \|_2^2} \le 1 + \delta_K, \tag{8}

where δ_K is a constant that is not too close to 1. These properties imply three facts. (a) The measurements y maintain the energy of the original time-domain signal f; in other words, the measurement process is stable. (b) If f is sparse, then M_0 must be dense; this is why the theorem is called a uniform uncertainty principle. (c) To perfectly recover f from the measurements y, at least 2K measurements are required. According to the UUP and RIP theorems, it is convenient to set the observation matrix M_0 to a random matrix (normal, uniform, or Bernoulli distribution).
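The energy-preservation statement in (7) and (8) can be checked numerically. The sketch below is our own illustration (sizes and scaling chosen for convenience): for a Gaussian observation matrix with entries scaled by 1/√M, the energy ratio ‖M_0 f‖₂²/‖f‖₂² concentrates near 1 for sparse f, as in (8).

```python
# Empirical check of the RIP-style energy ratio for random sparse vectors.
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 1024, 128, 10                              # illustrative sizes

M0 = rng.standard_normal((M, N)) / np.sqrt(M)        # Gaussian observation matrix

ratios = []
for _ in range(2000):
    f = np.zeros(N)
    f[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
    ratios.append(np.sum((M0 @ f) ** 2) / np.sum(f ** 2))

ratios = np.array(ratios)
# All three numbers typically stay close to 1, i.e. within [1 - delta_K, 1 + delta_K].
print("energy ratio min/mean/max:", ratios.min(), ratios.mean(), ratios.max())
```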
Fourth, why is the ℓ_1 norm used in (1)-(4)? In real applications the number of measurements M ≪ N, so one faces an underdetermined matrix equation: a huge number of different candidate signals could all produce the given measurements, and some additional constraint is needed to select the best candidate. The classical solution to such problems is to minimize the ℓ_2 norm (the pseudo-inverse solution), which minimizes the amount of energy in the system. However, this leads to poor results for most practical applications, because the recovered signal seldom has zero components. A more attractive choice is to minimize the ℓ_0 norm, or equivalently to maximize the number of zero components in the basis system. However, this problem is NP-hard (it contains the subset-sum problem) and so is computationally infeasible for all but the tiniest data sets. Thus the ℓ_1 norm, the sum of the absolute values, is usually what is minimized. Finding the candidate with the smallest ℓ_1 norm can be expressed relatively easily as a linear convex optimization program, for which efficient solution methods already exist, and it gives results comparable to the ℓ_0 norm, often with many components being zero. For simplicity, take the 2-D case as an example. The boundaries of the ℓ_0, ℓ_1, and ℓ_2 norm balls are a cross, a diamond, and a circle, respectively (see Fig. 1), and the underdetermined matrix equation can be seen as a straight line. If the intersection between the straight line and the boundary is located on the x-axis or y-axis, the recovered result will be sparse. Obviously, the intersection will always be located on the axes if p ≤ 1 for the ℓ_p norm.

Fig. 1. The geometries of the ℓ_p norm ball: (a) p = 2; (b) p = 1; (c) p = 0.5; (d) p = 0.

Fifth, can CS perform well in a noisy environment? Yes, it can, because the recovery algorithm extracts the most important K components and forces the other components (including the noise components) to zero. For image processing, the recovery algorithm will not smooth the image but will preserve sharp edges.

Finally, let us review how CS encodes and decodes the time-domain signal f. The encoder acquires the measurements y or ỹ according to the observation matrix M_0. The decoder recovers f or f̃ by solving the convex optimization problem (1)-(4) with y (or ỹ), M_0, and Ψ (for (1) and (4), Ψ is not necessary). Hence CS can be seen as a fast encoder with a lower sampling rate (a smaller data set). The sampling rate depends only on the sparsity of f in some domain and goes beyond the limit of the Nyquist/Shannon sampling theorem. However, CS faces a challenging problem: how can the signal be recovered perfectly with low computational complexity and memory?
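The claim that the ℓ_2 (pseudo-inverse) solution seldom has zero components is easy to verify numerically. The tiny sketch below is illustrative only (the sizes are arbitrary); it shows the minimum-energy solution of an underdetermined system spreading its energy over essentially all components, even when an exactly sparse solution exists.

```python
# Why the l2 solution is a poor candidate: the pseudo-inverse is almost never sparse.
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 64, 20, 3

f_true = np.zeros(N)
f_true[rng.choice(N, K, replace=False)] = 1.0        # an exactly K-sparse signal

M0 = rng.standard_normal((M, N))
y = M0 @ f_true

f_l2 = np.linalg.pinv(M0) @ y                        # minimum-energy solution of M0 f = y
print("nonzero components, true signal:", int(np.sum(np.abs(f_true) > 1e-8)))
print("nonzero components, l2 solution:", int(np.sum(np.abs(f_l2) > 1e-8)))
```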
3. Physics of Compressive Sensing

The most important concepts of the CS theory involve coherence and measurement.

In physics, coherence is a property of waves that enables stationary (i.e., temporally and spatially constant) interference. More generally, coherence describes all correlation properties between physical quantities of a wave. When interfering, two waves can add together to create a larger wave (constructive interference) or subtract from each other to create a smaller wave (destructive interference), depending on their relative phase. The coherence of two waves follows from how well correlated the waves are, as quantified by the cross-correlation function. The cross-correlation quantifies the ability to predict the value of the second wave by knowing the value of the first. As an example, consider two waves perfectly correlated at all times: whenever the first wave changes, the second changes in the same way, and if combined they exhibit complete constructive interference at all times, so they are perfectly coherent. The second wave need not be a separate entity; it could be the first wave at a different time or position. In this case, sometimes called self-coherence, the measure of correlation is the autocorrelation function. Take Thomas Young's double-slit experiment as an example: a coherent light source illuminates a thin plate with two parallel slits cut in it, and the light passing through the slits strikes a screen behind them. The wave nature of light causes the light waves passing through both slits to interfere, creating an interference pattern of bright and dark bands on the screen. In fact, the dark bands can be related to the zero components in the signal-processing field. It is well known that we select basis functions coherent with the signal or image: if the signal is a square wave, the Haar wavelet is a good choice; if the signal is a sine wave, the Fourier transform is a good choice. The coherence index between a signal and a basis system decides the sparsity of the signal in the transform domain. In other words, fewer basis functions are used, or more components in the transform domain are zero, when the signal and the basis functions are coherent. For CS, however, the observation matrix M_0 and the time-domain signal f should be incoherent. In addition, the observation matrix M_0 and the basis system Ψ should also be incoherent. If that is not the case, the reconstruction matrix M_0 Ψ† in (5) will be sparse, which violates the UUP theorem.

Next, let us talk about quantum mechanics and measurement. In physics, a wave function (or wavefunction) is a mathematical tool used in quantum mechanics to describe any physical system. The values of the wave function are probability amplitudes (complex numbers). The squares of the absolute values of the wave function, |f|², give the probability distribution (the chance of finding the subject at a certain time and position) over the possible quantum states of the system. The modern usage of the term wave function refers to a complex vector or function, i.e., an element of a complex Hilbert space. An element of a vector space can be expressed in different bases, and the same applies to wave functions. The components of a wave function describing the same physical state take different complex values depending on the basis being used; the wave function itself, however, does not depend on the chosen basis. Similarly, in the signal-processing field, we use different basis functions to represent the signal or image.

The quantum state of a system is a mathematical object that fully describes the quantum system. Once the quantum state has been prepared, some aspect of it is measured (for example, its position or energy). It is a postulate of quantum mechanics that every measurement has an associated operator, called an observable operator (for a discrete system, an operator can be seen as a matrix). The expected result of the measurement is in general described not by a single number but by a probability distribution that specifies the likelihoods of the various possible outcomes. The measurement process is often said to be random and indeterministic. Suppose we take a measurement corresponding to an observable operator Ô on a system whose quantum state is f. The mean value (expectation value) of the measurement is ⟨f, Ôf⟩ and the variance of the measurement is ⟨f, Ô²f⟩ − (⟨f, Ôf⟩)². Each descriptor (mean, variance, etc.) of the measurement carries a part of the information of the quantum state. In the signal-processing field, the observation matrix M_0 is a random matrix and each measurement y_i captures a portion of the information of the signal f. Due to the UUP and RIP theorems, all the measurements make the same contribution to recovering f; in other words, each measurement y_i is equally important (or unimportant), whereas for a traditional compression method, perfect reconstruction is impossible if some important components are lost. This unique property makes CS very powerful in the communication field (channel coding).
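The coherence index χ of Section 2 can also be made numerically concrete. The sketch below uses one common definition of mutual coherence between a measurement system and a basis (the largest inner product between a normalized measurement row and a normalized basis column, scaled by √N; see, e.g., [5]); the exact normalization intended in this report may differ, so treat the code as illustrative. It shows that random measurement rows are incoherent with the DCT basis, while rows taken from the DCT itself are maximally coherent with it; in the latter case M_0 Ψ† reduces to a sparse (partial identity) matrix, exactly the situation that violates the UUP.

```python
# Mutual coherence between a measurement system M0 and a basis Psi.
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(3)
N, M = 256, 64

def coherence(M0, Psi):
    """sqrt(N) * max |<m_i, psi_j>| with rows of M0 and columns of Psi normalized."""
    R = M0 / np.linalg.norm(M0, axis=1, keepdims=True)
    B = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    return np.sqrt(N) * np.max(np.abs(R @ B))

Psi = dct(np.eye(N), axis=0, norm="ortho")           # orthonormal DCT basis matrix
M0_random = rng.standard_normal((M, N))              # random measurement rows
M0_dct = Psi.T[:M]                                   # measuring directly in the sparsity basis

print("random rows vs DCT basis:", coherence(M0_random, Psi))   # small: incoherent
print("DCT rows    vs DCT basis:", coherence(M0_dct, Psi))      # sqrt(N) = 16: fully coherent
```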
4. Gradient-Based Recovery Algorithms

A fast, low-cost, and reliable recovery algorithm is the core of the CS theory. There is a lot of outstanding work on this topic [7][8][9][10]. Based on that work, we developed the gradient-based recovery algorithms. In particular, we do not reshape the image (matrix) into a signal (vector), which would consume a large amount of memory; we treat each column of the image as a vector, and comparable results can still be obtained. For an image that is sparse in the time domain, the ℓ_1 norm constraint is used. For a general image (especially a geometric image), the total variation constraint is used. Considering the non-differentiability of the function |f_{j,k}| at the origin, the subgradient or smooth approximation strategies [10] are employed.

A. Gradient Algorithms and Their Geometries

Before solving the constrained convex optimization problems, a clear and deep understanding of the gradient-based algorithms is necessary. Given a linear matrix equation

M_0 f = y, \tag{9}

the solution f can be found by solving the minimization problem

\min_{f} L(f) \equiv \frac{1}{2} \| M_0 f - y \|_2^2. \tag{10}

The gradient-based algorithms for solving (10) can be written as

f^{i+1} = f^{i} - \mu_i \nabla L(f^{i}), \tag{11}

where μ_i is the iteration step size and

\nabla L(f^{i}) = M_0^{\dagger} \left( M_0 f^{i} - y \right). \tag{12}

Different settings for μ_i yield different algorithms, including the gradient method, the steepest descent method, and Newton's method. The gradient method sets μ_i to a small constant. The steepest descent method sets

\mu_i = \frac{\left\langle \nabla L(f^{i}),\, \nabla L(f^{i}) \right\rangle}{\left\langle \nabla L(f^{i}),\, M_0^{\dagger} M_0 \nabla L(f^{i}) \right\rangle + \varepsilon}, \tag{13}

which minimizes the residual R^{i} = M_0 f^{i} - y at each iteration; the small ε avoids a zero denominator. For Newton's method, μ_i is taken as the constant matrix

\mu_i = \left( M_0^{\dagger} M_0 + \varepsilon I \right)^{-1}, \tag{14}

where the small ε avoids a nearly singular matrix.

To understand the geometries of the three gradient-based algorithms, consider a simple case. The 2-D function f(x, y) = (x + y)² + (x + 1)² + (y + 3)² has a local minimum at f* = (1/3, −5/3). The contours of the function and the trajectories of the iterates f^{i} are drawn in Fig. 2.

Fig. 2. The geometries of the gradient-based algorithms (trajectories of the gradient method, the steepest descent method, and Newton's method).

The convergence of the gradient method is the worst. The steepest descent method converges fast during the first several steps but slows down as the iterations proceed. Newton's method is the best and needs only one step for the 2-D case (for CS, however, the performance of Newton's method degrades because of the nearly singular observation matrix and the ℓ_1 norm constraint). Next, we apply the steepest descent method and Newton's method to recover the signal or image f using (1); the treatments of the other convex optimization problems (2)-(4) are similar.
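The three step-size rules can be tried directly on the 2-D example of Fig. 2. The sketch below is our own illustration (the starting point and iteration counts are arbitrary); note that for the quadratic L(f) of (10), the matrix M_0†M_0 appearing in (13) and (14) plays the role of the Hessian H used here.

```python
# Gradient method, steepest descent, and Newton's method on
# f(x, y) = (x + y)^2 + (x + 1)^2 + (y + 3)^2, whose minimum is (1/3, -5/3).
import numpy as np

H = np.array([[4.0, 2.0],
              [2.0, 4.0]])                                       # Hessian of the quadratic

def grad(v):
    x, y = v
    return np.array([2 * (x + y) + 2 * (x + 1),
                     2 * (x + y) + 2 * (y + 3)])

def descend(update, v0=(8.0, 1.0), iters=60):
    v = np.array(v0, dtype=float)
    for _ in range(iters):
        v = v - update(grad(v))
    return v

v_gradient = descend(lambda g: 0.05 * g)                         # fixed small step
v_steepest = descend(lambda g: (g @ g) / (g @ H @ g) * g)        # step size as in (13)
v_newton   = descend(lambda g: np.linalg.solve(H, g), iters=1)   # step matrix as in (14)

# All three approach (1/3, -5/3); Newton's method needs only one step.
print(v_gradient, v_steepest, v_newton)
```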
B. ℓ_1 Norm Strategy

We assume f_{j,k} is the pixel of an N × N image f at the j-th row and k-th column (1 ≤ j ≤ N and 1 ≤ k ≤ N). The convex optimization problem (1) for a sparse image can be converted to

\min_{f} H(f) \equiv L(f) + \lambda \| f \|_1, \tag{15}

where \| f \|_1 = \sum_{j,k} | f_{j,k} |. The above equation is a Lagrange multiplier formulation: the first term relates to the underdetermined matrix equation (9), and the second, ℓ_1-penalty term assures a regularized sparse solution. The parameter λ balances the weights of the first and second terms. Because |f_{j,k}| is not differentiable at the origin, we define a new subgradient for each f_{j,k} as follows:

\nabla_{j,k} H(f) =
\begin{cases}
\nabla_{j,k} L(f) + \lambda\,\mathrm{sign}(f_{j,k}), & |f_{j,k}| \ge \varepsilon \\
\nabla_{j,k} L(f) + \lambda, & |f_{j,k}| < \varepsilon,\ \nabla_{j,k} L(f) < -\lambda \\
\nabla_{j,k} L(f) - \lambda, & |f_{j,k}| < \varepsilon,\ \nabla_{j,k} L(f) > \lambda \\
0, & |f_{j,k}| < \varepsilon,\ |\nabla_{j,k} L(f)| \le \lambda
\end{cases} \tag{16}

Then the gradient-based algorithm can be written as

f^{i+1}_{j,k} = f^{i}_{j,k} - \mu^{i}_{k} \nabla_{j,k} H(f^{i}), \tag{17}

where j and k are, respectively, the row and column indices of the image f and i denotes the i-th iteration step. The step size μ^{i}_{k} has been given in (13) and (14); bear in mind that the image is treated column by column when computing μ^{i}_{k} and ∇_{j,k} L(f^{i}). For the steepest descent method, the parameter λ can be taken as a small positive constant (λ = 0.001 to 0.01). For Newton's method, however, λ must be gradually decreased as the iterations proceed (λ^{i+1} = (0.99 to 0.999) × λ^{i}).

C. Total Variation Strategy

A general image, especially a geometric image, is not sparse in the time domain, so the ℓ_1 norm strategy developed in the previous subsection breaks down. The convex optimization problem for a general image can be given by

\min_{f} H(f) \equiv L(f) + \lambda\,\mathrm{TV}(f), \tag{18}

where TV(f) is the total variation of the image f. The derivatives of f along the vertical and horizontal directions are defined as

D^{v}_{j,k} f = \begin{cases} f_{j,k} - f_{j+1,k}, & 1 \le j < N \\ 0, & j = N \end{cases} \tag{19}

D^{h}_{j,k} f = \begin{cases} f_{j,k} - f_{j,k+1}, & 1 \le k < N \\ 0, & k = N \end{cases} \tag{20}

The total variation of the image f is the sum of the magnitudes of the gradient at each pixel [11]:

\mathrm{TV}(f) = \sum_{j,k} \sqrt{ \left( D^{v}_{j,k} f \right)^2 + \left( D^{h}_{j,k} f \right)^2 } = \sum_{j,k} \left| \nabla_{j,k} f \right|. \tag{21}

After some simple derivations, the gradient of the total variation with respect to each pixel is

\nabla_{j,k}\left( \mathrm{TV}(f) \right) = \frac{ D^{v}_{j,k} f }{ \left| \nabla_{j,k} f \right| } + \frac{ D^{h}_{j,k} f }{ \left| \nabla_{j,k} f \right| } - \frac{ D^{v}_{j-1,k} f }{ \left| \nabla_{j-1,k} f \right| } - \frac{ D^{h}_{j,k-1} f }{ \left| \nabla_{j,k-1} f \right| }. \tag{22}

When treating (22), the smooth approximation strategy is used to avoid a zero denominator, i.e.,

\left| \nabla_{j,k} f \right| = \sqrt{ \left( D^{v}_{j,k} f \right)^2 + \left( D^{h}_{j,k} f \right)^2 + \varepsilon }. \tag{23}

The gradient-based algorithm has been given in (17).
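Our reading of (16)-(23) is sketched below (the report gives no code, so the variable names and the simple column-by-column steepest-descent driver are our assumptions, not the released implementation). l1_subgradient implements the four cases of (16), tv_gradient implements (19)-(23), and recover applies update (17) column by column with the step size of (13).

```python
import numpy as np

def l1_subgradient(grad_L, f, lam, eps=1e-8):
    """Subgradient of H(f) = L(f) + lam*||f||_1, elementwise, following (16)."""
    g = grad_L + lam * np.sign(f)                            # case |f_{j,k}| >= eps
    small = np.abs(f) < eps                                  # remaining three cases of (16)
    g[small] = np.sign(grad_L[small]) * np.maximum(np.abs(grad_L[small]) - lam, 0.0)
    return g

def tv_gradient(f, eps=1e-8):
    """Gradient of the smoothed total variation, following (19)-(23)."""
    Dv = np.zeros_like(f); Dv[:-1, :] = f[:-1, :] - f[1:, :]     # vertical differences (19)
    Dh = np.zeros_like(f); Dh[:, :-1] = f[:, :-1] - f[:, 1:]     # horizontal differences (20)
    mag = np.sqrt(Dv ** 2 + Dh ** 2 + eps)                       # smoothed |grad f|, (23)
    g = (Dv + Dh) / mag                                          # first two terms of (22)
    g[1:, :] -= (Dv / mag)[:-1, :]                               # - D^v_{j-1,k} / |grad_{j-1,k} f|
    g[:, 1:] -= (Dh / mag)[:, :-1]                               # - D^h_{j,k-1} / |grad_{j,k-1} f|
    return g

def recover(M0, Y, lam=0.01, iters=5000, use_tv=False, eps=1e-12):
    """Column-by-column steepest-descent recovery of an image F from Y = M0 @ F, cf. (17)."""
    F = np.zeros((M0.shape[1], Y.shape[1]))
    for _ in range(iters):
        grad_L = M0.T @ (M0 @ F - Y)                             # gradient of L, cf. (12)
        G = grad_L + lam * tv_gradient(F) if use_tv else l1_subgradient(grad_L, F, lam)
        num = np.sum(grad_L * grad_L, axis=0)                    # step size (13), one per column
        den = np.sum((M0 @ grad_L) ** 2, axis=0) + eps
        F = F - (num / den) * G
    return F
```

For example, with a 20 × 64 observation matrix M0 and measurements Y = M0 @ F of a 64 × 64 geometric image F, recover(M0, Y, use_tv=True) would mimic the total variation experiments of Section 5.B, under the assumptions above.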
5. Numerical Experiments and Results

A. Cases of the ℓ_1 Norm Strategy

The first image we would like to recover is the 64 × 64 sparse diamond shown in Fig. 3.

Fig. 3. The 64 × 64 sparse diamond.

Since the image itself is sparse in the time domain, we do not need to transform it into another domain, such as the wavelet domain or the Fourier domain. For a nearly perfect reconstruction, the observation matrix should be at least larger than 12 × 64, where 12 comes from 2·log₂ 64 = 12. If the observation matrices are generated from the uniform distribution on [0, 1], then after 20000 iteration steps Fig. 4 is obtained. Subplot (a) shows the image recovered with a 10 × 64 observation matrix, while (b), (c), and (d) are recovered with 12 × 64, 15 × 64, and 20 × 64 random observation matrices, respectively. When the observation matrix is small, the reconstructions are poor, as shown in (a) and (b); as the observation matrix gets larger, better results are obtained, as shown in (c) and (d).

Fig. 4. The diamond figures recovered by Newton's method from observation matrices of different sizes: (a) 10 × 64; (b) 12 × 64; (c) 15 × 64; (d) 20 × 64.

Another 64 × 64 image we wish to recover is the circle shown in Fig. 5. Although the circle is sparse on the whole, some of its columns are not, and these columns may require more measurements when we treat an n × n image as n vectors.

Fig. 5. The 64 × 64 sparse circle.

Using the steepest descent method, we can recover the image after several iterations. We do not compare Newton's method with the steepest descent method here to decide which is more powerful and effective (actually, their performances are almost the same for this image). What we want to show is that the sparsity of an image strongly affects the recovered results. Fig. 6 shows the results recovered from observation matrices slightly larger than those in the previous case. For subplots (a) and (b), the results are undesirable; subplot (c) is better, and an almost perfect reconstruction is obtained in subplot (d). Again, we notice that for a small observation matrix the recovered results may vary drastically, whereas if the observation matrix is large enough, the reconstructed image is accurate or even exact in any repeated experiment. Subplots (a), (b), and (c) in Fig. 6 show that the columns that are less sparse are the hardest to recover when a small observation matrix is used; for such cases, the total variation strategy may be a better choice. In a word, a large observation matrix captures more information about the image, and the image can therefore be recovered with higher probability.

Fig. 6. The circle figures recovered by the steepest descent method from observation matrices of different sizes: (a) 15 × 64; (b) 20 × 64; (c) 25 × 64; (d) 30 × 64.

B. Cases of the Total Variation Strategy

Even if an image is not sparse in the time domain, it can still be recovered from the measurements without being represented in basis functions Ψ whose coefficients may be sparse. For instance, consider the geometric figure composed of a solid circle and a solid square shown in Fig. 7. It is easy to imagine that the derivatives of this image are sparse, so we apply the total variation strategy to recover it.

Fig. 7. The geometric figure.

The size of the object image is again 64 × 64, and a 20 × 64 observation matrix is employed. Fig. 8 shows the typical measurements y, and Fig. 9 shows the image reconstructed from y.

Fig. 8. The measurements of the geometric figure.

Fig. 9. The geometric image recovered by the steepest descent method; a 20 × 64 observation matrix is employed.
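The peak signal-to-noise ratio (PSNR) is the quality measure quoted below. The report does not state its exact convention; a minimal sketch, assuming pixel values scaled so that the peak value is 1, might look like this.

```python
import numpy as np

def psnr(original, recovered, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((np.asarray(original) - np.asarray(recovered)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```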
The PSNR is always above 90 for repeated experiments (different observation matrices of the same size), which suggests that the observation matrix is large enough to recover the original image with extreme accuracy.

The next two images, Cameraman and Boats in Fig. 10, are quite well known in the image-processing field. We still treat each 256 × 256 image as 256 vectors. Although the results may not be as good as those obtained by treating the 256 × 256 image as a long vector of size 65536 × 1, we save a great amount of memory and computation time: the size of our observation matrix M_0 is 100 × 256 rather than 25600 × 65536 (25600 ≈ 65536/2.56). The recovery algorithm is implemented in the wavelet domain, and the PSNRs for subplots (a) and (b) in Fig. 11 are 29.4 and 30.9, respectively. In addition, the gradient-based total variation algorithm also performs well on the geometric image Peppers (showing too many results is not necessary).

Fig. 10. The general images: (a) Cameraman; (b) Boats.

Fig. 11. The images recovered by Newton's method; a 100 × 256 observation matrix is employed. (a) Cameraman; (b) Boats.

6. Discussions and Future Work

The above are just some simple experiments demonstrating that CS is able to recover an image accurately from a few random projections. One should understand that the main advantage of CS is not how small it can compress an image: if a signal is K-sparse in some domain, we still require 3K to 5K measurements to recover it. An obvious advantage of CS is that it can encode the signal or image fast. In particular, prior knowledge about the signal is not important; it is not necessary to know the exact positions and values of the most important components beforehand, and what matters is only whether the image is sparse in some domain. A fixed observation matrix can be applied to measure different signals, which makes the applications of CS for encoding and decoding possible. Meanwhile, the measurements play equal roles in recovering the signal or image, which makes CS very powerful in military applications (radar imaging) where we cannot afford the risk caused by the loss of the most important K components. Since each random projection (measurement) is equally (un)important, CS is not sensitive to noise or measurement errors and can provide robust and stable performance.

Although many researchers have made great progress on the convex optimization problems and demonstrated accurate results at the scales of interest (hundreds of thousands of measurements and millions of pixels), more efficient algorithms are still required. Actually, solving the ℓ_1 minimization problem is about 30-50 times as expensive as solving the least-squares problem. However, this unbalanced computational burden gives us a chance: the measurements can be acquired by sensors with lower power, and the signal or image can then be recovered on a central supercomputer.
Algorithms such as the conjugate gradient method and the generalized minimal residual method will be our next candidates for accelerating the recovery algorithm. The physical understandings and applications of CS are under way, and a single-pixel camera has already shocked the field of optics. We are aware that CS has penetrated many fields and become a hot spot. We expect more mathematicians, physicists, and engineers to contribute to the CS field.

7. Acknowledgement

The authors are not from the signal or image processing community; hence, some technical terms may not be exactly right. We hope the report can give some help to researchers from the signal processing, image processing, and physics communities.

References

1. Emmanuel J. Candès, Justin K. Romberg, and Terence Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, pp. 489-509, Feb. 2006.
2. David L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, pp. 1289-1306, Apr. 2006.
3. Emmanuel J. Candès and Michael B. Wakin, "People hearing without listening: An introduction to compressive sampling," report.
4. Richard G. Baraniuk, "Compressive sensing," IEEE Signal Processing Magazine, pp. 118-124, Jul. 2007.
5. Emmanuel J. Candès and Michael B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, pp. 21-30, Mar. 2008.
6. Justin K. Romberg, "Imaging via compressive sampling," IEEE Signal Processing Magazine, pp. 14-20, Mar. 2008.
7. Emmanuel J. Candès and Justin K. Romberg, "Signal recovery from random projections," Proceedings of SPIE-IS&T Electronic Imaging, vol. 5674, pp. 76-86, Mar. 2005.
8. Emmanuel J. Candès and Terence Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, pp. 4203-4215, Dec. 2005.
9. Joel A. Tropp and Anna C. Gilbert, "Signal recovery from partial information via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, pp. 4655-4666, Dec. 2007.
10. Mark Schmidt, Glenn Fung, and Rómer Rosales, "Fast optimization methods for L1 regularization: A comparative study and two new approaches," Machine Learning: ECML 2007, vol. 4701, Sep. 2007.
11. Leonid I. Rudin, Stanley Osher, and Emad Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259-268, Nov. 1992.