Sparsity Based Poisson Denoising with Dictionary Learning

Raja Giryes and Michael Elad
Department of Computer Science, The Technion - Israel Institute of Technology, Haifa 32000, Israel
{raja,elad}@cs.technion.ac.il

Abstract—The problem of Poisson denoising appears in various imaging applications, such as low-light photography, medical imaging and microscopy. In cases of high SNR, several transformations exist so as to convert the Poisson noise into an additive i.i.d. Gaussian noise, for which many effective algorithms are available. However, in a low SNR regime these transformations are significantly less accurate, and a strategy that relies directly on the true noise statistics is required. A recent work by Salmon et al. [1], [2] took this route, proposing a patch-based exponential image representation model based on a GMM (Gaussian mixture model), leading to state-of-the-art results. In this paper we propose to harness sparse-representation modeling of the image patches, adopting the same exponential idea. Our scheme uses a greedy pursuit with a boot-strapping based stopping condition and dictionary learning within the denoising process. The reconstruction performance of the proposed scheme is competitive with leading methods in high SNR, and achieves state-of-the-art results in cases of low SNR.

I. INTRODUCTION

Poisson noise appears in many applications such as night vision, computed tomography (CT), fluorescence microscopy, astrophysics and spectral imaging. Given a Poisson noisy image y ∈ (N ∪ {0})^m (represented as a column-stacked vector), our task is to recover the original true image x ∈ R^m, where the entries in y (given x) are Poisson distributed independent random variables with mean and variance x[i], i.e.,

P(y[i] | x[i]) = { (x[i])^{y[i]} / y[i]! · exp(−x[i])   if x[i] > 0,
                  δ_0(y[i])                             if x[i] = 0,     (1)

where δ_0 is the Kronecker delta function and x[i] and y[i] are the i-th components of x and y respectively. Notice that Poisson noise is not additive and its strength depends on the image intensity: lower intensity in the image yields a stronger noise, as the SNR in each pixel is √x[i]. Thus, it is natural to define the noise power in an image by the maximal value in x (its peak value)¹. Many schemes for recovering x from y exist [3], [4], [5], [6]. A very popular strategy [7], [8], [9] relies on transformations, such as Anscombe [10] and Fisz [11], that convert the Poisson denoising problem into a Gaussian one, for which plenty of methods exist (e.g. [12], [13], [14]). The noise then becomes approximately white Gaussian with unit variance.

¹ The peak is a good measure under the assumption that the pixels' values are spread uniformly over the whole dynamic range. This assumption is true for most natural images.

Fig. 1. Poisson noisy versions of the image peppers with different peak values. From left to right: peaks 0.1, 1, 4 and 10.

The problem with these approximations is that they hold true only when the measured pixels have high intensity [8], [1], [2], i.e. when a high photon count is measured in the detectors. As a rule of thumb, these transformations are accurate only when the peak value in x is larger than 3 [2]². In this case the noise looks very similar to a Gaussian one. When the peak value is smaller, the structure of the noisy image is quite different, with many zero pixels and others that have very small (integer) values. As an example, when the peak equals 0.1 we have an almost binary image, containing mainly zeros and ones. Fig. 1 shows noisy versions of peppers with different peak values.
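These regimes can be illustrated numerically. The sketch below (the flat test signal and sample count are arbitrary illustrative choices) draws Poisson counts at several peak values and reports the fraction of zero pixels, together with the variance of the Anscombe-stabilized counts 2√(y + 3/8), which is close to 1 only at high peaks:

```python
import numpy as np

def poisson_image(clean01, peak, seed=0):
    """Scale a clean signal with values in [0, 1] to the given peak
    intensity and draw independent Poisson counts per pixel."""
    rng = np.random.default_rng(seed)
    x = clean01 * peak
    return rng.poisson(x), x

# Per-pixel SNR is sqrt(x[i]), so low peaks give almost binary counts.
clean = np.full(100_000, 0.5)          # flat signal, many samples
for peak in (0.1, 1.0, 4.0, 10.0):
    y, x = poisson_image(clean, peak)
    # Variance of the Anscombe-stabilized counts: near 1 at high
    # peaks, far from 1 in the low-count regime.
    var = np.var(2.0 * np.sqrt(y + 3.0 / 8.0))
    print(peak, np.mean(y == 0), round(var, 2))
```

At peak 0.1 almost all samples are zero and the stabilized variance is far below 1, matching the failure regime discussed above.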
It can be seen that indeed, as the peak value increases, the noise "looks" more and more Gaussian. In this work we aim at denoising Poisson noisy images with peak ≤ 4, where both the Anscombe and Fisz transformations are less effective.

² One may argue that the threshold value is 2, as concluded from the denoising results in [1]; this threshold value is by no means an exact and definite one.

The Anscombe transform is a non-linear element-wise transformation defined as

f_Anscombe(y[i]) = 2√(y[i] + 3/8).     (2)

Therefore, applying a denoising technique on the stabilized data results in an estimate for f_Anscombe(x) rather than for x. Thus, there is a need to apply an inverse transform in order to get an estimate for x. Note that using the algebraic inverse results in a biased estimator due to the non-linearity of the transform. In [8], this problem has been addressed by providing an exact unbiased inverse for the Anscombe transform, which eventually leads to a better recovery performance. However, as said above, even with this exact inverse the recovery error increases dramatically for peak < 3 [2]. In order to deal with this deficiency, a strategy that relies directly on the Poisson statistics is required. This direction has been taken in [1], [2], providing a Gaussian mixture model (GMM) [14] based approach that relies directly on the Poisson noise properties. By dividing the image into overlapping patches, dividing them into a few large clusters and then performing a projection onto the largest components of a PCA-like basis for each group, state-of-the-art results have been reported for small peak values. This approach has two versions.
In the first, the non-local PCA (NLPCA), the projection is computed by minimizing a Poissonian Bregman divergence using Newton steps, while in the second, the non-local sparse PCA (NLSPCA), the SPIRAL method [15], which adds an ℓ1 regularization term to the minimized objective, is used, resulting in a better recovery performance.

A. Our Contribution

In this work we take a similar path to [2] in the modeling of image patches, using this model for the recovery process. However, we take a different route and propose a sparse representation modeling and a dictionary learning based denoising strategy [13] instead of the GMM. We employ a greedy OMP-like method for sparse coding of the patches and use a smart boot-strapped stopping criterion. We demonstrate the superiority of the proposed scheme in various experiments.

We have presented a preliminary version of the proposed OMP-like technique, achieving a relatively poor recovery performance, in a local conference [16]. In this paper, both the algorithmic and the experimental parts have been remarkably improved. The main contributions of this paper are:

• We introduce a greedy technique for the Poisson sparse model. Such pursuit methods for Gaussian noise are commonly used, and extensive work has been devoted in the past two decades to their construction and analysis. Thus, a proposal of a greedy strategy for the Poisson model is of importance and may open the door for many other variants and theoretical study, similar to what exists in the Gaussian regime. For example, our greedy method can be extended to the Poisson deconvolution problem, as done for other greedy techniques in [17], [18], or serve as a basis for Poisson inpainting [19]. We mention that in some low light Poisson denoising applications the dictionary is already known. One example is fluorescence microscopy, where the measured image might be sparse by itself.
Hence, in such cases the introduction of a new recovery method is important by itself.

• We introduce a novel stopping criterion for iterative algorithms, and its incorporation into the pursuit leads to much improved results. To the best of our knowledge, this bootstrapping based stopping criterion first appears in our paper.

• The interplay between GMM and dictionary learning based models has significance of its own, as seen in the treatment of the simpler Gaussian image denoising problem. The migration from GMM to dictionary learning poses a series of difficulties that our paper describes and solves. Note that in this paper we utilize the learning strategy in [20].

• The integration of the above leads to state-of-the-art results, especially for the very low SNR case.

B. Organization

The organization of the paper is as follows. Section II describes the Poisson denoising problem in more detail, and presents the previous contributions. Section III introduces the proposed denoising algorithm, starting with the pursuit task, moving to the clustering that we employ for achieving non-locality in the denoising process, discussing the role of learning the dictionary, and concluding with the overall scheme. Section IV presents various tests and comparisons that demonstrate the denoising performance and superiority of the proposed scheme. Section V discusses future work directions and conclusions.

II. POISSON SPARSITY MODEL

Various image processing applications use the fact that image patches can be represented sparsely with a given dictionary D ∈ R^{d×n} [21]. Under this assumption each patch p_i ∈ R^d (the patch is of size √d × √d, held in a column-stacked representation) from x can be represented as p_i = Dα_i, where α_i ∈ R^n is k-sparse, i.e., it has only k non-zero entries, where k ≪ d.
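A toy instance of this synthesis model (all dimensions here are illustrative choices, not the paper's settings): a patch is built from k atoms of a random dictionary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 16, 32, 3              # patch dim, dictionary size, sparsity
D = rng.standard_normal((d, n))  # toy dictionary, atoms as columns

# Build a k-sparse representation: k random entries are non-zero.
alpha = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
alpha[support] = rng.standard_normal(k)

p = D @ alpha                    # the patch lives in the span of k atoms
print(np.count_nonzero(alpha))   # 3, i.e. exactly k non-zeros
```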
If our image is of dimension m_1 × m_2 and we treat all the overlapping patches in the image, then 1 ≤ i ≤ (m_1 − √d + 1)(m_2 − √d + 1). This model leads to state-of-the-art results in Gaussian denoising [12], [13], [21]. In order to use this sparsity-inspired model for the Poisson noise case one has two options: (i) convert the Poisson noise into a Gaussian one, as done in [8]; or (ii) adapt the Gaussian denoising tools to the Poisson statistics. As explained above, the latter is important for cases where the Anscombe is non-effective, and this approach is indeed practiced in [1], [2], [15], [22].

Maximizing the log-likelihood of the Poisson distribution in (1) provides us with the following minimization problem for recovering the i-th patch p_i from its corresponding Poisson noisy patch q_i:

min_{p_i} 1^T p_i − q_i^T log(p_i)   s.t. p_i ≥ 0,     (3)

where 1 is a vector composed of ones and the log(·) operation is applied element-wise. Note that this minimization problem allows zero entries in p_i only if the corresponding entries in q_i are zeros as well. The minimizer of (3) is the noisy patch q_i itself, and thus using only (3) is not enough and a prior is needed. By using the standard sparsity prior for patches, p_i = Dα_i, as practiced in [13] and elsewhere, we end up with the following minimization problem:

min_{α_i} 1^T Dα_i − q_i^T log(Dα_i)     (4)
s.t. ‖α_i‖_0 ≤ k,  Dα_i ≥ 0,

where ‖·‖_0 is the ℓ0 semi-norm, which counts the number of non-zeros in a vector. Besides the fact that (4) is a combinatorial problem, it also imposes a non-negativity constraint on the recovered patch, which further complicates the numerical task at hand. In order to resolve the latter issue we follow [2] and set p_i = exp(Dα_i), where exp(·) is applied element-wise and α_i is still a k-sparse vector. This leads to the following minimization problem:

min_{α_i} 1^T exp(Dα_i) − q_i^T Dα_i   s.t. ‖α_i‖_0 ≤ k.     (5)

Having the non-negativity constraint removed, we still need an approximation algorithm for solving (5), as it is likely to be NP-hard. One option is to use an ℓ1 relaxation, which leads to the SPIRAL method [15]. Another, simpler option is to reduce the dictionary D to have only k columns, so that (5) can be minimized with any standard toolbox for convex optimization³. This approach is taken in the NLPCA technique [1], where the patches of y are clustered into a small number of disjoint groups and each group has its own narrow dictionary. Denoting by j, {q_{j,1}, …, q_{j,N_j}}, {α_{j,1}, …, α_{j,N_j}} and D_j ∈ R^{d×k} the group number, its noisy patches, the representations of the patches and their dictionary, the minimization problem NLPCA aims at solving for the j-th group is

min_{D_j, α_{j,1}, …, α_{j,N_j}}  Σ_{i=1}^{N_j} 1^T exp(D_j α_{j,i}) − q_{j,i}^T D_j α_{j,i}.     (6)

Notice that D_j is calculated as part of the minimization process, as each group has its own dictionary that should be optimized. The typical sparsity k and number of clusters used in NLPCA are 4 and 14 respectively. As it is hard to minimize (6) both with respect to the dictionary and to the representations, an alternating minimization process is used, applying Newton steps that update the dictionary and the representations alternately. Having the recovered patches, we return each to its corresponding location in the recovered image and average pixels that are mapped to the same place (since overlapping patches are used). The authors in [1] suggest repeating the whole process, using the output of the algorithm as an input for the clustering process and then applying the algorithm again with the new division.
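Once the support (or the narrow dictionary) is fixed, objectives of the form (5)–(6) are smooth and convex in the coefficients, so a damped Newton iteration solves them directly. A minimal sketch (the function name, step counts and backtracking rule are illustrative choices, not the exact procedure of [1]):

```python
import numpy as np

def fixed_support_newton(D_T, q, n_steps=20):
    """Minimize  1^T exp(D_T a) - q^T D_T a  over a, i.e. Eq. (5) with
    the support fixed (a convex problem).
    Gradient: D_T^T (exp(D_T a) - q);
    Hessian:  D_T^T diag(exp(D_T a)) D_T."""
    a = np.zeros(D_T.shape[1])
    for _ in range(n_steps):
        e = np.exp(D_T @ a)
        grad = D_T.T @ (e - q)
        hess = D_T.T @ (e[:, None] * D_T)
        step = np.linalg.solve(hess, grad)
        # Crude backtracking (a stand-in for an Armijo-type rule).
        f = e.sum() - q @ (D_T @ a)
        t = 1.0
        while t > 1e-8:
            a_new = a - t * step
            e_new = np.exp(D_T @ a_new)
            if e_new.sum() - q @ (D_T @ a_new) < f:
                a = a_new
                break
            t /= 2.0
    return a

# Sanity check: with D_T = I the minimizer satisfies exp(a) = q.
q = np.array([1.0, 2.0, 4.0])
a = fixed_support_newton(np.eye(3), q)
print(np.exp(a))  # close to q
```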
An improved version of NLPCA, the NLSPCA, is proposed in [2], replacing the Newton step with the SPIRAL algorithm [15] for calculating the patches' representations.

³ For a given support (location of the non-zeros in α), the problem posed in (5) is convex.

The NLPCA for Poisson denoising is based on ideas related to the GMM model developed for the Gaussian denoising case [14]. Of course, the latter method can be used for Poisson noise by applying an Anscombe transform. However, such an approach is shown in [2] to be inferior to the Poisson model based strategy in the low-photon count case. Like [1], [2], in this work we do not use the Anscombe and rely on the Poisson based model. However, as opposed to [1], [2], we use one global dictionary for all patches and propose a greedy algorithm for finding the representation of each patch. In an approach similar to the one advocated in [13], we treat similar patches jointly by forcing them to share the same atoms in their representations. This introduces a non-local force and sharing of information between different regions in the image. A dictionary learning process is utilized in our scheme as well, in order to improve the initial dictionary used for the data, as the initial dictionary we embark from is global for all images.

Before moving to the next section we mention a technique proposed in [2] to enhance the SNR in noisy images. This method is especially effective in very low SNR scenarios, such as peak smaller than 0.5. Instead of denoising the given image directly, one can downsample the image by applying a low-pass filter followed by down-sampling. This provides us with a smaller image but with a higher SNR. For example, if our low-pass filter is a kernel of size 3 × 3 containing ones, and we sample every third row and every third column, we end up with a nine times smaller image that has a nine times larger peak value.
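With a 3 × 3 kernel of ones and 3:1 subsampling, this downsampling amounts to summing disjoint blocks of counts, and since sums of independent Poisson variables are again Poisson, the small image is a genuine Poisson image with a b²-times larger peak. A minimal sketch (the function name is an illustrative choice):

```python
import numpy as np

def bin_image(y, b=3):
    """Sum each b-by-b block of counts (a ones kernel followed by
    subsampling by b). The result is again Poisson distributed, with
    a b^2-times larger peak value."""
    m1, m2 = (y.shape[0] // b) * b, (y.shape[1] // b) * b
    y = y[:m1, :m2]  # crop so the dimensions divide by b
    return y.reshape(m1 // b, b, m2 // b, b).sum(axis=(1, 3))

y = np.arange(36).reshape(6, 6)
small = bin_image(y)
print(small.shape)            # (2, 2) -- nine times fewer pixels
print(small.sum() == y.sum())  # total photon count is preserved
```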
Having the low-resolution noisy image, one may apply on it any Poisson denoising algorithm and then perform an upscaling interpolation on the recovered small image in order to return to the original dimensions. This method is referred to as binning in [2] and is related to multi-scale programs [23]. Note that this technique is especially effective for the Anscombe based techniques, as the peak value of the processed image is larger than the initial value.

III. SPARSE POISSON DENOISING ALGORITHM (SPDA)

Our denoising strategy is based on a dictionary learning approach. We start by extracting a set of overlapping patches {q_i | 1 ≤ i ≤ (m_1 − √d + 1)(m_2 − √d + 1)} of size √d × √d from the noisy image y (m_1 and m_2 are the vertical and horizontal dimensions of y respectively). The goal is to find the dictionary D that leads to the sparsest representation of this set of patches under the exponential formulation. In other words, our target is to minimize

min_{D, α_1, …, α_N}  Σ_{i=1}^{N} 1^T exp(Dα_i) − q_i^T Dα_i     (7)
s.t. ‖α_i‖_0 ≤ k,  1 ≤ i ≤ N,

where N = (m_1 − √d + 1)(m_2 − √d + 1). As in [1], [2], since minimizing both with respect to the dictionary and the representations at the same time is a hard problem, we minimize this function alternately. The pursuit (updating the representations) is performed using a greedy technique which returns a k-sparse representation for each patch, unlike the

Algorithm 1 Patch Grouping Algorithm
Input: y, a given image; √d × √d, the patch size to use; l, a target group size; h, a given convolution kernel (typically a wide Gaussian); and ǫ, a tolerance factor.
Output: A division into disjoint groups of patches, where the g-th group is of size N_g ≥ l and is Q_g = {q_{g,1}, …, q_{g,N_g}}.
Begin Algorithm:
- Convolve the image with the kernel: ỹ = y ∗ h.
We take ỹ to be of the same size as y.
- Extract all overlapping patches Q = {q_1, …, q_N} of size √d × √d from y and their corresponding patches {q̃_1, …, q̃_N} of size √d × √d from ỹ.
- Set the first group pivot index: s_0 = argmin_{1≤i≤N} ‖q̃_i‖_2^2.
- Initialize g = 0 and i_g^prev = s_0.
while Q ≠ ∅ do
  - Initialize group g: Q_g = ∅ and l_g = 0.
  - Select the first candidate: i_g = argmin_{i: q_i ∈ Q} ‖q̃_{s_g} − q̃_i‖_2^2.
  while (l_g ≤ l or ‖q̃_{s_g} − q̃_{i_g}‖_2^2 − ‖q̃_{s_g} − q̃_{i_g^prev}‖_2^2 ≤ ǫ^2) and Q ≠ ∅ do
    - Add the patch to group g: Q_g = Q_g ∪ {q_{i_g}}, l_g = l_g + 1.
    - Exclude the patch from the search: Q = Q \ {q_{i_g}}.
    - Save the previous selection: i_g^prev = i_g.
    - Select a new candidate: i_g = argmin_i ‖q̃_{s_g} − q̃_i‖_2^2.
  end while
  - Set the pivot index for the next group: g = g + 1, s_g = i_{g−1}.
end while
- Merge the last group with the previous one to ensure that its size is bigger than l.

global Newton step or SPIRAL, which is not guaranteed to have a sparse output. For learning the dictionary we use the technique in [20], which updates the dictionary together with the representations while their supports are kept fixed [2]. In order to further boost the performance of our algorithm and exploit the fact that similar patches in the image domain may have the same support in their sparse representations, the patches are clustered into a large number of small disjoint groups of similar patches. We now describe each step of the algorithm in detail.

A. Patch Clustering

Ideally, as Poisson noisy images with very low SNR are almost binary images, a good criterion for measuring the similarity between patches would be the earth mover's distance (EMD). We approximate this measure by setting the distance between patches to be their Euclidean distance in the image after it passes through a Gaussian filter. It is clear that the Euclidean distance is not the only option, nor the best.
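The filtered Euclidean distance just described can be sketched as follows (the function name, patch size and filter width are illustrative choices, not the paper's tuned settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def patch_distances(y, pivot_idx, patch_size=4, sigma=1.0):
    """Grouping distance: extract patches from a Gaussian-filtered
    copy of the image and compare each to a pivot patch in squared
    Euclidean distance (a cheap surrogate for the EMD)."""
    y_smooth = gaussian_filter(y.astype(float), sigma)
    d = patch_size
    patches = np.array([
        y_smooth[i:i + d, j:j + d].ravel()
        for i in range(y.shape[0] - d + 1)
        for j in range(y.shape[1] - d + 1)
    ])
    diff = patches - patches[pivot_idx]
    return (diff ** 2).sum(axis=1)

rng = np.random.default_rng(0)
y = rng.poisson(0.5, size=(12, 12))     # toy low-peak Poisson image
dist = patch_distances(y, pivot_idx=0)
order = np.argsort(dist)                # nearest patches to the pivot first
print(order[0])                         # 0 -- the pivot is at distance zero
```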
Nevertheless, the reason we have selected the ℓ2 distance is that it gives a bigger weight to entries where we find noisy pixels with large values. Those are usually rare and reflect locations of high intensity in the original image. We expect that patches with high intensity, which are similar in the original image, should have concentrations of high photon counts at the same locations.

Algorithm 2 Poisson Greedy Algorithm
Input: k; D ∈ R^{d×n}; {q_1, …, q_l̃}, where q_i ∈ R^d is a Poisson distributed vector with mean and variance approximated (modeled) by exp(Dα_i), and k is the maximal cardinality of α_i. All representations α_i are assumed to have the same support. Optional parameter: estimates of the true image patches {p_1, …, p_l̃}.
Output: Estimates p̂_i = exp(Dα̂_i) for q_i, i = 1 … l̃.
Begin Algorithm:
- Initialize the support T^0 = ∅ and set t = 0.
while t < k do
  - Update the iteration counter: t = t + 1.
  - Set the initial objective value: v_o = ∞.
  for j = 1 : n do
    - Check atom j: T̃^t = T^{t−1} ∪ {j}.
    - Calculate the current objective value:
      v_c = min_{α̃_1,…,α̃_l̃} Σ_{i=1}^{l̃} 1^T exp(D_{T̃^t} α̃_i) − q_i^T D_{T̃^t} α̃_i.
    if v_o > v_c then
      - Update the selection: j_t = j and v_o = v_c.
    end if
  end for
  - Update the support: T^t = T^{t−1} ∪ {j_t}.
  - Update the representation estimate:
    - Set [α̂_1^t, …, α̂_l̃^t] = [0, …, 0].
    - Update the on-support values:
      [(α̂_1^t)_{T^t}, …, (α̂_l̃^t)_{T^t}] = argmin_{α̃_1,…,α̃_l̃} Σ_{i=1}^{l̃} 1^T exp(D_{T^t} α̃_i) − q_i^T D_{T^t} α̃_i.
  if {p_1, …, p_l̃} are given then
    - Estimate the error: e^t = Σ_{i=1}^{l̃} ‖exp(Dα̂_i^t) − p_i‖_2^2.
    if t > 1 and e^t > e^{t−1} then
      - Set t = t − 1 and break (exit the while loop and return the result of the previous iteration).
    end if
  end if
end while
- Form the final estimate: p̂_i = exp(Dα̂_i^t), 1 ≤ i ≤ l̃.

The grouping algorithm is described in detail in Algorithm 1.
It creates disjoint groups of size (at least) l in a sequential way, adding elements one by one. Once a group reaches the target size, the algorithm continues to add elements whose distance from the first element in the group (the pivot) is up to ǫ away from the distance of the last added element (see Algorithm 1). The reason we have selected this strategy for clustering is its ability to divide the patches into groups of similar size. The reason we target similar sizes is that we want to guarantee that we have "enough" patches in each group for selecting the "correct" support for the group in the sparse coding step (described in the next subsection). On the other hand, we need to have as many groups as possible, since otherwise we will not have enough information for updating the dictionary, as we select the same support for all the patches in the same group. We could have selected a group for each patch separately, which would result in overlapping groups, but we have chosen not to do so for computational reasons. This is also exactly the reason why we have chosen to use the current clustering method and not other off-the-shelf methods, as its greedy nature makes it faster. We should note that we are well aware of the fact that our choice of grouping method is suboptimal, and yet, as the results in Section IV indicate, it performs sufficiently well.

B. Sparse Coding

For calculating the representations we use a joint sparsity assumption for each group of patches, with a greedy algorithm that finds the representations of each group together. This algorithm is iterative, and in each iteration it adds the atom that reduces the cost function (5) the most over all the representations that belong to the same group. We further explain this matter when we discuss Equation (8).
An important aspect of our pursuit algorithm is deciding how many atoms to associate with each patch, i.e., what should be the stopping criterion of the algorithm. We employ two options. The first is to run the algorithm with a constant number of iterations. However, this choice leads to a suboptimal denoising effect, as different patches have different content complexity and thus require different sparsity levels.

Bootstrapping Based Stopping Criterion: Another option is to set a different cardinality for each group. In order to do so we need a way to evaluate the error in each iteration with respect to the true image, and then stop the iterations once the error starts increasing. One option for estimating the error is using the Poisson unbiased risk estimator (PURE), a variant of Stein's unbiased risk estimator (SURE) for Poisson noise, as done in [24] for the NL-Means algorithm. However, in our context the computation of the PURE is too demanding, as it requires re-applying the denoising method over and over again for the same image, changing one pixel each time. In [24], this is done efficiently due to the simple structure of the denoising method (NL-Means) used there. In our case this (as far as we can see) cannot be done. Thus, we use boot-strapping: we rely on the fact that our scheme is iterative, and after each step of representation decoding and dictionary update we have a new estimate of the original image. We use the patches of the reconstructed image from the previous iteration as a proxy for the patches of the true image and compute the error with respect to them. One might think that if the patch from the previous iteration is used to determine the number of iterations for the same patch in the current iteration, it will make no change and we will "get stuck" with the same outcome.
However, as an averaging process is applied in the middle, together with a dictionary update step, we are not expected to get back the same patch but rather a different denoising result. Note that since we update the dictionary between iterations and average the patches in the recovered image, we do not get stuck on the same patch again by using this stopping criterion. In practice, this condition improves the recovery performance significantly, as each group of patches is represented with a cardinality that better suits its content.

The greedy Poisson algorithm is summarized in Algorithm 2. It starts with an empty support T^0 and gradually adds one element to it in each iteration. Given the support T^{t−1} of the previous iteration, the next atom added to the support is chosen by iterating over all the atoms in the dictionary and calculating for each the following minimization problem:

min_{α̃_1, …, α̃_l̃}  Σ_{i=1}^{l̃} 1^T exp(D_{T^{t−1}∪{j}} α̃_i) − q_i^T D_{T^{t−1}∪{j}} α̃_i,     (8)

where j is the index of a potential atom for addition, {q_i}_{i=1}^{l̃} are the patches we decode, {α̃_i}_{i=1}^{l̃} are representations of size t (the cardinality in the current iteration) and D_{T^{t−1}∪{j}} is D restricted to the support T^{t−1} ∪ {j} (in a similar way, (α̂_i^t)_{T^t} is the vector α̂_i^t restricted to the entries supported on T^t). Notice that all the patches in the cluster are enforced to have the same sparsity pattern, as we perform the minimization for all the l̃ patches together and select the same atom for all. Notice that the problem in (8) is the same as the one in (5) but for a fixed given support. When the support is fixed it is a convex problem, which appears also in the NLPCA technique, and can be solved by the Newton method or by any convex optimization toolbox.
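The atom-selection loop can be sketched as follows, using a generic convex solver for the fixed-support subproblem (8). The toy dictionary and group sizes are illustrative, and the bootstrapped stopping test is omitted here:

```python
import numpy as np
from scipy.optimize import minimize

def joint_objective(D_T, Q):
    """Value of (8): sum over the group's patches of
    1^T exp(D_T a_i) - q_i^T D_T a_i, each minimized over its own
    on-support coefficients (the problem separates across patches)."""
    def solve_one(q):
        f = lambda a: np.exp(D_T @ a).sum() - q @ (D_T @ a)
        g = lambda a: D_T.T @ (np.exp(D_T @ a) - q)
        return minimize(f, np.zeros(D_T.shape[1]), jac=g).fun
    return sum(solve_one(q) for q in Q)

def greedy_poisson(D, Q, k):
    """Sketch of Algorithm 2's selection loop: at each step, try every
    unused atom and keep the one giving the lowest joint objective."""
    support = []
    for _ in range(k):
        candidates = [j for j in range(D.shape[1]) if j not in support]
        scores = [joint_objective(D[:, support + [j]], Q) for j in candidates]
        support.append(candidates[int(np.argmin(scores))])
    return sorted(support)

rng = np.random.default_rng(1)
D = np.linalg.qr(rng.standard_normal((16, 16)))[0]      # toy dictionary
alpha = np.zeros(16); alpha[[3, 7]] = 1.5               # 2-sparse code
Q = [rng.poisson(np.exp(D @ alpha)) for _ in range(4)]  # group of patches
print(greedy_poisson(D, Q, k=2))
```

All patches in the group share the selected atoms, which is exactly the joint-sparsity coupling described above.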
The bootstrapping based stopping criterion is used only if the "oracle" patches {p_i}_{i=1}^{l̃} are provided. They serve as an estimate for the patches of the true image, and therefore we add atoms to the representation until the distance from these patches starts increasing. In this case the output of the sparse coding is the decoded representation from the previous (t − 1) iteration. One may argue that we should have used a Poissonian Bregman divergence for calculating the error and not the ℓ2 error. The reason we use the ℓ2 criterion is that it is the standard measure for checking the quality of image reconstruction. Note that we use this distance measure with the denoised image (which is no longer Poissonian) and not with the noisy image.

The first time we apply the sparse coding algorithm in our recovery process, we cannot use the bootstrapping based stopping criterion, as we do not yet have an estimate of the original image. Therefore, we use the same cardinality k for all groups. In the subsequent iterations, the patches of the recovered image from the previous stage are used as an input to the algorithm for the bootstrapping stopping criterion.

Fig. 2. The piecewise constant image used for the offline training of the initial dictionary. For each range of peak values the image is scaled appropriately.

Algorithm 3 Sparse Poisson Denoising Algorithm (SPDA)
Input: y, D ∈ R^{d×n}, k, l, h, R, L_1, L, where y is Poisson noisy data with mean and variance x; the patches in x are approximated (modeled) by exp(Dα_i); k is the cardinality to be used for the representations at the first sparse coding step; l is the group size; h is the kernel filter used with the clustering algorithm; R is the number of advanced dictionary learning rounds; L_1 is the number of inner learning iterations at the first round; and L is the number at the rest of the rounds.
Output: x̂, an estimate for x.
Begin Algorithm:
- Extract overlapping patches from y.
- Apply the patch grouping algorithm on y with h.
- For each group, apply the Poisson Greedy Algorithm with k.
- Put in x̂ a first estimate for x by re-projecting the recovered patches with averaging.
- Set t = 0.
while t < R do
  - Update the iteration counter: t = t + 1.
  if t = 1 then
    Apply L_1 alternating Newton update steps for (9).
  else
    Apply L alternating Newton update steps for (9).
  end if
  - Put in x̂ an estimate for x by re-projecting the recovered patches with averaging.
  - Extract overlapping patches from x̂.
  - For each group, apply the Poisson Greedy Algorithm with the bootstrapping stopping criterion that relies on the patches extracted from x̂.
end while
- Apply the patch grouping algorithm on x̂ with no kernel (h = [1]).
- Repeat the whole process again using the new clustering and using the current dictionary as the initial dictionary.

C. Dictionary Learning

Given the decoded representations we proceed to update the dictionary. We could have used the simple classical learning mechanism and just updated the dictionary atoms, which can be done using a Newton step in a similar way to [2] (with the addition of an Armijo rule). Instead, we utilize an advanced learning strategy from [20]. Keeping the supports of the representations fixed, we update both the dictionary and the representations without changing their supports. Denoting by T_i the support of the i-th patch, the problem we aim at solving in this case is

min_{D, α̃_1, …, α̃_N}  Σ_{i=1}^{N} 1^T exp(D_{T_i} α̃_i) − q_i^T D_{T_i} α̃_i.     (9)

First, note that it may happen that some atoms in the dictionary are not used by any of the representations, so we have no information for updating them. In this case, those atoms are removed from the dictionary, resulting in a narrower dictionary. Second, notice the similarity between (9) and (6).
As we can set D to be a concatenation of the different small dictionaries in (6), we can view (9) as a generalization of (6) in which groups can share atoms. In this sense we can look at the advanced learning strategy as a generalization of the GMM. The fact that we use small groups and apply a sparse coding step after each advanced learning step gives more freedom and versatility to our scheme. With the above connection between the two, we can solve (9) using the technique for solving (6): applying alternating Newton steps (with the addition of an Armijo rule) both on the dictionary and on the representations. Since these inner iterations reinforce the relation between the dictionary atoms and the related representations, the number of these iterations depends on our confidence in the selected supports. We select a different number of iterations for the first learning round and the rest, because all the supports at the first round are an outcome of sparse coding with a fixed cardinality and have the same size.

Initial Dictionary Selection: The initial dictionary we use is a dictionary trained off-line on patches of the clean piecewise constant image shown in Fig. 2. This training process for an initial dictionary is needed since, unlike the standard sparsity model where many good dictionaries are available, no such dictionary is intuitively known for the exponential sparsity model. Notice also that the representation in the new model is sensitive to the scale of the image, unlike the standard one, which is scale invariant. This is due to the fact that for a given constant c we necessarily have

D(cα) = c(Dα),     (10)

but in our case

exp(D(cα)) ≠ c · exp(Dα).     (11)

Thus, we train different initialization dictionaries for different peak values. The training is done by applying our denoising algorithm on a clean image (with no noise) scaled to the required peak value.
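The scale sensitivity expressed in (10)–(11) is easy to verify numerically (the matrix sizes and constant below are arbitrary illustrative choices):

```python
import numpy as np

# The linear model is scale invariant: D(c*a) = c*(D a) for any c.
# The exponential model is not: exp(D(c*a)) != c*exp(D a), which is
# why a separate initial dictionary is trained per peak range.
rng = np.random.default_rng(0)
D = rng.standard_normal((6, 4))
a = rng.standard_normal(4)
c = 3.0

print(np.allclose(D @ (c * a), c * (D @ a)))                # True,  Eq. (10)
print(np.allclose(np.exp(D @ (c * a)), c * np.exp(D @ a)))  # False, Eq. (11)
```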
We should mention that this training is done once and off-line.

D. SPDA Summary

Iterating over the pursuit and dictionary learning stages, we get an estimate for the image patches. For recovering the whole image, we re-project each patch to its corresponding location with averaging. At this point it is possible to re-cluster the patches according to the newly recovered image and repeat the whole process. Our proposed sparse Poisson denoising algorithm (SPDA) is summarized in Fig. 3 and Algorithm 3.

Fig. 3. The proposed sparse Poisson denoising algorithm (SPDA).

E. Comparison to NLPCA and NLSPCA

The main difference between our proposed algorithm and NLPCA and NLSPCA is reminiscent of the difference between the K-SVD image denoising algorithm [13] and the GMM alternative approach [14], both developed for the Gaussian denoising problem. Furthermore, when it comes to the clustering of patches, NLPCA and NLSPCA use a small number of large clusters obtained using a k-means-like algorithm. In our scheme the notion of joint sparsity is used, which implies a large number of disjoint small groups that are divided using a simple algorithm. In NLPCA and NLSPCA, a different set of basis vectors is used for each cluster, while we use one (thicker) dictionary for all patches together, from which subspaces are created by different choices of small groups of atoms. Unlike NLPCA and NLSPCA, our dictionary can change its size during the learning steps, and the cardinalities allocated to different patches are dynamic. As a last point, we mention that NLPCA uses a Newton step and NLSPCA uses the SPIRAL algorithm for the representation decoding, while we propose a new greedy pursuit for this task, which guarantees a destination cardinality or reconstruction error. We compare our algorithm's complexity to that of NLPCA.
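The greedy pursuit just mentioned can be sketched in simplified form; this is our own reduced version (plain undamped Newton inner solves stand in for the paper's Armijo-damped ones, and the stopping rule is a fixed cardinality k rather than the bootstrapping criterion). Each of the k rounds scans every remaining atom and runs a small solve per candidate, which is what drives the per-patch cost analyzed next:

```python
import numpy as np

def obj(D, a, q):
    z = D @ a
    return np.exp(z).sum() - q @ z

def fit(D, q, steps=15):
    """Fit coefficients on a fixed support; plain Newton iterations
    (the paper adds an Armijo rule for damping)."""
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        z = D @ a
        g = D.T @ (np.exp(z) - q)
        H = D.T @ (np.exp(z)[:, None] * D) + 1e-10 * np.eye(D.shape[1])
        a -= np.linalg.solve(H, g)
    return a

def greedy_poisson_pursuit(D, q, k):
    """Add one atom per round; each round tries every remaining atom,
    each candidate costing one small inner solve."""
    support = []
    for _ in range(k):
        best = None
        for j in range(D.shape[1]):
            if j in support:
                continue
            T = support + [j]
            val = obj(D[:, T], fit(D[:, T], q), q)
            if best is None or val < best[0]:
                best = (val, j)
        support.append(best[1])
    return support, fit(D[:, support], q)
```

On a toy problem whose clean log-intensity lives on a single atom, the pursuit picks that atom in the first round.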
As we are interested only in the order of the computational cost, we focus on the bottlenecks of each algorithm. In NLPCA, the major computational part is solving (6). This is done by applying a constant number of Newton steps on the patches and the local dictionaries. The complexity of the Newton steps is of the order of the Hessian inversion. Using the special properties of the Hessian in (6), it is possible to perform its inversion with complexity O(k d^{3/2}) [1], where k is the number of atoms in each local dictionary in NLPCA. Therefore, given that the number of patches is N, the total complexity of NLPCA is O(N k d^{3/2}). For NLSPCA we get a similar complexity.

The bottleneck in SPDA is the sparse coding. At each iteration of adding an element to the representation, the sparse coding passes over all the atoms in the dictionary and solves (8), checking the potential error of each of them. For calculating this error, we use a constant number of Newton steps. As before, the complexity of minimizing (8) is O(k d^{3/2}). Since we start with a square dictionary and we target k non-zero elements, the worst-case complexity is O(k^2 d^{5/2}) per patch, as for each sparsity level we need to pass over all the atoms in the dictionary and solve (8) for each. Therefore, the overall complexity of our algorithm is O(N k^2 d^{5/2}), where N is the number of processed patches. Though our method is more computationally demanding, we shall see in the next section that in most cases this cost leads to better denoising results. It should also be mentioned that SPDA is highly parallelizable, which allows a great reduction in its running time. Note also that since the size of our dictionary may shrink between the sparse coding steps, the complexity of the sparse coding is likely to become smaller at the latter stages of SPDA.

IV.
EXPERIMENTS

In order to evaluate the performance of SPDA (with and without binning), we repeat the denoising experiment performed in [2]. We test the recovery error for various images with different peak values ranging from 0.1 to 4. The tested images appear in Fig. 4. The methods we compare against are NLSPCA [2] and BM3D with the exact unbiased inverse Anscombe transformation [8] (both with and without binning), as those are the best-performing methods to date. The code for these techniques is available online, and we use the same parameter settings as appear in the code. For the binning we follow [2] and use a 3 × 3 ones kernel, which increases the peak value to be 9 times higher, and bilinear interpolation for the upscaling of the low-resolution recovered image.

Fig. 4. Test images used in this paper. From left to right: Saturn, Flag, Cameraman, House, Swoosh, Peppers, Bridge and Ridges.

For SPDA we use the following parameter setting: the size of each patch is set to 20 × 20 pixels. We start with a sparse coding with a fixed sparsity k = 2. Then we apply five rounds of sparse coding with the bootstrapping-based stopping criterion, together with the advanced dictionary learning mechanism that contains a joint update of the dictionary and the representations (with fixed supports).

Parameter                              Value
Image Size                             256 × 256
Patch Size                             20 × 20
k in First Sparse Coding Step          k = 2
Cluster Size (l)                       50 (no binning), 6 (binning)
Dictionary Learning Rounds             R = 5
Dictionary Update Inner Iterations     L1 = 2 and L = 20
Reclustering                           Once
Initial Dictionary (Peak ≤ 0.2)        Trained on Fig. 2 with Peak = 0.2
Initial Dictionary (0.2 < Peak ≤ 4)    Trained on Fig. 2 with Peak = 2
Initial Dictionary (Peak > 4)          Trained on Fig. 2 with Peak = 18
Binning Kernel                         3 × 3 Ones Kernel

TABLE I. Parameter selection.
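The binning step itself is straightforward: summing counts over 3 × 3 blocks keeps the image Poisson-distributed (a sum of independent Poisson variables is Poisson) with roughly 9 times the peak. A minimal sketch of the operation (our code; we substitute nearest-neighbor repetition where the paper uses bilinear interpolation for the upscaling):

```python
import numpy as np

def bin3x3(y):
    """Sum counts over 3x3 blocks; the binned image is still Poisson,
    with about 9x the original peak."""
    m, n = (s - s % 3 for s in y.shape)   # crop to a multiple of 3
    y = y[:m, :n]
    return y.reshape(m // 3, 3, n // 3, 3).sum(axis=(1, 3))

def upscale_nearest(z, factor=3):
    """Placeholder upscaling; the paper uses bilinear interpolation here."""
    return np.repeat(np.repeat(z, factor, axis=0), factor, axis=1)
```

In the full pipeline the small binned image is denoised and only then upscaled back to the original resolution.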
In the first round we apply 2 inner iterations of the dictionary update, and in the rest we use 20 inner iterations. The reason we use a different number of inner iterations in the first round is that in this round the supports of the representations are less reliable, as they are selected with a fixed support size. After the first round, the bootstrapping-based stopping criterion is employed and each group is decoded with a different cardinality, leading to a better support selection. We re-cluster using the outcome of the above process and repeat it again.

We remind the reader that in selecting the cluster size we have a trade-off between the number of groups and the size of each cluster. As each group selects the same support, having more groups provides us with more information for the dictionary update process. On the other hand, the larger the cluster, the more probable it is that we select the "right" support. Of course, we could have used overlapping groups, but we have chosen not to do so for computational reasons. As a rule of thumb, we have found that having 1000 groups is enough for the dictionary update. As our images are of size 256 × 256, this implies a group size of l = 50 if binning is not used, and l = 6 if it is used. Note that the smaller group size in the binning case is compensated by the fact that each patch contains more information, as it represents a larger portion of the original image.

The initial dictionary is square (400 × 400) and depends on the peak value. For peak ≤ 0.2 we use a dictionary trained on the squares image with peak 0.2; for 0.2 < peak ≤ 4 we train for peak = 2; and for peak > 4 we train for peak = 18 (this is relevant for SPDA with binning when the original peak is greater than 4/9). Note that since in Poisson noise the mean is equal to the original signal, a rough estimation of the peak is an easy task.
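To make the last remark concrete: because the Poisson mean equals the clean signal, the peak can be roughly estimated from the noisy counts alone by local averaging. The following heuristic (entirely ours; the window size is an arbitrary choice, and the paper does not specify its estimator) illustrates the idea:

```python
import numpy as np

def estimate_peak(y, win=8):
    """Rough peak estimate: local averages of the counts approximate
    the clean intensity, so their maximum approximates the peak."""
    m, n = (s - s % win for s in y.shape)   # crop to a multiple of win
    blocks = y[:m, :n].reshape(m // win, win, n // win, win).mean(axis=(1, 3))
    return blocks.max()
```

Averaging over win × win pixels shrinks the noise standard deviation by a factor of win, so the block maximum tracks the true peak reasonably well even at low counts.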
We note that our algorithm is not sensitive to the initial dictionary selection: if the peak value is inaccurately estimated and the "wrong" dictionary is selected, the recovery is not affected significantly. Indeed, we could have trained a different initial dictionary for each peak value. However, we chose not to do so, in order to demonstrate the insensitivity of our scheme to the initialization. We remark that it is possible to use other reasonable initializations for our update process, such as the log of the absolute values of the DCT transform. From our experience, there is not much difference in the reconstruction result, and the gap in the recovery error is in the range of only 0.2 dB.⁴ The parameter selection is summarized in Table I.

A comparison between the recoveries of the flag image with peak = 1 is presented in Fig. 5. It can be observed that this image is recovered very accurately by our method, while the other methods exhibit many artifacts. Samples from the dictionary atoms learned by SPDA are presented in Fig. 6. It can be observed that the dictionary learning process captures the shape of the stars and the lines in the flag. Figures 7, 8 and 9 present the recovery of ridges, Saturn and house for peak = 0.1, 0.2 and 2, respectively. It can be seen that for the low peak values the binning methods capture the structure of the image better and provide lower error. However, when the peak is higher, binning yields degraded performance compared to reconstruction from the original image. In all images SPDA recovers the images' details better.

Fig. 6. Samples of atoms from the dictionary D learned using SPDA for flag with peak = 1.

The recovery error in terms of PSNR for all images and different peak values appears in Table II.
By looking at the overall performance, we can see that our proposed strategy provides better performance on average for 5 of the 6 tested peak values. Note that even in the case where it does not behave better, the difference is insignificant (0.02 dB for peak = 1).

⁴ A package with the code reproducing all our results can be found at www.cs.technion.ac.il/~raja.

Fig. 5. Denoising of flag with peak = 1. The PSNR is of the presented recovered images. (a) flag image; (b) NLSPCA, PSNR = 20.37 dB; (c) BM3D, PSNR = 18.51 dB; (d) SPDA, PSNR = 22.59 dB; (e) noisy image, peak = 1; (f) NLSPCAbin, PSNR = 16.91 dB; (g) BM3Dbin, PSNR = 19.41 dB; (h) SPDAbin, PSNR = 19.9 dB.

Fig. 7. Denoising of ridges with peak = 0.1. The PSNR is of the presented recovered images. (a) ridges image; (b) NLSPCA, PSNR = 19.57 dB; (c) BM3D, PSNR = 19.66 dB; (d) SPDA, PSNR = 19.82 dB; (e) noisy image, peak = 0.1; (f) NLSPCAbin, PSNR = 21.84 dB; (g) BM3Dbin, PSNR = 18.98 dB; (h) SPDAbin, PSNR = 24.43 dB.

Fig. 8. Denoising of Saturn with peak = 0.2. The PSNR is of the presented recovered images. (a) Saturn image; (b) NLSPCA, PSNR = 22.98 dB; (c) BM3D, PSNR = 21.71 dB; (d) SPDA, PSNR = 21.47 dB; (e) noisy image, peak = 0.2; (f) NLSPCAbin, PSNR = 20.35 dB; (g) BM3Dbin, PSNR = 23.16 dB; (h) SPDAbin, PSNR = 24.35 dB.

Fig. 9. Denoising of house with peak = 2. The PSNR is of the presented recovered images. (a) house image; (b) NLSPCA, PSNR = 23.23 dB; (c) BM3D, PSNR = 24.06 dB; (d) SPDA, PSNR = 24.8 dB; (e) noisy image, peak = 2; (f) NLSPCAbin, PSNR = 21.28 dB; (g) BM3Dbin, PSNR = 24.23 dB; (h) SPDAbin, PSNR = 21.51 dB.
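The PSNR values quoted in the captions above and in the tables follow the usual peak-referenced convention for Poisson denoising; since the paper does not restate its exact formula, treat the following definition as an assumption:

```python
import numpy as np

def psnr(x, x_hat, peak):
    """Peak-referenced PSNR in dB: 10 log10(peak^2 / MSE)."""
    mse = np.mean((x - x_hat) ** 2)
    return 10 * np.log10(peak ** 2 / mse)
```

For example, a constant error of one tenth of the peak gives an MSE of peak²/100 and hence a PSNR of 20 dB.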
Method            Saturn  Flag   Camera  House  Swoosh  Peppers  Bridge  Ridges  Average

Peak = 0.1
NLSPCA            20.86   14.42  16.41   17.81  19.11   16.24    16.59   20.92   17.8
NLSPCAbin         19.13   16.07  17.15   18.71  21.89   16.12    16.93   24.05   18.75
BM3D              19.42   13.05  15.66   16.28  16.93   15.61    15.68   20.06   16.6
BM3Dbin           21.19   14.23  16.91   18.62  21.90   15.92    16.91   20.40   18.26
SPDA (our work)   17.40   13.35  14.36   14.84  15.12   14.28    14.60   19.86   15.48
SPDAbin (ours)    22.00   15.40  16.75   18.73  21.90   16.27    16.99   25.32   19.17

Peak = 0.2
NLSPCA            22.90   16.48  17.79   18.91  21.10   17.45    17.46   24.22   19.54
NLSPCAbin         20.54   16.54  17.92   19.74  24.00   16.92    17.57   25.91   19.89
BM3D              22.02   14.28  17.35   18.37  19.95   17.10    17.09   21.27   18.45
BM3Dbin           23.20   16.28  18.25   19.71  24.25   17.44    17.70   23.92   20.09
SPDA (our work)   21.52   16.58  16.93   17.83  18.91   16.75    16.80   23.25   18.57
SPDAbin (ours)    23.99   18.26  17.95   19.62  23.53   17.59    17.82   27.22   20.75

Peak = 0.5
NLSPCA            24.91   18.80  19.23   20.85  23.80   18.78    18.50   28.20   21.63
NLSPCAbin         20.98   17.10  18.32   20.98  26.48   17.77    18.18   26.81   20.83
BM3D              23.86   15.87  18.83   20.27  22.92   18.49    18.24   23.37   20.29
BM3Dbin           25.70   18.40  19.64   21.71  26.33   19.01    18.67   28.23   22.21
SPDA (our work)   25.50   19.67  18.90   20.51  24.21   18.66    18.46   27.76   21.71
SPDAbin (ours)    25.83   19.22  18.97   21.15  26.57   18.63    18.57   30.97   22.49

Peak = 1
NLSPCA            26.89   20.26  20.32   22.09  27.42   19.62    18.94   30.57   23.26
NLSPCAbin         21.18   17.07  18.50   21.26  27.62   17.80    18.20   27.58   21.15
BM3D              25.89   18.31  20.37   22.35  26.07   19.89    19.22   26.26   22.42
BM3Dbin           27.41   19.33  20.60   23.19  28.44   20.13    19.38   30.50   23.62
SPDA (our work)   27.02   22.54  20.23   22.73  26.28   19.99    19.20   30.93   23.61
SPDAbin (ours)    27.26   19.88  19.45   21.63  28.31   18.92    18.74   32.41   23.33

Peak = 2
NLSPCA            28.22   20.86  20.76   23.86  29.62   20.52    19.47   31.87   24.4
NLSPCAbin         21.49   16.85  18.43   21.41  27.88   17.80    18.34   28.68   21.36
BM3D              27.42   20.81  22.13   24.18  28.09   21.97    20.31   29.82   24.59
BM3Dbin           28.84   20.02  21.37   24.49  29.74   21.16    20.17   32.06   24.73
SPDA (our work)   29.38   24.92  21.54   25.09  29.27   21.23    20.15   33.40   25.62
SPDAbin (ours)    28.51   19.81  19.61   21.72  28.98   19.26    18.93   33.51   23.79

Peak = 4
NLSPCA            29.44   21.25  21.09   24.89  31.30   21.12    20.16   34.01   25.41
NLSPCAbin         21.20   16.50  18.45   21.44  28.01   17.82    18.34   29.09   21.36
BM3D              29.40   23.04  23.94   26.04  30.72   24.07    21.50   32.39   26.89
BM3Dbin           30.19   20.51  21.98   25.48  31.30   22.09    20.80   33.55   25.74
SPDA (our work)   31.04   26.27  21.90   26.09  33.20   22.09    20.55   36.05   27.15
SPDAbin (ours)    29.43   19.85  20.12   22.87  30.93   20.37    19.24   34.44   24.66

TABLE II. Experiments on simulated data (average over five noise realizations). The images are the same as those in [2]. Best results are marked in the original layout.

For low peak values SPDAbin behaves better, and for larger ones SPDA should be preferred. The addition of binning to the algorithms improves their performance significantly at the lower peak values. As the peak rises, the effectiveness of the binning is reduced. Note that this efficiency reduces more slowly for BM3D. This might be explained by the fact that it relies on the Anscombe transformation, which becomes much more effective as the peak increases. We remark that our algorithm benefits from structured images, and so do the other methods. This advantage, enabling the exploitation of self-similarities in images, is very important in real-life applications, and it is used extensively in the literature [25], [26]. In many images we are likely to find the same patterns repeating over and over again, and especially so when it comes to small patches. Note that for peak intensities equal to 0.1, 0.2, NLSPCA shows better PSNR results than SPDA, while the situation changes if binning is used. The reason is that at such low peak values the counts are so low that working with the same patch size used for the higher peaks leads to weaker results.
This may lead the dictionary learning process to learn isolated noise points as dictionary elements, as happens with SPDA in Fig. 8. Thus, we believe that with larger patch sizes we could get better results for these peak values. However, using large patch sizes is not computationally feasible. Using binning compensates for this computational barrier by working on a low-resolution version of the image with the same patch size. At the regular size, NLPCA overcomes the resolution problem by using large cluster sizes. In our case, we cannot use large cluster sizes, as we need as many clusters as possible for the dictionary update step.

In conclusion, SPDA seems to have a good denoising quality for Poisson noisy images with low SNR, achieving state-of-the-art recovery performance. It is interesting to explore the contribution of each stage of the algorithm to the quality of the recovered image. Therefore, we evaluate the performance of SPDA under five different setups: (I) applying SPDA with only sparse coding with k = 2; (II) sparse coding with k = 2 followed by sparse coding with the bootstrapping-based stopping criterion; (III) applying SPDA with simple dictionary learning steps, without the joint representation and dictionary update stage (if no binning is performed and peak ≤ 0.2 we use 120 learning iterations; otherwise we use 20 dictionary learning iterations as the first stage, then re-cluster and apply an additional 20 iterations); (IV) applying SPDA with the five advanced dictionary learning steps with no re-clustering; (V) using the setup from Table II: five advanced dictionary learning steps followed by another five steps after re-clustering.
Peak   SPDA Setup   With Binning   No Binning
0.1    Ans+G        16.99          17.28
       I            17.21          15.22
       II           17.26          15.23
       III          18.84          16.50
       IV           18.97          15.48
       V            19.17          15.48
0.2    Ans+G        17.91          18.41
       I            18.38          17.78
       II           18.38          18.42
       III          20.42          18.95
       IV           20.47          18.72
       V            20.75          18.57
0.5    Ans+G        19.80          20.94
       I            18.90          20.31
       II           18.72          20.57
       III          22.45          21.55
       IV           22.23          21.62
       V            22.49          21.71
1      Ans+G        20.73          23.19
       I            19.14          21.92
       II           19.00          22.35
       III          23.17          23.45
       IV           23.02          23.43
       V            23.33          23.61
2      Ans+G        21.69          25.08
       I            19.27          23.27
       II           19.25          23.71
       III          23.74          25.41
       IV           23.31          25.48
       V            23.79          25.62
4      Ans+G        22.02          26.74
       I            19.27          24.09
       II           19.37          24.64
       III          24.23          26.99
       IV           24.19          27.06
       V            24.66          27.15

TABLE III. The effect of the different stages in SPDA: we present the PSNR of the outcome of SPDA under the different setups of the algorithm: (I) only sparse coding with k = 2; (II) sparse coding with k = 2 followed by a sparse coding with the bootstrapping-based stopping criterion; (III) if no binning is used and the peaks are 0.1, 0.2, then SPDA with 120 simple dictionary learning steps is used; otherwise, SPDA with 20 simple dictionary learning steps followed by another 20 steps after re-clustering is used; (IV) five advanced dictionary learning steps with no re-clustering; (V) five advanced dictionary learning steps followed by another five steps after re-clustering. We also compare to a version of the algorithm adapted to Gaussian noise and used with the Anscombe transform (Ans+G).
The average in the results is over five noise realizations and the eight images in Fig. 4.

For each setup we calculate, for six different peak values, the average PSNR over the eight images in Fig. 4. The results are presented in Table III. First note that the gap between the recovery result of the simple sparse coding (Setup I) and that of the advanced dictionary learning with re-clustering (Setup V) is 2.63 dB on average, which is very significant. Looking at the contribution of each stage of the algorithm, we observe that the effect of the bootstrapping-based stopping criterion (Setup II) is negligible in the case of binning, while it improves the recovery result by 0.3 dB in the no-binning case. We believe the reason is that the number of atoms used in the recovery determines the resolution of the recovered patches. With binning, a coarser resolution of the image is processed, and therefore the reconstruction result is less sensitive to the number of atoms used in the recovery; with no binning, patches of finer resolution are used, and therefore the number of atoms used for the decoding is more critical to the recovery performance.

Thus, it is noticeable that the main improvement in the recovery is due to the dictionary learning. Using only 5 dictionary learning steps we get an improvement of 2.43 dB on average. With re-clustering and an additional 5 dictionary learning steps we get an improvement of another 0.2 dB. Though the difference between using SPDA with the advanced dictionary learning steps (Setup V) and the simple dictionary learning steps is not significant on average, the advanced version has two advantages: (i) for the peaks 0.1, 0.2, 0.5, where the usage of binning is preferable, the improvement is 0.23 dB on average for SPDAbin, and for peaks 1, 2, 4, where it is better to use no binning, the gap is 0.18 dB for SPDA; (ii) the convergence of the algorithm is faster with the advanced learning steps: 10 advanced learning steps versus 40 or 120 simple learning steps. Since the bottleneck of our method is the sparse coding step, and each learning step is followed by one, using SPDA with the advanced dictionary learning steps (Setup V) runs four times faster than with the simple steps (Setup III).

Finally, we compare our algorithm to its "Gaussian version" with the Anscombe transform. This version aims at minimizing the standard ℓ2 error objective with the standard sparsity constraints. In the dictionary learning steps we use the MOD algorithm [27], [28] with the advanced update step from [20]. Other than that, we adopt exactly the same setup as in SPDA. We present the average denoising error (over the eight test images in Fig. 4) in Table III (the algorithm is referred to as Ans+G) for the different peak values. Note that this algorithm does not perform well when used with binning. The reason is that in the binning scenario there are fewer patches for updating the dictionary; therefore, we do not benefit from the learning process with binning. Interestingly, this is not the case when we work with the Poisson objective, which shows the advantage of working directly with the Poisson data. Without binning, the performance of the Gaussian version benefits from the dictionary update, as there is a larger number of patches for the learning. However, in this case as well it is better to use SPDA. Note that SPDA outperforms its Gaussian version for all peak values except peak = 0.1, for which it is better to use SPDA with binning anyway.

V.
DISCUSSION AND CONCLUSION

This work proposes a new scheme for Poisson denoising, the sparse Poisson denoising algorithm (SPDA). It relies on the Poisson statistical model presented in [2] and uses a dictionary learning strategy with a sparse coding algorithm that employs a bootstrapping-based stopping criterion. The recovery performance of SPDA is state-of-the-art, and in some scenarios it outperforms the existing algorithms by more than 1 dB. However, there is still room for improvement in the current scheme, which we leave for future work:

(i) In our current work we used the same initialization dictionary D for all types of images. However, in many applications the type of images to be observed is known beforehand, e.g., space images or fluorescence-microscopy cell images. Off-line training of a dictionary dedicated to a specific task can improve the current results.

(ii) Setting a suitable number of dictionary learning iterations is important for the quality of the reconstruction. An automated tuning technique should be considered for this purpose [24], [29]. Setting the same number for all images gives mid-level results (which overall are better than other existing results). Basically, we can say that in almost all images we have lost output quality because of the fixed stopping point. We should mention that in some applications this is not an issue, as the tuning can be done manually by the user based on qualitative results and visual feedback.

ACKNOWLEDGMENT

R. Giryes thanks the Azrieli Foundation for the Azrieli Fellowship. This research was supported by the European Community's FP7-ERC program, grant agreement no. 320649. In addition, the authors thank the reviewers of the manuscript for their suggestions, which greatly improved the paper.

REFERENCES

[1] J. Salmon, C.-A. Deledalle, R. Willett, and Z. T.
Harmany, "Poisson noise reduction with non-local PCA," in ICASSP 2012, Mar. 2012, pp. 1109–1112.
[2] J. Salmon, Z. Harmany, C.-A. Deledalle, and R. Willett, "Poisson noise reduction with non-local PCA," Journal of Mathematical Imaging and Vision, pp. 1–16, 2013.
[3] A. Danielyan, V. Katkovnik, and K. Egiazarian, "Deblurring of Poissonian images using BM3D frames," Proc. SPIE, vol. 8138, pp. 813812-1–813812-7, 2011.
[4] E. Gil-Rodrigo, J. Portilla, D. Miraut, and R. Suarez-Mesa, "Efficient joint Poisson-Gauss restoration using multi-frame L2-relaxed-L0 analysis-based sparsity," in IEEE International Conference on Image Processing (ICIP), Sept. 2011, pp. 1385–1388.
[5] M. A. T. Figueiredo and J. Bioucas-Dias, "Restoration of Poissonian images using alternating direction optimization," IEEE Trans. Image Process., vol. 19, no. 12, pp. 3133–3145, Dec. 2010.
[6] X. Zhang, Y. Lu, and T. Chan, "A novel sparsity reconstruction method from Poisson data for 3D bioluminescence tomography," J. Sci. Comput., vol. 50, no. 3, pp. 519–535, Mar. 2012.
[7] J. Boulanger, C. Kervrann, P. Bouthemy, P. Elbau, J.-B. Sibarita, and J. Salamero, "Patch-based nonlocal functional for denoising fluorescence microscopy image sequences," IEEE Trans. Med. Imag., vol. 29, no. 2, pp. 442–454, Feb. 2010.
[8] M. Makitalo and A. Foi, "Optimal inversion of the Anscombe transformation in low-count Poisson image denoising," IEEE Trans. Image Process., vol. 20, no. 1, pp. 99–109, Jan. 2011.
[9] B. Zhang, J. Fadili, and J. Starck, "Wavelets, ridgelets, and curvelets for Poisson noise removal," IEEE Trans. Image Process., vol. 17, no. 7, pp. 1093–1108, July 2008.
[10] F. J. Anscombe, "The transformation of Poisson, binomial and negative-binomial data," Biometrika, vol. 35, no. 3-4, pp. 246–254, 1948.
[11] M. Fisz, "The limiting distribution of a function of two independent random variables and its statistical application," Colloquium Mathematicum, vol. 3, pp. 138–146, 1955.
[12] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, 2007.
[13] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local sparse models for image restoration," in ICCV 2009, pp. 2272–2279.
[14] G. Yu, G. Sapiro, and S. Mallat, "Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity," IEEE Trans. Image Process., vol. 21, no. 5, pp. 2481–2499, May 2012.
[15] Z. Harmany, R. Marcia, and R. Willett, "This is SPIRAL-TAP: Sparse Poisson intensity reconstruction algorithms, theory and practice," IEEE Trans. Image Process., vol. 21, no. 3, pp. 1084–1096, Mar. 2012.
[16] R. Giryes and M. Elad, "Sparsity based Poisson denoising," in IEEE 27th Convention of Electrical and Electronics Engineers in Israel (IEEEI), 2012, pp. 1–5.
[17] F.-X. Dupe and S. Anthoine, "A greedy approach to sparse Poisson denoising," in IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Sept. 2013, pp. 1–6.
[18] Q. Tran-Dinh, A. Kyrillidis, and V. Cevher, "Composite self-concordant minimization," CoRR, vol. abs/1308.2867, 2014.
[19] R. Giryes and M. Elad, "Sparsity based Poisson inpainting," in IEEE International Conference on Image Processing (ICIP), 2014, to appear.
[20] L. Smith and M. Elad, "Improving dictionary learning: Multiple dictionary updates and coefficient reuse," IEEE Signal Processing Letters, vol. 20, no. 1, pp. 79–82, Jan. 2013.
[21] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, 1st ed. Springer Publishing Company, Incorporated, 2010.
[22] D. J. Lingenfelter, J. A. Fessler, and Z. He, "Sparsity regularization for image reconstruction with Poisson data," Proc. SPIE, vol. 7246, pp. 72460F-1–72460F-10, 2009.
[23] H. Burger and S. Harmeling, "Improving denoising algorithms via a multi-scale meta-procedure," in Pattern Recognition, ser. Lecture Notes in Computer Science, R. Mester and M. Felsberg, Eds. Springer Berlin Heidelberg, 2011, vol. 6835, pp. 206–215.
[24] C.-A. Deledalle, F. Tupin, and L. Denis, "Poisson NL means: Unsupervised non local means for Poisson noise," in ICIP 2010, Sept. 2010, pp. 801–804.
[25] A. Buades, B. Coll, and J. Morel, "A review of image denoising algorithms, with a new one," Multiscale Model. Simul., vol. 4, no. 2, pp. 490–530, 2005.
[26] M. Protter, M. Elad, H. Takeda, and P. Milanfar, "Generalizing the nonlocal-means to super-resolution reconstruction," IEEE Trans. Image Process., vol. 18, no. 1, pp. 36–51, Jan. 2009.
[27] K. Engan, S. Aase, and J. Hakon Husoy, "Method of optimal directions for frame design," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 5, 1999, pp. 2443–2446.
[28] K. Engan, K. Skretting, and J. H. Husoy, "Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation," Digital Signal Processing, vol. 17, no. 1, pp. 32–49, 2007.
[29] R. Giryes, M. Elad, and Y. C. Eldar, "The projected GSURE for automatic parameter tuning in iterative shrinkage methods," Applied and Computational Harmonic Analysis, vol. 30, no. 3, pp. 407–422, 2011.