Removing Gaussian Noise by Optimization of Weights in Non-Local Means

A new image denoising algorithm to deal with the additive Gaussian white noise model is given. Like the non-local means method, the filter is based on the weighted average of the observations in a neighborhood, with weights depending on the similarity of local patches. But in contrast to the non-local means filter, instead of using a fixed Gaussian kernel, we propose to choose the weights by minimizing a tight upper bound of the mean square error. This approach makes it possible to define weights adapted to the function at hand, mimicking the weights of the oracle filter. Under some regularity conditions on the target image, we show that the obtained estimator converges at the usual optimal rate. The proposed algorithm is parameter free in the sense that it automatically calculates the bandwidth of the smoothing kernel; it is fast and its implementation is straightforward. The performance of the new filter is illustrated by numerical simulations.

Authors: Qiyu Jin, Ion Grama, Quansheng Liu

Qiyu Jin^{a,b}, Ion Grama^{a,b}, Quansheng Liu^{a,b}
qiyu.jin@univ-ubs.fr, ion.grama@univ-ubs.fr, quansheng.liu@univ-ubs.fr
a) Université de Bretagne-Sud, Campus de Tohaninic, BP 573, 56017 Vannes, France
b) Université Européenne de Bretagne, France

Keywords: non-local means, image denoising, optimization of weights, oracle, statistical estimation.

1 Introduction

We deal with the additive Gaussian noise model

    Y(x) = f(x) + ε(x),   x ∈ I,                                    (1)

where I is a uniform N × N grid of pixels on the unit square, Y = (Y(x))_{x∈I} is the observed image brightness, f : [0,1]² → R₊ is an unknown target regression function, and ε = (ε(x))_{x∈I} are independent and identically distributed (i.i.d.) Gaussian random variables with mean 0 and standard deviation σ > 0.
Important denoising techniques for the model (1) have been developed in recent years; see for example Buades, Coll and Morel (2005 [1]), Kervrann (2006 [10]), Lou, Zhang, Osher and Bertozzi (2010 [14]), Polzehl and Spokoiny (2006 [17]), Garnett, Huegerich and Chui (2005 [8]), Cai, Chan and Nikolova (2008 [3]), Katkovnik, Foi, Egiazarian and Astola (2010 [9]), Dabov, Foi, Katkovnik and Egiazarian (2006 [2]). A significant step in these developments was the introduction of the Non-Local Means filter by Buades, Coll and Morel [1] and its variants (see e.g. [10], [11], [14]). In these filters, the basic idea is to estimate the unknown image f(x₀) by a weighted average of the form

    f̃_w(x₀) = Σ_{x∈I} w(x) Y(x),                                    (2)

where w = (w(x))_{x∈I} are some non-negative weights satisfying Σ_{x∈I} w(x) = 1. The choice of the weights w is based essentially on two criteria: a local criterion, so that the weights are a decreasing function of the distance to the estimated pixel, and a non-local criterion, which gives more important weights to the pixels whose brightness is close to the brightness of the estimated pixel (see e.g. Yaroslavsky (1985 [25]) and Tomasi and Manduchi (1998 [23])). The non-local approach has been further completed by a fruitful idea which consists in attaching small regions, called data patches, to each pixel and comparing these data patches instead of the pixels themselves. The methods based on the non-local criterion constitute a comparatively novel direction which is less studied in the literature. In this paper we shall address two problems related to this criterion.

The first problem is how to choose the data-dependent weights w in (2) in some optimal way.
Generally, the weights w are defined through some a priori fixed kernel, often the Gaussian one, and the important problem of the choice of the kernel has not been addressed so far for the non-local approach. Although the choice of the Gaussian kernel seems to show reasonable numerical performance, there is no particular reason to restrict ourselves only to this type of kernel. Our theoretical results and the accompanying simulations show that another kernel should be preferred. In addition, for the obtained optimal kernel we shall also be interested in deriving a locally adaptive rule for the bandwidth choice.

The second problem that we shall address is the convergence of the obtained filter to the true image. Insights can be found in [1], [10], [11] and [13]; however, the problem of convergence of the Non-Local Means Filter has not been completely settled so far. In this paper, we shall give some new elements of the proof of the convergence of the constructed filter, thereby giving a theoretical justification of the proposed approach from the asymptotic point of view.

Our main idea is to produce a very tight upper bound of the mean square error

    R(f̃_w(x₀)) = E(f̃_w(x₀) − f(x₀))²

in terms of the bias and variance, and to minimize this upper bound in w under the constraints w ≥ 0 and Σ_{x∈I} w(x) = 1. In contrast to the usual approach where a specific class of target functions is considered, here we give a bound of the bias depending only on the target function f at hand, instead of using a bound expressed in terms of the parameters of the class. We first obtain an explicit formula for the optimal weights w* in terms of the unknown function f. In order to get a computable filter, we estimate w* by some adaptive weights ŵ based on data patches from the observed image Y.
We thus obtain a new filter, which we call the Optimal Weights Filter. To justify our filter theoretically, we prove that it achieves the optimal rate of convergence under some regularity conditions on f. Numerical results show that the Optimal Weights Filter outperforms the typical Non-Local Means Filter, thus giving a practical justification that the optimal choice of the kernel improves the quality of the denoising, all other conditions being the same.

We would like to point out that related optimization problems for nonparametric signal and density recovery have been proposed earlier in Sacks and Ylvisaker (1978 [22]), Roll (2003 [19]), Roll and Ljung (2004 [20]), Roll, Nazin and Ljung (2005 [21]), Nazin, Roll, Ljung and Grama (2008 [15]). In these papers the weights are optimized over a given class of regular functions and thus depend only on some parameters of the class. This approach corresponds to the minimax setting, where the resulting minimax estimator has the best rate of convergence corresponding to the worst image in the given class of images. If the image happens to have better regularity than the worst one, the minimax estimator will exhibit a slower rate of convergence than expected. The novelty of our work is to find the optimal weights depending on the image f at hand, which implies that our Optimal Weights Filter automatically attains the optimal rate of convergence for each particular image f. Results of this type are related to the "oracle" concept developed in Donoho and Johnstone (1994 [6]). Filters with data-dependent weights have been previously studied in many papers, among which we mention Polzehl and Spokoiny (2000 [18], 2003 [16], 2006 [17]) and Kervrann (2006 [10] and 2007 [12]).
Compared with these filters, our algorithm is straightforward to implement and gives a quality of denoising which is close to that of the best recent methods (see Table 2). The weight optimization approach can also be applied with these algorithms to improve them. In particular, we can use it with recent versions of the Non-Local Means Filter, like BM3D (see 2006 [2], 2007 [4, 5]); however, this is beyond the scope of the present paper and will be done elsewhere.

The paper is organized as follows. Our new filter, based on the optimization of weights, is introduced in Section 2, where we present the main idea and the algorithm. Our main theoretical results are presented in Section 3, where we give the rate of convergence of the constructed estimators. In Section 4, we present our simulation results with a brief analysis. Proofs of the main results are deferred to Section 5.

To conclude this section, let us set some important notations to be used throughout the paper. The Euclidean norm of a vector x = (x₁, ..., x_d) ∈ R^d is denoted by ‖x‖₂ = (Σ_{i=1}^d x_i²)^{1/2}. The supremum norm of x is denoted by ‖x‖_∞ = sup_{1≤i≤d} |x_i|. The cardinality of a set A is denoted card A. For a positive integer N, the uniform N × N grid of pixels on the unit square is defined by

    I = {1/N, 2/N, ..., (N−1)/N, 1}².                                (3)

Each element x of the grid I will be called a pixel. The number of pixels is n = N². For any pixel x₀ ∈ I and a given h > 0, the square window of pixels

    U_{x₀,h} = {x ∈ I : ‖x − x₀‖_∞ ≤ h}                              (4)

will be called the search window at x₀. We naturally take h as a multiple of 1/N (h = k/N for some k ∈ {1, 2, ..., N}). The size of the square search window U_{x₀,h} is the positive integer M = nh² = card U_{x₀,h}.
For any pixel x ∈ U_{x₀,h} and a given η > 0, a second square window of pixels

    V_{x,η} = {y ∈ I : ‖y − x‖_∞ ≤ η}                                (5)

will be called, for short, a patch window at x, in order to be distinguished from the search window U_{x₀,h}. Like h, the parameter η is also taken as a multiple of 1/N. The size of the patch window V_{x,η} is the positive integer m = nη² = card V_{x,η}. The vector Y_{x,η} = (Y(y))_{y∈V_{x,η}} formed by the values of the observed noisy image Y at pixels in the patch V_{x,η} will be called simply the data patch at x ∈ U_{x₀,h}. Finally, the positive part of a real number a is denoted by a₊, that is, a₊ = a if a ≥ 0 and a₊ = 0 if a < 0.

2 Construction of the estimator

Let h > 0 be fixed. For any pixel x₀ ∈ I, consider a family of weighted estimates f̃_{h,w}(x₀) of the form

    f̃_{h,w}(x₀) = Σ_{x∈U_{x₀,h}} w(x) Y(x),                          (6)

where the unknown weights satisfy

    w(x) ≥ 0  and  Σ_{x∈U_{x₀,h}} w(x) = 1.                          (7)

The usual bias plus variance decomposition of the mean square error gives

    E(f̃_{h,w}(x₀) − f(x₀))² = Bias² + Var,                           (8)

with

    Bias² = (Σ_{x∈U_{x₀,h}} w(x)(f(x) − f(x₀)))²  and  Var = σ² Σ_{x∈U_{x₀,h}} w(x)².

The decomposition (8) is commonly used to construct asymptotically minimax estimators over some given classes of functions in nonparametric function estimation. In order to highlight the difference between the approach proposed in the present paper and previous work, suppose that f belongs to the class of functions satisfying the Hölder condition

    |f(x) − f(y)| ≤ L‖x − y‖_∞^β,   ∀x, y ∈ I.

In this case, it is easy to see that

    E(f̃_{h,w}(x₀) − f(x₀))² ≤ (Σ_{x∈U_{x₀,h}} w(x) L‖x − x₀‖_∞^β)² + σ² Σ_{x∈U_{x₀,h}} w(x)².   (9)

Optimizing further the weights w in the obtained upper bound gives an asymptotically minimax estimate with weights depending on the unknown parameters L and β (for details see [22]). With our approach, the bias term Bias² will be bounded in terms of the unknown function f itself. As a result, we obtain some "oracle" weights w adapted to the unknown function f at hand, which will be estimated further using data patches from the image Y.

First, we shall address the problem of determining the "oracle" weights. With this aim, denote

    ρ_{f,x₀}(x) ≡ |f(x) − f(x₀)|.                                    (10)

Note that the value ρ_{f,x₀}(x) characterizes the variation of the image brightness of the pixel x with respect to the pixel x₀. From the decomposition (8), we easily obtain a tight upper bound in terms of the vector ρ_{f,x₀}:

    E(f̃_{h,w}(x₀) − f(x₀))² ≤ g_{ρ_{f,x₀}}(w),                        (11)

where

    g_{ρ_{f,x₀}}(w) = (Σ_{x∈U_{x₀,h}} w(x) ρ_{f,x₀}(x))² + σ² Σ_{x∈U_{x₀,h}} w(x)².   (12)

From the following theorem, we can obtain the form of the weights w which minimize the function g_{ρ_{f,x₀}}(w) under the constraints (7), in terms of the values ρ_{f,x₀}(x). For the sake of generality, we shall formulate the result for an arbitrary non-negative function ρ(x), x ∈ U_{x₀,h}. Define the objective function

    g_ρ(w) = (Σ_{x∈U_{x₀,h}} w(x) ρ(x))² + σ² Σ_{x∈U_{x₀,h}} w(x)².   (13)

Introduce the strictly increasing function

    M_ρ(t) = Σ_{x∈U_{x₀,h}} ρ(x)(t − ρ(x))₊,   t ≥ 0.                (14)

Let K_tr be the usual triangular kernel:

    K_tr(t) = (1 − |t|)₊,   t ∈ R.                                   (15)

Theorem 1 Assume that ρ(x), x ∈ U_{x₀,h}, is a non-negative function.
Then the unique weights which minimize g_ρ(w) subject to (7) are given by

    w_ρ(x) = K_tr(ρ(x)/a) / Σ_{y∈U_{x₀,h}} K_tr(ρ(y)/a),   x ∈ U_{x₀,h},      (16)

where the bandwidth a > 0 is the unique solution on (0, ∞) of the equation

    M_ρ(a) = σ².                                                     (17)

Theorem 1 can be obtained from a result of Sacks and Ylvisaker [22]. The proof is deferred to Section 5.1.

Remark 2 The value of a > 0 can be calculated as follows. We sort the set {ρ(x) : x ∈ U_{x₀,h}} in ascending order 0 = ρ₁ ≤ ρ₂ ≤ ... ≤ ρ_M < ρ_{M+1} = +∞, where M = card U_{x₀,h}. Let

    a_k = (σ² + Σ_{i=1}^k ρ_i²) / (Σ_{i=1}^k ρ_i),   1 ≤ k ≤ M,      (18)

and

    k* = max{1 ≤ k ≤ M : a_k ≥ ρ_k} = min{1 ≤ k ≤ M : a_k < ρ_k} − 1,   (19)

with the convention that a_k = ∞ if ρ_k = 0 and that min ∅ = M + 1. Then the solution a > 0 of (17) can be expressed as a = a_{k*}; moreover, k* is the unique integer k ∈ {1, ..., M} such that a_k ≥ ρ_k and, if k < M, a_{k+1} < ρ_{k+1}.

The proof of the remark is deferred to Section 5.2.

Let x₀ ∈ I. Using the optimal weights given by Theorem 1, we first introduce the following non-computable approximation of the true image, called the "oracle":

    f*_h(x₀) = Σ_{x∈U_{x₀,h}} K_tr(ρ_{f,x₀}(x)/a) Y(x) / Σ_{y∈U_{x₀,h}} K_tr(ρ_{f,x₀}(y)/a),   (20)

where the bandwidth a is the solution of the equation M_{ρ_{f,x₀}}(a) = σ². A computable filter can be obtained by estimating the unknown function ρ_{f,x₀}(x) and the bandwidth a from the data as follows. Let h > 0 and η > 0 be fixed numbers. For any x₀ ∈ I and any x ∈ U_{x₀,h}, consider a distance between the data patches Y_{x,η} = (Y(y))_{y∈V_{x,η}} and Y_{x₀,η} = (Y(y))_{y∈V_{x₀,η}} defined by

    d²(Y_{x,η}, Y_{x₀,η}) = (1/m) ‖Y_{x,η} − Y_{x₀,η}‖²₂,

where m = card V_{x,η} and ‖Y_{x,η} − Y_{x₀,η}‖²₂ = Σ_{x₀+z∈V_{x₀,η}} (Y(x+z) − Y(x₀+z))².
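Remark 2's recipe for the bandwidth can be turned into a short numerical routine. The sketch below (with a hypothetical helper name `bandwidth`, not code from the paper) sorts the values ρ(x), scans the candidates a_k of (18), and returns a_{k*}; one can check that the result indeed solves M_ρ(a) = σ².

```python
import numpy as np

def bandwidth(rho, sigma):
    """Solve M_rho(a) = sigma^2 as in Remark 2: sort the rho values,
    form a_k = (sigma^2 + sum_{i<=k} rho_i^2) / sum_{i<=k} rho_i,
    and keep a_{k*} for the largest k with a_k >= rho_k."""
    r = np.sort(np.asarray(rho, dtype=float))
    a = np.inf  # convention: a_k = infinity while all rho_i = 0
    for k in range(1, len(r) + 1):
        s1 = r[:k].sum()
        if s1 == 0:
            continue                     # a_k = infinity >= rho_k, keep going
        a_k = (sigma ** 2 + (r[:k] ** 2).sum()) / s1
        if a_k >= r[k - 1]:
            a = a_k                      # a_k is still admissible
        else:
            break                        # first k with a_k < rho_k: stop
    return a
```

For example, with ρ values {0, 1, 2} and σ = 1, the scan yields a = 2, and indeed M_ρ(2) = 0·2 + 1·(2−1) + 2·(2−2) = 1 = σ².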
Since Buades, Coll and Morel [1], the distance d²(Y_{x,η}, Y_{x₀,η}) has been known to be a flexible tool to measure the variations of the brightness of the image Y. As

    Y(x+z) − Y(x₀+z) = f(x+z) − f(x₀+z) + ε(x+z) − ε(x₀+z),

we have

    E(Y(x+z) − Y(x₀+z))² = (f(x+z) − f(x₀+z))² + 2σ².

If we use the approximation (f(x+z) − f(x₀+z))² ≈ (f(x) − f(x₀))² = ρ²_{f,x₀}(x) and the law of large numbers, it seems reasonable that ρ²_{f,x₀}(x) ≈ d²(Y_{x,η}, Y_{x₀,η}) − 2σ². But our simulations show that a much better approximation is

    ρ_{f,x₀}(x) ≈ ρ̂_{x₀}(x) = (d(Y_{x,η}, Y_{x₀,η}) − √2 σ)₊.        (21)

The fact that ρ̂_{x₀}(x) is a good estimator of ρ_{f,x₀} will be justified by convergence theorems: cf. Theorems 4 and 5 of Section 3. Thus our Optimal Weights Filter is defined by

    f̂(x₀) = f̂_{h,η}(x₀) = Σ_{x∈U_{x₀,h}} K_tr(ρ̂_{x₀}(x)/â) Y(x) / Σ_{y∈U_{x₀,h}} K_tr(ρ̂_{x₀}(y)/â),   (22)

where the bandwidth â > 0 is the solution of the equation M_{ρ̂_{x₀}}(â) = σ², which can be calculated as in Remark 2 (with ρ(x) and a replaced by ρ̂_{x₀}(x) and â respectively).

We end this section by giving an algorithm for computing the filter (22). The input values of the algorithm are the image Y(x), x ∈ I, the standard deviation σ of the noise, and two numbers m and M representing the sizes of the patch window and the search window respectively.

Algorithm: Optimal Weights Filter
Repeat for each x₀ ∈ I:
    give an initial value of â: â = 1 (it can be an arbitrary positive number)
    compute {ρ̂_{x₀}(x) : x ∈ U_{x₀,h}} by (21)
    // compute the bandwidth â at x₀
    reorder {ρ̂_{x₀}(x) : x ∈ U_{x₀,h}} as an increasing sequence, say
        ρ̂_{x₀}(x₁) ≤ ρ̂_{x₀}(x₂) ≤ ... ≤ ρ̂_{x₀}(x_M)
    loop from k = 1 to M:
        if Σ_{i=1}^k ρ̂_{x₀}(x_i) > 0:
            if (σ² + Σ_{i=1}^k ρ̂²_{x₀}(x_i)) / (Σ_{i=1}^k ρ̂_{x₀}(x_i)) ≥ ρ̂_{x₀}(x_k) then
                â = (σ² + Σ_{i=1}^k ρ̂²_{x₀}(x_i)) / (Σ_{i=1}^k ρ̂_{x₀}(x_i))
            else quit loop
        else continue loop
    end loop
    // compute the estimated weights ŵ at x₀
    compute ŵ(x_i) = (1 − ρ̂_{x₀}(x_i)/â)₊ / Σ_{x_j∈U_{x₀,h}} (1 − ρ̂_{x₀}(x_j)/â)₊
    // compute the filter f̂ at x₀
    compute f̂(x₀) = Σ_{x_i∈U_{x₀,h}} ŵ(x_i) Y(x_i)

The proposed algorithm is computationally fast and its implementation is straightforward compared to more sophisticated algorithms developed in recent years. Notice that an important issue in the non-local means filter is the choice of the bandwidth parameter in the Gaussian kernel; our algorithm is parameter free in the sense that it automatically chooses the bandwidth. The numerical simulations show that our filter outperforms the classical non-local means filter under the same conditions. The overall performance of the proposed filter compared to its simplicity is very good, which can be a big advantage in some practical applications. We hope that the optimal weights that we deduced can be useful in more complicated algorithms and can give similar improvements of the denoising quality. However, these investigations are beyond the scope of the present paper. A detailed analysis of the performance of our filter is given in Section 4.

3 Main results

In this section, we present two theoretical results. The first result is a mathematical justification of the "oracle" filter introduced in the previous section.
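The algorithm above can be sketched at a single pixel in a few lines of Python/NumPy. This is an illustration under simplifying assumptions (a small grayscale array, mirrored borders, hypothetical names `owf_pixel`, `hp`, `hs` for the patch and search radii), not the authors' implementation.

```python
import numpy as np

def owf_pixel(Y, i0, j0, sigma, hp=1, hs=3):
    """Optimal Weights Filter (22) at one pixel (i0, j0): estimate rho by
    the patch distance (21), solve M_rho(a) = sigma^2 as in Remark 2,
    and average with triangular-kernel weights (16)."""
    Yp = np.pad(Y, hp + hs, mode="reflect")      # mirror the borders
    i0, j0 = i0 + hp + hs, j0 + hp + hs
    ref = Yp[i0 - hp:i0 + hp + 1, j0 - hp:j0 + hp + 1]
    m = (2 * hp + 1) ** 2
    rhos, vals = [], []
    for di in range(-hs, hs + 1):                # scan the search window
        for dj in range(-hs, hs + 1):
            i, j = i0 + di, j0 + dj
            patch = Yp[i - hp:i + hp + 1, j - hp:j + hp + 1]
            d = np.sqrt(((patch - ref) ** 2).sum() / m)
            rhos.append(max(d - np.sqrt(2.0) * sigma, 0.0))  # rho-hat, eq. (21)
            vals.append(Yp[i, j])
    rhos, vals = np.array(rhos), np.array(vals)
    # bandwidth a-hat solving M_rho(a) = sigma^2 (Remark 2)
    r = np.sort(rhos)
    a = np.inf
    for k in range(1, len(r) + 1):
        s1 = r[:k].sum()
        if s1 == 0:
            continue
        a_k = (sigma ** 2 + (r[:k] ** 2).sum()) / s1
        if a_k >= r[k - 1]:
            a = a_k
        else:
            break
    # triangular-kernel weights (16) and weighted average (22)
    w = np.clip(1.0 - rhos / a, 0.0, None)
    return float((w * vals).sum() / w.sum())
```

On a perfectly flat region all ρ̂ vanish, the bandwidth is formally infinite, and the weights degenerate to a uniform average over the search window, which is the correct limiting behavior.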
It shows that, despite the fact that we minimized an upper bound of the mean square error instead of the mean square error itself, the obtained "oracle" still has the optimal rate of convergence. Moreover, we show that the weights optimization approach possesses the following important adaptivity property: our procedure automatically chooses the correct bandwidth a > 0 even if the radius h > 0 of the search window U_{x₀,h} is larger than necessary. The second result shows the convergence of the Optimal Weights Filter f̂_{h,η} under somewhat more restrictive conditions than those formulated in Section 2. To prove the convergence, we split the image into two independent parts. From the first one, we construct the "oracle" filter; from the second one, we estimate the weights. Under some regularity assumptions on the target image, we are able to show that the resulting filter has nearly the optimal rate of convergence.

Let ρ(x), x ∈ U_{x₀,h}, be an arbitrary non-negative function and let w_ρ be the optimal weights given by (16). Using these weights w_ρ, we define the family of estimates

    f*_h(x₀) = Σ_{x∈U_{x₀,h}} w_ρ(x) Y(x)                            (23)

depending on the unknown function ρ. The next theorem shows that one can pick a useful estimate from the family f*_h if the function ρ is close to the "true" function ρ_{f,x₀}(x) = |f(x) − f(x₀)|, i.e. if

    ρ(x) = |f(x) − f(x₀)| + δ_n,                                     (24)

where δ_n ≥ 0 is a small deterministic error. We shall prove the convergence of the estimate f*_h under the local Hölder condition

    |f(x) − f(y)| ≤ L‖x − y‖_∞^β,   ∀x, y ∈ U_{x₀,h},                (25)

where β > 0 is a constant, h > 0, and x₀ ∈ I. In the following, c_i > 0 (i ≥ 1) denotes a positive constant, and O(a_n) (n ≥ 1) denotes a number bounded by c · a_n for some constant c > 0.
All the constants c_i > 0 and c > 0 depend only on L, β and σ; their values can be different from line to line.

Theorem 3 Assume that h = c₁ n^{−1/(2β+2)} with c₁ > c₀ = (σ²(β+2)(2β+2) / (8L²β))^{1/(2β+2)}, or h ≥ c₁ n^{−α} with 0 ≤ α < 1/(2β+2) and c₁ > 0. Suppose that f satisfies the local Hölder condition (25) and that δ_n = O(n^{−β/(2+2β)}). Then

    E(f*_h(x₀) − f(x₀))² = O(n^{−2β/(2+2β)}).                        (26)

The proof will be given in Section 5.3. Recall that a bandwidth h of order n^{−1/(2+2β)} is required to obtain the optimal minimax rate of convergence O(n^{−2β/(2+2β)}) of the mean squared error for estimating a function f of global Hölder smoothness β (cf. e.g. [7]).

To better understand the adaptivity property of the oracle f*_h(x₀), assume that the image f at x₀ has Hölder smoothness β (see [24]) and that h ≥ c₀ n^{−α} with 0 ≤ α < 1/(2β+2), which means that the radius h > 0 of the search window U_{x₀,h} has been chosen larger than the "standard" n^{−1/(2β+2)}. Then, by Theorem 3, the rate of convergence of the oracle is still of order n^{−β/(2+2β)}, contrary to the global case mentioned above. If we choose a sufficiently large search window U_{x₀,h}, then the oracle f*_h(x₀) will have a rate of convergence which depends only on the unknown maximal local smoothness β of the image f. In particular, if β is very large, then the rate will be close to n^{−1/2}, which ensures good estimation of flat regions in cases where the regions are indeed flat. More generally, since Theorem 3 is valid for arbitrary β, it applies for the maximal local Hölder smoothness β_{x₀} at x₀; therefore the oracle f*_h(x₀) will exhibit the best rate of convergence of order n^{−2β_{x₀}/(2+2β_{x₀})} at x₀. In other words, the procedure adapts to the best rate of convergence at each point x₀ of the image.
We justify by simulation results that the difference between the oracle f*_h computed with ρ = ρ_{f,x₀}(x) = |f(x) − f(x₀)| and the true image f is extremely small (see Table 1). This shows that, at least from the practical point of view, it is justified to optimize the upper bound g_{ρ_{f,x₀}}(w) instead of optimizing the mean square error E(f*_h(x₀) − f(x₀))² itself. The estimate f*_h with the choice ρ(x) = ρ_{f,x₀}(x) will be called the oracle filter. In particular, for the oracle filter f*_h, under the conditions of Theorem 3, we have

    E(f*_h(x₀) − f(x₀))² ≤ g_ρ(w_ρ) ≤ c n^{−2β/(2+2β)}.

Now, we turn to the study of the convergence of the Optimal Weights Filter. Due to the difficulty in dealing with the dependence of the weights, we shall consider a slightly modified version of the proposed algorithm: we divide the set of pixels into two independent parts, so that the weights are constructed from one part, and the estimate of the target function is a weighted mean over the other part. More precisely, assume that x₀ ∈ I, h > 0 and η > 0. To prove the convergence, we split the set of pixels into two parts I = I′_{x₀} ∪ I″_{x₀}, where

    I′_{x₀} = {x₀ + (i/N, j/N) ∈ I : i + j is even}                   (27)

is the set of pixels with an even sum of coordinates i + j, and I″_{x₀} = I \ I′_{x₀}. Denote U′_{x₀,h} = U_{x₀,h} ∩ I′_{x₀} and V″_{x,η} = V_{x,η} ∩ I″_{x₀}. Consider the distance between the data patches Y″_{x,η} = (Y(y))_{y∈V″_{x,η}} and Y″_{x₀,η} = (Y(y))_{y∈V″_{x₀,η}} defined by

    d(Y″_{x,η}, Y″_{x₀,η}) = (1/√m″) ‖Y″_{x,η} − Y″_{x₀,η}‖₂,

where m″ = card V″_{x,η}. An estimate of the function ρ_{f,x₀} is given by

    ρ_{f,x₀}(x) ≈ ρ̂″_{x₀}(x) = (d(Y″_{x,η}, Y″_{x₀,η}) − √2 σ)₊,      (28)

see (21).
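The checkerboard split (27) is easy to visualize: I′ collects the pixels whose coordinate sum i + j is even, and I″ the rest, so the two parts interleave like the black and white squares of a chessboard. A small sketch (hypothetical helper name, with indices taken relative to the grid origin rather than to x₀ for simplicity):

```python
import numpy as np

def split_even_odd(N):
    """Boolean masks for the checkerboard split of an N x N grid:
    I' = {(i, j) : i + j even}, I'' = its complement, as in (27)."""
    ii, jj = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    even = (ii + jj) % 2 == 0
    return even, ~even
```

The two masks are disjoint and cover the grid, which is what makes the weights (built from I″) independent of the averaged observations (taken over I′).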
Define the filter f̂′_{h,η} by

    f̂′_{h,η}(x₀) = Σ_{x∈U′_{x₀,h}} ŵ″(x) Y(x),                        (29)

where

    ŵ″ = arg min_w [ (Σ_{x∈U′_{x₀,h}} w(x) ρ̂″_{x₀}(x))² + σ² Σ_{x∈U′_{x₀,h}} w²(x) ].   (30)

The next theorem gives a rate of convergence of the Optimal Weights Filter if the parameters h > 0 and η > 0 are chosen properly according to the local smoothness β.

Theorem 4 Assume that h = c₁ n^{−1/(2β+2)} with c₁ > c₀ = (σ²(β+2)(2β+2) / (8L²β))^{1/(2β+2)}, and that η = c₂ n^{−1/(2β+2)}. Suppose that the function f satisfies the local Hölder condition (25). Then

    E(f̂′_{h,η}(x₀) − f(x₀))² = O(n^{−2β/(2β+2)} ln n).               (31)

For the proof of this theorem see Section 5.4. Theorem 4 states that with the proper choices of the parameters h and η, the mean square error of the estimator f̂′_{h,η}(x₀) converges nearly at the rate O(n^{−2β/(2β+2)}), which is the usual optimal rate of convergence for a given Hölder smoothness β > 0 (cf. e.g. [7]). Simulation results show that the adaptive bandwidth â provided by our algorithm depends essentially on the local properties of the image and does not depend much on the radius h of the search window. These simulations, together with Theorem 3, suggest that the Optimal Weights Filter (22) can also be applied with larger h, as is the case for the "oracle" filter f*_h. The following theorem deals with the case where h is large.

Theorem 5 Assume that h = c₁ n^{−α} with c₁ > 0 and 0 < α ≤ 1/(2β+2), and that η = c₂ n^{−1/(2β+2)}. Suppose that the function f satisfies the local Hölder condition (25). Then

    E(f̂′_{h,η}(x₀) − f(x₀))² = O(n^{−β/(2β+2)} ln n).

For the proof of this theorem see Section 5.5.
Note that in this case the obtained rate of convergence is not the usual optimal one, in contrast to Theorems 3 and 4, but we believe that this is the best rate that can be obtained for the proposed filter.

4 Numerical performance of the Optimal Weights Filter

The performance of the Optimal Weights Filter f̂_{h,η}(x₀) is measured by the usual Peak Signal-to-Noise Ratio (PSNR) in decibels (dB), defined as

    PSNR = 10 log₁₀ (255² / MSE),   MSE = (1 / card I) Σ_{x∈I} (f(x) − f̂_{h,η}(x))²,

where f is the original image and f̂ the estimated one. In the simulations, we shall sometimes use a smoothed version d_K(Y_{x,η}, Y_{x₀,η}) of the estimate of brightness variation instead of the non-smoothed one d(Y_{x,η}, Y_{x₀,η}). It should be noted that for the smoothed versions of the estimated brightness variation we can establish similar convergence results. The smoothed estimate d_K(Y_{x,η}, Y_{x₀,η}) is defined by

    d_K(Y_{x,η}, Y_{x₀,η}) = ‖K(y) · (Y_{x,η} − Y_{x₀,η})‖₂ / (Σ_{y′∈V_{x₀,η}} K(y′))^{1/2},

where K are some weights defined on V_{x₀,η}. The corresponding estimate of the brightness variation ρ_{f,x₀}(x) is given by

    ρ̂_{K,x₀}(x) = (d_K(Y_{x,η}, Y_{x₀,η}) − √2 σ)₊.                  (32)

With the rectangular kernel

    K_r(y) = 1 if y ∈ V_{x₀,η},  0 otherwise,                        (33)

we obtain exactly the distance d(Y_{x,η}, Y_{x₀,η}) and the filter described in Section 2.
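For reference, the PSNR of Section 4 can be computed as follows (a minimal sketch; the function name `psnr` is ours):

```python
import numpy as np

def psnr(f, f_hat):
    """PSNR in decibels for 8-bit images, as defined in Section 4:
    PSNR = 10 * log10(255^2 / MSE), with MSE the mean squared error."""
    f = np.asarray(f, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    mse = np.mean((f - f_hat) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

As a sanity check, an estimate off by the full dynamic range (MSE = 255²) gives 0 dB, and an estimate with MSE = 255²/100 gives exactly 20 dB.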
Other smoothing kernels K used in the simulations are the Gaussian kernel

    K_g(y) = exp(−N²‖y − x₀‖²₂ / (2h_g)),                            (34)

where h_g is the bandwidth parameter, and the following kernel:

    K₀(y) = Σ_{k=N‖y−x₀‖_∞}^{p} 1/(2k+1)²   if y ≠ x₀,
    K₀(y) = Σ_{k=1}^{p} 1/(2k+1)²            if y = x₀,               (35)

with the width of the similarity window m = (2p+1)². The shapes of these two kernels are displayed in Figure 1.

Images        Lena         Barbara      Boat         House        Peppers
Sizes         512×512      512×512      512×512      256×256      256×256
σ/PSNR        10/28.12dB   10/28.12dB   10/28.12dB   10/28.11dB   10/28.11dB
11×11         41.20dB      40.06dB      40.23dB      41.50dB      40.36dB
13×13         41.92dB      40.82dB      40.99dB      42.24dB      41.01dB
15×15         42.54dB      41.48dB      41.62dB      42.85dB      41.53dB
17×17         43.07dB      42.05dB      42.79dB      43.38dB      41.99dB
σ/PSNR        20/22.11dB   20/22.11dB   20/22.11dB   20/28.12dB   20/28.12dB
11×11         37.17dB      35.92dB      36.23dB      37.18dB      36.25dB
13×13         37.91dB      36.70dB      37.01dB      37.97dB      36.85dB
15×15         38.57dB      37.37dB      37.65dB      38.59dB      37.38dB
17×17         39.15dB      37.95dB      38.22dB      39.11dB      37.80dB
σ/PSNR        30/18.60dB   30/18.60dB   30/18.60dB   30/18.61dB   30/18.61dB
11×11         34.81dB      33.65dB      33.79dB      34.93dB      33.57dB
13×13         35.57dB      34.47dB      34.58dB      35.78dB      34.23dB
15×15         36.24dB      35.15dB      35.25dB      36.48dB      34.78dB
17×17         36.79dB      35.75dB      35.84dB      37.07dB      35.26dB

Table 1: PSNR values when the oracle estimator f*_h is applied with different values of M.

Figure 1: The shape of the kernels K_g (left) and K₀ (right) with M = 21 × 21.

To avoid undesirable border effects in our simulations, we mirror the image outside the image limits, that is, we extend the image outside the image limits symmetrically with respect to the border. At the corners, the image is extended symmetrically with respect to the corner pixels.
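The kernel K₀ of (35) gives a pixel at integer Chebyshev distance d from the patch centre the tail sum Σ_{k=d}^{p} 1/(2k+1)², the centre itself being treated like d = 1. A small sketch (hypothetical helper name, not the authors' code):

```python
import numpy as np

def kernel_K0(p):
    """The kernel K_0 of (35) on a (2p+1) x (2p+1) patch window:
    weight of a pixel at Chebyshev distance d from the centre is
    sum_{k=d}^{p} 1/(2k+1)^2, with d replaced by 1 at the centre."""
    K = np.zeros((2 * p + 1, 2 * p + 1))
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            d = max(abs(i), abs(j), 1)  # centre treated as d = 1
            K[i + p, j + p] = sum(1.0 / (2 * k + 1) ** 2
                                  for k in range(d, p + 1))
    return K
```

The resulting weights decay with distance from the centre but much more slowly than a Gaussian, which matches the flatter profile of K₀ visible in Figure 1.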
We have done simulations on a commonly-used set of images available at http://decsai.ugr.es/javier/denoise/test images/ which includes Lena, Barbara, Boat, House and Peppers. The potential of the estimation method is illustrated with the 512×512 images "Lena" (Figure 2(a)) and "Barbara" (Figure 3(a)), corrupted by an additive white Gaussian noise (Figure 2(b), PSNR = 22.10 dB, σ = 20, and Figure 3(b), PSNR = 18.60 dB, σ = 30). We first used the rectangular kernel K_r for computing the estimated brightness variation function ρ̂_{K,x₀}, which corresponds to the Optimal Weights Filter as defined in Section 2. Empirically, we found that the parameters m and M can be fixed to m = 21×21 and M = 13×13. In Figures 2(c) and 3(c), we can see that the noise is reduced in a natural manner and that significant geometric features, fine textures and original contrasts are visually well recovered with no undesirable artifacts (PSNR = 32.52 dB for "Lena" and PSNR = 28.89 dB for "Barbara"). To better appreciate the accuracy of the restoration process, the square of the difference between the original image and the recovered image is shown in Figures 2(d) and 3(d), where the dark values correspond to a high-confidence estimate. As expected, pixels with a low level of confidence are located in the neighborhood of image discontinuities. For comparison, we show the images denoised by the Non-Local Means Filter in Figures 2(e),(f) and 3(e),(f). The overall visual impression and the numerical results are improved by our algorithm.

The Optimal Weights Filter seems to provide a feasible and rational method to detect automatically the details of images and take the proper weights for every possible geometric configuration of the image.
For illustration purposes, we have chosen a series of search windows $U_{x_0,h}$ with centers at some testing pixels $x_0$ on the noisy image; see Figure 4. The distribution of the weights inside the search window $U_{x_0,h}$ depends on the estimated brightness variation function $\widehat\rho_{K,x_0}(x)$, $x \in U_{x_0,h}$. If the estimated brightness variation $\widehat\rho_{K,x_0}(x)$ is less than $\widehat a$ (see Theorem 1), the similarity between pixels is measured by a linearly decreasing function of $\widehat\rho_{K,x_0}(x)$; otherwise it is zero. Thus $\widehat a$ acts as an automatic threshold. Figure 5 shows how the Optimal Weights Filter chooses a proper weight configuration in each case.

The best numerical results are obtained using $K = K_g$ and $K = K_0$ in the definition of $\widehat\rho_{K,x_0}$. In Table 2, we compare the Non-Local Means Filter and the Optimal Weights Filter with different choices of the kernel: $K = K_g, K_0, K_r$. The best PSNR values we obtained by varying the size $m$ of the similarity windows and the size $M$ of the search windows are reported in Tables 3 ($\sigma = 10$), 4 ($\sigma = 20$) and 5 ($\sigma = 30$) for $K = K_0$. Note that the PSNR values are close for every $m$ and $M$, and that the optimal $m$ and $M$ depend on the image content. The values $m = 21 \times 21$ and $M = 13 \times 13$ seem appropriate in most cases; a smaller patch size $m$ can be considered for processing piecewise smooth images.

5 Proofs of the main results

5.1 Proof of Theorem 1

We begin with some preliminary results. The following lemma can be obtained from Theorem 1 of Sacks and Ylvisaker [22]. For the convenience of the reader, we prefer to give a direct proof adapted to our situation.

Lemma 6. Let $g_\rho(w)$ be defined by (13). Then there are unique weights $w_\rho$ which minimize $g_\rho(w)$ subject to (7), given by
\[
w_\rho(x) = \frac{1}{\sigma^2}\,\bigl(b - \lambda\rho(x)\bigr)_+, \qquad (36)
\]

Figure 2: Results of denoising the $512 \times 512$ image "Lena": (a) original image; (b) noisy image with $\sigma = 20$, PSNR = 22.11 db; (c) restored with OWF, PSNR = 32.52 db; (d) square error with OWF; (e) restored with NLMF, PSNR = 31.73 db; (f) square error with NLMF. Comparing (d) and (f) we see that the Optimal Weights Filter (OWF) captures more details than the Non-Local Means Filter (NLMF).

Figure 3: Results of denoising the $512 \times 512$ image "Barbara": (a) original image; (b) noisy image with $\sigma = 30$, PSNR = 18.60 db; (c) restored with OWF, PSNR = 28.89 db; (d) square error with OWF; (e) restored with NLMF, PSNR = 27.88 db; (f) square error with NLMF. Comparing (d) and (f) we see that the OWF captures more details than the NLMF.

Figure 4: The noisy image with six selected search windows with centers at pixels a, b, c, d, e, f.

Images          Lena        Barbara     Boat        House       Peppers
Sizes           512×512     512×512     512×512     256×256     256×256
σ/PSNR          10/28.12db  10/28.12db  10/28.12db  10/28.11db  10/28.11db
OWF with K_r    35.23db     33.89db     33.07db     35.57db     33.74db
OWF with K_g    35.49db     34.13db     33.40db     35.83db     33.97db
OWF with K_0    35.52db     34.10db     33.48db     35.80db     33.96db
NLMF            35.03db     33.77db     32.85db     35.43db     33.27db
σ/PSNR          20/22.11db  20/22.11db  20/22.11db  20/22.11db  20/22.11db
OWF with K_r    32.24db     30.71db     29.65db     32.59db     30.17db
OWF with K_g    32.61db     31.01db     30.05db     32.88db     30.44db
OWF with K_0    32.52db     31.00db     30.20db     32.90db     30.66db
NLMF            31.73db     30.36db     29.58db     32.51db     30.11db
σ/PSNR          30/18.60db  30/18.60db  30/18.60db  30/18.61db  30/18.61db
OWF with K_r    30.26db     28.59db     27.69db     30.49db     27.93db
OWF with K_g    30.66db     28.97db     28.05db     30.81db     28.16db
OWF with K_0    30.50db     28.89db     28.23db     30.80db     28.49db
NLMF            29.56db     27.88db     27.50db     30.02db     27.77db

Table 2: Comparison between the Non-Local Means Filter (NLMF) and the Optimal Weights Filter (OWF).
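The optimal weights compared in Table 2 have, for the triangular kernel, the closed form of Theorem 1: $w(x) \propto (a - \rho(x))_+$, with the threshold $a$ determined by the sorting procedure of Remark 2. A sketch of that computation (the function name and the array-based interface are ours; in practice the $\rho_i$ would be the estimated brightness variations inside a search window, and the weights would be mapped back to pixels via the sorting permutation):

```python
import numpy as np

def optimal_weights(rho, sigma):
    """Optimal weights in the sense of Remark 2: sort the rho_i, take the
    largest k with a_k >= rho_k, where
        a_k = (sigma^2 + sum_{i<=k} rho_i^2) / sum_{i<=k} rho_i,
    then set w_i proportional to (a - rho_i)_+ (triangular kernel shape).
    Returns the weights aligned with the sorted rho values, and a."""
    rho = np.sort(np.asarray(rho, dtype=float))
    assert rho[-1] > 0, "at least one rho_i must be positive"
    cum_r, cum_r2 = np.cumsum(rho), np.cumsum(rho ** 2)
    with np.errstate(divide="ignore"):
        a_k = (sigma ** 2 + cum_r2) / cum_r   # a_k = +inf while cum_r == 0
    k_star = np.max(np.nonzero(a_k >= rho)[0])
    a = a_k[k_star]
    w = np.maximum(a - rho, 0.0)              # zero beyond the threshold a
    return w / w.sum(), a
```

Sorting dominates the cost, so the threshold and the weights for a search window of $M$ pixels are obtained in $O(M \log M)$ time; pixels whose brightness variation exceeds $a$ automatically receive zero weight.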
Figure 5: These pictures show how the Optimal Weights Filter detects the features of the image by choosing appropriate weights. The first column displays six selected search windows used to estimate the image at the corresponding central pixels a, b, c, d, e and f. The second column displays the corresponding search windows corrupted by a Gaussian noise with standard deviation $\sigma = 20$. The third column displays the two-dimensional representation of the weights used to estimate the central pixels. The fourth column gives the three-dimensional representation of the weights. The fifth column gives the restored images.
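The estimated brightness variation that drives the weight maps in Figure 5 is obtained from the patch distance after removing the noise contribution: since $E\,d^2 = \Delta^2 + 2\sigma^2$, subtracting $\sigma\sqrt{2}$ from $d$ isolates the underlying signal difference. A small numerical sketch of this bias correction (the flat synthetic patches and all parameter values are our own, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 20.0          # noise standard deviation
m = 21 * 21           # number of pixels in a similarity patch
delta = 15.0          # true brightness difference between the two patches

f0 = np.full(m, 100.0)                        # flat patch around x0
fx = f0 + delta                               # flat patch around x
Y0 = f0 + rng.normal(0.0, sigma, m)           # noisy observations
Yx = fx + rng.normal(0.0, sigma, m)

d = np.sqrt(np.mean((Yx - Y0) ** 2))          # patch distance d(Y''_x, Y''_x0)
rho_hat = max(d - sigma * np.sqrt(2.0), 0.0)  # estimated brightness variation
# E d^2 = delta^2 + 2*sigma^2, so rho_hat > 0 signals a real brightness change
```

For two patches with identical underlying brightness ($\delta = 0$), $d$ concentrates around $\sigma\sqrt{2}$ and the positive-part truncation sends the estimate to (or near) zero, which is exactly the automatic-threshold behavior described above.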
σ = 10
m / M             Lena      Barbara   Boat      House     Peppers
                  512×512   512×512   512×512   256×256   256×256
11×11 / 11×11     35.35db   34.03db   33.43db   35.69db   34.16db
13×13 / 11×11     35.40db   34.06db   33.45db   35.72db   34.14db
15×15 / 11×11     35.44db   34.07db   33.47db   35.73db   34.10db
17×17 / 11×11     35.47db   34.08db   33.47db   35.74db   34.06db
19×19 / 11×11     35.50db   34.07db   33.48db   35.74db   34.02db
21×21 / 11×11     35.52db   34.06db   33.47db   35.73db   33.97db
11×11 / 13×13     35.35db   34.08db   33.43db   35.77db   34.15db
13×13 / 13×13     35.40db   34.11db   33.46db   35.79db   34.12db
15×15 / 13×13     35.44db   34.12db   33.47db   35.80db   34.09db
17×17 / 13×13     35.47db   34.12db   33.48db   35.81db   34.05db
19×19 / 13×13     35.50db   34.12db   33.48db   35.81db   34.01db
21×21 / 13×13     35.52db   34.10db   33.48db   35.80db   33.96db
11×11 / 15×15     35.33db   34.11db   33.43db   35.82db   34.14db
13×13 / 15×15     35.39db   34.13db   33.45db   35.84db   34.11db
15×15 / 15×15     35.43db   34.14db   33.47db   35.85db   34.08db
17×17 / 15×15     35.47db   34.14db   33.48db   35.86db   34.04db
19×19 / 15×15     35.49db   34.14db   33.48db   35.85db   34.00db
21×21 / 15×15     35.52db   34.12db   33.48db   35.84db   33.96db
11×11 / 17×17     35.32db   34.13db   33.42db   35.86db   34.12db
13×13 / 17×17     35.37db   34.15db   33.44db   35.88db   34.10db
15×15 / 17×17     35.42db   34.16db   33.46db   35.89db   34.07db
17×17 / 17×17     35.46db   34.16db   33.47db   35.89db   34.03db
19×19 / 17×17     35.48db   34.15db   33.47db   35.88db   34.00db
21×21 / 17×17     35.51db   34.14db   33.47db   35.87db   33.95db

Table 3: PSNR values when the Optimal Weights Filter with $K = K_0$ is applied with different values of $m$ and $M$ ($\sigma = 10$).
σ = 20
m / M             Lena      Barbara   Boat      House     Peppers
                  512×512   512×512   512×512   256×256   256×256
11×11 / 11×11     32.08db   30.60db   30.00db   32.56db   30.65db
13×13 / 11×11     32.20db   30.70db   30.06db   32.64db   30.68db
15×15 / 11×11     32.30db   30.78db   30.11db   32.71db   30.70db
17×17 / 11×11     32.39db   30.84db   30.15db   32.76db   30.70db
19×19 / 11×11     32.47db   30.88db   30.18db   32.79db   30.70db
21×21 / 11×11     32.53db   30.91db   30.21db   32.81db   30.69db
11×11 / 13×13     32.06db   30.67db   29.99db   32.63db   30.61db
13×13 / 13×13     32.18db   30.78db   30.05db   32.71db   30.64db
15×15 / 13×13     32.29db   30.86db   30.10db   32.79db   30.66db
17×17 / 13×13     32.38db   30.92db   30.14db   32.84db   30.67db
19×19 / 13×13     32.46db   30.97db   30.18db   32.88db   30.67db
21×21 / 13×13     32.52db   31.00db   30.20db   32.90db   30.66db
11×11 / 15×15     32.02db   30.71db   29.97db   32.67db   30.56db
13×13 / 15×15     32.15db   30.82db   30.03db   32.76db   30.59db
15×15 / 15×15     32.26db   30.90db   30.08db   32.83db   30.62db
17×17 / 15×15     32.35db   30.96db   30.12db   32.89db   30.63db
19×19 / 15×15     32.43db   31.01db   30.16db   32.92db   30.63db
21×21 / 15×15     32.50db   31.04db   30.19db   32.94db   30.63db
11×11 / 17×17     31.97db   30.72db   29.94db   32.70db   30.52db
13×13 / 17×17     32.10db   30.83db   30.00db   32.79db   30.56db
15×15 / 17×17     32.22db   30.92db   30.05db   32.86db   30.58db
17×17 / 17×17     32.32db   30.98db   30.10db   32.92db   30.59db
19×19 / 17×17     32.40db   31.02db   30.13db   32.96db   30.60db
21×21 / 17×17     32.47db   31.06db   30.17db   32.98db   30.60db

Table 4: PSNR values when the Optimal Weights Filter with $K = K_0$ is applied with different values of $m$ and $M$ ($\sigma = 20$).
σ = 30
m / M             Lena      Barbara   Boat      House     Peppers
                  512×512   512×512   512×512   256×256   256×256
11×11 / 11×11     29.96db   28.38db   27.96db   30.26db   28.36db
13×13 / 11×11     30.10db   28.53db   28.03db   30.39db   28.43db
15×15 / 11×11     30.23db   28.65db   28.10db   30.50db   28.47db
17×17 / 11×11     30.34db   28.75db   28.15db   30.58db   28.50db
19×19 / 11×11     30.43db   28.83db   28.20db   30.65db   28.51db
21×21 / 11×11     30.50db   28.81db   28.23db   30.70db   28.52db
11×11 / 13×13     29.94db   28.42db   27.95db   30.35db   28.30db
13×13 / 13×13     30.08db   28.58db   28.02db   30.49db   28.37db
15×15 / 13×13     30.21db   28.70db   28.09db   30.60db   28.42db
17×17 / 13×13     30.32db   28.80db   28.14db   30.68db   28.46db
19×19 / 13×13     30.42db   28.88db   28.19db   30.75db   28.48db
21×21 / 13×13     30.50db   28.89db   28.23db   30.80db   28.49db
11×11 / 15×15     29.89db   28.43db   27.92db   30.39db   28.23db
13×13 / 15×15     30.04db   28.58db   27.99db   30.53db   28.30db
15×15 / 15×15     30.17db   28.71db   28.06db   30.64db   28.36db
17×17 / 15×15     30.28db   28.81db   28.11db   30.73db   28.40db
19×19 / 15×15     30.38db   28.89db   28.16db   30.80db   28.43db
11×11 / 17×17     29.82db   28.40db   27.89db   30.39db   28.18db
13×13 / 17×17     29.98db   28.56db   27.96db   30.54db   28.26db
15×15 / 17×17     30.11db   28.69db   28.02db   30.66db   28.31db
17×17 / 17×17     30.22db   28.79db   28.08db   30.76db   28.36db
19×19 / 17×17     30.33db   28.87db   28.13db   30.84db   28.39db
21×21 / 17×17     30.42db   28.96db   28.17db   30.89db   28.41db

Table 5: PSNR values when the Optimal Weights Filter with $K = K_0$ is applied with different values of $m$ and $M$ ($\sigma = 30$).

where $b$ and $\lambda$ are determined by
\[
\sum_{x \in U_{x_0,h}} \frac{1}{\sigma^2}\bigl(b - \lambda\rho(x)\bigr)_+ = 1, \qquad (37)
\]
\[
\sum_{x \in U_{x_0,h}} \frac{1}{\sigma^2}\bigl(b - \lambda\rho(x)\bigr)_+\,\rho(x) = \lambda. \qquad (38)
\]
Proof. Let $w'$ be a minimizer of $g_\rho(w)$ under the constraint (7).
According to Theorem 3.9 of Whittle (1971, [24]), there are Lagrange multipliers $b \ge 0$ and $b_0(x) \ge 0$, $x \in U_{x_0,h}$, such that the function
\[
G(w) = g_\rho(w) - 2b\Bigl(\sum_{x \in U_{x_0,h}} w(x) - 1\Bigr) - 2\sum_{x \in U_{x_0,h}} b_0(x)\, w(x)
\]
is minimized at the same point $w'$. Since the function $G$ is strictly convex, it admits a unique point of minimum. This implies that there is also a unique minimizer of $g_\rho(w)$ under the constraint (7), which coincides with the unique minimizer of $G$. Let $w_\rho$ be the unique minimizer of $G$ satisfying the constraint (7). Again using the fact that $G$ is strictly convex, for any $x \in U_{x_0,h}$,
\[
\frac{\partial}{\partial w(x)} G(w)\Big|_{w = w_\rho} = 2\Bigl(\sum_{y \in U_{x_0,h}} w_\rho(y)\rho(y)\Bigr)\rho(x) + 2\sigma^2 w_\rho(x) - 2b - 2b_0(x) \ge 0. \qquad (39)
\]
Note that in general we do not have an equality in (39). In addition, by the Karush-Kuhn-Tucker condition,
\[
b_0(x)\, w_\rho(x) = 0. \qquad (40)
\]
Let
\[
\lambda = \sum_{y \in U_{x_0,h}} w_\rho(y)\,\rho(y). \qquad (41)
\]
Then (39) becomes
\[
\frac{1}{2}\,\frac{\partial}{\partial w(x)} G(w)\Big|_{w = w_\rho} = \lambda\rho(x) + \sigma^2 w_\rho(x) - b - b_0(x) \ge 0, \qquad x \in U_{x_0,h}. \qquad (42)
\]
If $b_0(x) = 0$, then, with respect to the single variable $w(x)$, the function $G(w)$ attains its minimum at an interior point $w_\rho(x) \ge 0$, so that we have
\[
\frac{1}{2}\,\frac{\partial}{\partial w(x)} G(w)\Big|_{w = w_\rho} = \lambda\rho(x) + \sigma^2 w_\rho(x) - b = 0.
\]
From this we obtain $b - \lambda\rho(x) = \sigma^2 w_\rho(x) \ge 0$, so
\[
w_\rho(x) = \frac{\bigl(b - \lambda\rho(x)\bigr)_+}{\sigma^2}.
\]
If $b_0(x) > 0$, by (40) we have $w_\rho(x) = 0$. Consequently, from (42) we have
\[
b - \lambda\rho(x) \le -b_0(x) \le 0, \qquad (43)
\]
so that we get again $w_\rho(x) = 0 = (b - \lambda\rho(x))_+/\sigma^2$. As to the conditions (37) and (38), they follow immediately from the constraint (7) and the equation (41).

Proof of Theorem 1.
Applying Lemma 6 with $b = \lambda a$, we see that the unique optimal weights $w$ minimizing $g_\rho(w)$ subject to (7) are given by
\[
w_\rho(x) = \frac{\lambda}{\sigma^2}\,\bigl(a - \rho(x)\bigr)_+, \qquad (44)
\]
where $a$ and $\lambda$ satisfy
\[
\lambda \sum_{x \in U_{x_0,h}} \bigl(a - \rho(x)\bigr)_+ = \sigma^2 \qquad (45)
\]
and
\[
\sum_{x \in U_{x_0,h}} \bigl(a - \rho(x)\bigr)_+\,\rho(x) = \sigma^2. \qquad (46)
\]
Since the function
\[
M_\rho(t) = \sum_{x \in U_{x_0,h}} \bigl(t - \rho(x)\bigr)_+\,\rho(x)
\]
is strictly increasing and continuous with $M_\rho(0) = 0$ and $\lim_{t\to\infty} M_\rho(t) = +\infty$, the equation $M_\rho(a) = \sigma^2$ has a unique solution on $(0,\infty)$. By (45),
\[
\frac{\sigma^2}{\lambda} = \sum_{x \in U_{x_0,h}} \bigl(a - \rho(x)\bigr)_+,
\]
which together with (44) implies (16) and (17).

5.2 Proof of Remark 2

Expression (14) can be rewritten as
\[
M_\rho(t) = \sum_{i=1}^{M} \rho_i\,(t - \rho_i)_+. \qquad (47)
\]
Since the function $M_\rho(t)$ is strictly increasing with $M_\rho(0) = 0$ and $M_\rho(+\infty) = +\infty$, the equation (17) admits a unique solution $a$ on $(0,+\infty)$, which must be located in some interval $[\rho_{k_0}, \rho_{k_0+1})$, $1 \le k_0 \le M$, where $\rho_{M+1} = \infty$ (see Figure 6). Hence the equation (17) becomes
\[
\sum_{i=1}^{k_0} \rho_i\,(a - \rho_i) = \sigma^2, \qquad (48)
\]
where $\rho_{k_0} \le a < \rho_{k_0+1}$. From (48) it follows that
\[
a = \frac{\sigma^2 + \sum_{i=1}^{k_0} \rho_i^2}{\sum_{i=1}^{k_0} \rho_i}, \qquad \rho_{k_0} \le a < \rho_{k_0+1}. \qquad (49)
\]
We now show that $k_0 = k^*$ (so that $a = a_{k_0} = a_{k^*}$), where
\[
k^* := \max\{1 \le k \le M \mid a_k \ge \rho_k\}.
\]
To this end, it suffices to verify that $a_{k_0} \ge \rho_{k_0}$ and $a_k < \rho_k$ if $k_0 < k \le M$. We have already seen that $a_{k_0} \ge \rho_{k_0}$; if $k_0 < k \le M$, then $a_{k_0} < \rho_{k_0+1} \le \rho_k$, so that
\[
a_k = \frac{\bigl(\sigma^2 + \sum_{i=1}^{k_0} \rho_i^2\bigr) + \sum_{i=k_0+1}^{k} \rho_i^2}{\sum_{i=1}^{k} \rho_i}
= \frac{a_{k_0}\sum_{i=1}^{k_0} \rho_i + \sum_{i=k_0+1}^{k} \rho_i^2}{\sum_{i=1}^{k} \rho_i}
< \frac{\rho_k \sum_{i=1}^{k_0} \rho_i + \sum_{i=k_0+1}^{k} \rho_k\rho_i}{\sum_{i=1}^{k} \rho_i} = \rho_k. \qquad (50)
\]
We finally prove that if $1 \le k < M$ and $a_k < \rho_k$, then $a_{k+1} < \rho_{k+1}$, so that the last equality in (19) holds and $k^*$ is the unique integer $k \in \{1,\dots,M\}$ such that $a_k \ge \rho_k$ and $a_{k+1} < \rho_{k+1}$ if $1 \le k < M$.
In fact, for $1 \le k < M$, the inequality $a_k < \rho_k$ implies that
\[
\sigma^2 + \sum_{i=1}^{k} \rho_i^2 < \rho_k \sum_{i=1}^{k} \rho_i.
\]

Figure 6: The number axis of the $\rho_i$, $i = 1, 2, \dots, M$: the solution $a$ lies in the interval $[\rho_{k_0}, \rho_{k_0+1})$.

This, in turn, implies that
\[
a_{k+1} = \frac{\sigma^2 + \sum_{i=1}^{k} \rho_i^2 + \rho_{k+1}^2}{\sum_{i=1}^{k+1} \rho_i}
< \frac{\rho_k \sum_{i=1}^{k} \rho_i + \rho_{k+1}^2}{\sum_{i=1}^{k+1} \rho_i} \le \rho_{k+1}.
\]

5.3 Proof of Theorem 3

First assume that $\rho(x) = \rho_{f,x_0}(x) = |f(x) - f(x_0)|$. Recall that $g_\rho$ and $w_\rho$ were defined by (13) and (16). Using Hölder's condition (25) we have, for any $w$,
\[
g_\rho(w_\rho) \le g_\rho(w) \le \bar g(w),
\]
where
\[
\bar g(w) = \Bigl(\sum_{x \in U_{x_0,h}} w(x)\,L\|x - x_0\|_\infty^\beta\Bigr)^2 + \sigma^2 \sum_{x \in U_{x_0,h}} w^2(x).
\]
In particular, denoting $\bar w = \arg\min_w \bar g(w)$, we get $g_\rho(w_\rho) \le \bar g(\bar w)$. By Theorem 1,
\[
\bar w(x) = \bigl(\bar a - L\|x - x_0\|_\infty^\beta\bigr)_+ \Big/ \sum_{y \in U_{x_0,h}} \bigl(\bar a - L\|y - x_0\|_\infty^\beta\bigr)_+,
\]
where $\bar a > 0$ is the unique solution on $(0,\infty)$ of the equation $M_h(\bar a) = \sigma^2$, with
\[
M_h(t) = \sum_{x \in U_{x_0,h}} L\|x - x_0\|_\infty^\beta \bigl(t - L\|x - x_0\|_\infty^\beta\bigr)_+, \qquad t \ge 0.
\]
Theorem 3 will be a consequence of the following lemma.

Lemma 7. Assume that $\rho(x) = L\|x - x_0\|_\infty^\beta$ and that $h \ge c_1 n^{-\alpha}$ with $0 \le \alpha < \frac{1}{2\beta+2}$, or $h = c_1 n^{-\frac{1}{2\beta+2}}$ with $c_1 > c_0 = \bigl(\frac{\sigma^2(\beta+2)(2\beta+2)}{8L^2\beta}\bigr)^{\frac{1}{2\beta+2}}$. Then
\[
\bar a = c_3 n^{-\frac{\beta}{2\beta+2}}(1 + o(1)) \qquad (51)
\]
and
\[
\bar g(\bar w) \le c_4 n^{-\frac{2\beta}{2\beta+2}}, \qquad (52)
\]
where $c_3$ and $c_4$ are positive constants depending only on $\beta$, $L$ and $\sigma$.

Proof. We first prove (51) in the case where $h = 1$, i.e. $U_{x_0,h} = I$. Then by the definition of $\bar a$ we have
\[
M_1(\bar a) = \sum_{x \in I} \bigl(\bar a - L\|x - x_0\|_\infty^\beta\bigr)_+ L\|x - x_0\|_\infty^\beta = \sigma^2. \qquad (53)
\]
Let $\bar h = (\bar a / L)^{1/\beta}$. Then $\bar a - L\|x - x_0\|_\infty^\beta \ge 0$ if and only if $\|x - x_0\|_\infty \le \bar h$. So from (53) we get
\[
L^2 \bar h^\beta \sum_{\|x - x_0\|_\infty \le \bar h} \|x - x_0\|_\infty^\beta - L^2 \sum_{\|x - x_0\|_\infty \le \bar h} \|x - x_0\|_\infty^{2\beta} = \sigma^2.
\]
(54)

By the definition of the neighborhood $U_{x_0,\bar h}$ it is easily seen that
\[
\sum_{\|x - x_0\|_\infty \le \bar h} \|x - x_0\|_\infty^\beta = 8 N^{-\beta} \sum_{k=1}^{N\bar h} k^{\beta+1} = \frac{8 N^2 \bar h^{\beta+2}}{\beta+2}\,(1 + o(1))
\]
and
\[
\sum_{\|x - x_0\|_\infty \le \bar h} \|x - x_0\|_\infty^{2\beta} = 8 N^{-2\beta} \sum_{k=1}^{N\bar h} k^{2\beta+1} = \frac{8 N^2 \bar h^{2\beta+2}}{2\beta+2}\,(1 + o(1)).
\]
Therefore, (54) implies
\[
\frac{8 L^2 \beta}{(\beta+2)(2\beta+2)}\, N^2 \bar h^{2\beta+2}\,(1 + o(1)) = \sigma^2,
\]
from which we infer that
\[
\bar h = c_0 n^{-\frac{1}{2\beta+2}}(1 + o(1)) \qquad (55)
\]
with $c_0 = \bigl(\frac{\sigma^2(\beta+2)(2\beta+2)}{8L^2\beta}\bigr)^{\frac{1}{2\beta+2}}$. From (55) and the definition of $\bar h$, we obtain
\[
\bar a = L\bar h^\beta = L c_0^\beta n^{-\frac{\beta}{2\beta+2}}(1 + o(1)),
\]
which proves (51) in the case $h = 1$.

We next prove (51) under the conditions of the lemma. If $h \ge c_1 n^{-\alpha}$, where $0 \le \alpha < \frac{1}{2\beta+2}$, then it is clear that $h \ge \bar h$ for $n$ sufficiently large. Therefore $M_h(\bar a) = M_1(\bar a)$, thus we arrive at equation (53), from which we deduce (55). If $h = c_1 n^{-\frac{1}{2\beta+2}}$ with $c_1 > c_0$, then again $h \ge \bar h$ for $n$ sufficiently large. Therefore $M_h(\bar a) = M_1(\bar a)$, and we arrive again at (55).

We finally prove (52). Denote for brevity
\[
G_h = \sum_{\|x - x_0\|_\infty \le h} \bigl(\bar h^\beta - \|x - x_0\|_\infty^\beta\bigr)_+.
\]
Since $h \ge \bar h$ for $n$ sufficiently large, we have $M_h(\bar a) = M_{\bar h}(\bar a) = \sigma^2$ and $G_h = G_{\bar h}$. Then it is easy to see that
\[
\bar g(\bar w) = \frac{\sigma^2 \Bigl( M_{\bar h}(\bar a) + \sum_{\|x - x_0\|_\infty \le \bar h} \bigl((\bar a - L\|x - x_0\|_\infty^\beta)_+\bigr)^2 \Bigr)}{L^2 G_{\bar h}^2} = \frac{\sigma^2\, \bar a}{L\, G_{\bar h}}.
\]
Since
\[
G_{\bar h} = \sum_{\|x - x_0\|_\infty \le \bar h} \bigl(\bar h^\beta - \|x - x_0\|_\infty^\beta\bigr) = \bar h^\beta \sum_{1 \le k \le N\bar h} 8k - 8 N^{-\beta} \sum_{1 \le k \le N\bar h} k^{\beta+1} = \frac{4\beta}{\beta+2}\, N^2 \bar h^{\beta+2}(1 + o(1)) = \frac{4\beta}{(\beta+2)\, L^{(\beta+2)/\beta}}\, N^2 \bar a^{(\beta+2)/\beta}(1 + o(1)),
\]
we obtain
\[
\bar g(\bar w) = \frac{\sigma^2(\beta+2)}{4\beta}\, L^{2/\beta}\, \bar a^{-2/\beta}\, N^{-2}\,(1 + o(1)) = c_4 n^{-\frac{2\beta}{2\beta+2}}(1 + o(1)),
\]
where $c_4$ is a positive constant depending only on $\beta$, $L$ and $\sigma$.

Proof of Theorem 3. As $\rho(x) = |f(x) - f(x_0)| + \delta_n$, we have
\[
\Bigl(\sum_{x \in U_{x_0,h}} \bar w(x)\rho(x)\Bigr)^2 = \Bigl(\sum_{x \in U_{x_0,h}} \bar w(x)\,|f(x) - f(x_0)| + \delta_n\Bigr)^2 \le 2\Bigl(\sum_{x \in U_{x_0,h}} \bar w(x)\,|f(x) - f(x_0)|\Bigr)^2 + 2\delta_n^2.
\]
Hence $g_\rho(\bar w) \le 2\bar g(\bar w) + 2\delta_n^2$. So
\[
g_\rho(w_\rho) \le g_\rho(\bar w) \le 2\bar g(\bar w) + 2\delta_n^2.
\]
Therefore, by Lemma 7 and the condition $\delta_n = O\bigl(n^{-\frac{\beta}{2\beta+2}}\bigr)$, we obtain
\[
g_\rho(w_\rho) = O\bigl(n^{-\frac{2\beta}{2\beta+2}}\bigr).
\]
This gives (26).

5.4 Proof of Theorem 4

We begin with a decomposition of $\widehat\rho''_{x_0}(x)$. Note that
\[
\widehat\rho''_{x_0}(x) = \Bigl(d\bigl(Y''_{x,\eta}, Y''_{x_0,\eta}\bigr) - \sigma\sqrt{2}\Bigr)_+ \le \Bigl| d\bigl(Y''_{x,\eta}, Y''_{x_0,\eta}\bigr) - \sigma\sqrt{2} \Bigr|. \qquad (56)
\]
Recall that $M' = \mathrm{card}\, U'_{x_0,h} = nh^2/2$ and $m'' = \mathrm{card}\, V''_{x_0,\eta} = n\eta^2/2$. Let $T_{x_0,x}$ be the translation mapping $T_{x_0,x}\, y = x + (y - x_0)$. Denote $\Delta_{x_0,x}(y) = f(y) - f(T_{x_0,x}\, y)$ and $\zeta(y) = \varepsilon(y) - \varepsilon(T_{x_0,x}\, y)$. Since $Y(y) - Y(T_{x_0,x}\, y) = \Delta_{x_0,x}(y) + \zeta(y)$, it is easy to see that
\[
d\bigl(Y''_{x,\eta}, Y''_{x_0,\eta}\bigr)^2 = \frac{1}{m''} \sum_{y \in V''_{x_0,\eta}} \bigl(\Delta_{x_0,x}(y) + \zeta(y)\bigr)^2 = \Delta^2(x) + S(x) + 2\sigma^2,
\]
where
\[
\Delta^2(x) = \frac{1}{m''} \sum_{y \in V''_{x_0,\eta}} \Delta^2_{x_0,x}(y), \qquad (57)
\]
\[
S(x) = -2 S_1(x) + S_2(x) \qquad (58)
\]
with
\[
S_1(x) = \frac{1}{m''} \sum_{y \in V''_{x_0,\eta}} \Delta_{x_0,x}(y)\,\zeta(y), \qquad S_2(x) = \frac{1}{m''} \sum_{y \in V''_{x_0,\eta}} \bigl(\zeta(y)^2 - 2\sigma^2\bigr).
\]
Notice that $E S_1(x) = E S_2(x) = E S(x) = 0$. Then obviously
\[
d\bigl(Y''_{x,\eta}, Y''_{x_0,\eta}\bigr) - \sigma\sqrt{2} = \sqrt{\Delta^2(x) + S(x) + 2\sigma^2} - \sqrt{2\sigma^2} = \frac{\Delta^2(x) + S(x)}{\sqrt{\Delta^2(x) + S(x) + 2\sigma^2} + \sqrt{2\sigma^2}}. \qquad (59)
\]
First we prove the following lemma.

Lemma 8. Suppose that the function $f$ satisfies the local Hölder condition (25). Then, for any $x \in U'_{x_0,h}$,
\[
\tfrac{1}{3}\rho^2_{f,x_0}(x) - 2L^2\eta^{2\beta} \le \Delta^2(x) \le 3\rho^2_{f,x_0}(x) + 6L^2\eta^{2\beta}.
\]
Proof.
By the decomposition
\[
f(y) - f(T_{x_0,x}(y)) = [f(x_0) - f(x)] + [f(y) - f(x_0)] + [f(x) - f(T_{x_0,x}(y))]
\]
and the inequality $(a+b+c)^2 \le 3(a^2 + b^2 + c^2)$ we obtain
\[
\Delta^2(x) = \frac{1}{m''}\sum_{y \in V''_{x_0,\eta}} \bigl(f(y) - f(T_{x_0,x}(y))\bigr)^2 \le \frac{3}{m''}\sum_{y \in V''_{x_0,\eta}} \bigl(f(x_0) - f(x)\bigr)^2 + \frac{3}{m''}\sum_{y \in V''_{x_0,\eta}} \bigl(f(y) - f(x_0)\bigr)^2 + \frac{3}{m''}\sum_{y \in V''_{x_0,\eta}} \bigl(f(x) - f(T_{x_0,x}(y))\bigr)^2.
\]
By the local Hölder condition (25) this implies
\[
\Delta^2(x) \le 3\bigl(f(x_0) - f(x)\bigr)^2 + 3L^2\eta^{2\beta} + 3L^2\eta^{2\beta},
\]
which gives the upper bound. The lower bound can be proved similarly using the inequality $(a+b+c)^2 \ge \tfrac{1}{3}a^2 - b^2 - c^2$.

We next prove a large deviation inequality for $S(x)$.

Lemma 9. Let $S(x)$ be defined by (58). Then there are two constants $c_1$ and $c_2$ such that for any $0 \le z \le c_1 (m'')^{1/2}$,
\[
P\Bigl(|S(x)| \ge \frac{z}{\sqrt{m''}}\Bigr) \le 2\exp\bigl(-c_2 z^2\bigr).
\]
Proof. Denote $\xi(y) = \zeta(y)^2 - 2\sigma^2 - 2\Delta_{x_0,x}(y)\zeta(y)$. Since $\zeta(y) = \varepsilon(y) - \varepsilon(T_{x_0,x}\, y)$ is a normal random variable with mean $0$ and variance $2\sigma^2$, the random variable $\xi(y)$ has an exponential moment, i.e. there exist two positive constants $t_0$ and $c_3$ depending only on $\beta$, $L$ and $\sigma^2$ such that
\[
\varphi_y(t) = E\, e^{t\xi(y)} \le c_3, \qquad \text{for any } |t| \le t_0.
\]
Let $\psi_y(t) = \ln \varphi_y(t)$ be the cumulant generating function. By Chebyshev's exponential inequality we get
\[
P\bigl\{S(x) > z/\sqrt{m''}\bigr\} \le \exp\Bigl\{-t z\sqrt{m''} + \sum_{y \in V''_{x_0,\eta}} \psi_y(t)\Bigr\},
\]
for any $|t| \le t_0$ and any $z > 0$. By the three-term Taylor expansion, for $|t| \le t_0$,
\[
\psi_y(t) = \psi_y(0) + t\psi'_y(0) + \frac{t^2}{2}\psi''_y(\theta t),
\]
where $|\theta| \le 1$, $\psi_y(0) = 0$, $\psi'_y(0) = E\xi(y) = 0$ and
\[
0 \le \psi''_y(t) = \frac{\varphi''_y(t)\varphi_y(t) - \bigl(\varphi'_y(t)\bigr)^2}{\bigl(\varphi_y(t)\bigr)^2} \le \frac{\varphi''_y(t)}{\varphi_y(t)}.
\]
Since, by Jensen's inequality, $E\, e^{t\xi(y)} \ge e^{t E\xi(y)} = 1$, we obtain the following upper bound:
\[
\psi''_y(t) \le \varphi''_y(t) = E\, \xi^2(y)\, e^{t\xi(y)}.
\]
Using the elementary inequality $x^2 e^x \le e^{3x}$, $x \ge 0$, we have, for $|t| \le t_0/3$,
\[
\psi''_y(t) \le \frac{9}{t_0^2}\, E \Bigl(\frac{t_0}{3}\xi(y)\Bigr)^2 e^{\frac{t_0}{3}\xi(y)} \le \frac{9}{t_0^2}\, E\, e^{t_0 \xi(y)} \le \frac{9 c_3}{t_0^2}.
\]
This implies that for $|t| \le t_0/3$,
\[
0 \le \psi_y(t) \le \frac{9 c_3}{2 t_0^2}\, t^2
\]
and
\[
P\bigl(S(x) > z/\sqrt{m''}\bigr) \le \exp\Bigl\{-t z \sqrt{m''} + \frac{9 c_3}{2 t_0^2}\, m''\, t^2\Bigr\}.
\]
If $t = c_4 z/\sqrt{m''} \le t_0/3$, where $c_4$ is a positive constant, we obtain
\[
P\bigl(S(x) > z/\sqrt{m''}\bigr) \le \exp\Bigl(-c_4 z^2\Bigl(1 - \frac{9 c_3}{2 t_0^2}\, c_4\Bigr)\Bigr).
\]
Choosing $c_4 > 0$ sufficiently small we get
\[
P\bigl(S(x) > z/\sqrt{m''}\bigr) \le \exp\bigl(-c_5 z^2\bigr)
\]
for some constant $c_5 > 0$. In the same way we show that $P\bigl(S(x) < -z/\sqrt{m''}\bigr) \le \exp(-c_5 z^2)$. This proves the lemma.

We next prove that $\widehat\rho''_{x_0}(x)$ is uniformly of order $O\bigl(n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\bigr)$ with probability $1 - O(n^{-2})$, if $h$ has the order $n^{-\frac{1}{2\beta+2}}$.

Lemma 10. Suppose that the function $f$ satisfies the local Hölder condition (25). Assume that $h = c_1 n^{-\frac{1}{2\beta+2}}$ with $c_1 > c_0 = \bigl(\frac{\sigma^2(\beta+2)(2\beta+2)}{8L^2\beta}\bigr)^{\frac{1}{2\beta+2}}$ and that $\eta = c_2 n^{-\frac{1}{2\beta+2}}$. Then there exists a constant $c_3 > 0$ depending only on $\beta$, $L$ and $\sigma$, such that
\[
P\Bigl(\max_{x \in U_{x_0,h}} \widehat\rho''_{x_0}(x) \ge c_3 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Bigr) = O\bigl(n^{-2}\bigr). \qquad (60)
\]
Proof. Using Lemma 9, there are two constants $c_4$, $c_5$ such that, for any $z$ satisfying $0 \le z \le c_4 (m'')^{1/2}$,
\[
P\Bigl(\max_{x \in U'_{x_0,h}} |S(x)| \ge \frac{z}{\sqrt{m''}}\Bigr) \le \sum_{x \in U'_{x_0,h}} P\Bigl(|S(x)| \ge \frac{z}{\sqrt{m''}}\Bigr) \le 2 m'' \exp\bigl(-c_5 z^2\bigr).
\]
Recall that $m'' = n\eta^2/2 = c_7 n^{\frac{2\beta}{2\beta+2}}$. Letting $z = \sqrt{c_6 \log m''}$ and choosing $c_6$ sufficiently large we obtain
\[
P\Bigl(\max_{x \in U'_{x_0,h}} |S(x)| \ge c_8 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Bigr) \le \frac{c_9}{n^2}. \qquad (61)
\]
Using Lemma 8 and the local Hölder condition (25) we have
\[
\Delta^2(x) \le c L^2 h^{2\beta}, \qquad \text{for } x \in U'_{x_0,h}.
\]
From (56) and (59), with probability $1 - O(n^{-2})$, we have
\[
\max_{x \in U'_{x_0,h}} \widehat\rho''_{x_0}(x) \le \max_{x \in U'_{x_0,h}} \frac{\Delta^2(x) + |S(x)|}{\sqrt{\Delta^2(x) + S(x) + 2\sigma^2} + \sqrt{2\sigma^2}} \le \frac{c L^2 h^{2\beta} + c_8 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}}{\sqrt{2\sigma^2}}.
\]
Since $h = O\bigl(n^{-\frac{1}{2\beta+2}}\bigr)$, this gives the desired result.

We then prove that, given $\{Y(x), x \in I''_{x_0}\}$, the conditional expectation of $|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2$ is of order $O\bigl(n^{-\frac{2\beta}{2\beta+2}} \ln n\bigr)$ with probability $1 - O(n^{-2})$.

Lemma 11. Suppose that the conditions of Theorem 4 are satisfied. Then
\[
P\Bigl( E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \ge c\, n^{-\frac{2\beta}{2\beta+2}} \ln n \Bigr) = O(n^{-2}),
\]
where $c > 0$ is a constant depending only on $\beta$, $L$ and $\sigma$.

Proof. By (29) and the independence of the $\varepsilon(x)$, we have
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le \Bigl(\sum_{x \in U'_{x_0,h}} \widehat w''(x)\,\rho_{f,x_0}(x)\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} \widehat w''{}^2(x). \qquad (62)
\]
Since $\rho_{f,x_0}(x) < L h^\beta$, from (62) we get
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le \Bigl(\sum_{x \in U'_{x_0,h}} \widehat w''(x)\, L h^\beta\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} \widehat w''{}^2(x) \le L^2 h^{2\beta} + \sigma^2 \sum_{x \in U'_{x_0,h}} \widehat w''{}^2(x) \le \Bigl(\sum_{x \in U'_{x_0,h}} \widehat w''(x)\,\widehat\rho''_{x_0}(x)\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} \widehat w''{}^2(x) + L^2 h^{2\beta}. \qquad (63)
\]
Let $w_1^* = \arg\min_w g_1(w)$, where
\[
g_1(w) = \Bigl(\sum_{x \in U'_{x_0,h}} w(x)\,\rho_{f,x_0}(x)\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} w^2(x). \qquad (64)
\]
As $\widehat w''$ minimizes the function in (30), from (63) we obtain
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le \Bigl(\sum_{x \in U'_{x_0,h}} w_1^*(x)\,\widehat\rho''_{x_0}(x)\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} w_1^{*2}(x) + L^2 h^{2\beta}. \qquad (65)
\]
By Lemma 10, with probability $1 - O(n^{-2})$ we have
\[
\sum_{x \in U'_{x_0,h}} w_1^*(x)\,\widehat\rho''_{x_0}(x) \le c_1 n^{-\frac{\beta}{2\beta+2}} \sqrt{\ln n}.
\]
Therefore by (65), with probability $1 - O(n^{-2})$,
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le \sigma^2 \sum_{x \in U'_{x_0,h}} w_1^{*2}(x) + c_1^2 n^{-\frac{2\beta}{2\beta+2}} \ln n + L^2 h^{2\beta} \le g_1(w_1^*) + c_1^2 n^{-\frac{2\beta}{2\beta+2}} \ln n + L^2 h^{2\beta}.
\]
This gives the assertion of Lemma 11, as $h^{2\beta} = O\bigl(n^{-\frac{2\beta}{2\beta+2}}\bigr)$ and $g_1(w_1^*) = O\bigl(n^{-\frac{2\beta}{2\beta+2}}\bigr)$, by Lemma 7 with $U'_{x_0,h}$ instead of $U_{x_0,h}$.

Now we are ready to prove Theorem 4.

Proof of Theorem 4. Since the function $f$ satisfies Hölder's condition, by the definition of $g_1(w)$ (cf. (64)) we have
\[
g_1(w) \le \Bigl(\sum_{x \in U'_{x_0,h}} w(x)\, L h^\beta\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} w^2(x) \le L^2 h^{2\beta} + \sigma^2 \le L^2 + \sigma^2,
\]
so that
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le g_1(\widehat w'') \le L^2 + \sigma^2.
\]
Denote by $X$ the conditional expectation in the above display and write $\mathbf{1}\{\cdot\}$ for the indicator function of the set $\{\cdot\}$. Then
\[
E X = E X \cdot \mathbf{1}\bigl\{X \ge c n^{-\frac{2\beta}{2\beta+2}} \ln n\bigr\} + E X \cdot \mathbf{1}\bigl\{X < c n^{-\frac{2\beta}{2\beta+2}} \ln n\bigr\} \le \bigl(L^2 + \sigma^2\bigr)\, P\bigl\{X \ge c n^{-\frac{2\beta}{2\beta+2}} \ln n\bigr\} + c n^{-\frac{2\beta}{2\beta+2}} \ln n.
\]
So applying Lemma 11, we see that
\[
E\bigl|\widehat f'_{h,\eta}(x_0) - f(x_0)\bigr|^2 = E X \le O(n^{-2}) + c n^{-\frac{2\beta}{2\beta+2}} \ln n = O\bigl(n^{-\frac{2\beta}{2\beta+2}} \ln n\bigr).
\]
This proves Theorem 4.

5.5 Proof of Theorem 5

We keep the notations of the previous subsection. The following result gives a two-sided bound for $\widehat\rho''_{x_0}(x)$.

Lemma 12. Suppose that the function $f$ satisfies the local Hölder condition (25). Assume that $h = c_1 n^{-\alpha}$ with $c_1 > 0$ and $\alpha < \frac{1}{2\beta+2}$, and that $\eta = c_2 n^{-\frac{1}{2\beta+2}}$. Then there exist positive constants $c_3$, $c_4$, $c_5$ and $c_6$ depending only on $\beta$, $L$ and $\sigma$, such that
\[
P\Bigl(\max_{x \in U_{x_0,h}} \bigl(\widehat\rho''_{x_0}(x) - c_3 \rho^2_{f,x_0}(x)\bigr) \le c_4 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Bigr) = 1 - O\bigl(n^{-2}\bigr) \qquad (66)
\]
and
\[
P\Bigl(\max_{x \in U_{x_0,h}} \bigl(\rho^2_{f,x_0}(x) - c_5 \widehat\rho''_{x_0}(x)\bigr) \le c_6 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Bigr) = 1 - O\bigl(n^{-2}\bigr). \qquad (67)
\]
Proof.
As in the proof of Lemma 10, we have
\[
P\Bigl(\max_{x \in U'_{x_0,h}} |S(x)| \ge c_7 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Bigr) \le \frac{c_8}{n^2}.
\]
Using Lemma 8, for any $x \in U'_{x_0,h}$,
\[
\tfrac{1}{3}\rho^2_{f,x_0}(x) - 2L^2\eta^{2\beta} \le \Delta^2(x) \le 3\rho^2_{f,x_0}(x) + 6L^2\eta^{2\beta}. \qquad (68)
\]
From (59) we have
\[
d\bigl(Y''_{x,\eta}, Y''_{x_0,\eta}\bigr) - \sigma\sqrt{2} = \frac{\Delta^2(x) + S(x)}{\sqrt{\Delta^2(x) + S(x) + 2\sigma^2} + \sqrt{2\sigma^2}}.
\]
For the upper bound we have, for any $x \in U'_{x_0,h}$,
\[
\widehat\rho''_{x_0}(x) = \Bigl(d\bigl(Y''_{x,\eta}, Y''_{x_0,\eta}\bigr) - \sigma\sqrt{2}\Bigr)_+ \le \frac{3\rho^2_{f,x_0}(x) + 6L^2\eta^{2\beta} + |S(x)|}{\sqrt{2\sigma^2}}.
\]
Therefore, with probability $1 - O(n^{-2})$,
\[
\max_{x \in U'_{x_0,h}} \Bigl(\widehat\rho''_{x_0}(x) - \frac{3\rho^2_{f,x_0}(x)}{\sqrt{2\sigma^2}}\Bigr) \le \frac{6L^2\eta^{2\beta} + c_7 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}}{\sqrt{2\sigma^2}} \le c_8 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}.
\]
For the lower bound we have, for any $x \in U'_{x_0,h}$,
\[
\widehat\rho''_{x_0}(x) = \Bigl(d\bigl(Y''_{x,\eta}, Y''_{x_0,\eta}\bigr) - \sigma\sqrt{2}\Bigr)_+ = \frac{\bigl(\Delta^2(x) + S(x)\bigr)_+}{\sqrt{\Delta^2(x) + S(x) + 2\sigma^2} + \sqrt{2\sigma^2}} \ge \frac{\bigl(\Delta^2(x) + S(x)\bigr)_+}{\sqrt{\Delta^2(x) + c_7 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n} + 2\sigma^2} + \sqrt{2\sigma^2}} \ge c_9 \bigl(\Delta^2(x) + S(x)\bigr)_+ \ge c_9 \bigl(\Delta^2(x) - |S(x)|\bigr).
\]
Taking into account (68), on the set $\bigl\{\max_{x \in U'_{x_0,h}} |S(x)| < c_7 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\bigr\}$,
\[
\widehat\rho''_{x_0}(x) \ge c_9 \Bigl(\tfrac{1}{3}\rho^2_{f,x_0}(x) - 2L^2\eta^{2\beta} - |S(x)|\Bigr) \ge c_{10} \Bigl(\rho^2_{f,x_0}(x) - \eta^{2\beta} - n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Bigr).
\]
Therefore, with probability $1 - O(n^{-2})$,
\[
\max_{x \in U'_{x_0,h}} \Bigl(c_{10}\,\rho^2_{f,x_0}(x) - \widehat\rho''_{x_0}(x)\Bigr) \le c_{10}\Bigl(\eta^{2\beta} + n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Bigr) \le c_{11} n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}.
\]
So the lemma is proved.

We then prove that, given $\{Y(x), x \in I''_{x_0}\}$, the conditional expectation of $|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2$ is of order $O\bigl(n^{-\frac{\beta}{2\beta+2}} \ln n\bigr)$ with probability $1 - O(n^{-2})$.

Lemma 13. Suppose that the conditions of Theorem 5 are satisfied. Then
\[
P\Bigl( E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \ge c\, n^{-\frac{\beta}{2\beta+2}} \ln n \Bigr) = O(n^{-2}),
\]
where $c > 0$ is a constant depending only on $\beta$, $L$ and $\sigma$.

Proof.
By (29) and the independence of the $\varepsilon(x)$, we have
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le \Bigl(\sum_{x \in U'_{x_0,h}} \widehat w''(x)\,\rho_{f,x_0}(x)\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} \widehat w''{}^2(x).
\]
Since, by Lemma 12, with probability $1 - O(n^{-2})$,
\[
\max_{x \in U_{x_0,h}} \bigl(\rho^2_{f,x_0}(x) - c_1 \widehat\rho''_{x_0}(x)\bigr) \le c_2 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n},
\]
we get (with probability $1 - O(n^{-2})$)
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le c_3 \Bigl(\sum_{x \in U'_{x_0,h}} \widehat w''(x)\sqrt{\widehat\rho''_{x_0}(x)}\Bigr)^2 + c_2 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n} + \sigma^2 \sum_{x \in U'_{x_0,h}} \widehat w''{}^2(x). \qquad (69)
\]
A simple truncation argument, using the decomposition
\[
\widehat\rho''_{x_0}(x) = \widehat\rho''_{x_0}(x)\,\mathbf{1}\bigl\{\widehat\rho''_{x_0}(x) \le n^{-\frac{\beta}{2\beta+2}}\bigr\} + \widehat\rho''_{x_0}(x)\,\mathbf{1}\bigl\{\widehat\rho''_{x_0}(x) > n^{-\frac{\beta}{2\beta+2}}\bigr\},
\]
gives
\[
\sum_{x \in U'_{x_0,h}} \widehat w''(x)\sqrt{\widehat\rho''_{x_0}(x)} \le n^{-\frac{1}{2}\frac{\beta}{2\beta+2}} \sum_{x \in U'_{x_0,h}} \widehat w''(x) + n^{\frac{1}{2}\frac{\beta}{2\beta+2}} \sum_{x \in U'_{x_0,h}} \widehat w''(x)\,\widehat\rho''_{x_0}(x) \le n^{-\frac{1}{2}\frac{\beta}{2\beta+2}} + n^{\frac{1}{2}\frac{\beta}{2\beta+2}} \sum_{x \in U'_{x_0,h}} \widehat w''(x)\,\widehat\rho''_{x_0}(x). \qquad (70)
\]
From (69) and (70) one gets
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le c_4 n^{\frac{\beta}{2\beta+2}} \Bigl( \Bigl(\sum_{x \in U'_{x_0,h}} \widehat w''(x)\,\widehat\rho''_{x_0}(x)\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} \widehat w''{}^2(x) \Bigr) + c_5 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}.
\]
Let $w_1^* = \arg\min_w g_1(w)$, where $g_1$ was defined in (64). As $\widehat w''$ minimizes the function in (30), we obtain
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le c_4 n^{\frac{\beta}{2\beta+2}} \Bigl( \Bigl(\sum_{x \in U'_{x_0,h}} w_1^*(x)\,\widehat\rho''_{x_0}(x)\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} w_1^{*2}(x) \Bigr) + c_5 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}. \qquad (71)
\]
By Lemma 12, with probability $1 - O(n^{-2})$,
\[
\max_{x \in U_{x_0,h}} \bigl(\widehat\rho''_{x_0}(x) - c_6 \rho^2_{f,x_0}(x)\bigr) \le c_7 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}.
\]
Therefore, with probability $1 - O(n^{-2})$,
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le c_8 n^{\frac{\beta}{2\beta+2}} \Bigl( \Bigl(\sum_{x \in U'_{x_0,h}} w_1^*(x)\,\rho_{f,x_0}(x)\Bigr)^2 + \sigma^2 \sum_{x \in U'_{x_0,h}} w_1^{*2}(x) \Bigr) + c_9 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n} = c_8 n^{\frac{\beta}{2\beta+2}}\, g_1(w_1^*) + c_9 n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}.
\]
This gives the assertion of Lemma 13, as $g_1(w_1^*) = O\bigl(n^{-\frac{2\beta}{2\beta+2}}\bigr)$ by Lemma 7 with $U'_{x_0,h}$ instead of $U_{x_0,h}$.

Proof of Theorem 5. Since the function $f$ satisfies Hölder's condition, by the definition of $g_1(w)$ (cf. (64)) we have (see the proof of Theorem 4)
\[
g_1(w) \le L^2 + \sigma^2,
\]
so that
\[
E\bigl\{ |\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \mid Y(x),\, x \in I''_{x_0} \bigr\} \le g_1(\widehat w'') \le L^2 + \sigma^2.
\]
Denote by $X$ the conditional expectation in the above display. Then
\[
E X = E X \cdot \mathbf{1}\bigl\{X \ge c n^{-\frac{\beta}{2\beta+2}} \ln n\bigr\} + E X \cdot \mathbf{1}\bigl\{X < c n^{-\frac{\beta}{2\beta+2}} \ln n\bigr\} \le \bigl(L^2 + \sigma^2\bigr)\, P\bigl\{X \ge c n^{-\frac{\beta}{2\beta+2}} \ln n\bigr\} + c n^{-\frac{\beta}{2\beta+2}} \ln n.
\]
So applying Lemma 13, we see that
\[
E\bigl|\widehat f'_{h,\eta}(x_0) - f(x_0)\bigr|^2 = E X \le O(n^{-2}) + c n^{-\frac{\beta}{2\beta+2}} \ln n = O\bigl(n^{-\frac{\beta}{2\beta+2}} \ln n\bigr).
\]
This proves Theorem 5.

6 Conclusion

A new image denoising filter for the additive Gaussian white noise model, based on a weights optimization problem, has been proposed. The proposed algorithm is computationally fast and its implementation is straightforward. Our work leads to the following conclusions.

1. In the non-local means filter the choice of the Gaussian kernel is not justified. Our approach shows that it is preferable to choose the triangular kernel.

2. The obtained estimator is shown to converge at the usual optimal rate, under some regularity conditions on the target function. To the best of our knowledge, such convergence results have not been established so far.

3. Our filter is parameter free in the sense that it chooses automatically the bandwidth parameter.

4.
Our numerical results confirm that an optimal choice of the kernel improves the performance of the non-local means filter, under the same conditions.

References

[1] A. Buades, B. Coll, and J.M. Morel. A review of image denoising algorithms, with a new one. Multiscale Model. Simul., 4(2):490–530, 2006.

[2] T. Buades, Y. Lou, J.M. Morel, and Z. Tang. A note on multi-image denoising. In Int. Workshop on Local and Non-Local Approximation in Image Processing, pages 1–15, August 2009.

[3] J.F. Cai, R.H. Chan, and M. Nikolova. Two-phase approach for deblurring images corrupted by impulse plus Gaussian noise. Inverse Problems and Imaging, 2(2):187–204, 2008.

[4] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance-chrominance space. In IEEE Int. Conf. Image Process., volume 1, pages 313–316, September 2007.

[5] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process., 16(8):2080–2095, 2007.

[6] D.L. Donoho and I.M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.

[7] J.Q. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications. Chapman & Hall, London, 1996.

[8] R. Garnett, T. Huegerich, C. Chui, and W. He. A universal noise removal algorithm with an impulse detector. IEEE Trans. Image Process., 14(11):1747–1754, 2005.

[9] V. Katkovnik, A. Foi, K. Egiazarian, and J. Astola. From local kernel to nonlocal multiple-model image denoising. Int. J. Comput. Vis., 86(1):1–32, 2010.

[10] C. Kervrann and J. Boulanger. Optimal spatial adaptation for patch-based image denoising. IEEE Trans. Image Process., 15(10):2866–2878, 2006.

[11] C. Kervrann and J. Boulanger.
Lo cal adaptivit y to v ariable smo othness for exemplar- based image regularization and represen tation. Int. J. C omput. Vis. , 79( 1 ):45–69, 2008. [12] C. Kervrann, J. Boulanger, and P . Coup ´ e. Ba y esian non-lo cal means filter, image redundancy and adaptive dictionaries for noise remo v al. In Pr o c. Conf. Sc ale-Sp ac e and V ariational Meth. (SSVM’ 07) , pages 520–5 32. Springer, June 2007. [13] B. Li, Q.S. Liu, J.W. Xu, and X.J. Luo. A new metho d for remo ving mixed noises. Sci. China Ser. F (Inf o rmation s c ienc es) , 54 :1 –9, 2010 . [14] Y. Lou, X. Zhang, S. Osher, and A. Berto zzi. Image reco v ery via nonlo cal op erators. J. Sci. Com put. , 42(2):185 – 197, 20 1 0. 34 [15] A.V. Nazin, J. Roll, L. Ljung, and I. Grama. Direct we ight optimization in statistical estimation and system iden tification. System Identific ation and Contr ol Pr oblems (SICPRO08), Mosc ow , Jan uary 28-31 2 008. [16] J. P olzehl and V. Sp ok oiny . Image denoising: point wise adaptiv e approach. Ann. Stat. , 31(1):30– 5 7, 2 003. [17] J. P olzehl and V. Sp ok oiny . Propa gation-separation approac h fo r lo cal lik eliho o d estimation. Pr ob ab. The ory R el. , 135(3):33 5 –362, 20 06. [18] J. Polz ehl and V.G. Sp okoin y . Adaptiv e w eigh ts smo ot hing with applications t o image restoration. J. R oy. Stat. So c. B , 62(2 ) :335–354, 200 0. [19] J. Roll. Lo cal and piecewise affine a pproac hes to system identific atio n. Ph.D. dis- sertation, Dept. Ele ct. Eng., Link¨ o ing Univ e rsity, Link¨ oing, Swe den , 2003. [20] J. Roll and L . Ljung. Extending the direct we ight optimization approac h. In T e chnic al R ep ort LiTH-I S Y-R-2601. Dept. of EE, Link¨ oping Univ., Sw e den , 2004. [21] J. Roll, A. Nazin, and L. Ljung. Nonlinear system iden tification via direct w eight optimization. Automatic a , 4 1(3):475– 4 90, 200 5 . [22] J. Sack s and D. Ylvisak er. Linear estimation for a ppro ximately linear mo dels. A nn. Stat. , 6(5):1122 – 1137, 19 78. [23] C. 
T omasi and R. Manduch i. Bilateral filtering for gra y and color images. In Pr o c. Int. Conf. Com p uter Vision , pages 839–846, January 0 4 -07 1998. [24] P . Whittle. Optimization under constraints: theory and applicatio ns of nonlinear programming. In Wiley-In terscienc e, New Y ork , 1971 . [25] L. P . Y aro sla vsky . Digital picture pro cessing. An in tro duction. In S pringer-V erlag, Berlin , 1985. 35
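The contrast between the classical Gaussian kernel and the triangular kernel favored by our analysis can be made concrete with a small patch-based averaging sketch. This is a minimal illustration, not the paper's implementation: the function `nlm_denoise`, the toy image, and all parameter values below are our own choices.

```python
import numpy as np

def nlm_denoise(Y, patch=3, search=7, h=0.6, kernel="triangular"):
    """Weighted average of pixels with patch-similarity weights,
    in the spirit of non-local means (illustrative sketch only).

    kernel="gaussian":   w = exp(-d / h^2)         (classical choice)
    kernel="triangular": w = max(0, 1 - d / h^2)   (compactly supported)
    where d is the mean squared distance between local patches.
    """
    pr, sr = patch // 2, search // 2
    P = np.pad(Y, pr + sr, mode="reflect")  # handle image borders
    n = Y.shape[0]
    out = np.zeros_like(Y)
    for i in range(n):
        for j in range(n):
            ci, cj = i + pr + sr, j + pr + sr
            p0 = P[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1]  # reference patch
            num = den = 0.0
            for di in range(-sr, sr + 1):
                for dj in range(-sr, sr + 1):
                    q = P[ci + di - pr:ci + di + pr + 1,
                          cj + dj - pr:cj + dj + pr + 1]
                    d = np.mean((p0 - q) ** 2)  # patch dissimilarity
                    if kernel == "gaussian":
                        w = np.exp(-d / h ** 2)
                    else:
                        w = max(0.0, 1.0 - d / h ** 2)  # triangular kernel
                    num += w * P[ci + di, cj + dj]
                    den += w
            out[i, j] = num / den  # den >= 1: the central patch has weight 1
    return out

# Toy instance of model (1): smooth target plus i.i.d. Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 32)
f = np.outer(np.sin(np.pi * x), np.sin(np.pi * x))
Y = f + 0.2 * rng.standard_normal(f.shape)
for k in ("gaussian", "triangular"):
    mse = np.mean((nlm_denoise(Y, kernel=k) - f) ** 2)
    print(k, "MSE:", round(float(mse), 4))
```

Both kernels average pixels whose surrounding patches resemble the reference patch, but the triangular kernel gives exactly zero weight to patches whose distance exceeds the bandwidth, instead of the slowly decaying tail of the Gaussian weights.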
