Oracle inequalities and minimax rates for non-local means and related adaptive kernel-based methods


Authors: Ery Arias-Castro, Joseph Salmon, Rebecca Willett

c  xxxx Society for Industrial and Applied Mathematics V ol. xx, pp. x x–x Oracle inequalities and minimax rates fo r non-lo cal means and related adaptive k ernel-based metho ds ∗ Ery Arias-Castro † , Joseph Salmon ‡ , and Reb ecca Willett ‡ Abstract. This paper describes a no vel theoretical characterizatio n of the p erformance of non-lo cal means (NLM) for noise remov al. NLM has prov en effectiv e in a v ariety of e mpirical studies, b u t little is understo od fundamentall y ab out ho w it performs relativ e to cl assical methods based on w av elets or how its parameters should be chosen. F or carto on images and images which ma y con tain thin features and regular textures, the error deca y rates of N LM are derived and compared with those of linear filtering, ora cle estimators, Y aro sla vsky’s filter and wa velet thresholding estimators. The trade-off betw een g lobal and local searc h for matching patc hes is examined, and the bias reduction associated with the local p olynomial regression versio n of NLM is analyzed. The th eoretical results are v alidated via sim ulations for 2D images corrupted by additive w hite Gaussian n oise. Key wo rds. Non-local means (NLM), Y arosla vsky’s filter, kernel smoothing, p atc h-based metho ds, lo cal p oly- nomial regression, oracle b ounds, minimax b ounds, cartoon mo del, textu res. 1. Intro duction. The classical problem of image noise r emo v al has drawn signifi can t at- ten tion dur in g the past few d ecades fr om the image pr ocessing, compu tatio nal harmonic anal- ysis, nonlinear a pproxima tion, and statist ics comm u nities. In recen t y ears there has b een a resur gence of in terest in k ernel-based method s , includ ing the u biquitous non-lo cal means (NLM) algorithm [ 6 ], due to their pr actic al efficacy on broad collectio ns of “natural” images. While there is a we alth of theoretical analysis asso ciated with n on lin ear thr esholding estima- tors based on wa v elets and r elated sp ars e multiscale repr esen tations of images [ 13 , 14 , 35 , 51 ] or on diffusion mo dels [ 52 , 47 ] and p artial differenti al equations [ 39 , 1 ], p erformance guarante es for NLM are lacki ng and this pap er aims at pro viding some resu lts in this d irection. In t his pap er, w e explore the theoretical underpinn in gs of adaptiv e k ernel-based image estimation and deriv e b oun ds on the mean squared error as a function of the num b er of pixels observ ed and features of the un derlying image. The d enoising metho ds we consider are b ased on estimating eac h p ixel v alue with a weigh ted sum of the sur rounding pixels. Dep end in g on ho w the weig h ts in this a v erage are selected, this corresp onds to classical linear filters [ 38 , 60 ], Y arosla vsky ’s filter (YF) [ 62 ], the Sigma filter [ 28 ], or the bilateral filter [ 55 ]. It also includ es v ariable-bandwidth kernel estimators [ 29 ], referred to as Lepski’s metho d by statisticians and as the Inte rsection of Confidence In terv als (ICI) rule [ 22 , 23 ] in signal pro cessing. Other v arian ts for a lo cal c hoice of the kernel include [ 49 , 53 ] W e r efer to [ 20 , 43 , 36 ] for more insigh ts on a un ifying framew ork for av eraging filters. 
As none of these methods have been explicitly designed to deal with textured regions, many authors, inspired by work on texture synthesis [16] and inpainting [8], have proposed to introduce patches (small sub-images) to take advantage of natural image redundancy, especially in textured regions. The NLM [6] and UINTA [3] algorithms are typical examples of this approach, as is their extension using Lepski's method [26]. Those algorithms rely on averaging similar pixels, where the similarity is measured through patches centered on the pixel of interest. Some more elaborate methods have tried to remove artifacts appearing in regions with low redundancy [45] (a phenomenon also known as the rare patch effect [15]), for instance by choosing NLM parameters automatically and locally. A common tool used for this local adaptivity is the Stein Unbiased Risk Estimate (SURE) [15, 58, 59]. Most current state-of-the-art methods for denoising take advantage of the patch framework [32, 9, 10]. The interested reader can get a clear picture of the practical performance of those recent methods in the review paper by Katkovnik et al. [25].

Despite the strong empirical performance of these methods, few performance guarantees exist: bounds with an information-theoretic flavor are derived in [61] for a simple version of NLM; a consistency result relying on beta-mixing assumptions on the image and on the noise (both modeled as random variables) is obtained in [5, 6]; [47] proposes a graph-diffusion interpretation for a simple image model; a bias/variance analysis aiming at locally choosing NLM parameters is carried out in [15]; and [30, 7] obtain Cramér-Rao type efficiency results. While finishing this paper, we became aware of two related papers by Maleki, Narayan and Baraniuk addressing optimal performance in the context of non-parametric minimax estimation [34, 33]. [34] evaluates the performance of NLM for the piecewise constant horizon model [27], while [33] considers an anisotropic variant of NLM for the same image class. The latter shares several features with earlier work on anisotropic NLM [11, 12]. Our work is most closely related to [34], addressing the same challenge of quantifying the performance of NLM and related methods, and at the same time it contains several novel contributions. While the paper was under review, we learned about an older paper of Tsybakov [56]. This paper proposes and analyzes a patch-based method that compares the medians over patches. The paper also derives a minimax lower bound for the cartoon model we consider. We comment in more detail on the work of Maleki et al. [34] and the work of Tsybakov [56] in Section 7.

1.1. Our contribution.
We derive theoretical performance bounds for the linear filter, oracle variable-bandwidth kernel methods, Yaroslavsky's filter and NLM, both the original [6] and a fast patch-mean based variant [31], in the classical "cartoon" model in which an image consists of smooth surfaces separated by a smooth discontinuity, a popular model in statistics [27]. Our results are for the local polynomial versions of these methods. (The systematic bias associated with NLM near discontinuities, and near boundaries, is shown to disappear when using a local polynomial regression.) We also consider nonstandard image classes, one modeling images with thin features and another one modeling regular textures. The latter is particularly significant because it highlights some of the key advantages of patch-based methods over, say, wavelet thresholding estimators. Previous insights into the performance of NLM-like methods on textures are empirical at best; we are not aware of any theory in this vein. Our benchmarks are two oracle inequalities, though many of our theoretical results can be compared directly with similar classical results in the wavelet literature and with known minimax lower bounds on mean squared error (MSE) [14, 27].

The cartoon model for images has been a benchmark for image denoising methods, at least since the work of Korostelev and Tsybakov, condensed in [27]. This model is relevant when comparing denoising methods on texture-less images. The other models are novel and tailored to situations where the image exhibits some thin features (like the legs of the Cameraman's tripod) and regular textures (like the patterns in Barbara's blouse). Though these models do not reflect the complexities of real images, we do gain some qualitative insights. First, we learn that variable bandwidth kernel methods are fundamentally limited by the bias near discontinuities. Yaroslavsky's filter is found to be near-optimal when the noise level is sufficiently low that the different regions in the cartoon image do not mix when noise is added; when this is not the case, the method becomes useless. In non-local means, the patch size should be chosen just sufficiently large that nearby patches from different regions look different (in the average version of NLM, this can be made very precise). The search window should be chosen like a standard kernel bandwidth. We briefly argue that not localizing these methods may lead to very poor performance, in agreement with [44, 64]. Also, while the NLM-average and regular NLM perform similarly on cartoon images, the latter is superior when textures are present.

1.2. Organization of the paper. In Section 2 we describe the mathematical framework. In Section 3 we introduce the methods that we analyze in the sequel. In Section 4 we state performance guarantees in the cartoon model for these methods, and in Section 5 we do the same in the context of the thin feature and regular pattern models. In Section 6 we perform some numerical experiments carefully illustrating our theoretical findings. In Section 7 we contrast our contribution with that of Maleki et al. [34] and discuss extensions.
The proofs are gathered in Section 8, which includes general results on local polynomial regression which may be of independent interest.

1.3. Notation. We use standard notation. For non-negative sequences $(a_n)$ and $(b_n)$: $a_n = O(b_n)$ (also written $a_n \lesssim b_n$) if the sequence $|a_n/b_n|$ is bounded from above; $a_n \asymp b_n$ if $a_n = O(b_n)$ and $b_n = O(a_n)$; and $a_n = o(b_n)$ if $a_n/b_n \to 0$ as $n \to \infty$. For real numbers $a$ and $b$, $a \vee b = \max(a,b)$ while $a \wedge b = \min(a,b)$. For a Lebesgue-measurable subset $A \subset \mathbb{R}^d$, $\mathrm{Vol}(A)$ denotes its Lebesgue measure. For any $x \in \mathbb{R}^d$, we define its Euclidean and sup norms as
\[
\|x\|_2 = \Big( \sum_{i=1}^{d} x_i^2 \Big)^{1/2}, \qquad \|x\| := \|x\|_\infty = \max_{1 \le i \le d} |x_i|.
\]
We use the notation $B(0,1)$ (resp. $\bar B(0,1)$) to denote the open (resp. closed) unit ball for the sup norm. For $\eta > 0$, we define the $\eta$-neighborhood (for the norm $\|\cdot\|$) of a set $A \subseteq \mathbb{R}^d$ as $B(A,\eta) = \{ x \in \mathbb{R}^d : \mathrm{dist}(x,A) < \eta \}$. For a discrete set $A$, we denote its cardinality by either $|A|$ or $\#A$. For a set $A \subset \mathbb{R}^d$, $1\{A\}$ is the indicator function of $A$, while for a discrete subset $B \subset \{1, \dots, m\}$, $\mathbf{1}_B$ denotes the vector with entries indexed by $B$ equal to one, and all others equal to zero. Additional notation is introduced in the text as needed.

2. Function estimation in additive white noise. We cast the problem of image denoising as a non-parametric regression problem in the presence of white noise, a standard model in statistics [27]. We consider the general $d$-dimensional problem, and use the term "image" to denote any discretized signal on the $d$-dimensional square lattice, with important cases when $1 \le d \le 4$. Though patch-based methods were designed for 2D images, we consider a general dimension, as the same techniques may apply in color, spectral, 3D and 4D imaging [63].

We observe noisy samples $\{ y_i \in \mathbb{R} : i \in I_n^d \}$ (where $I_n := \{1, \dots, n\}$) of the target function $f : [0,1]^d \to [0,1]$ at the design points $\{ x_i \in \mathbb{R}^d : i \in I_n^d \}$, corrupted by an additive noise $\{ \varepsilon_i \in \mathbb{R} : i \in I_n^d \}$, as follows:
\[
y_i = f(x_i) + \varepsilon_i, \qquad i \in I_n^d.
\tag{2.1}
\]
For now, we only assume that the noise variables $\{ \varepsilon_i : i \in I_n^d \}$ are uncorrelated with mean zero and variance $\sigma^2$, though some results will require tail bounds. Also, for concreteness, we focus on a standard model in image processing where the sample points are on the square lattice, specifically $x_i = ((i_1 - 1/2)/n, \dots, (i_d - 1/2)/n)$ when $i = (i_1, \dots, i_d)$. Leaving $n$ implicit, define the vectors $y = (y_i : i \in I_n^d)$ and $f = (f_i : i \in I_n^d)$ with $f_i := f(x_i)$, as well as $\varepsilon = (\varepsilon_i : i \in I_n^d)$. The vector model can thus be written
\[
y = f + \varepsilon.
\tag{2.2}
\]
We focus on estimating the function $f$ on the grid; namely, our goal is to estimate the vector $f$, and we measure the performance of an estimator $\hat f$ in terms of its mean squared error (MSE):
\[
\mathrm{MSE}_f(\hat f) = \frac{\mathbb{E}\,\|\hat f - f\|_2^2}{n^d} = \frac{1}{n^d} \sum_{i \in I_n^d} \mathbb{E}\,(\hat f_i - f_i)^2,
\]
where the expectation $\mathbb{E}$ is with respect to the probability measure associated with the noise. Although our analysis may be generalized to other norms, the mean squared error is handy because of the point-wise (squared) bias and variance decomposition
\[
\mathbb{E}\,(\hat f_i - f_i)^2 = \underbrace{(\mathbb{E}\,\hat f_i - f_i)^2}_{\text{squared bias}} + \underbrace{\mathbb{E}\,\big(\mathbb{E}(\hat f_i) - \hat f_i\big)^2}_{\text{variance}}, \qquad \forall i \in I_n^d,
\tag{2.3}
\]
which leads, for the vector estimate, to the decomposition
\[
\mathbb{E}\,\|\hat f - f\|_2^2 = \|\mathbb{E}(\hat f) - f\|_2^2 + \mathbb{E}\,\|\mathbb{E}(\hat f) - \hat f\|_2^2.
\]
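To fix ideas, here is a minimal numpy sketch of the sampling model (2.1)-(2.2) and of the empirical counterpart of the MSE above. The function names are ours, and Gaussian noise is used only as one admissible instance of the uncorrelated, mean-zero noise the model allows.

```python
import numpy as np

def noisy_observation(f, sigma, rng=None):
    """Sample y = f + eps as in (2.1)-(2.2): f holds the target function
    evaluated on the d-dimensional grid (an array of any dimension), and
    the noise entries are i.i.d. N(0, sigma^2), one admissible noise model."""
    rng = np.random.default_rng(0) if rng is None else rng
    return f + sigma * rng.standard_normal(f.shape)

def empirical_mse(f_hat, f):
    """Empirical analogue of MSE_f(f_hat) = E ||f_hat - f||_2^2 / n^d."""
    return float(np.mean((f_hat - f) ** 2))
```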
To recover the function $f$ from only a finite number of measurements, it is customary to require that the target function belong to a class $\mathcal{F}$ of structured functions, such as smooth, piecewise smooth, or periodic textured images. In this context, the minimax risk over the function class $\mathcal{F}$ is defined as
\[
R_n^*(\mathcal{F}) = \inf_{\hat f} \sup_{f \in \mathcal{F}} \mathrm{MSE}_f(\hat f),
\]
where the infimum is over all estimators measurable with respect to the observations. We say that an estimator is (rate-)optimal for the class $\mathcal{F}$ if its worst-case MSE over $\mathcal{F}$ is comparable to the minimax risk, i.e. (assuming implicitly that $n$ becomes large),
\[
R_n(\hat f, \mathcal{F}) := \sup_{f \in \mathcal{F}} \mathrm{MSE}_f(\hat f) = O(R_n^*(\mathcal{F})).
\]

2.1. Cartoon images. We are particularly interested in situations where the function $f$ has discontinuities: this is typical of images, mainly because of occlusions occurring in natural scenes. We say that $f$ is a "cartoon image" if it is a piecewise smooth image with discontinuities along smooth hypersurfaces. This model spurred the greatest part of the research in image processing and is very common when no texture is present [27]. For simplicity, we assume that $f$ is made of two pieces, each piece being Hölder smooth. Note that all our results apply to the more general case where $f$ is made of more than two pieces.

For a function $g : \mathbb{R}^d \to \mathbb{R}$ and $s = (s_1, \dots, s_d) \in \mathbb{N}^d$, we denote the $s$-derivative of $g$ at $x \in \mathbb{R}^d$ by
\[
g^{(s)}(x) = \frac{\partial^{|s|}}{\partial^{s_1} x_1 \cdots \partial^{s_d} x_d}\, g(x), \qquad \text{where } |s| := s_1 + \cdots + s_d.
\]

Definition 2.1 (Hölder function class). For $\alpha, C_0 > 0$, we define $\mathcal{H}_d(\alpha, C_0)$ as the Hölder class of functions $g : [0,1]^d \to [0,1]$ that are $\lfloor \alpha \rfloor$ times differentiable ($\lfloor \alpha \rfloor$ is the largest integer strictly less than $\alpha$) and satisfy
\[
\forall x \in [0,1]^d,\ \forall s \in \mathbb{N}^d,\ 1 \le |s| \le \lfloor \alpha \rfloor : \quad |g^{(s)}(x)| \le C_0;
\tag{2.4}
\]
\[
\forall (x, x') \in [0,1]^d,\ \forall s \in \mathbb{N}^d,\ |s| = \lfloor \alpha \rfloor : \quad |g^{(s)}(x) - g^{(s)}(x')| \le C_0 \|x - x'\|_\infty^{\alpha - \lfloor \alpha \rfloor}.
\tag{2.5}
\]
The main feature of Hölder functions of order $\alpha$ is that they are well-approximated locally by a polynomial (in fact, their Taylor expansion) of degree $\lfloor \alpha \rfloor$; cf. Lemma 8.1.

Definition 2.2 (Cartoon function class). For $\alpha, C_0 > 0$, let $\mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)$ denote the set of functions of the form
\[
f(x) = 1\{x \in \Omega\}\, f_\Omega(x) + 1\{x \in \Omega^c\}\, f_{\Omega^c}(x),
\tag{2.6}
\]
where $f_\Omega, f_{\Omega^c} \in \mathcal{H}_d(\alpha, C_0)$, with jump (or discontinuity gap)
\[
\mu(f) := \inf_{x \in \partial\Omega} |f_\Omega(x) - f_{\Omega^c}(x)| \ge 1/C_0,
\tag{2.7}
\]
and $\Omega \subset (0,1)^d$ is a bi-Lipschitz image of the (Euclidean) unit ball $B(0,1)$; specifically, $\Omega = \phi(B(0,1))$, where $\phi : \mathbb{R}^d \to \mathbb{R}^d$ is injective with $\phi$ and $\phi^{-1}$ both Lipschitz with constant $C_0$ (i.e., $C_0$-Lipschitz) with respect to the sup norm. We refer to $f_\Omega$ as the foreground and to $f_{\Omega^c}$ as the background. Moreover, $\partial\Omega$ represents the (topological) boundary of $\Omega$.

The condition (2.7) is a lower bound on the minimum "jump" along the discontinuity $\partial\Omega$. We require that $\phi$ be bi-Lipschitz to ensure that the set $\Omega$ is sufficiently smooth and does not have a serious bottleneck, which could potentially mislead the methods discussed here.
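As an illustration, the following sketch builds a toy member of $\mathcal{F}_{\mathrm{cartoon}}$ for $d = 2$ on the grid of design points $x_i = ((i_1 - 1/2)/n, (i_2 - 1/2)/n)$. The disk and the particular smooth pieces are our choices, made so that both pieces are $C^\infty$ (hence Hölder of any order) and the jump (2.7) is constant along $\partial\Omega$.

```python
import numpy as np

def cartoon_image(n, jump=0.5):
    """Toy member of F_cartoon for d = 2: a disk Omega (a bi-Lipschitz
    image of the unit ball) carrying a smooth foreground f_Omega, and a
    background f_Omega^c equal to f_Omega + jump, so that the
    discontinuity gap (2.7) equals `jump` everywhere on the boundary."""
    t = (np.arange(n) + 0.5) / n                       # design points (i - 1/2)/n
    x1, x2 = np.meshgrid(t, t, indexing="ij")
    omega = (x1 - 0.5) ** 2 + (x2 - 0.5) ** 2 < 0.3 ** 2
    f_fg = 0.1 + 0.2 * np.sin(np.pi * x1) * np.sin(np.pi * x2)  # foreground
    f_bg = f_fg + jump                                          # background
    f = np.where(omega, f_fg, f_bg)                    # values stay in [0, 1]
    return f, omega
```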
We define the jump-to-noise ratio (JNR) for a target function $f$ with jump $\mu(f)$ and noise standard deviation $\sigma$ as the quantity
\[
\mathrm{JNR} = \frac{\mu(f)}{\sigma}.
\tag{2.8}
\]
We assume throughout that $\mu \asymp 1$, so that our bounds (which scale with $\sigma$) also reflect performance as a function of the JNR.

In the cartoon model, we focus on the case where the noiseless image is at least piecewise Lipschitz, that is, $\alpha \ge 1$. Note that our results apply to the case where $\alpha > 1/2$, and that simple linear filtering is essentially optimal when $\alpha \le 1/2$. The setting is illustrated in Figure 2.1(a).

Figure 2.1. Original and noisy images: cartoon (Blob, Bowl), thin features (Swoosh), texture (Stripes) and natural images (Ridges, Barbara, Cameraman). Panels: (a) Blob, (b) Bowl, (c) Swoosh, (d) Stripes, (e) Ridges, (f) Barbara, (g) Cameraman.

2.2. Thin features and textures. In addition to considering cartoon images as defined above, we will consider images which contain other features common in natural images, such as thin regions a few pixels wide and regular textures. We consider simple models for these and show that YF and, more generally, NLM perform much better than linear filtering. These models are instances of the cartoon model where the foreground $\Omega$ varies with $n$. Let $\mathcal{F}(\alpha, C_0)$ be defined as $\mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)$ but without constraints on $\Omega$.

As a simple model of a thin feature, consider an image $f$ in the cartoon family, but where $\Omega$ is a thin $d_0$-dimensional surface of thickness $a$, which will vary with $n$. A classical example of this kind of structure is the support bar of the Cameraman's tripod; see Figure 2.1(g). An example of a function from this class is illustrated by the Swoosh image; see Figure 2.1(c).

Definition 2.3 (Thin feature function class).
\[
\mathcal{F}_{\mathrm{thin}}(\alpha, C_0, d_0, a) := \big\{ f \in \mathcal{F}(\alpha, C_0) : \Omega = \{ x = (x', z) : \mathrm{dist}(z, \phi(x')) < a \} \big\},
\]
where $\phi : (0,1)^{d_0} \to (0,1)^{d-d_0}$ is $C_0$-Lipschitz.

We may similarly define a class of regular pattern functions which themselves may not be smooth, but which occur repeatedly across the image domain. This structure would be difficult to exploit with, say, wavelet-based methods, which fail to take advantage of image redundancy. However, empirical evidence suggests that non-local adaptive kernels can perform quite well on these images. A classical example of this type of image structure is the striped scarf in the Barbara image. The following is a class where $\Omega$ is made of the disjoint union of translates of a smaller region $\Omega_0$ of diameter of order $a$, which will vary with $n$.

Definition 2.4 (Regular pattern function class).
\[
\mathcal{F}_{\mathrm{pattern}}(\alpha, C_0, a) := \Big\{ f \in \mathcal{F}(\alpha, C_0) : \Omega = (0,1)^d \cap \bigcup_{v \in a\mathbb{Z}^d} (\Xi + v) \Big\},
\]
where $\Xi \subset (0,a)^d$ is any set. Note that the union above is disjoint. An example of a function from this class is illustrated by the Stripes image in Figure 2.1(d).
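For concreteness, here is a hedged sketch of toy members of both classes for $d = 2$ (with $d_0 = 1$ for the thin feature). The particular curve $\phi$, the thickness, the set $\Xi$ and the gray levels are our illustrative choices, not prescribed by the definitions.

```python
import numpy as np

def thin_feature_image(n, a=0.05):
    """Toy member of F_thin (d = 2, d0 = 1): Omega is the set of points
    within distance a of the graph of a Lipschitz curve phi, as in
    Definition 2.3 (a Swoosh-like band)."""
    t = (np.arange(n) + 0.5) / n
    x1, x2 = np.meshgrid(t, t, indexing="ij")
    phi = 0.5 + 0.2 * np.sin(2 * np.pi * x1)       # C0-Lipschitz curve
    omega = np.abs(x2 - phi) < a
    return np.where(omega, 0.8, 0.2)

def pattern_image(n, a=0.1):
    """Toy member of F_pattern: Omega is the union over v in a*Z^d of
    translates Xi + v, as in Definition 2.4; here Xi = (0, a/2) x (0, a),
    which produces vertical stripes of period a."""
    t = (np.arange(n) + 0.5) / n
    x1, _ = np.meshgrid(t, t, indexing="ij")
    omega = np.mod(x1, a) < a / 2
    return np.where(omega, 0.8, 0.2)
```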
3. Background on kernel methods for denoising. We now describe NLM and other related methods. The story starts with kernel smoothing (i.e., linear filtering). Though this age-old method (with a proper choice of kernel) is essentially optimal when the image does not have discontinuities, its performance suffers dramatically in the presence of edges, which it tends to blur. YF and, more generally, NLM attempt to choose the kernel adaptively so as to avoid averaging over the discontinuity.

The estimates we consider are weighted averages of the pixel values, of the form
\[
\hat f_i = \frac{\sum_{j \in I_n^d} \omega_{i,j}\, y_j}{\sum_{j \in I_n^d} \omega_{i,j}}.
\tag{3.1}
\]
The various methods that we study in this paper differ only in the choice of weights $\omega_{i,j}$. Adaptation to a higher order of smoothness is often accomplished by a local polynomial regression (LPR) [17, 20]. The local polynomial estimator of degree $r$ with weights $(\omega_{i,j})$ is
\[
\hat f_i = \hat a^{(i)}_0, \qquad \hat a^{(i)} = \arg\min_{a} \sum_{j \in I_n^d} \omega_{i,j} \Big( y_j - \sum_{0 \le |s| \le r} a_s (x_j - x_i)^s \Big)^2,
\tag{3.2}
\]
where $x^s := x_1^{s_1} \cdots x_d^{s_d}$ for $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ and $s = (s_1, \dots, s_d) \in \mathbb{N}^d$, and the minimization in (3.2) is over $a = (a_s : 0 \le |s| \le r) \in \mathbb{R}^q$, where $q = \binom{r+d}{d}$. Note that, in fact, (3.2) leads to an estimator of the form (3.1) with different weights (e.g., a smoother kernel) [57, p. 34]. We assume throughout that the polynomial degree $r$ is sufficiently large to take full advantage of the smoothness of $f$; specifically, if $f \in \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)$, we assume that $r \ge \lfloor \alpha \rfloor$. When the number of nonzero weights in (3.2) is not enough to determine $\hat f_i$ uniquely, we define $\hat f_i$ as $y_i$; namely, we do not apply any smoothing. Alternatively, one could decrease the degree of the polynomial regression until the fit is well-defined, but this is not important in our setting. Since we know that $f$ takes values in $[0,1]$, we clip $\hat f$ so that it also takes values in $[0,1]$. This clipping does not increase the MSE.

3.1. Linear filtering (LF). This method can be traced back in the statistics literature to the work of Nadaraya [38] and Watson [60] (cf. [20] for details on kernel methods). In this context the similarity between two pixels is only controlled by spatial proximity:
\[
\omega_{i,j} = K_h(x_i, x_j),
\tag{3.3}
\]
where $K_h(x, x') = K(\frac{x}{h}, \frac{x'}{h})$ for a kernel function $K$ and a bandwidth $h > 0$, which is independent of the location in the nonadaptive (classical) version. Common choices include the Gaussian kernel, but we focus on the box kernel
\[
K_h(x, x') = 1\{\|x - x'\|_\infty \le h\}.
\tag{3.4}
\]

3.2. Yaroslavsky's filter (YF). YF was introduced by Yaroslavsky [62] and independently by Lee [28]; more modern variants include SUSAN [48] and the bilateral filter [55]. Here, similarity between pixels is based on their spatial distance and on the relative proximity of the image intensity at these pixels. This translates into choosing weights in (3.1) of the form
\[
\omega_{i,j} = K_h(x_i, x_j)\, L_{h_y}(y_i, y_j),
\tag{3.5}
\]
where $K, L$ are kernels and $h, h_y$ the associated bandwidths: $(K, h)$ controls the spatial proximity while $(L, h_y)$ controls the photometric proximity. As in classical kernel smoothing, $h$ plays the role of a spatial bandwidth, while $h_y$ is a photometric bandwidth. In this work we only consider the simple version using the box kernel:
\[
L_{h_y}(y, y') = 1\{|y - y'| \le h_y\}.
\tag{3.6}
\]
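The following is a brute-force sketch of the order-0 (weighted-average) forms of LF and YF, i.e., (3.1) with weights (3.3)-(3.4) and (3.5)-(3.6). Recall that our theory is for the LPR variants (3.2), which replace each local average by a local polynomial fit; here the spatial bandwidth is given in pixels ($h_{\text{pix}} \approx hn$), an implementation convention of ours.

```python
import numpy as np

def box_filter(y, h_pix, h_y=None):
    """Weighted average (3.1) with a box spatial kernel (3.4) of half-width
    h_pix pixels.  With h_y=None the weights are (3.3): the linear filter.
    Otherwise the photometric box kernel (3.6) is added as in (3.5), giving
    Yaroslavsky's filter: only pixels with |y_j - y_i| <= h_y are averaged.
    Brute force, written for clarity rather than speed."""
    n0, n1 = y.shape
    out = np.empty_like(y, dtype=float)
    for i0 in range(n0):
        for i1 in range(n1):
            window = y[max(i0 - h_pix, 0):i0 + h_pix + 1,
                       max(i1 - h_pix, 0):i1 + h_pix + 1]
            if h_y is None:
                out[i0, i1] = window.mean()
            else:
                keep = np.abs(window - y[i0, i1]) <= h_y
                # keep always contains the center pixel, so the mean exists
                out[i0, i1] = window[keep].mean()
    return out
```

On a cartoon image with large JNR, the photometric kernel discards most pixels from the other side of the discontinuity, which is the mechanism behind the oracle-like behavior of YF analyzed in Section 4.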
3.3. Non-Local Means (NLM) and patch-based methods. NLM and other patch-based methods generalize the idea of including the photometric proximity in the kernel. In [6], the distance between two pixels is solely measured in terms of the discrepancy between patches surrounding the pixels considered. Though spatial proximity was already introduced in [6], it was only mentioned as a numerical parameter to solve a computational issue. However, later works (cf. [44, 64]) have shown that spatial proximity can improve NLM performance. We consider NLM with spatial proximity, which includes the non-local version, the two being identical when $h$ is sufficiently large.

A generic description is the following. Let $h_P > 0$ and let $P_i$ (leaving $h_P$ implicit) be the hypercube of width $h_P$ centered at $x_i$, i.e.,
\[
P_i = x_i + \Big[ -\frac{h_P}{2}, \frac{h_P}{2} \Big]^d = \Big\{ x : \|x - x_i\|_\infty \le \frac{h_P}{2} \Big\}.
\tag{3.7}
\]
Such a patch corresponds to a pixel patch of width $[h_P n] + 1$ in the digital image (where $[a]$ denotes the largest integer not exceeding $a \in \mathbb{R}$). Let $y_{P_i} = (y_j : x_j \in P_i)$ be the vector of pixel values over the patch centered at $x_i$. With this notation, the weights used in NLM are
\[
\omega_{i,j} = K_h(x_i, x_j)\, L_{h_y}(y_{P_i}, y_{P_j}),
\tag{3.8}
\]
where $K, L$ are kernel functions and $h, h_y$ are bandwidths, as before. One classical choice of $L_{h_y}$ (which we consider in our theoretical results) is
\[
L_{h_y}(y_{P_i}, y_{P_j}) = 1\{\|y_{P_i} - y_{P_j}\|_2 \le h_y\}.
\tag{3.9}
\]
The photometric similarity is based on the Euclidean distance between the patches (as vectors) around the pixels. We refer to this as "classical" or Euclidean NLM (or just NLM).

Computing $L_{h_y}$ can be computationally intensive for large $h_P$. To address this, some authors have considered projecting $y_{P_i}$ onto a low-dimensional subspace and using this projection to compute an approximation of $L_{h_y}(y_{P_i}, y_{P_j})$. This introduces an interesting trade-off between computational complexity and accuracy which is examined in [4, 54]. In this paper, we consider a 1-dimensional projection introduced in [31] where patches are simply compared via their means alone, resulting in a photometric kernel of the form
\[
L_{h_y}(y_{P_i}, y_{P_j}) = L_{h_y}(\bar y_{P_i}, \bar y_{P_j}), \qquad \bar y_{P_i} := \mathrm{Ave}(y_{P_i}).
\tag{3.10}
\]
We refer to this method as NLM-average. For our theoretical results, we consider the kernel
\[
L_{h_y}(\bar y_{P_i}, \bar y_{P_j}) = 1\{|\bar y_{P_i} - \bar y_{P_j}| \le h_y\}.
\tag{3.11}
\]
In our analysis, Euclidean NLM (3.9) and NLM-average (3.11) behave similarly, except for the regular pattern model, where the former is generally superior. In practice, however, we note a difference. In smooth regions, the average in (3.11) has little bias and little variance, making it significantly more robust to noise than the Euclidean distance (3.9). Near edges or patterns, however, the bias of the average in (3.11) can outweigh the variance, making Euclidean NLM (3.9) superior. This insight is supported by our experimental results in Section 6.

The spatial bandwidth $h$ is typically larger than the patch width $h_P$. Common sizes used in practice are 21 x 21 kernel windows (also referred to as the searching zone) and 7 x 7 patches (in pixel units). Common kernels are the box kernel for $K$ and the Gaussian kernel for $L$. Though we assume box kernels for both, our results extend readily to other kernel functions.
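In the same spirit, here is a reference sketch of the order-0 forms of Euclidean NLM and NLM-average, i.e., (3.1) with weights (3.8) and photometric kernel (3.9) or (3.11). The reflective padding at the image border is our implementation choice, and all sizes are in pixels.

```python
import numpy as np

def nlm(y, h_pix, p_half, h_y, average=False):
    """Order-0 NLM: weighted average (3.1) with weights (3.8), combining a
    box spatial kernel of half-width h_pix with a box photometric kernel
    comparing (2*p_half+1)^2-pixel patches, either in Euclidean norm (3.9)
    or through their means (3.11, the NLM-average variant of [31])."""
    n0, n1 = y.shape
    yp = np.pad(y, p_half, mode="reflect")    # patches for border pixels
    out = np.empty_like(y, dtype=float)
    w = 2 * p_half + 1
    for i0 in range(n0):
        for i1 in range(n1):
            pi = yp[i0:i0 + w, i1:i1 + w]
            num, den = 0.0, 0.0
            for j0 in range(max(i0 - h_pix, 0), min(i0 + h_pix + 1, n0)):
                for j1 in range(max(i1 - h_pix, 0), min(i1 + h_pix + 1, n1)):
                    pj = yp[j0:j0 + w, j1:j1 + w]
                    if average:
                        close = abs(pi.mean() - pj.mean()) <= h_y  # (3.11)
                    else:
                        close = np.linalg.norm(pi - pj) <= h_y     # (3.9)
                    if close:
                        num += y[j0, j1]
                        den += 1.0
            out[i0, i1] = num / den   # den >= 1 since j = i always matches
    return out
```

Setting p_half = 0 recovers YF, consistent with the remark below that YF is NLM with single-pixel patches.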
4. Oracle inequalities and minimax results for cartoon images. We analyze the performance of the kernel-based methods described in Section 3 within the mathematical framework detailed in Section 2. Qualitatively speaking, our theoretical results are congruent with what is observed in practice; see our experiments in Section 6. Indeed, we show that LF blurs edges, which is in fact well-known both in theory and practice. YF performs well when the JNR is large, and poorly otherwise. This filter relies on a clear gap between the pixel values on either side of the discontinuity: when the JNR is large, there is indeed a gap, which ceases to exist when the JNR is of order 1 (cf. Figure 4.1). The latter situation is where NLM shines. Indeed, patches of size larger than one pixel gather more information about the area surrounding the pixel, which NLM (implicitly) uses to assess whether two pixels are on the same side of the discontinuity. For example, comparing patches in Figure 4.2, we see that the means of sufficiently large patches allow us to estimate reliably whether each center pixel is in $\Omega$ or not, even with a JNR of order 1.

In what follows, we focus on the LPR variants described in (3.2) to avoid a systematic bias that conventional weighted-average variants suffer from. It is well known that this bias appears near the boundary of the image, though this can be corrected with a proper extension of the image. More importantly, this bias arises also near the discontinuity. Note that enforcing the spatial windows to have the same (symmetric) shape puts a real constraint on the resulting performance of the algorithm, as discussed in Section 4.3.

The choice of kernel $K$ for the LPR variants (3.2) is unimportant for standard kernel regression as long as it satisfies some basic properties. (For example, in [17, Th. 3.1], the kernel does not impact the error rate except for a multiplicative constant.) Less is known about the impact of the choice of $L$. In this paper, we consider box kernels for both the spatial and photometric components, namely (3.4), and (3.9) or (3.11).

To obtain our bounds, we minimize the error with respect to the bandwidth parameter $h$, effectively striking a good balance between the bias and variance in (2.3). Indeed, the larger the bandwidth, the larger the bias and the smaller the variance. The issue with kernel smoothing, whether in the form of the weighted average (3.1) or LPR (3.2), is that it suffers from a substantial bias when the smoothing window (those points where the weights are equal to one) includes points from the "other side" of the discontinuity. At the same time, the window cannot be too small, for otherwise the variance will be overwhelming.

Figure 4.1. In the left column are cartoon images with increasing levels of noise (rows correspond to JNR = 4, 2, 1 from top to bottom). A searching zone is displayed in red, for a pixel near the discontinuity. The middle column is a close-up of the searching zone, while the right one provides histograms of pixel values within it.

4.1. Linear kernel smoothing blurs edges. It is well-known that LF blurs discontinuities.
This comes from the fact that the window size is fixed, the same at all pixels, so points near and points far from the discontinuity are treated in the same way. This lack of adaptivity leads to a substantial MSE. How does that statement translate into a mathematical result within our framework? The following is proved in [2] for $d \le 2$, though the result (at least the upper bound) is probably older; see also [27]. We provide a proof for LPR in Section 8.

Theorem 4.1. Let $\hat f^{\mathrm{LF}}_h$ denote the linear estimator, in the form of either the local average (3.1) or LPR (3.2), with weights as in (3.3). We have
\[
\inf_h R_n(\hat f^{\mathrm{LF}}_h, \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)) \asymp R_{\mathrm{LF}} := (\sigma^2/n^d)^{1/(d+1)},
\]
and the optimal choice of bandwidth is $h \asymp h_{\mathrm{LF}} := (\sigma^2/n^d)^{1/(d+1)}$.

Note that the bound does not depend on the regularity $\alpha \ge 1$ of the function $f$. As apparent in the proof, this is because LF blurs edges: to strike a good bias-variance trade-off, the smoothing window cannot be too small, transforming sharp edges into ramps. The resulting bias is then larger than the bias over the smooth regions, which is where $\alpha$ appears.

Figure 4.2. We use again the image from Figure 4.1, last row (JNR = 1). The noisy image is displayed in the first column, with kernel supports. The second column is the result of the local (box) kernel averaging using supports of width 1, 3 and 7 from top to bottom. The last column provides histograms of the filtered pixels.

4.2. Oracle kernel. What can we hope to achieve with adaptive kernel methods? Statisticians have used the notion of an oracle to answer this question [13, 21, 57]. We saw that what limits linear filtering is a large bias near the discontinuity, due to the mixing of pixels from both sides. What if we had access to an oracle that would identify for us the foreground and the background? The membership oracle tells us which sample points belong to $\Omega$ and which to $\Omega^c$. With access to this oracle, we simply process the smooth pieces, $f_\Omega$ and $f_{\Omega^c}$, separately. By doing so we achieve the minimax rate for the class $\mathcal{H}_d(\alpha, C_0)$: the information this oracle provides is sufficient to do as well as if there were no discontinuity. This is illustrated in Figure 4.3 (best viewed in color).

Theorem 4.2. Let $\hat f^{\mathrm{MO}}_h$ denote the LPR estimator (3.2) with weights as in (3.3) when $x_i$ and $x_j$ belong to the same side of the discontinuity, and set to zero otherwise. We have
\[
\inf_h R_n(\hat f^{\mathrm{MO}}_h, \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)) \asymp R_{\mathrm{MO}} := (\sigma^2/n^d)^{2\alpha/(d+2\alpha)},
\]
and the optimal choice of bandwidth is $h \asymp h_{\mathrm{MO}} := (\sigma^2/n^d)^{1/(d+2\alpha)}$.

The lower bound is a well-known minimax bound [27, Theorem 5.1.2, p. 133]. If we consider a class of piecewise polynomial functions, then this oracle estimator, without spatial proximity (i.e., $h = \infty$), achieves the parametric rate of $\sigma^2/n^d$.

Figure 4.3. The kernel supports for the linear filter, the bandwidth oracle and the membership oracle. Panels: (a) Kernel smoothing, (b) Bandwidth oracle, (c) Membership oracle.

Figure 4.4. Membership oracles of order 0 and 1 on a non-noisy 1D signal. Note how the bias is reduced by going to order 1.
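To make the gap between Theorems 4.1 and 4.2 concrete, the following back-of-the-envelope computation compares the two rates for an illustrative choice of $d$, $\alpha$ and $\sigma$; constants are ignored, as both theorems are stated up to constants.

```python
d, alpha, sigma = 2, 2.0, 0.1
for n in (128, 256, 512, 1024):
    s = sigma ** 2 / n ** d                    # sigma^2 / n^d
    r_lf = s ** (1.0 / (d + 1))                # linear filter rate, Theorem 4.1
    r_mo = s ** (2 * alpha / (d + 2 * alpha))  # membership oracle rate, Theorem 4.2
    print(f"n={n:5d}  R_LF ~ {r_lf:.2e}  R_MO ~ {r_mo:.2e}  ratio ~ {r_lf / r_mo:.0f}")
```

The ratio grows polynomially in $n$: edge blur, not smoothness, is what limits LF, whereas the oracle rate improves with $\alpha$.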
It is worth noting that LPR plays a crucial role here. Indeed, the window around a point near the discontinuity, comprised of all points belonging to the same side of the discontinuity, will be irregularly shaped. For instance, imagine a point on a linear surface adjacent to the discontinuity. For a symmetric window (sufficiently small not to include the discontinuity), the linear variations around the pixel of interest will average out and we can accurately estimate the pixel value. For an asymmetric window caused by the discontinuity, the linear variations will not average out, inducing a small bias and leading to a higher risk of order $(\sigma^2/n^d)^{3/(d+3)}$ when $\alpha \ge 3/2$. This phenomenon can be observed in practice and is illustrated in Figure 4.4. Note that the oracle only has to provide the membership information locally, within the searching window.

The insight we get from this is that we only need to know over which pixel values to average to attain the same error rate as we would without discontinuities. This is exactly what adaptive kernel methods [29], including patch-based methods, PDE methods [39, 1, 19] and graph diffusion methods [52, 47], aim at doing.

4.3. Variable bandwidth kernel methods. These methods [37, 50, 22, 23, 24], including Lepski's method [29] and variants [40, 41], choose the bandwidth adaptively at every location, the goal being to avoid smoothing over discontinuities and to adapt to the regularity of the signal when unknown. Wavelet shrinkage methods are often thought to perform some sort of variable-bandwidth kernel smoothing [13]. Clearly, we cannot do better than if we knew the discontinuity, meaning if we had access to the membership oracle. In that case, at each point we would choose the bandwidth equal to its distance to the discontinuity (BO below stands for bandwidth oracle). See Figure 4.3 for a comparison of the MO and BO spatial supports.

Theorem 4.3. Let $\hat f^{\mathrm{BO}}_h$ denote the LPR estimator (3.2) with weights chosen as in (3.3) when $\|x_i - x_j\|_\infty < \mathrm{dist}(x_i, \partial\Omega) := \inf_{y \in \partial\Omega} \|x_i - y\|_\infty$, and set to zero otherwise. We have
\[
\inf_h R_n(\hat f^{\mathrm{BO}}_h, \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)) \asymp R_{\mathrm{BO}} := (A_n \sigma^2/n) \vee (\sigma^2/n^d)^{2\alpha/(d+2\alpha)},
\]
where $A_n = \log n$ when $d = 1$ and $A_n = 1$ when $d \ge 2$, for an optimal choice of maximal bandwidth $h \asymp h_{\mathrm{MO}}$.

Note that BO achieves the error rate of MO only when $d = 1$, when $d = 2$ and $\alpha = 1$, or when $d \ge 2$ and $\sigma^2 = O(n^{-2\alpha(1 - 1/d) + 1})$, which is polynomially small when $d = 2$ and $\alpha > 1$ or when $d \ge 3$. Thus in general, BO is substantially weaker than MO. That said, BO achieves the minimax rate established in [56] when $\sigma$ is fixed.

4.4. Yaroslavsky's filter is oracle-optimal under low noise. As the practitioner knows, YF can be quite good on natural images. In fact, it can dramatically outperform the linear filter, and it compares favorably with methods such as wavelet thresholding, particularly when the noise level is small. We substantiate this empirical evidence with a theoretical study of its performance, showing that it achieves the MO bound in such situations (i.e., when $\sigma$ is small).
Assume that, for a fixed cumulative distribution function $F$, the noise satisfies
\[
\mathbb{P}(|\varepsilon_i| \le t) \ge F(t/\sigma), \qquad \forall t,\ \forall i \in I_n^d.
\tag{4.1}
\]
The following result states that YF achieves a performance comparable to that of MO if $\sigma$ is small. We only require that the noise distribution in (4.1) have quickly decaying tails.

Theorem 4.4. Let $\hat f^{\mathrm{YF}}_{h, h_y}$ denote the LPR estimator (3.2) with Yaroslavsky's weights (3.5). Suppose that, for some constants $C, b > 0$, (4.1) holds with $1 - F(t) \le C \exp(-(t/C)^b)$ for $t$ large enough. Then there is another constant $C' > 0$ such that, if $\sigma \le (C' \log n)^{-1/b}$,
\[
\inf_{h, h_y} R_n(\hat f^{\mathrm{YF}}_{h, h_y}, \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)) \le (1 + o(1))\, R_{\mathrm{MO}},
\]
where an optimal choice of bandwidths is $h \asymp h_{\mathrm{MO}}$ and $h_y \asymp 1$.

Gaussian noise satisfies the requirements of Theorem 4.4 with $C = \sqrt{2}$ and $b = 2$, resulting in the constraint $\sigma = O(1/\sqrt{\log n})$, which is quite mild. This explains why YF tends to perform well in practice, at least for low noise levels. This excellent performance hinges on the fact that the photometric kernel is able to mimic the membership oracle when the noise level is small. When the noise level is of order 1 or larger, this is no longer true, as illustrated in Figure 4.1. There, we clearly see that in a window containing points from both $\Omega$ and its complement, the pixel values are mixed in the histogram if the noise level is too large, making a clear separation impossible. We formally argue this point after the proof of Theorem 4.4 in Section 8.2.4.

It is worth noting that the proof helps clarify exactly the artifacts encountered in practice by YF under strong noise (cf. Figure 6.5). Indeed, the output often looks like the original scene contaminated by something like "salt and pepper" noise. As mentioned in the proof, this is because YF does not alter pixels with extreme values.

4.5. Performance analysis for Non-Local Means. In the previous section we established that YF performs as well as MO when the noise level is small, while it is useless otherwise. A natural strategy consists of first reducing the noise level by averaging and then applying YF. This is almost exactly what NLM-average does. We precisely quantify the MSE performance of both NLM-average and Euclidean NLM in this section. Note that we state our results for i.i.d. Gaussian noise for simplicity, though they are valid for many other distribution families such as the uniform and double-exponential.

Theorem 4.5. Let $\hat f^{\mathrm{NLM}}_{h, h_y}$ denote the LPR estimator (3.2) with NLM weights (3.8) and photometric kernel either Euclidean (3.9) or average (3.11). If the noise conditions of Theorem 4.4 hold, then
\[
\inf_{h, h_y} R_n(\hat f^{\mathrm{NLM}}_{h, h_y}, \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)) \le (1 + o(1))\, R_{\mathrm{MO}},
\]
where $h_P = 1/n$. Otherwise, assuming $\sigma$ is bounded away from 0, we have
\[
\inf_{h, h_y} R_n(\hat f^{\mathrm{NLM}}_{h, h_y}, \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)) \lesssim R_{\mathrm{NLM}} := (B_n/n) \vee (\sigma^2/n^d)^{2\alpha/(d+2\alpha)},
\]
where $B_n := (\sigma^4 \log n)^{1/d}$ (Euclidean) or $B_n := (\sigma^2 \log n)^{1/d}$ (average), and an optimal choice of bandwidths is $h \asymp h_{\mathrm{MO}}$, $h_y \asymp h_y^{\mathrm{NLM}} := \sigma^3 \sqrt{\log n}$ (Euclidean) or $h_y^{\mathrm{NLM}} := (2C_0)^{-d} \mu/2$ (average), and $h_P \asymp h_P^{\mathrm{NLM}} := B_n/n$.

In other words, if the low-noise conditions of Theorem 4.4 hold, then the optimal patch size is a single pixel, the NLM is exactly YF, and we achieve the YF performance bound.
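As a numerical illustration of the prescriptions in Theorem 4.5 (again up to constants, and under our reading of the bandwidth formulas above), the following computes the spatial bandwidth, the patch width in pixels, and the Euclidean photometric bandwidth for sample values of $n$, $\sigma$, $\alpha$ and $d$; here $\sigma$ is on the normalized $[0,1]$ intensity scale, chosen so that the JNR is of order 1, the strong-noise regime of the theorem. For the average variant, $h_y$ scales with the jump $\mu$ rather than with $n$.

```python
import numpy as np

d, alpha, n, sigma = 2, 2.0, 512, 1.5
h_mo = (sigma ** 2 / n ** d) ** (1.0 / (d + 2 * alpha))  # spatial bandwidth h
b_eucl = (sigma ** 4 * np.log(n)) ** (1.0 / d)           # B_n, Euclidean (3.9)
b_avg = (sigma ** 2 * np.log(n)) ** (1.0 / d)            # B_n, average (3.11)
h_y_eucl = sigma ** 3 * np.sqrt(np.log(n))               # photometric bandwidth
print(f"search window ~ {h_mo * n:.0f} pixels wide (h ~ {h_mo:.3g})")
print(f"patch width   ~ {b_eucl:.1f} px (Euclidean), ~ {b_avg:.1f} px (average)")
print(f"h_y           ~ {h_y_eucl:.2f} (Euclidean)")
```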
There is an elbow in the performance bound: once the optimal patch size exceeds a single pixel, estimation errors within a patch sidelength of the boundary impact the performance. There is a strong correspondence between this bound and the BO bound in Theorem 4.3. If $\sigma$ is fixed, $R_{\mathrm{NLM}} \asymp R_{\mathrm{BO}}$ for $d = 1$ and $R_{\mathrm{NLM}} \asymp (\log n)^{1/d} R_{\mathrm{BO}}$ for $d \ge 2$; since $R_{\mathrm{BO}}$ is the minimax rate [56], NLM is minimax optimal up to a logarithmic factor.

Note that our bandwidth $h$ is not infinite as in [34]. There, the authors use an infinite window for searching for matching patches: this is optimal in their setting because they consider piecewise constant images. In our setting, images are piecewise smooth, and a smaller bandwidth can not only help us reduce the risk of our estimate, but also lead to more computationally efficient estimation algorithms.

5. Performance analysis for thin features and textures. In the cartoon model of Section 2 with JNR of order 1, the performance of NLM is comparable to that of variable bandwidth kernel smoothing, and actually to that of wavelets as well [26, 42]. In natural images, however, NLM can perform substantially better. We explain this by the fact that the cartoon model we considered so far, though useful as a benchmark, does not account for features common in natural images, particularly very thin regions a few pixels wide and regular textures. Below, we proceed as if the image contained regions of cartoon type and regions with thin features and/or texture, and keep the same bandwidths that we found to be optimal in the cartoon model in the previous results.

5.1. Thin features. Both YF and NLM achieve a good performance on thin features. We focus on sample points within the feature, and on the interesting case where the thickness is of smaller order of magnitude than the bandwidth $h$.

Theorem 5.1. Consider $f \in \mathcal{F}_{\mathrm{thin}}(\alpha, C_0, d_0, a)$ with band $\Omega$; assume all parameters are fixed except $a \ge 4/n$, with $a \to 0$ as $n \to \infty$. In terms of the point-wise risk (2.3) at $x_i \in \Omega$, we have:
1. The linear filter with bandwidth $h_{\mathrm{LF}}$ has risk of order 1 if $a = o(h_{\mathrm{LF}})$.
2. BO with maximal bandwidth $h_{\mathrm{MO}}$ has a point-wise risk of order $a^{2\alpha} \vee \sigma^2 (na)^{-d}$, provided $\mathrm{dist}(x_i, \Omega^c) \ge a/C$ for some $C > 3$ and $na \to \infty$.
3. MO with bandwidth $h_{\mathrm{MO}}$ has risk of order $(h_{\mathrm{MO}}/a)^{d - d_0} R_{\mathrm{MO}}$ if $a = o(h_{\mathrm{MO}})$.
4. The latter is true of YF with bandwidths $h_{\mathrm{MO}}$, $h_y \asymp 1$, if the noise satisfies the conditions of Theorem 4.4.
5. This is also the case for NLM (Euclidean or average) with bandwidths $h = h_{\mathrm{MO}}$, $h_y = h_y^{\mathrm{NLM}}$ and patch size $h_P = h_P^{\mathrm{NLM}}$, if $\mathrm{dist}(x_i, \Omega^c) \ge h_P^{\mathrm{NLM}}$.

In view of this result, we can say that linear filtering essentially erases the feature. In contrast, YF still performs very well (relative to MO) under low noise, and NLM performs well both in this case and in higher noise settings. Note that when $h_P^{\mathrm{NLM}} = o(a)$, the bound above holds for most points within the thin feature. Though not stated here, we found that NLM is able to handle such bands under special circumstances, namely when $d \ge 3$ and the band is straight.

5.2. Regular patterns and textures. We consider very general patterns where YF will do as well as in the cartoon model, situations where most other methods are essentially useless.
Euclidean NLM performs well too, under additional assumptions on the pattern.

Proposition 5.1. Consider $f \in \mathcal{F}_{\mathrm{pattern}}(\alpha, C_0, a)$ with all parameters fixed except for $a$, which satisfies $a = o(h_{\mathrm{MO}})$. Let $N_\Omega := \#\{i : x_i \in \Omega\}$ and define $N_{\Omega^c}$ similarly. If $N_\Omega \vee N_{\Omega^c} \le C (N_\Omega \wedge N_{\Omega^c})$ and $na \ge (r+1)(2C+2)$, with $C > 1$ fixed, we have the following:
1. MO with $h = h_{\mathrm{MO}}$ achieves an MSE of order $R_{\mathrm{MO}}$.
2. The latter is true of YF with bandwidths $h_{\mathrm{MO}}$, $h_y \asymp 1$, if the noise satisfies the conditions of Theorem 4.4.
3. Suppose in addition that for every $x_i \in \Omega$ and $x_j \in \Omega^c$,
\[
\| \mathbf{1}(P_i \cap \Omega) - \mathbf{1}(P_j \cap \Omega) \|_2^2 \ge (\sigma^2 \log n)/C',
\tag{5.1}
\]
for some $C' > 1$ fixed. Then (Euclidean) NLM with bandwidths $h = h_{\mathrm{MO}}$, $h_y = h_y^{\mathrm{NLM}}$ and patch size $h_P^{\mathrm{NLM}}$ achieves an MSE of order $(na)^d R_{\mathrm{MO}}$.

The condition (5.1) essentially means that any two patches, where one is centered in the foreground and the other is centered in the background, must be sufficiently distinct, and the necessary degree of distinction increases with the noise level. For instance, (5.1) is satisfied by such patterns as a chessboard or stripes. A regular pattern in a real image (e.g., Barbara's blouse) is often referred to as texture, and NLM is able to effectively denoise such patterns under some regularity conditions. For random models of textures, such as Markov random fields, we do not expect NLM to do well unless the pattern is not very random. The reason is that few patches are close in Euclidean distance to a given patch.

6. Experiments. In this section we provide numerical results for images with $d = 2$, whose pixel intensities are between 0 and 255. In our experiments, the noise is Gaussian with standard deviation $\sigma \in \{5, 20, 50, 100\}$. (Note that this corresponds, for normalized images in $[0,1]$, to noise with $\sigma \in \{5/255, 20/255, 50/255, 100/255\}$.) On both toy and classical images, we have compared the behavior of the following methods: linear filtering (LF), Yaroslavsky's filter (YF), Euclidean NLM (NLM), average non-local means (NLM-average) and the membership oracle (MO). In all cases we have implemented the LPR version of the methods for the orders $r \in \{0, 1, 2\}$. Note that, as expected, for linear filtering the LPR of orders 0 and 1 are exactly identical because the support of the kernel is symmetric. However, for the other methods the symmetry of the support is no longer guaranteed and the estimators differ. The higher order LPR versions are computed by solving the linear system in (8.5). A small numerical constant ($10^{-8}$) is added to the diagonal elements of $X^T X$ so that inverting this matrix is always a well-conditioned problem; a sketch of this stabilized solve is given below.
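The authors' MATLAB implementation is not reproduced here; the following Python sketch shows the stabilized weighted least-squares solve just described for $d = 2$, together with the 10-observations-per-coefficient rule of thumb used below to size the search window. The function name and calling convention are ours.

```python
import numpy as np
from math import comb

def lpr_estimate(dx, w, yv, r):
    """Weighted LPR fit of degree r behind (3.2), for d = 2.
    dx: (m, 2) array of offsets x_j - x_i; w, yv: length-m weights and
    pixel values.  Returns the fitted constant coefficient a_0, i.e. the
    estimate at x_i.  As in the experiments, 1e-8 is added to the diagonal
    so that the normal equations are always well conditioned."""
    cols = [dx[:, 0] ** s1 * dx[:, 1] ** s2
            for s1 in range(r + 1) for s2 in range(r + 1 - s1)]
    X = np.stack(cols, axis=1)                  # m x q design, q = comb(r+2, 2)
    A = X.T @ (w[:, None] * X) + 1e-8 * np.eye(X.shape[1])
    a = np.linalg.solve(A, X.T @ (w * yv))
    return a[0]                                 # column 0 is the constant term

# Rule of thumb: about 10 observations per coefficient.
for r in (0, 1, 2):
    q = comb(r + 2, 2)                          # q = C(r+d, d) with d = 2
    print(f"r={r}: q={q} coefficients -> window of about {10 * q} pixels")
```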
For fair comparisons we have used the same box kernel with every method. The patch size is 7 x 7 (i.e., a patch width of 7 pixels); it is kept fixed for all the methods. For the spatial bandwidth $h$ we have chosen to use the values obtained by considering the best $h$ (in terms of MSE) for MO on the Bowl image. Thus, for each noise level $\sigma \in \{5, 20, 50, 100\}$ and polynomial order $r \in \{0, 1, 2\}$, we use an $h$ optimized on Bowl; the values are provided by the MSE optimization in Figure 6.1 and summarized in Table 6.1.

Table 6.1. Spatial bandwidth h (in pixels) used in practice, obtained by minimizing the MSE of MO on Bowl (cf. Figure 6.1).

          sigma = 5   sigma = 20   sigma = 50   sigma = 100
  r = 0       7           13           23            35
  r = 1       9           17           25            33
  r = 2      23           41           59            61

The photometric bandwidth $h_y$ is chosen by hand, and differs from method to method: $\sqrt{10}\,\sigma$ (YF), $0.29\sigma$ (NLM-average), $13.1\sigma$ (NLM), 30 (MO). It is to be noted that the parameters are given for comparison between methods; we do not claim those are the best parameters for all applications.

In practice, $h$ needs to increase with $r$ to ensure that the LPR is stable. Note that there are $q = \binom{r+d}{d}$ polynomial coefficients in each search window. If we apply the rule of thumb of 10 observations per unknown parameter, the search window needs to include about $10q$ pixels. This is illustrated in Figure 6.1, where we see the best $h$ increasing with $r$.

Since for natural (not cartoon-like) images MO is irrelevant, we have used a modified YF oracle instead. This oracle has access to the original image to compute the weights as in (3.3), and then performs LPR on the noisy pixel values with these weights. For piecewise constant images, this coincides exactly with MO as soon as the bandwidth is large enough.

The experiments conducted show that LF is always outperformed in practice by YF, NLM and NLM-average. For a low noise level ($\sigma = 5$), YF with $r = 2$ outperforms the other methods (cf. Figure 6.6 and Table 6.2). However, in the presence of strong noise, NLM and NLM-average are the clear winners on most images. Interestingly, and perhaps surprisingly, NLM-average can even improve on NLM under very strong noise, even for natural images. On the other hand, one can see that NLM-average mimics the behavior of LF for textured images with strong noise (cf. Figure 6.9), due to the fact that different sides of periodic features are averaged together. This limitation is particularly obvious for the Stripes image (Figure 6.9). The influence of the degree $r$ of the LPR depends on the nature of the image being denoised (e.g., natural vs. cartoon images); in practice it remains unclear how this parameter should be tuned.

Figure 6.1. MSE with respect to h for the image Bowl, with noise level sigma in {5, 20, 50, 100}; each of the four panels (one per noise level) shows the MSE as a function of the bandwidth h for MO of orders 0, 1 and 2.

Figures 6.2, 6.3, 6.4 and 6.5 demonstrate the importance of the jump parameter $\mu$ in practice. On the left end of the Swoosh, the jump is larger and we reconstruct it accurately across all noise levels. On the right end, the jump is smaller and the performance degrades with $\sigma$, exactly as predicted by our theory. The MATLAB codes are available on the authors' webpages to reproduce those results.
Table 6.2. MSE comparisons of the denoising methods considered, for LPR of orders 0, 1 and 2. The compared methods are the linear filter (LF), Yaroslavsky's filter (YF), the NLM-average (NLM-Av.), the classical NLM and the membership oracle (MO).

              Blob    Sinusoid    Bowl     Ridges    Stripes    Barbara     Cam.
  sigma = 5
  LF0        35.27      40.29     57.71     48.80   21077.49     408.13    437.96
  LF1        48.50      55.50     74.18    110.08   22787.90     473.15    529.64
  LF2        72.11      82.89    105.09    246.70   15424.99     586.47    663.59
  YF0         1.45       1.67      1.74     13.70       1.68      20.48     13.59
  YF1         1.11       1.30      1.01      8.57       1.21      20.03     13.39
  YF2         0.94       1.17      0.87      7.94       0.69      20.79     13.48
  NLM-Av.0    1.61       2.55      3.30      4.15     327.70     255.23    188.74
  NLM-Av.1    1.49       1.79      2.33      4.08     202.88     202.96     99.13
  NLM-Av.2    1.35       1.67      1.99      3.45     329.83     242.91    151.60
  NLM0        1.55       1.74      1.86      3.88       3.89      19.54     13.52
  NLM1        1.47       1.55      1.73      3.65       3.11      19.79     13.72
  NLM2        1.51       1.51      1.59      3.59       1.68      20.27     13.74
  MO0         1.58       1.77      0.97     16.16       1.23      35.10     28.60
  MO1         1.11       1.26      0.52     19.13       0.81      37.84     28.99
  MO2         0.97       1.31      0.35     12.41       0.36      39.65     29.75
  sigma = 20
  LF0        77.77      91.66    109.58    305.04   13956.38     607.11    684.54
  LF1       104.25     134.27    141.41    533.25   14455.88     725.30    818.62
  LF2       139.80     208.44    188.67    707.47   16996.42     901.23    994.78
  YF0        15.43      17.05     11.61    118.99       9.77     189.69    104.70
  YF1        17.93      20.80     11.46    158.55       7.81     219.63    113.33
  YF2        18.97      24.66     14.34    174.83       6.59     242.48    122.42
  NLM-Av.0    6.99       9.27     17.76     18.73     332.71     345.66    307.67
  NLM-Av.1    6.02       7.72     14.12     19.61     406.11     334.80    275.87
  NLM-Av.2    4.58       7.53     13.05     15.15     399.46     352.14    306.27
  NLM0        5.76       6.37     12.66     20.44      11.54     121.17     92.36
  NLM1        5.53       5.70     13.28     21.68       9.31     129.10     96.88
  NLM2        5.02       4.53     13.16     19.44       5.92     137.95    101.09
  MO0         4.00       4.41      5.03     31.78       4.65      41.67     34.24
  MO1         2.96       3.25      2.82     35.75       2.89      44.92     34.06
  MO2         2.26       2.74      1.88     33.60       1.83      45.68     34.88
  sigma = 50
  LF0       149.70     211.28    195.46    847.84   17633.15     900.24    997.56
  LF1       162.93     232.97    211.39    939.06   15081.34     955.73   1048.22
  LF2       209.56     290.03    273.74   1501.85   15705.82    1157.37   1221.38
  YF0       112.17     138.30    146.13    591.42     857.69     652.97    523.88
  YF1       129.07     155.85    164.73    655.67     722.93     699.37    574.87
  YF2       146.05     178.84    199.35    998.40     741.85     811.85    629.87
  NLM-Av.0   23.66      29.89     52.96     64.32     807.78     419.12    389.60
  NLM-Av.1   21.51      27.56     36.86     69.56     770.17     414.81    372.68
  NLM-Av.2   18.17      26.07     39.07     67.12     820.21     425.60    385.13
  NLM0       21.64      27.35     36.32    162.17      40.92     367.48    230.35
  NLM1       29.09      31.35     30.78    179.78      25.50     381.14    234.01
  NLM2       25.33      30.60     30.15    245.86      20.72     398.52    243.81
  MO0         7.68       8.32     11.23     48.99      10.90      50.50     42.64
  MO1         7.72       7.98      9.20     57.78       8.70      55.67     44.46
  MO2         5.56       6.11      6.43     67.90       6.01      49.81     41.51
  sigma = 100
  LF0       239.01     319.36    300.81   1340.28   17131.55    1198.68   1249.50
  LF1       225.89     307.90    285.37   1277.60   17776.32    1159.50   1218.33
  LF2       225.36     305.06    291.83   1550.89   17079.65    1188.50   1248.76
  YF0       308.15     367.90    365.84   1206.35    8848.92    1108.23   1080.66
  YF1       299.59     359.21    352.73   1156.57    9197.93    1077.16   1064.45
  YF2       296.39     355.73    356.40   1375.29    8813.04    1099.59   1077.78
  NLM-Av.0   64.41      76.19    118.55    202.15    8223.43     556.50    495.62
  NLM-Av.1   66.78      74.22     94.12    204.95    8385.63     554.88    492.05
  NLM-Av.2   66.27      73.42     98.31    224.59    8118.55     555.58    495.58
  NLM0       91.67     131.36    167.44    819.97      91.49     911.60    628.08
  NLM1      118.08     135.37    183.32    786.01      90.29     926.67    662.68
  NLM2      101.83     127.34    171.13    956.96      88.01     918.08    646.76
  MO0        14.19      15.12     22.74     80.33      19.90      61.09     54.41
  MO1        18.31      17.96     23.06     91.72      22.38      76.21     65.36
  MO2        17.76      17.65     18.95     88.60      22.09      72.00     62.53
Results are averaged over 5 Gaussian noise replicas.

Figure 6.2. Toy thin feature image (Swoosh) corrupted by Gaussian noise with sigma = 5. Panel MSEs: noisy 2.50e+01; LF0/1/2: 4.03e+01 / 5.55e+01 / 8.29e+01; YF0/1/2: 1.69e+00 / 1.31e+00 / 1.19e+00; NLM-Av.0/1/2: 2.56e+00 / 1.90e+00 / 1.75e+00; NLM0/1/2: 1.54e+00 / 1.36e+00 / 1.37e+00; MO0/1/2: 1.79e+00 / 1.26e+00 / 1.31e+00.

Figure 6.3. Toy thin feature image (Swoosh) corrupted by Gaussian noise with sigma = 20. Panel MSEs: noisy 3.99e+02; LF0/1/2: 9.18e+01 / 1.35e+02 / 2.09e+02; YF0/1/2: 1.71e+01 / 2.08e+01 / 2.49e+01; NLM-Av.0/1/2: 9.86e+00 / 8.37e+00 / 7.87e+00; NLM0/1/2: 6.19e+00 / 6.00e+00 / 4.53e+00; MO0/1/2: 4.54e+00 / 3.39e+00 / 2.85e+00.

Figure 6.4. Toy thin feature image (Swoosh) corrupted by Gaussian noise with sigma = 50. Panel MSEs: noisy 2.50e+03; LF0/1/2: 2.13e+02 / 2.35e+02 / 2.91e+02; YF0/1/2: 1.38e+02 / 1.56e+02 / 1.78e+02; NLM-Av.0/1/2: 3.11e+01 / 2.91e+01 / 2.64e+01; NLM0/1/2: 3.54e+01 / 4.25e+01 / 4.35e+01; MO0/1/2: 8.61e+00 / 8.32e+00 / 6.40e+00.

Figure 6.5. Toy thin feature image (Swoosh) corrupted by Gaussian noise with sigma = 100. Panel MSEs: noisy 9.98e+03; LF0/1/2: 3.22e+02 / 3.11e+02 / 3.07e+02; YF0/1/2: 3.67e+02 / 3.58e+02 / 3.54e+02; NLM-Av.0/1/2: 8.55e+01 / 8.27e+01 / 8.11e+01; NLM0/1/2: 1.78e+02 / 1.78e+02 / 1.68e+02; MO0/1/2: 1.51e+01 / 1.82e+01 / 1.86e+01.

Figure 6.6. Toy cartoon image (Bowl) corrupted by Gaussian noise with sigma = 5. Panel MSEs: noisy 2.50e+01; LF0/1/2: 5.76e+01 / 7.40e+01 / 1.05e+02; YF0/1/2: 1.71e+00 / 9.88e-01 / 8.74e-01; NLM-Av.0/1/2: 3.41e+00 / 2.50e+00 / 2.15e+00; NLM0/1/2: 1.67e+00 / 1.60e+00 / 1.50e+00; MO0/1/2: 9.56e-01 / 5.01e-01 / 3.48e-01.

Figure 6.7. Toy cartoon image (Blob) corrupted by Gaussian noise with sigma = 20. Panel MSEs: noisy 3.99e+02; LF0/1/2: 7.78e+01 / 1.04e+02 / 1.40e+02; YF0/1/2: 1.53e+01 / 1.80e+01 / 1.90e+01; NLM-Av.0/1/2: 7.55e+00 / 6.62e+00 / 5.10e+00; NLM0/1/2: 5.86e+00 / 5.83e+00 / 4.92e+00; MO0/1/2: 4.12e+00 / 3.03e+00 / 2.37e+00.
Figure 6.8. Barbara image corrupted by Gaussian noise with σ = 50. [Panels (a)-(p), MSE values: Noisy 2.50e+03; LF0/1/2: 9.00e+02/9.55e+02/1.15e+03; YF0/1/2: 6.53e+02/6.99e+02/8.10e+02; NLM-Av.0/1/2: 4.20e+02/4.17e+02/4.26e+02; NLM0/1/2: 4.50e+02/4.76e+02/4.96e+02; MO0/1/2: 5.28e+01/5.86e+01/5.18e+01.]

Figure 6.9. Toy texture image (Stripes) corrupted by Gaussian noise with σ = 100. [Panels (a)-(p), MSE values: Noisy 9.98e+03; LF0/1/2: 1.71e+04/1.78e+04/1.71e+04; YF0/1/2: 8.91e+03/9.26e+03/8.87e+03; NLM-Av.0/1/2: 7.39e+03/7.51e+03/7.25e+03; NLM0/1/2: 2.42e+02/2.10e+02/2.33e+02; MO0/1/2: 2.11e+01/2.37e+01/2.28e+01.]

Image class $\mathcal{F}_{\mathrm{cartoon}}$:
  MO:      $R_n \asymp R_{\mathrm{MO}} := (\sigma^2/n^d)^{2\alpha/(d+2\alpha)}$
  LF:      $R_n \asymp (\sigma^2/n^d)^{1/(d+1)}$
  YF:      $R_n \le (1 + o(1)) R_{\mathrm{MO}}$ (for low noise)
  NLM:     $R_n \lesssim (1 + o(1)) R_{\mathrm{MO}}$ for low noise; $[(\sigma^4 \log n)^{1/d}/n] \vee R_{\mathrm{MO}}$ otherwise
  NLM-Av.: $R_n \lesssim (1 + o(1)) R_{\mathrm{MO}}$ for low noise; $[(\sigma^2 \log n)^{1/d}/n] \vee R_{\mathrm{MO}}$ otherwise

Image class $\mathcal{F}_{\mathrm{pattern}}$:
  MO:      $R_n \asymp R_{\mathrm{MO}}$
  LF:      $R_n \asymp 1$
  YF:      $R_n \lesssim R_{\mathrm{MO}}$ (for low noise)
  NLM:     $R_n \lesssim (1 + o(1)) R_{\mathrm{MO}}$ for low noise; $(na)^d R_{\mathrm{MO}}$ for "distinct" patterns

Image class $\mathcal{F}_{\mathrm{thin}}$:
  MO:      $\mathbb{E}(\hat f_i - f_i)^2 \asymp (h_{\mathrm{MO}}/a)^{d-d_0} R_{\mathrm{MO}}$
  LF:      $\mathbb{E}(\hat f_i - f_i)^2 \asymp 1$
  YF:      $\mathbb{E}(\hat f_i - f_i)^2 \lesssim (h_{\mathrm{MO}}/a)^{d-d_0} R_{\mathrm{MO}}$ (for low noise)
  NLM:     $\mathbb{E}(\hat f_i - f_i)^2 \lesssim (h_{\mathrm{MO}}/a)^{d-d_0} R_{\mathrm{MO}}$
  NLM-Av.: $\mathbb{E}(\hat f_i - f_i)^2 \lesssim (h_{\mathrm{MO}}/a)^{d-d_0} R_{\mathrm{MO}}$

Table 7.1. Summary of results.

As described in the Introduction, the bounds described in this paper and in the independent work [34, 33] address fundamental performance limits of NLM and related photometric image filtering methods. These methods have an established history of strong empirical performance on natural images, but until now little was known about how they perform asymptotically, especially with respect to related methods based on computational harmonic analysis (e.g., wavelet or curvelet denoising).

Both our bounds and the bounds in [34] suggest that NLM has some limitations for piecewise smooth images when the noise is not small (i.e., when YF cannot perform effectively). When the noise is small, we note that YF is a special case of NLM with a patch size of one pixel, and the performance of NLM hinges upon our ability to measure the similarity of two patches based on noisy observations. In low noise, this similarity can already be estimated quite accurately with a single-pixel patch. In stronger noise, the similarity measured through larger patches is more robust to noise, but larger patches also introduce some bias. This results in an elbow in the performance bounds for NLM. Recent empirical results suggest that these limitations can be mitigated by adapting the kernel shape [53], the patch shape [12], or the spatial bandwidth $h$ [26]; a theoretical understanding of this kind of adaptation remains an open problem.

There are several distinctions between our work and the closely related work in [34] that bear mentioning.
First, we consider the cartoon model, where the functions are piecewise Hölder-smooth images with a discontinuity set corresponding to a Lipschitz mapping of the unit ball, while Maleki et al. [34] consider the horizon edge model, where the functions are piecewise constant with a discontinuity set corresponding to the graph of a Lipschitz function. Though they actually consider smoother edges, their analysis reduces to the case of Lipschitz smoothness. We consider images in arbitrary dimension, showing that NLM behaves differently when $d \ge 3$, while Maleki et al. consider the case of 2D images ($d = 2$). Because they consider functions that are piecewise constant, they use the weighted average version (3.1) without spatial localization (i.e., $h = \infty$). Because we focus on smooth, not necessarily constant, regions, we need to localize both YF and NLM. Applying the more complex LPR (3.2) enables YF to adapt to the degree of regularity in each smooth region. We note that Maleki et al. [34] do not consider the case of low noise and simply show that YF achieves the same performance as LF, which is also our conclusion in strong noise. Moreover, to simplify the analysis, Maleki et al. consider a slightly modified version of NLM and derive lower bounds for oracle versions of YF and NLM. The lower bounds for NLM were also challenging for us, and we only provide heuristics. We mention that our results imply that the simpler NLM-average achieves the same performance as NLM in the horizon model.

Let us also comment on the work of Tsybakov [56]. This paper suggests and analyzes (within the cartoon model) a method very similar to NLM-average, based on medians rather than means. The method is based on non-overlapping patches, which makes it applicable in situations where the noise distribution is heavy-tailed. (We were not aware of this work when we prepared our paper.)

Our analysis of NLM for image classes with thin features or regular patterns is also a significant novel aspect of our work. Though we expect wavelets to be near-optimal for cartoon images when the discontinuity is Lipschitz, NLM has a significant empirical advantage over wavelets for certain kinds of repeating textures. We develop a model for images with these features, and note that it does not approach the cartoon or horizon model asymptotically. For this image class, we demonstrate that NLM performs as well as it does for the cartoon class.

The current bounds are based on ideal bandwidths which depend on the unknown smoothness parameter $\alpha$. Thus we have demonstrated that the adaptive filtering techniques considered adapt to the discontinuity $\Omega$, but not to $\alpha$. We anticipate that adaptivity to $\alpha$ is indeed possible and leave that analysis for future work.

We note that NLM is not the current state-of-the-art image denoising method in common use. More evolved patch-based methods utilize sparse representations of patches, adaptive kernel bandwidths, and adaptive patch shapes (cf. [9, 10, 32, 11, 49, 53]). While these aspects are not considered in our analysis, the theoretical insights provided by this paper may potentially lead to an improved understanding of a broad class of patch-based image denoising methods and subsequently to better algorithms (cf. [46] for possible directions).
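For concreteness, the following minimal sketch (our own illustration, not the exact estimators analyzed in this paper, which use local polynomial regression of order $r$, clipping, and the tuned bandwidths discussed above) contrasts the three photometric kernels compared throughout: YF compares single noisy pixels, NLM compares whole patches, and NLM-average compares patch means. All names and parameter choices in the snippet are ours.

    import numpy as np

    def photometric_weight(y, i, j, half_patch, h_y, mode="nlm"):
        """0/1 photometric weight between pixels i and j of a 1D signal y.

        mode="yf":      compare single noisy pixels (Yaroslavsky's filter);
        mode="nlm":     compare whole patches in (per-pixel) Euclidean norm;
        mode="nlm-avg": compare patch means (NLM-average).
        Patches are simply clipped at the boundary in this sketch.
        """
        Pi = y[max(i - half_patch, 0): i + half_patch + 1]
        Pj = y[max(j - half_patch, 0): j + half_patch + 1]
        m = min(len(Pi), len(Pj))
        Pi, Pj = Pi[:m], Pj[:m]
        if mode == "yf":
            dist = abs(y[i] - y[j])
        elif mode == "nlm":
            dist = np.linalg.norm(Pi - Pj) / np.sqrt(m)
        else:  # "nlm-avg"
            dist = abs(Pi.mean() - Pj.mean())
        return 1.0 if dist <= h_y else 0.0

    def denoise(y, h_pix, half_patch, h_y, mode="nlm"):
        """Weighted average over the spatial window (i.e., LPR of order 0)."""
        n, out = len(y), np.empty(len(y))
        for i in range(n):
            js = list(range(max(i - h_pix, 0), min(i + h_pix + 1, n)))
            w = np.array([photometric_weight(y, i, j, half_patch, h_y, mode)
                          for j in js])
            out[i] = np.dot(w, y[js]) / w.sum()   # w for j = i is 1, so sum > 0
        return out

Replacing the order-0 average in `denoise` by a local polynomial fit of order $r$ recovers the LPR variants studied in the theorems.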
8. Proofs. In this section, $C, C_1, C_2, \dots$ denote finite positive constants that do not change with $n$ and whose actual value may change with each appearance.

8.1. Preliminary results. We first gather some basic results.

8.1.1. Some analysis. Functions in $H_d(\alpha, C_0)$ are uniformly well-approximated locally by polynomials of degree $\lfloor \alpha \rfloor$, specifically their Taylor expansions. For $g \in H_d(\alpha, C_0)$ and $x \in [0,1]^d$, the Taylor expansion of $g$ at $x$ of degree $t \in \mathbb{N}$ is defined as
$$T_x^t g(x') = \sum_{|s| \le t} g^{(s)}(x) \prod_{i=1}^d \frac{(x'_i - x_i)^{s_i}}{s_i!}.$$

Lemma 8.1. For any $g \in H_d(\alpha, C_0)$,
$$|g(x') - T_x^{\lfloor \alpha \rfloor} g(x')| \le c_\alpha C_0 \|x' - x\|_\infty^\alpha, \quad \forall x, x' \in [0,1]^d, \quad \text{where } c_\alpha := \sum_{s \in \mathbb{N}^d : |s| = \lfloor \alpha \rfloor} \frac{1}{s_1! \cdots s_d!}.$$

Proof. Though this sort of result is well known, we provide a proof for completeness. A Taylor approximation of degree $\lfloor \alpha \rfloor$ gives
$$g(x') = T_x^{\lfloor \alpha \rfloor} g(x') + \sum_{|s| = \lfloor \alpha \rfloor} \big( g^{(s)}(z) - g^{(s)}(x) \big) \prod_{i=1}^d \frac{(x'_i - x_i)^{s_i}}{s_i!},$$
for some $z$ on the segment joining $x$ and $x'$. Hence,
$$|g(x') - T_x^{\lfloor \alpha \rfloor} g(x')| \le c_\alpha \|x' - x\|_\infty^{\lfloor \alpha \rfloor} \max_{|s| = \lfloor \alpha \rfloor} |g^{(s)}(z) - g^{(s)}(x)|.$$
Now apply (2.5) and the fact that $\|z - x\|_\infty \le \|x' - x\|_\infty$ to get
$$|g^{(s)}(z) - g^{(s)}(x)| \le C_0 \|x' - x\|_\infty^{\alpha - \lfloor \alpha \rfloor}, \quad \forall s \in \mathbb{N}^d, \ |s| = \lfloor \alpha \rfloor.$$
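Lemma 8.1 is easy to probe numerically. The following check (our own illustration, for $d = 1$ and $\alpha = 1.5$, taking $g(x) = |x - 1/2|^\alpha$, which belongs to $H_1(\alpha, C_0)$) confirms that the ratio of the Taylor remainder to $\|x' - x\|_\infty^\alpha$ stays bounded:

    import numpy as np

    alpha = 1.5                                  # Hoelder exponent; floor(alpha) = 1
    g  = lambda x: np.abs(x - 0.5) ** alpha      # g in H_1(alpha, C0) on [0, 1]
    dg = lambda x: alpha * np.sign(x - 0.5) * np.abs(x - 0.5) ** (alpha - 1)

    rng = np.random.default_rng(0)
    x, xp = rng.uniform(0, 1, (2, 100_000))
    mask = np.abs(xp - x) > 1e-8                 # avoid 0/0 in the ratio below
    taylor = g(x) + dg(x) * (xp - x)             # degree-1 Taylor expansion at x
    ratio = np.abs(g(xp) - taylor)[mask] / np.abs(xp - x)[mask] ** alpha
    print(ratio.max())                           # stays bounded, as Lemma 8.1 asserts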
8.1.2. Some geometry. For a measurable set $A \subset \mathbb{R}^d$,
$$\rho(A) := \inf_{h \in (0,1)} \inf_{x \in A} \sup\left\{ \frac{\mathrm{Vol}(B(y,s))}{\mathrm{Vol}(B(x,h))} : B(y,s) \subset B(x,h) \cap A \right\}. \tag{8.1}$$
The quantity $\rho(A)$ provides some measure of how irregular the boundary of $A$ is. The following lemma bounds $\rho$ from below for sets whose boundary is sufficiently regular.

Lemma 8.2. Let $\phi : \mathbb{R}^d \to \mathbb{R}^d$ be injective, with $\phi$ and $\phi^{-1}$ both $C$-Lipschitz. Then for $\Omega = \phi(B(0,1))$ and $\rho$ defined in (8.1), we have $\min(\rho(\Omega), \rho(\Omega^c)) \ge (2C)^{-d}$.

Proof. Fix $x \in \Omega$ and $h > 0$. Since $\phi$ is Lipschitz with constant $C$, we have $\phi(B(\phi^{-1}(x), h/C)) \subset B(x, h)$. Note that $z := \phi^{-1}(x) \in B(0,1)$ and, by the triangle inequality, $B(z, h/C) \cap B(0,1) \supset B(z', t)$, where $z' := (1 - h/(2C)) z$ and $t := h/(2C)$. Because $\phi^{-1}$ is $C$-Lipschitz, we have $\phi^{-1}(B(\phi(z'), t/C)) \subset B(z', t)$, so that
$$B(y, s) \subset \phi(B(z', t)) \subset \phi(B(z, h/C) \cap B(0,1)) \subset B(x, h) \cap \Omega,$$
where $y := \phi(z')$ and $s := t/C$. We obtain a lower bound for $\Omega^c$ in a similar way.

Next is a result on the number of sample points within a certain distance of a subset. Let $X_n^d$ be the set of sample points, that is, $X_n^d = \{x_i : i \in I_n^d\}$.

Lemma 8.3. For any subset $A \subset (0,1)^d$ of the form $A = B(A', \eta)$ for some $A' \subset (0,1)^d$ and $4/n \le \eta \le 1$,
$$8^{-d} n^d \mathrm{Vol}(A) \le |A \cap X_n^d| \le 4^d n^d \mathrm{Vol}(A).$$

Proof. Let $z_1, \dots, z_k \in (0,1)^d$ be a maximal $\eta$-packing for $A'$ (i.e., the balls $B(z_j, \eta/2)$ for $j = 1, \dots, k$ are disjoint and included in $A' \subset A$, and for any $z \in A'$, there is $j$ such that $z \in B(z_j, \eta)$). By the triangle inequality, we have
$$\bigcup_{j=1,\dots,k} B(z_j, \eta/2) \subset A \subset \bigcup_{j=1,\dots,k} B(z_j, 2\eta).$$
On the one hand, taking volumes on all sides, we get $k \eta^d \le \mathrm{Vol}(A) \le k 2^d (2\eta)^d$, since the unit ($\|\cdot\|_\infty$) ball has volume $2^d$. This turns into $4^{-d} \mathrm{Vol}(A) \le k \eta^d \le \mathrm{Vol}(A)$. On the other hand, counting sample points on all sides, using the fact that
$$\eta^d n^d \le |B(z, \eta) \cap X_n^d| \le (2\eta)^d n^d, \quad \forall z \in (0,1)^d, \ \forall \eta \in (2/n, 1),$$
we get
$$k (\eta/2)^d n^d \le \sum_{j=1}^k |B(z_j, \eta/2) \cap X_n^d| \le |A \cap X_n^d| \le \sum_{j=1}^k |B(z_j, 2\eta) \cap X_n^d| \le k (4\eta)^d n^d.$$
Combining these, we get the desired result.

Lemma 8.4. Suppose $1 \le d_0 \le d$ are integers and let $\phi : \mathbb{R}^{d_0} \to \mathbb{R}^d$ be injective, with $\phi$ and $\phi^{-1}$ (on the range of $\phi$) both $C$-Lipschitz with $C \ge 1$. Then there is another constant $C' > 1$ such that, for $A := \phi((0,a)^{d_0})$ and $h \in (0,1)$,
$$\frac{1}{C'} a^{d_0} h^{d - d_0} \le \mathrm{Vol}(B(A, h)) \le C' a^{d_0} h^{d - d_0}.$$
Consequently, if $\phi : \mathbb{R}^d \to \mathbb{R}^d$ is as above and $A := \phi(\partial B(0,1))$, the result holds with $d - d_0 = 1$.

Proof. We first observe that, for any $z \in \mathbb{R}^{d_0}$ and $h > 0$, since $\phi$ is $C$-Lipschitz,
$$\phi(B(z, h)) \subset B(\phi(z), C h). \tag{8.2}$$
Now, let $z_1, \dots, z_m$ denote a maximal $h$-packing of $(0,a)^{d_0}$. Note that $m \asymp (a/h)^{d_0}$ when $h \le 1$. By definition $\|z_i - z_j\| \ge h$, so that $\|\phi(z_i) - \phi(z_j)\| \ge h/C$ since $\phi^{-1}$ is $C$-Lipschitz with $C \ge 1$. Hence,
$$\bigsqcup_{i=1,\dots,m} B(\phi(z_i), h/C) \subset B(A, h/C) \subset B(A, h), \quad \text{implying} \quad \sum_{i=1}^m \mathrm{Vol}(B(\phi(z_i), h/C)) \le \mathrm{Vol}(B(A, h)).$$
We then conclude by the fact that $\sum_{i=1}^m \mathrm{Vol}(B(\phi(z_i), h/C)) \asymp m h^d \asymp a^{d_0} h^{d-d_0}$. For the upper bound, we use the fact that $(0,a)^{d_0} \subset \bigcup_{i=1,\dots,m} B(z_i, h)$, so that
$$A \subset \bigcup_{i=1,\dots,m} \phi(B(z_i, h)) \subset \bigcup_{i=1,\dots,m} B(\phi(z_i), C h),$$
by (8.2). Hence, using the triangle inequality,
$$\mathrm{Vol}(B(A, h)) \le \sum_{i=1}^m \mathrm{Vol}(B(\phi(z_i), C h + h)) \asymp m h^d \asymp a^{d_0} h^{d - d_0}.$$
For the second part, we use the fact that $\partial B(0,1) = \bigcup_\ell \phi_\ell((0,1)^{d-1})$ for a finite set of functions $\phi_\ell$ satisfying the requirements, and the fact that the composition $\phi \circ \phi_\ell$ is also Lipschitz.
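The counting bound of Lemma 8.3 can be checked directly on the sample grid (our own illustration, in dimension $d = 2$, taking $A'$ to be a sup-norm ball so that $\mathrm{Vol}(A)$ is explicit):

    import numpy as np

    n, eta, r = 200, 0.1, 0.2                   # grid step 1/n; dilation radius eta >= 4/n
    t = (np.arange(1, n + 1) - 0.5) / n         # the sample grid used in the paper
    X, Y = np.meshgrid(t, t)

    # A = B(A', eta) with A' the sup-norm ball of radius r centered at (0.5, 0.5),
    # so A is itself a sup-norm ball of radius r + eta
    inside = np.maximum(np.abs(X - 0.5), np.abs(Y - 0.5)) <= r + eta
    count = inside.sum()
    vol = (2 * (r + eta)) ** 2                  # Vol(A) for the sup-norm ball
    print(count, n**2 * vol)                    # count lies within the 8^{-d}..4^{d} factors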
8.1.3. Some statistics. We establish here some bounds on the point-wise MSE (2.3) of LPR (3.2). We mention that much finer results exist in dimension $d = 1$ for the case where the underlying function $f$ is smooth; see [17] and references therein.

Lemma 8.5 (Variance). For any sufficiently large constant $C > 0$, depending only on $d, r$, the following is true. Consider the LPR estimator of the form (3.2), with weights $\omega_{i,j} \in \{0,1\}$. Assume that $B_i^{\mathrm{in}} \subset A_i := \{j : \omega_{i,j} = 1\} \subset B_{i,h} := \{j : x_j \in B(x_i, h)\}$, for some discrete ball $B_i^{\mathrm{in}}$ satisfying $|B_i^{\mathrm{in}}| \ge |B_{i,h}|/C$. Then
$$\frac{1}{C} \sigma^2 (nh)^{-d} \le \mathrm{Var}(\hat f_i) \le C \sigma^2 (nh)^{-d}. \tag{8.3}$$

Proof. We assume without loss of generality that $x_i = 0$ and drop the subscript $i$ for simplicity. Below, $C$ denotes a generic constant that may change with each appearance. Let
$$q = \sum_{s=0}^r \binom{s + d - 1}{d - 1} = \binom{r + d}{d}, \tag{8.4}$$
which is the number of monomials in $d$ variables of degree $r$ or less. Let $X$ denote the $|A| \times q$ matrix with coefficients $(x_j^s : j \in A, |s| \le r)$. By definition of the local polynomial estimator (3.2) and the usual least squares formula, we have
$$\hat f = e^T (X^T X)^{-1} X^T y, \tag{8.5}$$
where $e = (1, 0, \dots, 0) \in \mathbb{R}^q$ and $y := (y_j : j \in A)$ (assuming that $X$ is full rank, which we prove further down). In particular, $\mathrm{Var}(\hat f) = \sigma^2 e^T (X^T X)^{-1} e$, since $y_j = f(x_j) + \varepsilon_j$, with the noise $(\varepsilon_j)$ being uncorrelated and having identical variance $\sigma^2$. Let $z_j = x_j/h$ and $Z = (z_j^s : j \in A, |s| \le r)$, and also let $H = \mathrm{diag}(h^{|s|}, |s| \le r)$, so that $X = Z H$, leading to
$$\mathrm{Var}(\hat f) = \sigma^2 e^T H^{-1} (Z^T Z)^{-1} H^{-1} e = \sigma^2 e^T (Z^T Z)^{-1} e, \tag{8.6}$$
since $H^{-1} e = e$; indeed, $H$ is an invertible diagonal matrix with first element equal to 1. The reason we work with $Z$ instead of $X$ is that, under the conditions assumed here, $z_j \in [-1,1]^d$ (because $j \in A$) and $(nh)^{-d} Z^T Z$ is bounded from above and below in terms of its spectrum. Indeed, define the matrices $Z_1 = (z_j^s : j \in B^{\mathrm{in}}, |s| \le r)$, $Z_2 = (z_j^s : j \in A \setminus B^{\mathrm{in}}, |s| \le r)$, $Z_3 = (z_j^s : j \in B_h, |s| \le r)$ and $Z_4 = (z_j^s : j \in B_h \setminus A, |s| \le r)$. Let $\prec$ denote the ordering of positive semi-definite matrices. Since
$$Z_1^T Z_1 \prec Z_1^T Z_1 + Z_2^T Z_2 = Z^T Z = Z_3^T Z_3 - Z_4^T Z_4 \prec Z_3^T Z_3,$$
it suffices to prove a lower bound on the spectrum of $Z_1^T Z_1$ and an upper bound on the spectrum of $Z_3^T Z_3$. Consider therefore the case where $A$ itself is a discrete ball, say $A = \{j : x_j \in B(x, ah)\}$, where $a \in (C^{-1/d}, 1)$ by assumption. Let $z = x/h$. First, assume that $a$ and $h$ remain fixed. Then for $s, t \in \mathbb{N}^d$ such that $|s| \vee |t| \le r$, we have
$$\frac{1}{(nh)^d} (Z^T Z)_{st} = (nh)^{-d} \sum_{j \in A} z_j^{s+t} \to M_{st} := \int_{B(z,a)} u^{s+t} \, du, \quad \text{when } nh \to \infty, \tag{8.7}$$
recognizing a Riemann sum on the left-hand side. So, if $M = (M_{st} : |s| \vee |t| \le r)$, we have the convergence $(nh)^{-d} Z^T Z \to M$ when $nh \to \infty$. $M$ is a well-defined positive semi-definite matrix, since its elements are bounded by 1 (because $B(z,a) \subset B(0,1)$), so we only need to show that it is positive uniformly over $a \in (C^{-1/d}, 1)$. Let $\lambda_{z,a}$ denote the smallest eigenvalue of the matrix $M$ with integral over $B(z,a)$, with $z \in B(0,1)$ and $a \in (C^{-1/d}, 1)$. We want to show that $\lambda_{z,a}$ is bounded away from 0. Suppose this is not the case, i.e., there are sequences $(z_m, a_m)$ such that $\lambda_{z_m, a_m} \to 0$ as $m \to \infty$. By compactness, we may assume that $(z_m, a_m) \to (z_\infty, a_\infty) \in B(0,1) \times [C^{-1/d}, 1]$. Then $\lambda_{z_\infty, a_\infty} = 0$ by continuity. Let $M_\infty$ be the associated matrix. Then there is a nonzero $b_\infty \in \mathbb{R}^q$ such that
$$0 = b_\infty^T M_\infty b_\infty = \int_{B(z_\infty, a_\infty)} \sum_{s,t} b_{\infty,s} b_{\infty,t} u^{s+t} \, du = \int_{B(z_\infty, a_\infty)} \Big( \sum_s b_{\infty,s} u^s \Big)^2 du,$$
where the sums are over $s \in \mathbb{N}^d$ such that $|s| \le r$. This leads to a contradiction, since the polynomial in the second integral cannot be zero on a nonempty ball.

So far, we assumed that $a$ and $z$ were fixed. Assume this is not the case. The upper bound on the largest eigenvalue of $Z^T Z$ is obtained in the exact same way, using the fact that $\|z_j\| \le 1$. For the lower bound we still have
$$\liminf \, (nh)^{-d} \sum_{j \in A} z_j^{s+t} \ge \inf_{z', a'} \int_{B(z', a')} u^{s+t} \, du,$$
where the infimum is over $z'$ and $a'$ such that $a' \in (C^{-1/d}, 1)$ and $B(z', a') \subset B(0,1)$. Our arguments apply to the right-hand side. We conclude that there is $C_1 \in (0, \infty)$ such that, for $nh$ large enough,
$$\frac{1}{C_1} (nh)^d \le \lambda_{\min}(Z^T Z) \le \lambda_{\max}(Z^T Z) \le C_1 (nh)^d. \tag{8.8}$$
We then redefine $C$ as $\max(C, C_1)$ and conclude with (8.6).
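The closed form (8.5) and the variance scaling (8.3) are straightforward to reproduce in dimension $d = 1$ (a sketch of ours, with 0/1 weights over the full window, $f \equiv 0$ and $r = 2$); the rescaled variance is roughly constant across bandwidths, as the lemma predicts:

    import numpy as np

    def lpr_at_zero(x, y, r):
        """Local polynomial estimate of f(0): e^T (X^T X)^{-1} X^T y, X_{js} = x_j^s."""
        X = np.vander(x, r + 1, increasing=True)      # columns 1, x, ..., x^r
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef[0]                                # fitted polynomial evaluated at 0

    rng = np.random.default_rng(0)
    n, sigma, r = 1000, 1.0, 2
    for h in [0.05, 0.1, 0.2]:
        grid = ((np.arange(1, n + 1) - 0.5) / n) - 0.5   # grid recentered at x_i = 0
        x = grid[np.abs(grid) <= h]                      # window A = {j : |x_j| <= h}
        est = [lpr_at_zero(x, sigma * rng.standard_normal(x.size), r)
               for _ in range(2000)]
        print(h, np.var(est) * (n * h) / sigma**2)       # ~ constant: Var ≍ sigma^2 (nh)^{-1}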
Lemma 8.6 (Bias: Upper Bound). Assume that $f \in \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)$, with foreground $\Omega$, and that the conditions of Lemma 8.5 also hold. If moreover either $A_i \subset \Omega$ or $A_i \subset \Omega^c$, then, for some constant $C > 0$, the following inequality holds:
$$(\mathbb{E} \hat f_i - f_i)^2 \le \min(1, C h^{2\alpha}). \tag{8.9}$$

Proof. We continue with the notation introduced in the proof of Lemma 8.5. WLOG, assume $A \subset \Omega$. In that case, $f$ is smooth in the window, since $f = f_\Omega$ there. By Lemma 8.1, $f(x_j) = T_0^{\lfloor\alpha\rfloor} f(x_j) + g(x_j)$, where $T_0^{\lfloor\alpha\rfloor} f$ is a polynomial of degree at most $\lfloor\alpha\rfloor \le r$, and $|g(x_j)| \le c_\alpha C_0 h^\alpha$ for all $j \in A$. Now, for a polynomial $p$ of degree at most $r$, let $\mathbf{p} = (p(x_j) : j \in A)$, so that $\mathbf{p} = X a$ for some $a \in \mathbb{R}^q$, and we have
$$e^T (X^T X)^{-1} X^T \mathbf{p} = e^T a = a_0 = p(0).$$
With this reproducing formula and the fact that $T_0^{\lfloor\alpha\rfloor} f(0) = f(0)$,
$$\mathbb{E} \hat f - f(0) = e^T (X^T X)^{-1} X^T g = e^T (Z^T Z)^{-1} Z^T g.$$
Because of (8.8), we have
$$|e^T (Z^T Z)^{-1} Z^T g| \le \frac{\|e\|_2 \cdot \|Z^T g\|_2}{\lambda_{\min}(Z^T Z)} \le C_1 (nh)^{-d} \|Z^T g\|_2, \tag{8.10}$$
where $C_1$ is the constant of Lemma 8.5. But the entries $(Z^T g)_s = \sum_{j \in A} g(x_j) z_j^s$ are uniformly bounded by $|A| \cdot c_\alpha C_0 h^\alpha = O((nh)^d h^\alpha)$, so that the right-hand side of (8.10) is of order $O(h^\alpha)$. This implies that $(\mathbb{E}\hat f - f(0))^2 \le C_2 h^{2\alpha}$ for some constant $C_2$, and we conclude by redefining $C$ as $\max(C, C_1, C_2)$.

Lemma 8.7 (Bias: Lower Bound). Let $f = 1_\Omega$, where $\Omega = (0, 1/2) \times (0,1)^{d-1}$, and consider linear filtering (meaning $\omega_{i,j} = 1$ if, and only if, $\|x_i - x_j\| \le h/2$). Then there is a constant $C > 0$ such that, when $\mathrm{dist}(x_i, \partial\Omega) \le h/C$, we have
$$(\mathbb{E}\hat f_i - f_i)^2 \ge 1/C. \tag{8.11}$$

Proof. We continue with the notation introduced in the proof of Lemma 8.6. In particular, we translate everything so that $x_i = 0$. WLOG, assume that $x_i \in \Omega$ and let $\delta = \mathrm{dist}(x_i, \partial\Omega)$. Let $A_\Omega = \{j : x_j \in B_h \cap \Omega\}$ and define $A_{\Omega^c}$ similarly. Using the reproducing formula, we get
$$\mathbb{E}\hat f - f(x_i) = e^T (Z^T Z)^{-1} Z^T 1_{A_\Omega} - e^T (Z^T Z)^{-1} Z^T 1 = -e^T (Z^T Z)^{-1} Z^T 1_{A_{\Omega^c}}. \tag{8.12}$$
Assume that $\delta < h$, in which case $A_{\Omega^c} = \{j : x_j \in (\delta, h) \times (-h, h)^{d-1}\}$, so that the right-hand side of (8.12) equals $-G(\delta/h)$, where
$$G(a) := e^T (Z^T Z)^{-1} Z^T 1_{\{j : z_j \in (a,1) \times (-1,1)^{d-1}\}}.$$
It suffices to show that there is $C > 0$ such that $G(a) \ge 1/C$ when $a \le 1/C$. Assume this is not the case, so that there is $a_m \to 0$ such that $G(a_m) \to 0$. As in the proof of Lemma 8.6, recognizing Riemann sums, we see that, as $m \to \infty$,
$$\frac{1}{(nh)^d} Z^T Z \to M := \int_{(-1,1)^d} p(z) p(z)^T \, dz, \quad p(z) := (z^s : |s| \le r),$$
and
$$\frac{1}{(nh)^d} Z^T 1_{\{j : z_j \in (a_m,1) \times (-1,1)^{d-1}\}} \to v := \int_{(0,1) \times (-1,1)^{d-1}} p(z) \, dz.$$
Let
$$u := \int_{(-1,0) \times (-1,1)^{d-1}} p(z) \, dz.$$
We have $u + v = M e$, since $p(z)^T e = 1$ identically, so that $e^T M^{-1} (u + v) = 1$, and therefore $e^T M^{-1} v = 1/2$ by symmetry. Hence, $G(a_m) \to 1/2$, which is a contradiction.
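The reproducing formula at the heart of the two bias proofs, namely $e^T (X^T X)^{-1} X^T \mathbf{p} = p(0)$ for any polynomial $p$ of degree at most $r$, can be confirmed in a few lines (our own illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 50)                       # design points in the window
    r = 3
    X = np.vander(x, r + 1, increasing=True)
    hat = np.linalg.solve(X.T @ X, X.T)              # (X^T X)^{-1} X^T

    p = lambda u: 2 - u + 0.5 * u**2 - u**3          # any polynomial of degree <= r
    coef = hat @ p(x)
    print(coef[0], p(0.0))                           # equal up to rounding: LPR reproduces p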
Lemma 8.7 states a lower bound on the squared bias of linear filtering when $f$ is the indicator function of a half hypercube. The result is actually much more general. Any function $f$ in the cartoon class has a foreground $\Omega$ whose boundary is locally well approximated by a hyperplane (since $\partial\Omega$ is Lipschitz), and $f$ is approximately locally piecewise constant ($f$ is smooth on $\Omega$ and on $\Omega^c$). Hence, near the discontinuity, $f$ resembles the function in Lemma 8.7.

8.1.4. Some probability. The following result asserts that the maximum of $m$ identically distributed random variables with exponentially decaying tails is at most a power of $\log m$.

Lemma 8.8. Suppose $X_1, \dots, X_m$ are such that, for some $a, b, c > 0$,
$$\mathbb{P}(|X_r| > t) \le c \exp(-(t/a)^b), \quad \forall t > c, \ \forall r = 1, \dots, m.$$
Then for $m$ sufficiently large,
$$\mathbb{P}\big( \max(|X_1|, \dots, |X_m|) > a (2 \log m)^{1/b} \big) \le c/m.$$

Proof. Define $x_m = a (2 \log m)^{1/b}$. By the union bound,
$$\mathbb{P}(\max(|X_1|, \dots, |X_m|) > x_m) \le \mathbb{P}(|X_1| > x_m) + \cdots + \mathbb{P}(|X_m| > x_m) \le m c \exp(-(x_m/a)^b) = c m^{-1}.$$

Lemma 8.9. For $X_i \sim \mathcal{N}(0, \sigma_i^2)$, $i = 1, \dots, m$, and any $C > 0$, we have
$$\mathbb{P}\Big( \max_{1 \le i \le m} |X_i| > \max_{1 \le i \le m} \sigma_i \sqrt{2 C \log m} \Big) \le m^{1 - C}.$$

Proof. Fix $t \ge 1$ and let $\sigma = \max_{1 \le i \le m} \sigma_i$. By the union bound and the fact that $\mathbb{P}(\mathcal{N}(0,1) > t) \le \exp(-t^2/2)$, we have
$$\mathbb{P}\Big( \max_{1 \le i \le m} |X_i| > t \Big) \le \sum_{i=1}^m \mathbb{P}(|X_i| > t) \le \sum_{i=1}^m \exp(-t^2/(2\sigma_i^2)) \le m \exp(-t^2/(2\sigma^2)).$$
We then plug in $t = \sigma \sqrt{2 C \log m}$.
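A quick Monte Carlo (ours) for Lemma 8.9 with $\sigma_i \equiv 1$: the empirical exceedance probability sits far below the stated bound $m^{1-C}$.

    import numpy as np

    rng = np.random.default_rng(2)
    m, C, trials = 10_000, 2.0, 500
    thresh = np.sqrt(2 * C * np.log(m))             # sigma_i = 1 here
    exceed = sum(np.abs(rng.standard_normal(m)).max() > thresh
                 for _ in range(trials))
    print(exceed / trials, m ** (1 - C))            # empirical rate vs m^{1-C} = 1e-4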
Lemma 8.10. Suppose $X_i \sim \chi_k^2$ for $i = 1, \dots, m$. There is a constant $C_1$ such that, for any $C > C_1$, if $k \ge (64/9) C \log m$, we have
$$\mathbb{P}\Big( \max_{1 \le i \le m} X_i > k + 2\sqrt{C k \log m} \Big) \le m^{1 - C/2}, \tag{8.13}$$
$$\mathbb{P}\Big( \min_{1 \le i \le m} X_i < k - 2\sqrt{C k \log m} \Big) \le m^{1 - C/2}. \tag{8.14}$$

Proof. Let us prove the first inequality. Since the moment generating function of a $\chi_k^2$ is $t \mapsto (1 - 2t)^{-k/2} 1(t < 1/2)$, Chernoff's bound gives
$$\mathbb{P}(X_i > t) \le \exp\big( -(t - k)/2 + (k/2) \log(t/k) \big), \quad \forall t > k.$$
We then use the inequality $\log(1 + x) \le x - x^2/2 + x^3/3$, valid for $x \in (0,1)$, and plug in $t = k + 2\sqrt{C k \log m}$, to get
$$-(t - k)/2 + (k/2)\log(t/k) \le -(t - k)^2/(4k) + (t - k)^3/(6 k^2) = -C \log m + (4/3) C^{3/2} \sqrt{\log(m)/k} \, \log m,$$
and bound the second term by $(C/2) \log m$ (this is where $k \ge (64/9) C \log m$ is used). We then obtain $\mathbb{P}(X_i > t) \le m^{-C/2}$, and apply the union bound as before. The second inequality is proved in the same way, using the fact that $\log(1 - x) \ge -x - x^2/2 - x^3/3$ holds for $x \in (0,1)$.

Lemma 8.11. Suppose $X_i \sim \chi_k^2(\delta_i^2)$ (non-central chi-square) for $i = 1, \dots, m$. There is a constant $C_1$ such that, for any $C > C_1$, if $k \ge 16 C \log m$ and $\delta_{\min} := \min_i \delta_i \ge 2\sqrt{C \log m}$, we have
$$\mathbb{P}\Big( \min_{1 \le i \le m} X_i < \delta_{\min}^2/4 + k - 3\sqrt{C k \log m} \Big) \le 2 m^{1 - C/2}.$$
Similarly, if $\delta_{\max} := \max_i \delta_i \le \sqrt{C \log m}$, we have
$$\mathbb{P}\Big( \max_{1 \le i \le m} X_i > k + 3\sqrt{C k \log m} \Big) \le m^{1 - C/2}.$$

Proof. First, $X_i \equiv (Z_i + \delta_i)^2 + Y_i$, where $Z_i \sim \mathcal{N}(0,1)$ and $Y_i \sim \chi_{k-1}^2$ are independent. Hence,
$$\min_{1 \le i \le m} X_i \ge \min_{1 \le i \le m} (Z_i + \delta_i)^2 + \min_{1 \le i \le m} Y_i.$$
Let $E = \{\max_{1 \le i \le m} |Z_i| \ge \sqrt{C \log m}\}$. By Lemma 8.9, we have $\mathbb{P}(E) \le m^{1 - C/2}$. Let $F = \{\min_{1 \le i \le m} Y_i \le k - 1 - 2\sqrt{C (k-1) \log m}\}$. To control the $Y_i$'s, we apply inequality (8.14) to get $\mathbb{P}(F) \le m^{1 - C/2}$. Under $E^c \cap F^c$, we have $\min_i X_i \ge \delta_{\min}^2/4 + k - 3\sqrt{C k \log m}$, and
$$\mathbb{P}(E^c \cap F^c) = 1 - \mathbb{P}(E \cup F) \ge 1 - \mathbb{P}(E) - \mathbb{P}(F) \ge 1 - 2 m^{1 - C/2}.$$
This proves the bound on $\min_i X_i$; the arguments for $\max_i X_i$ are similar and simpler.

8.2. Proofs of the main results.

8.2.1. Proof of Theorem 4.1. We start with the upper bound. Fix $f \in \mathcal{F}_{\mathrm{cartoon}}(\alpha, C_0)$ with foreground $\Omega$. Let $Q = \{i : \mathrm{dist}(x_i, \partial\Omega) \le h\}$. For $i \in Q$, we use the fact that $|\hat f_i - f_i| \le 1$, which implies $\mathbb{E}(\hat f_i - f_i)^2 \le 1$. For $i \notin Q$, from Lemma 8.5 and Lemma 8.6, coupled with the bias-variance decomposition (2.3), we get
$$\mathbb{E}(\hat f_i - f_i)^2 \le C (h^{2\alpha} + \sigma^2 (nh)^{-d}).$$
Using Lemma 8.3, we have $|Q| \le 4^d n^d \mathrm{Vol}(B(\partial\Omega, h))$, while $\mathrm{Vol}(B(\partial\Omega, h)) = O(h)$ by Lemma 8.4 and the fact that $|\partial\Omega|$ is of order 1. Summing over all $i \in I_n^d$, we get
$$\mathrm{MSE}_f(\hat f) \le \frac{n^d - |Q|}{n^d} \, C (h^{2\alpha} + \sigma^2 (nh)^{-d}) + \frac{|Q|}{n^d} \, C (1 + \sigma^2 (nh)^{-d}) \le C_1 (h + \sigma^2 (nh)^{-d}).$$
Minimizing the right-hand side with respect to $h$ yields the upper bound in Theorem 4.1.

For the lower bound, redefine $Q = \{i : \mathrm{dist}(x_i, \partial\Omega) \le h/C_1\}$, where $C_1$ is the constant of Lemma 8.7. For $i \notin Q$, we use Lemma 8.5 and the bias-variance decomposition (2.3) to get
$$\mathbb{E}(\hat f_i - f_i)^2 \ge \frac{1}{C_2} \sigma^2 (nh)^{-d}.$$
For $i \in Q$, we use Lemma 8.7 and the bias-variance decomposition (2.3) to get $\mathbb{E}(\hat f_i - f_i)^2 \ge 1/C_1$. Using Lemma 8.3 again, we have the following lower bound on the MSE (for $n$ large enough):
$$\mathrm{MSE}_f(\hat f) \ge \frac{n^d - |Q|}{n^d} \cdot \frac{1}{C_2} \sigma^2 (nh)^{-d} + \frac{|Q|}{n^d} \cdot \frac{1}{C_1} \ge C_3 (h + \sigma^2 (nh)^{-d}).$$
Minimizing the right-hand side with respect to $h$ leads to the lower bound in Theorem 4.1.

8.2.2. Proof of Theorem 4.2. The proof of the upper bound is the same as that of Theorem 4.1 in smooth regions, leading to an upper bound on the MSE of the form
$$\mathrm{MSE}_f(\hat f) \le C (h^{2\alpha} + \sigma^2 (nh)^{-d}).$$
Minimizing this quantity over $h$ gives the stated result. The lower bound is a well-known minimax bound [27, Theorem 5.1.2, p. 133].
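The rates in Theorems 4.1 and 4.2 come from balancing the two terms in the risk bound; a direct numerical minimization of $h^{2\alpha} + \sigma^2 (nh)^{-d}$ (our sketch, for the smooth case of Theorem 4.2) recovers the $(\sigma^2/n^d)^{2\alpha/(d+2\alpha)}$ behavior:

    import numpy as np

    alpha, d, sigma = 1.0, 2, 1.0
    for n in [100, 1000, 10_000]:
        h = np.logspace(-6, 0, 10_000)                       # candidate bandwidths
        risk = h ** (2 * alpha) + sigma**2 / (n * h) ** d    # bias^2 + variance proxy
        rate = (sigma**2 / n**d) ** (2 * alpha / (d + 2 * alpha))
        print(n, risk.min(), rate)                           # same order as n grows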
8.2.3. Proof of Theorem 4.3. Let $\delta(x) = \mathrm{dist}(x, \partial\Omega)$. The proof is similar to that of Theorem 4.2, except that the variance varies by location. The point bias is of order $O(h^\alpha)$ everywhere, because the smoothing window is of radius at most $h$, with all points in the window being on the same side of the discontinuity. However, the point variance is of order $O(\sigma^2 \lceil n \delta(x_i) \rceil^{-d})$, since the window is of radius $\delta(x_i)$ (immediate consequences of Lemma 8.5). Let us sum the point variances over all the pixels in the image. The situation differs according to the dimension.

We start with $d = 1$, so that $\Omega = (a, b) \subset (0,1)$. For $\delta$ small enough, there are exactly four points at distance less than $\delta$ from $\partial\Omega$ (two on each side of the two jump locations). Let us consider the sample points $x_i \in [b, 1)$, and let $j$ be such that $x_{j-1} < b \le x_j$. Note that $j = bn(1 + o(1))$, and we assume that $b$ is fixed. For $i \in [j, j + nh]$, the variance is bounded by $C\sigma^2/(i - j + 1)$, while for $i \ge j + nh$ (in the smooth region), the variance is of order $O(\sigma^2/(nh))$ as before. Hence, summing over $i \ge j$, the averaged variance contributed by the boundary part of that region is bounded by
$$\frac{C\sigma^2}{n - nh - j} \Big( \sum_{i=j}^{j + [nh]} \frac{1}{i - j + 1} + \frac{n - nh - j}{nh} \Big) = O\Big(\frac{\sigma^2}{n}\Big) \sum_{k=1}^{nh} \frac{1}{k+1} = O\Big( \frac{\sigma^2 \log n}{n} \Big),$$
on top of the $O(\sigma^2/(nh))$ contribution from the smooth region. The same is true for the other three regions.

When $d \ge 2$, the story is slightly different. Define $Q_\ell = \{i : \delta(x_i) \le h 2^{-\ell}\}$ and let $\ell_0$ be such that $h 2^{-\ell_0} < 2/n \le h 2^{-\ell_0 + 1}$. Stratifying, we have the following bound on the averaged variance:
$$\frac{C\sigma^2}{n^d} \Big( \sum_{\ell=0}^{\ell_0} \sum_{i \in Q_\ell \setminus Q_{\ell+1}} (n h 2^{-\ell - 1})^{-d} + \sum_{i \notin Q_0} (nh)^{-d} \Big) = \frac{C\sigma^2}{n^d} \sum_{\ell=0}^{\ell_0} |Q_\ell \setminus Q_{\ell+1}| \, (nh)^{-d} 2^{d(\ell+1)} + \frac{C\sigma^2}{(nh)^d}.$$
By Lemma 8.3 and Lemma 8.4, we have $|Q_\ell \setminus Q_{\ell+1}| \le |Q_\ell| \le C_1 n^d \cdot h 2^{-\ell}$, for some constant $C_1$. Hence, the first sum on the right-hand side above is bounded by
$$C_2 \sigma^2 h (nh)^{-d} \sum_{\ell=0}^{\ell_0} 2^{(d-1)\ell} \le C_3 \sigma^2 h (nh)^{-d} \cdot 2^{(d-1)\ell_0} = O(\sigma^2/n).$$
This leads to an upper bound on the MSE of the form
$$\mathrm{MSE}_f(\hat f) \le C (h^{2\alpha} + \sigma^2 A_n/n + \sigma^2 (nh)^{-d}). \tag{8.15}$$
Minimizing this quantity over $h$ gives the upper bound stated in Theorem 4.3.

For the lower bound, we know from the minimax results underlying the lower bound in Theorem 4.2 that there are functions $f$ in the cartoon class for which the bias in the smooth regions is of order at least $h^\alpha$. As for the variance, our upper bound for the averaged variance is easily seen to be matched by a lower bound (up to a multiplicative constant). This leads to a lower bound identical to (8.15) modulo a multiplicative constant, and optimizing it leads to the lower bound in Theorem 4.3. We omit the details.
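The extra log factor for $d = 1$ is just the harmonic sum appearing above; numerically (our check):

    import numpy as np

    sigma, n, h = 1.0, 10_000, 0.05
    k = np.arange(1, int(n * h) + 1)
    boundary_var = (sigma**2 / n) * np.sum(1.0 / (k + 1))   # point variances near the jump
    print(boundary_var, sigma**2 * np.log(n) / n)           # both O(sigma^2 log(n)/n)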
8.2.4. Proof of Theorem 4.4. Fix $i \in I_n^d$ and let $\eta_i = \max_{j \in B(i, nh)} |\varepsilon_j|$. Then by the union bound and (4.1),
$$\mathbb{P}(\eta_i \ge t) \le |B(i, nh)| \max_{j \in B(i,nh)} \mathbb{P}(|\varepsilon_j| \ge t) \le (2nh + 1)^d (1 - F(t/\sigma)) =: p. \tag{8.16}$$
Hence, with probability at least $1 - p$, the event $E_i := \{\eta_i \le t\}$ holds true. WLOG, assume that $x_i \in \Omega$. Since $f_\Omega$ is $C_0$-Lipschitz, we have $|f(x_i) - f(x_j)| = |f_\Omega(x_i) - f_\Omega(x_j)| \le C_0 h$ when $x_j \in \Omega \cap B(x_i, h)$, and by the triangle inequality,
$$|y_i - y_j| \le C_0 h + |\varepsilon_i - \varepsilon_j| \le C_0 h + 2t \quad \text{under } E_i.$$
Suppose there is $x_j \in \Omega^c \cap B(x_i, h)$. In that case, there is $x \in \partial\Omega \cap B(x_i, h)$ and we have
$$|f(x_i) - f(x_j)| \ge |f_\Omega(x) - f_{\Omega^c}(x)| - |f_\Omega(x_i) - f_\Omega(x)| - |f_{\Omega^c}(x_j) - f_{\Omega^c}(x)| \ge \mu(f) - 2 C_0 h \ge 1/C_0 - 2 C_0 h,$$
again by the triangle inequality and the fact that $f_{\Omega^c}$ is also $C_0$-Lipschitz. This implies that
$$|y_i - y_j| \ge 1/C_0 - 2 C_0 h - |\varepsilon_i - \varepsilon_j| \ge 1/C_0 - 2 C_0 h - 2t \quad \text{under } E_i.$$
We see that we need $h_y \ge C_0 h + 2t$ to ensure that sample points $x_j \in \Omega \cap B(x_i, h)$ are selected, while we require $h_y < 1/C_0 - 2 C_0 h - 2t$ so that points $x_j \in \Omega^c \cap B(x_i, h)$ are disregarded. These two inequalities are, for example, satisfied when $h_y = 1/(3 C_0)$ and $t = 1/(6 C_0)$, and $h$ sufficiently small; by our assumptions, $h = o(1)$. Assume $h_y$ and $t$ are chosen that way. Then, when $E_i$ holds, the photometric kernel in YF is able to exactly mimic the MO.

We now turn to bounding the MSE. First, we have
$$\mathbb{E}(\hat f_i^{\mathrm{YF}} - f_i)^2 = \mathbb{E}[(\hat f_i^{\mathrm{YF}} - f_i)^2 1\{E_i\}] + \mathbb{E}[(\hat f_i^{\mathrm{YF}} - f_i)^2 1\{E_i^c\}].$$
Since $\hat f_i^{\mathrm{YF}} = \hat f_i^{\mathrm{MO}}$ on $E_i$,
$$\mathbb{E}[(\hat f_i^{\mathrm{YF}} - f_i)^2 1\{E_i\}] = \mathbb{E}[(\hat f_i^{\mathrm{MO}} - f_i)^2 1\{E_i\}] \le \mathbb{E}(\hat f_i^{\mathrm{MO}} - f_i)^2.$$
And since $|\hat f_i^{\mathrm{YF}} - f_i| \le 1$ because of our clipping, we have $\mathbb{E}[(\hat f_i^{\mathrm{YF}} - f_i)^2 1\{E_i^c\}] \le \mathbb{P}(E_i^c) \le p$. It remains to check that $p$ is negligible compared to the MO risk given in Theorem 4.2. Indeed, using the fact that $t \asymp 1$, that $h \le 1$ and that $\sigma \le (C' \log n)^{-1/b}$, we have
$$p = O\big( (nh)^d \exp[-(t/(C\sigma))^b] \big) = \exp\big[ (d - t^b (C'/C^b)) \log n \big] = o\big( (\sigma^2/n^d)^{2\alpha/(d+2\alpha)} \big),$$
when $C'$ is sufficiently large, implicitly assuming that $\sigma$ is at least a polynomial in $n$, for otherwise the trivial estimator $\hat f = y$ is optimal. This concludes the proof.

When the noise level is not small. Assume $\sigma$ is fixed, for simplicity. Note that YF is identical to LF when $h_y \to \infty$ sufficiently fast. Assume therefore that $h_y \le h_0$ for some fixed $h_0 < \infty$. We now argue that YF is essentially useless when this is the case. Concretely, assume the reverse of (4.1), meaning
$$\mathbb{P}(|\varepsilon_i| \le t) \le F(t/\sigma), \quad \forall t, \ \forall i \in I_n^d. \tag{8.17}$$
We show that, when $F(2 h_0/\sigma) < 1$, YF has an overall squared bias (and therefore MSE) of order 1, which is comparable to the trivial estimator $\hat f = y$. In other words, for large noise and relatively small $h_y$, YF can perform worse than LF. For example, the bias is at least $h_0$ at locations $i$ satisfying $|\varepsilon_i| \ge h_0 + h_y$. Indeed, we are averaging over values $y_j$ such that $|y_j - y_i| \le h_y$, so that $|\hat f_i - y_i| \le h_y$ and therefore
$$|\hat f_i - f_i| \ge |\varepsilon_i| - |\hat f_i - y_i| \ge (h_0 + h_y) - h_y = h_0.$$
Moreover, by (8.17), $\mathbb{P}(|\varepsilon_i| \ge h_0 + h_y) \ge 1 - F(2 h_0/\sigma) > 0$. Integrating the squared bias over these sample points alone leads to a lower bound of order 1.
8.2.5. Proof of Theorem 4.5. For simplicity, we ignore boundary issues; in particular, we assume that all patches are of the same size, with $m_P \asymp (n h_P)^d$ sample points each, and similarly for spatial windows, with $m_h \asymp (nh)^d$ sample points each.

Upper bound for NLM-average. For $i \in I_n^d$ such that $P_i \cap \partial\Omega \ne \emptyset$, we use the fact that $|\hat f_i - f_i| \le 1$ to get $\mathbb{E}(\hat f_i - f_i)^2 \le 1$. Consider $i$ with $P_i \cap \partial\Omega = \emptyset$; WLOG, assume $P_i \subset \Omega$. Take any $j \in B(i, nh)$. By definition,
$$y_{P_j} - y_{P_i} = f_{P_j} - f_{P_i} + \varepsilon_{P_j} - \varepsilon_{P_i}. \tag{8.18}$$
For the noise part we have
$$\varepsilon_{P_j} - \varepsilon_{P_i} \sim \mathcal{N}(0, \sigma^2 \Delta_{ij}/m_P^2), \tag{8.19}$$
where $\Delta_{ij}$ is the number of sample points in the symmetric difference of $P_i$ and $P_j$. By Lemma 8.9, we have
$$\max_{j \in B(x_i, h)} |\varepsilon_{P_j} - \varepsilon_{P_i}| \le \zeta := 2\sigma \sqrt{C \log(m_h)/m_P}, \tag{8.20}$$
with probability at least $1 - m_h^{1-C}$. In the sequel, we fix $C$ large and denote by $E_i$ the event (8.20). For the signal part, we have
$$f_{P_i} - f_{P_j} = \frac{1}{m_P} \sum_{x_k \in P_i} f(x_k) - \frac{1}{m_P} \sum_{x_k \in P_j} f(x_k) = \frac{1}{m_P} \sum_{x_k \in P_0} \big( f(x_k + x_i) - f(x_k + x_j) \big),$$
where $P_0$ is a generic patch centered at 0. If $x_j \in \Omega$ with $P_j \subset \Omega$, then, since $f_\Omega$ is $C_0$-Lipschitz,
$$|f_{P_i} - f_{P_j}| \le \frac{1}{m_P} \sum_{x_k \in P_0} |f_\Omega(x_k + x_i) - f_\Omega(x_k + x_j)| \le C_0 \|x_i - x_j\| \le C_0 h. \tag{8.21}$$
If $x_j \in \Omega^c$, then there is a point $x \in B(x_i, h) \cap \partial\Omega$, and we have $f(x_k) = f_\Omega(x) + [f_\Omega(x_k) - f_\Omega(x)]$ for $x_k \in \Omega$ and $f(x_k) = f_{\Omega^c}(x) + [f_{\Omega^c}(x_k) - f_{\Omega^c}(x)]$ for $x_k \in \Omega^c$, with $|f_\Omega(x_k) - f_\Omega(x)| \le C_0 h$, $|f_{\Omega^c}(x_k) - f_{\Omega^c}(x)| \le C_0 h$ and $|f_\Omega(x) - f_{\Omega^c}(x)| \ge 1/C_0$. Hence,
$$f_{P_i} - f_{P_j} = f_\Omega(x) + \frac{1}{m_P}\sum_{x_k \in P_i} [f_\Omega(x_k) - f_\Omega(x)] - f_\Omega(x) \frac{|P_j \cap \Omega|}{|P_j|} - \frac{1}{m_P}\sum_{x_k \in P_j \cap \Omega} [f_\Omega(x_k) - f_\Omega(x)]$$
$$\quad - f_{\Omega^c}(x) \frac{|P_j \cap \Omega^c|}{|P_j|} - \frac{1}{m_P}\sum_{x_k \in P_j \cap \Omega^c} [f_{\Omega^c}(x_k) - f_{\Omega^c}(x)] = \big( f_\Omega(x) - f_{\Omega^c}(x) \big) \frac{|P_j \cap \Omega^c|}{|P_j|} + R,$$
where $|R| \le 2 C_0 h$. We now use Lemma 8.2 to bound the fraction above from below by $(2 C_0)^{-d}$, to get
$$|f_{P_i} - f_{P_j}| \ge (2 C_0)^{-d} \mu - 2 C_0 h. \tag{8.22}$$
Using the decomposition (8.18), coupled with the triangle inequality and (8.20), (8.21) and (8.22), we see that we need to choose $h_y$ such that
$$C_0 h + \zeta \le h_y < (2 C_0)^{-d} \mu - 2 C_0 h - \zeta. \tag{8.23}$$
The lower bound ensures that all the points $x_j \in B(x_i, h)$ such that $P_j \subset \Omega$ are included in the neighborhood of $x_i$ (i.e., $\omega_{ij} = 1$), while the upper bound ensures that no points in $\Omega^c$ are included (under $E_i$). Points $x_j \in B(x_i, h)$ such that $P_j \cap \Omega^c \ne \emptyset$ may or may not be included, depending on how large that intersection is. Note that (8.23) is satisfied when $h_y$ is a sufficiently small constant, since $h \to 0$, $\zeta \to 0$ and $\mu \asymp 1$ under our assumptions. In any case, we assume that (8.23) holds.

In terms of MSE, we proceed as follows. Let $B_i = \{j : x_j \in B(x_i, h)\}$, $B_i^0 = \{j : x_j \in B(x_i, h), P_j \subset \Omega\}$ and $A_i = \{j : \omega_{i,j} = 1\}$; the latter is a random subset of $B_i$. We saw that $A_i \supset B_i^0$ under $E_i$, which implies
$$E_i \subset \{A_i \supset B_i^0\} \subset \bigcup_{B_i^0 \subset A \subset B_i} \{A_i = A\}, \quad \text{leading to} \quad 1\{E_i\} \le \sum_{B_i^0 \subset A \subset B_i} 1\{A_i = A\}. \tag{8.24}$$
Using (8.24) and the fact that $|\hat f_i - f_i| \le 1$, we have
$$\mathbb{E}(\hat f_i - f_i)^2 = \mathbb{E}[(\hat f_i - f_i)^2 1\{E_i\}] + \mathbb{E}[(\hat f_i - f_i)^2 1\{E_i^c\}] \le \sum_{B_i^0 \subset A \subset B_i} \mathbb{P}(A_i = A) \, \mathbb{E}[(\hat f_A - f_i)^2] + \mathbb{P}(E_i^c),$$
where $\hat f_A$ is the local polynomial estimator based on $A \subset I_n^d$. For the second term, $\mathbb{P}(E_i^c) \le m_h^{1-C}$ by (8.20). For the first term, by Lemma 8.2, we know that $B(x_i, h) \cap \Omega$ contains a ball of radius $C_1 h$ with $C_1 > 0$ depending only on $C_0$ and $d$. Hence, by the triangle inequality, $B(x_i, h) \setminus B(\Omega^c, h_P)$ contains a ball of radius $C_1 h - h_P \ge C_1 h/2$ (eventually), implying that $B_i^0$ contains a discrete ball of radius at least $(C_1 h/3) n \asymp nh$. Therefore $|B_i^0|/|B_i| \asymp 1$ and we may apply Lemma 8.5 and Lemma 8.6 to each $A$ in the sum above, to get $\mathbb{E}(\hat f_A - f_i)^2 \le C_2 (h^{2\alpha} + \sigma^2 (nh)^{-d})$ for a constant $C_2$. Hence, using the fact that $\sum_{B_i^0 \subset A \subset B_i} \mathbb{P}(A_i = A) \le 1$, we have
$$\mathbb{E}(\hat f_i - f_i)^2 \le C_2 (h^{2\alpha} + \sigma^2 (nh)^{-d}) + m_h^{1-C}.$$
By our choice of $h$, $h^{2\alpha} + \sigma^2 (nh)^{-d} \asymp (\sigma^2/n^d)^{2\alpha/(d+2\alpha)}$, and we may choose $C$ large enough that the last term on the right-hand side is negligible, leading to an MSE at $i$ of order $O(\sigma^2/n^d)^{2\alpha/(d+2\alpha)}$. The MSE is of the same order when $x_i \in \Omega^c$, and summing over all $i \in I_n^d$, we get
$$\mathrm{MSE}_f(\hat f) \le \frac{|Q|}{n^d} + O(\sigma^2/n^d)^{2\alpha/(d+2\alpha)}, \quad \text{where } Q := \{i : P_i \cap \partial\Omega \ne \emptyset\}.$$
Since $Q \subset \{i : \mathrm{dist}(x_i, \partial\Omega) < h_P\}$, by Lemma 8.3 and Lemma 8.4 we have $|Q| \le C_2 n^d \cdot h_P$, so that
$$\mathrm{MSE}_f(\hat f) \le O\big( h_P + (\sigma^2/n^d)^{2\alpha/(d+2\alpha)} \big).$$
Optimizing over $h_P$ subject to (8.23) being satisfied, we achieve the desired result.
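The distribution (8.19) of the noise part of a patch-mean difference depends only on the symmetric difference of the two index sets; a small Monte Carlo (ours, in 1D with patches offset by a few pixels) confirms the variance $\sigma^2 \Delta_{ij}/m_P^2$:

    import numpy as np

    rng = np.random.default_rng(3)
    reps, n, m_P, shift, sigma = 20_000, 200, 21, 5, 1.0
    eps = sigma * rng.standard_normal((reps, n))            # pure-noise rows
    Pi = eps[:, 50:50 + m_P].mean(axis=1)                   # noise average over P_i
    Pj = eps[:, 50 + shift:50 + shift + m_P].mean(axis=1)   # ... over the shifted P_j
    delta_ij = 2 * shift                                    # |symmetric difference|
    print(np.var(Pj - Pi), sigma**2 * delta_ij / m_P**2)    # matches (8.19)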
Upper bound for NLM. We follow the same arguments. Here we focus on $i \in I_n^d$ such that $\mathrm{dist}(x_i, \partial\Omega) > 2 h_P$ (instead of $h_P$), and assume WLOG that $x_i \in \Omega$. Take $j \in B(i, nh)$ such that $P_j \cap P_i = \emptyset$; note that this is the case when $x_j \in \Omega^c$. By definition, $y_{P_j} - y_{P_i} = f_{P_j} - f_{P_i} + \varepsilon_{P_j} - \varepsilon_{P_i}$. Since $\varepsilon_{P_j} - \varepsilon_{P_i} \sim \mathcal{N}(0, 2\sigma^2 I_{m_P})$, we have $\|y_{P_j} - y_{P_i}\|_2^2 \sim 2\sigma^2 \chi_{m_P}^2\big( \|f_{P_j} - f_{P_i}\|_2^2/(2\sigma^2) \big)$, with
$$\|f_{P_j} - f_{P_i}\|_2^2 = \sum_{x_k \in P_0} \big( f(x_k + x_j) - f(x_k + x_i) \big)^2.$$
If $x_j \in \Omega$ with $P_j \subset \Omega$, then
$$\|f_{P_j} - f_{P_i}\|_2^2 = \sum_{x_k \in P_0} \big( f_\Omega(x_k + x_i) - f_\Omega(x_k + x_j) \big)^2 \tag{8.25}$$
$$\le m_P C_0^2 \|x_i - x_j\|_2^2 \le m_P C_0^2 h^2, \tag{8.26}$$
since $f_\Omega$ is $C_0$-Lipschitz. By Lemma 8.11 and the fact that $m_P C_0^2 h^2/\sigma^2 = o(1)$, we conclude that
$$\max_j \|y_{P_j} - y_{P_i}\|_2^2 \le 2\sigma^2 m_P + \zeta_\chi, \quad \zeta_\chi := 6\sigma^2 \sqrt{C m_P \log m_h}, \tag{8.27}$$
with probability at least $1 - m_h^{1 - C/2}$, where the maximum is over $j$ such that $x_j \in B(x_i, h)$ and $P_j \subset \Omega \setminus P_i$. Let $E_i$ be this event. If $x_j \in \Omega^c$, then there is a point $x \in B(x_i, h) \cap \partial\Omega$. Let $Q_j = \{x_k \in P_0 : x_k + x_j \in \Omega^c\}$. For $x_k \in Q_j$, we use the decomposition
$$f(x_k + x_j) - f(x_k + x_i) = f_{\Omega^c}(x_k + x_j) - f_{\Omega^c}(x) + f_{\Omega^c}(x) - f_\Omega(x) + f_\Omega(x) - f_\Omega(x_k + x_i),$$
with the first and third differences bounded by $C_0 h$ in absolute value, and the second bounded from below by $\mu$ in absolute value. We therefore have
$$\delta_{ij}^2 := \|f_{P_j} - f_{P_i}\|_2^2 \ge \sum_{x_k \in Q_j} \big( f_\Omega(x_k + x_i) - f_{\Omega^c}(x_k + x_j) \big)^2 \tag{8.28}$$
$$\ge |Q_j| (\mu - 2 C_0 h)^2 \ge \delta^2 := m_P (2 C_0)^{-d} \mu^2/2, \tag{8.29}$$
where we used Lemma 8.2 to bound $|Q_j|$ from below and the fact that $\mu \asymp 1$ while $h = o(1)$. Since $\|y_{P_j} - y_{P_i}\|_2^2 \sim 2\sigma^2 \chi_{m_P}^2(\delta_{ij}^2/(2\sigma^2))$ and $\delta_{ij} \ge \delta$, with Lemma 8.11 we see that
$$\min_j \|y_{P_j} - y_{P_i}\|_2^2 \ge \delta^2/4 + 2\sigma^2 m_P - \zeta_\chi, \tag{8.30}$$
with probability at least $1 - m_h^{1 - C/2}$, where the minimum is over $j$ such that $x_j \in \Omega^c \cap B(x_i, h)$. Let $F_i$ denote this event. Assuming (8.27) and (8.30) hold, we see that we need to choose $h_y$ such that
$$2\sigma^2 m_P + \zeta_\chi \le h_y^2 < m_P (2 C_0)^{-d} \mu^2/8 + 2\sigma^2 m_P - \zeta_\chi. \tag{8.31}$$
The lower bound ensures that all the points $x_j \in B(x_i, h)$ such that $P_j \subset \Omega$ and $P_j \cap P_i = \emptyset$ are included in the neighborhood of $x_i$ (meaning $\omega_{ij} = 1$), while the upper bound ensures that no points $x_j \in \Omega^c$ are included (under $E_i$). All other points $x_j \in \Omega \cap B(x_i, h)$ may or may not be included, depending on how much $P_j$ overlaps $P_i$ or $\Omega^c$. Note that there is an $h_y$ satisfying (8.31) if, and only if,
$$m_P (2 C_0)^{-d} \mu^2/8 > 2 \zeta_\chi \iff m_P > C_1 \sigma^4 \log n,$$
for a constant $C_1$ which depends only on $d, C_0, \mu$. Assuming $m_P$ is that large, (8.31) is satisfied when $h_y^2 = 2(1 + \eta)\sigma^2 m_P$ with $\eta$ sufficiently small. In any case, we assume that (8.31) holds; the rest of the proof is identical to the one for NLM-average.
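For disjoint patches, $\|y_{P_j} - y_{P_i}\|_2^2$ is a scaled noncentral chi-square, which is what makes Lemma 8.11 applicable; checking its mean $2\sigma^2 m_P + \delta_{ij}^2$ by simulation (our illustration, with an artificial cross-jump offset):

    import numpy as np

    rng = np.random.default_rng(4)
    m_P, sigma, reps = 49, 0.5, 50_000
    f_Pi = np.zeros(m_P)                             # patch inside Omega (f constant there)
    f_Pj = np.full(m_P, 0.3)                         # patch across the jump, offset 0.3
    delta2 = np.sum((f_Pj - f_Pi) ** 2)              # delta_ij^2 = ||f_Pj - f_Pi||^2
    yPi = f_Pi + sigma * rng.standard_normal((reps, m_P))
    yPj = f_Pj + sigma * rng.standard_normal((reps, m_P))   # disjoint: independent noise
    d2 = np.sum((yPj - yPi) ** 2, axis=1)            # 2 sigma^2 chi^2_{m_P}(delta2/(2 sigma^2))
    print(d2.mean(), 2 * sigma**2 * m_P + delta2)    # the two means agree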
Lower bound (heuristics). We discuss here the lower bound and where the issues are. Consider the important case where $\sigma$ is fixed and assume that $f = 1\{\Omega\}$, where $\Omega = (0, 1/2) \times (0,1)^{d-1}$. Consider direct neighbors (i.e., points at distance $1/n$) $x_i \in \Omega$ and $x_j \in \Omega^c$. For $x_k$ such that $P_k \cap (P_i \cup P_j) = \emptyset$, we use (8.19) to arrive at
$$y_{P_k} - y_{P_i} \sim \mathcal{N}(\lambda_k, 2\sigma^2/m_P), \quad \text{and} \quad y_{P_k} - y_{P_j} \sim \mathcal{N}(\lambda'_k, 2\sigma^2/m_P),$$
where $|\lambda_k - \lambda'_k| = |P_i \setminus P_j|/m_P = m_P^{-1/d}$. When $d \le 2$, the difference in means $m_P^{-1/d}$ is of order at most that of the standard deviation $m_P^{-1/2}$, so that these two distributions cannot be effectively separated. Heuristically, this indicates that if the photometric kernel of NLM-average includes $x_k$ in the neighborhood of $x_i$, it also includes it in the neighborhood of $x_j$ with non-negligible probability. This is evidence that the squared bias is of order 1 at these points. Since there are order $(nh)^{d-1}$ such sample points, averaging over them yields a lower bound on the squared bias (and therefore on the MSE) of order $O(1/n)$. The same heuristics could be applied to NLM.

The story changes for $d \ge 3$. In fact, for any $f$ in the cartoon model with similar foreground, NLM-average, and NLM as well, achieve a much better risk. To see this, fix $x_i \in \Omega$. We already know that NLM-average behaves well when $P_i \subset \Omega$; therefore assume that $P_i \cap \Omega^c \ne \emptyset$. If $x_j - x_i$ is not parallel to $\partial\Omega$, then $|f_{P_j} - f_{P_i}| \ge m_P^{-1/d}$, so that under (8.19), $|y_{P_j} - y_{P_i}| \ge m_P^{-1/d} - \zeta \ge m_P^{-1/3} - \zeta$. Noting that $\zeta \asymp \sqrt{\log(n)/m_P} = o(m_P^{-1/d})$, if we choose $h_y \asymp m_P^{-2/5}$, then with high probability the neighborhood of $x_i$ only includes those $x_j \in B(x_i, h)$ such that $x_j - x_i$ is parallel to $\partial\Omega$, perhaps excluding those such that $P_j \cap P_i \ne \emptyset$. There are order $(nh)^{d-1}$ such $x_j$'s, which drives the variance of the local polynomial estimator at $x_i$. This applies to all $x_i$ with $P_i \cap \Omega^c \ne \emptyset$, and there are order $n^d h$ such $x_i$'s. The MSE over these points yields a contribution of order
$$\frac{1}{n^d} (n^d h) \Big( h^{2\alpha} + \frac{\sigma^2}{(nh)^{d-1}} \Big) = h^{2\alpha + 1} + (n h^2) \frac{\sigma^2}{(nh)^d}.$$
We know that the MSE over the points away from the discontinuity is of order $h^{2\alpha} + \sigma^2 (nh)^{-d}$, so the overall MSE is of order
$$h^{2\alpha} + (n h^2 \vee 1) \frac{\sigma^2}{(nh)^d}.$$
Minimizing over $h$ yields a lower bound of
$$(\sigma^2/n^{d-1})^{2\alpha/(2\alpha + d - 2)} \vee (\sigma^2/n^d)^{2\alpha/(2\alpha + d)},$$
which is the MO rate if $d \ge 2\alpha$.

Non-local versions. We briefly argue that, without spatial localization, YF, NLM, and NLM-average do not perform that well (relative to the MO), unless the underlying function is a polynomial (of degree at most $r$, where $r$ is the chosen degree for the polynomial fitting) or all jumps are greater than $h_y$. Let us look at what the methods do on noiseless data. For a given photometric bandwidth $h_y$, consider the function $f = h_y 1\{\Omega\}$, where $\Omega = (0, 1/2) \times (0,1)^{d-1}$. Then both YF and NLM-average output a constant estimator, equal everywhere to the local polynomial estimator applied to the whole image. Hence, the MSE is at least $h_y^2/4$. Given that we take $h_y$ relatively large, this leads to a large MSE (of order 1).
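The dimension dependence in the heuristic above is elementary: for a patch of side length $\ell$ (so $m_P = \ell^d$), the mean shift $m_P^{-1/d} = 1/\ell$ for direct neighbors falls at or below the noise scale $m_P^{-1/2}$ exactly when $d \le 2$ (a one-line check of ours):

    side = 21                                        # patch of side^d pixels
    for d in [1, 2, 3, 4]:
        m_P = side ** d
        print(d, m_P ** (-1.0 / d), m_P ** -0.5)     # mean shift vs noise scale m_P^{-1/2}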
8.2.6. Proof of Theorem 5.1. The only difference with the cartoon model is in the behavior of local polynomial regression. Fix a point $x_i \in \Omega$. By Lemma 8.4 (scaled by $h$), $\mathrm{Vol}(B(x_i, h) \cap \Omega) \asymp h^{d_0} a^{d - d_0}$. (In fact, this is slightly easier here since $\Omega$ is a band around the graph of a function.) Therefore, by Lemma 8.3 (scaled by $h$), we see that
$$\#\{j : x_j \in B(x_i, h) \cap \Omega\} \asymp n^d h^{d_0} a^{d - d_0}.$$
This is the number of observations we are "averaging" over. For LF, we prove a lower bound of order 1 for the squared bias at $x_i$; we proceed as in Lemma 8.7 with only cosmetic adjustments. For BO, we apply LPR to the sample points $x_j$ belonging to the largest ball centered at $x_i$ which is contained in $\Omega$. Since we only consider $x_i \in \Omega \setminus B(\partial\Omega, a/C)$, this ball is of radius at least $a/C$. We then conclude using the same argument bounding the risk of BO in the cartoon model, detailed in Section 8.2.3.

For MO, and its mimickers YF and NLM, we need to refine Lemma 8.5 because, in the case where $\Omega$ is a thin band, the largest ball within it is not representative of the sample size used in the local polynomial fit, which is what drives the variance. We explain how to adapt the proof of Lemma 8.5 to show that, for a constant $C$,
$$\mathrm{Var}(\hat f_i) \le \frac{C \sigma^2}{n^d h^{d_0} a^{d - d_0}}.$$
Let $u_1, \dots, u_m$ be a maximal $a$-packing of $B(x_i, h) \cap \Omega$, with $m \asymp (h/a)^{d_0}$. Then
$$\bigsqcup_{k=1}^m B(u_k, a) \subset \Omega.$$
Using the notation introduced in the proof of Lemma 8.5, we have
$$Z^T Z \succeq \sum_{k=1}^m Z_k^T Z_k, \quad \text{where } Z_k = (z_j^s : x_j \in B(u_k, a), |s| \le r).$$
We can then use (8.8) to obtain $\lambda_{\min}(Z_k^T Z_k) \ge \frac{1}{C}(na)^d$, implying $\lambda_{\min}(Z^T Z) \ge \frac{1}{C} m (na)^d$. This gives the upper bound on the variance, and the bias behaves as expected, meaning that Lemma 8.6 holds. It is now straightforward to deduce that MO at $i$ has a squared bias of order $O(h^{2\alpha})$ and a variance of order $O(\sigma^2 n^{-d} h^{-d_0} a^{-d + d_0})$. Given that $h = h_{\mathrm{MO}}$ and $a = o(h_{\mathrm{MO}})$, the variance dominates and may be expressed as $(h/a)^{d - d_0} \, O(\sigma^2/(nh)^d)$, with $O(\sigma^2/(nh)^d)$ being the order of magnitude of the point risk of MO under the cartoon model.

YF is still able to perfectly mimic MO under the conditions of Theorem 4.4 (same exact arguments). For points $x_i \in \Omega$ with $\mathrm{dist}(x_i, \partial\Omega) > h_P^{\mathrm{NLM}}$, the analysis for NLM is again exactly the same, the difference here being in the number of $j$'s such that $P_j \subset \Omega$, which is of order $n^d h^{d_0} a^{d - d_0}$. The rest is the same.
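The count $n^d h^{d_0} a^{d - d_0}$ that drives the variance in this proof can be checked on a synthetic thin band (our 2D illustration, with $d = 2$ and $d_0 = 1$):

    import numpy as np

    n, a, h = 1000, 0.01, 0.2                        # band half-width a/2; window radius h
    t = (np.arange(1, n + 1) - 0.5) / n
    X, Y = np.meshgrid(t, t)
    band = np.abs(Y - 0.5) <= a / 2                  # Omega: a horizontal band (d0 = 1)
    window = np.maximum(np.abs(X - 0.5), np.abs(Y - 0.5)) <= h
    print((band & window).sum(), n**2 * h * a)       # both ≍ n^d h^{d0} a^{d-d0}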
8.2.7. Proof of Proposition 5.1. Here $\Omega$ and $\Omega^c$ are interchangeable, so we focus on the former WLOG and fix $x_i \in \Omega$. Again, the only difference with the cartoon model is in the behavior of local polynomial regression, and we need a stronger version of Lemma 8.5 when $\Omega$ is a repeated pattern. Using the notation introduced in the proof of Lemma 8.5, we have
$$Z^T Z = \sum_v Z_v^T Z_v, \quad \text{where } Z_v = (z_j^s : x_j \in B(x_i, h) \cap (\Xi + v), |s| \le r).$$
Note that we may restrict the sum to those $v \in a\mathbb{Z}^d$ such that $B(x_i, h) \cap (\Xi + v) \ne \emptyset$, and there are order $(h/a)^d$ such $v$'s. Since they are all translates of each other, let us focus on $\Xi$, that is, $v = 0$. We again express $Z_0^T Z_0$ as a sum of matrices by partitioning the $d$-dimensional subgrid $\{x_j \in \Xi\}$ into discrete 1D grids of the form
$$L_{j_1, \dots, j_{d-1}} := \big\{ \big( (j_1 - 1/2)/n, \dots, (j_{d-1} - 1/2)/n, (j_d - 1/2)/n \big) \in \Xi : j_d = 1, \dots, [na] \big\},$$
where $j_1, \dots, j_{d-1} \in \{1, \dots, [na]\}$. We therefore have
$$Z_0^T Z_0 = \sum_{j'} Z_{(j')}^T Z_{(j')}, \quad \text{where } Z_{(j')} := (z_k^s : x_k \in L_{j'} \cap \Omega, |s| \le r), \ j' \in \{1, \dots, [na]\}^{d-1}.$$
Since $N_\Omega \ge (1/C) N_{\Omega^c}$, we also have $N_\Xi \ge (1/C) N_{(0,a)^d \setminus \Xi}$, so that $\Xi$ contains at least the fraction $1/(C+1)$ of the sample points in $(0,a)^d$ and therefore
$$\sum_{j' \in \{1, \dots, [na]\}^{d-1}} |L_{j'}| \ge \frac{[na]^d}{C + 1}. \tag{8.32}$$
Let $J' := \{j' \in \{1, \dots, [na]\}^{d-1} : |L_{j'}| \ge [na]/(2C + 2)\}$. Since $|L_{j'}| \le [na]$, we have
$$\sum_{j' \in \{1, \dots, [na]\}^{d-1}} |L_{j'}| \le [na] \, |J'| + \frac{[na]}{2C + 2} \big( [na]^{d-1} - |J'| \big),$$
so that $|J'| \ge [na]^{d-1}/(2C + 1)$ by (8.32). We focus on $Z_{(j')}$ with $j' \in J'$. Notice that this reduces the analysis to the one-dimensional case.

Lemma 8.12. There is a numeric constant $C > 0$ such that any polynomial regression matrix of the form $U = ((k/m)^s : 0 \le s \le r; \ k \in K)$, with $K \subset \{-m, \dots, m\}$ and $|K| \ge r + 1$, satisfies
$$\lambda_{\min}(U^T U) \ge |K| (|K|/m)^{2r}/C.$$

Proof. Let $k_1 < \cdots < k_q$ be the elements of $K$. Define $\ell_0 = [q/(r+2)]$ and, for $\ell = 1, \dots, \ell_0$, let $K_\ell = \{k_\ell, k_{\ell + \ell_0}, \dots, k_{\ell + (r+1)\ell_0}\}$. Note that $|K_\ell| = r + 1$ and $k_{\ell + (j+1)\ell_0} - k_{\ell + j\ell_0} \ge \ell_0$. Now, the matrix $U_\ell = ((k/m)^s : 0 \le s \le r; \ k \in K_\ell)$ is a $(r+1) \times (r+1)$ Vandermonde matrix. It is well known that $U_\ell$ is invertible; more precisely, the main result in [18] says that
$$\|U_\ell^{-1}\|_\infty = \max_{1 \le i \le r+1} \prod_{j \in \{1, \dots, r+1\} \setminus \{i\}} \frac{1 + |k_{\ell + j\ell_0}|/m}{|k_{\ell + j\ell_0}/m - k_{\ell + i\ell_0}/m|},$$
where $\|(a_{ij})\|_\infty := \max_i \sum_j |a_{ij}|$. Hence,
$$\|U_\ell^{-1}\|_2 \le \sqrt{r+1}\, \|U_\ell^{-1}\|_\infty \le \sqrt{r+1}\, (2m/\ell_0)^r,$$
where $\|\cdot\|_2$ is the usual Euclidean operator norm. Hence,
$$\lambda_{\min}(U_\ell^T U_\ell) \ge \|U_\ell^{-1}\|_2^{-2} \ge (\ell_0/(2m))^{2r}/(r+1).$$
Since the index sets $K_\ell$ do not overlap, we have
$$\lambda_{\min}(U^T U) \ge \sum_{\ell=1}^{\ell_0} \lambda_{\min}(U_\ell^T U_\ell).$$
When $r$ is fixed, $\ell_0 \asymp q$, so the right-hand side is $\asymp q (q/m)^{2r}$.

Let $C_1$ denote the constant of Lemma 8.12 and let $C_2 = C_1 (2C + 1)^{2r + 1}$. Applying this result under the assumption that $[na]/(2C + 1) \ge r + 1$, we find that $\lambda_{\min}(Z_{(j')}^T Z_{(j')}) \ge [na]/C_2$ for all $j' \in J'$. From here, we have
$$\lambda_{\min}(Z_0^T Z_0) \ge |J'| \, [na]/C_2 \ge \frac{[na]^d}{C_2 (2C + 2)},$$
and then $\lambda_{\min}(Z^T Z) \ge (h/a)^d \lambda_{\min}(Z_0^T Z_0) \asymp (nh)^d$. With this established, the bias behaves as in the cartoon model, and the rest of the analysis for MO and YF is exactly as before.
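Lemma 8.12 can be probed numerically (our sketch): for randomly chosen index sets $K$, the smallest eigenvalue of $U^T U$ stays above a constant multiple of $|K| (|K|/m)^{2r}$:

    import numpy as np

    rng = np.random.default_rng(5)
    r, m = 2, 200
    for q in [10, 50, 150]:
        K = rng.choice(np.arange(-m, m + 1), size=q, replace=False)
        U = np.vander(K / m, r + 1, increasing=True)     # rows ((k/m)^s, 0 <= s <= r)
        lam_min = np.linalg.eigvalsh(U.T @ U)[0]         # eigvalsh returns ascending order
        print(q, lam_min / (q * (q / m) ** (2 * r)))     # ratio bounded away from zero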
For NLM, some additional arguments are required. We need to compare $P_i$ with other patches centered at $x_j \in B(x_i, h)$. First, suppose that $x_j - x_i \in a\mathbb{Z}^d$. Then, by the periodicity of $\Omega$, $x_j \in \Omega$ too, and also $x_k + x_i \in \Omega$ if, and only if, $x_k + x_j \in \Omega$, for all $x_k \in P_0$. Hence,
$$\|f_{P_j} - f_{P_i}\|_2^2 = \sum_{x_k \in P_0 \cap \Omega} \big( f_\Omega(x_k + x_j) - f_\Omega(x_k + x_i) \big)^2 + \sum_{x_k \in P_0 \cap \Omega^c} \big( f_{\Omega^c}(x_k + x_j) - f_{\Omega^c}(x_k + x_i) \big)^2 \le m_P C_0^2 \|x_i - x_j\|^2 \le m_P C_0^2 h^2.$$
This is the equivalent of (8.26). Suppose now that $x_j \in \Omega^c$. Using the fact that $f_\Omega$ and $f_{\Omega^c}$ are $C_0$-Lipschitz, we have
$$f_{P_i} = f_\Omega(x_i) 1(P_i \cap \Omega) + f_{\Omega^c}(x_i) 1(P_i \cap \Omega^c) + O(h),$$
and similarly,
$$f_{P_j} = f_\Omega(x_i) 1(P_j \cap \Omega) + f_{\Omega^c}(x_i) 1(P_j \cap \Omega^c) + O(h),$$
since $f_\Omega(x_i) - f_\Omega(x_j) = O(h)$ and $f_{\Omega^c}(x_i) - f_{\Omega^c}(x_j) = O(h)$. Hence,
$$\|f_{P_j} - f_{P_i}\|_2^2 \ge \big( f_{\Omega^c}(x_i) - f_\Omega(x_i) \big)^2 \, \|1(P_j \cap \Omega) - 1(P_i \cap \Omega)\|_2^2 + O(m_P h^2) \ge \mu^2 m_P/C' + O(m_P h^2)$$
by (5.1). This is the equivalent of (8.29). Arguing as in the proof of Theorem 4.5, we see that, with high probability, the regression neighborhood of $x_i$ includes all $x_j$ such that $x_j - x_i \in a\mathbb{Z}^d$, $x_j \in B(x_i, h)$ and $P_j \cap P_i = \emptyset$ (those $x_j$'s are in $\Omega$, like $x_i$), and excludes all $x_j \in \Omega^c$ such that $P_j \cap P_i = \emptyset$. There are of order $(h/a)^d$ such points. Using the same techniques as before, this leads to a bound on the variance of order $(na)^d/(nh)^d$. The trade-off with the bias for a choice of bandwidth $h_{\mathrm{MO}}$ leads to the $(na)^d R_{\mathrm{MO}}$ upper bound in the proposition. In principle, an additional argument would be needed to exclude those $x_j \in \Omega^c$ such that $P_j \cap P_i \ne \emptyset$, since in that case $\|\varepsilon_{P_j} - \varepsilon_{P_i}\|_2^2$ is not chi-square as before. However, it is not hard to see that even if these are included in the regression neighborhood, it does not change things much, since their number is small, of order $O(\log n)$.

Acknowledgements. We would like to thank Alexandre Tsybakov for pointing out some helpful references and Michaël Chichignoud for fruitful comments that helped improve this work.

REFERENCES

[1] Alvarez, L., Mazorra, L.: Signal and image restoration using shock filters and anisotropic diffusion. SIAM J. Numer. Anal. 31(2), 590–605 (1994)
[2] Arias-Castro, E., Donoho, D.L.: Does median filtering truly preserve edges better than linear filtering? Ann. Statist. 37(3), 1172–1206 (2009)
[3] Awate, S.P., Whitaker, R.T.: Unsupervised, information-theoretic, adaptive image filtering for image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 28(3), 364–376 (2006)
[4] Azzabou, N., Paragios, N., Guichard, F.: Image denoising based on adapted dictionary computation. In: ICIP, pp. 109–112 (2007)
[5] Buades, A.: Image and movie denoising by non local means. Ph.D. thesis, Universitat de les Illes Balears (2006)
[6] Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 4(2), 490–530 (2005)
[7] Chatterjee, P., Milanfar, P.: Patch-based near-optimal image denoising. Submitted (2011)
[8] Criminisi, A., Pérez, P., Toyama, K.: Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 13(9), 1200–1212 (2004)
[9] Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.O.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
[10] Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.O.: BM3D image denoising with shape-adaptive principal component analysis. In: Proc. Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS'09) (2009)
[11] Deledalle, C.A., Duval, V., Salmon, J.: Anisotropic non-local means with spatially adaptive patch shapes. In: SSVM (2011)
[12] Deledalle, C.A., Duval, V., Salmon, J.: Non-local methods with shape-adaptive patches (NLM-SAP). J. Math. Imaging Vis., pp. 1–18 (2011)
[13] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
[14] Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., Picard, D.: Wavelet shrinkage: asymptopia? J. Roy. Statist. Soc. Ser. B 57(2), 301–369 (1995)
[15] Duval, V., Aujol, J.F., Gousseau, Y.: A bias-variance approach for the nonlocal means. SIAM J. Imaging Sci. 4(2), 760–788 (2011)
[16] Efros, A.A., Leung, T.: Texture synthesis by non-parametric sampling. In: ICCV, pp. 1033–1038 (1999)
[17] Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications, Monographs on Statistics and Applied Probability, vol. 66. Chapman & Hall, London (1996)
[18] Gautschi, W.: On inverses of Vandermonde and confluent Vandermonde matrices. Numerische Mathematik 4, 117–123 (1962)
[19] Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. Multiscale Model. Simul. 6(2), 595–630 (2007)
[20] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, second edn. Springer Series in Statistics. Springer, New York (2009)
[21] Johnstone, I.M.: Oracle inequalities and nonparametric function estimation. In: Proceedings of the International Congress of Mathematicians, Vol. III (Berlin, 1998), Extra Vol. III, pp. 267–278 (electronic) (1998)
[22] Katkovnik, V.: A new method for varying adaptive bandwidth selection. IEEE Trans. Image Process. 47(9), 2567–2571 (1999)
[23] Katkovnik, V., Egiazarian, K.O., Astola, J.T.: Adaptive window size image denoising based on intersection of confidence intervals (ICI) rule. J. Math. Imaging Vis. 16(3), 223–235 (2002)
[24] Katkovnik, V., Foi, A., Egiazarian, K.O., Astola, J.T.: Directional varying scale approximations for anisotropic signal processing. In: EUSIPCO, pp. 101–104 (2004)
[25] Katkovnik, V., Foi, A., Egiazarian, K.O., Astola, J.T.: From local kernel to nonlocal multiple-model image denoising. Int. J. Comput. Vision 86(1), 1–32 (2010)
[26] Kervrann, C., Boulanger, J.: Optimal spatial adaptation for patch-based image denoising. IEEE Trans. Image Process. 15(10), 2866–2878 (2006)
[27] Korostelëv, A.P., Tsybakov, A.B.: Minimax Theory of Image Reconstruction, Lecture Notes in Statistics, vol. 82. Springer-Verlag, New York (1993)
[28] Lee, J.S.: Digital image smoothing and the sigma filter. Computer Vision, Graphics, and Image Processing 24(2), 255–269 (1983)
[29] Lepski, O.V., Mammen, E., Spokoiny, V.G.: Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25(3), 929–947 (1997)
[30] Levin, A., Nadler, B.: Natural image denoising: optimality and inherent bounds. In: CVPR (2011)
[31] Mahmoudi, M., Sapiro, G.: Fast image and video denoising via nonlocal means of similar neighborhoods. IEEE Signal Process. Lett. 12, 839–842 (2005)
[32] Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A.: Non-local sparse models for image restoration. In: ICCV, pp. 2272–2279 (2009)
[33] Maleki, A., Narayan, M., Baraniuk, R.G.: Anisotropic nonlocal means (2011). Submitted to Applied and Computational Harmonic Analysis
[34] Maleki, A., Narayan, M., Baraniuk, R.G.: Suboptimality of nonlocal means for images with sharp edges (2011). Submitted to Applied and Computational Harmonic Analysis
[35] Mallat, S.: A Wavelet Tour of Signal Processing: The Sparse Way. Elsevier/Academic Press, Amsterdam (2009). With contributions from Gabriel Peyré
[36] Milanfar, P.: A tour of modern image filtering. IEEE Signal Processing Magazine, to appear (2012)
[37] Müller, H.G., Stadtmüller, U.: Variable bandwidth kernel estimators of regression curves. Ann. Statist. 15(1), 182–201 (1987)
[38] Nadaraya, E.A.: On estimating regression. Theory of Probability and its Applications 9(1), 141–142 (1964)
[39] Perona, P., Malik, J.: Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12, 629–639 (1990)
[40] Polzehl, J., Spokoiny, V.G.: Adaptive weights smoothing with applications to image restoration. J. R. Stat. Soc. Ser. B Stat. Methodol. 62(2), 335–354 (2000)
[41] Polzehl, J., Spokoiny, V.G.: Image denoising: pointwise adaptive approach. Ann. Statist. 31(1), 30–57 (2003)
[42] Portilla, J., Strela, V., Wainwright, M., Simoncelli, E.P.: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Process. 12(11), 1338–1351 (2003)
[43] Salmon, J.: Agrégation d'estimateurs et méthodes à patch pour le débruitage d'images numériques [Aggregation of estimators and patch-based methods for the denoising of digital images]. Ph.D. thesis, Université Paris Diderot (2010)
[44] Salmon, J.: On two parameters for denoising with Non-Local Means. IEEE Signal Process. Lett. 17, 269–272 (2010)
[45] Salmon, J., Strozecki, Y.: Patch reprojections for Non-Local methods. Signal Processing 92(2), 447–489 (2012)
[46] Salmon, J., Willett, R., Arias-Castro, E.: A two-stage denoising filter: the preprocessed Yaroslavsky filter. In: IEEE Statistical Signal Processing Workshop (2012)
[47] Singer, A., Shkolnisky, Y., Nadler, B.: Diffusion interpretation of nonlocal neighborhood filters for signal denoising. SIAM J. Imaging Sci. 2(1), 118–139 (2009)
[48] Smith, S.M., Brady, J.M.: SUSAN – a new approach to low level image processing. Int. J. Comput. Vision 23(1), 45–78 (1997)
[49] Spira, A., Kimmel, R.: Enhancing images painted on manifolds. In: Scale Space and PDE Methods in Computer Vision, pp. 492–502 (2005)
[50] Spokoiny, V.G.: Estimation of a function with discontinuities via local polynomial fit with an adaptive window choice. Ann. Statist. 26(4), 1356–1378 (1998)
[51] Starck, J.L., Candès, E.J., Donoho, D.L.: The curvelet transform for image denoising. IEEE Trans. Image Process. 11(6), 670–684 (2002)
[52] Szlam, A.D., Maggioni, M., Coifman, R.R.: Regularization on graphs with function-adapted diffusion processes. J. Mach. Learn. Res. 9, 1711–1739 (2008)
[53] Takeda, H., Farsiu, S., Milanfar, P.: Kernel regression for image processing and reconstruction. IEEE Trans. Image Process. 16(2), 349–366 (2007)
[54] Tasdizen, T.: Principal neighborhood dictionaries for nonlocal means image denoising. IEEE Trans. Image Process. 18(12), 2649–2660 (2009)
[55] Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: ICCV, pp. 839–846 (1998)
[56] Tsybakov, A.B.: Optimal orders of accuracy of the estimation of nonsmooth images. Problemy Peredachi Informatsii 25(3), 13–27 (1989)
[57] Tsybakov, A.B.: Introduction to nonparametric estimation. Springer Series in Statistics. Springer, New York (2009)
[58] Van De Ville, D., Kocher, M.: SURE-based Non-Local Means. IEEE Signal Process. Lett. 16, 973–976 (2009)
[59] Van De Ville, D., Kocher, M.: Non-local means with dimensionality reduction and SURE-based parameter selection. IEEE Trans. Image Process. (2011, to appear)
[60] Watson, G.S.: Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A 26(4), 359–372 (1964)
[61] Weissman, T., Ordentlich, E., Seroussi, G., Verdú, S., Weinberger, M.J.: Universal discrete denoising: known channel. IEEE Trans. Inf. Theory 51(1), 5–28 (2005)
[62] Yaroslavsky, L.P.: Digital picture processing, Springer Series in Information Sciences, vol. 9. Springer-Verlag, Berlin (1985)
[63] Zewail, A.H., Thomas, J.M.: 4D electron microscopy: imaging in space and time. Imperial College Press (2009)
[64] Zontak, M., Irani, M.: Internal statistics of a single natural image. In: CVPR, pp. 977–984 (2011)
