A characterization of proximity operators


Authors: Remi Gribonval (PANAMA, DANTE), Mila Nikolova (CMLA)

A CHARACTERIZATION OF PROXIMITY OPERATORS

RÉMI GRIBONVAL AND MILA NIKOLOVA

Abstract. We characterize proximity operators, that is to say functions that map a vector to a solution of a penalized least squares optimization problem. Proximity operators of convex penalties have been widely studied and fully characterized by Moreau. They are also widely used in practice with nonconvex penalties such as the $\ell^0$ pseudo-norm, yet the extension of Moreau's characterization to this setting seemed to be a missing element of the literature. We characterize proximity operators of (convex or nonconvex) penalties as functions that are the subdifferential of some convex potential. This is proved as a consequence of a more general characterization of so-called Bregman proximity operators of possibly nonconvex penalties in terms of certain convex potentials. As a side effect of our analysis, we obtain a test to verify whether a given function is the proximity operator of some penalty, or not. Many well-known shrinkage operators are indeed confirmed to be proximity operators. However, we prove that windowed Group-LASSO and persistent empirical Wiener shrinkage – two forms of so-called social sparsity shrinkage – are generally not the proximity operator of any penalty; the exception is when they are simply weighted versions of group-sparse shrinkage with non-overlapping groups.

Keywords: Proximity operator; Convex regularization; Nonconvex regularization; Subdifferential; Shrinkage operator; Social sparsity; Group sparsity

1. Introduction and overview

Proximity operators have become an important ingredient of nonsmooth optimization, where a huge body of work has demonstrated the power of iterative proximal algorithms to address large-scale variational optimization problems.
While these techniques have been thoroughly analyzed and understood for proximity operators involving convex penalties, there is a definite trend towards the use of proximity operators of nonconvex penalties such as the $\ell^0$ penalty [7, 8]. This paper extends existing characterizations of proximity operators – which are specialized for convex penalties – to the nonconvex case. A particular motivation is to understand whether certain thresholding rules known as social sparsity shrinkage, which have been successfully exploited in the context of certain linear inverse problems, are proximity operators. Another motivation is to characterize when Bayesian estimation with the conditional mean estimator (also known as minimum mean square error estimation, or MMSE) can be expressed as a proximity operator. This is the object of a companion paper [19] characterizing when certain variational approaches to address inverse problems can in fact be considered as Bayesian approaches.

This work and the companion paper [19] are dedicated to the memory of Mila Nikolova, who passed away prematurely in June 2018. Mila dedicated much of her energy to bring the technical content to completion during the spring of 2018. The first author did his best to finalize the papers as Mila would have wished. He should be held responsible for any possible imperfection in the final manuscript. R. Gribonval (remi.gribonval@inria.fr) was with Univ Rennes, Inria, CNRS, IRISA when this work was conducted. He is now with Univ Lyon, Inria, CNRS, ENS de Lyon, UCB Lyon 1, LIP UMR 5668, F-69342, Lyon, France. M. Nikolova, CMLA, CNRS and ENS de Cachan, Université Paris-Saclay, 94235 Cachan, France.

1.1. Characterization of proximity operators. Let $\mathcal{H}$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ and a norm $\|\cdot\|$.
This includes the case $\mathcal{H} = \mathbb{R}^n$, and most of the text can be read with this simpler setting in mind. The proximity operator of a function $\varphi : \mathcal{H} \to \mathbb{R}$ maps each $y \in \mathcal{H}$ to the solutions of a penalized least-squares problem:
$$y \mapsto \mathrm{prox}_\varphi(y) := \arg\min_{x \in \mathcal{H}} \left\{ \tfrac{1}{2}\|y - x\|^2 + \varphi(x) \right\}.$$
Formally, a proximity operator is set-valued, as there may be several solutions to this problem, or the set of solutions may be empty. A primary example is the soft-thresholding function $f(y) := y\,\max(1 - 1/|y|, 0)$, $y \in \mathbb{R}$, which is the proximity operator of the absolute value function $\varphi(x) := |x|$. Proximity operators can be defined for certain generalized functions $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$. A particular example is the projection onto a given closed convex set $C \subset \mathcal{H}$, which can be written as $\mathrm{proj}_C = \mathrm{prox}_\varphi$ with $\varphi$ the indicator function of $C$, i.e., $\varphi(x) = 0$ if $x \in C$, $\varphi(x) = +\infty$ otherwise. For the sake of precision and brevity, we use the following definition:

Definition 1. Let $\mathcal{Y} \subset \mathcal{H}$ be non-empty. A function $f : \mathcal{Y} \to \mathcal{H}$ is a proximity operator of a function $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ if, and only if, $f(y) \in \mathrm{prox}_\varphi(y)$ for each $y \in \mathcal{Y}$.

In convex analysis, this corresponds to the notion of a selection of the set-valued mapping $\mathrm{prox}_\varphi$. A characterization of proximity operators of convex lower semicontinuous (l.s.c.) functions is due to Moreau. It involves the subdifferential $\partial\theta(x)$ of a convex l.s.c. function $\theta$ at $x$, i.e., the set of all its subgradients at $x$ [14, Chapter III.2].

Proposition 1. [26, Corollary 10.c] A function $f : \mathcal{H} \to \mathcal{H}$ defined everywhere is the proximity operator of a proper convex l.s.c. function $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ if, and only if, the following conditions hold jointly:
(a) there exists a (convex l.s.c.) function $\psi$ such that for each $y \in \mathcal{H}$, $f(y) \in \partial\psi(y)$;
(b) $f$ is nonexpansive, i.e., $\|f(y) - f(y')\| \le \|y - y'\|$ for all $y, y' \in \mathcal{H}$.
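The soft-thresholding example above can be checked numerically. The following sketch (ours, not from the paper) verifies on a brute-force grid that $f(y) = y\max(1-1/|y|,0)$ indeed minimizes $\tfrac12(y-x)^2 + |x|$:

```python
# Numerical sanity check (not from the paper): soft-thresholding
# f(y) = y * max(1 - 1/|y|, 0) should minimize x -> (y - x)^2/2 + |x|.

def soft(y):
    """Soft-thresholding with threshold 1, i.e. the prox of the absolute value."""
    return y * max(1.0 - 1.0 / abs(y), 0.0) if y != 0 else 0.0

def prox_abs_bruteforce(y, lo=-10.0, hi=10.0, steps=200001):
    """Minimize (y - x)^2/2 + |x| by exhaustive search on a fine grid."""
    best_x, best_v = None, float("inf")
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        v = 0.5 * (y - x) ** 2 + abs(x)
        if v < best_v:
            best_x, best_v = x, v
    return best_x

for y in (-3.0, -0.5, 0.2, 1.0, 2.5):
    assert abs(soft(y) - prox_abs_bruteforce(y)) < 1e-3
print("soft-thresholding matches the brute-force prox of |x|")
```

The grid spacing is $10^{-4}$, hence the $10^{-3}$ tolerance; inputs with $|y| \le 1$ are correctly mapped to $0$.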
We extend Moreau's result to possibly nonconvex functions $\varphi$ on subdomains of $\mathcal{H}$ by simply relaxing the nonexpansivity condition:

Theorem 1. Let $\mathcal{Y} \subset \mathcal{H}$ be non-empty. A function $f : \mathcal{Y} \to \mathcal{H}$ is a proximity operator of a function $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ if, and only if, there exists a convex l.s.c. function $\psi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that for each $y \in \mathcal{Y}$, $f(y) \in \partial\psi(y)$.

(See Section 2.1 for detailed notation and reminders on convex analysis and differentiability in Hilbert spaces.)

This is proved in Section 2 as a particular consequence of our main result, Theorem 3, which characterizes functions such that $f(y) \in \arg\min_{x \in \mathcal{H}} \{D(x,y) + \varphi(x)\}$ for certain types of data-fidelity terms $D(x,y)$. Among others, the data-fidelity terms covered by Theorem 3 include:
• the Euclidean distance $D(x,y) = \tfrac{1}{2}\|y - x\|^2$, which is the data-fidelity associated to proximity operators;
• its variant $D(x,y) = \tfrac{1}{2}\|y - Mx\|^2$ with $M$ some linear operator; and
• Bregman divergences [9], leading to an analog of Theorem 1 characterizing so-called Bregman proximity operators [12] (see Corollary 5 in Section 2).

Theorem 3 further implies that the functions $\varphi$ and $\psi$ in Theorem 1 can be chosen such that
(1) $\psi(y) = \langle y, f(y)\rangle - \tfrac{1}{2}\|f(y)\|^2 - \varphi(f(y)), \quad \forall y \in \mathcal{Y}.$
This is a particular instance of a more general result valid for all considered data-fidelity terms. Another consequence of Theorem 3 (see Corollary 4 in Section 2) is that for the considered data-fidelity terms $D(x,y)$, if $f : \mathcal{Y} \to \mathcal{H}$ can be written as $f(y) \in \arg\min_{x \in \mathcal{H}} \{D(x,y) + \varphi(x)\}$ for some (possibly nonconvex) function $\varphi$, and if its image $\mathrm{Im}(f) := f(\mathcal{Y})$ is a convex set (e.g., if $\mathrm{Im}(f) = \mathcal{H}$), then the function $x \mapsto D(x,y) + \varphi(x)$ is convex on $\mathrm{Im}(f)$.
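Formula (1) can be tried out on soft-thresholding. In this minimal sketch (ours, not from the paper), we build $\psi(y) = \langle y, f(y)\rangle - \tfrac12\|f(y)\|^2 - \varphi(f(y))$ with $\varphi = |\cdot|$ and check, away from the kinks at $|y| = 1$, that $f = \psi'$ as Theorem 1 predicts:

```python
# Illustration (not from the paper) of formula (1): for soft-thresholding
# f = prox of phi = |.|, the potential
#   psi(y) = y*f(y) - f(y)^2/2 - |f(y)|
# should satisfy f = psi', i.e. f is the gradient of a convex potential.

def soft(y):
    return y * max(1.0 - 1.0 / abs(y), 0.0) if y != 0 else 0.0

def psi(y):
    fy = soft(y)
    return y * fy - 0.5 * fy ** 2 - abs(fy)

h = 1e-6
for y in (-2.0, -1.5, 0.3, 1.2, 4.0):   # test points away from |y| = 1
    dpsi = (psi(y + h) - psi(y - h)) / (2 * h)   # central finite difference
    assert abs(dpsi - soft(y)) < 1e-5
print("psi'(y) == soft(y): the potential of (1) generates f")
```

Here $\psi(y) = (|y|-1)^2/2$ for $|y| > 1$ and $\psi(y) = 0$ otherwise, which is convex, consistent with the theorem.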
This is reminiscent of observations on convex optimization with nonconvex penalties [28, 31] and on the hidden convexity of conditional mean estimation under additive Gaussian noise [17, 18, 25, 1]. The latter is extended to other noise models in the companion paper [19].

1.2. The case of smooth proximity operators. The smoothness of a proximity operator $f = \mathrm{prox}_\varphi$ and that of the corresponding functions $\varphi$ and $\psi$, cf. (1), are inter-related, leading to a characterization of continuous proximity operators (see Section 2.1 for brief reminders on the notions of continuity and differentiability in Hilbert spaces).

Corollary 1. Let $\mathcal{Y} \subset \mathcal{H}$ be non-empty and open, and let $f : \mathcal{Y} \to \mathcal{H}$ be $C^0$. The following are equivalent:
(a) $f$ is a proximity operator of a function $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$;
(b) there exists a convex $C^1(\mathcal{Y})$ function $\psi$ such that $f(y) = \nabla\psi(y)$ for each $y \in \mathcal{Y}$.

This is established in Section 2.6 as a particular consequence of our second main result, Corollary 6. There, we also prove that when $f$ is a proximity operator of some $\varphi$, the Lipschitz property of $f$ with Lipschitz constant $L$ is equivalent to the convexity of $x \mapsto \varphi(x) + \left(1 - \tfrac{1}{L}\right)\tfrac{\|x\|^2}{2}$. Moreau's characterization (Proposition 1) corresponds to the special case $L = 1$. Next, we characterize $C^1$ proximity operators on convex domains more explicitly using the differential of $f$.

Theorem 2. Let $\mathcal{Y} \subset \mathcal{H}$ be non-empty, open and convex, and let $f : \mathcal{Y} \to \mathcal{H}$ be $C^1$. The following properties are equivalent:
(a) $f$ is a proximity operator of a function $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$;
(b) there exists a convex $C^2(\mathcal{Y})$ function $\psi$ such that $f(y) = \nabla\psi(y)$ for each $y \in \mathcal{Y}$;
(c) the differential $Df(y)$ is a symmetric positive semi-definite operator for each $y \in \mathcal{Y}$.

Proof. Since $f$ is $C^1$, the equivalence (a) ⇔ (b) is a consequence of Corollary 1.
We now establish (b) ⇔ (c). Since $\mathcal{Y}$ is convex it is simply connected, and as $\mathcal{Y}$ is open, by Poincaré's lemma (see [16, Theorem 6.6.3] when $\mathcal{H} = \mathbb{R}^n$) the differential $Df$ is symmetric if, and only if, $f$ is the gradient of some $C^2$ function $\psi$. By [6, Proposition 17.7], the function $\psi$ is convex iff $\nabla^2\psi \succeq 0$ on $\mathcal{Y}$, i.e., iff $Df \succeq 0$ on $\mathcal{Y}$. □

Corollary 2. Let $\mathcal{Y} \subset \mathcal{H}$ be a set with non-empty interior $\mathrm{int}(\mathcal{Y}) \ne \emptyset$, let $y \in \mathrm{int}(\mathcal{Y})$, and let $f : \mathcal{Y} \to \mathcal{H}$ be a proximity operator. If $f$ is $C^1$ in a neighborhood of $y$, then $Df(y)$ is symmetric positive semi-definite.

Proof. Restrict $f$ to any open convex neighborhood $\mathcal{Y}' \subset \mathcal{Y}$ of $y$ and apply Theorem 2. □

Remark 1. Differentials are perhaps more familiar to some readers in the context of multivariate calculus: when $y = (y_j)_{j=1}^n \in \mathcal{H} = \mathbb{R}^n$ and $f(y) = (f_i(y))_{i=1}^n$, $Df(y)$ is identified with the Jacobian matrix $Jf(y) = \big(\tfrac{\partial f_i}{\partial y_j}\big)_{1 \le i,j \le n}$. The rows of $Jf(y)$ are the transposed gradients $\nabla f_i(y)$. The differential is symmetric if the mixed derivatives satisfy $\tfrac{\partial f_i}{\partial y_j} = \tfrac{\partial f_j}{\partial y_i}$ for all $i \ne j$. When $n = 3$, this corresponds to $f$ being an irrotational vector field. More generally, this characterizes the fact that $f$ is a so-called conservative field, i.e., a vector field that is the gradient of some potential function. As the Jacobian is the Hessian of this potential, it is positive semi-definite if the potential is convex.

Finally, we provide conditions ensuring that $f$ is a proximity operator and that $f(y)$ is the only critical point of the corresponding optimization problem.

Corollary 3. Let $\mathcal{Y} \subset \mathcal{H}$ be open and convex, and let $f : \mathcal{Y} \to \mathcal{H}$ be $C^1$ with $Df(y) \succ 0$ on $\mathcal{Y}$. Then $f$ is injective and there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $\mathrm{prox}_\varphi(y) = \{f(y)\}$ for all $y \in \mathcal{Y}$, and $\mathrm{dom}(\varphi) = \mathrm{Im}(f)$.
Moreover, if $Df(y)$ is boundedly invertible on $\mathcal{Y}$, then $\varphi$ is $C^1$ on $\mathcal{Y}$ and, for each $y \in \mathcal{Y}$, the only critical point of $x \mapsto \tfrac{1}{2}\|y - x\|^2 + \varphi(x)$ is $x = f(y)$.

This is established in Appendix A.6.

Remark 2. In finite dimension $\mathcal{H} = \mathbb{R}^n$, $Df(y)$ is boundedly invertible as soon as $Df(y) \succ 0$, hence we only need to assume that $Df(y) \succ 0$ to conclude that $f(y)$ is the unique critical point. This is no longer the case in infinite dimension. Indeed, consider $\mathcal{H} = \ell^2(\mathbb{N})$ and $f : y = (y_n)_{n \in \mathbb{N}} \mapsto f(y) := (y_n/(n+1))_{n \in \mathbb{N}}$. As $f$ is linear, its differential is $Df(y) = f$ for every $y \in \mathcal{H}$. As $\langle f(y), y\rangle = \sum_{n \in \mathbb{N}} y_n^2/(n+1) > 0$ for each nonzero $y \in \mathcal{H}$, we have $Df(y) \succ 0$, but its inverse is unbounded. Given $n \in \mathbb{N}$ and $z \in \mathbb{R}$ we have $z/(n+1) = \arg\min_{x \in \mathbb{R}} \tfrac{1}{2}(z - x)^2 + nx^2/2$, hence $f = \mathrm{prox}_{\varphi_0}$ with $\varphi_0 : x = (x_n)_{n \in \mathbb{N}} \mapsto \varphi_0(x) := \sum_{n \in \mathbb{N}} n x_n^2/2$. Setting $\varphi(x) = \varphi_0(x)$ for $x \in \mathrm{Im}(f)$, $\varphi(x) = +\infty$ otherwise, we have $\mathrm{prox}_\varphi = f$ and $\mathrm{dom}(\varphi) = \mathrm{Im}(f) = \{x \in \mathcal{H} : \sum_{n \in \mathbb{N}} (n+1)^2 x_n^2 < \infty\}$. Yet, as no point in $\mathrm{dom}(\varphi)$ admits any open neighborhood in $\mathcal{H}$, $\varphi$ is nowhere differentiable and every $x \in \mathcal{H}$ is a critical point of $x \mapsto \tfrac{1}{2}\|y - x\|^2 + \varphi(x)$.

(Recall that a continuous linear operator $L : \mathcal{H} \to \mathcal{H}$ is symmetric if $\langle x, Ly\rangle = \langle Lx, y\rangle$ for each $x, y \in \mathcal{H}$. A symmetric continuous linear operator is positive semi-definite if $\langle x, Lx\rangle \ge 0$ for each $x \in \mathcal{H}$; this is denoted $L \succeq 0$. It is positive definite if $\langle x, Lx\rangle > 0$ for each nonzero $x \in \mathcal{H}$; this is denoted $L \succ 0$.)

Terminology. Proximity operators often appear in the context of penalized least squares regression, where $\varphi$ is called a penalty, and from now on we will adopt this terminology. In light of Corollary 1, a continuous proximity operator is exactly characterized as a gradient of a convex function $\psi$.
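The Jacobian test of Theorem 2 is easy to run numerically. The following sketch (our toy example, not the paper's) applies it to the smooth shrinkage $f(y) = (1 - e^{-\|y\|^2})\,y$ on $\mathbb{R}^2$, whose finite-difference Jacobian should be symmetric positive semi-definite:

```python
# Numerical check (our illustration, not the paper's example) of the
# Theorem 2 test: a C^1 map f is a proximity operator iff its Jacobian
# Df(y) is symmetric positive semi-definite at every y. We try the
# smooth shrinkage f(y) = (1 - exp(-|y|^2)) * y on R^2.
import math, random

def f(y):
    s = 1.0 - math.exp(-(y[0] ** 2 + y[1] ** 2))
    return [s * y[0], s * y[1]]

def jacobian(y, h=1e-6):
    """Central finite-difference Jacobian J[i][j] = d f_i / d y_j."""
    n = len(y)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        yp = list(y); yp[j] += h
        ym = list(y); ym[j] -= h
        fp, fm = f(yp), f(ym)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

random.seed(0)
for _ in range(100):
    y = [random.uniform(-2, 2), random.uniform(-2, 2)]
    J = jacobian(y)
    assert abs(J[0][1] - J[1][0]) < 1e-4           # symmetry
    v = [random.uniform(-1, 1), random.uniform(-1, 1)]
    quad = sum(v[i] * J[i][j] * v[j] for i in range(2) for j in range(2))
    assert quad > -1e-6                             # positive semi-definiteness
print("Df(y) symmetric PSD: f passes the proximity-operator test")
```

Indeed $Df(y) = (1 - e^{-\|y\|^2})\,I + 2e^{-\|y\|^2}\,yy^\top$, a sum of two positive semi-definite symmetric terms, so the test passes.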
In the terminology of physics, a proximity operator is thus a conservative field associated to a convex potential. In the language of convex analysis, subdifferentials of convex functions are characterized as maximal cyclically monotone operators [30, Theorem B].

1.3. Organization of the paper. The proofs of our most general results, Theorem 3 and Corollary 6 (and the fact that they imply Theorem 1, (1), Corollary 4 and Corollary 1), are established in Section 2, where we also discuss their consequences in terms of Bregman proximity operators and illustrate them on concrete examples. As Theorem 1 and its corollaries characterize whether a function $f$ is a proximity operator and study its smoothness in relation to that of the corresponding penalty and potential, they are particularly useful when $f$ is not explicitly built as a proximity operator. This is the case of so-called social shrinkage operators (see e.g. [22]). We conclude the paper by showing in Section 3 that social shrinkage operators are generally not the proximity operator of any penalty.

1.4. Discussion. In light of the extension to nonconvex penalties of Moreau's characterization of proximity operators of convex (l.s.c.) penalties (Proposition 1), the nonexpansivity of the proximity operator $f$ determines whether the underlying penalty $\varphi$ is convex or not. While nonexpansivity certainly plays a role in the convergence analysis of iterative proximal algorithms based on convex penalties, the adaptation of such an analysis when the proximity operator is Lipschitz rather than nonexpansive, using Proposition 1, is an interesting perspective.
The characterization of smooth proximity operators as the gradients of convex potentials, which also appear in optimal transport (see e.g. [36]), suggests that further work is needed to better understand the connections between these concepts and tools. This could possibly lead to simplified arguments where the strong machinery of convex analysis may be used more explicitly despite the apparent lack of convexity of the optimization problems associated to nonconvex penalties.

2. Main results

We now state our main results, Theorem 3 and Corollary 6, and prove a number of their consequences, including Theorem 1, (1), Corollary 4 and Corollary 1, which were advertized in Section 1. The most technical proofs are postponed to the Appendix.

2.1. Detailed notations. The indicator function of a set $S$ is denoted $\chi_S$, with $\chi_S(x) := 0$ if $x \in S$ and $\chi_S(x) := +\infty$ if $x \notin S$. The domain of a function $\theta : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is defined and denoted by $\mathrm{dom}(\theta) := \{x \in \mathcal{H} \mid \theta(x) < \infty\}$. Given $\mathcal{Y} \subset \mathcal{H}$ and a function $f : \mathcal{Y} \to \mathcal{H}$, the image of $\mathcal{Y}$ under $f$ is denoted by $\mathrm{Im}(f)$. A function $\theta : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is proper iff there is $x \in \mathcal{H}$ such that $\theta(x) < +\infty$, i.e., $\mathrm{dom}(\theta) \ne \emptyset$. It is lower semicontinuous (l.s.c.) if for each $x_0 \in \mathcal{H}$, $\liminf_{x \to x_0} \theta(x) \ge \theta(x_0)$, or equivalently if the set $\{x \in \mathcal{H} : \theta(x) > \alpha\}$ is open for every $\alpha \in \mathbb{R}$. A subgradient of a convex function $\theta : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ at $x$ is any $u \in \mathcal{H}$ such that $\theta(x') - \theta(x) \ge \langle u, x' - x\rangle$ for all $x' \in \mathcal{H}$. A function with $k$ continuous derivatives (see Appendix A.1 for some reminders on Fréchet derivatives in Hilbert spaces) is called a $C^k$ function. The notation $C^k(X)$ is used to specify a $C^k$ function on an open domain $X$. Thus $C^0$ is the space of continuous functions, whereas $C^1$ is the space of continuously differentiable functions [11, p. 327]. The gradient of a $C^1$ scalar function $\theta$ at $x$ is denoted $\nabla\theta(x)$.
The segment between two elements $x, x' \in \mathcal{H}$ is the set $[x, x'] := \{tx + (1-t)x' : t \in [0,1]\}$. A finite union of segments $[x_{i-1}, x_i]$, $1 \le i \le n$, $n \in \mathbb{N}$, where $x_0 = x$ and $x_n = x'$, is called a polygonal path between $x$ and $x'$. A non-empty subset $C \subset \mathcal{H}$ is polygonally connected iff between each pair $x, x' \in C$ there is a polygonal path with all its segments included in $C$, $[x_{i-1}, x_i] \subset C$.

Remark 3. The notion of polygonal connectedness is a bit stronger than that of connectedness. Indeed, polygonal connectedness implies the classical topological property of path-connectedness, which in turn implies connectedness. However, there are path-connected sets that are not polygonally connected – e.g., the unit circle in $\mathbb{R}^2$ is path-connected, but no two of its points are polygonally connected – and there are connected sets that are not path-connected. Yet, every open connected set is polygonally connected; see [16, Theorem 2.5.2] for a statement in $\mathbb{R}^n$.

2.2. Main theorem.

Theorem 3. Consider two Hilbert spaces $\mathcal{H}$ and $\mathcal{H}'$, and a non-empty set $\mathcal{Y} \subset \mathcal{H}'$. Let $a : \mathcal{Y} \to \mathbb{R} \cup \{+\infty\}$, $b : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$, $A : \mathcal{Y} \to \mathcal{H}$ and $B : \mathcal{H} \to \mathcal{H}'$ be arbitrary functions. Consider $f : \mathcal{Y} \to \mathcal{H}$ and denote by $\mathrm{Im}(f)$ the image of $\mathcal{Y}$ under $f$. (For the sake of simplicity we use the same notation $\langle\cdot,\cdot\rangle$ for the inner products $\langle x, A(y)\rangle$, between elements of $\mathcal{H}$, and $\langle B(x), y\rangle$, between elements of $\mathcal{H}'$. The reader can inspect the proof of Theorem 3 to check that the result still holds if we consider Banach spaces $\mathcal{H}$ and $\mathcal{H}'$, their duals $\mathcal{H}^\star$ and $(\mathcal{H}')^\star$, and $A : \mathcal{Y} \to \mathcal{H}^\star$, $B : \mathcal{H} \to (\mathcal{H}')^\star$.)
(a) Let $D(x,y) := a(y) - \langle x, A(y)\rangle + b(x)$. The following properties are equivalent:
(i) there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f(y) \in \arg\min_{x \in \mathcal{H}} \{D(x,y) + \varphi(x)\}$ for each $y \in \mathcal{Y}$;
(ii) there is a convex l.s.c. $g : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $A(f^{-1}(x)) \subset \partial g(x)$ for each $x \in \mathrm{Im}(f)$.
When they hold, $\varphi$ (resp. $g$) can be chosen given $g$ (resp. $\varphi$) so that $g(x) + \chi_{\mathrm{Im}(f)}(x) = b(x) + \varphi(x)$.
(b) Let $\varphi$ and $g$ satisfy (a)(i) and (a)(ii), respectively, and let $C \subset \mathrm{Im}(f)$ be polygonally connected. Then there is $K \in \mathbb{R}$ such that
(2) $g(x) = b(x) + \varphi(x) + K, \quad \forall x \in C.$
(c) Let $\widetilde{D}(x,y) := a(y) - \langle B(x), y\rangle + b(x)$. The following properties are equivalent:
(i) there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f(y) \in \arg\min_{x \in \mathcal{H}} \{\widetilde{D}(x,y) + \varphi(x)\}$ for each $y \in \mathcal{Y}$;
(ii) there is a convex l.s.c. $\psi : \mathcal{H}' \to \mathbb{R} \cup \{+\infty\}$ such that $B(f(y)) \in \partial\psi(y)$ for each $y \in \mathcal{Y}$.
When they hold, $\varphi$ (resp. $\psi$) can be chosen given $\psi$ (resp. $\varphi$) so that $\psi(y) = \langle B(f(y)), y\rangle - b(f(y)) - \varphi(f(y))$ on $\mathcal{Y}$.
(d) Let $\varphi$ and $\psi$ satisfy (c)(i) and (c)(ii), respectively, and let $C' \subset \mathcal{Y}$ be polygonally connected. Then there is $K' \in \mathbb{R}$ such that
(3) $\psi(y) = \langle B(f(y)), y\rangle - b(f(y)) - \varphi(f(y)) + K', \quad \forall y \in C'.$

The proof of Theorem 3 is postponed to Appendix A.4. As stated in (a) (resp. (c)), the functions can be chosen such that the relation (2) (resp. (3)) holds on $\mathrm{Im}(f)$ (resp. on $\mathcal{Y}$) with $K = K' = 0$. As the functions $\varphi, g, \psi$ are at best defined up to an additive constant, we provide in (b) (resp. (d)) conditions ensuring that adding a constant is indeed the unique degree of freedom. The role of polygonal connectedness will be illustrated on examples in Section 2.7.

Example 1. In the context of linear inverse problems one often encounters optimization problems involving functions expressed as $\tfrac{1}{2}\|y - Mx\|^2 + \varphi(x)$ with $M$ some linear operator.
Such functions fit into the framework of Theorem 3 using $a(y) := \tfrac{1}{2}\|y\|^2$, $b(x) := \tfrac{1}{2}\|Mx\|^2$, $A(y) := M^\star y$, and $B(x) := Mx$, where $M^\star$ is the adjoint of $M$. Among other consequences, one gets that $f : \mathcal{Y} \to \mathcal{H}$ is a generalized proximity operator of this type for some penalty $\varphi$ if, and only if, there is a convex l.s.c. $\psi$ such that $Mf(y) \in \partial\psi(y)$ for each $y \in \mathcal{Y}$. Examples where the data-fidelity term is a so-called Bregman divergence are detailed in Section 2.4 below. This covers the case of standard proximity operators, where $D(x,y) = \tfrac{1}{2}\|y - x\|^2$.

2.3. Convexity in proximity operators of nonconvex penalties. An interesting consequence of Theorem 3 is that the optimization problem associated to (generalized) proximity operators is in a sense always convex, even when the considered penalty $\varphi$ is not convex.

Corollary 4. Consider two Hilbert spaces $\mathcal{H}$, $\mathcal{H}'$. Let $\mathcal{Y} \subset \mathcal{H}'$ be non-empty and $f : \mathcal{Y} \to \mathcal{H}$. Assume that there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f(y) \in \arg\min_{x \in \mathcal{H}} \{D(x,y) + \varphi(x)\}$ for each $y \in \mathcal{Y}$, with $D(x,y) = a(y) - \langle x, A(y)\rangle + b(x)$ as in Theorem 3(a). Then
(a) the function $x \mapsto b(x) + \varphi(x)$ is convex on each convex subset $C \subset \mathrm{Im}(f)$;
(b) if $\mathrm{Im}(f)$ is convex, then the function $x \in \mathrm{Im}(f) \mapsto D(x,y) + \varphi(x)$ is convex, $\forall y \in \mathcal{Y}$.
Similarly, if there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f(y) \in \arg\min_{x \in \mathcal{H}} \{\widetilde{D}(x,y) + \varphi(x)\}$ for each $y \in \mathcal{Y}$, with $\widetilde{D}(x,y) = a(y) - \langle B(x), y\rangle + b(x)$ as in Theorem 3(c), then $y \mapsto \langle B(f(y)), y\rangle - b(f(y)) - \varphi(f(y))$ is convex on each convex subset $C' \subset \mathcal{Y}$.

Proof. (a) follows from Theorem 3(a)-(b). (b) follows from (a) and the definition of $D$. The proof of the result with $\widetilde{D}$ instead of $D$ is similar. □
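Corollary 4 can be made concrete with a scalar toy example (ours, not from the paper): the concave penalty $\varphi(x) = -x^2/4$ is nonconvex, its prox works out to $f(y) = 2y$ so that $\mathrm{Im}(f) = \mathbb{R}$ is convex, and the objective $\tfrac12(y-x)^2 + \varphi(x) = x^2/4 - xy + y^2/2$ is indeed convex in $x$:

```python
# Toy illustration (ours) of Corollary 4: the concave (hence nonconvex)
# penalty phi(x) = -x^2/4 on R has prox f(y) = 2y, whose image is all of
# R (a convex set), so x -> (y - x)^2/2 + phi(x) must be convex.
import random

phi = lambda x: -0.25 * x * x
obj = lambda x, y: 0.5 * (y - x) ** 2 + phi(x)

def prox_bruteforce(y, lo=-20.0, hi=20.0, steps=400001):
    """Grid search confirming that f(y) = 2y minimizes the objective."""
    best = min(range(steps), key=lambda i: obj(lo + (hi - lo) * i / (steps - 1), y))
    return lo + (hi - lo) * best / (steps - 1)

for y in (-3.0, 0.5, 2.0):
    assert abs(prox_bruteforce(y) - 2 * y) < 1e-3

# Midpoint convexity of x -> obj(x, y), despite the nonconvex penalty.
random.seed(1)
for _ in range(1000):
    y, x1, x2 = (random.uniform(-5, 5) for _ in range(3))
    mid = obj(0.5 * (x1 + x2), y)
    assert mid <= 0.5 * obj(x1, y) + 0.5 * obj(x2, y) + 1e-12
print("nonconvex penalty, yet the prox objective is convex (Corollary 4)")
```

Note that $f(y) = 2y$ is $2$-Lipschitz but not nonexpansive, consistent with the fact that nonexpansiveness in Proposition 1 is exactly what convexity of the penalty buys.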
Corollary 4(b) might seem surprising as, given a nonconvex penalty $\varphi$, one may expect the optimization problem $\min_x D(x,y) + \varphi(x)$ to be nonconvex. However, as noticed e.g. by [27, 28, 31], there are nonconvex penalties such that this problem with $D(x,y) := \tfrac{1}{2}\|y - x\|^2$ is in fact convex. Corollary 4 establishes that this convexity property indeed holds whenever the image $\mathrm{Im}(f)$ of the resulting function $f$ is a convex set. A particular case is that of functions $f$ built as conditional expectations in the context of additive Gaussian denoising, which have been shown [17] to be proximity operators. Extensions of this phenomenon for conditional mean estimation with other noise models are discussed in the companion paper [19].

2.4. Application to Bregman proximity operators. The squared Euclidean norm is a particular Bregman divergence, and Theorem 3 characterizes generalized proximity operators defined with such divergences. The Bregman divergence, also known as a D-function, was introduced in [9] for strictly convex differentiable functions on so-called linear topological spaces. For the goals of our study, it will be enough to consider that $h : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and differentiable on a Hilbert space.

Definition 2. Let $h : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ be proper, convex and differentiable on its open domain $\mathrm{dom}(h)$.
The Bregman divergence (associated with $h$) between $x$ and $y$ is defined by
(4) $D_h : \mathcal{H} \times \mathcal{H} \to [0, +\infty]$, $D_h(x,y) := h(x) - h(y) - \langle\nabla h(y), x - y\rangle$ if $y \in \mathrm{dom}(h)$, and $D_h(x,y) := +\infty$ otherwise.

In Theorem 3(a) one obtains $D(x,y) = D_h(x,y)$ by setting $a(y) = +\infty$ and $A(y)$ arbitrary if $y \notin \mathrm{dom}(h)$ and, for $y \in \mathrm{dom}(h)$ and each $x \in \mathcal{H}$,
(5) $a(y) := \langle\nabla h(y), y\rangle - h(y), \quad b(x) := h(x), \quad A(y) := \nabla h(y).$
The lack of symmetry of the Bregman divergence suggests to consider also $D_h(y,x)$. In Theorem 3(c) one obtains $\widetilde{D}(x,y) = D_h(y,x)$ using $b(x) = +\infty$ and $B(x)$ arbitrary for $x \notin \mathrm{dom}(h)$ and, for $x \in \mathrm{dom}(h)$ and each $y \in \mathcal{H}$,
(6) $a(y) := h(y), \quad b(x) := \langle\nabla h(x), x\rangle - h(x), \quad B(x) := \nabla h(x).$
The next claim is an application of Theorem 3 with $D(x,y) = D_h(x,y)$ and $\widetilde{D}(x,y) = D_h(y,x)$. We thus consider the so-called Bregman proximity operators, which were introduced in [12]. We will focus on the characterization of these operators, defined by $y \mapsto \arg\min_{x \in \mathcal{H}} \{D_h(x,y) + \varphi(x)\}$ and $y \mapsto \arg\min_{x \in \mathcal{H}} \{D_h(y,x) + \varphi(x)\}$. Such operators have been further studied in [5] with an emphasis on the notion of viability, which is essential for these operators to be useful in the context of iterative algorithms.

Corollary 5. Consider $f : \mathcal{Y} \to \mathcal{H}$. Let $h : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ be a proper convex function that is differentiable on its open domain $\mathrm{dom}(h)$, and let $D_h$ be as in (4).
(a) The following properties are equivalent:
(i) there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f(y) \in \arg\min_{x \in \mathcal{H}} \{D_h(x,y) + \varphi(x)\}$, $\forall y \in \mathcal{Y}$;
(ii) there is a convex l.s.c. $g : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ s.t. $\nabla h(f^{-1}(x)) \subset \partial g(x)$, $\forall x \in \mathrm{Im}(f)$.
When they hold, $\varphi$ (resp. $g$) can be chosen given $g$ (resp. $\varphi$) so that $g(x) + \chi_{\mathrm{Im}(f)}(x) = h(x) + \varphi(x)$.
(b) Let $\varphi$ and $g$ satisfy (a)(i) and (a)(ii), respectively, and let $C \subset \mathrm{Im}(f)$ be polygonally connected. Then there is $K \in \mathbb{R}$ such that $g(x) = h(x) + \varphi(x) + K$, $\forall x \in C$.
(c) The following properties are equivalent:
(i) there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f(y) \in \arg\min_{x \in \mathcal{H}} \{D_h(y,x) + \varphi(x)\}$, $\forall y \in \mathcal{Y}$;
(ii) there is a convex l.s.c. $\psi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $\nabla h(f(y)) \in \partial\psi(y)$, $\forall y \in \mathcal{Y}$.
$\varphi$ can be chosen given $\psi$ (resp. $\psi$ given $\varphi$) s.t. $\psi(y) = \langle\nabla h(f(y)), y - f(y)\rangle + h(f(y)) - \varphi(f(y))$, $\forall y \in \mathcal{Y}$.
(d) Let $\varphi$ and $\psi$ satisfy (c)(i) and (c)(ii), respectively, and let $C' \subset \mathcal{Y}$ be polygonally connected. Then there is $K' \in \mathbb{R}$ such that $\psi(y) = \langle\nabla h(f(y)), y - f(y)\rangle + h(f(y)) - \varphi(f(y)) + K'$, $\forall y \in C'$.

Proof. (a) and (b) use (5). Further, (c) and (d) use (6). □

2.5. Specialization to (standard) proximity operators. Standard (Hilbert space) proximity operators correspond to taking as the Bregman divergence $D_h(x,y) = \tfrac{1}{2}\|y - x\|^2$, which is associated to $h(x) := \tfrac{1}{2}\|x\|^2$. An immediate consequence of Corollary 5 is the following theorem, which implies Theorem 1 and (1).

Theorem 4. Let $\mathcal{Y} \subset \mathcal{H}$ be non-empty, and $f : \mathcal{Y} \to \mathcal{H}$.
(a) The following properties are equivalent:
(i) there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f(y) \in \mathrm{prox}_\varphi(y)$ for each $y \in \mathcal{Y}$;
(ii) there is a convex l.s.c. $g : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f^{-1}(x) \subset \partial g(x)$ for each $x \in \mathrm{Im}(f)$;
(iii) there is a convex l.s.c. $\psi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ such that $f(y) \in \partial\psi(y)$ for each $y \in \mathcal{Y}$.
When they hold, there exists a choice of $\varphi, g, \psi$ satisfying (a)(i)-(a)(ii)-(a)(iii) such that
$g(x) + \chi_{\mathrm{Im}(f)}(x) = \tfrac{1}{2}\|x\|^2 + \varphi(x), \quad \forall x \in \mathcal{H};$
$\psi(y) = \langle y, f(y)\rangle - \tfrac{1}{2}\|f(y)\|^2 - \varphi(f(y)), \quad \forall y \in \mathcal{Y}.$
(b) Let $\varphi$, $g$ and $\psi$ satisfy (a)(i), (a)(ii) and (a)(iii), respectively.
Let $C \subset \mathrm{Im}(f)$ and $C' \subset \mathcal{Y}$ be polygonally connected. Then there exist $K, K' \in \mathbb{R}$ such that
(7) $g(x) = \tfrac{1}{2}\|x\|^2 + \varphi(x) + K, \quad \forall x \in C;$
(8) $\psi(y) = \langle y, f(y)\rangle - \tfrac{1}{2}\|f(y)\|^2 - \varphi(f(y)) + K', \quad \forall y \in C'.$

2.6. Local smoothness of proximity operators. Theorem 4 characterizes proximity operators in terms of three functions: a (possibly nonconvex) penalty $\varphi$, a convex potential $\psi$, and another convex function $g$. As we now show, the properties of these functions are tightly inter-related. First we extend Moreau's characterization (Proposition 1) as follows:

Proposition 2. Consider $f : \mathcal{H} \to \mathcal{H}$ defined everywhere, and $L > 0$. The following are equivalent:
(1) there is $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ s.t. $f(y) \in \mathrm{prox}_\varphi(y)$ on $\mathcal{H}$, and $x \mapsto \varphi(x) + \left(1 - \tfrac{1}{L}\right)\tfrac{\|x\|^2}{2}$ is convex l.s.c.;
(2) the following conditions hold jointly:
(a) there exists a (convex l.s.c.) function $\psi$ such that for each $y \in \mathcal{H}$, $f(y) \in \partial\psi(y)$;
(b) $f$ is $L$-Lipschitz, i.e., $\|f(y) - f(y')\| \le L\|y - y'\|$, $\forall y, y' \in \mathcal{H}$.

Proof. (1) ⇒ (2a). Simply observe that $f$ is a proximity operator and use Theorem 4(a)(i) ⇒ (a)(iii).
(1) ⇒ (2b). The function $\tilde{\varphi}(z) := \tfrac{1}{L}\big(\varphi(Lz) + (1 - \tfrac{1}{L})\tfrac{\|Lz\|^2}{2}\big)$ is convex l.s.c. by assumption. We prove below that $\tilde{f} := f/L$ is a proximity operator of $\tilde{\varphi}$. By Proposition 1, $\tilde{f}$ is thus nonexpansive, i.e., $f$ is $L$-Lipschitz. To show $\tilde{f}(y) \in \mathrm{prox}_{\tilde{\varphi}}(y)$ for each $y \in \mathcal{H}$, observe that $\varphi(x) = L\tilde{\varphi}(x/L) - (1 - \tfrac{1}{L})\tfrac{\|x\|^2}{2}$. For each $x \in \mathcal{H}$, with $z = x/L$,
$$\tfrac{1}{2}\|y - x\|^2 + \varphi(x) = \tfrac{\|y\|^2}{2} - \langle y, x\rangle + \tfrac{\|x\|^2}{2} + L\tilde{\varphi}(x/L) - \left(1 - \tfrac{1}{L}\right)\tfrac{\|x\|^2}{2} = \tfrac{\|y\|^2}{2} - \langle y, x\rangle + \tfrac{\|x\|^2}{2L} + L\tilde{\varphi}(x/L)$$
$$= \tfrac{\|y\|^2}{2} - L\langle y, z\rangle + \tfrac{L\|z\|^2}{2} + L\tilde{\varphi}(z) = (1 - L)\tfrac{\|y\|^2}{2} + L\left(\tfrac{1}{2}\|y - z\|^2 + \tilde{\varphi}(z)\right).$$
Since $x = f(y)$ is a minimizer of the left-hand side, $z = f(y)/L = \tilde{f}(y)$ is a minimizer of the right-hand side, hence $\tilde{f}$ is a proximity operator of $\tilde{\varphi}$ as claimed.
(2a) and (2b) ⇒ (1). By (2a), the function $\tilde{\psi}(y) := \psi(y)/L$ is convex l.s.c. and $f(y)/L \in \partial\tilde{\psi}(y)$. By Theorem 4(a)(iii) ⇒ (a)(i), $\tilde{f} := f/L$ is therefore a proximity operator. Since $f$ is $L$-Lipschitz, $\tilde{f}$ is nonexpansive, hence by Proposition 1 $\tilde{f}$ is a proximity operator of some convex l.s.c. penalty $\tilde{\varphi}$. The function $\varphi(x) := L\tilde{\varphi}(x/L) - (1 - \tfrac{1}{L})\tfrac{\|x\|^2}{2}$ is such that $\varphi(x) + (1 - \tfrac{1}{L})\tfrac{\|x\|^2}{2} = L\tilde{\varphi}(x/L)$ is convex l.s.c. as claimed. By the same argument as above, as $z = \tilde{f}(y)$ is a minimizer of $\tfrac{1}{2}\|y - z\|^2 + \tilde{\varphi}(z)$, $x = Lz = f(y)$ is a minimizer of $\tfrac{1}{2}\|y - x\|^2 + \varphi(x)$, showing that $f$ is indeed a proximity operator of $\varphi$. □

Next we consider additional properties of these functions.

Corollary 6. Let $\mathcal{Y} \subset \mathcal{H}$ and $f : \mathcal{Y} \to \mathcal{H}$. Consider three functions $\varphi$, $g$, $\psi$ on $\mathcal{H}$ satisfying the equivalent properties (a)(i), (a)(ii) and (a)(iii) of Theorem 4, respectively. Let $k \ge 0$ be an integer.
(a) Consider an open set $V \subset \mathcal{Y}$. The following two properties are equivalent:
(i) $\psi$ is $C^{k+1}(V)$;
(ii) $f$ is $C^k(V)$.
When one of them holds, we have $f(y) = \nabla\psi(y)$, $\forall y \in V$.
(b) Consider an open set $X \subset \mathrm{Im}(f)$. The following three properties are equivalent:
(i) $\varphi$ is $C^{k+1}(X)$;
(ii) $g$ is $C^{k+1}(X)$;
(iii) the restriction $\tilde{f}$ of $f$ to the set $f^{-1}(X)$ is injective and $(\tilde{f})^{-1}$ is $C^k(X)$.
When one of them holds, $\tilde{f}$ is a bijection between $f^{-1}(X)$ and $X$, and we have $(\tilde{f})^{-1}(x) = \nabla g(x) = x + \nabla\varphi(x)$, $\forall x \in X$.

Before proving this corollary, let us first mention that the characterization of any continuous proximity operator $f$ as the gradient of a $C^1$ convex potential $\psi$, i.e., $f = \nabla\psi$, is a direct consequence of Corollary 6(a) and Theorem 1.
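The inversion formula $(\tilde{f})^{-1}(x) = x + \nabla\varphi(x)$ of Corollary 6(b) can be checked on soft-thresholding. In this sketch (ours, not from the paper), $\varphi = |\cdot|$ gives $\nabla\varphi(x) = \mathrm{sign}(x)$ for $x \ne 0$, so $x + \mathrm{sign}(x)$ should invert $f$ on the region $|y| > 1$ where $f$ is injective:

```python
# A quick numerical illustration (ours) of the identity
# (f~)^{-1}(x) = x + grad phi(x) in Corollary 6(b), for soft-thresholding
# f = prox of phi = |.|. Away from 0 we have grad phi(x) = sign(x),
# so x + sign(x) should invert f wherever f is injective (|y| > 1).

def soft(y):
    return y * max(1.0 - 1.0 / abs(y), 0.0) if y != 0 else 0.0

def inv(x):
    """Candidate inverse x + grad phi(x) = x + sign(x), for x != 0."""
    return x + (1.0 if x > 0 else -1.0)

for x in (-4.0, -0.7, 0.2, 3.5):
    assert abs(soft(inv(x)) - x) < 1e-12   # f(inv(x)) == x away from 0
print("x + sign(x) inverts soft-thresholding away from 0 (Corollary 6(b))")
```

At $x = 0$ the penalty is not differentiable and $f^{-1}(0) = [-1, 1]$ is set-valued, which is why the corollary restricts to open sets $X$ where $\varphi$ is $C^{k+1}$.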
This establishes Corollary 1 from Section 1.

The proof of Corollary 6 relies on the following technical lemma, which we prove in Appendix A.5 as a consequence of [6, Prop 17.41].

Lemma 1. Consider a function ρ : H → H, a function θ : H → R ∪ {+∞} and an open set X ⊂ dom(ρ) ∩ dom(θ) ⊂ H. Assume that θ is subdifferentiable at each x ∈ X and that

(9) ρ(x) ∈ ∂θ(x), ∀x ∈ X.

Then the following statements are equivalent:
(a) ρ is continuous on X;
(b) θ is continuously differentiable on X, i.e., its gradient ∇θ(x) is continuous on X.
When one of the statements holds, {ρ(x)} = {∇θ(x)} = ∂θ(x) for each x ∈ X.

Proof of Corollary 6. (ai) ⇔ (aii). By assumption ψ satisfies Theorem 4(cii), i.e., f(y) ∈ ∂ψ(y), ∀y ∈ V. By Lemma 1 with ρ := f and the convex function θ := ψ, f is C⁰(V) if and only if ψ is C¹(V), and when one of these holds, f = ∇ψ on V. This proves the result for k = 0. The extension to k ≥ 1 is immediate.

(bi) ⇔ (bii). Consider x ∈ X. As X is open, there is an open ball B_x such that x ∈ B_x ⊂ X. Noticing that B_x is polygonally connected, by Theorem 4(b) there is K ∈ R such that g(x′) = ½‖x′‖² + ϕ(x′) + K for each x′ ∈ B_x. Hence g is C^{k+1}(B_x) if and only if ϕ is C^{k+1}(B_x), and ∇g(x′) = x′ + ∇ϕ(x′) on B_x. As this holds for each x ∈ X, the equivalence holds on X.

(bii) ⇒ (biii). By (bii), g is C^{k+1}(X), hence ∂g(x) = {∇g(x)} for each x ∈ X. By Theorem 4(aii), f⁻¹(x) ⊂ ∂g(x) for each x ∈ Im(f). Combining both facts yields

(10) y = ∇g(f(y)), ∀y ∈ f⁻¹(X).

Consider y, y′ ∈ f⁻¹(X) such that f(y) = f(y′). Then y = ∇g(f(y)) = ∇g(f(y′)) = y′, which shows that f is injective on f⁻¹(X).
Consequently, f̃ is a bijection between f⁻¹(X) and X, hence the inverse function (f̃)⁻¹ is well defined. Inserting y = (f̃)⁻¹(x) into (10) yields (f̃)⁻¹(x) = ∇g(x) for each x ∈ X. Then, since g is C^{k+1}(X), it follows that (f̃)⁻¹ is C^k(X).

(biii) ⇒ (bii). Consider x ∈ X. As f̃ is injective on f⁻¹(X) by (biii), there is a unique y ∈ f⁻¹(X) such that x = f(y). Using that f⁻¹(x) ⊂ ∂g(x) by Theorem 4(aii) shows that (f̃)⁻¹(x) = y ∈ ∂g(x). Since (f̃)⁻¹ is C^k(X), using Lemma 1 with ρ := (f̃)⁻¹ and θ := g proves that (f̃)⁻¹(x) = ∇g(x), ∀x ∈ X. Since (f̃)⁻¹ is C^k(X), it follows that g is C^{k+1}(X). □

2.7. Illustration using classical examples. Theorem 1 and its corollaries characterize whether a function f is a proximity operator. This is particularly useful when f is not explicitly built as a proximity operator. We illustrate this with a few examples. We begin with H = R, where proximity operators happen to have a particularly simple characterization.

Corollary 7. Let Y ⊂ R be non-empty. A function f : Y → R is the proximity operator of some penalty ϕ if, and only if, f is nondecreasing.

Proof. By Theorem 1 we just need to prove that a scalar function f : Y → R satisfies f(y) ∈ ∂ψ(y) on Y for some convex function ψ if, and only if, f is nondecreasing. When f is continuous and Y is an open interval, a primitive ψ of f is indeed convex if, and only if, ψ′ = f is nondecreasing [6, Proposition 17.7]. We now prove the result for more general Y and f. First, if f(y) ∈ ∂ψ(y) for each y ∈ Y where ψ : R → R ∪ {+∞} is convex, then by [21, Theorem 4.2.1(i)] f is nondecreasing. To prove the converse, define a := inf{y : y ∈ Y}, I := (a, ∞) if a ∉ Y (resp.
I := [a, ∞) if a ∈ Y), and set f̄(x) := sup_{y∈Y, y≤x} f(y) ∈ R ∪ {+∞} for each x ∈ I, and f̄(x) := +∞ for x ∉ I. By construction f̄ is nondecreasing. If f is nondecreasing on Y then f̄(y) = f(y) for each y ∈ Y, hence Y ⊂ dom(f̄) ⊂ I and dom(f̄) is an interval. Choose an arbitrary b ∈ Y. As f̄ is monotone it is integrable on each bounded interval, so one can define ψ(x) := ∫_b^x f̄(t) dt for each x ∈ dom(f̄) (with the usual convention that ∫_b^x = −∫_x^b if x < b) and ψ(x) := +∞ for x ∉ dom(f̄). Consider x ∈ dom(f̄). Since f̄ is nondecreasing, for h ≥ 0 such that x + h ∈ dom(f̄) we have ψ(x + h) − ψ(x) = ∫_x^{x+h} f̄(t) dt ≥ f̄(x)h; similarly, for h ≥ 0 such that x − h ∈ dom(f̄) we have ψ(x) − ψ(x − h) = ∫_{x−h}^x f̄(t) dt ≤ f̄(x)h, hence ψ(x − h) − ψ(x) ≥ f̄(x)(−h). Combining both results shows ψ(y) − ψ(x) ≥ f̄(x)(y − x) for each x, y ∈ dom(f̄). This establishes that f̄(x) ∈ ∂ψ(x) for each x ∈ dom(f̄), hence that ψ is convex on its domain dom(ψ) = dom(f̄). To conclude, simply observe that for y ∈ Y ⊂ dom(f̄) we have f(y) = f̄(y) ∈ ∂ψ(y). □

Example 2 (Quantization). In Y = [0, 1) ⊂ R = H, consider 0 = x₀ < x₁ < … < x_{q−1} < x_q = 1 and v₀ ≤ … ≤ v_{q−1}. Let f be the quantization-like function such that f(x) = v_i if and only if x ∈ [x_i, x_{i+1}), for 0 ≤ i < q. Quantization traditionally corresponds to the case where q ≥ 2 and, for each 0 ≤ i < q − 1, x_{i+1} is the middle point between v_i and v_{i+1}. Since f is nondecreasing, f is the proximity operator of a function ϕ. The image of f is the discrete set of points {v₀, …, v_{q−1}}.

Let us give another example to illustrate the role of the connectedness of the sets C, C′ in Theorem 4.

Example 3.
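The quantization example can be made concrete. The sketch below is our own construction (the levels, boundaries and the resulting penalty values are illustrative choices, not from the paper): it builds a penalty ϕ supported on the levels v_i whose proximity operator reproduces a given quantizer, by solving for offsets ϕ_i that place each decision boundary of the penalized least squares problem at x_{i+1}.

```python
import numpy as np

# Hypothetical instance of Example 2 (0-indexed): levels v_i and boundaries x_i.
v = np.array([0.0, 0.4, 0.8])           # quantization levels v_0 <= v_1 <= v_2
x = np.array([0.0, 1/3, 2/3, 1.0])      # decision boundaries x_0 < ... < x_q

def quantize(y):
    """f(y) = v_i for y in [x_i, x_{i+1})."""
    i = np.searchsorted(x, y, side="right") - 1
    return v[i]

# The boundary between levels v_i, v_{i+1} of min_x 0.5*(y-x)^2 + phi(x) is
# (v_i + v_{i+1})/2 + (phi_{i+1} - phi_i)/(v_{i+1} - v_i); we solve for phi
# so that this boundary sits at x_{i+1}.
phi = np.zeros(len(v))
for i in range(len(v) - 1):
    phi[i + 1] = phi[i] + (x[i + 1] - (v[i] + v[i + 1]) / 2) * (v[i + 1] - v[i])

def prox(y):
    """argmin over the discrete support {v_i} of 0.5*(y - v_i)^2 + phi_i."""
    return v[np.argmin(0.5 * (y - v) ** 2 + phi)]

# f coincides with prox_phi on a fine grid of Y = [0, 1)
grid = np.linspace(0.0, 0.999, 500)
assert all(quantize(y) == prox(y) for y in grid)
```

When the x_{i+1} are the midpoints between consecutive levels, all the offsets ϕ_i coincide, recovering nearest-neighbor quantization as the proximity operator of a constant penalty on the levels.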
Consider the identity function f : y ↦ y on a subset Y ⊂ R = H. Since f is increasing, it is a proximity operator by Corollary 7. Particular functions satisfying the equivalent properties (ai), (aii) and (aiii) of Theorem 4 are ϕ₀ : x ↦ 0, g₀ : x ↦ x²/2 and ψ₀ : y ↦ y²/2. They further satisfy (7) (resp. (8)) with K = K′ = 0 on R. When Y ⊂ R is polygonally connected, Im(f) is also polygonally connected by the continuity of f, and Theorem 4 implies that ϕ₀, g₀, ψ₀ are, up to global additive constants K, K′, the only functions satisfying (ai), (aii) and (aiii). Now, consider as a particular example of disconnected set Y = (−∞, 0) ∪ (1, +∞). We exhibit two other functions g, ψ such that ϕ₀, g, ψ also satisfy (ai), (aii) and (aiii), but (7) fails on the disconnected set C := Im(f) = Y (resp. (8) fails on the disconnected set C′ := Y). Intuitively, what happens is that the presence of a "hole" (the interval [0, 1]) in Y gives some freedom in designing separately the components of these functions on each connected component. For this, consider H : [0, 1] → [0, 1] any continuous increasing function such that H(0) = 0, H(1) = 1 and C := ∫₀¹ H(t) dt ≠ 1/2. Observe that the function

h(x) := x²/2 for x < 0; ∫₀ˣ H(t) dt for 0 ≤ x ≤ 1; ∫₀¹ H(t) dt + (x² − 1)/2 for x > 1,

is convex and satisfies ∂h(x) = {h′(x)} = {x} for each x ∈ Y. As a result the functions g := h and ψ := h also satisfy properties (aii) and (aiii) of Theorem 4. Yet on the interval (−∞, 0) we have g(x) = g₀(x) = ϕ₀(x) + x²/2 + K₀ with K₀ = 0, while on the interval (1, +∞) we have g(x) = g₀(x) + C − 1/2 = ϕ₀(x) + x²/2 + K₁ with K₁ = C − 1/2 ≠ 0 = K₀. Similarly, ψ(x) − ψ₀(x) is not constant on Y. This shows that (7) (resp.
(8)) fails to hold on C := Im(f) (resp. C′ := Y).

Consider now functions f : Rⁿ → Rⁿ given by f(y) = (f_i(y))ⁿ_{i=1}. When each f_i can be written as f_i(y) = h_i(y_i), the function is said to be separable. If each h_i is a scalar proximity operator then the function f is also a proximity operator, and vice versa. This can be seen, e.g., by writing h_i = prox_{ϕ_i} and f = prox_ϕ with ϕ(x) := Σⁿ_{i=1} ϕ_i(x_i). All examples below hold for the components of separable functions.

As recalled in Proposition 1, it is known [13, Proposition 2.4] that a function f : R → R is the proximity operator of a convex l.s.c. penalty ϕ if, and only if, f is nondecreasing and nonexpansive: |f(y) − f(y′)| ≤ |y − y′| for each y, y′ ∈ R. A particular example is that of scalar thresholding rules, which are known [2, Proposition 3.2] to be the proximity operator of a (continuous positive) penalty function. As we will see in Section 3, Theorem 1 also allows one to characterize whether certain block-thresholding rules [20, 10, 22] are proximity operators.

Our next example illustrates the functions appearing in Theorem 1 on the classical hard-thresholding operator, which is the proximity operator of a nonconvex function.

Example 4 (Hard-thresholding). In Y = H = R, consider λ > 0 and the weighted ℓ₀ penalty ϕ(x) := 0 if x = 0, and ϕ(x) := λ otherwise. Its (set-valued) proximity operator is

prox_ϕ(y) = {0} if |y| < √(2λ); {0, √(2λ)} if y = √(2λ); {−√(2λ), 0} if y = −√(2λ); {y} if |y| > √(2λ),

which is discontinuous. Choosing ±√(2λ) as the value at y = ±√(2λ) yields a function f(y) ∈ prox_ϕ(y) with disconnected (hence nonconvex) range Im(f) = (−∞, −√(2λ)] ∪ {0} ∪ [√(2λ), +∞):

f(y) := 0 if |y| < √(2λ), and f(y) := y if |y| ≥ √(2λ).

Since Y is convex, the potential ψ is characterized by (1).
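The displayed set-valued formula can be confirmed by brute force. The following sketch is our own check (λ = 2 and the grid are illustrative choices): it minimizes ½(y − x)² + ϕ(x) over a fine grid augmented with the two candidate minimizers {0, y} and compares with the selection f.

```python
import numpy as np

lam = 2.0                      # our choice of lambda for the illustration
t = np.sqrt(2 * lam)           # threshold sqrt(2*lambda) = 2

def phi(x):                    # weighted l0 penalty: 0 at x = 0, lam elsewhere
    return np.where(x == 0.0, 0.0, lam)

def hard(y):                   # the selection f of prox_phi from Example 4
    return y if abs(y) >= t else 0.0

def prox_num(y):
    """Brute-force the penalized least squares problem over a grid plus {0, y}."""
    cand = np.concatenate([np.linspace(-6, 6, 24001), [0.0, y]])
    obj = 0.5 * (y - cand) ** 2 + phi(cand)
    return cand[np.argmin(obj)]

for y in [-3.5, -1.0, 0.0, 0.3, 1.9, 2.1, 4.0]:
    assert abs(prox_num(y) - hard(y)) < 1e-9
```

The check reflects the underlying computation: over x ≠ 0 the objective is minimized at x = y with value λ, versus value y²/2 at x = 0, so the minimizer switches exactly at |y| = √(2λ).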
For K := 0 we get

ψ(y) = y f(y) − ½ f(y)² − ϕ(f(y)) = 0 if |y| < √(2λ), and y²/2 − λ otherwise; that is, ψ(y) = max(y²/2 − λ, 0).

This is indeed a convex potential, and f(y) ∈ ∂ψ(y) for each y ∈ R.

Our last example of this section is a scaled version of soft-thresholding: it is still a proximity operator, but for C > 1 the corresponding penalty is nonconvex, and is even unbounded from below.

Example 5 (Scaled soft-thresholding). In Y = H = R, consider

f(y) := 0 if |y| < 1; C(y − 1) if y ≥ 1; C(y + 1) if y ≤ −1; equivalently f(y) = Cy max(1 − 1/|y|, 0).

This function has the same shape as the classical soft-thresholding operator, but is scaled by a multiplicative factor C. When C = 1, f is the soft-thresholding operator, which is the proximity operator of the absolute value ϕ(x) = |x|, which is convex. For C > 1, as f is expansive, by Proposition 1 it cannot be the proximity operator of any convex function. Yet, as f is monotonically increasing, f(y) is a subgradient of its "primitive"

ψ(y) = (C/2)(max(|y| − 1, 0))² = (C/2) y² (max(1 − 1/|y|, 0))² = f(y)²/(2C),

which is convex. Moreover, by Corollary 7, f is still the proximity operator of some (necessarily nonconvex) function ϕ(x). By (1), up to an additive constant K ∈ R, ϕ satisfies

ϕ(f(y)) = y f(y) − ½ f(y)² − ψ(y) = y f(y) − ((1 + C)/(2C)) f(y)², ∀y ∈ R.

For x > 0, writing x = f(y) with y = f⁻¹(x) = 1 + x/C yields ϕ(x) = ϕ(f(y)) = (1 + x/C)x − ((1 + C)/(2C)) x². Similar considerations for x < 0 and for x = 0 show that

ϕ(x) = |x| + (1/C − 1) x²/2.

When C > 1, ϕ is indeed not bounded from below, and not convex.

3. When is social shrinkage a proximity operator?
We conclude this paper by studying so-called social shrinkage operators, which have been introduced to mimic classical sparsity-promoting proximity operators when certain types of structured sparsity are targeted. We show that the characterization of proximity operators obtained in this paper provides answers to questions raised by Kowalski et al. [22] and by Varoquaux et al. [35] on the link between such non-separable shrinkage operators and proximity operators.

Most proximity operators are indeed not separable. A classical example is the proximity operator associated to mixed ℓ₁₂ norms, which enforces group-sparsity.

Example 6 (Group-sparsity shrinkage). Consider a partition G = {G₁, …, G_p} of ⟦1, n⟧, the interval of integers from 1 to n, into disjoint sets called groups. Let x_G be the restriction of x ∈ Rⁿ to its entries indexed by G ∈ G, and define the group ℓ₁ norm, or mixed ℓ₁₂ norm, as

(11) ϕ(x) := Σ_{G∈G} ‖x_G‖₂.

The proximity operator f := prox_{λϕ} is the group-sparsity shrinkage operator with threshold λ:

(12) ∀i ∈ G, f_i(y) := y_i (1 − λ/‖y_G‖₂)₊.

The group-LASSO penalty (11) appeared in statistics in the thesis of Bakin [4, Chapter 2]. It was popularized by Yuan and Lin [37], who introduced an iterative shrinkage algorithm to address the corresponding optimization problem. A generalization is Group Empirical Wiener / Group Non-negative Garrotte, see e.g. [15]:

(13) ∀i ∈ G, f_i(y) := y_i (1 − λ²/‖y_G‖₂²)₊;

see also [2] for a review of thresholding rules, and [3] for a review on sparsity-inducing penalties. To account for varied types of structured sparsity, [23, 24] empirically introduced the so-called Windowed Group-LASSO.
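Before turning to these windowed variants, the baseline group shrinkage (12) can be verified numerically. The sketch below is our own illustration (the groups, λ and the random data are arbitrary choices; groups are 0-indexed): by convexity of the penalized least squares objective, the closed-form shrinkage must beat every perturbed candidate.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.7                                # our choice of threshold
groups = [[0, 1, 2], [3, 4], [5]]        # a partition of {0,...,5} (our choice)

def group_shrink(y):
    """Group-sparsity shrinkage (12): x_G = y_G * (1 - lam/||y_G||_2)_+ per group."""
    x = np.zeros_like(y)
    for G in groups:
        n = np.linalg.norm(y[G])
        x[G] = y[G] * max(1 - lam / n, 0.0) if n > 0 else 0.0
    return x

def objective(x, y):
    """0.5*||y - x||^2 + lam * sum_G ||x_G||_2, i.e., the mixed l12 penalty (11)."""
    return 0.5 * np.sum((y - x) ** 2) + lam * sum(np.linalg.norm(x[G]) for G in groups)

# the shrinkage output should beat every other candidate (the problem is convex)
y = rng.normal(size=6)
x_star = group_shrink(y)
best = objective(x_star, y)
for _ in range(2000):
    cand = x_star + rng.normal(scale=0.5, size=6)
    assert objective(cand, y) >= best - 1e-12
```

Groups whose data norm ‖y_G‖₂ falls below λ are set to zero as a block, which is exactly the group-sparsity effect of the mixed norm.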
A weighted version for audio applications was further developed in [32], which coins the notion of persistency; the term social sparsity was coined in [22] to cover Windowed Group-LASSO as well as other structured shrinkage operators. As further described in these papers, the main motivation of such social shrinkage operators is to obtain flexible ways of taking into account (possibly overlapping) neighborhoods of a coefficient index i, rather than disjoint groups of indices, to decide whether or not to set a coefficient to zero. These are summarized in the definition below.

Definition 3 (Social shrinkage). Consider a family N_i ⊂ ⟦1, n⟧, i ∈ ⟦1, n⟧, of sets such that i ∈ N_i. The set N_i is called a neighborhood of its index i. Consider nonnegative weight vectors w_i = (w_{iℓ})ⁿ_{ℓ=1} such that supp(w_i) = N_i. Windowed Group-LASSO (WG-LASSO) shrinkage is defined as f(y) := (f_i(y))ⁿ_{i=1} with

(14) ∀i, f_i(y) := y_i (1 − λ/‖diag(w_i) y‖₂)₊,

and Persistent Empirical Wiener (PEW) shrinkage (see [33] for the unweighted version) with

(15) ∀i, f_i(y) := y_i (1 − λ²/‖diag(w_i) y‖₂²)₊.

Kowalski et al. [22] write "while the classical proximity operators⁶ are directly linked to convex regression problems with mixed norm priors on the coefficients, [those] new, structured, shrinkage operators cannot be directly linked to a convex minimization problem". Similarly, Varoquaux et al. [35] write that Windowed Group-LASSO "is not the proximal operator of a known penalty". They leave open the question of whether social shrinkage is the proximity operator of some yet-to-be-discovered penalty. Using Theorem 2, we answer these questions for generalized social shrinkage operators. The answer is negative unless the involved neighborhoods form a partition.
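One can already see numerically why overlap is fatal: a continuous proximity operator is the gradient of a convex potential (Corollary 6(a)), so wherever it is C¹ its Jacobian must be symmetric. The sketch below is our own experiment (λ, the test point and the neighborhoods are arbitrary choices): it estimates the Jacobian of WG-LASSO shrinkage (14) by finite differences, once with overlapping neighborhoods and once with a partition.

```python
import numpy as np

lam = 0.5
y0 = np.array([1.0, 2.0, -1.5])   # generic point: all thresholds are passed

def wg_lasso(y, W):
    """WG-LASSO shrinkage (14): f_i(y) = y_i * (1 - lam/||diag(w_i) y||_2)_+."""
    return np.array([yi * max(1 - lam / np.linalg.norm(w * y), 0.0)
                     for yi, w in zip(y, W)])

def jacobian(f, y, h=1e-6):
    """Central finite-difference Jacobian of f at y."""
    J = np.zeros((len(y), len(y)))
    for j in range(len(y)):
        e = np.zeros(len(y)); e[j] = h
        J[:, j] = (f(y + e) - f(y - e)) / (2 * h)
    return J

# overlapping neighborhoods N_1={1,2}, N_2={1,2,3}, N_3={2,3} (binary weights)
W_overlap = np.array([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
# a partition: N_1 = N_2 = {1,2}, N_3 = {3}  (plain group shrinkage)
W_partition = np.array([[1., 1., 0.], [1., 1., 0.], [0., 0., 1.]])

J_o = jacobian(lambda y: wg_lasso(y, W_overlap), y0)
J_p = jacobian(lambda y: wg_lasso(y, W_partition), y0)
assert np.abs(J_o - J_o.T).max() > 1e-3   # not symmetric: not a gradient field
assert np.abs(J_p - J_p.T).max() < 1e-6   # symmetric, as a prox operator must be
```

With overlap, the off-diagonal entries ∂f₁/∂y₂ and ∂f₂/∂y₁ involve the norms of different neighborhoods and differ markedly, so f cannot be a gradient field; with the partition the Jacobian is symmetric up to finite-difference error. This is consistent with the negative answer formalized below.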
Definition 4 (Generalized social shrinkage). Consider subsets N_i ⊂ ⟦1, n⟧ and nonnegative weight vectors w_i ∈ Rⁿ₊ such that i ∈ N_i and supp(w_i) = N_i for each i ∈ ⟦1, n⟧. Consider λ > 0 and a family of C¹(R*₊) scalar functions h_i, i ∈ ⟦1, n⟧, such that h′_i(t) ≠ 0 for t ∈ R*₊. A generalized social shrinkage operator is defined as f(y) := (f_i(y))ⁿ_{i=1} with

f_i(y) := y_i h_i(‖diag(w_i) y‖₂²) if ‖diag(w_i) y‖₂ > λ, and f_i(y) := 0 otherwise.

⁶Operators that are explicitly constructed as the proximity operator of a convex l.s.c. penalty, e.g., soft-thresholding.

We let the reader check that the above definition covers Group-LASSO (12), Windowed Group-LASSO (14), Group Empirical Wiener (13) and Persistent Empirical Wiener shrinkage (15).

Lemma 2. Let f : Rⁿ → Rⁿ be a generalized social shrinkage operator and N_i ⊂ ⟦1, n⟧, w_i ∈ Rⁿ₊, i ∈ ⟦1, n⟧, be the corresponding families of neighborhoods and weight vectors. If f is a proximity operator, then there exists a partition G = {G₁, …, G_P} of the set ⟦1, n⟧ of indices such that: for each p and all i, j ∈ G_p we have w_i = w_j and supp(w_i) = supp(w_j) = G_p. As a consequence, for i ∈ G_p, j ∈ G_q with p ≠ q, the weight vectors w_i and w_j have disjoint support.

The proof of Lemma 2 is postponed to Appendix A.7. An immediate consequence of this lemma is that if a generalized social shrinkage operator f is a proximity operator, then the neighborhood system N_i = supp(w_i) coincides with the groups G_p from the partition G. In particular, the neighborhood system must form a partition. By contraposition we get the following corollary:

Corollary 8. Consider nonnegative weights {w_i} as in Definition 4 and {N_i} the corresponding neighborhood system. Assume that there exist i, j such that N_i ≠ N_j and N_i ∩ N_j ≠ ∅.
• Let f be the WG-LASSO shrinkage (14). There is no penalty ϕ such that f = prox_ϕ.
• Let f be the PEW shrinkage (15). There is no penalty ϕ such that f = prox_ϕ.

In other words, WG-LASSO / PEW can be a proximity operator only if the neighborhood system has no overlap, i.e., with "plain" Group-LASSO (12) / Group Empirical Wiener (13).

Acknowledgements

The first author wishes to thank Laurent Condat, Jean-Christophe Pesquet and Patrick-Louis Combettes for their feedback that helped improve an early version of this paper, as well as the anonymous reviewers for many insightful comments that improved it much further.

Appendix A. Proofs

The proofs of technical results of Section 2 are provided in Sections A.4 (Theorem 3), A.5 (Lemma 1), A.6 (Corollary 3) and A.7 (Lemma 2). As a preliminary we give brief reminders on some useful but classical notions in Sections A.1–A.3.

A.1. Brief reminders on (Fréchet) differentials and gradients in Hilbert spaces. Consider H, H′ two Hilbert spaces. A function θ : X → H′, where X ⊂ H is an open domain, is (Fréchet) differentiable at x if there exists a continuous linear operator L : H → H′ such that lim_{h→0} ‖θ(x + h) − θ(x) − L(h)‖_{H′}/‖h‖_H = 0. The linear operator L is called the differential of θ at x and denoted Dθ(x). When H′ = R, L belongs to the dual of H, hence there is u ∈ H, called the gradient of θ at x and denoted ∇θ(x), such that L(h) = ⟨u, h⟩, ∀h ∈ H.

A.2. Subgradients and subdifferentials for possibly nonconvex functions. We adopt a gentle definition which is familiar when θ is a convex function. Although this is possibly less well known by non-experts, this definition is also valid when θ is possibly nonconvex, see e.g. [6, Definition 16.1].

Definition 5. Let θ : H → R ∪ {+∞} be a proper function.
The subdifferential ∂θ(x) of θ at x is the set of all u ∈ H, called subgradients of θ at x, such that

(16) θ(x′) ≥ θ(x) + ⟨u, x′ − x⟩, ∀x′ ∈ H.

If x ∉ dom(θ), then ∂θ(x) = ∅. The function θ is subdifferentiable at x ∈ H if ∂θ(x) ≠ ∅. The domain of ∂θ is dom(∂θ) := {x ∈ H : ∂θ(x) ≠ ∅}. It satisfies dom(∂θ) ⊂ dom(θ).

Fact 1. When ∂θ(x) ≠ ∅, the inequality in (16) is trivial for each x′ ∉ dom(θ), since it amounts to +∞ = θ(x′) − θ(x) ≥ ⟨u, x′ − x⟩.

Definition 5 leads to the well-known Fermat's rule [6, Theorem 16.3].

Theorem 5. Let θ : H → R ∪ {+∞} be a proper function. A point x ∈ dom(θ) is a global minimizer of θ if and only if 0 ∈ ∂θ(x).

If θ has a global minimizer at x, then by Theorem 5 the set ∂θ(x) is non-empty. However, ∂θ(x) can be empty, e.g., at local minimizers that are not the global minimizer:

Example 7. Let θ(x) = ½x² − cos(πx). The global minimum of θ is reached at x = 0, where ∂θ(x) = {θ′(x)} = {0}. At x ≈ ±1.79, θ has local minimizers where ∂θ(x) = ∅ (even though θ is C^∞). For |x| < 0.53 one has ∂θ(x) = {∇θ(x)} with θ″(x) > 0, and for 0.54 < |x| < 1.91, ∂θ(x) = ∅.

The proof of the following lemma is a standard exercise in convex analysis [6, Exercise 16.8].

Lemma 3. Let θ : H → R ∪ {+∞} be a proper function such that (a) dom(θ) is convex and (b) ∂θ(x) ≠ ∅ for each x ∈ dom(θ). Then θ is a convex function.

Definition 6 (Lower convex envelope of a function). Let θ : H → R ∪ {+∞} be proper with dom(∂θ) ≠ ∅. Its lower convex envelope,⁷ denoted θ̆, is the pointwise supremum of all the convex lower-semicontinuous functions minorizing θ:

(17) θ̆(x) := sup{ρ(x) | ρ : H → R ∪ {+∞}, ρ convex l.s.c., ρ(z) ≤ θ(z) ∀z ∈ H}, ∀x ∈ H.
The function θ̆ is proper, convex and lower-semicontinuous. It satisfies

(18) θ̆(x) ≤ θ(x), ∀x ∈ H.

Proposition 3. Let θ : H → R ∪ {+∞} be proper with dom(∂θ) ≠ ∅. For any x₀ ∈ dom(∂θ) we have θ̆(x₀) = θ(x₀) and ∂θ(x₀) = ∂θ̆(x₀).

Proof. As ∂θ(x₀) ≠ ∅, by [6, Proposition 13.45], θ̆ is the so-called biconjugate θ** of θ [6, Definition 13.1]. Moreover, [6, Proposition 16.5] yields θ**(x₀) = θ(x₀) and ∂θ**(x₀) = ∂θ(x₀). □

We need to adapt [6, Proposition 17.31] to the case where θ is proper but possibly nonconvex, with a stronger assumption of Fréchet (instead of Gâteaux) differentiability.

Proposition 4. If ∂θ(x) ≠ ∅ and θ is (Fréchet) differentiable at x, then ∂θ(x) = {∇θ(x)}.

⁷Also known as the convex hull, [29, p. 57], [21, Definition 2.5.3].

Proof. Consider u ∈ ∂θ(x). As θ is differentiable at x, there is an open ball B centered at 0 such that x + h ∈ dom(θ) for each h ∈ B. For each h ∈ B, Definition 5 yields θ(x − h) − θ(x) ≥ ⟨u, −h⟩ and θ(x + h) − θ(x) ≥ ⟨u, h⟩, hence −(θ(x − h) − θ(x)) ≤ ⟨u, h⟩ ≤ θ(x + h) − θ(x). Since θ is Fréchet differentiable at x, letting ‖h‖ tend to zero yields −(⟨∇θ(x), −h⟩ + o(‖h‖)) ≤ ⟨u, h⟩ ≤ ⟨∇θ(x), h⟩ + o(‖h‖), hence ⟨u − ∇θ(x), h⟩ = o(‖h‖), ∀h ∈ B. This shows that u = ∇θ(x). □

A.3. Characterizing functions with a given subdifferential. Corollary 9 below generalizes a result of Moreau [26, Proposition 8.b] characterizing functions by their subdifferential. It shows that one only needs the subdifferentials to intersect. We begin in dimension one.

Lemma 4. Consider convex functions a₀, a₁ : R → R ∪ {+∞} such that dom(a_i) = dom(∂a_i) = [0, 1] and ∂a₀(t) ∩ ∂a₁(t) ≠ ∅ on [0, 1].
Then there exists a constant K ∈ R such that a₁(t) − a₀(t) = K on [0, 1].

Proof. As a_i is convex, it is continuous on (0, 1) [21, Theorem 3.1.1, p. 16]. Moreover, by [21, Proposition 3.1.2] we have a_i(0) ≥ lim_{t→0, t>0} a_i(t) =: a_i(0⁺), and since ∂a_i(0) ≠ ∅, there is u_i ∈ ∂a_i(0) such that a_i(t) ≥ a_i(0) + u_i(t − 0) for each t ∈ [0, 1], hence a_i(0⁺) ≥ a_i(0). This shows that a_i(0⁺) = a_i(0), and similarly lim_{t→1, t<1} a_i(t) = a_i(1); hence a_i is continuous on [0, 1] relatively to [0, 1]. In addition, a_i is differentiable on [0, 1] except on a countable set B_i ⊂ [0, 1] [21, Theorem 4.2.1(ii)]. For t ∈ [0, 1] \ (B₀ ∪ B₁) and i ∈ {0, 1}, Proposition 4 yields ∂a_i(t) = {a′_i(t)}, hence the function δ := a₁ − a₀ is continuous on [0, 1] and differentiable on [0, 1] \ (B₀ ∪ B₁). For t ∈ [0, 1] \ (B₀ ∪ B₁), {a′₀(t)} ∩ {a′₁(t)} = ∂a₀(t) ∩ ∂a₁(t) ≠ ∅, hence a′₀(t) = a′₁(t) and δ′(t) = 0. A classical exercise⁸ in real analysis [34, Example 4] is to show that if a function is continuous on an interval, and differentiable with zero derivative except on a countable set, then it is constant. As B₀ ∪ B₁ is countable, it follows that δ is constant on (0, 1). As it is continuous on [0, 1], it is constant on [0, 1]. □

Corollary 9. Let θ₀, θ₁ : H → R ∪ {+∞} be proper and C ⊂ H a non-empty polygonally connected set. Assume that for each z ∈ C, ∂θ₀(z) ∩ ∂θ₁(z) ≠ ∅; then there is a constant K ∈ R such that θ₁(x) − θ₀(x) = K, ∀x ∈ C.

Remark 4. Note that the functions θ_i and the set C are not assumed to be convex.

Proof. The proof is in two parts.

(i) Assume that C is convex and fix some x* ∈ C. Consider x ∈ C, and define a_i(t) := θ_i(x* + t(x − x*)) for i = 0, 1 and each t ∈ [0, 1], and a_i(t) := +∞ if t ∉ [0, 1].
As C is convex, z_t := x* + t(x − x*) ∈ C, hence for each t ∈ [0, 1] there exists u_t ∈ ∂θ₀(z_t) ∩ ∂θ₁(z_t). By Definition 5, for each t, t′ ∈ [0, 1],

a_i(t′) − a_i(t) = θ_i(x* + t′(x − x*)) − θ_i(x* + t(x − x*)) ≥ ⟨u_t, (t′ − t)(x − x*)⟩ = ⟨u_t, x − x*⟩(t′ − t).

For t ∈ [0, 1] and t′ ∈ R \ [0, 1], since a_i(t′) = +∞ the inequality a_i(t′) − a_i(t) ≥ ⟨u_t, x − x*⟩(t′ − t) also obviously holds, hence ⟨u_t, x − x*⟩ ∈ ∂a_i(t), i = 0, 1. Thus ∂a_i(t) ≠ ∅ for each t ∈ [0, 1], so by Lemma 3, a_i is convex on [0, 1] for i = 0, 1, and ⟨u_t, x − x*⟩ ∈ ∂a₀(t) ∩ ∂a₁(t) for each t ∈ [0, 1]. By Lemma 4, there exists K ∈ R such that a₁(t) − a₀(t) = K for each t ∈ [0, 1]. Therefore, θ₁(x) − θ₀(x) = a₁(1) − a₀(1) = a₁(0) − a₀(0) = θ₁(x*) − θ₀(x*) = K. As this holds for each x ∈ C, we have established the result as soon as C is convex.

(ii) Now we prove the result when C is polygonally connected. Fix some x* ∈ C and define K := θ₁(x*) − θ₀(x*). Consider x ∈ C: by the definition of polygonal connectedness, there exists an integer n ≥ 1 and x_j ∈ C, 0 ≤ j ≤ n, with x₀ = x* and x_n = x, such that the (convex) segments C_j = [x_j, x_{j+1}] = {t x_j + (1 − t) x_{j+1}, t ∈ [0, 1]} satisfy C_j ⊂ C. Since each C_j is convex, the result established in (i) implies that θ₁(x_{j+1}) − θ₀(x_{j+1}) = θ₁(x_j) − θ₀(x_j) for 0 ≤ j < n. This shows that θ₁(x) − θ₀(x) = θ₁(x_n) − θ₀(x_n) = … = θ₁(x₀) − θ₀(x₀) = θ₁(x*) − θ₀(x*) = K. □

⁸For a proof see e.g. (in French) https://fr.wikipedia.org/wiki/Lemme_de_Cousin, section 4.9, version from 13/01/2019.

A.4. Proof of Theorem 3. The indicator function of a set S is denoted χ_S(x) := 0 if x ∈ S, +∞ if x ∉ S.

(ai) ⇒ (aii).
We introduce the function θ : H → R ∪ {+∞} by

(19) θ := b + ϕ + χ_{Im(f)}.

Consider x ∈ Im(f). By definition x = f(y) where y ∈ Y, hence by (ai) x is a global minimizer of x′ ↦ D(x′, y) + ϕ(x′). Therefore, we have

(20) ∀x′ ∈ H, −⟨A(y), x′⟩ + b(x′) + ϕ(x′) + χ_{Im(f)}(x′) ≥ −⟨A(y), x⟩ + b(x) + ϕ(x) + χ_{Im(f)}(x),

where, by (19), the last three terms on each side sum to θ(x′) and θ(x), respectively. This is equivalent to

(21) θ(x′) ≥ θ(x) + ⟨A(y), x′ − x⟩, ∀x′ ∈ H,

meaning that A(y) ∈ ∂θ(f(y)). As this holds for each y ∈ Y such that f(y) = x, we get A(f⁻¹(x)) ⊂ ∂θ(x). Consider g₁ := θ̆ according to Definition 6. Since g₁ is convex l.s.c. and

(22) ∂θ(x) ≠ ∅, ∀x ∈ Im(f),

by Proposition 3, ∂θ(x) = ∂g₁(x) and θ(x) = g₁(x) for each x ∈ Im(f). This establishes (aii) with g := g₁ = θ̆.

(aii) ⇒ (ai). Set θ₁ := g + χ_{Im(f)}. By (aii), ∂g(x) ≠ ∅ for each x ∈ Im(f). Since dom(∂g) ⊂ dom(g), it follows that Im(f) ⊂ dom(g) and consequently dom(θ₁) = Im(f). Consider y ∈ Y and x := f(y), so that x ∈ Im(f), hence θ₁(x) = g(x) and A(y) ∈ A(f⁻¹(x)) ⊂ ∂g(x), where the inclusion comes from (aii). It follows that for each (x, x′) ∈ Im(f) × H one has

θ₁(x′) = g(x′) + χ_{Im(f)}(x′) ≥ g(x′) ≥ g(x) + ⟨A(y), x′ − x⟩ = θ₁(x) + ⟨A(y), x′ − x⟩,

showing that A(y) ∈ ∂θ₁(x). This is equivalent to (21) with θ := θ₁, and since dom(θ₁) = Im(f), the inequality in (20) holds with ϕ(x) := θ₁(x) − b(x), i.e., x is a global minimizer of x′ ↦ D(x′, y) + ϕ(x′). Since this holds for each y ∈ Y, this establishes (ai) with ϕ := θ₁ − b = g − b + χ_{Im(f)}.

(b). Consider ϕ and g satisfying (ai) and (aii), respectively. Let⁹ g₁ := θ̆ with θ defined in (19).
Following the arguments of (ai) ⇒ (aii), we obtain that g₁ (just as g) satisfies (aii). For each x ∈ C we thus have ∂g(x) ∩ ∂g₁(x) ⊃ A(f⁻¹(x)) ≠ ∅, with g, g₁ convex l.s.c. functions. Hence, by Corollary 9, since C is polygonally connected, there is a constant K such that g(x) = g₁(x) + K, ∀x ∈ C. To establish the relation (2) between g and ϕ, we now show that g₁(x) = b(x) + ϕ(x) on C. By (22) and Proposition 3 we have θ̆(x) = θ(x) for each x ∈ Im(f), hence as C ⊂ Im(f) we obtain g₁(x) := θ̆(x) = θ(x) = b(x) + ϕ(x) for each x ∈ C. This establishes (2).

⁹In general, we may have g ≠ g₁ as there is no connectedness assumption on dom(θ).

(ci) ⇒ (cii). Define

(23) ρ(y) := ⟨B(f(y)), y⟩ − b(f(y)) − ϕ(f(y)) for y ∈ Y, and ρ(y) := +∞ for y ∉ Y.

Consider y ∈ Y. From (ci), for each y′ the global minimum of x ↦ D̃(x, y′) + ϕ(x) is reached at x′ = f(y′). Hence, for x = f(y) we have

−⟨B(f(y′)), y′⟩ + b(f(y′)) + ϕ(f(y′)) ≤ −⟨B(x), y′⟩ + b(x) + ϕ(x) = −⟨B(f(y)), y′⟩ + b(f(y)) + ϕ(f(y)).

Using this inequality we obtain that, for all y′ ∈ Y,

ρ(y′) − ρ(y) = −⟨B(f(y)), y⟩ + b(f(y)) + ϕ(f(y)) + ⟨B(f(y′)), y′⟩ − b(f(y′)) − ϕ(f(y′)) ≥ ⟨B(f(y)), y′⟩ − ⟨B(f(y)), y⟩ = ⟨B(f(y)), y′ − y⟩.

This shows that

(24) B(f(y)) ∈ ∂ρ(y).

Set ψ₁ := ρ̆ according to Definition 6. Then the function ψ₁ is convex l.s.c., and for each y ∈ Y the vector B(f(y)) is well defined, so ∂ρ(y) ≠ ∅. Hence, by Proposition 3, ∂ρ(y) = ∂ρ̆(y) = ∂ψ₁(y) and ρ(y) = ρ̆(y) = ψ₁(y) for each y ∈ Y. This establishes (cii) with ψ := ψ₁ = ρ̆.

(cii) ⇒ (ci).
Define h : Y → R by h(y) := ⟨B(f(y)), y⟩ − ψ(y). Since B(f(y′)) ∈ ∂ψ(y′) with ψ convex by (cii), applying Definition 5 to ∂ψ yields ψ(y) − ψ(y′) ≥ ⟨y − y′, B(f(y′))⟩. Using this inequality, one has, for all y, y′ ∈ Y,

(25) h(y′) − h(y) = ⟨B(f(y′)), y′⟩ − ψ(y′) − ⟨B(f(y)), y⟩ + ψ(y) ≥ ⟨B(f(y′)), y′⟩ − ⟨B(f(y)), y⟩ + ⟨B(f(y′)), y − y′⟩ = ⟨B(f(y′)) − B(f(y)), y⟩.

Noticing that for each x ∈ Im(f) there is y ∈ Y such that x = f(y), we can define θ : H → R ∪ {+∞} obeying dom(θ) = Im(f) by

θ(x) := h(y) with y ∈ f⁻¹(x) if x ∈ Im(f), and θ(x) := +∞ otherwise.

For x ∈ Im(f), as f(y) = f(y′) = x for each y, y′ ∈ f⁻¹(x), applying (25) yields h(y′) − h(y) ≥ 0. By symmetry h(y′) = h(y), hence the definition of θ(x) does not depend on which y ∈ f⁻¹(x) is chosen. For x′ ∈ Im(f) we write x′ = f(y′). Using (25) and the definition of θ yields

θ(x′) − θ(f(y)) = θ(f(y′)) − θ(f(y)) = h(y′) − h(y) ≥ ⟨B(f(y′)) − B(f(y)), y⟩ = ⟨B(x′) − B(f(y)), y⟩,

that is to say,

θ(x′) − ⟨B(x′), y⟩ ≥ θ(f(y)) − ⟨B(f(y)), y⟩, ∀x′ ∈ Im(f).

This also trivially holds for x′ ∉ Im(f). Setting ϕ(x) := θ(x) − b(x) for each x ∈ H, and replacing θ by b + ϕ in the inequality above yields

a(y) − ⟨B(x′), y⟩ + b(x′) + ϕ(x′) ≥ a(y) − ⟨B(f(y)), y⟩ + b(f(y)) + ϕ(f(y)), ∀x′ ∈ H,

showing that f(y) ∈ argmin_{x′} {D̃(x′, y) + ϕ(x′)}. As this holds for each y ∈ Y, ϕ satisfies (ci).

(d). Consider ϕ and ψ satisfying (ci) and (cii), respectively. Using the arguments of (ci) ⇒ (cii), the function ψ₁ := ρ̆, with ρ defined in (23), satisfies (cii).
As $\psi$ and $\psi_1$ both satisfy (cii), for each $y \in C'$ we have $\partial\psi(y) \cap \partial\psi_1(y) \ni B(f(y))$, so this intersection is nonempty, with $\psi, \psi_1$ convex l.s.c. functions. Hence, by Corollary 9, since $C'$ is polygonally connected, there is a constant $K'$ such that $\psi(y) = \psi_1(y) + K'$ for all $y \in C'$. By (24), $\partial\varrho(y) \neq \emptyset$ for each $y \in Y$, hence by Proposition 3 we have $\breve\varrho(y) = \varrho(y)$ for each $y \in Y$. As $C' \subset Y$, it follows that $\psi_1(y) = \breve\varrho(y) = \varrho(y)$ for each $y \in C'$. This establishes (3).

A.5. Proof of Lemma 1.

Proof. Without loss of generality we prove the equivalence for the convex envelope $\breve\theta$ instead of $\theta$: indeed, by Proposition 3, since $\partial\theta(x) \neq \emptyset$ on $X$, we have $\breve\theta(x) = \theta(x)$ and $\partial\breve\theta(x) = \partial\theta(x)$ on $X$.

(a) $\Rightarrow$ (b). By [6, Prop. 17.41(iii) $\Rightarrow$ (i)], as $\breve\theta$ is convex l.s.c. and $\varrho$ is a selection of its subdifferential which is continuous at each $x \in X$, $\breve\theta$ is (Fréchet) differentiable at each $x \in X$. By Proposition 4 we get $\partial\breve\theta(x) = \{\nabla\breve\theta(x)\} = \{\varrho(x)\}$ on $X$. Since $\varrho$ is continuous, $x \mapsto \nabla\breve\theta(x)$ is continuous on $X$.

(b) $\Rightarrow$ (a). Since $\breve\theta$ is differentiable on $X$, by Proposition 4 we have $\partial\breve\theta(x) = \{\nabla\breve\theta(x)\}$ on $X$. By (9) it follows that $\varrho(x) = \nabla\breve\theta(x)$ on $X$. Since $\nabla\breve\theta$ is continuous on $X$, so is $\varrho$. $\square$

A.6. Proof of Corollary 3. By Theorem 2, as $Y$ is open and convex and $f$ is $C^1(Y)$ with $Df(y)$ symmetric positive semi-definite for each $y \in Y$, there is a function $\varphi_0$ and a convex l.s.c. function $\psi \in C^2(Y)$ such that $\nabla\psi(y) = f(y) \in \mathrm{prox}_{\varphi_0}(y)$ for each $y \in Y$. We define $\varphi(x) := \varphi_0(x) + \chi_{\mathrm{Im}(f)}(x)$ and let the reader check that $f(y) \in \mathrm{prox}_{\varphi}(y)$ for each $y \in Y$. By construction, $\mathrm{dom}(\varphi) = \mathrm{Im}(f)$.

Uniqueness of the global minimizer. Consider any function $\widetilde f$ such that $\widetilde f(y) \in \mathrm{prox}_{\varphi}(y)$ for each $y$.
This implies

(26) $\tfrac12\|y - f(y)\|^2 + \varphi(f(y)) = \tfrac12\|y - \widetilde f(y)\|^2 + \varphi(\widetilde f(y)) = \min_{x \in \mathcal{H}}\big\{\tfrac12\|y - x\|^2 + \varphi(x)\big\}, \quad \forall y \in Y.$

By Corollary 1 there is a convex l.s.c. function $\widetilde\psi$ such that $\widetilde f(y) \in \partial\widetilde\psi(y)$ for each $y \in Y$. Since $Y$ is convex it is polygonally connected, hence by Theorem 4(b) and (26) there are $K, K' \in \mathbb{R}$ such that

(27) $\psi(y) - K = \tfrac12\|y\|^2 - \tfrac12\|y - f(y)\|^2 - \varphi(f(y)) = \tfrac12\|y\|^2 - \tfrac12\|y - \widetilde f(y)\|^2 - \varphi(\widetilde f(y)) = \widetilde\psi(y) - K', \quad \forall y \in Y.$

Thus, $\widetilde\psi$ is also $C^2(Y)$ and $\widetilde f(y) \in \partial\widetilde\psi(y) = \{\nabla\psi(y)\} = \{f(y)\}$ for each $y \in Y$. This shows that $\widetilde f(y) = f(y)$ for each $y$, hence $f(y)$ is the unique global minimizer on $\mathcal{H}$ of $x \mapsto \tfrac12\|y - x\|^2 + \varphi(x)$, i.e., $\mathrm{prox}_\varphi(y) = \{f(y)\}$.

Injectivity of $f$. The proof follows that of [17, Lemma 1]. Given $y \neq y'$, define $v := y' - y \neq 0$ and $\theta(t) := \langle f(y + tv), v\rangle$ for $t \in [0, 1]$. As $Y$ is convex this is well defined. As $f \in C^1(Y)$ and $Df(y + tv) \succ 0$, the function $\theta$ is $C^1([0, 1])$ with $\theta'(t) = \langle Df(y + tv)\,v, v\rangle > 0$ for each $t$. If we had $f(y) = f(y')$, then $\theta(0) = \theta(1)$, and by Rolle's theorem there would be $t \in (0, 1)$ such that $\theta'(t) = 0$, contradicting the fact that $\theta'(t) > 0$.

Differentiability of $\varphi$. If $Df(y)$ is boundedly invertible for each $y \in Y$, then by the inverse function theorem $\mathrm{Im}(f)$ is open and $f^{-1}: \mathrm{Im}(f) \to Y$ is $C^1$. Given $x \in \mathrm{Im}(f)$, denoting $u := f^{-1}(x)$, (27) yields

$\varphi(x) = \varphi(f(u)) = -(\psi(u) - K) + \tfrac12\|u\|^2 - \tfrac12\|u - f(u)\|^2 = -(\psi(f^{-1}(x)) - K) + \tfrac12\|f^{-1}(x)\|^2 - \tfrac12\|f^{-1}(x) - x\|^2.$

Since $\psi$ is $C^2$ and $f^{-1}$ is $C^1$, it follows that $\varphi$ is $C^1$.

The global minimum is the unique critical point. The proof is inspired by that of [17, Theorem 1]. Consider $x$ a critical point of $\theta: x \mapsto \tfrac12\|y - x\|^2 + \varphi(x)$, i.e., since $\varphi$ is $C^1$, a point where $\nabla\theta(x) = 0$.
Since $\mathrm{dom}(\varphi) = \mathrm{Im}(f)$, there is some $v \in Y$ such that $x = f(v)$. Moreover, as $\varphi$ is $C^1$ on the open set $\mathrm{Im}(f)$, the gradient $\nabla\theta(x)$ is well defined and $\nabla\theta(x) = 0$. On the one hand, denoting $\varrho(u) := (\theta \circ f)(u) = \tfrac12\|y - f(u)\|^2 + \varphi(f(u))$, we have $\nabla\varrho(u) = Df(u)\,\nabla\theta(f(u))$ for each $u \in Y$. On the other hand, for each $u \in Y$, as $f(u) = \nabla\psi(u)$ we also have

$\varrho(u) = \tfrac12\|y\|^2 + \tfrac12\|f(u)\|^2 - \langle y, f(u)\rangle + \varphi(f(u)) = \tfrac12\|y\|^2 + \langle u - y, f(u)\rangle - (\psi(u) - K),$

$\nabla\varrho(u) = Df(u)(u - y) + f(u) - \nabla\psi(u) = Df(u)(u - y).$

For $u = v$ we get $Df(v)(v - y) = \nabla\varrho(v) = Df(v)\,\nabla\theta(f(v)) = Df(v)\,\nabla\theta(x) = 0$. As $Df(v) \succ 0$, this implies $v = y$, hence $x = f(y)$.

A.7. Proof of Lemma 2. As a preliminary, let us compute the entries of the $n \times n$ matrix associated to $Df(y)$:

(28) $\forall i, j \in \llbracket 1, n\rrbracket \quad \dfrac{\partial f_i}{\partial y_j}(y) = \begin{cases} 0 & \text{if } \|\mathrm{diag}(w^i)\,y\|_2 < \lambda,\\ 2(w^i_j)^2\, y_i\, y_j\, h_i'\big(\|\mathrm{diag}(w^i)\,y\|_2^2\big) & \text{if } \|\mathrm{diag}(w^i)\,y\|_2 > \lambda. \end{cases}$

NB: if $\|\mathrm{diag}(w^i)\,y\|_2 = \lambda$ then $f$ may not be differentiable at $y$; this case will not be needed below.

The proof exploits Corollary 2, which shows that if $f$ is a proximity operator then $Df(y)$ is symmetric on each open set where it is well defined. Let $f$ be a generalized social shrinkage operator as described in Lemma 2 and consider $\mathcal{G} = \{G_1, \ldots, G_p\}$, the partition of $\llbracket 1, n\rrbracket$ into disjoint groups corresponding to the equivalence classes of the following equivalence relation between indices: for $i, j \in \llbracket 1, n\rrbracket$, $i \sim j$ if and only if $w^i = w^j$. Given $G \in \mathcal{G}$, denote by $w_G$ the weight vector shared by all $i \in G$. If $f$ is a proximity operator, we will show that $\mathrm{supp}(w_G) = G$ for each $G \in \mathcal{G}$. For $i \in G$, by Definition 4 we have $i \in N_i = \mathrm{supp}(w^i) = \mathrm{supp}(w_G)$, establishing that

(29) $G \subset \mathrm{supp}(w_G).$
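The symmetry test from Corollary 2 that drives this proof is easy to probe numerically. The following sketch is not part of the paper: it implements a windowed Group-LASSO-type shrinkage with $h_i(t) = 1 - \lambda/\sqrt{t}$ (chosen so that its Jacobian is consistent with (28)), and compares $\partial f_i/\partial y_j$ with $\partial f_j/\partial y_i$ by finite differences for an overlapping and a non-overlapping weight configuration. The helper names (`social_shrink`, `num_jacobian`) and the chosen weights are illustrative assumptions.

```python
import numpy as np

lam = 1.0  # threshold lambda

def social_shrink(y, W):
    """Windowed Group-LASSO-type shrinkage with h_i(t) = 1 - lam/sqrt(t),
    consistent with the Jacobian formula (28); W is the list of weight
    vectors w^i. Illustrative construction, not code from the paper."""
    out = np.zeros_like(y)
    for i, w in enumerate(W):
        nrm = np.linalg.norm(w * y)  # ||diag(w^i) y||_2
        if nrm > lam:
            out[i] = y[i] * (1.0 - lam / nrm)
    return out

def num_jacobian(f, y, eps=1e-6):
    """Central-difference approximation of Df(y)."""
    n = y.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (f(y + e) - f(y - e)) / (2 * eps)
    return J

# Overlapping neighborhoods: N_0 = {0, 1} but N_1 = {1} (and N_2 = {2}).
W_overlap = [np.array([1.0, 1.0, 0.0]),
             np.array([0.0, 1.0, 0.0]),
             np.array([0.0, 0.0, 1.0])]
# Point satisfying (31) for the pair (i, j) = (0, 1):
# ||diag(w^1) y|| = 0.3 < lam < ||diag(w^0) y|| ~ 2.02, and y_0 * y_1 != 0.
y = np.array([2.0, 0.3, 1.5])

J = num_jacobian(lambda z: social_shrink(z, W_overlap), y)
print("overlapping:     |df0/dy1 - df1/dy0| =", abs(J[0, 1] - J[1, 0]))

# A true partition into groups {0, 1} and {2}: identical weights within a group.
W_groups = [np.array([1.0, 1.0, 0.0]),
            np.array([1.0, 1.0, 0.0]),
            np.array([0.0, 0.0, 1.0])]
Jg = num_jacobian(lambda z: social_shrink(z, W_groups), y)
print("non-overlapping: max |Df - Df^T|     =", np.abs(Jg - Jg.T).max())
```

At the chosen point, the overlapping configuration gives a clearly nonzero $\partial f_0/\partial y_1$ while $\partial f_1/\partial y_0$ vanishes, reproducing the asymmetry exploited below; the partitioned configuration yields a numerically symmetric Jacobian, as Corollary 2 requires of a proximity operator.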
From now on we assume that $f$ is a proximity operator, and we consider a group $G \in \mathcal{G}$. (Note that the inclusion (29) holds even without this assumption.) To prove that $G = \mathrm{supp}(w_G)$, we will establish that for each $i, j \in \llbracket 1, n\rrbracket$,

(30) if there exists $y \in \mathbb{R}^n$ such that $\|\mathrm{diag}(w^j)\,y\|_2 \neq \|\mathrm{diag}(w^i)\,y\|_2$, then $w^i_j = 0$ and $w^j_i = 0$.

To see why this allows us to conclude, consider $j \in \mathrm{supp}(w_G)$ and $i \in G$. As $N_i := \mathrm{supp}(w^i) = \mathrm{supp}(w_G)$, we obtain that $j \in N_i$, i.e., $w^i_j \neq 0$. By (30), it follows that $\|\mathrm{diag}(w^j)\,y\|_2 = \|\mathrm{diag}(w^i)\,y\|_2$ for each $y$. As $w^i, w^j$ have non-negative entries, this means that $w^i = w^j$. As $i \in G$, this implies $j \in G$ by the very definition of $G$ as an equivalence class. This shows $\mathrm{supp}(w_G) \subset G$. Using also (29), we conclude that $\mathrm{supp}(w_G) = G$.

Let us now prove (30). Consider a given pair $i, j \in \llbracket 1, n\rrbracket$ and assume that $\|\mathrm{diag}(w^j)\,y\|_2 \neq \|\mathrm{diag}(w^i)\,y\|_2$ for at least one vector $y$. Without loss of generality, assume that $a := \|\mathrm{diag}(w^j)\,y\|_2 < \|\mathrm{diag}(w^i)\,y\|_2 =: b$. Rescaling $y$ by a factor $c = 2\lambda/(a + b)$ yields the existence of $y$ such that, for the considered pair $i, j$,

(31) $\|\mathrm{diag}(w^j)\,y\|_2 < \lambda < \|\mathrm{diag}(w^i)\,y\|_2.$

By continuity, perturbing $y$ if needed, we can also assume that $y_i\, y_j \neq 0$. By (28), as (31) holds in a neighborhood of $y$, $f$ is $C^1$ at $y$ and its partial derivatives for the considered pair $i, j$ satisfy

$\dfrac{\partial f_i}{\partial y_j}(y) = 2(w^i_j)^2\, y_i\, y_j\, h_i'\big(\|\mathrm{diag}(w^i)\,y\|_2^2\big) \quad \text{and} \quad \dfrac{\partial f_j}{\partial y_i}(y) = 0.$

Since $f$ is a proximity operator, by Corollary 2 we have $\frac{\partial f_i}{\partial y_j}(y) = \frac{\partial f_j}{\partial y_i}(y)$. It follows that, for the considered pair $i, j$,

$(w^i_j)^2\, y_i\, y_j\, h_i'\big(\|\mathrm{diag}(w^i)\,y\|_2^2\big) = 0.$

As $y_i\, y_j \neq 0$ and $h_i'(t) \neq 0$ for $t \neq 0$, we obtain $w^i_j = 0$. To conclude, we now show that $w^j_i = 0$.
As $w^i_j = 0$, $f_i$ is in fact independent of $y_j$ and $\frac{\partial f_i}{\partial y_j}$ is identically zero on $\mathbb{R}^n$. By scaling $y$ as needed, we get a vector $y'$ such that $y'_i\, y'_j \neq 0$ and $\lambda < \|\mathrm{diag}(w^j)\,y'\|_2 < \|\mathrm{diag}(w^i)\,y'\|_2$. Reasoning as above yields $2(w^j_i)^2\, y'_j\, y'_i\, h_j'\big(\|\mathrm{diag}(w^j)\,y'\|_2^2\big) = \frac{\partial f_j}{\partial y_i}(y') = \frac{\partial f_i}{\partial y_j}(y') = 0$, hence $w^j_i = 0$. We thus obtain that $w^i_j = w^j_i = 0$ as claimed, establishing (30) and therefore $G = \mathrm{supp}(w_G)$. $\square$

References

[1] Madhu Advani and Surya Ganguli. An equivalence between high dimensional Bayes optimal inference and M-estimation. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, pages 3378–3386. Curran Associates, Inc., 2016.
[2] Anestis Antoniadis. Wavelet methods in statistics: Some recent developments and their applications. Statistics Surveys, 1:16–55, 2007.
[3] Francis Bach. Optimization with Sparsity-Inducing Penalties. FNT in Machine Learning, 4(1):1–106, 2011.
[4] Sergey Bakin. Adaptive regression and model selection in data mining problems. PhD thesis, School of Mathematical Sciences, Australian National University, 1999.
[5] Heinz H. Bauschke, Jonathan M. Borwein, and Patrick L. Combettes. Bregman Monotone Optimization Algorithms. SIAM J. Control and Optimization, 42(2):596–636, 2003.
[6] Heinz H. Bauschke and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Second Edition. Springer International Publishing, Cham, 2017.
[7] Thomas Blumensath and Michael E. Davies. Iterative hard thresholding for compressed sensing. Appl. Comp. Harm. Anal., 27(3):265–274, November 2009.
[8] Kristian Bredies, Dirk A. Lorenz, and Stefan Reiterer. Minimization of Non-smooth, Non-convex Functionals by Iterative Thresholding. Journal of Optimization Theory and Applications, 165(1):78–112, July 2014.
[9] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967.
[10] T. Tony Cai and Bernard W. Silverman. Incorporating information on neighbouring coefficients into wavelet estimation. Sankhyā: The Indian Journal of Statistics, Series B, 63(Special Issue on Wavelets):127–148, 2001.
[11] H. Cartan. Cours de calcul différentiel. Collection Méthodes. Éditions Hermann, 1977.
[12] Y. Censor and S. A. Zenios. Proximal minimization algorithm with d-functions. J. Optim. Theory Appl., 73(3):451–464, 1992.
[13] Patrick L. Combettes and Jean-Christophe Pesquet. Proximal Thresholding Algorithm for Minimization over Orthonormal Bases. SIAM J. Optim., 18:1351–1376, 2007.
[14] I. Ekeland and T. Turnbull. Infinite-dimensional optimization and convexity. Chicago Lectures in Mathematics. The University of Chicago Press, 1983.
[15] Cédric Févotte and Matthieu Kowalski. Hybrid sparse and low-rank time-frequency signal decomposition. EUSIPCO, pages 464–468, 2015.
[16] Antonio Galbis and Manuel Maestre. Vector Analysis Versus Vector Calculus. Universitext. Springer US, Boston, MA, 2012.
[17] Rémi Gribonval. Should Penalized Least Squares Regression be Interpreted as Maximum A Posteriori Estimation? IEEE Transactions on Signal Processing, 59(5):2405–2410, 2011.
[18] Rémi Gribonval and Pierre Machart. Reconciling "priors" and "priors" without prejudice? In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26 (NIPS), pages 2193–2201, 2013.
[19] Rémi Gribonval and Mila Nikolova. On Bayesian estimation and proximity operators. Applied and Computational Harmonic Analysis, pages 1–25, 2019.
[20] Peter Hall, Spiridon I. Penev, Gérard Kerkyacharian, and Dominique Picard. Numerical performance of block thresholded wavelet estimators. Statistics and Computing, 1997.
[21] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms, vol. I. Springer-Verlag, Berlin, 1996.
[22] Matthieu Kowalski, Kai Siedenburg, and Monika Dörfler. Social Sparsity! Neighborhood Systems Enrich Structured Shrinkage Operators. IEEE Trans. Signal Processing, 2013.
[23] Matthieu Kowalski and Bruno Torrésani. Sparsity and persistence: mixed norms provide simple signal models with dependent coefficients. Signal, Image and Video Processing, 3(3):251–264, 2009.
[24] Matthieu Kowalski and Bruno Torrésani. Structured Sparsity: from Mixed Norms to Structured Shrinkage. In Rémi Gribonval, editor, SPARS'09 - Signal Processing with Adaptive Sparse Structured Representations, Saint-Malo, France, April 2009. Inria Rennes - Bretagne Atlantique.
[25] Cécile Louchet and Lionel Moisan. Posterior Expectation of the Total Variation Model: Properties and Experiments. SIAM J. Imaging Sci., 6(4):2640–2684, January 2013.
[26] Jean-Jacques Moreau. Proximité et dualité dans un espace Hilbertien. Bull. Soc. Math. France, 93:273–299, 1965.
[27] M. Nikolova. Estimation of binary images by minimizing convex criteria. In Proceedings 1998 International Conference on Image Processing (ICIP98), pages 108–112. IEEE Comput. Soc., 1998.
[28] Ankit Parekh and Ivan W. Selesnick. Convex Denoising using Non-Convex Tight Frame Regularization. IEEE Signal Process. Lett., 2015.
[29] R. Tyrrell Rockafellar and Roger J. B. Wets. Variational Analysis, volume 317 of Grundlehren der mathematischen Wissenschaften. Springer Berlin Heidelberg, 1998.
[30] Ralph Rockafellar.
On the maximal monotonicity of subdifferential mappings. Pacific Journal of Mathematics, 33(1):209–216, 1970.
[31] Ivan W. Selesnick. Sparse Regularization via Convex Analysis. IEEE Trans. Signal Processing, 65(17):4481–4494, 2017.
[32] K. Siedenburg and M. Dörfler. Structured sparsity for audio signals. In Proc. 14th Int. Conf. on Digital Audio Effects (DAFx-11), Paris, 2011.
[33] Kai Siedenburg, Matthieu Kowalski, and Monika Dörfler. Audio declipping with social sparsity. In ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1577–1581. IEEE, 2014.
[34] Brian S. Thomson. Rethinking the Elementary Real Analysis Course. The American Mathematical Monthly, 2007.
[35] Gaël Varoquaux, Matthieu Kowalski, and Bertrand Thirion. Social-sparsity brain decoders: faster spatial sparsity. In International Workshop on Pattern Recognition in Neuroimaging, Trento, 2016.
[36] C. Villani. Optimal Transport - Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag Berlin Heidelberg, 2009.
[37] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, February 2006.
