Continuous Multiclass Labeling Approaches and Algorithms
J. Lellmann†, C. Schnörr†

October 10, 2018

†Image and Pattern Analysis & HCI, Dept. of Mathematics and Computer Science, University of Heidelberg, {lellmann,schnoerr}@math.uni-heidelberg.de

Abstract

We study convex relaxations of the image labeling problem on a continuous domain with regularizers based on metric interaction potentials. The generic framework ensures existence of minimizers and covers a wide range of relaxations of the originally combinatorial problem. We focus on two specific relaxations that differ in flexibility and simplicity – one can be used to tightly relax any metric interaction potential, while the other one only covers Euclidean metrics but requires less computational effort. For solving the nonsmooth discretized problem, we propose a globally convergent Douglas-Rachford scheme, and show that a sequence of dual iterates can be recovered in order to provide a posteriori optimality bounds. In a quantitative comparison to two other first-order methods, the approach shows competitive performance on synthetic and real-world images. By combining the method with an improved binarization technique for non-standard potentials, we were able to routinely recover discrete solutions within 1%–5% of the global optimum for the combinatorial image labeling problem.

1 Problem Formulation

The multi-class image labeling problem consists in finding, for each pixel $x$ in the image domain $\Omega \subseteq \mathbb{R}^d$, a label $\ell(x) \in \{1,\dots,l\}$ which assigns one of $l$ class labels to $x$ so that the labeling function $\ell$ adheres to some local data fidelity as well as nonlocal spatial coherency constraints. This problem class occurs in many applications, such as segmentation, multiview reconstruction, stitching, and inpainting [PCF06]. We consider the variational formulation

\inf_{\ell:\Omega\to\{1,\dots,l\}} f(\ell), \qquad f(\ell) := \underbrace{\int_\Omega s(x,\ell(x))\,dx}_{\text{data term}} + \underbrace{J(\ell)}_{\text{regularizer}}.   (1)

The data term assigns to each possible label $\ell(x)$ a local cost $s(x,\ell(x)) \in \mathbb{R}$, while the regularizer $J$ enforces the desired spatial coherency. We will in particular be interested in regularizers that penalize the weighted length of interfaces between regions of constant labeling.

Figure 1: Convex relaxation of the multiclass labeling problem. The assignment of one unique label to each point in the image domain $\Omega$ is represented by a vector-valued function $u:\Omega\to\mathbb{R}^l$. Ideally, $u$ partitions the image into $l$ sets by assuming one of the unit vectors $\{e^1,\dots,e^l\}$ everywhere. By relaxing this set to the standard (probability) simplex $\Delta_l$, the originally combinatorial problem can be treated in a convex framework.

Minimizing $f$ is an inherently combinatorial problem, as there is a discrete decision to be made for each point in the image. In the fully discrete setting, the problem can be expressed in terms of a Markov Random Field [Win06] with a discrete state space, where the data and regularization terms can be thought of as unary and binary potentials, respectively. For graph-based discretizations of $J$, the resulting objective only contains terms that depend on the labels at one or two points, and the problem can be approached with fast graph cut-based methods. Unfortunately, this scheme introduces an anisotropy [Boy03] and thus does not represent isotropic regularizers well.
Using ternary or higher-order terms, the metrication error can be reduced, but graph-based methods then cannot be directly applied. However, it can be shown that even in the graph-based representation the problem is NP-hard for relatively simple $J$ [BVZ01], so we cannot expect to easily derive fast solvers for this problem. This is in part caused by the discrete nature of the feasible set. In the following, we will relax this set. This allows solving the problem in a globally optimal way using convex optimization methods. On the downside, we cannot expect the relaxation to be exact for every problem instance, i.e. we might get non-discrete (or discrete but suboptimal) solutions.

There are several choices for the relaxation method, of which in our opinion the following is the most transparent (Fig. 1): identify label $i$ with the $i$-th unit vector $e^i \in \mathbb{R}^l$, set $E := \{e^1,\dots,e^l\}$, and solve

\inf_{u:\Omega\to E} f(u), \qquad f(u) := \int_\Omega \langle u(x), s(x)\rangle\,dx + J(u).   (2)

The data term is now linear in $u$ and fully described by the vector

s(x) := (s_1(x),\dots,s_l(x))^\top := (s(x,1),\dots,s(x,l))^\top.   (3)

Due to the linearization, the local costs $s$ may be arbitrarily complicated, possibly derived from a probabilistic model, without affecting the overall problem class. In this form, we relax the label set by allowing $u$ to take "intermediate" values in the convex hull $\operatorname{conv} E$ of the original label set. This is just the standard simplex $\Delta_l$,

\Delta_l := \operatorname{conv}\{e^1,\dots,e^l\} = \{a \in \mathbb{R}^l \mid a \geqslant 0,\ \sum_{i=1}^l a_i = 1\}.   (4)

The problem is then considered on the relaxed feasible set $\mathcal{C}$,

\mathcal{C} := \{u \in \mathrm{BV}(\Omega)^l \mid u(x) \in \Delta_l \text{ for a.e. } x \in \Omega\}.   (5)

The space of functions of bounded variation $\mathrm{BV}(\Omega)^l \subset (L^1)^l$ guarantees a minimal regularity of the discontinuities of $u$, see Sect. 2.2. Assuming we can extend the regularizer $J$ to the whole relaxed set $\mathcal{C}$, we get the relaxed problem

\inf_{u\in\mathcal{C}} f(u), \qquad f(u) := \int_\Omega \langle u(x), s(x)\rangle\,dx + J(u).   (6)

If $J$ can be made convex, the overall problem is convex as well, and likely computationally tractable. In addition, $J$ should ideally have a closed-form expression, or at least lead to a computationally tractable problem. Whether these points are satisfied depends on the way a given regularizer is extended to the relaxed set. The prototypical example for such a regularizer is the total variation,

\mathrm{TV}(u) = \int_\Omega \|Du\|,   (7)

where $\|\cdot\|$ denotes the Frobenius norm, in this case on $\mathbb{R}^{d\times l}$. Note that $u$ may be discontinuous, so the gradient $Du$ has to be understood in a distributional sense (Sect. 2.2). Much of the popularity of TV stems from the fact that it allows boundary-length terms to be included: the total variation of the characteristic function $\chi_S$ of a set $S$,

\mathrm{Per}(S) := \mathrm{TV}(\chi_S),   (8)

called the perimeter of $S$, is just the classical length of the boundary $\partial S$.

In this paper, we will in more generality consider ways to construct regularizers which penalize interfaces between two adjacent regions with labels $i \neq j$ according to the perimeter (i.e. length or area) of the interface, weighted by an interaction potential $d: \{1,\dots,l\}^2 \to \mathbb{R}$ depending on the labels (in a slight abuse of notation the interaction potential is also denoted by $d$, since there is rarely any ambiguity with respect to the ambient space dimension). The simplest case is the Potts model with the uniform metric, $d(i,j) = 0$ iff $i = j$ and $d(i,j) = 1$ otherwise. In this case, the regularizer penalizes the total interface length, as seen above for the total variation.
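As a minimal illustration of this lifting, the following numpy sketch builds the one-hot representation of (2)–(3) and evaluates the linearized data term; the random cost array s is an illustrative stand-in for model-derived local costs, not part of the paper.

```python
import numpy as np

n, l = 6, 3                          # pixels, labels
labels = np.random.randint(0, l, n)  # a discrete labeling ell(x)
s = np.random.rand(n, l)             # local costs s(x, i), cf. (3)

u = np.eye(l)[labels]                # one-hot lifting: u(x) = e^{ell(x)}
data_term = (u * s).sum()            # = sum_x <u(x), s(x)>, cf. (2)

# the relaxation replaces {e^1, ..., e^l} by the simplex (4):
u_relaxed = np.full((n, l), 1.0 / l) # feasible for (5), but not discrete
assert np.allclose(u_relaxed.sum(axis=1), 1.0) and (u_relaxed >= 0).all()
```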
As a prime motivation for our work, consider the two-class case $l = 2$ with $J = \mathrm{TV}$. As here the second component of $u$ is given by the first via $u_2 = 1 - u_1$, we may pose the relaxed problem in the form

\min_{u_0\in\mathrm{BV}(\Omega),\ u_0(x)\in[0,1] \text{ for a.e. } x\in\Omega} \int_\Omega u_0(x)\,(s_1(x) - s_2(x))\,dx + 2\,\mathrm{TV}(u_0),   (9)

where $u_0(x)$ is a scalar. This formulation is also referred to as continuous cut, in analogy to graph cut methods. It can be shown [CEN06] that while there may be non-discrete solutions of the relaxed problem, a discrete – i.e. $u_0(x) \in \{0,1\}$ – globally optimal solution can be recovered from any solution of the relaxed problem. We can thus reduce the combinatorial problem to a convex problem. While there are reasons to believe that this procedure cannot be extended to the multi-class case, we may still hope for "nearly" discrete solutions.
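The thresholding result of [CEN06] suggests a very simple recovery step in the two-class case. The following is a minimal numpy sketch of that idea applied to a discretized relaxed solution; the array u0 and the threshold value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def threshold_two_class(u0, theta=0.5):
    """Round a relaxed two-class labeling u0 in [0,1] to a binary one.

    By [CEN06], in the continuous setting thresholding an exact relaxed
    solution at (almost) any theta in (0,1) yields a global optimizer of
    the combinatorial problem; after discretization this is only a
    heuristic rounding step (cf. Sect. 5.5).
    """
    return (u0 > theta).astype(np.float64)

# illustrative usage on a random stand-in for a relaxed solution
u0 = np.clip(np.random.rand(64, 64), 0.0, 1.0)
u_binary = threshold_two_class(u0, theta=0.5)
```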
1.1 Related Work

The difficulty of the labeling problem varies greatly with its precise definition. Formulations of the labeling problem can be categorized based on 1. whether they tackle the binary (two-class) or the much harder multiclass problem, and 2. whether they rely on a graph representation or are formulated in a spatially continuous framework.

An early analysis of a variant of the binary continuous cut problem and the associated dual maximal flow problem was done by Strang [Str83]. Chan et al. [CEN06] pointed out that by thresholding a nonbinary result of the relaxed, convex problem at almost any threshold one can generate binary solutions of the original, combinatorial problem (this can be carried over to any threshold and to slightly more general regularizers [Ber09, ZNF09]). The proof heavily relies on the coarea formula [FR60], which unfortunately does not transfer to the multiclass case. The binary continuous case is also closely related to the thoroughly analyzed Mumford-Shah [MS89] and Rudin-Osher-Fatemi (ROF) [ROF92] problems, where a quadratic data term is used.

For the graph-based discretization, the binary case can be formulated as a minimum-cut problem on a grid graph, which allows solving the problem exactly and efficiently for a large class of metrics using graph cuts [KB05, BK04]. Graph cut algorithms have become a workhorse in many applications, as they are very fast for medium-sized problems. Unfortunately, they offer hardly any potential for parallelization. As mentioned in the introduction, the graph representation invariably introduces a grid bias [Boy03] (Fig. 2). While it is possible to reduce the resulting artifacts by using larger neighborhoods, or by modeling higher-order potentials through factor graphs, this greatly increases graph size as well as processing time.

Prominent methods to handle the graph-based multiclass case rely on finding a local minimum by solving a sequence of binary graph cuts [BVZ01] (see [KT07] for a recent generalization). These methods have recently been extended to the continuous formulation [TPCB08] with similar theoretical performance [Ols09]. Our results can be seen as a continuous analogue of [Ish03], where it was shown that potentials of the form $d(i,j) = |i-j|$ can be exactly formulated as a cut on a multi-layered graph. An early analysis can be found in [KT99], where the authors also derive suboptimality bounds of a linear programming relaxation for metric distances. All these methods rely on the graph representation with pairwise potentials. In this paper, we will focus on the continuous multiclass setting with higher-order potentials in the discretization.

Figure 2: Discretization schemes. Left to right: graph-based with 4- and 8-neighborhood; higher-order potentials. In graph-based approaches the regularizer is discretized using terms depending on the labels of at most two neighboring points. This leads to artifacts, as isotropic metrics are not approximated well. Using higher-order terms, the discrete functional more closely approximates the continuous functional (Fig. 3).

Figure 3: Effect of the discretization on an inpainting problem. Left to right: input image with unknown black region; graph-based method using pairwise potentials (α-β-swap code from [SZS+06]); proposed method using higher-order potentials. Due to the introduced anisotropy, the graph-based method shows a bias towards axis-parallel edges.

Closely related to our approach is [CCP08]. In contrast to approaches that rely on a linear ordering of the labels [Ish03, BT09], the authors represent labels in a higher-dimensional space. In a certain sense, [LLT06] can be seen as a predecessor of this approach: in this work, the authors represent the label assignment using a piecewise constant real-valued function, but parametrize this function using a set of $l$ polynomial basis functions, which enables them to employ the Potts regularizer. The approach of [CCP08] allows formulating interaction potentials of the form $d(i,j) = \gamma(|i-j|)$ with nondecreasing, positive, concave $\gamma$. The authors provide a thorough analysis of the continuous model and propose a relaxation based on the convex envelope, which gives almost discrete solutions in many cases. We will extend this approach to the more general class of regularizers where $d$ is an arbitrary metric. The same authors proposed a "Fast Primal-Dual" algorithm with proven convergence to solve the associated saddle point problem [PCBC09a]. By lifting the objective to a higher-dimensional space, it turns out that the same method can be used to solve many problems also with nonlinear data term [PCBC09b].

Our approach is a generalization of [ZGFN08, LKY+09], where a similar linearization is used with the regularizer restricted to the Potts distance, and with weaker convergence results. These methods have also been extended to segmentation on manifolds [DFPH09].

Regarding optimization, several authors proposed smoothing of the primal or dual objective together with gradient descent [Ber09, BYT10]. In contrast, our approach does not require any a priori smoothing. Using Nesterov's method [Nes04] for the labeling problem was proposed in [LKY+09]. An earlier analysis of the method in the context of $\ell_1$-norm and TV minimization can be found in [WABF07]. In [BBC09] the method is applied to a class of general $\ell_1$-regularized problems. In [GBO09] a predecessor of the proposed Douglas-Rachford approach was presented in the Split Bregman framework [Set09a] and restricted to the two-class case. We provide an extension to the multi-class case, with proof of convergence and a sound stopping criterion.
1.2 Contribution

The paper is organized as follows:

1. We formulate natural requirements on the regularizer $J$ and show their implications for the choice of the interaction potential $d$ (Sect. 3). In particular, $d$ must necessarily be a metric under these requirements (Prop. 1).

2. Given such a metric, we study two possible approaches to extend regularizers $J$ to the relaxed set (Sect. 4):

   • The "envelope" approach, which is a generalization of the method recently suggested by Chambolle et al. (Sect. 4.3). While there is no simple closed-form expression, we show that it can be used to construct a true extension for any metric $d$ (Prop. 5).

   • The "Euclidean distance" approach (Sect. 4.4), which yields exact extensions for Euclidean metrics $d$ only, but has a closed-form expression. We review some methods for the approximation of non-Euclidean metrics.

   We provide a unified continuous framework and show existence of a minimizer.

3. Both approaches lead to nonsmooth convex problems, which can be studied in a general saddle point formulation (Sect. 5). Within this framework, we propose an improved binarization technique for nonstandard potentials to recover approximate solutions for the non-relaxed problem (Sect. 5.6).

4. We provide and analyze two different methods that are capable of minimizing the specific class of saddle point problems (Sect. 6):

   • A specialization of a method for nonsmooth optimization as suggested by Nesterov (Sect. 6.2). The method is virtually parameter-free and provides explicit a priori and a posteriori optimality bounds.

   • A Douglas-Rachford splitting approach (Sect. 6.3). We show that the approach allows computing a sequence of dual iterates that provide an optimality bound and stopping criterion in form of the primal-dual gap.

   Both methods are highly parallelizable and are shown to converge. For reference, we also summarize the primal-dual technique from [PCBC09a] (Sect. 6.1).

5. Finally, we illustrate and compare the above methods under varying conditions and demonstrate the applicability of the proposed approaches on real-world problems (Sect. 7).

In contrast to existing graph-based methods, we provide a continuous and isotropic formulation, while in comparison with existing continuous approaches, we provide a unified framework for arbitrary, non-uniform metrics $d$. The Euclidean metric method and the Nesterov optimization have been announced in less generality in [LKY+09, LBS09].

2 Mathematical Preliminaries

In the following sections we provide a reference for the notation used, and a brief introduction to the concept of functions of bounded variation and corresponding functionals. We aim to provide the reader with the basic ideas. For more detailed expositions we refer to [AFP00, Zie89].

2.1 Notation

In the following, superscripts $v^i$ denote a collection of vectors or matrix columns, while subscripts $v_k$ denote vector components, i.e. we denote, for $A \in \mathbb{R}^{d\times l}$,

A = (a^1|\dots|a^l) = (A_{ij}), \quad A_{ij} = a^j_i, \quad 1 \leqslant i \leqslant d,\ 1 \leqslant j \leqslant l.   (10)

An additional bracket $v^{(i)}$ indicates an element of a sequence $(v^{(i)})$. We will frequently make use of the Kronecker product [Gra81],

\otimes: \mathbb{R}^{n_1\times m_1} \times \mathbb{R}^{n_2\times m_2} \to \mathbb{R}^{(n_1 n_2)\times(m_1 m_2)},   (11)

in order to formulate all results for arbitrary dimensions.
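To illustrate how the Kronecker product lifts a scalar operator to vector-valued data (as used for Grad = I_l ⊗ grad in Sect. 5.2), here is a small numpy sketch; the dense 1D forward-difference matrix and the sizes are illustrative assumptions.

```python
import numpy as np

def grad_1d(n):
    """Forward-difference gradient on n points, Neumann boundary
    (last row is zero), as a dense matrix for illustration."""
    G = np.zeros((n, n))
    for i in range(n - 1):
        G[i, i], G[i, i + 1] = -1.0, 1.0
    return G

n, l = 5, 3
grad = grad_1d(n)
# (I_l ⊗ grad) acts channel-wise on the column-stacked labeling u:
Grad = np.kron(np.eye(l), grad)        # shape (n*l, n*l)

u = np.random.rand(n, l)               # one column per label
gu = Grad @ u.flatten(order="F")       # columns concatenated, cf. Sect. 5.2
assert np.allclose(gu.reshape(n, l, order="F"), grad @ u)
```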
The standard simplex in $\mathbb{R}^l$ is denoted by $\Delta_l := \{x \in \mathbb{R}^l \mid x \geqslant 0,\ e^\top x = 1\}$, where $e := (1,\dots,1)^\top \in \mathbb{R}^l$. $I_n$ is the identity matrix in $\mathbb{R}^n$, and $\|\cdot\|$ the usual Euclidean norm for vectors resp. the Frobenius norm for matrices. Analogously, the standard inner product $\langle\cdot,\cdot\rangle$ extends to pairs of matrices as the sum over their elementwise products. $B_r(x)$ denotes the ball of radius $r$ at $x$, and $S^{d-1}$ the set of $x \in \mathbb{R}^d$ with $\|x\| = 1$. The characteristic function $\chi_S(x)$ of a set $S$ is defined by $\chi_S(x) = 1$ iff $x \in S$ and $\chi_S(x) = 0$ otherwise. By $\delta_S(x) = 0$ iff $x \in S$ and $\delta_S(x) = +\infty$ otherwise we denote the corresponding indicator function. For a convex set $\mathcal{C}$, $\sigma_{\mathcal{C}}(u) := \sup_{v\in\mathcal{C}} \langle u, v\rangle$ is the support function from convex analysis. $\mathcal{J}(u)$ denotes the classical Jacobian of $u$. $C^k_c(\Omega)$ is the space of $k$-times continuously differentiable functions on $\Omega$ with compact support, and $C_0(\Omega)$ the completion of $C^0_c(\Omega)$ under the supremum norm. As usual, $\mathcal{L}^d$ denotes the $d$-dimensional Lebesgue measure, while $\mathcal{H}^k$ denotes the $k$-dimensional Hausdorff measure. For some measure $\mu$ and set $M$, $\mu\llcorner M$ denotes the restriction of $\mu$ to $M$, i.e. $(\mu\llcorner M)(A) := \mu(M\cap A)$.

2.2 Total Variation and BV

The total variation will be our main tool to construct the regularizer $J$. For a differentiable scalar-valued function $u$, the total variation is simply the integral over the norm of its gradient:

\mathrm{TV}(u) = \int_\Omega \|\nabla u\|\,dx.   (12)

As $u$ is the designated labeling function, which ideally should be piecewise constant, the differentiability and continuity assumptions have to be dropped. In the following we briefly review the general definition of the total variation and its properties. We require the image domain $\Omega \subseteq \mathbb{R}^d$ to be a bounded open domain with compact Lipschitz boundary, that is, the boundary can locally be represented as the graph of a Lipschitz-continuous function. For simplicity, we will assume in the following that $\Omega = (0,1)^d$.

We consider general vector-valued functions $u = (u^1,\dots,u^l): \Omega \to \mathbb{R}^l$ which are locally absolutely integrable, i.e. $u \in L^1_{\mathrm{loc}}(\Omega)^l$. As $\Omega$ is bounded, this is equivalent to being absolutely integrable, i.e. $u \in L^1(\Omega)^l$. For any such function $u$, its total variation $\mathrm{TV}(u)$ is defined in a dual way [AFP00, (3.4)] as

\mathrm{TV}(u) := \sup_{v\in D_{\mathrm{TV}}} \sum_{j=1}^l \int_\Omega u^j \operatorname{div} v^j\,dx = \sup_{v\in D_{\mathrm{TV}}} \int_\Omega \langle u, \operatorname{Div} v\rangle\,dx,   (13)

D_{\mathrm{TV}} := \{v \in C^\infty_c(\Omega)^{d\times l} \mid \|v(x)\| \leqslant 1\ \forall x\in\Omega\}, \qquad \operatorname{Div} v := (\operatorname{div} v^1,\dots,\operatorname{div} v^l)^\top.   (14)

This definition can be derived for continuously differentiable $u$ by extending (12) to vector-valued $u$,

\mathrm{TV}(u) = \int_\Omega \|\mathcal{J}(u)\|\,dx,   (15)

replacing the norm by its dual formulation and using partial integration. If $u$ has finite total variation, i.e. $\mathrm{TV}(u) < \infty$, $u$ is said to be of bounded variation. The vector space of all such functions is denoted by $\mathrm{BV}(\Omega)^l$:

\mathrm{BV}(\Omega)^l = \{u \in L^1(\Omega)^l \mid \mathrm{TV}(u) < \infty\}.   (16)

Equivalently, $u \in \mathrm{BV}(\Omega)^l$ iff $u \in L^1(\Omega)^l$ and its distributional derivative corresponds to a finite Radon measure; i.e. $u^j \in L^1(\Omega)$ and there exist $\mathbb{R}^d$-valued measures $Du^j = (D_1 u^j,\dots,D_d u^j)$ on the Borel subsets $\mathcal{B}(\Omega)$ of $\Omega$ such that [AFP00, p. 118]

\sum_{j=1}^l \int_\Omega u^j \operatorname{div} v^j\,dx = -\sum_{j=1}^l \sum_{i=1}^d \int_\Omega v^j_i\,dD_i u^j, \quad \forall v \in (C^\infty_c(\Omega))^{d\times l}.   (17)

These measures form the distributional gradient $Du = (Du^1|\dots|Du^l)$, which is again a measure that vanishes on any $\mathcal{H}^{d-1}$-negligible set. If $u \in \mathrm{BV}(\Omega)^l$, then $|Du|(\Omega) = \mathrm{TV}(u)$, where $|Du|$ is the total variation of the measure $Du$ in the measure-theoretic sense [AFP00, 3.6].
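For differentiable u the dual definition (13) reduces to (15); on a pixel grid this becomes a sum of Frobenius norms of finite-difference Jacobians. A minimal numpy sketch follows (the forward differences with unit grid spacing are illustrative assumptions; the paper's actual discretization is given in Sect. 5.2):

```python
import numpy as np

def tv_vector_valued(u):
    """Discrete total variation of u with shape (n1, n2, l): sum over
    pixels of the Frobenius norm of the forward-difference Jacobian
    (Neumann boundary), cf. (12)/(15)."""
    dx = np.zeros_like(u); dx[:-1, :, :] = u[1:, :, :] - u[:-1, :, :]
    dy = np.zeros_like(u); dy[:, :-1, :] = u[:, 1:, :] - u[:, :-1, :]
    return np.sqrt((dx**2 + dy**2).sum(axis=2)).sum()

# TV of a characteristic function approximates the perimeter (8):
# for a square of side 32 the exact perimeter is 128 (pixel units).
chi = np.zeros((64, 64, 1))
chi[16:48, 16:48, 0] = 1.0
print(tv_vector_valued(chi))  # ~127.4; the corner pixel is counted once
```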
The total variation of characteristic functions has an intuitive geometric interpretation: for a Lebesgue-measurable subset $S \subseteq \mathbb{R}^d$, its perimeter is defined as the total variation of its characteristic function,

\mathrm{Per}(S) := \mathrm{TV}(\chi_S).   (18)

Assuming the boundary $\partial S$ is sufficiently regular, $\mathrm{Per}(S)$ is just the classical length ($d = 2$) or area ($d = 3$) of the boundary.

2.3 Properties of TV and Compactness

We review the most important ingredients for proving existence of minimizers for variational problems on BV involving TV.

Convexity. As TV is the pointwise supremum of a family of linear functions, it is convex and positively homogeneous, i.e. $\mathrm{TV}(\alpha u) = \alpha\,\mathrm{TV}(u)$ for $\alpha \geqslant 0$.

Lower Semicontinuity. A functional $J$ is said to be lower semicontinuous with respect to some topology if, for any sequence $(u^{(k)})$ converging to $u$,

\liminf_{k\to\infty} J(u^{(k)}) \geqslant J(u).   (19)

It can be shown that for fixed $\Omega$, the total variation TV is well-defined on $L^1_{\mathrm{loc}}(\Omega)^l$ and lower semicontinuous in $\mathrm{BV}(\Omega)^l$ w.r.t. the $L^1_{\mathrm{loc}}(\Omega)^l$ topology [AFP00, 3.5, 3.6]; hence also in $L^1(\Omega)^l$ due to the boundedness of $\Omega$.

Compactness. For BV, instead of the norm topology induced by

\|u\|_{\mathrm{BV}} := \int_\Omega \|u\|\,dx + \mathrm{TV}(u),   (20)

which makes $\mathrm{BV}(\Omega)^l$ a Banach space but is often too strong, one frequently uses weak* convergence: define $u^{(k)} \to u$ weakly* iff 1. $u, u^{(k)} \in \mathrm{BV}(\Omega)^l\ \forall k \in \mathbb{N}$, 2. $u^{(k)} \to u$ in $L^1(\Omega)$, and 3. $Du^{(k)} \to Du$ weakly* in measure, i.e.

\forall v \in C_0(\Omega): \quad \lim_{k\to\infty} \int_\Omega v\,dDu^{(k)} = \int_\Omega v\,dDu.   (21)

For $u, u^{(k)} \in \mathrm{BV}(\Omega)^l$ this is equivalent to $u^{(k)} \to u$ in $L^1(\Omega)^l$ and $(u^{(k)})$ being uniformly bounded in $\mathrm{BV}(\Omega)^l$, i.e. there exists a constant $C < \infty$ such that $\|u^{(k)}\|_{\mathrm{BV}} \leqslant C\ \forall k \in \mathbb{N}$ [AFP00, 3.13]. For the weak* topology in BV, a compactness result holds [AFP00, 3.23]: if $(u^{(k)}) \subset \mathrm{BV}(\Omega)^l$ and $(u^{(k)})$ is uniformly bounded in $\mathrm{BV}(\Omega)^l$, then $(u^{(k)})$ contains a subsequence weakly*-converging to some $u \in \mathrm{BV}(\Omega)^l$.

2.4 General Functionals on BV

We will now review how general functionals depending on the distributional gradient $Du$ can be defined. Recall that for any $u \in \mathrm{BV}(\Omega)^l$ the distributional gradient $Du$ is a measure. Moreover, it can be uniquely decomposed into three mutually singular measures

Du = D^a u + D^j u + D^c u,   (22)

that is: an absolutely continuous part $D^a$, the jump part $D^j$, and the Cantor part $D^c$. Mutual singularity refers to the fact that $\Omega$ can be partitioned into three subsets such that each of the measures is concentrated on exactly one of the sets. We give a short intuitive explanation; see [AFP00, 3.91] for the full definitions. The $D^a$ part is absolutely continuous with respect to the $d$-dimensional Lebesgue measure $\mathcal{L}^d$, i.e. it vanishes on any $\mathcal{L}^d$-negligible set. It captures the "smooth" variations of $u$: in any neighborhood where $u$ has a (possibly weak) Jacobian $\mathcal{J}(u)$, the jump and Cantor parts vanish and

Du = D^a u = \mathcal{J}(u)\,\mathcal{L}^d.   (23)

The jump part $D^j$ is concentrated on the set of points where $u$ locally jumps between two values $u^-$ and $u^+$ along a $(d-1)$-dimensional hypersurface with normal $\nu_u \in S^{d-1}$ (unique up to a change of sign).
In fact, there exists a jump set $J_u$ of discontinuities of $u$ and Borel functions $u^+, u^-: J_u \to \mathbb{R}^l$ and $\nu_u: J_u \to S^{d-1}$ such that [AFP00, 3.78, 3.90]

D^j u = Du\llcorner J_u = \nu_u (u^+ - u^-)^\top\,\mathcal{H}^{d-1}\llcorner J_u,   (24)

where $\mathcal{H}^{d-1}\llcorner J_u$ denotes the restriction of the $(d-1)$-dimensional Hausdorff measure to the jump set $J_u$, i.e. $(\mathcal{H}^{d-1}\llcorner J_u)(A) = \mathcal{H}^{d-1}(J_u \cap A)$ for measurable sets $A$. The Cantor part $D^c$ captures anything that is left. As an important consequence of the mutual singularity, the total variation decomposes into $|Du| = |D^a u| + |D^j u| + |D^c u|$.

Using this idea, one can define functionals depending on the distributional gradient $Du$ [AFP00, Prop. 2.34]. For $u \in \mathrm{BV}(\Omega)^l$ and some convex, lower semicontinuous $\Psi: \mathbb{R}^{d\times l} \to \mathbb{R}$, define

J(u) := \int_\Omega \Psi(Du) := \int_\Omega \Psi(\mathcal{J}(u)(x))\,dx + \int_{J_u} \Psi^\infty\big(\nu_u(x)(u^+(x) - u^-(x))^\top\big)\,d\mathcal{H}^{d-1} + \int_\Omega \Psi^\infty\Big(\frac{D^c u}{|D^c u|}\Big)\,d|D^c u|.   (25)

Here $\Psi^\infty$ is the recession function $\Psi^\infty(x) = \lim_{t\to\infty} \Psi(tx)/t$ of $\Psi$, and $D^c u / |D^c u|$ denotes the polar decomposition of $D^c u$, which is the density of $D^c u$ with respect to its total variation measure $|D^c u|$. If $\Psi$ is positively homogeneous, $\Psi^\infty = \Psi$ holds, and

J(u) = \int_\Omega \Psi\Big(\frac{Du}{|Du|}\Big)\,d|Du|.   (26)

From (25) it becomes clear that the meaning of $\Psi$ acting on the Jacobian of $u$ is extended to the jump set as acting on the difference of the left and right side limits of $u$ at the discontinuity. This is a key point: by switching to the measure formulation, one can handle noncontinuous functions as well.

3 Necessary Properties of the Interaction Potential

Before applying the methods above to the labeling problem, we start with some basic considerations about the regularizer and the interaction potential $d$. We begin by formalizing the requirements on the regularizer of the relaxed problem as mentioned in the introduction. Let us assume we are given a general interaction potential $d: \{1,\dots,l\}^2 \to \mathbb{R}$. Intuitively, $d(i,j)$ assigns a weight to switching between label $i$ and label $j$. We require

d(i,j) > 0, \quad i \neq j,   (27)

but no other metric properties (i.e. symmetry or triangle inequality) for now. Within this work, we postulate that the regularizer should satisfy

(P1) $J$ is convex and positively homogeneous on $\mathrm{BV}(\Omega)^l$.

(P2) $J(u) = 0$ for any constant $u$, i.e. there is no penalty for constant labelings.

(P3) For any partition of $\Omega$ into two sets $S, S^\complement$ with $\mathrm{Per}(S) < \infty$, and any $i, j \in \{1,\dots,l\}$,

J(e^i \chi_S + e^j \chi_{S^\complement}) = d(i,j)\,\mathrm{Per}(S),   (28)

i.e. a change from label $i$ to label $j$ gets penalized proportionally to $d(i,j)$ as well as to the perimeter of the interface. Note that this implies that $J$ is isotropic (i.e. rotationally invariant).

We require convexity in (P1) in order to render global optimization tractable. Indeed, if $J$ is convex, the overall objective function (6) will be convex as well due to the linearization of the data term. Positive homogeneity is included as it allows $J$ to be represented as a support function (i.e. its convex conjugate is an indicator function and $J = \sigma_{\mathcal{D}}$ for some closed convex set $\mathcal{D}$), which will be exploited by our optimization methods. Requirements (P3) and (P2) formalize the principle that the multilabeling problem should reduce to the classical continuous cut (9) in the two-class case.
This also allows boundary length-based terms in the regularizer to be weighted depending on the labels of the adjoining regions (Fig. 4). Together, these requirements pose a natural restriction on $d$:

Proposition 1. Let $(J, d)$ satisfy (P1)–(P3). Then $d$ must necessarily be a metric, i.e. for all $i, j, k \in \{1,\dots,l\}$,

1. $d(i,i) = 0$,
2. $d(i,j) = d(j,i)$, $\forall i \neq j$,
3. $d$ is subadditive: $d(i,k) \leqslant d(i,j) + d(j,k)$.

Proof. 1. follows from (P2) and (P3) by choosing $i = j$ and $S$ with $\mathrm{Per}(S) > 0$. Symmetry in 2. is obtained from (P3) by replacing $S$ with $S^\complement$, as $\mathrm{Per}(S) = \mathrm{Per}(S^\complement)$. To show 3., first note that $J(u) = 2J(u/2 + c/2)$ for any constant $c \in \mathbb{R}^l$ and all $u \in \mathrm{BV}(\Omega)^l$, since $J(u) = J(u + c - c) \leqslant 2J((u+c)/2) + J(-c/2) = 2J(u/2 + c/2) \leqslant J(u) + J(c) = J(u)$. Fix any set $S$ with perimeter

c := \mathrm{Per}(S) > 0.   (29)

Then, using the above-mentioned fact and the positive homogeneity of $J$,

c\,d(i,k) = J(e^i \chi_S + e^k \chi_{S^\complement})   (30)
= 2J\big(\tfrac{1}{2}(e^i \chi_S + e^k \chi_{S^\complement}) + \tfrac{1}{2} e^j\big)   (31)
= 2J\big(\tfrac{1}{2}(e^i \chi_S + e^j \chi_{S^\complement}) + \tfrac{1}{2}(e^j \chi_S + e^k \chi_{S^\complement})\big)   (32)
\leqslant J(e^i \chi_S + e^j \chi_{S^\complement}) + J(e^j \chi_S + e^k \chi_{S^\complement})   (33)
= c\,(d(i,j) + d(j,k)).   (34)

Figure 4: Effect of choosing different interaction potentials. Top row: the original image (left) is segmented into 12 regions corresponding to prototypical color vectors. The Potts interaction potential penalizes the boundary length independently of the labels (right), which leads to a uniformly smooth segmentation. Bottom row: by modifying the interaction potential, the regularization strength is selectively adjusted to suppress background (left) or foreground (right) structures while allowing for fine details in the other regions.

Note that if the requirement (27) is dropped, it is easy to show that if $d(i,j) = 0$ for some $i \neq j$, then $d(i,k) = d(j,k)$ for any $k$. In this case the classes $i$ and $j$ can be collapsed into a single class as far as the regularizer is concerned. The decision between label $i$ and $j$ is then completely local, i.e. depends only on the data term, and can be postponed to a post-processing step by modifying the data term to

s'_i(x) := s'_j(x) := \min\{s_i(x), s_j(x)\}.   (35)

Thus (27) is not a real limitation and can always be assured. As a side note, it can be shown that, under some assumptions and in the space of piecewise constant functions, the subadditivity of $d$ already follows if $J$ is required to be lower semicontinuous [Bra02, p. 88].

Proposition 1 implies that for non-metric $d$, we generally cannot expect to find a regularizer satisfying (P1)–(P3). Note that here $J$ is not required to be of any particular form. In the following sections we will show that, on the other hand, if $d$ is a metric as in Proposition 1, such a regularizer can always be constructed. This implies that the interaction potentials allowing for a regularizer that satisfies (P1)–(P3) are exactly the metrics.
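Since Prop. 1 makes metricity a hard prerequisite, it is worth validating a candidate interaction potential before constructing a regularizer from it. A minimal numpy sketch, where the matrix encoding D[i, j] = d(i, j) is an illustrative assumption:

```python
import numpy as np

def is_metric(D, tol=1e-12):
    """Check that D[i, j] = d(i, j) satisfies the conditions of
    Prop. 1 plus the positivity requirement (27)."""
    l = D.shape[0]
    if np.any(np.abs(np.diag(D)) > tol):   # d(i, i) = 0
        return False
    if np.any(np.abs(D - D.T) > tol):      # symmetry
        return False
    if np.any(D + np.eye(l) <= 0):         # (27): d(i, j) > 0 for i != j
        return False
    # subadditivity: d(i, k) <= min_j d(i, j) + d(j, k)
    through = (D[:, :, None] + D[None, :, :]).min(axis=1)
    return bool(np.all(D <= through + tol))

# the truncated linear potential d(i, j) = min(2, |i - j|) is a metric:
i = np.arange(4)
D_trunc = np.minimum(2, np.abs(i[:, None] - i[None, :])).astype(float)
print(is_metric(D_trunc))  # True
```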
4 Constructing Regularizers from the Interaction Potential

We study next how to construct regularizers on $\mathrm{BV}(\Omega)^l$ satisfying (P1)–(P3). As in (25) we set

J(u) := \int_\Omega \Psi(Du).   (36)

We additionally require $\Psi: \mathbb{R}^{d\times l} \to \mathbb{R}_{\geqslant 0}$ to be a support function, i.e. for some closed, convex $\emptyset \neq \mathcal{D}_{\mathrm{loc}} \subseteq \mathbb{R}^{d\times l}$,

\Psi(z) = \sigma_{\mathcal{D}_{\mathrm{loc}}}(z) = \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \langle z, v\rangle.   (37)

As a support function, $\Psi$ coincides with its recession function $\Psi^\infty$, thus

J(u) = \int_\Omega \Psi(\mathcal{J}(u)(x))\,dx + \int_{J_u} \Psi\big(\nu_u(x)(u^+(x) - u^-(x))^\top\big)\,d\mathcal{H}^{d-1} + \int_\Omega \Psi\Big(\frac{D^c u}{|D^c u|}\Big)\,d|D^c u|.   (38)

Also, we have an equivalent dual formulation in analogy to (13),

J(u) = \sup\Big\{\int_\Omega \langle u, \operatorname{Div} v\rangle\,dx \;\Big|\; v \in C^\infty_c(\Omega)^{d\times l},\ v(x) \in \mathcal{D}_{\mathrm{loc}}\ \forall x \in \Omega\Big\}.   (39)

For simplicity we will also assume that $\mathcal{D}_{\mathrm{loc}}$ is rotationally invariant along the image dimensions, i.e. for any rotation matrix $R \in \mathbb{R}^{d\times d}$,

v = (v^1,\dots,v^l) \in \mathcal{D}_{\mathrm{loc}} \Leftrightarrow (Rv^1,\dots,Rv^l) \in \mathcal{D}_{\mathrm{loc}}.   (40)

This is equivalent to $J$ being isotropic. We will now show under which circumstances a minimizer exists in $\mathrm{BV}(\Omega)^l$, and then see how the approach can be used to construct regularizers for specific interaction potentials.

4.1 Existence of Minimizers

The complete problem considered here is of the form (cf. (6) and (38))

\inf_{u\in\mathcal{C}} f(u), \qquad f(u) := \int_\Omega \langle u, s\rangle\,dx + J(u),   (41)

where $J(u) = \int_\Omega \Psi(Du)$ as in (36),

\Psi(z) = \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \langle z, v\rangle   (42)

for some closed convex $\mathcal{D}_{\mathrm{loc}} \neq \emptyset$, and

\mathcal{C} = \{u \in \mathrm{BV}(\Omega)^l \mid u(x) \in \Delta_l\ \mathcal{L}^d\text{-a.e. } x \in \Omega\}.   (43)

Note that $f$ is convex, as it is the pointwise supremum of affine functions (39). Again, for simplicity we set $\Omega = (0,1)^d$. Then we have the following

Proposition 2. Let $\mathcal{D}_{\mathrm{loc}} \neq \emptyset$ be closed convex, $\Psi = \sigma_{\mathcal{D}_{\mathrm{loc}}}$, $s \in L^1(\Omega)^l$, and

f(u) = \int_\Omega \langle u, s\rangle\,dx + \int_\Omega \Psi(Du).   (44)

Additionally assume that $\mathcal{D}_{\mathrm{loc}} \subseteq B_{\rho_u}(0) \subseteq \mathbb{R}^{d\times l}$ for some $0 < \rho_u$. Then $f$ is lower semicontinuous in $\mathrm{BV}(\Omega)^l$ with respect to $L^1$ convergence.

Proof. As the data term is continuous, it suffices to show that the regularizer is lower semicontinuous. This is an application of [AFP00, Thm. 5.47]. In fact, the theorem shows that $f$ is the relaxation of $\tilde{f}: C^1(\Omega)^l \to \mathbb{R}$,

\tilde{f}(u) := \int_\Omega \langle u, s\rangle\,dx + \int_\Omega \Psi(\mathcal{J}u(x))\,dx,   (45)

on $\mathrm{BV}(\Omega)^l$ w.r.t. $L^1_{\mathrm{loc}}$ (thus here $L^1$) convergence, and thus lower semicontinuous in $\mathrm{BV}(\Omega)^l$. To apply the theorem, we have to show that $\Psi$ is quasiconvex in the sense of [AFP00, 5.25], which holds as it is convex by construction. The other precondition is (at most) linear growth of $\Psi$, which holds with $0 \leqslant \Psi(x) \leqslant \rho_u \|x\|$.

Proposition 3. Let $f$, $\Psi$, $s$ be as in Prop. 2 and additionally assume that

B_{\rho_l}(0) \cap \{(v^1,\dots,v^l) \mid \textstyle\sum_i v^i = 0\} \subseteq \mathcal{D}_{\mathrm{loc}} \subseteq B_{\rho_u}(0)   (46)

for some $0 < \rho_l \leqslant \rho_u$. Then the problem

\min_{u\in\mathcal{C}} f(u)   (47)

has at least one minimizer.

Proof. From the inner and outer bounds it holds that $\rho_l \|z\| \leqslant \Psi(z) \leqslant \rho_u \|z\|$ for any $z = (z^1|\dots|z^l) \in \mathbb{R}^{d\times l}$ with $z^1 + \dots + z^l = 0$. Moreover, the constraint $u \in \mathcal{C}$ implies that $Du = (Du^1|\dots|Du^l)$ satisfies $Du^1 + \dots + Du^l = 0$. Combining this with the positive homogeneity of $\Psi$, it follows from (26) that

0 \leqslant \rho_l\,\mathrm{TV}(u) \leqslant J(u) \leqslant \rho_u\,\mathrm{TV}(u).   (48)

From

\int_\Omega \langle u, s\rangle\,dx \geqslant -\int_\Omega \|u(x)\|_\infty \|s(x)\|_1\,dx,   (49)

the fact that $s \in L^1(\Omega)^l$, and boundedness of $\Omega$ and $\Delta_l$, it follows that the data term is bounded from below on $\mathcal{C}$. We now show coercivity of $f$ with respect to the BV norm: let $(u^{(k)}) \subset \mathcal{C}$ with $\|u^{(k)}\|_1 + \mathrm{TV}(u^{(k)}) \to \infty$. As the data term is bounded from below, and using the fact that $J(u^{(k)}) \geqslant \rho_l\,\mathrm{TV}(u^{(k)})$, it follows that $f(u^{(k)}) \to +\infty$. Thus $f$ is coercive. Equations (48) and (49) also show that $f$ is bounded from below. Thus we can choose a minimizing sequence $(u^{(k)})$.
Due to the coercivity, the sequence $\|u^{(k)}\|_1 + \mathrm{TV}(u^{(k)})$ must then be bounded from above. From this and [AFP00, Thm. 3.23] we conclude that there is a subsequence of $(u^{(k)})$ weakly*- (and thus $L^1$-) converging to some $u \in \mathrm{BV}(\Omega)^l$. With the lower semicontinuity from Prop. 2 and closedness of $\mathcal{C}$ with respect to $L^1$ convergence, existence of a minimizer follows.

4.2 Relation to the Interaction Potential

To relate such $J$ to the labeling problem in view of (P3), we have the following

Proposition 4. Let $\Psi = \sigma_{\mathcal{D}_{\mathrm{loc}}}$ and $J(u) = \int_\Omega \Psi(Du)$ as in Prop. 2. For some $u_0 \in \mathrm{BV}(\Omega)$ and vectors $a, b \in \Delta_l$, let $u(x) = (1 - u_0(x))a + u_0(x)b$. Then, for any vector $y \in \mathbb{R}^d$ with $\|y\| = 1$,

J(u) = \Psi(y(b-a)^\top)\,\mathrm{TV}(u_0) = \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \|v(b-a)\|\,\mathrm{TV}(u_0).   (50)

In particular, if $\Psi(y(e^i - e^j)^\top) = d(i,j)$ for some $y$ with $\|y\| = 1$, then $J$ fulfills (P3).

Proof. To show the first equality, (26) implies

J(u) = \int_\Omega \Psi\Big(\frac{Du}{|Du|}\Big)\,|Du|   (51)
= \int_\Omega \Psi\Big(\frac{D(a + u_0(b-a))}{|D(a + u_0(b-a))|}\Big)\,|D(a + u_0(b-a))|   (52)
= \int_\Omega \Psi\Big(\frac{(Du_0)(b-a)^\top}{|(Du_0)(b-a)^\top|}\Big)\,|(Du_0)(b-a)^\top|.   (53)

We make use of the property $|(Du_0)(b-a)^\top| = |Du_0|\,\|b-a\|$, which is a direct consequence of the definition of the total variation measure and the fact that $\|w(b-a)^\top\| = \|w\|\,\|b-a\|$ for any vector $w \in \mathbb{R}^d$ (note that $a, b \in \mathbb{R}^l$ are also vectors). Therefore

J(u) = \int_\Omega \Psi\Big(\frac{(Du_0)(b-a)^\top}{|Du_0|\,\|b-a\|}\Big)\,|Du_0|\,\|b-a\|,   (54)

which by positive homogeneity of $\Psi$ implies

J(u) = \int_\Omega \Psi\Big(\frac{Du_0}{|Du_0|}(b-a)^\top\Big)\,|Du_0|.   (55)

Since the density function $Du_0/|Du_0|$ assumes values in $S^{d-1}$, there exists, for a.e. $x \in \Omega$, a rotation matrix mapping $(Du_0/|Du_0|)(x)$ to $y$. Together with the rotational invariance of $\mathcal{D}_{\mathrm{loc}}$ from (40), this implies

J(u) = \int_\Omega \Psi(y(b-a)^\top)\,|Du_0| = \Psi(y(b-a)^\top)\,\mathrm{TV}(u_0),   (56)

which proves the first equality in (50). The second equality can be seen as follows:

r := \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \|v(b-a)\|   (57)
= \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \sup_{z\in\mathbb{R}^d, \|z\|\leqslant 1} \langle z, v(b-a)\rangle   (58)
= \sup_{z\in\mathbb{R}^d, \|z\|\leqslant 1} \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \langle z, v(b-a)\rangle.   (59)

Denote by $R_z$ a rotation matrix mapping $z$ to $y$, i.e. $R_z z = y$; then

r = \sup_{z\in\mathbb{R}^d, \|z\|\leqslant 1} \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \langle R_z z, R_z v(b-a)\rangle   (60)
= \sup_{z\in\mathbb{R}^d, \|z\|\leqslant 1} \sup_{v'\in R_z\mathcal{D}_{\mathrm{loc}}} \langle y, v'(b-a)\rangle.   (61)

The rotational invariance of $\mathcal{D}_{\mathrm{loc}}$ provides $R_z\mathcal{D}_{\mathrm{loc}} = \mathcal{D}_{\mathrm{loc}}$, therefore

r = \sup_{z\in\mathbb{R}^d, \|z\|\leqslant 1} \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \langle y, v(b-a)\rangle   (62)
= \sup_{v\in\mathcal{D}_{\mathrm{loc}}} \langle y, v(b-a)\rangle = \Psi(y(b-a)^\top).   (63)

As a consequence, if the relaxed multiclass formulation is restricted to two classes by parametrizing $u = (1 - u_0)a + u_0 b$ for $u_0(x) \in [0,1]$, it essentially reduces to the scalar continuous cut problem: solving

\min_{u_0\in\mathrm{BV}(\Omega),\ u_0(x)\in[0,1]\ \mathcal{L}^d\text{-a.e.}} \int_\Omega \langle (1-u_0)a + u_0 b, s\rangle\,dx + J(u)   (64)

is equivalent to solving

\min_{u_0\in\mathrm{BV}(\Omega),\ u_0(x)\in[0,1]\ \mathcal{L}^d\text{-a.e.}} \int_\Omega u_0\,\langle s, b-a\rangle\,dx + \Psi(y(b-a)^\top)\,\mathrm{TV}(u_0),   (65)

which is just the classical binary continuous cut approach with data $\langle s, b-a\rangle$ and regularizer weight $\Psi(y(b-a)^\top)$, where $y \in \mathbb{R}^d$ is some arbitrary unit vector.
For the multiclass case, assume that

u = u_P = e^1 \chi_{P_1} + \dots + e^l \chi_{P_l}   (66)

for some partition $P_1 \cup \dots \cup P_l = \Omega$ with $\mathrm{Per}(P_i) < \infty$, $i = 1,\dots,l$. Then the absolutely continuous and Cantor parts vanish [AFP00, Thm. 3.59, Thm. 3.84, Rem. 4.22], and only the jump part remains:

J(u_P) = \int_{S_{u_P}} \Psi\big(\nu_{u_P}(u_P^+ - u_P^-)^\top\big)\,d\mathcal{H}^{d-1},   (67)

where $S_{u_P} = \bigcup_{i=1,\dots,l} \partial P_i$ is the union of the interfaces between regions. Define $i(x), j(x)$ such that $u_P^+(x) = e^{i(x)}$ and $u_P^-(x) = e^{j(x)}$. Then

J(u_P) = \int_{S_{u_P}} \Psi\big(\nu_{u_P}(e^{i(x)} - e^{j(x)})^\top\big)\,d\mathcal{H}^{d-1}.   (68)

By rotational invariance,

J(u_P) = \int_{S_{u_P}} \Psi\big(y(e^{i(x)} - e^{j(x)})^\top\big)\,d\mathcal{H}^{d-1}   (69)

for some $y$ with $\|y\| = 1$. Thus the regularizer locally penalizes jumps between labels $i$ and $j$ along an interface with the interface length, multiplied by the factor $\Psi(y(e^i - e^j)^\top)$ depending on the labels of the adjoining regions.

The question is how to choose the set $\mathcal{D}_{\mathrm{loc}}$ such that $\Psi(y(e^i - e^j)^\top) = d(i,j)$ for a given interaction potential $d$. We will consider two approaches which differ with respect to expressiveness and simplicity of use: in the local envelope approach (Sect. 4.3), $\mathcal{D}_{\mathrm{loc}}$ is chosen as large as possible. In turn, $J$ is as large as possible with the integral formulation (36). This prevents introducing artificial minima generated by the relaxation, and potentially keeps minimizers of the relaxed problem close to minimizers of the discrete problem. However, $\mathcal{D}_{\mathrm{loc}}$ is only implicitly defined, which complicates optimization. In contrast, in the embedding approach (Sect. 4.4), $\mathcal{D}_{\mathrm{loc}}$ is simpler, at the cost of being able to represent only a subset of all metric potentials $d$. For an illustration of the two approaches, see Fig. 5.

Figure 5: Illustration of the set $\mathcal{D}_{\mathrm{loc}}$ used to build the regularizer for the uniform distance, $d(i,j) = 0$ iff $i = j$ and $d(i,j) = 1$ otherwise, for $l = 3$ and $d = 1$, i.e. in a one-dimensional space. Shown is a cut through the $z_1 + z_2 + z_3 = 0$ plane; the labels correspond to the points $e^i - e^j$ with $i \neq j$. The local envelope method leads to a larger set $\mathcal{D}_{\mathrm{loc}}$ (dashed) than the Euclidean metric method (solid). This improves the tightness of the relaxation, but requires more expensive projection steps.

4.3 Local Envelope Method

Chambolle et al. [CCP08] proposed an interesting approach for potentials $d$ of the form $d(i,j) = \gamma(|i-j|)$ for a positive concave function $\gamma$. The approach is derived by specifying the value of $J$ on discrete $u$ only and then constructing an approximation of the convex envelope by pulling the convexification into the integral. This potentially generates tight extensions, and thus one may hope that the convexification process does not generate too many artificial non-discrete solutions. We propose to extend this approach to arbitrary metric $d$ by setting (Fig. 5)

\mathcal{D}^d_{\mathrm{loc}} := \bigcap_{i\neq j} \{v = (v^1,\dots,v^l) \in \mathbb{R}^{d\times l} \mid \|v^i - v^j\| \leqslant d(i,j),\ \textstyle\sum_k v^k = 0\}   (70)

for some given interface potential $d(i,j)$. By definition, $\mathcal{D}^d_{\mathrm{loc}}$ is rotationally invariant, and by the considerations in Sect. 3 we assume $d$ to be a metric. Then the inner and outer bound assumptions required for existence of a minimizer in Prop. 3 are satisfied. Moreover, $d$ can be reconstructed from $\mathcal{D}^d_{\mathrm{loc}}$:

Proposition 5. Let $d: \{1,\dots,l\}^2 \to \mathbb{R}_{\geqslant 0}$ be a metric. Then, for any $i, j$,

\sup_{v\in\mathcal{D}^d_{\mathrm{loc}}} (v^i_1 - v^j_1) = d(i,j).   (71)
Proof. "$\leqslant$" follows from the definition (70). "$\geqslant$" can be shown using a network flow argument: we have

\sup_{v\in\mathcal{D}^d_{\mathrm{loc}}} (v^i_1 - v^j_1)   (72)
\geqslant \sup\{p_i - p_j \mid p \in \mathbb{R}^l,\ \textstyle\sum_k p_k = 0,\ \forall i', j': p_{i'} - p_{j'} \leqslant d(i',j')\}   (73)
\stackrel{(*)}{=} \sup\{p_i - p_j \mid p \in \mathbb{R}^l,\ \forall i', j': p_{i'} - p_{j'} \leqslant d(i',j')\}   (74)
\stackrel{(**)}{=} d(i,j).   (75)

Equality $(*)$ holds since each $p$ in the set in (74) can be associated with $\tilde{p} := p - \frac{1}{l}\sum_k p_k$, which is contained in the set in (73) and satisfies $p_i - p_j = \tilde{p}_i - \tilde{p}_j$. The last equality $(**)$ follows from [Mur03, 5.1] with the notation $\gamma = d$ (and $\bar{\gamma} = d$, since $d$ is a metric and therefore the triangle inequality implies that the length of the shortest path from $i$ to $j$ is always $d(i,j)$).

The final result of this section is the following:

Proposition 6. Let $d: \{1,\dots,l\}^2 \to \mathbb{R}_{\geqslant 0}$ be a metric. Define $\mathcal{D}_{\mathrm{loc}} := \mathcal{D}^d_{\mathrm{loc}}$ as in (70), $\Psi_d := \sigma_{\mathcal{D}^d_{\mathrm{loc}}}$, and

J_d := \int_\Omega \Psi_d(Du)   (76)

as in (38). Then $J_d$ satisfies (P1)–(P3).

Proof. (P1) and (P2) are clear from the definition of $J_d$. (P3) follows directly from Prop. 5 and Prop. 4 with $y = e^1$.

Defining $\mathcal{D}^d_{\mathrm{loc}}$ as in (70) provides us with a way to extend the desired regularizer for any metric $d$ to non-discrete $u \in \mathcal{C}$ via (38). The price to pay is that there is no simple closed expression for $\Psi$, and thus for $J_d$, which potentially complicates optimization. Note that in order to define $\mathcal{D}^d_{\mathrm{loc}}$, $d$ does not have to be a metric. However, Prop. 5 then does not hold in general, so $J$ is not a true extension of the desired regularizer, although it still provides a lower bound.
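Because D^d_loc is only given as an intersection of pairwise constraints, projecting onto it – as the algorithms in Sect. 6 require (cf. Sect. 5.3) – is typically done with Dykstra's algorithm. The following numpy sketch is a minimal illustration under stated assumptions (fixed iteration count, pairwise projection of the columns (v^i, v^j) onto {‖v^i − v^j‖ ≤ d(i,j)}); it is not the paper's implementation.

```python
import numpy as np

def project_pair(v, i, j, dij):
    """Euclidean projection enforcing ||v[:, i] - v[:, j]|| <= dij by
    moving both columns symmetrically toward their midpoint."""
    w = v.copy()
    diff = w[:, i] - w[:, j]
    nrm = np.linalg.norm(diff)
    if nrm > dij:
        shift = 0.5 * (1.0 - dij / nrm) * diff
        w[:, i] -= shift
        w[:, j] += shift
    return w

def project_envelope(v, dmat, iters=50):
    """Approximate projection of v in R^{d x l} onto D^d_loc (70) by
    Dykstra's algorithm over the zero-sum and pairwise constraints;
    dmat[i, j] holds the metric d(i, j)."""
    l = v.shape[1]
    constraints = [("sum",)] + [("pair", i, j)
                                for i in range(l) for j in range(i + 1, l)]
    x = v.copy()
    corr = {c: np.zeros_like(v) for c in constraints}  # Dykstra corrections
    for _ in range(iters):
        for c in constraints:
            y = x + corr[c]
            if c[0] == "sum":   # project onto { sum_k v^k = 0 }
                x = y - y.mean(axis=1, keepdims=True)
            else:
                x = project_pair(y, c[1], c[2], dmat[c[1], c[2]])
            corr[c] = y - x
    return x
```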
4.4 Euclidean Metric Method

In this section, we consider a regularizer which is less powerful but more efficient to evaluate. The classical total variation for vector-valued $u$ as defined in (13) is

\mathrm{TV}(u) = \int_\Omega \|Du\|.   (77)

This classical definition has also been used in color denoising and is also referred to as MTV [SR96, DAV08]. We propose to extend this definition by choosing an embedding matrix $A \in \mathbb{R}^{k\times l}$ for some $k \leqslant l$, and defining

J_A(u) := \mathrm{TV}(Au).   (78)

This corresponds to replacing the Frobenius matrix norm on the distributional gradient with a linearly weighted variant. In the framework of (38), it amounts to setting $\mathcal{D}_{\mathrm{loc}} = \mathcal{D}^A_{\mathrm{loc}}$ (cf. Fig. 5) with

\mathcal{D}^A_{\mathrm{loc}} := \{v'A \mid v' \in \mathbb{R}^{d\times k},\ \|v'\| \leqslant 1\} = B_1(0)A.   (79)

Clearly, $0 \in \mathcal{D}^A_{\mathrm{loc}}$ and

\Psi(z) = \sigma_{\mathcal{D}^A_{\mathrm{loc}}}(z) = \sup_{v'\in B_1(0)A} \langle z, v'\rangle = \sup_{v\in B_1(0)} \langle z, vA\rangle   (80)
= \sup_{v\in B_1(0)} \langle zA^\top, v\rangle = \|zA^\top\|.   (81)

In particular, we formally have

\Psi(Du) = \|(Du)A^\top\| = \|D(Au)\|,   (82)

as $u \mapsto Du$ is linear in $u$. To clarify the definition, we may rewrite this as

\mathrm{TV}_A(u) = \int_\Omega \sqrt{\|D_1 u\|_A^2 + \dots + \|D_d u\|_A^2},   (83)

where $\|v\|_A := (v^\top A^\top A v)^{1/2}$. In this sense, the approach can be understood as replacing the Euclidean norm by a linearly weighted variant. It remains to show for which interaction potentials $d$ assumption (P3) can be satisfied. The next proposition shows that this is possible for the class of Euclidean metrics.

Proposition 7. Let $d$ be a Euclidean metric, i.e. there exist $k \in \mathbb{N}$ as well as $a^1,\dots,a^l \in \mathbb{R}^k$ such that $d(i,j) = \|a^i - a^j\|$. Then, for $A = (a^1|\dots|a^l)$, $J_A := \mathrm{TV}_A$ satisfies (P1)–(P3).

Proof. (P1) and (P2) are clearly satisfied. In order to show (P3), we apply Prop. 4 and assume $\|y\| = 1$ to obtain with (81)

\Psi(y(e^i - e^j)^\top) = \|y(e^i - e^j)^\top A^\top\| = \|y(a^i - a^j)^\top\| = \|a^i - a^j\|.   (84)

The class of Euclidean metrics comprises some important special cases:

• The uniform, discrete or Potts metric, as also considered in [ZGFN08, LKY+09] and as a special case in [KT99, KT07]. Here $d(i,j) = 0$ iff $i = j$ and $d(i,j) = 1$ in any other case, which corresponds to $A = (1/\sqrt{2})I$.

• The linear (label) metric, $d(i,j) = c|i-j|$, with $A = (c, 2c, \dots, lc)$. This regularizer is suitable for problems where the labels can be naturally ordered, e.g. depth from stereo or grayscale image denoising.

• More generally, if label $i$ corresponds to a prototypical vector $z^i$ in $k$-dimensional feature space, and the Euclidean norm is an appropriate metric on the features, it is natural to set $d(i,j) = \|z^i - z^j\|$, which is Euclidean by construction. This corresponds to a regularization in feature space, rather than in "label space".

Note that the boundedness assumption involving $\rho_l$ required for the existence proof (Prop. 3) is only fulfilled if the kernel of $A$ is sufficiently small, i.e. $\ker A \subseteq \{te \mid t \in \mathbb{R}\}$, with $e = (1,\dots,1)^\top \in \mathbb{R}^l$: otherwise, $\mathcal{D}^A_{\mathrm{loc}} = (B_1(0)A) \cap \{(v^1,\dots,v^l) \mid \sum_i v^i = 0\}$ is contained in a subspace of dimension at most $(l-2)d$, and (46) cannot be satisfied for any $\rho_l > 0$. Thus, if $d$ is a degenerate Euclidean metric which can be represented by an embedding into a lower-dimensional space, as is the case with the linear metric, it has to be regularized for the existence result in Prop. 3 to hold. This can for example be achieved by choosing an orthogonal basis $(b^1,\dots,b^j)$ of $\ker A$, where $j = \dim\ker A$, and substituting $A$ with $A' := (A^\top, \varepsilon b^1, \dots, \varepsilon b^j)^\top$, enlarging $k$ as required. However, these observations are mostly of theoretical interest, as the existence of minimizers for the discretized problem follows already from compactness of the (finite-dimensional) discretized constraint set.

Non-Euclidean $d$, such as the truncated linear metric, $d(i,j) = \min\{2, |i-j|\}$, cannot be represented exactly by $\mathrm{TV}_A$. In the following we will demonstrate how to construct approximations for these cases. Assume that $d$ is an arbitrary metric with squared matrix representation $D \in \mathbb{R}^{l\times l}$, i.e. $D_{ij} = d(i,j)^2$. Then it is known [BG05] that $d$ is Euclidean iff, for $C := I - \frac{1}{l}ee^\top$, the matrix $T := -\frac{1}{2}CDC$ is positive semidefinite. In this case, $D$ is called a Euclidean distance matrix, and $A$ can be found by factorizing $T = A^\top A$. If $T$ is not positive semidefinite, setting the negative eigenvalues of $T$ to zero yields a Euclidean approximation. This method is known as classical scaling [BG05] and does not necessarily give good absolute error bounds.
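A minimal numpy sketch of this classical scaling step; the truncated linear metric below is the illustrative input, and the rank choice is left to the caller.

```python
import numpy as np

def classical_scaling(d_matrix, k=None):
    """Compute an embedding A in R^{k x l} with ||a^i - a^j|| ~ d(i, j)
    by classical scaling: factorize T = -1/2 C D C after clipping
    negative eigenvalues to zero (D holds the squared distances)."""
    l = d_matrix.shape[0]
    D = d_matrix ** 2
    C = np.eye(l) - np.ones((l, l)) / l      # centering matrix
    T = -0.5 * C @ D @ C
    lam, V = np.linalg.eigh(T)               # T is symmetric
    lam = np.clip(lam, 0.0, None)            # Euclidean approximation
    A = np.diag(np.sqrt(lam)) @ V.T          # then T ~= A^T A
    if k is not None:                        # optionally keep only the
        order = np.argsort(lam)[::-1]        # k largest components
        A = A[order[:k], :]
    return A

# truncated linear metric for 4 labels, d(i, j) = min(2, |i - j|):
i = np.arange(4)
d_tl = np.minimum(2, np.abs(i[:, None] - i[None, :])).astype(float)
A = classical_scaling(d_tl, k=3)
approx = np.linalg.norm(A[:, :, None] - A[:, None, :], axis=0)
print(np.abs(approx - d_tl).max())  # maximum distance error, cf. eps_E
```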
More generally, for some non-metric, nonnegative $d$, we can formulate the problem of finding the "closest" Euclidean distance matrix $E$ as the minimization problem of a matrix norm $\|E - D\|_M$ over all $E \in \mathcal{E}_l$, the set of $l\times l$ Euclidean distance matrices. Fortunately, there is a linear bijection $B: \mathcal{P}_{l-1} \to \mathcal{E}_l$ between $\mathcal{E}_l$ and the space $\mathcal{P}_{l-1}$ of positive semidefinite $(l-1)\times(l-1)$ matrices [Gow85, JT95]. This allows us to rewrite our problem as a semidefinite program [WSV00, pp. 534–541]:

\inf_{S\in\mathcal{P}_{l-1}} \|B(S) - D\|_M.   (85)

Problem (85) can be solved using available SDP solvers. Then $E = B(S) \in \mathcal{E}_l$, and $A$ can be extracted by factorizing $-\frac{1}{2}CEC$. Since both $E$ and $D$ are explicitly known, $\varepsilon_E := \max_{i,j} |(E_{ij})^{1/2} - (D_{ij})^{1/2}|$ can be explicitly computed and provides an a posteriori bound on the maximum distance error.

Figure 6: Euclidean embeddings into $\mathbb{R}^3$ for several interaction potentials with four classes. Left to right: Potts; linear metric; non-Euclidean truncated linear metric. The vertices correspond to the columns $a^1,\dots,a^l$ of the embedding matrix $A$. For the truncated linear metric an optimal approximate embedding was computed as outlined in Sect. 4.4 with the matrix norm $\|X\|_M := \max_{i,j} |X_{ij}|$.

Fig. 6 shows a visualization of some embeddings for a four-class problem. In many cases, in particular when the number of classes is large, the Euclidean embedding provides a good approximation for non-Euclidean metrics (Fig. 7).

Figure 7: Euclidean approximation of the non-Euclidean truncated linear metric with interaction potential $d(i,j) = \frac{\sqrt{2}}{8}\min\{|i-j|, 16\}$. Left to right: original potential for 64 classes; potential after Euclidean approximation. Bottom: cross section of the original (dashed) and approximated (solid) metric at $i = 1$. The approximation was computed using semidefinite programming as outlined in Sect. 4.4. It represents the closest Euclidean metric with respect to the matrix norm $\|X - Y\|_M := \sum_{i,j} |X_{ij} - Y_{ij}|$. The maximal error with respect to the original potential is $\varepsilon_E = 0.2720$.

Based on the embedding matrices computed in this way, the Euclidean distance approach can be used to solve approximations of the labeling problem with arbitrary metric interaction potentials, with the advantage of having a closed expression for the regularizer.

5 Discretized Problem

5.1 Saddle Point Formulation

We now turn to solving the discretization of the relaxed problem (41). In order to formulate generic algorithms, we study the bilinear saddle point problem

\min_{u\in\mathcal{C}} \max_{v\in\mathcal{D}} g(u,v), \qquad g(u,v) := \langle u, s\rangle + \langle Lu, v\rangle - \langle b, v\rangle.   (86)

As will be shown in Sect. 5.2–5.4, this covers both $J_d$ (76) and $J_A$ (78) as well as many other – even non-uniform and non-isotropic – regularizers. In a slight abuse of notation, we will denote by $u, s \in \mathbb{R}^n$ also the discretizations of $u$ and $s$ on a uniform grid. Furthermore, we require a linear operator $L \in \mathbb{R}^{m\times n}$, a vector $b \in \mathbb{R}^m$ for some $m, n \in \mathbb{N}$, and bounded closed convex sets $\mathcal{C} \subseteq \mathbb{R}^n$, $\mathcal{D} \subseteq \mathbb{R}^m$. Intuitively, $L$ discretizes the gradient operator and $\mathcal{D}$ corresponds to $\mathcal{D}_{\mathrm{loc}}$, i.e. specifies $\Psi$ in a dual formulation. The primal and dual objectives are

f(u) := \max_{v\in\mathcal{D}} g(u,v) \quad\text{and}\quad f_D(v) := \min_{u\in\mathcal{C}} g(u,v),   (87)

respectively. The dual problem then consists of maximizing $f_D(v)$ over $\mathcal{D}$. As $\mathcal{C}$ and $\mathcal{D}$ are bounded, it follows from [Roc70, Cor. 37.6.2] that a saddle point $(u^*, v^*)$ of $g$ exists. With [Roc70, Lemma 36.2], this implies strong duality, i.e.

\min_{u\in\mathcal{C}} f(u) = f(u^*) = g(u^*, v^*) = f_D(v^*) = \max_{v\in\mathcal{D}} f_D(v).   (88)

In our case, $\mathcal{C}$ and $\mathcal{D}$ exhibit a specific product structure, which allows computing $f_D$ as well as the orthogonal projections $\Pi_{\mathcal{C}}$ and $\Pi_{\mathcal{D}}$ efficiently, a fact that will prove important in the algorithmic part. The evaluation of the primal objective $f$, on the other hand, can be more difficult, depending on the definition of $\mathcal{D}_{\mathrm{loc}}$ resp. $\mathcal{D}$, but is not required by the optimizer.
Note that in the discrete framework, we may easily substitute non-homogeneous, spatially varying or even nonlocal regularizers by choosing $L$ and $b$ appropriately.

5.2 Discretization

We discretize $\Omega$ by a regular grid, $\Omega = \{1,\dots,n_1\} \times \dots \times \{1,\dots,n_d\} \subseteq \mathbb{R}^d$ for $d \in \mathbb{N}$, consisting of $n := |\Omega|$ pixels $(x^1,\dots,x^n)$ in lexicographical order. We represent $u$ by its values $u = (u^1|\dots|u^l) \in \mathbb{R}^{n\times l}$ at these points, and denote by $u_{x^j} \in \mathbb{R}^l$, $j = 1,\dots,n$, the $j$-th row of $u$, corresponding to $u(x^j)$. The multidimensional image space $\mathbb{R}^{n\times l}$ is equipped with the Euclidean inner product $\langle\cdot,\cdot\rangle$ over the vectorized elements. We identify $u \in \mathbb{R}^{n\times l}$ with the vector in $\mathbb{R}^{nl}$ obtained by concatenating the columns.

Let $\mathrm{grad} := (\mathrm{grad}_1^\top|\dots|\mathrm{grad}_d^\top)^\top$ be the $d$-dimensional forward differences gradient operator for Neumann boundary conditions. Then $\mathrm{div} := -\mathrm{grad}^\top$ is the backward differences divergence operator for Dirichlet boundary conditions. These operators extend to $\mathbb{R}^{n\times l}$ via $\mathrm{Grad} := (I_l \otimes \mathrm{grad})$, $\mathrm{Div} := (I_l \otimes \mathrm{div})$. We define, for some $k \geqslant 1$ as will be specified below, the convex sets

\mathcal{C} := \{u \in \mathbb{R}^{n\times l} \mid u_{x^j} \in \Delta_l,\ j = 1,\dots,n\},   (89)
\mathcal{D} := \prod_{x\in\Omega} \mathcal{D}_{\mathrm{loc}} \subseteq \mathbb{R}^{n\times d\times k}.   (90)

The set $\mathcal{D}_{\mathrm{loc}}$ and the operator $L$ depend on the chosen regularizer. Note that for $L := \mathrm{Grad}$, $k = l$ and

\mathcal{D}_{\mathrm{loc}} := \mathcal{D}^I_{\mathrm{loc}} := \{p = (p^1|\dots|p^l) \in \mathbb{R}^{d\times l} \mid \|p\| \leqslant 1\},   (91)

the primal objective in the saddle point formulation discretizes the classical vector-valued total variation,

\mathrm{TV}(u) := \sigma_{\mathrm{Div}\,\mathcal{D}}(u) = \sigma_{\mathcal{D}}(\mathrm{Grad}\,u) = \sum_{j=1}^n \|G_{x^j} u\|,   (92)

where each of the $G_{x^j}$ is an $(ld)\times(nl)$ matrix composed of rows of $\mathrm{Grad}$ such that $G_{x^j} u$ constitutes the vectorized discrete Jacobian of $u$ at the point $x^j$.

Projections onto the set $\mathcal{C}$ are highly separable and can be computed exactly in a finite number of steps [Mic86, Alg. 4]; see Alg. 1 for a summary of the pointwise operation.

Algorithm 1 Projection of $y \in \mathbb{R}^l$ onto the standard simplex $\Delta_l$
1: $Z^{(0)} \leftarrow \emptyset$, $y^{(0)} \leftarrow y \in \mathbb{R}^l$, $k \leftarrow 0$.
2: repeat
3:   $\tilde{y}^{(k+1)}_i \leftarrow \begin{cases} 0, & i \in Z^{(k)}, \\ y^{(k)}_i - \frac{e^\top y^{(k)} - 1}{l - |Z^{(k)}|}, & \text{otherwise}, \end{cases} \quad (i = 1,\dots,l)$.
4:   $Z^{(k+1)} \leftarrow \{i \in \{1,\dots,l\} \mid i \in Z^{(k)} \text{ or } \tilde{y}^{(k+1)}_i < 0\}$.
5:   $y^{(k+1)}_i \leftarrow \max\{\tilde{y}^{(k+1)}_i, 0\}$ $(i = 1,\dots,l)$.
6:   $k \leftarrow k + 1$.
7: until $\tilde{y}^{(k)} \geqslant 0$.

The dual objective is

f_D(v) = -\langle b, v\rangle + \min_{u\in\mathcal{C}} \langle u, L^\top v + s\rangle.   (93)

Since the minimum decouples spatially, and $\min_{y\in\Delta_l} \langle y, z\rangle = \operatorname{vecmin}(z) := \min_i z_i$ for all $z \in \mathbb{R}^l$, the dual objective can always be evaluated by summing, over all points $x \in \Omega$, the minima over the components of $L^\top v + s$ corresponding to each point, denoted by $(L^\top v + s)_{x^j}$:

f_D(v) = -\langle b, v\rangle + \sum_{j=1}^n \operatorname{vecmin}\big((L^\top v + s)_{x^j}\big).   (94)

This fact is helpful if evaluating the primal objective is costly because the dual set $\mathcal{D}_{\mathrm{loc}}$ has a complicated structure.
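A direct numpy transcription of the pointwise simplex projection of Alg. 1, used to compute Π_C, might look as follows (a minimal per-pixel sketch; in practice one would vectorize it over all pixels):

```python
import numpy as np

def project_simplex(y):
    """Projection of y in R^l onto the standard simplex (Alg. 1, [Mic86]):
    repeatedly shift the active components to enforce sum = 1, then clamp
    negative entries to zero and deactivate them."""
    y = np.asarray(y, dtype=float).copy()
    active = np.ones(len(y), dtype=bool)
    while True:
        shift = (y[active].sum() - 1.0) / active.sum()
        y_t = np.where(active, y - shift, 0.0)
        if np.all(y_t >= 0.0):
            return y_t
        active &= (y_t >= 0.0)
        y = np.maximum(y_t, 0.0)

print(project_simplex([0.4, 0.8, -0.3]))  # -> [0.3, 0.7, 0.0]
```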
5.3 Specialization for the Local Envelope Method

For a metric interaction potential $d: \{1,\dots,l\}^2 \to \mathbb{R}_{\geqslant 0}$, we set $k := l$, $L := \mathrm{Grad}$, and

\mathcal{D}_{\mathrm{loc}} := \mathcal{D}^d_{\mathrm{loc}} = \bigcap_{i\neq j} \{v = (v^1|\dots|v^l) \in \mathbb{R}^{d\times l} \mid \|v^i - v^j\| \leqslant d(i,j)\}   (95)

as in (70). We arrive at the saddle point form (86) with $\mathcal{C}$, $\mathcal{D}$, and $L$ defined as above, $m = nl$ and $b = 0$. However, due to the generality of the regularizer (cf. Prop. 6), the primal objective $f$ cannot be easily computed anymore (for the special case of three labels there is a derivation in [CCP08]). Projections onto $\mathcal{D}$ can be computed for all $x \in \Omega$ in parallel, while projections onto the individual sets $\mathcal{D}^d_{\mathrm{loc}}$ can be computed by the Dykstra algorithm, cf. [CCP08] and Sect. 6.4.

5.4 Specialization for the Euclidean Metric Method

For $A \in \mathbb{R}^{k\times l}$ as in (78), the Euclidean regularizer can be obtained by setting

L := (\mathrm{Grad})(A \otimes I_n), \qquad \mathcal{D}_{\mathrm{loc}} := \mathcal{D}^I_{\mathrm{loc}} = \{v = (v^1|\dots|v^k) \in \mathbb{R}^{d\times k} \mid \|v\| \leqslant 1\}.   (96)

Departing from the definition (79) of $\mathcal{D}^A_{\mathrm{loc}}$, $A$ is merged into $L$, as the optimization method will rely on projections onto $\mathcal{D}_{\mathrm{loc}}$. Including $A$ in $\mathcal{D}_{\mathrm{loc}}$ instead, i.e. setting

\mathcal{D}_{\mathrm{loc}} := (A \otimes I_n)^\top \mathcal{D}^I_{\mathrm{loc}}   (97)

and $L := \mathrm{Grad}$, would prevent computing the projection in closed form. Projecting onto the unit ball $\mathcal{D}^I_{\mathrm{loc}}$, on the other hand, is trivial. The discretized regularizer can be explicitly evaluated, since

\Psi(z) = \|zA^\top\|.   (98)

Comparison with (92) yields

J_A(u) = \mathrm{TV}((A \otimes I_n)u).   (99)

We finally arrive at the form (86) with $\mathcal{C}$, $\mathcal{D}$, and $L$ defined as above, $m = nk$ and $b = 0$. Projections onto $\mathcal{D}$ are highly separable and thus can be computed easily. The primal objective can be evaluated in closed form using (99).

5.5 Optimality

If $f_D$ and $f$ can be explicitly computed, any $v \in \mathcal{D}$ provides an optimality bound on the primal objective, with respect to the optimal value $f(u^*)$, via the numerical primal-dual gap $f(u) - f_D(v)$:

0 \leqslant f(u) - f(u^*) \leqslant f(u) - f_D(v).   (100)

Assuming $f$ and $f_D$ can be evaluated, the gap is a convenient stopping criterion. To improve the scale invariance, it is often practical to stop on the relative gap

\frac{f(u) - f_D(v)}{f_D(v)}   (101)

instead, which gives a similar bound. However, convergence in the objective alone does not necessarily imply convergence in $u$, as the minimizer of the original problem (86) is generally not unique. This stands in contrast to the well-known ROF-type problems [ROF92], where uniqueness is induced by the quadratic data term.
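As an illustration of (94) and (101), a small numpy sketch for evaluating the dual objective and the relative gap; the sparse operator L, the offset b, the column-stacked layout of s and v, and an externally supplied primal value are assumptions of this sketch.

```python
import numpy as np

def dual_objective(v, L, s, b, n, l):
    """f_D(v) per (94): since the minimum over the simplex of <y, z>
    is min_i z_i, sum the componentwise minima of (L^T v + s) at
    each pixel (column-stacked layout, cf. Sect. 5.2)."""
    z = (L.T @ v + s).reshape(n, l, order="F")  # one row per pixel
    return -np.dot(b, v) + z.min(axis=1).sum()

def relative_gap(f_u, f_d):
    """Relative primal-dual gap (101), usable as a stopping criterion."""
    return (f_u - f_d) / f_d
```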
5.5 Optimality

If $f_{\mathcal{D}}$ and $f$ can be computed explicitly, any $v \in \mathcal{D}$ provides an optimality bound on the primal objective, with respect to the optimal value $f(u^*)$, via the numerical primal-dual gap $f(u) - f_{\mathcal{D}}(v)$:
$0 \leq f(u) - f(u^*) \leq f(u) - f_{\mathcal{D}}(v)$.   (100)
Assuming $f$ and $f_{\mathcal{D}}$ can be evaluated, the gap is a convenient stopping criterion. To improve scale invariance, it is often practical to stop on the relative gap
$\big( f(u) - f_{\mathcal{D}}(v) \big) / f_{\mathcal{D}}(v)$   (101)
instead, which gives a similar bound. However, convergence in the objective alone does not necessarily imply convergence in $u$, as the minimizer of the original problem (86) is generally not unique. This stands in contrast to the well-known ROF-type problems [ROF92], where uniqueness is induced by a quadratic data term.

For some applications, after solving the relaxed problem a discrete solution, i.e. a "hard" labeling, still needs to be recovered: the relaxed solution needs to be binarized. For the continuous two-class case with the classical TV regularizer, [CEN06] showed that an exact solution can be obtained by thresholding at almost any threshold. However, their results do not seem to transfer to the multi-class case. Another difficulty lies in the discretization: in order to apply the thresholding theorem, a crucial "coarea"-like property must hold for the discretized problem [CD09]. This property holds for the graph-based pairwise $\ell_1$-discretization of the TV, but not for the higher-order $\ell_2$-discretization. Thus, even in the two-class case, simple thresholding may lead to a suboptimal discrete solution. We are currently not aware of an a priori bound on the error introduced by the binarization step in the general case.

In practice, any dual feasible point together with (100) yields an a posteriori optimality bound: let $u^{(N)}, v^{(N)}$ be a pair of primal resp. dual feasible iterates, $\bar{u}^{(N)}$ the result of the binarization step applied to $u^{(N)}$, and $\bar{u}^*$ the optimal discrete solution. Then $\bar{u}^{(N)}$ is primal feasible, and its suboptimality with respect to the optimal discrete solution is bounded from above by
$f(\bar{u}^{(N)}) - f(\bar{u}^*) \leq f(\bar{u}^{(N)}) - f(u^*) \leq f(\bar{u}^{(N)}) - f_{\mathcal{D}}(v^{(N)})$.   (102)
Computation of $f_{\mathcal{D}}$, and in many cases also of $f$, is efficient as outlined in Sect. 5.2.

5.6 Improving the Binarization

There seems to be no obvious best choice for the binarization step. The simplest choice is the first-max approach: the label $\ell(x)$ is set to the index of the first maximal component of the relaxed solution $\bar{u}^{(N)}(x)$. However, this may lead to undesired side effects. Consider the segmentation of a grayscale image with the three labels $1, 2, 3$ corresponding to the gray level intensity, together with the linear distance $d(i,j) = |i - j|$, which is exactly representable using the Euclidean distance approach with $A = (1\ 2\ 3)$. Assume there is a region where $\bar{u}^{(N)} = (1/3 + \delta(x), 1/3, 1/3 - \delta(x))$ for some small noise $\delta(x) \in \mathbb{R}$. The most "natural" choice, given the interpretation as grayscale values, is the constant labeling $\ell(x) = 2$. The first-max approach instead gives $\ell(x) \in \{1, 3\}$, depending on the sign of $\delta(x)$, which leads to a noisy final segmentation.

On closer inspection, the first-max approach, which works well for the Potts distance, corresponds to choosing
$\ell(x) = \arg\min_{\ell \in \{1,\dots,l\}} \| u(x) - e^\ell \|$,   (103)
with the smallest $\ell$ chosen in case of ambiguity. We propose to extend this to non-uniform distances by setting
$\ell(x) = \arg\min_{\ell \in \{1,\dots,l\}} \bar{\Psi}\big( u(x) - e^\ell \big), \qquad \bar{\Psi} : \mathbb{R}^l \to \mathbb{R}, \quad \bar{\Psi}(z) := \Psi\big( e^1 z^\top \big)$.   (104)
That is, we select the label corresponding to the nearest unit vector with respect to $\bar{\Psi}$ (note that instead of $e^1$ we could choose any normalized vector, as $\Psi$ is assumed to be rotationally invariant). In fact, for the linear distance example above we have $\bar{\Psi}(z) = |{-z_1} + z_3|$, thus
$\bar{\Psi}(u(x) - e^1) = |1 - 2\delta(x)|, \quad \bar{\Psi}(u(x) - e^2) = |2\delta(x)|, \quad \bar{\Psi}(u(x) - e^3) = |1 + 2\delta(x)|$,   (105)
and for small $\delta$ we obtain the stable and semantically correct choice $\ell(x) = 2$. This method proved to work well in practice, and considerably reduced the suboptimality introduced by the binarization step (Sect. 7.5). In case there is no closed-form expression for $\Psi$, it can be numerically approximated as outlined in Sect. 6.5. A sketch of the rule (104) is given below.
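A minimal sketch of the two binarization rules, assuming a $\Psi$ of the Euclidean-metric form (98), so that $\bar{\Psi}(z) = \|Az\|$; the helper name `binarize` is hypothetical.

```python
import numpy as np

def binarize(u, A=None):
    """Recover a hard labeling from a relaxed solution u of shape (n, l).
    A=None gives the first-max rule (103); otherwise labels are chosen
    by the proposed rule (104) with Psi(z) = ||z A^T|| from (98)."""
    n, l = u.shape
    if A is None:
        return np.argmax(u, axis=1)      # first-max; ties -> smallest index
    labels = np.empty(n, dtype=int)
    E = np.eye(l)
    for j in range(n):
        # Psi-bar distance of u_j to each unit vector e^t
        costs = [np.linalg.norm((u[j] - E[t]) @ A.T) for t in range(l)]
        labels[j] = int(np.argmin(costs))
    return labels
```

On the three-label example above, with `A = np.array([[1.0, 2.0, 3.0]])`, the rule (104) returns the stable label 2 regardless of the sign of the noise.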
6 Optimization

When optimizing the saddle point problem (86), one must take into account its large-scale nature and the inherent nonsmoothness of the objective. While interior-point solvers are known to be very fast for small- to medium-sized problems, they are not particularly well suited for massively parallel computation, such as on upcoming GPU platforms, due to the expensive inner Newton iterations. We will instead focus on first-order methods involving only basic operations that can easily be parallelized due to their local nature, such as evaluations of $L$ and $L^\top$ and projections onto $\mathcal{C}$ and $\mathcal{D}$. The first two methods are stated here for comparison; the third one is new. It additionally requires evaluating $(I + L^\top L)^{-1}$, which is potentially nonlocal, but in many cases can still be computed fast and explicitly using the Discrete Cosine Transform, as shown below.

The following approaches rely on computing projections onto the full sets $\mathcal{D}$ resp. $\mathcal{D}_{\mathrm{loc}}$, which requires an iterative procedure for nontrivial constraint sets such as those arising with the local envelope method (Sect. 6.4). We note that by introducing further auxiliary variables and a suitable splitting approach, these inner iterations can also be avoided [LBS10]. However, the number of additional variables then grows quadratically in the number of labels; that approach is therefore only feasible for relatively small numbers of labels, and its memory requirements are usually one to several orders of magnitude higher than for the approaches discussed next.

6.1 Fast Primal-Dual Method

One of the most straightforward approaches for optimizing (86) is to fix small primal and dual step sizes $\tau_P$ resp. $\tau_D$, and to alternate projected gradient descent resp. ascent steps on the primal resp. dual variables. This Arrow-Hurwicz approach [AHU64] was proposed in a PDE framework for solving the two-class labeling problem in [AT06] and recently used in [CCP08]; an application to denoising problems can be found in [ZC08]. However, it seems nontrivial to derive sufficient conditions for convergence. Therefore, in [PCBC09a] the authors propose the Fast Primal-Dual (FPD) method, a variant of Popov's saddle point method [Pop80] with provable convergence. The method is summarized in Alg. 2; a compact sketch of the iteration follows below. Due to the explicit steps involved, there is an upper bound condition on the step sizes to ensure convergence, which can be shown to be $\tau_P \tau_D < 1 / \|L\|^2$ [PCBC09a]. The operator norm can be bounded according to
$\|L\| \leq \|\mathrm{Grad}\| \leq 2\sqrt{d}$   (106)
for the envelope method, and
$\|L\| \leq \|\mathrm{Grad}\|\,\|A\| \leq 2\sqrt{d}\,\|A\|$   (107)
for the Euclidean metric method. As both the primal and dual iterates are always feasible due to the projections, a stopping criterion based on the primal-dual gap as outlined in Sect. 5.5 can be employed.

Algorithm 2 FPD Multi-Class Labeling
1: Choose $\bar{u}^{(0)} \in \mathbb{R}^{n \times l}$, $v^{(0)} \in \mathbb{R}^{n \times d \times l}$.
2: Choose $\tau_P > 0$, $\tau_D > 0$, $N \in \mathbb{N}$.
3: for $k = 0, \dots, N-1$ do
4: $v^{(k+1)} \leftarrow \Pi_{\mathcal{D}}\big( v^{(k)} + \tau_D ( L \bar{u}^{(k)} - b ) \big)$.
5: $u^{(k+1)} \leftarrow \Pi_{\mathcal{C}}\big( u^{(k)} - \tau_P ( L^\top v^{(k+1)} + s ) \big)$.
6: $\bar{u}^{(k+1)} \leftarrow 2 u^{(k+1)} - u^{(k)}$.
7: end for
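A minimal sketch of Alg. 2, with $L$ as a dense matrix for simplicity (in practice $L$ and $L^\top$ are applied matrix-free); `proj_C` and `proj_D` are assumed to implement the projections onto $\mathcal{C}$ and $\mathcal{D}$.

```python
import numpy as np

def fpd(L, s, b, proj_C, proj_D, tau_p, tau_d, N):
    """Sketch of the FPD iteration (Alg. 2).
    Convergence requires tau_p * tau_d < 1 / ||L||^2, cf. (106)-(107)."""
    n_primal, n_dual = L.shape[1], L.shape[0]
    u = np.zeros(n_primal); u_bar = u.copy(); v = np.zeros(n_dual)
    for _ in range(N):
        v = proj_D(v + tau_d * (L @ u_bar - b))        # dual ascent step
        u_new = proj_C(u - tau_p * (L.T @ v + s))      # primal descent step
        u_bar = 2.0 * u_new - u                        # extrapolation
        u = u_new
    return u, v
```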
6.2 Nesterov Method

We provide a short summary of the application of Nesterov's multi-step method [Nes04] to the saddle point problem (86), as proposed in [LBS09]. In contrast to the FPD method, it treats the nonsmoothness by first applying a smoothing step and then using a smooth constrained optimization method. The amount of smoothing is balanced such that the overall number of iterations required to produce a solution of a given accuracy is minimized. The algorithm has a theoretical worst-case complexity of $O(1/\varepsilon)$ for finding an $\varepsilon$-optimal solution, i.e. $f(u^{(N)}) - f(u^*) \leq \varepsilon$. It has been shown to give accurate results for denoising [Auj08] and general $\ell_1$-norm based problems [WABF07, BBC09]. Besides the desired accuracy, no other parameters have to be provided. The complete algorithm for our saddle point formulation is shown in Alg. 3.

Algorithm 3 Nesterov Multi-Class Labeling
1: Let $c_1 \in \mathcal{C}$, $c_2 \in \mathcal{D}$ and $r_1, r_2 \in \mathbb{R}$ such that $\mathcal{C} \subseteq B_{r_1}(c_1)$ and $\mathcal{D} \subseteq B_{r_2}(c_2)$; $C \geq \|L\|$.
2: Choose $x^{(0)} \in \mathcal{C}$ and $N \in \mathbb{N}$.
3: Let $\mu \leftarrow \frac{2\|L\|}{N+1} \frac{r_1}{r_2}$.
4: Set $G^{(-1)} \leftarrow 0$, $w^{(-1)} \leftarrow 0$.
5: for $k = 0, \dots, N$ do
6: $V \leftarrow \Pi_{\mathcal{D}}\big( c_2 + \frac{1}{\mu} ( L x^{(k)} - b ) \big)$.
7: $w^{(k)} \leftarrow w^{(k-1)} + (k+1) V$.
8: $v^{(k)} \leftarrow \frac{2}{(k+1)(k+2)} w^{(k)}$.
9: $G \leftarrow s + L^\top V$.
10: $G^{(k)} \leftarrow G^{(k-1)} + \frac{k+1}{2} G$.
11: $u^{(k)} \leftarrow \Pi_{\mathcal{C}}\big( x^{(k)} - \frac{\mu}{\|L\|^2} G \big)$.
12: $z^{(k)} \leftarrow \Pi_{\mathcal{C}}\big( c_1 - \frac{\mu}{\|L\|^2} G^{(k)} \big)$.
13: $x^{(k+1)} \leftarrow \frac{2}{k+3} z^{(k)} + \big( 1 - \frac{2}{k+3} \big) u^{(k)}$.
14: end for

The only expensive operations are the projections $\Pi_{\mathcal{C}}$ and $\Pi_{\mathcal{D}}$, which are efficiently computable as shown above. The algorithm converges in any case and provides explicit suboptimality bounds:

Proposition 8. In Alg. 3, the iterates $u^{(k)}, v^{(k)}$ are primal resp. dual feasible, i.e. $u^{(k)} \in \mathcal{C}$, $v^{(k)} \in \mathcal{D}$. Moreover, for any solution $u^*$ of the relaxed problem (86), the relation
$f(u^{(N)}) - f(u^*) \leq f(u^{(N)}) - f_{\mathcal{D}}(v^{(N)}) \leq \frac{2 r_1 r_2 C}{N+1}$   (108)
holds for the final iterates $u^{(N)}, v^{(N)}$.

Proof. Apply [Nes04, Thm. 3] with the notation $\hat{f}(u) = \langle u, s \rangle$, $A = L$, $\hat{\phi}(v) = \langle b, v \rangle$, $d_1(u) := \frac{1}{2}\|u - c_1\|^2$, $d_2(v) := \frac{1}{2}\|v - c_2\|^2$, $D_1 = \frac{1}{2} r_1^2$, $D_2 = \frac{1}{2} r_2^2$, $\sigma_1 = \sigma_2 = 1$, $M = 0$.

Corollary 1. For given $\varepsilon > 0$, applying Alg. 3 with
$N = \lceil 2 r_1 r_2 C \varepsilon^{-1} \rceil - 1$   (109)
yields an $\varepsilon$-optimal solution of (86), i.e. $f(u^{(N)}) - f(u^*) \leq \varepsilon$.

For the discretization outlined in Sect. 5.2, we choose $c_1 = \frac{1}{l} e$ and $r_1 = \sqrt{n(l-1)/l}$, which leads to the following complexity bounds with respect to the suboptimality $\varepsilon$ of $u^{(N)}$.

• Envelope method (95). From the previous remarks, $C := 2\sqrt{d} \geq \|L\|$. Moreover $c_2 = 0$, and by Prop. 10 (see Appendix) we have $\mathcal{D}^d_{\mathrm{loc}} \subseteq B_{\alpha_d}(0)$ with
$\alpha_d := \min_i \big( \sum_j d(i,j)^2 \big)^{1/2}$,   (110)
and thus $r_2 = \alpha_d \sqrt{n}$. The total complexity in terms of the number of iterations is
$O\big( \varepsilon^{-1} n \sqrt{d}\, \alpha_d \big)$.   (111)
• Euclidean metric method (96). Here we may set $C = 2\sqrt{d}\,\|A\|$, $c_2 = 0$ and $r_2 = \sqrt{n}$, for a total complexity of
$O\big( \varepsilon^{-1} n \sqrt{d}\, \|A\| \big)$.   (112)

We arrive at a parameter-free algorithm, with the exception of the desired suboptimality bound. From the sequence $(u^{(k)}, v^{(k)})$ we may again compute the current primal-dual gap at each iteration. As a unique feature, the number of required iterations can be determined a priori and independently of the data term, which could prove useful in real-time applications; the sketch below instantiates the a priori count (109) for the Euclidean metric method.
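A small numeric sketch of the a priori iteration count (109), using the constants from (106)-(112); the image size, label count, and $A = I_l$ are illustrative choices, not values from the paper.

```python
import numpy as np

# A priori iteration count (109) for the Euclidean metric method,
# for an n1 x n2 image with l labels and A = I_l (Potts-like setting).
n1, n2, l, d = 256, 256, 4, 2
n = n1 * n2
A = np.eye(l)
C = 2.0 * np.sqrt(d) * np.linalg.norm(A, 2)   # C >= ||L||, cf. (107)
r1 = np.sqrt(n * (l - 1) / l)                 # radius of C around c1 = e/l
r2 = np.sqrt(n)                               # radius of D for the unit ball
eps = 1.0                                     # desired absolute suboptimality
N = int(np.ceil(2 * r1 * r2 * C / eps)) - 1
print(N)   # iterations guaranteeing an eps-optimal solution by Prop. 8
```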
6.3 Douglas-Rachford Method

We demonstrate how to apply the Douglas-Rachford splitting approach [DR56] to our problem. By introducing auxiliary variables, we can again reduce the inner steps to projections onto the sets $\mathcal{C}$ and $\mathcal{D}$. This is in contrast to a more straightforward splitting approach such as [LKY+09], where the inner steps require solving ROF-type problems that include a quadratic data term.

Minimizing a proper, convex, lower-semicontinuous (lsc) function $f : X \to \mathbb{R} \cup \{\pm\infty\}$ over a finite-dimensional vector space $X := \mathbb{R}^n$ can be regarded as finding a zero of its (necessarily maximal monotone [RW04]) subdifferential operator $T := \partial f : X \to 2^X$. In the operator splitting framework, $\partial f$ is assumed to decompose into the sum of two "simple" operators, $T = A + B$, for which forward and backward steps can be computed in practice. Here, we consider the (tight) Douglas-Rachford splitting iteration [DR56, LM79] with the fixed point iteration
$\bar{u}^{(k+1)} = \big( J_{\tau A} (2 J_{\tau B} - I) + (I - J_{\tau B}) \big)(\bar{u}^{(k)})$,   (113)
where $J_{\tau T} := (I + \tau T)^{-1}$ is the resolvent of $T$. Under the very general assumptions that $A$ and $B$ are maximal monotone and $A + B$ has at least one zero, the sequence $(\bar{u}^{(k)})$ is uniquely defined and converges to a point $\bar{u}$, with the additional property that $u := J_{\tau B}(\bar{u})$ is a zero of $T$ [Eck89, Thm. 3.15, Prop. 3.20, Prop. 3.19]. In particular, if $f = f_1 + f_2$ for proper, convex, lsc $f_i$ such that the relative interiors of their domains have a nonempty intersection,
$\mathrm{ri}(\mathrm{dom}\, f_1) \cap \mathrm{ri}(\mathrm{dom}\, f_2) \neq \emptyset$,   (114)
it follows [RW04, Cor. 10.9] that $\partial f = \partial f_1 + \partial f_2$, and $A := \partial f_1$, $B := \partial f_2$ are maximal monotone. As $x = J_{\tau \partial f_i}(y) \Leftrightarrow x = \arg\min_x \{ (2\tau)^{-1} \|x - y\|_2^2 + f_i(x) \}$, the computation of the resolvents reduces to proximal point optimization problems involving only the $f_i$.

However, for straightforward splittings of (86), such as
$\min_u \underbrace{\max_{v \in \mathcal{D}} g(u, v)}_{f_1(u)} + \underbrace{\delta_{\mathcal{C}}(u)}_{f_2(u)}$,   (115)
evaluating the resolvents requires solving ROF-type problems [LKY+09], which is a strongly nonlocal operation and requires an iterative procedure, introducing additional parameters and convergence issues. Instead, we follow the procedure in [EB92, Set09b] of adding auxiliary variables before splitting the objective, in order to simplify the individual steps of the algorithm. We introduce $w = Lu$ and split according to
$\min_{u \in \mathcal{C}} \max_{v \in \mathcal{D}} \langle u, s \rangle + \langle Lu, v \rangle - \langle b, v \rangle$   (116)
$= \min_u \langle u, s \rangle + \sigma_{\mathcal{D}}(Lu - b) + \delta_{\mathcal{C}}(u)$   (117)
$= \min_{u, w} \underbrace{\delta_{\{Lu = w\}}(u, w)}_{h_1(u, w)} + \underbrace{\langle u, s \rangle + \delta_{\mathcal{C}}(u) + \sigma_{\mathcal{D}}(w - b)}_{h_2(u, w)}$.   (118)

We apply the tight Douglas-Rachford iteration to this formulation. Denote
$(u^{(k)}, w^{(k)}) := J_{\tau B}(\bar{u}^{(k)}, \bar{w}^{(k)})$,   (119)
$(u'^{(k)}, w'^{(k)}) := J_{\tau A}(2 J_{\tau B} - I)(\bar{u}^{(k)}, \bar{w}^{(k)}) = J_{\tau A}(2 u^{(k)} - \bar{u}^{(k)}, 2 w^{(k)} - \bar{w}^{(k)})$.   (120)
Then $(\bar{u}^{(k+1)}, \bar{w}^{(k+1)}) = (\bar{u}^{(k)} + u'^{(k)} - u^{(k)}, \bar{w}^{(k)} + w'^{(k)} - w^{(k)})$, according to (113). Evaluating the resolvent $J_{\tau A}$ is equivalent to a proximal step on $h_2$; moreover, due to the introduction of the auxiliary variables, it decouples:
$u'^{(k)} = \arg\min_{u'} \big\{ \tfrac{1}{2} \| u' - (2u^{(k)} - \bar{u}^{(k)}) + \tau s \|_2^2 + \delta_{\mathcal{C}}(u') \big\} = \Pi_{\mathcal{C}}\big( (2u^{(k)} - \bar{u}^{(k)}) - \tau s \big)$,   (121)
$w'^{(k)} = \arg\min_{w'} \big\{ \tfrac{1}{2\tau} \| w' - (2w^{(k)} - \bar{w}^{(k)}) \|_2^2 + \sigma_{\mathcal{D}}(w' - b) \big\} = (2w^{(k)} - \bar{w}^{(k)}) - \tau \Pi_{\mathcal{D}}\big( \tfrac{1}{\tau} ( 2w^{(k)} - \bar{w}^{(k)} - b ) \big)$.   (122)
In a similar manner, $J_{\tau B}$, i.e. the proximal step on $h_1$, amounts to the least-squares problem
$(u^{(k)}, w^{(k)}) = \arg\min_{u, w} \big\{ \tfrac{1}{2\tau} \| u - \bar{u}^{(k)} \|_2^2 + \tfrac{1}{2\tau} \| w - \bar{w}^{(k)} \|_2^2 + \delta_{\{Lu = w\}}(u, w) \big\}$.   (123)
Substituting the constraint $w^{(k)} = L u^{(k)}$ yields
$u^{(k)} = \arg\min_u \big\{ \| u - \bar{u}^{(k)} \|_2^2 + \| Lu - \bar{w}^{(k)} \|_2^2 \big\} = (I + L^\top L)^{-1} \big( \bar{u}^{(k)} + L^\top \bar{w}^{(k)} \big)$.   (124)

Algorithm 4 Douglas-Rachford Multi-Class Labeling
1: Choose $\bar{u}^{(0)} \in \mathbb{R}^{n \times l}$, $\bar{w}^{(0)} \in \mathbb{R}^{n \times d \times l}$ (or set $\bar{w}^{(0)} = L \bar{u}^{(0)}$).
2: Choose $\tau > 0$.
3: $k \leftarrow 0$.
4: while (not converged) do
5: $u^{(k)} \leftarrow \Pi_{\mathcal{C}}\big( \bar{u}^{(k)} - \tau s \big)$.
6: $w''^{(k)} \leftarrow \Pi_{\mathcal{D}}\big( \tfrac{1}{\tau} ( \bar{w}^{(k)} - b ) \big)$.
7: $u'^{(k)} \leftarrow (I + L^\top L)^{-1} \big( (2 u^{(k)} - \bar{u}^{(k)}) + L^\top ( \bar{w}^{(k)} - 2\tau w''^{(k)} ) \big)$.
8: $w'^{(k)} \leftarrow L u'^{(k)}$.
9: $\bar{u}^{(k+1)} \leftarrow \bar{u}^{(k)} + u'^{(k)} - u^{(k)}$.
10: $\bar{w}^{(k+1)} \leftarrow w'^{(k)} + \tau w''^{(k)}$.
11: $k \leftarrow k + 1$.
12: end while

Finally, Alg. 4 is obtained by substituting $w''^{(k)} := \Pi_{\mathcal{D}}\big( \tfrac{1}{\tau} ( 2w^{(k)} - \bar{w}^{(k)} - b ) \big)$.
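A minimal sketch of Alg. 4 with a dense $L$ and a precomputed inverse; in practice the resolvent in step 7 is applied via DCTs as described next, and `proj_C`, `proj_D` are the projections onto $\mathcal{C}$ and $\mathcal{D}$.

```python
import numpy as np

def douglas_rachford(L, s, b, proj_C, proj_D, tau, n_iter):
    """Sketch of the Douglas-Rachford iteration of Alg. 4."""
    n_primal, n_dual = L.shape[1], L.shape[0]
    solve = np.linalg.inv(np.eye(n_primal) + L.T @ L)   # resolvent, cf. (124)
    u_bar, w_bar = np.zeros(n_primal), np.zeros(n_dual)
    for _ in range(n_iter):
        u = proj_C(u_bar - tau * s)                                # step 5
        w2 = proj_D((w_bar - b) / tau)                             # step 6
        u1 = solve @ ((2 * u - u_bar) + L.T @ (w_bar - 2 * tau * w2))  # step 7
        w1 = L @ u1                                                # step 8
        u_bar = u_bar + u1 - u                                     # step 9
        w_bar = w1 + tau * w2                                      # step 10
    return u, w2    # primal iterate and dual feasible iterate, cf. Prop. 9
```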
Solving the linear system in (124) can often be accelerated considerably by exploiting the fact that, under the forward-difference discretization with Neumann boundary conditions, $\mathrm{grad}^\top \mathrm{grad}$ diagonalizes under the discrete cosine transform (DCT-2) [Str99, LKY+09]:
$\mathrm{grad}^\top \mathrm{grad} = B^{-1} \mathrm{diag}(c) B$,   (125)
where $B$ is the orthogonal transformation matrix of the DCT and $c$ is the vector of eigenvalues of the discrete Laplacian. In both approaches presented above, $L$ is of the form $L = A \otimes \mathrm{grad}$ for some (possibly identity) matrix $A \in \mathbb{R}^{k \times l}$, $k \leq l$. First, compute the decomposition $A^\top A = V^{-1} \mathrm{diag}(a) V$ with $a \in \mathbb{R}^l$ and an orthogonal matrix $V \in \mathbb{R}^{l \times l}$, $V^{-1} = V^\top$. Then
$(I + L^\top L)^{-1} = (V^\top \otimes I_n)(I_l \otimes B^{-1}) \big( I + \mathrm{diag}(a) \otimes \mathrm{diag}(c) \big)^{-1} (I_l \otimes B)(V \otimes I_n)$   (126)
(see Appendix for the proof). Thus step (124) can be carried out fast and accurately through matrix multiplications with $V$, discrete cosine transforms, and one $O(nl)$ product for inverting the inner diagonal matrix.

In addition to the convergence of the primal iterates $(u^{(k)})$, it can be shown that the sequence $(w''^{(k)})$ from Alg. 4 converges to a solution of the dual problem:

Proposition 9. Let $\mathcal{C}, \mathcal{D}$ be bounded, closed and convex sets with $\mathrm{ri}(\mathcal{C}) \neq \emptyset$ and $\mathrm{ri}(\mathcal{D}) \neq \emptyset$. Then Alg. 4 generates a sequence of primal/dual feasible pairs $(u^{(k)}, w''^{(k)}) \in \mathcal{C} \times \mathcal{D}$ such that, for any saddle point $(u^*, v^*)$ of the relaxed problem (86),
$f(u^{(k)}) \to f(u^*) = f_{\mathcal{D}}(v^*)$ as $k \to \infty$,   (127)
$f_{\mathcal{D}}(w''^{(k)}) \to f_{\mathcal{D}}(v^*) = f(u^*)$ as $k \to \infty$.   (128)
Moreover,
$f(u^{(k)}) - f(u^*) \leq f(u^{(k)}) - f_{\mathcal{D}}(w''^{(k)})$   (129)
provides an upper bound for the suboptimality of the current iterate.

Proof. See Appendix.

Thus the Douglas-Rachford approach also allows using the primal-dual gap
$f(u^{(k)}) - f_{\mathcal{D}}(w''^{(k)})$   (130)
as a stopping criterion. Very recently, a generalization of the FPD method [PCBC09a] has been proposed [CP10]. The authors show that under certain conditions their method is equivalent to Douglas-Rachford splitting. As a result, it is possible to show that Alg. 4 can alternatively be interpreted as an application of the primal-dual method from [CP10] to the problem formulation (118).

6.4 Projection on the Dual Constraint Set

For the Euclidean metric approach, projection onto the unit ball $\mathcal{D}^I_{\mathrm{loc}}$, and thus onto $\mathcal{D}$, is trivial:
$\Pi_{\mathcal{D}^I_{\mathrm{loc}}}(v) = v$ if $\|v\| \leq 1$, and $v / \|v\|$ otherwise.   (131)
Projection onto $\mathcal{D}^d_{\mathrm{loc}}$ for the general metric case is more involved. We represent $\mathcal{D}^d_{\mathrm{loc}}$ as the intersection of convex sets,
$\mathcal{D}^d_{\mathrm{loc}} = R \cap S, \qquad R := \{ v \in \mathbb{R}^{d \times l} \mid \sum_i v^i = 0 \}, \qquad S := \bigcap_{i < j} \{ v \in \mathbb{R}^{d \times l} \mid \|v^i - v^j\| \leq d(i,j) \}$,   (132)
and compute the projection by Dykstra's iterative algorithm (Alg. 5), cf. [BD86, CCP08].

6.5 Numerical Approximation of $\Psi$

In case $\Psi$ cannot be evaluated in closed form, the value $\Psi(z) = \max_{v \in \mathcal{D}^d_{\mathrm{loc}}} \langle z, v \rangle$ (136) can be approximated by projected gradient ascent, $v^{(k+1)} := \Pi_{\mathcal{D}^d_{\mathrm{loc}}}\big( v^{(k)} + \tau z \big)$ for some step size $\tau > 0$, starting with $v^{(0)} = z$. As the objective in (136) is linear and $\mathcal{D}^d_{\mathrm{loc}}$ is bounded, $\Psi$ is bounded from above, and convergence follows [Gol64, LP66] for any step size $\tau$ [WX04]:
$\lim_{k \to \infty} \langle z, v^{(k)} \rangle = \Psi(z)$.   (138)
There is a trade-off in choosing the step size: larger $\tau$ lead to a smaller number of outer iterations, but to an increased number of nontrivial operations in the projection. We chose $\tau = 2$, which worked well for all examples. It is also possible to use any of the other nonsmooth optimization methods presented above (see the sketch below).
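A minimal sketch of the projected gradient ascent for approximating $\Psi(z)$ as in (136)-(138). The function name `approx_psi` is hypothetical, and the projection onto $\mathcal{D}^d_{\mathrm{loc}}$ is passed in as a callable (e.g. the Dykstra-type sketch given in Sect. 7.4 below); the returned value is a lower bound on $\Psi(z)$.

```python
def approx_psi(z, proj_Dloc, tau=2.0, n_iter=100):
    """Projected gradient ascent for Psi(z) = max_{v in D^d_loc} <z, v>:
    v^(k+1) = proj(v^(k) + tau * z), starting from v^(0) = z
    (projected once here so that all iterates are feasible)."""
    v = proj_Dloc(z.copy())
    for _ in range(n_iter):
        v = proj_Dloc(v + tau * z)
    # <z, v> converges to Psi(z) from below for any tau > 0, cf. (138)
    return float((z * v).sum())
```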
7 Experiments

Regarding the practical performance of the presented approaches, we focused on two main issues: convergence speed and tightness of the relaxation. We first quantitatively compare the presented algorithms in terms of runtime and the number of inner iterations, and then provide some results on the effect of the Euclidean metric vs. the envelope regularization.

The algorithms were implemented in Matlab, with some core functions, such as the computation of the gradient and the projections onto the dual sets, implemented in C++. We used Matlab's built-in FFTW interface for computing the DCT for the Douglas-Rachford approach. All experiments were conducted on an Intel Core2 Duo 2.66 GHz with 4 GB of RAM and 64-bit Matlab 2009a.

Figure 8: Synthetic "four colors" input image for the performance tests. Top row: input image with Gaussian noise, σ = 1; local labeling without regularizer. Bottom row: result with the Potts regularizer and Douglas-Rachford optimization; ground truth.

7.1 Relative Performance

To compare the convergence speed of the three different approaches, we computed the relative primal-dual gap at each iteration as outlined in Sect. 6. As it bounds the suboptimality of the current iterate (see Sect. 5.5), it constitutes a reliable and convenient criterion for performance comparison. Unfortunately, the gap is not available for the envelope method, as it requires the primal objective to be evaluatable. Using a numerical approximation such as the one in Sect. 6.5 is not an option, as these methods can only provide a lower bound for the objective. This would lead to an underestimation of the gap, which is critical, as one is interested in the behavior when the gap is very close to zero. We therefore restricted the gap-based performance tests to the Euclidean metric regularizer. In order to make a fair comparison, we generally analyzed the progression of the gap with respect to computation time, rather than the number of iterations.

For the first tests we used the synthetic 256 × 256 "four colors" input image (Fig. 8). It represents a typical segmentation problem with several objects featuring sharp corners and round structures above a uniform background. The label set consists of three classes for the foreground and one class for the background. The image was overlaid with iid Gaussian noise with σ = 1 and truncated to [0, 1] on all RGB channels. We used a simple $\ell_1$ data term, $s_i(x) = \| g(x) - c_i \|_1$, where $g(x) \in [0,1]^3$ are the RGB color values of the input image at $x$, and $c_i$ is a prototypical color vector for the $i$-th class; a small sketch of this data term follows below.
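A minimal NumPy sketch of the $\ell_1$ data term described above; the prototype colors in the usage example are illustrative, not the values used in the paper.

```python
import numpy as np

def l1_data_term(g, prototypes):
    """Local l1 data cost s_i(x) = ||g(x) - c_i||_1 for an RGB image.
    g: (n1, n2, 3) image in [0, 1]; prototypes: (l, 3) class colors.
    Returns s of shape (n1, n2, l)."""
    return np.abs(g[:, :, None, :] - prototypes[None, None, :, :]).sum(axis=-1)

# usage: four classes (three foreground colors plus a background color)
g = np.random.rand(256, 256, 3)
c = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0.5]], dtype=float)
s = l1_data_term(g, c)          # shape (256, 256, 4)
```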
The runtime analysis shows that FPD and Douglas-Rachford perform similarly, while the Nesterov method falls behind considerably in both the primal and the dual objective (Fig. 9). The picture changes when considering the gap with respect to the number of iterations rather than time, which eliminates influences of the implementation and system architecture: to achieve the same accuracy, Douglas-Rachford requires only one third of the iterations of FPD (Fig. 10). This advantage does not fully translate to the time-based analysis, as the DCT steps increase the per-step computational cost significantly. However, in this example the projections onto the sets $\mathcal{C}$ and $\mathcal{D}$ were relatively cheap compared to the DCT. In situations where the projections dominate the time per step, the reduced iteration count can be expected to lead to an almost equal reduction in computation time.

Figure 9: Convergence speed for the "four colors" image in Fig. 8. Left: primal (upper) and dual (lower) objectives vs. computation time for the (from top to bottom) Nesterov, Fast Primal-Dual (FPD), and Douglas-Rachford methods. Right: detailed view of the FPD and DR methods. The primal and dual objectives provide upper and lower bounds for the objective of the true optimum. Douglas-Rachford and FPD perform equally well, while the Nesterov method falls behind.

Figure 10: Primal-dual gap for Fig. 9 with respect to time and number of iterations. Top: primal-dual gap vs. time. The Nesterov method (top) again falls behind, while FPD (center) and Douglas-Rachford (bottom) are equally fast. Bottom: primal-dual gap vs. number of iterations. The Douglas-Rachford method requires only one third of the FPD iterations, which makes it suitable for problems with expensive inner steps.

Figure 11: Objectives and primal-dual gap for the real-world "leaf" image in Fig. 4 with 12 classes and Potts potential. Left: primal (upper) and dual (lower) objectives vs. time. The Nesterov method (top) falls behind the FPD (center) and Douglas-Rachford (bottom) methods. Right: gap vs. number of iterations. As with the synthetic "four colors" image (Fig. 10), the Douglas-Rachford approach reduces the number of required iterations by approximately a factor of 3.

One could ask whether these relations are typical or specific to the synthetic data used. However, we found them confirmed on a large majority of the problems tested. As a real-world example, consider the "leaf" image (Fig. 4). We computed a segmentation into 12 classes with the Potts regularizer, again based on the $\ell_1$ distances for the data term, with very similar relative performance as for the "four colors" image (Fig. 11).

7.2 Number of Variables and Regularization Strength

To examine how the presented methods scale with increasing image size, we evaluated the "four colors" image at 20 different scales, ranging from 16 × 16 to 512 × 512. Note that if the grid spacing is held constant, the regularizer weights must be scaled proportionally to the image width resp. height in order to obtain structurally comparable results, and not mix up effects of the problem size and of the regularization strength. In order to compensate for the increasing number of variables, the stopping criterion was based on the relative gap (101): the algorithms terminated as soon as the relative gap fell below $10^{-4}$ (the criterion is sketched below).
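A minimal sketch of the relative-gap stopping criterion based on (101), assuming a positive dual objective value; the helper names are hypothetical.

```python
def relative_gap(f_primal, f_dual):
    """Relative primal-dual gap (101), for a primal feasible u with
    objective f_primal and a dual feasible v with objective f_dual."""
    return (f_primal - f_dual) / f_dual

def converged(f_primal, f_dual, tol=1e-4):
    # stop as soon as the scale-invariant gap drops below the tolerance
    return relative_gap(f_primal, f_dual) <= tol
```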
The Nesterov method consistently produced gaps in the $10^{-3}$ range and never reached the threshold within the limit of 2000 iterations. Douglas-Rachford and FPD scale only slightly superlinearly with the problem size, which is quite a good result for such comparatively simple first-order methods (Fig. 12).

While we deliberately excluded influences of the regularizer in the previous experiment, it is also interesting to examine how the algorithms cope with varying regularization strength. We fixed a resolution of 256 × 256 and performed the same analysis as above, scaling the regularization term by an increasing sequence of λ in the [0.1, 5] range (Fig. 13). Interestingly, for low regularization, where much of the noise remains in the solution, FPD clearly takes the lead, while for scenarios with large structural changes Douglas-Rachford performs better. We observed two peaks in the runtime plot which we cannot completely explain. However, we found that at the first peak, structures in the image did not disappear in parallel, but rather one after the other, which might contribute to the slower convergence. Again, the Nesterov method never achieved the required accuracy.

Figure 12: Computation time for increasing problem size for the Douglas-Rachford (top, dark) and FPD (bottom, light) methods. Shown is the time in seconds required to achieve a relative gap of $10^{-4}$. The computational effort scales slightly superlinearly with the number of pixels. The Nesterov method never converged to the required accuracy within the limit of 2000 iterations.

Figure 13: Computation time for varying regularization strength λ for the Douglas-Rachford (top, dark) and FPD (bottom, light) methods. The images at the bottom show the final result for the λ above. FPD is strong on low-regularization problems, while Douglas-Rachford is better suited for problems with large structural changes. The Nesterov method never achieved the relative gap of $10^{-5}$ within 2000 iterations.

7.3 Breaking Points

We have no clear explanation why the Nesterov method appears to almost always fall behind. However, it is possible to compare its behavior with the a priori bound from Prop. 8. By inspecting the derivations in the original work [Nes04], it can be seen that exactly one half of the final bound comes from the smoothing step, while the other half is caused by the finite number of iterations:
$\delta_{\mathrm{total}} = \delta_{\mathrm{smooth}} + \delta_{\mathrm{iter}}, \quad \text{where } \delta_{\mathrm{smooth}} = \delta_{\mathrm{iter}}$.   (139)
Moreover, $\delta_{\mathrm{iter}}$ decreases with $1/(k+1)^2$, which gives an a priori per-iteration suboptimality bound of
$\delta^{(k)}_{\mathrm{total}} = \delta_{\mathrm{smooth}} + \Big( \frac{N+1}{k+1} \Big)^2 \delta_{\mathrm{iter}}$.   (140)

Figure 14: Theoretical vs. practical performance of the Nesterov method for Fig. 9. As expected, the method stays below the theoretical per-iteration bound $\delta^{(k)}_{\mathrm{total}}$ (dashed). At the final iteration, the worst-case total bound $\delta_{\mathrm{total}}$ (dotted, top) is outperformed by a factor of 7, which implies that the error introduced by smoothing is also well below its worst-case bound $\delta_{\mathrm{smooth}}$ (dotted, bottom).

On the "four colors" image, the actual gap stays just below $\delta^{(k)}_{\mathrm{total}}$ in the beginning (Fig. 14). This implies that the theoretical suboptimality bound can hardly be improved, e.g. by choosing constants more precisely. Unfortunately, the bound is generally rather large, in this case $\delta_{\mathrm{total}} = 256.8476$ for 2000 iterations. While the Nesterov method outperforms the theoretical bound $\delta_{\mathrm{total}}$ by a factor of 2 to 10, and even drops well below the worst-case smoothing error $\delta_{\mathrm{smooth}}$, it still cannot compete with the other methods, which achieve a gap of 0.3052 (FPD) resp. 0.0754 (Douglas-Rachford). A small numeric illustration of the bound (140) follows below.
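A worked instance of the per-iteration bound (140), using the values reported above for the "four colors" run; the even split follows (139).

```python
# Per-iteration a priori bound (140) for the "four colors" run:
# delta_total = 256.8476 at N = 2000, split evenly according to (139).
N = 2000
delta_total = 256.8476
delta_smooth = delta_iter = delta_total / 2.0

def delta_k(k):
    # worst-case suboptimality after k iterations, cf. (140)
    return delta_smooth + ((N + 1) / (k + 1)) ** 2 * delta_iter

print(delta_k(N))     # 256.8476: equals delta_total at the final iteration
print(delta_k(200))   # decreases roughly as 1/k^2 toward delta_smooth
```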
There is an interesting extreme case where the Nesterov method seems to come to full strength. Consider the noise-free "triple point" inpainting problem (Fig. 15). The triple junction in the center can only be reconstructed by the Potts regularizer, as the $\ell_1$ data term has been blanked out around the center. By reversing the sign of the data term, one obtains the "inverse triple point" problem, an extreme case that has also been studied in [CCP08] and shown to be an example where the relaxation leads to a strictly nonbinary solution. On the inverse problem, the Nesterov method catches up and even surpasses FPD. This stands in contrast to the regular triple point problem, where all methods perform as usual. We conjecture that this sudden strength comes from the inherent averaging over all previous gradients (step 7 in Alg. 3): in fact, on the inverse problem Douglas-Rachford and FPD display a pronounced oscillation in the primal and dual objectives, which is accompanied by slow convergence. In contrast, the Nesterov method consistently shows a monotone and smooth convergence.

Figure 15: Primal and dual objectives for the triple point (top) and inverse triple point (bottom) inpainting problems. Left to right: input image with zeroed-out region around the center; relaxed solution; binarized solution; primal (upper) and dual (lower) energies vs. time. The triple junction in the center has to be reconstructed solely by the Potts regularizer. The inverse triple point problem exhibits a strictly nonbinary relaxed solution. For the inverse triple point, Douglas-Rachford (bottom) and FPD (center) show an oscillatory behavior which slows down convergence. The Nesterov approach (top) does not suffer from oscillation due to the inherent averaging, and surpasses FPD on the inverse problem.

Figure 16: Performance on the "four colors" image with Potts interaction potential and the envelope regularizer. Left: dual objectives for Douglas-Rachford (top) and FPD (bottom) vs. time. The reduced iteration count of the Douglas-Rachford method becomes more apparent in the time plot, as the time per iteration is now dominated by the projection rather than the DCT. Right: time per iteration for Nesterov (top), Douglas-Rachford (center), and FPD (bottom). The Nesterov method fails to converge as it accumulates errors from the approximate projections, which in turn leads to slower and more inexact projections.

7.4 Performance for the Envelope Relaxation

Undoubtedly, the difficulty in using the envelope-based regularizer lies in the slow and inexact projection steps, which have to be approximated iteratively. We therefore re-evaluated the "four colors" benchmark image with the envelope regularizer. The iterative Dykstra projection (Alg. 5) was stopped when successive iterates differed by at most δ = 10⁻², with an additional limit of 50 iterations. While the gap cannot be computed in this case, the dual objective can still be evaluated and provides an indicator of the convergence speed. (A generic sketch of a Dykstra-type projection is given below.)
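Since Alg. 5 itself is not reproduced in this version of the text, the following is only a generic Boyle-Dykstra cycle [BD86] over the constraint sets in (132), not the paper's exact procedure; the helper name `project_envelope` is hypothetical.

```python
import numpy as np

def project_envelope(z, dmat, n_iter=50, tol=1e-2):
    """Dykstra-type sketch for projecting z (d x l) onto D^d_loc = R ∩ S
    from (132); dmat[i, j] = d(i, j)."""
    d, l = z.shape
    sets = [('R', None)] + [('S', (i, j)) for i in range(l)
                            for j in range(i + 1, l)]
    v = z.copy()
    corr = [np.zeros_like(z) for _ in sets]     # Dykstra correction terms
    for _ in range(n_iter):
        v_old = v.copy()
        for k, (kind, ij) in enumerate(sets):
            y = v + corr[k]
            if kind == 'R':                     # columns sum to zero
                p = y - y.mean(axis=1, keepdims=True)
            else:                               # ||v^i - v^j|| <= d(i, j)
                i, j = ij
                p = y.copy()
                diff = y[:, i] - y[:, j]
                nrm = np.linalg.norm(diff)
                if nrm > dmat[i, j]:
                    shift = 0.5 * (1.0 - dmat[i, j] / nrm) * diff
                    p[:, i] -= shift
                    p[:, j] += shift
            corr[k] = y - p
            v = p
        if np.linalg.norm(v - v_old) <= tol:    # stopping rule as in Sect. 7.4
            break
    return v
```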
We found that, in comparison to the Euclidean metric regularizer from the previous examples, the margin between FPD and Douglas-Rachford increases significantly. This is consistent with the remarks in Sect. 7.1: the lower iteration count of the Douglas-Rachford method becomes more important, as the projections dominate the per-iteration runtime (Fig. 16). Surprisingly, the Nesterov method did not converge at all. On inspecting the per-iteration runtime, we found that after the first few outer iterations, the iterative projections became very slow and eventually exceeded the limit of 50 iterations, with δ remaining between 2 and 5. In contrast, 20 Dykstra iterations were usually sufficient to obtain δ = 10⁻⁹ (Douglas-Rachford) resp. δ = 10⁻¹¹ (FPD). We again attribute this to the averaging property of the Nesterov method: as it accumulates the results of the previous projections, errors from the inexact projections build up. This is accelerated by the dual variables quickly becoming infeasible, with increasing distance to the dual feasible set, which in turn puts higher demands on the iterative projections. Douglas-Rachford and FPD did not display this behavior and consistently required 5 to 6 Dykstra iterations from the first to the last outer iteration.

7.5 Tightness of the Relaxations

Besides the properties of the optimization methods, it is interesting to study the effect of the relaxation, i.e. Euclidean metric or envelope type, on the relaxed and binarized solutions.

To get an insight into the tightness of the relaxations, we used the Douglas-Rachford method to repeat the "triple point" inpainting experiment from [CCP08] with both relaxations (Fig. 17). Despite the inaccuracies in the projections, the envelope regularizer generates a nearly binary solution: 97.6% of all pixels were assigned "almost binary" labels with an $\ell_\infty$ distance of less than 0.05 to one of the unit vectors $\{e^1, \dots, e^l\}$. For the Euclidean metric method, this constraint was satisfied at only 88.6% of the pixels. The result for the envelope relaxation is very close to the sharp triple junction one would expect from the continuous formulation, and shows that the envelope relaxation is tighter than the Euclidean metric method. (The "almost binary" criterion is made precise in the sketch below.)
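A minimal sketch of the "almost binary" criterion used in the tightness comparison; the helper name is hypothetical.

```python
import numpy as np

def almost_binary_fraction(u, eps=0.05):
    """Fraction of pixels whose relaxed label vector lies within
    l-infinity distance eps of some unit vector (Sect. 7.5).
    u: (n, l) array of simplex-valued label vectors."""
    n, l = u.shape
    # distance to e^t is max(|u_t - 1|, max_{i != t} |u_i|);
    # a pixel counts if the minimum over t is below eps
    dists = np.stack([np.maximum(np.abs(u[:, t] - 1.0),
                                 np.max(np.delete(u, t, axis=1), axis=1))
                      for t in range(l)], axis=1)
    return float(np.mean(dists.min(axis=1) < eps))
```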
However, after binarization both approaches generate almost identical discrete results. The Euclidean metric method was more than four times faster, at 41.1 seconds per 1000 iterations vs. 172.16 seconds for the envelope relaxation, which required 8–11 Dykstra steps per outer iteration.

Figure 17: Tightness of the relaxation. Top row: in the input (left), the data term was blanked out in a quadratic region. All structure within the region is generated purely by the regularizer with a standard Potts interface potential. The envelope relaxation is tighter and generates a much more "binary" solution (center) than the Euclidean metric method (right). Bottom row: after binarization of the relaxed solutions, the envelope (left) and Euclidean metric (center) methods generate essentially the same solution, as can be seen in the difference image (right). The Euclidean metric method performed more than four times faster due to the inexpensive projections.

While the triple point is a problem specifically designed to challenge the regularizer, real-world images usually contain more structure as well as noise, while the data term is available for most or all of the image. To see whether the above results also hold under these conditions, we repeated the previous experiment with the "sailing" image and four classes (Fig. 18). The improved tightness of the envelope relaxation was again noticeable, with 96.2% vs. 90.6% of "almost binary" pixels. However, due to the larger number of labels and the larger image size of 360 × 240, runtimes increased to 4253 seconds for the envelope relaxation vs. 420 seconds for the Euclidean metric method. The relaxed as well as the binarized solutions show some differences, but are hard to distinguish visually. It is difficult to pinpoint whether these differences are caused by the tighter relaxation or by numerical issues: while the Douglas-Rachford method applied to the Euclidean metric relaxation converged to a final relative gap of $1.5 \cdot 10^{-6}$, no such bound is available to estimate the accuracy of the solution for the envelope relaxation, due to the inexact projections and the intractable primal objective.

Figure 18: Effect of the choice of relaxation method on the real-world "sailing" image (image courtesy of F. Becker). Top row: four-class segmentation using the envelope (left) and Euclidean metric (right) methods; shown are the solutions of the relaxed problem. Bottom row: original image (left); difference image of the discretized solutions (right). While the envelope relaxation leads to substantially more "almost discrete" values in the relaxed solution, it also runs more than 10 times slower and does not provide a suboptimality bound. The generated solutions are visually almost identical.

7.6 Binarization and Global Optimality

As an example of a problem with a large number of classes, we analyzed the "penguin" inpainting problem from [SZS+06]. We chose 64 labels corresponding to 64 equally spaced gray values. The input image contains a region where the image must be inpainted, in addition to removing the considerable noise. Again, the data term was generated by the $\ell_1$ distance, which here reduces to the absolute difference of the gray values. In order to remove noise without overly penalizing hard contrasts, such as between the black wing and the white front, we chose a regularizer based on the truncated linear potential as introduced in Sect. 4.4 (see the sketch below). Due to the large number of labels, this problem constitutes an example where the Euclidean metric approach is very useful: as the complexity of the projections for the envelope relaxation grows quadratically with the number of labels, computation time becomes prohibitively long even for a moderate number of classes. In contrast, the Euclidean metric method requires considerably less computational effort and still approximates the potential function to a reasonable accuracy (Fig. 7).
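A minimal sketch of the truncated linear interaction potential referenced above, assuming the standard form $d(i,j) = \min(|i - j|, t)$; the truncation threshold used in the paper is not stated in this section, so the value below is illustrative only.

```python
import numpy as np

def truncated_linear(l, t):
    """Truncated linear potential d(i, j) = min(|i - j|, t) on l gray
    levels; t is the truncation threshold (assumed, not from the paper)."""
    idx = np.arange(l)
    return np.minimum(np.abs(idx[:, None] - idx[None, :]), t).astype(float)

d = truncated_linear(64, 8)   # 64 labels; the threshold 8 is illustrative
```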
In the practical evaluation, the Douglas-Rachford method converged within 1000 iterations to a relative gap of $8.3 \cdot 10^{-4}$, and recovered both smooth details near the beak and hard edges in the inpainting region (Fig. 19). This example also clearly demonstrates the superiority of the improved binarization scheme proposed in Sect. 5.6: compared with the first-max scheme, the improved scheme generated considerably less noise, and the energy increased by only 2.78%, compared with 15.78% for the first-max approach.

The low energy increase is directly related to global optimality for the discrete problem: as the relaxed solution is provably nearly optimal, we conclude that the energy of the binarized solution must lie within 2.78% of the optimal energy for the original combinatorial problem (1). Similar results were obtained for the other images: 5.64% for the "four colors" demo, 1.02% for the "leaf" image, and 0.98% for the "triple point" problem. These numbers indicate that the relaxation is quite tight in many cases, and allows recovering good approximations of the solution of the discrete combinatorial labeling problem by solving the convex relaxed problem.

Figure 19: Denoising/inpainting problem with 64 classes and the nontrivial truncated linear potential, approximated by a Euclidean metric. Left to right: noisy input image with the inpainting region marked black [SZS+06]; result with first-max binarization; result with the proposed binarization scheme (104). The first-max method introduces noticeable noise in the binarization step. The proposed method takes into account the non-uniformity of the "cut-linear" potential used (Fig. 7), resulting in a clean labeling and an energy increase of only 2.78%, vs. 15.78% for the first-max method. This shows that the obtained solution solves the originally combinatorial multiclass labeling problem to a suboptimality of 2.78%.

8 Conclusion and Further Work

The present work provides a reference and framework for continuous multilabeling approaches. The presented algorithms are robust and fast, and are suited for massive parallelization. From the experiments it became clear that solving the convex relaxed problem allows recovering very good solutions of the original combinatorial problem in most cases.

The performance evaluations showed that the Douglas-Rachford method consistently requires about one third of the iterations of the Fast Primal-Dual method. For low regularization and fast projections, FPD outperforms the Douglas-Rachford method; in all other cases, Douglas-Rachford performs equally well or better, with a speedup of 2–3 if the projections are expensive. Overall, the proposed Douglas-Rachford approach appears to be a solid all-round method that also handles extreme cases well.

From the viewpoint of numerics, our evaluations did not specifically consider the effect of choosing different step sizes for the Douglas-Rachford method. Also, the smooth optimization step in the Nesterov method seems to usually perform much better than its theoretical bound; adjusting the strategy for choosing the smoothing parameter could yield faster overall convergence and possibly render the method competitive. In this regard, it would also be interesting to include the inexactness of the projections in the convergence analysis. Several theoretical questions remain open as well, such as how to include non-metric distances in the continuous formulation. In any case, we think that the present framework unites the continuous and discrete worlds in an appealing way, and hope that it contributes to reducing the popularity gap compared to purely grid-based graph cut methods.

Acknowledgments. The authors would like to thank Simon Setzer for the stimulating discussion. We are also grateful to an anonymous reviewer, whose extensive comments greatly helped to improve the presentation of this work.

9 Appendix

Proposition 10. Let $v = (v^1, \dots, v^l) \in \mathcal{D}^d_{\mathrm{loc}}$. Then
$\|v\| \leq \min_i \big( \sum_j d(i,j)^2 \big)^{1/2}$.   (141)

Proof of Prop. 10. From the constraint $\sum_{j=1}^l v^j = 0$ in (70) we deduce, for arbitrary but fixed $i \in \{1, \dots, l\}$,
$\sum_{j=1}^l \|v^j\|^2 \leq \sum_{j=1}^l \|v^j\|^2 - 0 + l \|v^i\|^2 = \sum_{j=1}^l \|v^j\|^2 - 2 \big\langle v^i, \sum_{j=1}^l v^j \big\rangle + l \|v^i\|^2 = \sum_{j=1}^l \big( \|v^j\|^2 - 2 \langle v^i, v^j \rangle + \|v^i\|^2 \big) = \sum_{j=1}^l \|v^j - v^i\|^2 \leq \sum_{j=1}^l d(i,j)^2$.   (142)
As $i$ was arbitrary, this proves the assertion.

Proof of Eq. (126). We mainly use the fact that $(P \otimes Q)(R \otimes S) = (PR) \otimes (QS)$ for matrices $P, Q, R, S$ of compatible dimensions:
$(I + L^\top L)^{-1} = \big( I + (A \otimes \mathrm{grad})^\top (A \otimes \mathrm{grad}) \big)^{-1}$   (143)
$= \big( I + (A^\top A) \otimes (\mathrm{grad}^\top \mathrm{grad}) \big)^{-1}$   (144)
$= \big( I + (V^{-1} \mathrm{diag}(a) V) \otimes (B^{-1} \mathrm{diag}(c) B) \big)^{-1}$   (145)
$= \big( I + (V^{-1} \otimes B^{-1}) (\mathrm{diag}(a) \otimes \mathrm{diag}(c)) (V \otimes B) \big)^{-1}$   (146)
$= \big( (V^{-1} \otimes B^{-1}) (I + \mathrm{diag}(a) \otimes \mathrm{diag}(c)) (V \otimes B) \big)^{-1}$   (147)
$= (V^{-1} \otimes B^{-1}) \big( I + \mathrm{diag}(a) \otimes \mathrm{diag}(c) \big)^{-1} (V \otimes B)$.   (148)
Using $V^{-1} = V^\top$, (126) follows.
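A numeric sanity check of the factorization behind (126), using generic orthogonal eigendecompositions in place of the explicit DCT matrix $B$; dimensions are small illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
l, k, n = 3, 2, 5
A = rng.standard_normal((k, l))
# 1-D forward differences with Neumann boundary and the resulting Laplacian
grad = np.zeros((n, n))
for i in range(n - 1):
    grad[i, i], grad[i, i + 1] = -1.0, 1.0
G = grad.T @ grad
a, U = np.linalg.eigh(A.T @ A)        # A^T A = U diag(a) U^T
c, Q = np.linalg.eigh(G)              # grad^T grad = Q diag(c) Q^T
L = np.kron(A, grad)                  # L = A (x) grad
lhs = np.linalg.inv(np.eye(l * n) + L.T @ L)
W = np.kron(U, Q)                     # (P (x) Q)(R (x) S) = PR (x) QS
rhs = W @ np.diag(1.0 / (1.0 + np.kron(a, c))) @ W.T
assert np.allclose(lhs, rhs)          # the inversion reduces to a diagonal solve
```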
Proof of Prop. 9. The idea of the proof is to show that the sequence $(w''^{(k)})$ is exactly the minimizing sequence produced by the algorithm applied to the dual problem (87), with step size $1/\tau$. Thus, if the dual algorithm converges, $(w''^{(k)})$ converges to the solution of the dual problem, which proves the assertion.

Algorithm 6 Dual Douglas-Rachford for Multi-Class Labeling
1: Choose $\bar{v}^{(0)} \in \mathbb{R}^{n \times d \times l}$, $\bar{z}^{(0)} \in \mathbb{R}^{n \times l}$.
2: Choose $\tau_D > 0$.
3: $k \leftarrow 0$.
4: while (not converged) do
5: $v^{(k)} \leftarrow \Pi_{\mathcal{D}}\big( \bar{v}^{(k)} - \tau_D b \big)$.
6: $z''^{(k)} \leftarrow \Pi_{\mathcal{C}}\big( \tfrac{1}{\tau_D} ( \bar{z}^{(k)} - s ) \big)$.
7: $v'^{(k)} \leftarrow (I + L L^\top)^{-1} \big( (2 v^{(k)} - \bar{v}^{(k)}) + (-L)( \bar{z}^{(k)} - 2 \tau_D z''^{(k)} ) \big)$.
8: $z'^{(k)} \leftarrow (-L^\top) v'^{(k)}$.
9: $\bar{v}^{(k+1)} \leftarrow \bar{v}^{(k)} + v'^{(k)} - v^{(k)}$.
10: $\bar{z}^{(k+1)} \leftarrow z'^{(k)} + \tau_D z''^{(k)}$.
11: $k \leftarrow k + 1$.
12: end while

To show the equivalence, first note that the formulation of the primal problem from (86),
$\min_{u \in \mathcal{C}} \max_{v \in \mathcal{D}} \langle u, s \rangle + \langle Lu, v \rangle - \langle b, v \rangle$,   (149)
already covers the dual problem
$\max_{v \in \mathcal{D}} \min_{u \in \mathcal{C}} \langle u, s \rangle + \langle Lu, v \rangle - \langle b, v \rangle = -\min_{v \in \mathcal{D}} \max_{u \in \mathcal{C}} \langle b, v \rangle + \langle u, -L^\top v \rangle - \langle u, s \rangle$   (150)
by the substitutions
$v \leftrightarrow u, \quad \mathcal{C} \leftrightarrow \mathcal{D}, \quad b \leftrightarrow s, \quad L \leftrightarrow -L^\top$.   (151)
The dual problem can thus be solved by applying the above substitutions to Alg. 4. Additionally substituting $w \leftrightarrow z$ in Alg. 4, in order to avoid confusion with the iterates of the primal method, leads to the dual algorithm, Alg. 6.

We first show convergence of Alg. 4 and Alg. 6. By construction, these amount to applying Douglas-Rachford splitting to the primal resp. dual formulations
$\min_{u, w} \underbrace{\delta_{\{Lu = w\}}(u, w)}_{=: h_1(u,w)} + \underbrace{\langle u, s \rangle + \delta_{\mathcal{C}}(u) + \sigma_{\mathcal{D}}(w - b)}_{=: h_2(u,w)}$,   (152)
$\min_{v, z} \underbrace{\delta_{\{-L^\top v = z\}}(v, z)}_{=: h_{D,1}(v,z)} + \underbrace{\langle v, b \rangle + \delta_{\mathcal{D}}(v) + \sigma_{\mathcal{C}}(z - s)}_{=: h_{D,2}(v,z)}$.   (153)
As both parts of the objectives are proper, convex and lsc, it suffices to show additivity of the subdifferentials [RW04, Cor. 10.9], [Eck89, Thm. 3.15, Prop. 3.20, Prop. 3.19], [EB92]. Due to the boundedness of $\mathcal{C}$ and $\mathcal{D}$,
$\mathrm{ri}(\mathrm{dom}\, h_2) \cap \mathrm{ri}(\mathrm{dom}\, h_1) = \big( \mathrm{ri}(\mathcal{C}) \times \mathbb{R}^{ndl} \big) \cap \{ Lu = w \} = \{ (u, Lu) \mid u \in \mathrm{ri}(\mathcal{C}) \}$,   (154)
$\mathrm{ri}(\mathrm{dom}\, h_{D,2}) \cap \mathrm{ri}(\mathrm{dom}\, h_{D,1}) = \big( \mathrm{ri}(\mathcal{D}) \times \mathbb{R}^{nl} \big) \cap \{ -L^\top v = z \} = \{ (v, -L^\top v) \mid v \in \mathrm{ri}(\mathcal{D}) \}$.   (155)
Both of these sets are nonempty, as $\mathrm{ri}(\mathcal{C}) \neq \emptyset \neq \mathrm{ri}(\mathcal{D})$. This implies additivity of the subdifferentials for the proposed objective [RW04, Cor. 10.9], and thus convergence (in the iterates as well as in the objective) of the tight Douglas-Rachford iteration (cf. [Eck89, Thm. 3.15]).

We will now show that the primal and dual algorithms essentially generate the same iterates: from $\tau := \tau_P = 1/\tau_D$, $\bar{u}^{(k)} = \tau \bar{z}^{(k)}$ and $\bar{w}^{(k)} = \tau \bar{v}^{(k)}$ it follows that $\bar{u}^{(k+1)} = \tau \bar{z}^{(k+1)}$, $\bar{w}^{(k+1)} = \tau \bar{v}^{(k+1)}$, and $u^{(k)} = z''^{(k)}$, $v^{(k)} = w''^{(k)}$. The last two equalities follow immediately from the previous ones by the definition of the algorithms. Furthermore,
$\bar{v}^{(k+1)} = \bar{v}^{(k)} + v'^{(k)} - v^{(k)} = \bar{v}^{(k)} + (I + LL^\top)^{-1}\big( (2v^{(k)} - \bar{v}^{(k)}) + (-L)(\bar{z}^{(k)} - 2\tau^{-1} z''^{(k)}) \big) - v^{(k)}$   (156)
$= \tau^{-1} \bar{w}^{(k)} + (I + LL^\top)^{-1}\big( (2w''^{(k)} - \tau^{-1}\bar{w}^{(k)}) + (-L)(\tau^{-1}\bar{u}^{(k)} - 2\tau^{-1} u^{(k)}) \big) - w''^{(k)}$   (157)
$= \tau^{-1}\bar{w}^{(k)} + (I + LL^\top)^{-1}(2w''^{(k)} - \tau^{-1}\bar{w}^{(k)}) - w''^{(k)} - (I + LL^\top)^{-1} L (\tau^{-1}\bar{u}^{(k)} - 2\tau^{-1} u^{(k)})$.   (158)
By the Woodbury identity, $(I + LL^\top)^{-1} = I - L (I + L^\top L)^{-1} L^\top$, and in particular $(I + LL^\top)^{-1} L = L (I + L^\top L)^{-1}$; therefore
$\bar{v}^{(k+1)} = \tau^{-1}\bar{w}^{(k)} + 2w''^{(k)} - \tau^{-1}\bar{w}^{(k)} - L(I + L^\top L)^{-1} L^\top (2w''^{(k)} - \tau^{-1}\bar{w}^{(k)}) - w''^{(k)} - L (I + L^\top L)^{-1} (\tau^{-1}\bar{u}^{(k)} - 2\tau^{-1}u^{(k)})$   (159)
$= w''^{(k)} - L(I + L^\top L)^{-1}\big( L^\top (2w''^{(k)} - \tau^{-1}\bar{w}^{(k)}) + (\tau^{-1}\bar{u}^{(k)} - 2\tau^{-1}u^{(k)}) \big)$   (160)
$= \tau^{-1} L (I + L^\top L)^{-1}\big( (2u^{(k)} - \bar{u}^{(k)}) + L^\top(\bar{w}^{(k)} - 2\tau w''^{(k)}) \big) + w''^{(k)}$   (161)
$= \tau^{-1}\big( L u'^{(k)} + \tau w''^{(k)} \big) = \tau^{-1} \bar{w}^{(k+1)}$.   (162)
By primal-dual symmetry, the same proof shows that $\bar{u}^{(k+1)} = \tau \bar{z}^{(k+1)}$. To conclude, we have shown that $w''^{(k)} = v^{(k)}$ for $\tau_D = 1/\tau$ and suitable initialization of the dual method. As the dual method was shown to converge, $(w''^{(k)})$ must be a maximizing sequence for $f_{\mathcal{D}}$. Together with the convergence of $u^{(k)}$ to a minimizer of $f$, this proves the first part of the proposition. Equation (129) follows, since for any saddle point $(u^*, v^*)$,
$f_{\mathcal{D}}\big( w''^{(k)} \big) \leq f_{\mathcal{D}}(v^*) = f(u^*) \leq f\big( u^{(k)} \big)$   (163)
holds, and therefore $f(u^{(k)}) - f(u^*) \geq 0$ and $f(u^*) \geq f_{\mathcal{D}}(w''^{(k)})$.

References

[AFP00] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Clarendon Press, 2000.
[AHU64] K. J. Arrow, L. Hurwicz, and H. Uzawa. Studies in Linear and Non-Linear Programming. With contributions by Hollis B. Chenery [and others]. Stanford University Press, 1964.
[AT06] B. Appleton and H. Talbot. Globally minimal surfaces by continuous maximal flows. Patt. Anal. Mach. Intell., 28:106–118, 2006.
[Auj08] J.-F. Aujol. Some first-order algorithms for total variation based image restoration. J. Math. Imaging Vis., 2008. Published online.
[BBC09] S. Becker, J. Bobin, and E. J. Candès. NESTA: A Fast and Accurate First-Order Method for Sparse Recovery, April 2009.
[BD86] J. P. Boyle and R. L. Dykstra. A method for finding projections onto the intersections of convex sets in Hilbert spaces. Lecture Notes in Statistics, 37:28–47, 1986.
[Ber09] B. Berkels. An unconstrained multiphase thresholding approach for image segmentation. In Scale Space and Var. Meth., volume 5567 of Springer LNCS, pages 26–37, 2009.
[BG05] I. Borg and P. J. F. Groenen. Modern Multidimensional Scaling. Springer, 2nd edition, 2005.
[BK04] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Patt. Anal. Mach. Intell., 26(9):1124–1137, 2004.
[Boy03] Y. Boykov. Computing geodesics and minimal surfaces via graph cuts. In Int. Conf. Comp. Vis., pages 26–33, 2003.
[Bra02] A. Braides. Gamma-Convergence for Beginners. Oxford Univ. Press, 2002.
[BT09] E. Bae and X.-C. Tai. Graph cut optimization for the piecewise constant level set method applied to multiphase image segmentation. In Scale Space and Var. Meth., volume 5567 of LNCS, pages 1–13, 2009.
[BVZ01] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. Patt. Anal. Mach. Intell., 23(11):1222–1239, 2001.
[BYT10] E. Bae, J. Yuan, and X.-C. Tai. Global minimization for continuous multiphase partitioning problems using a dual approach. CAM Report 09-75, 2010.
[CCP08] A. Chambolle, D. Cremers, and T. Pock. A convex approach for computing minimal partitions. Tech. Rep. 649, Ecole Polytechnique CMAP, 2008.
[CD09] A. Chambolle and J. Darbon. On total variation minimization and surface evolution using parametric maximum flows. Int. J. Comp. Vis., 84:288–307, 2009.
[CEN06] T. F. Chan, S. Esedoḡlu, and M. Nikolova. Algorithms for finding global minimizers of image segmentation and denoising models. J. Appl. Math., 66(5):1632–1648, 2006.
[CP08] P. L. Combettes and J.-C. Pesquet. A proximal decomposition method for solving convex variational inverse problems. Inverse Problems, 24(6), 2008.
[CP10] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 2010. Published online.
[DAV08] V. Duval, J.-F. Aujol, and L. Vese. A projected gradient algorithm for color image decomposition. CMLA Preprint, (2008-21), 2008.
[DFPH09] A. Delaunoy, K. Fundana, E. Prados, and A. Heyden. Convex multi-region segmentation on manifolds. In Int. Conf. Comp. Vis., 2009.
[DR56] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Trans. of the AMS, 82(2):421–439, 1956.
[EB92] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Prog., 55:293–318, 1992.
[Eck89] J. Eckstein. Splitting Methods for Monotone Operators with Application to Parallel Optimization. PhD thesis, MIT, 1989.
[FR60] W. H. Fleming and R. Rishel. An integral formula for total gradient variation. Archiv der Mathematik, 11(1):218–222, 1960.
[GBO09] T. Goldstein, X. Bresson, and S. Osher. Geometric applications of the split Bregman method: Segmentation and surface reconstruction. CAM Report 09-06, UCLA, 2009.
[GM89] N. Gaffke and R. Mathar. A cyclic projection algorithm via duality. Metrika, 36(1):29–54, 1989.
[GM09] D. Goldfarb and S. Ma. Fast multiple splitting algorithms for convex optimization. arXiv Preprint 0912.4570, 2009.
[Gol64] A. A. Goldstein. Convex programming in Hilbert space. Bull. Amer. Math. Soc., 70:709–710, 1964.
[Gow85] J. C. Gower. Properties of Euclidean and non-Euclidean distance matrices. Lin. Alg. and its Appl., 67:81–97, 1985.
[Gra81] A. Graham. Kronecker Products and Matrix Calculus with Applications. J. Wiley and Sons, NY, 1981.
[Ish03] H. Ishikawa. Exact optimization for Markov random fields with convex priors. Patt. Anal. Mach. Intell., 25(10):1333–1336, 2003.
[JT95] C. R. Johnson and P. Tarazaga. Connections between the real positive semidefinite and distance matrix completion problems. Lin. Alg. and its Appl., 223–224:375–391, 1995.
[KB05] V. Kolmogorov and Y. Boykov. What metrics can be approximated by geo-cuts, or global optimization of length/area and flux. Int. Conf. Comp. Vis., 1:564–571, 2005.
[KT99] J. M. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. In Found. Comp. Sci., pages 14–23, 1999.
[KT07] N. Komodakis and G. Tziritas. Approximate labeling via graph cuts based on linear programming. Patt. Anal. Mach. Intell., 29(8):1436–1453, 2007.
[LBS09] J. Lellmann, F. Becker, and C. Schnörr. Convex optimization for multi-class image labeling with a novel family of total variation based regularizers. In Int. Conf. Comp. Vis., 2009.
[LBS10] J. Lellmann, D. Breitenreicher, and C. Schnörr. Fast and exact primal-dual iterations for variational problems in computer vision. In Europ. Conf. Comp. Vis., volume 6312 of LNCS, pages 494–505, 2010.
[LKY+09] J. Lellmann, J. Kappes, J. Yuan, F. Becker, and C. Schnörr. Convex multi-class image labeling by simplex-constrained total variation. In Scale Space and Var. Meth., volume 5567 of LNCS, pages 150–162, 2009.
[LLT06] J. Lie, M. Lysaker, and X.-C. Tai. A variant of the level set method and applications to image segmentation. Math. Comp., 75:1155–1174, 2006.
[LM79] P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Num. Anal., 16(6):964–979, 1979.
[LP66] E. S. Levitin and B. T. Polyak. Constrained minimization problems. U.S.S.R. Comput. Math. Math. Phys., 6:1–50, 1966.
[Mic86] C. Michelot. A finite algorithm for finding the projection of a point onto the canonical simplex of R^n. J. Optim. Theory and Appl., 50(1):195–200, 1986.
[MS89] D. Mumford and J. Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math., 42:577–685, 1989.
[Mur03] K. Murota. Discrete Convex Analysis. SIAM, 2003.
[Nes04] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Prog., 103(1):127–152, 2004.
[Ols09] C. Olsson. Global Optimization in Computer Vision: Convexity, Cuts and Approximation Algorithms. PhD thesis, Lund University, 2009.
[PCBC09a] T. Pock, D. Cremers, H. Bischof, and A. Chambolle. An algorithm for minimizing the Mumford-Shah functional. In Int. Conf. Comp. Vis., 2009.
[PCBC09b] T. Pock, D. Cremers, H. Bischof, and A. Chambolle. Global solutions of variational models with convex regularization. Technical report, Graz Univ. of Tech., 2009. Preprint.
[PCF06] N. Paragios, Y. Chen, and O. Faugeras, editors. The Handbook of Mathematical Models in Computer Vision. Springer, 2006.
[Pop80] L. D. Popov. A modification of the Arrow-Hurwicz method for search of saddle points. Math. Notes, 28:845–848, 1980.
[Roc70] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[ROF92] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
[RW04] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer, 2nd edition, 2004.
[Set09a] S. Setzer. Split Bregman algorithm, Douglas-Rachford splitting and frame shrinkage. In Scale Space and Variational Methods in Computer Vision, volume 5567 of LNCS, pages 464–476, 2009.
[Set09b] S. Setzer. Splitting Methods in Image Processing. PhD thesis, University of Mannheim, September 2009.
[SR96] G. Sapiro and D. L. Ringach. Anisotropic diffusion of multi-valued images with applications to color filtering. Trans. Image Process., 5:1582–1586, 1996.
[Str83] G. Strang. Maximal flow through a domain. Math. Prog., 26:123–143, 1983.
[Str99] G. Strang. The discrete cosine transform. SIAM Review, 41(1):135–147, 1999.
[SZS+06] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields. In Europ. Conf. Comp. Vis., volume 2, pages 19–26, 2006.
[TPCB08] W. Trobin, T. Pock, D. Cremers, and H. Bischof. Continuous energy minimization by repeated binary fusion. In ECCV, volume 4, pages 667–690, 2008.
[WABF07] P. Weiss, G. Aubert, and L. Blanc-Féraud. Efficient schemes for total variation minimization under constraints in image processing. Tech. Rep. 6260, INRIA, 2007.
[Win06] G. Winkler. Image Analysis, Random Fields and Markov Chain Monte Carlo Methods. Springer, 2006.
[WSV00] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of Semidefinite Programming. Kluwer Academic Publishers, 2000.
[WX04] C. Wang and N. Xiu. Convergence of the gradient projection method for generalized convex minimization. Comp. Opt. Appl., 16:111–120, 2004.
[Xu00] S. Xu. Estimation of the convergence rate of Dykstra's cyclic projections algorithm in polyhedral case. Acta Mathematicae Applicatae Sinica, 16(2):217–220, 2000.
[ZC08] M. Zhu and T. Chan. An efficient primal-dual hybrid gradient algorithm for total variation image restoration. CAM Report 08-34, UCLA, 2008.
[ZGFN08] C. Zach, D. Gallup, J.-M. Frahm, and M. Niethammer. Fast global labeling for real-time stereo using multiple plane sweeps. In Vis. Mod. Vis., 2008.
[Zie89] W. P. Ziemer. Weakly Differentiable Functions. Springer, 1989.
[ZNF09] C. Zach, M. Niethammer, and J.-M. Frahm. Continuous maximal flows and Wulff shapes: Application to MRFs. In Comp. Vis. Patt. Recogn., pages 1911–1918, 2009.