Compact Relaxations for MAP Inference in Pairwise MRFs with Piecewise Linear Priors
Authors: Christopher Zach, Christian Häne
August 15, 2013

Abstract

Label assignment problems with large state spaces are important tasks especially in computer vision. Often the pairwise interaction (or smoothness prior) between labels assigned at adjacent nodes (or pixels) can be described as a function of the label difference. Exact inference in such labeling tasks is still difficult, and therefore approximate inference methods based on a linear programming (LP) relaxation are commonly used in practice. In this work we study how compact linear programs can be constructed for general piecewise linear smoothness priors. The number of unknowns is $O(LK)$ per pairwise clique in terms of the state space size $L$ and the number of linear segments $K$. This compares to an $O(L^2)$ size complexity of the standard LP relaxation if the piecewise linear structure is ignored. Our compact construction and the standard LP relaxation are equivalent and lead to the same (approximate) label assignment.

1 Introduction

Determining a maximum a-posteriori (MAP) solution in graphical models over discrete states, or equivalently finding a minimizer of a corresponding energy, is one of the fundamental tools in machine learning and computer vision. In this work we focus on problems with at most pairwise cliques (and associated pairwise potentials) in the graphical model. In the following we use the terms "pairwise potential" and "(smoothness) prior" synonymously. Since exact inference in graphical models (even with at most pairwise potentials) is generally not tractable, research has focused on tractable and high-quality approximate inference algorithms, of which the linear programming (LP) relaxation for discrete inference tasks (e.g. [4, 22, 21, 20]) has received much attention.
In some cases the LP relaxation solves the inference problem exactly [10, 13, 18], i.e. the relaxation is tight for certain problem classes. Due to the specific structure of the resulting linear program, generic methods to solve linear programs are inefficient, and therefore many specialized algorithms to find a minimizer have been proposed in the literature. Since the primal linear program for MAP estimation in pairwise problems has a quadratic number of unknowns in terms of the state space size $L$ (per edge in the underlying graph), the dual program with only a linear number of unknowns (but with a quadratic number of e.g. constraints) is more appealing. Belief-propagation inspired message passing methods optimize this dual program in a block-coordinate schedule [14, 12, 6, 8]. As block-coordinate methods applied on a non-smooth and not strictly concave problem, these approaches iteratively increase the dual objective but are not guaranteed to find a global maximizer of the (concave dual) problem. This is well known in the literature, and we validate this occasional "early stopping" behavior in the experimental section. Supergradient methods with an appropriate stepsize rule are guaranteed to converge to a maximizer, but have a slow $O(1/\sqrt{T})$ convergence rate (where $T$ is the iteration count). To our knowledge all variations of faster $O(1/T)$ proximal methods explicitly or implicitly maintain $O(L^2)$ primal unknowns, and are therefore prohibitively expensive (in terms of memory requirements) for large state spaces. Variable smoothing methods for non-smooth problems (e.g. [1]) have an $O(\ln(T)/T)$ convergence rate and require only $O(L)$ unknowns if applied on the dual problem, but in practice show slow convergence in our experience (as verified in our experimental section).

(* Microsoft Research Cambridge. † ETH Zürich, Switzerland.)
Figure 1: Pairwise potentials (in terms of the label difference $h$) suitable for compact relaxations.

A natural question is whether the size of the primal program can be reduced if the pairwise potentials are not arbitrary but have some useful "structure." For particular pairwise potentials such as the $L^1$-norm of label differences [11, 2] or truncated $L^1$ priors [26] the answer is affirmative. We generalize these particular results to pairwise potentials that are piecewise linear in terms of the label difference. In many computer vision related inference problems the state space is a set of numeric values and the employed smoothness prior is naturally based on label value differences. Some of the potentials addressed in our work are illustrated in Fig. 1. We show that for such pairwise potentials (but with arbitrary unary ones) we can reformulate the primal linear programs in order to reduce the number of primal variables from $O(L^2)$ to $O(KL)$, where $K$ is the number of linear pieces defining the pairwise potential. The number of dual unknowns (or primal constraints, respectively) increases to $O(KL)$, meaning an overall $O(KL)$ problem size. Specifically, we consider pairwise potentials that can be written as a pointwise minimum of convex ones. Since our construction is a reformulation, there is a correspondence between minimizers of the original LP and the ones from our reduced program. Thus, our construction does not weaken the LP relaxation for inference.

This manuscript is organized as follows: after introducing relevant notations and briefly reviewing approximate MAP inference in Section 2, we state our main result in Section 3. The corresponding proof is constructive and the two main ingredients are presented in Sections 4 (convex pairwise priors) and 5 (pointwise minimum of pairwise potentials), respectively.
Since the underlying techniques are useful in their own right, we provide the material in separate (and relatively self-contained) sections. In Section 6 we discuss extensions of the proposed reformulations to enable more isotropic behavior of solutions, which can be relevant in image processing applications. In Section 7 we experimentally verify that message passing methods can stop early, and we demonstrate our approach in an image denoising experiment.

2 Background

In this section we introduce some notation used throughout the manuscript, and further provide a short review on approximate inference for labeling problems.

2.1 Notations

The domain of the considered label assignment task is a graph $G = (V, E)$ with node set $V$ and edge set $E$. In computer vision and image processing applications the node set is typically a regular pixel grid and $E$ is induced by e.g. a 4-connected or 8-connected neighborhood structure. We will write $\sum_s$ and $\sum_{s \sim t}$ as shorthand notations for $\sum_{s \in V}$ and $\sum_{(s,t) \in E}$, respectively. Our convention is that $s$, $t$ denote nodes from $V$ and $i$, $j$ indicate states (or labels). We will also use the sets $\mathrm{out}(s)$ for the successor nodes of $s$ and $\mathrm{in}(s)$ for the ancestor nodes of $s$.

The $d$-dimensional (unit or probability) simplex is defined as $\Delta_d \stackrel{\text{def}}{=} \{ x \ge 0 : \sum_{i=0}^{d-1} x_i = 1 \}$. Elements $x \in \Delta_d$ can be seen as discrete probability densities, and we denote the corresponding cumulative distribution function $X$ with $X_i = \sum_{j=0}^{i-1} x_j$. We extend $X$ to indices $i \in \mathbb{Z}$ with $X_i = 0$ for $i \le 0$ and $X_i = 1$ for $i \ge L$. If $x \in \Delta_d$ is integral (e.g. $x_i = 1$ for some $i$, and 0 otherwise), then $X$ can also be interpreted as a superlevel function with $X_j = [j > i]$. The main purpose of introducing $X_i$ is to have a shorthand notation for $\sum_{j=0}^{i-1} x_j$, which will occur frequently in this manuscript.
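As a small illustration (our own helper, not part of the paper), the cumulative function $X$ together with its out-of-range convention can be coded directly; for a one-hot $x$ it reproduces the superlevel function:

```python
def cumulative(x, i, L):
    """Cumulative function X_i = sum_{j < i} x_j of a simplex vector x,
    extended to all integers: X_i = 0 for i <= 0 and X_i = 1 for i >= L."""
    if i <= 0:
        return 0.0
    if i >= L:
        return 1.0
    return sum(x[:i])

# For an integral ("one-hot") x with x_k = 1, X is the superlevel
# function X_j = [j > k]:
x = [0.0, 1.0, 0.0, 0.0]                     # one-hot label k = 1, L = 4
X = [cumulative(x, i, 4) for i in range(-1, 6)]
```

Here `X` evaluates to `0` up to and including index `k` and to `1` afterwards, matching the superlevel interpretation above.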
We use the notations $\imath_C(x)$ and $\imath\{x \in C\}$ to write a constraint $x \in C$ as an extended valued function, i.e. $\imath_C(x)$ is 0 iff $x \in C$ and $\infty$ otherwise. For a convex function $f$ we denote its convex conjugate by $f^*$ and the l.s.c. extension of its perspective as $\tilde f$, which can be defined via the biconjugate:

\[ \tilde f(z, w) \stackrel{\text{def}}{=} \max_{\mu, \nu :\, \mu + f^*(\nu) \le 0} z\mu + w^T\nu. \tag{1} \]

Throughout the manuscript we assume that the recession function of $f$ is $\imath_{\{0\}}$ (i.e. $\lim_{z \to 0^+} \tilde f(z, w) = \imath_{\{0\}}(w)$). This can be achieved by adding redundant bounds constraints to $f$, since all unknowns in our convex problems are usually restricted to $[0, 1]$. In Section 5 we will make use of the following fact (see e.g. [27]):

Lemma 1. Let $\{f_i\}_{i=1,\dots,n}$ be a family of convex functions; then

\[ \min_{\xi \in \mathbb{R}^d} \min_i f_i(\xi) = \min_{z \in \Delta_n} \min_{w_i \in \mathbb{R}^d} \sum_{i=1}^n \tilde f_i(z_i, w_i). \tag{2} \]

2.2 Approximate Inference

For a given graph $G = (V, E)$ and label (state) space $\mathcal{L} = \{0, \dots, L-1\}$ the task of inference in a label assignment problem is to determine a minimizer of

\[ E_{\text{labeling}}(\Lambda) = \sum_s \theta_s^{\Lambda(s)} + \sum_{s \sim t} \theta_{st}^{\Lambda(s),\Lambda(t)}, \tag{3} \]

where $\Lambda : V \to \mathcal{L}$ (a mapping from nodes to states), and $\theta_s^i$ and $\theta_{st}^{ij}$ are the unary and pairwise potentials, respectively. The local nature of the modeled interactions means that the above label assignment problem is an instance of a Markov Random Field (MRF). Note that with respect to inference (i.e. finding a MAP solution) conditional random fields (CRFs) are completely equivalent to MRFs, and our use of the term "MRF" includes both CRFs and "proper" MRFs. In this work we focus on problems with at most pairwise interactions between labels. In general such labeling problems are difficult to solve exactly due to the NP-hardness of many instances.
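For intuition, the labeling energy (3) can be minimized exactly by brute-force enumeration on very small instances (the potentials below are made-up toy values, and the truncated $L^1$ prior is one of the label-difference priors discussed later):

```python
import itertools

def map_by_enumeration(unary, edges, pairwise):
    """Exact MAP for Eq. (3) by enumerating all label assignments.
    unary[s][i]: unary potential, pairwise(i, j): pairwise potential.
    Only feasible for tiny graphs (L^|V| assignments)."""
    V, L = len(unary), len(unary[0])
    best, best_cost = None, float('inf')
    for labels in itertools.product(range(L), repeat=V):
        cost = sum(unary[s][labels[s]] for s in range(V))
        cost += sum(pairwise(labels[s], labels[t]) for s, t in edges)
        if cost < best_cost:
            best, best_cost = labels, cost
    return best, best_cost

# Toy chain 0 - 1 - 2 with L = 3 labels and a truncated L1 prior.
unary = [[0.0, 2.0, 2.0], [2.0, 0.5, 2.0], [2.0, 2.0, 0.0]]
edges = [(0, 1), (1, 2)]
prior = lambda i, j: min(abs(i - j), 1.5)   # truncated L1, tau = 1.5
labels, cost = map_by_enumeration(unary, edges, prior)
```

The exponential cost of this enumeration is exactly what motivates the LP relaxation discussed next.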
A tractable approximation to $E_{\text{labeling}}$ is obtained by "lifting" the problem to a higher dimensional setting: for each node $s \in V$ and each edge $(s,t) \in E$ vectors $x_s \in \Delta_L$ and $x_{st} \in \Delta_{L^2}$ are introduced, which denote "soft one-hot" encodings of the labels assigned to a node (or to an edge, respectively) (see e.g. [22, 21, 20]). As a result one obtains the following linear programming relaxation enabling tractable approximate inference:

\[ E_{\text{LP-MRF}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s \sim t} \sum_{i,j} \theta_{st}^{ij} x_{st}^{ij} \tag{4} \]
\[ \text{subject to} \quad x_s^i = \sum_j x_{st}^{ij}, \quad x_t^j = \sum_i x_{st}^{ij}, \quad x_s \in \Delta_L, \quad x_{st}^{ij} \ge 0 \quad \forall s, t, i, j. \]

The first set of constraints are usually called marginalization constraints. These constraints ensure that the labels assigned to edges are consistent with the ones assigned at nodes. $x_s \in \Delta_L$ implies the normalization constraint $\sum_i x_s^i = 1$.

In many computer vision problems (e.g. image denoising, optical flow) the pairwise terms often do not depend on the actual labels $i$ and $j$ but only on their difference (i.e. the "height" of jumps between labels). In this case the pairwise potentials can be written as $\theta_{st}^{ij} = \vartheta_{st}^{j-i}$, or even as $\theta_{st}^{ij} = \theta_{st}^{ji} = \vartheta_{st}^{|i-j|}$ in the case of symmetric ones. The number of unknowns in Eq. 4 is in $O(L^2 |E|)$, which can make inference with many labels costly. For certain pairwise potentials the number of unknowns can be reduced to $O(L |E|)$, e.g. $L^1$ potentials ($\theta_{st}^{ij} = w_{st} |i-j|$, [11, 2]) and truncated $L^1$ potentials ($\theta_{st}^{ij} = w_{st} \min\{\tau_{st}, |i-j|\}$, e.g. [26]). In this work we show that the number of primal unknowns can be reduced from $O(L^2 |E|)$ to $O(KL |E|)$ for piecewise linear potentials consisting of $K$ segments.

3 The Main Result

In this section we state our main result, which generically shows that piecewise linear pairwise potentials allow for a compact reformulation of $E_{\text{LP-MRF}}$ (Eq. 4).
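For an integral labeling the lifted edge variables are simply outer products of the one-hot node vectors, which satisfy the marginalization constraints by construction, and the pairwise part of Eq. 4 then coincides with the corresponding term of Eq. 3. A small sanity check with toy numbers (not from the paper):

```python
def lift(xs, xt):
    """Edge variables x_st for integral node labels: the outer product
    satisfies both marginalization constraints by construction."""
    return [[xs[i] * xt[j] for j in range(len(xt))] for i in range(len(xs))]

L = 3
xs, xt = [0, 1, 0], [0, 0, 1]          # one-hot labels i = 1, j = 2
xst = lift(xs, xt)

# marginalization constraints of Eq. (4)
assert all(sum(xst[i]) == xs[i] for i in range(L))
assert all(sum(xst[i][j] for i in range(L)) == xt[j] for j in range(L))

# pairwise part of Eq. (4) equals theta^{ij} for this integral labeling
theta = [[abs(i - j) for j in range(L)] for i in range(L)]   # L1 prior
pairwise = sum(theta[i][j] * xst[i][j] for i in range(L) for j in range(L))
```

The relaxation consists of allowing fractional (non-integral) $x_s$ and $x_{st}$, at the price of $O(L^2)$ unknowns per edge.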
In this section we only sketch the proof, since it is based on more general constructions described in detail in Sections 4 and 5.

Theorem 1. Let $\vartheta_{st}^h = \theta_{st}^{i,i+h}$ be a pairwise potential that is a piecewise linear function with respect to $h$, consisting of $K$ segments and having breakpoints only at integral values of $h$. Then there exists a reformulation of Eq. 4 that requires $2KL$ primal unknowns and $2L(K+1) + K$ linear constraints per edge in the graph.

Proof. (Sketch:) In the following we consider a particular edge $(s,t)$ and drop the subscript $st$. Under the above assumptions $\vartheta^h$ can be written as

\[ \vartheta^h = \min_{k \in \{0,\dots,K-1\}} \left\{ \alpha_k h + \beta_k + \imath_{[\underline{h}_k, \overline{h}_k]}(h) \right\}, \tag{5} \]

i.e. as a minimum of linear functions with bounded (and convex) domains (see Fig. 2). Using the results derived in Section 4, potentials of the form

\[ \vartheta^h_k \stackrel{\text{def}}{=} \alpha_k h + \beta_k + \imath_{[\underline{h}_k, \overline{h}_k]}(h) \tag{6} \]

allow for a compact reformulation of Eq. 4 using only $2L$ primal unknowns and at most $2L + 2$ constraints (see Eq. 17). In Section 5 it is shown that the minimum of $K$ such potentials (represented by their compact reformulations) leads to a combined reformulation with $2LK$ unknowns and $2L(K+1) + K$ constraints (see Eq. 28).

Figure 2: (a) A bounded linear potential. (b) A piecewise linear (but not necessarily continuous) potential represented as a minimum of bounded linear ones.

Remark 1. Our way of counting constraints corresponds to the number of dual variables one needs to introduce in order to obtain a convenient saddle-point formulation suitable for straightforward optimization, e.g. via the (preconditioned) primal-dual method [17]. Therefore we do not need to introduce dual variables whenever closed-form proximal steps are available.
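The decomposition (5) is easy to verify numerically. As a sketch (toy parameters of our own choosing), a truncated $L^1$ prior $\min\{|h|, \tau\}$ can be written as the minimum of $K = 3$ bounded linear pieces:

```python
def piece(h, alpha, beta, lo, hi):
    """One bounded linear piece alpha*h + beta + i_[lo,hi](h), cf. Eq. (6)."""
    return alpha * h + beta if lo <= h <= hi else float('inf')

def min_of_pieces(h, pieces):
    """Pointwise minimum of bounded linear pieces, cf. Eq. (5)."""
    return min(piece(h, *p) for p in pieces)

# Truncated L1 prior min{|h|, tau} on h in [-(L-1), L-1], written as the
# minimum of K = 3 bounded linear pieces (tau = 2, L = 5 are toy choices).
L, tau = 5, 2
pieces = [(-1.0, 0.0, -tau, 0),           # -h on [-tau, 0]
          ( 1.0, 0.0, 0, tau),            #  h on [0, tau]
          ( 0.0, tau, -(L - 1), L - 1)]   # constant tau on the full range
vals = [min_of_pieces(h, pieces) for h in range(-(L - 1), L)]
```

`vals` agrees with `min(abs(h), tau)` at every integral label difference `h`.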
In particular, we do not include simple bounds constraints (such as non-negativity constraints) in our counting of constraints, but we always introduce enough dual variables to avoid non-trivial proximal steps.

Remark 2. We want to emphasize that rewriting piecewise linear potentials as a minimum of bounded linear functions (as in Eq. 5) also allows for efficient updates in message passing algorithms (such as belief propagation and its variants with guaranteed convergence). Efficient methods addressing (optionally truncated) $L^1$ and quadratic regularization are already presented in [5], and we can easily generalize their result: assume the pairwise potentials can be written as a minimum of $K$ simple convex potentials as in Eq. 5; then the lower envelope computation

\[ i \mapsto \min_j \left\{ \theta_t^j + \vartheta^{j-i} \right\} \tag{7} \]

can be done in $O(KL)$ time. This can be seen as follows: we rewrite the minimum envelope as

\[ i \mapsto \min_k \min_j \left\{ \theta_t^j + \vartheta_k^{j-i} \right\} = \min_k \min_{j :\, \underline{h}_k \le j - i \le \overline{h}_k} \left\{ \theta_t^j + \alpha_k (j - i) \right\}, \]

hence the minimum envelope can be computed in $O(KL)$ time if the inner envelope can be done in $O(L)$ time. Observe that the lower envelope $\min_{j :\, \underline{h}_k \le j - i \le \overline{h}_k} \{ \theta_t^j + \alpha_k (j - i) \}$ is an instance of the min-filter problem, which can be solved in $O(L)$ time (e.g. [23]). Interestingly, the very easily implementable online algorithm for min-filtering proposed in [15] clearly resembles the lower envelope algorithm for quadratic costs in [5].

4 Piecewise Linear and Convex Pairwise Potentials

In this section we consider pairwise potentials that can be written as a pointwise maximum of affine functions in terms of the label difference $h = j - i$, i.e.

\[ \theta_{st}^{ij} = \vartheta_{st}^{j-i} = \max_{k \in \{0,\dots,K-1\}} \bar\alpha_k (j - i) + \bar\beta_k \tag{8} \]

for parameters $\bar\alpha_k, \bar\beta_k \in \mathbb{R}$. We assume that the breakpoints of $\vartheta_{st}^h$ as a function of $h = j - i$ are located at integral arguments $h$.
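The inner min-filter can be implemented with the classic sliding-window minimum (monotone deque) algorithm. The sketch below is our own illustration of the $O(KL)$ envelope computation, not the authors' implementation; the piece format `(alpha, beta, lo, hi)` follows Eq. 5:

```python
from collections import deque

def min_filter(c, lo, hi):
    """Sliding-window minimum: out[i] = min(c[j] for j in [i+lo, i+hi]),
    computed in O(len(c)) total time with a monotone deque."""
    n = len(c)
    out = [float('inf')] * n
    dq = deque()          # candidate indices with increasing c-values
    j = 0
    for i in range(n):
        while j < n and j <= i + hi:          # admit indices up to i + hi
            while dq and c[dq[-1]] >= c[j]:
                dq.pop()
            dq.append(j)
            j += 1
        while dq and dq[0] < i + lo:          # expel indices below i + lo
            dq.popleft()
        if dq:
            out[i] = c[dq[0]]
    return out

def lower_envelope(theta_t, pieces):
    """O(K L) evaluation of Eq. (7): for each piece, minimize
    theta_t[j] + alpha*(j - i) + beta over the window lo <= j - i <= hi."""
    L = len(theta_t)
    env = [float('inf')] * L
    for alpha, beta, lo, hi in pieces:
        c = [theta_t[j] + alpha * j for j in range(L)]
        m = min_filter(c, lo, hi)
        for i in range(L):
            env[i] = min(env[i], m[i] + beta - alpha * i)
    return env

# toy message, truncated L1 pieces (tau = 2) as before
theta_t = [3.0, 0.0, 2.0, 5.0, 1.0]
pieces = [(-1.0, 0.0, -2, 0), (1.0, 0.0, 0, 2), (0.0, 2.0, -4, 4)]
env = lower_envelope(theta_t, pieces)
```

Each label is inserted into and removed from the deque at most once per piece, which gives the claimed $O(KL)$ total complexity.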
Since pairwise potentials are only specified for integral label values, this can always be achieved (at the expense of at most doubling the number of affine functions). In order to simplify the notation we assume w.l.o.g. edge-independent values $\bar\alpha_k$ and $\bar\beta_k$ (and therefore drop the subscript $st$), but all results below hold for edge-specific coefficients $\bar\alpha_k^{st}$ and $\bar\beta_k^{st}$ as well. By definition $\vartheta_{st}^h$ is a convex and piecewise linear function with respect to $h$, see also Fig. 1(a).

4.1 Minimum Cut Graph Construction

Our construction below is different from Ishikawa's graph cut approach solving MRFs with convex and symmetric priors [10], but can be seen as a generalization of his earlier construction in [11]. The main benefits of our proposed construction can be summarized as follows: first, it is very intuitive to understand; second, it naturally allows asymmetric convex potentials; and finally, it immediately enables extensions to more isotropic regularizers that can be relevant in image processing applications.

Figure 3: Graph cut construction for asymmetric $L^1$ (a) and "shifted" $L^1$ potentials (b). The red curve illustrates a potential cut.

The labeling problem with convex pairwise priors is solved by computing the minimum cut in a weighted graph. The node set of the graph is $\{S, T\} \cup \{a_s^i\}_{s \in V,\, i \in \{0,\dots,L\}}$, where $S$ and $T$ are the source and sink, respectively. The edge set contains infinity links $(S, a_s^0)$ and $(a_s^L, T)$ for all $s \in V$. Node $a_s^i$ is connected to node $a_s^{i+1}$ with a directed edge $e_s^i$. A label $i$ is assigned to $s$ if the minimal cut goes through edge $e_s^i$. In order to ensure that only one label is assigned at a node $s$, there are directed edges with infinite weight from nodes $a_s^{i+1}$ to $a_s^i$.
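As an illustration (our own sketch, not the authors' code), this construction can be exercised on a toy two-pixel problem with a symmetric $L^1$ prior, where lateral edges of weight $\gamma$ in both directions realize $\gamma|i-j|$ (cf. Fig. 3(a)); the resulting min-cut value matches exhaustive minimization of the labeling energy. A large finite constant stands in for the infinite-weight edges:

```python
from collections import deque, defaultdict

BIG = 1e6  # stands in for the infinite-weight edges

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow; cap is a nested dict of residual capacities."""
    total = 0.0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:             # BFS for an augmenting path
            u = q.popleft()
            for v, c in list(cap[u].items()):
                if c > 1e-9 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total                          # no augmenting path left
        b, v = float('inf'), t                    # bottleneck capacity
        while parent[v] is not None:
            b = min(b, cap[parent[v]][v]); v = parent[v]
        v = t
        while parent[v] is not None:              # push flow, update residuals
            u = parent[v]
            cap[u][v] -= b
            cap[v][u] += b
            v = u
        total += b

def build_chain_graph(theta_s, theta_t, gamma):
    """Two-pixel instance: chain edges e^i carry the unaries, BIG backward
    edges enforce a single cut per chain, lateral edges encode gamma*|i-j|."""
    L = len(theta_s)
    cap = defaultdict(lambda: defaultdict(float))
    for p, theta in (('s', theta_s), ('t', theta_t)):
        cap['S'][(p, 0)] += BIG                   # infinity link (S, a^0)
        cap[(p, L)]['T'] += BIG                   # infinity link (a^L, T)
        for i in range(L):
            cap[(p, i)][(p, i + 1)] += theta[i]   # cutting e^i assigns label i
            cap[(p, i + 1)][(p, i)] += BIG        # forbid multiple cuts
    for i in range(1, L):                         # lateral L1 edges
        cap[('s', i)][('t', i)] += gamma
        cap[('t', i)][('s', i)] += gamma
    return cap

theta_s, theta_t, gamma = [3.0, 1.0, 2.0], [2.0, 4.0, 1.0], 1.0
cut = max_flow(build_chain_graph(theta_s, theta_t, gamma), 'S', 'T')
```

Any finite-cost cut severs each chain exactly once (the backward `BIG` edges exclude everything else), and the number of severed lateral edges is exactly $|i-j|$, so the min-cut value equals $\min_{i,j} \theta_s^i + \theta_t^j + \gamma|i-j|$.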
Finally, a subset of directed pairwise edges connecting $a_s^i$ with $a_t^j$ (and vice versa) will be included in the graph as described in the following. The convex potential in Eq. 8 can be equivalently written as (with $h = j - i$)

\[ \vartheta_{st}^h = \sum_{k \in \{0,\dots,K-1\}} \gamma_k (h + \delta_k)_+ + \alpha h + \beta. \tag{9} \]

It will be convenient later to explicitly include the affine term $\alpha h + \beta$. Note that $\beta$ only affects the objective of the minimizer, not the minimizer itself, and hence can be ignored in the graph construction. The term $\alpha h = \alpha j - \alpha i$ does not depend jointly on $i$ and $j$, and consequently can be (temporarily) absorbed into the unary potentials for the graph construction (e.g. $\theta_s^i$ and $\theta_t^j$ are augmented with $-\alpha i$ and $\alpha j$, respectively). Thus, we focus on the first expression in Eq. 9 below. With our above assumption of integral breakpoints we have $\delta_k \in \mathbb{Z}$ without loss of generality.

In the following we focus on a single summand, $\gamma_k (h + \delta_k)_+$. If $\delta_k = 0$, the term $\gamma_k h_+$ corresponds to a "one-sided" $L^1$ (or total variation) regularizer and can be solved by adding lateral directed edges into the graph (see Fig. 3(a)). If $\delta_k \ne 0$, one can temporarily reinterpret label value $j$ as $j + \delta_k$ and again obtain an asymmetric $L^1$-type smoothness prior, but between label $i$ and a "shifted" label $j + \delta_k$. Consequently, for each term $\gamma_k (h + \delta_k)_+$ with $\delta_k \ne 0$ directed diagonal edges are inserted into the graph (see Fig. 3(b)).

4.2 The Equivalent Linear Program

The resulting graph can be immediately written as a linear program¹,

\[ E_{\text{Cvx-Cut}}(u) = \sum_{s,i} \theta_s^i (u_s^{i+1} - u_s^i) + \sum_{s \sim t} \beta + \sum_{s \sim t} \left( \alpha \sum_i (u_t^i - u_s^i) + \sum_{k,i} \gamma_k \left[ u_s^i - u_t^{i+\delta_k} \right]_+ \right) \tag{10} \]
\[ \text{s.t.} \quad u_s^0 = 0, \quad u_s^L = 1, \quad u_s^{i+1} \ge u_s^i, \]

where $u : V \times \{0,\dots,L\} \to [0,1]$ encodes whether node $a_s^i$ belongs to the source ($u_s^i = 0$) or to the sink ($u_s^i = 1$). Note that we explicitly state the contribution of the linear part, $\bar\alpha h$ in Eq. 9, to the unaries. In order to keep the equations simple, we introduce the convention that "out-of-bounds" values $u_s^i$ yield 0 if $i < 0$ and 1 if $i \ge L$ for all $s \in V$. Observe that $u_s^i$ as a function of $i$ is a superlevel representation and can therefore be written as $u_s^i = X_s^i$ for some $x_s \in \Delta_L$. Thus, we can rewrite $E_{\text{Cvx-Cut}}$ as

\[ E_{\text{Cvx-LP}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s \sim t} \left( \sum_{k,i} \gamma_k \left[ X_s^i - X_t^{i+\delta_k} \right]_+ + \alpha \sum_i (X_t^i - X_s^i) + \beta \right) \tag{11} \]
\[ \text{s.t.} \quad x_s \in \Delta_L. \]

In order to use this result as a building block for the construction in the following Section 5, we introduce for the pairwise terms

\[ f_{st}^{\text{cvx}}(y_s, y_t \mid x_s, x_t) \stackrel{\text{def}}{=} \alpha_{st} \sum_i (Y_t^i - Y_s^i) + \beta_{st} + \sum_{k,i} \gamma_k^{st} \left[ Y_s^i - Y_t^{i+\delta_k^{st}} \right]_+ + \imath_{\Delta_L}(y_s) + \imath_{\Delta_L}(y_t). \tag{12} \]

We explicitly added the (redundant) simplex constraints on $y_s$ and $y_t$ in order to obtain a trivial recession function for $f_{st}^{\text{cvx}}$, which will be important in Section 5. Now $E_{\text{Cvx-LP}}$ can be rewritten as

\[ E_{\text{Cvx-LP}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s \sim t} \min_{\substack{y_s :\, y_s = x_s \\ y_t :\, y_t = x_t}} f_{st}^{\text{cvx}}(y_s, y_t \mid x_s, x_t) \tag{13} \]

subject to $x_s \in \Delta_L$. Observe that $f_{st}^{\text{cvx}}$ introduces $2L$ unknowns, $y_s^i$ and $y_t^i$, per edge $(s,t)$, and enforces $2L + KL = L(K+2)$ constraints (where we identify $[\cdot]_+$ with one inequality constraint). Of course, the extra $2L$ unknowns, $y_s^i$ and $y_t^i$, and $2L$ of the constraints can be discarded immediately by applying e.g. the constraint $y_s^i = x_s^i$, but $f_{st}^{\text{cvx}}$ as stated in Eq. 12 will be important in Section 5. Overall, depending on the values of $K$ and $L$, optimization of $E_{\text{Cvx-LP}}$ potentially requires far less memory than optimizing the generic LP relaxation $E_{\text{LP-MRF}}$ (Eq. 4). A particular and important instance of convex potentials are $L^1$-type ones, $h \mapsto \alpha_{st} |h| + \beta_{st}$.
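As a quick numerical check (toy values of our own, not from the paper's experiments), for integral labelings the $L^1$-type cost $\alpha \sum_i |X_t^i - X_s^i|$ of the superlevel representation reduces to $\alpha|i - j|$:

```python
def superlevel(label, L):
    """Superlevel encoding X of a one-hot label: X^m = [m > label], m = 0..L."""
    return [1.0 if m > label else 0.0 for m in range(L + 1)]

def l1_cost(i, j, alpha, L):
    """alpha * sum_m |X_t^m - X_s^m| for integral labels i (at s), j (at t)."""
    Xs, Xt = superlevel(i, L), superlevel(j, L)
    return alpha * sum(abs(a - b) for a, b in zip(Xs, Xt))

L, alpha = 6, 0.5
checks = [(l1_cost(i, j, alpha, L), alpha * abs(i - j))
          for i in range(L) for j in range(L)]
```

The two superlevel functions differ exactly on the $|i - j|$ indices between the two labels, which is why the $L^1$ prior admits an $O(L)$ formulation per edge.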
For completeness we state one respective specialization of $f_{st}^{\text{cvx}}$ to $f_{st}^{L^1}$:

\[ f_{st}^{L^1}(y_s, y_t \mid x_s, x_t) \stackrel{\text{def}}{=} \alpha_{st} \sum_i \left| Y_t^i - Y_s^i \right| + \beta_{st} + \imath_{\Delta_L}(y_s) + \imath_{\Delta_L}(y_t). \tag{14} \]

¹The expressions $[\xi]_+$ appearing in the objective can be removed, leading to a proper LP, after introducing a non-negative $\zeta$ with $\zeta \ge \xi$.

4.3 Linear Priors with Bounded Domains

In this section we discuss the particular prior relevant for the main result in Section 3, where specific convex potentials of the shape

\[ \vartheta^h = \alpha h + \beta + \imath_{[\underline{h}, \overline{h}]}(h) = \alpha h + \beta + M \left[ h - \overline{h} \right]_+ + M \left[ \underline{h} - h \right]_+ \tag{15} \]

(with $M$ being a sufficiently large constant) are considered. In this particular setting $f_{st}^{\text{cvx}}$ (Eq. 12) reads as

\[ f_{st}^{\text{linear}}(y_s, y_t \mid x_s, x_t) \stackrel{\text{def}}{=} \alpha \sum_i (Y_t^i - Y_s^i) + \beta + M \sum_i \left[ Y_s^i - Y_t^{i+\overline{h}} \right]_+ + M \sum_i \left[ Y_t^{i+\underline{h}} - Y_s^i \right]_+ \tag{16} \]

subject to $y_s, y_t \in \Delta_L$. With $M \to \infty$ the penalizer terms transform into constraints $Y_s^i \le Y_t^{i+\overline{h}}$ and $Y_s^i \ge Y_t^{i+\underline{h}}$ (which correspond to infinity links in the respective minimum-cut graph), and $f_{st}^{\text{linear}}$ therefore equivalently reads as

\[ f_{st}^{\text{linear}}(y_s, y_t \mid x_s, x_t) \stackrel{\text{def}}{=} \alpha \sum_i (Y_t^i - Y_s^i) + \beta + \imath\left\{ Y_s^i \le Y_t^{i+\overline{h}},\; Y_s^i \ge Y_t^{i+\underline{h}} \right\} + \imath_{\Delta_L}(y_s) + \imath_{\Delta_L}(y_t), \tag{17} \]

and by plugging $f_{st}^{\text{linear}}$ as $f_{st}^{\text{cvx}}$ into Eq. 13 we obtain a program with $2L$ unknowns per edge and at most $2L + 2$ constraints (or dual variables), as claimed in the proof of Theorem 1.

5 Minimum of Pairwise Potentials

In this section we show that the (pointwise) minimum of compactly representable pairwise potentials again leads to a compact representation of the corresponding linear program. This applies e.g. to pairwise potentials that are the minimum of (not necessarily symmetric) $L^1$-type pairwise priors (see Fig. 1(b)). It is well known that $L^1$-type priors lead to linear programs with $O(L)$ unknowns per edge (e.g. [11], or apply the result from the previous section).
A corollary of the construction presented in the following is that the minimum of $K$ $L^1$-type priors only requires $O(LK)$ primal unknowns without loosening the convex relaxation, compared to $E_{\text{LP-MRF}}$. We will call the pointwise minimum of "elementary" pairwise potentials a "min-potential" in the following. In order to show the equivalence of a compact linear program for min-potentials with the standard relaxation $E_{\text{LP-MRF}}$ (Eq. 4) we proceed in two steps:

• Assume we are given a pairwise potential that can be written as a pointwise minimum of some elementary potentials (see e.g. Fig. 4(a)). In Section 5.1 it is shown that such potentials can be reformulated as a term-wise minimum of elementary potentials (as illustrated in Fig. 4(b)).

• If the elementary potentials are chosen such that they have a compact reformulation (as the ones discussed in Section 4), one can substitute the elementary potentials by their corresponding compact reformulation. In Section 5.2 the equivalence of the resulting reformulation is shown, and some relevant examples are provided.

Overall, the equivalence of $E_{\text{LP-MRF}}$ (Eq. 4) with compact reformulations is thus established. The following derivations make repeated use of Lemma 1, which allows us to rewrite a minimum of convex functions as a convex minimization problem in general. Application of Lemma 1 on the terms representing elementary potentials in order to obtain min-potentials leads to non-convex bilinear constraints, as shown below. The following lemma states, in a general setting, that these bilinear constraints induced by application of Lemma 1 can be "linearized" without affecting the minimum.

Figure 4: Interpreting a pairwise potential either as point-wise minimum (a) or term-wise minimum (b).

Lemma 2. Let $\{f_k\}_{k=0,\dots,K-1}$ be a family of convex l.s.c. functions (with trivial recession function).
Further define the following minimization problems,

\[ F_0(w \mid \eta) \stackrel{\text{def}}{=} \min_k f_k(\eta, w) = \min_k \min_{y :\, y = \eta} f_k(y, w) \]
\[ F_1(z, y, w) \stackrel{\text{def}}{=} \sum_k \tilde f_k\big(z^k, (y^k, w^k)\big) + \sum_k \imath\{z^k \eta = y^k\} + \imath_{\Delta}(z) \]
\[ F_2(z, y, w) \stackrel{\text{def}}{=} \sum_k \tilde f_k\big(z^k, (y^k, w^k)\big) + \imath\Big\{ \sum_k y^k = \eta \Big\} + \imath_{\Delta}(z). \]

We have that

\[ \min_w F_0(w \mid \eta) = \min_{z,y,w} F_1(z, y, w) = \min_{z,y,w} F_2(z, y, w). \tag{18} \]

Proof. The equivalence of $F_0$ and $F_1$ follows immediately from Lemma 1. The equality $\min F_1(z,y,w) = \min F_2(z,y,w)$ is shown as follows: observe that $\min_{z,y,w} F_2(z,y,w) \le \min_{z,y,w} F_1(z,y,w)$, since any $(z,y,w)$ feasible for $F_1$ is also feasible for $F_2$ (summing all constraints $z^k \eta = y^k$ over $k$ and using $\sum_k z^k = 1$ implies $\sum_k y^k = \eta$). To prove $\min_{z,y,w} F_2(z,y,w) = \min_{z,y,w} F_1(z,y,w)$ we consider the Lagrangian duals of the two convex programs. We have (also applying Lemma 1)

\[ F_1^*(\lambda) = \min_{z \in \Delta, y, w} \sum_k \Big[ \tilde f_k\big(z^k, (y^k, w^k)\big) + (\lambda^k)^T (z^k \eta - y^k) \Big] = \min_k \min_{\zeta, \xi} f_k(\zeta, \xi) + (\lambda^k)^T \eta - (\lambda^k)^T \zeta = \min_k \eta^T \lambda^k - \max_{\zeta,\xi} \big( \zeta^T \lambda^k - f_k(\zeta, \xi) \big) = \min_k \eta^T \lambda^k - (f_k)^*(\lambda^k, 0). \]

Analogously we obtain

\[ F_2^*(\nu) = \min_{z \in \Delta, y, w} \sum_k \tilde f_k\big(z^k, (y^k, w^k)\big) + \nu^T \Big( \eta - \sum_k y^k \Big) = \nu^T \eta + \min_k \min_{\zeta,\xi} f_k(\zeta, \xi) - \nu^T \zeta = \nu^T \eta + \min_k \Big( {-\max_{\zeta,\xi}} \big( \nu^T \zeta - f_k(\zeta, \xi) \big) \Big) = \nu^T \eta + \min_k -(f_k)^*(\nu, 0) = \min_k \eta^T \nu - (f_k)^*(\nu, 0). \]

Both dual programs have essentially the same objective, $\lambda \mapsto \min_k \eta^T \lambda^k - (f_k)^*(\lambda^k, 0)$, but $F_2^*$ enforces additional constraints on its argument ($\lambda = (\nu, \dots, \nu)$ for some $\nu$), from which the already known fact $\max_\lambda F_1^*(\lambda) \ge \max_\nu F_2^*(\nu)$ follows. But any maximizer $\lambda^*$ of $F_1^*$ can be converted to a feasible solution of $F_2^*$ with the same objective by setting $\nu^* = (\lambda^*)^l$, where $l \in \arg\min_k \eta^T \lambda^k - (f_k)^*(\lambda^k, 0)$.
Therefore both dual programs have the same optimal value, and $\min F_1 = \min F_2$ follows from strong duality.

In the following we repeatedly apply Lemma 2 to obtain compact and convex reformulations for min-potentials. The node variables $x_s$ and $x_t$, involved in the pairwise potentials via the marginalization constraints (or a respective variant), will attain the role of $\eta$ in the lemma. Since these node variables are subject to optimization as well (on the outer scope), Lemma 2 is important to replace occurring bilinear constraints by linear ones.

5.1 Term-Wise Minimum of Potentials

Let the pairwise potentials $\theta_{st}^{ij}$ be written as a pointwise minimum of elementary potentials $\theta_{st}^{ijk}$, i.e. $\theta_{st}^{ij} = \min_{k \in \{0,\dots,K-1\}} \theta_{st}^{ijk}$. In this section we show the equivalence of

\[ E_{\text{term-wise}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s \sim t} \min_k \Big\{ \sum_{ij} \theta_{st}^{ijk} x_{st}^{ijk} \Big\} \tag{19} \]
\[ \text{s.t.} \quad x_s^i = \sum_{jk} x_{st}^{ijk}, \quad x_t^j = \sum_{ik} x_{st}^{ijk}, \quad x_s \in \Delta_L, \quad x_{st}^{ijk} \ge 0 \]

and

\[ E_{\text{point-wise}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s \sim t} \sum_{ij} \Big( \min_k \theta_{st}^{ijk} \Big) x_{st}^{ij} \tag{20} \]
\[ \text{s.t.} \quad x_s^i = \sum_j x_{st}^{ij}, \quad x_t^j = \sum_i x_{st}^{ij}, \quad x_s \in \Delta_L, \quad x_{st}^{ij} \ge 0. \]

Note that the difference between the two programs is the position of $\min_k$. In $E_{\text{point-wise}}$ a new pairwise potential $\breve\theta_{st}^{ij}$ is formed as the point-wise minimum of the given ones,

\[ \breve\theta_{st}^{ij} \stackrel{\text{def}}{=} \min_k \theta_{st}^{ijk}, \tag{21} \]

whereas in $E_{\text{term-wise}}$ a particular active potential is selected per edge (see also Fig. 4). The equivalence of the two energies is an immediate consequence of the following fact:

Lemma 3. For given $x_s$ and $x_t \in \Delta_L$ let $M(x_s, x_t)$ denote the (local) marginalization constraints

\[ M(x_s, x_t) \stackrel{\text{def}}{=} \Big\{ x_{st} \in \Delta_{L^2} : x_s^i = \sum_j x_{st}^{ij},\; x_t^j = \sum_i x_{st}^{ij} \Big\}. \]

The following two pairwise costs

\[ p_{st}(x_{st} \mid x_s, x_t) = \min_k \Big\{ \sum_{ij} \theta_{st}^{ijk} x_{st}^{ij} \Big\} + \imath_{M(x_s, x_t)}(x_{st}) \]

and

\[ q_{st}(x_{st} \mid x_s, x_t) = \sum_{i,j} \Big( \min_k \theta_{st}^{ijk} \Big) x_{st}^{ij} + \imath_{M(x_s, x_t)}(x_{st}) \]

are equivalent, i.e.
\[ \min_{x_{st}} p_{st}(x_{st} \mid x_s, x_t) = \min_{x_{st}} q_{st}(x_{st} \mid x_s, x_t). \tag{22} \]

Proof. First we remark that obviously $\min q_{st} \le \min p_{st}$, since a sum of pointwise minima is never larger than the minimum of sums (over the same terms). In order to show $\min q_{st} \ge \min p_{st}$ we rewrite $p_{st}(x_{st} \mid x_s, x_t) = \min_k p_{st}^k(x_{st} \mid x_s, x_t)$ with

\[ p_{st}^k(x_{st} \mid x_s, x_t) \stackrel{\text{def}}{=} \sum_{ij} \theta_{st}^{ijk} x_{st}^{ij} + \imath_{\Delta_{L^2}}(x_{st}). \]

Observe that the recession function of $p_{st}^k$ is trivial due to the simplex constraint $x_{st} \in \Delta_{L^2}$. Thus, we can apply Lemma 2 on $p_{st} = \min_k p_{st}^k$ to obtain an equivalent convex program,

\[ \tilde p_{st}(z, y \mid x_s, x_t) = \sum_{ijk} \theta_{st}^{ijk} y^{ijk} + \imath_{\Delta_K}(z) + \imath_{\ge 0}(y) + \imath\Big\{ \sum_{ij} y^{ijk} = z^k,\; x_s^i = \sum_{jk} y^{ijk},\; x_t^j = \sum_{ik} y^{ijk} \Big\}. \]

We substitute $z^k = \sum_{ij} y^{ijk}$ in $\tilde p_{st}$ (leading to the constraint $1 = \sum_k z^k = \sum_{ijk} y^{ijk}$, or, combined with $y \ge 0$, to $y \in \Delta_{KL^2}$),

\[ \tilde p_{st}(y \mid x_s, x_t) = \sum_{ijk} \theta_{st}^{ijk} y^{ijk} + \imath_{\Delta_{KL^2}}(y) + \imath\Big\{ x_s^i = \sum_{jk} y^{ijk},\; x_t^j = \sum_{ik} y^{ijk} \Big\}. \]

Let $x_{st}^*$ be a minimizer of $q_{st}$, and let $k_{ij} \stackrel{\text{def}}{=} \arg\min_k \theta_{st}^{ijk}$. We set $(y^*)^{ijk} = (x_{st}^*)^{ij}$ iff $k = k_{ij}$ and 0 otherwise. Then $y^*$ is feasible for $\tilde p_{st}$ and has the same objective value as $q_{st}(x_{st}^*)$. Consequently, $\min_y \tilde p_{st} \le \min_{x_{st}} q_{st}$, and $\min p_{st} \le \min q_{st}$ since $p_{st}$ and $\tilde p_{st}$ are equivalent. Overall, we have $\min p_{st} = \min q_{st}$ as claimed.

The equivalence of $E_{\text{point-wise}}$ and $E_{\text{term-wise}}$ means that, given pairwise potentials $\theta_{st}^{ij}$ that can be written as a pointwise minimum of some "convenient" elementary potentials, we can focus our attention on compact representations of these elementary potentials.

5.2 Term-Wise Minimum of Compact Potentials

In this section we assume that the elementary pairwise potentials $\theta_{st}^{ijk}$ have a more compact equivalent, e.g.
\[ \min_{x_{st} \in M(x_s, x_t)} \sum_{ij} \theta_{st}^{ijk} x_{st}^{ij} = \min_{\substack{y, w \\ y_s = x_s,\; y_t = x_t}} f_{st}^k(y, w \mid x_s, x_t) \]

for some convex function $f_{st}^k$ (having a trivial recession function). The respective non-convex labeling energy reads as

\[ E_{\text{min-prior}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_s \imath\{x_s \in \Delta_L\} + \sum_{s \sim t} \min_{k \in \{0,\dots,K-1\}} \min_{\substack{y, w \\ y_s = x_s,\; y_t = x_t}} f_{st}^k(y, w \mid x_s, x_t), \tag{23} \]

and we use Lemma 2 below to obtain an equivalent convex program. For concreteness, and due to the importance of piecewise convex pairwise potentials, we instantiate $f_{st}^k$ with $f_{st}^{\text{cvx}}$ (recall Eq. 12) in our derivation,

\[ f_{st}^k(y_s, y_t \mid x_s, x_t) \stackrel{\text{def}}{=} \alpha_k^{st} \sum_i (Y_t^i - Y_s^i) + \beta_k^{st} + \sum_{l,i} \gamma_{kl}^{st} \left[ Y_s^i - Y_t^{i + \delta_{kl}^{st}} \right]_+ + \imath_\Delta(y_s) + \imath_\Delta(y_t). \tag{24} \]

In order to apply Lemma 2 we need the perspective of $f_{st}^k$, which can be immediately stated as

\[ \tilde f_{st}^k(z^k, y_s^k, y_t^k \mid x_s, x_t) \stackrel{\text{def}}{=} \alpha_k^{st} \sum_i (Y_t^{ki} - Y_s^{ki}) + \beta_k^{st} z^k + \sum_{l,i} \gamma_{kl}^{st} \left[ Y_s^{ki} - Y_t^{k, i + \delta_{kl}^{st}} \right]_+ + \imath\Big\{ \sum_i y_s^{ki} = \sum_i y_t^{ki} = z^k \Big\} + \imath_{\ge 0}(y). \tag{25} \]

Application of Lemma 2 (with the constraints $y_s = x_s$ and $y_t = x_t$ taking the role of the constraint "$y = \eta$") establishes the equivalence of $\min_k \min_{y_s, y_t} f_{st}^k(y_s, y_t \mid x_s, x_t)$ and

\[ \min_{z \in \Delta_K} \min_{y \ge 0} \sum_k \Big( \alpha_k^{st} \sum_i (Y_t^{ki} - Y_s^{ki}) + \beta_k^{st} z^k + \sum_{l,i} \gamma_{kl}^{st} \left[ Y_s^{ki} - Y_t^{k, i + \delta_{kl}^{st}} \right]_+ \Big) \tag{26} \]
\[ \text{subject to} \quad x_s^i = \sum_k y_s^{ki}, \quad x_t^i = \sum_k y_t^{ki}, \quad z^k = \sum_i y_s^{ki}, \quad z^k = \sum_i y_t^{ki}. \]

The occurring variables have intuitive meanings: $z$ is a (soft) one-hot encoding of which branch $k$ is active in $\min_k f_{st}^k$, i.e. $z$ represents the set $\arg\min_k f_{st}^k$. It is easy to see that for all values of $x_s$ and $x_t$ a minimizer $z$ will attain only binary values (it can be fractional only if $\arg\min_k f_{st}^k$ contains more than one element). $y_s^{ki}$ and $y_t^{ki}$ represent $x_s^i$ and $x_t^i$, respectively (as "local copies"), in the $k$-th branch.
If the above expressions are plugged into Eq. 23, and the edge-specific unknowns $z^k$ etc. are augmented with the respective edge subscript $st$, one obtains the energy given in Eq. 27. If we specialize $f^k_{st}$ to be the linear but bounded potentials $f^{\text{linear}}_{st}$ (Eq. 16) and express $z$ in terms of $y$ (e.g. $z^k_{st} = \tfrac12 \big(\sum_i y^{ki}_{st \to s} + \sum_i y^{ki}_{st \to t}\big)$), we arrive at the convex program given in Eq. 28, which is relevant for the main result in Theorem 1. One can directly read off the number of unknowns per edge (which is $2KL$) and constraints (dual variables, at most $2L + K + 2KL = 2L(K+1) + K$) from the resulting program.

For practical implementations it can be beneficial to implement specializations of $E_{\text{min-prior}}$ rather than $E_{\text{min-linear}}$. For instance, if the smoothness prior is the minimum of $K$ $L^1$-type potentials, a generic implementation based on $E_{\text{min-cvx}}$ (which models the potential via $2K$ linear segments) requires about twice the number of unknowns and constraints compared to a specific formulation $E_{\text{min-}L^1}$ (depicted in Eq. 29) derived from $E_{\text{min-prior}}$ and $f^{L^1}_{st}$ (Eq. 14).²

$$E_{\text{min-cvx}}(x, y, z) = \sum_{s,i} \theta^i_s x^i_s + \sum_{s \sim t} \sum_k \Big( \alpha^k_{st} \sum_i \big(Y^{ki}_{st \to t} - Y^{ki}_{st \to s}\big) + \beta^k_{st} z^k_{st} + \sum_{l,i} \Big[\gamma^{kl}_{st}\Big(Y^{ki}_{st \to s} - Y^{k,i+\delta^{kl}_{st}}_{st \to t}\Big)\Big]_+ \Big) \qquad (27)$$
$$\text{s.t.} \quad x^i_s = \sum_k y^{ki}_{st \to s} \quad x^i_t = \sum_k y^{ki}_{st \to t} \quad z^k_{st} = \sum_i y^{ki}_{st \to s} \quad z^k_{st} = \sum_i y^{ki}_{st \to t} \quad x_s \in \Delta_L, \; z_{st} \in \Delta_K, \; y \ge 0.$$

Figure 5: The convex relaxation for MRFs with pairwise potentials that are the point-wise minima of piecewise linear convex ones.

$$E_{\text{min-linear}}(x, y) = \sum_{s,i} \theta^i_s x^i_s + \sum_{s \sim t} \sum_k \bigg( \alpha^k_{st} \sum_i \big(Y^{ki}_{st \to t} - Y^{ki}_{st \to s}\big) + \frac{\beta^k_{st}}{2} \Big( \sum_i y^{ki}_{st \to s} + \sum_i y^{ki}_{st \to t} \Big) \bigg) \qquad (28)$$
$$\text{s.t.} \quad x^i_s = \sum_k y^{ki}_{st \to s} \quad x^i_t = \sum_k y^{ki}_{st \to t} \quad \sum_i y^{ki}_{st \to s} = \sum_i y^{ki}_{st \to t} \quad Y^{ki}_{st \to s} \le Y^{k,i+h}_{st \to t} \quad Y^{ki}_{st \to s} \ge Y^{k,i-h}_{st \to t} \quad x_s \in \Delta_L, \; y \ge 0.$$
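The size comparison stated above (the $O(L^2)$ unknowns per pairwise clique of the standard LP relaxation versus the $2KL$ unknowns of the compact program in Eq. 28) is easy to tabulate. A back-of-the-envelope sketch with illustrative values of $L$ and $K$:

```python
# Per-edge unknown counts: the standard LP relaxation stores x_st^{ij}
# (L*L values), while the compact formulation stores two local copies
# y_{st->s}^{ki} and y_{st->t}^{ki} per branch k and label i (2*K*L values).
def unknowns_per_edge(L, K):
    standard = L * L        # x_st^{ij} in E_LP-MRF
    compact = 2 * K * L     # y^{ki}_{st->s}, y^{ki}_{st->t} in Eq. 28
    return standard, compact

for L, K in [(32, 3), (64, 3), (256, 4)]:
    std, cpt = unknowns_per_edge(L, K)
    print(f"L={L:3d} K={K}: standard {std:6d}  compact {cpt:5d}  "
          f"ratio {std / cpt:5.1f}x")
```

For many-label problems ($L = 256$, $K = 4$) the compact program is already a factor of 32 smaller per edge, in line with the order-of-magnitude memory savings reported in Section 7.3.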
Figure 6: The specialization of $E_{\text{min-cvx}}$ to bounded linear potentials that is used in Theorem 1.

6 Reducing the Grid Bias

Until now we restricted the exposition to labeling tasks with underlying discrete (i.e. graph-structured) domains. In some cases continuously inspired label assignment formulations (e.g. [3]) are preferable in image processing and computer vision problems due to the reduced metrication artifacts. As pointed out in [25, 26], the finite difference discretization of continuous formulations is closely related to standard LP relaxations for inference such as $E_{\text{LP-MRF}}$ (Eq. 4). In a nutshell, continuously inspired labeling formulations replace linear smoothness terms (i.e. $\sum_{ij} \theta^{ij}_{st} x^{ij}_{st}$ in $E_{\text{LP-MRF}}$) with nonlinear, Euclidean-norm based terms in order to achieve better counting of non-grid-aligned discontinuities.

²If we assume the solver can directly cope with $|\cdot|$, which is the case e.g. for proximal-method-based implementations.

$$E_{\text{min-}L^1}(x, y) = \sum_{s,i} \theta^i_s x^i_s + \sum_{s \sim t} \sum_k \bigg( \alpha^k_{st} \sum_i \big|Y^{ki}_{st \to s} - Y^{ki}_{st \to t}\big| + \frac{\beta^k_{st}}{2} \Big( \sum_i y^{ki}_{st \to s} + \sum_i y^{ki}_{st \to t} \Big) \bigg) \qquad (29)$$
$$\text{s.t.} \quad x^i_s = \sum_k y^{ki}_{st \to s} \quad x^i_t = \sum_k y^{ki}_{st \to t} \quad \sum_i y^{ki}_{st \to s} = \sum_i y^{ki}_{st \to t} \quad x_s \in \Delta_L, \; y \ge 0.$$

Figure 7: The specialization of $E_{\text{min-cvx}}$ to the minimum of $L^1$-type potentials.

Figure 8: The forward difference stencil used in our discretization of the image plane (node $s$ with its right neighbor $t = \mathrm{ri}(s)$ and its down neighbor $r = \mathrm{dn}(s)$).

We follow existing literature and use a forward difference stencil in the following. We expect slightly improved visual results if the discretization of the image plane is based on e.g. a staggered grid or uses a discrete calculus formulation [7]. In our finite difference setting the image domain is represented by a regular grid with horizontal and vertical edges between pixels. We index horizontal edges with subscripts $(s, h)$ (where $s$ is the left grid point of the edge) and vertical ones with $(s, v)$ (see also Fig. 8).
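The forward-difference edge layout described above can be sketched in a few lines. Here `ri`/`dn` follow the $t = \mathrm{ri}(s)$ and $r = \mathrm{dn}(s)$ convention of Fig. 8; the `node_id` helper and row-major node ordering are our own illustrative choices, not prescribed by the paper:

```python
# Row-major node ids on a W x H pixel grid (illustrative convention).
def node_id(row, col, width):
    return row * width + col

def ri(s, width):   # right neighbor t of s on a horizontal edge (s, h)
    return s + 1

def dn(s, width):   # down neighbor r of s on a vertical edge (s, v)
    return s + width

W, H = 4, 3
edges = []
for r in range(H):
    for c in range(W):
        s = node_id(r, c, W)
        if c + 1 < W:
            edges.append((s, ri(s, W)))   # horizontal edge (s, h)
        if r + 1 < H:
            edges.append((s, dn(s, W)))   # vertical edge (s, v)

# A W x H grid has H*(W-1) horizontal and W*(H-1) vertical edges.
assert len(edges) == H * (W - 1) + W * (H - 1)
```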
Thus, e.g. variables $y^{ki}_{s,h \to s}$ and $y^{ki}_{s,v \to s}$ represent $y^{ki}_{st \to s}$, depending on whether $(s, t)$ is a horizontal or vertical edge. For notational simplicity we assume homogeneous and symmetric pairwise potentials in the following (i.e. we drop edge subscripts in the coefficients, and assume $\alpha^k = 0$). The simplest isotropic extension, shown in Eq. 30, replaces the separate, $L^1$-type counting of horizontal and vertical discontinuities in Eq. 27, e.g. terms such as
$$\sum_{l,i} \Big[\gamma^{kl}\Big(Y^{ki}_{s,h \to s} - Y^{k,i+\delta^{kl}}_{s,h \to t}\Big)\Big]_+ + \sum_{l,i} \Big[\gamma^{kl}\Big(Y^{ki}_{s,v \to s} - Y^{k,i+\delta^{kl}}_{s,v \to t}\Big)\Big]_+ = \sum_{l,i} \left\| \begin{pmatrix} \big[\gamma^{kl}\big(Y^{ki}_{s,h \to s} - Y^{k,i+\delta^{kl}}_{s,h \to t}\big)\big]_+ \\[2pt] \big[\gamma^{kl}\big(Y^{ki}_{s,v \to s} - Y^{k,i+\delta^{kl}}_{s,v \to t}\big)\big]_+ \end{pmatrix} \right\|_1,$$
with a joint Euclidean-norm penalizer,
$$\sum_{l,i} \left\| \begin{pmatrix} \big[\gamma^{kl}\big(Y^{ki}_{s,h \to s} - Y^{k,i+\delta^{kl}}_{s,h \to t}\big)\big]_+ \\[2pt] \big[\gamma^{kl}\big(Y^{ki}_{s,v \to s} - Y^{k,i+\delta^{kl}}_{s,v \to t}\big)\big]_+ \end{pmatrix} \right\|_2.$$
Consequently, edges cut jointly in horizontal and vertical direction imply a $\sqrt2$ increase in the smoothness term, which corresponds to the standard $\sqrt2$-length penalization of diagonal discontinuities. Similar reasoning holds for the constant cost $\beta^k$ in $f^k_{st}$ (Eq. 24), which translates to the term $\beta^k z^k_{st}$ in Eq. 27. We also replace the $L^1$-type contribution $\beta^k \big(z^k_{s,h} + z^k_{s,v}\big) = \beta^k \big\|\big(z^k_{s,h}, z^k_{s,v}\big)^\top\big\|_1$ (since $z \ge 0$) with a Euclidean cost, $\beta^k \big\|\big(z^k_{s,h}, z^k_{s,v}\big)^\top\big\|_2$, and the complete objective is given in $E^{\text{isotr.}}_{\text{min-cvx}}$ (Eq. 30). In Eq. 31 we further depict the convex program in analogy to the anisotropic energy $E_{\text{min-}L^1}$ (Eq. 29). Our choice for the isotropic extension reduces to approaches presented in the literature such as isotropic $L^1$ potentials [16], the Potts smoothness prior [24], and truncated priors [26]. The construction is described for 2D image domains but can obviously be extended to higher dimensional image domains.
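The $\sqrt2$ factor mentioned above is just the difference between the $\ell_1$ and $\ell_2$ norms of a 2-vector of equal horizontal and vertical contributions; a tiny sketch makes this concrete:

```python
import math

# A discontinuity cut jointly in horizontal and vertical direction with
# equal strength c: separate (anisotropic) counting pays 2c, whereas the
# joint Euclidean-norm penalizer pays sqrt(2)*c, matching the sqrt(2)
# length of a diagonal boundary segment.
c = 1.0
aniso = abs(c) + abs(c)        # L1-type counting: horizontal + vertical
iso = math.hypot(c, c)         # joint Euclidean-norm penalizer
assert math.isclose(aniso, 2.0)
assert math.isclose(iso, math.sqrt(2.0))

# Axis-aligned discontinuities (one component zero) are penalized equally
# by both variants, so only diagonal boundaries are affected.
assert math.isclose(math.hypot(c, 0.0), abs(c))
```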
$$E^{\text{isotr.}}_{\text{min-cvx}}(x, y, z) = \sum_{s,i} \theta^i_s x^i_s + \sum_{s,k} \Bigg( \beta^k \left\| \begin{pmatrix} z^k_{s,h} \\ z^k_{s,v} \end{pmatrix} \right\|_2 + \sum_{l,i} \left\| \begin{pmatrix} \big[\gamma^{kl}\big(Y^{ki}_{s,h \to s} - Y^{k,i+\delta^{kl}}_{s,h \to t}\big)\big]_+ \\[2pt] \big[\gamma^{kl}\big(Y^{ki}_{s,v \to s} - Y^{k,i+\delta^{kl}}_{s,v \to t}\big)\big]_+ \end{pmatrix} \right\|_2 \Bigg) \qquad (30)$$
$$\text{s.t.} \quad x^i_s = \sum_k y^{ki}_{s,h \to s} \quad x^i_{\mathrm{ri}(s)} = \sum_k y^{ki}_{s,h \to t} \quad x^i_s = \sum_k y^{ki}_{s,v \to s} \quad x^i_{\mathrm{dn}(s)} = \sum_k y^{ki}_{s,v \to t}$$
$$z^k_{s,h} = \sum_i y^{ki}_{s,h \to s} \quad z^k_{s,h} = \sum_i y^{ki}_{s,h \to t} \quad z^k_{s,v} = \sum_i y^{ki}_{s,v \to s} \quad z^k_{s,v} = \sum_i y^{ki}_{s,v \to t}$$
$$x_s \in \Delta_L, \quad z_{s,h}, z_{s,v} \in \Delta_K, \quad y \ge 0.$$

Figure 9: A convex relaxation for MRFs with symmetric priors reducing the grid bias.

$$E^{\text{isotr.}}_{\text{min-}L^1}(x, y, z) = \sum_{s,i} \theta^i_s x^i_s + \sum_{s,k} \Bigg( \beta^k \left\| \begin{pmatrix} z^k_{s,h} \\ z^k_{s,v} \end{pmatrix} \right\|_2 + \alpha^k \sum_i \left\| \begin{pmatrix} Y^{ki}_{s,h \to s} - Y^{ki}_{s,h \to t} \\ Y^{ki}_{s,v \to s} - Y^{ki}_{s,v \to t} \end{pmatrix} \right\|_2 \Bigg) \qquad (31)$$
subject to the same constraints as in Eq. 30.

Figure 10: The specialization of $E^{\text{isotr.}}_{\text{min-cvx}}$ to $L^1$-type potentials.

*Remark 3.* In order to reduce the metrication artifacts for min-potentials one has two options: the first option is to convert an anisotropic formulation, e.g. Eq. 27, to behave "less anisotropic." This is the approach presented above. The other option is to use isotropic formulations as elementary potentials and subsequently apply the construction described in Section 5 to obtain the minimum potential. If we focus on the minimum of $L^1$-type (total variation) potentials, and if we employ the standard forward-difference discretization, the underlying elementary potential is given by
$$f^{TV}(y_s, y_t, y_r) \stackrel{\text{def}}{=} \alpha \sum_i \left\| \begin{pmatrix} Y^i_s - Y^i_t \\ Y^i_s - Y^i_r \end{pmatrix} \right\|_2 + \beta + \imath_{\Delta_L}(y_s) + \imath_{\Delta_L}(y_t) + \imath_{\Delta_L}(y_r), \qquad (32)$$
where nodes $s$, $t$, and $r$ form a neighborhood as shown in Fig. 8. Application of Lemma 2 to rewrite terms
$$\min_k \bigg\{ \alpha^k \sum_i \left\| \begin{pmatrix} Y^i_s - Y^i_t \\ Y^i_s - Y^i_r \end{pmatrix} \right\|_2 + \beta^k \bigg\} \qquad (33)$$
leads to the convex program in Eq. 34. It can be observed from the resulting constraints that in this formulation e.g. the same elementary potential needs to be selected in horizontal and vertical direction.
Further, the "constants" $\beta^k$ are counted differently in $E^{\text{isotr.}}_{\text{min-}L^1}$ and $E^{\text{isotr.}}_{\text{min-}L^1\text{-b}}$. We currently prefer $E^{\text{isotr.}}_{\text{min-}L^1}$, since it reduces to the Potts and truncated $L^1$ models presented in the literature, but a deeper analysis is subject of future research.

$$E^{\text{isotr.}}_{\text{min-}L^1\text{-b}}(x, y, z) = \sum_{s,i} \theta^i_s x^i_s + \sum_{s,k} \Bigg( \beta^k z^k_s + \alpha^k \sum_i \left\| \begin{pmatrix} Y^{ki}_{s \to s} - Y^{ki}_{s \to t} \\ Y^{ki}_{s \to s} - Y^{ki}_{s \to r} \end{pmatrix} \right\|_2 \Bigg) \qquad (34)$$
$$\text{s.t.} \quad x^i_s = \sum_k y^{ki}_{s \to s} \quad x^i_{\mathrm{ri}(s)} = \sum_k y^{ki}_{s \to t} \quad x^i_{\mathrm{dn}(s)} = \sum_k y^{ki}_{s \to r}$$
$$z^k_s = \sum_i y^{ki}_{s \to s} \quad z^k_s = \sum_i y^{ki}_{s \to t} \quad z^k_s = \sum_i y^{ki}_{s \to r} \quad x_s \in \Delta_L, \; z_s \in \Delta_K, \; y \ge 0.$$

Figure 11: An alternative formulation to reduce the grid bias of $E_{\text{min-}L^1}$.

7 Numerical Results

Unless otherwise noted we use a straightforward OpenMP-parallelized C++ implementation of the first-order primal-dual method described in [17] to find minimizers of the respective convex program. As with other proximal methods, the algorithm leaves freedom in how the convex problem is split (i.e. which dual unknowns are introduced, and which proximal steps are utilized). The employed splitting often has a significant impact on convergence behavior. In general, we eliminate $z$ from the objective and introduce Lagrange multipliers for each of the remaining constraints. We also introduce bounds-constrained dual variables for terms such as
$$\Big[\gamma^{kl}\Big(Y^{ki}_{s,h \to s} - Y^{k,i+\delta^{kl}}_{s,h \to t}\Big)\Big]_+ = \gamma^{kl} \bigg[ \sum_{j=0}^{i-1} y^{kj}_{s,h \to s} - \sum_{j=0}^{i-1} y^{k,j+\delta^{kl}}_{s,h \to t} \bigg]_+ = \max_{p \in [0, \gamma^{kl}]} p \bigg( \sum_{j=0}^{i-1} y^{kj}_{s,h \to s} - \sum_{j=0}^{i-1} y^{k,j+\delta^{kl}}_{s,h \to t} \bigg).$$
A naive implementation of the respective primal-dual update steps has $O(L^2)$ time complexity (per edge in the graph), and we use appropriate running sums to preserve the $O(L)$ complexity.

7.1 Early Stopping of Message Passing Methods

As a first experiment we verify the claim that early stopping can occur frequently in dual block coordinate methods for MAP inference such as MPLP [6].
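The running-sum trick mentioned above exploits the fact that all $L$ cumulative-sum terms $\sum_{j<i} y^{kj}$ share prefixes: one prefix-sum pass replaces $L$ independent $O(L)$ summations. A minimal sketch for a single branch $k$ and one offset $\delta^{kl}$ (numpy assumed; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
L, delta = 16, 3
y_s, y_t = rng.random(L), rng.random(L)   # y^{k.}_{s,h->s}, y^{k.}_{s,h->t}

# Naive O(L^2): every term i recomputes its cumulative sums from scratch.
naive = np.array([y_s[:i].sum() - y_t[delta:i + delta].sum()
                  for i in range(L - delta)])

# O(L): one prefix-sum (running sum) pass, then O(1) lookups per term.
cs_s = np.concatenate(([0.0], np.cumsum(y_s)))   # cs_s[i] = sum_{j<i} y_s[j]
cs_t = np.concatenate(([0.0], np.cumsum(y_t)))
idx = np.arange(L - delta)
fast = cs_s[idx] - (cs_t[idx + delta] - cs_t[delta])

assert np.allclose(naive, fast)
```

The same idea applies per edge and per dual variable inside the primal-dual updates, which is what keeps the per-iteration cost linear in the number of labels.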
We follow the setup of a similar experiment described in [19], but replace the 3-label spin glass model considered there by problem instances with many labels and piecewise linear smoothness priors. Our setup is as follows: the domain is a $20 \times 20$ regular grid with the standard 4-connected neighborhood structure, and the state space contains 20 labels. The unary potentials are sampled randomly from a uniform distribution, $\theta^i_s \sim U(0, 2)$, and the pairwise potentials are truncated linear ones,
$$\vartheta^h_{st} = \alpha_{st} \min\{|h|, 2\}, \qquad (35)$$
with $\alpha_{st} \sim U(0, 1)$. We solve 300 random instances using MPLP and compare the energy to the globally optimal one obtained by minimizing $E_{\text{LP-MRF}}$ using the primal-dual algorithm. In about 30% of the problem instances MPLP stops early with an energy difference of more than 0.001, and in about 26% with more than 0.01 (see also Fig. 12).

Figure 12: Histogram of optimality gaps for solutions returned by MPLP.

7.2 Variable Smoothing

Recent developments in accelerated gradient methods (e.g. [1]) appear to be very appealing for optimizing the non-smooth dual program of e.g. $E_{\text{LP-MRF}}$ (Eq. 4). The method proposed in [1] guarantees an $O(\ln(T)/T)$ convergence rate, where $T$ is the iteration count, but requires setting a tuning parameter. Fig. 13 illustrates that this parameter has to be chosen carefully to achieve competitive performance. Our conclusion is that such variable smoothing methods cannot (yet) replace compact reformulations as proposed in the previous sections.

7.3 Convergence Speed and Memory Consumption

We use a more realistic application to compare memory requirements and the evolution of energies.
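The random instance generation described above is straightforward to reproduce. A sketch of the cost construction (solver calls omitted; only the potentials from Eq. 35 and the stated distributions are used):

```python
import numpy as np

# 20x20 4-connected grid, 20 labels; unaries theta^i_s ~ U(0, 2) and
# truncated linear pairwise costs vartheta^h = alpha * min(|h|, 2) with
# alpha_st ~ U(0, 1) drawn per edge, as in the experiment above.
rng = np.random.default_rng(42)
H = W = 20
n_labels = 20

unary = rng.uniform(0.0, 2.0, size=(H, W, n_labels))

labels = np.arange(n_labels)
diff = np.abs(labels[:, None] - labels[None, :])   # label difference |i - j|
base = np.minimum(diff, 2)                         # truncated linear profile

def pairwise(alpha):
    """L x L pairwise cost matrix for one edge with strength alpha."""
    return alpha * base

alpha_st = rng.uniform(0.0, 1.0)                   # one edge's strength
P = pairwise(alpha_st)
assert P.shape == (n_labels, n_labels)
assert np.all(np.diag(P) == 0)                     # zero cost for equal labels
```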
While a compact reformulation such as $E_{\text{min-}L^1}$ has a smaller problem size than $E_{\text{LP-MRF}}$, it is not clear whether the more complicated problem structure may lead to slower convergence. We chose a simple image denoising application for this demonstration. We use a piecewise linear approximation, depicted in Fig. 14(b), of the gradient statistics of natural images [9]. The unary potential (shown in Fig. 14(a)) is induced directly by our image corruption procedure, which is as follows: a random set containing five percent of the pixels is considered as outliers, and their clean intensity values are replaced by a uniform random value from $[0, 255]$. For the remaining inlier pixels we add Gaussian noise drawn from $N(0, 10)$ to their respective clean intensities. Thus, the data fidelity term is given by
$$D(u) = \sum_s -\lambda \log\Big( \tfrac{5}{100} + \tfrac{95}{100}\, \phi(u_s - g_s; 0, 10) \Big),$$
where $\phi(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\big(-\tfrac{(x-\mu)^2}{2\sigma^2}\big)$ is the density function of the normal distribution. We set $\lambda = 1$, and the utilized regularizer (Fig. 14(b)) is
$$R(u) = \sum_{s \sim t} \min_{k \in \{0, 1, 2\}} \big( \alpha^k |u_s - u_t| + \beta^k \big)$$
with $(\alpha^0, \beta^0) = (24, 0)$, $(\alpha^1, \beta^1) = (8, 1)$, and $(\alpha^2, \beta^2) = (3.2, 2)$. The observed $400 \times 300$ noisy image is illustrated in Fig. 14(c), and the recovered image can be seen in Fig. 14(d). The solution image is determined by extracting the 1/2-isolevel of the superlevel function $X^i_s = \sum_{j=0}^{i-1} x^j_s$. We discretize the continuous state space $[0, 255]$ into 64 labels.

Figure 13: Energy evolution for a 32-label denoising problem utilizing a compact potential with three pieces. The preconditioned primal-dual (p-pd) algorithm [17] outperforms the variable smoothing (nesterovpp) algorithm [1] with different tuning parameters $b$.
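The data fidelity term $D(u)$ above turns the inlier/outlier corruption model directly into a per-pixel, per-label cost volume. A sketch of its evaluation (the label discretization of $[0, 255]$ into 64 states follows the text; everything else is illustrative):

```python
import numpy as np

def phi(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def unary_cost(u, g, lam=1.0, sigma=10.0):
    """-lam * log( 5/100 + 95/100 * phi(u - g; 0, sigma) ), per pixel/label."""
    return -lam * np.log(0.05 + 0.95 * phi(u - g, 0.0, sigma))

# Cost over the 64 discrete intensity labels for one observed pixel value g.
g = np.array([100.0])                    # observed (noisy) intensity
labels = np.linspace(0.0, 255.0, 64)     # 64-label discretization of [0, 255]
costs = unary_cost(labels, g)

# The cheapest label lies near the observed intensity; the 5% uniform
# outlier floor keeps costs bounded far away from g (robustness).
assert costs.shape == (64,)
assert abs(labels[costs.argmin()] - g[0]) < 255.0 / 63 + 1e-9
```

The constant $5/100$ term acts as a robust floor: for labels far from $g_s$ the cost saturates at $-\lambda \log(5/100)$ instead of growing quadratically, which is what lets outlier pixels be relabeled freely.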
The memory used to minimize $E_{\text{LP-MRF}}$ for this problem is almost 8 GB, whereas optimizing $E_{\text{min-}L^1}$ (or $E^{\text{isotr.}}_{\text{min-}L^1}$, respectively) requires about 1.1 GB. Hence, the latter formulations fit in graphics memory and can therefore leverage GPUs for acceleration. The evolution of the objective $D(u) + R(u)$ is displayed in Fig. 15. All primal and dual unknowns are initialized with 0, which may explain the initial increase in the objective value. The compact reformulations $E_{\text{min-}L^1}$ / $E^{\text{isotr.}}_{\text{min-}L^1}$ have a clear advantage over the exhaustive model $E_{\text{LP-MRF}}$.

7.4 Comparison with a Continuously-Inspired Formulation

In [16] a continuous approach is described that addresses labeling problems with convex smoothness priors (in terms of the spatial gradient of the assigned label function) but arbitrary data fidelity terms. It is further shown that in the continuum a global solution can be obtained by thresholding a minimizer of an underlying convex relaxation. This result does in general not hold after discretizing the continuous functional. An interesting example of a convex smoothness prior considered in [16] is the Lipschitz prior, $\vartheta^h = \imath\{|h| \le \eta\}$ for some $\eta \ge 0$. This regularizer enforces bounded label differences for adjacent nodes (pixels) and has a very compact representation in the framework of Section 4,
$$f^{\text{Lipschitz}}_{st}(y_s, y_t \mid x_s, x_t) = \imath\big\{ Y^i_s \le Y^{i + L\eta}_t, \; Y^i_s \ge Y^{i - L\eta}_t \big\}$$
together with $y_s \in \Delta_L$ and $y_t \in \Delta_L$. $L$ will be 32 in the following. Note that the energy formulation in [16] also requires only $O(L)$ variables and constraints per edge in the grid. In order to have a ground truth result available for better comparison, we use a (convex) quadratic data term, $(u_s - g_s)^2$, which implies that an optimal labeling can easily be obtained (see Figs. 16(a,d); we choose $\eta = 1/16$). Figs.
16(b,e) depict the result of the discretized model [16] with the label space discretized into $L = 32$ states, and Figs. 16(c,f) display the result of Eq. 30 specialized to $f^{\text{Lipschitz}}_{st}$. In terms of the PSNR our result is much closer to the true minimizer, and it further preserves more image details. This small experiment indicates that care is often required when working with discretized versions of continuous labeling functionals.

Figure 14: Image denoising using 64 labels and a compact piecewise linear potential. (a) Unary potential. (b) Pairwise potential. (c) Noisy input. (d) Denoised image.

8 Conclusion

We show that pairwise potentials that can be written as piecewise linear functions of the respective label difference allow compact reformulations of the LP relaxation for MAP inference. These reformulations do not weaken the relaxation or modify the returned minimizer. The resulting savings in memory consumption can be very significant (e.g. one order of magnitude) for many-label problems. The construction also extends to formulations, often used in image processing, that aim to reduce the grid bias. Future work will address the applicability of the techniques developed in this manuscript to general piecewise linear potentials $\theta^{ij}_{st}$ (not only those that can be written as $\theta^{ij}_{st} = \vartheta^{j-i}_{st}$), and to higher-order potentials beyond pairwise ones.

References

[1] Radu Ioan Bot and Christopher Hendrich. A variable smoothing algorithm for solving convex optimization problems. arXiv preprint arXiv:1207.3254, 2012.

[2] Y. Boykov, O. Veksler, and R. Zabih. Markov random fields with efficient approximations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 648–655, 1998.
[3] Antonin Chambolle, Daniel Cremers, and Thomas Pock. A convex approach to minimal partitions. SIAM Journal on Imaging Sciences, 5(4):1113–1158, 2012.

[4] Chandra Chekuri, Sanjeev Khanna, Joseph Naor, and Leonid Zosin. A linear programming formulation and approximation algorithms for the metric labeling problem. SIAM Journal on Discrete Mathematics, 18(3):608–625, 2004.

[5] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient belief propagation for early vision. Int. Journal of Computer Vision, 70(1):41–54, 2006.

Figure 15: Energy evolution of $E_{\text{LP-MRF}}$ and compact reformulations ($E_{\text{min-}L^1}$, $E^{\text{isotr.}}_{\text{min-}L^1}$).

Figure 16: Ground truth result (a) and solutions returned by a continuous formulation (b, PSNR 36.03) and our reformulation (c, PSNR 37.56) for a labeling task with Lipschitz prior. (d–f) depict a zoomed-in region in the lower left corner.

[6] Amir Globerson and Tommi Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. Advances in Neural Information Processing Systems (NIPS), 21(1.6), 2007.

[7] Leo John Grady and Jonathan R. Polimeni. Discrete Calculus: Applied Analysis on Graphs for Computational Science. Springer, 2010.

[8] T. Hazan and A. Shashua. Norm-product belief propagation: Primal-dual message-passing for LP-relaxation and approximate inference. IEEE Trans. on Information Theory, 56(12):6294–6316, 2010.

[9] Jinggang Huang and David Mumford. Statistics of natural images and models. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 1. IEEE, 1999.

[10] Hiroshi Ishikawa. Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 25(10):1333–1336, 2003.
[11] Hiroshi Ishikawa and Davi Geiger. Segmentation by grouping junctions. In IEEE Conference on Computer Vision and Pattern Recognition, pages 125–131. IEEE, 1998.

[12] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 28(10):1568–1583, 2006.

[13] V. Kolmogorov and M. Wainwright. On the optimality of tree-reweighted max-product message-passing. In Proc. Uncertainty in Artificial Intelligence (UAI), 2005.

[14] V. A. Kovalevsky and V. K. Koval. A diffusion algorithm for decreasing energy of max-sum labeling problem. Glushkov Inst. of Cybernetics, Kiev, USSR, 1975.

[15] Daniel Lemire. Streaming maximum-minimum filter using no more than three comparisons per element. Nordic J. Computing, 13(4):328–339, 2006.

[16] T. Pock, D. Cremers, H. Bischof, and A. Chambolle. Global solutions of variational models with convex regularization. SIAM Journal on Imaging Sciences, 3(4):1122–1145, 2010.

[17] Thomas Pock and Antonin Chambolle. Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In IEEE International Conference on Computer Vision (ICCV), pages 1762–1769. IEEE, 2011.

[18] D. Schlesinger. Exact solution of permuted submodular MinSum problems. In Proc. Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), pages 28–38, 2007.

[19] Alex Schwing, Tamir Hazan, Marc Pollefeys, and Raquel Urtasun. Globally convergent dual MAP LP relaxation solvers using Fenchel–Young margins. In Advances in Neural Information Processing Systems (NIPS), pages 2393–2401, 2012.

[20] D. Sontag, A. Globerson, and T. Jaakkola. Optimization for Machine Learning, chapter Introduction to Dual Decomposition for Inference. MIT Press, 2011.

[21] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Found.
Trends Mach. Learn., 1:1–305, 2008.

[22] T. Werner. A linear programming approach to max-sum problem: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 29(7), 2007.

[23] Hao Yuan and Mikhail J. Atallah. Running max/min filters using 1 + o(1) comparisons per sample. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 33(12):2544–2548, 2011.

[24] C. Zach, D. Gallup, J.-M. Frahm, and M. Niethammer. Fast global labeling for real-time stereo using multiple plane sweeps. In Proc. VMV, 2008.

[25] C. Zach, C. Häne, and M. Pollefeys. What is optimized in tight convex relaxations for multi-label problems? In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[26] C. Zach, C. Häne, and M. Pollefeys. What is optimized in convex relaxations for multi-label problems: Connecting discrete and continuously-inspired MAP inference. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2013. Accepted for publication.

[27] Christopher Zach and Pushmeet Kohli. A convex discrete-continuous approach for Markov random fields. In European Conference on Computer Vision (ECCV), pages 386–399. Springer Berlin Heidelberg, 2012.