Message-Passing Algorithms: Reparameterizations and Splittings


Authors: Nicholas Ruozzi, Sekhar Tatikonda

Abstract—The max-product algorithm, a local message-passing scheme that attempts to compute the most probable assignment (MAP) of a given probability distribution, has been successfully employed as a method of approximate inference for applications arising in coding theory, computer vision, and machine learning. However, the max-product algorithm is not guaranteed to converge, and even upon convergence, it is not guaranteed to recover the MAP assignment. Alternative convergent message-passing schemes have been proposed to overcome these difficulties. This work provides a systematic study of such message-passing algorithms that extends the known results by exhibiting new sufficient conditions for convergence to local and/or global optima, providing a combinatorial characterization of these optima based on graph covers, and describing a new convergent and correct message-passing algorithm whose derivation unifies many of the known convergent message-passing algorithms. While convergent and correct message-passing algorithms represent a step forward in the analysis of max-product style message-passing algorithms, the conditions needed to guarantee convergence to a global optimum can be too restrictive in both theory and practice. This limitation of convergent and correct message-passing schemes is characterized by graph covers and illustrated by example.

Index Terms—Graphical models, Maximum a posteriori estimation, Message passing, Belief propagation, Inference algorithms

I. INTRODUCTION

Belief propagation was originally formulated by Judea Pearl to solve inference problems in Bayesian networks [20].
Pearl demonstrated that, for tree-structured graphical models, a simple, distributed message-passing algorithm, dubbed "belief propagation", is guaranteed to converge to the exact marginals of the input probability distribution. If the belief propagation algorithm is run on an arbitrary graphical model (i.e., one that may contain cycles), then neither convergence nor correctness is guaranteed. In practice, however, the "loopy" belief propagation algorithm often produces reasonable approximations to the true marginals [19]. Pearl also proposed an algorithm for MAP estimation that he dubbed "belief revision." This optimization analog of belief propagation is more commonly known as the max-product or, equivalently, the min-sum algorithm. These algorithms have guarantees similar to those of the belief propagation algorithm: they produce the correct MAP estimate when the graphical model is a tree and may or may not produce the correct solution when run on graphs that contain cycles.

[N. Ruozzi is a member of the Communication Theory Laboratory at EPFL, and S. Tatikonda is a member of the Department of Electrical Engineering at Yale University. This work was presented, in part, at the conference Uncertainty in Artificial Intelligence (UAI) 2010.]

In this work, we focus primarily on variants of the min-sum algorithm: a local message-passing scheme designed to find the global minimum of an objective function that can be written as a sum of functions, each of which depends on a subset of the problem variables. For arbitrary graphical models, the min-sum algorithm may fail to converge [15], it may converge to a set of beliefs from which a global minimum cannot be easily constructed [35], or the estimate extracted upon convergence may not be optimal [38].
Despite these difficulties, the min-sum algorithm, like the belief propagation algorithm, has found empirical success in a variety of application areas including statistical physics, combinatorial optimization [25], computer vision, clustering [6], and the minimization of quadratic functions [17], [15]. However, rigorously characterizing the behavior of the algorithm outside of a few well-structured instances has proved challenging.

In an attempt to improve the performance of the min-sum algorithm, recent work has produced alternative message-passing algorithms that, under certain conditions, are provably convergent and correct: MPLP [7], serial tree-reweighted max-product (TRW-S) [10], max-sum diffusion (MSD) [40], and the norm-product algorithm [8]. These message-passing algorithms are convergent in the sense that they can each be viewed as coordinate-ascent schemes over concave lower bounds. Such message-passing algorithms can be converted into distributed algorithms by performing multiple coordinate-ascent steps in parallel and then averaging the results. Unfortunately, this process may require some amount of central control at each step and typically results in slower rates of convergence when compared with the original coordinate-ascent scheme [29]. A discussion of efficient parallel message passing based on the norm-product algorithm can be found in [27].

The above algorithmic ideas are closely related to concurrent work in the coding community on pseudo-codewords and LP decoding [4], [31]. Coordinate-ascent schemes and convergent message-passing algorithms related to linear programming problems that arise in the context of coding were studied in [33], [32], and [30]. As we will see, these approaches are connected to the above approaches via the MAP LP, a standard linear programming relaxation of the MAP problem.
Other related work has focused on convex free energy approximations and the convergence of the sum-product algorithm [39], [41], [9]. As the max-product algorithm is the zero-temperature limit of the sum-product algorithm, these results provide an algorithmic alternative to the max-product algorithm but can suffer from numerical issues as a result of the limiting process. Convex free energy approximations are constructed from a vector of double counting numbers. These double counting numbers are closely related to the reparameterizations that we will consider in this work.

The primary focus of this work is to provide a systematic technique for the design of convergent and correct message-passing algorithms that, like the standard min-sum message-passing algorithm, are both decentralized and distributed. Such algorithms have the potential to be useful for solving large-scale, parallel optimization problems for which standard optimization algorithms are impractical. Our primary contributions are threefold:

• We propose a new distributed, local message-passing algorithm, which we call the splitting algorithm, that contains many of the other convergent and correct message-passing algorithms as special cases. This algorithm, though initially derived by "splitting" the nodes of the factor graph into multiple reweighted copies, can also be interpreted as producing an alternative reparameterization of the objective function. We show how to derive an entire theory of message-passing algorithms by starting from reparameterizations (in contrast to much of the work on convergent message-passing algorithms, which begins with Lagrangian duality).

• We provide conditions under which this new algorithm can converge to locally and globally optimal solutions of the minimization problem. Past work on similar message-passing algorithms has focused exclusively on global optimality.
Empirically, message-passing algorithms that do not guarantee the global optimality of extracted assignments may still perform better than their convergent and correct counterparts. As such, understanding the behavior of message-passing algorithms that only guarantee certain forms of local optimality is of practical importance.

• We characterize the precise relationship between graph covers and the convergence and correctness of local message-passing algorithms. This characterization applies not only to provably convergent and correct message-passing algorithms, but also to other message-passing schemes that only guarantee local optimality. This understanding allows us to provide necessary and sufficient conditions for local message-passing algorithms to converge to the correct solution over pairwise binary graphical models.

Beyond these contributions, this work attempts to unify many disparate results in the theory of message-passing algorithms that have appeared across multiple communities, as well as to make explicit the assumptions that are often used when studying graphical models.

This work is organized as follows. In Section II, we review the problem setting, the min-sum algorithm, and factor graphs. In Sections III and IV, we derive the splitting algorithm from the min-sum algorithm and show how this result can be generalized to a study of reparameterizations. In Section V, we provide conditions under which the splitting algorithm is, in some sense, locally or globally optimal. In Section VI, we produce a simple, convergent message-passing algorithm from the splitting reparameterization via a trivial lower bound and coordinate ascent.
In Section VII, we show how to understand the fixed-point solutions of the splitting algorithm in terms of graph covers and discuss the extension of the theory of pseudo-codewords to general message-passing algorithms. In Section VIII, we provide examples that illustrate the limits of the convergent message-passing approach. Finally, in Section IX, we summarize our results and conclude.

II. PREVIOUS WORK

In this section, we review the problem formulation, necessary terminology, and basic results concerning the min-sum algorithm. Let f : ∏_i X_i → ℝ ∪ {∞}, where each X_i is an arbitrary set (e.g., ℝ, {0, 1}, ℤ, etc.). Throughout this paper, we will be interested in finding an element (x_1, ..., x_n) ∈ ∏_i X_i that minimizes f, and as such, we will assume that there is such an element:

Assumption 1. ∃ x* ∈ ∏_i X_i such that f(x*) = inf_x f(x).

We note that we will in general allow f to take the value ∞ over its domain. However, as we will see in subsequent sections, some results will only apply when f is a proper, real-valued function (i.e., f does not take the value ∞ or −∞ at any element of its domain).

For an arbitrary function, computing this minimum may be computationally expensive, especially if n is large. A typical scientific application may involve hundreds of thousands of variables and potential functions, and storing the entire problem on one computer may be difficult, if not impossible. In other applications, such as sensor networks, processing power and storage are limited. Because local message-passing algorithms like the min-sum algorithm are decentralized and distributed, they can operate at scales at which typical algorithms would be impractical.

Although we will discuss algorithms for the minimization problem, some applications have a more natural formulation as maximization problems.
In these instances, we can use the equivalence max_x f(x) = − min_x [−f(x)] to convert the maximization problem into a minimization problem. Historically, the max-product algorithm for nonnegative functions is often studied instead of the min-sum algorithm. We prefer the min-sum algorithm for notational reasons, and all of the results discussed in this work can easily be converted into results for the max-product case.

A. Factorizations and Factor Graphs

The basic observation of the min-sum algorithm is that, even though the original minimization problem may be difficult, if f can be written as a sum of functions depending on only a small subset of the variables, then we may be able to minimize the objective function by performing a series of minimizations over (presumably easier) sub-problems. To make this concrete, let A ⊆ 2^V. We say that f factorizes over a hypergraph G = (V, A) if we can write f as a sum of potential functions φ_i : X_i → ℝ ∪ {∞} for i ∈ V and ψ_α : X_α → ℝ ∪ {∞} for α ∈ A as

  f(x) = Σ_{i ∈ V} φ_i(x_i) + Σ_{α ∈ A} ψ_α(x_α).   (1)

In this work, we focus on additive factorizations; multiplicative factorizations in which all of the potential functions are nonnegative can be converted into additive factorizations by taking a negative log of the objective function.

Fig. 1: The factor graph corresponding to f(x_1, x_2, x_3) = φ_1 + φ_2 + φ_3 + ψ_12 + ψ_23 + ψ_13. By convention, variable nodes are represented as circles and factor nodes are represented as squares. Typically, the φ functions that depend only on a single variable are omitted from the graphical representation (for clarity).

The above factorization is by no means unique. For example, suppose we are given the objective function f(x_1, x_2) = x_1 + x_2 + x_1 x_2. We can factorize f in many different ways.
  f(x_1, x_2) = x_1 + x_2 + x_1 x_2   (2)
              = x_1 + (x_2 + x_1 x_2)   (3)
              = (x_1 + x_2 + x_1 x_2)   (4)
              = x_1 + x_2 + x_1 x_2 / 2 + x_1 x_2 / 2   (5)

Each of these rewritings represents a different factorization of f (the parentheses indicate a single potential function). All of these factorizations can be captured by the above definitions, except for the last. Recall that A was taken to be a subset of 2^V. In order to accommodate the factorization given by (5), we will allow A to be a multiset whose elements are members of the set 2^V.

The set of all factorizations of the objective function f(x) over G = (V, A) forms an affine set,

  F_(V,A)(f) = { (φ, ψ) : κ + Σ_{i ∈ V} φ_i(x_i) + Σ_{α ∈ A} ψ_α(x_α) = f(x) for all x }.   (6)

If (φ, ψ) ∈ F_(V,A)(f) and (φ′, ψ′) ∈ F_(V,A)(f), then (φ, ψ) is called a reparameterization of (φ′, ψ′) and vice versa.

We can represent the hypergraph G = (V, A) as a bipartite graph with a variable node i for each variable x_i, a factor node α for each of the potentials ψ_α, and an edge joining the factor node corresponding to α to the variable node representing x_i for all i ∈ α. This bipartite graph is called the factor graph representation of G. Factor graphs provide a visual representation of the relationships among the potential functions. In this work, we will always assume that G is given in its factor graph representation. For a concrete example, see Figure 1.

B. The Min-Sum Algorithm

The min-sum algorithm is a local message-passing algorithm over a factor graph. During the execution of the min-sum algorithm, messages are passed back and forth between adjacent nodes of the graph. In the algorithm, there are two types of messages: messages passed from variable nodes to factor nodes and messages passed from factor nodes to variable nodes.
On the t-th iteration of the algorithm, messages are passed along each edge of the factor graph as follows:

  m_{i→α}^t(x_i) := κ + φ_i(x_i) + Σ_{β ∈ ∂i \ α} m_{β→i}^{t−1}(x_i)   (7)

  m_{α→i}^t(x_i) := κ + min_{x_{α \ i}} [ ψ_α(x_α) + Σ_{k ∈ α \ i} m_{k→α}^{t−1}(x_k) ]   (8)

where ∂i denotes the set of all α ∈ A such that i ∈ α (intuitively, this is the set of neighbors of variable node x_i in the factor graph), x_α is the vector formed from the entries of x by selecting only the indices in α, and α \ i is abusive notation for the set-theoretic difference α \ {i}.

When the graph is a tree, these message updates can be derived by dynamic programming. When the graph is not a tree, the same updates are used as if the graph were a tree. Understanding when these updates converge to the correct solution for a given graph is the central question underlying the study of the min-sum algorithm.

Each message update has an arbitrary normalization constant κ. Because κ is not a function of any of the variables, it only affects the value of the minimum and not where the minimum is located. As such, we are free to choose it however we like for each message and each time step. In practice, these constants are used to avoid numerical issues that may arise during the execution of the algorithm.

Definition II.1. A vector of messages, m = ({m_{α→i}}, {m_{i→α}}), is real-valued if for all α ∈ A, all i ∈ α, and all x_i ∈ X_i, m_{α→i}(x_i) and m_{i→α}(x_i) are real-valued functions (i.e., they do not take the value ∞ for any x_i ∈ X_i).

We will think of the messages as a vector of functions indexed by the directed edge over which the message is passed. Any vector of real-valued messages is a valid choice for the vector of initial messages m^0, and the choice of initial messages can greatly affect the behavior of the algorithm.
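As a concrete illustration of the updates (7) and (8), the following sketch runs synchronous min-sum on a small chain-structured binary model. The potential tables and the dictionary-based encoding are illustrative assumptions, not part of the paper; since this factor graph is a tree, the messages reach a fixed point and the beliefs recover the exact min-marginals.

```python
# Synchronous min-sum, updates (7) and (8), on the chain
# f(x1,x2,x3) = phi_1 + phi_2 + phi_3 + psi_12 + psi_23 with binary
# variables. All tables here are made up for the illustration.

phi = {1: [0.0, 1.0], 2: [0.0, 0.5], 3: [0.5, 0.0]}      # phi_i(x_i)
psi = {(1, 2): [[0.0, 1.0], [1.0, 0.0]],                 # psi_alpha, indexed
       (2, 3): [[0.0, 1.0], [1.0, 0.0]]}                 # in the tuple order

factors = list(psi)
nbrs = {i: [a for a in factors if i in a] for i in phi}
m_va = {(i, a): [0.0, 0.0] for a in factors for i in a}  # variable -> factor
m_av = {(a, i): [0.0, 0.0] for a in factors for i in a}  # factor -> variable

for t in range(25):
    new_va, new_av = {}, {}
    for (i, a) in m_va:                                  # update (7)
        new_va[(i, a)] = [phi[i][x] + sum(m_av[(b, i)][x]
                                          for b in nbrs[i] if b != a)
                          for x in (0, 1)]
    for (a, i) in m_av:                                  # update (8)
        u, v = a
        j = v if i == u else u                           # alpha \ i = {j}
        pot = (lambda xi, xj: psi[a][xi][xj]) if i == u else \
              (lambda xi, xj: psi[a][xj][xi])
        new_av[(a, i)] = [min(pot(xi, xj) + m_va[(j, a)][xj]
                              for xj in (0, 1))
                          for xi in (0, 1)]
    m_va, m_av = new_va, new_av

# On this tree, the belief (9) for x1 equals its exact min-marginal:
b1 = [phi[1][x] + m_av[((1, 2), 1)][x] for x in (0, 1)]  # b1 == [0.5, 1.5]
```

The values [0.5, 1.5] can be checked against brute-force minimization of f with x_1 fixed to 0 and to 1, respectively.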
A typical assumption is that the initial messages are chosen such that m^0_{α→i} ≡ 0 and m^0_{i→α} ≡ 0. This uniformity assumption is often useful when we need to analyze the evolution of the algorithm over time, but ideally, we would like to design message-passing schemes that perform well independent of the initialization.

We can use the messages in order to construct an estimate of the min-marginals of f. Recall that a min-marginal of f is a function of one or more variables obtained by fixing a subset of the variables and minimizing the function f over all of the remaining variables. For example, the min-marginal for the variable x_i would be the function f_i(x_i) ≜ min_{x′ : x′_i = x_i} f(x′). Given any vector of messages, m^t, we can construct a set of beliefs that are intended to approximate the min-marginals of f by setting

  b_i^t(x_i) := κ + φ_i(x_i) + Σ_{α ∈ ∂i} m_{α→i}^t(x_i)   (9)

  b_α^t(x_α) := κ + ψ_α(x_α) + Σ_{i ∈ α} m_{i→α}^t(x_i)   (10)

where, again, κ is an arbitrary constant that can be different for each belief.

If the beliefs corresponded to the true min-marginals of f (i.e., b_i^t(x_i) = min_{x′ : x′_i = x_i} f(x′)), then for any y_i ∈ argmin_{x_i} b_i^t(x_i) there would exist a vector x* such that x*_i = y_i and x* minimizes the function f. If |argmin_{x_i} b_i^t(x_i)| = 1 for all i, then we can construct x* by setting x*_i = y_i for all i, but, if the objective function has more than one optimal solution, then we may not be able to construct such an x* so easily. For this reason, theoretical results in this area typically assume that the objective function has a unique global minimum. Although this assumption is common, we will not adopt this convention in this work.
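The min-marginal definition and the coordinate-wise decoding it supports can be made concrete by brute force on a tiny example. The objective below is invented for the illustration; it is chosen to have a unique global minimum, so every exact min-marginal has a unique minimizer and decoding succeeds.

```python
import itertools

# Brute-force illustration of the min-marginal
# f_i(x_i) = min_{x' : x'_i = x_i} f(x'), and of decoding an estimate
# from the argmins of the (here, exact) min-marginals.

def f(x):
    # a made-up objective over three binary variables, unique minimum
    x1, x2, x3 = x
    return ((0.0 if x1 == 0 else 1.0) + 0.5 * x2 + 0.4 * (x3 - 1) ** 2
            + (1.0 if x1 != x2 else 0.0) + (1.0 if x2 != x3 else 0.0))

domain = list(itertools.product((0, 1), repeat=3))

def min_marginal(i, xi):
    # fix coordinate i to the value xi and minimize over everything else
    return min(f(x) for x in domain if x[i] == xi)

# Each min-marginal here has a unique minimizer, so coordinate-wise
# decoding recovers a global minimum of f.
estimate = tuple(min((0, 1), key=lambda v: min_marginal(i, v))
                 for i in range(3))
assert f(estimate) == min(f(x) for x in domain)
```

If the objective had multiple global minima, ties in the argmins could make this coordinate-wise construction fail, which is exactly the difficulty noted in the text.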
Because our beliefs are not necessarily the true min-marginals, we can only approximate the optimal assignment by computing an estimate of the argmin,

  x_i^t ∈ argmin_{x_i} b_i^t(x_i).   (11)

Definition II.2. A vector, b = ({b_i}, {b_α}), of beliefs is locally decodable to x* if for all i and for all x_i ≠ x*_i, b_i(x*_i) < b_i(x_i). Equivalently, for all i, b_i has a unique minimum at x*_i.

If the algorithm converges to a vector of beliefs that are locally decodable to x*, then we hope that the vector x* is a global minimum of the objective function. This is indeed the case when the factor graph contains no cycles. Informally, this follows from the correctness of dynamic programming on a tree. This result is well known, and we will defer a more detailed discussion of it until later in this work (see Corollary V.6).

Similarly, we hope that the beliefs constructed from any fixed point of (7) and (8) would be the true min-marginals of the function f. If the beliefs are the exact min-marginals, then the estimate corresponding to our beliefs would indeed be a global minimum. Unfortunately, the algorithm is only known to produce the exact min-marginals on special factor graphs (e.g., when the factor graph is a tree; see Section IV-A). Instead, we will show that the fixed-point beliefs are similar to min-marginals. Like the messages, we will think of the beliefs as a vector of functions indexed by the nodes of the factor graph.

Suppose f factors over G = (V, A). Consider the following definitions.

Definition II.3. A vector of beliefs, b, is admissible for a function f if

  f(x) = κ + Σ_i b_i(x_i) + Σ_α [ b_α(x_α) − Σ_{k ∈ α} b_k(x_k) ]

for all x. Beliefs satisfying this property are said to reparameterize the objective function.

Fig.
2: The new factor graph formed by splitting the ψ_13 potential of the factor graph in Figure 1 into two potentials.

Definition II.4. A vector of beliefs, b, is min-consistent if for all α and all i ∈ α,

  min_{x_{α \ i}} b_α(x_α) = κ + b_i(x_i) for all x_i.

Any vector of beliefs that satisfies these two properties produces a reparameterization of the original objective function in terms of the beliefs. As the following theorem demonstrates, any vector of beliefs obtained from a fixed point of the message updates in (7) and (8) satisfies these two properties.

Theorem II.1. For any vector of fixed-point messages, the corresponding beliefs are admissible and min-consistent.

A proof of Theorem II.1 can be found in Appendix A-A. This result is not new, and similar proofs can be found, for example, in [35]. We present the proof in the appendix only for completeness, and we will make use of similar proof ideas in subsequent sections.

III. THE SPLITTING ALGORITHM

In this section, we provide a simple derivation of a reweighted message-passing scheme obtained from the min-sum algorithm by "splitting" the factor and variable nodes in the factor graph. This novel approach shows that reweighted message-passing schemes can be derived from the standard min-sum algorithm on a modified factor graph, suggesting a close link between these types of message-passing schemes and factorizations. Although this construction appears to make the message-passing scheme more complicated, in subsequent sections, we will see that similar ideas can be used to derive convergent and correct message-passing schemes.

Suppose f factorizes over G = (V, A) as in (1). Take one potential α ∈ A and split it into c potentials α_1, ..., α_c such that for each j ∈ {1, ..., c}, ψ_{α_j}(x_α) = ψ_α(x_α)/c for all x_α.
This allows us to rewrite the objective function, f, as

  f(x) = Σ_{i ∈ V} φ_i(x_i) + Σ_{β ∈ A} ψ_β(x_β)   (12)
       = Σ_{i ∈ V} φ_i(x_i) + Σ_{β ∈ A \ α} ψ_β(x_β) + Σ_{j=1}^c ψ_α(x_α)/c   (13)
       = Σ_{i ∈ V} φ_i(x_i) + Σ_{β ∈ A \ α} ψ_β(x_β) + Σ_{j=1}^c ψ_{α_j}(x_α).   (14)

This rewriting does not change the objective function, but it does produce a new factor graph F in which some of the variable nodes have a higher degree (see Figure 2).

Fig. 3: New factor graph formed from the factor graph in Figure 1 by splitting the variable node x_1 into two variables x_4 and x_5. The new potentials are given by φ_4 = φ_5 = φ_1/2, ψ_245 = ψ_12(x_4, x_2) − log 1{x_4 = x_5}, and ψ_345 = ψ_13(x_4, x_3) − log 1{x_4 = x_5}.

Now, take some i ∈ α and consider the messages m_{i→α_j} and m_{α_j→i} given by the standard min-sum algorithm:

  m_{i→α_j}^t(x_i) = κ + φ_i(x_i) + Σ_{β ∈ ∂_F i \ α_j} m_{β→i}^{t−1}(x_i)   (15)

  m_{α_j→i}^t(x_i) = κ + min_{x_{α \ i}} [ ψ_α(x_α)/c + Σ_{k ∈ α_j \ i} m_{k→α_j}^{t−1}(x_k) ]   (16)

where ∂_F i denotes the neighbors of i in F. Notice that there is an automorphism of the graph that maps α_j to α_{j′} for all j, j′ ∈ {1, ..., c}. As the messages passed from any node depend only on the messages received at the previous time step, if the initial messages are the same at both of these nodes, then they must produce identical messages at time 1. More formally, if we initialize the messages identically over each split edge, then, at any time step t ≥ 0, m_{i→α_j}^t(x_i) = m_{i→α_{j′}}^t(x_i) for all x_i and m_{α_j→i}^t(x_i) = m_{α_{j′}→i}^t(x_i) for all x_i.
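The identity behind (12)-(14), that replacing ψ_α by c identical copies of ψ_α/c leaves the objective unchanged, is easy to check numerically. The model below uses made-up tables purely for illustration.

```python
import itertools

# Sanity check, on a small model with made-up tables, that splitting a
# potential psi_alpha into c copies of psi_alpha / c, as in (12)-(14),
# does not change the objective function.

phi = {1: [0.0, 1.0], 2: [0.3, 0.0], 3: [0.2, 0.7]}
psi_12 = [[0.0, 1.0], [1.0, 0.0]]
psi_23 = [[0.5, 0.0], [0.0, 0.5]]
psi_13 = [[0.0, 2.0], [1.0, 0.0]]          # the factor we will split

def f_original(x1, x2, x3):
    return (phi[1][x1] + phi[2][x2] + phi[3][x3]
            + psi_12[x1][x2] + psi_23[x2][x3] + psi_13[x1][x3])

def f_split(x1, x2, x3, c=2):
    # psi_13 replaced by c identical copies, each scaled by 1/c
    return (phi[1][x1] + phi[2][x2] + phi[3][x3]
            + psi_12[x1][x2] + psi_23[x2][x3]
            + sum(psi_13[x1][x3] / c for _ in range(c)))

for x in itertools.product((0, 1), repeat=3):
    assert abs(f_original(*x) - f_split(*x)) < 1e-12
```

The interesting consequence is not this identity itself but the message-passing scheme that results from running min-sum on the split graph, collapsed back onto G in (17)-(19).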
Because of this, we can rewrite the message from i to α_j as

  m_{i→α_j}^t(x_i) = φ_i(x_i) + Σ_{β ∈ ∂_F i \ α_j} m_{β→i}^{t−1}(x_i)   (17)
                   = φ_i(x_i) + Σ_{l ≠ j} m_{α_l→i}^{t−1}(x_i) + Σ_{β ∈ ∂_G i \ α} m_{β→i}^{t−1}(x_i)   (18)
                   = φ_i(x_i) + (c − 1) m_{α_j→i}^{t−1}(x_i) + Σ_{β ∈ ∂_G i \ α} m_{β→i}^{t−1}(x_i).   (19)

Notice that (19) can be viewed as a message-passing algorithm on the original factor graph. The primary difference between the update in (19) and the min-sum update from (7), in addition to the scaling factor, is that the message passed from i to α now depends on the message from α to i.

Analogously, we can also split the variable nodes. Suppose f factorizes over G = (V, A) as in (1). Now, we will take one variable x_i and split it into c variables x_{i_1}, ..., x_{i_c} such that for each l ∈ {1, ..., c}, φ_{i_l}(x_{i_l}) = φ_i(x_{i_l})/c for all x_{i_l}. Again, this produces a new factor graph, F. Because x_{i_1}, ..., x_{i_c} are meant to represent the same variable, we must add a constraint to ensure that they are indeed the same. Next, we need to modify the potentials to incorporate the constraint and the change of variables. We will construct A_F such that for each α ∈ A with i ∈ α there is a β = (α \ i) ∪ {i_1, ..., i_c} in A_F. Define ψ_β(x_β) ≜ ψ_α(x_{α \ i}, x_{i_1}) − log 1{x_{i_1} = ... = x_{i_c}} for all x_{α \ i}. For each α ∈ A with i ∉ α, we simply add α to A_F with its old potential. For an example of this construction, see Figure 3.

This rewriting produces a new objective function

  g(x) = Σ_{j ≠ i} φ_j(x_j) + Σ_{l=1}^c φ_i(x_{i_l})/c + Σ_{α ∈ A_F} ψ_α(x_α).   (20)

Minimizing g is equivalent to minimizing f. Again, we will show that we can collapse the min-sum message-passing updates over F to message-passing updates over G with modified potentials.
Take some α ∈ A_F containing the new variable i_1 that augments the potential γ ∈ A, and consider the messages m_{i_1→α} and m_{α→i_1} given by the standard min-sum algorithm:

  m_{i_1→α}^t(x_{i_1}) = κ + φ_i(x_{i_1})/c + Σ_{β ∈ ∂_F i_1 \ α} m_{β→i_1}^{t−1}(x_{i_1})   (21)

  m_{α→i_1}^t(x_{i_1}) = κ + min_{x_{α \ i_1}} [ ψ_α(x_α) + Σ_{k ∈ α \ i_1} m_{k→α}^{t−1}(x_k) ].   (22)

Again, if we initialize the messages identically over each split edge, then, at any time step t ≥ 0, m_{i_1→α}^t(x_i) = m_{i_l→α}^t(x_i) for all x_i and m_{α→i_1}^t(x_i) = m_{α→i_l}^t(x_i) for all x_i and for any l ∈ {1, ..., c} by symmetry. Using this, we can rewrite the message from α to i_1 as

  m_{α→i_1}^t(x_{i_1}) = κ + (c − 1) m_{i_1→α}^{t−1}(x_{i_1}) + min_{x_{α \ i_1}} [ ψ_γ(x_{α \ i}, x_{i_1}) + Σ_{k ∈ α \ i} m_{k→α}^{t−1}(x_k) ].   (23)

By symmetry, we only need to perform one message update to compute m_{α→i_l}^t(x_{i_l}) for each l ∈ {1, ..., c}. As a result, we can think of these messages as being passed on the original factor graph G. The combined message updates for each of these splitting operations are described in Algorithm 1.

Throughout this discussion, we have assumed that each factor was split into c pieces, where c was some positive integer. If we allow c to be an arbitrary non-zero real, then the notion of splitting no longer makes sense. Instead, as described in Section IV-B, these splittings can be viewed more generally as producing reparameterizations of the objective function.

IV.
REPARAMETERIZATIONS

Recall from Theorem II.1 that the beliefs produced from fixed points of the min-sum algorithm are admissible: every vector of fixed-point beliefs, b*, for the min-sum algorithm on the graph G = (V, A) produces a reparameterization of the objective function

  f(x) = Σ_{i ∈ V} φ_i(x_i) + Σ_{α ∈ A} ψ_α(x_α)   (24)
       = Σ_{i ∈ V} b*_i(x_i) + Σ_{α ∈ A} [ b*_α(x_α) − Σ_{k ∈ α} b*_k(x_k) ].   (25)

Algorithm 1 Synchronous Splitting Algorithm
1: Initialize the messages to some finite vector.
2: For iteration t = 1, 2, ... update the messages as follows:

  m_{i→α}^t(x_i) := κ + φ_i(x_i)/c_i + (c_α − 1) m_{α→i}^{t−1}(x_i) + Σ_{β ∈ ∂i \ α} c_β m_{β→i}^{t−1}(x_i)

  m_{α→i}^t(x_i) := κ + min_{x_{α \ i}} [ ψ_α(x_α)/c_α + (c_i − 1) m_{i→α}^{t−1}(x_i) + Σ_{k ∈ α \ i} c_k m_{k→α}^{t−1}(x_k) ].

In other words, we can view the min-sum algorithm as trying to produce a reparameterization of the objective function over G in terms of min-consistent beliefs. If the factor graph is a tree, then, as was observed by Pearl and others, the min-marginals of the objective function produce such a factorization. For each i and x_i, let f_i(x_i) be the min-marginal for the variable x_i, and for each α ∈ A and x_α, let f_α(x_α) be the min-marginal for the vector of variables x_α.

Lemma IV.1. If f factors over G = (V, A) and the factor graph representation of G is a tree, then f can be reparameterized in terms of its min-marginals as

  f(x) = κ + Σ_{i ∈ V} f_i(x_i) + Σ_{α ∈ A} [ f_α(x_α) − Σ_{k ∈ α} f_k(x_k) ].   (26)

Proof: For example, see Theorem 1 of [35].

When the factor graph is not a tree, the min-marginals of the objective function do not necessarily produce a factorization of the objective function in this way, but we can still hope that we can construct a minimizing assignment from admissible and min-consistent beliefs.
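Algorithm 1 can be sketched directly on a small pairwise binary model. The tables and the choice of splitting weights c below are illustrative assumptions; with every c_i = c_α = 1, the updates reduce to the standard min-sum updates (7)-(8). Per-message normalization plays the role of the arbitrary constant κ.

```python
# A minimal sketch of Algorithm 1 (the synchronous splitting updates).
# Tables and splitting weights are made up for the example.

phi = {1: [0.0, 1.0], 2: [0.0, 0.5], 3: [0.5, 0.0]}      # phi_i(x_i)
psi = {(1, 2): [[0.0, 1.0], [1.0, 0.0]],                 # psi_alpha, indexed
       (2, 3): [[0.0, 1.0], [1.0, 0.0]]}                 # in the tuple order
c = {1: 1.0, 2: 1.0, 3: 1.0, (1, 2): 2.0, (2, 3): 2.0}   # splitting weights

factors = list(psi)
nbrs = {i: [a for a in factors if i in a] for i in phi}
m_va = {(i, a): [0.0, 0.0] for a in factors for i in a}  # variable -> factor
m_av = {(a, i): [0.0, 0.0] for a in factors for i in a}  # factor -> variable

def norm(vals):
    mn = min(vals)                 # choose kappa per message so the minimum
    return [v - mn for v in vals]  # is zero, avoiding numerical drift

for t in range(50):
    new_va, new_av = {}, {}
    for (i, a) in m_va:            # variable-to-factor update of Algorithm 1
        new_va[(i, a)] = norm([phi[i][x] / c[i]
                               + (c[a] - 1.0) * m_av[(a, i)][x]
                               + sum(c[b] * m_av[(b, i)][x]
                                     for b in nbrs[i] if b != a)
                               for x in (0, 1)])
    for (a, i) in m_av:            # factor-to-variable update of Algorithm 1
        u, v = a
        j = v if i == u else u     # alpha \ i = {j} for a pairwise factor
        pot = (lambda xi, xj: psi[a][xi][xj]) if i == u else \
              (lambda xi, xj: psi[a][xj][xi])
        new_av[(a, i)] = norm([min(pot(xi, xj) / c[a]
                                   + (c[i] - 1.0) * m_va[(i, a)][xi]
                                   + c[j] * m_va[(j, a)][xj]
                                   for xj in (0, 1))
                               for xi in (0, 1)])
    m_va, m_av = new_va, new_av
```

Note that the sketch makes no convergence claim: as the surrounding sections discuss, convergence of the splitting updates depends on the choice of the weights c.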
In this section, we explore reparameterizations in an attempt to understand what makes one factorization of the objective function better than another. Reparameterizations, and lower bounds derived from them, will be an essential ingredient in the design of convergent and correct message-passing algorithms. In Section IV-A, we use message reparameterizations to show that a slight modification to the definition of the beliefs for the min-sum algorithm can be used to ensure that beliefs corresponding to any vector of real-valued messages are admissible. In Section IV-B, we show that a similar technique can be used to produce alternative reparameterizations of the objective function. As in the case of the splitting algorithm, each of these reparameterizations will be characterized by a vector of non-zero reals. These reparameterizations naturally produce lower bounds on the objective function. Alternatively, lower bounds can be derived using duality and a linear program known as the MAP LP [29]. These two approaches produce similar lower bounds on the objective function. In Section IV-C, we review the MAP LP.

A. Admissibility and the Min-Sum Algorithm

Fixed points of the message update equations produce a reparameterization of the objective function, but an arbitrary vector of messages need not produce a new factorization. This difficulty is a direct consequence of having two types of messages (those passed from variables to factors and those passed from factors to variables). However, we could ensure admissibility by introducing a vector of messages, m′, and rewriting the objective function as

  f(x) = Σ_{i ∈ V} [ φ_i(x_i) + Σ_{α ∈ ∂i} m′_{α→i}(x_i) ] + Σ_{α ∈ A} [ ψ_α(x_α) − Σ_{k ∈ α} m′_{α→k}(x_k) ].   (27)

If the vector of messages is real-valued, this rewriting does not change the objective function.
For our new vector m′, consider the following definitions for the beliefs:

b′_i(x_i) = φ_i(x_i) + Σ_{α∈∂i} m′_{α→i}(x_i)   (28)
b′_α(x_α) = ψ_α(x_α) + Σ_{k∈α} [ b′_k(x_k) − m′_{α→k}(x_k) ].   (29)

With these definitions, we can express the objective function as

f(x) = Σ_{i∈V} b′_i(x_i) + Σ_{α∈A} [ b′_α(x_α) − Σ_{k∈α} b′_k(x_k) ].   (30)

Any choice of real-valued factor-to-variable messages produces an alternative factorization of the objective function. Notice that the beliefs as defined in (28) and (29) depend on only one of the two types of messages. Reparameterizations of the form (30) are meant to be reminiscent of the reparameterization in terms of min-marginals for trees in (26). Notice that this definition of the beliefs corresponds exactly to those for the min-sum algorithm if we define

m′_{i→α}(x_i) = b′_i(x_i) − m′_{α→i}(x_i)   (31)
             = φ_i(x_i) + Σ_{β∈∂i\α} m′_{β→i}(x_i)   (32)

for all α, all i ∈ α, and all x_i.

B. The Splitting Reparameterization

The min-sum algorithm produces reparameterizations of the objective function that have the same form as (25). Many other reparameterizations in terms of messages are possible. For example, given a vector of non-zero reals, c, we can construct a reparameterization of the objective function:

f(x) = Σ_{i∈V} [ φ_i(x_i) + Σ_{α∈∂i} c_i c_α m_{α→i}(x_i) ] + Σ_{α∈A} [ ψ_α(x_α) − Σ_{k∈α} c_α c_k m_{α→k}(x_k) ]   (33)
     = Σ_{i∈V} c_i [ φ_i(x_i)/c_i + Σ_{α∈∂i} c_α m_{α→i}(x_i) ] + Σ_{α∈A} c_α [ ψ_α(x_α)/c_α − Σ_{k∈α} c_k m_{α→k}(x_k) ].   (34)

By analogy to the min-sum algorithm, for each i ∈ V and α ∈ A, we define the beliefs corresponding to this reparameterization as

b_i(x_i) = φ_i(x_i)/c_i + Σ_{α∈∂i} c_α m_{α→i}(x_i)   (35)
b_α(x_α) = ψ_α(x_α)/c_α + Σ_{k∈α} c_k [ b_k(x_k) − m_{α→k}(x_k) ].   (36)

This allows us to rewrite the objective function as

f(x) = Σ_{i∈V} c_i b_i(x_i) + Σ_{α∈A} c_α [ b_α(x_α) − Σ_{k∈α} c_k b_k(x_k) ].   (37)

By analogy to the min-sum case, we will call any vector of beliefs that satisfies the above property for a particular choice of the parameters c-admissible.

Definition IV.1. A vector of beliefs, b, is c-admissible for a function f if

f(x) = κ + Σ_{i∈V} c_i b_i(x_i) + Σ_{α∈A} c_α [ b_α(x_α) − Σ_{k∈α} c_k b_k(x_k) ]   (38)

for all x.

Notice that if we choose c_i = 1 for all i and c_α = 1 for all α, then we obtain the same reparameterization as the standard min-sum algorithm. Because of the relationship to splitting the factors as in Section III, we will call the reparameterization given by (34) the splitting reparameterization.

We can use message reparameterizations to construct lower bounds on the objective function. For example, consider the following lower bound obtained from the splitting reparameterization:

min_x f(x) ≥ κ + Σ_{i∈V} min_{x_i} c_i b_i(x_i) + Σ_{α∈A} min_{x_α} c_α [ b_α(x_α) − Σ_{k∈α} c_k b_k(x_k) ].   (39)

Notice that this lower bound is a concave function of the message vector, m, for any choice of the vector c such that each component is nonzero. Other concave lower bounds are possible using the same reparameterization:

min_x f(x) ≥ κ + Σ_{i∈V} [ min_{x_i} c_i (1 − Σ_{α∈∂i} c_α) b_i(x_i) ] + Σ_{α∈A} [ min_{x_α} c_α b_α(x_α) ].   (40)

C. Lower Bounds and the MAP LP

Many authors have observed that, for finite state spaces (i.e., |X_i| < ∞ for all i) and objective functions f : Π_i X_i → R, we can convert the optimization problem, min_x f(x), into an equivalent integer program by choosing a factorization (φ, ψ) ∈ F_{(V,A)}(f) (see the definition in Section II-A) and introducing an indicator vector µ [39], [37]. The resulting integer programming problem appears in Figure 4.
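As a concrete illustration, this integer program and its linear programming relaxation (the MAP LP, discussed below) can be assembled for the toy two-variable model used earlier. The potentials are illustrative assumptions, and SciPy's generic `linprog` solver is used; on this model the factor graph is a tree, and the relaxed optimum happens to coincide with the true minimum, min_x f(x) = 0.5.

```python
import numpy as np
from scipy.optimize import linprog

# Variables: mu = [mu_0(0), mu_0(1), mu_1(0), mu_1(1),
#                  mu_a(0,0), mu_a(0,1), mu_a(1,0), mu_a(1,1)]
# for two binary variables and one factor a = {0, 1}.  The potentials are
# illustrative; min_x f(x) = 0.5 at x = (0, 0) by direct enumeration.
cost = [0.0, 1.0, 0.5, 0.0,      # phi_0, phi_1
        0.0, 2.0, 2.0, 0.0]      # psi_a(x_0, x_1), row-major

A_eq = [
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_{x1} mu_a(0, x1) = mu_0(0)
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_{x1} mu_a(1, x1) = mu_0(1)
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_{x0} mu_a(x0, 0) = mu_1(0)
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_{x0} mu_a(x0, 1) = mu_1(1)
    [1, 1, 0, 0, 0, 0, 0, 0],    # mu_0 sums to one
    [0, 0, 1, 1, 0, 0, 0, 0],    # mu_1 sums to one
]
b_eq = [0, 0, 0, 0, 1, 1]

# Relaxation: the integrality constraint mu in {0,1} becomes 0 <= mu <= 1.
res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(round(res.fun, 6))   # -> 0.5; the relaxation is tight on this tree
```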
If f is minimized at the assignment x*, then choosing µ_i(x*_i) = 1 for all i, µ_α(x*_α) = 1 for all α, and setting the remaining elements of µ to zero corresponds to an optimum of the integer program.

If the objective function takes the value ∞ at some point x ∈ Π_i X_i, then, strictly speaking, the above construction is not an integer program. We can correct for this by removing all infinite coefficients from the linear objective and forcing the corresponding µ variables to zero or one as appropriate. As an example, if ψ_α(x_α) = −∞ for some x_α, then we will remove the µ_α(x_α) term from the linear objective and add a constraint that µ_α(x_α) = 0 (this may result in the addition of exponentially many constraints).

The integer program in Figure 4 can be relaxed into a linear program by relaxing the integrality constraint, allowing µ_i(x_i) and µ_α(x_α) to be non-negative reals for all i, x_i, α, and x_α. The resulting linear program is typically referred to as the MAP LP. We note that the constraints can be written in matrix form as Ax = b such that the components of A and b are all integers. Consequently, any vertex of the polytope corresponding to the system of equations Ax = b must have all rational entries.

We can use the MAP LP and Lagrangian duality in order to construct lower bounds on the objective function; different duals will produce different lower bounds. This approach produces lower bounds similar to the ones obtained in the last section and suggests a close relationship between duality and reparameterization. Many different lower bounds on the objective function have been derived using duality (e.g., see [7], [29], and [37]). For detailed discussions of duality and message-passing, see [29] and [8]. In addition, extensions of the duality arguments to continuous variable settings are discussed in [21].

V.
LOWER BOUNDS AND OPTIMALITY

In Section IV, we saw that, given a vector of admissible beliefs, we can produce concave lower bounds on the objective function. The discussion in this section will focus on the properties of admissible and min-consistent beliefs in relation to these lower bounds. As such, these results will be applicable to a variety of algorithms, such as the min-sum algorithm, that produce beliefs with these two properties.

minimize   Σ_i Σ_{x_i} µ_i(x_i) φ_i(x_i) + Σ_α Σ_{x_α} µ_α(x_α) ψ_α(x_α)
subject to Σ_{x_{α\i}} µ_α(x_α) = µ_i(x_i)   ∀α ∈ A, i ∈ α, x_i ∈ X_i
           Σ_{x_i} µ_i(x_i) = 1   ∀i ∈ V
           µ_i(x_i) ∈ {0, 1}   ∀i, x_i ∈ X_i
           µ_α(x_α) ∈ {0, 1}   ∀α ∈ A, x_α ∈ Π_{i∈α} X_i

Fig. 4: An integer programming formulation of the minimization problem corresponding to the factorization (φ, ψ) ∈ F_{(V,A)}(f).

Recall that, given a vector of admissible and min-consistent beliefs, we can, under certain conditions, construct a fixed-point estimate x* such that, for all i, x*_i ∈ arg min b_i. If the objective function had a unique global minimum and the fixed-point beliefs were the true min-marginals, then x* would indeed be the global minimum. Now, suppose that the b_i are not the true min-marginals. What can we say about the optimality of a vector x* such that, for all i, x*_i ∈ arg min b_i? What can we say if there is a unique vector x* with this property? We explore these questions by examining the lower bounds and reparameterizations discussed in Section IV. Our primary tool for answering these questions will be min-consistency and the following lemma.

Lemma V.1. Let b be a vector of min-consistent beliefs for a function f that factorizes over G = (V, A). If b is locally decodable to x*, then
• For all α ∈ A and x_α, b_α(x_α) ≥ b_α(x*_α).
• For all α ∈ A, i ∈ α, and x_α, b_α(x_α) − b_i(x_i) ≥ b_α(x*_α) − b_i(x*_i).

Proof: Because the beliefs are min-consistent, for all α and any i ∈ α, we have min_{x_{α\i}} b_α(x_α) = κ + b_i(x_i) for all x_i and some constant κ that may depend on i and α but does not depend on x_i. From this, we can conclude that for each α ∈ A and i ∈ α there is some y_α that minimizes b_α with y_i = x*_i. Further, by the definition of locally decodable, the minimum is unique for each b_i. As a result, x*_α must minimize b_α.

Now fix a vector x and consider

b_α(x*_α) − b_i(x*_i) = [ min_{x_{α\i}} b_α(x*_i, x_{α\i}) ] − b_i(x*_i)   (41)
                     = [ min_{x_{α\i}} b_α(x_i, x_{α\i}) ] − b_i(x_i)   (42)
                     ≤ b_α(x_α) − b_i(x_i)   (43)

where (42) follows from the definition of min-consistency (this quantity is a constant independent of x_i).

This lemma will be a crucial building block of many of the theorems in this work, and many variants of it have been proven in the literature (e.g., Lemma 4 in [35] and Theorem 1 in [16]). The lemma continues to hold, even if the beliefs are not locally decodable, when there exists an x* that simultaneously minimizes the beliefs in the following sense.

Definition V.1. x* simultaneously minimizes a vector of beliefs, b, if x*_i ∈ arg min_{x_i} b_i(x_i) for all i and x*_α ∈ arg min_{x_α} b_α(x_α) for all α.

If the beliefs are not locally decodable, we may not be able to efficiently construct an x* that simultaneously minimizes the beliefs (if one even exists). Many of the results in subsequent sections will apply to any vector that simultaneously minimizes the beliefs, but we will focus on locally decodable beliefs for concreteness.

Using Lemma V.1 and admissibility, we can convert questions about the optimality of the vector x into questions about the choice of reparameterization.
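Both conclusions of Lemma V.1 can be checked numerically on a small example. Below, the beliefs are taken to be the true min-marginals of an illustrative two-variable objective (min-marginals are min-consistent, and here the singleton argmins are unique, so the beliefs are locally decodable); the model is an assumption made for this sketch.

```python
import numpy as np

# Numeric check of Lemma V.1 on an illustrative two-variable model whose
# beliefs are the true min-marginals (hence min-consistent).
phi = [np.array([0.0, 1.0]), np.array([0.5, 0.0])]
psi = np.array([[0.0, 2.0], [2.0, 0.0]])
f = phi[0][:, None] + phi[1][None, :] + psi      # f(x_0, x_1)

b_a = f                                  # pair min-marginal of a 2-variable model
b_i = [f.min(axis=1), f.min(axis=0)]     # singleton min-marginals
x_star = tuple(int(np.argmin(b)) for b in b_i)   # locally decoded (unique argmins)

# First bullet: b_alpha(x_alpha) >= b_alpha(x*_alpha) for all x_alpha.
assert (b_a >= b_a[x_star]).all()
# Second bullet for i = 0:
# b_alpha(x_alpha) - b_0(x_0) >= b_alpha(x*_alpha) - b_0(x*_0).
gap = b_a - b_i[0][:, None]
assert (gap >= b_a[x_star] - b_i[0][x_star[0]]).all()
print(x_star)   # -> (0, 0)
```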
Specifically, we focus on the splitting reparameterization

f(x) = Σ_i c_i b_i(x_i) + Σ_α c_α [ b_α(x_α) − Σ_{k∈α} c_k b_k(x_k) ].   (44)

In this section, we will show how to choose the parameter vector c in order to guarantee the local or global optimality of any estimate that simultaneously minimizes a vector of c-admissible and min-consistent beliefs.

A. Local Optimality

A function f is said to have a local minimum at the point x ∈ Π_i X_i if there is some neighborhood of the point x in which f never falls below f(x). The definition of neighborhood is metric dependent, and in the interest of keeping our results applicable to a wide variety of spaces, we choose the metric to be the Hamming distance. For any two vectors x, y ∈ Π_i X_i, the Hamming distance is the number of entries in which the two vectors differ. For the purposes of this paper, we will restrict our definition of local optimality to vectors within Hamming distance one.

Definition V.2. x ∈ Π_i X_i is a local minimum of the objective function, f, if for every vector y that has at most one entry different from x, f(x) ≤ f(y).

We will show that there exist choices of the parameters for which any estimate, extracted from a vector of c-admissible and min-consistent beliefs, that simultaneously minimizes all of the beliefs is guaranteed to be locally optimal with respect to the Hamming distance. In order to prove such a result, we first need to relate the minima of c-admissible and min-consistent beliefs to the minima of the objective function. Let b be a vector of c-admissible beliefs for the function f. Define −j = {1, . . . , n} \ {j}.
For a fixed x_{−j}, we can lower bound the optimum value of the objective function as

min_{x_j} f(x_j, x_{−j}) = min_{x_j} [ κ + Σ_i c_i b_i(x_i) + Σ_α c_α [ b_α(x_α) − Σ_{k∈α} c_k b_k(x_k) ] ]   (45)
                        ≥ g_j(x_{−j}) + [ min_{x_j} (1 − Σ_{α∈∂j} c_α) c_j b_j(x_j) ] + Σ_{α∈∂j} [ min_{x_j} c_α b_α(x_α) ]   (46)

where g_j(x_{−j}) is the part of the reparameterization that does not depend on x_j. The inequality is tight whenever there is a value of x_j that simultaneously minimizes each component of the sum. If the coefficients of the b_j's and the coefficients of the b_α's in (46) were non-negative, then we could rewrite this bound as

min_{x_j} f(x) ≥ g_j(x_{−j}) + [ (1 − Σ_{α∈∂j} c_α) c_j min_{x_j} b_j(x_j) ] + Σ_{α∈∂j} [ c_α min_{x_j} b_α(x_α) ],   (47)

which depends on the minima of each of the beliefs. Recall from Lemma V.1 that if b is locally decodable to x*, then for all i and α, x* must simultaneously minimize b_i, b_α, and, for j ∈ α, b_α − b_j. So, in general, we want to know if we can write

f(x) = g_j(x_{−j}) + d_{jj} b_j(x_j) + Σ_{α∈∂j} d_{αα} b_α(x_α) + Σ_{α∈∂j} d_{jα} [ b_α(x_α) − b_j(x_j) ]   (48)

for each j and some vector of non-negative constants d. This motivates the following definition.

Definition V.3. A function, h, can be written as a conical combination of the beliefs, b, if there exists a vector of non-negative reals, d, such that

h(x) = κ + [ Σ_{i,α : i∈α} d_{iα} (b_α(x_α) − b_i(x_i)) ] + [ Σ_α d_{αα} b_α(x_α) ] + [ Σ_i d_{ii} b_i(x_i) ].   (49)

The set of all conical combinations of a collection of vectors in R^n forms a cone in R^n in the same way that a convex combination of vectors in R^n forms a convex set in R^n. The above definition is very similar to the definition of "provably convex" in [16].
There, an entropy approximation is provably convex if it can be written as a conical combination of the entropy functions corresponding to each of the factors. In contrast, our approach follows from a reparameterization of the objective function. Putting all of the above ideas together, we have the following theorem.

Theorem V.2. Let b be a vector of c-admissible and min-consistent beliefs for the function f such that, for all i, c_i b_i(x_i) + Σ_{α∈∂i} c_α [ b_α(x_α) − c_i b_i(x_i) ] can be written as a conical combination of the beliefs. If the beliefs are locally decodable to x*, then x* is a local minimum (with respect to the Hamming distance) of the objective function.

Proof: Choose a j ∈ {1, . . . , n}. By assumption, the portion of the objective function that depends on the variable x_j can be written as a conical combination of the beliefs. By admissibility, up to a constant,

f(x*) = Σ_i c_i b_i(x*_i) + Σ_α c_α [ b_α(x*_α) − Σ_{k∈α} c_k b_k(x*_k) ]   (50)
      = g_j(x*_{−j}) + c_j b_j(x*_j) + Σ_{α∈∂j} c_α [ b_α(x*_α) − c_j b_j(x*_j) ]   (51)
      = g_j(x*_{−j}) + d_{jj} b_j(x*_j) + Σ_{α∈∂j} d_{αα} b_α(x*_α) + Σ_{α∈∂j} d_{jα} [ b_α(x*_α) − b_j(x*_j) ]   (52)
      ≤ f(x_j, x*_{−j})   (53)

for any x_j ∈ X_j. The inequality follows from Lemma V.1, as x* simultaneously minimizes each piece of the sum, and the admissibility of f. We can repeat this proof for each j ∈ {1, . . . , n}.

Theorem V.2 tells us that, under suitable choices of the parameters, if the beliefs corresponding to a given fixed point of the message-passing equations are locally decodable to x*, then no vector x within Hamming distance one of x* can decrease the objective function. We can check that choosing c_i = 1 for all i and c_α > 0 for all α always satisfies the conditions of Theorem V.2.
Consequently, if the fixed-point beliefs corresponding to the min-sum algorithm are locally decodable to x*, then x* corresponds to a local optimum of the objective function.

Corollary V.3. Let b be a vector of admissible and min-consistent beliefs produced by the min-sum algorithm. If the beliefs are locally decodable to x*, then x* is a local minimum (with respect to the Hamming distance) of the objective function.

Consider a differentiable function f : R^n → R (i.e., X_i = R for all i). Suppose that c_i ≠ 0 for all i and c_α ≠ 0 for all α. If a vector of c-admissible and min-consistent beliefs is locally decodable to x*, then we can always infer that the gradient of f at the point x* must be zero (this is a direct consequence of min-consistency). Consequently, x* is either a local minimum, a local maximum, or a saddle point of f. If, in addition, c satisfies the conditions of Theorem V.2, then, by the second derivative test and the observation that the function can only increase in value along the coordinate axes, x* is either a local minimum or a saddle point of f. Similarly, for a convex differentiable function f : R^n → R, if c_i ≠ 0 for all i and c_α ≠ 0 for all α, then x* minimizes f.

Corollary V.4. Let f : R^n → R be a convex differentiable function. Under the hypothesis of Theorem V.2, if the beliefs are locally decodable to x*, then x* is a global minimum of the objective function.

B. Global Optimality

We now extend the approach of the previous section to show that there are choices of the vector c that guarantee the global optimality of any unique estimate produced from c-admissible and min-consistent beliefs. As before, suppose b is a vector of c-admissible beliefs for the function f.
If f can be written as a conical combination of the beliefs for some vector d, then we can lower bound the optimal value of the objective function as

min_x f(x) ≥ Σ_{i,α : i∈α} d_{iα} min_{x_α} ( b_α(x_α) − b_i(x_i) ) + Σ_α d_{αα} min_{x_α} b_α(x_α) + Σ_i d_{ii} min_{x_i} b_i(x_i).   (54)

This analysis provides us with our first global optimality result. We note that the following theorem also appears as Theorem 1 in [16], and Theorem 2 in [37] provides a similar proof for the TRMP algorithm.

Theorem V.5. Let b be a vector of c-admissible and min-consistent beliefs for the function f such that f can be written as a conical combination of the beliefs. If the beliefs are locally decodable to x*, then x* minimizes the objective function.

The proof of Theorem V.5 follows from Lemma V.1 in nearly the same way as Theorem V.2. The difference between Theorem V.2 and Theorem V.5 is that the former only requires that the part of the reparameterization depending on a single variable can be written as a conical combination of the beliefs, whereas the latter requires the entire reparameterization to be a conical combination of the beliefs. We can easily check that the conditions of Theorem V.5 imply the conditions of Theorem V.2, and as a result, Theorem V.5 places greater restrictions on the vector c.

As a corollary, Theorem V.5 also provides us with a simple proof of the well-known result that the standard min-sum algorithm is correct on a tree.

Corollary V.6. Suppose the factor graph is a tree. If the admissible and min-consistent beliefs produced by the standard min-sum algorithm are locally decodable to x*, then x* is the global minimum of the objective function.

Proof: Let b be the vector of min-consistent and admissible beliefs obtained from running the standard min-sum algorithm.
Choose a variable node r ∈ G and consider the factor graph as a tree rooted at the variable node r. Let p(α) denote the parent of factor node α ∈ G. Because G is a tree, we can now write, by admissibility,

f(x) = κ + Σ_i b_i(x_i) + Σ_{α∈A} [ b_α(x_α) − Σ_{k∈α} b_k(x_k) ]   (55)
     = κ + b_r(x_r) + Σ_{α∈A} [ b_α(x_α) − b_{p(α)}(x_{p(α)}) ].   (56)

Therefore, f can be written as a conical combination of the beliefs, and we can apply Theorem V.5 to yield the desired result.

We note that there are choices of the parameters for which we are guaranteed local optimality but not global optimality. The standard min-sum algorithm always guarantees local optimality, and there are applications for which the algorithm is known to produce local optima that are not globally optimal [38].

Given Theorem V.5, starting with the vector d seems slightly more natural than starting with the vector c. Given any non-negative real vector d, we now show that we can find a vector c such that f has a conical decomposition in terms of d, provided d satisfies a mild condition. Given d, we will choose the vector c as

c_α = d_{αα} + Σ_{i∈α} d_{iα}   (57)
c_i = ( d_{ii} − Σ_{α∈∂i} d_{iα} ) / ( 1 − Σ_{α∈∂i} c_α ).   (58)

These equations are valid whenever 1 − Σ_{α∈∂i} c_α ≠ 0. Note that any valid reparameterization must have c_i ≠ 0 and c_α ≠ 0 for all α and i. Hence, d_{αα} + Σ_{i∈α} d_{iα} ≠ 0 and d_{ii} ≠ Σ_{α∈∂i} d_{iα}. In the case that 1 − Σ_{α∈∂i} c_α = 0, c_i can be chosen to be any non-zero real. Again, any valid reparameterization must have c_i ≠ 0 and c_α ≠ 0 for all α and i. Hence, d_{αα} + Σ_{i∈α} d_{iα} ≠ 0, but, unlike the previous case, we must have d_{ii} − Σ_{α∈∂i} d_{iα} = 0.
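The conversion in (57)-(58) is mechanical. The sketch below applies it to an assumed factor graph (three variables, two pairwise factors) with an arbitrarily chosen non-negative vector d; both the graph and the values of d are illustrative assumptions.

```python
# Sketch of the parameter conversion (57)-(58).  The factor graph and the
# non-negative vector d are illustrative assumptions.
factors = {"a": [0, 1], "b": [1, 2]}                 # alpha -> variables in alpha
boundary = {0: ["a"], 1: ["a", "b"], 2: ["b"]}       # i -> ∂i

d_ia = {(0, "a"): 0.25, (1, "a"): 0.25, (1, "b"): 0.25, (2, "b"): 0.25}
d_aa = {"a": 0.25, "b": 0.25}
d_ii = {0: 1.0, 1: 1.0, 2: 1.0}

# (57): c_alpha = d_{alpha,alpha} + sum_{i in alpha} d_{i,alpha}
c_a = {a: d_aa[a] + sum(d_ia[(i, a)] for i in vs) for a, vs in factors.items()}

# (58): c_i = (d_ii - sum_{alpha in ∂i} d_{i,alpha})
#             / (1 - sum_{alpha in ∂i} c_alpha),
# valid when the denominator is non-zero; otherwise any non-zero c_i works.
c_i = {}
for i, nbrs in boundary.items():
    denom = 1.0 - sum(c_a[a] for a in nbrs)
    num = d_ii[i] - sum(d_ia[(i, a)] for a in nbrs)
    c_i[i] = num / denom if denom != 0 else 1.0
print(c_a, c_i)
```

Note that c_i may come out negative (here c_1 = −1), which the construction permits as long as every component of c is non-zero.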
We now address the following question: given a factorization of the objective function f, does there exist a choice of the vector c which guarantees that any estimate obtained from locally decodable beliefs minimizes the objective function? The answer to this question is yes, and we will provide a simple condition on the vector c that will ensure this.

Corollary V.7. Let b be a vector of c-admissible and min-consistent beliefs for the function f such that
1) For all i, (1 − Σ_{α∈∂i} c_α) c_i ≥ 0
2) For all α, c_α > 0.
If the beliefs are locally decodable to x*, then x* minimizes the objective function.

Proof: By admissibility, we can write f as

f(x) = Σ_i [ (1 − Σ_{α∈∂i} c_α) c_i b_i(x_i) ] + Σ_α [ c_α b_α(x_α) ].   (59)

Observe that if (1 − Σ_{α∈∂i} c_α) c_i ≥ 0 for all i and c_α ≥ 0 for all α, then the above rewriting provides the desired conical decomposition of f.

This result is quite general; for any choice of c such that c_α > 0 for all α ∈ A, there exists a choice of c_i, possibly negative, for each i ∈ V such that the conditions of the above corollary are satisfied. All of the previous theorems are equally valid for any vector that simultaneously minimizes all of the beliefs.

Given the above results, we would like to understand when the conditions of the theorems can be achieved. Specifically, when can we guarantee that the algorithm converges to c-admissible and min-consistent beliefs that are locally decodable? The remainder of this work attempts to provide an answer to this question.

VI. CONVERGENT ALGORITHMS

Given that the lower bounds discussed in Sections IV and V are concave functions of the messages, we can employ traditional methods from convex optimization in an attempt to maximize them. One such method is cyclic coordinate ascent.
This scheme fixes an initial guess for the solution to the optimization problem and then repeatedly constructs a better solution by optimizing over a single variable at a time. However, this scheme does not always converge to an optimal solution. For example, if the function is concave but not strictly concave, the algorithm may become stuck at local optima (again, a local optimum with respect to the Hamming distance; see Section V-A). Despite this and other drawbacks, we will attempt to maximize our lower bounds on the objective function via block coordinate ascent, a variant of coordinate ascent where the update is performed over larger subsets of the variables at a time.

Our proof of convergence will demonstrate that the proposed algorithm cannot decrease the lower bound (i.e., the value of the lower bound converges) and that, once the lower bound cannot be increased by further iterations, the beliefs behave as if they are min-consistent. This definition of convergence does not guarantee that the beliefs or the messages converge, only that the lower bound converges. First, we will discuss a particular coordinate-ascent scheme, and then we will discuss conditions under which the lower bound is guaranteed to be maximized by this and related schemes.

A. A Simple Convergent Algorithm

Consider the message-passing schedule in Algorithm 2. This asynchronous message-passing schedule fixes an ordering on the variables and, for each j in order, updates all of the messages from each β ∈ ∂j to j as if j were the root of the subtree containing only β and its neighbors. We will show that, for certain choices of the parameter vector c, this message-passing schedule cannot decrease a specific lower bound on the objective function at each iteration.
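A minimal sketch of this sequential schedule on the toy two-variable model, with the restricted parameters c_i = 1 and c_α = 1/2 (the model and parameter values are illustrative assumptions): after each single-variable pass, the lower bound of the form (60) is recomputed and, as Theorem VI.2 predicts, never decreases; here it climbs to min_x f(x) = 1/2.

```python
import numpy as np

# Illustrative sketch of the sequential schedule (Algorithm 2) on a toy
# two-variable model, tracking the lower bound (60).  The model and the
# parameters (c_i = 1, c_alpha = 0.5) are assumptions for this example.
phi = [np.array([0.0, 1.0]), np.array([0.5, 0.0])]
psi = np.array([[0.0, 2.0], [2.0, 0.0]])
c = 0.5                                  # c_alpha for the single factor

m_va = [np.zeros(2), np.zeros(2)]        # m_{i->alpha}
m_av = [np.zeros(2), np.zeros(2)]        # m_{alpha->i}

def lower_bound():
    b_i = [phi[i] + c * m_av[i] for i in range(2)]                  # (35), c_i = 1
    b_a = psi / c + (b_i[0] - m_av[0])[:, None] \
                  + (b_i[1] - m_av[1])[None, :]                     # (36)
    return sum((1 - c) * b.min() for b in b_i) + c * b_a.min()      # (60)

lbs = [lower_bound()]
for _ in range(20):
    for j in (0, 1):                     # one pass per variable, in order
        i = 1 - j
        m_va[i] = phi[i] + (c - 1.0) * m_av[i]       # update m_{i->alpha}
        tab = (psi if j == 0 else psi.T) / c + m_va[i][None, :]
        m_av[j] = tab.min(axis=1)                    # update m_{alpha->j}
        lbs.append(lower_bound())

assert all(b2 >= b1 - 1e-9 for b1, b2 in zip(lbs, lbs[1:]))   # monotone ascent
print(round(lbs[-1], 6))   # -> 0.5 = min_x f(x) on this tree
```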
To demonstrate convergence of the algorithm, we restrict the parameter vector c so that c_i = 1 for all i, c_α > 0 for all α, and Σ_{α∈∂i} c_α ≤ 1 for all i. For a fixed vector of messages, m, consider the lower bound on the objective function

min_x f(x) ≥ Σ_i [ c_i (1 − Σ_{α∈∂i} c_α) min_{x_i} b_i(x_i) ] + Σ_α [ c_α min_{x_α} b_α(x_α) ],   (60)

where b is the vector of beliefs derived from the vector of messages m. Define LB(m) to be the lower bound in (60) as a function of the vector of messages, m. We will show that, with this restricted choice of the parameter vector, Algorithm 2 can be viewed as a block coordinate ascent scheme on the lower bound LB(m). In order to do so, we need the following lemma.

Lemma VI.1. Suppose c_i = 1 for all i, and we perform the update for the edge (j, α) as in Algorithm 2. If the vector of messages is real-valued¹ after the update, then b_α is min-consistent with respect to b_j.

Proof: See Appendix A-B.

Observe that, after updating all of the factor-to-variable messages to a fixed variable node j as in Algorithm 2, b_α is min-consistent with respect to b_j for every α ∈ A containing j. The most important conclusion we can draw from this is that there is an x*_j that simultaneously minimizes b_j, min_{x_{α\j}} b_α, and min_{x_{α\j}} b_α − b_j.

Theorem VI.2. Suppose c_i = 1 for all i, c_α > 0 for all α, and Σ_{α∈∂i} c_α ≤ 1 for all i. If the vector of messages is real-valued after each iteration of Algorithm 2, then for all t > 0, LB(m^t) ≥ LB(m^{t−1}).

The proof of Theorem VI.2 can be found in Appendix A-C. We say that the lower bound has converged if for all t > t′, LB(m^{t′}) = LB(m^t). Again, this says nothing about the convergence of the messages or the beliefs.
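The hypotheses of Theorem VI.2 can be verified mechanically for any given factor graph. The sketch below checks the uniform choice c_α = 1/max_i |∂i| (with c_i = 1), an example choice discussed next, on an assumed neighborhood structure.

```python
# Checking the hypotheses of Theorem VI.2 (c_i = 1, c_alpha > 0, and
# sum_{alpha in ∂i} c_alpha <= 1) for the uniform choice
# c_alpha = 1 / max_i |∂i|.  The neighborhood structure is illustrative.
boundary = {0: ["a"], 1: ["a", "b", "c"], 2: ["b"], 3: ["c"]}   # i -> ∂i

max_degree = max(len(nbrs) for nbrs in boundary.values())
c_alpha = 1.0 / max_degree          # shared by every factor; here 1/3

assert c_alpha > 0
assert all(len(nbrs) * c_alpha <= 1.0 for nbrs in boundary.values())
print(round(c_alpha, 6))   # -> 0.333333
```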
However, as part of the proof of Theorem VI.2, we show that for all t > t′, if b^{t′} is locally decodable to x*, then x* minimizes the objective function.

Although the restriction on the parameter vector in Theorem VI.2 seems strong, we observe that for any objective function f, we can choose the parameters such that the theorem is sufficient to guarantee convergence and global optimality. As an example, if we set c_α = 1/max_i |∂i| for all α and c_i = 1 for all i, then c satisfies the conditions of Theorem VI.2.

We present Algorithm 2 both as a convergent local message-passing algorithm and as an example of how the intuition developed from the optimality conditions can be extended to show convergence results: we can achieve global consistency by carefully ensuring a weak form of local consistency. Recall that, like other coordinate-ascent schemes, this algorithm can become stuck (i.e., reach a fixed point that does not maximize the lower bound). For an example of an objective function and choice of parameters such that the algorithm may become stuck, see [10].

Algorithm 2 Sequential Splitting Algorithm
1: Initialize the messages uniformly to zero.
2: Choose some ordering of the variables, and perform the following update for each variable j
3: for each edge (j, α) do
4:   For all i ∈ α \ j, update the message from i to α:
     m_{i→α}(x_i) := κ + φ_i(x_i)/c_i + (c_α − 1) m_{α→i}(x_i) + Σ_{β∈∂i\α} c_β m_{β→i}(x_i).
5:   Update the message from α to j:
     m_{α→j}(x_j) := κ + min_{x_{α\j}} [ ψ_α(x_α)/c_α + (c_j − 1) m_{j→α}(x_j) + Σ_{k∈α\j} c_k m_{k→α}(x_k) ].
6: end for

Algorithm 3 Damped Synchronous Splitting Algorithm
1: Fix a real-valued vector of initial messages, m^0.
2: Choose δ ∈ [0, 1].
3: for t = 1, 2, . . . do
4:   For each α and i ∈ α, update the message from i to α:
     m^t_{i→α}(x_i) := κ + φ_i(x_i)/c_i + (c_α − 1) m^{t−1}_{α→i}(x_i) + Σ_{β∈∂i\α} c_β m^{t−1}_{β→i}(x_i).
5:   For each α and i ∈ α, update the message from α to i:
     m^t_{α→i}(x_i) := κ + (1 − δ) m^{t−1}_{α→i}(x_i) + δ min_{x_{α\i}} [ ψ_α(x_α)/c_α + (c_i − 1) m^t_{i→α}(x_i) + Σ_{k∈α\i} c_k m^t_{k→α}(x_k) ].
6: end for

B. Synchronous Convergence

By using the message-passing schedule in Algorithm 2, we seem to lose the distributed nature of the parallel message updates. For some message-passing schedules, we can actually parallelize the updating process by performing concurrent updates as long as the simultaneous updates do not form a cycle (e.g., we could randomly select a subset of the message updates that do not interfere). We also note that performing updates over larger subgraphs may be advantageous, and other algorithms, such as those discussed in [37], [10], [16], and [29], perform updates over larger subtrees of the factor graph. Each of these coordinate-ascent schemes can be converted into distributed algorithms by performing multiple coordinate-ascent updates in parallel and then averaging the resulting message vectors. Unfortunately, this process may require some amount of central control at each step and typically results in slower rates of convergence when compared with the original asynchronous message-passing scheme [29]. For example, consider a message vector m.

¹This is a technical assumption. If there exists an edge (i, j) such that m_{ij}(x_j) ∈ {∞, −∞} for some x_j, then admissibility no longer makes sense. This cannot occur for bounded functions on finite domains, but it can occur, for example, when using this algorithm to minimize a multivariate quadratic function [24].
Let m^i be the vector of messages produced by performing the update for the variable node i on the message vector m as in Algorithm 2. For an appropriate choice of the vector c, Theorem VI.2 guarantees that LB(m^i) ≥ LB(m) for all i. Since the lower bound is concave, we must also have LB( (Σ_i m^i)/n ) ≥ LB(m), where n is the total number of variable nodes. Let m′ = (Σ_i m^i)/n. For all α, i ∈ α, and x_i,

m′_{α→i}(x_i) = ((n − 1)/n) m_{α→i}(x_i) + (1/n) m^i_{α→i}(x_i).   (61)

This observation can be used to construct the synchronous algorithm described in Algorithm 3. The messages passed by this scheme are a convex combination of the messages from the previous time step and the splitting updates, modulated by a "damping" coefficient δ. Similar damped message updates are often employed in order to help the min-sum algorithm converge. Algorithm 3 is guaranteed to converge when the parameter vector satisfies the conditions of Theorem VI.2 and δ = 1/n. Other choices of δ can also result in convergence.

C. Relationship to Other Algorithms

Recent work has produced other message-passing algorithms that are provably convergent under specific updating schedules: MPLP [7], serial tree-reweighted max-product (TRW-S) [10], max-sum diffusion (MSD) [40], and the norm-product algorithm [8]. Like Algorithm 2, these asynchronous message-passing algorithms are convergent in the sense that they can each be viewed as coordinate-ascent schemes over concave lower bounds. All of these algorithms, with the exception of the norm-product algorithm, were shown to be members of a particular family of bound minimizing algorithms [16]. We note that, even when the parameter vector satisfies the conditions of Theorem VI.2, Algorithm 2 is still not strictly a member of the family of bound minimizing algorithms.
The disparity occurs because the definition of a bound-minimizing algorithm as presented therein would require b_α to be min-consistent with respect to x_i for all i ∈ α after the update is performed over the edge (j, α). Instead, Algorithm 2 only guarantees that b_α is min-consistent with respect to x_j after the update.

In this section, we show that all of these message-passing algorithms can be seen as coordinate-ascent schemes over concave lower bounds. More specifically, their derivations, with, perhaps, the exception of the norm-product algorithm, can be seen to follow the same formula developed earlier in this work:
1) Choose a reparameterization.
2) Construct a lower bound.
3) Perform coordinate ascent in an attempt to maximize the bound.
In some cases, the message-passing algorithms themselves can be seen as special cases of the splitting algorithm (although the original derivation of these algorithms was typically quite different from that of the splitting algorithm), while in other cases, a slight tweak to the definition of min-consistency allows us to apply the results of the previous sections.

1) TRW-S and TRMP: The tree-reweighted belief propagation algorithm (TRBP) was first proposed in [36], and the application of similar ideas to the MAP inference problem is known as the tree-reweighted max-product algorithm (TRMP) [37]. At the heart of the min-sum analog of the TRMP algorithm is the observation that the objective function can be bounded from below by a convex combination of functions that depend only on factor-induced subtrees of the factor graph. As we will see below, the message updates of the TRMP algorithm, as defined in [37], are a special case of the splitting algorithm. Although the TRMP algorithm can be derived for general factor graphs, for simplicity, we consider the algorithm on a pairwise factor graph.
When the factorization is pairwise, each factor node is connected to at most two variable nodes. As a result, the hypergraph G = (V, A) can be viewed as an ordinary graph: each edge of G corresponds to a factor node in the factor graph, and we will write G = (V, E) in this case. Let $\mathcal{T}$ be the set of all spanning trees of G, and let µ be a probability distribution over $\mathcal{T}$ such that every edge has a nonzero probability of occurring in at least one spanning tree. Set c_i = 1 for all i and $c_{ij} = \Pr_\mu[(i,j) \in T]$, corresponding to the edge appearance probabilities. Let b be a vector of c-admissible and min-consistent beliefs for f. We can write the objective function f as
$$f(x) = \sum_{i \in V} b_i(x_i) + \sum_{(i,j) \in E} c_{ij} \Big[ b_{ij}(x_i, x_j) - b_i(x_i) - b_j(x_j) \Big] \tag{62}$$
$$= \sum_{T \in \mathcal{T}} \mu(T) \Bigg[ \sum_{i \in V_T} b_i(x_i) + \sum_{(i,j) \in E_T} \Big[ b_{ij}(x_i, x_j) - b_i(x_i) - b_j(x_j) \Big] \Bigg] \tag{63}$$
where T = (V_T, E_T) is a spanning tree of G. For each $T \in \mathcal{T}$, designate a variable node r_T ∈ T as the root of T. For all $T \in \mathcal{T}$ and i ∈ T, let p_T(i) denote the parent of node i in T. We can now write
$$f(x) = \sum_{T \in \mathcal{T}} \mu(T) \Bigg[ b_{r_T}(x_{r_T}) + \sum_{i \in V_T,\, i \neq r_T} \Big[ b_{i\,p_T(i)}(x_i, x_{p_T(i)}) - b_{p_T(i)}(x_{p_T(i)}) \Big] \Bigg]. \tag{64}$$

Because µ(T) ≥ 0 for all $T \in \mathcal{T}$, we can conclude that f can be written as a conical combination of the beliefs. The TRMP update, defined in [37], is then exactly Algorithm 1 with the vector c chosen as above. All of the results from the previous sections can then be applied to this special case. For example, by Theorem V.5, convergence of the TRMP algorithm to locally decodable beliefs implies correctness.

The TRMP algorithm was motivated, in part, by the observation that the min-sum algorithm is correct on trees. However, a similar derivation can be made if µ is a probability distribution over all spanning subgraphs of G containing at most one cycle.
In this case, we would obtain a reparameterization of the objective function as a convex combination of functions over subgraphs containing at most a single cycle.

Although the TRMP algorithm guarantees correctness upon convergence to locally decodable beliefs, the algorithm need not converge, even if we use Algorithm 2. Specifically, the vector c does not necessarily satisfy the conditions of Theorem VI.2. The solution, proposed in [10], is to perform the message updates over subtrees in a specific order that is guaranteed to improve the lower bound. The resulting algorithm, known as the TRW-S algorithm, is then a convergent version of the TRMP algorithm.

2) MPLP: The MPLP algorithm was originally derived by constructing a special dual of the MAP LP from which a concave lower bound can be extracted. The MPLP algorithm is then a coordinate-ascent scheme for this concave lower bound. The MPLP algorithm was initially derived in terms of pairwise factor graphs and was extended, with some work, to arbitrary factor graphs in [7]. Unlike the TRMP algorithm, the MPLP algorithm does not have the tunable parameters required by the TRMP algorithm.

Again, consider a pairwise factor graph, G = (V, E), with corresponding objective function f. Let c_i = 1/2 for all i and c_ij = 1 for all (i, j) ∈ E. This choice of c produces the following reparameterization:
$$f(x) = \sum_{i \in V} \frac{b_i(x_i)}{2} + \frac{1}{2} \sum_{(i,j) \in E} \Big[ b_{ij}(x_i, x_j) - b_i(x_i) \Big] + \frac{1}{2} \sum_{(i,j) \in E} \Big[ b_{ij}(x_i, x_j) - b_j(x_j) \Big]. \tag{65}$$

From (65), we can see that this choice of c produces a conical decomposition. Several variants of the MPLP algorithm were presented in the original paper. One such variant, the EMPLP algorithm, can be seen as a coordinate-ascent scheme on the following lower bound.
$$\min_x f(x) \ge \sum_{i \in V} \min_{x_i} \frac{b_i(x_i)}{2} + \frac{1}{2} \sum_{(i,j) \in E} \min_{x_i, x_j} \Big[ b_{ij}(x_i, x_j) - b_i(x_i) \Big] + \frac{1}{2} \sum_{(i,j) \in E} \min_{x_i, x_j} \Big[ b_{ij}(x_i, x_j) - b_j(x_j) \Big] \tag{66}$$

We can rewrite the message update in terms of messages passed directly between the nodes of G. Consider unwrapping the message updates as
$$m^t_{(i,j) \to j}(x_j) = \min_{x_i} \Big[ \psi_{ij}(x_i, x_j) - \frac{1}{2} m^{t-1}_{j \to (i,j)}(x_j) + \frac{1}{2} m^{t-1}_{i \to (i,j)}(x_i) \Big] \tag{67}$$
$$= \min_{x_i} \Bigg[ \psi_{ij}(x_i, x_j) - \frac{1}{2} \Big[ \phi_j(x_j) + \sum_{k \in \partial j \setminus i} m^{t-1}_{(j,k) \to j}(x_j) \Big] + \frac{1}{2} \Big[ \phi_i(x_i) + \sum_{k \in \partial i \setminus j} m^{t-1}_{(i,k) \to i}(x_i) \Big] \Bigg]. \tag{68}$$

Now, define
$$\widehat{m}^t_{i \to j}(x_j) \triangleq \frac{1}{2} m^t_{(i,j) \to j}(x_j) \tag{69}$$
$$= \min_{x_i} \Bigg[ \frac{1}{2} \psi_{ij}(x_i, x_j) - \frac{1}{2} \Big[ \frac{1}{2} \phi_j(x_j) + \sum_{k \in \partial j \setminus i} \widehat{m}^{t-1}_{k \to j}(x_j) \Big] + \frac{1}{2} \Big[ \frac{1}{2} \phi_i(x_i) + \sum_{k \in \partial i \setminus j} \widehat{m}^{t-1}_{k \to i}(x_i) \Big] \Bigg] \tag{70}$$
for each (i, j) ∈ E. This is precisely the message update for the EMPLP algorithm in [7] (there, the self-potentials are identically equal to zero). As was the case for the TRMP algorithm, we can extend all of the previous results to the general, non-pairwise, case.

3) Max-Sum Diffusion: The max-sum diffusion algorithm and a variant known as the augmenting DAG algorithm were designed to solve the max-sum problem (i.e., the negated version of the min-sum problem). Although discovered in the 1970s by Ukrainian scientists, most of the original work on these algorithms remained either in Russian or unpublished until a recent survey article [40]. The augmenting DAG algorithm was suggested in [26] and later expanded in [13]. The max-sum diffusion algorithm was discovered independently by two authors [14], [5], but neither result was ever published. Here, we derive the min-sum analog of the max-sum diffusion algorithm using the machinery that we have developed for the splitting algorithm.
Although the algorithm is a coordinate-ascent scheme over a familiar lower bound, the message updates are not an instance of the splitting algorithm because the fixed points are not min-consistent in the sense of Definition II.4. The max-sum diffusion algorithm was originally described only for pairwise factorizations. However, we will see that the algorithm can be derived for general factor graphs.

Consider the reparameterization of the objective function corresponding to the standard min-sum algorithm (i.e., c_i = 1 for all i and c_α = 1 for all α):
$$f(x) = \sum_i b_i(x_i) + \sum_\alpha \Big[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} b_k(x_k) \Big] \tag{71}$$
$$= \sum_i b_i(x_i) + \sum_\alpha \Big[ \psi_\alpha(x_\alpha) - \sum_{k \in \alpha} m_{\alpha \to k}(x_k) \Big]. \tag{72}$$
The following lower bound follows from this reparameterization:
$$\min_x f(x) \ge \sum_i \min_{x_i} b_i(x_i) + \sum_\alpha \min_{x_\alpha} \Big[ \psi_\alpha(x_\alpha) - \sum_{k \in \alpha} m_{\alpha \to k}(x_k) \Big]. \tag{73}$$

The max-sum diffusion algorithm is a coordinate-ascent message-passing scheme that improves the above lower bound. Unlike the reparameterizations that produced the TRMP and MPLP algorithms, whether or not this reparameterization can be written as a conical combination of the beliefs depends on the underlying factor graph. As such, even if we choose an algorithm that converges to a min-consistent vector of beliefs, we will not be guaranteed correctness. Instead, the max-sum diffusion algorithm ensures a different form of consistency. Namely, the algorithm guarantees that the fixed points of the message-passing scheme satisfy the following for each α and each i ∈ α:
$$\min_{x_{\alpha \setminus i}} \Big[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} b_k(x_k) \Big] = b_i(x_i). \tag{74}$$

Again, there are many message updates that will guarantee this form of consistency upon convergence. The one chosen by the developers of the max-sum diffusion algorithm was
$$m_{\alpha \to i}(x_i) := m_{\alpha \to i}(x_i) + \frac{1}{2} \min_{x_{\alpha \setminus i}} \Big[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} b_k(x_k) - b_i(x_i) \Big]. \tag{75}$$

We can obtain a simpler message update rule that does not depend on the previous iteration as
$$m_{\alpha \to i}(x_i) := \frac{1}{2} \min_{x_{\alpha \setminus i}} \Big[ \psi_\alpha(x_\alpha) - \sum_{k \in \alpha \setminus i} m_{\alpha \to k}(x_k) \Big] - \frac{1}{2} \Big[ \phi_i(x_i) + \sum_{\beta \in \partial i \setminus \alpha} m_{\beta \to i}(x_i) \Big]. \tag{76}$$

After computing m_{α→i}(x_i) for each x_i, the lower bound can only increase. Further, we can check that if the algorithm converges to locally decodable beliefs, then this estimate is guaranteed to be correct. This follows by replacing our notion of min-consistency with that of (74). In addition, the lower bound (73) can be shown to be dual to the MAP LP [40].

4) Norm-Product: The norm-product algorithm, like the above algorithms, is a coordinate-ascent scheme for maximizing a concave dual objective function [8]. Unlike the previous algorithms, however, whether or not the norm-product algorithm produces a reparameterization of the objective function remains an open question. The algorithm is derived by studying the general problem of minimizing a convex objective function having a particular form. The derivation of the algorithm uses more or less standard tools from convex analysis, including Fenchel and Lagrangian duality. While the derivation of this algorithm is beyond the scope of this work, it is worth noting that, like the splitting algorithm, the norm-product algorithm is parameterized by a real vector. For some choices of the parameter vector for both algorithms, the norm-product algorithm agrees with the asynchronous splitting algorithm (Algorithm 2).

5) Subgradient Methods: The fixed points of the splitting algorithm do not necessarily correspond to maxima of the lower bound. As discussed earlier, this problem can occur when using coordinate-ascent schemes to optimize concave, but not strictly concave, lower bounds.
Other optimization strategies do not suffer from this problem but may have slower rates of convergence. In this section, we discuss one alternative strategy known as subgradient ascent.

The subgradient ascent method is a generalization of the gradient ascent method to functions that are not necessarily differentiable. Let $g : \mathbb{R}^n \to \mathbb{R}$, and fix a $y_0 \in \mathbb{R}^n$. A vector $h \in \mathbb{R}^n$ is a subgradient of g at the point y_0 if, for all $y \in \mathbb{R}^n$, $g(y) - g(y_0) \ge h \cdot (y - y_0)$. If g is differentiable, then $\nabla g(y_0)$ is the only subgradient of g at y_0.

The subgradient method to maximize the function g performs the iteration in Algorithm 4. Unlike gradients, subgradients do not necessarily correspond to ascent directions. However, under certain conditions on the sequence $\gamma_1, \ldots, \gamma_t$, the subgradient algorithm can be shown to converge [28].

The subgradient algorithm can be converted into a distributed algorithm by exploiting the fact that the subgradient of a sum of functions is equal to the sum of the individual subgradients. Such a strategy has been used to design master/slave algorithms for maximizing the concave lower bounds above [12]. This procedure requires a certain amount of central control that may not be possible in certain applications. A double-loop method that is equivalent to the subgradient method was proposed in [22].

VII. GRAPH COVERS

Thus far, we have demonstrated that certain parameterized variants of the min-sum algorithm allow us to guarantee both convergence and correctness upon convergence to locally decodable beliefs.

Fig. 5: An example of a graph cover. (a) A graph G; (b) a 2-cover of G. Nodes in the cover are labeled by the node that they are a copy of in G.

Even if the beliefs are not locally decodable, they still produce a lower bound on the objective function, but
they seem to tell us very little about the argmin of the objective function. We have provided relatively little intuition about when we can expect the converged beliefs to be locally decodable. The success of the min-sum algorithm is intrinsically tied to both the uniqueness of the optimal solution and the "hardness" of the optimization problem. For lower bounds such as those in Section IV, we provide necessary conditions for dual optima to be locally decodable.

These necessary conditions rely on a notion of indistinguishability: the splitting algorithm, in attempting to solve the minimization problem on one factor graph, is actually attempting to solve the minimization problem over an entire family of equivalent (in some sense) factor graphs. The same notion of indistinguishability has been studied for general distributed message-passing schemes [1], [2], and we expect ideas similar to those discussed in this and subsequent sections to be applicable to other iterative algorithms as well.

The above notion of indistinguishability is captured by the formalism of graph covers. Intuitively, if a graph H covers the graph G, then H has the same local structure as G. This is potentially problematic as our local message-passing schemes depend only on local graph structure and local potentials.

Definition VII.1. A graph H covers a graph G if there exists a graph homomorphism $h : H \to G$ such that h is an isomorphism on neighborhoods (i.e., for all vertices i ∈ H, ∂i is mapped bijectively onto ∂h(i)). If h(i) = j, then we say that i ∈ H is a copy of j ∈ G. Further, H is an η-cover of G if every vertex of G has exactly η copies in H.

Graph covers may be connected (i.e., there is a path between every pair of vertices) or disconnected. However, when a graph cover is disconnected, all of the connected components of the cover must themselves be covers of the original graph.
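Definition VII.1 can be checked mechanically. The sketch below verifies that a candidate map $h : H \to G$ is a covering homomorphism by testing that each neighborhood is mapped bijectively; the triangle and its 6-cycle 2-cover are our own illustrative example, not the graphs of Figure 5.

```python
# Sketch: checking Definition VII.1 for a candidate covering map h.
# Graphs are symmetric adjacency dicts; h maps vertices of H to vertices of G.

def is_cover(G, H, h):
    """Return True if h : H -> G is a covering map, i.e., a graph
    homomorphism that maps each neighborhood bijectively."""
    for u, nbrs in H.items():
        # h must be a homomorphism: every edge of H maps to an edge of G.
        if any(h[v] not in G[h[u]] for v in nbrs):
            return False
        # The neighborhood of u must map bijectively onto that of h(u).
        if sorted(h[v] for v in nbrs) != sorted(G[h[u]]):
            return False
    return True

# Base graph: a triangle on {1, 2, 3}.
G = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
# A connected 2-cover: the 6-cycle, with two copies ('a', 'b') of each vertex.
H = {(1, 'a'): [(2, 'a'), (3, 'b')], (2, 'a'): [(1, 'a'), (3, 'a')],
     (3, 'a'): [(2, 'a'), (1, 'b')], (1, 'b'): [(3, 'a'), (2, 'b')],
     (2, 'b'): [(1, 'b'), (3, 'b')], (3, 'b'): [(2, 'b'), (1, 'a')]}
h = {v: v[0] for v in H}  # project each copy onto its base vertex

print(is_cover(G, H, h))  # True
```

Deleting any single edge of H breaks the neighborhood bijection, so the same routine also rejects non-covers.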
For a simple example of a connected graph cover, see Figure 5. Every finite cover of a connected graph is an η-cover for some integer η. For every base graph G, there exists a graph, possibly infinite, which covers all finite, connected covers of the base graph. This graph is known as the universal cover. Throughout this work, we will be primarily concerned with finite, connected covers.

Algorithm 4 Subgradient Ascent
1: Let $g : \mathbb{R}^n \to \mathbb{R}$.
2: Choose an initial vector y^0.
3: for t = 1, 2, . . . do
4: Construct a new vector y^t by setting $y^t := y^{t-1} + \gamma_t h^t$, where h^t is a subgradient of g at y^{t-1} and γ_t is the step size at time t.
5: end for

To any finite cover, H, of a factor graph G we can associate a collection of potentials derived from the base graph; the potential at node i ∈ H is equal to the potential at node h(i) ∈ G. Together, these potential functions define a new objective function for the factor graph H. In the sequel, we will use superscripts to specify that a particular object is over the factor graph H. For example, we will denote the objective function corresponding to a factor graph H as f^H, and we will write f^G for the objective function f.

Graph covers, in the context of graphical models, were originally studied in relation to local message-passing algorithms [31]. Synchronous local message-passing algorithms such as the min-sum and splitting algorithms are incapable of distinguishing the two factor graphs H and G given that the initial messages to and from each node in H are identical to those of the nodes they cover in G: for every node i ∈ G, the messages received and sent by this node at time t are exactly the same as the messages sent and received at time t by any copy of i in H.
As a result, if we use a local message-passing algorithm to deduce an assignment for i, then the algorithm run on the graph H must deduce the same assignment for each copy of i. A similar argument can be made for any sequence of message updates.

Now, consider an objective function f that factors with respect to $G = (V^G, A^G)$. For any finite cover $H = (V^H, A^H)$ of G with covering homomorphism $h : H \to G$, we can "lift" any vector of beliefs, b^G, from G to H by defining a new vector of beliefs, b^H, such that
• For all variable nodes $i \in V^H$, $b^H_i = b^G_{h(i)}$.
• For all factor nodes $\alpha \in A^H$, $b^H_\alpha = b^G_{h(\alpha)}$.
Analogously, we can lift any assignment x^G to an assignment x^H by setting $x^H_i = x^G_{h(i)}$.

A. Pseudo-codewords

Surprisingly, minima of the objective function f^H need not be lifts of the minima of the objective function f^G. Even worse, the minimum value of f^G does not necessarily correspond to the minimum value of f^H. This idea is the basis for the theory of pseudo-codewords in the LDPC (low-density parity-check) community [31], [34]. In this community, valid codewords (assignments satisfying a specific set of constraints) on graph covers that are not lifts of valid codewords of the base graph are referred to as pseudo-codewords.

The existence of pseudo-codewords is not unique to coding theory. Consider the maximum weight independent set problem in Figure 6.

Fig. 6: An example of a maximum weight independent set problem and a graph cover whose maximum weight independent set is not a copy of an independent set on the original graph. (a) A graph G; (b) a 2-cover of G. The nodes are labeled with their corresponding weights.

The maximum weight independent set for the graph in Figure 6(a) has weight three.
The maximum weight independent set on the 2-cover of this graph in Figure 6(b) has weight seven, which is larger than the lift of the maximum weight independent set from the base graph.

Because local message-passing algorithms cannot distinguish a factor graph from its covers, and our lower bounds provide lower bounds on the objective function corresponding to any finite cover, we expect that maxima of the lower bounds, at best, correspond to the optimal solution on some graph cover of the original problem. For the splitting algorithm, we can make this intuition precise.

Theorem VII.1. Let b be a vector of c-admissible and min-consistent beliefs for the function f, such that f can be written as a conical combination of the beliefs. If there exists an assignment x^G that simultaneously minimizes the beliefs, then
• The assignment x^G minimizes f^G, and for any finite cover H of G, x^H, the lift of x^G to H, minimizes f^H.
• For any finite cover H of G with covering homomorphism h, if y^H minimizes f^H, then for all i ∈ H, $y^H_i \in \arg\min_{x'_i} b_{h(i)}(x'_i)$.

Proof: The beliefs can be lifted from G to H. In other words, the beliefs define a reparameterization of the objective function f^H. Consequently, as f^G can be written as a conical combination of the beliefs, f^H can also be written as a conical combination of the beliefs. The first observation then follows from Corollary V.7. For the second observation, observe that, because x^H minimizes f^H and simultaneously minimizes each of the beliefs b^H, any other minimum, y^H, of f^H must also simultaneously minimize the beliefs b^H. If not, $f(y^H) > f(x^H)$, a contradiction.
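The pseudo-codeword example of Figure 6 can be checked by brute force. In the sketch below we take Figure 6(a) to be the weighted triangle its caption suggests (weights 3, 2, 2) and its connected 2-cover to be the 6-cycle; the exhaustive solver `max_weight_independent_set` is our own helper, practical only at this scale.

```python
# Brute-force check of the pseudo-codeword example: the maximum weight
# independent set (MWIS) has weight 3 on the weighted triangle but
# weight 7 on its 2-cover, exceeding the lift of the base optimum (weight 6).
from itertools import combinations

def max_weight_independent_set(edges, weights):
    """Exhaustive MWIS: enumerate all vertex subsets, keep the heaviest
    one containing no edge. Exponential, but fine for tiny graphs."""
    nodes = list(weights)
    best = 0
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            s = set(subset)
            if all(not (u in s and v in s) for u, v in edges):
                best = max(best, sum(weights[v] for v in subset))
    return best

# Base graph G: a triangle with weights 3, 2, 2.
w_G = {1: 3, 2: 2, 3: 2}
E_G = [(1, 2), (2, 3), (1, 3)]
# 2-cover H: the 6-cycle, with two copies ('a', 'b') of each base vertex.
w_H = {(i, c): w_G[i] for i in w_G for c in 'ab'}
E_H = [((1, 'a'), (2, 'a')), ((2, 'a'), (3, 'a')), ((3, 'a'), (1, 'b')),
       ((1, 'b'), (2, 'b')), ((2, 'b'), (3, 'b')), ((3, 'b'), (1, 'a'))]

print(max_weight_independent_set(E_G, w_G))  # 3
print(max_weight_independent_set(E_H, w_H))  # 7
```

The base optimum (a single vertex of weight 3) lifts to an assignment of weight 6 on the cover, strictly less than the cover's own optimum of 7, so the cover's optimum is not a lift of any independent set on the base graph.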
As a consequence of Theorem VII.1, for any choice of the parameter vector that guarantees correctness, the splitting algorithm can only converge to a locally decodable vector of admissible and min-consistent beliefs if the objective function corresponding to any finite graph cover has a unique optimal solution that is a lift of the unique optimal solution on the base graph.

B. Graph Covers and the MAP LP

The existence of pseudo-codewords is problematic because local message-passing algorithms cannot distinguish a graph from its covers. For objective functions over finite state spaces, we can relate the previous observations to the MAP LP. There is a one-to-one correspondence between the rational feasible points of the MAP LP and assignments on graph covers: every rational feasible point of the MAP LP corresponds to an assignment on some cover, and every assignment on a cover of the base graph corresponds to a rational feasible point of the MAP LP. Similar ideas have been explored in the coding community: a construction relating pseudo-codewords to graph covers was provided in [31], and an LP relaxation of the codeword polytope, the convex hull of valid codewords, is described in [4].

Theorem VII.2. The vector µ is a rational, feasible point of the MAP LP for f^G if and only if there exists an η-cover H of G and an assignment y^H such that
$$f^H(y^H) = \eta \Big[ \sum_{i \in V^G} \sum_{x_i} \mu_i(x_i) \phi_i(x_i) + \sum_{\alpha \in A^G} \sum_{x_\alpha} \mu_\alpha(x_\alpha) \psi_\alpha(x_\alpha) \Big].$$

Proof: See Appendix A-D.

Because the polyhedron corresponding to the MAP LP is rational, the optimum of the MAP LP is attained at a rational point. Hence, there is an η-cover H of G and an assignment y^H that, in the sense of Theorem VII.2, achieves the optimal value of the MAP LP.
This assignment corresponds to a lower bound on the minimum value of any finite cover of G (i.e., for every η′-cover H′ of G and any assignment y^{H′} on H′, $f^H(y^H)/\eta \le f^{H'}(y^{H'})/\eta'$). The polytope corresponding to the MAP LP is related to the fundamental polytope defined in [31] and the linear programming relaxation of [4]. The fundamental polytope of [31] contains only the information corresponding to the variable nodes (i.e., each µ_i(x_i)) in the MAP LP, whereas the polytope Q of [4] is equivalent to the MAP LP for the coding problem.

C. Lower Bounds and the MAP LP

The relationship between the MAP LP and graph covers allows us to extend the necessary conditions for local decodability of Theorem VII.1 to lower bounds whose maximum is equal to the minimum of the MAP LP. For the analysis in this section, we will concentrate on lower bounds, as a function of the messages, such that $\sup_m LB(m)$ is equal to the optimum value of the MAP LP. Lower bounds that are dual to the MAP LP such that strong duality holds always satisfy this property, and many of the lower bounds discussed in this work (e.g., those produced by Algorithm 2, TRMP, MPLP, max-sum diffusion, etc.) can be shown to have the required property.

For simplicity, we consider the lower bound related to Algorithm 2. Let f^G factor over G = (V, A). Restrict the parameter vector c so that c_i = 1 for all i, c_α > 0 for all α, and $\sum_{\alpha \in \partial i} c_\alpha \le 1$ for all i. For any vector of messages, m, with corresponding beliefs, b, recall the following lower bound on the objective function:
$$\min_x f^G(x) \ge \sum_{i \in V} \Big[ c_i \Big( 1 - \sum_{\alpha \in \partial i} c_\alpha \Big) \min_{x_i} b_i(x_i) \Big] + \sum_{\alpha \in A} \Big[ c_\alpha \min_{x_\alpha} b_\alpha(x_\alpha) \Big] \tag{77}$$
$$\triangleq LB^G(m). \tag{78}$$

Theorem VII.3. If m^G maximizes LB^G, then the following are equivalent.
1) The MAP LP has an integral optimum.
2) There is an assignment x* that simultaneously minimizes each term in the lower bound.
3) If x^G minimizes f^G, then for any finite graph cover H of G, x^H, the lift of x^G to H, minimizes f^H.

Proof: (1 → 2) By Theorem VII.2, there exists a 1-cover of G and an assignment x^G such that x^G minimizes f^G. Because m^G maximizes the lower bound, we have that $\min_x f^G(x) = LB^G(m^G)$. Because of this equality, any assignment that minimizes f^G must also simultaneously minimize each term of the lower bound.

(2 → 3) Suppose that there exists an assignment, x^G, that simultaneously minimizes each component of the lower bound. Let H be a finite cover of G, let x^H be the lift of x^G to H, and let m^H be the lift of m^G to H. x^H must simultaneously minimize each term of the lower bound LB^H, which implies that $f^H(x^H) = LB^H(m^H)$ and that x^H minimizes f^H.

(3 → 1) Suppose that f^G has a minimum, x^G, and for any graph cover H of G, f^H has a minimum at x^H, the lift of x^G to H. By Theorem VII.2, every rational feasible point corresponds to an assignment on some finite graph cover H of G and vice versa. Because every cover is minimized by a lift of x^G, the MAP LP must have an integral optimum.

For lower bounds dual to the MAP LP, the equivalence of 1 and 2 also follows from Lemmas 6.2 and 6.4 in [29]. In the case that the optimum of the MAP LP is unique and integral, we have the following corollary.

Corollary VII.4. If m^G maximizes LB^G, then the following are equivalent.
1) The MAP LP has a unique and integral optimum, x^G.
2) f^G has a unique minimum x^G, and for any finite graph cover H of G, f^H is uniquely minimized at x^H, the lift of x^G to H.

Notice that these conditions characterize the existence of an x* that simultaneously minimizes each of the components of the lower bound, but they do not provide a method for constructing such an x*.
As we saw earlier, local decodability is one condition that ensures that we can construct a solution to the inference problem. As was the case for the splitting algorithm, Theorem VII.3 tells us that dual optimal solutions cannot be locally decodable unless every graph cover of the base factor graph has a unique solution that is a lift of the solution on the base graph or, equivalently, the optimum of the MAP LP is unique and integral. In other words, Theorem VII.3 provides necessary, but not sufficient, conditions for dual optima to be locally decodable.

D. Pairwise Binary Graphical Models

In the special case that the state space is binary (i.e., each x_i can take only one of two values) and the factors depend on at most two variables, we can strengthen the results of the previous sections by providing necessary and sufficient conditions for local decodability. Previous work on pairwise binary graphical models has focused on the relationship between the converged beliefs and solutions to the MAP LP [11], [39]. In this work, we focus on the relationships between the base factor graph and its 2-covers.

In the context of graph covers, the most surprising property of pairwise binary graphical models is that, for any choice of the vector c and any fixed point of the message updates, we can always construct a 2-cover, H, and an assignment on this cover, x^H, such that x^H simultaneously minimizes the fixed-point beliefs when lifted to H. In the case that the objective function can be written as a conical combination of the beliefs, this assignment would be a global minimum of the objective function on the 2-cover (corresponding to a rational optimum of the MAP LP).

Theorem VII.5. Let b^G be a vector of c-admissible and min-consistent beliefs for the objective function f^G that factors over G = (V, A).
If the factorization is pairwise binary, then there exists a 2-cover, H, of G and an assignment, y*, on that 2-cover such that y* simultaneously minimizes the lifted beliefs.

Proof: Without loss of generality, we can assume that $X_i = \{0, 1\}$ for all i ∈ V. We will construct a 2-cover, H, of the factor graph G and an assignment y* such that y* minimizes f^H. We will index the copies of variable i ∈ G in the factor graph H as (i, 1) and (i, 2).

First, we will construct the assignment. If $\arg\min_{x_i} b^G_i(x_i)$ is unique, then set $y^*_{(i,1)} = y^*_{(i,2)} = \arg\min_{x_i} b^G_i(x_i)$. Otherwise, set $y^*_{(i,1)} = 0$ and $y^*_{(i,2)} = 1$.

Now, we will construct a 2-cover, H, such that y* minimizes each of the beliefs. We will do this edge by edge. Consider the edge (i, j) ∈ E. There are several possibilities.
1) b^G_i and b^G_j have unique argmins. In this case, by Lemma V.1, b^G_ij is minimized at $(y^*_{(i,1)}, y^*_{(j,1)})$. So, we will add the edges ((i,1),(j,1)) and ((i,2),(j,2)) to H. The corresponding beliefs $b^H_{(i,1)(j,1)}$ and $b^H_{(i,2)(j,2)}$ are minimized at y*.
2) b^G_i has a unique argmin and b^G_j is minimized at both 0 and 1 (or vice versa). In this case, we have $y^*_{(i,1)} = y^*_{(i,2)}$, $y^*_{(j,1)} = 0$, and $y^*_{(j,2)} = 1$. By min-consistency, we can conclude that b^G_ij is minimized at $(y^*_{(i,1)}, 0)$ and $(y^*_{(i,1)}, 1)$. Therefore, we will add the edges ((i,1),(j,1)) and ((i,2),(j,2)) to H.
3) b^G_i and b^G_j are both minimized at both 0 and 1. In this case, we have $y^*_{(i,1)} = 0$, $y^*_{(i,2)} = 1$, $y^*_{(j,1)} = 0$, and $y^*_{(j,2)} = 1$. By min-consistency, there is an assignment that minimizes $b^G_{ij}(x_i, x_j)$ with x_i = 0 and an assignment that minimizes $b^G_{ij}(x_i, x_j)$ with x_i = 1.
This means that $\arg\min_{x_i, x_j} b^G_{ij}(x_i, x_j)$ contains at least one of the sets {(0,0), (1,1)} or {(0,1), (1,0)}. In the first case, we will add the edges ((i,1),(j,1)) and ((i,2),(j,2)) to H, and in the second case, we will add ((i,1),(j,2)) and ((i,2),(j,1)) to H.

The constructed H and y* then satisfy the requirements in the statement of the theorem.

From the construction in Theorem VII.5, for the case of pairwise binary graphical models, a solution to the MAP LP must always correspond to an optimal assignment on some 2-cover of the base graph. A similar phenomenon occurs for cycle codes (see Corollary 3.5 of [3]). Given the construction in Theorem VII.5, we can explicitly describe necessary and sufficient conditions for the splitting algorithm to converge to locally decodable beliefs.

Corollary VII.6. Let c satisfy the conditions of Theorem VI.2. Algorithm 2 converges to locally decodable beliefs, b^G, if and only if for every cover H of G, f^H has a unique minimizing assignment.

The proof of Corollary VII.6 can be found in Appendix A-E. Another consequence of Theorem VII.5 is that, in the pairwise binary case, Algorithm 2 always converges to a vector of messages that maximizes the lower bound (i.e., the coordinate-ascent algorithm does not get stuck). An alternative proof, based on duality, that the coordinate-ascent scheme does not become stuck for these models can be found in [11].

VIII. LOCAL VERSUS GLOBAL

As a consequence of Theorem VII.1, for any choice of the parameter vector that guarantees correctness, the splitting algorithm can only converge to a locally decodable vector of admissible and min-consistent beliefs if the objective function corresponding to any finite graph cover has a unique optimal solution that is a lift of the unique optimal solution on the base graph.
The corresponding result for pairwise binary factorizations is that this condition is both necessary and sufficient.

These results highlight the inherent weaknesses of conical decompositions and, consequently, the dual approach in general: conical decompositions guarantee the correctness, on every finite cover of the base graph, of any assignment that simultaneously minimizes each of the beliefs. For many applications, this requirement on graph covers is very restrictive. This suggests that the convergent and correct strategy outlined in the previous chapters is not as useful as it seems, and for many applications, the standard min-sum algorithm, which only guarantees a form of local optimality on all graph covers, may still be preferred in practice. In this section, we briefly discuss specific examples of how these issues arise in practice.

A. Quadratic Minimization

The quadratic minimization problem provides a simple example of how the guarantee of global optimality can be undesirable in practice. Let $\Gamma \in \mathbb{R}^{n \times n}$ be a symmetric positive definite matrix and $h \in \mathbb{R}^n$. The quadratic minimization problem is to find the $x \in \mathbb{R}^n$ that minimizes $f(x) = \frac{1}{2} x^T \Gamma x - h^T x$. The global optimum must satisfy $\Gamma x = h$, and as a result, minimizing a positive definite quadratic function is equivalent to solving a positive definite linear system. In this case, the min-sum algorithm is usually called GaBP (Gaussian Belief Propagation). Consider the following definitions.

Definition VIII.1. $\Gamma \in \mathbb{R}^{n \times n}$ is scaled diagonally dominant if there exists a $w \in \mathbb{R}^n$ with $w > 0$ such that $|\Gamma_{ii}| w_i > \sum_{j \neq i} |\Gamma_{ij}| w_j$ for all $i$.

Definition VIII.2. $\Gamma \in \mathbb{R}^{n \times n}$ is walk summable if the spectral radius $\rho(|I - D^{-1/2} \Gamma D^{-1/2}|) < 1$. Here, $D^{-1/2}$ is the diagonal matrix such that $D^{-1/2}_{ii} = 1/\sqrt{|\Gamma_{ii}|}$, and we use $|A|$ to denote the matrix obtained from $A$ by taking the absolute value of every entry.
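Definitions VIII.1 and VIII.2, and their relationship to covers of $\Gamma$, can be illustrated numerically. The matrix below is a hypothetical example of our choosing, not one from the paper: a 3x3 "attractive cycle" with unit diagonal and off-diagonal entries 0.6, which is positive definite but neither walk summable nor scaled diagonally dominant, and whose "all-crossed" 2-cover fails to be positive definite.

```python
import numpy as np

# A hypothetical 3x3 example: unit diagonal, all off-diagonal entries 0.6.
r = 0.6
Gamma = np.array([[1, r, r], [r, 1, r], [r, r, 1]], dtype=float)

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

# Definition VIII.2 (walk summability): rho(|I - D^{-1/2} Gamma D^{-1/2}|) < 1.
d = np.sqrt(np.abs(np.diag(Gamma)))
R = np.abs(np.eye(3) - Gamma / np.outer(d, d))
walk_summable = spectral_radius(R) < 1

# Definition VIII.1 (scaled diagonal dominance) holds iff rho(D^{-1}|A|) < 1,
# with A the off-diagonal part and w the Perron vector; D^{-1}|A| is similar
# to D^{-1/2}|A|D^{-1/2}, so the two spectral radii coincide.
A = np.abs(Gamma - np.diag(np.diag(Gamma)))
sdd = spectral_radius(A / np.abs(np.diag(Gamma))[:, None]) < 1

# Gamma is positive definite, yet rho = 2r = 1.2, so neither condition holds...
assert np.all(np.linalg.eigvalsh(Gamma) > 0)
assert not walk_summable and not sdd

# ...so some cover of Gamma must fail to be positive definite. The
# "all-crossed" 2-cover is the block matrix [[D, A'], [A', D]] with
# A' = Gamma - D; its spectrum is that of D + A' together with D - A'.
D = np.diag(np.diag(Gamma))
Ap = Gamma - D
cover = np.block([[D, Ap], [Ap, D]])
assert np.linalg.eigvalsh(cover).min() < 0  # the 2-cover is not positive definite
```

Here the block structure makes the failure transparent: the 2-cover's spectrum contains the spectrum of $D - A'$, which for this matrix has the negative eigenvalue $1 - 2r = -0.2$.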
For positive definite $\Gamma$, the sufficiency of walk-summability for the convergence of GaBP was demonstrated in [15], while the sufficiency of scaled diagonal dominance was demonstrated in [17] and [18]. In [24] and [23], we showed the following.

Theorem VIII.1. Let $\Gamma$ be a symmetric matrix with positive diagonal. The following are equivalent.
1) $\Gamma$ is walk summable.
2) $\Gamma$ is scaled diagonally dominant.
3) All covers of $\Gamma$ are positive definite.
4) All 2-covers of $\Gamma$ are positive definite.

Here, a cover of the matrix $\Gamma$ is the matrix corresponding to a cover of the pairwise graphical model for the factorization
$$f(x) = \sum_i \Big[ \frac{\Gamma_{ii}}{2} x_i^2 - h_i x_i \Big] + \sum_{i > j} \Big[ \Gamma_{ij} x_i x_j \Big]. \qquad (79)$$

That is, every matrix that is positive definite but not walk summable is covered by a matrix that is not positive definite. As any choice of the parameter vector satisfying the conditions of Theorem V.5 must guarantee the correctness of any collection of locally decodable beliefs on all covers, the splitting algorithm cannot converge to a vector of locally decodable beliefs for such a choice of the parameter vector for any matrix in this class. As a result, the convergent and correct message-passing algorithms would be a poor choice for solving problems in this particular regime, even when the objective function is strictly convex. A similar observation continues to hold for more general convex functions.

B. Coding Theory

Empirical studies of the choice of the parameter vector were discussed in the context of LDPC codes in [42]. Here, the authors discuss an algorithm called the Divide and Concur algorithm that is related to the splitting algorithm. They demonstrate experimentally that the best decoding performance is not achieved at a choice of the parameter vector that guarantees correctness but at a choice of the parameter vector that only guarantees local optimality.
We refer the interested reader to their paper for more details.

IX. CONCLUSION

In this work, we presented a novel approach for the derivation of convergent and correct message-passing algorithms based on a reparameterization framework motivated by "splitting" the nodes of a given factor graph. Within this framework, we focused on a specific, parameterized family of message-passing algorithms. We provided conditions on the parameters that guarantee the local or global optimality of locally decodable fixed points, and described a simple coordinate-ascent scheme that guarantees convergence.

In addition, we showed how to connect assignments on graph covers to rational points of the MAP LP. This approach allowed us to provide necessary conditions for local decodability and to discuss the limitations of convergent and correct message passing. These results suggest that, while convergent and correct message-passing algorithms have some advantages over the standard min-sum algorithm in theory, algorithms that only guarantee local optimality still have practical advantages in a variety of settings.

APPENDIX A
PROOFS

A. Proof of Theorem II.1

Theorem. For any vector of fixed-point messages, the corresponding beliefs are admissible and min-consistent.

Proof: Let $m$ be a fixed point of the message update equations
$$m_{i \to \alpha}(x_i) = \kappa + \phi_i(x_i) + \sum_{\beta \in \partial i \setminus \alpha} m_{\beta \to i}(x_i) \qquad (80)$$
$$m_{\alpha \to i}(x_i) = \kappa + \min_{x_{\alpha \setminus i}} \Big[ \psi_\alpha(x_\alpha) + \sum_{k \in \alpha \setminus i} m_{k \to \alpha}(x_k) \Big]. \qquad (81)$$
First, we will show that $m$ produces min-consistent beliefs. Take $\alpha \in \mathcal{A}$ and choose some $i \in \alpha$.
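(As an aside, the fixed-point updates (80) and (81) and the min-consistency conclusion of the theorem can be checked numerically on a toy model. The two-variable potentials below are hypothetical, and normalizing each message by its minimum plays the role of the additive constant $\kappa$.)

```python
import numpy as np

# Toy model: f(x1, x2) = phi1(x1) + phi2(x2) + psi(x1, x2), binary variables.
phi = [np.array([0.0, 1.0]), np.array([0.5, 0.0])]
psi = np.array([[0.0, 2.0], [1.0, 0.0]])  # psi[x1, x2]

def normalize(v):
    return v - v.min()  # fix the additive constant kappa

m_to_factor = [np.zeros(2), np.zeros(2)]  # m_{i -> alpha}
m_to_var = [np.zeros(2), np.zeros(2)]     # m_{alpha -> i}
for _ in range(10):
    for i in (0, 1):
        # Eq. (80): here each variable touches only one factor, so the
        # sum over the other factors beta is empty.
        m_to_factor[i] = normalize(phi[i])
    # Eq. (81): minimize out the other variable of the pairwise factor.
    m_to_var[0] = normalize((psi + m_to_factor[1][None, :]).min(axis=1))
    m_to_var[1] = normalize((psi + m_to_factor[0][:, None]).min(axis=0))

b = [phi[i] + m_to_var[i] for i in (0, 1)]                      # variable beliefs
b_a = psi + m_to_factor[0][:, None] + m_to_factor[1][None, :]   # factor belief

# Min-consistency (Theorem II.1): minimizing b_a over the other variable
# recovers each variable belief up to an additive constant.
assert np.allclose(normalize(b_a.min(axis=1)), normalize(b[0]))
assert np.allclose(normalize(b_a.min(axis=0)), normalize(b[1]))
```

On this tree-structured toy the updates reach a fixed point immediately, and the asserted identities are exactly equations (82)-(86) specialized to a single pairwise factor.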
Up to an additive constant, we can write
$$\min_{x_{\alpha \setminus i}} b_\alpha(x_\alpha) = \min_{x_{\alpha \setminus i}} \Big[ \psi_\alpha(x_\alpha) + \sum_{k \in \alpha} m_{k \to \alpha}(x_k) \Big] \qquad (82)$$
$$= m_{i \to \alpha}(x_i) + \min_{x_{\alpha \setminus i}} \Big[ \psi_\alpha(x_\alpha) + \sum_{k \in \alpha \setminus i} m_{k \to \alpha}(x_k) \Big] \qquad (83)$$
$$= m_{i \to \alpha}(x_i) + m_{\alpha \to i}(x_i) \qquad (84)$$
$$= \phi_i(x_i) + \Big[ \sum_{\beta \in \partial i \setminus \alpha} m_{\beta \to i}(x_i) \Big] + m_{\alpha \to i}(x_i) \qquad (85)$$
$$= b_i(x_i). \qquad (86)$$

Next, we can check that the beliefs are admissible. Again, up to an additive constant,
$$f(x) = \sum_i \phi_i(x_i) + \sum_\alpha \psi_\alpha(x_\alpha) \qquad (87)$$
$$= \sum_i \Big[ \phi_i(x_i) + \sum_{\alpha \in \partial i} m_{\alpha \to i}(x_i) \Big] + \sum_\alpha \Big[ \psi_\alpha(x_\alpha) - \sum_{i \in \alpha} m_{\alpha \to i}(x_i) \Big] \qquad (88)$$
$$= \sum_i b_i(x_i) + \sum_\alpha \Big[ \psi_\alpha(x_\alpha) - \sum_{i \in \alpha} m_{\alpha \to i}(x_i) \Big] \qquad (89)$$
$$= \sum_i b_i(x_i) + \sum_\alpha \Big[ \psi_\alpha(x_\alpha) - \sum_{i \in \alpha} b_i(x_i) + \sum_{i \in \alpha} b_i(x_i) - \sum_{i \in \alpha} m_{\alpha \to i}(x_i) \Big] \qquad (90)$$
$$= \sum_i b_i(x_i) + \sum_\alpha \Big[ b_\alpha(x_\alpha) - \sum_{i \in \alpha} b_i(x_i) \Big]. \qquad (91)$$

B. Proof of Lemma VI.1

Lemma. Suppose $c_i = 1$ for all $i$, and we perform the update for the edge $(j, \alpha)$ as in Algorithm 2. If the vector of messages is real-valued after the update, then $b_\alpha$ is min-consistent with respect to $b_j$.

Proof: Let $m$ be the vector of messages before the update and let $m^+$ be the vector of messages after the update. By the definition of Algorithm 2, for each $i \in \alpha \setminus j$, up to an additive constant,
$$m^+_{i \to \alpha}(x_i) = \phi_i(x_i) - m_{\alpha \to i}(x_i) + \sum_{\beta \in \partial i} c_\beta m_{\beta \to i}(x_i) \qquad (92)$$
$$= \phi_i(x_i) - m^+_{\alpha \to i}(x_i) + \sum_{\beta \in \partial i} c_\beta m^+_{\beta \to i}(x_i) \qquad (93)$$
$$= b^+_i(x_i) - m^+_{\alpha \to i}(x_i). \qquad (94)$$
Similarly,
$$m^+_{\alpha \to j}(x_j) = \kappa + \min_{x_{\alpha \setminus j}} \Big[ \frac{\psi_\alpha(x_\alpha)}{c_\alpha} + \sum_{k \in \alpha \setminus j} m^+_{k \to \alpha}(x_k) \Big]. \qquad (95)$$
Let $b^+$ be the vector of beliefs derived from $m^+$. With these observations, up to an additive constant,
$$\min_{x_{\alpha \setminus j}} b^+_\alpha(x_\alpha) = \min_{x_{\alpha \setminus j}} \Big[ \frac{\psi_\alpha(x_\alpha)}{c_\alpha} + \sum_{k \in \alpha} \big( b^+_k(x_k) - m^+_{\alpha \to k}(x_k) \big) \Big] \qquad (96)$$
$$= m^+_{\alpha \to j}(x_j) + \big[ b^+_j(x_j) - m^+_{\alpha \to j}(x_j) \big] \qquad (97)$$
$$= b^+_j(x_j), \qquad (98)$$
where (97) follows from the observation in (95).

C.
Proof of Theorem VI.2

Theorem. Suppose $c_i = 1$ for all $i$, $c_\alpha > 0$ for all $\alpha$, and $\sum_{\alpha \in \partial i} c_\alpha \le 1$ for all $i$. If the vector of messages is real-valued after each iteration of Algorithm 2, then for all $t > 0$, $LB(m^t) \ge LB(m^{t-1})$.

Proof: The message updates performed for the variable node $j$ in each iteration of Algorithm 2 cannot decrease the lower bound. To see this, let $LB_j(m)$ denote the terms in $LB$ that involve the variable $x_j$:
$$LB_j(m) \triangleq \Big(1 - \sum_{\beta \in \partial j} c_\beta\Big) \min_{x_j} b_j(x_j) + \sum_{\beta \in \partial j} c_\beta \min_{x_\beta} b_\beta(x_\beta), \qquad (99)$$
where $b$ is the vector of beliefs generated by the message vector $m$. We can upper bound $LB_j$ as
$$LB_j(m) \le \min_{x_j} \Big[ \Big(1 - \sum_{\beta \in \partial j} c_\beta\Big) b_j(x_j) + \sum_{\beta \in \partial j} c_\beta \min_{x_{\beta \setminus j}} b_\beta(x_\beta) \Big] \qquad (100)$$
$$= \min_{x_j} \Big[ \phi_j(x_j) + \sum_{\beta \in \partial j} c_\beta \min_{x_{\beta \setminus j}} \Big( \frac{\psi_\beta(x_\beta)}{c_\beta} + \sum_{k \in \beta \setminus j} \big( b_k(x_k) - m_{\beta \to k}(x_k) \big) \Big) \Big], \qquad (101)$$
where (101) follows by plugging in the definition of the beliefs and collecting like terms.

The upper bound in (101) does not depend on the choice of the messages from $\beta$ to $j$ for any $\beta \in \partial j$. As a result, any choice of these messages for which the inequality is tight must maximize $LB_j$. Observe that the upper bound is tight if there exists an $x_j$ that simultaneously minimizes $b_j$ and $\min_{x_{\beta \setminus j}} b_\beta$ for each $\beta \in \partial j$. By Lemma VI.1, this is indeed the case after performing the updates in Algorithm 2 for the variable node $j$. As this is the only part of the lower bound affected by the update, we have that $LB$ cannot decrease.

Let $m$ be the vector of messages before the update for variable $j$ and $m^+$ the vector after the update (see Lemma VI.1). Define $LB_{-j}$ to be the sum of the terms of the lower bound that do not involve the variable $x_j$.
By definition and the above,
$$LB(m) = LB_{-j}(m) + LB_j(m) \qquad (102)$$
$$\le LB_{-j}(m) + LB_j(m^+) \qquad (103)$$
$$= LB_{-j}(m^+) + LB_j(m^+) \qquad (104)$$
$$= LB(m^+), \qquad (105)$$
where (104) follows from the observation that the update has no effect on messages that are not passed into $j$ or out of any $\alpha$ containing $j$. As $LB(m)$ is bounded from above by $\min_x f(x)$, we can conclude that the value of the lower bound converges.

Finally, the lower bound has converged if no single variable update can improve the bound. By the arguments above, this must mean that there exists an $x_j$ that simultaneously minimizes $b_j$ and $\min_{x_{\alpha \setminus j}} b_\alpha$ for each $\alpha \in \partial j$. These beliefs may or may not be min-consistent. Now, if there exists a unique minimizer $x^*$, then $x^*_j$ must simultaneously minimize $b_j$ and $\min_{x_{\alpha \setminus j}} b_\alpha$ for each $\alpha \in \partial j$. From this we can conclude that $x^*$ simultaneously minimizes all of the beliefs and therefore, using the argument from Corollary V.7, must minimize the objective function.

D. Proof of Theorem VII.2

Theorem. The vector $\mu$ is a rational, feasible point of the MAP LP for $f^G$ if and only if there exists an $\eta$-cover $H$ of $G$ and an assignment $y^H$ such that
$$f^H(y^H) = \eta \Big[ \sum_{i \in V_G} \sum_{x_i} \mu_i(x_i) \phi_i(x_i) + \sum_{\alpha \in \mathcal{A}_G} \sum_{x_\alpha} \mu_\alpha(x_\alpha) \psi_\alpha(x_\alpha) \Big].$$

Proof: First, suppose $\mu$ is a rational, feasible point of the MAP LP for $f^G$. Let $\eta$ be the smallest integer such that $\eta \cdot \mu_i(x_i) \in \mathbb{Z}$ for all $i$ and $x_i$ and $\eta \cdot \mu_\alpha(x_\alpha) \in \mathbb{Z}$ for all $\alpha$ and $x_\alpha$. Construct an $\eta$-cover $H$ of $G$ by creating $\eta$ copies of each variable and factor node of $G$. We will think of each copied variable as corresponding to a particular assignment. For example, consider the $\eta$ copies in $H$ of the variable node $i \in G$. Exactly $\eta \cdot \mu_i(x_i)$ of these copies will be assigned the value $x_i$.
Similarly, each of the $\eta$ copies in $H$ of the factor node $\alpha \in G$ will correspond to a particular assignment: $\eta \cdot \mu_\alpha(x_\alpha)$ of these copies will be assigned the value $x_\alpha$.

We can now add edges to $H$ in the following way: for each copy $\alpha' \in H$ of the factor node $\alpha \in G$, we look at its corresponding assignment and connect $\alpha'$ to a subset of the variables in $H$ that are copies of some $i \in \alpha$, not already connected to a copy of the node $\alpha$ in $H$, whose corresponding assignments are consistent with the assignment for $\alpha'$. Note that there is not necessarily a unique way to do this, but after this process, every copy in $H$ of $i \in \alpha$ will be connected to a copy of $\alpha$ in $H$. After repeating this for the remaining factors in $G$, we must have that $H$ is an $\eta$-cover of $G$, and the assignment $y^H$, given by the chosen assignment corresponding to each variable node in $H$, must satisfy
$$f^H(y^H) = \eta \Big[ \sum_i \sum_{x_i} \mu_i(x_i) \phi_i(x_i) + \sum_\alpha \sum_{x_\alpha} \mu_\alpha(x_\alpha) \psi_\alpha(x_\alpha) \Big].$$

For the other direction, let $H$ be an $\eta$-cover of $G$ with cover homomorphism $h : H \to G$, and let $y^H$ be an assignment to the variables in $H$. Define $\mu_i(x_i) = \frac{1}{\eta} \sum_{j \in H, h(j) = i} 1_{y^H_j = x_i}$. Observe that $\eta \cdot \mu_i(x_i)$ is then equal to the number of times in the assignment $y^H$ that some copy in $H$ of the variable $i \in G$ is assigned the value $x_i$. Similarly, define $\mu_\alpha(x_\alpha) = \frac{1}{\eta} \sum_{\beta \in H, h(\beta) = \alpha} 1_{y^H_\beta = x_\alpha}$. With these definitions, $\mu_\alpha(x_\alpha)$ is sum-consistent (min-consistency with the min replaced by a sum) with respect to $\mu_i(x_i)$ for each $i \in \alpha$. This means that $\mu$ is feasible for the MAP LP. Finally, we have
$$f^H(y^H) = \sum_{i \in H} \phi_{h(i)}(y^H_i) + \sum_{\alpha \in H} \psi_{h(\alpha)}(y^H_\alpha) \qquad (106)$$
$$= \sum_{i \in G} \sum_{x_i} \eta \cdot \mu_i(x_i) \phi_i(x_i) + \sum_{\alpha \in G} \sum_{x_\alpha} \eta \cdot \mu_\alpha(x_\alpha) \psi_\alpha(x_\alpha). \qquad (107)$$

E. Proof of Corollary VII.6

Corollary. Let $c$ satisfy the conditions of Theorem VI.2.
Algorithm 2 converges to locally decodable beliefs, $b^G$, if and only if for every cover $H$ of $G$, $f^H$ has a unique minimizing assignment.

Proof: The lower bound has converged if no single variable update can improve the bound. By the arguments in the proof of Theorem VI.2, this must mean that for each $j$ there exists an $x^*_j$ that simultaneously minimizes $(1 - \sum_{i \in \partial j} c_{ij}) b_j(x_j)$ and $\min_{x_i} b_{ij}(x_i, x_j)$ for each $i \in \partial j$. Notice that these beliefs may or may not be min-consistent. However, as observed in the proof of Theorem VI.2, when $LB_j$ cannot be improved, it is independent of the messages passed from $i$ to $j$ for each $i \in \partial j$. As a result, we may assume that the beliefs are min-consistent, as they must have the same minima as the min-consistent beliefs.

For one direction of the proof, suppose that there exists an $x^*$ such that the beliefs on $G$ are locally decodable to $x^*$. This implies that for any cover $H'$ of $G$, the lift of $b$ to $H'$ is locally decodable to the lift of $x^*$ to $H'$. Recall that this must imply that every graph cover has a unique minimizing assignment: the lower bound is tight on each graph cover, and this implies that any minimizing assignment must simultaneously minimize the beliefs.

For the other direction, suppose that every graph cover has a unique minimizing assignment. We can construct a vector $y^*$ and a graph cover $H$ as in Theorem VII.5: if there is a unique $x_j$ that simultaneously minimizes $(1 - \sum_{i \in \partial j} c_{ij}) b_j(x_j)$ and $\min_{x_i} b_{ij}(x_i, x_j)$ for each $i \in \partial j$, then set $y^*_{j1}$ and $y^*_{j2}$ equal to this $x_j$. Otherwise, set $y^*_{j1} = 0$ and $y^*_{j2} = 1$. The 2-cover, $H$, can then be constructed using the vector $y^*$ as in the proof of the theorem.
Construct a vector $z^*$ similarly to the vector $y^*$ but with the roles of zero and one swapped: if there is a unique $x_j$ that simultaneously minimizes $(1 - \sum_{i \in \partial j} c_{ij}) b_j(x_j)$ and $\min_{x_i} b_{ij}(x_i, x_j)$ for each $i \in \partial j$, then set $z^*_{j1}$ and $z^*_{j2}$ equal to this $x_j$. Otherwise, set $z^*_{j1} = 1$ and $z^*_{j2} = 0$. By the symmetry of the construction, the vector $z^*$ also simultaneously minimizes the beliefs on $H$. Finally, as each graph cover has a unique minimizing assignment, we must have $z^* = y^*$, and the result follows: $z^* = y^*$ if and only if the beliefs on $H$ are locally decodable to $y^*$, and if $z^* = y^*$, then there exists an $x^*$ such that the beliefs on $G$ are locally decodable to $x^*$.

REFERENCES

[1] D. Angluin, "Local and global properties in networks of processors," in Proceedings of the Twelfth Annual ACM Symposium on Theory of Computing (STOC), New York, NY, USA, 1980, pp. 82-93.
[2] D. Angluin and A. Gardiner, "Finite common coverings of pairs of regular graphs," Journal of Combinatorial Theory, Series B, vol. 30, no. 2, pp. 184-187, 1981.
[3] D. Dreher, "Cycles in graphs and covers," SIAM Journal of Discrete Math, to appear.
[4] J. Feldman, M. Wainwright, and D. Karger, "Using linear programming to decode binary linear codes," Information Theory, IEEE Transactions on, vol. 51, no. 3, pp. 954-972, March 2005.
[5] B. Flach, "A diffusion algorithm for decreasing energy of max-sum labeling problem," 1998, at Fakultät Informatik, Technische Universität Dresden, Germany. Unpublished.
[6] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, January 2007.
[7] A. Globerson and T. Jaakkola, "Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations," in Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada, 2007.
[8] T. Hazan and A.
Shashua, "Norm-product belief propagation: Primal-dual message-passing for approximate inference," Information Theory, IEEE Transactions on, vol. 56, no. 12, pp. 6294-6316, Dec. 2010.
[9] T. Heskes, "On the uniqueness of loopy belief propagation fixed points," Neural Computation, vol. 16, no. 11, pp. 2379-2413, Nov. 2004.
[10] V. Kolmogorov, "Convergent tree-reweighted message passing for energy minimization," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 10, pp. 1568-1583, Oct. 2006.
[11] V. Kolmogorov and M. J. Wainwright, "On the optimality of tree-reweighted max-product message-passing," in UAI, 2005, pp. 316-323.
[12] N. Komodakis, N. Paragios, and G. Tziritas, "MRF optimization via dual decomposition: Message-passing revisited," in IEEE 11th International Conference on Computer Vision, October 2007, pp. 1-8.
[13] V. K. Koval and M. Schlesinger, "Dvumernoe programmirovanie v zadachakh analiza izobrazheniy (Two-dimensional programming in image analysis problems)," USSR Academy of Science, Automatics and Telemechanics, vol. 8, pp. 149-168, 1976.
[14] V. A. Kovalevsky and V. K. Koval, "A diffusion algorithm for decreasing energy of max-sum labeling problem," approx. 1975, at Glushkov Institute of Cybernetics, Kiev, USSR. Unpublished.
[15] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," J. Mach. Learn. Res., vol. 7, pp. 2031-2064, 2006.
[16] T. Meltzer, A. Globerson, and Y. Weiss, "Convergent message passing algorithms: a unifying view," in Uncertainty in Artificial Intelligence (UAI), Montreal, Canada, 2009.
[17] C. Moallemi and B. Van Roy, "Convergence of min-sum message passing for quadratic optimization," Information Theory, IEEE Transactions on, vol. 55, no. 5, pp. 2413-2423, May 2009.
[18] ——, "Convergence of min-sum message-passing for convex optimization," Information Theory, IEEE Transactions on, vol. 56, no. 4, pp. 2041-2050, April 2010.
[19] K. P. Murphy, Y. Weiss, and M. Jordan, "Loopy belief propagation for approximate inference: An empirical study," in Uncertainty in Artificial Intelligence (UAI), Stockholm, Sweden, 1999, pp. 467-475.
[20] J. Pearl, "Reverend Bayes on inference engines: A distributed hierarchical approach," in Proceedings of the American Association of Artificial Intelligence National Conference on AI (AAAI), Pittsburgh, PA, 1982, pp. 133-136.
[21] J. Peng, T. Hazan, D. McAllester, and R. Urtasun, "Convex max-product over compact sets for protein folding," in Proceedings of the 28th International Conference on Machine Learning (ICML-11), ser. ICML '11, L. Getoor and T. Scheffer, Eds. New York, NY, USA: ACM, June 2011, pp. 729-736.
[22] P. Ravikumar, A. Agarwal, and M. J. Wainwright, "Message-passing for graph-structured linear programs: Proximal methods and rounding schemes," J. Mach. Learn. Res., vol. 11, pp. 1043-1080, 2010.
[23] N. Ruozzi and S. Tatikonda, "Unconstrained minimization of quadratic functions via min-sum," in 44th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, March 2010.
[24] N. Ruozzi, J. Thaler, and S. Tatikonda, "Graph covers and quadratic minimization," in Communication, Control, and Computing, 47th Annual Allerton Conference on, Oct. 2009.
[25] S. Sanghavi, D. Shah, and A. S. Willsky, "Message passing for maximum weight independent set," Information Theory, IEEE Transactions on, vol. 55, no. 11, pp. 4822-4834, Nov. 2009.
[26] M. Schlesinger, "Sintaksicheskiy analiz dvumernykh zritelnikh signalov v usloviyakh pomekh (Syntactic analysis of two-dimensional visual signals in noisy conditions)," Kibernetika, vol. 4, pp. 113-130, 1976.
[27] A.
Schwing, T. Hazan, M. Pollefeys, and R. Urtasun, "Distributed message passing for large scale graphical models," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, Colorado Springs, Colorado, June 2011, pp. 1833-1840.
[28] N. Z. Shor, K. C. Kiwiel, and A. Ruszcayński, Minimization Methods for Non-Differentiable Functions. New York, NY, USA: Springer-Verlag New York, Inc., 1985.
[29] D. Sontag and T. Jaakkola, "Tree block coordinate descent for MAP in graphical models," in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, Florida, USA, 2009.
[30] P. O. Vontobel and R. Koetter, "On the relationship between linear programming decoding and min-sum algorithm decoding," in International Symposium on Information Theory and its Applications (ISITA), Parma, Italy, Oct. 2004, pp. 991-996.
[31] ——, "Graph-cover decoding and finite-length analysis of message-passing iterative decoding of LDPC codes," CoRR, vol. abs/cs/0512078, 2005.
[32] ——, "Towards low-complexity linear programming decoding," in 4th International Conference on Turbo Codes and Related Topics, Munich, Germany, April 2006.
[33] ——, "On low-complexity linear-programming decoding of LDPC codes," European Transactions on Telecommunications, vol. 18, no. 5, pp. 509-517, 2007.
[34] P. Vontobel, "A combinatorial characterization of the Bethe and the Kikuchi partition functions," in Information Theory and Applications Workshop (ITA), 2011, Feb. 2011, pp. 1-10.
[35] M. Wainwright, T. Jaakkola, and A. S. Willsky, "Tree consistency and bounds on the performance of the max-product algorithm and its generalizations," Statistics and Computing, vol. 14, no. 2, pp. 143-166, 2004.
[36] M. J. Wainwright, T. S. Jaakkola, and A. S.
Willsky, "Tree-reweighted belief propagation algorithms and approximate ML estimation via pseudo-moment matching," in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Key West, Florida, USA, 2003.
[37] M. Wainwright, T. Jaakkola, and A. Willsky, "MAP estimation via agreement on (hyper)trees: Message-passing and linear programming," Information Theory, IEEE Transactions on, vol. 51, no. 11, pp. 3697-3717, Nov. 2005.
[38] Y. Weiss and W. Freeman, "On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs," Information Theory, IEEE Transactions on, vol. 47, no. 2, pp. 736-744, Feb. 2001.
[39] Y. Weiss, C. Yanover, and T. Meltzer, "MAP estimation, linear programming and belief propagation with convex free energies," UAI, 2007.
[40] T. Werner, "A linear programming approach to max-sum problem: A review," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 29, no. 7, pp. 1165-1179, 2007.
[41] W. Wiegerinck and T. Heskes, "Fractional belief propagation," in 15th Neural Information Processing Systems (NIPS), S. Becker, S. Thrun, and K. Obermayer, Eds. MIT Press, 2002, pp. 438-445.
[42] J. Yedidia, Y. Wang, and S. Draper, "Divide and concur and difference-MAP BP decoders for LDPC codes," Information Theory, IEEE Transactions on, vol. 57, no. 2, pp. 786-802, Feb. 2011.
