Exactness of Belief Propagation for Some Graphical Models with Loops

Michael Chertkov
chertkov@lanl.gov
Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545

Abstract. It is well known that an arbitrary graphical model of statistical inference defined on a tree, i.e. on a graph without loops, is solved exactly and efficiently by an iterative Belief Propagation (BP) algorithm convergent to the unique minimum of the so-called Bethe free energy functional. For a general graphical model on a loopy graph the functional may show multiple minima. The iterative BP algorithm may converge to one of the minima or may not converge at all. The global minimum of the Bethe free energy functional is also not guaranteed to correspond to the optimal Maximum-Likelihood (ML) solution in the zero-temperature limit. However, there are exceptions to this general rule, discussed in [12] and [2] in two different contexts, where the zero-temperature version of the BP algorithm finds the ML solution for special models on graphs with loops. These two models share a key feature: their ML solutions can be found by an efficient Linear Programming (LP) algorithm with a Totally-Uni-Modular (TUM) matrix of constraints. Generalizing the two models, we consider a class of graphical models reducible in the zero-temperature limit to LP with TUM constraints. Assume that a gedanken algorithm, g-BP, finding the global minimum of the Bethe free energy is available. We show that in the limit of zero temperature g-BP outputs the ML solution. Our consideration is based on the equivalence established between the gapless Linear Programming (LP) relaxation of the graphical model in the $T \to 0$ limit and the respective LP version of the Bethe free energy minimization.
PACS numbers: 02.50.Tt, 64.60.Cn, 05.50.+q
Submitted to: Journal of Statistical Mechanics

1. Introduction

Belief Propagation (BP) is an algorithm that finds the ML solution or the marginal probabilities on a graph without loops, i.e. a tree. The algorithm was introduced in [8] as an efficient heuristic for decoding sparse (so-called graphical) codes. It was independently considered in the context of graphical models of artificial intelligence [16]. Originally the algorithm was thought of primarily as an iterative procedure. The authors of [24, 23], inspired by earlier works of [3] and [17] in statistical physics, suggested using a more fundamental notion: the Bethe free energy. Extrema of the Bethe free energy represent fixed points of the iterative BP algorithm on graphs with cycles. The equations describing the stationary points of the Bethe free energy and the fixed points of iterative BP are called the Belief Propagation, or Bethe-Peierls, BP equations.

The significance of BP, understood as an algorithm looking for a minimum of the Bethe free energy, was further elucidated within the framework coined loop calculus [5, 6]. It was shown that an algorithm finding an extremum of the Bethe free energy is not just an approximation or heuristic in the loopy case. It also allows explicit reconstruction of the exact inference in terms of a series, where each term corresponds to a loop on the graph. If the graphical model is dense there are many loops, and thus many contributions to the loop series. However, not all loops are equal. Consider models characterized by the same graph but different factor functions or local weights (exact definitions will follow). For such models one expects an individual loop contribution, and its significance within the loop series, to be strongly sensitive to the factor functions.
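The exactness of BP on a tree can be illustrated with its zero-temperature (min-sum) version on an open chain of Ising spins. The chain has energy $E(\sigma) = -\frac{1}{2}\sum_i J_i\sigma_i\sigma_{i+1} - \sum_i h_i\sigma_i$. This is the sign convention of the FRFI model of Section 2, but here the couplings may have either sign. The Python sketch below is our own example, not code from the cited works, and its function names are hypothetical. It passes messages in both directions along the chain. It then checks that every resulting min-marginal reproduces the brute-force ground-state energy.

```python
from itertools import product

SPINS = (1, -1)

def chain_energy(sigma, J, h):
    """E = -1/2 sum_i J[i] s_i s_{i+1} - sum_i h[i] s_i on an open chain (a tree)."""
    pair = sum(J[i] * sigma[i] * sigma[i + 1] for i in range(len(J)))
    return -0.5 * pair - sum(hi * si for hi, si in zip(h, sigma))

def min_sum_chain(J, h):
    """Zero-temperature (min-sum) BP on a chain.

    Returns, for each site i, the min-marginal b_i(s): the minimum of E over
    all configurations with sigma_i = s, assembled from left and right messages.
    """
    n = len(h)
    left = [{s: 0.0 for s in SPINS} for _ in range(n)]   # message into i from site i-1
    right = [{s: 0.0 for s in SPINS} for _ in range(n)]  # message into i from site i+1
    for i in range(1, n):
        left[i] = {s: min(left[i - 1][t] - h[i - 1] * t - 0.5 * J[i - 1] * t * s
                          for t in SPINS) for s in SPINS}
    for i in range(n - 2, -1, -1):
        right[i] = {s: min(right[i + 1][t] - h[i + 1] * t - 0.5 * J[i] * s * t
                           for t in SPINS) for s in SPINS}
    # belief = incoming messages + local field term
    return [{s: left[i][s] + right[i][s] - h[i] * s for s in SPINS} for i in range(n)]

if __name__ == "__main__":
    J = [0.7, -1.2, 0.4, 0.9]          # couplings of either sign: the chain is still a tree
    h = [0.3, -0.5, 0.1, 0.8, -0.2]
    exact = min(chain_energy(s, J, h) for s in product(SPINS, repeat=len(h)))
    print(exact, [min(b.values()) for b in min_sum_chain(J, h)])
```

The chain has no loops, so each message summarizes exactly one branch of the graph. This is why the result does not depend on the signs of the couplings. On a loopy graph the same recursion is no longer guaranteed to be exact.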
Given this sensitivity to the factor functions, it is of interest to ask the following question: are there graphical models, defined on an arbitrary graph but with specially tuned factor functions, for which BP provides exact inference? A positive answer was given, independently and for two different models, by [12] and [2].

It was shown in [12] that, for a graphical model defined on an arbitrary graph in terms of binary variables with pairwise sub-modular interaction, a properly defined version of BP yields a globally optimal Maximum Likelihood solution. That version is the linear programming relaxation underlying the tree-reweighted method of [20]. This model is equivalent to the Ferromagnetic Random Field Ising (FRFI) model popular in the statistical physics of disordered systems, see e.g. [10].

The maximum weight matching problem on a bipartite graph was analyzed in [2] and later in [18]. There it was shown that an algorithm of BP type converges to the correct ML assignment, even though the underlying graph has many short cycles. This analysis was extended in [1, 11] to the problem of weighted b-matchings on an arbitrary graph, which is yet another problem solvable exactly by BP. Closely related general results were reported in [22, 9]. These discuss a convexified version of the Bethe free energy and an iterative convex-BP scheme converging to the respective LP.

In this paper we use the Bethe free energy approach of [23] to suggest a complementary and unifying explanation of these remarkable, and somewhat surprising, results of [12, 2].

Figure 1. Scheme illustrating the sequence of transformations and relations discussed in the paper. (Panels: Original model at finite T; Integer Programming; Bethe free energy formulation; Linear Programming (A); Linear Programming (B).)
In the two subsequent Sections we consider two models:
• FRFI, discussed in [12];
• a binary model with Totally Uni-Modular (TUM) constraints, generalizing the weighted matching problem considered in [2].

Statistical weights are defined for both models in terms of a characteristic temperature, $T$. Our strategy for both models is illustrated in Fig. 1 and consists of the following three steps.

• Starting from the original setting we first go anti-clockwise, obtaining an Integer Programming (IP) formulation of the ML ($T \to 0$) version of the problem. The most important feature of the two models is that the LP relaxation of the respective IP, shown as LP-A in the Figure, is tight/exact. In both cases this reduction from IP to LP is exact because the underlying matrix of constraints is Totally Uni-Modular (TUM).
• Then we return to the original setting and move clockwise (see the Figure), first to the Bethe free energy formulation of the problem. We call the gedanken algorithm that finds the global minima of the Bethe free energy g-BP‡. In the $T \to 0$ limit the Bethe free energy turns into the respective self-energy, since the entropy term multiplied by temperature becomes irrelevant. The self-energy is a linear functional of the beliefs. Thus one arrives at an LP problem here as well, shown as LP-B in the Figure. This transformation from g-BP to LP-B is analogous to the relation between g-BP and LP decoding introduced in coding theory in [7, 19].
• Finally, we show that LP-A and LP-B are identical. This demonstrates that g-BP in the $T \to 0$ limit outputs the ML solution.

‡ The Bethe free energy is non-convex, therefore finding its global minimum at a finite temperature is not necessarily straightforward. We acknowledge the importance of this problem, but we will not discuss in this manuscript the plausibility and details of a respective iterative algorithm convergent to the global minimum of the Bethe free energy. For a comprehensive discussion of such iterative schemes we refer the interested reader to [22, 9] in the general context. For the FRFI model and the maximum weight matching model we refer to [12] and to [2, 18], respectively.

Note that convexity of the Bethe free energy at finite temperature, which plays the key role in the analysis of [12, 22, 9], is not a required part of our consideration. Moreover, the Bethe free energy of the binary model with TUM constraints is generally not convex.

2. Ferromagnetic Random Field Ising model

Consider an undirected graph $\mathcal{G}$ consisting of $n$ vertices, $V = \{1, \dots, n\}$, and weighted edges, $E$. The weight matrix $(J_{ij} \mid i, j = 1, \dots, n)$ has $J_{ij} > 0$ whenever the two vertices are connected by an edge, i.e. $j \in i$ (equivalently $i \in j$), and $J_{ij} = 0$ otherwise. It is also useful to introduce the directed version of the graph, $\mathcal{G}_d$. There, every undirected edge of $\mathcal{G}$ is replaced by two directed edges with weights $J_{i\to j} = J_{j\to i} = J_{ij}/2$. The Ferromagnetic Random Field Ising (FRFI) model is defined by the following statistical weight, associated with any configuration $\sigma = (\sigma_i = \pm 1 \mid i = 1, \dots, n)$ on $V$:
$$p(\sigma) = Z^{-1}\exp\Big(\frac{1}{2T}\sum_{(i,j)\in\mathcal{G}} J_{ij}\sigma_i\sigma_j + \frac{1}{T}\sum_i h_i\sigma_i\Big) = Z^{-1}\exp\Big(\frac{1}{2T}\sum_{(i\to j)\in\mathcal{G}_d} J_{i\to j}\sigma_i\sigma_j + \frac{1}{T}\sum_i h_i\sigma_i\Big), \qquad (1)$$
Here:
• $h_i$ can be positive or negative;
• $T$ is the temperature;
• $Z$ is the partition function, enforcing the probability normalization condition $\sum_\sigma p(\sigma) = 1$;
• $(i,j)$ / $(i\to j)$ marks an undirected/directed edge of $\mathcal{G}$ / $\mathcal{G}_d$ connecting the two neighbors $i$ and $j$.

2.1. From FRFI to the Min-Cut Problem

The Maximum Likelihood (ground state) solution of Eq. (1) turns into the problem of quadratic integer programming
$$\min_{\sigma}\Big\{-\frac{1}{2}\sum_{(i\to j)\in\mathcal{G}_d} J_{i\to j}\sigma_i\sigma_j - \sum_{i\in\mathcal{G}_d} h_i\sigma_i \;\Big|\; \forall i\in\mathcal{G}_d:\ \sigma_i=\pm 1\Big\}. \qquad (2)$$
It is well known that any sub-modular quadratic energy function of binary variables can be minimized in polynomial time by reducing the task to the maximum flow/min-cut problem [4, 10]. The quadratic function in Eq. (2) is of this type. In this Subsection we reproduce these known results.

To unify the linear and quadratic terms in Eq. (2), one constructs a new graph, $\mathcal{G}'_d$, by adding two new nodes to $\mathcal{G}_d$: the source (s) and the destination (d), with $\sigma_s = +1$ and $\sigma_d = -1$ respectively. The (s)-node is linked to all the nodes of $\mathcal{G}_d$ with $h_i > 0$, while every node of $\mathcal{G}_d$ with $h_i < 0$ is linked to (d). The weights of the newly introduced directed edges of $\mathcal{G}'_d$ are
$$J_{s\to i} = 2h_i \ \text{if } h_i>0;\qquad J_{i\to d} = 2|h_i| \ \text{if } h_i<0. \qquad (3)$$
This results in the following version of Eq. (2):
$$\min_{\sigma}\Big\{-\frac{1}{2}\sum_{(i\to j)\in\mathcal{G}'_d} J_{i\to j}\sigma_i\sigma_j \;\Big|\; \forall i\in\mathcal{G}_d:\ \sigma_i=\pm 1;\ \sigma_s=+1;\ \sigma_d=-1\Big\}. \qquad (4)$$
The next step is the reduction from the quadratic integer programming (4) to integer linear programming. This is achieved via a transformation to the edge variables
$$\forall (i\to j)\in\mathcal{G}'_d:\quad \eta_{i\to j} = \begin{cases}1, & \sigma_i = 1,\ \sigma_j=-1,\\ 0, & \text{otherwise}.\end{cases} \qquad (5)$$
The relations can also be restated as
$$\forall (i\to j),(j\to i)\in\mathcal{G}_d:\quad \sigma_i\sigma_j + \sigma_j\sigma_i = 2 - 4\,(\eta_{i\to j}+\eta_{j\to i}), \qquad (6)$$
$$\forall (s\to i),(i\to d)\in\mathcal{G}'_d:\quad \sigma_s\sigma_i = 1 - 2\eta_{s\to i},\qquad \sigma_i\sigma_d = 1-2\eta_{i\to d}. \qquad (7)$$
Recall that $J_{i\to j} = J_{j\to i}$ for any $(i\to j),(j\to i)\in\mathcal{G}_d$. Substituting Eqs. (5, 6, 7) into Eq. (4) and changing variables from $\sigma_i = \pm 1$ to $p_i = (1-\sigma_i)/2 \in \{0,1\}$, one arrives at
$$-\frac{1}{2}\sum_{(i\to j)\in\mathcal{G}'_d} J_{i\to j} + \min_{\{\eta,p\}}\Big\{\sum_{(i\to j)\in\mathcal{G}'_d} J_{i\to j}\eta_{i\to j} \;\Big|\; \forall i\in\mathcal{G}'_d:\ p_i\in\{0,1\};\ \forall (i\to j)\in\mathcal{G}'_d:\ p_i-p_j+\eta_{i\to j}\in\{0,1\};\ p_s=0,\ p_d=1\Big\}. \qquad (8)$$
This expression is nothing but the integer programming formulation of the famous min-cut problem. It asks for the minimum weight cut that splits all the nodes of the directed graph into two parts. The group including the source node has all variables in the 0 state, while the other group, including the destination node, has all variables in the 1 state.

At the optimum of Eq. (8), at most one of $\eta_{i\to j}$ and $\eta_{j\to i}$ equals 1 for any pair of directed edges $(i\to j),(j\to i)\in\mathcal{G}_d$. Specifically, $\eta_{i\to j}+\eta_{j\to i} = 1$ if $p_i \neq p_j$ and $0$ otherwise. This suggests that Eq. (8) can be restated in terms of the undirected graph $\mathcal{G}'$. This graph is the original $\mathcal{G}$ supplemented by the source and destination vertices and by edges with the following positive weights:
$$J_{si} = 2h_i \ \text{if } h_i>0;\qquad J_{id} = 2|h_i| \ \text{if } h_i<0. \qquad (9)$$
One derives the following undirected version of Eq. (8):
$$-\frac{1}{2}\sum_{(i,j)\in\mathcal{G}'} J_{ij} + \min_{\{\eta,p\}}\Big\{\sum_{(i,j)\in\mathcal{G}'} J_{ij}\eta_{ij} \;\Big|\; \forall i\in\mathcal{G}':\ p_i\in\{0,1\};\ \forall (i,j)\in\mathcal{G}':\ \eta_{ij}\in\{0,1\};\ \forall i\in\mathcal{G}',\ \forall j\in i:\ p_i-p_j+\eta_{ij}\ge 0;\ p_s=0,\ p_d=1\Big\}. \qquad (10)$$
The min-cut problem (10) is solvable in polynomial time. Moreover, its LP relaxation is tight. The solution of the Integer Programming problem (10) is identical to the solution of the respective relaxed LP-A:
$$-\frac{1}{2}\sum_{(i,j)\in\mathcal{G}'} J_{ij} + \min_{\{\eta,p\}}\Big\{\sum_{(i,j)\in\mathcal{G}'} J_{ij}\eta_{ij} \;\Big|\; \forall i\in\mathcal{G}':\ 0\le p_i\le 1;\ \forall i\in\mathcal{G}',\ \forall j\in i:\ p_i-p_j+\eta_{ij}\ge 0;\ p_s=0,\ p_d=1\Big\}. \qquad (11)$$
The tightness of the relaxation is discussed, e.g., in [15] (see Ch. 6.1 and specifically Theorems 6.1, 6.2 in [15]).
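The chain of reductions (2) → (4) → (10) can be checked numerically on small instances. The following Python sketch is our own illustration, not code from [4, 10] or [15]. The function names, the dense Edmonds-Karp max-flow routine and the random test generator are all assumptions made for this example. The sketch proceeds as follows:
• build the undirected graph $\mathcal{G}'$ with the weights (9);
• compute a minimum s-d cut;
• read off $\sigma_i = +1$ for vertices on the source side;
• compare the resulting energy, $-\frac{1}{2}\sum_{\mathcal{G}'}J_{ij}$ plus the cut weight, with brute-force enumeration of Eq. (2).

```python
import random
from collections import deque
from itertools import product

def frfi_energy(sigma, J, h):
    """Eq. (2) on the undirected graph: -1/2 sum_(i,j) J_ij s_i s_j - sum_i h_i s_i."""
    pair = sum(w * sigma[i] * sigma[j] for (i, j), w in J.items())
    return -0.5 * pair - sum(hi * si for hi, si in zip(h, sigma))

def min_cut(cap, s, t):
    """Edmonds-Karp max-flow on a dense capacity matrix; returns (cut value, source side)."""
    n = len(cap)
    res = [row[:] for row in cap]              # residual capacities
    value = 0.0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:       # breadth-first search for an augmenting path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:                    # no path left: the reachable set is the source side
            return value, {v for v in range(n) if parent[v] != -1}
        path, v = [], t
        while v != s:                          # walk back from t to s along the BFS tree
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        value += push

def frfi_ground_state(J, h):
    """Ground state of the FRFI model (all J_ij > 0) via the s-d construction of Eq. (9)."""
    n = len(h)
    s, d = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for (i, j), w in J.items():                # ferromagnetic bond: capacity J_ij both ways
        cap[i][j] += w
        cap[j][i] += w
    for i, hi in enumerate(h):                 # field edges: J_si = 2 h_i or J_id = 2 |h_i|
        if hi > 0:
            cap[s][i] += 2 * hi
        elif hi < 0:
            cap[i][d] += 2 * abs(hi)
    cut, source_side = min_cut(cap, s, d)
    sigma = tuple(1 if i in source_side else -1 for i in range(n))
    # Energy identity behind Eq. (10): E = -1/2 * (total weight of G') + cut weight.
    offset = -0.5 * sum(J.values()) - sum(abs(hi) for hi in h)
    return sigma, offset + cut

def random_instance(n, p_edge, seed):
    """Hypothetical test generator: random ferromagnetic couplings, random-sign fields."""
    rng = random.Random(seed)
    J = {(i, j): rng.uniform(0.1, 1.0)
         for i in range(n) for j in range(i + 1, n) if rng.random() < p_edge}
    h = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return J, h

if __name__ == "__main__":
    J, h = random_instance(8, 0.5, seed=0)
    sigma, e_cut = frfi_ground_state(J, h)
    e_brute = min(frfi_energy(s, J, h) for s in product((1, -1), repeat=len(h)))
    print(sigma, round(e_cut, 6), round(e_brute, 6))
```

The exhaustive comparison limits such a check to a dozen or so spins, while the max-flow part itself scales polynomially. Any standard max-flow routine could replace the Edmonds-Karp loop used here.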
The tightness of LP-A is also closely related to the fact that the matrix of constraints in the max-flow problem, which is dual to Eq. (10), is Totally Uni-Modular (TUM). This means that every square submatrix of the matrix has determinant 0, +1 or −1 (see e.g. Ch. 13.2 of [15] for a discussion of TUM IP/LP problems).

2.2. Bethe Free Energy and Belief Propagation for FRFI

For the FRFI model of Eq. (1) we follow the general heuristic approach to graphical models suggested in [23]. One introduces beliefs, i.e. estimated probabilities, for vertices and edges, $b_i(\sigma_i)$ and $b_{ij}(\sigma_i,\sigma_j)$. These are related to each other according to
$$\forall i\ \&\ \forall j\in i:\quad b_i(\sigma_i) = \sum_{\sigma_j} b_{ij}(\sigma_i,\sigma_j), \qquad (12)$$
and also satisfy the obvious normalization condition
$$\forall i:\quad \sum_{\sigma_i} b_i(\sigma_i) = 1. \qquad (13)$$
Then the Bethe free energy functional of the beliefs is defined as
$$F = E - TS,\qquad E = -\frac{1}{2}\sum_{(i,j)}\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j)J_{ij}\sigma_i\sigma_j - \sum_i\sum_{\sigma_i} b_i(\sigma_i)h_i\sigma_i, \qquad (14)$$
$$S = -\sum_{(i,j)}\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j)\ln b_{ij}(\sigma_i,\sigma_j) + \sum_i (q_i-1)\sum_{\sigma_i} b_i(\sigma_i)\ln b_i(\sigma_i), \qquad (15)$$
where $q_i$ is the number of neighbors of vertex $i$ in $\mathcal{G}$. Introducing Lagrangian multipliers associated with the constraints (12, 13), one defines the Lagrangian functional
$$L = F + \sum_i\sum_{j\in i}\sum_{\sigma_i}\eta_{ij}(\sigma_i)\Big(b_i(\sigma_i) - \sum_{\sigma_j} b_{ij}(\sigma_i,\sigma_j)\Big) + \sum_i\lambda_i\Big(\sum_{\sigma_i} b_i(\sigma_i) - 1\Big). \qquad (16)$$
The stationary points of the Lagrangian over all its parameters (the beliefs and the Lagrangian multipliers) define the Belief Propagation (BP) equations. Iterative solution of the BP equations constitutes the celebrated BP algorithm. It is often used as an efficient heuristic for estimating marginal probabilities in sparse graphical models.

2.2.1. Ground State

In the $T \to 0$ limit the entropy term of the Bethe free energy, Eq. (14), can be neglected. Finding the absolute minimum of the Bethe free energy functional then turns into minimizing the self-energy, $E$ from Eq. (14), under the constraints (12, 13). Both the optimization functional and the constraints are linear in the beliefs. One therefore obtains the following Linear Programming optimization:
$$\min_{\{b_i;b_{ij}\}}\Big\{-\sum_{(i,j)\in\mathcal{G}}\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j)\frac{J_{ij}}{2}\sigma_i\sigma_j - \sum_{i\in\mathcal{G}}\sum_{\sigma_i} b_i(\sigma_i)h_i\sigma_i \;\Big|\; \forall i,(i,j)\in\mathcal{G}:\ b_i(\sigma_i)=\sum_{\sigma_j} b_{ij}(\sigma_i,\sigma_j),\ \sum_{\sigma_i} b_i(\sigma_i)=1\Big\}, \qquad (17)$$
Here we also assume that all the beliefs are non-negative and smaller than or equal to unity, since we look only for physically sensible solutions of the optimization problem.

Next, we pass from the original graph $\mathcal{G}$ to its extended version $\mathcal{G}'$, introducing the new edges with the weights of Eq. (9). We also fix the spins of the source and destination to $+1$ and $-1$ respectively, i.e. $b_s(+) = b_d(-) = 1$ and $b_s(-) = b_d(+) = 0$. Eq. (17) then becomes
$$\min_{\{b_i;b_{ij}\}}\Big\{-\sum_{(i,j)\in\mathcal{G}'}\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j)\frac{J_{ij}}{2}\sigma_i\sigma_j \;\Big|\; \forall i,(i,j)\in\mathcal{G}':\ b_i(\sigma_i)=\sum_{\sigma_j} b_{ij}(\sigma_i,\sigma_j);\ \forall i\in\mathcal{G}':\ \sum_{\sigma_i} b_i(\sigma_i)=1;\ b_s(+)=1,\ b_d(-)=1\Big\}. \qquad (18)$$
The structure of the optimization functional in Eq. (18) suggests reducing the number of variables (beliefs), keeping only the edge variables
$$\mu_{ij}\equiv b_{ij}(+,-)+b_{ij}(-,+) = 1 - b_{ij}(+,+) - b_{ij}(-,-), \qquad (19)$$
These are the probabilities to observe the edge $(i,j)$ either in the state $(+,-)$ or in the state $(-,+)$. Thus, by construction, $0 \le \mu_{ij} \le 1$, and $\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j)\sigma_i\sigma_j = 1 - 2\mu_{ij}$.

The $\mu_{ij}$ variables defined on different edges are related to each other through the local beliefs $\pi_i = b_i(-) = 1 - b_i(+)$, which all satisfy $0 \le \pi_i \le 1$. Since $\pi_i - \pi_j = b_{ij}(-,+) - b_{ij}(+,-)$ and all edge beliefs are non-negative, one has $|\pi_i - \pi_j| \le \mu_{ij}$, i.e. $\pi_i - \pi_j + \mu_{ij} \ge 0$ for all $i$ and $j \in i$. Taking all these observations into account, one rewrites Eq. (18) as
$$-\frac{1}{2}\sum_{(i,j)\in\mathcal{G}'} J_{ij} + \min_{\{\mu,\pi\}}\Big\{\sum_{(i,j)\in\mathcal{G}'} J_{ij}\mu_{ij} \;\Big|\; \forall i\in\mathcal{G}',\ \forall j\in i:\ \pi_i-\pi_j+\mu_{ij}\ge 0,\ 0\le\mu_{ij}\le 1;\ \forall i\in\mathcal{G}':\ 0\le\pi_i\le 1;\ \pi_s=0,\ \pi_d=1\Big\}. \qquad (20)$$
Up to an obvious change of variables from $\mu$ to $\eta$ and from $\pi$ to $p$, the LP-B of Eq. (20) is identical to the LP-A of Eq. (11). The bound $\mu_{ij} \le 1$ is never active at the optimum, where $\mu_{ij} = |\pi_i - \pi_j|$. According to Theorem 6.1 of [15], the optimum of Eq. (20), or Eq. (11), is attained at integer points: $\mu_{ij}, \eta_{ij} \in \{0,1\}$ for all $(i,j) \in \mathcal{G}'$ and $\pi_i, p_i \in \{0,1\}$ for all $i \in \mathcal{G}'$. Summarizing, we have just shown that as $T \to 0$ the BP solution of the FRFI model, understood as the global minimum of the Bethe free energy, is also the ML solution of the model.

3. Binary model with Totally Uni-Modular Constraints

Consider $N$ binary variables combined in the vector $\sigma = (\sigma_i = 0,1 \mid i = 1, \dots, N)$. Associate the following normalized probability with every possible value of the vector:
$$p(\sigma) = Z^{-1}\exp\Big(-T^{-1}\sum_i h_i\sigma_i\Big)\prod_\alpha\delta\Big(\sum_i J_{\alpha i}\sigma_i,\ m_\alpha\Big), \qquad (21)$$
Here:
• $\delta(x,y)$ is one if $x = y$ and zero otherwise;
• $\alpha = 1, \dots, M$;
• the matrix $\hat{J} \equiv (J_{\alpha i} = 0,1 \mid i = 1, \dots, N;\ \alpha = 1, \dots, M)$ is Totally Uni-Modular (TUM), i.e. the determinant of every square submatrix is 0, 1 or −1;
• the vector $m = (m_\alpha \mid \alpha = 1, \dots, M)$ consists of positive integers such that $m_\alpha \le q_\alpha \equiv \sum_i J_{\alpha i}$ for all $\alpha$.

The partition function $Z$ in Eq. (21) guarantees normalization, $\sum_\sigma p(\sigma) = 1$. The model of Eq. (21) can be viewed as a graphical model defined on the bipartite graph of "bits", $\{i\}$, and "checks", $\{\alpha\}$. Other graphical interpretations are also possible. For example, in the weighted matching problem studied in [2], the binary variables of Eq. (21) are associated with the edges of the complete bipartite graph.
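The TUM property invoked above can be checked by brute force on tiny instances. The Python sketch below is only an illustration. The helper names and the exhaustive test are our own assumptions, not part of [2] or [15]. The sketch builds the 0/1 matrix $J_{\alpha i}$ for matching on the complete bipartite graph $K_{a,b}$, with one column per edge (binary variable $i$) and one row per vertex (check $\alpha$). Perfect matching corresponds to $m_\alpha = 1$. The sketch then evaluates the exact determinant of every square submatrix using rational arithmetic. As a contrast, it also tests the vertex-edge incidence matrix of a triangle, whose determinant is −2: an odd cycle breaks total unimodularity.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, value = len(A), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:                          # a row swap flips the sign
            A[c], A[pivot] = A[pivot], A[c]
            value = -value
        value *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return value

def is_totally_unimodular(A):
    """Brute-force check that every square submatrix has determinant 0, +1 or -1.
    Exponential in the size of A: only meant for tiny illustrative matrices."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if det([[A[r][c] for c in cols] for r in rows]) not in (-1, 0, 1):
                    return False
    return True

def bipartite_matching_matrix(n_left, n_right):
    """Hypothetical helper: J_{alpha i} for matching on K_{n_left, n_right}.
    Columns = edges (binary variables i), rows = vertices (checks alpha)."""
    edges = [(u, v) for u in range(n_left) for v in range(n_right)]
    A = [[0] * len(edges) for _ in range(n_left + n_right)]
    for col, (u, v) in enumerate(edges):
        A[u][col] = 1
        A[n_left + v][col] = 1
    return A

# Vertex-edge incidence matrix of a triangle (odd cycle): determinant -2, hence not TUM.
TRIANGLE = [[1, 1, 0],
            [1, 0, 1],
            [0, 1, 1]]

if __name__ == "__main__":
    print(is_totally_unimodular(bipartite_matching_matrix(2, 3)))  # expected: True
    print(is_totally_unimodular(TRIANGLE), det(TRIANGLE))          # expected: False -2
```

The number of submatrices grows combinatorially, so this check only suits matrices with a handful of rows. For bipartite incidence matrices of any size, total unimodularity follows from a standard theorem rather than from enumeration.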
For weighted matching one can show that the resulting matrix of constraints is indeed TUM.

3.1. Efficient ML solution

We first observe that finding the Maximum Likelihood solution of Eq. (21) is equivalent to the following Integer Programming (IP) problem:
$$\min_{\sigma}\Big\{\sum_i h_i\sigma_i \;\Big|\; \forall i:\ \sigma_i\in\{0,1\};\ \forall\alpha:\ \sum_i J_{\alpha i}\sigma_i = m_\alpha\Big\}. \qquad (22)$$
We relax the IP to the respective LP-A by replacing $\sigma_i \in \{0,1\}$ with $s_i \in [0,1]$:
$$\min_{s}\Big\{\sum_i h_i s_i \;\Big|\; \forall i:\ 0\le s_i\le 1;\ \forall\alpha:\ \sum_i J_{\alpha i}s_i = m_\alpha\Big\}, \qquad (23)$$
One finds that this relaxation is tight. In other words, the solutions of the IP problem and of the LP problem are exactly equivalent. This follows from the theorem (see e.g. Theorem 13.1 of [15]) stating that if $\hat{J}$ is TUM and $m$ is integer, then all vertices of the LP feasible polytope are integer. The LP optimum is therefore attained at an integer point.

3.2. Bethe Free Energy & BP

Here we discuss the Bethe free energy/Belief Propagation (BP) approach to the model defined in Eq. (21). The Bethe free energy functional is
$$F = E - TS,\qquad E = \sum_i h_i b_i(1), \qquad (24)$$
$$S = -\sum_\alpha\sum_{\sigma_\alpha} b_\alpha(\sigma_\alpha)\ln b_\alpha(\sigma_\alpha) + \sum_i (q_i-1)\sum_{\sigma_i} b_i(\sigma_i)\ln b_i(\sigma_i),$$
where $q_i \equiv \sum_\alpha J_{\alpha i}$ is the number of checks involving bit $i$. The vector $\sigma_\alpha \equiv (\sigma_i \mid \forall i \text{ s.t. } J_{\alpha i} = 1;\ \sum_i J_{\alpha i}\sigma_i = m_\alpha)$ ranges over the allowed configurations of the bits $i$ that enter constraint $\alpha$ and are consistent with it. For a given $m_\alpha$ the number of such allowed configurations is $C^{m_\alpha}_{q_\alpha} = q_\alpha!/(m_\alpha!\,(q_\alpha - m_\alpha)!)$. As usual, $b_\alpha(\sigma_\alpha)$ and $b_i(\sigma_i)$ are beliefs, i.e. estimates of the respective probabilities, associated with the constraints and the variables, respectively. The two types of beliefs are related to each other via the following compatibility constraints:
$$\forall i\ \&\ \forall\alpha \text{ s.t. } J_{\alpha i}=1:\quad b_i(\sigma_i) = \sum_{\sigma_\alpha\setminus\sigma_i} b_\alpha(\sigma_\alpha), \qquad (25)$$
and one should also impose the normalization constraint
$$\forall i:\quad \sum_{\sigma_i} b_i(\sigma_i) = 1. \qquad (26)$$
Incorporating the compatibility and normalization constraints into the variational functional via Lagrangian multipliers, one derives the Lagrangian
$$L = F + \sum_i\sum_{\alpha\ni i}\sum_{\sigma_i}\mu_{\alpha i}(\sigma_i)\Big(b_i(\sigma_i) - \sum_{\sigma_\alpha\setminus\sigma_i} b_\alpha(\sigma_\alpha)\Big) + \sum_i\lambda_i\Big(\sum_{\sigma_i} b_i(\sigma_i) - 1\Big). \qquad (27)$$
Looking for the stationary points of the Lagrangian with respect to all the beliefs and the Lagrangian multipliers, $\lambda$ and $\mu$, one arrives at the Belief Propagation equations for the problem.

3.3. $T \to 0$ limit of the Bethe free energy

In the $T \to 0$ limit the entropy term in Eq. (24) can be dropped, and the problem turns into the following LP-type minimization:
$$\min_{\{b_i;b_\alpha\}}\Big\{\sum_i h_i b_i(1) \;\Big|\; \forall i:\ 0\le b_i(\sigma_i)\le 1;\ \forall\alpha:\ b_\alpha(\sigma_\alpha)\ge 0;\ \forall i\ \&\ \forall\alpha \text{ s.t. } J_{\alpha i}=1:\ b_i(\sigma_i)=\sum_{\sigma_\alpha\setminus\sigma_i} b_\alpha(\sigma_\alpha);\ \forall i:\ \sum_{\sigma_i} b_i(\sigma_i)=1\Big\}. \qquad (28)$$
It is straightforward to verify that the beliefs associated with $\alpha$ can be completely removed from Eq. (28). The LP problem can then be restated solely in terms of the bit variables $\beta_i \equiv b_i(1) = 1 - b_i(0)$. Let us illustrate this point with a single constraint $\alpha$ having $m_\alpha = 2$ and $q_\alpha = 3$. The set of allowed $\alpha$-beliefs is then
$$b_\alpha(1,1,0),\quad b_\alpha(1,0,1),\quad b_\alpha(0,1,1), \qquad (29)$$
and the respective relations (25) between $\beta_1, \beta_2, \beta_3$ and the $\alpha$-beliefs are
$$\beta_1 = b_\alpha(1,1,0)+b_\alpha(1,0,1),\quad \beta_2 = b_\alpha(1,1,0)+b_\alpha(0,1,1),\quad \beta_3 = b_\alpha(1,0,1)+b_\alpha(0,1,1). \qquad (30)$$
On the other hand, the normalization condition, restated in terms of the $\alpha$-beliefs (29), is
$$b_\alpha(1,1,0)+b_\alpha(1,0,1)+b_\alpha(0,1,1) = 1. \qquad (31)$$
Summing Eqs. (30) and using Eq. (31), one finds
$$\beta_1+\beta_2+\beta_3 = 2. \qquad (32)$$
In general, the relation between the $\beta$ variables associated with a constraint $\alpha$ is
$$\sum_i J_{\alpha i}\beta_i = m_\alpha. \qquad (33)$$
This holds because summing the compatibility relations (25) at $\sigma_i = 1$ over the bits of check $\alpha$ counts each allowed configuration $\sigma_\alpha$ exactly $m_\alpha$ times, and the $\alpha$-beliefs sum to one. It follows that Eq. (28) reduces to a simpler LP-B problem, stated solely in terms of the $\beta$ variables:
$$\min_{\beta}\Big\{\sum_i h_i\beta_i \;\Big|\; \forall i:\ 0\le\beta_i\le 1;\ \forall\alpha:\ \sum_i J_{\alpha i}\beta_i = m_\alpha\Big\}. \qquad (34)$$
Furthermore, up to renaming $\beta_i$ as $s_i$, Eq. (34) is equivalent to Eq. (23). In other words, we have just shown that the $T \to 0$ solution of the BP equations, understood as the global minimum of the Bethe free energy, is tight: it gives exactly the ML solution of the binary model (21).

As a side remark, this suggests starting the exploration of the Bethe free energy at finite $T$ from the LP solution discussed above. Initializing BP with the (easy to obtain) LP solution might be especially useful when the Bethe free energy optimization at finite $T$ is non-convex.

4. Summary and Path Forward

In this work we discussed easy problems for which a zero-temperature BP scheme generates the exact ML result. We argued that this special feature of BP is due to the tightness of the related LP optimization, i.e. the LP outputs the ML solution as well. Our consideration was based on the flexibility and convenience of the Bethe free energy formulation, which naturally relates BP and LP. The results were illustrated on two examples, the FRFI model and the perfect matching model. We also briefly discussed a broader class of easy examples related to LP with a TUM matrix of constraints. We conclude by briefly mentioning some future challenges that follow from our analysis.

• It would be useful to continue exploring other models of statistical inference with loops that allow computationally efficient optimal solutions. For example, it would be interesting to find "easy" non-binary problems, including ones that allow efficient and optimal finite-temperature evaluation of marginal probabilities or of the partition function.
In this context, we mention two related results. For the continuous-variable Gaussian model on an arbitrary graph, the BP estimates of the means are known to be exact at any temperature whenever BP converges [21, 13]. In addition, a relation between an iterative algorithm of BP type and the quadratic optimization problem was recently established in the $T \to 0$ limit [14].
• Probably the most intriguing future challenge is to analyze problems that are not computationally easy but are still close, in some metric, to easy problems. For example, the models discussed above may not allow an explicit efficient solution when considered at finite, rather than zero, temperature. Similarly, perturbing the FRFI model with some number of local graph frustrations produces another "close to easy" problem of theoretical and applied interest. An example is a number of randomly placed negative $J_{ij}$, which violate the TUM feature of the model. As suggested in [12], BP can be used as an efficient heuristic in these "close to easy" cases. Note that in this case finding the minima of the Bethe free energy may itself be a challenge. The problem then becomes one of devising efficient algorithms for optimizing non-convex functions [25, 26]. Here the BP-convexification ideas developed in [20, 12, 22, 9] might be helpful. Notice also that the loop calculus approach of [5, 6] is another useful tool, which may come in handy in perturbative and non-perturbative analysis of these "close to easy" problems.
• BP is the algorithm of choice for decoding error-correcting codes defined on sparse graphs [8]. On the other hand, the above discussion suggests that the graphical structure need not be sparse for BP to decode optimally, or close to optimally.
Therefore, an intriguing question is: can one design a class of dense codes that are decoded optimally (or close to optimally) by an algorithm of BP type?

The author acknowledges inspiring discussions with V. Chernyak, M. Vergassola, D. Shah, B. Shraiman and M. Wainwright. The work was carried out under the auspices of the National Nuclear Security Administration of the U.S. Department of Energy at Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396. The author also acknowledges the Weston Visiting Professorship Program supporting his stay at the Weizmann Institute, where the work was completed.

Bibliography
[1] M. Bayati, C. Borgs, J. Chayes, and R. Zecchina. Belief-propagation for weighted b-matchings on arbitrary graphs and its relation to linear programs with integer solutions, 2007.
[2] M. Bayati, D. Shah, and M. Sharma. Max-product for maximum weight matching: Convergence, correctness, and LP duality. IEEE Transactions on Information Theory, 54(3):1241–1251, 2008. Also in Proc. IEEE Int. Symp. Information Theory, 2006.
[3] H.A. Bethe. Statistical theory of superlattices. Proceedings of the Royal Society of London A, 150:552, 1935.
[4] E. Boros and P.L. Hammer. Pseudo-Boolean optimization. Discrete Applied Mathematics, 123:155–225, 2002.
[5] M. Chertkov and V. Chernyak. Loop calculus in statistical physics and information science. Physical Review E, 73:065102(R), 2006.
[6] M. Chertkov and V. Chernyak. Loop series for discrete statistical models on graphs. Journal of Statistical Mechanics, page P06009, 2006.
[7] J. Feldman, M. Wainwright, and D.R. Karger. Using linear programming to decode binary linear codes. IEEE Transactions on Information Theory, 51:954, 2005.
[8] R.G. Gallager. Low Density Parity Check Codes. MIT Press, Cambridge, MA, 1963.
[9] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Proceedings of NIPS, 2007.
[10] A. Hartmann and H. Rieger. Optimization Algorithms in Physics. Wiley-VCH, Berlin, 2002.
[11] B. Huang and T. Jebara. Loopy belief propagation for bipartite maximum weight b-matching. In Proceedings of Artificial Intelligence and Statistics (AISTATS), 2007.
[12] V. Kolmogorov and M.J. Wainwright. On the optimality of tree-reweighted max-product message-passing. In Uncertainty in Artificial Intelligence, Edinburgh, Scotland, 2005.
[13] D.M. Malioutov, J.K. Johnson, and A.S. Willsky. Walk-sums and belief propagation in Gaussian graphical models. Journal of Machine Learning Research, 7:2031–2064, 2006.
[14] C.C. Moallemi and B. Van Roy. Convergence of the min-sum message passing algorithm for quadratic optimization, 2006.
[15] C.H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Dover, 1998.
[16] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Inc., San Francisco, 1988.
[17] R. Peierls. On Ising's model of ferromagnetism. Proceedings of the Cambridge Philosophical Society, 32:477–481, 1936.
[18] S. Sanghavi, D.M. Malioutov, and A. Willsky. Linear programming analysis of loopy belief propagation for weighted matching. In Proceedings of NIPS, 2007.
[19] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Technical Report 649, UC Berkeley, Department of Statistics, 2003.
[20] M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky. Tree-based reparametrization framework for approximate estimation on graphs with cycles. IEEE Transactions on Information Theory, 49(5):1120–1146, 2003.
[21] Y. Weiss and W.T. Freeman. Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 13(10):2173–2200, 2001.
[22] Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In Proceedings of UAI, 2007.
[23] J.S. Yedidia, W.T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, 2005.
[24] J.S. Yedidia, W.T. Freeman, and Y. Weiss. Generalized belief propagation, volume 13, pages 689–695. MIT Press, Cambridge, MA, 2001.
[25] A.L. Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free energies: convergent alternatives to belief propagation. Neural Computation, 14(7):1691–1722, 2002.
[26] A.L. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 15(4):915–936, 2003.
