MCMC assisted by Belief Propagation
Authors: Sungsoo Ahn, Michael Chertkov, Jinwoo Shin
Synthesis of MCMC and Belief Propagation

Sungsoo Ahn (sungsoo.ahn@kaist.ac.kr), Michael Chertkov (chertkov@lanl.gov), Jinwoo Shin (jinwoos@kaist.ac.kr)

Abstract

Markov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most popular algorithms for computational inference in Graphical Models (GM). In principle, MCMC is an exact probabilistic method which, however, often suffers from exponentially slow mixing. In contrast, BP is a deterministic method which is typically fast and empirically very successful, but which in general lacks control of accuracy over loopy graphs. In this paper, we introduce MCMC algorithms correcting the approximation error of BP, i.e., we provide a way to compensate for BP errors via a consecutive BP-aware MCMC. Our framework is based on the Loop Calculus approach, which expresses the BP error as a sum of weighted generalized loops. Although the full series is computationally intractable, it is known that a truncated series, summing up all 2-regular loops, is computable in polynomial time for planar pairwise binary GMs, and that it also provides a highly accurate approximation empirically. Motivated by this, we first propose a polynomial-time approximation MCMC scheme for the truncated series of general (non-planar) pairwise binary models. Our main idea here is to use the Worm algorithm, known to provide fast mixing in other (related) problems, and then to design an appropriate rejection scheme to sample 2-regular loops. Furthermore, we also design an efficient rejection-free MCMC scheme for approximating the full series. The main novelty underlying this design is the use of a cycle basis, which provides an efficient decomposition of the generalized loops.
In essence, the proposed MCMC schemes run on a transformed GM built upon the non-trivial BP solution, and our experiments show that this synthesis of BP and MCMC outperforms both direct MCMC and bare BP schemes.

1. Introduction

GMs express the factorization of joint multivariate probability distributions in statistics via a graph of relations between variables. The concept of GM has been used successfully in information theory, physics, artificial intelligence and machine learning [1, 2, 3, 4, 5, 6]. Of the many inference problems one can pose on a GM, computing the partition function (normalization), or equivalently marginalizing the joint distribution, is the most general problem of interest. However, this paradigmatic inference problem is known to be computationally intractable in general: formally, it is #P-hard even to approximate [7, 8]. To address this obstacle, extensive efforts have been made to develop practical approximation methods, among which MCMC-based [9] and BP-based [10] algorithms are, arguably, the most popular and practically successful ones. MCMC is exact, i.e., it converges to the correct answer, but its convergence/mixing time is, in general, exponential in the system size. On the other hand, message-passing implementations of BP typically demonstrate fast convergence, but in general lack approximation guarantees for GMs containing loops. Motivated by this complementarity of the MCMC and BP approaches, we aim here to synthesize a hybrid approach benefiting from a joint use of MCMC and BP. At a high level, our proposed scheme uses BP as the first step and then runs MCMC to correct for the approximation error of BP. To design such an "error-correcting" MCMC, we utilize the Loop Calculus approach [11], which, in a nutshell, expresses the BP error as a sum (i.e., series) of weights of the so-called generalized loops (subgraphs of a special structure).
There are several challenges one needs to overcome. First of all, to design an efficient Markov Chain (MC) sampler, one needs a scheme which allows efficient transitions between the generalized loops. Second, even if one designs such an MC capable of accessing all the generalized loops, it may mix slowly. Finally, weights of generalized loops can be positive or negative, while an individual MCMC can only generate non-negative contributions. Since approximating the full loop series (LS) is intractable in general, we first explore whether we can deal with these challenges at least in the case of the truncated LS corresponding to 2-regular loops. In fact, this problem has been analyzed in the case of planar pairwise binary GMs [12, 13], where it was shown that the 2-regular LS is computable exactly in polynomial time through a reduction to a Pfaffian (or determinant) computation [14]. In particular, the partition function of the Ising model without external field (i.e., where only pairwise factors are present) is computable exactly via the 2-regular LS. Furthermore, the authors show that in the case of general planar pairwise binary GMs, the 2-regular LS provides a highly accurate approximation empirically. Motivated by these results, we address the same question in the general (i.e., non-planar) case of pairwise binary GMs via MCMC. For the choice of MC, we adopt the Worm algorithm [15]. We prove that, with some modification including rejections, the algorithm samples 2-regular loops (with probabilities proportional to their respective weights) in polynomial time. Then, we design a novel simulated annealing strategy using the sampler to estimate separately the positive and negative parts of the 2-regular LS. Given any $\varepsilon > 0$, this leads to an $\varepsilon$-approximation polynomial-time scheme for the 2-regular LS under a mild assumption. We next turn to estimating the full LS.
In this part, we ignore the theoretical question of establishing polynomial mixing time of the MC and instead focus on designing an empirically efficient MCMC scheme. We design an MC using a cycle basis of the graph [16] to sample generalized loops directly, without rejections. It transits from one generalized loop to another by adding or deleting a random element of the cycle basis. Using this MC sampler, we design a simulated annealing strategy for estimating the full LS, similar to the one used earlier to estimate the 2-regular LS. Notice that even though the prime focus of this paper is on pairwise binary GMs, the proposed MCMC scheme allows straightforward generalization to general non-binary GMs. In summary, we propose novel MCMC schemes to estimate the LS correction to the BP contribution to the partition function. Since already the bare BP provides a highly non-trivial estimate of the partition function, it is naturally expected, and confirmed in our experimental results, that the proposed algorithm outperforms other standard (not related to BP) MCMC schemes applied to the original GM. We believe that our approach provides a new angle on approximate inference in GMs and is of broader interest to various applications involving GMs.

2. Preliminaries

2.1 Graphical models and belief propagation

Given an undirected graph $G = (V, E)$ with $|V| = n$, $|E| = m$, a pairwise binary Markov Random Field (MRF) defines the following joint probability distribution on $x = [x_v \in \{0,1\} : v \in V]$:
$$p(x) = \frac{1}{Z} \prod_{v \in V} \psi_v(x_v) \prod_{(u,v) \in E} \psi_{u,v}(x_u, x_v), \qquad Z := \sum_{x \in \{0,1\}^n} \prod_{v \in V} \psi_v(x_v) \prod_{(u,v) \in E} \psi_{u,v}(x_u, x_v),$$
where $\psi_v, \psi_{u,v}$ are non-negative functions, called compatibility or factor functions, and the normalization constant $Z$ is called the partition function. Without loss of generality, we assume $G$ is connected.
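For concreteness, the definition above can be checked numerically on small instances. The following is a minimal sketch of ours (not from the paper); the function and data-structure names are our own. It evaluates $Z$ by brute-force enumeration over all $2^n$ configurations, which is feasible only for tiny models but useful as ground truth:

```python
import itertools

def partition_function(n, edges, psi_v, psi_e):
    """Brute-force Z: sum over all x in {0,1}^n of the product of
    vertex factors psi_v[v][x_v] and edge factors psi_e[(u,v)][x_u][x_v]."""
    Z = 0.0
    for x in itertools.product((0, 1), repeat=n):
        weight = 1.0
        for v in range(n):
            weight *= psi_v[v][x[v]]
        for (u, v) in edges:
            weight *= psi_e[(u, v)][x[u]][x[v]]
        Z += weight
    return Z

# Two-node model with a single "attractive" edge factor.
psi_v = [[1.0, 1.0], [1.0, 1.0]]
psi_e = {(0, 1): [[2.0, 1.0], [1.0, 2.0]]}
print(partition_function(2, [(0, 1)], psi_v, psi_e))  # 2+1+1+2 = 6.0
```

Any approximation discussed below (Bethe, loop-corrected, MCMC) can be compared against this exact value on such toy models.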
It is known that approximating the partition function is #P-hard in general [8]. Belief Propagation (BP) is a popular message-passing heuristic for approximating marginal distributions of the MRF. The BP algorithm iterates the following message updates for all $(u,v) \in E$:
$$m^{t+1}_{u \to v}(x_v) \propto \sum_{x_u \in \{0,1\}} \psi_{u,v}(x_u, x_v)\, \psi_u(x_u) \prod_{w \in N(u) \setminus v} m^t_{w \to u}(x_u),$$
where $N(v)$ denotes the set of neighbors of $v$. In general BP may fail to converge, in which case one may substitute it with a somewhat more involved algorithm provably convergent to a BP fixed point [22, 23, 24]. Estimates for the marginal probabilities are expressed via the fixed-point messages $\{m_{u \to v} : (u,v) \in E\}$ as follows: $\tau_v(x_v) \propto \psi_v(x_v) \prod_{u \in N(v)} m_{u \to v}(x_v)$ and
$$\tau_{u,v}(x_u, x_v) \propto \psi_u(x_u)\,\psi_v(x_v)\,\psi_{u,v}(x_u, x_v) \prod_{w \in N(u) \setminus v} m_{w \to u}(x_u) \prod_{w \in N(v) \setminus u} m_{w \to v}(x_v).$$

2.2 Bethe approximation and loop calculus

The BP marginals also yield the following Bethe approximation for the partition function $Z$:
$$\log Z_{\mathrm{Bethe}} = \sum_{v \in V} \sum_{x_v} \tau_v(x_v) \log \psi_v(x_v) + \sum_{(u,v) \in E} \sum_{x_u, x_v} \tau_{u,v}(x_u, x_v) \log \psi_{u,v}(x_u, x_v) - \sum_{v \in V} \sum_{x_v} \tau_v(x_v) \log \tau_v(x_v) - \sum_{(u,v) \in E} \sum_{x_u, x_v} \tau_{u,v}(x_u, x_v) \log \frac{\tau_{u,v}(x_u, x_v)}{\tau_u(x_u)\,\tau_v(x_v)}.$$
If the graph $G$ is a tree, the Bethe approximation is exact, i.e., $Z_{\mathrm{Bethe}} = Z$. In general, however, i.e., for graphs with cycles, the BP algorithm provides an often rather accurate, but still approximate, answer. The Loop Series (LS) [11] expresses the ratio $Z / Z_{\mathrm{Bethe}}$ as the following sum/series:
$$\frac{Z}{Z_{\mathrm{Bethe}}} = Z_{\mathrm{Loop}} := \sum_{F \in \mathcal{L}} w(F), \qquad w(\emptyset) = 1,$$
$$w(F) := \prod_{(u,v) \in E_F} \left( \frac{\tau_{u,v}(1,1)}{\tau_u(1)\,\tau_v(1)} - 1 \right) \prod_{v \in V_F} \left( \tau_v(1) + (-1)^{d_F(v)}\, \tau_v(1) \left( \frac{\tau_v(1)}{1 - \tau_v(1)} \right)^{d_F(v)-1} \right),$$
where each term/weight is associated with a so-called generalized loop $F$, and $\mathcal{L}$ denotes the set of all generalized loops in the graph $G$ (including the empty subgraph $\emptyset$).
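The message updates of Section 2.1 can be sketched in a few lines of code. The implementation below is our own illustration (synchronous updates with per-message normalization; names are ours). On a tree, the resulting marginals $\tau_v$ are exact, which gives a simple correctness check:

```python
def bp_marginals(n, edges, psi_v, psi_e, iters=100):
    """Synchronous BP for a pairwise binary MRF.
    psi_e is keyed by one orientation (u, v) with value [x_u][x_v]."""
    nbrs = {v: [] for v in range(n)}
    for (u, v) in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    def edge_factor(u, v, xu, xv):
        # look the factor up in whichever orientation it was stored
        return psi_e[(u, v)][xu][xv] if (u, v) in psi_e else psi_e[(v, u)][xv][xu]

    # messages m[(u, v)] = [m_{u->v}(0), m_{u->v}(1)], initialized uniform
    msgs = {(u, v): [1.0, 1.0] for (a, b) in edges for (u, v) in ((a, b), (b, a))}
    for _ in range(iters):
        new = {}
        for (u, v) in msgs:
            m = [0.0, 0.0]
            for xv in (0, 1):
                for xu in (0, 1):
                    prod = psi_v[u][xu] * edge_factor(u, v, xu, xv)
                    for w in nbrs[u]:
                        if w != v:
                            prod *= msgs[(w, u)][xu]
                    m[xv] += prod
            s = m[0] + m[1]
            new[(u, v)] = [m[0] / s, m[1] / s]
        msgs = new

    taus = []  # marginal estimates tau_v(x_v)
    for v in range(n):
        b = [psi_v[v][x] for x in (0, 1)]
        for u in nbrs[v]:
            for x in (0, 1):
                b[x] *= msgs[(u, v)][x]
        s = b[0] + b[1]
        taus.append([b[0] / s, b[1] / s])
    return taus

# Two-node tree with a field on node 0; by enumeration p(x_1 = 1) = 7/12.
psi_v = [[1.0, 3.0], [1.0, 1.0]]
psi_e = {(0, 1): [[2.0, 1.0], [1.0, 2.0]]}
taus = bp_marginals(2, [(0, 1)], psi_v, psi_e)
print(taus[1][1])  # ~0.5833 (= 7/12), exact because the graph is a tree
```

On loopy graphs the same code runs unchanged but the marginals, and hence $Z_{\mathrm{Bethe}}$, are only approximations, which is precisely the error the loop series quantifies.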
Here, a subgraph $F$ of $G$ is called a generalized loop if every vertex $v \in F$ has degree $d_F(v)$ (in the subgraph) no smaller than 2. Since the number of generalized loops is exponentially large, computing $Z_{\mathrm{Loop}}$ is intractable in general. However, the following truncated sum of $Z_{\mathrm{Loop}}$, called the 2-regular loop series, is known to be computable in polynomial time if $G$ is planar [12]:
$$Z_{\text{2-Loop}} := \sum_{F \in \mathcal{L}_{\text{2-Loop}}} w(F),$$
where $\mathcal{L}_{\text{2-Loop}}$ denotes the set of all 2-regular generalized loops, i.e., $F \in \mathcal{L}_{\text{2-Loop}}$ if $d_F(v) = 2$ for every vertex $v$ of $F$. (Note that the number of 2-regular loops is also exponentially large in general.) One can check that $Z_{\mathrm{Loop}} = Z_{\text{2-Loop}}$ for the Ising model without external fields. Furthermore, as stated in [12, 13] for the general case, $Z_{\text{2-Loop}}$ provides a good empirical estimate of $Z_{\mathrm{Loop}}$.

3. Estimating the 2-regular loop series via MCMC

In this section, we describe how the 2-regular loop series $Z_{\text{2-Loop}}$ can be estimated in polynomial time. To this end, we first assume that the maximum degree $\Delta$ of the graph $G$ is at most 3. This degree-constrained assumption is not really restrictive, since any pairwise binary model can easily be expressed as an equivalent one with $\Delta \leq 3$; see the supplementary material. The rest of this section consists of two parts. We first propose an algorithm generating a 2-regular loop sample with probability proportional to the absolute value of its weight, i.e.,
$$\pi_{\text{2-Loop}}(F) := \frac{|w(F)|}{Z^{\dagger}_{\text{2-Loop}}}, \qquad Z^{\dagger}_{\text{2-Loop}} = \sum_{F \in \mathcal{L}_{\text{2-Loop}}} |w(F)|.$$
Note that the 2-regular loop weight factorizes over edges: for any $F \in \mathcal{L}_{\text{2-Loop}}$,
$$w(F) = \prod_{e \in F} w(e), \quad \text{where} \quad w(e) := \frac{\tau_{u,v}(1,1) - \tau_u(1)\,\tau_v(1)}{\sqrt{\tau_u(1)\,\tau_v(1)\,(1-\tau_u(1))(1-\tau_v(1))}}, \tag{1}$$
and in particular $|w(F)| = \prod_{e \in F} |w(e)|$. In the second part, we use the sampler constructed in the first part to design a simulated annealing scheme to estimate $Z_{\text{2-Loop}}$.
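The loop objects just defined are easy to enumerate on toy graphs, which is useful for sanity checks. Below is a brute-force sketch of ours (not the paper's code) that classifies every edge subset of a small graph. On the complete graph $K_4$ ($n=4$, $m=6$, $\Delta = 3$), the 2-regular loops coincide with the Eulerian subgraphs, whose number is the cycle-space count $2^{m-n+1} = 8$ used later for $Z^{\dagger}_{\text{2-Loop}}(0)$:

```python
import itertools
from collections import Counter

def classify_subgraphs(edges):
    """Split all edge subsets into generalized loops (every incident
    vertex has degree >= 2) and 2-regular loops (every incident vertex
    has degree exactly 2). The empty subgraph counts as both, by convention."""
    gen, two_reg = [], []
    for r in range(len(edges) + 1):
        for sub in itertools.combinations(edges, r):
            deg = Counter()
            for (u, v) in sub:
                deg[u] += 1
                deg[v] += 1
            if all(d >= 2 for d in deg.values()):
                gen.append(set(sub))
            if all(d == 2 for d in deg.values()):
                two_reg.append(set(sub))
    return gen, two_reg

# K4: n = 4 vertices, m = 6 edges, so 2^(m-n+1) = 8.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
gen, two_reg = classify_subgraphs(k4)
print(len(two_reg))  # 8: the empty set, four triangles, and three 4-cycles
```

Of course such enumeration is exponential in $m$; it is exactly this blow-up that motivates the MCMC schemes below.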
3.1 Sampling 2-regular loops

We suggest sampling the 2-regular loops distributed according to $\pi_{\text{2-Loop}}$ through a version of the Worm algorithm proposed by Prokof'ev and Svistunov [15]. It can be viewed as an MC exploring the set $\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$, where $\mathcal{L}_{\text{2-Odd}}$ is the set of all subgraphs of $G$ with exactly two odd-degree vertices. Given the current state $F \in \mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$, it chooses the next state $F'$ as follows:

1. If $F \in \mathcal{L}_{\text{2-Odd}}$, pick a random odd-degree vertex $v$ (uniformly) from $F$. Otherwise, pick a random vertex $v$ (uniformly) from $V$.
2. Choose a random neighbor $u$ of $v$ (uniformly) within $G$, and set $F' \leftarrow F$ initially.
3. Update $F' \leftarrow F \oplus \{u, v\}$ with probability
$$\begin{cases} \min\left\{ \dfrac{1}{n} \dfrac{|w(F \oplus \{u,v\})|}{|w(F)|},\; 1 \right\} & \text{if } F \in \mathcal{L}_{\text{2-Loop}}, \\[4pt] \min\left\{ \dfrac{n}{4} \dfrac{|w(F \oplus \{u,v\})|}{|w(F)|},\; 1 \right\} & \text{else if } F \oplus \{u,v\} \in \mathcal{L}_{\text{2-Loop}}, \\[4pt] \min\left\{ \dfrac{d(v)}{2\,d(u)} \dfrac{|w(F \oplus \{u,v\})|}{|w(F)|},\; 1 \right\} & \text{else if } F,\, F \oplus \{u,v\} \in \mathcal{L}_{\text{2-Odd}}. \end{cases}$$

Here $\oplus$ denotes the symmetric difference, and for $F \in \mathcal{L}_{\text{2-Odd}}$ the weight is defined as $w(F) = \prod_{e \in F} w(e)$. In essence, a Worm move either deletes or adds a single edge of the current subgraph $F$. Building on the Worm algorithm, the following algorithm samples 2-regular loops with probability $\pi_{\text{2-Loop}}$, simply by rejecting $F$ whenever $F \in \mathcal{L}_{\text{2-Odd}}$.

Algorithm 1: Sampling 2-regular loops
1: Input: number of trials $N$; number of iterations $T$ of the Worm algorithm.
2: Output: 2-regular loop $F$.
3: for $i = 1 \to N$ do
4:   Set $F \leftarrow \emptyset$ and update it $T$ times by running the Worm algorithm.
5:   if $F$ is a 2-regular loop then
6:     BREAK and output $F$.
7:   end if
8: end for
9: Output $F = \emptyset$.

The following theorem states that Algorithm 1 can generate the desired random sample in polynomial time.

Theorem 1. Given $\delta > 0$, choose the inputs of Algorithm 1 as $N \geq 1.2\, n \log(3\delta^{-1})$ and $T \geq (m - n + 1)\log 2 + 4\Delta m n^4 \log(3n\delta^{-1})$. Then it follows that
$$\frac{1}{2} \sum_{F \in \mathcal{L}_{\text{2-Loop}}} \Big| P\big[\text{Algorithm 1 outputs } F\big] - \pi_{\text{2-Loop}}(F) \Big| \leq \delta,$$
namely, the total variation distance between $\pi_{\text{2-Loop}}$ and the output distribution of Algorithm 1 is at most $\delta$. The proof of the above theorem is presented in the supplementary material due to the space constraint. In the proof, we first show that the MC induced by the Worm algorithm mixes in polynomial time, and then prove that acceptance of a 2-regular loop, i.e., line 6 of Algorithm 1, occurs with high probability. Notice that the uniform-weight version of the former claim, i.e., fast mixing, was recently proven in [18]; for completeness of exposition, we present the proof for the general case of interest to us. The latter claim, i.e., high acceptance, requires bounding $|\mathcal{L}_{\text{2-Loop}}|$ and $|\mathcal{L}_{\text{2-Odd}}|$ to show that the probability of sampling a 2-regular loop under the Worm algorithm is $1/\mathrm{poly}(n)$ for some polynomial function $\mathrm{poly}(n)$.

3.2 Simulated annealing for approximating the 2-regular loop series

Here we utilize Theorem 1 to describe an algorithm approximating the 2-regular LS $Z_{\text{2-Loop}}$ in polynomial time. To achieve this goal, we rely on the simulated annealing strategy [19], which requires choosing a monotone cooling schedule $\beta_0, \beta_1, \ldots, \beta_{\ell-1}, \beta_\ell$, where $\beta_\ell$ corresponds to the target counting problem and $\beta_0$ to its relaxed, easy version. Thus, designing an appropriate cooling strategy is the first challenge to address. We will also describe how to deal with the issue that $Z_{\text{2-Loop}}$ is a sum of positive and negative terms, while most simulated annealing strategies in the literature have mainly studied sums of non-negative terms. This second challenge is related to the so-called 'fermion sign problem' common in the statistical mechanics of quantum systems [25]. Before we describe the proposed algorithm in detail, let us provide an intuitive sketch.
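Before moving to the annealing scheme, the Worm transition of Section 3.1 and the rejection filter of Algorithm 1 can be illustrated in code. The sketch below is our own simplified version: it preserves the state space $\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$ and the Metropolis filter on $|w|$, but replaces the paper's proposal-correction constants ($1/n$, $n/4$, $d(v)/2d(u)$) with a plain Metropolis rule, so it illustrates the move structure rather than the exact analyzed chain:

```python
import random
from collections import Counter

def odd_vertices(F):
    """Vertices of odd degree in the edge set F."""
    deg = Counter()
    for (u, v) in F:
        deg[u] += 1
        deg[v] += 1
    return [v for v, d in deg.items() if d % 2 == 1]

def worm_step(F, nodes, adj, weight, rng):
    """One simplified Worm move: pick a vertex (an odd-degree one if any
    exist, else uniform over V), toggle a random incident edge, and
    Metropolis-accept on the absolute weight ratio."""
    odd = odd_vertices(F)
    v = rng.choice(odd) if odd else rng.choice(nodes)
    u = rng.choice(adj[v])
    e = (min(u, v), max(u, v))
    F2 = F ^ {e}
    if rng.random() < min(1.0, weight(F2) / weight(F)):
        return F2
    return F

def sample_2regular(nodes, adj, weight, trials=20, iters=200, seed=1):
    """Algorithm 1 in miniature: run the chain, accept the final state
    only if it is 2-regular (no odd-degree vertex), else retry."""
    rng = random.Random(seed)
    for _ in range(trials):
        F = frozenset()
        for _ in range(iters):
            F = worm_step(F, nodes, adj, weight, rng)
        if not odd_vertices(F):
            return F
    return frozenset()

# 4-cycle: the only 2-regular loops are the empty set and the full cycle.
nodes = [0, 1, 2, 3]
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
F = sample_2regular(nodes, adj, weight=lambda S: 0.5 ** len(S))
assert not odd_vertices(F) and len(F) in (0, 4)
```

Note that each move toggles one edge, so the parity of the number of odd-degree vertices changes by exactly two, which is why the chain never leaves $\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$.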
The proposed algorithm consists of two parts: a) estimating $Z^{\dagger}_{\text{2-Loop}}$ via a simulated annealing strategy, and b) estimating $Z_{\text{2-Loop}} / Z^{\dagger}_{\text{2-Loop}}$ by counting samples corresponding to negative terms in the 2-regular loop series. First consider the following $\beta$-parametrized auxiliary distribution over 2-regular loops:
$$\pi_{\text{2-Loop}}(F : \beta) = \frac{|w(F)|^{\beta}}{Z^{\dagger}_{\text{2-Loop}}(\beta)}, \qquad 0 \leq \beta \leq 1. \tag{2}$$
Note that one can generate samples approximately distributed according to (2) in polynomial time using Algorithm 1 by setting $w \leftarrow w^{\beta}$. Indeed, it follows that for $\beta' > \beta$,
$$\frac{Z^{\dagger}_{\text{2-Loop}}(\beta')}{Z^{\dagger}_{\text{2-Loop}}(\beta)} = \sum_{F \in \mathcal{L}_{\text{2-Loop}}} |w(F)|^{\beta' - \beta}\, \frac{|w(F)|^{\beta}}{Z^{\dagger}_{\text{2-Loop}}(\beta)} = \mathbb{E}_{\pi_{\text{2-Loop}}(\beta)}\!\left[ |w(F)|^{\beta' - \beta} \right],$$
where the expectation can be estimated using $O(1)$ samples if it is $\Theta(1)$, i.e., if $\beta'$ is sufficiently close to $\beta$. Then, for any increasing sequence $\beta_0 = 0, \beta_1, \ldots, \beta_{n-1}, \beta_n = 1$, we derive
$$Z^{\dagger}_{\text{2-Loop}} = \frac{Z^{\dagger}_{\text{2-Loop}}(\beta_n)}{Z^{\dagger}_{\text{2-Loop}}(\beta_{n-1})} \cdot \frac{Z^{\dagger}_{\text{2-Loop}}(\beta_{n-1})}{Z^{\dagger}_{\text{2-Loop}}(\beta_{n-2})} \cdots \frac{Z^{\dagger}_{\text{2-Loop}}(\beta_1)}{Z^{\dagger}_{\text{2-Loop}}(\beta_0)} \cdot Z^{\dagger}_{\text{2-Loop}}(0),$$
where it is known that $Z^{\dagger}_{\text{2-Loop}}(0)$, i.e., the total number of 2-regular loops, is exactly $2^{m-n+1}$ [16]. This allows us to estimate $Z^{\dagger}_{\text{2-Loop}}$ simply by estimating $\mathbb{E}_{\pi_{\text{2-Loop}}(\beta_i)}\!\left[ |w(F)|^{\beta_{i+1} - \beta_i} \right]$ for all $i$. Our next step is to estimate the ratio $Z_{\text{2-Loop}} / Z^{\dagger}_{\text{2-Loop}}$. Let $\mathcal{L}^{-}_{\text{2-Loop}}$ denote the set of negative 2-regular loops, i.e., $\mathcal{L}^{-}_{\text{2-Loop}} := \{F \in \mathcal{L}_{\text{2-Loop}} : w(F) < 0\}$. Then the 2-regular loop series can be expressed as
$$Z_{\text{2-Loop}} = \left(1 - \frac{2 \sum_{F \in \mathcal{L}^{-}_{\text{2-Loop}}} |w(F)|}{Z^{\dagger}_{\text{2-Loop}}}\right) Z^{\dagger}_{\text{2-Loop}} = \Big(1 - 2\, P_{\pi_{\text{2-Loop}}}\!\big[w(F) < 0\big]\Big)\, Z^{\dagger}_{\text{2-Loop}},$$
where we estimate $P_{\pi_{\text{2-Loop}}}[w(F) < 0]$ again using samples generated by Algorithm 1. We provide the formal description of the proposed algorithm and its error bound as follows.
Algorithm 2: Approximation of $Z_{\text{2-Loop}}$
1: Input: increasing sequence $\beta_0 = 0 < \beta_1 < \cdots < \beta_{n-1} < \beta_n = 1$; numbers of samples $s_1, s_2$; numbers of trials $N_1, N_2$; numbers of iterations $T_1, T_2$ for Algorithm 1.
2: for $i = 0 \to n-1$ do
3:   Generate 2-regular loops $F_1, \ldots, F_{s_1}$ for $\pi_{\text{2-Loop}}(\beta_i)$ using Algorithm 1 with inputs $N_1$ and $T_1$, and set $H_i \leftarrow \frac{1}{s_1} \sum_j |w(F_j)|^{\beta_{i+1} - \beta_i}$.
4: end for
5: Generate 2-regular loops $F_1, \ldots, F_{s_2}$ for $\pi_{\text{2-Loop}}$ using Algorithm 1 with inputs $N_2$ and $T_2$, and set $\kappa \leftarrow \frac{|\{F_j : w(F_j) < 0\}|}{s_2}$.
6: Output: $\widehat{Z}_{\text{2-Loop}} \leftarrow (1 - 2\kappa)\, 2^{m-n+1} \prod_i H_i$.

Theorem 2. Given $\varepsilon, \nu > 0$, choose the inputs of Algorithm 2 as $\beta_i = i/n$ for $i = 1, 2, \ldots, n-1$,
$$s_1 \geq 18144\, n^2 \varepsilon^{-2} w_{\min}^{-1} \lceil \log(6n\nu^{-1}) \rceil, \quad N_1 \geq 1.2\, n \log(144\, n \varepsilon^{-1} w_{\min}^{-1}), \quad T_1 \geq (m-n+1)\log 2 + 4\Delta m n^4 \log(48\, n \varepsilon^{-1} w_{\min}^{-1}),$$
$$s_2 \geq 18144\, \zeta (1 - 2\zeta)^{-2} \varepsilon^{-2} \lceil \log(3\nu^{-1}) \rceil, \quad N_2 \geq 1.2\, n \log(144\, \varepsilon^{-1} (1 - 2\zeta)^{-1}), \quad T_2 \geq (m-n+1)\log 2 + 4\Delta m n^4 \log(48\, \varepsilon^{-1} (1 - 2\zeta)^{-1}),$$
where $w_{\min} = \min_{e \in E} w(e)$ and $\zeta = P_{\pi_{\text{2-Loop}}}[w(F) < 0]$. Then the following holds:
$$P\left[ \frac{|\widehat{Z}_{\text{2-Loop}} - Z_{\text{2-Loop}}|}{Z_{\text{2-Loop}}} \leq \varepsilon \right] \geq 1 - \nu,$$
which means that Algorithm 2 estimates $Z_{\text{2-Loop}}$ within approximation ratio $1 \pm \varepsilon$ with high probability.

The proof of the above theorem is presented in the supplementary material due to the space constraint. We note that the constants entering Theorem 2 were not optimized. Theorem 2 implies that the complexity of Algorithm 2 is polynomial with respect to $n, 1/\varepsilon, 1/\nu$, under the assumption that $w_{\min}^{-1}$ and $(1 - 2 P_{\pi_{\text{2-Loop}}}[w(F) < 0])^{-1}$ are at most polynomially large. Both quantities depend on the choice of BP fixed point; however, it is unlikely (barring a degeneracy) that they become large. In particular, $P_{\pi_{\text{2-Loop}}}[w(F) < 0] = 0$ in the case of attractive models [20].
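The telescoping product behind Algorithm 2 is an exact identity, which one can verify on an enumerable toy weight set before trusting its sampled version. The sketch below (our own, with made-up weights) computes each ratio as an exact expectation under $\pi(\beta_i)$ instead of a Monte Carlo estimate, and then applies the sign-ratio correction $(1 - 2 P_{\pi}[w < 0])$:

```python
def annealed_sum(weights, betas):
    """Exact telescoping evaluation of Z = (1 - 2 P_pi[w<0]) * Z_dagger,
    where Z_dagger(b) = sum_F |w(F)|^b, and Z_dagger = Z_dagger(1) is
    rebuilt from Z_dagger(beta_0) times the chain of ratios
    E_{pi(beta_i)}[|w|^(beta_{i+1} - beta_i)]."""
    def Zd(b):
        return sum(abs(w) ** b for w in weights)

    z_dagger = Zd(betas[0])
    for b0, b1 in zip(betas, betas[1:]):
        # exact expectation under pi(b0); Algorithm 2 replaces this by
        # an empirical average H_i over samples from Algorithm 1
        ratio = sum((abs(w) ** b0 / Zd(b0)) * abs(w) ** (b1 - b0) for w in weights)
        z_dagger *= ratio
    neg_mass = sum(abs(w) for w in weights if w < 0) / Zd(betas[-1])
    return (1.0 - 2.0 * neg_mass) * z_dagger

weights = [0.5, -0.2, 0.8, -0.1, 1.5]   # toy signed loop weights
approx = annealed_sum(weights, [0.0, 0.25, 0.5, 0.75, 1.0])
print(abs(approx - sum(weights)))  # ~0: the identity is exact
```

With exact expectations the output equals $\sum_F w(F)$ for any cooling schedule; the role of the schedule in Algorithm 2 is only to keep each sampled ratio $\Theta(1)$ so that few samples suffice.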
4. Estimating the full loop series via MCMC

In this section, we aim to estimate the full loop series $Z_{\mathrm{Loop}}$. To this end, we design a novel MC sampler for generalized loops which adds (or removes) a cycle-basis element or a path to (or from) the current generalized loop. We therefore naturally start this section by introducing the necessary background on cycle bases. Then we turn to the design of the MC sampler for generalized loops. Finally, we describe a simulated annealing scheme similar to the one in the preceding section, and report its experimental performance in comparison with other methods.

4.1 Sampling generalized loops with a cycle basis

A cycle basis $\mathcal{C}$ of the graph $G$ is a minimal set of cycles which allows one to represent every Eulerian subgraph of $G$ (i.e., every subgraph containing no odd-degree vertex) as a symmetric difference of cycles in the set [16]. Let us characterize the combinatorial structure of generalized loops using the cycle basis. To this end, consider a set of paths between all pairs of vertices: $\mathcal{P} = \{P_{u,v} : u \neq v,\ u, v \in V,\ P_{u,v} \text{ is a path from } u \text{ to } v\}$, i.e., $|\mathcal{P}| = \binom{n}{2}$. Then the following theorem allows us to decompose any generalized loop with respect to any selected $\mathcal{C}$ and $\mathcal{P}$.

Theorem 3. Consider any cycle basis $\mathcal{C}$ and path set $\mathcal{P}$. Then, for any generalized loop $F$, there exists a decomposition $\mathcal{B} \subset \mathcal{C} \cup \mathcal{P}$ such that $F$ can be expressed as a symmetric difference of the elements of $\mathcal{B}$, i.e., $F = B_1 \oplus B_2 \oplus \cdots \oplus B_{k-1} \oplus B_k$ for some $B_i \in \mathcal{B}$.

The proof of the above theorem is given in the supplementary material due to the space constraint. Now, given any choice of $\mathcal{C}$ and $\mathcal{P}$, consider the following transition from $F \in \mathcal{L}$ to the next state $F'$:

1. Choose, uniformly at random, an element $B \in \mathcal{C} \cup \mathcal{P}$, and set $F' \leftarrow F$ initially.
2. If $F \oplus B \in \mathcal{L}$, update
$$F' \leftarrow \begin{cases} F \oplus B & \text{with probability } \min\left\{1,\; \dfrac{|w(F \oplus B)|}{|w(F)|}\right\}, \\[4pt] F & \text{otherwise.} \end{cases}$$
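A concrete cycle basis is easy to construct from a spanning tree: each non-tree edge, together with the tree path between its endpoints, yields one basis cycle, giving the familiar $m - n + 1$ fundamental cycles. The sketch below is our own illustration (the paper instead uses Horton's minimum-weight basis [16]); it builds such a basis and checks that every XOR-combination of basis cycles is an Eulerian subgraph:

```python
import itertools

def fundamental_cycle_basis(n, edges):
    """Cycle basis from a DFS spanning tree: one cycle per non-tree edge."""
    adj = {v: [] for v in range(n)}
    for (u, v) in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {0: None}
    stack = [0]
    while stack:                      # DFS from vertex 0 (G is connected)
        v = stack.pop()
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                stack.append(u)

    def tree_path(x):                 # set of tree edges from x up to the root
        p = set()
        while parent[x] is not None:
            p.add(tuple(sorted((x, parent[x]))))
            x = parent[x]
        return p

    tree = {tuple(sorted((v, p))) for v, p in parent.items() if p is not None}
    basis = []
    for (u, v) in edges:
        e = tuple(sorted((u, v)))
        if e not in tree:
            # non-tree edge + XOR of the two root paths = one cycle
            basis.append(frozenset(tree_path(u) ^ tree_path(v)) | {e})
    return basis

def is_eulerian(F):
    deg = {}
    for (u, v) in F:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return all(d % 2 == 0 for d in deg.values())

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
basis = fundamental_cycle_basis(4, k4)
assert len(basis) == 6 - 4 + 1        # m - n + 1 = 3 basis cycles
subsets = set()
for r in range(len(basis) + 1):
    for combo in itertools.combinations(basis, r):
        F = frozenset()
        for c in combo:
            F = F ^ c
        assert is_eulerian(F)         # XOR of cycles preserves even degrees
        subsets.add(F)
assert len(subsets) == 2 ** len(basis)  # 8 distinct Eulerian subgraphs of K4
```

Cycles alone span only the Eulerian subgraphs; the path set $\mathcal{P}$ in Theorem 3 is what extends the decomposition to generalized loops with odd-degree vertices.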
Due to Theorem 3, it is easy to check that the proposed MC is irreducible and aperiodic, i.e., ergodic, and the distribution of its $t$-th state converges, as $t \to \infty$, to the stationary distribution
$$\pi_{\mathrm{Loop}}(F) = \frac{|w(F)|}{Z^{\dagger}_{\mathrm{Loop}}}, \qquad Z^{\dagger}_{\mathrm{Loop}} = \sum_{F \in \mathcal{L}} |w(F)|.$$
One also has freedom in choosing $\mathcal{C}$ and $\mathcal{P}$. To accelerate mixing of the MC, we suggest choosing the minimum-weight cycle basis $\mathcal{C}$ and the shortest paths $\mathcal{P}$ with respect to the edge weights $\{\log |w(e)|\}$, with $w(e)$ defined in (1); these are computable using the algorithm in [16] and the Bellman-Ford algorithm [21], respectively. This encourages transitions between generalized loops with similar weights.

4.2 Simulated annealing for approximating the full loop series

Algorithm 3: Approximation of $Z_{\mathrm{Loop}}$
1: Input: decreasing sequence $\beta_0 > \beta_1 > \cdots > \beta_{\ell-1} > \beta_\ell = 1$; numbers of samples $s_0, s_1, s_2$; numbers of iterations $T_0, T_1, T_2$ for the MC described in Section 4.1.
2: Generate generalized loops $F_1, \ldots, F_{s_0}$ by running $T_0$ iterations of the MC described in Section 4.1 for $\pi_{\mathrm{Loop}}(\beta_0)$, and set $U \leftarrow \frac{s_0}{s_*} |w(F_*)|^{\beta_0}$, where $F_* = \arg\max_{F \in \{F_1, \ldots, F_{s_0}\}} |w(F)|$ and $s_*$ is the number of times $F_*$ was sampled.
3: for $i = 0 \to \ell - 1$ do
4:   Generate generalized loops $F_1, \ldots, F_{s_1}$ by running $T_1$ iterations of the MC described in Section 4.1 for $\pi_{\mathrm{Loop}}(\beta_i)$, and set $H_i \leftarrow \frac{1}{s_1} \sum_j |w(F_j)|^{\beta_{i+1} - \beta_i}$.
5: end for
6: Generate generalized loops $F_1, \ldots, F_{s_2}$ by running $T_2$ iterations of the MC described in Section 4.1 for $\pi_{\mathrm{Loop}}$, and set $\kappa \leftarrow \frac{|\{F_j : w(F_j) < 0\}|}{s_2}$.
7: Output: $\widehat{Z}_{\mathrm{Loop}} \leftarrow (1 - 2\kappa)\, U \prod_i H_i$.

Figure 1 (panels (a), (b), (c)): Plots of the log-partition function approximation error with respect to (average) interaction strength. Each point is averaged over 20 random models.

Now we are ready to describe a simulated annealing scheme for estimating $Z_{\mathrm{Loop}}$. It is similar, in principle, to that of Section 3.2.
First, we again introduce the $\beta$-parametrized auxiliary probability distribution $\pi_{\mathrm{Loop}}(F : \beta) = |w(F)|^{\beta} / Z^{\dagger}_{\mathrm{Loop}}(\beta)$. For any decreasing sequence of annealing parameters $\beta_0, \beta_1, \ldots, \beta_{\ell-1}, \beta_\ell = 1$, we derive
$$Z^{\dagger}_{\mathrm{Loop}} = \frac{Z^{\dagger}_{\mathrm{Loop}}(\beta_\ell)}{Z^{\dagger}_{\mathrm{Loop}}(\beta_{\ell-1})} \cdot \frac{Z^{\dagger}_{\mathrm{Loop}}(\beta_{\ell-1})}{Z^{\dagger}_{\mathrm{Loop}}(\beta_{\ell-2})} \cdots \frac{Z^{\dagger}_{\mathrm{Loop}}(\beta_1)}{Z^{\dagger}_{\mathrm{Loop}}(\beta_0)} \cdot Z^{\dagger}_{\mathrm{Loop}}(\beta_0).$$
Following the procedure of Section 3.2, one can estimate each ratio $Z^{\dagger}_{\mathrm{Loop}}(\beta') / Z^{\dagger}_{\mathrm{Loop}}(\beta) = \mathbb{E}_{\pi_{\mathrm{Loop}}(\beta)}[|w(F)|^{\beta' - \beta}]$ using the sampler described in Section 4.1. Moreover, $Z^{\dagger}_{\mathrm{Loop}}(\beta_0) = |w(F_*)|^{\beta_0} / P_{\pi_{\mathrm{Loop}}(\beta_0)}(F_*)$ is estimated by sampling the generalized loop $F_*$ with the highest probability $P_{\pi_{\mathrm{Loop}}(\beta_0)}(F_*)$. For large enough $\beta_0$, the approximation error becomes relatively small since $P_{\pi_{\mathrm{Loop}}(\beta_0)}(F_*) \propto |w(F_*)|^{\beta_0}$ dominates the distribution. In combination, this provides the desired approximation of $Z_{\mathrm{Loop}}$; the result is stated formally in Algorithm 3.

4.3 Experimental results

In this section, we report experimental results for computing the partition function of the Ising model and of the hard-core model. We compare Algorithm 2 of Section 3 (coined MCMC-BP-2reg) and Algorithm 3 of Section 4.2 (coined MCMC-BP-whole) with the bare Bethe approximation (coined BP) and the popular Gibbs sampler (coined MCMC-Gibbs). To make the comparison fair, we use the same annealing scheme for all MCMC schemes, thus making their running times comparable. More specifically, we generate each sample after running $T_1 = 1{,}000$ iterations of an MC and take $s_1 = 100$ samples to compute each estimate (e.g., $H_i$) at intermediate steps. As the performance measure, we use the log-partition function approximation error, defined as $|\log Z - \log Z_{\mathrm{approx}}| / |\log Z|$, where $Z_{\mathrm{approx}}$ is the output of the respective algorithm. We conducted three experiments on the $4 \times 4$ grid graph.
In our first experimental setting, we consider the Ising model with varying interaction strength and no external (magnetic) field. To prepare the model of interest, we start from the Ising model with uniform (ferromagnetic/attractive or anti-ferromagnetic/repulsive) interaction strength and then add 'glassy' variability in the interaction strength, modeled via i.i.d. Gaussian random variables with mean 0 and variance $0.5^2$, i.e., $\mathcal{N}(0, 0.5^2)$. In other words, given average interaction strength $0.3$, each interaction strength in the model is independently drawn from $\mathcal{N}(0.3, 0.5^2)$. The second experiment was conducted by adding $\mathcal{N}(0, 0.5^2)$ corrections to the external fields under the same conditions as in the first experiment. In this case we observe that BP often fails to converge, and we use the Concave Convex Procedure (CCCP) [23] to find BP fixed points. Finally, we experiment with the hard-core model on the $4 \times 4$ grid graph with a varying positive parameter $\lambda > 0$, called the 'fugacity' [26]. As seen clearly in Figure 1, BP and MCMC-Gibbs are outperformed by MCMC-BP-2reg and MCMC-BP-whole in most tested regimes of the first experiment with no external field, in which case the 2-regular loop series (LS) is equal to the full one. Even in the regimes where MCMC-Gibbs outperforms BP, our schemes correct the error of BP and perform at least as well as MCMC-Gibbs. In the experiments, we observe that the advantage of our schemes over BP is more pronounced when the error of BP is large. A theoretical reasoning behind this observation is as follows: if the performance of BP is good, i.e., the loop series (LS) is close to 1, the contribution of the empty generalized loop, i.e., $w(\emptyset)$, dominates the LS, and it becomes harder to sample the other generalized loops accurately.

5. Conclusion

In this paper, we propose new MCMC schemes for approximate inference in GMs.
The main novelty of our approach is in designing BP-aware MCs utilizing non-trivial BP solutions. In experiments, our BP-based MCMC schemes also outperform the alternatives. We anticipate that this new technique will be of interest to many applications where GMs are used for statistical reasoning.

Acknowledgements. The work of MC was carried out under the auspices of the National Nuclear Security Administration of the U.S. Department of Energy at Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396, and was partially supported by the Advanced Grid Modeling Program in the U.S. Department of Energy Office of Electricity.

References

[1] J. Pearl, "Probabilistic reasoning in intelligent systems: networks of plausible inference," Morgan Kaufmann, 2014.
[2] R. G. Gallager, "Low-density parity-check codes," IRE Transactions on Information Theory 8(1): 21-28, 1962.
[3] F. R. Kschischang and B. J. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE Journal on Selected Areas in Communications 16(2): 219-230, 1998.
[4] M. I. Jordan, ed., "Learning in graphical models," Springer Science & Business Media 89, 1998.
[5] R. J. Baxter, "Exactly solved models in statistical mechanics," Courier Corporation, 2007.
[6] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, "Learning low-level vision," International Journal of Computer Vision 40(1): 25-47, 2000.
[7] V. Chandrasekaran, N. Srebro, and P. Harsha, "Complexity of inference in graphical models," Uncertainty in Artificial Intelligence, 2008.
[8] M. Jerrum and A. Sinclair, "Polynomial-time approximation algorithms for the Ising model," SIAM Journal on Computing 22(5): 1087-1116, 1993.
[9] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An introduction to MCMC for machine learning," Machine Learning 50(1-2): 5-43, 2003.
[10] J.
Pearl, "Reverend Bayes on inference engines: a distributed hierarchical approach," AAAI, 1982.
[11] M. Chertkov and V. Y. Chernyak, "Loop series for discrete statistical models on graphs," Journal of Statistical Mechanics: Theory and Experiment 2006(6): P06009, 2006.
[12] M. Chertkov, V. Y. Chernyak, and R. Teodorescu, "Belief propagation and loop series on planar graphs," Journal of Statistical Mechanics: Theory and Experiment 2008(5): P05003, 2008.
[13] V. Gomez, H. J. Kappen, and M. Chertkov, "Approximate inference on planar graphs using loop calculus and belief propagation," Journal of Machine Learning Research 11: 1273-1296, 2010.
[14] P. W. Kasteleyn, "The statistics of dimers on a lattice," Classic Papers in Combinatorics, Birkhäuser Boston, 281-298, 2009.
[15] N. Prokof'ev and B. Svistunov, "Worm algorithms for classical statistical models," Physical Review Letters 87(16): 160601, 2001.
[16] J. D. Horton, "A polynomial-time algorithm to find the shortest cycle basis of a graph," SIAM Journal on Computing 16(2): 358-366, 1987.
[17] H. A. Kramers and G. H. Wannier, "Statistics of the two-dimensional ferromagnet. Part II," Physical Review 60(3): 263, 1941.
[18] A. Collevecchio, T. M. Garoni, T. Hyndman, and D. Tokarev, "The worm process for the Ising model is rapidly mixing," arXiv preprint arXiv:1509.03201, 2015.
[19] S. Kirkpatrick, "Optimization by simulated annealing: quantitative studies," Journal of Statistical Physics 34(5-6): 975-986, 1984.
[20] N. Ruozzi, "The Bethe partition function of log-supermodular graphical models," Advances in Neural Information Processing Systems, 2012.
[21] J. Bang-Jensen and G. Z. Gutin, "Digraphs: theory, algorithms and applications," Springer Science & Business Media, 2008.
[22] Y. W. Teh and M.
Welling, "Belief optimization for binary networks: a stable alternative to loopy belief propagation," Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 493-500, 2001.
[23] A. L. Yuille, "CCCP algorithms to minimize the Bethe and Kikuchi free energies: convergent alternatives to belief propagation," Neural Computation 14(7): 1691-1722, 2002.
[24] J. Shin, "The complexity of approximating a Bethe equilibrium," IEEE Transactions on Information Theory 60(7): 3959-3969, 2014.
[25] https://www.quora.com/Statistical-Mechanics-What-is-the-fermion-sign-problem
[26] M. Dyer, A. Frieze, and M. Jerrum, "On counting independent sets in sparse graphs," SIAM Journal on Computing 31(5): 1527-1541, 2002.
[27] J. Schweinsberg, "An $O(n^2)$ bound for the relaxation time of a Markov chain on cladograms," Random Structures & Algorithms 20(1): 59-70, 2002.

Appendix A. Transformation to an equivalent binary pairwise model with maximum degree at most 3

Figure 2: Demonstration of building an equivalent model with maximum degree $\Delta \leq 3$ by 'expanding' vertices (in grey). In the new model, one introduces an edge factor $\psi_{u,v}$ between the duplicated vertices $u, v$ (in bold) such that $\psi_{u,v}(x_u, x_v) = 1$ if $x_u = x_v$ and $\psi_{u,v}(x_u, x_v) = 0$ otherwise.

Appendix B. Proof of Theorem 1

First, note that the MC induced by the Worm algorithm converges to the stationary distribution
$$\pi_{\mathrm{WA}}(F) \propto \Psi(F) \prod_{e \in F} w(e), \qquad \Psi(F) = \begin{cases} n & \forall F \in \mathcal{L}_{\text{2-Loop}}, \\ 2 & \forall F \in \mathcal{L}_{\text{2-Odd}}. \end{cases}$$
We first prove its polynomial mixing, i.e., that it produces a sample from a distribution within the desired total variation distance of $\pi_{\mathrm{WA}}$ in a polynomial number of iterations.

Lemma 1. Given any $\delta > 0$ and any $F_0 \in \mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$, choose
$$T_{\mathrm{mix}} \geq w(F_0)^{-1} + (m - n + 1)\log 2 + 12\Delta m n^4 \log \delta^{-1},$$
and let $\pi^t_{\mathrm{WA}}(\cdot)$ denote the resulting distribution after updating $t$ times by the Worm algorithm with initial state $F_0$.
Then, it follows that
$$\frac{1}{2}\sum_{F\in\mathcal{L}_{\text{2-Loop}}\cup\mathcal{L}_{\text{2-Odd}}} \left|\pi^{T_{\mathrm{mix}}}_{WA}(F) - \pi_{WA}(F)\right| \le \delta,$$
namely, the mixing time of the MC is bounded above by $T_{\mathrm{mix}}$.

The proof of the above lemma is given in Section B.1. Collevecchio et al. [18] recently proved that the worm algorithm mixes in polynomial time when the weights are uniform, i.e., equal. We extend the result to our case of non-uniform weights. The proof is based on the method of canonical paths, which views the state space as a graph and constructs a path between every pair of states carrying a certain amount of flow defined by $\pi_{WA}$.

From Lemma 1 with parameters
$$N \le 1.2\, n \log(3\delta^{-1}), \qquad T \le (m-n+1)\log 2 + 4\Delta m n^4 \log(3n\delta^{-1}),$$
and $F_0 \leftarrow \emptyset$, we obtain that the total variation distance between $\pi_{WA}$ and the distribution of updated states in line 4 of Algorithm 1 is at most $\delta/3n$. Next, we prove that the probability of acceptance in line 6 of Algorithm 1 is sufficiently large.

Lemma 2. The probability of sampling a 2-regular loop from distribution $\pi_{WA}$ is bounded below by $n^{-1}$, i.e.,
$$\pi_{WA}(\mathcal{L}_{\text{2-Loop}}) \ge \frac{1}{n}.$$

The proof of the above lemma is given in Section B.2. It relies on the fact that the total weight of $\mathcal{L}_{\text{2-Odd}}$ is bounded by a polynomial (in $n$) multiple of that of $\mathcal{L}_{\text{2-Loop}}$.

Now we are ready to complete the proof of Theorem 1. Let $\hat{\pi}_{\text{2-Loop}}$ denote the distribution of 2-regular loops from line 6 of Algorithm 1 under the parameters as in Theorem 1. We say Algorithm 1 fails if it outputs $F = \emptyset$ from line 9. Choose the set of 2-regular loops
$$\widehat{\mathcal{L}}_{\text{2-Loop}} := \{F \in \mathcal{L}_{\text{2-Loop}} : \hat{\pi}_{\text{2-Loop}}(F) > \pi_{\text{2-Loop}}(F)\}.$$
Then the total variation distance between $\pi_{\text{2-Loop}}$ and $\hat{\pi}_{\text{2-Loop}}$ can be expressed as
$$\frac{1}{2}\sum_{F\in\mathcal{L}_{\text{2-Loop}}} |\hat{\pi}_{\text{2-Loop}}(F) - \pi_{\text{2-Loop}}(F)| = \hat{\pi}_{\text{2-Loop}}(\widehat{\mathcal{L}}_{\text{2-Loop}}) - \pi_{\text{2-Loop}}(\widehat{\mathcal{L}}_{\text{2-Loop}}).$$
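The sampling scheme that Lemmas 1 and 2 analyze can be illustrated in code. The sketch below is a simplified stand-in for Algorithm 1, not the paper's exact worm dynamics: it uses a plain single-edge-toggle Metropolis proposal (our assumption), targets $\pi_{WA} \propto \Psi(F)\prod_{e\in F} w(e)$ on $\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$, and applies the rejection step of line 6. It assumes the degree-$\le 3$ setting of Appendix A, where even-degree subgraphs coincide with 2-regular loops; all function names are ours.

```python
import random

def odd_vertices(edges):
    """Set of vertices with odd degree in the given edge set."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return {x for x, d in deg.items() if d % 2 == 1}

def weight(edges, w):
    """w_WA(F): product of edge weights over F."""
    p = 1.0
    for e in edges:
        p *= w[e]
    return p

def psi(edges, n):
    """Psi(F): n on even-degree subgraphs (2-regular loops), 2 otherwise."""
    return float(n) if not odd_vertices(edges) else 2.0

def step(state, all_edges, w, n, rng):
    """One Metropolis step with a single-edge-toggle proposal, restricted to
    states with at most two odd-degree vertices (L_2-Loop ∪ L_2-Odd)."""
    e = rng.choice(all_edges)
    prop = set(state) ^ {e}
    if len(odd_vertices(prop)) > 2:
        return state  # proposal leaves the state space: reject
    ratio = (psi(prop, n) * weight(prop, w)) / (psi(state, n) * weight(state, w))
    return frozenset(prop) if rng.random() < min(1.0, ratio) else state

def sample_two_regular(all_edges, w, n, T, N, rng):
    """Algorithm-1-style rejection: run the chain T steps from the empty set,
    accept only if the final state has all even degrees; retry up to N times."""
    for _ in range(N):
        state = frozenset()
        for _ in range(T):
            state = step(state, all_edges, w, n, rng)
        if not odd_vertices(state):
            return state
    return None  # failure, as in line 9 of Algorithm 1
```

With $T$ and $N$ chosen as in Theorem 1, Lemma 1 makes the chain's output close to $\pi_{WA}$ and Lemma 2 makes each attempt's acceptance probability at least $1/n$.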
By applying Lemma 1 and Lemma 2, we obtain the following under the parameters as in Theorem 1:
$$\hat{\pi}_{\text{2-Loop}}(\widehat{\mathcal{L}}_{\text{2-Loop}}) - \pi_{\text{2-Loop}}(\widehat{\mathcal{L}}_{\text{2-Loop}}) \overset{(a)}{\ge} \frac{\hat{\pi}_{WA}(\widehat{\mathcal{L}}_{\text{2-Loop}})}{\hat{\pi}_{WA}(\mathcal{L}_{\text{2-Loop}})} - \left(1-\hat{\pi}_{WA}(\mathcal{L}_{\text{2-Loop}})\right)^N - \pi_{\text{2-Loop}}(\widehat{\mathcal{L}}_{\text{2-Loop}})$$
$$\overset{(b)}{\ge} \frac{\pi_{WA}(\widehat{\mathcal{L}}_{\text{2-Loop}}) - \frac{\delta}{3n}}{\pi_{WA}(\mathcal{L}_{\text{2-Loop}}) + \frac{\delta}{3n}} - \left(1-\pi_{WA}(\mathcal{L}_{\text{2-Loop}}) + \frac{\delta}{3n}\right)^N - \pi_{\text{2-Loop}}(\widehat{\mathcal{L}}_{\text{2-Loop}})$$
$$\overset{(c)}{\ge} -\frac{2\delta}{3n\,\pi_{WA}(\mathcal{L}_{\text{2-Loop}})} - e^{-\left(\pi_{WA}(\mathcal{L}_{\text{2-Loop}}) - \frac{\delta}{3n}\right)N} \overset{(d)}{\ge} -\frac{2\delta}{3} - \frac{\delta}{3} = -\delta.$$
In the above, (a) comes from the fact that a sample from line 6 of Algorithm 1 follows the conditional distribution $\hat{\pi}_{WA}(\cdot)/\hat{\pi}_{WA}(\mathcal{L}_{\text{2-Loop}})$ and that the failure probability of Algorithm 1 is $(1-\hat{\pi}_{WA}(\mathcal{L}_{\text{2-Loop}}))^N$. For (b), we use the total variation distance between $\hat{\pi}_{WA}$ and $\pi_{WA}$ due to Lemma 1 and the parameters as in Theorem 1, i.e., $|\hat{\pi}_{WA}(S) - \pi_{WA}(S)| \le \delta/3n$ for all $S \subseteq \mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$. For (c), we use $\pi_{\text{2-Loop}}(\cdot) = \pi_{WA}(\cdot)/\pi_{WA}(\mathcal{L}_{\text{2-Loop}})$ and $(1-x) \le e^{-x}$ for any $x \ge 0$, and (d) follows from Lemma 2 and $N \ge n\log(3\delta^{-1})$. The converse bound $\hat{\pi}_{\text{2-Loop}}(\widehat{\mathcal{L}}_{\text{2-Loop}}) - \pi_{\text{2-Loop}}(\widehat{\mathcal{L}}_{\text{2-Loop}}) \le \delta$ is obtained similarly by considering the complementary set $\mathcal{L}_{\text{2-Loop}} \setminus \widehat{\mathcal{L}}_{\text{2-Loop}}$. This completes the proof of Theorem 1.

B.1 Proof of Lemma 1

First, let $P_{WA}$ denote the transition matrix of the MC induced by the worm algorithm in Section 3.1. Then we define the corresponding transition graph $G_{WA} = (\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}, E_{WA})$, where each vertex is a state of the MC, and edges are the state pairs with nonzero transition probability, i.e.,
$$E_{WA} = \{(A, A') \in (\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}) \times (\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}) : P_{WA}(A, A') > 0\}.$$
Our proof makes use of the following result proved in [18].

Theorem 4 (Schweinsberg 2002 [18]). Consider an irreducible and lazy MC with finite state space $\Omega$, transition matrix $P$, and transition graph $G_P$, which is reversible with respect to the distribution $\pi$.
Let $O \subseteq \Omega$ be nonempty, and for each pair $(I, J) \in \Omega \times O$, specify a path $\gamma_{I,J}$ in $G_P$ from $I$ to $J$. Let $\Gamma = \{\gamma_{I,J} : (I, J) \in \Omega \times O\}$ denote the collection of all such paths, and let $L(\Gamma)$ be the length of the longest path in $\Gamma$. For any transition $T \in E_P$, let $H_T = \{(I, J) \in \Omega \times O : T \in \gamma_{I,J}\}$. Then
$$\tau_A(\delta) \le \left[\log\frac{1}{\pi(A)} + \log\frac{1}{\delta}\right] 4\, L(\Gamma)\, \Phi(\Gamma), \qquad \text{where } \Phi(\Gamma) = \max_{(A,A')\in E_P}\ \sum_{(I,J)\in H_{(A,A')}} \frac{\pi(I)\,\pi(J)}{\pi(O)\,\pi(A)\,P(A,A')}.$$

To this end, we choose $O = \mathcal{L}_{\text{2-Loop}}$ and show that there exists a choice of paths $\Gamma = \{\gamma_{I,J} : (I, J) \in (\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}) \times \mathcal{L}_{\text{2-Loop}}\}$ such that
$$\Phi(\Gamma) \le \Delta n^4, \qquad L(\Gamma) \le m.$$
The statement of Lemma 1 then follows immediately.

We begin by specifying $\Gamma$, and then proceed to the bound on $\Phi(\Gamma)$. To this end, we fix an $[n]$-valued labeling of the vertices of $G$. The labeling induces a lexicographic total order on the edges, which in turn induces a lexicographic total order on the set of all subgraphs. In order for a state $I \in \mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$ to transit to $J \in \mathcal{L}_{\text{2-Loop}}$, it suffices to update, precisely once, each edge in $I \oplus J$. In order to describe such a path, we first show that $I \oplus J$ admits a unique disjoint partition $I \oplus J = \cup_{i=0}^{k} C_i$, where $C_0$ is either a path or a cycle and $C_1, \dots, C_k$ are cycles. Observe that since $J \in \mathcal{L}_{\text{2-Loop}}$, applying the symmetric difference with $J$ does not change the parity of the vertex degrees, and hence $I \oplus J \in \mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$. First consider the case $I \oplus J \in \mathcal{L}_{\text{2-Odd}}$. Then there exists a path between the two odd-degree vertices in $I \oplus J$, since the sum of degrees over all vertices in a connected component is even. Among such paths, we pick $C_0$ as the path with the highest order according to the $[n]$-valued vertex labeling. Now observe that $(I \oplus J) \setminus C_0 \in \mathcal{L}_{\text{2-Loop}}$ has all even degrees, and can therefore be decomposed into a disjoint set of cycles.
We choose $C_1, \dots, C_k$ uniquely by recursively extracting the cycle with the highest order: we pick $C_1$ as the cycle with the highest order in $(I \oplus J) \setminus C_0$, then pick $C_2$ from $(I \oplus J) \setminus C_0 \setminus C_1$ with the highest order, and so on. For the case $I \in \mathcal{L}_{\text{2-Loop}}$, $I \oplus J \in \mathcal{L}_{\text{2-Loop}}$ has all even degrees and similar logic yields the unique decomposition into disjoint cycles.

Now we are ready to describe $\gamma_{I,J}$, which updates the edges in $I \oplus J$ from $C_0$ to $C_k$ in order. If $C_0$ is a path, pick the endpoint with the higher label and update the edges of the path by unwinding them along the path until the other endpoint is met. In the case of a cycle, pick the vertex with the highest label and unwind the edges in a fixed orientation. Note that during the update of a cycle, the number of odd-degree vertices is at most 2, so the intermediate states remain in $\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$. As a result, we have constructed a path $\gamma_{I,J}$ for each $I \in \mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}}$ and $J \in \mathcal{L}_{\text{2-Loop}}$, where each edge of $\gamma_{I,J}$ corresponds to an update of an edge in $I \oplus J$, so that $|\gamma_{I,J}| = |I \oplus J| \le m$.

Next, we bound the corresponding $\Phi(\Gamma)$. First, let $\mathcal{L}_{\text{4-Odd}}$ denote the set of subgraphs with exactly 4 odd-degree vertices. We define a mapping $\eta_T : H_T \to \mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}} \cup \mathcal{L}_{\text{4-Odd}}$ by
$$\eta_T(I, J) := I \oplus J \oplus (A \cup e), \qquad \text{where } T = (A, A \oplus e).$$
Observe that $\eta_T(I, J)$ agrees with $I$ on the components that have already been processed, and with $J$ on those that have not. We prove that $\eta_T$ is an injection by reconstructing $I$ and $J$ from $\eta_T(I, J)$ given $T = (A, A \oplus e)$. To this end, observe that $I \oplus J = \eta_T(I, J) \oplus (A \cup e)$ is uniquely determined from $\eta_T(I, J)$ and $(A \cup e)$. Then, given $I \oplus J$, we can infer the decomposition $C_0, C_1, \dots, C_k$ of $I \oplus J$ by the rules defined previously. Moreover, the updated edge $e$ identifies the current set $C_i$ being updated. Therefore we can infer the processed part of $I \oplus J$.
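The partition of $I \oplus J$ into a path $C_0$ plus cycles $C_1, \dots, C_k$ can be sketched as follows. The cycle extraction below uses DFS rather than the paper's lexicographic highest-order rule, so it produces a valid partition of an edge set with 0 or 2 odd-degree vertices, but not necessarily the canonical one used to define $\gamma_{I,J}$; all helper names are ours.

```python
from collections import defaultdict

def norm(u, v):
    """Canonical (sorted) representation of an undirected edge."""
    return (u, v) if u < v else (v, u)

def find_cycle(edges):
    """Return the edge set of some cycle in `edges`, or None if acyclic."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    visited, parent = set(), {}
    for root in list(adj):
        if root in visited:
            continue
        stack = [(root, None)]
        while stack:
            node, par = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            parent[node] = par
            for nxt in adj[node]:
                if nxt == par:
                    continue
                if nxt in visited:
                    # back edge: recover the cycle by walking up the DFS tree
                    cycle = {norm(node, nxt)}
                    cur = node
                    while cur != nxt:
                        cycle.add(norm(cur, parent[cur]))
                        cur = parent[cur]
                    return cycle
                stack.append((nxt, node))
    return None

def decompose(edge_set):
    """Partition an edge set with 0 or 2 odd-degree vertices into
    edge-disjoint cycles plus a leftover path C_0 (possibly empty),
    mirroring I ⊕ J = C_0 ∪ C_1 ∪ ... ∪ C_k."""
    remaining = set(edge_set)
    cycles = []
    while True:
        cyc = find_cycle(remaining)
        if cyc is None:
            break
        cycles.append(cyc)
        remaining -= cyc
    return remaining, cycles
```

Removing a cycle never changes degree parities, so the acyclic remainder has the same 0 or 2 odd-degree vertices; a forest with exactly two odd-degree vertices and all other degrees even is a single path, which plays the role of $C_0$.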
Then we can recover $J$ by beginning at $A$ and unwinding the remaining edges of $I \oplus J$ that have not been processed yet. We then recover $I$ via $I = \eta_T(I, J) \oplus (A \cup e) \oplus J$, and therefore $\eta_T$ is injective.

Next, we define a weight function $w_{WA}$ such that, given an edge set $F$,
$$w_{WA}(F) := \prod_{e\in F} |w(e)|.$$
We complete the proof by showing that for any $T = (A, A') \in E_{WA}$, the following inequality holds:
$$\Phi(\Gamma) \overset{(a)}{\le} \sum_{(I,J)\in H_T} \frac{1}{\pi(\mathcal{L}_{\text{2-Loop}})} \frac{\pi(I)\,\pi(J)}{\pi(A)\,P(A,A')} \overset{(b)}{\le} \sum_{(I,J)\in H_T} \frac{2\Delta}{w_{WA}(\mathcal{L}_{\text{2-Loop}})}\, \Psi(I)\, w_{WA}(\eta_T(I,J)) \overset{(c)}{\le} \Delta n^4.$$
First, (a) holds by the definition of $\Phi$. We prove (b) by the following chain of (in)equalities:
$$\frac{1}{\pi(\mathcal{L}_{\text{2-Loop}})} \frac{\pi(I)\,\pi(J)}{\pi(A)\,P(A,A')} = \frac{1}{n\, w_{WA}(\mathcal{L}_{\text{2-Loop}})} \frac{\Psi(I)\, w_{WA}(I)\cdot n\, w_{WA}(J)}{\Psi(A)\, w_{WA}(A)\, P_{WA}(A,A')} \overset{(1)}{\le} \frac{\Psi(I)\, w_{WA}(I)\, w_{WA}(J)}{w_{WA}(\mathcal{L}_{\text{2-Loop}})} \cdot \frac{2\Delta}{w_{WA}(A\cup e)} \overset{(2)}{=} \frac{2\Delta}{w_{WA}(\mathcal{L}_{\text{2-Loop}})}\, \Psi(I)\, w_{WA}(\eta_T(I,J)).$$
In the above, (1) comes from the definition of the transition probability and (2) comes from the definition of $w_{WA}$.

Finally, we prove (c). First, we have
$$\sum_{(I,J)\in H_T} \frac{2\Delta}{w_{WA}(\mathcal{L}_{\text{2-Loop}})}\, \Psi(I)\, w_{WA}(\eta_T(I,J)) \le 2\Delta\left[(n+2) + (n+2)\,\frac{w_{WA}(\mathcal{L}_{\text{2-Odd}})}{w_{WA}(\mathcal{L}_{\text{2-Loop}})} + 2\,\frac{w_{WA}(\mathcal{L}_{\text{4-Odd}})}{w_{WA}(\mathcal{L}_{\text{2-Loop}})}\right],$$
since $\eta_T$ is an injection into $\mathcal{L}_{\text{2-Loop}} \cup \mathcal{L}_{\text{2-Odd}} \cup \mathcal{L}_{\text{4-Odd}}$ and these three sets are disjoint. Now we prove
$$\frac{w_{WA}(\mathcal{L}_{\text{2-Odd}})}{w_{WA}(\mathcal{L}_{\text{2-Loop}})} \le \binom{n}{2}, \qquad \frac{w_{WA}(\mathcal{L}_{\text{4-Odd}})}{w_{WA}(\mathcal{L}_{\text{2-Loop}})} \le \binom{n}{4},$$
which completes the proof of Lemma 1, since $(n+2) + (n+2)\binom{n}{2} + 2\binom{n}{4} \le \frac{n^4}{2}$, so that $\Phi(\Gamma) \le \Delta n^4$. To this end, we let $\mathcal{L}_{\mathrm{Odd}}(W)$ denote the set of generalized loops having $W$ as their set of odd-degree vertices.
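The weights $w_{WA}(\mathcal{L}_{\mathrm{Odd}}(W))$ admit a spin-sum representation, $\sum_{F:\,\mathrm{Odd}(F)=W}\prod_{e\in F}|w(e)| = 2^{-n}\sum_{\sigma\in\{-1,+1\}^V}\prod_{(u,v)\in E}(1+|w(e)|\sigma_u\sigma_v)\prod_{v\in W}\sigma_v$, which the chain of (in)equalities below uses. It can be checked by brute force on a small graph; a sketch with toy weights of our own choosing:

```python
from itertools import product, combinations

def lhs(vertices, edges, w, W):
    """Sum of Π|w(e)| over all edge subsets whose odd-degree set is exactly W."""
    total = 0.0
    for r in range(len(edges) + 1):
        for F in combinations(edges, r):
            deg = {v: 0 for v in vertices}
            for u, v in F:
                deg[u] += 1
                deg[v] += 1
            odd = {v for v in vertices if deg[v] % 2 == 1}
            if odd == set(W):
                p = 1.0
                for e in F:
                    p *= abs(w[e])
                total += p
    return total

def rhs(vertices, edges, w, W):
    """2^{-n} Σ_σ Π_e (1 + |w(e)| σ_u σ_v) Π_{v∈W} σ_v."""
    n = len(vertices)
    total = 0.0
    for sigma in product([-1, 1], repeat=n):
        s = dict(zip(vertices, sigma))
        term = 1.0
        for (u, v) in edges:
            term *= 1.0 + abs(w[(u, v)]) * s[u] * s[v]
        for v in W:
            term *= s[v]
        total += term
    return total / 2 ** n
```

On a triangle, both sides agree for $W=\emptyset$ and $W=\{0,1\}$, and $w_{WA}(\mathcal{L}_{\mathrm{Odd}}(W)) \le w_{WA}(\mathcal{L}_{\mathrm{Odd}}(\emptyset))$, exactly the comparison the proof requires.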
Now observe the following chain:
$$\sum_{F\in\mathcal{L}_{\mathrm{Odd}}(W)} w_{WA}(F) \overset{(a)}{=} \frac{1}{2^n}\sum_{F\subseteq E}\ \prod_{e\in F}|w(e)| \prod_{v\in V\setminus W}\left(1+(-1)^{d_F(v)}\right)\prod_{v\in W}\left(1+(-1)^{d_F(v)+1}\right)$$
$$= \frac{1}{2^n}\sum_{\sigma\in\{-1,+1\}^V}\ \sum_{F\subseteq E}\ \prod_{e\in F}|w(e)|\prod_{v\in V}\sigma_v^{d_F(v)}\prod_{v\in W}\sigma_v = \frac{1}{2^n}\sum_{\sigma\in\{-1,+1\}^V}\ \prod_{e=(u,v)\in E}\left(1+|w(e)|\sigma_u\sigma_v\right)\prod_{v\in W}\sigma_v$$
$$\overset{(b)}{\le} \frac{1}{2^n}\sum_{\sigma\in\{-1,+1\}^V}\ \prod_{e=(u,v)\in E}\left(1+|w(e)|\sigma_u\sigma_v\right) \overset{(c)}{=} \sum_{F\in\mathcal{L}_{\text{2-Loop}}} w_{WA}(F).$$
In the above, (a) comes from the fact that $1+(-1)^{d_F(v)} = 2$ if $d_F(v)$ is even and $0$ otherwise (and $1+(-1)^{d_F(v)+1} = 2$ if $d_F(v)$ is odd and $0$ otherwise), so only the terms with $\mathrm{Odd}(F) = W$ survive. For (b), the inequality comes from $1+|w(e)|\sigma_u\sigma_v \ge 0$ and $\prod_{v\in W}\sigma_v \le 1$. For (c), the equality is from the fact that $\mathcal{L}_{\text{2-Loop}} = \mathcal{L}_{\mathrm{Odd}}(\emptyset)$. Therefore we have $\sum_{F\in\mathcal{L}_{\mathrm{Odd}}(\emptyset)} w_{WA}(F) \ge \sum_{F\in\mathcal{L}_{\mathrm{Odd}}(W)} w_{WA}(F)$ for every $W$, leading to
$$\frac{w_{WA}(\mathcal{L}_{\text{2-Odd}})}{w_{WA}(\mathcal{L}_{\text{2-Loop}})} = \frac{\sum_{W\subseteq V,\,|W|=2}\ w_{WA}(\mathcal{L}_{\mathrm{Odd}}(W))}{w_{WA}(\mathcal{L}_{\text{2-Loop}})} \le \binom{n}{2},$$
and the case of $\mathcal{L}_{\text{4-Odd}}$ follows similarly with $\binom{n}{4}$. This completes the proof of Lemma 1.

B.2 Proof of Lemma 2

Given $W \subseteq V$, let $\mathcal{L}_{\mathrm{Odd}}(W)$ denote the set of generalized loops $F$ having $W$ as the set $\mathrm{Odd}(F)$ of odd-degree vertices. Now observe the following chain, identical to the one in Section B.1:
$$\sum_{F\in\mathcal{L}_{\mathrm{Odd}}(W)} w_{WA}(F) \overset{(a)}{=} \frac{1}{2^n}\sum_{F\subseteq E}\ \prod_{e\in F}|w(e)| \prod_{v\in V\setminus W}\left(1+(-1)^{d_F(v)}\right)\prod_{v\in W}\left(1+(-1)^{d_F(v)+1}\right)$$
$$= \frac{1}{2^n}\sum_{\sigma\in\{-1,+1\}^V}\ \prod_{e=(u,v)\in E}\left(1+|w(e)|\sigma_u\sigma_v\right)\prod_{v\in W}\sigma_v \overset{(b)}{\le} \frac{1}{2^n}\sum_{\sigma\in\{-1,+1\}^V}\ \prod_{e=(u,v)\in E}\left(1+|w(e)|\sigma_u\sigma_v\right) \overset{(c)}{=} \sum_{F\in\mathcal{L}_{\text{2-Loop}}} w_{WA}(F).$$
In the above, (a) comes from the fact that $1+(-1)^{d_F(v)} = 2$ if $d_F(v)$ is even and $0$ otherwise, so only the terms with $\mathrm{Odd}(F) = W$ survive. For (b), the inequality comes from $1+|w(e)|\sigma_u\sigma_v \ge 0$ and $\prod_{v\in W}\sigma_v \le 1$.
For (c), the equality is from the fact that $\mathcal{L}_{\text{2-Loop}} = \mathcal{L}_{\mathrm{Odd}}(\emptyset)$. Therefore we have $\sum_{F\in\mathcal{L}_{\mathrm{Odd}}(\emptyset)} |w_{WA}(F)| \ge \sum_{F\in\mathcal{L}_{\mathrm{Odd}}(W)} |w_{WA}(F)|$, leading to
$$\pi_{WA}(\mathcal{L}_{\text{2-Loop}}) = \frac{n\sum_{F\in\mathcal{L}_{\text{2-Loop}}}|w_{WA}(F)|}{n\sum_{F\in\mathcal{L}_{\text{2-Loop}}}|w_{WA}(F)| + 2\sum_{W\subseteq V,\,|W|=2}\ \sum_{F\in\mathcal{L}_{\mathrm{Odd}}(W)}|w_{WA}(F)|} \ge \frac{n}{n + 2\binom{n}{2}} = \frac{1}{n},$$
which completes the proof of Lemma 2.

Appendix C. Proof of Theorem 2

First, we quantify how many samples from Algorithm 1 are necessary to estimate a non-negative real-valued function $f$ on $\mathcal{L}_{\text{2-Loop}}$. To this end, we state the following lemma, which is a straightforward application of the known result in [8].

Lemma 3. Let $f$ be a non-negative real-valued function defined on $\mathcal{L}_{\text{2-Loop}}$ and bounded above by $f_{\max} \ge 0$. Given $0 < \xi \le 1$ and $0 < \eta \le 1/2$, choose
$$s \ge 504\,\xi^{-2}\lceil\log\eta^{-1}\rceil\,\frac{f_{\max}}{\mathbb{E}_{\pi_{\text{2-Loop}}}[f]}, \quad N \ge 1.2\,n\log\frac{24 f_{\max}}{\xi\,\mathbb{E}_{\pi_{\text{2-Loop}}}[f]}, \quad T \ge (m-n+1)\log 2 + 4\Delta m n^4\log\frac{8 f_{\max}}{\xi\,\mathbb{E}_{\pi_{\text{2-Loop}}}[f]},$$
and generate 2-regular loops $F_1, F_2, \dots, F_s$ using Algorithm 1 with inputs $N$ and $T$. Then, it follows that
$$\mathbb{P}\left[\frac{\left|\frac{1}{s}\sum_i f(F_i) - \mathbb{E}_{\pi_{\text{2-Loop}}}[f]\right|}{\mathbb{E}_{\pi_{\text{2-Loop}}}[f]} \le \xi\right] \ge 1-\eta,$$
namely, the samples of Algorithm 1 estimate $\mathbb{E}_{\pi_{\text{2-Loop}}}[f]$ within approximation ratio $1 \pm \xi$ with probability at least $1-\eta$.

First, recall that during each stage of simulated annealing, we approximate the expectation of $|w(F)|^{1/n}$ with respect to the distribution $\pi_{\text{2-Loop}}(\beta)$, i.e.,
$$\mathbb{E}_{\pi_{\text{2-Loop}}(\beta_i)}\left[|w(F)|^{1/n}\right] = Z^{\dagger}_{\text{2-Loop}}(\beta_{i+1})\,/\,Z^{\dagger}_{\text{2-Loop}}(\beta_i).$$
Hence, to apply Lemma 3, we bound $\max_F |w(F)|^{1/n}$ and $\mathbb{E}_{\pi_{\text{2-Loop}}(\beta)}[|w(F)|^{1/n}]$ as follows:
$$|w(F)|^{1/n} \le 1, \qquad \mathbb{E}_{\pi_{\text{2-Loop}}(\beta)}\left[|w(F)|^{1/n}\right] \ge w_{\min},$$
where the first inequality is due to $w(e) \le 1$ for any $e \in E$, and the second one is from $|F| \le n$ for any 2-regular loop $F$. Thus, from Lemma 3 with parameters
$$s \ge 18144\, n^2\varepsilon^{-2} w_{\min}^{-1}\lceil\log(6n\nu^{-1})\rceil, \quad N \ge 1.2\,n\log(144\,n\varepsilon^{-1}w_{\min}^{-1}), \quad T \ge (m-n+1)\log 2 + 4\Delta m n^4\log(48\,n\varepsilon^{-1}w_{\min}^{-1}),$$
on each stage we obtain
$$\mathbb{P}\left[\frac{\left|H_i - Z^{\dagger}_{\text{2-Loop}}(\beta_{i+1})/Z^{\dagger}_{\text{2-Loop}}(\beta_i)\right|}{Z^{\dagger}_{\text{2-Loop}}(\beta_{i+1})/Z^{\dagger}_{\text{2-Loop}}(\beta_i)} \le \frac{\varepsilon}{6n}\right] \ge 1-\frac{\nu}{6n}.$$
This implies that the product $\prod_i H_i$ estimates $Z^{\dagger}_{\text{2-Loop}}/2^{m-n+1}$ within approximation ratio in $[(1-\varepsilon/6n)^n, (1+\varepsilon/6n)^n] \subseteq [1-\varepsilon/3, 1+\varepsilon/3]$ with probability at least $(1-\nu/6n)^n \ge 1-\nu/3$, i.e.,
$$\mathbb{P}\left[\frac{\left|2^{m-n+1}\prod_i H_i - Z^{\dagger}_{\text{2-Loop}}\right|}{Z^{\dagger}_{\text{2-Loop}}} \le \frac{\varepsilon}{3}\right] \ge 1-\frac{\nu}{3}.$$

Next, we define a non-negative real-valued function $g$ on $\mathcal{L}_{\text{2-Loop}}$ as
$$g(F) = \begin{cases} 1 & \text{if } w(F) < 0,\\ 0 & \text{otherwise},\end{cases}$$
namely, $\mathbb{E}_{\pi_{\text{2-Loop}}}[g(F)] = \mathbb{P}_{\pi_{\text{2-Loop}}}[w(F) < 0]$. Since $\max_F g(F) = 1$, one can apply Lemma 3 with parameters
$$s \ge 18144\,\zeta(1-2\zeta)^{-2}\varepsilon^{-2}\lceil\log(3\nu^{-1})\rceil, \quad N \ge 1.2\,n\log(144\,\varepsilon^{-1}(1-2\zeta)^{-1}), \quad T \ge (m-n+1)\log 2 + 4\Delta m n^4\log(48\,\varepsilon^{-1}(1-2\zeta)^{-1}),$$
and have
$$\mathbb{P}\left[\frac{\left|\kappa - \mathbb{P}_{\pi_{\text{2-Loop}}}[w(F)<0]\right|}{\mathbb{P}_{\pi_{\text{2-Loop}}}[w(F)<0]} \le \frac{(1-2\,\mathbb{P}_{\pi_{\text{2-Loop}}}[w(F)<0])\,\varepsilon}{6\,\mathbb{P}_{\pi_{\text{2-Loop}}}[w(F)<0]}\right] \ge 1-\frac{\nu}{3},$$
since $\zeta = \mathbb{P}_{\pi_{\text{2-Loop}}}[w(F)<0]$. Furthermore, after some algebraic calculations, one obtains
$$\mathbb{P}\left[\frac{\left|(1-2\kappa) - (1-2\,\mathbb{P}_{\pi_{\text{2-Loop}}}[w(F)<0])\right|}{1-2\,\mathbb{P}_{\pi_{\text{2-Loop}}}[w(F)<0]} \le \frac{\varepsilon}{3}\right] \ge 1-\frac{\nu}{3}.$$
The rest of the proof is straightforward: since we estimate $Z_{\text{2-Loop}} = (1-2\,\mathbb{P}_{\pi_{\text{2-Loop}}}[w(F)<0])\,Z^{\dagger}_{\text{2-Loop}}$ by $(1-2\kappa)\,2^{m-n+1}\prod_i H_i$, the approximation ratio lies in $[(1-\varepsilon/3)^2, (1+\varepsilon/3)^2] \subseteq [1-\varepsilon, 1+\varepsilon]$ with probability at least $(1-\nu/3)^2 \ge 1-\nu$.

Appendix D. Proof of Theorem 3

Given $F \in \mathcal{L}$, denote the odd-degree vertices of $F$ (i.e., those with $d_F(\cdot)$ odd) by $v_1, v_2, \dots, v_{2\ell}$ for some integer $\ell \ge 0$. Since we assume $G$ is connected, there exists a set of paths $P_1, P_2, \dots, P_\ell$ such that $P_i$ is a path from $v_{2i-1}$ to $v_{2i}$.
Note that, given any set of edges $D \subseteq E$, $D \oplus P_i$ changes the parities of $d_D(v_{2i-1})$ and $d_D(v_{2i})$, while all other parities remain the same. Therefore, all degrees in $F \oplus P_1 \oplus \cdots \oplus P_\ell$ become even. Then, by the definition of a cycle basis, there exist some $C_1, C_2, \dots, C_k \in \mathcal{C}$ such that
$$C_1 \oplus C_2 \oplus \cdots \oplus C_k = F \oplus P_1 \oplus \cdots \oplus P_\ell, \qquad \text{namely,} \qquad F = C_1 \oplus C_2 \oplus \cdots \oplus C_k \oplus P_1 \oplus \cdots \oplus P_\ell.$$
This completes the proof of Theorem 3.
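The constructive content of Theorem 3 can be sketched with the fundamental cycle basis of a spanning tree, one concrete choice of $\mathcal{C}$ (the BFS tree and all helper names are our assumptions): pair the odd-degree vertices of $F$ by tree paths, then expand the resulting even-degree subgraph over the fundamental cycles of its non-tree edges.

```python
from collections import defaultdict, deque

def norm(u, v):
    """Canonical (sorted) representation of an undirected edge."""
    return (u, v) if u < v else (v, u)

def odd_vertices(edges):
    """Set of vertices with odd degree in the given edge set."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return {x for x, d in deg.items() if d % 2 == 1}

def bfs_tree(vertices, edges):
    """Parent map of a BFS spanning tree rooted at vertices[0] (G connected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    parent = {vertices[0]: None}
    q = deque([vertices[0]])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                q.append(y)
    return parent

def tree_path(parent, u, v):
    """Edges of the unique tree path between u and v (XOR of root paths)."""
    def to_root(x):
        es = set()
        while parent[x] is not None:
            es.add(norm(x, parent[x]))
            x = parent[x]
        return es
    return to_root(u) ^ to_root(v)

def represent(F, vertices, edges):
    """Express F as an XOR of tree paths pairing its odd-degree vertices and
    fundamental cycles (one per non-tree edge), as in Theorem 3."""
    parent = bfs_tree(vertices, edges)
    tree = {norm(x, p) for x, p in parent.items() if p is not None}
    odd = sorted(odd_vertices(F))
    paths = [tree_path(parent, odd[i], odd[i + 1]) for i in range(0, len(odd), 2)]
    R = set(F)
    for P in paths:
        R ^= P  # R now has all even degrees
    cycles = [{(u, v)} | tree_path(parent, u, v) for (u, v) in sorted(R - tree)]
    return paths, cycles
```

XOR-ing the returned paths and cycles reproduces $F$, since the symmetric difference of the fundamental cycles of the non-tree edges of an even-degree subgraph equals that subgraph.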