Monte Carlo Methods for the Ferromagnetic Potts Model Using Factor Graph Duality
Normal factor graph duality offers new possibilities for Monte Carlo algorithms in graphical models. Specifically, we consider the problem of estimating the partition function of the ferromagnetic Ising and Potts models by Monte Carlo methods, which …
Authors: Mehdi Molkaraie, Vicenc Gomez
1 Monte Carlo Methods for the Ferromagnetic Potts Model Using F actor Graph Duality Mehdi Molkaraie and V icenc ¸ G ´ omez Abstract —Normal factor graph duality offers new possibilities for Monte Carlo algorithms in graphical models. Specifically , we consider the pr oblem of estimating the partition function of the ferr omagnetic Ising and Potts models by Monte Carlo methods, which are known to work well at high temperatures, but to fail at lo w temperatures. W e propose Monte Carlo methods (uniform sampling and importance sampling) in the dual normal factor graph, and demonstrate that they behav e differently: they work particularly well at low temperatur es. By comparing the r elative error in estimating the partition function, we show that the proposed importance sampling algorithm significantly outperforms the state-of-the-art deterministic and Monte Carlo methods. For the ferromagnetic Ising model in an external field, we show the equivalence between the valid configurations in the dual normal factor graph and the terms that appear in the high-temperature series expansion of the partition function. Follo wing this result, we discuss connections with Jerrum– Sinclair’ s polynomial randomized approximation scheme (the subgraphs-world process) for evaluating the partition function of ferromagnetic Ising models. Index T erms —Potts model, Ising model, normal factor graph, partition function, dual normal factor graph, Monte Carlo meth- ods, low-temperature regime, ferromagnetism, high-temperature series expansion, subgraphs-w orld process. I . I N T R O D U C T I O N Many quantities of interest in statistical physics, combi- natorics, information theory , and machine learning can be expressed as a partition function Z 4 = X x 1 ,...,x N f ( x 1 , . . . , x N ) , (1) where f ( x 1 , . . . , x N ) is a nonnegati ve real function of finite- valued v ariables x 1 , . . . , x N and where the sum runs o ver all possible v alues of these v ariables. For e xample, if f takes values in { 0 , 1 } , then (1) counts the number of configura- tions x 1 , . . . , x N for which f ( x 1 , . . . , x N ) 6 = 0 . In statistical physics, Z is usually considered as a function of temperature (cf. Section II) and hence is called the partition function. F or large N , we are usually interested in the free energy per site 1 N ln Z rather than in Z itself. Naiv e computation of (1) is e xponential in N and practically possible only for small N . If the function f in (1) has a cycle- free factor graph (and no state variables with excessi vely many states), then the partition function can be computed exactly M. Molkaraie is with the Department of Information T echnology and Electrical Engineering, ETH Zurich, CH-8092 Z ¨ urich, Switzerland (email: mehdi.molkaraie@alumni.ethz.ch). V . G ´ omez is with the Artificial Intelli- gence and Machine Learning group at the Universitat Pompeu Fabra, 08018 Barcelona, Spain (email: vicen.gomez@upf.edu). Parts of this work were presented in [1]–[3]. by sum-product message passing [4]–[6] (with complexity typically linear in N ), which, in this context, coincides with the transfer matrix method from statistical physics [7, Chapter 2], [8, Chapter 5]. In general, howe ver , the e xact computation of the partition function is intractable for lar ge N , and even good approxima- tions can be hard to obtain. Empirically good approximations are often achie ved with deterministic methods (including the belief propagation (BP), the generalized belief propagation (GBP) [9], and the tree expectation propagation (TreeEP) [10] algorithms), b ut the accuracy of such approximations is often difficult to assess theoretically . In this paper, we pursue the idea (proposed in [1]–[3]) that normal factor graph duality offers ne w opportunities for Monte Carlo algorithms. W e de velop and test this idea for two-dimensional (2D) nearest-neighbor Ising and Potts models. Both the Ising model [11], [12] and the more general Potts model [13], [14] play an important role in many areas, including statistical physics [7], image processing [15], spatial statistics [16], and graph theory [17]–[19]. Exact computation of the partition function of the Potts model is possible only in some special cases (e.g., in the one- dimensional (1D) case). For the planar Ising model without an external field, the problem is tractable and can be reduced to ev aluating a certain determinant [20], [21], and this approach can also be used to obtain accurate approximations in more general settings [22]. Also there is a polynomial randomized approximation scheme [23, Chapter 28] for the partition func- tion of general ferromagnetic Ising models in an external field due to Jerrum and Sinclair [24]. Connections among the dual normal factor graph representation of the Ising model, the approximation scheme of Jerrum and Sinclair, and the high- temperature series expansions of the partition function will be discussed in detail in this paper . Howe ver , under reasonable complexity assumptions, there is no polynomial randomized approximation scheme for the partition function of the Potts model. Indeed, for ferromagnetic Potts models, approximating the partition function is as hard as approximating the number of independent sets in a bipartite graph, which is among the presumably intractable problems [25]–[27]. Known Monte Carlo algorithms for the partition function work very well for the Potts model at high temperature (i.e., when local correlations decay quickly) [28]–[31]. At low temperatures, howe ver , Monte Carlo methods suffer from slow and erratic conv ergence [30], [32]. More advanced Monte Carlo methods (e.g., nested sampling [33] and the Swendsen- W ang algorithm [34]) require sampling from a large sequence of constraints or intermediate distributions at different temper- 2 atures to estimate the partition function. The main challenge is, therefore, to design Monte Carlo methods that achieve sufficiently fast conv ergence in the lo w-temperature regime. The approach of this paper is based on the notions of the dual normal realization as introduced by Forne y [35] and the dual normal factor graph [36]–[38]. According to the normal factor graph duality theorem, the partition function of the dual normal factor graph equals that of the primal normal factor graph up to a known scale factor [36]. The relation of normal factor graph duality to Kramers–W annier duality [39], [14, Section II] in statistical physics has been worked out in [40]– [42]. Using Monte Carlo methods in the dual normal factor graph has been in vestigated in [1], [43], [2], [3]. It was demonstrated (by simulations) in [1] that for the 2D Ising model without an external field, baseline Monte Carlo methods con verge faster at low temperature in the dual normal factor graph than in the primal normal factor graph. Some pertinent analytical and numerical results regarding the variance of Monte Carlo methods in the two domains were giv en in [43]. A suitable partitioning of variables, which allows drawing independent samples according to an auxiliary distribution, was introduced in [2] to propose an importance sampling algorithm to estimate the partition function of the Ising model in a strong external field. The methods of [2] were further generalized in [3] to models with a mixture of strong and weak couplings. In this paper, we further explore the use of Monte Carlo methods in the dual normal factor graph. Specifically , we pro- pose Monte Carlo methods for estimating the partition function of ferromagnetic q -state Potts models, with or without an external field. W e consider uniform sampling and importance sampling algorithms, both of which are shown to work very well for strong couplings or , equiv alently , at low temperature. Our experimental results show that, in various settings, the importance sampling algorithm significantly improv es upon the state-of-the-art Monte Carlo and deterministic methods. Indeed, in contrast to Monte Carlo methods in the primal domain, the dual-domain Monte Carlo algorithms of this paper excel at low temperatures. The paper is organized as follows. In Section II, we revie w the Potts model. The primal and the dual normal factor graphs of the model will be presented in Section III and Section IV, respectiv ely . Specific Monte Carlo algorithms that use the dual normal factor graph are proposed in Section V, and pertinent experimental results (including comparisons with standard deterministic and Monte Carlo methods) are presented in Section VI. Extensions of our Monte Carlo methods to the Potts model in an e xternal field are discussed in Section VII. In Section VIII, we establish the connection among the valid configurations in the dual normal factor graph representation of the Ising model in an external field, high temperature series expansions of the partition function, and the randomized approximation scheme of Jerrum and Sinclair . Appendix A compares the variance of Monte Carlo methods in the primal and in the dual normal factor graphs of the 2D Ising model to demonstrate their opposite behavior . X 1 f 1 X 5 X 2 f 2 X 3 f 3 X 4 Fig. 1: Normal factor graph of (10). X 2 g 1 X 1 = I = X 3 g 2 X 0 1 X 00 1 g 3 X 4 Fig. 2: Normal factor graph of (11). I I . T H E 2 D P O TT S M O D E L W I T H O U T E X T E R N A L F I E L D Let X 1 , X 2 , . . . , X N be a collection of N random variables that take v alues in the set X , which in this context is identical to Z /q Z , the ring of integers modulo q for some fixed integer q ≥ 2 . (In the special case where q = 2 , we obtain the Ising model.) Let x i represent a possible realization of X i and X stand for ( X 1 , X 2 , . . . , X N ) . The vectors x ∈ X N will be called configurations. The variables X 1 , X 2 , . . . , X N are associated with the ver - tices of a simple and connected graph G = ( V , E ) that has N = |V | vertices and |E | edges. In the Potts model, each variable represents the q possible states of a particle, and two variables interact if their corresponding vertices are connected by an edge in E . For illustrative purposes, we will take G to be a grid with size N = M × M , and assume periodic boundary conditions, so that each variable has exactly four neighbors. Howe ver , the methods of this paper are easily adapted to Potts models with arbitrary topology . Indeed, we will consider 2D Potts models with free boundary conditions in our numerical experiments. Con ventionally E is defined as the set of all the unordered pairs ( k , ` ) ∈ { 1 , . . . , N } × { 1 , . . . , N } such that X k and X ` are nearest neighbors. Thus |E | = 2 N (2) for periodic boundary conditions. A real coupling parameter J k,` is associated with each pair ( k , ` ) ∈ E . The energy of a configuration x ∈ X N is giv en by the Hamiltonian H ( x ) = − X ( k,` ) ∈E J k,` · δ ( x k − x ` ) , (3) where δ ( · ) is the Kronecker delta, which ev aluates to one if its argument is zero, and to zero otherwise. In this paper, we focus on ferromagnetic models, which are characterized by the condition J k,` ≥ 0 for all ( k , ` ) ∈ E , i.e., configurations in which adjacent variables take on the same value hav e lower energy . The probability of a configuration x ∈ X N is giv en by the Boltzmann distribution p B ( x ) = e − β H ( x ) Z . (4) 3 Here, β denotes the in verse temperature and the normaliza- tion constant Z is the partition function given by Z = X x ∈X N e − β H ( x ) , (5) where the sum runs over all the configurations [7]. W e will find it con venient to omit the parameter β (i.e., we set β = 1 ) and to work with varying values of the coupling parameters J k,` . In this set-up, large values of J k,` correspond to low temperature and small values correspond to high temperature. In particular , the special case where J k,` = 0 for all ( k , ` ) ∈ E corresponds to infinite temperature. W e no w let f ( x ) = e −H ( x ) (6) = Y ( k,` ) ∈E κ k,` ( x k , x ` ) (7) with κ k,` ( x k , x ` ) = e J k,` , if x k = x ` 1 , otherwise. (8) Thus (5) becomes Z = X x ∈X N f ( x ) , (9) in agreement with (1). The factorization (7) will be used in the next section. I I I . P R I M A L N O R M A L F AC T O R G R A P H O F T H E P O TT S M O D E L W e use normal factor graphs as in [35], [5], [36], [37] (also called Forney factor graphs), where v ariables are represented by edges and factors are represented by nodes/boxes. (By contrast, factor graphs as in [4] represent both factors and variables by nodes.) For example, the factorization f ( x 1 , x 2 , x 3 , x 4 , x 5 ) = f 1 ( x 1 , x 2 , x 5 ) f 2 ( x 2 , x 3 ) f 3 ( x 3 , x 4 , x 5 ) (10) is represented by the normal factor graph in Fig. 1. W e say that a configuration x is valid if f f ( x ) 6 = 0 , and we note that only valid configurations contribute to the partition function (1). As observed in [35], in order to represent the variables by edges, each variable must be in volv ed in only one or two factors. If some v ariable appears in more than two factors as, e.g., in g ( x 1 , x 2 , x 3 , x 4 ) = g 1 ( x 1 , x 2 ) g 2 ( x 1 , x 3 ) g 3 ( x 1 , x 4 ) , (11) we introduce auxiliary variables x 0 1 and x 00 1 , and define an additional equality indicator function I = ( x 1 , x 0 1 , x 00 1 ) 4 = δ ( x 1 − x 0 1 ) · δ ( x 1 − x 00 1 ) (12) (as shown in Fig. 2) such that x 1 = x 0 1 = x 00 1 in all valid configurations. The partition function is not affected by such replications. W e next note that the Hamiltonian (3) and factors (8) can equiv alently be written as a function of y k,` = x k − x ` . T o = I = X 1 + I + ◦ X 2 = + ◦ = + ◦ = + ◦ + ◦ + ◦ + ◦ = + ◦ = + ◦ = + ◦ = + ◦ + ◦ + ◦ + ◦ = + ◦ = + ◦ = + ◦ = + ◦ + ◦ + ◦ + ◦ = + ◦ = + ◦ = + ◦ = κ 1 Fig. 3: Primal normal factor graph of the 2D Potts model. The empty boxes represent the factors (8), the boxes labeled “ = ” are equality indicator functions given by (12), the boxes labeled “ + ” are zero-sum indicator functions giv en by (14), and the symbol “ ◦ ” indicates a sign in version. The periodic boundary conditions are not shown. simplify notation, we will henceforth denote the elements of E by a single inde x v ariable e ∈ E rather than by a v ertex index pair ( k , ` ) ∈ V 2 , with adjacencies continuing to be determined by the graph G = ( V , E ) . Hence, each factor (8) can be written as κ e ( y e ) = e J e , if y e = 0 1 , otherwise. (13) Applying factors (13) in (7), we can construct the primal normal factor graph of the Potts model as shown in Fig. 3, in which the empty boxes represent (13), the boxes labeled “ = ” are instances of equality indicator functions as in (12), the boxes labeled “ + ” are instances of the zero-sum indicator functions defined as I + ( y e , x k , x ` ) 4 = δ ( y e + x k + x ` ) , (14) and in analogy with the logic N AND gate the symbol “ ◦ ” is used to indicate a sign inv ersion. (Recall that all arithmetic manipulations are done modulo q .) For q = 2 , the Potts model is equiv alent to the Ising model. Howe ver , the standard con vention is to define the Hamiltonian of the model as H ( x ) = − X ( k,` ) ∈E J k,` · 2 δ ( x k − x ` ) − 1 . (15) Follo wing a similar approach, we can obtain the primal normal factor graph of the Ising model (also shown in Fig. 3), where the empty boxes represent factors giv en by κ e ( y e ) = e J e , if y e = 0 e − J e , if y e = 1 . (16) Note that, for the Ising model the “ ◦ ” symbols are immaterial and can be removed from Fig. 3. 4 + I + ˜ Y 1 = ◦ − ˜ Y 1 + I = = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + γ 1 Fig. 4: Dual normal factor graph of the 2D Potts model. The empty boxes represent the factors (18), the boxes labeled “ = ” are equality indicator functions given by (12), the boxes labeled “ + ” are zero-sum indicator functions giv en by (14), and the symbol “ ◦ ” indicates a sign in version. The periodic boundary conditions are not shown. In the special case of the 2D Ising model with constant couplings J e = J and without an e xternal field, the parti- tion function is analytically av ailable in the thermodynamic limit (i.e., as N → ∞ ) from Onsager’ s solution [44]. In Appendix A, we will use the analytical solution of the partition function to analyze the v ariance of Monte Carlo methods of this paper . For the (nonbinary) 2D Potts model, no such analytical solution for the partition function is yet available. Next, we will describe the corresponding dual normal factor graphs of the models in this section. I V . D U A L N O R M A L F AC T O R G R A P H O F T H E P O TT S M O D E L The dual normal factor graph of some gi ven (primal) normal factor graph has the same topology as the primal normal factor graph, but all factors are replaced by their Fourier transforms (which includes replacing equality indicator functions by zero- sum indicator functions, and vice versa). See [35], [5], [36], [38], [43] for more details. In the dual normal factor graph, all variables are replaced by their corresponding dual (frequency) variables, which take values in the same alphabet as the primal variables. W e will use the tilde symbol to denote variables in the dual domain. The dual normal factor graph has the same partition function as the primal normal factor graph, up to some kno wn scale factor [36, Theorem 2]. W e denote the partition function of the dual normal factor graph by Z d . Follo wing [43], we can obtain the dual normal factor graph of the 2D Potts model as shown in Fig. 4, in which the empty boxes represent factors that are the 1D Fourier transforms of (13) giv en by γ e ( ˜ y e ) = X y e ∈X κ e ( y e ) e − i2 π y e ˜ y e /q . (17) 0 2 0 1 1 0 2 0 2 0 1 2 0 2 1 0 0 0 0 1 2 0 0 1 + = ◦ + = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + Fig. 5: A partitioning of the variables ˜ Y = { ˜ Y e : e ∈ E } in Fig. 4 as in Section IV -A. The edges in T (drawn with thick edges) represent the variables ˜ Y T , which are linearly dependent on the remaining variables ˜ Y T . This spanning tree works both with and without periodic boundary conditions. Also shown is an example of a valid configuration for q = 3 (assuming no boundary conditions). Thus γ e ( ˜ y e ) = e J e − 1 + q , if ˜ y e = 0 e J e − 1 , otherwise, (18) which is nonne gativ e due to the ferromagnetic assumption (i.e., J e ≥ 0 ). Similarly , we can obtain the dual normal factor graph of the 2D Ising model (shown in Fig. 4), where the empty boxes represent factors as in γ e ( ˜ y e ) = 2 cosh( J e ) , if ˜ y e = 0 2 sinh( J e ) , if ˜ y e = 1 , (19) which is the 1D Fourier transform of (16). Notice that (19) is also nonnegati ve due to ferromagnetic assumption. Again, for the Ising model the “ ◦ ” symbols can be safely remov ed from the dual normal factor graph. The partition function of the dual normal factor graph is thus Z d = X valid ˜ y Y e ∈E γ e ( ˜ y e ) , (20) where the sum runs over all valid configurations in the dual normal factor graph. For the dual 2D Potts and Ising models with factors as in (18) and (19), and with periodic boundary conditions, it holds that 1 Z d = q N Z (21) see [36]. 1 In general, the scale factor α ( G ) = Z d / Z depends on the topology of G and on the local scale factors used in the Fourier transforms. In our setup, the scale factor is given by α ( G ) = q |E |−|V | . For example, in a 2D torus |E | = 2 N , and therefore α ( G ) = q N as in (21); in a 1D model with periodic boundary conditions |E | = |V | , and thus α ( G ) = 1 . For more details, see [45], [42, Section 3.3]. 5 Fig. 4 is the basis of the Monte Carlo algorithms of this paper to estimate Z d . The estimates are then used to compute an estimate of Z via (21). A. Independent V ariables and Spanning T r ees in the Dual Normal F actor Graph In our Monte Carlo methods, we will use partitions of G into two disjoint subsets G = T ∪ T such that T is a spanning tree (that reaches every zero-sum factor) and T is the corresponding cospanning tree, as illustrated in Fig. 5. The edges of T are called the branc hes and the edges of T are called the chor ds of G with respect to T . Although T is always without cycles, T need not be cycle-free. Any such partition induces a corresponding partition of the variables ˜ Y = { ˜ Y e : e ∈ E } into ˜ Y T and ˜ Y T . Proposition 1. Consider a valid configuration in the dual normal factor graph of the Potts model. Suppose variables ˜ Y 1 , ˜ Y 2 , . . . form a cutset in the dual normal factor graph, then it holds that X e ∈ Cutset ˜ y e = 0 (22) 2 Pr oof. Removing all the edges that represent variables ˜ Y 1 , ˜ Y 2 , . . . partitions G into G 1 ∪ G 2 . Suppose in G 1 we write down the equations associated with all the zero-sum indicator factors. Since each v ariable, say ˜ y t for t ∈ E 1 appears twice in the summation, once as ˜ y t and once as − ˜ y t (see Fig. 4), the sum over all these equations is equal to zero. Furthermore, the same sum in G is equal to P e ∈ Cutset ˜ y e . This completes the proof. The proof follows along the same lines in G 2 . For more details, see [45], [42, Section 2.5]. Removing a branch b ∈ T partitions T = T 1 ∪ T 2 . The edges that connect T 1 and T 2 form a unique cutset in G , called the fundamental cutset belonging to b . Each fundamental cutset has exactly one branch of T that does not appear in any other fundamental cutset, along with edges (chords) that belong to T . Indeed, each spanning tree defines a set of |T | fundamental cutsets: one for each branch of the spanning tree [46, Chapter 2]. According to Proposition 1, for each b ∈ T we can compute ˜ Y b as a linear combination of ˜ Y T by applying (22) on the fundamental cutset belonging to b . W e conclude that the v ariables in ˜ Y T are linearly independent and the v ariables in ˜ Y T are fully determined by ˜ Y T via a linear transformation. It follows that the number of valid configurations in the dual normal factor graph of the Potts model is q |T | . In any such partitioning the number of variables in ˜ Y T is |T | = N − 1 (23) and the number of variables in ˜ Y T is |T | = |E | − |T | . For the 2D torus, we thus have |T | = N + 1 (24) from (2). V . M O N T E C A R L O M E T H O D S F O R T H E P A RT I T I O N F U N C T I O N O F T H E D U A L N O R M A L F AC T O R G R A P H W e propose two basic Monte Carlo algorithms for estimat- ing the partition function. Both algorithms use partitions of E and Y as in Section IV -A and Fig. 5. In both Monte Carlo algorithms, we draw independent samples ˜ y (1) T , . . . , ˜ y ( L ) T ∈ X |T | according to some auxiliary probability distribution, and each of these samples ˜ y ( ` ) T is completed (by computing the corresponding ˜ y ( ` ) T ∈ X |T | ) to a v alid configuration ˜ y ( ` ) = ( ˜ y ( ` ) T , ˜ y ( ` ) T ) ∈ X |E | . Computing ˜ y ( ` ) T from ˜ y ( ` ) T is easy and linear in |T | . W e will also use the quantities Γ T ( ˜ y T ) = Y e ∈T γ e ( ˜ y e ) , (25) Γ T ( ˜ y T ) = Y e ∈T γ e ( ˜ y e ) , (26) and Γ( ˜ y ) = Γ T ( ˜ y T )Γ T ( ˜ y T ) = Y e ∈E γ e ( ˜ y e ) , (27) therefore (20) becomes Z d = X valid ˜ y Γ( ˜ y ) . (28) W e propose uniform sampling and importance sampling algorithms to estimate Z d . Giv en the partitioning, the com- putational complexity of our algorithms is O ( |E | ) per sample and O ( L |E | ) in total. The variance of both methods is deriv ed in Section V -C. A. Uniform Sampling As a baseline algorithm (used in [1] and [43]), we use independent samples ˜ y (1) T , . . . , ˜ y ( L ) T drawn uniformly ov er X |T | , which are completed to v alid configurations ˜ y ( ` ) ∈ X |E | as described above. W e then use the estimate ˆ Z Uni d = q |T | L L X ` =1 Γ( ˜ y ( ` ) ) . (29) It is easily verified that E[ ˆ Z Uni d ] = Z d , i.e., the estimator is unbiased: E " q |T | L L X ` =1 Γ( ˜ Y ( ` ) ) # = q | T | E h Γ( ˜ Y (1) ) i (30) = q |T | X valid ˜ y 1 q |T | Γ( ˜ y ) (31) = Z d , (32) where the last step follows from (28). The accuracy of (29) depends on the fluctuations of Γ( ˜ y ( ` ) ) . In the low-temperature limit (i.e., for e J e q ), these fluc- tuations disappear , because γ e ( ˜ y e ) ≈ e J e becomes constant. The estimator (29) can therefore be expected to work well at sufficiently low temperatures. 6 0 . 1 0 . 5 0 . 9 1 . 3 1 . 7 2 . 1 2 . 5 10 − 7 10 − 6 10 − 5 10 − 4 10 − 3 10 − 2 10 − 1 10 0 J Relativ e Error BP GBP TreeEP Uni Imp Fig. 6: Comparison with deterministic algorithms (BP , GBP , and T reeEP): experimental results for a Potts model with q = 3 , N = 8 × 8 , periodic boundary conditions, and constant couplings J . The plot shows the relative error (51) as a function of J . (Recall that large J corresponds to the low- temperature regime.) B. Importance Sampling An importance sampling estimator (proposed in [2], [3]) is obtained by drawing independent samples ˜ y (1) T , . . . , ˜ y ( L ) T ∈ X |T | according to the auxiliary probability distribution p T ( ˜ y T ) = Γ T ( ˜ y T ) Z T (33) = Y e ∈T γ e ( ˜ y e ) P q − 1 ξ =0 γ e ( ξ ) , (34) where Z T is av ailable in closed-form as Z T = Y e ∈T q − 1 X ξ =0 γ e ( ξ ) (35) = Y e ∈T q e J e (36) = q |T | exp X e ∈T J e . (37) The product form in (34) indicates that to dra w samples according to p T ( ˜ y T ) , we can draw each component ˜ y ( ` ) e of ˜ y ( ` ) T independently with probability P ˜ y ( ` ) e = ξ = 1 + ( q − 1) e − J e q , if ξ = 0 1 − e − J e q , for ξ = 1 , 2 , . . . , q − 1 . (38) Again, the samples ˜ y (1) T , . . . , ˜ y ( L ) T are completed to v alid configurations ˜ y (1) , . . . , ˜ y ( L ) ∈ X |E | . W e then use the estimate ˆ Z Imp d = Z T L L X ` =1 Γ T ( ˜ y ( ` ) T ) , (39) 0 0 . 5 1 1 . 5 2 2 . 5 10 − 7 10 − 6 10 − 5 10 − 4 10 − 3 10 − 2 10 − 1 10 0 σ 2 Relativ e Error BP GBP TreeEP Uni Imp Fig. 7: Comparison with deterministic algorithms (BP , GBP , and T reeEP): experimental results for a Potts model with q = 3 , N = 8 × 8 , periodic boundary conditions, and coupling parameters J e : J e = | J 0 e | with J 0 e i.i.d. ∼ N (0 , σ 2 ) . The plot shows the relativ e error (51) as a function of σ 2 . which is unbiased: E " Z T L L X ` =1 Γ T ( ˜ Y ( ` ) T ) # = Z T E h Γ T ( ˜ Y (1) T ) i (40) = Z T X valid ˜ y p T ( ˜ y T )Γ T ( ˜ y T ) (41) = X valid ˜ y Γ T ( ˜ y T )Γ T ( ˜ y T ) (42) = Z d . (43) The accuracy of (39) mainly depends on the fluctuations of Γ T ( ˜ y ( ` ) T ) . The estimator can therefore be expected to work well at sufficiently low temperatures where these fluctuations disappear . C. V ariance of the Estimates The variance of the importance sampling estimator (39) is V ar[ ˆ Z Imp d ] = E ˆ Z Imp d 2 − E ˆ Z Imp d 2 (44) = Z 2 T L X valid ˜ y p T ( ˜ y T )Γ T ( ˜ y T ) 2 − Z 2 d L (45) = 1 L X valid ˜ y Γ T ( ˜ y T ) 2 p T ( ˜ y T ) Γ T ( ˜ y T ) 2 − Z 2 d L (46) = Z 2 d L X valid ˜ y p d ( ˜ y ) 2 p T ( ˜ y ) − 1 , (47) where both p d ( ˜ y ) = Γ( ˜ y ) / Z d and p T ( ˜ y T ) = Γ T ( ˜ y T ) / Z T are probability mass functions defined on the valid configurations of the dual normal factor graph. W e thus have V ar[ ˆ Z Imp d ] L Z 2 d = χ 2 p d , p T , (48) 7 where χ 2 ( · , · ) denotes the chi-squared di ver gence, which is always nonnegati ve and is equal to zero if and only if its two arguments are equal [47, Chapter 4]. An analogous deriv ation for the uniform sampling estimator (29) yields V ar[ ˆ Z Uni d ] L Z 2 d = χ 2 p d , p u , (49) where p u ( ˜ y ) is the uniform distribution over the valid config- urations. In the lo w-temperature limit with e J e q for all e ∈ E , both p d and p T become uniform ov er the valid configurations and both (48) and (49) vanish. More importantly , the variance of the importance sampling estimator (48) vanishes under the weaker condition (weaker for nonconstant couplings) e J e q for e ∈ T , (50) since in this case p d ( ˜ y ) ∝ Γ T ( ˜ y T )Γ T ( ˜ y T ) conv erges to p T ( ˜ y ) ∝ Γ T ( ˜ y T ) if Γ T ( ˜ y T ) becomes constant. For the Ising model on a 2D torus, a more detailed analysis of the variance of our proposed Monte Carlo methods is giv en in Appendix A. D. Choosing the P artitioning The choice of T and T does not affect the performance of the uniform sampling estimator (29), but it can af fect the performance (i.e., the con vergence) of the importance sampling estimator (39) for nonconstant couplings. Recall that the normalized variance (48) vanishes if (50) holds. This result suggests to include into T only edges e with stronger couplings (i.e., with large J e ). T o this end, the following heuristic strate gy can be used: choose T to be a spanning tree that maximizes P e ∈T J e . This is a maximum-spanning tree problem, that can be solved efficiently with complexity linear in N and |E | (see [48, Chapter . VI]). W e will use this heuristic strategy in our numerical experiments in Section VI. V I . N U M E R I C A L E X P E R I M E N T S In this section, we demonstrate the methods of Section V with some numerical experiments. In Sections VI-A to VI-C, we work with tractable models where the partition function can be computed exactly via the junction tree algorithm [49]; larger grids are considered in Section VI-D. A. Comparison with Deterministic Algorithms W e first consider the Potts model with q = 3 on an 8 × 8 grid with periodic boundary conditions. For this size of grid, we were able to compute the exact value of the partition function. In Figs. 6 and 7, we compare the accuracy of the proposed methods with three standard deterministic algorithms: BP , GBP [9], [50], [51], and T reeEP [10]. These three algorithms turned out to perform best, in our setting, among all deter- ministic methods implemented in [52]. (Among the different versions of GBP in [52], we selected the one with the best performance.) The accurac y of the proposed Monte Carlo methods depends on the number of samples, but the result is exact (with 0 0 . 25 0 . 5 0 . 75 1 1 . 25 1 . 5 0 . 5 1 . 0 1 . 5 2 . 0 2 . 5 3 . 0 3 . 5 J ln( ˆ Z ) / N Exact ln( Z ) / N Swendsen-W ang Uni (primal) Imp Uni (dual) Fig. 8: Comparison with uniform sampling and the Swendsen- W ang algorithm: experimental results for a Potts model with q = 3 , N = 8 × 8 , periodic boundary conditions, and constant couplings J . The plot shows ln( ˆ Z ) / N as a function of J . probability one) in the limit of infinitely many samples. By contrast, the deterministic algorithms (BP , GBP , and T reeEP) yield approximations whose accuracy is not improved beyond con ver gence. Howe ver , it should be emphasized that determin- istic algorithms con ver ge much faster than our Monte Carlo methods. Figs. 6 and 7 show the relati ve error | log ˆ Z − log Z | log Z (51) for the dif ferent estimates ˆ Z . The labels “Uni” and “Imp” refer to uniform sampling as in Section V -A and importance sampling as in Section V -B, respecti vely . In Fig. 6, the couplings J e = J are constant. For the proposed Monte Carlo methods, log ˆ Z in (51) is averaged ov er 50 trials, each with L = 10 8 samples (taking about two minutes on a 2GHz Intel Xeon CPU). In Fig. 7, the couplings are chosen randomly according to a half-normal distrib ution: J e = | J 0 e | with J 0 e i.i.d. ∼ N (0 , σ 2 ) for all e ∈ E . The spanning tree was chosen according to the heuristic strate gy proposed in Section V -D. The plot shows the relativ e error (51), where log Z is av eraged ov er 25 independent realizations of the couplings and for ev ery realization, log ˆ Z is averaged over L = 10 8 samples. Both Figs. 6 and 7 clearly show that, for sufficiently strong couplings (i.e., at low temperature), the importance sampling algorithm (39) yields much better estimates of the partition function than the other algorithms. In particular, from Fig. 7, we observe that importance sampling outperforms the second best approach (T reeEP) by more than two orders of magnitude for σ 2 > 1 . 5 . The accuracy of the proposed methods can still be improved, of course, by increasing the number of samples. B. Comparison with Standard Monte Carlo Methods W e again consider the Potts model with q = 3 on an 8 × 8 grid with periodic boundary conditions. In Fig. 8, we compare 8 4 5 6 7 8 9 10 11 12 13 14 10 − 1 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 10 8 M L J = 0 . 5 J = 1 . 0 J = 1 . 5 J = 2 . 0 Fig. 9: Number of required samples as a function of the width of the grid M to achiev e a relativ e error (51) of 10 − 2 , for a 2D Potts model with q = 3 , free boundary conditions, and constant couplings J . the proposed methods with two Monte Carlo methods that operate in the primal Potts model. The first of these (standard) Monte Carlo algorithms (labeled “Uni (primal)” in Fig. 8) is a baseline algorithm: we use uniform samples x (1) , . . . , x ( L ) ∈ X N and form the (unbiased) estimate ˆ Z Uni = q N L L X ` =1 f ( x ( ` ) ) (52) with f as in (7). The second standard Monte Carlo algorithm uses the Swendsen-W ang algorithm [34] to obtain samples x (1) , . . . , x ( L ) ∈ X N according to the Boltzmann distribution (4). From these samples, we form the Ogata-T anemura estimate ˆ Z O T = 1 Lq N L X ` =1 1 f ( x ( ` ) ) ! − 1 , (53) which satisfies E[ ˆ Z − 1 O T ] = Z − 1 . For more details on the Ogata- T anemura estimator , see [53], [32]. Fig. 8 shows the estimated ln( Z ) / N , where the results were obtained by averaging over 50 trials, each with L = 10 8 samples. It is clear from Fig. 8 that importance sampling as in Section V -B works well for large couplings (low tempera- tures), where both standard Monte Carlo algorithms fail. W e do not here compare the proposed methods with an- nealed (i.e., multi-temperature) Monte Carlo methods [54]: since, in principle, annealing can also be used in the dual normal factor graph; the advantage of the dual graph ov er the primal graph at low temperatures extends also to Monte Carlo methods with annealing. C. Scaling Behavior of the Importance Sampling Algorithm W e analyze the performance of the importance sampling algorithm in the dual normal factor graph in terms of the required number of samples L to achie ve a gi ven relative error 4 5 6 7 8 9 10 11 12 13 14 10 1 10 2 10 3 10 4 10 5 10 6 10 7 10 8 M L J = 0 . 5 J = 1 . 0 J = 1 . 5 J = 2 . 0 Fig. 10: Number of required samples as a function of the width of the grid M to achie ve a relativ e error (51) of 10 − 3 , for a 2D Potts model with q = 3 , free boundary conditions, and constant couplings J . as a function of the width of the gird M . If the desired relati ve error was not achieved after L = 10 8 samples, we stopped the simulations. W e consider a 3-state Potts model with constant couplings J e = J , with free boundary conditions, and on an M × M grid, where up to M = 14 we were able to compute the exact value of the partition function. Figs. 9 and 10 show experimental results to achiev e a relativ e error of 10 − 2 and 10 − 3 , respecti vely . For J = 2 (i.e., when the temperature is lo w enough), the number of required samples is almost independent of M . For J = 1 . 5 , to achie ve a relati ve error of 10 − 3 , the number of required samples increases with M ; but it remains almost constant to achieve a relative error of 10 − 2 . W e take these results as evidence that the importance sampling algorithm is robust at low temperature. On the other hand, for weaker couplings, L grows quickly as a function of M . D. Lar ger Grids Fig. 11 shows results for a fixed realization of a Potts model with q = 3 on a grid of size N = 40 × 40 and with couplings J e i.i.d. ∼ U [2 . 5 , 3 . 0] for all e ∈ E . The plot shows ln( ˆ Z ) / N vs. the number of samples L for five independent runs of the Monte Carlo algorithms. It is obvious (and unsurprising) that importance sampling con verges more quickly than uniform sampling. In this example, importance sampling yields the estimate ln( ˆ Z ) / N ≈ 5 . 493 . Fig. 12 shows results obtained from importance sampling for a fixed realization of an Ising model of size N = 50 × 50 and with couplings J e i.i.d. ∼ U [2 . 0 , 3 . 5] for all e ∈ E . The estimated ln( Z ) / N is about 6 . 00314 . V I I . T H E 2 D P O TT S M O D E L I N A N E X T E R N A L F I E L D W e now extend the proposed Monte Carlo methods to Potts models with an external field, where the Hamiltonian (3) (the 9 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 5 . 476 5 . 480 5 . 484 5 . 488 5 . 492 5 . 496 L ln( ˆ Z ) / N Importance Sampling Uniform Sampling Fig. 11: Experimental results for a fixed realization of a Potts model with q = 3 , N = 40 × 40 , and coupling parameters J e i.i.d. ∼ U [2 . 5 , 3 . 0] for all e ∈ E . The plot shows the estimated ln( Z ) / N using importance sampling (solid black lines) and using uniform sampling (dashed blue lines) in the dual normal factor graph. energy of a configuration x ) is generalized to H ( x ) = − X ( k,` ) ∈E J k,` · δ ( x k − x ` ) − N X k =1 H k · δ ( x k ) , (54) where the real parameters H k represent the external field. W e restrict ourselves to the standard case where the external field affects the variable x k only if x k = 0 , cf. [55, Chapt. 1]. W e will also assume H k ≥ 0 (55) for all k . The partition function is Z = X x ∈X N e −H ( x ) . (56) Follo wing our approach in Section III, we can construct the the primal normal factor graph of the model as shown in Fig. 13, where the empty boxes represent (13) and the small empty boxes represent factors given by τ k ( x k ) = e H k , if x k = 0 1 , otherwise. (57) A. Dual Normal F actor Graph The corresponding dual normal factor graph is sho wn in Fig. 14. The only change with respect to Fig. 4 is the additional factors λ k ( ˜ z k ) , k = 1 , . . . , N , which are the 1D Fourier transforms of the factors (57) giv en by 2 λ k ( ˜ z k ) = 1 q q − 1 X x k =0 τ k ( x k ) e − i2 π x k ˜ z k /q (58) 2 Here, in contrast to (17), a local scale factor 1 /q is included in the definition of the 1D Fourier transform. For the 2D torus, this makes the scale factor Z d / Z equal to q N in Potts models with or without an external field. If we do not include the local scale factor 1 /q in (58), we need to distinguish between two cases: Z d = q N Z for the 2D torus without an external field, and Z d = q 2 N Z for the 2D torus in the presence of an external field. 10 0 10 1 10 2 10 3 10 4 10 5 6 . 00295 6 . 00304 6 . 00313 6 . 00322 6 . 00331 L ln( ˆ Z ) / N Fig. 12: Experimental results for a fixed realization of an Ising model with N = 50 × 50 and couplings J e i.i.d. ∼ U [2 . 0 , 3 . 5] for all e ∈ E . The plot shows the estimated ln( Z ) / N using importance sampling in the dual normal factor graph. Thus λ k ( ˜ z k ) = e H k − 1 + q q , if ˜ z k = 0 e H k − 1 q , otherwise. (59) Note that (59) is nonnegati ve due to (55). Again, from the scale factor (21), for the 2D torus we have Z d = q N Z. (60) B. P artitioning the V ariables For the Potts model in an external field, the variables in the dual normal factor graph consist of ˜ Y = { ˜ Y e : e ∈ E } and ˜ Z = { ˜ Z k : k ∈ { 1 , . . . , N }} . Again, we partition these variables into ( ˜ Y , ˜ Z ) T and ( ˜ Y , ˜ Z ) T such that, in any valid configuration, the v ariables in ( ˜ Y , ˜ Z ) T are linearly independent and the variables in ( ˜ Y , ˜ Z ) T are fully determined by ( ˜ Y , ˜ Z ) T via a linear transformation. Ho wev er , the v ariables ˜ Z are not independent. Proposition 2. In ev ery valid configuration, it holds that N X k =1 ˜ z k = 0 (61) 2 Proof: Because of the zero-sum constraints on the vertices of the dual normal factor graph, each ˜ z k (in any v alid config- uration) can be written as the sum of the variables attached to the corresponding zero-sum indicator function. Howe ver , each variable will appear exactly twice in (61), once as y e and once as − y e . This completes the proof. 2 In the absence of an external field, the number of valid configurations in the dual normal factor graph of the Potts model is q |T | , cf. Section IV -A. Furthermore, according to Proposition 2, adding an external field increases the number of independent variables (free components) by N − 1 , and 10 = + ◦ = + ◦ = + ◦ = + ◦ + ◦ + ◦ + ◦ = + ◦ = + ◦ = + ◦ = + ◦ + ◦ + ◦ + ◦ = + ◦ = + ◦ = + ◦ = + ◦ + ◦ + ◦ + ◦ = + ◦ = + ◦ = + ◦ = κ 1 Z Z τ 1 Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Fig. 13: Primal normal factor graph of the 2D Potts model in an external field. The only change with respect to Fig. 3 is the additional factors τ k giv en by (57). The periodic boundary conditions are not shown. thus increases the number of valid configurations to q |T | + N − 1 . Therefore, for the 2D torus in an external field, the number of valid configurations in the dual normal factor graph is q 2 N , whereas it is q N in the primal normal factor graph. Both the uniform sampling algorithm and the importance sampling algorithm of Section V can be adapted to the present setting. W e describe only the latter for two such partitionings. The first partitioning is suitable for models in the low-temperature regime, and the second one is designed for models that are in the presence of a strong external field. The performance of the proposed algorithms depends on the choice of the partitioning, as will be illustrated by our numerical experiments in Section VII-E. C. Importance Sampling for Models at Low T emperatur e An obvious choice for ( ˜ Y , ˜ Z ) T with the required properties includes ˜ Y T (i.e., a spanning tree as in Section IV -A) and N − 1 components of ˜ Z , i.e., ( ˜ Y , ˜ Z ) T = ( ˜ Y T , ˜ Z \ ˜ Z 1 ) , (62) which implies ( ˜ Y , ˜ Z ) T = ( ˜ Y T , ˜ Z 1 ) . (63) The quantities Γ T and Γ T from (25) and (26) are then generalized to Γ T ( ˜ y , ˜ z ) T = Y e ∈T γ e ( ˜ y e ) N Y k =2 λ k ( ˜ z k ) (64) and Γ T ( ˜ y , ˜ z ) T = λ 1 ( ˜ z 1 ) Y e ∈T γ e ( ˜ y e ) , (65) respectiv ely . + = ◦ + = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + = ◦ = ◦ = ◦ = ◦ + = ◦ + = ◦ + = ◦ + γ 1 Z Z λ 1 Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Fig. 14: Dual normal factor graph of the 2D Potts model in an external field. The only change with respect to Fig. 4 is the additional factors λ k giv en by (59). The periodic boundary conditions are not shown. The quantity Z T from (37) is generalized to Z T = X ( ˜ y , ˜ z ) T Γ T ( ˜ y , ˜ z ) T (66) = Y e ∈T q − 1 X ξ =0 γ e ( ξ ) N Y k =2 q − 1 X ξ 0 =0 λ k ( ξ 0 ) (67) = Y e ∈T q e J e N Y k =2 e H k ! (68) = q |T | exp X e ∈T J e + N X k =2 H k . (69) The algorithm then goes as follows. 1) Generate L independent samples ( ˜ x , ˜ y ) (1) T , . . . , ( ˜ y , ˜ z ) ( L ) T from the distribution p T ( ˜ y , ˜ z ) T = Γ T ( ˜ y , ˜ z ) T Z T (70) 2) For each sample ( ˜ y , ˜ z ) ( ` ) T , compute its unique extension ( ˜ y , ˜ z ) ( ` ) T to a valid configuration ( ˜ y , ˜ z ) ( ` ) , including ˜ z 1 , which can be computed as ˜ z 1 = − N X m =2 ˜ z m (71) from (61). 3) Compute the estimate ˆ Z Imp d = Z T L L X ` =1 Γ T ( ˜ y , ˜ z ) ( ` ) T . (72) The estimate (72) is easily verified to be unbiased, cf. (43). Creating the samples ( ˜ y , ˜ z ) ( ` ) T in Step 1 is straightforward 11 0 . 1 0 . 5 0 . 9 1 . 3 1 . 7 2 . 1 2 . 5 10 − 7 10 − 6 10 − 5 10 − 4 10 − 3 10 − 2 10 − 1 10 0 Relativ e Error H = 0 . 05 0 . 1 0 . 5 0 . 9 1 . 3 1 . 7 2 . 1 2 . 5 J H = 0 . 1 0 . 1 0 . 5 0 . 9 1 . 3 1 . 7 2 . 1 2 . 5 H = 0 . 2 BP GBP TreeEP Uni Imp (72) Fig. 15: Comparison with deterministic algorithms (BP , GBP , and TreeEP): experimental results for a Potts model with q = 3 , N = 8 × 8 , constant coupling parameter J , and in a constant external field H . The plots show the relativ e error (51) as a function of J . Left: H = 0 . 05 ; middle: H = 0 . 1 ; right: H = 0 . 2 . since the distribution in (70) decomposes into a product: first sample the components ˜ y ( ` ) e , e ∈ T , of each sample independently according to (38), then sample the components ˜ z ( ` ) 2 , . . . , ˜ z ( ` ) N of each sample independently according to P ˜ z ( ` ) k = ξ = 1 + ( q − 1) e − H k q , if ξ = 0 1 − e − H k q , for ξ = 1 , 2 , . . . , q − 1 . (73) W ith this choice of partitioning, it can be verified that (48) vanishes when e J e q for e ∈ T . One can design a slightly different algorithm with rejections by drawing all the components ˜ z ( ` ) 1 , ˜ z ( ` ) 2 , . . . , ˜ z ( ` ) N according to (73); b ut accept only the samples that satisfy (61). The corresponding partitioning in the dual normal factor graph is ( ˜ Y , ˜ Z ) T = ( ˜ Y T , ˜ Z ) . (74) Accordingly Γ T ( ˜ y , ˜ z ) T = Y e ∈T γ e ( ˜ y e ) N Y k =1 λ k ( ˜ z k ) (75) and Γ T ( ˜ y , ˜ z ) T = Y e ∈T γ e ( ˜ y e ) . (76) For more details, see [3]. D. Importance Sampling for Models in a Str ong External F ield W e assume another partitioning of ( ˜ Y , ˜ Z ) giv en by ( ˜ Y , ˜ Z ) T = ˜ Y , (77) which implies ( ˜ Y , ˜ Z ) T = ˜ Z . (78) This partitioning generalizes Γ T and Γ T to Γ T ˜ y = Y e ∈E γ e ( ˜ y e ) (79) and Γ T ˜ z = N Y k =1 λ k ( ˜ z k ) . (80) The partition function Z T is then generalized to Z T = X ˜ y Γ T ˜ y (81) = q |E | exp X e ∈E J e . (82) The importance sampling algorithm goes as follo ws. 1) Generate L independent samples ˜ y (1) , . . . , ˜ y ( L ) accord- ing to p T ˜ y = Γ T ( ˜ y ) Z T . (83) 2) From each sample ˜ y ( ` ) , compute ˜ z ( ` ) . 3) Compute the unbiased estimate ˆ Z Imp d = Z T L L X ` =1 Γ T ˜ z ( ` ) . (84) In this setting, it can be verified that (48) vanishes in the limit of the strong field (i.e., for e H q ). For more details, see [2]. The partitioning in (77) and (78) will also be used in Section VIII to establish connections between the dual normal factor graph representation of the 2D Ising model in an external field and the high-temperature series expansions of the partition function. E. Numerical Experiments Experimental results for a Potts model with N = 8 × 8 , q = 3 , and periodic boundary conditions are shown in Figs. 15 and 16, for constant coupling parameter J e = J and for constant external field H k = H . The partition function is computed exactly via the junction tree algorithm, which allo ws to compare the accuracy of dif ferent algorithms. For the Monte Carlo algorithms, the estimates are a veraged over 50 trials, each with L = 10 8 samples. The accuracy of the estimates is then compared with deterministic algorithms: BP , GBP , and 12 T reeEP as in Section VI-A. The figures show the relati ve error (51) as a function of J . Fig. 15 shows results for H = 0 . 05 (left), H = 0 . 1 (middle), and H = 0 . 2 (right). W e find that GBP works well, especially for H = 0 . 2 ; but the importance sampling algorithm (72) is more accurate at low temperatures (i.e., for large J ) and in weak external fields. For H = 1 . 0 , experimental results are shown in Fig. 16, where we consider both importance sampling algorithms pro- posed in (72) and (84). As expected, we observe that the importance sampler in (84) outperforms (72) for stronger external fields (stronger compared to the coupling parameter). Howe ver , GBP performs extremely well in this setting, indeed its relativ e error is below 10 − 7 in the whole range. The accuracy of the importance sampling algorithms can be improv ed, of course, by increasing the number of samples. V I I I . T H E 2 D I S I N G M O D E L I N A N E X T E R N A L F I E L D , T H E H I G H - T E M P E R A T U R E S E R I E S E X PA N SI O N , A N D T H E S U B G R A P H S - W O R L D P RO C E S S In this section, we show the equi valence between the valid configurations in the dual normal factor graph and the config- urations in Jerrum and Sinclair subgraphs-world process [24] for a ferromagnetic Ising model on a 2D torus. Following [24], we restrict our focus to Ising models in a constant external field H . For this model, the energy of a configuration x ∈ X N is giv en by the Hamiltonian [8, Chapter 3] H ( x ) = − X ( k, ` ) ∈ E J k,` · 2 δ ( x k − x ` ) − 1 − H N X k =1 1 − 2 δ ( x k ) . (85) The primal normal factor graph of the model is shown in Fig. 13, where the empty boxes represent (16) and the small empty boxes represent the factors τ ( x k ) = e − H , if ˜ x k = 0 e H , if ˜ x k = 1 . (86) Fig. 14 shows the dual normal factor graph of the 2D Ising model in an external field, where the empty boxes represent (19) and the small empty boxes represent the factors λ ( ˜ z k ) , which are the 1D Fourier transforms of (86), and are giv en by λ ( ˜ z k ) = cosh( H ) , if ˜ z k = 0 − sinh( H ) , if ˜ z k = 1 . (87) Again, for the Ising model, the “ ◦ ” symbols are immaterial and can be removed from normal factor graphs. Since the partition function is in variant under the change of sign of the external field [7, Chapter 1], we will assume H ≤ 0 . Therefore, f actors (87) are nonnegati ve. The in variance of the partition function under the change of sign of H is implied by Proposition 2, as in any valid configuration in the dual normal factor graph, it holds that N X k =1 ˜ z k = 0 , (88) 0 . 1 0 . 3 0 . 5 0 . 7 0 . 9 1 . 1 10 − 8 10 − 7 10 − 6 10 − 5 10 − 4 10 − 3 10 − 2 10 − 1 J Relativ e Error BP GBP TreeEP Imp (72) Imp (84) Fig. 16: Comparison with deterministic algorithms (BP , GBP , and T reeEP): experimental results for a Potts model with q = 3 , N = 8 × 8 , constant coupling parameter J , and in a constant external field H = 1 . 0 . The plots show the relativ e error (51) as a function of J . i.e., the Hamming weight of ˜ Z is always ev en, where the Hamming weight of a configuration is the number of nonzero components of that configuration [56]. Indeed, Q N k =1 λ ( ˜ z k ) takes on the same positive value regardless of the sign of H . In [24], the authors propose a Mark ov chain (called the subgraphs-world process), which is defined on the set of edges W ⊆ E of the interaction graph of the model (as in Fig. 14). The scheme of Jerrum and Sinclair then uses the following expansion of the partition function in powers of tanh( H ) and tanh( J ) Z = A X W ⊆E tanh( H ) | odd ( W ) | Y ( k,` ) ∈W tanh( J e ) , (89) where odd ( W ) denotes the set of all odd-degree vertices in the subgraph of E induced by W , and A = 2 cosh( H ) N Y ( k,` ) ∈W cosh( J e ) . (90) The sum in (89) is known as the high-temperature series expansion in statistical physics [57], [8, p. 94]. In the dual normal factor graph, we adopt the partitioning of ( ˜ Y , ˜ Z ) proposed in (77) and (78). W e can thus freely choose the variables ˜ Y = { ˜ Y e : e ∈ E } , and therefrom compute the variables ˜ Z = ˜ Z k : k ∈ { 1 , . . . , N } . The partition function Z d can then be written as Z d = X valid ( ˜ y , ˜ z ) N Y k =1 λ ( ˜ z k ) Y e ∈E γ e ( ˜ y e ) (91) = 2 |E | cosh( H ) N Y e ∈E cosh( J e ) · X valid ( ˜ y , ˜ z ) N Y k =1 tanh( | H | ) ˜ z k Y e ∈E tanh( J e ) ˜ y e . (92) 13 In a 2D torus |E | = 2 N , thus Z d = 2 N A X valid ( ˜ y , ˜ z ) tanh( | H | ) P N k =1 ˜ z k Y e ∈E tanh( J e ) ˜ y e , (93) where A is as in (90). Accordingly , we define S ⊆ E as S ( ˜ y ) 4 = { e : ˜ Y e = 1 } . (94) Here, as ˜ y runs ov er the configurations in the dual normal factor graph, S ( ˜ y ) runs ov er all the 2 |E | subsets of E . Let us consider the subgraph of E induced by S , in which ˜ z k = 1 if it is connected to a zero-sum indicator function with odd degree, and ˜ z k = 0 otherwise. Therefore, P N k =1 ˜ z k counts the number of odd-degree vertices in S . Thus, from (93) we obtain Z d = 2 N A X S ⊆E tanh( | H | ) | odd( S ) | Y e ∈S ( ˜ y ) tanh( J e ) . (95) After applying the scale factor (60) in (95), we will obtain the high-temperature series expansion of the partition function giv en by (89). W e conclude that the configurations in the subgraphs-world process coincide with the v alid configurations in the dual normal factor graph of the Ising model in an external field. (The number of vertices with odd degree in the subgraph S is always even, which is also implied by Proposition 2.) The scheme of Jerrum and Sinclair works on a Marko v chain whose states are configurations of the subgraphs-world and whose stationary distribution is given by p d , cf. Section V. T ransitions occur between states that differ in a single edge according to the Metropolis rule. Remarkably , the mixing time of the proposed Markov chain is only polynomial in the size of the model at all temperatures. Indeed, a rigorous analysis shows that the expected running time of the generator for the subgraphs-world configurations is upper bounded by O | E | 2 N 8 (log − 1 + | E | ) , where is the confidence parame- ter . The scheme is randomized, i.e., it provides approximations to the partition function, which fall within arbitrary small error bounds with high probability [24, Section 4]. Our proposed unbiased Monte Carlo methods in the dual normal factor graph draw independent samples according to an auxiliary distribution. The partition function is then estimated by averaging according to the importance sampling weights of the independent samples, cf. (39). Monte Carlo methods of this paper work particularly well in the low-temperature regime. The main focus of this paper is on the (nonbinary) Potts model, where Monte Carlo methods in the dual normal factor graph outperform the state-of-the-art methods at low temperatures (with no external field or in a weak external field). Howe ver , approximating the partition function of the ferromagnetic Potts model is already as hard as approximating the number of independent sets in a bipartite graph, which is among the presumably intractable (the # BIS-hardness) problems. For more details see [25]–[27]. I X . C O N C L U S I O N W e re viewed representations of the Ising and Potts models of statistical physics in terms of normal f actor graphs, and further explored the idea that Monte Carlo algorithms in the dual normal factor graph can yield good estimates of the partition function at low temperatures. Specifically , we pro- posed and in vestigated such algorithms for the ferromagnetic Potts model, and we observed good con ver gence for strong couplings (i.e., at low temperatures). In our numerical experi- ments, for the 2D ferromagnetic Potts models, the importance sampling algorithms in the dual normal factor graph yield more accurate estimates at low temperatures (and with no, or weak, external field) than the state-of-the-art deterministic and Monte Carlo methods. W e expect such Monte Carlo methods in the dual normal factor graph to work well also for three- dimensional grids and in graphical models with more general topologies. Using deterministic algorithms (e.g., GBP and T reeEP) in the dual graph is certainly possible, but it has not been tried yet. W e also sho wed the equiv alence between the valid config- urations in the dual normal factor graph and the terms that appear in the high-temperature series e xpansion of the partition function of the ferromagnetic Ising model in an external field, and discussed connections with the subgraphs-world process in the randomized approximate scheme of Jerrum and Sincalir . Finally , it should be mentioned that the factors in the dual normal factor graph can, in general, be negati ve or even complex-v alued, which could be a serious challenge for Monte Carlo methods. Such issues were a voided in this paper by considering only ferromagnetic models, but the y must be faced in an y attempt to deal with antiferromagnetic models, spin glasses, and computational problems in quantum information processing [58]–[61]. A P P E N D I X A C O M PAR I N G T H E V A R I A N C E O F M O N T E C A R L O M E T H O D S I N T H E P R I M A L A N D I N T H E D U A L N O R M A L F AC T O R G R A P H S O F T H E 2 D I S I N G M O D E L W e compare the variance of the uniform sampling and importance sampling estimators in the primal and in the dual domains for estimating the partition function of the Ising model on a 2D torus, with constant coupling parameter J , without an external field, and in the thermodynamic limit (i.e., as N → ∞ ). The choice of the model and the parameters is due to the fact that the partition function is analytically av ailable from Onsager’ s solution in this case, see [44], [7, Chapter 7]. For this model, the critical coupling (i.e., the phase transi- tion) is located at J c = 1 2 ln(1 + √ 2) ≈ 0 . 4407 (96) and, at criticality , the deriv ativ e of ln Z with respect to J (i.e., the internal energy of the model) is given by lim N →∞ 1 N ∂ ln Z ( J c ) ∂ J c = √ 2 , (97) see [62]. In the primal domain, the analytical solution of the partition function allows us to calculate the exact value of the variance of the uniform sampling estimator as a function of J . In 14 the dual domain, we provide upper and lower bounds on the v ariance of the estimators. The deri ved bounds are not necessarily tight for all values of J , howe ver , they are good enough to illustrate the opposite behavior of the estimators in the primal and in the dual domains. W e recall from (23) and (24) that for the 2D torus |T | = N − 1 and |T | = N + 1 . A. Uniform Sampling Follo wing our analysis in Section V -C, the variance of the uniform sampling estimator in the primal domain (52) can be expressed as V ar[ ˆ Z Uni ] L Z 2 = χ 2 p B , p u , (98) where p B ( x ) is the Boltzmann distribution giv en by (4) and p u ( x ) is the uniform distribution ov er all the configurations. It follows that 1 + V ar[ ˆ Z Uni ] L Z ( J ) 2 = X x p B ( x ) 2 p u ( x ) (99) = 2 N Z ( J ) 2 X x f ( x ) 2 (100) = 2 N Z (2 J ) Z ( J ) 2 , (101) where Z ( J ) denotes the partition function ev aluated at J , and the last step is due to the following identity Z (2 J ) = X x f ( x ) 2 . (102) Thus, in the thermodynamic limit we obtain lim N →∞ 1 N ln 1 + V ar[ ˆ Z Uni ] L Z ( J ) 2 = ln(2) + lim N →∞ ln Z (2 J ) N − lim N →∞ 2 ln Z ( J ) N . (103) W e use the closed-form solution of the partition function to ev aluate (103) numerically as a function of J , which is plotted by the solid black line in Fig. 17. As expected, we observe that uniform sampling in the primal domain can provide good estimates of the partition function when J is small (i.e., at high temperature), while it is an inef ficient estimator for larger values of J (i.e., at low temperature). From (49), we expand the variance of the uniform sampling algorithm in the dual domain (29) as 1 + V ar[ ˆ Z Uni d ] L Z d ( J ) 2 = X valid ˜ y p d ( ˜ y ) 2 p u ( ˜ y ) (104) = 2 |T | Z d ( J ) 2 X valid ˜ y Γ( ˜ y ) 2 (105) = 2 N +1 Z d ( J ) 2 R, (106) where R 4 = P valid ˜ y Γ( ˜ y ) 2 . From (21), we have Z d = 2 N Z . Thus 1 + V ar[ ˆ Z Uni d ] L Z d ( J ) 2 = 2 − N +1 Z ( J ) 2 R. (107) 0 0 . 5 1 1 . 5 2 2 . 5 0 0 . 2 0 . 4 0 . 6 0 . 8 J lim N →∞ 1 N ln 1 + V ar ˆ Z · L Z 2 Uni primal exact result (103) Uni dual upper bound (114) Uni dual lower bound (117) Imp dual upper bound (126) Fig. 17: Comparing the variance of Monte Carlo methods in the primal and in the dual domains for the Ising model on a 2D torus, with constant coupling J , and in the thermodynamic limit. The solid black line shows (103); for the uniform sampling algorithm in the dual domain the dotted blue line shows the upper bound in (114) and the dashed blue line shows the lower bound in (117); for the importance sampling algorithm in the dual domain the dashed-dotted red line shows the upper bound in (126). In the sequel, we will derive upper and lower bounds on R , which is the partition function of a dual normal factor graph (as shown in Fig. 4) with factors giv en by ρ ( ˜ y e ) = 4 cosh( J ) 2 , if ˜ y e = 0 4 sinh( J ) 2 , if ˜ y e = 1 . (108) Thus R = X valid ˜ y Y e ∈E ρ ( ˜ y e ) (109) ≤ 4 cosh( J ) 2 |T | X ˜ y T Y e ∈T ρ ( ˜ y e ) (110) = 2 cosh( J ) 2( N − 1) R T . (111) Here, R T is the partition function of a subgraph of the dual normal factor graph induced by T , which can be computed exactly as R T = ρ (0) + ρ (1) |T | (112) = 4 cosh(2 J ) N +1 . (113) Combining (107), (111), and (113) yields the follo wing upper bound lim N →∞ 1 N ln 1 + V ar[ ˆ Z Uni d ] L Z d ( J ) 2 ≤ 3 ln(2) + ln cosh(2 J ) · cosh( J ) 2 − lim N →∞ 2 ln Z ( J ) N , (114) which is plotted by the dotted blue line in Fig. 17. 15 T o obtain the lower bound, we note that R = X valid ˜ y Y e ∈E ρ ( ˜ y e ) (115) ≥ 4 cosh( J ) 2 2 N . (116) Combining (107) and (116) giv es the following lo wer bound lim N →∞ 1 N ln 1 + V ar[ ˆ Z Uni d ] L Z d ( J ) 2 ≥ 3 ln(2) + 4 ln cosh( J ) − lim N →∞ 2 ln Z ( J ) N , (117) which is plotted by the dashed blue line in Fig. 17. From Fig. 17, we observe that uniform sampling in the dual domain is inefficient for small values of J , howe ver , compared to uniform sampling in the primal domain, it can provide more reliable estimates of the partition function when J is large. (Recall from Section V -C that (49) v anishes as J → ∞ , i.e, in the low-temperature limit.) Both estimators seem to be inefficient in the mid- temperature regime and near criticality (96). B. Importance Sampling From (48), the variance of the importance sampling algo- rithm can be expressed as 1 + V ar[ ˆ Z Imp d ] L Z d ( J ) 2 = X valid ˜ y p d ( ˜ y ) 2 p T ( ˜ y ) (118) = Z 2 T Z d ( J ) 2 X valid ˜ y p T ( ˜ y )Γ T ( ˜ y T ) 2 . (119) From (21), we have Z d = 2 N Z . Moreo ver , Z T = X ˜ y T Γ T ( ˜ y T ) (120) = 2 |T | e J |T | (121) = 2 N +1 e J ( N +1) , (122) cf. (37). From (19) and (26), we obtain X valid ˜ y p T ( ˜ y )Γ T ( ˜ y T ) 2 ≤ 4 cosh( J ) 2 |T | (123) = 2 cosh( J ) 2( N − 1) . (124) Thus 1 + V ar[ ˆ Z Imp d ] L Z d ( J ) 2 = 2 2 N e 2 J ( N +1) Z ( J ) 2 cosh( J ) 2( N − 1) , (125) which, in the thermodynamic limit N → ∞ , gives the following upper bound lim N →∞ 1 N ln 1 + V ar[ ˆ Z Imp d ] L Z d ( J ) 2 ≤ 2 ln(2) + 2 J + 2 ln cosh( J ) − lim N →∞ 2 ln Z ( J ) N . (126) Let us denote the upper bound in (126) by U ( J ) , which is plotted by the dashed-dotted red line in Fig. 17. The deriv ative of U ( J ) with respect to the coupling parameter J is ∂ U ( J ) ∂ J = 2 + 2 tanh( J ) − lim N →∞ 2 N ∂ ln Z ( J ) ∂ J . (127) From (97), it is straightforward to verify that (127) is zero at the critical coupling J c giv en by (96). From Fig. 17 we observe that the upper bound U ( J ) grows to attain its maximum at J c , but then decays for J > J c . (Again, recall from Section V -C that (48) vanishes as J → ∞ .) A C K N O W L E D G M E N T S The authors are grateful to Hans-Andrea Loeliger for his comments and discussions related to the topics of this paper . The authors would like to thank David Forney , Pascal V on- tobel, Justin Dauwels, Alistair Sinclair, and Stefan Moser for their helpful comments on an earlier draft of this manuscript. The authors wish to thank the associate editor , Y ongyi Mao, for his constructiv e suggestions during the re view process. P art of this work w as done during the first author’ s stay at the Institut Henri Poincar ´ e – Centre Emile Borel, France, at the Information Theory and Coding Group, University of Pompeu Fabra, Spain, and at the Department of Statistics and Actuarial Science, Univ ersity of W aterloo, Canada. The first author thanks these institutions for hospitality and support. This work is supported in part by the Spanish Ministry of Economy and Competitiv eness under the Mara de Maeztu Units of Excellence Programme (MDM-2015-0502) and the Ramon y Cajal program R YC-2015-18878 (AEI/MINEICO/FSE,UE). R E F E R E N C E S [1] M. Molkaraie and H.-A. Loeliger, “Partition function of the Ising model via factor graph duality , ” Proc. 2013 IEEE Int. Symp. on Inf. Theory , Istanbul, T urkey , July 7–12, 2013, pp. 2304–2308. [2] M. Molkaraie, “ An importance sampling scheme for models in a strong external field, ” Pr oc. 2015 IEEE Int. Symp. on Inf. Theory , Hong K ong, June 14–19, 2015, pp. 1179–1183. [3] M. Molkaraie, “ An importance sampling algorithm for the Ising model with strong couplings, ” Pr oc. 2016 Int. Zurich Seminar on Communica- tions (IZS), Zurich, Switzerland, March 2–4, 2016, pp. 180–184. [4] F . R. Kschischang, B. J. Frey , and H.-A. Loeliger, “F actor graphs and the sum-product algorithm, ” IEEE T rans. Inf. Theory , vol. 47, pp. 498–519, Feb . 2001. [5] H.-A. Loeliger , “ An introduction to factor graphs, ” IEEE Signal Pr oc. Mag., vol. 29, pp. 28–41, Jan. 2004. [6] S. M. Aji and R. J. McEliece, “The generalized distributi ve law , ” IEEE T rans. Inf. Theory , vol. 46, pp. 325–343, March 2000. [7] R. J. Baxter, Exactly Solved Models in Statistical Mechanics. Dov er Publications, 2007. [8] J. M. Y eomans, Statistical Mechanics of Phase T ransitions. Oxford Univ ersity Press, 1992. [9] J. S. Y edidia, W . T . Freeman, and Y . W eiss, “Constructing free-energy approximations and generalized belief propagation algorithms, ” IEEE T rans. Inf. Theory , vol. 51, pp. 2282–2312, 2005. [10] Y . Qi and T . P . Minka, “T ree-structured approximations by expecta- tion propagation, ” Advances in Neural Information Pr ocessing Systems (NIPS), Dec. 2004, pp. 193–200. [11] E. Ising, “Beitrag zur theorie des ferromagnetismus, ” Zeitschrift f ¨ ur Physik A Hadr ons and Nuclei, vol. 31, pp. 253–258, Feb. 1925. [12] B. A. Cipra, “ An introduction to the Ising model, ” American Mathemat- ical Monthly , vol. 94, pp. 937–959, Dec. 1987. [13] R. B. Potts, “Some generalized order-disorder transformations, ” Proc. the Cambridge Philosophical Society , vol. 48, pp. 106–109, Jan. 1952. [14] F . Y . W u, “The Potts model, ” Rev . of Modern Phys., vol. 54, pp. 235– 268, Jan. 1982. [15] Y . Boyko v , O. V eksler , and R. Zabih, “Fast approximate energy mini- mization via graph cuts, ” IEEE T rans. on P attern Analysis and Machine Intelligence, vol. 23, pp. 1222–1239, Nov . 2001. 16 [16] J. Besag and P . J. Green, “Spatial statistics and Bayesian computation, ” Journal of the Royal Statistical Society . Series B, vol. 55, pp. 25–37, 1993 [17] F . Y . W u, “Potts model and graph theory , ” Journal of Statistical Physics, vol. 52, pp. 99–112, July 1988. [18] A. D. Sokal, “Bounds on the complex zeros of (di) chromatic polyno- mials and Potts-model partition functions, ” Combinatorics, Probability and Computing, vol. 10, pp. 41–77, 2001. [19] A. D. Sokal, “The multiv ariate T utte polynomial (alias Potts model) for graphs and matroids, ” Surveys in Combinatorics, vol. 327, pp. 173–226, Cambridge University Press, 2005. [20] P . W . Kasteleyn, “Dimer statistics and phase transitions, ” Journal of Mathematical Physics, vol. 4, pp. 287–293, Feb. 1963. [21] M. E. Fisher, “On the dimer solution of planar Ising models, ” Journal of Mathematical Physics , vol. 7, pp. 1776–1781, Oct. 1966. [22] V . G ´ omez, H. J. Kappen, and M. Chertkov , “ Approximate inference on planar graphs using loop calculus and belief propagation, ” Journal of Mach. Learn. Res. , vol. 11, pp. 1273–1296, April 2010. [23] V . V . V azirani, Appr oximation Algorithms. Springer, 2004. [24] M. Jerrum and A. Sinclair , “Polynomial-time approximation algorithms for the Ising model, ” SIAM Journal on Computing, vol. 11, pp. 1087– 1116, Oct. 1993. [25] L. A. Goldberg and M. Jerrum, “ Approximating the partition function of the ferromagnetic Potts model, ” Journal of the ACM , vol. 59, pp. 1222– 1239, Oct. 2012. [26] L. A. Goldberg and M. Jerrum, “The complexity of computing the sign of the T utte polynomial, ” SIAM Journal on Computing, vol. 43, pp. 1921–1952, Dec. 2014. [27] A. Galanis, D. ˇ Stefanko vi ˇ c, E. V igoda, and L. Y ang, “Ferromagnetic Potts model: refined #BIS-hardness and related results, ” SIAM Journal on Computing, vol. 45, pp. 2004–20065, Nov . 2016. [28] J. M. Hammersley and D. C. Handscomb, Monte Carlo Methods. Methuen & Co., London, 1964. [29] R. M. Neal, Pr obabilistic Infer ence Using Markov Chain Monte Carlo Methods. T echn. Report CRG-TR-93-1, Dept. Computer Science, Univ . of T oronto, Sept. 1993. [30] K. Binder and D. W . Heermann, Monte Carlo Simulation in Statistical Physics. Springer, 2010. [31] M. Molkaraie and H.-A. Loeliger, “Monte Carlo algorithms for the partition function and information rates of two-dimensional channels, ” IEEE T rans. Inf. Theory , vol. 59, pp. 495–503, Jan. 2013. [32] G. Potamianos and J. Goutsias, “Stochastic approximation algorithms for partition function estimation of Gibbs random fields, ” IEEE T rans. Inf. Theory , vol. 43, pp. 1984–1965, Nov . 1997. [33] I. Murray , D. Mackay , Z. Ghahramani, and J. Skilling, “Nested sampling for Potts models, ” Advances in Neural Information Pr ocessing Systems (NIPS), Dec. 2005, pp. 947–954. [34] R. H. Swendsen and J. S. W ang, “Nonuni versal critical dynamics in Monte Carlo simulations, ” Phys. Rev . Letters, vol. 58, pp. 86–88, Jan. 1987. [35] G. D. Forney , Jr ., “Codes on graphs: normal realizations, ” IEEE Tr ans. Inf. Theory , vol. 47, pp. 520–548, Feb . 2001. [36] A. Al-Bashabsheh and Y . Mao, “Normal factor graphs and holographic transformations, ” IEEE T rans. Inf. Theory , vol. 57, pp. 752–763, Feb. 2011. [37] G. D. Forney , Jr ., “Codes on graphs: duality and MacW illiams identities, ” IEEE T rans. Inf. Theory, vol. 57, pp. 1382–1397, Feb. 2011. [38] G. D. Forney , Jr . and P . O. V ontobel, “Partition functions of normal factor graphs, ” 2011 Information Theory and Applications W orkshop, La Jolla, USA, Feb. 6–11, 2011. [39] H. A. Kramers and G. H. W annier, “Statistics of the two-dimensional ferromagnet. Part I, ” Phys. Rev ., vol. 60, pp. 252–262, Aug. 1941. [40] A. Al-Bashabsheh and P . V ontobel, “The Ising model: Kramers-W annier duality and normal factor graphs, ” Proc. 2015 IEEE Int. Symp. on Inf. Theory , Hong Kong, June 14–19, 2015, pp. 2266–2270. [41] A. Al-Bashabsheh and P . V ontobel, “ A factor-graph approach to alge- braic topology , with applications to Kramers–W annier duality , ” IEEE T rans. Inf. Theory , vol. 64, pp. 7488–7510, Dec. 2018. [42] G. D. Forney , Jr ., “Codes on graphs: Models for elementary algebraic topology and statistical physics, ” IEEE T rans. Inf. Theory , vol. 64, pp. 7465–7487, Dec. 2018. [43] A. Al-Bashabsheh and Y . Mao, “On stochastic estimation of the partition function, ” Pr oc. 2014 IEEE Int. Symp. on Inf. Theory , Honolulu, USA, June 29 – July 4, 2014, pp. 1504–1508. [44] L. Onsager, “Crystal statistics. I. A two-dimensional model with an order-disorder transition, ” Phys. Rev ., vol. 65, pp. 117–149, Feb. 1944. [45] M. Molkaraie, “The primal versus the dual Ising model, ” Proc. 55th Annual Allerton Conf. on Communication, Contr ol, and Computing, Monticello, USA, Oct. 3–6, 2017, pp. 53–60. [46] B. Bollob ´ as, Modern Graph Theory . Springer, 1998. [47] I. Csisz ´ ar and P . C. Shields, “Information theory and statistics: a tutorial, ” F oundations and T r ends in Communications and Information Theory , vol. 1, 2004, pp. 417–528. [48] T . H. Cormen, C. E. Leiserson, C. Eric, R. L. Rivest, and C. Stein, Intr oduction to Algorithms. MIT Press, 2009. [49] K. P . Murphy , Machine Learning: A Probabilistic P erspective. MIT Press, 2012. [50] J. S. Y edidia, J. S. Freeman, W . T . Freeman, and Y . W eiss, “Generalized belief propagation, ” Advances in Neural Information Processing Systems (NIPS), Nov . 2000, pp. 689–695. [51] T . Heskes, A. Kees, and B. Kappen, “ Approximate inference and constrained optimization, ” Uncertainty in Artificial Intelligence (UAI), Aug. 2002, pp. 313–320. [52] J. M. Mooij, “libD AI: A free and open source C++ library for discrete approximate inference in graphical models, ” Journal of Mach. Learn. Res., vol. 11, pp. 2169–2173, Aug. 2010. [53] Y . Ogata and M. T anemura, “Estimation of interaction potentials of spatial point patterns through the maximum likelihood procedure, ” Ann. Inst. Statist. Math., vol. 33, pp. 315–338, 1981. [54] R. M. Neal, “ Annealed importance sampling, ” Statistics and Computing , vol. 11, pp. 125–139, April 2001. [55] H. Nishimoro and G. Oritz, Elements of Phase T ransition and Critical Phenomena. Oxford Univ ersity Press, 2011. [56] R. J. McEliece, The Theory of Information and Coding: A Mathematical F ramework for Communication. Addison-W esley , 1977. [57] G. F . Newell and E. W . Montroll, “On the theory of the Ising model of ferromagnetism, ” Rev . of Modern Phys., vol. 25, pp. 353–389, April 1953. [58] M. Molkaraie and H.-A. Loeliger, “Extending Monte Carlo methods to factor graphs with negati ve and complex factors, ” Pr oc. 2012 IEEE Information Theory W orkshop, Lausanne, Switzerland, Sept. 3–7, 2012, pp. 367–371. [59] M. X. Cao and P . O. V ontobel, “Estimating the information rate of a channel with classical input and output and a quantum state, ” Pr oc. 2017 IEEE Int. Symp. on Inf. Theory , Aachen, Germany , June 25–30, 2017, pp. 3205–3209. [60] M. X. Cao and P . O. V ontobel, “Double-edge factor graphs: definition, properties, and examples, ” Proc. 2017 IEEE Information Theory W ork- shop, Kaohsiung, T aiwan, Nov . 6–10, 2017, pp. 136–140. [61] H.-A. Loeliger and P . O. V ontobel, “Factor graphs for quantum proba- bilities, ” IEEE T rans. Inf. Theory , vol. 63, pp. 5642–5665, Sept. 2017. [62] B. M. McCoy and T . T . Wu, The T wo-Dimensional Ising Model. Courier Corporation, 2014.
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment