A New Family of Tractable Ising Models

Valerii Likhosherstov (1), Yury Maximov (2, 1) and Michael Chertkov (3, 2, 1)

1 Skolkovo Institute of Science and Technology, Moscow, Russia
2 Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
3 Graduate Program in Applied Mathematics, University of Arizona, Tucson, AZ, USA

June 18, 2019

Abstract

We present a new family of zero-field Ising models over $N$ binary variables/spins obtained by consecutive "gluing" of planar and $O(1)$-sized components along subsets of at most three vertices into a tree. The polynomial-time algorithm of the dynamic programming type for exact inference (partition function computation) and sampling consists of a sequential application of efficient (for planar) or brute-force (for $O(1)$-sized) inference and sampling to the components as a black box. To illustrate the utility of the new family of tractable graphical models, we first build an $O(N^{3/2})$ algorithm for inference and sampling of the $K_5$-minor-free zero-field Ising models, an extension of the planar zero-field Ising models which is neither genus- nor treewidth-bounded. Second, we demonstrate empirically an improvement in the approximation quality of the NP-hard problem of the square-grid Ising model (with non-zero field) inference.

1 Introduction

Let $G = (V, E)$ be an undirected graph with a set of vertices $V(G)$ and a set of normal edges $E(G)$ (no loops or multiple edges). We discuss Ising models which associate the following probability to each random $N \triangleq |V(G)|$-dimensional binary variable/spin configuration $X \in \{\pm 1\}^N$:

$$P(X) \triangleq \frac{W(X)}{Z}, \tag{1}$$

where

$$W(X) \triangleq \exp\Big(\sum_{v \in V(G)} \mu_v x_v + \sum_{e = \{v, w\} \in E(G)} J_e x_v x_w\Big) \quad \text{and} \quad Z \triangleq \sum_{X \in \{\pm 1\}^N} W(X). \tag{2}$$
Here, $\mu = (\mu_v, v \in V(G))$ is a vector of (magnetic) fields, $J = (J_e, e \in E(G))$ is a vector of the (pairwise) spin interactions, and the normalization constant $Z$, which is defined as a sum over $2^N$ spin configurations, is referred to as the partition function. Given the model specification $I = \langle G, \mu, J \rangle$, we address the tasks of finding the exact value of $Z$ (inference) and drawing exact samples with the probability (1).

Related work. It has been known since the seminal contributions of Fisher (Fisher 1966) and Kasteleyn (Kasteleyn 1963) that computation of the partition function in the zero-field ($\mu = 0$) Ising model over a planar graph and sampling from the respective probability distribution are both tractable, that is, these are tasks of complexity polynomial in $N$. As shown by Barahona (Barahona 1982), these positive results are hard to generalize: adding a non-zero (magnetic) field, even when $G$ is planar, or extending beyond planar graphs, even when $\mu = 0$ (zero field), makes the computation of the partition function NP-hard. These results are also consistent with the statement of Jerrum and Sinclair (Jerrum and Sinclair 1993) that computing the partition function of the zero-field Ising model is a #P-complete problem, even in the ferromagnetic case when all components of $J$ are positive. Therefore, describing $\langle G, \mu, J \rangle$ families for which computation of the partition function and sampling are tractable remains an open question.

The simplest tractable (i.e., inference and sampling are polynomial in $N$) example is the one where $G$ is a tree; the corresponding inference algorithm, known as dynamic programming and/or belief propagation, has a long history in physics (Bethe 1935; Peierls 1936), optimal control (Bellman 1952), information theory (Gallager 1963), and artificial intelligence (Pearl 1982).
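As a concrete reference point for definitions (1)–(2), the following Python sketch (our illustration, not part of the method described here) computes $Z$ and draws exact samples by enumerating all $2^N$ configurations, which is feasible only for very small $N$. It also checks the tree case mentioned above: for a zero-field model on a tree, the change of variables $\sigma_e = x_v x_w$ decouples the edges and gives $Z = 2 \prod_e 2\cosh J_e$. The function names and the example tree are hypothetical.

```python
import itertools
import math
import random


def weight(x, mu, J):
    """W(X) of Eq. (2); x: vertex -> +/-1, mu: vertex -> field, J: (v, w) -> J_e."""
    field = sum(mu.get(v, 0.0) * s for v, s in x.items())
    pairs = sum(j * x[v] * x[w] for (v, w), j in J.items())
    return math.exp(field + pairs)


def all_configs(vertices):
    """All 2^N spin configurations as dicts vertex -> +/-1."""
    return [dict(zip(vertices, s))
            for s in itertools.product((-1, 1), repeat=len(vertices))]


def partition_function(vertices, mu, J):
    """Z of Eq. (2) by exhaustive summation (exponential in N)."""
    return sum(weight(x, mu, J) for x in all_configs(vertices))


def exact_sample(vertices, mu, J, rng):
    """One exact draw from P(X) = W(X) / Z of Eq. (1)."""
    configs = all_configs(vertices)
    weights = [weight(x, mu, J) for x in configs]
    return rng.choices(configs, weights=weights, k=1)[0]


# Zero-field model on a small tree with edges 0-1, 0-2, 0-3, 3-4.
vertices = [0, 1, 2, 3, 4]
J_tree = {(0, 1): 0.5, (0, 2): -1.2, (0, 3): 0.3, (3, 4): 2.0}
Z_brute = partition_function(vertices, {}, J_tree)
# Closed form for a zero-field tree: Z = 2 * prod_e 2 cosh(J_e).
Z_tree = 2 * math.prod(2 * math.cosh(j) for j in J_tree.values())
print(Z_brute, Z_tree)
print(exact_sample(vertices, {}, J_tree, random.Random(0)))
```

The exhaustive version is only a correctness oracle; the point of the algorithms discussed in this manuscript is to avoid the $2^N$ enumeration.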
Extension to the case when $G$ is a tree of $(t+1)$-sized cliques "glued" together, or more formally when $G$ has treewidth $t$, is known as the junction tree algorithm (Verner Jensen, Olesen, and Andersen 1990), whose counting and sampling complexity grows exponentially with $t$. Another insight originates from the foundational statistical physics literature of the last century related to a zero-field version of (1) when $G$ is planar. Onsager (Onsager 1944) found a closed-form solution of (1) in the case of a homogeneous Ising model over an infinite two-dimensional square grid. Kac and Ward (Kac and Ward 1952) reduced the inference of (1) over a finite square lattice to computing a determinant. Kasteleyn (Kasteleyn 1963) generalized this result to an arbitrary planar graph. Kasteleyn's approach consists of expanding each vertex of $G$ into a gadget and reducing the Ising model inference to the problem of counting perfect matchings over the expanded graph. Kasteleyn's construction was simplified by Fisher in (Fisher 1966). The tightest running time estimate for Kasteleyn's method is $O(N^{3/2})$. Kasteleyn conjectured, and it was later proven in (Gallucio and Loebl 1999), that the approach extends to the zero-field Ising model over graphs embedded in a surface of genus $g$ with a multiplicative $O(4^g)$ penalty. A slightly different reduction to perfect matching counting (Barahona 1982; Bieche, Uhry, Maynard, and Rammal 1980; Schraudolph and Kamenetsky 2009) also allows one to implement $O(N^{3/2})$ sampling of planar zero-field Ising models using Wilson's algorithm (Likhosherstov, Maximov, and Chertkov 2019; Wilson 1997). A $K_{33}$-minor-free (Figure 1(b)) extension of planar zero-field inference and sampling was constructed in (Likhosherstov, Maximov, and Chertkov 2019).
An upper-bound approximation to a general class of inference problems can be built by utilizing the family of tractable spanning Ising submodels, either trees (Wainwright, Jaakkola, and Willsky 2005) or planar topologies (Globerson and Jaakkola 2007).

Contribution. In this manuscript, we first describe a new family of zero-field Ising models on graphs that are more general than planar. Given a tree decomposition of such graphs into planar and "small" ($O(1)$-sized) components "glued" together along sets of at most three vertices, inference and sampling over the new family of models take polynomial time. We further show that all $K_5$-minor-free graphs are included in this family and, moreover, that their aforementioned tree decomposition can be constructed in $O(N)$ time. (See Figure 1(a) for an illustration.) This allows us to prove an $O(N^{3/2})$ upper bound on the run time complexity of inference and sampling of the $K_5$-free zero-field Ising models. The set of $K_5$-free graphs includes all planar graphs and, in general, is neither genus- nor treewidth-bounded. Second, we show how the newly introduced tractable family of zero-field Ising models allows one to extend the approach of (Globerson and Jaakkola 2007) to upper-bound the log-partition function of arbitrary Ising models. Instead of using planar spanning subgraphs as in (Globerson and Jaakkola 2007), we utilize more general (nonplanar) basic tractable elements. Using the methodology of (Globerson and Jaakkola 2007), we illustrate the approach through experiments with a nonzero-field Ising model on a square grid, for which inference is NP-hard (Barahona 1982).

Relation to other algorithms. The result presented in this manuscript is similar to the approach used to count perfect matchings in $K_5$-free graphs (Curticapean 2014; Straub, Thierauf, and Wagner 2014).
However, we do not use a transition to perfect matching counting, as is typically done in studies of zero-field Ising models over planar graphs (Fisher 1966; Kasteleyn 1963; Thomas and Middleton 2009). Presumably, a direct transition to perfect matching counting can be done via a construction of an expanded graph in the fashion of (Fisher 1966; Kasteleyn 1963). However, this results in a size increase and, more importantly, there is no direct correspondence between spin configurations and perfect matchings, so sampling is not supported. Our approach can also be viewed as extending results reported in (Likhosherstov, Maximov, and Chertkov 2019) on inference and sampling in the $K_{33}$-free zero-field Ising models. In (Likhosherstov, Maximov, and Chertkov 2019), $K_{33}$-free graphs are decomposed into planar and $K_5$ components along pairs of vertices, and the whole construction relies on the underlying planar perfect matching model. In this manuscript, we reformulate the $K_{33}$-free construction of (Likhosherstov, Maximov, and Chertkov 2019) directly in terms of the Ising model, bypassing the mapping to perfect matchings. Moreover, extending the decomposition to gluing over triplets of vertices generalizes the construction, in particular yielding novel results for efficient inference and sampling for the zero-field Ising models over $K_5$-free graphs.

Structure. Section 2 formally introduces the concept of the so-called $c$-nice decomposition of graphs and formulates and proves tractability of $c$-nice decomposable zero-field Ising models. Section 2.1 introduces basic notations used later in the manuscript.
Section 2.2 describes a useful technical instrument, called conditioning, which is then used in Section 2.3 and Section 2.4 to describe algorithms for efficient inference and sampling, respectively, of zero-field Ising models over graphs that admit a $c$-nice decomposition, where $c$ is a positive integer. Section 3 describes an application of the algorithm to the example of the $K_5$-free zero-field Ising models. Section 4 presents an empirical application of the newly introduced family of tractable models to upper-bounding the log-partition function of a broader family of intractable graphical models (planar nonzero-field Ising models). Section 5 is reserved for conclusions.

2 Algorithm

We commence by introducing the concept of $c$-nice decomposition of a graph and stating the main result on the tractability of the new family of Ising models in subsection 2.1. We introduce a helpful "conditioning" machinery in subsection 2.2 and then describe the efficient inference (subsection 2.3) and sampling (subsection 2.4) algorithms, which constructively prove the statement.

2.1 Decomposition tree and the main result

Throughout the text, we use common graph-theoretic notations and definitions (Diestel 2006) and also briefly restate the most important concepts. We mainly follow (Curticapean 2014; Reed and Li 2008) in the definition of the decomposition tree and of its properties sufficient for our goals. Again, we point out that we only consider graphs without loops or multiple edges.

A graph is planar when it can be drawn on a plane without edge intersections. Graph $G'$ is a subgraph of $G$ whenever $V(G') \subseteq V(G)$ and $E(G') \subseteq E(G)$. For two subgraphs $G'$ and $G''$ of $G$, let $G' \cup G'' = (V(G') \cup V(G''), E(G') \cup E(G''))$ (graph union). Consider a tree decomposition $\mathcal{T} = \langle T, \mathcal{G} \rangle$ of a graph $G$ into a set of subgraphs $\mathcal{G} \triangleq \{G_t\}$ of $G$, where $t$ are nodes of a tree $T$, that is, $t \in V(T)$.
One of the nodes of the tree, $r \in V(T)$, is selected as the root. For each node $t \in V(T)$ other than $r$, its parent is the first node on the unique path from $t$ to $r$. $G_{\le t}$ denotes the graph union of $G_{t'}$ over all nodes $t' \in V(T)$ that are $t$ or its descendants. $G_{\not\le t}$ denotes the graph union of $G_{t'}$ over all nodes $t' \in V(T)$ that are neither $t$ nor descendants of $t$. For two neighboring nodes of the tree, $t, p \in V(T)$ with $\{t, p\} \in E(T)$, the set of overlapping vertices of $G_t$ and $G_p$, $K \triangleq V(G_t) \cap V(G_p)$, is called an attachment set of $t$ or $p$. If $p$ is the parent of $t$, then $K$ is the navel of $t$. We assume that the navel of the root is empty. $\mathcal{T}$ is a $c$-nice decomposition of $G$ if the following requirements are satisfied:

1. For every $t \in V(T)$ with navel $K$, it holds that $K = V(G_{\le t}) \cap V(G_{\not\le t})$.
2. Every attachment set $K$ is of size 0, 1, 2, or 3.
3. For every $t \in V(T)$, either $|V(G_t)| \le c$ or $G_t$ is planar.
4. If $t \in V(T)$ is such that $|V(G_t)| > c$, adding to $G_t$ every edge $e = \{v, w\}$ with $v, w$ in the same attachment set of $t$ (if $e$ is not yet in $E(G_t)$) does not destroy the planarity of $G_t$.

Stating it informally, a $c$-nice decomposition of $G$ is a tree decomposition of $G$ into planar and "small" (of size at most $c$) subgraphs $G_t$, "glued" via subsets of at most three vertices of $G$. Figure 1(a) shows an example of a $c$-nice decomposition with $c = 8$. There are various similar ways to define a graph decomposition in the literature; the one presented above is customized to include only the properties significant for our subsequent analysis. The remainder of this section is devoted to a constructive proof of the following statement.

Theorem 1. Let $I = \langle G, 0, J \rangle$ be any zero-field Ising model such that there exists a $c$-nice decomposition $\mathcal{T}$ of $G$, where $c$ is an absolute constant.
Then, there is an algorithm which, given $I$ and $\mathcal{T}$ as input, does two things: (1) finds $Z$ and (2) samples a configuration from $I$, in time $O\big(\sum_{t \in V(T)} |V(G_t)|^{3/2}\big)$.

Figure 1: a) An exemplary graph $G$ and its 8-nice decomposition $\mathcal{T}$, where $t \in \{1, \dots, 7\}$ labels nodes of the decomposition tree $T$ and node 4 is chosen as the root ($r = 4$). Identical vertices of $G$ in its subgraphs $G_t$ are shown connected by dashed lines. Navels of size 1, 2, and 3 are highlighted. Component $G_5$ is nonplanar, and $G_4$ becomes nonplanar when all attachment edges are added (according to the fourth item of the definition of the $c$-nice decomposition). $G_{\le 3}$ and $G_{\not\le 3}$ are shown with dotted lines. Note that the decomposition of a graph is not unique. For instance, edges that belong to an attachment set can go to either of the two subgraphs containing this set or even repeat in both. b) Minors $K_5$ and $K_{33}$ are forbidden in planar graphs. The Möbius ladder and its subgraphs are the only nonplanar graphs allowed in the 8-nice decomposition of a $K_5$-free graph. c) The left panel is an example of conditioning on three vertices/spins in the center of a graph. The right panel shows a modified graph where the three vertices (from the left panel) are reduced to one vertex, which leads to a modification of the pairwise interactions within the associated zero-field Ising model over the reduced graph. d) Example of a graph that contains $K_5$ as a minor: by contracting the highlighted groups of vertices and deleting the remaining vertices, one arrives at the $K_5$ graph.

2.2 Inference and sampling conditioned on 1, 2, or 3 vertices/spins

Before presenting the algorithm that proves Theorem 1 constructively, let us introduce the auxiliary machinery of "conditioning", which describes the partition function of a zero-field Ising model over a planar graph conditioned on 1, 2, or 3 spins.
Consider a zero-field Ising model $I = \langle G, 0, J \rangle$ defined over a planar graph $G$. Recall the following result, rigorously proven in (Likhosherstov, Maximov, and Chertkov 2019, Corollary under Theorem 1), which we intend to use in the aforementioned tree decomposition as a black box.

Theorem 2. Given $I = \langle G, 0, J \rangle$, where $G$ is planar, $Z$ can be found in time $O(N^{3/2})$. Drawing a sample from $I$ is a task of $O(N^{3/2})$ complexity.

Let us now introduce the notion of conditioning. Consider a spin configuration $X \in \{\pm 1\}^N$, a subset $V' = \{v(1), \dots, v(\omega)\} \subseteq V(G)$, and define a condition $S = \{x_{v(1)} = s(1), \dots, x_{v(\omega)} = s(\omega)\}$ on $V'$, where $s(1), \dots, s(\omega) = \pm 1$ are fixed values. The conditional versions of the probability distribution (1)–(2) and the conditional partition function become

$$P(X \mid S) \triangleq \frac{W(X) \times \mathbb{1}(X \mid S)}{Z|_S}, \qquad \mathbb{1}(X \mid S) \triangleq \begin{cases} 1, & x_{v(1)} = s(1), \dots, x_{v(\omega)} = s(\omega), \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

where

$$Z|_S \triangleq \sum_{X \in \{\pm 1\}^N} W(X) \times \mathbb{1}(X \mid S). \tag{4}$$

Notice that when $\omega = 0$, $S = \{\}$ and (3)–(4) reduce to (1)–(2). A subset of $V(G)$ is connected whenever the subgraph induced by this subset is connected. Theorem 2 can be extended as follows (a formal proof can be found in the supplementary materials).

Lemma 1. Given $I = \langle G, 0, J \rangle$, where $G$ is planar, and a condition $S$ on a connected subset $V' \subseteq V(G)$ with $|V'| \le 3$, computing the conditional partition function $Z|_S$ and sampling from $P(X \mid S)$ are tasks of $O(N^{3/2})$ complexity.

We omit here the tedious proof of the Lemma and only mention that the conditioning algorithm proving the Lemma takes the subset of connected vertices and "collapses" it into a single vertex. The graph remains planar, and the task is reduced to conditioning on one vertex, which is an elementary operation given Theorem 2. (See Figure 1(c) for an illustration.)
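The collapsing step can be illustrated with a small brute-force Python sketch; brute force here stands in for, and is much slower than, the $O(N^{3/2})$ planar routine of Theorem 2, and the helper names and the merged-vertex label "z" are hypothetical. Fixing $x_{v(i)} = s(i)$ turns every edge $\{v(i), w\}$ with $w \notin V'$ into a term $s(i) J_{v(i) w} x_w$, and every edge inside $V'$ into a constant factor. Merging $V'$ into one vertex $z$ pinned to $+1$ turns these terms into ordinary pairwise interactions $J_{zw} = \sum_i s(i) J_{v(i) w}$, so the reduced model stays zero-field, and by spin-flip symmetry pinning $z$ keeps exactly half of its partition function.

```python
import itertools
import math


def conditional_partition(vertices, J, condition=None):
    """Brute-force Z|S of Eq. (4) for a zero-field model.

    J maps an edge (v, w) -> J_e; condition maps vertex -> fixed spin.
    """
    condition = condition or {}
    Z = 0.0
    for spins in itertools.product((-1, 1), repeat=len(vertices)):
        x = dict(zip(vertices, spins))
        if any(x[v] != s for v, s in condition.items()):
            continue  # the indicator 1(X|S) of Eq. (3) vanishes
        Z += math.exp(sum(j * x[v] * x[w] for (v, w), j in J.items()))
    return Z


def collapse(vertices, J, condition, merged="z"):
    """Merge the conditioned vertices into one new vertex `merged`.

    `merged` must not already be a vertex label. Returns (new_vertices,
    new_J, log_const) such that Z|S = exp(log_const) * Z_reduced / 2.
    """
    new_vertices = [v for v in vertices if v not in condition] + [merged]
    new_J, log_const = {}, 0.0
    for (v, w), j in J.items():
        if v in condition and w in condition:
            log_const += j * condition[v] * condition[w]  # both spins fixed
        elif v in condition or w in condition:
            fixed, free = (v, w) if v in condition else (w, v)
            key = (merged, free)  # parallel edges to `free` are summed
            new_J[key] = new_J.get(key, 0.0) + j * condition[fixed]
        else:
            new_J[(v, w)] = j
    return new_vertices, new_J, log_const


# A 4-cycle 0-1-2-3 with chord 0-2; condition on the connected set {0, 1}.
verts = [0, 1, 2, 3]
J = {(0, 1): 0.7, (1, 2): -0.4, (2, 3): 1.1, (3, 0): 0.2, (0, 2): -0.9}
S = {0: +1, 1: -1}
direct = conditional_partition(verts, J, S)
red_vertices, red_J, log_c = collapse(verts, J, S)
# Zero field: pinning the merged spin to +1 keeps exactly half of Z_reduced.
via_collapse = math.exp(log_c) * conditional_partition(red_vertices, red_J) / 2
print(direct, via_collapse)
```

For sampling under the same assumptions, one can draw a configuration of the reduced model and flip all of its spins whenever $x_z = -1$; this again relies only on the spin-flip symmetry of a zero-field model.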
2.3 Inference algorithm

This subsection constructively proves the inference part of Theorem 1. For each $t \in V(T)$, let $I_{\le t} \triangleq \langle G_{\le t}, 0, \{J_e \mid e \in E(G_{\le t}) \subseteq E(G)\} \rangle$ denote the zero-field Ising submodel induced by $G_{\le t}$. Denote the partition function and the subvector of $X$ related to $I_{\le t}$ as $Z_{\le t}$ and $X_{\le t} \triangleq \{x_v \mid v \in V(G_{\le t})\}$, respectively. Further, let $K$ be $t$'s navel and let $S = \{\forall v \in K: x_v = s(v)\}$ denote some condition on $K$. Recall that $|K| \le 3$. For each $t$, the algorithm computes the conditional partition functions $Z_{\le t}|_S$ for all choices of condition spin values $\{s(v) = \pm 1\}$. Each $t$ is processed only after its children have been processed, so the algorithm starts at the leaves and ends at the root. If $r \in V(T)$ is the root, its navel is empty and $G_{\le r} = G$; hence $Z = Z_{\le r}|_{\{\}}$ is computed after $r$'s processing.

Suppose all children $c_1, \dots, c_m \in V(T)$ of $t$, with navels $K_1, \dots, K_m \subseteq V(G_t)$, have already been processed, and now $t$ itself is considered. Denote a spin configuration on $G_t$ as $Y_t \triangleq \{y_v = \pm 1 \mid v \in V(G_t)\}$. $I_{\le c_1}, \dots, I_{\le c_m}$ are the submodels of $I_{\le t}$ induced by $G_{\le c_1}, \dots, G_{\le c_m}$, which can only intersect at their navels in $G_t$. Based on this, one states the following dynamic programming relation:

$$Z_{\le t}|_S = \sum_{Y_t \in \{\pm 1\}^{|V(G_t)|}} \mathbb{1}(Y_t \mid S) \exp\Big(\sum_{e = \{v, w\} \in E(G_t)} J_e y_v y_w\Big) \cdot \prod_{i=1}^{m} Z_{\le c_i}|_{S_i[Y_t]}. \tag{5}$$

Here, $S_i[Y_t]$ denotes the condition $\{\forall v \in K_i: x_v = y_v\}$ on $K_i$. The goal is to perform the summation in (5) efficiently. Let $I^{(0)}, I^{(1)}, I^{(2)}, I^{(3)}$ be the partition of $\{1, \dots, m\}$ by navel sizes. Figure 2(a,b) illustrates inference in $t$.

1. Navels of size 0, 1. Notice that if $i \in I^{(0)}$, then $Z_{\le c_i}|_{\{\}} = Z_{\le c_i}$ is a constant, which was computed before. The same is true for $i \in I^{(1)}$, since $Z_{\le c_i}|_{S_i[Y_t]} = \frac{1}{2} Z_{\le c_i}$ by the spin-flip symmetry of the zero-field model.

2. Navels of size 2.
Let $i \in I^{(2)}$, denote $K_i = \{u_i, q_i\}$, and for convenience simplify the notation to $Z_{\le c_i}|_{y_1, y_2} \triangleq Z_{\le c_i}|_{x_{u_i} = y_1, x_{q_i} = y_2}$. Notice that $Z_{\le c_i}|_{S_i[Y_t]}$ is strictly positive, and due to the zero-field nature of $I_{\le c_i}$, one finds $Z_{\le c_i}|_{+1,+1} = Z_{\le c_i}|_{-1,-1}$ and $Z_{\le c_i}|_{+1,-1} = Z_{\le c_i}|_{-1,+1}$. Then, one arrives at $\log Z_{\le c_i}|_{S_i[Y_t]} = A_i + B_i y_{u_i} y_{q_i}$, where $A_i \triangleq \frac{1}{2}\big(\log Z_{\le c_i}|_{+1,+1} + \log Z_{\le c_i}|_{+1,-1}\big)$ and $B_i \triangleq \frac{1}{2}\big(\log Z_{\le c_i}|_{+1,+1} - \log Z_{\le c_i}|_{+1,-1}\big)$.

3. Navels of size 3. Let $i \in I^{(3)}$ and, as above, denote $K_i = \{u_i, q_i, h_i\}$ and $Z_{\le c_i}|_{y_1, y_2, y_3} \triangleq Z_{\le c_i}|_{x_{u_i} = y_1, x_{q_i} = y_2, x_{h_i} = y_3}$. Due to the zero-field nature of $I_{\le c_i}$, it holds that $Z_{\le c_i}|_{y_1, y_2, y_3} = Z_{\le c_i}|_{-y_1, -y_2, -y_3}$, so only the four values with $y_1 = +1$ need to be considered. Observe that there exist $A_i, B_i, C_i, D_i$ such that $\log Z_{\le c_i}|_{y_1, y_2, y_3} = A_i + B_i y_1 y_2 + C_i y_1 y_3 + D_i y_2 y_3$ for all $y_1, y_2, y_3 = \pm 1$. This is guaranteed because the $4 \times 4$ matrix in the following system has mutually orthogonal rows and is therefore invertible (the remaining four equations follow from the symmetry above):

$$\begin{pmatrix} \log Z_{\le c_i}|_{+1,+1,+1} \\ \log Z_{\le c_i}|_{+1,+1,-1} \\ \log Z_{\le c_i}|_{+1,-1,+1} \\ \log Z_{\le c_i}|_{+1,-1,-1} \end{pmatrix} = \begin{pmatrix} +1 & +1 & +1 & +1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & +1 & -1 \\ +1 & -1 & -1 & +1 \end{pmatrix} \times \begin{pmatrix} A_i \\ B_i \\ C_i \\ D_i \end{pmatrix}. \tag{6}$$

Considering the three cases, one rewrites Eq. (5) as

$$Z_{\le t}|_S = M \cdot \sum_{Y_t} \mathbb{1}(Y_t \mid S) \exp\Big(\sum_{e = \{v, w\} \in E(G_t)} J_e y_v y_w + \sum_{i \in I^{(2)} \cup I^{(3)}} B_i y_{u_i} y_{q_i} + \sum_{i \in I^{(3)}} \big(C_i y_{u_i} y_{h_i} + D_i y_{q_i} y_{h_i}\big)\Big), \tag{7}$$

Figure 2: a) Example of inference at node $t$ with children $c_1, c_2, c_3, c_4$. Navels $K_1 = \{u_1, q_1, h_1\}$, $K_2 = \{u_2, q_2, h_2\}$, $K_3 = \{u_2, q_2\}$, $K_4 = \{u_4\}$, and $K = \{u, q, h\}$ are highlighted. Fragments of $I_{\le c_i}$ are shown with dotted lines.
Here, $I^{(0)} = \varnothing$, $I^{(1)} = \{4\}$, $I^{(2)} = \{3\}$, and $I^{(3)} = \{1, 2\}$, indicating that one child is glued over one node, one child is glued over two nodes, and two children are glued over three nodes. b) The "aggregated" Ising model $I_t$ and its pairwise interactions. Both c) and d) illustrate sampling over $I_t$: one samples spins in $I_t$ conditioned on $S^{(t)}$ and then repeats the procedure at the child nodes.

Figure 3: Construction of graphs used for approximate inference on a rectangular lattice. For better visualization, vertices connected to an apex are colored white. a) The $G'$ graph. b) One of the planar $G^{(r)}$ graphs used in (Globerson and Jaakkola 2007). Such a "separator" pattern is repeated for each column and row, resulting in $2(H-1)$ graphs in $\{G^{(r)}\}$. In addition, (Globerson and Jaakkola 2007) adds an independent variables graph where only apex edges are drawn. c) A modified "separator" pattern we propose. Again, the pattern is repeated horizontally and vertically, resulting in $2(H-2)$ graphs plus an independent variables graph. This pattern covers more magnetic fields and connects separated parts. Dashed edges indicate the structure of the 10-nice decomposition used for inference. (The nonplanar node of size 10 is illustrated on the right.)

where $M \triangleq 2^{-|I^{(1)}|} \cdot \big(\prod_{i \in I^{(0)} \cup I^{(1)}} Z_{\le c_i}\big) \cdot \exp\big(\sum_{i \in I^{(2)} \cup I^{(3)}} A_i\big)$. The sum in Eq. (7) is simply a conditional partition function of a zero-field Ising model $I_t$ defined over the graph $G_t$, with the pairwise interactions of $I$ adjusted by adding the $B_i$, $C_i$, and $D_i$ summands at the appropriate navel edges (if a corresponding edge is not present in $G_t$, it has to be added). If $|V(G_t)| \le c$, then (7) is computed by brute force in $O(1)$ time, at most four times (depending on the navel size), since by spin-flip symmetry at most four conditions on $K$ give distinct values. Otherwise, if $K$ is a disconnected set in $G_t$, we add zero-interaction edges inside it to make it connected. The possible addition of edges inside $K, K_1, \dots, K_m$ does not destroy planarity, according to the fourth item in the definition of the $c$-nice decomposition above. Finally, we compute (7) using Lemma 1 in time $O(|V(G_t)|^{3/2})$. The inference part of Theorem 1 follows directly from the procedure just described.

[Figure 4 panels: Z bound error, pairwise marginals error, and singleton marginal error, each plotted against interaction strength from 1.0 to 3.0 for PSG, DSG, and TRW.]

Figure 4: Comparison of the tree-reweighted approximation (TRW), planar spanning graph (PSG), and decomposition-based spanning graph (DSG) approaches. The first plot shows the normalized log-partition error, the second shows the error in pairwise marginals, and the third shows the error in the singleton central marginal. Standard errors over 100 trials are shown as error bars. An asterisk "*" indicates a statistically significant improvement of DSG over PSG, with a p-value smaller than 0.01 according to the Wilcoxon test with the Bonferroni correction (Wilcoxon 1945).

2.4 Sampling algorithm

Next, we address the sampling part of Theorem 1. We extend the algorithm from Section 2.3 so that it supports efficient sampling from $I$. Assume that the inference pass through $T$ (from leaves to root) has been done, so that $I_t$ is computed for all $t \in V(T)$. Denote $X_t \triangleq \{x_v \mid v \in V(G_t)\}$. The sampling algorithm runs backwards, first drawing the spin values $X_r$ at the root $r$ of $T$ from the marginal distribution $P(X_r)$ and then processing each node $t$ of $T$ after its parent $p$ is processed. Processing consists of drawing spins $X_t$ from $P(X_t \mid X_p) = P(X_t \mid X^{(t)})$, where $X^{(t)} \triangleq \{x_v \mid v \in K\}$ and $K$ is the navel of $t$. This marginal-conditional scheme generates a correct sample $X$ of spins over $G$. Let $P_{\le t}(X_{\le t})$ denote the spin distribution of $I_{\le t}$.
Because the Ising model is an example of a Markov random field, it holds that $P_{\le t}(X_{\le t} \mid X^{(t)}) = P(X_{\le t} \mid X^{(t)})$. We further derive

$$\begin{aligned} P(X_t \mid X^{(t)}) = P_{\le t}(X_t \mid X^{(t)}) &\propto \sum_{X_{\le t} \setminus X_t} \exp\Big(\sum_{e = \{v, w\} \in E(G_{\le t})} J_e x_v x_w\Big) = \exp\Big(\sum_{e = \{v, w\} \in E(G_t)} J_e x_v x_w\Big) \cdot \prod_{i=1}^{m} Z_{\le c_i}|_{S_i[X_t]} \\ &\propto \exp\Big(\sum_{e = \{v, w\} \in E(G_t)} J_e x_v x_w + \sum_{i \in I^{(2)} \cup I^{(3)}} B_i x_{u_i} x_{q_i} + \sum_{i \in I^{(3)}} \big(C_i x_{u_i} x_{h_i} + D_i x_{q_i} x_{h_i}\big)\Big). \end{aligned} \tag{8}$$

In other words, sampling from $P(X_t \mid X^{(t)})$ is reduced to sampling from $I_t$ conditioned on the spins $X^{(t)}$ in the navel $K$. This is done via brute force if $|V(G_t)| \le c$; otherwise, Lemma 1 allows one to draw $X_t$ in $O(|V(G_t)|^{3/2})$ time, since $|K| \le 3$. Sampling costs as much as inference, which concludes the proof of Theorem 1. Figure 2(c,d) illustrates sampling in $t$.

3 Application: $K_5$-free zero-field Ising models

Contraction is an operation of removing two adjacent vertices $v$ and $u$ (and all edges incident to them) from the graph and adding a new vertex $w$ adjacent to all neighbors of $v$ and $u$. For two graphs $G$ and $H$, $H$ is a minor of $G$ if it is isomorphic to a graph obtained from a subgraph of $G$ by a series of contractions. $G$ is $H$-free if $H$ is not a minor of $G$. We especially focus on the case $H = K_5$ (Figure 1(d)). Planar graphs are a special type of $K_5$-free graphs, according to Wagner's theorem (Diestel 2006, Chapter 4.4). Moreover, some nonplanar graphs are $K_5$-free, for example, $K_{33}$ (Figure 1(b)). $K_5$-free graphs are neither genus-bounded (a disconnected set of $g$ copies of $K_{33}$ is $K_5$-free and has genus $g$ (Battle, Harary, and Kodama 1962)) nor treewidth-bounded (a planar square grid of size $t \times t$ is $K_5$-free and has treewidth $t$ (Bodlaender 1998)).

Theorem 3. Let $G$ be a $K_5$-free graph of size $N$ with no loops or multiple edges.
Then, an 8-nice decomposition $\mathcal{T}$ of $G$ exists and can be computed in time $O(N)$.

Proof (Sketch). An equivalent decomposition is constructed in (Reed and Li 2008) in time $O(N)$. We put a formal proof in the Supplementary materials.

Remark. The $O(N)$ construction time of $\mathcal{T}$ guarantees that $\sum_{t \in V(T)} |V(G_t)| = O(N)$. All nonplanar components in $\mathcal{T}$ are isomorphic to the Möbius ladder (Figure 1(b)) or its subgraph.

The graph in Figure 1(a) is actually $K_5$-free. Theorems 1 and 3 allow us to conclude:

Theorem 4. Given $I = \langle G, 0, J \rangle$ with a $K_5$-free $G$ of size $N$, finding $Z$ and sampling from $I$ take $O(N^{3/2})$ total time.

Proof. Finding an 8-nice $\mathcal{T}$ for $G$ takes $O(N)$ time (Theorem 3). Provided with $\mathcal{T}$, the complexity is

$$O\Big(\sum_{t \in V(T)} |V(G_t)|^{3/2}\Big) = O\Big(\Big(\sum_{t \in V(T)} |V(G_t)|\Big)^{3/2}\Big) = O(N^{3/2}),$$

where we apply the convexity of $f(z) = z^{3/2}$ for $z \ge 0$ (which, together with $f(0) = 0$, implies $f(a) + f(b) \le f(a + b)$) and the Remark after Theorem 3.

4 Application: approximate inference of square-grid Ising model

In this section, we consider $I = \langle G, \mu, J \rangle$ such that $G$ is a square-grid graph of size $H \times H$. Finding $Z(G, \mu, J)$ for arbitrary $\mu, J$ is an NP-hard problem in this setting (Barahona 1982). Construct $G'$ by adding an apex vertex connected by an edge to every vertex of $G$ (Figure 3(a)). It is easy to see that $Z(G, \mu, J) = \frac{1}{2} Z(G', 0, J' = (J^{\mu} \cup J))$, where $J^{\mu} = \mu$ are the interactions assigned to the apex edges: the configurations of $G'$ with apex spin $-1$ map onto those with apex spin $+1$ by flipping all spins, so each apex value contributes $Z(G, \mu, J)$. Let $\{G^{(r)}\}$ be a family of spanning graphs ($V(G^{(r)}) = V(G')$, $E(G^{(r)}) \subseteq E(G')$) and $J^{(r)}$ be interaction values on $G^{(r)}$. Also, let $\hat{J}^{(r)}$ denote the extension of $J^{(r)}$ to $E(G')$ with $\hat{J}^{(r)}_e = 0$ for $e \in E(G') \setminus E(G^{(r)})$. Assuming that $\log Z(G^{(r)}, 0, J^{(r)})$ are tractable, the convexity of $\log Z(G', 0, J')$ in the interactions $J'$ (Jensen's inequality) allows one to write the following upper bound:

$$\log Z(G', 0, J') \le \min_{\substack{\rho(r) \ge 0,\ \sum_r \rho(r) = 1,\ \{J^{(r)}\}: \\ \sum_r \rho(r) \hat{J}^{(r)} = J'}} \sum_r \rho(r) \log Z(G^{(r)}, 0, J^{(r)}). \tag{9}$$
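The apex identity and the bound (9) can be sanity-checked by brute force on a tiny grid. The Python sketch below is only an illustration under our own assumptions: a $3 \times 3$ grid instead of $H = 15$, an apex labelled "a", and two hand-picked spanning graphs $G^{(1)}, G^{(2)}$ with $\rho = (\frac{1}{2}, \frac{1}{2})$. Both keep every apex edge; $G^{(1)}$ keeps the horizontal grid edges and $G^{(2)}$ the vertical ones, with doubled interactions so that $\sum_r \rho(r) \hat{J}^{(r)} = J'$. These two graphs are not the PSG or DSG families of Figure 3.

```python
import itertools
import math
import random


def log_partition(vertices, mu, J):
    """Brute-force log Z; mu: vertex -> field, J: (v, w) -> interaction."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(vertices)):
        x = dict(zip(vertices, spins))
        energy = sum(mu.get(v, 0.0) * x[v] for v in vertices)
        energy += sum(j * x[v] * x[w] for (v, w), j in J.items())
        total += math.exp(energy)
    return math.log(total)


def grid(H):
    """Vertices and edges of an H x H square grid."""
    V = [(r, c) for r in range(H) for c in range(H)]
    E = [((r, c), (r, c + 1)) for r in range(H) for c in range(H - 1)]
    E += [((r, c), (r + 1, c)) for r in range(H - 1) for c in range(H)]
    return V, E


rng = random.Random(0)
V, E = grid(3)
mu = {v: rng.uniform(-0.5, 0.5) for v in V}  # fields as in the experiments
J = {e: rng.uniform(-1.0, 1.0) for e in E}   # interactions with alpha = 1

# G': apex vertex "a" joined to every grid vertex with interaction mu_v.
apex = "a"
V_prime = V + [apex]
J_prime = dict(J)
J_prime.update({(apex, v): mu[v] for v in V})

lhs = log_partition(V, mu, J)                              # log Z(G, mu, J)
rhs = log_partition(V_prime, {}, J_prime) - math.log(2)    # log(Z(G', 0, J') / 2)

# Bound (9) with rho = (1/2, 1/2): both graphs keep all apex edges,
# G1 takes the horizontal grid edges and G2 the vertical ones (doubled).
J1 = {e: j for e, j in J_prime.items() if apex in e}
J2 = dict(J1)
for (u, w), j in J.items():
    if u[0] == w[0]:  # same row: horizontal edge
        J1[(u, w)] = 2 * j
    else:             # vertical edge
        J2[(u, w)] = 2 * j
exact = log_partition(V_prime, {}, J_prime)
upper = 0.5 * log_partition(V_prime, {}, J1) + 0.5 * log_partition(V_prime, {}, J2)
print(lhs, rhs)
print(exact, upper)
```

Every edge of $G'$ is covered by at least one spanning graph, which is what makes the constraint $\sum_r \rho(r) \hat{J}^{(r)} = J'$ satisfiable; optimizing over $\rho$ and $\{J^{(r)}\}$, as in (9), can only tighten this particular bound.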
After the graph set $\{G^{(r)}\}$ has been fixed, one can numerically optimize the right-hand side of (9), as shown in (Globerson and Jaakkola 2007) for planar $G^{(r)}$. The extension of the basic planar case is straightforward and can be found in the Supplementary materials for convenience. There we also describe the approximation of marginal probabilities suggested in (Globerson and Jaakkola 2007; Wainwright, Jaakkola, and Willsky 2005).

The planar spanning graph (PSG) family $\{G^{(r)}\}$ chosen in (Globerson and Jaakkola 2007) is illustrated in Figure 3(b). The tractable decomposition-based extension of the planar case presented in this manuscript suggests a more advanced construction, decomposition-based spanning graphs (DSG) (Figure 3(c)). We compare the performance of the PSG and DSG approaches, as well as that of the tree-reweighted approximation (TRW) (Wainwright, Jaakkola, and Willsky 2005), in the following Varying Interaction setting: $\mu \sim U(-0.5, 0.5)$, $J \sim U(-\alpha, \alpha)$, where $\alpha \in \{1, 1.2, 1.4, \dots, 3\}$. We optimize for grid size $H = 15$ (225 vertices, 420 edges) and compare upper bounds and marginal probability approximations (superscript alg) with exact values obtained using a junction tree algorithm (Verner Jensen, Olesen, and Andersen 1990) (superscript true). We compute three types of error:

1. normalized log-partition error $\frac{1}{H^2}(\log Z^{alg} - \log Z^{true})$,
2. error in pairwise marginals $\frac{1}{|E(G)|} \sum_{e = \{v, w\} \in E(G)} |P^{alg}(x_v x_w = 1) - P^{true}(x_v x_w = 1)|$, and
3. error in the singleton central marginal $|P^{alg}(x_v = 1) - P^{true}(x_v = 1)|$, where $v$ is the vertex of $G$ with coordinates $(8, 8)$.

We average results over 100 trials (see Fig. 4).¹ ² We use the same quasi-Newton algorithm (Bertsekas 1999) and parameters when optimizing (9) for PSG and DSG, and for most settings DSG outperforms PSG and TRW.
Cases with smaller TR W error can b e explained by the fact that TR W implicitly optimizes (9) o ver the family of al l spanning trees which can be exp onen tially big in size, while for PSG and DSG we only use O ( H ) spanning graphs. Because PSG and DSG approaches come close to eac h other, we additionally test for eac h v alue of α on eac h plot, whether the diﬀerence er r P S G − er r DS G is bigger than zero. W e appl y a one-sided Wilcoxon’s test (Wilcoxon 1945) together with the Bonferroni correction b ecause we test 33 times (Jean Dunn 1961). In most settings, the improv ement is statistically signiﬁcant (Figure 4). 1 Hardware used: 24-core Intel R  Xeon R  Gold 6136 CPU @ 3.00 GHz 2 Implementation of the algorithms is av ailable at https://github.com/ValeryTyumen/planar_ising 8 5 Conclusion In this manuscript, w e introduce a new family of zero-ﬁeld Ising mo dels comp osed of planar comp onen ts and graphs of O (1) size. F or these mo dels, w e describ e a p olynomial algorithm for exact inference and sampling provided that the decomp osition tree is also in the input. A theoretical application is O ( N 3 2 ) inference and sampling algorithm for K 5 -free zero-ﬁeld Ising mo dels—a sup erset of planar zero-ﬁeld mo dels that is neither treewidth- nor genus-bounded. A practical application is an improv ement of an approximate inference scheme for arbitrary top ologies based on planar spanning graphs (Glob erson and Jaakk ola 2007) but using tractable spanning decomp osition-based graphs instead of planar. W e leav e the algorithm as it is but substitute planar graphs with a family of spanning decomp osition-based graphs that are tractable. This alone giv es a tigh ter upp er b ound on the true partition function and a more precise approximation of marginal probabilities. 6 A ckno wledgemen ts This w ork was supp orted by the U.S. 
Department of Energy through the Los Alamos National Laboratory as part of LDRD and the DOE Grid Modernization Laboratory Consortium (GMLC). Los Alamos National Laboratory is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy (Contract No. 89233218CNA000001).

References

Barahona, F. (1982). On the computational complexity of Ising spin glass models. Journal of Physics A: Mathematical and General 15(10), 3241.
Battle, J., F. Harary, and Y. Kodama (1962, 11). Additivity of the genus of a graph. Bull. Amer. Math. Soc. 68(6), 565–568.
Bellman, R. (1952). On the theory of dynamic programming. Proceedings of the National Academy of Sciences 38(8), 716–719.
Bertsekas, D. (1999). Nonlinear Programming. Athena Scientific.
Bethe, H. (1935). Statistical theory of superlattices. Proceedings of Royal Society of London A 150, 552.
Bieche, L., J. P. Uhry, R. Maynard, and R. Rammal (1980). On the ground states of the frustration model of a spin glass by a matching method of graph theory. Journal of Physics A: Mathematical and General 13(8), 2553.
Bodlaender, H. L. (1998). A partial k-arboretum of graphs with bounded treewidth. Theoretical Computer Science 209(1), 1–45.
Curticapean, R. (2014). Counting perfect matchings in graphs that exclude a single-crossing minor. arXiv preprint arXiv:1406.4056.
Diestel, R. (2006). Graph Theory. Electronic library of mathematics. Springer.
Fisher, M. E. (1966). On the dimer solution of planar Ising models. Journal of Mathematical Physics 7(10), 1776–1781.
Gallager, R. (1963). Low density parity check codes. MIT Press, Cambridge, MA.
Galluccio, A. and M. Loebl (1999). On the theory of Pfaffian orientations. I: Perfect matchings and permanents. The Electronic Journal of Combinatorics [electronic only] 6(1), Research paper R6, 18 p.
Globerson, A. and T. S. Jaakkola (2007). Approximate inference using planar graph decomposition. In Advances in Neural Information Processing Systems, pp. 473–480.
Jean Dunn, O. (1961, 03). Multiple comparisons among means. Journal of the American Statistical Association 56, 52–64.
Jerrum, M. and A. Sinclair (1993). Polynomial-time approximation algorithms for the Ising model. SIAM Journal on Computing 22(5), 1087–1116.
Kac, M. and J. C. Ward (1952, Dec). A combinatorial solution of the two-dimensional Ising model. Phys. Rev. 88, 1332–1337.
Kasteleyn, P. W. (1963). Dimer statistics and phase transitions. Journal of Mathematical Physics 4(2), 287–293.
Likhosherstov, V., Y. Maximov, and M. Chertkov (2019, 09–15 Jun). Inference and sampling of K33-free Ising models. In K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning, Volume 97 of Proceedings of Machine Learning Research, Long Beach, California, USA, pp. 3963–3972. PMLR.
Onsager, L. (1944, Feb). Crystal statistics. I: A two-dimensional model with an order-disorder transition. Phys. Rev. 65, 117–149.
Pearl, J. (1982). Reverend Bayes on inference engines: A distributed hierarchical approach. In Proceedings of the Second AAAI Conference on Artificial Intelligence, AAAI'82, pp. 133–136. AAAI Press.
Peierls, H. (1936). Ising's model of ferromagnetism. Proceedings of Cambridge Philosophical Society 32, 477–481.
Reed, B. and Z. Li (2008). Optimization and recognition for K5-minor free graphs in linear time. In E. S. Laber, C. Bornstein, L. T. Nogueira, and L. Faria (Eds.), LATIN 2008: Theoretical Informatics, Berlin, Heidelberg, pp. 206–215. Springer Berlin Heidelberg.
Schraudolph, N. N. and D. Kamenetsky (2009). Efficient exact inference in planar Ising models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou (Eds.), Advances in Neural Information Processing Systems 21, pp. 1417–1424. Curran Associates, Inc.
Straub, S., T. Thierauf, and F. Wagner (2014, June). Counting the number of perfect matchings in K5-free graphs. In 2014 IEEE 29th Conference on Computational Complexity (CCC), pp. 66–77.
Tarjan, R. (1971, Oct). Depth-first search and linear graph algorithms. In 12th Annual Symposium on Switching and Automata Theory (SWAT 1971), pp. 114–121.
Thomas, C. K. and A. A. Middleton (2009, Oct). Exact algorithm for sampling the two-dimensional Ising spin glass. Phys. Rev. E 80, 046708.
Thomason, A. (2001, March). The extremal function for complete minors. J. Comb. Theory Ser. B 81(2), 318–338.
Verner Jensen, F., K. Olesen, and S. Andersen (1990, 08). An algebra of Bayesian belief universes for knowledge based systems. Networks 20, 637–659.
Wainwright, M. J., T. S. Jaakkola, and A. S. Willsky (2005). A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory 51(7), 2313–2335.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin 1(6), 80–83.
Wilson, D. B. (1997). Determinant algorithms for random planar structures. In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '97, Philadelphia, PA, USA, pp. 258–267. Society for Industrial and Applied Mathematics.
Zhu, C., R. H. Byrd, P. Lu, and J. Nocedal (1997, December). Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. 23(4), 550–560.

Appendices

Proof for Lemma 1

Lemma 1. Let $I = \langle G, 0, J \rangle$, where $G$ is planar, and let $S$ be a condition on a connected subset $V' \subseteq V(G)$ with $|V'| \le 3$. Then computing the conditional partition function $Z|_S$ and sampling from $P(X \mid S)$ are tasks of $O(N^{3/2})$ complexity.

Proof. We consider cases depending on $\omega$ and successively reduce each case to a simpler one.
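The sketch below can serve as a sanity check for the case analysis that follows. It computes $Z|_S$ by brute-force enumeration, which is exponential in $N$ and therefore suitable only for toy graphs. On $K_4$, which is planar, it verifies the identity of Case 2 ($Z|_{x_u = s^{(1)}} = Z/2$), the contraction identity (11) of Case 3, and identity (12) of Case 4. The helper names `conditional_z` and `contract` are hypothetical. Only the contraction bookkeeping is shown; the efficient planar routine of Theorem 2 is not reproduced.

```python
import itertools
import math


def conditional_z(vertices, J, condition=None):
    """Brute-force Z|_S of a zero-field Ising model (toy sizes only).

    J maps frozenset({v, w}) to the coupling J_e; condition maps vertex -> fixed spin.
    """
    condition = condition or {}
    free = [v for v in vertices if v not in condition]
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(free)):
        x = dict(zip(free, spins))
        x.update(condition)
        energy = 0.0
        for e, c in J.items():
            v, w = tuple(e)
            energy += c * x[v] * x[w]
        total += math.exp(energy)
    return total


def contract(vertices, J, u, h, s_u, s_h, z="z"):
    """Case 3: contract e' = {u, h} into z; returns (vertices', J', J_e')."""
    e_prime = frozenset((u, h))
    new_vertices = [v for v in vertices if v not in (u, h)] + [z]
    new_J = {}
    for e, c in J.items():
        if e == e_prime:
            continue  # absorbed into the prefactor exp(J_e' s_u s_h)
        if u in e or h in e:
            end, sign = (u, s_u) if u in e else (h, s_h)
            (v,) = e - {end}
            key, c = frozenset((z, v)), c * sign  # J_{z,v} = J_e s
        else:
            key = e
        new_J[key] = new_J.get(key, 0.0) + c  # double edges collapse by summation
    return new_vertices, new_J, J[e_prime]


# K4 with arbitrary couplings; V' = {0, 1} (Case 3) and {0, 1, 2} (Case 4) are connected.
V = [0, 1, 2, 3]
J = {frozenset(e): c for e, c in [((0, 1), 0.7), ((0, 2), -0.4), ((0, 3), 1.1),
                                   ((1, 2), 0.9), ((1, 3), -0.6), ((2, 3), 0.3)]}
s_u, s_h, s_q = -1, 1, -1
V2, J2, J_e = contract(V, J, 0, 1, s_u, s_h)
prefactor = math.exp(J_e * s_u * s_h)

assert math.isclose(conditional_z(V, J, {0: 1}), conditional_z(V, J) / 2)  # Case 2
assert math.isclose(conditional_z(V, J, {0: s_u, 1: s_h}),
                    prefactor * conditional_z(V2, J2, {"z": 1}))  # (11)
assert math.isclose(conditional_z(V, J, {0: s_u, 1: s_h, 2: s_q}),
                    prefactor * conditional_z(V2, J2, {"z": 1, 2: s_q}))  # (12)
```

In this example, vertex 2 is adjacent to both 0 and 1, so the contraction creates a double edge $\{z, 2\}$. Its summed interaction is $(-0.4)(-1) + 0.9 \cdot 1 = 1.3$.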
Here $\omega = |V'|$ is the number of conditioned spins. Write $V' = \{v^{(1)}, \ldots, v^{(\omega)}\}$, and let $S$ fix $x_{v^{(i)}} = s^{(i)}$. Where applicable, we denote $u \triangleq v^{(1)}$, $h \triangleq v^{(2)}$, $q \triangleq v^{(3)}$.
1. Conditioning on $\omega = 0$ spins. See Theorem 2.
2. Conditioning on $\omega = 1$ spin. Since configurations $X$ and $-X$ have the same probability in $I$, one deduces that $Z|_{x_u = s^{(1)}} = \frac{1}{2} Z$. Sampling $X$ from $P(X \mid x_u = s^{(1)})$ then reduces to two steps: 1) draw $X = \{x_v = \pm 1\}$ from $P(X)$; 2) return $(s^{(1)} x_u) \cdot X$ as the result.
3. Conditioning on $\omega = 2$ spins. Since $V'$ is connected, there is an edge $e' = \{u, h\} \in E(G)$. The following expansion holds:

$$
\begin{aligned}
Z|_{x_u = s^{(1)}, x_h = s^{(2)}}
&= \sum_{X:\, x_u = s^{(1)},\, x_h = s^{(2)}} \exp\Big(\sum_{e = \{v, w\} \in E(G)} J_e x_v x_w\Big) \\
&= \exp(J_{e'} s^{(1)} s^{(2)}) \cdot \sum_{X:\, x_u = s^{(1)},\, x_h = s^{(2)}} \exp\Big(\sum_{\substack{e = \{v, w\} \in E(G) \\ e \ne e'}} J_e x_v x_w\Big) \\
&= \exp(J_{e'} s^{(1)} s^{(2)}) \cdot \sum_{X:\, x_u = s^{(1)},\, x_h = s^{(2)}} \exp\Big(\sum_{\substack{e = \{v, w\} \in E(G) \\ e \cap e' = \emptyset}} J_e x_v x_w + \sum_{\substack{e = \{u, v\} \in E(G) \\ v \ne h}} (J_e s^{(1)})\, x_v \cdot 1 + \sum_{\substack{e = \{h, v\} \in E(G) \\ v \ne u}} (J_e s^{(2)})\, x_v \cdot 1\Big)
\end{aligned}
\tag{10}
$$

Obtain the graph $G'$ from $G$ by contracting $u$ and $h$ into a new vertex $z$. $G'$ is still planar and has $N - 1$ vertices. Keep the pairwise interactions of the edges that were not deleted by the contraction. For each edge $e = \{u, v\}$ with $v \ne h$, set $J_{\{z, v\}} = J_e s^{(1)}$; for each edge $e = \{h, v\}$ with $v \ne u$, set $J_{\{z, v\}} = J_e s^{(2)}$. If this creates double edges in $G'$ (when some $v$ is adjacent to both $u$ and $h$), collapse each pair into a single edge whose pairwise interaction is the sum of the collapsed interactions. Define a zero-field Ising model $I'$ on the resulting graph $G'$ with these pairwise interactions. It induces a distribution $P'(X' = \{x'_v = \pm 1 \mid v \in V(G')\})$, and we let $Z'$ denote its partition function. A closer look at (10) reveals that

$$ Z|_{x_u = s^{(1)}, x_h = s^{(2)}} = \exp(J_{e'} s^{(1)} s^{(2)}) \cdot Z'|_{x'_z = 1}, \tag{11} $$

where $Z'|_{x'_z = 1}$ is a partition function conditioned on a single spin, which can be found efficiently by Case 2.
The equality of sums in (11) holds summand-wise. Hence, for a given $X'' = \{x''_v = \pm 1 \mid v \in V(G) \setminus \{u, h\}\}$, the probabilities $P(X'' \cup \{x_u = s^{(1)}, x_h = s^{(2)}\} \mid x_u = s^{(1)}, x_h = s^{(2)})$ and $P'(X'' \cup \{x'_z = 1\} \mid x'_z = 1)$ are equal. Sampling from $P(X \mid x_u = s^{(1)}, x_h = s^{(2)})$ therefore reduces to conditional sampling from the planar zero-field Ising model $P'(X' \mid x'_z = 1)$ of size $N - 1$.
4. Conditioning on $\omega = 3$ spins. Without loss of generality, assume that $u$ and $h$ are connected by an edge $e'$ in $G$. Since $V'$ is connected, $q$ is adjacent to $u$ or $h$ in $G$, and hence to $z$ in $G'$, so $\{z, q\}$ is a connected subset of $V(G')$. A derivation similar to (10) and (11), in the notation of Case 3 ($\omega = 2$), reveals that

$$ Z|_{x_u = s^{(1)}, x_h = s^{(2)}, x_q = s^{(3)}} = \exp(J_{e'} s^{(1)} s^{(2)}) \cdot Z'|_{x'_z = 1,\, x'_q = s^{(3)}}. \tag{12} $$

This reduces inference conditioned on 3 vertices to the simpler case of 2 vertices. Likewise, sampling from $P(X \mid x_u = s^{(1)}, x_h = s^{(2)}, x_q = s^{(3)})$ reduces to the more basic task of sampling from $P'(X' \mid x'_z = 1, x'_q = s^{(3)})$.

In principle, Lemma 1 can be extended to arbitrarily large $\omega$, which leaves some freedom in the Ising model conditioning framework. In this manuscript, however, we focus on the special case above, which is sufficient for our goals.

Proof for Theorem 3

Before the proof, we introduce a series of definitions used in (Reed and Li 2008). Throughout, a graph $G = (V, E)$ with no loops or multiple edges is given. For any $X \subseteq V(G)$, let $G - X$ denote the graph $(V(G) \setminus X, \{e = \{v, w\} \in E(G) \mid v, w \notin X\})$. A set $X \subseteq V(G)$ is an $(i, j)$-cut whenever $|X| = i$ and $G - X$ has at least $j$ connected components. The graph is biconnected whenever it has no $(1, 2)$-cut. A biconnected component of the graph is a maximal biconnected subgraph. Clearly, two biconnected components can intersect in at most one vertex. When $G$ is connected, the graph of the components' intersections is a tree (a tree of biconnected components).
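The cut-based definitions above translate directly into a brute-force checker, sketched below in Python. It enumerates all $i$-subsets, so it is polynomial only for fixed $i$ and is intended purely as an illustration; the proof of Theorem 3 instead relies on the linear-time algorithm of (Tarjan 1971). The function names are hypothetical. `is_3_connected` applies the definition of 3-connectivity given next (no $(2, 2)$-cut).

```python
import itertools


def num_components(vertices, edges):
    """Number of connected components, computed with union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for v, w in edges:
        parent[find(v)] = find(w)
    return len({find(v) for v in vertices})


def remove_vertices(vertices, edges, X):
    """G - X: drop the vertices in X and every edge with an endpoint in X."""
    X = set(X)
    return ([v for v in vertices if v not in X],
            [(v, w) for v, w in edges if v not in X and w not in X])


def is_cut(vertices, edges, X, i, j):
    """X is an (i, j)-cut iff |X| = i and G - X has at least j components."""
    return len(set(X)) == i and num_components(*remove_vertices(vertices, edges, X)) >= j


def has_cut(vertices, edges, i, j):
    return any(is_cut(vertices, edges, X, i, j)
               for X in itertools.combinations(vertices, i))


def is_biconnected(vertices, edges):
    return not has_cut(vertices, edges, 1, 2)  # no (1, 2)-cut


def is_3_connected(vertices, edges):
    return not has_cut(vertices, edges, 2, 2)  # no (2, 2)-cut
```

For example, the 4-cycle is biconnected but not 3-connected, because removing two opposite vertices leaves two isolated vertices.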
The graph is 3-connected whenever it has no $(2, 2)$-cut.

A 2-block tree of a biconnected graph $G$, written $\langle T', G' \rangle$, is a tree $T'$ with a set $G' = \{G'_t\}_{t \in V(T')}$ with the following properties:
– $G'_t$ is a graph (possibly with multiple edges) for each $t \in V(T')$.
– If $G$ is 3-connected, then $T'$ has a single node $r$, which is colored 1, and $G'_r = G$.
– If $G$ is not 3-connected, then there exists a color 2 node $t \in V(T')$ such that
  1. $G'_t$ is a graph with two vertices $u$ and $v$ and no edges, for some $(2, 2)$-cut $\{u, v\}$ in $G$.
  2. Let $T'_1, \ldots, T'_k$ be the connected components (subtrees) of $T' - t$. Then $G - \{u, v\}$ has $k$ connected components $U_1, \ldots, U_k$, and these components can be labelled so that $T'_i$ is a 2-block tree of $G'_i = (V(U_i) \cup \{u, v\}, E_i \cup \{\{u, v\}\})$. Here $E_i$ is the set of edges of $G$ with both endpoints in $V(U_i) \cup \{u, v\}$.
  3. For each $i$, there exists exactly one color 1 node $t_i \in V(T'_i)$ such that $\{u, v\} \subseteq V(G'_{t_i})$.
  4. For each $i$, $\{t, t_i\} \in E(T')$.

A $(3, 3)$-block tree of a 3-connected graph $G$, written $\langle T'', G'' \rangle$, is a tree $T''$ with a set $G'' = \{G''_t\}_{t \in V(T'')}$ with the following properties:
– $G''_t$ is a graph (possibly with multiple edges) for each $t \in V(T'')$.
– If $G$ has no $(3, 3)$-cut, then $T''$ has a single node $r$, which is colored 1, and $G''_r = G$.
– If $G$ has a $(3, 3)$-cut, then there exists a color 2 node $t \in V(T'')$ such that
  1. $G''_t$ is a graph with vertices $u$, $v$ and $w$ and no edges, for some $(3, 3)$-cut $\{u, v, w\}$ in $G$.
  2. Let $T''_1, \ldots, T''_k$ be the connected components (subtrees) of $T'' - t$. Then $G - \{u, v, w\}$ has $k$ connected components $U_1, \ldots, U_k$, and these components can be labelled so that $T''_i$ is a $(3, 3)$-block tree of $G''_i = (V(U_i) \cup \{u, v, w\}, E_i \cup \{\{u, v\}, \{v, w\}, \{u, w\}\})$. Here $E_i$ is the set of edges of $G$ with both endpoints in $V(U_i) \cup \{u, v, w\}$.
  3. For each $i$, there exists exactly one color 1 node $t_i \in V(T''_i)$ such that $\{u, v, w\} \subseteq V(G''_{t_i})$.
  4. For each $i$, $\{t, t_i\} \in E(T'')$.

Theorem 3. Let $G$ be a $K_5$-minor-free graph of size $N$ with no loops or multiple edges. Then an 8-nice decomposition $T$ of $G$ exists and can be computed in time $O(N)$.

Proof. Since $G$ is $K_5$-minor-free and has no loops or multiple edges, $|E(G)| = O(N)$ (Thomason 2001). In time $O(N)$, we can find a forest of $G$'s biconnected components (Tarjan 1971). Suppose we have an 8-nice decomposition for each biconnected component. We join them into a single 8-nice decomposition, using attachment sets of size 1 for decompositions within the same connected component of $G$ and attachment sets of size 0 for decompositions in different connected components. Hence, from now on we assume that $G$ is biconnected.

The $O(N)$ algorithm of (Reed and Li 2008) first finds a 2-block tree $\langle T', G' \rangle$ for $G$. Then, for each color 1 node $G'_t \in G'$, it finds a $(3, 3)$-block tree $\langle T'', G'' \rangle$ in which all components are either planar or Möbius ladders. To get an 8-nice decomposition from each $(3, 3)$-block tree, we apply two steps: 1) for each color 2 node, contract an edge between it and one of its neighbours in $T''$; 2) remove all edges that were created only during the construction of $\langle T'', G'' \rangle$ (2nd item of the $(3, 3)$-block tree definition).

It remains to add edges to the forest $F$ of the obtained 8-nice decompositions so as to get a single 8-nice decomposition $T$ of $G$. Consider any pair of adjacent nodes $G'_t, G'_s \in G'$, where $G'_t$ is a color 1 node and $G'_s = (\{u, v\}, \emptyset)$ is a color 2 node. Then $u$ and $v$ are in $V(G'_t)$ and $\{u, v\} \in E(G'_t)$. Hence, there is at least one component $G''_r$ of the 8-nice decomposition of $G'_t$ in which both $u$ and $v$ are present. For each such pair of $s$ and $t$, draw an edge between $s$ and $r$ in $F$.
Then apply two steps: 1) for each color 2 node in $F$ (such as $s$), contract an edge between it and one of its neighbours (such as $r$); 2) remove all edges that were created during the construction of $\langle T', G' \rangle$ (2nd item of the 2-block tree definition). This results in a correct 8-nice decomposition for biconnected $G$.

Upper Bound Minimization and Marginal Computation in Approximation Scheme

Denote

$$ h(J') \triangleq \min_{\rho^{(r)} \ge 0,\ \sum_r \rho^{(r)} = 1} g(J', \rho), \qquad g(J', \rho) \triangleq \min_{\{J^{(r)}\}:\ \sum_r \rho^{(r)} \hat{J}^{(r)} = J'} \sum_r \rho^{(r)} \log Z(G^{(r)}, 0, J^{(r)}), $$

where $h(J')$ is an upper bound on $\log Z(G', 0, J')$. The minimization over $\rho$ and $\{J^{(r)}\}$ makes this bound as tight as the family $\{G^{(r)}\}$ allows. Given a fixed $\rho$, we compute $g(J', \rho)$ using L-BFGS-B optimization (Zhu, Byrd, Lu, and Nocedal 1997). To do so, we back-propagate through $Z(G^{(r)}, 0, J^{(r)})$ and project the gradients onto the linear constraint manifold. At the upper level, we also apply the L-BFGS-B algorithm to compute $h(J')$. This is possible since (Wainwright, Jaakkola, and Willsky 2005; Globerson and Jaakkola 2007)

$$ \frac{\partial}{\partial \rho^{(r)}} g(J', \rho) = \log Z(G^{(r)}, 0, J^{(r)}_{\min}) - (M^{(r)})^\top J^{(r)}_{\min}, \qquad M^{(r)} \triangleq \frac{\partial}{\partial J^{(r)}_{\min}} \log Z(G^{(r)}, 0, J^{(r)}_{\min}), $$

where $\{J^{(r)}_{\min}\}$ is the argmin in the definition of $g(J', \rho)$ and $M^{(r)} = \{M^{(r)}_e \mid e \in E(G^{(r)})\}$ is a vector of pairwise marginal expectations. We reparameterize $\rho^{(r)}$ as $w^{(r)} / \sum_{r'} w^{(r')}$, where $w^{(r)} > 0$. For $e = \{v, w\} \in E(G)$, we approximate pairwise marginal probabilities as (Wainwright, Jaakkola, and Willsky 2005; Globerson and Jaakkola 2007)

$$ P^{alg}(x_v x_w = 1) = \frac{1}{2} \Big[\sum_r \rho^{(r)} M^{(r)}_e\Big] + \frac{1}{2}. $$

Let $e_A$ be the edge between the central vertex $v$ and the apex in $G'$. We approximate the singleton marginal probability at vertex $v$ as

$$ P^{alg}(x_v = 1) = \frac{1}{2} \Big[\sum_r \rho^{(r)} M^{(r)}_{e_A}\Big] + \frac{1}{2}. $$

Future Work

We see the following straightforward extensions of the algorithm presented in the manuscript:
1. The work (Curticapean 2014) extends the polynomial scheme of (Straub, Thierauf, and Wagner 2014) for perfect matching counting to the case when $G$ is $H$-minor-free, where $H$ is a single-crossing graph, i.e., a minor of a graph that can be drawn in the plane with at most one edge crossing. We claim, without proof, that the same applies to the setting considered in this manuscript.
2. In (Straub, Thierauf, and Wagner 2014), the authors also present a parallel version of their perfect matching counting scheme for $K_5$-minor-free graphs and show that the problem is in the $TC^1$ parallel complexity class. We claim, without proof, that the same applies to inference of $K_5$-minor-free zero-field Ising models.
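The upper-bound construction in the appendix section "Upper Bound Minimization and Marginal Computation in Approximation Scheme" can be illustrated by brute force on a toy instance. This is a hypothetical sketch with the following setup:
- $G'$ is a triangle.
- The three trees obtained by dropping one edge stand in for the planar or decomposition-based graphs $G^{(r)}$.
- $\rho$ is uniform.
- $J^{(r)} = \frac{3}{2} J'$ on the two edges of each tree. This satisfies $\sum_r \rho^{(r)} \hat{J}^{(r)} = J'$ if $\hat{J}^{(r)}$ pads missing edges with zeros, which is our reading of $\hat{J}^{(r)}$.

No L-BFGS-B optimization is performed. By convexity of $\log Z$ in the couplings, any feasible point already gives an upper bound on $\log Z(G', 0, J')$, and the minimizations defining $g$ and $h$ only tighten it. The code also checks that $M^{(r)}$, the gradient of $\log Z$, equals the vector of pairwise expectations $E[x_v x_w]$.

```python
import itertools
import math


def log_z(vertices, J):
    """Brute-force log Z(G, 0, J); J maps an edge (v, w) to its coupling."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(vertices)):
        x = dict(zip(vertices, spins))
        total += math.exp(sum(c * x[v] * x[w] for (v, w), c in J.items()))
    return math.log(total)


def pairwise_expectations(vertices, J):
    """M_e = E[x_v x_w] for every edge e of J (the gradient of log Z in J)."""
    log_Z = log_z(vertices, J)
    M = dict.fromkeys(J, 0.0)
    for spins in itertools.product((-1, 1), repeat=len(vertices)):
        x = dict(zip(vertices, spins))
        energy = sum(c * x[v] * x[w] for (v, w), c in J.items())
        p = math.exp(energy - log_Z)  # probability of this configuration
        for v, w in J:
            M[(v, w)] += p * x[v] * x[w]
    return M


# Toy G' = triangle; the G^(r) are its three spanning trees.
V = ["a", "b", "c"]
J_prime = {("a", "b"): 0.8, ("b", "c"): -1.1, ("a", "c"): 0.5}
rho = [1 / 3, 1 / 3, 1 / 3]
# Each edge lies in two of the three trees, so 1.5 * J' on tree edges is feasible.
trees = [{e: 1.5 * c for e, c in J_prime.items() if e != dropped} for dropped in J_prime]

bound = sum(r * log_z(V, J_r) for r, J_r in zip(rho, trees))
exact = log_z(V, J_prime)
assert bound >= exact  # a feasible point already upper-bounds log Z(G', 0, J')

# M^(r) is the gradient of log Z(G^(r), 0, J^(r)): compare with central differences.
eps = 1e-5
J0 = trees[0]
M0 = pairwise_expectations(V, J0)
for e in J0:
    fd = (log_z(V, {**J0, e: J0[e] + eps}) - log_z(V, {**J0, e: J0[e] - eps})) / (2 * eps)
    assert abs(fd - M0[e]) < 1e-6
```

The trees are used only because their partition functions are easy to check by hand, since $Z = 2 \prod_e 2\cosh J_e$ on a tree. The scheme itself applies the same bound to the PSG or DSG graphs.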
