Efficient Exact Inference in Planar Ising Models
We give polynomial-time algorithms for the exact computation of lowest-energy (ground) states, worst margin violators, log partition functions, and marginal edge probabilities in certain binary undirected graphical models. Our approach provides an in…
Authors: Nicol N. Schraudolph, Dmitry Kamenetsky
Efficien t Exact Inference in Planar Ising Mo dels Nicol N. Sc hraudolph arxiv@schraudolph.or g Dmitry Kamenetsky dmitr y.kamenetsky@nict a.com.au R ese ar ch Scho ol of Information Scienc es and Engine ering A ustr alian National University –and– NICT A, L o cke d Bag 8001 Canb err a A CT 2601, Austr alia Editor: unkno wn Abstract W e give polynomial-time algorithms for the exact computation of low est-energy (ground) states, w orst margin violators, log partition functions, and marginal edge probabilities in certain binary undirected graphical mo dels. Our approac h provides an in teresting alterna- tiv e to the well-kno wn graph cut paradigm in that it do es not imp ose any submo dularit y constrain ts; instead w e require planarity to establish a correspondence with p erfect m atc h- ings (dimer cov erings) in an expanded dual graph. W e implement a unified framew ork while delegating complex but well-understoo d subproblems (planar embedding, maximum- w eight perfect matc hing) to established algorithms for whic h efficient implemen tations are freely a v ailable. Unlike graph cut methods, we can p erform p enalized maximum-lik eliho od as well as maximum-margin parameter estimation in the asso ciated conditional random fields (CRFs), and emplo y marginal p osterior probabilities as well as maximum a p osteri- ori (MAP) states for prediction. Maxim um-margin CRF parameter estimation on image denoising and segmen tation problems shows our approach to b e efficient and effective. A C++ implementation is a v ailable from http://nic.schraudolph.org/isinf/ . Keyw ords: Mark ov random fields, spin glasses, plane embedding, blossom shrinking, marginal p osterior mo de 1. In tro duction Undirected graphical mo dels are a p opular to ol in machine learning; they represent real- v alued energy functions of the form E 0 ( y ) := X i ∈V E 0 i ( y i ) + X ( i,j ) ∈E E 0 ij ( y i , y j ) , (1) where the terms in the first sum range ov er the no des V = { 1 , 2 , . . . n } , and those in the second sum o ver the edges E ⊆ V × V of an undirected graph G ( V , E ). The junction tr e e decomp osition pro vides an efficien t framework for exact statistical inference in graphs that are (or can b e turned into) trees of small cliques. The resulting algorithms, how ever, require time exp onen tial in the clique size, i.e. , the tr e ewidth of the original graph. The treewidth of man y graphs of practical interest is prohibitiv ely large — for instance, it grows as O ( n ) for an n × n square lattice. A large n um b er of approximate Schraudolph and Kamenetsky inference tec hniques ha v e b een developed so as to deal with such graphs, such as pseudo- lik eliho o d ( Besag , 1986 ), mean field appro ximation, lo op y b elief propagation ( W eiss , 2001 ; Y edidia et al. , 2003 ), tree reweigh ting ( W ainwrigh t et al. , 2003 , 2005 ), and tree sampling ( Hamze and de F reitas , 2004 ). 1.1 The Ising Mo del Efficien t exact inference is p ossible in certain graphical mo dels with binary no de labels. Here we fo cus on Ising mo dels, whose energy functions hav e the form E : { 0 , 1 } n → R with E ( y ) := X ( i,j ) ∈E [ y i 6 = y j ] E ij , (2) where [ · ] denotes the indicator function, i.e. , the cost E ij is incurred only in those states y where y i and y j disagree. Compared to the general mo del ( 1 ) for binary no des, ( 2 ) imp oses t wo additional restrictions: zero no de energies, and edge energies in the form of disagreement costs. At first glance these constrain ts lo ok sev ere; for instance, such systems must obey the symmetry E ( y ) = E ( ¬ y ), where ¬ denotes Bo olean negation (ones’ complement). Ho wev er, it is well known ( e.g. , Globerson and Jaakk ola , 2007 ) that adding a single no de mak es the Ising mo del ( 2 ) as expressive as the general mo del ( 1 ) for binary v ariables: Theorem 1 Every ener gy function of the form ( 1 ) over n binary variables is e quivalent to an Ising ener gy function of the form ( 2 ) over n + 1 variables, with the additional variable held c onstant. Pro of b y construction: Tw o energy functions are equiv alen t if they differ only b y a con- stan t. Without loss of generality , denote the additional v ariable y 0 and hold it constant at y 0 := 0. Given an energy function of the form ( 1 ), construct an Ising mo del with disagree- men t costs as follows: 1. F or eac h no de energy function E 0 i ( y i ), add a disagreemen t cost of E 0 i := E 0 i (1) − E 0 i (0), as shown in Figure 1 a. Note that in b oth states of y i , the energy of the resulting Ising mo del is shifted relativ e to E 0 i ( y i ) by the same constant amount, namely E 0 i (0): y i general Ising energy 0 E 0 i (0) 0 = E 0 i (0) − E 0 i (0) 1 E 0 i (1) E 0 i = E 0 i (1) − E 0 i (0) 2. F or each edge energy function E 0 ij ( y i , y j ), add the three disagreement cost terms E ij := 1 2 [( E 0 ij (0 , 1) + E 0 ij (1 , 0)) − ( E 0 ij (0 , 0) + E 0 ij (1 , 1))] , E 0 i := E 0 ij (1 , 0) − E 0 ij (0 , 0) − E ij , and (3) E 0 j := E 0 ij (0 , 1) − E 0 ij (0 , 0) − E ij , as sho wn in Figure 1 b. Note that for all states of y i and y j , the energy of the resulting Ising mo del is shifted relative to E 0 i ( y i ) b y the same constant amount, namely E 0 ij (0 , 0): 2 Efficient Exa ct Inference in Planar Ising Models n i n 0 E 0 i := E 0 i (1) − E 0 i (0) n i n 0 n j @ @ @ @ @ @ E 0 ij (1 , 0) − E 0 ij (0 , 0) − E ij E 0 ij (0 , 1) − E 0 ij (0 , 0) − E ij E ij := 1 2 [( E 0 ij (0 , 1) + E 0 ij (1 , 0)) − ( E 0 ij (0 , 0) + E 0 ij (1 , 1))] n i n 0 n j n 1 E 0 i E ij − E 0 j n i n 0 n j n 1 - E ij + E 0 i 2 E ij E ij − E 0 j (a) (b) (c) (d) Figure 1: Equiv alen t Ising model (with disagreemen t costs) for a giv en (a) no de energy E 0 i , (b) edge energy E 0 ij in a binary graphical mo del; (c) equiv alent submo dular mo del if E ij > 0 and E 0 i > 0 but E 0 j < 0; (d) equiv alen t directed mo del of Kolmogoro v and Zabih ( 2004 , Fig. 2d). y i y j general Ising energy 0 0 E 0 ij (0 , 0) 0 = E 0 ij (0 , 0) − E 0 ij (0 , 0) 0 1 E 0 ij (0 , 1) E 0 j + E ij = E 0 ij (0 , 1) − E 0 ij (0 , 0) 1 0 E 0 ij (1 , 0) E 0 i + E ij = E 0 ij (1 , 0) − E 0 ij (0 , 0) 1 1 E 0 ij (1 , 1) E 0 i + E 0 j = E 0 ij (1 , 1) − E 0 ij (0 , 0) Summing the abov e terms, the total bias of no de i ( i.e. , its disagreemen t cost with the bias no de) is E 0 i = E 0 i (1) − E 0 i (0) + X j :( i,j ) ∈E [ E 0 ij (1 , 0) − E 0 ij (0 , 0) − E ij ] . (4) This construction defines an Ising mo del whose energy in every configuration y is shifted, relativ e to that of the general mo del we started with, b y the same constant amoun t, namely E 0 ( 0 ): ∀ y ∈ { 0 , 1 } n : E 0 y = E 0 ( y ) − X i ∈V E 0 i (0) − X ( i,j ) ∈E E 0 ij (0 , 0) = E 0 ( y ) − E 0 ( 0 ) . (5) The tw o mo dels’ energy functions are therefore equiv alen t. Note how in the abov e construction the lab el symmetry E ( y ) = E ( ¬ y ) of the plain Ising mo del ( 2 ) is con v eniently brok en b y the introduction of a bias no de, through the conv ention that y 0 := 0. 1.2 Energy Minimization via Graph Cuts Definition 2 The cut C of a binary gr aphic al mo del G ( V , E ) induc e d by state y ∈ { 0 , 1 } n is the set C ( y ) := { ( i, j ) ∈ E : y i 6 = y j } ; its weigh t |C ( y ) | is the sum of the weights of its e dges. 3 Schraudolph and Kamenetsky An y giv en state y partitions the no des of a binary graphical mo del in to t wo sets: those lab eled ‘0’, and those lab eled ‘1’. The corresp onding gr aph cut is the set of edges crossing the partition; since only they contribute disagreement costs to the Ising mo del ( 2 ), w e ha ve ∀ y : |C ( y ) | = E ( y ). The low est-energy state of an Ising mo del therefore induces its minim um-weigh t cut. Conv ersely , the ground state of an Ising mo del (in absence of a bias no de: up to lab el symmetry) can b e determined from its minimum-w eigh t cut via a simple ( e.g. , depth-first) graph trav ersal (see Algorithm 1 ). Minim um-weigh t cuts can be computed in polynomial time in graphs whose edge weigh ts are all non-negative. Introducing one more no de, with the constraint y n +1 := 1, allows us to construct an equiv alen t energy function by replacing eac h negatively weigh ted bias edge E 0 i < 0 by an edge to the new no de n + 1 with the p ositive w eight E i,n +1 := − E 0 i > 0 (Figure 1 c). This still leav es us with the requiremen t that all non-bias edges b e non- negativ e. This submo dularity constraint implies that agreement betw een no des m ust b e lo cally preferable to disagreemen t — a severe limitation. The now widespread use of graph cuts in machine learning to find low est-energy con- figurations, in particular in image pro cessing, was pioneered b y Greig et al. ( 1989 ). Our construction (Figure 1 c) differs from that of Kolmogorov and Zabih ( 2004 ) (Figure 1 d) in that we do not employ the notion of dir e cte d edges. (In directed graphs, the weigh t of a cut is the sum of the weigh ts of only those edges crossing the cut in a given direction.) W e note that a more elaborate construction can give partial answers in graphs with some negativ e edge w eights ( Kolmogorov and Rother , 2007 ; Rother et al. , 2007b ), and that a sequence of exp ansion moves (energy minimizations in binary graphs) can efficiently yield an appro ximate answer for graphs with discrete but non-binary no de labels ( Boyk ov et al. , 2001 ). The remainder of this pap er is organized as follows: Section 2 describ es the planarit y and connectivit y conditions that our approach imp oses upon the graphs, and ho w w e handle them. Section 3 describ es the construction of an expanded dual graph, and the algorithm for computing optimal (ground) states of the Ising mo del from it. The calculation of the partition function and marginal probabilities is dealt with in Section 4 . These algorithms are then used in Section 5 to implement maximum-lik eliho od and maxim um-margin parameter estimation in conditional random fields (CRFs). Section 6 describ es our exp erimen ts using grid CRFs for image denoising and b oundary detection, and Section 7 concludes with a discussion and outlook. W e are making open source C++ code implemen ting our algorithms freely av ailable for download from http://nic.schraudolph.org/isinf/ . 2. Planarity and Connectivity Unlik e graph cut metho ds, the inference algorithms w e will describe do not dep end on submo dularit y; instead they require that the mo del graph b e planar , and that a planar emb e dding b e pro vided. They ma y also w ork only for c onne cte d graphs, and b e the most efficien t only for bic onne cte d graphs. In this section we review these concepts, discuss their implications for our approach, and describ e ho w to b est handle them. 4 Efficient Exa ct Inference in Planar Ising Models 1 2 3 4 5 2 1 3 4 5 1 2 3 4 5 1 2 3 4 5 (a) (b) (c) (d) Figure 2: (a) a non-plane dr awing of a planar graph; (b) a plane dra wing of the same graph; (c) a differen t plane drawing of same graph, with the same planar embedding as (b); (d) a plane drawing of the same graph with a differen t planar embedding. 2.1 Em b edding Planar Graphs Definition 3 A gr aph is planar if it c an b e dr awn in the plane R 2 without e dge inter- se ctions. The r e gions into which such a plane dr awing p artitions R 2 ar e the faces of the dr awing; the unb ounde d r e gion is the external fac e. The op erational nature of this definition would suggest that our algorithms must pro duce (or hav e access to) a plane drawing of the mo del graph. This is unsatisfactory in that suc h a drawing contains muc h information (such as the precise lo cation of the vertices, and the exact shape of the edges) that we will not need. All we care ab out is the cyclic (sa y , clo c kwise) or dering of the edges incident up on each vertex. In top ological graph theory , this is formalized in the notion of a r otation system ( White and Beineke , 1978 , p. 21f ): Definition 4 L et G ( V , E ) b e an undir e cte d, c onne cte d gr aph. F or e ach vertex i ∈ V , let E i denote the set of e dges in E incident up on i , considered as b eing oriented a wa y from i , and let π i b e a cyclic p ermutation of E i . A rotation system for G is a set of p ermutations Π = { π i : i ∈ V } . T o define the sets E i of oriented edges more formally , construct the dir e cte d graph G ( V , E 0 ), where E 0 con tains a pair of directed edges (known as e dgelets ) for eac h undirected edge in E , that is, ( i, j ) ∈ E 0 ⇐ ⇒ [( i, j ) ∈ E ∨ ( j, i ) ∈ E ]. Then E i = { ( j, k ) ∈ E 0 : i = j } . Rotation systems directly corresp ond to top ological graph embeddings in orien table surfaces: Theorem 5 Each r otation system determines an emb e dding of G in some orientable surfac e S such that ∀ i ∈ V , any e dge ( i, j ) ∈ E i is fol lowe d by π i ( i, j ) in (say) clo ckwise orientation, and such that the faces F of the emb e dding, given by the orbits of the mapping ( i, j ) → π j ( j, i ) , ar e 2-c el ls (top olo gic al disks). Pro of see White and Beineke ( 1978 , p. 22f ). 5 Schraudolph and Kamenetsky Note that while in graph visualisation “embedding” is often used as a synonym for “dra wing”, in mo dern top ological graph theory it stands for “rotation system”. W e adopt the latter usage, which views embeddings as equiv alence classes of graph dra wings charac- terized by identical cyclic ordering of the edges inciden t up on each vertex. F or instance, π 4 (4 , 5) = (4 , 3) in Figures 2 b and 2 c (same embedding) but π 4 (4 , 5) = (4 , 1) in Figure 2 d (differen t em b edding). A sample face in Figures 2 b– 2 d is given b y the orbit (4 , 1) → π 1 (1 , 4) = (1 , 2) → π 2 (2 , 1) = (2 , 4) → π 4 (4 , 2) = (4 , 1) . The genus g of the em b edding surface S can b e determined from the Euler char acteristic |V | − |E | + |F | = 2 − 2 g , (6) where |F | is found b y coun ting the orbits of the rotation system, as describ ed in Theorem 5 . Since planar graphs are exactly those that can b e embedded on a surface of gen us g = 0 (a top ological sphere), we arriv e at a purely com binatorial definition of planarit y: Definition 6 A gr aph G ( V , E ) is planar iff it has a r otation system Π pr o ducing exactly 2 + |E | − |V | orbits. Such a system is c al le d a planar em b edding of G , and G ( V , E , Π) is c al le d a plane graph . Our inference algorithms require a plane graph as input. In certain domains ( e.g. , when w orking with geographic information) a plane dra wing of the graph (from which the corre- sp onding em b edding is readily determined) ma y b e a v ailable. Where it is not, w e emplo y the algorithm of Boy er and Myrv old ( 2004 ) whic h, given any connected graph G as input, pro duces in linear time either a planar embedding for G or a pro of that G is non-planar. Source co de for this step is freely av ailable ( Bo yer and Myrvold , 2004 ; Windsor , 2007 ). 2.2 The Planarit y Constraint In Section 1.1 we hav e mapp ed a general binary graphical mo del to an Ising mo del with an additional bias no de; now we require that that Ising mo del b e planar. What do es that imply for the original, general mo del? If all no des of the graph are to b e connected to the bias no de without violating planarity , the graph has to b e outerplanar , i.e. , hav e a planar em b edding in whic h all its no des lie on the external face — a v ery severe restriction. The situation impro ves, ho wev er, if w e do not insist that all no des b e connected to the bias: If only a subset B ⊂ V of no des hav e non-zero bias ( 4 ), then the graph only needs to b e B -outerplanar, i.e. , ha ve a planar embedding in which all no des in B lie on the same face. Mo del selection ma y th us en tail the step of pic king the face of a suitably embedded planar Ising mo del whose no des will b e connected to the bias no de. In image pro cessing, for instance, where it is common to op erate on a square grid of pixels, we can p ermit bias for all no des on the p erimeter of the grid, whic h b orders the external face. In general, a planar embedding which maximizes a weigh ted sum ov er the no des b or- dering a giv en face can b e found in linear time ( Gutw enger and Mutzel , 2004 ); b y setting no de weigh ts to some measure of their bias, such as the magnitude or square of E 0 i ( 4 ), w e can th us efficien tly obtain the planar Ising mo del closest (in that measure) to any given planar binary graphical mo del. 6 Efficient Exa ct Inference in Planar Ising Models In contrast to submo dularit y , B -outerplanarity is a structural constraint. This has the adv an tage that once a mo del ob eying the constraint is selected, inference ( e.g. , parameter estimation) can pro ceed via unconstrained metho ds ( e.g. , optimization). Finally , w e note that all our algorithms can be extended to work for non-planar graphs as w ell. They then take time exp onen tial in the genus of the embedding though still p olynomial in the size of the graph; for graphs of low genus this ma y well b e preferable to current appro ximative metho ds. 2.3 Connectivit y All algorithms in this pap er assume that the graph G ( V , E ) is c onne cte d , i.e. , that E contains at least one path betw een any tw o no des of V . Where this is not the case, one can simply determine the connected comp onen ts 1 of G in linear time ( Hop croft and T arjan , 1973 ), then in v oke the algorithm in question separately on each of them. Since eac h component is unconditionally indep enden t of all others (as they hav e no edges b etw een them), the results can b e trivially com bined. Sp ecifically , • G is planar iff all of its connected comp onents are planar; an y concatenation of a planar embedding for each connected comp onen t is a planar em b edding of G . • Any concatenation of a ground state for eac h connected comp onent of G is a ground state of G . • The log partition function of G is the sum of the log partition functions of its connected comp onen ts. • The edge marginal probabilities of G are the concatenation of edge marginal proba- bilities of its connected comp onents. 2.4 Biconnectivit y Definition 7 A gr aph is biconnected iff it is c onne cte d and do es not have any articulation vertex. An articulation vertex is a vertex whose r emoval (along with any incident e dges) disc onne cts the gr aph. Although the algorithms in this pap er do not require G ( V , E ) to b e biconnected, simpler and more efficien t alternativ es are applicable when this is not the case. As Figure 3 illus- trates, an y graph G can be decomposed in to a tree of edge-disjoin t biconnected components 1 whic h o verlap in the articulation v ertices (of G ) they share; this decomp osition can b e p er- formed in linear time ( Hop croft and T arjan , 1973 ). W e make the following key observ ation: Theorem 8 L et E 1 , E 2 , . . . , E n b e the e dge sets c omprising the bic onne cte d c omp onents of a gr aph G ( V , E ) . The pr ob ability of cuts induc e d by the states of a Markov r andom field ( 16 ) 1. A c omp onent of a graph with resp ect to a given prop ert y is a maximal subgraph that has the prop ert y . 7 Schraudolph and Kamenetsky Figure 3: Skeletal chemical structures (images courtesy of Wikip edia ) of phosphorus trio xide (b ottom left), nitroglycerine (left of center), and quinine (right), with decomp osi- tion into a tree of biconnected comp onen ts indicated (shaded ov als). Phosphorous trio xide is biconnected ( i.e. , all one comp onen t); nitroglycerine is a tree ( i.e. , has only trivial biconnected comp onen ts). over an Ising mo del 2 ( 2 ) on G factors into that of its bic onne cte d c omp onents: P [ C ( y )] = n Y k =1 P [ C ( y ) ∩ E k ] . (7) Pro of If G ( V , E ) is biconnected, n = 1 and E 1 = E , making Theorem 8 trivially true. Oth- erwise split G into a biconnected comp onen t G 1 ( V 1 , E 1 ) which is a leaf of the decomp osition tree, and the remainder G 0 ( V 0 , E 0 ). It is alwa ys p ossible to find such a split. Because it is a leaf, G 1 connects to G 0 through a single articulation vertex of G , whic h we call v 1 . T o summarize: V 1 ∪ V 0 = V , V 1 ∩ V 0 = { v 1 } , E 1 ∪ E 0 = E , E 1 ∩ E 0 = ∅ , G 1 biconnected . (8) Let y 1 and y 0 b e the state y of the mo del restricted to vertices in V 1 and V 0 , resp ectiv ely . By definition, y 1 is indep enden t of y 0 when b oth are conditioned on the state y v 1 of the articulation vertex connecting them. W e now make use of the lab el symmetry of Ising mo dels, r esp. the fact that it is broken b y conditioning on a single no de: P [ C ( y )] = P ( y | y i ) for any c hoice of i . W e therefore hav e P [ C ( y )] = P ( y | y v 1 ) = P ( y 1 , y 0 | y v 1 ) = P ( y 1 | y v 1 ) P ( y 0 | y v 1 ) = P [ C ( y ) ∩ E 1 ] P [ C ( y ) ∩ E 0 ] . (9) 2. Recall that we consider an Ising mo del to b e an undirected graphical mo del with binary no de states, no no de p oten tials, edge potentials in the form of disagreemen t costs, and an optional bias no de. 8 Efficient Exa ct Inference in Planar Ising Models Recursiv ely applying this argument to G 0 yields Theorem 8 . The indep endence of biconnected comp onen ts with resp ect to cuts C ( y ) stated by The- orem 8 ma y come as a surprise, given that the corresp onding no de states y do correlate through the articulation v ertices. What happ ens is that lab el symmetry — sp ecifically , the fact that C ( y ) = C ( ¬ y ) — folds the state space so as to exactly cancel an y momen ts b et w een biconnected comp onen ts. By decoupling their biconnected comp onen ts, this facilitates effi- cien t inference in graphs that are not biconnected themselv es. 2.4.1 Ising Trees Note in Figure 3 how edges which are not part of any cycle of G form trivial biconnected comp onen ts comprising only themselv es and the t wo articulation vertices they connect. At the most extreme, a tr e e T do es not contain an y cycles, hence consists en tirely of trivial biconnected comp onen ts (Figure 3 , left of center). Theorem 8 then implies that eac h edge can b e considered individually , making inference in an Ising tree T ( V , E ) very simple: • T is planar; any em b edding of T is a planar embedding. • The minimum-w eigh t cut of T is the set E − := { ( i, j ) ∈ E : E ij < 0 } . (Use Algorithm 1 to obtain the corresp onding ground state.) • The log partition function of T is ln Z = ln Y ( i,j ) ∈E ( e 0 + e − E ij ) = X ( i,j ) ∈E ln(1 + e − E ij ) . (10) • The marginal probabilit y of any edge ( i, j ) of T is P [( i, j ) ∈ C ] = e − E ij e 0 + e − E ij = 1 1 + e E ij . (11) 2.4.2 The General Case The most efficient wa y to employ the inference algorithms in this pap er on graphs G that are neither biconnected nor trees ( e.g. , Figure 3 , righ t) is to apply them to each non trivial biconnected component of G in turn, then use Theorem 8 to com bine the results, along with the simple solutions giv en in Section 2.4.1 ab o ve for trivial biconnected comp onen ts, in to a result for the full graph. Letting E T ⊆ E denote the set of edges that b elong to trivial biconnected comp onen ts of G , we hav e: • A planar embedding of G is obtained by arbitrarily combining a planar embedding for each of its nontrivial biconnected comp onen ts and the edges in E T . • A minimum-w eigh t cut of G is the union b et w een the edges in E − ∩ E T and a minim um- w eight cut for each of G ’s non trivial biconnected comp onen ts; Algorithm 1 can b e used to obtain the corresp onding ground state. 9 Schraudolph and Kamenetsky • The log partition function of G is the sum of the log partition functions of its nontrivial biconnected comp onen ts, plus P ( i,j ) ∈E T ln(1 + e − E ij ). • The marginal probabilit y of an edge ( i, j ) ∈ E is (1 + e E ij ) − 1 if ( i, j ) ∈ E T , or com- puted (by the metho d of Section 4 ) from the biconnected comp onen t it b elongs to. This concludes our discussion of conditions on the Ising model graph G . In what follo ws, w e assume that G is connected and planar, and that a plane em b edding is pro vided. W e do not require that G is biconnected, though where this is not the case, it is generally more efficien t to decomp ose G into biconnected comp onents as discussed ab o ve. 3. Computing Ground States via Maxim um-W eigh t Perfect Matc hing Definition 9 A frustrated cycle O ⊆ E of a gr aph G ( V , E ) with non-zer o e dge weights E is a simple cycle whose pr o duct of e dge weights is ne gative, i.e. , Q ( i,j ) ∈O E ij < 0 . (A simple cycle is a close d p ath with no r ep e ate d e dges or vertic es.) A weigh ted graph is said to b e frustrated if it con tains any frustrated cycles. Note that trees can nev er b e frustrated b ecause they do not contain an y cycles to b egin with. The low est-energy (ground) state y ∗ := argmin y E ( y ) of an unfrustr ate d Ising mo del is easily found via essentially the same metho d as in a tree (Section 2.4.1 ): pain t no des as you tra verse the graph, flipping the binary color of your paintbrush whenever you trav erse an edge with negative disagreemen t cost (as done by Algorithm 1 b elo w when in vok ed on the cut C = { ( i, j ) ∈ E : E ij < 0 } ). This cannot lead to a contradiction b ecause by Definition 9 y ou will flip brush color an even n um b er of times along an y cycle in the graph, hence alwa ys end a cycle on the same color y ou started it with. The presence of frustration unfortunately makes the computation of ground states m uc h harder — in fact, it is kno wn to b e NP-hard in general ( Barahona , 1982 ). As shown b elow, the ground state of a planar Ising model can b e computed in p olynomial time via maxim um- w eight p erfect matc hing on an expanded dual of its embedded graph. A relationship betw een the states of a planar Ising model and p erfect matchings (“dimer co verings” to physicists) was first established b y Kasteleyn ( 1961 , 1963 , 1967 ) and Fisher ( 1961 , 1966 ). P erfect matchings in dual graph constructs were used by Bieche et al. ( 1980 ) and Barahona ( 1982 ) to compute Ising ground states; b elo w we generalize a simpler con- struction for triangulated graphs due to Glob erson and Jaakkola ( 2007 ). F or rectangular lattices our approac h reduces to the construction of Thomas and Middleton ( 2007 ), though their algorithm to compute ground states is somewhat less straigh tforward. P ardella and Liers ( 2008 ) apply this construction to very large square lattices (up to 3000 × 3000 no des), and find it to b e more efficient than earlier metho ds ( Biec he et al. , 1980 ; Barahona , 1982 ). 3.1 The Expanded Dual Graph Definition 10 The dual G ∗ ( F , E ) of an emb e dde d gr aph G ( V , E , Π) has a vertex for e ach fac e of G , with e dges c onne cting vertic es c orr esp onding to fac es that ar e adjac ent ( i.e. , shar e an e dge) in G . 10 Efficient Exa ct Inference in Planar Ising Models Figure 4: Left column: Square face of a plane graph (dashed lines) with its ordinary (top) r esp. expanded (b ottom) dual graph (solid lines). Other columns: Binary no de states (op en and filled circles) on the graph, induced cuts (b old blue dashes), and complemen tary p erfect matc hings (b old red lines) of the expanded dual. Figure 4 (top left) shows the dual for a square face of our plane graph. Each edge of the dual crosses exactly one edge of the original graph; due to this one-to-one relationship we will consider the dual to hav e the same set of edges E (with the same energies) as the original. Since the no des in th e dual are different, ho wev er, w e will (with some abuse of notation) use a single index for dual edges and their w eights, corresp onding to the index of the original edge in some arbitrary ordering of E . Thus if ( i, j ) is the k th edge in the (ordered) E , its dual will ha ve w eight E k := E ij . W e now expand the dual graph b y replacing each of its nodes with a q -clique, where q is the degree of the node, as sho wn in Figure 4 (bottom left) for a dual node with degree q = 4, and in Figure 5 for an en tire graph. The additional edges internal to each q -clique are given zero energy so as to leav e the mo del unaffected. F or large q the in tro duction of these O ( q 2 ) in ternal edges slows do wn subsequent computations (solid line in Figure 8 , left); this can b e a voided by subdividing the offending q -gonal face of the mo del graph with chords (whic h are also giv en zero energy) b efore constructing the dual. Our implementation performs b est when “o ctangulating” the graph, i.e. , splitting o ctagons off all faces with q > 13; this is more efficient than a full triangulation (Figure 8 , left). It is easily seen that the expanded dual has 2 |E | vertices, t w o for each edge in the original graph. W e therefore giv e the t wo vertices connected by the dual of the k th edge in E the indices 2 k − 1 and 2 k ( cf. Section 4.3.1 and Figure 7 ). This consistent indexing sc heme allo ws us to run the inference algorithms describ ed in the remainder of this pap er without explicitly constructing an expanded dual graph data structure. 11 Schraudolph and Kamenetsky bias bias 0 0 1 0 1 0 1 1 0 1 Figure 5: Example of a planar Ising mo del with five binary no des (large, blue) including a constan t-v alued bias node (top left), and its expanded dual (small nodes, red), in t wo differen t states (left & righ t). The graph cut induced by the giv en state and its complemen tary p erfect matching of the expanded dual are shown in b old, as are the no des left unmatched by the complemen t M 0 := E \C of the cut in the expanded dual. 3.2 Complemen tary Perfect Matchings Definition 11 A p erfect matching of a gr aph G ( V , E ) is a subset M ⊆ E of e dges wher ein exactly one e dge is incident up on e ach vertex: ∀ v ∈ V , | v | = 1 in G ( V , M ) . Its w eight |M| is the sum of the weights of its e dges. Figure 5 shows tw o p erfect matchings (in b old) of the no des of the expanded dual of an Ising mo del. There is a complementary relationship b et ween suc h p erfect matchings and graph cuts in the original Ising mo del, c haracterized by the follo wing tw o theorems. The reader may find it helpful to refer to Figure 5 while going through the pro ofs. Theorem 12 F or every cut C of an emb e dde d gr aph G ( V , E , Π) ther e exists at le ast one (if G is triangulate d: exactly one) p erfe ct matching M of its exp ande d dual complementary to C , i.e. , E \M = C . Pro of By construction, the set E of edges of G constitutes a p erfect matc hing of the expanded dual. Any subset of E therefore is a (p ossibly partial) matching of the expanded dual. The complement M 0 := E \C of a cut of G is a subset of E and th us a matc hing of the expanded dual; it ob eys E \M 0 = E \ ( E \C ) = C . The no des that M 0 lea ves unmatche d in the expanded dual (circled b old in Figure 5 ) are those neigh b oring the edges C of the cut. By definition, C intersects any cycle of G , and therefore also the p erimeters of G ’s faces F , in an even num b er of edges. In each clique of the expanded dual, M 0 th us leav es an ev en num b er of no des unmatc hed. It can therefore be completed into a p erfect matc hing M ⊇ M 0 using only edges interior to the cliques to pair up unmatched no des. These edges 12 Efficient Exa ct Inference in Planar Ising Models ha ve no counterpart in the original graph: ( M\M 0 ) ∩ E = ∅ . W e thus ha ve E \M = E \ [( M\M 0 ) ∪ M 0 ] = [ E \ ( M\M 0 )] \M 0 = E \M 0 = C . (12) In a 3-clique of the expanded dual, M 0 will leav e t wo no des unmatched or none at all; in either case there is only one wa y to complete the matc hing, by adding one edge r esp. none at all. By construction, if G is triangulated all cliques in its expanded dual are 3-cliques, and so M is unique. In other words, there exists a surjection from p erfect matc hings in the expanded dual of G to cuts in G . F urthermore, since we ha ve giv en edges interior to the cliques of the expanded dual zero energy ( i.e. , |M\M 0 | = 0), ev ery p erfect matching M complementary to a cut C of our Ising mo del ( 2 ) ob eys the relation |M| + |C | = |M 0 | + |C | = X ( i,j ) ∈E E ij = const . (13) This means that instead of a minimum-w eigh t cut in a graph w e can lo ok for a maximum- w eight p erfect matching in its expanded dual. But will that matc hing alwa ys be comple- men tary to a cut? The following theorem shows that this is true for plane graphs: Theorem 13 Every p erfe ct matching M of the exp ande d dual of a plane gr aph G ( V , E , Π) is c omplementary to a cut C of G , i.e. , E \M = C . Pro of By definition, E \M is a cut of G iff it intersects every cycle O ⊆ E of G an even n umber of times. This can b e shown b y induction ov er the faces of the em b edding: Base c ase — let O ⊆ E b e the p erimeter of a face of the embedding, and consider the corresp onding clique of the expanded dual: M matches an ev en num ber of no des in the clique via interior edges; all other no des must b e matched b y edges crossing O . The complemen t of the matching in G th us in tersects O an ev en num b er of times: | ( E \M ) ∩ O | ≡ 0 mo d 2 . (14) Induction — let O 1 , O 2 ⊆ E b e cycles in G ob eying ( 14 ), and consider their symmetric differ enc e O 1 ∆ O 2 := ( O 1 ∪ O 2 ) \ ( O 1 ∩ O 2 ): | ( E \M ) ∩ ( O 1 ∆ O 2 ) | = | [( E \M ) ∩ ( O 1 ∪ O 2 )] \ ( O 1 ∩ O 2 ) | = | [( E \M ) ∩ O 1 ] ∪ [( E \M ) ∩ O 2 ] | − | ( E \M ) ∩ ( O 1 ∩ O 2 ) | = | ( E \M ) ∩ O 1 | + | ( E \M ) ∩ O 2 | − 2 | ( E \M ) ∩ ( O 1 ∩ O 2 ) | ≡ 0 + 0 − 2 n ≡ 0 mo d 2 . ( n ∈ N ) (15) W e see that prop ert y ( 14 ) is preserved under comp osition of cycles via symmetric differences, and th us holds for all cycles that can b e comp osed from face p erimeters of the em b edding of G in this fashion. All cycles in a plane graph G are c ontr actible on its em b edding surface (a plane r esp. sphere) b ecause that surface has zero genus, and is therefore simply connected. All con- tractible cycles of G can b e constructed by comp osition of face p erimeters via symmetric differences, thus ha ve an even in tersection with E \M , which is therefore a cut. 13 Schraudolph and Kamenetsky This is where planarity matters: Surfaces of non-zero genus are not simply connected, and th us non-plane graphs ma y contain non-contractible cycles ( e.g. , encircling the hole of a torus). In the presence of frustration, our construction do es not guarantee that the comple- men t E \M of a p erfect matching of the expanded dual contains an even num ber of edges along such cycles. F or planar graphs, how ev er, the ab o ve theorems allow us to lev erage kno wn p olynomial-time algorithms for p erfect matchings into inference metho ds for Ising mo dels. This approach also works for non-planar Ising mo dels that contain no frustrated non-conctractible cycle. W e note that if all cliques of the expanded dual are of ev en size, there is also a direct (non- complemen tary) surjection from p erfect matc hings to cuts in the original graph. In contrast to our complementary map, the direct surjection requires the addition of dumm y vertices in to the expanded dual for faces of G with o dd p erimeter so as to make the corresp onding cliques even ( Kasteleyn , 1963 ; Fisher , 1966 ; Liers and Pardella , 2008 ). 3.3 Computing the Ground State The blossom-shrinking algorithm ( Edmonds , 1965a , b ) is a sophisticated metho d to effi- cien tly compute the maxim um-weigh t p erfect matching of a graph. It can b e implemen ted ( Mehlhorn and Sc h¨ afer , 2002 ) to run in as little as O ( |E | |V | log |V | ) time ( Galil et al. , 1986 ). Although the Blossom IV co de w e are using ( Co ok and Rohe , 1999 ) is asymptotically less efficien t — O ( |E | |V | 2 ) — w e ha ve found it to b e v ery fast in practice (Figure 8 , left). W e can now efficien tly compute the lo west-energy state of a planar Ising mo del as follows: Find a planar em b edding of the model graph (Section 2.1 ), construct its expanded dual (Section 3.1 ), and run the blossom-shrinking algorithm on that to compute its maxim um- w eight p erfect matching. Its complement in the original mo del is the minim um-w eigh t graph cut (Section 3.2 ). W e can iden tify the state which induces this cut via a depth-first graph tra versal (Algorithm 1 ) that lab els no des as it encounters them, and chec ks for consistency on subsequent encoun ters. The trav ersal starts by labeling the bias no de with its known state y 0 := 0. 4. Computing the P artition F unction and Marginal Probabilities A Marko v random field (MRF) ov er our Ising mo del ( 2 ) mo dels the distribution P ( y ) = 1 Z e − E ( y ) , where Z := X y e − E ( y ) (16) is the MRF’s p artition function . As it inv olv es a summation ov er exp onen tially many states y , calculating the partition function is generally intractable. F or planar graphs, how ev er, the generating function for p erfect matc hings can b e calculated in p olynomial time via the determinant of a skew-symmetric matrix ( Kasteleyn , 1961 , 1963 , 1967 ; Fisher , 1961 , 1966 ), which w e call the Kasteleyn matrix K . Due to the close relationship with graph cuts (Section 3.2 ) we can apply this metho d to calculate Z in ( 16 ). W e first conv ert a planar em b edding of the Ising mo del graph in to a Bo olean “half-Kasteleyn” matrix H , in four steps which will b e elab orated b elo w: 14 Efficient Exa ct Inference in Planar Ising Models Algorithm 1 Find State from Corresp onding Graph Cut Input: Ising mo del graph G ( V , E ), graph cut C ( y ) ⊆ E 1. ∀ i ∈ { 0 , 1 , 2 , . . . n } : y i := unknown; 2. dfs state(0, 0); Output: state vector y pro cedure dfs state( i ∈ { 0 , 1 , 2 , . . . n } , s ∈ { 0 , 1 } ) if y i = unknown then 1. y i := s ; 2. ∀ ( i, j ) ∈ E i : if ( i, j ) ∈ C then dfs state( j, ¬ s ); else dfs state( j, s ); else assert y i = s ; 1. Section 4.1 , Algorithm 2 : plane triangulate the em b edded graph so as to mak e the relationship b et w een cuts and complementary p erfect matchings a bijection ( cf. Sec- tion 3.2 ); 2. Section 4.2 , Algorithm 3 : orient the edges of the graph suc h that the in-degree of ev ery no de is o dd; 3. Section 4.3.3 , Algorithm 4 : c onstruct the Bo olean half-Kasteleyn matrix H from the orien ted graph; 4. Section 4.4.3 : pr efactor the triangulation edges (added in Step 1) out of H . Our Step 2 simplifies equiv alen t op erations in previous constructions ( Kasteleyn , 1963 , 1967 ; Fisher , 1966 ; Glob erson and Jaakk ola , 2007 ), Step 3 differs in that it only sets unit ( i.e. , +1) entries in a Bo olean matrix, and Step 4 can dramatically reduce the size of H for compact storage (as a bit matrix) and faster subsequent computations (Figure 8 ). Edge disagreement costs do not en ter in to H ; they are only tak en in to accoun t in a second phase, when the full Kasteleyn matrix K is constructed from H (Section 4.3.2 ). W e can then factor K (Section 4.4 ) and compute the partition function from its determinan t (Section 4.4.1 ; Kasteleyn , 1961 ; Fisher , 1961 ). This factorisation can also be used to inv ert K , whic h is required to obtain mar ginal pr ob abilities of disagreement on the edges of the mo del graph (Section 4.4.2 ). In what follows, we elab orate in turn on the graph op erations of plane triangulation (Section 4.1 ) and o dd edge orien tation (Section 4.2 ), and construction (Section 4.3 ) and factoring (Section 4.4 ) of the Kasteleyn matrix K r esp. H . 15 Schraudolph and Kamenetsky 1 2 3 4 5 1 2 3 4 5 (a) (b) (c) (d) Figure 6: (a) A chordal graph whose external face is not triangulated; (b) a plane trian- gulated graph that has a 4-cycle (b old blue) without a chord; (c) prop er, (d) improp er plane triangulation (dashed) of the plane graph from Figure 2 d. 4.1 Plane T riangulation Definition 14 An emb e dde d gr aph is plane triangulated iff it is bic onne cte d and e ach of its fac es (including the external fac e) is a triangle. Note that plane triangulation is not equiv alen t to making a graph chor dal , though the latter pro cess is sometimes also called “triangulation”. F or instance, the graph in Figure 6 a is c hordal but not plane triangulated b ecause the external face is not triangular, while that in Figure 6 b is plane triangulated but not chordal b ecause it con tains a 4-cycle (b old blue) that has no chord. W e can plane triangulate an embedded graph in linear time by trav ersing all of its faces, inserting chords as necessary as we go along (Algorithm 2 ). This ma y create m ultiple edges b etw een the same tw o vertices, as shown in Figure 6 c. Care m ust be taken when encoun tering singly connected comp onen ts with a p erimeter of length tw o, which could cause the insertion of a self-lo op (see Figure 6 d). Algorithm 2 detects and biconnects suc h comp onen ts, as Definition 14 requires. The insert c hord( i, j, k ) subroutine of Algorithm 2 not only up dates E , but also π i and π k so as to insert the new edge ( i, k ) in its prop er place in the rotation system. In order to leav e the distribution ( 16 ) mo deled by the graph unc hanged, the new edge is given zero energy . Rep eated trav ersals of the same face in Algorithm 2 can b e a voided by the obvious use of “done” flags, omitted here for the sake of clarity . Note that plane triangulation is not strictly necessary for the computation of partition function or marginal probabilities; it merely simplifies subsequen t steps in the construc- tion. Previously ( Fisher , 1966 ; Glob erson and Jaakkola , 2007 ) this con vencience came at a computational price: the edges added during plane triangulation can mak e factoring and in version of K (Section 4.4 ) significantly (up to 20 times) more exp ensiv e. W e av oid this cost by removing the triangulation edges again b efore constructing K , during prefactoring of the half-Kasteleyn bit matrix H (Section 4.4.3 ). 16 Efficient Exa ct Inference in Planar Ising Models Algorithm 2 Plane T riangulation Input: plane graph G ( V , E , Π) with |V | ≥ 3 ∀ i ∈ V : ∀ ( i, j ) ∈ E i : 1. ( j, k ) := π j ( j, i ); 2. ( k , l ) := π k ( k , j ); 3. while l 6 = i ∨ π l ( l, k ) 6 = ( l , j ) (a) if i = k then (a void self-lo op) i := j ; j := k ; k := l ; ( k , l ) := π k ( k , j ); (b) insert c hord( i, j, k ); (c) i := k ; j := l ; (d) ( j, k ) := π j ( j, i ); (e) ( k, l ) := π k ( k , j ); Output: plane triangulated graph G ( V , E , Π) pro cedure insert chord( i, j, k ∈ V ) 1. E := E ∪ { ( i, k ) } ; 2. π k ( k , i ) := π k ( k , j ); 3. π k ( k , j ) := ( k, i ); 4. π i ( π − 1 i ( i, j )) := ( i, k ); 5. π i ( i, k ) := ( i, j ); 6. E ik := 0; 4.2 Odd Edge Orien tation T o calculate the generating function for p erfect matchings, the graph in question (namely , the expanded dual of our mo del graph) must b e giv en a clo ckwise o dd orientation . Definition 15 An orientation of an undir e cte d gr aph G ( V , E ) is a set E 0 of oriente d e dges with |E 0 | = |E | such that ∀ ( i, j ) ∈ E , E 0 c ontains either ( i, j ) or ( j, i ) . Definition 16 ( Kasteleyn , 1963 ) An orientation of an emb e dde d gr aph is clo c kwise o dd iff the numb er of e dges oriente d clo ckwise ar ound e ach fac e (exc ept p ossibly the external fac e) is o dd. Consider Figure 7 : b y giving all in terior edges of the 3-cliques of the expanded dual a clo c kwise orientation (small red arrows), we ensure that (a) the interior faces of the 3-cliques hav e clo c kwise o dd orien tation, and (b) all interior edges of the 3-cliques are orien ted c ounter clo ckwise wrt. all faces exterior to the 3-cliques, hence do not affec t the 17 Schraudolph and Kamenetsky 1 2 3 4 6 5 7 8 9 10 15 18 1 2 3 4 5 6 7 8 9 10 12 1 1 13 14 16 17 19 20 Figure 7: Clo ckwise o dd orien tation (Section 4.2 ) and indexing scheme (Section 4.3.1 ) for the expanded dual (red, small no des) of the mo del graph (blue, large no des). latters’ clo c kwise o dd orientation status. It remains to consider the orien tation of edges external to the 3-cliques (large red arro ws). What do es a clockwise o dd orientation of these edges corresp ond to in the original mo del graph? T o map these edges bac k in to the mo del graph, r otate them clo c kwise b y 90 degrees. A face with clo c kwise o dd orientation of its p erimeter in the dual th us maps to a v ertex with an odd in-de gr e e , i.e. , an o dd num b er of edges orien ted to wards it. This facilitates a drastic simplification of this step in our construction: T o establish a clo ckwise o dd orientation of the exp ande d dual, simply orient the e dges of the mo del gr aph such that al l vertic es, exc ept p ossibly one, have an o dd in-de gr e e. Algorithm 3 ac hieves this in linear time by orien ting edges appropriately up on return from a depth-first trav ersal of the graph. In con trast to earlier constructions ( Kasteleyn , 1963 ; Fisher , 1966 ; Glob erson and Jaakkola , 2007 ), it do es not require following orbits around faces, and in fact do es not refer to an embedding Π or dual graph G ∗ at all. An y v ertex can b e chosen to b e the exceptional vertex s , since the choice of external face of a plane dra wing is arbitrary — it is an artifact of the drawing, not an intrinsic prop ert y of the em b edding: a planar graph embedded on a sphere has no external face. 4.3 Constructing the Kasteleyn Matrix The Kasteleyn matrix K is a skew-symmetric, 2 |E | × 2 |E | matrix constructed from the Ising mo del graph whose determinant is the square of the partition function. Our construction impro ves up on the work of Glob erson and Jaakkola ( 2007 ) in a num b er of wa ys: 18 Efficient Exa ct Inference in Planar Ising Models Algorithm 3 Construct Odd Edge Orien tation Input: undirected graph G ( V , E ) 1. ∀ v ∈ V : v .visited = false; 2. pic k arbitrary edge ( r, s ) ∈ E ; 3. E 0 := { ( r , s ) } ; 4. mak e odd( r, s ); Output: orien tation E 0 of E : ∀ v ∈ V \{ s } , in-degree( v ) ≡ 1 mo d 2 in G ( V , E 0 ) function mak e odd: ( u, v ∈ V ) → { true, false } 1. E := E \ ( u, v ); 2. if v .visited then return true; 3. v .visited := true; 4. o dd := false; 5. ∀ w ∈ V : { v , w } ∈ E if make o dd( v, w ) then (a) E 0 := E 0 ∪ { ( w , v ) } ; (b) o dd := ¬ o dd; else E 0 := E 0 ∪ { ( v , w ) } ; 6. return o dd; • W e employ an indexing scheme that obviates an y need to refer to the expanded dual of the mo del graph (which w e consequently nev er explicitly construct at all); • W e break construction of the Kasteleyn matrix into tw o phases, the first of whic h is in v arian t with resp ect to the mo del’s disagreemen t costs; • W e mak e the “half-Kasteleyn” matrix H computed in the first phase v ery compact b y pr efactoring out the triangulation edges (see Section 4.4.3 ) and storing it as a bit matrix. 4.3.1 Indexing Scheme Without loss of generality , let E = { e 1 , e 2 , . . . e |E | } . Note that the expanded dual has 2 |E | v ertices, one lying to either side of ev ery edge in the mo del graph. When viewing edge e k along its dir e ction in E 0 , we lab el the dual no de to its righ t 2 k − 1 and that to its left 2 k ; see Figure 7 for an example ( cf. Section 3.1 ). One b enefit of this scheme is that quan tities relating to the edges of the mo del graph (as opp osed to in ternal edges of the cliques of the expanded dual) will alwa ys b e found on the sup erdiagonal of K . W e also notationally extend the rotation system from Section 2.1 to supp ort this indexing scheme: e k = ( i, j ) ⇐ ⇒ e π i ( k ) = π i ( i, j ). 19 Schraudolph and Kamenetsky 10 100 1e3 1e4 1e5 1 0.1 0.01 1e-3 1e-4 CPU time (seconds) original triang. octang. 10 100 1000 0.001 0.01 0.1 1 10 CPU time (seconds) no prefact. prefactored 10 100 1000 1e3 1e6 1e9 memory (bytes) K H uncompressed compressed ring size (no des) ring size (no des) ring size (no des) Figure 8: Cost of planar Ising inference metho ds on a ring graph, plotted against ring size. Left & cen ter: CPU time on Apple MacBo ok with 2.2 GHz Intel Core2 Duo pro cessor, av eraged ov er 100 rep etitions. Left: MAP state calculated via Blos- som IV ( Co ok and Rohe , 1999 ) on original, triangulated, and o ctangulated ring. Cen ter: marginal edge probabilities with vs. without prefactoring. Right: size of K (double precision, no prefactoring) vs. prefactored bit matrix H , in uncom- pressed (solid lines) vs. compressed form (dashed lines), using row-compressed sparse matrix storage for K and bzip2 compression for H . 4.3.2 Two-Phase Construction In a first phase we pro cess the Ising mo del graph’s structur e into a Bo olean “half-Kasteleyn” matrix H which do es not yet include disagreement costs. F or a giv en set of edge disagree- men t costs E k , k = { 1 , 2 , , . . . |E |} , we then build from H the conv entional, real-v alued Kasteleyn matrix K by adding the exp onen tiated disagreement costs along the sup erdiag- onal and sk ew-symmetrizing: 1. K := H ; 2. ∀ k ∈ { 1 , 2 , , . . . |E |} : K 2 k − 1 , 2 k := K 2 k − 1 , 2 k + e E k ; 3. K := K − K > . This tw o-phase approach holds a num b er of adv an tages: • When w orking with a large n um b er of isomorphic graphs (as we do in Section 6 ), the corresp onding half-Kasteleyn matrix is iden tical for all of them, hence needs to b e constructed just once. • During maximum likelihoo d parameter estimation, partition function and/or marginals ha ve to b e recomputed man y times for the same graph, with disagreemen t costs v ary- ing due to the ongoing adaptation of the model parameters. H remains v alid when disagreemen t costs c hange, so w e can compute it just once upfront, then re-use it in the parameter estimation lo op. • H can b e stored very compactly as a prefactored bit matrix. As Figure 8 (right) sho ws, the uncompressed H can b e several orders of magnitude smaller than the 20 Efficient Exa ct Inference in Planar Ising Models corresp onding Kasteleyn matrix K . Row-compressed sparse storage of K (which has exactly 3 non-zero entries in eac h ro w and column) is more efficient, but applying the bzip2 compressor 3 to the prefactored bit matrix H yields by far the most compact storage format. Suc h memory efficiency b ecomes v ery imp ortan t when working with large data sets of non-isomorphic graphs. Algorithm 4 Construct Half-Kasteleyn Bit Matrix Input: orien ted, embedded, triangulated graph G ( V , E 0 , Π) 1. H := 0 ∈ { 0 , 1 } 2 |E 0 |× 2 |E 0 | 2. ∀ v ∈ V : (a) e s := any edge incident on v ; (b) if e s = ( · , v ) (edge p oin ts to v ) then α := 2 s ; else α := 2 s − 1; (c) i := π v ( s ); (d) repeat if e i = ( · , v ) (edge p oin ts to v ) then H 2 i − 1 ,α := 1; α := 2 i ; if e i w as created by Algorithm 2 (plane triangulation) then H 2 i − 1 , 2 i := 1; else H 2 i,α := 1; α := 2 i − 1; i := π v ( i ); un til i = π v ( s ); Output: Half-Kasteleyn bit matrix H 4.3.3 Algorithm for Constructing H Using the ab o ve indexing scheme, Algorithm 4 constructs H in linear time, cycling once through the edges incident up on each vertex of the mo del graph. It deviates from the classical c onstruction of K ( Kasteleyn , 1961 ; Fisher , 1961 ; Glob erson and Jaakk ola , 2007 ) in that it mak es only p ositive en tries, and only those corresp onding to edges with zero disagreemen t cost, i.e. , added during plane triangulation or internal to the cliques of the 3. http://www.bzip.org/ 21 Schraudolph and Kamenetsky expanded dual. All suc h en tries hav e the v alue e 0 = 1, making H a Bo olean matrix, whic h can b e stored compactly as a bit matrix. 4.4 F actoring Kasteleyn Matrices Standard approaches such as LU-factorization can b e used to factor the Kasteleyn matrix K , but do not exploit its skew symmetry . Here we develop a numerically stable Cholesky- lik e factorization for even-sized skew-symmetric matrices, which can b e used to factor K as well as to prefactor H (see b elo w). Begin by writing the Kasteleyn matrix as K = 0 c a > − c 0 b > − a − b C , (17) for some scalar c , vectors a and b , and a matrix C which is either empt y or again of the same form. W e factor ( 17 ) in to ( cf. Bunc h , 1982 ; Benner et al. , 2000 ) K = 0 − 1 0 > 1 0 0 > a /c b /c I 0 c 0 > − c 0 0 > 0 0 C 0 0 1 a > /c − 1 0 b > /c 0 0 I , (18) where C 0 is the Sc hur complemen t C 0 := C + ( ba > − ab > ) /c. (19) Iterated application of ( 18 ) to the Sch ur complemen t ultimately yields K = R > J R , where R := 0 1 a > 1 /c 1 − 1 0 b > 1 /c 1 0 0 0 1 a > 2 /c 2 0 − 1 0 b > 2 /c 2 . . . . . . . . . . . . 0 0 0 1 0 · · · 0 − 1 0 (20) and J := 0 c 1 0 · · · 0 − c 1 0 0 0 0 0 0 c 2 . . . . . . 0 − c 2 0 0 . . . . . . . . . 0 0 0 0 0 c |E | 0 · · · 0 − c |E | 0 . (21) T o preven t small pivots c i from causing numerical instability , pivoting is required. Since Kasteleyn matrices hav e at least t wo entries of unit magnitude in each row and column, partial pivoting suffices. 22 Efficient Exa ct Inference in Planar Ising Models 4.4.1 P ar tition Function The partition function for p erfect matc hings is p | K | ( Kasteleyn , 1961 ; Fisher , 1961 ). Our factoring gives | R | = 1 and | J | = Q i c 2 i , so w e hav e p | K | = q | R > | | J | | R | = p | J | = |E | Y i =1 | c i | . (22) Calculation of the pro duct in ( 22 ) is prone to n umerical ov erflo w; this is easily a voided by w orking with logarithms instead. Using the complementary relationship ( 13 ) with graph cuts in planar Ising mo dels, we obtain the log partition function for the latter as ln Z := ln X y e − P k ∈C ( y ) E k = ln X y e − ( P k ∈E E k − P k / ∈C ( y ) E k ) (23) = ln( e − P k ∈E E k X y e P k / ∈C ( y ) E k ) = ln p | K | − X k ∈E E k = |E | X i =1 (ln | c i | − E i ) . 4.4.2 Marginal Probabilities The mar ginal pr ob ability of disagreemen t along an edge equals the negative gradien t of the log partition function ( 23 ) with resp ect to the disagreemen t costs. Computing this inv olv es the inv erse of K : P ( k ∈ C ) = X y : k ∈C ( y ) P ( y ) = 1 Z X y : k ∈C ( y ) e − E ( y ) = − 1 Z ∂ Z ∂ E k = − ∂ ln Z ∂ E k = 1 − 1 2 | K | ∂ | K | ∂ E k = 1 − 1 2 tr K − 1 ∂ K ∂ E k = 1 + K − 1 2 k − 1 , 2 k K 2 k − 1 , 2 k , (24) where tr denotes the matrix trace, and we ha v e used the fact that K − 1 is skew-symmetric as w ell. T o in vert K , observe from ( 20 ) r esp. ( 21 ) that R and J are essentially triangular r esp. diagonal (simply sw ap rows 2 k − 1 and 2 k , k = 1 , 2 , . . . |E | ), and thus easily in verted. Then use K − 1 = R − 1 J − 1 R −> to obtain [ K − 1 ] 2 k − 1 , 2 k = 2 |E | X i =1 2 |E | X j =1 [ R − 1 ] 2 k − 1 ,i [ J − 1 ] i,j [ R −> ] j, 2 k = − 1 c k + |E | X i = k +1 d ik , (25) where d ik := [ R − 1 ] 2 k − 1 , 2 i [ R − 1 ] 2 k, 2 i − 1 − [ R − 1 ] 2 k − 1 , 2 i − 1 [ R − 1 ] 2 k, 2 i c i . 4.4.3 Pref actoring Consider the rows and columns of K corresp onding to an edge added during plane trian- gulation (Section 4.1 ). Re-order K to bring those rows and columns to the top left, so that they form the a , b , and c of ( 17 ). Since the disagreement cost of a triangulation edge is zero, we no w hav e a unit y piv ot: c = e 0 = 1. This has t wo adv an tageous consequences: Size reduction: The unit y pivot do es not affect the v alue of the partition function. Since we are not in terested in the marginal probability of triangulation edges (whic h after all 23 Schraudolph and Kamenetsky are not part of the original model), we do not need a or b either, once w e ha v e computed the Sc hur complemen t ( 19 ). W e can therefore discard the first tw o rows and first tw o columns of K after factoring ( 17 ). F actoring out all triangulation edges in this fashion reduces the size of K ( r esp. R and J ) to range only ov er the edges of the original Ising mo del graph. This reduces storage requiremen ts and sp eeds up subsequen t computation of the in verse (Figure 8 , cen ter). Bo olean closure: The unity pivot eliminates the division from the Sc h ur complement ( 19 ); in fact w e sho w b elo w that applying ( 18 ) to prefactor H − H > yields a Sc hur com- plemen t that can b e expressed as H 0 − H 0 > , where H 0 is again a Bo olean matrix. This closure property allo ws us to simply prefactor triangulation edges directly out of H without explicitly constructing K . Sp ecifically , let K := H − H > for a half-Kasteleyn matrix H with elements in { 0 , 1 } . Without loss of generality , assume that H and its transp ose are disjoint , i.e. , ha ve no non-zero element in common: H H > = 0 , where denotes Hadamard (elemen t-wise) m ultiplication. Algorithm 4 resp ects this condition; violations would cancel in the con- struction of K anyw a y . Expressing H as H = 0 1 a > 1 0 0 b > 1 a 2 b 2 C 1 , (26) w e can write K = H − H > as ( 17 ) with a = a 1 − a 2 , b = b 1 − b 2 , c = 1, and C = C 1 − C > 1 . The Sch ur complement ( 19 ) then b ecomes C 0 = C 1 − C > 1 + ( b 1 − b 2 )( a 1 − a 2 ) > − ( a 1 − a 2 )( b 1 − b 2 ) > = ( C 1 + b 1 a > 1 + b 2 a > 2 + a 1 b > 2 + a 2 b > 1 ) − ( C > 1 + a 1 b > 1 + a 2 b > 2 + b 2 a > 1 + b 1 a > 2 ) =: H 0 − H 0 > , (27) where H 0 = C 1 + b 1 a > 1 + b 2 a > 2 + a 1 b > 2 + a 2 b > 1 . (28) It remains to show that all elements of H 0 are in { 0 , 1 } . By definition of H ( 26 ), all elemen ts of C 1 , a 1 , a 2 , b 1 , b 2 are in { 0 , 1 } , and by closure of m ultiplication in { 0 , 1 } so are their pro ducts. Thus an elemen t of H 0 will b e in { 0 , 1 } iff it is non-zero in at most one term on the righ t-hand side of ( 28 ), or (equiv alently) iff all pairs formed from the five terms in question are disjoin t. This can indeed b e sho wn to b e the case: • Because H H > = 0 , we know that neither a 1 and a 2 nor b 1 and b 2 can ha ve any non-zero elements in common, so b 1 a > 1 and b 2 a > 2 are disjoint, as are a 1 b > 2 and a 2 b > 1 . • By construction of H (Algorithm 4 ), a 1 and b 1 ( r esp. a 2 and b 2 ) can only ha ve an element in common if the dual no des on b oth sides of the corresp onding edge are mem b ers of the same clique. This cannot happen b ecause we explicitly ensure that the graph b ecomes biconnected during plane triangulation (Section 4.1 ), so that an edge cannot b order the same face of the mo del graph on b oth sides. Th us all four outer pro ducts in ( 28 ) are pairwise disjoint. 24 Efficient Exa ct Inference in Planar Ising Models • Finally , each outer pro duct in ( 28 ) is disjoint from C 1 as long as the edges b eing factored out do not form a cut of the mo del graph ( i.e. , cycle of the dual). W e are prefactoring only edges added during triangulation; for these to form a cut the mo del graph w ould ha ve to ha ve b een disconnected prior to triangulation. This can- not happ en here b ecause we deal with (and eliminate) this possibility during earlier prepro cessing (Section 2.3 ). In summary , all five terms on the right-hand side of ( 28 ) are pairwise disjoin t. Th us the Sc hur complement H 0 is a Bo olean matrix as w ell, and can b e computed from H ( 26 ) very efficien tly b y replacing the additions and multiplications in ( 28 ) with bit wise OR and AND op erations, resp ectiv ely . As long as further triangulation edges remain in H 0 , w e then set H := H 0 and iteratively apply ( 26 ) and ( 28 ) so as to prefactor them out as well. 5. Application to CRF Parameter Estimation Our algorithms can b e applied to regularized maximum lik eliho o d and maximum margin parameter estimation in conditional random fields (CRFs). In a standard planar Ising CRF, the disagreement costs in ( 2 ) are computed as E k := θ > x k , i.e. , as inner pro ducts b et w een lo cal features (sufficient statistics) x k of the modeled data at eac h edge k , and corresp onding parameters θ of the mo del. The c onditional distribution P ( y | x , θ ) (where x represents the union of all lo cal features) is then mo deled as a Marko v random field ( 16 ). 5.1 Maxim um Likelihoo d Maxim um-likelihoo d (ML) CRF parameter estimation seeks to minimize wrt. θ the L 2 - regularized negative log likelihoo d L ML ( θ ) := 1 2 λ k θ k 2 − ln P ( y ∗ | x , θ ) = 1 2 λ k θ k 2 + E ( y ∗ | x , θ ) + ln Z ( θ | x ) (29) of a giv en target lab eling y ∗ , 4 with regularization parameter λ . This is a smooth, conv ex, non-negativ e ob jective that can b e optimized via gradient metho ds suc h as LBFGS, either in conv en tional batc h mo de ( No cedal , 1980 ; Liu and No cedal , 1989 ) or online ( Sc hraudolph et al. , 2007 ). The gradient of ( 29 ) with resp ect to the parameters θ is given b y ∂ ∂ θ L ML ( θ ) = λ θ + X k ∈E [ k ∈ C ( y ∗ )] − P ( k ∈ C ( y | x , θ )) x k . (30) The contribution of each edge k to the gradient ( 30 ) is given by the pro duct b et w een its lo cal features x k and the difference b et ween the indicator function for membership of k in the cut induced by the target state y ∗ and the marginal probability of k b eing contained in a cut, giv en x and θ . W e compute the latter via the in verse of the Kasteleyn matrix ( 24 ). 4. F or notational clarity we suppress here the fact that we are usually modeling a collection of data items; the ob jective function for such a set is simply the sum of ob jectives for each individual item in it. 25 Schraudolph and Kamenetsky 5.2 Maxim um Margin F or maximum-margin (MM) parameter estimation ( T ask ar et al. , 2004 ) we instead minimize L MM ( θ ) := 1 2 λ k θ k 2 + E ( y ∗ | x , θ ) − min y M ( y | y ∗ , x , θ ) (31) = 1 2 λ k θ k 2 + E ( y ∗ | x , θ ) − E ( ˆ y | x , θ ) + d ( ˆ y | y ∗ ) , where ˆ y := argmin y M ( y | y ∗ , x , θ ) is the worst margin violator, i.e. , the state that minimizes, relativ e to a given target state y ∗ , 4 the mar gin ener gy M ( y | y ∗ ) := E ( y ) − d ( y | y ∗ ) , (32) where d ( ·|· ) is a measure of div ergence in state space. If d ( ·|· ) is a weigh ted Hamming distance b et ween induced cuts: d ( y | y ∗ ) := X k ∈E [[ k ∈ C ( y )] 6 = [ k ∈ C ( y ∗ )]] v k , (33) where the v k > 0 are constant w eighting factors (in the simplest case: all ones) on the edges of our Ising mo del, then it is easily verified that the margin energy ( 32 ) is implemen ted (up to a shift that dep ends only on y ∗ ) by an isomorphic Ising mo del with disagreement costs E k + (2[ k ∈ C ( y ∗ )] − 1) v k . (34) W e can thus use our algorithm of Section 3.3 to efficiently find the w orst margin violator. 5 The maximum-margin ob jective ( 31 ) is conv ex but non-smo oth; its gradient is ∂ ∂ θ L MM ( θ ) = λ θ + X k ∈E [ k ∈ C ( y ∗ )] − [ k ∈ C ( ˆ y )] x k , (35) i.e. , lo cal features x k are multiplied by one of {− 1 , 0 , 1 } , dep ending on the membership of edge k in the cuts induced by y ∗ and ˆ y , resp ectiv ely . W e can minimize ( 31 ) via bundle metho ds, such as the BT bundle trust algorithm ( Schramm and Zo we , 1992 ), making use of the con venien t low er b ound ∀ θ : L MM ( θ ) ≥ 0, which holds b ecause min y M ( y | y ∗ , x , θ ) ≤ M ( y ∗ | y ∗ , x , θ ) = E ( y ∗ | x , θ ) − d ( y ∗ | y ∗ ) = E ( y ∗ | x , θ ) . (36) 6. Exp erimen ts W e now demonstrate the scalability of our approach to CRF parameter estimation (Sec- tion 5 ) on t wo simple computer vision problems: the syn thetic binary image denoising task of Kumar and Heb ert ( 2004 , 2006 ), and the detection of segmentation b oundaries in noisy masks from the GrabCut Ground T ruth image segmentation database ( Rother et al. , 2007a ). 5. Thomas and Middleton ( 2007 ) emplo y a similar approach to obtain the ground state from a given state y ∗ b y setting up an isomorphic Ising mo del with disagreement costs (1 − 2 [ k ∈ C ( y ∗ )]) E k . 26 Efficient Exa ct Inference in Planar Ising Models 6.1 Syn thetic Binary Image Denoising Kumar and Heb ert ( 2004 , 2006 ) dev elop ed an image denoising b enchmark problem for binary random fields based on four hand-drawn 64 × 64 pixel images (Figure 9 , top ro w). W e created 50 instances of each image corrupted with pink noise, pro duced by con volving a white noise image (all pixels i.i.d. uniformly random) with a Gaussian density of one pixel standard deviation. Original and pink noise images w ere linearly mixed using signal-to- noise (S/N) amplitude ratios of 1 : n, n ∈ N . Figure 9 shows samples of the resulting noisy instances for S/N ratios of 1:5 (second ro w) and 1:6 (fourth row). W e then employ ed square grid planar Ising CRFs to denoise the images, with edge energies set to E ij := h [1 , | x i − x j | ] , θ i , where x i is the pixel intensit y at no de i . The p erimeter of the grid was connected to a bias no de with constant input and lab el: x 0 = y 0 = 0. Bias edges had their o wn parameters, yielding CRFs with four parameters and up to (for a 64 × 64 grid) 4097 no des and 8316 edges. These systems w ere trained b y maxim um margin (MM) and maxim um lik eliho o d (ML) parameter estimation (Section 5 ) on the 50 noisy instances derived from the first image (Figure 9 , left column) only . The gradient ( 35 ) r esp. ( 30 ) w as computed b y adding the con tributions from all 50 training instances to that of the regularizer, with λ = 100. W e as- sessed the qualit y of the obtained parameters b y determining (via the metho d of Section 3.3 ) the maximum a p osteriori (MAP) states y ∗ := argmax y P ( y | x , θ ) = argmin y E ( y | x , θ ) (37) of the trained CRF for all 150 noisy instances of the other three images. Interpreting these MAP states as attempted reconstructions of the original images, w e then calculated the prediction error rates for b oth no des and edges of the mo del. All experiments were carried out on a Lin ux PC with 3 GB RAM and dual In tel Pen tium 4 pro cessors running at 3.6 GHz, each with 2 MB of level 2 cache. 6.1.1 Noise Level W e first explored the limit of the abilit y of a full-size (64 × 64) MM-trained Ising grid CRF to reconstruct the test images as the noise lev el increases. Rows 3 and 5 of Figure 9 sho w the reconstructions obtained from the noisy instances shown in ro ws 2 and 4, resp ectively . A t low noise levels ( n < 5) w e obtained p erfect reconstruction of the original images. At an S/N ratio of 1:5 the first subtle errors do creep in (Figure 9 , third row), though less than 0.5% of the no des (0.3% of the edges) are predicted incorrectly . At the 1:6 S/N ratio, these figures increase to 2.1% for no des and 1.15% for edges, and the errors b ecome far more noticeable (Figure 9 , b ottom row). F or higher noise levels ( n > 6) the reconstructions rapidly deteriorate as the noise finally ov erwhelms the signal. At these noise levels our h uman visual system was no longer able to accurately reconstruct the images either. 6.1.2 Maximum Margin vs. Maximum Likelihood P arameter Estima tion Next w e compared MM and ML parameter estimation at the S/N ratio of 1:6 (fourth row of Figure 9 ), where reconstruction b egins to break down and an y differences in p erformance should b e evident. T o make ML training computationally feasible, we sub divided each 27 Schraudolph and Kamenetsky (used for training) (used for training) Figure 9: Denoising of binary images by maxim um-margin training of planar Ising grids; from top to b ottom: original images ( Kumar and Heb ert , 2004 , 2006 ), images mixed with pink noise in a 1:5 ratio, reconstruction via MAP state of 64 × 64 Ising grid CRF from those noisy instances, images mixed with pink noise in a 1:6 ratio, and reconstruction from those noisy instances. Only the left-most image w as used for training (max-margin CRF parameter estimation, λ = 100), the others for reconstruction. 28 Efficient Exa ct Inference in Planar Ising Models T able 1: Performance comparison of parameter estimation metho ds on the denoising task; image reconstruction via MAP on the full (64 × 64) mo del and images. The mini- m um in each result column is b oldfaced. T rain Metho d P atch Size T rain Time Edge Error No de Error MM 64 × 64 490.4 s 1.15 % 2.10 % 32 × 32 174.7 s 1.16 % 2.15 % 16 × 16 91.4 s 1.12 % 1.98 % 8 × 8 78.1 s 1.10 % 1.83 % ML 5468.2 s 1.11 % 1.93 % training image into 64 8 × 8 patches, then trained an 8 × 8 grid CRF on those patches. F or MM training w e used the full (64 × 64) images and mo del, as w ell as 32 × 32, 16 × 16, and 8 × 8 patc hes, so as to assess how this sub dividision impacts the qualit y of the mo del. T esting alw ays emplo yed the MAP state of the full mo del on the full images. T able 1 rep orts the edge and no de errors obtained under each exp erimen tal condition. T o assess statistical significance, we performed binomial pairwise comparison tests at a 95% confidence level against the n ull hypothesis that each of the tw o algorithms b eing compared has an equal (50%) chance of outp erforming the other on a given test image. W e found no statistically significant difference here b et w een 8 × 8 CRFs trained by MM vs. ML. How ev er, ML training took 70 times as long to achiev e this, so MM training is m uch preferable on computational grounds. Coun ter to our exp ectations, the no de and edge errors suggest that MM training actually w orks b etter on small (8 × 8 and 16 × 16) image patc hes. W e surmise that this is b ecause small patc hes hav e a relatively larger perimeter, leading to b etter training of the bias edges. P airwise comparison tests, ho wev er, only found the no de error for the 32 × 32 patch-trained CRF to b e significantly worse than for the smaller patc hes; all other differences were b elo w the significance threshold. What we can confiden tly state is that sub dividing the images in to small patches did not hurt p erformance, and yielded m uch shorter training times. 6.1.3 Maximum A Posteriori vs. Mar ginal Posterior Reconstr uction F o x and Nic holls ( 2001 ) hav e argued that the MAP state do es not summarize w ell the information in the p osterior distribution of an Ising mo del of noisy binary images, and prop osed reconstruction via the mar ginal p osterior mo de (MPM) instead. F or binary labels, the MPM is simply obtained by thresholding the marginal p osterior no de probabilities: ( ∀ i ∈ V ) y i := [ P ( y i = 1 | x , θ ) > 0 . 5]. In our Ising mo dels, how ev er, we hav e marginal p osterior probabilities for edges (Section 4.4.2 ), and infer no de states from graph cuts (Algorithm 1 ). Here implemen ting the MPM runs into a difficulty: since the edge set { k ∈ E : P ( k ∈ C ( y | x , θ )) > 0 . 5 } (38) ma y not be a cut of the mo del graph, hence ma y not unam biguously induce a node state. What we really need is the cut closest (in some given sense) to ( 38 ). F or this purp ose we 29 Schraudolph and Kamenetsky T able 2: Comparison of reconstruction metho ds on the denoising task, using the parameters of an MM-trained 8 × 8 CRF. The minimum in each result column is b oldfaced; “test time” is the time required to reconstruct all 150 images. T est Metho d P atch Size T est Time Edge Error No de Error MAP 64 × 64 3.7 s 1.10 % 1.83 % 8 × 8 3.2 s 1.96 % 5.21 % M 3 P 397.5 s 1.95 % 3.32 % form ulate the state y + with the maximal minimum mar ginal p osterior (M 3 P): y + := argmax y 0 min k ∈E P ( k ∈ C ( y | x , θ )) if k ∈ C ( y 0 ) , 1 − P ( k ∈ C ( y | x , θ )) otherwise . (39) In other words, the M 3 P state ( 39 ) is induced by the cut whose edges (and those of its complemen t) hav e the largest minimum marginal probability . W e can efficiently compute y + as follows: 1. Find the minimum-w eigh t spanning tree E + ⊆ E of the mo del graph G ( V , E ) with edge weigh ts −| P ( k ∈ C ( y | x , θ )) − 0 . 5 | . (This can b e done in O ( |E | log |E | ) time.) 2. Run Algorithm 1 on G ( V , E + ) to find y + as the state induced in the spanning tree b y the MPM edge set ( 38 ). Since G ( V , E + ) is a tree, it contains no cycles, and Algorithm 1 will therefore unambiguously iden tify the M 3 P state. T able 2 lists the reconstruction errors achiev ed on the denoising task via MAP vs. M 3 P states, using the parameters of the MM-trained 8 × 8 CRF which gav e the best p erformance in Section 6.1.2 . (W e obtained comparable results for ML training on 8 × 8 patches and MM training on full images.) Figure 10 shows represen tative sample reconstructions. When reconstructing 8 × 8 patches, both MAP and M 3 P states app ear to achiev e virtu- ally the same edge error. The binomial pairwise c omparison test, how ever, rev eals M 3 P to consisten tly outp erform MAP here, even at a 99 % confidence level. This trend is confirmed b y M 3 P’s substantially low er no de error. F or comparison, an oracle that alwa ys picks the b etter of the tw o label-symmetric no de states w ould yield a node error of 2.8 % here; the excess no de error relative to that baseline is almost 5 times lo wer for M 3 P than for MAP . This do es confirm the limitations of the MAP state for reconstruction, as argued by F ox and Nicholls ( 2001 ). Since the M 3 P state ( 39 ) requires calculation of the edge marginals, how ev er, it takes o ver tw o order of magnitude longer to obtain than the MAP state on an 8 × 8 patch, and is computationally infeasible for us to compute for the en tire 64 × 64 image. The MAP state on the entire image clearly outp erforms an y reconstruction from 8 × 8 patches in terms of b oth edge and no de error. The impressiv e scalability of the MAP state computation via blossom-shrinking (Section 3.3 ) th us in practice o verrules here the theoretical adv an tage of reconstruction based on marginal probabilities. 30 Efficient Exa ct Inference in Planar Ising Models Figure 10: Image reconstructions on the denoising task; columns from left to righ t: Noisy 64 × 64 image, MAP reconstruction of full image, MAP reconstruction of 8 × 8 patc hes, and M 3 P reconstruction of 8 × 8 patches. 6.2 Boundary Detection T o further scale up our approach, w e designed a simple b oundary detection task based on images from the GrabCut Ground T ruth image segmentation database ( Rother et al. , 2007a ). W e to ok 100 × 100 pixel subregions of images that depicted a segmen tation b ound- ary , and corrupted the segmentation mask with pink noise, produced b y conv olving a white noise image (all pixels i.i.d. uniformly random) with a Gaussian density with one pixel standard deviation. W e then emplo yed a planar Ising mo del to recov er the original b oundary — namely , a 100 × 100 square grid with one additional edge p egged to a high energy , enco ding prior kno wledge that the b ottom left and top righ t corners of the grid depict different regions. As b efore, the energy of the other edges w as set to E ij := h [1 , | x i − x j | ] , θ i , where x i is the pixel intensit y at no de i . W e did not employ a bias no de for this task, and simply set the regularization constant to λ = 1. Note that this is a huge mo del: 10 000 no des and 19 801 edges. Computing the partition function or marginals w ould require in v erting a Kasteleyn matrix with o ver 1 . 5 · 10 9 en tries; minimizing ( 29 ) is therefore computationally infeasible for us. Computing a ground state 31 Schraudolph and Kamenetsky Figure 11: Boundary detection by maxim um-margin training of planar Ising grids; columns from left to right: original image, original segmentation mask, noisy mask for S/N ratio of 1:8 (top t w o rows) r esp. 1:7 (b ottom t wo rows), and MAP segmen- tation obtained from the Ising grid CRF trained on the noisy mask. via the algorithm describ ed in Section 3 , b y contrast, tak es only 0.3 CPU seconds on an Apple MacBo ok with 2.2 GHz Intel Core2 Duo pro cessor. 6 W e can therefore efficiently minimize ( 31 ) to obtain the MM parameter vector θ ∗ , then compute the CRF’s MAP ( i.e. , ground) state for rapid prediction. As before, we titrated for the smallest S/N ratio of the form 1 : n for whic h w e obtained a go o d segmen tation; dep ending on the image this o ccurred for n = 7 or 8. Figure 11 (right 6. In scalability tests our co de was able to compute the MAP state of a 400 × 400 grid in 0.85 CPU seconds. 32 Efficient Exa ct Inference in Planar Ising Models column) shows that at these noise levels our approach is capable of recov ering the original segmen tation boundary quite well, with less than 1% of no des mislab eled. F or S/N ratios of 1:9 and lo wer the system was unable to lo cate the b oundary; for S/N ratios of 1:6 and higher w e obtained p erfect reconstruction. Again this corresp onded closely to our h uman abilit y to visually lo cate the segmentation boundary accurately . 7. Discussion and Outlo ok W e ha ve prop osed an alternative algorithmic framew ork for efficien t exact inference in binary graphical models, which replaces the submodularity constrain t of graph cut metho ds with a planarity constraint. Besides pro ving efficient and effectiv e in first exp eriments, our approac h op ens up a num b er of interesting researc h directions to b e explored: Our algorithms can all b e extended to nonplanar graphs, at a cost exp onen tial in the gen us of the embedding. W e are curren tly dev eloping these extensions, which may prov e of great practical v alue for graphs that are “almost” planar; examples include road net works (where edge crossings arise from o v erpasses without on-ramps) and graphs describing the tertiary structure of proteins ( Vishw anathan et al. , 2007 ). Our algorithms also provide a foundation for efforts to develop efficien t appr oximate inference metho ds for nonplanar Ising mo dels. Our metho d for calculating the ground state (Section 3 ) actually works for nonplanar graphs whose ground state do es not con tain frustrated non-con tractible cycles. The QPBO graph cut metho d ( Kolmogoro v and Rother , 2007 ) finds ground states that do not con tain an y frustrated cycles, and otherwise yields a p artial labeling. Can we lik ewise obtain a partial lab eling of ground states with frustrated non-con tractible cycles? The existence of tw o distinct tractable frameworks for inference in binary graphical mo dels implies a more p o werful h ybrid: Consider a graph each of whose biconnected com- p onen ts is either planar or submo dular. In its en tiret y , this graph ma y be neither planar nor submo dular, y et efficient exact inference in it is clearly p ossible b y applying the appropriate framew ork to each component, then com bining the results (Section 2.4.2 ). Can this hybrid approac h b e extended to cov er less ob vious situations? Ac kno wledgments Earlier drafts of this paper hav e app eared on the arXiv preprin t serv er ( Sc hraudolph and Kamenetsky , 2008 ) and (in condensed form) at the 2008 NIPS conference ( Schraudolph and Kamenetsky , 2009 ). W e thank the anon ymous NIPS reviewers for their v aluable feedback. NICT A is a national researc h institute with a c harter to build Australia’s pre-eminen t cen tre of excellence for information and comm unications tec hnology (ICT). NICT A is build- ing capabilities in ICT researc h, researc h training and commercialisation for the generation of national b enefit. NICT A is funded by the Australian Gov ernmen t as represented b y the Departmen t of Broadband, Comm unications and the Digital Economy and the Australian Researc h Council through the ICT Centre of Excellence program. 33 Schraudolph and Kamenetsky References F rancisco Barahona. On the computational complexit y of Ising spin glass mo dels. Journal of Physics A: Mathematic al, Nucle ar and Gener al , 15(10):3241–3253, 1982. P eter Benner, Ralph Byers, Heike F assb ender, V olker Mehrmann, and David W atkins. Cholesky-lik e factorizations of sk ew-symmetric matrices. Ele ctr onic T r ansactions on Nu- meric al A nalysis , 11:85–93, 2000. J. Besag. On the statistical analysis of dirty pictures. Journal of the R oyal Statistic al So ciety B , 48(3):259–302, 1986. I. Bieche, R. Maynard, R. Rammal, and J. P . Uhry . On the ground states of the frustration mo del of a spin glass b y a matching metho d of graph theory . Journal of Physics A: Mathematic al, Nucle ar and Gener al , 13:2553–2576, 1980. John M. Bo yer and W endy J. Myrvold. On the cutting edge: Simplified O ( n ) pla- narit y by edge addition. Journal of Gr aph Algorithms and Applic ations , 8(3):241–273, 2004. Reference implemen tation (C source co de): http://jgaa.info/accepted/2004/ BoyerMyrvold2004.8.3/planarity.zip . Y uri Boyk o v, Olga V eksler, and Ramin Zabih. F ast appro ximate energy minimization via graph cuts. IEEE T r ansactions on Pattern Analysis and Machine Intel ligenc e , 23(11): 1222–1239, 2001. James R. Bunch. A note on the stable decomp osition of skew-symmetric matrices. Mathe- matics of Computation , 38(158):475–479, April 1982. William Cook and Andr ´ e Rohe. Computing minim um-weigh t p erfect matc hings. INFORMS Journal on Computing , 11(2):138–148, 1999. C source co de a v ailable at http://www.isye. gatech.edu/ ~ wcook/blossom4/ . Jac k Edmonds. Maximum matching and a polyhedron with 0,1-vertices. Journal of R ese ar ch of the National Bur e au of Standar ds , 69B:125–130, 1965a. Jac k Edmonds. P aths, trees, and flow ers. Canadian Journal of Mathematics , 17:449–467, 1965b. Mic hael E. Fisher. Statistical mechanics of dimers on a plane lattice. Physic al R eview , 124 (6):1664–1672, 1961. Mic hael E. Fisher. On the dimer solution of planar Ising mo dels. Journal of Mathematic al Physics , 7(10):1776–1781, 1966. C. F ox and G. K. Nicholls. Exact map states and expectations from p erfect sampling: Greig, P orteous and Seheult revisited. In A. Mohammad-Djafari, editor, Bayesian Infer enc e and Maximum Entr opy Metho ds in Scienc e and Engine ering , v olume 568 of Americ an Institute of Physics Confer enc e Series , pages 252–263, 2001. 34 Efficient Exa ct Inference in Planar Ising Models Zvi Galil, Silvio Micali, and Harold N. Gab ow. An O ( E V log V ) algorithm for finding a maximal weigh ted matching in general graphs. SIAM Journal of Computing , 15:120–130, 1986. Amir Glob erson and T ommi Jaakkola. Appro ximate inference using planar graph decompo- sition. In B. Sc h¨ olk opf, J. Platt, and T. Hofmann, editors, A dvanc es in Neur al Information Pr o c essing Systems 19 , pages 473–480, Cam bridge, MA, 2007. MIT Press. D. M. Greig, B. T. P orteous, and A. H. Seheult. Exact maximum a p osteriori estimation for binary images. Journal of the R oyal Statistic al So ciety B , 51(2):271–279, 1989. Carsten Gut w enger and P etra Mutzel. Graph em b edding with minimum depth and max- im um external face. In G. Liotta, editor, Gr aph Dr awing 2003 , volume 2912 of L e ctur e Notes in Computer Scienc e , pages 259–272. Springer V erlag, 2004. F. Hamze and N. de F reitas. F rom fields to trees. In Unc ertainty in Artificial Intel ligenc e (UAI) , 2004. John E. Hop croft and Robert E. T arjan. Algorithm 447: Efficient algorithms for graph manipulation. Communic ations of the ACM , 16(6):372–378, 1973. Pieter W. Kasteleyn. The statistics of dimers on a lattice: I. the num b er of dimer arrange- men ts on a quadratic lattice. Physic a , 27(12):1209–1225, 1961. Pieter W. Kasteleyn. Dimer statistics and phase transitions. Journal of Mathematic al Physics , 4(2):287–293, 1963. Pieter W. Kasteleyn. Graph theory and crystal physics. In F rank Harary , editor, Gr aph The ory and The or etic al Physics , chapter 2, pages 43–110. Academic Press, London and New Y ork, 1967. Vladimir Kolmogorov and Carsten Rother. Minimizing nonsubmo dular functions with graph cuts – a review. IEEE T r ans. Pattern Analysis and Machine Intel ligenc e , 29(7):1274–1279, July 2007. Vladimir Kolmogorov and Ramin Zabih. What energy functions can be minimized via graph cuts? IEEE T r ans. Pattern Analysis and Machine Intel ligenc e , 26(2):147–159, F eb. 2004. Sanjiv Kumar and Martial Heb ert. Discriminativ e fields for mo deling spatial dep endencies in natural images. In S. Thrun, L. Saul, and B. Sc h¨ olk opf, editors, A dvanc es in Neur al Information Pr o c essing Systems 16 , 2004. Sanjiv Kumar and Martial Heb ert. Discriminative random fields. International Journal of Computer Vision , 68(2):179–201, 2006. http://www- 2.cs.cmu.edu/ ~ skumar/DRF_IJCV.pdf . F rauke Liers and Gregor Pardella. A simple max-cut algorithm for planar graphs. T ec hnical Rep ort zaik2008-579, Zentrum f ¨ ur Angew andte Informatik, Univ ersit¨ at K¨ oln, Septem b er 2008. http://www.zaik.uni- koeln.de/ ~ paper/preprints.html?show=zaik2008- 579 . Dong C. Liu and Jorge Nocedal. On the limited memory BF GS metho d for large scale optimization. Mathematic al Pr o gr amming , 45(3):503–528, 1989. 35 Schraudolph and Kamenetsky Kurt Mehlhorn and Guido Sch¨ afer. Implemen tation of O ( nm log n ) weigh ted matchings in general graphs: The p o wer of data structures. A CM Journal of Exp erimental Algorithms , 7:138–148, 2002. Article and C source co de (requires LED A 4.2 or higher) at http: //www.jea.acm.org/2002/MehlhornMatching/ . Jorge No cedal. Up dating quasi-Newton matrices with limited storage. Mathematics of Computation , 35:773–782, 1980. Gregor P ardella and F rauke Liers. Exact ground states of h uge t w o-dimensional planar ising spin glasses. T echnical Rep ort 0801.3143, arXiv, January 2008. abs/0801.3143 , to app ear in Phys R ev E . C. Rother, V. Kolmogorov, A. Blak e, and M. Brown. GrabCut ground truth database. http: //research.microsoft.com/vision/cambridge/i3l/segmentation/GrabCut.htm , 2007a. Carsten Rother, Vladimir Kolmogorov, Victor Lempitsky , and Martin Szummer. Optimiz- ing binary MRFs via extended ro of dualit y . In Pr o c. IEEE Conf. Computer Vision and Pattern R e c o gnition , Minneap olis, MN, June 2007b. Helga Sc hramm and Jo c hem Zow e. A version of the bundle idea for minimizing a non- smo oth function: Conceptual idea, conv ergence analysis, n umerical results. SIAM J. Optimization , 2:121–152, 1992. Nicol N. Schraudolph and Dmitry Kamenetsky . Efficient exact inference in planar Ising mo dels. T echnical Rep ort 0810.4401, arXiv, Octob er 2008. 0810.4401 . Nicol N. Schraudolph and Dmitry Kamenetsky . Efficient exact inference in planar Ising mo dels. In A dvanc es in Neur al Information Pr o c essing Systems 21 , Cambridge, MA, to app ear 2009. MIT Press. Nicol N. Sc hraudolph, Jin Y u, and Simon G ¨ un ter. A sto c hastic quasi-Newton metho d for online c on v ex optimization. In Pr o c. 11th Intl. Conf. A rtificial Intel ligenc e and Statistics (AIstats) , San Juan, Puerto Rico, Marc h 2007. So ciet y for Artificial Intelligence and Statistics. http://nic.schraudolph.org/pubs/SchYuGue07.pdf . B. T ask ar, C. Guestrin, and D. Koller. Max-margin Mark ov net w orks. In S. Thrun, L. Saul, and B. Sc h¨ olkopf, editors, A dvanc es in Neur al Information Pr o c essing Systems 16 , pages 25–32, Cambridge, MA, 2004. MIT Press. Creigh ton K. Thomas and A. Alan Middleton. Matc hing Kasteleyn cities for spin glass ground states. Physic al R eview B , 76(22):22406, 2007. S. V. N. Vishw anathan, Karsten Borgw ardt, and Nicol N. Sc hraudolph. F ast computation of graph kernels. In B. Sch¨ olk opf, J. Platt, and T. Hofmann, editors, A dvanc es in Neur al Information Pr o c essing Systems 19 , Cambridge MA, 2007. MIT Press. Martin J. W ainwrigh t, T ommi S. Jaakkola, and Alan S. Willsky . T ree-based reparameteri- zation framew ork for analysis of sum-pro duct and related algorithms. IEEE T r ansactions on Information The ory , 49(5):1120–1146, 2003. 36 Efficient Exa ct Inference in Planar Ising Models Martin J. W ainwrigh t, T ommi S. Jaakk ola, and Alan S. Willsky . A new class of upp er b ounds on the log partition function. IEEE T r ansactions on Information The ory , 51(7): 2313–2335, 2005. Y air W eiss. Comparing the mean field metho d and b elief propagation for approximate inference in MRFs. In Da vid Saad and Manfred Opp er, editors, A dvanc e d Me an Field Metho ds . MIT Press, 2001. Arth ur T. White and Lo well W. Beineke. T op ological graph theory . In Low ell W. Beineke and Robin J. Wilson, editors, Sele cte d T opics in Gr aph The ory , c hapter 2, pages 15–49. Academic Press, 1978. Aaron Windsor. Planar graph functions for the bo ost graph library . C++ source co de, bo ost file v ault: http://boost- consulting.com/vault/index.php?directory=Algorithms/graph , 2007. J.S. Y edidia, W.T. F reeman, and Y. W eiss. Understanding b elief propagation and its gen- eralizations. In Exploring Artificial Intel ligenc e in the New Mil lennium , chapter 8, pages 239–236. Science & T echnology Books, 2003. 37
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment