Relations among conditional probabilities

Jason Morton

October 28, 2018

Abstract

We describe a Gröbner basis of relations among conditional probabilities in a discrete probability space, with any set of conditioned-upon events. They may be specialized to the partially-observed random variable case, the purely conditional case, and other special cases. We also investigate the connection to generalized permutohedra and describe a "conditional probability simplex."

1 Relations among conditional probabilities

In 1974, Julian Besag [4] discussed the "unobvious and highly restrictive consistency conditions" among conditional probabilities. In this paper we give an answer in the discrete case to the question

What conditions must a set of conditional probabilities satisfy in order to be compatible with some joint distribution?

Let $\Omega = \{1, \dots, m\}$ be a finite set of singleton events, and let $p = (p_1, \dots, p_m)$ be a probability distribution on them. Let $E$ be a set of observable events which will be conditioned on, each a set of at least 2 singleton events. Then for events $I \subset J$, $J$ in $E$, we can assign conditional probabilities for the chance of $I$ given $J$, denoted $p_{I|J}$. Settling Besag's question then becomes a matter of determining the relations that must hold among the quantities $p_{I|J}$. For example, Besag gives the relation (see also [3]),

\[
\frac{P(x)}{P(y)} = \prod_{i=1}^{n} \frac{P(x_i \mid x_1, \dots, x_{i-1}, y_{i+1}, \dots, y_n)}{P(y_i \mid x_1, \dots, x_{i-1}, y_{i+1}, \dots, y_n)}. \tag{1}
\]

Since there are in general infinitely many such relations, we would like to organize them into an ideal and provide a nice basis for that ideal. A quick review of the language of ideals, varieties, and Gröbner bases appears in Geiger et al. [11, p. 1471] and more detail in Cox et al. [7].
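Relation (1) can be checked numerically on a small example. The following is a minimal sketch, assuming $n$ binary random variables and a strictly positive joint distribution with rational entries; the helper names `random_joint`, `full_conditional`, and `besag_ratio` are illustrative and not taken from [4].

```python
import itertools
import random
from fractions import Fraction


def random_joint(n, seed=0):
    """Strictly positive joint distribution on {0,1}^n with exact rational entries."""
    rng = random.Random(seed)
    weights = {x: rng.randint(1, 9) for x in itertools.product((0, 1), repeat=n)}
    total = sum(weights.values())
    return {x: Fraction(w, total) for x, w in weights.items()}


def full_conditional(P, i, value, rest):
    """P(X_i = value | X_j = rest[j] for all j != i); the entry rest[i] is ignored."""
    n = len(rest)
    numerator = P[tuple(value if j == i else rest[j] for j in range(n))]
    denominator = sum(
        P[tuple(v if j == i else rest[j] for j in range(n))] for v in (0, 1)
    )
    return numerator / denominator


def besag_ratio(P, x, y):
    """Right-hand side of relation (1), built only from full conditionals."""
    ratio = Fraction(1)
    for i in range(len(x)):
        # Conditioning context: x_1..x_{i-1} on the left, y_{i+1}..y_n on the right.
        rest = x[:i] + (None,) + y[i + 1:]
        ratio *= full_conditional(P, i, x[i], rest) / full_conditional(P, i, y[i], rest)
    return ratio


P = random_joint(3)
x, y = (1, 0, 1), (0, 1, 1)
print(besag_ratio(P, x, y) == P[x] / P[y])
```

The product telescopes because each factor equals $P(x_1,\dots,x_i,y_{i+1},\dots,y_n)/P(x_1,\dots,x_{i-1},y_i,\dots,y_n)$, which is why exact rational arithmetic reproduces the left-hand side of (1).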
In Theorem 3.2, we generalize relations such as (1) and Bayes' rule to give a universal Gröbner basis of this ideal, a type of basis with useful algorithmic properties.

The second result generalized in this paper is due to Matúš [15]. This states that the space of conditional probability distributions $(p_{i|ij})$ conditioned on events of size two maps homeomorphically onto the permutohedron. In Theorem 4.3, we generalize this result to arbitrary sets $E$ of conditioned-upon events. The resulting image is a generalized permutohedron [20, 24]. This is a polytope which provides a canonical, conditional-probability analog to the probability simplex under the correspondence provided by toric geometry [23] and the theory of exponential families.

Work on the subject of relations among conditional probabilities has primarily focused on the case where the events in $E$ correspond to observing the states of a subset of $n$ random variables. Arnold et al. [2] develop the theory for both discrete and continuous random variables, particularly in the case of two random variables, and cast the compatibility of two families of conditional distributions as the solution to a system of linear equations. Slavkovic and Sullivant [22] consider the case of compatible full conditionals, and compute related unimodular ideals.

This paper is organized as follows. In Section 2, we introduce some necessary definitions. In Section 3, we give compatibility conditions in the general case of $m$ events in a discrete probability space, with any set $E$ of conditioned-upon events. These conditions come in the form of a universal Gröbner basis, which makes them particularly useful for computations: as a result, they may be specialized to the partially observed random variable case, the purely conditional case, and other special cases simply by changing $E$.
In [14, 17], we have seen that permutohedra and generalized permutohedra [20] play a central role in the geometry of conditional independence; the same is true of conditional probability. The geometric results of Matúš [15] map the space of conditional probability distributions (Definition 2.1) for all possible conditioned events $E = \{I \subset [m] : |I| \geq 2\}$ onto the permutohedron $P_{m-1}$. See Figure 1 for a diagram of the 3-dimensional permutohedron. In Section 4, we will discuss how to extend this result to general $E$, in which case we obtain generalized permutohedra as the image. This will be accomplished using a version of the moment map of toric geometry (Theorem 7.1). In Section 5, we discuss how to specialize our results to the case of $n$ partially observed random variables, including as an example how to recover the relation (1). Finally, in Section 6 we use this specialization to explain the relationship of Bayes' rule to our constructions. In the Appendix we recall a few necessary facts about toric varieties.

[Figure 1: The permutohedron $P_4$. Its 24 vertices are labeled by the permutations of 1234.]

2 Conditional probability distributions

Let $E$ be a collection of subsets $I$, with $|I| \geq 2$, of $[m] = \Omega = \{1, \dots, m\}$. Let $\mathbb{C}[E]$ denote the event algebra, the polynomial ring with indeterminates $p_{i|I}$ for all $I \in E$ and $i \in I$, i.e.
one unknown for each elementary conditional probability. Then we denote by $\|E\| = \sum_{I \in E} |I|$ the number of variables of $\mathbb{C}[E]$. We write $p_i$ for $p_{i|[m]}$ when $[m] \in E$.

The unknowns of $\mathbb{C}[E]$ are meant to represent conditional probabilities, as we now explain. The set $\{1, \dots, m\}$ indexes the $m$ disjoint events, and a point $(p_1, \dots, p_m) \in \mathbb{R}^m_{\geq 0}$ with $\sum_j p_j = 1$ represents a probability distribution on these events. When $p_j > 0$ for all $j$, the conditional probability of event $i$ given event $I$ containing it is

\[
p_{i|I} = \frac{p_i}{\sum_{j \in I} p_j}. \tag{2}
\]

To extend this notion to the case $P(I) = \sum_{j \in I} p_j = 0$, and to be able to deal with multiple conditioning sets, we make the following standard definition [5], considered in this form by Matúš [15].

Definition 2.1. A conditional probability distribution for $E$ is a point $(p_{i|I} : i \in I \in E) \in \mathbb{R}^{\|E\|}_{\geq 0}$ such that for all $J, K \in E$ with $J \subset K$,
(i) $\sum_{i \in J} p_{i|J} = 1$
(ii) for all $i \in J$, $p_{i|K} = p_{i|J} \sum_{j \in J} p_{j|K}$.

Observe that (ii) is a relative version of (2), as (2) follows from (ii) with $K = [m]$, $J = I$, and $\sum_{i \in I} p_i \neq 0$. If on the other hand $\sum_{j \in J} p_{j|K} = 0$, the whole probability simplex $\Delta_J := \{(p_{j|J})_{j \in J} : p_{j|J} \geq 0, \sum_{j \in J} p_{j|J} = 1\}$ satisfies the definition. This freedom is known in probability theory as versions of conditional probability [5]. In algebraic geometry, this corresponds to the notion of a blow-up [13], and the simplex $\Delta_J$ to the exceptional divisor.

Before we give a homogenized version of Definition 2.1, we consider the homogenized version of probability.

2.1 A projective view of probability

Consider a probability space with $m$ disjoint atomic events $([m], 2^{[m]}, P)$. The space of probability distributions $P$ on them is typically represented as a probability simplex, where each $P(i)$ is a coordinate $p_i$ such that $p_i \geq 0$ and $\sum_i p_i = 1$.
We will be describing families of probability distributions in terms of algebraic varieties, and we prefer to think of points $(p_1 : \cdots : p_m)$ as lying in complex projective space. This is equivalent to letting $V = \mathbb{C}\{e_1, \dots, e_m\} \cong \mathbb{C}^m$ be the complex vector space spanned by the outcomes (singleton events) and considering points $p \in \mathbb{P}V$ as representing mixtures over outcomes or probability distributions.

There are two ways to match up the notion of the probability simplex with that of complex projective space. One way to do so, restriction, identifies the probability simplex $\Delta_{m-1}$ with the real, positive part of the affine open $\sum_i y_i \neq 0$ of the $\mathbb{P}^{m-1}$ with homogeneous coordinates $(y_1 : y_2 : \cdots : y_m)$ as illustrated in Figure 2.

[Figure 2: Probability simplex in the projective plane, with vertices $(1:0:0)$, $(0:1:0)$, and $(0:0:1)$.]

Alternatively we can use projection, equivalent in the special case that $(y_1 : \cdots : y_m) \in \Delta_{m-1}$, via the moment map (Theorem 7.1). The identity matrix $A = I_m$, whose columns are the standard unit vectors $e_i$, defines the probability simplex $\Delta_{m-1} = \mathrm{conv}(A)$. The toric variety $Y_A$ is then the projective space $\mathbb{P}^{m-1}$ and the moment map is

\[
\mu : \mathbb{P}^{m-1} \to \Delta_{m-1}, \qquad \mu((y_1 : \cdots : y_m)) = \frac{1}{\sum_i |y_i|} \sum_i |y_i| \, e_i.
\]

The moment map $\mu$ is the identity map on the probability simplex, but allows us to define a point on the probability simplex for more general points in complex projective space. The fiber over any of these points is the torus $(S^1)^m$, a product of $m$ unit circles, since $\mu(y_1 : \cdots : y_m) = \mu(e^{i\theta_1} y_1 : \cdots : e^{i\theta_m} y_m)$.
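The behavior of the moment map $\mu$ is easy to see on a concrete point. Below is a minimal sketch, assuming a point of $\mathbb{P}^{m-1}$ is given by a list of complex homogeneous coordinates; the function name `moment_map` and the sample values are illustrative only.

```python
import cmath


def moment_map(y):
    """Send homogeneous coordinates (y_1 : ... : y_m) to (|y_1|, ..., |y_m|) / sum_i |y_i|."""
    magnitudes = [abs(c) for c in y]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("(0 : ... : 0) is not a point of projective space")
    return [r / total for r in magnitudes]


def close(a, b, tol=1e-12):
    """Coordinatewise comparison up to floating-point error."""
    return all(abs(s - t) < tol for s, t in zip(a, b))


y = [1 + 1j, -2, 0.5j]

# Rotating each coordinate by its own phase does not change the image.
phases = [0.3, 1.7, -2.1]
rotated = [cmath.exp(1j * t) * c for t, c in zip(phases, y)]

# Rescaling all coordinates by one nonzero complex number gives the same projective point.
scaled = [(2 - 3j) * c for c in y]

print(close(moment_map(y), moment_map(rotated)))
print(close(moment_map(y), moment_map(scaled)))
print(close(moment_map([0.2, 0.3, 0.5]), [0.2, 0.3, 0.5]))
```

The last line illustrates that $\mu$ restricts to the identity on points that already lie in the probability simplex.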
A similar point of view appears in quantum physics; here $V = \mathbb{C}\{x : x \text{ a classical state}\}$ is the Hilbert space representing quantum state and the modified moment map

\[
\mu'(y) = \frac{1}{\sum_i |y_i|^2} \sum_i |y_i|^2 \, e_i
\]

defines the probability of observing a classical state (singleton event) [18].

One interpretation of this freedom is that it suggests there are circumstances where allowing probabilities to be negative and even complex in intermediate computations might be useful. This may seem odd, but it can be argued that negative probabilities are already implicitly employed [9]. For example, characteristic function methods implicitly write a density as a linear combination of basis functions whose ranges are not restricted to $\mathbb{R}_{\geq 0}$. Even if we are uncomfortable with such interpretations, the compactification and homogenization can simply be viewed as a convenient algebraic trick to make it easy to determine the relations among conditional probabilities we are ultimately interested in. Moreover, for most purposes $\mathbb{C}$ can be replaced with $\mathbb{R}$ [11] as the base field for our ring, and these relations are unchanged.

2.2 Homogeneous conditional probability

Analogously to the projective version of probability in Section 2.1, where we replaced the requirement that probabilities $p_1, \dots, p_m$ sum to one with viewing them as coordinates of a point in projective space, we now define a multihomogeneous version of Definition 2.1. Now, a conditional probability distribution is represented by a point in the product of projective spaces. This product has one $\mathbb{P}^{|I|-1}$ for each event $I \in E$ which is conditioned upon, and each factor space $\mathbb{P}^{|I|-1}$ is equipped with homogeneous coordinates $(p_{i_1|I} : \cdots : p_{i_{|I|}|I})$.

Definition 2.2.
A projective conditional probability distribution for $E$ is a point $p = ((p_{i_1|I} : \cdots : p_{i_{|I|}|I}), I \in E)$ inside $\prod_{I \in E} \mathbb{P}^{|I|-1}$ such that for all $J, K \in E$ and $i \in J \subset K$,

\[
\Big(\sum_{j \in J} p_{j|J}\Big) \, p_{i|K} = p_{i|J} \, \Big(\sum_{j \in J} p_{j|K}\Big).
\]

Definition 2.2 specifies the following ideal in the event algebra $\mathbb{C}[E]$:

\[
J_E = \Big\langle \Big(\sum_{j \in J} p_{j|J}\Big) \, p_{i|K} - p_{i|J} \, \Big(\sum_{j \in J} p_{j|K}\Big) : J, K \in E,\ i \in J \subset K \Big\rangle.
\]

This ideal consists of all polynomial relations that a point $P = (p_{i|I})$ in $\prod_{I \in E} \mathbb{P}^{|I|-1}$ must satisfy to be a projective conditional probability distribution. In particular, any honest conditional probability distribution must satisfy these. If we denote by $\{e_I : I \in E\}$ a basis of $\mathbb{Z}^{|E|}$, this ideal $J_E$ is multihomogeneous with respect to the grading $\deg(p_{i|I}) = e_I$ (see e.g. [16] for more on such gradings).

In what follows, it will be convenient to abbreviate $p_{J|J} := \sum_{j \in J} p_{j|J}$. Thus $p_{J|J}$ would be equal to 1 for honest distributions, by Definition 2.1, but here we regard it as a linear form in $\mathbb{C}[E]$. Let $\alpha_E$ denote the product $\prod_{i \in I \in E} p_{i|I}$ of all of the $\|E\|$ variables in $\mathbb{C}[E]$, and let $\beta_E$ denote the product $\prod_{I \in E} p_{I|I}$. The saturation $(I : f^\infty)$ of an ideal $I$ is the ideal generated by all polynomials $g$ such that $f^m g \in I$ for some $m$ [23]. Now we define the ideal $I_E$, when $[m] \in E$, by the saturation

\[
I_E := (J_E : (\alpha_E \beta_E)^\infty).
\]

When $[m] \notin E$, let $E' = E \cup \{[m]\}$ and set $I_E := I_{E'} \cap \mathbb{C}[E]$.
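The generators of $J_E$ from Definition 2.2 can be evaluated directly on a point. The sketch below assumes a strictly positive joint distribution, builds the conditionals from formula (2), and confirms that every generator vanishes; the names `conditionals` and `j_e_generators` are illustrative helpers, not notation from the paper.

```python
from fractions import Fraction


def conditionals(p, E):
    """Formula (2): map each pair (i, I) to p_i / sum_{j in I} p_j, for strictly positive p."""
    return {(i, I): p[i] / sum(p[j] for j in I) for I in E for i in I}


def j_e_generators(cond, E):
    """Values of (sum_{j in J} p_{j|J}) p_{i|K} - p_{i|J} (sum_{j in J} p_{j|K}) for i in J, J a proper subset of K."""
    values = []
    for J in E:
        for K in E:
            if J < K:  # proper subset test on frozensets
                p_JJ = sum(cond[(j, J)] for j in J)
                p_JK = sum(cond[(j, K)] for j in J)
                for i in J:
                    values.append(p_JJ * cond[(i, K)] - cond[(i, J)] * p_JK)
    return values


E = [frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({1, 2, 3, 4})]
p = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
cond = conditionals(p, E)
print(all(v == 0 for v in j_e_generators(cond, E)))

# Rescaling one block of homogeneous coordinates keeps every generator at zero.
block = frozenset({1, 2, 3})
rescaled = {key: (7 * v if key[1] == block else v) for key, v in cond.items()}
print(all(v == 0 for v in j_e_generators(rescaled, E)))

# An incompatible choice of p_{.|12} violates at least one generator.
broken = dict(cond)
broken[(1, frozenset({1, 2}))] = Fraction(1, 2)
broken[(2, frozenset({1, 2}))] = Fraction(1, 2)
print(any(v != 0 for v in j_e_generators(broken, E)))
```

The rescaling check reflects the multihomogeneity of $J_E$: each generator has degree one in the block of $J$ and degree one in the block of $K$.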
The purpose of saturation is to make sure the desired behavior occurs when some coordinates are zero; for example, it is necessary to move between the conditional independence ideals [11] generated by expressions $P(X = x, Y = y \mid Z = z) - P(X = x \mid Z = z) P(Y = y \mid Z = z)$ and by the cross product differences $P(x, y, z) P(x', y', z) - P(x, y', z) P(x', y, z)$ algebraically without assuming anything about the positivity of the probabilities in question.

In the next section, we describe a matrix $A_G$ such that $I_E$ arises as the toric ideal $I_{A_G}$ (Section 7). Our first main result will be a universal Gröbner basis for the toric ideal $I_E$. Gröbner bases, particularly universal Gröbner bases, have many algorithmic properties that make them a very complete description of an ideal. Cox, Little, and O'Shea [7] give an accessible overview; see also [23, 12].

3 A universal Gröbner basis for relations among conditional probabilities

A Bayes binomial in $\mathbb{C}[E]$ is a binomial relation of the form

\[
p_{i|K} \, p_{j|J} - p_{j|K} \, p_{i|J}
\]

for $i, j \in J \subseteq K$, with $J, K \in E$. Let $I_{\mathrm{Bayes}(E)}$ denote the ideal they generate. Bayes binomials get their name because they come from Bayes' rule; more explanation is given in Section 6.

Proposition 3.1. The ideal generated by the Bayes binomials contains $J_E$ and is contained in the saturation of $J_E$ by the probabilities that would sum to one (where again $\beta_E = \prod_{I \in E} p_{I|I}$):

\[
J_E \subseteq I_{\mathrm{Bayes}(E)} \subseteq (J_E : (\beta_E)^\infty)
\]

and in particular, $I_{\mathrm{Bayes}(E)} \subseteq I_E$.

Proof. The ideal $J_E$ is generated by the degree-2 polynomials $p_{J|J} \, p_{i|K} - p_{i|J} \, p_{J|K}$ for $J, K \in E$ and $i \in J \subseteq K$, where $p_{J|K} := \sum_{j \in J} p_{j|K}$.
For each $i, j \in J$, we have $a = p_{j|J} (p_{J|J} \, p_{i|K} - p_{i|J} \, p_{J|K})$ and $b = p_{i|J} (p_{J|J} \, p_{j|K} - p_{j|J} \, p_{J|K})$ in $J_E$, so $a - b = p_{J|J} (p_{j|J} \, p_{i|K} - p_{j|K} \, p_{i|J})$ is in $J_E$ and $I_{\mathrm{Bayes}(E)} \subseteq (J_E : (\beta_E)^\infty)$. For the first inclusion, if $p_{J|J} \, p_{i|K} - p_{i|J} \, p_{J|K}$ is a generator of $J_E$, we may write it as an element $\sum_{j \in J} (p_{i|K} \, p_{j|J} - p_{j|K} \, p_{i|J})$ of $I_{\mathrm{Bayes}(E)}$.

Our universal Gröbner basis of $I_E$ will be given combinatorially by the cycles of a labeled bipartite graph $G(E)$, defined as follows:

Vertices: one vertex $u_I$ for each $I \in E$ and one vertex $v_i$ for each $i \in \cup_{I \in E} I$
Edges: a directed edge $u_I \to v_i$ for each $I \in E$ and $i \in I$
Edge Labels: the edge $u_I \to v_i$ is labeled with the indeterminate $p_{i|I}$.

For example, with $n = 4$, the labeled graph $G$ for $E = \{\{1,2\}, \{1,2,3\}, \{1,2,3,4\}\}$ is shown in Figure 3. Each oriented cycle $C$ in the undirected version of $G$ defines a binomial $f_C$ as follows: each edge label is on the positive side of the binomial if its edge is directed with the cycle, and on the negative if against. For example, in the graph in Figure 3, consider the cycle $(1234, 3, 123, 1, 1234)$. The edges $p_3$ and $p_{1|123}$ are directed with the cycle and the edges $p_{3|123}$ and $p_1$ are directed against, so the corresponding binomial is $p_3 \, p_{1|123} - p_{3|123} \, p_1$.

[Figure 3: Bipartite graph for $E = \{\{1,2\}, \{1,2,3\}, \{1,2,3,4\}\}$.]

[Figure 4: Outer cycle of the bipartite graph for $E = \{\{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$.]
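The rule for reading a binomial off an oriented cycle of $G(E)$ can be sketched in a few lines. This is an illustrative sketch only: it assumes a cycle is given as a closed list alternating between set vertices (as `frozenset`) and element vertices (as `int`), and the helpers `cycle_binomial` and `evaluate` are hypothetical names.

```python
from fractions import Fraction


def cycle_binomial(cycle):
    """Split the edge labels of a closed walk into (positive, negative) sides.

    A step from a set vertex I to an element vertex i follows the edge u_I -> v_i,
    so its label p_{i|I} goes on the positive side; a step from i to I runs against
    the edge, so p_{i|I} goes on the negative side. Labels are pairs (i, I).
    """
    positive, negative = [], []
    for a, b in zip(cycle, cycle[1:]):
        if isinstance(a, frozenset):
            positive.append((b, a))
        else:
            negative.append((a, b))
    return positive, negative


def evaluate(binomial, p):
    """Evaluate the binomial at the conditionals p_{i|I} = p_i / sum_{j in I} p_j."""
    def monomial(labels):
        value = Fraction(1)
        for i, I in labels:
            value *= p[i] / sum(p[j] for j in I)
        return value

    positive, negative = binomial
    return monomial(positive) - monomial(negative)


S123, S1234 = frozenset({1, 2, 3}), frozenset({1, 2, 3, 4})
f_C = cycle_binomial([S1234, 3, S123, 1, S1234])
print(f_C)  # positive side: p_{3|1234}, p_{1|123}; negative side: p_{3|123}, p_{1|1234}

p = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
print(evaluate(f_C, p) == 0)
```

With $p_i$ written for $p_{i|1234}$, the first printed pair is the binomial $p_3 \, p_{1|123} - p_{3|123} \, p_1$ from the example in the text, and it vanishes on conditionals coming from a positive joint distribution.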
For a higher degree example, with $n = 3$ and $E = \{\{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$, we get $p_{1|12} \, p_{3|13} \, p_{2|23} - p_{2|12} \, p_{3|23} \, p_{1|13}$ from the outer cycle, as shown in Figure 4. A cycle is induced if it has no chord.

Theorem 3.2. The binomials defined by the cycles of $G(E)$ give a universal Gröbner basis for $I_E$. Moreover, $I_E$ is generated by the induced cycle binomials, though not necessarily as a Gröbner basis.

In order to prove Theorem 3.2, we first need to recall some facts about unimodular toric ideals, of which $I_E$ is an example. Unimodular matrices and unimodular toric ideals are defined and characterized as follows, following Sturmfels [23]. A triangulation of $A$ is a collection $F$ of subsets $B$ of the columns of $A$ such that $\{\mathrm{pos}(B) : B \in F\}$ is the set of cones in a simplicial fan with support $\mathrm{pos}(A)$. A triangulation of $A$ is unimodular if the normalized volume [23] is equal to one for all maximal simplices $B$ in the triangulation. The matrix $A$ is a unimodular matrix if all triangulations of $A$ are unimodular. We define a unimodular toric ideal in the following definition-proposition.

Proposition 3.3. [23] A toric ideal $I_A$ is called unimodular if any of the following equivalent conditions hold.
(i) Every reduced Gröbner basis of $I_A$ consists of squarefree binomials,
(ii) $A$ is a unimodular matrix,
(iii) all the initial ideals of $I_A$ are squarefree.

A special class of unimodular matrices are those coming from bipartite graphs [1, 22]. Let $G = (U, V, E)$ be a bipartite graph. In our case, $G(E)$ has

\[
U = \{u_I : I \in E\} \quad \text{and} \quad V = \{v_i : i \in \cup_{I \in E} I\}. \tag{3}
\]

Let $A$ be the vertex-edge incidence matrix of $G$: The rows of $A$ are labeled $u_1, \dots, u_{|U|}, v_1, \dots, v_{|V|}$, the columns are labeled with the edges, and $a_{ij}$ is 1 if vertex $i$ is in edge $j$ and zero otherwise.
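As a concrete illustration of the vertex-edge incidence matrix, the sketch below builds $A$ for $E = \{\{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$ and checks that the exponent vector of the outer-cycle binomial lies in its kernel. The function name `incidence_matrix` and the row and column orderings are assumptions made for this example.

```python
def incidence_matrix(E):
    """Vertex-edge incidence matrix of G(E).

    Rows: one per set vertex u_I (in the order given), then one per element vertex v_i.
    Columns: one per edge u_I -> v_i, recorded as the pair (i, I).
    """
    sets = [frozenset(I) for I in E]
    elements = sorted(set().union(*sets))
    edges = [(i, I) for I in sets for i in sorted(I)]
    rows = [("u", I) for I in sets] + [("v", i) for i in elements]
    A = [
        [1 if (kind == "u" and key == I) or (kind == "v" and key == i) else 0
         for (i, I) in edges]
        for (kind, key) in rows
    ]
    return A, rows, edges


E = [{1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
A, rows, edges = incidence_matrix(E)

# Exponent vector of p_{1|12} p_{3|13} p_{2|23} - p_{2|12} p_{3|23} p_{1|13}.
S12, S13, S23 = frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})
plus = {(1, S12), (3, S13), (2, S23)}
minus = {(2, S12), (3, S23), (1, S13)}
u = [1 if e in plus else -1 if e in minus else 0 for e in edges]

# Each edge touches exactly one set vertex and one element vertex.
column_sums = [sum(row[c] for row in A) for c in range(len(edges))]
A_times_u = [sum(a * x for a, x in zip(row, u)) for row in A]
print(column_sums)
print(A_times_u)
```

Since $A u = 0$, the vector $u$ lies in $\ker(\pi_A)$, which is what it means for the outer-cycle binomial to belong to the toric ideal $I_A$.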
For a cycle $C$ in the graph, the cycle binomial $f_C$ is defined (up to sign) as above. Let $\pi_A$ be the map $\mathbb{R}^{\|E\|} \to \mathbb{R}^{|U|+|V|}$ defined by applying $A$. We say $u \in \ker(\pi_A)$ is a circuit if $\mathrm{supp}(u)$ is minimal with respect to inclusion in $\ker(\pi_A)$ and the coordinates of $u$ are relatively prime [23]. Equivalently, a circuit is an irreducible binomial $x^{u^+} - x^{u^-}$ of the toric ideal $I_A$ with minimal support. The Graver basis of the ideal $I_A$ consists of all circuits. For $A$ from a bipartite graph, the circuits of $A$ are precisely the cycle binomials of the graph [21, 22]. Additionally, a Graver basis is also a universal Gröbner basis in the case of unimodular toric varieties (Proposition 8.11 of [23]). We summarize these results in the following proposition.

Proposition 3.4. The vertex-edge incidence matrix $A$ of a bipartite graph $G = (U, V, E)$ is unimodular, so $I_A$ is a unimodular toric ideal. The cycle binomials of $G$ are the circuits of $A$, and therefore define the Graver basis of $I_A$. In particular, they give a universal Gröbner basis for $I_A$.

Now we are able to prove our theorem.

Proof of Theorem 3.2. Let $A_{G(E)}$ be the vertex-edge incidence matrix of $G(E)$. By Proposition 3.4, its cycle binomials (circuits) give a universal Gröbner basis of $I_{A_{G(E)}}$. In fact, the induced cycles are enough to generate this ideal [1]. Suppose $C$ is a cycle and $e$ a chord, and split $C$ into two cycles $C_1$ and $C_2$, both containing $e$ (but in opposite directions). Associate cycle binomials $f_{C_1}$ and $f_{C_2}$, respectively. Then the S-polynomial (§7) with the $e$-containing terms leading is $f_C$. However, this is no longer necessarily a Gröbner basis. For example, let $E = \{\{1,2\}, \{2,3\}, \{1,2,3\}\}$ as in Figure 5.

[Figure 5: Bipartite graph for $E = \{\{1,2\}, \{2,3\}, \{1,2,3\}\}$.]

The outer cycle $C = 1 \to 12 \to 2 \to 23 \to 3 \to 123 \to 1$ gives the cycle binomial $f_C = p_{1|12} \, p_{2|23} \, p_{3|123} - p_{2|12} \, p_{3|23} \, p_{1|123}$. The cycle $C$ has a chord $2 - 123$, and the binomial $f_C$ lies in the ideal of the two binomials

\[
g_1 = p_{1|12} \, p_{2|123} - p_{2|12} \, p_{1|123} \quad \text{and} \quad g_2 = p_{2|23} \, p_{3|123} - p_{3|23} \, p_{2|123}
\]

after splitting along the chord; explicitly, $f_C = p_{3|23} \, g_1 + p_{1|12} \, g_2$. These are both the induced cycles of the graph. However, for a term order (§7) prioritizing $p_{2|123}$ (e.g. lexicographic with $p_{2|123} \succ \cdots$), the leading term of $f_C$ cannot lie in the initial ideal $\langle p_{1|12} \, p_{2|123},\ p_{3|23} \, p_{2|123} \rangle$ of the ideal generated by the chordal binomials.

Next we show that the graph ideal and conditional probability ideal coincide, $I_{A_{G(E)}} = I_E$. For the containment $I_{A_{G(E)}} \supseteq I_E$, first observe that $I_{\mathrm{Bayes}(E)} \subseteq I_{A_{G(E)}}$. This is because if $J, K \in E$ with $i, j \in J \subseteq K$, we have the subgraph in Figure 6, which is a cycle with associated cycle binomial $p_{j|J} \, p_{i|K} - p_{i|J} \, p_{j|K}$.

[Figure 6: Subgraph of $G(E)$ giving a Bayes binomial.]

Together with Proposition 3.1, we now have

\[
J_E \subseteq I_{\mathrm{Bayes}(E)} \subseteq I_{A_{G(E)}}
\]

so, since saturation is inclusion-preserving and $I_{A_{G(E)}}$ is prime,

\[
I_E = (J_E : (\alpha_E \beta_E)^\infty) \subseteq (I_{A_{G(E)}} : (\alpha_E \beta_E)^\infty) = I_{A_{G(E)}}.
\]

Now we show the reverse inclusion $I_{A_{G(E)}} \subseteq I_E$. Again by Proposition 3.1, we have $I_{\mathrm{Bayes}(E)} \subseteq I_E$. Now assume that $[m] \in E$, so that $p_1, \dots, p_m \in \mathbb{C}[E]$. We claim that in fact $I_{A_{G(E)}} \subseteq (I_{\mathrm{Bayes}(E)} : \prod_{i=1}^m p_i)$, from which the result will follow. Let $C$ be an induced cycle of $G(E)$, and $f_C$ its cycle binomial.
We must show that this cycle binomial can be obtained from the Bayes binomials, up to multiplication by $\prod_{i=1}^m p_i$. Let $C$ be the cycle $i_1 \leftarrow J_1 \to i_2 \leftarrow J_2 \to \cdots \to i_k \leftarrow J_k \to i_1$. With this notation we have $i_1, i_2 \in J_1$, $i_2, i_3 \in J_2$, $\dots$, $i_1, i_k \in J_k$. Then

\[
f_C = p_{i_2|J_1} \, p_{i_3|J_2} \cdots p_{i_k|J_{k-1}} \, p_{i_1|J_k} - p_{i_1|J_1} \, p_{i_2|J_2} \cdots p_{i_k|J_k}.
\]

We show the first monomial of $(\prod_{j=1}^k p_{i_j}) f_C$ is equal to the second mod $I_{\mathrm{Bayes}(E)}$. Pair off as follows:

\begin{align*}
& (p_{i_1} p_{i_2} p_{i_3} \cdots p_{i_k}) \, p_{i_2|J_1} p_{i_3|J_2} p_{i_4|J_3} \cdots p_{i_k|J_{k-1}} p_{i_1|J_k} && \text{Step 1} \\
={} & (p_{i_2} p_{i_2} p_{i_3} \cdots p_{i_k}) \, p_{i_1|J_1} p_{i_3|J_2} p_{i_4|J_3} \cdots p_{i_k|J_{k-1}} p_{i_1|J_k} && \text{Step 2} \\
={} & (p_{i_2} p_{i_3} p_{i_3} \cdots p_{i_k}) \, p_{i_1|J_1} p_{i_2|J_2} p_{i_4|J_3} \cdots p_{i_k|J_{k-1}} p_{i_1|J_k} && \text{Step 3} \\
& \vdots
\end{align*}

where the equalities hold mod $I_{\mathrm{Bayes}(E)}$. Continuing in this fashion, at step $k-1$ we have

\begin{align*}
={} & (p_{i_2} p_{i_3} \cdots p_{i_{k-1}} p_{i_{k-1}} p_{i_k}) \, p_{i_1|J_1} p_{i_2|J_2} \cdots p_{i_{k-2}|J_{k-2}} \, p_{i_k|J_{k-1}} \, p_{i_1|J_k} && \text{Step } k-1 \\
={} & (p_{i_2} p_{i_3} \cdots p_{i_{k-1}} p_{i_k} p_{i_k}) \, p_{i_1|J_1} p_{i_2|J_2} \cdots p_{i_{k-2}|J_{k-2}} \, p_{i_{k-1}|J_{k-1}} \, p_{i_1|J_k} && \text{Step } k \\
={} & (p_{i_2} p_{i_3} \cdots p_{i_{k-1}} p_{i_k} p_{i_1}) \, p_{i_1|J_1} p_{i_2|J_2} \cdots p_{i_{k-2}|J_{k-2}} \, p_{i_{k-1}|J_{k-1}} \, p_{i_k|J_k} && \text{Step } k+1
\end{align*}

as desired. In terms of $G(E)$, this amounts to breaking up a long cycle into 4-cycles passing through $[m]$, and erasing the overlaps among these cycles. Thus since the induced cycles generate $I_{A_{G(E)}}$, we have

\[
I_{A_{G(E)}} \subseteq \Big(I_{\mathrm{Bayes}(E)} : \prod_{i=1}^m p_i\Big) \subseteq I_E.
\]

This proves the result in the special case $[m] \in E$. In the general case, suppose we have some $E$ not containing $[m]$, enabling us to obtain relations among 'pure' conditional probabilities (i.e. excluding $p_1, \dots, p_m$). Let $E' = E \cup \{[m]\}$ and apply the special case of the Theorem.
Then by [23, Proposition 4.13(c)], since we have a universal Gröbner basis, we just intersect it with the smaller coordinate ring to obtain a universal Gröbner basis of the smaller ring. This corresponds here to removing the set $[m]$ from $E$ and taking the cycle binomials as our new Gröbner basis.

4 Conditional probability and the moment map

In this section we show how to recover and generalize some results of Matúš [15] using toric geometry. The main result we will expand upon maps the space of conditional probability distributions (Definition 2.1) for all possible conditioned events $E = \{I \subset [m] : |I| \geq 2\}$ onto the permutohedron by first projecting down to events of size 2, $E = \{I \subset [m] : |I| = 2\}$.

Theorem 4.1 (Matúš [15]). For $E = \{I \subset [m] : |I| \geq 2\}$ and $p$ a conditional probability distribution (Definition 2.1), the map $W : \mathbb{R}^{\|E\|} \to \mathbb{R}^m$, given by

\[
W_i(p) = \sum_{j \in [m] \setminus i} p_{i|ij},
\]

restricts to a homeomorphism of the space of conditional probabilities onto the $m-1$ dimensional permutohedron $P_{m-1}$.

Note that the linear map $W$ is the restriction of $A = A_{G(E)}$ to the rows labeled by the vertex set $V$ in $G$ (3) and to the columns labeled by two-event conditional probabilities (edges in $G(E)$) $p_{i|ij}$. In fact $A$ will in general define a map from the space of projective conditional probability distributions onto a generalized permutohedron $\Delta_E$ defined below.

First consider the multiprojective toric variety $Z_A$ cut out of $\prod_{I \in E} \mathbb{P}^{|I|-1}$ by the equations of Theorem 3.2, i.e. the space of projective conditional probability distributions. In Section 7 we recall the definition of the affine toric variety $X_A$ associated to an integer matrix $A$, and the projective toric variety $Y_A$ associated to a $\mathbb{Z}$-graded matrix $A$ (that is, a matrix $A$ such that $(1, 1, \dots, 1)$ lies in its row span).
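The map $W$ of Theorem 4.1 can be evaluated on a strictly positive joint distribution to see where its image lands. The sketch below assumes the standard description of the permutohedron with vertices the permutations of $(0, 1, \dots, m-1)$ by the equation $\sum_i x_i = \binom{m}{2}$ and the inequalities $\sum_{i \in S} x_i \geq \binom{|S|}{2}$; the helper names are illustrative, and only interior points are tested, not the boundary behavior covered by the theorem.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def W(p):
    """W_i(p) = sum over j != i of p_{i|ij}, with p_{i|ij} = p_i / (p_i + p_j) for positive p."""
    return {i: sum(p[i] / (p[i] + p[j]) for j in p if j != i) for i in p}


def in_permutohedron(x):
    """Check sum(x) = C(m,2) and sum over S of x >= C(|S|,2) for every nonempty subset S."""
    keys = list(x)
    m = len(keys)
    if sum(x.values()) != comb(m, 2):
        return False
    return all(
        sum(x[i] for i in S) >= comb(len(S), 2)
        for size in range(1, m + 1)
        for S in combinations(keys, size)
    )


p = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
image = W(p)
print(image)
print(in_permutohedron(image))

# A sharply ordered distribution lands close to a vertex, here near (0, 1, 2, 3).
steep = {1: Fraction(1), 2: Fraction(10**6), 3: Fraction(10**12), 4: Fraction(10**18)}
print([float(v) for v in W(steep).values()])
```

Each unordered pair $\{i, j\}$ contributes $p_{i|ij} + p_{j|ij} = 1$ to the coordinate sum, which is why the image always lies in the hyperplane $\sum_i x_i = \binom{m}{2}$.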
Given a matrix $A = A_{G(E)}$, the space of $E$-projective conditional probability distributions $Z_A$ is the closure of the image of the map $f_A : \theta \mapsto \theta^A$, viewed as an element of $\prod_{I \in E} \mathbb{P}^{|I|-1}$. Equipping this product space with multihomogeneous coordinates $((p_{i_1|I} : \cdots : p_{i_{|I|}|I}), I \in E)$, the variety $Z_A$ is cut out by the (multihomogeneous) toric ideal $I_A$. Suppose that we have $\cup_{I \in E} I = [m]$. Then because we view the points $((p_{i_1|I} : \cdots : p_{i_{|I|}|I}), I \in E)$ as elements of $\prod_{I \in E} \mathbb{P}^{|I|-1}$, the dimension of this variety is $m - 1$ as expected, though the rank of $A$ is larger.

We now develop a version of the moment map of toric geometry applicable to the variety of projective conditional probability distributions. Hereafter we index the columns of $A$ by the conditional probability they represent, i.e. $A = (a_{\cdot i|I} : i \in I \in E)$. We will require a multigraded notion to play the role of the convex hull $\mathrm{conv}(A)$ in the moment map. We define

\[
\mathrm{mconv}(A) = \Big\{ \sum_{I \in E} \sum_{j \in I} \lambda_{j|I} \, a_{\cdot j|I} : \lambda_{j|I} \in \mathbb{R}_{\geq 0},\ \sum_{j \in I} \lambda_{j|I} = 1 \text{ for each } I \in E \Big\}.
\]

A function $w : 2^{[n]} \to \mathbb{R}$ is called submodular if $w(I) + w(J) \geq w(I \cap J) + w(I \cup J)$ for $I, J \subseteq [n]$. Each subset $I$ of $[m]$ defines a submodular function $w_I$ on $2^{[n]}$ by setting $w_I(J) = 1$ if $I \cap J$ is non-empty and $w_I(J) = 0$ if $I \cap J$ is empty for $J \in 2^{[n]}$. The function $w$ defines a convex polytope $Q_w$ of dimension $\leq n - 1$ as follows:

\[
Q_w := \Big\{ x \in \mathbb{R}^n : x_1 + x_2 + \cdots + x_n = w([n]) \text{ and } \sum_{i \in J} x_i \leq w(J) \text{ for all } \emptyset \neq J \subseteq [n] \Big\}
\]

Thus the polytope corresponding to a subset $I$ is the simplex $\Delta_I = \mathrm{conv}\{e_k : k \in I\}$. Now consider an arbitrary subset $E = \{I_1, I_2, \dots, I_r\}$ of $2^{[m]}$. It defines the submodular function $w_E = w_{I_1} + w_{I_2} + \cdots + w_{I_r}$. The corresponding polytope $Q_{w_E}$ is now the Minkowski sum [24]

\[
\Delta_E = \Delta_{I_1} + \Delta_{I_2} + \cdots + \Delta_{I_r}. \tag{4}
\]

Proposition 4.2.
The projection of $\mathrm{mconv}(A_{G(E)})$ to the $V$-coordinates (3) is $\Delta_E$.

Proof. The mconv construction is equivalent to translating each simplex that is the convex hull of each set of vectors $A_I \subset A$ by setting its $U$-coordinates (3) all to 1, then taking the Minkowski sum.

Next is a version of Theorem 7.1 for varieties $Z_A$. Note that $|V| = m$ when $\cup_{I \in E} I = [m]$. Now we have a separate partition function for each conditioned-upon set.

Theorem 4.3. For $A = A_{G(E)}$, the map $\nu : Z_A \to \mathbb{R}^{|V|}$ defined by

\[
\nu(z) = \sum_{I \in E} \frac{1}{Z_I(z)} \sum_{i \in I} |z_{i|I}| \, a_{\cdot i|I},
\]

where $Z_I = \sum_{i \in I} |z_{i|I}|$, maps $Z_A$ onto $\mathrm{mconv}(A)$, and is a bijection on $Z_{A, \geq 0}$.

Proof. The map $\nu$ is the composition of two maps. The first map, $\nu_1 : Z_A \to \prod_{I \in E} \Delta_I$, is a product of maps $\mu_1$ corresponding to each submatrix $A_I$ as in the proof of Theorem 7.1. It sends a point $((z_{i_1|I}, \dots, z_{i_{|I|}|I}), I \in E) \in Z_A$ to the point $p = (p_{i|I} = \frac{1}{Z_I(z)} |z_{i|I}| : i \in I \in E)$ in the product of simplices $\prod_{I \in E} \Delta_I$, which can be thought of as possibly redundant barycentric coordinates. The second map, $\nu_2$, corresponds to the Minkowski sum, with $\nu_2 : \prod_{I \in E} \Delta_I \to \mathrm{mconv}(A)$ sending $p$ to $Ap$. Whereas in the simplex case (and for a single $A_I$) in Theorem 7.1, $\mu_1$ and $\mu_2$ are identities, here there is additional ambiguity introduced by the Minkowski sum. In particular, let $b \in \Delta_E$ (4). Then the preimage of $b$ in $\prod_{I \in E} \Delta_I$ is

\[
P_A(b) = \{p : Ap = b\} \cap \prod_{I \in E} \Delta_I,
\]

and in general consists of a polytope. This is illustrated in Figure 7, where the polytope $P_A(b)$ is the set of pairs of points in the first and second simplex that add to $b$.
Analogously to the one-factor case (Theorem 7.1), we will choose among the points of this fiber by selecting the maximum entropy point (or the point closest in the KL-divergence sense to the point representing a uniform distribution in all simplices). The resulting space of solutions (the space of conditional probability distributions) is illustrated in Figure 8. Setting $D(p) = D(p \,\|\, p_{\mathrm{uniform}})$ so

\[
D(p) = \sum_{i \in I \in E} p_{i|I} \log p_{i|I} - \sum_{i \in I \in E} p_{i|I} \log\Big(\frac{1}{|I|}\Big),
\]

the Hessian of $D$ is $\frac{1}{p_{i|I}}$ on the diagonal and zero elsewhere. Thus it is positive definite on the interior of $\prod_{I \in E} \Delta_I$, and on points of the relative interior after restricting to nonzero coordinates. Thus $D$ has a unique minimum $p^*$ on the fiber $P_A(b) \subseteq \prod_{I \in E} \Delta_I$. Were there another minimum, the (possibly restricted) Hessian would be positive definite on the open segment connecting it with $p^*$.

We now argue that $p^* \in Z_A$. First suppose $p^* \in (\prod_{I \in E} \Delta_I)^\circ$, so that $0 < p_{i|I} < 1$ in all coordinates, and let $u \in \ker A$. We must show that $p^{u^+} = p^{u^-}$. For small $t$, $p^* + tu \in \prod_{I \in E} \Delta_I$ and

\begin{align*}
D(p^* + tu) &= \sum_{i \in I \in E} (p_{i|I} + t u_{i|I}) \log(p_{i|I} + t u_{i|I}) - \sum_{i \in I \in E} (p_{i|I} + t u_{i|I}) \log \frac{1}{|I|} \\
\frac{dD}{dt} &= \sum_{i \in I \in E} u_{i|I} \log(p_{i|I} + t u_{i|I}) + \sum_{i \in I \in E} u_{i|I} - \sum_{i \in I \in E} u_{i|I} \log \frac{1}{|I|}.
\end{align*}

Since $A$ is $E$-multigraded, the last two terms of $\frac{dD}{dt}$ are zero (i.e. $(1, 1, \dots, 1) \in \mathbb{R}^{\|E\|}$ is in the row space of $A$, and $(1, 1, \dots, 1) \in \mathbb{R}^{|I|}$ is in the row space of each $A_I$). At $t = 0$, the first order condition implies that

\[
0 = \frac{dD}{dt} = \sum_{i \in I \in E} u_{i|I} \log p_{i|I}.
\]

Grouping the sum by the sign of $u_{i|I}$ and changing to exponential notation,

\[
p^{u^+} = p^{u^-} \tag{5}
\]

as desired. Now suppose that $p^*$ lies on the boundary of $\prod_{I \in E} \Delta_I$.
If the zeros of $p$ lie outside $\operatorname{supp}(u)$, the argument made above for $p^*$ in the interior holds after extending $D$ with the limit $p \log p \to 0$ as $p \to 0$. If there are zeros on both sides of (5), i.e. $p_{i|I} = 0 = p_{j|J}$ for indices $i|I \in \operatorname{supp}(u^+)$ and $j|J \in \operatorname{supp}(u^-)$, then the relation holds with $0 = 0$.

We may assume $p_{i|I} = 0$ for some index $i|I \in \operatorname{supp}(u^+)$ in considering the two remaining cases. The first case has $p_{j|J} = 1$ for some index $j|J \in \operatorname{supp}(u^+)$. Because of the multigrading of $A$, which requires for any $J \in E$ and $u \in \ker A$ that $\sum_{j \in J} u_{j|J} = 0$, there must exist an index $k|J \in \operatorname{supp}(u^-)$. Then since $p \in \prod_{I \in E} \Delta_I$, we have $p_{k|J} = 0$ and the relation (5) holds as $0 = 0$.

The second case has $0 \leq p_{j|J} < 1$ for all $j|J \in \operatorname{supp}(u^+)$ and $0 < p_{k|K} \leq 1$ for all $k|K \in \operatorname{supp}(u^-)$. Then for small $t > 0$, $p^* + tu \in P_A(b)$, and we have
$$
\frac{dD}{dt} = \sum_{\{ i|I \,:\, p_{i|I} = 0 \}} u_{i|I} \log (p_{i|I} + t u_{i|I}) + \sum_{\{ j|J \,:\, p_{j|J} \neq 0 \}} u_{j|J} \log (p_{j|J} + t u_{j|J}). \tag{6}
$$
The first term on the right-hand side of (6) approaches negative infinity as $t \to 0$ while the second approaches a constant; this contradicts the optimality of $p^*$, so this case cannot arise.

Figure 7: Ambiguity arising from the Minkowski sum of simplices: two points appearing in the fiber over $b$ in $\prod_{I \in E} \Delta_I$. For any point on the dotted line, there is a point in the second simplex such that their sum is $b$. We choose $\times$ among these points by maximizing entropy in the conditional probability distribution. See Figure 8 for the space of solutions.

We now give a couple of examples.
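As a numerical companion to the proof of Theorem 4.3, the first-order condition leading to (5) can be checked on a toy case. The following is a minimal sketch under stated assumptions, not the paper's own computation: the choice $E = \{12, 123\}$ on three singleton events, the matrix layout (patterned on Example 4.4), the starting point `p0`, and the function name `max_entropy_on_fiber` are all introduced here for illustration. Here $\ker A$ is spanned by a single vector $u$, so the fiber through `p0` is a segment and the minimizer of $D$ can be found by bisection on $dD/dt$.

```python
import math

# Columns of A, in order: p_{1|123}, p_{2|123}, p_{3|123}, p_{1|12}, p_{2|12}.
# Rows: singleton events 1, 2, 3, then the conditioned-upon events 12 and 123.
A = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
]

# Generator of ker A for this toy matrix (assumed example, checked below).
u = [1, -1, 0, -1, 1]
assert all(sum(a * b for a, b in zip(row, u)) == 0 for row in A)


def max_entropy_on_fiber(p0, u, iterations=200):
    """Minimize sum_k p_k log p_k along the segment p0 + t*u by bisection.

    Only the term sum_k u_k log(p_k + t u_k) of dD/dt is used; the other
    two terms vanish because u sums to zero within each event.
    """
    # Open interval of t keeping every coordinate with u_k != 0 positive.
    lo = max(-p / uk for p, uk in zip(p0, u) if uk > 0)
    hi = min(-p / uk for p, uk in zip(p0, u) if uk < 0)

    def slope(t):
        return sum(uk * math.log(p + t * uk) for p, uk in zip(p0, u) if uk != 0)

    for _ in range(iterations):
        mid = (lo + hi) / 2
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    return [p + t * uk for p, uk in zip(p0, u)]


# An arbitrary point of the product of simplices (illustrative values).
p0 = [0.5, 0.2, 0.3, 0.3, 0.7]
p_star = max_entropy_on_fiber(p0, u)

# Relation (5) for this u: p_{1|123} p_{2|12} = p_{2|123} p_{1|12}.
print(abs(p_star[0] * p_star[4] - p_star[1] * p_star[3]) < 1e-9)
```

In this toy case the binomial is a Bayes-type relation, so the selected point also satisfies $p_{1|12} = p_{1|123} / (p_{1|123} + p_{2|123})$, which is the consistency one expects of a genuine conditional probability distribution.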
Figure 8: The space of conditional probability distributions is the blow-up of $\mathbb{P}^2$ at the point $p_2 = p_3 = 0$ of Figure 2, intersected with a triangular prism. In general and in higher dimensions, blow-ups are along the conditioned-upon faces. $E$ has homogeneous coordinates $(p_{2|23} : p_{3|23})$ and the triangle has homogeneous coordinates $(p_1 : p_2 : p_3)$, with corners $(1 : 0 : 0)$ and $(0 : 1 : 0)$ marked.

Example 4.4. For the case $m = 3$ with $E = \{12, 13, 23, 123\}$, the matrix $A$ is
$$
\begin{array}{r|ccccccccc}
 & p_1 & p_2 & p_3 & p_{1|12} & p_{2|12} & p_{1|13} & p_{3|13} & p_{2|23} & p_{3|23} \\
\hline
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
2 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
3 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
12 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
13 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
23 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
123 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
$$
The $U$-coordinate rows are labeled $1, 2, 3$ and the $V$-coordinate rows are labeled $12, 13, 23, 123$. The polytope $\operatorname{mconv}(A)$ is the permutohedron which is the convex hull of the permutations of $(3, 1, 0)$, shown in Figure 9. Letting $A'$ be the last six columns of $A$ (the restriction to $\{ I \subseteq [n] : |I| = 2 \}$), $\operatorname{mconv}(A')$ is the regular permutohedron $\operatorname{conv}((2,1,0), (2,0,1), (1,0,2), (1,2,0), (0,2,1), (0,1,2))$, lifted with the last four coordinates all 1. This is illustrated in Figure 10. The theorem of Matúš (Theorem 4.1) works in this way by projecting first from $E = \{ I : |I| \geq 2 \}$ to $E = \{ I : |I| = 2 \}$ as in Figure 10.
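The Minkowski-sum description in Example 4.4 can be checked by brute force. The sketch below is an illustration under stated assumptions, not code from the paper: it tracks only the three singleton coordinates (the event coordinates are constant), the helper names `minkowski_sum_points` and `is_majorized_by` are hypothetical, and the comparison with the permutohedron relies on Rado's theorem, which says that a point lies in the convex hull of the permutations of $y$ exactly when it is majorized by $y$.

```python
from itertools import permutations, product


def minkowski_sum_points(events, m):
    """All sums of one unit vector e_i per event I (with i in I), as tuples."""
    points = set()
    for choice in product(*events):
        point = [0] * m
        for i in choice:
            point[i - 1] += 1  # events use 1-based labels
        points.add(tuple(point))
    return points


def is_majorized_by(x, y):
    """True if x is majorized by y: equal totals, dominated partial sums."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if sum(xs) != sum(ys):
        return False
    partial_x = partial_y = 0
    for a, b in zip(xs, ys):
        partial_x += a
        partial_y += b
        if partial_x > partial_y:
            return False
    return True


# E = {12, 13, 23, 123}: every vertex sum lies in the permutohedron of
# (3, 1, 0), and all six permutations of (3, 1, 0) occur among the sums.
full_points = minkowski_sum_points([(1, 2), (1, 3), (2, 3), (1, 2, 3)], 3)
assert set(permutations((3, 1, 0))) <= full_points
assert all(is_majorized_by(x, (3, 1, 0)) for x in full_points)

# Pairs only, E = {12, 13, 23}: the same check against (2, 1, 0).
pair_points = minkowski_sum_points([(1, 2), (1, 3), (2, 3)], 3)
assert set(permutations((2, 1, 0))) <= pair_points
assert all(is_majorized_by(x, (2, 1, 0)) for x in pair_points)

print(len(full_points), len(pair_points))  # 12 7
```

Since the Minkowski sum of the simplices is the convex hull of these vertex sums, the two assertions per case together show that the hull coincides with the convex hull of the permutations of $(3,1,0)$, respectively $(2,1,0)$.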
The theorem of Matúš may thus be understood as saying that, instead of using all simplices, one can obtain a regular permutohedron merely as the zonotope given by the Minkowski sum of the 1-simplices.

Figure 9: Multigraded convex hull of $A$ for $n = 3$ and $E = \{ I \subseteq [n] : |I| \geq 2 \}$. The last four coordinates, not shown, are all 1. Labeled points: 301, 310, 130, 031, 013, 103, 220, 202, 022, 211, 121, 112.

Figure 10: Multigraded convex hull of $A$ for $n = 3$ and $E = \{ I \subseteq [n] : |I| = 2 \}$. The last four coordinates, not shown, are all 1. Labeled points: 201, 210, 120, 021, 012, 102.

5 Partially observed discrete random variables

Let $X_1, \dots, X_n$ be discrete random variables with $X_i$ taking values $x_i^1, \dots, x_i^{d_i}$. Then the $m = \prod_{i=1}^n d_i$ singleton events in $\Omega$ are the elements of the Cartesian product of the sets of states which each random variable may assume. For a subset of random variables $X_{i_1}, \dots, X_{i_k}$ with $S := \{ i_1, \dots, i_k \} \subseteq [n]$, we write $\Omega_S$ for the Cartesian product of the states of this subset of the random variables. We also denote by $x|_S$ the restriction of some global state $x \in \Omega$ to the states of the random variables in $S$. Then the set of events $E$ has the form
$$
E = \bigl\{ \{ x' \in \Omega : x'|_S = x_S \} \;:\; S \subseteq [n],\ x_S \in \Omega_S \bigr\}. \tag{7}
$$
Let $E(x_S)$ denote the event which is the union of all singleton events with random variables $S$ in state $x_S$. For example, let $n = 4$, $d_i = 2$ with states denoted 0 and 1, and $S = \{1, 3\}$. Then $E(x_1^0 x_3^1) = \{ 0010, 0011, 0110, 0111 \}$, which corresponds to a 2-face of the 4-cube.
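The event $E(x_S)$ is easy to enumerate directly. The following is a small sketch, assuming four binary variables so that global states are the 4-bit strings listed in the example; the function name `event` and its dictionary argument are hypothetical conveniences, not notation from the paper.

```python
from itertools import product


def event(n, assignment, states=(0, 1)):
    """Return E(x_S) as a list of strings.

    `assignment` maps a 1-based variable index in S to its observed state;
    the result lists every global state agreeing with it on S.
    """
    return [
        "".join(str(value) for value in x)
        for x in product(states, repeat=n)
        if all(x[i - 1] == s for i, s in assignment.items())
    ]


# X_1 observed in state 0 and X_3 in state 1; X_2 and X_4 are free.
face = event(4, {1: 0, 3: 1})
print(face)  # ['0010', '0011', '0110', '0111']
```

Two variables remain free, so the event has $2^2 = 4$ elements, matching the description of a 2-face of the 4-cube.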
We may now write, in the more usual notation,
$$
p_{x_A | x_B} := p_{E(x_A) \cap E(x_B) \,|\, E(x_B)},
$$
which is convenient for considering, say, the conditional probability of having a disease given a positive test result. Besag's relation (1) among positive conditional probabilities is written this way:
$$
\frac{P(x)}{P(y)} = \prod_{i=1}^{n} \frac{P(x_i \mid x_1, \dots, x_{i-1}, y_{i+1}, \dots, y_n)}{P(y_i \mid x_1, \dots, x_{i-1}, y_{i+1}, \dots, y_n)}. \tag{8}
$$
This is a special case of the relations derived in Theorem 3.2, as we now explain. Denote the event $(x_1, \dots, x_{j-1}, y_j, \dots, y_n)$ by $j$, so the singleton events are $(y_1, \dots, y_n) = 1, 2, \dots, n+1 = (x_1, \dots, x_n)$. The set $E$ consists of the event $\{1, \dots, n+1\}$ together with the events $\{j, j+1\}$ for $j = 1, \dots, n$. Then the cleared-denominator version of (1) is the outer cycle
$$
[n+1] \to 1 \leftarrow 12 \to 2 \leftarrow \cdots \leftarrow n,n+1 \to n+1 \leftarrow [n+1]
$$
in the graph $G_E$. For example, with three variables we have events $1 = (y_1, y_2, y_3)$, $2 = (x_1, y_2, y_3)$, $3 = (x_1, x_2, y_3)$, and $4 = (x_1, x_2, x_3)$. The relation (1) is
$$
\frac{p_4}{p_1} = \frac{p_{2|12}\, p_{3|23}\, p_{4|34}}{p_{1|12}\, p_{2|23}\, p_{3|34}},
$$
corresponding to the cycle binomial
$$
p_1\, p_{2|12}\, p_{3|23}\, p_{4|34} - p_4\, p_{1|12}\, p_{2|23}\, p_{3|34},
$$
which is $f_C$ for the outer cycle $C$ of the graph in Figure 11.

Figure 11: Bipartite graph for $E = \{ \{1,2\}, \{2,3\}, \{3,4\}, \{1,2,3,4\} \}$. The event nodes are 1234, 12, 23, 34 and the singleton nodes are 1, 2, 3, 4; the edges are labeled $p_1, p_2, p_3, p_4, p_{1|12}, p_{2|12}, p_{2|23}, p_{3|23}, p_{3|34}, p_{4|34}$.

6 Bayes' rule

Because of the Bayes binomials, on points which are projective conditional probability distributions, we have, with $i, j \in J \subseteq K \subseteq [m]$,
$$
p_{i|K}\, p_{j|J} = p_{j|K}\, p_{i|J}.
$$
This implies, by summing over $j \in J$, that
$$
p_{i|K}\, p_{J|J} = p_{J|K}\, p_{i|J}. \tag{9}
$$
Using two copies of (9) with different intermediate sets $J_1$ and $J_2$, we have
$$
(p_{i|J_1}\, p_{J_1|K})\, p_{J_2|J_2} = p_{i|K}\, p_{J_1|J_1}\, p_{J_2|J_2} = (p_{i|J_2}\, p_{J_2|K})\, p_{J_1|J_1},
$$
which gives a multihomogeneous version of Bayes' rule. Because we consider the point representing a projective conditional probability distribution as a point $((p_{i_1|I} : \cdots : p_{i_{|I|}|I}),\ I \in E)$ of a product of projective spaces, we may set $p_{J_1|J_1}$ and $p_{J_2|J_2}$ to 1 on an open set containing all probabilistically relevant points. Summing over $i \in I$, this becomes
$$
p_{I|J_1}\, p_{J_1|K} = p_{I|J_2}\, p_{J_2|K},
$$
or, when $p_{J_1|K} \neq 0$,
$$
p_{I|J_1} = \frac{p_{I|J_2}\, p_{J_2|K}}{p_{J_1|K}},
$$
so that in particular, with $A, B \subseteq [m]$, and setting $I = A \cap B$, $J_1 = B$, $J_2 = A$, and $K = [m]$, we have the familiar expression for Bayes' rule
$$
p_{A \cap B | B} = \frac{p_{A \cap B | A}\, p_A}{p_B}.
$$

7 Appendix: Toric ideals and toric varieties

Here we collect some needed facts about toric ideals and toric varieties, based primarily on Sturmfels' book [23], also referring to [6, 8, 10, 16, 19].

7.1 Affine toric varieties

Let $A$ be a $d \times m$ integer matrix, with columns $a_{\cdot 1}, \dots, a_{\cdot m}$. Let $\mathbb{C}[x_1, \dots, x_m]$ be a polynomial ring in $m$ variables, and for $u \in \mathbb{Z}^m$ let $x^u = \prod_{j=1}^m x_j^{u_j}$. The matrix $A$ defines a toric ideal
$$
I_A = \langle x^{u^+} - x^{u^-} : u \in \ker A \cap \mathbb{Z}^m \rangle,
$$
where $u^+$ is the positive part of $u$ and $u^-$ the negative part. The toric ideal $I_A$ is a prime ideal. A minimal set of binomials which generates $I_A$ is said to be a Markov basis for the matrix $A$.

A term order is a total order $\succ$ on the monomials of a polynomial ring such that 1 is the unique minimal element and $m_1 \succ m_2$ implies $m_3 m_1 \succ m_3 m_2$ for any monomials $m_1, m_2, m_3$. This order defines the initial monomial of any polynomial, and the initial ideal $\operatorname{in}_\succ(I)$ of an ideal $I$ is generated by the initial monomials $\operatorname{in}_\succ(f)$ for all $f \in I$. A Gröbner basis $\{ f_1, \dots, f_k \}$ for an ideal $I$ with respect to a term order $\succ$ has $\operatorname{in}_\succ(I) = \langle \operatorname{in}_\succ(f_1), \dots, \operatorname{in}_\succ(f_k) \rangle$. A Gröbner basis is universal if it is a Gröbner basis for all term orders $\succ$. For polynomials $f$ and $g$ and term order $\succ$, let $m(f, g)$ be the least common multiple of their leading monomials, and let $f_0, g_0$ be their leading terms. Then their $S$-polynomial is
$$
\frac{m(f,g)}{f_0}\, f - \frac{m(f,g)}{g_0}\, g,
$$
and is used in Buchberger's algorithm.

In the affine space $\mathbb{C}^m$ with coordinates $x_1, \dots, x_m$, the ideal $I_A$ cuts out the affine toric variety $X_A$. The $\mathbb{R}_{\geq 0}$-span of the columns of $A$ defines a cone $\operatorname{pos}(A)$, and the $\mathbb{N}$-span defines a semigroup $\mathbb{N}A$. The corresponding semigroup ring $\mathbb{C}[\mathbb{N}A]$ is isomorphic to the affine coordinate ring $\mathbb{C}[x_1, \dots, x_m]/I_A$, i.e. $X_A \cong \operatorname{Spec}(\mathbb{C}[x_1, \dots, x_m]/I_A) \cong \operatorname{Spec} \mathbb{C}[\mathbb{N}A]$. Such varieties are not always normal. The matrix $A$ defines a map $f_A : \theta \mapsto \theta^A$ from the $d$-dimensional torus $T^d$ to the toric variety $X_A$. This gives an explicit torus action and torus embedding. The closure of the image of $f_A$ is $X_A$. This is also the parameterization map of an exponential family.

7.2 Polytopes and projective toric varieties

Let $\operatorname{conv}(A)$ be the convex hull of the columns of $A$. This is a polytope. Let $Y_A$ be the projective toric variety defined by taking the closure of the image of $f_A$, and viewing $x_1, \dots, x_m$ as homogeneous coordinates. The corresponding homogeneous toric ideal is the ideal
$$
J_A = \langle x^{u^+} - x^{u^-} : u \in \ker A \cap \mathbb{Z}^m,\ \| u^+ \|_1 = \| u^- \|_1 \rangle. \tag{10}
$$
The affine cone over $Y_A$ is the toric variety $X_{A'}$, where $A'$ is $A$ with a row of ones added at the bottom unless the vector of all ones already lies in $\operatorname{rowspan}(A)$. This induces homogeneity with respect to the $\mathbb{Z}$-grading. When $A$ has $(1, 1, \dots, 1)$ in its row span (e.g. by having equal column sums or $(1, 1, \dots, 1)$ as a row), we say it is $\mathbb{Z}$-graded and the norm restriction in (10) is not required. Instead of $(1, 1, \dots, 1)$, we can use another grading of the columns of $A$ to obtain multihomogeneous ideals.

7.3 The moment map

The moment map sends a projective toric variety $Y_A$ onto its polytope $\operatorname{conv}(A)$, bijectively on the nonnegative part of the variety. Theorem 4.3 is a version of this result for toric varieties in a product of projective spaces.

Theorem 7.1. Let $A$ be a $d \times m$, $\mathbb{Z}$-graded matrix, and $Y_A$ the corresponding projective toric variety. Then the map $\mu : Y_A \to \operatorname{conv}(A)$, given by
$$
\mu(y) = \frac{1}{Z(y)} \sum_j \lvert y_j \rvert \, a_{\cdot j},
$$
where $Z(y) = \sum_j \lvert y_j \rvert$, is a bijection from $Y_{A, \geq 0}$ onto $\operatorname{conv}(A)$. If further $\operatorname{rank}(A) = d$, with $f_A$ the torus embedding, then $\mu \circ f_A$ is a homeomorphism $\mathbb{R}^d_{>0} \to \operatorname{conv}(A)^{\circ}$.

The result is standard; a proof can be found in [23, 10, 8], and in statistics it goes by the name Birch's theorem.

References

[1] S. Aoki and A. Takemura. Markov chain Monte Carlo exact tests for incomplete two-way contingency tables. Journal of Statistical Computation and Simulation, 75(10):787–812, 2005.
[2] B. Arnold, E. Castillo, and J.M. Sarabia. Conditional Specification of Statistical Models. Springer, 1999.
[3] J. Besag. Nearest-neighbour systems and the auto-logistic model for binary data. Journal of the Royal Statistical Society, B 34(1):75–83, 1972.
[4] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36(2):192–236, 1974.
[5] P. Billingsley. Probability and Measure. Wiley, 1995.
[6] D. Cox. Toric varieties and toric resolutions. In H. Hauser, J. Lipman, F. Oort, and A. Quirós, editors, Resolutions of Singularities, pages 259–284. Birkhäuser, Basel-Boston-Berlin, 2000.
[7] D. Cox, J. Little, and D. O'Shea.
Ideals, Varieties, and Algorithms. Undergraduate Texts in Mathematics. Springer, New York, second edition, 1997. An introduction to computational algebraic geometry and commutative algebra.
[8] G. Ewald. Combinatorial Convexity and Algebraic Geometry. Springer, 1996.
[9] R.P. Feynman. Negative probability. In B.J. Hiley and F.D. Peat, editors, Quantum Implications: Essays in honour of David Bohm, chapter 13, pages 235–248. Routledge & Kegan Paul, 1987.
[10] W. Fulton. Introduction to Toric Varieties. Princeton University Press, 1993.
[11] D. Geiger, C. Meek, and B. Sturmfels. On the toric algebra of graphical models. Annals of Statistics, 34:1463–1492, 2006.
[12] G.M. Greuel and G. Pfister. A Singular Introduction to Commutative Algebra. Springer, Berlin and Heidelberg, 2002.
[13] J. Harris. Algebraic Geometry, A First Course. Springer-Verlag, 1995.
[14] R. Hemmecke, J. Morton, A. Shiu, B. Sturmfels, and O. Wienand. Three counterexamples on semigraphoids. Combinatorics, Probability, and Computing, 17(2):239–257, 2008.
[15] F. Matúš. Conditional probabilities and permutahedron. Annales de l'Institut H. Poincaré, Probabilités et Statistiques, 39:687–701, 2003.
[16] E. Miller and B. Sturmfels. Combinatorial Commutative Algebra, volume 227 of Graduate Texts in Mathematics. Springer, New York, 2004.
[17] J. Morton, A. Shiu, L. Pachter, B. Sturmfels, and O. Wienand. Convex rank tests and semigraphoids. Preprint, 2008. math.CO/0702564.
[18] M.A. Nielsen and I.L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.
[19] L. Pachter and B. Sturmfels, editors. Algebraic Statistics for Computational Biology. Cambridge University Press, 2005.
[20] A. Postnikov. Permutohedra, associahedra, and beyond. Preprint, 2005. math/0507163.
[21] A. Schrijver. Theory of Integer and Linear Programming. John Wiley & Sons, New York, 1998.
[22] A. Slavkovic and S. Sullivant. The space of compatible full conditionals is a unimodular toric variety. Journal of Symbolic Computation, 41:196–209, 2006.
[23] B. Sturmfels. Gröbner Bases and Convex Polytopes. American Mathematical Society, Providence, 1996.
[24] G.M. Ziegler. Lectures on Polytopes, volume 152 of Graduate Texts in Mathematics. Springer, New York, 1995.
