Approximate Counting of Graphical Models Via MCMC Revisited

Jose M. Peña
ADIT, IDA, Linköping University, Sweden
jose.m.pena@liu.se

Abstract. In [6], MCMC sampling is applied to approximately calculate the ratio of essential graphs (EGs) to directed acyclic graphs (DAGs) for up to 20 nodes. In the present paper, we extend that work from 20 to 31 nodes. We also extend that work by computing the approximate ratio of connected EGs to connected DAGs, of connected EGs to EGs, and of connected DAGs to DAGs. Furthermore, we prove that the latter ratio is asymptotically 1. We also discuss the implications of these results for learning DAGs from data.

Keywords: Bayesian networks, Markov equivalence, MCMC

1 Introduction

Probably the most common approach to learning directed acyclic graph (DAG) models (see footnote 1) from data, also known as Bayesian network models, is that of performing a search in the space of either DAGs or DAG models. In the latter case, DAG models are typically represented as essential graphs (EGs). Knowing the ratio of EGs to DAGs for a given number of nodes is a valuable piece of information when deciding which space to search. For instance, if the ratio is low, then one may prefer to search the space of EGs rather than the space of DAGs, though the latter is usually considered easier to traverse. Unfortunately, while the number of DAGs can be computed without enumerating them all [9, Equation 8], the only method for counting EGs that we are aware of is enumeration. Specifically, Gillispie and Perlman enumerated all the EGs for up to 10 nodes by means of a computer program [3]. They showed that the ratio is around 0.27 for 7-10 nodes. They also conjectured a similar ratio for more than 10 nodes by extrapolating the exact ratios for up to 10 nodes.
Enumerating EGs for more than 10 nodes seems challenging: To enumerate all the EGs over 10 nodes, the computer program of [3] needed 2253 hours on a "mid-1990s-era, midrange minicomputer". We obviously prefer to know the exact ratio of EGs to DAGs for a given number of nodes rather than an approximation to it. However, an approximate ratio may be easier to obtain and serve as well as the exact one to decide which space to search. In [6], a Markov chain Monte Carlo (MCMC) approach was proposed to approximately calculate the ratio while avoiding enumerating EGs. This approach consisted of the following steps. First, the author constructed a Markov chain (MC) whose stationary distribution was uniform over the space of EGs for the given number of nodes. Then, the author sampled that stationary distribution and computed the ratio $R$ of essential DAGs (EDAGs) to EGs in the sample. Finally, the author transformed this approximate ratio into the desired approximate ratio of EGs to DAGs as follows: Since $\#EGs/\#DAGs$ can be expressed as (see footnote 2)

$$\frac{\#EGs}{\#DAGs} = \frac{\#EDAGs}{\#DAGs} \cdot \frac{\#EGs}{\#EDAGs},$$

we can approximate it by

$$\frac{\#EDAGs}{\#DAGs} \cdot \frac{1}{R},$$

where $\#DAGs$ and $\#EDAGs$ can be computed via [9, Equation 8] and [10, p. 270], respectively.

Footnote 1: All the graphs considered in this paper are labeled graphs.

The author reported the so-obtained approximate ratio for up to 20 nodes. The approximate ratios agreed well with the exact ones available in the literature and suggested that the exact ratios are not very low (the approximate ratios were 0.26-0.27 for 7-20 nodes). This indicates that one should not expect more than a moderate gain in efficiency when searching the space of EGs instead of the space of DAGs.
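The transformation from the sampled ratio $R$ to the ratio of EGs to DAGs can be sketched as follows. This is only an illustrative sketch, not the C++ program used in the experiments. It assumes that the counts of labeled DAGs and labeled EDAGs are given by the standard inclusion-exclusion recurrences written in the code; whether these coincide literally with [9, Equation 8] and [10, p. 270] is an assumption, and the function names are hypothetical. The recurrences do reproduce the small counts implied by the exact columns of Table 1 (for instance, 543 DAGs and 59 EDAGs for 4 nodes).

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (assumed inclusion-exclusion recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


@lru_cache(maxsize=None)
def num_edags(n: int) -> int:
    """Number of labeled essential DAGs on n nodes (assumed recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * (2 ** (n - k) - (n - k)) ** k * num_edags(n - k)
        for k in range(1, n + 1)
    )


def egs_to_dags(n: int, r):
    """Approximate #EGs/#DAGs as (#EDAGs/#DAGs) * (1/R), with R estimated from the sample."""
    return Fraction(num_edags(n), num_dags(n)) / r


# With the exact R for 4 nodes (59 EDAGs out of 185 EGs), the exact ratio is recovered.
print(egs_to_dags(4, Fraction(59, 185)))
```

In practice $R$ would be the fraction of sampled EGs that are EDAGs, so the quality of the result depends entirely on how close the sample is to uniform.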
Of course, the claim that the gain in efficiency is at most moderate is a bit bold, since the gain is dictated by the average ratio over the EGs visited during the search and not by the average ratio over all the EGs in the search space. For instance, the gain is not the same if we visit the empty EG, whose ratio is 1, or the complete EG, whose ratio is $1/n!$ for $n$ nodes. Unfortunately, it is impossible to know beforehand which EGs will be visited during the search. Therefore, the best we can do is to draw (bold) conclusions based on the average ratio over all the EGs in the search space.

In this paper, we extend the work in [6] from 20 to 31 nodes. We also extend that work by reporting some new approximate ratios. Specifically, we report the approximate ratio of connected EGs (CEGs) to connected DAGs (CDAGs), of CEGs to EGs, and of CDAGs to DAGs. We elaborate later on why these ratios are of interest. The approximate ratio of CEGs to CDAGs is computed from the sample as follows. First, we compute the ratio $R'$ of EDAGs to CEGs in the sample. Second, we transform this approximate ratio into the desired approximate ratio of CEGs to CDAGs as follows: Since $\#CEGs/\#CDAGs$ can be expressed as

$$\frac{\#CEGs}{\#CDAGs} = \frac{\#EDAGs}{\#CDAGs} \cdot \frac{\#CEGs}{\#EDAGs},$$

we can approximate it by

$$\frac{\#EDAGs}{\#CDAGs} \cdot \frac{1}{R'},$$

where $\#EDAGs$ can be computed by [10, p. 270] and $\#CDAGs$ can be computed as shown in Appendix A. The approximate ratio of CEGs to EGs is computed directly from the sample. The approximate ratio of CDAGs to DAGs is computed with the help of Appendix A and [9, Equation 8].

The computer program implementing the MCMC approach described above is essentially the same as in [6] (it has only been modified to report whether the EGs sampled are connected or not; see footnote 3). The program is written in C++ and compiled in Microsoft Visual C++ 2010 Express.
The experiments are run on an AMD Athlon 64 X2 Dual Core Processor 5000+ 2.6 GHz, 4 GB RAM and Windows Vista Business. The compiler and the computer used in [6] were Microsoft Visual C++ 2008 Express and a Pentium 2.4 GHz, 512 MB RAM and Windows 2000. The experimental setting is the same as before for up to 30 nodes, i.e. each approximate ratio reported is based on a sample of $10^4$ EGs, each obtained as the state of the MC after performing $10^6$ transitions with the empty EG as initial state. For 31 nodes though, each EG sampled is obtained as the state of the MC after performing $2 \times 10^6$ transitions with the empty EG as initial state. We elaborate later on why we double the length of the MCs for 31 nodes.

Footnote 2: We use the symbol # followed by a class of graphs to denote the cardinality of the class.

Footnote 3: The modified program will be made available after publication.

Table 1. Exact and approximate $\#EGs/\#DAGs$ and $\#EDAGs/\#EGs$. A dash marks values that are not available.

| NODES | EXACT #EGs/#DAGs | EXACT #EDAGs/#EGs | EXACT Hours | OLD APPROXIMATE #EGs/#DAGs | OLD APPROXIMATE #EDAGs/#EGs | OLD APPROXIMATE Hours | NEW APPROXIMATE #EGs/#DAGs | NEW APPROXIMATE #EDAGs/#EGs | NEW APPROXIMATE Hours |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.66667 | 0.50000 | 0.0 | 0.66007 | 0.50500 | 3.5 | 0.67654 | 0.49270 | 1.3 |
| 3 | 0.44000 | 0.36364 | 0.0 | 0.43704 | 0.36610 | 5.2 | 0.44705 | 0.35790 | 1.0 |
| 4 | 0.34070 | 0.31892 | 0.0 | 0.33913 | 0.32040 | 6.8 | 0.33671 | 0.32270 | 1.2 |
| 5 | 0.29992 | 0.29788 | 0.0 | 0.30132 | 0.29650 | 8.0 | 0.29544 | 0.30240 | 1.4 |
| 6 | 0.28238 | 0.28667 | 0.0 | 0.28118 | 0.28790 | 9.4 | 0.28206 | 0.28700 | 1.6 |
| 7 | 0.27443 | 0.28068 | 0.0 | 0.27228 | 0.28290 | 12.4 | 0.27777 | 0.27730 | 2.0 |
| 8 | 0.27068 | 0.27754 | 0.0 | 0.26984 | 0.27840 | 13.8 | 0.26677 | 0.28160 | 2.3 |
| 9 | 0.26888 | 0.27590 | 7.0 | 0.27124 | 0.27350 | 16.5 | 0.27124 | 0.27350 | 2.6 |
| 10 | 0.26799 | 0.27507 | 2253.0 | 0.26690 | 0.27620 | 18.8 | 0.26412 | 0.27910 | 3.1 |
| 11 | - | - | - | 0.26179 | 0.28070 | 20.4 | 0.26179 | 0.28070 | 3.8 |
| 12 | - | - | - | 0.26737 | 0.27440 | 21.9 | 0.26825 | 0.27350 | 4.2 |
| 13 | - | - | - | 0.26098 | 0.28090 | 23.3 | 0.27405 | 0.26750 | 4.5 |
| 14 | - | - | - | 0.26560 | 0.27590 | 25.3 | 0.27161 | 0.26980 | 5.1 |
| 15 | - | - | - | 0.27125 | 0.27010 | 25.6 | 0.26250 | 0.27910 | 5.7 |
| 16 | - | - | - | 0.25777 | 0.28420 | 27.3 | 0.26943 | 0.27190 | 6.7 |
| 17 | - | - | - | 0.26667 | 0.27470 | 29.9 | 0.26942 | 0.27190 | 7.6 |
| 18 | - | - | - | 0.25893 | 0.28290 | 37.4 | 0.27040 | 0.27090 | 8.2 |
| 19 | - | - | - | 0.26901 | 0.27230 | 38.1 | 0.27130 | 0.27000 | 9.0 |
| 20 | - | - | - | 0.27120 | 0.27010 | 40.3 | 0.26734 | 0.27400 | 9.9 |
| 21 | - | - | - | - | - | - | 0.26463 | 0.27680 | 17.4 |
| 22 | - | - | - | - | - | - | 0.27652 | 0.26490 | 18.8 |
| 23 | - | - | - | - | - | - | 0.26569 | 0.27570 | 13.3 |
| 24 | - | - | - | - | - | - | 0.27030 | 0.27100 | 14.0 |
| 25 | - | - | - | - | - | - | 0.26637 | 0.27500 | 15.9 |
| 26 | - | - | - | - | - | - | 0.26724 | 0.27410 | 17.0 |
| 27 | - | - | - | - | - | - | 0.26950 | 0.27180 | 18.6 |
| 28 | - | - | - | - | - | - | 0.27383 | 0.26750 | 20.1 |
| 29 | - | - | - | - | - | - | 0.27757 | 0.26390 | 21.1 |
| 30 | - | - | - | - | - | - | 0.28012 | 0.26150 | 21.6 |
| 31 | - | - | - | - | - | - | 0.27424 | 0.26710 | 47.3 |

The rest of the paper is organized as follows. In Section 2, we extend the work in [6] from 20 to 31 nodes. In Section 3, we extend the work in [6] with new approximate ratios. In Section 4, we recall our findings and discuss future work. The paper ends with two appendices devoted to technical details.

2 Extension from 20 to 31 Nodes

Table 1 presents our new approximate ratios, together with the old approximate ones and the exact ones available in the literature.
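As a rough guide to how much sampling noise to expect in the approximate columns of Table 1, the following sketch computes the binomial standard error of a proportion estimated from $10^4$ samples. It is a back-of-the-envelope calculation under two assumptions that the experiments do not guarantee: that the sampled EGs are independent and that they are drawn exactly from the uniform distribution. The function name is hypothetical.

```python
from math import sqrt


def binomial_standard_error(p: float, n_samples: int) -> float:
    """Standard error of a proportion p estimated from n_samples independent draws."""
    return sqrt(p * (1.0 - p) / n_samples)


# R, the sampled fraction of EGs that are EDAGs, is around 0.27 in Table 1.
se = binomial_standard_error(0.27, 10**4)
print(f"standard error of R: {se:.4f}")
print(f"relative error of R: {se / 0.27:.2%}")
```

Since $\#EGs/\#DAGs$ is approximated as a known constant divided by $R$, its relative error is, to first order, the same as that of $R$. Under the stated assumptions, this suggests fluctuations of a few thousandths up to about one hundredth around 0.27, with any lack of convergence of the MCs adding to this.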
The first conclusion that we draw from Table 1 is that the new ratios are very close to the exact ones, as well as to the old ones. This makes us confident in the accuracy of the ratios for 11-31 nodes, where no exact ratios are available in the literature due to the high computational cost involved in calculating them. Another conclusion that we draw from the table is that the ratios seem to be 0.26-0.28 for 11-31 nodes. This agrees well with the conjectured ratio of 0.27 for more than 10 nodes reported in [3]. A last conclusion that we draw from the table is that the fraction of EGs that represent a unique DAG, i.e. $\#EDAGs/\#EGs$, is 0.26-0.28 for 11-31 nodes, a substantial fraction.

Recall from the previous section that we slightly modified the experimental setting for 31 nodes, namely we doubled the length of the MCs. The reason is as follows. We observed an increasing trend in $\#EGs/\#DAGs$ for 25-30 nodes, and interpreted this as an indication that we might be reaching the limits of our experimental setting. Therefore, we decided to double the length of the MCs for 31 nodes in order to see whether this broke the trend. As can be seen in Table 1, it did. This suggests that approximating the ratio for more than 31 nodes will require larger MCs and/or samples than the ones used in this work.

Note that we can approximate the number of EGs for up to 31 nodes as $\frac{\#EGs}{\#DAGs} \cdot \#DAGs$, where $\#EGs/\#DAGs$ comes from Table 1 and $\#DAGs$ comes from [9, Equation 8]. Alternatively, we can approximate it as $\frac{\#EGs}{\#EDAGs} \cdot \#EDAGs$, where $\#EGs/\#EDAGs$ is the reciprocal of the $\#EDAGs/\#EGs$ column in Table 1 and $\#EDAGs$ can be computed by [10, p. 270].

Finally, a few words on the running times reported in Table 1 are in order.
First, note that the times reported in Table 1 for the exact ratios are borrowed from [3] and, thus, they correspond to a computer program run on a "mid-1990s-era, midrange minicomputer". Therefore, a direct comparison to our times seems inadvisable. Second, our times are around four times faster than the old times. The reason may lie in the use of a more powerful computer and/or a different version of the compiler. The reason cannot lie in the difference between the computer programs run, since this is negligible. Third, the new times have some oddities, e.g. the time for two nodes is greater than the time for three nodes. The reason may be that the computer ran other programs while running the experiments reported in this paper.

3 Extension with New Ratios

In [3, p. 153], it is stated that "the variables chosen for inclusion in a multivariate data set are not chosen at random but rather because they occur in a common real-world context, and hence are likely to be correlated to some degree". This implies that the EG learnt from some given data is likely to be connected. We agree with this observation, because we believe that humans are good at detecting sets of mutually uncorrelated variables so that the original learning problem can be divided into smaller independent learning problems, each of which results in a CEG. Therefore, although we still cannot say which EGs will be visited during the search, we can say that some of them will most likely be connected and some others disconnected. This raises the question of whether $\#CEGs/\#CDAGs \approx \#DEGs/\#DDAGs$, where DEGs and DDAGs stand for disconnected EGs and disconnected DAGs.

Table 2. Approximate $\#CEGs/\#CDAGs$, $\#CEGs/\#EGs$ and $\#CDAGs/\#DAGs$ (new approximate values).

| NODES | #CEGs/#CDAGs | #CEGs/#EGs | #CDAGs/#DAGs |
|---|---|---|---|
| 2 | 0.51482 | 0.50730 | 0.66667 |
| 3 | 0.39334 | 0.63350 | 0.72000 |
| 4 | 0.32295 | 0.78780 | 0.82136 |
| 5 | 0.29471 | 0.90040 | 0.90263 |
| 6 | 0.28033 | 0.94530 | 0.95115 |
| 7 | 0.27799 | 0.97680 | 0.97605 |
| 8 | 0.26688 | 0.98860 | 0.98821 |
| 9 | 0.27164 | 0.99560 | 0.99415 |
| 10 | 0.26413 | 0.99710 | 0.99708 |
| 11 | 0.26170 | 0.99820 | 0.99854 |
| 12 | 0.26829 | 0.99940 | 0.99927 |
| 13 | 0.27407 | 0.99970 | 0.99964 |
| 14 | 0.27163 | 0.99990 | 0.99982 |
| 15 | 0.26253 | 1.00000 | 0.99991 |
| 16 | 0.26941 | 0.99990 | 0.99995 |
| 17 | 0.26942 | 1.00000 | 0.99998 |
| 18 | 0.27041 | 1.00000 | 0.99999 |
| 19 | 0.27130 | 1.00000 | 0.99999 |
| 20 | 0.26734 | 1.00000 | 1.00000 |
| 21 | 0.26463 | 1.00000 | 1.00000 |
| 22 | 0.27652 | 1.00000 | 1.00000 |
| 23 | 0.26569 | 1.00000 | 1.00000 |
| 24 | 0.27030 | 1.00000 | 1.00000 |
| 25 | 0.26637 | 1.00000 | 1.00000 |
| 26 | 0.26724 | 1.00000 | 1.00000 |
| 27 | 0.26950 | 1.00000 | 1.00000 |
| 28 | 0.27383 | 1.00000 | 1.00000 |
| 29 | 0.27757 | 1.00000 | 1.00000 |
| 30 | 0.28012 | 1.00000 | 1.00000 |
| 31 | 0.27424 | 1.00000 | 1.00000 |
| ∞ | ? | ? | ≈ 1 |

In [3, p. 154], it is also said that a consequence of the learnt EG being connected is "that a substantial number of undirected edges are likely to be present in the representative essential graph, which in turn makes it likely that the corresponding equivalence class size will be relatively large". In other words, they conjecture that the equivalence classes represented by CEGs are relatively large. We interpret the term "relatively large" as having a ratio smaller than $\#EGs/\#DAGs$. However, this conjecture does not seem to hold according to the approximate ratios presented in Table 2.
There, we can see that $\#CEGs/\#CDAGs \approx$ 0.26-0.28 for 6-31 nodes and, thus, $\#CEGs/\#CDAGs \approx \#EGs/\#DAGs$. That the two ratios coincide is not by chance, because $\#CEGs/\#EGs \approx$ 0.95-1 for 6-31 nodes, as can be seen in the table. A problem of this ratio being so close to 1 is that sampling a DEG is so unlikely that we cannot answer the question of whether $\#CEGs/\#CDAGs \approx \#DEGs/\#DDAGs$ with our sampling scheme. Therefore, we have to content ourselves with having learnt that $\#CEGs/\#CDAGs \approx \#EGs/\#DAGs$.

It is worth mentioning that this result is, in a sense, anticipated by Kočka when he states in a personal communication to Gillispie that "large equivalence classes are merely composed of independent classes of smaller sizes that combine to make a single larger class" [2, p. 1411]. Again, we interpret the term "large" as having a ratio smaller than $\#EGs/\#DAGs$. Again, we cannot check Kočka's conjecture because sampling a DEG is very unlikely. However, we believe that the conjecture holds, because we expect the ratios for those EGs with $k$ connected components to be around $0.27^k$, i.e. we expect the ratios of the components to be almost independent of one another.

Gillispie goes on to say that "an equivalence class encountered at any single step of the iterative [learning] process, a step which may involve altering only a small number of edges (typically only one), might be quite small" [2, p. 1411]. Note that the equivalence classes that he suggests are quite small must correspond to CEGs, because he suggested before that large equivalence classes correspond to DEGs. We interpret the term "quite small" as having a ratio greater than $\#EGs/\#DAGs$. Again, this conjecture does not seem to hold according to the approximate ratios presented in Table 2.
There, we can see that $\#CEGs/\#CDAGs \approx$ 0.26-0.28 for 6-31 nodes and, thus, $\#CEGs/\#CDAGs \approx \#EGs/\#DAGs$.

From the results in Tables 1 and 2, it seems that the asymptotic values for $\#EGs/\#DAGs$, $\#EDAGs/\#EGs$, $\#CEGs/\#CDAGs$ and $\#CEGs/\#EGs$ should be around 0.27, 0.27, 0.27 and 1, respectively. It would be nice to have a formal proof of these results. In this paper, we have proven a related result, namely that the ratio of CDAGs to DAGs is asymptotically 1. The proof can be found in Appendix B. Note from Table 2 that the asymptotic value is almost achieved for 6-7 nodes already. Our result adds to the list of similar results in the literature, e.g. the ratio of labeled connected graphs to labeled graphs is asymptotically 1 [4, p. 205].

Note that we can approximate the number of CEGs for up to 31 nodes as $\frac{\#CEGs}{\#EGs} \cdot \#EGs$, where $\#CEGs/\#EGs$ comes from Table 2 and $\#EGs$ can be computed as shown in the previous section. Alternatively, we can approximate it as $\frac{\#CEGs}{\#CDAGs} \cdot \#CDAGs$, where $\#CEGs/\#CDAGs$ comes from Table 2 and $\#CDAGs$ can be computed as shown in Appendix A.

Finally, note that the running times to obtain the results in Table 2 are the same as those in Table 1, because both tables are based on the same samples.

4 Discussion

In [3], it is shown that $\#EGs/\#DAGs \approx 0.27$ for 7-10 nodes. We have shown in this paper that $\#EGs/\#DAGs \approx$ 0.26-0.28 for 11-31 nodes. These results indicate that one should not expect more than a moderate gain in efficiency when searching the space of EGs instead of the space of DAGs. We have also shown that $\#CEGs/\#CDAGs \approx$ 0.26-0.28 for 6-31 nodes and, thus, $\#CEGs/\#CDAGs \approx \#EGs/\#DAGs$.
Therefore, when searching the space of EGs, the fact that some of the EGs visited will most likely be connected does not seem to imply any additional gain in efficiency beyond that due to searching the space of EGs instead of the space of DAGs.

Some questions that remain open and that we would like to address in the future are checking whether $\#CEGs/\#CDAGs \approx \#DEGs/\#DDAGs$, and computing the asymptotic ratios of EGs to DAGs, EDAGs to EGs, CEGs to CDAGs, and CEGs to EGs. Recall that in this paper we have proven that the asymptotic ratio of CDAGs to DAGs is 1.

Another topic for further research, already mentioned in [6], would be improving the graphical modifications that determine the MC transitions, because they rather often produce a graph that is not an EG. Specifically, the MC transitions are determined by choosing uniformly one out of seven modifications to perform on the current EG. Actually, one of the modifications leaves the current EG unchanged. Therefore, around 14 % of the modifications cannot change the current EG and, thus, 86 % of the modifications can change the current EG. In our experiments, however, only 6-8 % of the modifications change the current EG. The rest up to the mentioned 86 % produce a graph that is not an EG and, thus, they leave the current EG unchanged. This problem has been previously pointed out in [7]. Furthermore, the author of [7] presents a set of more complex modifications that are claimed to alleviate the problem just described. Unfortunately, no evidence supporting this claim is provided. More recently, He et al. have proposed an alternative set of modifications having a series of desirable features that ensure that applying the modifications to an EG results in a different EG [5].
Although these modifications are more complex than those in [6], the authors show that their MCMC approach is thousands of times faster for 3, 4 and 6 nodes [5, pp. 17-18]. However, they also mention that it is unfair to compare these two approaches: Whereas $10^4$ MCs of $10^6$ transitions each are run in [6] to obtain a sample, they only run one MC of $10^4$-$10^5$ transitions. Therefore, it is not clear how their MCMC approach scales to 10-30 nodes as compared to the one in [6]. The point of developing modifications that are more effective than ours at producing EGs is to make better use of the running time by minimizing the number of graphs that have to be discarded. However, this improvement in effectiveness has to be weighed against the computational cost of the modifications, so that the MCMC approach still scales to the number of nodes of interest.

Appendix A: Counting CDAGs

Let $A(x)$ denote the exponential generating function for DAGs. That is,

$$A(x) = \sum_{k=1}^{\infty} \frac{A_k}{k!} x^k$$

where $A_k$ denotes the number of DAGs of order $k$. Likewise, let $a(x)$ denote the exponential generating function for CDAGs. That is,

$$a(x) = \sum_{k=1}^{\infty} \frac{a_k}{k!} x^k$$

where $a_k$ denotes the number of CDAGs of order $k$. Note that $A_k$ can be computed without having to resort to enumeration by [9, Equation 8]. However, we do not know of any formula to compute $a_k$ without enumeration. Luckily, $a_k$ can be computed from $A_k$ as follows. First, note that

$$1 + A(x) = e^{a(x)}$$

as shown by [4, pp. 8-9]. Now, let us define $A_0 = 1$ and redefine $A(x)$ as

$$A(x) = \sum_{k=0}^{\infty} \frac{A_k}{k!} x^k,$$

i.e. the summation starts with $k = 0$. Then,

$$A(x) = e^{a(x)}.$$

Consequently,

$$\frac{a_n}{n!} = \frac{A_n}{n!} - \frac{1}{n} \sum_{k=1}^{n-1} k \, \frac{a_k}{k!} \, \frac{A_{n-k}}{(n-k)!}$$

as shown by [4, pp. 8-9], and thus

$$a_n = A_n - \frac{1}{n} \sum_{k=1}^{n-1} k \binom{n}{k} a_k A_{n-k}.$$

See also [1, pp. 38-39].
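The recurrence for $a_n$ above can be evaluated directly with exact integer arithmetic, as in the following sketch. The number of DAGs $A_n$ is computed here with the standard inclusion-exclusion recurrence for labeled DAGs, which is assumed (not verified) to be equivalent to [9, Equation 8]; the function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """A_n: number of labeled DAGs of order n, with A_0 = 1 (assumed recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


@lru_cache(maxsize=None)
def num_cdags(n: int) -> int:
    """a_n = A_n - (sum_{k=1}^{n-1} k * C(n,k) * a_k * A_{n-k}) / n, for n >= 1."""
    total = sum(
        k * comb(n, k) * num_cdags(k) * num_dags(n - k) for k in range(1, n)
    )
    # The sum is always a multiple of n because a_n and A_n are integers.
    return num_dags(n) - total // n


for n in range(2, 7):
    print(n, num_cdags(n), round(num_cdags(n) / num_dags(n), 5))
```

For small orders the printed ratios can be compared with the $\#CDAGs/\#DAGs$ column of Table 2; for instance, 4 nodes give 446 CDAGs out of 543 DAGs.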
Moreover, according to [12, Sequence A082402], the result in this appendix has previously been reported in [8]. However, we could not gain access to that paper to confirm it.

Appendix B: Asymptotic Behavior of CDAGs

Theorem 1. The ratio of CDAGs of order $n$ to DAGs of order $n$ tends to 1 as $n$ tends to infinity.

Proof. Let $A_n$ and $a_n$ denote the numbers of DAGs and CDAGs of order $n$, respectively. Specifically, we prove that $(A_n/n!)/(a_n/n!) \to 1$ as $n \to \infty$. By [13, Theorem 6], this holds if the following three conditions are met:

(i) $\log((A_n/n!)/(A_{n-1}/(n-1)!)) \to \infty$ as $n \to \infty$,
(ii) $\log((A_{n+1}/(n+1)!)/(A_n/n!)) \geq \log((A_n/n!)/(A_{n-1}/(n-1)!))$ for all large enough $n$, and
(iii) $\sum_{k=1}^{\infty} (A_k/k!)^2/(A_{2k}/(2k)!)$ converges.

We start by proving that condition (i) is met. Note that from every DAG $G$ over the nodes $\{v_1, \ldots, v_{n-1}\}$ we can construct $2^{n-1}$ different DAGs $H$ over $\{v_1, \ldots, v_n\}$ as follows: Copy all the arrows from $G$ to $H$ and make $v_n$ a child in $H$ of each of the $2^{n-1}$ subsets of $\{v_1, \ldots, v_{n-1}\}$. Therefore,

$$\log((A_n/n!)/(A_{n-1}/(n-1)!)) \geq \log(2^{n-1}/n)$$

which clearly tends to infinity as $n$ tends to infinity.

We continue by proving that condition (ii) is met. Every DAG over the nodes $V \cup \{w\}$ can be constructed from a DAG $G$ over $V$ by adding the node $w$ to $G$ and making it a child of a subset $Pa$ of $V$. If a DAG can be so constructed from several DAGs, we simply consider it as constructed from one of them. Let $H_1, \ldots, H_m$ represent all the DAGs so constructed from $G$. Moreover, let $Pa_i$ denote the subset of $V$ used to construct $H_i$ from $G$.
From each $Pa_i$, we can now construct $2m$ DAGs over $V \cup \{w, u\}$ as follows: (i) Add the node $u$ to $H_i$ and make it a child of each subset $Pa_j \cup \{w\}$ with $1 \leq j \leq m$, and (ii) add the node $u$ to $H_i$ and make it a parent of each subset $Pa_j \cup \{w\}$ with $1 \leq j \leq m$. Therefore, $A_{n+1}/A_n \geq 2A_n/A_{n-1}$ and thus

$$\begin{aligned}
\log((A_{n+1}/(n+1)!)/(A_n/n!)) &= \log(A_{n+1}/A_n) - \log(n+1) \\
&\geq \log(2A_n/A_{n-1}) - \log(n+1) \\
&\geq \log(2A_n/A_{n-1}) - \log(2n) \\
&= \log(A_n/A_{n-1}) - \log n \\
&= \log((A_n/n!)/(A_{n-1}/(n-1)!)).
\end{aligned}$$

Finally, we prove that condition (iii) is met. Let $G$ and $G'$ denote two (not necessarily distinct) DAGs of order $k$. Let $V = \{v_1, \ldots, v_k\}$ and $V' = \{v'_1, \ldots, v'_k\}$ denote the nodes in $G$ and $G'$, respectively. Consider the DAG $H$ over $V \cup V'$ that has the union of the arrows in $G$ and $G'$. Let $w$ and $w'$ denote two nodes in $V$ and $V'$, respectively. Let $S$ be a subset of size $k-1$ of $V \cup V' \setminus \{w, w'\}$. Now, make $w$ a parent in $H$ of all the nodes in $S \cap V'$, and make $w'$ a child in $H$ of all the nodes in $S \cap V$. Note that the resulting $H$ is a DAG of order $2k$. Note that there are $k^2$ different pairs of nodes $w$ and $w'$. Note that there are $\binom{2k-2}{k-1}$ different subsets of size $k-1$ of $V \cup V' \setminus \{w, w'\}$. Note that every choice of DAGs $G$ and $G'$, nodes $w$ and $w'$, and subset $S$ gives rise to a different DAG $H$. Therefore, $A_{2k}/A_k^2 \geq k^2 \binom{2k-2}{k-1}$ and thus

$$\sum_{k=1}^{\infty} \frac{(A_k/k!)^2}{A_{2k}/(2k)!} = \sum_{k=1}^{\infty} \frac{A_k^2 \, (2k)!}{A_{2k} \, k!^2} \leq \sum_{k=1}^{\infty} \frac{(k-1)! \, (k-1)! \, (2k)!}{k^2 \, (2k-2)! \, k!^2} = \sum_{k=1}^{\infty} \frac{4k-2}{k^3}$$

which clearly converges.

Acknowledgments. This work is funded by the Center for Industrial Information Technology (CENIIT) and a so-called career contract at Linköping University, by the Swedish Research Council (ref. 2010-4808), and by FEDER funds and the Spanish Government (MICINN) through the project TIN2010-20900-C04-03. We thank Dag Sonntag for his comments on this work.

References

1. Castelo, R. The Discrete Acyclic Digraph Markov Model in Data Mining. PhD Thesis, Utrecht University (2002).
2. Gillispie, S. B. Formulas for Counting Acyclic Digraph Markov Equivalence Classes. Journal of Statistical Planning and Inference (2006) 1410-1432.
3. Gillispie, S. B. and Perlman, M. D. The Size Distribution for Markov Equivalence Classes of Acyclic Digraph Models. Artificial Intelligence (2002) 137-155.
4. Harary, F. and Palmer, E. M. Graphical Enumeration. Academic Press (1973).
5. He, Y., Jia, J. and Yu, B. Reversible MCMC on Markov Equivalence Classes of Sparse Directed Acyclic Graphs. arXiv:1209.5860v2 [stat.ML].
6. Peña, J. M. Approximate Counting of Graphical Models Via MCMC. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (2007) 352-359.
7. Perlman, M. D. Graphical Model Search Via Essential Graphs. Technical Report 367, University of Washington (2000).
8. Robinson, R. W. Counting Labeled Acyclic Digraphs. In New Directions in the Theory of Graphs (1973) 239-273.
9. Robinson, R. W. Counting Unlabeled Acyclic Digraphs. In Proceedings of the Fifth Australian Conference on Combinatorial Mathematics (1977) 28-43.
10. Steinsky, B. Enumeration of Labelled Chain Graphs and Labelled Essential Directed Acyclic Graphs. Discrete Mathematics (2003) 266-277.
11. Steinsky, B. Asymptotic Behaviour of the Number of Labelled Essential Acyclic Digraphs and Labelled Chain Graphs. Graphs and Combinatorics (2004) 399-411.
12. The On-Line Encyclopedia of Integer Sequences, published electronically at http://oeis.org, (2010).
13. Wright, E. M. A Relationship between Two Sequences. In Proceedings of the London Mathematical Society (1967) 296-304.
