Learning Bayesian Networks with the bnlearn R Package
Author: Marco Scutari (University of Padova)
Journal of Statistical Software, MMMMMM YYYY, Volume VV, Issue II. http://www.jstatsoft.org/

Abstract

bnlearn is an R package (R Development Core Team 2009) which includes several algorithms for learning the structure of Bayesian networks with either discrete or continuous variables. Both constraint-based and score-based algorithms are implemented, and can use the functionality provided by the snow package (Tierney et al. 2008) to improve their performance via parallel computing. Several network scores and conditional independence tests are available for both the learning algorithms and independent use. Advanced plotting options are provided by the Rgraphviz package (Gentry et al. 2010).

Keywords: Bayesian networks, R, structure learning algorithms, constraint-based algorithms, score-based algorithms, conditional independence tests.

1. Introduction

In recent years Bayesian networks have been used in many fields, from On-line Analytical Processing (OLAP) performance enhancement (Margaritis 2003) to medical service performance analysis (Acid et al. 2004), gene expression analysis (Friedman et al. 2000), and breast cancer prognosis and epidemiology (Holmes and Jain 2008). The high dimensionality of the data sets common in these domains has led to the development of several learning algorithms focused on reducing computational complexity while still learning the correct network. Some examples are the Grow-Shrink algorithm in Margaritis (2003), the Incremental Association algorithm and its derivatives in Tsamardinos et al. (2003) and in Yaramakala and Margaritis (2005), the Sparse Candidate algorithm in Friedman et al. (1999), the Optimal Reinsertion in Moore and Wong (2003) and the Greedy Equivalent Search in Chickering (2002).
The aim of the bnlearn package is to provide a free implementation of some of these structure learning algorithms, along with the conditional independence tests and network scores used to construct the Bayesian network. Both discrete and continuous data are supported. Furthermore, the learning algorithms can be chosen separately from the statistical criterion they are based on (which is usually not possible in the reference implementation provided by the algorithms' authors), so that the best combination for the data at hand can be used.

2. Bayesian networks

Bayesian networks are graphical models where nodes represent random variables (the two terms are used interchangeably in this article) and arrows represent probabilistic dependencies between them (Korb and Nicholson 2004). The graphical structure G = (V, A) of a Bayesian network is a directed acyclic graph (DAG), where V is the node (or vertex) set and A is the arc (or edge) set. The DAG defines a factorization of the joint probability distribution of V = {X_1, X_2, ..., X_v}, often called the global probability distribution, into a set of local probability distributions, one for each variable. The form of the factorization is given by the Markov property of Bayesian networks (Korb and Nicholson 2004, section 2.2.4), which states that every random variable X_i directly depends only on its parents \Pi_{X_i}:

P(X_1, ..., X_v) = \prod_{i=1}^{v} P(X_i | \Pi_{X_i})   (for discrete variables)   (1)

f(X_1, ..., X_v) = \prod_{i=1}^{v} f(X_i | \Pi_{X_i})   (for continuous variables)
(2)

The correspondence between conditional independence (of the random variables) and graphical separation (of the corresponding nodes of the graph) has been extended to an arbitrary triplet of disjoint subsets of V by Pearl (1988) with the notion of d-separation (from direction-dependent separation). Therefore model selection algorithms first try to learn the graphical structure of the Bayesian network (hence the name of structure learning algorithms) and then estimate the parameters of the local distribution functions conditional on the learned structure. This two-step approach has the advantage that it considers one local distribution function at a time, and it does not require the global distribution function to be modelled explicitly. Another advantage is that learning algorithms are able to scale to high-dimensional models without incurring the so-called curse of dimensionality.

Although there are many possible choices for both the global and the local distribution functions, the literature has focused mostly on two cases:

• multinomial data (the discrete case): both the global and the local distributions are multinomial, and are represented as probability or contingency tables. This is by far the most common assumption, and the corresponding Bayesian networks are usually referred to as discrete Bayesian networks (or simply as Bayesian networks).

• multivariate normal data (the continuous case): the global distribution is multivariate normal, and the local distributions are normal random variables linked by linear constraints. These Bayesian networks are called Gaussian Bayesian networks in Geiger and Heckerman (1994), Neapolitan (2003) and most recent literature on the subject.
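The discrete-case factorization in Equation 1 is easy to verify numerically. The following base R sketch uses a made-up three-node network A → B, A → C with arbitrary probability tables (a hypothetical example, not one shipped with the package) and builds the global distribution as the product of the local ones:

```r
# Hypothetical discrete network A -> B, A -> C, so by Equation 1:
# P(A, B, C) = P(A) * P(B | A) * P(C | A)
p.a  <- c(a = 0.3, b = 0.7)                      # P(A)
p.ba <- matrix(c(0.8, 0.2, 0.4, 0.6), nrow = 2,  # P(B | A), columns sum to 1
               dimnames = list(B = c("a", "b"), A = c("a", "b")))
p.ca <- matrix(c(0.5, 0.5, 0.1, 0.9), nrow = 2,  # P(C | A)
               dimnames = list(C = c("a", "b"), A = c("a", "b")))

# Fill the global distribution from the local ones, cell by cell.
joint <- array(0, dim = c(2, 2, 2),
               dimnames = list(A = c("a", "b"), B = c("a", "b"), C = c("a", "b")))
for (a in 1:2) for (b in 1:2) for (cc in 1:2)
  joint[a, b, cc] <- p.a[a] * p.ba[b, a] * p.ca[cc, a]

sum(joint)  # a valid joint probability distribution sums to 1
```

Replacing the hand-written tables with conditional probability tables estimated from data corresponds to the parameter estimation step of the two-step approach described above.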
Other distributional assumptions lead to more complex learning algorithms (such as the nonparametric approach proposed by Bach and Jordan (2003)) or present various limitations due to the difficulty of specifying the distribution functions in closed form (such as the approach to learning Bayesian networks with mixed variables by Boettcher and Dethlefsen (2003), which does not allow a node associated with a continuous variable to be the parent of a node associated with a discrete variable).

3. Structure learning algorithms

Bayesian network structure learning algorithms can be grouped into two categories:

• constraint-based algorithms: these algorithms learn the network structure by analyzing the probabilistic relations entailed by the Markov property of Bayesian networks with conditional independence tests, and then constructing a graph which satisfies the corresponding d-separation statements. The resulting models are often interpreted as causal models even when learned from observational data (Pearl 1988).

• score-based algorithms: these algorithms assign a score to each candidate Bayesian network and try to maximize it with some heuristic search algorithm. Greedy search algorithms (such as hill-climbing or tabu search) are a common choice, but almost any kind of search procedure can be used.

Constraint-based algorithms are all based on the Inductive Causation (IC) algorithm by Verma and Pearl (1991), which provides a theoretical framework for learning the structure of causal models. It can be summarized in three steps:

1. first the skeleton of the network (the undirected graph underlying the network structure) is learned.
Since an exhaustive search is computationally unfeasible for all but the simplest data sets, all learning algorithms use some kind of optimization, such as restricting the search to the Markov blanket of each node (which includes the parents, the children and all the nodes that share a child with that particular node).

2. set the directions of the arcs that are part of a v-structure (a triplet of nodes incident on a converging connection X_j → X_i ← X_k).

3. set the directions of the other arcs as needed to satisfy the acyclicity constraint.

Score-based algorithms on the other hand are simply applications of various general purpose heuristic search algorithms, such as hill-climbing, tabu search, simulated annealing and various genetic algorithms. The score function is usually score-equivalent (Chickering 1995), so that networks that define the same probability distribution are assigned the same score.

4. Package implementation

4.1. Structure learning algorithms

bnlearn implements the following constraint-based learning algorithms (the respective function names are reported in parenthesis):

• Grow-Shrink (gs): based on the Grow-Shrink Markov Blanket, the simplest Markov blanket detection algorithm (Margaritis 2003) used in a structure learning algorithm.

• Incremental Association (iamb): based on the Incremental Association Markov blanket (IAMB) algorithm (Tsamardinos et al. 2003), which is based on a two-phase selection scheme (a forward selection followed by an attempt to remove false positives).

• Fast Incremental Association (fast.iamb): a variant of IAMB which uses speculative stepwise forward selection to reduce the number of conditional independence tests (Yaramakala and Margaritis 2005).
• Interleaved Incremental Association (inter.iamb): another variant of IAMB which uses forward stepwise selection (Tsamardinos et al. 2003) to avoid false positives in the Markov blanket detection phase.

• Max-Min Parents and Children (mmpc): a forward selection technique for neighbourhood detection based on the maximization of the minimum association measure observed with any subset of the nodes selected in the previous iterations (Tsamardinos et al. 2006). It learns the underlying structure of the Bayesian network (all the arcs are undirected, and no attempt is made to detect their orientation).

Three implementations are provided for each algorithm:

• an optimized implementation (used by default) which uses backtracking to roughly halve the number of independence tests.

• an unoptimized implementation (used when the optimized argument is set to FALSE) which is faithful to the original description of the algorithm. This implementation is particularly useful for comparing the behaviour of different combinations of learning algorithms and statistical tests.

• a parallel implementation. It requires a running cluster set up with the makeCluster function from the snow package (Tierney et al. 2008), which is passed to the function via the cluster argument.

The only available score-based learning algorithm is a Hill-Climbing (hc) greedy search on the space of directed graphs. The optimized implementation (again used by default) uses score caching, score decomposability and score equivalence to reduce the number of duplicated tests (Daly and Shen 2007). Random restarts, a configurable number of perturbing operations and a preseeded initial network structure can be used to avoid poor local maxima (with the restart, perturb and start arguments, respectively).

4.2.
Conditional independence tests

Several conditional independence tests from information theory and classical statistics are available for use in constraint-based learning algorithms and in the ci.test function. In both cases the test to be used is specified with the test argument (the label associated with each test is reported in parenthesis). Conditional independence tests for discrete data are functions of the conditional probability tables implied by the graphical structure of the network through the observed frequencies {n_{ijk}, i = 1, ..., R, j = 1, ..., C, k = 1, ..., L} for the random variables X and Y and all the configurations of the conditioning variables Z:

• mutual information: an information-theoretic distance measure (Kullback 1959), defined as

MI(X, Y | Z) = \sum_{i=1}^{R} \sum_{j=1}^{C} \sum_{k=1}^{L} \frac{n_{ijk}}{n} \log \frac{n_{ijk} n_{++k}}{n_{i+k} n_{+jk}}.   (3)

It is proportional to the log-likelihood ratio test G^2 (they differ by a 2n factor, where n is the sample size) and it is related to the deviance of the tested models. Both the asymptotic \chi^2 test (mi) and the Monte Carlo permutation test (mc-mi) described in Good (2005) are available.

• Pearson's X^2: the classical Pearson's X^2 test for contingency tables,

X^2(X, Y | Z) = \sum_{i=1}^{R} \sum_{j=1}^{C} \sum_{k=1}^{L} \frac{(n_{ijk} - m_{ijk})^2}{m_{ijk}},   where   m_{ijk} = \frac{n_{i+k} n_{+jk}}{n_{++k}}.   (4)

Again both the asymptotic \chi^2 test (x2) and a Monte Carlo permutation test (mc-x2) from Good (2005) are available.

• fast mutual information (fmi): a variant of the mutual information which is set to zero when there are not at least five data points per parameter, which is the usual threshold for establishing the correctness of the asymptotic \chi^2 distribution. This is the same heuristic defined for the Fast-IAMB algorithm in Yaramakala and Margaritis (2005).
• Akaike Information Criterion (aict): an experimental AIC-based independence test, computed by comparing the mutual information and the expected information gain. It rejects the null hypothesis if

MI(X, Y | Z) > \frac{(R - 1)(C - 1) L}{n},   (5)

which corresponds to an increase in the AIC score of the network.

In the continuous case conditional independence tests are functions of the partial correlation coefficients \rho_{XY|Z} of X and Y given Z:

• linear correlation: the linear correlation coefficient \rho_{XY|Z}. Both the asymptotic Student's t test (cor) and the Monte Carlo permutation test (mc-cor) described in Legendre (2000) are available.

• Fisher's Z: a transformation of the linear correlation coefficient used by commercial software (such as TETRAD) and by the pcalg package (Kalisch and Bühlmann 2007), which implements the PC constraint-based learning algorithm (Spirtes et al. 2001). It is defined as

Z(X, Y | Z) = \frac{1}{2} \sqrt{n - |Z| - 3} \log \frac{1 + \rho_{XY|Z}}{1 - \rho_{XY|Z}}.   (6)

Both the asymptotic normal test (zf) and the Monte Carlo permutation test (mc-zf) are available.

• mutual information (mi-g): an information-theoretic distance measure (Kullback 1959), defined as

MI_g(X, Y | Z) = -\frac{1}{2} \log (1 - \rho^2_{XY|Z}).   (7)

It has the same relationship with the log-likelihood ratio as the corresponding test defined in the discrete case.

4.3. Network scores

Several score functions are available for use in the hill-climbing algorithm and in the score function. The score to be used is specified with the score argument in hc and with the type argument in the score function (the label associated with each score is reported in parenthesis).
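Before turning to the score functions, the test statistics of Section 4.2 can be checked by hand. The base R sketch below (a worked illustration of Equations 3, 6 and 7 on simulated data with an empty conditioning set, not the package's implementation) computes the mutual information of two discrete variables and the Fisher's Z statistic of two correlated continuous ones:

```r
set.seed(42)
n <- 1000

# Discrete case: MI(X, Y) from the observed contingency table (Equation 3
# with no conditioning variables, so the k index disappears).
x <- sample(c("a", "b", "c"), n, replace = TRUE)
y <- sample(c("a", "b"), n, replace = TRUE)
tab <- table(x, y)
exp.n <- rowSums(tab) %o% colSums(tab) / n       # expected counts n_{i+} n_{+j} / n
mi <- sum(ifelse(tab > 0, tab / n * log(tab / exp.n), 0))
g2 <- 2 * n * mi                                 # the G^2 statistic: MI times 2n

# Continuous case: Fisher's Z (Equation 6) and the Gaussian mutual
# information (Equation 7), both functions of the linear correlation.
u <- rnorm(n)
v <- 0.5 * u + rnorm(n)
r <- cor(u, v)
z <- sqrt(n - 3) / 2 * log((1 + r) / (1 - r))    # approximately N(0, 1) under independence
mi.g <- -0.5 * log(1 - r^2)
```

Under the null hypothesis g2 is asymptotically \chi^2 with (R - 1)(C - 1) degrees of freedom, so pchisq(g2, df, lower.tail = FALSE) reproduces the p-value of the asymptotic mi test.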
In the discrete case the following score functions are implemented:

• the likelihood (lik) and log-likelihood (loglik) scores, which are equivalent to the entropy measure used by Weka (Witten and Frank 2005).

• the Akaike (aic) and Bayesian (bic) Information Criterion scores, defined as

AIC = \log L(X_1, ..., X_v) - d,   BIC = \log L(X_1, ..., X_v) - \frac{d}{2} \log n.   (8)

The latter is equivalent to the Minimum Description Length described by Rissanen (1978) and used as a Bayesian network score in Lam and Bacchus (1994).

• the logarithm of the Bayesian Dirichlet equivalent score (bde), a score equivalent Dirichlet posterior density (Heckerman et al. 1995).

• the logarithm of the K2 score (k2), another Dirichlet posterior density (Cooper and Herskovits 1992) defined as

K2 = \prod_{i=1}^{v} K2(X_i),   K2(X_i) = \prod_{j=1}^{L_i} \frac{(R_i - 1)!}{\left( \sum_{k=1}^{R_i} n_{ijk} + R_i - 1 \right)!} \prod_{k=1}^{R_i} n_{ijk}!   (9)

and originally used in the structure learning algorithm of the same name. Unlike the bde score, k2 is not score equivalent.

The only score available for the continuous case is a score equivalent Gaussian posterior density (bge), which follows a Wishart distribution (Geiger and Heckerman 1994).

4.4. Arc whitelisting and blacklisting

Prior information on the data, such as that elicited from experts in the relevant fields, can be integrated into all learning algorithms by means of the whitelist and blacklist arguments. Both of them accept a set of arcs which is guaranteed to be either present (for the former) or missing (for the latter) from the Bayesian network; any arc whitelisted and blacklisted at the same time is assumed to be whitelisted, and is thus removed from the blacklist.
This combination represents a very flexible way to describe any arbitrary set of assumptions on the data, and is also able to deal with partially directed graphs:

• any arc whitelisted in both directions (i.e. both A → B and B → A are whitelisted) is present in the graph, but the choice of its direction is left to the learning algorithm. Therefore one of A → B, B → A and A − B is guaranteed to be in the Bayesian network.

• any arc blacklisted in both directions, as well as the corresponding undirected arc, is never present in the graph. Therefore if both A → B and B → A are blacklisted, A − B is also considered blacklisted.

• any arc whitelisted in one of its possible directions (i.e. A → B is whitelisted, but B → A is not) is guaranteed to be present in the graph in the specified direction. This effectively amounts to blacklisting both the corresponding undirected arc (A − B) and its reverse (B → A).

• any arc blacklisted in one of its possible directions (i.e. A → B is blacklisted, but B → A is not) is never present in the graph. The same holds for A − B, but not for B → A.

5. A simple example

In this section bnlearn will be used to analyze a small data set, learning.test. It is included in the package itself along with other real-world and synthetic data sets, and is used in the example sections throughout the manual pages due to its simple structure.

5.1. Loading the package

bnlearn and its dependencies (the utils package, which is bundled with R) are available from CRAN, as are the suggested packages snow and graph (Gentleman et al. 2010). The other suggested package, Rgraphviz (Gentry et al. 2010), can be installed from BioConductor and is loaded along with bnlearn if present.

> library(bnlearn)
Loading required package: Rgraphviz
Loading required package: graph
Loading required package: grid
Package Rgraphviz loaded successfully.

5.2.
Learning a Bayesian network from data

Once bnlearn is loaded, learning.test itself can be loaded into a data frame of the same name with a call to data.

> data(learning.test)
> str(learning.test)
'data.frame':   5000 obs. of  6 variables:
 $ A: Factor w/ 3 levels "a","b","c": 2 2 1 1 1 3 3 2 2 2 ...
 $ B: Factor w/ 3 levels "a","b","c": 3 1 1 1 1 3 3 2 2 1 ...
 $ C: Factor w/ 3 levels "a","b","c": 2 3 1 1 2 1 2 1 2 2 ...
 $ D: Factor w/ 3 levels "a","b","c": 1 1 1 1 3 3 3 2 1 1 ...
 $ E: Factor w/ 3 levels "a","b","c": 2 2 1 2 1 3 3 2 3 1 ...
 $ F: Factor w/ 2 levels "a","b": 2 2 1 2 1 1 1 2 1 1 ...

learning.test contains six discrete variables, stored as factors, each with 2 (for F) or 3 (for A, B, C, D and E) levels. The structure of the Bayesian network associated with this data set can be learned for example with the Grow-Shrink algorithm, implemented in the gs function, and stored in an object of class bn.

> bn.gs <- gs(learning.test)
> bn.gs

  Bayesian network learned via Constraint-based methods

  model:
    [partially directed graph]
  nodes:                                 6
  arcs:                                  5
    undirected arcs:                     1
    directed arcs:                       4
  average markov blanket size:           2.33
  average neighbourhood size:            1.67
  average branching factor:              0.67

  learning algorithm:                    Grow-Shrink
  conditional independence test:         Mutual Information (discrete)
  alpha threshold:                       0.05
  tests used in the learning procedure:  43
  optimized:                             TRUE

Other constraint-based algorithms return the same partially directed network structure (again as an object of class bn), as can be readily seen with compare.

> bn2 <- iamb(learning.test)
> bn3 <- fast.iamb(learning.test)
> bn4 <- inter.iamb(learning.test)
> compare(bn.gs, bn2)
[1] TRUE
> compare(bn.gs, bn3)
[1] TRUE
> compare(bn.gs, bn4)
[1] TRUE

On the other hand hill-climbing results in a completely directed network, which differs from the previous one because the arc between A and B is directed (A → B instead of A − B).
> bn.hc <- hc(learning.test, score = "aic")
> bn.hc

  Bayesian network learned via Score-based methods

  model:
    [A][C][F][B|A][D|A:C][E|B:F]
  nodes:                                 6
  arcs:                                  5
    undirected arcs:                     0
    directed arcs:                       5
  average markov blanket size:           2.33
  average neighbourhood size:            1.67
  average branching factor:              0.83

  learning algorithm:                    Hill-Climbing
  score:                                 Akaike Information Criterion
  penalization coefficient:              1
  tests used in the learning procedure:  40
  optimized:                             TRUE

> compare(bn.hc, bn.gs)
[1] FALSE

Another way to compare the two network structures is to plot them side by side and highlight the differing arcs. This can be done either with the plot function (see Figure 1):

> par(mfrow = c(1, 2))
> plot(bn.gs, main = "Constraint-based algorithms", highlight = c("A", "B"))
> plot(bn.hc, main = "Hill-Climbing", highlight = c("A", "B"))

or with the more versatile graphviz.plot:

> par(mfrow = c(1, 2))
> highlight.opts <- list(nodes = c("A", "B"), arcs = c("A", "B"),
+     col = "red", fill = "grey")
> graphviz.plot(bn.hc, highlight = highlight.opts)
> graphviz.plot(bn.gs, highlight = highlight.opts)

Figure 1: side by side comparison of the Bayesian network structures learned by constraint-based (gs, iamb, fast.iamb and inter.iamb) and score-based (hc) algorithms. Arcs which differ between the two network structures are plotted in red.

graphviz.plot produces a better output for large graphs thanks to the functionality provided by the Rgraphviz package. The network structure learned by gs, iamb, fast.iamb and inter.iamb is equivalent to the one learned by hc; changing the arc A − B to either A → B or to B → A results in networks with the same score because of the score equivalence property (which holds for all the implemented score functions with the exception of k2).
Therefore if there is any prior information about the relationship between A and B, the appropriate direction can be whitelisted (or its reverse can be blacklisted, which is equivalent in this case).

> bn.AB <- gs(learning.test, blacklist = c("B", "A"))
> compare(bn.AB, bn.hc)
[1] TRUE
> score(bn.AB, learning.test, type = "bde")
[1] -24002.36

> bn.BA <- gs(learning.test, blacklist = c("A", "B"))
> score(bn.BA, learning.test, type = "bde")
[1] -24002.36

5.3. Network analysis and manipulation

The structure of a Bayesian network is uniquely specified if its graph is completely directed. In that case it can be represented as a string with the modelstring function

> modelstring(bn.hc)
[1] "[A][C][F][B|A][D|A:C][E|B:F]"

whose output is also included in the print method for objects of class bn. Each node is printed in square brackets along with all its parents (which are reported after a pipe as a colon-separated list), and its position in the string depends on the partial ordering defined by the network structure. The same syntax is used in deal (Boettcher and Dethlefsen 2003), an R package for learning Bayesian networks from mixed data.

Partially directed graphs can be transformed into completely directed ones with the set.arc, drop.arc and reverse.arc functions. For example the direction of the arc A − B in the bn.gs object can be set to A → B, so that the resulting network structure is identical to the one learned by the hill-climbing algorithm.

> undirected.arcs(bn.gs)
     from to
[1,] "A"  "B"
[2,] "B"  "A"
> bn.dag <- set.arc(bn.gs, "A", "B")
> modelstring(bn.dag)
[1] "[A][C][F][B|A][D|A:C][E|B:F]"
> compare(bn.dag, bn.hc)
[1] TRUE

Acyclicity is always preserved, as these commands return an error if the requested changes would result in a cyclic graph.
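A cycle check of this kind only needs the adjacency matrix of the graph. The following base R sketch (a hypothetical stand-alone illustration using Kahn's topological sorting algorithm, not bnlearn's own code) tests whether a directed graph is acyclic:

```r
# Return TRUE if the directed graph given as an adjacency matrix
# (amat[i, j] == 1 means an arc i -> j) is acyclic: repeatedly remove
# nodes with no incoming arcs; any leftover nodes must lie on a cycle.
is.acyclic <- function(amat) {
  remaining <- rep(TRUE, nrow(amat))
  repeat {
    # nodes with no incoming arcs from the remaining nodes
    roots <- remaining & colSums(amat[remaining, , drop = FALSE]) == 0
    if (!any(roots)) break
    remaining[roots] <- FALSE
    amat[roots, ] <- 0      # drop their outgoing arcs as well
  }
  !any(remaining)
}

dag <- matrix(0, 3, 3); dag[1, 2] <- dag[2, 3] <- 1   # A -> B -> C
cyc <- dag; cyc[3, 1] <- 1                            # adding C -> A creates a cycle
is.acyclic(dag)  # TRUE
is.acyclic(cyc)  # FALSE
```

In practice this is never needed by hand: set.arc and the related functions perform the check automatically, and the acyclic and amat functions expose the same information for any bn object.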
> set.arc(bn.hc, "E", "A")
Error in arc.operations(x = x, from = from, to = to, op = "set",
  check.cycles = check.cycles, : the resulting graph contains cycles.

Further information on the network structure can be extracted from any bn object with the following functions:

• whether the network structure is acyclic (acyclic) or completely directed (directed);

• the labels of the nodes (nodes), of the root nodes (root.nodes) and of the leaf nodes (leaf.nodes);

• the directed arcs of the network (directed.arcs), the undirected ones (undirected.arcs) or both of them (arcs);

• the adjacency matrix (amat) and the number of parameters (nparams) associated with the network structure;

• the parents (parents), children (children), Markov blanket (mb), and neighbourhood (nbr) of each node.

The arcs, amat and modelstring functions can also be used in combination with empty.graph to create a bn object with a specific structure from scratch:

> other <- empty.graph(nodes = nodes(bn.hc))
> arcs(other) <- data.frame(
+     from = c("A", "A", "B", "D"),
+     to = c("E", "F", "C", "E"))
> other

  Randomly generated Bayesian network

  model:
    [A][B][D][C|B][E|A:D][F|A]
  nodes:                                 6
  arcs:                                  4
    undirected arcs:                     0
    directed arcs:                       4
  average markov blanket size:           1.67
  average neighbourhood size:            1.33
  average branching factor:              0.67

  generation algorithm:                  Empty

This is particularly useful to compare different network structures for the same data, for example to verify the goodness of fit of the learned network with respect to a particular score function.

> score(other, data = learning.test, type = "aic")
[1] -28019.79
> score(bn.hc, data = learning.test, type = "aic")
[1] -23873.13

5.4. Debugging utilities and diagnostics

Many of the functions of the bnlearn package are able to print additional diagnostic messages if called with the debug argument set to TRUE.
This is especially useful to study the behaviour of the learning algorithms in specific settings and to investigate anomalies in their results (which may be due to an insufficient sample size for the asymptotic distribution of the tests to be valid, for example). For example the debugging output of the call to gs previously used to produce the bn.gs object reports the exact sequence of conditional independence tests performed by the learning algorithm, along with the effects of the backtracking optimizations (some parts are omitted for brevity).

> gs(learning.test, debug = TRUE)
----------------------------------------------------------------
* learning markov blanket of A .
  * checking node B for inclusion.
    > node B included in the markov blanket ( p-value: 0 ).
    > markov blanket now is ' B '.
  * checking node C for inclusion.
    > A indep. C given ' B ' ( p-value: 0.8743202 ).
  * checking node D for inclusion.
    > node D included in the markov blanket ( p-value: 0 ).
    > markov blanket now is ' B D '.
  * checking node E for inclusion.
    > A indep. E given ' B D ' ( p-value: 0.5193303 ).
  * checking node F for inclusion.
    > A indep. F given ' B D ' ( p-value: 0.07368042 ).
  * checking node C for inclusion.
    > node C included in the markov blanket ( p-value: 1.023194e-254 ).
    > markov blanket now is ' B D C '.
  * checking node E for inclusion.
    > A indep. E given ' B D C ' ( p-value: 0.5091863 ).
  * checking node F for inclusion.
    > A indep. F given ' B D C ' ( p-value: 0.3318902 ).
  * checking node B for exclusion (shrinking phase).
    > node B remains in the markov blanket. ( p-value: 9.224694e-291 )
  * checking node D for exclusion (shrinking phase).
    > node D remains in the markov blanket. ( p-value: 0 )
----------------------------------------------------------------
* learning markov blanket of B .
[...]
----------------------------------------------------------------
* learning markov blanket of F .
  * known good (backtracking): ' B E '.
  * known bad (backtracking): ' A C D '.
  * nodes still to be tested for inclusion: ' '.
----------------------------------------------------------------
* checking consistency of markov blankets.
[...]
----------------------------------------------------------------
* learning neighbourhood of A .
  * blacklisted nodes: ' '
  * whitelisted nodes: ' '
  * starting with neighbourhood: ' B D C '
  * checking node B for neighbourhood.
    > dsep.set = ' E F '
    > trying conditioning subset ' '.
    > node B is still a neighbour of A . ( p-value: 0 )
    > trying conditioning subset ' E '.
    > node B is still a neighbour of A . ( p-value: 0 )
    > trying conditioning subset ' F '.
    > node B is still a neighbour of A . ( p-value: 0 )
    > trying conditioning subset ' E F '.
    > node B is still a neighbour of A . ( p-value: 0 )
  * checking node D for neighbourhood.
    > dsep.set = ' C '
    > trying conditioning subset ' '.
    > node D is still a neighbour of A . ( p-value: 0 )
    > trying conditioning subset ' C '.
    > node D is still a neighbour of A . ( p-value: 0 )
  * checking node C for neighbourhood.
    > dsep.set = ' D '
    > trying conditioning subset ' '.
    > node C is not a neighbour of A . ( p-value: 0.8598334 )
----------------------------------------------------------------
* learning neighbourhood of B .
[...]
----------------------------------------------------------------
* learning neighbourhood of F .
  * blacklisted nodes: ' '
  * whitelisted nodes: ' '
  * starting with neighbourhood: ' E '
  * known good (backtracking): ' E '.
  * known bad (backtracking): ' A B C D '.
----------------------------------------------------------------
* checking consistency of neighbourhood sets.
[...]
----------------------------------------------------------------
* v-structures centered on D .
  * checking A -> D <- C
    > chosen d-separating set: ' '
    > testing A vs C given D ( 0 )
    @ detected v-structure A -> D <- C
----------------------------------------------------------------
* v-structures centered on E .
  * checking B -> E <- F
    > chosen d-separating set: ' '
    > testing B vs F given E ( 1.354269e-50 )
    @ detected v-structure B -> E <- F
----------------------------------------------------------------
* v-structures centered on F .
----------------------------------------------------------------
* applying v-structure A -> D <- C ( 0.000000e+00 )
* applying v-structure B -> E <- F ( 1.354269e-50 )
----------------------------------------------------------------
* detecting cycles ...
[...]
----------------------------------------------------------------
* propagating directions for the following undirected arcs:
[...]

Other functions which provide useful diagnostics include (but are not limited to) compare (which reports the differences between two network structures with respect to arcs, parents and children for each node), and nparams and score (which report the number of parameters and the contribution of each node to the network score, respectively).

6. Practical examples

6.1. The ALARM network

The ALARM ("A Logical Alarm Reduction Mechanism") network from Beinlich et al. (1989) is a Bayesian network designed to provide an alarm message system for patient monitoring. It has been widely used (see for example Tsamardinos et al. (2006) and Friedman et al. (1999)) as a case study to evaluate the performance of new structure learning algorithms. The alarm data set includes a sample of size 20000 generated from this network, which contains 37 discrete variables (with two to four levels each) and 46 arcs. Every learning algorithm implemented in bnlearn (except mmpc) is capable of recovering the ALARM network to within a few arcs and arc directions (see Figure 2).
> alarm.gs <- gs(alarm)
> alarm.iamb <- iamb(alarm)
> alarm.fast.iamb <- fast.iamb(alarm)
> alarm.inter.iamb <- inter.iamb(alarm)
> alarm.mmpc <- mmpc(alarm)
> alarm.hc <- hc(alarm, score = "bic")

The number of conditional independence tests, which provides an implementation-independent performance indicator, is similar across all constraint-based algorithms (see Table 1); the same holds for the number of network score comparisons performed by hc, even though the learned network has about ten more arcs than the other ones.

The quality of the learned network improves significantly if the Monte Carlo versions of the tests are used instead of the parametric ones, as the probability structure of the ALARM network results in many sparse contingency tables. For example, a side-by-side comparison of the two versions of Pearson's X^2 test shows that the use of the nonparametric tests leads to the correct identification of all but five arcs, instead of the 12 missed with the parametric tests.

                              gs       iamb  fast.iamb  inter.iamb        hc
independence tests /
network comparisons         1727       2874       2398        3106      2841
learned arcs                  42         43         45          43        53
(directed/undirected)    (29/13)    (29/14)    (32/13)     (30/13)    (53/0)
execution time          13.54360   17.63735   14.80890    18.59825  72.38705

Table 1: Performance of the implemented learning algorithms on the alarm data set, measured in the number of conditional independence tests (for constraint-based algorithms) or network score comparisons (for score-based algorithms), the number of learned arcs and the execution time on an Intel Core 2 Duo machine with 1GB of RAM.
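The diagnostics in Table 1 can be complemented with the utility functions described in the previous section. A minimal sketch, assuming bnlearn and its alarm data set are available (the object names follow the listing above):

```r
library(bnlearn)

data(alarm)
# Learn two network structures as in the listing above.
alarm.gs <- gs(alarm)
alarm.hc <- hc(alarm, score = "bic")

# Differences between the two learned structures with respect to
# arcs, parents and children.
compare(alarm.gs, alarm.hc)

# Network score and number of parameters of the hc model; hc always
# returns a completely directed graph, so score() is well defined.
score(alarm.hc, alarm, type = "bic")
nparams(alarm.hc, alarm)
```

The exact output of compare depends on the package version; the sketch only illustrates how the diagnostics fit together.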
Figure 2: The ALARM data set: the original network structure (top left) and the network structures learned by gs (top right), inter.iamb (bottom left) and hc (bottom right).
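The Monte Carlo tests mentioned above replace the asymptotic null distribution of a test statistic with one estimated by permutation. The following base R sketch illustrates the general idea; it is a simplified illustration, not bnlearn's actual "mc-x2" implementation, and perm.x2.test is a hypothetical helper name:

```r
# Permutation version of Pearson's X^2 test of independence: compare the
# observed statistic with B statistics computed after permuting one of the
# two variables, which destroys any dependence between them.
perm.x2.test <- function(x, y, B = 1000) {

  x2 <- function(a, b)
    suppressWarnings(chisq.test(a, b, correct = FALSE)$statistic)

  observed <- x2(x, y)
  # Count the permuted statistics at least as large as the observed one;
  # adding one to numerator and denominator avoids zero p-values, as is
  # customary for permutation tests.
  exceed <- sum(replicate(B, x2(x, sample(y))) >= observed)
  (exceed + 1) / (B + 1)

}

# Two independent binary variables: the permutation p-value should not be
# systematically small.
set.seed(42)
x <- factor(sample(c("a", "b"), 100, replace = TRUE))
y <- factor(sample(c("u", "v"), 100, replace = TRUE))
p <- perm.x2.test(x, y, B = 200)
```

Unlike the asymptotic test, this construction makes no distributional assumption, which is why it behaves better on the sparse contingency tables produced by the ALARM network.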
> dag <- empty.graph(names(alarm))
> modelstring(dag) <- paste("[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]",
+   "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA][HRSA|ERCA:HR]",
+   "[ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK][MINV|INT:VLNG][FIO2]",
+   "[PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB][SHNT|INT:PMB][INT]",
+   "[PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS][VTUB|DISC:VMCH]",
+   "[VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV][CCHL|ACO2:ANES:SAO2:TPR]",
+   "[HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "")
> alarm.gs <- gs(alarm, test = "x2")
> alarm.mc <- gs(alarm, test = "mc-x2", B = 10000)
> par(mfrow = c(1, 2), omi = rep(0, 4), mar = c(1, 0, 1, 0))
> graphviz.plot(dag, highlight = list(arcs = arcs(alarm.gs)))
> graphviz.plot(dag, highlight = list(arcs = arcs(alarm.mc)))

Figure 3: Side-by-side comparison of the network structures learned from the alarm data set by the Grow-Shrink algorithm with the parametric (on the left) and nonparametric (on the right) versions of Pearson's X^2 test. The arcs of the true network structure present in each case are highlighted in red.

6.2. The examination marks data set

The marks data set is a small data set studied in Mardia et al. (1979), Whittaker (1990) and Edwards (2000). It contains five continuous variables, the examination marks of 88 students in five different subjects (mechanics, vectors, algebra, analysis and statistics).
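The conditional independence tests used in the following analysis are based on partial correlations. As a reference for the output of ci.test, the first-order partial correlation and the associated t test can be computed directly in base R; pcor3 and pcor3.test are hypothetical helper names used only for illustration, not part of bnlearn:

```r
# First-order partial correlation of x and y given a single conditioning
# variable z, computed with the standard recursive formula.
pcor3 <- function(x, y, z) {
  rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

# Two-sided t test for a zero partial correlation; with n observations and
# one conditioning variable the test has n - 3 degrees of freedom, which
# matches the df = 85 reported by ci.test for the 88 students below.
pcor3.test <- function(x, y, z) {
  r <- pcor3(x, y, z)
  df <- length(x) - 3
  t <- r * sqrt(df) / sqrt(1 - r^2)
  2 * pt(-abs(t), df)
}

# Synthetic example: x and y are correlated only through z, so their
# partial correlation given z should be close to zero.
set.seed(1)
z <- rnorm(88); x <- z + rnorm(88); y <- z + rnorm(88)
p <- pcor3.test(x, y, z)
```

This is exactly the kind of test statistic that the parametric "cor" test in ci.test relies on.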
> data(marks)
> str(marks)
'data.frame':   88 obs. of  5 variables:
 $ MECH: num 77 63 75 55 63 53 51 59 62 64 ...
 $ VECT: num 82 78 73 72 63 61 67 70 60 72 ...
 $ ALG : num 67 80 71 63 65 72 65 68 58 60 ...
 $ ANL : num 67 70 66 70 70 64 65 62 62 62 ...
 $ STAT: num 81 81 81 68 63 73 68 56 70 45 ...

The purpose of the analysis was to find a suitable way to combine or average these marks. Since they are obviously correlated, the exact weights they are assigned depend on the estimated dependence structure of the data. Under the assumption of multivariate normality this analysis requires the examination of the partial correlation coefficients, some of which are clearly not significant:

Figure 4: Graphical models learned from the marks data set. From left to right: the network learned by mmpc (which is identical to the Gaussian graphical model in Edwards (2000)), the one learned by the other constraint-based algorithms and the one learned by hc.

> ci.test("MECH", "ANL", "ALG", data = marks)

        Pearson's Linear Correlation
data: MECH ~ ANL | ALG
cor = 0.0352, df = 85, p-value = 0.7459
alternative hypothesis: true value is not equal to 0

> ci.test("STAT", "VECT", "ALG", data = marks)

        Pearson's Linear Correlation

data: STAT ~ VECT | ALG
cor = 0.0527, df = 85, p-value = 0.628
alternative hypothesis: true value is not equal to 0

This is confirmed by the other conditional independence tests, both parametric and nonparametric; for example:

> ci.test("STAT", "VECT", "ALG", data = marks, test = "zf")$p.value
[1] 0.6289112
> ci.test("STAT", "VECT", "ALG", data = marks, test = "mc-cor")$p.value
[1] 0.6332
> ci.test("STAT", "VECT", "ALG", data = marks, test = "mi-g")$p.value
[1] 0.6209406
> ci.test("STAT", "VECT", "ALG", data = marks, test = "mc-mi-g")$p.value
[1] 0.6226

All learning algorithms result in very similar network structures, which agree up to the direction of the arcs (see Figure 4). In all models the marks for analysis and statistics are conditionally independent from the ones for mechanics and vectors, given algebra. The structure of the graph suggests that the latter is essential in the overall evaluation of the examination.

7. Other packages for learning Bayesian networks

There exist other packages in R which are able to either learn the structure of a Bayesian network or fit and manipulate its parameters. Some examples are pcalg, which implements the PC algorithm and focuses on the causal interpretation of Bayesian networks; deal, which implements a hill-climbing search for mixed data; and the suite composed of gRbase (Højsgaard et al. 2010), gRain (Højsgaard 2010) and gRc (Højsgaard and Lauritzen 2008), which implements various exact and approximate inference procedures. However, none of these packages is as versatile as bnlearn for learning the structure of Bayesian networks.
deal and pcalg each implement a single learning algorithm, even though they are able to handle both discrete and continuous data. Furthermore, the PC algorithm has poor performance in terms of speed and accuracy compared to newer constraint-based algorithms such as Grow-Shrink and IAMB (Tsamardinos et al. 2003). bnlearn also offers a wider selection of network scores and conditional independence tests; in particular, it is the only R package able to learn the structure of Bayesian networks using permutation tests, which are superior to the corresponding asymptotic tests at low sample sizes.

8. Conclusions

bnlearn is an R package which provides a free implementation of some Bayesian network structure learning algorithms that have appeared in recent literature, enhanced with algorithmic optimizations and support for parallel computing. Many score functions and conditional independence tests are provided, both for independent use and for use within the learning algorithms themselves.

bnlearn is designed to provide the versatility needed to handle experimental data analysis. It handles both discrete and continuous data, and it supports any combination of the implemented learning algorithms with either network scores (for score-based algorithms) or conditional independence tests (for constraint-based algorithms). Furthermore, it simplifies the analysis of the learned networks by providing a single object class (bn) for all the algorithms and a set of utility functions to perform descriptive statistics and basic inference procedures.

Acknowledgements

Many thanks to Prof. Adriana Brogini, my supervisor at the Ph.D. School in Statistical Sciences (University of Padova), for proofreading this article and giving many useful comments and suggestions.
I would also like to thank Radhakrishnan Nagarajan (University of Arkansas for Medical Sciences) and Suhaila Zainudin (Universiti Teknologi Malaysia) for their support, which encouraged me in the development of this package.

References

Acid S, de Campos LM, Fernandez-Luna J, Rodriguez S, Rodriguez J, Salcedo J (2004). "A Comparison of Learning Algorithms for Bayesian Networks: A Case Study Based on Data from An Emergency Medical Service." Artificial Intelligence in Medicine, 30, 215–232.

Bach FR, Jordan MI (2003). "Learning Graphical Models with Mercer Kernels." In "Advances in Neural Information Processing Systems (NIPS) 15," pp. 1009–1016. MIT Press.

Beinlich I, Suermondt HJ, Chavez RM, Cooper GF (1989). "The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks." In "Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine," pp. 247–256. Springer-Verlag. URL http://www.cs.huji.ac.il/labs/compbio/Repository/Datasets/alarm/alarm.htm.

Boettcher SG, Dethlefsen C (2003). "deal: A Package for Learning Bayesian Networks." Journal of Statistical Software, 8(20), 1–40. ISSN 1548-7660. URL http://www.jstatsoft.org/v08/i20.

Chickering DM (1995). "A Transformational Characterization of Equivalent Bayesian Network Structures." In "UAI '95: Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence," pp. 87–98. Morgan Kaufmann.

Chickering DM (2002). "Optimal Structure Identification with Greedy Search." Journal of Machine Learning Research, 3, 507–554.

Cooper GF, Herskovits E (1992). "A Bayesian Method for the Induction of Probabilistic Networks from Data." Machine Learning, 9(4), 309–347.

Daly R, Shen Q (2007).
"Methods to Accelerate the Learning of Bayesian Network Structures." In "Proceedings of the 2007 UK Workshop on Computational Intelligence," Imperial College, London.

Edwards DI (2000). Introduction to Graphical Modelling. Springer.

Friedman N, Linial M, Nachman I (2000). "Using Bayesian Networks to Analyze Expression Data." Journal of Computational Biology, 7, 601–620.

Friedman N, Pe'er D, Nachman I (1999). "Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm." In "Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI)," pp. 206–221. Morgan Kaufmann.

Geiger D, Heckerman D (1994). "Learning Gaussian Networks." Technical report, Microsoft Research, Redmond, Washington. Available as Technical Report MSR-TR-94-10.

Gentleman R, Whalen E, Huber W, Falcon S (2010). graph: A Package to Handle Graph Data Structures. R package version 1.26.0.

Gentry J, Long L, Gentleman R, Falcon S, Hahne F, Sarkar D (2010). Rgraphviz: Provides Plotting Capabilities for R Graph Objects. R package version 1.26.0.

Good P (2005). Permutation, Parametric and Bootstrap Tests of Hypotheses. Springer, 3rd edition.

Heckerman D, Geiger D, Chickering DM (1995). "Learning Bayesian Networks: The Combination of Knowledge and Statistical Data." Machine Learning, 20(3), 197–243. Available as Technical Report MSR-TR-94-09.

Højsgaard S (2010). gRain: Graphical Independence Networks. R package version 0.8.5.

Højsgaard S, Dethlefsen C, Bowsher C (2010). gRbase: A Package for Graphical Modelling in R. R package version 1.3.4.

Højsgaard S, Lauritzen SL (2008). gRc: Inference in Graphical Gaussian Models with Edge and Vertex Symmetries. R package version 0.2.2.

Holmes DE, Jain LC (eds.) (2008).
Innovations in Bayesian Networks: Theory and Applications, volume 156 of Studies in Computational Intelligence. Springer.

Kalisch M, Bühlmann P (2007). "Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm." Journal of Machine Learning Research, 8, 613–636.

Korb K, Nicholson A (2004). Bayesian Artificial Intelligence. Chapman and Hall.

Kullback S (1959). Information Theory and Statistics. Wiley.

Lam W, Bacchus F (1994). "Learning Bayesian Belief Networks: An Approach Based on the MDL Principle." Computational Intelligence, 10, 269–293.

Legendre P (2000). "Comparison of Permutation Methods for the Partial Correlation and Partial Mantel Tests." Journal of Statistical Computation and Simulation, 67, 37–73.

Mardia KV, Kent JT, Bibby JM (1979). Multivariate Analysis. Academic Press.

Margaritis D (2003). Learning Bayesian Network Model Structure from Data. Ph.D. thesis, School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA. Available as Technical Report CMU-CS-03-153.

Moore A, Wong W (2003). "Optimal Reinsertion: A New Search Operator for Accelerated and More Accurate Bayesian Network Structure Learning." In "Proceedings of the 20th International Conference on Machine Learning (ICML '03)," pp. 552–559. AAAI Press.

Neapolitan RE (2003). Learning Bayesian Networks. Prentice Hall.

Pearl J (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

R Development Core Team (2009). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Rissanen J (1978). "Modeling by Shortest Data Description." Automatica, 14, 465–471.

Spirtes P, Glymour C, Scheines R (2001). Causation, Prediction and Search. MIT Press.

Tierney L, Rossini AJ, Li N, Sevcikova H (2008).
snow: Simple Network of Workstations. R package version 0.3-3.

Tsamardinos I, Aliferis CF, Statnikov A (2003). "Algorithms for Large Scale Markov Blanket Discovery." In "Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference," pp. 376–381. AAAI Press.

Tsamardinos I, Brown LE, Aliferis CF (2006). "The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm." Machine Learning, 65(1), 31–78.

Verma TS, Pearl J (1991). "Equivalence and Synthesis of Causal Models." Uncertainty in Artificial Intelligence, 6, 255–268.

Whittaker J (1990). Graphical Models in Applied Multivariate Statistics. Wiley.

Witten IH, Frank E (2005). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2nd edition.

Yaramakala S, Margaritis D (2005). "Speculative Markov Blanket Discovery for Optimal Feature Selection." In "ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining," pp. 809–812. IEEE Computer Society, Washington, DC, USA.

Affiliation:
Marco Scutari
Department of Statistical Sciences
University of Padova
Via Cesare Battisti 241, 35121 Padova, Italy
E-mail: marco.scutari@stat.unipd.it

Journal of Statistical Software
http://www.jstatsoft.org/
published by the American Statistical Association
http://www.amstat.org/
Volume VV, Issue II, MMMMMM YYYY
Submitted: yyyy-mm-dd
Accepted: yyyy-mm-dd