cyclinbayes: Bayesian Causal Discovery with Linear Non-Gaussian Directed Acyclic and Cyclic Graphical Models
We introduce cyclinbayes, an open-source R package for discovering linear causal relationships with both acyclic and cyclic structures. The package employs scalable Bayesian approaches with spike-and-slab priors to learn directed acyclic graphs (DAGs…
Authors: Robert Lee, Raymond K. W. Wong, Yang Ni
Journal of Machine Learning Research 23 (2026) 1-5. Submitted 2/26; Revised ?/??; Published ?/??

cyclinbayes: Bayesian Causal Discovery with Linear Non-Gaussian Directed Acyclic and Cyclic Graphical Models

Robert D. Lee (robderdylee@stat.tamu.edu), Department of Statistics, Texas A&M University, College Station, TX 77843, USA.
Raymond K. W. Wong (raywong@tamu.edu), Department of Statistics, Texas A&M University, College Station, TX 77843, USA.
Yang Ni (yang.ni@austin.utexas.edu), Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX 78705, USA.

Editor: My editor

Abstract

We introduce cyclinbayes, an open-source R package for discovering linear causal relationships with both acyclic and cyclic structures. The package employs scalable Bayesian approaches with spike-and-slab priors to learn directed acyclic graphs (DAGs) and directed cyclic graphs (DCGs) under non-Gaussian noise. A central feature of cyclinbayes is comprehensive uncertainty quantification, including posterior edge inclusion probabilities, posterior probabilities of network motifs, and posterior probabilities over entire graph structures. Our implementation addresses two limitations in existing software: (1) while methods for linear non-Gaussian DAG learning are available in R and Python, they generally lack proper uncertainty quantification, and (2) reliable implementations for linear non-Gaussian DCG learning remain scarce. The package implements computationally efficient hybrid MCMC algorithms that scale to large datasets. Beyond uncertainty quantification, we propose a new decision-theoretic approach to summarize posterior samples of graphs, yielding principled point estimates based on posterior expected loss, such as the posterior expected structural Hamming distance and structural intervention distance.
The package, supplementary material, and a tutorial are available on GitHub at https://github.com/roblee01/cyclinbayes.

Keywords: Bayesian structure learning, directed acyclic and cyclic graphs, decision-theoretic graph selection, hybrid MCMC sampler

© 2026 Robert D. Lee, Raymond K. W. Wong and Yang Ni. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v23/21-0000.html.

1 Introduction

Causal discovery in high-dimensional settings has gained significant attention in recent years. A central line of work is based on the non-Gaussian noise assumption, as in LiNGAM and its variants (Shimizu et al., 2006, 2011) for directed acyclic graphs (DAGs), which are available in R packages such as pcalg (Kalisch et al., 2012) and rlingam (Kikuchi, 2025), and LiNG (Lacerda et al., 2008) for directed cyclic graphs (DCGs). However, these methods provide only point estimates of causal graphs, offering no mechanism for structural or parameter uncertainty quantification. Consequently, users receive limited guidance on the reliability of the inferred graph or causal effects. In particular, while many tools support non-Gaussian linear DAG discovery, software for linear non-Gaussian DCGs (feedback or reciprocal effects) is much more limited, often requiring stronger modeling restrictions (e.g., Gaussian errors or additional constraints). Moreover, publicly available software for LiNG-style cyclic discovery is limited, and some earlier implementations are no longer actively maintained. While a Bayesian framework can address these limitations, summarizing the posterior over graph structures is itself challenging: the posterior often contains many nearly equivalent graphs, making it nontrivial to identify a single representative structure in a principled way.
Therefore, we introduce cyclinbayes, an open-source R package for Bayesian causal discovery in high-dimensional settings. The package supports learning both DAGs and DCGs. Its key contributions are: (i) an MCMC algorithm for linear non-Gaussian DAGs (Bayesian LiNGAM), (ii) an MCMC algorithm for linear non-Gaussian DCGs (Bayesian LiNG), (iii) comprehensive posterior uncertainty quantification, including edge inclusion probabilities, network motif inclusion probabilities, and credible intervals for direct causal effects, and (iv) a decision-theoretic graph selection procedure based on weighted medoids under the structural Hamming distance (SHD), the structural intervention distance (SID), or any user-defined loss function. Implemented in Rcpp, cyclinbayes uses optimized C++ routines to handle large, high-dimensional datasets efficiently.

2 Model Specification

Consider $p$ random variables and $n$ observations. Let $Y_i^{(q)}$ denote the $i$th variable of observation $q$. We model them with a linear structural equation model (SEM) with non-Gaussian noise,

\[ Y_i^{(q)} = \sum_{j \in \mathrm{pa}(i)} B_{ij} Y_j^{(q)} + \epsilon_i^{(q)}, \tag{1} \]

where $\mathrm{pa}(i)$ denotes the parent set of node $i$ in a graph $\mathcal{G}$, the entries of $B = (B_{ij})_{i,j=1}^{p}$ are the direct causal effects, and $\epsilon_i^{(q)}$ is a noise variable drawn from a finite mixture of Gaussians. If the underlying graph $\mathcal{G}$ is acyclic, the SEM is recursive.

To facilitate sparse structure learning and interpretable posterior inference, we place hierarchical priors on both the graph adjacency indicators and the causal effect coefficients. Edge inclusion is governed by a Beta-Bernoulli prior on the adjacency indicators $E_{ij}$ (where $E_{ij} = 1$ if $j \to i$),

\[ E_{ij} \mid \gamma \sim \mathrm{Bernoulli}(\gamma), \qquad \gamma \sim \mathrm{Beta}(a_\gamma, b_\gamma), \]

where $\gamma$ is the prior probability of edge inclusion.
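As a rough illustration of the generative model so far, the sketch below draws edge indicators from the Beta-Bernoulli prior and simulates data from the SEM in (1) with Gaussian-mixture noise. It is written in plain Python rather than R and does not use the cyclinbayes API. The function names, the two-component mixture, the coefficient value 0.8, and the restriction to a lower-triangular (hence acyclic) graph are all assumptions made for this example.

```python
import random

def draw_edge_indicators(p, a_gamma, b_gamma, rng):
    """Draw gamma ~ Beta(a_gamma, b_gamma), then E[i][j] ~ Bernoulli(gamma).

    Only entries with j < i are drawn, so the variable order 0..p-1 is a
    topological order and the graph is acyclic (a simplification assumed here).
    """
    gamma = rng.betavariate(a_gamma, b_gamma)
    E = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i):
            E[i][j] = 1 if rng.random() < gamma else 0  # E[i][j] = 1 means j -> i
    return gamma, E

def mixture_noise(rng, weights=(0.5, 0.5), means=(-1.0, 1.0), sds=(0.5, 0.5)):
    """One draw from a finite mixture of Gaussians (illustrative parameters)."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sds[k])

def simulate_sem(B, n, rng):
    """Simulate n observations from Y_i = sum_j B[i][j] * Y_j + eps_i.

    Assumes B is strictly lower triangular, so each Y_i depends only on
    earlier variables and can be generated in index order.
    """
    p = len(B)
    data = []
    for _ in range(n):
        y = [0.0] * p
        for i in range(p):
            y[i] = sum(B[i][j] * y[j] for j in range(i)) + mixture_noise(rng)
        data.append(y)
    return data

rng = random.Random(0)
gamma, E = draw_edge_indicators(p=4, a_gamma=1.0, b_gamma=1.0, rng=rng)
# Illustrative coefficients: 0.8 on every included edge, 0 elsewhere.
B = [[0.8 * E[i][j] for j in range(4)] for i in range(4)]
Y = simulate_sem(B, n=200, rng=rng)
```

Here `Y` is a list of 200 observations on 4 variables. For a cyclic graph the recursion in `simulate_sem` no longer applies, and each observation would instead be obtained by solving the linear system implied by (1).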
Conditional on $E_{ij}$, the causal effect $B_{ij}$ follows a spike-and-slab prior, with a point mass at zero for excluded edges and a Gaussian slab with variance $\gamma_1$ (itself given an inverse gamma prior) for included edges,

\[ B_{ij} \mid E_{ij}, \gamma_1 \sim (1 - E_{ij})\,\delta_0 + E_{ij}\, N(0, \gamma_1), \qquad \gamma_1 \sim \mathrm{InverseGamma}(a_{\gamma_1}, b_{\gamma_1}). \]

This prior is conjugate when the graph is acyclic.

3 Package Implementation

Our package cyclinbayes provides two fast Bayesian samplers for DAGs and DCGs based on the model in Section 2, along with tools for posterior uncertainty quantification and analysis. Both samplers use MCMC to update the graph structure and causal coefficients, and are implemented in Rcpp for speed. Figure 1 outlines the workflow: run a sampler, then compute a graph point estimate, credible intervals for causal effects, and posterior probabilities for user-specified network motifs. The major functions are:

1. BayesDAG() implements Bayesian LiNGAM for learning acyclic causal structures by restricting the graph in (1) to be a DAG. The algorithm uses a collapsed Gibbs sampler in which the causal effect coefficients are marginalized out to improve mixing, together with simulated annealing to mitigate local optima. Given a data matrix and prior hyperparameters, the function returns posterior samples of the adjacency matrix and all associated model parameters.

2. BayesDCG() implements Bayesian LiNG for DCG structure learning using a Gibbs-within-Metropolis algorithm. Input and output are similar to BayesDAG().

3. point_est_graph() computes a decision-theoretic posterior point estimate of the adjacency matrix from posterior samples under a chosen distance metric (SHD, SID, or a user-specified distance). SID is only applicable to DAGs.

4. posterior_interval_est() computes highest posterior density (HPD) and equal-tailed credible intervals (at a user-chosen level) for model parameters such as direct causal effects.

5. posterior_network_motif() computes the posterior probability mass of a user-specified network motif (i.e., a subgraph on a specific set of nodes) by checking how often all its edges appear simultaneously in the posterior graph samples.

[Figure 1 (flowchart): Run a sampler, BayesDAG() or BayesDCG(), then any of: posterior_interval_est() (HPD / equal-tailed intervals); point_est_graph() (SHD / SID (DAG only) / custom); posterior_network_motif() (relative frequency of the network motif).]

Figure 1: Typical cyclinbayes workflow: run a sampler, then perform graph selection, interval estimation, and motif posterior analysis.

Decision-Theoretical Approach for Graph Selection. Selecting a single graph from a Bayesian posterior is challenging because the posterior over graph structures is often highly multimodal. Standard summaries, such as the maximum a posteriori (MAP) graph or edge-wise thresholding, either focus on a single mode or ignore global structural constraints (e.g., the resulting graph may not even be a DAG). We therefore adopt a Bayesian decision-theoretic framework in which the graph point estimator minimizes the posterior expected loss under a chosen graph discrepancy:

\[ a^* = \arg\min_{a} E[\, d(a, \mathcal{G}) \mid \text{data} \,] \approx \arg\min_{a} \frac{1}{m} \sum_{h=1}^{m} d(a, \mathcal{G}^{(h)}), \tag{2} \]

where the expectation is the posterior expectation with respect to $\mathcal{G}$, which is approximated by Monte Carlo via the posterior samples $\{\mathcal{G}^{(1)}, \dots, \mathcal{G}^{(m)}\}$. Because the space of possible graphs is vast, we restrict the minimization in (2) to the sampled graphs themselves. This naturally leads to the posterior weighted medoid, which selects the graph minimizing expected loss among the sampled structures.

More specifically, let $\{\mathcal{G}^{*(1)}, \dots, \mathcal{G}^{*(v)}\}$ denote the set of unique graph structures among the $m$ posterior samples. Each unique graph $\mathcal{G}^{*(u)}$ is assigned a weight $w_u$, $u = 1, \dots, v$, equal to its posterior probability (in the Monte Carlo approximation, its relative frequency among the $m$ samples). For each candidate graph $\mathcal{G}^{*(l)}$, we compute its total weighted distance $D_l$ to all unique graphs:

\[ D_l = \sum_{u=1}^{v} w_u \, d(\mathcal{G}^{*(l)}, \mathcal{G}^{*(u)}), \]

where $d(\cdot, \cdot)$ measures the discrepancy between two graphs. The weighted medoid is then $\mathcal{G}^{*(k)}$, where $k = \arg\min_{l \in \{1, \dots, v\}} D_l$, which is an approximate solution to (2). We provide multiple options for the distance metric $d(\cdot, \cdot)$, including SHD, SID (for DAGs only), and any user-specified distance.

By operating on the posterior distribution of the graphs rather than the marginal posterior inclusion probability of each individual edge, this new joint decision rule fully leverages posterior structural uncertainty and yields a point estimate that reflects the overall posterior consensus.

4 Conclusion

We have introduced cyclinbayes, an R package that provides a unified Bayesian framework for causal discovery under linear non-Gaussian structural equation models. For DAGs, our package offers a scalable, fully Bayesian treatment of the LiNGAM model, enabling comprehensive uncertainty quantification of both direct causal effects and user-specified network motifs, capabilities absent from existing implementations. cyclinbayes is also one of the few available software implementations for linear non-Gaussian DCG learning and, to our knowledge, the only Bayesian implementation, thereby uniquely enabling uncertainty quantification in settings with feedback loops. A key methodological contribution is the novel decision-theoretic approach to posterior graph summarization.
Rather than relying on edge-wise thresholding or MAP estimation, which can yield graphs that poorly represent the posterior distribution, our approach selects a representative graph that minimizes posterior expected loss under structural discrepancy metrics such as SHD and SID. This principled approach fully leverages the rich structural uncertainty captured by Bayesian inference and produces point estimates that reflect global posterior consensus. The computationally efficient Rcpp implementation, combined with comprehensive tools for posterior inference and graph selection, makes cyclinbayes a practical resource for applied researchers seeking flexible causal modeling with rigorous uncertainty quantification.

References

Genta Kikuchi. rlingam: R Implementation of LiNGAM algorithms, 2025. URL https://github.com/gkikuchi/rlingam. R package version 0.0.0.9002, commit 8f17a3c2dd431d809f7f6ed2acd672efc788e34d.

Gustavo Lacerda, Peter Spirtes, Joseph Ramsey, and Patrik O. Hoyer. Discovering cyclic causal models by independent components analysis. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI'08, pages 366–374, Arlington, Virginia, USA, 2008. AUAI Press. ISBN 0974903949.

Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11):1–26, 2012. doi: 10.18637/jss.v047.i11.

Shohei Shimizu, Patrik Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.

Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvärinen, Yoshinobu Kawahara, Takashi Washio, Patrik O. Hoyer, and Kenneth Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12:1225–1248, 2011.
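The weighted-medoid rule described under "Decision-Theoretical Approach for Graph Selection" can be illustrated with a small standalone sketch. It is plain Python, not the package's point_est_graph() implementation, and the function names are hypothetical. It assumes that the weights are relative frequencies among the sampled graphs and that SHD counts each node pair with a differing edge pattern once (one common convention; the package's SHD may differ). Adjacency matrices follow the paper's convention that entry [i][j] equals 1 when j -> i.

```python
from collections import Counter

def shd(a, b):
    """Count node pairs whose edge pattern differs between adjacency matrices.

    A missing, extra, or reversed edge each counts once (assumed convention).
    """
    p = len(a)
    return sum(
        (a[i][j], a[j][i]) != (b[i][j], b[j][i])
        for i in range(p)
        for j in range(i + 1, p)
    )

def weighted_medoid(samples, dist=shd):
    """Return the unique sampled graph minimizing D_l = sum_u w_u * dist(G_l, G_u),
    with w_u the relative frequency of G_u among the samples, and its score D_l."""
    counts = Counter(tuple(map(tuple, g)) for g in samples)
    m = len(samples)
    unique = list(counts)
    best, best_score = None, float("inf")
    for cand in unique:
        score = sum(counts[g] / m * dist(cand, g) for g in unique)
        if score < best_score:
            best, best_score = cand, score
    return best, best_score

# Toy posterior sample on three nodes.
g1 = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]  # 0 -> 1, 1 -> 2
g2 = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]  # 0 -> 1
g3 = [[0, 0, 0], [1, 0, 0], [1, 1, 0]]  # 0 -> 1, 1 -> 2, 0 -> 2
medoid, score = weighted_medoid([g1, g1, g2, g3])
print(medoid, score)  # g1 (as nested tuples) with weighted distance 0.5
```

In this toy sample g1 has weight 0.5 and lies at distance 1 from each of the other two graphs, so its weighted distance is 0.5, against 1.0 for g2 and g3. Any other discrepancy, such as a user-defined loss, can be passed through the `dist` argument.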