Causal discovery of linear acyclic models with arbitrary distributions

An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches based purely on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data that are partially Gaussian. In this paper, we generalize and combine the two approaches to yield a method able to learn the model structure in many cases for which the previous methods give answers that are either incorrect or not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and we demonstrate the power of our method empirically through extensive simulations.


💡 Research Summary

The paper tackles the problem of learning the causal structure of linear acyclic models (LAMs) when the observed variables follow arbitrary, possibly mixed Gaussian‑non‑Gaussian distributions. Traditional approaches fall into two camps. Conditional‑independence‑based methods such as the PC/FCI algorithms rely on the Markov property to prune the space of directed acyclic graphs (DAGs) but cannot orient edges within a Markov‑equivalence class. Independent‑component‑analysis‑based methods, exemplified by LiNGAM, can uniquely identify the DAG when all external noise terms are non‑Gaussian, yet they fail when any noise component is Gaussian or when only a subset of variables is non‑Gaussian.

To bridge this gap, the authors propose a unified framework that simultaneously exploits conditional independencies and higher‑order statistical information about the noise terms. The key theoretical contribution is a set of exact graphical conditions that characterize when two distinct LAMs generate the same family of probability distributions under arbitrary noise distributions. In particular, they prove that two DAGs are distribution‑equivalent only if they share the same parent‑child relationships and each variable’s exogenous noise possesses at least one non‑Gaussian component of sufficient strength. This result relaxes LiNGAM’s “all‑noise‑non‑Gaussian” assumption while preserving the ability to distinguish graphs that are indistinguishable by conditional independence alone.
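The identifiability gain from non-Gaussianity can be illustrated with a toy two-variable check. The sketch below is our own illustration, not the authors' estimator: it uses a fourth-order cross-cumulant of the regression residual, which vanishes when residual and regressor are independent (or jointly Gaussian) but is generally nonzero when the regression is run against the true causal direction with non-Gaussian noise.

```python
import numpy as np

def residual(target, regressor):
    """Exact OLS residual of target on regressor (both 1-D, centered)."""
    t = target - target.mean()
    z = regressor - regressor.mean()
    return t - (t @ z) / (z @ z) * z

def cross_cumulant(r, y):
    """Fourth-order cross-cumulant cum(r, r, y, y).

    Zero when r and y are independent or jointly Gaussian;
    generally nonzero for dependent non-Gaussian variables."""
    r = r - r.mean()
    y = y - y.mean()
    return (np.mean(r**2 * y**2)
            - np.mean(r**2) * np.mean(y**2)
            - 2 * np.mean(r * y)**2)

rng = np.random.default_rng(0)
n = 200_000

# True model: x -> y with uniform (non-Gaussian) noise terms.
x = rng.uniform(-1, 1, n)
y = x + rng.uniform(-1, 1, n)

# Correct direction: the residual of y given x is the independent noise.
c_right = cross_cumulant(residual(y, x), x)
# Wrong direction: the residual of x given y is uncorrelated with y
# but NOT independent of it -- the cross-cumulant detects this.
c_wrong = cross_cumulant(residual(x, y), y)
print(c_right, c_wrong)  # c_right ~ 0, c_wrong clearly negative

# With Gaussian noise the statistic vanishes in BOTH directions,
# so the direction is unidentifiable -- exactly the LiNGAM failure mode.
xg = rng.normal(size=n)
yg = xg + rng.normal(size=n)
cg_right = cross_cumulant(residual(yg, xg), xg)
cg_wrong = cross_cumulant(residual(xg, yg), yg)
print(cg_right, cg_wrong)  # both ~ 0
```

In the non-Gaussian case only the anti-causal regression leaves a dependent residual, so the asymmetry orients the edge; in the Gaussian case both directions look identical, which is why the theorem must confine unresolved directions to Gaussian noise.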

The proposed algorithm proceeds in three stages.

  1. Conditional Independence Pruning – A modified PC routine performs standard CI tests to obtain a reduced set of candidate DAGs, independent of any distributional assumptions about the noise terms.
  2. Non‑Gaussian Verification – For each candidate, the method estimates higher‑order moments or cumulants of the residuals obtained by regressing each variable on its putative parents. By applying a tailored ICA‑like decomposition that can detect even a single non‑Gaussian direction among otherwise Gaussian components, the algorithm assesses whether the candidate’s noise structure satisfies the non‑Gaussianity condition of the theorem.
  3. Bayesian Model Integration – The CI evidence and the non‑Gaussian scores are combined using a Bayesian model‑selection criterion (e.g., BIC). The DAG with the highest posterior probability is selected, and a penalty term controls model complexity to avoid over‑fitting.
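The residual-scoring idea behind stage 2 can be sketched as follows. This is a minimal stand-in, assuming a simple excess-kurtosis score in place of the paper's cumulant/ICA-based machinery; all function names and the candidate parent sets are our own illustration.

```python
import numpy as np

def regress_residual(y, X):
    """OLS residual of y on the columns of X (with an intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def excess_kurtosis(r):
    """Sample excess kurtosis: ~0 for Gaussian data, nonzero otherwise."""
    r = r - r.mean()
    return np.mean(r**4) / np.mean(r**2)**2 - 3.0

def nongaussianity_scores(data, parents):
    """For each variable i with putative parent set parents[i], score the
    non-Gaussianity of its residual by |excess kurtosis|."""
    scores = {}
    for i, pa in parents.items():
        r = (data[:, i] if not pa
             else regress_residual(data[:, i], data[:, list(pa)]))
        scores[i] = abs(excess_kurtosis(r))
    return scores

# Toy candidate DAG x0 -> x1, data generated with uniform noise
# (excess kurtosis of a uniform distribution is -1.2).
rng = np.random.default_rng(1)
n = 100_000
x0 = rng.uniform(-1, 1, n)
x1 = 0.8 * x0 + rng.uniform(-1, 1, n)
data = np.column_stack([x0, x1])

scores = nongaussianity_scores(data, {0: [], 1: [0]})
print(scores)  # both scores near 1.2: clearly non-Gaussian residuals
```

Scores near zero would indicate Gaussian residuals, i.e. edges whose orientation cannot be settled by non-Gaussianity and must be left to the CI evidence in stages 1 and 3.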

The authors validate their approach through extensive simulations. They generate synthetic data from four families of noise distributions: (i) fully non‑Gaussian, (ii) fully Gaussian, (iii) mixtures of Gaussian and non‑Gaussian, and (iv) Gaussian noises with heterogeneous variances. Graph sparsity, sample size (N = 50, 200, 1000), and dimensionality are varied. Baselines include PC/FCI, LiNGAM, GES, and NOTEARS. Results show that the new method consistently outperforms baselines in structural Hamming distance, especially in cases (iii) and (iv) where traditional methods either mis‑orient edges or cannot orient them at all. Even with only 50 samples, the algorithm maintains a respectable edge‑orientation accuracy, demonstrating robustness of the high‑order moment tests when CI tests become unstable.

Real‑world experiments on a genetics expression dataset and an economic macro‑indicator panel further illustrate practical benefits. The proposed method uncovers causal directions that align with domain knowledge but are missed by PC (which leaves them undirected) and LiNGAM (which fails due to partial Gaussianity).

In summary, the paper makes three major contributions: (1) a rigorous graphical characterization of distributional equivalence for linear acyclic models with arbitrary noise; (2) a hybrid learning algorithm that leverages both conditional independence and non‑Gaussianity detection, thereby extending identifiability beyond the limits of existing methods; and (3) comprehensive empirical evidence that the approach yields more accurate and informative causal graphs across a wide range of synthetic and real scenarios. The work opens avenues for future research on non‑linear extensions, latent confounders, and scalable implementations for high‑dimensional data.