Learning high-dimensional directed acyclic graphs with latent and selection variables
We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg.
💡 Research Summary
The paper tackles the challenging problem of learning causal structure from observational data when arbitrarily many latent (unmeasured) and selection variables may be present. In such settings the underlying causal relationships among the observed variables can no longer be represented by a simple directed acyclic graph (DAG); instead one must work with maximal ancestral graphs (MAGs) and their equivalence class representatives, partial ancestral graphs (PAGs). The Fast Causal Inference (FCI) algorithm, introduced by Spirtes, Glymour and Scheines, is the canonical method for recovering a PAG from conditional independence (CI) information in the presence of latent and selection variables. However, FCI’s exhaustive search over conditioning sets makes it computationally prohibitive for graphs with more than a few dozen nodes, especially in high‑dimensional regimes where the number of variables p can greatly exceed the sample size n.
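To get a feel for why an exhaustive search over conditioning sets becomes prohibitive, a back-of-the-envelope count is instructive. This is an illustrative sketch, not code from the paper; the function name `naive_test_count` is ours. For each of the C(p, 2) variable pairs, a naive search conditioning on all subsets of the remaining p − 2 variables up to order k would need on the order of the following number of CI tests:

```python
from math import comb

def naive_test_count(p: int, k: int) -> int:
    """Upper bound on CI tests for a naive exhaustive search:
    C(p, 2) pairs, each tested against every conditioning set of
    size 0..k drawn from the remaining p - 2 variables."""
    per_pair = sum(comb(p - 2, j) for j in range(k + 1))
    return comb(p, 2) * per_pair

for p in (10, 50, 200):
    print(p, naive_test_count(p, 3))
```

Even capped at order k = 3, the count grows roughly like p^5, which is exactly the regime where restricting the conditioning sets, as RFCI does, pays off.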
To address this scalability issue, the authors propose a new algorithm called Really Fast Causal Inference (RFCI). RFCI retains the essential two‑stage structure of FCI—first learning an undirected skeleton, then orienting edges—but dramatically reduces the number of CI tests required. The key design choices are:
- Skeleton construction: RFCI keeps only the PC-style adjacency removal, conditioning on subsets of the current adjacency sets. It omits FCI's second skeleton stage, which conditions on subsets of the potentially very large Possible-D-SEP sets and is the main source of FCI's combinatorial explosion.
- V-structure identification: before orienting an unshielded triple a–b–c as a collider, RFCI performs additional CI tests on the pairs (a, b) and (c, b) given subsets of the separating set of (a, c), so that every collider orientation is backed by verified (in)dependencies rather than by the fully corrected FCI skeleton.
- Orientation rules: RFCI applies the FCI orientation rules with a modified discriminating-path rule that adds a small number of CI tests along the path; no rule requires conditioning on large Possible-D-SEP-type sets.
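The PC-style skeleton step that RFCI retains can be sketched in a few lines. This is a minimal illustration with names of our choosing (`skeleton`, `ci_test`), not the pcalg implementation: starting from the complete graph, an edge is removed as soon as some subset of the current neighbours renders the pair conditionally independent, and the separating set is recorded for the later orientation phase.

```python
from itertools import combinations

def skeleton(nodes, ci_test, max_order):
    """PC-style skeleton search: start from the complete graph and delete
    an edge (x, y) as soon as some conditioning set S of size `order`,
    drawn from x's *current* neighbours, makes x and y independent.
    Conditioning sets never go beyond adjacency sets (the RFCI restriction)."""
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    for order in range(max_order + 1):
        for x in nodes:
            for y in list(adj[x]):
                for S in combinations(adj[x] - {y}, order):
                    if ci_test(x, y, set(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(S)
                        break
    return adj, sepset

# Toy CI oracle for the chain a -> b -> c: the only independence is a _||_ c | {b}.
oracle = lambda x, y, S: {x, y} == {"a", "c"} and "b" in S
adj, sepset = skeleton(["a", "b", "c"], oracle, max_order=1)
```

On this toy chain the edge a–c is removed with separating set {b}, while a–b and b–c survive, which is exactly the skeleton of the true graph.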
These modifications mean that RFCI may miss some conditional independences that would be discovered by FCI, particularly those that only appear when conditioning on many variables. Importantly, however, the authors prove that any causal orientation (arrowheads and tails) produced by RFCI is correct in the asymptotic limit; the algorithm is sound. Moreover, they characterize a class of graphs on which RFCI and FCI produce identical PAGs, showing that the loss of information can be negligible in many realistic scenarios.
The theoretical contributions are twofold. First, the paper establishes soundness of RFCI: every arrowhead and tail in the output PAG is correct for the underlying MAG, with marks that cannot be determined left as circles. Second, the authors prove high-dimensional consistency of both FCI and RFCI under sparsity conditions that restrict the growth of the graph's neighbourhoods relative to the sample size, allowing p to grow much faster than n. The consistency result for RFCI holds under weaker assumptions than the one for FCI, because RFCI's reduced complexity avoids conditioning on extremely large sets, which would otherwise demand unrealistically large sample sizes for reliable CI testing.
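In the Gaussian setting assumed by the consistency analysis, the CI tests are partial-correlation tests based on Fisher's z-transform. The sketch below shows the standard construction; the function names are ours, and this is an illustration of the test, not the pcalg code. Note the sample-size penalty sqrt(n − k − 3): the larger the conditioning set size k, the weaker the test, which is precisely why avoiding high-order tests helps in small samples.

```python
import math
import numpy as np

def partial_corr(C, i, j, S):
    """Partial correlation of variables i and j given the index set S,
    computed from the correlation matrix C via the inverse (precision)
    of the relevant submatrix."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(C[np.ix_(idx, idx)])
    return -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation = 0, using Fisher's
    z-transform with n samples and k conditioning variables."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return 2 * (1 - 0.5 * (1 + math.erf(stat / math.sqrt(2))))

# Chain X -> Y -> Z with corr(X, Y) = corr(Y, Z) = 0.8: X and Z are
# marginally dependent (corr 0.64) but independent given Y.
C = np.array([[1.0, 0.8, 0.64],
              [0.8, 1.0, 0.8],
              [0.64, 0.8, 1.0]])
print(partial_corr(C, 0, 2, [1]))  # approximately 0
```

With n = 100, the marginal test on corr(X, Z) = 0.64 rejects independence decisively, while the order-1 test given Y does not, recovering the chain's single CI relation.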
Empirical evaluation is performed through extensive simulations. Graphs with 50, 100, and 200 nodes are generated under varying densities, latent-to-observed variable ratios, and selection mechanisms. The authors compare RFCI to the original FCI, Anytime FCI (which caps the conditioning set size), Adaptive Anytime FCI (AAFCI), and several speed-up variants of FCI. Results show that RFCI achieves large speed improvements, often 10–30× faster, while maintaining comparable precision, recall, and structural Hamming distance to FCI. In particular, for the largest graphs (p ≈ 200) the original FCI becomes computationally infeasible, whereas RFCI completes in minutes. The authors also note that RFCI's reliance on lower-order CI tests makes it more robust in small-sample settings, where high-order tests suffer from low statistical power.
All algorithms are implemented in the R package pcalg, and the code is publicly released, facilitating immediate adoption by researchers. The paper’s contributions are significant for fields such as genomics, neuroimaging, and social network analysis, where high‑dimensional data with hidden confounders are the norm. By providing a scalable, theoretically sound method for learning causal structure under latent and selection bias, the work bridges a critical gap between methodological rigor and practical applicability.