Causal discovery from observational data remains fundamentally limited by identifiability constraints. Recent work has explored leveraging Large Language Models (LLMs) as sources of prior causal knowledge, but existing approaches rely on heuristic integration that lacks theoretical grounding. We introduce HOLOGRAPH, a framework that formalizes LLM-guided causal discovery through sheaf theory, representing local causal beliefs as sections of a presheaf over variable subsets. Our key insight is that coherent global causal structure corresponds to the existence of a global section, while topological obstructions manifest as non-vanishing sheaf cohomology. We propose the Algebraic Latent Projection to handle hidden confounders and Natural Gradient Descent on the belief manifold for principled optimization. Experiments on synthetic and real-world benchmarks demonstrate that HOLOGRAPH provides rigorous mathematical foundations while achieving competitive performance on causal discovery tasks with 50-100 variables. Our sheaf-theoretic analysis reveals that while the Identity, Transitivity, and Gluing axioms are satisfied to numerical precision (< 10^-6), the Locality axiom fails for larger graphs, suggesting fundamental non-local coupling in latent variable projections. Code is available at https://github.com/hyunjun1121/holograph.
Causal discovery, the problem of inferring causal structure from data, is fundamental to scientific inquiry, yet remains provably underspecified without experimental intervention (Spirtes et al., 2000; Pearl, 2009). Observational data alone can at most identify the Markov equivalence class of DAGs (Verma & Pearl, 1991), and the presence of latent confounders further complicates identifiability. This has motivated recent interest in leveraging external knowledge sources, particularly Large Language Models (LLMs), which encode substantial causal knowledge from pretraining corpora (Kiciman et al., 2023; Ban et al., 2023).
However, existing approaches to LLM-guided causal discovery remain fundamentally heuristic. Prior work such as DEMOCRITUS (Mahadevan, 2024) treats LLM outputs as “soft priors” integrated via post-hoc weighting, lacking principled treatment of:
- Coherence: How do we ensure local LLM beliefs about variable subsets combine into a globally consistent causal structure?
- Contradictions: What happens when the LLM provides conflicting information about overlapping variable subsets?
- Latent Variables: How do we project global causal models onto observed subsets while accounting for hidden confounders?
We propose HOLOGRAPH (Holistic Optimization of Latent Observations via Gradient-based Restriction Alignment for Presheaf Harmony), a framework that addresses these challenges through the lens of sheaf theory. Our key insight is that local causal beliefs can be formalized as sections of a presheaf over the power set of variables. While full sheaf structure (including Locality) fails due to non-local latent coupling, we demonstrate that the Identity, Transitivity, and Gluing axioms hold to numerical precision (< 10^-6), enabling coherent belief aggregation.
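As a toy illustration of why an axiom like Transitivity can hold to machine precision, consider modeling a local belief over a variable subset as the marginal of a global Gaussian belief: marginalization composes, so restricting in two steps matches restricting in one. The Gaussian stand-in and helper names below are illustrative assumptions, not HOLOGRAPH's actual section representation.

```python
import numpy as np

# Toy model (illustrative, not the paper's construction): the "local belief"
# on a variable subset U is the covariance sub-block Sigma[U, U] of a global
# Gaussian belief, so the restriction map rho_{V -> U} is sub-matrix selection.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Sigma = A @ A.T  # a valid global covariance over variables {0, ..., 5}

def restrict(sigma, idx_parent, idx_child):
    """Restrict a belief over idx_parent to the sub-subset idx_child."""
    pos = [idx_parent.index(i) for i in idx_child]
    return sigma[np.ix_(pos, pos)]

W_set, V_set, U_set = [0, 1, 2, 3, 4], [1, 2, 3], [1, 3]
Sigma_W = Sigma[np.ix_(W_set, W_set)]

one_step = restrict(Sigma_W, W_set, U_set)
two_step = restrict(restrict(Sigma_W, W_set, V_set), V_set, U_set)

# Transitivity rho_{W -> U} = rho_{V -> U} o rho_{W -> V} holds exactly here.
err = np.abs(one_step - two_step).max()
print(err < 1e-12)
```

In this toy setting the axiom holds exactly; the paper's point is that the analogous checks on learned sections still pass to < 10^-6, while Locality does not.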
- Sheaf-Theoretic Framework: We formalize LLM-guided causal discovery as a presheaf satisfaction problem, where local sections are linear SEMs and restriction maps implement the Algebraic Latent Projection.
- Natural Gradient Optimization: We derive a natural gradient descent algorithm on the belief manifold with Tikhonov regularization for numerical stability.
- Active Query Selection: We use Expected Free Energy (EFE) to select maximally informative LLM queries, balancing epistemic and instrumental value.
- Theoretical Analysis: We empirically verify that the Identity, Transitivity, and Gluing axioms hold to numerical precision, while systematically identifying Locality violations arising from non-local latent coupling.
- Empirical Validation: Comprehensive experiments on synthetic (ER, SF) and real-world (Sachs, Asia) benchmarks, demonstrating +91% F1 improvement over NOTEARS in extreme low-data regimes (N ≤ 10) and +13.6% F1 improvement when using HOLOGRAPH priors to regularize statistical methods.
- Implementation Verification: Complete mathematical verification that all 15 core formulas in the specification match the implementation to numerical precision (Appendix A.6).
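The natural-gradient update with Tikhonov regularization mentioned above can be sketched as follows. The empirical-Fisher construction, damping constant, and step size are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hedged sketch: one natural-gradient step on belief parameters theta. The
# Fisher matrix is damped with eps * I (Tikhonov regularization) so the
# linear solve stays stable even when the empirical Fisher is singular.
def natural_gradient_step(theta, grad, per_sample_grads, lr=0.1, eps=1e-3):
    # Empirical Fisher: average outer product of per-sample score vectors.
    F = per_sample_grads.T @ per_sample_grads / per_sample_grads.shape[0]
    F_damped = F + eps * np.eye(F.shape[0])   # Tikhonov damping
    # Precondition the gradient by the damped Fisher: theta - lr * F^-1 g.
    return theta - lr * np.linalg.solve(F_damped, grad)

rng = np.random.default_rng(1)
theta = rng.standard_normal(4)
per_sample = rng.standard_normal((32, 4))   # stand-in per-sample gradients
grad = per_sample.mean(axis=0)
theta_next = natural_gradient_step(theta, grad, per_sample)
```

The damping trades a small bias for conditioning: as eps grows the step degrades gracefully toward plain gradient descent.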
Key Finding 1: Locality Failure as Discovery. Our sheaf exactness experiments (Section 4.5) reveal a striking result: while the Identity, Transitivity, and Gluing axioms pass with errors < 10^-6, the Locality axiom systematically fails, with errors scaling as O(√n) in graph size. This is not a bug but a discovery: it reveals fundamental non-local information propagation through latent confounders. The failure quantitatively measures the “non-sheafness” of causal models under latent projections, a diagnostic that could guide when latent variable modeling is necessary.
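To make the latent-projection mechanism concrete, here is a hedged sketch of one plausible algebraic form for projecting a linear SEM onto its observed variables: summing every directed path that detours through the latent block. This is an illustration consistent with standard linear-SEM algebra, not necessarily the paper's exact Algebraic Latent Projection operator.

```python
import numpy as np

# Convention (assumed): W[i, j] is the direct effect of variable i on j.
# Projecting out the latent block L sums all paths O -> L ... L -> O:
#   M = W_OO + W_OL (I - W_LL)^{-1} W_LO
def latent_project(W, obs, lat):
    W_OO = W[np.ix_(obs, obs)]
    W_OL = W[np.ix_(obs, lat)]
    W_LL = W[np.ix_(lat, lat)]
    W_LO = W[np.ix_(lat, obs)]
    I = np.eye(len(lat))
    return W_OO + W_OL @ np.linalg.inv(I - W_LL) @ W_LO

# Chain 0 -> 2 -> 1 with variable 2 latent: projecting out 2 leaves a
# direct effect of 2.0 * 0.5 = 1.0 from 0 to 1.
W = np.zeros((3, 3))
W[0, 2] = 2.0   # edge 0 -> 2
W[2, 1] = 0.5   # edge 2 -> 1
M = latent_project(W, obs=[0, 1], lat=[2])
```

Because the geometric series (I - W_LL)^{-1} aggregates arbitrarily long latent chains, the projected effect between two observed variables can depend on latents far outside their local neighborhood, which is exactly the kind of non-local coupling the Locality failure points to.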
Key Finding 2: Sample Efficiency & Hybrid Synergy. Our sample efficiency experiments (Section 4.3) establish a clear decision boundary for when to use LLM-based discovery:
• Low-data regime (N < 20): HOLOGRAPH’s zero-shot approach achieves F1 = 0.67 on semantically rich domains, outperforming NOTEARS by up to +91% relative F1 when only N = 5 samples are available.
• Hybrid synergy: When some data is available (N = 10-50), using HOLOGRAPH priors to regularize NOTEARS yields +13.6% F1 improvement by preventing overfitting to sparse observations.
• Semantic advantage: Performance depends critically on LLM domain knowledge. On Asia (epidemiology with intuitive variable names), HOLOGRAPH achieves F1 = 0.67; on Sachs (specialized protein signaling), only F1 = 0.20.
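The hybrid-synergy point above can be sketched as an augmented objective: a NOTEARS-style least-squares loss plus a penalty pulling the weight matrix toward an LLM-derived prior graph. The function name, penalty form, and weights are hypothetical illustrations of the idea, not the paper's implementation.

```python
import numpy as np

# Hedged sketch: regularizing a statistical learner with an LLM prior W_prior.
# The prior term penalizes deviation from the prior graph, which limits
# overfitting when only a handful of samples are available.
def hybrid_loss(W, X, W_prior, lam_sparse=0.01, lam_prior=0.1):
    n = X.shape[0]
    fit = 0.5 / n * np.sum((X - X @ W) ** 2)          # least-squares fit
    sparse = lam_sparse * np.abs(W).sum()             # sparsity penalty
    prior = lam_prior * np.sum((W - W_prior) ** 2)    # pull toward LLM prior
    return fit + sparse + prior

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))        # N = 10: the low-data regime above
W_prior = np.zeros((3, 3))
W_prior[0, 1] = 1.0                     # prior belief: edge 0 -> 1
loss_at_prior = hybrid_loss(W_prior, X, W_prior)
loss_far = hybrid_loss(W_prior + 5.0, X, W_prior)   # far from the prior
```

In full NOTEARS this loss would additionally be minimized subject to the acyclicity constraint h(W) = 0.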
Continuous Optimization for Causal Discovery. NOTEARS (Zheng et al., 2018) pioneered continuous optimization for DAG learning via the acyclicity constraint h(W) = tr(e^{W∘W}) − n. Extensions include GOLEM (Ng et al., 2020) with likelihood-based scoring and DAGMA (Bello et al., 2022) using log-determinant characterizations. HOLOGRAPH builds on this foundation, adding sheaf-theoretic consistency.
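The NOTEARS acyclicity penalty is simple to compute directly. The sketch below evaluates h(W) = tr(e^{W∘W}) − n with a truncated power series, which is exact once (W∘W)^k vanishes, as it does after n terms for any DAG's weight matrix; the helper name is ours.

```python
import numpy as np

# h(W) = tr(exp(W ∘ W)) - n, where ∘ is the Hadamard (entrywise) product.
# h(W) = 0 iff the weighted graph W is acyclic; weighted cycles make it > 0.
def acyclicity(W, terms=20):
    n = W.shape[0]
    M = W * W                          # Hadamard square: non-negative entries
    term, total = np.eye(n), np.zeros((n, n))
    for k in range(terms):             # truncated series for exp(M)
        total += term
        term = term @ M / (k + 1)
    return np.trace(total) - n

W_dag = np.array([[0.0, 1.0], [0.0, 0.0]])   # edge 0 -> 1, acyclic
W_cyc = np.array([[0.0, 1.0], [1.0, 0.0]])   # 0 <-> 1, a 2-cycle
print(acyclicity(W_dag))   # prints 0.0
print(acyclicity(W_cyc))   # positive: the cycle is detected
```

Because W∘W has non-negative entries, tr(exp(W∘W)) counts weighted closed walks of every length, so the penalty grows with both cycle count and cycle weight, which is what makes it usable as a smooth constraint.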
LLM-Guided Causal Discovery. Recent work explores LLMs as causal knowledge sources. Kiciman et al. (2023) benchmark LLMs on causal inference tasks, while Ban et al. (2023) propose active querying strategies. DEMOCRITUS (Mahadevan, 2024) uses LLM beliefs as soft priors but lacks principled treatment of coherence. Emerging “causal fo