Cohomological Obstructions to Global Counterfactuals: A Sheaf-Theoretic Foundation for Generative Causal Models

Journal of Machine Learning Research 24 (2024) 1-31. Submitted 1/24; Revised 5/24; Published 9/24

Rui Wu (wurui22@mail.ustc.edu.cn), School of Management, University of Science and Technology of China, 96 Jinzhai Road, Hefei, 230026, Anhui, China
Hong Xie (hongx87@ustc.edu.cn), School of Computer Science and Engineering, University of Science and Technology of China, 96 Jinzhai Road, Hefei, 230026, Anhui, China
Yongjun Li* (lionli@ustc.edu.cn), School of Management, University of Science and Technology of China, 96 Jinzhai Road, Hefei, 230026, Anhui, China

Editor: Action Editor Name

Abstract

Current continuous generative models (e.g., Diffusion Models, Flow Matching) implicitly assume that locally consistent causal mechanisms naturally yield globally coherent counterfactuals. In this paper, we prove that this assumption fails fundamentally when the causal graph exhibits non-trivial homology (e.g., structural conflicts or hidden confounders). We formalize structural causal models as cellular sheaves over Wasserstein spaces, providing a strict algebraic-topological definition of cohomological obstructions ($H^1 \neq 0$) in measure spaces. To ensure computational tractability and avoid deterministic singularities (which we define as manifold tearing), we introduce entropic regularization and derive the Entropic Wasserstein Causal Sheaf Laplacian, a novel system of coupled non-linear Fokker-Planck equations. Crucially, we prove an entropic pullback lemma for the first variation of pushforward measures. By integrating this with the Implicit Function Theorem (IFT) on the Sinkhorn optimality conditions, we establish a direct algorithmic bridge to automatic differentiation via vector-Jacobian products (VJPs), achieving $O(1)$-memory reverse-mode gradients that are independent of the iteration horizon.
Empirically, our framework successfully leverages thermodynamic noise to navigate topological barriers ("entropic tunneling") in high-dimensional scRNA-seq counterfactuals. Finally, we invert this theoretical framework to introduce the Topological Causal Score, demonstrating that our Sheaf Laplacian acts as a highly sensitive algebraic detector for topology-aware causal discovery.

* Corresponding author.
© 2024 Rui Wu. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/24-0000.html.

1 Introduction and Motivating Example

Continuous generative models, such as score-based diffusion models and normalizing flows, have demonstrated unprecedented success in counterfactual inference. However, these frameworks implicitly rely on a critical, often unverified hypothesis: that fitting locally consistent causal mechanisms (edges) naturally composes into globally coherent counterfactual distributions over the entire causal graph.

We reveal that this assumption is fundamentally flawed in the presence of topological obstructions. Consider a simple 3-node directed acyclic graph with paths A → B → C and A → C. If the structural equations dictate conflicting mechanisms (for instance, the path through B pushes C towards +4, while the direct edge A → C pulls it towards −4), the system exhibits a non-trivial homological obstruction.

Figure 1: The impact of topological obstructions on causal generative models. (a) Deterministic Manifold Tearing: unregularized, deterministic models attempt to resolve the contradiction by pushing probability measures to infinity, resulting in catastrophic manifold tearing. (b) Stable Sheaf Compromise (Ours): our proposed Entropic Sheaf Laplacian leverages thermodynamic diffusion to resolve the conflict gracefully, guiding the system to a globally coherent stationary state.
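The conflict in the three-node example can be made concrete in a deliberately simplified setting that is our assumption, not the paper's model: every marginal is a one-dimensional Gaussian with a common spread s, A and B are held fixed, the path through B places C's target at N(+4, s²), the direct edge A → C places it at N(−4, s²), and ε = 0, so the squared 2-Wasserstein distance between N(m1, s1²) and N(m2, s2²) has the closed form (m1 − m2)² + (s1 − s2)². The function names below are hypothetical. The sketch shows that even the best choice of µ_C leaves a strictly positive residual.

```python
import numpy as np

def w2_sq_gaussian_1d(m1, s1, m2, s2):
    """Closed-form squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def node_c_energy(m_c, s_c, w_bc=1.0, w_ac=1.0, s=1.0):
    """Hypothetical energy at node C with A and B held fixed: the path through B
    targets N(+4, s^2) and the direct edge A -> C targets N(-4, s^2)."""
    return (w_bc * w2_sq_gaussian_1d(m_c, s_c, 4.0, s)
            + w_ac * w2_sq_gaussian_1d(m_c, s_c, -4.0, s))

def best_compromise(w_bc=1.0, w_ac=1.0, s=1.0):
    """Minimiser of node_c_energy: weighted mean of the two targets, spread s."""
    m_star = (4.0 * w_bc - 4.0 * w_ac) / (w_bc + w_ac)
    return m_star, s, node_c_energy(m_star, s, w_bc, w_ac, s)

m_star, s_star, e_star = best_compromise()
print(m_star, e_star)  # 0.0 32.0

# Brute-force check: no mean on a fine grid does better than the closed form.
grid = np.linspace(-6.0, 6.0, 1201)
print(node_c_energy(grid, 1.0).min())  # approximately 32.0
```

With unequal confidence weights the compromise mean moves toward the more trusted mechanism, and the residual equals 64·w_bc·w_ac/(w_bc + w_ac), which stays positive for any pair of positive weights.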
In such geometrically frustrated systems, enforcing deterministic local mechanisms leads to mathematical singularities. As demonstrated in Figure 1(a), unregularized gradient flows attempting to resolve this contradiction experience exponential variance blow-up, tearing the probability manifold apart and sending particles to infinity. We formalize this failure mode as manifold tearing.

To resolve this limitation, we bridge Pearl's causal inference with Grothendieck's sheaf theory and Otto's Wasserstein calculus. By modeling the Structural Causal Model (SCM) as a cellular sheaf over Wasserstein spaces and injecting thermodynamic noise (entropic regularization), we derive the Entropic Wasserstein Causal Sheaf Laplacian. As shown in Figure 1(b), this novel operator resolves topological conflicts without deterministic singularities.

2 Related Work

Continuous Causal Generative Models. Recent advancements in generative AI have deeply integrated Pearl's causality (Pearl, 2009) with continuous generative frameworks, such as Causal Normalizing Flows (Kocaoglu et al., 2018) and Causal Diffusion Models (Goudet et al., 2017). These models typically parameterize the structural equations of a DAG as neural ODEs or SDEs. However, they implicitly rely on the assumption of perfect local composition: matching marginals locally along edges is assumed to guarantee a coherent global counterfactual distribution. Our work formally identifies the bottleneck of these approaches: in the presence of unobserved confounders or structural conflicts, this assumption breaks down, leading to what we define as manifold tearing.

Topological Deep Learning and Cellular Sheaves. Topological machine learning has recently gained traction, particularly through the use of Cellular Sheaves in Graph Neural Networks (GNNs) (Bodnar et al., 2022; Hansen and Ghrist, 2019).
By attaching vector spaces (stalks) to nodes and linear transformations (restriction maps) to edges, Sheaf-GNNs can address over-smoothing and heterophily in discrete graph representations. Our paper generalizes this paradigm. Instead of finite-dimensional Euclidean stalks, we elevate the algebraic structure to infinite-dimensional Wasserstein spaces of absolutely continuous measures $\mathcal{P}_2(M_v)$, bridging discrete graph topology with continuous geometric measure theory.

Optimal Transport and Wasserstein Gradient Flows. The adoption of Optimal Transport (OT) (Villani, 2003, 2008) in machine learning has been revolutionized by entropic regularization, which yields the computationally tractable Sinkhorn divergence (Cuturi, 2013; Peyré and Cuturi, 2019). While OT is predominantly deployed as a static loss function for generative matching, we use it in a fundamentally different way: as the rigorous geometric metric defining the coboundary operator within our Tangent Causal Sheaf. Furthermore, rather than relying on heuristic neural updates, we formalize the evolution of counterfactual measures as a metric gradient flow (Jordan et al., 1998; Ambrosio et al., 2005). By exploiting the steady-state optimality conditions of the Sinkhorn dual potentials, we bridge this continuous measure-theoretic flow with modern reverse-mode automatic differentiation, yielding an exact and scalable analytic gradient vector field.

3 Mathematical Preliminaries

To establish a rigorous mathematical foundation for topological obstructions in continuous causal inference, we briefly review the necessary concepts from geometric measure theory (Otto calculus) and algebraic topology (cellular sheaves).

3.1 Wasserstein Geometry and Otto Calculus

Let $M$ be a smooth, compact Riemannian manifold without boundary.
We consider the space of absolutely continuous probability measures with finite second moments, denoted $\mathcal{P}_2(M)$. Endowed with the 2-Wasserstein distance $W_2$, the space $\mathcal{P}_2(M)$ behaves formally as an infinite-dimensional Riemannian manifold (Villani, 2003). In Otto's geometric calculus, the tangent space at a sufficiently regular measure $\mu \in \mathcal{P}_2(M)$ is defined as the $L^2(\mu)$-closure of the gradients of smooth functions, equipped with the weighted $L^2(\mu)$ inner product:

$$T_\mu \mathcal{P}_2(M) = \overline{\{\nabla\varphi \,:\, \varphi \in C_c^\infty(M)\}}^{\,L^2(\mu)}, \qquad \langle \nabla\varphi, \nabla\psi\rangle_\mu = \int_M \langle \nabla\varphi, \nabla\psi\rangle_g \, d\mu. \tag{1}$$

Absolutely continuous curves $\mu_t$ in $\mathcal{P}_2(M)$ correspond to velocity vector fields $v_t \in T_{\mu_t}\mathcal{P}_2(M)$ governing the continuity equation $\partial_t \mu_t + \nabla\cdot(v_t \mu_t) = 0$ in the weak sense.

3.2 Regularity Assumptions

To ensure the well-posedness of the ensuing measure-theoretic calculus and partial differential equations, we adopt the following foundational regularity assumption, which holds in standard generative modeling settings (e.g., image generation over bounded pixel spaces).

Assumption 1 (Regularity of the Factual Manifold and Mechanisms). We assume that:
1. Compactness: For each node $v \in \mathcal{V}$, the state space $M_v$ is a smooth, compact Riemannian manifold without boundary (or the probability measures are supported within a compact geometric domain). This ensures that the Wasserstein spaces $\mathcal{P}_2(M_v)$ are compact under the weak topology.
2. Lipschitz Mechanisms: The deterministic structural equations $\Phi_{uv}: M_u \to M_v$ are bi-Lipschitz diffeomorphisms, ensuring that the pushforward operations preserve absolute continuity.
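Item 2 of Assumption 1 can be met in practice by a contractive residual map, the construction that Remark 1 discusses. The sketch below is an illustration under our own assumptions, not the paper's architecture: it uses a hypothetical two-layer residual branch g(x) = W2 tanh(W1 x + b), rescales the weights so that ‖W2‖₂‖W1‖₂ = L < 1 (tanh is 1-Lipschitz, so Lip(g) ≤ L), inverts Φ(x) = x + g(x) with the Banach fixed-point iteration x ← y − g(x), and checks the bi-Lipschitz bounds (1 − L)|x − x'| ≤ |Φ(x) − Φ(x')| ≤ (1 + L)|x − x'|.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden, L = 3, 16, 0.7  # hypothetical sizes and target Lipschitz bound L < 1

W1 = rng.normal(size=(hidden, dim))
W2 = rng.normal(size=(dim, hidden))
b = rng.normal(size=hidden)
# Spectral normalisation: tanh is 1-Lipschitz, so Lip(g) <= ||W2||_2 * ||W1||_2 = L.
W1 /= np.linalg.norm(W1, 2)
W2 *= L / np.linalg.norm(W2, 2)

def g(x):
    """Hypothetical residual branch g(x) = W2 tanh(W1 x + b) with Lip(g) <= L."""
    return W2 @ np.tanh(W1 @ x + b)

def phi(x):
    """Hypothetical causal mechanism Phi(x) = x + g(x)."""
    return x + g(x)

def phi_inverse(y, tol=1e-12, max_iter=500):
    """Solve x + g(x) = y via the contraction x <- y - g(x) (Banach fixed point)."""
    x = y.copy()
    for _ in range(max_iter):
        x_next = y - g(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x, x2 = rng.normal(size=dim), rng.normal(size=dim)
y = phi(x)
print(np.linalg.norm(phi_inverse(y) - x))  # close to machine precision
ratio = np.linalg.norm(phi(x) - phi(x2)) / np.linalg.norm(x - x2)
print(1 - L <= ratio <= 1 + L)             # True
```

The fixed-point loop converges geometrically with factor L, so L trades expressiveness of the residual branch against the cost (and conditioning) of inversion.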
Remark 1 (Relaxation of Bi-Lipschitz Diffeomorphisms in Deep Learning). While Assumption 1 requires the causal mechanisms $\Phi_{uv}$ to be bi-Lipschitz diffeomorphisms, this is naturally satisfied, or rigorously soft-relaxed, in modern deep learning architectures via two complementary avenues:
1. Architectural Realization via Residual Flows: If the structural equation $\Phi_{uv}$ is parameterized as a Residual Network, $\Phi_{uv}(x) = x + g_\theta(x)$, and we enforce Spectral Normalization such that the Lipschitz constant satisfies $\mathrm{Lip}(g_\theta) < 1$, then by the Banach Fixed-Point Theorem, $\Phi_{uv}$ is globally invertible and bi-Lipschitz. Furthermore, employing smooth, non-saturating activation functions (e.g., GELU or Swish) ensures that $\Phi_{uv}$ is a valid $C^1$-diffeomorphism. This establishes a direct correspondence between our causal mechanisms and continuous Normalizing Flows (Neural ODEs), where the Picard-Lindelöf theorem guarantees the existence and uniqueness of solutions for Lipschitz continuous vector fields, and the resulting flow map is a valid $C^1$-diffeomorphism over the compact state space defined in Assumption 1.
2. Measure-Theoretic Softening via Entropic Regularization: Even if we relax the assumption such that $\Phi_{uv}$ is merely a standard Feed-Forward Network (which may fold space and lose injectivity, retaining only a finite global Lipschitz constant), our framework remains rigorous due to the entropic regularization ($\varepsilon > 0$). For unregularized Optimal Transport ($\varepsilon = 0$), the lack of strict convexity in $\Phi_{uv}$ would yield a non-differentiable Brenier potential, breaking the geometric pullback in Lemma 4. However, under Sinkhorn regularization, the optimal dual potential $g^{(\varepsilon)}$ is defined via the integral equation $g^{(\varepsilon)}(y) = -\varepsilon \log \int \exp\big((f^{(\varepsilon)}(x) - c(x,y))/\varepsilon\big)\, d\mu(x)$.
Because the cost function $c(x,y) = \|x - y\|^2$ is smooth, the exponential convolution acts as an infinite-dimensional mollifier (Peyré and Cuturi, 2019), guaranteeing that $g^{(\varepsilon)} \in C^\infty(M_v)$ and that it is globally Lipschitz. By Rademacher's Theorem, any globally Lipschitz neural network $\Phi_{uv}$ is differentiable almost everywhere (a.e.) with respect to the Lebesgue measure. Since our probability measures $\mu$ are absolutely continuous, the set of points where $\Phi_{uv}$ is not differentiable has $\mu$-measure zero. Consequently, the chain rule holds $\mu$-a.e., and the topological stress vector field $\nabla(g^{(\varepsilon)} \circ \Phi_{uv}) = (J_{\Phi_{uv}})^\top \nabla g^{(\varepsilon)}$ is well-defined, bounded, and $L^2(\mu)$-integrable. Thus, thermodynamic noise ($\varepsilon > 0$) resolves the measure-theoretic singularities introduced by non-invertible neural networks.

3.3 Cellular Sheaves and Cohomology

A cellular sheaf $\mathcal{F}$ over a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ assigns a vector space (the stalk) $\mathcal{F}(v)$ to each node $v \in \mathcal{V}$, and a linear transformation (the restriction map) $\mathcal{F}(u \to e): \mathcal{F}(u) \to \mathcal{F}(e)$ to each incident node-edge pair. The space of 0-cochains is $C^0(\mathcal{G},\mathcal{F}) = \bigoplus_{v\in\mathcal{V}} \mathcal{F}(v)$, and the space of 1-cochains is $C^1(\mathcal{G},\mathcal{F}) = \bigoplus_{e\in\mathcal{E}} \mathcal{F}(e)$. The linear coboundary operator $d^0: C^0 \to C^1$ computes local discrepancies across edges. The first cohomology group, characterizing the global topological obstructions (cocycles that are not coboundaries), is defined algebraically as the quotient space $H^1(\mathcal{G},\mathcal{F}) = \mathrm{Ker}(d^1)/\mathrm{Im}(d^0)$. Because a graph has no 2-cells, $d^1 \equiv 0$ and this reduces to $H^1(\mathcal{G},\mathcal{F}) = C^1/\mathrm{Im}(d^0)$.

4 The Entropic Causal Sheaf and Topological Linearization

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a directed acyclic graph representing the structural causal mechanisms.
To elevate causal inference from discrete scalar variables to continuous high-dimensional manifolds, we formalize the Structural Causal Model (SCM) using topological sheaf theory over measure spaces.

4.1 Stalks, Restriction Maps, and Metric Sheaf Discrepancy

For each node $v \in \mathcal{V}$, we assign a stalk defined as the non-linear Wasserstein space $\mathcal{P}_2(M_v)$, acting as the local universe of counterfactual distributions. The restriction map for an edge $e = (u,v) \in \mathcal{E}$ is defined as the pushforward operator $\Phi_{uv\#}: \mathcal{P}_2(M_u) \to \mathcal{P}_2(M_v)$, where $\Phi_{uv}$ is the deterministic local causal mechanism.

Let $\mathcal{C}^0 = \prod_{v\in\mathcal{V}} \mathcal{P}_2(M_v)$ denote the base space of 0-cochains, representing a joint assignment of counterfactual marginals $\boldsymbol{\mu} = (\mu_v)_{v\in\mathcal{V}}$. Unlike classical sheaves over vector spaces, the base space $\mathcal{C}^0$ is a curved metric space lacking a linear group structure. Therefore, we measure local topological friction smoothly using the entropy-regularized optimal transport cost (Sinkhorn divergence). We define the Metric Sheaf Discrepancy operator $\delta$ acting on an edge $e = (u,v)$ as:

$$(\delta\boldsymbol{\mu})_e = W_{2,\varepsilon}^2(\Phi_{uv\#}\mu_u, \mu_v). \tag{2}$$

Definition 2 (Variational Metric Obstruction). A causal network exhibits a Variational Metric Obstruction if exact adherence to all local deterministic mechanisms is geometrically impossible, meaning that the infimum of the global Entropic Causal Dirichlet Energy is bounded away from zero:

$$\inf_{\boldsymbol{\mu}\in\mathcal{C}^0} \mathcal{E}_\varepsilon(\boldsymbol{\mu}) = \inf_{\boldsymbol{\mu}\in\mathcal{C}^0} \frac{1}{2}\sum_{i\in\mathcal{V}} \Big[\sum_{p\in\mathrm{pa}(i)} \omega_{pi}\, W_{2,\varepsilon}^2(\mu_i, \Phi_{pi\#}\mu_p) + \sum_{c\in\mathrm{ch}(i)} \omega_{ic}\, W_{2,\varepsilon}^2(\mu_c, \Phi_{ic\#}\mu_i)\Big] > 0, \tag{3}$$

where $\omega_{uv} > 0$ denotes the topological confidence weights of the structural equations, reflecting the structural certainty of the modeled mechanisms. Each edge $(u,v)$ appears once in the parent sum of $v$ and once in the child sum of $u$, so the factor $\frac{1}{2}$ makes every edge contribute exactly once.

4.2 Linearization: The Tangent Causal Sheaf and Rigorous $H^1$

Because the base space $\mathcal{C}^0$ is non-linear, classical cohomological constructs are algebraically ill-defined at the macroscopic level.
To rigorously formalize the cohomological obstruction without abusing notation, we lift the sheaf to the tangent bundle via Otto calculus. We define the Tangent Causal Sheaf $T_{\boldsymbol\mu}\mathcal{F}$. At a global configuration $\boldsymbol{\mu} = (\mu_v)_{v\in\mathcal{V}}$, the stalk over each node $v$ is the Hilbert space $T_{\mu_v}\mathcal{P}_2(M_v)$. The spaces of instantaneous causal perturbations (0-cochains) and edge discrepancy fields (1-cochains) form direct sums of Hilbert spaces:

$$T_{\boldsymbol\mu}\mathcal{C}^0 = \bigoplus_{v\in\mathcal{V}} T_{\mu_v}\mathcal{P}_2(M_v), \qquad T_{\boldsymbol\mu}\mathcal{C}^1 = \bigoplus_{e=(u,v)\in\mathcal{E}} T_{\mu_v}\mathcal{P}_2(M_v). \tag{4}$$

For each edge $e = (u,v)$, let $d\Phi_{uv\#}: T_{\mu_u}\mathcal{P}_2(M_u) \to T_{\mu_v}\mathcal{P}_2(M_v)$ be the linearized restriction map (the Fréchet derivative of the pushforward). We define the linear coboundary operator $d: T_{\boldsymbol\mu}\mathcal{C}^0 \to T_{\boldsymbol\mu}\mathcal{C}^1$ and its well-defined Hilbert adjoint $d^*: T_{\boldsymbol\mu}\mathcal{C}^1 \to T_{\boldsymbol\mu}\mathcal{C}^0$.

Definition 3 (Strict Cohomological Obstruction). By linearizing onto the Tangent Causal Sheaf, we define the first metric cohomology group of the causal system at state $\boldsymbol\mu$ as the algebraic quotient space:

$$H^1(T_{\boldsymbol\mu}\mathcal{G}, T_{\boldsymbol\mu}\mathcal{F}) \cong \mathrm{Ker}(d^1)/\mathrm{Im}(d). \tag{5}$$

A causal system suffers from a Strict Cohomological Obstruction if, at its variational stationary equilibrium $\boldsymbol\mu^*$, the metric cohomology group on the tangent sheaf is non-trivial: $H^1(T_{\boldsymbol\mu^*}\mathcal{G}, T_{\boldsymbol\mu^*}\mathcal{F}) \neq 0$. This implies that the topological stress cannot be resolved by any valid tangent flow.

5 The Entropic Pullback Lemma and Sheaf Laplacian

To resolve topological obstructions while avoiding deterministic manifold tearing, we must minimize $\mathcal{E}_\varepsilon(\boldsymbol\mu)$ by evolving the system along the Wasserstein gradient flow (the JKO scheme). The fundamental mathematical challenge lies in computing the geometric adjoint $d^*$ through non-linear causal mechanisms.

Lemma 4 (First Variation via Entropic Pullback). Let $T: X \to Y$ be a smooth diffeomorphism, $\nu \in \mathcal{P}_2(Y)$ a fixed target measure, and $F_\varepsilon(\mu) = \frac{1}{2} W_{2,\varepsilon}^2(\nu, T_\#\mu)$.
The first variation (Wasserstein gradient) of $F_\varepsilon$ evaluated at the source measure $\mu$ is given by the functional pullback of the Sinkhorn dual potential:

$$\mathrm{grad}_{W_2} F_\varepsilon(\mu) = \nabla\Big(\frac{\delta F_\varepsilon}{\delta\mu}\Big) = \nabla\big(g^{(\varepsilon)} \circ T\big), \tag{6}$$

where $g^{(\varepsilon)}: Y \to \mathbb{R}$ is the Sinkhorn dual potential attached to $T_\#\mu$ in the entropic transport problem between $\nu$ and $T_\#\mu$ (unique up to an additive constant, which does not affect its gradient).

Proof. Let $\xi \in C_c^\infty(X;\mathbb{R}^d)$ be a test vector field perturbing the source measure: $\mu_t = (I + t\xi)_\#\mu$. The pushed-forward measure evolves as $\rho_t = (T\circ(I+t\xi))_\#\mu$. By the chain rule, the Eulerian velocity field driving $\rho_t$ at $t=0$ is $v_0(T(x)) = DT(x)\,\xi(x)$, where $DT$ is the Jacobian of $T$. By the infinite-dimensional Envelope Theorem for Entropic Optimal Transport (Peyré and Cuturi, 2019), the derivative of the regularized cost along the curve is evaluated exactly at the optimal dual potential $g^{(\varepsilon)}$:

$$\frac{d}{dt}\Big|_{t=0} F_\varepsilon(\mu_t) = \int_Y g^{(\varepsilon)}(y)\, \partial_t\rho_t\big|_{t=0}(dy) = \int_Y \langle \nabla g^{(\varepsilon)}(y), v_0(y)\rangle\, d\rho_0(y), \tag{7}$$

where the second equality uses the continuity equation $\partial_t\rho_t + \nabla\cdot(v_t\rho_t) = 0$ and integration by parts. By the change-of-variables theorem for the pushforward measure $\rho_0 = T_\#\mu$, we pull this integral back to the source space $X$:

$$\int_X \langle \nabla g^{(\varepsilon)}(T(x)), DT(x)\,\xi(x)\rangle\, d\mu(x) = \int_X \langle \nabla(g^{(\varepsilon)}\circ T)(x), \xi(x)\rangle\, d\mu(x). \tag{8}$$

Identifying the Riesz representer in the $L^2(X,\mu)$ inner product yields the exact Wasserstein gradient $\nabla(g^{(\varepsilon)}\circ T)$. This proves that the geometric adjoint of the pushforward coincides with pulling back the dual potential through the causal mechanism.
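A discrete analogue of Lemma 4 can be checked numerically. The sketch below rests on our own assumptions: one-dimensional empirical measures, cost c(x, y) = (x − y)² without the ½ factor (so the identity is checked for the unnormalized entropic cost OT_ε = ⟨a, f⟩ + ⟨b, g⟩), a log-domain Sinkhorn loop written from scratch rather than any library's API, and a hypothetical smooth diffeomorphism T(x) = x + 0.5 sin x. For source particles x_j with weights b_j, the lemma predicts ∂OT_ε/∂x_j = b_j T'(x_j) ∇g^{(ε)}(T(x_j)), where g^{(ε)} is extended off the support by the soft c-transform of Remark 1; its gradient is 2(y − entropic barycentric projection). The script compares this pulled-back gradient with central finite differences of the Sinkhorn cost.

```python
import numpy as np

def logsumexp(M, axis):
    """Numerically stable log(sum(exp(M))) along `axis`."""
    m = np.max(M, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(M - m), axis=axis))

def sinkhorn(z, a, y, b, eps, iters=2000):
    """Log-domain Sinkhorn between sum_i a_i delta_{z_i} and sum_j b_j delta_{y_j}.

    Returns potentials f (on z), g (on y) and the entropic cost <a, f> + <b, g>.
    """
    C = (z[:, None] - y[None, :]) ** 2
    f, g = np.zeros_like(z), np.zeros_like(y)
    for _ in range(iters):
        f = -eps * logsumexp(np.log(b)[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(np.log(a)[:, None] + (f[:, None] - C) / eps, axis=0)
    return f, g, a @ f + b @ g

def grad_soft_transform(yq, z, a, f, eps):
    """Gradient of g(y) = -eps * log sum_i a_i exp((f_i - (z_i - y)^2) / eps)."""
    logits = np.log(a)[:, None] + (f[:, None] - (z[:, None] - yq[None, :]) ** 2) / eps
    w = np.exp(logits - logsumexp(logits, axis=0)[None, :])  # softmax over i
    return 2.0 * (yq - w.T @ z)  # 2 * (y - entropic barycentric projection)

def T(x):
    """Hypothetical smooth diffeomorphism standing in for a causal mechanism."""
    return x + 0.5 * np.sin(x)

def dT(x):
    """Derivative of T; always >= 0.5, so T is strictly increasing."""
    return 1.0 + 0.5 * np.cos(x)

rng = np.random.default_rng(0)
z, a = rng.normal(0.5, 0.5, 6), np.full(6, 1 / 6)  # fixed target nu
x, b = rng.normal(0.0, 0.5, 5), np.full(5, 1 / 5)  # source particles of mu
eps = 1.0

f, g, cost = sinkhorn(z, a, T(x), b, eps)
pullback = b * dT(x) * grad_soft_transform(T(x), z, a, f, eps)

h = 1e-4
finite_diff = np.empty_like(x)
for j in range(x.size):
    e = np.zeros_like(x)
    e[j] = h
    finite_diff[j] = (sinkhorn(z, a, T(x + e), b, eps)[2]
                      - sinkhorn(z, a, T(x - e), b, eps)[2]) / (2 * h)

print(np.max(np.abs(pullback - finite_diff)))  # small: finite-difference error only
```

The pulled-back gradient needs one Sinkhorn solve, whereas the finite-difference baseline needs two solves per particle; this is the practical payoff the lemma points to.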
Theorem 5 (The Entropic Wasserstein Sheaf Laplacian). To reach a Pareto-optimal counterfactual equilibrium, the probability measure at each node $i \in \mathcal{V}$ must evolve according to the coupled non-linear Fokker-Planck equation:

$$\partial_t\mu_i = \nabla\cdot\Bigg(\mu_i\, \nabla\underbrace{\Big[\sum_{p\in\mathrm{pa}(i)} \omega_{pi}\, f^{(\varepsilon)}_{i\leftarrow p} + \sum_{c\in\mathrm{ch}(i)} \omega_{ic}\big(g^{(\varepsilon)}_{c\leftarrow i}\circ\Phi_{ic}\big)\Big]}_{\text{Topological Drift (Sheaf Laplacian } \Delta_T)}\Bigg) + \underbrace{\frac{\varepsilon}{2}\Delta\mu_i}_{\text{Thermal Diffusion}}. \tag{9}$$

Proof. The system follows the Wasserstein gradient flow $\partial_t\mu_i = -\mathrm{grad}_{W_2}\mathcal{E}_\varepsilon(\mu_i)$. Taking the Fréchet derivative of Eq. (3) yields two components for node $i$: (1) As a child node ($p \to i$), differentiating $W_{2,\varepsilon}^2(\mu_i, \Phi_{pi\#}\mu_p)$ with respect to its first argument $\mu_i$ yields the Sinkhorn potential $f^{(\varepsilon)}_{i\leftarrow p}$. (2) As a parent node ($i \to c$), applying Lemma 4, the variation is the pulled-back Sinkhorn potential $g^{(\varepsilon)}_{c\leftarrow i}\circ\Phi_{ic}$. Summing these local variations yields the topological drift vector field. Concurrently, the explicit entropy regularizer $\frac{\varepsilon}{2}H(\mu_i)$, with $H(\mu_i) = \int \mu_i\log\mu_i\,dx$, generates the heat-equation (Brownian motion) term $\frac{\varepsilon}{2}\Delta\mu_i$ via Otto calculus, establishing the complete McKean-Vlasov system.

6 Topological Frustration and Global Equilibrium

While Theorem 5 establishes the instantaneous dynamics of the counterfactual measures, a rigorous geometric framework is required to understand the stationary state of this complex non-linear system. We build an interlocking logical chain linking the cohomological obstruction to the asymptotic convergence of the Sheaf Laplacian.

6.1 The Topological Frustration Inequality

If a causal graph is topologically frustrated, perfect adherence to all local structural equations is mathematically impossible. We quantify this inescapable residual energy.
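The linear-algebraic core of this residual energy can be seen on the three-node graph of the introduction under simplifying assumptions that are ours, not the paper's: scalar stalks (the constant sheaf ℝ, i.e., identity restriction maps), Gaussian marginals with a shared spread, hypothetical translation mechanisms Φ_e(x) = x + s_e with s_AB = 0, s_BC = +4, s_AC = −4, unit weights and ε = 0. Because the factor ½ in Eq. (3) cancels the double counting of each edge, the energy becomes E(m) = Σ_e (m_v − m_u − s_e)² = ‖d⁰m − s‖², with d⁰ the signed edge-node incidence (coboundary) matrix. Since the graph has no 2-cells, dim H¹ = |E| − rank d⁰, and the minimal energy is the squared norm of the component of s lying in Ker((d⁰)ᵀ), the harmonic representative of s's class.

```python
import numpy as np

nodes = ["A", "B", "C"]
edges = [("A", "B", 0.0), ("B", "C", 4.0), ("A", "C", -4.0)]  # (tail, head, hypothetical shift s_e)

# Coboundary d0 of the constant sheaf R: (d0 m)_e = m_head - m_tail.
idx = {v: k for k, v in enumerate(nodes)}
d0 = np.zeros((len(edges), len(nodes)))
for row, (u, v, _) in enumerate(edges):
    d0[row, idx[u]], d0[row, idx[v]] = -1.0, 1.0
s = np.array([shift for _, _, shift in edges])

rank = np.linalg.matrix_rank(d0)
dim_H0 = len(nodes) - rank   # global sections, Ker(d0)
dim_H1 = len(edges) - rank   # no 2-cells, so H1 = C1 / Im(d0)

# Minimal energy over node means: least squares ||d0 m - s||^2.
m_opt, *_ = np.linalg.lstsq(d0, s, rcond=None)
residual = s - d0 @ m_opt    # the part of s that no choice of means can explain
E_star = residual @ residual

print(dim_H0, dim_H1)                   # 1 1
print(E_star)                           # about 21.33, i.e. 64/3
print(np.allclose(d0.T @ residual, 0))  # True: the residual lies in Ker(d0^T)
```

Deleting the direct edge A → C turns the graph into a tree; the same script then reports dim_H1 = 0 and a residual energy of zero, which is the discrete counterpart of the absence of an obstruction.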
Lemma 6 (Equivalence of Variational and Cohomological Obstructions). Under the compactness and bi-Lipschitz conditions of Assumption 1, the strict cohomological obstruction on the tangent sheaf implies the variational metric obstruction. That is, if the metric cohomology group evaluated at the optimal stationary section $\boldsymbol\mu^*$ is non-trivial ($H^1(T_{\boldsymbol\mu^*}\mathcal{G}, T_{\boldsymbol\mu^*}\mathcal{F}) \neq 0$), then the global infimum of the Entropic Causal Dirichlet Energy is bounded away from zero ($\inf\mathcal{E}_\varepsilon > 0$).

Proof. We prove this by contraposition. Assume the global minimum is zero: $\inf\mathcal{E}_\varepsilon = 0$. By the lower semi-continuity of the Wasserstein metric and the compactness of $\mathcal{P}_2(M_v)$ (Assumption 1), this minimum is attained at some global section $\tilde{\boldsymbol\mu}$. A zero Dirichlet energy requires that the metric sheaf discrepancy vanishes on all edges, $\delta\tilde{\boldsymbol\mu} = 0$, implying perfect global adherence to all non-linear pushforward mechanisms $\Phi_e$. At this perfectly coherent section $\tilde{\boldsymbol\mu}$, any local tangent perturbation $V \in T_{\tilde{\boldsymbol\mu}}\mathcal{C}^0$ can be absorbed by the exact pushforward differentials along the edges, meaning that the coboundary equation $dV = W$ is globally integrable for any valid discrepancy field $W \in \mathrm{Ker}(d^1)$. Consequently, the exact subspace covers the entire cocycle space, forcing $T_{\tilde{\boldsymbol\mu}}\mathcal{C}^1/\mathrm{Im}(d) \cong 0$. This yields $H^1 = 0$, contradicting the premise. Therefore, $H^1 \neq 0 \implies \inf\mathcal{E}_\varepsilon > 0$.

Theorem 7 (Topological Frustration Inequality). Let $\mathcal{G}$ be a causal graph with a non-trivial first metric cohomology $H^1(T_{\boldsymbol\mu}\mathcal{G}, T_{\boldsymbol\mu}\mathcal{F}) \neq 0$. Then the Entropic Causal Dirichlet Energy is bounded away from zero.
There exists a topological constant $E^* > 0$, depending solely on the geometry of the conflicting pushforward maps $\{\Phi_e\}_{e\in\mathcal{E}}$ and the graph topology, such that $\mathcal{E}_\varepsilon(\boldsymbol\mu) \geq E^*$ for every global joint assignment of measures $\boldsymbol\mu \in \mathcal{C}^0$; equivalently,

$$\inf_{\boldsymbol\mu\in\mathcal{C}^0} \mathcal{E}_\varepsilon(\boldsymbol\mu) \geq E^* > 0. \tag{10}$$

Proof. By Lemma 6, the presence of the strict cohomological obstruction ($H^1 \neq 0$) implies the variational metric obstruction. Since the spaces $\mathcal{P}_2(M_i)$ are compact under Assumption 1, and $W_{2,\varepsilon}^2$ is lower semi-continuous with respect to the weak topology, the infimum of the Dirichlet energy is attained. Because the variational obstruction holds, this attained minimum cannot be zero. Thus, the minimum must be a strictly positive constant $E^* > 0$.

6.2 Linearization on the Tangent Sheaf: The Wasserstein Hodge Decomposition

As formalized in Section 4.2, lifting the sheaf to the tangent bundle provides a rigorous algebraic structure. This linear structure immediately yields a fundamental orthogonal decomposition for causal perturbations.

Theorem 8 (Wasserstein Causal Hodge Decomposition). Let $\boldsymbol\mu$ be a joint counterfactual configuration. The Hilbert space of causal perturbations $T_{\boldsymbol\mu}\mathcal{C}^0$ admits an orthogonal decomposition with respect to the Otto inner product:

$$T_{\boldsymbol\mu}\mathcal{C}^0 = \overline{\mathrm{Im}(d)} \oplus \mathrm{Ker}(\Delta_T), \tag{11}$$

where $\overline{\mathrm{Im}(d)}$ is the topological closure of the image of the coboundary operator, and $\Delta_T = d^* d$ is the Tangent Sheaf Laplacian.

Proof. Because $T_{\boldsymbol\mu}\mathcal{C}^0$ is an infinite-dimensional Hilbert space (the $L^2(\mu)$ closure of gradient fields), the image of the differential operator $d$ is not guaranteed to be closed. By the projection theorem in functional analysis, the orthogonal complement of the kernel of the adjoint $d^*$ is the topological closure of the image of $d$. Thus, $T_{\boldsymbol\mu}\mathcal{C}^0 = \overline{\mathrm{Im}(d)} \oplus \mathrm{Ker}(d^*)$.
Furthermore, for any perturbation vector field $V \in T_{\boldsymbol\mu}\mathcal{C}^0$, $V \in \mathrm{Ker}(\Delta_T) \iff \langle d^* dV, V\rangle = 0 \iff \|dV\|^2 = 0 \iff V \in \mathrm{Ker}(d^*)$. Thus, $\mathrm{Ker}(\Delta_T) = \mathrm{Ker}(d^*)$, completing the orthogonal splitting.

Physical Significance for Counterfactuals: Any instantaneous movement $V$ of the probability measures can be uniquely split into two orthogonal components:
• $V_{\mathrm{exact}} \in \overline{\mathrm{Im}(d)}$: The Exact Flow. These are perturbations that obey the local deterministic structural equations (mechanisms).
• $V_{\mathrm{harmonic}} \in \mathrm{Ker}(\Delta_T)$: The Harmonic Flow. These are perturbations that absorb the residual topological frustration.

When a deterministic system encounters a Counterfactual Event Horizon, trying to force the flow entirely into $\mathrm{Im}(d)$ causes mathematical singularities (manifold tearing). The Entropic Sheaf Laplacian acts as an orthogonal projector, dynamically bleeding the unresolvable topological stress into the harmonic subspace $\mathrm{Ker}(\Delta_T)$.

6.3 The Cohomological Necessity of Causal Equilibrium

We now prove that any continuous optimization algorithm capable of globally resolving structural conflicts is mathematically isomorphic to computing the harmonic representative of the first metric cohomology group.

Theorem 9 (Cohomological Necessity of Causal Equilibrium). Let $\mathcal{G}$ be a causal graph exhibiting a topological obstruction $H^1 \neq 0$. Any continuous algorithm that successfully converges to a stable, Pareto-optimal stationary state $\boldsymbol\mu^*$ of the global causal frustration $\mathcal{E}_\varepsilon(\boldsymbol\mu)$ must yield a terminal stress field $R^*$ that is exactly the unique harmonic representative of its class in the first cohomology group $H^1$.

Proof. Any rational machine learning algorithm seeking to resolve the conflicting mechanisms must inherently attempt to minimize the Entropic Causal Dirichlet Energy $\mathcal{E}_\varepsilon(\boldsymbol\mu)$.
By the fundamental theorem of the calculus of variations in Wasserstein space, the stationary equilibrium $\boldsymbol\mu^*$ must satisfy the first-order optimality condition $\mathrm{grad}_{W_2}\mathcal{E}_\varepsilon(\boldsymbol\mu^*) = 0$. By the Entropic Pullback Lemma (Lemma 4), the chain rule of the Fréchet derivative translates the Wasserstein gradient into the application of the geometric adjoint $d^*$: $d^*R^* = 0$. This implies $R^* \in \mathrm{Ker}(d^*)$. In algebraic topology terms, since our causal graph $\mathcal{G}$ contains no 2-cells (faces), the higher coboundary operator satisfies $d^1 \equiv 0$. Therefore, the metric cohomology group on the tangent sheaf is isomorphic to the quotient space $H^1(T_{\boldsymbol\mu^*}\mathcal{G}, T_{\boldsymbol\mu^*}\mathcal{F}) \cong T_{\boldsymbol\mu^*}\mathcal{C}^1/\mathrm{Im}(d)$. By the orthogonal splitting, this quotient space is isomorphic to $\mathrm{Ker}(d^*)$. Consequently, $R^* \in \mathrm{Ker}(d^*)$ constitutes the unique harmonic representative of its cohomology class in $H^1$.

6.4 Strict Energy Dissipation via Metric Gradient Flows

In Wasserstein space, pushforward operations via arbitrary non-linear neural networks $\Phi_e$ destroy global geodesic convexity. However, by formalizing the system within the Ambrosio-Gigli-Savaré (AGS) framework for gradient flows in metric spaces, we establish a strict energetic descent sequence without requiring strong convexity.

Theorem 10 (Energy Dissipation Identity). Let $\boldsymbol\mu_t = (\mu_{v,t})_{v\in\mathcal{V}}$ be an absolutely continuous curve of measures solving the Sheaf Laplacian system. By the Ambrosio-Gigli-Savaré (AGS) calculus for metric gradient flows, the Entropic Causal Dirichlet Energy acts as a strict Lyapunov functional. The energy dissipation rate is governed exactly by the metric derivative:

$$\frac{d}{dt}\mathcal{E}_\varepsilon(\boldsymbol\mu_t) = -\sum_{i\in\mathcal{V}} \int_{M_i} \Big\|\nabla_x\Big(\frac{\delta\mathcal{E}_\varepsilon}{\delta\mu_{i,t}}\Big)\Big\|^2\, d\mu_{i,t}(x) \leq 0. \tag{12}$$

Consequently, the trajectory stably dissipates energy and converges weakly to a stationary Wasserstein harmonic section $\boldsymbol\mu^*$ where the topological stress is Pareto-minimized. (The rigorous functional-analytic derivation is provided in Appendix D.)

7 Geometric Tearing: Ricci Curvature and Finite-Time Singularities

The empirical phenomenon of "Manifold Tearing" observed in unregularized generative models demands a rigorous geometric explanation. In this section, we elevate our macroscopic topological obstruction ($H^1 \neq 0$) to a microscopic geometric singularity via the Lott-Sturm-Villani (LSV) framework of synthetic Ricci curvature (Lott and Villani, 2009; Sturm, 2006). We prove that topological frustration induces strictly negative effective curvature, manifesting as displacement concavity, within the unregularized causal energy landscape. Consequently, we demonstrate that any deterministic Wasserstein gradient flow ($\varepsilon \to 0$) attempting to resolve this frustrated system inescapably suffers from finite-time singularities.

7.1 Displacement Concavity and Effective Ricci Curvature on the Causal Sheaf

In Lott-Sturm-Villani (LSV) theory, the synthetic Ricci curvature of a metric measure space is defined through the $\kappa$-displacement convexity of an entropy functional along Wasserstein geodesics. We translate this framework to our Causal Sheaf by defining the effective curvature of the unregularized Causal Dirichlet Energy $\mathcal{E}_0$.

Theorem 11 (Topological Frustration Induces Negative Effective Curvature). Let $\mathcal{G}$ be a causal graph exhibiting a strict cohomological obstruction $H^1(T_{\boldsymbol\mu}\mathcal{G}, T_{\boldsymbol\mu}\mathcal{F}) \neq 0$. In the Wasserstein neighborhood of the frustrated equilibrium, the unregularized energy landscape $\mathcal{E}_0$ exhibits strictly negative synthetic curvature along the harmonic flow directions.
Specifically, $\mathcal{E}_0$ is strictly $\kappa$-concave ($\kappa_{\mathrm{global}} < 0$) along generalized geodesics driven by the topological stress.

Proof. In the Otto calculus framework, an effective synthetic Ricci curvature bounded below by $\kappa$ is equivalent to the $\kappa$-convexity of the functional $\mathcal{E}_0(\boldsymbol\mu)$ along Wasserstein geodesics. We evaluate the second-order variation (the Wasserstein Hessian) of $\mathcal{E}_0$. Let $\boldsymbol\mu(s)$ be a constant-speed geodesic in $\mathcal{C}^0$ parameterized by $s \in [0,1]$, driven by the tangent velocity vector field $V \in T_{\boldsymbol\mu}\mathcal{C}^0$. The Hessian quadratic form is:

$$\mathrm{Hess}_{W_2}\mathcal{E}_0(V,V) = \frac{d^2}{ds^2}\Big|_{s=0}\mathcal{E}_0(\boldsymbol\mu(s)) = \int_M \Big\langle V(x), \nabla_x^2\Big(\frac{\delta\mathcal{E}_0}{\delta\mu}\Big) V(x)\Big\rangle\, d\mu(x). \tag{13}$$

By the Causal Hodge Decomposition (Theorem 8), any perturbation $V$ decomposes into $V = V_{\mathrm{exact}} \oplus V_{\mathrm{harmonic}}$. Because $H^1 \neq 0$, the exact intersection of all pushforward maps is empty. According to Caffarelli's regularity theory, the lack of geometric target compatibility implies that the optimal transport map $T_{\mathrm{opt}}$ realizing the cost along the conflicting cycle is non-monotone. By Brenier's Theorem, this non-monotonicity dictates that the underlying Kantorovich potential is not globally convex, yielding at least one negative eigenvalue in the spatial Hessian $\nabla_x^2(\delta\mathcal{E}_0/\delta\mu)$ almost everywhere. Let $-\lambda_{\max} < 0$ be the supremum of this negative eigenvalue evaluated over the support of $\mu$. Thus, for the harmonic perturbation field $V_{\mathrm{harmonic}}$ attempting to resolve the non-integrable cycle, there exists a constant $C > 0$ (determined by the $H^1$ topological gap) such that:

$$\mathrm{Hess}_{W_2}\mathcal{E}_0(V_{\mathrm{harmonic}}, V_{\mathrm{harmonic}}) \leq -C\,\|V_{\mathrm{harmonic}}\|^2_{L^2(\mu)}. \tag{14}$$

By the equivalence of $\kappa$-convexity and synthetic curvature in the LSV formulation, this upper bound yields a negative effective curvature $\kappa_{\mathrm{global}} \leq -C < 0$ for the causal energy landscape.
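Equation (13) can be sanity-checked on the simplest functional for which it is exact, a potential energy E(µ) = ∫ φ dµ. This choice, the quadratic potential φ(x) = ½ xᵀAx with a hypothetical indefinite matrix A, and the constant (translation) velocity field V are illustrative assumptions, not the paper's E_0. Translating µ by sV is a Wasserstein geodesic, so the second derivative of s ↦ E((id + sV)_#µ) at s = 0 should equal ∫⟨V, ∇²φ V⟩dµ = VᵀAV, which is negative along a concave direction of φ.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([2.0, -1.5])                # hypothetical indefinite Hessian of phi
particles = rng.normal(size=(4000, 2))  # samples representing mu

def energy(points):
    """Potential energy E(mu) = average of phi(x) = 0.5 * x^T A x over the samples."""
    return 0.5 * np.mean(np.einsum("ni,ij,nj->n", points, A, points))

def hessian_fd(V, h=1e-3):
    """d^2/ds^2 of E((id + s V)_# mu) at s = 0, by central differences."""
    return (energy(particles + h * V) - 2.0 * energy(particles)
            + energy(particles - h * V)) / h**2

def hessian_formula(V):
    """Eq. (13) for this E: the integral of <V, Hess(phi) V> dmu, i.e. V^T A V."""
    return V @ A @ V

for V in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(V, hessian_fd(V), hessian_formula(V))
# The second direction is a concave direction of phi: both values are close to -1.5.
```

For a general functional such as E_0, the Wasserstein Hessian carries extra terms beyond Eq. (13); the potential-energy case is used here only because the formula is exact for it.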
7.2 The Finite-Time Singularity Theorem

We now prove that this negative curvature guarantees catastrophic failure for standard (unregularized) neural ODE and Flow Matching approaches.

Theorem 12 (Finite-Time Manifold Tearing) Assume an unregularized Causal Generative Model ($\varepsilon = 0$) follows the deterministic Wasserstein gradient flow $\partial_t\mu = -\mathrm{grad}_{W_2}\mathcal{E}_0(\mu)$. Under the conditions of Theorem 11 ($\kappa_{global} < 0$), the Jacobian determinant of the flow map collapses to zero in finite time $T_{tear} < \infty$. This causes the probability density to blow up to infinity (loss of absolute continuity).

Proof Let $v_t(x) = -\nabla_x(\delta\mathcal{E}_0/\delta\mu)$ be the Eulerian velocity field driving the empirical measure. The characteristics (particle trajectories) $X_t(x)$ satisfy the ordinary differential equation:

$$\frac{d}{dt}X_t(x) = v_t(X_t(x)), \qquad X_0(x) = x \tag{15}$$

Let $J_t(x) = \det(\nabla X_t(x))$ be the Jacobian determinant of the deformation gradient. The evolution of the density is governed by pushforward mass conservation: $\mu_t(X_t(x)) = \mu_0(x)/J_t(x)$.

To trace the evolution of $J_t$, we define the deformation matrix $M_t = \nabla X_t(x)$. Taking the time derivative, we obtain the matrix Riccati equation along the characteristics:

$$\frac{d}{dt}M_t = \nabla v_t(X_t(x))\,M_t = -\nabla_x^2\Big(\frac{\delta\mathcal{E}_0}{\delta\mu}\Big)M_t \tag{16}$$

By Liouville's formula, the time evolution of the Jacobian determinant is:

$$\frac{d}{dt}J_t(x) = J_t(x)\,\mathrm{Tr}(\nabla v_t) = -J_t(x)\,\Delta_x\Big(\frac{\delta\mathcal{E}_0}{\delta\mu}\Big) \tag{17}$$

From the proof of Theorem 11, the $H^1$ topological obstruction guarantees that the spatial Hessian possesses negative eigenvalues bounded by $-\lambda_{max}$, which makes the potential strongly concave in the harmonic direction. Therefore, the trace of the Hessian (the Laplacian) is strictly positive and bounded below: $\Delta_x(\delta\mathcal{E}_0/\delta\mu) \ge C > 0$ in regions aligned with the topological stress.
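The rest of the proof relies on Liouville's formula (17), which can be checked numerically. For illustration, we assume that the velocity gradient along one characteristic is frozen to a constant matrix $\nabla v = -H$, where $H$ is a hypothetical symmetric Hessian of $\delta\mathcal{E}_0/\delta\mu$. Eq. (16) then becomes linear, and (17) predicts $\det M_T = \exp(-\mathrm{Tr}(H)\,T)$. The sketch integrates (16) with classical RK4 and compares the two.

```python
import math

def matmul(A, B):
    """2x2 matrix product."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def axpy(X, Y, c):
    """Entrywise X + c * Y for 2x2 matrices."""
    return [[X[i][j] + c * Y[i][j] for j in range(2)] for i in range(2)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def flow_jacobian_det(A, T=1.0, steps=1000):
    """Integrate dM/dt = A M from M_0 = I (Eq. (16) with frozen A) with RK4; return det(M_T)."""
    M = [[1.0, 0.0], [0.0, 1.0]]
    dt = T / steps
    for _ in range(steps):
        k1 = matmul(A, M)
        k2 = matmul(A, axpy(M, k1, dt / 2))
        k3 = matmul(A, axpy(M, k2, dt / 2))
        k4 = matmul(A, axpy(M, k3, dt))
        M = [[M[i][j] + dt / 6 * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j])
              for j in range(2)] for i in range(2)]
    return det2(M)

# Hypothetical frozen Hessian of the first variation along one characteristic.
H = [[1.2, 0.3], [0.3, 0.8]]
A = [[-h for h in row] for row in H]      # velocity gradient: grad v = -Hess
laplacian = H[0][0] + H[1][1]             # Tr(H) = 2.0
J_T = flow_jacobian_det(A, T=1.0)
print(J_T, math.exp(-laplacian * 1.0))    # Liouville: det M_T = exp(-Tr(H) T)
```

The integrated determinant matches $e^{-\mathrm{Tr}(H)T}$ to RK4 accuracy. It also satisfies the Grönwall-type bound $J_T \le e^{-CT}$ for any $C \le \mathrm{Tr}(H)$.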
Substituting this bound into the Liouville equation yields a strict ordinary differential inequality:

$$\frac{d}{dt}J_t(x) \le -C\,J_t(x) \;\Longrightarrow\; J_t(x) \le J_0(x)\,e^{-Ct} = e^{-Ct} \tag{18}$$

Furthermore, analyzing the eigenvalues of the Riccati equation (16) under strong concavity shows that the matrix $M_t$ becomes singular. Let $y(t)$ be the minimum eigenvalue of $M_t$. It satisfies $\dot{y} \le -C y^2$, which implies $y(t) \to 0$ at a finite time $T_{tear} \le 1/(C\,y(0))$. When the minimum eigenvalue hits zero, the characteristics cross and $J_{T_{tear}}(x) \to 0$. By mass conservation, the density $\mu_{T_{tear}}(X_{T_{tear}}(x)) \to \infty$. The measure concentrates into a singular Dirac distribution (a shockwave), breaking absolute continuity with respect to the Lebesgue measure. This completes the proof of deterministic manifold tearing.

8 Large Deviation Theory and Entropic Tunneling

Having proven that $\varepsilon = 0$ leads to finite-time singularities, we now show that the thermodynamic noise $\varepsilon > 0$ in the Entropic Sheaf Laplacian is a mathematical necessity, not merely a numerical stabilizer. We formalize the "Entropic Tunneling" observed in our PBMC scRNA-seq experiments using infinite-dimensional Freidlin-Wentzell Large Deviation Theory (LDT).

8.1 Dawson-Gärtner Large Deviation Principle

The Entropic Sheaf Flow is an interacting particle system described by the stochastic partial differential equation (SPDE):

$$d\mu_t = -\mathrm{grad}_{W_2}\mathcal{E}_0(\mu_t)\,dt + \sqrt{\varepsilon}\,dW^{W_2}_t \tag{19}$$

By the Dawson-Gärtner theorem for macroscopic fluctuation theory (Dawson and Gärtner, 1987), the sequence of path measures $\mathbb{P}_\varepsilon$ satisfies a Large Deviation Principle on the space of absolutely continuous curves $C([0,T]; \mathcal{P}_2(M))$.
The rate function controlling the exponentially small probability of observing a specific trajectory $\{\mu_t\}$ is the Causal Action Functional. Its infimum plays the role of the Freidlin-Wentzell quasi-potential (Kramers' rate) (Freidlin and Wentzell, 1998):

$$I_T[\mu_t] = \frac{1}{2}\int_0^T \big\|\partial_t\mu_t + \mathrm{grad}_{W_2}\mathcal{E}_0(\mu_t)\big\|^2_{\mu_t}\,dt \tag{20}$$

8.2 Kramers' Escape Rate over the Topological Barrier

Let $\mu_A$ be a local minimum (e.g., the Biological Chimera trap) and $\mu^*$ the target coherent counterfactual state. The two are separated by a topological barrier (the Transcriptomic Void) with a saddle point $\mu_{saddle}$. The height of the energy barrier is $\Delta E = \mathcal{E}_0(\mu_{saddle}) - \mathcal{E}_0(\mu_A)$.

Theorem 13 (Rigorous Entropic Tunneling Time) Let $\tau$ be the first hitting time of the target domain $\mu^*$ starting from $\mu_A$. As the entropic regularization vanishes ($\varepsilon \to 0$), the expected tunneling time $\mathbb{E}[\tau]$ obeys Kramers' asymptotic law:

$$\lim_{\varepsilon\to 0}\varepsilon\log\mathbb{E}[\tau] = \Delta E \tag{21}$$

Proof Step 1: Exponential Tightness and LDP Well-posedness. To apply the Dawson-Gärtner Large Deviation Principle in the infinite-dimensional Wasserstein space, the sequence of path measures $\mathbb{P}_\varepsilon$ must be exponentially tight. By Assumption 1, the base state manifolds $M_v$ are compact. By Prokhorov's Theorem, the space of probability measures $\mathcal{P}_2(M_v)$ is then compact under the weak topology. This compactness guarantees the exponential tightness of $\mathbb{P}_\varepsilon$ on $C([0,T]; \mathcal{C}^0)$. Consequently, the Causal Action Functional $I_T$ is lower semicontinuous with compact level sets, which satisfies the prerequisites of the Freidlin-Wentzell quasi-potential framework in metric spaces.

Step 2: Evaluating the Action Functional.
By the LDP established above, the asymptotic escape time is determined by the minimum-action path connecting $\mu_A$ to $\mu^*$. We evaluate the infimum of the action functional, $V(\mu_A,\mu^*) = \inf_{T>0}\inf_{\{\mu_t\}} I_T[\mu_t]$, by expanding the squared integrand and completing the square (the Bogomolny trick):

$$I_T[\mu_t] = \frac{1}{2}\int_0^T\Big(\|\partial_t\mu_t\|^2_{\mu_t} + \|\mathrm{grad}_{W_2}\mathcal{E}_0\|^2_{\mu_t} + 2\langle\partial_t\mu_t, \mathrm{grad}_{W_2}\mathcal{E}_0\rangle_{\mu_t}\Big)dt \ge \int_0^T\langle\partial_t\mu_t, \mathrm{grad}_{W_2}\mathcal{E}_0(\mu_t)\rangle_{\mu_t}\,dt \tag{22}$$

By the Riemannian chain rule in Otto calculus, the inner product of the velocity field and the Wasserstein gradient is exactly the time derivative of the energy: $\langle\partial_t\mu_t, \mathrm{grad}_{W_2}\mathcal{E}_0\rangle_{\mu_t} = \frac{d}{dt}\mathcal{E}_0(\mu_t)$. Thus, integrating along any path that reaches the saddle point yields the lower bound:

$$I_T[\mu_t] \ge \int_0^T\frac{d}{dt}\mathcal{E}_0(\mu_t)\,dt = \mathcal{E}_0(\mu_{saddle}) - \mathcal{E}_0(\mu_A) = \Delta E \tag{23}$$

This infimum is uniquely achieved when the inequality in (23) becomes an equality. This happens when the system follows the time-reversed heteroclinic orbit $\partial_t\mu_t = +\mathrm{grad}_{W_2}\mathcal{E}_0(\mu_t)$ up to the saddle point and then relaxes deterministically to $\mu^*$.

Applying the upper and lower bounds of the Large Deviation Principle to the exit-time distribution yields the classical Kramers limit. The probability of escape scales as $P \asymp \exp(-\Delta E/\varepsilon)$, which gives $\lim_{\varepsilon\to 0}\varepsilon\log\mathbb{E}[\tau] = \Delta E$.

Mathematical Consequence: Theorem 13 proves that the relationship $\mathbb{E}[\tau] \asymp \exp(\Delta E/\varepsilon)$ is a strict topological law. As $\varepsilon \to 0$, the expected time to cross the biological void diverges to $+\infty$. Therefore, unregularized deterministic models are theoretically paralyzed by topological frustration and remain permanently trapped in artificial chimeras. The Entropic Causal Sheaf dynamically lowers this Kramers barrier, making global causal coherence computationally inevitable.
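The exponential growth of $\mathbb{E}[\tau]$ in Theorem 13 is easy to observe in a one-dimensional caricature. The sketch below does not simulate the Wasserstein SPDE (19). It assumes a scalar double-well potential $V(x) = (x^2-1)^2/4$ with barrier height $\Delta E = 1/4$. It also uses noise amplitude $\sqrt{2\varepsilon}$, matching the $\sqrt{2\varepsilon\eta}$ update in Algorithm 1; under this normalization, Kramers' law reads $\mathbb{E}[\tau] \asymp \exp(\Delta E/\varepsilon)$. The starting well, target threshold, step size, and number of runs are illustrative choices.

```python
import math
import random

def grad_potential(x):
    """V'(x) for the assumed double well V(x) = (x^2 - 1)^2 / 4 (wells at +-1, saddle at 0)."""
    return x * (x * x - 1.0)

def escape_time(eps, rng, dt=0.01, x0=-1.0, target=0.5, max_steps=10**6):
    """First time an Euler-Maruyama path of dX = -V'(X) dt + sqrt(2 eps) dW reaches `target`."""
    x, t = x0, 0.0
    noise_scale = math.sqrt(2.0 * eps * dt)
    for _ in range(max_steps):
        if x >= target:
            return t
        x += -grad_potential(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        t += dt
    return t  # step cap reached; not expected for the eps values used below

def mean_escape_time(eps, n_runs=50, seed=0):
    """Monte Carlo estimate of E[tau] from n_runs independent paths."""
    rng = random.Random(seed)
    return sum(escape_time(eps, rng) for _ in range(n_runs)) / n_runs

for eps in (0.25, 0.1):
    print(f"eps = {eps}: estimated E[tau] = {mean_escape_time(eps):.1f}")
```

Reducing $\varepsilon$ from 0.25 to 0.1 should increase the estimated mean escape time severalfold. At such moderate noise levels the Arrhenius prefactor is still visible, so $\varepsilon\log\mathbb{E}[\tau]$ approaches $\Delta E$ only slowly.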
9 Algorithmic Realization via Implicit Function Theorem (IFT)

An important consequence of Theorem 5 is that it is immediately tractable in modern deep learning frameworks. The inverse topological stress exerted by a child node requires computing the spatial gradient $\nabla_x(g^{(\varepsilon)}\circ\Phi)$. By the chain rule, this translates directly into a Vector-Jacobian Product (VJP):

$$\nabla_x\big(g^{(\varepsilon)}(\Phi(x))\big) = \big(J_\Phi(x)\big)^{\top}\nabla_y g^{(\varepsilon)}(y)\Big|_{y=\Phi(x)} \equiv \mathrm{VJP}_\Phi\big(x, \nabla_y g^{(\varepsilon)}(y)\big) \tag{24}$$

Appendix A.3 gives the functional-analytic proof that the geometric adjoint $(d\Phi^{uv}_{\#})^*$ coincides exactly with the VJP in the Otto inner product. It also gives the explicit block operator matrix of the Tangent Sheaf Laplacian $\Delta_T$.

However, a critical algorithmic challenge remains: the dual potential $g^{(\varepsilon)}$ has no closed form and is computed iteratively via the Sinkhorn-Knopp algorithm. Naively unrolling the computational graph of the Sinkhorn iterations to compute $\nabla_y g^{(\varepsilon)}$ causes two problems. Memory grows with the number of iterations ($O(L)$ for $L$ iterations), and gradients vanish.

To bypass this, we use the Implicit Function Theorem (IFT). The Sinkhorn potentials $(f^{(\varepsilon)}, g^{(\varepsilon)})$ are uniquely defined (up to an additive constant) as the roots of the fixed-point optimality conditions (the Sinkhorn equations):

$$F\big(f^{(\varepsilon)}, g^{(\varepsilon)}, \mu, \nu\big) = 0$$

Applying the IFT at the steady-state root $F = 0$, the exact gradient $\nabla_y g^{(\varepsilon)}(y)$ can be computed by solving a single linear system built from the Jacobian of the optimality conditions. This computation is entirely independent of the iteration path. Consequently, the topological stress VJP can be computed with memory that is $O(1)$ in the number of iterations $L$. This allows the coupled PDEs in Eq. (9) to be simulated efficiently as interacting Langevin particle dynamics, conquering the curse of dimensionality.
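The memory argument can be seen on a scalar fixed point. The sketch below is a toy stand-in for the Sinkhorn equations, not the paper's solver. It assumes a hypothetical contraction $x = \phi(x,\theta) = \tfrac12\cos x + \theta$. Differentiating through the unrolled loop in reverse mode requires the whole iteration history. The IFT instead gives $dx^*/d\theta = (\partial_\theta\phi)/(1 - \partial_x\phi)$ evaluated at the root alone.

```python
import math

def phi(x, theta):
    """Hypothetical contraction map standing in for one Sinkhorn sweep."""
    return 0.5 * math.cos(x) + theta

def dphi_dx(x):
    return -0.5 * math.sin(x)

def solve_fixed_point(theta, iters=200, x0=0.0):
    x = x0
    for _ in range(iters):
        x = phi(x, theta)
    return x

def unrolled_grad(theta, iters, x0=0.0):
    """d x_L / d theta by differentiating through the loop (forward mode here).
    A reverse-mode version of the same computation must store every iterate x_k."""
    x, dx = x0, 0.0
    for _ in range(iters):
        x, dx = phi(x, theta), dphi_dx(x) * dx + 1.0   # d phi / d theta = 1
    return dx

def ift_grad(theta):
    """IFT: dx*/dtheta = (d phi/d theta) / (1 - d phi/d x), evaluated at the root only."""
    x_star = solve_fixed_point(theta)
    return 1.0 / (1.0 - dphi_dx(x_star))

theta = 1.0
print("IFT:", ift_grad(theta))
print("unrolled, L=200:", unrolled_grad(theta, 200))
print("unrolled, L=3:", unrolled_grad(theta, 3))
```

With enough iterations the unrolled derivative matches the IFT value. With only three iterations it is visibly biased, a small-scale analogue of the truncation bias caused by finite unrolling.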
9.1 Complexity Analysis and the Entropic Sheaf Flow Algorithm

Integrating the IFT-based VJP into the Wasserstein gradient flow yields a highly scalable interacting particle system, which we formalize as the Entropic Sheaf Flow (Algorithm 1).

Memory and Time Complexity: Standard backpropagation through a Sinkhorn loop of $L$ iterations requires $O(L \cdot N^2)$ memory for $N$ particles, which is prohibitive for high-dimensional SCMs. Leveraging the IFT at the stationary root reduces the memory complexity to $O(N^2)$, independent of $L$, because only the optimal coupling matrix must be stored for the final backward pass. The time complexity per step is dominated by solving a strictly diagonally dominant linear system. This system can be approximated efficiently via Conjugate Gradient (CG) or a Neumann series in $O(K \cdot N^2)$ time, where $K \ll L$.

Algorithm 1 Entropic Sheaf Flow via IFT and Langevin Dynamics
Require: Directed acyclic SCM graph $G = (\mathcal{V}, \mathcal{E})$, causal mechanisms $\{\Phi_e\}_{e\in\mathcal{E}}$.
Require: Initial factual empirical measures $\mu^{(0)} = \{\mu^{(0)}_v\}_{v\in\mathcal{V}}$.
Require: Hyperparameters: step size $\eta$, entropic regularization $\varepsilon$, confidence weights $\omega_{uv}$.
for $t = 0$ to $T$ do
  // Step 1: Forward pushforward
  for each edge $e = (u, v) \in \mathcal{E}$ do
    Compute empirical pushforward: $\rho^{(t)}_{u\to v} = \Phi^{uv}_{\#}\mu^{(t)}_u$
  end for
  // Step 2: Sinkhorn fixed point and IFT
  for each node $i \in \mathcal{V}$ do
    Initialize topological drift vector $V_i = 0$
    for each parent $p \in \mathrm{pa}(i)$ do
      Solve Sinkhorn equations for $f^{(\varepsilon)}_{i\leftarrow p}$ between $\mu^{(t)}_i$ and $\rho^{(t)}_{p\to i}$
      $V_i \leftarrow V_i + \omega_{pi}\nabla_x f^{(\varepsilon)}_{i\leftarrow p}(x)$   (Forward Stress)
    end for
    for each child $c \in \mathrm{ch}(i)$ do
      Solve Sinkhorn equations for $g^{(\varepsilon)}_{c\leftarrow i}$ between $\rho^{(t)}_{i\to c}$ and $\mu^{(t)}_c$
      Compute steady-state gradient $\nabla_y g^{(\varepsilon)}_{c\leftarrow i}(y)$ via IFT linear solve
      $V_i \leftarrow V_i + \omega_{ic}\,(J_{\Phi^{ic}}(x))^{\top}\nabla_y g^{(\varepsilon)}_{c\leftarrow i}(\Phi^{ic}(x))$   (Inverse Pullback Stress via VJP)
    end for
  end for
  // Step 3: Interacting Langevin update
  for each node $i \in \mathcal{V}$ do
    Sample Brownian noise $\xi_i \sim \mathcal{N}(0, I)$
    Update particles: $X^{(t+1)}_i = X^{(t)}_i - \eta V_i + \sqrt{2\varepsilon\eta}\,\xi_i$
  end for
end for
return Converged Wasserstein Harmonic Section $\mu^{(T)}$

9.2 Numerical Stability and the Entropic Trade-off ($\varepsilon \to 0$)

The IFT guarantees an $O(N^2)$ memory footprint, but assessing its computational viability requires examining the condition number of the Hessian. The implicit gradient requires solving a linear system of the form $Hv = b$, where $H$ is the block Jacobian of the Sinkhorn optimality conditions. The off-diagonal blocks of $H$ are governed by the optimal coupling matrix $P \in \mathbb{R}^{N\times N}$, whose entries are proportional to $\exp\big((f_i + g_j - c(x_i, y_j))/\varepsilon\big)$. As the entropic regularization vanishes ($\varepsilon \to 0$), the optimal coupling converges to a deterministic Monge map, so $P$ becomes extremely sparse and nearly singular.
Consequently, the condition number of the Hessian $\kappa(H)$ scales exponentially:

$$\kappa(H) = O\big(e^{\mathrm{diam}(M)/\varepsilon}\big) \tag{25}$$

In this deterministic limit, the iterative Krylov subspace solvers (e.g., Conjugate Gradient) used to compute the IFT inverse stall or diverge numerically. This reveals a physical and computational trade-off. The thermodynamic noise $\varepsilon$ is a geometric regularizer that prevents manifold tearing (Theorem 5), but it is also necessary to keep the Hessian matrix well conditioned for the Implicit Function Theorem. In our experiments, an entropic coefficient $\varepsilon \in [0.1, 5.0]$ strikes the best Pareto balance, ensuring both topological resolution and robust IFT convergence.

10 Empirical Validation: 2D Vector Field Dynamics

To illustrate the global asymptotic stability of our framework, we construct a reproducible 2D simulation over the causal graph $G = \{A \to B, B \to C, A \to C\}$.

Experimental Setup (Reproducibility): The initial marginals are isotropic Gaussians: $\mu^{(0)}_A = \mathcal{N}([0,0]^{\top}, I)$, $\mu^{(0)}_B = \mathcal{N}([0,0]^{\top}, I)$, and $\mu^{(0)}_C = \mathcal{N}([8,0]^{\top}, I)$. We engineer a severe topological conflict ($H^1 \neq 0$) by defining the deterministic pushforward mechanisms as constant-drift (translation) mappings:
• $\Phi_{AB}(x) = x + [4, 4]^{\top}$
• $\Phi_{BC}(x) = x + [4, -4]^{\top}$ (thus, $A \to B \to C$ attempts to route the origin to $[8, 0]^{\top}$)
• $\Phi_{AC}(x) = x + [0, 8]^{\top}$ (the direct edge attempts to route the origin to $[0, 8]^{\top}$, creating an orthogonal contradiction at node $C$)

The system is evolved according to the coupled SDE derived from Eq. (9), $dx_i(t) = -\nabla_x(\delta\mathcal{E}^\varepsilon/\delta\mu_i)\,dt + \sqrt{\varepsilon}\,dW_t$. We use the Euler-Maruyama method with step size $\eta = 0.01$, $\varepsilon = 0.1$, and uniform edge weights $\omega = 1.0$ for $T = 500$ steps.
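Because the three mechanisms above are pure translations, the conflict at node $C$ can be read off without any optimization. The sketch below (the function names are ours) composes the maps around the cycle. It returns the residual shift $\Phi_{BC}\circ\Phi_{AB}(x) - \Phi_{AC}(x)$, which is the same for every source point $x$.

```python
import math

def phi_ab(p):
    return (p[0] + 4.0, p[1] + 4.0)

def phi_bc(p):
    return (p[0] + 4.0, p[1] - 4.0)

def phi_ac(p):
    return (p[0] + 0.0, p[1] + 8.0)

def cycle_residual(p):
    """Difference between routing p along A -> B -> C and along the direct edge A -> C."""
    via_b = phi_bc(phi_ab(p))
    direct = phi_ac(p)
    return (via_b[0] - direct[0], via_b[1] - direct[1])

for p in [(0.0, 0.0), (1.5, -2.0), (-3.0, 7.0)]:
    r = cycle_residual(p)
    print(p, "->", r, "norm", round(math.hypot(*r), 4))
```

The residual is $[8, -8]^{\top}$, with norm $8\sqrt{2}$, at every point. A translation shifts the mean of a measure by exactly that amount. Hence no $\mu_A$ with a finite mean can satisfy $\Phi_{BC\#}\Phi_{AB\#}\mu_A = \Phi_{AC\#}\mu_A$. This is the concrete form of the obstruction that the flow must trade off.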
Instead of succumbing to deterministic manifold tearing, our Sheaf Laplacian smoothly guides the empirical measures along Langevin streamlines (gray trajectories in Figure 2). As detailed in Table 1, the Causal Dirichlet Energy drops by 64.53%, from 128.38 to 45.54. Crucially, demonstrating the effect of the pullback lemma (Lemma 4), the source node A undergoes an autonomous inverse displacement of $(+0.97, -1.57)$ into the fourth quadrant. This active deformation absorbs the downstream topological stress, establishing global equilibrium without infinite variance blow-up.

(Figure 2: three panels, "Node A (Source)", "Node B (Intermediate)", and "Node C (Target - Conflict Zone)", each marking the particles at t=0 (Initial) and t=End (Converged), with annotations "Goal: Path A→B→C", "Goal: Path A→C", and "Compromise Center". Overall title: "2D Wasserstein Sheaf Flow: Topological Conflict Resolution".)
Figure 2: High-resolution vector field of the 2D Wasserstein Sheaf Flow over 500 steps, overcoming orthogonal topological conflicts.

| Metric | Initial ($t = 0$) | Converged ($t = 500$) | Change / Shift |
| Causal Dirichlet Energy | 128.38 | 45.54 | −64.53% |
| Node C Center of Mass | (8.01, 0.01) | (5.79, 1.22) | (−2.21, +1.21) |
| Node A Center of Mass | (0.00, 0.03) | (0.97, −1.54) | (+0.97, −1.57) |
| Node A Variance | - | 41.10 | Stable (Compact) |

Table 1: Quantitative analysis of the 500-step Entropic Sheaf Flow.
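Each Langevin step of the experiment above calls the Sinkhorn solve in Step 2 of Algorithm 1. Below is a minimal log-domain sketch for two uniform one-dimensional point clouds. The squared-Euclidean cost $c(x,y) = |x-y|^2/2$, the point sets, and the fixed iteration count are assumptions for illustration; the code stands in for the authors' implementation rather than reproducing it. It returns potentials $(f, g)$ satisfying the Sinkhorn equations, together with the coupling $P_{ij} = a_i b_j \exp\big((f_i + g_j - c(x_i,y_j))/\varepsilon\big)$. It also shows $P$ concentrating on the monotone (Monge) matching as $\varepsilon$ shrinks, the regime discussed in Section 9.2.

```python
import math

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def sinkhorn_log(xs, ys, eps, iters=500):
    """Log-domain Sinkhorn between uniform measures on xs and ys (1D), cost |x - y|^2 / 2."""
    n, m = len(xs), len(ys)
    log_a, log_b = -math.log(n), -math.log(m)
    C = [[0.5 * (x - y) ** 2 for y in ys] for x in xs]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Each half-update enforces one marginal constraint of F(f, g) = 0 exactly.
        f = [-eps * logsumexp([log_b + (g[j] - C[i][j]) / eps for j in range(m)]) for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - C[i][j]) / eps for i in range(n)]) for j in range(m)]
    P = [[math.exp(log_a + log_b + (f[i] + g[j] - C[i][j]) / eps) for j in range(m)]
         for i in range(n)]
    return f, g, P

def diagonal_mass(P):
    """Mass on the monotone matching (points are sorted, so that is the diagonal)."""
    return sum(P[i][i] for i in range(len(P)))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.2, 1.1, 2.3, 2.9]
for eps in (1.0, 0.1, 0.01):
    _, _, P = sinkhorn_log(xs, ys, eps)
    print(f"eps={eps}: mass on the monotone matching = {diagonal_mass(P):.4f}")
```

As $\varepsilon$ decreases, the diagonal mass approaches 1, meaning $P$ approaches the scaled permutation matrix of the monotone matching. This is the sparsification that degrades the conditioning of the IFT linear system.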
10.1 Scalability and the IFT Memory Triumphs

To empirically validate the computational advantage of the IFT-based VJP formulation proposed in Section 9, we benchmarked the Entropic Sheaf Flow against standard unrolled autodiff (backpropagation through time), the default approach in modern deep learning frameworks. We simulated a high-dimensional counterfactual scenario with $N = 1000$ particles in a $D = 128$ dimensional ambient manifold. As illustrated in Figure 3, memory use in the naive unrolling method grows linearly with $L$, reaching nearly 4 GB of VRAM for only 1000 particles at $L = 1000$ iterations. In contrast, our IFT formulation reduces the infinite-dimensional adjoint to a single steady-state linear solve, and its peak memory footprint stays flat ($O(N^2)$) as $L$ grows.

Furthermore, numerical profiling reveals an artifact inherent to finite unrolling. When naive unrolling is restricted to $L = 10$ iterations to conserve memory, the computed gradients exhibit an $L^1$ relative error of 0.796 compared to the true analytical gradient. This indicates severe truncation bias: the finite computational graph fails to reach the true $c$-concave Kantorovich potential. Our IFT-based Sheaf Flow bypasses this trade-off. It targets the exact stationary root directly, yielding unbiased, mathematically exact topological stress gradients at a fraction of the computational and memory cost.

(Figure 3: left panel "Gradient Computation Time (N=1000, D=128)", time per backward pass in seconds versus Sinkhorn iterations $L$, for Unrolled Autodiff (Naive) and IFT VJP (Ours). Right panel "Reverse-Mode Memory Complexity", peak VRAM footprint in MB versus $L$, annotated "Unrolled Autodiff: $O(LN^2)$" and "IFT VJP: $O(N^2)$ (Constant)". Overall title: "Algorithmic Triumphs: Implicit Function Theorem vs. Naive Unrolling".)
Figure 3: Algorithmic Benchmarks (IFT vs. Naive Unrolling). (Left): Gradient computation time per backward pass. The IFT avoids traversing the $L$-step computational graph, achieving significant acceleration. (Right): Reverse-mode memory footprint. Naive unrolling suffers from $O(L \cdot N^2)$ linear growth, which effectively prohibits high-dimensional deep learning. In contrast, our IFT-VJP formulation bounds the memory at $O(N^2)$, making the algorithmic footprint invariant to the Sinkhorn horizon.

11 Real-World Application: PBMC scRNA-seq Counterfactuals

To evaluate the scalability and biological fidelity of the Entropic Sheaf Flow, we apply our framework to a high-dimensional single-cell RNA sequencing (scRNA-seq) dataset (PBMC 3k from 10x Genomics). This experiment tests the model's ability to navigate complex, non-convex manifolds in which the "Transcriptomic Void" acts as a physical cohomological obstruction.

11.1 Experimental Setup and the Transcriptomic Void

The state spaces $M_v$ are defined as the 15-dimensional PCA embedding of the cell-gene expression matrix. We define a counterfactual task: transitioning a cell population from a Factual State (T-cells, $\mu_0$) to a Target State (Monocytes, $\mu_{do(x^*)}$). In this high-dimensional manifold, the two clusters are separated by a region of near-zero probability density, the Transcriptomic Void. This void represents biologically impossible gene expression states and constitutes a severe topological obstruction with $H^1(G, \mathcal{F}) \neq 0$.

11.2 Results: Biological Chimera vs. Entropic Tunneling

We compare the standard deterministic ODE (Flow Matching) against our proposed Entropic Sheaf Flow (GACF).
The results are summarized in Figure 4 and reveal two distinct behaviors:
• Manifold Tearing and Biological Chimeras: The Naive ODE, driven purely by Euclidean causal drift, attempts to traverse the Transcriptomic Void via the shortest geodesic. As shown in Figure 4, it terminates in a zero-density region and produces a Biological Chimera, a mathematical artifact with no biological counterpart. This confirms the "Manifold Tearing" failure mode.
• Entropic Tunneling and Topological Survival: In contrast, the Entropic Sheaf Flow leverages the regularized Laplacian ($\varepsilon = 5.0$). The thermodynamic noise induces a "tunneling effect" that allows the empirical measure to bypass the high-energy barrier of the void. By integrating the Manifold Score Field, the Sheaf Flow adaptively deforms the trajectory, guiding it along the high-density manifold of real cells to a successful counterfactual landing.

(Figure 4: two-dimensional projection, both axes from −5 to 10, showing the Source and Target clusters, the Naive ODE path, and the GACF Path.)
Figure 4: Counterfactual Intervention on PBMC 3k scRNA-seq. The Naive ODE (dashed gray) fails by entering the zero-density void (Biological Chimera). The GACF Path (solid red) autonomously navigates the manifold, using entropic tunneling to overcome the topological frustration and reach the target Monocyte cluster.

11.3 Efficiency: The IFT Triumphs

As the dimensionality increases to $D = 15$, the computational burden of the Sinkhorn adjoint becomes critical. Deploying the Implicit Function Theorem (IFT)-based VJP (Section 9), we observed a constant memory footprint independent of the Sinkhorn iteration depth $L$. This efficiency allowed 400-step Langevin simulations in the 15D manifold with negligible VRAM overhead, making high-dimensional sheaf-theoretic causal inference computationally feasible.
12 From Inference to Discovery: Topological Causal Structure Learning

Sections 4 through 8 assume that the structural causal graph $G$ is predefined (albeit potentially frustrated). However, the ultimate challenge in machine learning is Causal Discovery: learning the unobserved graph topology $G$ purely from observational data. In this section, we invert our theoretical framework and use sheaf cohomology as a geometric scoring function for continuous structure learning.

12.1 The Principle of Least Topological Friction

Current continuous structure learning algorithms, such as NOTEARS and its differentiable variants (Zheng et al., 2018; Brouillard et al., 2020), primarily penalize algebraic cyclicity. They do not geometrically quantify whether the proposed structural mechanisms compose consistently over the probability manifold. We propose that the true causal graph is the one that minimizes the cohomological obstruction.

Let $\mathcal{G}$ be the hypothesis space of candidate causal graphs. For a candidate graph $G \in \mathcal{G}$ and its associated mechanisms $\Phi_G$, we define the Topological Causal Score $S(G)$ as the residual global Entropic Causal Dirichlet Energy at the Wasserstein harmonic equilibrium:

$$S(G) = \inf_{\mu\in\mathcal{C}^0}\mathcal{E}^{\varepsilon}(\mu; G) \tag{26}$$

By the Wasserstein Causal Hodge Decomposition (Theorem 8), this residual energy isolates the norm of the unresolvable topological stress field (the harmonic flow). Therefore, $S(G) \propto \|V_{harmonic}\|^2_{\mathrm{Ker}(\Delta_T)}$.

Theorem 14 (Topological Identifiability of Spurious Edges) Let $G_{true}$ be the data-generating causal graph, and let $G_{cand}$ be a candidate graph containing a spurious directed path that forms a structural loop contradicting the true data manifold (e.g., a reversed causal edge creating an $H^1$ cycle). Let the mechanisms $\Phi$ be optimally trained to match the pairwise observational marginals.
Then the candidate graph induces a strictly positive topological friction gap relative to the true graph:

$$S(G_{cand}) > S(G_{true}) \ge 0 \tag{27}$$

Proof If $G_{cand}$ contains a spurious conflicting cycle, the optimal pushforward maps along this cycle cannot globally commute over the observational manifold. By Theorem 7 (Topological Frustration Inequality), this non-trivial first metric cohomology $H^1(T_\mu G_{cand}, \mathcal{F}) \neq 0$ bounds the infimum of the Dirichlet energy away from zero. Conversely, the true generative graph $G_{true}$ possesses a globally consistent section (the true observational joint distribution), which allows the empirical measures to flow into the exact subspace $\mathrm{Im}(d)$. Thus, the harmonic residual $\|V_{harmonic}\|^2$ of $G_{true}$ is minimized, separating $G_{cand}$ from the true Markov Equivalence Class.

12.2 Implications for Differentiable Structure Learning

Theorem 14 provides a principled foundation for differentiable causal discovery. Instead of relying solely on conditional independence tests or ad hoc sparsity regularizers, algorithms can jointly optimize the graph adjacency matrix $\mathbf{A}$ and the structural mechanisms $\Phi$ by minimizing the steady-state Entropic Sheaf Laplacian energy:

$$\min_{\mathbf{A},\Phi}\ \mathbb{E}_{data}[\mathrm{NLL}(\Phi)] + \lambda\,\mathrm{Tr}\big(\mathrm{Ker}(\Delta_T(\mathbf{A}))\big) \tag{28}$$

Here, penalizing the kernel of the Tangent Sheaf Laplacian continuously forces the network to prune spurious edges that generate topological friction. This bridges algebraic topology and causal representation learning and opens a broad avenue for future research in topology-aware causal discovery.

12.3 Empirical Verification of Topological Causal Scoring

To empirically validate Theorem 14, we simulate a minimalist causal discovery scenario using the Entropic Sheaf Flow.
We initialize empirical measures $\mu^{(0)}$ ($N = 200$ particles) and optimize them over two candidate causal graphs. The first is the true Markovian graph $G_{true} = \{A \to B, B \to C\}$. The second is a spurious candidate graph $G_{cand}$ containing an additional edge $A \to C$ that structurally conflicts with the $A \to B \to C$ pathway, introducing a strict $H^1$ cohomological obstruction.

As shown in Figure 5 and Table 2, optimizing the measures over the true graph resolves smoothly. The system aligns with the factual manifold, and the residual energy converges to a low empirical baseline ($S(G_{true}) \approx 5.93$) that reflects the thermal diffusion ($\varepsilon = 0.2$) and the finite-sample approximation. In contrast, the spurious graph $G_{cand}$ encounters inescapable geometric frustration. The exact flow cannot globally commute, which forces the topological stress into the harmonic subspace $\mathrm{Ker}(\Delta_T)$. Consequently, the Entropic Causal Dirichlet Energy is bottlenecked by the topological barrier, yielding a strictly positive residual score ($S(G_{cand}) \approx 22.49$). This $3.8\times$ energy gap quantitatively isolates the true causal topology, confirming that the Tangent Sheaf Laplacian acts as a highly sensitive algebraic detector of spurious causal mechanisms.

| Candidate Graph Topology | Cohomological Status | Topological Score $S(G)$ | Energy Gap Ratio |
| True Graph ($G_{true}$) | $H^1 = 0$ (Exact) | 5.9287 | 1.0× (Baseline) |
| Spurious Graph ($G_{cand}$) | $H^1 \neq 0$ (Frustrated) | 22.4863 | 3.79× (Bottlenecked) |

Table 2: Quantitative comparison of the steady-state Topological Causal Scores. The spurious graph exhibits a large residual energy gap due to the unresolvable harmonic flow.

(Figure 5: "Evolution of Topological Causal Score", Causal Dirichlet Energy on a log scale versus Langevin steps 0 to 300, for the Spurious Graph $G_{cand}$ ($H^1 \neq 0$) and the True Graph $G_{true}$ ($H^1 = 0$).)
Figure 5: Evolution of the Topological Causal Score $S(G)$ during the Entropic Sheaf Flow. The Dirichlet energy of the true graph converges to the thermodynamic floor, whereas the spurious graph remains trapped by the $H^1$ topological barrier, in agreement with Theorem 14.

13 Conclusion and Future Work

Continuous generative models implicitly assume seamless local-to-global causal composition, a hypothesis that breaks down in the presence of structural conflicts. In this paper, we identified and formalized this failure mode. We showed that deterministic continuous causal inference over conflicting graphs suffers from negative synthetic Ricci curvature, which inevitably leads to finite-time singularities (manifold tearing). To resolve this, we introduced the Wasserstein Causal Sheaf, lifting Pearl's structural causal models into the geometric framework of Otto calculus. By providing a strict topological definition of structural obstructions ($H^1 \neq 0$), we derived the Entropic Causal Sheaf Laplacian. A major technical contribution of this work is the Entropic Pullback Lemma, which bridges geometric measure theory and automatic differentiation (VJP). Furthermore, by applying the Implicit Function Theorem (IFT) to the Sinkhorn steady-state optimality conditions, we decoupled the reverse-mode memory footprint from the iteration depth. This enables scalable counterfactual inference that is free of deterministic singularities and whose memory is $O(1)$ in the iteration depth.

Empirically, our framework navigates complex non-convex geometries and uses thermodynamic noise to achieve "entropic tunneling" in high-dimensional scRNA-seq interventions. Crucially, we inverted our inference framework to establish a geometric foundation for Causal Discovery.
By introducing the Topological Causal Score $S(G)$, we proved and empirically validated that spurious causal mechanisms inherently generate unresolvable harmonic stress. This provides a quantitative, topology-aware criterion for pruning false edges.

Future Directions. This sheaf-theoretic foundation opens several promising avenues for future research. First, extending the causal simplicial complex to include 2-cells (faces) will allow us to investigate $H^2$ cohomological obstructions and synergistic confounding in causal hypergraphs (a formal mathematical roadmap is provided in Appendix B). Second, our geometric framework can be bridged with statistical learning theory. As shown preliminarily in Appendix C, the sample complexity of learning these counterfactual mappings is governed by the Betti numbers of the causal graph, which points toward a topological PAC-learning paradigm. Ultimately, we hope this work grounds continuous causal generative modeling in the rich algebraic and geometric structures of optimal transport, enabling more robust and mathematically interpretable AI systems.

Appendix A. Rigorous Proofs of Main Theorems

This appendix provides the measure-theoretic and functional-analytic derivations for the lemmas and theorems presented in the main text. The derivations rely on the geometric structure of the Wasserstein space (Otto calculus) and on the dual formulation of Entropic Optimal Transport.

A.1 Proof of Lemma 4 (First Variation via Entropic Pullback)

Proof To respect the formal Riemannian geometry of the Wasserstein space (Otto calculus), we cannot treat $\mathcal{P}_2(X)$ as a linear space. Instead, we compute the Fréchet derivative by perturbing the measure along an absolutely continuous curve governed by the continuity equation (Villani, 2003; Peyré and Cuturi, 2019).
Let $\xi \in C_c^\infty(X; \mathbb{R}^d)$ be a smooth test vector field. We perturb the source measure $\mu$ via the flow map evaluated at a small time $t \ge 0$:

$$\mu_t = (I + t\xi)_{\#}\mu$$

This curve satisfies the continuity equation $\partial_t\mu_t + \nabla\cdot(\xi\mu_t) = 0$ at $t = 0$ in the weak sense. The target measure pushed forward by the non-linear causal mechanism $T$ evolves along the curve

$$\rho_t = T_{\#}\mu_t = \big(T\circ(I + t\xi)\big)_{\#}\mu$$

By the chain rule, the Eulerian velocity field $v_t(y)$ driving $\rho_t$ in the target space $Y$, evaluated at $t = 0$, is the pushforward of the vector field $\xi$:

$$v_0(T(x)) = \frac{d}{dt}\Big|_{t=0} T(x + t\xi(x)) = DT(x)\,\xi(x)$$

where $DT(x)$ is the Jacobian matrix of $T$ at $x$. Consequently, $\rho_t$ satisfies the continuity equation $\partial_t\rho_t + \nabla\cdot(v_0\rho_t) = 0$ at $t = 0$.

We now evaluate the variation of the Entropic Optimal Transport cost $\mathcal{F}_\varepsilon(\mu_t) = \frac{1}{2}W^2_{2,\varepsilon}(\nu, \rho_t)$. By the dual formulation of Entropic Optimal Transport:

$$\frac{1}{2}W^2_{2,\varepsilon}(\nu,\rho_t) = \sup_{f,g}\Big\{\int_Y f(y)\,d\nu(y) + \int_Y g(y)\,d\rho_t(y) - \varepsilon\iint e^{\frac{f(y)+g(y')-c(y,y')}{\varepsilon}}\,d\nu(y)\,d\rho_t(y') + \varepsilon\Big\}$$

By the infinite-dimensional Envelope Theorem, the derivative of the supremum with respect to $t$ equals the derivative of the objective evaluated at the unique optimal dual potentials $(f^{(\varepsilon)}, g^{(\varepsilon)})$.
Taking the derivative at $t = 0$ gives

$$\frac{d}{dt}\Big|_{t=0} \mathcal{F}_\varepsilon(\mu_t) = \int_{\mathcal{Y}} g^{(\varepsilon)}(y)\, \partial_t \rho_t|_{t=0}(dy).$$

The exponential term contributes nothing. At the optimum, $\int e^{(f^{(\varepsilon)}(y) + g^{(\varepsilon)}(y') - c(y,y'))/\varepsilon}\, d\nu(y) = 1$, so its derivative reduces to $-\varepsilon \int \partial_t \rho_t|_{t=0}(dy') = 0$.

Using the weak formulation of the continuity equation for $\rho_t$, we transfer the time derivative to a spatial gradient (integration by parts):

$$\int_{\mathcal{Y}} g^{(\varepsilon)}(y)\, \partial_t \rho_0(dy) = \int_{\mathcal{Y}} \langle \nabla g^{(\varepsilon)}(y), v_0(y) \rangle\, d\rho_0(y).$$

By the change-of-variables theorem for the pushforward measure $\rho_0 = T_\# \mu$, we pull this integral back to the source space $\mathcal{X}$:

$$\int_{\mathcal{Y}} \langle \nabla g^{(\varepsilon)}(y), v_0(y) \rangle\, d(T_\# \mu)(y) = \int_{\mathcal{X}} \langle \nabla g^{(\varepsilon)}(T(x)), DT(x)\,\xi(x) \rangle\, d\mu(x).$$

Applying the transpose of the Jacobian, we rearrange the inner product:

$$\int_{\mathcal{X}} \langle DT(x)^\top \nabla g^{(\varepsilon)}(T(x)), \xi(x) \rangle\, d\mu(x) = \int_{\mathcal{X}} \langle \nabla (g^{(\varepsilon)} \circ T)(x), \xi(x) \rangle\, d\mu(x).$$

By the geometric definition of the Wasserstein gradient, the variation of $\mathcal{F}_\varepsilon$ along the direction $\xi$ must satisfy

$$\frac{d}{dt}\Big|_{t=0} \mathcal{F}_\varepsilon(\mu_t) = \int_{\mathcal{X}} \left\langle \nabla \frac{\delta \mathcal{F}_\varepsilon}{\delta \mu}(x), \xi(x) \right\rangle d\mu(x).$$

Identifying the Riesz representer in the $L^2(\mathcal{X}, \mu)$ closure of gradient fields, we obtain the Wasserstein gradient

$$\nabla \frac{\delta \mathcal{F}_\varepsilon}{\delta \mu} = \nabla (g^{(\varepsilon)} \circ T).$$

Integrating this spatial gradient shows that the Fréchet derivative (first variation) is exactly the pulled-back Sinkhorn dual potential, up to an additive constant:

$$\frac{\delta \mathcal{F}_\varepsilon}{\delta \mu} = g^{(\varepsilon)} \circ T,$$

which establishes the lemma within the Otto calculus framework.

A.2 Proof of Theorem 5 (Derivation of the Sheaf Laplacian)

Proof. We use the Riemannian geometric structure of the Wasserstein space $\mathcal{P}_2(M_i)$ formalized by Otto calculus (Villani, 2003).
The counterfactual probability measure $\mu_i$ evolves to minimize the global Entropic Causal Dirichlet Energy $\mathcal{E}_\varepsilon$ via the Wasserstein gradient flow:

$$\partial_t \mu_i = -\operatorname{grad}_{W_2} \mathcal{E}_\varepsilon(\mu_i). \tag{29}$$

In Otto calculus, the Riemannian gradient of a functional $\mathcal{E}(\mu)$ is defined through the spatial gradient of its Fréchet derivative (the first variation):

$$\operatorname{grad}_{W_2} \mathcal{E}(\mu) = -\nabla \cdot \left( \mu \nabla \frac{\delta \mathcal{E}}{\delta \mu} \right). \tag{30}$$

We compute the Fréchet derivative of the total energy $\mathcal{E}_\varepsilon$ with respect to a single node's marginal $\mu_i$. The energy (Eq. 3) acts on $\mu_i$ in two ways: as a target measure from its parents $\mathrm{pa}(i)$, and as a source measure mapped to its children $\mathrm{ch}(i)$:

$$\mathcal{E}_\varepsilon(\mu_i) = \sum_{p \in \mathrm{pa}(i)} \frac{\omega_{pi}}{2} W_{2,\varepsilon}^2(\Phi_{pi\#}\mu_p, \mu_i) + \sum_{c \in \mathrm{ch}(i)} \frac{\omega_{ic}}{2} W_{2,\varepsilon}^2(\Phi_{ic\#}\mu_i, \mu_c) + \text{const}. \tag{31}$$

1. Parent-to-Child Variation (Forward potential): For each $p \in \mathrm{pa}(i)$, taking the derivative of $\frac{1}{2} W_{2,\varepsilon}^2(\Phi_{pi\#}\mu_p, \mu_i)$ with respect to its second argument $\mu_i$ directly yields the forward Sinkhorn dual potential $f^{(\varepsilon)}_{i \leftarrow p}(x)$.
2. Child-to-Parent Variation (Pullback potential): For each $c \in \mathrm{ch}(i)$, $\mu_i$ is the source measure mapped through $\Phi_{ic}$. By Lemma 4, the Fréchet derivative is the pulled-back Sinkhorn potential from the child node: $g^{(\varepsilon)}_{c \leftarrow i}(\Phi_{ic}(x))$.

By linearity of the Fréchet derivative, the total variation is the sum of these local topological stresses:

$$\frac{\delta \mathcal{E}_\varepsilon}{\delta \mu_i} = \sum_{p \in \mathrm{pa}(i)} \omega_{pi} f^{(\varepsilon)}_{i \leftarrow p} + \sum_{c \in \mathrm{ch}(i)} \omega_{ic} \left( g^{(\varepsilon)}_{c \leftarrow i} \circ \Phi_{ic} \right). \tag{32}$$

Substituting this variation into Eq. (30), we obtain the deterministic drift component of our equation. Finally, the Sinkhorn divergence inherently decomposes into the unregularized Wasserstein distance plus an entropy regularizer $\varepsilon H(\mu_i) = \varepsilon \int \mu_i \log \mu_i \, dx$.
The Wasserstein gradient flow of the entropy functional generates the heat equation:

$$\operatorname{grad}_{W_2}\big(\varepsilon H(\mu_i)\big) = -\nabla \cdot \big(\mu_i \nabla (\varepsilon \log \mu_i)\big) = -\varepsilon \Delta \mu_i. \tag{33}$$

Combining the topological drift and the thermal diffusion yields the coupled non-linear Fokker–Planck equation exactly as stated in Eq. (9).

Remark 15 (McKean–Vlasov Structure and Well-posedness) The derived system in Eq. (9) constitutes a highly non-linear, coupled McKean–Vlasov interacting particle system. The drift terms depend on the Sinkhorn dual potentials $f^{(\varepsilon)}(\cdot, \mu_p, \mu_i)$, which are implicitly defined by the instantaneous states of adjacent nodes. By the theory of metric gradient flows (Peyré and Cuturi, 2019), the non-degenerate thermal diffusion $\frac{\varepsilon}{2}\Delta\mu_i$ provides the necessary parabolic regularization. Under Assumption 1 (smooth, compact manifolds and Lipschitz mechanisms), this guarantees the existence of weak solutions to the coupled Fokker–Planck system. It thereby circumvents the finite-time shockwaves (manifold tearing) that afflict the unregularized ($\varepsilon \to 0$) hyperbolic limits.

A.3 Explicit Tensor Formulation of the Tangent Sheaf Laplacian

To concretize the abstract Wasserstein Hodge decomposition presented in Theorem 8, we provide the explicit tensor-product formulation of the coboundary operator $d$, its adjoint $d^*$, and the Tangent Sheaf Laplacian $\Delta_T$. We evaluate this formulation on the canonical running example of topological frustration used in our paper. This is the 3-node causal graph $\mathcal{G} = (\mathcal{V}, E)$ with $\mathcal{V} = \{A, B, C\}$ and conflicting edges $E = \{e_1 = (A,B),\ e_2 = (B,C),\ e_3 = (A,C)\}$.
Let the global 0-cochain space of causal perturbations be the direct sum of the local Hilbert tangent spaces:

$$T_\mu C^0 = T_{\mu_A}\mathcal{P}_2(M_A) \oplus T_{\mu_B}\mathcal{P}_2(M_B) \oplus T_{\mu_C}\mathcal{P}_2(M_C) \cong \bigoplus_{v \in \{A,B,C\}} L^2(\mu_v; TM_v). \tag{34}$$

Similarly, the 1-cochain space measuring discrepancies on the edges is:

$$T_\mu C^1 = T_{\mu_{e_1}}\mathcal{P}_2 \oplus T_{\mu_{e_2}}\mathcal{P}_2 \oplus T_{\mu_{e_3}}\mathcal{P}_2, \tag{35}$$

where $\mu_{e_1} = \mu_B$, $\mu_{e_2} = \mu_C$, and $\mu_{e_3} = \mu_C$ are the target spaces of the respective edge restrictions.

1. The Coboundary Operator ($d$): The linearization of the pushforward mapping along a deterministic causal mechanism $\Phi_{uv\#}$ yields the linear forward operator $d\Phi_{uv\#}: L^2(\mu_u) \to L^2(\mu_v)$. Acting on a joint velocity field configuration $V = (V_A, V_B, V_C)^\top \in T_\mu C^0$, the coboundary operator $d: T_\mu C^0 \to T_\mu C^1$ computes the topological discrepancy (friction) along each edge. In block operator matrix form, this is exactly:

$$d \begin{pmatrix} V_A \\ V_B \\ V_C \end{pmatrix} = \begin{pmatrix} d\Phi_{AB\#} & -I & 0 \\ 0 & d\Phi_{BC\#} & -I \\ d\Phi_{AC\#} & 0 & -I \end{pmatrix} \begin{pmatrix} V_A \\ V_B \\ V_C \end{pmatrix} = \begin{pmatrix} d\Phi_{AB\#} V_A - V_B \\ d\Phi_{BC\#} V_B - V_C \\ d\Phi_{AC\#} V_A - V_C \end{pmatrix}. \tag{36}$$

2. The Adjoint Operator ($d^*$) and the VJP Connection: To construct the Laplacian, we must rigorously define the Hilbert adjoint $d^*: T_\mu C^1 \to T_\mu C^0$. Given a discrepancy field $W = (W_{AB}, W_{BC}, W_{AC})^\top \in T_\mu C^1$, the adjoint satisfies $\langle dV, W \rangle_{C^1} = \langle V, d^* W \rangle_{C^0}$. Crucially, we evaluate the adjoint of the linearized pushforward $(d\Phi_{uv\#})^*$ in the Otto inner product.
For local fields $V_u \in L^2(\mu_u)$ and $W_{uv} \in L^2(\mu_v)$, writing $y = \Phi_{uv}(x)$:

$$\begin{aligned} \langle d\Phi_{uv\#} V_u, W_{uv} \rangle_{\mu_v} &= \int_{M_v} \langle d\Phi_{uv}(x) V_u(x), W_{uv}(y) \rangle_g \, d\mu_v(y) \\ &= \int_{M_u} \langle d\Phi_{uv}(x) V_u(x), W_{uv}(\Phi_{uv}(x)) \rangle_g \, d\mu_u(x) \quad \text{(change of variables)} \\ &= \int_{M_u} \langle V_u(x), (d\Phi_{uv}(x))^\top W_{uv}(\Phi_{uv}(x)) \rangle_g \, d\mu_u(x). \end{aligned} \tag{37}$$

This explicitly isolates the adjoint operator:

$$(d\Phi_{uv\#})^* W_{uv}(x) = J_{\Phi_{uv}}(x)^\top W_{uv}(\Phi_{uv}(x)) \equiv \mathrm{VJP}_{\Phi_{uv}}(x, W_{uv}). \tag{38}$$

This derivation provides the functional-analytic proof of the algorithmic claim in Section 9: the geometric adjoint required for sheaf cohomology is exactly the automatic-differentiation VJP in deep learning. Taking the formal transpose of the block matrix $d$, we write

$$d^* = \begin{pmatrix} (d\Phi_{AB\#})^* & 0 & (d\Phi_{AC\#})^* \\ -I & (d\Phi_{BC\#})^* & 0 \\ 0 & -I & -I \end{pmatrix}. \tag{39}$$

3. The Explicit Tangent Sheaf Laplacian ($\Delta_T$): By composing $d^* d$, we obtain the Tangent Sheaf Laplacian $\Delta_T: T_\mu C^0 \to T_\mu C^0$ as a $3 \times 3$ block operator matrix acting on the causal system:

$$\Delta_T = \begin{pmatrix} (d\Phi_{AB\#})^* d\Phi_{AB\#} + (d\Phi_{AC\#})^* d\Phi_{AC\#} & -(d\Phi_{AB\#})^* & -(d\Phi_{AC\#})^* \\ -d\Phi_{AB\#} & I + (d\Phi_{BC\#})^* d\Phi_{BC\#} & -(d\Phi_{BC\#})^* \\ -d\Phi_{AC\#} & -d\Phi_{BC\#} & 2I \end{pmatrix}. \tag{40}$$

Geometric Interpretation of the Block Structure:

• Diagonal Blocks (Local Causal Stiffness): The term $2I$ on node $C$ reflects its role as a pure sink with in-degree 2, absorbing information from two parents. The composite operators $(d\Phi_{AB\#})^* d\Phi_{AB\#}$ measure the Fisher–Rao local stiffness of the generative neural network at the source nodes.
• Off-Diagonal Blocks (Message Passing): The off-diagonal terms dictate how topological stress propagates backwards. If $C$ is pushed into a void, the block $-(d\Phi_{BC\#})^*$ propagates the stress to $B$. That stress is then pulled back to $A$ via $-(d\Phi_{AB\#})^*$, actively deforming the upstream source to globally minimize the Causal Dirichlet Energy.
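Eqs. (36), (39) and (40) can be checked in finite dimensions with the Python/NumPy sketch below. It rests on two assumptions. First, each linearized pushforward $d\Phi_{uv\#}$ is represented by a dense $k \times k$ matrix acting on a discretized velocity field. Second, particle weights are uniform, so the Otto adjoint reduces to the matrix transpose. The matrices `J_AB`, `J_BC`, `J_AC` and the dimension `k` are hypothetical random stand-ins, not trained mechanisms. The sketch assembles $d$, forms $\Delta_T = d^\top d$, and compares each block with Eq. (40). It also applies $\Delta_T$ matrix-free, as edge residuals followed by per-node pullbacks.

```python
import numpy as np

def coboundary(J_AB, J_BC, J_AC):
    """Dense d from Eq. (36): block rows are edges (AB, BC, AC), block columns nodes (A, B, C)."""
    k = J_AB.shape[0]
    I, Z = np.eye(k), np.zeros((k, k))
    return np.block([[J_AB, -I, Z],
                     [Z, J_BC, -I],
                     [J_AC, Z, -I]])

def apply_laplacian(V_A, V_B, V_C, J_AB, J_BC, J_AC):
    """Matrix-free Delta_T V = d^T (d V): edge residuals, then transpose (VJP-style) pullbacks."""
    r_AB = J_AB @ V_A - V_B
    r_BC = J_BC @ V_B - V_C
    r_AC = J_AC @ V_A - V_C
    out_A = J_AB.T @ r_AB + J_AC.T @ r_AC   # source node: pulled back from both children
    out_B = -r_AB + J_BC.T @ r_BC           # middle node: one parent edge, one child edge
    out_C = -r_BC - r_AC                    # pure sink with in-degree 2
    return np.concatenate([out_A, out_B, out_C])

rng = np.random.default_rng(0)
k = 4                                        # hypothetical tangent-space dimension per node
J_AB, J_BC, J_AC = (rng.normal(size=(k, k)) for _ in range(3))

d = coboundary(J_AB, J_BC, J_AC)
L = d.T @ d                                  # Delta_T = d* d, with d* = d^T as in Eq. (39)

# Blocks predicted by Eq. (40).
I = np.eye(k)
expected = [[J_AB.T @ J_AB + J_AC.T @ J_AC, -J_AB.T, -J_AC.T],
            [-J_AB, I + J_BC.T @ J_BC, -J_BC.T],
            [-J_AC, -J_BC, 2 * I]]
max_block_error = max(
    np.abs(L[i * k:(i + 1) * k, j * k:(j + 1) * k] - expected[i][j]).max()
    for i in range(3) for j in range(3)
)

V_A, V_B, V_C = (rng.normal(size=k) for _ in range(3))
dense = L @ np.concatenate([V_A, V_B, V_C])
matrix_free = apply_laplacian(V_A, V_B, V_C, J_AB, J_BC, J_AC)
print(max_block_error, np.abs(dense - matrix_free).max())
```

The matrix-free path never forms the $3k \times 3k$ matrix. With non-linear mechanisms, each `J.T @ r` would instead be a vector-Jacobian product, in the sense of Eq. (38).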
This explicit matrix demonstrates that the infinite-dimensional Wasserstein Hodge theory collapses into a scalable message-passing operation over the causal graph, mathematically validating the effectiveness of the interactive Langevin dynamics.

Appendix B. Higher-Order Causal Simplicial Complexes and $H^2$ Obstructions

The framework presented in the main text concerns 1-dimensional causal graphs (DAGs), where the topological obstructions emerge from conflicting edges ($H^1 \neq 0$). However, our Wasserstein sheaf theory extends natively to higher-dimensional topology, which has profound implications for Synergistic Confounding in causal inference.

B.1 The Causal Simplicial Complex

We define a Causal Simplicial Complex $K$. In addition to 0-cells (nodes, $\mathcal{V}$) and 1-cells (edges, $E$), we introduce 2-cells (faces, $\mathcal{T}$), which represent synergistic ternary causal mechanisms. These are interactions where $A$ and $B$ jointly determine $C$ through an inseparable mechanism $\Phi_{ABC}$, rather than through independent pairwise effects. The sheaf is extended so that 2-cochains represent the joint discrepancies on these faces: $C^2(K, \mathcal{F}) = \bigoplus_{\tau \in \mathcal{T}} \mathcal{P}_2(M_\tau)$. We introduce the higher-order coboundary operator $d_1: T_\mu C^1 \to T_\mu C^2$, which measures the rotational curl of the causal mechanisms.

B.2 The Generalized Wasserstein Hodge–de Rham Decomposition

By lifting the geometry to higher-order complexes, the tangent space of causal perturbations on edges admits the complete Hodge–de Rham orthogonal decomposition, provided the cochain condition $d_1 d_0 = 0$ holds:

$$T_\mu C^1 = \mathrm{Im}(d_0) \oplus \mathrm{Ker}(\Delta_1) \oplus \mathrm{Im}(d_1^*), \tag{41}$$

where $\Delta_1 = d_0 d_0^* + d_1^* d_1$ is the 1-form Sheaf Laplacian.

Physical Implications for Advanced Machine Learning:

• $\mathrm{Im}(d_0)$: The Exact Causal Flow (pairwise coherence).
• $\mathrm{Ker}(\Delta_1) \cong H^1$: The Harmonic Flow (resolving cycle contradictions).
• $\mathrm{Im}(d_1^*)$: The Synergistic Solenoidal Flow.
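Eq. (41) can be illustrated with the simplest stand-in sheaf: the constant real-valued sheaf on the running triangle $A, B, C$. This is an assumption, since the paper's stalks are Wasserstein tangent spaces rather than $\mathbb{R}$. The sketch builds $d_0$ and $d_1$ as signed incidence matrices and checks the cochain condition $d_1 d_0 = 0$. It then splits an arbitrary edge flow into exact, harmonic and solenoidal parts by orthogonal projection. Finally, it counts harmonic dimensions: one ($\beta_1 = 1$) for the bare cycle, none once the 2-cell is attached. The helper names are illustrative only.

```python
import numpy as np

# d0: edges (A,B), (B,C), (A,C) x nodes (A, B, C); edge (u, v) has -1 at u and +1 at v.
d0 = np.array([[-1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0],
               [-1.0, 0.0, 1.0]])
# d1 for the single face (A, B, C): its boundary is (A,B) + (B,C) - (A,C).
d1_face = np.array([[1.0, 1.0, -1.0]])
d1_none = np.zeros((0, 3))                   # bare cycle: no 2-cells

def project(B, w):
    """Orthogonal projection of w onto the column space of B (least squares)."""
    if B.shape[1] == 0:
        return np.zeros_like(w)
    coef, *_ = np.linalg.lstsq(B, w, rcond=None)
    return B @ coef

def hodge_decompose(w, d0, d1):
    """Split an edge flow into exact, harmonic and solenoidal parts, as in Eq. (41)."""
    exact = project(d0, w)                   # Im(d0)
    solenoidal = project(d1.T, w)            # Im(d1^*)
    harmonic = w - exact - solenoidal        # Ker(Delta_1)
    return exact, harmonic, solenoidal

def harmonic_dim(d0, d1, tol=1e-10):
    L1 = d0 @ d0.T + d1.T @ d1               # 1-form Laplacian Delta_1
    return int(np.sum(np.linalg.eigvalsh(L1) < tol))

assert np.allclose(d1_face @ d0, 0)          # cochain condition d1 d0 = 0

w = np.array([2.0, -1.0, 0.5])               # an arbitrary edge flow
ex_bare, h_bare, _ = hodge_decompose(w, d0, d1_none)
ex_face, h_face, sol_face = hodge_decompose(w, d0, d1_face)
print(harmonic_dim(d0, d1_none), harmonic_dim(d0, d1_face))   # 1 0
print(h_bare)                                # the cycle component (1, 1, -1) / 6
```

In this scalar analogue, attaching the face moves the cycle component of `w` out of the harmonic part and into the solenoidal part $\mathrm{Im}(d_1^*)$.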
When the second cohomology group $H^2(K, \mathcal{F}) \neq 0$, the causal system suffers from a "Volume Tearing" obstruction. This occurs when local pairwise mechanisms contradict the higher-order ternary mechanisms. Exploring the $H^2$ harmonic flow provides a mathematical roadmap for designing the next generation of topology-aware causal graph neural networks (e.g., Causal Hypergraph Models), which we leave as an exciting avenue for future work.

Appendix C. Topological PAC-Learning: Betti Numbers Bound Sample Complexity

This paper primarily focuses on the algorithmic and geometric foundations of the Entropic Sheaf Flow. A natural theoretical question is how the topology of the causal graph governs the statistical learning rate. Consider the empirical sheaf flow $\hat{\mu}(T)$, computed on $N$ finite particles, and the true population counterfactual measure $\mu^*$. By integrating Wasserstein concentration inequalities with our Hodge decomposition, it can be shown that the expected error between them is controlled by the topological complexity of the graph. Specifically, the bound deteriorates exponentially with the first Betti number $\beta_1(\mathcal{G}) = \dim(H^1)$:

$$\mathbb{E}\left[ W_2^2(\hat{\mu}(T), \mu^*) \right] \le \mathcal{O}\left( \frac{1}{\sqrt{N}} e^{C \cdot \beta_1(\mathcal{G})} + f(T, \varepsilon) \right). \tag{42}$$

This establishes a "No Free Lunch" theorem for topological causal inference. Under this bound, resolving higher degrees of causal frustration ($\beta_1 \gg 0$) calls for an exponentially larger sample size to prevent empirical noise from aliasing as false harmonic stress.

Appendix D. Rigorous Derivation of Energy Dissipation via AGS Calculus

We aim to support the energy dissipation identity in Theorem 10 without the restrictive assumption of global geodesic convexity, which highly non-linear neural pushforward mechanisms frequently violate. To do so, we invoke the framework of metric gradient flows established by Ambrosio, Gigli, and Savaré (AGS) (Ambrosio et al., 2005). Let $\mu_t = (\mu_{v,t})_{v \in \mathcal{V}}$ be an absolutely continuous curve defined on the global measure space $C^0$. In the AGS framework, the classical time derivative is replaced by the metric derivative $|\mu'|(t)$, which quantifies the instantaneous speed of the probability measures under the Wasserstein metric:

$$|\mu'|(t) := \lim_{s \to t} \frac{W_2(\mu_s, \mu_t)}{|s - t|}. \tag{43}$$

Furthermore, the gradient of the energy functional $\mathcal{E}_\varepsilon$ is characterized by the local slope (or upper gradient) $|\partial \mathcal{E}_\varepsilon|(\mu)$, which measures the maximal instantaneous rate of energy decrease:

$$|\partial \mathcal{E}_\varepsilon|(\mu) := \limsup_{\nu \to \mu} \frac{\big(\mathcal{E}_\varepsilon(\mu) - \mathcal{E}_\varepsilon(\nu)\big)^+}{W_2(\mu, \nu)}. \tag{44}$$

For sufficiently regular functionals on $\mathcal{P}_2(M)$ (which our regularized Sinkhorn divergence guarantees), the local slope coincides with the $L^2(\mu)$-norm of the Wasserstein gradient:

$$|\partial \mathcal{E}_\varepsilon|(\mu_t) = \big\| \operatorname{grad}_{W_2} \mathcal{E}_\varepsilon(\mu_t) \big\|_{\mu_t} = \left( \sum_{i \in \mathcal{V}} \int_{M_i} \left| \nabla_x \frac{\delta \mathcal{E}_\varepsilon}{\delta \mu_{i,t}} \right|_g^2 d\mu_{i,t}(x) \right)^{1/2}. \tag{45}$$

By the absolute continuity of the curve $\mu_t$, the chain rule in the metric space $C^0$ provides the fundamental energy dissipation inequality (the De Giorgi interpolation formulation):

$$\mathcal{E}_\varepsilon(\mu_0) - \mathcal{E}_\varepsilon(\mu_T) \le \frac{1}{2} \int_0^T |\mu'|^2(t)\, dt + \frac{1}{2} \int_0^T |\partial \mathcal{E}_\varepsilon|^2(\mu_t)\, dt. \tag{46}$$

Crucially, the Entropic Sheaf Flow operates as the steepest descent curve in the Wasserstein space.
Under the AGS gradient flow definition, the velocity field exactly matches the negative Wasserstein gradient, so that $|\mu'|(t) = |\partial \mathcal{E}_\varepsilon|(\mu_t)$ for almost every $t$. Substituting this equality into the De Giorgi inequality turns it into an equality, yielding the exact energy identity:

$$\mathcal{E}_\varepsilon(\mu_0) - \mathcal{E}_\varepsilon(\mu_T) = \int_0^T |\partial \mathcal{E}_\varepsilon|^2(\mu_t)\, dt. \tag{47}$$

Differentiating with respect to $T$ yields the instantaneous energy dissipation rate:

$$\frac{d}{dt} \mathcal{E}_\varepsilon(\mu_t) = -|\partial \mathcal{E}_\varepsilon|^2(\mu_t) = -\sum_{i \in \mathcal{V}} \int_{M_i} \left| \nabla_x \frac{\delta \mathcal{E}_\varepsilon}{\delta \mu_{i,t}} \right|_g^2 d\mu_{i,t}(x) \le 0. \tag{48}$$

This establishes the Entropic Causal Dirichlet Energy $\mathcal{E}_\varepsilon$ as a strict Lyapunov functional. Crucially, as established by Lemma 6 and Theorem 7, in the presence of a topological obstruction ($H^1 \neq 0$) this energy is bounded from below by a strictly positive constant ($\mathcal{E}_\varepsilon \ge \mathcal{E}^* > 0$). The energy dissipation of the AGS gradient flow guarantees that the curve of measures weakly converges to a stationary equilibrium $\mu^*$ where the local slope vanishes ($|\partial \mathcal{E}_\varepsilon|(\mu^*) = 0$). At this stationary point, the unresolvable topological stress is isolated entirely in the harmonic subspace $\mathrm{Ker}(\Delta_T)$. This confirms the mathematical well-posedness of the counterfactual equilibrium even under severe geometric frustration.

References

Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Birkhäuser, 2005.

Cristian Bodnar, Francesco Di Giovanni, Benjamin P. Chamberlain, Pietro Liò, and Michael M. Bronstein. Neural sheaf diffusion: A topological perspective on heterophily and oversmoothing in GNNs. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, 2022.

Philippe Brouillard, Sebastien Lachapelle, Alexandre Lacoste, Simon Lacoste-Julien, and Alexandre Drouin. Differentiable causal discovery from interventional data. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 21865–21877, 2020.

Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems (NeurIPS), pages 2292–2300, 2013.

Donald A. Dawson and Jürgen Gärtner. Large deviations from the McKean–Vlasov limit for weakly interacting diffusions. Stochastics: An International Journal of Probability and Stochastic Processes, 20(4):247–308, 1987.

Mark I. Freidlin and Alexander D. Wentzell. Random Perturbations of Dynamical Systems. Springer, 1998.

Olivier Goudet, Diviyan Kalainathan, Philippe Caillou, Isabelle Guyon, David Lopez-Paz, and Michele Sebag. Causal generative neural networks. arXiv preprint arXiv:1711.08936, 2017.

Jakob Hansen and Robert Ghrist. Toward predictive sensor networks with topological sheaf theory. Proceedings of the IEEE, 107(5):900–911, 2019.

Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.

Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis, and Sriram Vishwanath. CausalGAN: Learning causal implicit generative models with adversarial training. In International Conference on Learning Representations (ICLR), 2018.

John Lott and Cédric Villani. Ricci curvature for metric-measure spaces via optimal transport. Annals of Mathematics, pages 903–991, 2009.

Judea Pearl. Causality. Cambridge University Press, 2nd edition, 2009.

Gabriel Peyré and Marco Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.

Karl-Theodor Sturm. On the geometry of metric measure spaces. Acta Mathematica, 196(1):65–131, 2006.

Cédric Villani. Topics in Optimal Transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, 2003.

Cédric Villani. Optimal Transport: Old and New. Springer Science & Business Media, 2008.

Xun Zheng, Bryon Aragam, Pradeep K. Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018.
