Causal Representation Meets Stochastic Modeling under Generic Geometry
Learning meaningful causal representations from observations has emerged as a crucial task for facilitating machine learning applications and driving scientific discoveries in fields such as climate science, biology, and physics. This process involves disentangling high-level latent variables and their causal relationships from low-level observations. Previous work in this area that achieves identifiability typically focuses on cases where the observations are either i.i.d. or follow a latent discrete-time process. Nevertheless, many real-world settings require identifying latent variables that are continuous-time stochastic processes (e.g., multivariate point processes). To this end, we develop identifiable causal representation learning for continuous-time latent stochastic point processes. We study its identifiability by analyzing the geometry of the parameter space. Furthermore, we develop MUTATE, an identifiable variational autoencoder framework with a time-adaptive transition module to infer stochastic dynamics. Across simulated and empirical studies, we find that MUTATE can effectively answer scientific questions, such as the accumulation of mutations in genomics and the mechanisms driving neuron spike triggers in response to time-varying dynamics.
💡 Research Summary
The paper tackles a fundamental gap in causal representation learning: the ability to identify latent variables that evolve as continuous‑time stochastic point processes, such as Hawkes‑type self‑exciting processes, when the observed high‑dimensional data are generated through an unknown, possibly non‑invertible mixing function. Existing identifiability results are limited to i.i.d. observations or discrete‑time latent dynamics; however, many scientific domains—genomics, neuroscience, climate modeling—naturally involve events that occur in continuous time.
Problem formulation
Observations $O_t\in\mathbb{R}^n$ are modeled as $O_t = f(N_t(\Delta))$, where $N_t$ is a multivariate point process with conditional intensity $\lambda_i(t)=\mu_i+\sum_j\int_0^t\phi_{i\leftarrow j}(t-s)\,dN_j(s)$. The mixing map $f$ is arbitrary and may be non‑invertible. The goal is to recover the full parameter set $\Theta = (f, N_t, \lambda_t, \Phi, \mu)$ from observations alone. Key assumptions include (1) stationary increments of the point process, (2) a square‑integrable kernel $\Phi$, and (3) a generic‑point condition from algebraic geometry (i.e., the parameters lie in a dense open subset of the parameter space).
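To make the conditional-intensity definition concrete, here is a minimal sketch of simulating a multivariate self-exciting (Hawkes) process by Ogata's thinning algorithm, specialized to exponential kernels $\phi_{i\leftarrow j}(u)=\alpha_{ij}e^{-\beta u}$. The function name and parameterization are illustrative, not the paper's implementation.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, rng=None):
    """Simulate a multivariate Hawkes process by Ogata's thinning.

    Intensity (exponential kernels, illustrative parameterization):
        lambda_i(t) = mu[i] + sum_j sum_{t_k^j < t} alpha[i, j] * exp(-beta * (t - t_k^j))
    """
    rng = np.random.default_rng(rng)
    d = len(mu)
    events = [[] for _ in range(d)]
    t = 0.0

    def intensities(s):
        lam = np.array(mu, dtype=float)
        for j in range(d):
            for tk in events[j]:
                lam += alpha[:, j] * np.exp(-beta * (s - tk))
        return lam

    while t < T:
        # With exponential kernels the intensity decays between events,
        # so the current total intensity is a valid upper bound.
        lam_bar = intensities(t).sum()
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        lam = intensities(t)
        if rng.uniform() * lam_bar < lam.sum():
            # Accept, attributing the event to dimension i w.p. lam_i / sum(lam).
            i = rng.choice(d, p=lam / lam.sum())
            events[i].append(t)
    return events
```

Stationarity of such a process requires the branching matrix $\alpha/\beta$ to have spectral radius below one; e.g., `simulate_hawkes(mu=[0.5, 0.2], alpha=np.array([[0.3, 0.0], [0.4, 0.2]]), beta=1.0, T=50.0)` satisfies this.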
Identifiability theory
The authors introduce a “weakly convergent equivalence class” to capture the fact that only the distributional limit as the observation resolution $\Delta\to0$ matters. Lemma 1 shows that a variational approximation of the intensity converges weakly to the true point process; Lemma 2 extends this to the full latent dynamics. By defining the ideal $I = \langle P(O_t) - P_{\Theta}\rangle$ generated by the difference between empirical observation statistics and model‑implied statistics, they prove that identifiability holds iff $I$ is zero‑dimensional, i.e., the solution set consists of finitely many points. This geometric condition guarantees that the mixing function and the kernel parameters can be uniquely recovered (up to the trivial permutation and scaling ambiguities inherent in causal models). The proof leverages results from algebraic geometry (generic points, algebraically closed fields) and recent work showing that zero‑dimensional solution sets correspond to full parameter recovery.
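The zero-dimensionality condition is computable for polynomial systems: over an algebraically closed field, a variety is finite iff, for every variable, some element of a Groebner basis has a leading monomial that is a pure power of that variable (the standard finiteness criterion). The toy systems below are not the paper's ideal; they only illustrate the criterion, using sympy.

```python
from sympy import symbols, groebner, Poly

def is_zero_dimensional(polys, gens):
    """Finiteness check via Groebner bases: the solution set is finite
    iff every variable x_i appears as a pure power x_i^k among the
    leading monomials of the basis."""
    G = groebner(polys, *gens, order='lex')
    lead = [Poly(g, *gens).monoms(order='lex')[0] for g in G.exprs]
    n = len(gens)
    for i in range(n):
        pure = any(m[i] > 0 and all(m[j] == 0 for j in range(n) if j != i)
                   for m in lead)
        if not pure:
            return False
    return True

x, y = symbols('x y')
# Circle intersected with a line: two solution points -> zero-dimensional.
print(is_zero_dimensional([x**2 + y**2 - 1, x - y], [x, y]))
# Circle alone: a one-dimensional curve of solutions -> not zero-dimensional.
print(is_zero_dimensional([x**2 + y**2 - 1], [x, y]))
```

In the paper's setting, identifiability corresponds to the analogous condition holding for the ideal generated by the observation-matching constraints, at a generic parameter point.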
Algorithm – MUTATE
Building on the theory, the authors propose MUTATE, a variational auto‑encoder (VAE) architecture designed for continuous‑time point‑process data. The encoder maps raw observations to a latent representation consisting of event times and an estimated intensity function. A novel time‑adaptive transition module learns the kernel $\Phi$ and baseline $\mu$ while explicitly accounting for the discretization step $\Delta$. The decoder applies the learned mixing map $f$ to reconstruct observations. Training maximizes an evidence lower bound (ELBO) that combines a Poisson log‑likelihood for the point process with a reconstruction loss for the observed signals. The transition module’s adaptability allows MUTATE to handle varying sampling rates and to respect the weak convergence conditions required by the identifiability proofs.
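The point-process term of such an ELBO is, in the univariate exponential-kernel case, the standard Hawkes log-likelihood $\log L = \sum_k \log\lambda(t_k) - \int_0^T \lambda(t)\,dt$, where the integral (the compensator) has a closed form. The sketch below shows that term only, with illustrative names; MUTATE's actual loss additionally includes the reconstruction and KL components.

```python
import numpy as np

def point_process_loglik(event_times, mu, alpha, beta, T):
    """Log-likelihood of a 1-D Hawkes process with kernel
    phi(u) = alpha * exp(-beta * u) on the window [0, T]:
        log L = sum_k log lambda(t_k) - integral_0^T lambda(t) dt
    """
    t = np.asarray(event_times, dtype=float)
    loglik = 0.0
    for k, tk in enumerate(t):
        # Intensity just before the k-th event, driven by earlier events only.
        lam = mu + np.sum(alpha * np.exp(-beta * (tk - t[:k])))
        loglik += np.log(lam)
    # Compensator: mu*T + (alpha/beta) * sum_k (1 - exp(-beta*(T - t_k))).
    compensator = mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - t)))
    return loglik - compensator
```

Setting `alpha = 0` recovers the homogeneous Poisson likelihood $n\log\mu - \mu T$, a useful sanity check when implementing the loss.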
Experiments
- Synthetic Hawkes data – Multivariate Hawkes processes with known causal graphs and kernels are simulated. MUTATE recovers the graph structure and kernel shapes with higher accuracy than baseline i.i.d. VAEs, ODE‑based VAEs, and discrete‑time causal models.
- Genomics mutation accumulation – Real tumor sequencing data are treated as a temporal point process of mutation events. MUTATE identifies driver genes whose mutation rates increase over time and provides a causal ordering consistent with biological literature.
- Neuronal spike triggers – Multi‑electrode recordings from mouse visual cortex are modeled as spike trains driven by time‑varying stimuli. MUTATE uncovers stimulus‑to‑spike causal kernels, revealing how specific visual features modulate neuronal firing rates.
Across all settings, MUTATE demonstrates superior log‑likelihood, better recovery of ground‑truth causal edges, and more interpretable kernel estimates compared to competing methods.
Limitations and future work
The generic‑point assumption requires that the true parameters lie in a dense open set; in practice, limited data or highly constrained systems may violate this condition, leading to non‑identifiable scenarios. The theory guarantees identifiability only when the full distribution of observations is available; the paper uses moments and cumulants as proxies, which may be insufficient in highly noisy regimes. Computationally, learning a high‑dimensional non‑invertible mixing map together with flexible kernels is expensive, and scaling to very large numbers of latent processes remains an open challenge. Future directions include extending the framework to non‑stationary point processes, developing online inference for streaming data, and incorporating stronger regularization to relax the generic‑point requirement.
Conclusion
The work makes two major contributions: (1) a rigorous identifiability analysis for continuous‑time stochastic point processes under generic, possibly non‑invertible mixing, grounded in algebraic‑geometric arguments; (2) a practical VAE‑based algorithm, MUTATE, that operationalizes the theory and successfully recovers latent causal dynamics in both synthetic and real scientific datasets. This bridges a critical methodological gap, enabling causal discovery in domains where events naturally unfold in continuous time.