Marginal Likelihood Inference for Fitting Dynamical Survival Analysis Models to Epidemic Count Data
Stochastic compartmental models are prevalent tools for describing disease spread, but inference under these models is challenging for many types of surveillance data when the marginal likelihood function becomes intractable due to missing information. To address this, we develop a closed-form likelihood for discretely observed incidence count data under the dynamical survival analysis (DSA) paradigm. The method approximates the stochastic population-level hazard by a large population limit while retaining a count-valued stochastic model, and leads to survival analytic inferential strategies that are both computationally efficient and flexible to model generalizations. Through simulation, we show that parameter estimation is competitive with recent exact but computationally expensive likelihood-based methods in partially observed settings. Previous work has shown that the DSA approximation is generalizable, and we show that the inferential developments here also carry over to models featuring individual heterogeneity, such as frailty models. We consider case studies of both Ebola and COVID-19 data on variants of the model, including a network-based epidemic model and a model with distributions over susceptibility, demonstrating its flexibility and practical utility on real, partially observed datasets.
💡 Research Summary
Stochastic compartmental models such as the continuous‑time Markov chain (CTMC) SIR framework provide a realistic description of epidemic dynamics, but inference becomes intractable when only aggregated incidence counts are observed and the exact infection and recovery times are missing. Traditional solutions—numerical transition‑probability computation, diffusion approximations, particle filters, or data‑augmented MCMC—require intensive simulation and scale poorly with model complexity.
This paper introduces a marginal‑likelihood approach based on Dynamical Survival Analysis (DSA). The authors start from Sellke’s construction, in which each susceptible individual carries an exponentially distributed infection threshold Q_i and becomes infected once the cumulative infection pressure Λ(t)=(β/N)∫₀ᵗ I(u)du exceeds Q_i. In the large‑population limit (N→∞, M/N→ρ), the stochastic hazard converges to a deterministic one governed by the SIR ODE system (5) for the scaled proportions s(t), ι(t) and r(t). The solution s(t)=exp(−R₀ r(t)), with R₀=β/γ, serves as the survival function for a randomly chosen initially susceptible individual, and the associated hazard β·ι(t) yields explicit densities for the infection time T_I and the recovery time T_R.
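The limiting system and the identity s(t)=exp(−R₀ r(t)) can be checked numerically. The sketch below assumes the standard scaled SIR equations s' = −βsι, ι' = βsι − γι, r' = γι with s(0)=1, ι(0)=ρ, r(0)=0 and R₀=β/γ (our reading of system (5), not copied from the paper). The solver, the function names `sir_rhs` and `rk4_solve`, and the parameter values are hypothetical; a hand-written fourth-order Runge-Kutta step keeps it to the standard library.

```python
import math

def sir_rhs(state, beta, gamma):
    """Scaled SIR vector field: (s, iota, r) -> (s', iota', r')."""
    s, iota, r = state
    new_inf = beta * s * iota  # equals -s'(t): hazard beta*iota(t) times survival s(t)
    return (-new_inf, new_inf - gamma * iota, gamma * iota)

def rk4_solve(beta, gamma, rho, t_end, n_steps=2000):
    """Integrate from s(0)=1, iota(0)=rho, r(0)=0 with classical RK4."""
    h = t_end / n_steps
    state = (1.0, rho, 0.0)
    times, states = [0.0], [state]
    for k in range(1, n_steps + 1):
        k1 = sir_rhs(state, beta, gamma)
        k2 = sir_rhs(tuple(x + 0.5 * h * d for x, d in zip(state, k1)), beta, gamma)
        k3 = sir_rhs(tuple(x + 0.5 * h * d for x, d in zip(state, k2)), beta, gamma)
        k4 = sir_rhs(tuple(x + h * d for x, d in zip(state, k3)), beta, gamma)
        state = tuple(x + h / 6.0 * (a + 2 * b + 2 * c + d)
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
        times.append(k * h)
        states.append(state)
    return times, states

beta, gamma, rho = 0.4, 0.2, 0.01  # hypothetical values, R0 = 2
R0 = beta / gamma
times, states = rk4_solve(beta, gamma, rho, t_end=100.0)

# s(t) should match exp(-R0 * r(t)) up to integration error.
max_gap = max(abs(s - math.exp(-R0 * r)) for s, _, r in states)

# s(t) = P(T_I > t) for an initially susceptible individual, so 1 - s(100)
# is the probability of being infected within the horizon.
prob_infected = 1.0 - states[-1][0]
```

The identity holds because d/dt[log s + R₀ r] = −βι + (β/γ)γι = 0 with log s(0) + R₀ r(0) = 0, so `max_gap` only measures solver error.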
Two observation regimes are considered. First, when exact infection times are known but recovery information is absent, the likelihood reduces to a product of the individual infection densities f_TI(t_i), which depends only on the ODE solution and not on the unknown initial susceptible count N. Second, for the more realistic case where only interval‑wise case counts Y_j are available (e.g., daily or weekly reports, with Y_j the number of new cases in (ξ_{j−1}, ξ_j]), the unobserved infection times t_{jl}, l = 1, …, Y_j, must be integrated out. The authors perform the change of variables u_{jl} = (s(ξ_{j−1}) − s(t_{jl}))/(s(ξ_{j−1}) − s(ξ_j)), which lies in [0, 1] because s is decreasing. The Jacobian calculation shows that each u_{jl} follows a Uniform(0,1) distribution, so integrating them out contributes a factor of one. Consequently, the marginal count likelihood collapses to
$$\ell(\theta \mid \{Y_j\}) = s_T^{\,N-K} \prod_{j=1}^{P} \bigl(s(\xi_{j-1}) - s(\xi_j)\bigr)^{Y_j},$$
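where K = ∑_j Y_j is the total number of reported cases and s_T = s(T) is the probability of escaping infection through the end of observation T. This closed‑form expression sidesteps the high‑dimensional integration over unobserved infection times that would otherwise be required. Importantly, N enters only through the factor s_T^{N−K}, so this dependence can be removed when N is unknown, allowing inference without knowledge of the total susceptible population.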
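To show how cheap the closed form is to evaluate, here is a minimal Python sketch, assuming the scaled SIR system s' = −βsι, ι' = βsι − γι with s(0)=1, ι(0)=ρ, interval boundaries ξ_0 = 0 < … < ξ_P = T, and a known N. The helper names `sir_survival` and `dsa_count_loglik`, the RK4 solver, and all parameter values are hypothetical illustrations, not the authors' code. The counts are noise-free expected values generated from the model itself, so the grid search is only a sanity check that the likelihood peaks at the generating β.

```python
import math

def sir_survival(beta, gamma, rho, times, steps_per_unit=200):
    """Evaluate s(t) at sorted times (first time 0) for the scaled SIR system,
    starting from s(0)=1, iota(0)=rho, using classical RK4."""
    def rhs(s, i):
        return -beta * s * i, beta * s * i - gamma * i

    s, i, t = 1.0, rho, 0.0
    values = []
    for target in times:
        n = max(1, math.ceil((target - t) * steps_per_unit))
        h = (target - t) / n
        for _ in range(n):
            k1s, k1i = rhs(s, i)
            k2s, k2i = rhs(s + 0.5 * h * k1s, i + 0.5 * h * k1i)
            k3s, k3i = rhs(s + 0.5 * h * k2s, i + 0.5 * h * k2i)
            k4s, k4i = rhs(s + h * k3s, i + h * k3i)
            s += h / 6.0 * (k1s + 2 * k2s + 2 * k3s + k4s)
            i += h / 6.0 * (k1i + 2 * k2i + 2 * k3i + k4i)
        t = target
        values.append(s)
    return values

def dsa_count_loglik(counts, s_bounds, N):
    """Log of s_T^(N-K) * prod_j (s(xi_{j-1}) - s(xi_j))^(Y_j).

    counts   : Y_1, ..., Y_P (new cases per interval)
    s_bounds : s(xi_0), ..., s(xi_P), so len(s_bounds) == len(counts) + 1
    N        : initial number of susceptibles (assumed known here)
    """
    if len(s_bounds) != len(counts) + 1:
        raise ValueError("need one more boundary value than counts")
    K = sum(counts)
    ll = (N - K) * math.log(s_bounds[-1])  # escape factor, s_T = s(xi_P)
    for y, s_prev, s_next in zip(counts, s_bounds, s_bounds[1:]):
        if y > 0:
            ll += y * math.log(s_prev - s_next)
    return ll

# Usage: weekly expected counts from the model (not real data), then a crude
# grid search over beta with gamma and rho held fixed.
bounds = [0.0, 7.0, 14.0, 21.0, 28.0, 35.0, 42.0]
N = 1_000_000
s_true = sir_survival(0.5, 0.25, 0.001, bounds)
counts = [round(N * (a - b)) for a, b in zip(s_true, s_true[1:])]
grid = [0.30 + 0.05 * k for k in range(9)]
scores = [dsa_count_loglik(counts, sir_survival(b, 0.25, 0.001, bounds), N) for b in grid]
beta_hat = grid[scores.index(max(scores))]
```

Each evaluation costs one ODE solve plus P logarithms, with no simulation or data augmentation. The Uniform(0,1) step behind the formula is the probability integral transform: conditional on infection in (ξ_{j−1}, ξ_j], an infection time has density −s'(t)/(s(ξ_{j−1}) − s(ξ_j)), which is exactly the derivative of u_{jl} with respect to t_{jl}.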
The DSA framework is readily extended. The authors incorporate individual heterogeneity via frailty distributions on susceptibility and replace homogeneous mixing with Poisson random networks, thereby allowing β to vary with network structure while preserving the ODE‑based hazard approximation. Simulation studies demonstrate that, under partially observed scenarios, DSA‑based estimators recover true parameters with accuracy comparable to exact likelihood methods but at a fraction of the computational cost (often >10× faster).
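For the frailty extension, one common construction, assumed here for illustration and not taken from the paper, multiplies each individual's exposure by a frailty Z_i ~ Gamma(shape k, rate k), so E[Z_i] = 1 and Var(Z_i) = 1/k. Under Sellke's construction the individual escapes infection while Q_i > Z_i Λ, and averaging exp(−Z Λ) over Z gives the population survival (1 + Λ/k)^(−k) in place of exp(−Λ). The helper names and the values of k and Λ are hypothetical; the sketch compares the closed form with a Monte Carlo draw of thresholds and frailties.

```python
import math
import random

def gamma_frailty_survival(cum_hazard, k):
    """E[exp(-Z * cum_hazard)] for Z ~ Gamma(shape=k, rate=k): mean 1, variance 1/k."""
    return (1.0 + cum_hazard / k) ** (-k)

def monte_carlo_survival(cum_hazard, k, n_draws=100_000, seed=1):
    """Fraction of simulated individuals whose Exp(1) threshold Q exceeds Z * cum_hazard."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_draws):
        z = rng.gammavariate(k, 1.0 / k)  # shape k, scale 1/k, so E[Z] = 1
        q = rng.expovariate(1.0)          # Sellke-style Exp(1) infection threshold
        if q > z * cum_hazard:
            survivors += 1
    return survivors / n_draws

for lam in (0.5, 1.0, 2.0):
    exact = gamma_frailty_survival(lam, k=2.0)
    approx = monte_carlo_survival(lam, k=2.0)
    print(f"Lambda={lam}: closed form {exact:.4f}, "
          f"Monte Carlo {approx:.4f}, no frailty {math.exp(-lam):.4f}")
```

As k → ∞ the frailty variance vanishes and the closed form recovers exp(−Λ); for finite k, Jensen's inequality makes the population survival larger than exp(−Λ) at the same cumulative exposure Λ.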
Real‑world applications include an Ebola outbreak in West Africa and multiple waves of COVID‑19, fitted with variants of the model that include a network‑based epidemic model and a model with a distribution over susceptibility. In both cases, estimates of the basic reproduction number R₀, recovery rate γ, and heterogeneity parameters align with published values, and posterior predictive intervals accurately track observed incidence trajectories. The network‑based COVID‑19 model captures heterogeneity in contact structure that homogeneous‑mixing models miss.
In summary, by exploiting the large‑population limit to obtain a deterministic hazard and by interpreting infection times through a survival‑analysis lens, the authors derive a tractable, closed‑form marginal likelihood for count data. This approach offers substantial gains in computational efficiency, flexibility for model extensions, and practical applicability to real epidemic datasets, positioning DSA as a powerful alternative to existing inference techniques for stochastic epidemic models.