Non-Linear Drivers of Population Dynamics: a Nonparametric Coalescent Approach

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Effective population size (Ne(t)) is a fundamental parameter in population genetics and phylodynamics that quantifies genetic diversity and reveals demographic history. Coalescent-based methods enable the inference of Ne(t) trajectories through time from phylogenies reconstructed from molecular sequence data. Understanding the ecological and environmental drivers of population dynamics requires linking Ne(t) to external covariates. Existing approaches typically impose log-linear relationships between covariates and Ne(t), which may fail to capture complex biological processes and can introduce bias when the true relationship is nonlinear. We present a flexible Bayesian framework that integrates covariates into coalescent models with piecewise-constant Ne(t) through a Gaussian process (GP) prior. The GP, a distribution over functions, naturally accommodates nonlinear covariate effects without restrictive parametric assumptions. This formulation improves estimation of covariate-Ne(t) relationships, mitigates bias under nonlinear associations, and yields interpretable uncertainty quantification that varies across the covariate space. To balance global covariate-driven patterns with local temporal dynamics, we couple the GP prior with a Gaussian Markov random field that enforces smoothness in Ne(t) trajectories. Through simulation studies and three empirical applications - yellow fever virus dynamics in Brazil (2016-2018), late-Quaternary musk ox demography, and HIV-1 CRF02-AG evolution in Cameroon - we demonstrate that our method both confirms linear relationships where appropriate and reveals nonlinear covariate effects that would otherwise be missed or mischaracterized. This framework advances phylodynamic inference by enabling more accurate and biologically realistic modeling of how environmental and epidemiological factors shape population size through time.


💡 Research Summary

This paper introduces a flexible Bayesian framework for linking external covariates to effective population size trajectories inferred from coalescent models. Traditional phylodynamic approaches typically assume a log‑linear relationship between covariates and Ne(t), which can be overly restrictive when the true association is nonlinear. The authors retain the computationally attractive piecewise‑constant representation of Ne(t) used in Skygrid/Skyride methods, but replace the deterministic log‑linear mean with a Gaussian Process (GP) prior defined over the covariate space. The GP, governed by a kernel with hyper‑parameters that are themselves inferred from the data, allows the model to capture a broad class of smooth, nonlinear relationships without imposing a specific functional form. To preserve temporal coherence and avoid over‑fitting, a Gaussian Markov Random Field (GMRF) prior is applied across adjacent time intervals, enforcing smoothness in the Ne(t) trajectory while still permitting abrupt changes when supported by the data.
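The prior structure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a squared-exponential kernel, a first-order random-walk GMRF, and that the GMRF acts on the difference between log Ne and the GP function values (by analogy with Skygrid-style covariate models). The names `sq_exp_kernel`, `gmrf_precision`, and `log_prior`, and all hyper-parameter values, are hypothetical.

```python
import numpy as np

def sq_exp_kernel(x, y, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance between covariate values x and y."""
    diff = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gmrf_precision(n, tau=1.0):
    """Precision matrix of a first-order random walk over n time intervals."""
    Q = np.zeros((n, n))
    for i in range(n - 1):
        # Each pair of adjacent intervals contributes tau * (g[i+1] - g[i])^2.
        Q[i, i] += tau
        Q[i + 1, i + 1] += tau
        Q[i, i + 1] -= tau
        Q[i + 1, i] -= tau
    return Q

def log_prior(log_ne, f, covariate, tau=1.0, variance=1.0,
              length_scale=1.0, jitter=1e-6):
    """Unnormalized log prior (assumed form, for illustration only).

    f      : GP function values at the covariate of each interval
    log_ne : piecewise-constant log effective population sizes
    """
    n = len(log_ne)
    K = sq_exp_kernel(covariate, covariate, variance, length_scale)
    K += jitter * np.eye(n)  # small diagonal term for numerical stability
    Q = gmrf_precision(n, tau)
    resid = log_ne - f
    gmrf_term = -0.5 * resid @ Q @ resid        # temporal smoothness
    gp_term = -0.5 * f @ np.linalg.solve(K, f)  # covariate-driven pattern
    return gmrf_term + gp_term
```

Under this assumed form, the GP term governs the global covariate-driven shape, while the GMRF term penalizes only differences between adjacent intervals in the deviation of log Ne from that shape, so a constant offset between the two is not penalized.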

Inference is performed using Hamiltonian Monte Carlo (HMC), which leverages analytically derived gradients and Hessians of the posterior to efficiently explore the high‑dimensional latent field comprising the piecewise‑constant Ne values and the GP function values. This gradient‑based sampler dramatically improves mixing compared with traditional random‑walk Metropolis algorithms, making the approach scalable to realistic data sets.
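To make the gradient-based sampling idea concrete, here is a minimal sketch of a generic HMC transition with a leapfrog integrator. It is not the sampler used in the paper: the target is a toy standard Gaussian standing in for the posterior over the latent field, the sketch uses gradients only (no Hessians), and the function names, step size, and number of leapfrog steps are illustrative assumptions.

```python
import numpy as np

def leapfrog(q, p, grad_log_post, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    q = q.copy()
    p = p.copy()
    p += 0.5 * step_size * grad_log_post(q)   # initial half step for momentum
    for _ in range(n_steps - 1):
        q += step_size * p                    # full step for position
        p += step_size * grad_log_post(q)     # full step for momentum
    q += step_size * p
    p += 0.5 * step_size * grad_log_post(q)   # final half step for momentum
    return q, p

def hmc_step(q, log_post, grad_log_post, step_size, n_steps, rng):
    """One HMC transition: draw momentum, integrate, accept or reject."""
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, grad_log_post, step_size, n_steps)
    h0 = -log_post(q) + 0.5 * p0 @ p0              # energy at the start
    h1 = -log_post(q_new) + 0.5 * p_new @ p_new    # energy at the proposal
    if np.log(rng.uniform()) < h0 - h1:
        return q_new, True
    return q, False

# Toy target (assumption): standard Gaussian in 5 dimensions.
def log_post(q):
    return -0.5 * q @ q

def grad_log_post(q):
    return -q

rng = np.random.default_rng(0)
q = np.zeros(5)
samples = []
for _ in range(2000):
    q, accepted = hmc_step(q, log_post, grad_log_post, 0.2, 10, rng)
    samples.append(q)
samples = np.array(samples)
```

In a real application, `log_post` and `grad_log_post` would be replaced by the coalescent log-likelihood plus the log prior and their gradients with respect to the latent field.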

Through extensive simulations, the authors demonstrate that when the true covariate‑Ne relationship is nonlinear (e.g., saturating, inverted‑U, or threshold effects), the GP‑augmented model yields unbiased estimates and well‑calibrated credible intervals, whereas the log‑linear model exhibits substantial bias and under‑coverage.
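The bias of a log-linear fit under a saturating relationship can be illustrated with a toy regression. This sketch is not one of the paper's simulations: it assumes log Ne is observed directly with Gaussian noise rather than inferred from a genealogy, and the saturating curve, noise level, and fixed kernel hyper-parameters are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical saturating truth: log Ne rises quickly, then plateaus.
x = np.linspace(0.0, 6.0, 40)
true_log_ne = 3.0 * x / (1.0 + x)
noise_sd = 0.1
y = true_log_ne + noise_sd * rng.standard_normal(x.size)

# Log-linear model: straight line in the covariate.
slope, intercept = np.polyfit(x, y, deg=1)
linear_fit = intercept + slope * x

def sq_exp_kernel(a, b, variance=4.0, length_scale=1.5):
    """Squared-exponential covariance (hyper-parameters fixed by assumption)."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

# GP regression posterior mean at the observed covariate values.
K = sq_exp_kernel(x, x)
y_mean = y.mean()
gp_fit = y_mean + K @ np.linalg.solve(K + noise_sd**2 * np.eye(x.size),
                                      y - y_mean)

def rmse(fit):
    """Root-mean-square error against the known truth."""
    return np.sqrt(np.mean((fit - true_log_ne) ** 2))

print(f"log-linear RMSE: {rmse(linear_fit):.3f}")
print(f"GP RMSE:         {rmse(gp_fit):.3f}")
```

The straight line overshoots at intermediate covariate values and undershoots at the extremes, while the GP posterior mean follows the plateau. In the actual framework the kernel hyper-parameters are inferred rather than fixed.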

The methodology is applied to three empirical case studies: (1) yellow fever virus spread in Brazil (2016‑2018), where precipitation and temperature show a saturating effect on transmission intensity; (2) late‑Quaternary musk‑ox population dynamics, revealing an inverted‑U relationship with paleoclimate proxies, which indicates that population size peaks at intermediate climate conditions; and (3) HIV‑1 CRF02‑AG evolution in Cameroon, where human mobility and antiretroviral therapy coverage have an initially steep, then plateauing influence on effective population size. In each case, the GP model uncovers biologically plausible nonlinear patterns that would be missed or mischaracterized by a log‑linear approach.

Overall, the paper presents a novel hybrid model that combines the computational efficiency of piecewise‑constant coalescent representations with the expressive power of non‑parametric GP priors, implemented via HMC for fast Bayesian inference. This framework advances phylodynamic inference by enabling more accurate, realistic modeling of how environmental, ecological, and epidemiological factors shape population size through time.

