Reverse engineering gene regulatory networks using approximate Bayesian computation
Gene regulatory networks are collections of genes that interact with one another and with other substances in the cell. By measuring gene expression over time using high-throughput technologies, it may be possible to reverse engineer, or infer, the structure of the gene network involved in a particular cellular process. These gene expression data typically have a high dimensionality and a limited number of biological replicates and time points. Due to these issues and the complexity of biological systems, the problem of reverse engineering networks from gene expression data demands a specialized suite of statistical tools and methodologies. We propose a non-standard adaptation of a simulation-based approach known as Approximate Bayesian Computation based on Markov chain Monte Carlo sampling. This approach is particularly well suited for the inference of gene regulatory networks from longitudinal data. The performance of this approach is investigated via simulations and using longitudinal expression data from a genetic repair system in Escherichia coli.
💡 Research Summary
The paper addresses the challenging problem of inferring gene regulatory networks (GRNs) from longitudinal gene‑expression data, a setting characterized by a very large number of genes (high‑dimensional “p”) but a relatively small number of time points and biological replicates (“small‑n”). Traditional statistical tools struggle in this regime because the number of possible gene‑to‑gene interactions grows quadratically with the number of genes, leading to an astronomically large model space and severe over‑fitting risks. Dynamic Bayesian Networks (DBNs) have been widely used for GRN inference, but they typically require an explicit likelihood function, which is often unavailable when the error distribution of the underlying dynamical system is unknown or non‑Gaussian.
To overcome this limitation, the authors propose a novel adaptation of Approximate Bayesian Computation (ABC) within a Markov chain Monte Carlo (MCMC) framework, specifically tailoring the ABC‑MCMC algorithm of Marjoram et al. (2003) for GRN inference. Their method, dubbed “ABC‑Net,” proceeds as follows: (1) a Bayesian model is defined where the observed expression matrix Y (P genes × T time points) follows a first‑order vector autoregressive (VAR(1)) process, Y_t = Θ Y_{t‑1} + ε_t. The error term ε_t is assumed to have zero mean and finite covariance, but no specific distribution (e.g., normal) is imposed, making the likelihood intractable. (2) Two latent objects are introduced: an adjacency matrix G (binary indicators of regulatory links) and a parameter matrix Θ (interaction strengths). Priors are chosen to reflect biological knowledge: G is uniformly distributed over all possible sparse graphs subject to a maximum fan‑in per gene, while non‑zero Θ_{ij} are uniformly drawn from the interval
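The model and sampler described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Laplace noise, the bound `theta_max` on interaction strengths (the interval is not given above), the fan-in limit `max_fan_in`, the Euclidean distance, the tolerance `eps`, and all function names are hypothetical choices made for the example. Because the priors are uniform and each move is proposed with the same probability as its reverse, the Metropolis-Hastings ratio equals 1 inside the prior support, so a proposal is accepted whenever its simulated data fall within `eps` of the observed data.

```python
import numpy as np

def simulate_var1(theta, y0, n_steps, rng, noise_scale=0.1):
    """Simulate Y_t = Theta @ Y_{t-1} + eps_t with zero-mean Laplace noise (assumed)."""
    p = theta.shape[0]
    y = np.empty((p, n_steps))
    y[:, 0] = y0
    for t in range(1, n_steps):
        eps = rng.laplace(0.0, noise_scale, size=p)
        y[:, t] = theta @ y[:, t - 1] + eps
    return y

def propose(g, theta, rng, theta_max, step=0.2):
    """Pick one entry (i, j); either flip the edge or perturb its weight."""
    g, theta = g.copy(), theta.copy()
    p = g.shape[0]
    i, j = rng.integers(p), rng.integers(p)
    if rng.random() < 0.5:
        if g[i, j]:                      # remove an existing edge
            g[i, j], theta[i, j] = 0, 0.0
        else:                            # add an edge, weight drawn from the prior
            g[i, j] = 1
            theta[i, j] = rng.uniform(-theta_max, theta_max)
    elif g[i, j]:                        # symmetric random-walk move on the weight
        theta[i, j] += rng.uniform(-step, step)
    return g, theta

def abc_mcmc(y_obs, n_iter, eps, rng, max_fan_in=2, theta_max=1.0):
    """Return the fraction of iterations in which each edge was present."""
    p, n_steps = y_obs.shape
    g = np.zeros((p, p), dtype=int)      # row i lists the regulators of gene i
    theta = np.zeros((p, p))
    edge_counts = np.zeros((p, p))
    for _ in range(n_iter):
        g_new, theta_new = propose(g, theta, rng, theta_max)
        in_support = (g_new.sum(axis=1).max() <= max_fan_in
                      and np.abs(theta_new).max() <= theta_max)
        if in_support:
            y_sim = simulate_var1(theta_new, y_obs[:, 0], n_steps, rng)
            if np.linalg.norm(y_sim - y_obs) <= eps:
                g, theta = g_new, theta_new
        edge_counts += g
    return edge_counts / n_iter

# Toy usage: three genes, ten time points, a made-up "true" network.
rng = np.random.default_rng(0)
theta_true = np.array([[0.5, 0.0, 0.0],
                       [0.4, 0.0, 0.0],
                       [0.0, -0.6, 0.3]])
y_obs = simulate_var1(theta_true, np.ones(3), 10, rng)
edge_freq = abc_mcmc(y_obs, n_iter=2000, eps=2.0, rng=rng)
print(edge_freq.round(2))
```

The returned matrix approximates posterior edge probabilities only loosely here, since the sketch uses no burn-in, a single chain, and an arbitrary tolerance; in practice the tolerance and distance function would need careful tuning.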