An iterative feature selection method for GRNs inference by exploring topological properties
An important problem in bioinformatics is the inference of gene regulatory networks (GRNs) from temporal expression profiles. In general, the main limitations faced by GRN inference methods are the small number of samples relative to the huge dimensionality of the data and the noisy nature of expression measurements. Given these limitations, alternatives are needed to achieve better accuracy on the GRN inference problem. This work addresses the problem by presenting an alternative feature selection method, called SFFS-BA, that applies prior knowledge in its search strategy. The proposed strategy is based on the Sequential Floating Forward Selection (SFFS) algorithm, extended with scale-free (Barabási-Albert) topology information that guides the search process to improve inference. The proposed algorithm exploits the scale-free property by pruning the search space and using a power law as a weight for reducing it. In this way, the search space traversed by the SFFS-BA method combines a breadth-first search when the number of combinations is small (predictor subsets of one or two genes) with a depth-first search, guided by the scale-free prior, for larger subsets.
💡 Research Summary
The paper addresses the challenging problem of inferring gene regulatory networks (GRNs) from temporal gene‑expression data, a setting characterized by a very large number of variables (thousands of genes) but only a few samples (tens of experiments). Traditional GRN inference methods struggle in this high‑dimensional, low‑sample regime because statistical estimators become unreliable and noise heavily contaminates the data. To mitigate these issues, the authors propose a novel feature‑selection driven inference framework that incorporates prior knowledge about the global topology of the network, specifically the scale‑free property observed in many biological systems.
The core of the method is an algorithm called SFFS‑BA (Sequential Floating Forward Selection with Barabási‑Albert prior). It builds on the well‑known Sequential Floating Forward Selection (SFFS) technique, which improves upon simple Sequential Forward Selection (SFS) by allowing both insertion and removal of features during the search, thereby reducing the nesting effect. However, SFFS still explores an exponentially growing search space when the cardinality of the candidate predictor set exceeds two. SFFS‑BA solves this by guiding the search with a scale‑free prior derived from the Barabási‑Albert model. The prior assigns a power‑law weight to each gene based on its expected degree (hub likelihood).
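One plausible reading of the power-law weighting, sketched below under stated assumptions: since the fraction of genes with in-degree k in a scale-free network follows P(k) ∝ k^(−γ), the prior can bound how many target genes are allowed a predictor set of each size. The function name, the exponent γ = 2.5, and the budgeting interpretation are illustrative, not taken from the paper.

```python
import numpy as np

def subset_size_budget(n_genes, k_max, gamma=2.5):
    """Hypothetical scale-free prior: the fraction of genes with
    in-degree k follows P(k) ~ k**(-gamma), so at most roughly
    n_genes * P(k) targets are allowed k predictors.  This is one
    possible way a power law can shrink the search space."""
    ks = np.arange(1, k_max + 1, dtype=float)
    p = ks ** (-gamma)
    p /= p.sum()  # normalize the power law over sizes 1..k_max
    return {int(k): int(np.ceil(n_genes * pk)) for k, pk in zip(ks, p)}
```

Because the weights decay as a power law, only a handful of genes (the candidate hubs) are ever considered for large predictor sets, which is where the bulk of the combinatorial cost lies.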
The algorithm operates in two regimes. For small subsets (k ≤ 2) it performs a breadth‑first exploration, evaluating all possible 1‑ and 2‑gene combinations. When the subset size reaches three or more (k ≥ 3), the algorithm switches to a depth‑first strategy that preferentially expands candidate sets containing high‑weight (potential hub) genes. This hybrid approach dramatically reduces the number of evaluated subsets while preserving the ability to discover hub‑centric regulatory relationships that dominate scale‑free networks.
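The two-regime search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it exhaustively scores all 1- and 2-gene subsets (breadth-first regime), then greedily grows the best subset by adding the highest-weight remaining gene (depth-first regime), stopping when the criterion no longer improves. All names and the stopping rule are assumptions.

```python
from itertools import combinations

def hybrid_search(genes, score, weight, k_max=4):
    """Sketch of the hybrid search.  `score` maps a gene subset to a
    criterion value (lower is better, e.g. conditional entropy);
    `weight` maps a gene to its hub-likelihood weight."""
    # Breadth-first regime (k <= 2): evaluate every small subset.
    best, best_score = None, float("inf")
    for k in (1, 2):
        for subset in combinations(genes, k):
            s = score(subset)
            if s < best_score:
                best, best_score = subset, s
    # Depth-first regime (k >= 3): expand toward likely hubs only.
    current = list(best)
    while len(current) < k_max:
        remaining = [g for g in genes if g not in current]
        if not remaining:
            break
        candidate = max(remaining, key=weight)
        s = score(tuple(current + [candidate]))
        if s >= best_score:
            break  # no improvement: stop expanding this branch
        current.append(candidate)
        best_score = s
    return tuple(current), best_score
```

The key saving is that the exponential blow-up only ever happens for k ≤ 2; beyond that, each step evaluates a single hub-biased candidate instead of all remaining combinations.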
For the criterion function the authors use mean conditional entropy, an information‑theoretic measure that quantifies the uncertainty of a target gene given a set of predictors. Lower conditional entropy indicates stronger predictive power and thus a more plausible regulatory link. The combination of a topology‑aware search and an entropy‑based scoring function yields a robust selection of predictor genes for each target.
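The conditional-entropy criterion can be estimated directly from discretized expression samples via H(Y|X) = −Σ p(x, y) log₂ p(y|x). A minimal sketch (function and variable names are our own):

```python
from collections import Counter
import math

def mean_conditional_entropy(predictors, target):
    """Estimate H(target | predictors) from discretized samples.
    `predictors` is a list with one tuple of predictor values per
    sample; `target` is the list of target values, same length."""
    n = len(target)
    joint = Counter(zip(predictors, target))   # counts of (x, y) pairs
    marginal = Counter(predictors)             # counts of x alone
    h = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_y_given_x = c / marginal[x]
        h -= p_xy * math.log2(p_y_given_x)
    return h
```

A target fully determined by its predictors yields H = 0 (maximal predictive power), while an independent target yields the full entropy of its own distribution, which is why the search minimizes this quantity.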
Experimental validation is performed on synthetic GRNs generated with the Barabási‑Albert model, covering various network sizes (100–500 nodes) and average degrees (2–4). The authors compare SFFS‑BA against standard SFS and SFFS under multiple noise levels. Results show consistent improvements: precision, recall, and F‑score increase by roughly 10–15 % on average, with the most pronounced gains in correctly recovering edges incident to hub genes. Moreover, the performance degradation caused by increasing noise is substantially milder for SFFS‑BA, demonstrating its robustness.
The paper also discusses limitations. The method assumes that the true underlying network follows a scale‑free distribution; deviations from this assumption could reduce its advantage. Parameter choices such as the power‑law exponent γ and the scaling of the prior weights influence the search and may require tuning for real data. Finally, all experiments are based on simulated data; application to real microarray or RNA‑Seq datasets remains an open task.
In conclusion, the authors present a compelling integration of complex‑network theory with feature‑selection algorithms for GRN inference. By embedding a Barabási‑Albert prior into the SFFS framework, they achieve a more efficient search and better recovery of biologically relevant hub‑centric interactions, offering a promising direction for future work that includes real‑world validation, automatic parameter learning, and extension to other topological priors such as modularity.