Evolutionary Algorithms: Concepts, Designs, and Applications in Bioinformatics: Evolutionary Algorithms for Bioinformatics

Since the genetic algorithm was proposed by John Holland (Holland J. H., 1975) in the early 1970s, the study of evolutionary algorithms has emerged as a popular research field (Civicioglu & Besdok, 2013). Researchers from various scientific and engineering disciplines have been digging into this field, exploring the unique power of evolutionary algorithms (Hadka & Reed, 2013). Many applications have been successfully proposed in the past twenty years, for example in mechanical design (Lampinen & Zelinka, 1999), electromagnetic optimization (Rahmat-Samii & Michielssen, 1999), environmental protection (Bertini, Felice, Moretti, & Pizzuti, 2010), finance (Larkin & Ryan, 2010), musical orchestration (Esling, Carpentier, & Agon, 2010), pipe routing (Furuholmen, Glette, Hovin, & Torresen, 2010), and nuclear reactor core design (Sacco, Henderson, Rios-Coelho, Ali, & Pereira, 2009). In particular, their function optimization capability has been highlighted (Goldberg & Richardson, 1987) because of their high adaptability to different function landscapes to which traditional optimization techniques cannot be applied (Wong, Leung, & Wong, 2009). Here we review the applications of evolutionary algorithms in bioinformatics.

💡 Research Summary

The paper provides a comprehensive review of evolutionary algorithms (EAs) and their applications in bioinformatics, beginning with a historical overview that traces the origin of genetic algorithms (GAs) to John Holland’s pioneering work in the early 1970s. It outlines the fundamental components common to all EAs—selection, crossover, mutation, and fitness evaluation—and then distinguishes several major variants: classic GA, Differential Evolution (DE), Particle Swarm Optimization (PSO), Evolution Strategies (ES), and Genetic Programming (GP). Each variant is examined in terms of its representation capabilities (binary, real‑valued, tree‑based), its balance between exploration and exploitation, and its suitability for single‑objective versus multi‑objective problems.
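The four components named above can be sketched in a few lines. The following is a minimal illustration only, not the paper's method: the bit-string encoding, the toy `one_max` fitness, tournament selection, one-point crossover, elitism, and every parameter value are assumptions chosen for brevity.

```python
import random


def one_max(bits):
    """Toy fitness (assumed for illustration): number of 1-bits."""
    return sum(bits)


def tournament(pop, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=one_max)


def crossover(a, b, rng):
    """One-point crossover of two equal-length bit lists."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=20, pop_size=30, generations=50, seed=0):
    """Run a small generational GA and return the best bit list found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=one_max)
    for _ in range(generations):
        children = [
            mutate(
                crossover(tournament(pop, rng), tournament(pop, rng), rng),
                rng,
                rate=1.0 / n_bits,
            )
            for _ in range(pop_size - 1)
        ]
        pop = children + [best]  # elitism: the best individual always survives
        best = max(pop, key=one_max)
    return best


if __name__ == "__main__":
    solution = run_ga()
    print(solution, one_max(solution))
```

Because the best individual is carried over unchanged each generation, the best fitness never decreases. A real bioinformatics application would replace `one_max` and the bit-string encoding with a problem-specific objective and representation.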

The core of the review is organized around four principal bioinformatics problem classes where EAs have demonstrated significant impact. First, multi‑objective GAs have been employed for sequence alignment, simultaneously optimizing alignment scores, structural conservation, and evolutionary distance, thereby outperforming traditional dynamic‑programming methods in flexibility and accuracy. Second, protein tertiary‑structure prediction benefits from hybrid approaches that combine DE’s global search with local physics‑based refinement or molecular dynamics, achieving lower root‑mean‑square deviation (RMSD) relative to experimentally determined structures. Third, the inference of gene‑regulatory networks and expression‑based pathways uses ES to encode network topologies; fitness functions blend data‑fit terms (e.g., likelihood) with model‑complexity penalties (e.g., Bayesian Information Criterion), and Pareto‑optimal fronts reveal a spectrum of plausible networks. Fourth, phylogenetic tree reconstruction leverages GP, which directly manipulates tree structures through crossover and mutation, yielding trees with higher bootstrap support than those produced by maximum‑likelihood approaches.
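To make the idea of a Pareto-optimal front concrete, here is a small sketch that filters candidate solutions down to the non-dominated ones. It assumes two objectives that are both minimized, for instance a data-fit term and a complexity term for candidate networks. The function names and the example scores are hypothetical and are not taken from the paper.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up scores: (negative log-likelihood, number of edges) per candidate.
    candidates = [(10.0, 8), (12.0, 5), (11.0, 9), (15.0, 3), (12.5, 5)]
    print(pareto_front(candidates))
```

On these made-up scores the filter keeps `(10.0, 8)`, `(12.0, 5)`, and `(15.0, 3)`: each trades a better fit against a simpler model, so none is strictly preferable to the others. This quadratic-time filter is enough for illustration; multi-objective EAs typically use faster non-dominated sorting.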

For each application, the authors discuss the design of problem‑specific encodings, the construction of composite fitness functions, and the computational burden of fitness evaluation. They highlight the use of surrogate models—Gaussian processes, deep neural networks, or other machine‑learning predictors—to approximate expensive energy calculations or likelihoods, thereby reducing runtime without sacrificing solution quality. The review also surveys recent advances in parallel and distributed EA execution, including GPU‑accelerated fitness evaluation, cloud‑based island models, and automated parameter tuning via meta‑evolutionary strategies. Emphasis is placed on multi‑objective optimization and Pareto analysis as essential tools for preserving solution diversity, a critical factor when biological data admit multiple plausible interpretations.
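The surrogate idea can be illustrated with a deliberately crude stand-in. In the sketch below a 1-nearest-neighbour lookup over already evaluated points plays the role of the surrogate. This is a substitution made for brevity, not one of the Gaussian-process or neural-network predictors mentioned above. The objective `expensive_fitness`, the mutation scheme, and all parameter values are likewise assumptions.

```python
import random


def expensive_fitness(x):
    """Stand-in for a costly evaluation such as an energy calculation
    (assumed toy objective; lower is better)."""
    return sum(v * v for v in x)


def surrogate(x, archive):
    """Cheap 1-nearest-neighbour prediction: reuse the true fitness of the
    closest point that has already been evaluated."""
    nearest = min(
        archive,
        key=lambda entry: sum((a - b) ** 2 for a, b in zip(x, entry[0])),
    )
    return nearest[1]


def prescreened_step(parent, archive, rng, n_candidates=20, n_true=3, sigma=0.3):
    """Mutate the parent many times, rank the candidates with the surrogate,
    and spend true evaluations only on the top `n_true` of them."""
    candidates = [
        [v + rng.gauss(0, sigma) for v in parent[0]] for _ in range(n_candidates)
    ]
    candidates.sort(key=lambda c: surrogate(c, archive))
    best = parent
    for c in candidates[:n_true]:
        entry = (c, expensive_fitness(c))
        archive.append(entry)  # every true evaluation improves the surrogate
        if entry[1] < best[1]:
            best = entry
    return best


def run(seed=0, steps=30, dim=3):
    """Return the best (point, fitness) pair and the number of true evaluations."""
    rng = random.Random(seed)
    x0 = [rng.uniform(-2, 2) for _ in range(dim)]
    parent = (x0, expensive_fitness(x0))
    archive = [parent]
    for _ in range(steps):
        parent = prescreened_step(parent, archive, rng)
    return parent, len(archive)


if __name__ == "__main__":
    (point, fitness), n_evals = run()
    print(fitness, n_evals)
```

With the default settings each step proposes 20 candidates but evaluates only 3 of them with the expensive function. Whether such pre-screening preserves solution quality depends on how well the surrogate tracks the true objective, which this toy predictor does not guarantee.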

The paper concludes by identifying current limitations and future research directions. Key challenges include the high cost of fitness evaluation for large‑scale omics datasets, the need for more biologically informed representations, and the integration of EA with deep learning frameworks to create hybrid meta‑heuristics. The authors argue that continued algorithmic innovation, coupled with expanding high‑performance computing resources, will further cement EAs as a versatile and powerful methodology for tackling the complex, non‑linear optimization problems that pervade modern bioinformatics.
