Classification of life by the mechanism of genome size evolution
The classification of life should be based on the fundamental mechanism of the evolution of life. We found that the global relationships among species should be represented by a circular phylogeny, which differs markedly from the common view based on phylogenetic trees. The genealogical circles can be observed clearly in the analysis of protein length distributions of contemporary species. Thus, we suggest that domains can be defined by distinct phylogenetic circles, which are global and stable characteristics of living systems. The mechanism of genome size evolution has been clarified; hence the main component questions of the C-value enigma can be explained. Based on the correlations and quasi-periodicity of protein length distributions, we can also classify life into three domains.
💡 Research Summary
The paper proposes a fundamentally different framework for classifying life, arguing that the underlying mechanism of genome‑size evolution should drive taxonomy rather than traditional tree‑based phylogenetics. The authors begin by compiling large‑scale protein sequence data from thousands of contemporary species. By constructing histograms of protein lengths for each organism and applying Fourier and autocorrelation analyses, they find a pronounced quasi‑periodic signal, most notably a peak around 150–200 amino acids. This periodicity, they claim, maps each species onto a point on a global “phylogenetic circle,” a circular topology that replaces the conventional branching tree.
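The spectral step described above can be illustrated with a minimal sketch, assuming only the Python standard library: protein lengths are binned into a per‑length histogram, and the period of the strongest Fourier component is found with a plain O(N²) discrete Fourier transform. The function names (`length_histogram`, `power_spectrum`, `dominant_period`) are hypothetical, only the Fourier part (not the autocorrelation) is shown, and because the summary does not state the authors' binning, detrending, or windowing choices, nothing beyond mean‑centring is applied here.

```python
import math
from collections import Counter

def length_histogram(lengths, max_len):
    """Number of proteins of each length 1..max_len; longer proteins are ignored."""
    counts = Counter(n for n in lengths if 1 <= n <= max_len)
    return [counts.get(n, 0) for n in range(1, max_len + 1)]

def power_spectrum(series):
    """Squared DFT magnitude of the mean-centred series, for frequencies k = 1..N//2."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    spectrum = {}
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        spectrum[k] = re * re + im * im
    return spectrum

def dominant_period(series):
    """Period, in residues (histogram bins), of the strongest frequency component."""
    spectrum = power_spectrum(series)
    k_best = max(spectrum, key=spectrum.get)
    return len(series) / k_best

# Synthetic histogram with a built-in 50-residue periodicity (illustrative, not real data).
hist = [100 + 20 * math.cos(2 * math.pi * n / 50) for n in range(500)]
print(dominant_period(hist))  # 50.0
```

For real proteomes, an FFT library would replace the quadratic loop, and the choice of maximum length and bin width would noticeably affect which periods are detectable.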
Next, the study examines the relationship between total genome size (C‑value) and the average protein length across the same dataset. A statistically significant positive correlation (r ≈ 0.68) is reported, suggesting that larger genomes tend to encode longer average proteins. To explain this pattern, the authors introduce a two‑stage model of genome‑size evolution. In the first stage, non‑coding DNA accumulates rapidly, driven by replication efficiency and genetic redundancy, causing a swift increase in genome size. In the second stage, selective pressures related to metabolic efficiency and environmental adaptation remodel or prune the excess non‑coding material, leading to a more stable genome architecture. This model is presented as a mechanistic resolution of the long‑standing C‑value enigma, which has puzzled biologists because genome size does not correlate straightforwardly with organismal complexity.
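The reported correlation between genome size and average protein length can be computed as a Pearson coefficient. The sketch below is hypothetical in its input format (a mapping from species name to genome size in base pairs and a list of protein lengths) and in correlating log10 genome size rather than raw C‑value; the summary does not say whether the authors log‑transformed, but genome sizes span several orders of magnitude, so a log scale is a common choice.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def genome_protein_correlation(species):
    """species: name -> (genome_size_bp, list_of_protein_lengths).
    Correlates log10 genome size with mean protein length (assumed transform)."""
    xs = [math.log10(size) for size, _ in species.values()]
    ys = [mean(lengths) for _, lengths in species.values()]
    return pearson_r(xs, ys)

# Toy, made-up data: three "species" with growing genomes and proteins.
toy = {
    "sp_a": (1e6, [200, 300]),
    "sp_b": (1e7, [300, 400]),
    "sp_c": (1e8, [400, 500]),
}
print(genome_protein_correlation(toy))  # perfectly linear toy data, so r is 1 up to rounding
```

A single r value says nothing about outliers; plotting residuals per species is a natural follow‑up.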
Combining the circular protein‑length pattern with the genome‑size model, the authors delineate three global, stable “phylogenetic circles” that correspond to the three traditional domains of life: Bacteria, Archaea, and Eukarya. In their visualization, bacterial species occupy the innermost region of the circle, characterized by short average protein lengths and compact genomes; archaeal species form an intermediate ring with moderate values; and eukaryotes lie on the outermost ring, displaying longer proteins and larger genomes. The authors argue that these concentric circles are global signatures of life, stable across evolutionary time, and can serve as a new basis for domain classification.
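The summary does not say how the boundaries between the concentric rings are drawn. One way to let the data place them, sketched here purely as an assumption, is to run one‑dimensional k‑means with k = 3 on a single radial feature such as mean protein length and order the clusters from innermost to outermost. The names `kmeans_1d` and `assign_rings` are hypothetical, and this is not presented as the authors' procedure.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on scalars; returns cluster centres sorted ascending."""
    vs = sorted(values)
    # Initialise centres at evenly spaced order statistics of the data.
    centres = [vs[(2 * i + 1) * len(vs) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Keep an empty cluster's centre where it was.
        new = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return sorted(centres)

def assign_rings(mean_lengths, k=3):
    """Map species name -> ring index (0 = innermost) by nearest cluster centre."""
    centres = kmeans_1d(list(mean_lengths.values()), k)
    return {name: min(range(k), key=lambda i: abs(v - centres[i]))
            for name, v in mean_lengths.items()}

# Made-up mean protein lengths, chosen only to form three obvious groups.
example = {"b1": 250, "b2": 255, "a1": 300, "a2": 305, "e1": 450, "e2": 460}
print(assign_rings(example))
```

Using a single feature keeps the rings strictly concentric; combining protein length with genome size would require a 2‑D clustering and an explicit rule for mapping clusters to radii.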
The paper’s strengths lie in its ambitious synthesis of massive sequence data, the novel application of spectral analysis to protein‑length distributions, and the attempt to link a quantitative genomic trait (genome size) with a functional proteomic trait (protein length). By proposing a mechanistic two‑stage model, the authors provide a plausible narrative that connects genome expansion with subsequent selective streamlining, offering a fresh perspective on why genome size can vary dramatically among organisms with similar phenotypic complexity.
However, several critical issues limit the impact of the work. First, the inference of a circular phylogeny from periodic protein‑length signals may be an over‑interpretation; protein length is influenced by structural constraints, functional diversification, and stochastic mutation, and a single spectral peak does not uniquely dictate a circular topology. Second, the correlation between genome size and average protein length, while statistically significant, does not account for notable outliers such as compact‑genome parasites with complex life cycles or giant‑genome unicellular eukaryotes with abundant repetitive DNA. Third, the proposed classification does not demonstrably improve upon the existing three‑domain system in terms of predictive power or taxonomic resolution; it essentially reproduces the same major groups but with a different geometric metaphor. Finally, methodological details—such as the handling of incomplete or biased genome assemblies, the statistical robustness of the Fourier analysis, and the criteria for defining the boundaries of each circle—are insufficiently described, making reproducibility challenging.
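The concern about the statistical robustness of the Fourier analysis can be made concrete with a surrogate test: compare the observed spectral peak against peaks from shuffled copies of the same histogram, which keep the bin values but destroy any ordering. The sketch below assumes simple random shuffling as the null model (phase‑randomised surrogates are a common alternative) and uses hypothetical names `peak_power` and `surrogate_p_value`.

```python
import math
import random

def peak_power(series):
    """Largest DFT power over frequencies k = 1..N//2 of the mean-centred series."""
    n = len(series)
    m = sum(series) / n
    c = [x - m for x in series]
    best = 0.0
    for k in range(1, n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(c))
        im = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(c))
        best = max(best, re * re + im * im)
    return best

def surrogate_p_value(series, n_surrogates=99, seed=0):
    """Fraction of shuffled surrogates whose peak power reaches the observed one."""
    rng = random.Random(seed)
    observed = peak_power(series)
    exceed = 0
    for _ in range(n_surrogates):
        shuffled = list(series)
        rng.shuffle(shuffled)
        if peak_power(shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (exceed + 1) / (n_surrogates + 1)

# Synthetic series with a strong 10-bin period (illustrative only).
periodic = [10 + 5 * math.cos(2 * math.pi * t / 10) for t in range(100)]
print(surrogate_p_value(periodic))  # 0.01, the smallest value possible with 99 surrogates
```

A shuffle null ignores the smooth overall shape of real length distributions, so a stricter test would shuffle residuals after removing a fitted smooth trend.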
In summary, the manuscript introduces an innovative conceptual model that links genome‑size dynamics to a circular representation of phylogeny, and it offers a mechanistic explanation for the C‑value enigma. While the data‑driven approach and interdisciplinary thinking are commendable, the conclusions rely on assumptions that require further empirical validation. Future work should test the circular model against independent phylogenomic datasets, explore the functional implications of protein‑length periodicity, and refine the genome‑size evolution model to accommodate known exceptions. Only then can the proposed framework be considered a robust alternative to traditional tree‑based taxonomy.