Deep-learning-based pan-phenomic data reveals the explosive evolution of avian visual disparity
The evolution of biological morphology is critical for understanding the diversity of the natural world, yet traditional analyses often involve subjective biases in the selection and coding of morphological traits. This study employs deep learning techniques, utilising a ResNet34 model capable of recognising over 10,000 bird species, to explore avian morphological evolution. We extract weights from the model’s final fully connected (fc) layer and investigate the semantic alignment between the high-dimensional embedding space learned by the model and biological phenotypes. The results demonstrate that the high-dimensional embedding space encodes phenotypic convergence. Subsequently, we assess the morphological disparity among various taxa and evaluate the association between morphological disparity and species richness, demonstrating that species richness is the primary driver of morphospace expansion. Moreover, the disparity-through-time analysis reveals a visual “early burst” after the K-Pg extinction. While mainly aimed at evolutionary analysis, this study also provides insights into the interpretability of Deep Neural Networks. We demonstrate that hierarchical semantic structures (biological taxonomy) emerged in the high-dimensional embedding space despite being trained on flat labels. Furthermore, through adversarial examples, we provide evidence that our model in this task can overcome texture bias and learn holistic shape representations (body plans), challenging the prevailing view that CNNs rely primarily on local textures.
💡 Research Summary
The paper presents a novel, data‑driven framework for investigating avian morphological evolution by leveraging deep learning on a massive image repository. Using the DongNiao International Birds 10 000 (DIB‑10K) dataset, which contains over 4.8 million photographs of 10 922 bird species, the authors first performed rigorous cleaning: near‑duplicate removal via perceptual hashing and non‑bird detection with a pretrained Faster‑R‑CNN. Images lacking birds were automatically discarded for high‑sample categories and manually reviewed for low‑sample ones, resulting in a high‑quality set for model training.
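The near-duplicate filter can be illustrated with a minimal sketch. The summary does not say which perceptual hash was used, so the average hash below is an assumption, as are the function names, the toy 4×4 thumbnails and the `max_distance` threshold. A real pipeline would first downscale each photograph to a small grayscale thumbnail with an image library.

```python
def average_hash(pixels):
    """Bit tuple with 1 wherever a pixel is brighter than the thumbnail mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if v > mean else 0 for v in flat)

def hamming(h1, h2):
    """Number of bit positions in which two hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))

def drop_near_duplicates(images, max_distance=1):
    """Keep an image only if its hash is more than max_distance bits
    away from every image kept so far (assumed threshold)."""
    kept = []  # list of (name, hash)
    for name, pixels in images:
        h = average_hash(pixels)
        if all(hamming(h, kept_hash) > max_distance for _, kept_hash in kept):
            kept.append((name, h))
    return [name for name, _ in kept]

# Toy grayscale thumbnails: b is a slightly re-encoded copy of a, c is mirrored.
a = [[10, 10, 200, 200]] * 4
b = [[12, 9, 198, 205]] * 4
c = [[200, 200, 10, 10]] * 4
kept = drop_near_duplicates([("a", a), ("b", b), ("c", c)])
```

Here `b` hashes to the same bits as `a` and is dropped, while the mirrored `c` differs in every bit and is kept.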
For the classification task, the authors started from the MetaFGNet‑LBird‑31 checkpoint and fine‑tuned a ResNet‑34 architecture for 32 epochs on the cleaned data. Importantly, all species were treated as flat categories, with no taxonomic information supplied to the network. After training, the 512‑dimensional weight vector that the final fully‑connected (fc) layer holds for each species was extracted. These vectors were interpreted as high‑dimensional morphological descriptors; they were reduced to the smallest subspace preserving 80 % of total variance and then L2‑normalised. Pairwise cosine similarity between species vectors served as a quantitative proxy for morphological similarity, so that a lower similarity corresponds to a greater morphological distance.
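The embedding step described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the variance-preserving reduction is a standard centred PCA, uses random numbers in place of the real fc weights, and the function name `embed_species` is hypothetical. In a PyTorch ResNet‑34 the corresponding matrix would typically be `model.fc.weight`, with one 512‑dimensional row per class.

```python
import numpy as np

def embed_species(fc_weights, variance_kept=0.80):
    """PCA-reduce the rows of fc_weights to the fewest components that keep
    `variance_kept` of the total variance, then L2-normalise each row."""
    centred = fc_weights - fc_weights.mean(axis=0)
    # Squared singular values are proportional to the variance of each component.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(explained, variance_kept) + 1)
    reduced = centred @ vt[:k].T
    return reduced / np.linalg.norm(reduced, axis=1, keepdims=True)

# Stand-in for the real weight matrix: 6 "species", 512 dimensions each.
rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 512))

vectors = embed_species(weights)
# Rows are unit vectors, so the dot product is the cosine similarity.
similarity = vectors @ vectors.T
```

Because the rows are L2‑normalised, the full similarity matrix is a single matrix product, which matters when the number of species is above 10,000.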
Hierarchical agglomerative clustering (average linkage) on the cosine similarity matrix produced a dendrogram in Newick format. The authors defined “taxonomic purity” as the proportion of the majority taxon within a node and considered nodes with >85 % purity as taxonomically consistent. This analysis yielded 391 family‑level and 94 order‑level clusters with high purity, while identifying 474 family‑level and 533 order‑level outlier species, indicating that the learned embedding space largely recapitulates traditional avian taxonomy despite being trained on flat labels.
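The purity criterion can be sketched in a few lines. The 85 % threshold and the definition of purity come from the summary; the function names and the toy node are illustrative, and treating the minority leaves of a node as its outlier species is an assumption, since the summary does not define outliers precisely.

```python
from collections import Counter

def taxonomic_purity(taxa):
    """Return (majority taxon, its share) for the leaf taxa under one node."""
    taxon, count = Counter(taxa).most_common(1)[0]
    return taxon, count / len(taxa)

def is_consistent(taxa, threshold=0.85):
    """A node counts as taxonomically consistent if purity exceeds the threshold."""
    return taxonomic_purity(taxa)[1] > threshold

def minority_leaves(species_to_taxon):
    """Species under a node that do not belong to its majority taxon
    (assumed reading of 'outlier species')."""
    majority, _ = taxonomic_purity(list(species_to_taxon.values()))
    return sorted(sp for sp, t in species_to_taxon.items() if t != majority)

# Toy node: nine leaves from one family and one from another (hypothetical labels).
node = {f"duck_{i}": "Anatidae" for i in range(9)}
node["rail_0"] = "Rallidae"

majority, purity = taxonomic_purity(list(node.values()))
consistent = is_consistent(list(node.values()))
strays = minority_leaves(node)
```

For this node the majority family is Anatidae with a purity of 0.9, so it passes the 85 % threshold and the single Rallidae leaf is flagged.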
Morphological disparity was quantified as the spherical variance of the species vectors. Spearman rank correlations revealed very strong positive relationships between species richness and disparity: ρ = 0.966 (p = 1.45 × 10⁻²⁴) at the order level and ρ = 0.908 (p = 3.56 × 10⁻⁸³) at the family level. Four functional forms (power‑law, stretched exponential, Hill equation, logarithmic rational) were fitted to the richness‑disparity relationship, with the stretched exponential model achieving the lowest AIC (order: −251.33; family: −1201.08). All four candidate forms are non‑linear, so disparity scales non‑linearly with richness; because the AIC values of the top models are close, however, the exact functional shape remains ambiguous.
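The quantities in this paragraph can be illustrated with a small standard-library sketch. It assumes the usual directional-statistics definition of spherical variance (one minus the length of the mean unit vector) and the least-squares form of AIC, neither of which the summary spells out; all numbers are made-up toy values, not results from the paper.

```python
import math

def spherical_variance(vectors):
    """1 - |mean of unit vectors|: 0 when all directions coincide."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return 1.0 - math.sqrt(sum(m * m for m in mean))

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def aic_least_squares(residuals, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to a constant."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

tight = spherical_variance([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])    # identical directions
spread = spherical_variance([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])  # opposite directions

richness = [2, 10, 50, 300]            # toy species counts per taxon
disparity = [0.05, 0.20, 0.35, 0.60]   # toy spherical variances
rho = spearman(richness, disparity)

aic = aic_least_squares([0.1, -0.1, 0.1, -0.1], n_params=2)
```

With this form of AIC, small residuals give a negative value, and among models fitted to the same data the one with the lowest AIC is preferred.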
The disparity‑through‑time (DTT) component introduced a novel spherical ancestral state reconstruction (ASR) algorithm. Assuming that the L2‑normalised fc vectors lie on a unit hypersphere, the authors modeled phenotypic evolution as Riemannian Brownian motion on this sphere. For each pair of sister nodes, the ancestral state was interpolated along the great‑circle arc using spherical linear interpolation (slerp), and contrast variance was corrected for curvature. The overall evolutionary rate (σ²) was estimated from the sum of corrected contrasts. To generate a null expectation, 100 simulations of Brownian motion were performed, projecting Gaussian noise onto the tangent space at each node and mapping it back onto the sphere. Comparing empirical spherical variance across time slices with the simulated null distribution revealed a pronounced early‑burst pattern: disparity surged shortly after the Cretaceous‑Paleogene (K‑Pg) extinction (~66 Ma) and then plateaued, consistent with classic adaptive‑radiation scenarios.
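The two geometric building blocks named above, slerp and a Brownian step on the sphere, can be sketched as follows. Several details are assumptions rather than the authors' algorithm: the interpolation fraction is weighted by branch lengths in the manner of independent contrasts, the tangent-space noise is mapped back with the exponential map (renormalisation would be another option), and all function names are hypothetical. The curvature correction of the contrasts is not shown.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slerp(p, q, t):
    """Point at fraction t along the great-circle arc from unit vector p to q.
    Not well defined when p and q are antipodal."""
    omega = math.acos(max(-1.0, min(1.0, dot(p, q))))
    if omega < 1e-12:  # p and q coincide
        return list(p)
    s = math.sin(omega)
    wp, wq = math.sin((1 - t) * omega) / s, math.sin(t * omega) / s
    return [wp * a + wq * b for a, b in zip(p, q)]

def ancestral_state(left, right, len_left, len_right):
    """Assumed weighting: the ancestor sits closer to the sister with the
    shorter branch, as in Felsenstein's independent contrasts."""
    return slerp(left, right, len_left / (len_left + len_right))

def brownian_step(p, sigma2, dt, rng):
    """Gaussian noise projected onto the tangent space at p, then mapped
    back onto the unit sphere with the exponential map."""
    noise = [rng.gauss(0.0, math.sqrt(sigma2 * dt)) for _ in p]
    radial = dot(noise, p)
    tangent = [n - radial * a for n, a in zip(noise, p)]  # drop component along p
    length = math.sqrt(dot(tangent, tangent))
    if length == 0.0:
        return list(p)
    return [math.cos(length) * a + math.sin(length) * t / length
            for a, t in zip(p, tangent)]

left, right = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
ancestor = ancestral_state(left, right, 1.0, 1.0)   # equal branches: arc midpoint
skewed = ancestral_state(left, right, 1.0, 3.0)     # shorter left branch
child = brownian_step(ancestor, sigma2=0.01, dt=1.0, rng=random.Random(0))
```

Repeating `brownian_step` down every branch of the tree, with `dt` set to the branch length and `sigma2` to the estimated rate, would yield one simulated data set for a null distribution of the kind described.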
Interpretability analyses employed Grad‑CAM to visualise network attention, confirming that the model focuses on the bird body while largely ignoring complex backgrounds. Moreover, the authors crafted adversarial examples that simultaneously altered feather texture and overall body shape. The model’s predictions remained stable, which the authors interpret as evidence that it relies more on holistic shape cues than on local texture, a finding that challenges the prevailing view of convolutional neural networks as texture‑biased.
In summary, the study demonstrates that (1) deep convolutional networks trained on massive, cleaned image datasets can extract high‑dimensional, biologically meaningful morphological embeddings without explicit trait engineering; (2) these embeddings faithfully reflect established avian taxonomy and reveal extensive morphological convergence; (3) morphological disparity is tightly linked to species richness and exhibits an early‑burst expansion following the K‑Pg mass extinction; and (4) the network’s internal representations are shape‑centric, offering new insights into CNN interpretability. This integrative approach bridges computer vision and macroevolutionary biology, providing a scalable template for future pan‑phenomic investigations across diverse clades.