Spectral analysis of Gene co-expression network of Zebrafish
We analyze the gene expression data of Zebrafish under the combined framework of complex networks and random matrix theory. The nearest neighbor spacing distribution of the corresponding matrix spectra follows the random matrix predictions of Gaussian orthogonal statistics. Based on eigenvector analysis, we divide the spectra into two parts: the first, for which the eigenvector localization properties match the random matrix theory predictions, and the second, for which they deviate from the theory and hence are useful for understanding system-dependent properties. Spectra with localized eigenvectors can be classified into three groups based on their eigenvalues. We explore the position of localized nodes from these different categories. Using an overlap measure, we find that the top contributing nodes in the different groups carry distinct structural features. Furthermore, the top contributing nodes of the different localized eigenvectors corresponding to the lower eigenvalue regime form different densely connected structures that are well separated from each other. Preliminary biological interpretation of the genes associated with the top contributing nodes in the localized eigenvectors suggests that genes corresponding to the same vector share common features.
💡 Research Summary
This study applies a combined framework of complex network analysis and Random Matrix Theory (RMT) to a large‑scale gene expression dataset from zebrafish (Danio rerio). First, pairwise Pearson correlation coefficients are computed for all genes, and a threshold is applied to retain only statistically significant co‑expression links, yielding an undirected weighted network. The resulting adjacency matrix is symmetric and real, allowing a full spectral decomposition.
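The construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `coexpression_adjacency`, the threshold value of 0.8, the use of the absolute correlation as the edge weight, and the synthetic expression matrix are all assumptions made for the example.

```python
import numpy as np

def coexpression_adjacency(expr, threshold=0.8):
    """Build a symmetric weighted adjacency matrix from a genes x samples array.

    Hypothetical helper: the threshold and the |r| edge weights are
    illustrative assumptions, not values taken from the paper.
    """
    corr = np.corrcoef(expr)                       # Pearson r between rows (genes)
    weights = np.abs(corr)
    adj = np.where(weights >= threshold, weights, 0.0)
    np.fill_diagonal(adj, 0.0)                     # no self-loops
    return adj

# Synthetic data: 50 genes measured in 20 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 20))
expr[1] = expr[0] + 0.05 * rng.normal(size=20)     # genes 0 and 1 strongly co-expressed

adj = coexpression_adjacency(expr, threshold=0.8)

# A real symmetric matrix admits a full decomposition with real eigenvalues.
eigenvalues, eigenvectors = np.linalg.eigh(adj)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because it exploits the symmetry of the matrix and returns real eigenvalues in ascending order, which is convenient for the spacing analysis.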
The eigenvalue spectrum is examined by ordering the eigenvalues λ_i and calculating the nearest-neighbor spacings s_i = λ_{i+1} − λ_i. After unfolding (here, rescaling by the mean spacing ⟨s⟩, so that the normalized spacings ξ_i = s_i / ⟨s⟩ have unit mean), the spacing distribution P(ξ) is compared with the Wigner-Dyson distribution of the Gaussian Orthogonal Ensemble (GOE). The empirical distribution matches the GOE prediction closely, indicating that the bulk of the network's spectrum is governed by universal random-matrix behavior, i.e., the network exhibits a high degree of randomness in its global connectivity pattern.
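A rough sketch of this comparison is given below, using a synthetic GOE-like matrix in place of the zebrafish network. The helper names, the matrix size, the restriction to the central half of the spectrum, and the histogram binning are assumptions for illustration. The reference curve is the Wigner surmise P(ξ) = (π/2) ξ exp(−π ξ² / 4), the standard closed-form approximation to the GOE spacing distribution.

```python
import numpy as np

def normalized_spacings(eigenvalues):
    """Nearest-neighbor spacings rescaled to unit mean (the simple unfolding described above)."""
    lam = np.sort(eigenvalues)
    s = np.diff(lam)
    return s / s.mean()

def wigner_surmise(xi):
    """Wigner surmise for the GOE: P(xi) = (pi/2) * xi * exp(-pi * xi^2 / 4)."""
    return (np.pi / 2.0) * xi * np.exp(-np.pi * xi**2 / 4.0)

# Synthetic stand-in for the network matrix: a random real symmetric matrix.
rng = np.random.default_rng(1)
n = 400
a = rng.normal(size=(n, n))
goe = (a + a.T) / 2.0
lam = np.linalg.eigvalsh(goe)

# Keep the central half of the spectrum, where the eigenvalue density is
# roughly flat, so that dividing by a single mean spacing is a fair approximation.
bulk = lam[n // 4 : 3 * n // 4]
xi = normalized_spacings(bulk)

# Empirical density versus the Wigner surmise at the bin centers.
hist, edges = np.histogram(xi, bins=20, range=(0.0, 4.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_abs_dev = np.max(np.abs(hist - wigner_surmise(centers)))
```

Dividing by one global mean spacing is only adequate when the eigenvalue density is roughly constant over the window considered. For a full spectrum with varying density, a common alternative is to unfold with a smooth fit to the cumulative eigenvalue count.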
To probe deviations from this universal behavior, the inverse participation ratio (IPR) of each eigenvector v_i is computed: IPR_i = Σ_j (v_ij)^4, where v_ij is the j-th component of the normalized eigenvector v_i. Small IPR values correspond to delocalized eigenvectors that spread over many nodes, consistent with RMT expectations. Large IPR values identify localized eigenvectors that concentrate on a few nodes, suggesting system-specific structural or functional modules. The spectrum is therefore partitioned into two regimes: a delocalized regime that follows RMT predictions, and a localized regime that carries biologically relevant information.
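The IPR definition can be illustrated with two limiting cases. The sketch below assumes eigenvectors are stored as columns, as returned by `numpy.linalg.eigh`. The function name and the toy vectors are hypothetical.

```python
import numpy as np

def inverse_participation_ratio(eigenvectors):
    """IPR of each column vector: sum over components of v_ij^4 after normalization."""
    v = eigenvectors / np.linalg.norm(eigenvectors, axis=0)
    return np.sum(v**4, axis=0)

n = 100

# Fully delocalized vector: equal weight on every node, so IPR = 1/n.
uniform = np.full(n, 1.0 / np.sqrt(n))

# Fully localized vector: all weight on a single node, so IPR = 1.
single = np.zeros(n)
single[3] = 1.0

ipr = inverse_participation_ratio(np.column_stack([uniform, single]))
```

These two cases bound the IPR between 1/n and 1. For comparison, the components of GOE eigenvectors are approximately Gaussian, which gives an expected IPR of about 3/n, so values well above that level are what flag an eigenvector as localized.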
Localized eigenvectors are further grouped according to their eigenvalues into three categories: low‑energy (small eigenvalues), middle‑energy, and high‑energy (large eigenvalues). For each category, the top contributing nodes (genes with the largest absolute components in the eigenvector) are extracted. An overlap measure O(A,B) = |A∩B| / min(|A|,|B|) is introduced to quantify the structural similarity between node sets from different categories. The analysis reveals that the top‑contributing nodes of each category form distinct, densely connected sub‑graphs that are largely non‑overlapping. In particular, low‑energy localized eigenvectors highlight highly central hub genes, whereas high‑energy localized eigenvectors point to peripheral genes involved in specialized stress‑response pathways.
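The node extraction and the overlap measure O(A,B) = |A∩B| / min(|A|,|B|) can be sketched as below. The cutoff `k` for the number of top contributing nodes, the helper names, and the toy eigenvectors are assumptions for illustration. The summary does not state how many nodes per eigenvector were retained.

```python
import numpy as np

def top_contributing_nodes(eigenvector, k):
    """Indices of the k components with the largest absolute value."""
    order = np.argsort(np.abs(eigenvector))[::-1]
    return set(order[:k].tolist())

def overlap(a, b):
    """O(A, B) = |A ∩ B| / min(|A|, |B|); defined here as 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Two toy localized eigenvectors concentrated on different nodes.
v1 = np.array([0.9, -0.4, 0.1, 0.05, 0.0])
v2 = np.array([0.0, 0.1, 0.7, -0.7, 0.1])

top1 = top_contributing_nodes(v1, k=2)   # nodes 0 and 1
top2 = top_contributing_nodes(v2, k=2)   # nodes 2 and 3

disjoint = overlap(top1, top2)                 # no shared nodes
partial = overlap({1, 2, 3}, {2, 3, 4, 5})     # 2 shared nodes, smaller set has 3
```

Normalizing by the smaller set means that O = 1 whenever one node set is contained in the other, regardless of how different the set sizes are.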
Biological interpretation is performed using Gene Ontology (GO) and KEGG pathway enrichment analyses. Genes belonging to the same localized eigenvector tend to share functional annotations: low‑energy groups are enriched for developmental and cell‑division processes, middle‑energy groups for metabolic functions, and high‑energy groups for immune and stress‑response pathways. This functional coherence supports the hypothesis that eigenvector localization captures biologically meaningful modules that are not apparent from simple connectivity metrics.
Overall, the paper demonstrates that combining complex‑network construction with RMT provides a powerful two‑tiered spectral analysis: the bulk spectrum reflects universal random‑matrix statistics, while deviations in the form of localized eigenvectors uncover system‑specific structural and functional modules. The introduced overlap measure and the classification of localized eigenvectors offer a quantitative route to identify candidate gene modules for further experimental validation, and the methodology is readily extensible to other organisms or disease‑related transcriptomic datasets.