The proteomic to biology inference, a frequently overlooked concern in the interpretation of proteomic data: A plea for functional validation
Proteomics will celebrate its 20th year in 2014. In this relatively short period of time, it has invaded most areas of biology, and its use will probably continue to spread in the future. These two decades have seen a considerable increase in the speed and sensitivity of protein identification and characterization, even from complex samples. Indeed, what was a challenge twenty years ago is now little more than a daily routine. Although not completely over, the technological challenge now gives way to another one: the best possible appraisal and exploitation of proteomic data, so that the soundest biological conclusions can be drawn from them. The point developed in this paper is that proteomic data are almost always fragmentary. Consequently, although better than an mRNA level, a protein level is often insufficient to support a valid biological conclusion, especially in a world where PTMs play such an important role. The transformation of proteomic data into biological data therefore requires an important intermediate layer of functional validation, i.e. not merely the confirmation of protein abundance changes by other methods, but a functional appraisal of the biological consequences of the protein level changes highlighted by the proteomic screens.
💡 Research Summary
Proteomics celebrated its twentieth anniversary in 2014, marking two decades of rapid advances in protein identification and quantification. While the technical challenges of detecting proteins from complex samples have largely been overcome, the authors argue that a second, often overlooked challenge remains: translating proteomic data into reliable biological conclusions. They distinguish two major hurdles. The first, “protein inference,” concerns the conversion of peptide‐level mass‑spectrometry signals into accurate protein lists. Although community guidelines (e.g., false‑discovery‑rate estimation) have improved data quality, the peptide‑to‑protein mapping problem is still not trivial. The second, “protein‑to‑biology inference,” is the focus of the paper. The authors contend that proteomic datasets are inherently fragmentary. In shotgun experiments, proteins are typically identified by a handful of peptides covering only a small, non‑uniform portion of the sequence; thus, the full‑length protein, its cleavage state, or its isoform composition often remain unknown. Even in GeLC (gel‑based) workflows, limited gel fractionation introduces a ~20 % mass‑determination error, making it difficult to discern whether a detected fragment represents the intact protein.
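The community guidelines mentioned above rely on target‑decoy false‑discovery‑rate estimation: peptide‑spectrum matches (PSMs) against a reversed "decoy" database are counted above a score cutoff to estimate how many target matches are spurious. A minimal sketch of that idea (illustrative only; the function name and data layout are assumptions, not code from the paper) might look like:

```python
# Illustrative target-decoy FDR sketch. Each PSM is a (score, is_decoy)
# pair; is_decoy is True when the spectrum matched the reversed database.

def fdr_score_cutoff(psms, max_fdr=0.01):
    """Return the lowest score cutoff at which the estimated FDR
    (decoy hits / target hits at or above the cutoff) stays <= max_fdr,
    or None if no cutoff satisfies the threshold."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)  # best first
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Running FDR estimate for "accept everything down to this score"
        if decoys / max(targets, 1) <= max_fdr:
            cutoff = score
    return cutoff
```

Real pipelines add refinements (q‑value monotonization, protein‑level grouping), but even a perfect PSM‑level FDR leaves the peptide‑to‑protein mapping problem the summary describes.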
Post‑translational modifications (PTMs) further complicate interpretation. Phosphorylation, acetylation, methylation, glycosylation, prenylation, oxidation, and many other PTMs can dramatically alter activity, localization, or interaction networks without changing overall protein abundance. Consequently, an observed increase in protein level may reflect a compensatory response to an inactivating PTM, while a constant level may hide an activating modification. The authors illustrate this with malate dehydrogenase spots on 2‑D gels: only the most acidic spot may increase, yet that spot could carry the bulk of enzymatic activity due to acetylation.
Given these limitations, the authors propose a two‑stage validation strategy. Stage 1 (“protein‑level validation”) uses orthogonal techniques—Western blotting, targeted SRM, quantitative PCR—to confirm protein identity, isoform, PTM status, and quantitative changes. This step addresses the fragmentary nature of peptide‑based identification. Stage 2 (“functional validation”) tests whether the quantified change translates into altered biological activity. Approaches include siRNA knock‑down or over‑expression, enzymatic assays, metabolite measurements, pharmacological inhibition, and pathway‑specific read‑outs. The paper emphasizes that pathway‑analysis tools, while useful for hypothesis generation, must be complemented by experimental verification of key nodes.
Several case studies underscore the argument. In Cornelia de Lange syndrome, proteomics suggested oxidative stress and c‑Myc involvement; the authors validated both hypotheses using Western blots, chromatin immunoprecipitation, and qPCR, thereby closing the experimental loop. In schizophrenia research, a pyruvate assay demonstrated that altered protein abundance did not directly explain metabolic dysfunction, highlighting the need for functional assays. The authors also discuss the “déjà vu” phenomenon where metabolic enzyme changes are sometimes dismissed as generic adaptations, yet they can be mechanistically crucial.
In conclusion, the paper asserts that proteomic data alone are insufficient for robust biological inference. Systematic, independent validation—first confirming protein identity and quantity, then demonstrating functional impact—is essential to bridge the gap between mass‑spectrometry outputs and cellular physiology. The authors call for the proteomics community to adopt this validation mindset as the next milestone after securing data quality, ensuring that discoveries are both technically sound and biologically meaningful.