A Preliminary Work on Evolutionary Identification of Protein Variants and New Proteins on Grids


📝 Abstract

Protein identification is one of the major tasks of proteomics researchers. It can be summarized as searching for the best match between an experimental mass spectrum and the proteins in a database. However, this approach cannot identify new proteins or protein variants. This paper proposes an evolutionary approach that discovers new proteins or protein variants through a “de novo sequencing” method. The approach has been tested on a grid called Grid5000 with both simulated and real spectra.


📄 Content

arXiv:0804.1202v1 [q-bio.BM] 8 Apr 2008

Jean-Charles Boisson, Laetitia Jourdan and El-Ghazali Talbi
LIFL/INRIA Futurs-Université de Lille1, Bât M3-Cité Scientifique
{boisson,jourdan,talbi}@lifl.fr

Christian Rolando
Plateforme de Protéomique / Centre Commun de Spectrométrie de masse
59655 Villeneuve d’Ascq Cedex, FRANCE
Christian.Rolando@univ-lille1.fr

1. Introduction

Proteomics can be defined as the global analysis of proteins. Protein identification is one of the major tasks of proteomics researchers, as it helps to understand the biological mechanisms in living cells. All current methods use data from mass spectrometers and generally give good results. In the case of protein variants or new proteins, however, these methods can only recognize a protein that is stored in a database, and they cannot clearly explain why this protein differs from every other protein in the database. The aim of our approach is to find the entire sequence of a protein, even for variants or unknown proteins. To do so, we need to identify the different peptides that compose the protein. First, their masses (their chemical formulas) have to be found from an MS spectrum; second, from their masses, their sequences can be found with MS/MS spectra. Once the peptides are known, the complete protein can be obtained.

This article is organized as follows. Section 2 deals with the specificities of the protein variant and new protein identification problems; Section 3 describes our approach and the different algorithms that compose it; Section 4 introduces the parallel framework; Section 5 presents and discusses our results; finally, conclusions and perspectives on this work are provided.
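
The two-step scheme described in the introduction (mass first, then sequence) can be illustrated with a minimal sketch that computes the monoisotopic mass of a peptide from its sequence. The residue masses are standard monoisotopic values; the function names `peptide_mass` and `mz` are hypothetical and do not come from the paper. The sketch also shows why a mass alone does not determine a sequence: permuted sequences, or sequences that swap leucine and isoleucine, have the same mass.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water molecule is added for the free N- and C-termini
PROTON = 1.00728   # mass of a proton, used for charged ions


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int = 1) -> float:
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    for pep in ("GA", "AG", "GLK", "GIK"):
        print(pep, round(peptide_mass(pep), 4), round(mz(pep), 4))
```

Because "GA" and "AG" (or "GLK" and "GIK") give identical masses, an MS spectrum constrains the composition of a peptide but not the order of its residues, which is why the sequence has to be recovered from MS/MS data.
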

2. The Positioning of the Protein Variants and New Proteins Identification Problem

The identification of new proteins and protein variants is a complex problem. All existing protein identification methods are based on two types of data, MS and MS/MS spectra (MS for mass spectrometry), which are mass/intensity spectra. An MS spectrum is obtained by extracting an experimental protein from a protein mix, digesting it with a specific enzyme and analyzing it in a mass spectrometer. From an MS spectrum, databases make it possible to identify all the peptides by their masses. Techniques that use MS spectra for protein identification are called peptide mass fingerprint (PMF) methods. Their scoring is based on the comparison of an experimental peptide mass list with a theoretical peptide mass list [5, 11]. They give good results, but they only find the protein closest to the experimental one, without further information.

One way to overcome the limitations of MS data is to also use MS/MS data (tandem mass spectrometry). Each peptide from the MS spectrum is selected and fragmented to obtain the corresponding MS/MS spectrum. The ions detected are characteristic of the structure of the parent peptide, so it is theoretically possible to obtain the sequence of each peptide from the digested protein. Combining MS data (masses of the peptides) with MS/MS data (partial sequences of the peptides) increases the accuracy of the PMF techniques [1, 9]. These scores use several properties of the ions obtained from MS/MS spectra in order to find amino acid sequences. With partial amino acid sequences and masses, proteins can be distinguished more easily than with masses alone. However, this is not sufficient to identify unknown proteins.

An alternative method named de novo sequencing, which uses tandem mass spectrometry, has been proposed. It works on random protein sequences in order to find the experimental one (without databases). In this case the identification is based on random peptides or on peptides resulting from an earlier identification (made by specific tools) [3, 4, 10, 13]. However, the MS/MS data are so fragmented (the deduced sequences are limited) and the number of theoretical proteins that can be generated is so large that this kind of technique is only used on small amounts of data; this is referred to as de novo peptide sequencing. Furthermore, alignment tools such as Blast are necessary to find the closest peptide corresponding to the resulting sequence and to validate it.

Evolutionary approaches have already been used as optimization methods against the huge search space of the de novo peptide sequencing problem
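
The peptide mass fingerprint comparison described in Section 2 can be sketched as follows. This is a simplified illustration, not the scoring used in [5, 11]: it assumes a trypsin-like cleavage rule (cut after K or R unless the next residue is P), whereas the text only says "a specific enzyme", and it scores a candidate protein by counting the experimental masses that match a theoretical peptide mass within a tolerance. The names `digest`, `peptide_mass` and `pmf_score` and the tolerance value are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


def digest(protein):
    """In silico digestion; ASSUMED rule: cut after K/R, but not before P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def pmf_score(experimental_masses, protein, tol=0.01):
    """Count experimental masses matched by a theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in digest(protein)]
    return sum(
        any(abs(m - t) <= tol for t in theoretical)
        for m in experimental_masses
    )


if __name__ == "__main__":
    measured_protein = "GAKPLRGGKVV"   # toy sequence, not real data
    database_variant = "GAKPLRGAKVV"   # same protein with one substitution
    masses = [peptide_mass(p) for p in digest(measured_protein)]
    print(digest(measured_protein))
    print(pmf_score(masses, measured_protein))
    print(pmf_score(masses, database_variant))
```

In this toy case the database variant still matches most of the experimental masses, so a PMF search would report it as the closest protein, but the score alone does not say which peptide differs or how. That is the limitation the section attributes to database-driven methods.
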

