Title: OrfMapper: A Web-Based Application for Visualizing Gene Clusters on Metabolic Pathway Maps
ArXiv ID: 0706.3477
Date: 2007-06-26
Authors: Researchers from original ArXiv paper
📝 Abstract
Computational analyses of, e.g., genomic, proteomic, or metabolomic data commonly result in one or more sets of candidate genes, proteins, or enzymes. These sets are often the outcome of clustering algorithms. Subsequently, it has to be tested whether, e.g., the candidate gene products are members of known metabolic processes. With OrfMapper we provide a powerful but easy-to-use web-based database application that supports such analyses. All services provided by OrfMapper are freely available at http://www.orfmapper.com.
📄 Full Content
The amount of sequence-related data has increased dramatically in recent years. This is due to improvements in high-throughput and computational methods in the *omics fields, which often yield long lists of gene, protein, or enzyme identifiers (IDs). In our laboratory we process different kinds of sequence-based data, e.g., gene-expression data derived from DNA microarrays. The ultimate purpose of any gene-expression experiment is to produce biological knowledge. Regardless of the methods used, the result of a microarray experiment is, in most cases, a set of genes found to be differentially expressed between two or more conditions under study. The challenge faced by the researcher is to translate this list of differentially regulated genes into a better understanding of the biological phenomena that generate such changes. A good first step in that direction is translating the list of sequence IDs into a functional profile. Biological pathways can provide key information about the organization of biological systems. Major publicly available pathway diagram resources, including the Kyoto Encyclopedia of Genes and Genomes (KEGG) [1], GenMAPP [2], and BioCarta, can be used to map sequence data onto pathway maps. This manuscript is not intended as a review of existing solutions; instead, it focuses on our approach.
Our project requires analyzing lists of sequence clusters and extending the analysis to the largest possible number of organisms. KEGG currently provides adapted maps for over 380 species, covering the following molecular interaction and reaction networks: metabolism, genetic information processing, environmental information processing, cellular processes, and human diseases.
To map genes to KEGG pathways and display them on the corresponding pathway maps, we developed a web-based tool called OrfMapper. OrfMapper is an easy-to-use but powerful application that supports data analysis by extracting annotations for given keywords and gene, protein, or enzyme IDs, allocating these IDs to metabolic pathways, and displaying them on pathway maps. Two color codes can be assigned to the IDs; they can represent, e.g., sequence properties, organism identifiers, or cluster memberships, and are used in the query output. The query results are displayed as a hypertext web page, offered for download as tab-delimited raw text, and visualized on colored, hyperlinked KEGG metabolic pathway maps that can be downloaded in PDF format. Together with a version optimized for personal digital assistants (PDAs), OrfMapper provides unique functionality for accessing and displaying KEGG pathway data.
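The allocation step and the tab-delimited export described above can be sketched in Python. This is a minimal illustration under stated assumptions, not OrfMapper's PHP implementation: the ID-to-pathway lookup table, the function names `group_by_pathway` and `to_tsv`, and the column layout are hypothetical, and only one of the two color codes is modeled.

```python
import csv
import io
from collections import defaultdict


def group_by_pathway(query, id_to_pathways):
    """Group (ID, color) pairs by the pathways each ID belongs to.

    query: list of (identifier, color) tuples; color may be None.
    id_to_pathways: hypothetical dict mapping identifier -> list of pathway names.
    IDs without any pathway entry are collected under the key None.
    """
    groups = defaultdict(list)
    for ident, color in query:
        pathways = id_to_pathways.get(ident)
        if not pathways:
            groups[None].append((ident, color))
            continue
        # An ID can belong to several pathways, so it may appear more than once.
        for pathway in pathways:
            groups[pathway].append((ident, color))
    return dict(groups)


def to_tsv(groups):
    """Render the grouping as tab-delimited text with columns pathway, id, color."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["pathway", "id", "color"])
    # Sort pathway names alphabetically and list unmapped IDs last.
    for pathway in sorted(groups, key=lambda p: (p is None, p or "")):
        for ident, color in groups[pathway]:
            writer.writerow([pathway or "unmapped", ident, color or ""])
    return buf.getvalue()
```

For example, with `{"geneA": ["pathway_1", "pathway_2"], "geneB": ["pathway_1"]}` as the lookup, the query `[("geneA", "red"), ("geneB", None), ("geneC", "blue")]` yields one row per ID and pathway, and `geneC` ends up in a trailing `unmapped` row.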
OrfMapper has been developed entirely in PHP (version 4.3.4).
The database behind OrfMapper contains gene identifiers together with their annotation, organism, and pathway information. The database is updated monthly; for this purpose, information from the KEGG FTP server and from the KEGG web site is parsed. To keep OrfMapper running and to avoid user query errors during updates, duplicated tables are used: upon successful download and processing, the updated tables are activated and the outdated tables are deactivated.
OrfMapper was designed for the prompt display of metabolic relations between gene products on KEGG pathway maps. A detailed online help guides beginners through the user interface. The user has to specify annotation keywords (e.g., “hydrogenase protein” or CoxA), gene IDs (e.g., KEGG, NCBI, or UniProt IDs), or enzyme IDs (i.e., EC numbers). The input can be uploaded as an ASCII text file, exported from spreadsheet applications (e.g., Microsoft Excel or OpenOffice Calc), or pasted directly into a text area on the web page.
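Of the three input kinds, only enzyme IDs have a fixed syntax, so they can be told apart from gene IDs and keywords by pattern alone. The sketch below is a hypothetical helper, not OrfMapper's documented behavior; the pattern accepts four dot-separated fields, allows "-" in the trailing fields for partially classified enzymes (e.g., 1.1.1.-), and tolerates an optional "EC" prefix.

```python
import re

# Hypothetical EC-number recognizer: four dot-separated fields, the last three
# may be "-"; an optional "EC", "EC:" or "EC " prefix is accepted.
EC_PATTERN = re.compile(
    r"^(?:EC[:\s]?)?\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-)$", re.IGNORECASE
)


def classify(token):
    """Return 'ec' for EC-number-like tokens, otherwise 'id_or_keyword'."""
    return "ec" if EC_PATTERN.match(token.strip()) else "id_or_keyword"
```

Gene IDs and keywords such as `CoxA` cannot be separated this reliably, so the sketch deliberately leaves them in one class.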
OrfMapper is designed to handle individual input data formats as flexibly as possible. The IDs can be listed vertically, horizontally, or in a mix of both, and can be separated by all typical text delimiters, e.g., tabs, spaces, commas, and semicolons. Placing keywords in quotation marks forces OrfMapper to perform a boolean AND query.
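A tokenizer with this behavior can be sketched with a single regular expression. It is a hypothetical stand-in for OrfMapper's parser: the name `parse_query`, the `(terms, mode)` return shape, and the acceptance of both straight and typographic quotation marks are assumptions, and color suffixes are not handled here.

```python
import re

# A quoted phrase (straight or typographic quotes) or a run of characters
# containing no whitespace, comma, semicolon, or quote character.
TOKEN = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d|([^\s,;"\u201c\u201d]+)')


def parse_query(text):
    """Split free-form input into search terms.

    Returns a list of (terms, mode) tuples: an unquoted token becomes
    ([token], "single"); a quoted phrase becomes (words, "and"), meaning
    all of its words must occur in the same annotation.
    """
    result = []
    for m in TOKEN.finditer(text):
        quoted = m.group(1) if m.group(1) is not None else m.group(2)
        if quoted is not None:
            words = quoted.split()
            if words:  # ignore empty quotes
                result.append((words, "and"))
        else:
            result.append(([m.group(3)], "single"))
    return result
```

Because `\s` matches tabs and newlines as well as spaces, vertical, horizontal, and mixed layouts all reduce to the same token stream.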
By default, all organisms are queried for all entered IDs and keywords. To restrict the output to selected organisms, those organisms can be specified in the first input row. This line must begin with the angle bracket character “»”, followed by organism names or just parts of them (e.g., “droso” instead of “Drosophila melanogaster”), separated by commas. If no match to an organism name is found, all organisms are queried.
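The organism filter can be sketched as two small functions. The function names, the case-insensitive substring match, and the reading that the fallback applies when none of the given fragments matches any organism are assumptions; the marker character is copied from the text as printed.

```python
def split_organism_line(text, marker="\u00bb"):
    """Return (organism_fragments, remaining_text).

    If the first line starts with the marker, its comma-separated entries are
    treated as organism name fragments and the line is removed from the input.
    """
    lines = text.splitlines()
    if lines and lines[0].lstrip().startswith(marker):
        header = lines[0].lstrip()[len(marker):]
        fragments = [f.strip() for f in header.split(",") if f.strip()]
        return fragments, "\n".join(lines[1:])
    return [], text


def select_organisms(fragments, known):
    """Keep organisms whose name contains any fragment (case-insensitive).

    Falls back to all known organisms when nothing matches, including the
    case where no organism line was given.
    """
    selected = [name for name in known
                if any(f.lower() in name.lower() for f in fragments)]
    return selected or list(known)
```

For example, a first line `» droso, coli` restricts a query over Drosophila melanogaster, Escherichia coli, and Homo sapiens to the first two, while an unmatched fragment such as `zebra` leaves all three in place.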
To customize the visualization, the user may specify colors for individual IDs. To do so, either a color name (e.g., yellow, blue, red) or a hexadecimal RGB code (e.g., #FFFF00) can be appended to IDs and keywords with two underscore characters “__” (e.g., genename__blue, genename__#000080, keyword1__red, “keyword1 keyword2”__green). This colors the enzyme box corresponding to the ID on a KEGG pathway map. Likewise, the user