Quality assessment for short oligonucleotide microarray data

📝 Original Info

  • Title: Quality assessment for short oligonucleotide microarray data
  • ArXiv ID: 0710.0178
  • Date: 2011-11-10
  • Authors: Julia Brettschneider, François Collin, Benjamin M. Bolstad, Terence P. Speed

📝 Abstract

Quality of microarray gene expression data has emerged as a new research topic. As in other areas, microarray quality is assessed by comparing suitable numerical summaries across microarrays, so that outliers and trends can be visualized, and poor quality arrays or variable quality sets of arrays can be identified. Since each single array comprises tens or hundreds of thousands of measurements, the challenge is to find numerical summaries which can be used to make accurate quality calls. To this end, several new quality measures are introduced based on probe level and probeset level information, all obtained as a by-product of the low-level analysis algorithms RMA/fitPLM for Affymetrix GeneChips. Quality landscapes spatially localize chip or hybridization problems. Numerical chip quality measures are derived from the distributions of Normalized Unscaled Standard Errors and of Relative Log Expressions. Quality of chip batches is assessed by Residual Scale Factors. These quality assessment measures are demonstrated on a variety of datasets (spike-in experiments, small lab experiments, multi-site studies). They are compared with Affymetrix's individual chip quality report.
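
As a rough illustration of one measure named in the abstract, the sketch below computes Relative Log Expression (RLE) values in the way RLE is commonly defined: each probeset's log expression on an array minus that probeset's median across all arrays. The function names, the toy values, and the choice of median and interquartile range as per-array summaries are assumptions made for illustration; this is not the paper's RMA/fitPLM implementation.

```python
from statistics import median, quantiles

def relative_log_expression(log_expr):
    """Subtract each probeset's across-array median from its values.

    log_expr: dict mapping a probeset id to a list of log2 expression
    values, one per array (hypothetical input layout).
    """
    rle = {}
    for probeset, values in log_expr.items():
        reference = median(values)  # "typical" level of this probeset
        rle[probeset] = [v - reference for v in values]
    return rle

def summarize_array(rle, array_index):
    """Return (median, IQR) of the RLE values on one array."""
    values = [row[array_index] for row in rle.values()]
    q1, _, q3 = quantiles(values, n=4)
    return median(values), q3 - q1

# Toy data (invented): 4 probesets on 3 arrays; the third array reads high.
toy = {
    "ps1": [7.0, 7.1, 8.0],
    "ps2": [9.0, 8.9, 10.1],
    "ps3": [5.0, 5.2, 6.0],
    "ps4": [11.0, 11.1, 12.2],
}
toy_rle = relative_log_expression(toy)
for i in range(3):
    med, iqr = summarize_array(toy_rle, i)
    print(f"array {i}: RLE median = {med:.2f}, IQR = {iqr:.2f}")
```

On an array of typical quality, most probesets are expected to sit near their across-array median, so an RLE distribution centered away from zero or with a wide spread is a hint that the array deserves a closer look.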

📄 Full Content

To be published in Technometrics (with Discussion)

Quality assessment for short oligonucleotide microarray data

Julia Brettschneider (a, c, *), François Collin (b), Benjamin M. Bolstad (b), Terence P. Speed (b, d)

(a) University of Warwick, Department of Statistics, Coventry, UK; (b) University of California at Berkeley, Department of Statistics, Berkeley, California, USA; (c) Queen’s University, Cancer Research Institute Division of Cancer Care & Epidemiology and Department of Community Health & Epidemiology, Kingston, Ontario, Canada; (d) Walter and Eliza Hall Institute Bioinformatics Division, Melbourne, Australia. (*) Corresponding author: julia.brettschneider@warwick.ac.uk

arXiv:0710.0178v2 [stat.ME] 16 Nov 2007

KEYWORDS: quality control, microarrays, Affymetrix chips, relative log expression, normalized unscaled standard errors, residual scale factors.

1. INTRODUCTION

With the introduction of microarrays, biologists have been witnessing entire labs shrinking to matchbox size. This paper invites quality researchers to join scientists on their fantastic journey into the world of microscopic high-throughput measurement technologies.

Building a biological organism as laid out by the genetic code is a multi-step process with room for variation at each step. The first steps, as described by the central dogma of molecular biology, are genes (and DNA sequence in general), their transcripts and proteins. Substantial factors contributing to their variation in both structure and abundance include cell type, developmental stage, genetic background and environmental conditions. Connecting molecular observations to the state of an organism is a central interest in molecular biology. This includes the study of gene and protein functions and interactions, and their alteration in response to changes in environmental and developmental conditions.

Traditional methods in molecular biology generally work on a “one gene (or protein) in one experiment” basis. With the invention of microarrays, huge numbers of such macromolecules can now be monitored in one experiment. The most common kinds are gene expression microarrays, which measure the mRNA transcript abundance for tens of thousands of genes simultaneously.

For biologists, this high-throughput approach has opened up entirely new avenues of research. Rather than experimentally confirming the hypothesized role of a certain candidate gene in a certain cellular process, they can use genome-wide comparisons to screen for all genes which might be involved in that process. One of the first examples of such an exploratory approach is the expression profiling study of mitotic yeast cells by Cho et al. (1998), which determined a set of a few hundred genes involved in the cell cycle and triggered a cascade of articles re-analyzing the data or replicating the experiment. Microarrays have become a central tool in cancer research, a development initiated by the discovery and re-definition of tumor subtypes based on molecular signatures (see e.g. Perou et al. (2000), Alizadeh et al. (2000), Ramaswamy and Golub (2002), Yeoh et al. (2002)). In Section 2 we will explain different kinds of microarray technologies in more detail and describe their current applications in life sciences research.

A DNA microarray consists of a glass surface with a large number of distinct fragments of DNA, called probes, attached to it at fixed positions. A fluorescently labelled sample containing a mixture of unknown quantities of DNA molecules, called the target, is applied to the microarray. Under the right chemical conditions, single-stranded fragments of target DNA will base pair with the probes which are their complements, with great specificity. This reaction is called hybridization, and is the reason DNA microarrays work. The fixed probes are either fragments of DNA called complementary DNA (cDNA) obtain

…(Full text truncated)…
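
The hybridization step described in the introduction, where single-stranded target fragments pair with probes that are their complements, can be pictured with a deliberately idealized sketch. It treats binding as an exact Watson-Crick match over the full probe length; real hybridization is a chemical process that depends on conditions and can tolerate mismatches. The sequences and function names here are invented for illustration.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the strand that pairs with seq, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def hybridizes(probe, target_fragment):
    """Idealized rule (an assumption): the probe binds only if the target
    contains the probe's exact reverse complement."""
    return reverse_complement(probe) in target_fragment

probe = "ATGCGT"                  # hypothetical short probe
matching_target = "GGACGCATTT"    # contains ACGCAT, the probe's complement
mismatched_target = "GGACGTATTT"  # one base differs inside the binding site

print(reverse_complement(probe))
print(hybridizes(probe, matching_target))
print(hybridizes(probe, mismatched_target))
```

The all-or-nothing rule is the simplest choice that still shows why a probe's sequence determines which target molecules it reports on.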
