Power-law Signatures and Patchiness in Genechip Oligonucleotide Microarrays


📝 Abstract

Genechip oligonucleotide microarrays have been used widely for transcriptional profiling of a large number of genes in a given paradigm. Gene expression estimation precedes biological inference and is given as a complex combination of atomic entities on the array called probes. These probes are further classified into perfect-match (PM) and mismatch (MM) probes. While the former is a measure of specific binding, the latter is a measure of non-specific binding. The behavior of the MM probes has proven especially elusive. The present study investigates qualitative similarities in the distributional signatures and local correlation structures (patchiness) between the PM and MM probe intensities. These qualitative similarities are established on publicly available microarrays generated across laboratories investigating the same paradigm. Persistence of these similarities across raw as well as background-subtracted probe intensities is also investigated. The results presented raise fundamental concerns in interpreting Genechip oligonucleotide microarray data.

📄 Content

Oligonucleotide Genechip microarrays [1,35,36] have been used widely for transcriptional profiling of a large number of genes across distinct biological paradigms, including (i) stem cell differentiation [27,47], (ii) molecular portraits and heterogeneity in tumors [43,50], (iii) aging and neurobiology [13], and (iv) infectious disease research and environmental applications [31]. The prevalence of such high-throughput assays can especially be attributed to the rapid sequencing of genomes [11]. A recent multi-laboratory, multi-platform study [26] established the superiority of oligonucleotide microarrays from accuracy and precision standpoints. Unlike classical biological approaches, microarrays can be used to model functional relationships between genes and hence provide a system-level understanding [30] of the paradigm [14,59]. There is also the possibility of oligonucleotide arrays being used as active screening tools in clinical settings in the near future [21].

Developing suitable computational techniques for meaningful interpretation of oligonucleotide gene expression data is one of the major challenges and precedes biological inference. Gene expression is estimated as a complex combination of atomic entities on the array called probes [45]. While several algorithms have been proposed for gene expression estimation and subsequent higher-level analysis [2,3,24-26,34,46,48], understanding of the qualitative behavior at the probe level is still incomplete. Probes are broadly classified into perfect-match (PM) and mismatch (MM) probes. The former is a measure of specific binding, whereas the latter is a measure of non-specific binding and is used as an internal control (Sect. 1.1) [1,35,36]. While PM and MM probes are biologically distinct by design, they are spatially proximal on the array. Some estimation techniques use perfect-match as well as mismatch probes [2,3,34], whereas others have encouraged using the perfect-match probes only [24,25]. The latter choice was possibly inspired by [38], which pointed out that arithmetic subtraction of (PM, MM) probe intensities may not translate into biological subtraction. The qualitative behavior of the MM probes has proven especially elusive.

The objective of the present study is to investigate qualitative similarities in the distributional signatures and local correlation structure across the perfect-match and mismatch probe intensities. Qualitative similarities are demonstrated on the raw as well as background-subtracted (PM, MM) probe intensities in publicly available Genechip arrays generated across laboratories investigating the same biological paradigm [26]. To our knowledge, these qualitative similarities have never been reported. They raise fundamental concerns in interpreting oligonucleotide gene expression data and in higher-level analyses such as (a) gene expression estimation and normalization [2,3,6,24,25,34,46,48,58], (b) inferring functional relationships and network structure [14,59], (c) ontology [5], and (d) expression quantitative trait loci (eQTL) [28]. The present study is especially encouraged by (i) our recent research on various aspects of microarray gene expression analysis [39,40] and by growing evidence of (ii) hybridization interactions/multiple targeting of the probes [42,57,60], (iii) spatial artifacts [52], and (iv) redefinition of the probe-transcript relationship [16,33] in oligonucleotide Genechip arrays.
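To make the notion of local correlation structure (patchiness) concrete, the following is a minimal sketch, not the method used in this study. It swaps in a simple nearest-neighbour Pearson correlation on a matrix, and every name in it (`neighbor_correlation`, `noise`, `patchy`) as well as the synthetic data are assumptions introduced purely for illustration.

```python
import numpy as np

def neighbor_correlation(m):
    """Pearson correlation between each cell and its right / lower neighbour.

    Illustrative stand-in for a local-correlation measure; hypothetical helper.
    """
    m = np.asarray(m, dtype=float)
    # Pair every cell with the cell immediately to its right.
    horiz = np.corrcoef(m[:, :-1].ravel(), m[:, 1:].ravel())[0, 1]
    # Pair every cell with the cell immediately below it.
    vert = np.corrcoef(m[:-1, :].ravel(), m[1:, :].ravel())[0, 1]
    return horiz, vert

rng = np.random.default_rng(1)

# Synthetic matrices only; these are NOT real probe intensities.
noise = rng.normal(size=(64, 64))  # spatially uncorrelated field
# Patchy field: each 8x8 block shares a common level, plus weak noise.
patchy = np.kron(rng.normal(size=(8, 8)), np.ones((8, 8))) + 0.3 * noise

print("uncorrelated:", neighbor_correlation(noise))
print("patchy:      ", neighbor_correlation(patchy))
```

On such synthetic input, the patchy matrix yields neighbour correlations well above those of the uncorrelated matrix, which is the qualitative contrast that a patchiness measure is meant to expose.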

The chapter is organized as follows. In Sect. 1.1, a brief introduction to Genechip oligonucleotide microarrays, along with the associated terminology, is provided. Qualitative similarities, along with power-law and exponential approximations to the PM and MM probe intensity distributions, are investigated in Sect. 2. Qualitative similarities in local correlations/patchiness across the PM and MM probe intensity matrices are investigated in Sect. 3. The choice of multiscale decomposition for accomplishing the same is also explored. The impact of the findings of the present study on gene expression estimation and subsequent higher-level analyses is discussed in Sect. 4.
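As a rough illustration of what comparing power-law and exponential approximations to a distribution can involve, the sketch below fits both forms to the tail of a sample by maximum likelihood and compares their log-likelihoods. It is an assumption-laden example rather than the procedure followed in this chapter: the function `fit_tails`, the threshold `xmin`, and the synthetic Pareto sample are all hypothetical.

```python
import numpy as np

def fit_tails(x, xmin):
    """Maximum-likelihood power-law and exponential fits to values >= xmin.

    Hypothetical helper for illustration; xmin is chosen by hand here.
    """
    tail = np.asarray(x, dtype=float)
    tail = tail[tail >= xmin]
    n = tail.size

    # Continuous power law: p(x) = ((alpha - 1) / xmin) * (x / xmin) ** (-alpha)
    log_ratio = np.log(tail / xmin)
    alpha = 1.0 + n / log_ratio.sum()
    ll_power = n * np.log((alpha - 1.0) / xmin) - alpha * log_ratio.sum()

    # Shifted exponential: p(x) = lam * exp(-lam * (x - xmin))
    excess = tail - xmin
    lam = 1.0 / excess.mean()
    ll_exp = n * np.log(lam) - lam * excess.sum()

    return {"n": n, "alpha": alpha, "lambda": lam,
            "ll_power": ll_power, "ll_exp": ll_exp}

rng = np.random.default_rng(0)
xmin = 100.0
# Synthetic stand-in for intensities: classical Pareto sample whose
# density exponent is 2.5 (NumPy shape parameter 1.5 plus one).
synthetic = xmin * (1.0 + rng.pareto(1.5, size=20000))

result = fit_tails(synthetic, xmin)
print(result)
```

A higher log-likelihood for one form only indicates that it describes this particular tail better than the other; it does not by itself establish that the data follow that form.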

Oligonucleotide Genechip microarrays [1,35,36] comprise a large number of atomic entities called probes [45] arranged as a rectangular matrix. Each probe is an oligomer around 25 nucleotides long (e.g. 5’-GTGATCGTTTACTTCGGTGCCACCT-3’). A set of roughly 16 to 20 probes, also called a probeset, represents a particular transcript on the array. The term transcript is generic and can represent either a gene or an expressed sequence tag (EST). Probes can be broadly classified into perfect-match (PM) and mismatch (MM) probes. PM probes correspond to a short region of the transcript and are designed to be complementary to the target sequence [1,35,36], and hence are ideally a measure of specific binding. The nucleotide content of an MM probe is the same as that of the corr

This content is AI-processed based on ArXiv data.
