Modelling background intensity in Affymetrix Genechips


📝 Original Info

  • Title: Modelling background intensity in Affymetrix Genechips
  • ArXiv ID: 0712.3494
  • Date: 2008-06-30
  • Authors: ** Author information stated in the paper was not provided. (Check the original text if needed.) **

📝 Abstract

DNA microarrays are devices that are able, in principle, to detect and quantify the presence of specific nucleic acid sequences in complex biological mixtures. The measurement consists in detecting fluorescence signals from several spots on the microarray surface onto which different probe sequences are grafted. One of the problems of the data analysis is that the signal contains a noisy background component due to non-specific binding. This paper presents a physical model for background estimation in Affymetrix Genechips. It combines two different approaches. The first is based on the sequence composition, specifically its sequence-dependent hybridization affinity. The second is based on the strong correlation of intensities from locations which are the physical neighbors of a specific spot on the chip. Both effects are incorporated in a background functional which contains 24 free parameters, fixed by minimization on a training data set. In all data analyzed, the sequence-specific parameters obtained by minimization are found to correlate strongly with empirically determined stacking free energies for RNA/DNA hybridization in solution. Moreover, there is an overall agreement with experimental background data, and we show that the physics-based model proposed in this paper performs on average better than purely statistical approaches for background calculations. The model thus provides an interesting alternative method for background subtraction schemes in Affymetrix Genechips.

📄 Full Content

DNA microarrays have become a powerful tool to monitor the gene expression level of thousands of genes simultaneously on a genome-wide scale (for a recent review see for instance Ref. [1]). Microarrays are based on the hybridization between the surface-bound DNA sequences (called probes) and DNA or RNA sequences in solution (called targets). The probes are designed to have a sequence exactly complementary to that of the desired target sequence one wishes to detect in solution.

As the target molecules in solution are labelled with fluorescent markers, the amount of hybridized targets can be determined by means of optical measurements. The fluorescence intensity measured at a specific spot on the microarray reflects the concentration of complementary targets in the sample solution.

One of the most prominent commercial platforms of DNA microarrays is provided by Affymetrix [2]. By virtue of in-situ photolithographic techniques, Affymetrix produces arrays in which more than one million different probes are grafted on a single chip. The probes are 25-nucleotide-long sequences of single-stranded DNA. As a single 25-mer may not provide reliable measurements of the expression level of one specific gene, Affymetrix chooses 10-16 fragments from different regions of each gene, which together form a so-called probe set. Each probe set is meant to uniquely characterize a given gene.

One of the problems of the data analysis is that the measured fluorescence signal reflects not only the concentration of a specific gene in solution, but also hybridization with other fragments which only partially overlap with the surface-bound sequence. Thus, the measured fluorescence of a given probe site can be written as

$$I = I_{sp}(c) + I_0 \qquad (1)$$

where $I_{sp}(c)$ is the specific contribution to the signal, which depends on the concentration $c$ of the complementary target in solution, and $I_0$ is a background signal. The aim of this work is to introduce a new model, based upon inputs from physical chemistry, for the calculation of $I_0$ for Affymetrix arrays. Identifying the main sources of background intensity is crucial for making accurate and reliable estimates of gene expression levels, particularly for weakly expressed genes, for which $I_{sp}(c) \approx I_0$.

A peculiarity of Affymetrix Genechips is that probes come in pairs: a probe, the so-called perfect match (PM), has a sequence exactly complementary to the sequence in solution. A second probe, physically located next to the PM on the chip, has a single non-complementary base with respect to the specific target. The latter is known as the mismatch (MM). Originally, MMs were supposed to estimate only the non-specific hybridization, i.e. it was expected that $I_{MM} \approx I_0$, so that from Eq. (1) one could have estimated $I_{sp}(c) = I_{PM} - I_{MM}$. However, this approach runs into difficulties, as in some chips as many as 30% of the MM intensities are higher than the corresponding PM intensities [3] (although this seems to occur predominantly in low-intensity regimes, where both PM and MM signals may be dominated by non-specific hybridization [4]). Moreover, it has been found that $I_{MM}$ also depends on the concentration in solution of the almost complementary target sequence. Hence the background adjustment based on the difference $I_{PM} - I_{MM}$ does not currently enjoy broad consensus, and other strategies have been devised [5]. For a discussion of MM hybridization see also Refs. [6,7,8].
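The PM-MM subtraction described above, and the reason it runs into trouble, can be illustrated with a minimal Python sketch. The function names and the intensity values are hypothetical and chosen only for illustration; they are not taken from the paper or from any chip data.

```python
def pm_minus_mm(pm, mm):
    """Naive estimate of the specific signal, I_sp ~ I_PM - I_MM, per probe pair."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM lists must have the same length")
    return [p - m for p, m in zip(pm, mm)]


def fraction_mm_above_pm(pm, mm):
    """Fraction of probe pairs in which the MM intensity exceeds the PM intensity."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM lists must have the same length")
    if not pm:
        raise ValueError("need at least one probe pair")
    return sum(1 for p, m in zip(pm, mm) if m > p) / len(pm)


# Hypothetical intensities (arbitrary units), invented for this example.
pm = [1200.0, 310.0, 95.0, 88.0, 150.0]
mm = [400.0, 120.0, 110.0, 90.0, 60.0]

estimates = pm_minus_mm(pm, mm)       # negative entries wherever MM > PM
frac = fraction_mm_above_pm(pm, mm)   # share of pairs with MM > PM
print(estimates, frac)
```

Wherever the MM intensity exceeds the PM intensity, the naive estimate becomes negative, which cannot be interpreted as a concentration-dependent specific signal. This is the difficulty that motivates estimating $I_0$ by other means.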

Due to its central importance, the modeling of background intensities is not new. One can distinguish here between models using a purely statistical treatment [9,10,11,12] and others employing physical inputs from equilibrium thermodynamics [13,14,15,16,17]. A more extensive discussion of previous studies in relation to our results is postponed to the final section of this paper.

In this paper we present a new method to estimate the background noise of Affymetrix gene expression arrays. We construct a functional which contains 24 parameters, fixed by minimization on a set of training data. The functional accounts for the physical chemistry of hybridization through a subset of the 24 parameters, which depend on sequence composition and are equivalent to the stacking free energies in the nearest-neighbor model [18]. We also exploit the observation that the background signal of a given site strongly correlates with the intensities measured on neighboring sites. The accuracy of the results is tested on a set of spike-in data in which transcripts are added in solution at known concentrations. In particular, being interested in the accuracy of our background predictions, we focus on the data at zero concentration. The model developed in this paper reproduces the spike-in data very well, and in this particular case it performs better than other popular algorithms used for background adjustment in Affymetrix expression chips.
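To make the nearest-neighbor idea concrete, the following Python sketch sums one parameter per adjacent dinucleotide along a probe sequence. It is only an illustration of how stacking-type parameters enter a sequence-dependent quantity, not the paper's background functional: the function name, the placeholder parameter values, and the example probe are all assumptions, and how such a sum is combined with the neighboring-site intensities is not specified in this excerpt.

```python
from itertools import product

BASES = "ACGT"
# The 16 possible dinucleotides, each of which would carry one stacking parameter.
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def nearest_neighbor_sum(sequence, stacking):
    """Sum the stacking parameters over all adjacent base pairs of a sequence."""
    seq = sequence.upper()
    if any(base not in BASES for base in seq):
        raise ValueError("sequence must contain only A, C, G, T")
    return sum(stacking[seq[i:i + 2]] for i in range(len(seq) - 1))


# Placeholder parameters (NOT fitted or measured values): 1.0 for every
# dinucleotide except "GC", which is set to 2.0 to make the effect visible.
toy_stacking = {d: 1.0 for d in DINUCLEOTIDES}
toy_stacking["GC"] = 2.0

# Hypothetical 25-mer probe, invented for this example.
probe = "ACGTGCATGCATGCATGCATGCATG"
n_pairs = len(probe) - 1                        # a 25-mer has 24 adjacent pairs
total = nearest_neighbor_sum(probe, toy_stacking)
print(n_pairs, total)
```

In a fitting procedure of the kind described above, the dictionary values would be the free parameters adjusted by minimization on training data, rather than fixed by hand as they are here.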

This paper is organized as follows: the background

Reference

This content is AI-processed based on open access ArXiv data.
