Reconstructing Spatiotemporal Gene Expression Data from Partial Observations


📝 Original Info

  • Title: Reconstructing Spatiotemporal Gene Expression Data from Partial Observations
  • ArXiv ID: 0903.4027
  • Date: 2009-03-25
  • Authors: Dustin A. Cartwright, Siobhan M. Brady, David A. Orlando, Bernd Sturmfels, Philip N. Benfey

📝 Abstract

Developmental transcriptional networks in plants and animals operate in both space and time. To understand these transcriptional networks it is essential to obtain whole-genome expression data at high spatiotemporal resolution. Substantial amounts of spatial and temporal microarray expression data previously have been obtained for the Arabidopsis root; however, these two dimensions of data have not been integrated thoroughly. Complicating this integration is the fact that these data are heterogeneous and incomplete, with observed expression levels representing complex spatial or temporal mixtures. Given these partial observations, we present a novel method for reconstructing integrated high resolution spatiotemporal data. Our method is based on a new iterative algorithm for finding approximate roots to systems of bilinear equations.

📄 Full Content

RECONSTRUCTING SPATIOTEMPORAL GENE EXPRESSION DATA FROM PARTIAL OBSERVATIONS

DUSTIN A. CARTWRIGHT, SIOBHAN M. BRADY, DAVID A. ORLANDO, BERND STURMFELS, AND PHILIP N. BENFEY

arXiv:0903.4027v1 [q-bio.GN] 24 Mar 2009

The second and third authors contributed equally to this work.

Abstract. Developmental transcriptional networks in plants and animals operate in both space and time. To understand these transcriptional networks it is essential to obtain whole-genome expression data at high spatiotemporal resolution. Substantial amounts of spatial and temporal microarray expression data previously have been obtained for the Arabidopsis root; however, these two dimensions of data have not been integrated thoroughly. Complicating this integration is the fact that these data are heterogeneous and incomplete, with observed expression levels representing complex spatial or temporal mixtures. Given these partial observations, we present a novel method for reconstructing integrated high resolution spatiotemporal data. Our method is based on a new iterative algorithm for finding approximate roots to systems of bilinear equations.

1. Introduction

Transcriptional regulation plays an important role in orchestrating a host of biological processes, particularly during development (reviewed in [9, 13]). Advances in microarray and sequencing technologies have allowed biologists to capture genome-wide gene expression data, the output of this transcriptional regulation. This expression data can then be used to identify genes whose expression is correlated with a particular biological process, and to identify transcriptional regulators that coordinate the expression of groups of genes that are important for the same biological process.

The identification of such genes and transcriptional regulators is complicated by the complex heterogeneous mixture of cell types and developmental stages that comprise each organ of an organism. Expression patterns that are found only in a subset of cell types within an organ will be diluted and may not be detectable in the collection of expression patterns obtained from RNA isolated from samples of an entire organ. Therefore, techniques have been developed to enrich samples for specific cell types or developmental stages, especially for studies in plants [5].

In the model plant Arabidopsis thaliana, several features of the root organ reduce its developmental complexity and facilitate analysis. Specifically, most root cell types are found within concentric cylinders moving from the outside of the root to the inside of the root (Figure 1). These cell type layers display rotational symmetry, thus simplifying the spatial features of development. This feature has been exploited in the development of a cell type enrichment method. This enrichment method uses green fluorescent protein (GFP)-marked transgenic lines and fluorescence-activated cell sorting (FACS) to collect cell type-enriched samples and has allowed for the identification of cell type-specific expression patterns [1, 2]. Using this technique, high resolution expression data have been obtained for nearly all cell types in the Arabidopsis root (herein called the marker-line dataset) [4, 10].

Another feature that makes the Arabidopsis root a tractable developmental model is that cell types are constrained in files along the root’s longitudinal axis and most of these cells are produced from a stem cell population found at the apex of the root. This feature allows a cell’s developmental timeline to be represented by its position along the length of the root. To obtain a developmental time-series expression dataset, individual Arabidopsis roots were sectioned into thirteen pieces, each piece representing a developmental time point (herein called the longitudinal dataset) [4]. Each of these sections, however, contains a mixture of cell types, and the microarray expression values obtained are therefore the average of the expression levels over multiple cell types present at these specific developmental time points.

While the 19 fluorescently marked lines in Brady et al. [4] cover expression in nearly all cell types, they do not comprehensively mark all developmental stages of these cell types. Also, the procambium cell type was not measured, as a fluorescent marker-line that marks that cell type did not exist at the time. However, expression from the longitudinal dataset does contain averaged expression of all cell types, and may be used to infer the missing cell type data.

Previous studies have looked at separating expression data from the heterogeneous cell populations that make up tumors into the contributions of their constituent cell types [8, 16]. However, in that context, the difficulty comes from the fact that the mixture of cell types in each sample is unknown, whereas within our experimental context, the cell type mixture of each sample is known. Two computational methods have been developed to com

…(Full text truncated)…
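
The introduction describes two kinds of partial observations: longitudinal sections that average over the cell types present, and marker-line samples that cover a cell type across developmental stages, with the mixture in each sample known. The toy sketch below is one way to picture how such data can give rise to bilinear equations. It assumes, purely for illustration, that expression in cell type c at section t factors as x[c] * y[t]. That factorization, the mixing proportions, and every function name are hypothetical and are not taken from the truncated text. The solver is a generic coordinate-wise least-squares update, included only as an example of an iterative scheme. It is not the authors' algorithm.

```python
# Illustrative sketch only: a generic coordinate-wise least-squares update for
# a bilinear system. All names and numbers are hypothetical assumptions.

def predict(weights, x, y):
    """Observation i is modelled as sum over (c, t) of weights[i][c][t] * x[c] * y[t]."""
    return [
        sum(w[c][t] * x[c] * y[t] for c in range(len(x)) for t in range(len(y)))
        for w in weights
    ]


def squared_error(weights, observed, x, y):
    """Sum of squared residuals between predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predict(weights, x, y), observed))


def update_coordinate(weights, observed, target, k, x, y):
    """Exactly minimise the squared error over target[k], holding the rest fixed.

    `target` must be the same list object as either x or y.
    """
    saved = target[k]
    target[k] = 0.0
    rest = predict(weights, x, y)      # contribution of all other unknowns
    target[k] = 1.0
    unit = predict(weights, x, y)
    gain = [u - r for u, r in zip(unit, rest)]  # sensitivity to target[k]
    denom = sum(g * g for g in gain)
    if denom > 0.0:
        target[k] = sum(g * (o - r) for g, o, r in zip(gain, observed, rest)) / denom
    else:
        target[k] = saved              # no observation depends on this unknown


def fit(weights, observed, n_x, n_y, sweeps=200):
    """Alternate over the x unknowns and the y unknowns for a fixed number of sweeps."""
    x = [1.0] * n_x
    y = [1.0] * n_y
    for _ in range(sweeps):
        for k in range(n_x):
            update_coordinate(weights, observed, x, k, x, y)
        for k in range(n_y):
            update_coordinate(weights, observed, y, k, x, y)
    return x, y


def toy_problem():
    """Two cell types, three sections, with made-up known mixing proportions."""
    x_true = [2.0, 0.5]                # hypothetical cell-type factors
    y_true = [1.0, 3.0, 2.0]           # hypothetical section factors
    section_mix = [[0.5, 0.5], [0.7, 0.3], [0.2, 0.8]]   # cell types per section
    marker_mix = [[0.6, 0.4, 0.0], [0.0, 0.5, 0.5]]      # sections per marker line
    weights = []
    for t, mix in enumerate(section_mix):   # longitudinal-style observations
        weights.append([[mix[c] if s == t else 0.0 for s in range(3)] for c in range(2)])
    for c, mix in enumerate(marker_mix):    # marker-line-style observations
        weights.append([[mix[s] if d == c else 0.0 for s in range(3)] for d in range(2)])
    observed = predict(weights, x_true, y_true)
    return weights, observed


if __name__ == "__main__":
    w, o = toy_problem()
    x, y = fit(w, o, n_x=2, n_y=3)
    print("residual:", squared_error(w, o, x, y))
    print("reconstructed x[c] * y[t]:", [[xc * yt for yt in y] for xc in x])
```

Only the products x[c] * y[t] are meaningful in this sketch, because scaling x up and y down by the same factor leaves every equation unchanged. Each coordinate update cannot increase the squared error, but nothing here guarantees convergence to an exact root.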

Reference

This content is AI-processed based on ArXiv data.
