📝 Original Info
- Title: Reconstructing Spatiotemporal Gene Expression Data from Partial Observations
- ArXiv ID: 0903.4027
- Date: 2009-03-25
- Authors: Dustin A. Cartwright, Siobhan M. Brady, David A. Orlando, Bernd Sturmfels, Philip N. Benfey
📝 Abstract
Developmental transcriptional networks in plants and animals operate in both space and time. To understand these transcriptional networks it is essential to obtain whole-genome expression data at high spatiotemporal resolution. Substantial amounts of spatial and temporal microarray expression data previously have been obtained for the Arabidopsis root; however, these two dimensions of data have not been integrated thoroughly. Complicating this integration is the fact that these data are heterogeneous and incomplete, with observed expression levels representing complex spatial or temporal mixtures. Given these partial observations, we present a novel method for reconstructing integrated high resolution spatiotemporal data. Our method is based on a new iterative algorithm for finding approximate roots to systems of bilinear equations.
📄 Full Content
RECONSTRUCTING SPATIOTEMPORAL GENE
EXPRESSION DATA FROM PARTIAL OBSERVATIONS
DUSTIN A. CARTWRIGHT, SIOBHAN M. BRADY, DAVID A. ORLANDO,
BERND STURMFELS, AND PHILIP N. BENFEY
Abstract. Developmental transcriptional networks in plants and an-
imals operate in both space and time. To understand these transcrip-
tional networks it is essential to obtain whole-genome expression data at
high spatiotemporal resolution. Substantial amounts of spatial and tem-
poral microarray expression data previously have been obtained for the
Arabidopsis root; however, these two dimensions of data have not been
integrated thoroughly.
Complicating this integration is the fact that
these data are heterogeneous and incomplete, with observed expression
levels representing complex spatial or temporal mixtures. Given these
partial observations, we present a novel method for reconstructing in-
tegrated high resolution spatiotemporal data. Our method is based on
a new iterative algorithm for finding approximate roots to systems of
bilinear equations.
1. Introduction
Transcriptional regulation plays an important role in orchestrating a host
of biological processes, particularly during development (reviewed in [9, 13]).
Advances in microarray and sequencing technologies have allowed biologists
to capture genome-wide gene expression data, the output of this transcrip-
tional regulation. This expression data can then be used to identify genes
whose expression is correlated with a particular biological process, and to
identify transcriptional regulators that coordinate the expression of groups
of genes that are important for the same biological process.
The identification of such genes and transcriptional regulators is compli-
cated by the complex heterogeneous mixture of cell types and developmental
stages that comprise each organ of an organism. Expression patterns that
are found only in a subset of cell types within an organ will be diluted and
may not be detectable in the collection of expression patterns obtained from
RNA isolated from samples of an entire organ. Therefore, techniques have
been developed to enrich samples for specific cell types or developmental
stages, especially for studies in plants [5]. In the model plant Arabidopsis
thaliana, several features of the root organ reduce its developmental com-
plexity and facilitate analysis. Specifically, most root cell types are found
within concentric cylinders moving from the outside of the root to the inside
of the root (Figure 1). These cell type layers display rotational symmetry,
thus simplifying the spatial features of development. This feature has been
exploited in the development of a cell type enrichment method. This enrich-
ment method uses green fluorescent protein (GFP)-marked transgenic lines
and fluorescence-activated cell sorting (FACS) to collect cell type enriched
samples and has allowed for the identification of cell type-specific expression
patterns [1, 2]. Using this technique, high resolution expression data have
been obtained for nearly all cell types in the Arabidopsis root (herein called
the marker-line dataset) [4, 10].
The second and third authors contributed equally to this work.
arXiv:0903.4027v1 [q-bio.GN] 24 Mar 2009
Another feature that makes the Arabidopsis root a tractable developmen-
tal model is that cell types are constrained in files along the root’s longitu-
dinal axis and most of these cells are produced from a stem cell population
found at the apex of the root. This feature allows a cell’s developmental
timeline to be represented by its position along the length of the root. To
obtain a developmental time-series expression dataset, individual Arabidopsis
roots were sectioned into thirteen pieces, each piece representing a develop-
mental time point (herein called the longitudinal dataset) [4]. Each of these
sections, however, contains a mixture of cell types, and the microarray ex-
pression values obtained are therefore the average of the expression levels
over multiple cell types present at these specific developmental time points.
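The averaging within a longitudinal section can be illustrated with a minimal
sketch. The cell type names, fractions, and expression values below are
hypothetical and are not taken from the datasets; the sketch also assumes that
the fraction of each cell type within a section is known.

```python
def section_average(expression, fractions):
    """Weighted average of per-cell-type expression within one section.

    expression: dict mapping cell type -> expression level (hypothetical)
    fractions:  dict mapping cell type -> share of the section's cells
    """
    total = sum(fractions.values())
    weighted = sum(expression[c] * fractions[c] for c in fractions)
    return weighted / total


# Made-up values for a single section; not real measurements.
expression = {"epidermis": 8.0, "cortex": 2.0, "endodermis": 0.0}
fractions = {"epidermis": 0.5, "cortex": 0.25, "endodermis": 0.25}

observed = section_average(expression, fractions)
print(observed)
```

In this toy case a gene expressed strongly in only one cell type yields a
diluted section-level value, which is why the section measurements alone do
not reveal cell type-specific patterns.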
While the 19 fluorescently marked lines in Brady et al. [4] cover expression
in nearly all cell types, they do not comprehensively mark all developmental
stages of these cell types. Also, the procambium cell type was not measured,
as a fluorescent marker-line that marks that cell type did not exist at the
time. However, expression from the longitudinal dataset does contain av-
eraged expression of all cell types, and may be used to infer the missing cell
type data.
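As a toy illustration of why an averaged measurement can inform an unmeasured
cell type, the sketch below solves a single weighted-average equation for one
missing value. It is not the method of this paper, which is based on an
iterative algorithm for systems of bilinear equations. The function name,
fractions, and expression values are all hypothetical, and the fractions are
assumed to be known and to sum to one.

```python
def infer_missing(observed_avg, known_expression, fractions, missing):
    """Solve observed_avg = sum_c fractions[c] * expression[c]
    for the single cell type that was not measured directly."""
    known_part = sum(fractions[c] * known_expression[c]
                     for c in known_expression)
    return (observed_avg - known_part) / fractions[missing]


# Made-up values for one section; the fractions are assumed to sum to 1.
fractions = {"epidermis": 0.5, "cortex": 0.25, "procambium": 0.25}
known = {"epidermis": 8.0, "cortex": 2.0}

estimate = infer_missing(5.0, known, fractions, "procambium")
print(estimate)
```

With real data the measurements are noisy and a single equation like this can
give negative or unstable estimates, so this sketch only conveys the idea of
using the section average as a constraint.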
Previous studies have looked at separating expression data from the het-
erogeneous cell populations that make up tumors into the contributions of
their constituent cell types [8, 16]. However, in that context, the difficulty
comes from the fact that the mixture of cell types in each sample is unknown,
whereas within our experimental context, the cell type mixture of each sam-
ple is known. Two computational methods have been developed to com
…(Full text truncated)…