📝 Original Info
- Title: Gene Expression Data Knowledge Discovery using Global and Local Clustering
- ArXiv ID: 1003.4079
- Date: 2010-03-28
- Authors: Swathi. H
📝 Abstract
To understand complex biological systems, the research community has produced a huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of such data, yet extracting important biological knowledge from it remains difficult. In this paper, a hybrid Hierarchical k-Means algorithm is used for clustering gene expression data, and a biclustering algorithm is applied alongside it, so that both the global and the local clustering structure can be discovered. A validation technique, Figure of Merit, is used to determine the quality of the clustering results. Appropriate knowledge is mined from the clusters by embedding a BLAST similarity search program into the clustering and biclustering process.
📄 Full Content
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 3, MARCH 2010, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
Gene Expression Data Knowledge Discovery
using Global and Local Clustering
Swathi. H
Abstract—To understand complex biological systems, the research community has produced a huge corpus of gene expression
data. A large number of clustering approaches have been proposed for the analysis of such data, yet extracting important
biological knowledge from it remains difficult. In this paper, a hybrid Hierarchical k-Means algorithm is used for
clustering gene expression data, and a biclustering algorithm is applied alongside it, so that both the global and the
local clustering structure can be discovered. A validation technique, Figure of Merit, is used to determine the quality
of the clustering results. Appropriate knowledge is mined from the clusters by embedding a BLAST similarity search
program into the clustering and biclustering process.
Index Terms—Clustering, Gene expression data, validation technique, similarity search program
—————————— ——————————
1 INTRODUCTION
Clustering is the process of grouping data into
classes or groups so that objects within a cluster are
highly similar to one another but very dissimilar to
objects in other clusters [11]. Clustering can also
facilitate taxonomy formation, that is, the organization
of observations into a hierarchy of classes that group
similar events together. A large number of clustering
algorithms exist in the literature. Clustering algorithms
are commonly applied in molecular biology for gene
expression data analysis [5, 6], where they are used to
partition genes into groups based on the similarity among
their expression profiles. These algorithms can be
broadly classified into partitional and hierarchical
algorithms [11].
Partitional clustering algorithms generate a single
partition of the data, with a specified or estimated
number of nonoverlapping clusters, in an attempt to
recover the natural groups present in the data [11].
Hierarchical clustering (HC) algorithms construct a
hierarchy of partitions, represented as a dendrogram in
which each partition is nested within the partition at
the next level of the hierarchy [11]. The most commonly
used partitional clustering algorithms are K-Means (KM)
and k-medoids [11]. The KM algorithm takes an input
parameter k and partitions a set of n objects into k
clusters so that the resulting clusters have high
intracluster similarity and low intercluster similarity.
Cluster similarity is measured with respect to the mean
value of the objects in a cluster, which can be viewed as
the cluster’s centre of gravity [11].
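The KM procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: it assumes squared Euclidean distance to the cluster mean, random selection of the initial centroids, and hypothetical function names (`k_means`, `mean_point`, `sq_dist`).

```python
import random


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumption made for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Partition `points` into k clusters around their mean values."""
    rng = random.Random(seed)
    # Initial centroids are k distinct objects picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return labels, centroids


profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = k_means(profiles, k=2)
```

Note that k must be supplied by the caller and the result can depend on the random starting centroids, which are the two weaknesses of KM that the text goes on to discuss.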
However, both the KM and HC clustering algorithms have
certain disadvantages, such as the difficulty of specifying
the number of clusters in advance and of selecting merge
or split points [11]. HC cannot represent distinct clusters
with similar expression patterns, and as clusters grow in
size the actual expression patterns become less relevant
[11]. KM clustering requires the number of clusters to be
specified in advance and chooses its initial centroids
randomly; in addition, it is sensitive to outliers [11].
A novel hybrid approach combines the merits of these two
methods and discards their innate disadvantages [1]. HC is
carried out in the first round to decide the location and
number of clusters, and KM clustering is run in the next
round. This approach also provides a mechanism to handle
outliers [1], [2], [3], [12].
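The two-round idea can be illustrated with the sketch below. The available text does not say how the hierarchical round fixes the number of clusters, so several choices here are assumptions made only for illustration: centroid-linkage agglomerative merging, a distance cut-off (`merge_threshold`) as the stopping rule, and Euclidean distance. All function names are hypothetical.

```python
def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumption made for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchical_seeds(points, merge_threshold):
    """Round 1: merge the two closest clusters (centroid linkage) until the
    closest pair is farther apart than merge_threshold (assumed rule).
    Returns one centroid per surviving cluster."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        cents = [mean_point(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(cents[i], cents[j]) ** 0.5
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > merge_threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean_point(c) for c in clusters]


def k_means_from_seeds(points, seeds, max_iter=100):
    """Round 2: ordinary k-means, started from the given centroids."""
    centroids = [list(c) for c in seeds]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_point(members)
    return labels, centroids


def hybrid_hierarchical_k_means(points, merge_threshold):
    """HC decides the number and location of clusters; KM refines them."""
    seeds = hierarchical_seeds(points, merge_threshold)
    return k_means_from_seeds(points, seeds)


profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = hybrid_hierarchical_k_means(profiles, merge_threshold=3.0)
```

In this sketch the caller never passes k, and the second round no longer starts from random centroids. The outlier handling mentioned in the text is not modelled here.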
When clustering data, similar observations should be
grouped together. One therefore needs to be able to compute
the distance between two data objects, and this distance
can be defined in many forms [12]. The distance measure
influences the shape of the clusters, as some elements may
be close to one another according to one distance and
farther away according to another [16]. In this paper,
Pearson’s Correlation Coefficient is used to calculate
the distance. In this work, the gene expression data is
clustered by both global and local clustering.
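As an illustration of a correlation-based distance, the sketch below computes Pearson's correlation coefficient r between two expression profiles and converts it to a distance as 1 - r. The text does not state which conversion is used, so the 1 - r form is an assumption (it is a common choice), and the function names are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


def pearson_distance(x, y):
    """Assumed conversion: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones."""
    return 1.0 - pearson_correlation(x, y)


gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]   # same shape as gene_a, larger magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]       # opposite trend to gene_a

d_ab = pearson_distance(gene_a, gene_b)
d_ac = pearson_distance(gene_a, gene_c)
```

Here `gene_a` and `gene_b` are far apart in Euclidean terms but essentially identical under the correlation-based distance, which is an example of two elements being close according to one distance and farther away according to another.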
Gene expression is the process by which inheritable
information from a gene, such as the DNA sequence, is
made into a functional gene product, such as protein or
RNA [15]. The expression of many genes is regulated after
transcription (i.e., by microRNAs or ubiquitin ligases), and
an increase in mRNA concentration need not always increase
expression. The advances in microarray technology,
high-throughput and low-throughput methods such
a
…(Full text truncated)…