Gene Expression Data Knowledge Discovery using Global and Local Clustering

📝 Original Info

  • Title: Gene Expression Data Knowledge Discovery using Global and Local Clustering
  • ArXiv ID: 1003.4079
  • Date: 2010-03-28
  • Authors: Swathi. H

📝 Abstract

To understand complex biological systems, the research community has produced a huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data. However, extracting important biological knowledge remains difficult. To address this task, clustering techniques are used. In this paper, a hybrid Hierarchical k-Means algorithm is used for clustering, and biclustering is also applied to the gene expression data. Biclustering and clustering algorithms are used together to discover both local and global clustering structure. A validation technique, Figure of Merit, is used to determine the quality of the clustering results. Appropriate knowledge is mined from the clusters by embedding a BLAST similarity search program into the clustering and biclustering process.

📄 Full Content

JOURNAL OF COMPUTING, VOLUME 2, ISSUE 3, MARCH 2010, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ 116

Gene Expression Data Knowledge Discovery using Global and Local Clustering

Swathi. H

Abstract—To understand complex biological systems, the research community has produced a huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data. However, extracting important biological knowledge remains difficult. To address this task, clustering techniques are used. In this paper, a hybrid Hierarchical k-Means algorithm is used for clustering, and biclustering is also applied to the gene expression data. Biclustering and clustering algorithms are used together to discover both local and global clustering structure. A validation technique, Figure of Merit, is used to determine the quality of the clustering results. Appropriate knowledge is mined from the clusters by embedding a BLAST similarity search program into the clustering and biclustering process.

Index Terms—Clustering, Gene expression data, validation technique, similarity search program

1 INTRODUCTION

Clustering is the process of grouping data into classes or groups so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters [11]. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. A large number of clustering algorithms exist in the literature. Clustering algorithms are commonly applied in molecular biology for gene expression data analysis [5, 6], where they are used to partition genes into groups based on the similarity among their expression profiles.

These clustering algorithms can be broadly classified into partitional and hierarchical algorithms [11]. Partitional clustering algorithms generate a single partition of the data, with a specified or estimated number of nonoverlapping clusters, in an attempt to recover natural groups present in the data [11]. Hierarchical clustering (HC) algorithms construct a hierarchy of partitions, represented as a dendrogram in which each partition is nested within the partition at the next level in the hierarchy [11]. The most commonly used partitional clustering algorithms are K-Means (KM) and k-medoids [11]. The KM algorithm takes the input parameter k and partitions a set of n objects into k clusters so that the resulting clusters have high intracluster similarity and low intercluster similarity. Cluster similarity is measured with respect to the mean value of the objects in a cluster, which can be viewed as the cluster’s centre of gravity [11].
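As an illustration of the KM procedure described above, the following is a minimal pure-Python sketch, not the paper's implementation. Euclidean distance, the toy `profiles` data, the optional `init` argument, and all function names are assumptions introduced for this example.

```python
import math
import random


def mean_point(points):
    """Component-wise mean of the points (the cluster's centre of gravity)."""
    return [sum(col) / len(points) for col in zip(*points)]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance, an assumption)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, init=None, max_iter=100, seed=0):
    """Partition `points` into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Random initial centroids unless the caller supplies explicit ones.
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(p) for p in start]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical expression profiles: two well-separated groups of three genes.
profiles = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = k_means(profiles, k=2, init=[profiles[0], profiles[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Calling `k_means(profiles, k=2)` without `init` falls back to random initial centroids, which is the behaviour the text attributes to KM; the explicit `init` is only there to make the example deterministic.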

However, both the KM and HC clustering algorithms have certain disadvantages, such as difficulty in specifying the number of clusters in advance and in selecting merge or split points [11]. HC cannot represent distinct clusters with similar expression patterns, and as clusters grow in size, the actual expression patterns become less relevant [11]. KM clustering requires the number of clusters to be specified in advance and chooses its initial centroids randomly; in addition, it is sensitive to outliers [11]. A novel hybrid approach combines the merits of these two methods and discards their innate disadvantages [1]. HC is carried out in the first round to decide the location and number of clusters, and KM clustering is run in the next round. This approach provides a mechanism to handle outliers [1], [2], [3], [12].
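The two-round idea can be sketched as follows. This is a hedged illustration under stated assumptions, not the algorithm of [1]: the centroid-linkage merge rule, the `cut` distance threshold used to stop merging, Euclidean distance, and every function name are hypothetical choices made for the example.

```python
import math


def centroid(points):
    """Component-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def agglomerate(points, cut):
    """Round 1 (HC): merge the two closest clusters until none are closer than `cut`.

    `cut` is an assumed stopping rule; the surviving clusters give both the
    number of clusters and their locations.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cut:  # remaining clusters are well separated
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def refine_with_k_means(points, centroids, max_iter=100):
    """Round 2 (KM): standard assign/update loop started from the HC centroids."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


def hierarchical_k_means(points, cut):
    """HC decides k and the seeds; KM then refines the partition."""
    seeds = [centroid(c) for c in agglomerate(points, cut)]
    return refine_with_k_means(points, seeds)


# Hypothetical profiles forming three groups; k is never passed in.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
        [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
        [9.0, 0.0], [9.1, 0.2]]
hk_labels, hk_centroids = hierarchical_k_means(data, cut=2.0)
print(len(hk_centroids), hk_labels)  # 3 [0, 0, 0, 1, 1, 1, 2, 2]
```

In this sketch the KM round no longer depends on random initial centroids or on a user-supplied k, which are the two KM weaknesses the text names. The pairwise search is quadratic per merge, so it is only meant for small illustrative inputs.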
When clustering data, similar observations should be grouped together. One therefore needs to be able to compute the distance between two data objects, but distance can be defined in many forms [12]. Distance measurements influence the shape of the clusters, as some elements may be close to one another according to one distance and farther away according to another [16]. In this paper, Pearson’s correlation coefficient is used to calculate the distance.

In this work, the gene expression data is clustered by global and local clustering. Gene expression is the process by which inheritable information from a gene, such as the DNA sequence, is made into a functional gene product, such as protein or RNA [15]. The expression of many genes is regulated after transcription (i.e., by microRNAs or ubiquitin ligases), and an increase in mRNA concentration need not always increase expression. The advances in microarray technology, high-throughput and low-throughput methods such a

…(Full text truncated)…
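The introduction states that Pearson's correlation coefficient is used to calculate the distance between expression profiles. A minimal sketch of one common convention, distance = 1 - r, is given here; the text available does not say which conversion from correlation to distance the paper uses, so that formula, the function names, and the toy gene profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("Pearson correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


def pearson_distance(x, y):
    """Assumed convention: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    return 1.0 - pearson(x, y)


# Hypothetical profiles: gene_b has the same shape as gene_a at ten times the
# magnitude, while gene_c has similar magnitude but the opposite trend.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]
gene_c = [4.0, 3.0, 2.0, 1.0]

d_ab = pearson_distance(gene_a, gene_b)  # close to 0: same expression pattern
d_ac = pearson_distance(gene_a, gene_c)  # close to 2: opposite pattern
e_ab = math.dist(gene_a, gene_b)         # Euclidean distance, for comparison
e_ac = math.dist(gene_a, gene_c)
```

Here `gene_a` is nearer to `gene_b` under the Pearson-based distance but nearer to `gene_c` under Euclidean distance, which is a small instance of the point that elements can be close according to one distance and farther away according to another.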

Reference

This content is AI-processed based on ArXiv data.
