Elephant Search with Deep Learning for Microarray Data Analysis
📝 Abstract
Despite extensive research on microarray gene expression data analysis, analyzing large and complex gene expression data effectively and efficiently remains a challenge. Feature (gene) selection is of paramount importance for understanding the differences in biological and non-biological variation between samples. To address this problem, a novel elephant search (ES) based optimization is proposed to select the best gene expressions from the large volume of microarray data. A machine learning method is then applied to the high-dimensional, complex microarray dataset to extract hidden patterns for meaningful prediction and accurate classification. In particular, deep learning (DL) trained with stochastic gradient descent and a softmax activation function is applied to the reduced set of features (genes) to classify samples according to their gene expression levels. The experiments are carried out on nine popular cancer microarray gene selection datasets obtained from the UCI Machine Learning Repository. The empirical results obtained by the proposed elephant search based deep learning (ESDL) approach are compared with those of the most recent published work to assess its suitability for future bioinformatics research.
📄 Content
Mrutyunjaya Panda, P.G. Department of Computer Science and Applications, Utkal University, Vani Vihar, Odisha-4 India
Introduction
A gene is the basic unit of storage for hereditary information in living beings. From a technical point of view, it can be treated as a distinct sequence of nucleotides forming part of a chromosome. Microarray data analysis is a relatively new technology that aims to help find the right treatment for many diseases through accurate medical diagnosis based on a huge number of genes measured across different experimental conditions. Because microarray datasets are expensive to produce and complicated in nature, prediction on them is difficult and demands careful experimentation with appropriate statistical tools. Gene expression is the process that maps a gene's DNA sequence to its corresponding mRNA sequence and finally to the amino acid sequence of a protein. Microarray gene expression profiling describes the expression levels of hundreds or thousands of genes in cells, correlated with the corresponding proteins, and helps one understand the cellular mechanisms of biological processes. Data mining helps extract meaningful observations from such huge and complex microarray gene expression datasets for post-genome cancer diagnostics, uncovering, for example, how genes are regulated, how genes affect the cancerous mutation of cells, and how performance depends on various medical experimental conditions. A microarray dataset presents samples or conditions in rows and the respective genes in columns. Classification is particularly important for handling a patient's gene expression profile for a specific disease, and more research is needed in this area to achieve better predictive accuracy.
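The row-and-column layout described above, together with the softmax classification step named in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's ESDL method: the expression values are synthetic, the model is a single softmax layer rather than a deep network, no elephant search is performed, and every function name and parameter (`make_expression_matrix`, `train_softmax_sgd`, the learning rate, the choice of informative genes) is hypothetical.

```python
import math
import random


def make_expression_matrix(n_samples, n_genes, informative, seed=0):
    """Build synthetic data: each row is a sample, each column is a gene."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # two hypothetical classes, alternating
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label  # shift the informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, row):
    """Linear score per class: dot(W[c], row) + b[c]."""
    return [sum(w * x for w, x in zip(W[c], row)) + b[c] for c in range(len(W))]


def train_softmax_sgd(X, y, n_classes, lr=0.05, epochs=50, seed=0):
    """Fit a single softmax layer with per-sample stochastic gradient descent."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    W = [[0.0] * n_genes for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            probs = softmax(class_scores(W, b, X[i]))
            for c in range(n_classes):
                # Cross-entropy gradient w.r.t. the score of class c
                grad = probs[c] - (1.0 if c == y[i] else 0.0)
                b[c] -= lr * grad
                for j in range(n_genes):
                    W[c][j] -= lr * grad * X[i][j]
    return W, b


def predict(W, b, row):
    """Return the index of the highest-scoring class."""
    scores = class_scores(W, b, row)
    return max(range(len(scores)), key=lambda c: scores[c])


# 40 samples (rows) by 20 genes (columns); genes 0-2 carry the class signal.
X, y = make_expression_matrix(n_samples=40, n_genes=20, informative=[0, 1, 2])
W, b = train_softmax_sgd(X, y, n_classes=2)
accuracy = sum(predict(W, b, row) == label for row, label in zip(X, y)) / len(y)
print(f"{len(X)} samples x {len(X[0])} genes, training accuracy = {accuracy:.2f}")
```

The accuracy printed here is measured on the training samples themselves, so it says nothing about generalization; a real study would hold out test samples and would work with far more genes than samples.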
Because microarray data contain a large number of genes, it is generally advisable to apply a gene (feature) selection algorithm to find the most informative genes and reduce the curse of dimensionality. A suitable classifier can then be applied to the selected genes to predict samples correctly, achieving high accuracy at reduced computational cost and, more importantly, enabling efficient and effective diagnosis and prognosis that can be customized for the treatment of each patient. Microarray data analysis needs a clear objective if it is to be implemented successfully for the benefit of society at large, as noted by Tjaden and Chen (2006). Clustering is one of the popular techniques used for gene profiling in microarray data analysis, notably K-means clustering and self-organizing maps (SOM) (Sheng-Bo, M.R. and Lok, 2006; Young, 2009). Alshamlan, Badr and Alohali (2013) present a comprehensive study of the objectives and approaches for cancer microarray gene expression analysis and conclude with a detailed investigation of the available approaches in this area. Researchers found that most cancer studies with microarray gene expression profiling compare various classes of diseases (Simon, 2009; Wang, Chu, and Xie, 2007) and their predictions, and therefore favored classification algorithms over clustering ones (Dougherty, Kohavi, and Sahami, 1995). The support vector machine (SVM) is considered one of the most popular and well-established classification methods for microarray data analysis, initially for binary classification (Platt, Cristianini, and Shawe-Taylor, 2000). However, as many cancer datasets are multi-class, researchers have proposed many variants of SVM, such as DAGSVM (Platt, Cristianini, and Shawe-Taylor, 2000), evolutionary SVM (ESVM) (Huang and F.-L. Chang, 2007), genetic algorithm based SVM (GASVM) (El Akadi, A. Amine, A. El Ouardighi, and D. Aboutajdine, 2009) and fuzzy SVM (FSVM) (Mao, X. Zhou, D. Pi, Y. Sun, and S. T. C. Wong, 2005). Neural network based classifiers have also been proposed by many for effective and efficient microarray data analysis, including the functional link neural network (FNN) (Wang, Chu, and Xie, 2007), extreme learning machines (ELM) (R. Zhang, G.-B. Huang, N. Sundararajan, and P. Saratchandran, 2007), improved wavelet neural network (WNN) (Zainuddin and P. O, 2009), probabilistic neural networks (PNN) (Berrar, Downes, and Dubitzky, 2003) and subsequent artificial neural network (SAAN) (Roland, Dawn, Holger, Dirk, Klaus, Siegfried, and Mathias, 2004). Apart from single classification or clustering methods for gene classification, ensemble methods have also been used by researchers for multi-class classification problems, but it has been observed that ensemble methods do not improve performance compared with single-classifier methods (Ghorai, A. Mukherjee, S. Sengupta, and P. Dutta, 2010). The