MOST: detecting cancer differential gene expression

February 23, 2026

Reading time: 6 minutes


📝 Original Info

Title: MOST: detecting cancer differential gene expression
ArXiv ID: 0709.1307
Date: 2008-12-17
Authors: Heng Lian

📝 Abstract

We propose a new statistic for the detection of differentially expressed genes when the genes are activated only in a subset of the samples. Statistics designed for this unconventional circumstance have proved to be valuable for most cancer studies, where oncogenes are activated in only a small number of disease samples. Previous efforts in this direction include COPA, OS and ORT. We propose a new statistic called maximum ordered subset t-statistics (MOST), which seems to be natural when the number of activated samples is unknown. We compare MOST to other statistics and find that the proposed method often has more power than its competitors.


📄 Full Content

arXiv:0709.1307v1 [stat.AP] 10 Sep 2007

MOST: detecting cancer differential gene expression

HENG LIAN

October 24, 2018

Abstract

We propose a new statistic for the detection of differentially expressed genes when the genes are activated only in a subset of the samples. Statistics designed for this unconventional circumstance have proved to be valuable for most cancer studies, where oncogenes are activated in only a small number of disease samples. Previous efforts in this direction include COPA ([Tomlins and others(2005)]), OS ([Tibshirani and Hastie(2006)]) and ORT ([Wu(2007)]). We propose a new statistic called maximum ordered subset t-statistics (MOST), which seems to be natural when the number of activated samples is unknown. We compare MOST to other statistics and find that the proposed method often has more power than its competitors.

Keywords: Cancer; COPA; Differential gene expression; Microarray.

1 Introduction

The most popular method for differential gene expression detection in two-sample microarray studies is to compute the t-statistic. The differentially expressed genes are those whose t-statistics exceed a certain threshold. Recently, it has been recognized that in many cancer studies many genes show increased expression in disease samples, but only in a small number of those samples. The study of [Tomlins and others(2005)] shows that the t-statistic has low power in this case, and they introduced the so-called "cancer outlier profile analysis" (COPA). Their study shows clearly that COPA can perform better than the traditional t-statistic for cancer microarray data sets.

More recently, several advances have been made in this direction, with the aim of designing better statistics that account for the heterogeneous activation pattern of cancer genes. In [Tibshirani and Hastie(2006)], the authors introduced a new statistic, which they called the outlier sum (OS). Later, [Wu(2007)] proposed the outlier robust t-statistic (ORT) and showed that it usually outperformed the previously proposed statistics, both in a simulation study and in an application to a real data set.

In this paper, we propose another statistic for the detection of cancer differential gene expression, which has similar power to ORT when the number of activated samples is very small but performs better when more samples are differentially expressed. We call our new method the maximum ordered subset t-statistics (MOST). Through simulation studies we found that the new statistic outperformed the previously proposed ones under some circumstances and was never significantly worse in any situation. Thus we think it is a valuable addition to the dictionary of cancer outlier expression detection.

2 Maximum ordered subset t-statistics (MOST)

We consider simple 2-class microarray data for detecting cancer genes. We assume there are $n$ normal samples and $m$ cancer samples. The gene expressions for normal samples are denoted by $x_{ij}$ for genes $i = 1, 2, \ldots, p$ and samples $j = 1, 2, \ldots, n$, while $y_{ij}$ denotes the expressions for cancer samples, with $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, m$. In this paper, we are only interested in the one-sided test where the activated genes from cancer samples have a higher expression level. The extension to the two-sided test is straightforward.

The usual t-statistic (up to a multiplication factor independent of genes) for the two-sample test of differences in means is defined for each gene $i$ by

$$T_i = \frac{\bar{x}_i - \bar{y}_i}{s_i}, \tag{1}$$

where $\bar{x}_i = \sum_j x_{ij}/n$ is the average expression of gene $i$ in normal samples, $\bar{y}_i = \sum_j y_{ij}/m$ is the average expression of gene $i$ in cancer samples, and $s_i$ is the usual pooled standard deviation estimate

$$s_i^2 = \frac{\sum_{1 \le j \le n} (x_{ij} - \bar{x}_i)^2 + \sum_{1 \le j \le m} (y_{ij} - \bar{y}_i)^2}{n + m - 2}.$$

The t-statistic is powerful when the alternative distribution is such that $y_{ij}$, $j = 1, 2, \ldots, m$, all come from a distribution with a higher mean.
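As an illustration of equation (1), the following is a minimal sketch for a single gene in plain Python. The function name `pooled_t_statistic` and the toy data are assumptions made for this example and do not come from the paper. The sketch follows the sign of (1) as printed, so a gene that is over-expressed in the cancer samples yields a negative value.

```python
from statistics import mean


def pooled_t_statistic(x, y):
    """Equation (1) for one gene: (mean(x) - mean(y)) / s.

    x: expressions in the n normal samples
    y: expressions in the m cancer samples
    s: pooled standard deviation with n + m - 2 degrees of freedom
    (hypothetical helper, written only to illustrate the formula)
    """
    n, m = len(x), len(y)
    x_bar, y_bar = mean(x), mean(y)
    # Sums of squared deviations around each group's own mean.
    ss_x = sum((v - x_bar) ** 2 for v in x)
    ss_y = sum((v - y_bar) ** 2 for v in y)
    s = ((ss_x + ss_y) / (n + m - 2)) ** 0.5
    return (x_bar - y_bar) / s


# Toy data: every cancer sample is shifted upward by the same amount.
normal = [0.0, 1.0, 2.0]
cancer = [3.0, 4.0, 5.0]
print(pooled_t_statistic(normal, cancer))  # -3.0
```

In the toy data both groups have sum of squares 2, so the pooled variance is 4 / 4 = 1 and the statistic is simply the difference in means.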
[Tomlins and others(2005)] argue that for most cancer types, heterogeneous activation patterns make the t-statistic inefficient for detecting those expression profiles. They defined the COPA statistic

$$C_i = \frac{q_r(\{y_{ij}\}_{1 \le j \le m}) - \mathrm{med}_i}{\mathrm{mad}_i}, \tag{2}$$

where $q_r(\cdot)$ is the $r$th percentile of the data, $\mathrm{med}_i = \mathrm{median}(\{x_{ij}\}_{1 \le j \le n}, \{y_{ij}\}_{1 \le j \le m})$ is the median of the pooled samples for gene $i$, and $\mathrm{mad}_i = 1.4826 \times \mathrm{median}(\{|x_{ij} - \mathrm{med}_i|\}_{1 \le j \le n}, \{|y_{ij} - \mathrm{med}_i|\}_{1 \le j \le m})$ is the median absolute deviation of the pooled samples. The choice of $r$ in (2) depends on the subjective judgement of the user. The use of $\mathrm{med}_i$ and $\mathrm{mad}_i$ to replace the mean and the standard deviation in (1) is due to robustness considerations, since it is already known that some of the genes are differentially expressed.

In (2), only one value of $\{y_{ij}\}$ is used in the computation. A more efficient strategy would be to use additional expression values. Let

$$O_i = \{\, y_{ij} : y_{ij} > q_{75}(\{x_{ij}\}_{1 \le j \le n}, \{y_{ij}\}_{1 \le j \le m}) + \mathrm{IQR}(\{x_{ij}\}_{1 \le j \le n}, \{y_{ij}\}_{1 \le j \le m}) \,\} \tag{3}$$

be the outliers from the cancer samples for gene $i$, where $\mathrm{IQR}(\cdot)$ is the interquartile range of the d

…(Full text truncated)…
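The COPA statistic (2) and the outlier set (3) from the text above can be sketched for a single gene as follows. This is a rough illustration only: the helper names, the default `r=90`, and the linear-interpolation percentile rule are assumptions, since the text leaves the choice of r to the user and does not fix a percentile convention.

```python
from statistics import median


def percentile(values, r):
    """r-th percentile via linear interpolation between order statistics.

    The interpolation rule is an assumption; other conventions exist.
    """
    s = sorted(values)
    pos = (len(s) - 1) * r / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def copa(x, y, r=90):
    """Equation (2): (q_r(y) - med) / mad, with med and mad from pooled data."""
    pooled = list(x) + list(y)
    med = median(pooled)
    # 1.4826 rescales the median absolute deviation, as in the text.
    mad = 1.4826 * median([abs(v - med) for v in pooled])
    return (percentile(y, r) - med) / mad


def outlier_set(x, y):
    """Equation (3): cancer values above pooled q75 + pooled IQR."""
    pooled = list(x) + list(y)
    q25 = percentile(pooled, 25)
    q75 = percentile(pooled, 75)
    threshold = q75 + (q75 - q25)
    return [v for v in y if v > threshold]


# Toy data: only one cancer sample is strongly activated.
normal = [0.0, 1.0, 2.0, 3.0, 4.0]
cancer = [2.0, 3.0, 4.0, 5.0, 20.0]
print(copa(normal, cancer, r=90))
print(outlier_set(normal, cancer))  # [20.0]
```

For the toy data the pooled quartiles are 2 and 4, so the threshold in (3) is 6 and only the activated sample is retained. A different percentile convention could shift these values slightly on small samples.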


This content is AI-processed based on ArXiv data.
