Unsupervised empirical Bayesian multiple testing with external covariates

Reading time: 5 minutes

📝 Original Info

  • Title: Unsupervised empirical Bayesian multiple testing with external covariates
  • ArXiv ID: 0807.4658
  • Date: 2008-07-30
  • Authors: Egil Ferkingstad, Arnoldo Frigessi, Håvard Rue, Gudmar Thorleifsson and Augustine Kong

📝 Abstract

In an empirical Bayesian setting, we provide a new multiple testing method, useful when an additional covariate is available, that influences the probability of each null hypothesis being true. We measure the posterior significance of each test conditionally on the covariate and the data, leading to greater power. Using covariate-based prior information in an unsupervised fashion, we produce a list of significant hypotheses which differs in length and order from the list obtained by methods not taking covariate-information into account. Covariate-modulated posterior probabilities of each null hypothesis are estimated using a fast approximate algorithm. The new method is applied to expression quantitative trait loci (eQTL) data.
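
The central quantity in the abstract is a posterior probability of the null hypothesis that depends on both the data and an external covariate. A standard way to express this is the two-groups model with a covariate-dependent prior π0(x), giving P(H0 | z, x) = π0(x) f0(z) / (π0(x) f0(z) + (1 − π0(x)) f1(z)). The Python sketch below is a minimal illustration under stated assumptions: a theoretical N(0, 1) null, a hypothetical N(2, 1) alternative, and a hypothetical logistic form for π0(x) with made-up parameters `a` and `b`. It evaluates the formula with known inputs only; it does not reproduce the paper's fast approximate algorithm, which estimates these quantities from data.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at z."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def posterior_null(z, pi0, alt_mean=2.0):
    """Two-groups posterior probability that the null hypothesis is true:

        P(H0 | z) = pi0 * f0(z) / (pi0 * f0(z) + (1 - pi0) * f1(z))

    Illustrative assumptions: f0 is the theoretical N(0, 1) null and
    f1 is a hypothetical N(alt_mean, 1) alternative density.
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, mean=alt_mean)
    return null_part / (null_part + alt_part)


def pi0_of_covariate(x, a=2.0, b=-3.0):
    """Hypothetical logistic prior pi0(x); with b < 0, a larger covariate x
    makes the null hypothesis less likely a priori."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


# Two tests with the same z-score but different covariate values.
z = 2.5
p_low_cov = posterior_null(z, pi0_of_covariate(0.0))
p_high_cov = posterior_null(z, pi0_of_covariate(1.0))
print(f"P(H0 | z, x=0) = {p_low_cov:.3f}")
print(f"P(H0 | z, x=1) = {p_high_cov:.3f}")
```

With these made-up parameters, the posterior null probability for the same z-score drops from about 0.27 at x = 0 to about 0.02 at x = 1. Ranking tests by P(H0 | z, x) rather than by z alone can therefore change both the order and the length of the list of significant hypotheses, which is the effect the abstract describes.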

📄 Full Content

arXiv:0807.4658v1 [stat.AP] 29 Jul 2008
The Annals of Applied Statistics 2008, Vol. 2, No. 2, 714–735
DOI: 10.1214/08-AOAS158
© Institute of Mathematical Statistics, 2008

UNSUPERVISED EMPIRICAL BAYESIAN MULTIPLE TESTING WITH EXTERNAL COVARIATES

By Egil Ferkingstad,¹ Arnoldo Frigessi, Håvard Rue, Gudmar Thorleifsson and Augustine Kong

University of Oslo and Centre for Integrative Genetics; (sfi)², Statistics for Innovation; Norwegian University of Science and Technology; Decode Genetics; and Decode Genetics

Received March 2007; revised January 2008.
¹Supported by the National program for research in functional genomics in Norway from the Research Council of Norway.
Key words and phrases. Bioinformatics, multiple hypothesis testing, false discovery rates, data integration, empirical Bayes.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2008, Vol. 2, No. 2, 714–735. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Science, industry and business possess the technology to collect, store and distribute huge amounts of data efficiently and often at low cost. Sensors and instrumentation, data logging capacity and communication power have increased the breadth and depth of data. Systems are measured in more detail, giving a more complete but more complex picture of processes and phenomena. It is also necessary to integrate many sources of data of different types and quality. In high-throughput genomics, large numbers of simultaneous comparisons are necessary to discover differentially expressed genes among thirty thousand measured ones. Similarly, in finance one wishes to monitor the prices of thousands of products and derivatives simultaneously to detect abnormal behavior, and in geophysics or brain imaging one queries thousands of 3D voxels about their properties. Such tests are often dependent, and the dependency structure is ill specified, so that the effective number of independent tests is unknown. Sometimes we expect only a small subset of decisions to have a positive result: the solution is then sparse in the huge parameter space. To discover significant cases, it is necessary to develop new methods that either exploit available a priori knowledge on the structure of the solution, or merge different data sets, each adding information.

Benjamini and Hochberg (1995) proposed the false discovery rate (FDR), which can adapt automatically to sparsity and has been shown to be asymptotically optimal in a certain minimax sense [Abramovich et al. (2006)]. FDR adjustments of p-values are nowadays routinely performed in large-scale multiple testing studies across many sciences and applied areas, from astronomy [Miller et al. (2001)] to genomics [Tusher et al. (2001)], and from neuroimaging [Genovese, Lazar and Nichols (2002)] to industrial organization [Brown et al. (2005)].

Bayesian approaches are based on estimating the posterior probability of the null hypothesis. Efron et al. (2001) have developed the theory of the local false discovery rate, based on an estimation procedure originally developed by Anderson and Blair (1982). Whereas the FDR provides a probability of misclassification for a set of tests called significant, the posterior probability that the null hypothesis is true provides a similar measure, but locally, at the particular observed value of the test statistic. Instead of summarizing the data by a test statistic, hierarchical Bayesian approaches have been developed that model the full measured data parametrically [Baldi and Long (2001), Do, Müller and Tang (2005), Kendziorski et al. (2006), Lonnstedt and Speed (2002), Newton et al. (2004)], and Storey (2007) also makes full use of the data in a hypothesis testing setting. Both approaches have their strengths and weaknesses, in terms of the validity of the distributional assumptions under the alternative hypothesis, the actual availability of the full data, computational speed and the simplicity of the methodology.

This paper assumes access to summary test statistics for every hypothesis to be tested. We propose a simple methodology which allows modulating the posterior probability of

…(Full text truncated)…
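
The introduction notes that FDR adjustments of p-values are now routine. The most common such adjustment is the Benjamini-Hochberg step-up rule: sort the m p-values, find the largest rank k with p_(k) ≤ kq/m, and reject the k hypotheses with the smallest p-values. The dependency-free Python sketch below assumes independent (or positively dependent) test statistics, the setting in which the procedure controls the FDR at level q. It uses no covariate information, so it illustrates the kind of covariate-blind baseline the paper's method is compared against, not the paper's method itself; the function names are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns the sorted indices of the hypotheses rejected at FDR level q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k (1-based) with p_(k) <= k * q / m; 0 means no rejections.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    return sorted(order[:k_max])


def bh_adjusted(pvalues):
    """BH-adjusted p-values: adj_(k) = min over j >= k of m * p_(j) / j, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum
    # so that adjusted values are monotone in the original p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, m * pvalues[i] / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(p, q=0.05))
print(bh_adjusted(p))
```

Rejecting every hypothesis whose adjusted p-value is at most q yields the same set as the step-up rule, which is why adjusted p-values are the form usually reported.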

Reference

This content is AI-processed based on ArXiv data.
