Bayesian hierarchical modelling for inferring genetic interactions in yeast

Jonathan Heydari

Thesis submitted for the degree of Doctor of Philosophy

Institute for Cell & Molecular Biosciences
Newcastle University
Newcastle upon Tyne
United Kingdom

February, 2014

Abstract

Identifying genetic interactions for a given microorganism, such as yeast, is difficult. Quantitative Fitness Analysis (QFA) is a high-throughput experimental and computational methodology for quantifying the fitness of microbial cultures. QFA can be used to compare between fitness observations for different genotypes and thereby infer genetic interaction strengths. Current "naive" frequentist statistical approaches used in QFA do not model between-genotype variation or difference in genotype variation under different conditions. In this thesis, a Bayesian approach is introduced to evaluate hierarchical models that better reflect the structure or design of QFA experiments. First, a two-stage approach is presented: a hierarchical logistic model is fitted to microbial culture growth curves and then a hierarchical interaction model is fitted to fitness summaries inferred for each genotype. Next, a one-stage Bayesian approach is presented: a joint hierarchical model which simultaneously models fitness and genetic interaction, thereby avoiding passing information between models via a univariate fitness summary. The new hierarchical approaches are then compared using a dataset examining the effect of telomere defects on yeast. By better describing the experimental structure, new evidence is found for genes and complexes which interact with the telomere cap. Various extensions of these models, including models for data transformation, batch effects and intrinsically stochastic growth models, are also considered.
Acknowledgements

First and foremost I would like to thank both Prof Darren Wilkinson and Prof David Lydall for their support and encouragement during the preparation of this thesis. Thanks also go to Dr Conor Lawless for his invaluable support and advice. Further, thanks to the staff and students from both the School of Mathematics and Statistics and the Institute of Cellular and Molecular Biosciences.

Special thanks go to my family and friends for the support and motivation they have provided me throughout my studies. In particular, I would like to express my love and gratitude to my partner Christina for her encouragement and patience.

Finally, I would like to acknowledge the financial support provided by the Biotechnology and Biological Sciences Research Council and the Medical Research Council.

Contents

1 Introduction
  1.1 Quantitative Fitness Analysis
    1.1.1 Quantifying fitness
    1.1.2 The logistic growth model
    1.1.3 Fitness definitions
  1.2 Epistasis
    1.2.1 Defining epistasis
    1.2.2 Addinall et al. (2011) Quantitative Fitness Analysis screen comparison
    1.2.3 Fitness plots
  1.3 The stochastic logistic growth model
  1.4 Outline of thesis
2 Background
  2.1 Yeast biology
    2.1.1 Telomeres
    2.1.2 The end replication problem
    2.1.3 CDC13 and cdc13-1
    2.1.4 URA3
    2.1.5 High-throughput methodology for Quantitative Fitness Analysis
  2.2 Comparing lists of genes
    2.2.1 Jaccard index
    2.2.2 Spearman's rank correlation coefficient
    2.2.3 Gene ontology term enrichment analysis
  2.3 Bayesian inference
    2.3.1 Markov chain Monte Carlo
    2.3.2 Metropolis-Hastings algorithm
    2.3.3 Gibbs sampling
    2.3.4 Convergence issues
    2.3.5 Convergence diagnostics
    2.3.6 Computer programming
  2.4 Hierarchical modelling
    2.4.1 Distributional assumptions
    2.4.2 Indicator variables
    2.4.3 The three parameter t-distribution
  2.5 Generalisations of the logistic growth model
    2.5.1 Richards' growth model
    2.5.2 Generalised logistic growth model
  2.6 State space models
    2.6.1 Stochastic differential equations
    2.6.2 The Euler-Maruyama method
    2.6.3 Kalman filter
    2.6.4 Linear noise approximation
3 Modelling genetic interaction
  3.1 Introduction
  3.2 Bayesian hierarchical model inference
  3.3 Two-stage Bayesian hierarchical approach
    3.3.1 Separate hierarchical model
    3.3.2 Interaction hierarchical model
  3.4 One-stage Bayesian hierarchical approach
    3.4.1 Joint hierarchical model
  3.5 Random effects model
4 Case Studies
  4.1 Introduction
  4.2 cdc13-1 27 °C vs ura3Δ 27 °C suppressor/enhancer data set
    4.2.1 Frequentist approach
    4.2.2 Two stage Bayesian approach
    4.2.3 One stage Bayesian approach
  4.3 Comparison with previous analysis
    4.3.1 Significant genetic interactions
    4.3.2 Previously known genetic interactions
    4.3.3 Hierarchy and model parameters
    4.3.4 Computing requirements
    4.3.5 Convergence diagnostics
    4.3.6 Simulation study
  4.4 Bayesian inference code comparison
  4.5 Further case studies
  4.6 Extensions of the joint hierarchical model
5 Fast Bayesian parameter estimation for stochastic logistic growth models
  5.1 Introduction
  5.2 The Román-Román & Torres-Ruiz (2012) diffusion process
  5.3 Linear noise approximation with multiplicative noise
  5.4 Linear noise approximation with additive noise
  5.5 Simulation and Bayesian inference for the stochastic logistic growth model and approximations
    5.5.1 Bayesian parameter inference with approximate models
    5.5.2 Application to observed yeast data
6 Conclusions and future work
A QFA data set sample, solving the logistic growth model and random effects model R code
  A.1 cdc13-1 Quantitative Fitness Analysis data set sample
  A.2 Solving the logistic growth model
  A.3 Random effects model R code
B Bayesian hierarchical modelling
  B.1 Hyper-parameter values for Bayesian hierarchical modelling
  B.2 cdc13-1 27 °C vs ura3Δ 27 °C fitness plots with gene ontology terms highlighted
  B.3 Lists of top genetic interactions for the two-stage and one-stage Bayesian approaches
  B.4 cdc13-1 27 °C vs ura3Δ 27 °C fitness plots for the joint hierarchical model in terms of carrying capacity and growth rate parameters
  B.5 Gene ontology term enrichment analysis in R
  B.6 Code for Just Another Gibbs Sampler software
    B.6.1 Separate hierarchical model code
    B.6.2 Interaction hierarchical model code
    B.6.3 Joint hierarchical model code
  B.7 Additional cdc13-1 27 °C vs ura3Δ 27 °C fitness plots
  B.8 Correlation between methods
C Stochastic logistic growth modelling
  C.1 Linear noise approximation of the stochastic logistic growth model with multiplicative intrinsic noise solution
  C.2 Zero-order noise approximation of the stochastic logistic growth model
  C.3 Linear noise approximation of the stochastic logistic growth model with additive intrinsic noise solution
  C.4 Prior hyper-parameters for Bayesian state space models
  C.5 Kalman filter for the linear noise approximation of the stochastic logistic growth model with additive intrinsic noise and Normal measurement error

List of Figures

1.1 Example 384-spot plate image from a yeast quantitative fitness analysis screen
1.2 Cropped image of 15 out of 384 spotted yeast cultures from a 384-spot plate
1.3 Observed yeast data and fitted logistic growth curves
1.4 Fitness plot taken from Addinall et al. (2011)
2.1 Telomere at a chromosome end
2.2 The end replication problem
2.3 The spotting procedure
3.1 Plate diagram for the separate hierarchical model
3.2 Plate diagram for the interaction hierarchical model
3.3 Plate diagram for the joint hierarchical model
4.1 Separate hierarchical model logistic growth curve fits
4.2 Fitness plots with orfΔ posterior mean fitnesses
4.3 Joint hierarchical model logistic growth curve fits
4.4 Convergence diagnostics for the separate hierarchical model
4.5 Convergence diagnostics for the interaction hierarchical model
4.6 Convergence diagnostics for the joint hierarchical model
4.7 Density plots for posterior samples from the joint hierarchical model using the C programming language and Just Another Gibbs Sampler software
4.8 cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C joint hierarchical model fitness plot
4.9 cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C joint hierarchical model fitness plot
4.10 yku70Δ 37 °C vs ura3Δ 37 °C joint hierarchical model fitness plot
4.11 ura3Δ 37 °C vs ura3Δ 20 °C joint hierarchical model fitness plot
4.12 cdc13-1 27 °C vs ura3Δ 27 °C joint hierarchical model with batch effects fitness plot
4.13 cdc13-1 27 °C vs ura3Δ 27 °C joint hierarchical model with transformations fitness plot
5.1 Forward trajectories for the stochastic logistic growth model and approximations
5.2 Forward trajectories of logistic growth models and stochastic logistic data with Log-normal measurement error
5.3 Convergence diagnostics for the linear noise approximation of the stochastic logistic growth model with additive intrinsic noise
5.4 Forward trajectories of logistic growth models and stochastic logistic data with Normal measurement error
5.5 Forward trajectories of logistic growth models and observed yeast data
A.1 cdc13-1 QFA data set sample
B.1 Alternative fitness plots with orfΔ posterior mean fitnesses and labels for the "telomere maintenance" gene ontology term
B.2 Alternative fitness plots with orfΔ posterior mean fitnesses and labels for the "ageing" gene ontology term
B.3 Alternative fitness plots with orfΔ posterior mean fitnesses and labels for the "response to DNA damage" gene ontology term
B.4 Alternative fitness plots with orfΔ posterior mean fitnesses and labels for the "peroxisomal organisation" gene ontology term
B.5 Joint hierarchical model carrying capacity fitness plot
B.6 Joint hierarchical model growth rate fitness plot
B.7 Alternative non-Bayesian, hierarchical random effects model fitness plot
B.8 Alternative interaction hierarchical model fitness plot
B.9 Alternative joint hierarchical model fitness plot
B.10 Alternative joint hierarchical model carrying capacity fitness plot
B.11 Alternative joint hierarchical model growth rate fitness plot
B.12 MDR × MDP genetic interaction correlation plot of the joint hierarchical model versus Addinall et al. (2011)

List of Tables

3.1 Description of the separate hierarchical model
3.2 Description of the interaction hierarchical model
3.3 Description of the joint hierarchical model
3.4 Description of the random effects model
4.1 Number of genes interacting with cdc13-1 at 27 °C
4.2 Overlap between methods for genes interacting with cdc13-1 at 27 °C and gene ontology terms over-represented in lists of interactions
4.3 Bayesian model convergence statistics
4.4 Simulation study with a joint hierarchical model simulated dataset
4.5 Unpaired t-test and Kolmogorov-Smirnov p-values comparing posterior samples from the joint hierarchical model using both C programming language and Just Another Gibbs Sampler software
4.6 Number of interactions identified for further case studies and applications of the joint hierarchical model extensions
4.7 Overlap between different QFA comparisons for genes interacting and gene ontology terms over-represented in lists of interactions
4.8 Overlap with joint hierarchical model extensions for genes interacting with cdc13-1 at 27 °C and gene ontology terms over-represented in lists of interactions
4.9 Description of the joint hierarchical model with batch effects
4.10 Description of the joint hierarchical model with transformations
5.1 Bayesian state space model parameter posterior means, standard deviations and true values
5.2 Total mean squared error for 10 observed yeast growth time courses
B.1 Hyper-parameter values for Bayesian hierarchical modelling of quantitative fitness analysis data
B.2 Sample of interaction hierarchical model top genetic interactions with cdc13-1 at 27 °C
B.3 Sample of joint hierarchical model top genetic interactions with cdc13-1 at 27 °C
B.4 Spearman's rank correlation coefficients for magnitudes from genetic independence, between approaches
C.1 Prior hyper-parameters for Bayesian state space models

Chapter 1. Introduction

High-throughput screening of microbial culture fitnesses is a powerful tool in biology that can be used to learn about the interaction between genes and proteins in living cells. Fitness, the ability of organisms to survive and reproduce in a specific environment, is of fundamental importance to every living organism. Measuring components of fitness (such as population growth rate) in microbial cultures is a way to directly assess and rank the health of such populations. Genome-wide Quantitative Fitness Analysis (QFA) is a robot-assisted high-throughput laboratory workflow, combining systematic genetic techniques to generate arrays of genetically distinct microbial cultures with quantification and modelling of growth curves to estimate fitnesses (Banks et al., 2012; Addinall et al., 2011). An important reason for carrying out QFA is to compare the fitnesses of cultures with distinct genotypes in order to quantify epistasis (genetic interaction).

In Addinall et al. (2011), a frequentist statistical approach is used to model and make inference for significantly interacting genes in a QFA screen comparison. Other large-scale quantitative genetic interaction screening approaches exist, such as Epistatic Miniarray Profiling (E-MAP) (Schuldiner et al., 2006) and Synthetic Genetic Array (SGA) (Tong & Boone, 2006), but we expect QFA to provide higher quality fitness estimates by using a culture inoculation technique which results in a wider range of cell densities during culture growth and by capturing complete growth curves instead of using single time point assays.
QFA and the alternative genetic interaction screening approaches mentioned above use frequentist statistical methods that cannot account for all sources of experimental variation or estimate evidence of genetic interaction simultaneously and do not partition variation into population, genotype and repeat levels. Further, the frequentist statistical approaches used in the methods above cannot account for relevant prior information.

The first aim of this thesis is to develop new Bayesian models that identify significantly interacting genes better than the current frequentist approach. Accounting for more sources of variation than the frequentist approach, Bayesian QFA will be able to find genetic interactions within QFA with less error and increased confidence. The new Bayesian QFA will be used to help locate genes that are related to telomere activity in suppressor/enhancer analysis as well as other high throughput experiments such as drug screening.

Analysis of high throughput genetic screen data involves modelling both the experimental structure and its sources of variation. Many underlying sources of variation within the data can be identified in the experimental design. Without fully modelling variation within the experiment, a model may not be able to identify the more subtle interactions. With a Bayesian approach (Bernardo & Smith, 2007) there is more flexibility of model choice, allowing model structure to reflect experimental structure or design. Currently there is no standard frequentist approach which can deal with inference for a hierarchical model that simultaneously models logistic growth parameters and probability of genetic interaction. Using Bayesian hierarchical modelling (Gelman & Hill, 2006), this study looks to extract as much information as possible from valuable QFA data sets.
The Bayesian hierarchical approach also allows the borrowing of strength across subjects, helping identify significantly interacting open reading frame deletions (orfΔs) which otherwise may have been given low significance and overlooked. Prior distributions are used to incorporate the existing information known about the possible values for parameters.

Bayesian analysis can allow the use of Boolean indicators to describe the evidence that each orfΔ interacts with the query mutation in terms of probability. During the model fitting procedure, we find that orfΔ fitnesses have a long-tailed distribution around their population mean due to unusually fit, dead or missing orfΔs. In these instances, the scaled t distribution is used to describe these features.

Following the approach for determining epistasis from the comparison of two QFA screens presented by Addinall et al. (2011), the present study develops a two-stage approach to this problem: i) the separate hierarchical model (SHM) is fitted to cell density measurements to estimate fitness, then ii) fitness estimates are input to the interaction hierarchical model (IHM). Next, a unified approach, referred to as the joint hierarchical model (JHM), is developed. The JHM models mutant strain fitnesses and genetic interactions simultaneously, without having to pass information between two different models. The JHM can also allow two important, distinct, microbial fitness phenotypes (population growth rate and carrying capacity) to provide evidence for genetic interaction simultaneously.

Applying the new Bayesian approaches to QFA screen data, the present study is able to identify new genes and complexes that interact with genetic mutation cdc13-1 in yeast. cdc13-1 is a genetic mutation which results in dysfunctional telomere maintenance. Telomeres are repetitive regions of deoxyribonucleic acid (DNA) at the end of linear chromosomes.
They have been of great interest in recent years as they have been shown to have a role in ageing and cancer (Shay & Wright, 2005).

Current approaches (Addinall et al., 2011) fit a deterministic logistic growth model to yeast QFA data. For logistic growth data sets where stochastic fluctuations are observed, the deterministic model fails to account for the intrinsic noise. To better describe observed yeast QFA data, a stochastic model can be used. Stochastic models simultaneously describe dynamics and noise or heterogeneity in real systems (Chen et al., 2010). For example, stochastic models are increasingly recognised as necessary tools for understanding the behaviour of complex biological systems (Wilkinson, 2011, 2009) and are also used to capture uncertainty in financial market behaviour (Kijima, 2013; Koller, 2012). Many such models are written as continuous stochastic differential equations (SDEs) which often do not have analytical solutions and are slow to evaluate numerically compared to their deterministic counterparts. Simulation speed is often a particularly critical issue when inferring model parameter values by comparing simulated output with observed data (Hurn et al., 2007).

For SDE models where no explicit expression for the transition density is available, it is possible to infer parameter values by simulating a latent process using a data augmentation approach (Golightly & Wilkinson, 2005). However, this method is computationally intensive and not practical for all applications. When fast inference for SDEs is important, for example real-time analysis as part of decision support systems or big data inference problems where simultaneous model fits are made to many thousands of datasets (e.g. Heydari et al. (2012)), an alternative approach is needed (Heydari et al., 2013).
The second aim of this thesis is to present a fast approach for stochastic modelling of processes with intractable transition densities and apply this approach to an SDE describing logistic population growth for the first time. One such approach is demonstrated: developing an analytically tractable approximation to the original SDE, by making linear noise approximations (LNAs) (Kurtz, 1970, 1971; Van Kampen, 2011). The present study introduces two new first order LNAs of a stochastic logistic growth model (SLGM) (Capocelli & Ricciardi, 1974), one with multiplicative and one with additive intrinsic noise, which are labelled LNAM and LNAA respectively. The LNA reduces an SDE to a linear SDE with additive noise, which can be solved to give an explicit expression for the transition density.

The Bayesian approach can be applied in a natural way to carry out parameter inference for state space models with tractable transition densities (West & Harrison, 1997). A state space model describes the probabilistic dependence between an observation process variable X_t and state process S_t. The transition density is used to describe the state process S_t and a measurement error structure is chosen to describe the relationship between X_t and S_t. Transition densities are derived for the LNA approximate models and measurement noise is chosen to be either multiplicative or additive in order to construct a linear Gaussian structure and allow fast inference through the use of a Kalman filter.

The Kalman filter (Kalman, 1960) is typically used to infer the hidden state process of interest S_t and is an optimal estimator, minimising the mean square error of estimated parameters. The main assumptions of the Kalman filter are that the underlying system is a linear dynamical system and that all noise is Gaussian (or that the mean and standard deviation of the noise are known).
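To make the role of the Kalman filter concrete, the following is a minimal sketch of the recursion for a scalar linear Gaussian state space model, written with the X_t (observation) and S_t (state) notation used above. It is an illustration only and not the filter derived in this thesis for the LNA models: the model S_t = a S_{t-1} + b + w_t, X_t = S_t + v_t and the names a, b, q, r, m0 and c0 are assumptions chosen for the example.

```python
import math

def kalman_log_likelihood(obs, a, b, q, r, m0, c0):
    """Log marginal likelihood of obs under a scalar linear Gaussian model.

    Hypothetical model (an assumption for illustration):
        S_t = a * S_{t-1} + b + w_t,  w_t ~ N(0, q)   (state process)
        X_t = S_t + v_t,              v_t ~ N(0, r)   (observation process)
    with prior S_0 ~ N(m0, c0). Returns the log-likelihood together with
    the final filtered mean and variance of the state.
    """
    m, c = m0, c0
    loglik = 0.0
    for x in obs:
        # Predict the state one step ahead.
        m_pred = a * m + b
        c_pred = a * a * c + q
        # One-step forecast distribution of the observation: N(m_pred, s).
        s = c_pred + r
        resid = x - m_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + resid * resid / s)
        # Update the state estimate with the new observation.
        gain = c_pred / s
        m = m_pred + gain * resid
        c = (1.0 - gain) * c_pred
    return loglik, m, c

# Usage with made-up numbers: three noisy observations of a slowly varying state.
ll, m_final, c_final = kalman_log_likelihood(
    [0.9, 1.1, 1.0], a=1.0, b=0.0, q=0.01, r=0.04, m0=1.0, c0=1.0)
```

Each pass through the loop costs a fixed number of arithmetic operations, so the marginal likelihood of a whole time course is obtained in time linear in the number of observations, with no latent process simulation.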
Here the Kalman filter is used to reduce computational time in a parameter inference algorithm by recursively computing the marginal likelihood (West & Harrison, 1997).

It is shown that both of the new diffusion equation models have more realistic growth characteristics at the saturation stage when compared to a related model by Román-Román & Torres-Ruiz (2012) (an approximate model approach which is labelled RRTR) and it is shown that a zero-order LNA of the logistic growth SDE with multiplicative intrinsic noise is equivalent to the RRTR.

This study compares the utility of each of the approximate models during parameter inference by comparing simulations with both synthetic and real datasets. After inference it is shown that the fast approximate methods give similar posterior distributions to the slow arbitrarily exact models. Of the approximate models considered, the RRTR model is shown to be the worst at recovering true parameters of logistic growth data. The LNA models are an improvement over the RRTR and so should be used for better parameter inference of logistic growth data, as they are just as fast but more accurate.

The stochastic modelling approach presented in this study, an LNA followed by a Kalman filter recursion for marginal likelihood computation, is applicable to a range of population growth models or stochastic processes, where fast inference is of importance. The approach presented in this study enables stochastic modelling for a big data genome-wide analysis, where previously a deterministic model, unable to capture the information within the stochasticity of a process, is assumed due to the constraints in computational time associated with large volumes of data. The problems of big data (Boyd & Crawford, 2011) are relatively new and part of an expanding field of research that involves large and complex collections of data sets, typically with large components of noise.

1.1. Quantitative Fitness Analysis

Genome-wide Quantitative Fitness Analysis (QFA) is a robot-assisted high-throughput laboratory workflow, combining systematic genetic techniques to generate arrays of genetically distinct microbial cultures with quantification and modelling of growth curves to estimate fitnesses (Banks et al., 2012; Addinall et al., 2011). A QFA screen can be used to compare the fitnesses of cultures with distinct genotypes in order to quantify genetic interaction.

Genetic interaction strengths are typically estimated by comparing fitnesses in two QFA screens: a control screen and a query screen. QFA output includes fitness estimates for all microbial cultures in an arrayed library including replicate cultures. For example, such a library could be a systematic collection of all non-essential, single gene deletion strains in the model eukaryote Saccharomyces cerevisiae (S. cerevisiae, brewer's yeast). All strains within a query screen differ from their control screen counterparts by a common condition such as a background gene mutation, drug treatment, temperature or other treatment. To identify strains that show interaction with the query condition, corresponding fitness responses for each strain in the library under the query and control conditions can be compared.

An example of the procedure to create mutant strains to test for genetic interaction using QFA screens is as follows. First a suitable query mutation is chosen, which is relevant to an area of biology of particular interest (e.g. cdc13-1 for its relevance to telomere capping processes). Next, a library of strains is chosen, within which to search for strains interacting with the query mutation (e.g. a genome-wide library of independent strains with individual, non-essential genes deleted: orfΔs). Finally, an appropriate, neutral control background mutation is chosen (e.g.
ura3Δ) to allow the separation of the effect of background condition from that of the library strains. In most cases, control and query mutations are crossed with the chosen library using Synthetic Genetic Array (SGA) technology (Tong & Boone, 2006). Independent replicate cultures are inoculated and grown across several plates for each strain under each condition to capture biological and technical heterogeneity. Cultures are grown simultaneously and time course images captured by photography. Robotic assistance is required for both culture inoculation and image capture during genome-wide screens which can include approximately 5,000 independent genotypes. Raw QFA data (photographs) are converted into cell density estimates using the image analysis software Colonyzer (Lawless et al., 2010). Observed changes in cell density over time are converted to fitness estimates for both the control and query strain by fitting logistic growth curves to data. Genetic interactions are identified by finding mutants in the query screen whose fitnesses deviate significantly from predictions given by a theoretical model of genetic independence.

Addinall et al. (2011) describe using QFA to infer genetic interactions with telomere-specific query mutations. They use least squares methods to fit logistic growth curves to culture time courses, then generate a univariate fitness estimate for each time course. They use a linear model predicting query strain fitness given control strain fitness, consistent with Fisher's multiplicative model of genetic independence, to test for genetic interaction between the query mutation and each orfΔ. Deviation from the predicted linear relationship between the query and control fitnesses is evidence for genetic interaction between orfΔ and the query mutation. The significance of observed interactions is assigned using a simple frequentist linear modelling approach.
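As a rough illustration of the linear relationship described above, the sketch below fits a straight line through the origin to paired control and query fitnesses and reports each strain's deviation from that line. It is a simplified stand-in rather than the procedure of Addinall et al. (2011): the least-squares slope, the function names and the fitness values are all assumptions made for the example, and no significance test is included.

```python
def fit_independence_slope(control, query):
    """Least-squares slope of a line through the origin: query ~ slope * control.

    Under a multiplicative model of genetic independence, query fitness is
    expected to be proportional to control fitness, hence no intercept term.
    """
    numerator = sum(c * q for c, q in zip(control, query))
    denominator = sum(c * c for c in control)
    return numerator / denominator

def interaction_strengths(control, query):
    """Return the fitted slope and each strain's deviation from the fitted line.

    Positive deviations indicate strains fitter than expected in the query
    background (suppressors); negative deviations indicate enhancers.
    """
    slope = fit_independence_slope(control, query)
    deviations = [q - slope * c for c, q in zip(control, query)]
    return slope, deviations

# Made-up mean fitnesses for four hypothetical orf deletion strains.
control_fitness = [1.0, 2.0, 3.0, 4.0]
query_fitness = [0.5, 1.0, 1.5, 3.0]  # last strain is fitter than expected
slope, deviations = interaction_strengths(control_fitness, query_fitness)
```

Note that the outlying strain pulls the fitted slope upwards, so the other three strains receive small negative deviations; this sensitivity is one reason a model of strain-level variability is useful.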
One of the major limitations of the statistical model used in Addinall et al. (2011) is that it assumes each orfΔ fitness has the same variance. It is expected that explicit modelling of heterogeneity will allow more robust identification of interactions, particularly where variability for a particular strain is unusually high (e.g. due to experimental or technical difficulties).

1.1.1. Quantifying fitness

Observing changes in cell number in a microbial culture is the most direct way to estimate culture growth rate, an important component of microbial culture fitness. Direct counting of cell number on a high-throughput scale is not practical and so cell density estimates are made instead from culture photographs taken during QFA. Estimates of the integrated optical density (IOD) generated by the image analysis tool Colonyzer (Lawless et al., 2010) are used to capture cell density dynamics in independent cultures during QFA. Density estimates, scaled to normalise for camera resolution, are gathered for each culture and a dynamic model of population growth, the logistic model $\dot{x} = rx(1 - x/K)$ (Verhulst, 1845) (see Section 1.1.2), is fit to the data. Example photographic images of two yeast colonies inoculated by QFA, growing over time, along with corresponding quantitative measures of growth can be seen in Figure 1.3.

For a QFA screen, cultures are typically grown on 384-spot plates over time, where a process called spotting is used to inoculate microbial cultures on the plates. The spotting process involves a stage where microbial cultures are first diluted and then the diluted culture is spotted to the plate. Section 2.1.5 describes the spotting process and alternatives in further detail. An example 384-spot plate of yeast cultures is given in Figure 1.1. Yeast cultures in Figure 1.1 are all alive and have similar culture size. A cropped image of 15 yeast cultures from a 384-spot plate is given in Figure 1.2.
Yeast cultures in Figure 1.2 have different culture sizes; the smaller cultures have grown slowly relative to the larger cultures. An example of the raw time series data is given in the Appendix, Figure A.1.

Figure 1.1: Example 384-spot plate image from a yeast quantitative fitness analysis screen, taken approximately 3 days after inoculation. Yeast cultures are spotted and grown in regular arrays on solid agar plates.

Figure 1.2: Cropped image of 15 out of 384 spotted yeast cultures from a 384-spot plate, taken from a quantitative fitness analysis screen. Image taken approximately 3 days after inoculation. Yeast cultures are spotted and grown in regular arrays on solid agar plates.

Further detail on the QFA workflow and alternative 384-spot plate images can be found in Banks et al. (2012) and at http://research.ncl.ac.uk/qfa/.

After logistic growth model fitting, estimated logistic growth parameter sets can then be used to determine the fitness of a culture. If required, a univariate fitness definition can be chosen to summarise a set of logistic growth parameters (see Section 1.1.3).

[Figure 1.3 image: panel A shows timelapse images of his3Δ and htz1Δ cultures; panel B plots normalised cell density (AU) against time since inoculation (h).]

Figure 1.3: A) Timelapse images for two genetically modified S. cerevisiae cultures with different genotypes (indicated) corresponding to the time series measurements plotted in panel B. B) Time course cell density estimates derived from analysis of the timelapse images in panel A together with (least squares) fitted logistic growth curves.

1.1.2. The logistic growth model

The logistic model of population growth, an ordinary differential equation (ODE) describing the self-limiting growth of a population of size $x(t)$ at time $t$, was developed by Verhulst (1845),

\[ \frac{dx(t)}{dt} = r x(t) \left( 1 - \frac{x(t)}{K} \right). \tag{1.1} \]

The ODE has the following analytic solution:

\[ x(t; \theta) = \frac{K P e^{rt}}{K + P\left(e^{rt} - 1\right)}, \tag{1.2} \]

where $P = x(0)$ and $\theta = (K, r, P)$. The model describes a population growing from an initial size $P$ (culture inoculum density) with an intrinsic growth rate $r$, undergoing approximately exponential growth which slows as the availability of some critical resource (e.g. nutrients or space) becomes limiting (Jr. et al., 1976). Ultimately, population density saturates at the carrying capacity (maximum achievable population density) $K$, once the critical resource is exhausted. Appendix A.2 shows how to derive the solution of (1.1), given in (1.2). Two example logistic growth trajectories are given by the solid lines in Figure 1.3B. Where further flexibility is required, generalized forms of the logistic growth process (Tsoularis & Wallace, 2002; Peleg et al., 2007) may be used instead (see Section 2.5.2).

1.1.3. Fitness definitions

Culture fitness is an important phenotype, indicating the health of a culture. Several distinct quantitative fitness measures based on fitted logistic model parameters (1.2) can be constructed. Addinall et al. (2011) present three univariate measures suitable for QFA: Maximum Doubling Rate ($MDR$) and Maximum Doubling Potential ($MDP$), detailed in (1.3), and their product $MDR \times MDP$, where

\[ MDR = \frac{r}{\log\left( \frac{2(K - P)}{K - 2P} \right)} \quad \text{and} \quad MDP = \frac{\log\left( \frac{K}{P} \right)}{\log(2)}. \tag{1.3} \]

$MDR$ is the reciprocal of the minimum doubling time $T$ which a cell population takes to reach $2x(0)$, assuming the exponential phase begins at $t = 0$:

\[ \frac{x(T)}{x(0)} = 2. \]

Substituting (1.2) and rearranging gives the following expression for $MDR$:

\[ MDR = \frac{1}{T} = \frac{r}{\log\left( \frac{2(K - P)}{K - 2P} \right)}. \]

$MDP$ is the number of times population size doubles before reaching saturation, assuming geometric progression:

\[ x(0) \times 2^{MDP} = K. \]

Rearranging gives the following:

\[ MDP = \frac{\log\left( \frac{K}{P} \right)}{\log 2}. \]
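The definitions in (1.2) and (1.3) can be checked numerically. The following is a minimal sketch in Python, not the implementation used by Addinall et al. (2011); the function names and the parameter values are hypothetical and chosen only for illustration. Note that $MDR$ is only defined when $K > 2P$.

```python
import math


def logistic(t, K, r, P):
    """Analytic solution (1.2) of the logistic ODE (1.1)."""
    e = math.exp(r * t)
    return K * P * e / (K + P * (e - 1.0))


def mdr(K, r, P):
    """Maximum Doubling Rate from (1.3); requires K > 2P."""
    return r / math.log(2.0 * (K - P) / (K - 2.0 * P))


def mdp(K, P):
    """Maximum Doubling Potential from (1.3)."""
    return math.log(K / P) / math.log(2.0)


# Hypothetical parameter values, for illustration only.
K, r, P = 0.15, 3.0, 1e-4

T = 1.0 / mdr(K, r, P)            # minimum doubling time
doubled = logistic(T, K, r, P)    # population size at time T, expected to be 2P
fitness = mdr(K, r, P) * mdp(K, P)  # the product MDR x MDP
```

Evaluating `logistic(T, K, r, P)` recovers $2P$ and `P * 2 ** mdp(K, P)` recovers $K$, which mirrors the two defining relations used in the derivations above.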
$MDR$ captures the rate at which microbes divide when experiencing minimal inter-cellular competition or nutrient stress. A strain's growth rate largely dictates its ability to outcompete any neighbouring strains. $MDP$ captures the number of divisions the culture is observed to undergo before saturation. A strain which can divide a few more times than its neighbours in a specific environment also has a competitive advantage.

The choice of a single overall fitness score depends on the aspects of microbial physiology most relevant to the biological question at hand. Typically the fitness definition $MDR \times MDP$ is used in QFA to account for both attributes simultaneously. Other fitness definitions available include cell count, expected generation number and their approximations (Cole et al., 2007).

1.2. Epistasis

Epistasis is the phenomenon where the effects of one gene are modified by those of one or several other genes (Phillips, 1998). Besides the multiplicative model, there are other definitions for epistasis such as additive, minimum and log (Mani et al., 2008). Minimum is a suboptimal approach which may allow "masking" of interactions (Mani et al., 2008). For a typical yeast QFA screen comparison, Addinall et al. (2011) assume a multiplicative interaction model (1.4), but when dealing with measurements on a log scale this is effectively an additive interaction model (Aylor & Zeng, 2008). This highlights the point that multiplicative and additive models are equivalent if fitness data are scaled appropriately (Cordell, 2002).

1.2.1. Defining epistasis

As presented in Addinall et al. (2011), this study assumes Fisher's multiplicative model of genetic independence (1.4) (Cordell, 2002; Phenix et al., 2011), to represent the expected relationship between control strain fitness phenotypes and those of equivalent query strains in the absence of genetic interaction.
In this study, we interpret genotypes for which the query strain fitness deviates significantly from this model of genetic independence as interacting significantly with the query mutation. Square bracket notation is used to represent a quantitative fitness measure. For example, [wt] and [query] represent wild-type and query mutation fitnesses respectively. "Wild-type" strictly refers to the genotype that is prevalent among individuals in a natural (or wild) population. However, during laboratory cultivation of microbes it is more usual to introduce extra gene mutations to an ancestral lineage that is well established within the scientific community. Working with established lineages allows direct comparison with results from the literature without the confounding effect of sampling genotypes from natural populations, which are considerably more heterogeneous. Thus, in the context of this thesis, "wild-type" will refer to the reference strain, before additional mutations are introduced. orfΔ represents an arbitrary single gene deletion strain (i.e. a mutant from the control strain library). query:orfΔ represents an arbitrary single gene deletion from the query strain library (i.e. crossed with the query mutation). Fisher's multiplicative model of genetic independence is as follows:

\[ [\mathit{query}{:}\mathit{orf}\Delta] \times [\mathit{wt}] = [\mathit{query}] \times [\mathit{orf}\Delta] \tag{1.4} \]

\[ \Rightarrow [\mathit{query}{:}\mathit{orf}\Delta] = \frac{[\mathit{query}]}{[\mathit{wt}]} \times [\mathit{orf}\Delta]. \tag{1.5} \]

In (1.5), $[\mathit{query}]/[\mathit{wt}]$ is a constant for a given pair of QFA screens, meaning that if this model holds, there should be a linear dependence between [query:orfΔ] and [orfΔ] for all deletions orfΔ. During genome-wide screens of thousands of independent orfΔs, it can be assumed that the majority of gene mutations in the library do not interact with the chosen query mutations.
Therefore, even if the query or wild-type fitnesses are not available to us, the slope of this linear model can still be estimated by fitting it to all available fitness observations, before testing for strains which deviate significantly from the linear model. Any extra background condition, such as a gene mutation common to both the control and query strains (e.g. triple instead of double deletion strains for the query and control data sets), may change the interpretation or definition of the type of genetic interaction but the same linear relationship is applicable.

1.2.2. Addinall et al. (2011) Quantitative Fitness Analysis screen comparison

Addinall et al. (2011) present QFA where the logistic growth model (1.2) is fit to experimental data by least squares to give parameter estimates $(\hat{K}, \hat{r})$ for each culture time course (each orfΔ replicate). Inoculum density $P$ is assumed known and the same across all orfΔs and their repeats. After inoculating approximately 100 cells per culture, during the first several cell divisions there are so few cells that culture cell densities remain well below the detection threshold of cameras used for image capture and so, without sharing information across all orfΔ repeats, $P$ cannot be estimated directly. It is therefore necessary to fix $P$ to the same value for both screens, using an average estimate of $P$ from preliminary least squares logistic growth model fits. Fitting the model to each orfΔ repeat separately means there is no sharing of information within an orfΔ or between orfΔs when determining $\hat{K}$ and $\hat{r}$. By developing a hierarchical model to share information across orfΔ repeats for each orfΔ and between orfΔs, estimates for every set of logistic growth curve parameters $(K, r)$ can be improved and therefore for every strain fitness.
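To make the per-replicate least squares step concrete, the sketch below fits $(K, r)$ to a single culture time course with $P$ held fixed. It is an illustration only: a coarse grid search stands in for whatever optimiser was actually used in Addinall et al. (2011), and the function names, grids, time units (days) and the noise-free synthetic data are all hypothetical.

```python
import math


def logistic(t, K, r, P):
    """Analytic logistic solution (1.2)."""
    e = math.exp(r * t)
    return K * P * e / (K + P * (e - 1.0))


def sum_sq_error(times, obs, K, r, P):
    """Sum of squared residuals between the observations and the curve."""
    return sum((logistic(t, K, r, P) - y) ** 2 for t, y in zip(times, obs))


def fit_logistic_fixed_P(times, obs, P, K_grid, r_grid):
    """Return the (K, r) grid point minimising the sum of squares, P fixed."""
    best_K, best_r, best_sse = None, None, float("inf")
    for K in K_grid:
        for r in r_grid:
            sse = sum_sq_error(times, obs, K, r, P)
            if sse < best_sse:
                best_K, best_r, best_sse = K, r, sse
    return best_K, best_r


# Hypothetical noise-free time course generated from K = 0.15, r = 3.0.
P = 1e-4                                   # inoculum density, assumed known
times = [0.5 * i for i in range(9)]        # 0 to 4 days
obs = [logistic(t, 0.15, 3.0, P) for t in times]

K_grid = [0.05 + 0.01 * i for i in range(21)]   # candidate K values
r_grid = [1.0 + 0.25 * i for i in range(21)]    # candidate r values
K_hat, r_hat = fit_logistic_fixed_P(times, obs, P, K_grid, r_grid)
```

Because each time course is fitted in isolation, nothing in this procedure lets replicates of the same orfΔ inform one another, which is the limitation that the hierarchical models are intended to address.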
Quantitative fitness scores ($F_{cm}$) for each culture were defined as in (1.6) (see (1.3) for definitions of $MDR$ and $MDP$), where

\[ F_{cm} = MDR_{cm} \times MDP_{cm}. \tag{1.6} \]

The index $c$ identifies the condition for a given orfΔ: $c = 0$ for the control strain and $c = 1$ for the query strain. $m$ identifies an orfΔ replicate. Scaled fitness measures $\tilde{F}_{cm}$ are calculated for both the control and query screen such that the mean across all orfΔs for a given screen is equal to 1. After scaling, any evidence that $\tilde{F}_{0m}$ and $\tilde{F}_{1m}$ are significantly different will be evidence of genetic interaction.

The following linear model was fit to the control and query strain scaled fitness measure pairs $\tilde{F}_{cm}$ for each unique orfΔ in the gene deletion library:

\[ \tilde{F}_{cm} = \mu + \gamma_c + \varepsilon_{cm}, \quad \text{where } \gamma_0 = 0 \text{ and } \varepsilon_{cm} \sim \mathrm{N}(0, \sigma^2) \text{ is i.i.d.} \tag{1.7} \]

In (1.7), $\gamma_1$ represents the estimated strength of genetic interaction between the control and query strain. If the scaled fitnesses for the control and query strain are equivalent for a particular orfΔ such that they are both estimated by some $\mu$, i.e. no evidence of genetic interaction, we would expect $\gamma_1 = 0$. The model was fit by maximum likelihood, using the R function "lmList" (Pinheiro & Bates, 2000), with variation assumed to be the same for all strains in a given screen and the same for both control and query screens. So, for every gene deletion from the library an estimate of $\gamma_1$ was generated together with a p-value for whether it was significantly different from zero. False discovery rate (FDR) corrected q-values were then calculated to determine levels of significance for each orfΔ. Addinall et al. (2011) use the Benjamini-Hochberg test (Benjamini & Hochberg, 1995) for FDR correction. This test is commonly used in genomic analyses because, although it assumes independence of test statistics, positive correlation between tests only makes the FDR estimates slightly conservative.
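The FDR correction step can be sketched as follows. This is a plain-Python illustration of the standard Benjamini-Hochberg step-up adjustment, not the code used by Addinall et al. (2011); the function name and the example p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for four gene deletions.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = bh_qvalues(pvals)
significant = [q < 0.05 for q in qvals]
```

With these four illustrative p-values every q-value falls below the 0.05 cut-off, so all four deletions would be classed as interacting under the stringent criterion.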
Finally, a list of orfΔ names, ranked by $\gamma$ magnitudes, was output and orfΔs with q-values below a significance cut-off of 0.05 were classed as showing significant levels of genetic interaction with the query mutation.

1.2.3. Fitness plots

Fitness plots are used to show which orfΔs show evidence of genetic interaction from a QFA screen comparison. Figure 1.4 shows an example fitness plot taken from Addinall et al. (2011). Fitness plots typically show mean orfΔ fitnesses for control strains against those for the corresponding query strains. orfΔs with significant evidence of interaction are highlighted in the plot as red and green for suppressors and enhancers respectively. orfΔs without significant evidence of interaction are in grey. Solid and dashed grey lines are for a simple linear model fit (corresponding to a model of genetic independence) and the line of equal fitness respectively.

Figure 1.4: Fitness plot taken from Addinall et al. (2011). A yeast genome knock out collection was crossed to the cdc13-1 mutation, or as a control to the ura3Δ mutation. 8 replicate crosses were performed for the query and control strains. orfΔs with significant evidence of interaction are highlighted in red and green for suppressors and enhancers respectively. orfΔs without significant evidence of interaction are in grey and have no orf name label. Lenient and stringent classification of significant interaction is based on p-values < 0.05 and FDR corrected p-values (q-values) < 0.05 respectively. For a further description on fitness plots, see Section 1.2.3.

1.3. The stochastic logistic growth model

To account for uncertainty about processes affecting population growth which are not explicitly described by the deterministic logistic model, we can include a term describing intrinsic noise and consider an SDE version of the model.
Here we extend the ODE in (1.1) by adding a term representing multiplicative intrinsic noise (1.8) to give a model which we refer to as the stochastic logistic growth model (SLGM), which was first introduced by Capocelli & Ricciardi (1974),

\[ dX_t = r X_t \left( 1 - \frac{X_t}{K} \right) dt + \sigma X_t \, dW_t, \tag{1.8} \]

where $X_{t_0} = P$ and is independent of the Wiener process $W_t$, $t \geq t_0$. The Wiener process (or standard Brownian motion) is a continuous-time stochastic process, see Section 2.6.1. The Kolmogorov forward equation has not been solved for (1.8) (or for any similar formulation of a logistic SDE) and so no explicit expression for the transition density is available. Román-Román & Torres-Ruiz (2012) introduce a diffusion process approximating the SLGM with a transition density that can be derived explicitly (see Section 5.2).

Alternative stochastic logistic growth models to (1.8) are available. Allen (2010) derives the stochastic logistic growth models given in (1.9) and (1.10) from Markov jump processes (Allen, 2010; Wilkinson, 2011). Firstly,

\[ dX_t = r X_t \left( 1 - \frac{X_t}{K} \right) dt + \sqrt{r X_t} \, dW_t, \tag{1.9} \]

where $X_{t_0} = P$ and is independent of $W_t$, $t \geq t_0$. Secondly,

\[ dX_t = r X_t \left( 1 - \frac{X_t}{K} \right) dt + \sqrt{r X_t \left( 1 + \frac{X_t}{K} \right)} \, dW_t, \tag{1.10} \]

where $X_{t_0} = P$ and is independent of $W_t$, $t \geq t_0$. Note that (1.8), (1.9) and (1.10) are not equivalent to each other. (1.9) and (1.10) are able to describe the discreteness of the Markov jump processes that they approximate (or demographic noise). Demographic noise becomes less significant for large population sizes, therefore (1.9) and (1.10) describe more deterministic growth curves when population size is large (i.e. large carrying capacity $K$). Equation (1.8) introduces an additional parameter $\sigma$, unlike (1.9) and (1.10). The additional parameter in (1.8) allows us to tune the amount of noise in the system that is not directly associated with the noise due to the discreteness of the process (demographic noise). The additional parameter also gives (1.8) more flexibility for modelling intrinsic noise than (1.9) and (1.10). As the diffusion terms of (1.9) and (1.10) are functions of the logistic growth parameters, for large populations (1.9) and (1.10) can confound intrinsic noise with estimates of logistic growth parameters $r$ and $K$. For the above reasons, the SLGM in (1.8) is the most appropriate model for estimating logistic growth parameters of large populations, as intrinsic noise does not tend to zero with larger population sizes, unlike (1.9) and (1.10).

1.4. Outline of thesis

A brief outline of the thesis is as follows. Chapter 2 gives background to the biological and statistical methods used throughout the thesis. Yeast biology related to the QFA data sets analysed in this study is given as well as an introduction to Bayesian inference.

In Chapter 3 the SHM and IHM models for the new two-stage Bayesian QFA approach are presented. Next, the JHM for the new one-stage Bayesian QFA approach is presented. The chapter is concluded by introducing a two-stage frequentist QFA approach using a random effects model.

In Chapter 4 the new Bayesian approaches are applied to a previously analysed QFA data set for identifying genes interacting with a telomere defect in yeast. The chapter is concluded with an analysis of further QFA data sets with the JHM and two extensions of the JHM, included for further investigation and research.

Chapter 5 begins by introducing an existing logistic growth diffusion equation by Román-Román & Torres-Ruiz (2012). Two new diffusion equations for carrying out fast, Bayesian parameter estimation for stochastic logistic growth data are then presented. The chapter is concluded by comparing inference between the approximate models considered and with arbitrarily exact approaches.
Finally, Chapter 6 presents conclusions on the relative merits of the newly developed Bayesian approaches and stochastic logistic growth models. The chapter is concluded by discussing the broader implications of the results of the studies presented and scope for further research.

Chapter 2. Background

2.1. Yeast biology

Saccharomyces cerevisiae is a species of budding yeast widely used to study genetics. S. cerevisiae was the first eukaryote to have its genome completely sequenced (Goffeau et al., 1996). Yeast is ideal for high-throughput experimentation as it is easy to use and arrayed libraries of genetically modified yeast strains are readily available or obtainable for experiments (Zeyl, 2000). There are many different observable traits available with S. cerevisiae, such as size, opacity and density. There are about 6,000 genes in the S. cerevisiae genome, of which 5,800 are believed to be true functional genes (Cherry et al., 2012).

Yeasts are ideal for genome-wide analysis of gene function as genetic modification of yeast cells is relatively straightforward and yeast cultures grow quickly. Epistasis identified within a species of yeast may exist in the analogous genes within the human genome (Botstein et al., 1997). Therefore, finding genes involved in epistasis within yeast is of great interest outside the particular experimental species in question.

2.1.1. Telomeres

Telomeres are the ends of linear chromosomes and are found in most eukaryotic organisms (Olovnikov, 1996). Telomeres permit cell division and some researchers claim that telomere-induced replicative senescence is an important component of human ageing (Lydall, 2003). They cap (or seal) the chromosome end to ensure genetic stability and are believed to prevent cancer (Shay & Wright, 2005).

In Figure 2.1, an S.
cerevisiae chromosome is shown with the telomere single-stranded DNA (ssDNA) at the end, where DNA binding proteins such as Cdc13 are bound. Figure 2.1 also shows how telomere maintenance compares between a Homo sapiens (H. sapiens) and S. cerevisiae chromosome.

Telomere length decreases with each division of a cell until telomere length is very short and the cell enters senescence (Hayflick & Moorhead, 1961), losing the ability to divide. Some cancerous cells up-regulate the enzyme called telomerase which can prevent shortening of telomeres or elongate them, potentially allowing cancerous cells to live indefinitely (Wright & Shay, 1992).

Figure 2.1: Telomere at a chromosome end (diagram and legend taken from Dewar & Lydall (2012)). The telomere cap is evolutionarily conserved. Telomeres are nucleoprotein caps present at the ends of most eukaryotic chromosomes, consisting of double-stranded DNA (dsDNA) with a single-stranded DNA (ssDNA) overhang, bound by dsDNA- and ssDNA-binding proteins. Collectively, the telomere binding proteins "cap" the telomere and serve to regulate telomerase activity and inhibit the DNA damage response (DDR). In budding yeast, the telomeric dsDNA is bound by Rap1, which recruits the accessory factors Rif1 and Rif2. In humans, the telomeric dsDNA is bound by TRF1 and TRF2 (held together by TIN2) and TRF2 recruits RAP1 to telomeres. In budding yeast, Cdc13 binds the telomeric ssDNA and recruits Stn1 and Ten1 to form the CST (Cdc13-Stn1-Ten1) complex, while in humans, the telomeric ssDNA is bound by POT1. In human beings, POT1 and TRF1-TRF2-TIN2 are linked together by TPP1, which may permit the adoption of higher-order structures.
In both budding yeast and humans, the Ku complex, a DDR component that binds to both telomeres and double-strand breaks (DSBs), also binds and plays a protective role.

It is believed that telomeres are partly responsible for ageing; without the enzyme telomerase, a fixed limit to the number of times the cell can divide is set by the telomere shortening mechanism because of the end replication problem (Levy et al., 1992).

2.1.2. The end replication problem

In eukaryote cell replication, shown in Figure 2.2, new strands of DNA are synthesised in the 5′ to 3′ direction (red arrows). The leading strand is therefore completed in one section, whereas the lagging strand must be formed via backstitching with smaller sections known as Okazaki fragments (Lydall, 2003). Figure 2.2 shows how the lagging strand is left with a 3′ overhang, with the removal of the terminal primer at the end, and how the leading strand is left with a blunt end (David Wynford-Thomas, 1997). Telomerase fixes this problem by extending the 3′ end to maintain telomere length (Levy et al., 1992). Without telomerase, the leading strand is shortened (Olovnikov, 1973) and telomere capping proteins such as Cdc13 in yeast bind to the ssDNA that remains. Most eukaryotic cells have telomerase activated and may maintain DNA replication indefinitely. Not all mammalian cells have telomerase activated and it is believed this problem then leads to the shortening of their telomeres and ultimately senescence.

Figure 2.2: The end replication problem (diagram and legend taken from Lydall (2003)). (A) Telomeres in all organisms contain a short 3′ overhang on the G rich strand. (B) A replication fork moving towards the end of the chromosome.
(C) The newly replicated, lagging C strand, will generate a natural 3′ overhang when the ribonucleic acid (RNA) primer is removed from the final Okazaki fragment, or if the lagging strand replication machinery cannot reach the end of the chromosome. In the absence of nuclease activity the unreplicated 3′ strand will be the same length as it was prior to replication. (D) The newly replicated leading G strand will be the same length as the parental 5′ C strand, and blunt ended if the replication fork reaches the end of the chromosome. Therefore the newly replicated 3′ G strand will be shorter than the parental 3′ strand and unable to act as a substrate for telomerase because it does not contain a 3′ overhang. If the leading strand replication fork does not reach the end of the chromosome a 5′ rather than 3′ overhang would be generated, but this would not be a suitable substrate for telomerase.

2.1.3. CDC13 and cdc13-1

CDC13 is an essential telomere-capping gene in S. cerevisiae (Zubko & Lydall, 2006). The protein Cdc13, encoded by CDC13, binds to telomeric DNA (see Figure 2.1), forming a nucleoprotein structure (Lustig, 2001). Cdc13 regulates telomere capping and is part of the CST complex with Stn1 and Ten1 (Wellinger, 2009). This provides protection from degradation by exonucleases such as Exo1. cdc13-1 is a temperature-sensitive allele of the CDC13 gene that has temperature sensitivity above 26 °C, where the capping ability of the protein is reduced (Nugent et al., 1996). By inducing the temperature sensitivity of Cdc13-1, telomere maintenance is disrupted. Much research on telomere integrity focuses on the CST complex, and cdc13 mutations such as cdc13-1 and cdc13-5 are often considered (see, for example, Anbalagan et al., 2011; Foster et al., 2006).

2.1.4. URA3

URA3 is a gene that encodes orotidine 5′-phosphate decarboxylase (Cong et al., 2002).
URA3 is used as a genetic marker for DNA transformations, allowing both positive and negative selection depending on the choice of media (Kaneko et al., 2009). In Addinall et al. (2011) ura3Δ is used as a control mutation because it is neutral under the experimental conditions. For a QFA comparison, constructing a query mutation such as cdc13-1 typically involves adding selection markers to the genome. To ensure that the same selection markers are found in both the query and control strains, and that the control and query screens can be carried out in comparable environments, a neutral mutation such as ura3Δ can be introduced to the control strain. URA3 encodes an enzyme called ODCase. Deleting URA3 causes a loss of ODCase, which leads to a reduction in cell growth unless uracil is added to the media (Jones, 1992). Addinall et al. (2011) include uracil in their media so that ura3Δ is effectively a neutral deletion, approximating wild-type fitness. As a control deletion, ura3Δ is not expected to interact with the query mutation, the library of orfΔs in the control and query screen or any experimental condition of interest such as temperature.

2.1.5. High-throughput methodology for Quantitative Fitness Analysis

To collect enough data to perform QFA (Addinall et al., 2011), a methodology such as high-throughput screening is required (Soon et al., 2013; An & Tolliday, 2009). High-throughput screening is most notably used in the field of biology for genome-wide suppressor/enhancer screening and drug discovery. The automation of experimental procedures through robotics, software, sensors and controls allows a researcher to carry out large scale experimentation quickly and more consistently.

Hundreds of microbial strains with various gene deletions need to be systematically created, cultured and then have measurable traits quantified.
The repeatability of microbial culture growth is ideal to give sufficient sample sizes for identifying both variation and significance in high-throughput experimentation (Xu, 2010). The quality of the quantitative data is critical for identifying significantly interacting genes. To measure the phenotypes of different mutant strains of a micro-organism such as yeast (Zeyl, 2000), a process called spotting is used. This process is different to a typical SGA experiment where pinning would be used (see, for example, Tong & Boone (2006)). Pinning is a quicker but less quantitative process where the microbial strains are typically directly pinned to a 1536 plate and allowed to grow until image analysis starts. Spotting on the other hand has a stage where the cultures are diluted and then the dilute culture is spotted in 384 format to give a more accurate reading in image analysis. This in turn gives rise to much more accurate time series data for modelling. Figure 2.3 illustrates the spotting process. An image opacity measure is typically used as a proxy for the density of microbial colonies. Time lapse photographs are taken of the 384-spot plates after incubation, using high resolution digital cameras, to measure growth. A software package such as Colonyzer (Lawless et al., 2010) can then be used to determine a quantitative measure of fitness from the photographs taken of the cultures grown on the plates. To ensure a consistent method to capture images of microbial colonies, all cameras should be of the same make and model.

2.2. Comparing lists of genes

Upon completing a QFA screen comparison, a list of genes ordered by genetic interaction strength can be obtained. Lists of ordered genes can be used to compare two different statistical approaches for a QFA screen comparison.
A comparison of two lists can be carried out through standard statistical similarity measures such as the Jaccard index or Spearman's rank correlation coefficient. Observing only the subset of genes showing significant evidence of genetic interaction, two lists of genes can be compared using the Jaccard index (Cheetham & Hazel, 1969), see Section 2.2.1. The Jaccard index does not account for the ordering of genes and depends on the number of interactions identified, which in turn depends on a significance cut-off that is chosen or influenced by the experimenter. Due to these undesirable properties of the Jaccard index, this method is not appropriate for an unbiased comparison of statistical methods. Spearman's rank correlation coefficient (Kowalczyk et al., 2004) is able to account for the ordering of genes and for the whole list of genes available, see Section 2.2.2.

Gene ontology (GO) term enrichment can be used to suggest which list of genetic interactions has the most biological relevance (Consortium, 2004). There are many other alternative approaches available for the comparison of two gene lists (Yang et al., 2006; Lottaz et al., 2006). Using both Spearman's correlation coefficient and GO term enrichment analysis of gene lists allows for both an unbiased statistical and biological comparison of two lists of ordered genes.

Figure 2.3: The spotting procedure for robotic inoculation of yeast strains in 384-spot format (diagram and legend taken from Banks et al. (2012)). This procedure begins with 1536 independent cultures per plate (left). In this typical example, colonies at positions 1,1; 1,2; 2,1 and 2,2 (colored red) are four replicates of the same genotype. his3::KANMX cultures in yellow, growing on the edge of the plate, have a growth advantage due to lack of competition and are therefore not examined by Quantitative Fitness Analysis.
One of these replicates (e.g. 1,1) is inoculated into liquid growth media in 96-well plates using a 96-pin tool which inoculates 96 out of 1536 colonies each time. In order to inoculate one replicate for each of 384 gene deletions, four different "quadrants" (indicated as red, blue, green and purple) are inoculated into four different 96-well plates containing growth media. After growth to saturation (e.g. 3 days at 20 °C), cultures are diluted in water, then the four quadrants from one repeat are spotted in 384-format onto a solid agar plate (right) in the same pattern as the original Synthetic Genetic Array plate (as indicated by color). The process can be repeated to test other replicates: 1,2; 2,1 and 2,2. Example time-lapse images on the right were captured 0.5, 2 and 3.5 days after inoculation.

2.2.1. Jaccard index

For two sample sets, the Jaccard index (Jaccard, 1912; Cheetham & Hazel, 1969) gives a measure of similarity. Where $A$ and $B$ are two sample sets of interest, the Jaccard index is as follows:
\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|}. \]
The value of $J(A, B)$ can range from 0 to 1, with a larger value indicating greater similarity.

2.2.2. Spearman's rank correlation coefficient

Spearman's rank correlation coefficient (Spearman, 1987; Kowalczyk et al., 2004) allows comparison of two variables $X_i$ and $Y_i$, both of sample size $n$. First, $X_i$ and $Y_i$ are both converted into ranks $x_i$ and $y_i$. Where there are rank ties or duplicate values, each tied value is assigned the average of the positions the tied values occupy. Spearman's rank correlation coefficient is as follows:
\[ \rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}. \]
The value of $\rho$ can range from $-1$ to $1$. The closer the relationship between the two variables is to a monotonic function, the larger the magnitude of $\rho$.

2.2.3. Gene ontology term enrichment analysis

Gene ontology (GO) term enrichment analysis can give insight into the biological functions of a list of genes (Consortium, 2004). A list of GO terms can be acquired from a list of genes. For yeast, the Saccharomyces Genome Database (SGD) (Cherry et al., 2012) can be used to find GO term associations for each gene in the genome. A statistical analysis is carried out to determine which GO terms are most prevalent in a list of genes. The experimenter can then look at GO terms of interest, find out which genes they correspond to and how many are identified in the list.

An unbiased GO term enrichment analysis on a list of genes can be carried out using the software R (R Core Team, 2013) and the Bioconductor package GOstats (Falcon & Gentleman, 2007). There are many other software packages and online services available to carry out a GO term enrichment analysis, such as the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang et al., 2008, 2009) or the Gene Ontology Enrichment Analysis and Visualization tool (GOrilla) (Eden et al., 2009, 2007).

A GO term clustering analysis is a statistical approach that can be used to follow up a GO term analysis. Information on the relation of GO terms is used in a clustering analysis to find functionally related groups of GO terms. The bioinformatics tool DAVID (Huang et al., 2008, 2009) can be used to carry out GO term clustering (david.abcc.ncifcrf.gov/).

2.3. Bayesian inference

A classical (or frequentist) statistical approach typically assumes unknown model parameters are constants and uses the likelihood function to make inference. An alternative methodology is a Bayesian approach (Bernardo & Smith, 2007; Gelman et al., 2003), named after Thomas Bayes (Bayes & Price, 1763).
In a Bayesian setting, a parametric model similar to that of the frequentist approach can be assumed, but model parameters are treated as random variables. This feature allows any prior knowledge for a given parameter to be incorporated into inference by building a prior distribution to describe the information available. We are interested in the posterior distribution, that is, the probability of the parameters given the evidence. More precisely, where $D$ is the observed data and $\theta$ is the set of parameters of interest, we are interested in calculating the posterior density $\pi(\theta \mid D)$. A priori knowledge of $\theta$ is described by $\pi(\theta)$ and the likelihood of the data by $L(D \mid \theta)$. Using Bayes' theorem we obtain the following:
\[ \pi(\theta \mid D) \propto \pi(\theta) L(D \mid \theta) \quad \text{or} \quad \text{posterior} \propto \text{prior} \times \text{likelihood}. \]

2.3.1. Markov chain Monte Carlo

In Bayesian inference we are typically interested in sampling from the posterior distribution or one of its marginals, but often this is difficult. Markov chain Monte Carlo (MCMC) methods are used for sampling from probability distributions (Gamerman, 1997; Gilks et al., 1995). The Monte Carlo name describes the repeated random sampling used to compute results. A Markov chain can be constructed with an equilibrium distribution that is the posterior distribution of interest. A Markov chain $\{X_n, n \in \mathbb{N}_0\}$ is a stochastic process which satisfies the Markov property (or "memoryless" property): for $A \subseteq S$, where $S$ is the continuous state space such that $X_n \in S$,
\[ P(X_{n+1} \in A \mid X_n = x, X_{n-1} = x_{n-1}, \ldots, X_0 = x_0) = P(X_{n+1} \in A \mid X_n = x), \quad \forall x, x_{n-1}, \ldots, x_0 \in S. \]
The equilibrium distribution $\pi(x)$ is a limiting distribution of a Markov chain with the following two properties. First, there must exist a distribution $\pi(x)$ which is stationary. This condition is guaranteed when the Markov chain satisfies detailed balance:
\[ \pi(x) p(x, y) = \pi(y) p(y, x), \quad \forall x, y, \]
where $p(x, y)$ is the transition density kernel of the chain. Secondly, the stationary distribution $\pi(x)$ must be unique. This is guaranteed by the ergodicity of the Markov process; see Gamerman (1997) for a definition and sufficient conditions.

2.3.2. Metropolis-Hastings algorithm

The Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) is an MCMC method for obtaining a random sample from a probability distribution of interest (or stationary distribution) (Chib & Greenberg, 1995). With the following procedure a sample from the stationary distribution of the Markov chain can be obtained:

1) Initialise counter $i = 0$ and initialise $X_0 = x_0$.
2) From the current position $X_i = x$, generate a candidate value $y^*$ from a proposal density $q(x, y)$.
3) Calculate a probability of acceptance $\alpha(x, y^*)$, where
\[ \alpha(x, y) = \begin{cases} \min\left\{1, \dfrac{\pi(y) q(y, x)}{\pi(x) q(x, y)}\right\} & \text{if } \pi(x) q(x, y) > 0, \\ 1 & \text{otherwise.} \end{cases} \]
4) Accept the candidate value with probability $\alpha(x, y^*)$ and set $X_{i+1} = y^*$, otherwise reject and set $X_{i+1} = x$.
5) Store $X_{i+1}$ and iterate $i = i + 1$.
6) Repeat steps 2-5 until the sample size required is obtained.

The choice of proposal density is important in determining how many iterations are needed to converge to a stationary distribution. There are many choices of proposal distribution (Gamerman, 1997); the simplest case is the symmetric chain. The symmetric chain involves choosing a proposal where $q(x, y) = q(y, x)$, such that the acceptance probability in step 3 simplifies to give the following:
\[ \alpha(x, y) = \begin{cases} \min\left\{1, \dfrac{\pi(y)}{\pi(x)}\right\} & \text{if } \pi(x) > 0, \\ 1 & \text{otherwise.} \end{cases} \]
Other common choices are random walk chains and independence chains. For a random walk chain, the proposed value at stage $i$ is given by $y^* = x_i + w_i$, where the $w_i$ are i.i.d. random variables.
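The random walk chain just described can be sketched in code. This is a minimal illustration rather than the sampler used in this thesis: it assumes a one-dimensional target, a Normal innovation $w_i$ with standard deviation `step_sd` (a tuning parameter), and a standard Normal target chosen purely as an example. The function and argument names are hypothetical.

```python
import math
import random


def random_walk_metropolis(log_target, x0, step_sd, n_samples, rng):
    """Random walk Metropolis sampler for a one-dimensional target.

    log_target: function returning log pi(x), up to an additive constant.
    step_sd:    standard deviation of the Normal innovation w_i (assumed here).
    Returns the list of sampled states and the observed acceptance rate.
    """
    x = x0
    log_px = log_target(x)
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        # Step 2: propose y* = x + w with w ~ N(0, step_sd^2).
        y = x + rng.gauss(0.0, step_sd)
        log_py = log_target(y)
        # Step 3: the proposal is symmetric, so alpha = min(1, pi(y*) / pi(x)).
        alpha = math.exp(min(0.0, log_py - log_px))
        # Step 4: accept with probability alpha, otherwise stay at x.
        if rng.random() < alpha:
            x, log_px = y, log_py
            n_accepted += 1
        # Step 5: store the current state, whether or not the move was accepted.
        samples.append(x)
    return samples, n_accepted / n_samples


def log_standard_normal(x):
    """Illustrative target: log density of N(0, 1) up to a constant."""
    return -0.5 * x * x


if __name__ == "__main__":
    chain, rate = random_walk_metropolis(
        log_standard_normal, x0=0.0, step_sd=2.4, n_samples=20000,
        rng=random.Random(1))
    mean = sum(chain) / len(chain)
    print("acceptance rate:", round(rate, 3), "sample mean:", round(mean, 3))
```

Working with log densities avoids numerical underflow when the target density is very small. In practice `step_sd` would be adjusted during a burn-in period, as discussed in Section 2.3.4.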
The distribution for $w_i$ must therefore be chosen, and is typically a Normal or Student's $t$-distribution centred at zero. If the distribution for $w_i$ is symmetric, the random walk is a special case of symmetric chains. For an independence chain, the proposed transition is formed independently of the previous position of the chain, thus $q(x, y) = f(y)$ for some density $f(\cdot)$:
\[ \alpha(x, y) = \begin{cases} \min\left\{1, \dfrac{\pi(y) f(x)}{\pi(x) f(y)}\right\} & \text{if } \pi(x) f(y) > 0, \\ 1 & \text{otherwise.} \end{cases} \]
Parameters within our proposal distribution are known as tuning parameters. They are typically used to adjust the probability of acceptance or improve mixing and must be chosen through some automatic procedure or manually, see Section 2.3.4.

2.3.3. Gibbs sampling

The Gibbs sampler (Gelfand & Smith, 1990; Casella & George, 1992) is an MCMC algorithm for obtaining a random sample from a multivariate probability distribution of interest $\pi(\theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_d)$. Suppose that the full conditional distributions $\pi(\theta_i \mid \theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_d)$, $i = 1, \ldots, d$, are available. Where it is simpler to sample from the conditional distributions than to marginalize by integrating over the joint distribution, the Gibbs sampler is applicable. The following procedure sequentially samples from the full conditional distribution for each parameter, resulting in a sample from the probability distribution of interest. The algorithm is as follows:

1) Initialise counter $i = 1$ and parameters $\theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)}, \ldots, \theta_d^{(0)})$.
2) Simulate $\theta_1^{(i)}$ from $\theta_1^{(i)} \sim \pi(\theta_1 \mid \theta_2^{(i-1)}, \ldots, \theta_d^{(i-1)})$.
3) Simulate $\theta_2^{(i)}$ from $\theta_2^{(i)} \sim \pi(\theta_2 \mid \theta_1^{(i)}, \theta_3^{(i-1)}, \ldots, \theta_d^{(i-1)})$.
4) ...
5) Simulate $\theta_d^{(i)}$ from $\theta_d^{(i)} \sim \pi(\theta_d \mid \theta_1^{(i)}, \ldots, \theta_{d-1}^{(i)})$.
6) Store $\theta^{(i)} = (\theta_1^{(i)}, \theta_2^{(i)}, \ldots, \theta_d^{(i)})$ and iterate $i = i + 1$.
7) Repeat steps 2-6 until the sample size required is obtained.
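The steps above can be sketched for a case where both full conditionals are known in closed form. The example is an illustrative assumption rather than a model from this thesis: the target is a standard bivariate Normal with correlation $\rho$, for which $\theta_1 \mid \theta_2 \sim \mathrm{N}(\rho\theta_2, 1 - \rho^2)$ and, symmetrically, $\theta_2 \mid \theta_1 \sim \mathrm{N}(\rho\theta_1, 1 - \rho^2)$. Function names and the choice of burn-in are hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in, rng, theta_init=(0.0, 0.0)):
    """Gibbs sampler for a standard bivariate Normal with correlation rho.

    Each sweep draws theta_1 from its full conditional given the latest
    theta_2, then theta_2 from its full conditional given the new theta_1.
    """
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of each full conditional
    theta1, theta2 = theta_init
    draws = []
    for i in range(burn_in + n_samples):
        # Step 2: theta_1 | theta_2 ~ N(rho * theta_2, 1 - rho^2)
        theta1 = rng.gauss(rho * theta2, cond_sd)
        # Step 3: theta_2 | theta_1 ~ N(rho * theta_1, 1 - rho^2)
        theta2 = rng.gauss(rho * theta1, cond_sd)
        # Step 6: store the sweep once the burn-in period has passed.
        if i >= burn_in:
            draws.append((theta1, theta2))
    return draws


def sample_correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(0.8, 20000, 500, random.Random(1))
    print("estimated correlation:", round(sample_correlation(draws), 3))
```

Every draw is accepted, so no proposal needs tuning; the cost is that successive sweeps become strongly correlated when $\rho$ is close to $\pm 1$.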
To ensure the full conditional distributions for each parameter in a Bayesian model are known and easy to handle, conjugacy can be used. Conjugacy is where the prior is of the same family as the posterior. Conjugacy can be induced by the choice of prior; for example, if it is known that a likelihood is Normal with known variance, a Normal prior over the mean will ensure that the posterior is also a Normal distribution.

2.3.4. Convergence issues

To accept output from MCMC algorithms, all chains are required to have reached convergence (Gamerman, 1997; Cowles & Carlin, 1996). Convergence is a requirement to gain unbiased samples of a posterior distribution. Visual and statistical tests can be used to determine if chains have converged, see Section 2.3.5.

Other issues that we must consider for MCMC sampling algorithms are choice of tuning parameters, burn-in period, sample size and thinning, if required. Tuning parameters should be chosen to give a good proposal distribution, preferably with high acceptance rates and good mixing. There are many schemes available for the choice of tuning parameters (Andrieu & Thoms, 2008). Typically tuning parameters are determined during a burn-in period. The burn-in period is the number of initial iterations for which an algorithm must be run in order to converge to equilibrium. Sample size depends on how many iterations from the posterior are required for both inference and testing convergence. Thinning involves discarding output for some iterations of an MCMC algorithm, in order to give less dependent realizations from the posterior distribution.

Extending the length of the burn-in period, sample size and thinning leads to increased computational time. With large data sets and models with a large number of parameters, computation time can become a problem. With a Bayesian modelling approach, computational time associated with MCMC can be much longer than for a much simpler least squares approach.
This problem is exacerbated when coupled with poor mixing and is likely to lead the experimenter to simplify their modelling procedure, consequently sacrificing the quality of inference, in order to complete their analysis within a shorter time frame.

2.3.5. Convergence diagnostics

To determine whether chains are true samples from their target distributions, tests for lack of convergence or mixing problems (Gamerman, 1997; Cowles & Carlin, 1996) must be carried out. Typically multiple tests are used to give confidence that the output has converged. There are many diagnostics for testing chains for convergence, for example the Heidelberger-Welch (Heidelberger & Welch, 1981) and Raftery-Lewis (Raftery & Lewis, 1995) tests. For many convergence diagnostics, summary statistics such as p-values can be used to decide whether convergence has been reached.

Visual inspection of diagnostic plots can also be used to determine if convergence has been reached. Trace plots are used to check that samples from the posterior distribution remain within a fixed region of plausible values rather than drifting across the whole range. ACF (auto-correlation function) plots are used to determine serial correlation between sample values of the posterior distribution in order to check for the independence of observations. Density plots are used to check whether a sample posterior distribution is restricted by the choice of prior distribution and to determine whether the choice of prior is appropriate. Running multiple instances of our MCMC algorithm and comparing chains can also help us decide whether our chains have converged.

2.3.6. Computer programming

To ensure results and inference are reproducible, it is useful to create a computer package so that an analysis can be repeated in the future without rewriting all of the code required.
Using freely available software such as the statistical program R (R Core Team, 2013), scripts and commands can be built and shared for easy implementation of code.

Where fast inference is of importance, the choice of programming language is an important consideration. The software package R can also be used as an interface for running code in the C programming language. Statistical code written in the C programming language is typically much faster than using standard R functions or code written in many other programming languages (Fourment & Gillings, 2008).

2.4. Hierarchical modelling

Hierarchical modelling is used to describe the structure of a problem where we believe some population level distribution exists, describing a set of unobserved parameters (Gelman et al., 2003). Examples include pupils nested within classes, children nested within families and patients nested within hospitals. With the pupil-class relationship (2-level hierarchy), for a given class there may be a number of pupils. We may believe that by being in the same class, pupils will perform similarly in an exam as they are taught by the same teacher. Further, we may have a pupil-class-school relationship (3-level hierarchy). For a given school, multiple classes exist and in each class there is a number of pupils. We may believe that being within the same school, classes would perform similarly in an exam as they share the same head teacher or school principal.

Hierarchical modelling is used to describe a parent/child relationship (Gelman & Hill, 2006). Repeating the parent/child relationship allows multiple levels to be described. Where a hierarchical structure is known to exist, describing this experimental structure avoids confounding of effects with other sources of variation.

There are many different hierarchical models available, depending on what the experimenter is most interested in (Zuur et al., 2009; Goldstein, 2011). Sharing of information can be built into hierarchical models by the sharing of parameters. Allowing parameters to vary at more than one level allows an individual child (subject) effect to be examined. A typical frequentist hierarchical model is built with random effects and has limited distributional assumptions available, whereas a Bayesian hierarchical model is flexible enough to describe various distributions (Gelman, 2006), see Section 2.4.1.

Plate diagrams allow hierarchical models to be represented graphically (Lunn et al., 2000b; Thulasiraman, 1992). Nodes (circles) are used to describe parameters and plates (rectangles) to describe repeating nodes. The use of multiple plates allows nesting to be described.

2.4.1. Distributional assumptions

The flexibility of the Bayesian paradigm allows for models to be built that are otherwise not practical in the frequentist paradigm. More appropriate assumptions can therefore be made to better describe experimental structure and variation in a Bayesian setting (Gelman et al., 2003). For example, inference for a hierarchical $t$-distribution or hierarchical variable selection model in a frequentist context is difficult in practice without using MCMC methods, which are a more natural fit with Bayesian approaches.

The use of prior distributions allows information from the experimenter and experimental constraints to be incorporated; for instance, if a parameter is known to be strictly positive then a positive distribution can be used to enforce this. Truncation can be used to avoid searching areas of the posterior with extremely low probability.

2.4.2. Indicator variables

Indicator variables are used in variable selection models to describe binary variables (O'Hara & Sillanpaa, 2009). A Bernoulli distributed indicator variable can take the value 0 or 1 to indicate the absence or presence of an effect and can be used to describe binary outcomes such as gender.

2.4.3. The three parameter t-distribution

The Student's $t$-distribution has one parameter, namely the degrees of freedom parameter $\nu$, which controls the kurtosis of the distribution (Johnson et al., 1995). The Student's $t$-distribution is as follows:
\[ t_1(x; \nu) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu + 1}{2}}, \quad x \in \mathbb{R},\ \nu \in \mathbb{R}^+. \tag{2.1} \]
Decreasing the degrees of freedom parameter $\nu$ increases the heaviness of the distribution's tails. Adding a location parameter $\mu$ and scale parameter $\sigma$ allows further flexibility with the shape of the distribution (Jackman, 2009). The scale parameter $\sigma$ does not correspond to a standard deviation but does control the overall scale of the distribution. The three parameter $t$-distribution (or scaled $t$-distribution) is then as follows:
\[ t_3(x; \mu, \nu, \sigma) = \frac{1}{\sigma}\, t_1\left(\frac{x - \mu}{\sigma}; \nu\right), \quad x \in \mathbb{R},\ \nu \in \mathbb{R}^+, \]
where $t_1$ is given in (2.1).

2.5. Generalisations of the logistic growth model

Where more flexibility than the logistic growth model is required, the logistic growth model (1.1) can be extended by adding parameters (Tsoularis & Wallace, 2002; Jr. et al., 1976). A common extension of the logistic growth model is Richards' growth model (Richards, 1959; Peleg et al., 2007), which adds a single parameter for changing the shape of growth. A more general case of both the logistic and Richards' growth models is the generalised logistic growth model. Similarly to the logistic growth model (1.1) and its stochastic counterpart (1.8), these more general equations can be extended to diffusion equations if required.

2.5.1. Richards' growth model

Richards' growth model (Richards, 1959) adds an extra parameter $\beta$ to the logistic growth equation (1.1). The parameter $\beta$ affects where maximum growth occurs and consequently the relative growth rate (Tsoularis & Wallace, 2002). Richards' growth model is as follows:
\[ \frac{dx_t}{dt} = r x_t \left[1 - \left(\frac{x_t}{K}\right)^{\beta}\right]. \tag{2.2} \]
The ODE has the following analytic solution:
\[ x_t = \frac{K}{\left(1 + Q e^{-r\beta t}\right)^{1/\beta}}, \quad \text{where} \quad Q = \left[\left(\frac{K}{P}\right)^{\beta} - 1\right] e^{\beta t_0}, \]
$\beta$ is a positive real number and $t \geq t_0$. When $\beta = 1$, Richards' growth model is equivalent to the logistic growth equation.

2.5.2. Generalised logistic growth model

The generalised logistic growth model adds extra parameters $(\alpha, \beta, \gamma)$ to the logistic growth equation (1.1). The extra parameters $(\alpha, \beta, \gamma)$ affect where maximum growth occurs and the relative growth rate (Tsoularis & Wallace, 2002), and give a greater selection of curve shapes than Richards' growth model (2.2). The generalised logistic growth model is as follows:
\[ \frac{dx_t}{dt} = r x_t^{\alpha} \left[1 - \left(\frac{x_t}{K}\right)^{\beta}\right]^{\gamma}, \tag{2.3} \]
where $(\alpha, \beta, \gamma)$ are positive real numbers and $t \geq t_0$. The generalised logistic growth model cannot in general be integrated to give an analytical solution for $x_t$. When $\alpha = 1$, $\beta = 1$ and $\gamma = 1$, the generalised logistic growth model is equivalent to the logistic growth equation.

2.6. State space models

A state space model describes the probabilistic dependence between a measurement process $Y_t$ and a state process $X_t$ (West & Harrison, 1997; Durbin et al., 2004). The most basic case of a state space model is as follows:
\[ \begin{aligned} (X_t \mid X_{t-1} = x_{t-1}) &\sim f(t, x_{t-1}), \\ (Y_t \mid X_t = x_t) &\sim g(t, x_t), \end{aligned} \tag{2.4} \]
where $f$ and $g$ are known. A state space model with a linear Gaussian structure has the advantage of allowing us to carry out more efficient MCMC by integrating out latent states with a Kalman filter, instead of imputing all states. The probabilistic representation and the ability to incorporate prior information make Bayesian inference an appropriate choice for parameter estimation of a state space model.

State space representation provides a general framework for analysing stochastic dynamical systems observed through a stochastic process.
A state space model allows us to include both an internal state variable and an output variable in our model. The state space representation of a stochastic process with measurement error can be given by (2.4), where $f$ is the transition density of the process and $g$ is the assumed measurement error. Inference methods are also readily available to carry out estimation of state space models.

2.6.1. Stochastic differential equations

An ordinary differential equation (ODE) can be used to model a system of interest. For systems with an inherently stochastic nature we require a stochastic model. A stochastic differential equation (SDE) is a differential equation where one or more terms include a stochastic process (Wilkinson, 2011; Øksendal, 2010). An SDE differs from an ODE by the addition of a diffusion term, typically a Wiener process, used to describe the intrinsic noise of a given process. A Wiener process (or standard Brownian motion) is a continuous-time stochastic process. A Wiener process $W(t)$, $t \geq 0$, has the following three properties (Durrett, 1996):

1) $W(0) = 0$.
2) The function $t \to W(t)$ is almost surely everywhere continuous.
3) $W(t)$ has independent increments with $W(t) - W(s) \sim \mathrm{N}(0, t - s)$, for $0 \leq s < t$.

Intrinsic noise from a Wiener process perturbs the system dynamics of a differential equation. The intrinsic noise is able to propagate through the process, unlike measurement noise. Instead of inappropriately modelling intrinsic noise by measurement noise, an SDE allows our process to model both system and measurement noise separately. The simplest case of a stochastic differential equation is of the form:
\[ dX(t) = \mu\,dt + \sigma\,dW(t), \]
where $W$ denotes a Wiener process. Parameters $\mu$ and $\sigma$ may depend on time and correspond to the drift and diffusion coefficients respectively.
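When $\mu$ and $\sigma$ are constant, property 3 of the Wiener process gives the increments of $X$ exactly: $X(t + \Delta t) - X(t) \sim \mathrm{N}(\mu\Delta t, \sigma^2\Delta t)$. A minimal sketch of simulating one path on an equally spaced grid follows; the function name, arguments and parameter values are illustrative assumptions and not code from this thesis.

```python
import math
import random


def simulate_drift_diffusion(x0, mu, sigma, t_end, n_steps, rng):
    """Simulate dX = mu dt + sigma dW on [0, t_end] with constant mu and sigma.

    Returns the path [X(0), X(dt), ..., X(t_end)] of length n_steps + 1.
    """
    dt = t_end / n_steps
    path = [x0]
    for _ in range(n_steps):
        # Wiener increment: W(t + dt) - W(t) ~ N(0, dt), independent of the past.
        dw = rng.gauss(0.0, math.sqrt(dt))
        path.append(path[-1] + mu * dt + sigma * dw)
    return path


if __name__ == "__main__":
    path = simulate_drift_diffusion(
        x0=0.0, mu=1.0, sigma=0.5, t_end=2.0, n_steps=50, rng=random.Random(1))
    print("X(T) for one simulated path:", round(path[-1], 3))
```

Setting `sigma=0` recovers the deterministic solution $x_0 + \mu t$, which makes a convenient check that the drift term is handled correctly.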
The transition density of a stochastic process describes the movement from one state to the next and can be found from the solution of the process.

2.6.2. The Euler-Maruyama method

The Euler-Maruyama method provides an approximate numerical solution of an SDE (Carletti, 2006). Consider a stochastic process of the form:
\[ dX_t = f(X_t)\,dt + g(X_t)\,dW_t, \]
where functions $f$ and $g$ are given and $W_t$ is a Wiener process. Given an initial condition $X_0 = x_0$ we can build an Euler-Maruyama approximation of $X$ over an interval $[0, T]$. The Markov chain $Y$ defined below is an Euler-Maruyama approximation to the true solution of $X$. First we set the initial condition $Y_0 = x_0$. Next, the interval $[0, T]$ is partitioned into $N$ equal subintervals of width $\Delta t > 0$. The Euler-Maruyama approximation is then recursively defined for $0 \leq i \leq N - 1$ as follows:
\[ Y_{i+1} = Y_i + f(Y_i)\,\Delta t + g(Y_i)\,\Delta W_i, \]
where $\Delta W_i = W_{t_{i+1}} - W_{t_i} \sim \mathrm{N}(0, \Delta t)$. The Euler-Maruyama approximation $Y$ will become a better approximation to the true process $X$ as we increase the size of $N$.

2.6.3. Kalman filter

The Kalman filter (Kalman, 1960; Welch & Bishop, 1995) is a recursive algorithm that can be used to estimate the state of a dynamic system from a series of incomplete and noisy measurements. The main assumptions of the Kalman filter are that the underlying system is a linear dynamical system and that the noise has known first and second moments. Gaussian noise satisfies the second assumption, for example.

Inference for a state space model (2.4) (see Section 2.6), where both $f$ and $g$ are Gaussian, can be carried out using a Kalman filter. If all noise is zero-mean, uncorrelated and white, then the Kalman filter represents an optimal linear filter (Simon, 2006), even if the noise is not Gaussian. An application of the Kalman filter is given in Section C.5 of the Appendix.
The Kalman filter algorithm is derived as follows. $X_{t_i}$ and $Y_{t_i}$ are the state and measurement processes respectively. $w_t$ and $u_t$ are the state and measurement errors respectively, where $w_t$ and $u_t$ are IID, $E[w_t] = 0$, $E[u_t] = 0$, $E[w_t w_t^T] = W_t$ and $E[u_t u_t^T] = U_t$. The Kalman filter can be extended to the case where $w_t$ and $u_t$ are not zero mean. The unobserved latent process is driven by:
\[ X_{t_i} \mid X_{t_{i-1}} \sim \mathrm{N}(G_{t_i} X_{t_{i-1}}, W_{t_i}) \]
and the measurement error distribution, relating the latent variable to the observed, is given by
\[ Y_{t_i} \mid X_{t_i} \sim \mathrm{N}(F_{t_i}^T X_{t_i}, U_{t_i}), \]
where matrices $F_{t_i}$, $G_{t_i}$, $U_{t_i}$ and $W_{t_i}$ are all given. Now, suppose that:
\[ X_{t_{i-1}} \mid Y_{1:t_{i-1}} \sim \mathrm{N}(m_{t_{i-1}}, C_{t_{i-1}}). \]
Incrementing time with $X_{t_i} = G_{t_i} X_{t_{i-1}} + w_{t_i}$ and conditioning on $Y_{1:t_{i-1}}$ gives:
\[ X_{t_i} \mid Y_{1:t_{i-1}} = G_{t_i} X_{t_{i-1}} \mid Y_{1:t_{i-1}} + w_{t_i} \mid Y_{1:t_{i-1}} = G_{t_i} X_{t_{i-1}} \mid Y_{1:t_{i-1}} + w_{t_i}, \]
as $w_{t_i}$ is independent of $Y_{1:t_{i-1}}$. We can then show the following using standard multivariate theory:
\[ X_{t_i} \mid Y_{1:t_{i-1}} \sim \mathrm{N}(a_{t_i}, R_{t_i}), \]
where $a_{t_i} = G_{t_i} m_{t_{i-1}}$ and $R_{t_i} = G_{t_i} C_{t_{i-1}} G_{t_i}^T + W_{t_i}$. Similarly, taking $Y_{t_i} = F_{t_i}^T X_{t_i} + u_{t_i}$ and conditioning on $Y_{1:t_{i-1}}$ gives:
\[ Y_{t_i} \mid Y_{1:t_{i-1}} = F_{t_i}^T X_{t_i} \mid Y_{1:t_{i-1}} + u_{t_i} \mid Y_{1:t_{i-1}} = F_{t_i}^T X_{t_i} \mid Y_{1:t_{i-1}} + u_{t_i}, \]
as $u_{t_i}$ is independent of $Y_{1:t_{i-1}}$. We can then show the following using standard multivariate theory:
\[ Y_{t_i} \mid Y_{1:t_{i-1}} \sim \mathrm{N}(F_{t_i}^T a_{t_i},\ F_{t_i}^T R_{t_i} F_{t_i} + U_{t_i}). \]
$Y_{t_i} \mid Y_{1:t_{i-1}}$ and $X_{t_i} \mid Y_{1:t_{i-1}}$ are therefore jointly Gaussian with the following mean and covariance:
\[ \begin{pmatrix} X_{t_i} \\ Y_{t_i} \end{pmatrix} \Bigm| Y_{1:t_{i-1}} \sim \mathrm{MVN}\left( \begin{pmatrix} a_{t_i} \\ F_{t_i}^T a_{t_i} \end{pmatrix}, \begin{pmatrix} R_{t_i} & R_{t_i} F_{t_i} \\ F_{t_i}^T R_{t_i} & F_{t_i}^T R_{t_i} F_{t_i} + U_{t_i} \end{pmatrix} \right). \]
Finally, the following multivariate theorem is used: if
\[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right), \]
then
\[ Y_1 \mid Y_2 = y_2 \sim \mathrm{MVN}\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (y_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right), \]
to obtain the following:
\[ X_{t_i} \mid Y_{1:t_i} \sim \mathrm{N}(m_{t_i}, C_{t_i}), \]
where
\[ \begin{aligned} m_{t_i} &= a_{t_i} + R_{t_i} F_{t_i} \left(F_{t_i}^T R_{t_i} F_{t_i} + U_{t_i}\right)^{-1} \left[Y_{t_i} - F_{t_i}^T a_{t_i}\right], \\ C_{t_i} &= R_{t_i} - R_{t_i} F_{t_i} \left(F_{t_i}^T R_{t_i} F_{t_i} + U_{t_i}\right)^{-1} F_{t_i}^T R_{t_i}. \end{aligned} \tag{2.5} \]
Parameters $m_0$ and $C_0$ must be initialised first, then using the equations in (2.5), $m_{t_i}$ and $C_{t_i}$ can be recursively estimated.

Typically, the Kalman filter is used to make inference for a hidden state process, but it can be used to reduce computational time in algorithms for inferring process hyper-parameters by recursively computing the marginal likelihood $\pi(y_{t_{1:N}})$ (West & Harrison, 1997), where
\[ \pi(y_{t_{1:N}}) = \prod_{i=1}^{N} \pi(y_{t_i} \mid y_{t_{1:(i-1)}}) \]
and
\[ \pi(y_{t_i} \mid y_{t_{1:(i-1)}}) = \int_X \pi(y_{t_i}, x_{t_i} \mid y_{t_{1:(i-1)}})\,dx_{t_i} = \int_X \pi(y_{t_i} \mid x_{t_i})\,\pi(x_{t_i} \mid y_{t_{1:(i-1)}})\,dx_{t_i} \]
gives a tractable Gaussian integral. The procedure for computing the marginal likelihood $\pi(y_{t_{1:N}})$ using the Kalman filter algorithm is as follows:

1) Initialise with prior knowledge for $X_0$ and set $i = 1$.
2) Prediction step from $X_{t_{i-1}} \mid Y_{1:t_{i-1}}$ to $X_{t_i} \mid Y_{1:t_{i-1}}$ (giving $\pi(x_{t_i} \mid y_{1:t_{i-1}})$).
3) Calculate and store $\pi(y_{t_i} \mid y_{1:t_{i-1}})$.
4) Update step to give $X_{t_i} \mid Y_{1:t_i}$, then iterate $i = i + 1$.
5) Repeat steps 2-4 (and compute $\pi(y_{t_{1:N}})$).

2.6.4. Linear noise approximation

The linear noise approximation (LNA) (Kurtz, 1970, 1971; Van Kampen, 2011) reduces a non-linear SDE to a linear SDE with additive noise, which can be solved (Wallace, 2010; Komorowski et al., 2009). The LNA assumes the solution of a diffusion process $Y_t$ can be written as $Y_t = v_t + Z_t$ (a deterministic part $v_t$ and stochastic part $Z_t$), where $Z_t$ remains small for all $t \in \mathbb{R}_{\geq 0}$. The LNA is useful when a tractable solution to an SDE cannot be found.
Typically the LNA is used to reduce an SDE to an Ornstein-Uhlenbeck process, which can be solved explicitly. Ornstein-Uhlenbeck processes are Gaussian, so time discretising the resulting LNA will give us a linear Gaussian state space model with an analytically tractable transition density available. The LNA can be viewed as a first order Taylor expansion of an approximating SDE about a deterministic solution (higher order approximations are possible (Gardiner, 2010)). We can also view the LNA as an approximation of the chemical Langevin equation (Wallace et al., 2012). Applications of the LNA to non-linear SDEs are given in Sections 5.3 and 5.4.

Chapter 3. Modelling genetic interaction

3.1. Introduction

In this chapter, alternative modelling approaches are developed to better model a QFA screen comparison than the current frequentist Addinall et al. (2011) approach. Section 3.2 presents the modelling assumptions for the development of a Bayesian approach. Two Bayesian approaches are then presented in Sections 3.3 and 3.4, incorporating some model assumptions that are not convenient in a frequentist setting. So that our Bayesian models can be compared with a frequentist hierarchical modelling approach, a random effects model is then presented in Section 3.5. The models in this chapter are compared using previously analysed S. cerevisiae QFA screen data in the next chapter. Historic S. cerevisiae QFA screen datasets are used to shape the model assumptions adopted in the following sections.

3.2. Bayesian hierarchical model inference

As an alternative to the maximum likelihood approach presented by Addinall et al. (2011), we present a Bayesian, hierarchical methodology where a priori uncertainty about each parameter value is described by probability distributions (Bernardo & Smith, 2007) and information about parameter distributions is shared across orf∆s and conditions.
Plausible frequentist estimates from across 10 different historic QFA data sets, including a wide range of different background mutations and treatments, were used to quantify a priori uncertainty in model parameters.

Prior distributions describe our beliefs about parameter values. These should be diffuse enough to capture all plausible values (to capture the full range of observations in the datasets) while being restrictive enough to rule out implausible values (to ensure efficient inference). Inappropriate choice of priors can result in chains drifting during mixing and becoming stuck in implausible regions. Although using conjugate priors would allow faster inference, we find that the conjugate priors available for variance parameters (Gelman, 2006) are either too restrictive at low variance (Inverse-gamma), not restrictive enough at low variance (half-t family of prior distributions) or are non-informative or largely discard the prior information available (Uniform). Our choice for the priors of precision parameters is the non-conjugate Log-normal, as we find the distribution is only restrictive at extremely high and low variances.

We use three types of distribution to model parameter uncertainty: Log-normal, Normal and scaled $t$-distribution with three degrees of freedom. We use the Log-normal distribution to describe parameters which are required to be non-negative (e.g. parameters describing precisions, or repeat-level fitnesses) or parameter distributions which are found by visual inspection to be asymmetric. We use the Normal distribution to describe parameters which are symmetrically distributed (e.g. some prior distributions and the measurement error model) and we use the $t$-distribution to describe parameters whose uncertainty distribution is long-tailed (i.e. where using the Normal distribution would result in excessive shrinkage towards the mean).
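To illustrate why the heavier tails matter, the sketch below evaluates the scaled $t$ density of Section 2.4.3 with three degrees of freedom and a Normal density at a value far from the centre. The function names, the unit scale and the evaluation point are illustrative assumptions and are not taken from the qfaBayes package.

```python
import math


def student_t_pdf(x, nu):
    """Density t_1(x; nu) of the Student's t-distribution, as in (2.1)."""
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))


def scaled_t_pdf(x, mu, nu, sigma):
    """Three parameter density t_3(x; mu, nu, sigma) = t_1((x - mu) / sigma; nu) / sigma."""
    return student_t_pdf((x - mu) / sigma, nu) / sigma


def normal_pdf(x, mu, sigma):
    """Density of a Normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    x = 5.0  # a value five scale units from the centre
    print("scaled t (nu = 3):", scaled_t_pdf(x, 0.0, 3.0, 1.0))
    print("Normal:           ", normal_pdf(x, 0.0, 1.0))
```

Because the $t$ density assigns far more mass to such extreme values than the Normal does, an unusually fit or unusually sick strain is pulled less strongly towards the population mean.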
A Normal distribution was considered for describing the variation in orf∆s but was found to be inappropriate, failing to assign density at the extreme high and low fitnesses. For example, after visual inspection of frequentist orf∆ level means about their population mean, we found there to be many unusually fit, dead or missing orf∆s and concluded that orf∆ fitnesses would be well modelled by the t-distribution.

Instead of manually fixing the inoculum density parameter $P$ as in Addinall et al. (2011), our Bayesian hierarchical models deal with the scarcity of information about the early part of culture growth curves by estimating a single $P$ across all orf∆s (and conditions in some of our models). Our new approach learns about $P$ from the data and gives us a posterior distribution to describe our uncertainty about its value.

The new, hierarchical structure implemented in our models (Goldstein, 2011) reflects the structure of QFA experiments. Information is shared efficiently among groups of parameters, such as between repeat level parameters for a single mutant strain. An example of the type of Bayesian hierarchical modelling which we use to model genetic interaction can be seen in Yi (2010), where hierarchical models are used to account for group effects. In Phenix et al. (2011) the signal of genetic interaction is chosen to be “strictly ON or OFF” when modelling gene activity. We include this concept in our interaction models by using a Bernoulli distributed indicator variable (O’Hara & Sillanpaa, 2009) to describe whether there is evidence of an orf∆ interacting with the query mutation; the more evidence of interaction, the closer posterior expectations will be to one.

Failing to account for all sources of variation within the experimental structure, such as the difference in variation between the control and query fitnesses, may lead to inaccurate conclusions.
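The role of the Bernoulli indicator described above can be illustrated with a short Python sketch. It assumes we already hold MCMC draws of the 0/1 indicator for each orf∆; the strain names and draws below are invented for illustration and do not come from any QFA data set.

```python
def interaction_probability(indicator_draws):
    """Posterior mean of a 0/1 indicator: the fraction of draws equal to 1."""
    return sum(indicator_draws) / len(indicator_draws)

# Hypothetical posterior draws of the indicator for two made-up strains.
draws = {
    "orfA": [1, 1, 1, 0, 1, 1, 1, 1],
    "orfB": [0, 0, 1, 0, 0, 0, 0, 0],
}

# Posterior expectation per strain; values near one indicate strong evidence.
probabilities = {name: interaction_probability(d) for name, d in draws.items()}
print(probabilities)
```

Because the indicator only takes the values zero and one, its posterior mean can be read directly as a posterior probability of interaction, which is what makes a simple cut-off on this quantity a natural classification rule.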
By incorporating more information into the model with prior distributions and a more flexible modelling approach, we will increase statistical power. With an improved analysis it may then be possible for a similar number of genetic interactions to be identified with a smaller sample size, saving on the significant experimental costs associated with QFA.

Inference is carried out using Markov Chain Monte Carlo (MCMC) methods. The algorithm used is a Metropolis-within-Gibbs sampler where each full-conditional is sampled in turn, either directly or using a simple Normal random walk Metropolis step. Due to the large number of model parameters and large quantity of data from high-throughput QFA experiments, the algorithms used for carrying out inference often have poor mixing and give highly auto-correlated samples, requiring thinning. Posterior means are used to obtain point estimates where required. For the new Bayesian approaches (described in Sections 3.3 and 3.4), model fitting is carried out using the techniques discussed above. The implementation is written in C for computational speed and is freely available in the R package “qfaBayes” at https://r-forge.r-project.org/projects/qfa.

3.3. Two-stage Bayesian hierarchical approach

In the following sections, a two-stage Bayesian, hierarchical modelling approach (see Sections 3.3.1 and 3.3.2) is presented. The two-stage Bayesian approach generates orf∆ fitness distributions and infers genetic interaction probabilities separately. For a QFA screen comparison, first the separate hierarchical model (SHM), given in Section 3.3.1, is fit to each screen separately and a set of logistic growth parameter estimates is obtained for each time-course.
Secondly, each set of logistic growth parameter estimates is converted into a univariate fitness summary and input to the interaction hierarchical model (IHM), given in Section 3.3.2, to determine which genes show evidence of genetic interaction.

3.3.1. Separate hierarchical model

The separate hierarchical model (SHM), presented in Table 3.1, models the growth of multiple yeast cultures using the logistic function described in (1.2). In this first hierarchical model, the logistic model is fit to the query and control strains separately.

In order to measure the variation between orf∆s, parameters $(K^{p}, \sigma^{K,o})$ and $(r^{p}, \sigma^{r,o})$ are included at the population level of the hierarchy. Within-orf∆ variation is modelled by each set of orf∆ level parameters $(K^{o}_{l}, \tau^{K}_{l})$ and $(r^{o}_{l}, \tau^{r}_{l})$. Learning about these higher level parameters allows information to be shared across parameters lower in the hierarchy. A three-level hierarchical model is applied to $(K^{p}, K^{o}_{l}, K_{lm})$ and $(r^{p}, r^{o}_{l}, r_{lm})$, sharing information on the repeat level and the orf∆ level. Note that orf∆ level parameters $K^{o}_{l}$ and $r^{o}_{l}$ are on the log scale ($e^{K^{o}_{l}}$ and $e^{r^{o}_{l}}$ are on the scale of the observed data).

Assuming a Normal error structure, random measurement error is modelled by the $\nu_{l}$ parameters (one for each orf∆). Information on random error is shared across all orf∆s by drawing $\log \nu_{l}$ from a Normal distribution parameterised by $(\nu^{p}, \sigma^{\nu})$. A two-level hierarchical structure is also used for both the $\tau^{K}_{l}$ and $\tau^{r}_{l}$ parameters.

Modelling logistic model parameter distributions on the log scale ensures that parameter values remain strictly positive (a realistic biological constraint). Truncating distributions allows us to implement further, realistic constraints on the data.
Truncating $\log r_{lm}$ values greater than 3.5 corresponds to disallowing biologically unrealistic culture doubling times faster than about 30 minutes (assuming time is measured in days, $r = e^{3.5} \approx 33$ per day gives a doubling time of $\log 2 / r \approx 0.021$ days). Truncating repeat level parameters $\log K_{lm}$ above 0 ensures that no carrying capacity estimate is greater than the maximum observable cell density, which is 1 after scaling.

orf∆ level parameters $e^{K^{o}_{l}}$ and $e^{r^{o}_{l}}$ are on the same scale as the observed data. Realistic biological constraints (positive logistic model parameters) are enforced at the repeat level; however, both $e^{K^{o}_{l}}$ and $e^{r^{o}_{l}}$, which are assumed to have scaled t-distributions, are truncated below zero to keep exponentiated parameters strictly positive. Most orf∆ level logistic growth parameters are distributed in a bell shape around some mean value; it is the unusually fit, dead or missing orf∆s within a typical QFA screen that require the use of a long-tailed distribution such as the scaled t-distribution with 3 degrees of freedom. The non-standard choice of a truncated scaled t-distribution with 3 degrees of freedom ensures that the extreme high and low values have probability assigned to them regardless of the population level location and scale parameters for a given QFA screen.

Identifiability problems can arise for parameters $K_{lm}$ and $r_{lm}$ when observed cell densities are low and unchanging (consistent with growth curves for cultures which are very sick, dead or missing). In these cases, either $K_{lm}$ or $r_{lm}$ can take values near zero, allowing the other parameter to take any value without significantly affecting the model fit. In the Addinall et al.
(2011) approach identification problems are handled in an automated post-processing stage: for cultures with low $K$ estimates (classified as dead), $r$ is automatically set to zero. Without correcting for identification problems in our Bayesian models, misleading information from implausible values will be shared across our models. Computing time wasted on such identifiability problems is reduced by truncating repeat level parameters $r_{lm}$, preventing the MCMC algorithms from becoming stuck in extremely low probability regions when $K_{lm}$ takes near zero values. Similarly, $\log \tau^{K}_{l}$ parameters are truncated below 0 to overcome identifiability problems between parameters $K_{lm}$ and $r_{lm}$ when $r_{lm}$ takes near zero values.

Table 3.1: Description of the separate hierarchical model (SHM). The dependent variable $y_{lmn}$ (scaled cell density measurements) and independent variable $t_{lmn}$ (time since inoculation) are data input to the SHM. $x(t)$ is the solution to the logistic model ODE given in (1.2). $l$ indicates a particular orf∆ from the gene deletion library, $m$ indicates a repeat for a given orf∆ and $n$ indicates the time point for a given orf∆ repeat.

$l = 1, 2, \ldots, L$ (orf∆ level)
$m = 1, \ldots, M_{l}$ (repeat level)
$n = 1, 2, \ldots, N_{lm}$ (time point level)

Time point level
$y_{lmn} \sim \mathrm{N}(\hat{y}_{lmn}, (\nu_{l})^{-1})$
$\hat{y}_{lmn} = x(t_{lmn}; K_{lm}, r_{lm}, P)$

Repeat level
$\log K_{lm} \sim \mathrm{N}(K^{o}_{l}, (\tau^{K}_{l})^{-1})\, I_{(-\infty, 0]}$
$\log \tau^{K}_{l} \sim \mathrm{N}(\tau^{K,p}, (\sigma^{\tau,K})^{-1})\, I_{[0, \infty)}$
$\log r_{lm} \sim \mathrm{N}(r^{o}_{l}, (\tau^{r}_{l})^{-1})\, I_{(-\infty, 3.5]}$
$\log \tau^{r}_{l} \sim \mathrm{N}(\tau^{r,p}, (\sigma^{\tau,r})^{-1})$

orf∆ level
$e^{K^{o}_{l}} \sim t(K^{p}, (\sigma^{K,o})^{-1}, 3)\, I_{[0, \infty)}$
$\log \sigma^{K,o} \sim \mathrm{N}(\eta^{K,o}, (\psi^{K,o})^{-1})$
$e^{r^{o}_{l}} \sim t(r^{p}, (\sigma^{r,o})^{-1}, 3)\, I_{[0, \infty)}$
$\log \sigma^{r,o} \sim \mathrm{N}(\eta^{r,o}, (\psi^{r,o})^{-1})$
$\log \nu_{l} \sim \mathrm{N}(\nu^{p}, (\sigma^{\nu})^{-1})$
$\log \sigma^{\nu} \sim \mathrm{N}(\eta^{\nu}, (\psi^{\nu})^{-1})$

Population level
$\log K^{p} \sim \mathrm{N}(K^{\mu}, (\eta^{K,p})^{-1})$
$\log r^{p} \sim \mathrm{N}(r^{\mu}, (\eta^{r,p})^{-1})$
$\log P \sim \mathrm{N}(P^{\mu}, (\eta^{P})^{-1})$
$\nu^{p} \sim \mathrm{N}(\nu^{\mu}, (\eta^{\nu,p})^{-1})$
$\tau^{K,p} \sim \mathrm{N}(\tau^{K,\mu}, (\eta^{\tau,K,p})^{-1})$
$\log \sigma^{\tau,K} \sim \mathrm{N}(\eta^{\tau,K}, (\psi^{\tau,K})^{-1})$
$\tau^{r,p} \sim \mathrm{N}(\tau^{r,\mu}, (\eta^{\tau,r,p})^{-1})$
$\log \sigma^{\tau,r} \sim \mathrm{N}(\eta^{\tau,r}, (\psi^{\tau,r})^{-1})$

The SHM in Table 3.1 is fit to both the query and control strains separately. Means are taken to summarise logistic growth parameter posterior distributions for each orf∆ repeat. Summaries $(\hat{K}_{lm}, \hat{r}_{lm}, \hat{P})$ for each orf∆ repeat are converted to univariate fitnesses $F_{clm}$, where $c$ identifies the condition (query or control), with any given fitness measure, e.g. $MDR \times MDP$ (see (1.3) and Addinall et al. (2011)).

A problem of the two-stage approach is that we must choose a fitness definition most relevant to the experiment. We choose the same definition used in Addinall et al. (2011), $MDR \times MDP$, for the comparison of our methods. An alternative choice of fitness definition could be used given sufficient biological justification. Section 1.1.3 gives the derivations of $MDR$ and $MDP$. The product $MDR \times MDP$ is used as it accounts for the attributes of two definitions simultaneously.

The flow of information within the model and how each parameter is related to the data can be seen from the plate diagram in Figure 3.1 (Lunn et al., 2000b).

Figure 3.1: Plate diagram for the separate hierarchical model, described in Section 3.3.1. This figure shows the four levels of hierarchy in the SHM model: population, orf∆ ($l$), repeat ($m$) and time point ($n$). Prior hyperparameters for the population parameters are omitted. A circular node represents a parameter in the model.
An arrow from a source node to a target node indicates that the source node parameter is a prior hyperparameter for the target node parameter. Each rectangular box corresponds to a level of the hierarchy. Nodes within multiple boxes are nested and their parameters are indexed by corresponding levels of the hierarchy. The node consisting of two concentric circles corresponds to the model's fitted values. The rectangular node represents the observed data.

3.3.2. Interaction hierarchical model

After the SHM fit, the IHM, presented in Table 3.2, can then be used to model estimated fitness scores $F_{clm}$ and determine, for each orf∆, whether there is evidence for interaction. Fitnesses are passed to the IHM where query screen fitnesses are compared with control screen fitnesses, assuming genetic independence. Deviations from predicted fitnesses are evidence for genetic interaction. The flow of information within the IHM and how each parameter is related to the data can be seen from the plate diagram in Figure 3.2.

The interaction model accounts for between-orf∆ variation with the set of parameters $(Z^{p}, \sigma^{Z})$ and within-orf∆ variation by the set of parameters $(Z_{l}, \nu_{l})$. A linear relationship between the control and query orf∆ level parameters is specified with a scale parameter $\alpha_{1}$. Any deviation from this relationship (genetic interaction) is accounted for by the term $\delta_{l}\gamma_{1,l}$. $\delta_{l}$ is a binary indicator of genetic interaction for orf∆ $l$. A scaling parameter $\alpha_{1}$ allows any effects due to differences in the control and query data sets to be scaled out, such as differences in genetic background, incubator temperature or inoculum density. The linear relationship between the control and query fitness scores, consistent with the multiplicative model of genetic independence described in (1.5), is implemented in the IHM as:

$\hat{F}_{cl} = e^{\alpha_{c} + Z_{l} + \delta_{l}\gamma_{cl}} = e^{\alpha_{c}}\, e^{Z_{l} + \delta_{l}\gamma_{cl}}.$
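As a worked illustration of the expression above, the following Python sketch evaluates the predicted fitness for one orf∆ under the control condition, under the query condition with no interaction, and under the query condition with an interaction. All numerical values are invented for illustration, and the function name is hypothetical rather than part of qfaBayes.

```python
import math

def predicted_fitness(alpha_c, z_l, delta_l, gamma_cl):
    """Predicted fitness exp(alpha_c + Z_l + delta_l * gamma_cl)."""
    return math.exp(alpha_c + z_l + delta_l * gamma_cl)

# Hypothetical values: control fitness 50, query condition halves fitness,
# and the interaction (when present) multiplies query fitness by 1.4.
z_l = math.log(50.0)
alpha_query = math.log(0.5)
gamma_query = math.log(1.4)

# Control condition: alpha_0 = 0 and gamma_0l = 0 by construction.
control = predicted_fitness(0.0, z_l, 0, 0.0)
# Query condition on the linear relationship (indicator off).
query_independent = predicted_fitness(alpha_query, z_l, 0, gamma_query)
# Query condition with the indicator on: a suppressor in this example.
query_interacting = predicted_fitness(alpha_query, z_l, 1, gamma_query)

print(control, query_independent, query_interacting)
```

On the exponentiated scale the condition effect and the interaction term act as multipliers of the control fitness, which is how an additive model on the log scale encodes the multiplicative model of genetic independence.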
Strains whose fitnesses lie along the linear relationship defined by the scalar $\alpha_{1}$ show no evidence for interaction with the query condition. On the other hand, deviation from the linear relationship, represented by the posterior mean of $\delta_{l}\gamma_{1,l}$, is evidence for genetic interaction. The larger the posterior mean for $\delta_{l}$, the higher the probability of, or evidence for, interaction, while $\gamma_{1,l}$ is a measure of the strength of interaction. Where the query condition has a negative effect (i.e. decreases fitness on average, compared to the control condition), query fitnesses which are above and below the linear relationship are suppressors and enhancers, respectively, of the fitness defect associated with the query condition. A list of gene names is ordered by $\delta_{l}\gamma_{cl}$ posterior means and those orf∆s with $\hat{\delta}_{l} > 0.5$ are classified and labelled as showing “significant” evidence of interaction.

The Bernoulli probability parameter $p$ is our prior estimate for the probability of a given orf∆ showing evidence of genetic interaction. For a typical yeast QFA screen, $p$ is set to 0.05, as the experimenter's belief before the experiment is carried out is that 5% of our orf∆s exhibit genetic interactions. Observational noise is quantified by $\nu_{cl}$. The $\nu_{cl}$ parameter accounts for difference in variation between conditions, i.e. the query and control data sets, and for difference in variation between orf∆s.

Table 3.2: Description of the interaction hierarchical model (IHM). $F_{clm}$ are the observed fitness scores, where $c$ identifies the condition for a given orf∆, $l$ identifies a particular orf∆ from the gene deletion library and $m$ identifies a repeat for a given orf∆.
$c = 0, 1$ (condition level)
$l = 1, \ldots, L_{c}$ (orf∆ level)
$m = 1, \ldots, M_{cl}$ (repeat level)

Repeat level
$F_{clm} \sim \mathrm{N}(\hat{F}_{cl}, (\nu_{cl})^{-1})$
$\hat{F}_{cl} = e^{\alpha_{c} + Z_{l} + \delta_{l}\gamma_{cl}}$

orf∆ level
$e^{Z_{l}} \sim t(Z^{p}, (\sigma^{Z})^{-1}, 3)\, I_{[0, \infty)}$
$\log \sigma^{Z} \sim \mathrm{N}(\eta^{Z}, \psi^{Z})$
$\log \nu_{cl} \sim \mathrm{N}(\nu^{p}, (\sigma^{\nu})^{-1})$
$\log \sigma^{\nu} \sim \mathrm{N}(\eta^{\nu}, \psi^{\nu})$
$\delta_{l} \sim \mathrm{Bern}(p)$
$e^{\gamma_{cl}} = \begin{cases} 1 & \text{if } c = 0; \\ t(1, (\sigma^{\gamma})^{-1}, 3)\, I_{[0, \infty)} & \text{if } c = 1. \end{cases}$
$\log \sigma^{\gamma} \sim \mathrm{N}(\eta^{\gamma}, (\psi^{\gamma})^{-1})$

Condition level
$\alpha_{c} = \begin{cases} 0 & \text{if } c = 0; \\ \mathrm{N}(\alpha^{\mu}, \eta^{\alpha}) & \text{if } c = 1. \end{cases}$

Population level
$\log Z^{p} \sim \mathrm{N}(Z^{\mu}, (\eta^{Z,p})^{-1})$
$\nu^{p} \sim \mathrm{N}(\nu^{\mu}, (\eta^{\nu,p})^{-1})$

Figure 3.2: Plate diagram for the interaction hierarchical model, described in Section 3.3.2. This figure shows the four levels of hierarchy in the IHM model: population, orf∆ ($l$), condition ($c$) and repeat ($m$). Prior hyperparameters for population parameters are omitted. Plate diagram notation as in Figure 3.1.

3.4. One-stage Bayesian hierarchical approach

Following from Section 3.3, a one-stage approach for inferring fitness and genetic interaction probabilities simultaneously is presented. All of the SHM and IHM modelling assumptions described in Section 3.3, such as distributional choices and hierarchical structure, are inherited by the one-stage approach, known as the joint hierarchical model (JHM).

3.4.1. Joint hierarchical model

The JHM given in Table 3.3 is an alternative, fully Bayesian version of the two-stage approach described in Sections 3.3.1 and 3.3.2. The JHM incorporates the key modelling ideas from both the SHM and the IHM, with the considerable advantage that we can learn about logistic growth model, fitness and genetic interaction parameters simultaneously, thereby avoiding having to choose a fitness measure or point estimates for passing information between models.
The JHM is an extension of the SHM, with the presence or absence of genetic interaction being described by a Bernoulli indicator and an additional level of error to account for variation due to the query condition. Genetic interaction is modelled in terms of the two logistic growth parameters $K$ and $r$ simultaneously. Similar to the interaction model in Section 3.3.2, linear relationships between control and query carrying capacity and growth rate (instead of fitness score) are assumed: $(e^{\alpha_{c} + K^{o}_{l} + \delta_{l}\gamma_{cl}},\ e^{\beta_{c} + r^{o}_{l} + \delta_{l}\omega_{cl}})$.

By fitting a single JHM, we need only calculate posterior means, check model diagnostics and thin posteriors once. However, the CPU time taken to reach convergence for any given data set is roughly twice that of the two-stage approach for a genome-wide QFA. The flow of information within the model and how each parameter is related to the data can be seen from the plate diagram in Figure 3.3.

Table 3.3: Description of the joint hierarchical model (JHM). The dependent variable $y_{clmn}$ (scaled cell density measurements) and independent variable $t_{clmn}$ (time since inoculation) are input to the JHM. $c$ identifies the condition for a given orf∆, $l$ identifies a particular orf∆ from the gene deletion library, $m$ identifies a repeat for a given orf∆ and $n$ identifies the time point for a given condition and orf∆ repeat.

$c = 0, 1$ (condition level)
$l = 1, \ldots, L_{c}$ (orf∆ level)
$m = 1, \ldots, M_{cl}$ (repeat level)
$n = 1, \ldots, N_{clm}$ (time point level)

Time point level
$y_{clmn} \sim \mathrm{N}(\hat{y}_{clmn}, (\nu_{cl})^{-1})$
$\hat{y}_{clmn} = x(t_{clmn}; K_{clm}, r_{clm}, P)$

Repeat level
$\log K_{clm} \sim \mathrm{N}(\alpha_{c} + K^{o}_{l} + \delta_{l}\gamma_{cl}, (\tau^{K}_{cl})^{-1})\, I_{(-\infty, 0]}$
$\log \tau^{K}_{cl} \sim \mathrm{N}(\tau^{K,p}_{c}, (\sigma^{\tau,K}_{c})^{-1})\, I_{[0, \infty)}$
$\log r_{clm} \sim \mathrm{N}(\beta_{c} + r^{o}_{l} + \delta_{l}\omega_{cl}, (\tau^{r}_{cl})^{-1})\, I_{(-\infty, 3.5]}$
$\log \tau^{r}_{cl} \sim \mathrm{N}(\tau^{r,p}_{c}, (\sigma^{\tau,r}_{c})^{-1})$

orf∆ level
$e^{K^{o}_{l}} \sim t(K^{p}, (\sigma^{K,o})^{-1}, 3)\, I_{[0, \infty)}$
$\log \sigma^{K,o} \sim \mathrm{N}(\eta^{K,o}, (\psi^{K,o})^{-1})$
$e^{r^{o}_{l}} \sim t(r^{p}, (\sigma^{r,o})^{-1}, 3)\, I_{[0, \infty)}$
$\log \sigma^{r,o} \sim \mathrm{N}(\eta^{r,o}, (\psi^{r,o})^{-1})$
$\log \nu_{cl} \sim \mathrm{N}(\nu^{p}, (\sigma^{\nu})^{-1})$
$\log \sigma^{\nu} \sim \mathrm{N}(\eta^{\nu}, (\psi^{\nu})^{-1})$
$\delta_{l} \sim \mathrm{Bern}(p)$
$e^{\gamma_{cl}} = \begin{cases} 1 & \text{if } c = 0; \\ t(1, (\sigma^{\gamma})^{-1}, 3)\, I_{[0, \infty)} & \text{if } c = 1. \end{cases}$
$\log \sigma^{\gamma} \sim \mathrm{N}(\eta^{\gamma}, \psi^{\gamma})$
$e^{\omega_{cl}} = \begin{cases} 1 & \text{if } c = 0; \\ t(1, (\sigma^{\omega})^{-1}, 3)\, I_{[0, \infty)} & \text{if } c = 1. \end{cases}$
$\log \sigma^{\omega} \sim \mathrm{N}(\eta^{\omega}, \psi^{\omega})$

Condition level
$\alpha_{c} = \begin{cases} 0 & \text{if } c = 0; \\ \mathrm{N}(\alpha^{\mu}, \eta^{\alpha}) & \text{if } c = 1. \end{cases}$
$\beta_{c} = \begin{cases} 0 & \text{if } c = 0; \\ \mathrm{N}(\beta^{\mu}, \eta^{\beta}) & \text{if } c = 1. \end{cases}$
$\tau^{K,p}_{c} \sim \mathrm{N}(\tau^{K,\mu}, (\eta^{\tau,K,p})^{-1})$
$\log \sigma^{\tau,K}_{c} \sim \mathrm{N}(\eta^{\tau,K}, (\psi^{\tau,K})^{-1})$
$\tau^{r,p}_{c} \sim \mathrm{N}(\tau^{r,\mu}, (\eta^{\tau,r,p})^{-1})$
$\log \sigma^{\tau,r}_{c} \sim \mathrm{N}(\eta^{\tau,r}, (\psi^{\tau,r})^{-1})$

Population level
$\log K^{p} \sim \mathrm{N}(K^{\mu}, (\eta^{K,p})^{-1})$
$\log r^{p} \sim \mathrm{N}(r^{\mu}, (\eta^{r,p})^{-1})$
$\nu^{p} \sim \mathrm{N}(\nu^{\mu}, (\eta^{\nu,p})^{-1})$
$\log P \sim \mathrm{N}(P^{\mu}, (\eta^{P})^{-1})$

Figure 3.3: Plate diagram for the joint hierarchical model, described in Section 3.4.1. This figure shows the five levels of hierarchy in the JHM model: population, orf∆ ($l$), condition ($c$), repeat ($m$) and time point ($n$). Prior hyperparameters for the population parameters are omitted. Plate diagram notation is given in Figure 3.1.

3.5. Random effects model

To improve on the Addinall et al. (2011) modelling approach whilst remaining within the frequentist paradigm, a random effects model (Zuur et al., 2009; Pinheiro & Bates, 2000) can be used to account for the hierarchical structure of the data.
The random effects model (REM) given in Table 3.4 is used to model estimated fitness scores $F_{clm}$ from (1.6) and estimate evidence of interaction for each orf∆ simultaneously with a single model fit. Introducing a random effect $Z_{l}$ allows us to account for between-subject variation by estimating a single $\sigma_{Z}^{2}$. Unlike the Addinall et al. (2011) approach, observed values $F_{clm}$ are not scaled and instead a parameter to model a condition effect, $\mu_{c}$, is introduced. $\gamma_{cl}$ represents the estimated strength of genetic interaction between an orf∆ and its query mutation counterpart. For a multiplicative model of epistasis, an additive model is used to describe the log transformed data $f_{clm} = \log(F_{clm} + 1)$, where $F_{clm}$ are the observed fitnesses. We use the Benjamini-Hochberg procedure to correct for multiple testing in order to make a fair comparison with the Addinall et al. (2011) approach.

Inference for a frequentist random effects model can be carried out most simply with the R package “lme4” (Bates et al., 2013). For the R code to fit the REM see Section A.3 of the Appendix.

In the frequentist paradigm some parameters cannot be modelled as random effects, since computational difficulties associated with large matrix computations arise with multiple random effects and very large data sets. Similarly, a more appropriate model with a log-link function, in order to model repeat level variation with a Normal distribution, cannot be fit, due to computational difficulties that arise with non-linear model maximum likelihood algorithms and large data sets. Such computational difficulties cause algorithms for parameter estimation to fail to converge.

Table 3.4: Description of the random effects model (REM). $c$ identifies the condition for a given orf∆, $l$ identifies a particular orf∆ from the gene deletion library and $m$ identifies a repeat for a given orf∆.

$f_{clm} = \mu_{c} + Z_{l} + \gamma_{cl} + \varepsilon_{clm}$
$\mu_{c} = \begin{cases} \mu + \alpha & \text{if } c = 0; \\ \mu & \text{if } c = 1. \end{cases}$
$\gamma_{cl} = \begin{cases} 0 & \text{if } c = 0; \\ \gamma_{l} & \text{if } c = 1. \end{cases}$
$Z_{l} \sim \mathrm{N}(0, \sigma_{Z}^{2})$
$\varepsilon_{clm} \sim \mathrm{N}(0, \sigma^{2})$

Chapter 4. Case Studies

4.1. Introduction

In this chapter, the new Bayesian models developed in Chapter 3 are applied to previously analysed QFA screen data. The one-stage and two-stage Bayesian approaches are compared with the two-stage Addinall et al. (2011) and random effects model (REM) approaches for a QFA screen comparison designed to inform the experimenter about telomere biology in S. cerevisiae.

After comparing the approaches developed, the one-stage Bayesian joint hierarchical model (JHM) is found to best model a QFA screen comparison. The JHM is then applied to further examples of S. cerevisiae QFA screen data to demonstrate the JHM's ability to model different experiments. Two extensions of the JHM are then considered, to account for a batch effect and a transformation effect within a QFA screen comparison. Fitness plots for the further case studies and extensions of the JHM are included for further investigation and research.

The new one-stage Bayesian QFA will be used at first to help identify genes that are related to telomere activity, but the analysis is general enough to be applicable to any high-throughput study of arrayed microbial cultures (including experiments such as drug screening).

4.2. cdc13-1 27 °C vs ura3∆ 27 °C suppressor/enhancer data set

The following analysis is for a QFA experiment comparing query cdc13-1 strains with control ura3∆ strains at 27 °C, previously analysed by Addinall et al. (2011), to identify genes that show evidence of genetic interaction with the query mutation cdc13-1. The ability of the Cdc13 protein produced by cdc13-1 strains to cap telomeres is reduced at temperatures above 26 °C (Nugent et al., 1996), inducing a fitness defect. The experimental data used are freely available at http://research.ncl.ac.uk/colonyzer/AddinallQFA/. Addinall et al.
(2011) present a list of interaction strengths and p-values for significance of interaction, together with a fitness plot for this experiment. We will compare lists of genes classified as interacting with cdc13-1 by the non-hierarchical frequentist approach presented by Addinall et al. (2011) and the hierarchical REM with those classified as interacting by our hierarchical Bayesian approaches.

4,294 non-essential orf∆s were selected from the yeast deletion collection and used to build the corresponding double deletion query and control strains. Independent replicate culture growth curves (time course observations of cell density) were captured for each query and control strain. The median and range for the number of replicates per orf∆ are 8 and [8, 144] respectively. There are 66 orf∆ strains that have more than 8 replicates (for both the control and query screen). More replicates have been tested for this subset of orf∆s as a quality control measure, to check if 8 replicates are sufficient to generate a stable fitness summary for each orf∆. orf∆s with high replicate number include a small number of mutations whose phenotypes are well understood in a telomere-defective background, together with some controls and a range of mutations randomly selected from the deletion library. Including genotypes with well characterised phenotypes allows us to leverage expert, domain-specific knowledge to assess the quality of experimental results. The modelling approaches considered can accommodate different numbers of replicates for each orf∆, therefore we don't expect systematic bias from the number of repeats. The range for the number of time points for growth curves captured is [7, 22] in the control experiment and [9, 15] in the query experiment. Raw cdc13-1 27 °C time series data are given in Figure A.1, for example.

As in the Addinall et al.
(2011) analysis, a list of 159 genes is stripped from our final list of genes for biological and experimental reasons. Prior hyper-parameters for the models used throughout this chapter are provided in Table B.1. Although our priors are informed by frequentist estimates of historical QFA data sets, we ensure our priors are sufficiently diffuse that all plausible parameter values are well represented and that any given QFA data set can be fit appropriately.

The Heidelberger-Welch (Heidelberger & Welch, 1981) and Raftery-Lewis (Raftery & Lewis, 1995) convergence diagnostics are used to determine whether convergence has been reached for all parameters. Posterior and prior densities are compared by eye to ensure that sample posterior distributions are not restricted by the choice of prior distribution. ACF (auto-correlation function) plot diagnostics are checked visually to ensure that serial correlation between sample values of the posterior distribution is low, ensuring that the effective sample size is similar to the actual sample size.

To assess how well the logistic growth model describes cell density observations, we generate plots of raw data with fitted curves overlaid. Figures 4.1A, 4.1B and 4.1C show time series data for three different mutant strain repeats at 27 °C, together with fitted logistic curves. We can see that each orf∆ curve fit represents the repeat level estimates well, as each orf∆ level (red) curve lies in the region where most repeat level (black) curves are found. Sharing information between orf∆s will also affect each orf∆ curve fit, increasing the probability of the orf∆ level parameters being closer to the population parameters. Comparing Figures 4.1A, 4.1B and 4.1C shows that the separate hierarchical model (SHM) captures heterogeneity at both the repeat and orf∆ levels.
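The fitted curves discussed above, and the fitness summaries used in the following section, can be sketched in a few lines of Python. This is a minimal sketch that assumes the logistic ODE in (1.2) has the standard form dx/dt = r x (1 - x/K) with x(0) = P, and that MDR and MDP in (1.3) are the reciprocal of the time taken to grow from P to 2P and the number of doublings from P to K respectively. The parameter values are invented for illustration.

```python
import math

def logistic(t, K, r, P):
    """Solution of dx/dt = r x (1 - x/K) with x(0) = P (assumed form of (1.2))."""
    growth = math.exp(r * t)
    return K * P * growth / (K + P * (growth - 1.0))

def mdr(K, r, P):
    """Maximum doubling rate: reciprocal of the time to go from P to 2P (needs K > 2P)."""
    return r / math.log(2.0 * (K - P) / (K - 2.0 * P))

def mdp(K, P):
    """Maximum doubling potential: number of doublings from P to K."""
    return math.log(K / P) / math.log(2.0)

# Hypothetical repeat level estimates on the scaled cell density scale.
K, r, P = 0.8, 5.0, 0.001

fitness = mdr(K, r, P) * mdp(K, P)
print(f"MDR = {mdr(K, r, P):.3f}, MDP = {mdp(K, P):.3f}, MDR x MDP = {fitness:.3f}")
```

Evaluating `logistic` at time `1 / mdr(K, r, P)` returns twice the inoculum density, which is a quick consistency check between the growth curve and the doubling rate under the assumptions stated above.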
Figure 4.1D demonstrates the hierarchy of information about the logistic model parameter $K$ generated by the SHM for the rad50∆ control mutant strain (variation decreases going from population level down to repeat level). Figure 4.1D also shows that the posterior distribution for $K$ is much more peaked than the prior, demonstrating that we have learned about the distribution of both the population and orf∆ parameters. Learning more about the repeat level parameters reduces the variance of our orf∆ level estimates. The posterior for the first time-course repeat $K_{clm}$ parameter shows exactly how much uncertainty there is for this particular repeat in terms of carrying capacity $K$.

Figure 4.1: Separate hierarchical model (SHM) logistic growth curve fits. Data for orf∆ repeats have been plotted in A, B and C, with SHM fitted curves overlaid in black for repeat level parameters and red for the orf∆ level parameter fit. A) SHM scatter plot for 144 his3∆ ura3∆ repeats at 27 °C. B) SHM scatter plot for 48 rad50∆ ura3∆ repeats at 27 °C. C) SHM scatter plot for 56 exo1∆ ura3∆ repeats at 27 °C. D) SHM density plot of posterior predictive distributions for the rad50∆ ura3∆ carrying capacity $K$ hierarchy. The prior distribution for $K^{p}$ is in black. The posterior predictive for $e^{K^{o}_{l}}$ is in blue and for $K_{clm}$ in green. The posterior distribution of the first time-course repeat $K_{clm}$ parameter is in red. Parameters $K^{p}$, $e^{K^{o}_{l}}$ and $K_{clm}$ are on the same scale as the observed data.

4.2.1. Frequentist approach

Figure 4.2A is an $MDR \times MDP$ fitness plot from Addinall et al. (2011), where growth curves and evidence for genetic interaction are modelled using the non-hierarchical frequentist methodology discussed in Section 1.2.2. Figure 4.2B is an $MDR \times MDP$ fitness plot for the frequentist hierarchical approach REM, described in Table 3.4, applied to the logistic growth parameter estimates used in Addinall et al. (2011). The numbers of genes identified as interacting with cdc13-1 by Addinall et al. (2011) and by the REM are 715 and 315 respectively (Table 4.1). The REM has highlighted many strains which have low fitness. In order to fit a linear model to the fitness data and interpret results in terms of the multiplicative model, we apply a log transformation to the fitnesses, thereby affecting the distribution of orf∆ level variation.

The REM accounts for between-subject variation and allows a query mutation effect and an orf∆ effect to be estimated simultaneously, unlike the model presented by Addinall et al. (2011). Due to the limitations of the frequentist hierarchical modelling framework, the REM assumes equal variances for all orf∆s and incorrectly describes orf∆ level variation as Log-normal, assumptions that are not necessary in our new Bayesian approaches.
[Image: four fitness plot panels A, B, C and D, each labelled with gene names]

Figure 4.2: Fitness plots with orf∆ posterior mean fitnesses. Mean orf∆ level fitnesses are plotted for the control strains against the corresponding query strains. orf∆s with significant evidence of interaction are highlighted in red and green for suppressors and enhancers respectively. A) Non-Bayesian, non-hierarchical fitness plot, based on Table S6 from Addinall et al. (2011) ($F = MDR \times MDP$). B) Non-Bayesian, hierarchical fitness plot, from fitting the REM to data in Table S6 from Addinall et al. (2011) ($F = MDR \times MDP$). C) IHM fitness plot with orf∆ posterior mean fitness. orf∆s with significant evidence of interaction are highlighted on the plot as red and green for suppressors and enhancers respectively ($F = MDR \times MDP$). D) JHM fitness plot with orf∆ posterior mean fitnesses. orf∆ strains for the JHM plot are classified as being a suppressor or enhancer based on analysis of growth parameter $r$, meaning occasionally strains can be more fit in the query experiment in terms of $MDR \times MDP$ but be classified as enhancers (green). For panels A and B significant interactors are classified as those with FDR corrected p-values < 0.05.
For panels C and D significant interactors have posterior probability ∆ > 0.5. To compare fitness plots, labelled genes are those belonging to the following GO terms in Table 4.1: “telomere maintenance”, “ageing”, “response to DNA damage stimulus” or “peroxisomal organization”, as well as the genes identified as interactions only in $K$ with the JHM (see Figure 4.3) (blue), genes interacting only in $r$ with the JHM (cyan) and the MRX complex genes (pink). Solid and dashed grey fitted lines are for the 1-1 line and linear model fits respectively. Alternative fitness plots with each of the GO terms highlighted are given in Section B.2 of the Appendix.

Table 4.1: Number of genes interacting with cdc13-1 at 27 °C identified using each of four approaches: Add (Addinall et al., 2011), REM, IHM and JHM. Numbers of genes annotated with four example GO terms (telomere maintenance, ageing, response to DNA damage stimulus and peroxisome organisation) are also listed. For the Addinall et al. (2011) and REM approaches, significant interactors are classified as those with FDR corrected p-values (q-values) < 0.05. The label “half data” denotes analyses where only half of the available experimental observations are used. The JHM uses a $MDR \times MDP$ summary after model fitting to classify suppressors and enhancers, comparable with the other three approaches. The full lists of GO terms for each approach considered are given in a spreadsheet document, freely available online at http://research.ncl.ac.uk/qfa/HeydariQFABayes/.
Method   Full data                        Half data
         Suppressors  Enhancers  Hits     Suppressors  Enhancers  Hits
Add      419          296        715      263          192        455
REM      184          131        315      103          86         189
IHM      404          172        576      252          113        365
JHM      665          274        939      475          177        601

GO term                                  Method  Genes  p value    q value
Telomere maintenance (N=33)              Add     18     1.52E-06   0.0376
                                         REM     11     2.37E-05   0.0136
                                         IHM     14     6.57E-05   0.0051
                                         JHM     18     8.22E-05   0.0155
Ageing (N=58)                            Add     16     4.32E-05   0.1863
                                         REM     10     0.0004     0.0824
                                         IHM     16     0.0015     0.0445
                                         JHM     21     0.0015     0.0986
Response to DNA damage stimulus (N=58)   Add     69     9.28E-12   8.14E-10
                                         REM     49     7.40E-16   1.73E-13
                                         IHM     55     4.60E-09   3.41E-07
                                         JHM     76     3.52E-09   1.99E-07
Peroxisome organization (N=180)          Add     13     0.225      0.468
                                         REM     3      0.855      0.914
                                         IHM     10     0.318      0.524
                                         JHM     24     0.002      0.019

4.2.2. Two stage Bayesian approach

Figure 4.2C is an interaction hierarchical model (IHM) fitness plot with orf∆ level fitness measures generated using the new Bayesian two-stage methodology with fitness in terms of $MDR \times MDP$. 576 genes are identified by the IHM as genetic interactions (Table 4.1). Logistic parameter posterior means are used to generate fitness measures. For a gene $l$ from the gene deletion library, $e^{Z_l}$ is the fitness for the control and $e^{\alpha_1 + Z_l + \delta_l \gamma_{c,l}}$ is the fitness for the query in the IHM. For a gene $l$ in the query screen with no evidence of genetic interaction, i.e. $\delta_l = 0$, fitness will be a linear transformation from the control counterpart, $e^{\alpha_1 + Z_l}$.

Similar to Figures 4.2A and 4.2B, Figure 4.2C shows how the majority of control strains are more fit than their query strain counterparts, with a mean fitted line lying below the line of equal fitness. Comparing the fitted lines in Figures 4.2A and 4.2B with Figure 4.2C, the IHM shows the largest deviation between the fitted line and the line of equal fitness. This is largely due to the difference in $P$ estimated with the SHM for the control and query data sets being scaled out by the parameter $\alpha_1$.
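As a hedged illustration of the IHM fitness definitions in this section, the sketch below evaluates control and query fitness from posterior means. The function name and all numerical values are hypothetical; `alpha` stands for $\alpha_1$, `z` for $Z_l$, `delta` for $\delta_l$ and `gamma` for $\gamma_{c,l}$.

```python
import math

def ihm_fitness(z, alpha, delta, gamma):
    """Return (control, query) fitness for one orf deletion under the IHM."""
    control = math.exp(z)
    query = math.exp(alpha + z + delta * gamma)
    return control, query

# Hypothetical posterior means, for illustration only.
control, query = ihm_fitness(z=4.0, alpha=-0.5, delta=0, gamma=0.8)
# With delta = 0 the query fitness is the control fitness scaled by exp(alpha).
ratio_no_interaction = query / control

control_i, query_i = ihm_fitness(z=4.0, alpha=-0.5, delta=1, gamma=0.8)
# With delta = 1 the ratio gains an extra factor exp(gamma).
ratio_interaction = query_i / control_i
```

On the log scale the no-interaction case is a shift by `alpha`, which corresponds to the line of genetic independence on the fitness plot.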
If we fix $P$ in our Bayesian models, similar to the frequentist approach, genetic interactions identified are largely the same, but we then have the problem of choosing $P$. We recommend estimating $P$ simultaneously with the other model parameters because if the choice of $P$ is not close to the true value, growth rate $r$ estimates must compensate and do not give accurate estimates for time courses with low carrying capacity $K$.

It can be seen that many of the interacting orf∆s have large deviations from the genetic independence line. This is because of the indicator variable in the model, used to describe genetic interaction. When there is enough evidence for interaction the Bernoulli variable is set to 1, otherwise it is set to 0. It is interesting to note that non-significant orf∆s, marked by grey points, lie amongst some of the significant strains. Many such points have high variance and therefore we are less confident that these interact with the query mutation. This feature of our new approach is an improvement over that presented in Addinall et al. (2011), which always shows evidence for an epistatic effect when mean distance from the genetic independence line is large, regardless of strain fitness variability. An extract from the list of top interactions identified by the IHM is included in Table B.2.

4.2.3. One stage Bayesian approach

Figure 4.2D is a JHM $MDR \times MDP$ fitness plot using the new, unified Bayesian methodology. The $MDR \times MDP$ fitness plot given in Figure 4.2D is for visualisation and comparison with the $MDR \times MDP$ fitness plots of the other approaches considered: the JHM does not make use of a fitness measure. 939 genes are identified by the JHM as genetic interactions (Table 4.1). Posterior means of model parameters are used to obtain the following fitness measures. With the JHM we can obtain an orf∆ level estimate of the carrying capacity and growth rate ($K$, $r$) for a gene $l$.
For a gene $l$ from the gene deletion library, carrying capacity and growth rate ($e^{K^o_l}$, $e^{r^o_l}$) are used to evaluate the fitness for the control and ($e^{\alpha_1 + K^o_l + \delta_l \gamma_{c,l}}$, $e^{\beta_1 + r^o_l + \delta_l \omega_{c,l}}$) for the query. For a gene $l$ in the query screen with no evidence of genetic interaction, i.e. $\delta_l = 0$, carrying capacity and growth rate will be linear transformations from the control counterpart ($e^{\alpha_1 + K^o_l}$, $e^{\beta_1 + r^o_l}$).

Instead of producing a fitness plot in terms of $MDR \times MDP$, it can also be useful to analyse carrying capacity $K$ and growth rate $r$ fitness plots as, in the JHM, evidence for genetic interaction comes from both of these parameters simultaneously (see Figures B.5 and B.6). Fitness plots in terms of logistic growth parameters are useful for identifying some unusual characteristics of orf∆s. For example, an orf∆ may be defined as a suppressor in terms of $K$ but an enhancer in terms of $r$. To enable direct comparison with the Addinall et al. (2011) analyses we generated a $MDR \times MDP$ fitness plot, Figure 4.2D. An extract from the list of top interactions identified by the JHM is included in Table B.3.

Table 4.2: Genes interacting with cdc13-1 at 27 °C and GO terms over-represented in the list of interactions according to each approach. A) Number of genes identified for each approach (Add Addinall et al. (2011), REM, IHM and JHM) and the overlap between the approaches. 4135 genes from the S. cerevisiae single deletion library tested overall. B) Number of GO terms identified for each approach (Add Addinall et al. (2011), REM, IHM and JHM) and the overlap between the approaches. 6107 S. cerevisiae GO Terms available.

A.                REM:0            REM:1
                  Add:0   Add:1    Add:0   Add:1
IHM:0   JHM:0     3097    54       31      10
        JHM:1     231     78       29      29
IHM:1   JHM:0     1       2        1       0
        JHM:1     30      327      0       215

B.                REM:0            REM:1
                  Add:0   Add:1    Add:0   Add:1
IHM:0   JHM:0     5813    21       58      7
        JHM:1     46      8        6       10
IHM:1   JHM:0     20      15       3       12
        JHM:1     13      54       2       147

4.3. Comparison with previous analysis

4.3.1. Significant genetic interactions

Of the genes identified as interacting with cdc13-1 (1038, see Table 4.2A) some are identified consistently across all four approaches (215 out of 1038, see Table 4.2A). Of the hits identified by the JHM (939), the majority (639) are common with those in the previously published Addinall et al. (2011) approach. However, 231 of 939 are uniquely identified by the JHM and could be subtle interactions which are the result of previously unknown biological processes.

To examine the evidence for some interactions uniquely identified by the JHM in more detail we compared the growth curves for three examples from the group of interactions identified only by the JHM. These examples (chz1∆, pre9∆ and pex6∆) are genetic interactions which can be identified in terms of carrying capacity $K$, but not in terms of growth rate $r$ (see Figure 4.3). By observing the difference between the fitted growth curve (red) and the expected growth curve given no interaction (green) in Figures 4.3A, 4.3B and 4.3C, we test for genetic interaction. Since the expected growth curves in the absence of genetic interaction are not representative of either the data or the fitted curves on the repeat and orf∆ level, there is evidence for genetic interaction.

We chose a prior for the probability $p$ of a gene interacting with the background mutation as 0.05. We therefore expected to find 215 genes interacting. The Bayesian models, for which a prior is applicable (IHM and JHM), find more genes than expected (576 and 939 interactions respectively, Table 4.1), demonstrating that information in this dataset can overcome prior expectations. The JHM identifies the highest proportion of genes as hits out of all methods considered, particularly identifying suppressors of cdc13-1 (Table 4.1). In fact, the JHM identifies more hits than the Addinall et al. (2011) approach,
even when constrained to using only half of the available data. An important advantage to our new Bayesian approach is that we no longer have the difficulty of choosing a q-value threshold. For the Addinall et al. (2011) approach to have similar numbers of interactions to the JHM, a less stringent q-value threshold would have to be justified a posteriori by the experimenter.

4.3.2. Previously known genetic interactions

In order to compare the quality of our new, Bayesian hierarchical models with existing, frequentist alternatives, we examined the lists of genetic interactions identified by all the methods discussed and presented here. Comparing results with expected or previously known lists of interactions from the relevant literature, we find that genes coding for the MRX complex (MRE11, XRS2 & RAD50), which are known to interact with cdc13-1 (Foster et al., 2006), are identified by all four approaches considered and can be seen in a similar position in all four fitness plots (Figures 4.2A, 4.2B, 4.2C and 4.2D).

By observing the genes labelled in Figures 4.2A and 4.2B we can see that the frequentist approaches are unable to identify many of the interesting genes identified by the JHM as these methods are unable to detect interactions for genes close to the genetic independence line. The JHM has extracted more information from deletion strain fitnesses observed with high variability than the Addinall et al. (2011) approach by sharing more information between levels, consequently improving our ability to identify interactions for genes close to the line of genetic independence (subtle interactions). CTI6, RTC6 and TGS1 are three examples of subtle interactors identified only by the JHM (interaction in terms of $r$ but not $K$) which all have previously known telomere-related functions (Franke et al., 2008; Keogh et al., 2005; Addinall et al., 2008).
We tested the biological relevance of results from the various approaches by carrying out unbiased Gene Ontology (GO) term enrichment analyses on the hits (lists of genes classified as having a significant interaction with cdc13-1) using the Bioconductor package GOstats (Falcon & Gentleman, 2007). For the GO term enrichment analysis R code used, see Section B.5 of the Appendix.

All methods identify a large proportion of the genes in the yeast genome annotated with the GO terms “telomere maintenance” and “response to DNA damage stimulus” (see Table 4.1), which were the targets of the original screen, demonstrating that they all correctly identify previously known hits of biological relevance. Interestingly, the JHM identifies many more genes annotated with the “ageing” GO term, which we also expect to be related to telomere biology (though the role of telomeres in ageing remains controversial), suggesting that the JHM is identifying novel, relevant interactions not previously identified by the Addinall et al. (2011) screen (see Table 4.1). Similarly, the JHM identifies a much larger proportion of the PEX “peroxisomal” complex (included in GO term: “peroxisome organisation”) as interacting with cdc13-1 (see Table 4.1), including all of those identified in Addinall et al. (2011). Many of the PEX genes show large variation in both $K$ and $r$; an example can be seen in Figure 4.3C for pex6∆. Members of the PEX complex cluster tightly, above the fitted line in the fitness plot Figure 4.2D (fitness plots with highlighted genes for GO terms in Table 4.1 are given in Section B.2 of the Appendix), demonstrating that although these functionally related genes are not strong interactors, they do behave consistently with each other, suggesting that the interactions are real.
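GO term over-representation is commonly assessed with a hypergeometric tail probability. The following is a minimal sketch of that calculation and not the GOstats implementation used here: the function name is hypothetical, the gene universe is assumed to be the 4135 tested genes quoted in Table 4.2, and p-values computed this way need not match those reported in Table 4.1.

```python
from math import comb

def hypergeom_upper_tail(universe, annotated, hits, overlap):
    """P(X >= overlap) when `hits` genes are drawn without replacement from
    `universe` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, hits)
    total = comb(universe, hits)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Counts for the JHM and "telomere maintenance" taken from Tables 4.1 and 4.2;
# using 4135 as the gene universe is an assumption of this sketch.
p_telomere = hypergeom_upper_tail(universe=4135, annotated=33, hits=939, overlap=18)
```

Exact integer arithmetic is used for the binomial coefficients, so the only rounding occurs in the final division.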
The results of tests for significant over-representation of all GO terms are given in a spreadsheet document, freely available online at http://research.ncl.ac.uk/qfa/HeydariQFABayes/.

Overall, within the genes interacting with cdc13-1 identified by the Addinall et al. (2011), REM, IHM and JHM approaches, 274, 245, 266 and 286 GO terms were significantly over-represented respectively (out of 6235 possible GO terms, see Table 4.2B). 147 were common to all approaches, and examples from the group of GO terms over-represented in the JHM analysis and not in the Addinall et al. (2011) analysis seem internally consistent (e.g. “peroxisome organisation” GO term) and consistent with the biological target of the screen, telomere biology (significant GO terms for genes identified only by the JHM are also included in the spreadsheet document).

Extracts from the list of top interactions identified by both the IHM and JHM are provided in Section B.3. Files including the full lists of genetic interactions for the IHM and JHM are freely available online at http://research.ncl.ac.uk/qfa/HeydariQFABayes/. Alternative fitness plots to Figures 4.2A, B, C & D with gene labels for those showing significant evidence of genetic interaction are provided in Figure 1.4 and Section B.7. As suppressors and enhancers in the JHM may be in terms of both $K$ and $r$, fitness plots in terms of $K$ and $r$ with gene labels for those showing significant evidence of genetic interaction are given in Figure B.10 and Figure B.11 respectively.

To further compare the similarity of the Bayesian hierarchical models and frequentist analysis, a table of Spearman's rank correlation coefficients (Spearman, 1987) between genetic interaction strengths and a $MDR \times MDP$ correlation plot of the JHM versus the Addinall et al. (2011) approach are given in Section B.8 of the Appendix.

4.3.3. Hierarchy and model parameters

The hierarchical structure and model choices included in the Bayesian JHM and IHM are derived from the known experimental structure of QFA. Different levels of variation for different orf∆s are expected and can be observed by comparing distributions of frequentist estimates or by visual inspection of yeast culture images. The direct relationship between experimental and model structure, together with the richness of detail and number of replicates included in QFA experimental design, reassures us that overfitting is not an issue in this analysis. For the ura3∆ 27 °C and cdc13-1 27 °C experiment with 4294 orf∆s there are 1.25 times the number of parameters in the JHM (∼200,000) compared to the two stage REM approach (∼160,000), but when compared to the large number of pairs of data points (∼830,000) there are sufficient degrees of freedom to justify our proposed Bayesian models.

4.3.4. Computing requirements

Our Bayesian hierarchical models require significant computational time. As expected, the mixing of chains in our models is weakest at population level parameters such as $K_p$ and $\alpha_c$. For the ura3∆ 27 °C and cdc13-1 27 °C dataset, the JHM takes ∼2 weeks to converge and produce a sufficiently large sample. The two stage Bayesian approach takes one week (with the IHM part taking ∼1 day), whereas the REM takes ∼3 days and the Addinall et al. (2011) approach takes ∼3 hours. A QFA experiment can take over a month from start to finish and so analysis time is acceptable in comparison to the time taken for the creation of the data set, but is still a notable inconvenience. We expect that with further research effort, computational time can be decreased by using an improved inference scheme and that inference for the JHM could be completed in less than a week without parallelisation.
MCMC algorithms are inherently sequential, so parallelisation is not completely trivial and may be considered for future development. Parallelisation may reduce computational time by partitioning the state space into segments that can be updated in parallel (Rosenthal, 2000). For the JHM it may be possible to partition by QFA screens to reduce computational time. Further, parallelisation may be possible across orf∆s for even further reduction to computational time.

[Image: three panels A, B and C plotting Scaled Culture Density (AU) against Time (Days)]

Figure 4.3: Joint hierarchical model (JHM) logistic growth curve fitting. JHM data for orf∆ repeats have been plotted in A, B and C, with fitted curves overlaid in black for repeat level parameters, red for the orf∆ level query parameter fit and green for the expected orf∆ level query parameter fit with no genetic interaction. A) JHM scatter plot for 8 chz1∆ cdc13-1 repeats. B) JHM scatter plot for 8 pre9∆ cdc13-1 repeats. C) JHM scatter plot for 8 pex6∆ cdc13-1 repeats.

4.3.5. Convergence diagnostics

Evidence of convergence for our Bayesian models in Sections 4.2.2 and 4.2.3 can be shown by observing posterior samples from the MCMC samplers used. Figures 4.4, 4.5 and 4.6 show evidence of convergence for a subset of population level parameters from the SHM, IHM and JHM respectively. Posterior samples of 1000 particles are obtained after a burn-in period of 800k and a thinning of every 100 observations for the SHM, IHM and JHM. Population level parameters are found to have the worst mixing in our models due to the large number of lower level parameters that population level parameter sampling distributions are conditioned upon.
We demonstrate how our population parameters have converged with trace plots, ACF plots and density plots in Figures 4.4, 4.5 and 4.6. Trace plots show that the posterior samples are bound within a fixed range of values, indicating convergence. Auto-correlation functions do not have any large peaks above the dashed blue line for significant evidence of dependence, showing that each sequential sample value from the posterior distributions is largely uncorrelated with previous values and ensuring that the effective sample size is similar to the actual sample size. ACF plots in Figures 4.5 and 4.6 do show some dependence within our posterior samples, but as the ACF decays rapidly before a lag of 5, the dependence is small and will not be a problem for inference. Density plots show that there is enough information within the models to give sufficiently peaked single modes, converging around a fixed region of plausible values.

Table 4.3 gives diagnostic statistics for the population parameters considered in Figures 4.4, 4.5 and 4.6. We can see in Table 4.3 that the lowest effective sample size of our model parameters is 324, for the JHM $P$ parameter, followed by 378 for the SHM $P$ parameter. Of all our model parameters, $P$ was found to have the lowest effective sample size, but we are still able to find a large enough sample for our inference. Heidelberger and Welch P-values do not show evidence against the stationarity of our chains, using a cut-off of 0.10. The above statistics are calculated for all model parameters and are used to identify where mixing is poor and whether our model has reached convergence. All chains are accepted for parameter posterior samples in Sections 4.2.2 and 4.2.3 as effective sample sizes are found to be greater than 300 and Heidelberger and Welch P-values greater than 0.10 for every chain.
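The link between autocorrelation and effective sample size described above can be sketched as follows. This is a simple illustrative estimator, assuming the sum of autocorrelations is truncated at the first non-positive lag; the estimator behind Table 4.3 is not specified here and may differ. The two chains are simulated stand-ins, not posterior samples from the SHM, IHM or JHM.

```python
import random

def autocorrelation(chain, lag):
    """Sample autocorrelation of `chain` at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag))
    return cov / var

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncating the sum at the
    first non-positive autocorrelation."""
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(chain, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

random.seed(1)
n = 1000
independent = [random.gauss(0, 1) for _ in range(n)]

# AR(1) chain with strong dependence, a stand-in for a poorly mixing sampler.
sticky = [0.0]
for _ in range(n - 1):
    sticky.append(0.9 * sticky[-1] + random.gauss(0, 1))

ess_independent = effective_sample_size(independent)
ess_sticky = effective_sample_size(sticky)
acf_bound = 1.96 / n ** 0.5  # approximate 95% bound usually drawn on ACF plots
```

A well-thinned chain behaves like `independent`, with an effective sample size close to the number of retained particles, whereas a chain like `sticky` yields far fewer effectively independent draws.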
Table 4.3: Bayesian model convergence statistics for the two-stage approach in Section 4.2.2 and one-stage approach in Section 4.2.3. Heidelberger and Welch P-values and the effective sample size have been calculated for a subset of population level parameters.

Model   Parameter     Effective sample size   Heidelberger and Welch P-value
SHM     $K_p$         521                     0.49
        $r_p$         441                     0.11
        $P$           378                     0.56
        $\nu_p$       1000                    0.17
IHM     $Z_p$         677                     0.35
        $\sigma_z$    430                     0.14
        $\nu_p$       1000                    0.46
        $\alpha_c$    914                     0.59
JHM     $K_p$         473                     0.72
        $r_p$         566                     0.12
        $P$           324                     0.12
        $\nu_p$       1000                    0.13
        $\alpha$      407                     0.36
        $\beta$       808                     0.67

[Image: trace plots against particle number, ACF plots against lag and density plots against parameter value]

Figure 4.4: Convergence diagnostics for the separate hierarchical model (SHM). Trace, autocorrelation and density plots for the SHM parameter posteriors (sample size = 1000, thinning interval = 100 and burn-in = 800000), see Section 4.2.2. Posterior (black) and prior (red) densities are shown in the right hand column.

[Image: trace plots against particle number, ACF plots against lag and density plots against parameter value]

Figure 4.5: Convergence diagnostics for the interaction hierarchical model (IHM).
Trace, autocorrelation and density plots for the IHM parameter posteriors (sample size = 1000, thinning interval = 100 and burn-in = 800000), see Section 4.2.2. Posterior (black) and prior (red) densities are shown in the right hand column.

[Image: trace plots against particle number, ACF plots against lag and density plots against parameter value]

Figure 4.6: Convergence diagnostics for the joint hierarchical model (JHM). Trace, autocorrelation and density plots for the JHM parameter posteriors (sample size = 1000, thinning interval = 100 and burn-in = 800000), see Section 4.2.3. Posterior (black) and prior (red) densities are shown in the right hand column.

4.3.6. Simulation study

A simulation study was carried out to compare the performance of the different approaches considered for a simulated QFA screen comparison from the JHM. We believe that the JHM closely models a QFA screen comparison and so by simulating a QFA screen comparison data set from the JHM we will obtain a data set for which we know the full set of true genetic interactions. Simulated JHM data will include important features of QFA screen comparison data, such as a hierarchical structure and genetic interaction in terms of both $K$ and $r$.

Two simulated QFA screens were generated, a control and a query screen with some condition effect in the query.
Each screen consists of 4300 orf∆s and 8 logistic growth time-course repeats for each orf∆. Each time-course consists of 10 measurements, evenly distributed across 6 days. 430 genes were set as genetic interactors in the query screen. The true population level parameters are chosen from frequentist estimates of 10 historic data sets; orf∆ and repeat level parameters are then generated from the JHM structure in Table 3.3 and growth time-course data simulated.

Table 4.4 shows the number of true genetic interactions identified, suppressors and enhancers, as well as false positives (FPs) and false negatives (FNs) for each of the approaches considered. As expected, the JHM identifies the largest number of true genetic interactions. The number of suppressors identified by the JHM is higher than for the Addinall et al. (2011), REM and IHM approaches, but for enhancers all methods perform very similarly.

Performance of the different methods can be observed through the FP and FN rates. From Table 4.4 we can calculate FP and FN rates, where FP rate = 1 − “sensitivity” and FN rate = 1 − “specificity”, using those columns as they are labelled in Table 4.4. FP rates for the Addinall et al. (2011), REM, IHM and JHM are 0.078, 0.042, 0.006 and 0.002 respectively. The JHM has the lowest FP rate when compared to the other approaches available. The frequentist approaches, Addinall et al. (2011) and REM, have large FP rates when compared to the two Bayesian approaches. The Addinall et al. (2011) approach has more false positives than true genetic interactions. FN rates for the Addinall et al. (2011), REM, IHM and JHM are 0.488, 0.570, 0.593 and 0.270 respectively. The two-stage approaches, Addinall et al. (2011), REM and IHM, have large FN rates when compared to the JHM. The Addinall et al. (2011), REM and IHM approaches have ∼200 false negatives, approximately double the number identified by the JHM (∼100).
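The quoted rates can be reproduced from the false positive and false negative counts in Table 4.4. The sketch below assumes that the FP rate is taken over the 4300 − 430 = 3870 simulated genes without interaction and the FN rate over the 430 simulated interactors; the function name is hypothetical.

```python
def error_rates(false_positives, false_negatives, n_genes=4300, n_true=430):
    """Return (FP rate, FN rate) given counts from a simulated screen."""
    n_null = n_genes - n_true  # genes simulated without interaction
    fp_rate = false_positives / n_null
    fn_rate = false_negatives / n_true
    return fp_rate, fn_rate

# (false positives, false negatives) per approach, from Table 4.4.
counts = {
    "Addinall et al. (2011)": (303, 210),
    "REM": (163, 245),
    "IHM": (23, 255),
    "JHM": (8, 116),
}
rates = {name: error_rates(fp, fn) for name, (fp, fn) in counts.items()}
```

Rounded to three decimal places, `rates` agrees with the FP and FN rates stated in the text under these denominators.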
Observing the genes that have been missed by the two-stage approaches, we find that they often fail to identify genetic interactions when evidence is weak in only one of K or r, even if there is sufficient evidence in the other parameter for the JHM to identify the genetic interaction.

From our simulation study we have been able to show that the two-stage frequentist approaches have high numbers of false positives and false negatives. From the number of false positives identified for each method, we can see that the non-hierarchical Addinall et al. (2011) approach has the worst performance, followed by the hierarchical two-stage approaches. As expected, the JHM is the best approach when we consider a simulated hierarchical data set with genetic interaction in terms of K and r, as the two-stage approaches fail to capture more subtle genetic interactions.

Table 4.4: Simulation study with a joint hierarchical model (JHM) simulated dataset. A QFA screen comparison was generated from the JHM and 430 genes are set as genetic interactors, see Section 4.3.6. The Addinall et al. (2011), REM, two-stage Bayesian (IHM) and one-stage Bayesian (JHM) approaches are applied to the JHM simulated dataset and their performance compared. Suppressors and enhancers are defined in terms of MDR × MDP.

| Model | True interactions identified (N=430) | True suppressors (N=274) | True enhancers (N=156) | False positives | False negatives | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| Addinall et al. (2011) | 220 | 158 | 62 | 303 | 210 | 0.922 | 0.512 |
| REM | 185 | 100 | 85 | 163 | 245 | 0.958 | 0.430 |
| IHM | 175 | 130 | 45 | 23 | 255 | 0.994 | 0.407 |
| JHM | 314 | 256 | 58 | 8 | 116 | 0.998 | 0.730 |

4.4. Bayesian inference code comparison

Inference for the Bayesian hierarchical models in this thesis is carried out using code written in the C programming language.
To see how our code compares to commonly used software available for carrying out inference for Bayesian models, we have compared posterior samples from our C code with those from equivalent code using the Just Another Gibbs Sampler (JAGS) software (written in C++) (Plummer, 2003). We carry out our JAGS analysis within the R package “rjags” (Plummer, 2010), which provides a more familiar framework for an R user implementing the JAGS software. The BUGS (Bayesian inference Using Gibbs Sampling) language (Lunn et al., 2000a) is used to describe models in JAGS. The SHM, IHM and JHM have each been described with the BUGS language in Section B.6 of the Appendix.

For the following comparison we use a subset from the cdc13-1 27 °C vs ura3Δ 27 °C suppressor/enhancer data set described in Section 4.2. A subset of 50 orfΔs (for both the control and query) is chosen, each with 8 time-course repeats. With a smaller data set we are able to collect large posterior samples, sufficient to carry out a comparison between posterior samples. Density plots are used to visually compare the similarity of the posterior samples from the C and JAGS code. The Kolmogorov-Smirnov test (Huber-Carol, 2002) and unpaired two-sample Student's t-test (Witte & Witte, 2009) are used to test for significant difference between posterior samples from our C and JAGS code.

Table 4.5: Unpaired t-test and Kolmogorov-Smirnov p-values comparing posterior samples from the joint hierarchical model (JHM) using both C and Just Another Gibbs Sampler (JAGS) software. An extract of JHM parameters is given for both the C programming language and JAGS software. Posterior means are also included for both approaches. t-tests are carried out on the log posterior samples, i.e. $\hat{K}^p$ in place of $e^{\hat{K}^p}$, to assume normality.

| Parameter | C code posterior mean | JAGS posterior mean | t-test p-value (with log posterior samples) | Kolmogorov-Smirnov test p-value |
|---|---|---|---|---|
| $e^{\hat{K}^p}$ | 0.143 | 0.143 | 0.452 | 0.401 |
| $e^{\hat{r}^p}$ | 4.639 | 4.641 | 0.424 | 0.482 |
| $e^{\hat{P}}$ | $2.537 \cdot 10^{-04}$ | $2.517 \cdot 10^{-04}$ | 0.137 | 0.116 |
| $e^{\hat{\nu}^p}$ | $7.402 \cdot 10^{04}$ | $7.416 \cdot 10^{04}$ | 0.250 | 0.190 |
| $e^{\hat{\alpha}_c}$ | 0.304 | 0.304 | 0.203 | 0.140 |
| $e^{\hat{\beta}_c}$ | 0.384 | 0.384 | 0.156 | 0.146 |

A comparison of posterior samples for our most sophisticated model, the JHM, is given below. Posterior samples of 100k particles are obtained after a burn-in period of 1000k and a thinning of every 100 observations for both the C and JAGS code. Computational time for the C and JAGS code is ∼30 hours and ∼400 hours respectively. The minimum effective sample size per second (ESS_min/sec) for the C and JAGS code is ∼1 and ∼0.1 respectively, demonstrating that the C code is ∼10× faster.

Figure 4.7 gives density plots for an extract of JHM parameters for the C and JAGS software. Visually there is no clear difference between the posterior sample density plots in Figure 4.7. Of the parameters shown, the smallest effective sample size (∼80000 ESS) is for the initial inoculum parameter P, but this ESS is sufficiently large to test whether the posterior samples show a significant difference.

Table 4.5 demonstrates further that there is no significant difference found between the parameters shown. The unpaired t-test for log posterior samples (for the normality assumption) and Kolmogorov-Smirnov test p-values are all greater than 0.10 for the parameters given, including the inoculum density parameter P.

Overall we find no significant evidence against the C code and JAGS code sampling from the same posterior distributions. As carrying out inference using C is ∼10 times faster than the equivalent JAGS code, we prefer the C code for our Bayesian hierarchical models.
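The comparison of two sets of posterior samples described above can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the analysis code used for Table 4.5: the Kolmogorov-Smirnov p-value uses the asymptotic Kolmogorov distribution, the t-test is replaced by a Welch statistic with a normal approximation to its p-value (reasonable only for large samples), the thinned samples are treated as independent, and the two simulated samples below are hypothetical stand-ins for log-scale C and JAGS output.

```python
import math
import random
from statistics import NormalDist, mean, variance


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through the pooled values, tracking both empirical CDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The tail probability is indistinguishable from 1 here and the series converges slowly.
        return d, 1.0
    # Kolmogorov tail: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


def welch_t_normal_approx(a, b):
    """Welch two-sample t statistic with a large-sample normal approximation to the p-value."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical stand-ins for log-scale posterior samples from two samplers.
random.seed(1)
c_samples = [random.gauss(-1.945, 0.02) for _ in range(2000)]
jags_samples = [random.gauss(-1.945, 0.02) for _ in range(2000)]

d_stat, ks_p = ks_two_sample(c_samples, jags_samples)
t_stat, t_p = welch_t_normal_approx(c_samples, jags_samples)
print(f"KS: D = {d_stat:.4f}, p = {ks_p:.3f}")
print(f"t:  t = {t_stat:.3f}, p = {t_p:.3f}")
```

Applying the t-test to log-scale samples mirrors the choice made for Table 4.5, where the log posterior samples are used so that the normality assumption is more reasonable.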
When obtaining sufficiently large independent posterior samples for a larger data set of ∼4000 orfΔs, we estimate our C code to be more than ∼50× faster than the equivalent JAGS code, as we find the computational cost of the JAGS code to grow exponentially as we introduce larger data sets. JAGS is very useful for model exploration, as it makes complex models fast and simple to describe. However, the JAGS software is so prohibitively slow for the JHM that an experimenter is likely not to carry out such inference and to use a simpler or faster method instead, justifying the use of the C programming language to carry out inference. Further improvements, such as the introduction of parallelisation, may lead to more favourable computational times in the future.

[Figure 4.7 plots not reproduced. One density panel for each of the parameters $e^{\hat{K}^p}$, $e^{\hat{r}^p}$, $e^{\hat{P}}$, $e^{\hat{\nu}^p}$, $e^{\hat{\alpha}_2}$ and $e^{\hat{\beta}_2}$, with density on the vertical axis.]

Figure 4.7: Density plots for posterior samples from the joint hierarchical model (JHM) using the C programming language (red) and Just Another Gibbs Sampler (black) software. Density plots for the JHM parameter posteriors (sample size = 100000, thinning interval = 100 and burn-in = 1000000).

4.5. Further case studies

In this section we briefly introduce different data sets that may be considered for further investigation and research. We can also see how the JHM performs for different experimental conditions by applying the JHM to different QFA screen comparisons, see the MDR × MDP fitness plots in Figures 4.8-4.11. The data sets used in Figures 4.8-4.11 are currently unpublished data from the Lydall lab.
For each of the data sets, the JHM in Table 3.3 is applied with the prior hyper-parameters in Table B.1. Posterior samples of 1000 particles are obtained after a burn-in period of 800k and a thinning of every 100 observations. Similarly to Section 4.3.5, chains from our MCMC sampler are accepted where the effective sample sizes are greater than 300 and Heidelberger and Welch p-values are greater than 0.10 for every chain. As in the Addinall et al. (2011) analysis, each experiment has a list of 159 genes stripped from our final list of genes for biological and experimental reasons. Results for the cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C and cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C experiments have further genes removed for biological and experimental reasons, 23 and 13 genes respectively (a total of 182 and 172 genes respectively).

Figure 4.8 is a cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C suppressor/enhancer analysis for finding genes that interact with exo1 in a telomere maintenance defective background (cdc13-1 at 27 °C). Similarly, Figure 4.9 is a cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C suppressor/enhancer analysis for finding genes that interact with rad9 in a telomere maintenance defective background. Figure 4.10 is a yku70Δ 37 °C vs ura3Δ 37 °C suppressor/enhancer analysis for finding genes that interact with yku70 at high temperature. Figure 4.11 is an example of a temperature sensitivity experiment, for finding genes that interact with the high temperature of 37 °C.

Figures 4.8-4.11 demonstrate that the JHM can capture different linear relationships that are above or below the 1-1 line. Curvature of the data in Figures 4.8-4.11 suggests that the linear relationships modelled by the JHM may be improved through linearising transformations of the data. Extending the JHM to account for the curvature in the data may improve our model fit and allow us to better determine genes which significantly interact.
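The sampler settings and the effective sample size (ESS) acceptance rule used above can be illustrated with a short sketch. It assumes a simple autocorrelation-based ESS estimator, truncated at the first non-positive lag, which is not necessarily the estimator used for the thesis results. The Heidelberger and Welch diagnostic is not implemented here, and the two simulated chains are hypothetical.

```python
import random
from statistics import mean

BURN_IN, THIN, KEPT = 800_000, 100, 1_000
# Sampler iterations needed per chain for 1000 retained particles.
TOTAL_ITERATIONS = BURN_IN + THIN * KEPT


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mu = mean(x)
    c0 = sum((v - mu) ** 2 for v in x) / n
    if c0 == 0.0:
        return 0.0  # constant chain: treat as uncorrelated to avoid dividing by zero
    ck = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag)) / n
    return ck / c0


def effective_sample_size(x, max_lag=100):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the first non-positive lag."""
    n = len(x)
    total = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(x, lag)
        if rho <= 0.0:
            break
        total += rho
    return n / (1.0 + 2.0 * total)


def keep_chain(samples, min_ess=300):
    """Apply only the ESS part of the acceptance rule (ESS greater than 300)."""
    return effective_sample_size(samples) > min_ess


# Hypothetical chains: one well mixed, one strongly autocorrelated (AR(1), coefficient 0.9).
random.seed(0)
iid_chain = [random.gauss(0.0, 1.0) for _ in range(KEPT)]
sticky_chain = [0.0]
for _ in range(KEPT - 1):
    sticky_chain.append(0.9 * sticky_chain[-1] + random.gauss(0.0, 1.0))

print("iterations per chain:", TOTAL_ITERATIONS)
print("well-mixed chain accepted:", keep_chain(iid_chain))
print("autocorrelated chain accepted:", keep_chain(sticky_chain))
```

A heavily thinned chain behaves more like the first example, which is why a thinning interval of 100 helps the 1000 retained particles meet an ESS threshold of 300.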
Table 4.6 compares the number of suppressors and enhancers estimated for each of the experiments considered. The experiments in Table 4.6 have similar numbers of genetic interactions, ranging from 358 to 511, but much lower than the cdc13-1 27 °C vs ura3Δ 27 °C experiment, which has 939. The experiments introduced in this section also differ from the cdc13-1 27 °C vs ura3Δ 27 °C experiment as they have more enhancers than suppressors, further demonstrating the JHM's ability to model different experimental situations and the non-restrictive choice of priors (Table B.1).

Table 4.6: Number of joint hierarchical model (JHM) interactions for QFA datasets given in Section 4.5. Interactions for each dataset are split into suppressors and enhancers. The number of interactions found with the extensions to the joint hierarchical model (see Section 4.6) are also given. Each QFA screen comparison consists of 4294 orfΔs. Results for all experiments have a list of 159 genes removed from the final list of interactions for biological and experimental reasons. Results for the cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C and cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C experiments have further genes removed for biological and experimental reasons, 23 and 13 genes respectively (a total of 182 and 172 genes respectively).

| Query screen | Control screen | Interactions | Suppressors | Enhancers |
|---|---|---|---|---|
| cdc13-1 exo1Δ 27 °C | cdc13-1 27 °C | 388 | 81 | 307 |
| cdc13-1 rad9Δ 27 °C | cdc13-1 27 °C | 358 | 73 | 285 |
| yku70Δ 37 °C | ura3Δ 37 °C | 511 | 104 | 407 |
| ura3Δ 37 °C | ura3Δ 20 °C | 460 | 138 | 322 |

| Model for cdc13-1 27 °C vs ura3Δ 27 °C experiment | Interactions | Suppressors | Enhancers |
|---|---|---|---|
| JHM | 939 | 665 | 274 |
| JHM-Batch | 553 | 378 | 174 |
| JHM-Transformation | 901 | 658 | 243 |

Table 4.7A shows the overlap in genes with significant evidence of genetic interactions between the different QFA comparisons considered.
The largest numbers of overlapping genetic interactions are found with the cdc13-1 27 °C vs ura3Δ 27 °C experiment, which overlaps with 301 and 263 genes from the cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C and cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C experiments respectively. The cdc13-1 27 °C vs ura3Δ 27 °C, cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C and cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C experiments are expected to overlap most as they are designed to find genes interacting in a cdc13-1 background. The smallest numbers of overlapping genetic interactions are found with the ura3Δ 37 °C vs ura3Δ 20 °C and yku70Δ 37 °C vs ura3Δ 37 °C experiments. These two experiments are expected to have the least overlap as they are not designed to find genes interacting in a cdc13-1 background. The yku70Δ 37 °C vs ura3Δ 37 °C experiment is designed to look at telomeres, but instead of disrupting the telomere capping protein Cdc13 using cdc13-1, a yku70Δ mutation is made such that the protein Yku70 (a telomere binding protein which guides the enzyme telomerase to the telomere (Addinall et al., 2011)) is no longer produced by the cell. Further, ura3Δ 37 °C vs ura3Δ 20 °C is designed to investigate temperature sensitivity only.

Table 4.7B shows the overlap in significant GO terms between the different QFA comparisons considered. The largest numbers of overlapping significant GO terms are found with the cdc13-1 27 °C vs ura3Δ 27 °C experiment, which overlaps with ∼150 GO terms for each of the other telomere-related experiments. The smallest overlap with the cdc13-1 27 °C vs ura3Δ 27 °C experiment is 110 GO terms, with the ura3Δ 37 °C vs ura3Δ 20 °C experiment.
The smallest numbers of overlapping GO terms are for the ura3Δ 37 °C vs ura3Δ 20 °C experiment, followed by yku70Δ 37 °C vs ura3Δ 37 °C, with ∼110 and ∼120 GO terms overlapping with the other experiments respectively. Similarly to the overlap of genes with significant evidence of genetic interaction, the overlap of significant GO terms shows that our cdc13-1 background experiments share the most GO terms and that the temperature sensitivity experiment ura3Δ 37 °C vs ura3Δ 20 °C has the least overlap.

We have shown that the JHM can successfully model different experimental data sets; Figures 4.8-4.11 are included as a reference for further research. Of the different experiments we can see that cdc13-1 27 °C vs ura3Δ 27 °C is the most dissimilar to the other experiments due to its large number of genetic interactions, 939 in total (see Table 4.6). The next largest number of genetic interactions is 511, for the yku70Δ 37 °C vs ura3Δ 37 °C experiment, which is approximately half the number found for the cdc13-1 27 °C vs ura3Δ 27 °C experiment. Tables 4.7A and 4.7B show that the overlap between QFA comparisons using the JHM is as expected, with the more closely related experiments sharing the most overlap. To account for the curvature of the data observed in Figures 4.8-4.11 we introduce a JHM with linearising transformations in the next section. Further research may include developing models that can incorporate multiple QFA comparisons to find evidence of genetic interactions between query screens and incorporate more information within our models.

Table 4.7: Overlap between different QFA comparisons for genes interacting and gene ontology terms over-represented in lists of interactions.
For a fair comparison, any genes removed from the results of a QFA comparison for biological and experimental reasons are removed for all experiments; therefore results for all experiments have a list of 195 genes (159+23+13, see Table 4.6) removed from the final list of interactions. A) Number of genes identified for each QFA comparison and the overlap between QFA comparisons. 4099 genes from the S. cerevisiae single deletion library are considered. B) Number of GO terms identified for each QFA comparison and the overlap between QFA comparisons. 6094 S. cerevisiae GO terms are available.

A.

| | cdc13-1 27 °C vs ura3Δ 27 °C | cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C | cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C | yku70Δ 37 °C vs ura3Δ 37 °C | ura3Δ 37 °C vs ura3Δ 20 °C |
|---|---|---|---|---|---|
| cdc13-1 27 °C vs ura3Δ 27 °C | 926 | N/A | N/A | N/A | N/A |
| cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C | 301 | 386 | N/A | N/A | N/A |
| cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C | 263 | 245 | 355 | N/A | N/A |
| yku70Δ 37 °C vs ura3Δ 37 °C | 252 | 155 | 146 | 506 | N/A |
| ura3Δ 37 °C vs ura3Δ 20 °C | 223 | 152 | 149 | 164 | 455 |

B.

| | cdc13-1 27 °C vs ura3Δ 27 °C | cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C | cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C | yku70Δ 37 °C vs ura3Δ 37 °C | ura3Δ 37 °C vs ura3Δ 20 °C |
|---|---|---|---|---|---|
| cdc13-1 27 °C vs ura3Δ 27 °C | 282 | N/A | N/A | N/A | N/A |
| cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C | 142 | 188 | N/A | N/A | N/A |
| cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C | 151 | 130 | 212 | N/A | N/A |
| yku70Δ 37 °C vs ura3Δ 37 °C | 150 | 119 | 125 | 245 | N/A |
| ura3Δ 37 °C vs ura3Δ 20 °C | 110 | 100 | 112 | 119 | 195 |
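An overlap table of the kind shown in Table 4.7 can be computed with set operations once each QFA comparison has a list of interacting genes. The sketch below is illustrative only: the screen names and gene identifiers are hypothetical placeholders, not results from the thesis. It follows the convention described in the caption, where genes stripped from any one comparison are stripped from all of them before overlaps are counted.

```python
from itertools import combinations

# Arithmetic from the caption: genes stripped across all experiments and genes remaining.
N_STRIPPED = 159 + 23 + 13
N_CONSIDERED = 4294 - N_STRIPPED


def strip_genes(hits, excluded):
    """Remove the same excluded genes from every comparison's hit list."""
    return {name: genes - excluded for name, genes in hits.items()}


def overlap_table(hits):
    """Return {(a, b): shared gene count}; the diagonal (a, a) holds each list's size."""
    table = {(name, name): len(genes) for name, genes in hits.items()}
    for (a, genes_a), (b, genes_b) in combinations(hits.items(), 2):
        table[(a, b)] = len(genes_a & genes_b)
    return table


# Hypothetical hit lists for three comparisons (placeholder gene names).
toy_hits = {
    "screen_1": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "screen_2": {"GENE_B", "GENE_C", "GENE_E"},
    "screen_3": {"GENE_C", "GENE_F"},
}
toy_excluded = {"GENE_D"}

toy_table = overlap_table(strip_genes(toy_hits, toy_excluded))
for pair, count in sorted(toy_table.items()):
    print(pair, count)
```

The same function applies unchanged to lists of significant GO terms, as in Table 4.7B.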
Figure 4.8: cdc13-1 exo1Δ 27 °C vs cdc13-1 27 °C joint hierarchical model (JHM) fitness plot with orfΔ posterior mean fitnesses. The JHM does not make use of a fitness measure such as MDR × MDP, but the fitness plot is given in terms of MDR × MDP for comparison with other approaches which do. orfΔ strains are classified as suppressors or enhancers based on one of the two parameters used to classify genetic interaction, the growth parameter r; this means that occasionally strains can be more fit in the query experiment in terms of MDR × MDP but be classified as enhancers (green). Further fitness plot explanation and notation is given in Figure 4.2.
Figure 4.9: cdc13-1 rad9Δ 27 °C vs cdc13-1 27 °C joint hierarchical model (JHM) fitness plot with orfΔ posterior mean fitnesses. The JHM does not make use of a fitness measure such as MDR × MDP, but the fitness plot is given in terms of MDR × MDP for comparison with other approaches which do. orfΔ strains are classified as suppressors or enhancers based on one of the two parameters used to classify genetic interaction, the growth parameter r; this means that occasionally strains can be more fit in the query experiment in terms of MDR × MDP but be classified as enhancers (green). Further fitness plot explanation and notation is given in Figure 4.2.
Figure 4.10: yku70Δ 37 °C vs ura3Δ 37 °C joint hierarchical model (JHM) fitness plot with orfΔ posterior mean fitnesses. The JHM does not make use of a fitness measure such as MDR × MDP, but the fitness plot is given in terms of MDR × MDP for comparison with other approaches which do. orfΔ strains are classified as suppressors or enhancers based on one of the two parameters used to classify genetic interaction, the growth parameter r; this means that occasionally strains can be more fit in the query experiment in terms of MDR × MDP but be classified as enhancers (green). Further fitness plot explanation and notation is given in Figure 4.2.
Figure 4.11: ura3Δ 37 °C vs ura3Δ 20 °C joint hierarchical model (JHM) fitness plot with orfΔ posterior mean fitnesses. The JHM does not make use of a fitness measure such as MDR × MDP, but the fitness plot is given in terms of MDR × MDP for comparison with other approaches which do. orfΔ strains are classified as suppressors or enhancers based on one of the two parameters used to classify genetic interaction, the growth parameter r; this means that occasionally strains can be more fit in the query experiment in terms of MDR × MDP but be classified as enhancers (green). Further fitness plot explanation and notation is given in Figure 4.2.

4.6. Extensions of the joint hierarchical model

In this section we briefly introduce two new extensions of the JHM for further investigation and research.

One extension to the JHM given in Table 3.3 is to consider a batch effect. Batch effects are technical sources of variation from the handling of experimental cultures (Leek et al., 2010; Chen et al., 2011). Batch effects can be confounded with the biology of interest, leading to misleading results and conclusions.

A QFA screen comparison is carried out between two QFA screens.
Each QFA screen consists of multiple 384-spot plates grown over time (see Figure 2.3), typically with each orfΔ repeat on a different 384-spot plate. For the cdc13-1 27 °C vs ura3Δ 27 °C experiment, each QFA screen is built of 120 384-spot plates (240 unique plates in total). Each 384-spot plate is created sequentially and may be created by a different experimenter. The plates may therefore differ due to factors that the experimenters do their best to control, such as the amount of nutrition in a plate, temperature, or other environmental effects. Where orfΔ repeats are carried out across multiple plates, differences between plates can therefore be captured by introducing a batch effect into the model.

Through careful planning and improved experimental design, batch effects can be reduced or removed. When we are unable to improve our experimental design any further, we may be interested in accounting for a batch effect within our model. By introducing parameters to model batch effects in our experiment we can account for any differences between the 240 384-spot plates. A JHM with batch effects (JHM-B), described in Table 4.9, will be able to improve inference by including more of the experimental structure.

The model in Table 4.9 introduces batch effects κ_b and λ_b, for a plate b, to capture any batch effect in carrying capacity K and growth rate r respectively. A batch effect will be estimated within the model and consequently any confounding with the orfΔ-level carrying capacity K and growth rate r parameters will be removed. Using frequentist estimates of the batch effects in the QFA screens, a normal prior was chosen to describe the batch effect parameters, allowing either a positive or negative effect to be incorporated for each orfΔ repeat in terms of K and r.

Another extension of the JHM is to consider a transformation to linearise the relationship describing genetic independence in the JHM.
When carrying out linear regression we may be interested in linearising the data to improve the linear relationship (Kutner et al., 2005). There are many different transformations used for linearising data; the most common are log and power transformations. Power transformations are families of power functions that are typically used to stabilise variance and make our data more Normal distribution-like. For a variable $x$, a power function is of the form $f : x \mapsto c x^{r}$, where $c, r \in \mathbb{R}$ are constants. The Box-Cox transformation (Box & Cox, 1964) is a particular case of power transformation that is typically used to transform data and linearise a relationship within a data set. Without linearising our data, we may not be describing genetic independence within our model correctly, leading to misleading results and conclusions. A JHM with transformations (JHM-T), described in Table 4.10, will be able to improve inference by ensuring a more linear relationship between the control and query screens.

Genetic independence within the JHM is described as a linear relationship (see Sections 1.2.1 and 3.4.1) for both carrying capacity K and growth rate r. We may not believe there to be a perfectly linear relationship between the control and query for both K and r. Introducing a power transformation for the model of genetic independence in terms of K and r can allow us to linearise the relationship and better model genetic independence.

The model in Table 4.10 introduces the transformation parameters $\phi$ and $\chi$ at an orfΔ level for the carrying capacity K and growth rate r respectively, where $\phi > 0$ and $\chi > 0$. The “vanilla” JHM assumes an additive model of epistasis with $(\alpha_c + K^o_l + \delta_l \gamma_{cl},\ \beta_c + r^o_l + \delta_l \omega_{cl})$, where $\alpha_c$ and $\beta_c$ are the scale parameters, as we are considering log orfΔ parameters.
The “vanilla” JHM effectively assumes a multiplicative model on the original scale of the data, i.e. $(e^{\alpha_c} e^{K^o_l + \delta_l\gamma_{cl}},\; e^{\beta_c} e^{r^o_l + \delta_l\omega_{cl}})$. By introducing new parameters $\phi$ and $\chi$ to scale the control and query data,
\[
\left(\frac{\alpha_c + K^o_l + \delta_l\gamma_{cl}}{\phi},\; \frac{\beta_c + r^o_l + \delta_l\omega_{cl}}{\chi}\right),
\]
we can expect to have a power transformation of the control and query on the original scale of the data:
\[
\left[\left(e^{\alpha_c} e^{K^o_l + \delta_l\gamma_{cl}}\right)^{1/\phi},\; \left(e^{\beta_c} e^{r^o_l + \delta_l\omega_{cl}}\right)^{1/\chi}\right].
\]
The transformation parameters give the same transformation to both the control and query screens. Our model will learn about $\phi$ and $\chi$, adjusting the relationship of genetic independence and consequently the genes identified as genetic interactions. Choosing to include a multiplicative transformation parameter where the model describes genetic independence (as an additive model) will give the model the flexibility to adjust the linear relationship between the control and query screens. The prior for each transformation parameter must be strictly positive and centred at 1 (no transformation effect), and so a gamma distribution with a mean of 1 is chosen for both $\chi$ and $\phi$.

Figures 4.12 and 4.13 show JHM-B and JHM-T $MDR \times MDP$ fitness plots respectively, for the cdc13-1 27 °C vs ura3∆ 27 °C experiment. Prior hyper-parameter choices for the models are given in Table B.1. Bayesian inference and MCMC methods for the JHM in Table 3.3 are carried out similarly for both the JHM-B and JHM-T. Posterior samples of 1000 particles are obtained after a burn-in period of 800k and a thinning of every 100 observations. Similarly to Section 4.3.5, chains from our MCMC sampler are accepted where the effective sample sizes are greater than 300 and Heidelberg and Welch P-values are greater than 0.10 for every chain.
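As a minimal numerical sketch of the transformation described above (not part of the original analysis), the following Python snippet checks that dividing the log-scale mean by $\phi$ is the same as raising the original-scale product to the power $1/\phi$, and that a gamma prior in the shape/scale parameterisation has mean 1 whenever shape × scale = 1. All function names and numerical values are hypothetical and chosen purely for illustration.

```python
import math


def log_scale_mean(alpha_c, K_o_l, delta_gamma, phi=1.0):
    """Log-scale mean (alpha_c + K_o_l + delta_l * gamma_cl) / phi."""
    return (alpha_c + K_o_l + delta_gamma) / phi


def original_scale_power(alpha_c, K_o_l, delta_gamma, phi=1.0):
    """Original-scale form (e^alpha_c * e^(K_o_l + delta_l * gamma_cl))^(1/phi)."""
    return (math.exp(alpha_c) * math.exp(K_o_l + delta_gamma)) ** (1.0 / phi)


def gamma_mean(shape, scale):
    """Mean of a gamma distribution in the shape/scale parameterisation."""
    return shape * scale


# Hypothetical values, for illustration only.
alpha_c, K_o_l, delta_gamma, phi = -0.3, -1.9, 0.4, 0.96

lhs = math.exp(log_scale_mean(alpha_c, K_o_l, delta_gamma, phi))
rhs = original_scale_power(alpha_c, K_o_l, delta_gamma, phi)
print(lhs, rhs)  # the two values agree up to floating-point error

# A hypothetical gamma prior centred at 1 (no transformation effect).
print(gamma_mean(shape=100.0, scale=0.01))
```

Setting `phi=1.0` recovers the untransformed (“vanilla”) multiplicative model, which is why a prior centred at 1 corresponds to no transformation effect.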
Similarly to the previous modelling approaches considered (including the “vanilla” JHM), a list of 159 genes is stripped from our final list of genes for biological and experimental reasons.

The JHM-B fit in Figure 4.12 has far fewer interactions on the plot than the “vanilla” JHM fitness plot, which may be evidence of a plate effect. The JHM-T fit in Figure 4.13 is largely the same as the “vanilla” JHM fitness plot. It is worth noting that the JHM-T model fit in Figure 4.13 has posterior mean estimates of $\hat{\phi} = 0.96$ and $\hat{\chi} = 0.87$ (to 2 d.p.), suggesting that a transformation may only exist in terms of $r$.

Table 4.6 compares the number of suppressors and enhancers estimated for the two extensions of the JHM. The JHM-B reduces the number of genetic interactions from 939 for the “vanilla” JHM to 553, and similarly reduces the number of suppressors and enhancers. From the “vanilla” JHM to the JHM-B there is therefore an approximately 41% reduction in genes identified as showing significant evidence of genetic interaction, which is strong evidence for the presence of a batch effect. The JHM-T is more similar to the JHM, with 901 interactions, reducing both suppressors and enhancers by a small amount. From the “vanilla” JHM to the JHM-T there is therefore an approximately 4% reduction in genes identified as showing significant evidence of genetic interaction, a much smaller reduction than that observed with the JHM-B.

Table 4.8A shows that the number of genes that overlap with the genes identified by the “vanilla” JHM is 531 and 886 for the JHM-B and JHM-T respectively. The number of genes identified as interacting by the “vanilla” JHM but no longer identified is therefore 408 and 53 for the JHM-B and JHM-T respectively. This further demonstrates the large reduction in genetic interactions when using the JHM-B, suggesting that a batch effect is present within the data.
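The arithmetic behind these comparisons can be sketched in a few lines of Python. The counts are those quoted in the text and Table 4.8A; the variable and function names are illustrative assumptions only.

```python
def percent_reduction(before, after):
    """Percentage reduction in the number of hits going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Counts quoted in the text and in Table 4.8A.
JHM_HITS = 939
hits = {"JHM-B": 553, "JHM-T": 901}
overlap_with_jhm = {"JHM-B": 531, "JHM-T": 886}

summary = {}
for model in hits:
    summary[model] = {
        # relative drop in hits compared with the "vanilla" JHM
        "percent_reduction": percent_reduction(JHM_HITS, hits[model]),
        # hits of the "vanilla" JHM that the extension no longer identifies
        "no_longer_identified": JHM_HITS - overlap_with_jhm[model],
        # hits of the extension that the "vanilla" JHM did not identify
        "newly_identified": hits[model] - overlap_with_jhm[model],
    }

for model, row in summary.items():
    print(model, row)
```

Laying the counts out this way makes it easy to see that, for both extensions, far more genes are dropped than gained relative to the “vanilla” JHM.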
The number of genes newly identified as showing significant evidence of genetic interaction by the JHM-B and JHM-T is 22 and 15 respectively. These numbers are small relative to the number of genes that are no longer identified, indicating that the biggest change from the “vanilla” JHM is that the JHM-B and JHM-T are more stringent for determining significant genetic interactions. Table 4.8A shows that the “vanilla” JHM and JHM-T have similar overlap with the Addinall et al. (2011), REM and IHM approaches. The JHM-B has much less overlap with the Addinall et al. (2011) approach than the “vanilla” JHM does, reducing the overlap from 649 to 498, indicating that the changes lead to an approach that is even more dissimilar from the Addinall et al. (2011) approach.

Table 4.8: Genes interacting with cdc13-1 at 27 °C and GO terms over-represented in the list of interactions according to each approach. A) Number of genes identified for each approach (Add: Addinall et al. (2011), REM, IHM, JHM, JHM-B and JHM-T) and the overlap between the approaches. 4135 genes from the S. cerevisiae single deletion library are considered. B) Number of GO terms identified for each approach (Add: Addinall et al. (2011), REM, IHM, JHM, JHM-B and JHM-T) and the overlap between the approaches. 6107 S. cerevisiae GO terms available. See Tables 4.2A and 4.2B for further details on the overlap between the “vanilla” models (Add: Addinall et al. (2011), REM, IHM, JHM).

A.        Add    REM    IHM    JHM    JHM-B  JHM-T
JHM       649    273    572    939    N/A    N/A
JHM-B     498    239    468    531    553    N/A
JHM-T     628    276    572    886    535    901

B.        Add    REM    IHM    JHM    JHM-B  JHM-T
JHM       219    165    216    286    N/A    N/A
JHM-B     223    170    217    204    265    N/A
JHM-T     215    160    219    267    206    293

Table 4.8B shows that the overlap in significant GO terms with the JHM is 204 and 267 for the JHM-B and JHM-T respectively.
There are 286 significant GO terms found with the “vanilla” JHM (see Table 4.8B), meaning there is a reduction of approximately 29% and 7% with the JHM-B and JHM-T respectively, demonstrating the difference of our new approaches from the “vanilla” JHM. Table 4.8B also shows that the “vanilla” JHM, JHM-B and JHM-T all have a similar overlap in significant GO terms with the Addinall et al. (2011), REM and IHM approaches.

We have introduced two potential ways of further extending the JHM to better model a QFA screen comparison; Figures 4.12 and 4.13 are included as a reference for further research. The JHM-B has made large changes to our results by reducing the number of hits (see Table 4.6). Further research may involve investigating the behaviour of an alternative JHM-B with tighter priors for the batch effect parameters, so that we can see how the additional parameters affect the model fit in more detail. Further research for the JHM-T would involve developing an alternative JHM-T where different transformations are made for the control and query screens. We find that the largest difference with the JHM-B and JHM-T is that they are more stringent for determining genetic interactions than the “vanilla” JHM. Currently we prefer the “vanilla” JHM until further model exploration and analysis, such as simulation studies, are carried out to investigate how the JHM-B and JHM-T affect our results.

Table 4.9: Description of the joint hierarchical model with batch effects. $b$ identifies the batch which an orf∆ repeat belongs to.
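Further model notation is defined in Table 3.3.

Indices: $c = 0, 1$ (condition level); $l = 1, \ldots, L_c$ (orf∆ level); $m = 1, \ldots, M_{cl}$ (repeat level); $n = 1, \ldots, N_{clm}$ (time point level); $b = 1, \ldots, B$ (batch).

Time point level
\[
y_{clmn} \sim \mathrm{N}\left(\hat{y}_{clmn}, (\nu_{cl})^{-1}\right), \qquad \hat{y}_{clmn} = x(t_{clmn}; K_{clm}, r_{clm}, P)
\]

Repeat level
\[
\begin{aligned}
\log K_{clm} &\sim \mathrm{N}\left(\alpha_c + \kappa_b + K^o_l + \delta_l\gamma_{cl}, (\tau^K_{cl})^{-1}\right) I(-\infty, 0] &
\log \tau^K_{cl} &\sim \mathrm{N}\left(\tau^{K,p}_c, (\sigma^{\tau,K}_c)^{-1}\right) I[0, \infty) \\
\log r_{clm} &\sim \mathrm{N}\left(\beta_c + \lambda_b + r^o_l + \delta_l\omega_{cl}, (\tau^r_{cl})^{-1}\right) I(-\infty, 3.5] &
\log \tau^r_{cl} &\sim \mathrm{N}\left(\tau^{r,p}_c, (\sigma^{\tau,r}_c)^{-1}\right)
\end{aligned}
\]

orf∆ level
\[
\begin{aligned}
e^{K^o_l} &\sim t\left(K^p, (\sigma^{K,o})^{-1}, 3\right) I[0, \infty) &
\log \sigma^{K,o} &\sim \mathrm{N}\left(\eta^{K,o}, (\psi^{K,o})^{-1}\right) \\
e^{r^o_l} &\sim t\left(r^p, (\sigma^{r,o})^{-1}, 3\right) I[0, \infty) &
\log \sigma^{r,o} &\sim \mathrm{N}\left(\eta^{r,o}, (\psi^{r,o})^{-1}\right) \\
\log \nu_{cl} &\sim \mathrm{N}\left(\nu^p, (\sigma^{\nu})^{-1}\right) &
\log \sigma^{\nu} &\sim \mathrm{N}\left(\eta^{\nu}, (\psi^{\nu})^{-1}\right) \\
\delta_l &\sim \mathrm{Bern}(p) \\
e^{\gamma_{cl}} &= \begin{cases} 1 & \text{if } c = 0; \\ t\left(1, (\sigma^{\gamma})^{-1}, 3\right) I[0, \infty) & \text{if } c = 1. \end{cases} &
\log \sigma^{\gamma} &\sim \mathrm{N}(\eta^{\gamma}, \psi^{\gamma}) \\
e^{\omega_{cl}} &= \begin{cases} 1 & \text{if } c = 0; \\ t\left(1, (\sigma^{\omega})^{-1}, 3\right) I[0, \infty) & \text{if } c = 1. \end{cases} &
\log \sigma^{\omega} &\sim \mathrm{N}(\eta^{\omega}, \psi^{\omega})
\end{aligned}
\]

Condition level
\[
\begin{aligned}
\alpha_c &= \begin{cases} 0 & \text{if } c = 0; \\ \mathrm{N}(\alpha^{\mu}, \eta^{\alpha}) & \text{if } c = 1. \end{cases} &
\beta_c &= \begin{cases} 0 & \text{if } c = 0; \\ \mathrm{N}(\beta^{\mu}, \eta^{\beta}) & \text{if } c = 1. \end{cases} \\
\tau^{K,p}_c &\sim \mathrm{N}\left(\tau^{K,\mu}, (\eta^{\tau,K,p})^{-1}\right) &
\log \sigma^{\tau,K}_c &\sim \mathrm{N}\left(\eta^{\tau,K}, (\psi^{\tau,K})^{-1}\right) \\
\tau^{r,p}_c &\sim \mathrm{N}\left(\tau^{r,\mu}, (\eta^{\tau,r,p})^{-1}\right) &
\log \sigma^{\tau,r}_c &\sim \mathrm{N}\left(\eta^{\tau,r}, (\psi^{\tau,r})^{-1}\right)
\end{aligned}
\]

Population level
\[
\begin{aligned}
\log K^p &\sim \mathrm{N}\left(K^{\mu}, (\eta^{K,p})^{-1}\right) &
\log r^p &\sim \mathrm{N}\left(r^{\mu}, (\eta^{r,p})^{-1}\right) \\
\nu^p &\sim \mathrm{N}\left(\nu^{\mu}, (\eta^{\nu,p})^{-1}\right) &
\log P &\sim \mathrm{N}\left(P^{\mu}, (\eta^{P})^{-1}\right)
\end{aligned}
\]

Batch
\[
\log \kappa_b \sim \mathrm{N}\left(\kappa^p, (\eta^{\kappa})^{-1}\right), \qquad
\log \lambda_b \sim \mathrm{N}\left(\lambda^p, (\eta^{\lambda})^{-1}\right)
\]

Table 4.10: Description of the joint hierarchical model with transformations.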
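Model notation is defined in Table 3.3.

Indices: $c = 0, 1$ (condition level); $l = 1, \ldots, L_c$ (orf∆ level); $m = 1, \ldots, M_{cl}$ (repeat level); $n = 1, \ldots, N_{clm}$ (time point level).

Time point level
\[
y_{clmn} \sim \mathrm{N}\left(\hat{y}_{clmn}, (\nu_{cl})^{-1}\right), \qquad \hat{y}_{clmn} = x(t_{clmn}; K_{clm}, r_{clm}, P)
\]

Repeat level
\[
\begin{aligned}
\log K_{clm} &\sim \mathrm{N}\left(\frac{\alpha_c + K^o_l + \delta_l\gamma_{cl}}{\phi}, (\tau^K_{cl})^{-1}\right) I(-\infty, 0] &
\log \tau^K_{cl} &\sim \mathrm{N}\left(\tau^{K,p}_c, (\sigma^{\tau,K}_c)^{-1}\right) I[0, \infty) \\
\log r_{clm} &\sim \mathrm{N}\left(\frac{\beta_c + r^o_l + \delta_l\omega_{cl}}{\chi}, (\tau^r_{cl})^{-1}\right) I(-\infty, 3.5] &
\log \tau^r_{cl} &\sim \mathrm{N}\left(\tau^{r,p}_c, (\sigma^{\tau,r}_c)^{-1}\right)
\end{aligned}
\]

orf∆ level
\[
\begin{aligned}
e^{K^o_l} &\sim t\left(K^p, (\sigma^{K,o})^{-1}, 3\right) I[0, \infty) &
\log \sigma^{K,o} &\sim \mathrm{N}\left(\eta^{K,o}, (\psi^{K,o})^{-1}\right) \\
e^{r^o_l} &\sim t\left(r^p, (\sigma^{r,o})^{-1}, 3\right) I[0, \infty) &
\log \sigma^{r,o} &\sim \mathrm{N}\left(\eta^{r,o}, (\psi^{r,o})^{-1}\right) \\
\log \nu_{cl} &\sim \mathrm{N}\left(\nu^p, (\sigma^{\nu})^{-1}\right) &
\log \sigma^{\nu} &\sim \mathrm{N}\left(\eta^{\nu}, (\psi^{\nu})^{-1}\right) \\
\delta_l &\sim \mathrm{Bern}(p) \\
e^{\gamma_{cl}} &= \begin{cases} 1 & \text{if } c = 0; \\ t\left(1, (\sigma^{\gamma})^{-1}, 3\right) I[0, \infty) & \text{if } c = 1. \end{cases} &
\log \sigma^{\gamma} &\sim \mathrm{N}(\eta^{\gamma}, \psi^{\gamma}) \\
e^{\omega_{cl}} &= \begin{cases} 1 & \text{if } c = 0; \\ t\left(1, (\sigma^{\omega})^{-1}, 3\right) I[0, \infty) & \text{if } c = 1. \end{cases} &
\log \sigma^{\omega} &\sim \mathrm{N}(\eta^{\omega}, \psi^{\omega})
\end{aligned}
\]

Condition level
\[
\begin{aligned}
\alpha_c &= \begin{cases} 0 & \text{if } c = 0; \\ \mathrm{N}(\alpha^{\mu}, \eta^{\alpha}) & \text{if } c = 1. \end{cases} &
\beta_c &= \begin{cases} 0 & \text{if } c = 0; \\ \mathrm{N}(\beta^{\mu}, \eta^{\beta}) & \text{if } c = 1. \end{cases} \\
\tau^{K,p}_c &\sim \mathrm{N}\left(\tau^{K,\mu}, (\eta^{\tau,K,p})^{-1}\right) &
\log \sigma^{\tau,K}_c &\sim \mathrm{N}\left(\eta^{\tau,K}, (\psi^{\tau,K})^{-1}\right) \\
\tau^{r,p}_c &\sim \mathrm{N}\left(\tau^{r,\mu}, (\eta^{\tau,r,p})^{-1}\right) &
\log \sigma^{\tau,r}_c &\sim \mathrm{N}\left(\eta^{\tau,r}, (\psi^{\tau,r})^{-1}\right)
\end{aligned}
\]

Population level
\[
\begin{aligned}
\log K^p &\sim \mathrm{N}\left(K^{\mu}, (\eta^{K,p})^{-1}\right) &
\log r^p &\sim \mathrm{N}\left(r^{\mu}, (\eta^{r,p})^{-1}\right) \\
\nu^p &\sim \mathrm{N}\left(\nu^{\mu}, (\eta^{\nu,p})^{-1}\right) &
\log P &\sim \mathrm{N}\left(P^{\mu}, (\eta^{P})^{-1}\right) \\
\phi &\sim \Gamma\left(\phi^{\mathrm{shape}}, \phi^{\mathrm{scale}}\right) &
\chi &\sim \Gamma\left(\chi^{\mathrm{shape}}, \chi^{\mathrm{scale}}\right)
\end{aligned}
\]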
[Figure 4.12 image: fitness plot with gene-name point labels, both axes running from 0 to 80.]

Figure 4.12: cdc13-1 27 °C vs ura3∆ 27 °C joint hierarchical model with batch effects (JHM-B) fitness plot with orf∆ posterior mean fitnesses. The JHM does not make use of a fitness measure such as $MDR \times MDP$, but the fitness plot is given in terms of $MDR \times MDP$ for comparison with other approaches which do. orf∆ strains are classified as being a suppressor or enhancer based on one of the two parameters used to classify genetic interaction, growth parameter $r$; this means that occasionally strains can be more fit in the query experiment in terms of $MDR \times MDP$ but be classified as enhancers (green). Further fitness plot explanation and notation is given in Figure 4.2.
[Figure 4.13 image: fitness plot with gene-name point labels, axes running from 0 to 60 and from 0 to 70.]

Figure 4.13: cdc13-1 27 °C vs ura3∆ 27 °C joint hierarchical model with transformations (JHM-T) fitness plot with orf∆ posterior mean fitnesses. The JHM does not make use of a fitness measure such as $MDR \times MDP$, but the fitness plot is given in terms of $MDR \times MDP$ for comparison with other approaches which do. orf∆ strains are classified as being a suppressor or enhancer based on one of the two parameters used to classify genetic interaction, growth parameter $r$; this means that occasionally strains can be more fit in the query experiment in terms of $MDR \times MDP$ but be classified as enhancers (green). Further fitness plot explanation and notation is given in Figure 4.2.
Chapter 5. Fast Bayesian parameter estimation for stochastic logistic growth models

5.1. Introduction

In this chapter, fast approximations to the stochastic logistic growth model (SLGM) (Capocelli & Ricciardi, 1974) (see Section 1.3) are presented. The SLGM is given by the following diffusion equation:
\[
dX_t = r X_t \left(1 - \frac{X_t}{K}\right) dt + \sigma X_t\, dW_t, \tag{5.1}
\]
where $X_{t_0} = P$ and is independent of $W_t$, $t \geq t_0$. A deterministic logistic growth model (see Section 1.1.2) is unable to describe intrinsic error within stochastic logistic growth time course data. Consequently a deterministic model may lead to less accurate estimates of logistic growth parameters than an SDE, which can describe intrinsic noise. So that random fluctuations present within observed yeast QFA data (1.1) can be accounted for as intrinsic noise, instead of being confounded within our measurement error, we are interested in using the SLGM in (5.1) instead of its deterministic counterpart (1.1). Alternative stochastic logistic growth equations exist (see Section 1.3), but we find (5.1) to be the most appropriate as its intrinsic noise does not tend to zero with larger population sizes.

The SLGM (5.1) is analytically intractable and therefore inference requires relatively slow numerical simulation. Where fast inference is of importance, such as in real-time analysis or big data problems, we can use model approximations which do have analytically tractable densities, enabling fast inference. For large hierarchical Bayesian models (see Chapter 3), computational time for inference is typically long, ranging from one to two weeks using a deterministic logistic growth model.
Inference for large hierarchical Bayesian models using the SLGM would increase computational time considerably (roughly in proportion to the number of time points) with relatively slow numerical simulation approaches; we may therefore be interested in using approximate models that allow us to carry out fast inference.

First, an approximate model developed by Román-Román & Torres-Ruiz (2012) is introduced. Two new approximate models are then presented using the linear noise approximation (LNA) (Wallace, 2010; Komorowski et al., 2009) of the SLGM. The model proposed by Román-Román & Torres-Ruiz (2012) is found to be a zero-order noise approximation.

The approximate models considered are compared against each other for both simulated and observed logistic growth data. Finally, the approximate models are compared to “exact” approaches.

5.2. The Román-Román & Torres-Ruiz (2012) diffusion process

Román-Román & Torres-Ruiz (2012) present a logistic growth diffusion process (RRTR) which has a transition density that can be written explicitly, allowing inference for model parameter values from discrete sampling trajectories. The RRTR is derived from the following ODE:
\[
\frac{dx_t}{dt} = \frac{Qr}{e^{rt} + Q}\, x_t, \tag{5.2}
\]
where $Q = \left(\frac{K}{P} - 1\right) e^{r t_0}$, $P = x_{t_0}$ and $t \geq t_0$. The solution to (5.2) is given in (1.2) (it has the same solution as (1.1)). Román-Román & Torres-Ruiz (2012) see (5.2) as a generalisation of the Malthusian growth model with a deterministic, time-dependent fertility $h(t) = \frac{Qr}{e^{rt} + Q}$, and replace this with $\frac{Qr}{e^{rt} + Q} + \sigma W_t$ to obtain the following approximation to the SLGM:
\[
dX_t = \frac{Qr}{e^{rt} + Q}\, X_t\, dt + \sigma X_t\, dW_t, \tag{5.3}
\]
where $Q = \left(\frac{K}{P} - 1\right) e^{r t_0}$, $P = X_{t_0}$ and is independent of $W_t$, $t \geq t_0$.
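The claim that the ODE (5.2) shares its solution with the logistic ODE can be checked numerically. The sketch below assumes that the solution referred to as (1.2) is the standard logistic curve $x_t = K / (1 + Q e^{-rt})$, and compares a finite-difference derivative of that curve with the right-hand sides of both ODEs. Function names and parameter values are hypothetical.

```python
import math


def logistic_solution(t, r, K, P, t0=0.0):
    """Standard logistic curve, written as K / (1 + Q * exp(-r * t))."""
    Q = (K / P - 1.0) * math.exp(r * t0)
    return K / (1.0 + Q * math.exp(-r * t))


def rrtr_rhs(t, x, r, K, P, t0=0.0):
    """Right-hand side of the ODE (5.2): Q * r / (exp(r * t) + Q) * x."""
    Q = (K / P - 1.0) * math.exp(r * t0)
    return Q * r / (math.exp(r * t) + Q) * x


def logistic_rhs(x, r, K):
    """Right-hand side of the logistic ODE: r * x * (1 - x / K)."""
    return r * x * (1.0 - x / K)


# Hypothetical parameter values, for illustration only.
r, K, P = 3.0, 0.3, 1e-3
h = 1e-6  # step for the central finite difference

for t in (0.5, 1.0, 2.0, 3.0):
    x = logistic_solution(t, r, K, P)
    slope = (logistic_solution(t + h, r, K, P)
             - logistic_solution(t - h, r, K, P)) / (2.0 * h)
    print(t, slope, rrtr_rhs(t, x, r, K, P), logistic_rhs(x, r, K))
```

Along the logistic curve the three printed derivatives agree, which is why the two ODEs have the same solution even though their stochastic versions, (5.3) and (5.1), differ.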
The process described in (5.3) is a particular case of the Log-normal process with exogenous factors, and therefore an exact transition density is available (Gutiérrez et al., 2006). The transition density for $Y_t$, where $Y_t = \log(X_t)$, can be written:
\[
\begin{aligned}
(Y_{t_i} \mid Y_{t_{i-1}} = y_{t_{i-1}}) &\sim \mathrm{N}(\mu_{t_i}, \Xi_{t_i}), \\
\text{where } a = r,\ b = \frac{r}{K},\quad
\mu_{t_i} &= \log(y_{t_{i-1}}) + \log\left(\frac{1 + b e^{-a t_i}}{1 + b e^{-a t_{i-1}}}\right) - \frac{\sigma^2}{2}(t_i - t_{i-1}) \\
\text{and } \Xi_{t_i} &= \sigma^2 (t_i - t_{i-1}).
\end{aligned} \tag{5.4}
\]

5.3. Linear noise approximation with multiplicative noise

We now take a different approach to approximating the SLGM (5.1), which will turn out to be closer to the exact solution of the SLGM than the RRTR (5.3). Starting from the original model (5.1), we apply Itô's lemma (Itô, 1944; Øksendal, 2010):
\[
df(t, X_t) = \frac{df}{dt}\, dt + \mu \frac{df}{dx}\, dt + \frac{1}{2}\sigma^2 \frac{d^2 f}{dx^2}\, dt + \sigma \frac{df}{dx}\, dW_t, \tag{5.5}
\]
with the transformation $f(t, X_t) \equiv Y_t = \log X_t$. After deriving the following partial derivatives:
\[
\frac{df}{dt} = 0, \qquad \frac{df}{dx} = \frac{1}{X_t} \qquad \text{and} \qquad \frac{d^2 f}{dx^2} = -\frac{1}{X_t^2},
\]
we can obtain the following Itô drift-diffusion process:
\[
dY_t = \left(r - \frac{1}{2}\sigma^2 - \frac{r}{K} e^{Y_t}\right) dt + \sigma\, dW_t. \tag{5.6}
\]
The log transformation from multiplicative to additive noise gives a constant diffusion term, so that the LNA will give a good approximation to (5.1).

The LNA reduces a non-linear SDE to a linear SDE with additive noise. The LNA can be viewed as a first-order Taylor expansion of an approximating SDE about a deterministic solution. We now separate the process $Y_t$ into a deterministic part $v_t$ and a stochastic part $Z_t$, so that $Y_t = v_t + Z_t$ and consequently $dY_t = dv_t + dZ_t$. We choose $v_t$ to be the solution of the deterministic part of (5.6):
\[
dv_t = \left(r - \frac{1}{2}\sigma^2 - \frac{r}{K} e^{v_t}\right) dt. \tag{5.7}
\]
We now redefine our notation as follows: $a = r - \frac{\sigma^2}{2}$ and $b = \frac{r}{K}$.
Equation (5.7) is then solved for $v_t$:
\[
v_t = \log\left(\frac{a P e^{aT}}{b P (e^{aT} - 1) + a}\right), \tag{5.8}
\]
where $T = t - t_0$. We now write down an expression for $dZ_t$, where $dZ_t = dY_t - dv_t$:
\[
dZ_t = \left(a - b e^{Y_t}\right) dt + \sigma\, dW_t - \left(a - b e^{v_t}\right) dt.
\]
We then substitute in $Y_t = v_t + Z_t$ and simplify the expression to give
\[
dZ_t = b\left(e^{v_t} - e^{v_t + Z_t}\right) dt + \sigma\, dW_t. \tag{5.9}
\]
As (5.9) is a non-linear SDE it cannot be solved explicitly, so we use the LNA (see Section 2.6.4) to obtain a linear SDE that we can solve explicitly. We apply the LNA by making the first-order approximation $e^{Z_t} \approx 1 + Z_t$ and then simplify to give
\[
dZ_t = -b e^{v_t} Z_t\, dt + \sigma\, dW_t. \tag{5.10}
\]
This process is a particular case of the time-varying Ornstein-Uhlenbeck process, which can be solved explicitly. The transition density for $Y_t$ (derivation in Appendix C.1) is then:
\[
\begin{aligned}
(Y_{t_i} \mid Y_{t_{i-1}} = y_{t_{i-1}}) &\sim \mathrm{N}(\mu_{t_i}, \Xi_{t_i}), \\
\text{redefine } y_{t_{i-1}} = v_{t_{i-1}} + z_{t_{i-1}},\quad Q &= \left(\frac{a/b}{P} - 1\right) e^{a t_0}, \\
\mu_{t_i} &= y_{t_{i-1}} + \log\left(\frac{1 + Q e^{-a t_{i-1}}}{1 + Q e^{-a t_i}}\right) + e^{-a(t_i - t_{i-1})}\, \frac{1 + Q e^{-a t_{i-1}}}{1 + Q e^{-a t_i}}\, z_{t_{i-1}} \\
\text{and } \Xi_{t_i} &= \sigma^2 \left[\frac{4Q\left(e^{a t_i} - e^{a t_{i-1}}\right) + e^{2 a t_i} - e^{2 a t_{i-1}} + 2 a Q^2 (t_i - t_{i-1})}{2a\left(Q + e^{a t_i}\right)^2}\right].
\end{aligned} \tag{5.11}
\]
The LNA of the SLGM with multiplicative intrinsic noise (LNAM) can then be written as
\[
d \log X_t = \left[dv_t + b e^{v_t} v_t - b e^{v_t} \log X_t\right] dt + \sigma\, dW_t,
\]
where $P = X_{t_0}$ and is independent of $W_t$, $t \geq t_0$.

Note that the RRTR given in (5.3) can be similarly derived using a zero-order noise approximation ($e^{Z_t} \approx 1$) instead of the LNA.

5.4. Linear noise approximation with additive noise

As in Section 5.3, we start from the SLGM given in (5.1). Without first log transforming the process, the LNA will lead to a worse approximation to the diffusion term of the SLGM, but we will see in the coming sections that there are nevertheless advantages.
We separate the process $X_t$ into a deterministic part $v_t$ and a stochastic part $Z_t$, so that $X_t = v_t + Z_t$ and consequently $dX_t = dv_t + dZ_t$. We choose $v_t$ to be the solution of the deterministic part of (5.1):
\[
dv_t = \left(r v_t - \frac{r}{K} v_t^2\right) dt. \tag{5.12}
\]
We now redefine our previous notation as follows: $a = r$ and $b = \frac{r}{K}$. Equation (5.12) is then solved for $v_t$:
\[
v_t = \frac{a P e^{aT}}{b P (e^{aT} - 1) + a}. \tag{5.13}
\]
We now write down an expression for $dZ_t$, where $dZ_t = dX_t - dv_t$:
\[
dZ_t = \left(a X_t - b X_t^2\right) dt + \sigma X_t\, dW_t - \left(a v_t - b v_t^2\right) dt.
\]
We then substitute in $X_t = v_t + Z_t$ and simplify the expression to give
\[
dZ_t = \left[(a - 2 b v_t) Z_t - b Z_t^2\right] dt + (\sigma v_t + \sigma Z_t)\, dW_t.
\]
As this is a non-linear SDE it cannot be solved explicitly, so we use the LNA (see Section 2.6.4) to obtain a linear SDE that we can solve explicitly. We now apply the LNA by setting the second-order terms $-b Z_t^2\, dt = 0$ and $\sigma Z_t\, dW_t = 0$ to obtain
\[
dZ_t = (a - 2 b v_t) Z_t\, dt + \sigma v_t\, dW_t. \tag{5.14}
\]
This process is a particular case of the Ornstein-Uhlenbeck process, which can be solved. The transition density for $X_t$ (derivation in Appendix C.3) is then
\[
\begin{aligned}
(X_{t_i} \mid X_{t_{i-1}} = x_{t_{i-1}}) &\sim \mathrm{N}(\mu_{t_i}, \Xi_{t_i}), \\
\text{where } x_{t_{i-1}} &= v_{t_{i-1}} + z_{t_{i-1}}, \\
\mu_{t_i} &= x_{t_{i-1}} + \left(\frac{a P e^{a T_i}}{b P (e^{a T_i} - 1) + a}\right) - \left(\frac{a P e^{a T_{i-1}}}{b P (e^{a T_{i-1}} - 1) + a}\right) \\
&\quad + e^{a(t_i - t_{i-1})} \left(\frac{b P (e^{a T_{i-1}} - 1) + a}{b P (e^{a T_i} - 1) + a}\right)^2 z_{t_{i-1}} \\
\text{and } \Xi_{t_i} &= \frac{1}{2} \sigma^2 a P^2 e^{2 a T_i} \left(\frac{1}{b P (e^{a T_i} - 1) + a}\right)^4 \\
&\quad \times \left[b^2 P^2 \left(e^{2 a T_i} - e^{2 a T_{i-1}}\right) + 4 b P (a - bP)\left(e^{a T_i} - e^{a T_{i-1}}\right) + 2a (t_i - t_{i-1})(a - bP)^2\right],
\end{aligned} \tag{5.15}
\]
with $T_i = t_i - t_0$. The LNA of the SLGM with additive intrinsic noise (LNAA) can then be written as
\[
dX_t = \left[b v_t^2 + (a - 2 b v_t) X_t\right] dt + \sigma v_t\, dW_t,
\]
where $P = X_{t_0}$ and is independent of $W_t$, $t \geq t_0$.
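As a hedged sanity check on the variance in (5.15), the closed form can be compared with a direct numerical solution of the variance ODE implied by the linear SDE (5.14), namely $dV/dT = 2(a - 2 b v_T) V + \sigma^2 v_T^2$ with $V = 0$ at $T_{i-1}$. The sketch below does this with a classical fourth-order Runge-Kutta integrator in plain Python. It is not the derivation in Appendix C.3; function names and parameter values are assumptions made for illustration.

```python
import math


def v_mean(T, a, b, P):
    """Deterministic solution (5.13), with T = t - t0."""
    return a * P * math.exp(a * T) / (b * P * (math.exp(a * T) - 1.0) + a)


def lnaa_variance_closed_form(a, b, P, sigma, T_prev, T_curr):
    """Transition variance as written in (5.15)."""
    D = b * P * (math.exp(a * T_curr) - 1.0) + a
    bracket = (
        b ** 2 * P ** 2 * (math.exp(2 * a * T_curr) - math.exp(2 * a * T_prev))
        + 4 * b * P * (a - b * P) * (math.exp(a * T_curr) - math.exp(a * T_prev))
        + 2 * a * (T_curr - T_prev) * (a - b * P) ** 2
    )
    return 0.5 * sigma ** 2 * a * P ** 2 * math.exp(2 * a * T_curr) * bracket / D ** 4


def lnaa_variance_ode(a, b, P, sigma, T_prev, T_curr, n_steps=2000):
    """Integrate dV/dT = 2 (a - 2 b v) V + sigma^2 v^2 from V = 0 using RK4."""
    def rhs(T, V):
        v = v_mean(T, a, b, P)
        return 2.0 * (a - 2.0 * b * v) * V + sigma ** 2 * v ** 2

    h = (T_curr - T_prev) / n_steps
    T, V = T_prev, 0.0
    for _ in range(n_steps):
        k1 = rhs(T, V)
        k2 = rhs(T + 0.5 * h, V + 0.5 * h * k1)
        k3 = rhs(T + 0.5 * h, V + 0.5 * h * k2)
        k4 = rhs(T + h, V + h * k3)
        V += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        T += h
    return V


# Hypothetical values: r = 3 and K = 0.3 give a = 3 and b = 10.
a, b, P, sigma = 3.0, 10.0, 1e-3, 0.05
closed = lnaa_variance_closed_form(a, b, P, sigma, T_prev=2.0, T_curr=2.5)
numeric = lnaa_variance_ode(a, b, P, sigma, T_prev=2.0, T_curr=2.5)
print(closed, numeric)
```

Agreement between the two numbers only confirms that (5.15) is the variance of the linearised process (5.14); it says nothing about how well the LNAA approximates the SLGM itself.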
5.5. Simulation and Bayesian inference for the stochastic logistic growth model and approximations

To compare the accuracies of each of the three approximate models in representing the SLGM, we first compare simulated forward trajectories from the RRTR, LNAM and LNAA with simulated forward trajectories from the SLGM (Figure 5.1). We use the Euler-Maruyama method (Carletti, 2006) (see Section 2.6.2) with very fine discretisation to give arbitrarily accurate simulated trajectories from each SDE.

The LNAA and LNAM trajectories are visually indistinguishable from the SLGM (Figures 5.1A, C & D). On the other hand, population sizes simulated with the RRTR display large deviations from the mean as the population approaches its stationary phase (Figures 5.1A & B). Figure 5.1E further highlights the increase in variation as the population approaches stationary phase for simulated trajectories of the RRTR, in contrast to the SLGM and LNA models.

[Figure 5.1 image: panels A to D plot Normalised Cell Density (AU) against Time (Days); panel E plots Standard Deviation against Time (Days) for the RRTR, LNAM, LNAA and SLGM.]

Figure 5.1: Forward trajectories (No. of simulations = 100) for the stochastic logistic growth model and approximations. See Table 5.1 for parameter values. A) The stochastic logistic growth model (SLGM). B) The Román-Román & Torres-Ruiz (2012) (RRTR) approximation. C) The linear noise approximation with multiplicative intrinsic noise (LNAM). D) The linear noise approximation with additive intrinsic noise (LNAA). E) Standard deviations of simulated trajectories over time for the SLGM (black), RRTR (red), LNAM (green) and LNAA (blue).
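A minimal sketch of Euler-Maruyama forward simulation for the SLGM (5.1) is given below, assuming $t_0 = 0$. The parameter values are hypothetical and are not those of Table 5.1; the function names are likewise illustrative.

```python
import math
import random


def simulate_slgm(r, K, P, sigma, t_end, n_steps, rng):
    """Euler-Maruyama path of dX = r X (1 - X/K) dt + sigma X dW, X_0 = P."""
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    x = P
    path = [x]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, 1.0) * sqrt_dt  # Brownian increment over dt
        x = x + r * x * (1.0 - x / K) * dt + sigma * x * dW
        path.append(x)
    return path


def logistic_solution(t, r, K, P):
    """Deterministic logistic curve with t0 = 0, for comparison."""
    return K * P * math.exp(r * t) / (K + P * (math.exp(r * t) - 1.0))


# Hypothetical parameter values, for illustration only.
rng = random.Random(1)
path = simulate_slgm(r=3.0, K=0.3, P=1e-3, sigma=0.05,
                     t_end=7.0, n_steps=7000, rng=rng)
print(path[0], path[-1])
```

With `sigma=0.0` the scheme reduces to the forward Euler method for the deterministic logistic ODE, which gives a convenient check that the step size is fine enough.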
5.5.1. Bayesian parameter inference with approximate models

To compare the quality of parameter inference using each of these approximations, we simulated synthetic time-course data from the SLGM and combined this with either Log-normal or Normal measurement error. Carrying out Bayesian inference with broad priors (see (5.16) and (5.17)), we compared the parameters recovered using each approximation with those used to generate the synthetic dataset. The synthetic time-course datasets consist of 27 time points generated using the Euler-Maruyama method with very fine intervals (Carletti, 2006).

We formulate our inference problem as a dynamic linear state space model (West & Harrison, 1997). The advantage of a state space formulation is that we are then able to build a Kalman filter to carry out fast parameter inference. We can take advantage of a linear Gaussian structure and construct a Kalman filter recursion for marginal likelihood computation (Appendix C.5). By choosing to match the measurement error structure to the intrinsic error of our models, we can build a linear Gaussian structure. We therefore assume Log-normal (multiplicative) measurement error for the RRTR and LNAM, and Normal (additive) measurement error for the LNAA.

The dependent variable $y_{t_i}$ and independent variable $\{t_i,\ i = 1, \ldots, N\}$ are the data input to the model (where $t_i$ is the time at point $i$ and $N$ is the number of time points). $X_t$ is the state process, describing the population size. The state space model for the RRTR and LNAM is as follows:
\[
\begin{aligned}
\log(y_{t_i}) &\sim \mathrm{N}(X_{t_i}, \nu^2), \\
(X_{t_i} \mid X_{t_{i-1}} = x_{t_{i-1}}) &\sim \mathrm{N}(\mu_{t_i}, \Xi_{t_i}), \quad \text{where } x_{t_i} = v_{t_i} + z_{t_i},
\end{aligned} \tag{5.16}
\]
and $\mu_{t_i}$ and $\Xi_{t_i}$ are given by (5.4) and (5.11) for the RRTR and LNAM respectively.
Priors are as follows:

\[
\begin{aligned}
\log X_0 \equiv \log P &\sim \operatorname{N}(\mu_P, \tau_P^{-1}), &
\log K &\sim \operatorname{N}(\mu_K, \tau_K^{-1}), &
\log r &\sim \operatorname{N}(\mu_r, \tau_r^{-1}),\\
\log \nu^{-2} &\sim \operatorname{N}(\mu_\nu, \tau_\nu^{-1}), &
\log \sigma^{-2} &\sim \operatorname{N}(\mu_\sigma, \tau_\sigma^{-1})\, I_{[1,\infty]}.
\end{aligned}
\]

Bayesian inference is carried out with broad priors such that estimated parameter values are not heavily influenced by our choice. See Table C.1 for prior hyper-parameter values. Log-normal prior distributions are chosen to ensure that the logistic growth parameters and precision parameters are strictly positive. Our prior for $\log \sigma^{-2}$ is truncated below 1 to avoid unnecessary exploration of extremely low probability regions, which could be caused by problems identifying $\nu$, for example when $\log \nu^{-2}$ takes large values, and to ensure that intrinsic noise does not dominate the process. Our choice of 1 for the truncation threshold is made by observing forward simulations from our processes and choosing a value for $\log \sigma^{-2}$ where intrinsic noise is so large that the deterministic part of the process is masked, consequently making the LNA a bad approximation. We also find that truncating $\log \sigma^{-2}$ is preferable to truncating $\log \nu^{-2}$, as truncating $\log \nu^{-2}$ does not alleviate the identifiability problem without being very restrictive for the measurement error structures.

The state space model for the LNAA is as follows:

\[
\begin{aligned}
y_{t_i} &\sim \operatorname{N}(X_{t_i}, \nu^2),\\
(X_{t_i} \mid X_{t_{i-1}} = x_{t_{i-1}}) &\sim \operatorname{N}(\mu_{t_i}, \Xi_{t_i}), \quad \text{where } x_{t_i} = v_{t_i} + z_{t_i},
\end{aligned}
\tag{5.17}
\]

$\mu_{t_i}$ and $\Xi_{t_i}$ are given by (5.15). Priors are as in (5.16). Measurement error for the observed values is Normal so that we have a linear Gaussian structure. The state space models in (5.16) and (5.17) have different measurement error structures.
So that a fair comparison can be made between (5.16) and (5.17), we choose our priors so that the marginal moments for the measurement error of our models are not too dissimilar, particularly at the earliest stage where most growth is observed.

To see how the inference from our approximate models compares with slower “exact” models, we consider Euler-Maruyama approximations (Kloeden & Platen, 1992) of (5.1) and of the log transformed process, using fine intervals. We use the approach of Golightly & Wilkinson (2005) to carry out inference for our “exact” models. A single site update algorithm is used to update model parameters and the Euler-Maruyama approximation of the latent process in turn. Given these approximations we can construct a state space model for an “exact” SLGM with Log-normal measurement error (SLGM+L) and similarly for the SLGM with Normal measurement error (SLGM+N); priors are as in (5.16).

Our inference makes use of a Kalman filter to integrate out the state process. The Kalman filter allows for fast inference compared to slow numerical simulation approaches that impute all states. The algorithm for our approximate models is the Metropolis-within-Gibbs sampler with a symmetric proposal (Gamerman & Lopes, 2006). Full-conditionals are sampled in turn to give samples from the joint posterior distribution:

\[
\pi(K, r, P, \sigma, \nu, X_{t_{1:N}}, y_{t_{1:N}}),
\]

where $X_{t_{1:N}}$ is the latent process and $y_{t_{1:N}}$ is the observed data, for $N$ observed data points.
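To make the role of the Kalman filter concrete, the sketch below computes the log marginal likelihood of a scalar linear Gaussian state space model with the latent states integrated out. It is not the recursion of Appendix C.5: the per-step coefficients `a`, `b` and `q` are hypothetical stand-ins for whatever linear transition mean and variance an approximation supplies, and the function name is illustrative.

```python
import math


def kalman_log_likelihood(ys, a, b, q, nu2, m0, c0):
    """Log marginal likelihood of ys under a scalar linear Gaussian model.

    State:       x_i | x_{i-1} ~ N(a[i] * x_{i-1} + b[i], q[i])
    Observation: y_i | x_i     ~ N(x_i, nu2)
    Initial:     x_0           ~ N(m0, c0)
    """
    m, c = m0, c0
    loglik = 0.0
    for i, y in enumerate(ys):
        # Predict the state one step ahead.
        m_pred = a[i] * m + b[i]
        c_pred = a[i] ** 2 * c + q[i]
        # One-step-ahead predictive density of the observation.
        f = c_pred + nu2
        resid = y - m_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + resid ** 2 / f)
        # Update the state distribution given the observation.
        gain = c_pred / f
        m = m_pred + gain * resid
        c = (1.0 - gain) * c_pred
    return loglik


# Hypothetical two-observation example.
print(kalman_log_likelihood([1.0, 2.0], a=[1.0, 1.0], b=[0.0, 0.0],
                            q=[0.0, 0.0], nu2=1.0, m0=0.0, c0=1.0))
```

Each evaluation costs one pass over the N observations, so a Metropolis-Hastings update that needs the marginal likelihood does not have to impute latent states between time points.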
The Metropolis-within-Gibbs sampler algorithm is as follows:

1) Initialise counter $i = 1$ and parameters $K^{(0)}, r^{(0)}, \sigma^{(0)}, P^{(0)}, \nu^{(0)}$.
2) Simulate $K^{(i)}$ from $K \sim \pi(K \mid \nu^{(i-1)}, r^{(i-1)}, \sigma^{(i-1)}, P^{(i-1)}, y_{t_{1:N}})$.
3) Simulate $r^{(i)}$ from $r \sim \pi(r \mid \nu^{(i-1)}, K^{(i)}, \sigma^{(i-1)}, P^{(i-1)}, y_{t_{1:N}})$.
4) Simulate $\sigma^{(i)}$ from $\sigma \sim \pi(\sigma \mid \nu^{(i-1)}, K^{(i)}, r^{(i)}, P^{(i-1)}, y_{t_{1:N}})$.
5) Simulate $P^{(i)}$ from $P \sim \pi(P \mid \nu^{(i-1)}, K^{(i)}, r^{(i)}, \sigma^{(i)}, y_{t_{1:N}})$.
6) Simulate $\nu^{(i)}$ from $\nu \sim \pi(\nu \mid K^{(i)}, r^{(i)}, \sigma^{(i)}, P^{(i)}, y_{t_{1:N}})$.
7) Increment $i$ and repeat steps 2-6 until the required sample size is obtained.

We find the mixing for our algorithm is improved when we have intermediate steps between sampling from the $\sigma^{(i)}$ and $\nu^{(i)}$ full conditionals. Each update in our algorithm is accomplished by a Metropolis-Hastings step using a Kalman filter. Acceptance ratios are calculated for each update during a burn-in period. To improve the computational speed of our inference, further research may involve using an algorithm where we jointly update our parameters. Posterior means are used as point estimates, and posterior standard deviations are used to describe the variation of inferred parameters. The Heidelberger and Welch convergence diagnostic (Heidelberger & Welch, 1981) is used to determine whether convergence has been achieved for all parameters.

Computational times for convergence of our MCMC schemes (code is available at https://github.com/jhncl/LNA.git) can be compared using estimates for the minimum effective sample size per second (ESS$_{min}$/sec) (Plummer et al., 2006). The average ESS$_{min}$/sec of our approximate model (coded in C) is ∼100 and that of the “exact” model is ∼1 (coded in JAGS (Plummer, 2010) with 15 imputed states between time points, chosen to maximise ESS$_{min}$/sec).
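The sampler above can be sketched generically as follows. This is an illustration under stated assumptions, not the C implementation in the linked repository: `log_post` stands for any function returning the log posterior up to a constant (in the setting above it would combine the log priors with a Kalman filter marginal likelihood), parameters are updated one at a time in dictionary order with a symmetric Gaussian random-walk proposal, and the toy target, step sizes and function names are hypothetical.

```python
import math
import random


def metropolis_within_gibbs(log_post, init, step_sd, n_iter, rng):
    """Update each named parameter in turn with a symmetric random-walk proposal.

    init must have a finite log posterior.
    """
    current = dict(init)
    current_lp = log_post(current)
    samples = []
    for _ in range(n_iter):
        for name in current:  # dicts preserve insertion order
            proposal = dict(current)
            proposal[name] = current[name] + rng.gauss(0.0, step_sd[name])
            proposal_lp = log_post(proposal)
            # Symmetric proposal, so the acceptance ratio is the posterior ratio.
            # 1 - random() lies in (0, 1], which keeps the logarithm finite.
            if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
                current, current_lp = proposal, proposal_lp
        samples.append(dict(current))
    return samples


def toy_log_post(theta):
    """Hypothetical target: independent Normals centred at K = 1 and r = 3."""
    return -0.5 * (((theta["K"] - 1.0) / 0.2) ** 2
                   + ((theta["r"] - 3.0) / 0.5) ** 2)


draws = metropolis_within_gibbs(toy_log_post, {"K": 0.5, "r": 2.0},
                                {"K": 0.4, "r": 1.0}, 5000, random.Random(42))
print(draws[-1])
```

Updating one parameter at a time keeps each proposal simple to tune; a joint update, as suggested above for future work, would replace the inner loop with a single multivariate proposal.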
We find that our C code is typically twice as fast as the simple MCMC scheme used by JAGS, indicating that our inference is ∼50× faster than an “exact” approach. A more efficient “exact” approach could speed up further, say by another factor of 5, but our approximate approach will at least be an order of magnitude faster.

We use a burn-in of 600,000 and a thinning of 4,000 to obtain a final posterior sample size of 1,000 for MCMC convergence of all our models.

To compare the approximate models' ability to recover parameters from the SLGM with simulated Log-normal measurement error, we simulate data and carry out Bayesian inference. Figure 5.2 shows that all three approximate models can capture the synthetic time-course well, but that the RRTR model is the least representative with the largest amount of drift occurring at the saturation stage, a property not found in the SLGM or the two new LNA models. Comparing forward trajectories with measurement error (Figure 5.2), the “exact” model is visually similar to all our approximate models, but least similar to the RRTR. Further, Table 5.1 demonstrates that parameter posterior means are close to the true values and that standard deviations are small for all models and each parameter set. By comparing posterior means and standard deviations to the true values, Table 5.1 shows that all our models are able to recover the three different parameter sets considered.

To compare the approximations to the SLGM with simulated Normal measurement error, we simulate data and carry out Bayesian inference. Figure 5.4 shows that, of our approximate models, only the LNAA model can appropriately represent the simulated time-course, as both our models with Log-normal measurement error, the RRTR and LNAM, do not closely bound the data.
Comparing forward trajectories with measurement error (Figure 5.4), the “exact” model is most visually similar to the LNAA, which shares the same measurement error structure. Further, Table 5.1 demonstrates that only our models with Normal measurement error have posterior means close to the true values and that standard deviations are larger in the models with Log-normal measurement error. Observing the posterior means for K for each parameter set (Table 5.1), we can see that the RRTR has the largest standard deviations and that, of the approximate models, its posterior means are furthest from both the true values and the “exact” model posterior means. Comparing LNA models to the “exact” models with matching measurement error, we can see in Table 5.1 that they share similar posterior means and only slightly larger standard deviations. Example posterior diagnostics given in Figure 5.3 demonstrate that posteriors are distributed tightly around true values for our LNAA and data from the SLGM with Normal measurement error.

[Figure 5.2, panels A-L. Each panel: Time (Days) against Normalised Cell Density (AU). Columns are labelled SLGM+L, RRTR, LNAM and LNAA.]

Figure 5.2: Forward trajectories with measurement error for the stochastic logistic growth model and approximations, simulated from parameter posterior samples (sample size=1000). Model fitting is carried out on SLGM forward trajectories with Log-normal measurement error (black), for three different sets of parameters (see Table 5.1). See (5.16) or (5.17) for model and Table C.1 for prior hyper-parameter values. Each row of figures corresponds to a different time course data set, simulated from a different set of parameter values, see Table 5.1. Each column of figures corresponds to a different model fit: A), E) & I) SLGM+L (orange). B), F) & J) RRTR model with Log-normal error (red). C), G) & K) LNAM model with Log-normal error (green). D), H) & L) LNAA model with Normal error (blue). See Table 5.1 for parameter posterior means and true values.

5.5.2. Application to observed yeast data

We now consider which diffusion equation model can best represent observed microbial population growth curves taken from a Quantitative Fitness Analysis (QFA) experiment (Section 1.1) (Addinall et al., 2011; Banks et al., 2012), see Figure 5.5. The data consist of scaled cell density estimates over time for budding yeast Saccharomyces cerevisiae. Independent replicate cultures are inoculated on plates and photographed over a period of 5 days.
The images captured are then converted into estimates of integrated optical density (IOD, which we assume are proportional to cell population size), by the software package Colonyzer (Lawless et al., 2010). The dataset chosen for our model fitting is a representative set of 10 time-courses, each with 27 time points. Once we have chosen the most appropriate stochastic model we can then look to apply our chosen model to logistic growth data from the QFA screens used throughout Chapter 4 in the future.

[Figure 5.3. One row per parameter (K, r, P, ν, σ). Columns: trace plot (Particle number against Parameter value), auto-correlation plot (Lag against ACF) and posterior Density.]

Figure 5.3: Convergence diagnostics for the linear noise approximation of the stochastic logistic growth model with additive intrinsic noise (LNAA) fit to simulated stochastic logistic growth data with Normal measurement error, see Figure 5.4D. Trace, auto-correlation and density plots for the LNAA parameter posteriors (sample size = 1000, thinning interval = 4000). Posterior density (black), prior density (dashed blue) and true parameter values (red) are shown in the right hand column.

[Figure 5.4, panels A-L. Each panel: Time (Days) against Normalised Cell Density (AU). Columns are labelled SLGM+N, RRTR, LNAM and LNAA.]

Figure 5.4: Forward trajectories with measurement error, simulated from inferred parameter posterior samples (sample size=1000). Model fitting is carried out on SLGM forward trajectories with Normal measurement error (black), for three different sets of parameters (see Table 5.1). See (5.16) or (5.17) for model and Table C.1 for prior hyper-parameter values. Each row of figures corresponds to a different time course data set, simulated from a different set of parameter values, see Table 5.1. Each column of figures corresponds to a different model fit: A), E) & I) SLGM+N (pink). B), F) & J) RRTR model with Log-normal error (red). C), G) & K) LNAM model with Log-normal error (green). D), H) & L) LNAA model with Normal error (blue). See Table 5.1 for parameter posterior means and true values.

As in Figure 5.4, we see that the LNAA model is the only approximation that can appropriately represent the time-course and that both the RRTR and LNAM fail to bound the data as tightly as the LNAA (Figure 5.5). Our two “exact” models are visually similar to our approximate models with the same measurement error, with the SLGM+N most similar to the LNAA and the SLGM+L to the RRTR and LNAM. This is as expected due to matching measurement error structures. Table 5.1 summarises parameter estimates for the observed yeast data using each model.
The variation in the LNAA model parameter posteriors is much smaller than for the RRTR and LNAM, indicating a more appropriate model fit. Comparing the LNA models and “exact” models with matching measurement error, we can see in Table 5.1 that they share similar posterior means and standard deviations for all parameters and, in particular, they are very similar for both K and r, which are important phenotypes for calculating fitness (Addinall et al., 2011).

Table 5.2 compares the quality of parameter inference for 10 observed yeast time-courses with each model. Mean squared errors (MSE) for 1000 posterior sample forward simulations are calculated for each yeast time course and summed to give a total MSE for each model. It is clear that the RRTR is the worst overall representation of the 10 yeast time courses, with the highest total MSE and a much larger total MSE than the “exact” SLGM+L. It is interesting to see there is a very similar total MSE for the SLGM+L and LNAM, and similarly for the SLGM+N and LNAA, demonstrating that our approximations perform well.

Table 5.1: Bayesian state space model parameter posterior means, standard deviations (in parentheses) and true values for Figures 5.2, 5.4 and 5.5. True values for the simulated data used for Figures 5.1, 5.2 and 5.4 are also given.

Figure 5.2, SLGM with lognormal error

| Panel | Model | $\hat{K}$ | $\hat{r}$ | $\hat{P}$ | $\hat{\nu}$ | $\hat{\sigma}$ |
|---|---|---|---|---|---|---|
| A | SLGM+L | 0.150 (0.001) | 2.982 (0.014) | 1.002·10^-04 (1.112·10^-06) | 3.860·10^-03 (2.127·10^-03) | 0.017 (0.005) |
| B | RRTR | 0.150 (0.003) | 2.990 (0.011) | 9.931·10^-05 (1.069·10^-06) | 5.684·10^-03 (2.360·10^-03) | 0.012 (0.006) |
| C | LNAM | 0.150 (0.001) | 2.988 (0.013) | 9.980·10^-05 (1.124·10^-06) | 4.140·10^-03 (2.180·10^-03) | 0.016 (0.005) |
| D | LNAA | 0.150 (0.001) | 3.005 (0.020) | 9.647·10^-05 (2.946·10^-06) | 3.099·10^-05 (2.534·10^-05) | 0.019 (0.003) |
| E | SLGM+L | 0.110 (0.001) | 3.975 (0.047) | 5.054·10^-05 (1.568·10^-06) | 6.159·10^-03 (5.527·10^-03) | 0.051 (0.014) |
| F | RRTR | 0.109 (0.007) | 3.984 (0.035) | 5.046·10^-05 (1.137·10^-06) | 5.928·10^-03 (4.596·10^-03) | 0.037 (0.009) |
| G | LNAM | 0.110 (0.001) | 3.985 (0.046) | 5.043·10^-05 (1.580·10^-06) | 6.188·10^-03 (5.191·10^-03) | 0.052 (0.013) |
| H | LNAA | 0.110 (0.001) | 3.959 (0.067) | 5.207·10^-05 (4.310·10^-06) | 4.540·10^-05 (4.395·10^-05) | 0.059 (0.010) |
| I | SLGM+L | 0.300 (0.001) | 5.997 (0.029) | 1.962·10^-05 (4.041·10^-07) | 9.543·10^-03 (4.035·10^-03) | 0.024 (0.015) |
| J | RRTR | 0.301 (0.004) | 6.015 (0.017) | 1.943·10^-05 (2.835·10^-07) | 1.241·10^-02 (2.307·10^-03) | 0.008 (0.006) |
| K | LNAM | 0.300 (0.001) | 6.015 (0.031) | 1.953·10^-05 (4.202·10^-07) | 8.943·10^-03 (4.252·10^-03) | 0.027 (0.016) |
| L | LNAA | 0.300 (0.001) | 6.037 (0.067) | 1.895·10^-05 (1.502·10^-06) | 8.122·10^-05 (1.596·10^-04) | 0.047 (0.008) |

Figure 5.4, SLGM with normal error

| Panel | Model | $\hat{K}$ | $\hat{r}$ | $\hat{P}$ | $\hat{\nu}$ | $\hat{\sigma}$ |
|---|---|---|---|---|---|---|
| A | SLGM+N | 0.150 (0.002) | 3.099 (0.085) | 9.299·10^-05 (7.305·10^-06) | 5.326·10^-03 (1.009·10^-03) | 0.059 (0.030) |
| B | RRTR | 0.213 (0.123) | 1.368 (0.263) | 4.552·10^-03 (2.118·10^-03) | 2.539·10^-01 (1.097·10^-01) | 0.419 (0.129) |
| C | LNAM | 0.171 (0.033) | 1.580 (0.271) | 5.241·10^-03 (2.048·10^-03) | 2.054·10^-01 (7.805·10^-02) | 0.473 (0.051) |
| D | LNAA | 0.150 (0.002) | 2.990 (0.262) | 1.189·10^-04 (7.099·10^-05) | 5.490·10^-03 (1.060·10^-03) | 0.053 (0.033) |
| E | SLGM+N | 0.109 (0.001) | 4.183 (0.074) | 4.390·10^-05 (4.129·10^-06) | 9.679·10^-04 (2.806·10^-04) | 0.057 (0.012) |
| F | RRTR | 0.157 (0.087) | 2.631 (0.337) | 4.398·10^-04 (1.678·10^-04) | 1.040·10^-01 (1.009·10^-01) | 0.374 (0.162) |
| G | LNAM | 0.116 (0.009) | 3.019 (0.374) | 4.967·10^-04 (1.397·10^-04) | 3.346·10^-02 (4.309·10^-02) | 0.475 (0.044) |
| H | LNAA | 0.110 (0.001) | 4.010 (0.158) | 5.012·10^-05 (1.443·10^-05) | 1.093·10^-03 (3.638·10^-04) | 0.053 (0.013) |
| I | SLGM+N | 0.305 (0.003) | 5.267 (0.125) | 3.263·10^-04 (3.407·10^-05) | 1.119·10^-02 (1.974·10^-03) | 0.045 (0.031) |
| J | RRTR | 0.314 (0.057) | 3.030 (0.233) | 1.307·10^-03 (2.897·10^-04) | 2.228·10^-01 (3.708·10^-02) | 0.075 (0.086) |
| K | LNAM | 0.313 (0.020) | 3.392 (0.430) | 1.118·10^-03 (3.269·10^-04) | 1.176·10^-01 (8.435·10^-02) | 0.360 (0.165) |
| L | LNAA | 0.302 (0.002) | 5.862 (0.523) | 2.890·10^-05 (2.599·10^-05) | 8.774·10^-03 (1.466·10^-03) | 0.041 (0.028) |

Figure 5.5, observed yeast data

| Panel | Model | $\hat{K}$ | $\hat{r}$ | $\hat{P}$ | $\hat{\nu}$ | $\hat{\sigma}$ |
|---|---|---|---|---|---|---|
| A | SLGM+L | 0.110 (0.007) | 4.098 (0.299) | 7.603·10^-06 (3.206·10^-06) | 3.457·10^-01 (5.319·10^-02) | 0.113 (0.109) |
| B | SLGM+N | 0.110 (0.003) | 3.905 (0.173) | 1.044·10^-05 (3.086·10^-06) | 1.852·10^-04 (7.460·10^-05) | 0.167 (0.028) |
| C | RRTR | 0.114 (0.026) | 3.764 (0.201) | 1.079·10^-05 (3.155·10^-06) | 3.379·10^-01 (4.840·10^-02) | 0.078 (0.077) |
| D | LNAM | 0.110 (0.011) | 3.777 (0.216) | 1.077·10^-05 (3.277·10^-06) | 3.362·10^-01 (5.137·10^-02) | 0.104 (0.108) |
| E | LNAA | 0.109 (0.003) | 3.832 (0.198) | 1.069·10^-05 (3.680·10^-06) | 1.769·10^-04 (6.607·10^-05) | 0.164 (0.033) |

True values

| Dataset | K | r | P | ν | σ |
|---|---|---|---|---|---|
| Figure 5.1, panels A, B, C and D | 0.11 | 4 | 0.00005 | N/A | 0.05 |
| Figures 5.2 and 5.4, panels A, B, C and D | 0.15 | 3 | 0.0001 | 0.005 | 0.01 |
| Figures 5.2 and 5.4, panels E, F, G and H | 0.11 | 4 | 0.00005 | 0.001 | 0.05 |
| Figures 5.2 and 5.4, panels I, J, K and L | 0.3 | 6 | 0.0002 | 0.01 | 0.02 |

Table 5.2: Total mean squared error (MSE) for 10 observed yeast growth time courses, each with 1000 forward simulated time-courses with measurement error. Parameter values are taken from posterior samples. Standard deviations give the variation between the sub-total MSEs for each yeast time course fit (n=10).

| Model | SLGM+N | SLGM+L | RRTR | LNAM | LNAA |
|---|---|---|---|---|---|
| Total MSE | 29.847 | 100.165 | 600.601 | 99.397 | 30.959 |
| Standard Deviation | 1.689 | 8.391 | 55.720 | 9.263 | 2.030 |

[Figure 5.5, panels A-E. Each panel: Time (Days) against Normalised Cell Density (AU). Panels are labelled SLGM+N, SLGM+L, RRTR, LNAM and LNAA.]

Figure 5.5: Forward trajectories with measurement error, simulated from inferred parameter posterior samples (sample size=1000). Model fitting is carried out on observed yeast time-course data (black). See (5.16) or (5.17) for model and Table C.1 for prior hyper-parameter values. See Table 5.1 for parameter posterior means. A) SLGM+N (pink). B) SLGM+L (orange). C) RRTR model with Log-normal error (red). D) LNAM model with Log-normal error (green). E) LNAA model with Normal error (blue).

Once the most appropriate approximate stochastic model is chosen, we can incorporate the SDE within our Bayesian hierarchical models described in Chapter 3. Currently
the Bayesian hierarchical models described in Chapter 3 have long computational times, ∼2 weeks for the joint hierarchical model (JHM) (∼1 week with further optimisations), and so extending these models using slow numerical methods would lead to prohibitively slow computational times that we estimate to take ∼3-6 months (with 4294 orf∆s, ∼8 repeats and ∼27 time points). Inference using the Kalman filter will allow the Bayesian hierarchical models to carry out stochastic modelling at a greatly reduced computational time (∼10× faster) compared to an arbitrarily exact approach.

Chapter 6. Conclusions and future work

We have joined a hierarchical model of microbial growth with a model for genetic interaction in order to learn about strain fitnesses, evidence for genetic interaction and interaction strengths simultaneously. By introducing Bayesian methodology to QFA we have been able to model the hierarchical nature of the experiment and expand the multiplicative model for genetic interaction to incorporate many sources of variation that previously had to be ignored.

We proposed two new Bayesian hierarchical model approaches to replace the current statistical analysis for identifying genetic interactions within a QFA screen comparison. Both the new two-stage and one-stage approaches give similar results but have different interpretations. The two-stage approach fits the SHM followed by the IHM, with univariate point estimate fitness definitions generated as an intermediate step. The two-stage approach can therefore be regarded as a Bayesian hierarchical version of the Addinall et al. (2011) approach. In contrast, the one-stage approach fits the JHM, which does not require a univariate definition of fitness, recognising that fitness is a multi-faceted concept, allowing interaction to be identified by either growth rate (logistic parameter r) or final biomass (logistic parameter K) achievable by a given genotype.
Our one-stage approach is a new method of detecting genetic interaction that further develops the interpretation of epistasis within QFA screens.

Hierarchical methods are able to account for the many sources of variation that exist within QFA data by accurately reflecting QFA experimental design, which is known. A hierarchical, frequentist approach using random effects, namely the REM, is presented in order to improve on the Addinall et al. (2011) approach. Due to the lack of flexibility with modelling assumptions in the standard frequentist modelling paradigm, the REM is unsuitable for modelling the distribution of orf∆ level variation on a log scale or for simultaneously modelling genetic interaction and logistic growth curves.

The data from which logistic parameter estimates are derived during QFA are the result of a technically challenging, high-throughput experimental procedure with a diverse range of possible technical errors. Our Bayesian, hierarchical models allow us the flexibility to make distributional assumptions that more closely match the data. This allows us to switch between modelling parameter uncertainty with Normal, Log-Normal and Student's t distribution where appropriate.

QFA experimental design is intrinsically multilevel and is therefore more closely modelled in our hierarchical scheme. Consequently the JHM and IHM capture sources of variation not considered by Addinall et al. (2011). By sharing information across levels in the hierarchy, our models have allowed us to learn more about orf∆s with weaker genetic interaction. Our more flexible model of variance also avoids misclassification of individual genotypes with high variance as having significant interactions. Without fully accounting for the variation described in the Bayesian hierarchical models, the previous Addinall et al.
(2011) approach may have relatively poor power to detect subtle interactions, obscuring potential novel observations.

Many subtle, interesting genetic interactions may remain to be investigated for the example dataset we present: QFA to understand telomere capping using cdc13-1. The JHM is better able to identify subtle interactions (see Figure 4.3). In our two-stage approaches, univariate fitness measures such as MDR × MDP are used in the intermediate steps, occasionally causing interaction in terms of one parameter to be masked by the other. For example, strains with little evidence for interaction with a background mutation in terms of growth rate but with strong evidence of interaction in terms of carrying capacity are sometimes classified as interactors using the JHM (see Figure 4.3). The JHM has identified genes that have not been identified as showing genetic interaction in the Addinall et al. (2011) or two-stage Bayesian analysis, for example CHZ1, which is thought to be related to telomere activity (Wan et al., 2011).

As expected, many genes previously unidentified by Addinall et al. (2011) have been identified as showing evidence of interaction using both of our Bayesian hierarchical modelling approaches. Some genes which have been identified only by the JHM (see Figure 4.2D), such as those showing interaction only in terms of r, are found to be related to telomere biology in the literature. Currently there is not sufficient information available to identify the proportion of identified interactions that are true hits and so we use unbiased GO term enrichment analyses to confirm that the lists of genetic interactions closely reflect the true underlying biology. GO term annotations relevant to telomere biology are available for well-studied genes in the current literature. Unsurprisingly all of the approaches considered closely reflect the most well-known GO terms (see Table 4.1).
Computational time for the new Bayesian approach ranges from one to two weeks for one of the datasets presented in Addinall et al. (2011). This compares favourably with the time taken to design and execute the experimental component of QFA (approximately six weeks). Time and resources used to follow up the results of a QFA screen comparison can be saved with the Bayesian approaches suggested, allowing genes to be chosen for further investigation with increased confidence. With an improved analysis it may be possible to detect more genetic interactions with the same sample size, allowing us to systematically detect and rank interactions genome-wide.

Overall we recommend a JHM or “Bayesian QFA” for analysis of current and future QFA data sets as it accounts for more sources of variation than the Addinall et al. (2011) QFA methodology. With the JHM we have outlined new genes with significant evidence of interaction in the ura3∆ 27 °C and cdc13-1 27 °C experiment. The full lists of genetic interactions for both the two and one stage Bayesian hierarchical approaches as well as lists of significant GO terms are freely available online at http://research.ncl.ac.uk/qfa/HeydariQFABayes/. The new Bayesian hierarchical models we present here will also be suitable for identifying new genes showing evidence of genetic interaction in backgrounds other than telomere activity. We hope that further, reductionist lab work by experimental biologists will give additional insight into the mechanisms by which the new genes we have uncovered interact with the telomere.
In this thesis we have also presented two new diffusion processes for modelling logistic growth data where fast inference is required: the linear noise approximation (LNA) of the stochastic logistic growth model (SLGM) with multiplicative intrinsic noise and the LNA of the SLGM with additive intrinsic noise (labelled as the LNAM and LNAA respectively). The new diffusion processes approximate the SLGM more closely than an alternative approximation (RRTR) proposed by Román-Román & Torres-Ruiz (2012). The RRTR lacks a mean reverting property that is found in the SLGM, LNAM and LNAA, resulting in increasing variance during the stationary phase of population growth (see Figure 5.1).

We compared the ability of each of the three approximate models and the SLGM to recover parameter values from simulated datasets using standard MCMC techniques. When modelling stochastic logistic growth with Log-normal measurement error we find that our approximate models are able to represent data simulated from the original process and that the RRTR is least representative, with large variation over the stationary phase (see Figure 5.2). When modelling stochastic logistic growth with Normal measurement error we find that only our models with Normal measurement error can appropriately bound data simulated from the original process (see Figure 5.4). We also compared parameter posterior distribution summaries with parameter values used to generate simulated data after inference using both approximate and “exact” models (see Table 5.1). We find that, when using the RRTR model, posterior distributions for the carrying capacity parameter K are less precise than for the LNAM and LNAA approximations.
We also note that it is not possible to model additive measurement error while maintaining a linear Gaussian structure (which allows fast inference with the Kalman filter) when carrying out inference with the RRTR. We conclude that when measurement error is additive, the LNAA model is the most appropriate approximate model.

To test model performance during inference with real population data, we fitted our approximate models and the “exact” SLGM to microbial population growth curves generated by quantitative fitness analysis (QFA) (see Figure 5.5). We found that the LNAA model was the most appropriate for modelling experimental data. It seems likely that this is because a Normal error structure best describes this particular dataset, placing the LNAM and RRTR models at a disadvantage. We demonstrate that arbitrarily exact methods and our fast approximations perform similarly during inference for 10 diverse, experimentally observed, microbial population growth curves (see Table 5.2), which shows that, in practice, our fast approximations are as good as “exact” methods. We conclude that our LNA models are preferable to the RRTR for modelling QFA data.

It is interesting to note that, although the LNAA is not a better approximation of the original SLGM process than the LNAM, it is still quite reasonable. Figures 5.1A and 5.1D show that the SLGM and LNAA processes are visually similar. Figure 5.1E demonstrates that forward trajectories of the LNAA also share similar levels of variation over time with the SLGM and LNAM.

Fast inference with the LNAA gives us the potential to develop large hierarchical Bayesian models for genome-wide QFA datasets, using a diffusion equation and realistic computational resources.

Here, we have concentrated on a biological model of population growth.
However, we expect that the approach we have demonstrated, generating linear noise approximations of stochastic processes to allow fast Bayesian inference with Kalman filtering for marginal likelihood computation, will be useful in a wide range of other applications where simulation is prohibitively slow.

Further work involves extending the Bayesian hierarchical models in Chapter 3 with the approximate stochastic logistic growth models and methods for carrying out inference described in Chapter 5. By accounting for the random fluctuations within the logistic growth data we will be able to improve our logistic growth parameter estimates.

We have demonstrated how to incorporate a batch effect or a transformation effect into the joint hierarchical model in Section 4.5. Introducing a batch or transformation effect into our models will allow us to capture even further experimental variation. Fitness plots for further case studies given in Section 4.5 and extensions of the joint hierarchical model given in Section 4.6 are included for experimental biologists to investigate further.

A related experiment to the QFA screen comparison analysed within this thesis is the “all-by-all” QFA experiment (in early development at the time of writing). The “all-by-all” QFA experiment begins with a control plate consisting of N orfΔs. For each of the N orfΔs a new query plate is created; each query plate consists of the control plate and an additional background mutation related to one of the N orfΔs. In total there will be N + 1 unique plates (including the control plate). Where a standard QFA comparison looks for genes that interact with a single query mutation (or condition), the “all-by-all” QFA experiment aims to find genetic interactions for multiple query mutations (N) simultaneously.
The “all-by-all” experiment therefore incorporates more information and investigates more potential genetic interactions than a standard QFA comparison. We expect that the Bayesian hierarchical modelling and genetic interaction modelling developed in this thesis will be used to create models for describing the “all-by-all” QFA experiment as well as many other similar experiments in the future.

By improving our software we may be able to reduce computational time for inference. Currently the code for implementing the Bayesian models described in this thesis is written in the C programming language, which can be run as standalone software or through an R package “qfaBayes”, available at https://r-forge.r-project.org/projects/qfa. The computational speed of our C code used for inference could be improved by parallel implementation, taking advantage of a multi-core processor to carry out tasks simultaneously. With faster computational times we expect to reduce the time for a typical QFA comparison with the JHM from about 2 weeks to less than a week.

Currently the information available on true genetic interactions and biological processes in yeast is limited and so we rely on objective analyses such as simulation studies to give unbiased comparisons between the approaches considered. The biological processes of many genes in the yeast genome are yet to be identified so we are unable to use GO term enrichment analysis as a “gold standard” for comparing the results of our approaches. Information used to build a gene ontology is typically well known and taken from well understood experiments; we expect that the subtle genetic interactions which we are interested in finding will have little information available. QFA screen comparisons are designed to learn biology which is not already fully understood and so a biological comparison between the different approaches considered is difficult.
Simulation studies (see Section 4.3.6) give us the ability to compare the different approaches and the effects of modelling more experimental structure.

A typical QFA comparison is a large and complex data set corresponding to around 400,000 time series, posing considerable computational, as well as statistical, challenges. With a Bayesian approach we are able to evaluate complex hierarchical models to better reflect the structure or design of genome-wide QFA experiments. Bayesian variable selection methods embedded within a large hierarchical model allow us to describe genetic interaction and use prior information to incorporate physical and biological constraints within our models. We have shown that Bayesian hierarchical modelling of large and complex data gives us the advantage of increased modelling flexibility compared to a frequentist approach, allowing us to better describe the experimental structure or design. For the reasons above, a QFA screen comparison or any other highly structured experimental dataset is better modelled using a Bayesian hierarchical modelling approach when compared to an alternative frequentist approach.

Overall this thesis presents improved modelling approaches to the current non-hierarchical frequentist approach for a QFA screen comparison. The research contained in this thesis illustrates how Bayesian inference gives us further modelling flexibility, allowing us to better describe the known experimental structure. Further, our modelling approaches and assumptions are transferable outside QFA screen experiments where we wish to capture as much experimental structure as possible. The results from our temperature-sensitive cdc13-1 QFA experiment will give further insight into the telomere and consequently ageing and cancer in yeast and potentially the human genome (Botstein et al., 1997).

Appendix A.
QFA data set sample, solving the logistic growth model and random effects model R code

A.1. cdc13-1 Quantitative Fitness Analysis data set sample

Figure A.1: cdc13-1 Quantitative Fitness Analysis data set sample. Notable columns include “ORF”, “Expt.Time” and “Growth”. “ORF” indicates which orfΔ strain the row corresponds to. “Expt.Time” indicates the time in days from the orfΔ strain being spotted (Addinall et al., 2011). “Growth” gives an adjusted measure of cell culture density from the image analysis for a given orfΔ strain and time point. Generated from Colonyzer output files with the qfa R package, freely available at http://qfa.r-forge.r-project.org/.

A.2. Solving the logistic growth model

The solution to the logistic growth ODE (1.1) can be obtained as follows. First we factor the right side of (1.1) and rearrange to give:

$$\frac{dx(t)}{x(t)\left(1 - \frac{x(t)}{K}\right)} = r\,dt.$$

We now rearrange further using a partial fractions expansion and integrate over both sides of the equation:

$$\int \left( \frac{dx(t)}{x(t)} + \frac{1}{K}\,\frac{dx(t)}{1 - \frac{x(t)}{K}} \right) = \int r\,dt. \tag{A.1}$$

Integrating the first component on the left side of (A.1) we obtain the following, where $c_1$ is an unknown constant:

$$\int \frac{dx(t)}{x(t)} = \log(x(t)) + c_1.$$

Integrating the second component on the left side of (A.1) we obtain the following, where $c_2$ is an unknown constant:

$$\frac{1}{K}\int \frac{dx(t)}{1 - \frac{x(t)}{K}} = -\log\left(1 - \frac{x(t)}{K}\right) + c_2.$$

Integrating the right side of (A.1) we obtain the following, where $c_3$ is an unknown constant:

$$\int r\,dt = rt + c_3.$$

Solving the integrals in (A.1) we obtain the following, where $c_4 = c_3 - c_1 - c_2$ is an unknown constant:

$$\log\left(\frac{x(t)}{1 - \frac{x(t)}{K}}\right) = rt + c_4.$$

Rearranging our equation, we obtain the following:

$$\frac{x(t)}{1 - \frac{x(t)}{K}} = e^{rt + c_4}.$$
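As a numerical sanity check on the integration so far, the sketch below integrates the logistic ODE with a fixed-step fourth-order Runge-Kutta scheme and confirms that log(x(t) / (1 − x(t)/K)) − rt stays constant along the solution, as the last two displayed equations require. It assumes (1.1) is dx/dt = r x (1 − x/K), consistent with the factored form above; the function names and parameter values are hypothetical and chosen only for illustration.

```r
# Logistic growth ODE right-hand side: dx/dt = r * x * (1 - x / K)
logistic_rhs <- function(x, r, K) r * x * (1 - x / K)

# Classical fourth-order Runge-Kutta integrator with a fixed step (base R only).
rk4_logistic <- function(x0, r, K, t_end, n_steps) {
  h <- t_end / n_steps
  x <- numeric(n_steps + 1)
  x[1] <- x0
  for (i in seq_len(n_steps)) {
    k1 <- logistic_rhs(x[i], r, K)
    k2 <- logistic_rhs(x[i] + 0.5 * h * k1, r, K)
    k3 <- logistic_rhs(x[i] + 0.5 * h * k2, r, K)
    k4 <- logistic_rhs(x[i] + h * k3, r, K)
    x[i + 1] <- x[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
  }
  list(t = seq(0, t_end, length.out = n_steps + 1), x = x)
}

# Illustrative (hypothetical) parameter values, not taken from the thesis.
r <- 3
K <- 0.15
P <- 1e-4
sol <- rk4_logistic(x0 = P, r = r, K = K, t_end = 5, n_steps = 5000)

# If the derivation is right, this quantity equals the constant c4 at every t.
c4_path <- log(sol$x / (1 - sol$x / K)) - r * sol$t
c4_range <- diff(range(c4_path))
print(c4_range)  # spread of the implied constant; should be close to zero
```

The first element of `c4_path` is the value of the constant at t = 0, which is the quantity the next step of the derivation solves for.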
We now apply the initial condition $P = x(0)$ and rearrange to obtain an expression for $c_4$:

$$c_4 = \log\left(\frac{P}{1 - \frac{P}{K}}\right).$$

We now substitute in our expression for $c_4$ to give:

$$\log\left(\frac{x(t)}{1 - \frac{x(t)}{K}}\right) = rt + \log\left(\frac{P}{1 - \frac{P}{K}}\right).$$

Finally, we exponentiate both sides and solve for $x(t)$ to give (1.2), that is, $x(t) = \dfrac{K P e^{rt}}{K + P\left(e^{rt} - 1\right)}$.

A.3. Random effects model R code

library(lme4) #http://cran.r-project.org/web/packages/lme4/index.html
#http://research.ncl.ac.uk/colonyzer/AddinallQFA/Logistic.zip and extract zip file
#alternatively http://research.ncl.ac.uk/colonyzer/AddinallQFA/
#"Table S8 Logistic Output Files - 36MB .zip file"
# aa: logistic fits from the cSGA file; bb: logistic fits from the cdc13-1 file
aa<-read.delim("cSGA_v2_r1_Logistic.txt",header=TRUE,skip=1,sep="\t")
#...
bb<-read.delim("Adam_cdc13-1_SDLV2_REP1_Logistic.txt",header=TRUE,skip=0,sep="\t")
#...
# Keep only rows with Treatments equal to 27
aa<-aa[aa$Treatments==27,]
bb<-bb[bb$Treatments==27,]
# Drop cultures in rows 1 and 16 and columns 1 and 24
aa<-aa[!aa$Row==1,]
aa<-aa[!aa$Row==16,]
aa<-aa[!aa$Col==1,]
aa<-aa[!aa$Col==24,]
bb<-bb[!bb$Row==1,]
bb<-bb[!bb$Row==16,]
bb<-bb[!bb$Col==1,]
bb<-bb[!bb$Col==24,]
# Reorder both data sets so that rows are grouped by ORF, counting repeats per ORF
ORFuni=ORFuni_a=unique(aa$ORF)
ORFuni_b=unique(bb$ORF)
L=length(ORFuni_a)
NoORF_a=NoORF_b=aaa=bbb=numeric()
for (i in 1:L){
  NoORF_a[i]=nrow(aa[aa$ORF==ORFuni[i],])
  NoORF_b[i]=nrow(bb[bb$ORF==ORFuni[i],])
  aaa<-rbind(aaa,aa[aa$ORF==ORFuni[i],])
  bbb<-rbind(bbb,bb[bb$ORF==ORFuni[i],])
}
# Fitness for condition a: a[i] = (r/log(2(K-P)/(K-2P))) * (log(K/P)/log(2))
a=b=numeric(0)
K_lm=aaa$Trimmed.K
P_a=43
r_lm=aaa$Trimmed.r
for (i in 1:length(r_lm)){
  # Guard against K <= 2P, where the expression below is undefined: set fitness to zero
  if(K_lm[i]<=2*P_a){K_lm[i]=2*P_a+0.01;r_lm[i]=0;}
  a[i]=(r_lm[i]/log(2*max(0,K_lm[i]-P_a)/max(0,K_lm[i]-2*P_a)))*(log(K_lm[i]/P_a)/log(2));
}
# Fitness for condition b, computed in the same way
K_lmb=bbb$Trimmed.K
P_b=43
r_lmb=bbb$Trimmed.r
for (i in 1:length(r_lmb)){
  if(K_lmb[i]<=2*P_b){K_lmb[i]=2*P_b+0.01;r_lmb[i]=0;}
  b[i]=(r_lmb[i]/log(2*max(0,K_lmb[i]-P_b)/max(0,K_lmb[i]-2*P_b)))*(log(K_lmb[i]/P_b)/log(2));
}
# condition: "a" or "b"; subject: ORF index for every observation
condition<-factor(c(rep("a",length(a)),rep("b",length(b))))
subject=numeric()
for (i in 1:L){
  subject=c(subject,rep(i,NoORF_a[i]))
}
for (i in 1:L){
  subject=c(subject,rep(i,NoORF_b[i]))
}
# subcon: 0 for all condition-a rows, ORF index for condition-b rows
subcon=subject
subcon[1:length(a)]=0
subcon<-factor(subcon)
subject<-factor(subject)
f=c(a,b)
data=data.frame(f,subject,condition,subcon)
data$lf=log(data$f+1)
# Sum-to-zero contrasts for subcon, with the rows of the contrast matrix rotated
data$subcon<-C(data$subcon,sum)
bk<-contrasts(data$subcon)
contrasts(data$subcon)=bk[c(nrow(contrasts(data$subcon)),1:(nrow(contrasts(data$subcon))-1)),]
# Random intercept per ORF (subject), fixed effects for subcon, fitted by maximum likelihood
model1<-lmer(lf~subcon+(1|subject),data=(data),REML=FALSE)

Appendix B. Bayesian hierarchical modelling

B.1. Hyper-parameter values for Bayesian hierarchical modelling

Table B.1: Hyper-parameter values for Bayesian hierarchical modelling of quantitative fitness analysis data. Hyper-parameter values for the separate hierarchical model (SHM), interaction hierarchical model (IHM) and joint hierarchical model (JHM) are provided.

| SHM & JHM parameter | Value | SHM & JHM parameter | Value | JHM parameter | Value | IHM parameter | Value | JHM-B & JHM-T parameter | Value |
|---|---|---|---|---|---|---|---|---|---|
| $\tau^{K,\mu}$ | 2.20 | $\eta^{r,p}$ | 0.13 | $\alpha^{\mu}$ | 0.00 | $Z^{\mu}$ | 3.66 | $\kappa^{p}$ | 0.00 |
| $\eta^{\tau,K,p}$ | 0.02 | $\nu^{\mu}$ | 19.82 | $\eta^{\alpha}$ | 0.25 | $\eta^{Z,p}$ | 0.70 | $\eta^{\kappa}$ | 1.17 |
| $\eta^{K,o}$ | −0.79 | $\eta^{\nu,p}$ | 0.02 | $\beta^{\mu}$ | 0.00 | $\eta^{Z}$ | 0.10 | $\lambda^{p}$ | 0.00 |
| $\psi^{K,o}$ | 0.61 | $P^{\mu}$ | −9.04 | $\eta^{\beta}$ | 0.25 | $\psi^{Z}$ | 0.42 | $\eta^{\lambda}$ | 1.17 |
| $\tau^{r,\mu}$ | 3.65 | $\eta^{P}$ | 0.47 | $p$ | 0.05 | $\eta^{\nu}$ | 0.10 | $\phi^{shape}$ | 100.00 |
| $\eta^{\tau,r,p}$ | 0.02 | | | $\eta^{\gamma}$ | −0.79 | $\psi^{\nu}$ | 2.45 | $\phi^{scale}$ | 0.01 |
| $\eta^{r,o}$ | 0.47 | | | $\psi^{\gamma}$ | 0.61 | $\nu^{\mu}$ | 2.60 | $\chi^{shape}$ | 100.00 |
| $\psi^{r,o}$ | 0.10 | | | $\eta^{\omega}$ | 0.47 | $\eta^{\nu,p}$ | 0.05 | $\chi^{scale}$ | 0.01 |
| $\eta^{\nu}$ | −0.83 | | | $\psi^{\omega}$ | 0.10 | $\alpha^{\mu}$ | 0.00 | | |
| $\psi^{\nu}$ | 0.86 | | | $\eta^{\tau,K}$ | 2.20 | $\eta^{\alpha}$ | 0.31 | | |
| $K^{\mu}$ | −2.01 | | | $\psi^{\tau,K}$ | 0.02 | $p$ | 0.05 | | |
| $\eta^{K,p}$ | 0.03 | | | $\eta^{\tau,r}$ | 3.65 | $\eta^{\gamma}$ | 0.10 | | |
| $r^{\mu}$ | 0.97 | | | $\psi^{\tau,r}$ | 0.02 | $\psi^{\gamma}$ | 0.42 | | |

B.2.
cdc13-1 27 °C vs ura3Δ 27 °C fitness plots with gene ontology terms highlighted

Figure B.1: Alternative fitness plots with orfΔ posterior mean fitnesses. Labels for the “telomere maintenance” gene ontology term are highlighted in blue. A) Non-Bayesian, non-hierarchical fitness plot, based on Table S6 from Addinall et al. (2011) (F = MDR × MDP). B) Non-Bayesian, hierarchical fitness plot, from fitting REM to data in Table S6 from Addinall et al. (2011) (F = MDR × MDP). C) IHM fitness plot with orfΔ posterior mean fitness (F = MDR × MDP). D) JHM fitness plot with orfΔ posterior mean fitnesses. orfΔ strains are classified as being a suppressor or enhancer based on analysis of growth parameter r. Further fitness plot explanation and notation is given in Figure 4.2.
Figure B.2: Alternative fitness plots with orfΔ posterior mean fitnesses. Labels for the “ageing” gene ontology term are highlighted in blue. A) Non-Bayesian, non-hierarchical fitness plot, based on Table S6 from Addinall et al. (2011) (F = MDR × MDP). B) Non-Bayesian, hierarchical fitness plot, from fitting REM to data in Table S6 from Addinall et al. (2011) (F = MDR × MDP). C) IHM fitness plot with orfΔ posterior mean fitness (F = MDR × MDP). D) JHM fitness plot with orfΔ posterior mean fitnesses. orfΔ strains are classified as being a suppressor or enhancer based on analysis of growth parameter r. Further fitness plot explanation and notation is given in Figure 4.2.
T1 CHL1 RAD1 RMI1 ELC1 PNG1 MEI5 MLH3 REV3 DDC1 EAF3 NHP6A CTF4 MMS1 0 20 40 60 80 0 20 40 60 80 NTG1 Y AL027W PEX22 NUP60 SWD1 HT A2 ALK2 APN2 HEK2 PSY4 PIN4 TEL1 HHT1 RDH54 TEC1 MMS4 RAD16 A TG14 PEX32 SWD3 TDP1 SLX1 PBP2 HSM3 CHK1 RIF1 DPB3 SNF5 GBP2 DCC1 YCL056C MRC1 POL4 MSH3 NHP10 HEX3 RPN4 GPR1 SIR2 RAD59 PEX19 BDF2 BRE1 MSH5 MGT1 RAD57 YDR026C RAD28 PPH3 RAD55 UBC13 GIS1 MSH6 BMH2 GSG1 FOB1 DPB4 PEX7 SAC3 ADR1 RAD9 HT A1 SIR4 PEX5 CT A1 DIN7 PEX10 RAD3 4 HIM1 PEX3 VID21 ESC2 XRS2 MUS81 RAD30 DOT1 SNF1 PEX29 PLM2 RAD23 AFG1 GP A2 MIG3 CHZ1 YEN1 PTC2 RAD51 SLX8 MAG1 RAD4 RAD24 BMH1 MSH4 RIM15 A TG18 PUF4 CKB1 ALK1 PNC1 RAD6 MMS2 LIF1 P AN2 SNF4 SOH1 PEX14 RAD54 SAE2 SIP2 RTF1 RTG2 HXK2 PEX31 ACB1 PEX8 PHB1 PEX4 PRE9 PHB2 PEX21 LAG1 SHU1 SNF6 SOD2 SL T2 RRM3 WSS1 PEX28 RTT107 PEX18 EST3 CKA1 FIS1 MET18 CSM2 REV7 RPL40A RRD1 IMP2' GUT2 MPH1 DJP1 RTT101 HPR5 BCK1 MDV1 A TG27 YJL185C PEX2 RAD26 POL32 RAD7 TOR1 EAF6 GRR1 LAC1 P AN3 GPX1 IXR1 MSN4 YNK1 MDH1 AA T1 RAD27 APN1 SBA1 CTK1 PEX1 DOA1 VPS1 CAF4 UTH1 TRM2 NUP133 RPL40B MLP1 DNM1 RTT109 HSP104 UBI4 BRE2 RAD5 MLH2 SLX4 STM1 TOS4 PEX13 EST1 TOP3 IRC20 NEJ1 BUD6 MMS22 PEX30 MID2 RSC2 MDM30 PSY3 CST9 CDC73 A TG17 SIR3 RIF2 RAD33 UNG1 TSA1 RAD52 CGI121 SML1 OGG1 RAD10 ZDS2 CTK3 NDI1 YMR018W PEX12 MSN2 CSM3 CTF18 YKU80 PSO2 NDE1 YIM1 TPP1 INP2 MLH1 DDR48 SGS1 RAD14 INP1 MRE11 ZDS1 YKU70 HDA1 HHT2 PMS1 MKT1 RAS2 Y AF9 EAF7 PEX17 IES2 MGS1 A TG2 VPS75 RAD50 MCK1 PEX6 RPD3 YRF1−6 SIN3 LAG2 NTG2 PEX15 MSH2 WSC3 MDH2 RTC1 PEX11 DNL4 SLG1 EXO1 CKB2 CKA2 Y OR084W RAS1 LEO1 ARP8 ELG1 IES4 PEX27 NPT1 WTM2 MKK1 REV1 SCP1 RAD17 PHR1 HA T1 CHL1 HST2 RAD1 RMI1 ELC1 PNG1 PEX25 VPS30 MEI5 MKK2 TGS1 MLH3 REV3 CTI6 RTC6 DDC1 HSP82 EAF3 NHP6A ANT1 CTF4 MMS1 HDA3 NTG1 Y AL027W NUP60 HT A2 ALK2 APN2 PSY4 PIN4 TEL1 HHT1 RDH54 MMS4 RAD16 TDP1 SLX1 HSM3 CHK1 DPB3 SNF5 DCC1 MRC1 POL4 MSH3 NHP10 HEX3 RPN4 SIR2 RAD59 BDF2 BRE1 MSH5 MGT1 RAD57 RAD28 PPH3 RAD55 UBC13 MSH6 BMH2 DPB4 SAC3 RAD9 HT A1 SIR4 DIN7 RAD3 4 HIM1 VID21 ESC2 XRS2 MUS81 
RAD30 DOT1 PLM2 RAD23 MIG3 YEN1 PTC2 RAD51 SLX8 MAG1 RAD4 RAD24 BMH1 MSH4 CKB1 ALK1 RAD6 MMS2 LIF1 P AN2 SOH1 RAD54 SAE2 RTF1 SHU1 SNF6 RRM3 WSS1 RTT107 CKA1 MET18 CSM2 REV7 RPL40A RRD1 IMP2' MPH1 RTT101 HPR5 RAD26 POL32 RAD7 TOR1 EAF6 GRR1 P AN3 IXR1 YNK1 RAD27 APN1 CTK1 DOA1 TRM2 NUP133 RPL40B MLP1 RTT109 UBI4 RAD5 MLH2 SLX4 TOS4 IRC20 NEJ1 MMS22 RSC2 PSY3 CST9 CDC73 SIR3 RAD33 UNG1 TSA1 RAD52 SML1 OGG1 RAD10 CTK3 CSM3 CTF18 YKU80 PSO2 YIM1 TPP1 MLH1 DDR48 SGS1 RAD14 MRE11 YKU70 HHT2 PMS1 MKT1 Y AF9 EAF7 IES2 MGS1 VPS75 RAD50 MCK1 SIN3 NTG2 MSH2 DNL4 EXO1 CKB2 CKA2 LEO1 ARP8 ELG1 IES4 WTM2 REV1 RAD17 PHR1 HA T1 CHL1 RAD1 RMI1 ELC1 PNG1 MEI5 MLH3 REV3 DDC1 EAF3 NHP6A CTF4 MMS1 Figure B.3: Alternativ e ﬁtness plots with orf ∆ posterior mean ﬁtnesses. Labels for the “response to DN A damage” gene ontology term are highlighted in blue. A) Non-Bayesian, non-hierarchical ﬁtness plot, based on T able S6 from Addinall et al. (2011) ( F = M D R × M D P ) . B) Non- Bayesian, hierarchical ﬁtness plot, from ﬁtting REM to data in T able S6 from Addinall et al. (2011) ( F = M D R × M D P ) . C) IHM ﬁtness plot with orf ∆ posterior mean ﬁtness ( F = M D R × M D P ) . D) JHM ﬁtness plot with orf ∆ posterior mean ﬁtnesses. orf ∆ strains are classiﬁed as being a suppressor or enhancer based on analysis of growth parameter r . Further ﬁtness plot explanation and notation is gi v en in Figure 4.2. 118 Appendix B. 
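The fitness measure F = MDR × MDP used in these plots can be illustrated with a short R sketch. The logistic curve below has the same form as y.hat in the JAGS listings of Section B.6. The expressions for MDR (maximum doubling rate) and MDP (maximum doubling potential) are assumed here to follow the definitions of Addinall et al. (2011), and the function names and parameter values are hypothetical, chosen for illustration only.

```r
# Logistic growth curve: population size at time t, given carrying capacity K,
# growth rate r and initial population size P (same form as y.hat in B.6).
logistic_growth <- function(t, K, r, P) {
  (K * P * exp(r * t)) / (K + P * (exp(r * t) - 1))
}

# Maximum doubling rate (assumed definition): reciprocal of the time taken
# for the population to grow from P to 2P under the logistic model.
mdr <- function(K, r, P) {
  r / log(2 * (K - P) / (K - 2 * P))
}

# Maximum doubling potential (assumed definition): number of doublings
# between the initial size P and the carrying capacity K.
mdp <- function(K, P) {
  log(K / P) / log(2)
}

# Univariate fitness summary F = MDR x MDP.
fitness <- function(K, r, P) {
  mdr(K, r, P) * mdp(K, P)
}

# Hypothetical parameter values, for illustration only.
K <- 0.15
r <- 4
P <- 1e-4
F_example <- fitness(K, r, P)
```

Evaluating logistic_growth(1 / mdr(K, r, P), K, r, P) returns 2P, which confirms that the assumed MDR is the reciprocal of the first doubling time of the logistic curve.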
[Figure B.4: four-panel fitness plot (panels A to D); axis ticks and gene labels are not recoverable from the extracted text.]
Figure B.4: Alternative fitness plots with orf∆ posterior mean fitnesses. Labels for the "peroxisomal organisation" gene ontology term are highlighted in blue. A) Non-Bayesian, non-hierarchical fitness plot, based on Table S6 from Addinall et al. (2011) (F = MDR × MDP). B) Non-Bayesian, hierarchical fitness plot, from fitting REM to data in Table S6 from Addinall et al. (2011) (F = MDR × MDP). C) IHM fitness plot with orf∆ posterior mean fitness (F = MDR × MDP). D) JHM fitness plot with orf∆ posterior mean fitnesses. orf∆ strains are classified as being a suppressor or enhancer based on analysis of growth parameter r. Further fitness plot explanation and notation is given in Figure 4.2.

B.3. Lists of top genetic interactions for the two-stage and one-stage Bayesian approaches

Table B.2: Sample of interaction hierarchical model top genetic interactions with cdc13-1 at 27 °C

Type of       Gene      Probability of    Strength of               Position in
Interaction   Name      Interaction δ_l   Interaction e^(δ_l γ_l)   Addinall (2011)
Suppressor    IPK1      1.00              2.87                      10
              LST4      1.00              2.77                      13
              RPN4      1.00              2.76                      17
              MTC5      1.00              2.66                      20
              GTR1      1.00              2.64                      38
              NMD2      1.00              2.62                      3
              SAN1      1.00              2.62                      16
              UPF3      1.00              2.58                      21
              RPL37A    1.00              2.56                      121
              NAM7      1.00              2.53                      22
              RPP2B     1.00              2.52                      120
              YNL226W   0.99              2.49                      126
              YGL218W   1.00              2.46                      250
              MEH1      1.00              2.45                      45
              ARO2      1.00              2.45                      68
              EXO1      1.00              2.45                      1
              BUD27     1.00              2.43                      46
              RAD24     1.00              2.39                      4
              RPL16B    1.00              2.39                      33
              RPL43A    1.00              2.39                      150
Enhancer      :::MRC1   1.00              0.11                      35
              YKU70     1.00              0.11                      31
              STI1      1.00              0.11                      42
              RIF1      1.00              0.13                      36
              ELP3      1.00              0.16                      82
              CLB5      1.00              0.17                      58
              MRC1      1.00              0.17                      63
              DPH2      1.00              0.18                      24
              POL32     1.00              0.19                      113
              MAK31     1.00              0.19                      37
              SWM1      1.00              0.20                      25
              LTE1      1.00              0.21                      48
              MAK10     1.00              0.22                      44
              ELP2      1.00              0.22                      77
              PAT1      1.00              0.24                      144
              DPH1      1.00              0.25                      55
              SRB2      0.99              0.25                      174
              THP2      1.00              0.26                      67
              MFT1      1.00              0.26                      52
              LSM6      0.97              0.26                      389

See http://research.ncl.ac.uk/qfa/HeydariQFABayes/IHM_strip.txt for the full list.

Table B.3: Sample of joint hierarchical model top genetic interactions with cdc13-1 at 27 °C

Type of          Gene      Probability of    Strength of               Strength of               Strength of    Position in
Interaction      Name      Interaction δ_l   Interaction e^(δ_l γ_l)   Interaction e^(δ_l ω_l)   Interaction    Addinall (2011)
                                                                                                 MDR × MDP
Suppressor       CSE2      1.00              490.51                    0.48                      11.71          838
in K             SGF29     1.00              273.69                    0.68                      14.16          580
                 GSH1      1.00              78.79                     0.92                      17.89          281
                 YMD8      1.00              59.31                     0.65                      7.05           2022
                 YGL024W   1.00              28.13                     1.18                      13.33          151
                 RPS9B     1.00              24.67                     1.12                      10.24          801
                 GRR1      1.00              22.51                     0.67                      5.99           1992
Suppressor       BTS1      1.00              19.27                     2.29                      19.65          201
in r             IPK1      1.00              5.56                      2.26                      44.81          10
                 NMD2      1.00              2.96                      2.19                      48.51          3
                 SAN1      1.00              2.37                      2.17                      48.70          16
                 LST4      1.00              5.79                      2.14                      44.14          13
                 RPN4      1.00              8.00                      2.12                      40.46          17
                 UPF3      1.00              3.16                      2.07                      45.25          21
Suppressor in    SAN1      1.00              2.37                      2.17                      48.70          16
MDR × MDP        NMD2      1.00              2.96                      2.19                      48.51          3
                 UPF3      1.00              3.16                      2.07                      45.25          21
                 EXO1      1.00              2.89                      2.06                      45.04          1
                 IPK1      1.00              5.56                      2.26                      44.81          10
                 LST4      1.00              5.79                      2.14                      44.14          13
                 NAM7      1.00              3.02                      2.04                      43.00          22
Enhancer         YKU70     1.00              0.01                      1.09                      −23.44         31
in K             STI1      1.00              0.01                      1.20                      −21.60         42
                 RIF1      1.00              0.01                      0.63                      −26.17         36
                 :::MRC1   1.00              0.01                      0.83                      −23.15         35
                 MAK31     1.00              0.02                      1.18                      −18.19         37
                 CLB5      1.00              0.02                      0.87                      −19.54         58
                 MRC1      1.00              0.02                      0.81                      −20.40         63
Enhancer         PAT1      1.00              1.71                      0.28                      −18.30         144
in r             PUF4      1.00              2.00                      0.31                      −21.61         34
                 YKU80     1.00              2.15                      0.33                      −21.68         32
                 RTT103    1.00              2.54                      0.34                      −17.87         153
                 LSM1      0.99              2.13                      0.34                      −16.20         101
                 GIM3      0.99              0.93                      0.35                      −19.70         132
                 INP52     0.96              0.86                      0.36                      −14.50         345
Enhancer in      RIF1      1.00              0.01                      0.63                      −26.17         36
MDR × MDP        LTE1      1.00              0.06                      0.40                      −23.96         48
                 YKU70     1.00              0.01                      1.09                      −23.44         31
                 :::MRC1   1.00              0.01                      0.83                      −23.15         35
                 DPH2      1.00              0.04                      0.56                      −23.11         24
                 EST1      1.00              0.12                      0.46                      −22.20         5
                 MAK10     1.00              0.04                      0.59                      −21.92         44

See http://research.ncl.ac.uk/qfa/HeydariQFABayes/JHM_strip.txt for the full list.

B.4. cdc13-1 27 °C vs ura3∆ 27 °C fitness plots for the joint hierarchical model in terms of carrying capacity and growth rate parameters

[Figure B.5: fitness plot; axis ticks and gene labels are not recoverable from the extracted text.]
Figure B.5: Joint hierarchical model (JHM) carrying capacity fitness plot with orf∆ posterior mean fitnesses. orf∆ strains are classified as being a suppressor or enhancer based on carrying capacity parameter K. Significant interactors have posterior probability ∆ > 0.5. To compare fitness plots, labelled genes are those belonging to the following gene ontology terms in Table 4.1: "telomere maintenance", "ageing", "response to DNA damage stimulus" or "peroxisomal organization", as well as the genes identified as interactions only in K with the JHM (see Figure 4.3) (blue), genes interacting only in r with the JHM (cyan) and the MRX complex genes (pink).
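The classification rule in the caption above (posterior probability of interaction greater than 0.5, then suppressor or enhancer according to the direction of the effect) might be sketched in R as follows. The posterior draws are simulated and the gene names are hypothetical. Summarising interaction strength as the exponential of the posterior mean of δγ, and treating a strength above 1 as suppression, are assumptions made for illustration, the latter suggested by the values in Tables B.2 and B.3. This is not necessarily the computation used to produce the figures.

```r
set.seed(1)

# Hypothetical MCMC output: 1000 posterior draws for three orf deletions.
# delta holds the 0/1 interaction indicators, log_gamma the log-scale effects.
n_draws <- 1000
delta <- cbind(geneA = rbinom(n_draws, 1, 0.98),
               geneB = rbinom(n_draws, 1, 0.97),
               geneC = rbinom(n_draws, 1, 0.05))
log_gamma <- cbind(geneA = rnorm(n_draws, 1.0, 0.1),
                   geneB = rnorm(n_draws, -2.0, 0.1),
                   geneC = rnorm(n_draws, 0.0, 0.1))

classify_interactions <- function(delta, log_gamma, threshold = 0.5) {
  # Posterior probability of interaction: mean of the indicator draws.
  prob <- unname(colMeans(delta))
  # Interaction strength (assumed summary): exp of the posterior mean of
  # the product delta * log(gamma).
  strength <- unname(exp(colMeans(delta * log_gamma)))
  type <- ifelse(prob <= threshold, "none",
                 ifelse(strength > 1, "suppressor", "enhancer"))
  data.frame(gene = colnames(delta), probability = prob,
             strength = strength, type = type,
             stringsAsFactors = FALSE)
}

interactions <- classify_interactions(delta, log_gamma)
```

With these simulated draws, geneA is classified as a suppressor, geneB as an enhancer and geneC as a non-interactor.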
[Figure B.6: fitness plot; axis ticks and gene labels are not recoverable from the extracted text.]

Figure B.6: Joint hierarchical model (JHM) growth rate fitness plot with orf∆ posterior mean fitnesses. orf∆ strains are classified as being a suppressor or enhancer based on growth parameter r. Significant interactors have posterior probability ∆ > 0.5. To compare fitness plots, labelled genes are those belonging to the following gene ontology terms in Table 4.1: "telomere maintenance", "ageing", "response to DNA damage stimulus" or "peroxisomal organization", as well as the genes identified as interactions only in K with the JHM (see Figure 4.3) (blue), genes interacting only in r with the JHM (cyan) and the MRX complex genes (pink).

B.5. Gene ontology term enrichment analysis in R

    # Install the required Bioconductor packages.
    # On current Bioconductor releases, BiocManager::install() replaces biocLite().
    source("http://bioconductor.org/biocLite.R")
    biocLite("GOstats")
    biocLite("org.Sc.sgd.db")
    ###################
    library(GOstats) # GO testing tool package
    library(org.Sc.sgd.db) # yeast gene annotation package
    genes=read.table("JHM_strip.txt", header=T)
    UNIVSTRIP=genes[,2] # all genes in the screen
    genes<-as.vector(genes[genes[,3]>0.5,2]) # genes with probability of interaction > 0.5
    genes<-unique(genes)
    ensemblIDs=as.list(org.Sc.sgdPMID2ORF)
    univ=unlist(ensemblIDs)
    univ=univ[!is.na(univ)]
    length(univ)
    length(unique(univ))
    univ=unique(univ)
    all=as.vector(univ)
    all=all[all%in%UNIVSTRIP] # gene universe restricted to the screen
    length(all)
    ontology=c("BP")
    vec<-genes%in%univ
    genes<-genes[vec]
    params_temp=new("GOHyperGParams", geneIds=genes,
      universeGeneIds=all, annotation="org.Sc.sgd.db", categoryName="GO",
      ontology=ontology, pvalueCutoff=1, testDirection = "over")
    results=hyperGTest(params_temp)
    results=summary(results)
    results$qvalue<-p.adjust(results$Pvalue,method="BH") # Benjamini-Hochberg correction

B.6. Code for Just Another Gibbs Sampler software

B.6.1. Separate hierarchical model code

    model {
      for (l in 1:N){
        for (m in 1:NoORF[l]){
          for (n in 1:NoTime[(NoSum[l]+m)]){
            y[m,n,l] ~ dnorm(y.hat[m,n,l], exp(nu_l[l]))
            # logistic growth curve
            y.hat[m,n,l] <- (K_lm[(NoSum[l]+m)] * P * exp(r_lm[(NoSum[l]+m)] * x[m,n,l]))
              /(K_lm[(NoSum[l]+m)]+P * (exp(r_lm[(NoSum[l]+m)] * x[m,n,l])-1))
          }
          K_lm[(NoSum[l]+m)]<- exp(K_lm_L[(NoSum[l]+m)])
          K_lm_L[(NoSum[l]+m)] ~ dnorm(K_o_l_L[l],exp(tau_K_l[l]))T(,0)
          r_lm[(NoSum[l]+m)]<- exp(r_lm_L[(NoSum[l]+m)])
          r_lm_L[(NoSum[l]+m)] ~ dnorm(r_o_l_L[l],exp(tau_r_l[l]))T(,3.5)
        }
        K_o_l_L[l]<- log(K_o_l[l])
        K_o_l[l] ~ dt( exp(K_p), exp(sigma_K_o),3)T(0,)
        r_o_l_L[l]<- log(r_o_l[l])
        r_o_l[l] ~ dt( exp(r_p), exp(sigma_r_o),3)T(0,)
        nu_l[l] ~ dnorm(nu_p, exp(sigma_nu) )
        tau_K_l[l]~dnorm(tau_K_p,exp(sigma_tau_K))T(0,)
        tau_r_l[l]~dnorm(tau_r_p,exp(sigma_tau_r))
      }
      K_p ~ dnorm(K_mu,eta_K_p)
      r_p ~ dnorm(r_mu,eta_r_p)
      nu_p ~ dnorm(nu_mu,eta_nu_p)
      P<-exp(P_L)
      P_L ~ dnorm(P_mu,eta_P)
      tau_K_p ~ dnorm(tau_K_mu,eta_tau_K_p)
      sigma_tau_K ~ dnorm(eta_tau_K,psi_tau_K)
      tau_r_p ~ dnorm(tau_r_mu,psi_tau_r)
      sigma_tau_r ~ dnorm(eta_tau_r,psi_tau_r)
      sigma_nu~dnorm(eta_nu,psi_nu)
      sigma_K_o ~ dnorm(eta_K_o,psi_K_o)
      sigma_r_o ~ dnorm(eta_r_o,psi_r_o)
    }

B.6.2. Interaction hierarchical model code

    model {
      for (l in 1:N){
        for (c in 1:2){
          for (m in 1:NoORF[l,c]){
            y[m,c,l]~ dnorm(exp(alpha_c[c]
              +delta_l[l,c] * gamma_cl_L[l,c]) * Z_l[l],exp(nu_cl[l+(c-1) * N]))
          }
          nu_cl[l+(c-1) * N]~dnorm(nu_p,exp(sigma_nu))
        }
        Z_l[l]~dt(exp(Z_p),exp(sigma_Z),3)T(0,)
        delta_l[l,1]<-0
        delta_l[l,2]~dbern(p) # genetic interaction indicator
        gamma_cl_L[l,1]<-0
        gamma_cl_L[l,2]<-log(gamma_l[l])
        gamma_l[l]~dt(1,exp(sigma_gamma),3)T(0,)
      }
      alpha_c[1]<-0
      alpha_c[2]~dnorm(alpha_mu,eta_alpha)
      Z_p~dnorm(Z_mu,eta_Z_p)
      nu_p~dnorm(nu_mu,eta_nu_p)
      sigma_Z~dnorm(eta_Z,psi_Z)
      sigma_nu~dnorm(eta_nu,psi_nu_p)
      sigma_gamma~dnorm(eta_gamma,psi_gamma)
    }

B.6.3. Joint hierarchical model code

    model {
      for (l in 1:N){
        for (c in 1:2){
          for (m in 1:NoORF[l,c]){
            for (n in 1:NoTime[NoSum[l,c]+m,c]){
              y[m,n,l,c] ~ dnorm(y.hat[m,n,l,c],exp(nu_cl[l+(c-1) * N]))
              # logistic growth curve
              y.hat[m,n,l,c] <- (K_clm[(SHIFT[c]+NoSum[l,c]+m)]
                * P * exp(r_clm[(SHIFT[c]+NoSum[l,c]+m)] * x[m,n,l,c]))
                /(K_clm[(SHIFT[c]+NoSum[l,c]+m)]+P * (exp(r_clm[(SHIFT[c]+NoSum[l,c]+m)]
                * x[m,n,l,c])-1))
            }
            K_clm[(SHIFT[c]+NoSum[l,c]+m)]<-exp(K_clm_L[(SHIFT[c]+NoSum[l,c]+m)])
            K_clm_L[(SHIFT[c]+NoSum[l,c]+m)] ~ dnorm(alpha_c[c]+K_o_l_L[l]
              +(delta_l[l,c] * gamma_cl_L[l,c]),exp(tau_K_cl[l+(c-1) * N]))T(,0)
            r_clm[(SHIFT[c]+NoSum[l,c]+m)]<-exp(r_clm_L[(SHIFT[c]+NoSum[l,c]+m)])
            r_clm_L[(SHIFT[c]+NoSum[l,c]+m)] ~ dnorm(beta_c[c]+r_o_l_L[l]
              +(delta_l[l,c] * omega_cl_L[l,c]),exp(tau_r_cl[l+(c-1) * N]))T(,3.5)
          }
          tau_K_cl[l+(c-1) * N]~dnorm(tau_K_p_c[c],exp(sigma_tau_K_c[c]))T(0,)
          tau_r_cl[l+(c-1) * N]~dnorm(tau_r_p_c[c],exp(sigma_tau_r_c[c]))
          nu_cl[l+(c-1) * N]~dnorm(nu_p,exp(sigma_nu))
        }
        K_o_l_L[l]<- log(K_o_l[l])
        K_o_l[l] ~ dt(exp(K_p),exp(sigma_K_o),3)T(0,)
        r_o_l_L[l]<- log(r_o_l[l])
        r_o_l[l] ~ dt(exp(r_p),exp(sigma_r_o),3)T(0,)
        delta_l[l,1]<-0
        delta_l[l,2]~dbern(p) # genetic interaction indicator
        gamma_cl_L[l,1]<-0
        gamma_cl_L[l,2]<-log(gamma_l[l])
        gamma_l[l]~dt(1,exp(sigma_gamma),3)T(0,)
        omega_cl_L[l,1]<-0
        omega_cl_L[l,2]<-log(omega_l[l])
        omega_l[l]~dt(1,exp(sigma_omega),3)T(0,)
      }
      alpha_c[1]<-0
      alpha_c[2]~dnorm(alpha_mu,eta_alpha)
      beta_c[1]<-0
      beta_c[2]~dnorm(beta_mu,eta_beta)
      K_p~dnorm(K_mu,eta_K_p)
      r_p~dnorm(r_mu,eta_r_p)
      nu_p~dnorm(nu_mu,eta_nu_p)
      P <- exp(P_L)
      P_L ~dnorm(P_mu,eta_P)
      sigma_K_o~dnorm(eta_K_o,psi_K_o)
      sigma_r_o~dnorm(eta_r_o,psi_r_o)
      tau_K_p_c[1]~dnorm(tau_K_mu,eta_tau_K_p)
      tau_K_p_c[2]~dnorm(tau_K_mu,eta_tau_K_p)
      tau_r_p_c[1]~dnorm(tau_r_mu,eta_tau_r_p)
      tau_r_p_c[2]~dnorm(tau_r_mu,eta_tau_r_p)
      sigma_tau_K_c[1]~dnorm(eta_tau_K,psi_tau_K)
      sigma_tau_K_c[2]~dnorm(eta_tau_K,psi_tau_K)
      sigma_tau_r_c[1]~dnorm(eta_tau_r,psi_tau_r)
      sigma_tau_r_c[2]~dnorm(eta_tau_r,psi_tau_r)
      sigma_nu~dnorm(eta_nu,psi_nu)
      sigma_gamma~dnorm(eta_gamma,psi_gamma)
      sigma_omega~dnorm(eta_omega,psi_omega)
    }

B.7. Additional cdc13-1 27 °C vs ura3∆ 27 °C fitness plots

Figure B.7: Alternative non-Bayesian, hierarchical fitness plot, from fitting the random effects model (REM) to data in Table S6 from Addinall et al. (2011) (F = MDR × MDP). orf∆s with significant evidence of interaction are highlighted in red and green for suppressors and enhancers respectively. orf∆s without significant evidence of interaction are in grey and have no orf name label. Significant interactors are classified as those with FDR corrected p-values < 0.05.
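The FDR threshold referred to in the Figure B.7 caption can be illustrated with base R. This is a minimal sketch that assumes the Benjamini-Hochberg procedure, the same method = "BH" option passed to p.adjust in the gene ontology code of Section B.5. The p-values and gene names are hypothetical.

```r
# Hypothetical raw p-values for five orf deletions.
p_raw <- c(geneA = 0.001, geneB = 0.012, geneC = 0.020,
           geneD = 0.300, geneE = 0.800)

# Benjamini-Hochberg adjusted p-values (q-values).
q_values <- p.adjust(p_raw, method = "BH")

# Significant interactors: FDR corrected p-value below 0.05.
significant <- names(q_values)[q_values < 0.05]
```

Here geneA, geneB and geneC fall below the 0.05 threshold after correction, while geneD and geneE do not.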
[Figure B.8: fitness plot; axis ticks and gene labels are not recoverable from the extracted text.]

Figure B.8: Alternative interaction hierarchical model (IHM) fitness plot with orf∆ posterior mean fitness. orf∆s with significant evidence of interaction are highlighted on the plot as red and green for suppressors and enhancers respectively (F = MDR × MDP). Solid and dashed grey fitted lines are for the IHM linear model fit. orf∆s with significant evidence of interaction are highlighted in red and green for suppressors and enhancers respectively. orf∆s without significant evidence of interaction are in grey and have no orf name label. Significant interactors have posterior probability ∆ > 0.5.

[Fitness plot: axis ticks and gene labels are not recoverable from the extracted text.]
YKL121W ACE2 BRE5 LSM6 HSP12 YGL057C O YE2 ERG24 MIS1 YDR029W FMP35 NPR2 SPT2 MTC3 Y AP1801 PTH1 YLL029W MDL1 ALG6 DIA2 POC4 BNA2 LEU3 TOP1 YPR050C YDR537C CPS1 YKU80 HSP82 HAP2 A TP10 BCH2 DSK2 RPS24A ERJ5 MLH1 :::GAL11 YGL042C RSM25 ARG5 RTG2 GPH1 LRG1 PEX14 GOS1 VPS30 HSE1 ICE2 P AP2 TMA16 NRG2 IOC3 VPS53 DCW1 BUB3 Y OR131C YNL109W TLG2 MMS2 LAC1 YPL102C DCC1 VTS1 SRB2 HCR1 MAK3 GRS1 TOM70 DEG1 YDR049W :::GUK1 NMD4 RGA1 L YP1 YPR039W CYC2 PUF3 FIT2 FYV1 CYM1 RTC4 APT1 KEX2 KRE11 ARL3 YML108W YDR348C AHA1 YBR232C LHP1 TGS1 WHI5 AZF1 APC9 SSA1 TCM62 MID2 MNN11 URE2 OCA1 P AN2 TMA19 RHO5 IES2 ELP4 TPM1 P AN3 YML102C−A AKR2 MVB12 MIR1 UBC13 MGA2 PKH3 CHL1 Y AL004W PEX6 Y AP1 Y OR008C−A RPL41B PEX10 FMP36 YML053C MET16 ASH1 YLR217W SIF2 INP52 SLX9 MFB1 LDB18 PHB1 YPS7 IRC3 VPS27 ISU2 YBR028C SNF1 YBR285W MKS1 ARF1 NBA1 HIS6 YNL170W PHO88 BUB1 GIM3 APL1 SIR3 NFI1 YDR467C APL4 RNH203 ALG8 QCR6 YPL041C ARL1 NCS2 LRE1 RSM28 PTP3 BF A1 IBD2 RO T2 TOM6 MEP3 VPS75 CO X8 GIM4 YNL198C DEP1 CCS1 CKA1 FIS1 GDS1 ASM4 PEX12 SER1 TPS1 Y OR082C TOS1 LEO1 MAL12 XRS2 MSC6 RAX2 VPS29 SKG3 YBL071C RAM1 ABP1 YBR224W ZRT1 GIS4 YDR506C BUD19 OCA2 SIR4 YNL266W LAS21 CUE3 SNC2 GDH2 CIN8 CLA4 NIF3 RIT1 MSH2 CPR7 MRPS9 YPR098C CWH43 IOC4 PEX5 YNL105W YML090W YIL166C IL V6 YNL171C SHE1 ADE1 SYC1 GAS1 SOH1 YHL005C MET3 DIP5 PRS5 RPL19B GGA2 LEM3 RTC6 APE2 YLR118C EGD2 YBR134W SKI3 CHO2 BUD13 SWA2 RPP2A Y AF9 YER077C YJL215C MED1 HIS1 UME1 SRL1 YNL296W PET127 YPL035C PIN4 KES1 VIK1 OST5 RNH201 OPI10 TOF1 YLR282C Y AL058C−A YPR097W DCR2 CO X5B YMR144W YGL235W OSH3 RPS27A YLR334C SWC5 BEM3 JSN1 PEX4 IXR1 THR4 BNI1 CHS3 CWH41 CO X5A BEM1 YPR096C RCE1 RBL2 YBR246W AHP1 RGS2 THR1 STE50 YPL225W PSR2 TSA1 BUD21 IRC15 RPS30B RPL19A RAD27 Y OR022C YLR404W YCR051W P AM17 Y OL079W MRM2 KAR9 YKL158W YPR084W FYV12 YGR242W YEL007W SIT1 PET494 APP1 INP53 YDR149C ERF2 YPR004C YSA1 RTG1 FMP45 GTR2 YGL046W ELP6 COG7 MLH2 YCF1 YHL044W SPO14 YBR099C REX4 DHH1 YBR238C EAF6 URM1 UBA3 A TG27 APS2 PMT2 YML119W SRF6 
CTI6 CTF18 RTS1 SYF2 MMS1 BUG1 SRO7 AU A1 ITR1 HOS2 CSF1 GIS3 YIL064W NDL1 RP A14 IOC2 ELF1 VTC4 SBA1 FLX1 WWM1 APL3 MMR1 MRP49 YER078C FSF1 YME1 COG8 PRX1 STB1 ILM1 IRC8 MGR1 YIA6 INP2 INO2 MAD1 TFP1 HMO1 SLM1 YGR226C KEL1 NPL4 JID1 YPR148C ERV14 HMX1 RPS14A ALB1 YMC2 YER066W FMP25 AD Y4 KIP2 PEX25 PIH1 VPS13 SKT5 PEX3 SPO11 YPT32 FRE1 YPR090W MNN2 YKL077W SYT1 MP A43 IST1 ENT2 CHZ1 PEX30 YGR068C CGR1 SNF11 ASR1 HOM2 PHO86 RTT102 YPL183C SHE9 PEX1 Figure B.9: Alternati v e joint hierarchical model (JHM) ﬁtness plot with orf ∆ posterior mean ﬁtnesses. The JHM does not does not make use of a ﬁtness measure such as M DR × M D P but the ﬁtness plot is given in terms of M D R × M D P for comparison with other approaches which do. orf ∆ strains are classiﬁed as being a suppressor or enhancer based on one of the two parameters used to classify genetic interaction, growth parameter r , this means occasionally strains can be more ﬁt in the query experiment in terms of M D R × M DP but be classiﬁed as enhancers (green). orf ∆ s with signiﬁcant evidence of interaction are highlighted in red and green for suppressors and enhancers respectiv ely . orf ∆ s without signiﬁcant evidence of interaction are in grey and ha v e no orf name label. Signiﬁcant interactors ha ve posterior probability ∆ > 0 . 5 . 129 Appendix B. 
Figure B.10: Joint hierarchical model (JHM) carrying capacity fitness plot with orf∆ posterior mean fitnesses. orf∆ strains are classified as being a suppressor or enhancer based on carrying capacity parameter K. orf∆s with significant evidence of interaction are highlighted in red and green for suppressors and enhancers respectively. orf∆s without significant evidence of interaction are in grey and have no orf name label. Significant interactors have posterior probability ∆ > 0.5.
Figure B.11: Joint hierarchical model (JHM) growth rate fitness plot with orf∆ posterior mean fitnesses. orf∆ strains are classified as being a suppressor or enhancer based on growth parameter r. orf∆s with significant evidence of interaction are highlighted in red and green for suppressors and enhancers respectively. orf∆s without significant evidence of interaction are in grey and have no orf name label. Significant interactors have posterior probability ∆ > 0.5.

B.8. Correlation between methods

The Addinall et al. (2011) approach has its highest correlation with the IHM, followed by the JHM and then the REM. The REM correlates least well with the JHM while showing the same correlation with both the Addinall et al. (2011) approach and the IHM. The correlation between the IHM and the JHM is the largest observed between any of the methods, demonstrating the similarity of our Bayesian hierarchical methods.

Table B.4: Spearman's rank correlation coefficients for magnitudes from genetic independence, between Addinall et al. (2011), random effects approach (REM), interaction hierarchical model (IHM) and joint hierarchical model (JHM) approaches.

Method                        Addinall et al. (2011) QFA   REM QFA   IHM QFA   JHM QFA (MDR × MDP)
Addinall et al. (2011) QFA    1                            0.77      0.89      0.88
REM QFA                                                    1         0.77      0.75
IHM QFA                                                              1         0.95
JHM QFA (MDR × MDP)                                                            1

The MDR × MDP correlation plot of the JHM versus the Addinall et al. (2011) approach demonstrates the similarity (Pearson correlation=0.90) and differences between the two approaches in terms of MDR × MDP. We can see how the results differ between the JHM and Addinall et al. (2011), with a kink at the origin due to the JHM allowing shrinkage of non-interacting genes towards the fitted line.
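For readers who want to reproduce rank correlations of the kind reported in Table B.4, the following Python sketch shows one way to compute Spearman's rank correlation as the Pearson correlation of average ranks. The function names and the toy input values are hypothetical illustrations; they are not the interaction estimates used in this thesis.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical interaction magnitudes for four strains under two methods.
method_a = [0.1, 0.4, 0.2, 0.9]
method_b = [1.0, 3.0, 2.0, 10.0]
rho = spearman(method_a, method_b)
```

Because only ranks enter the calculation, the coefficient is unaffected by the different scales on which each method reports interaction strength.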
Figure B.12: MDR × MDP genetic interaction correlation plot of JHM versus Addinall et al. (2011) (Pearson correlation=0.90). Axes: Addinall (2011) MDR × MDP GIS and JHM MDR × MDP GIS.

Appendix C. Stochastic logistic growth modelling

C.1. Linear noise approximation of the stochastic logistic growth model with multiplicative intrinsic noise solution

First we look to solve $dZ_t$, given in equation (5.10). We define $f(t) = -be^{v_t} = -\frac{baPe^{aT}}{bP(e^{aT}-1)+a}$ to obtain the following,
\[ dZ_t = f(t) Z_t\,dt + \sigma\,dW_t. \]
In order to match our initial conditions correctly, $Z_0 = 0$. Define a new process $U_t = e^{-\int_{t_0}^{t} f(s)\,ds} Z_t$ and solve the integral,
\[ \int_{t_0}^{t} f(s)\,ds = \int_{t_0}^{t} -\frac{baPe^{aS}}{bP(e^{aS}-1)+a}\,ds = \log\left(\frac{a}{bP(e^{aT}-1)+a}\right), \]
where $S = s - t_0$ and $T = t - t_0$. Apply the chain rule to $U_t$,
\[ dU_t = e^{-\int_{t_0}^{t} f(s)\,ds}\,dZ_t - f(t)\,e^{-\int_{t_0}^{t} f(s)\,ds} Z_t\,dt. \]
Now substitute in $dZ_t = f(t) Z_t\,dt + \sigma\,dW_t$ and simplify to give
\[ dU_t = e^{-\int_{t_0}^{t} f(s)\,ds}\,\sigma\,dW_t. \]
Apply the following notation $\phi(t) = e^{\int_{t_0}^{t} f(s)\,ds} = \frac{a}{bP(e^{aT}-1)+a}$ and $\psi(t) = \sigma$ to give
\[ dU_t = \phi(t)^{-1}\psi(t)\,dW_t. \]
$U_t$ has the following solution,
\[ U_t = U_0 + \int_{t_0}^{t} \phi(s)^{-1}\psi(s)\,dW_s. \]
As $U_t = \phi(t)^{-1} Z_t$, $Z_t$ then has the following solution (Arnold, 2013),
\[ Z_t = \phi(t)\left[ Z_0 + \int_{t_0}^{t} \phi(s)^{-1}\psi(s)\,dW_s \right]. \]
Finally, the distribution at time $t$ is $Z_t \mid Z_0 \sim N(M_t, E_t)$ (Arnold, 2013), where $M_t = \phi(t) Z_0$ and $E_t = \phi(t)^2 \int_{t_0}^{t} \left[\phi(s)^{-1}\psi(s)\right]^2 ds$. Further,
\[ M_t = \frac{a}{bP(e^{aT}-1)+a}\,Z_0 \]
and
\[ E_t = \sigma^2 \left[\frac{a}{bP(e^{aT}-1)+a}\right]^2 \int_{t_0}^{t} \left[\frac{a}{bP(e^{aS}-1)+a}\right]^{-2} ds. \]
As
\[ \int_{t_0}^{t} \left[\frac{a}{bP(e^{aS}-1)+a}\right]^{-2} ds = \frac{b^2P^2(e^{2aT}-1) + 4bP(a-bP)(e^{aT}-1) + 2aT(a-bP)^2}{2a^3}, \]
\[ E_t = \sigma^2 \left[\frac{a}{bP(e^{aT}-1)+a}\right]^2 \left[\frac{b^2P^2(e^{2aT}-1) + 4bP(a-bP)(e^{aT}-1) + 2aT(a-bP)^2}{2a^3}\right] = \sigma^2 \left[\frac{b^2P^2(e^{2aT}-1) + 4bP(a-bP)(e^{aT}-1) + 2aT(a-bP)^2}{2a\left(bP(e^{aT}-1)+a\right)^2}\right]. \]
Taking our solutions for $v_t$ (5.8) and $Z_t$, we can now write our solution for the LNA to the log of the logistic growth process (5.6). As $Y_t = v_t + Z_t$,
\[ Y_t \mid Y_0 \sim N\left( \log\left[\frac{aPe^{aT}}{bP(e^{aT}-1)+a}\right] + M_t,\; E_t \right). \]
Note: $\frac{aPe^{aT}}{bP(e^{aT}-1)+a}$ has the same functional form as the solution to the deterministic part of the logistic growth process (5.1) and is equivalent when $\sigma = 0$ (such that $a = r - \frac{\sigma^2}{2} = r$). Further, as $Y_t$ is normally distributed, we know $X_t = e^{Y_t}$ will be log normally distributed and
\[ X_t \mid X_0 \sim \log N\left( \log\left[\frac{aPe^{aT}}{bP(e^{aT}-1)+a}\right] + M_t,\; E_t \right). \]
Alternatively set $Q = \left(\frac{a/b}{P} - 1\right) e^{at_0}$,
\[ X_t \mid X_0 \sim \log N\left( \log\left[\frac{a/b}{1 + Qe^{-at}}\right] + M_t,\; E_t \right). \]
From our solution to the log process we can obtain the following transition density
\[ (Y_{t_i} \mid Y_{t_{i-1}} = y_{t_{i-1}}) \sim N(\mu_{t_i}, \Xi_{t_i}), \]
where $y_{t_{i-1}} = v_{t_{i-1}} + z_{t_{i-1}}$, $Q = \left(\frac{a/b}{P} - 1\right) e^{at_0}$,
\[ \mu_{t_i} = y_{t_{i-1}} + \log\left(\frac{1 + Qe^{-at_{i-1}}}{1 + Qe^{-at_i}}\right) + e^{-a(t_i - t_{i-1})}\,\frac{1 + Qe^{-at_{i-1}}}{1 + Qe^{-at_i}}\,z_{t_{i-1}} \]
and
\[ \Xi_{t_i} = \sigma^2 \left[\frac{4Q(e^{at_i} - e^{at_{i-1}}) + e^{2at_i} - e^{2at_{i-1}} + 2aQ^2(t_i - t_{i-1})}{2a\left(Q + e^{at_i}\right)^2}\right]. \]

C.2. Zero-order noise approximation of the stochastic logistic growth model

After obtaining (5.7) in Section 5.3, we can derive the RRTR logistic growth diffusion process as follows. First our expression for $dv_t$, given in (5.7), is approximated by setting $\sigma^2 = 0$,
\[ dv_t = \left( r - \frac{1}{2}\sigma^2 - \frac{r}{K} e^{v_t} \right) dt = \left( r - \frac{r}{K} e^{v_t} \right) dt. \]
We now write down an expression for $dZ_t$, where $dY_t$ is given in (5.6) and $dZ_t = dY_t - dv_t$,
\[ dZ_t = \left( r - \frac{1}{2}\sigma^2 - \frac{r}{K} e^{Y_t} \right) dt + \sigma\,dW_t - \left( r - \frac{r}{K} e^{v_t} \right) dt. \]
We can then rearrange and simplify to give the following,
\[ dZ_t = \left( \frac{r}{K}\left[ e^{v_t} - e^{Y_t} \right] - \frac{1}{2}\sigma^2 \right) dt + \sigma\,dW_t. \]
We now substitute in $Y_t = v_t + Z_t$,
\[ dZ_t = \left( \frac{r}{K}\left[ e^{v_t} - e^{v_t + Z_t} \right] - \frac{1}{2}\sigma^2 \right) dt + \sigma\,dW_t. \]
We now apply a zero-order LNA by setting $e^{Z_t} = 1$ to obtain,
\[ dZ_t = \left( \frac{r}{K}\left[ e^{v_t} - e^{v_t} \right] - \frac{1}{2}\sigma^2 \right) dt + \sigma\,dW_t. \]
We can then simplify to give the following,
\[ dZ_t = -\frac{1}{2}\sigma^2\,dt + \sigma\,dW_t. \tag{C.1} \]
Differentiating $v_t$, given in (5.8), with respect to $t$ we can obtain an alternative expression for $dv_t$,
\[ dv_t = \frac{a(a - bP)}{bP(e^{aT}-1)+a}\,dt = \frac{r(K - P)}{K + P(e^{rT}-1)}\,dt, \tag{C.2} \]
where $T = t - t_0$ and the second equality uses $a = r$ and $b = r/K$, which hold once $\sigma^2 = 0$ has been set. We now write down our new expression for $Y_t$, where $dY_t = dv_t + dZ_t$, given (C.2) and (C.1),
\[ dY_t = \left( \frac{r(K - P)}{K + P(e^{rT}-1)} - \frac{1}{2}\sigma^2 \right) dt + \sigma\,dW_t \]
or alternatively, by setting $Q = \left(\frac{K}{P} - 1\right) e^{rt_0}$,
\[ dY_t = \left( \frac{Qr}{e^{rt} + Q} - \frac{1}{2}\sigma^2 \right) dt + \sigma\,dW_t. \]
We can then apply Itô's lemma (5.5) (Itô, 1944) with the transformation $f(t, Y_t) \equiv X_t = e^{Y_t}$. After deriving the following partial derivatives: $\frac{df}{dt} = 0$, $\frac{df}{dx} = e^{Y_t}$ and $\frac{d^2f}{dx^2} = e^{Y_t}$, we can obtain the following Itô drift-diffusion process:
\[ dX_t = \frac{Qr}{e^{rt} + Q}\,X_t\,dt + \sigma X_t\,dW_t, \]
where the $-\frac{1}{2}\sigma^2$ drift term cancels against the second-order Itô correction $\frac{1}{2}\sigma^2 X_t\,dt$. This is exactly the RRTR logistic diffusion process presented by Román-Román & Torres-Ruiz (2012).

C.3. Linear noise approximation of the stochastic logistic growth model with additive intrinsic noise solution

First we look to solve $dZ_t$, given in (5.14). We define $f(t) = a - 2bv_t$ to obtain the following,
\[ dZ_t = f(t) Z_t\,dt + \sigma v_t\,dW_t. \]
In order to match our initial conditions correctly, $Z_0 = 0$.
Define a new process $U_t = e^{-\int_{t_0}^{t} f(s)\,ds} Z_t$ and solve the integral,
\[ \int_{t_0}^{t} f(s)\,ds = \int_{t_0}^{t} (a - 2bv_s)\,ds = aT - 2\log\left(\frac{bP(e^{aT}-1)+a}{a}\right), \]
as $\int_{t_0}^{t} v_s\,ds = \frac{1}{b}\log\left(\frac{bP(e^{aT}-1)+a}{a}\right)$, where $S = s - t_0$ and $T = t - t_0$. Apply the chain rule to $U_t$,
\[ dU_t = e^{-\int_{t_0}^{t} f(s)\,ds}\,dZ_t - f(t)\,e^{-\int_{t_0}^{t} f(s)\,ds} Z_t\,dt. \]
Now substitute in $dZ_t = f(t) Z_t\,dt + \sigma v_t\,dW_t$ and simplify to give,
\[ dU_t = e^{-\int_{t_0}^{t} f(s)\,ds}\,\sigma v_t\,dW_t. \]
Apply the following notation $\phi(t) = e^{\int_{t_0}^{t} f(s)\,ds} = e^{aT}\left(\frac{a}{bP(e^{aT}-1)+a}\right)^2$ and $\psi(t) = \sigma v_t$ to give,
\[ dU_t = \phi(t)^{-1}\psi(t)\,dW_t. \]
$U_t$ has the following solution,
\[ U_t = U_0 + \int_{t_0}^{t} \phi(s)^{-1}\psi(s)\,dW_s. \]
As $U_t = \phi(t)^{-1} Z_t$, $Z_t$ has the following solution (Arnold, 2013),
\[ Z_t = \phi(t)\left[ Z_0 + \int_{t_0}^{t} \phi(s)^{-1}\psi(s)\,dW_s \right]. \]
Finally, the distribution at time $t$ is $Z_t \mid Z_0 \sim N(M_t, E_t)$ (Arnold, 2013), where $M_t = \phi(t) Z_0$ and $E_t = \phi(t)^2 \int_{t_0}^{t} \left[\phi(s)^{-1}\psi(s)\right]^2 ds$.
\[ M_t = e^{aT}\left(\frac{a}{bP(e^{aT}-1)+a}\right)^2 Z_0 \]
and
\[
\begin{aligned}
E_t &= \left( e^{aT}\left(\frac{a}{bP(e^{aT}-1)+a}\right)^2 \right)^2 \int_{t_0}^{t} \left[ e^{aS}\left(\frac{a}{bP(e^{aS}-1)+a}\right)^2 \right]^{-2} \sigma^2 v_s^2\,ds \\
&= \sigma^2 \left( e^{aT}\left(\frac{a}{bP(e^{aT}-1)+a}\right)^2 \right)^2 \int_{t_0}^{t} \left[ e^{aS}\left(\frac{a}{bP(e^{aS}-1)+a}\right)^2 \right]^{-2} \left(\frac{aPe^{aS}}{bP(e^{aS}-1)+a}\right)^2 ds \\
&= \sigma^2 \left( e^{aT}\left(\frac{a}{bP(e^{aT}-1)+a}\right)^2 \right)^2 \int_{t_0}^{t} \left[ e^{-2aS}\left(\frac{a}{bP(e^{aS}-1)+a}\right)^{-4} \right] \left(\frac{aPe^{aS}}{bP(e^{aS}-1)+a}\right)^2 ds \\
&= \sigma^2 \left( e^{aT}\left(\frac{1}{bP(e^{aT}-1)+a}\right)^2 \right)^2 \int_{t_0}^{t} \left[ a^2P^2\left(\frac{1}{bP(e^{aS}-1)+a}\right)^{-2} \right] ds,
\end{aligned}
\]
as
\[ \int_{t_0}^{t} \left(\frac{1}{bP(e^{aS}-1)+a}\right)^{-2} ds = \frac{b^2P^2(e^{2aT}-1) + 4bP(a-bP)(e^{aT}-1) + 2aT(a-bP)^2}{2a}, \]
\[ E_t = \frac{1}{2}\sigma^2 aP^2 e^{2aT}\left(\frac{1}{bP(e^{aT}-1)+a}\right)^4 \times \left[ b^2P^2(e^{2aT}-1) + 4bP(a-bP)(e^{aT}-1) + 2aT(a-bP)^2 \right]. \]
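As a sanity check on the closed-form integral used for $E_t$ above, the following Python sketch compares it with a composite Simpson approximation and then evaluates $E_t$. It is a minimal illustration assuming $t_0 = 0$; the function names and the parameter values ($a = 3$, $b = 30$, $P = 0.0001$, $\sigma = 0.1$) are hypothetical choices made for the example only.

```python
import math

def integrand(s, a, b, P):
    """(bP(e^{aS} - 1) + a)^2, the integrand appearing in E_t (with t0 = 0)."""
    return (b * P * (math.exp(a * s) - 1.0) + a) ** 2

def closed_form_integral(a, b, P, T):
    """Closed form of the integral of the integrand over [0, T]."""
    bP = b * P
    return (bP ** 2 * (math.exp(2 * a * T) - 1.0)
            + 4 * bP * (a - bP) * (math.exp(a * T) - 1.0)
            + 2 * a * T * (a - bP) ** 2) / (2 * a)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def lnaa_variance(a, b, P, sigma, T):
    """E_t for the additive-noise LNA, written as sigma^2 a^2 P^2 e^{2aT} / D^4 times the integral."""
    D = b * P * (math.exp(a * T) - 1.0) + a
    return sigma ** 2 * a ** 2 * P ** 2 * math.exp(2 * a * T) / D ** 4 \
        * closed_form_integral(a, b, P, T)

# Hypothetical parameter values, chosen only to exercise the formulas.
a, b, P, sigma, T = 3.0, 30.0, 1e-4, 0.1, 2.0
numeric = simpson(lambda s: integrand(s, a, b, P), 0.0, T)
analytic = closed_form_integral(a, b, P, T)
```

Agreement between `numeric` and `analytic` supports the expansion of the squared denominator term by term; `lnaa_variance` is algebraically the same as the final expression for $E_t$, since the factor $a^2/(2a)$ reduces to $a/2$.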
Taking our solutions for $v_t$ (5.13) and $Z_t$, we can obtain the following transition density
\[ (X_{t_i} \mid X_{t_{i-1}} = x_{t_{i-1}}) \sim N(\mu_{t_i}, \Xi_{t_i}), \]
where $x_{t_{i-1}} = v_{t_{i-1}} + z_{t_{i-1}}$,
\[ \mu_{t_i} = x_{t_{i-1}} + \left(\frac{aPe^{aT_i}}{bP(e^{aT_i}-1)+a}\right) - \left(\frac{aPe^{aT_{i-1}}}{bP(e^{aT_{i-1}}-1)+a}\right) + e^{a(t_i - t_{i-1})}\left(\frac{bP(e^{aT_{i-1}}-1)+a}{bP(e^{aT_i}-1)+a}\right)^2 z_{t_{i-1}} \]
and
\[ \Xi_{t_i} = \frac{1}{2}\sigma^2 aP^2 e^{2aT_i}\left(\frac{1}{bP(e^{aT_i}-1)+a}\right)^4 \times \left[ b^2P^2(e^{2aT_i} - e^{2aT_{i-1}}) + 4bP(a-bP)(e^{aT_i} - e^{aT_{i-1}}) + 2a(t_i - t_{i-1})(a-bP)^2 \right]. \]

C.4. Prior hyper-parameters for Bayesian state space models

Table C.1: Prior hyper-parameters for Bayesian state space models, Log-normal with mean ($\mu$) and precision ($\tau$)

Parameter Name    Value
$\mu_K$           log(0.1)
$\tau_K$          2
$\mu_r$           log(3)
$\tau_r$          5
$\mu_P$           log(0.0001)
$\tau_P$          0.1
$\mu_\sigma$      log(100)
$\tau_\sigma$     0.1
$\mu_\nu$         log(10000)
$\tau_\nu$        0.1

C.5. Kalman filter for the linear noise approximation of the stochastic logistic growth model with additive intrinsic noise and Normal measurement error

To find $\pi(y_{t_{1:N}})$ for the LNAA with Normal measurement error we can use the following Kalman Filter algorithm. First we assume the following:
\[ \theta_{t_i} \mid y_{1:t_i} \sim N(m_{t_i}, C_{t_i}), \]
\[ m_{t_i} = a_{t_i} + R_{t_i}F\left(F^T R_{t_i} F + U\right)^{-1}\left[ y_{t_i} - F^T a_{t_i} \right], \]
\[ C_{t_i} = R_{t_i} - R_{t_i}F\left(F^T R_{t_i} F + U\right)^{-1}F^T R_{t_i} \]
and initialize with $m_0 = P$ and $C_0 = 0$. Now suppose that,
\[ \theta_{t_i} \mid y_{1:t_{i-1}} \sim N(a_{t_i}, R_{t_i}), \]
\[ a_{t_i} = G_{t_i} m_{t_{i-1}} \quad \text{and} \quad R_{t_i} = G_{t_i} C_{t_{i-1}} G_{t_i}^T + W_{t_i}. \]
The transition density distribution, see (5.15), is as follows:
\[ \theta_{t_i} \mid \theta_{t_{i-1}} \sim N(G_{t_i}\theta_{t_{i-1}}, W_{t_i}) \]
or equivalently $(X_{t_i} \mid X_{t_{i-1}} = x_{t_{i-1}}) \sim N(\mu_{t_i}, \Xi_{t_i})$, where $x_{t_{i-1}} = v_{t_{i-1}} + z_{t_{i-1}}$,
\[ \theta_{t_i} = \begin{pmatrix} 1 \\ X_{t_i} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ H_{\alpha,t_i} & H_{\beta,t_i} \end{pmatrix} \begin{pmatrix} 1 \\ X_{t_{i-1}} \end{pmatrix} = G_{t_i}\theta_{t_{i-1}}, \]
\[ G_{t_i} = \begin{pmatrix} 1 & 0 \\ H_{\alpha,t_i} & H_{\beta,t_i} \end{pmatrix}, \quad W_{t_i} = \begin{pmatrix} 0 & 0 \\ 0 & \Xi_{t_i} \end{pmatrix}, \]
where
\[ H_{\alpha,t_i} = H_\alpha(t_i, t_{i-1}) = v_{t_i} - v_{t_{i-1}}\,e^{a(t_i - t_{i-1})}\left(\frac{bP(e^{aT_{i-1}}-1)+a}{bP(e^{aT_i}-1)+a}\right)^2 \]
and
\[ H_{\beta,t_i} = H_\beta(t_i, t_{i-1}) = e^{a(t_i - t_{i-1})}\left(\frac{bP(e^{aT_{i-1}}-1)+a}{bP(e^{aT_i}-1)+a}\right)^2. \]
The measurement error distribution is as follows:
\[ y_{t_i} \mid \theta_{t_i} \sim N(F^T\theta_{t_i}, U) \]
or equivalently $y_{t_i} \mid \theta_{t_i} \sim N(X_{t_i}, \sigma_\nu^2)$, where $F = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and $U = \sigma_\nu^2$.

Matrix Algebra:
\[ a_{t_i} = G_{t_i} m_{t_{i-1}} = \begin{pmatrix} 1 & 0 \\ H_{\alpha,t_i} & H_{\beta,t_i} \end{pmatrix} \begin{pmatrix} 1 \\ m_{t_{i-1}} \end{pmatrix} = \begin{pmatrix} 1 \\ H_{\alpha,t_i} + H_{\beta,t_i} m_{t_{i-1}} \end{pmatrix} \]
\[ R_{t_i} = G_{t_i} C_{t_{i-1}} G_{t_i}^T + W_{t_i} = \begin{pmatrix} 0 & 0 \\ 0 & H_{\beta,t_i}^2 c_{t_{i-1}}^2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \Xi_{t_i} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \end{pmatrix}, \quad C_{t_{i-1}} = \begin{pmatrix} 0 & 0 \\ 0 & c_{t_{i-1}}^2 \end{pmatrix} \]
\[
\begin{aligned}
R_{t_i}F\left(F^T R_{t_i} F + U\right)^{-1} &= \begin{pmatrix} 0 & 0 \\ 0 & H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} \times \left[ \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \sigma_\nu^2 \right]^{-1} \\
&= \left[ H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} + \sigma_\nu^2 \right]^{-1} \begin{pmatrix} 0 \\ H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \end{pmatrix}
\end{aligned}
\]
\[
\begin{aligned}
m_{t_i} &= a_{t_i} + R_{t_i}F\left(F^T R_{t_i} F + U\right)^{-1}\left[ y_{t_i} - F^T a_{t_i} \right] \\
&= \begin{pmatrix} 1 \\ H_{\alpha,t_i} + H_{\beta,t_i} m_{t_{i-1}} \end{pmatrix} + \left[ H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} + \sigma_\nu^2 \right]^{-1} \begin{pmatrix} 0 \\ H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \end{pmatrix} \left[ y_{t_i} - \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ H_{\alpha,t_i} + H_{\beta,t_i} m_{t_{i-1}} \end{pmatrix} \right] \\
&= \begin{pmatrix} 1 \\ H_{\alpha,t_i} + H_{\beta,t_i} m_{t_{i-1}} + \dfrac{H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i}}{H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} + \sigma_\nu^2}\left[ y_{t_i} - H_{\alpha,t_i} - H_{\beta,t_i} m_{t_{i-1}} \right] \end{pmatrix}
\end{aligned}
\]
\[
\begin{aligned}
C_{t_i} &= R_{t_i} - R_{t_i}F\left(F^T R_{t_i} F + U\right)^{-1}F^T R_{t_i} \\
&= \begin{pmatrix} 0 & 0 \\ 0 & H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \end{pmatrix} - \left[ H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} + \sigma_\nu^2 \right]^{-1} \begin{pmatrix} 0 \\ H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \end{pmatrix} \left[ \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \end{pmatrix} \right] \\
&= \begin{pmatrix} 0 & 0 \\ 0 & H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} - \dfrac{\left( H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} \right)^2}{H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} + \sigma_\nu^2} \end{pmatrix}
\end{aligned}
\]
With $m_{t_i}$ and $C_{t_i}$ for $i = 1:N$, we can evaluate $a_{t_i}$, $R_{t_i}$ and $\pi(x_{t_i} \mid y_{t_{1:(i-1)}})$ for $i = 1:N$.
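The scalar recursions that the matrix algebra above reduces to can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used for the results in this thesis: the function names are hypothetical, observation times are assumed to be increasing and no earlier than $t_0$, and the log-likelihood is accumulated from the Gaussian one-step predictive density of each observation, whose mean is $H_{\alpha,t_i} + H_{\beta,t_i} m_{t_{i-1}}$ and whose variance is $H_{\beta,t_i}^2 c_{t_{i-1}}^2 + \Xi_{t_i} + \sigma_\nu^2$.

```python
import math

def logistic_mean(t, a, b, P, t0=0.0):
    """Deterministic solution v_t = a P e^{aT} / (b P (e^{aT} - 1) + a), with T = t - t0."""
    T = t - t0
    return a * P * math.exp(a * T) / (b * P * (math.exp(a * T) - 1.0) + a)

def lnaa_step(t_prev, t_cur, a, b, P, sigma, t0=0.0):
    """Return (H_alpha, H_beta, Xi) for the transition from t_prev to t_cur."""
    Tp, Tc = t_prev - t0, t_cur - t0
    d_prev = b * P * (math.exp(a * Tp) - 1.0) + a
    d_cur = b * P * (math.exp(a * Tc) - 1.0) + a
    H_beta = math.exp(a * (t_cur - t_prev)) * (d_prev / d_cur) ** 2
    H_alpha = logistic_mean(t_cur, a, b, P, t0) - logistic_mean(t_prev, a, b, P, t0) * H_beta
    bP = b * P
    bracket = (bP ** 2 * (math.exp(2 * a * Tc) - math.exp(2 * a * Tp))
               + 4 * bP * (a - bP) * (math.exp(a * Tc) - math.exp(a * Tp))
               + 2 * a * (t_cur - t_prev) * (a - bP) ** 2)
    Xi = 0.5 * sigma ** 2 * a * P ** 2 * math.exp(2 * a * Tc) / d_cur ** 4 * bracket
    return H_alpha, H_beta, Xi

def lnaa_kalman_loglik(times, ys, a, b, P, sigma, sigma_nu, t0=0.0):
    """Run the scalar Kalman recursions; return (log-likelihood, final m, final c^2)."""
    m, c2 = P, 0.0            # initialisation m_0 = P, C_0 = 0
    loglik, t_prev = 0.0, t0
    for t, y in zip(times, ys):
        H_alpha, H_beta, Xi = lnaa_step(t_prev, t, a, b, P, sigma, t0)
        pred_mean = H_alpha + H_beta * m      # second component of a_{t_i}
        pred_var = H_beta ** 2 * c2 + Xi      # non-zero entry of R_{t_i}
        s = pred_var + sigma_nu ** 2          # one-step predictive variance of y_{t_i}
        resid = y - pred_mean
        loglik += -0.5 * math.log(2 * math.pi * s) - resid ** 2 / (2 * s)
        m = pred_mean + (pred_var / s) * resid
        c2 = pred_var - pred_var ** 2 / s
        t_prev = t
    return loglik, m, c2
```

Working with the scalars $m_{t_i}$ and $c_{t_i}^2$ avoids forming the 2 × 2 matrices, since the first component of the state is the constant 1 and carries no uncertainty.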
We are interested in
\[
\pi(y_{t_{1:N}}) = \prod_{i=1}^{N} \pi(y_{t_i} \mid y_{t_{1:(i-1)}}),
\]
where
\[
\pi(y_{t_i} \mid y_{t_{1:(i-1)}}) = \int_{x} \pi(y_{t_i} \mid x_{t_i})\,\pi(x_{t_i} \mid y_{t_{1:(i-1)}})\,dx_{t_i}
\]
gives a tractable Gaussian integral. Finally,
\[
\log \pi(y_{t_{1:N}}) = \sum_{i=1}^{N} \log \pi(y_{t_i} \mid y_{t_{1:(i-1)}})
= \sum_{i=1}^{N} \left[ -\log\left( \sqrt{2\pi(\sigma_f^{2} + \sigma_g^{2})} \right) - \frac{(\mu_f - \mu_g)^{2}}{2(\sigma_f^{2} + \sigma_g^{2})} \right],
\]
where
\[
\mu_f - \mu_g = y_{t_i} - a_{t_i} = y_{t_i} - H_{\alpha,t_i} - H_{\beta,t_i}m_{t_{i-1}}
\]
and
\[
\sigma_f^{2} + \sigma_g^{2} = \sigma_\nu^{2} + R_{t_i} = \sigma_\nu^{2} + H_{\beta,t_i}^{2}c^{2}_{t_{i-1}} + \Xi_{t_i}.
\]

Procedure

1. Set $i = 1$. Initialize $m_0 = P$ and $C_0 = 0$.

2. Evaluate and store the following log likelihood term:
\[
\log \pi(y_{t_i} \mid y_{t_{1:(i-1)}}) = \left[ -\log\left( \sqrt{2\pi(\sigma_f^{2} + \sigma_g^{2})} \right) - \frac{(\mu_f - \mu_g)^{2}}{2(\sigma_f^{2} + \sigma_g^{2})} \right],
\]
where $\mu_f - \mu_g = y_{t_i} - H_{\alpha,t_i} - H_{\beta,t_i}m_{t_{i-1}}$ and $\sigma_f^{2} + \sigma_g^{2} = \sigma_\nu^{2} + H_{\beta,t_i}^{2}c^{2}_{t_{i-1}} + \Xi_{t_i}$.

3. Create and store both $m_{t_i}$ and $C_{t_i}$, where
\[
m_{t_i} = H_{\alpha,t_i} + H_{\beta,t_i}m_{t_{i-1}} + \frac{H_{\beta,t_i}^{2}c^{2}_{t_{i-1}} + \Xi_{t_i}}{H_{\beta,t_i}^{2}c^{2}_{t_{i-1}} + \Xi_{t_i} + \sigma_\nu^{2}} \left( y_{t_i} - H_{\alpha,t_i} - H_{\beta,t_i}m_{t_{i-1}} \right)
\]
and
\[
c^{2}_{t_i} = H_{\beta,t_i}^{2}c^{2}_{t_{i-1}} + \Xi_{t_i} - \frac{\left( H_{\beta,t_i}^{2}c^{2}_{t_{i-1}} + \Xi_{t_i} \right)^{2}}{H_{\beta,t_i}^{2}c^{2}_{t_{i-1}} + \Xi_{t_i} + \sigma_\nu^{2}}.
\]

4. Increment $i$, $i = (i + 1)$, and repeat steps 2-3 until $\log \pi(y_{t_N} \mid y_{t_{1:(N-1)}})$ is evaluated.

5. Calculate the sum:
\[
\log \pi(y_{t_{1:N}}) = \sum_{i=1}^{N} \log \pi(y_{t_i} \mid y_{t_{1:(i-1)}}).
\]
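The Kalman filter procedure of Section C.5 (initialise, accumulate the one-step log predictive densities, update $m_{t_i}$ and $c^{2}_{t_i}$, then sum) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the thesis implementation: $T_i$ is taken to be $t_i - t_0$, the deterministic solution is taken to be $v_t = aPe^{aT}/(bP(e^{aT}-1)+a)$ (the form that appears in $\mu_{t_i}$), and all function and argument names are hypothetical.

```python
import math


def v(a, b, P, T):
    """Deterministic logistic solution at elapsed time T (assumed T = t - t0)."""
    e = math.exp(a * T)
    return a * P * e / (b * P * (e - 1.0) + a)


def h_beta(a, b, P, T_prev, T_cur):
    """H_beta(t_i, t_{i-1}) from Section C.5."""
    ratio = ((b * P * (math.exp(a * T_prev) - 1.0) + a)
             / (b * P * (math.exp(a * T_cur) - 1.0) + a))
    return math.exp(a * (T_cur - T_prev)) * ratio ** 2


def h_alpha(a, b, P, T_prev, T_cur):
    """H_alpha(t_i, t_{i-1}) = v_{t_i} - v_{t_{i-1}} * H_beta."""
    return v(a, b, P, T_cur) - v(a, b, P, T_prev) * h_beta(a, b, P, T_prev, T_cur)


def xi(a, b, P, sigma, T_prev, T_cur):
    """Transition variance Xi_{t_i}; note t_i - t_{i-1} = T_i - T_{i-1}."""
    e_cur, e_prev = math.exp(a * T_cur), math.exp(a * T_prev)
    denom = b * P * (e_cur - 1.0) + a
    bracket = (b ** 2 * P ** 2 * (e_cur ** 2 - e_prev ** 2)
               + 4.0 * b * P * (a - b * P) * (e_cur - e_prev)
               + 2.0 * a * (T_cur - T_prev) * (a - b * P) ** 2)
    return 0.5 * sigma ** 2 * a * P ** 2 * e_cur ** 2 / denom ** 4 * bracket


def lnaa_log_marginal_likelihood(times, ys, a, b, P, sigma, sigma_nu, t0=0.0):
    """Steps 1-5 of the procedure: returns log pi(y_{t_{1:N}})."""
    m, c2 = P, 0.0                      # step 1: m_0 = P, C_0 = 0
    T_prev = 0.0
    log_lik = 0.0
    for t, y in zip(times, ys):
        T_cur = t - t0
        hb = h_beta(a, b, P, T_prev, T_cur)
        ha = h_alpha(a, b, P, T_prev, T_cur)
        pred_mean = ha + hb * m
        pred_var = hb ** 2 * c2 + xi(a, b, P, sigma, T_prev, T_cur)
        s = pred_var + sigma_nu ** 2    # sigma_f^2 + sigma_g^2
        resid = y - pred_mean           # mu_f - mu_g
        # step 2: log predictive density of y_{t_i}
        log_lik += -0.5 * math.log(2.0 * math.pi * s) - resid ** 2 / (2.0 * s)
        # step 3: update m_{t_i} and c^2_{t_i}
        m = pred_mean + pred_var / s * resid
        c2 = pred_var - pred_var ** 2 / s
        T_prev = T_cur                  # step 4: move to the next time point
    return log_lik                      # step 5: the accumulated sum


if __name__ == "__main__":
    a, b, P = 3.0, 30.0, 1e-4           # illustrative values only
    times = [0.5, 1.0, 1.5, 2.0]
    ys = [v(a, b, P, t) for t in times]
    print(lnaa_log_marginal_likelihood(times, ys, a, b, P, 0.1, 0.01))
```

Because each step needs only the previous $m_{t_{i-1}}$ and $c^{2}_{t_{i-1}}$, the cost is linear in the number of observations $N$, and the returned value can be used directly as the log-likelihood inside an MCMC scheme.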
