J.B.S. Haldane Could Have Done Better
In a review on the contribution of J.B.S. Haldane to the development of the Bayes factor hypothesis test (arXiv:1511.08180), Etz and Wagenmakers focus on Haldane's proposition of a mixture prior in a genetic example (Haldane 1932, A note on inverse p…
Authors: Claus Vogl
Submitted to Statistical Science

Claus Vogl
Veterinärmedizinische Universität Wien

COMMENT ON: "J.B.S. HALDANE'S CONTRIBUTION TO THE BAYES FACTOR HYPOTHESIS TEST" BY ETZ AND WAGENMAKERS

Etz and Wagenmakers [1] (and an earlier version of this paper available at: https://arxiv.org/abs/1511.08180) review the contribution of J.B.S. Haldane to the development of the Bayes factor hypothesis test. They focus particularly on Haldane's proposition of a mixture prior in his first example on genetic linkage mapping in the Chinese primrose (Primula sinensis) [3]. As Haldane never followed up on these ideas, it is difficult to gauge his motivation and intentions. Haldane himself states his purpose at the beginning of the article [3]:

"Bayes theorem is based on the assumption that all values in the neighborhood of that observed are equally probable a priori. It is the purpose of this article to examine what more reasonable assumptions could be made, and how it will affect the estimate given the data."

Compactly restated: flat priors should be replaced by more reasonable assumptions. But I will argue that in the very same article, in the very first example, Haldane himself uses a flat prior instead of a more reasonable prior.

Haldane's primrose example with a flat prior. The data come from a (hypothetical) observation of 400 meioses in the primrose; 160 of them are cross-overs. Let $\rho$ be the recombination rate between the two loci. The likelihood is binomial:

$$p(y = 160 \mid \rho, N = 400) = \binom{400}{160} \rho^{160} (1 - \rho)^{240}. \tag{1}$$

Haldane argues that P. sinensis has twelve chromosomes of about equal length. Recombination between unlinked loci on different chromosomes is free, such that the recombination rate is $\rho = \tfrac{1}{2}$. This is reflected in Haldane's prior by a point mass of $\tfrac{11}{12}$ on $\rho = \tfrac{1}{2}$.
With probability $\tfrac{1}{12}$, the two loci reside on the same chromosome, i.e., the two loci are linked. Conditional on linkage, Haldane assumes $0 \le \rho < \tfrac{1}{2}$ and a flat prior of $p(\rho) = 2$, such that his marginal probability of the data under linkage, weighted by the prior probability $\tfrac{1}{12}$ of linkage, becomes

$$p(y = 160 \mid N = 400) = \frac{1}{6} \binom{400}{160} \int_0^{1/2} \rho^{160} (1 - \rho)^{240} \, d\rho. \tag{2}$$

(Author footnote: e-mail: claus.vogl@vetmeduni.ac.at. Institut für Tierzucht und Genetik, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria. CV was supported by the Austrian Science Fund (FWF): DK W1225-B20. imsart-sts ver. 2014/10/16 file: main.tex date: November 10, 2018)

He continues by approximating the integral, extending the upper integration limit to one:

$$p(y = 160 \mid N = 400) \approx \frac{1}{6} \binom{400}{160} \int_0^{1} \rho^{160} (1 - \rho)^{240} \, d\rho = \frac{1}{6} \binom{400}{160} \frac{160!\, 240!}{401!} = \frac{1}{6 \cdot 401}. \tag{3}$$

But the flat prior is unreasonable, given Haldane's knowledge of genetic linkage.

A better prior. Chromosomes are one-dimensional structures on which loci reside. The recombination rate $\rho$ between two genes is a function of their distance on the chromosome. It would have been reasonable for Haldane to assume that a locus can be located anywhere on a chromosome with equal probability and that the locations of two loci are independent of each other. Then the genetic distance $x$ between the two loci, in units of proportions of the length of the chromosome (denoted by $L$ and measured in cross-over rates, i.e., Morgan), would follow a beta distribution:

$$p(x) = \frac{\Gamma(3)}{\Gamma(1)\Gamma(2)} x^{1-1} (1 - x)^{2-1}. \tag{4}$$

Haldane [2] himself derived a bijective function that maps genetic distance $x$ to recombination rate $\rho$:

$$\rho \mid x, L = \frac{1 - e^{-2Lx}}{2}. \tag{5}$$

The number of cross-overs per meiosis per chromosome is about one, a fact probably known to Haldane, such that I set $L = 1$. Changing variables from $x$ to $\rho$, the prior distribution of $\rho$ then becomes

$$p(\rho) = \frac{2 + \log(1 - 2\rho)}{1 - 2\rho} \tag{6}$$

with $0 \le \rho \le \frac{1 - e^{-2}}{2}$ (Fig. 1).
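The change of variables leading from eqs. (4) and (5) to eq. (6) is only stated in the text. The intermediate steps can be spelled out as follows; this is a sketch of the standard density transformation, and the general-$L$ expression in the third line is an intermediate result written out here, not a formula quoted from the article.

```latex
\begin{align*}
p(x) &= 2\,(1 - x), \qquad 0 \le x \le 1
  && \text{(eq.~(4), since } \Gamma(3)/(\Gamma(1)\Gamma(2)) = 2\text{)} \\
x &= -\frac{\log(1 - 2\rho)}{2L}, \qquad
  \left|\frac{dx}{d\rho}\right| = \frac{1}{L\,(1 - 2\rho)}
  && \text{(inverting eq.~(5))} \\
p(\rho) &= p\bigl(x(\rho)\bigr)\left|\frac{dx}{d\rho}\right|
  = \frac{2 + \log(1 - 2\rho)/L}{L\,(1 - 2\rho)} \\
p(\rho) &= \frac{2 + \log(1 - 2\rho)}{1 - 2\rho}
  && \text{(setting } L = 1\text{, eq.~(6))}
\end{align*}
```

The support follows from the endpoints of $x$: $x = 0$ maps to $\rho = 0$ and $x = 1$ maps to $\rho = (1 - e^{-2})/2$ for $L = 1$.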
Note that, for the primrose example, the maximum likelihood estimator of the recombination rate is $\hat{\rho} = 160/400 = 0.4$. In this parameter region the prior (6) differs considerably from the flat prior $p(\rho) = 2$.

Speculations on Haldane's intentions. Haldane most certainly also went through the above considerations; after all, he himself developed a very useful mapping function. Reading the article carefully, I consider its main purpose not the mixture prior in the primrose example, but rather the investigation of different parameter regions of the binomial and its conjugate distribution, the beta. The primrose example is in a parameter region where probabilities of failure and success are about equal. (Note that the example data are actually closer to equal probabilities than is usually encountered in linkage studies, where sample sizes are often about 50 to 100, rather than Haldane's 400, which would have made detection of linkage unlikely with a true $\rho = 0.4$.) For this, a flat prior is reasonable, i.e., a prior beta with $\alpha = \beta = 1$. Haldane may actually have been more interested in the approximate distribution (3) than in the exact one (2).

The other examples in Haldane's article pertain to parameter regions where success (or failure) probabilities are close to zero or one. Then a flat prior would put too much weight into the middle of the parameter region, and a prior with $\alpha \to 0$ and $\beta = 1$, proportional to $\frac{1}{\rho}$, or with $\alpha = \beta \to 0$, proportional to $\frac{1}{\rho(1 - \rho)}$, would be preferable. In this light, a more complicated prior distribution than the beta and its asymptotes would have been useless, even though Haldane could have derived it easily. I thus believe that, for the sake of generality, Haldane chose to not do better than flat in the primrose example.
Furthermore, I agree with Etz and Wagenmakers [1]:

"It was the specific nature of the linkage problem in genetics that caused Haldane to serendipitously adopt a mixture prior comprising a point mass and smooth distribution."

A genetic red herring. Modern genetics has shown cross-over rates to be variable along a chromosome, with low rates at the chromosome ends and around the centromere. Since the distribution of genes on chromosomes also follows roughly the same pattern, this complication can be ignored, as long as genetic position is based on mapping distances (in units of Morgan) and not physical distances (in units of basepairs).

1. ACKNOWLEDGMENTS

I thank Alexander Etz and Eric-Jan Wagenmakers for inspiration and encouragement. My research was supported by the Austrian Science Fund (FWF): DK W1225-B20.

REFERENCES

[1] Etz, A. and Wagenmakers, E. J. (2017). J.B.S. Haldane's contribution to the Bayes factor hypothesis test. Statistical Science ?? ??–??
[2] Haldane, J. B. S. (1919). The combination of linkage values, and the calculation of distances between the loci of linked factors. J. Genetic. 8 299–309.
[3] Haldane, J. B. S. (1932). A note on inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society 28 55–61.

FIGURES

Fig 1. The prior distribution of $\rho$ given $L = 1$ and assuming equal distribution of positions and Haldane's mapping function. The histogram is produced from a simulation; the solid line corresponds to the distribution in eq. (6).
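The caption of Fig. 1 mentions a simulation, but the text does not describe how it was done. The following is a minimal sketch of one way such a check could be set up, assuming two loci placed uniformly and independently on a chromosome of length L = 1 and mapped through eq. (5). The function names, the sample size, the number of bins, and the seed are illustrative assumptions, not the author's code.

```python
import math
import random

L = 1.0  # chromosome length in Morgan (the text sets L = 1)
RHO_MAX = (1.0 - math.exp(-2.0 * L)) / 2.0  # rho reached at distance x = 1


def prior_rho(rho):
    """Analytic prior density of eq. (6); valid for L = 1 and 0 <= rho <= RHO_MAX."""
    return (2.0 + math.log(1.0 - 2.0 * rho)) / (1.0 - 2.0 * rho)


def simulate_rho(n, seed=0):
    """Place two loci uniformly and independently, then map their distance to rho."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = abs(rng.random() - rng.random())  # distance; Beta(1, 2) as in eq. (4)
        draws.append((1.0 - math.exp(-2.0 * L * x)) / 2.0)  # mapping function, eq. (5)
    return draws


def histogram_density(draws, bins=20):
    """Return (bin centres, density estimates) on the interval [0, RHO_MAX]."""
    width = RHO_MAX / bins
    counts = [0] * bins
    for rho in draws:
        # Clamp the index so a draw at the upper boundary falls in the last bin.
        counts[min(int(rho / width), bins - 1)] += 1
    centres = [(i + 0.5) * width for i in range(bins)]
    density = [c / (len(draws) * width) for c in counts]
    return centres, density


if __name__ == "__main__":
    centres, density = histogram_density(simulate_rho(200_000))
    for c, d in zip(centres, density):
        print(f"rho={c:.3f}  simulated={d:.3f}  eq6={prior_rho(c):.3f}")
```

Comparing the two printed columns bin by bin plays the role of overlaying the histogram and the solid line in Fig. 1. Small discrepancies are expected from sampling noise and from evaluating eq. (6) at bin centres rather than averaging it over each bin.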