Missing data in a stochastic Dollo model for cognate data, and its application to the dating of Proto-Indo-European

Missing data in a sto  hasti Dollo mo del for ognate data, and its appliation to the dating of Proto-Indo-Europ ean Robin J. Ryder and Geo K. Ni holls Departmen t of Statistis, Univ ersit y of Oxford, UK Septem b er 18, 2018 Abstrat Ni holls and Gra y (2008 ) desrib e a ph ylogeneti mo del for trait data. They use their mo del to estimate bran hing times on Indo-Europ ean language trees from lexial data. Aleksey enk o et al. (2008 ) extended the mo del and giv e appliations in genetis. In this pap er w e extend the inferene to handle data missing at random. When trait data are gathered, traits are thinned in a w a y that dep ends on b oth the trait and missing-data on ten t. Ni holls and Gra y (2008 ) treat missing reords as absen t traits. Hittite has 12% missing trait reords. Its age is p o orly predited in their ross- v alidation. Our predition is onsisten t with the historial reord. Ni holls and Gra y (2008 ) dropp ed sev en languages with to o m u h missing data. W e t all t w en t y four languages in the lexial data of Ringe et al. (2002 ). In order to mo del spatial-temp oral rate heterogeneit y w e add a atastrophe pro ess to the mo del. When a language passes through a atastrophe, man y traits  hange at the same time. W e t the full mo del in a Ba y esian setting, via MCMC. W e v alidate our t using Ba y es fators to test kno wn age onstrain ts. W e rejet three of thirt y historially attested onstrain ts. Our main result is a unimo del p osterior distribution for the age of Proto-indo-Europ ean en tered at 8400 y ears BP with 95% HPD equal 7100-9800 y ears BP . The Indo-Europ ean languages desend from a ommon anestor alled Proto-Indo-Europ ean. Lexial data sho w the patterns of relatedness among Indo-Europ ean languages. These data are ognay lasses: a pair of w ords in the same lass desend, through a pro ess of sound  hange, from a ommon anestor. F or example, English se a and German Se e are ognate to one another, but not to the F ren h mer . Gra y and A tkinson (2003 ) o ded data of this kind in a matrix in whi h ro ws orresp ond to languages and olumns to distint ognay lasses, and en tries are zero or one as the language p ossessed or la k ed a term in the olumn lass. They analysed these data using ph ylogeneti algorithms similar to those used for geneti data. Our analysis has the same ob jetiv es, but w e t a mo del designed for lexial trait data. W e w ork with data ompiled b y Ringe et al. (2002 ), reording the distribution of some 872 distint ognay lasses in t w en t y four mo dern and anien t Indo-Europ ean languages. In setion 7, w e giv e estimates for the unkno wn top ology and bran hing times of the ph ylogen y of the ore v o abulary of these languages. Ni holls and Gra y (2008 ) analyse the same data using a losely related sto  hasti Dollo-mo del for binary trait ev olution. Ho w ev er, those authors w ere unable to deal with missing trait reords. Missing data arise when w e are unable to answ er the question do es language X p ossess a ognate in ognay lass Y?. Ni holls and Gra y (2008 ) dropp ed sev en languages whi h had man y missing en tries, and treated missing trait reords in the remainder as absen t traits. This is unsatisfatory . Ho w ev er, it is not straigh tforw ard to giv e a mo del-based in tegration of missing data for the trait ev olution mo del of Ni holls and Gra y (2008 ). In this pap er w e in tegrate the missing trait data, and this te hnial adv ane allo ws us to t all t w en t y 1 four of the languages in the original data. The binary trait mo del of Ni holls and Gra y (2008 ), has b een extended b y Aleksey enk o et al. (2008 ) to m utliple-lev el traits, and is nding appliations in biology . A prop er treatmen t of missing data will b e of use in other appliations. W e are sp eially in terested in ph ylogeneti dating. Beause w e are w orking with lexial, and not syn tati data, it is the age of the bran hing of the ore v o abulary of Proto-Indo-Europ ean that w e estimate. This is a on tro v ersial matter. W ork ers in historial linguistis ha v e evidene from linguisti paleon tology that the most reen t ommon anestor of all kno wn Indo-Europ ean languages bran hed no earlier than ab out 60006500 y ears Before the Presen t (BP) (Mallory , 1989 ). F or a reen t review of the argumen t from linguisti paleon tology , and a ritiism of ph ylogeneti dating, see Garrett (2006 ) and MMahon and MMahon (2005 ). An alternativ e h yp othesis suggests that the spread b egan around 8500 BP when the Anatolians mastered farming (Renfrew , 1987 ) in the early neolithi. Reen t eorts to apply quan titativ e ph ylogeneti metho ds to dating Proto-Indo-Europ ean giv e a time depth of 8000 to 9500 y ears BP (Ni holls and Gra y , 2008 ; Gra y and A tkinson , 2003 ), supp orting the link to farming. In this w ork w e obtain a unimo dal p osterior distribution for the age of Proto-Indo-Europ ean en tered at 8400 y ears BP with 95% HPD equal 7100-9800 y ears BP One p oin t of view is that the argumen t from linguisti paleon tology is orret, and the ph ylogeneti dates are inorret, due to rate heterogeneit y , or some other mo del failing. In this resp et w e adv ane the sear h started in Ni holls and Gra y (2008 ) for a mo del misp eiation whi h migh t explain the 20% dierene the en tral age estimates from our tting, and the status quo. The dierene is large enough that w e are hop eful of nding a single oheren t error, if an y su h error exists. A oun ting no w for missing data, w e are able to publish a m u h wider ross-v alidation test (up from 10 to 30 tested alibrations). Also, w e t a mo del for expliit rate heterogeneit y in time and spae. In view of the spatial-temp oral homogeneit y w e nd here and in other data, the ase for the earlier date seems no w fairly strong. The most probable alternativ e seems to b e a step  hange in the rate of lexial div ersiation ating in a o ordinated fashion aross the Indo-Europ ean territory some 3000 to 5000 y ears ago. In Setions 1 and 2.1 w e desrib e the data and sp eify a sub jetiv e prior for the ph ylogen y of v o abu- laries. In setions 2.2 and 3, w e giv e a generativ e mo del for the data and the orresp onding lik eliho o d. W e inlude, in Setion 3, a reursiv e algorithm whi h mak es the sum o v er missing data tratable. In setions 4 and 5 w e giv e the p osterior distribution on tree and parameter spae, and briey desrib e our MCMC sampler. In setion 6 , w e disuss lik ely mo del misp eiation senarios, test for robustness b y tting syn theti data sim ulated under su h onditions, and ross-v alidate our preditions. W e t the mo del to a data set of Indo-Europ ean languages in setion 7. This pap er has a supplemen t giving an analysis of a seond data set, olleted b y Dy en et al. (1997 ). Ph ylogeneti metho ds ha v e b een used to mak e inferene for tree ( Ringe et al., 2002 ) and net w ork stru- ture (MMahon and MMahon , 2005 ) in Historial Linguistis. W arno w et al. (2004 ) write do wn a more realisti mo del for the div ersiation of v o abulary , aoun ting for w ord-b orro wing b et w een languages (so that their history need not b e tree-lik e) but there is to date no tting. 1 Desription of the data The data group w ords from 328 meaning ategories and 24 Indo-Europ ean languages in to 872 homology lasses. The data w ere olleted and o ded b y Ringe et al. (2002 ). Meaning ategories o v er the ore v o abulary and are assumed relev an t to all languages in the study . T w o w ords of losely similar meaning, desended from a ommon anestor but sub jet to v ariation in phonology , are  o gnate terms. A  o gnay lass is a homology lass of w ords all b elonging to a single meaning ategory . F or example, for the meaning head", the Italian testa and the F ren h tête b elong to the same ognay lass, while the English he ad 2 Old English stierfþ Old High German stirbit, touwit A v estan miriiete Old Ch ur h Sla v oni um  r et  u Latin moritur Osan ? Old English 1 0 0 Old High German 1 1 0 A v estan 0 0 1 Old Ch ur h Sla v oni 0 0 1 Latin 0 0 1 Osan ? ? ? ( a ) ( b ) T able 1: An example of data o ding: (a), the w ord he dies in six Indo-Europ ean languages; (b), the o ding of this data as a binary matrix with ?'s for missing data. In the nota- tion of Setion 1, the rst olumn in (b) is a ognay lass M dies ∈ Ω dies with Ω dies = {{ Old English, Old High German } , { Old English, Old High German, Osan }} and the Sw edish huvud b elong to another ognay lass. An elemen t of a ognay lass is th us a w ord in a partiular language, a sound-meaning pair. These elemen ts are alled  o gnates . The v o abulary of a single language is represen ted as a set of distint ognates. In our analysis a ognate has just t w o prop erties: its language and its ognay lass. If there are N distint ognay lasses in data for L languages, then the a 'th lass M a ⊆ { 1 , 2 , ..., L } is a list of the indies of languages whi h p ossess a ognate in that lass. The data are o ded as a binary matrix D . A ro w orresp onds to a language and a olumn to a ognay lass, so that D i,a = 1 if the a 'th ognay lass has an instane in the i 'th language, and D i,a = 0 otherwise. See T able 1 for an example. This o ding allo ws a language to ha v e sev eral w ords for one meaning (su h as Old High German stirbit and touwit for he dies", an instane of p olymorphism), or no w ord at all. Missing matrix elemen ts arise b eause the reonstruted v o abularies of some anien t languages are inomplete. If w e are unable to answ er the question do es language i p ossess a ognate in ognay lass a  then w e set D i,a =? . W e need notation for b oth matrix and set represen tations with missing data. Denote b y B a olumn a of L × N matrix B . F or a = 1 , 2 , ..., N let D a b e the set of all olumn v etors d ∗ allo w ed b y the data D a in olumn a of D , D a =  d ∗ ∈ { 0 , 1 } L : D i,a ∈ { 0 , 1 } ⇒ d ∗ a = D i,a , i = 1 , 2 , ..., L  . F or d ∗ ∈ D a let m ( d ∗ ) = { i : d ∗ i = 1 } . Denote b y Ω a the set of ognay lasses onsisten t with the data D a , so that Ω a = { ω ⊆ { 1 , 2 , . . . , N } : ω = m ( d ∗ ) , d ∗ ∈ D a } . The data D are then equiv alen tly Ω = (Ω 1 , Ω 2 , ..., Ω N ) . The Ω a -notation generalizes the M a -notation to handle missing data. It is illustrated in T able 1. Ringe et al. (2002 ) list t w en t y four mostly anien t languages. F or elev en of these languages (Latin, Mo dern Latvian, Old Norse...), all the matrix en tries are reorded. F or the rest, the prop ortion of missing en tries v aries b et w een 1% (for Old Irish) and 91% (for Lyian, an anien t language of Anatolia). Note that data are usually missing in small blo  ks orresp onding to the ognay lasses for a giv en meaning ategory , as in T able 1. W e do not mo del this asp et of the missing data. This is related to the mo del-error Ni holls and Gra y (2008 ) all `the empt y-eld appro ximation', under whi h ognay lasses in the same meaning ategory are assumed to ev olv e indep enden tly . 3 F or these Indo-Europ ean ph ylogenies, the top ology of some subtrees are kno wn from historial reords. W e ha v e lo w er and upp er b ounds for the age of the ro ot no de of some subtrees. F or example, the Sla vi languages are kno wn to form a subtree, and the most reen t ommon anestor of the Sla vi languages in the data is kno wn to b e at least 1300 y ears old. In this w a y in ternal no des of the ph ylogen y are onstrained. W e ha v e age b ounds for leaf no des as w ell, sine w e are told when the v o abulary of the anien t languages in these data w as in use. W e om bine these  alibr ation onstrain ts with the ognay lass data in setion 2.1 . Jumping ahead to our results, Figure 5 is a sample from the p osterior distribution w e nd for ph ylogenies. Calibration onstrain ts are represen ted b y the bla k bars aross no des in this tree. 2 Mo dels W e sp eify a sub jetiv e prior for ph ylogenies, represen ting a state of kno wledge of in terest to us. The mo del w e giv e for the div ersiation of v o abulary in Setion 2.2 is, in on trast, generativ e. W e mo del the div ersiation of v o abulary as a bran hing pro ess of sets of ognates, ev olving on the ph ylogen y . Lea v es are lab eled b y languages and a bran h represen ts the anestral lineage of a v o abulary . 2.1 Prior distribution on trees The material in this subsetion follo ws Ni holls and Gra y (2008 ). W e onsider a ro oted binary tree g with 2 L no des: L lea v es, L − 2 in ternal no des, a ro ot no de r = 2 L − 1 and an A dam no de A = 2 L , whi h is link ed to r b y a edge of innite length. Ea h no de i = 1 , 2 , ..., 2 L is assigned an age t i and t = ( t 1 , t 2 , ..., t A ) ; the units of age are y ears b efore the presen t; for the A dam no de, t A = + ∞ . The edge b et w een paren t no de j and  hild no de i is a direted bran h h i, j i of the ph ylogen y , with the orderings i < j and t i < t j . Let E b e the set of all edges, inluding the edge h r , A i , let V b e the set of all no des and V L = { 1 , 2 , ..., L } the set of all leaf no des. Our base tree spae Γ is the set of all ro oted direted binary trees g = ( E , V , t ) , with distinguishable lea v es and, for i = 1 , 2 , ..., 2 L , no de ages t i > 0 assigned so that the direted path from a leaf i ∈ V L to no de A passes through no des of stritly inreasing age. With this n um b ering on v en tion, ( E , V ) is alled the ordered history of a ro oted direted binary tree. W e allo w for C alibration onstrain ts on tree-top ology and seleted no de ages. These are desrib ed at the end of Setion 1 . Ea h su h onstrain t c allo ws trees in just some subspae Γ ( c ) ⊂ Γ of tree spae. W e add to these onstrain ts an upp er b ound on the ro ot time at some age T > 0 , and set Γ (0) = { ( E , V , t ) ∈ Γ : t r ≤ T } . The spae of alibrated ph ylogenies with atastrophes is then Γ C = C \ c =0 Γ ( c ) . The ro ot age is a sensitiv e statisti in this inferene. A prior distribution on trees, with the prop ert y that the marginal distribution of the ro ot age is uniform o v er a xed prior range t L ≤ t r ≤ T , is of in terest. F or no de i ∈ V , let t + i = sup g ∈ Γ C t i and t − i = inf g ∈ Γ C t i b e the greatest and least admissible ages for no de i , and let S = { i ∈ V : t + i = T } , so that S is the set of no des ha ving ages not b ounded ab o v e b y a alibration (there are 12 su h no des in Figure (5 )). Ni holls and Gra y (2008 ) sho w that, b efore alibration (when C = 0 ) and for tree spaes in whi h the lea v es ha v e xed equal ages, the prior probabilit y distribution with densit y f G ( g | T ) ∝ Y i ∈ S ( t r − t − i ) − 1 4 giv es a marginal densit y for t r whi h is exatly uniform in t L < t r < T . Ni holls and Gra y (2008 ) do not ommen t on the distribution determined b y f G o v er tree top ologies. The prior f G has in this ase ( C = 0 ) a uniform marginal distribution on ordered histories exatly equal to the orresp onding distribution for the Y ule mo del. This is not uniform, but fa v ors balaned leaf-lab eled top ologies. F or C > 0 , and on atastrophe-free tree spaes in whi h leaf ages are onstrained b y prior kno wledge to lie in an in terv al only , Ni holls and Gra y (2008 ) argue from sim ulation studies that this prior giv es a reasonably at marginal distribution for t r if in addition T ≫ max i ∈ V \ S t + i (the greatest upp er b ound among the alibration onstrain ts is not to o lose to the upp er b ound on the ro ot age). W e giv e a sample from the prior in the Supplemen tary Material; the prior do es not represen t an y reasonable prior b elief b efore ab out 4500 BP , but this is immaterial as the lik eliho o d rules these v alues out (an instane of an appliation of the priniple of suien t reason). 2.2 Div ersiation of ognay lasses In this subsetion w e extend the sto  hasti Dollo mo del of Ni holls and Gra y (2008 ) to inorp orate rate heterogeneit y in time and spae, via a atastrophe pro ess. Consider the ev olution of ognates do wn the anestral lineage of the v o abulary of a single language. An example is giv en in Figure 1 . A new ognay lass is b orn when its rst ognate is b orn. This new w ord is not ognate with other w ords in the mo deled pro ess. Loan w ords from outside the ore meaning ategories of an y language in the study , or from a language outside the study , ma y b e go o d for w ord birth without violating this ondition. A ognate dies in a partiular v o abulary when it eases to b e used for the meaning ommon to its ognay lass: its meaning ma y  hange, or it ma y fall out of use. Cognates ev olving in a single language ( ie do wn a single bran h of a language ph ylogen y) are b orn indep enden tly at rate λ , die indep enden tly at p er  apita rate µ , and are sub jet to p oin t-lik e atastrophes, whi h they enoun ter at rate ρ along a bran h. A t a atastrophe, ea h ognate dies indep enden tly with probabilit y κ , and a P oisson n um b er of ognates with mean ν are b orn. A t a bran hing ev en t of the ph ylogen y , the set of ognates represen ting the bran hing v o abulary is opied in to ea h of the daugh ter languages. See Figure 1 . The pro ess w e ha v e desrib ed is not rev ersible, and this greatly ompliates the analysis. It seems aeptable, from a data-mo deling p ersp etiv e, to imp ose the ondition ν = κλ/µ , whi h is neessary and suien t for rev ersibilit y (see Supplemen tary Material for a pro of ). Under this ondition, adding a atastrophe to an edge is equiv alen t to lengthening that edge b y T C ( κ, µ ) = − lo g(1 − κ ) /µ y ears. This follo ws b eause the n um b er of ognates generated b y the anageni part of the pro ess in an in terv al of length T C is P oisson distributed with mean λ µ (1 − e − µT C ) equal to κλ/µ , and the probabilit y that a ognate en tering an in terv al of length T C dies during that in terv al is 1 − e − µT C , whi h equals κ . Beause a atastrophe simply extends its edge b y a blo  k of virtual time, the lik eliho o d dep ends only on the n um b er of atastrophes on an edge, and not their lo ation in time. Let k i b e the n um b er of atastrophes on edge h i, j i , and k = ( k 1 , . . . , k 2 L − 2 ) b e the atastrophe state v etor. W e reord no atastrophes on the h R, A i edge (its length is already innite). The tree g = ( V , E , t, k ) is sp eied b y its top ology , no de ages and atastrophe state. Calibrated tree spae extended for atastrophes is Γ C K = { ( V , E , t, k ) : ( V , E , t ) ∈ Γ C , k ∈ N 2 L − 2 0 } . W e drop the atastrophe pro ess from the alulation in Setion 3. It is straigh tforw ard to restore it, and w e do this in the expression for the p osterior distribution in Setion 4 . 5 {1,2} −2 +12,13 −1 +9,10 −8,9 +11 +14 −2 +6 −12 −12 +5 −1 +3 −3 +8 +7 {1} {1,3,14} {10,11} {5} {5} {1,6,7,13} {1,6,13} {1,13} Figure 1: Desription of the mo del: births and deaths of traits are mark ed. The dots orresp ond to atastrophes, at whi h m ultiple births and deaths ma y o ur sim ultaneously . The ognate sets this generates at lea v es are sho wn on the righ t. Calendar time o ws from left to righ t and the age v ariables t i in the text inrease from righ t to left. 2.3 The registration pro ess Let D ∗ denote a notional ful l random binary data matrix, represen ting the outome of the div ersiation pro ess of Setion 2.2 . The n um b er of olumns in D ∗ is random, and equal to N ∗ . F or the realization depited in Figure 1 , D ∗ = D ∗ with D ∗ displa y ed in Figure 2. Note that D ∗ has a olumn for ea h ognay lass presen t at the ro ot no de, or b orn b elo w it, whether or not the ognay lass has an y ognates represen ted in an y leaf languages. W e all the random mapping of the unkno wn full represen tation D ∗ to the registered data D , whi h is a random matrix with N olumns, the r e gistr ation pro ess. There are t w o stages to this pro ess: the masking of matrix elemen ts of the fully realised data D ∗ with ?'s to form an in termediate data matrix ˜ D , and the seletion of olumns from ˜ D to determine the realised data D . Let I ∗ b e a random L × N ∗ indiator matrix of indep enden t Bernoulli random v ariables for observ ed elemen ts. The zeros of I ∗ mark matrix en tries in D ∗ whi h will b e unobserv able. F or i = 1 , 2 , ..., L and giv en a = 1 , 2 , ..., N ∗ , let ξ i = Pr( I ∗ i,a = 1) . The probabilit y ξ i that w e an answ er the question, do es language i p ossess a w ord in the a 'th ognay lass?, is assumed to b e a funtion of the language index i 6 [ D* ℄ [ I* ℄ [ D~ ℄ [ I ℄ [ D ℄ 10000000000000 11111111111111 10000000000000 1 111 1 11 1 000 0 00 10100000000001 01011111111111 ?0?00000000001 0 111 1 11 ? 000 0 01 00000000011000 10111001110110 0?000??001?00? 1 100 1 10 0 0?? 1 0? 00001000000000 11111111111111 00001000000000 1 111 1 11 0 100 0 00 00001000000000 11110111111111 0000?000000000 1 011 1 11 0 ?00 0 00 10000110000010 10111111111111 1?000110000010 1 111 1 11 1 011 0 10 10000100000010 11111111111111 10000100000010 1 111 1 11 1 010 0 10 10000000000010 11111111111100 100000000000?? 1 111 1 00 1 000 0 ?? Figure 2: Registration of the v o abulary realized in Figure 1 supp osing a masking matrix I ∗ , as ab o v e. D ∗ is the unobserv ed full data with a olumn for ea h ognay lass and a ro w for ea h language of Figure 1 (th us ognate 1 is presen t in ro ws 1,2,6,7 and 8); zeros in the masking matrix I ∗ indiate missing matrix elemen ts. Some ognay lasses are then thinned (in this example, the registration rule k eeps ognay lasses with instanes displa y ed in one or more language) to giv e the registered data D . only . If w e get an answ er, it is assumed orret. Let ξ = ( ξ 1 , . . . , ξ L ) and denote b y I ∗ a realisation of I ∗ . Denote b y ˜ D = ˜ D ( D ∗ , I ∗ ) the mask ed v ersion of the full random data matrix: if I ∗ i,a = 1 then ˜ D i,a = D ∗ i,a and if I ∗ i,a = 0 then ˜ D i,a =? . Matrix olumns ma y b e missing to o, so that N ≤ N ∗ . W e get missing olumns ev en when there are no missing data. The matrix ˜ D in Figure 2 inludes some olumns with only zeros and question marks, orresp onding to ognay lasses whi h existed in the past but are observ ed in none of the leaf-languages from whi h the data w ere ompiled. These ognay lasses are not inluded in the registered data. Denote b y R the registration rule D = R ( ˜ D ) mapping the full data to registered data. Rule R ma y thin additional olumn t yp es. Let Y and Q b e funtions of the olumns of D ∗ and I ∗ oun ting the visible 1's and ?'s resp etiv ely , Y ( D ∗ a , I ∗ a ) = L X i =1 I ∗ i,a D ∗ i,a , Q ( I ∗ a ) = L X i =1 (1 − I ∗ i,a ) . Giv en a = 1 , 2 , ..., N ∗ , let Y a = Y ( D ∗ a , I ∗ a ) and Q a = Q ( I ∗ a ) . In App endix A w e giv e an eien t algorithm for omputing the lik eliho o d for rules formed b y om- p ounding the follo wing elemen tary thinning op erations: (1) R 1 ( ˜ D ) = ( ˜ D a : Y a > 0) (disard lasses with no instanes at the lea v es); (2) R 2 ( ˜ D ) = ( ˜ D a : Y a > 1) (disard lasses - singletons - observ ed at a single leaf ); (3) R 3 ( ˜ D ) = ( ˜ D a : Y a < L ) (disard lasses whi h are observ ed at all lea v es); (4) R 4 ( ˜ D ) = ( ˜ D a : Y a < L − 1) (disard lasses whi h are observ ed at all lea v es or at all lea v es but one); (5) R 5 ( ˜ D ) = ( ˜ D a : Y a + Q a < L ) (disard lasses whi h are p oten tially presen t at all lea v es); 7 (6) R 6 ( ˜ D ) = ( ˜ D a : Y a + Q a < L − 1) (disard lasses whi h are p oten tially presen t at all lea v es or at all lea v es but one). W e assume the  hosen rule inludes Condition (1). The rule D = R ( ˜ D ) with R ( ˜ D ) = R 4 ◦ R 2 ( ˜ D ) ollets parsimon y informativ e ognay lasses. Ronquist et al. (2005 ) giv e the lik eliho o d for the nite- sites trait ev olution mo del of Lewis (2001 ) for registration rules lik e (1-6). In the example in Figure 2 , and in Setions 6.2 and 7) w e t data registered with R ( ˜ D ) = R 1 ( ˜ D ) . The seletion of olumns is something w e ha v e in general no on trol o v er: the olumn seletion rule simply desrib es what happ ened at registration. Results in the Supplemen tary Material for the Dy en et al. (1997 ) data use the rule R ( ˜ D ) = R 2 ( ˜ D ) , sine singleton olumns w ere not inluded in that data. Ho w ev er, ertain olumn t yp es ma y inlude data whi h is hard to mo del w ell, and so w e ma y  ho ose to mak e further thinning using the other rules. Reursions for the other rules are giv en in an App endix. Column indies a = 1 , 2 , ..., N ∗ are ex hangeable. It is on v enien t to ren um b er the olumns of D ∗ , I ∗ and ˜ D after registration, so that ˜ D a = D a and I ∗ a = I a for a = 1 , 2 , ..., N . The information needed to ev aluate Y a and Q a is a v ailable in the olumn D a and set Ω a represen tations. W e write Y ( D a ) = Y (Ω a ) = Y a and Q ( D a ) = Q (Ω a ) = Q a . 2.4 P oin t pro ess of births for registered ognay lasses Fix a atastrophe-free ph ylogen y g ∈ Γ C K , with k = (0 , 0 , ..., 0) , and let an edge h i, j i and a time τ ∈ [ t i , t j ) b e giv en. Denote b y [ g ] the set of all p oin ts ( τ , i ) on the ph ylogen y , inluding p oin ts ( τ , r ) with τ ≥ t r in the edge h r , A i . The lo ations z D = { z 1 , z 2 , ..., z N } of the birth ev en ts of the N r e gister e d ognay lasses are a realization of an inhomogeneous P oisson p oin t pro ess Z D on [ g ] . Let Z ∈ [ g ] b e the birth lo ation of a generi (and p ossibly unregistered) ognay lass M ⊆ { 1 , 2 , ..., L } , orresp onding to a olumn of ˜ D with Y observ ed 1 's and Q ? 's, and let E Z b e the ev en t that this lass generates a olumn of the registered data. F or the a 'th ognay lass M a , b orn at Z a , this ev en t is E Z a = { D a = R ( ˜ D a ) } sine R ( ˜ D a ) is empt y for ˜ D a a olumn dropp ed at registration. Cognay lasses are b orn at onstan t rate on the bran hes of g , but are thinned b y registration. Ho w ev er, onditional on the birth lo ation Z a = z a , our mo deling assumes that the outome E Z a for the a 'th ognay lass is deided indep enden tly of all ev en ts in all other ognay lasses. The p oin t pro ess Z D of birth lo ations of registered ognay lasses has in tensit y ˜ λ ( z ) = λ Pr( E Z | g , µ, λ, ξ , Z = z ) at z ∈ [ g ] and probabilit y densit y f Z D ( z D ) = 1 N ! e − Λ([ g ]) N Y a =1 ˜ λ ( z a ) with resp et to the elemen t of v olume dz D = dz 1 dz 2 ...dz N in [ g ] N , where Λ([ g ]) = Z [ g ] ˜ λ ( z ) dz = X h i,j i∈ E Z t j t i ˜ λ (( τ , i )) dτ . The n um b er N of registered ognay lasses is N ∼ P oisson (Λ([ g ])) . 8 3 Lik eliho o d alulations W e giv e the lik eliho o d for g , µ , λ , κ , ρ and ξ giv en the data, along with an eien t algorithm to ompute the sum o v er all missing data. The atastrophe pro ess is left out, and reinorp orated in the next setion. Sine w e only ev er see registered data, the lik eliho o d for g , λ, µ and ξ is the probabilit y P [ D = D | g , µ, λ, ξ , D = R ( ˜ D )] , to get data D giv en the parameters and onditional on the data ha ving passed registration. W e restore the birth lo ations (and so omit λ from the onditioning), and fatorize using the join t indep endene of D a , a = 1 , 2 , ..., N under the giv en onditions: P [ D = D | g , µ, λ, ξ , D = R ( ˜ D )] = Z f Z D ( z D ) P [ D = D | g , µ, ξ , Z D = z D , D = R ( ˜ D )] dz D , = e − Λ([ g ]) N ! N Y a =1 λ Z [ g ] P [ E Z a | g , µ, ξ , Z a = z a ] P [ D a = D a | g , µ, ξ , Z a = z a , E Z a ] dz a , = e − Λ([ g ]) N ! N Y a =1 λ Z [ g ] P [ D a = D a | g , µ, ξ , Z a = z a ] dz a . The last line follo ws b eause P [ E Z a | D a = D a , ..., Z a = z a ] = 1 for D a a olumn of registered data: if the outome of the birth at z a w as the registerable data D a then the ev en t E Z a ertainly o urs. The lik eliho o d dep ends on the a wkw ard ondition D = R ( ˜ D ) through the mean n um b er λ ([ g ]) of registered ognay lasses only , while P [ D a = D a | g , µ, ξ , Z a = z a ] is the probabilit y to realise the data v etor D a in the unonditioned div ersiation/missing elemen t pro ess. The alulation has so far extended Ni holls and Gra y (2008 ) to giv e the lik eliho o d for a greater v ariet y of olumn thinning rules. W e no w add the missing elemen t omp onen t of the registration pro ess. W e sum o v er p ossible v alues of the missing matrix elemen ts in the registered data. Sine P [ D a = D a | g , µ, λ, ξ , Z a = z a ] is not onditioned on the requiremen t that the olumn D a gets registered, the en tries of the orresp onding olumn I a are determined b y the unonditioned Bernoulli pro ess, and w e ha v e P [ D a = D a | g , µ, ξ , Z a = z a ] = X d ∗ ∈D a P [ I ∗ a , D ∗ a = d ∗ | g , µ, ξ , Z a = z a ] = L Y i =1 ξ I i,a i (1 − ξ i ) 1 − I i,a X d ∗ ∈D a P [ D ∗ a = d ∗ | g , µ, ξ , Z a = z a ] . The lik eliho o d is P [ D = D | g , µ, λ, ξ , D = R ( ˜ D )] = e − Λ([ g ]) N ! N Y a =1 ( L Y i =1 ξ I i,a i (1 − ξ i ) 1 − I i,a ) λ Z [ g ] X ω ∈ Ω a P [ M = ω | g , µ, ξ , Z = z a ] dz a , (1) where w e ha v e swit hed no w from summing d ∗ ∈ D a to the equiv alen t set represen tation ω ∈ Ω a . F or the t w o in tegrated quan tities in Equation (1) w e ha v e tratable reursiv e form ulae. W e are using a pruning pro edure akin to F elsenstein (1981 ). W e b egin with Λ([ g ]) . W e assume the registration rule inludes at least Condition (1). It follo ws that a ognay lass b orn at Z = ( τ , i ) in [ g ] m ust surviv e do wn to the no de b elo w, at Z = ( t i , i ) , in order to b e registered, and so P [ E Z | Z = ( τ , i ) , g , µ, ξ ] = P [ E Z | Z = ( t i , i ) , g , µ, ξ ] e − µ ( τ − t i ) . 9 W e an substitute this in to the expression for Λ([ g ]) , and in tegrate, to get Λ([ g ]) = λ µ X h i,j i∈ E P [ E Z | Z = ( t i , i ) , g , µ, ξ ]  1 − e − µ ( t j − t i )  . (2) Giv en a no de i , let V ( i ) L b e the set of leaf no des desended from i , inluding i itself if i is a leaf. Let s i = ard ( V ( i ) L ) . Denote b y u ( n ) i = P [ Y = n | Z = ( t i , i ) , g , µ, ξ ] and v ( n ) i = P [ Y + Q = n | Z = ( t i , i ) , g , µ, ξ ] . W e an ompute Λ for rules made up of om binations of Condition (1) with an y om bination of Conditions (2-6), from u (0) i , u (1) i , u ( s i − 1) i , u ( s i ) i , v ( s i − 1) i and v ( s i ) i . F or example, P [ E Z | Z = ( t i , i ) , g , µ, ξ ] =      1 − u (0) i R = R 1 , 1 − u (0) i − u (1) i R = R 2 , 1 − u (0) i − u (1) i − u ( L − 1) i − u ( L ) i R = R 3 ◦ R 2 . } (3) Notie that u ( n ) i = 0 unless s i ≥ n , so for example u ( L ) i is non-zero at i the ro ot no de only . Sine our main appliation is for data registered under Condition (1), w e giv e, in the b o dy of this pap er, reursions for u (0) i and v (0) i only . See App endix A for the reursions needed to ev aluate the lik eliho o d under rules in v olving Conditions (2-6). F or no des i and j , let δ i,j = e − µ ( t j − t i ) . Consider a pair of edges h c 1 , i i , h c 2 , i i in E . u (0) i =  (1 − δ i,c 1 ) + δ i,c 1 u (0) c 1   (1 − δ i,c 2 ) + δ i,c 2 u (0) c 2  (4) v (0) i =   δ i,c 1 v (0) c 1 + (1 − δ i,c 1 ) Y j ∈ V c 1 L ξ j     δ i,c 2 v (0) c 2 + (1 − δ i,c 2 ) Y j ∈ V c 2 L ξ j   (5) The reursion is ev aluated from the lea v es i ∈ V L , at whi h u (0) i = 1 − ξ i (6) v (0) i = 0 (7) W e no w giv e the equiv alen t reursions for λ R [ g ] P ω ∈ Ω a P [ M = ω a | Z = z a , g , µ ] dz a . Consider the set m a = T ω ∈ Ω a ω of lea v es known to ha v e a ognate in the a th registered ognay lass. Let E a b e the set of bran hes on the path from the most reen t ommon anestor of the lea v es in m a up to the A dam-no de A ab o v e the ro ot. Cognay lass a m ust ha v e b een b orn on an edge in E a . F or a = 1 , 2 , ..., N , lass M a is non-empt y and w e an shift the birth lo ation to the no de b elo w and on v ert the in tegral to a sum, λ Z [ g ] X ω ∈ Ω a P [ M = ω | Z = z a , g , µ ] dz a = λ µ X h i,j i∈ E a X ω ∈ Ω a P [ M = ω | Z = ( t i , i ) , g , µ ](1 − δ i,j ) . F or ea h a = 1 , 2 , ..., N and ω ∈ Ω a , let ω ( i ) = ω ∩ V ( i ) L and Ω ( i ) a = { ω ( i ) : ω ( i ) = ω ∩ V ( i ) L , ω ∈ Ω a } denote the set of all subsets ω ( i ) of the lea v es V ( i ) L whi h are ognay lasses onsisten t with the data a v ailable for those lea v es. Consider t w o  hild bran hes h c 1 , i i and h c 2 , i i at no de i . Sine Ω a = Ω ( c 1 ) a × Ω ( c 2 ) a , 10 and ev en ts are indep enden t along the t w o bran hes, X ω ∈ Ω a P [ M = ω | Z = ( t i , i ) , g , µ, ξ ] = X ω ( c 1 ) ∈ Ω ( c 1 ) a P [ M = ω ( c 1 ) | Z = ( t i , c 1 ) , g , µ ] × X ω ( c 2 ) ∈ Ω ( c 2 ) a P [ M = ω ( c 2 ) | Z = ( t i , c 2 ) , g , µ ] . Ha ving mo v ed the birth ev en t at ( t i , i ) to ( t i , c 1 ) and ( t i , c 2 ) (o the no de and on to its  hild edges) w e no w mo v e the birth ev en t at ( t i , c ) to ( t c , c ) (do wn an edge) as follo ws: X ω ∈ Ω ( c ) a P [ M = ω | Z = ( t i , c ) , g , µ ] =                  δ i,c × X ω ∈ Ω ( c ) a P [ M = ω | Z = ( t c , c ) , g , µ ] if Y (Ω ( c ) a ) ≥ 1 (1 − δ i,c ) + δ i,c × X ω ∈ Ω ( c ) a P [ M = ω | Z = ( t c , c ) , g , µ ] if Y (Ω ( c ) a ) = 0 and Q (Ω ( c ) a ) ≥ 1 (1 − δ i,c ) + δ i,c v (0) c if Y (Ω ( c ) a ) + Q (Ω ( c ) a ) = 0 ( i.e. Ω ( c ) a = {∅ } ) The reursion is ev aluated from the lea v es. If c is a leaf, then X ω ∈ Ω ( c ) a P [ M = ω | Z = ( t c , c ) , g , µ ] = ( 1 if Ω ( c ) a = {{ c } , ∅} or {{ c }} ( i.e. D c,a ∈ { ? , 1 } ) 0 if Ω ( c ) a = {∅ } ( i.e. D c,a = 0 ) . In order to restore atastrophes to this alulation, and giv en g ∈ Γ K , with k i atastrophes on edge h i, j i ∈ E , replae t j − t i with t j − t i + k i T C ( κ, µ ) throughout. 4 P osterior distribution Our prior on the birth rate λ , death rate µ and atastrophe rate ρ is p ( λ, µ, ρ ) ∝ 1 λµρ and w e tak e a uniform prior o v er [0 , 1] for the death probabilit y at a atastrophe κ and ea h missing data parameter ξ i . Substituting using equations (2)-(3 ) in to equation ( 1) and m ultiplying b y the prior f G ( g | T ) p ( λ, µ, ρ ) , w e obtain the p osterior distribution p ( g , µ, λ, κ, ρ, ξ | D = D ) = 1 N !  λ µ  N exp   − λ µ X h i,j i∈ E P [ E Z | Z = ( t i , i ) , g , µ, κ, ξ ](1 − e − µ ( t j − t i + k i T C ) )   × N Y a =1   X h i,j i∈ E a X ω ∈ Ω a P [ M = ω | Z = ( t i , i ) , g , µ ](1 − e − µ ( t j − t i + k i T C ) )   × 1 µλρ f G ( g | T ) e − ρ | g | ( ρ | g | ) k T k T ! L Y i =1 (1 − ξ i ) Q i ξ N − Q i i (8) 11 for parameters µ, λ, κ, ρ > 0 , 0 < ξ i < 1 and trees g ∈ Γ ( C ) K . The p osterior is improp er without b ounds on ρ sine k T = 0 is allo w ed. W e plae v ery onserv ativ e b ounds on ρ . Results are not sensitiv e to this  hoie. W e an sho w that, for 't ypial' data sets D , and in partiular the data analysed b elo w, the p osterior is then prop er. Details of the relationship b et w een ognate lasses and alibration onstrain ts pla y a role in the onditions for the p osterior distribution to b e prop er. 5 Mark o v Chain Mon te Carlo W e use Mark o v Chain Mon te Carlo to sample the p osterior distribution and estimate summary statistis. If the prior on the ognay lass birth rate parameter λ has the onjugate form λ u exp( − v λ ) then the onditional distribution of λ in the p osterior distribution ab o v e has the form λ N + u exp( − λ ( X + v )) . W e to ok the improp er prior u = − 1 , v = 0 for λ and in tegrated. The MCMC state is then x = (( E , V , t, k ) , µ, κ, ρ, ξ ) and the target distribution p ( x | D ) is the densit y obtained b y in tegrating the densit y in Equation 8 o v er λ . The MCMC sampler desrib ed in Ni holls and Gra y (2008 ) has state x = (( E , V , t ) , µ ) . W e add to the ( E , V , t ) - and µ -up dates of Ni holls and Gra y (2008 ) further MCMC up dates ating on the atastrophe v etor k = ( k 1 , ..., k L − 2 ) , on the atastrophe parameters κ, ρ and on ξ , the probabilit y parameter for observ able data-matrix elemen ts. The atastrophe rate parameter ρ is added to time-saling up dates in whi h subsets of parameters are sim ultaneously saled b y a ommon random fator s : if θ is a parameter in the saled subset ha ving units [ time ] u , then θ → s u θ . The probabilit y 0 < ξ i < 1 for an elemen t of the registered data matrix to b e observ able is, for man y leaf-languages, lose to one, so w e up date those parameters b y saling 1 − ξ i . W e inorp orate up dates adding and deleting atastrophes (the lled dots mark ed on the bran hes of Figure 1 ) plus an up date whi h mo v es a atastrophe from an edge to a paren t,  hild or sibling edge. F or the addition and deletion of atastrophes, w e do not need to use rev ersible jump Mark o v Chain Mon te Carlo, as the state v etor sp eies the n um b ers, and not the lo ations, of atastrophes on edges. W e omit the details of these mo v es but giv e, as an example, the up date that mo v es a atastrophe from an edge to a paren t,  hild or sibling edge. Let k T = P L − 2 i =1 k i giv e the total n um b er of atastrophes. Giv en a state x = ( g , µ, κ, ρ, ξ ) with g = ( V , E , t, k ) , w e pi k edge h i, j i ∈ E with probabilit y k i /k T . Let E h i,j i b e the set of edges neighb ouring edge h i, j i ( hild, sibling and paren t edges, but exluding the edge h R, A i ) and let q i = ard ( E h i,j i ) . W e ha v e in general q i = 4 . Ho w ev er, for i the index of a leaf no de, q i = 2 (1 paren t, 1 sibling, no  hildren). If j is the ro ot and i is non-leaf, then q i = 3 (1 sibling, 2  hildren) and if j is the ro ot and i is a leaf w e ha v e q i = 1 (a sibling edge). Cho ose a neigh b ouring edge h ˜ i, ˜ j i uniformly at random from E h i,j i and mo v e one atastrophe from h i, j i to h ˜ i, ˜ j i . The andidate state is x ′ = (( V , E , t, k ′ ) , µ, κ, ρ ) , with k ′ i = k i − 1 and k ′ ˜ i = k ˜ i + 1 and k ′ j = k j for j 6 = i, ˜ i . This mo v e is aepted with probabilit y α ( x ′ | x ) = min  1 , q i k ′ ˜ i p ( x ′ | D ) q ˜ i k i p ( x | D )  . W e assessed on v ergene with the asymptoti b eha viour of the auto orrelation for the parameters µ , κ , ρ and t R , as suggested b y Gey er (1992 ). This metho d indiated that w e ould use runs of ab out 10 million samples; w e also let the MCMC run for 100 million samples and  he k ed that the omputed statistis did not v ary . 12 6 V alidation W e made a n um b er of tests using syn theti data. Fitting the mo del to syn theti data sim ulated aording to the lik eliho o d P ( D = D | g , µ, λ, ρ, κ, ξ , D = R ( ˜ D )) , (in-mo del data), sho ws us just ho w informativ e the data is for atastrophe plaemen t, as w ell as making a debug- he k on our implemen tation. W e t out-of-mo del data also. These are syn theti data sim ulated under lik ely mo del-violation senarios, and are used to iden tify soures of systemati bias. W e summarise results for syn theti data sim ulated using the parameter v alues w e estimate in setion 7 on the data. F or in-mo del data w e orretly reonstrut top ology , ro ot age and the n um b er and p osition of atastrophes. F urther details are giv en in the Supplemen tary Material. 6.1 Mo del mis-sp eiation Ni holls and Gra y (2008 ) use out-of-mo del data represen ting un-mo deled loan-w ords (alled b orr owing ), rate-heterogeneit y in time and spae, rate heterogeneit y aross ognay lasses, and the empt y-eld ap- pro ximation. They disuss also mo del mis-sp eiation due to missing data and the inorret iden tiation of ognay lasses (in partiular, a hazard for deeply ro oted lasses to b e split). Rate-heterogeneit y in time and spae, and missing data are no w part of the in-mo del analysis. W e syn thesized out-of-mo del data with the b orro wing mo del of Ni holls and Gra y (2008 ), in order to see if un-mo deled b orro wing biased our results. F or what Ni holls and Gra y (2008 ) all lo w to mo derate lev els of b orro wing, w e w ere able to reonstrut true parameter v alues w ell. See the Supplemen tary Material for details. With high lev els of b orro wing, w e under-estimate the ro ot age and o v er-estimate rate parameters. Ho w ev er, uniden tied loan w ords in the registered data do not generate mo del misp eiation unless they are opies of ognates whi h o ur in the observ ed meaning ategories of languages anestral to the leaf-languages. It is not plausible that uniden tied b orro wing of this kind is presen t at high lev els. W e ha v e not rep eated the Ni holls and Gra y (2008 ) out-of-mo del analysis of rate heterogeneit y aross ognay lasses. In a real v o abulary , distint ognay lasses that share a meaning ategory w ould not ev olv e indep en- den tly . Also, a real language migh t b e exp eted to a p ossess a w ord in ea h of the ore meaning ategories. This onstrains the n um b er of ognay lasses in ea h meaning ategory to b e non-zero. Our mo del allo ws empt y meaning ategories. Ni holls and Gra y (2008 ) nd no substan tial bias in a t to out-of-mo del data resp eting the onstrain t. Our treatmen t of missing data in tro dues a new mis-sp eiation. W e mo deled matrix elemen ts as missing indep enden tly . Ho w ev er, w e get missing data when w e do not kno w the w ord used in a giv en language to o v er a giv en meaning ategory . Matrix elemen ts are as a onsequene t ypially missing in blo  ks orresp onding to all the ognay lasses for the giv en meaning. Beause the ages of p o orly reonstruted languages are w ell predited in the ross-v alidation study b elo w, w e ha v e not lo ok ed further at this issue. 6.2 Predition tests for alibration In the next setion w e estimate the age of a tree no de (the ro ot). In this setio n w e test to see if the unertain ties w e estimate are reliable. The alibration data desrib ed in Setion ( 1) x the top ologies and ro ot ages of the subtrees mark ed with bars in Figure 5). W e remo v e ea h alibration onstrain t in turn and use the data and the remaining onstrain ts to estimate the age of the onstrained no de, and the probabilit y for the onstrained subtree top ology . The top ologial onstrain ts w ere all p erfetly reonstruted. In t w en t y three of the t w en t y eigh t 13 tests the onstrained age in terv al o v erlaps the 95% HPD in terv al, as sho wn in the b ottom half of Figure 3 . Ho w go o d or bad are ea h of these preditions? Lo oking at the Hittite predition in Figure 3, the large predition unertain t y only just allo ws the alibration in terv al. Is this bad? W e quan tify the go o dness- of-t, for ea h alibration, using Ba y es fators to replae p -v alues as indies of mist. F or ea h onstrain t c = 1 , 2 , ..., C w e ompute a Ba y es fator measuring the supp ort for the fully onstrained mo del ompared to a mo del with just the c 'th onstrain t remo v ed. F or c = 1 , 2 , ..., C let Γ C − c = C \ c ′ =0 c ′ 6 = c Γ ( c ′ ) . denote the enlarged tree spae with the c 'th onstrain t remo v ed, and let Γ C − c K b e the enlarged tree spae, extended to inlude atastrophes (as in Setion ( 2.2 )). F or ea h onstrain t c = 1 , 2 , ..., C w e mak e a mo del omparison b et w een a ommon n ull mo del with all the onstrain ts, H 0 : g ∈ Γ C K , and an alternativ e mo del H 1 : g ∈ Γ C − c K with the onstrain t remo v ed. The Ba y es fator B C,C − c for the mo del omparison is the ratio of the p osterior probabilities for these mo dels with mo del prior P ( H 0 ) = P ( H 1 ) , B − c = P ( D | g ∈ Γ C K ) P ( D | g ∈ Γ C − c K ) = P ( D | g ∈ Γ C K ∩ Γ C − c K ) P ( D | g ∈ Γ C − c K ) = P ( g ∈ Γ C K | D , g ∈ Γ C − c K ) P ( g ∈ Γ C K | g ∈ Γ C − c K ) . where the seond line follo ws sine Γ C K ⊂ Γ C − c K and the third from the denition of the onditional probabilities. The n umerator P ( g ∈ Γ C K | D , g ∈ Γ C − c K ) is the p osterior probabilit y for the c 'th onstrain t to b e satised giv en the data and the other onstrain ts. The denominator, P ( g ∈ Γ C K | g ∈ Γ C K ) , is the prior probabilit y for the c 'th onstrain t to b e satised giv en the other onstrain ts. W e estimate these probabilities using sim ulation of the p osterior and prior distributions with onstrain t c remo v ed. The Ba y es fators are estimated with negligible unertain t y and 2 log ( B C,C − c ) is plotted for c = 1 , 2 , ..., C in the top half of Figure 3. Strong evidene against a alibration is failure at predition. T aking a Ba y es fator exeeding 12 (that is, 2 log ( B C,C − c ) & 5 in Figure 3) as strong evidene against the onstrain t, follo wing Raftery (1996 ), w e ha v e onit for three of the thirt y onstrain ts: the ages of Old Irish and A v estan, and for the age of the Balto-Sla v lade. As our analysis in Setion 7 sho ws, there is a high p osterior probabilit y that a atastrophe ev en t o urred on the bran h b et w een Old Irish and W elsh, and another b et w een Old P ersian and A v estan. The evidene for rate heterogeneit y in rest of the tree is so sligh t, that when w e try to predit these alibrations w e are prediting at ypial ev en ts. Our missing-data analysis has help ed here. The alibration in terv al for the Hittite v o abulary in these data is 32003700BP . A reonstrution of the age of Hittite whi h ignores missing data predits 602010BP , w ell outside of the onstrain ts. The 95% HPD in terv al for the age of Hittite in our mo del is 4303250BP , whi h just o v erlaps the onstrain t. The Ba y es fator giv es o dds less than 3:1 against, so the evidene against the onstrain t is `hardly w orth men tioning'. 14 8000 6000 4000 2000 0 −100 −10 −5 −2 0 2 5 10 100 HI TA TB LU LY OI UM OS LA GK AR GO ON OE OG OS PR AV PE VE CE IT GE WG NW BS BA IR II TG Figure 3: R e  onstrution of known no de ages: top, lo garithm of Bayes fators log( B C,C − c ) for c = 1 , 2 , ..., C ; b ottom, thin lines show age  onstr aints for dier ent no des, thik lines show 95% p osterior HPD interval for the r e  onstrute d dates. The no des ar e displaye d in the same or der as the  onstr aint in Fig. 5 15 armenian albanian oldirish welsh luvian oldnorse oldenglish oldhighgerman gothic lycian oldcslavonic latvian lithuanian oldprussian tocharian_a tocharian_b hittite greek vedic avestan oldpersian latin umbrian oscan 62 78 66 85 58 0 1000 2000 3000 4000 5000 6000 7000 8000 Figure 4: Consensus tr e e for the R inge et al. (2002 ) dataset. R e d dots show  atastr ophes supp orte d with pr ob ability ab ove one half. 7 Results In this setion w e presen t results for our MCMC sim ulation of the full p osterior, Equation ( 8). An upp er limit T = 16000 w as used in the tree prior of Setion 2.1. An y v alue for T exeeding around T = 100 00 w ould lead to the same results. W e sho w a onsensus tree in gure 4. In a tree, an edge orresp onds to a split partitioning the lea v es in to t w o sets. A onsensus tree displa ys just those splits presen t in at least 50% of the p osterior sample. Splits whi h reeiv e less than 95% supp ort are lab elled. Where no split is presen t in 50% of the p osterior sample, the onsensus tree is m ultifurating. The date sho wn for a no de is the a v erage p osterior date giv en the existene of the split; similarly , the n um b er of atastrophes sho wn on an edge is the a v erage p osterior n um b er of atastrophes on that edge giv en the existene of the split, rounded to the nearest in teger. Our estimates for the parameters are as follo ws: µ = 1 . 8 6 · 10 − 4 ± 1 . 47 · 10 − 5 deaths/y ear; κ = 0 . 361 ± 0 . 0 5 5 ; ρ = 6 . 4 · 10 − 5 ± 3 . 3 · 10 − 5 atastrophes/y ear (orresp onding to large but rare atastrophes: ab out 1 atastrophe ev ery 15,000 y ears, or an a v erage of 3.4 on the tree, with ea h atastrophe orresp onding to 2400 y ears of  hange). W e displa y in Figure 5 the alibration onstrain ts on a tree sampled from the p osterior. The onstrain ts annot b e sho wn on a onsensus tree, as slies aross a onsensus tree are not iso  hronous. The analysis reonstruts some w ell-kno wn features of the Indo-Europ ean tree. The Germani, Celti 16 oldcslavonic lithuanian latvian oldprussian vedic oldpersian avestan latin oscan umbrian welsh oldirish gothic oldhighgerman oldenglish oldnorse greek armenian lycian luvian hittite tocharian_b tocharian_a albanian 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 Figure 5: A typi al sample fr om the p osterior. A l l the  onstr aints on the no de ages ar e shown, ex ept those for the Itali, Indo-Ir anian and Ir anian gr oups, for whih we do not have an upp er b ound, and the r o ot, whih has an upp er b ound at T = 16000 ye ars BP. 17 and Itali families are group ed together, but no partiular onguration of their relativ e p ositions is fa v ored. The Indo-P ersian group an fall outside the Balto-Sla v group but the relativ e p osition of these t w o is unertain. The deep top ology of the tree is left quite unertain b y these data, esp eially the p osition of Albanian. W e nd evidene for atastrophi rate heterogeneit y in three p ositions: on the edges leading to Old Irish, Old P ersian, and in the Um brian-Osan lade. Our estimate for the ro ot age of the Indo-Europ ean family is 8430 ± 1320 y ears BP . The distribution of this k ey statisti is lose to normal. 8 Conlusions Our results giv e a ro ot age for the most reen t ommon anestor of the Indo-Europ ean family of language v o abularies in agreemen t with earlier ph ylogeneti studies. Our results are in agreemen t with mo dels whi h put this date around 8500 BP , and in onit with mo dels whi h require it to b e less than 6500 y ears BP . Our studies of syn theti out-of-mo del data, and reonstrution tests for kno wn historial data supp ort our view that this main result is robust to mo del error. It w ould not b e robust to a step  hange in the rate of lexial div ersiation ating in a o ordinated fashion aross the Indo-Europ ean languages extan t some 3000 to 5000 y ears ago. The metho ds outlined here for handling missing data and rate heterogeneit y in the div ersiation of languages, as seen through lexial data, will nd appliations to generi trait data. A Reursions for other registration pro esses This setion omplemen ts Setion 3 : w e giv e iterations for u (0) i , u (1) i , u ( s i − 1) i , u ( s i ) i , v ( s i − 1) i and v ( s i ) i . These are the quan tities needed (as in Equation 3 ) to ev aluate the sum in Equation 2, for registration rules whi h use Condition (1) in om bination with other onditions from Setion 2.3 . Consider a pair of edges h c 1 , i i , h c 2 , i i in E . In the notation of the text, u (0) i =  (1 − δ i,c 1 ) + δ i,c 1 u (0) c 1   (1 − δ i,c 2 ) + δ i,c 2 u (0) c 2  u (1) i = δ i,c 1 (1 − δ i,c 2 ) u (1) c 1 + δ i,c 2 (1 − δ i,c 1 ) u (1) c 2 + δ i,c 1 δ i,c 2 ( u (1) c 1 u (0) c 2 + u (0) c 1 u (1) c 2 ) u ( s i ) i = δ i,c 1 u ( s c 1 ) c 1 δ i,c 2 u ( s c 2 ) c 2 u ( s i − 1) i =  δ i,c 1 u ( s c 1 − 1) c 1 + I { s c 1 =1 } (1 − δ i,c 1 )  δ i,c 2 u ( s c 2 ) c 2 + δ i,c 1 u ( s c 1 ) c 1  δ i,c 2 u s c 2 − 1 c 2 + I { s c 2 =1 } (1 − δ i,c 2 )  v (0) i =   δ i,c 1 v (0) c 1 + (1 − δ i,c 1 ) Y j ∈ V c 1 L ξ j     δ i,c 2 v (0) c 2 + (1 − δ i,c 2 ) Y j ∈ V c 2 L ξ j   v ( s i ) i =   δ i,c 1 v ( s c 1 ) c 1 + (1 − δ i,c 1 ) Y j ∈ V c 1 L (1 − ξ j )     δ i,c 2 v ( s c 2 ) c 2 + (1 − δ i,c 2 ) Y j ∈ V c 2 L (1 − ξ j )   v ( s i − 1) i =  δ i,c 1 v ( s c 1 − 1) c 1 + (1 − δ i,c 1 ) s c 1 (1 − ξ ) s c 1 − 1 ξ   δ i,c 2 v ( s c 2 ) c 2 + (1 − δ i,c 2 )(1 − ξ ) s c 2  +  δ i,c 1 v ( s c 1 ) c 1 + (1 − δ i,c 1 )(1 − ξ ) s c 1   δ i,c 2 v ( s c 2 − 1) c 2 + (1 − δ i,c 2 ) s c 2 (1 − ξ ) s c 2 − 1 ξ  18 The reursion is ev aluated from the lea v es i ∈ V L , at whi h u (0) i = u ( s i − 1) i = 1 − ξ i u (1) i = u ( s i ) i = ξ i v (0) i = v ( s i − 1) i = 0 v ( s i ) i = 1 Referenes Aleksey enk o, A., C. Lee, and M. Su hard (2008). W agner and Dollo: A Sto  hasti Duet b y Comp osing T w o P arsimonious Solos. Systemati Biolo gy 57 (5), 772784. Dy en, I., J. Krusk al, and B. Bla k (1997). FILE IE-D A T A1. Ra w data a v ailable from h ttp://www.n tu.edu.au/eduation/langs/ielex/IE-D A T A1. Binary data a v ailable from h ttp://www.psy h.au kland.a.nz/psy h/resear  h/ RusselsData .h tm. F elsenstein, J. (1981). Inferring phylo genies . Sinauer Asso iates Sunderland, Mass., USA. Garrett, A. (2006). Con v ergene in the formation of Indo-Europ ean subgroups: Ph ylogen y and  hronology. Phylo geneti metho ds and the pr ehistory of languages , 139. Gey er, C. (1992). Pratial Mark o v Chain Mon te Carlo. Statisti al Sien e 7 (4), 473483. Gra y , R. and Q. A tkinson (2003). Language-tree div ergene times supp ort the Anatolian theory of Indo- Europ ean origin. Natur e 426 (6965), 435439. Lewis, P . (2001). A Lik eliho o d Approa h to Estimating Ph ylogen y from Disrete Morphologial Charater Data. Systemati Biolo gy 50 (6), 913925. Mallory , J. (1989). In se ar h of the Indo-Eur op e ans: language, ar hae olo gy and myth . Thames and Hudson. MMahon, A. and R. MMahon (2005). L anguage Classi ation by Numb ers . Oxford Univ ersit y Press. Ni holls, G. K. and R. D. Gra y (2008). Dated anestral trees from binary trait data and its appliation to the div ersiation of languages. Journal of the R oyal Statisti al So iety, series B 70 (3), 545566. Raftery , A. (1996). Hyp othesis testing and mo del seletion. In W. Gilks, S. Ri hardson, and D. Spiegel- halter (Eds.), Markov Chain Monte Carlo in Pr ati e . Chapman & Hall / CR C. Renfrew, C. (1987). Ar haeology and Language. The Puzzle of Indo-Europ ean Origins. Curr ent A nthr o- p olo gy 29 , 437441. Ringe, D., T. W arno w, and A. T a ylor (2002). Indo-Europ ean and Computational Cladistis. T r ansations of the Philolo gi al So iety 100 (1), 59129. Ronquist, F., J. Huelsen b e k, and P . v an der Mark (2005). MrBa y es 3.1 Man ual. Sho ol of Computational Sien e, Florida State University . W arno w, T., S. Ev ans, D. Ringe, and L. Nakhleh (2004). A Sto  hasti mo del of language ev olution that inorp orates homoplasy and b orro wing. Phylo geneti Metho ds and the Pr ehistory of L anguages . 19

Missing data in a stochastic Dollo model for cognate data, and its application to the dating of Proto-Indo-European

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment