Phase transition on the convergence rate of parameter estimation under an Ornstein-Uhlenbeck diffusion on a tree

Phase transition on the con v er gence rate of parameter estimation under an Ornstein-Uhlenbeck dif fusion on a tree C ´ ecile An ´ e ∗ Lam Si T ung Ho † Sebastien Roch ‡ Abstract Dif fusion processes on trees are commonly used in e v olutionary biology to model the joint distribution of continuous traits, such as body mass, across species. Estimating the param- eters of such processes from tip values presents challenges because of the intrinsic correla- tion between the observ ations produced by the shared e volutionary history , thus violating the standard independence assumption of large-sample theory . For instance Ho and An ´ e [ 18 ] re- cently prov ed that the mean (also known in this conte xt as selection optimum) of an Ornstein- Uhlenbeck process on a tree cannot be estimated consistently from an increasing number of tip observ ations if the tree height is bounded. Here, using a fruitful connection to the so-called reconstruction problem in probability theory , we study the con v ergence rate of parameter es- timation in the unbounded height case. For the mean of the process, we provide a necessary and sufﬁcient condition for the consistency of the maximum likelihood estimator (MLE) and establish a phase transition on its con ver gence rate in terms of the growth of the tree. In par- ticular we show that a loss of √ n -consistency (i.e., the v ariance of the MLE becomes Ω( n − 1 ) , where n is the number of tips) occurs when the tree gro wth is larger than a threshold related to the phase transition of the reconstruction problem. For the cov ariance parameters, we giv e a nov el, efﬁcient estimation method which achie ves √ n -consistency under natural assumptions on the tree. Our theoretical results provide practical suggestions for the design of comparati ve data collection. K eywords Ornstein-Uhlenbeck, phase transition, e v olution, phylogenetic, consistency , max- imum likelihood estimator . 1 Intr oduction Analysis of data collected from multiple species presents challenges because of the intrinsic cor- relation produced by the shared ev olutionary history . This dependency structure can be modeled ∗ Departments of Statistics and of Botan y , Uni versity of Wisconsin-Madison. W ork supported by NSF grants DMS- 1106483. † Departments of Statistics, Univ ersity of W isconsin-Madison. ‡ Departments of Mathematics and Statistics (by courtesy), Uni versity of Wisconsin-Madison. W ork supported by NSF grants DMS-1007144 and DMS-1149312 (CAREER), and an Alfred P . Sloan Research Fellowship. 1 by assuming that the traits of interest e v olved along a phylogen y according to a stochastic process. T wo commonly used processes for continuous traits, such as body mass, are Brownian motion (BM) and the Ornstein-Uhlenbeck (OU) process. BM is used to model neutral e volution, with no fa v ored direction (see e.g. [ 13 ]). On the other hand, the OU process can account for natural selec- tion using two extra parameters: a “selection optimum” µ tow ards which the process is attracted and a “selection strength” α [ 15 ]. The OU process has a stationary distrib ution, which is Gaussian with mean µ and v ariance γ = σ 2 / 2 α . The presence of natural selection can be detected by testing whether α > 0 (e.g. [ 17 ]). Changes in µ across dif ferent groups of or ganisms are used to correlate changes in selection re gime with changes in beha vior or en vironmental conditions (see e.g. [ 9 , 5 ]). For instance, the optimal body size µ might be different for terrestrial animals than for birds and bats. In practice, µ , α and the inﬁnitesimal v ariance σ 2 (or stationary variance γ ) are estimated from data on extant species. In other w ords, only data at the tips of the tree are av ailable. The process at internal nodes and edges is unobserv ed. Also, the tree is reconstructed independently from external and ab undant data, typically from DN A sequences. In practice there can be some uncertainty about a fe w nodes in the tree, but we assume here that the tree is kno wn without error . The OU process on a tree has been used e xtensi vely in practice (see e.g. [ 9 , 10 , 8 , 27 ]), b ut very fe w authors ha ve studied con v er gence rates of av ailable estimators. Recently Ho and An ´ e [ 18 ] showed that if the tree height is bounded as the sample size goes to inﬁnity , no estimator for µ can e ver be consistent. This is because µ is not “microer godic”: the distribution P µ of the whole observ able process ( Y i ) i ≥ 1 at the tips of the tree is such that P µ 1 and P µ 2 are not orthogonal for any values µ 1 6 = µ 2 , if the tree height is bounded. This boundedness assumption does not hold for common models of ev olutionary trees howe v er , such as the pure-birth (Y ule) process [ 31 ]. W e consider here the case of an unbounded tree height. W e study the consistency and con ver gence rates of several estimators, including some nov el estimators, using tools from the literature on the reconstruction problem in probability theory . In particular we relate the con vergence rates of these estimators to the growth rate of the phylogeny . This connection is natural gi ven that the gro wth rate (and the related branching number) is known to play an important role in the analysis of a variety of stochastic processes including random walks, percolation and ancestral state reconstruction on trees [ 26 ]. In particular we leverage a useful characterization of the variance of linear estimators in terms of electrical networks. Main r esults W e present the asymptotic properties of two common estimators for µ : the sample mean and the maximum likelihood estimator (MLE). Conditional on the tree, the MLE ˆ µ ML is kno wn to be the best linear unbiased estimator for µ assuming that α is kno wn. (The assumption of kno wn α is proved not to be restrictive for our con v er gence rate results if α can be well estimated.) In fact, we giv e an example when ˆ µ ML performs signiﬁcantly better than the sample mean, which is not consistent in that particular case. In one of our main results, we identify a necessary and suf ﬁcient condition for the consistency of ˆ µ ML . W e also deri ve a phase transition on its con ver gence rate, which drops from √ n -consistency (i.e. the variance is O ( n − 1 ) ) to a lower rate, n being the number of samples (i.e. tip observations). This phase transition depends on the growth rate of the tree. T ree growth measures the rate at which new leav es arise as the tree height increases (see Section 2 for a formal deﬁnition). Roughly , when the growth rate is below 2 α , we show that √ n - consistency holds. This is intuitiv e as a lower gro wth rate means lower correlations between the leaf states. On the other hand, when the gro wth rate is abov e 2 α implying a sample size n  e 2 αT , 2 i.e. when the tree is suf ﬁciently “b ushy , ” then the “ef fecti v e sample size” is reduced to n eﬀ = e 2 αT and the √ n -consistency of ˆ µ ML is lost. W e also pro vide novel, efﬁcient estimators for the other two parameters, α and γ , which achiev e √ n -consistency and do not require the kno wledge of µ . Interestingly , the √ n -consistency in this case is not affected by gro wth rate, unlike the case of the MLE for µ . Our results lead to a practical method to assess whether additional species are informati ve or not, thus helping researchers to a void wasting money and ef fort. Section 3 presents simulations to illustrate these suggestions. Our main results are stated formally and further discussed in Section 2 , after necessary deﬁnitions. Their proofs are found in Section 4 . Related work Bartoszek and Sagitov [ 6 ] obtained a corresponding phase transition for the con- ver gence rate of the sample mean to estimate µ , assuming a Y ule process for the tree. Phase transitions for the con v er gence rate of some U-statistics ha v e also been obtained for the OU model when the tree follo ws a supercritical branching process [ 1 , 2 ]. A main dif ference between these studies and our work is that we assume that the tree is kno wn. Even though tree-free estimators are the only practical options when the tree is unknown, this situation is now becoming rare due to the ev er -gro wing av ailability of sequence data for b uilding trees. For instance Cra wford and Suchard [ 11 ] acknowledge that “as e v olutionary biologists further reﬁne our kno wledge of the tree of life, the number of clades whose phylogen y is truly unkno wn may diminish, along with interest in tree-free estimation methods. ” As we mentioned, related phase transitions hav e been obtained for other processes on trees. For instance, the growth rate of the tree determines whether the state at the root can be reconstructed better than random for a binary symmetric channel on a binary tree (see e.g. [ 12 ] and references therein). In a recent result, Mossel and Steel [ 24 ] established a transition for ancestral state recon- struction by majority rule for the binary symmetric model on a Y ule tree at the same critical point as above. Note that majority rule is a tree-free estimator like the sample mean in [ 6 ], but adapted to discrete traits. In the conte xt of the OU model, Mossel et al. [ 23 ] obtained a phase transition for estimating the ancestral state at the root, with the same critical growth rate we deri v e in our results. 2 Deﬁnitions and statements of r esults In this section, we state formally and further explain our main results. First, we deﬁne our model and describe the setting in which our results are prov ed. 2.1 Model Our main model is a stochastic process on a species tree T . Let T = ( E , V ) be a ﬁnite tree with leaf set L = { 1 , . . . , n } and root ρ . The leaves typically correspond to extant species. W e think of the edges of T as being oriented a way from the root. T o each edge (or branch) b ∈ E of the tree is associated a positi ve length | b | > 0 corresponding to the time elapsed between the endpoints of b . For any two vertices u, v ∈ V , we denote by d uv the distance between u and v in T , that is, the sum of the branch lengths on the unique path between u and v . W e assume that the species tree is ultrametric , that is, that the distance from the root to ev ery leaf is the same. It implies that, for any two tips i, j ∈ L , d ij is twice the time to the most recent common ancestor of i and j from 3 the leav es. W e let T be the height of T , that is, the distance between the root and any leaf, and we deﬁne t ij = T − d ij 2 . Thr oughout we assume that the species tr ee is known. W e consider an Ornstein-Uhlenbeck (OU) process on T . That is, on each branch of T , we ha ve a dif fusion d Y t = − α ( Y t − µ ) dt + σ dB t , where B t is a standard Brownian motion (BM). In the literature on continuous traits, Y t is known as the response v ariable, µ is the selection optimum, α > 0 is the selection strength, σ > 0 is the scale parameter of the Bro wnian motion. W e assume that the root value follo ws the stationary Gaussian distribution N ( µ, γ ) , where γ = σ 2 2 α . At each branching point, we run the process independently on each descendant edge starting from the v alue at the branching. Equiv alently , the column vector of observations Y = ( Y ` ) ` ∈ L at the tips of the tree are Gaussian with mean µ and variance matrix Σ = γ V T where ( V T ) ij = e − αd ij . W e assume thr oughout that α , µ and σ ar e the same on every branch of T . W e will specify below whether these parameter s ar e known, depending on the context. Parameter estimators Our interest lies in estimating the parameters of the model, gi ven T , from a sample of Y . In addition to proposing ne w estimators for α and σ , we study common estimators of µ . In particular we consider the empirical average at the tips Y = 1 0 Y /n, where 1 denotes the all-ones vector and v 0 denotes the transposes of a vector or matrix v . Also, the MLE of µ given the tr ee and α is ˆ µ ML = ( 1 0 V − 1 T 1 ) − 1 1 0 V − 1 T Y , which is the well-known generalized least squares estimator for the linear regression problem Y = µ 1 + ε , where ε is multi v ariate normal with co variance matrix Σ (see e.g. [ 3 ]). Note that the mean squared error is gi ven by V ar T [ ˆ µ ML ] = ( 1 0 V − 1 T 1 ) − 2 1 0 V − 1 T Σ ( V − 1 T ) 0 1 = γ ( 1 0 V − 1 T 1 ) − 1 . (1) W e drop the T in V ar T when the tree is clear from the context. The estimators Y and ˆ µ ML are both linear estimators. It is useful to think of the MLE in this context as an unbiased linear estimator minimizing the mean squared error (that is, a best linear unbiased estimator), which follo ws from the Gauss-Marko v Theorem [ 29 ]. 2.2 Asymptotic setting Our results are asymptotic. Speciﬁcally , we consider sequences of trees T = ( T k ) k ≥ 1 with ﬁxed parameters α, µ, σ . For k ≥ 1 , let n k be the number of lea ves in T k and T k be the height of T k . As before, we denote the leaf set of T k as L k = [ n k ] . Assumption 1 (Unboundedness) . Thr oughout we assume that n k ≤ n k +1 , T k ≤ T k +1 , and that n k → + ∞ and T k → + ∞ as k → + ∞ . For such a sequence of trees and a corresponding sequence of estimators, say X k , we recall v arious desirable asymptotic properties of X k . 4 Deﬁnition 1 (Consistency) . Let ( X k ) k be a sequence of estimator s for a parameter x . W e say that ( X k ) k is consistent for x if X k con ver ges in pr obability to x , denoted as | X k − x | = o p (1) . F or β > 0 , we say that ( X k ) k is ( n β k ) -consistent for x if ( n β k ( X k − x )) k is bounded in pr obability , whic h we denote as | X k − x | = O p ( n − β k ) . W e also recall the following notation. Let ( x k ) k and ( y k ) k be two sequences of real numbers. W e let y k = O ( x k ) if there exists C 1 > 0 such that | y k | ≤ C 1 | x k | ; y k = Ω( x k ) if there exists C 2 > 0 such that | y k | ≥ C 2 | x k | ; and y k = Θ( x k ) if y k = O ( x k ) and y k = Ω( x k ) . Gro wth Our asymptotic results depend on how fast the tree grows. W e ﬁrst provide some intu- ition through a toy e xample. Example 1 (Star tree: A ﬁrst phase transition) . Let T k be a star tr ee with n k leaf edges of length T k emanating fr om the r oot. By symmetry , 1 is an eigen vector of Σ with eigen value λ k = γ [1 + ( n k − 1) e − 2 αT k ] . Hence, 1 is also an eigen vector of Σ − 1 with eigen value λ − 1 k and 1 0 Σ − 1 1 = n k λ − 1 n k , so that ˆ µ ( k ) ML = Y and V ar[ ˆ µ ( k ) ML ] = λ k n k = γ  e − 2 αT k + 1 − e − 2 αT k n k  . (2) If both n k and T k → + ∞ , then V ar[ ˆ µ ( k ) ML ] → 0 and the MLE (and Y ) is consistent for µ . Further - mor e, if lim inf k 2 αT k log n k > 1 , then n k V ar[ ˆ µ ( k ) ML ] ≤ γ [ n k e − 2 αT k + 1] = O (1) and the MLE is √ n k -consistent (by an application of Chebyshev’ s inequality). On the other hand, if lim inf k 2 αT k log n k < 1 , then n k V ar[ ˆ µ ( k ) ML ] ≥ γ [ n k e − 2 αT k ] , which goes to + ∞ along a subsequence, and the MLE is not √ n k -consistent (using that ˆ µ ML is unbiased and normally distributed). T o study more general trees, we use sev eral standard notions of growth, which play an impor- tant role in random walks, percolation and ancestral state reconstruction on trees (see e.g. [ 26 ]). Deﬁnition 2 (Growth) . The lower gro wth and upper gro wth of a tr ee sequence T ar e deﬁned r espectively as Λ g = lim inf k log n k T k , and Λ g = lim sup k log n k T k . In case of equality we deﬁne the gr owth Λ g = Λ g = Λ g . (Note that our deﬁnition differ s slightly fr om [ 26 ] in that we consider the “exponential r ate” of gr owth.) 5 That is, for all  > 0 , e ventually e (Λ g −  ) T k ≤ n k ≤ e (Λ g +  ) T k , and along appropriately chosen subsequences n k j ≥ e (Λ g −  ) T k j and n k 0 j ≤ e (Λ g +  ) T k 0 j . W e also need a stronger notion of gro wth. For a tree T , thinking of the branches of T as a continuum of points , a cutset π is a set of points of T such that all paths from the root to a leaf must cross π . Let Π k be the set of cutsets of T k . Deﬁnition 3 (Branching number) . The branching number of T is deﬁned as Λ b = sup ( Λ ≥ 0 : inf k,π ∈ Π k X x ∈ π e − Λ δ k ( ρ,x ) > 0 ) , wher e δ k ( ρ, x ) is the length of the path fr om the r oot to x in T k . Because the leaf set L k forms a cutset, it holds that Λ b ≤ Λ g ≤ Λ g . Unlike the gro wth, the branching number takes into account aspects of the “shape” of the tree. Example 2 (Star tree sequence, continued) . Consider again the setup of Example 1 . The inﬁmum inf π ∈ Π k X x ∈ π e − Λ δ k ( ρ,x ) , is ac hieved by taking π = L k for e very k . Hence Λ b = Λ g . W e showed in Example 1 that the MLE of µ given α is √ n k -consistent if Λ g < 2 α , b ut not √ n k -consistent if Λ g > 2 α . Finally , we will need a notion of uniform growth. Deﬁnition 4 (Uniform gro wth) . Let T = ( T k ) k be a tr ee sequence. F or any point x in T k , let n k ( x ) be the number of leaves below x and let T k ( x ) be the distance fr om x to the leaves. Then the uniform gro wth of T is deﬁned as Λ ug = lim M → + ∞ sup k,x ∈ T k log n k ( x ) T k ( x ) ∨ M . (The purpose of the M in the denominator is to alleviate boundary effects.) 2.3 Statement of r esults W e can now state our main results. Results concerning the mean µ W e ﬁrst giv e a characterization of the consistency of the MLE of µ . In words, the MLE sequence is consistent if, in the limit, we can ﬁnd arbitrarily many descendants, arbitrarily far away from the leaves. This theorem is pro ved in Section 4.2 , along with a related result in volving the branching number . 6 Theorem 1 (Consistency of ˆ µ ML ) . Let ( T k ) k be a sequence of tr ees satisfying Assumption 1 . Let ( ˆ µ ( k ) ML ) k be the corr esponding sequence of MLEs of µ given α . Denote by ˜ π k t the cutset of T k at time t away from the leav es and let T k be the height of T k . Then ( ˆ µ ( k ) ML ) k is consistent for µ if and only if for all s ∈ (0 , + ∞ ) lim inf k   ˜ π k s   = + ∞ . (3) W e further obtain bounds on the variance of the MLE to characterize the rate of con vergence of the MLE. In particular we gi ve conditions for √ n k -consistency . W e sho w that the latter under goes a phase transition, generalizing Example 2 . When the upper growth is abo ve 2 α , we sho w that the MLE of µ cannot be √ n k -consistent. If further the branching number is above 2 α , we gi ve tight bounds on the con vergence rate of the MLE. Roughly we sho w that, in the latter case, the v ariance behav es lik e n 2 α/ Λ g k . Or perhaps a more accurate way to put it is that the “effecti ve number of samples” n eﬀ k is e 2 αT k , in the sense that V ar T k [ ˆ µ ( k ) ML ] = Θ(( n eﬀ k ) − 1 ) . Theorem 2 (Loss of √ n k -consistency for ˆ µ ML : Supercritical re gime) . Let ( T k ) k be a tr ee se- quence. If Λ g > 2 α , then for all  > 0 ther e is a subsequence ( k j ) j along which V ar T k j [ ˆ µ ( k j ) ML ] ≥ γ n − 2 α/ (Λ g −  ) k j . (4) In particular ( ˆ µ ( k ) ML ) k is not √ n k -consistent. If, further , 1. Λ b > 2 α : then V ar T k [ ˆ µ ( k ) ML ] = Θ  e − 2 αT k  . Mor eover in terms of n k , for all  > 0 , ther e ar e constants 0 < C 0 , C < + ∞ such that C 0 n − 2 α/ (Λ g −  ) k ≤ V ar T k [ ˆ µ ( k ) ML ] ≤ C n − 2 α/ (Λ g +  ) k , (5) and, in addition to ( 4 ) , ∃ subsequence ( k 0 j ) j , s.t. V ar T k 0 j [ ˆ µ ( k 0 j ) ML ] ≤ γ n − 2 α/ (Λ g +  ) k 0 j . 2. Λ b < 2 α : then, for all  > 0 , ther e ar e constants 0 < C 0 , C < + ∞ such that C 0 n − 2 α/ (Λ g −  ) k ≤ V ar T k [ ˆ µ ( k ) ML ] ≤ C n − (Λ b −  ) / (Λ g +  ) k , (6) wher e the lower bound in ( 6 ) above holds pr ovided Λ g > 0 , and ∃ subsequence ( k 0 j ) j , s.t. V ar T k 0 j [ ˆ µ ( k 0 j ) ML ] ≤ γ n − (Λ b −  ) / (Λ g +  ) k 0 j . The following example shows that, when Λ b < 2 α , the upper bound in ( 6 ) may not be achie ved, but cannot be impro ved in general. Example 3 (T wo-le vel tree) . Let ( T k ) k be a tr ee sequence with two le vels of nodes below the r oot: D ( k ) 0 = e Λ 0 τ ( k ) 0 nodes ar e attached to the r oot by edg es of length τ ( k ) 0 = σ T k , for some arbitrary choice of tr ee height T k → ∞ and 0 < σ < 1 . Each of these D ( k ) 0 nodes has itself D ( k ) 1 = e Λ 1 τ ( k ) 1 childr en along edges of length τ ( k ) 1 = (1 − σ ) T k , and these form the leaves of T k . 7 Proposition 1. F or 0 < Λ 0 < Λ 1 and T = ( T k ) k described above , we have that Λ b = Λ 0 , Λ g = σ Λ 0 + (1 − σ )Λ 1 , and V ar T k [ ˆ µ ( k ) ML ] = γ e − 2 αT k + γ (1 − e − 2 ασ T k ) e − ( σ Λ 0 +(1 − σ )2 α ) T k + γ (1 − e − 2 α (1 − σ ) T k ) e − Λ g T k . (7) This pr oposition is pr oved in Section 4.3 . It implies that if 2 α ≤ Λ 0 = Λ b , the dominant term in the variance is γ e − 2 αT k = γ n − 2 α/ Λ g k , as pr edicted by ( 5 ) in Theor em 2 . If instead 2 α ≥ Λ 1 , the dominant term in the variance is γ e − Λ g T k = γ n − 1 k , and we have √ n k -consistency . In the intermediate case when Λ 0 < 2 α < Λ 1 , the dominant term in the variance is γ e − ( σ Λ 0 +(1 − σ )2 α ) T k = γ n − σ (Λ b / Λ g ) − (1 − σ )(2 α/ Λ g ) k . Ther efor e, depending on the value of σ , we can get the full r ange of exponent values between − 2 α/ Λ g and − Λ b / Λ g , as given in ( 6 ) . In the other direction when Λ g < 2 α , the picture is somewhat murkier . For example, by taking σ close enough to 1 in Example 3 , it is possible to ha ve Λ g < 2 α , yet not √ n k -consistency . The issue in Example 3 is the inhomogeneous gro wth rate. Ho we ver , under e xtra regularity conditions, √ n k - consistency can be established. In w ords, the gro wth of the tree must be sufﬁciently homogeneous. In Theorem 3 below , we consider imposing the extra condition Λ b = Λ g , which does not hold in Example 3 . Theorem 3 (Con ver gence rate of ˆ µ ( k ) ML : Subcritical regime) . Let ( T k ) k be a tr ee sequence with Λ g < 2 α . Then V ar T k [ ˆ µ ( k ) ML ] = Ω  n − 1 k  . Further if: 1. Λ b = Λ g > 0 then, for all  > 0 , V ar T k [ ˆ µ ( k ) ML ] = O  n − (1 −  ) k  . 2. Λ ug < 2 α then V ar T k [ ˆ µ ( k ) ML ] = O  n − 1 k  . Theorems 2 and 3 are prov ed in Section 4.3 . All our results on the estimation of µ lev erage a useful characterization of the v ariance of linear estimators in terms of electrical networks. An analogous characterization is used in ancestral state reconstruction [ 26 ]. Note that our results are not as clean as those obtained for ancestral state reconstruction. As Example 3 sho wed, estimation of µ is somewhat sensitiv e to the “homogeneity” of the growth. In Section 4.5 , we sho w that assuming α is kno wn is inconsequential, provided a good estimate of α is a v ailable. Such an estimate is discussed next. Results concerning the parameters α and γ Our main result for α and γ is a √ n k -consistent estimator under the following assumption: there are two separate “bands” of node ages, each containing a number of internal nodes gro wing linearly with the number of leav es. Assumption 2 (Linear-sized bands) . Deﬁne n k ( c, c 0 ) as the number of nodes in T k of age (height fr om the leaves) in ( c, c 0 ) . Assume that ther e ar e constants β > 0 and 0 < c 1 < c 0 1 < c 2 < c 0 2 < ∞ such that n k ( c i , c 0 i ) ≥ β n k , i = 1 , 2 , for all k lar ge enough. As shown in Corollary 4 , this assumption holds for the Y ule process, a speciation model fre- quently used in practice. 8 Theorem 4 (Estimating α and γ : √ n k -consistency) . Let ( T k ) k be a sequence of ultrametric trees satisfying Assumptions 1 and 2 . Then ther e is an estimator ( ˆ α k , ˆ γ k ) k of ( α, γ ) such that | ˆ α k − α | = O p ( n − 1 / 2 k ) and | ˆ γ k − γ | = O p ( n − 1 / 2 k ) . The proof, found in Section 5.2 , is based on the common notion of contrasts. Assumption 2 ensures the existence of an appropriate set of such contrasts. The ke y point is that this extra assumption can be satisﬁed no matter what the gro wth and branching number are, indicating that the estimation of α and γ is unaffected by the growth of the tree unlike µ . Intuiti vely , µ is a more “global” parameter . 2.4 Special cases W e apply here the results stated in Section 2.3 to a number of scenarios. The tree of life naturally gi ves rise to two types of tree sequences. If one imagines sampling an increasing number of contemporary species, one obtains a nested sequence, deﬁned as follo ws. Deﬁnition 5 (Nested sequence) . A sequence of tr ees T = ( T k ) k is nested if, for all k , n k = k and T k r estricted to the ﬁrst k − 1 species is identical to T k − 1 as an ultrametric. An example of nested trees is gi ven by a caterpillar sequence. Example 4 (Caterpillar sequence) . Let ( t k ) k be a sequence of nonne gative numbers such that lim sup k t k = + ∞ . Let T 1 be a one-leaf star with height T 1 = t 1 . F or k > 1 , let T k be the caterpillar-lik e tr ee obtained by adding a leaf edg e with leaf k to T k − 1 at height t k on the path between leaf 1 and the r oot of T k − 1 , if t k ≤ T k − 1 . If instead t k > T k − 1 , cr eate a new r oot at height t k with an edge attac hed to the r oot of T k − 1 and an edge attac hed to k (see F igur e 1 ). t 0 t 1 t 2 ... t k − 1 t k T k − 1 T k Figure 1: Example of a sequence of nested caterpillar trees. Corollary 1 (Nested sequence: consistenc y of ˆ µ ML ) . Let T be a nested sequence such that the height T k goes to inﬁnity . Then T satisﬁes Assumption 1 and the MLE for µ is consistent on T . Pr oof. Let k j be the subsequence such that T k j +1 > T k j for every j and T i = T k j for all i = k j + 1 , . . . , k j +1 − 1 . Then, for all s ∈ (0 , + ∞ ) , as k goes to + ∞ π k s e ventually contains all lea ves k j such that T k j ≥ s . Since T k → + ∞ , the result follows. 9 If one is modeling the growth of the tree of life in time, instead of modeling increased sampling of contemporary species, one obtains a growing sequence as follows. Let T 0 be a rooted inﬁnite tree of bounded de gree, with branch lengths and no leaves. Think of the branches of T 0 as a continuum of points whose distance from the endpoints gro ws linearly . Then, for t ≥ 0 , we deﬁne B t ( T 0 ) as the tree made of the set of points of T 0 at distance at most t from the root. Deﬁnition 6 (Growing sequence) . A sequence of tr ees ( T k ) k is a growing sequence of tr ees if ther e is an inﬁnite tr ee T 0 as above and an incr easing sequence of non-ne gative r eals ( t k ) k such that T k is isomorphic to B t k ( T 0 ) as an ultrametric. Corollary 2 (Gro wing sequence: consistency of ˆ µ ML ) . Let ( T k ) k be a gr owing sequence such that the height T k = t k goes to inﬁnity . Then T satisﬁes Assumption 1 and the MLE for µ is consistent on ( T k ) k . Pr oof. Fix s ∈ (0 , + ∞ ) . For L = 1 , 2 , . . . , let k 0 L be the smallest k such that n k ≥ L and let k 00 L be the smallest k > k 0 L such that T k ≥ T k 0 L + s . Then, for all k ≥ k 00 L , | π k s | ≥ L . Letting L go to + ∞ gi ves the result. Example 5 (Y ule sequence) . Let T 0 be a tr ee generated by a pur e-birth (Y ule) pr ocess with rate λ > 0 : starting with one lineage, each current lineage splits independently after an exponential time with mean λ − 1 (see e.g . [ 28 ]). F or any (possibly random) sequence of incr easing non-ne gative r eals ( t k ) k with t k → + ∞ , B t k ( T 0 ) (that is, T 0 run up to time t k ), forms a gr owing sequence. The follo wing result is prov ed in Section 4.4 . Corollary 3 (Y ule model: consistency of ˆ µ ML ) . Let ( T k ) k be a Y ule sequence with rate 0 < λ < + ∞ . Then, with pr obability 1 (on the g eneration of T 0 ), 1. ( ˆ µ ( k ) ML ) k is consistent. 2. If λ < 2 α , ( ˆ µ ( k ) ML ) k is √ n k -consistent. 3. If λ > 2 α , ( ˆ µ ( k ) ML ) k is not √ n k -consistent and for all  > 0 ther e is 0 < C 0 , C < + ∞ such that C 0 n − 2 αλ − 1 −  k ≤ V ar T k [ ˆ µ ( k ) ML ] ≤ C n − 2 αλ − 1 +  k . W e also apply the estimators ˆ α and ˆ γ to the Y ule model. For simplicity , we take the sequence of times at which ne w speciation e vents occur (although this assumption is not crucial). For k ≥ 1 , let t k be the ﬁrst time at which T 0 has k + 1 lineages. Then n k = k for all k and t k → + ∞ so that Assumption 1 is satisﬁed. The following result is pro ved in Section 4.4 . Corollary 4 (Y ule model: estimation of α and γ ) . Let ( T k ) k be a Y ule sequence with n k = k as abo ve. Then Assumption 2 is satisﬁed asymptotically , and hence | ˆ α k − α | = O p ( n − 1 / 2 k ) and | ˆ γ k − γ | = O p ( n − 1 / 2 k ) . 10 3 A pplication to experimental design f or trait evolution studies Thanks to recent dev elopments in technology , scientists hav e reconstructed sev eral large phyloge- netic trees with thousands of species such as trees containing 4507 mammal species [ 7 ] and 9993 bird species [ 21 ]. Ho we ver , researchers may not be able to collect trait data from all species, due to limited resources and funding. Thus, many studies are only based on a subset of species in the av ailable tree. F or example, to study the ev olution of body size in mammals, Cooper and Purvis [ 10 ] used 3473 of the 4507 species in their tree, and V enditti et al. [ 30 ] incorporated 3185 species in their analysis. When considering extra data collection, an important question arises: can addi- tional species increase the precision of our estimates? Our theoretical results help answering this question for the OU tree model: 1. If ˆ λ  2 ˆ α , additional species tend to be very informati ve for estimating µ (Corollary 3 ). 2. If ˆ λ  2 ˆ α , additional species that do not increase tree height tend to be non-informativ e for estimating µ (Corollary 3 ). 3. When ˆ λ is around 2 ˆ α , it is not clear whether additional species are informative for estimating µ . 4. Additional species tend to be informati ve for estimating α and γ (Corollary 4 ). Example: In [ 30 ], body size e volution was studied using 3185 mammal species. W ould it be worth the ef fort to collect data for the remaining 1322 species in the tree, to increase the precision of estimating µ ? T o answer this question about sampling utility , we ﬁrst need to estimate the speciation rate λ and the selection strength α . The 4507-species mammal tree was rescaled to hav e height 1 and its speciation rate was estimated to be 11 . 83 using maximum likelihood ( yule function in the R package ape [ 25 ]). W e also estimated ˆ α = 0 . 01 using maximum likelihood ( phylolm function in the R package phylolm [ 20 ]). Note that the tree formed by the 3185 species has the same height as the full tree with all 4507 species. Since ˆ λ ≈ 11 . 83  0 . 02 ≈ 2 ˆ α , additional species tend to be non-informativ e and our recommendation is to stop data collection. Our conclusion is consistent with simulations in [ 18 , 19 ], which sho wed that additional species are non-informativ e for estimating µ if they do not increase tree height, when α is low . Our recommendation here speciﬁes the critical value of α below which additional sampling is of little utility . T o further demonstrate the relationship between sampling utility and α (or λ ) at ﬁxed tree height, we simulated data according to the OU model along the 4507-species mammal tree with µ = 0 , γ = 1 , and se veral v alues of α ranging from 0 . 01 to 300 . For ev ery set of parameters, we simulated 2000 data sets using the rTrait function (R package phylolm ). Then, ˆ µ ML was computed for each data set using the phylolm function. The sample v ariance of ˆ µ ML (Figure 2 ) was found to be about e − 2 α when 2 α  ˆ λ , and about 1 /n = 1 / 4507 when 2 α  ˆ λ . T o illustrate the relationship between sampling utility and α (or λ ) when the tree height v aries, we simulated 400 trees under the Y ule process using the sim.bdtree function (R package geiger [ 16 ]). W e used speciation rate λ = 11 . 83 , which was the maximum likelihood estimate from the mammal tree. The tree height was varied from 0 . 05 to 1 and we simulated 20 trees for each tree height. W e calculated var ( ˆ µ ML ) corresponding to three ﬁxed v alues of α (0 . 1 , λ/ 2 , 30) using 11 ● ● ● ● ● ● ● ● ● ● ● 0.01 0.03 0.1 0.3 1 3 λ ^ /2 10 30 100 300 2e−04 0.002 0.02 0.2 1 α var( µ ^ ML ) y = exp(−2 α ) y = 1/4507 Figure 2: Sample v ariance of ˆ µ ML (red points) as a function of α , from the simulation on the mammal tree. When α is small, v ar ( ˆ µ ML ) ≈ e − 2 α (purple line). When α is large, var ( ˆ µ ML ) ≈ 1 / 4507 (blue line). ( 1 ) and the three.point.compute function (R package phylolm ). The results showed that (Figure 3 ) when 2 α  λ , e − 2 αT approximates v ar ( ˆ µ ML ) better than 1 /n . On the other hand, when 2 α  λ , 1 /n is a better approximation. T aken together , our results show that when 2 α  ˆ λ , the variance of ˆ µ ML depends on the tree height, not the sample size. So, additional sampling that does not increase tree height is not recommended. On the other hand, when 2 α  ˆ λ the v ariance of ˆ µ ML is of order 1 /n , as if we had n independent samples. In this case additional species are very informativ e, and additional sampling is recommended if af fordable. 4 Pr oofs of r esults f or estimating µ W e dev elop here necessary tools (Section 4.1 ), then prove Theorem 1 (Section 4.2 ), Theorems 2 and 3 (Section 4.3 ), which assume that α is known. Using arguments from the proofs, we also identify e xamples showing that the sample mean Y can perform signiﬁcantly worse than ˆ µ ML , and we sho w that Assumption 1 is not suf ﬁcient in Theorem 1 for the consistenc y of ˆ µ ML . W e prov e an alternativ e sufﬁcient condition based on the branching number (Proposition 6 below). In Section 4.4 , we prove Corollaries 3 and 4 . Finally , in Section 4.5 we discuss the sensitivity of the MLE to estimation errors on α . 12 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1e+01 1e+03 1e+05 1e−06 1e−04 1e−02 1e+00 var( µ ^ ML ) α = 0.1 << λ 2 Number of leav es n ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 0.4 0.6 0.8 1.0 0.85 0.90 0.95 1.00 var( µ ^ ML ) T ree height T ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1e+01 1e+03 1e+05 1e−06 1e−04 1e−02 1e+00 α = λ 2 ≈ 5.9 Number of leav es n ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 0.4 0.6 0.8 1.0 1e−04 1e−03 1e−02 1e−01 1e+00 T ree height T ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1e+01 1e+03 1e+05 1e−06 1e−04 1e−02 1e+00 α = 30 >> λ 2 Number of leav es n y = 1/n y = exp(−2 α T) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 0.4 0.6 0.8 1.0 1e−05 1e−04 1e−03 1e−02 1e−01 T ree height T Figure 3: V ariance of ˆ µ ML on ramdom trees, simulated under the Y ule process. T op: var ( ˆ µ ML ) against number of leaves n . Bottom: v ar ( ˆ µ ML ) against tree height T . The true value of α is either small (0.1, left), or λ/ 2 (5.9, middle), or large (30, right). 4.1 Bounding the variance of the MLE Fix an ultrametric species tree T with leaf set L , number of tips n = | L | , and root ρ . W e also ﬁx α > 0 . A f ormula f or the variance Let θ = ( θ ` ) ` ∈ L , with θ 0 1 = 1 and θ ` ∈ [0 , 1] for all ` , and recall that Y θ = P ` ∈ L θ ` Y ` is an unbiased estimator of µ . By deﬁning, for each branch b , θ b = X ` ∈ L 1 b ∈ p ( ρ,` ) θ ` , (8) where p ( ρ, ` ) is the path from ρ to ` , we naturally associate to the coefﬁcients θ a ﬂo w on the edges of T , deﬁned as follo ws. Deﬁnition 7 (Flo w) . A ﬂow η is a mapping fr om the set of edges to the set of positive numbers such that, for every edge b , we have η b = P b 0 ∈ O b η b 0 wher e O b is the set of outgoing edges stemming fr om b (with the edges oriented away fr om the r oot). Deﬁne k η k = P b ∈ O ρ η b . W e say that η is a unit ﬂo w if k η k = 1 . W e extend η to vertices v in T by deﬁning η v as the ﬂow on the edge entering v . Similarly , for a point x in T , we let η x be the ﬂow on the corr esponding edge or vertex. 13 For ev ery edge b of T , we set R b = (1 − e − 2 α | b | ) e 2 αδ ( ρ,b ) where | b | is the length of b and δ ( ρ, b ) is the length of the path from the root to b (inclusiv e). Proposition 2 (V ariance of ˆ µ ML : Main formula) . Let F be the set of unit ﬂows fr om ρ to L . Let E be the set of edges and T be the height of tr ee T . F or any θ ∈ F , we have V ar[ Y θ ] = γ e − 2 αT 1 + X b ∈ E R b θ 2 b ! (9) so that V ar[ ˆ µ ML ] = inf θ ∈F γ e − 2 αT (1 + P b ∈ E R b θ 2 b ) . As detailed in [ 26 ], a species tree can be interpreted as an electrical network with resistance R b on edge b . The minimum R T of P b ∈ E R b θ 2 b ov er unit ﬂows (corresponding to the MLE) is kno wn as the ef fective r esistance of T , which can be interpreted in terms of a random walk on the tree. See [ 26 ] for details. Pr oof. The second part follo ws from the ﬁrst, because ˆ µ ML is the best unbiased linear estimator of µ . The proof of ( 9 ) follows from a computation in [ 12 , Lemma 5.1]. For ev ery node u of the tree, by a telescoping argument, e 2 αδ ( ρ,u ) − 1 = X b ∈ p ( ρ,u ) R b (10) where δ ( ρ, u ) is the distance from ρ to u , and p ( ρ, u ) is the path from ρ to u . Denote by v ∧ w the most recent common ancestor of v and w . Then V ar[ Y θ ] = γ X v ,w ∈ L θ v θ w e − 2 αT e − 2 αδ ( ρ,v ∧ w ) = γ e − 2 αT X v ,w ∈ L θ v θ w  1 + X b ∈ p ( ρ,v ∧ w ) R b  = γ e − 2 αT  1 + X b ∈ E R b X v ,w ∈ L 1 b ∈ p ( ρ,v ∧ w ) θ v θ w  = γ e − 2 αT " 1 + X b ∈ E R b  X v ∈ L 1 b ∈ p ( ρ,v ) θ v  X w ∈ L 1 b ∈ p ( ρ,w ) θ w  # = γ e − 2 αT  1 + X b ∈ E R b θ 2 b  , where the second equality follows from ( 10 ), the fourth equality follows from 1 b ∈ p ( ρ,v ∧ w ) = 1 b ∈ p ( ρ,v ) 1 b ∈ p ( ρ,w ) , and the last equality follo ws from ( 8 ). For 0 ≤ t ≤ T , let π t be the set of points at distance t from the root (that is, the cutset corresponding to time t away from the root). Noting that R b = 2 α Z δ ( ρ,b ) δ ( ρ,b ) −| b | e 2 αs d s, we get the follo wing con venient formula: 14 Corollary 5 (V ariance formula: Integral form) . F or any unit ﬂow θ fr om ρ to L , we have V ar[ Y θ ] = γ e − 2 αT " 1 + 2 α Z T 0 e 2 αs  X x ∈ π s θ 2 x  d s # . As a ﬁrst important application of Proposition 2 and Corollary 5 , we show that the v ariance of the MLE of µ can be controlled by the branching number . The result is characterized by a transition at Λ b = 2 α , similarly to Example 2 . Proposition 3 (V ariance of ˆ µ ML : Link to the branching number) . Let T = ( T k ) k be a tr ee sequence with branc hing number Λ b > 0 . Then, for all Λ < Λ b , ther e is I Λ such that V ar T k [ ˆ µ ( k ) ML ] ≤          γ  1 + 2 α I Λ (2 α − Λ)  e − Λ T k , if Λ < 2 α , γ  1 + 2 αT k I Λ  e − 2 αT k , if Λ = 2 α , γ  1 + 2 α I Λ (Λ − 2 α )  e − 2 αT k , if Λ > 2 α . Pr oof. F or Λ < Λ b , let I Λ = inf k,π ∈ Π k P x ∈ π e − Λ δ k ( ρ,x ) > 0 . By the max-ﬂo w min-cut theorem (see e.g. [ 22 ]), there is a ﬂo w η ( k ) on T k with k η ( k ) k ≥ I Λ (11) and η ( k ) x ≤ e − Λ δ k ( ρ,x ) , (12) for all points x in T k . Normalize η ( k ) as θ ( k ) = η ( k ) / k η ( k ) k . By Proposition 2 and Corollary 5 , for Λ 6 = 2 α , V ar T k [ ˆ µ ( k ) ML ] ≤ γ e − 2 αT k   1 + 2 α Z T k 0 e 2 αs  X x ∈ π k s ( θ ( k ) x ) 2  d s   ≤ γ e − 2 αT k   1 + 2 α Z T k 0 e 2 αs  X x ∈ π k s θ ( k ) x e − Λ δ k ( ρ,x ) I Λ  d s   ≤ γ e − 2 αT k  1 + 2 α I Λ Z T k 0 e (2 α − Λ) s d s  = γ  e − 2 αT k + 2 α I Λ (2 α − Λ)  e − Λ T k − e − 2 αT k   . where the second inequality follows from ( 11 ) and ( 12 ), and the third inequality follo ws from the fact that δ k ( ρ, x ) = s for x ∈ π k s by deﬁnition and that P x ∈ π k s θ ( k ) x = 1 . Similarly if Λ = 2 α V ar T k [ ˆ µ ( k ) ML ] ≤ γ  e − 2 αT k + 2 αe − 2 αT k T k I Λ  . 15 Removing bottlenecks Examining ( 9 ), one sees that a natural bound on V ar[ Y θ ] is obtained by “splitting an edge” in T . Deﬁnition 8 (Edge splitting) . Let T be an ultr ametric tr ee with edge set E . Let b 0 = ( x 0 , y 0 ) be a branc h in T (wher e x 0 is closer to the r oot) and let b i = ( y 0 , y i ) , i = 1 , . . . , D , be the outgoing edges at y 0 . The operation of splitting branch b 0 to obtain a new tr ee T 0 with edge set E 0 is deﬁned as follows: remo ve b 0 , b 1 , . . . , b D fr om T ; add D new edges b 0 i = ( x 0 , y i ) of length | b 0 | + | b i | , i = 1 , . . . , D (see F igur e 4 ). W e call mer ging the opposite operation of undoing the above splitting . x 0 y 0 y 1 y 2 y D . . . x 0 y 1 y 2 y D . . . Figure 4: Edge splitting procedure. Note that the number of tips in T and T 0 abov e are the same, and therefore we can use the same estimator Y θ on both of them. Lemma 1 (Splitting an edge) . Let T be an ultrametric tr ee, let b 0 be a branc h in T , and let T 0 be obtained fr om T by splitting b 0 . Then for any nonne gative θ = ( θ ` ) ` ∈ L V ar T 0 [ Y θ ] ≤ V ar T [ Y θ ] . Pr oof. W e use the notation of Deﬁnition 8 . Denote by ( θ b ) b ∈ E and ( θ 0 b ) b ∈ E 0 the ﬂows associated to θ by ( 8 ) on T and T 0 respecti vely . For any branch b , except b 0 , b 1 , . . . , b D and b 0 1 , . . . , b 0 D , we hav e θ b = θ 0 b , as the descendant lea ves of b on T and T 0 are the same. Think of b 0 i = ( x 0 , y i ) , i = 1 , . . . , D , as being made of two consecutiv e edges b 00 i = ( x 0 , y 0 i ) and b 000 i = ( y 0 i , y i ) with | b 00 i | = | b 0 | and | b 000 i | = | b i | (and note, for sanity check, that R b 0 i = R b 00 i + R b 000 i ). Then, θ b i = θ b 000 i and R b i = R b 000 i , and by ( 9 ) V ar T [ Y θ ] − V ar T 0 [ Y θ ] γ e − 2 αT = R b 0 θ 2 b 0 − D X i =1 R b 00 i θ 2 b 00 i = R b 0  D X i =1 θ b 00 i  2 − R b 0 D X i =1 θ 2 b 00 i ≥ 0 , where we used that R b 0 = R b 00 i and the nonnegati vity of the θ b 00 i ’ s. Comparing T to a star we then get: Proposition 4 (Lo wer bound on the v ariance of ˆ µ ML ) . Let T be an ultr ametric tr ee with n tips and height T . Then V ar T [ ˆ µ ML ] ≥ γ  e − 2 αT + 1 − e − 2 αT n  . 16 Pr oof. Split all edges in T by repeatedly applying Lemma 1 until a star tree with n leaves and height T is obtained. The result then follows from ( 2 ). The following example will be useful when proceeding in revers e, to ﬁnd an upper bound on the v ariance of ˆ µ ML . Example 6 (Spherically symmetric trees) . Let T be a spherically symmetric, ultr ametric tr ee, that is, a tr ee such that all vertices at the same graph distance fr om the r oot have the same number of outgoing edges, all of the same length. Let D h , h = 0 , . . . , H − 1 , be the out-de gree of vertices at graph distance h (wher e h = 0 and h = H corr espond to the r oot and leaves r espectively) and let τ h be the corr esponding branc h length. Notice that β 2 1 + · · · + β 2 d , subject to β 1 + · · · + β d = 1 , is minimized at β 1 = · · · = β d = 1 /d . Hence, since ˆ µ ML is the best unbiased linear estimator and ar guing inductively fr om the leaves in ( 9 ) , we see that ˆ µ ML = Y in this case. The mean squared err or is, by ( 9 ) , V ar[ ˆ µ ML ] = γ e − 2 αT " 1 + H − 1 X h =0 h Y h 0 =0 D h 0 ! (1 − e − 2 ατ h ) e 2 α P h h 0 =0 τ h 0 h Y h 0 =0 1 D 2 h 0 # = γ e − 2 αT " 1 + H − 1 X h =0 (1 − e − 2 ατ h ) h Y h 0 =0 e 2 ατ h 0 D h 0 # . (13) Proposition 5 (Upper bound on the variance of ˆ µ ML ) . Let T be an ultrametric tr ee with height T . Recall that π t be the set of points at distance t fr om the r oot. Then V ar T [ ˆ µ ML ] ≤ inf 0 ≤ t ≤ T γ  e − 2 α ( T − t ) + 1 − e − 2 α ( T − t ) | π t |  . Pr oof. Let 0 ≤ t ≤ T . For all points x in π t , choose one descendant leaf ` x of x and deﬁne θ as θ ` = ( 1 / | π t | if ` = ` x for some x, 0 otherwise . Di vide all branches crossing π t into tw o branches meeting at π t . Then mer ge all branches abo ve π t (that is, closer to the root) by repeatedly applying Lemma 1 . By ( 9 ), removing all branches b with θ b = 0 does not affect the variance, and from Example 6 with H = 2 , D 0 = 1 , D 1 = | π t | , τ 0 = t , and τ 1 = T − t , we get V ar T [ ˆ µ ML ] ≤ γ e − 2 αT  1 + (1 − e − 2 αt ) e 2 αt + (1 − e − 2 α ( T − t ) ) e 2 αt e 2 α ( T − t ) | π t |  ≤ γ  e − 2 α ( T − t ) + 1 − e − 2 α ( T − t ) | π t |  . 17 The two estimators ˆ µ ML vs. Y As an application of the previous proposition, we provide an example where ˆ µ ML performs signiﬁcantly better than Y . Roughly , the example sho ws that Y can perform poorly on asymmetric trees. Example 7. Consider a caterpillar sequence ( T k ) k , as deﬁned in Example 4 , with t 2 m +1 = m and t 2 m = 1 for all m , as shown in F igure 5 . Note that the tr ee height is T 2 m +1 = T 2 m +2 = m and the cut sets π k t of T k at time t satisfy | π 2 m +1 m − 1 | = | π 2 m +2 m − 1 | = m . Ther efore , by Pr oposition 5 , max  V ar T 2 m +1 [ ˆ µ ML ] , V ar T 2 m +2 [ ˆ µ ML ]  ≤ γ  e − 2 α ( m − 1) + 1 m  → 0 , as m → + ∞ , and hence ˆ µ ML is consistent. On the other hand, note that Cov[ Y i , Y j ] ≥ 0 for all pairs of leaves i, j in T k . Ther efore , V ar T 2 m  Y  = 1 4 m 2 V ar " 2 m X ` =1 Y ` # ≥ 1 4 m 2 V ar " m X i =1 Y 2 i # = 1 4 m 2 m X i,j =1 Co v[ Y 2 i , Y 2 j ] ≥ 1 4 m 2 m 2 γ e − 2 α = γ e − 2 α 4 . So, Y is not consistent. 0 1 2 3 n T ime Y 2 n . . . Y 4 Y 2 Y 1 Y 3 Y 5 Y 2 n +1 ... Figure 5: Example where the MLE ˆ µ is consistent while Y is not. 4.2 Pr oof of Theorem 1 and Sufﬁciency of Conditions Pr oof of Theor em 1 (Consistency of ˆ µ ML ). First assume ( 3 ). From Proposition 5 , for all s , lim sup k V ar T k [ ˆ µ ( k ) ML ] ≤ lim sup k γ  e − 2 αs + 1 − e − 2 αs | ˜ π k s |  ≤ γ e − 2 αs . T aking s to + ∞ gi ves consistency . On the other hand, assume by contradiction that ( ˆ µ ( k ) ML ) k is consistent but that lim inf k   ˜ π k s   = L < + ∞ for some s ∈ (0 , + ∞ ) . Let ( k j ) j be the corresponding 18 subsequence. Divide all branches in T k j crossing ˜ π k j s into tw o branches meeting at ˜ π k j s . Split edges in T k j abov e ˜ π k j s (closer to the root) repeatedly until the tree above ˜ π k j s forms a star . Let T 0 be the resulting tree, let b 0 1 , . . . , b 0 D be the branches emanating from the root, where D ≤ L by assumption, and let ˜ π 0 be the cutset at time s from the lea ves. For the unit ﬂo w θ 0 corresponding to the MLE on T 0 , by Lemma 1 and counting only those edges abov e ˜ π 0 in T 0 in ( 9 ), we hav e V ar T k j [ ˆ µ ( k j ) ML ] ≥ γ e − 2 αT k j " 1 +  1 − e − 2 α ( T k j − s )  e 2 α ( T k j − s ) D X i =1 ( θ 0 b 0 i ) 2 # ≥ γ  e − 2 αT k j +  1 − e − 2 α ( T k j − s )  e − 2 αs L  , where we used the fact that β 2 1 + · · · + β 2 D , subject to β 1 + · · · + β D = 1 , is minimized at β 1 = · · · = β D = 1 /D . Since T k j → + ∞ under Assumption 1 , lim sup k V ar T k [ ˆ µ ( k ) ML ] ≥ γ e − 2 αs L > 0 , and we get a contradiction. W e note that, by Proposition 3 , the branching number provides a simple, suf ﬁcient condition for consistency . Proposition 6 (Consistency: Branching number condition) . Let T = ( T k ) k be a tr ee sequence satisfying Assumption 1 with branc hing number Λ b . Then Λ b > 0 sufﬁces for the consistency of the MLE of µ . 4.3 Phase transition on the rate of con ver gence of the MLE Theorems 2 and 3 sho w a phase transition for the √ n k -consistency of ˆ µ ML , which we prov e now . Pr oof of Theor em 2 (Super critical r e gime). Assume Λ g > 2 α . As remarked after Deﬁnition 2 , for all  > 0 , ev entually exp ((Λ g −  ) T k ) ≤ n k ≤ exp  (Λ g +  ) T k  , (14) that is, n − 2 α/ (Λ g −  ) k ≤ e − 2 αT k ≤ n − 2 α/ (Λ g +  ) k . Moreover for all  > 0 there are subsequences ( k j ) j and ( k 0 j ) j such that n k j ≥ exp  (Λ g −  ) T k j  and n k 0 j ≤ exp  (Λ g +  ) T k 0 j  . (15) By Proposition 4 , V ar T k [ ˆ µ ( k ) ML ] ≥ γ  e − 2 αT k + 1 − e − 2 αT k n k  ≥ γ e − 2 αT k . (16) Then ( 4 ) follo ws from ( 15 ) and ( 16 ). Hence n k V ar T k [ ˆ µ ( k ) ML ] → + ∞ along a subsequence and ( ˆ µ ( k ) ML ) k is not √ n k -consistent (using that ˆ µ ML is unbiased and normally distributed). 19 Assume Λ b > 2 α . Let 2 α < Λ < Λ b . By Proposition 3 V ar T k [ ˆ µ ( k ) ML ] ≤ γ  1 + 2 α I Λ (Λ − 2 α )  e − 2 αT k . (17) Note that Λ g ≥ Λ g ≥ Λ b > 2 α and hence, by ( 16 ) and ( 17 ), V ar T k [ ˆ µ ( k ) ML ] = Θ  e − 2 αT k  . Combin- ing this with ( 14 ) gi ves the result in terms of n k . Assume instead that Λ b < 2 α . Let Λ < Λ b . By Proposition 3 V ar T k [ ˆ µ ( k ) ML ] ≤ γ  1 + 2 α I Λ (2 α − Λ)  e − Λ T k . The rest of the argument is similar to the pre vious case. Pr oof of Pr oposition 1 . Note that Example 3 considers a spherically symmetric tree. By ( 13 ), V ar T k [ ˆ µ ( k ) ML ] = γ e − 2 α ( τ ( k ) 0 + τ ( k ) 1 ) " 1 + X h =0 , 1 (1 − e − 2 ατ ( k ) h ) h Y h 0 =0 e 2 ατ ( k ) h 0 D ( k ) h 0 # which then gi ves ( 7 ). Note that log n k T k = Λ 0 τ ( k ) 0 + Λ 1 τ ( k ) 1 τ ( k ) 0 + τ ( k ) 1 = Λ 0 σ + Λ 1 (1 − σ ) = Λ g . T o compute the branching number , it sufﬁces to consider cutsets with m 0 le vel-1 vertices and the D ( k ) 1 ( D ( k ) 0 − m 0 ) tips below the rest of the le vel-1 vertices. Then J k ≡ inf π ∈ Π k X x ∈ π e − Λ δ k ( ρ,x ) = ( D ( k ) 0 e − Λ τ ( k ) 0 , if D ( k ) 1 > e Λ τ ( k ) 1 n k e − Λ T k , otherwise. Hence if Λ ≥ Λ 1 > Λ g we are in the second case and n k e − Λ T k = e − (Λ − Λ g ) T k → 0 , as k → + ∞ . If Λ < Λ 1 we are in the ﬁrst case and D ( k ) 0 e − Λ τ ( k ) 0 = e − (Λ − Λ 0 ) τ ( k ) 0 , so that Λ b = Λ 0 . Pr oof of Theor em 3 (Subcritical r e gime). One direction follo ws immediately from Proposition 4 which implies V ar T k [ ˆ µ ( k ) ML ] ≥ γ 1 − e − 2 αT k n k = Ω( n − 1 k ) . W e prove the other direction separately in each case. Assume ﬁrst that 0 < Λ b = Λ g < 2 α . For  > 0 (small), choose Λ such that Λ g −  < Λ < Λ g = Λ b < 2 α . By Proposition 3 , e ventually V ar T k [ ˆ µ ( k ) ML ] ≤ γ  1 + 2 α I Λ (2 α − Λ)  e − Λ T k ≤ γ  1 + 2 α I Λ (2 α − Λ)  n − (Λ g −  ) / (Λ g +  ) k . 20 Assume instead that Λ ug < 2 α . W e sho w that Y (and hence the MLE by Proposition 2 ) achie ves √ n k -consistency in this case. Let θ be the corresponding ﬂow on T k . By Corollary 5 , letting Λ ug < Λ < 2 α , for k large enough V ar T k [ Y ] = γ e − 2 αT k   1 + 2 α Z T k 0 e 2 αs   X x ∈ π k s  n k ( x ) n k  2   d s   ≤ γ e − 2 αT k   1 + 2 α Z T k 0 e 2 αs   X x ∈ π k s  n k ( x ) n k  e Λ[( T k − s )+ M ] n k   d s   ≤ γ  e − 2 αT k + e Λ M 2 α n k (2 α − Λ) (1 − e − (2 α − Λ) T k )  . The result follo ws from the fact that e Λ[ T k + M ] ≥ n k . 4.4 Pr oofs f or special cases Pr oof of Cor ollary 3 . By Theorems 1 , 2 , and 3 , it suf ﬁces to prove that Λ b = Λ g = λ with proba- bility 1 . A Galton-W atson (GW) branching process is a discrete-time non-negati ve integer-v alued population process deﬁned as follows: at each time step, each individual in the population has an independent number of offsprings, according to a distribution F , that form the population at the next time. In [ 26 , Chapter 3], it is sho wn that a GW tree where F has mean m has branching number and upper gro wth equal to log m . T o compute the branching number of an inﬁnite Y ule tree T 0 , we use a comparison to a GW tree. Fix  > 0 . Let F be the distribution of the number of lineages in T 0 at time  . By standard branching process results [ 4 , Equation (4) on page 108], m = e λ . By the memoryless property of the exponential, the number of lineages | π N  | in the Y ule tree at time N  is identically distributed to the population size Z N of a GW tree with of fspring distribution F at time N . Then log | π s | s ≤ log Z d s/ e s = d s/ e s · log Z d s/ e d s/ e , which implies that Λ g ≤ 1  · log e λ = λ. Similarly , let π be a cutset in T 0 and let π  be the cutset obtained by rounding up the points in π to the next  -multiple closer to the root (removing duplicates). Let δ GW ( v ) be the distance from the root to verte x v in the GW tree. Then X x ∈ π e − Λ δ 0 ( ρ,x ) ≥ X y ∈ π  e − Λ( δ GW ( y )+1)  = e − Λ  X y ∈ π  e − (  Λ) δ GW ( y ) > 0 whene ver  Λ < log e λ , so that Λ b ≥ λ. Pr oof of Cor ollary 4 . Let τ i = t i − t i − 1 be the amount of time during which T 0 has i lineages (with t 0 = 0 ). Then ( τ i ) i are independent exponential random variables with parameters (1 / ( iλ )) i . Let T j i = P j r = i +1 τ r . Note that I E  T j i  = j X r = i +1 I E[ τ r ] ≡ λ − 1 j X r = i +1 1 r ∈  λ − 1 log  j i + 1  , λ − 1 log  j i   . 21 Similarly , V ar[ T j i ] = j X r = i +1 V ar[ τ r ] = λ − 2 j X r = i +1 1 r 2 ≤ 1 λ 2 i . (18) By Chebyshe v’ s inequality , for all 0 < σ < 1 , I P  | T k b σ k c − I E  T k b σ k c  | ≥   = O ( k − 1 ) , where we used ( 18 ). Let 0 < σ 0 2 < σ 2 < σ 0 1 < σ 1 < 1 . From the previous equation, we get for ι = 1 , 2 I P  T k b σ ι k c ≤ λ − 1 log  k b σ ι k c + 1  −   = O ( k − 1 ) , and similarly for the other direction. T ake a ι = λ − 1 log  1 σ ι  , a 0 ι = λ − 1 log  1 σ 0 ι  and  < a 1 ∧ 1 2 [ a 2 − a 0 1 ] . Then Assumption 2 is satisﬁed asymptotically with c ι = a ι −  , c 0 ι = a 0 ι +  and β = [ σ 1 − σ 0 1 ] ∧ [ σ 2 − σ 0 2 ] , because then I P h c 1 < T k b σ 1 k c < T k b σ 0 1 k c < c 0 1 < c 2 < T k b σ 2 k c < T k b σ 0 2 k c < c 0 2 i ≥ 1 − O ( k − 1 ) . 4.5 Sensitivity to estimate of α So far in this section, we considered the MLE of µ given α . Here we look at the sensitivity of the MLE to estimation errors on α . Theorem 4 shows that there exists a √ n k -consistent estimator of α under Assumption 2 , which is unrelated to the growth or height of the species tree. Moreov er the estimator of α we deri ve does not require the knowledge of µ . Hence suppose that we ha ve a √ n k -consistent estimator ˆ α k of α . Let d V ar T k denote the v ariance under the parameter α = ˆ α k (with µ and γ unchanged) and let ˆ θ k be the corresponding weights of the MLE of µ , that is, the choice of weights assuming that α = ˆ α k and minimizing d V ar T k [ Y θ ] . For all k and under the true α , Y ˆ θ k is an unbiased estimator of µ . Moreover , because ˆ α k = α + o (1) and so on, the bounds in Theorems 2 and 3 apply to d V ar T k [ Y ˆ θ k ] as well (for k large enough). The quantity of interest is V ar T k [ Y ˆ θ k ] . By ( 9 ), V ar T k [ Y ˆ θ k ] = γ e − 2 αT k + γ X b ∈ E k (1 − e − 2 α | b | ) e 2 α ( δ k ( ρ,b ) − T k ) ( ˆ θ k ) 2 b = (1 + O ( T k n − 1 / 2 k )) " γ e − 2 ˆ α k T k + γ X b ∈ E k (1 − e − 2 ˆ α k | b | ) e 2 ˆ α k ( δ k ( ρ,b ) − T k ) ( ˆ θ k ) 2 b # = (1 + O ( T k n − 1 / 2 k )) d V ar T k [ Y ˆ θ k ] , provided T k n − 1 / 2 k = o (1) . Hence, for instance if Λ g > 0 , T k = O (log n k ) and we get that V ar T k [ Y ˆ θ k ] satisﬁes the bounds in Theorems 2 and 3 . 22 5 Con vergence rate of a new estimator f or α and γ In this section, we provide a novel estimator for ( α , γ ) . Under natural assumptions on the species tree, we show that this estimator is √ n k -consistent. Moreov er this estimator does not require the kno wledge of µ . Interestingly , in contrast to what we showed for µ , the conditions for √ n k - consistency in this case do not in volve the gro wth, or e ven the height, of the species tree. This is in line with the results in [ 18 ], who found that µ requires an unbounded tree height to be microergodic, whereas α and γ do not. Note, howe ver , that the MLE of α and γ are not simple linear estimators, which makes them harder to study here. In particular , unlike in the case of µ , we do not pro vide lo wer bounds on their rate of con ver gence. 5.1 Contrast-based estimator W e ﬁrst describe the estimator . The proof of its con ver gence rate is in Section 5.2 . Contrasts Our estimator relies on an appropriately chosen set of contrasts, that is, differences between pairs of leaf states (see e.g. [ 14 ]). More speciﬁcally , we choose contrasts associated with internal nodes, as follo ws. Let T be an ultrametric species tree with lea ves L and internal vertices I . For two leav es ` and ` 0 , we let ` ∧ ` 0 be their most recent common ancestor . Assume that all internal vertices of T hav e out-degree at least 2 . Let i ∈ I be an internal v ertex of T , and let ` i 1 6 = ` i 2 be two leav es such that ` i 1 ∧ ` i 2 = i . Let P i be the path connecting ` i 1 and ` i 2 . W e deﬁne the corresponding contrast C i = Y ` i 1 − Y ` i 2 . Let T ( i ) be the height of i from the lea ves. W e say that T ( i ) is the height of C i . Lemma 2 (Contrasts: Distribution [ 18 ]) . Let i 1 , . . . , i m be a collection of internal nodes of T . Let C i 1 , . . . , C i m be an arbitrary set of associated contrasts. Assume that the corr esponding paths P i 1 , . . . , P i m ar e pairwise non-intersecting, that is, none of the pairs of paths share a verte x. Then C i 1 , . . . , C i m ar e mutually independent, multivariate normal with C i ∼ N (0 , 2 γ (1 − e − 2 αT ( i ) )) . Pr oof. Indeed, e xpanding the cov ariance, we get for j 6 = j 0 γ − 1 Co v[ C j , C j 0 ] = e − αd ` j 1 ` j 0 1 − e − αd ` j 1 ` j 0 2 − e − αd ` j 2 ` j 0 1 + e − αd ` j 2 ` j 0 2 = 0 , since, by assumption, ` j ι ∧ ` j 0 ι 0 is the same verte x for all ι, ι 0 = 1 , 2 . The follo wing lemma will be useful in identifying an appropriate collection of contrasts. Lemma 3 (Contrasts: A large collection [ 18 ]) . Let T be an ultrametric tree and let I ( a,b ) be the set of internal nodes of T whose height fr om the leaves lies in ( a, b ) . F or e very a < b , we can select a set of independent contrasts C , associated with internal nodes in I ( a, b ) , such that | C | ≥ n ( a, b ) / 2 , wher e n ( a, b ) = | I ( a, b ) | . In particular , the heights of the contr asts in C lie in ( a, b ) and their corr esponding paths ar e pairwise non-intersecting . 23 Pr oof. Start with the lowest verte x i in I ( a,b ) and choose a pair of vertices ` i 1 and ` i 2 such that ` i 1 ∧ ` i 2 = i . Remove i and its descendants as well as the edge immediately above i (and fuse consecuti ve edges separated by degree- 2 vertices). As a result, the number of internal vertices in ( a, b ) decreases by at most 2 . Repeat until no vertex is left in I ( a,b ) . The estimator For a sequence of trees T = ( T k ) k , let L k be the leaf set of T k ; I k , the set of its internal vertices; n k = | L k | and n k ( a, b ) = | I k ( a, b ) | ; and T k ( i ) , the height of i , for each i ∈ I k . The idea behind our estimator is to set up a system of equations that characterize α and γ uniquely . Our construction relies on the follo wing condition. W e illustrate this condition on two special cases belo w . W e set up our equations as follows. Let m k = b β n k / 2 c . Under Assumption 2 , by Lemma 3 , for each k we can choose two collections of independent contrasts ( C k i r ) m k r =1 and ( C k j r ) m k r =1 with corresponding heights T k ( i r ) ∈ ( c 1 , c 0 1 ) and T k ( j r ) ∈ ( c 2 , c 0 2 ) for e very r = 1 , 2 , . . . , m k . (Note that the two collections are not independent.) For r = 1 , . . . , m , let ˆ a k = 1 m k m k X r =1 ( C ( k ) i r ) 2 , ˆ b k = 1 m k m k X r =1 ( C ( k ) j r ) 2 , and note that a k ≡ I E[ˆ a k ] = 2 γ 1 − 1 m k m k X r =1 e − 2 αT k ( i r ) ! ≡ 2 γ h 1 k ( α ) , b k ≡ I E h ˆ b k i = 2 γ 1 − 1 m k m k X r =1 e − 2 αT k ( j r ) ! ≡ 2 γ h 2 k ( α ) . Notice that, under Assumption 2 , a k ∈ [2 γ (1 − e − 2 αc 2 ) , 2 γ (1 − e − 2 αc 1 )] ≡ [ a α , ¯ a α ] and b k ∈ [2 γ (1 − e − 2 αc 4 ) , 2 γ (1 − e − 2 αc 3 )] ≡ [ b α , ¯ b α ] . As shown belo w , H k ( α ) = a k b k = h 1 k ( α ) h 2 k ( α ) is in vertible in α on (0 , + ∞ ) . Hence a natural estimator of ( α , γ ) is obtained by setting ˆ α k = H − 1 k  ˆ a k ˆ b k  and ˆ γ k = ˆ a k 2 h 1 k ( ˆ α k ) . W e will sho w in the proof of in vertibility belo w that H k is actually strictly increasing, and therefore relati vely straightforw ard to in vert numerically . It remains to prov e in vertibility . Lemma 4 (In vertibility of the system) . Under Assumption 2 , H k ( α ) is strictly positive, differ en- tiable, and in vertible on (0 , + ∞ ) . Pr oof. W e ha ve that ∂ log H k ( α ) ∂ α = P m k r =1 2 T k ( i r ) e − 2 αT k ( i r ) P m k r =1 (1 − e − 2 αT k ( i r ) ) − P m k r =1 2 T k ( j r ) e − 2 αT k ( j r ) P m k r =1 (1 − e − 2 αT k ( j r ) ) = P P m k r,r 0 =1 2 T k ( i r ) e − 2 αT k ( i r ) (1 − e − 2 αT k ( j r 0 ) ) P m k r =1 (1 − e − 2 αT k ( i r ) ) P m k r =1 (1 − e − 2 αT k ( j r ) ) − P P m k r,r 0 =1 2 T k ( j r 0 ) e − 2 αT k ( j r 0 ) (1 − e − 2 αT k ( i r ) ) P m k r =1 (1 − e − 2 αT k ( i r ) ) P m k r =1 (1 − e − 2 αT k ( j r ) ) (19) 24 Note that the function xe − x 1 − e − x is strictly decreasing on (0 , ∞ ) because its deriv ativ e is e − x (1 − x − e − x ) (1 − e − x ) 2 < 0 on (0 , + ∞ ) . Therefore 2 T k ( i r ) e − 2 αT k ( i r ) 1 − e − 2 αT k ( i r ) ≥ 2 c 0 1 e − 2 αc 0 1 1 − e − 2 αc 0 1 > 2 c 2 e − 2 αc 2 1 − e − 2 αc 2 ≥ 2 T k ( j r 0 ) e − 2 αT k ( j r 0 ) 1 − e − 2 αT k ( j r 0 ) , that is, 2 T k ( i r ) e − 2 αT k ( i r ) (1 − e − 2 αT k ( j r 0 ) ) − 2 T k ( j r 0 ) e − 2 αT k ( j r 0 ) (1 − e − 2 αT k ( i r ) ) > 0 , (20) for e very r , r 0 , so that each ( r, r 0 ) -term in ( 19 ) is strictly positi ve. Hence, we can deduce that ∂ log H k ( α ) /∂ α > 0 , that is, log H k (and hence H k itself) is strictly increasing on (0 , + ∞ ) and continuous, and therefore in vertible. Note that we cannot use the law of large numbers to deriv e consistency (despite the indepen- dence of the contrasts) because a k /b k is a bounded, but not necessarily con ver gent, sequence and H − 1 k is continuous, but depends on k . Instead we ar gue directly about √ n k -consistency belo w . 5.2 Pr oof of Theorem 4 Pr oof of Theor em 4 . Note that I E[ˆ a k ] = a k and V ar[ ˆ a k ] = 8 γ 2 m 2 k m k X r =1 (1 − e − 2 αT k ( i r ) ) 2 ≤ 8 γ 2 m k (1 − e − 2 αc 1 ) 2 = O ( m − 1 k ) = O ( n − 1 k ) , where we used that ([2 γ (1 − e − 2 αT k ( i r ) )] − 1 / 2 C k i r ) 2 is χ 2 1 -distributed and, therefore, has variance 2 . Hence | ˆ a k − a k | = O p ( n − 1 / 2 k ) by Chebyshev’ s inequality . Similarly , | ˆ b k − b k | = O p ( n − 1 / 2 k ) . Our claim that | ˆ α k − α k | = O p ( n − 1 / 2 k ) then follows from the follo wing straightforward lemma. Lemma 5. If 0 < z ∗ ≤ z ≤ z ∗ < ∞ , | z 0 − z | ≤  and  < z ∗ / 2 , then ther e is a constant ∆( z ∗ , z ∗ ) depending on c 1 , c 0 1 , c 2 , c 0 2 such that for all k sup t ∈ [0 , 1] | ( H − 1 k ) 0 ( tz 0 + (1 − t ) z ) | ≤ ∆( z ∗ , z ∗ ) . Pr oof. W e use the proof of Lemma 4 . Let ζ α = ζ α ( c 1 , c 0 1 , c 2 , c 0 2 ) > 0 be the smallest possible dif ference in ( 20 ) for a ﬁxed α . Let α ∗ , α ∗ be deﬁned as 1 2 z ∗ = ¯ a α ∗ b α ∗ , 3 2 z ∗ = a α ∗ ¯ b α ∗ . 25 Then [ α ∗ , α ∗ ] ⊇ H − 1 k  1 2 z ∗ , 3 2 z ∗  for all k . Note that sup t ∈ [0 , 1] | ( H − 1 k ) 0 ( tz 0 + (1 − t ) z ) | ≤ sup z ∈ [ 1 2 z ∗ , 3 2 z ∗ ]   ( H − 1 k ) 0 ( z )   = sup z ∈ [ 1 2 z ∗ , 3 2 z ∗ ]  ∂ H k ∂ α ( H − 1 k ( z ))  − 1 = sup z ∈ [ 1 2 z ∗ , 3 2 z ∗ ]  H k ∂ log H k ∂ α  ( H − 1 k ( z ))  − 1 ≤ sup α ∈ [ α ∗ ,α ∗ ] b α ¯ a α · (1 − e − 2 αc 0 1 )(1 − e − 2 αc 0 2 ) ζ α ≡ ∆( z ∗ , z ∗ ) . W e ﬁnish the proof of Theorem 4 . W e use the follo wing observation: for 0 < x ∗ ≤ x ≤ x ∗ < ∞ and 0 < y ∗ ≤ y ≤ y ∗ < ∞ such that | x − x 0 | ≤  and | y − y 0 | ≤  with  < y ∗ / 2 , we hav e     x 0 y 0 − x y     =     y ( x 0 − x ) + x ( y − y 0 ) y y 0     ≤ y ∗ | x 0 − x | y ∗ ( y ∗ / 2) + x ∗ | y − y 0 | y ∗ ( y ∗ / 2) < 4( x ∗ + y ∗ ) y 2 ∗ . Fix δ > 0 (small) and pick M δ such that I P h | ˆ a k − a k | ≥ M δ n − 1 / 2 k i < δ / 2 and similarly for ˆ b k . Then, by Assumption 1 , for k large enough I P      ˆ a k ˆ b k − a k b k     ≥ 4(¯ a α + ¯ b α ) b 2 α M δ n − 1 / 2 k  ≤ I P      ˆ a k ˆ b k − a k b k     ≥ 4(¯ a α + ¯ b α ) b 2 α M δ n − 1 / 2 k , | ˆ a k − a k | ≤ M δ n − 1 / 2 k , | ˆ b k − b k | ≤ M δ n − 1 / 2 k  +I P h | ˆ a k − a k | ≥ M δ n − 1 / 2 k i + I P h | ˆ b k − b k | ≥ M δ n − 1 / 2 k i ≤ 0 + δ 2 + δ 2 = δ, so that    ˆ a k ˆ b k − a k b k    = O p ( n − 1 / 2 k ) . Secondly , using Rolle’ s theorem, we have | ˆ α k − α | ≤ sup t ∈ [0 , 1]     ( H − 1 k ) 0  t ˆ a k ˆ b k + (1 − t ) a k b k      .     ˆ a k ˆ b k − a k b k     . Let M δ be such that I P      ˆ a k ˆ b k − a k b k     ≥ M δ n − 1 / 2 k  < δ. Fix  0 > 0 and let z ∗ = a α −  0 ¯ b α −  0 , z ∗ = ¯ a α +  0 b α +  0 . 26 Then, by Lemma 5 , letting H k = ( sup t ∈ [0 , 1]     ( H − 1 k ) 0  t ˆ a k ˆ b k + (1 − t ) a k b k      .     ˆ a k ˆ b m − a m b m     ≥ ∆ − 1 ( z ∗ , z ∗ ) M δ n − 1 / 2 k ) , we hav e for k large enough I P h | ˆ α k − α | ≥ ∆ − 1 ( z ∗ , z ∗ ) M δ n − 1 / 2 k i ≤ I P[ H k ] ≤ I P  H k ,     ˆ a k ˆ b k − a k b k     < M δ n − 1 / 2 k  + I P      ˆ a k ˆ b k − a k b k     ≥ M δ n − 1 / 2 k  ≤ 0 + δ = δ. That implies | ˆ α k − α | = O p ( n − 1 / 2 k ) . The argument for ˆ γ k is similar . Refer ences [1] Adamczak, R., Miło ´ s, P .: CL T for Ornstein-Uhlenbeck branching particle system. Electronic Journal of Probability 20 (42), 1–35 (2015) [2] Adamczak, R., Miło ´ s, P .: U-Statistics of Ornstein-Uhlenbeck branching particle system. Jour - nal of Theoretical Probability 27 (4), 1071–1111 (2014) [3] Anderson, T .W .: An introduction to multiv ariate statistical analysis, 2nd edn. W iley , Chich- ester (1984) [4] Athre ya, K., Ney , P .: Branching Processes. Dov er Books on Mathematics Series. Dov er Publications (2004) [5] Bartoszek, K., Pienaar , J., Mostad, P ., Andersson, S., Hansen, T .F .: A phylogenetic com- parati ve method for studying multi variate adaptation. Journal of Theoretical Biology 314 , 204–215 (2012) [6] Bartoszek, K., Sagito v , S.: Phylogenetic conﬁdence interv als for the opt imal trait value. Jour- nal of Applied Probability 52 (4), 1115–1132 (2015). [7] Bininda-Emonds, O., Cardillo, M., Jones, K.E., MacPhee, R.D.E., Beck, R.M.D., Grenyer , R., Price, S.A., V os, R.A., Gittleman, J.L., Purvis, A.: The delayed rise of present-day mam- mals. Nature 446 (7135), 507–512 (2007) [8] Bra wand, D., Soumillon, M., Necsulea, A., Julien, P ., Csardi, G., Harrigan, P ., W eier , M., Liechti, A., Aximu-Petri, A., Kircher , M., Albert, F .W ., Zeller , U., Khaitovich, P ., Grutzner , F ., Bergmann, S., Nielsen, R., P ¨ a ¨ abo, S., Kaessmann, H.: The ev olution of gene expression le vels in mammalian or gans. Nature 478 (7369), 343–348 (2011) [9] Butler , M.A., King, A.A.: Phylogenetic comparati ve analysis: a modeling approach for adap- ti ve e volution. The American Naturalist 164 (6), 683–695 (2004) 27 [10] Cooper , N., Purvis, A.: Body size ev olution in mammals: Complexity in tempo and mode. The American Naturalist 175 (6), 727–738 (2010) [11] Cra wford, F .W ., Suchard, M.A.: Di versity , disparity , and ev olutionary rate estimation for unresolved Yule trees. Systematic Biology 62 (3), 439–455 (2013) [12] Ev ans, W .S., Ken yon, C., Peres, Y ., Schulman, L.J.: Broadcasting on trees and the Ising model. Ann. Appl. Probab. 10 (2), 410–433 (2000) [13] Felsenstein, J.: Phylogenies and the comparativ e method. American Naturalist 125 (1), 1–15 (1985) [14] Felsenstein, J.: Inferring Phylogenies. Sinauer Associates (2004) [15] Hansen, T .F .: Stabilizing selection and the comparati ve analysis of adaptation. Ev olution 51 (5), 1341–1351 (1997) [16] Harmon, L., W eir , J., Brock, C., Glor , R., Challenger, W .: GEIGER: in vestigating ev olution- ary radiations. Bioinformatics 24 , 129–131 (2008) [17] Harmon, L.J., Losos, J.B., Jonathan Davies, T ., Gillespie, R.G., Gittleman, J.L., Bryan Jen- nings, W ., Kozak, K.H., McPeek, M.A., Moreno-Roark, F ., Near , T .J., Purvis, A., Ricklefs, R.E., Schluter , D., Schulte II, J.A., Seehausen, O., Sidlauskas, B.L., T orres-Carvajal, O., W eir , J.T ., Mooers, A.Ø.: Early bursts of body size and shape ev olution are rare in compara- ti ve data. Evolution 64 (8), 2385–2396 (2010) [18] Ho, L.S.T ., An ´ e, C.: Asymptotic theory with hierarchical autocorrelation: Ornstein- Uhlenbeck tree models. Annals of Statistics 41 , 957–981 (2013) [19] Ho, L.S.T ., An ´ e, C.: Intrinsic inference dif ﬁculties for trait e volution with Ornstein- Uhlenbeck models. Methods in Ecology and Evolution 5 (11), 1133–1146 (2014) [20] Ho, L.S.T ., An ´ e, C.: A linear-time algorithm for Gaussian and non-Gaussian trait ev olution models. Systematic Biology 63 (3), 397–408 (2014) [21] Jetz, W ., Thomas, G., Joy , J., Hartmann, K., Mooers, A.: The global di versity of birds in space and time. Nature 491 (7424), 444–448 (2012) [22] La wler , E.: Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Win- ston (1976) [23] Mossel, E., Roch, S., Sly , A.: Rob ust estimation of latent tree graphical models: Inferring hidden states with inexact parameters. IEEE transactions on information theory 59 (7), 4357– 4373 (2013) [24] Mossel, E., Steel, M.: Majority rule has transition ratio 4 on yule trees under a 2-state sym- metric model. Journal of Theoretical Biology 360 (7), 315–318 (2014). [25] P aradis, E., Claude, J., Strimmer , K.: APE: analyses of phylogenetics and e volution in R language. Bioinformatics 20 , 289–290 (2004) 28 [26] Peres, Y .: Probability on trees: An introductory climb . In: P . Bernard (ed.) Lectures on Prob- ability Theory and Statistics, Lectur e Notes in Mathematics , v ol. 1717, 193–280. Springer Berlin Heidelberg (1999) [27] Rohlfs, R.V ., Harrigan, P ., Nielsen, R.: Modeling gene expression ev olution with an extended Ornstein-Uhlenbeck process accounting for within-species v ariation. Molecular Biology and Evolution 31 (1), 201–211 (2014) [28] Semple, C., Steel, A.: Phylogenetics. Oxford lecture series in mathematics and its applica- tions. Oxford Uni versity Press (2003) [29] Shao, J.: Mathematical Statistics. Springer (2003) [30] V enditti, C., Meade, A., P agel, M.: Multiple routes to mammalian di versity . Nature 479 (7373), 393–396 (2011) [31] Y ule, G.U.: A mathematical theory of ev olution, based on the conclusions of Dr . JC W illis, FRS. Philosophical T ransactions of the Royal Society of London. Series B 213 , 21–87 (1925) 29

Phase transition on the convergence rate of parameter estimation under an Ornstein-Uhlenbeck diffusion on a tree

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment