Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models

Practical Estimation of High Dimensional Sto c hastic Diﬀeren tial Mixed-Eﬀects Mo dels Um b erto Picc hini a,b, ∗ , Susanne Ditlevsen b a Dep artment of Mathematic al Scienc es, University of Durham, South R o ad, DH1 3LE Durham, England. Phone: +44 (0)191 334 4164; F ax: +44 (0)191 334 3051 b Dep artment of Mathematic al Scienc es, University of Cop enhagen, Universitetsp arken 5, DK-2100 Cop enhagen, Denmark Abstract Sto c hastic diﬀeren tial equations (SDEs) are established to ols to mo del ph ysical phenomena whose dynamics are aﬀected b y random noise. By estimating parameters of an SDE intrin- sic randomness of a system around its drift can b e identiﬁed and separated from the drift itself. When it is of in terest to mo del dynamics within a giv en p opulation, i.e. to mo del sim ultaneously the p erformance of sev eral experiments or sub jects, mixed-eﬀects mod- elling allo ws for the distinction of b etw een and within exp erimen t v ariabilit y . A framew ork to mo del dynamics within a p opulation using SDEs is prop osed, representing sim ultane- ously several sources of v ariation: v ariabilit y b et ween exp erimen ts using a mixed-eﬀects approac h and stochasticit y in the individual dynamics using SDEs. These sto chastic diﬀer- ential mixe d-eﬀe cts mo dels hav e applications in e.g. pharmacokinetics/pharmaco dynamics and biomedical modelling. A parameter estimation metho d is prop osed and computational guidelines for an eﬃcient implemen tation are given. Finally the metho d is ev aluated using sim ulations from standard mo dels like the t wo-dimensional Ornstein-Uhlen b ec k (OU) and the square ro ot models. Keywor ds: automatic diﬀeren tiation, closed-form transition density expansion, maxim um likelihoo d estimation, p opulation estimation, stochastic diﬀeren tial equation, Co x-Ingersoll-Ross pro cess 1. INTRODUCTION Mo dels deﬁned through stochastic diﬀeren tial equations (SDEs) allo w for the represen- tation of random v ariability in dynamical systems. This class of mo dels is b ecoming more and more imp ortant (e.g. Allen (2007) and Øksendal (2007)) and is a standard to ol to mo del ﬁnancial, neuronal and p opulation gro wth dynamics. How ev er, m uch has still to b e ∗ Corresp onding author Email addr esses: umberto.picchini@durham.ac.uk (Um b erto Picchini), susanne@math.ku.dk (Susanne Ditlevsen) Pr eprint submitte d to Elsevier Novemb er 13, 2018 done, b oth from a theoretical and from a computational p oin t of view, to mak e it applica- ble in those statistical ﬁelds that are already well established for deterministic dynamical mo dels (ODE, DDE, PDE). F or e xample, studies in which rep eated measurements are tak en on a series of individuals or experimental units pla y an imp ortan t role in biomedical researc h: it is often reasonable to assume that resp onses follow the same mo del form for all experimental sub jects, but parameters v ary randomly among individuals. The imp or- tance of these mixe d-eﬀe cts mo dels (e.g. McCullo c h and Searle (2001), Pinheiro and Bates (2002)) lies in their ability to split the total v ariation in to within- and b et ween-individual comp onen ts. The SDE approac h has only recen tly b een com bined with mixed-eﬀects models, because of the diﬃculties arising b oth from the theoretical and the computational side when dealing with SDEs. In Ov ergaard et al. (2005) and T ornøe et al. (2005) a sto c hastic diﬀerential mixed-eﬀects mo del (SDMEM) with log-normally distributed random parameters and a constan t diﬀusion term is estimated via (extended) Kalman ﬁlter. Donnet and Samson (2008) developed an estimation metho d based on a sto c hastic EM algorithm for ﬁtting one-dimensional SDEs with mixed-eﬀects. In Donnet et al. (2010) a Bay esian approac h is applied to a one-dimensional mo del for growth curv e data. The metho ds presen ted in the aforementioned references are computationally demanding when the dimension of the mo del and/or parameter space gro ws. How ev er, they are all able to handle data contami- nated b y measuremen t noise. In Ditlevsen and De Gaetano (2005) the likelihoo d function for a simple SDME M with normally distributed random parameters is calculated explic- itly , but generally the likelihoo d function is una v ailable. In Picchini et al. (2010) a com- putationally eﬃcien t metho d for estimating SDMEMs with random parameters follo wing an y suﬃciently well-behav ed con tinuous distribution w as developed. First the conditional transition density of the diﬀusion process given the random eﬀects is approximated in closed-form b y a Hermite expansion, and then the obtained conditional likelihoo d is n u- merically integrated with resp ect to the random eﬀects using Gaussian quadrature. The metho d resulted to b e statistically accurate and computationally fast. How ever, in practice it w as limited to one random eﬀect only (see Picchini et al. (2008) for an application in neuroscience) since Gaussian quadrature is computationally ineﬃcient when the num b er of random eﬀects gro ws. Here the metho d presen ted in Picc hini et al. (2010) is extended to handle SDMEMs with m ultiple random eﬀects. F urthermore, the application is extended in a second direction to handle multidimensional SDMEMs. Computational guidelines for a practical implemen- tation are giv en, using e.g. automatic diﬀerentiation (AD) to ols. Results obtained from sim ulation stud ies sho w that, at least for the examples discussed in the presen t work, the metho dology is ﬂexible enough to accommo date complex mo dels ha ving diﬀerent distribu- tions for the random eﬀects, not only the normal or log-normal distributions whic h are the ones usually emplo yed. Satisfactory estimates for the unknown parameters are obtained ev en considering small populations of sub jects (i.e. few rep etitions of an exp erimen t) and few observ ations p er sub ject/exp erimen t, whic h is often relev ant, especially in biomedicine where large data sets are t ypically unav ailable. A drawbac k of our approach is that measurement error is not accoun ted for, and th us 2 it is most useful in those situations where the v ariance of the measuremen t noise is small compared to the system noise. The pap er is organized as follo ws. Section 2 i n tro duces the SDMEMs, the observ ation sc heme and the necessary notation. Section 3 introduces the lik eliho o d function. Section 4 considers appro ximate metho ds for when the expression of the exact likelihoo d function cannot b e obtained; computational guidelines and softw are to ols are also discussed. Section 5 is dev oted to the application on simulated datasets. Section 6 summarizes the results and the adv antages and limitations of the metho d are discussed. An Appendix containing tec hnical results closes the pap er. 2. STOCHASTIC DIFFERENTIAL MIXED EFFECTS MODELS In the following, b old characters are used to denote v ectors and matrices. Consider a d -dimensional (Itô) SDE mo del for some con tinuous pro cess X t ev olving in M diﬀerent exp erimen tal units randomly c hosen from a theoretical p opulation: d X i t = µ ( X i t , θ , b i ) dt + σ ( X i t , θ , b i ) d W i t , X i 0 = x i 0 , i = 1 , . . . , M (1) where X i t is the v alue at time t ≥ t i 0 of the i th unit, with X i 0 = X i t i 0 ; θ ∈ Θ ⊆ R p is a p -dimensional ﬁxed eﬀects parameter (the same for the en tire p opulation) and b i ≡ b i ( Ψ ) ∈ B ⊆ R q is a q -dimensional random eﬀects parameter (unit sp eciﬁc) with comp onen ts ( b i 1 , ..., b i q ) ; eac h comp onen t b i l ma y follow a diﬀerent contin uous distribution ( l = 1 , ..., q ). Let p B ( b i | Ψ ) denote the join t distribution of the v ector b i , parametrized b y an r -dimensional parameter Ψ ∈ Υ ⊆ R r . The W i t ’s are d -dimensional standard Brownian motions with comp onen ts W ( h ) i t ( h = 1 , ..., d ). The W ( h ) i t and b j l are assumed mutually indep enden t for all 1 ≤ i, j ≤ M , 1 ≤ h ≤ d , 1 ≤ l ≤ q . The initial condition X i 0 is assumed equal to a vector of constants x i 0 ∈ R d . The drift and the diﬀusion co eﬃcien t functions µ ( · ) : E × Θ × B → R d and σ ( · ) : E × Θ × B → S are assumed kno wn up to the parameters, and are assumed suﬃcien tly regular to ensure a unique weak solution (Øksendal (2007)), where E ⊆ R d denotes the state space of X i t and S denotes the set of d × d p ositiv e deﬁnite matrices. Model (1) assumes that in eac h of the M units the ev olution of X follo ws a common functional form, and therefore diﬀerences b etw een units are due to diﬀeren t realizations of the Brownian motion paths { W i t } t ≥ t i 0 and of the random parameters b i . Thus, the dynamics within a generic unit i are expressed via an SDE mo del driv en b y Brownian motion, and the introduction of a v ector parameter randomly v arying among units allo ws for the explanation of the v ariabilit y b et ween the M units. Assume that the distribution of X i t giv en ( b i , θ ) and X i s = x s , s < t , has a strictly p ositiv e densit y w.r.t. the Leb esgue measure on E , whic h is denoted by x → p X ( x , t − s | x s , b i , θ ) > 0 , x ∈ E . (2) Assume moreo ver that unit i is observed at the same set of n i + 1 discrete time p oints { t i 0 , t i 1 , . . . , t i n i } for eac h co ordinate X ( h ) i t of the pro cess X i t ( h = 1 , ..., d ), whereas diﬀeren t 3 units ma y b e observ ed at diﬀerent sets of time p oin ts. Let x i b e the ( n i + 1) × d matrix of resp onses for unit i , with j th row giv en by x i ( t i j ) = x i j = ( x (1) i j , ..., x ( d ) i j ) , and let the follo wing b e the N × d total resp onse matrix, N = P M i =1 ( n i + 1) : x = ( x 1 T , ..., x M T ) T =                   x (1)1 0 · · · x ( d )1 0 . . . . . . . . . x (1)1 n 1 · · · x ( d )1 n 1 . . . . . . . . . x (1) i j · · · x ( d ) i j . . . . . . . . . x (1) M 0 · · · x ( d ) M 0 . . . . . . . . . x (1) M n M · · · x ( d ) M n M                   , where T denotes transp osition. W rite ∆ i j = t i j − t i j − 1 for the time distance b etw een x i j − 1 and x i j . Notice that this observ ation scheme implies that the matrix of data x must not con tain missing v alues. The goal is to estimate ( θ , Ψ ) using sim ultaneously all the data in x . The sp eciﬁc v alues of the b i ’s are not of interest, but only the iden tiﬁcation of the v ector-parameter Ψ c haracterizing their distribution. How ever, estimates of the random parameters b i are also obtained, since it is necessary to estimate them when estimating ( θ , Ψ ) . 3. MAXIMUM LIKELIHOOD ESTIMA TION T o obtain the marginal density of x i , the conditional densit y of the data giv en the non-observ able random eﬀects b i is in tegrated with respect to the marginal densit y of the random eﬀects, using that W ( h ) i t and b j l are indep endent. This yields the likelihoo d function L ( θ , Ψ ) = M Y i =1 p ( x i | θ , Ψ ) = M Y i =1 Z B p X ( x i | b i , θ ) p B ( b i | Ψ ) d b i . (3) Here p ( · ) is the density of x i giv en ( θ , Ψ ) , p B ( · ) is the densit y of the random eﬀects, and p X ( · ) is the pro duct of the transition densities p X ( · ) giv en in (2) for a giv en realization of the random eﬀects and for a giv en θ , p X ( x i | b i , θ ) = n i Y j =1 p X ( x i j , ∆ i j | x i j − 1 , b i , θ ) . (4) In applications the random eﬀects are often assumed to b e (multi)normally distributed, but p B ( · ) could b e any w ell-b eha ved density function. Solving the integral in (3) yields the marginal lik eliho o d of the parameters for unit i , indep enden t of the random eﬀects 4 b i ; b y maximizing the resulting expression (3) with resp ect to θ and Ψ the corresp onding maxim um likelihoo d estimators (MLE) ˆ θ and ˆ Ψ are obtained. Notice that it is p ossible to consider random eﬀects having discrete distributions: in that case the in tegral b ecomes a sum and can b e easily computed when the transition densit y p X is kno wn. In this paper only random eﬀects ha ving contin uous distributions are considered. In simple cases an explicit expression for the likelihoo d function, and ev en explicit estimating equations for the MLEs can b e found (Ditlevsen and De Gaetano (2005)). Ho wev er, in general it is not p ossible to ﬁnd an explicit solution for the in tegral, and thus exact MLEs are una v ailable. This o ccurs when: (i) p X ( x i j , ·| x i j − 1 , · ) is kno wn but it is not p ossible to analytically solve the integral, and (ii) p X ( x i j , ·| x i j − 1 , · ) is unkno wn. In (i) the in tegral must b e ev aluated numerically to obtain an approximation of the lik eliho o d (3). In (ii) ﬁrst p X ( x i j , ·| x i j − 1 , · ) is appro ximated, then the integral in (3) is solved n umerically . In situation (ii) there exist several strategies to approximate the densit y p X ( x i j , ·| x i j − 1 , · ) , e.g. Monte Carlo approximations, direct solution of the F okk er-Planck equation, or Hermite expansions, just to mention some of the p ossible approac hes, see Hurn et al. (2007) for a comprehensiv e review fo cused on one-dimensional diﬀusions. W e prop ose to approximate the transition density in closed-form using a Hermite expansion (Aït-Sahalia (2008)). It often provides a go o d appro ximation to p X , and Jensen and P oulsen (2002) show ed that the metho d is computationally eﬃcien t. Using the obtained expression, the lik eliho o d function is appro ximated, thus deriving approximated MLEs of θ and Ψ . 4. CLOSED-FORM TRANSITION DENSITY EXP ANSION AND LIKELI- HOOD APPRO XIMA TION 4.1. T r ansition Density Exp ansion for Multidimensional SDEs Here the transition density expansion of a d -dimensional time-homogeneous SDE as suggested in Aït-Sahalia (2008) is review ed; the same reference pro vides a method to handle m ulti-dimensional time-inhomogeneous SDEs, but for ease of exp osition atten tion is focused on the former case. Also references on further extensions, e.g. Lévy pro cesses, are given in the pap er. W e will only consider SDEs reducible to unit diﬀusion, i.e. multi- dimensional diﬀusions X for which there exists a one-to-one transformation to another diﬀusion with diﬀusion matrix the iden tity matrix. It is p ossible to p erform the density expansion also for non-reducible SDEs (Aït-Sahalia (2008)). F or the momen t reference to θ is dropp ed when not necessary , i.e. a function f ( x, θ ) is written f ( x ) . Consider the follo wing d -dimensional, reducible, time-homogeneous SDE d X t = µ ( X t ) dt + σ ( X t ) d W t , X 0 = x 0 (5) and a series of d -dimensional discrete observ ations x 0 , x 1 , ..., x n from X , all observ ed at the same time p oin ts { t 0 , t 1 , ..., t n } ; denote with E the state space of X . W e wan t to appro ximate p X ( x j , ∆ j | x j − 1 ) , the conditional densit y of X j giv en X j − 1 = x j − 1 , where ∆ j = t j − t j − 1 ( j = 1 , ..., n ). Under regularity conditions (e.g. µ ( x ) and σ ( x ) are assumed to b e inﬁnitely diﬀerentiable in x on E , v ( x ) := σ ( x ) σ ( x ) T is a d × d p ositiv e deﬁnite matrix 5 for all x in the in terior of E and all the drift and diﬀusion comp onen ts satisfy linear growth conditions, see Aït-Sahalia (2008) for details), the logarithm of the transition densit y can b e expanded in closed form using an order J = + ∞ Hermite series, and appro ximated by a T a ylor expansion up to order K , ln p ( K ) X ( x j , ∆ j | x j − 1 ) = − d 2 ln(2 π ∆ j ) − 1 2 ln(det( v ( x j ))) + C ( − 1) Y ( γ ( x j ) | γ ( x j − 1 )) ∆ j + K X k =0 C ( k ) Y ( γ ( x j ) | γ ( x j − 1 )) ∆ k j k ! . (6) Here ∆ k j denotes ∆ j raised to the p o wer of k . The co eﬃcien ts C ( k ) Y are giv en in the Appendix and γ ( x ) = ( γ (1) ( x ) , ..., γ ( d ) ( x )) T is the Lamp erti transform, whic h b y deﬁnition exists when the diﬀusion is reducible, and is suc h that O γ ( x ) = σ − 1 ( x ) . See the App endix for a suﬃcien t and necessary condition for reducibilit y . Using Itô’s lemma, the transformation Y t = γ ( X t ) deﬁnes a new diﬀusion pro cess Y t , solution of the follo wing SDE d Y t = µ Y ( Y t ) dt + d W t , Y 0 = y 0 , where the h -th elemen t of µ Y is giv en by ( h = 1 , ..., d ) µ ( h ) Y ( Y t ) = d X i =1   σ − 1 ( γ − 1 ( Y t ))  hi µ ( i ) ( γ − 1 ( Y t ))  − 1 2 d X i,j,k  σ − 1 ( γ − 1 ( Y t )) ∂ σ ∂ x j ( γ − 1 ( Y t )) σ − 1 ( γ − 1 ( Y t ))  hi σ ik ( γ − 1 ( Y t )) σ j k ( γ − 1 ( Y t )) . F or ease of interpretation the Lamp erti transform and the drift term µ Y for a scalar ( d = 1 ) SDE are rep orted. Namely γ ( · ) is deﬁned b y Y t ≡ γ ( X t ) = Z X t du σ ( u ) , where the lo wer b ound of integration is an arbitrary p oin t in the interior of E . The drift term is giv en by µ Y ( Y t ) = µ ( γ − 1 ( Y t )) σ ( γ − 1 ( Y t )) − 1 2 ∂ σ ∂ x ( γ − 1 ( Y t )) . The transformation of X t in to Y t is a necessary step to make the transition density of the transformed pro cess closer to a normal distribution, so that the Hermite expansion giv es reasonable results. How ever, the reader is warned that this is b y no means an easy task for many m ultiv ariate SDEs, and imp ossible for those ha ving non-reducible diﬀusion (see Aït-Sahalia (2008) for details). The use of a softw are with sym b olic algebra capabilities lik e e.g. Ma thema tica , Maple or Maxima is necessary to carry out the calculations. 6 4.2. Likeliho o d Appr oximation and Par ameter Estimation F or a reducible time-homogeneous SDMEM, the co eﬃcien ts C ( k ) Y are obtained as in Section 4.1 by taking ( θ , b i ) , ∆ i j and ( x i j , x i j − 1 ) instead of θ , ∆ j and ( x j , x j − 1 ) , resp ec- tiv ely . Then p ( K ) X is substituted for the unknown transition densit y in (4), thus obtaining a sequence of appro ximations to the likelihoo d function L ( K ) ( θ , Ψ ) = M Y i =1 Z B p ( K ) X ( x i | b i , θ ) p B ( b i | Ψ ) d b i , (7) where p ( K ) X ( x i | b i , θ ) = n i Y j =1 p ( K ) X ( x i j , ∆ i j | x i j − 1 , b i , θ ) (8) and p ( K ) X is given b y equation (6). By maximizing (7) with respect to ( θ , Ψ ) , approximated MLEs ˆ θ ( K ) and ˆ Ψ ( K ) are obtained. In general, the in tegral in (7) does not ha v e a closed form solution, and therefore eﬃcien t n umerical in tegration metho ds are needed; see Picchini et al. (2010) for the case of a single random eﬀect ( q = 1 ). General purp ose approximation methods for one- or m ulti- dimensional in tegrals, irrespective of the random eﬀects distribution, are av ailable (e.g. Krommer and Ueberhuber (1998)) and implemen ted in sev eral soft w are pac k ages, though the complexit y of the problem grows fast when increasing the dimension of B . Ho wev er, since exact transition densities or a closed-form approximation to p X are supp osed to be a v ailable, analytic expressions for the in tegrands in (3) or (7) are kno wn and the Laplace metho d (e.g. Sh un and McCullagh (1995)) ma y b e used. W rite b i = ( b i 1 , ..., b i q ) and deﬁne f ( b i | θ , Ψ ) = log p X ( x i | b i , θ ) + log p B ( b i | Ψ ) , (9) where p X ( x i | b i , θ ) is giv en in (4). Then log R B e f ( b i | θ , Ψ ) d b i can b e appro ximated using a second order T aylor series expansion, known as Laplace approximation: log Z B e f ( b i | θ , Ψ ) d b i ' f ( ˆ b i | θ , Ψ ) + q 2 log(2 π ) − 1 2 log    − H ( ˆ b i | θ , Ψ )    where ˆ b i = arg max b i f ( b i | θ , Ψ ) , and H ( b i | θ , Ψ ) = ∂ 2 f ( b i | θ , Ψ ) /∂ b i ∂ b i T is the Hessian of f w.r.t. b i . Thus, the log-likelihoo d function is appro ximately given b y log L ( θ , Ψ ) ' M X i =1  f ( ˆ b i | θ , Ψ ) + q 2 log(2 π ) − 1 2 log    − H ( ˆ b i | θ , Ψ )     (10) and the v alues of θ and Ψ maximizing (10) are approximated MLEs. F or the sp ecial case where − f ( b i | θ , Ψ ) is quadratic and conv ex in b i the Laplace approximation is ex- 7 act (Jo e (2008)) and (10) pro vides the exact lik eliho o d function. An appro ximation of log L ( K ) ( θ , Ψ ) can b e derived in the same wa y , and w e denote with ( ˆ θ ( K ) , ˆ Ψ ( K ) ) = arg min θ ∈ Θ , Ψ ∈ Υ − log L ( K ) ( θ , Ψ ) the corresp onding appro ximated MLE of ( θ , Ψ ) . Davidian and Giltinan (2003) recommend to use the Laplace approximation only if n i is “large”; ho wev er Ko and Davidian (2000) note that even if the n i ’s are small, the inferences should still b e v alid if the magnitude of in tra-individual v ariation is small relativ e to the inter-individual v ariation. This happens in man y applications, e.g. in pharmacokinetics. In general, computing (10) or log L ( K ) ( θ , Ψ ) is non-trivial, since M indep enden t opti- mization pro cedures m ust be run to obtain the ˆ b i ’s and then the M Hessians H ( ˆ b i | θ , Ψ ) m ust b e computed. The latter problem can b e solved using either (i) approximations based on ﬁnite diﬀerences, (ii) computing the analytic expressions of the Hessians using a sym b olic calculus program or (iii) using automatic diﬀeren tiation to ols (AD, e.g. Griew ank (2000)). W e recommend to av oid metho d (i) since it is computationally costly when the dimen- sion of M and/or b i gro ws, whereas metho ds (ii)-(iii) are reliable choices, since sym b olic pac k ages are b ecoming standard in most softw are and are anyw ay necessary to calculate an approximation p ( K ) X to the transition densit y . How ev er, when a symbolic pac k age is not a v ailable to the user or is not of help in some sp eciﬁc situations, AD can b e a con venien t (if not the only p ossible) choice, esp ecially when the function to b e diﬀeren tiated is deﬁned via a complex soft ware co de; see the Conclusion for a discussion. In order to derive the required Hessian automatically w e used the AD to ol ADiMat for Ma tlab (Bisc hof et al. (2005)), see http://www.autodiff.org for a comprehensive list of other AD softw are. F or example, assume that a user deﬁned Ma tlab function named loglik_indiv computes the f function in (9) at a giv en v alue of the random eﬀects b i (named b_rand ): result = loglik_indiv(b_rand) so result con tains the v alue of f at b i . The follo wing Ma tlab code then in vok es ADiMat and creates automatically a ﬁle named g_loglik_indiv containing the code necessary to return the exact (to mac hine precision) Hessian of loglik_indiv w.r.t. b_rand : addiff(@loglik_indiv, ’b_rand’,[], ’--2ndorderfwd’) A t this p oin t we initialize the arra y and the matrix that will con tain the gradien t and the Hessian of f w.r.t. the q -dimensional vector b i : gradient = createFullGradients(b_rand); % inizialize the gradient Hessian = createHessians([q q], b_rand); % inizialize the Hessian [Hessian, gradient] = g_loglik_indiv(Hessian, gradient, b_rand); The last line returns the desired Hessian and the gradien t of f ev aluated at b_rand . W e used the Hessian either to compute the Laplace appro ximation or in the trust region 8 metho d used for the in ternal step optimization, read b elo w. When it is p ossible to deriv e the expression for the Hessian analytically w e strongly recommend to av oid the use of AD to ols in order to sp eed up the estimation algorithm. F or example, in Section 5.1 only t wo random eﬀects are considered and using the Ma tlab Sym b olic Calculus T o olbox w e ha v e obtained the analytic expression for the Hessian without m uch eﬀort. F or the remainder of this Section the reference to the index K is dropp ed, except where necessary , as the following apply irresp ectiv ely of whether p X or p ( K ) X is used. The mini- mization of − log L ( θ , Ψ ) is a nested optimization problem. First the in ternal optimization step estimates the ˆ b i ’s for every unit (the ˆ b i ’s are sometimes kno wn in the literature as em- pirical Ba y es estimators). Since b oth symbolic calculus and AD tools provide exact results for the deriv ativ es of f ( b i ) , the v alues pro vided via AD being only limited b y the computer precision, the exact gradient and Hessian of f ( b i ) can b e used to minimize − f ( b i ) w.r.t. b i . W e used the subspace trust-region metho d describ ed in Coleman and Li (1996) and implemen ted in the Ma tlab fminunc function. The external optimization step minimizes − log L ( θ , Ψ ) w.r.t. ( θ , Ψ ) , after plugging the ˆ b i ’s in to (10). This is a computationally hea vy task, esp ecially for large M , b ecause the M internal optimization steps must b e p erformed for eac h inﬁnitesimal v ariation of the parameters ( θ , Ψ ) . Therefore to p erform the external step we rev erted to deriv ative-free optimization metho ds, namely the Nelder- Mead simplex with additional chec ks on parameter b ounds, as implemented b y D’Errico (2006) for Ma tlab . T o sp eed up the algorithm con vergence, ˆ b i ( k ) ma y b e used as starting v alue for b i in the ( k + 1) th iteration of the in ternal step, where ˆ b i ( k ) is the estimate of b i at the end of the k th iteration of the in ternal step. This might not b e an optimal strategy , ho wev er it should improv e ov er the c hoice made b y some authors who use a v ector of zeros as starting v alue for b i eac h time the internal step is p erformed. The latter strategy may b e ineﬃcient when dealing with highly time consuming problems, as it requires many more iterations. Once estimates for θ and Ψ are av ailable, estimates of the random parameters b i are automatically giv en b y the v alues of the ˆ b i at the last iteration of the external optimization step, see Section 5.2 for an example. 5. SIMULA TION STUDIES In this Section the eﬃcacy of the metho d is assessed through Monte Carlo sim ulations under diﬀerent exp erimen tal designs. W e alw ays choose M and n to b e not v ery large, since in most applications, e.g. in the biomedical context, large datasets are often una v ailable. Ho wev er, see Picc hini et al. (2008) for the application of a one-dimensional SDMEM on a v ery large data set. 5.1. Or ange T r e es Gr owth Mo del The follo wing is as a toy example for gro wth mo dels, where SDEs are used regularly , esp ecially to describ e animal gro wth that allows for non-monotone growth and can mo del unexp ected c hanges in gro wth rates, see Donnet et al. (2010) for an application to chic ken 9 gro wth, Strathe et al. (2009) for an application to gro wth of pigs, and Filipe et al. (2010) for an application to b o vine data. In Lindstrom and Bates (1990) and Pinheiro and Bates (2002, Sections 8.1.1-8.2.1), data from a study on the growth of orange trees are studied by means of deterministic nonlinear mixed-eﬀects mo dels using the method proposed in Lindstrom and Bates (1990). The data are a v ailable in the Orange dataset provided in the nlme R pack age (Pinheiro et al. (2007)). This is a balanced design consisting of sev en measuremen ts of the circumference of eac h of ﬁv e orange trees. In these references, a logistic mo del was considered to study the relationship b et ween the circumference X i,j (mm), measured on the i th tree at age t ij (da ys), and the age ( i = 1 , ..., 5 and j = 1 , ..., 7 ): X i,j = φ 1 1 + exp( − ( t ij − φ 2 ) /φ 3 ) + ε ij (11) with φ 1 (mm), φ 2 (da ys) and φ 3 (da ys) all p ositiv e, and ε ij ∼ N (0 , σ 2 ε ) are i.i.d. mea- suremen t error terms. The parameter φ 1 represen ts the asymptotic v alue of X as time go es to inﬁnit y , φ 2 is the time v alue at whic h X = φ 1 / 2 (the inﬂection p oint of the logis- tic mo del) and φ 3 is the time distance b et w een the inﬂection p oin t and the p oin t where X = φ 1 / (1 + e − 1 ) . In Picc hini et al. (2010) a SDMEM was deriv ed from mo del (11) with a normally distributed random eﬀect on φ 1 . The lik eliho o d approximation describ ed in Section 4.2 was applied to estimate parameters, but using Gaussian quadrature instead of the Laplace metho d to solv e the one-di mensional in tegral. Now consider a SDMEM with random eﬀects on b oth φ 1 and φ 3 . The dynamical mo del corresp onding to (11) for the i th tree and ignoring the error term is giv en by the following ODE dX i t dt = 1 ( φ 1 + φ i 1 )( φ 3 + φ i 3 ) X i t ( φ 1 + φ i 1 − X i t ) , X i 0 = x i 0 , t ≥ t i 0 with φ i 1 ∼ N (0 , σ 2 φ 1 ) indep enden t of φ i 3 ∼ N (0 , σ 2 φ 3 ) and b oth indep enden t of ε ij ∼ N (0 , σ 2 ε ) for all i and j . Now φ 2 only app ears in the deterministic initial condition X i 0 = X i t i 0 = φ 1 / (1 + exp[( φ 2 − t i 0 ) /φ 3 ]) , where t i 0 = 118 da ys for all the trees. In growth data it is often observ ed that the v ariance is prop ortional to the lev el, which is obtained in an SDE if the diﬀusion co eﬃcien t is prop ortional to the square root of the pro cess itself. Consider a state-dep enden t diﬀusion co eﬃcien t leading to the SDMEM: dX i t = 1 ( φ 1 + φ i 1 )( φ 3 + φ i 3 ) X i t ( φ 1 + φ i 1 − X i t ) dt + σ p X i t dW i t , X i 0 = x i 0 , (12) φ i 1 ∼ N (0 , σ 2 φ 1 ) , φ i 3 ∼ N (0 , σ 2 φ 3 ) , (13) where σ has units (mm / days) 1 / 2 . Th us, θ = ( φ 1 , φ 3 , σ ) , b i = ( φ i 1 , φ i 3 ) and Ψ = ( σ φ 1 , σ φ 3 ) . Since the random eﬀects are indep enden t, the densit y p B in (9) is p B ( b i | Ψ ) = ϕ ( φ i 1 ) ϕ ( φ i 3 ) , where ϕ ( φ i 1 ) and ϕ ( φ i 3 ) are normal p dfs with means zero and standard deviations σ φ 1 and σ φ 3 , resp ectiv ely . 10 T able 1: Orange trees gro wth: Monte Carlo maxim um lik eliho od estimates and 95% conﬁdence interv als from 1000 sim ulations of mo del (12)-(13), using an order K = 2 for the closed form density expansion (CFE) and the densit y approximation based on the Euler-Maruy ama discretization (EuM). F or the CFE metho d measures of symmetry are also rep orted. T rue parameter values φ 1 φ 3 σ σ φ 1 σ φ 3 ˆ φ 1 ˆ φ 3 ˆ σ ˆ σ φ 1 ˆ σ φ 3 M = 5 , n + 1 = 7 195 350 0.08 25 52.5 Mean CFE [95% CI] 197.40 [164.98, 236.97] 356.88 [281.74, 460.95] 0.079 [0.057, 0.102] 15.68 [ 1 . 7 × 10 − 7 , 44.33] 28.67 [ 5 . 8 × 10 − 8 , 112.28] Skewness CFE 0.35 0.59 0.22 0.55 1.11 Kurtosis CFE 3.30 3.56 3.06 2.76 3.88 Mean EuM [95% CI] 183.35 [154.62, 217.93] 303.75 [236.57, 398.10] 0.089 [0.060, 0.123] 12.60 [ 1 . 5 × 10 − 7 , 39.56] 34.96 [ 5 . 9 × 10 − 8 , 112.48] M = 5 , n + 1 = 20 195 350 0.08 25 52.5 Mean CFE [95% CI] 196.71 [164.48, 236.39] 352.16 [274.88, 461.84] 0.079 [0.067, 0.090] 15.73 [ 2 × 10 − 7 , 43.67] 30.77 [ 7 × 10 − 8 , 114.94] Skewness CFE 0.33 0.63 -0.04 0.44 1.03 Kurtosis CFE 3.33 3.67 2.93 2.38 3.69 Mean EuM [95% CI] 192.50 [161.45, 230.09] 339.12 [264.82, 445.84] 0.080 [0.068, 0.091] 15.01 [ 1 . 9 × 10 − 7 , 41.48] 32.63 [ 6 . 4 × 10 − 8 , 114.79] M = 30 , n + 1 = 7 195 350 0.08 25 52.5 Mean CFE [95% CI] 196.06 [183.41, 209.52] 354.55 [317.66, 395.48] 0.081 [0.072, 0.092] 22.71 [7.25, 33.45] 42.18 [ 1 . 5 × 10 − 4 , 73.84] Skewness CFE 0.20 0.32 0.14 -0.89 -0.65 Kurtosis CFE 3.15 3.29 3.14 5.33 3.20 Mean EuM [95% CI] 182.89 [172.02, 194.68] 303.87 [273.18, 341.84] 0.093 [0.080, 0.106] 19.23 [0.05, 27.82] 48.54 [12.13, 75.81] M = 30 , n + 1 = 20 195 350 0.08 25 52.5 Mean CFE [95% CI] 195.62 [183.33, 209.20] 351.18 [315.47, 389.21] 0.080 [0.075, 0.085] 23.04 [9.61, 34.03] 44.83 [ 2 . 2 × 10 − 4 , 74.65] Skewness CFE 0.20 0.30 0.05 -0.73 -0.71 Kurtosis CFE 3.10 3.27 2.76 5.08 3.62 Mean EuM [95% CI] 191.51 [179.93, 204.40] 338.19 [304.38, 374.94] 0.081 [0.076, 0.086] 22.24 [8.93, 32.83] 46.34 [ 4 . 4 × 10 − 4 , 75.03] W e generated 1000 datasets of dimension ( n + 1) × M from (12)-(13) and estimated ( θ , Ψ ) on eac h dataset, th us obtaining 1000 sets of parameter estimates. This w as rep eated for ( M , n + 1) = (5 , 7) , (5 , 20) , (30 , 7) and (30 , 20) . T ra jectories were generated using the Milstein scheme (Kloeden and Platen (1992)) with unit step size in the same time in terv al [118, 1582] as in the real data. The data were then extracted b y linear interpolation from the simulated tra jectories at the linearly equally spaced sampling times { t 0 , t 1 , ..., t n } for diﬀeren t v alues of n , where t 0 = 118 and t n = 1582 for ev ery n . An exception is the case M = 7 , n i = n = 5 , where { t 0 , ..., t n } = { 118 , 484 , 664 , 1004 , 1231 , 1372 , 1582 } , the same as in the data. P arameters w ere ﬁxed at ( X 0 , φ 1 , φ 3 , σ, σ φ 1 , σ φ 3 ) = (30 , 195 , 350 , 0 . 08 , 25 , 52 . 5) . The v alue for σ φ 3 is chosen suc h that the co eﬃcien t of v ariation for ( φ 3 + φ i 3 ) is 15%, i.e. φ i 3 has non-negligible inﬂuence. An order K = 2 approximation to the likelihoo d was used, see the App endix for the co eﬃcien ts. The estimates ( ˆ θ (2) , ˆ Ψ (2) ) ha ve b een obtained as describ ed in Section 4.2 and are denoted as CFE in T able 1, where CFE stands for Closed F orm Expansion to denote that a closed-form transition density expansion technique has b een used. The CFE estimates w ere used to pro duce the ﬁt for ( M , n + 1) = (5 , 20) given in Figure 1(a), rep orting simulated data and empirical mean of 5000 sim ulated tra jectories from (12)- (13) generated with the Milstein scheme using a step-size of unit length. Empirical 95% conﬁdence bands of tra jectory v alues and three example tra jectories are also rep orted. F or eac h simulated tra jectory indep enden t realizations of φ i 1 and φ i 3 w ere pro duced b y dra wing from the normal distributions N (0 , ( ˆ σ (2) φ 1 ) 2 ) and N (0 , ( ˆ σ (2) φ 3 ) 2 ) . The corresp onding ﬁt for ( M , n ) = (30 , 20) is giv en in Figure 1(b). There is a p ositive linear correlation ( r = 0 . 42 , p < 0 . 001 ) betw een the estimates of φ 1 and φ 3 , see Figure 2 rep orting also the least squares ﬁt line. Similar relations w ere found when using diﬀeren t combinations of M and n . Histograms of the p opulation parameter estimates ˆ φ (2) 1 , ˆ φ (2) 3 and ˆ σ (2) are giv en in Figure 11 0 200 400 600 800 1000 1200 1400 1600 0 50 100 150 200 250 Time since December 31, 1968 (days) Trunk circumference (mm) (a) ( M , n + 1) = (5 , 20) 0 200 400 600 800 1000 1200 1400 1600 0 50 100 150 200 250 Time since December 31, 1968 (days) Trunk circumference (mm) (b) ( M , n + 1) = (30 , 20) Figure 1: Orange trees growth: simulated data (circles connected b y straight lines) and ﬁt of the SDMEM (12)-(13) using an order K = 2 for the densit y expansion. In panel (a) is ( M , n + 1) = (5 , 20) and in panel (b) is ( M , n + 1) = (30 , 20) . Eac h panel rep orts the empirical mean curve (smooth solid line), 95% empirical conﬁdence curves (dashed lines) and example sim ulated tra jectories. 170 175 180 185 190 195 200 205 210 215 220 280 300 320 340 360 380 400 420 440 φ 1 (2) φ 3 (2) Figure 2: Orange trees gro wth: scatterplot of ˆ φ (2) 3 vs ˆ φ (2) 1 and least squares ﬁt for ( M , n + 1) = (30 , 20) . 12 170 175 180 185 190 195 200 205 210 215 220 0 20 40 60 80 100 120 140 φ 1 (2) (a) ˆ φ (2) 1 280 300 320 340 360 380 400 420 440 0 20 40 60 80 100 120 140 φ 3 (2) (b) ˆ φ (2) 3 0.075 0.08 0.085 0.09 0.095 0 20 40 60 80 100 120 140 σ (2) (c) ˆ σ (2) Figure 3: Orange trees growth: histogram of p opulation parameter estimates obtained using an order K = 2 for the density expansion, and ﬁts of normal probability density functions for ( M , n + 1) = (30 , 20) . 3, with normal probability densit y functions ﬁtted on top. The normal densities ﬁt well to the histograms with estimated means (standard deviations) equal to 195.6 (6.6), 351.2 (19.2) and 0.080 (0.003) for ˆ φ (2) 1 , ˆ φ (2) 3 and ˆ σ (2) , resp ectiv ely . It is worth while to compare the metho dology presen ted here, where a closed form expan- sion to the transition densit y is used, with a more straightforw ard approac h, namely the ap- pro ximation to the transition density based on the so-called “one-step” Euler-Maruyama ap- pro ximation. Consider for ease of exp osition a scalar SDE dX t = µ ( X t ) dt + σ ( X t ) dW t start- ing at X 0 = x 0 . The SDE can b e appro ximated b y X t +∆ − X t = µ ( X t )∆ + σ ( X t )∆ 1 / 2 ε t +∆ for ∆ “small”, with { ε t +∆ } a sequence of indep enden t dra ws from the standard normal distribution. This leads to the following transition density appro ximation p X ( x t +∆ , ∆ | x t ) ≈ ϕ ( x t +∆ ; x t + µ ( x t )∆ , σ 2 ( x t )∆) (14) where ϕ ( · ; m, v ) is the p df of the normal distribution with mean m and v ariance v . The parameter estimates obtained using (14) instead of the closed-form appro ximation p ( K ) X are giv en in T able 1 and are denoted EuM (standing for Euler-Maruyama). The v alue for ∆ used in (14) is the time distance b et w een the simulated data p oin ts. The comparisons b et w een CFE and EuM for the same v alues of M and n hav e b een p erformed using the same simulated datasets. The quality of the estimates obtained with the CFE metho d compared to the simple EuM appro ximation is considerable impro ved for data sampled at lo w frequency ( ∆ large), i.e. when n = 7 , whic h is a common situation in applications. F or ( M = 30 , n = 7) the 95% conﬁdence in terv als for the EuM metho d even fail to con tain the true parameter v alues. The bad b eha vior of the EuM approximation when ∆ is not s mall is well do cumen ted (e.g. Jensen and P oulsen (2002), Sørensen (2004)) and therefore our results are not surprising. Several exp erimen ts with diﬀerent SDE models (not SDMEMs) ha ve b een conducted in Jensen and P oulsen (2002) where the conclusion is that although the CFE technique do es require tedious algebraic calculations, they seem to b e w orth the 13 eﬀort. 5.2. Two-Dimensional Ornstein-Uhlenb e ck Pr o c ess The OU pro cess has found n umerous applications in biology , physics, engineering and ﬁnance, see e.g. Picchini et al. (2008) and Ditlevsen and Lansky (2005) for applications in neuroscience, or F av etto and Samson (2010) for a tw o-dimensional OU mo del describing the tissue micro v ascularization in anti-cancer therap y . Consider the follo wing SDMEM of a tw o-dimensi onal OU pro cess: dX (1) i t = −  β 11 b i 11 ( X (1) i t − α 1 ) + β 12 b i 12 ( X (2) i t − α 2 )  dt + σ 1 dW (1) i t , (15) dX (2) i t = −  β 21 b i 21 ( X (1) i t − α 1 ) + β 22 b i 22 ( X (2) i t − α 2 )  dt + σ 2 dW (2) i t , (16) b i ll 0 ∼ Γ( ν ll 0 , ν − 1 ll 0 ) , l , l 0 = 1 , 2; i = 1 , ..., M (17) with initial v alues X ( k ) i 0 = x ( k ) i 0 , k = 1 , 2 . Here Γ( r , s ) denotes the Gamma distribution with p ositiv e parameters r and s and probabilit y densit y function p Γ ( z ) = 1 s r Γ( r ) z r − 1 e − z /s , z ≥ 0 , with mean 1 when s = r − 1 . The parameters b i ll 0 , β ll 0 , σ l and ν ll 0 are strictly positive ( l , l 0 = 1 , 2 ) whereas α 1 and α 2 are real. Let ∗ denote elemen t-wise m ultiplication. Rewrite the system in matrix notation as d X i t = β ∗ b i ( α − X i t ) dt + σ d W i t , X i 0 = x i 0 , i = 1 , ..., M (18) where X i t = X (1) i t X (2) i t ! , β =  β 11 β 12 β 21 β 22  , b i =  b i 11 b i 12 b i 21 b i 22  , α =  α 1 α 2  , σ =  σ 1 0 0 σ 2  , W i t = W (1) i t W (2) i t ! , X i 0 = X (1) i 0 X (2) i 0 ! . The matrices β ∗ b i and σ are assumed to ha v e full rank. Assume the random eﬀects are m utually indep enden t and indep enden t of X i 0 and W i t . Because of (17) the random eﬀects ha ve mean one and therefore E ( β ∗ b i ) = β is the p opulation mean. The set of parameters to b e estimated in the external optimization step is θ = ( α 1 , α 2 , β 11 , β 12 , β 21 , β 22 , σ 1 , σ 2 ) and Ψ = ( ν 11 , ν 12 , ν 21 , ν 22 ) . Ho wev er, during the internal optimization step it is necessary to estimate the b i ’s also, that is 4 M parameters. Thus, the total n umber of parameters in the o verall estimation algorithm with internal and external steps is 12 + 4 M . A stationary solution to (18) exists when the real parts of the eigen v alues of β ∗ b i are strictly p ositiv e, i.e. β ∗ b i has to b e p ositiv e deﬁnite. The OU pro cess is one of the only 14 m ultiv ariate mo dels with a kno wn transition densit y other than multiv ariate mo dels whic h reduce to the sup erp osition of univ ariate pro cesses. The transition densit y of mo del (18) for a giv en realization of the random eﬀects is the biv ariate Normal p X ( x i j , ∆ i j | x i j − 1 , b i , θ ) = (2 π ) − 1 | Ω | − 1 / 2 exp( − ( x i j − m ) T Ω − 1 ( x i j − m ) / 2) (19) with mean vector m = α + ( x i j − 1 − α ) exp( − ( β ∗ b i )∆ i j ) and cov ariance matrix Ω = λ − exp( − ( β ∗ b i )∆ i j ) λ exp( − ( β ∗ b i ) T ∆ i j ) , where λ = 1 2tr( β ∗ b i ) | β ∗ b i |  | β ∗ b i | σ σ T + ( β ∗ b i − tr( β ∗ b i ) I ) σ σ T ( β ∗ b i − tr( β ∗ b i ) I ) T  is the 2 × 2 matrix solution of the Lyapuno v equation ( β ∗ b i ) λ + λ ( β ∗ b i ) T = σ σ T and I is the 2 × 2 iden tity matrix (Gardiner (1985)). Here | A | denotes the determinan t and tr( A ) denotes the trace of a square matrix A . >F rom (18) we generated 1000 datasets of dimension 2( n + 1) × M and estimated the parameters using the proposed approximated metho d, thus obtaining 1000 sets of parameter estimates. A dataset consists of 2( n + 1) observ ations at the equally spaced sampling times { 0 = t i 0 < t i 1 < · · · < t i n = 1 } for eac h of the M exp erimen ts. The observ ations are obtained b y linear in terp olation from sim ulated tra jectories using the Euler-Maruy ama sc heme with step size equal to 10 − 3 (Klo eden and Platen (1992)). W e used the following set-up: ( X (1) i 0 , X (2) i 0 ) =(3, 3), ( α 1 , α 2 , β 11 , β 12 , β 21 , β 22 , σ 1 , σ 2 ) =(1, 1.5, 3, 2.5, 1.8, 2, 0.3, 0.5,), and ( ν 11 , ν 12 , ν 21 , ν 22 ) =(45, 100, 100, 25). An order K = 2 appro ximation to the lik eliho od was used, see the App endix for the co eﬃcien ts of the transition densit y expansion. The estimates ( ˆ θ (2) , ˆ Ψ (2) ) are giv en in T able 2. The ﬁt for ( M , 2( n + 1)) = (20 , 40) is given in Figures 4(a)-4(b) for X (1) i t and X (2) i t , resp ectiv ely . Eac h ﬁgure rep orts the sim ulated data, the empirical mean of 5000 simulated tra jectories from (15)-(17), generated with the Euler-Maruy ama sc heme using a step size of length 10 − 3 , the empirical 95% conﬁdence bands of tra jectory v alues as well as ﬁv e example tra jectories. F or eac h sim ulated tra jectory a realization of b i ll 0 w as pro duced b y dra wing from the Γ( ν (2) ll , ( ν (2) ll 0 ) − 1 ) distribution using the estimates given in T able 2. The empirical correlations of the p opulation parameter estimates is rep orted in Figure 5. There is a strong negativ e correlation b et ween the estimates of α 1 and α 2 ( r = − 0 . 97 , p < 0 . 001 ), whic h are the asymptotic means for X (1) i t and X (2) i t . The sum ˆ α (2) 1 + ˆ α (2) 2 results alw ays almost exactly equal to 2.5 in eac h dataset ( mean = 2 . 50 , standard deviation = 0.04), so the sum is more precisely determined than eac h mean parameter. This o ccurs b ecause there is a strong negativ e correlation b et w een X (1) i t and X (2) i t equal to -0.898 in the stationary distribution in this n umerical example. The individual mean parameters are un biased but with standard deviations ﬁv e times larger than the sum. There is a mo derate negativ e correlation b et w een β 21 and β 22 ( r = − 0 . 53 , p < 0 . 001 ). The estimation metho d pro vides estimates for the b i ’s also, given by the last v alues returned b y the internal optimization step in the last round of the ov erall algorithm. An 15 T able 2: Ornstein-Uhlenbeck mo del: Monte Carlo maxim um lik eliho od estimates and 95% conﬁdence in terv als from 1000 sim ulations of model (18), using an order K = 2 densit y expansion. T rue parameter v alues Estimates for M = 7 , 2( n + 1) = 40 α 1 α 2 β 11 β 12 ˆ α (2) 1 ˆ α (2) 2 ˆ β (2) 11 ˆ β (2) 12 1 1.5 3 2.5 Mean [95% CI] 1.00 [0.59, 1.40] 1.50 [1.00, 1.97] 3.03 [2.50, 3.59] 2.50 [2.49, 2.51] Skewness 0.19 -0.32 0.21 3.17 Kurtosis 5.27 5.73 3.29 59.60 β 21 β 22 σ 1 σ 2 ˆ β (2) 21 ˆ β (2) 22 ˆ σ (2) 1 ˆ σ (2) 2 1.8 2 0.3 0.5 Mean [95% CI] 1.61 [0.80, 2.14] 2.30 [1.56, 3.63] 0.307 [0.274, 0.339] 0.500 [0.494, 0.508] Skewness -1.13 1.17 0.15 -1.10 Kurtosis 5.45 4.77 3.05 46.62 ν 11 ν 12 ν 21 ν 22 ˆ ν (2) 11 ˆ ν (2) 12 ˆ ν (2) 21 ˆ ν (2) 22 45 100 100 25 Mean [95% CI] 104.79 [16.63, 171.62] 120.97 [5.90, 171.62] 105.97 [2.02, 171.62] 98.60 [4.92, 171.62] Skewness -0.10 -0.72 -0.35 -0.17 Kurtosis 1.26 2.13 1.74 1.25 T rue parameter v alues Estimates for M = 20 , 2( n + 1) = 40 α 1 α 2 β 11 β 12 ˆ α (2) 1 ˆ α (2) 2 ˆ β (2) 11 ˆ β (2) 12 1 1.5 3 2.5 Mean [95% CI] 1.00 [0.72, 1.27] 1.50 [1.19, 1.83] 3.01 [2.71, 3.32] 2.50 [2.50, 2.50] Skewness -0.13 -0.17 0.14 0.01 Kurtosis 7.01 6.84 3.32 34.66 β 21 β 22 σ 1 σ 2 ˆ β (2) 21 ˆ β (2) 22 ˆ σ (2) 1 ˆ σ (2) 2 1.8 2 0.3 0.5 Mean [95% CI] 1.71 [1.26, 2.03] 2.13 [1.72, 2.74] 0.307 [0.289, 0.327] 0.500 [0.495, 0.503] Skewness -1.05 0.80 -0.01 1.69 Kurtosis 5.23 3.96 3.00 41.16 ν 11 ν 12 ν 21 ν 22 ˆ ν (2) 11 ˆ ν (2) 12 ˆ ν (2) 21 ˆ ν (2) 22 45 100 100 25 Mean [95% CI] 83.35 [22.15, 171.62] 114.16 [18.18, 171.62] 105.00 [6.04, 171.62] 84.61 [7.36, 171.62] Skewness 0.66 -0.35 -0.26 0.24 Kurtosis 1.83 1.86 1.84 1.26 0 0.2 0.4 0.6 0.8 1 0 0.5 1 1.5 2 2.5 3 3.5 t X t (1) (a) X (1) i t 0 0.2 0.4 0.6 0.8 1 0.5 1 1.5 2 2.5 3 3.5 t X t (2) (b) X (2) i t Figure 4: Ornstein-Uhlen b ec k: simulated data (circles connected by straight lines), ﬁt of X (1) i t (panel (a)) and of X (2) i t (panel (b)) from the SDMEM (18) for ( M , 2( n + 1)) = (7 , 40) . F or each co ordinate of the system the panels rep ort the empirical mean curve of the SDMEM (smooth solid line), 95% empirical conﬁdence curv es (dashed lines) and ﬁve simulated tra jectories. 16 0.4 0.5 0.6 0.25 0.3 0.35 0 2 4 0 2 4 2.4 2.5 2.6 2 3 4 0 2 4 0 1 2 0.4 0.5 0.6 0.25 0.3 0.35 0 2 4 0 2 4 2.4 2.5 2.6 2 3 4 0 2 4 0 1 2 α 1 α 2 β 11 β 12 β 21 β 22 σ 1 σ 2 α 1 α 2 β 11 β 12 β 21 β 22 σ 1 σ 2 Figure 5: Ornstein-Uhlenbeck: scatterplot matrix of the estimates ( ˆ α (2) 1 , ˆ α (2) 2 , ˆ β (2) 11 , ˆ β (2) 12 , ˆ β (2) 21 , ˆ β (2) 22 , ˆ σ (2) 1 , ˆ σ (2) 2 ) for ( M , 2( n + 1)) = (20 , 40) . equiv alent strategy is to plug ( ˆ θ (2) , ˆ Ψ (2) ) in to (9) and then minimize − f ( b i ) w.r.t. b i and obtain ˆ b i (2) . The estimation of the random eﬀects is fast b ecause we mak e use of the explicit Hessian, and for this example only 2-3 iterations of the internal step algorithm w ere necessary . W e estimated the b i ’s b y plugging eac h of the 1000 sets of estimates in to (9), th us obtaining the corresp onding 1000 sets of estimates of b i . In Figure 6 b o xplots of the estimates of the four random eﬀects are rep orted for M = 7 , where estimates from diﬀerent units ha ve b een p o oled together. F or b oth M = 7 and M = 20 the estimates of the random eﬀects ha ve sample means equal to one, as it should b e giv en the distributional h yp othesis. The standard deviations of the true random eﬀects are giv en by 1 / √ ν ll and th us equal 0.15, 0.1, 0.1 and 0.2 for b i 11 , b i 12 , b i 21 and b i 22 , resp ectiv ely . The empirical standard deviations of the estimated random eﬀects for M = 7 are 0.09, 0.09, 0.12 and 0.11, whereas for M = 20 they are 0.09, 0.06, 0.08 and 0.09. The parameters could b e estimated b y plugging the exact transition densit y (19) in to (4) to form (9) and then maximize (10). Ho w ever, the eﬀort required for the estimation algorithm to con verge is computationally costly , b oth using the analytic expression for the Hessian of f in (10) or the one obtained using AD, since the Hessian has a huge expression when using the exact transition densit y . This problem is not present when using the densit y 17 0.5 1 1.5 2 2.5 b_11 b_12 b_21 b_22 Figure 6: Ornstein-Uhlenbeck: b o xplots of the random eﬀects estimates ˆ b i (2) for the SDMEM (18) for ( M , 2( n + 1)) = (7 , 40) . expansion b ecause the expansion consists of polynomials of the parameters. 5.3. The squar e r o ot SDMEM The square ro ot process is given b y dX t = − β ( X t − α ) dt + σ p X t dW t . This pro cess is ergo dic and its stationary distribution is the Gamma distribution with shap e parameter 2 β α/σ 2 and scale parameter σ 2 / (2 β ) provided that β > 0 , α > 0 , σ > 0 , and 2 β α ≥ σ 2 . The pro cess has many applications: it is, for instance, used in mathematical ﬁnance to mo del sh ort term interest rates where it is called the CIR pro cess, see Co x et al. (1985). It is also a particular example of an integrate-and-ﬁre model used to describ e the ev olution of the mem brane p oten tial in a neuron b et ween emission of electrical impulses, see e.g. Ditlevsen and Lansky (2006) and references therein. In the neuronal literature it is called the F eller pro cess, b ecause William F eller prop osed it as a mo del for p opulation gro wth in 1951. Consider the SDMEM dX i t = − ˜ β i ( X i t − α − α i ) dt + ˜ σ i p X i t dW i t , i = 1 , ..., M . (20) 18 Assume α i ∼ B ( p α , p α ) , ˜ σ i ∼ LN ( p σ 1 , p 2 σ 2 ) and ˜ β i ∼ LN ( p β 1 , p 2 β 2 ) . Here LN ( · , · ) denotes the (standard or 2-parameter) log-normal distribution and B ( p α , p α ) denotes the (general- ized symmetric) Beta distribution on the in terv al [ a, b ] , with densit y function p B ( z ) = 1 B ( p α , p α ) ( z − a ) p α − 1 ( b − z ) p α − 1 ( b − a ) 2 p α − 1 , p α > 0 , a ≤ z ≤ b, where B ( · , · ) is the b eta function and a and b are known constants. F or ease of in ter- pretation, assume the individual parameters ˜ β i and ˜ σ i to hav e unknown means β and σ resp ectiv ely , e.g. assume ˜ β i = β + β i and ˜ σ i = σ + σ i , β i and σ i b eing zero mean random quan tities. This implies that β and σ do not need to b e estimated directly: in fact the estimate for β results determined via the momen t relation β = exp( p β 1 + p 2 β 2 / 2) and can b e calculated once estimates for p β 1 and p β 2 are av ailable. Similarly , an estimate for σ can b e determined via σ = exp( p σ 1 + p 2 σ 2 / 2) b y plugging in the estimates for p σ 1 and p σ 2 . The parameters to b e estimated are θ = α , Ψ = ( p α , p β 1 , p β 2 , p σ 1 , p σ 2 ) and b i = ( α i , ˜ β i , ˜ σ i ) . T o ensure that X i t sta ys p ositiv e it is required that 2( α + α i ) ˜ β i / ( ˜ σ i ) 2 ≥ 1 . This condition must be c hec ked in eac h iteration of the estimation algorithm. The means and v ariances of the p opulation parameters added the random eﬀects are E ( α + α i ) = α + ( a + b ) / 2 ; V ar ( α + α i ) = ( b − a ) 2 / (4(2 p α + 1)) , E ( ˜ σ i ) = σ = exp( p σ 1 + p 2 σ 2 / 2) ; V ar ( ˜ σ i ) = (exp( p 2 σ 2 ) − 1) exp(2 p σ 1 + p 2 σ 2 ) , E ( ˜ β i ) = β = exp( p β 1 + p 2 β 2 / 2) ; V ar ( ˜ β i ) = (exp( p 2 β 2 ) − 1) exp(2 p β 1 + p 2 β 2 ) . F or ﬁxed v alues of the random eﬀects, the asymptotic mean for the exp erimen tal unit i is α + α i . In most applications this v alue should b e b ounded within physical realistic v alues, and th u s the Beta distribution w as c hosen for α i , since the supp ort of the distribution of α + α i is then [ α + a, α + b ] . As in the previous examples, 1000 sim ulations w ere p erformed b y generating equidistan t observ ations in the time in terv al [0 , 1] with the following setup: ( X i 0 , α, p α , p β 1 , p β 2 , p σ 1 , p σ 2 ) = (1 , 3 , 5 , 0 , 0 . 25 , 0 . 1 , 0 . 3) with ﬁxed constan ts [ a, b ] = [0 . 1 , 5] . The coeﬃcient of v ariations for α + α i , ˜ β i and ˜ σ i are then 13.3%, 25.4% and 30.7%, resp ectiv ely . The estimates obtained using an order K = 2 densit y expansion are giv en in T able 3. A p ositiv e bias for ˆ α (2) is noticeable, ho wev er results are ov erall satisfactory , ev en using small sample sizes. Bias in estimates of drift parameters on ﬁnite observ ation in terv als is a well kno wn problem, and especially the sp eed parameter β in mean reverting diﬀusion mo dels is kno wn to b e biased and highly v ariable. In T ang and Chen (2009) the biases for β in the OU and the square ro ot mo del are calculated to b e on the order of T , where T is the length of the observ ation interv al, and th us increasing n do es not impro v e the estimates unless the observ ation interv al is also increased. As describ ed in the Ornstein-Uhlenbeck example, w e ha ve v eriﬁed that the small sample distributions for the estimates of α i , ˜ β i and ˜ σ i ha ve the exp ected c haracteristics: e.g. in the case ( M , n + 1) = (5 , 7) , b y p ooling together estimates from diﬀeren t units we hav e the follo wing means (standard deviations) 2.61 (1.40), 0.95 (0.38) and 1.06 (0.29) for the estimates of α i , ˜ β i and ˜ σ i , resp ectiv ely . These v alues match w ell with the ﬁrst moments 19 T able 3: The square ro ot model: Mon te Carlo maximum likelihoo d estimates and 95% conﬁdence interv als from 1000 simulations of model (20), using an order K = 2 density expansion. Determined parameters are denoted with (*), i.e. true v alues for β and σ are determined according to the momen t relations β = exp( p β 1 + p 2 β 2 / 2) and σ = exp( p σ 1 + p 2 σ 2 / 2) . Estimates for determined parameters are calculated by plugging the estimates of p β 1 , 2 and p σ 1 , 2 obtained from eac h of the 1000 Monte Carlo simulations into the momen t relations, then av eraging o v er the 1000 determined v alues. T rue parameter values Estimates for M = 5 , n + 1 = 7 α β (*) σ (*) p α ˆ α (2) ˆ β (2) (*) ˆ σ (2) (*) ˆ p (2) α 3 1.03 1.16 5 Mean [95% CI] 4.06 [1,52, 9.85] 1.14 [0.51, 1.69] 1.13 [0.76, 1.60] 8.80 [0.94, 112.84] Skewness 1.23 0.49 0.60 4.37 Kurtosis 4.46 3.20 4.72 22.47 p β 1 p β 2 p σ 1 p σ 2 ˆ p (2) β 1 ˆ p (2) β 2 ˆ p (2) σ 1 ˆ p (2) σ 2 0 0.25 0.1 0.3 Mean [95% CI] -0.038 [-0.824, 0.177] 0.372 [0.001, 1.000] 0.082 [-0.278, 0.450] 0.173 [0.001, 0.561] Skewness -2.92 0.79 -0.20 0.88 Kurtosis 16.67 2.10 4.16 4.00 T rue parameter values Estimates for M = 10 , n + 1 = 20 α β (*) σ (*) p α ˆ α (2) ˆ β (2) (*) ˆ σ (2) (*) ˆ p (2) α 3 1.03 1.16 5 Mean [95% CI] 4.43 [1.85, 9.63] 1.21 [0.44, 1.69] 1.15 [0.88, 1.48] 5.31 [0.99, 64.63] Skewness 1.01 -0.04 0.44 6.35 Kurtosis 3.65 2.66 3.44 46.77 p β 1 p β 2 p σ 1 p σ 2 ˆ p (2) β 1 ˆ p (2) β 2 ˆ p (2) σ 1 ˆ p (2) σ 2 0 0.25 0.1 0.3 Mean [95% CI] -0.045 [-0.953, 0.154] 0.487 [0.001, 1] 0.108 [-0.166, 0.376] 0.153 [0.001, 0.447] Skewness -3.04 0.16 -0.03 0.71 Kurtosis 14.65 1.40 3.29 2.84 T rue parameter values Estimates for M = 20 , n + 1 = 20 α β (*) σ (*) p α ˆ α (2) ˆ β (2) (*) ˆ σ (2) (*) ˆ p (2) α 3 1.03 1.16 5 Mean [95% CI] 4.00 [2.35, 6.78] 1.21 [0.97, 1.68] 1.15 [0.98, 1.35] 2.33 [1.00, 5.00] Skewness 1.50 0.27 0.27 12.78 Kurtosis 6.74 2.98 3.38 174.05 p β 1 p β 2 p σ 1 p σ 2 ˆ p (2) β 1 ˆ p (2) β 2 ˆ p (2) σ 1 ˆ p (2) σ 2 0 0.25 0.1 0.3 Mean [95% CI] -0.011 [-0.069, 0.042] 0.498 [0.010, 1.000] 0.101 [-0.061, 0.256] 0.27 [0.13, 0.41] Skewness -5.97 0.22 -0.02 -0.01 Kurtosis 47.98 1.59 3.17 3.17 of the true random eﬀects E ( α i ) = (0 . 1 + 5) / 2 = 2 . 55 , E ( ˜ β i ) = 1 . 03 and E ( ˜ σ i ) = 1 . 16 and less well with the standard deviations SD α i = 0 . 74 , SD ˜ β i = 0 . 26 and SD ˜ σ i = 0 . 35 . A v erage estimation time on a dataset of dimension ( M , n + 1) = (10 , 20) w as around 95 seconds and around 160 seconds when ( M , n + 1) = (20 , 20) , using a Ma tlab program on an Intel Core 2 Quad CPU (3 GHz). 6. CONCLUSIONS An estimation metho d for population models deﬁned via SDEs, incorp orating random eﬀects, has been prop osed and ev aluated through simulations. SDE models with ran- dom eﬀects hav e rarely b een studied, as it is still non-trivial to estimate parameters in SDEs, even on single/individual tra jectories, due to diﬃculties in deriving analytically the transition densities and the computational cost required to appro ximate the densities n u- merically . Approximation metho ds for transition densities is an imp ortan t research topic, since a go od appro ximation is necessary to carry out inferences based on the likelihoo d function, whic h guaran tees well known optimal prop erties for the resulting estimators. Of the several approximate metho ds prop osed in the last decades (see e.g. Sørensen (2004) and Hurn et al. (2007) for reviews) here we hav e considered the one suggested b y Aït-Sahalia (2008) for the case of multidimensional SDEs, since it results in an accurate closed-form appro ximation for p X (Jensen and P oulsen (2002)). In this w ork SDEs with m ultiple random eﬀects ha ve b een studied, mo ving a step 20 forw ard with resp ect to the results presented in Picc hini et al. (2010), where Gaussian quadrature w as used to solv e the in tegrals for a single random eﬀect. The latter approac h results unfeasible when there are sev eral random eﬀects because the dimension of the in tegral gro ws. In fact, it ma y b e diﬃcult to n umerically ev aluate the in tegral in (3) and (7) when b i ∈ B ⊆ R q , with q m uch larger than 2, and eﬃcient n umerical algorithms are needed. As noted by Bo oth et al. (2001), if e.g. q = 20 one cannot count on standard statistical soft ware to maximize the lik eliho od, and numerical integration quadrature is only an option if the dimension of the in tegral is lo w, whereas it quickly becomes unreliable when the dimension gro ws. Some references are the review pap er b y Co ols (2002), Krommer and Ueb erh ub er (1998) and references therein, or one of the several monographs on Mon te Carlo metho ds (e.g. Ripley (2006)). In the mixed-eﬀects framework the amount of literature dev oted to the ev aluation of q -dimensional integrals is large, see e.g. Da vidian and Giltinan (2003), Pinheiro and Bates (1995), McCullo ch and Searle (2001) and Pinheiro and Chao (2006). W e decided to use the Laplace approximation because using a symbolic calculus soft ware it is relatively easy to obtain the Hessian matrix necessary for the calculations, whic h results useful also to sp eed up the optimization algorithm. Computing deriv atives of long expressions can be a tedious and error prone task even with the help of a softw are for sym b olic calculus. In those cases we rev erted to softw are for automatic diﬀeren tiation (AD, e.g. Griewank (2000)). Although the presen t w ork do es not necessarily rely on AD to ols, it is worth while to sp end a few words to describ e roughly what AD is, since it is relatively unkno wn in the statistical comm unity ev en if it has already b een applied in the mixed-eﬀects ﬁeld (Sk aug (2002); Sk aug and F ournier (2006)). AD should not b e confused with sym b olic calculus since it do es not produce analytic expressions for the deriv ativ es/Hessians of a given function, i.e. it do es not pro duce expressions mean t to b e understo od by the human ey e. Instead, given a program computing some function h ( u ) , the application of AD on h ( u ) pro duces another program implementing the calculations necessary to compute gradients, Hessians etc. of h ( u ) exactly (to machine precision); furthermore, AD can diﬀeren tiate programs including e.g. for lo ops or if-else statemen ts, whic h are outside the scope of symbolic diﬀeren tiation. See http://www. autodiff.org for a list of AD soft ware tools. Ho wev er, the p ossibilit y of easily deriving gradien ts and Hessians using AD comes at a price. The co de pro duced b y AD to compute the deriv atives of h ( u ) ma y result so long and complex that it might aﬀect negatively the p erformance of the ov erall estimation pro cedure, when inv ok ed into an optimization pro cedure. Thus, w e suggest to use analytic expressions whenever possible. How ever, at the v ery least, an AD program can still b e useful to c heck whether analytically obtained results are correct or not. Mo dellers and practitioners might consider the soft ware AD Mo del Builder (ADMB Pro ject (2009)), pro viding a framework in tegrating AD, mo del building and data ﬁtting to ols, whic h comes with its o wn mo dule for mixed-eﬀects mo delling. This w ork has a n umber of limitations, mostly due to the diﬃculty in carrying out the closed-form appro ximation to the lik eliho o d for m ultidimensional S DEs ( d ≥ 2 ). It is ev en more diﬃcult when the diﬀusion is not reducible, although mathematical metho ds to treat this case are a v ailable (Aït-Sahalia (2008)). Another limitation is that measurement error is not mo delled, which is a problem if this noise source is not negligible relatively 21 to the system noise. The R PSM pack age is capable of mo delling measuremen t error and uses the Extended Kalman Filter (EKF) to estimate SDMEMs (Klim et al. (2009)). EKF pro vides appro ximations for the individual lik eliho o ds which are exact only for linear SDEs. The closed-form expansion considered in the presen t w ork can instead provide an approximation as go o d as desired to the individual likelihoo d (8) by increasing the order K of the expansion, though it can b e a tedious task. Lik e in the present pap er, PSM considers a Laplace approximation to multidimensional integrals, but Hessians are obtained using an approximation to the second order deriv atives (ﬁrst order conditional estimation, FOCE); in our work Hessians are obtained exactly (to machine precision) using automatic diﬀeren tiation. Unfortunately , the structural diﬀerences b etw een our metho d and PSM mak e a rigorous comparison betw een the t w o methods imp ossible, even simply in terms of computational times, since PSM requires the speciﬁcation of a measurement error factor and th us b oth the observ ations and the num b er of parameters considered in the estimation are diﬀerent. Finally , PSM assumes multiv ariate normally distributed eﬀects only , whereas in our metho d this restriction is not necessary . W e b eliev e the class of SDMEMs to b e useful in applications, esp ecially in those areas where mixed-eﬀects theory is used routinely , e.g. in biomedical and pharmacoki- netic/pharmaco dynamic studies. F rom a theoretical point of view SDMEMs are necessary when analyzing rep eated measurements data if b oth the v ariabilit y b et w een exp eriments to obtain more precise estimates of p opulation characteristics, as well as sto c hasticity in the individual dynamics should b e tak en in to account. App endixA. REDUCIBILITY AND DENSITY EXP ANSION COEFFICIENTS App endixA.1. R e ducible diﬀusions The following is a necessary and suﬃcien t condition for the reducibility of a multiv ariate diﬀusion pro cess (Aït-Sahalia (2008)): Prop osition 1. The diﬀusion X is r e ducible if and only if d X q =1 ∂ σ ik ( x ) ∂ x ( q ) σ q j ( x ) = d X q =1 ∂ σ ij ( x ) ∂ x ( q ) σ q k ( x ) for e ach x in E and triplet ( i, j, k ) = 1 , ..., d . If σ is nonsingular, then the c ondition is ∂ { σ − 1 } ij ( x ) ∂ x ( k ) = ∂ { σ − 1 } ik ( x ) ∂ x ( j ) wher e { σ − 1 } ij ( x ) is the ( i, j ) -th element of σ − 1 ( x ) . App endixA.2. Gener al expr essions for the density exp ansion c o eﬃcients Here are rep orted the explicit expressions for the coeﬃcients of the log-densit y expan- sion (6) as given in Aït-Sahalia (2008). The use of a symbolic algebra program is advised 22 for the practical calculation of the co eﬃcien ts. F or t wo given d -dimensional v alues y and y 0 of the pro cess Y t = γ ( X t ) the co eﬃcien ts of the log-densit y expansion are given by C ( − 1) Y ( y | y 0 ) = − 1 2 d X h =1 ( y ( h ) − y ( h ) 0 ) 2 , C (0) Y ( y | y 0 ) = d X h =1 ( y ( h ) − y ( h ) 0 ) Z 1 0 µ ( h ) Y ( y 0 + u ( y − y 0 )) du C ( k ) Y ( y | y 0 ) = k Z 1 0 G ( k ) Y ( y 0 + u ( y − y 0 ) | y 0 ) u k − 1 du. for k ≥ 1 . The functions G ( k ) Y are giv en by G (1) Y ( y | y 0 ) = − d X h =1 ∂ µ Y ( h ) ( y ) ∂ y ( h ) − d X h =1 µ Y ( h ) ( y ) ∂ C (0) Y ( y | y 0 ) ∂ y ( h ) + 1 2 d X h =1  ∂ 2 C (0) Y ( y | y 0 ) ∂ y ( h ) 2 +  ∂ C (0) Y ( y | y 0 ) ∂ y ( h )  2  and for k ≥ 2 G ( k ) Y ( y | y 0 ) = − d X h =1 µ Y ( h ) ( y ) ∂ C ( k − 1) Y ( y | y 0 ) ∂ y ( h ) + 1 2 d X h =1 ∂ 2 C ( k − 1) Y ( y | y 0 ) ∂ y ( h ) 2 + 1 2 d X h =1 k − 1 X h 0 =0  k − 1 h 0  ∂ C ( h 0 ) Y ( y | y 0 ) ∂ y ( h ) ∂ C ( k − 1 − h 0 ) Y ( y | y 0 ) ∂ y ( h ) . App endixA.3. Co eﬃcients of the or ange tr e es gr owth SDMEM In mo del (12)-(13) is Y i t = γ ( X i t ) = 2 p X i t /σ so µ Y ( Y i t ) = Y i t ( φ 1 + φ i 1 − σ 2 Y i t 2 / 4) / (2( φ 3 + φ i 3 )( φ 1 + φ i 1 )) − 1 / (2 Y i t ) , and for giv en v alues y i j and y i j − 1 of Y i t , w e hav e C ( − 1) Y ( y i j | y i j − 1 ) = − 1 2 ( y i j − y i j − 1 ) 2 C (0) Y ( y i j | y i j − 1 ) = − σ 2 ( y i j 4 − y i j − 1 4 ) 32( φ 3 + φ i 3 )( φ 1 + φ i 1 ) + ( y i j 2 − y i j − 1 2 ) 4( φ 3 + φ i 3 ) − 1 2 log  y i j y i j − 1  23 C (1) Y ( y i j | y i j − 1 ) = − σ 4  y i j 6 + y i j 5 y i j − 1 + y i j 4 y i j − 1 2 + ( y i j y i j − 1 ) 3 + y i j 2 y i j − 1 4 + y i j y i j − 1 5 + y i j − 1 6  896( φ 3 + φ i 3 ) 2 ( φ 1 + φ i 1 ) 2 + σ 2 (10( φ 3 + φ i 3 )( y i j 2 + y i j y i j − 1 + y i j − 1 2 ) + 3( y i j 5 + y i j 2 y i j − 1 + y i j − 1 3 )) 240( φ 3 + φ i 3 ) 2 ( φ 1 + φ i 1 ) − 9( φ 3 + φ i 3 ) 2 + y i j y i j − 1 ( y i j 2 + y i j y i j − 1 + y i j − 1 2 ) 24 y i j y i j − 1 ( φ 3 + φ i 3 ) 2 C (2) Y ( y i j | y i j − 1 ) = − σ 4 (5( y i j 4 + y i j − 1 4 ) + 8 y i j y i j − 1 ( y i j 2 + y i j − 1 2 ) + 9 y i j 2 y i j − 1 2 ) 896( φ 3 + φ i 3 ) 2 ( φ 1 + φ i 1 ) 2 + σ 2  9( y i j 2 + y i j − 1 2 ) + 12 y i j y i j − 1 + 10( φ 3 + φ i 3 )  240( φ 3 + φ i 3 ) 2 ( φ 1 + φ i 1 ) − ( y i j 2 y i j − 1 2 + 9( φ 3 + φ i 3 ) 2 ) 24 y i j 2 y i j − 1 2 ( φ 3 + φ i 3 ) 2 and p (2) X ( x i j , ∆ i j | x i j − 1 ) = 1 q 2 π σ 2 ∆ i j x i j exp  − 2  q x i j − q x i j − 1  2 σ 2 ∆ i j + ˜ C (0) ( x i j | x i j − 1 ) + ˜ C (1) ( x i j | x i j − 1 )∆ i j + ∆ i j 2 2 ˜ C (2) ( x i j | x i j − 1 )  where ˜ C ( k ) ( x i j | x i j − 1 ) = C ( k ) Y  2 √ x i j σ    2 √ x i j − 1 σ  , k = 0 , 1 , 2 . App endixA.4. Co eﬃcients of the two-dimensional OU SDMEM The pro cess (15)-(16) is reducible and γ ( x i ) = σ − 1 x i = ( x (1) i /σ 1 , x (2) i /σ 2 ) T , so d Y i t =  σ − 1 ( β ∗ b i ) α − σ − 1 ( β ∗ b i ) σ Y i t  dt + d W i t := κ i ( η − Y i t ) dt + d W i t where η = σ − 1 α = ( η 1 , η 2 ) T and κ i = σ − 1 ( β ∗ b i ) σ = { κ i q ,q 0 } q ,q 0 =1 , 2 . If y i j = ( y (1) i j , y (2) i j ) T and y i j − 1 = ( y (1) i j − 1 , y (2) i j − 1 ) T are tw o v alues from Y i t , the co eﬃcien ts of the order K = 2 densit y 24 expansion (6) for mo del (15)-(16) are giv en b y: C ( − 1) Y ( y i j | y i j − 1 ) = − 1 2 ( y (1) i j − y (1) i j − 1 ) 2 − 1 2 ( y (2) i j − y (2) i j − 1 ) 2 , C (0) Y ( y i j | y i j − 1 ) = − 1 2 ( y (1) i j − y (1) i j − 1 )(( y (1) i j + y (1) i j − 1 − 2 η 1 ) κ i 11 + ( y (2) i j + y (2) i j − 1 − 2 η 2 ) κ i 12 ) − 1 2 ( y (2) i j − y (2) i j − 1 )(( y (1) i j + y (1) i j − 1 − 2 η 1 ) κ i 21 + ( y (2) i j + y (2) i j − 1 − 2 η 2 ) κ i 22 ) , C (1) Y ( y i j | y i j − 1 ) = 1 2 ( κ i 11 − (( y (1) i j − 1 − η 1 ) κ i 11 + ( y (2) i j − 1 − η 2 ) κ i 12 ) 2 ) + 1 2 ( κ i 22 − (( y (1) i j − 1 − η 1 ) κ i 21 + ( y (2) i j − 1 − η 2 ) κ i 22 ) 2 ) − 1 2 ( y (1) i j − y (1) i j − 1 )(( y (1) i j − 1 − η 1 )( κ i 11 2 + κ i 21 2 ) + ( y (2) i j − 1 − η 2 )( κ i 11 κ i 12 + κ i 21 κ i 22 )) + 1 24 ( y (1) i j − y (1) i j − 1 ) 2 ( − 4 κ i 11 2 + κ i 12 2 − 2 κ i 12 κ i 21 − 3 κ i 21 2 ) − 1 2 ( y (2) i j − y (2) i j − 1 )(( y (1) i j − 1 − η 1 )( κ i 11 κ i 12 κ i 21 κ i 22 ) +( y (2) i j − 1 − η 2 )(( κ i 12 ) 2 + ( κ i 22 ) 2 )) + 1 24 ( y (2) i j − y (2) i j − 1 ) 2 ( − 4( κ i 22 ) 2 + ( κ i 21 ) 2 − 2 κ i 12 κ i 21 − 3( κ i 12 ) 2 ) − 1 3 ( y (1) i j − y (1) i j − 1 )( y (2) i j − y (2) i j − 1 )( κ i 11 κ i 12 + κ i 21 κ i 22 ) , C (2) Y ( y i j | y i j − 1 ) = − 1 12 (2 κ i 11 2 + 2 κ i 22 2 + ( κ i 12 + κ i 21 ) 2 ) + 1 6 ( y (1) i j − y (1) i j − 1 )( κ i 12 − κ i 21 )(( y (1) i j − 1 − η 1 )( κ i 11 κ i 12 + κ i 21 κ i 22 ) + ( y (2) i j − 1 − η 2 )( κ i 12 2 + κ i 22 2 )) + 1 12 ( y (1) i j − y (1) i j − 1 ) 2 ( κ i 12 − κ i 21 )( κ i 11 κ i 12 + κ i 21 κ i 22 ) + 1 12 ( y (2) i j − y (2) i j − 1 ) 2 ( κ i 21 − κ i 12 )( κ i 11 κ i 12 + κ i 21 κ i 22 ) + 1 6 ( y (2) i j − y (2) i j − 1 )( κ i 21 − κ i 12 )(( y (1) i j − 1 − η 1 )( κ i 11 2 + κ i 21 2 ) +( y (2) i j − 1 − η 2 )( κ i 11 κ i 12 + κ i 21 κ i 22 )) + 1 12 ( y (1) i j − y (1) i j − 1 )( y (2) i j − y (2) i j − 1 )( κ i 12 − κ i 21 )( κ i 22 2 + κ i 12 2 − κ i 11 2 + κ i 21 2 ) . App endixA.5. Co eﬃcients of the squar e r o ot SDMEM F or mo del (20) w e hav e Y i t = 2 p X i t ˜ σ i 25 and µ Y ( Y i t ) = 2 q + 1 2 Y i t − ˜ β i Y i t 2 , where q = 2 ˜ β i ( α + α i ) / ( ˜ σ i ) 2 − 1 . F or giv en v alues y i j − 1 and y i j of Y i t the co eﬃcien ts of the order K = 2 densit y expansion are: C ( − 1) Y ( y i j | y i j − 1 ) = − 1 2 ( y i j − y i j − 1 ) 2 , C (0) Y ( y i j | y i j − 1 ) = log  y i j y i j − 1  q + 1 2  − 1 4 ˜ β i ( y i j 2 − y i j − 1 2 ) , C (1) Y ( y i j | y i j − 1 ) = − 1 24 y i j − 1 y i j  − 12 ˜ β i y i j y i j − 1 ( q + 1) + ( ˜ β i ) 2 ( y i j 3 y j − 1 + ( y i j y i j − 1 ) 2 + y i j y i j − 1 3 ) + 12 q 2 − 3  , C (2) Y ( y i j | y i j − 1 ) = = − 1 24( y i j y i j − 1 ) 2  ( ˜ β i ) 2 ( y i j y i j − 1 ) 2 + 12 q 2 − 3  . A c knowledgmen ts Supp orted by grants from the Danish Council for Indep enden t Researc h | Natural Sci- ences to S. Ditlevsen. U. Picchini thanks the Department of Mathematical Sciences at the Univ ersity of Cop enhagen, Denmark, for funding his research for the present w ork during y ear 2008. 26 ADMB Pro ject, 2009. AD Mo del Builder: automatic diﬀerentiation mo del builder. Devel- op ed b y Da vid F ournier and freely av ailable from admb- project.org . Aït-Sahalia, Y., 2008. Closed-form likelihoo d expansion for multiv ariate diﬀusions. Ann. Stat. 36 (2), 906–937. Allen, E., 2007. Mo deling with Itô Stochastic Diﬀeren tial Equations. Springer. Bisc hof, C., Bück er, M., V ehreschild, A., 2005. ADiMat. R WTH Aac hen Univ ersit y , Ger- man y , av ailable at http://www.sc.rwth- aachen.de/adimat/ . Bo oth, J., Hob ert, J., Jank, W., 2001. A surv ey of Mon te Carlo algorithms for maximizing the lik eliho o d of a t w o-stage hierarchical mo del. Statistical Mo delling 1, 333–349. Coleman, T., Li, Y., 1996. An interior, trust region approach for nonlinear minimization sub ject to b ounds. SIAM Journal on Optimization 6, 418–445. Co ols, R., 2002. A dv ances in multidimensional integration. Journal of Computational and Applied Mathematics 149, 1–12. Co x, J., Ingersoll, J., Ross, S., 1985. A theory of the term structure of interest rate. Econometrica 53, 385–407. Da vidian, M., Giltinan, D., 2003. Nonlinear mo dels for rep eated measuremen ts: an o verview and up date. Journal of Agricultural, Biological, and Environmen tal Statistics 8, 387–419. D’Errico, J., 2006. fminsearchbnd . Bound constrained optimization using fminsearch , http://www.mathworks.com/matlabcentral/fileexchange/8277- fminsearchbnd . Ditlevsen, S., De Gaetano, A., 2005. Mixed eﬀects in sto c hastic diﬀerential equations mo dels. REVST A T - Statistical Journal 3 (2), 137–153. Ditlevsen, S., Lansky , P ., 2005. Estimation of the input parameters in the Ornstein- Uhlen b ec k neuronal mo del. Ph ysical Review E 71, 011907. Ditlevsen, S., Lansky , P ., 2006. Estimation of the input parameters in the Feller neuronal mo del. Ph ys. Rev. E 73, Art. No. 061910. Donnet, S., F oulley , J., Samson, A., 2010. Ba yesian analysis of gro wth curv es using mixed mo dels deﬁned b y sto c hastic diﬀeren tial equations. Biometrics 66 (3), 733–741. Donnet, S., Samson, A., 2008. P arametric inference for mixed mo dels deﬁned b y sto c hastic diﬀeren tial equations. ESAIM: Probability & Statistics 12, 196–218. F a vetto, B., Samson, A., 2010. P arameter estimation for a bidimensional partially ob- serv ed Ornstein-Uhlenbeck pro cess with biological application. Scandina vian Journal of Statistics 37, 200–220. 27 Filip e, P ., Braumann, C., Ro quete, C., 2010. Multiphasic individual growth mo dels in random environmen ts. Metho dology and Computing in Applied Probability , 1–8. 10.1007/s11009-010-9172-0. Gardiner, C., 1985. Handb o ok of Sto c hastic Metho ds for Ph ysics, Chemistry and the Nat- ural Sciences. Springer. Griew ank, A., 2000. Ev aluating Deriv ativ es: Princip les and T ec hniques of Algorithmic Diﬀeren tiation. SIAM Philadelphia, P A. Hurn, A., Jeisman, J., Lindsay , K., 2007. Seeing the w o o d for the trees: A critical ev aluation of metho ds to estimate the parameters of sto c hastic diﬀerential equations. Journal of Financial Econometrics 5 (3), 390–455. Jensen, B., Poulsen, R., 2002. T ransition densities of diﬀusion pro cesses: numerical com- parison of appro ximation techniques. Journal of Deriv ativ es 9, 1–15. Jo e, H., 2008. Accuracy of Laplace approximation for discrete response mixed mo dels. Computational Statistics & Data Analyis 52 (12), 5066–5074. Klim, S., Mortensen, S. B., Kristensen, N. R., Overgaard, R. V., Madsen, H., 2009. P op- ulation sto c hastic mo delling (PSM) – An R pac k age for mixed-eﬀects mo dels based on sto c hastic diﬀerential equations. Computer Metho ds and Programs in Biomedicine 94, 279–289. Klo eden, P ., Platen, E., 1992. Numerical Solution of Sto c hastic Diﬀerential Equations. Springer. K o, H., Davidian, M., 2000. Correcting for measurement error in individual-level co v ariates in nonlinear mixed eﬀects mo dels. Biometrics 56 (2), 368–375. Krommer, A., Ueberhuber, C., 1998. Computational Integration. So ciet y for Industrial and Applied Mathematics. Lindstrom, M., Bates, D., 1990. Nonlinear mixed-eﬀects mo dels for rep eated measures data. Biometrics 46, 673–687. McCullo c h, C., Searle, S., 2001. Generalized, Linear and Mixed Mo dels. Wiley Series in Probabilit y and Statistics. John Wiley & Sons, Inc. Øksendal, B., 2007. Sto c hastic Diﬀeren tial Equations: An In tro duction With Applications, sixth Edition. Springer. Ov ergaard, R., Jonsson, N., T ornøe, C., Madsen, H., 2005. Non-linear mixed-eﬀects mo d- els with sto c hastic diﬀerential equations: implementation of an estimation algorithm. Journal of Pharmacokinetics and Pharmaco dynamics 32, 85–107. 28 Picc hini, U., De Gaetano, A., Ditlevsen, S., 2010. Sto c hastic diﬀeren tial mixed-eﬀects mo dels. Scandina vian Journal of Statistics 37 (1), 67–90. Picc hini, U., Ditlevsen, S., De Gaetano, A., Lansky , P ., 2008. Parameters of the diﬀusion leaky in tegrate-and-ﬁre neuronal mo del for a slo wly ﬂuctuating signal. Neural Compu- tation 20 (11), 2696–2714. Pinheiro, J., Bates, D., 1995. Appro ximations of the log-lik eliho o d function in the nonlinear mixed-eﬀects mo del. Journal of Computational and Graphical Statistics 4 (1), 12–35. Pinheiro, J., Bates, D., 2002. Mixed-eﬀects mo dels in S and S-PLUS. Springer-V erlag, NY. Pinheiro, J., Bates, D., DebRo y , S., Sark ar, D., the R Dev elopmen t Core T eam, 2007. The nlme Pac k age. R F oundation for Statistical Computing, http://www.R- project.org/ . Pinheiro, J., Chao, E., 2006. Eﬃcien t Laplacian and adaptive Gaussian quadrature algo- rithms for multilev el generalized linear mixed mo dels. Journal of Computational and Graphical Statistics 15 (1), 58–81. Ripley , B., 2006. Sto c hastic Simulation. Wiley-Interscience. Sh un, Z., McCullagh, P ., 1995. Laplace appro ximation of high dimensional integrals. Jour- nal of the Ro yal Statistical So ciety B 57 (4), 749–760. Sk aug, H., 2002. Automatic diﬀerentiation to facilitate maximum likelihoo d estimation in nonlinear random eﬀects mo dels. Journal of Computational and Graphical Statistics 11 (2), 458–470. Sk aug, H., F ournier, D., 2006. Automatic appro ximation of the marginal likelihoo d in non- Gaussian hierarc hical mo dels. Computational Statistics & Data Analysis 51, 699–709. Sørensen, H., 2004. P arametric inference for diﬀusion pro cesses observ ed at discrete points in time: a survey . International Statistical Review 72 (3), 337–354. Strathe, A., Sørensen, H., Danfær, A., 2009. A new mathematical mo del for com bining gro wth and energy intak e in animals: The case of the growing pig. Journal of Theoretical Biology 261 (2), 165 – 175. T ang, C., Chen, S., 2009. P arameter estimation and bias correction for diﬀusion processes. Journal of Econometrics 149 (1), 65–81. T ornøe, C., Ov ergaard, R., Agersø, H., Nielsen, H., Madsen, H., Jonsson, E. N., 2005. Sto c hastic diﬀerential equations in NONMEM: implemen tation, application, and com- parison with ordinary diﬀerential equations. Pharmaceutical Researc h 22 (8), 1247–1258. 29

Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment